Hi Lukasz,

I believe Matevz was referring to the redirect counter issue.

The redirect counter issue really nailed us over the weekend: the CERN redirectors got up to 1 kHz of clients logging in and out. We had to turn off cross-region redirection, and I'm not quite sure how we will be able to re-enable it.

Brian

On Sep 3, 2012, at 10:55 AM, Lukasz Janyst <[log in to unmask]> wrote:

> I am ready to build what is in stable now. Waiting for a confirmation from Matevz.
>  
> Lukasz
>  
> On Monday, September 03, 2012 12:51:30 PM Lukasz Janyst wrote:
> > Hi Matevz,
> > 
> >    What problem are you referring to? The redirect counter issue or the
> > thread-safety issues? The thread-safety fixes are already in 3.2.2, so if
> > you still see the problems, then we need to investigate...
> > 
> > Cheers,
> >    Lukasz
> > 
> > On Friday, August 31, 2012 01:07:56 PM Matevz Tadel wrote:
> > > Luckily we just noticed this one show up again:
> > >    https://savannah.cern.ch/bugs/?93794
> > > 
> > > Please include the fix!
> > > 
> > > Cheers,
> > > Matevz
> > > 
> > > On 08/31/12 08:08, Brian Bockelman wrote:
> > > > Perfectly fine by me.
> > > > 
> > > > So:
> > > > 1) Detailed monitoring fix.
> > > > 2) Sendfile monitoring fix.
> > > > 
> > > > I can't think of anything else?
> > > > 
> > > > Brian
> > > > 
> > > > On Aug 31, 2012, at 9:58 AM, Lukasz Janyst <[log in to unmask]> wrote:
> > > >> Yes, but I won't manage today. Is Monday fine with you?
> > > >> 
> > > >> Lukasz
> > > >> 
> > > >> On Friday, August 31, 2012 07:34:44 AM Brian Bockelman wrote:
> > > >> > Can we cut a 3.2.3 patch release with these two fixes?
> > > >> > 
> > > >> > Brian
> > > >> > 
> > > >> > On Aug 31, 2012, at 12:14 AM, "Yang, Wei" <[log in to unmask]> wrote:
> > > >> > > I tested the second. I didn't get a chance to test the first before I
> > > >> > > lost the window for restarting the cluster. But I have sendfile()
> > > >> > > turned off and I do get correct results, so it implicitly confirms
> > > >> > > the first one.
> > > >> > > 
> > > >> > > regards,
> > > >> > > Wei Yang | [log in to unmask] | 650-926-3338(O)
> > > >> > >
> > > >> > > On Aug 30, 2012, at 9:29 PM, Wilko Kroeger wrote:
> > > >> > >> Hello Brian
> > > >> > >> 
> > > >> > >> Yes, we also noticed that the detailed monitoring is not working in
> > > >> > >> v3.2.2. We built a version on top of v3.2.2 adding these two commits:
> > > >> > >> 
> > > >> > >> commit e0ad3459c89a163e600070a15936b8fd5d26ff35
> > > >> > >> Author: Andrew Hanushevsky <[log in to unmask]>
> > > >> > >> Date: Wed Aug 22 18:56:19 2012 -0700
> > > >> > >>
> > > >> > >> Make sure read statistics are updated for sendfile() and mmap I/O.
> > > >> > >>
> > > >> > >> commit e51db4bb0178a21bbe87ccf7c9349b079c2d7455
> > > >> > >> Author: Andrew Hanushevsky <[log in to unmask]>
> > > >> > >> Date: Mon Jul 30 16:52:56 2012 -0700
> > > >> > >>
> > > >> > >> Correct monitor initialization test to start monitor under all configs.
> > > >> > >> 
> > > >> > >> As far as I can tell, the detailed monitoring is now working. Wei might
> > > >> > >> have done more testing.
> > > >> > >> 
> > > >> > >> Cheers,
> > > >> > >> 
> > > >> > >> Wilko
> > > >> > >> 
> > > >> > >> On Thu, 30 Aug 2012, Brian Bockelman wrote:
> > > >> > >>> Hi Andy,
> > > >> > >>> 
> > > >> > >>> The core wasn't interesting. However, I tracked it down to this change
> > > >> > >>> (line 334 in XrdXrootdConfig.cc):
> > > >> > >>> 
> > > >> > >>> if ((!isRedir || (RQList.Next() != 0 && XrdXrootdMonitor::Redirect())))
> > > >> > >>>
> > > >> > >>> became:
> > > >> > >>>
> > > >> > >>> if ((!isRedir || (RQList.Next() != 0)) && XrdXrootdMonitor::Redirect())
> > > >> > >>> 
> > > >> > >>> (in 3.2.2). In master, it is this test:
> > > >> > >>> 
> > > >> > >>> if (!isRedir || XrdXrootdMonitor::Redirect())
> > > >> > >>> 
> > > >> > >>> Note that XrdXrootdMonitor::Redirect() always returns 0 (I suspect this
> > > >> > >>> is the bug).
> > > >> > >>>
> > > >> > >>> So, basically, I think detailed monitoring is broken in the 3.2.2
> > > >> > >>> release. Matevz, take note...
> > > >> > >>> 
> > > >> > >>> What's the minimal patch? I can ask OSG to push this out ASAP.
> > > >> > >>> 
> > > >> > >>> Brian
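The effect of the moved parenthesis is easiest to see by evaluating the three guards for the case in this thread: a data server (the startup log shows `all.role server`, so not a redirector) where, as Brian reports, XrdXrootdMonitor::Redirect() returns 0. The sketch below is not the XRootD source. The three expressions are copied from the quoted lines, but the function names and the boolean parameters standing in for `RQList.Next() != 0` and `XrdXrootdMonitor::Redirect()` are hypothetical, and it assumes (consistent with commit e51db4bb, "Correct monitor initialization test to start monitor under all configs") that the guarded branch is what starts the monitor.

```cpp
// Sketch only: the three versions of the guard quoted above, with
// hypothetical boolean stand-ins for the real calls.
//   isRedir     - the server is running as a redirector
//   haveQueue   - stands in for RQList.Next() != 0
//   monRedirect - stands in for XrdXrootdMonitor::Redirect()

// Guard before 3.2.2: a non-redirector always passes.
constexpr bool guardBefore(bool isRedir, bool haveQueue, bool monRedirect) {
    return !isRedir || (haveQueue && monRedirect);
}

// Guard in 3.2.2: the closing parenthesis moved, so monRedirect is
// now required for every server, not only for redirectors.
constexpr bool guard322(bool isRedir, bool haveQueue, bool monRedirect) {
    return (!isRedir || haveQueue) && monRedirect;
}

// Guard in master: a non-redirector always passes again.
constexpr bool guardMaster(bool isRedir, bool monRedirect) {
    return !isRedir || monRedirect;
}

// Data server (not a redirector) with Redirect() returning 0:
static_assert(guardBefore(false, false, false), "before 3.2.2: guard passes");
static_assert(!guard322(false, false, false), "3.2.2: guard fails");
static_assert(guardMaster(false, false), "master: guard passes");
```

Compiling the file is the check: the static_assert lines fail the build if this reading of the guards is wrong. Because `monRedirect` is false whenever Redirect() returns 0, `guard322` is false for every server type, which is consistent with detailed monitoring never starting in 3.2.2.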
> > > >> > >>> 
> > > >> > >>> On Aug 28, 2012, at 9:26 PM, Andrew Hanushevsky <[log in to unmask]> wrote:
> > > >> > >>>> Hi Brian,
> > > >> > >>>> 
> > > >> > >>>> Best to get a gcore on this one. It seems the monitoring did not
> > > >> > >>>> initialize correctly, as it's trying to send to fd 0.
> > > >> > >>>> 
> > > >> > >>>> Andy
> > > >> > >>>> 
> > > >> > >>>> -----Original Message-----
> > > >> > >>>> From: Brian Bockelman
> > > >> > >>>> Sent: Tuesday, August 28, 2012 7:15 PM
> > > >> > >>>> To: <[log in to unmask]>
> > > >> > >>>> Subject: Strange detailed monitoring issue
> > > >> > >>>> 
> > > >> > >>>> After a power outage locally, Matevz noticed he is not receiving
> > > >> > >>>> monitoring messages.
> > > >> > >>>> 
> > > >> > >>>> Sure enough, from strace:
> > > >> > >>>> 
> > > >> > >>>> [pid 1705] sendto(0, "t8\5\270\0\0\0\0\340\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\200\0\0\0\302^v#"..., 1464, 0, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = -1 ENOTSOCK (Socket operation on non-socket)
> > > >> > >>>> 
> > > >> > >>>> Version info:
> > > >> > >>>> 
> > > >> > >>>> [root@red-gridftp3 ~]# rpm -q xrootd-server
> > > >> > >>>> xrootd-server-3.2.2-1.osg.el5.xu
> > > >> > >>>> 
> > > >> > >>>> Log startup is below. Config file snippet is:
> > > >> > >>>> 
> > > >> > >>>> xrootd.monitor all auth flush 30s mbuff 1472 window 5s dest files io info user xrootd.t2.ucsd.edu:9930
> > > >> > >>>> xrd.report xrootd.t2.ucsd.edu:9931 every 30s all sync
> > > >> > >>>> 
> > > >> > >>>> Any ideas? We are at a loss as to what might be happening.
> > > >> > >>>> 
> > > >> > >>>> Brian
> > > >> > >>>> 
> > > >> > >>>> 120828 21:07:13 1663 Scalla is starting. . .
> > > >> > >>>> Copr. 2010 Stanford University, xrd version v3.2.2
> > > >> > >>>> ++++++ xrootd [log in to unmask] initialization started.
> > > >> > >>>> Config using configuration file /etc/xrootd/xrootd-clustered.cfg
> > > >> > >>>> =====> xrd.port 1094
> > > >> > >>>> =====> xrd.trace conn
> > > >> > >>>> =====> all.adminpath /var/run/xrootd
> > > >> > >>>> =====> xrd.report xrootd.t2.ucsd.edu:9931 every 30s all sync
> > > >> > >>>> Config maximum number of connections restricted to 65536
> > > >> > >>>> Copr. 2007 Stanford University, xrootd version 2.9.7 build v3.2.2
> > > >> > >>>> ++++++ xrootd protocol initialization started.
> > > >> > >>>> =====> all.export / nostage
> > > >> > >>>> =====> xrootd.trace emsg login stall redirect
> > > >> > >>>> =====> xrootd.seclib /usr/lib64/libXrdSec.so
> > > >> > >>>> Config warning: ignoring fslib; libXrdOfs.so is built-in.
> > > >> > >>>> =====> xrootd.fslib /usr/lib64/libXrdOfs.so
> > > >> > >>>> =====> all.pidpath /var/run/xrootd
> > > >> > >>>> =====> xrootd.monitor all auth flush 30s mbuff 1472 window 5s dest files io info user xrootd.t2.ucsd.edu:9930
> > > >> > >>>> Config exporting /
> > > >> > >>>> ++++++ Authentication system initialization started.
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: *** ------------------------------------------------------------ ***
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: Mode: server
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: Debug: -1
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: CA dir: /etc/grid-security/certificates
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: CA verification level: 1
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: CRL dir: /etc/grid-security/certificates/
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: CRL extension: .r0
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: CRL check level: 1
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: CRL refresh time: 86400
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: Certificate: /etc/grid-security/xrd/xrdcert.pem
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: Key: /etc/grid-security/xrd/xrdkey.pem
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: Proxy delegation option: 0
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: GRIDmap file: /etc/grid-security/grid-mapfile
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: GRIDmap option: 10
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: GRIDmap cache entries expiration (secs): 0
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: Authorization function: libXrdLcmaps.so
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: Authorization function parms: --osg,--lcmapscfg,/etc/xrootd/lcmaps.cfg,--loglevel,0|useglobals
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: Authorization cache entries expiration (secs): -1
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: Client proxy availability in XrdSecEntity.endorsement: 0
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: VOMS option: 1
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: MonInfo option: 0
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: Crypto modules: ssl
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: Ciphers: aes-128-cbc:bf-cbc:des-ede3-cbc
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: MDigests: sha1:md5
> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: *** ------------------------------------------------------------ ***
> > > >> > >>>> 120828 21:07:13 1663 secgsi_LoadAuthzFun: using 'XrdSecgsiAuthzFun()' from libXrdLcmaps.so
> > > >> > >>>> =====> sec.protocol /usr/lib64 gsi -certdir:/etc/grid-security/certificates -cert:/etc/grid-security/xrd/xrdcert.pem -key:/etc/grid-security/xrd/xrdkey.pem -crl:1 -authzfun:libXrdLcmaps.so -authzfunparms:--osg,--lcmapscfg,/et
> > > >> > >>>> Config 1 authentication directives processed in /etc/xrootd/xrootd-clustered.cfg
> > > >> > >>>> ------ Authentication system initialization completed.
> > > >> > >>>> ++++++ File system initialization started.
> > > >> > >>>> =====> all.role server
> > > >> > >>>> Config warning: ignoring invalid trace option 'none'.
> > > >> > >>>> =====> ofs.trace none
> > > >> > >>>> =====> ofs.authorize
> > > >> > >>>> =====> ofs.osslib /usr/lib64/libXrdHdfs.so
> > > >> > >>>> ++++++ Authorization system initialization started.
> > > >> > >>>> 120828 21:07:13 1663 acc_Config: Authorization system using configuration in /etc/xrootd/xrootd-clustered.cfg
> > > >> > >>>> =====> acc.authdb /etc/xrootd/Authfile
> > > >> > >>>> =====> acc.audit deny grant
> > > >> > >>>> Config 2 authorization directives processed in /etc/xrootd/xrootd-clustered.cfg
> > > >> > >>>> Config 1 auth entries processed in /etc/xrootd/Authfile
> > > >> > >>>> ------ Authorization system initialization completed.
> > > >> > >>>> Copr. 2009, Brian Bockelman, Hdfs Version
> > > >> > >>>> 120828 21:07:13 1663 hdfs_Config: Copr. 2009, Brian Bockelman, Hdfs Version
> > > >> > >>>> 120828 21:07:13 1663 hdfs_Config: Configuring HDFS.
> > > >> > >>>> =====> oss.namelib /usr/lib64/libXrdCmsTfc.so file:/etc/xrootd/storage.xml?protocol=hadoop
> > > >> > >>>> Copr. 2009 University of Nebraska-Lincoln TFC plugin v 1.0
> > > >> > >>>> Params: file:/etc/xrootd/storage.xml?protocol=hadoop
> > > >> > >>>> Xerces-c has been initialized.
> > > >> > >>>> Connecting to the catalog file:/etc/xrootd/storage.xml?protocol=hadoop
> > > >> > >>>> Using catalog file /etc/xrootd/storage.xml
> > > >> > >>>> ------ HDFS storage system initialization completed.
> > > >> > >>>> 120828 21:07:13 1663 hdfs_HDFS storage system initialization.: completed.
> > > >> > >>>> ++++++ Configuring server role. . .
> > > >> > >>>> =====> all.manager srm.unl.edu:1213
> > > >> > >>>> =====> cms.trace all
> > > >> > >>>> =====> all.adminpath /var/run/xrootd
> > > >> > >>>> 120828 21:07:13 1663 Configure Global System Identification: anon-s 1213srm.unl.edu>>>>
> > > >> > >>>> Config effective /etc/xrootd/xrootd-clustered.cfg ofs configuration:
> > > >> > >>>> ofs.role server
> > > >> > >>>> ofs.authorize
> > > >> > >>>> ofs.maxdelay 60
> > > >> > >>>> ofs.osslib /usr/lib64/libXrdHdfs.so
> > > >> > >>>> ofs.persist manual hold 600 logdir /var/run/xrootd/.ofs/posc.log
> > > >> > >>>> ofs.trace 0
> > > >> > >>>> 
> > > >> > >>>> ------ File system server initialization completed.
> > > >> > >>>> Config warning: 'xrootd.prepare logdir' not specified; prepare tracking disabled.
> > > >> > >>>> 120828 21:07:13 1675 cms_Finder: Connected to cmsd via /var/run/xrootd/.olb/olbd.admin
> > > >> > >>>> ------ xrootd protocol initialization completed.
> > > >> > >>>> ------ xrootd [log in to unmask]:1094 initialization completed.
> > > >> > >>>> ########################################################################
> > > >> > >>>> Use REPLY-ALL to reply to list
> > > >> > >>>>
> > > >> > >>>> To unsubscribe from the XROOTD-DEV list, click the following link:
> > > >> > >>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1