Hi,

On 9/3/12 9:00 AM, Brian Bockelman wrote:
> Hi Lukasz,
>
> I believe Matevz was referring to the redirect counter issue.

Yup, that's what I meant ... this is the commit record referenced from the 
savannah link I sent originally:
http://xrootd.cern.ch/cgi-bin/cgit.cgi/xrootd/commit/?id=4dd4cfa4340cc3ff61684f943392eeeff793c3a5

Matevz

> The redirect counter issue really nailed us over the weekend - the CERN
> redirectors got up to 1KHz of clients logging in / out.  We had to turn off
> cross-region redirection, and I'm not quite sure how we will be able to
> re-enable it.
>
> Brian
>
> On Sep 3, 2012, at 10:55 AM, Lukasz Janyst wrote:
>
>> I am ready to build what is in stable now. Waiting for a confirmation from Matevz.
>>
>> Lukasz
>>
>> On Monday, September 03, 2012 12:51:30 PM Lukasz Janyst wrote:
>> > Hi Matevz,
>> >
>> > What problem are you referring to? The redirect counter issue or the
>> > thread safety issues? The thread safety is already there in 3.2.2 so if you
>> > still see the problems then we need to investigate...
>> >
>> > Cheers,
>> > Lukasz
>> >
>> > On Friday, August 31, 2012 01:07:56 PM Matevz Tadel wrote:
>> > > Luckily we just noticed this one show up again:
>> > > https://savannah.cern.ch/bugs/?93794
>> > >
>> > > Please include the fix!
>> > >
>> > > Cheers,
>> > > Matevz
>> > >
>> > > On 08/31/12 08:08, Brian Bockelman wrote:
>> > > > Perfectly fine by me.
>> > > >
>> > > > So:
>> > > > 1) Detailed monitoring fix.
>> > > > 2) Sendfile monitoring fix.
>> > > >
>> > > > I can't think of anything else?
>> > > >
>> > > > Brian
>> > > >
>> > > > On Aug 31, 2012, at 9:58 AM, Lukasz Janyst wrote:
>> > > >> Yes, but I won't manage today. Is Monday fine with you?
>> > > >>
>> > > >> Lukasz
>> > > >>
>> > > >> On Friday, August 31, 2012 07:34:44 AM Brian Bockelman wrote:
>> > > >> > Can we cut a 3.2.3 patch release with these two fixes?
>> > > >> >
>> > > >> > Brian
>> > > >> >
>> > > >> > On Aug 31, 2012, at 12:14 AM, "Yang, Wei" wrote:
>> > > >> > > I tested the second. I didn't get a chance to test the 1st before I lost
>> > > >> > > the window of restarting the cluster. But I have sendfile() turned off
>> > > >> > > and I do get correct results, so it implicitly confirms the 1st one.
>> > > >> > >
>> > > >> > > regards,
>> > > >> > > Wei Yang | 650-926-3338(O)
>> > > >> > >
>> > > >> > > On Aug 30, 2012, at 9:29 PM, Wilko Kroeger wrote:
>> > > >> > >> Hello Brian
>> > > >> > >>
>> > > >> > >> Yes, we also noticed that the detailed monitoring is not working in
>> > > >> > >> v3.2.2. We built a version on top of v3.2.2 adding the two commits:
>> > > >> > >>
>> > > >> > >> commit e0ad3459c89a163e600070a15936b8fd5d26ff35
>> > > >> > >> Author: Andrew Hanushevsky <[log in to unmask]>
>> > > >> > >> Date: Wed Aug 22 18:56:19 2012 -0700
>> > > >> > >>
>> > > >> > >> Make sure read statistics are updated for sendfile() and mmap I/O.
>> > > >> > >>
>> > > >> > >> commit e51db4bb0178a21bbe87ccf7c9349b079c2d7455
>> > > >> > >> Author: Andrew Hanushevsky <[log in to unmask]>
>> > > >> > >> Date: Mon Jul 30 16:52:56 2012 -0700
>> > > >> > >>
>> > > >> > >> Correct monitor initialization test to start monitor under all configs.
>> > > >> > >>
>> > > >> > >> As far as I can tell the detailed monitoring is now working. Wei
>> > > >> > >> might
>> > > >> > >> have done more testing.
>> > > >> > >>
>> > > >> > >> Cheers,
>> > > >> > >>
>> > > >> > >> Wilko
>> > > >> > >>
>> > > >> > >> On Thu, 30 Aug 2012, Brian Bockelman wrote:
>> > > >> > >>> Hi Andy,
>> > > >> > >>>
>> > > >> > >>> The core wasn't interesting. However, I tracked it down to this
>> > > >> > >>> change
>> > > >> > >>> (line 334 in XrdXrootdConfig.cc):
>> > > >> > >>>
>> > > >> > >>> if ((!isRedir || (RQList.Next() != 0 &&
>> > > >> > >>> XrdXrootdMonitor::Redirect())))
>> > > >> > >>>
>> > > >> > >>> became:
>> > > >> > >>>
>> > > >> > >>> if ((!isRedir || (RQList.Next() != 0)) &&
>> > > >> > >>> XrdXrootdMonitor::Redirect())
>> > > >> > >>>
>> > > >> > >>> (in 3.2.2). In master, it is this test:
>> > > >> > >>>
>> > > >> > >>> if (!isRedir || XrdXrootdMonitor::Redirect())
>> > > >> > >>>
>> > > >> > >>> Note that XrdXrootdMonitor::Redirect always returns 0 (I suspect
>> > > >> > >>> the bug
>> > > >> > >>> is this).
>> > > >> > >>>
>> > > >> > >>> So, basically, I think detailed monitoring is broken in the 3.2.2
>> > > >> > >>> release. Matevz, take note...
>> > > >> > >>>
>> > > >> > >>> What's the minimal patch? I can ask OSG to push this out ASAP.
>> > > >> > >>>
>> > > >> > >>> Brian
>> > > >> > >>>
>> > > >> > >>> On Aug 28, 2012, at 9:26 PM, Andrew Hanushevsky wrote:
>> > > >> > >>>> Hi Brian,
>> > > >> > >>>>
>> > > >> > >>>> Best to get a gcore on this one. Seems like the monitoring did
>> > > >> > >>>> not
>> > > >> > >>>> initialize correctly as it's trying to send to fd 0.
>> > > >> > >>>>
>> > > >> > >>>> Andy
>> > > >> > >>>>
>> > > >> > >>>> -----Original Message----- From: Brian Bockelman
>> > > >> > >>>> Sent: Tuesday, August 28, 2012 7:15 PM
>> > > >> > >>>> To: <[log in to unmask]>
>> > > >> > >>>> Subject: Strange detailed monitoring issue
>> > > >> > >>>>
>> > > >> > >>>> After a power outage locally, Matevz noticed he is not receiving
>> > > >> > >>>> monitoring messages.
>> > > >> > >>>>
>> > > >> > >>>> Sure enough, from strace:
>> > > >> > >>>>
>> > > >> > >>>> [pid 1705] sendto(0, "t8\5\270\0\0\0\0\340\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\200\0\0\0\302^v#"..., 1464, 0, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = -1 ENOTSOCK (Socket operation on non-socket)
>> > > >> > >>>>
>> > > >> > >>>> Version info:
>> > > >> > >>>>
>> > > >> > >>>> [root@red-gridftp3 ~]# rpm -q xrootd-server
>> > > >> > >>>> xrootd-server-3.2.2-1.osg.el5.xu
>> > > >> > >>>>
>> > > >> > >>>> Log startup is below. Config file snippet is:
>> > > >> > >>>>
>> > > >> > >>>> xrootd.monitor all auth flush 30s mbuff 1472 window 5s dest files io info user xrootd.t2.ucsd.edu:9930
>> > > >> > >>>> xrd.report xrootd.t2.ucsd.edu:9931 every 30s all sync
>> > > >> > >>>>
>> > > >> > >>>> Any ideas? We are at a loss as to what might be happening.
>> > > >> > >>>>
>> > > >> > >>>> Brian
>> > > >> > >>>>
>> > > >> > >>>> 120828 21:07:13 1663 Scalla is starting. . .
>> > > >> > >>>> Copr. 2010 Stanford University, xrd version v3.2.2
>> > > >> > >>>> ++++++ xrootd [log in to unmask] initialization started.
>> > > >> > >>>> Config using configuration file /etc/xrootd/xrootd-clustered.cfg
>> > > >> > >>>> =====> xrd.port 1094
>> > > >> > >>>> =====> xrd.trace conn
>> > > >> > >>>> =====> all.adminpath /var/run/xrootd
>> > > >> > >>>> =====> xrd.report xrootd.t2.ucsd.edu:9931 every 30s all sync
>> > > >> > >>>> Config maximum number of connections restricted to 65536
>> > > >> > >>>> Copr. 2007 Stanford University, xrootd version 2.9.7 build
>> > > >> > >>>> v3.2.2
>> > > >> > >>>> ++++++ xrootd protocol initialization started.
>> > > >> > >>>> =====> all.export / nostage
>> > > >> > >>>> =====> xrootd.trace emsg login stall redirect
>> > > >> > >>>> =====> xrootd.seclib /usr/lib64/libXrdSec.so
>> > > >> > >>>> Config warning: ignoring fslib; libXrdOfs.so is built-in.
>> > > >> > >>>> =====> xrootd.fslib /usr/lib64/libXrdOfs.so
>> > > >> > >>>> =====> all.pidpath /var/run/xrootd
>> > > >> > >>>> =====> xrootd.monitor all auth flush 30s mbuff 1472 window 5s dest files io info user xrootd.t2.ucsd.edu:9930
>> > > >> > >>>> Config exporting /
>> > > >> > >>>> ++++++ Authentication system initialization started.
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: *** ------------------------------------------------------------ ***
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: Mode: server
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: Debug: -1
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: CA dir: /etc/grid-security/certificates
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: CA verification level: 1
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: CRL dir: /etc/grid-security/certificates/
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: CRL extension: .r0
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: CRL check level: 1
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: CRL refresh time: 86400
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: Certificate: /etc/grid-security/xrd/xrdcert.pem
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: Key: /etc/grid-security/xrd/xrdkey.pem
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: Proxy delegation option: 0
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: GRIDmap file: /etc/grid-security/grid-mapfile
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: GRIDmap option: 10
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: GRIDmap cache entries expiration (secs): 0
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: Authorization function: libXrdLcmaps.so
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: Authorization function parms: --osg,--lcmapscfg,/etc/xrootd/lcmaps.cfg,--loglevel,0|useglobals
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: Authorization cache entries expiration (secs): -1
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: Client proxy availability in XrdSecEntity.endorsement: 0
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: VOMS option: 1
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: MonInfo option: 0
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: Crypto modules: ssl
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: Ciphers: aes-128-cbc:bf-cbc:des-ede3-cbc
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: MDigests: sha1:md5
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_InitOpts: *** ------------------------------------------------------------ ***
>> > > >> > >>>> 120828 21:07:13 1663 secgsi_LoadAuthzFun: using 'XrdSecgsiAuthzFun()' from libXrdLcmaps.so
>> > > >> > >>>> =====> sec.protocol /usr/lib64 gsi -certdir:/etc/grid-security/certificates -cert:/etc/grid-security/xrd/xrdcert.pem -key:/etc/grid-security/xrd/xrdkey.pem -crl:1 -authzfun:libXrdLcmaps.so -authzfunparms:--osg,--lcmapscfg,/et
>> > > >> > >>>> Config 1 authentication directives processed in /etc/xrootd/xrootd-clustered.cfg
>> > > >> > >>>> ------ Authentication system initialization completed.
>> > > >> > >>>> ++++++ File system initialization started.
>> > > >> > >>>> =====> all.role server
>> > > >> > >>>> Config warning: ignoring invalid trace option 'none'.
>> > > >> > >>>> =====> ofs.trace none
>> > > >> > >>>> =====> ofs.authorize
>> > > >> > >>>> =====> ofs.osslib /usr/lib64/libXrdHdfs.so
>> > > >> > >>>> ++++++ Authorization system initialization started.
>> > > >> > >>>> 120828 21:07:13 1663 acc_Config: Authorization system using configuration in /etc/xrootd/xrootd-clustered.cfg
>> > > >> > >>>> =====> acc.authdb /etc/xrootd/Authfile
>> > > >> > >>>> =====> acc.audit deny grant
>> > > >> > >>>> Config 2 authorization directives processed in /etc/xrootd/xrootd-clustered.cfg
>> > > >> > >>>> Config 1 auth entries processed in /etc/xrootd/Authfile
>> > > >> > >>>> ------ Authorization system initialization completed.
>> > > >> > >>>> Copr. 2009, Brian Bockelman, Hdfs Version
>> > > >> > >>>> 120828 21:07:13 1663 hdfs_Config: Copr. 2009, Brian Bockelman, Hdfs Version
>> > > >> > >>>> 120828 21:07:13 1663 hdfs_Config: Configuring HDFS.
>> > > >> > >>>> =====> oss.namelib /usr/lib64/libXrdCmsTfc.so file:/etc/xrootd/storage.xml?protocol=hadoop
>> > > >> > >>>> Copr. 2009 University of Nebraska-Lincoln TFC plugin v 1.0
>> > > >> > >>>> Params: file:/etc/xrootd/storage.xml?protocol=hadoop
>> > > >> > >>>> Xerces-c has been initialized.
>> > > >> > >>>> Connecting to the catalog file:/etc/xrootd/storage.xml?protocol=hadoop
>> > > >> > >>>> Using catalog file /etc/xrootd/storage.xml
>> > > >> > >>>> ------ HDFS storage system initialization completed.
>> > > >> > >>>> 120828 21:07:13 1663 hdfs_HDFS storage system initialization.: completed.
>> > > >> > >>>> ++++++ Configuring server role. . .
>> > > >> > >>>> =====> all.manager srm.unl.edu:1213
>> > > >> > >>>> =====> cms.trace all
>> > > >> > >>>> =====> all.adminpath /var/run/xrootd
>> > > >> > >>>> 120828 21:07:13 1663 Configure Global System Identification:
>> > > >> > >>>> anon-s
>> > > >> > >>>> 1213srm.unl.edu <http://1213srm.unl.edu>
>> <http://1213srm.unl.edu>>>>>
>> > > >> > >>>> Config effective /etc/xrootd/xrootd-clustered.cfg ofs configuration:
>> > > >> > >>>> ofs.role server
>> > > >> > >>>> ofs.authorize
>> > > >> > >>>> ofs.maxdelay 60
>> > > >> > >>>> ofs.osslib /usr/lib64/libXrdHdfs.so
>> > > >> > >>>> ofs.persist manual hold 600 logdir /var/run/xrootd/.ofs/posc.log
>> > > >> > >>>> ofs.trace 0
>> > > >> > >>>>
>> > > >> > >>>> ------ File system server initialization completed.
>> > > >> > >>>> Config warning: 'xrootd.prepare logdir' not specified; prepare tracking disabled.
>> > > >> > >>>> 120828 21:07:13 1675 cms_Finder: Connected to cmsd via /var/run/xrootd/.olb/olbd.admin
>> > > >> > >>>> ------ xrootd protocol initialization completed.
>> > > >> > >>>> ------ xrootd [log in to unmask]:1094 initialization completed.
>> > > >> > >>>>
>> > > >> > >>>> ########################################################################
>> > > >> > >>>> Use REPLY-ALL to reply to list
>> > > >> > >>>>
>> > > >> > >>>> To unsubscribe from the XROOTD-DEV list, click the following link:
>> > > >> > >>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1
>> > > >> > >>>
>> > > >> > >>
>> > > >> > >
>> > > >> >
>> > > >>
>> > > >
>>
>>
>
