Follow-up Comment #2, bug #100633 (project xrootd):

After some more checking: cmsd was seen to disappear on SL6 systems. On SL6
the default soft limit for "max user processes" is 1024, which also limits
threads. The default thread limit for the XrdScheduler is 2048. Below that
limit, the scheduler keeps creating threads until XrdSysThread::Run returns
an error, and then resets its limit to the number of threads accounted as
successfully running.
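The back-off described above can be sketched roughly as follows. This is not
the XrdScheduler source: SchedulerSketch, addThread and softNprocLimit are
hypothetical names, and the startThread callback merely stands in for
XrdSysThread::Run. Only getrlimit() and RLIMIT_NPROC are real (Linux) calls.

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_NPROC (Linux)
#include <functional>

// Hypothetical helper: soft "max user processes" limit for this process,
// or -1 if it is unlimited or cannot be queried.
long softNprocLimit() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NPROC, &rl) != 0) return -1;
    if (rl.rlim_cur == RLIM_INFINITY) return -1;
    return static_cast<long>(rl.rlim_cur);
}

// Hypothetical model of the behaviour described in the comment, not real code.
struct SchedulerSketch {
    int maxThreads;  // configured limit (2048 by default per the comment)
    int running;     // threads accounted as successfully started

    explicit SchedulerSketch(int limit) : maxThreads(limit), running(0) {}

    // startThread stands in for XrdSysThread::Run and returns 0 on success.
    bool addThread(const std::function<int()>& startThread) {
        if (running >= maxThreads) return false;  // already at the limit
        if (startThread() != 0) {
            maxThreads = running;  // reset the limit to what actually runs
            return false;
        }
        ++running;
        return true;
    }
};
```

With a configured limit of 2048 and a thread starter that begins failing
earlier, the sketch ends with maxThreads equal to the number of threads that
did start. It has no control over threads created elsewhere in the process.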

The complication seems to be that XrdClient (used via the Pss) runs threads
outside the XrdScheduler and calls exit() when it fails to start a thread:

#0  0x00007f04f0c48cd0 in exit () from /lib64/libc.so.6
#1  0x00007f04ef2a38ed in XrdClientPhyConnection::StartReader (this=0x7f04c8129460)
    at /usr/src/debug/xrootd-3.3.0/src/XrdClient/XrdClientPhyConnection.cc:257
#2  0x00007f04ef2978c0 in XrdClientConn::GetAccessToSrv (this=0x7f04c8010580)
    at /usr/src/debug/xrootd-3.3.0/src/XrdClient/XrdClientConn.cc:1314
#3  0x00007f04ef2b6d50 in XrdClientAdmin::Connect (this=0x7f04d4ad8b80)
    at /usr/src/debug/xrootd-3.3.0/src/XrdClient/XrdClientAdmin.cc:208
#4  0x00007f04ef6ea7a5 in XrdPosixAdminNew::XrdPosixAdminNew (this=0x7f04d4ad8b80, path=<value optimized out>)
    at /usr/src/debug/xrootd-3.3.0/src/XrdPosix/XrdPosixXrootd.cc:254
#5  0x00007f04ef6ec0b7 in XrdPosixXrootd::Stat (
    path=0x7f04d4ad8c40 "root://localhost:11000//atlas/thefile272029?oss.lcl=1", buf=0x7f04d4ad9c90)
    at /usr/src/debug/xrootd-3.3.0/src/XrdPosix/XrdPosixXrootd.cc:1283
#6  0x00007f04efb0dcb1 in XrdPssSys::Stat (this=<value optimized out>,
    path=0x7f04e815ac40 "/atlas/thefile272029", buff=0x7f04d4ad9c90, Opts=<value optimized out>,
    eP=<value optimized out>) at /usr/src/debug/xrootd-3.3.0/src/XrdPss/XrdPss.cc:378
#7  0x000000000040e3a7 in XrdCmsBaseFS::Exists (this=0x637b40, Path=0x7f04e815ac40 "/atlas/thefile272029",
    fnPos=-20, UpAT=<value optimized out>) at /usr/src/debug/xrootd-3.3.0/src/XrdCms/XrdCmsBaseFS.cc:166
#8  0x0000000000423c76 in XrdCmsNode::do_State (this=0x7f04e80024e0, Arg=...)
    at /usr/src/debug/xrootd-3.3.0/src/XrdCms/XrdCmsNode.cc:1167
[...]
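
For comparison only, here is a minimal sketch of a thread start that hands
the failure back to its caller instead of calling exit(). It is not the
XrdClientPhyConnection code: readerBody and startReaderOrReport are
hypothetical names. The only library fact relied on is that pthread_create()
returns an error number (EAGAIN when a thread or process limit is hit)
rather than terminating the process.

```cpp
#include <pthread.h>
#include <cstdio>
#include <cstring>

// Hypothetical reader body: just records that the thread ran.
static void* readerBody(void* arg) {
    *static_cast<int*>(arg) = 1;
    return nullptr;
}

// Hypothetical helper: returns 0 on success, otherwise the error number
// from pthread_create(), leaving the decision to the caller.
int startReaderOrReport(pthread_t* tid, int* flag) {
    int rc = pthread_create(tid, nullptr, readerBody, flag);
    if (rc != 0) {
        std::fprintf(stderr, "cannot start reader thread: %s\n",
                     std::strerror(rc));
    }
    return rc;
}
```

A caller in a daemon such as cmsd could then fail the single request that
needed the connection, while the process itself keeps running.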

One site reported general protection errors at the time the process exited
(Manchester system log). The instruction pointer (ip) is inside exit(), i.e.
the errors were generated after exit() was called, but I've not reproduced
them.

The other two sites had glibc heap corruption messages in the cmsd log,
possibly at exit time, but I've not reproduced those either.

    _______________________________________________________

Reply to this item at:

  <http://savannah.cern.ch/bugs/?100633>

_______________________________________________
  Message sent via/by LCG Savannah
  http://savannah.cern.ch/