On one instance (CMS) we are running the patched xrootd-3.3.6-3.CERN.slc5 version, which includes a backport of the xrootd-4.0 semaphore implementation.

Since we changed the xrootd version we observe regular lock contention (a lock-up roughly once a day). It seems that a synchronous call to a running xrootd on localhost hangs in XrdSys::LinuxSemaphore::Wait() after issuing an XrdCl::FileSystem::Query call, i.e. it either does not receive a response or misses the semaphore post. Has the new LinuxSemaphore implementation been stress tested enough in multi-threaded applications to rule out a problem there?
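To make the stress-testing question concrete, here is a rough sketch of the kind of test meant. It is not taken from the xrootd test suite. StandInSemaphore and StressSemaphore are hypothetical names, and the stand-in is built on std::mutex and std::condition_variable only so that the example compiles on its own. To exercise the real code one would instantiate the template with XrdSys::LinuxSemaphore instead, assuming it can be constructed from an initial count and offers a Post() call next to the Wait() seen in the trace.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in semaphore with a Wait()/Post() interface.
// It exists only to make this sketch self-contained; it is NOT the
// xrootd implementation.
class StandInSemaphore {
public:
  explicit StandInSemaphore(int initial = 0) : count_(initial) {}

  // Block until the counter is positive, then decrement it.
  void Wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cond_.wait(lock, [this] { return count_ > 0; });
    --count_;
  }

  // Increment the counter and wake one waiter.
  void Post() {
    std::lock_guard<std::mutex> lock(mutex_);
    ++count_;
    cond_.notify_one();
  }

private:
  std::mutex mutex_;
  std::condition_variable cond_;
  int count_;
};

// Hypothetical stress driver: starts `pairs` posting threads and `pairs`
// waiting threads on one shared semaphore. Every thread performs `rounds`
// operations, so posts and waits balance exactly. Returns the number of
// waits that completed. A lost post does not produce a wrong count; it
// shows up as a waiter that never returns, so join() blocks forever.
template <typename Sem>
long StressSemaphore(int pairs, int rounds) {
  Sem sem(0);
  std::atomic<long> completed(0);
  std::vector<std::thread> threads;

  for (int i = 0; i < pairs; ++i) {
    threads.emplace_back([&sem, rounds] {
      for (int r = 0; r < rounds; ++r) sem.Post();
    });
    threads.emplace_back([&sem, &completed, rounds] {
      for (int r = 0; r < rounds; ++r) {
        sem.Wait();
        ++completed;
      }
    });
  }

  for (auto &t : threads) t.join();
  return completed.load();
}
```

A call such as StressSemaphore<StandInSemaphore>(4, 10000) should return 40000. Since a missed post appears as a hang rather than a wrong result, such a test is best run in a loop under an external timeout.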

The ATLAS instance with xrootd-3.3.4 never showed these hangs, and before the update the CMS instance, using the GLIBC semaphore implementation, had only very rare lock-ups, at a lower frequency than with the new implementation.

#0  0x00000036bc4d1c69 in syscall () from /lib64/libc.so.6
#1  0x00002b2ce73b0e28 in XrdSys::LinuxSemaphore::Wait() () from /usr/lib64/libXrdCl.so.1
#2  0x00002b2ce73c56d7 in XrdCl::XRootDStatus XrdCl::MessageUtils::WaitForResponse<XrdCl::Buffer>(XrdCl::SyncResponseHandler*, XrdCl::Buffer*&) () from /usr/lib64/libXrdCl.so.1
#3  0x00002b2ce73c03c5 in XrdCl::FileSystem::Query(XrdCl::QueryCode::Code, XrdCl::Buffer const&, XrdCl::Buffer*&, unsigned short) () from /usr/lib64/libXrdCl.so.1
#4  0x00002b2ce70bb9f3 in XrdMqClient::SendMessage (this=0x2b2ce73706e0, msg=<value optimized out>, receiverid=<value optimized out>, sign=<value optimized out>, 
    encrypt=<value optimized out>) at /usr/src/debug/eos-0.3.25-1
...

We have two stack traces with exactly the same signature. The next time it happens we will also take a stack trace of the destination xrootd to verify that the problem does not originate there; however, the coincidence of the xrootd core change and the lock-ups looks rather suspicious.

Will provide more information when available.

---
Reply to this email directly or view it on GitHub:
https://github.com/xrootd/xrootd/issues/110
