Investigation pointed to an async-cancel-unsafe call in XrdSysLinuxSemaphore.hh, which is made while asynchronous cancel delivery is enabled:

{code}
pthread_setcanceltype( PTHREAD_CANCEL_ASYNCHRONOUS, 0 );
r = syscall( SYS_futex, pValue, FUTEX_WAIT, newVal, 0, 0, 0 );
pthread_setcanceltype( PTHREAD_CANCEL_DEFERRED, 0 );
{code}

Although pthread primitives like pthread_cond_wait are implemented with a futex with async cancel enabled, our XrdSysLinuxSemaphore uses the glibc utility function syscall() to make the futex system call. However, syscall() is declared (in unistd.h) as:

{code}
extern long int syscall (long int __sysno, ...) __THROW;
{code}

(__THROW being defined as throw(), i.e. throws nothing). The NPTL async thread cancel is implemented using the standard stack unwinding, quite similar to C++ exception handling. My conclusion was that gcc 5.1.1 optimises differently than earlier versions of gcc, and in this case it produces code where the unwinding does not work from this routine.
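To make the role of the throw() declaration concrete, here is a minimal sketch using the C++11 noexcept operator, which reports whether the compiler has been promised that a call cannot unwind. The functions plain_call and nothrow_call are hypothetical stand-ins invented for this illustration and are not part of glibc or XRootD. The last helper asks the same question of the real syscall() declaration; with glibc headers like the one quoted above I would expect it to report true, but that depends on the C library in use.

```cpp
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical stand-ins for illustration only; not glibc functions.
long plain_call(long n) { return n; }             // no exception specification
long nothrow_call(long n) noexcept { return n; }  // C++11 spelling of throw()

// The noexcept operator does not evaluate its operand. It only reports what
// the declaration promises, so no call is made here.
constexpr bool plain_promises_no_unwind   = noexcept(plain_call(0));
constexpr bool nothrow_promises_no_unwind = noexcept(nothrow_call(0));

// The same query against the real declaration. SYS_getpid is used only to
// form a well-typed call expression; the system call is never executed.
bool syscall_promises_no_unwind() {
    return noexcept(syscall(SYS_getpid));
}
```

When every call in a function carries this promise, the compiler is entitled to assume that no unwinding ever passes through that function, which is the assumption an asynchronous cancel arriving inside syscall() would violate.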

As an experiment I tried recompiling xroot with a hacked unistd.h to remove the throw() declaration and confirmed that the crash was prevented. (The relevant difference in the generated code in this case is apparently that the frame pointer omission optimisation is turned off for the enclosing function.)

As for fixes, I have tried some things, but I'll leave the discussion of the implementation of possible fixes off the bug report! (I've mentioned them to Michal.)

