After further investigation, it looks like the poller is just stuck. The backtrace does not show why, because the poller wrapper is trying to be extra safe and also to avoid the kind of deadlock I described in my previous email.

Lukasz

On 13.02.2014 10:29, Lukasz Janyst wrote:
> OK, it's not a bug in the client but in the poller. One stream
> (0x7fd0e4001070) is trying to send some data while another
> stream (0x7fcfb80048b0) is simultaneously being removed due to TTL expiration.
>
> Andy, can you please have a look?
>
> Alja, if it disturbs your testing, you can temporarily switch to libevent
> by setting the XRD_POLLERPREFERENCE environment variable.
>
> Cheers,
> Lukasz
>
> On 13.02.2014 10:17, Lukasz Janyst wrote:
>> Strange. I took care to avoid exactly this kind of deadlock. I will have a
>> look.
>>
>> Lukasz
>>
>> PS. In the future, please report these as issues on GitHub.
>>
>> On 13.02.2014 07:09, Alja Mrak-Tadel wrote:
>>> Hi,
>>>
>>> I'm running the proxy server from the master branch, and I consistently get a
>>> deadlock after running jobs for more than 15 min, e.g.,
>>> http://uaf-2.t2.ucsd.edu/~alja/traffic.png
>>>
>>> The lockup seems to be related to removing sockets in the client once
>>> the proxy's XRD_DATASERVERTTL elapses. This is what I inferred
>>> from debugging the proxy with gdb (I picked a locked-up thread and followed
>>> the owners of the contended locks):
>>>
>>> https://gist.github.com/alja/2e6a69c48f864cccf1d9#file-xrdcl-pollerbuiltin-removesocket-bt-txt-L12
>>>
>>> and from the server debug messages: http://uaf-2.t2.ucsd.edu/~alja/proxy-lock.log
>>> If I set the TTL high enough, the proxy server can be 100% efficient and
>>> stable.
>>>
>>> The lockup is not easy to reproduce: I'm running 600 jobs with a 10%
>>> probability of replicated file paths. The gcore after the lockup is in
>>> noric38.slac.stanford.edu:/usr/work/matevz/gcore.12905
>>>
>>> Thanks,
>>> Alya
>>>
>>> ########################################################################
>>> Use REPLY-ALL to reply to list
>>>
>>> To unsubscribe from the XROOTD-DEV list, click the following link:
>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1
>>
>