Hi,

I'm running a proxy server built from the master branch and I consistently 
get a deadlock once jobs have been running for more than 15 minutes, e.g., 
http://uaf-2.t2.ucsd.edu/~alja/traffic.png

The deadlock seems to be related to the client removing sockets after 
the proxy's XRD_DATASERVERTTL has elapsed. This is what I inferred 
from debugging the proxy with gdb (I picked a locked-up thread and 
followed the owners of the contended locks):

https://gist.github.com/alja/2e6a69c48f864cccf1d9#file-xrdcl-pollerbuiltin-removesocket-bt-txt-L12
The server debug messages are at http://uaf-2.t2.ucsd.edu/~alja/proxy-lock.log
If I set the TTL high enough, the proxy server can be 100% efficient and 
stable.
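
The lock-owner chase described above amounts to walking a wait-for graph: 
from a blocked thread to the lock it waits on, then to the thread that owns 
that lock, until a thread repeats. Below is a minimal Python sketch of that 
walk. The thread and lock names are hypothetical placeholders, not values 
taken from the backtrace or the core dump; in practice the two tables would 
be filled in by hand from the gdb session.

```python
def find_wait_cycle(waiting_on, owner_of, start):
    """Follow thread -> contended lock -> owning thread until a thread repeats.

    waiting_on maps a blocked thread to the lock it is waiting for.
    owner_of maps a lock to the thread that currently holds it.
    Returns the threads that form the cycle, or None if the chain ends.
    """
    seen = []
    thread = start
    while thread is not None:
        if thread in seen:
            # The chain came back to a thread already visited: deadlock cycle.
            return seen[seen.index(thread):]
        seen.append(thread)
        lock = waiting_on.get(thread)
        # A thread that waits on nothing, or a lock with no owner, ends the chain.
        thread = owner_of.get(lock) if lock is not None else None
    return None


# Hypothetical snapshot: T3 is the locked-up thread picked as a starting point.
waiting_on = {"T1": "lock_A", "T2": "lock_B", "T3": "lock_A"}
owner_of = {"lock_A": "T2", "lock_B": "T1"}

cycle = find_wait_cycle(waiting_on, owner_of, "T3")
print(cycle)
```

Here T3 is only a victim: it waits on a lock held by T2, while T2 and T1 
each hold the lock the other one needs, so the reported cycle contains just 
those two threads.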

The deadlock is not simple to reproduce. I'm running 600 jobs with a 10% 
probability of replicated file paths. The gcore taken after the lock-up is at 
noric38.slac.stanford.edu:/usr/work/matevz/gcore.12905

Thanks,
Alya
