Hi,
I'm running a proxy server from the master branch and I consistently get a
deadlock after running jobs for more than 15 minutes, e.g.,
http://uaf-2.t2.ucsd.edu/~alja/traffic.png
The deadlock seems to be related to the client removing sockets after
the proxy's XRD_DATASERVERTTL has elapsed. This is what I inferred
from debugging the proxy with gdb (I picked a locked-up thread and followed
the owners of the contended locks):
https://gist.github.com/alja/2e6a69c48f864cccf1d9#file-xrdcl-pollerbuiltin-removesocket-bt-txt-L12
and from the server debug messages: http://uaf-2.t2.ucsd.edu/~alja/proxy-lock.log
If I set the TTL high enough, the proxy server can be 100% efficient and
stable.
The deadlock is not simple to reproduce: I'm running 600 jobs with a 10%
probability of replicated file paths. The gcore taken after the lock-up is at
noric38.slac.stanford.edu:/usr/work/matevz/gcore.12905
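
For anyone trying to reproduce this, a workload of that shape could be
approximated roughly as below. This is only a sketch under assumptions: it
reads "10% probability of replicated file paths" as "each job reuses an
already-requested path with probability 0.1", and the function name, path
pattern, and seed are hypothetical placeholders, not the actual job setup.

```python
import random


def make_job_paths(n_jobs=600, p_repeat=0.10, seed=0):
    """Return one file path per job.

    Each job reuses a previously issued path with probability p_repeat,
    otherwise it gets a fresh path. The path pattern is a placeholder.
    """
    rng = random.Random(seed)  # fixed seed so the job list is repeatable
    paths = []
    for i in range(n_jobs):
        if paths and rng.random() < p_repeat:
            # Replicated path: pick one that an earlier job already requested.
            paths.append(rng.choice(paths))
        else:
            # Fresh path (hypothetical naming scheme).
            paths.append(f"/store/test/file_{i:04d}.root")
    return paths


paths = make_job_paths()
print(len(paths), "jobs,", len(set(paths)), "distinct paths")
```

Each resulting path would then be handed to one job reading through the proxy.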
Thanks,
Alya
########################################################################
Use REPLY-ALL to reply to list
To unsubscribe from the XROOTD-DEV list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1