After further investigation, it looks like the poller is simply stuck. The
backtrace does not show why, because the poller wrapper tries to be
extra safe and to avoid the kind of deadlock I described in my
previous email.

    Lukasz
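
For anyone who wants to try the workaround mentioned in the quoted message below, a minimal sketch follows. The variable name XRD_POLLERPREFERENCE and the libevent poller both come from this thread; the exact value string "libevent" and the need to set it before the proxy starts are assumptions, so check them against your XRootD version.

```shell
# Assumption: the client reads XRD_POLLERPREFERENCE at startup and accepts
# "libevent" as a poller name. Set it in the environment of the process
# that starts the proxy, before the proxy is launched.
export XRD_POLLERPREFERENCE=libevent

# Confirm what the proxy process would inherit.
echo "XRD_POLLERPREFERENCE=${XRD_POLLERPREFERENCE}"
```

This only swaps the poller implementation; it does not address the underlying problem in the built-in poller.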

On 13.02.2014 10:29, Lukasz Janyst wrote:
> OK, it's not a bug in the client but in the poller. One stream
> (0x7fd0e4001070) is trying to send some data, and simultaneously another
> stream (0x7fcfb80048b0) is being removed due to TTL expiration.
>
> Andy, can you please have a look?
>
> Alja, if it disturbs your testing you can temporarily switch to libevent
> by setting the XRD_POLLERPREFERENCE envvar.
>
> Cheers,
>     Lukasz
>
> On 13.02.2014 10:17, Lukasz Janyst wrote:
>> Strange. I took care to avoid just this kind of deadlock. I will have a
>> look.
>>
>>     Lukasz
>>
>> PS. In the future, please report these as issues on GitHub.
>>
>> On 13.02.2014 07:09, Alja Mrak-Tadel wrote:
>>> Hi,
>>>
>>> I'm running a proxy server from the master branch and I consistently get a
>>> deadlock after running jobs for more than 15 min, e.g.,
>>> http://uaf-2.t2.ucsd.edu/~alja/traffic.png
>>>
>>> The lock seems to be related to removing sockets in the client once
>>> the proxy's XRD_DATASERVERTTL has elapsed. This is what I inferred
>>> from debugging the proxy with gdb (I picked a locked-up thread and followed
>>> the owners of the contended locks):
>>>
>>> https://gist.github.com/alja/2e6a69c48f864cccf1d9#file-xrdcl-pollerbuiltin-removesocket-bt-txt-L12
>>>
>>>
>>> and the server debug messages: http://uaf-2.t2.ucsd.edu/~alja/proxy-lock.log
>>> If I set the TTL high enough, the proxy server can be 100% efficient and
>>> stable.
>>>
>>> The lock is not simple to reproduce -- I'm running 600 jobs with 10%
>>> probability of replicated file paths. The gcore after lock-up is in
>>> noric38.slac.stanford.edu:/usr/work/matevz/gcore.12905
>>>
>>> Thanks,
>>> Alja
>>>
>>> ########################################################################
>>> Use REPLY-ALL to reply to list
>>>
>>> To unsubscribe from the XROOTD-DEV list, click the following link:
>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1
>>
>