Hi Elvin,

    The handler pointer used in lines 420 - 447 is retrieved from the 
incoming queue using InQueue::GetHandlerForMessage. This method 
actually removes the handler from the InQueue in a mutex-protected 
section. Stream::Tick calls InQueue::ReportTimeout, which searches 
through the InQueue, also in a mutex-protected section. So, while the 
handler pointer is held by the Stream object it is not present in the 
queue, and therefore any timeout event caused by Stream::Tick will not 
be reported to the handler currently being processed by the Stream 
object in the polling thread.
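
    To make the argument concrete, here is a minimal sketch of the 
ownership hand-off. It is not the actual XrdCl code: SketchInQueue, 
Handler, TakeHandler and the demo function are hypothetical stand-ins, 
and the only assumption is that removal and the timeout scan take the 
same mutex, as described above.

```cpp
#include <cassert>
#include <ctime>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a response handler; not the real XrdCl class.
struct Handler {
  explicit Handler(std::time_t e) : expires(e), timedOut(false) {}
  std::time_t expires;   // time at which the request expires
  bool        timedOut;  // set when a timeout has been reported
};

// Simplified model of the incoming queue (assumption: one mutex guards
// both removal and the timeout scan).
class SketchInQueue {
  public:
    void Add(Handler *h) {
      std::lock_guard<std::mutex> lock(pMutex);
      pHandlers.push_back(h);
    }

    // Removes and returns the oldest handler, or nullptr if none is queued.
    // After this call the queue no longer knows about the handler.
    Handler *TakeHandler() {
      std::lock_guard<std::mutex> lock(pMutex);
      if (pHandlers.empty()) return nullptr;
      Handler *h = pHandlers.front();
      pHandlers.erase(pHandlers.begin());
      return h;
    }

    // Marks and removes every queued handler that has expired by `now`;
    // returns how many were reported.
    int ReportTimeout(std::time_t now) {
      std::lock_guard<std::mutex> lock(pMutex);
      int reported = 0;
      std::vector<Handler*>::iterator it = pHandlers.begin();
      while (it != pHandlers.end()) {
        if ((*it)->expires <= now) {
          (*it)->timedOut = true;
          it = pHandlers.erase(it);
          ++reported;
        } else {
          ++it;
        }
      }
      return reported;
    }

  private:
    std::mutex            pMutex;
    std::vector<Handler*> pHandlers;
};

// Returns true if a handler already taken out of the queue is skipped
// by a later timeout scan while a still-queued one is reported.
bool DemoTimeoutMissesTakenHandler() {
  SketchInQueue queue;
  Handler inFlight(100);
  Handler waiting(100);
  queue.Add(&inFlight);
  queue.Add(&waiting);

  Handler *taken = queue.TakeHandler();     // "polling thread" takes it
  int reported = queue.ReportTimeout(200);  // "tick" runs afterwards

  return taken == &inFlight && reported == 1 &&
         !inFlight.timedOut && waiting.timedOut;
}
```

In this model the timeout scan can only reach handlers that are still 
in the queue, so a handler that was already taken out is never touched 
by it.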

    I have another idea though that I am currently exploring. CCing 
xrootd-dev for future (Michal's) reference in case I don't manage to fix 
it before leaving.

Cheers,
    Lukasz

On 03/12/2015 12:19 AM, Elvin Alin Sindrilaru wrote:
>
> Hi Lukasz, Andy,
>
> Since part of my "apprenticeship" was also to look over the client code, I thought that this might be a good starting point. Although I am not even halfway through understanding what happens here, I noticed something that might be helpful in this situation.
>
> In XrdClStream.cc, around line 420, the method pTransport->MessageReceived is called, and inside this function a check is done to see whether the initial request for this incoming message has timed out; if so, the handler is discarded.
>
> Now imagine that the check passes but after this call the initial request actually expires and the XrdClStream::Tick method runs - therefore the handler is called on the expired request and it is also destroyed (by a different thread - the one running the ticks).
>
> Now in the XrdClStream::OnIncoming method the thread can re-add the handler to the pIncomingQueue even if it was previously destroyed as a consequence of the timed-out request. Therefore, a crash will happen when the handler is called the second time, for the legitimate message that we just processed. There is no lock to protect the "region" between lines 420 - 447 in XrdClStream.cc, and the Tick thread can expire the current request in the meantime.
>
> Hope it makes sense ...
>
> Cheers,
> Elvin
>
>
> ________________________________________
> From: Andrew Hanushevsky
> Sent: 11 March 2015 21:34
> To: Lukasz Janyst
> Cc: Elvin Alin Sindrilaru
> Subject: Curious Client Issue
>
> Hi Lukasz,
>
> I have seen several instances of client crashes closely preceded by an
> operation timeout (be it an operation or a socket or whatever -- it's always
> a timeout). The core file appears to indicate that an internal XrdCl
> response handler was subsequently invoked on a deleted XrdClFile object.
> I suspect this is because XrdClFile::open() returned a timeout error and so
> the object was deleted. However, there apparently is still something out
> there that wants to process that object even after the timeout occurs. Have
> you seen this kind of thing before? Do you have any ideas of where to look
> for a smoking gun?
>
> Andy
>
>
>
