Hi Brian,

OK, I looked more closely and it appears that no one calls the function that 
enables the sigstop/cont code (that's why I would have liked to see a cerr 
put in there). In other words, it's dead code (and, yes, should be removed). 
That would indicate that some monitoring process in the system is 
temporarily stopping the process. If the machine is also a batch node then I 
could see that happening (it happens here for other processes). Could you 
verify?

Andy

-----Original Message----- 
From: Brian Bockelman
Sent: Tuesday, August 23, 2011 3:18 PM
To: Andrew Hanushevsky
Cc: xrootd-dev
Subject: Re: xrootd redirector repeatedly "crashing"


On Aug 23, 2011, at 5:07 PM, Andrew Hanushevsky wrote:

> On Tue, 23 Aug 2011, Brian Bockelman wrote:
>
>> I'm not quite following what the issue is from the in-code comments, but 
>> isn't this what SO_LINGER (POSIX-compliant) is for?
> No, that isn't (at least wasn't) a solution for the problem. If there
> was an operation in progress when you closed the socket, the close would 
> not be reflected to the person who initiated the operation (unlike all 
> other operating systems) until some activity occurred that caused the 
> kernel to re-inspect the operation (e.g., a signal). So, the initiating 
> thread would simply hang. That might no longer be the case in current 
> versions of Linux. But back in the day it was a huge hassle since async close() 
> operations occur relatively frequently. Anyway, the implementation of 
> SO_LINGER varies from OS to OS and even in versions of an OS. That's what 
> I found when dealing with this problem and that's what people say out 
> there.
>

Hm.  It should be easy to construct a test case, right?  With two threads:
1) Thread 1 creates a socket pair, and writes to the write-end until it 
blocks.
2) Thread 2 waits for some reasonable amount of time, then closes the 
socket.
3) If Linux still shows this behavior, Thread 1 should hang...
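
A minimal sketch of the test above, in Python rather than xrootd's own C++, just to make the steps concrete. The function name, the chunk size, and the two wait times are illustrative assumptions, not anything taken from the xrootd code. It reports whether the blocked writer was released by the close() or was still stuck after a grace period; the result depends on the kernel, so no expected outcome is claimed here.

```python
import socket
import threading
import time


def close_while_blocked(wait_before_close=0.5, wait_after_close=2.0):
    """Return 'woke' if close() released the blocked writer, else 'hung'."""
    writer, reader = socket.socketpair()
    outcome = {}

    def fill_until_blocked():
        # Thread 1: write until the socket buffer is full and send() blocks.
        chunk = b"x" * 65536
        try:
            while True:
                writer.send(chunk)
        except OSError as exc:
            # Reached only if the blocked send() is released with an error.
            outcome["error"] = exc

    t = threading.Thread(target=fill_until_blocked, daemon=True)
    t.start()

    # Thread 2 (here the main thread): give the writer time to block,
    # then close the write end underneath it.
    time.sleep(wait_before_close)
    writer.close()

    # If the writer has not returned after the grace period, count it as hung.
    t.join(wait_after_close)
    hung = t.is_alive()

    # Cleanup: closing the peer end should release a writer that is still
    # blocked, so the sketch itself does not leave a stuck thread behind.
    reader.close()
    t.join(wait_after_close)
    return "hung" if hung else "woke"


if __name__ == "__main__":
    print(close_while_blocked())
```

The writer thread is a daemon and every join has a timeout, so the script terminates either way. If the first wait is too short the writer may not have blocked yet when the close happens, in which case the next send() fails immediately and the run says nothing about the blocked case.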

Brian