Hi,

Jerome LAURET wrote:
> 
>     Yes, it may be an unresponsive server, but the end
> result is a user job crashing, according to our users. Some
> of those connections (later retried) would lead to success.
> We can switch to level 3 and see, as soon as we can design a
> large scale test. We also have flukes with authentication
> in general, which I have not reported yet (I believe it to be
> LDAP related, as changing the LDAP setup changed failure rates
> from 50% to 3% ... but I cannot drop below the 3% failures).
> 
>     In general, could someone indicate what is the most
> reliable setting for Xrootd to retry upon failures (regardless
> of the delays this may cause)?
> 

  That kind of failure has long been treated as exceptional and fatal, 
so the policy was to abort. Newer versions are supposed to extend the 
retry mechanism to this circumstance as well, but I believe that does 
not apply in your case.

  From your answer I understand that the problem does not happen every 
time. The retry mechanism could patch the problem, but the main issue is 
that your server machine seems unable to receive/send the very first 
bytes after a connection is established. The initial assumption was that 
a machine which had just accepted a connection would be able to handle a 
transfer of 10-20 bytes. We were wrong about that and fixed it in a 
later release, although this is very hard to reproduce and debug.
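
  To illustrate what "patching with a retry" means on the job side, here 
is a minimal sketch in Python. It is not the Xrootd client's own retry 
code. The names retry and operation are hypothetical, and I am assuming 
the failed open surfaces as an OSError, which you would need to adapt to 
whatever your job actually sees.

```python
import time


def retry(operation, attempts=5, first_delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    operation   -- hypothetical callable that opens the remote file and
                   raises OSError when the connection or handshake fails
    attempts    -- total number of tries, including the first one
    first_delay -- seconds to wait after the first failure
    backoff     -- factor applied to the delay after each failure
    sleep       -- injectable so the waits can be skipped in tests
    """
    delay = first_delay
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError as err:
            last_error = err
            # Wait only if another attempt will follow.
            if attempt < attempts:
                sleep(delay)
                delay *= backoff
    # Every attempt failed: report the last error to the caller.
    raise last_error
```

  A job would pass its own file-opening function as operation. This only 
hides the symptom. It does not explain why the server fails to answer 
the first bytes.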

  The log could help, but since you get the error only occasionally, you 
will have to collect an enormous amount of log output to capture the 
problem. If that is the case, set the log level to 2 instead of 3.
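
  If the log does get large, it can be cut down to the relevant parts 
before sending it. The following is only a sketch in Python: the 
function name failures_with_context and the number of context lines are 
my own choices, and the pattern is simply taken from the error messages 
quoted further down in this thread.

```python
import re
from collections import deque

# Matches the handshake messages quoted in this thread.
PATTERN = re.compile(r"DoHandShake|HandShake failed")


def failures_with_context(lines, before=20):
    """Return each matching line together with up to `before` lines
    that preceded it, in the original order."""
    recent = deque(maxlen=before)  # rolling window of non-matching lines
    out = []
    for line in lines:
        if PATTERN.search(line):
            out.extend(recent)  # context leading up to the failure
            out.append(line)
            recent.clear()      # do not emit the same context twice
        else:
            recent.append(line)
    return out
```

  It can be applied to an open log file, for example 
failures_with_context(open("client.log")), where client.log is a 
placeholder name.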

  Personally, I am not aware of any interaction with LDAP that could 
interfere with the handshake.

Fabrizio


>     Thanks,
> 
> Fabrizio Furano wrote:
> 
>> Hi Jerome.
>>
>>  This almost never happens, and makes me think of an unresponsive 
>> server. But it may also be caused by odd connection troubles. Do you 
>> get this immediately or in the middle of a communication?
>>
>>  Since it's a very strange situation, I'd suggest you set the 
>> client side debug level to 3 and send everything to me. The 
>> server side log (after having enabled it, of course) will also be useful.
>>
>>  The version included in root4 is rather old, but well known and 
>> tested. I am hoping that everybody will switch to the newer one 
>> asap.... at least I will no longer be dealing with N 
>> versions of the same code....
>>
>> Fabrizio
>>
>>
>>
>> Jerome LAURET wrote:
>>
>>>     Has anyone experienced this kind of issue and, if so, what
>>> can be done to resolve it?
>>>
>>>> Error in <DoHandShake>: Error reading 4 bytes from the server 
>>>> [rcas6132.rcf.bnl.gov:1095].
>>>> Info in <GetAccessToSrv>: HandShake failed with server 
>>>> [rcas6132.rcf.bnl.gov:1095].
>>>> Error in <TXNetFile::CreateTXNf>: Access to server failed (0)
>>>> Error in <Disconnect>: Destroying nonexistent logconnid 0.
>>>> Error in <DoHandShake>: Error reading 4 bytes from the server 
>>>> [rcas6132.rcf.bnl.gov:1095].
>>>> Info in <GetAccessToSrv>: HandShake failed with server 
>>>> [rcas6132.rcf.bnl.gov:1095].
>>>> Error in <TXNetFile::CreateTXNf>: Access to server failed (0)
>>>> Error in <Disconnect>: Destroying nonexistent logconnid 0. 
>>>
>>>
>>>
>>>
>>>     Thank you,
>>>
>