	Let's address this from a different angle: in
which version did you "fix" those issues, or in which version
do you know those "issues" to have been fixed (the send/receive
issue as a start)?

- xrd Client: is that version compatible with the ROOT 4.04.02
   production version?

- Server: which version is it? We can re-deploy (with the STAR
   patches for LFN and PFN support included) and re-test
   faster than we can trace infrastructure not under
   our control ...

	Thank you,


Fabrizio Furano wrote:
> Hi,
> 
> Jerome LAURET wrote:
> 
>>
>>     Yes, it may be an unresponsive server but the end
>> product is a user job crashing according to our users. Some
>> of those connections (later retried) would lead to success.
>> We can switch to level 3 and see as soon as we can design a
>> large scale test. We also have flukes with authentication
>> in general, which I have not reported yet (I believe them to be
>> LDAP related, as changing the LDAP setup changed failure rates
>> from 50% to 3% ... but I cannot drop below the 3% failures).
>>
>>     In general, could someone indicate the most
>> reliable setting for Xrootd to retry upon failures (regardless
>> of the delays this may cause)?
>>
> 
>  That kind of failure has been treated as exceptional and fatal for a 
> long time. So the policy was to abort. Newer versions are supposed to 
> extend the retry mechanisms also to that circumstance. But I believe 
> that it's not your case.
> 
>  From your answer I understand that the problem does not happen every 
> time. The retry mechanism could patch the problem, but the main issue is 
> that your server machine seems unable to receive/send the very first 
> bytes after the establishment of a connection. The initial idea was that 
> a machine which just accepted a connection was supposed to be able to 
> handle a transfer of 10-20 bytes, but we were wrong, and fixed that in a 
> later release, although this is very hard to reproduce/debug.
> 
>  The log could help, but since you get the error only a few times, you 
> will get an enormous amount of log output to document the trouble. If 
> this is the case, set the log level to 2 instead of 3.
> 
>  Personally I am not aware of interactions with LDAP which can interfere 
> with the handshake.
> 
> Fabrizio
> 
>>     Thanks,
>>
>> Fabrizio Furano wrote:
>>
>>> Hi Jerome.
>>>
>>>  This almost never happens, and makes me think of an unresponsive 
>>> server, but it may also be caused by weird connection troubles. Do 
>>> you get this immediately or in the middle of a communication?
>>>
>>>  Since it's a very strange situation, I'd suggest you set the 
>>> client side debug level to 3 and send everything to me. Also the 
>>> server side log (after having enabled it, of course) will be useful.
>>>
>>>  The version included in root4 is rather old, but well known and 
>>> tested. I am hoping that everybody will switch to the newer one 
>>> asap ... at least I will no longer be dealing with N versions of 
>>> the same code ...
>>>
>>> Fabrizio
>>>
>>>
>>>
>>> Jerome LAURET wrote:
>>>
>>>>     Has anyone experienced this kind of issue and, if so, what
>>>> can be done to resolve it?
>>>>
>>>>> Error in <DoHandShake>: Error reading 4 bytes from the server 
>>>>> [rcas6132.rcf.bnl.gov:1095].
>>>>> Info in <GetAccessToSrv>: HandShake failed with server 
>>>>> [rcas6132.rcf.bnl.gov:1095].
>>>>> Error in <TXNetFile::CreateTXNf>: Access to server failed (0)
>>>>> Error in <Disconnect>: Destroying nonexistent logconnid 0.
>>>>> Error in <DoHandShake>: Error reading 4 bytes from the server 
>>>>> [rcas6132.rcf.bnl.gov:1095].
>>>>> Info in <GetAccessToSrv>: HandShake failed with server 
>>>>> [rcas6132.rcf.bnl.gov:1095].
>>>>> Error in <TXNetFile::CreateTXNf>: Access to server failed (0)
>>>>> Error in <Disconnect>: Destroying nonexistent logconnid 0. 
>>>>
>>>>     Thank you,
>>>>
>>

-- 
              ,,,,,
             ( o o )
          --m---U---m--
              Jerome