Hi,

Jerome LAURET wrote:
>
>     Yes, it may be an unresponsive server but the end
> product is a user job crashing according to our users. Some
> of those connections (later retried) would lead to success.
> We can switch to level 3 and see as soon as we can design a
> large scale test: we also have flukes with authentication
> in general; I have not reported this yet (I believe it to be
> LDAP related as changing the LDAP setup changed failure rates
> from 50% to 3% ... but I cannot drop below the 3% failures).
>
>     In general, could someone indicate what is the most
> reliable setting for Xrootd to retry upon failures (regardless
> of delays this may cause) ??
>

That kind of failure has been treated as exceptional and fatal for a
long time, so the policy was to abort. Newer versions are supposed to
extend the retry mechanisms to that circumstance as well, but I believe
that is not your case.

From your answer I understand that the problem does not happen every
time. The retry mechanism could patch the problem, but the main issue is
that your server machine seems unable to receive/send the very first
bytes after a connection is established.

The initial idea was that a machine which had just accepted a connection
was supposed to be able to handle a transfer of 10-20 bytes, but we were
wrong, and fixed that in a later release. This is very hard to reproduce
and debug, though.

The log could help, but since you get the error only a few times, you
will get an enormous amount of log output to document the trouble. If
that is a problem, set the log level to 2 instead of 3.

Personally I am not aware of interactions with LDAP which can interfere
with the handshake.

Fabrizio

> Thanks,
>
> Fabrizio Furano wrote:
>
>> Hi Jerome.
>>
>> This almost never happens, and makes me think about an unresponsive
>> server. But it may be caused by weird connection troubles. Do you get
>> this immediately or in the middle of a communication?
>>
>> Since it's a very strange situation, I'd suggest you set the
>> client side debug level to 3 and send everything to me. Also the
>> server side log (after having enabled it, of course) will be useful.
>>
>> The version included in root4 is rather old, but well known and
>> tested. I am hoping that everybody will switch to the newer one
>> asap... at least I will no longer be dealing with N
>> versions of the same code...
>>
>> Fabrizio
>>
>>
>>
>> Jerome LAURET wrote:
>>
>>> Has anyone experienced this kind of issue and if so, what
>>> to do to resolve it ??
>>>
>>>> Error in <DoHandShake>: Error reading 4 bytes from the server
>>>> [rcas6132.rcf.bnl.gov:1095].
>>>> Info in <GetAccessToSrv>: HandShake failed with server
>>>> [rcas6132.rcf.bnl.gov:1095].
>>>> Error in <TXNetFile::CreateTXNf>: Access to server failed (0)
>>>> Error in <Disconnect>: Destroying nonexistent logconnid 0.
>>>> Error in <DoHandShake>: Error reading 4 bytes from the server
>>>> [rcas6132.rcf.bnl.gov:1095].
>>>> Info in <GetAccessToSrv>: HandShake failed with server
>>>> [rcas6132.rcf.bnl.gov:1095].
>>>> Error in <TXNetFile::CreateTXNf>: Access to server failed (0)
>>>> Error in <Disconnect>: Destroying nonexistent logconnid 0.
>>>
>>>
>>>
>>>
>>> Thank you,
>>>
>
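
For illustration only, the retry-on-handshake-failure idea discussed in this thread might look roughly like the sketch below. It is not the Xrootd client code: the function names, the 20-byte placeholder hello message, the attempt count and the delays are all assumptions made up for the example. It only shows the general pattern of repeating the whole connect + first-bytes exchange instead of aborting on the first failed 4-byte read.

```python
import socket
import time


def read_exact(sock, n):
    """Read exactly n bytes, or raise ConnectionError if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError(f"peer closed after {len(buf)} of {n} bytes")
        buf += chunk
    return buf


def handshake_once(host, port, hello, timeout):
    """Connect, send the hello bytes (a placeholder here), read a 4-byte reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(hello)
        return read_exact(sock, 4)


def handshake_with_retry(host, port, hello=b"\x00" * 20, attempts=5,
                         timeout=10.0, delay=1.0, attempt_fn=handshake_once):
    """Retry the whole connect + handshake sequence, doubling the delay each time.

    All defaults are hypothetical values chosen for the example.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return attempt_fn(host, port, hello, timeout)
        except OSError as err:  # covers timeouts, resets and ConnectionError
            last_error = err
            if attempt < attempts:
                time.sleep(delay)
                delay *= 2
    raise ConnectionError(
        f"handshake failed after {attempts} attempts") from last_error
```

With something like this, a call such as handshake_with_retry("rcas6132.rcf.bnl.gov", 1095) would only give up after several spaced attempts, at the cost of the extra delay mentioned above. It hides the symptom rather than fixing a server that cannot answer right after accepting the connection.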