There are no explicit timeouts on the server side. If the client is tardy,
it is scheduled at an effectively lower priority; that hasn't changed
since day one. A "connection reset by peer" simply means that the client
closed the socket for no apparent reason. I suspect that the client
can't get through to the server, or that the I/O operations are taking an
excessively long time to complete, but that would be a stretch. The only
time we've been able to force that is when we completely overload the disk
drive (low CPU usage but very high latency). You can see that via iostat.
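If you want to watch for the disk-overload case without eyeballing iostat,
the per-request latency (iostat's "await" column) can be derived from two
snapshots of /proc/diskstats. This is a minimal sketch, assuming a Linux 2.6
or later server that exposes /proc/diskstats. The helper names, the device
name and the sampling interval are hypothetical and have nothing to do with
xrootd itself.

```python
from __future__ import annotations

import time
from dataclasses import dataclass


@dataclass
class DiskSample:
    """Counters for one device taken from a single /proc/diskstats line."""
    reads: int     # field 4: reads completed
    read_ms: int   # field 7: total time spent reading (ms)
    writes: int    # field 8: writes completed
    write_ms: int  # field 11: total time spent writing (ms)


def parse_diskstats_line(line: str) -> tuple[str, DiskSample]:
    """Split one /proc/diskstats line into (device name, counters)."""
    f = line.split()
    # f[0] = major, f[1] = minor, f[2] = device name, counters follow.
    return f[2], DiskSample(reads=int(f[3]), read_ms=int(f[6]),
                            writes=int(f[7]), write_ms=int(f[10]))


def await_ms(before: DiskSample, after: DiskSample) -> float:
    """Average time per completed I/O between two samples, in ms."""
    ios = (after.reads - before.reads) + (after.writes - before.writes)
    if ios == 0:
        return 0.0  # no I/O completed in the interval
    busy = (after.read_ms - before.read_ms) + (after.write_ms - before.write_ms)
    return busy / ios


def read_sample(device: str, path: str = "/proc/diskstats") -> DiskSample:
    """Return the current counters for `device` (e.g. a hypothetical "sda")."""
    with open(path) as fh:
        for line in fh:
            name, sample = parse_diskstats_line(line)
            if name == device:
                return sample
    raise KeyError(f"device {device!r} not found in {path}")


def watch(device: str, interval: float = 5.0) -> None:
    """Print the average I/O latency for `device` every `interval` seconds."""
    prev = read_sample(device)
    while True:
        time.sleep(interval)
        cur = read_sample(device)
        print(f"{device}: await {await_ms(prev, cur):.1f} ms")
        prev = cur
```

Calling watch("sda") on the data server while the failing transfers run
should show whether latency climbs into the hundreds of milliseconds while
CPU usage stays low, which is the overload pattern described above.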
On Fri, 19 Oct 2007, Brew, CAJ (Chris) wrote:
> Hi Fabrizio,
> There are no other errors in the xroot log.
> The server does not appear heavily loaded, though there may well be congestion at
> the switch level at the moment.
> The servers are still running quite an old (read: very old) version of the
> xrootd server software. The first client I tried was one from March
> this year, but then I switched to the same release as the server in case
> the cause was a protocol mismatch. Could the timeouts be on the server
> side if it's running an old version?
>> -----Original Message-----
>> From: Fabrizio Furano [mailto:[log in to unmask]]
>> Sent: 19 October 2007 16:32
>> To: Brew, CAJ (Chris)
>> Cc: [log in to unmask]
>> Subject: Re: Failed xrdcp writes
>> Hi Chris,
>> Return code 11 means that the client was not able to write a chunk,
>> either after retrying several times (if it was a timeout or network
>> problem) or after receiving an explicit error from the server.
>> Even if you have the server log at a low level, you should be able to
>> spot whether there are complaints before the line you quoted (which
>> states that xrdcp gave up).
>> If not, then the cause might be a temporary (but quite long) network
>> problem or a heavy overload of the server-side disks. I also remember
>> that XrdClient, and hence xrdcp, had much shorter timeouts a long time
>> ago (about a year), and hence was more prone to this kind of trouble.
>> Are you using a reasonably recent version of it?
>> Brew, CAJ (Chris) wrote:
>>> I'm experimenting with using xrdcp for the babar skimming at RAL, but
>>> some of the writes to the xrootd server fail.
>>> The only clue in the client logs I can find is:
>>> xrdcp failed with exit code 11
>>> though there may be more being eaten by the wrappers.
>>> On the server side I see:
>>> XrdLink: Unable to receive from babar004.1969:[log in to unmask];
>>> connection reset by peer
>>> This doesn't happen for all transfers, but a significant minority fail
>>> this way.
>>> Any ideas where I should be looking for the problem?
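Fabrizio's description of exit code 11 (keep retrying a chunk on timeouts
and network errors, but give up at once on an explicit server error) can be
sketched as the policy below. This is a hypothetical illustration of that
behaviour, not XrdClient's actual code: the function and exception names,
the retry count and the per-attempt timeout are all assumptions, and `send`
stands in for whatever actually pushes a chunk to the server.

```python
from typing import Callable

# Exit status that, per the thread, xrdcp reports when a chunk cannot be written.
EXIT_WRITE_FAILED = 11


class ServerError(Exception):
    """Hypothetical: an explicit error returned by the server; retrying won't help."""


def write_chunk_with_retries(send: Callable[[bytes, float], None],
                             chunk: bytes,
                             max_attempts: int = 5,
                             timeout_s: float = 60.0) -> int:
    """Try to send one chunk; return 0 on success or EXIT_WRITE_FAILED."""
    for _ in range(max_attempts):
        try:
            send(chunk, timeout_s)
            return 0
        except ServerError:
            # The server refused the write: give up immediately.
            return EXIT_WRITE_FAILED
        except (TimeoutError, ConnectionError):
            # Timeout or network trouble (ConnectionResetError is a
            # ConnectionError): assume it is transient and try again.
            continue
    # Every attempt timed out or hit a network error.
    return EXIT_WRITE_FAILED
```

With a small timeout_s, a slow but healthy disk exhausts max_attempts much
sooner, which matches Fabrizio's point that older clients with shorter
timeouts were more prone to this failure.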