 Yes, it's "RequestTimeout".

 I remember that Remi set it at >15 minutes in the skimming scripts or 
whatever. Really ugly but it's the only option right now.

Fabrizio

On Wednesday 14 September 2005 04:16 am, Andrew Hanushevsky wrote:
> Hi Wilko,
>
> The timeout is set to 60 seconds. There is an application settable env
> name that will set it to whatever you want (I forget what it is called but
> Fabrizio can immediately say what it is).
>
> Andy
>
> P.S. Yes, Andreas would see the same problem.
>
> On Tue, 13 Sep 2005, Wilko Kroeger wrote:
> > Hello Andy
> >
> >
> > I forgot to mention that Andreas (GridK) saw this problem with xrdcp, but
> > I guess if a data server is very busy and the response of the server is
> > longer than the timeout one would get the same error.
> >
> > Do you or Fabrizio know what the timeout is and if it can be modified
> > (for xrdcp and the perl client)?
> >
> > Cheers,
> >    wilko
> >
> > On Tue, 13 Sep 2005, Andrew Hanushevsky wrote:
> > > Hi Wilko,
> > >
> > > This is the standard problem that we noticed with long-running
> > > commands. The client times out and retries. However, it cannot
> > > establish a new session because the previous session is still running
> > > the cksum command. Short of doing some call-back scheme or idling the
> > > client while the command is running, we don't have an immediate
> > > solution other than increasing the timeout.
> > >
> > > Andy
> > >
> > > On Tue, 13 Sep 2005, Wilko Kroeger wrote:
> > > > Hello
> > > >
> > > > While testing xrootd I was using the perl client admin lib to obtain
> > > > the checksum of a file. Many clients were accessing the data server
> > > > (no redirector was used) in parallel, and each client was looping to
> > > > obtain a checksum.
> > > >
> > > > Loop:
> > > >     XrdInitialize
> > > >     XrdGetChecksum
> > > >     XrdTerminate
> > > >
> > > >
> > > > Very seldom I see the case where the client prints:
> > > >
> > > > 050912 14:49:53 001 Xrd: ReadPartialAnswer: Error reading msg from connmgr (server [datadevsol04.slac.stanford.edu:2094]).
> > > > 050912 14:49:53 001 Xrd: HandleServerError: Communication error with server [datadevsol04.slac.stanford.edu:2094]. Rebouncing here.
> > > > 050912 14:49:53 001 Xrd: XrdClientConn::Endsess: Server [datadevsol04.slac.stanford.edu:2094] did not return OK message for last request.
> > > > 050912 14:49:53 001 Xrd: SendGenCommand: Server declared error 3006:session is active
> > > >
> > > > I believe that the client still receives the correct checksum, but it
> > > > is hard to test as the problem is very rare.
> > > >
> > > >
> > > > The data server log file shows (cut out lines that belong to
> > > > different clients):
> > > >
> > > > 050912 14:44:55 039 wilko.9456:87@kama XrootdProtocol: 1b00 req=3001 dlen=21
> > > >
> > > > 050912 14:49:53 001 XrdInet: Accepted connection from kama.slac.stanford.edu
> > > > 050912 14:49:53 023 XrdSched: running ?:47@kama inq=0
> > > > 050912 14:49:53 023 XrdProtocol: matched protocol xrootd
> > > > 050912 14:49:53 023 ?:47@kama XrdPoll: FD 47 attached to poller 1; num=14
> > > > 050912 14:49:53 023 ?:47@kama XrootdProtocol: 1b00 req=3007 dlen=0
> > > > 050912 14:49:53 023 wilko.9456:47@kama XrootdResponse: 1b00 sending 16 data bytes; status=0
> > > > 050912 14:49:53 023 XrootdXeq: wilko.9456:47@kama login
> > > >
> > > > 050912 14:50:10 039 XrootdXeq: wilko.9456:87@kama disc 0:14:12
> > > > 050912 14:50:10 039 wilko.9456:87@kama XrdPoll: FD 87 detached from poller 2; num=12
> > > >
> > > >
> > > > The first line (14:44:55) is a client checksum request, but the server
> > > > is not returning the answer as there is no corresponding line:
> > > >  ... XrootdResponse: 1a00 sending 16 data bytes; status=0
> > > >
> > > > It looks like the client then establishes a new connection,
> > > > wilko.9456:47 (the old one was wilko.9456:87).
> > > >
> > > >
> > > > As I said, I can't easily reproduce this problem, but during skimming
> > > > at GridK the same message was observed. In GridK's case the same
> > > > message repeats every 5 mins until, after about 50 mins, the
> > > > client aborts because of too many communication errors.
> > > > This problem has been reported in:
> > > > http://babar-hn.slac.stanford.edu:5090/HyperNews/get/SkimSOS/1867.html
> > > >
> > > > GridK is using xrootd version 20050623-0016, whereas my tests were
> > > > done with the xrootd HEAD as of Sep. 9th. In both cases the data
> > > > server was heavily loaded.
> > > >
> > > > Any ideas?
> > > >
> > > > Cheers,
> > > >    wilko