Yes, it's "RequestTimeout". I remember that Remi set it to more than 15 minutes in the skimming scripts (or somewhere similar). Really ugly, but it's the only option right now.

Fabrizio

On Wednesday 14 September 2005 04:16 am, Andrew Hanushevsky wrote:
> Hi Wilko,
>
> The timeout is set to 60 seconds. There is an application-settable env
> setting that will change it to whatever you want (I forget what it is called,
> but Fabrizio can immediately say what it is).
>
> Andy
>
> P.S. Yes, Andreas would see the same problem.
>
> On Tue, 13 Sep 2005, Wilko Kroeger wrote:
> > Hello Andy,
> >
> >
> > I forgot to mention that Andreas (GridK) saw this problem with xrdcp, but
> > I guess that if a data server is very busy and takes longer than the
> > timeout to respond, one would get the same error.
> >
> > Do you or Fabrizio know what the timeout is and whether it can be modified
> > (for xrdcp and the perl client)?
> >
> > Cheers,
> > wilko
> >
> > On Tue, 13 Sep 2005, Andrew Hanushevsky wrote:
> > > Hi Wilko,
> > >
> > > This is the standard problem that we noticed with long-running
> > > commands. The client times out and retries. However, it cannot
> > > establish a new session because the previous session is still running
> > > the cksum command. Short of doing some call-back scheme or idling the
> > > client while the command is running, we don't have an immediate
> > > solution other than increasing the timeout.
> > >
> > > Andy
> > >
> > > On Tue, 13 Sep 2005, Wilko Kroeger wrote:
> > > > Hello
> > > >
> > > > While testing xrootd I was using the perl client admin lib to obtain
> > > > the checksum of a file. Many clients were accessing the data server
> > > > (no redirector was used) in parallel, and each client was looping to
> > > > obtain a checksum.
> > > >
> > > > Loop:
> > > >   XrdInitialize
> > > >   XrdGetChecksum
> > > >   XrdTerminate
> > > >
> > > >
> > > > Very seldom I see the case where the client prints:
> > > >
> > > > 050912 14:49:53 001 Xrd: ReadPartialAnswer: Error reading msg from connmgr (server [datadevsol04.slac.stanford.edu:2094]).
> > > > 050912 14:49:53 001 Xrd: HandleServerError: Communication error with server [datadevsol04.slac.stanford.edu:2094]. Rebouncing here.
> > > > 050912 14:49:53 001 Xrd: XrdClientConn::Endsess: Server [datadevsol04.slac.stanford.edu:2094] did not return OK message for last request.
> > > > 050912 14:49:53 001 Xrd: SendGenCommand: Server declared error 3006:session is active
> > > >
> > > > I believe that the client still receives the correct checksum, but it
> > > > is hard to test as the problem is very rare.
> > > >
> > > >
> > > > The data server log file shows (lines that belong to other clients
> > > > have been cut out):
> > > >
> > > > 050912 14:44:55 039 wilko.9456:87@kama XrootdProtocol: 1b00 req=3001 dlen=21
> > > >
> > > > 050912 14:49:53 001 XrdInet: Accepted connection from kama.slac.stanford.edu
> > > > 050912 14:49:53 023 XrdSched: running ?:47@kama inq=0
> > > > 050912 14:49:53 023 XrdProtocol: matched protocol xrootd
> > > > 050912 14:49:53 023 ?:47@kama XrdPoll: FD 47 attached to poller 1; num=14
> > > > 050912 14:49:53 023 ?:47@kama XrootdProtocol: 1b00 req=3007 dlen=0
> > > > 050912 14:49:53 023 wilko.9456:47@kama XrootdResponse: 1b00 sending 16 data bytes; status=0
> > > > 050912 14:49:53 023 XrootdXeq: wilko.9456:47@kama login
> > > >
> > > > 050912 14:50:10 039 XrootdXeq: wilko.9456:87@kama disc 0:14:12
> > > > 050912 14:50:10 039 wilko.9456:87@kama XrdPoll: FD 87 detached from poller 2; num=12
> > > >
> > > >
> > > > The first line (14:44:55) is a client checksum request, but the server
> > > > is not returning the answer, as there is no corresponding line:
> > > > ... XrootdResponse: 1a00 sending 16 data bytes; status=0
> > > >
> > > > It looks like the client then establishes a new connection,
> > > > wilko.9456:47 (the old one was wilko.9456:87).
> > > >
> > > >
> > > > As I said, I can't easily reproduce this problem, but the same message
> > > > was observed during skimming at GridK. In GridK's case the same
> > > > message repeats every 5 minutes until, after about 50 minutes, the
> > > > client aborts because of too many communication errors.
> > > > This problem has been reported in:
> > > > http://babar-hn.slac.stanford.edu:5090/HyperNews/get/SkimSOS/1867.html
> > > >
> > > > GridK is using xrootd version 20050623-0016, whereas my tests were
> > > > done with the xrootd HEAD as of Sep. 9th. In both cases the data
> > > > server was heavily loaded.
> > > >
> > > > Any ideas?
> > > >
> > > > Cheers,
> > > > wilko
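The failure mode Andy describes in this thread has three parts. The client times out, it reconnects, and the new login is refused with "session is active" because the old session is still running the checksum. That interaction can be captured in a small toy model. The Python sketch below is purely illustrative. `simulate`, `Outcome` and all parameters are hypothetical names, the timing rules are simplifying assumptions, and nothing here uses the real xrootd client API. It shows why raising the timeout above the checksum time (as with the RequestTimeout setting mentioned above) makes the errors disappear.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    completed: bool  # True if the client got past the busy session
    errors: int      # timeouts that ended in a rejected reconnect ("session is active")


def simulate(cksum_seconds: float, timeout_seconds: float, max_errors: int) -> Outcome:
    """Toy model of one checksum request against a busy data server.

    Hypothetical assumptions, not taken from the xrootd sources:
    - the checksum keeps the original session busy for `cksum_seconds`;
    - the client waits `timeout_seconds` for a reply, then reconnects;
    - a reconnect is rejected while the old session is still busy;
    - the client aborts after `max_errors` such communication errors.
    """
    if cksum_seconds <= timeout_seconds:
        return Outcome(completed=True, errors=0)  # reply arrived before the timeout

    errors = 0
    t = timeout_seconds  # time of the first timeout and reconnect attempt
    while t < cksum_seconds:  # old session is still running the checksum
        errors += 1
        if errors >= max_errors:
            return Outcome(completed=False, errors=errors)  # client gives up
        t += timeout_seconds  # wait another timeout period, then retry
    return Outcome(completed=True, errors=errors)


if __name__ == "__main__":
    print(simulate(150, 60, 10))      # short timeout: two rejected reconnects
    print(simulate(150, 900, 10))     # timeout longer than the checksum: no errors
    print(simulate(10_000, 60, 10))   # very slow server: the client aborts
```

In this model the number of rejected reconnects is roughly the checksum time divided by the timeout, so a longer timeout trades slower failure detection for fewer spurious "session is active" errors. A call-back scheme, as Andy suggests, would remove the trade-off entirely.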