Print

Print


Hi,

  You are right, sigh. I found a silly bug in XrdClientConn.cc which 
caused for sure that bug. I just committed the fix for the typo. Instead 
of clearing only the kXR_refresh bit after a server requested command 
repetition, it cleared all the options field. This is consistent with 
what we see.

Fabrizio

Andrew Hanushevsky wrote:
> Hi Wilko,
> 
> OK, there seems to be a somewhat strange conflict in the client. The first
> request is to create a file. That is handled correctly (the client is told
> to wait). However, when the client comes back, it tries to (apparently)
> open the file in read mode, not create mode. That fails because the file
> doesn't exist. That's also a correct response. Why is the client retrying
> in read mode? Fabrizio?
> 
> The reason it works the second time is that the file lookup has occured.
> So, the first request for a create binds correctly to a server that can
> create a file. The real question is what is the client trying to do here?
> 
> Andy
> 
> On Fri, 17 Dec 2004, Wilko Kroeger wrote:
> 
> 
>>Hello Andy
>>
>>I restarted the olbds with -d option and run the test again.
>>
>>
>>>xrdcp -d2  dummy.root  root://bbr-test-xfer1:2094////prod/s13/wk.root
>>
>>The log files are:
>>
>># ----  olbdlog from redirector:  (first time running xrdcp)
>>041217 13:06:25 010 Receive From bbr-test-xfer1.slac.stanford.edu: 0 select  c /prod/s13/wk.root
>>041217 13:06:25 010 do_Select Lookup delay bbr-test-xfer1.slac.stanford.edu 5
>>041217 13:06:30 010 Receive From bbr-test-xfer1.slac.stanford.edu: 1 select  r /prod/s13/wk.root
>>041217 13:06:30 010 do_Select No servers available to r /prod/s13/wk.root
>>
>># ----  olbdlog from oprserv04 (data server)
>>041217 13:06:25 001 Receive From bbr-test-xfer1.slac.stanford.edu: 7@0  state /prod/s13/wk.root
>>
>># ----  olbdlog from oprserv07 (data server)
>>041217 13:06:25 001 Receive From bbr-test-xfer1.slac.stanford.edu: 7@0  state /prod/s13/wk.root
>>
>>
>>running the command a second time:
>>
>>
>>>xrdcp -d2 dummy.root root://bbr-test-xfer1:2094////prod/s13/wk.root
>>
>># ----  olbdlog from redirector:  (second time running xrdcp)
>>
>>041217 13:11:20 010 Receive From bbr-test-xfer1.slac.stanford.edu: 2 select  c /prod/s13/wk.root
>>041217 13:11:20 010 SelServer Server oprserv07.slac.stanford.edu:2094 bound to /prod/s13/wk.root
>>041217 13:11:20 010 do_Select Redirect bbr-test-xfer1.slac.stanford.edu ->
>>       oprserv07.slac.stanford.edu:2094 for /prod/s13/wk.root
>>
>>There are no entries in the olbdlog of the two data servers.
>>
>>
>>
>>I also provided the entires in the xrdlog file (redirector) for the first
>>attempt to run xrdcp.
>>
>># ---- xrdlog redirector (first time running xrdcp)
>>041217 13:06:25 029 XrdInet: Accepted connection from noric13
>>041217 13:06:25 007 XrdSched: running 5@noric13 inq=0
>>041217 13:06:25 007 XrdProtocol: matched protocol xrootd
>>041217 13:06:25 007 ?:15@noric13 XrdPoll: FD 15 attached to poller 0; num=1
>>041217 13:06:25 007 ?:15@noric13 XrootdProtocol: 0000 req=3007 dlen=0
>>041217 13:06:25 007 wilko.18587:15@noric13 XrootdResponse: 0000 sending OK
>>041217 13:06:25 007 XrootdXeq: User logged in as wilko.18587:15@noric13
>>041217 13:06:25 007 wilko.18587:15@noric13 XrootdProtocol: 0000 req=3010 dlen=19
>>041217 13:06:25 007 wilko.18587:15@noric13 ofs_open: 100-1000000664 fn=/prod/s13/wk.root
>>041217 13:06:25 007 odc_Locate: user=wilko.18587:15@noric13 wait :5 by
>>  bbr-test-xfer1 path=/prod/s13/wk.root
>>041217 13:06:25 007 wilko.18587:15@noric13 XrootdProtocol: 0000 stalling
>>  client for 5 sec
>>041217 13:06:30 007 wilko.18587:15@noric13 XrootdProtocol: 0000 req=3010 dlen=19
>>041217 13:06:30 007 wilko.18587:15@noric13 ofs_open: 0-664 fn=/prod/s13/wk.root
>>041217 13:06:30 007 odc_Locate: user=wilko.18587:15@noric13 error: No
>>  servers are available to read the file.:0 by bbr-test-xfer1 path=/prod/s13/wk.root
>>041217 13:06:30 007 wilko.18587:15@noric13 XrootdResponse: 0000 sending
>>  err 3005: No servers are available to read the file.
>>041217 13:06:30 007 XrdLink: wilko.18587:15@noric13 disconnected after 0:00:05 (link read error)
>>041217 13:06:30 007 wilko.18587:15@noric13 XrdPoll: FD 15 detached from poller 0; num=0
>>
>>
>>I hope this helps
>>
>>Cheers,
>>   Wilko
>>
>>
>>
>>
>>On Fri, 17 Dec 2004, Andrew Hanushevsky wrote:
>>
>>
>>>Hi Wilko,
>>>
>>>Could you rerun the test with the olbd's started with the -d option and
>>>send me the log; thanks.
>>>
>>>Andy
>>>
>>>On Thu, 16 Dec 2004, Wilko Kroeger wrote:
>>>
>>>
>>>>
>>>>
>>>>Hello Andy and Fabrizio
>>>>
>>>>I tried to copy a local file into xrootd using the latest xrootd release,
>>>>20041216-0815. I am using one redirector (bbr-test-xfer1) and two data
>>>>servers (oprser04, oprserv07).
>>>>Running xrdcp against a data server works fine, but running it against
>>>>the redirector doesn't work the way it should.
>>>>
>>>>Running xrdcp the first time fails with the error:
>>>>
>>>> 041216 22:32:25 001 Xrd: SendGenCommand Server declared error 3005:No servers
>>>>        are available to read the file.
>>>>
>>>>Running the same xrdcp command a second time the client is redirected to a
>>>>data server and the file is transfered.
>>>>Below is the output from the client and the relevant part from the redirector
>>>>or data server (if aplicable) log files. The output is quiet verbose
>>>>because in the xrootd.cf the 'trace: all' option was used for:
>>>>xrootd,xrd,olb,ofs,oss
>>>>
>>>>Any idea ?
>>>>
>>>>Thanks,
>>>>  Wilko
>>>>
>>>>
>>>>#
>>>>#  first attempt to copy a local file into xrootd
>>>>#
>>>>
>>>>
>>>>>xrdcp -d2 dummy.root  root://bbr-test-xfer1:2094////prod/s5/wk.root
>>>>
>>>># ------------ xrdcp output
>>>>
>>>>041216 22:32:20 001 Xrd: Create Access to server granted.
>>>>041216 22:32:20 001 Xrd: Create Opening the remote file ///prod/s5/wk.root
>>>>041216 22:32:20 001 Xrd: SendGenCommand Sending command Open
>>>>041216 22:32:20 001 Xrd: ReadPartialAnswer Reading a XrdClientMessage from the server
>>>>       [bbr-test-xfer1.slac.stanford.edu:2094]...
>>>>041216 22:32:20 001 Xrd: ReadPartialAnswer Server [bbr-test-xfer1.slac.stanford.edu:2094]
>>>>       did not answer OK. Resp status is [kXR_wait]
>>>>041216 22:32:20 001 Xrd: SendGenCommand Server [bbr-test-xfer1.slac.stanford.edu:2094]
>>>>       requested 5 seconds of wait
>>>>041216 22:32:25 001 Xrd: SendGenCommand Sending command Open
>>>>041216 22:32:25 001 Xrd: ReadPartialAnswer Reading a XrdClientMessage from the server
>>>>       [bbr-test-xfer1.slac.stanford.edu:2094]...
>>>>041216 22:32:25 001 Xrd: ReadPartialAnswer Server [bbr-test-xfer1.slac.stanford.edu:2094]
>>>>       did not answer OK. Resp status is [kXR_error]
>>>>041216 22:32:25 001 Xrd: Open Server [bbr-test-xfer1.slac.stanford.edu:2094] did not
>>>>       return OK message for last request.
>>>>041216 22:32:25 001 Xrd: SendGenCommand Server declared error 3005:No servers are available
>>>>       to read the file.
>>>>041216 22:32:25 001 Xrd: Create Error opening the file ///prod/s5/wk.root on host
>>>>       bbr-test-xfer1:2094
>>>>041216 22:32:25 001 Xrd: Close File not opened.
>>>>Caching info: MissRate=0 MissCount=0 ReadsCounter=0
>>>>Caching info: BytesUsefulness=0 BytesSubmitted=0 BytesHit=0
>>>>Error accessing path/file for root://bbr-test-xfer1:2094////prod/s5/wk.root
>>>>
>>>># ----------------------- Redirector log file
>>>>
>>>>041216 22:32:20 029 XrdInet: Accepted connection from noric06
>>>>041216 22:32:20 008 XrdSched: running 6@noric06 inq=0
>>>>041216 22:32:20 008 XrdProtocol: matched protocol xrootd
>>>>041216 22:32:20 008 ?:16@noric06 XrdPoll: FD 16 attached to poller 0; num=1
>>>>041216 22:32:20 008 ?:16@noric06 XrootdProtocol: 0000 req=3007 dlen=0
>>>>041216 22:32:20 008 wilko.25211:16@noric06 XrootdResponse: 0000 sending OK
>>>>041216 22:32:20 008 XrootdXeq: User logged in as wilko.25211:16@noric06
>>>>041216 22:32:20 008 wilko.25211:16@noric06 XrootdProtocol: 0000 req=3010 dlen=18
>>>>041216 22:32:20 008 wilko.25211:16@noric06 ofs_open: 100-1000000664 fn=/prod/s5/wk.root
>>>>041216 22:32:20 008 odc_Locate: user=wilko.25211:16@noric06 wait :5 by bbr-test-xfer1
>>>>       path=/prod/s5/wk.root
>>>>041216 22:32:20 008 wilko.25211:16@noric06 XrootdProtocol: 0000 stalling client for 5 sec
>>>>041216 22:32:25 008 wilko.25211:16@noric06 XrootdProtocol: 0000 request timeout;
>>>>       read 0 of 24 bytes
>>>>041216 22:32:25 023 XrdPoll: Poller 0 enable fd 16
>>>>041216 22:32:25 009 XrdSched: running wilko.25211:16@noric06 inq=0
>>>>041216 22:32:25 009 wilko.25211:16@noric06 XrootdProtocol: 0000 req=3010 dlen=18
>>>>041216 22:32:25 009 wilko.25211:16@noric06 ofs_open: 0-664 fn=/prod/s5/wk.root
>>>>041216 22:32:25 009 odc_Locate: user=wilko.25211:16@noric06 error: No servers are available to
>>>>       read the file.:0 by bbr-test-xfer1 path=/prod/s5/wk.root
>>>>041216 22:32:25 009 wilko.25211:16@noric06 XrootdResponse: 0000 sending err 3005:
>>>>       No servers are available to read the file.
>>>>041216 22:32:25 009 XrdLink: wilko.25211:16@noric06 disconnected after 0:00:05 (link read error)
>>>>
>>>>
>>>>#
>>>># 2nd attempt
>>>>#
>>>>
>>>>
>>>>>xrdcp -d2 dummy.root  root://bbr-test-xfer1:2094////prod/s5/wk.root
>>>>
>>>># ------------ xrdcp output
>>>>
>>>>041216 22:36:08 001 Xrd: Create Access to server granted.
>>>>041216 22:36:08 001 Xrd: Create Opening the remote file ///prod/s5/wk.root
>>>>041216 22:36:08 001 Xrd: SendGenCommand Sending command Open
>>>>041216 22:36:08 001 Xrd: ReadPartialAnswer Reading a XrdClientMessage from the server
>>>>       [bbr-test-xfer1.slac.stanford.edu:2094]...
>>>>041216 22:36:08 001 Xrd: ReadPartialAnswer Server [bbr-test-xfer1.slac.stanford.edu:2094]
>>>>       did not answer OK. Resp status is [kXR_redirect]
>>>>041216 22:36:08 001 Xrd: HandleServerError Redir count=1
>>>>041216 22:36:08 001 Xrd: HandleServerError Received redirection to
>>>>       [oprserv04.slac.stanford.edu:2094]. Token=[].
>>>>041216 22:36:08 001 Xrd: GetDomainToMatch GetHostName(oprserv04.slac.stanford.edu) returned
>>>>       name=oprserv04.slac.stanford.edu
>>>>041216 22:36:08 001 Xrd: GetDomainToMatch GetDomain(oprserv04.slac.stanford.edu) -->
>>>>       slac.stanford.edu
>>>>041216 22:36:08 001 Xrd: CheckHostDomain Resolved [oprserv04.slac.stanford.edu]'s domain
>>>>       name into [slac.stanford.edu]
>>>>041216 22:36:08 001 Xrd: CheckHostDomain Access granted to the domain of
>>>>       [oprserv04.slac.stanford.edu].
>>>>041216 22:36:08 001 Xrd: Connect Creating a logical connection...
>>>>041216 22:36:08 001 Xrd: Connect Physical connection not found. Creating a new one...
>>>>041216 22:36:08 001 Xrd: Connect Connecting to [oprserv04.slac.stanford.edu:2094]
>>>>041216 22:36:08 001 Xrd: ClientSock::TryConnect Trying to connect tooprserv04.slac.stanford.edu(
>>>>       134.79.85.44):2094 Timeout=60
>>>>041216 22:36:08 001 Xrd: Connect Connected to [oprserv04.slac.stanford.edu:2094]
>>>>041216 22:36:08 001 Xrd: Connect New physical connection to server
>>>>       oprserv04.slac.stanford.edu:2094 succesfully created.
>>>>041216 22:36:08 001 Xrd: Connect LogConn: size:2 count: 1PhyConn: size:2 count: 2
>>>>041216 22:36:08 001 Xrd: Connect Connect(oprserv04.slac.stanford.edu, 2094) returned 1
>>>>041216 22:36:08 001 Xrd: DoHandShake HandShake step 1: Sending 20 bytes to the server
>>>>       [oprserv04.slac.stanford.edu:2094]
>>>>041216 22:36:08 001 Xrd: DoHandShake HandShake step 2: Reading 4 bytes from server
>>>>       [oprserv04.slac.stanford.edu:2094].
>>>>041216 22:36:08 001 Xrd: DoHandShake HandShake step 3: Reading 12 bytes from server
>>>>       [oprserv04.slac.stanford.edu:2094].
>>>>041216 22:36:08 001 Xrd: StartReader Starting reader thread...
>>>>041216 22:36:08 000 Xrd: SocketReaderThread Reader Thread starting.
>>>>041216 22:36:08 001 Xrd: GetAccessToSrv Ok, the server on [oprserv04.slac.stanford.edu:2094]
>>>>       is an  xrootd data server.
>>>>041216 22:36:08 001 Xrd: DoLogin Logging into the server [oprserv04.slac.stanford.edu:2094].
>>>>       pid=25233 uid=wilko
>>>>041216 22:36:08 001 Xrd: SendGenCommand Sending command XTNetconn::doLogin
>>>>041216 22:36:08 001 Xrd: ReadPartialAnswer Reading a XrdClientMessage from the server
>>>>       [oprserv04.slac.stanford.edu:2094]...
>>>>041216 22:36:08 001 Xrd: SendGenCommand Communication error detected with
>>>>       [oprserv04.slac.stanford.edu:2094
>>>>041216 22:36:08 001 Xrd: SendGenCommand Sending command Open
>>>>041216 22:36:08 001 Xrd: ReadPartialAnswer Reading a XrdClientMessage from the server
>>>>       [oprserv04.slac.stanford.edu:2094]...
>>>>041216 22:36:08 001 Xrd: Create File opened succesfully.
>>>>041216 22:36:08 001 Xrd: main dummy.root --> root://bbr-test-xfer1:2094////prod/s5/wk.root
>>>>041216 22:36:08 25233 Xrd: ReaderThread_loc Reader Thread starting.
>>>>041216 22:36:08 001 Xrd: SendGenCommand Sending command Write
>>>>041216 22:36:08 001 Xrd: ReadPartialAnswer Reading a XrdClientMessage from the server
>>>>       [oprserv04.slac.stanford.edu:2094]...
>>>>041216 22:36:08 001 Xrd: SendGenCommand Sending command Close
>>>>041216 22:36:08 001 Xrd: ReadPartialAnswer Reading a XrdClientMessage from the server
>>>>       [oprserv04.slac.stanford.edu:2094]...
>>>>Caching info: MissRate=0 MissCount=0 ReadsCounter=0
>>>>Caching info: BytesUsefulness=0 BytesSubmitted=0 BytesHit=0
>>>>
>>>>
>>>># ------------------ Redirector bbr-test-xfer1 log file
>>>>
>>>>
>>>>041216 22:36:08 029 XrdInet: Accepted connection from noric06
>>>>041216 22:36:08 010 XrdSched: running 7@noric06 inq=0
>>>>041216 22:36:08 010 XrdProtocol: matched protocol xrootd
>>>>041216 22:36:08 010 ?:17@noric06 XrdPoll: FD 17 attached to poller 0; num=1
>>>>041216 22:36:08 010 ?:17@noric06 XrootdProtocol: 0000 req=3007 dlen=0
>>>>041216 22:36:08 010 wilko.25233:17@noric06 XrootdResponse: 0000 sending OK
>>>>041216 22:36:08 010 XrootdXeq: User logged in as wilko.25233:17@noric06
>>>>041216 22:36:08 010 wilko.25233:17@noric06 XrootdProtocol: 0000 req=3010 dlen=18
>>>>041216 22:36:08 010 wilko.25233:17@noric06 ofs_open: 100-1000000664 fn=/prod/s5/wk.root
>>>>041216 22:36:08 010 odc_Locate: user=wilko.25233:17@noric06 redirected to
>>>>       oprserv04.slac.stanford.edu:2094 by bbr-test-xfer1 path=/prod/s5/wk.root
>>>>041216 22:36:08 010 wilko.25233:17@noric06 XrootdProtocol: 0000 redirecting to
>>>>       oprserv04.slac.stanford.edu:2094
>>>>041216 22:36:08 010 XrdLink: wilko.25233:17@noric06 disconnected after 0:00:00 (link read error)
>>>>
>>>>
>>>># ------------------ Data Server oprserv04  log file
>>>>
>>>>041216 22:36:08 029 XrdInet: Accepted connection from noric06
>>>>041216 22:36:08 007 XrdSched: running 5@noric06 inq=0
>>>>041216 22:36:08 007 XrdProtocol: matched protocol xrootd
>>>>041216 22:36:08 007 ?:15@noric06 XrdPoll: FD 15 attached to poller 0; num=1
>>>>041216 22:36:08 007 ?:15@noric06 XrootdProtocol: 0100 req=3007 dlen=0
>>>>041216 22:36:08 007 wilko.25233:15@noric06 XrootdResponse: 0100 sending OK
>>>>041216 22:36:08 007 XrootdXeq: User logged in as wilko.25233:15@noric06
>>>>041216 22:36:08 007 wilko.25233:15@noric06 XrootdProtocol: 0100 req=3010 dlen=18
>>>>041216 22:36:08 007 wilko.25233:15@noric06 ofs_open: 100-1000000664 fn=/prod/s5/wk.root
>>>>041216 22:36:08 007 oss_Alloc_Cache: public cache as /u1/kanga/cache0/%u1%kanga%prod%s5%wk.root
>>>>041216 22:36:08 007 wilko.25233:15@noric06 oss_Open_ufs: fd=8192 flags=2 mode=1000000664
>>>>       path=/u1/kanga/prod/s5/wk.root
>>>>041216 22:36:08 007 wilko.25233:15@noric06 ofs_stat:  pi=5 fn=/prod/s5/wk.root
>>>>041216 22:36:08 007 wilko.25233:15@noric06 XrootdProtocol: 0100 open nmfa /prod/s5/wk.root fh=0
>>>>041216 22:36:08 007 wilko.25233:15@noric06 XrootdResponse: 0100 sending 4 data bytes; rc=0
>>>>041216 22:36:08 007 wilko.25233:15@noric06 XrootdProtocol: 0100 req=3019 dlen=29
>>>>041216 22:36:08 007 wilko.25233:15@noric06 XrootdProtocol: 0100 fh=0 write 29@0
>>>>041216 22:36:08 007 wilko.25233:15@noric06 ofs_write: 29@0 pi=5 fn=/prod/s5/wk.root
>>>>041216 22:36:08 007 wilko.25233:15@noric06 XrootdResponse: 0100 sending OK
>>>>041216 22:36:08 007 wilko.25233:15@noric06 XrootdProtocol: 0100 req=3003 dlen=0
>>>>041216 22:36:08 007 wilko.25233:15@noric06 XrootdFile: closing w /prod/s5/wk.root
>>>>041216 22:36:08 007 wilko.25233:15@noric06 ofs_close: lnks=1 pi=5 fn=/prod/s5/wk.root
>>>>041216 22:36:08 007  ofs_Unopen: numfd=0 pi=5 fn=/prod/s5/wk.root
>>>>041216 22:36:08 007 wilko.25233:15@noric06 XrootdProtocol: 0100 close fh=0
>>>>041216 22:36:08 007 wilko.25233:15@noric06 XrootdResponse: 0100 sending OK
>>>>041216 22:36:08 007 XrdLink: wilko.25233:15@noric06 disconnected after 0:00:00 (link read error)
>>>>
>>>>
>>>