Hi,
We are seeing transfers to an Xcache endpoint fail with the error "Error response: numerical argument out of domain".
Oxford runs an Xcache, on 5.3.1, pointing at the RAL endpoint. While RAL was on 5.2.0 these errors did not appear, but in testing with 5.3.1 we see failures.
I've managed to reproduce the problem with a local Xcache instance, and also with an xrootd server using local storage (e.g. oss.localroot /xroot-test, and not using the XrdCeph plugin).
It seems to occur on the last read of the file, and for files whose size is not of the form 2**x, e.g. 64MiB seems to work OK, but 64MiB + 4 bytes fails.
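
For anyone wanting to reproduce this, a minimal sketch of generating such a pair of test files is below. The helper names and output file names are illustrative assumptions, not the ones used in the tests above; the files would still need to be placed under the server's storage and read back through the Xcache.

```python
import os

MIB = 1024 * 1024

def make_test_file(path, size, chunk=MIB):
    """Write `size` bytes of random data to `path`; return the size on disk."""
    remaining = size
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n))
            remaining -= n
    return os.path.getsize(path)

def is_power_of_two(n):
    """True if n is of the form 2**x for a non-negative integer x."""
    return n > 0 and (n & (n - 1)) == 0

if __name__ == "__main__":
    # Hypothetical file names: one power-of-two size, one 4 bytes larger.
    for name, size in [("test_64MiB", 64 * MIB), ("test_64MiB_plus4", 64 * MIB + 4)]:
        written = make_test_file(name, size)
        print(name, written, "power of two" if is_power_of_two(written) else "not a power of two")
```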

The flow is  Xcache <-  memcache proxy <- server
If I remove the memcache proxy (i.e. Xcache <- server), the problem disappears!


The client sees:
```
[2021-09-06 14:47:37.062940 +0000][Dump   ][XRootD            ] [ceph-dev-gw1.gridpp.rl.ac.uk:1094] Got a kXR_error response to request kXR_read (handle: 0x00000000, offset: 67108864, size: 33554432) [3019] Unable to read /root:/ceph-dev-gw1.gridpp.rl.ac.uk:1096/xrootd-test/test/jwalder/test_64MiBp1; numerical argument out of domain
[2021-09-06 14:47:37.063005 +0000][Debug  ][XRootD            ] [ceph-dev-gw1.gridpp.rl.ac.uk:1094] Handling error while processing kXR_read (handle: 0x00000000, offset: 67108864, size: 33554432): [ERROR] Error response: numerical argument out of domain.
[2021-09-06 14:47:37.063043 +0000][Debug  ][ExDbgMsg          ] [ceph-dev-gw1.gridpp.rl.ac.uk:1094] Calling MsgHandler: 0xef5370 (message: kXR_read (handle: 0x00000000, offset: 67108864, size: 33554432) ) with status: [ERROR] Error response: numerical argument out of domain.
[2021-09-06 14:47:37.063073 +0000][Dump   ][File              ] [0xef0720@root://ceph-dev-gw1.gridpp.rl.ac.uk:1094//root://ceph-dev-gw1.gridpp.rl.ac.uk:1096//xrootd-test/test/jwalder/test_64MiBp1?xrdcl.requuid=41b966fc-6a6f-4b9f-a428-56c838d2d9e6] File state error encountered. Message kXR_read (handle: 0x00000000, offset: 67108864, size: 33554432) returned with [ERROR] Server responded with an error: [3019] Unable to read /root:/ceph-dev-gw1.gridpp.rl.ac.uk:1096/xrootd-test/test/jwalder/test_64MiBp1; numerical argument out of domain
[2021-09-06 14:47:37.063102 +0000][Error  ][File              ] [0xef0720@root://ceph-dev-gw1.gridpp.rl.ac.uk:1094//root://ceph-dev-gw1.gridpp.rl.ac.uk:1096//xrootd-test/test/jwalder/test_64MiBp1?xrdcl.requuid=41b966fc-6a6f-4b9f-a428-56c838d2d9e6] Fatal file state error. Message kXR_read (handle: 0x00000000, offset: 67108864, size: 33554432) returned with [ERROR] Server responded with an error: [3019] Unable to read /root:/ceph-dev-gw1.gridpp.rl.ac.uk:1096/xrootd-test/test/jwalder/test_64MiBp1; numerical argument out of domain
[2021-09-06 14:47:37.063143 +0000][Dump   ][File              ] [0xef0720@root://ceph-dev-gw1.gridpp.rl.ac.uk:1094//root://ceph-dev-gw1.gridpp.rl.ac.uk:1096//xrootd-test/test/jwalder/test_64MiBp1?xrdcl.requuid=41b966fc-6a6f-4b9f-a428-56c838d2d9e6] Failing message kXR_read (handle: 0x00000000, offset: 67108864, size: 33554432) with [ERROR] Server responded with an error: [3019] Unable to read /root:/ceph-dev-gw1.gridpp.rl.ac.uk:1096/xrootd-test/test/jwalder/test_64MiBp1; numerical argument out of domain
```

The Xcache reports:
```
210906 15:47:37 962677 orl67423.21467:[log in to unmask] XrootdProtocol: 0100 req=read dlen=8
210906 15:47:37 962677 orl67423.21467:[log in to unmask] XrootdProtocol: 0100 0 fh=0 read 33554432@67108864
210906 15:47:37 962706 XrdSched: running  inq=0
210906 15:47:37 962706 XrdPfc_File: error ProcessBlockResponse block 0x7f5de8026d20, idx=2, off=67108864 error=-33 /xrootd-test/test/jwalder/test_64MiBp1
210906 15:47:37 962677 XrdPfc_File: error Read() io 0x7f5de8015f40, block 2 finished with error 33 numerical argument out of domain /xrootd-test/test/jwalder/test_64MiBp1
210906 15:47:37 962677 XrdPfc_IO: warning Read() error in File::Read(), exit status=-33, error=numerical argument out of domain root:[log in to unmask]:1096//xrootd-test/test/jwalder/test_64MiBp1
210906 15:47:37 962677 ofs_read: orl67423.21467:[log in to unmask] Unable to read /root:/ceph-dev-gw1.gridpp.rl.ac.uk:1096/xrootd-test/test/jwalder/test_64MiBp1; numerical argument out of domain
210906 15:47:37 962677 orl67423.21467:[log in to unmask] XrootdResponse: 0100 sending err 3019: Unable to read /root:/ceph-dev-gw1.gridpp.rl.ac.uk:1096/xrootd-test/test/jwalder/test_64MiBp1; numerical argument out of domain
210906 15:47:37 962677 orl67423.21467:[log in to unmask] XrootdProtocol: 0100 req=close dlen=0
```
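
To make the numbers in these logs easier to follow, here is a small sketch of the block arithmetic. It assumes a 32 MiB (33554432-byte) block size, which is inferred from the read size and the `idx=2, off=67108864` pair in the logs rather than taken from the config, and a file size of 64MiB + 4 bytes as described above. It also checks that error 33 corresponds to EDOM, as it does on Linux.

```python
import errno

BLOCK_SIZE = 33554432           # 32 MiB, inferred from the logs (assumption)
FILE_SIZE = 64 * 1024 * 1024 + 4  # 64MiB + 4 bytes

def block_layout(file_size, block_size):
    """Return (index, offset, length) for each block needed to cover the file."""
    blocks = []
    idx = 0
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((idx, offset, length))
        idx += 1
        offset += block_size
    return blocks

if __name__ == "__main__":
    for idx, offset, length in block_layout(FILE_SIZE, BLOCK_SIZE):
        print(f"block {idx}: offset={offset} bytes_available={length}")
    # The Xcache log reports error=-33; compare with the platform's EDOM value.
    print("EDOM =", errno.EDOM)
```

Under these assumptions the failing block (index 2, offset 67108864) is the final one and holds only 4 bytes, even though the client request covers a full 33554432 bytes.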

There are no obvious errors reported from the memcache proxy or server instances. 
The last read from the proxy appears to be a 4k request (note only 4 bytes are needed at this point):
```
210906 15:56:01 962700 u23.962666:30@ceph-dev-gw1 XrootdResponse: 0100 sending final 8 info and 8 data bytes
210906 15:56:01 962700 u23.962666:30@ceph-dev-gw1 XrootdProtocol: 0100 req=pgread dlen=2
210906 15:56:01 962700 u23.962666:30@ceph-dev-gw1 XrootdProtocol: 0100 0 pgread 4096@67108864 fn=/xrootd-test/test/jwalder/test_64MiBp1
210906 15:56:01 962700 u23.962666:30@ceph-dev-gw1 ofs_pgRead: 4096@67108864 fn=/xrootd-test/test/jwalder/test_64MiBp1
Rdr: 4096@67108864 pr=0
Cache: Hit slot 8 sz 4 nio 1 uc -1
Cache: Ref 300000008 slot 8 sz 4 uc 0
Rdr: ret 4 hits 1 pr 0
210906 15:56:01 962700 u23.962666:30@ceph-dev-gw1 XrootdResponse: 0100 sending final 8 info and 8 data bytes
210906 15:56:01 962700 u23.962666:30@ceph-dev-gw1 XrootdProtocol: 0100 req=close dlen=0
210906 15:56:01 962700 u23.962666:30@ceph-dev-gw1 ofs_close: use=1 fn=/xrootd-test/test/jwalder/test_64MiBp1
Cache: 0 att; rel 1 slots; 0 Faults; 100 root:[log in to unmask]:1095/xrootd-test/test/jwalder/test_64MiBp1
Cache: Stats: 4 Read; 16 Get; 0 Pass; 0 Write; 0 Put; 3 Hits; 1 Miss; 0 pead; 0 HitsPR; 0 MissPR; Path root:[log in to unmask]:1095/xrootd-test/test/jwalder/test_64MiBp1
```
 
I'm not quite sure where to look now. There are a few places where new EDOM returns were added between 5.2.0 and 5.3.0, but it wasn't obvious to me which is the best place to start looking, or whether any config settings alter the behaviour.

If any details of the configs would be relevant to look at, please let me know.

Cheers,
James




-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/xrootd/xrootd/issues/1507
