Hello,

It seems we have a problem with dCache that has gone undiscovered since May. A delegated third-party copy without checksum verification succeeds:

```
[arossi@fndcatemp1 xrootd]$  xrdcp --tpc delegate only  root://fndcatemp2.fnal.gov:1095//pnfs/fs/usr/test/arossi/volatile/cksumtest root://fndcatemp2.fnal.gov:1097//pnfs/fs/usr/test/arossi/tape/0/cksumtest-`date | tr ' ' '.'`
[1B/1B][100%][==================================================][1B/s]  
```
but the same copy with checksum verification requested (`-C adler32`) fails:

```
[arossi@fndcatemp1 xrootd]$  xrdcp --tpc delegate only -C adler32  root://fndcatemp2.fnal.gov:1095//pnfs/fs/usr/test/arossi/volatile/cksumtest root://fndcatemp2.fnal.gov:1097//pnfs/fs/usr/test/arossi/tape/0/cksumtest-`date | tr ' ' '.'`
[1B/1B][100%][==================================================][0B/s]  
Run: [ERROR] Server responded with an error: [3013] Request 3001 not supported Got an error while querying the checksum! (destination)
```

The following commit seems to be involved:

```
3a1be817e5e90bb6ea4667077d02c30c5d774018 is the first bad commit
commit 3a1be817e5e90bb6ea4667077d02c30c5d774018
Author: Michal Simon <[log in to unmask]>
Date:   Tue May 12 18:50:14 2020 +0200

    [XrdCl] Preserve xrdcl.* cgi elements on redirect as they are important for the internal workflow.
```

though it is difficult to say with certainty, as the preceding commit

```
commit 93c885978c564c380d3b82c4ac7739e16e72492c (HEAD)
Author: Michal Simon <[log in to unmask]>
Date:   Tue May 12 14:57:27 2020 +0200

    [XrdCl] Make sure xrdcl specific cgi is not propagated to server.
```

hangs against the pool. From the debug output for the latter, we see that the transfer succeeds but that there is no response to the close:

```
[1B/1B][100%][==================================================][1B/s]  [2020-12-09 12:04:27.545898 -0600][Debug  ][File              ] [0x1e0e8b0@root://fndcatemp2.fnal.gov:1097//pnfs/fs/usr/test/arossi/tape/0/cksumtest-Wed.Dec..9.12:04:26.CST.2020?oss.asize=1&tpc.dlg=fndcatemp2.fnal.gov:1095&tpc.dlgon=1&tpc.key=16d7f326505d10c25fd111aa&tpc.lfn=/pnfs/fs/usr/test/arossi/volatile/cksumtest&tpc.scgi=tpc.stage=placement&tpc.spr=root&tpc.src=fndcatemp2.fnal.gov:1095&tpc.stage=copy&tpc.str=1&tpc.tpr=root&xrdcl.intent=tpc&xrdcl.requuid=e344949b-64a2-4eff-b6b4-b494a705fbff] Sending a close command for handle 0x0 to dcatest03.fnal.gov:33123
[2020-12-09 12:04:27.545944 -0600][Dump   ][XRootD            ] [dcatest03.fnal.gov:33123] Sending message kXR_close (handle: 0x00000000)
[2020-12-09 12:04:27.545978 -0600][Debug  ][ExDbgMsg          ] [dcatest03.fnal.gov:33123] MsgHandler created: 0x1e15000 (message: kXR_close (handle: 0x00000000) ).
[2020-12-09 12:04:27.546009 -0600][Dump   ][PostMaster        ] [dcatest03.fnal.gov:33123] Sending message kXR_close (handle: 0x00000000) (0x1e0ecc0) through substream 0 expecting answer at 0
[2020-12-09 12:04:27.546069 -0600][Dump   ][AsyncSock         ] [dcatest03.fnal.gov:33123.0] Wrote a message: kXR_close (handle: 0x00000000) (0x1e0ecc0), 24 bytes
[2020-12-09 12:04:27.546122 -0600][Dump   ][AsyncSock         ] [dcatest03.fnal.gov:33123.0] Successfully sent message: kXR_close (handle: 0x00000000) (0x1e0ecc0).
[2020-12-09 12:04:27.546143 -0600][Dump   ][XRootD            ] [dcatest03.fnal.gov:33123] Message kXR_close (handle: 0x00000000) has been successfully sent.
[2020-12-09 12:04:27.546159 -0600][Debug  ][ExDbgMsg          ] [dcatest03.fnal.gov:33123] Moving MsgHandler: 0x1e15000 (message: kXR_close (handle: 0x00000000) ) from out-queu to in-queue.
[2020-12-09 12:04:27.546182 -0600][Dump   ][PostMaster        ] [dcatest03.fnal.gov:33123.0] All messages consumed, disable uplink
```
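For reference, the opaque (CGI) elements on the destination URL in this trace can be grouped by prefix, which makes the `xrdcl.*` elements that the two commits deal with easy to spot. The following is only a rough Python sketch for reading the trace; the helper name `group_opaque` is made up for illustration and is not part of xrootd or dCache, and the URL is copied from the log line above.

```python
# Destination URL copied verbatim from the debug output above.
DEST_URL = (
    "root://fndcatemp2.fnal.gov:1097//pnfs/fs/usr/test/arossi/tape/0/"
    "cksumtest-Wed.Dec..9.12:04:26.CST.2020"
    "?oss.asize=1&tpc.dlg=fndcatemp2.fnal.gov:1095&tpc.dlgon=1"
    "&tpc.key=16d7f326505d10c25fd111aa"
    "&tpc.lfn=/pnfs/fs/usr/test/arossi/volatile/cksumtest"
    "&tpc.scgi=tpc.stage=placement&tpc.spr=root"
    "&tpc.src=fndcatemp2.fnal.gov:1095&tpc.stage=copy&tpc.str=1"
    "&tpc.tpr=root&xrdcl.intent=tpc"
    "&xrdcl.requuid=e344949b-64a2-4eff-b6b4-b494a705fbff"
)


def group_opaque(url):
    """Group the key=value pairs after '?' by the prefix before the first '.'.

    Hypothetical helper for illustration only.
    """
    _, _, query = url.partition("?")
    groups = {}
    for pair in query.split("&"):
        if not pair:
            continue
        # Split on the first '=' only, so values such as
        # "tpc.stage=placement" (the value of tpc.scgi) stay intact.
        key, _, value = pair.partition("=")
        prefix, _, name = key.partition(".")
        groups.setdefault(prefix, {})[name] = value
    return groups


if __name__ == "__main__":
    for prefix, items in sorted(group_opaque(DEST_URL).items()):
        print(prefix, items)
```

This only shows what the client attached to the URL in this one trace; it says nothing about what the door adds when it redirects to the pool.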

Be that as it may, our working hypothesis concerning the checksum issue is that, prior to this commit (at least in the 4-series client), the checksum request was issued on the same session, whereas the new client issues it on a separate session.

Issuing a separate request to the pool endpoint, however, is currently not going to work: it lacks the UUID for the transfer, which is generated on open at the door, and the new connection does not indicate where to redirect (which door). Both pieces of information are obtained from the original open request, which is not present on the new connection.

We would like to understand whether this is indeed the case, and whether this behavior could be modified on your end so that xrdcp and dCache are once again interoperable using TPC.

Thanks!

Al

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/xrootd/xrootd/issues/1359#issuecomment-741956967
