Are there any known issues with TPC PULL where the final checksum does not match the user-provided one?

We are running EOS on top of XRootD 4.12.7, and LHCb has started using HTTP TPC in production. Some of the TPC PULL transfers eventually fail because the final checksum does not match the user-provided one. FTS later retries, probably using TPC push, and the transfer succeeds with the correct checksum.

In the EOS logs we don't see any errors whatsoever. For example, on the disk node, when such a file with the wrong checksum is written, we get a clean and successful transfer log:

```
210219 05:10:34 26159 TPC_PullRequest: event=PULL_START, local=/eos/lhcb/grid/prod/lhcb/MC/2012/RXCHAD.STRIP.DST/00122396/0000/00122396_00001840_1.rxchad.strip.dst, remote=https://lhcbwebdav-kit.gridka.de:2880/pnfs/gridka.de/lhcb/LHCb-Disk/lhcb/MC/2012/RXCHAD.STRIP.DST/00122396/0000/00122396_00001840_1.rxchad.strip.dst, user=(anonymous); Starting a push request
210219 05:10:34 time=1613707834.930317 func=open level=INFO logid=6a336346-7268-11eb-a7ac-a4bf0114cb20 [log in to unmask]:1095 tid=00007f80735fb700 source=XrdFstOfsFile:120 tident=? sec= uid=0 gid=0 name= geo="" path=/eos/lhcb/grid/prod/lhcb/MC/2012/RXCHAD.STRIP.DST/00122396/0000/00122396_00001840_1.rxchad.strip.dst info=cap.sym=<...>&cap.msg=<...>&mgm.logid=6a336346-7268-11eb-a7ac-a4bf0114cb20&mgm.replicaindex=0&mgm.replicahead=0&mgm.etag="342790846266998784:00000000"&mgm.id=4c1d6756&authz=<...> open_mode=201
210219 05:10:34 time=1613707834.930530 func=ProcessCapOpaque level=INFO logid=6a336346-7268-11eb-a7ac-a4bf0114cb20 [log in to unmask]:1095 tid=00007f80735fb700 source=XrdFstOfsFile:2253 tident=? sec=(null) uid=99 gid=99 name=(null) geo="" capability=&mgm.access=create&mgm.ruid=7947&mgm.rgid=1470&mgm.uid=99&mgm.gid=99&mgm.path=/eos/lhcb/grid/prod/lhcb/MC/2012/RXCHAD.STRIP.DST/00122396/0000/00122396_00001840_1.rxchad.strip.dst&mgm.manager=eoslhcb-qdb-52bedb7c30.cern.ch:1094&mgm.fid=4c1d6756&mgm.cid=113766179&mgm.sec=https|lhcbprod|[2001:1458:301:cd::100:1ab]||||/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=fstagni/CN=693025/CN=Federico Stagni|&mgm.lid=1048850&mgm.bookingsize=1000000000&mgm.fsid=11575&mgm.url0=root://st-048-bbead64b.cern.ch:1095//&mgm.fsid0=11575&mgm.url1=root://p06636710f33060.cern.ch:1095//&mgm.fsid1=10611&cap.valid=1613711434
210219 05:10:34 time=1613707834.930577 func=open level=INFO logid=6a336346-7268-11eb-a7ac-a4bf0114cb20 [log in to unmask]:1095 tid=00007f80735fb700 source=XrdFstOfsFile:198 tident=? sec=(null) uid=7947 gid=1470 name=nobody geo="" ns_path=/eos/lhcb/grid/prod/lhcb/MC/2012/RXCHAD.STRIP.DST/00122396/0000/00122396_00001840_1.rxchad.strip.dst fst_path=/data05/0001f2d3/4c1d6756
210219 05:10:34 time=1613707834.930926 func=open level=INFO logid=6a336346-7268-11eb-a7ac-a4bf0114cb20 [log in to unmask]:1095 tid=00007f80735fb700 source=XrdFstOfsFile:468 tident=? sec=(null) uid=7947 gid=1470 name=nobody geo="" fst_path=/data05/0001f2d3/4c1d6756 open-mode=301 create-mode=41a4 layout-name=replica oss-opaque=&mgm.lid=1048850&mgm.bookingsize=1000000000
210219 05:10:34 time=1613707834.930943 func=Open level=INFO logid=6a336346-7268-11eb-a7ac-a4bf0114cb20 [log in to unmask]:1095 tid=00007f80735fb700 source=ReplicaParLayout:104 tident=? sec= uid=0 gid=0 name= geo="" replica_head=0, replica_index=0
....
210219 05:10:41 time=1613707841.983478 func=VerifyChecksum level=INFO logid=6a336346-7268-11eb-a7ac-a4bf0114cb20 [log in to unmask]:1095 tid=00007f80735fb700 source=XrdFstOfsFile:3017 tident=? sec= uid=7947 gid=1470 name=nobody geo="" (write) checksum type: adler checksum hex: 1edadd8c requested-checksum hex: -none-
210219 05:10:42 time=1613707842.071020 func=_close level=INFO logid=6a336346-7268-11eb-a7ac-a4bf0114cb20 [log in to unmask]:1095 tid=00007f80735fb700 source=XrdFstOfsFile:1807 tident=? sec= uid=7947 gid=1470 name=nobody geo="" msg="done close" rc=0 errc=0
210219 05:10:42 26159 TPC_PullRequest: event=TRANSFER_SUCCESS, local=/eos/lhcb/grid/prod/lhcb/MC/2012/RXCHAD.STRIP.DST/00122396/0000/00122396_00001840_1.rxchad.strip.dst, remote=https://lhcbwebdav-kit.gridka.de:2880/pnfs/gridka.de/lhcb/LHCb-Disk/lhcb/MC/2012/RXCHAD.STRIP.DST/00122396/0000/00122396_00001840_1.rxchad.strip.dst, user=(anonymous), bytes_transferred=180232171, tpc_status=200
```

And the corresponding error in FTS reads:

```
INFO Fri, 19 Feb 2021 05:10:42 +0100; [1613707842071] BOTH http_plugin TRANSFER:EXIT https://lhcbwebdav-kit.gridka.de:2880/pnfs/gridka.de/lhcb/LHCb-Disk/lhcb/MC/2012/RXCHAD.STRIP.DST/00122396/0000/00122396_00001840_1.rxchad.strip.dst => https://eoslhcb.cern.ch/eos/lhcb/grid/prod/lhcb/MC/2012/RXCHAD.STRIP.DST/00122396/0000/00122396_00001840_1.rxchad.strip.dst
ERR Fri, 19 Feb 2021 05:10:42 +0100; Non recoverable error: [5] DESTINATION CHECKSUM MISMATCH User-defined and destination ADLER32 do not match (fc21471b != 1edadd8c)
```

The EOS OFS layer is quite robust when it comes to such write operations, so I would suspect an issue in the XrdTpc layer, possibly related to how curl reads from the remote endpoint, buffering, etc. Does this ring a bell?

Thanks!

GitHub issue: https://github.com/xrootd/xrootd/issues/1404
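P.S. As a cross-check independent of EOS and FTS, the Adler-32 of the replica on the disk node could be recomputed and compared with the two values in the FTS error (fc21471b and 1edadd8c). Below is a minimal sketch in Python, assuming direct read access to the replica file on the FST. The function name `adler32_hex` and the chunk size are illustrative choices, not part of EOS or XRootD.

```python
import zlib


def adler32_hex(path, chunk_size=1024 * 1024):
    """Return the Adler-32 of the file at `path` as 8 lowercase hex digits.

    Hypothetical helper for manual cross-checking; reads the file in
    chunks so that large replicas do not need to fit in memory.
    """
    checksum = 1  # Adler-32 starting value (same as zlib.adler32(b""))
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Feed the running value back in to checksum the whole stream.
            checksum = zlib.adler32(chunk, checksum)
    return format(checksum & 0xFFFFFFFF, "08x")
```

For the transfer above this would be called on the replica path from the log, e.g. `adler32_hex("/data05/0001f2d3/4c1d6756")`. If the result is 1edadd8c, the bytes on disk already differ from the source, which would point at the data path rather than at the checksum computation at close.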