Print

Print


Hi,

we are facing problems with DPM running on top of XRootD 5.4.3 from 
EPEL, because transfers with xroot protocol gets stuck with gfal2. 
Transfers with simple xrdcopy recovers but only after client detect 60s 
with no activity and create new connection.

* why gfal2 can't recover like xrdcopy?
* what is the difference between 5.4.2 (DPM with this version works 
fine) and 5.4.3 that causes these troubles?

On the xrdcopy client side we see something like (details 
https://vokac.web.cern.ch/vokac/tmp/xrootd-client-5.4.3.log):

...
[2022-07-16 15:23:11.774189 +0200][Debug  ][AsyncSock         ] [dpmhead.farm.particle.cz:1094.0] Async connection call returned
...
[2022-07-16 15:23:17.734260 +0200][Debug  ][ExDbgMsg          ] [msg: 0x7803e538] Assigned MsgHandler: 0x67e650.
[2022-07-16 15:23:17.741195 +0200][Dump   ][AsyncSock         ] [dpmdisk2.farm.particle.cz:1095.0] Received message 0x7803e538 of 65632 bytes
[2022-07-16 15:23:17.741242 +0200][Dump   ][PostMaster        ] [dpmdisk2.farm.particle.cz:1095] Handling received message: 0x7803e538.
[2022-07-16 15:23:17.741248 +0200][Dump   ][PostMaster        ] [dpmdisk2.farm.particle.cz:1095] Ignoring the processing handler for: 0x619538.
[2022-07-16 15:23:17.741262 +0200][Dump   ][XRootDTransport   ] [msg: 0x7803e5c8] Expecting 24 bytes of message body
[2022-07-16 15:23:17.741266 +0200][Dump   ][AsyncSock         ] [dpmdisk2.farm.particle.cz:1095.0] Received message header for 0x7803e5c8 size: 8
[2022-07-16 15:23:17.741272 +0200][Dump   ][XRootD            ] [dpmdisk2.farm.particle.cz:1095] Got a kXR_status response to request kXR_pgread (handle: 0x00000000, offset: 419430400, size: 8388608)
[2022-07-16 15:23:17.741276 +0200][Debug  ][ExDbgMsg          ] [msg: 0x7803e5c8] Assigned MsgHandler: 0x67e650.
[2022-07-16 15:23:26.773143 +0200][Dump   ][TaskMgr           ] Running task: "TickGeneratorTask for: root://dpmhead.farm.particle.cz:1094"
[2022-07-16 15:23:26.773215 +0200][Dump   ][TaskMgr           ] Will rerun task "TickGeneratorTask for: root://dpmhead.farm.particle.cz:1094" at [2022-07-16 15:23:41 +0200]
[2022-07-16 15:23:26.773221 +0200][Dump   ][TaskMgr           ] Running task: "FileTimer task"
[2022-07-16 15:23:26.773229 +0200][Dump   ][TaskMgr           ] Will rerun task "FileTimer task" at [2022-07-16 15:23:41 +0200]
[2022-07-16 15:23:28.446164 +0200][Dump   ][XRootDTransport   ] [dpmhead.farm.particle.cz:1094.0] Stream inactive since 15 seconds, TTL: 1200, allocated SIDs: 0, open files: 0, bound file objects: 0
[2022-07-16 15:23:28.446217 +0200][Dump   ][XRootDTransport   ] [dpmhead.farm.particle.cz:1094.0] Stream inactive since 15 seconds, stream timeout: 60, allocated SIDs: 0, wait barrier: 2022-07-16 15:23:13 +0200
[2022-07-16 15:23:28.775142 +0200][Dump   ][TaskMgr           ] Running task: "TickGeneratorTask for: root://dpmdisk2.farm.particle.cz:1095"
[2022-07-16 15:23:28.775208 +0200][Dump   ][TaskMgr           ] Will rerun task "TickGeneratorTask for: root://dpmdisk2.farm.particle.cz:1095" at [2022-07-16 15:23:43 +0200]
...
[2022-07-16 15:24:18.625518 +0200][Debug  ][ExDbgMsg          ] [dpmdisk2.farm.particle.cz:1095] Calling MsgHandler: 0x680e50 (message: kXR_pgread (handle: 0x00000000, offset: 436207616, size: 8388608) ) with status: [ERROR] Socket timeout.
[2022-07-16 15:24:18.625528 +0200][Dump   ][File              ] [0x676610@root://dpmhead.farm.particle.cz:1094//dpm/farm.particle.cz/home/dteam/5G.disk2?xrdcl.requuid=79be2b53-1bfc-414b-baa7-4ade1508f5fe] File state error encountered. Message kXR_pgread (handle: 0x00000000, offset: 436207616, size: 8388608) returned with [ERROR] Socket timeout
[2022-07-16 15:24:18.625535 +0200][Dump   ][File              ] [0x676610@root://dpmhead.farm.particle.cz:1094//dpm/farm.particle.cz/home/dteam/5G.disk2?xrdcl.requuid=79be2b53-1bfc-414b-baa7-4ade1508f5fe] Putting message kXR_pgread (handle: 0x00000000, offset: 436207616, size: 8388608) in the recovery list
[2022-07-16 15:24:18.625545 +0200][Debug  ][ExDbgMsg          ] [dpmdisk2.farm.particle.cz:1095] Destroying MsgHandler: 0x680e50.
[2022-07-16 15:24:18.625555 +0200][Debug  ][ExDbgMsg          ] [dpmdisk2.farm.particle.cz:1095] Calling MsgHandler: 0x67dc90 (message: kXR_pgread (handle: 0x00000000, offset: 411041792, size: 8388608) ) with status: [ERROR] Socket timeout.
[2022-07-16 15:24:18.625560 +0200][Dump   ][File              ] [0x676610@root://dpmhead.farm.particle.cz:1094//dpm/farm.particle.cz/home/dteam/5G.disk2?xrdcl.requuid=79be2b53-1bfc-414b-baa7-4ade1508f5fe] File state error encountered. Message kXR_pgread (handle: 0x00000000, offset: 411041792, size: 8388608) returned with [ERROR] Socket timeout
[2022-07-16 15:24:18.625564 +0200][Dump   ][File              ] [0x676610@root://dpmhead.farm.particle.cz:1094//dpm/farm.particle.cz/home/dteam/5G.disk2?xrdcl.requuid=79be2b53-1bfc-414b-baa7-4ade1508f5fe] Putting message kXR_pgread (handle: 0x00000000, offset: 411041792, size: 8388608) in the recovery list
[2022-07-16 15:24:18.625569 +0200][Debug  ][ExDbgMsg          ] [dpmdisk2.farm.particle.cz:1095] Destroying MsgHandler: 0x67dc90.
[2022-07-16 15:24:18.625593 +0200][Debug  ][ExDbgMsg          ] [dpmdisk2.farm.particle.cz:1095] Calling MsgHandler: 0x67e650 (message: kXR_pgread (handle: 0x00000000, offset: 419430400, size: 8388608) ) with status: [ERROR] Socket timeout.
[2022-07-16 15:24:18.625598 +0200][Dump   ][File              ] [0x676610@root://dpmhead.farm.particle.cz:1094//dpm/farm.particle.cz/home/dteam/5G.disk2?xrdcl.requuid=79be2b53-1bfc-414b-baa7-4ade1508f5fe] File state error encountered. Message kXR_pgread (handle: 0x00000000, offset: 419430400, size: 8388608) returned with [ERROR] Socket timeout
[2022-07-16 15:24:18.625602 +0200][Dump   ][File              ] [0x676610@root://dpmhead.farm.particle.cz:1094//dpm/farm.particle.cz/home/dteam/5G.disk2?xrdcl.requuid=79be2b53-1bfc-414b-baa7-4ade1508f5fe] Putting message kXR_pgread (handle: 0x00000000, offset: 419430400, size: 8388608) in the recovery list
[2022-07-16 15:24:18.625607 +0200][Debug  ][ExDbgMsg          ] [dpmdisk2.farm.particle.cz:1095] Destroying MsgHandler: 0x67e650.
[2022-07-16 15:24:18.627151 +0200][Debug  ][ExDbgMsg          ] [dpmdisk2.farm.particle.cz:1095] Calling MsgHandler: 0x67e170 (message: kXR_pgread (handle: 0x00000000, offset: 427819008, size: 8388608) ) with status: [ERROR] Socket timeout.
[2022-07-16 15:24:18.627186 +0200][Dump   ][File              ] [0x676610@root://dpmhead.farm.particle.cz:1094//dpm/farm.particle.cz/home/dteam/5G.disk2?xrdcl.requuid=79be2b53-1bfc-414b-baa7-4ade1508f5fe] File state error encountered. Message kXR_pgread (handle: 0x00000000, offset: 427819008, size: 8388608) returned with [ERROR] Socket timeout
[2022-07-16 15:24:18.627193 +0200][Dump   ][File              ] [0x676610@root://dpmhead.farm.particle.cz:1094//dpm/farm.particle.cz/home/dteam/5G.disk2?xrdcl.requuid=79be2b53-1bfc-414b-baa7-4ade1508f5fe] Putting message kXR_pgread (handle: 0x00000000, offset: 427819008, size: 8388608) in the recovery list
[2022-07-16 15:24:18.627197 +0200][Debug  ][File              ] [0x676610@root://dpmhead.farm.particle.cz:1094//dpm/farm.particle.cz/home/dteam/5G.disk2?xrdcl.requuid=79be2b53-1bfc-414b-baa7-4ade1508f5fe] Running the recovery procedure
[2022-07-16 15:24:18.627204 +0200][Dump   ][File              ] [0x676610@root://dpmhead.farm.particle.cz:1094//dpm/farm.particle.cz/home/dteam/5G.disk2?xrdcl.requuid=79be2b53-1bfc-414b-baa7-4ade1508f5fe] Sending a recovery open command to root://dpmhead.farm.particle.cz:1094//dpm/farm.particle.cz/home/dteam/5G.disk2?xrdcl.requuid=79be2b53-1bfc-414b-baa7-4ade1508f5fe
[2022-07-16 15:24:18.627224 +0200][Dump   ][XRootD            ] [dpmhead.farm.particle.cz:1094] Sending message kXR_open (file: /dpm/farm.particle.cz/home/dteam/5G.disk2, mode: 00, flags: kXR_open_read )
[2022-07-16 15:24:18.627241 +0200][Debug  ][ExDbgMsg          ] [dpmhead.farm.particle.cz:1094] MsgHandler created: 0x7c003850 (message: kXR_open (file: /dpm/farm.particle.cz/home/dteam/5G.disk2, mode: 00, flags: kXR_open_read ) ).
[2022-07-16 15:24:18.627251 +0200][Dump   ][PostMaster        ] [dpmhead.farm.particle.cz:1094] Sending message kXR_open (file: /dpm/farm.particle.cz/home/dteam/5G.disk2, mode: 00, flags: kXR_open_read ) (0x7c002070) through substream 0 expecting answer at 0
[2022-07-16 15:24:18.627273 +0200][Debug  ][ExDbgMsg          ] [dpmdisk2.farm.particle.cz:1095] Destroying MsgHandler: 0x67e170.
[2022-07-16 15:24:18.627827 +0200][Dump   ][AsyncSock         ] [dpmhead.farm.particle.cz:1094.0] Wrote a message: kXR_open (file: /dpm/farm.particle.cz/home/dteam/5G.disk2, mode: 00, flags: kXR_open_read ) (0x7c002070), 65 bytes
[2022-07-16 15:24:18.627875 +0200][Dump   ][AsyncSock         ] [dpmhead.farm.particle.cz:1094.0] Successfully sent message: kXR_open (file: /dpm/farm.particle.cz/home/dteam/5G.disk2, mode: 00, flags: kXR_open_read ) (0x7c002070).
[2022-07-16 15:24:18.627881 +0200][Dump   ][XRootD            ] [dpmhead.farm.particle.cz:1094] Message kXR_open (file: /dpm/farm.particle.cz/home/dteam/5G.disk2, mode: 00, flags: kXR_open_read ) has been successfully sent.
[2022-07-16 15:24:18.627886 +0200][Debug  ][ExDbgMsg          ] [dpmhead.farm.particle.cz:1094] Moving MsgHandler: 0x7c003850 (message: kXR_open (file: /dpm/farm.particle.cz/home/dteam/5G.disk2, mode: 00, flags: kXR_open_read ) ) from out-queu to in-queue.
...

it seems that client vs. server time was 1s out of sync, server side 
looks like (details 
https://vokac.web.cern.ch/vokac/tmp/xrootd-server-5.4.3.log)

220716 15:23:13 28043 Xrd_Inet: Accepted connection on port 1095 [log in to unmask]
...
220716 15:23:18 28023 oss_AioWait: read completed forvokac.1773:[log in to unmask]; rc=0 result=65536 aiocb=0x7f039df2a700
220716 15:23:18 28023 oss_AioWait: read completed forvokac.1773:[log in to unmask]; rc=0 result=65536 aiocb=0x7f039df8ee00
220716 15:23:18 28023 oss_AioWait: read completed forvokac.1773:[log in to unmask]; rc=0 result=65536 aiocb=0x7f039df8e000
220716 15:23:18 28023 oss_AioWait: read completed forvokac.1773:[log in to unmask]; rc=0 result=65536 aiocb=0x7f039307ee00
220716 15:23:18 28019vokac.1773:[log in to unmask]  ofs_aioread: 65536@444530688 fn=/dpm/farm.particle.cz/home/dteam/5G.disk2
220716 15:23:18 28019vokac.1773:[log in to unmask]  oss_AioRead: fd=8000 read 65536@444530688 started; aiocb=0x7f039df90a00
220716 15:23:18 28019vokac.1773:[log in to unmask]  Xrootd_PgrwAio: 0100 pgrd beg 65536@444530688 inF=8
220716 15:23:18 28019vokac.1773:[log in to unmask]  Xrootd_PgrwAio: 0100 pgrd end 65536@444071936 result=65536 D-S=False-1 inF=7
220716 15:23:18 28019vokac.1773:[log in to unmask]  Xrootd_Response: 0100 sending partial 8 info and 65600 data bytes
220716 15:23:18 28019 Xrootd_PgrwBuff:  Recycle 65536@444071936 numF=8
220716 15:23:18 28019vokac.1773:[log in to unmask]  Xrootd_PgrwAio: 0100 pgrd end 65536@444137472 result=65536 D-S=False-1 inF=6
220716 15:23:18 28019vokac.1773:[log in to unmask]  Xrootd_Response: 0100 sending partial 8 info and 65600 data bytes
220716 15:23:18 28019 Xrootd_PgrwBuff:  Recycle 65536@444137472 numF=9
220716 15:23:18 28019vokac.1773:[log in to unmask]  Xrootd_PgrwAio: 0100 pgrd end 65536@444203008 result=65536 D-S=False-1 inF=5
220716 15:23:18 28019vokac.1773:[log in to unmask]  Xrootd_Response: 0100 sending partial 8 info and 65600 data bytes
220716 15:23:18 28019 Xrootd_PgrwBuff:  Recycle 65536@444203008 numF=10
220716 15:23:18 28019vokac.1773:[log in to unmask]  Xrootd_PgrwAio: 0100 pgrd end 65536@444268544 result=65536 D-S=False-1 inF=4
220716 15:23:18 28019vokac.1773:[log in to unmask]  Xrootd_Response: 0100 sending partial 8 info and 65600 data bytes
220716 15:23:18 28019 Xrootd_PgrwBuff:  Recycle 65536@444268544 numF=11
220716 15:23:18 28019vokac.1773:[log in to unmask]  Xrootd_PgrwAio: 0100 pgrd end 65536@444334080 result=65536 D-S=False-1 inF=3
220716 15:23:18 28019vokac.1773:[log in to unmask]  Xrootd_Response: 0100 sending partial 8 info and 65600 data bytes
220716 15:23:18 28019 Xrootd_PgrwBuff:  Recycle 65536@444334080 numF=12
220716 15:23:18 28019vokac.1773:[log in to unmask]  Xrootd_PgrwAio: 0100 pgrd end 65536@444399616 result=65536 D-S=False-1 inF=2
220716 15:23:18 28019vokac.1773:[log in to unmask]  Xrootd_Response: 0100 sending partial 8 info and 65600 data bytes
220716 15:23:18 28019 Xrootd_PgrwBuff:  Recycle 65536@444399616 numF=13
220716 15:23:18 28019vokac.1773:[log in to unmask]  Xrootd_PgrwAio: 0100 pgrd end 65536@444465152 result=65536 D-S=False-1 inF=1
220716 15:23:18 28019vokac.1773:[log in to unmask]  Xrootd_Response: 0100 sending partial 8 info and 65600 data bytes
220716 15:23:18 28019 Xrootd_PgrwBuff:  Recycle 65536@444465152 numF=14
220716 15:23:18 28023 oss_AioWait: read completed forvokac.1773:[log in to unmask]; rc=0 result=65536 aiocb=0x7f039df90a00
220716 15:23:18 28019vokac.1773:[log in to unmask]  Xrootd_PgrwAio: 0100 pgrd end 65536@444530688 result=65536 D-S=False-1 inF=0
220716 15:23:18 28019vokac.1773:[log in to unmask]  Xrootd_Response: 0100 sending final 8 info and 65600 data bytes
220716 15:23:18 28019 Xrootd_PgrwBuff:  Recycle 65536@444530688 numF=15
220716 15:23:18 28019vokac.1773:[log in to unmask]  Xrd_Link: Setting FD 30 ref to 2+-1 post=0
220716 15:23:18 28019vokac.1773:[log in to unmask]  Xrootd_File: File::Ref 0 after +-1
220716 15:23:18 28019vokac.1773:[log in to unmask]  Xrootd_PgrwAio: 0100 pgrw recycle R D-S=True-1
220716 15:23:20 28043vokac.1773:[log in to unmask]  Xrootd_Protocol: 0100 request timeout; read 0 of 24 bytes
220716 15:23:20 28043 Xrd_Poll: Poller 0 enabledvokac.1773:[log in to unmask]
220716 15:23:37 28018 Xrd_Inet: Accepted connection on port 1095 [log in to unmask]
220716 15:23:37 28018 Xrd_ProtLoad: matched port 1095 protocol XrdHttp
220716 15:23:37 28018 anon.0:28@dpmhead Xrd_Poll: FD 28 attached to poller 1; num=1
220716 15:23:37 28018 Xrd_Poll: Poller 1 enabled anon.0:28@dpmhead
220716 15:23:37 28042 Xrd_Sched: running main accept inq=0
220716 15:23:37 28021 anon.0:28@dpmhead Xrd_Poll: Link 28 terminating: hangup
220716 15:23:37 28045 Xrd_Sched: running anon.0:28@dpmhead inq=0
220716 15:23:37 28045 anon.0:28@dpmhead Xrd_Poll: Poller 1 removing FD 28
220716 15:23:37 28045 anon.0:28@dpmhead Xrd_Poll: FD 28 detached from poller 1; num=0
220716 15:23:41 28027 cms_Finder: Waiting for cms path /var/spool/xrootd/dpmdisk/.olb/olbd.admin
220716 15:24:18 28020vokac.1773:[log in to unmask]  Xrd_Poll: Link 30 terminating: hangup
220716 15:24:18 28019 Xrd_Sched: runningvokac.1773:[log in to unmask]  inq=0
220716 15:24:18 28019 XrootdXeq:vokac.1773:[log in to unmask]  disc 0:01:05
220716 15:24:18 28019vokac.1773:[log in to unmask]  Xrootd_File: closing r /dpm/farm.particle.cz/home/dteam/5G.disk2
220716 15:24:18 28019vokac.1773:[log in to unmask]  ofs_close: use=1 fn=/dpm/farm.particle.cz/home/dteam/5G.disk2
220716 15:24:18 28019vokac.1773:[log in to unmask]  dpmoss_Close: return 0
220716 15:24:18 28019vokac.1773:[log in to unmask]  Xrd_Poll: Poller 0 removing FD 30
220716 15:24:18 28019vokac.1773:[log in to unmask]  Xrd_Poll: FD 30 detached from poller 0; num=0
220716 15:24:18 28042 Xrd_Inet: Accepted connection on port 1095 [log in to unmask]


Petr

########################################################################
Use REPLY-ALL to reply to list

To unsubscribe from the XROOTD-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1