@vokac : I went through the client log file and here's what I found out:

  1. The client is not stuck; the server is just splitting the pgread response into 64KB chunks, which makes the transfer very slow:
[2022-07-16 15:23:15.806312 +0200][Dump   ][AsyncSock         ] [dpmdisk2.farm.particle.cz:1095.0] Received message 0x780380e8 of 65632 bytes

Now, this is very weird because the client is asking for 8MB chunks (the default, so no surprise there), while the server is splitting the response into 64KB partial responses (by default the server uses 2MB chunks, so this is very suspicious).
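Side note on the 65632-byte figure: it is consistent with exactly 64KiB of payload per kXR_status partial response. kXR_pgread ships data in 4KiB pages, each protected by a 4-byte CRC32C, so assuming a 32-byte response header (the 32 is inferred from the dump, I did not check it against the spec) the numbers add up:

```c++
// Sanity check of the 64 KiB interpretation (assumptions: 4 KiB pgread pages,
// a 4-byte CRC32C per page, and a 32-byte kXR_status header inferred from the dump).
#include <cstdint>
#include <cstdio>

int main() {
  auto wire_size = [](std::uint64_t payload) {
    const std::uint64_t page = 4096, crc = 4, hdr = 32;
    return (payload / page) * (page + crc) + hdr;
  };
  // 16 pages * (4096 + 4) + 32 = 65632 -- the message size in the dump above
  std::printf("%llu\n", (unsigned long long)wire_size(64 * 1024));
}
```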

The client managed to read roughly 400MB this way (the last acknowledged pgread below is at offset 419430400, i.e. exactly 400 MiB):

[2022-07-16 15:23:17.741272 +0200][Dump   ][XRootD            ] [dpmdisk2.farm.particle.cz:1095] Got a kXR_status response to request kXR_pgread (handle: 0x00000000, offset: 419430400, size: 8388608)
  2. Then we are stuck for ~1 min until a stream timeout kicks in (see the note after this item):
[2022-07-16 15:24:18.625331 +0200][Debug  ][AsyncSock         ] [dpmdisk2.farm.particle.cz:1095.0] Closing the socket
[2022-07-16 15:24:18.625340 +0200][Debug  ][Poller            ] <[2001:1458:d00:64::100:215]:60760><--><[2001:718:401:6025:1:0:1:87]:1095> Removing socket from the poller
[2022-07-16 15:24:18.625398 +0200][Debug  ][PostMaster        ] [dpmdisk2.farm.particle.cz:1095] Recovering error for stream #0: [ERROR] Socket timeout.
[2022-07-16 15:24:18.625407 +0200][Debug  ][PostMaster        ] [dpmdisk2.farm.particle.cz:1095] Reporting disconnection to queued message handlers.
[2022-07-16 15:24:18.625424 +0200][Dump   ][XRootD            ] [dpmdisk2.farm.particle.cz:1095] Stream event reported for msg kXR_pgread (handle: 0x00000000, offset: 436207616, size: 8388608)

the client runs the recovery procedure:

[2022-07-16 15:24:18.627197 +0200][Debug  ][File              ] [0x676610@root://dpmhead.farm.particle.cz:1094//dpm/farm.particle.cz/home/dteam/5G.disk2?xrdcl.requuid=79be2b53-1bfc-414b-baa7-4ade1508f5fe] Running the recovery procedure
[2022-07-16 15:24:18.627204 +0200][Dump   ][File              ] [0x676610@root://dpmhead.farm.particle.cz:1094//dpm/farm.particle.cz/home/dteam/5G.disk2?xrdcl.requuid=79be2b53-1bfc-414b-baa7-4ade1508f5fe] Sending a recovery open command to root://dpmhead.farm.particle.cz:1094//dpm/farm.particle.cz/home/dteam/5G.disk2?xrdcl.requuid=79be2b53-1bfc-414b-baa7-4ade1508f5fe

the client successfully reconnects and reopens the file:

[2022-07-16 15:24:18.828170 +0200][Dump   ][XRootD            ] [dpmdisk2.farm.particle.cz:1095] Got a kXR_ok response to request kXR_open (file: /dpm/farm.particle.cz/home/dteam/5G.disk2?dpm.chunk0=0,0,dpmdisk2.farm.particle.cz:/mnt/fs1/dteam/2022-07-15/5G.disk2.10.1657880610&dpm.dhost=dpmdisk2.farm.particle.cz&dpm.hv2=SeyBJE6wT9Vei4ur/rPkLQ==&dpm.loc=1&dpm.nonce=c021ccc9&dpm.time=1657977858, mode: 00, flags: kXR_open_read )

and resumes the transfer.
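For the record, the ~1 min stall lines up with XrdCl's default stream timeout of 60 s (the XRD_STREAMTIMEOUT knob). If anyone wants to reproduce with a longer window, here is a minimal sketch of raising it from the client API; the value is illustrative:

```c++
#include <XrdCl/XrdClDefaultEnv.hh>

void RaiseStreamTimeout() {
  // Same knob as the XRD_STREAMTIMEOUT environment variable (default 60 s);
  // 120 is just an illustrative value for debugging, not a recommendation.
  XrdCl::DefaultEnv::GetEnv()->PutInt("StreamTimeout", 120);
}
```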

  3. After reconnecting, the server starts using (the default) 2MB chunk size:
[2022-07-16 15:24:19.022460 +0200][Dump   ][AsyncSock         ] [dpmdisk2.farm.particle.cz:1095.0] Received message 0x78037bb8 of 2099232 bytes

so the transfer now goes much faster.
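The 2099232-byte messages check out the same way, assuming the same framing as above: 512 pages × (4096 + 4) B + 32 B of header = 2099232 B, i.e. a full 2 MiB of payload per partial response.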

We will need some help from @abh3 to understand why the server is using those very small 64KB chunks. @abh3, could you please comment on this?

