Hi guys, sorry for the late reply, but I took a few days off.

Otoh it seems to try for every operation (given the overall slowdown of the reading to >40 minutes, from originally 30 seconds),

@Axel-Naumann : I had a look at the logs and here's what I see:

The first request is sent at:

[2022-09-06 20:06:06.225956 +0200][Dump   ][XRootD            ] [eospublic.cern.ch:1094] Sending message kXR_open (file: /eos/root-eos/cms_opendata_2012_nanoaod_skimmed/ZZTo4mu.root, mode: 00, flags: kXR_open_read kXR_async kXR_retstat )

This triggers the client to open a connection. The client first fails to connect over IPv6, then retries and succeeds over IPv4 (this takes about 1.5 min):

[2022-09-06 20:06:06.228241 +0200][Debug  ][AsyncSock         ] [eospublic.cern.ch:1094.0] Attempting connection to [2001:1458:301:17::100:e]:1094
...
[2022-09-06 20:07:21.227311 +0200][Error  ][AsyncSock         ] [eospublic.cern.ch:1094.0] Unable to connect: operation timed out
...
[2022-09-06 20:07:21.227431 +0200][Debug  ][AsyncSock         ] [eospublic.cern.ch:1094.0] Attempting connection to [::ffff:128.142.160.145]:1094
...
[2022-09-06 20:07:27.591630 +0200][Debug  ][PostMaster        ] [eospublic.cern.ch:1094] Stream 0 connected.

Then the client gets redirected to a data server, where it again fails to connect to the IPv6 address and then succeeds over IPv4 (this again takes a little less than 1.5 min):

[2022-09-06 20:07:27.606806 +0200][Debug  ][AsyncSock         ] [p06636710d91266.cern.ch:1095.0] Attempting connection to [2001:1458:301:c4::100:8]:1095
...
[2022-09-06 20:08:42.604962 +0200][Error  ][AsyncSock         ] [p06636710d91266.cern.ch:1095.0] Unable to connect: operation timed out
...
[2022-09-06 20:08:42.605062 +0200][Debug  ][AsyncSock         ] [p06636710d91266.cern.ch:1095.0] Attempting connection to [::ffff:128.142.215.72]:1095
...
[2022-09-06 20:08:48.829024 +0200][Debug  ][PostMaster        ] [p06636710d91266.cern.ch:1095] Stream 0 connected.

After a successful open, the client sends 3 read requests, a query and finally closes the file at:

[2022-09-06 20:08:48.922313 +0200][Dump   ][XRootD            ] [p06636710d91266.cern.ch:1095] Got a kXR_ok response to request kXR_close (handle: 0x00000000)

In total it takes about 2 minutes 40 seconds, so I don't understand where the 40 minutes you mention come from. Once the connection is established, it is reused for all the requests the client issues.
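
For reference, the durations quoted above can be recomputed from the timestamps in the log excerpts. This is only a small sketch: the variable names are mine, and the timestamps are copied verbatim from the lines shown above.

```python
from datetime import datetime

# Timestamp layout used in the client log, e.g. "2022-09-06 20:06:06.225956 +0200"
FMT = "%Y-%m-%d %H:%M:%S.%f %z"


def elapsed(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# Timestamps copied from the log excerpts (names are illustrative only).
first_open = "2022-09-06 20:06:06.225956 +0200"
ipv6_try_1 = "2022-09-06 20:06:06.228241 +0200"
ipv6_timeout_1 = "2022-09-06 20:07:21.227311 +0200"
first_host_up = "2022-09-06 20:07:27.591630 +0200"
ipv6_try_2 = "2022-09-06 20:07:27.606806 +0200"
ipv6_timeout_2 = "2022-09-06 20:08:42.604962 +0200"
data_server_up = "2022-09-06 20:08:48.829024 +0200"
close_ok = "2022-09-06 20:08:48.922313 +0200"

print(f"IPv6 attempt 1 until timeout: {elapsed(ipv6_try_1, ipv6_timeout_1):.1f} s")
print(f"open request until first host connected: {elapsed(first_open, first_host_up):.1f} s")
print(f"IPv6 attempt 2 until timeout: {elapsed(ipv6_try_2, ipv6_timeout_2):.1f} s")
print(f"redirect until data server connected: {elapsed(ipv6_try_2, data_server_up):.1f} s")
print(f"open request until close response: {elapsed(first_open, close_ok):.1f} s")
```

Each IPv6 attempt accounts for roughly 75 seconds of the roughly 81 seconds spent per connection, and the whole sequence from the open request to the close response is about 163 seconds.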

and I'd hope that the resilience could happen a bit faster than 2.5 minutes - but I do not know the details nor whether that's caused by our usage in ROOT or xrootd.

It is tunable: you can set the connection window with the XRD_CONNECTIONWINDOW environment variable (the default is 120 seconds).
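
As a rough illustration, one way to set it for a single child process from Python is sketched below. The helper name and the value of 10 seconds are my own assumptions for the example, not a recommendation, and I have not checked how much a shorter window helps in this particular case.

```python
import os
from typing import Mapping, Optional


def env_with_connection_window(seconds: int,
                               base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with XRD_CONNECTIONWINDOW set.

    Hypothetical helper for illustration; only the variable name comes
    from the XRootD client configuration.
    """
    if seconds <= 0:
        raise ValueError("connection window must be a positive number of seconds")
    env = dict(os.environ if base is None else base)
    env["XRD_CONNECTIONWINDOW"] = str(seconds)
    return env


# Illustrative value only; the default mentioned above is 120 seconds.
env = env_with_connection_window(10)
print(env["XRD_CONNECTIONWINDOW"])
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching the client, so the setting applies only to that process and not to the whole session.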

Perhaps a simpler "fail fast" algorithm than Happy Eyeballs is, for hostnames which resolve to N addresses, have a "short connection timeout" for the first N-1 addresses and use the standard connection timeout for the final address.

The downside of the latter idea is "complexity kills" for what is ultimately an end-user misconfiguration. The potential upside is that it'd help immensely with cases where the N independent addresses represent N independent servers -- an unresponsive server (which users aren't at fault for) would be quickly ignored instead of having the client wait for the full timeout window.

@bbockelm : we could implement this as a feature to be enabled by the user
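
To make the quoted proposal concrete, here is a minimal sketch of the "short timeout for the first N-1 addresses, standard timeout for the last one" idea. It is not how the XRootD client is implemented: the function names, the default timeout values and the injectable `connect` callable are all assumptions made for the example.

```python
import socket
from typing import Callable, List, Optional, Sequence, Tuple

Address = Tuple[str, int]  # (host or IP literal, port)


def timeout_schedule(n: int, short: float, standard: float) -> List[float]:
    """Short timeout for the first n-1 addresses, standard timeout for the last."""
    if n <= 0:
        return []
    return [short] * (n - 1) + [standard]


def connect_fail_fast(addresses: Sequence[Address],
                      short: float = 5.0,       # illustrative value
                      standard: float = 120.0,  # illustrative value
                      connect: Callable = socket.create_connection):
    """Try each address in order and return the first successful connection.

    All addresses except the last get the short timeout, so an unreachable
    address is abandoned quickly; the last one gets the standard timeout.
    """
    last_error: Optional[OSError] = None
    schedule = timeout_schedule(len(addresses), short, standard)
    for address, timeout in zip(addresses, schedule):
        try:
            return connect(address, timeout)
        except OSError as exc:  # includes connection timeouts
            last_error = exc
    raise last_error or OSError("no addresses to try")
```

With the two addresses from the first log excerpt, the IPv6 address would be tried with the short timeout and the IPv4-mapped address with the standard one. Passing a different `connect` callable makes the ordering logic testable without touching the network.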


You are receiving this because you are subscribed to this thread.Message ID: <xrootd/xrootd/issues/1779/1245371418@github.com>

