Hi guys, sorry for the late reply, but I took a few days off.

> Otoh it seems to try for every operation (given the overall slowdown of the reading to >40 minutes, from originally 30 seconds),

@Axel-Naumann : I had a look at the logs and here's what I see. The first request is being sent at:

```
[2022-09-06 20:06:06.225956 +0200][Dump ][XRootD ] [eospublic.cern.ch:1094] Sending message kXR_open (file: /eos/root-eos/cms_opendata_2012_nanoaod_skimmed/ZZTo4mu.root, mode: 00, flags: kXR_open_read kXR_async kXR_retstat )
```

This triggers the client to open a connection. The client first fails to connect over IPv6, then retries and succeeds over IPv4 (this takes about 1.5 minutes):

```
[2022-09-06 20:06:06.228241 +0200][Debug ][AsyncSock ] [eospublic.cern.ch:1094.0] Attempting connection to [2001:1458:301:17::100:e]:1094
...
[2022-09-06 20:07:21.227311 +0200][Error ][AsyncSock ] [eospublic.cern.ch:1094.0] Unable to connect: operation timed out
...
[2022-09-06 20:07:21.227431 +0200][Debug ][AsyncSock ] [eospublic.cern.ch:1094.0] Attempting connection to [::ffff:128.142.160.145]:1094
...
[2022-09-06 20:07:27.591630 +0200][Debug ][PostMaster ] [eospublic.cern.ch:1094] Stream 0 connected.
```

The client is then redirected to a data server, where it again fails to connect to the IPv6 address but then succeeds with IPv4 (this again takes a bit less than 1.5 minutes):

```
[2022-09-06 20:07:27.606806 +0200][Debug ][AsyncSock ] [p06636710d91266.cern.ch:1095.0] Attempting connection to [2001:1458:301:c4::100:8]:1095
...
[2022-09-06 20:08:42.604962 +0200][Error ][AsyncSock ] [p06636710d91266.cern.ch:1095.0] Unable to connect: operation timed out
...
[2022-09-06 20:08:42.605062 +0200][Debug ][AsyncSock ] [p06636710d91266.cern.ch:1095.0] Attempting connection to [::ffff:128.142.215.72]:1095
...
[2022-09-06 20:08:48.829024 +0200][Debug ][PostMaster ] [p06636710d91266.cern.ch:1095] Stream 0 connected.
```

After a successful open, the client sends 3 read requests and a query, and finally closes the file at:

```
[2022-09-06 20:08:48.922313 +0200][Dump ][XRootD ] [p06636710d91266.cern.ch:1095] Got a kXR_ok response to request kXR_close (handle: 0x00000000)
```

In total this takes about 2 minutes 40 seconds, so I don't understand where the 40 minutes you mention are coming from. Once the connection is established, it is reused for all the requests the client issues.

> and I'd hope that the resilience could happen a bit faster than 2.5 minutes - but I do not know the details nor whether that's caused by our usage in ROOT or xrootd.

It is tunable: you can set the connection window with the `XRD_CONNECTIONWINDOW` environment variable (the default is 120 seconds).

> Perhaps a simpler "fail fast" algorithm than Happy Eyeballs is, for hostnames which resolve to N addresses, have a "short connection timeout" for the first N-1 addresses and use the standard connection timeout for the final address.
>
> The downside of the latter idea is "complexity kills" for what is ultimately an end-user misconfiguration. The potential upside is that it'd help immensely with cases where the N independent addresses represent N independent servers -- an unresponsive server (which users aren't at fault for) would be quickly ignored instead of having the client wait for the full timeout window.

@bbockelm : we could implement this as a feature to be enabled by the user.

--
Reply to this email directly or view it on GitHub:
https://github.com/xrootd/xrootd/issues/1779#issuecomment-1245371418
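For reference, shortening the connection window from the shell is just an environment variable export before starting the client. This is a minimal sketch; the 20-second value is an arbitrary example, and the commented-out `xrdcp` invocation is hypothetical (any XRootD client started from this shell would pick the variable up):

```shell
# Shorten the XRootD connection window (value is in seconds) so that a
# dead IPv6 address is abandoned after 20 s instead of the default 120 s.
export XRD_CONNECTIONWINDOW=20

# Hypothetical example client invocation that would inherit the setting:
# xrdcp root://eospublic.cern.ch//eos/root-eos/... /tmp/

# Confirm the variable is visible to child processes.
echo "XRD_CONNECTIONWINDOW=${XRD_CONNECTIONWINDOW}"
```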
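To make the quoted "fail fast" idea concrete, here is a rough sketch in Python (not XRootD's C++ code; the function names and the 5-second short timeout are invented for illustration, while the 120-second full window matches the `XRD_CONNECTIONWINDOW` default mentioned above). For a hostname resolving to N addresses, the first N-1 attempts get a short timeout and only the final attempt gets the full connection window:

```python
import socket

def timeout_schedule(n, short_timeout=5.0, full_timeout=120.0):
    """Timeouts for n resolved addresses: a short timeout for the first
    n-1 attempts, the full connection window for the last one."""
    return [short_timeout] * (n - 1) + [full_timeout] if n else []

def fail_fast_connect(addresses, port, short_timeout=5.0, full_timeout=120.0):
    """Try each resolved address in turn with the schedule above;
    return the first socket that connects, or raise the last error."""
    last_err = None
    timeouts = timeout_schedule(len(addresses), short_timeout, full_timeout)
    for addr, timeout in zip(addresses, timeouts):
        try:
            return socket.create_connection((addr, port), timeout=timeout)
        except OSError as err:
            last_err = err  # unresponsive address: move on quickly
    raise last_err if last_err else OSError("no addresses to try")
```

With this scheme the unresponsive IPv6 address in the logs above would cost only the short timeout, while a host with a single address keeps the full window and loses nothing.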