It looks like several issues here are interacting. The "cannot assign
requested address" error may be the key element. Derek reported this error in a
different context. There are many reasons why this message can appear
(e.g., the host has run out of ephemeral source ports, sockets are created
without SO_REUSEADDR, a client is trying to connect to a dead port in a Docker
container, etc.).
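A minimal Python sketch for telling these causes apart on a given host, assuming Python 3 is available there. The helper `diagnose_connect` is hypothetical (it is not part of XRootD): it makes one TCP connection attempt with SO_REUSEADDR set and reports the symbolic errno name. Binding the source side to 192.0.2.1, a documentation-only address that no interface should own, is one deterministic way to reproduce "cannot assign requested address"; ephemeral-port exhaustion yields the same errno but is harder to trigger on purpose.

```python
import errno
import socket


def diagnose_connect(host, port, source_addr=None, timeout=2.0):
    """Attempt one TCP connection; return "OK", "TIMEOUT", or an errno name."""
    # Pick the address family from the literal: a colon means IPv6.
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    try:
        sock = socket.socket(family, socket.SOCK_STREAM)
    except OSError as exc:
        # Creating an AF_INET6 socket fails outright (e.g. EAFNOSUPPORT)
        # when IPv6 support is disabled in the kernel.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    with sock:
        # SO_REUSEADDR lets a socket bind a local address that is still
        # held by an earlier connection sitting in TIME_WAIT.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.settimeout(timeout)
        try:
            if source_addr is not None:
                # Binding to an address no local interface owns fails with
                # EADDRNOTAVAIL ("cannot assign requested address").
                sock.bind(source_addr)
            sock.connect((host, port))
        except socket.timeout:
            return "TIMEOUT"
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
    return "OK"
```

On a stock Linux host, `diagnose_connect("127.0.0.1", 1094, source_addr=("192.0.2.1", 0))` should report `EADDRNOTAVAIL`; if non-local binds are permitted (e.g. `net.ipv4.ip_nonlocal_bind=1`), the bind may succeed and the result will differ. Passing an IPv6 literal exercises the IPv6 path, which matters if IPv6 is disabled on the host.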

The other issue is the "being staged" message. That can come from the
cmsd when a file is placed in pending state. That state means the file is
being brought in from another location and the client has to wait for it
to arrive. This occurs in Xcache when a client is assigned a caching
server to bring in a file. However, that depends on the client actually
going to that server to initiate the fetch so that the data server can
complete the pending action. I think that the "cannot assign" message
indicates that this process is not being completed.
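The wait-and-retry behavior a client is expected to follow here can be sketched as below, assuming a hypothetical `open_fn` that returns an open handle, an error, or a stall carrying the server's estimated wait (like the "Stall 3 ... estimated time to completion 3 seconds" line in the log quoted below). None of these names are XRootD or XrdHttp APIs; the point of the sketch is that a stall means "retry the open after N seconds", not "close the file".

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class OpenResult:
    # Hypothetical result type: status is "ok", "stall", or "error".
    status: str
    handle: Optional[Any] = None
    stall_seconds: int = 0
    message: str = ""


def open_with_stall_retry(open_fn: Callable[[str], OpenResult], path: str,
                          max_wait: int = 60,
                          sleep: Callable[[float], None] = time.sleep) -> OpenResult:
    """Re-issue the open after each stall instead of closing the file."""
    waited = 0
    while True:
        result = open_fn(path)
        if result.status != "stall":
            # Either the file is open or a real error occurred; pass it on.
            return result
        # Never busy-loop, even if the server suggests a zero-second wait.
        delay = max(result.stall_seconds, 1)
        if waited + delay > max_wait:
            # Bound the total wait so a fetch that never completes fails
            # cleanly instead of stalling the client forever.
            return OpenResult("error", message=f"{path}: still staging after {waited}s")
        sleep(delay)
        waited += delay
```

Injecting `sleep` keeps the loop testable, and the `max_wait` bound means a fetch that never completes (for instance, because the client never reached the assigned data server) eventually becomes an ordinary error rather than an endless stall.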

The other interesting part is why this is happening all of a sudden. I'd
say it's because we are encountering an error that was never encountered
before (i.e., the "cannot assign" error).

I'd say a deeper analysis is needed here. I'll take a look as well.
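As one possible starting point for that analysis, here is a small Python sketch that measures the gap between the first "cannot assign requested address" error and the first `ofs_Stall` line in a log, parsing both timestamp formats that appear in the excerpt below (`230301 06:28:13 ...` from the server and `[2023-03-01 06:28:12.623677 +0000]` from XrdCl). It assumes the server logs in UTC, matching the `+0000` offset XrdCl prints, and truncates to whole seconds; the function names are illustrative only.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, Optional

# Server-side lines start with "YYMMDD HH:MM:SS".
XRD_TS = re.compile(r"^(\d{6} \d{2}:\d{2}:\d{2}) ")
# XrdCl lines start with "[YYYY-MM-DD HH:MM:SS.ffffff +ZZZZ]".
XRDCL_TS = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ [+-]\d{4}\]")


def parse_ts(line: str) -> Optional[datetime]:
    """Return the line's timestamp (whole seconds, assumed UTC) or None."""
    m = XRD_TS.match(line)
    if m:
        return datetime.strptime(m.group(1), "%y%m%d %H:%M:%S")
    m = XRDCL_TS.match(line)
    if m:
        return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return None


def first_match(lines: list, needle: str) -> Optional[datetime]:
    """Timestamp of the first timestamped line containing needle."""
    for line in lines:
        if needle in line:
            ts = parse_ts(line)
            if ts is not None:
                return ts
    return None


def error_to_stall_gap(lines: Iterable[str]) -> Optional[timedelta]:
    """Time from the first socket error to the first stall, if both occur."""
    lines = list(lines)
    err = first_match(lines, "cannot assign requested address")
    stall = first_match(lines, "ofs_Stall")
    if err is None or stall is None:
        return None
    return stall - err
```

Run over the full log rather than an excerpt, this gives a quick check of the roughly 20-minute delay between the first error and the first stall noted in the thread.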

Andy


On Fri, 3 Mar 2023, Brian P Bockelman wrote:

> Ok, from the output, here's the error:
> ```
> 230301 06:28:12 12393 ***@***.*** ofs_open: 0-600 fn=/user/ligo/test_access/access_ligo
> 230301 06:28:12 12393 acc_Audit: ***@***.*** grant https 1fa32e3f.0@[::ffff:129.93.244.204] read /user/ligo/test_access/access_ligo
> [2023-03-01 06:28:12.623677 +0000][Error ][AsyncSock ] ***@***.***:1094.0] Unable to initiate the connection: [ERROR] Socket error: cannot assign requested address
> [2023-03-01 06:28:13.148513 +0000][Info ][AsyncSock ] ***@***.***:1094.0] TLS hand-shake done.
> 230301 06:28:13 12393 ofs_Stall: Stall 3: File access_ligo is being staged; estimated time to completion 3 seconds for /user/ligo/test_access/access_ligo
> 230301 06:28:13 12393 ***@***.*** Xrootd_Protocol: stalling client for 3 sec
> 230301 06:28:13 12393 ***@***.*** ofs_close: use=0 fn=dummy
> ```
> It looks like:
> - the HTTP handler code doesn't take kindly to being stalled here and tries to close the file in response instead of waiting. Because the file handle is not open, the close failed with the message "read does not refer to an open file" (obviously an incorrect message but close 'nuff). @ccaffy - fixing this is probably in your court. Note this occurs during the GET path. Look at the request in Fabio's attached file starting with `230301 06:28:07 12725 http_Protocol`.
> - Unclear if XrdCl is doing the right thing here. I'm guessing it's just bouncing around haphazardly, trying to find a data server without a socket error? @simonmichal, thoughts? Should it fail earlier / harder?
> - For some reason, the OFS layer is able to handle the XrdCl error well for a while (maybe because the file is cached?), but eventually chokes and starts stalling. Note it is almost exactly 20 minutes between the first error and when things start to stall. Perhaps during those 20 minutes the PFC is somehow swallowing the error, then later on decides to propagate it to OFS, and then the OFS finally starts erroring? @osschar - thoughts?
>
> If I had to guess, the socket error is because IPv6 is disabled on the host.
>
> --
> Reply to this email directly or view it on GitHub:
> https://github.com/xrootd/xrootd/issues/1940#issuecomment-1454083076
> You are receiving this because you are subscribed to this thread.
>
> Message ID: ***@***.***>


