I'm unsure if this is some kind of follow-up for #1252, it is pretty confusing...
Here's what I observe:
gfal-copy -p
copying to a non-existent directory runs fine against XRootD 4.12.x, i.e. the parent directories are created.gfal-copy -p
also runs fine against a standalone XRootD 5.0.1 server, parent directories are created.gfal-copy -p
fails to create parent directories and aborts when a redirecting manager (no matter if 4.12.x or 5.0.1) is combined with a 5.0.1 data node, unless DAVIX_DISABLE_SESSION_CACHING=true
is set.The error on the client side is:
[gfalt_copy_file][perform_copy][gfal_http_copy] DESTINATION MAKE_PARENT [gfal_http_copy_make_parent] [gfal_http_exists][gfal_http_access][davix2gliberr] Result Authentication error, reached maximum number of attempts after 1 attempts
That's pretty confusing, here some more details. This is the failing case:
$ gfal-copy -p -vvv -f -K adler32 --checksum-mode both davs://somewhere.example.com/something.root davs://managernode.example.com:1094/cephfs/grid/atlas/user/scratch/bla3/foo.root
...
DEBUG Davix: > PROPFIND /cephfs/grid/atlas/user/scratch/bla3/foo.root HTTP/1.1
...
DEBUG Davix: < HTTP/1.1 302 Redirect
...
DEBUG Davix: < Location: https://dataserver.example.com:1094/cephfs/grid/atlas/user/scratch/bla3/foo.root
...
DEBUG Davix: > PROPFIND /cephfs/grid/atlas/user/scratch/bla3/foo.root HTTP/1.1
...
> Host: dataserver.example.com:1094
...
DEBUG Davix: < HTTP/1.1 404 Not Found
DEBUG Davix: < Connection: Close
DEBUG Davix: < Content-Length: 90
...
...
...
DEBUG Davix: Found redirection <PROPFIND davs://managernode.example.com:1094/cephfs/grid/atlas/user/scratch/bla3/foo.root https://dataserver.example.com:1094/cephfs/grid/atlas/user/scratch/bla3/foo.root>
DEBUG Davix: creating session keys... httpsdataserver.example.com:1094
DEBUG Davix: cached ne_session found ! taken from cache
DEBUG Davix: configure session...
DEBUG Davix: define connection timeout to 30
DEBUG Davix: define operation timeout to 8000
DEBUG Davix: add CA PATH /etc/grid-security/certificates/
DEBUG Davix: disable login/password authentication
DEBUG Davix: enable client cert authentication by callback
DEBUG Davix: -> negotiateRequest
DEBUG Davix: NEON start internal request
DEBUG Davix: Running pre_send hooks
DEBUG Davix: > PROPFIND /cephfs/grid/atlas/user/scratch/bla3/foo.root HTTP/1.1
> User-Agent: gfal2-util/1.5.3 gfal2/2.18.1 neon/0.0.29
> TE: trailers
> Host: dataserver.example.com:1094
> Depth: 0
>
DEBUG Davix: Sending request-line and headers:
DEBUG Davix: Doing SSL negotiation.
DEBUG Davix: Request sent; retry is 0.
DEBUG Davix: Aborted request (-4): Could not read status line
DEBUG Davix: sess: Closing connection.
DEBUG Davix: sess: Connection closed.
DEBUG Davix: Delete Cached redirection for <PROPFIND davs://managernode.example.com:1094/cephfs/grid/atlas/user/scratch/bla3/foo.root https://dataserver.example.com:1094/cephfs/grid/atlas/user/scratch/bla3/foo.root>
DEBUG Davix: Disable Session recycling
DEBUG Davix: -> Error when using reycling of session/redirect : cancel and try again
INFO Davix: Connection problem, retry
DEBUG Davix: (EndRequest)(Libneon) Operation incomplete, kill the connection
DEBUG Davix: sess: Not closing closed connection.
DEBUG Davix: Connection problem: eradicate session
DEBUG Davix: Disable Session recycling
DEBUG Davix: Running post_send hooks
DEBUG Davix: sess: Not closing closed connection.
DEBUG Davix: Running destroy hooks.
DEBUG Davix: Request ends.
After this, it tries to recover with metalink, and expectedly fails. It seems davix
does not re-attempt a connection to the dataserver node without the session cache usage.
Things work well when I use this straightaway:
export DAVIX_DISABLE_SESSION_CACHING=true
I tried both:
http.tlsreuse off
and
http.tlsreuse on
on the dataserver node, no change. Running gfal-copy
direclty against the data server also works.
Logs of the dataserver are pretty harmless:
200831 22:52:36 3487417 XrootdBridge: eyermuth.28:[log in to unmask] login as eyermuth
200831 22:52:36 3487417 acc_Audit: eyermuth.28:[log in to unmask] grant gsi eyermuth@[::ffff:188.184.103.253] stat /cephfs/grid/atlas/user/scratch/bla3/foo.root
200831 22:52:36 3487417 ofs_stat: eyermuth.28:[log in to unmask] Unable to locate /cephfs/grid/atlas/user/scratch/bla3/foo.root; no such file or directory
200831 22:52:36 3487417 XrootdXeq: eyermuth.28:[log in to unmask] disc 0:00:00 (send failure)
200831 22:52:37 3487243 XrootdBridge: eyermuth.29:[log in to unmask] login as eyermuth
200831 22:52:37 3487243 acc_Audit: eyermuth.29:[log in to unmask] grant gsi eyermuth@[::ffff:188.184.103.253] stat /cephfs/grid/atlas/user/scratch/bla3/foo.root
200831 22:52:37 3487243 ofs_stat: eyermuth.29:[log in to unmask] Unable to locate /cephfs/grid/atlas/user/scratch/bla3/foo.root; no such file or directory
200831 22:52:37 3487243 XrootdXeq: eyermuth.29:[log in to unmask] disc 0:00:00 (send failure)
Checking only the failed connection at highest debug level shows:
200831 22:57:16 3487721 XrdInet: Accepted connection from [log in to unmask]
200831 22:57:16 3487700 XrdSched: running main accept inq=0
200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: received dlen: 16
200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: received dump: 22 03 01 19 17 01 00 19 13 03 03 97 115 75 73 00
200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: This does not look like http at pos 0
200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: This may look like https
200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: Protocol matched. https: 1
200831 22:57:16 3487721 XrdProtLoad: matched port 1094 protocol XrdHttp
200831 22:57:16 3487721 anon.0:[log in to unmask] XrdPoll: FD 32 attached to poller 1; num=1
200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: Process. lp:0x7f9448006ba8 reqstate: 0
200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: Setting host: [::ffff:188.184.103.253]
200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: Entering SSL_accept...
200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: SSL_accept returned :1
200831 22:57:16 3487721 cryptossl_X509::CertType: certificate has 3 extensions
200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: Failed to extract DN information.
200831 22:57:16 3487721 sysXrdHttp: Cleanup
200831 22:57:16 3487721 sysXrdHttp: Reset
200831 22:57:16 3487721 sysXrdHttp: XrdHttpReq request ended.
200831 22:57:16 3487721 anon.0:[log in to unmask] XrdPoll: Poller 1 removing FD 32
200831 22:57:16 3487721 anon.0:[log in to unmask] XrdPoll: FD 32 detached from poller 1; num=0
In short, I am unsure what exactly is happening — things only break when using both a redirector and a dataserver node, only if the client uses session caching, and no matter if XRootD uses session caching, but the same setup works with 4.12.x.
Sadly, this means that it fails when FTS tries to transfer data.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or unsubscribe.
Use REPLY-ALL to reply to list
To unsubscribe from the XROOTD-DEV list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1