I'm unsure if this is some kind of follow-up for #1252, it is pretty confusing... Here's what I observe: - `gfal-copy -p` copying to a non-existent directory runs fine against XRootD 4.12.x, i.e. the parent directories are created. - `gfal-copy -p` also runs fine against a standalone XRootD 5.0.1 server, parent directories are created. - `gfal-copy -p` **fails** to create parent directories and aborts when a redirecting manager (no matter if 4.12.x or 5.0.1) is combined with a 5.0.1 data node, unless `DAVIX_DISABLE_SESSION_CACHING=true` is set. The error on the client side is: ``` [gfalt_copy_file][perform_copy][gfal_http_copy] DESTINATION MAKE_PARENT [gfal_http_copy_make_parent] [gfal_http_exists][gfal_http_access][davix2gliberr] Result Authentication error, reached maximum number of attempts after 1 attempts ``` That's pretty confusing, here some more details. This is the failing case: ``` $ gfal-copy -p -vvv -f -K adler32 --checksum-mode both davs://somewhere.example.com/something.root davs://managernode.example.com:1094/cephfs/grid/atlas/user/scratch/bla3/foo.root ... DEBUG Davix: > PROPFIND /cephfs/grid/atlas/user/scratch/bla3/foo.root HTTP/1.1 ... DEBUG Davix: < HTTP/1.1 302 Redirect ... DEBUG Davix: < Location: https://dataserver.example.com:1094/cephfs/grid/atlas/user/scratch/bla3/foo.root ... DEBUG Davix: > PROPFIND /cephfs/grid/atlas/user/scratch/bla3/foo.root HTTP/1.1 ... > Host: dataserver.example.com:1094 ... DEBUG Davix: < HTTP/1.1 404 Not Found DEBUG Davix: < Connection: Close DEBUG Davix: < Content-Length: 90 ... ... ... DEBUG Davix: Found redirection <PROPFIND davs://managernode.example.com:1094/cephfs/grid/atlas/user/scratch/bla3/foo.root https://dataserver.example.com:1094/cephfs/grid/atlas/user/scratch/bla3/foo.root> DEBUG Davix: creating session keys... httpsdataserver.example.com:1094 DEBUG Davix: cached ne_session found ! taken from cache DEBUG Davix: configure session... DEBUG Davix: define connection timeout to 30 DEBUG Davix: define operation timeout to 8000 DEBUG Davix: add CA PATH /etc/grid-security/certificates/ DEBUG Davix: disable login/password authentication DEBUG Davix: enable client cert authentication by callback DEBUG Davix: -> negotiateRequest DEBUG Davix: NEON start internal request DEBUG Davix: Running pre_send hooks DEBUG Davix: > PROPFIND /cephfs/grid/atlas/user/scratch/bla3/foo.root HTTP/1.1 > User-Agent: gfal2-util/1.5.3 gfal2/2.18.1 neon/0.0.29 > TE: trailers > Host: dataserver.example.com:1094 > Depth: 0 > DEBUG Davix: Sending request-line and headers: DEBUG Davix: Doing SSL negotiation. DEBUG Davix: Request sent; retry is 0. DEBUG Davix: Aborted request (-4): Could not read status line DEBUG Davix: sess: Closing connection. DEBUG Davix: sess: Connection closed. DEBUG Davix: Delete Cached redirection for <PROPFIND davs://managernode.example.com:1094/cephfs/grid/atlas/user/scratch/bla3/foo.root https://dataserver.example.com:1094/cephfs/grid/atlas/user/scratch/bla3/foo.root> DEBUG Davix: Disable Session recycling DEBUG Davix: -> Error when using reycling of session/redirect : cancel and try again INFO Davix: Connection problem, retry DEBUG Davix: (EndRequest)(Libneon) Operation incomplete, kill the connection DEBUG Davix: sess: Not closing closed connection. DEBUG Davix: Connection problem: eradicate session DEBUG Davix: Disable Session recycling DEBUG Davix: Running post_send hooks DEBUG Davix: sess: Not closing closed connection. DEBUG Davix: Running destroy hooks. DEBUG Davix: Request ends. ``` After this, it tries to recover with metalink, and expectedly fails. It seems `davix` does not re-attempt a connection to the dataserver node without the session cache usage. Things work well when I use this straightaway: ``` export DAVIX_DISABLE_SESSION_CACHING=true ``` I tried both: ``` http.tlsreuse off ``` and ``` http.tlsreuse on ``` on the dataserver node, no change. Running `gfal-copy` direclty against the data server also works. Logs of the dataserver are pretty harmless: ``` 200831 22:52:36 3487417 XrootdBridge: eyermuth.28:[log in to unmask] login as eyermuth 200831 22:52:36 3487417 acc_Audit: eyermuth.28:[log in to unmask] grant gsi eyermuth@[::ffff:188.184.103.253] stat /cephfs/grid/atlas/user/scratch/bla3/foo.root 200831 22:52:36 3487417 ofs_stat: eyermuth.28:[log in to unmask] Unable to locate /cephfs/grid/atlas/user/scratch/bla3/foo.root; no such file or directory 200831 22:52:36 3487417 XrootdXeq: eyermuth.28:[log in to unmask] disc 0:00:00 (send failure) 200831 22:52:37 3487243 XrootdBridge: eyermuth.29:[log in to unmask] login as eyermuth 200831 22:52:37 3487243 acc_Audit: eyermuth.29:[log in to unmask] grant gsi eyermuth@[::ffff:188.184.103.253] stat /cephfs/grid/atlas/user/scratch/bla3/foo.root 200831 22:52:37 3487243 ofs_stat: eyermuth.29:[log in to unmask] Unable to locate /cephfs/grid/atlas/user/scratch/bla3/foo.root; no such file or directory 200831 22:52:37 3487243 XrootdXeq: eyermuth.29:[log in to unmask] disc 0:00:00 (send failure) ``` Checking only the failed connection at highest debug level shows: ``` 200831 22:57:16 3487721 XrdInet: Accepted connection from [log in to unmask] 200831 22:57:16 3487700 XrdSched: running main accept inq=0 200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: received dlen: 16 200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: received dump: 22 03 01 19 17 01 00 19 13 03 03 97 115 75 73 00 200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: This does not look like http at pos 0 200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: This may look like https 200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: Protocol matched. https: 1 200831 22:57:16 3487721 XrdProtLoad: matched port 1094 protocol XrdHttp 200831 22:57:16 3487721 anon.0:[log in to unmask] XrdPoll: FD 32 attached to poller 1; num=1 200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: Process. lp:0x7f9448006ba8 reqstate: 0 200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: Setting host: [::ffff:188.184.103.253] 200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: Entering SSL_accept... 200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: SSL_accept returned :1 200831 22:57:16 3487721 cryptossl_X509::CertType: certificate has 3 extensions 200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: Failed to extract DN information. 200831 22:57:16 3487721 sysXrdHttp: Cleanup 200831 22:57:16 3487721 sysXrdHttp: Reset 200831 22:57:16 3487721 sysXrdHttp: XrdHttpReq request ended. 200831 22:57:16 3487721 anon.0:[log in to unmask] XrdPoll: Poller 1 removing FD 32 200831 22:57:16 3487721 anon.0:[log in to unmask] XrdPoll: FD 32 detached from poller 1; num=0 ``` In short, I am unsure what exactly is happening — things only break when using both a redirector and a dataserver node, only if the client uses session caching, and no matter if XRootD uses session caching, but the same setup works with 4.12.x. Sadly, this means that it fails when FTS tries to transfer data. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/xrootd/xrootd/issues/1276 ######################################################################## Use REPLY-ALL to reply to list To unsubscribe from the XROOTD-DEV list, click the following link: https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1