Print

Print


I'm unsure if this is some kind of follow-up for #1252, it is pretty confusing... 

Here's what I observe:
- `gfal-copy -p` copying to a non-existent directory runs fine against XRootD 4.12.x, i.e. the parent directories are created. 
- `gfal-copy -p` also runs fine against a standalone XRootD 5.0.1 server, parent directories are created. 
- `gfal-copy -p` **fails** to create parent directories and aborts when a redirecting manager (no matter if 4.12.x or 5.0.1) is combined with a 5.0.1 data node, unless `DAVIX_DISABLE_SESSION_CACHING=true` is set. 

The error on the client side is:
```
[gfalt_copy_file][perform_copy][gfal_http_copy] DESTINATION MAKE_PARENT [gfal_http_copy_make_parent]   [gfal_http_exists][gfal_http_access][davix2gliberr] Result Authentication error, reached maximum number of attempts after 1 attempts
```

That's pretty confusing, here some more details. This is the failing case:
```
$ gfal-copy -p -vvv -f -K adler32 --checksum-mode both davs://somewhere.example.com/something.root davs://managernode.example.com:1094/cephfs/grid/atlas/user/scratch/bla3/foo.root
...
DEBUG    Davix: > PROPFIND /cephfs/grid/atlas/user/scratch/bla3/foo.root HTTP/1.1
...
DEBUG    Davix: < HTTP/1.1 302 Redirect
...
DEBUG    Davix: < Location: https://dataserver.example.com:1094/cephfs/grid/atlas/user/scratch/bla3/foo.root
...
DEBUG    Davix: > PROPFIND /cephfs/grid/atlas/user/scratch/bla3/foo.root HTTP/1.1
...
> Host: dataserver.example.com:1094
...
DEBUG    Davix: < HTTP/1.1 404 Not Found
DEBUG    Davix: < Connection: Close
DEBUG    Davix: < Content-Length: 90
...
...
...
DEBUG    Davix: Found redirection <PROPFIND davs://managernode.example.com:1094/cephfs/grid/atlas/user/scratch/bla3/foo.root https://dataserver.example.com:1094/cephfs/grid/atlas/user/scratch/bla3/foo.root>
DEBUG    Davix:  creating session keys... httpsdataserver.example.com:1094
DEBUG    Davix: cached ne_session found ! taken from cache 
DEBUG    Davix: configure session...
DEBUG    Davix: define connection timeout to 30
DEBUG    Davix: define operation timeout to 8000
DEBUG    Davix: add CA PATH /etc/grid-security/certificates/
DEBUG    Davix: disable login/password authentication
DEBUG    Davix: enable client cert authentication by callback 
DEBUG    Davix:  -> negotiateRequest
DEBUG    Davix: NEON start internal request
DEBUG    Davix: Running pre_send hooks
DEBUG    Davix: > PROPFIND /cephfs/grid/atlas/user/scratch/bla3/foo.root HTTP/1.1
> User-Agent: gfal2-util/1.5.3 gfal2/2.18.1 neon/0.0.29
> TE: trailers
> Host: dataserver.example.com:1094
> Depth: 0
> 

DEBUG    Davix: Sending request-line and headers:
DEBUG    Davix: Doing SSL negotiation.
DEBUG    Davix: Request sent; retry is 0.
DEBUG    Davix: Aborted request (-4): Could not read status line
DEBUG    Davix: sess: Closing connection.
DEBUG    Davix: sess: Connection closed.
DEBUG    Davix: Delete Cached redirection for <PROPFIND davs://managernode.example.com:1094/cephfs/grid/atlas/user/scratch/bla3/foo.root https://dataserver.example.com:1094/cephfs/grid/atlas/user/scratch/bla3/foo.root>
DEBUG    Davix: Disable Session recycling
DEBUG    Davix:  ->  Error when using reycling of session/redirect : cancel and try again
INFO     Davix: Connection problem, retry
DEBUG    Davix: (EndRequest)(Libneon) Operation incomplete, kill the connection
DEBUG    Davix: sess: Not closing closed connection.
DEBUG    Davix: Connection problem: eradicate session
DEBUG    Davix: Disable Session recycling
DEBUG    Davix: Running post_send hooks
DEBUG    Davix: sess: Not closing closed connection.
DEBUG    Davix: Running destroy hooks.
DEBUG    Davix: Request ends.
```

After this, it tries to recover with metalink, and expectedly fails. It seems `davix` does not re-attempt a connection to the dataserver node without the session cache usage. 

Things work well when I use this straightaway:
```
export DAVIX_DISABLE_SESSION_CACHING=true
```

I tried both:
```
http.tlsreuse off
```
and
```
http.tlsreuse on
```
on the dataserver node, no change. Running `gfal-copy` direclty against the data server also works. 


Logs of the dataserver are pretty harmless:
```
200831 22:52:36 3487417 XrootdBridge: eyermuth.28:[log in to unmask] login as eyermuth
200831 22:52:36 3487417 acc_Audit: eyermuth.28:[log in to unmask] grant gsi eyermuth@[::ffff:188.184.103.253] stat /cephfs/grid/atlas/user/scratch/bla3/foo.root
200831 22:52:36 3487417 ofs_stat: eyermuth.28:[log in to unmask] Unable to locate /cephfs/grid/atlas/user/scratch/bla3/foo.root; no such file or directory
200831 22:52:36 3487417 XrootdXeq: eyermuth.28:[log in to unmask] disc 0:00:00 (send failure)
200831 22:52:37 3487243 XrootdBridge: eyermuth.29:[log in to unmask] login as eyermuth
200831 22:52:37 3487243 acc_Audit: eyermuth.29:[log in to unmask] grant gsi eyermuth@[::ffff:188.184.103.253] stat /cephfs/grid/atlas/user/scratch/bla3/foo.root
200831 22:52:37 3487243 ofs_stat: eyermuth.29:[log in to unmask] Unable to locate /cephfs/grid/atlas/user/scratch/bla3/foo.root; no such file or directory
200831 22:52:37 3487243 XrootdXeq: eyermuth.29:[log in to unmask] disc 0:00:00 (send failure)
```
Checking only the failed connection at highest debug level shows:
```
200831 22:57:16 3487721 XrdInet: Accepted connection from [log in to unmask]
200831 22:57:16 3487700 XrdSched: running main accept inq=0
200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: received dlen: 16
200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: received dump: 22 03 01 19 17 01 00 19 13 03 03 97 115 75 73 00 
200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: This does not look like http at pos 0
200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: This may look like https
200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: Protocol matched. https: 1
200831 22:57:16 3487721 XrdProtLoad: matched port 1094 protocol XrdHttp
200831 22:57:16 3487721 anon.0:[log in to unmask] XrdPoll: FD 32 attached to poller 1; num=1
200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp:  Process. lp:0x7f9448006ba8 reqstate: 0
200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp:  Setting host: [::ffff:188.184.103.253]
200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp:  Entering SSL_accept...
200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp:  SSL_accept returned :1
200831 22:57:16 3487721 cryptossl_X509::CertType: certificate has 3 extensions
200831 22:57:16 3487721 anon.0:[log in to unmask] sysXrdHttp: Failed to extract DN information.
200831 22:57:16 3487721 sysXrdHttp:  Cleanup
200831 22:57:16 3487721 sysXrdHttp:  Reset
200831 22:57:16 3487721 sysXrdHttp:  XrdHttpReq request ended.
200831 22:57:16 3487721 anon.0:[log in to unmask] XrdPoll: Poller 1 removing FD 32
200831 22:57:16 3487721 anon.0:[log in to unmask] XrdPoll: FD 32 detached from poller 1; num=0
```

In short, I am unsure what exactly is happening — things only break when using both a redirector and a dataserver node, only if the client uses session caching, and no matter if XRootD uses session caching, but the same setup works with 4.12.x. 
Sadly, this means that it fails when FTS tries to transfer data. 

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/xrootd/xrootd/issues/1276
########################################################################
Use REPLY-ALL to reply to list

To unsubscribe from the XROOTD-DEV list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1