Dear experts,


We are observing issues between xrootd (both redirector and disk node) and FTS servers like these:
```
220712 09:41:26 10649 XrdLink: Unable to send to 7dc42260.5721:[log in to unmask]; connection timed out
220712 09:41:26 10649 XrootdXeq: 7dc42260.5721:[log in to unmask] disc 2:12:21 (socket error)
```
which cause most of our transfers to fail (failures are 10-100 times more frequent than successes).
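To quantify how many sessions end this way, the `XrootdXeq` disconnect lines can be tallied by reason. The following is only a minimal sketch based on the two lines quoted above: the regular expression, the helper names, the reading of `2:12:21` as an `H:MM:SS` session duration, and the label `clean` for disconnects without a parenthesised reason are all assumptions, not something taken from the xrootd documentation.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
# "220712 09:41:26 10649 XrootdXeq: <client> disc 2:12:21 (socket error)"
DISC_RE = re.compile(
    r"^(?P<date>\d{6}) (?P<time>\d{2}:\d{2}:\d{2}) \d+ XrootdXeq: "
    r"(?P<client>.+) disc (?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2})"
    r"(?: \((?P<reason>[^)]*)\))?\s*$"
)


def parse_disconnect(line):
    """Return (client, session_seconds, reason), or None for non-matching lines."""
    match = DISC_RE.match(line)
    if match is None:
        return None
    seconds = int(match["h"]) * 3600 + int(match["m"]) * 60 + int(match["s"])
    # Assumption: a disconnect without "(...)" is treated as a clean one.
    return match["client"], seconds, match["reason"] or "clean"


def count_reasons(lines):
    """Tally disconnect reasons across an iterable of log lines."""
    parsed = (parse_disconnect(line) for line in lines)
    return Counter(item[2] for item in parsed if item is not None)
```

Running `count_reasons(open("xrootd.log"))` on the disk node and redirector logs would then give a per-reason breakdown that can be compared with the FTS failure counts.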

On the FTS side we see this:
```
INFO    Tue, 12 Jul 2022 10:14:02 +0200; [1657613642182] DEST http_plugin	CHECKSUM:ENTER	
WARNING Tue, 12 Jul 2022 10:29:52 +0200; Timeout stopped
ERR     Tue, 12 Jul 2022 10:29:52 +0200; Recoverable error: [112] DESTINATION CHECKSUM (Neon): Could not read status line: Connection timed out
```
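The length of the stall can be read off the two timestamps in this excerpt. As a small sketch (the helper names are made up for illustration, and the pattern only covers the three log levels seen above), the gap between `CHECKSUM:ENTER` and the timeout could be computed like this:

```python
import re
from email.utils import parsedate_to_datetime

# The timestamp sits between the log level and the first ';' of each line.
STAMP_RE = re.compile(r"^(?:INFO|WARNING|ERR)\s+([^;]+);")


def timestamp(line):
    """Extract the timezone-aware timestamp from one FTS transfer log line."""
    match = STAMP_RE.match(line)
    if match is None:
        raise ValueError(f"no timestamp found in: {line!r}")
    return parsedate_to_datetime(match.group(1).strip())


def elapsed_seconds(start_line, end_line):
    """Seconds between two FTS log lines."""
    return (timestamp(end_line) - timestamp(start_line)).total_seconds()
```

For the `CHECKSUM:ENTER` and `Timeout stopped` lines above this gives 950 seconds, i.e. 15 min 50 s.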
In short: the transfer itself succeeds (we can see the files on disk), but the checksum step always times out.
We've tried many things to make it work, including a custom plugin for the checksum calculation.
We do not observe large delays in the checksum calculation itself, certainly nothing that would explain **15-minute delays**!
Both source and destination use `davs://` for the copy, i.e. HTTPS via the XROOTD path on our side.
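One way to time the checksum request outside of FTS is to issue it by hand. The sketch below assumes the checksum is requested with an HTTP `HEAD` carrying a `Want-Digest` header (RFC 3230) and answered in a `Digest` response header; whether this matches exactly what FTS sends is an assumption. The function names, the default algorithm, and the parameters `host`, `port`, `path`, `cert`, `key` and `capath` are placeholders, and `http.client` does not follow a redirect from the redirector to the disk node, so a 3xx status would need a second call against the `Location` target.

```python
import http.client
import ssl
import time


def parse_digest(header_value):
    """Split a Digest header such as 'adler32=0a1b2c3d' into {algorithm: value}."""
    digests = {}
    for item in (header_value or "").split(","):
        algorithm, sep, value = item.strip().partition("=")
        if sep:
            digests[algorithm.lower()] = value
    return digests


def timed_checksum(host, port, path, cert, key, capath=None,
                   algorithm="adler32", timeout=1200):
    """Send HEAD with Want-Digest and return (status, digests, seconds taken)."""
    # Assumption: client authentication uses an X.509 certificate/key pair
    # and capath points at a directory of trusted CA certificates.
    context = ssl.create_default_context(capath=capath)
    context.load_cert_chain(cert, key)
    conn = http.client.HTTPSConnection(host, port, timeout=timeout, context=context)
    start = time.monotonic()
    try:
        conn.request("HEAD", path, headers={"Want-Digest": algorithm})
        response = conn.getresponse()
        response.read()
        digests = parse_digest(response.getheader("Digest"))
    finally:
        conn.close()
    return response.status, digests, time.monotonic() - start
```

If the elapsed time here is short while FTS still times out, that would point away from the checksum calculation and towards the connection handling.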


## Logs

FTS log example:
 - [job](https://fts3-cms.cern.ch:8449/fts3/ftsmon/#/job/5055e2c2-01ba-11ed-845d-fa163eddde58)
 - [logs](https://fts-cms-05.cern.ch:8449/var/log/fts3/transfers/2022-07-12/cmsxrootd.hep.wisc.edu__xrootd.phy.bris.ac.uk/2022-07-12-0829__cmsxrootd.hep.wisc.edu__xrootd.phy.bris.ac.uk__3394306095__5055e2c2-01ba-11ed-845d-fa163eddde58)

Full xrootd.log:
 - [disknode](https://u.pcloud.link/publink/show?code=XZKk64VZMLNLNMDTN9VCtYs7OFDO4ju7vxJy) [16 MB]
 - [redirector](https://u.pcloud.link/publink/show?code=XZ0X64VZG1AQafVTmwml7aRM3iekLVcmDM5V) [8 MB]


## Versions, config and operations
We are running both redirector (xrootd.phy.bris.ac.uk:1094) and disk server (io-37-02.acrc.bris.ac.uk:1194) via Docker with `--net=host`.
FTS servers are reachable from within the containers (and host) via IPv4 and IPv6.

Our config can be found at https://github.com/BristolComputing/xrootd-se/tree/main/etc/xrootd (clustered + config.d).

Installed xrootd versions and plugins:
```
|libmacaroons.x86_64           |          0.3.0-2.el7 |           epel |
|scitokens-cpp.x86_64          |          0.7.1-1.el7 |           epel |
|voms.x86_64                   |   2.1.0-0.24.rc2.el7 |           epel |
|xrootd.x86_64                 | 1:5.4.3-1.1.osg36.el7 |            osg |
|xrootd-client.x86_64          | 1:5.4.3-1.1.osg36.el7 |            osg |
|xrootd-client-libs.x86_64     | 1:5.4.3-1.1.osg36.el7 |            osg |
|xrootd-cmstfc.x86_64          |    1.5.2-6.osg36.el7 |    osg-contrib |
|xrootd-lcmaps.x86_64          |       99-1.osg36.el7 |            osg |
|xrootd-libs.x86_64            | 1:5.4.3-1.1.osg36.el7 |            osg |
|xrootd-scitokens.x86_64       | 1:5.4.3-1.1.osg36.el7 |            osg |
|xrootd-selinux.noarch         | 1:5.4.3-1.1.osg36.el7 |            osg |
|xrootd-server.x86_64          | 1:5.4.3-1.1.osg36.el7 |            osg |
|xrootd-server-libs.x86_64     | 1:5.4.3-1.1.osg36.el7 |            osg |
|xrootd-voms.x86_64            | 1:5.4.3-1.1.osg36.el7 |            osg |
```
Hadoop and xrootd-hdfs versions:
```
Hadoop 3.3.1
Source code repository https://github.com/apache/hadoop.git -r a3b9c37a397ad4188041dd80621bdeefc46885f2
Compiled by ubuntu on 2021-06-15T05:13Z
Compiled with protoc 3.7.1
From source with checksum 88a4ddb2299aca054416d6b7f81ca55
This command was run using /opt/hadoop/share/hadoop/common/hadoop-common-3.3.1.jar

xrootd-hdfs 2.2.0
Source code repository https://github.com/uobdic/xrootd-hdfs.git -b kreczko-checksum-debug -r 66d7c97
Compiled by CentOS Linux release 7.9.2009 (Core) on 2022-07-08T13:59Z
```

## Other monitoring

### Failures
![image](https://user-images.githubusercontent.com/1213276/178455662-60cda892-4a3d-49e5-bb7c-773404ba550c.png)

### Successes
![image](https://user-images.githubusercontent.com/1213276/178455879-b874edca-8045-4414-bfac-105bc9c108f6.png)


99% of the failures are due to the mentioned timeout. Please note the different scales for the y-axis.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/xrootd/xrootd/issues/1736