Dear experts,
We are observing issues between xrootd (both redirector and disk node) and FTS servers like these:
220712 09:41:26 10649 XrdLink: Unable to send to 7dc42260.5721:[log in to unmask]; connection timed out
220712 09:41:26 10649 XrootdXeq: 7dc42260.5721:[log in to unmask] disc 2:12:21 (socket error)
which cause most of our transfers to fail (failures outnumber successes by a factor of 10 to 100).
On the FTS side we see this:
INFO Tue, 12 Jul 2022 10:14:02 +0200; [1657613642182] DEST http_plugin CHECKSUM:ENTER
WARNING Tue, 12 Jul 2022 10:29:52 +0200; Timeout stopped
ERR Tue, 12 Jul 2022 10:29:52 +0200; Recoverable error: [112] DESTINATION CHECKSUM (Neon): Could not read status line: Connection timed out
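To put a number on the wait, here is a small sketch that measures the gap between the CHECKSUM:ENTER line and the timeout line quoted above. The helper name `fts_timestamp` is made up for illustration, and the sketch assumes every FTS line has the shape "LEVEL date; message" as in the excerpt.

```python
from email.utils import parsedate_to_datetime


def fts_timestamp(line):
    """Extract the timestamp from an FTS log line of the form 'LEVEL date; message'."""
    head = line.split(";", 1)[0]       # everything before the first ';'
    stamp = head.split(None, 1)[1]     # drop the level token (INFO, WARNING, ERR)
    return parsedate_to_datetime(stamp)


enter = fts_timestamp(
    "INFO Tue, 12 Jul 2022 10:14:02 +0200; [1657613642182] DEST http_plugin CHECKSUM:ENTER"
)
timed_out = fts_timestamp("WARNING Tue, 12 Jul 2022 10:29:52 +0200; Timeout stopped")

elapsed = timed_out - enter
print(elapsed.total_seconds())  # 950.0, i.e. 15 min 50 s
```

So in this example FTS waited just under 16 minutes for the checksum response before giving up.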
In short: transfer succeeds (we can see the files on disk), but the checksum part always times out.
We've tried many things to make it work, including a custom plugin for the checksum calculation.
We do not observe huge delays in the checksum calculation - nothing that would explain the timeout.
Both source and destination use davs:// for the copy, i.e. HTTPS via the XRootD path on our side.
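For anyone who wants to reproduce the checksum step without FTS, a minimal sketch follows. It assumes that the checksum is requested with a HEAD request carrying a `Want-Digest` header (RFC 3230) and that the server answers with a `Digest` header; we have not confirmed that this is exactly what FTS sends. The function names, the default timeout, and the certificate handling are illustrative only, and the default TLS context may need the grid CA certificates added before it trusts the server.

```python
import ssl
import time
import urllib.request


def build_checksum_request(url, algorithm="adler32"):
    """Return a HEAD request that asks the server for a digest (RFC 3230)."""
    request = urllib.request.Request(url, method="HEAD")
    request.add_header("Want-Digest", algorithm)
    return request


def timed_checksum(url, cert=None, key=None, algorithm="adler32", timeout=1200):
    """Send the request; return (Digest header or None, elapsed seconds)."""
    context = ssl.create_default_context()
    if cert:
        # Client certificate or proxy used for authentication (assumed setup)
        context.load_cert_chain(cert, key)
    request = build_checksum_request(url, algorithm)
    start = time.monotonic()
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        digest = response.headers.get("Digest")
    return digest, time.monotonic() - start
```

Pointing `timed_checksum` at an https:// URL on the disk server (io-37-02.acrc.bris.ac.uk:1194) rather than the redirector avoids having to follow a redirect, and the elapsed time can be compared directly with the FTS timeout.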
FTS log example:
Full xrootd.log:
We are running both the redirector (xrootd.phy.bris.ac.uk:1094) and the disk server (io-37-02.acrc.bris.ac.uk:1194) via Docker with --net=host.
FTS servers are reachable from within the containers (and host) via IPv4 and IPv6.
Our config can be found at https://github.com/BristolComputing/xrootd-se/tree/main/etc/xrootd (clustered + config.d).
Installed xrootd versions and plugins:
| Package | Version | Repository |
|---|---|---|
|libmacaroons.x86_64 | 0.3.0-2.el7 | epel |
|scitokens-cpp.x86_64 | 0.7.1-1.el7 | epel |
|voms.x86_64 | 2.1.0-0.24.rc2.el7 | epel |
|xrootd.x86_64 | 1:5.4.3-1.1.osg36.el7 | osg |
|xrootd-client.x86_64 | 1:5.4.3-1.1.osg36.el7 | osg |
|xrootd-client-libs.x86_64 | 1:5.4.3-1.1.osg36.el7 | osg |
|xrootd-cmstfc.x86_64 | 1.5.2-6.osg36.el7 | osg-contrib |
|xrootd-lcmaps.x86_64 | 99-1.osg36.el7 | osg |
|xrootd-libs.x86_64 | 1:5.4.3-1.1.osg36.el7 | osg |
|xrootd-scitokens.x86_64 | 1:5.4.3-1.1.osg36.el7 | osg |
|xrootd-selinux.noarch | 1:5.4.3-1.1.osg36.el7 | osg |
|xrootd-server.x86_64 | 1:5.4.3-1.1.osg36.el7 | osg |
|xrootd-server-libs.x86_64 | 1:5.4.3-1.1.osg36.el7 | osg |
|xrootd-voms.x86_64 | 1:5.4.3-1.1.osg36.el7 | osg |
xrootd-hdfs
Hadoop 3.3.1
Source code repository https://github.com/apache/hadoop.git -r a3b9c37a397ad4188041dd80621bdeefc46885f2
Compiled by ubuntu on 2021-06-15T05:13Z
Compiled with protoc 3.7.1
From source with checksum 88a4ddb2299aca054416d6b7f81ca55
This command was run using /opt/hadoop/share/hadoop/common/hadoop-common-3.3.1.jar
xrootd-hdfs 2.2.0
Source code repository https://github.com/uobdic/xrootd-hdfs.git -b kreczko-checksum-debug -r 66d7c97
Compiled by CentOS Linux release 7.9.2009 (Core) on 2022-07-08T13:59Z
99% of the failures are due to the mentioned timeout. Please note the different scales for the y-axis.