Dear experts,

We are observing issues between xrootd (both redirector and disk node) and the FTS servers, such as:

```
220712 09:41:26 10649 XrdLink: Unable to send to 7dc42260.5721:[log in to unmask]; connection timed out
220712 09:41:26 10649 XrootdXeq: 7dc42260.5721:[log in to unmask] disc 2:12:21 (socket error)
```

These cause most of our transfers to fail (failures outnumber successes by a factor of 10-100).

On the FTS side we see this:

```
INFO Tue, 12 Jul 2022 10:14:02 +0200; [1657613642182] DEST http_plugin CHECKSUM:ENTER
WARNING Tue, 12 Jul 2022 10:29:52 +0200; Timeout stopped
ERR Tue, 12 Jul 2022 10:29:52 +0200; Recoverable error: [112] DESTINATION CHECKSUM (Neon): Could not read status line: Connection timed out
```

In short: the transfer succeeds (we can see the files on disk), but the checksum step always times out.
We've tried many things to make it work, including a custom plugin for the checksum calculation.
We do not observe large delays in the checksum calculation itself, and certainly nothing that would explain :red_square: 15 minute delays :red_square:!

Both source and destination use `davs://` for the copy, i.e. HTTPS via the XRootD path on our side.

## Logs

FTS log example:

- [job](https://fts3-cms.cern.ch:8449/fts3/ftsmon/#/job/5055e2c2-01ba-11ed-845d-fa163eddde58)
- [logs](https://fts-cms-05.cern.ch:8449/var/log/fts3/transfers/2022-07-12/cmsxrootd.hep.wisc.edu__xrootd.phy.bris.ac.uk/2022-07-12-0829__cmsxrootd.hep.wisc.edu__xrootd.phy.bris.ac.uk__3394306095__5055e2c2-01ba-11ed-845d-fa163eddde58)

Full xrootd.log:

- [disknode](https://u.pcloud.link/publink/show?code=XZKk64VZMLNLNMDTN9VCtYs7OFDO4ju7vxJy) [16 MB]
- [redirector](https://u.pcloud.link/publink/show?code=XZ0X64VZG1AQafVTmwml7aRM3iekLVcmDM5V) [8 MB]

## Versions, config and operations

We are running both the redirector (xrootd.phy.bris.ac.uk:1094) and the disk server (io-37-02.acrc.bris.ac.uk:1194) via Docker with `--net=host`.
The FTS servers are reachable from within the containers (and from the host) via IPv4 and IPv6.
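For anyone who wants to repeat the reachability check, here is a minimal sketch of the kind of test meant here. It is not the exact procedure we used: the function name `check_reachability` is hypothetical, and the host and port in the commented example (`fts-cms-05.cern.ch`, `8449`) are simply taken from the FTS log link above, on the assumption that this is a relevant endpoint. It only tests that a TCP connection can be opened per address family; it says nothing about TLS or the checksum request itself.

```python
import socket


def check_reachability(host, port, timeout=5.0):
    """Attempt a TCP connect to host:port separately over IPv4 and IPv6.

    Returns a dict such as {"IPv4": True, "IPv6": False}.
    """
    results = {}
    for family, name in ((socket.AF_INET, "IPv4"), (socket.AF_INET6, "IPv6")):
        try:
            # Resolve only addresses of the requested family.
            infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
        except socket.gaierror:
            # No address of this family could be resolved for the host.
            results[name] = False
            continue

        reachable = False
        for af, socktype, proto, _canonname, sockaddr in infos:
            try:
                with socket.socket(af, socktype, proto) as sock:
                    sock.settimeout(timeout)
                    sock.connect(sockaddr)
                    reachable = True
                    break
            except OSError:
                # Connection refused, timed out, or family unsupported locally;
                # try the next resolved address, if any.
                continue
        results[name] = reachable
    return results


# Example call (assumed endpoint, taken from the FTS log URL):
# print(check_reachability("fts-cms-05.cern.ch", 8449))
```

Running this both on the host and inside each container would show whether the two address families behave the same in both places.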
Our config can be found on https://github.com/BristolComputing/xrootd-se/tree/main/etc/xrootd (clustered + config.d).

Installed xrootd versions and plugins:

```
|libmacaroons.x86_64        | 0.3.0-2.el7           | epel        |
|scitokens-cpp.x86_64       | 0.7.1-1.el7           | epel        |
|voms.x86_64                | 2.1.0-0.24.rc2.el7    | epel        |
|xrootd.x86_64              | 1:5.4.3-1.1.osg36.el7 | osg         |
|xrootd-client.x86_64       | 1:5.4.3-1.1.osg36.el7 | osg         |
|xrootd-client-libs.x86_64  | 1:5.4.3-1.1.osg36.el7 | osg         |
|xrootd-cmstfc.x86_64       | 1.5.2-6.osg36.el7     | osg-contrib |
|xrootd-lcmaps.x86_64       | 99-1.osg36.el7        | osg         |
|xrootd-libs.x86_64         | 1:5.4.3-1.1.osg36.el7 | osg         |
|xrootd-scitokens.x86_64    | 1:5.4.3-1.1.osg36.el7 | osg         |
|xrootd-selinux.noarch      | 1:5.4.3-1.1.osg36.el7 | osg         |
|xrootd-server.x86_64       | 1:5.4.3-1.1.osg36.el7 | osg         |
|xrootd-server-libs.x86_64  | 1:5.4.3-1.1.osg36.el7 | osg         |
|xrootd-voms.x86_64         | 1:5.4.3-1.1.osg36.el7 | osg         |
```

xrootd-hdfs

```
Hadoop 3.3.1
Source code repository https://github.com/apache/hadoop.git -r a3b9c37a397ad4188041dd80621bdeefc46885f2
Compiled by ubuntu on 2021-06-15T05:13Z
Compiled with protoc 3.7.1
From source with checksum 88a4ddb2299aca054416d6b7f81ca55
This command was run using /opt/hadoop/share/hadoop/common/hadoop-common-3.3.1.jar

xrootd-hdfs 2.2.0
Source code repository https://github.com/uobdic/xrootd-hdfs.git -b kreczko-checksum-debug -r 66d7c97
Compiled by CentOS Linux release 7.9.2009 (Core) on 2022-07-08T13:59Z
```

## Other monitoring

### Failures

![image](https://user-images.githubusercontent.com/1213276/178455662-60cda892-4a3d-49e5-bb7c-773404ba550c.png)

### Successes

![image](https://user-images.githubusercontent.com/1213276/178455879-b874edca-8045-4414-bfac-105bc9c108f6.png)

99% of the failures are due to the mentioned timeout. Please note the different scales for the y-axis.

--
Reply to this email directly or view it on GitHub:
https://github.com/xrootd/xrootd/issues/1736