OK, problem understood:

On the ATLAS instance, we discovered that many HTTP TPC PUSH transfers are failing on some FSTs.

For example:

cat /var/log/eos/fst/xrdlog.fst | grep CRL | tail
230818 10:33:07 864 TPC_PushRequest: event=TRANSFER_FAIL, local=/eos/atlas/atlasdatadisk/rucio/mc16_13TeV/1b/61/HITS.34368181._000586.pool.root.1, remote=https://atlas-google-europe-west1.cern.ch:443/atlas-europe-west1-datadisk/rucio/mc16_13TeV/HITS.34368181._000586.pool.root.1?GoogleAccessId=rucio-service%40rucio-integration-dev.iam.gserviceaccount.com&Expires=1692351187&Signature=LUzLgE5c%2FmARMvQ%2BmVyMQcR1TnDBVT8fjs5h%2F0s7VH8JeWwVp2os8vSG%2BcsfkiOGQvKYZk3NYXbymb30DHbSvCXtkXUb1QxmVentIvmUUA3wBiib03z6DYjxcpFFD6gMr6%2BTyg64iooPWaAXOJ8xyqPahLd%2ByHV%2FHgH9zz9nIj2e7ybWWDY3E71QOZHY1XDcmq%2BrCh9vNFjxsgQhXOZR7Jj5%2B%2FhWir6aq8A7WQZXS7t8S%2BhwVIj2IgG4eg30p8A7YOkT5qx6rrpSmVW7sRLhNE9sI0L%2FFthW6hVFGYELWtGI7cQOOpn8RAWV6%2BJVDJVSsFjAb5XFNBY1Mc02zSXJJA%3D%3D, user=(anonymous), bytes_transferred=0; HTTP library failure: Failed to load CRL file (path? access rights?, format?)

After a lot of investigation, we found that the failure was caused by curl failing to load the CRL file fetched from the URLs listed in the package ca_DigiCertGridCA-1G2-Classic-2015-1.122-1.noarch.

The dump of that CRL file is:

$ openssl crl -in /etc/grid-security/certificates/8dd53007.r0 -text
Certificate Revocation List (CRL):
        Version 2 (0x1)
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: /DC=com/DC=DigiCert-Grid/O=DigiCert Grid/CN=DigiCert Grid CA-1 G2
        Last Update: Aug 16 20:48:10 2023 GMT
        Next Update: Sep 15 20:48:10 2023 GMT
        CRL extensions:
            X509v3 Authority Key Identifier: 
                keyid:36:59:CC:DF:3C:3A:E2:49:93:4F:25:12:42:B6:CF:E9:19:1F:32:B4
            X509v3 CRL Number: 
                3052
            X509v3 Issuing Distrubution Point: critical
                Full Name:
                  URI:http://crl3.digicert.com/DigiCertGridCA-1-G2.crl
                  URI:http://crl4.digicert.com/DigiCertGridCA-1-G2.crl
No Revoked Certificates.
    Signature Algorithm: sha256WithRSAEncryption
         6e:a0:79:2c:dc:e8:6d:74:a9:1d:9e:b6:01:5d:c7:af:b3:64:
         04:80:75:d0:94:c6:10:07:51:37:78:dc:f6:cf:8d:eb:35:1a:
         f8:fc:5f:0e:56:c2:6f:6a:e0:3c:81:ca:77:9a:6b:c8:5a:67:
         ab:71:13:23:2f:06:e1:f7:c2:10:1d:32:e3:d5:44:d6:31:99:
         e1:69:4c:0e:90:ae:c1:99:e2:35:55:88:e1:78:46:9f:10:60:
         9f:43:0b:a8:ce:f5:2e:eb:b9:bd:54:28:07:33:c2:32:7c:a4:
         1b:8e:d6:aa:06:b9:d4:cd:c2:45:2d:76:35:02:60:45:f0:00:
         5f:c3:5d:bf:ff:65:2b:57:f8:bd:ab:ad:32:a8:ae:1c:d0:51:
         4b:d7:92:be:39:2c:ea:e1:9b:d5:66:6b:e4:91:4e:fe:da:9d:
         16:13:b8:7c:a8:92:07:c4:26:35:78:bb:5e:38:a5:11:53:c2:
         6a:54:18:15:75:d6:61:80:83:de:14:d2:ad:96:49:b2:15:06:
         9a:e8:ee:d9:b8:9d:90:3e:74:9d:26:3c:5f:fd:92:7a:1f:5a:
         bf:65:3a:20:b4:0a:cc:c2:00:5c:6a:6f:28:a3:4a:f5:09:fb:
         0a:52:e3:f9:52:00:f5:2d:ee:a2:92:94:bf:24:7b:3c:14:70:
         af:03:f8:4e
-----BEGIN X509 CRL-----
MIICYDCCAUgCAQEwDQYJKoZIhvcNAQELBQAwbDETMBEGCgmSJomT8ixkARkWA2Nv
bTEdMBsGCgmSJomT8ixkARkWDURpZ2lDZXJ0LUdyaWQxFjAUBgNVBAoTDURpZ2lD
ZXJ0IEdyaWQxHjAcBgNVBAMTFURpZ2lDZXJ0IEdyaWQgQ0EtMSBHMhcNMjMwODE2
MjA0ODEwWhcNMjMwOTE1MjA0ODEwWqCBpzCBpDAfBgNVHSMEGDAWgBQ2WczfPDri
SZNPJRJCts/pGR8ytDALBgNVHRQEBAICC+wwdAYDVR0cAQH/BGowaKBmoGSGMGh0
dHA6Ly9jcmwzLmRpZ2ljZXJ0LmNvbS9EaWdpQ2VydEdyaWRDQS0xLUcyLmNybIYw
aHR0cDovL2NybDQuZGlnaWNlcnQuY29tL0RpZ2lDZXJ0R3JpZENBLTEtRzIuY3Js
MA0GCSqGSIb3DQEBCwUAA4IBAQBuoHks3OhtdKkdnrYBXcevs2QEgHXQlMYQB1E3
eNz2z43rNRr4/F8OVsJvauA8gcp3mmvIWmercRMjLwbh98IQHTLj1UTWMZnhaUwO
kK7BmeI1VYjheEafEGCfQwuozvUu67m9VCgHM8IyfKQbjtaqBrnUzcJFLXY1AmBF
8ABfw12//2UrV/i9q60yqK4c0FFL15K+OSzq4ZvVZmvkkU7+2p0WE7h8qJIHxCY1
eLteOKURU8JqVBgVddZhgIPeFNKtlkmyFQaa6O7ZuJ2QPnSdJjxf/ZJ6H1q/ZTog
tArMwgBcam8oo0r1CfsKUuP5UgD1Le6ikpS/JHs8FHCvA/hO
-----END X509 CRL-----

If we follow the URL, this CRL actually comes from the following page: https://www.digicert-grid.com/ (look for DigiCert Grid CA-1 G2).

We noticed that if that CRL is NOT the first one read by curl, the transfers succeed. But in the case of the failed transfer above, the bundled CRL file has that CRL in the first position. We only hit this problem by chance!
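
Since the position of the CRL in the bundle matters, it can help to check which CRL comes first. The following is a minimal sketch, not part of the original investigation: first_crl is a hypothetical helper name, and the bundle path in the usage line is a placeholder because the actual path of the CRL bundle is not given here.

```shell
# Print only the first PEM-encoded CRL block found in a bundle file.
# first_crl is a hypothetical helper name used for illustration.
first_crl() {
  awk '/-----BEGIN X509 CRL-----/ { inside = 1 }   # start of a CRL block
       inside { print }                            # copy lines inside the block
       /-----END X509 CRL-----/ { if (inside) exit }  # stop after the first block
      ' "$1"
}
```

Usage, assuming openssl is installed: first_crl /path/to/crl_bundle.pem | openssl crl -noout -issuer
If the issuer printed is the DigiCert Grid CA-1 G2 one shown in the dump above, the bundle is in the failing order.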

I initially thought that the CRL was malformed. But then why would openssl be able to dump it without any issue?

The fault does not come from the CRL itself. I tried to use that CRL on our CentOS 7 machine with the following curl command, and it fails:

$ curl -v --cacert /tmp/fst/.xrdtls/ca_file.pem --crlfile /etc/grid-security/certificates/8dd53007.r0 https://storage01.lcg.cscs.ch:2880/pnfs/lcg.cscs.ch/atlas/atlasdatadisk/rucio/mc23_13p6TeV/a6/65
* About to connect() to storage01.lcg.cscs.ch port 2880 (#0)
*   Trying 2001:620:808:4321::181...
* Connected to storage01.lcg.cscs.ch (2001:620:808:4321::181) port 2880 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /tmp/fst/.xrdtls/ca_file.pem
  CApath: none
* Closing connection 0
curl: (82) Failed to load CRL file (path? access rights?, format?)

BUT, on a machine running RHEL 9, everything works fine.

I managed to find what the problem is. I noticed that the curl installed on RHEL 9 has been compiled with OpenSSL:

$ curl -V
curl 7.76.1 (x86_64-redhat-linux-gnu) libcurl/7.76.1 OpenSSL/3.0.7 zlib/1.2.11 brotli/1.0.9 libidn2/2.3.0 libpsl/0.21.1 (+libidn2/2.3.0) libssh/0.10.4/openssl/zlib nghttp2/1.43.0

but on the CentOS 7 machine where the XRootD server is running, it is not the case:

$ curl -V
curl 7.29.0 (x86_64-redhat-linux-gnu) libcurl/7.29.0 NSS/3.53.1 zlib/1.2.7 libidn/1.28 libssh2/1.8.0

After compiling curl 7.29.0 with OpenSSL on my CentOS 7 dev machine, I managed to get the CRL file to load.

So, the issue comes from the fact that the curl and libcurl shipped with CentOS 7 don't use OpenSSL to read CRLs; they use NSS instead.
According to the website providing that CRL, it is signed with SHA-2. NSS probably does not support that type of hash and therefore fails to load the CRL. If that CRL is the first one curl tries to load, curl fails immediately and does not try to load the remaining CRLs in the bundled CRL file...
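
To check quickly which TLS library a given curl build uses, the first line of the curl -V output can be parsed. This is a rough sketch under the assumption that the backend appears as a token such as NSS/x.y or OpenSSL/x.y, as in the two outputs above; curl_tls_backend is a hypothetical helper name.

```shell
# Read `curl -V` output on stdin and print the TLS backend token,
# e.g. "NSS/3.53.1" or "OpenSSL/3.0.7".
# curl_tls_backend is a hypothetical helper name used for illustration.
curl_tls_backend() {
  head -n 1 |                  # the library list is on the first line
    tr ' ' '\n' |              # one token per line
    grep -E -m 1 '^(NSS|OpenSSL|GnuTLS)/'   # first token naming a TLS library
}
```

Usage: curl -V | curl_tls_backend
A result starting with NSS/ means the machine is exposed to the CRL loading problem described here.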

The following command can be used to find, on a given machine, which CRLs fail to load:

find /etc/grid-security/certificates/ -name '*.r0' | xargs -I{} bash -c 'curl -v --capath /etc/grid-security/certificates --crlfile {} https://storage01.lcg.cscs.ch:2880/pnfs/lcg.cscs.ch/atlas/atlasdatadisk/rucio/mc23_13p6TeV/a6/65 2>/dev/null 1>/dev/null || echo {}'
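
Note that the command above reports a CRL on any curl failure, including network errors. A slightly stricter variant, sketched here as an assumption and not as the check actually used, only treats exit code 82 (the "Failed to load CRL file" error shown earlier) as a CRL problem. crl_loads is a hypothetical helper name, and the CURL variable is only there so the curl binary can be swapped out.

```shell
# Return success unless curl exits with code 82 (failed to load CRL file).
# Usage: crl_loads <crl-file> <url>
# crl_loads and the CURL override variable are hypothetical, for illustration.
crl_loads() {
  rc=0
  "${CURL:-curl}" -s -o /dev/null \
    --capath /etc/grid-security/certificates \
    --crlfile "$1" "$2" || rc=$?
  [ "$rc" -ne 82 ]   # any other exit code is not a CRL loading error
}
```

Usage: for f in /etc/grid-security/certificates/*.r0; do crl_loads "$f" "$URL" || echo "$f"; done
Here URL is a placeholder for any HTTPS endpoint, such as the one used in the command above.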

