On the same topic, some of our disk nodes in EOSATLAS have a hard time connecting to the DPM site in Tokyo. The disk nodes fall into two categories: some work and some fail. Again, all nodes have exactly the same configuration, installed certificates, etc. ATLAS reported these errors because they were seeing only 60% efficiency on the link between EOSATLAS and Tokyo.

A simple curl command from one of the "offending" disk nodes shows the following, fully reproducible, error:

[root@st-096-dd904b6a ~]#  curl -v --capath /etc/grid-security/certificates/ https://lcg-se01.icepp.jp:18443
* About to connect() to lcg-se01.icepp.jp port 18443 (#0)
*   Trying 2404:d540:1:589:2:1:2:1...
* Connection timed out
*   Trying 157.82.116.96...
* Connected to lcg-se01.icepp.jp (157.82.116.96) port 18443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: /etc/grid-security/certificates/
* NSS error -8054 (SEC_ERROR_REUSED_ISSUER_AND_SERIAL)
* You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
* Closing connection 0
curl: (35) You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.

After some debugging, the issue seems to come from the certificate authority that signed the Tokyo site certificate, namely KEK:

[root@st-096-dd904b6a ~]# ls -lrt /etc/grid-security/certificates/ | grep "KEK\|2f2f\|617ff"
-rw-r--r--. 1 root root  1299 Jan 15 17:03 KEK.signing_policy
-rw-r--r--. 1 root root  1273 Jan 15 17:03 KEK.pem
-rw-r--r--. 1 root root   464 Jan 15 17:03 KEK.namespaces
-rw-r--r--. 1 root root   318 Jan 15 17:03 KEK.info
-rw-r--r--. 1 root root    44 Jan 15 17:03 KEK.crl_url
lrwxrwxrwx. 1 root root    18 Jan 19 08:07 617ff41b.signing_policy -> KEK.signing_policy
lrwxrwxrwx. 1 root root    14 Jan 19 08:07 617ff41b.namespaces -> KEK.namespaces
lrwxrwxrwx. 1 root root     7 Jan 19 08:07 617ff41b.0 -> KEK.pem
lrwxrwxrwx. 1 root root    18 Jan 19 08:07 2f2f573f.signing_policy -> KEK.signing_policy
lrwxrwxrwx. 1 root root    14 Jan 19 08:07 2f2f573f.namespaces -> KEK.namespaces
lrwxrwxrwx. 1 root root     7 Jan 19 08:07 2f2f573f.0 -> KEK.pem
-rw-r--r--. 1 root root  6466 Mar 17 05:20 617ff41b.r0
-rw-r--r--. 1 root root  6466 Mar 17 11:24 2f2f573f.r0
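To double-check that both hash names resolve to the same KEK certificate and that each hash carries its own CRL, the listing above can be regrouped with a small script. This is a minimal sketch, assuming the usual CApath naming (`<8-hex-hash>.N` for certificates, `<8-hex-hash>.rN` for CRLs); the script and function names are hypothetical. The two hashes are probably the OpenSSL new-style and old-style subject hashes of the same CA, but that is an assumption, not something verified here.

```python
import os
import re
import sys
from collections import defaultdict

# Hash-named CApath entries: 8 hex digits, then ".N" (certificate) or ".rN" (CRL)
HASH_ENTRY = re.compile(r"^([0-9a-f]{8})\.(r?)(\d+)$")


def group_by_ca(capath):
    """Map each real CA file to (sorted hashes linking to it, sorted CRL files of those hashes)."""
    hashes_by_target = defaultdict(set)
    crls_by_hash = defaultdict(list)
    for name in sorted(os.listdir(capath)):
        m = HASH_ENTRY.match(name)
        if not m:
            continue  # skips KEK.pem, *.signing_policy, *.namespaces, ...
        h, crl_flag = m.group(1), m.group(2)
        if crl_flag:
            crls_by_hash[h].append(name)
        else:
            # realpath follows the symlink, e.g. 617ff41b.0 -> KEK.pem
            target = os.path.basename(os.path.realpath(os.path.join(capath, name)))
            hashes_by_target[target].add(h)
    return {
        target: (sorted(hashes), sorted(c for h in hashes for c in crls_by_hash[h]))
        for target, hashes in hashes_by_target.items()
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print only CAs reachable through more than one hash that also have CRLs
    for target, (hashes, crls) in sorted(group_by_ca(sys.argv[1]).items()):
        if len(hashes) > 1 and crls:
            print(f"{target}: hashes={hashes} crls={crls}")
```

Run as `python3 ca_groups.py /etc/grid-security/certificates`; among other CAs, it should print a line for `KEK.pem` listing both hashes and both `.r0` files.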

If I remove the revocation files (the last two), then everything works fine. Needless to say, on the other disk nodes, where this error doesn't show up, the files are exactly the same, with identical checksums.
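The checksum comparison between a working and a failing node can be made systematic by fingerprinting the whole CApath, including symlink targets, and diffing the two results. A minimal sketch with hypothetical function names; it only covers the CApath directory, so if the two fingerprints match, as reported here, the difference presumably lies elsewhere (for example in the NSS database that curl initializes from `sql:/etc/pki/nssdb`), which is a guess rather than a diagnosis.

```python
import hashlib
import os
import sys


def capath_fingerprint(capath):
    """Return {entry name: description} for every entry in capath.

    Regular files map to their SHA-256 digest; symlinks map to "-> target",
    so a differing link target shows up even when file contents match.
    """
    result = {}
    for name in sorted(os.listdir(capath)):
        path = os.path.join(capath, name)
        if os.path.islink(path):
            result[name] = "-> " + os.readlink(path)
        elif os.path.isfile(path):
            with open(path, "rb") as fh:
                result[name] = hashlib.sha256(fh.read()).hexdigest()
    return result


def diff_fingerprints(good, bad):
    """List entries that are missing on one side or differ between two nodes."""
    names = sorted(set(good) | set(bad))
    return [n for n in names if good.get(n) != bad.get(n)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Output is sorted by name, so plain `diff` works across nodes
    for name, value in capath_fingerprint(sys.argv[1]).items():
        print(f"{value}  {name}")
```

Running it on one node of each category and comparing the two outputs with `diff` shows any entry that differs, including symlinks that point somewhere else.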

For reference, with the revocation files removed we get:

[root@st-096-dd904b6a ~]# curl -v --capath /etc/grid-security/certificates/ https://lcg-se01.icepp.jp:18443
* About to connect() to lcg-se01.icepp.jp port 18443 (#0)
*   Trying 2404:d540:1:589:2:1:2:1...
* Connected to lcg-se01.icepp.jp (2404:d540:1:589:2:1:2:1) port 18443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: /etc/grid-security/certificates/
* NSS: client certificate not found (nickname not specified)
* SSL connection using TLS_RSA_WITH_AES_128_CBC_SHA
* Server certificate:
* 	subject: CN=host/lcg-se01.icepp.jp,OU=ICEPP,OU=CRC,O=KEK,C=JP
* 	start date: Oct 26 02:20:03 2020 GMT
* 	expire date: Nov 30 02:20:03 2021 GMT
* 	common name: host/lcg-se01.icepp.jp
* 	issuer: CN=KEK GRID Certificate Authority,OU=CRC,O=KEK,C=JP
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: lcg-se01.icepp.jp:18443
> Accept: */*
>
< HTTP/1.1 403 Forbidden
< Date: Wed, 17 Mar 2021 10:28:47 GMT
< Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips DAV/2 mod_gridsite/2.3.4
< Content-Length: 321
< Content-Type: text/html; charset=iso-8859-1
<
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /
on this server.</p>
<p>Additionally, a 403 Forbidden
error was encountered while trying to use an ErrorDocument to handle the request.</p>
</body></html>
* Connection #0 to host lcg-se01.icepp.jp left intact

I tried clearing the NSS/sssd cache, but the error is still there. Only removing the revocation (.r0) files fixes the problem. All package versions are identical to the ones in ticket #1428.
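Since removing the KEK revocation files is what lets the handshake succeed, it may be safer to move them aside rather than delete them, so the original state can be restored after testing. A minimal sketch with hypothetical script and function names; it assumes the CRLs are periodically rewritten by a job such as fetch-crl, in which case they will reappear on its next run, so this is a temporary diagnostic step, not a fix.

```python
import os
import shutil
import sys


def move_crls_aside(capath, hashes, backup_dir):
    """Move <hash>.r<N> CRL files for the given hashes into backup_dir.

    Returns the sorted list of moved file names, so the change can be
    reverted by moving them back.
    """
    os.makedirs(backup_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(capath)):
        stem, dot, suffix = name.partition(".")
        # Match only CRL files (".r0", ".r1", ...), not ".0" or ".signing_policy"
        if stem in hashes and dot and suffix.startswith("r") and suffix[1:].isdigit():
            shutil.move(os.path.join(capath, name), os.path.join(backup_dir, name))
            moved.append(name)
    return moved


if __name__ == "__main__" and len(sys.argv) > 3:
    # usage: move_crls.py <capath> <backup_dir> <hash> [<hash> ...]
    print(move_crls_aside(sys.argv[1], set(sys.argv[3:]), sys.argv[2]))
```

For the KEK case this would be something like `python3 move_crls.py /etc/grid-security/certificates /root/crl-backup 617ff41b 2f2f573f`, where the backup path is only an example.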

Any clue as to what might be causing this?

Thanks,
Elvin


Issue: https://github.com/xrootd/xrootd/issues/1429