Hi Rob,

 I don't understand step 1, as DPM works differently.

 External https/davs clients go to Apache, port 443. The Apache
plugin then contacts dome (= xrootd daemon with plugins) to get
metadata, but this is just internal gymnastics.

 Could you please clarify the workflow of a failing client?
 Is someone sending DAV clients to port 1094? That would be wrong.

Thanks
Fabrizio

On 30/07/20 18:10, Robert Currie wrote:
> Hi,
> 
> Unfortunately the actual error causing most of the grief for us in
> trying to support ATLAS is that connections to the http side of xrootd
> fail due to an authentication issue internally.
> The authn error seems to happen when a worker within xrootd attempts to
> re-use an http connection which has previously been authenticated using
> an external x509 cert.
> 
> From our xrootd logs:
> step1) An incoming http connection requests keep-alive and is
> authenticated with an external x509 cert by a worker within xrootd (this
> completes according to the logs)
> step2) xrootd does other things on other workers
> step3) A connection is attempted using the worker from step1, which should be
> authenticated with http.secretkey. This then fails due to an authn problem.
> 
> Using debug logging I've verified that the rejection occurs within
> the xrootd code and not the dpm plugins. This is why tracking it down
> from a DPM perspective took so long: requests from one DPM component
> never reach the other, leading to strange errors.
> 
> From what I can gather step3 seems to be failing due to trying to use
> external (untrusted in this context) x509 credentials to perform an
> action that only trusts internal connections authenticated by
> http.secretkey.
> (Otherwise there is some undiagnosed bug in the authentication within
> http(s) handling and I've no idea how to dig deeply enough to diagnose
> this).
> 
> The above accounts for some 90%+ of the deletion problems at our site.
> 
> The 2nd issue (in the same GGUS ticket) which seems to be impacting us is
> also a random issue to do with file descriptors. Given that it is at the
> level of connection handling, diagnosing it accurately unfortunately goes
> beyond the manpower/expertise available.
> It has been suggested this is due to the glibc malloc, but we observe the
> same intermittent problem on our headnode using tcmalloc, unless this is
> potentially a problem introduced by both.
> Happy to open an issue about this, but there is no more manpower at the
> site to track down a rare random bug in something as fundamental as the
> connection handling.
> 
> Both of these errors seem to be strongly mitigated by restarting the
> service very frequently, although this is an extreme last resort.
> 
> —
> You are receiving this because you were mentioned.
> Reply to this email directly or view it on GitHub:
> <https://github.com/xrootd/xrootd/issues/1251#issuecomment-666495017>
> 


-- 
You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
https://github.com/xrootd/xrootd/issues/1251#issuecomment-666507556