Hi,

We have a problem with the way xrootd decides whether a connection is a WAN
connection.

Some of the worker nodes at RAL are in the gridpp.rl.ac.uk domain and our
xrootd servers are in the higher-level domain rl.ac.uk.

Jobs running on the machines in the gridpp subdomain fail because they refuse
to connect to the different domain.

lcg0338 - ~/work/anal21 $ KanCollUtil
/store/PRskims/R14/14.4.0d/BFourBody/02/BFourBody_0291.01.root
041025 11:33:48 14069 Warn: XTNetConn::checkUrlDomain      - Skipping url
[root://csflnx108.rl.ac.uk///store/PRskims/R14/14.4.0d/BFourBody/02/BFourBod
y_0291.01.root.01.root]. Its domain [.rl.ac.uk] differs from the client's
one [.gridpp.rl.ac.uk].
041025 11:33:48 14069 Err : XTNetFile::CTOR                - All server
specified are out of the client's domain. Going into zombie state.
ERR Could not open a file expected to contain the event header: 
ERR    LFN =
/store/PRskims/R14/14.4.0d/BFourBody/02/BFourBody_0291.01.root.01.root
ERR    PFN =
root://bbr-rdr01.rl.ac.uk//store/PRskims/R14/14.4.0d/BFourBody/02/BFourBody_
0291.01.root.01.root
ERR Check collection name and access method... 

Trying to fix this by adding Root.XTNetFileAllowWanConnect:   1 to a
.rootrc file changes the error but does not fix the problem.

lcg0338 - ~/work/anal21 $ KanCollUtil
/store/PRskims/R14/14.4.0d/BFourBody/02/BFourBody_0291.01.root
041025 11:36:21 14080 Err : XTNetConn::handleServerError   - Redirection to
a server out-of-domain disallowed. Abort
Aborted

This is going to cause jobs to fail randomly at RAL. Could the client be
fixed to recognise that connections to subdomains are not WAN connections?
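
To illustrate the behaviour being asked for, here is a minimal sketch in
Python (not the client's actual code) of a domain check that treats a
subdomain and its parent domain as the same site. The function names
domain_labels and same_site are hypothetical and chosen only for this
example; the domain strings are the ones from the log above.

```python
def domain_labels(domain):
    """Split a domain into lower-case DNS labels, ignoring leading/trailing dots."""
    return [label for label in domain.lower().split(".") if label]


def same_site(client_domain, server_domain):
    """Return True if the two domains are equal or one is a subdomain of the other.

    Hypothetical helper: this only sketches the requested rule and is not
    how XTNetConn::checkUrlDomain is implemented.
    """
    client = domain_labels(client_domain)
    server = domain_labels(server_domain)
    shorter, longer = sorted((client, server), key=len)
    # Compare whole labels from the right, so "xrl.ac.uk" does not match "rl.ac.uk".
    return len(shorter) > 0 and longer[-len(shorter):] == shorter


if __name__ == "__main__":
    # The case from the log: client in .gridpp.rl.ac.uk, server in .rl.ac.uk
    print(same_site(".gridpp.rl.ac.uk", ".rl.ac.uk"))
```

A real check would probably also want a lower bound on how short the shared
suffix may be, since with this sketch a server in ac.uk would count as the
same site as every host under it.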

Thanks,
Chris.

-- 
  Chris Brew  ([log in to unmask])  +44 1235 446326
  Particle Physics Department              
  Rutherford Appleton Laboratory
  Chilton, Didcot. Oxfordshire.
  OX11 0QX. United Kingdom.