Hi Marcus,
thanks a lot - this is a great start for narrowing down the source of the problem. Unfortunately, I don’t have direct access to the DPM logs, but I’ve contacted our admin to run some tests with him next week. I’ll let you know what I find out.
Thanks!
-Stephan
 
On Jan 4, 2017, at 16:51, Marcus Ebert <[log in to unmask]> wrote:

Hi Stephan,

This random behaviour, especially when doing exactly the same thing over and over again, looks like an overload somewhere.

You mention that you are using DPM on the storage side. That means that xrootd has no direct access to the files when they are requested. Instead, all requests should go through the DPM headnode/DPM service and involve a database lookup.

Have there been any changes to the DPM/mysql config? Do you see any errors in the dpm/mysql log files for the same requests that gave you an error during your stress test? (Debug verbosity was set for the stress test, wasn't it?)

Cheers,
Marcus

On Wed, 4 Jan 2017, Stephan Zimmer wrote:

Dear XrootD experts,

I'm writing to you in the hope that you may be able to help me understand, or at least identify, the issue we've been seeing in our XrootD configuration. We store data on an ATLAS DPM storage element, which we access through an xrootd redirector authenticated through x509 personal proxies.

We experimented with the system for a while before encouraging our colleagues to use remote access through the redirector as the preferred mode of access. Since doing so, we have been encountering the following error more and more frequently, without any further specifics:

[ERROR] Server responded with an error: [3005] Server database error: Communication error on send

Digging through the Xrootd mail archives I've come across this post:
https://listserv.slac.stanford.edu/cgi-bin/wa?A2=ind1003&L=XROOTD-L&P=R227&1=XROOTD-L&9=A&I=-3&J=on&d=No+Match;Match;Matches&z=4

which seems to indicate that 3005 is thrown if an operation is not supported. Actually, I couldn't find any listing of the different error codes that may be raised by XrootD. Does such documentation exist, and if so, where could I find it?

The biggest challenge in fully debugging this problem is that it occurs rather randomly. For instance, I've been running a stress test in which I read the same 5 files (which are clearly accessible on xrootd) 200 times (specifically, I'm using ROOT to add 5 files that are chained together and retrieve the total number of entries in the tree). During this stress test, I get the above error message in up to 10% of the cases. In this specific instance, I call the xrootd client through the ROOT (5.34.37) TNetXNGFile::Open constructor, but the same problem occurs occasionally when using xrdcp (version 4.2.3) and even when using xrdfs mkdir.
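
For reference, the bookkeeping part of a stress test like the one described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the script that was actually used: `read_once` is a hypothetical callable that performs one complete read (for example, chaining the 5 files and counting the entries) and raises an exception containing the client's error text when the read fails. How that callable talks to ROOT or xrdcp is deliberately left out.

```python
import re
from collections import Counter

# Picks out the numeric code in messages such as
# "[ERROR] Server responded with an error: [3005] ...".
# "[ERROR]" itself is not matched because it contains no digits.
ERROR_CODE = re.compile(r"\[(\d+)\]")


def stress_test(read_once, repeats=200):
    """Call read_once() `repeats` times and tally failures by error code.

    `read_once` is a hypothetical, caller-supplied function (an assumption
    of this sketch). It should do one full read and raise an exception
    whose message carries the client error text if the read fails.
    """
    failures = Counter()
    for _ in range(repeats):
        try:
            read_once()
        except Exception as exc:
            match = ERROR_CODE.search(str(exc))
            failures[match.group(1) if match else "unknown"] += 1
    failed = sum(failures.values())
    return {
        "repeats": repeats,
        "failed": failed,
        "failure_rate": failed / repeats if repeats else 0.0,
        "by_code": dict(failures),
    }


if __name__ == "__main__":
    # Simulated reader for demonstration only: fails on every 10th call
    # with the error text quoted earlier in this message.
    state = {"calls": 0}

    def simulated_read():
        state["calls"] += 1
        if state["calls"] % 10 == 0:
            raise RuntimeError(
                "[ERROR] Server responded with an error: [3005] "
                "Server database error: Communication error on send"
            )

    print(stress_test(simulated_read, repeats=200))
```

Recording the time of each failure alongside the code would additionally make it easier to match failed requests against the DPM/mysql logs on the server side.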

Thank you very much for your help in advance,
Regards,
-Stephan


--
Dr. Stephan Zimmer
DPNC, University of Geneva
24 quai Ernest-Ansermet
CH-1211 Genève 4
[log in to unmask] / [log in to unmask]
mobile: +41766133052

########################################################################
Use REPLY-ALL to reply to list

To unsubscribe from the XROOTD-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.





