Print

Print


Hi all,

I'm seeing some issues with xrootd in CMSSW jobs that are causing severe performance issues.  See below for a snippet from the client and relevant server.

It seems that the client is trying to end a previously terminated security session and, after this happens the first time, I always get these "Timeout or error waiting on outstanding blocks".  This happens about every second of a multi-hour job, making the log file a touch verbose.

In this case, the client was talking to red-gridftp9.  I tried a few test transfers, and the server is otherwise responsive (although by no means idle, xrootd is exporting about 4Gbps today).

Any ideas?  I've only seen these timeout issues when there's the server error.  My search has not been exhaustive, so it could just be a coincidence.

Brian

101008 11:32:41 001 Xrd: CheckErrorStatus: Server declared: session not found(error code: 3011)
101008 11:32:41 001 Xrd: Open: File opened succesfully.
08-Oct-2010 11:32:42 CDT  Successfully opened file root://xrootd.unl.edu//store/data/Run2010A/JetMETTau/RECO/Sep17ReReco_v2/0051/BC729A70-BEC6-DF11-AFF4-0018F3D096EE.root
101008 11:32:48 001 Xrd: Read: Timeout or error waiting outstanding blocks. Retrying sync! List of outstanding reqs follows.
-------------------------------------------------- start outstanding reqs dump. freesids: 65533
++++++++++++++++++++++++++++++++++++++++++++++++++++ end  outstanding reqs dump.
101008 11:32:48 001 Xrd: Read: Timeout or error waiting outstanding blocks. Retrying sync! List of outstanding reqs follows.
-------------------------------------------------- start outstanding reqs dump. freesids: 65533
++++++++++++++++++++++++++++++++++++++++++++++++++++ end  outstanding reqs dump.
101008 11:32:48 001 Xrd: Read: Timeout or error waiting outstanding blocks. Retrying sync! List of outstanding reqs follows.
-------------------------------------------------- start outstanding reqs dump. freesids: 65532


================= DUMPING CLIENT REQUEST HEADER =================
                ClientHeader.streamid = 0x75 0x0d
               ClientHeader.requestid = kXR_read (3013)
            ClientHeader.read.fhandle = 0x00 0x00 0x00 0x00
             ClientHeader.read.offset = 11937606541
               ClientHeader.read.rlen = 562291
             ClientHeader.header.dlen = 0
=================== END CLIENT HEADER DUMPING ===================

++++++++++++++++++++++++++++++++++++++++++++++++++++ end  outstanding reqs dump.

[root@red-gridftp9 ~]# grep glidein_.10759 /var/log/xrootd/xrootd.log
101008 11:33:20 8224 glidein_.10759:[log in to unmask] XrootdProtocol: more auth requested; sz=2082
101008 11:33:20 8224 XrootdXeq: glidein_.10759:[log in to unmask] login as 92a2e9e2.0
101008 11:33:20 8224 acc_Audit: glidein_.10759:[log in to unmask] grant gsi [log in to unmask] read /store/data/Run2010A/JetMETTau/RECO/Sep17ReReco_v2/0050/D6E7ADBB-B7C6-DF11-AFC6-002618943975.root
101008 11:40:24 8224 XrootdXeq: glidein_.10759:[log in to unmask] disc 0:07:04
101008 12:26:57 8224 glidein_.10759:[log in to unmask] XrootdProtocol: more auth requested; sz=2082
101008 12:26:58 8224 XrootdXeq: glidein_.10759:[log in to unmask] login as 92a2e9e2.0
101008 12:26:58 8224 glidein_.10759:[log in to unmask] XrootdProtocol: endsess 8222:371.14718
101008 12:26:58 8224 glidein_.10759:[log in to unmask] XrootdProtocol: endsess 8222:371.14718 rc=-3 (No such process)
101008 12:26:58 8224 glidein_.10759:[log in to unmask] XrootdResponse: sending err 3011: session not found
101008 12:26:58 8224 acc_Audit: glidein_.10759:[log in to unmask] grant gsi [log in to unmask] read /store/data/Run2010A/JetMETTau/RECO/Sep17ReReco_v2/0050/E62D2728-B8C6-DF11-8E71-0018F3D0962C.root
101008 12:33:59 8224 XrootdXeq: glidein_.10759:[log in to unmask] disc 0:07:02
101008 12:56:01 8224 glidein_.10759:[log in to unmask] XrootdProtocol: more auth requested; sz=2082
101008 12:56:01 8224 XrootdXeq: glidein_.10759:[log in to unmask] login as 92a2e9e2.0
101008 12:56:01 8224 glidein_.10759:[log in to unmask] XrootdProtocol: endsess 8222:451.14760
101008 12:56:01 8224 glidein_.10759:[log in to unmask] XrootdProtocol: endsess 8222:451.14760 rc=-3 (No such process)
101008 12:56:01 8224 glidein_.10759:[log in to unmask] XrootdResponse: sending err 3011: session not found
101008 12:56:01 8224 acc_Audit: glidein_.10759:[log in to unmask] grant gsi [log in to unmask] read /store/data/Run2010A/JetMETTau/RECO/Sep17ReReco_v2/0047/884E3212-6AC6-DF11-B1C9-0026189438AF.root