Ciao, we are running a test in CMS, and the servers we have in Pisa are apparently choked and fail, with no apparent error message.

Currently (focusing on a single one), the machines take 1 GB/s in input and push 350 MB/s out to the WAN, with ~1000 xrootd connections according to netstat.
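
For reference, a minimal sketch of how the netstat connection count can be broken down by TCP state, which may help tell healthy connections from stuck ones. It assumes Linux `netstat -tn` output and the port 1094 seen in the client log; the function name `count_states` is only illustrative.

```python
from collections import Counter

XROOTD_PORT = 1094  # port seen in the client log; adjust if the server listens elsewhere


def count_states(netstat_output: str, port: int = XROOTD_PORT) -> Counter:
    """Count TCP connections whose local address ends in :port, grouped by state.

    Expects the text produced by `netstat -tn` on Linux, where each
    connection line reads: proto recv-q send-q local foreign state.
    """
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip the banner, the column header, and anything that is not TCP.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, state = fields[3], fields[5]
        if local.rsplit(":", 1)[-1] == str(port):
            states[state] += 1
    return states
```

On the server this could be fed with `subprocess.run(["netstat", "-tn"], capture_output=True, text=True).stdout`. A large non-zero Send-Q on many ESTABLISHED lines would hint that data is queued in the kernel rather than stuck inside xrootd, but that is a guess to verify, not a diagnosis.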

If I try to open a new connection via direct xrdcp (-d3), handshaking goes fine, and the client log then continues with:

================= DUMPING CLIENT REQUEST HEADER =================
                ClientHeader.streamid = 0x3e 0x00
               ClientHeader.requestid = kXR_read (3013)
            ClientHeader.read.fhandle = 0x00 0x00 0x00 0x00 
             ClientHeader.read.offset = 0
               ClientHeader.read.rlen = 8388608
             ClientHeader.header.dlen = 0
=================== END CLIENT HEADER DUMPING ===================

141003 13:40:37 10598 Xrd: WriteRaw: Writing 24 bytes to physical connection
141003 13:40:37 10598 Xrd: WriteRaw: Writing to substreamid 0
141003 13:40:37 10598 Xrd: Read: Waiting 1outstanding blocks.
141003 13:40:38 10596 Xrd: XrdClientMessage::ReadRaw:  sid: 2, IsAttn: 0, substreamid: 0
141003 13:40:38 10596 Xrd: XrdClientMessage::ReadRaw: Reading data (4194304 bytes) from substream 0
141003 13:40:38 10596 Xrd: ReadRaw: Reading from stormgf2.pi.infn.it:1094
141003 13:41:06 10595 Xrd: DumpPhyConn: Phyconn entry, [log in to unmask]:1094', LogCnt=1 Valid
141003 13:41:36 10595 Xrd: DumpPhyConn: Phyconn entry, [log in to unmask]:1094', LogCnt=1 Valid
141003 13:41:51 10596 Xrd: BuildMessage:  propagating unsol id 2
141003 13:41:51 10596 Xrd: ProcessUnsolicitedMsg: Incoming unsolicited response from streamid 2
141003 13:41:51 10596 Xrd: ProcessUnsolicitedMsg: Processing async response from streamid 2 father=1
141003 13:41:51 10596 Xrd: ProcessUnsolicitedMsg: Putting kXR_read data into cache. Offset=8388608 len 4194304
141003 13:41:51 10596 Xrd: Cache: Submitting 8388608->12582911 to cache.
141003 13:41:51 10596 Xrd: XrdClientMessage::ReadRaw: Reading header (8 bytes).
141003 13:41:51 10596 Xrd: ReadRaw: Reading from stormgf2.pi.infn.it:1094
141003 13:41:51 10598 Xrd: Read: Cache response: got 0@0 bytes. Holes= 0 Outstanding= 1
141003 13:41:51 10598 Xrd: Read: Waiting 1outstanding blocks.
141003 13:41:51 10596 Xrd: XrdClientMessage::ReadRaw:  sid: 3, IsAttn: 0, substreamid: 0
141003 13:41:51 10596 Xrd: XrdClientMessage::ReadRaw: Reading data (4194304 bytes) from substream 0
141003 13:41:51 10596 Xrd: ReadRaw: Reading from stormgf2.pi.infn.it:1094
141003 13:42:06 10595 Xrd: DumpPhyConn: Phyconn entry, [log in to unmask]:1094', LogCnt=1 Valid
...
141003 13:45:26 10596 Xrd: BuildMessage:  propagating unsol id 6
141003 13:45:26 10596 Xrd: ProcessUnsolicitedMsg: Incoming unsolicited response from streamid 6
141003 13:45:26 10596 Xrd: ProcessUnsolicitedMsg: Processing async response from streamid 6 father=1
141003 13:45:26 10596 Xrd: ProcessUnsolicitedMsg: Putting kXR_read data into cache. Offset=25165824 len 4194304
141003 13:45:26 10596 Xrd: Cache: Submitting 25165824->29360127 to cache.
141003 13:45:26 10596 Xrd: XrdClientMessage::ReadRaw: Reading header (8 bytes).
141003 13:45:26 10596 Xrd: ReadRaw: Reading from stormgf2.pi.infn.it:1094
141003 13:45:26 10596 Xrd: XrdClientMessage::ReadRaw:  sid: 7, IsAttn: 0, substreamid: 0
141003 13:45:26 10596 Xrd: XrdClientMessage::ReadRaw: Reading data (4194304 bytes) from substream 0
141003 13:45:26 10598 Xrd: Read: Cache response: got 0@0 bytes. Holes= 0 Outstanding= 1
141003 13:45:26 10598 Xrd: Read: Waiting 1outstanding blocks.
 

But as you can see, seconds if not minutes pass between subsequent messages, and eventually everything times out.
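
To put a number on the delay, here is a small sketch that takes two timestamps from the log above (the start of the 4194304-byte read for sid 2 and the moment that block reaches the cache) and derives the per-connection rate. The helper name `seconds_between` is illustrative only.

```python
from datetime import datetime

CHUNK_BYTES = 4194304  # size of each kXR_read data block in the log


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two 'YYMMDD HH:MM:SS' log timestamps."""
    fmt = "%y%m%d %H:%M:%S"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# Timestamps copied from the log: the 4 MiB read for sid 2 starts at
# 13:40:38 and its data is submitted to the cache at 13:41:51.
elapsed = seconds_between("141003 13:40:38", "141003 13:41:51")
rate_kib_s = CHUNK_BYTES / elapsed / 1024

print(f"{elapsed:.0f} s per 4 MiB block, about {rate_kib_s:.0f} KiB/s")
```

That is 73 seconds for a single 4 MiB block, i.e. roughly 56 KiB/s on this connection, far below what the link itself is carrying.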

Server side, all seems OK initially:

141003 13:40:37 6240 acc_Audit: tboccali.10594:[log in to unmask] grant gsi [log in to unmask] read /store/mc/Spring14dr/DYJetsToLL_M-50_13TeV-madgraph-pythia8-tauola_v2/AODSIM/PU20bx25_POSTLS170_V5-v1/00000/42A592AD-5F06-E411-8D25-02163E0104DA.root

and then apparently nothing more.

The network is sort of saturated, but not to this extent: I can easily work on the machine remotely.

The servers have:

[root@stormgf2 cms]# rpm -qa|grep xrootd
xrootd-server-libs-3.3.4-1.slc6.x86_64
xrootd-cmstfc-1.4.3-3.osg.el6.x86_64
xrootd-libs-3.3.4-1.slc6.x86_64
xrootd-client-libs-3.3.4-1.slc6.x86_64
xrootd-3.3.4-1.slc6.x86_64

and are configured with 4096 threads:

xrd.sched maxt 4096

Network flux is, as said, O(1) GB/s in input and 200-300 MB/s in output:

[root@stormgf2 ~]# ./traffic.pl eth2.7 5 .01
Fri Oct  3 13:48:49 2014 - [eth2.7] MB/s IN:820.940 ********   - OUT 269.265 **         
Fri Oct  3 13:48:54 2014 - [eth2.7] MB/s IN:809.092 ********   - OUT 244.763 **         
Fri Oct  3 13:48:59 2014 - [eth2.7] MB/s IN:830.033 ********   - OUT 261.938 **         

Is there any other hard limit I could be running into?
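
One candidate that can be checked directly is the open-file limit of the running xrootd process. Whether it plays any role here is purely an assumption; the sketch below only shows how to read it on Linux from the text of `/proc/<pid>/limits`. The function name `open_file_limits` is illustrative.

```python
import re


def open_file_limits(limits_text: str):
    """Return (soft, hard) 'Max open files' values from /proc/<pid>/limits text.

    Values are returned as strings because the kernel may report 'unlimited'.
    Returns None if the line is not present.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns are separated by runs of spaces: name, soft, hard, units.
            _, soft, hard, *_ = re.split(r"\s{2,}", line.strip())
            return soft, hard
    return None
```

It would be called with the contents of `/proc/<pid>/limits` for the xrootd process, and the result compared with the ~1000 connections and the 4096 configured threads.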

thanks

tom








---
Reply to this email directly or view it on GitHub:
https://github.com/xrootd/xrootd/issues/143
