Hi Kyle,

The summary counts are consistent. A read request is 24 bytes in length. 
So, we find

37184778462/1548943168 = 24.007 (as expected)

The average payload in response:

26610785518162/1548943168 = 17179 (which looks pretty good)
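
Both ratios can be reproduced from the four counters quoted further down. This is only a minimal sketch: the `summary` string copies just those four tags, and the regular expression is an assumption made for illustration, not a parser for the full summary output.

```python
import re

# The four counters Kyle quoted from the server summary in this thread.
summary = """
<rd>1548943168</rd>
<wr>0</wr>
<in>37184778462</in>
<out>26610785518162</out>
"""

# Collect each <tag>number</tag> pair into a dict of integers.
counters = {tag: int(value)
            for tag, value in re.findall(r"<(\w+)>(\d+)</\1>", summary)}

reads = counters["rd"]
bytes_in_per_read = counters["in"] / reads    # average request size received
bytes_out_per_read = counters["out"] / reads  # average payload sent per read

print(f"in/rd  = {bytes_in_per_read:.3f} bytes")
print(f"out/rd = {bytes_out_per_read:.0f} bytes")
```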

However, this is misleading. Looking at the sample of recvfrom's we find:

           File#    Offset           Length
0600 0bc5 00000000 0000000000d74e00 0008a400 = 566272 bytes
1f00 0bc5 00000000 0000000000dff200 00003e00 =  15872 bytes
0500 0bc5 00000000 0000000000e03000 00004400 =  17408 bytes
0500 0bc5 00000000 0000000000e07400 00001400 =   5120 bytes
1f00 0bc5 00000000 0000000000e08800 00001400 =   5120 bytes
0600 0bc5 00000000 0000000000e09c00 00001400 =   5120 bytes
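
The table can be rebuilt from the raw strace strings. A minimal sketch, assuming the 24 bytes are laid out big-endian as a 2-byte stream ID, a 2-byte request code, a 4-byte file handle, an 8-byte offset, a 4-byte read length and 4 trailing bytes. That layout is read off the columns above, and the field names are illustrative.

```python
import struct

# Two of the 24-byte requests captured by strace in this thread.
raw_requests = [
    b"\x06\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00"
    b"\x00\xd7\x4e\x00\x00\x08\xa4\x00\x00\x00\x00\x00",
    b"\x05\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00"
    b"\x00\xe0\x74\x00\x00\x00\x14\x00\x00\x00\x00\x00",
]

# Assumed layout: stream id, request code, file handle, offset, length, trailer.
HEADER = struct.Struct(">2sH4sqii")

def decode(raw):
    """Split one 24-byte request into named fields."""
    stream_id, request, handle, offset, length, trailer = HEADER.unpack(raw)
    return {
        "stream_id": stream_id.hex(),
        "request": request,
        "handle": handle.hex(),
        "offset": offset,
        "length": length,
        "trailer": trailer,
    }

for raw in raw_requests:
    fields = decode(raw)
    print(f"{fields['stream_id']} {fields['request']:04x} {fields['handle']} "
          f"{fields['offset']:016x} {fields['length']:08x} = "
          f"{fields['length']:6d} bytes")
```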

So, you likely do a lot of small reads with the very large reads pushing 
up the average. That said, I would still suspect the monitoring at this 
point.
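
To see how far one large read pulls the average, compare the mean and the median of the six sample lengths listed above. A small sketch using only those six values; six requests are far too few to characterize the real workload.

```python
from statistics import mean, median

# Read lengths (bytes) from the six sampled requests in the table above.
lengths = [566272, 15872, 17408, 5120, 5120, 5120]

sample_mean = mean(lengths)      # dominated by the single 566272-byte read
sample_median = median(lengths)  # closer to what a typical request asks for

print(f"mean   = {sample_mean:.0f} bytes")
print(f"median = {sample_median:.0f} bytes")
```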

Andy

On Fri, 24 Jun 2011, Kyle Fransham wrote:

> Hi Andy,
>
> On 06/23/2011 04:28 PM, Andrew Hanushevsky wrote:
>> Hi Kyle,
>> 
>> What we need to do is find out what the inbound traffic really is. So, some
>> more questions:
>> 
>> 1) On your graph is that bytes in/out or packets in/out. If this is packets
>> then the graph is likely correct and you are simply doing a lot of very
>> small reads.
> That's definitely bytes, not packets on the graph.
>> 2) To find out a bit more statistics you can connect to the xrootd server
>> using the xrd command. Do the following:
>> 
>> xrd <the_xrootd_server>
>> query 1 lp
>> exit
>> 
>> Then find the values between <in></in> and <out></out>; these give you the
>> number of bytes in and the number out. We need to see if that is reasonable
>> compared to the actual requests, which you will find between <rd></rd> and
>> <wr></wr> (read/write counters).
>
> <rd>1548943168</rd>
> <wr>0</wr>
> <in>37184778462</in>
> <out>26610785518162</out>
>
> So writes are 0, as expected.  Inbound traffic is three orders of magnitude 
> less than outbound, which doesn't correspond to our monitoring, but looks 
> good.
>
>> 3) If those still seem not to correspond then we can look at the actual
>> xrootd kernel calls using strace. For instance:
>> 
>> strace -f -xx -ttt -p <pid> -e trace=network 2>&1 | grep 'recv(' > <outfile>
> Okay, so I see no calls to 'recv(' in the strace.  However, I do see calls to 
> 'recvfrom('  that look like this:
>
> [pid 16211] 1308923304.005638 recvfrom(19, 
> "\x06\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xd7\x4e\x00\x00\x08\xa4\x00\x00\x00\x00\x00", 
> 24, 0, NULL, NULL) = 24
> [pid 16211] 1308923304.045490 recvfrom(19, 
> "\x1f\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xdf\xf2\x00\x00\x00\x3e\x00\x00\x00\x00\x00", 
> 24, 0, NULL, NULL) = 24
> [pid 16211] 1308923304.046110 recvfrom(19, 
> "\x05\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xe0\x30\x00\x00\x00\x44\x00\x00\x00\x00\x00", 
> 24, 0, NULL, NULL) = 24
> [pid 16211] 1308923304.999048 recvfrom(19, 
> "\x05\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xe0\x74\x00\x00\x00\x14\x00\x00\x00\x00\x00", 
> 24, 0, NULL, NULL) = 24
> [pid 16211] 1308923304.999260 recvfrom(19, 
> "\x1f\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xe0\x88\x00\x00\x00\x14\x00\x00\x00\x00\x00", 
> 24, 0, NULL, NULL) = 24
> [pid 16211] 1308923305.000027 recvfrom(19, 
> "\x06\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xe0\x9c\x00\x00\x00\x14\x00\x00\x00\x00\x00", 
> 24, 0, NULL, NULL) = 24
>
>> This will capture server recv() requests (inbound traffic). No need to run
>> this more than a minute or two.
>> 
>> 4) If that doesn't reveal anything then the only other option is that there
>> really is something else on that machine that is accepting incoming traffic.
>> If it isn't UDP then netstat should show you who that might be.
> At a first glance, netstat doesn't show any real flags.  Also, xrootd really 
> is the only thing running on this machine, and we see lots of input whenever 
> a user runs jobs, (i.e. reads data from xrootd) and no input otherwise.  So 
> the two are at least correlated...
>
> Thanks for your help!
>
> Kyle
>> Andy
>> 
>> 
>> -----Original Message-----
>> From: Kyle Fransham
>> Sent: Thursday, June 23, 2011 1:05 PM
>> To: Andrew Hanushevsky
>> Cc: xrootd-l
>> Subject: Re: inbound traffic
>> 
>> Hi Andy,
>> 
>> This is a machine at UVic that we use to serve BaBar xrootd files to
>> virtual machines that we spawn in the cloud.  It's running little else
>> besides xrootd.  Any traffic on the external interface (the plot that I
>> sent) is xrootd.  We see very high inbound traffic almost all of the time.
>> 
>> On the back end, we have 10TB or so of data in a lustre filesystem
>> that's distributed across multiple workers.  Since this is a distributed
>> filesystem, we expect (and we do see) traffic on the internal interface
>> that's associated with the reading of xrootd collections.  But we don't
>> expect to see that externally...
>> 
>> What else can I tell you about this machine/setup to help diagnose the
>> problem?
>> 
>> Thanks,
>> 
>> Kyle
>> 
>> On 06/23/2011 03:36 PM, Andrew Hanushevsky wrote:
>>> Hi Kyle,
>>> 
>>> There should be little inbound traffic unless that machine is used for more
>>> than just xrootd services. What machine are we talking about?
>>> 
>>> Andy
>>> 
>>> -----Original Message-----
>>> From: Kyle Fransham
>>> Sent: Thursday, June 23, 2011 7:48 AM
>>> To: xrootd-l
>>> Subject: inbound traffic
>>> 
>>> Hi all,
>>> 
>>> We've got a single xrootd server serving out BaBar root files over the
>>> WAN.  We notice that there is a lot of inbound traffic, even though our
>>> files are exported read-only.  Attached is a network plot showing the
>>> traffic on the xrootd interface for four simultaneous user analysis
>>> jobs.  (In case you can't see the attachment, the inbound traffic tends
>>> to be about 75% of the outbound traffic.)
>>> 
>>> Is this expected behaviour?
>>> 
>>> Thanks,
>>> 
>>> Kyle
>>> 
>