Hi Kyle,
The summary counts are consistent. A read request is 24 bytes in length.
So, we find
37184778462/1548943168 = 24.007 (as expected)
The average payload in response:
26610785518162/1548943168 = 17179 (which looks pretty good)
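For reference, the two ratios above can be rechecked with a few lines of Python. This is only a sketch: the variable names are mine, and the values are the <rd>, <in>, and <out> counters quoted further down in this thread.

```python
# Counters reported by "query 1 lp" (copied from the quoted message below).
reads = 1548943168           # <rd>: number of read requests
bytes_in = 37184778462       # <in>: bytes received by the server
bytes_out = 26610785518162   # <out>: bytes sent by the server

avg_request = bytes_in / reads    # average inbound bytes per read request
avg_response = bytes_out / reads  # average outbound bytes per read request

print(f"average inbound bytes per read:  {avg_request:.3f}")
print(f"average outbound bytes per read: {avg_response:.2f}")
print(f"inbound as a fraction of outbound: {bytes_in / bytes_out:.4%}")
```

By these counters, inbound traffic is a small fraction of a percent of outbound, nowhere near the 75% shown by the monitoring plot.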
However, this is misleading. Looking at the sample of recvfrom's we find:
          File#    Offset           Length
0600 0bc5 00000000 0000000000d74e00 0008a400 = 566272 bytes
1f00 0bc5 00000000 0000000000dff200 00003e00 = 15872 bytes
0500 0bc5 00000000 0000000000e03000 00004400 = 17408 bytes
0500 0bc5 00000000 0000000000e07400 00001400 = 5120 bytes
1f00 0bc5 00000000 0000000000e08800 00001400 = 5120 bytes
0600 0bc5 00000000 0000000000e09c00 00001400 = 5120 bytes
So, you likely do a lot of small reads with the very large reads pushing
up the average. That said, I would still suspect the monitoring at this
point.
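
In case it helps to repeat this on a longer strace capture, here is a rough Python sketch of the decoding done by hand above. The field layout is an assumption inferred from the columns in the table (stream id, request code, file handle, offset, length, all big-endian), and the function and variable names are only illustrative.

```python
import re
import statistics
import struct

# Assumed layout of the 24-byte read request, inferred from the columns above
# (all big-endian): 2-byte stream id, 2-byte request code, 4-byte file handle,
# 8-byte offset, 4-byte read length, 4 trailing bytes (zero in every sample).
READ_REQUEST = struct.Struct(">2sH4sqii")

def decode_strace_payload(escaped):
    """Convert an strace -xx escaped string into raw bytes."""
    return bytes(int(h, 16) for h in re.findall(r"\\x([0-9a-fA-F]{2})", escaped))

def read_length(escaped):
    """Return the requested read length from one escaped recvfrom payload."""
    _stream, _code, _handle, _offset, length, _extra = READ_REQUEST.unpack(
        decode_strace_payload(escaped)
    )
    return length

# The six recvfrom payloads quoted later in this thread.
samples = [
    r"\x06\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xd7\x4e\x00\x00\x08\xa4\x00\x00\x00\x00\x00",
    r"\x1f\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xdf\xf2\x00\x00\x00\x3e\x00\x00\x00\x00\x00",
    r"\x05\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xe0\x30\x00\x00\x00\x44\x00\x00\x00\x00\x00",
    r"\x05\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xe0\x74\x00\x00\x00\x14\x00\x00\x00\x00\x00",
    r"\x1f\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xe0\x88\x00\x00\x00\x14\x00\x00\x00\x00\x00",
    r"\x06\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xe0\x9c\x00\x00\x00\x14\x00\x00\x00\x00\x00",
]

lengths = [read_length(s) for s in samples]
print("read lengths:", lengths)
print("mean:  ", statistics.mean(lengths))
print("median:", statistics.median(lengths))
```

Comparing the mean with the median of the requested lengths is a quick way to see how much one very large read skews the average.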
Andy
On Fri, 24 Jun 2011, Kyle Fransham wrote:
> Hi Andy,
>
> On 06/23/2011 04:28 PM, Andrew Hanushevsky wrote:
>> Hi Kyle,
>>
>> What we need to do is find out what the inbound traffic really is. So, some
>> more questions:
>>
>> 1) On your graph is that bytes in/out or packets in/out? If this is packets
>> then the graph is likely correct and you are simply doing a lot of very
>> small reads.
> That's definitely bytes, not packets on the graph.
>> 2) To find out a bit more statistics you can connect to the xrootd server
>> using the xrd command. Do the following:
>>
>> xrd <the_xrootd_server>
>> query 1 lp
>> exit
>>
>> Then find the values between <in></in> and <out></out>; that will give you
>> the number of bytes in and the number out. We need to see if that is
>> reasonable compared to actual requests, which you will find between
>> <rd></rd> and <wr></wr> (read/write counters).
>
> <rd>1548943168</rd>
> <wr>0</wr>
> <in>37184778462</in>
> <out>26610785518162</out>
>
> So writes are 0, as expected. Inbound traffic is three orders of magnitude
> less than outbound, which doesn't correspond to our monitoring, but looks
> good.
>
>> 3) If those still seem not to correspond then we can look at the actual
>> xrootd kernel calls using strace. For instance:
>>
>> strace -f -xx -ttt -p <pid> -e trace=network 2>&1 | grep 'recv(' > <outfile>
> Okay, so I see no calls to 'recv(' in the strace. However, I do see calls to
> 'recvfrom(' that look like this:
>
> [pid 16211] 1308923304.005638 recvfrom(19,
> "\x06\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xd7\x4e\x00\x00\x08\xa4\x00\x00\x00\x00\x00",
> 24, 0, NULL, NULL) = 24
> [pid 16211] 1308923304.045490 recvfrom(19,
> "\x1f\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xdf\xf2\x00\x00\x00\x3e\x00\x00\x00\x00\x00",
> 24, 0, NULL, NULL) = 24
> [pid 16211] 1308923304.046110 recvfrom(19,
> "\x05\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xe0\x30\x00\x00\x00\x44\x00\x00\x00\x00\x00",
> 24, 0, NULL, NULL) = 24
> [pid 16211] 1308923304.999048 recvfrom(19,
> "\x05\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xe0\x74\x00\x00\x00\x14\x00\x00\x00\x00\x00",
> 24, 0, NULL, NULL) = 24
> [pid 16211] 1308923304.999260 recvfrom(19,
> "\x1f\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xe0\x88\x00\x00\x00\x14\x00\x00\x00\x00\x00",
> 24, 0, NULL, NULL) = 24
> [pid 16211] 1308923305.000027 recvfrom(19,
> "\x06\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xe0\x9c\x00\x00\x00\x14\x00\x00\x00\x00\x00",
> 24, 0, NULL, NULL) = 24
>
>> This will capture server recv() requests (inbound traffic). No need to run
>> this more than a minute or two.
>>
>> 4) If that doesn't reveal anything then the only other option is that there
>> really is something else on that machine that is accepting incoming
>> traffic.
>> If it isn't udp then netstat should show you who that might be.
> At a first glance, netstat doesn't show any real flags. Also, xrootd really
> is the only thing running on this machine, and we see lots of input whenever
> a user runs jobs, (i.e. reads data from xrootd) and no input otherwise. So
> the two are at least correlated...
>
> Thanks for your help!
>
> Kyle
>> Andy
>>
>>
>> -----Original Message-----
>> From: Kyle Fransham
>> Sent: Thursday, June 23, 2011 1:05 PM
>> To: Andrew Hanushevsky
>> Cc: xrootd-l
>> Subject: Re: inbound traffic
>>
>> Hi Andy,
>>
>> This is a machine at UVic that we use to serve BaBar xrootd files to
>> virtual machines that we spawn in the cloud. It's running little else
>> besides xrootd. Any traffic on the external interface (the plot that I
>> sent) is xrootd. We see very high inbound traffic almost all of the time.
>>
>> On the back end, we have 10TB or so of data in a lustre filesystem
>> that's distributed across multiple workers. Since this is a distributed
>> filesystem, we expect (and we do see) traffic on the internal interface
>> that's associated with the reading of xrootd collections. But we don't
>> expect to see that externally...
>>
>> What else can I tell you about this machine/setup to help diagnose the
>> problem?
>>
>> Thanks,
>>
>> Kyle
>>
>> On 06/23/2011 03:36 PM, Andrew Hanushevsky wrote:
>>> Hi Kyle,
>>>
>>> There should be little inbound traffic unless that machine is used for more
>>> than just xrootd services. What machine are we talking about?
>>>
>>> Andy
>>>
>>> -----Original Message-----
>>> From: Kyle Fransham
>>> Sent: Thursday, June 23, 2011 7:48 AM
>>> To: xrootd-l
>>> Subject: inbound traffic
>>>
>>> Hi all,
>>>
>>> We've got a single xrootd server serving out BaBar root files over the
>>> WAN. We notice that there is a lot of inbound traffic, even though our
>>> files are exported read-only. Attached is a network plot showing the
>>> traffic on the xrootd interface for four simultaneous user analysis
>>> jobs. (In case you can't see the attachment, the inbound traffic tends
>>> to be about 75% of the outbound traffic.)
>>>
>>> Is this expected behaviour?
>>>
>>> Thanks,
>>>
>>> Kyle
>>>
>