Hi Kyle,

The summary counts are consistent. A read request is 24 bytes in length. So,
we find

37184778462/1548943168 = 24.007 (as expected)

The average payload in response:

26610785518162/1548943168 = 17179 (which looks pretty good)

However, this is misleading. Looking at the sample of recvfrom's we find:

          File#    Offset           Length
0600 0bc5 00000000 0000000000d74e00 0008a400 = 566272 bytes
1f00 0bc5 00000000 0000000000dff200 00003e00 = 15872 bytes
0500 0bc5 00000000 0000000000e03000 00004400 = 17408 bytes
0500 0bc5 00000000 0000000000e07400 00001400 = 5120 bytes
1f00 0bc5 00000000 0000000000e08800 00001400 = 5120 bytes
0600 0bc5 00000000 0000000000e09c00 00001400 = 5120 bytes

So, you likely do a lot of small reads with the very large reads pushing up
the average. That said, I would still suspect the monitoring at this point.

Andy

On Fri, 24 Jun 2011, Kyle Fransham wrote:

> Hi Andy,
>
> On 06/23/2011 04:28 PM, Andrew Hanushevsky wrote:
>> Hi Kyle,
>>
>> What we need to do is find out what the inbound traffic really is. So, some
>> more questions:
>>
>> 1) On your graph is that bytes in/out or packets in/out? If this is packets
>> then the graph is likely correct and you are simply doing a lot of very
>> small reads.
> That's definitely bytes, not packets on the graph.
>> 2) To find out a bit more statistics you can connect to the xrootd server
>> using the xrd command. Do the following:
>>
>> xrd <the_xrootd_server>
>> query 1 lp
>> exit
>>
>> Then find the values between <in></in> and <out></out> that will give you
>> number of bytes in and number out. We need to see if that is reasonable
>> compared to actual requests which you will find between <rd></rd> and
>> <wr></wr> (read/write counters).
>
> <rd>1548943168</rd>
> <wr>0</wr>
> <in>37184778462</in>
> <out>26610785518162</out>
>
> So writes are 0, as expected. Inbound traffic is three orders of magnitude
> less than outbound, which doesn't correspond to our monitoring, but looks
> good.
>
>> 3) If those still seem not to correspond then we can look at the actual
>> xrootd kernel calls using strace. For instance:
>>
>> strace -f -xx -ttt -p <pid> -e trace=network 2>&1 | grep 'recv(' > <outfile>
> Okay, so I see no calls to 'recv(' in the strace. However, I do see calls to
> 'recvfrom(' that look like this:
>
> [pid 16211] 1308923304.005638 recvfrom(19,
> "\x06\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xd7\x4e\x00\x00\x08\xa4\x00\x00\x00\x00\x00",
> 24, 0, NULL, NULL) = 24
> [pid 16211] 1308923304.045490 recvfrom(19,
> "\x1f\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xdf\xf2\x00\x00\x00\x3e\x00\x00\x00\x00\x00",
> 24, 0, NULL, NULL) = 24
> [pid 16211] 1308923304.046110 recvfrom(19,
> "\x05\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xe0\x30\x00\x00\x00\x44\x00\x00\x00\x00\x00",
> 24, 0, NULL, NULL) = 24
> [pid 16211] 1308923304.999048 recvfrom(19,
> "\x05\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xe0\x74\x00\x00\x00\x14\x00\x00\x00\x00\x00",
> 24, 0, NULL, NULL) = 24
> [pid 16211] 1308923304.999260 recvfrom(19,
> "\x1f\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xe0\x88\x00\x00\x00\x14\x00\x00\x00\x00\x00",
> 24, 0, NULL, NULL) = 24
> [pid 16211] 1308923305.000027 recvfrom(19,
> "\x06\x00\x0b\xc5\x00\x00\x00\x00\x00\x00\x00\x00\x00\xe0\x9c\x00\x00\x00\x14\x00\x00\x00\x00\x00",
> 24, 0, NULL, NULL) = 24
>
>> This will capture server recv() requests (inbound traffic). No need to run
>> this more than a minute or two.
>>
>> 4) If that doesn't reveal anything then the only other option is that there
>> really is something else on that machine that is accepting incoming
>> traffic.
>> If it isn't udp then netstat should show you who that might be.
> At a first glance, netstat doesn't show any real flags. Also, xrootd really
> is the only thing running on this machine, and we see lots of input whenever
> a user runs jobs, (i.e. reads data from xrootd) and no input otherwise. So
> the two are at least correlated...
>
> Thanks for your help!
>
> Kyle
>> Andy
>>
>>
>> -----Original Message-----
>> From: Kyle Fransham
>> Sent: Thursday, June 23, 2011 1:05 PM
>> To: Andrew Hanushevsky
>> Cc: xrootd-l
>> Subject: Re: inbound traffic
>>
>> Hi Andy,
>>
>> This is a machine at UVic that we use to serve BaBar xrootd files to
>> virtual machines that we spawn in the cloud. It's running little else
>> besides xrootd. Any traffic on the external interface (the plot that I
>> sent) is xrootd. We see very high inbound traffic almost all of the time.
>>
>> On the back end, we have 10TB or so of data in a lustre filesystem
>> that's distributed across multiple workers. Since this is a distributed
>> filesystem, we expect (and we do see) traffic on the internal interface
>> that's associated with the reading of xrootd collections. But we don't
>> expect to see that externally...
>>
>> What else can I tell you about this machine/setup to help diagnose the
>> problem?
>>
>> Thanks,
>>
>> Kyle
>>
>> On 06/23/2011 03:36 PM, Andrew Hanushevsky wrote:
>>> Hi Kyle,
>>>
>>> There should be little inbound traffic unless that machine is used for
>>> more than just xrootd services. What machine are we talking about?
>>>
>>> Andy
>>>
>>> -----Original Message-----
>>> From: Kyle Fransham
>>> Sent: Thursday, June 23, 2011 7:48 AM
>>> To: xrootd-l
>>> Subject: inbound traffic
>>>
>>> Hi all,
>>>
>>> We've got a single xrootd server serving out BaBar root files over the
>>> WAN. We notice that there is a lot of inbound traffic, even though our
>>> files are exported read-only. Attached is a network plot showing the
>>> traffic on the xrootd interface for four simultaneous user analysis
>>> jobs. (In case you can't see the attachment, the inbound traffic tends
>>> to be about 75% of the outbound traffic.)
>>>
>>> Is this expected behaviour?
>>>
>>> Thanks,
>>>
>>> Kyle
>>>
>
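
The arithmetic in Andy's reply at the top of this thread can be reproduced with a short script. This is only a sketch: it assumes the 24-byte request is big-endian and laid out as the hex dump in that reply groups it (2 + 2 + 4 + 8 + 4 + 4 bytes). The labels File#, Offset and Length come from the reply; the names stream_id, request_code and trailing are my own labels for the fields the reply leaves unnamed. It decodes the six recvfrom samples from the strace output and recomputes the per-request averages from the <rd>, <in> and <out> counters.

```python
import struct
from statistics import mean, median

# Assumed layout (big-endian, 24 bytes total), following the grouping in the
# hex dump: 2-byte stream id, 2-byte request code, 4-byte file handle (File#),
# 8-byte offset, 4-byte read length, 4-byte trailing field.
# The names of the first two fields and the last one are illustrative labels.
HEADER = struct.Struct(">2sH4sQII")

# The six recvfrom() payloads quoted in the thread, written as hex strings.
SAMPLES_HEX = [
    "06000bc5000000000000000000d74e000008a40000000000",
    "1f000bc5000000000000000000dff20000003e0000000000",
    "05000bc5000000000000000000e030000000440000000000",
    "05000bc5000000000000000000e074000000140000000000",
    "1f000bc5000000000000000000e088000000140000000000",
    "06000bc5000000000000000000e09c000000140000000000",
]


def decode(raw: bytes) -> dict:
    """Split one 24-byte request into its fields under the assumed layout."""
    stream_id, request_code, file_handle, offset, length, trailing = HEADER.unpack(raw)
    return {
        "stream_id": stream_id.hex(),
        "request_code": request_code,
        "file_handle": file_handle.hex(),
        "offset": offset,
        "length": length,
        "trailing": trailing,
    }


requests = [decode(bytes.fromhex(h)) for h in SAMPLES_HEX]
lengths = [r["length"] for r in requests]

# Counters reported by "query 1 lp" in the thread.
reads = 1548943168            # <rd>
bytes_in = 37184778462        # <in>
bytes_out = 26610785518162    # <out>

avg_request = bytes_in / reads    # inbound bytes per read request
avg_payload = bytes_out / reads   # outbound bytes per read request

for r in requests:
    print(f"{r['stream_id']} {r['request_code']:04x} {r['file_handle']} "
          f"offset={r['offset']:#x} length={r['length']} bytes")
print(f"sample mean length:   {mean(lengths):.0f} bytes")
print(f"sample median length: {median(lengths):.0f} bytes")
print(f"average request size: {avg_request:.3f} bytes")
print(f"average payload size: {avg_payload:.0f} bytes")
```

Printing the median next to the mean of the sample makes the point from the reply visible: one very large read dominates the mean while most of the sampled requests are small.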