Doug,

There is one redirector -- it doesn't seem to have a problem with any of 
this -- and one data server.  The "problem" namespace is built on five 
underlying filesystems:

/dev/sde1             9.0T  4.4T  4.2T  51% /disk/data3
/dev/sdi1             9.0T  4.4T  4.2T  51% /disk/data4
/dev/sdd1             9.0T  4.2T  4.4T  49% /disk/data5
/dev/sdj1              19T  6.3T   12T  35% /disk/data6
/dev/sdf1              19T  6.3T   12T  35% /disk/data7

The other namespace is built on a single filesystem:

/dev/sdg1             5.4T  282G  4.9T   6% /disk/data0
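
To put the two namespaces side by side, the df figures above can be totalled with a short script. This is only a sketch: the helper names (to_tib, totals) and the grouping of mount points into SRM and ATLASC sets are made up for illustration, and the inputs are the rounded values that df -h printed, so the sums are approximate.

```python
# Lines copied from the df -h output above (rounded, human-readable sizes).
DF_LINES = """\
/dev/sde1             9.0T  4.4T  4.2T  51% /disk/data3
/dev/sdi1             9.0T  4.4T  4.2T  51% /disk/data4
/dev/sdd1             9.0T  4.2T  4.4T  49% /disk/data5
/dev/sdj1              19T  6.3T   12T  35% /disk/data6
/dev/sdf1              19T  6.3T   12T  35% /disk/data7
/dev/sdg1             5.4T  282G  4.9T   6% /disk/data0
"""

# df -h uses powers of 1024; convert every suffix to "T" units.
UNITS = {"K": 1 / 1024**3, "M": 1 / 1024**2, "G": 1 / 1024, "T": 1.0}

# Hypothetical grouping: which mount points back which namespace.
SRM = {"/disk/data3", "/disk/data4", "/disk/data5", "/disk/data6", "/disk/data7"}
ATLASC = {"/disk/data0"}


def to_tib(text):
    """Convert a df -h size such as '4.4T' or '282G' to a float in T."""
    return float(text[:-1]) * UNITS[text[-1]]


def totals(df_text, mounts):
    """Return (total size, total used) in T for the given mount points."""
    size = used = 0.0
    for line in df_text.splitlines():
        fields = line.split()
        if len(fields) == 6 and fields[5] in mounts:
            size += to_tib(fields[1])
            used += to_tib(fields[2])
    return size, used


for name, mounts in (("srm", SRM), ("atlasC", ATLASC)):
    size, used = totals(DF_LINES, mounts)
    print(f"{name}: {used:.1f}T used of {size:.1f}T")
```

With the numbers above, the srm namespace comes to roughly 25.6T used out of 65T across five filesystems, against about 0.3T used out of 5.4T on the single atlasC filesystem.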

There is no fancy configuration:


all.manager <redacted>:1213


if named srm
   all.export /srm
   oss.localroot /disk/data3/ns-srm
   oss.space public /disk/data3/xrddata
   oss.space public /disk/data4/xrddata
   oss.space public /disk/data5/xrddata
   oss.space public /disk/data6/xrddata
   oss.space public /disk/data7/xrddata
   xrd.port 1094
fi


if named atlasc
   all.export /atlasC
   oss.localroot /disk/data0/ns-atlasC
   oss.space public /disk/data0/xrddata
   xrd.port 1095
fi


xrd.allow host <redacted>
xrd.allow host *.at3f

cms.allow host <redacted>
cms.allow host *.at3f

all.role server


Plus some security configuration...

	-paul



> Hi Paul,
> 
>   What is the configuration of your xrootd system?  How many data 
> servers, redirectors, etc.?
> 
> thanks,
> 
> Doug
> 
> On 10/07/2011 02:37 PM, Paul T. Keener wrote:
>> Lukasz,
>>
>>> 2011/10/7 Paul T. Keener <[log in to unmask]>:
>>>> I see load averages of 140 to 200 or more and I see a high percent of
>>>> CPU time in wait states (70-80%) with a large number of interrupts
>>>> (10-15K), both of these reported by dstat.
>>>
>>> Hi Paul,
>>>
>>>    if a significant fraction of the CPU is used by ksoftirqd, it's
>>> usually an indication that the kernel has trouble managing the
>>> interrupts coming from the NIC (too much data coming from or going to
>>> the wire). If possible, try to put the network driver for your device
>>> in polling mode (the kernel then devotes one core to handling the
>>> stuff coming from the NIC and disables the NIC's interrupts); you
>>> should see an improvement.
>>
>> I will look at this, but if it really is a network problem, wouldn't I 
>> see it regardless of which xrootd namespace I was talking to?
>>
>> Thanks.
>>
>>     -paul
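
One way to tell the two explanations in this thread apart (CPU time spent waiting on I/O versus time spent servicing softirqs from the NIC) is to compare two samples of the aggregate "cpu" line in /proc/stat. The following is a minimal sketch, not what dstat itself does: the function names are made up for illustration, and it assumes the standard Linux field order on that line (user, nice, system, idle, iowait, irq, softirq, steal).

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat on Linux.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line as a dict."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            # Pad with zeros in case a kernel reports fewer columns.
            values += [0] * (len(FIELDS) - len(values))
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_shares(before, after):
    """Return each field's share of the CPU time elapsed between two samples."""
    delta = {name: after[name] - before[name] for name in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    return {name: delta[name] / total for name in FIELDS}


def measure(interval=5.0, path="/proc/stat"):
    """Sample /proc/stat twice, `interval` seconds apart, and return the shares."""
    with open(path) as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_line(f.read())
    return cpu_shares(before, after)
```

Calling measure() while reading from each namespace in turn and comparing the "iowait" and "softirq" shares would show whether the wait time follows the namespace (pointing at the disks) or stays the same for both (pointing at the network path).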