Hi Matevz,

OK, based on the log, the config file you pointed to is not the one that was
in effect when the log was produced. Why? Because a non-zero load is being
calculated, which means the factors were not zero at the time of the test.
Indeed, the redirector will avoid heavily loaded servers, and that would
explain what you saw.

Andy
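
To make the reasoning above concrete, here is a minimal sketch of a weighted
load score. It is a simplified model, not cmsd's actual code: the formula
(sum of weight times metric, divided by 100), the ServerMetrics type, and the
weighted_load function are all assumptions introduced for illustration. Only
the metric names are taken from the cms.sched line quoted later in the
thread. The point is that if every weight were zero, every server would
report a load of zero, so a non-zero load in the log implies non-zero
weights were in effect.

```python
from dataclasses import dataclass

# Metric names mirror the cms.sched keywords in the quoted config line.
METRICS = ("cpu", "io", "mem", "pag", "runq")


@dataclass
class ServerMetrics:
    """Hypothetical per-server usage figures, each a percentage (0-100)."""
    cpu: int
    io: int
    mem: int
    pag: int
    runq: int


def weighted_load(metrics: ServerMetrics, weights: dict) -> int:
    """Assumed model: sum(weight% * metric%) / 100, as an integer."""
    total = sum(weights[name] * getattr(metrics, name) for name in METRICS)
    return total // 100


# All weights zero, as in "cpu 0 io 0 mem 0 pag 0 runq 0".
zero_weights = dict.fromkeys(METRICS, 0)

# Illustrative numbers only, not taken from the UCSD servers.
busy = ServerMetrics(cpu=90, io=70, mem=60, pag=10, runq=80)
idle = ServerMetrics(cpu=2, io=1, mem=5, pag=0, runq=0)

# With zero weights, a busy and an idle server are indistinguishable.
print(weighted_load(busy, zero_weights), weighted_load(idle, zero_weights))

# With any non-zero weight, the busy server scores higher.
cpu_only = dict(zero_weights, cpu=100)
print(weighted_load(busy, cpu_only), weighted_load(idle, cpu_only))
```

Under this model a log that shows differing, non-zero loads cannot have been
produced with all weights set to zero, whatever the servers were doing.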

On Fri, 7 Mar 2014, Matevz Tadel wrote:

> Hi Andy,
>
> Is this good enough or should I prepare something else?
>
> Matevz
>
> On 02/27/14 10:49, Matevz Tadel wrote:
>> Hi Andy,
>> 
>> I had "cms.trace all" all along.
>> 
>> This is the extract of redirects:
>>    http://uaf-2.t2.ucsd.edu/~matevz/tmp/cmsd-redirect.txt
>> 
>> The full log:
>>    http://uaf-2.t2.ucsd.edu/~matevz/tmp/cmsd.log
>> 
>> And a sortable table of a set of ~200 files opened with 1 second interval:
>>    http://uaf-2.t2.ucsd.edu/~matevz/tmp/ucsd-openfiles.html
>> - you can sort it by open time (similar to redirect extract);
>> - or by server name to see the distribution over servers.
>> 
>> Our servers are uaf-[3-9], cabinet-8-8-[0-8], cabinet-8-8-[10-13].
>> 
>> You'll see that cabinet 0, 2, 3, 7, 8 and 10 do not get selected at all in
>> this 200 file test and that uaf-4, 5 and 9 are only selected 2 or 3 times.
>> I checked there is no weirdness on xrootd / cmsd logs on the under
>> provisioned nodes (and that I can talk to them directly).
>> 
>> Ah, just noticed ... the cabinet nodes that don't get selected do have a
>> higher load & cpu usage and the ones that do are not doing anything (which
>> is really unusual, that's why I didn't even check it at first). So my
>> cms.sched settings seem to get ignored!
>> 
>> The full config, redirector is xrootd.t2.ucsd.edu:
>>    http://uaf-2.t2.ucsd.edu/~matevz/tmp/xrootd.cfg
>> 
>> Matevz
>> 
>> On 02/27/14 01:05, Andrew Hanushevsky wrote:
>>> Hi Matevz,
>>> 
>>> The only way to find out is to turn on redirect debugging in the cmsd for
>>> a while and see what the decisions were. We can go from there once we have
>>> a timeline.
>>> 
>>> Andy
>>> 
>>> On Wed, 26 Feb 2014, Matevz Tadel wrote:
>>> 
>>>> On 02/26/14 09:22, Matevz Tadel wrote:
>>>>> Hi,
>>>>> 
>>>>> We have ~20 xrootd servers at UCSD, all of them do something else, too,
>>>>> and are thus under different load. This led to practically all requests
>>>>> going to a few servers only, so I set cms.sched to do round-robin. But
>>>>> this doesn't help much, the open requests are still mostly sent to the
>>>>> same few servers.
>>>>> 
>>>>> Could it be that "cms.dfs lookup distrib" causes the redirector to send
>>>>> the client to the "fastest to respond" server instead of decoupling
>>>>> verify and redirect steps?
>>>> 
>>>> OK, that wasn't it ... I got hdfs configured on our redirector and tried
>>>> lookup central but it didn't change anything.
>>>> 
>>>> What could cause the redirector to only redirect to a few servers? I
>>>> have this now ... so it should be pure round-robin, right?
>>>>  cms.sched    cpu 0 io 0 mem 0 pag 0 runq 0 space 0 fuzz 100 refreset 3600
>>>> 
>>>> 
>>>> Matevz
>>>> 
>>>> ########################################################################
>>>> Use REPLY-ALL to reply to list
>>>> 
>>>> To unsubscribe from the XROOTD-DEV list, click the following link:
>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1
>>>> 
>> 
>
