Hi Andy,
Is this good enough, or should I prepare something else?
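To count selections per server from the ~200-file test below, a minimal Python sketch could look like the one that follows. With pure round-robin over the 20 listed hosts, 200 opens should put about 10 on each. The sketch assumes the input is a hypothetical list of short host names (one per open, such as uaf-3 or cabinet-8-8-1), and it does not show parsing of the HTML table. The 0.5 "under-selected" threshold is an arbitrary choice.

```python
from collections import Counter


def expand_hosts():
    """Expand the host ranges from the thread:
    uaf-[3-9], cabinet-8-8-[0-8] and cabinet-8-8-[10-13] (20 hosts)."""
    hosts = [f"uaf-{i}" for i in range(3, 10)]
    hosts += [f"cabinet-8-8-{i}" for i in range(0, 9)]
    hosts += [f"cabinet-8-8-{i}" for i in range(10, 14)]
    return hosts


def tally(selected, hosts):
    """Count how often each host was chosen; hosts never chosen stay at 0."""
    counts = Counter({h: 0 for h in hosts})
    counts.update(selected)  # unknown names are counted too, but not reported
    return counts


def report(counts, hosts, total, fraction=0.5):
    """Compare counts with an even (round-robin) split of `total` opens.

    Returns the expected opens per host, the hosts never selected, and the
    hosts selected fewer than `fraction` times the expected number."""
    expected = total / len(hosts)
    never = [h for h in hosts if counts[h] == 0]
    low = [h for h in hosts if 0 < counts[h] < fraction * expected]
    return expected, never, low


if __name__ == "__main__":
    # Hypothetical input: one server name per open, in open-time order,
    # e.g. taken from the server column of ucsd-openfiles.html (parsing not shown).
    selected = ["uaf-3", "uaf-6", "cabinet-8-8-1", "uaf-3", "uaf-4"]
    hosts = expand_hosts()
    expected, never, low = report(tally(selected, hosts), hosts, len(selected))
    print(f"expected per host: {expected:.2f}")
    print("never selected:", ", ".join(never))
    print("under-selected:", ", ".join(low) or "none")
```

If the table lists fully qualified names (e.g. with a .t2.ucsd.edu suffix), they would need to be shortened first so they match the expanded host list.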
Matevz
On 02/27/14 10:49, Matevz Tadel wrote:
> Hi Andy,
>
> I had "cms.trace all" all along.
>
> Here is an extract of the redirects:
> http://uaf-2.t2.ucsd.edu/~matevz/tmp/cmsd-redirect.txt
>
> The full log:
> http://uaf-2.t2.ucsd.edu/~matevz/tmp/cmsd.log
>
> And a sortable table of ~200 files opened at 1-second intervals:
> http://uaf-2.t2.ucsd.edu/~matevz/tmp/ucsd-openfiles.html
> - you can sort it by open time (similar to the redirect extract);
> - or by server name to see the distribution over servers.
>
> Our servers are uaf-[3-9], cabinet-8-8-[0-8], cabinet-8-8-[10-13].
>
> You'll see that cabinets 0, 2, 3, 7, 8 and 10 are not selected at all in this
> 200-file test, and that uaf-4, 5 and 9 are selected only 2 or 3 times. I checked
> that there is nothing odd in the xrootd / cmsd logs on the underused nodes (and
> that I can talk to them directly).
>
> Ah, just noticed: the cabinet nodes that don't get selected do have higher
> load and CPU usage, while the ones that do get selected are otherwise idle
> (which is really unusual, so I didn't even check it at first). So my
> cms.sched settings seem to be ignored!
>
> The full config (the redirector is xrootd.t2.ucsd.edu):
> http://uaf-2.t2.ucsd.edu/~matevz/tmp/xrootd.cfg
>
> Matevz
>
> On 02/27/14 01:05, Andrew Hanushevsky wrote:
>> Hi Matevz,
>>
>> The only way to find out is to turn on redirect debugging in the cmsd for a
>> while and see what the decisions were. We can go from there once we have a
>> timeline.
>>
>> Andy
>>
>> On Wed, 26 Feb 2014, Matevz Tadel wrote:
>>
>>> On 02/26/14 09:22, Matevz Tadel wrote:
>>>> Hi,
>>>>
>>>> We have ~20 xrootd servers at UCSD; all of them do other work, too, and
>>>> are thus under different loads. This led to practically all requests going
>>>> to only a few servers, so I set cms.sched to do round-robin. But this doesn't
>>>> help much: the open requests are still mostly sent to the same few servers.
>>>>
>>>> Could it be that "cms.dfs lookup distrib" causes the redirector to send the
>>>> client to the "fastest to respond" server instead of decoupling the verify
>>>> and redirect steps?
>>>
>>> OK, that wasn't it. I configured HDFS on our redirector and tried
>>> "lookup central", but it didn't change anything.
>>>
>>> What could cause the redirector to redirect to only a few servers? I now
>>> have the following, so it should be pure round-robin, right?
>>> cms.sched cpu 0 io 0 mem 0 pag 0 runq 0 space 0 fuzz 100 refreset 3600
>>>
>>>
>>> Matevz
>>>
>>> ########################################################################
>>> Use REPLY-ALL to reply to list
>>>
>>> To unsubscribe from the XROOTD-DEV list, click the following link:
>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1
>>>
>