Hi Andy,

Thanks for looking into this.

I'm pretty sure that sched was 0 for all 5 fields during the test. The Perl scripts reporting the load are running on the servers, though. What I might have added later was fuzz 100.

Anyway, with the current config file (the one on the web) the redirector still behaves the same way. Should I try commenting out cms.perf and cms.sched, restarting the whole cluster, and then putting things back in one at a time? Will this result in pure round-robin redirection of open requests?

Matevz

PS - Our xrootd cluster was in certificate hell yesterday; only 4 out of 10 machines were accessible, so it's a good time to have disruptive fun with it ... both :) and :(

On 3/7/14 5:07 PM, Andrew Hanushevsky wrote:
> Hi Matevz,
>
> OK, based on the log, the config file you pointed to is not the one used in the
> associated log. Why? Because a non-zero load is being calculated, which means
> the factors were not zero at the time of the test. Indeed, the redirector will
> avoid heavily loaded servers, and that would explain what you saw.
>
> Andy
>
> On Fri, 7 Mar 2014, Matevz Tadel wrote:
>
>> Hi Andy,
>>
>> Is this good enough, or should I prepare something else?
>>
>> Matevz
>>
>> On 02/27/14 10:49, Matevz Tadel wrote:
>>> Hi Andy,
>>>
>>> I had "cms.trace all" all along.
>>>
>>> Here is the extract of redirects:
>>> http://uaf-2.t2.ucsd.edu/~matevz/tmp/cmsd-redirect.txt
>>>
>>> The full log:
>>> http://uaf-2.t2.ucsd.edu/~matevz/tmp/cmsd.log
>>>
>>> And a sortable table of a set of ~200 files opened at 1-second intervals:
>>> http://uaf-2.t2.ucsd.edu/~matevz/tmp/ucsd-openfiles.html
>>> - you can sort it by open time (similar to the redirect extract);
>>> - or by server name to see the distribution over servers.
>>>
>>> Our servers are uaf-[3-9], cabinet-8-8-[0-8], cabinet-8-8-[10-13].
>>>
>>> You'll see that cabinets 0, 2, 3, 7, 8 and 10 do not get selected at all in this
>>> 200-file test and that uaf-4, 5 and 9 are only selected 2 or 3 times.
>>> I checked that there is nothing odd in the xrootd / cmsd logs on the
>>> under-provisioned nodes (and that I can talk to them directly).
>>>
>>> Ah, just noticed ... the cabinet nodes that don't get selected do have a higher
>>> load & CPU usage, and the ones that do get selected are not doing anything (which
>>> is really unusual; that's why I didn't even check it at first). So my cms.sched
>>> settings seem to be ignored!
>>>
>>> The full config (the redirector is xrootd.t2.ucsd.edu):
>>> http://uaf-2.t2.ucsd.edu/~matevz/tmp/xrootd.cfg
>>>
>>> Matevz
>>>
>>> On 02/27/14 01:05, Andrew Hanushevsky wrote:
>>>> Hi Matevz,
>>>>
>>>> The only way to find out is to turn on redirect debugging in the cmsd for a
>>>> while and see what the decisions were. We can go from there once we have a
>>>> timeline.
>>>>
>>>> Andy
>>>>
>>>> On Wed, 26 Feb 2014, Matevz Tadel wrote:
>>>>
>>>>> On 02/26/14 09:22, Matevz Tadel wrote:
>>>>>> Hi,
>>>>>>
>>>>>> We have ~20 xrootd servers at UCSD. All of them do something else, too,
>>>>>> and are thus under different loads. This led to practically all requests
>>>>>> going to only a few servers, so I set cms.sched to do round-robin. But this
>>>>>> doesn't help much; the open requests are still mostly sent to the same few
>>>>>> servers.
>>>>>>
>>>>>> Could it be that "cms.dfs lookup distrib" causes the redirector to send the
>>>>>> client to the "fastest to respond" server instead of decoupling the verify
>>>>>> and redirect steps?
>>>>>
>>>>> OK, that wasn't it ... I got hdfs configured on our redirector and tried
>>>>> lookup central, but it didn't change anything.
>>>>>
>>>>> What could cause the redirector to only redirect to a few servers? I have this
>>>>> now ... so it should be pure round-robin, right?
>>>>> cms.sched cpu 0 io 0 mem 0 pag 0 runq 0 space 0 fuzz 100 refreset 3600
>>>>>
>>>>> Matevz
>>>>>
>>>>> ########################################################################
>>>>> Use REPLY-ALL to reply to list
>>>>>
>>>>> To unsubscribe from the XROOTD-DEV list, click the following link:
>>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1