On Sat, 25 Jun 2011, Matevz Tadel wrote:
> OK, understood. That's also what I was hit by ... I had the default 'cms.delay
> servers' value (which turns out to be 80%) and four servers ... one went down
> ... and so the whole thing stopped.
Yes, if you have fewer than, say, 16 or so servers, the better
choice is to use a specific number rather than a percentage.
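
To illustrate why a percentage is fragile with only a few servers, here is a minimal Python sketch. It assumes the required count is the percentage of the configured servers rounded up to a whole server. That rounding rule and the function names are assumptions made for illustration; they are not taken from the cmsd source.

```python
def servers_required(total_servers, percent):
    """Smallest whole number of servers that satisfies the percentage.

    Assumption: the threshold is rounded up; cmsd may round differently.
    """
    # Integer ceiling of total_servers * percent / 100 (avoids float error).
    return (total_servers * percent + 99) // 100


def servers_that_may_fail(total_servers, percent):
    """How many servers can be down before the threshold is missed."""
    return total_servers - servers_required(total_servers, percent)


if __name__ == "__main__":
    for n in (4, 16, 32):
        print(n, servers_required(n, 80), servers_that_may_fail(n, 80))
```

Under that assumption, 80% of four servers requires all four, so losing a single server stalls everything, while with 16 servers three can be down. A fixed count (something like 'cms.delay servers 3' for a four-server cluster, the 3 being only an example value) leaves explicit headroom.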
> I don't have 'cms.sched maxload' set ... the default is 100, right? And another
> thing, runq percentage -- this pertains to system load average, the first number
> reported by the executable started via cms.perf (being 100 * LoadAvg15 / N_cores
> in XrdOlbMonPerf, it seems)?
Yes, the default is 100. I haven't looked at the MonPerf program in a
while. That was developed by a coalition of partners. Most people just key
off the cpu and io numbers as these are more relevant. Also, you can avoid
the bad side-effects of servers with highly variable load by nicing down
the cmsd (say -15 or -20, same as xntpd). The idea is that you want the
cmsd to be responsive regardless of load. It hardly uses any resources, so
a low nice value (i.e. high priority) won't impact anything else.
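
As a rough sketch of the renicing suggestion, the Python snippet below changes the priority of an already running process with os.setpriority. The helper name `renice` and the `cmsd_pid` placeholder are hypothetical. Setting a nice value below 0 normally requires root, and in practice the standard renice command or the startup script can do the same job.

```python
import os


def renice(pid, nice_value):
    """Set the nice value of process `pid` and return the value now in effect."""
    os.setpriority(os.PRIO_PROCESS, pid, nice_value)
    return os.getpriority(os.PRIO_PROCESS, pid)


if __name__ == "__main__":
    # Demonstrate on the current process, re-applying its current value
    # (a no-op that needs no special privileges).
    me = os.getpid()
    current = os.getpriority(os.PRIO_PROCESS, me)
    print(renice(me, current))
    # For a real cmsd, as root: renice(cmsd_pid, -15)
    # where cmsd_pid is a placeholder for the daemon's process id.
```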