Thanks Andy,

An interval of 110 minutes is close to what the cms.ping defaults imply.  If I
read the docs correctly, the redirector gathers load data every 10 minutes
(which I now suspect is responsible for the log entries in the data server) but
logs the information only on every 10th fetch of load data.  This implies a log
entry every 100 minutes at the redirector.
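
As a rough sketch of the arithmetic above: the function name and its parameters are illustrative only and are not part of cmsd or its configuration, and the two input values are the ones read from the cms.ping docs in this message, not independently verified defaults.

```python
def expected_log_interval_minutes(fetch_interval_min: float, fetches_per_log: int) -> float:
    """Minutes between redirector log entries, assuming load data is fetched
    every fetch_interval_min minutes and logged on every fetches_per_log-th fetch."""
    if fetch_interval_min <= 0 or fetches_per_log <= 0:
        raise ValueError("both arguments must be positive")
    return fetch_interval_min * fetches_per_log


# Assumed values, taken from this message's reading of the cms.ping docs.
expected = expected_log_interval_minutes(10, 10)

# Spacing between "Node:" lines for one data server, as seen in the redirector log.
observed = 110

print(f"expected {expected} min, observed {observed} min, drift {observed - expected} min")
```

The 10-minute difference between the expected and observed values is of the same order as the timing drift between the two asynchronous processes that Andy describes in the quoted reply.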

I'll hold off on any bug reports until after I upgrade our systems to something 
newer and see if anything is still amiss.

Regards,

Patrick






On 10/28/2016 02:47 PM, Andrew Hanushevsky wrote:
> Hi Patrick,
>
> There is an interaction between the interval specified to the script and the
> "int" parameter supplied to the receiver which invariably causes time dilation.
> I don't have a good solution for that other than to say 7 minutes is close
> enough. Note that the documentation
>
> http://xrootd.org/doc/dev45/cms_config.htm#_Toc454223033
>
> says "estimated" time when specifying the "int" value. I know that isn't
> comforting but we have the issue of trying to co-ordinate two async processes
> that compute the load statistics so the timing will invariably be off without a
> lot more effort.
>
> As for the redirector providing those statistics that's controlled by the "ping"
> directive.
>
> http://xrootd.org/doc/dev45/cms_config.htm#_Toc454223053
>
> and should be printed every 10 minutes or so (by default). So, if it is actually
> 110 minutes then something is amiss here. You can, of course, report that as a
> bug in github if this is causing problems.
>
> Andy
>
> On Fri, 28 Oct 2016, Patrick McGuigan wrote:
>
>> Hi Andy,
>>
>> I am looking at using something very similar in my setup and I was wondering
>> about the data that I am seeing in the log files.
>>
>> In my data servers I am using:
>> cms.perf int 5m pgm /usr/share/xrootd/utils/XrdOlbMonPerf 300
>>
>> In my cmsd.log files I see lines like:
>> 161028 00:05:59 449 Report_Usage cpu=7 net=72 xeq=0 mem=99 pag=0 dsk=53 23414869
>>
>> being generated every 10 minutes.  I would have expected to see this every
>> five minutes based on the interval of 5 minutes.  Am I missing something?
>>
>> Also, does the log line indicate data being collected from the performance
>> monitoring program, or data being forwarded to the redirector?
>>
>>
>> In the redirector I see lines like:
>> 161028 01:45:59 27500 Node: storage-23-14.local load=9; cpu=10 net=13 inq=2
>> mem=99 pag=0 dsk=23368366 utl=53 shr=[100 3 0]
>>
>> The period between these lines (for the same data server) appears to be 110
>> minutes.  Is the line above simply a periodic summary of the conditions that
>> were last reported from the dataservers?
>>
>>
>> Thanks,
>>
>> Patrick
>>
>>
>>
>> On 10/18/2016 03:23 PM, Andrew Hanushevsky wrote:
>>> Hi Max,
>>>
>>> The load is calculated by the script you specify to run using cms.perf
>>>
>>> http://xrootd.org/doc/dev44/cms_config.htm#_Toc454223033
>>>
>>> if you haven't specified it then only the file system loads are reported and
>>> everything else is zero. We provide a sample script in the "utils" directory of
>>> the source repo called XrdOlbMonPerf and it should work for any Linux system.
>>> However, do verify as we normally don't update this script unless someone
>>> complains and offers a solution.
>>>
>>> Andy
>>>
>>> -----Original Message----- From: Fischer, Max (SCC)
>>> Sent: Tuesday, October 18, 2016 2:45 AM
>>> To: [log in to unmask]
>>> Subject: empty usage statistics (for cms.sched)
>>>
>>> Hi all,
>>>
>>> I'm investigating some broken load balancing in one of our sub-clusters.
>>> From a set of similar servers using a shared file system, about 1-2 show
>>> considerably higher load (750 vs 30).
>>> My attempt was to change `cms.sched` to prefer machines with low load, but there
>>> was no effect from changing this policy.
>>>
>>> After switching on logging of `cms.ping` [1] statistics, it turns out that *all*
>>> load statistics from servers are reported as 0:
>>>    161018 11:32:09 25026 Node: f01-101-136-e.gridka.de load=0; cpu=0 net=0 inq=0
>>> mem=0 pag=0 dsk=871483960 utl=49 shr=[100 24 0]
>>>
>>> Do I have to adjust the configuration of server or manager to make this work?
>>> The manager is running v4.3, the servers are running v4.3 and v4.4.
>>>
>>> Cheers,
>>> Max
>>>
>>> [1] cms.ping docs
>>> http://xrootd.org/doc/dev43/cms_config.htm#_Toc436250534
>>> ########################################################################
>>> Use REPLY-ALL to reply to list
>>>
>>> To unsubscribe from the XROOTD-L list, click the following link:
>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
>>