Hi Patrick,

Good to hear. What is it with SWT and power anyway? You've not been 
having much luck of late :-(

As for listing what's going on in the background, please refer to:

http://xrootd.org/doc/dev45/XRdv310.htm#_Toc464248860

which describes how to send messages to the xrootd and ask it to list 
background jobs. We used to have a Perl script to do that, but it's been 
long lost and should be replaced by Python 3 anyway. It's pretty simple; 
a rough sketch follows the steps below.

1) the script connects to the local xrootd socket. This is a Unix named 
socket located at <adminpath>/.xrootd/admin and you use the standard 
TCP connect/send/recv system calls on it. The only difference is how you 
construct the "address" you wish to connect to. A Python example on how to 
do that can be found here (and many more places by searching for "connect 
to Unix domain socket python"):
https://pymotw.com/2/socket/uds.html

2) Once connected issue a login request:
http://xrootd.org/doc/dev45/XRdv310.htm#_Toc464248872

3) Once logged in issue the command to list jobs:
http://xrootd.org/doc/dev45/XRdv310.htm#_Toc464248875
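
Putting those three steps together, here's a rough Python 3 sketch of the 
plumbing. Only the Unix-socket handling is meant literally; the socket path 
assumes the default adminpath, and the two request strings are placeholders 
that you'd fill in from the login and job-listing sections linked above:

#!/usr/bin/env python3
# Sketch: talk to the local xrootd admin socket and ask it to list
# background jobs. The two request strings are placeholders, not the
# real protocol messages; take those from the protocol reference.
import socket

# <adminpath>/.xrootd/admin; adjust to whatever adminpath your config sets
ADMIN_SOCKET = "/tmp/.xrootd/admin"

LOGIN_REQUEST = b"<login message per _Toc464248872>\n"         # placeholder
LIST_JOBS_REQUEST = b"<job-list message per _Toc464248875>\n"  # placeholder

def transact(sock, request, bufsize=65536):
    """Send one request and return whatever the server answers."""
    sock.sendall(request)
    return sock.recv(bufsize)

# 1) connect: the same connect/send/recv calls as TCP, only the address
#    construction differs (an AF_UNIX socket bound to a filesystem path)
with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
    sock.connect(ADMIN_SOCKET)

    # 2) log in to the admin interface
    print(transact(sock, LOGIN_REQUEST).decode(errors="replace"))

    # 3) ask for the list of background (e.g. checksum) jobs
    print(transact(sock, LIST_JOBS_REQUEST).decode(errors="replace"))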

You will note that all responses are in XML, but there's a Python XML 
library to help you out there; a minimal parsing sketch follows. We likely 
should replace this with an HTTP REST interface, but that's for some other 
day.
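
For the parsing side the standard library is enough. A minimal sketch, 
assuming the bytes you read back off the socket form one well-formed XML 
document (the element names will depend on what your server actually 
returns):

import xml.etree.ElementTree as ET

def dump_reply(reply):
    """Print every element that carries text; adapt the tag handling once
    you see the real job-listing reply from your server."""
    root = ET.fromstring(reply)
    for elem in root.iter():
        if elem.text and elem.text.strip():
            print(elem.tag + ": " + elem.text.strip())

You could feed that whatever transact() returns for the job-listing request 
in the sketch above.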

Andy


On Sun, 30 Jun 2019, Patrick McGuigan wrote:

> Hi Andy,
>
> Yes, the previous configuration worked as is.  Interestingly, the problem went 
> away after some forced interventions.
>
> About 36 hours before my post we had a power outage in our system when a 
> generator failed to start.  I noticed these problems about 24 hours after the 
> restart.  About 24 hours after my posting we lost cooling in the systems and 
> had to do another shutdown and restart.  Since the restart we have not seen 
> the problem occur again.
>
> Before the last restart, I did notice we were using an altered version of 
> XrdOlbMonPerf for reporting statistics back to the redirector.  I put our 
> previous version back in place.
>
> Either the storage did not come up cleanly after the generator failure, or 
> the redirector was seriously overloading the data servers in question because 
> of the XrdOlbMonPerf issue.  Either way the problem did not arise after a 
> fairly busy week in the cluster.
>
> I would still like to see an example of listing the ongoing checksums using 
> the admin interface.
>
> Thanks,
> Patrick
>
>
> On 6/24/19 1:22 PM, Andrew Hanushevsky wrote:
>> Hi Patrick,
>> 
>> Did this exact config file work in the previous release? Also, is this a 
>> complaint from your proxy server or from the data server?
>> 
>> To answer your questions:
>> 
>> Based on what you told me, the xrootd.chksum is using the native checksum 
>> calculation. So, there are no external resources being used. However, by 
>> default the checksum calculation uses memory mapped I/O to do the 
>> calculation. I suppose it's possible that there is not enough memory to map 
>> the file or memory mapping is not supported for the file system you are 
>> using. You can run the server with -d on the command line and then try 
>> the xrdadler32 command to see what in the world it is doing.
>> 
>> The default cksrdsz as noted in the manual is:
>> http://xrootd.org/doc/dev49/ofs_config.htm#_Toc522916527
>> ofs.cksrdsz 64m
>> 
>> Andy
>> 
>> On Sun, 23 Jun 2019, Patrick McGuigan wrote:
>> 
>>> Hi,
>>> 
>>> We recently updated our storage servers to CentOS 7.6 and our xrootd to 
>>> v4.9.1 and I am starting to see some odd behavior when checksums are being 
>>> computed. The config file is using:
>>> 
>>> xrootd.chksum max 4 adler32
>>> 
>>> 
>>> Some questions:
>>> 
>>> What is the default value for ofs.cksrdsz?
>>> 
>>> The documentation for xrootd.chksum states that one can use the 
>>> administrative interface to list checksum jobs.  Are there any examples 
>>> for this?
>>> 
>>> On a dataserver that is hosting a file I tried:
>>> 
>>> export XRD_LOGLEVEL=Dump
>>> xrdadler32 root://localhost//path/to/file
>>> 
>>> and when I look at the output the item that sticks out is this response 
>>> from the server:
>>> 
>>> 
>>> [2019-06-23 16:28:04.201142 -0500][Dump   ][XRootD            ] 
>>> [localhost:1094] Got kXR_wait response of 30 seconds to message kXR_query 
>>> (code: kXR_Qcksum, arg length: 85): Job resources currently not available.
>>> 
>>> How do I determine what resources are currently not available?
>>> The output of "xrdfs localhost query stats a" (somewhat trimmed) looks like:
>>> 
>>>  <stats id="buff">
>>>    <reqs>39456910</reqs>
>>>    <mem>16047104</mem>
>>>    <buffs>394</buffs>
>>>    <adj>0</adj>
>>>    <xlreqs>0</xlreqs>
>>>    <xlmem>0</xlmem>
>>>    <xlbuffs>0</xlbuffs>
>>>  </stats>
>>> 
>>>  <stats id="link">
>>>    <num>105</num>
>>>    <maxn>282</maxn>
>>>    <tot>47565</tot>
>>>    <in>6137490530255</in>
>>>    <out>17606648096180</out>
>>>    <ctime>14769030</ctime>
>>>    <tmo>354599</tmo>
>>>    <stall>0</stall>
>>>    <sfps>0</sfps>
>>>  </stats>
>>> 
>>>  <stats id="poll">
>>>    <att>105</att>
>>>    <en>357683</en>
>>>    <ev>357616</ev>
>>>    <int>0</int>
>>>  </stats>
>>>  <stats id="proc">
>>>    <usr>
>>>      <s>2941</s>
>>>      <u>438904</u>
>>>    </usr>
>>>    <sys>
>>>      <s>23163</s>
>>>      <u>484269</u>
>>>    </sys>
>>>  </stats>
>>>  <stats id="xrootd">
>>>    <num>47565</num>
>>>    <ops>
>>>      <open>17892</open>
>>>      <rf>0</rf>
>>>      <rd>23894633</rd>
>>>      <pr>0</pr>
>>>      <rv>0</rv>
>>>      <rs>0</rs>
>>>      <wv>0</wv>
>>>      <ws>0</ws>
>>>      <wr>29762958</wr>
>>>      <sync>0</sync>
>>>      <getf>0</getf>
>>>      <putf>0</putf>
>>>      <misc>1999122</misc>
>>>    </ops>
>>>    <sig>
>>>      <ok>0</ok>
>>>      <bad>0</bad>
>>>      <ign>0</ign>
>>>    </sig>
>>>    <aio>
>>>      <num>0</num>
>>>      <max>62</max>
>>>      <rej>39162</rej>
>>>    </aio>
>>>    <err>876103</err>
>>>    <rdr>0</rdr>
>>>    <dly>0</dly>
>>>    <lgn>
>>>      <num>47565</num>
>>>      <af>0</af>
>>>      <au>47565</au>
>>>      <ua>0</ua>
>>>    </lgn>
>>>  </stats>
>>>  <stats id="ofs">
>>>    <role>server</role>
>>>    <opr>7</opr>
>>>    <opw>1</opw>
>>>    <opp>0</opp>
>>>    <ups>0</ups>
>>>    <han>8</han>
>>>    <rdr>0</rdr>
>>>    <bxq>0</bxq>
>>>    <rep>0</rep>
>>>    <err>0</err>
>>>    <dly>0</dly>
>>>    <sok>0</sok>
>>>    <ser>0</ser>
>>>    <tpc>
>>>      <grnt>0</grnt>
>>>      <deny>0</deny>
>>>      <err>0</err>
>>>      <exp>0</exp>
>>>    </tpc>
>>>  </stats>
>>>  <stats id="sched">
>>>    <jobs>30332198</jobs>
>>>    <inq>0</inq>
>>>    <maxinq>6</maxinq>
>>>    <threads>42</threads>
>>>    <idle>21</idle>
>>>    <tcr>923</tcr>
>>>    <tde>881</tde>
>>>    <tlimr>0</tlimr>
>>>  </stats>
>>>  <stats id="sgen">
>>>    <as>1</as>
>>>    <et>0</et>
>>>    <toe>1561325797</toe>
>>>  </stats>
>>> 
>>> 
>>> 
>>> Regards,
>>> 
>>> Patrick
>>> 
>> 
>

########################################################################
Use REPLY-ALL to reply to list

To unsubscribe from the XROOTD-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1