Hi Andy,

Yes, the previous configuration worked as is.  Interestingly, the problem went
away after some forced interventions.

About 36 hours before my post we had a power outage in our system when a 
generator failed to start.  I noticed these problems about 24 hours after the 
restart.  About 24 hours after my posting we lost cooling in the systems and had 
to do another shutdown and restart.  Since the restart we have not seen the 
problem occur again.

Before the last restart, I did notice we were using an altered version of 
XrdOlbMonPerf for reporting statistics back to the redirector.  I put our 
previous version back in place.
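
For reference, the perf program is wired in on the data servers through the
cms.perf directive, roughly like the line below (the interval and path are from
memory rather than copied from the actual file):

  cms.perf int 3m pgm /usr/share/xrootd/utils/XrdOlbMonPerf 120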

Either the storage did not come up cleanly after the generator failure, or the
redirector was seriously overloading the data servers in question because of the
XrdOlbMonPerf issue.  Either way, the problem has not reappeared after a fairly
busy week in the cluster.

I would still like to see an example of listing the ongoing checksum jobs using
the admin interface.

Thanks,
Patrick


On 6/24/19 1:22 PM, Andrew Hanushevsky wrote:
> Hi Patrick,
> 
> Did this exact config file work in the previous release? Also, is this a
> complaint from your proxy server or from the data server?
> 
> To answer your questions:
> 
> Based on what you told me, xrootd.chksum is using the native checksum
> calculation, so there are no external resources being used. However, by default
> the checksum calculation uses memory-mapped I/O to do the calculation. I suppose
> it's possible that there is not enough memory to map the file, or that memory
> mapping is not supported for the file system you are using. You can run the
> server specifying -d on the command line and then try the xrdadler32 command to
> see what in the world it is doing.
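> 
> For example (the config path below is just a placeholder for whatever you
> normally start the server with):
> 
>   xrootd -d -c /etc/xrootd/xrootd-clustered.cfg
> 
> and then, from the same host:
> 
>   XRD_LOGLEVEL=Dump xrdadler32 root://localhost//path/to/file
> 
> and compare the server-side debug output against the client dump.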
> 
> The default ofs.cksrdsz, as noted in the manual, is:
> http://xrootd.org/doc/dev49/ofs_config.htm#_Toc522916527
> ofs.cksrdsz 64m
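> 
> If you need to change it, the same directive goes in the server config file, for
> instance (the value here is only an illustration, not a recommendation):
> 
>   ofs.cksrdsz 256m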
> 
> Andy
> 
> On Sun, 23 Jun 2019, Patrick McGuigan wrote:
> 
>> Hi,
>>
>> We recently updated our storage servers to CentOS 7.6 and our xrootd to v4.9.1 
>> and I am starting to see some odd behavior when checksums are being computed. 
>> The config file is using:
>>
>> xrootd.chksum max 4 adler32
>>
>>
>> Some questions:
>>
>> What is the default value for ofs.cksrdsz?
>>
>> The documentation for xrootd.chksum states that one can use the administrative 
>> interface to list checksum jobs.  Are there any examples of this?
>>
>> On a dataserver that is hosting a file I tried:
>>
>> export XRD_LOGLEVEL=Dump
>> xrdadler32 root://localhost//path/to/file
>>
>> and when I look at the output the item that sticks out is this response from 
>> the server:
>>
>>
>> [2019-06-23 16:28:04.201142 -0500][Dump   ][XRootD            ] 
>> [localhost:1094] Got kXR_wait response of 30 seconds to message kXR_query 
>> (code: kXR_Qcksum, arg length: 85): Job resources currently not available.
>>
>> How do I determine what resources are currently not available?
>> The output of "xrdfs localhost query stats a" (somewhat trimmed) looks like:
>>
>>  <stats id="buff">
>>    <reqs>39456910</reqs>
>>    <mem>16047104</mem>
>>    <buffs>394</buffs>
>>    <adj>0</adj>
>>    <xlreqs>0</xlreqs>
>>    <xlmem>0</xlmem>
>>    <xlbuffs>0</xlbuffs>
>>  </stats>
>>
>>  <stats id="link">
>>    <num>105</num>
>>    <maxn>282</maxn>
>>    <tot>47565</tot>
>>    <in>6137490530255</in>
>>    <out>17606648096180</out>
>>    <ctime>14769030</ctime>
>>    <tmo>354599</tmo>
>>    <stall>0</stall>
>>    <sfps>0</sfps>
>>  </stats>
>>
>>  <stats id="poll">
>>    <att>105</att>
>>    <en>357683</en>
>>    <ev>357616</ev>
>>    <int>0</int>
>>  </stats>
>>  <stats id="proc">
>>    <usr>
>>      <s>2941</s>
>>      <u>438904</u>
>>    </usr>
>>    <sys>
>>      <s>23163</s>
>>      <u>484269</u>
>>    </sys>
>>  </stats>
>>  <stats id="xrootd">
>>    <num>47565</num>
>>    <ops>
>>      <open>17892</open>
>>      <rf>0</rf>
>>      <rd>23894633</rd>
>>      <pr>0</pr>
>>      <rv>0</rv>
>>      <rs>0</rs>
>>      <wv>0</wv>
>>      <ws>0</ws>
>>      <wr>29762958</wr>
>>      <sync>0</sync>
>>      <getf>0</getf>
>>      <putf>0</putf>
>>      <misc>1999122</misc>
>>    </ops>
>>    <sig>
>>      <ok>0</ok>
>>      <bad>0</bad>
>>      <ign>0</ign>
>>    </sig>
>>    <aio>
>>      <num>0</num>
>>      <max>62</max>
>>      <rej>39162</rej>
>>    </aio>
>>    <err>876103</err>
>>    <rdr>0</rdr>
>>    <dly>0</dly>
>>    <lgn>
>>      <num>47565</num>
>>      <af>0</af>
>>      <au>47565</au>
>>      <ua>0</ua>
>>    </lgn>
>>  </stats>
>>  <stats id="ofs">
>>    <role>server</role>
>>    <opr>7</opr>
>>    <opw>1</opw>
>>    <opp>0</opp>
>>    <ups>0</ups>
>>    <han>8</han>
>>    <rdr>0</rdr>
>>    <bxq>0</bxq>
>>    <rep>0</rep>
>>    <err>0</err>
>>    <dly>0</dly>
>>    <sok>0</sok>
>>    <ser>0</ser>
>>    <tpc>
>>      <grnt>0</grnt>
>>      <deny>0</deny>
>>      <err>0</err>
>>      <exp>0</exp>
>>    </tpc>
>>  </stats>
>>  <stats id="sched">
>>    <jobs>30332198</jobs>
>>    <inq>0</inq>
>>    <maxinq>6</maxinq>
>>    <threads>42</threads>
>>    <idle>21</idle>
>>    <tcr>923</tcr>
>>    <tde>881</tde>
>>    <tlimr>0</tlimr>
>>  </stats>
>>  <stats id="sgen">
>>    <as>1</as>
>>    <et>0</et>
>>    <toe>1561325797</toe>
>>  </stats>
>>
>>
>>
>> Regards,
>>
>> Patrick
>>