LISTSERV 16.5 - XROOTD-DEV Archives

We've been having space reporting issues here at OU for quite some time.
After every xrootd/cmsd restart, commands like

df /xrd/?oss.cgroup=ATLASDATADISK
xrdfs cstore11 query space /xrd/?oss.cgroup=ATLASDATADISK

report completely nonsensical numbers: mostly 0, or at least MUCH too small.

I can usually get it fixed, but I have to run

frm_admin -c /etc/xrootd/xrootd-clustered.cfg -n clustered audit -fix -force usage

on all storage servers, and usually multiple times, before the numbers
make sense and are stable again. Every time I run this command,
the output is reasonable, something like this:

Audit of 545721 files in ATLASDATADISK space completed with no errors.
Claimed: 0
Actual: 71600543709295
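When collecting these audit summaries from all storage servers, it can help to parse them and see how far the recorded usage was off. The following is a minimal sketch. It assumes the summary always has exactly the three-line shape shown above, and that Claimed and Actual are byte counts. The regex, `AuditSummary` and `parse_audit` are hypothetical helpers, not part of frm_admin.

```python
import re
from dataclasses import dataclass

# Assumed format, copied from the frm_admin output above (not from frm_admin docs).
AUDIT_RE = re.compile(
    r"Audit of (?P<files>\d+) files in (?P<space>\S+) space completed with no errors\.\s*"
    r"Claimed:\s*(?P<claimed>\d+)\s*"
    r"Actual:\s*(?P<actual>\d+)"
)


@dataclass
class AuditSummary:
    space: str
    files: int
    claimed: int  # usage recorded before the fix (assumed bytes)
    actual: int   # usage found by the audit (assumed bytes)

    @property
    def drift(self) -> int:
        """How far the recorded usage was off."""
        return self.actual - self.claimed


def parse_audit(text: str) -> list[AuditSummary]:
    """Extract every audit summary found in a chunk of frm_admin output."""
    return [
        AuditSummary(m["space"], int(m["files"]), int(m["claimed"]), int(m["actual"]))
        for m in AUDIT_RE.finditer(text)
    ]


if __name__ == "__main__":
    sample = (
        "Audit of 545721 files in ATLASDATADISK space completed with no errors.\n"
        "Claimed: 0\n"
        "Actual: 71600543709295\n"
    )
    for s in parse_audit(sample):
        print(f"{s.space}: {s.files} files, usage off by {s.drift / 1e12:.1f} TB")
```

Running this against the output from each server after every `audit -fix` pass would show whether the correction keeps shrinking or reappears at the same size.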

But when I then run the oss.cgroup commands above again, they're back
to 0 or very small, so I have to repeat the fix several times.

I don't think there's a problem with the space configuration
or with space reporting in general, since it behaves perfectly normally
and consistently once it's stable. But after every restart
it's completely messed up, and I can only get it stable again
by re-running the fix multiple times; even then it only seems
to 'take' when the file system is idle. That makes me think
there might be some kind of 'caching' or race condition going on
that keeps messing things up even after I 'fix' them again and again.

Each 'fix' by itself looks perfectly fine, too: the reported
'Actual:' numbers are perfectly reasonable. But they don't stay;
they seem to get overwritten by the wrong numbers again.

Here's an example from the fix I just ran on one of our 7 identical storage servers:

Audit of 35069 files in ATLASSCRATCHDISK space completed with no errors.
Claimed: 20
Actual: 1734698250629

Yet a few minutes later, the oss.cgroup commands above report this again:

ATLASSCRATCHDISK:
Filesystem 1K-blocks Used Available Use% Mounted on
xrootdfs 765694027776 0 299222330089 0% /xrd
0
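To catch this state automatically after a restart, one could compare the `df` figure with the audit result. The following is a minimal sketch under two assumptions: the xrootdfs mount at /xrd reports usage for the whole cluster, so its Used column (in 1K blocks) should never be smaller than what a single server's audit found in the same space; and the `df` output has the column layout shown above. `parse_df_used_kib` and `looks_stale` are hypothetical helpers.

```python
def parse_df_used_kib(df_output: str) -> int:
    """Return the 'Used' column (1K blocks) from the data line after the df header."""
    lines = [ln for ln in df_output.splitlines() if ln.strip()]
    header_idx = next(
        (i for i, ln in enumerate(lines) if ln.startswith("Filesystem")), None
    )
    if header_idx is None or header_idx + 1 >= len(lines):
        raise ValueError("no df data line found")
    # Columns: Filesystem 1K-blocks Used Available Use% Mounted-on
    fields = lines[header_idx + 1].split()
    return int(fields[2])


def looks_stale(df_output: str, audited_actual_bytes: int) -> bool:
    """Heuristic: flag the space if the (assumed cluster-wide) Used figure is
    smaller than what one server's audit found for the same space."""
    used_bytes = parse_df_used_kib(df_output) * 1024
    return used_bytes < audited_actual_bytes
```

With the ATLASSCRATCHDISK output above (Used 0) and the audit's Actual of 1734698250629 bytes, `looks_stale` returns True, which matches the symptom described here.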

The space/usage configuration looks fine:

   oss.usage log $(osscachepath)/.logs
   oss.space ATLASDATADISK assign /xrd/atlasdatadisk/
   oss.space ATLASLOCALGROUPDISK assign /xrd/atlaslocalgroupdisk/
   oss.space ATLASSCRATCHDISK assign /xrd/atlasscratchdisk/

And all of that works perfectly fine and tracks file transfers and deletions on the storage very reasonably, until I need to restart xrootd; then it's messed up again.

Let me know what other details I can add to help debug this.

Thanks a lot,
Horst

GitHub issue: https://github.com/xrootd/xrootd/issues/1213