LISTSERV 16.5 - XROOTD-L Archives

Hi Matevz,

>>> I still suspect something goes wrong with the disk selection for
>>> placement of new files -- the full disk should simply not be
>>> chosen over the other disks and thus stop filling up
Does that mean it should stop filling up when it reaches the high 
watermark or at 100%?
So even if i have the situation that all my access/modification times on 
that particular disk are rather new (since i cleared it after i saw the 
issue first) you think it shouldn't happen that this disk fills up 
completely?

I can try to run a custom build in a new container if you give me the 
instructions. However it probably should be on top of v5.4.2 since we 
saw issues with newer versions (very slow transfers - didn't have time 
to look in detail into that).

Cheers,
Nikolai

On 2/28/23 09:05, Matevz Tadel wrote:
> Hi Andy,
> 
> Yeah, the code also looked good to me ... but then, if a fuller disk still gets
> selected (with assumed default oss.alloc, fuzz=0), then perhaps it's the cache
> scan that assigns the partition a wrong free space value? I'm not saying the
> problem is in xrootd, maybe somebody else is playing tricks at the
> system/container level?
> 
> Matevz
> 
> On 2/27/23 23:50, Andrew Hanushevsky wrote:
>> Hi Matevz,
>>
>> It is XrdOssCache::Alloc() and I will guarantee you that it will not choose a
>> full disk if it has been properly configured. The default is proper so I don't know
>> if there have been any overrides here.
>>
>> Andy
>>
>>
>> On Mon, 27 Feb 2023, Matevz Tadel wrote:
>>
>>> Hi Nikolai,
>>>
>>> I don't think this matters, xcache uses last file access time as written in the
>>> cinfo file.
>>>
>>> I still suspect something goes wrong with the disk selection for placement of
>>> new files -- the full disk should simply not be chosen over the other disks and
>>> thus stop filling up.
>>>
>>> Wiping the whole cache would help, at least for a while :) ... but it would be
>>> good to understand what is going on here. Would you be able to run with a custom
>>> build? Potentially we could just replace a single library to include some
>>> printouts.
>>>
>>> Andy, remind me please ... where is the code that does disk selection? Is it
>>> XrdOssCache::Alloc()?
>>>
>>> Cheers,
>>> Matevz
>>>
>>> On 2/16/23 02:27, Nikolai Hartmann wrote:
>>>> Hi again,
>>>>
>>>> Actually it seems to only change the "change" time (st_ctime)
>>>>
>>>> touch test
>>>> stat test
>>>> [...]
>>>> Access: 2023-02-16 11:25:11.962804882 +0100
>>>> Modify: 2023-02-16 11:25:11.962804882 +0100
>>>> Change: 2023-02-16 11:25:11.962804882 +0100
>>>>   Birth: 2023-02-16 11:25:11.962804882 +0100
>>>>
>>>> chown xrootd test
>>>> stat test
>>>> [...]
>>>> Access: 2023-02-16 11:25:11.962804882 +0100
>>>> Modify: 2023-02-16 11:25:11.962804882 +0100
>>>> Change: 2023-02-16 11:25:20.322843125 +0100
>>>>   Birth: 2023-02-16 11:25:11.962804882 +0100
>>>>
>>>> Does this play a role?
>>>>
>>>> Cheers,
>>>> Nikolai
>>>>
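The `touch`/`chown`/`stat` experiment above can be reproduced without root as a rough sketch. The assumptions: a POSIX filesystem, and `os.chmod` used as a stand-in for `chown`, since both are metadata-only changes that update `st_ctime`. The helper name `ctime_only_change` is made up for this illustration. Note that, according to Matevz's reply in this thread, xcache takes the last access time from the cinfo file, so these filesystem timestamps may not matter for purging at all.

```python
import os
import tempfile
import time


def ctime_only_change():
    """Create a temp file, change its mode, and return (before, after) stat results."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        before = os.stat(path)
        time.sleep(0.05)  # let the clock advance past the timestamp granularity
        os.chmod(path, 0o640)  # metadata-only change, stand-in for chown
        after = os.stat(path)
    finally:
        os.remove(path)
    return before, after


if __name__ == "__main__":
    before, after = ctime_only_change()
    print("atime unchanged:", before.st_atime_ns == after.st_atime_ns)
    print("mtime unchanged:", before.st_mtime_ns == after.st_mtime_ns)
    print("ctime advanced: ", after.st_ctime_ns > before.st_ctime_ns)
```

Anything that ranks files by access or modification time would therefore not see a `chown` pass; only logic that reads the change time would.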
>>>> On 2/16/23 11:18, Nikolai Hartmann wrote:
>>>>> Hi Matevz (including xrootd list again which i forgot in the last reply),
>>>>>
>>>>>> Well, if for some reason more new files are placed on a single disk,
>>>>>> those files will be "newer" and purge would preferentially wipe data
>>>>>> off other disks.
>>>>> Mhhhh - then i have an idea how i may have triggered this. As mentioned in my
>>>>> first email the issue started after i updated my container image and had to
>>>>> change the xrootd user ids. This changes the Access time of the files - if
>>>>> that is used by xrootd to determine which files are newer then it could just
>>>>> be that the chown process walked this directory last and therefore will purge
>>>>> it last.
>>>>> When i then deleted it when the disk ran full i made the problem even worse
>>>>> since now all the files that end up there are recently accessed.
>>>>>
>>>>> So deleting the whole cache should fix it?
>>>>>
>>>>> Cheers,
>>>>> Nikolai
>>>>>
>>>>> On 2/16/23 10:50, Matevz Tadel wrote:
>>>>>> Hi Andy, Nikolai,
>>>>>>
>>>>>> On 2/15/23 23:51, Andrew Hanushevsky wrote:
>>>>>>> Hi Nikolai,
>>>>>>>
>>>>>>> Hmm, this sounds like an off by one problem in Xcache.
>>>>>>
>>>>>> How? XCache does not do disks, it just uses oss API to a pool.
>>>>>>
>>>>>>> The question is what is
>>>>>>> the "one". It does seem that it consistently does not purge files from a
>>>>>>> particular disk but then again it doesn't know about disks. So, there is some
>>>>>>> systematic issue that resolves to ignoring a disk. Matevz?
>>>>>>
>>>>>> Well, if for some reason more new files are placed on a single disk, those
>>>>>> files will be "newer" and purge would preferentially wipe data off other disks.
>>>>>>
>>>>>> That's why I asked in the first email how disks are selected for new files and
>>>>>> if we could inject some debug printouts there.
>>>>>>
>>>>>> Perhaps a coincidence, but the full disk is the one that is listed first by df.
>>>>>>
>>>>>> The docs say default for oss.alloc fuzz = 0 and that this "forces oss to always
>>>>>> use the partition with the largest amount of free space" -- so the fuller one
>>>>>> should never get selected for new files. And xcache does pass the appropriate
>>>>>> oss.asize opaque parameter to open.
>>>>>>
>>>>>> https://urldefense.com/v3/__https://xrootd.slac.stanford.edu/doc/dev56/ofs_config.htm*_Toc116508676__;Iw!!Mih3wA!CPJXm6eN-2_hoD2H_DidLrJJIwTvYUTK7V8pRT64GhSwBlmFYugKLfTk2O6zoR2otc1TQNvfczttg_nl$
>>>>>> Matevz
>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>>
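The placement rule quoted from the docs (fuzz = 0 means "always use the partition with the largest amount of free space") can be illustrated with a minimal sketch. This is not the `XrdOssCache::Alloc()` code; `pick_partition` and `simulate_placement` are hypothetical helpers, the 10 GB file size is an arbitrary assumption, and the free-space figures are the "Avail" column of the 14-disk `df` listing posted in this thread.

```python
# Illustrative sketch only; this is NOT the XrdOssCache::Alloc() code.
# Free space in GB per mount point, copied from the df output in this thread.
AVAIL_GB = {
    "/srv/xcache/b": 215, "/srv/xcache/a": 560, "/srv/xcache/h": 588,
    "/srv/xcache/j": 584, "/srv/xcache/f": 580, "/srv/xcache/m": 535,
    "/srv/xcache/c": 553, "/srv/xcache/g": 612, "/srv/xcache/i": 596,
    "/srv/xcache/l": 518, "/srv/xcache/n": 570, "/srv/xcache/e": 593,
    "/srv/xcache/k": 677, "/srv/xcache/d": 602,
}


def pick_partition(avail_gb):
    """Hypothetical helper: return the mount point with the most free space."""
    return max(avail_gb, key=avail_gb.get)


def simulate_placement(avail_gb, file_gb, n_files):
    """Place n_files of file_gb each, always on the emptiest partition."""
    avail = dict(avail_gb)  # work on a copy so the input stays untouched
    chosen = []
    for _ in range(n_files):
        target = pick_partition(avail)
        avail[target] -= file_gb
        chosen.append(target)
    return chosen


if __name__ == "__main__":
    placements = simulate_placement(AVAIL_GB, file_gb=10, n_files=100)
    print("first pick:", placements[0])
    print("b ever chosen:", "/srv/xcache/b" in placements)
```

Under this rule the fullest disk is not picked until every other disk has dropped to its level, which is why a disk that keeps filling up points at either a different selection rule being in effect or a wrong free-space value being fed into it.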
>>>>>>> On Thu, 16 Feb 2023, Nikolai Hartmann wrote:
>>>>>>>
>>>>>>>> Hi Andy,
>>>>>>>>
>>>>>>>> The behavior seems to be that it purges all the disks except one. After the
>>>>>>>> other disks now again surpassed the threshold of 95% it seemed to trigger the
>>>>>>>> cleanup and now i have this:
>>>>>>>>
>>>>>>>> Filesystem                 Type      Size  Used Avail Use% Mounted on
>>>>>>>> /dev/sdb                   btrfs     5,5T  5,3T  215G  97% /srv/xcache/b
>>>>>>>> /dev/sda                   btrfs     5,5T  5,0T  560G  90% /srv/xcache/a
>>>>>>>> /dev/sdh                   btrfs     5,5T  4,9T  588G  90% /srv/xcache/h
>>>>>>>> /dev/sdj                   btrfs     5,5T  4,9T  584G  90% /srv/xcache/j
>>>>>>>> /dev/sdf                   btrfs     5,5T  4,9T  580G  90% /srv/xcache/f
>>>>>>>> /dev/sdm                   btrfs     5,5T  5,0T  535G  91% /srv/xcache/m
>>>>>>>> /dev/sdc                   btrfs     5,5T  5,0T  553G  91% /srv/xcache/c
>>>>>>>> /dev/sdg                   btrfs     5,5T  4,9T  612G  90% /srv/xcache/g
>>>>>>>> /dev/sdi                   btrfs     5,5T  4,9T  596G  90% /srv/xcache/i
>>>>>>>> /dev/sdl                   btrfs     5,5T  5,0T  518G  91% /srv/xcache/l
>>>>>>>> /dev/sdn                   btrfs     5,5T  4,9T  570G  90% /srv/xcache/n
>>>>>>>> /dev/sde                   btrfs     5,5T  4,9T  593G  90% /srv/xcache/e
>>>>>>>> /dev/sdk                   btrfs     5,5T  4,8T  677G  88% /srv/xcache/k
>>>>>>>> /dev/sdd                   btrfs     5,5T  4,9T  602G  90% /srv/xcache/d
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Nikolai
>>>>>>>>
>>>>>>>> On 2/14/23 21:52, Andrew Hanushevsky wrote:
>>>>>>>>> Hi Matevz & Nikolai,
>>>>>>>>>
>>>>>>>>> The allocation should favor the disk with the most free space unless it's
>>>>>>>>> altered using the oss.alloc directive:
>>>>>>>>> https://urldefense.com/v3/__https://xrootd.slac.stanford.edu/doc/dev54/ofs_config.htm*_Toc89982400__;Iw!!Mih3wA!AsisYxoXis_6IdoiqK-BwdMsHfHTB41Z4-GEjaMqvO0PQHh6TqU8Sn79JUgDeJDLCvO63yQiG63Zu6syVA$
>>>>>>>>> I don't think Nikolai specifies that and I don't think the pfc alters it in
>>>>>>>>> any way. So, I can't explain why we see that difference other than via an
>>>>>>>>> uneven purge.
>>>>>>>>>
>>>>>>>>> Andy
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, 14 Feb 2023, Matevz Tadel wrote:
>>>>>>>>>
>>>>>>>>>> Hi Nikolai, Andy,
>>>>>>>>>>
>>>>>>>>>> I saw this a long time back, 2++ years. The thing is that xcache does oss df on
>>>>>>>>>> the whole space and then deletes files without any knowledge of the usage on
>>>>>>>>>> individual disks themselves. Placement of new files should prefer the more empty
>>>>>>>>>> disks though, iirc.
>>>>>>>>>>
>>>>>>>>>> I remember asking Andy about how xcache could be made aware of individual disks
>>>>>>>>>> and he prepared something for me but it got really complicated when I was trying
>>>>>>>>>> to include this into the cache purge algorithm so I think I dropped this.
>>>>>>>>>>
>>>>>>>>>> Andy, could we sneak some debug printouts into oss new file disk selection to
>>>>>>>>>> see if something is going wrong there?
>>>>>>>>>>
>>>>>>>>>> Nikolai, how fast does this happen? Is it a matter of days, ie, over many purge
>>>>>>>>>> cycles? Is it always the same disk?
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Matevz
>>>>>>>>>>
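Matevz's point that the purge looks at the whole space rather than at individual disks can be checked against the numbers posted in this thread. The sketch below is only arithmetic on the "Size" and "Used" columns of the 14-disk `df` listing and the thresholds from `pfc.diskusage 0.90 0.95`; it assumes the watermarks are compared against the aggregate usage fraction, which is how the message above describes it, and it does not reproduce the actual purge logic.

```python
# Size and used space in TB, copied from the 14-disk df listing in this thread.
SIZE_TB = 5.5
USED_TB = {
    "b": 5.3, "a": 5.0, "h": 4.9, "j": 4.9, "f": 4.9, "m": 5.0, "c": 5.0,
    "g": 4.9, "i": 4.9, "l": 5.0, "n": 4.9, "e": 4.9, "k": 4.8, "d": 4.9,
}
LOW, HIGH = 0.90, 0.95  # from "pfc.diskusage 0.90 0.95"

# Usage of the pool as a whole: what an aggregate check would see.
total_fraction = sum(USED_TB.values()) / (SIZE_TB * len(USED_TB))

# Usage per disk: what an aggregate check does not see.
per_disk = {disk: used / SIZE_TB for disk, used in USED_TB.items()}
fullest = max(per_disk, key=per_disk.get)

if __name__ == "__main__":
    print(f"aggregate usage: {total_fraction:.3f}")
    print(f"fullest disk: {fullest} at {per_disk[fullest]:.3f}")
    print("aggregate above high watermark:", total_fraction > HIGH)
    print("fullest disk above high watermark:", per_disk[fullest] > HIGH)
```

With these figures the pool as a whole sits at roughly the low watermark while disk b alone is above the high one, so an aggregate purge can consider its job done and leave the fullest disk as it is.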
>>>>>>>>>> On 2/13/23 23:21, Nikolai Hartmann wrote:
>>>>>>>>>>> Hi Andy,
>>>>>>>>>>>
>>>>>>>>>>> The config is the following:
>>>>>>>>>>>
>>>>>>>>>>> https://urldefense.com/v3/__https://gitlab.physik.uni-muenchen.de/etp-computing/xcache-nspawn-lrz/-/blob/086e5ade5d27fc7d5ef59448c955523e453c091f/etc/xrootd/xcache.cfg__;!!Mih3wA!DfZZQn5-SZKaGYvPW97K8SD5gDYYTy0wuUgMgQCUMhwQehl01yhKQdErjCRUz3BoZYL_nKVipwRIRYyR$
>>>>>>>>>>> The directories for `oss.localroot` and `oss.space meta` are on the system disk.
>>>>>>>>>>> The `/srv/xcache/[a-m]` are individually mounted devices.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Nikolai
>>>>>>>>>>>
>>>>>>>>>>> On 2/14/23 00:34, Andrew Hanushevsky wrote:
>>>>>>>>>>>> Hi Nikolai,
>>>>>>>>>>>>
>>>>>>>>>>>> Hmmm, no it seems you are the first one. Then again, not many people have a
>>>>>>>>>>>> multi-disk setup. So, could you send a link to your config file? It might be
>>>>>>>>>>>> the case that all of the metadata files wind up on the same disk and that is
>>>>>>>>>>>> the source of the issue here.
>>>>>>>>>>>>
>>>>>>>>>>>> Andy
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, 13 Feb 2023, Nikolai Hartmann wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Dear xrootd-l,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm seeing the issue that one of the disks on one of our xcache servers fills
>>>>>>>>>>>>> up disproportionally - that means it runs completely full until i get "no
>>>>>>>>>>>>> space left on device" errors without xcache running cleanup, while the other
>>>>>>>>>>>>> disks still have plenty of space left. My current df output:
>>>>>>>>>>>>>
>>>>>>>>>>>>> /dev/sdb                   btrfs     5,5T  5,2T  273G  96% /srv/xcache/b
>>>>>>>>>>>>> /dev/sda                   btrfs     5,5T  4,9T  584G  90% /srv/xcache/a
>>>>>>>>>>>>> /dev/sdh                   btrfs     5,5T  5,0T  562G  90% /srv/xcache/h
>>>>>>>>>>>>> /dev/sdj                   btrfs     5,5T  5,0T  551G  91% /srv/xcache/j
>>>>>>>>>>>>> /dev/sdf                   btrfs     5,5T  4,9T  579G  90% /srv/xcache/f
>>>>>>>>>>>>> [...]
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you look at the first line you see that disk is 96% full while the others
>>>>>>>>>>>>> are around 90%. The issue occurred the first time after i built a new
>>>>>>>>>>>>> container for running xrootd. That change involved switching the container
>>>>>>>>>>>>> from centos7 to almalinux8 and changing the xrootd user id (ran chown and
>>>>>>>>>>>>> chgrp afterwards on the cache directories which are bind mounted). The xrootd
>>>>>>>>>>>>> version stayed the same (5.4.2). The high/low watermark configuration is the
>>>>>>>>>>>>> following:
>>>>>>>>>>>>>
>>>>>>>>>>>>> pfc.diskusage 0.90 0.95
>>>>>>>>>>>>>
>>>>>>>>>>>>> I already tried clearing the misbehaving disk (after it ran full to 100%),
>>>>>>>>>>>>> but now the issue is reappearing. Has anyone seen similar issues or does it
>>>>>>>>>>>>> ring any bells for you?
>>>>>>>>>>>>>
>>>>>>>>>>>>> One thing i checked is the size that xrootd reports in the log for the total
>>>>>>>>>>>>> storage and that at least matches what i get when i sum the entries from
>>>>>>>>>>>>> `df`.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Nikolai
>>>>>>>>>>>>>
>>>>>>>>>>>>> ########################################################################
>>>>>>>>>>>>> Use REPLY-ALL to reply to list
>>>>>>>>>>>>>
>>>>>>>>>>>>> To unsubscribe from the XROOTD-L list, click the following link:
>>>>>>>>>>>>> https://urldefense.com/v3/__https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1__;!!Mih3wA!DfZZQn5-SZKaGYvPW97K8SD5gDYYTy0wuUgMgQCUMhwQehl01yhKQdErjCRUz3BoZYL_nKVip_SnON6x$
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>
> 
