Hi Andy,

Yeah, the code also looked good to me ... but then, if a fuller disk still
gets selected (with the assumed default oss.alloc fuzz=0), then perhaps it's
the cache scan that assigns the partition a wrong free-space value? I'm not
saying the problem is in xrootd, maybe somebody else is playing tricks at the
system/container level?

Matevz

On 2/27/23 23:50, Andrew Hanushevsky wrote:
> Hi Matevz,
>
> It is XrdOssCache::Alloc() and I will guarantee you that it will not choose
> a full disk if it has been properly configured. The default is proper, so I
> don't know if there have been any overrides here.
>
> Andy
>
>
> On Mon, 27 Feb 2023, Matevz Tadel wrote:
>
>> Hi Nikolai,
>>
>> I don't think this matters, xcache uses the last file access time as
>> written in the cinfo file.
>>
>> I still suspect something goes wrong with the disk selection for placement
>> of new files -- the full disk should simply not be chosen over the other
>> disks and thus stop filling up.
>>
>> Wiping the whole cache would help, at least for a while :) ... but it
>> would be good to understand what is going on here. Would you be able to
>> run with a custom build? Potentially we could just replace a single
>> library to include some printouts.
>>
>> Andy, remind me please ... where is the code that does disk selection? Is
>> it XrdOssCache::Alloc()?
>>
>> Cheers,
>> Matevz
>>
>> On 2/16/23 02:27, Nikolai Hartmann wrote:
>>> Hi again,
>>>
>>> Actually it seems to only change the "change" time (st_ctime):
>>>
>>> touch test
>>> stat test
>>> [...]
>>> Access: 2023-02-16 11:25:11.962804882 +0100
>>> Modify: 2023-02-16 11:25:11.962804882 +0100
>>> Change: 2023-02-16 11:25:11.962804882 +0100
>>> Birth: 2023-02-16 11:25:11.962804882 +0100
>>>
>>> chown xrootd test
>>> stat test
>>> [...]
>>> Access: 2023-02-16 11:25:11.962804882 +0100
>>> Modify: 2023-02-16 11:25:11.962804882 +0100
>>> Change: 2023-02-16 11:25:20.322843125 +0100
>>> Birth: 2023-02-16 11:25:11.962804882 +0100
>>>
>>> Does this play a role?
>>>
>>> Cheers,
>>> Nikolai
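For reference, the same timestamp check done from C++ -- plain POSIX stat(2),
nothing xrootd-specific, just a minimal sketch of what the demo above shows:
chown/chgrp updates only st_ctime, while st_atime and st_mtime are left alone.

    #include <sys/stat.h>
    #include <cstdio>
    #include <ctime>

    int main(int argc, char** argv)
    {
        struct stat st;
        if (argc < 2 || stat(argv[1], &st) != 0)
            return 1;
        // chown/chgrp bump st_ctime (inode change time) only;
        // st_atime and st_mtime keep their previous values.
        std::printf("Access: %s", std::ctime(&st.st_atime));
        std::printf("Modify: %s", std::ctime(&st.st_mtime));
        std::printf("Change: %s", std::ctime(&st.st_ctime));
        return 0;
    }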
>>>
>>> On 2/16/23 11:18, Nikolai Hartmann wrote:
>>>> Hi Matevz (including the xrootd list again, which I forgot in the last
>>>> reply),
>>>>
>>>>> Well, if for some reason more new files are placed on a single disk,
>>>>> those files will be "newer" and purge would preferentially wipe data
>>>>> off other disks.
>>>> Mhhhh -- then I have an idea how I may have triggered this. As mentioned
>>>> in my first email, the issue started after I updated my container image
>>>> and had to change the xrootd user ids. This changes the access time of
>>>> the files -- if that is used by xrootd to determine which files are
>>>> newer, then it could just be that the chown process walked this
>>>> directory last and it will therefore be purged last.
>>>> When I then deleted it after the disk ran full, I made the problem even
>>>> worse, since now all the files that end up there are recently accessed.
>>>>
>>>> So deleting the whole cache should fix it?
>>>>
>>>> Cheers,
>>>> Nikolai
>>>>
>>>> On 2/16/23 10:50, Matevz Tadel wrote:
>>>>> Hi Andy, Nikolai,
>>>>>
>>>>> On 2/15/23 23:51, Andrew Hanushevsky wrote:
>>>>>> Hi Nikolai,
>>>>>>
>>>>>> Hmm, this sounds like an off-by-one problem in Xcache.
>>>>>
>>>>> How? XCache does not do disks, it just uses the oss API to a pool.
>>>>>
>>>>>> The question is what is the "one". It does seem that it consistently
>>>>>> does not purge files from a particular disk, but then again it doesn't
>>>>>> know about disks. So, there is some systematic issue that resolves to
>>>>>> ignoring a disk. Matevz?
>>>>>
>>>>> Well, if for some reason more new files are placed on a single disk,
>>>>> those files will be "newer" and purge would preferentially wipe data
>>>>> off other disks.
>>>>>
>>>>> That's why I asked in the first email how disks are selected for new
>>>>> files and if we could inject some debug printouts there.
>>>>>
>>>>> Perhaps a coincidence, but the full disk is the one that is listed
>>>>> first by df.
>>>>>
>>>>> The docs say the default for oss.alloc is fuzz = 0 and that this
>>>>> "forces oss to always use the partition with the largest amount of free
>>>>> space" -- so the fuller one should never get selected for new files.
>>>>> And xcache does pass the appropriate oss.asize opaque parameter to
>>>>> open.
>>>>>
>>>>> https://xrootd.slac.stanford.edu/doc/dev56/ofs_config.htm#_Toc116508676
>>>>>
>>>>> Matevz
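A minimal sketch of the selection rule the docs describe -- hypothetical names
(Partition, selectPartition), not the actual XrdOssCache::Alloc() code. With
fuzz == 0 the partition with the largest free space always wins; with
fuzz > 0, free-space amounts within fuzz percent of each other are treated as
equal and the earlier candidate is kept.

    #include <cstdint>
    #include <string>
    #include <vector>

    struct Partition {
        std::string path;
        uint64_t    free;  // bytes currently free
    };

    // Pick a partition for a new file of (estimated) size reqSize.
    // Sketch only: returns nullptr if no partition can hold the file.
    const Partition* selectPartition(const std::vector<Partition>& parts,
                                     uint64_t reqSize, unsigned fuzz)
    {
        const Partition* best = nullptr;
        for (const auto& p : parts) {
            if (p.free < reqSize) continue;        // cannot hold the file
            if (!best) { best = &p; continue; }
            if (fuzz == 0) {
                if (p.free > best->free) best = &p;  // strictly most free
            } else if (p.free > best->free &&
                       (p.free - best->free) * 100 > best->free * fuzz) {
                best = &p;  // "more free" only beyond the fuzz percentage
            }
        }
        return best;
    }

Under this rule a fuller disk can only keep winning if the free-space numbers
fed into the selection are wrong -- which is exactly the cache-scan suspicion
raised at the top of the thread.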
>>>>>>
>>>>>> Andy
>>>>>>
>>>>>>
>>>>>> On Thu, 16 Feb 2023, Nikolai Hartmann wrote:
>>>>>>
>>>>>>> Hi Andy,
>>>>>>>
>>>>>>> The behavior seems to be that it purges all the disks except one.
>>>>>>> After the other disks now again surpassed the 95% threshold, it
>>>>>>> seemed to trigger the cleanup, and now I have this:
>>>>>>>
>>>>>>> Filesystem  Type   Size  Used  Avail  Use%  Mounted on
>>>>>>> /dev/sdb    btrfs  5,5T  5,3T  215G   97%   /srv/xcache/b
>>>>>>> /dev/sda    btrfs  5,5T  5,0T  560G   90%   /srv/xcache/a
>>>>>>> /dev/sdh    btrfs  5,5T  4,9T  588G   90%   /srv/xcache/h
>>>>>>> /dev/sdj    btrfs  5,5T  4,9T  584G   90%   /srv/xcache/j
>>>>>>> /dev/sdf    btrfs  5,5T  4,9T  580G   90%   /srv/xcache/f
>>>>>>> /dev/sdm    btrfs  5,5T  5,0T  535G   91%   /srv/xcache/m
>>>>>>> /dev/sdc    btrfs  5,5T  5,0T  553G   91%   /srv/xcache/c
>>>>>>> /dev/sdg    btrfs  5,5T  4,9T  612G   90%   /srv/xcache/g
>>>>>>> /dev/sdi    btrfs  5,5T  4,9T  596G   90%   /srv/xcache/i
>>>>>>> /dev/sdl    btrfs  5,5T  5,0T  518G   91%   /srv/xcache/l
>>>>>>> /dev/sdn    btrfs  5,5T  4,9T  570G   90%   /srv/xcache/n
>>>>>>> /dev/sde    btrfs  5,5T  4,9T  593G   90%   /srv/xcache/e
>>>>>>> /dev/sdk    btrfs  5,5T  4,8T  677G   88%   /srv/xcache/k
>>>>>>> /dev/sdd    btrfs  5,5T  4,9T  602G   90%   /srv/xcache/d
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Nikolai
>>>>>>>
>>>>>>> On 2/14/23 21:52, Andrew Hanushevsky wrote:
>>>>>>>> Hi Matevz & Nikolai,
>>>>>>>>
>>>>>>>> The allocation should favor the disk with the most free space unless
>>>>>>>> it's altered using the oss.alloc directive:
>>>>>>>> https://xrootd.slac.stanford.edu/doc/dev54/ofs_config.htm#_Toc89982400
>>>>>>>>
>>>>>>>> I don't think Nikolai specifies that and I don't think the pfc
>>>>>>>> alters it in any way. So, I can't explain why we see that difference
>>>>>>>> other than via an uneven purge.
>>>>>>>>
>>>>>>>> Andy
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, 14 Feb 2023, Matevz Tadel wrote:
>>>>>>>>
>>>>>>>>> Hi Nikolai, Andy,
>>>>>>>>>
>>>>>>>>> I saw this a long time back, 2++ years. The thing is that xcache
>>>>>>>>> does oss df on the whole space and then deletes files without any
>>>>>>>>> knowledge of the usage on the individual disks themselves.
>>>>>>>>> Placement of new files should prefer the emptier disks, though,
>>>>>>>>> iirc.
>>>>>>>>>
>>>>>>>>> I remember asking Andy about how xcache could be made aware of
>>>>>>>>> individual disks, and he prepared something for me, but it got
>>>>>>>>> really complicated when I was trying to include this into the cache
>>>>>>>>> purge algorithm, so I think I dropped this.
>>>>>>>>>
>>>>>>>>> Andy, could we sneak some debug printouts into the oss new-file
>>>>>>>>> disk selection to see if something is going wrong there?
>>>>>>>>>
>>>>>>>>> Nikolai, how fast does this happen? Is it a matter of days, i.e.,
>>>>>>>>> over many purge cycles? Is it always the same disk?
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Matevz
>>>>>>>>>
>>>>>>>>> On 2/13/23 23:21, Nikolai Hartmann wrote:
>>>>>>>>>> Hi Andy,
>>>>>>>>>>
>>>>>>>>>> The config is the following:
>>>>>>>>>>
>>>>>>>>>> https://gitlab.physik.uni-muenchen.de/etp-computing/xcache-nspawn-lrz/-/blob/086e5ade5d27fc7d5ef59448c955523e453c091f/etc/xrootd/xcache.cfg
>>>>>>>>>>
>>>>>>>>>> The directories for `oss.localroot` and `oss.space meta` are on
>>>>>>>>>> the system disk.
>>>>>>>>>> The `/srv/xcache/[a-m]` are individually mounted devices.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Nikolai
>>>>>>>>>>
>>>>>>>>>> On 2/14/23 00:34, Andrew Hanushevsky wrote:
>>>>>>>>>>> Hi Nikolai,
>>>>>>>>>>>
>>>>>>>>>>> Hmmm, no, it seems you are the first one. Then again, not many
>>>>>>>>>>> people have a multi-disk setup. So, could you send a link to your
>>>>>>>>>>> config file? It might be the case that all of the metadata files
>>>>>>>>>>> wind up on the same disk and that is the source of the issue
>>>>>>>>>>> here.
>>>>>>>>>>>
>>>>>>>>>>> Andy
>>>>>>>>>>>
>>>>>>>>>>> On Mon, 13 Feb 2023, Nikolai Hartmann wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Dear xrootd-l,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm seeing the issue that one of the disks on one of our xcache
>>>>>>>>>>>> servers fills up disproportionately -- that means it runs
>>>>>>>>>>>> completely full until I get "no space left on device" errors
>>>>>>>>>>>> without xcache running cleanup, while the other disks still have
>>>>>>>>>>>> plenty of space left. My current df output:
>>>>>>>>>>>>
>>>>>>>>>>>> /dev/sdb  btrfs  5,5T  5,2T  273G  96%  /srv/xcache/b
>>>>>>>>>>>> /dev/sda  btrfs  5,5T  4,9T  584G  90%  /srv/xcache/a
>>>>>>>>>>>> /dev/sdh  btrfs  5,5T  5,0T  562G  90%  /srv/xcache/h
>>>>>>>>>>>> /dev/sdj  btrfs  5,5T  5,0T  551G  91%  /srv/xcache/j
>>>>>>>>>>>> /dev/sdf  btrfs  5,5T  4,9T  579G  90%  /srv/xcache/f
>>>>>>>>>>>> [...]
>>>>>>>>>>>>
>>>>>>>>>>>> If you look at the first line, you see that disk is 96% full
>>>>>>>>>>>> while the others are around 90%. The issue occurred the first
>>>>>>>>>>>> time after I built a new container for running xrootd. That
>>>>>>>>>>>> change involved switching the container from centos7 to
>>>>>>>>>>>> almalinux8 and changing the xrootd user id (I ran chown and
>>>>>>>>>>>> chgrp afterwards on the cache directories, which are bind
>>>>>>>>>>>> mounted). The xrootd version stayed the same (5.4.2). The
>>>>>>>>>>>> high/low watermark configuration is the following:
>>>>>>>>>>>>
>>>>>>>>>>>> pfc.diskusage 0.90 0.95
>>>>>>>>>>>>
>>>>>>>>>>>> I already tried clearing the misbehaving disk (after it ran full
>>>>>>>>>>>> to 100%), but now the issue is reappearing. Has anyone seen
>>>>>>>>>>>> similar issues, or does it ring any bells for you?
>>>>>>>>>>>>
>>>>>>>>>>>> One thing I checked is the size that xrootd reports in the log
>>>>>>>>>>>> for the total storage, and that at least matches what I get when
>>>>>>>>>>>> I sum the entries from `df`.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Nikolai
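To make the mechanism described in the thread concrete -- oss df over the
whole space, purge ordered by the cinfo last-access time, no per-disk
knowledge -- a minimal sketch with hypothetical types (CachedFile,
purgeIfNeeded), not the actual pfc purge code. Usage and the 0.90/0.95
watermarks are evaluated on the pooled totals, so one disk can sit at 100%
while the aggregate stays below the high watermark and no purge triggers.

    #include <algorithm>
    #include <cstdint>
    #include <ctime>
    #include <string>
    #include <vector>

    struct CachedFile {
        std::string path;
        uint64_t    size;
        std::time_t lastAccess;  // as recorded in the .cinfo file
    };

    // Sketch of a pooled LRU purge: totalUsed/totalSize are summed over
    // ALL partitions; files are removed oldest-access-first, regardless
    // of which disk they live on.
    void purgeIfNeeded(std::vector<CachedFile>& files,
                       uint64_t totalUsed, uint64_t totalSize,
                       double lowWM = 0.90, double highWM = 0.95)
    {
        if ((double)totalUsed < highWM * (double)totalSize)
            return;  // aggregate below the high watermark: do nothing

        // Oldest access time first -- whichever disk the file lives on.
        std::sort(files.begin(), files.end(),
                  [](const CachedFile& a, const CachedFile& b)
                  { return a.lastAccess < b.lastAccess; });

        for (const auto& f : files) {
            if ((double)totalUsed <= lowWM * (double)totalSize)
                break;           // reached the low watermark
            // the data file and its .cinfo would be unlinked here
            totalUsed -= f.size;
        }
    }

If new files cluster on one disk, their access times are the newest, so this
pooled ordering keeps deleting from the other disks first -- which matches the
"purges all the disks except one" behavior reported above.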