Well, that sounds like a good step forward. If it's simple to do then I'd
say go ahead. But I think it only forestalls people clamoring for a plugin
(I've been down that road before, and no matter how great one idea is,
someone else thinks they have a better idea :-).
P.S. You may want to replace the share count with a simple percentage (or have a
way of specifying it along with the share count -- though that's likely to be too
confusing). The share count idea assumes all parties are equal. So, a
share count of 3 gives everyone a 33% share. But that's not what I might
want. I may want one party to get 50% of the space and the other two 25% each
(I might even want to over commit for my own personal bizarre reasons).
So, consider that.
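
Something like this, with completely made-up syntax just to show what I mean,
say three instances' configs reading:

   pfc.storagelimits ... share 50%   # instance A's config: half the pool
   pfc.storagelimits ... share 25%   # instance B
   pfc.storagelimits ... share 25%   # instance C

(The percentages could even add up to more than 100 if someone wants to
over commit.)
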
Andy
On Wed, 16 May 2018, Matevz Tadel wrote:
> On 05/16/18 13:44, Andrew Hanushevsky wrote:
>> Well, there is a full-on hammer solution by just using a specialized oss
>> plugin. The cache gets its usage values from this plugin. You could wrap
>> it and override the space usage method. Of course, like I said, this is a
>> rather heavy-duty fix. There is merit to Wei's solution of having a
>> lightweight plugin that simply returns space usage and is used instead of
>> the oss plugin if it was specified. Alternatively, have some kind of
>> decision plugin that is used to determine if you have enough space left. I
>> think a plugin approach will be the only way we satisfy everyone's
>> conflicting ideas of what "space" really means.
>
> I have a proposal. Supersede pfc.diskusage with
>
> pfc.storagelimits files <low> <nominal> <high> disk <low> <high>
> share-count <num> purge-interval <time>
>
> This would:
> a) keep usage of actual cached files between F_nominal and F_high;
>
> b) if total disk usage gets above D_high, the cache will retreat to F_low
> (using share-count to determine what fraction of the required space to
> concede on each purge cycle);
>
> c) when the cache would have to purge below F_low, it stops caching new
> files and acts as a RAM-only proxy for new files until disk space becomes
> available again. It will not purge beyond F_low.
>
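> For example (numbers and unit syntax purely illustrative):
>
>    pfc.storagelimits files 4t 5t 6t disk 6t 7t share-count 3 purge-interval 300
>
> i.e., normally keep the cache's own files between 5 and 6 TB, retreat
> towards 4 TB when the partition as a whole goes above 7 TB (conceding its
> share of the needed space on each purge cycle), and run the purge check
> every 300 seconds.
>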
> Additionally, we can add:
>
> pfc.purge older-than <time> not-accessed-in <time>
>
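> e.g. (time syntax and values purely for illustration):
>
>    pfc.purge older-than 30d not-accessed-in 7d
>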
> What do you think?
>
> Matevz
>
>> Andy
>>
>> On Wed, 16 May 2018, Matevz Tadel wrote:
>>
>>> On 05/15/18 20:34, Yang, Wei wrote:
>>>> I am evaluating Xcache at NERSC, using their shared filesystems. I run
>>>> into two issues
>>>>
>>>> 1. "pfc.diskusage low high" . I want to set 5TB and 8TB respectively. But
>>>> the diskusage is based on filesystem's total and used space. The total
>>>> and used of the whole filesystem is 7PB and 5.8PB. So Xcache always
>>>> thinks that I run out of space and remove everything. Anyway to change
>>>> this?
>>>
>>> We would need to add a flag to the pfc.diskusage ... like total (what we
>>> have now) vs. actual (what you are asking for now). Hmmh, or would one
>>> want to have both controls at the same time, i.e., to start purging if
>>> either of the following is true:
>>> a) cache is using more than X GB (or X % of total space on the partition);
>>> b) the disk partition has more than X GB used (or is more than X % full).
>>>
>>> Let me think ... I think it might make sense to have both options, then
>>> several xcache instances can use the same pool of disks without stepping
>>> on each other's toes.
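>>>
>>> Something like this, with purely hypothetical option names just to make
>>> the idea concrete:
>>>
>>>    pfc.diskusage absolute 5t 8t       # purge when the cache's own data exceeds 8 TB, down to 5 TB
>>>    pfc.diskusage partition 0.90 0.95  # also purge when the partition as a whole is more than 95% full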
>>>
>>>> Is it possible to allow calling a simple script to determine the current
>>>> used space? Also, to set the interval of cleaning?
>>>
>>> No, we use OfsDF (or something like that). But we need to loop over all
>>> cinfo files to decide which to remove so we can also calculate actual disk
>>> usage while we're doing this.
>>>
>>> Andy, a question. On most Linux FSs, when we have a sparse file, only
>>> blocks that are actually written "consume" disk space. Is this universally
>>> true? Is it OK if I only use the stored block-count as the size estimate for
>>> actual disk usage? Can we know the block size of the FS? The problem, I
>>> think, is only the last, not fully occupied block ... and depending on the
>>> cache block size and the number of files in the cache, this could lead to a
>>> relatively large "rounding error".
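>>>
>>> (As a rough standalone sketch -- not XrdFileCache code -- of what I mean
>>> by using the block count: on Linux st_blocks is reported in 512-byte units
>>> regardless of the FS block size, and statvfs gives the FS allocation block
>>> size.)
>>>
>>>    #include <sys/stat.h>
>>>    #include <sys/statvfs.h>
>>>    #include <cstdio>
>>>
>>>    int main(int argc, char **argv)
>>>    {
>>>       if (argc < 2) return 1;
>>>
>>>       struct stat st;
>>>       if (stat(argv[1], &st) != 0) return 1;
>>>       // space the (possibly sparse) file actually occupies on disk
>>>       long long on_disk = (long long) st.st_blocks * 512LL;
>>>
>>>       struct statvfs vfs;
>>>       if (statvfs(argv[1], &vfs) != 0) return 1;
>>>       // f_frsize is the filesystem's fundamental (allocation) block size
>>>       std::printf("on disk: %lld bytes, fs block size: %lu\n",
>>>                   on_disk, (unsigned long) vfs.f_frsize);
>>>       return 0;
>>>    }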
>>>
>>> Time interval can be changed ... oh ... it's not in THE DOCS :)
>>>
>>> pfc.diskusage fracLow fracHigh
>>>
>>> is actually
>>>
>>> pfc.diskusage fracLow fracHigh [sleep time]
>>>
>>> where time is the interval between purge checks, clamped between 60 and
>>> 3600 seconds.
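>>>
>>> So, e.g. (the 300 is just an arbitrary example value):
>>>
>>>    pfc.diskusage 0.90 0.95 sleep 300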
>>>
>>> https://github.com/xrootd/xrootd/blob/master/src/XrdFileCache/XrdFileCacheConfiguration.cc#L337
>>>
>>> I have no clue about 2 below ...
>>>
>>> Matevz
>>>
>>>> 2. Putting the cache space on NERSC GPFS works, but putting it on NERSC
>>>> Lustre (not sure what version) does not. When the cache is on Lustre,
>>>> nothing is cached and the log shows this:
>>>>
>>>> 180515 14:49:30 2801 XrootdXeq: yangw.3336:27@dtn04 pub IPv4 login
>>>> 180515 14:49:30 2801 XrdFileCache_Manager: info Cache::Attach()
>>>> root://27@localfile:1094//atlas/rucio/transient/31/c5/panda.HITS.14104313._000001.pool.root.1.14104313-3930415979-13437313503-47-10.zip
>>>> 180515 14:49:30 2801 ofs_FAttr: Unable to set attr XrdFrm.Pfn from
>>>> /global/cscratch1/sd/yangw/xcache/dtn04.nersc.gov/xrd/datafiles/data/01/4D55FB5AE70A000000008037cd1501000000404%;
>>>> operation not supported
>>>> 180515 14:49:30 2801 XrdFileCache_File: error File::Open() Create failed
>>>> for data file
>>>> /atlas/rucio/transient/31/c5/panda.HITS.14104313._000001.pool.root.1.14104313-3930415979-13437313503-47-10.zip,
>>>> err=Operation not supported
>>>> /atlas/rucio/transient/31/c5/panda.HITS.14104313._000001.pool.root.1.14104313-3930415979-13437313503-47-10.zip
>>>> 180515 14:49:42 2801 XrdFileCache_IO: info IOEntireFile::Detach()
>>>> 0x1cd6600
>>>> 180515 14:49:42 2801 XrootdXeq: yangw.3336:27@dtn04 disc 0:00:12
>>>>
>>>> --
>>>> Wei Yang | [log in to unmask] | 650-926-3338(O)
>>>>
>>>
>