On 05/16/18 13:44, Andrew Hanushevsky wrote:
> Well, there is a full-on hammer solution: just use a specialized oss plugin.
> The cache gets its usage values from this plugin. You could wrap it and
> override the space usage method. Of course, like I said, this is a rather
> heavy-duty fix. There is merit to Wei's solution of having a lightweight plugin
> that returns space usage and is used instead of the oss plugin if it was
> specified. Alternatively, have some kind of decision plugin that is used to
> determine if you have enough space left. I think a plugin approach will be the
> only way we satisfy everyone's conflicting ideas of what "space" really means.
I have a proposal. Supersede pfc.diskusage with

pfc.storagelimits files <low> <nominal> <high> disk <low> <high>
                  share-count <num> purge-interval <time>

This would:
a) keep usage of actual cached files between F_nominal and F_high;
b) if total disk usage gets above D_high, the cache will retreat to F_low (using
share-count to determine what fraction of the required space to concede on each
purge cycle);
c) when the cache would have to purge below F_low, it stops caching new files and
acts as a RAM-only proxy for new files until disk space becomes available. It will
not purge below F_low.
Additionally, we can add:
pfc.purge older-than <time> not-accessed-in <time>
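To make this concrete, here is roughly how the proposed directives could look in
a config file (directive names as proposed above, all values purely illustrative,
units/format of the limits still to be decided):

pfc.storagelimits files 4TB 5TB 6TB disk 7TB 8TB share-count 2 purge-interval 300
pfc.purge older-than 30d not-accessed-in 7d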
What do you think?
Matevz
> Andy
>
> On Wed, 16 May 2018, Matevz Tadel wrote:
>
>> On 05/15/18 20:34, Yang, Wei wrote:
>>> I am evaluating Xcache at NERSC, using their shared filesystems. I ran into
>>> two issues:
>>>
>>> 1. "pfc.diskusage low high". I want to set 5TB and 8TB respectively, but the
>>> disk usage is based on the filesystem's total and used space. The total and
>>> used space of the whole filesystem is 7PB and 5.8PB, so Xcache always thinks
>>> that I have run out of space and removes everything. Any way to change this?
>>
>> We would need to add a flag to pfc.diskusage ... something like total (what we
>> have now) vs. actual (what you are asking for). Hmmh, or would one want to have
>> both controls at the same time, i.e., to start purging if either of the
>> following is true:
>> a) cache is using more than X GB (or X % of total space on the partition);
>> b) the disk partition has more than X GB used (or is more than X % full).
>>
>> Let me think ... I think it might make sense to have both options, then
>> several xcache instances can use the same pool of disks without stepping on
>> each other's toes.
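>> Roughly, the combined check would be something like this (an illustrative
>> sketch only, not the actual XrdFileCache code; all names are made up):
>>
>>    // Purge when either the cache's own usage or the whole partition's usage
>>    // exceeds its configured high watermark (thresholds already in bytes).
>>    bool NeedsPurge(long long cacheUsed, long long cacheHigh,
>>                    long long partUsed,  long long partHigh)
>>    {
>>       return (cacheUsed > cacheHigh) || (partUsed > partHigh);
>>    }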
>>
>>> Is it possible to allow calling a simple script to determine the current used
>>> space? Also, can the cleaning interval be set?
>>
>> No, we use OfsDF (or something like that). But we need to loop over all cinfo
>> files to decide which to remove, so we can also calculate the actual disk usage
>> while we're doing this.
>>
>> Andy, a question. On most Linux FSs, when we have a sparse file, only blocks
>> that are actually written "consume" disk space. Is this universally true? Is
>> it OK if I only use the stored block-count as the size estimate for actual disk
>> usage? Can we know the block size of the FS? The problem, I think, is only the
>> last, not fully occupied block ... and depending on the cache block size and
>> the number of files in the cache, this could lead to a relatively large
>> "rounding error".
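>> (For reference, a quick sketch of how one could ask the filesystem directly,
>> instead of relying on the cinfo block count -- not XrdFileCache code, just an
>> assumption of what the check could look like on Linux, where st_blocks is
>> reported in 512-byte units:)
>>
>>    #include <sys/stat.h>
>>    #include <sys/statvfs.h>
>>
>>    // Space actually allocated to a (possibly sparse) data file.
>>    long long AllocatedBytes(const char *path)
>>    {
>>       struct stat st;
>>       if (stat(path, &st) != 0) return -1;
>>       return (long long) st.st_blocks * 512;  // 512-byte units on Linux
>>    }
>>
>>    // Allocation unit (fundamental block size) of the FS holding 'path'.
>>    long long FsBlockSize(const char *path)
>>    {
>>       struct statvfs vfs;
>>       if (statvfs(path, &vfs) != 0) return -1;
>>       return (long long) vfs.f_frsize;
>>    }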
>>
>> The time interval can be changed ... oh ... it's not in THE DOCS :)
>>
>> pfc.diskusage fracLow fracHigh
>>
>> is actually
>>
>> pfc.diskusage fracLow fracHigh [sleep time]
>>
>> where time is the interval between purge checks, clamped between 60 and 3600
>> seconds.
>>
>> https://github.com/xrootd/xrootd/blob/master/src/XrdFileCache/XrdFileCacheConfiguration.cc#L337
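>> E.g., reading the bracketed part above as a literal sleep keyword followed by
>> the number of seconds (illustrative values; purge check every 300 s):
>>
>>    pfc.diskusage 0.90 0.95 sleep 300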
>>
>>
>> I have no clue about 2 below ...
>>
>> Matevz
>>
>>> 2. Putting the cache space on NERSC GPFS works, but putting it on NERSC Lustre
>>> (not sure what version) does not. When the cache is on Lustre, nothing is
>>> cached and the log shows this:
>>>
>>> 180515 14:49:30 2801 XrootdXeq: yangw.3336:27@dtn04 pub IPv4 login
>>> 180515 14:49:30 2801 XrdFileCache_Manager: info Cache::Attach()
>>> root://27@localfile:1094//atlas/rucio/transient/31/c5/panda.HITS.14104313._000001.pool.root.1.14104313-3930415979-13437313503-47-10.zip
>>>
>>> 180515 14:49:30 2801 ofs_FAttr: Unable to set attr XrdFrm.Pfn from
>>> /global/cscratch1/sd/yangw/xcache/dtn04.nersc.gov/xrd/datafiles/data/01/4D55FB5AE70A000000008037cd1501000000404%;
>>> operation not supported
>>> 180515 14:49:30 2801 XrdFileCache_File: error File::Open() Create failed for
>>> data file
>>> /atlas/rucio/transient/31/c5/panda.HITS.14104313._000001.pool.root.1.14104313-3930415979-13437313503-47-10.zip,
>>> err=Operation not supported
>>> /atlas/rucio/transient/31/c5/panda.HITS.14104313._000001.pool.root.1.14104313-3930415979-13437313503-47-10.zip
>>>
>>> 180515 14:49:42 2801 XrdFileCache_IO: info IOEntireFile::Detach() 0x1cd6600
>>> 180515 14:49:42 2801 XrootdXeq: yangw.3336:27@dtn04 disc 0:00:12
>>>
>>> --
>>> Wei Yang | [log in to unmask] | 650-926-3338(O)
>>>