
Subject: Re: Issues of setting up a Xcache at NERSC
From: Matevz Tadel <[log in to unmask]>
Reply-To: list for xcache development and deployment <[log in to unmask]>
Date: Wed, 16 May 2018 14:55:59 -0700
Content-Type: text/plain
Parts/Attachments: text/plain (134 lines)

On 05/16/18 13:44, Andrew Hanushevsky wrote:
> Well, there is a full-on hammer solution by just using a specialized oss plugin. 
> The cache gets its usage values from this plugin. You could wrap it and 
> override the space usage method. Of course, like I said, this is a rather 
> heavy-duty fix. There is merit to Wei's solution of having a lightweight plugin 
> that returns space usage and is used instead of the oss plugin if one was 
> specified. Alternatively, have some kind of decision plugin that is used to 
> determine whether you have enough space left. I think a plugin approach will be 
> the only way we satisfy everyone's conflicting ideas of what "space" really means.
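
To make the decision-plugin idea concrete, a minimal standalone sketch 
(hypothetical names and logic, not the actual oss plugin interface) of what 
"enough space left" could mean when both the cache's own usage and the 
partition's fullness matter:

  // Hypothetical "enough space" decision combining a per-instance cap with a
  // partition-level high-water mark. Not the XrdOss/XrdFileCache API; a sketch.
  #include <cstdint>
  #include <cstdio>
  #include <sys/statvfs.h>

  struct CacheSpaceInfo {
    uint64_t cache_used_bytes;   // bytes this cache instance has written
    uint64_t cache_limit_bytes;  // configured cap for this instance
  };

  // True if writing 'request' more bytes keeps us within both limits.
  bool has_enough_space(const CacheSpaceInfo& ci, const char* cache_root,
                        uint64_t request, double partition_high_frac = 0.95)
  {
    if (ci.cache_used_bytes + request > ci.cache_limit_bytes)
      return false;                                  // per-instance cap
    struct statvfs vfs;
    if (statvfs(cache_root, &vfs) != 0)
      return false;                                  // be conservative on error
    uint64_t total = (uint64_t) vfs.f_blocks * vfs.f_frsize;
    uint64_t avail = (uint64_t) vfs.f_bavail * vfs.f_frsize;
    return total - avail + request <= (uint64_t) (partition_high_frac * total);
  }

  int main()
  {
    CacheSpaceInfo ci = { 4ULL << 40, 5ULL << 40 };  // 4 TiB used, 5 TiB cap
    std::printf("ok to write 1 GiB: %d\n",
                (int) has_enough_space(ci, "/var/cache/xcache", 1ULL << 30));
    return 0;
  }

In the sketch the two criteria are simply ANDed; whether that is the right 
semantics is exactly the "what does space mean" question.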

I have a proposal. Supersede pfc.diskusage with

pfc.storagelimits files <low> <nominal> <high> disk <low> <high>
                   share-count <num> purge-interval <time>

This would:
a) keep usage of actual cached files between F_nominal and F_high;

b) if total disk usage gets above D_high, the cache will retreat to F_low (using 
share-count to determine what fraction of the required space to concede on each 
purge cycle);

c) when the cache would have to purge below F_low, it stops caching new files 
and acts as a RAM-only proxy for them until disk space becomes available. It 
will not purge below F_low.

Additionally, we can add:

pfc.purge older-than <time> not-accessed-in <time>
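
For concreteness, a configuration using the proposed directives could look 
something like this (the syntax is not implemented yet; the values and unit 
spellings are purely illustrative):

 # Keep cached files between 4 TB (nominal) and 5 TB (high); if the whole
 # partition climbs above the 90% disk high mark, retreat toward 3 TB (low),
 # conceding 1/4 of the required space per purge cycle, checked every 5 min.
 pfc.storagelimits files 3t 4t 5t disk 0.80 0.90 share-count 4 purge-interval 300
 pfc.purge older-than 14d not-accessed-in 7d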

What do you think?

Matevz

> Andy
> 
> On Wed, 16 May 2018, Matevz Tadel wrote:
> 
>> On 05/15/18 20:34, Yang, Wei wrote:
>>> I am evaluating Xcache at NERSC, using their shared filesystems. I ran into 
>>> two issues:
>>>
>>> 1. "pfc.diskusage low high". I want to set 5TB and 8TB respectively. But the 
>>> diskusage is based on the filesystem's total and used space. The total and 
>>> used space of the whole filesystem are 7PB and 5.8PB, so Xcache always thinks 
>>> that I have run out of space and removes everything. Any way to change this?
>>
>> We would need to add a flag to the pfc.diskusage ... like total (what we have 
>> now) vs. actual (what you are asking for now). Hmmh, or would one want to have 
>> both controls at the same time, i.e., to start purging if either of the 
>> following is true:
>> a) cache is using more than X GB (or X % of total space on the partition);
>> b) the disk partition has more than X GB used (or is more than X % full).
>>
>> Let me think ... I think it might make sense to have both options; then 
>> several xcache instances can use the same pool of disks without stepping on 
>> each other's toes.
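>>
>> Something like this, say (hypothetical syntax, just to illustrate the idea 
>> that either condition triggers a purge):
>>
>>  # purge if this instance holds more than 8 TB of cached data, OR if the
>>  # hosting partition climbs above 90% full -- whichever happens first
>>  pfc.diskusage files 5t 8t disk 0.85 0.90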
>>
>>> Is it possible to allow calling a simple script to determine the current 
>>> used space? Also, can one set the interval of cleaning?
>>
>> No, we use OfsDF (or something like that). But we need to loop over all cinfo 
>> files to decide which to remove so we can also calculate actual disk usage 
>> while we're doing this.
>>
>> Andy, a question. On most Linux FSs, when we have a sparse file, only blocks 
>> that are actually written "consume" disk space. Is this universally true? Is 
>> it OK if I only use the stored block count as the size estimate for actual 
>> disk usage? Can we know the block size of the FS? The problem, I think, is 
>> only the last, not fully occupied block ... and depending on the cache block 
>> size and the number of files in the cache this could lead to a relatively 
>> large "rounding error".
>>
>> The time interval can be changed ... oh ... it's not in THE DOCS :)
>>
>>  pfc.diskusage fracLow fracHigh
>>
>> is actually
>>
>>  pfc.diskusage fracLow fracHigh [sleep time]
>>
>> where time is the interval between purge checks, clamped between 60 and 3600 
>> seconds.
>>
>> https://github.com/xrootd/xrootd/blob/master/src/XrdFileCache/XrdFileCacheConfiguration.cc#L337 
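>>
>> So, to keep usage between 90% and 95% and re-check every five minutes, one 
>> would say (illustrative values, following the syntax above):
>>
>>  pfc.diskusage 0.90 0.95 sleep 300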
>>
>>
>> I have no clue about 2 below ...
>>
>> Matevz
>>
>>> 2. Putting the cache space on NERSC GPFS works, but putting it on NERSC 
>>> Lustre (not sure what version) does not. When the cache is on Lustre, 
>>> nothing is cached and the log shows this:
>>>
>>> 180515 14:49:30 2801 XrootdXeq: yangw.3336:27@dtn04 pub IPv4 login
>>> 180515 14:49:30 2801 XrdFileCache_Manager: info Cache::Attach() 
>>> root://27@localfile:1094//atlas/rucio/transient/31/c5/panda.HITS.14104313._000001.pool.root.1.14104313-3930415979-13437313503-47-10.zip 
>>>
>>> 180515 14:49:30 2801 ofs_FAttr: Unable to set attr XrdFrm.Pfn from 
>>> /global/cscratch1/sd/yangw/xcache/dtn04.nersc.gov/xrd/datafiles/data/01/4D55FB5AE70A000000008037cd1501000000404%; 
>>> operation not supported
>>> 180515 14:49:30 2801 XrdFileCache_File: error File::Open() Create failed for 
>>> data file 
>>> /atlas/rucio/transient/31/c5/panda.HITS.14104313._000001.pool.root.1.14104313-3930415979-13437313503-47-10.zip, 
>>> err=Operation not supported 
>>> /atlas/rucio/transient/31/c5/panda.HITS.14104313._000001.pool.root.1.14104313-3930415979-13437313503-47-10.zip 
>>>
>>> 180515 14:49:42 2801 XrdFileCache_IO: info IOEntireFile::Detach() 0x1cd6600
>>> 180515 14:49:42 2801 XrootdXeq: yangw.3336:27@dtn04 disc 0:00:12
>>>
>>> -- 
>>> Wei Yang  |  [log in to unmask]  |  650-926-3338(O)
