XCACHE-L Archives


XCACHE-L@LISTSERV.SLAC.STANFORD.EDU

Subject: Re: Issues of setting up a Xcache at NERSC
From: Andrew Hanushevsky <[log in to unmask]>
Reply-To: list for xcache development and deployment <[log in to unmask]>
Date: Wed, 16 May 2018 15:13:28 -0700
Content-Type: MULTIPART/MIXED
Parts/Attachments: TEXT/PLAIN (152 lines)

Well, that sounds like a good step forward. If it's simple to do, then I'd
say go ahead. But I think it only forestalls people clamoring for a plugin
(I've been down that road before, and no matter how great an idea is,
someone else thinks they have a better one :-).

P.S. You may want to replace the share count with a simple percentage (or have
a way of specifying it along with the share count -- though it's likely to be
too confusing). The share count idea assumes all parties are equal. So, a
share count of 3 gives everyone a 33% share. But that's not what I might
want. I may want one party to get 50% of the space and the other two 25%
each (I might even want to over-commit for my own personal bizarre reasons).
So, consider that.
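
Just to make the arithmetic concrete, a tiny sketch (numbers and names are
made up) of equal shares versus explicit percentages:

  # Hypothetical illustration: how a 12 TB pool splits under the two schemes.
  pool = 12 * 2**40                    # 12 TB shared by three caches

  equal_quota = pool / 3               # share-count 3: each cache gets 4 TB (33%)

  fractions = {"cacheA": 0.50, "cacheB": 0.25, "cacheC": 0.25}
  weighted_quota = {name: frac * pool for name, frac in fractions.items()}
  # cacheA gets 6 TB; cacheB and cacheC get 3 TB each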

Andy

On Wed, 16 May 2018, Matevz Tadel wrote:

> On 05/16/18 13:44, Andrew Hanushevsky wrote:
>> Well, there is a full-on hammer solution: just use a specialized oss
>> plugin. The cache gets its usage values from this plugin. You could wrap
>> it and override the space usage method. Of course, like I said, this is a
>> rather heavy duty fix. There is merit to Wei's solution of having a
>> lightweight plugin that returns space usage and is used instead of
>> the oss plugin if one was specified. Alternatively, have some kind of
>> decision plugin that is used to determine whether you have enough space
>> left. I think a plugin approach will be the only way we satisfy everyone's
>> conflicting ideas of what "space" really means.
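>>
>> Roughly, the "wrap and override" idea is this (a conceptual Python sketch
>> only -- the real thing is a C++ oss plugin, and every name below is made up):
>>
>>   class SpaceOverrideWrapper:
>>       """Delegate everything to the real storage object except the space query."""
>>       def __init__(self, real_oss, usage_fn):
>>           self._real = real_oss        # the wrapped oss-like object
>>           self._usage_fn = usage_fn    # callable returning our own notion of bytes used
>>
>>       def space_usage(self):
>>           return self._usage_fn()      # report the overridden usage value
>>
>>       def __getattr__(self, name):
>>           return getattr(self._real, name)   # pass every other call straight through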
>
> I have a proposal. Supersede pfc.diskusage with
>
> pfc.storagelimits files <low> <nominal> <high> disk <low> <high>
>                  share-count <num> purge-interval <time>
>
> This would:
> a) keep usage of actual cached files between F_nominal and F_high;
>
> b) if total disk usage gets above D_high, the cache will retreat to F_low
> (using share-count to determine what fraction of the required space to concede
> on each purge cycle);
>
> c) when the cache would have to purge below F_low, it stops caching new files
> and acts as a RAM-only proxy for new files until disk space becomes available.
> It will not purge beyond F_low.
>
> Additionally, we can add:
>
> pfc.purge older-than <time> not-accessed-in <time>
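>
> For example, with the proposed syntax (numbers purely illustrative, and the
> exact value/unit format still to be decided), a configuration might read:
>
>  pfc.storagelimits files 4TB 5TB 6TB disk 80TB 90TB share-count 4 purge-interval 300
>  pfc.purge older-than 30d not-accessed-in 7d
>
> i.e., keep cached files between 5 and 6 TB, retreat toward 4 TB once the
> partition holds more than 90 TB (conceding a quarter of the required space on
> each purge cycle), check every 300 seconds, and also drop files older than 30
> days or not accessed in a week.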
>
> What do you think?
>
> Matevz
>
>> Andy
>> 
>> On Wed, 16 May 2018, Matevz Tadel wrote:
>> 
>>> On 05/15/18 20:34, Yang, Wei wrote:
>>>> I am evaluating Xcache at NERSC, using their shared filesystems. I ran
>>>> into two issues.
>>>>
>>>> 1. "pfc.diskusage low high": I want to set 5 TB and 8 TB respectively. But
>>>> the disk usage is based on the filesystem's total and used space. The total
>>>> and used space of the whole filesystem are 7 PB and 5.8 PB, so Xcache always
>>>> thinks that I have run out of space and removes everything. Is there any way
>>>> to change this?
>>> 
>>> We would need to add a flag to the pfc.diskusage ... like total (what we 
>>> have now) vs. actual (what you are asking for now). Hmmh, or would one 
>>> want to have both controls at the same time, i.e., to start purging if 
>>> either of the following is true:
>>> a) cache is using more than X GB (or X % of total space on the partition);
>>> b) the disk partition has more than X GB used (or is more than X % full).
>>> 
>>> Let me think ... I think it might make sense to have both options; then
>>> several xcache instances can use the same pool of disks without stepping
>>> on each other's toes.
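>>>
>>> Something like this, say (strawman syntax, just to sketch the idea of the two
>>> controls -- nothing like this is implemented yet):
>>>
>>>  pfc.diskusage actual 5TB 8TB
>>>  pfc.diskusage total 0.90 0.95
>>>
>>> i.e., start purging when the cache itself holds more than 8 TB, or when the
>>> partition as a whole is more than 95% full.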
>>> 
>>>> Is it possible to allow calling a simple script to determine the current
>>>> used space? Also, can we set the interval of cleaning?
>>> 
>>> No, we use OfsDF (or something like that). But we need to loop over all 
>>> cinfo files to decide which to remove so we can also calculate actual disk 
>>> usage while we're doing this.
>>> 
>>> Andy, a question. On most Linux FSs, when we have a sparse file, only
>>> blocks that are actually written "consume" disk space. Is this universally
>>> true? Is it OK if I only use the stored block count as the size estimate for
>>> actual disk usage? Can we know the block size of the FS? The problem, I
>>> think, is only the last, not fully occupied block ... and depending on the
>>> cache block size and the number of files in the cache this could lead to a
>>> relatively large "rounding error".
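>>>
>>> For what it's worth, the actual allocation for a sparse file and the FS block
>>> size can be read off a plain stat; a minimal Python sketch (the path is made
>>> up):
>>>
>>>   import os
>>>
>>>   st = os.stat("/cache/data/some.file")   # hypothetical cached data file
>>>   allocated = st.st_blocks * 512          # st_blocks is in 512-byte units on Linux
>>>   apparent  = st.st_size                  # logical (sparse) file size
>>>   blocksize = st.st_blksize               # filesystem's preferred I/O block size
>>>   print(allocated, apparent, blocksize)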
>>> 
>>> Time interval can be changed ... oh ... it's not in THE DOCS :)
>>> 
>>>  pfc.diskusage fracLow fracHigh
>>> 
>>> is actually
>>> 
>>>  pfc.diskusage fracLow fracHigh [sleep time]
>>> 
>>> where time is the interval between purge checks, clamped between 60 and 3600
>>> seconds.
>>> 
>>> https://github.com/xrootd/xrootd/blob/master/src/XrdFileCache/XrdFileCacheConfiguration.cc#L337 
>>> 
>>> I have no clue about 2 below ...
>>> 
>>> Matevz
>>> 
>>>> 2. Putting the cache space on NERSC GPFS works, but putting it on NERSC
>>>> Lustre (not sure what version) does not. When the cache is on Lustre,
>>>> nothing is cached and the log shows this:
>>>> 
>>>> 180515 14:49:30 2801 XrootdXeq: yangw.3336:27@dtn04 pub IPv4 login
>>>> 180515 14:49:30 2801 XrdFileCache_Manager: info Cache::Attach() 
>>>> root://27@localfile:1094//atlas/rucio/transient/31/c5/panda.HITS.14104313._000001.pool.root.1.14104313-3930415979-13437313503-47-10.zip 
>>>> 180515 14:49:30 2801 ofs_FAttr: Unable to set attr XrdFrm.Pfn from 
>>>> /global/cscratch1/sd/yangw/xcache/dtn04.nersc.gov/xrd/datafiles/data/01/4D55FB5AE70A000000008037cd1501000000404%; 
>>>> operation not supported
>>>> 180515 14:49:30 2801 XrdFileCache_File: error File::Open() Create failed 
>>>> for data file 
>>>> /atlas/rucio/transient/31/c5/panda.HITS.14104313._000001.pool.root.1.14104313-3930415979-13437313503-47-10.zip, 
>>>> err=Operation not supported 
>>>> /atlas/rucio/transient/31/c5/panda.HITS.14104313._000001.pool.root.1.14104313-3930415979-13437313503-47-10.zip 
>>>> 180515 14:49:42 2801 XrdFileCache_IO: info IOEntireFile::Detach() 
>>>> 0x1cd6600
>>>> 180515 14:49:42 2801 XrootdXeq: yangw.3336:27@dtn04 disc 0:00:12
>>>> 
>>>> -- 
>>>> Wei Yang  |  [log in to unmask]  |  650-926-3338(O)
>>>> 
>

########################################################################
Use REPLY-ALL to reply to list

To unsubscribe from the XCACHE-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XCACHE-L&A=1
