Hi Andy,

I think I would be interested in further development along these lines, 
but please take into account that I really don't know what I am doing yet!

I can see another potential way around the issue: use multiple 
partitions to represent the 6.5TB of storage.  I can probably do this in 
hardware, where the 6.5TB is a "virtual drive" and I create N 
logical drives, or I can create N partitions on the 6.5TB drive.

If I choose a large enough value for N, the number of files in a given 
cache directory would be kept under control:
NumFiles ~ 6.5TB / (N * AvgFileSize)
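
As a rough sanity check (N = 8 and a 500MB average file size are just 
assumed values here, the latter borrowed from Andy's figure below):

   # back-of-the-envelope only; sizes in MB, N=8 partitions assumed
   echo $(( 6500000 / (8 * 500) ))   # 6.5TB / (8 * 500MB) => 1625 files

So a sensible N keeps each cache directory down to a couple of thousand 
files at most.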

The upside is that I can nominally control the number of files in a 
directory, and that hardware problems might only screw up 6.5TB of 
storage (not all 13).  The downside is the loss of some space to 
redundant partitioning structures and more complicated management.


Anyone see other issues with this approach?

Patrick


Andrew Hanushevsky wrote:
> Hi Patrick,
> 
> In general, a good LVM will provide better dispersal of data (and 
> theoretically better performance) than a partitioning mechanism. 
> When we wrote the partitioning code, such LVMs were hard to find and 
> the ones that existed were rather expensive (think almost 10 years ago). 
> That said, partitioning gives you (with most LVMs) better control of 
> recovery granularity. If a partition dies, you need only recover the 
> files on that partition. With many LVMs, when a portion goes you may 
> need to recover a whole filesystem -- a rather daunting task for TB 
> filesystems.
> 
> Some filesystems (e.g., ZFS) do better than others and include integrated 
> LVM support, so they are more resilient. Others do not, and some even 
> wind up hurting you when configured on a huge partition, as internal 
> limitations either introduce huge allocation units (wasting a lot of 
> space) or significantly reduce the number of files you can allocate 
> (i.e., they are short on inodes). So this adds yet another level of complexity.
> 
> You are right that multi-TB cache partitions pose problems in Linux, and 
> are challenging in terms of good file dispersal to achieve high 
> performance (though, in all fairness, Linux is not alone in this). We 
> have toyed with the idea of creating multiple subdirectories in the 
> cache partition to alleviate those problems but have always put that on 
> the back burner because LVMs were coming out that exhibited rather good 
> performance and resiliency.
> 
> Generally, I prefer cache partitions of no more than a couple of terabytes 
> (hard to get these days). That way I don't lose too much data when one 
> of them goes, and a partition is not so big as to pose directory problems, 
> given a reasonably large file size (500MB or more).
> 
> I would be interested in knowing how useful people think it would be to 
> provide further support for cache partitions by introducing another 
> directory level within a partition, or whether people feel comfortable 
> enough with today's LVMs to simply go with those.
> 
> Andy
> 
> 
> ----- Original Message ----- From: "Patrick McGuigan" <[log in to unmask]>
> To: "Wilko Kroeger" <[log in to unmask]>
> Cc: <[log in to unmask]>
> Sent: Wednesday, November 07, 2007 2:50 PM
> Subject: Re: Question about oss.cache directive
> 
> 
>> Hi Wilko,
>>
>> Your answers were very helpful.  I have a better understanding of the 
>> cache directive now, but I am curious whether anyone has used largish 
>> partitions to create a cache.
>>
>> Our partitions will be 6.5TB (XFS) and I am a little dubious about 
>> using such a large partition to support a cache.  In the scenario that 
>> you outline, all of the files would reside in the base directory of the 
>> cache (/xrd/cache01 or /xrd/cache02).  I am concerned that the 
>> directory would hold a large number of files, which might result in 
>> slower access to files because of the way that Linux deals with large 
>> directories.
>>
>>
>> Another alternative is to use LVM to create one large partition, but I 
>> will need to look at the load balancing issues when some servers have 
>> twice as much storage as others.
>>
>> Any and all advice or experience is appreciated,
>>
>> Patrick
>>
>>
>>
>>
>>
>> Wilko Kroeger wrote:
>>>
>>> Hello Patrick
>>>
>>> Fabrizio already answered most of the questions. I just have a few 
>>> comments.
>>>
>>> If you have more than one partition that an xrootd server should 
>>> serve, you should use the cache directive.
>>> The cache works by placing a file in a cache directory and 
>>> creating a link between that file and the proper file name. For example,
>>> if the file name is /xrd/test/d1/file1 and you use the cache 
>>> directive
>>> oss.cache /xrd*
>>> the file would be put (let's pick cache /xrd2) into
>>>    /xrd2/%xrd%test%d1%file1
>>> and a link is created:
>>>    /xrd/test/d1/file1 -> /xrd2/%xrd%test%d1%file1
>>>
>>> As you can see there are no directories in the cache. The file name 
>>> in the cache is the proper file name with all '/' replaced by '%'.
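>>>
>>> In shell terms, the naming logic is roughly the following (just a 
>>> sketch of the idea, not the actual xrootd code; the paths are the 
>>> example ones from above):
>>>
>>>    LOGICAL=/xrd/test/d1/file1
>>>    CACHE=/xrd2
>>>    FLAT=$(echo "$LOGICAL" | tr '/' '%')   # -> %xrd%test%d1%file1
>>>    touch "$CACHE/$FLAT"                   # data file lives in the cache
>>>    ln -s "$CACHE/$FLAT" "$LOGICAL"        # proper name points at it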
>>>
>>>
>>> As xrootd will export /xrd, you have to create a /xrd directory. I 
>>> guess this will not be in the '/' root partition but in one of your 
>>> data partitions (/xrd1, /xrd2), and therefore you will need a link:
>>>  /xrd -> /xrd1
>>>
>>> However, in this case, doing an 'ls /xrd' would list all files in 
>>> /xrd1, which could be quite a lot depending on how many files you have. 
>>> Therefore, you might want to have a link like
>>>  /xrd -> /xrd1/xrd
>>> In this case 'ls /xrd' would not list the files in the /xrd1 cache.
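>>>
>>> For example (a sketch, assuming the /xrd1 partition is already mounted):
>>>
>>>    mkdir /xrd1/xrd        # real directory on the data partition
>>>    ln -s /xrd1/xrd /xrd   # exported path; 'ls /xrd' stays clean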
>>>
>>> Another possibility would be to make the cache directories a little 
>>> bit more explicit. Mount your two partitions as:
>>> /xrd
>>> /xrd/cache1
>>> and create the directory
>>> /xrd/cache0
>>> and then use
>>> oss.cache /xrd/cache*
>>>
>>> I hope these comments helped a little bit.
>>>
>>>   Cheers,
>>>       Wilko
>>>
>>>
>>>
>>> On Wed, 7 Nov 2007, Patrick McGuigan wrote:
>>>
>>>> Hi,
>>>>
>>>> I am setting up an xrootd cluster for the first time and I have a 
>>>> question about the oss.cache directive.
>>>>
>>>> Some of my data servers have two partitions (and some have one) that 
>>>> I want to use for storage.  Is it true that the oss.cache directive 
>>>> MUST be used to put two partitions into service?  How is load 
>>>> balancing (based on space) managed on caches versus partitions?  Are 
>>>> there any performance penalties to using the cache directive?
>>>>
>>>> Finally, when a directory is created within a cache, does the 
>>>> directory get created on both partitions?
>>>>
>>>>
>>>>
>>>> If the partition on a single-mount server is /xrd1 and the partitions 
>>>> on a dual-mount server are /xrd1 and /xrd2, would the following 
>>>> snippet from the config file be appropriate:
>>>>
>>>>
>>>> #
>>>> # Tell the load balancer that /xrd is served read/write.
>>>> olb.path rw /xrd
>>>> #
>>>> # Use every /xrd* mount as a partition of the public cache.
>>>> oss.cache public /xrd*
>>>> #
>>>> # Load the filesystem library and export the /xrd namespace.
>>>> xrootd.fslib /opt/xrootd/lib/libXrdOfs.so
>>>> xrootd.export /xrd
>>>>
>>>>
>>>>
>>>> I am expecting this to create a global namespace rooted at /xrd that 
>>>> is writable and would use both partitions of a dual-mount data server.
>>>>
>>>>
>>>>
>>>> Thanks for any information,
>>>>
>>>> Patrick
>>>>
>>