	We do get huge lists of files into HPSS (and do not use
individual pftps). Within a single process P1 we observed access
patterns along the lines of: one file is opened and processed, and
while it is being processed another client P2 requests other files,
kicking the tape out of the drive (imagine many clients doing this
simultaneously, of course). On the next file N+1 from P1 the same
tape is mounted again, and so on, leading to several tens of
mount/dismount cycles over the work unit of each P_i, which could be
avoided. Pre-staging would allow larger bulk requests, tape sorting,
etc.
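
	For the record, here is a rough, untested sketch of what such a
bulk request could look like with the XrdClientAdmin::Prepare()
interface Andy describes below. The exact signature (a vecString of
paths plus option and priority bytes) is my assumption from
XrdClientAdmin.hh and should be checked there; the redirector host
and file paths are just placeholders:

    // Untested sketch: stage a whole (pre-sorted) file list in one call.
    // Assumes XrdClientAdmin.hh provides the vecString/XrdOucString types
    // and a Prepare(vecString, opts, priority) overload, and that the
    // kXR_stage flag is visible through the same header.
    #include "XrdClient/XrdClientAdmin.hh"

    int main()
    {
       // Redirector URL is a placeholder for the local site.
       XrdClientAdmin admin("root://redirector.example.org:1094//dummy");
       if (!admin.Connect()) return 1;

       // Placeholder paths; in practice this is the job's file list,
       // already sorted so files expected on the same tape are adjacent.
       XrdOucString f1("/star/data/run1/file1.root");
       XrdOucString f2("/star/data/run1/file2.root");
       vecString filelist;
       filelist.Push_back(f1);
       filelist.Push_back(f2);

       // Ask the server side to stage the whole list; priority 0.
       bool ok = admin.Prepare(filelist, kXR_stage, 0);
       return ok ? 0 : 1;
    }

The idea being that one such call per job (or per work unit) would
replace the per-file mounts described above.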

	Worth a try.

	Thank you,

Andrew Hanushevsky wrote:
> Hi Pavel,
> 
> Yes, look at XrdClientAdmin.hh. The XrdClientAdmin interface has a
> Prepare() method that allows you to pass a list of files. This list is
> passed to xrootd, which then locates the files. If you also specify
> kXR_stage then the files will be staged if they are not on disk. There
> are additional options, but for now this should be enough to get you going.
> 
> That said, what we have found is that unless you have taken great care
> to group files that are likely to be used together on the same tape
> (something we find difficult to do), the probability of improving tape
> access is rather low unless you get a huge list of files into HPSS. That
> in itself causes other problems.
> 
> Andy
> 
> On Wed, 25 Oct 2006, Pavel Jakl wrote:
> 
>> Hi Andy and others,
>>
>> I would like to discuss the possibility of "preparing" a list of files
>> before a user's jobs are processed, or at the beginning of the job.
>> The reason I am raising this topic is optimization of access to our
>> tape system (HPSS).
>>
>> What we are doing now is organizing the requests so that we have as
>> many file requests as possible on the same tape at HPSS, and therefore
>> achieve better IO performance. Of course, one can imagine that with a
>> bigger list we could sort better and have more file requests on the
>> same tape.
>>
>> Our framework makes it possible for a user's lists within one job to
>> already be sorted in some fashion, with the strong hope that those
>> files are presumably on the same tape. The problem is that these lists
>> are processed in sequential order, one file at a time.
>> So, my goal is to give xrootd a list of files to be "prepared" before
>> processing them (or at least have the full list start to be prepared
>> when the job starts). Can I somehow "publish" to the server that I
>> will use these files from the list in the very near future?
>>
>> So, I see two possible ways to do it:
>>
>> 1) Use AsyncOpen on the client side
>>
>>     I am scared of this solution, since it can use lots of resources
>> with many simultaneously opened connections. I can see jobs with
>> thousands of files. In our case, where we have 400 nodes for job
>> processing, this could mean a very large number of connections to the
>> redirector node.
>>
>> 2) There are some sort of "prepare" methods on the server side. So,
>> how can I call them from the client side?
>>
>> This solution would be better for me if I could somehow pass the list
>> to the server through the client and not need any further assistance
>> or presence of the client for each file's preparation.
>> I know that there could be a problem with this: the files can
>> disappear between the prepare time and the actual processing time
>> (server went down, purging, etc.). I think the occurrence of this case
>> is usually very small; however, I have not made any sophisticated
>> investigation to prove it, since it would be very hard to get these
>> statistics.
>>
>> Thanks for any suggestion or help
>> Cheers
>> Pavel
>>

-- 
              ,,,,,
             ( o o )
          --m---U---m--
              Jerome