Hi Pavel,

Yes, look at XrdClientAdmin.hh. The XrdClientAdmin interface has a
Prepare() method that allows you to pass a list of files. This list is
passed to xrootd, which then locates the files. If you
also specify kXR_stage then the files will be staged if not on disk. There
are additional options but for now this should be enough to get you going.

That said, what we have found is that unless you have taken great care to
group files that are likely to be used together on the same tape (something
we find difficult to do), the probability of improving tape access is
rather low unless you get a huge list of files into HPSS. That in itself
causes other problems.
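
To make the grouping argument concrete, here is a minimal C++ sketch. It is
not xrootd or HPSS code: the FileRequest record, the tape labels, and both
helper functions are hypothetical, and it assumes the tape holding each file
is already known from some catalog and that a single drive serves requests
strictly in the order given.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical request record: a file path plus the label of the tape
// assumed to hold it (taken from some external catalog).
struct FileRequest {
    std::string path;
    std::string tape;
};

// Group requests by tape, keeping the original order within each tape.
std::map<std::string, std::vector<std::string>>
groupByTape(const std::vector<FileRequest>& requests) {
    std::map<std::string, std::vector<std::string>> groups;
    for (const auto& r : requests) {
        groups[r.tape].push_back(r.path);
    }
    return groups;
}

// Count tape mounts when requests are served strictly in the given order:
// a mount is needed whenever the tape differs from the previous request's.
std::size_t mountsInOrder(const std::vector<FileRequest>& requests) {
    std::size_t mounts = 0;
    const std::string* current = nullptr;
    for (const auto& r : requests) {
        if (current == nullptr || *current != r.tape) {
            ++mounts;
            current = &r.tape;
        }
    }
    return mounts;
}
```

Under these assumptions, four requests that alternate between two tapes cost
four mounts when served in order, while serving them group by group would
cost one mount per tape. The larger the list handed over at once, the more
files per tape a sorter has to work with.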

Andy

On Wed, 25 Oct 2006, Pavel Jakl wrote:

> Hi Andy and others,
>
> I would like to discuss the possibility of "preparing" a list of files
> before a user's job is processed or at the beginning of the job. The
> reason I am raising this topic is to optimize access to our tape system
> (HPSS).
>
> What we are doing now is organizing the requests so that we have as
> many file requests as possible on the same tape in HPSS and therefore
> achieve better IO performance. Of course, one can imagine that with a
> bigger list we could sort better and have more file requests on the
> same tape.
>
> Our framework makes it possible for a user's lists within one job to be
> already sorted in some fashion, in the hope that those files are on the
> same tape. The problem is that these lists are processed sequentially,
> one file at a time.
> So, my aim is to give xrootd a list of files to be "prepared" before
> processing them (or at least to have the full list start being prepared
> when the job starts). Can I somehow "publish" to the server that I will
> use the files from this list in the very near future?
>
> So, I have figured out 2 possible ways to do it:
>
> 1) Use AsyncOpen at client side
>
>     I am wary of this solution, since it can use a lot of resources
> with many simultaneously open connections. I can see jobs with
> thousands of files. In our case, with 400 nodes for job processing,
> it could mean a very large number of connections to the redirector node.
>
> 2) There seem to be some sort of "prepare" methods on the server side.
> So, how can I call them from the client side?
>
> This solution would be better for me if I could somehow pass the list
> to the server through the client without needing any further assistance
> from, or presence of, the client while each file is prepared.
> I know that there could be a problem with this. The files can disappear
> between the time they are prepared and the time they are actually
> requested for processing (server went down, purging, etc.). I think
> this case occurs very rarely. However, I have not made any detailed
> investigation to prove it, since it would be very hard to get these
> statistics.
>
> Thanks for any suggestions or help
> Cheers
> Pavel
>