Hi Jan,

Another way to speed up the processing is to use the Prepare method, which 
lets you set in motion all the steps needed to get file location 
information. As for finding out the locations of a list of files, that may 
be doable but has problems of its own. In your case it probably doesn't 
matter, but in the general case two things may happen: 1) the location may 
be incorrect by the time you get the information (i.e., the file has been 
moved or deleted), and 2) there is no particular location for files that 
don't exist yet (this includes files that may be in an MSS but not yet on 
disk). The latter is more problematic, as it takes a while to determine. 
Anyway, we'll look into a mechanism to get you file location information 
(one of n for each file) using a list.
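
To make the idea concrete, here is a rough sketch of issuing such a prepare 
request from a client. The XrdClientAdmin::Prepare signature used below (a 
newline-separated path list plus option and priority bytes) and the 
redirector URL are assumptions, so check XrdClient/XrdClientAdmin.hh for the 
real interface:

   // Sketch only: ask the redirector to prepare a list of files up front, so
   // that later Open/Stat calls find the location information already staged.
   // The Prepare() signature and the redirector URL below are assumptions.
   #include "XrdClient/XrdClientAdmin.hh"
   #include <string>
   #include <vector>

   bool PrepareFiles(const std::vector<std::string> &paths)
   {
      XrdClientAdmin adm("root://myredirector:1094//dummy"); // hypothetical URL
      if (!adm.Connect()) return false;

      // Assumed interface: one newline-separated string of paths.
      std::string list;
      for (size_t i = 0; i < paths.size(); ++i) {
         list += paths[i];
         list += '\n';
      }

      // Option and priority bytes left at 0 as placeholders.
      return adm.Prepare(list.c_str(), 0, 0);
   }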

Andy

----- Original Message ----- 
From: "Fabrizio Furano" <[log in to unmask]>
To: "Jan Iwaszkiewicz" <[log in to unmask]>
Cc: <[log in to unmask]>; "Maarten Ballintijn" <[log in to unmask]>; 
"Gerri Ganis" <[log in to unmask]>
Sent: Wednesday, August 16, 2006 10:09 AM
Subject: Re: Querying locations of a vector of files


> Hi Jan,
>
>  at the moment such a primitive is not part of the protocol. The simplest 
> way of doing it is to call Stat for each file, but this reduces the 
> per-file overhead only by a small amount with respect to an Open call.
>  In fact, both primitives actually drive the client to the final endpoint 
> (the file), so you cannot avoid the overhead (mainly communication 
> latencies) of being redirected to other servers.
>
>  Since you say it's critical for you, my suggestion is to open as many 
> files as you can in parallel. That way all the latencies overlap, and you 
> can expect much higher performance.
>
>  To do this, just call TFile::AsyncOpen(fname) for each file you need to 
> open (in one loop), and then, later, call the regular TFile::Open for each 
> of them (in a second loop).
>   The async call is non-blocking and very fast. You can find an example of 
> its ROOT-based usage here:
>
> http://root.cern.ch/root/Version512.news.html
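
A minimal sketch of the two loops described above, assuming ROOT's 
TFile::AsyncOpen and the TFile::Open overload that takes the returned 
handle (the file-name vector is just illustrative):

   // Sketch: the first loop starts all the opens asynchronously
   // (non-blocking), the second collects the TFile objects, blocking per file.
   #include "TFile.h"
   #include "TString.h"
   #include <vector>

   void OpenInParallel(const std::vector<TString> &names,
                       std::vector<TFile*> &files)
   {
      std::vector<TFileOpenHandle*> handles;

      for (size_t i = 0; i < names.size(); ++i)
         handles.push_back(TFile::AsyncOpen(names[i]));   // returns immediately

      for (size_t i = 0; i < handles.size(); ++i)
         files.push_back(TFile::Open(handles[i]));        // waits until open
   }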
>
>  The ugly thing is that doing this uses a lot of resources, so, if you 
> really have a lot of files to open (say, 5000) and resources are a 
> problem, you can work around it by opening them in bunches of fixed size, 
> as in the sketch below.
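
A hedged variation on the bunching workaround, keeping a fixed number of 
async opens in flight at any time (the bunch size of 200 is an arbitrary 
example value):

   // Sketch: same two-loop idea, but never more than `bunch` async opens in
   // flight at once, to limit resource usage.
   #include "TFile.h"
   #include "TString.h"
   #include <algorithm>
   #include <vector>

   void OpenInBunches(const std::vector<TString> &names,
                      std::vector<TFile*> &files,
                      size_t bunch = 200)   // arbitrary example size
   {
      for (size_t start = 0; start < names.size(); start += bunch) {
         size_t stop = std::min(start + bunch, names.size());

         std::vector<TFileOpenHandle*> handles;
         for (size_t i = start; i < stop; ++i)
            handles.push_back(TFile::AsyncOpen(names[i]));   // non-blocking

         for (size_t j = 0; j < handles.size(); ++j)
            files.push_back(TFile::Open(handles[j]));        // blocks per file
      }
   }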
>
> Fabrizio
>
> Jan Iwaszkiewicz wrote:
>> Hi,
>>
>> In PROOF we realized that we need the ability to query the exact 
>> locations of a set of files. As far as I have seen, the xrootd protocol 
>> has no way to ask for the locations of a vector of files.
>>
>> At the beginning of a query, we want to check the exact locations of all 
>> the files from a data set. The current implementation does it by opening 
>> all the files, one by one.
>> The speed is about 30 files/sec. For many queries, the lookup takes much 
>> longer than the processing.
>> It is a critical problem for us.
>>
>> The bool XrdClientAdmin::SysStatX(const char *paths_list, kXR_char 
>> *binInfo) method can check multiple files but it only verifies whether 
>> the files exist.
>> I imagine that it would be best for us to have something similar but 
>> returning file locations. Is such an extension to the protocol 
>> possible/reasonable to implement?
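
For reference, a hedged sketch of how the SysStatX call above might be 
driven from a list of paths; the newline-separated list format, the 
one-status-byte-per-path layout of binInfo, and the redirector URL are 
assumptions:

   // Sketch of the existing multi-file existence check, for comparison.
   #include "XrdClient/XrdClientAdmin.hh"
   #include <cstdio>
   #include <string>
   #include <vector>

   void CheckExistence(const std::vector<std::string> &paths)
   {
      XrdClientAdmin adm("root://myredirector:1094//dummy"); // hypothetical URL
      if (!adm.Connect()) return;

      // Assumed format: one newline-separated string of paths.
      std::string list;
      for (size_t i = 0; i < paths.size(); ++i) {
         list += paths[i];
         list += '\n';
      }

      // Assumed: one status byte per path is written into binInfo.
      std::vector<kXR_char> info(paths.size(), 0);
      if (adm.SysStatX(list.c_str(), &info[0]))
         for (size_t i = 0; i < paths.size(); ++i)
            printf("%s -> flags 0x%02x\n", paths[i].c_str(), info[i]);
   }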
>>
>> Cheers,
>> Jan
>