Hi Gregory,
Yes, we need the URL of a specific xrootd cache node. We also use the same machines as PROOF slaves.

I understand the problem that the URLs can only be "advisory".
We use the resolved URLs to open the right files on the right nodes. Once we have made that decision, we can use the original "redirector URLs" on the specific data servers, and files that are local will be opened without contacting the redirector.

In addition, as far as I understand, the redirector can decide to replicate files, but deletion happens only when there is not enough disk space. We want to delete the least recently accessed files first, so after dataset validation (which opens all files) the chance that a file is deleted is close to 0. Still, a data server may go down.

A good way to get all locations of all the files using the current xrootd protocol is this:

There exists XrdClientAdmin::FilesExist, which takes a vector of file names and returns a Boolean vector with the answers. What we need is simply a vector of these vectors (one Boolean vector per data server).
So it should be enough to implement one method (on the redirector) that takes the vector of file names, calls FilesExist on each of the data servers and returns all the answers to the caller.
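
To make the idea concrete, here is a rough sketch of what such a redirector-side method could look like. It is only an illustration: the names queryLocations, serversHolding and ExistenceCheck are made up, and the per-server check is passed in as a callback rather than calling the real xrootd client, so nothing here should be read as the actual XrdClientAdmin interface.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-server bulk existence check (the role that
// FilesExist plays in the proposal): given a data server and a list of file
// names, return one bool per file.
using ExistenceCheck = std::function<std::vector<bool>(
    const std::string& server, const std::vector<std::string>& files)>;

// Ask every data server about every file.
// Result: locations[s][f] is true if files[f] exists on servers[s].
std::vector<std::vector<bool>> queryLocations(
    const std::vector<std::string>& servers,
    const std::vector<std::string>& files,
    const ExistenceCheck& check)
{
    std::vector<std::vector<bool>> locations;
    locations.reserve(servers.size());
    for (const std::string& server : servers) {
        std::vector<bool> answers = check(server, files);
        // Each server must answer for every file it was asked about.
        assert(answers.size() == files.size());
        locations.push_back(std::move(answers));
    }
    return locations;
}

// Convenience for the caller (e.g. a PROOF master, by assumption): list the
// servers that reported having the file at index fileIndex.
std::vector<std::string> serversHolding(
    const std::vector<std::vector<bool>>& locations,
    const std::vector<std::string>& servers,
    std::size_t fileIndex)
{
    std::vector<std::string> holders;
    for (std::size_t s = 0; s < servers.size(); ++s) {
        if (locations[s][fileIndex]) {
            holders.push_back(servers[s]);
        }
    }
    return holders;
}
```

The caller gets one Boolean vector per data server and can turn a column of that matrix into the list of nodes holding a given file. The answers remain advisory, for the reasons discussed above.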

What do you think? 

Cheers,
Jan


-----Original Message-----
From: [log in to unmask] on behalf of Gregory J. Sharp
Sent: Tue 8/22/2006 4:14 AM
To: Xrootd Mailing List
Subject: Re: Quering locations of a vector of files
 

On 19Aug2006, at 18:02 , Fons Rademakers wrote:

> Hi Andy, Fabrizio,
>
>   what we really urgently would like to have is an xrootd command  
> that takes as input a vector of generic xrootd urls and returns a  
> vector with resolved urls (including multiple urls in case the same  
> file exists on more than one leaf node). Of course the first time  
> this will take some time since the head node will have to ask the  
> leaf nodes, but from then on this info lives in the xrootd head  
> node cache, so it should be very quick. We need the final location  
> in PROOF to submit work packets with priority to the nodes that  
> have the data local.
>
> Can you tell me if this feature is possible and if we can get it soon?

I would like to understand what you are really asking for. When you  
say you want a "resolved url", do you mean a path to the physical  
cache server(s) that currently caches the file?
And when you say "nodes that have the data local", do you intend to  
run an xrootd cache server on each compute node? Or does "local"  
mean, on the same LAN?

My understanding (probably out of date), is that an Xrootd cache node  
is free to remove a file from its cache at any time, unless somebody  
has taken a specific action to pin it in the cache. Statistically, it  
is likely that it will still be cached when PROOF tries to access it,  
but there is no promise. So the "resolved urls" will at best be  
advisory. Has that changed?

--
Gregory J. Sharp                   email: [log in to unmask]
Wilson Synchrotron Laboratory      url: http://www.lepp.cornell.edu/~gregor
Cornell University                 ph:  +1 607 255 4882
Ithaca, NY 14853                   fax: +1 607 255 8062