Yes, that would be fine. Please let us know when you've "hacked" this feature.
Cheers, Fons.
Andrew Hanushevsky wrote:
> Hi Fons,
>
> It's pretty easy to do. I would assume you'd roll that into your PROOF
> protocol, yes?
>
> Andy
>
> ----- Original Message ----- From: "Fons Rademakers"
> <[log in to unmask]>
> To: "Andrew Hanushevsky" <[log in to unmask]>
> Cc: "Fabrizio Furano" <[log in to unmask]>; "Jan Iwaszkiewicz"
> <[log in to unmask]>; <[log in to unmask]>; <[log in to unmask]>; "Gerri
> Ganis" <[log in to unmask]>
> Sent: Tuesday, August 22, 2006 2:26 AM
> Subject: Re: Querying locations of a vector of files
>
>
>> Hi Andy,
>>
>> that would be no problem, assuming we can easily query the olb admin
>> interface. How would this be done: via a popen/pclose, or is there an API?
>>
>> Cheers, Fons.
>>
>>
>>
>> Andrew Hanushevsky wrote:
>>> Hi Fons,
>>>
>>> It would probably be relatively easy to do if the query was entered
>>> via the OLB admin interface. It's more difficult to do via an xroot
>>> protocol query request. Would that satisfy you?
>>>
>>> Andy
>>>
>>> ----- Original Message ----- From: "Fons Rademakers"
>>> <[log in to unmask]>
>>> To: "Fabrizio Furano" <[log in to unmask]>
>>> Cc: "Jan Iwaszkiewicz" <[log in to unmask]>; <[log in to unmask]>;
>>> <[log in to unmask]>; "Gerri Ganis" <[log in to unmask]>
>>> Sent: Saturday, August 19, 2006 3:02 PM
>>> Subject: Re: Querying locations of a vector of files
>>>
>>>
>>>> Hi Andy, Fabrizio,
>>>>
>>>> what we urgently need is an xrootd command that takes as input a
>>>> vector of generic xrootd URLs and returns a vector of resolved URLs
>>>> (including multiple URLs when the same file exists on more than one
>>>> leaf node). The first call will of course take some time, since the
>>>> head node has to ask the leaf nodes, but after that the information
>>>> lives in the xrootd head node's cache, so it should be very quick.
>>>> We need the final locations in PROOF to submit work packets with
>>>> priority to the nodes that have the data locally.
>>>>
>>>> Can you tell me if this feature is possible and if we can get it soon?
>>>>
>>>> Cheers, Fons.
>>>>
>>>>
>>>>
>>>> Fabrizio Furano wrote:
>>>>> Hi Jan,
>>>>>
>>>>> I see. IMHO this means that there is very little overhead left to
>>>>> overlap, at least on the client side, or that you are opening all
>>>>> those files against very few servers, or even the same one. I hope not.
>>>>>
>>>>> Anyway the async open was not meant as a way to speed up the open
>>>>> primitive, but as a way to do other things while the open is in
>>>>> progress, or to stage many files in parallel without serializing
>>>>> the waits. But in your situation it seems that there are not so
>>>>> many waits to parallelize.
>>>>>
>>>>> Fabrizio
>>>>>
>>>>>
>>>>> Jan Iwaszkiewicz wrote:
>>>>>> Hi!
>>>>>>
>>>>>> I have done some tests as Fabrizio advised.
>>>>>> The results with asynchronous open are similar to those with the
>>>>>> standard open:
>>>>>>
>>>>>> I used the following code:
>>>>>>
>>>>>> TTime starttime = gSystem->Now();
>>>>>> TList *toOpenList = new TList();
>>>>>> toOpenList->SetOwner(kFALSE);
>>>>>> TIter nextElem(fDset->GetListOfElements());
>>>>>> while (TDSetElement *elem = dynamic_cast<TDSetElement*>(nextElem())) {
>>>>>>    TFile::AsyncOpen(elem->GetFileName());
>>>>>>    toOpenList->Add(elem);
>>>>>> }
>>>>>>
>>>>>> TFile::EAsyncOpenStatus aos;
>>>>>> TIter nextToOpen(toOpenList);
>>>>>> while (toOpenList->GetSize() > 0) {
>>>>>>    while (TDSetElement *elem = dynamic_cast<TDSetElement*>(nextToOpen())) {
>>>>>>       aos = TFile::GetAsyncOpenStatus(elem->GetFileName());
>>>>>>       if (aos == TFile::kAOSSuccess || aos == TFile::kAOSNotAsync ||
>>>>>>           aos == TFile::kAOSFailure) {
>>>>>>          elem->Lookup();
>>>>>>          toOpenList->Remove(elem);
>>>>>>       } else if (aos != TFile::kAOSInProgress) {
>>>>>>          Error("fileOpenTestTmp", "unknown aos");
>>>>>>       }
>>>>>>    }
>>>>>>    nextToOpen.Reset();
>>>>>> }
>>>>>> toOpenList->Delete();
>>>>>>
>>>>>> TTime endtime = gSystem->Now();
>>>>>> Float_t time_holder = Long_t(endtime - starttime) / Float_t(1000);
>>>>>> cout << "Opening time was " << time_holder << " seconds" << endl;
>>>>>>
>>>>>>
>>>>>> The result is:
>>>>>>
>>>>>> #files   asynchronous [s]   standard TFile::Open [s]
>>>>>>    300        12.5                  11.7
>>>>>>    240         9.68                  9.4
>>>>>>    120         4.5                   4.6
>>>>>>
>>>>>> Have a nice weekend!
>>>>>> Jan
>>>>>>
>>>>>> Jan Iwaszkiewicz wrote:
>>>>>>> Hi Fabrizio, Hi Andy!
>>>>>>>
>>>>>>> Thank you for the answers.
>>>>>>> I'm running tests with TFile::AsyncOpen and will keep you
>>>>>>> informed. Maybe I should clarify that we want to look up the
>>>>>>> locations of the files on the PROOF master node, but then open the
>>>>>>> files on the worker nodes. The point of the lookup is to determine
>>>>>>> which files each worker will open/process. As for the problems Andy
>>>>>>> described: 1) I agree; 2) that makes it even more important to
>>>>>>> parallelize the lookup.
>>>>>>>
>>>>>>> In fact the possibility to get all locations of a file is also
>>>>>>> high on our wish-list. It would prevent us from opening a remote
>>>>>>> file while another copy sits on one of our workers; right now we
>>>>>>> have no mechanism to avoid that. I think it's quite a different use
>>>>>>> case from file serving: we want to make the best use of the set of
>>>>>>> nodes belonging to a PROOF session. It would be very useful to have
>>>>>>> this functionality!
>>>>>>> Cheers,
>>>>>>> Jan
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Andrew Hanushevsky [mailto:[log in to unmask]]
>>>>>>> Sent: Wed 8/16/2006 10:47 PM
>>>>>>> To: Fabrizio Furano; Jan Iwaszkiewicz
>>>>>>> Cc: [log in to unmask]; [log in to unmask]; Gerardo Ganis
>>>>>>> Subject: Re: Querying locations of a vector of files
>>>>>>> Hi Jan,
>>>>>>>
>>>>>>> Another way to speed up the processing is to use the Prepare
>>>>>>> method that allows you to set in motion all the steps needed to
>>>>>>> get file location information. As far as finding out the location
>>>>>>> of a list of files, that may be doable but has problems of its
>>>>>>> own. In your case it probably doesn't matter but in the general
>>>>>>> case two things may happen: 1) the location may be incorrect by
>>>>>>> the time you get the information (i.e., the file has been moved
>>>>>>> or deleted), and 2) there is no particular location for files
>>>>>>> that don't exist yet (this includes files that may be in an MSS
>>>>>>> but not yet on disk). The latter is the more problematic, as it
>>>>>>> takes a while to determine. Anyway, we'll look into a
>>>>>>> mechanism to get you file location information (one of n for each
>>>>>>> file) using a list.
>>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>> ----- Original Message ----- From: "Fabrizio Furano"
>>>>>>> <[log in to unmask]>
>>>>>>> To: "Jan Iwaszkiewicz" <[log in to unmask]>
>>>>>>> Cc: <[log in to unmask]>; "Maarten Ballintijn"
>>>>>>> <[log in to unmask]>; "Gerri Ganis" <[log in to unmask]>
>>>>>>> Sent: Wednesday, August 16, 2006 10:09 AM
>>>>>>> Subject: Re: Querying locations of a vector of files
>>>>>>>
>>>>>>>
>>>>>>>> Hi Jan,
>>>>>>>>
>>>>>>>> at the moment such a primitive is not part of the protocol. The
>>>>>>>> simplest way of doing it is to call Stat for each file, but that
>>>>>>>> reduces the per-file overhead only by a small amount with respect
>>>>>>>> to an Open call.
>>>>>>>> In fact both primitives drive the client to the final endpoint
>>>>>>>> (the file), so you cannot avoid the overhead (mainly communication
>>>>>>>> latencies) of being redirected to other servers.
>>>>>>>>
>>>>>>>> Since you say it's critical for you, my suggestion is to open as
>>>>>>>> many files as you can in parallel. That way the latencies overlap,
>>>>>>>> and you can expect much higher performance.
>>>>>>>>
>>>>>>>> To do this, just call TFile::AsyncOpen(fname) for each file you
>>>>>>>> need to open (one cycle), and then, later, call the regular
>>>>>>>> TFile::Open (in another cycle).
>>>>>>>> The async call is non-blocking and very fast. You can find an
>>>>>>>> example of its ROOT-based usage here:
>>>>>>>>
>>>>>>>> http://root.cern.ch/root/Version512.news.html
>>>>>>>>
>>>>>>>> The ugly part is that doing this uses a lot of resources, so if
>>>>>>>> you really have a lot of files to open (say, 5000) and resources
>>>>>>>> are a problem, you can work around it by opening them in bunches
>>>>>>>> of fixed size.
>>>>>>>>
>>>>>>>> Fabrizio
>>>>>>>>
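The fixed-size bunching Fabrizio suggests can be sketched in plain C++. This is a minimal sketch, not ROOT code: `openInBunches`, `openAsync`, and `waitBunch` are hypothetical stand-ins for the real calls (the real code would call `TFile::AsyncOpen` per file and then poll `TFile::GetAsyncOpenStatus` for the bunch):

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Process a list of files in bunches of at most bunchSize:
// call openAsync (non-blocking) for every file in the current bunch,
// then waitBunch once, which stands in for polling the open statuses
// until the whole bunch has completed. This bounds the number of
// simultaneously open connections to bunchSize.
void openInBunches(const std::vector<std::string> &files,
                   std::size_t bunchSize,
                   const std::function<void(const std::string &)> &openAsync,
                   const std::function<void()> &waitBunch)
{
   if (bunchSize == 0) return; // avoid an infinite loop on a bad argument
   for (std::size_t start = 0; start < files.size(); start += bunchSize) {
      std::size_t end = std::min(files.size(), start + bunchSize);
      for (std::size_t i = start; i < end; ++i)
         openAsync(files[i]); // fire off the non-blocking open requests
      waitBunch();            // block until this bunch has completed
   }
}
```

The trade-off is the usual one: a larger bunch overlaps more latency but holds more client resources at once.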
>>>>>>>> Jan Iwaszkiewicz wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> In PROOF we have realized that we need the possibility to query
>>>>>>>>> the exact locations of a set of files. As far as I can see in
>>>>>>>>> the xrootd protocol, there is no way to ask for the locations of
>>>>>>>>> a vector of files.
>>>>>>>>>
>>>>>>>>> At the beginning of a query we want to check the exact locations
>>>>>>>>> of all the files from a data set. The current implementation
>>>>>>>>> does this by opening all the files, one by one, at about
>>>>>>>>> 30 files/sec. For many queries the lookup then takes much longer
>>>>>>>>> than the processing itself, which is a critical problem for us.
>>>>>>>>>
>>>>>>>>> The bool XrdClientAdmin::SysStatX(const char *paths_list,
>>>>>>>>> kXR_char *binInfo) method can check multiple files, but it only
>>>>>>>>> verifies whether the files exist.
>>>>>>>>> I imagine it would be best for us to have something similar that
>>>>>>>>> returns file locations. Is such an extension to the protocol
>>>>>>>>> possible/reasonable to implement?
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Jan
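The kind of vectored extension Jan describes might look like the following sketch. Everything here is hypothetical: the name `SysLocateX`, the `FileLocations` record, and the stubbed lookup are not part of the actual XrdClientAdmin API; only the newline-separated `paths_list` convention is borrowed from SysStatX:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result record: one entry per queried path, carrying
// every known replica location (e.g. "host:port") for that path.
struct FileLocations {
   std::string path;
   std::vector<std::string> replicas; // empty if the file is unknown
};

// Hypothetical vectored-locate call, modelled on SysStatX's
// newline-separated paths_list argument: resolve all paths in one
// request instead of one Open/Stat round trip per file.
std::vector<FileLocations> SysLocateX(const std::string &pathsList)
{
   std::vector<FileLocations> out;
   std::size_t begin = 0;
   while (begin <= pathsList.size()) {
      std::size_t end = pathsList.find('\n', begin);
      if (end == std::string::npos) end = pathsList.size();
      if (end > begin)
         out.push_back({pathsList.substr(begin, end - begin), {}});
      begin = end + 1;
   }
   // A real implementation would fill 'replicas' from the redirector's
   // location cache, asking the leaf nodes only on a cache miss.
   return out;
}
```

The point of the sketch is the shape of the interface: one call, one answer per path, with multiple replicas allowed per path, which is exactly what the PROOF packetizer needs.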
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>> --
>>>> Org: CERN, European Laboratory for Particle Physics.
>>>> Mail: 1211 Geneve 23, Switzerland
>>>> E-Mail: [log in to unmask] Phone: +41 22 7679248
>>>> WWW: http://fons.rademakers.org Fax: +41 22 7669640
>>>>
>>>
>>
>>
>