Yes, that would be fine. Please let us know when you've "hacked" this feature.

Cheers, Fons.

Andrew Hanushevsky wrote:
> Hi Fons,
>
> It's pretty easy to do. I would assume you'd roll that into your PROOF
> protocol, yes?
>
> Andy
>
> ----- Original Message -----
> From: "Andrew Hanushevsky" <[log in to unmask]>
> To: "Fons Rademakers" <[log in to unmask]>
> Cc: "Fabrizio Furano" <[log in to unmask]>; "Jan Iwaszkiewicz" <[log in to unmask]>; <[log in to unmask]>; <[log in to unmask]>; "Gerri Ganis" <[log in to unmask]>
> Sent: Tuesday, August 22, 2006 2:26 AM
> Subject: Re: Querying locations of a vector of files
>
>> Hi Andy,
>>
>> that would be no problem, assuming we can easily query the OLB admin
>> interface. How would this be done: via a popen/pclose, or is there an API?
>>
>> Cheers, Fons.
>>
>> Andrew Hanushevsky wrote:
>>> Hi Fons,
>>>
>>> It would probably be relatively easy to do if the query was entered
>>> via the OLB admin interface. It's more difficult to do via an xroot
>>> protocol query request. Would that satisfy you?
>>>
>>> Andy
>>>
>>> ----- Original Message -----
>>> From: "Fons Rademakers" <[log in to unmask]>
>>> To: "Fabrizio Furano" <[log in to unmask]>
>>> Cc: "Jan Iwaszkiewicz" <[log in to unmask]>; <[log in to unmask]>; <[log in to unmask]>; "Gerri Ganis" <[log in to unmask]>
>>> Sent: Saturday, August 19, 2006 3:02 PM
>>> Subject: Re: Querying locations of a vector of files
>>>
>>>> Hi Andy, Fabrizio,
>>>>
>>>> what we really urgently would like to have is an xrootd command that
>>>> takes as input a vector of generic xrootd URLs and returns a vector
>>>> of resolved URLs (including multiple URLs in case the same file
>>>> exists on more than one leaf node). Of course, the first time this
>>>> will take some time, since the head node will have to ask the leaf
>>>> nodes, but from then on this info lives in the xrootd head node
>>>> cache, so it should be very quick.
>>>> We need the final location in PROOF to submit work packets with
>>>> priority to the nodes that have the data local.
>>>>
>>>> Can you tell me if this feature is possible and if we can get it soon?
>>>>
>>>> Cheers, Fons.
>>>>
>>>> Fabrizio Furano wrote:
>>>>> Hi Jan,
>>>>>
>>>>> I see. IMHO this means that there is very little overhead you can
>>>>> overlap, at least on the client side. Or that you are opening all
>>>>> those files towards very few servers, or the same one. I hope not.
>>>>>
>>>>> Anyway, the async open was not meant as a way to speed up the open
>>>>> primitive, but as a way to do other things while the open is in
>>>>> progress, or to stage many files in parallel without serializing
>>>>> the waits. But in your situation it seems that there are not so
>>>>> many waits to parallelize.
>>>>>
>>>>> Fabrizio
>>>>>
>>>>> Jan Iwaszkiewicz wrote:
>>>>>> Hi!
>>>>>>
>>>>>> I have done some tests as Fabrizio advised.
>>>>>> The results of the tests with asynchronous open are similar to
>>>>>> those with standard open.
>>>>>>
>>>>>> I used the following code:
>>>>>>
>>>>>>    TTime starttime = gSystem->Now();
>>>>>>    TList *toOpenList = new TList();
>>>>>>    toOpenList->SetOwner(kFALSE);
>>>>>>    TIter nextElem(fDset->GetListOfElements());
>>>>>>    while (TDSetElement *elem = dynamic_cast<TDSetElement*>(nextElem())) {
>>>>>>       TFile::AsyncOpen(elem->GetFileName());
>>>>>>       toOpenList->Add(elem);
>>>>>>    }
>>>>>>
>>>>>>    TFile::EAsyncOpenStatus aos;
>>>>>>    TIter nextToOpen(toOpenList);
>>>>>>    while (toOpenList->GetSize() > 0) {
>>>>>>       while (TDSetElement *elem = dynamic_cast<TDSetElement*>(nextToOpen())) {
>>>>>>          aos = TFile::GetAsyncOpenStatus(elem->GetFileName());
>>>>>>          if (aos == TFile::kAOSSuccess || aos == TFile::kAOSNotAsync
>>>>>>              || aos == TFile::kAOSFailure) {
>>>>>>             elem->Lookup();
>>>>>>             toOpenList->Remove(elem);
>>>>>>          } else if (aos != TFile::kAOSInProgress) {
>>>>>>             Error("fileOpenTestTmp", "unknown aos");
>>>>>>          }
>>>>>>       }
>>>>>>       nextToOpen.Reset();
>>>>>>    }
>>>>>>    toOpenList->Delete();
>>>>>>
>>>>>>    TTime endtime = gSystem->Now();
>>>>>>    Float_t time_holder = Long_t(endtime - starttime) / Float_t(1000);
>>>>>>    cout << "Opening time was " << time_holder << " seconds" << endl;
>>>>>>
>>>>>> The result is:
>>>>>>
>>>>>>    #files   asynchronous   standard TFile::Open
>>>>>>       300       12.5              11.7
>>>>>>       240        9.68              9.4
>>>>>>       120        4.5               4.6
>>>>>>
>>>>>> Have a nice weekend!
>>>>>> Jan
>>>>>>
>>>>>> Jan Iwaszkiewicz wrote:
>>>>>>> Hi Fabrizio, hi Andy!
>>>>>>>
>>>>>>> Thank you for the answers.
>>>>>>> I'm making tests with TFile::AsyncOpen and will keep you
>>>>>>> informed. Maybe I should clarify that we want to look up the
>>>>>>> locations of the files on the PROOF master node, but then open
>>>>>>> the files on the worker nodes. The point of the lookup is to
>>>>>>> determine which files each worker will open/process. As for the
>>>>>>> problems that Andy described:
>>>>>>> 1) I agree. 2) It seems to be even more important to parallelize it.
>>>>>>>
>>>>>>> In fact, the possibility to get all locations of a file is also
>>>>>>> high on our wish-list. It would prevent us from opening a remote
>>>>>>> file while another copy is on one of our workers; we have no
>>>>>>> mechanism to avoid that. I think it's quite a different use case
>>>>>>> than file serving: we want to make the best use of the set of
>>>>>>> nodes belonging to a PROOF session. It would be very useful to
>>>>>>> have this functionality!
>>>>>>> Cheers,
>>>>>>> Jan
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Andrew Hanushevsky [mailto:[log in to unmask]]
>>>>>>> Sent: Wed 8/16/2006 10:47 PM
>>>>>>> To: Fabrizio Furano; Jan Iwaszkiewicz
>>>>>>> Cc: [log in to unmask]; [log in to unmask]; Gerardo Ganis
>>>>>>> Subject: Re: Querying locations of a vector of files
>>>>>>>
>>>>>>> Hi Jan,
>>>>>>>
>>>>>>> Another way to speed up the processing is to use the Prepare
>>>>>>> method, which allows you to set in motion all the steps needed to
>>>>>>> get file location information.
>>>>>>> As far as finding out the location of a list of files, that may
>>>>>>> be doable but has problems of its own. In your case it probably
>>>>>>> doesn't matter, but in the general case two things may happen:
>>>>>>> 1) the location may be incorrect by the time you get the
>>>>>>> information (i.e., the file has been moved or deleted), and
>>>>>>> 2) there is no particular location for files that don't exist yet
>>>>>>> (this includes files that may be in an MSS but not yet on disk).
>>>>>>> The latter is more problematic, as it takes a while to determine
>>>>>>> that. Anyway, we'll look into a mechanism to get you file
>>>>>>> location information (one of n for each file) using a list.
>>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>> From: "Fabrizio Furano" <[log in to unmask]>
>>>>>>> To: "Jan Iwaszkiewicz" <[log in to unmask]>
>>>>>>> Cc: <[log in to unmask]>; "Maarten Ballintijn" <[log in to unmask]>; "Gerri Ganis" <[log in to unmask]>
>>>>>>> Sent: Wednesday, August 16, 2006 10:09 AM
>>>>>>> Subject: Re: Querying locations of a vector of files
>>>>>>>
>>>>>>>> Hi Jan,
>>>>>>>>
>>>>>>>> at the moment such a primitive is not part of the protocol. The
>>>>>>>> simplest way of doing it is to call Stat for each file, but this
>>>>>>>> reduces the per-file overhead only by a small amount with
>>>>>>>> respect to an Open call.
>>>>>>>> In fact, both primitives actually drive the client to the final
>>>>>>>> endpoint (the file), so you cannot avoid the overhead (mainly
>>>>>>>> communication latencies) of being redirected to other servers.
>>>>>>>>
>>>>>>>> Since you say it's critical for you, my suggestion is to open
>>>>>>>> as many files as you can in parallel. Doing so, all the
>>>>>>>> latencies are overlapped, and you can expect much higher
>>>>>>>> performance.
>>>>>>>> To do this, just call TFile::AsyncOpen(fname) for each file you
>>>>>>>> need to open (one cycle), and then, later, you can call the
>>>>>>>> regular TFile::Open (another cycle).
>>>>>>>> The async call is non-blocking and very fast. You can find an
>>>>>>>> example of its ROOT-based usage here:
>>>>>>>>
>>>>>>>> http://root.cern.ch/root/Version512.news.html
>>>>>>>>
>>>>>>>> The ugly thing is that by doing this you are using a lot of
>>>>>>>> resources, so if you really have a lot of files to open (say,
>>>>>>>> 5000) and the resources are a problem, maybe you can find a
>>>>>>>> workaround by opening them in bunches of fixed size.
>>>>>>>>
>>>>>>>> Fabrizio
>>>>>>>>
>>>>>>>> Jan Iwaszkiewicz wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> In PROOF we realized that we need the possibility to query the
>>>>>>>>> exact locations of a set of files. As far as I have seen in the
>>>>>>>>> xrootd protocol, there is no way to ask for the locations of a
>>>>>>>>> vector of files.
>>>>>>>>>
>>>>>>>>> At the beginning of a query, we want to check the exact
>>>>>>>>> locations of all the files from a data set. The current
>>>>>>>>> implementation does it by opening all the files, one by one.
>>>>>>>>> The speed is about 30 files/sec. For many queries, the lookup
>>>>>>>>> takes much longer than the processing.
>>>>>>>>> It is a critical problem for us.
>>>>>>>>>
>>>>>>>>> The bool XrdClientAdmin::SysStatX(const char *paths_list,
>>>>>>>>> kXR_char *binInfo) method can check multiple files, but it only
>>>>>>>>> verifies whether the files exist.
>>>>>>>>> I imagine that it would be best for us to have something
>>>>>>>>> similar but returning file locations. Is such an extension to
>>>>>>>>> the protocol possible/reasonable to implement?
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Jan
>>>>
>>>> --
>>>> Org: CERN, European Laboratory for Particle Physics.
>>>> Mail: 1211 Geneve 23, Switzerland
>>>> E-Mail: [log in to unmask]   Phone: +41 22 7679248
>>>> WWW: http://fons.rademakers.org   Fax: +41 22 7669640
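[Editor's note] The "open in bunches of fixed size" pattern Fabrizio suggests above can be sketched outside ROOT as well. The following is a generic C++ illustration, not ROOT code: `open_file` is a hypothetical stand-in for a blocking open that incurs network latency, and `open_in_bunches` shows how issuing a fixed number of opens in parallel lets their latencies overlap instead of adding up.

```cpp
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a blocking open with network latency.
std::string open_file(const std::string& url) {
    std::this_thread::sleep_for(std::chrono::milliseconds(5));
    return "opened:" + url;
}

// Open `urls` in bunches of at most `bunch` parallel requests, so
// per-file latencies overlap while resource use stays bounded.
std::vector<std::string> open_in_bunches(const std::vector<std::string>& urls,
                                         std::size_t bunch) {
    std::vector<std::string> results;
    for (std::size_t i = 0; i < urls.size(); i += bunch) {
        // Launch one bunch of opens in parallel.
        std::vector<std::future<std::string>> inflight;
        for (std::size_t j = i; j < urls.size() && j < i + bunch; ++j)
            inflight.push_back(
                std::async(std::launch::async, open_file, urls[j]));
        // Wait for this bunch to finish before starting the next one.
        for (auto& f : inflight)
            results.push_back(f.get());
    }
    return results;
}
```

With N files, latency L per open, and bunch size B, total time is roughly (N/B)·L instead of N·L, which is the effect Fabrizio describes; in ROOT itself the launch step would be TFile::AsyncOpen and the collect step the regular TFile::Open.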