Hi Andy,

that would be no problem assuming we can easily query the olb admin interface. How would this be done, via a popen/pclose or is there an API?

Cheers, Fons.

Andrew Hanushevsky wrote:
> Hi Fons,
>
> It would probably be relatively easy to do if the query was entered via
> the OLB admin interface. It's more difficult to do via an xroot protocol
> query request. Would that satisfy you?
>
> Andy
>
> ----- Original Message -----
> From: "Fons Rademakers" <[log in to unmask]>
> To: "Fabrizio Furano" <[log in to unmask]>
> Cc: "Jan Iwaszkiewicz" <[log in to unmask]>; <[log in to unmask]>; <[log in to unmask]>; "Gerri Ganis" <[log in to unmask]>
> Sent: Saturday, August 19, 2006 3:02 PM
> Subject: Re: Querying locations of a vector of files
>
>> Hi Andy, Fabrizio,
>>
>> what we would really, urgently like to have is an xrootd command that
>> takes as input a vector of generic xrootd urls and returns a vector
>> with resolved urls (including multiple urls in case the same file
>> exists on more than one leaf node). Of course the first time this will
>> take some time, since the head node will have to ask the leaf nodes,
>> but from then on this info lives in the xrootd head node cache, so it
>> should be very quick. We need the final locations in PROOF to submit
>> work packets with priority to the nodes that have the data locally.
>>
>> Can you tell me if this feature is possible and if we can get it soon?
>>
>> Cheers, Fons.
>>
>>
>> Fabrizio Furano wrote:
>>> Hi Jan,
>>>
>>> I see. IMHO this means that there is very little overhead you can
>>> overlap, at least on the client side, or that you are opening all
>>> those files towards very few servers, or even the same one. I hope not.
>>>
>>> Anyway, the async open was not meant as a way to speed up the open
>>> primitive, but as a way to do other things while the open is in
>>> progress, or to stage many files in parallel without serializing the
>>> waits. But in your situation it seems there are not so many waits to
>>> parallelize.
>>>
>>> Fabrizio
>>>
>>>
>>> Jan Iwaszkiewicz wrote:
>>>> Hi!
>>>>
>>>> I have done some tests as Fabrizio advised. The results of the tests
>>>> with asynchronous open are similar to those with standard open.
>>>>
>>>> I used the following code:
>>>>
>>>>    // first cycle: trigger an asynchronous open for every element of the data set
>>>>    TTime starttime = gSystem->Now();
>>>>    TList *toOpenList = new TList();
>>>>    toOpenList->SetOwner(kFALSE);
>>>>    TIter nextElem(fDset->GetListOfElements());
>>>>    while (TDSetElement *elem = dynamic_cast<TDSetElement*>(nextElem())) {
>>>>       TFile::AsyncOpen(elem->GetFileName());
>>>>       toOpenList->Add(elem);
>>>>    }
>>>>
>>>>    // second cycle: poll until every open has reached a terminal state
>>>>    TFile::EAsyncOpenStatus aos;
>>>>    TIter nextToOpen(toOpenList);
>>>>    while (toOpenList->GetSize() > 0) {
>>>>       while (TDSetElement *elem = dynamic_cast<TDSetElement*>(nextToOpen())) {
>>>>          aos = TFile::GetAsyncOpenStatus(elem->GetFileName());
>>>>          if (aos == TFile::kAOSSuccess || aos == TFile::kAOSNotAsync ||
>>>>              aos == TFile::kAOSFailure) {
>>>>             elem->Lookup();
>>>>             toOpenList->Remove(elem);
>>>>          } else if (aos != TFile::kAOSInProgress) {
>>>>             Error("fileOpenTestTmp", "unknown aos");
>>>>          }
>>>>       }
>>>>       nextToOpen.Reset();
>>>>    }
>>>>    delete toOpenList;   // the list is empty at this point; free the container itself
>>>>
>>>>    TTime endtime = gSystem->Now();
>>>>    Float_t time_holder = Long_t(endtime - starttime) / Float_t(1000);
>>>>    cout << "Opening time was " << time_holder << " seconds" << endl;
>>>>
>>>> The result is:
>>>>
>>>>    #files   asynchronous [s]   standard TFile::Open [s]
>>>>    300      12.5               11.7
>>>>    240       9.68               9.4
>>>>    120       4.5                4.6
>>>>
>>>> Have a nice weekend!
>>>> Jan
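Not part of the original exchange, just for reference: a minimal sketch of the "bunches of fixed size" variant that Fabrizio suggests further down in the thread, written against the same TDSet/TFile calls used in Jan's snippet above. It is meant to slot into the same place as that snippet (same fDset member); the bunch size and the 10 ms polling pause are arbitrary guesses and should be tuned.

   // Sketch only: keep at most kBunchSize async opens in flight at any time,
   // instead of firing them all off at once as in the test above.
   const Int_t kBunchSize = 50;

   TIter nextElem(fDset->GetListOfElements());
   TList pending;                 // elements whose open is still in flight
   pending.SetOwner(kFALSE);

   TDSetElement *elem = 0;
   Bool_t more = kTRUE;
   while (more || pending.GetSize() > 0) {
      // top the bunch up to kBunchSize outstanding opens
      while (more && pending.GetSize() < kBunchSize) {
         if ((elem = dynamic_cast<TDSetElement*>(nextElem()))) {
            TFile::AsyncOpen(elem->GetFileName());
            pending.Add(elem);
         } else {
            more = kFALSE;
         }
      }
      // harvest everything that has reached a terminal state
      TList done;
      done.SetOwner(kFALSE);
      TIter nextPending(&pending);
      while ((elem = dynamic_cast<TDSetElement*>(nextPending()))) {
         if (TFile::GetAsyncOpenStatus(elem->GetFileName()) != TFile::kAOSInProgress) {
            elem->Lookup();        // resolve the endpoint, as in the test above
            done.Add(elem);
         }
      }
      TIter nextDone(&done);
      while ((elem = dynamic_cast<TDSetElement*>(nextDone())))
         pending.Remove(elem);
      // yield briefly between polling passes instead of busy-waiting
      gSystem->Sleep(10);
   }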
>>>> Jan Iwaszkiewicz wrote:
>>>>> Hi Fabrizio, Hi Andy!
>>>>>
>>>>> Thank you for the answers. I'm making tests with TFile::AsyncOpen and
>>>>> will keep you informed.
>>>>>
>>>>> Maybe I should clarify that we want to look up the locations of the
>>>>> files on the PROOF master node but then open the files on the worker
>>>>> nodes. The point of the lookup is to determine which files each worker
>>>>> will open/process. For the problems that Andy described: 1) I agree;
>>>>> 2) it seems to be even more important to parallelize it.
>>>>>
>>>>> In fact the possibility to get all the locations of a file is also high
>>>>> on our wish-list. It would prevent us from opening a remote file while
>>>>> another copy sits on one of our workers; at the moment we have no
>>>>> mechanism to avoid that. I think it is quite a different use case from
>>>>> file serving: we want to make the best use of the set of nodes belonging
>>>>> to a PROOF session. It would be very useful to have this functionality!
>>>>>
>>>>> Cheers,
>>>>> Jan
>>>>>
>>>>> -----Original Message-----
>>>>> From: Andrew Hanushevsky [mailto:[log in to unmask]]
>>>>> Sent: Wed 8/16/2006 10:47 PM
>>>>> To: Fabrizio Furano; Jan Iwaszkiewicz
>>>>> Cc: [log in to unmask]; [log in to unmask]; Gerardo Ganis
>>>>> Subject: Re: Querying locations of a vector of files
>>>>>
>>>>> Hi Jan,
>>>>>
>>>>> Another way to speed up the processing is to use the Prepare method,
>>>>> which allows you to set in motion all the steps needed to get file
>>>>> location information. As for finding out the locations of a list of
>>>>> files, that may be doable but has problems of its own. In your case it
>>>>> probably doesn't matter, but in the general case two things may happen:
>>>>> 1) the location may be incorrect by the time you get the information
>>>>> (i.e., the file has been moved or deleted), and 2) there is no
>>>>> particular location for files that don't exist yet (this includes files
>>>>> that may be in an MSS but not yet on disk). The latter is more
>>>>> problematic, as it takes a while to determine. Anyway, we'll look into a
>>>>> mechanism to get you file location information (one of n for each file)
>>>>> using a list.
>>>>>
>>>>> Andy
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Fabrizio Furano" <[log in to unmask]>
>>>>> To: "Jan Iwaszkiewicz" <[log in to unmask]>
>>>>> Cc: <[log in to unmask]>; "Maarten Ballintijn" <[log in to unmask]>; "Gerri Ganis" <[log in to unmask]>
>>>>> Sent: Wednesday, August 16, 2006 10:09 AM
>>>>> Subject: Re: Querying locations of a vector of files
>>>>>
>>>>>> Hi Jan,
>>>>>>
>>>>>> at the moment such a primitive is not part of the protocol. The
>>>>>> simplest way of doing it is to call Stat for each file, but this
>>>>>> reduces the per-file overhead only by a small amount with respect to an
>>>>>> Open call. In fact, both primitives actually drive the client to the
>>>>>> final endpoint (the file), so you cannot avoid the overhead (mainly
>>>>>> communication latencies) of being redirected to other servers.
>>>>>>
>>>>>> Since you say it's critical for you, my suggestion is to open as many
>>>>>> files as you can in parallel. Doing so, all the latencies are
>>>>>> parallelized, and you can expect much higher performance.
>>>>>>
>>>>>> To do this, just call TFile::AsyncOpen(fname) for each file you need to
>>>>>> open (one cycle), and then, later, call the regular TFile::Open
>>>>>> (another cycle). The async call is non-blocking and very fast. You can
>>>>>> find an example of its ROOT-based usage here:
>>>>>>
>>>>>> http://root.cern.ch/root/Version512.news.html
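As a concrete illustration of the two cycles Fabrizio describes, a minimal sketch could look like the following. Nothing here comes from the original mail beyond the TFile calls themselves; the function name and the vector of URLs are invented for the example.

   #include <string>
   #include <vector>
   #include "TError.h"
   #include "TFile.h"

   // Sketch of the "AsyncOpen first, Open later" pattern; urls would hold
   // the generic xrootd URLs of the files to be processed.
   void OpenAllInParallel(const std::vector<std::string> &urls)
   {
      // first cycle: fire off all the opens; AsyncOpen() returns immediately
      for (size_t i = 0; i < urls.size(); ++i)
         TFile::AsyncOpen(urls[i].c_str());

      // second cycle: the regular Open() picks up the connection that the
      // asynchronous request already set in motion
      for (size_t i = 0; i < urls.size(); ++i) {
         TFile *f = TFile::Open(urls[i].c_str());
         if (!f || f->IsZombie()) {
            ::Error("OpenAllInParallel", "could not open %s", urls[i].c_str());
            delete f;              // f may be 0 here; deleting 0 is a no-op
            continue;
         }
         // ... use the file, then close it ...
         f->Close();
         delete f;
      }
   }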
>>>>>> The ugly thing is that doing this you use a lot of resources, so if you
>>>>>> really have a lot of files to open (let's say 5000) and the resources
>>>>>> are a problem, you can work around it by opening them in bunches of
>>>>>> fixed size.
>>>>>>
>>>>>> Fabrizio
>>>>>>
>>>>>> Jan Iwaszkiewicz wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> In PROOF we have realized that we need a way to query the exact
>>>>>>> locations of a set of files. As far as I have seen, the xrootd
>>>>>>> protocol has no way to ask for the locations of a vector of files.
>>>>>>>
>>>>>>> At the beginning of a query we want to check the exact locations of
>>>>>>> all the files from a data set. The current implementation does this by
>>>>>>> opening all the files, one by one, at a speed of about 30 files/sec.
>>>>>>> For many queries the lookup therefore takes much longer than the
>>>>>>> processing itself; it is a critical problem for us.
>>>>>>>
>>>>>>> The bool XrdClientAdmin::SysStatX(const char *paths_list, kXR_char
>>>>>>> *binInfo) method can check multiple files, but it only verifies
>>>>>>> whether the files exist. I imagine it would be best for us to have
>>>>>>> something similar that returns file locations. Is such an extension to
>>>>>>> the protocol possible/reasonable to implement?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Jan
>>
>> --
>> Org: CERN, European Laboratory for Particle Physics.
>> Mail: 1211 Geneve 23, Switzerland
>> E-Mail: [log in to unmask]   Phone: +41 22 7679248
>> WWW: http://fons.rademakers.org   Fax: +41 22 7669640

--
Org: CERN, European Laboratory for Particle Physics.
Mail: 1211 Geneve 23, Switzerland
E-Mail: [log in to unmask]   Phone: +41 22 7679248
WWW: http://fons.rademakers.org   Fax: +41 22 7669640
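For completeness, a rough sketch of how the existing bulk check Jan refers to might be driven with XrdClientAdmin, under the assumption (to be verified against the client headers) that SysStatX accepts a newline-separated path list and fills one flag byte per path. The redirector host and the file paths are made up.

   #include <cstdio>
   #include <string>
   #include <vector>
   #include "XrdClient/XrdClientAdmin.hh"

   int main()
   {
      // made-up example paths; in PROOF these would come from the data set
      std::vector<std::string> paths;
      paths.push_back("/store/run1/file1.root");
      paths.push_back("/store/run1/file2.root");

      // one admin connection to the redirector (hypothetical host and port)
      XrdClientAdmin adm("root://headnode.example.org:1094");
      if (!adm.Connect())
         return 1;

      // assumption: the path list is newline-separated, one flag byte per path
      std::string list;
      for (size_t i = 0; i < paths.size(); ++i) {
         if (i) list += "\n";
         list += paths[i];
      }

      std::vector<kXR_char> info(paths.size(), 0);
      if (adm.SysStatX(list.c_str(), &info[0])) {
         // the meaning of the individual flag bits is defined in XProtocol.hh
         for (size_t i = 0; i < paths.size(); ++i)
            printf("%s -> flags 0x%02x\n", paths[i].c_str(), (unsigned int)info[i]);
      }
      return 0;
   }

A location-returning analogue of this call, together with the vectored lookup discussed at the top of the thread, is exactly what is being asked for here.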