Omar,

How big is the original file, before reconstruction?

Thanks,
Stepan

On 3/6/13 9:03 PM, Omar Moreno wrote:
> Hello Everyone,
>
> Just to give everyone an idea, a micro DST with basic track information, hit information and Ecal cluster info is approx. 29 MB per 500,000 test run events. The reconstructed LCIO file used to generate the ROOT file was approx. 5.4 GB, and it took about 4 minutes. I expect the size to increase for data from an electron run, but it shouldn't be by much. I'll go ahead and study this using MC data and see how much bigger the file gets.
>
> --Omar Moreno
>
> On Wed, Mar 6, 2013 at 4:29 PM, Nelson, Timothy Knight <[log in to unmask]> wrote:
>
> Hi Stepan,
>
> I agree 100%. I think we want exactly what you proposed a year ago: a format with physics objects suitable for physics analysis (the proposed "micro-DST"). This kind of thing is relatively easy to provide and will be a very useful thing to have. In fact, the kind of "flat ntuple" format that Omar began with can, I believe, be read in and operated on with PAW, since the .rz format is the same. However, if he goes the next step, as has been recommended in the software group, and writes classes to the ROOT file that require a dictionary to read back, the data format will be ROOT only.
>
> A couple of points that are important to understand...
>
> 1) Homer brings up an important point, which is that the only way we have to write these ROOT files is to use the LCIO C++ API. That is to say, one does the Java reconstruction in lcsim that creates LCIO objects and writes out an LCIO file. Then one runs a separate C++ program that reads in the LCIO objects with the LCIO C++ API and outputs this Ntuple using ROOT classes. Therefore, no information that is currently not persisted in the LCIO EDM by our reconstruction will ever be available in the ROOT Ntuple.
> So, this business of writing out text files for vertexing and other information not currently being written to LCIO does not go away by creating ROOT Ntuples. The only way to eliminate that issue is to improve the completeness of our LCIO-based EDM. For example, Matt has been writing out vertexing information to text files and reading it back into ROOT. However, LCIO DOES include vertex objects, and if we created these during reconstruction, we would get that information in the LCIO file automatically, and it would then easily be accessible later on via LCIO. There are a few examples of data types we might want to persist that don't have an LCIO class, but LCIO includes a "Generic Object" class that can be used to encapsulate anything we might want to add. Again, only by getting the data we want in LCIO will it ever be accessible in ROOT. So, in my opinion, this is where we should be focusing our attention.
>
> 2) As far as how to do ROOT-based analysis, Homer again touched on the heart of the matter. One can create a ROOT Ntuple and perform analysis on that. In practice, this rarely means using ROOT on the command line, or even CINT macros, since ROOT's C interpreter is so badly broken that it is not really usable for anything other than making final plots from already-analyzed data. Instead, one usually runs some standalone compiled C++ that uses the ROOT libraries to do the analysis on a ROOT DST. For this reason, it is just as easy to have that compiled C++ use the LCIO C++ API to access the LCIO objects directly from the LCIO DST, and then use all of the familiar ROOT tools in that code to do the analysis, writing out whatever final histograms or post-analysis ntuples one might want into a ROOT file for later plotting. The only difference is that in the former scenario, one learns the ROOT EDM that we invent for the DST, and in the latter, one learns the LCIO EDM.
> To the extent that one is a mirror reflection of the other, one has to do just as much work writing the C++ analysis code either way. That is why it doesn't make any sense to duplicate the entire LCIO EDM in ROOT (one file for the price of two!) and why we should really only be considering creation of a new ROOT-based "micro-DST" format aimed at physics analysis that will be much slimmer than the LCIO. Those who need more than is in the "micro-DST" can very easily run their C++/ROOT analysis code accessing the data directly from LCIO using the LCIO C++ API.
>
> Cheers,
> Tim
>
> On Mar 6, 2013, at 3:49 PM, Stepan Stepanyan <[log in to unmask]> wrote:
>
> > Hello Homer and Jeremy,
> >
> > It seems we all have the right ideas, and very similar ones, about how data analysis should be done. The confusion, it seems to me, comes from the definitions of "analysis" and "DST". When I brought up the question of DSTs about a year ago, and even sent out a possible format (attached document), I basically wanted what Jeremy describes in the second sentence after (3): physics objects only. What Omar showed today was very different from what I would describe as a DST. I understand Matt's point that in some cases you will need fine details, but I am not sure everyone will need that level of detail. So I still think that if we are talking about DSTs, the format should be "physics objects only". And if Omar can make use of what I proposed a year ago, that will be great.
> >
> > As for general analysis, if we stick with (1), then we will make a large number of collaborators who are used to doing analysis in ROOT quite unhappy. I understand that duplicating processed data in many formats is also not a reasonable approach.
> > So, if (2) means (sorry for my ignorance) that we can have some kind of "portal" that connects the LCIO recon file to ROOT, then it is probably the best way to go.
> >
> > Again, sorry if I am misinterpreting the issue and/or repeating what was already clear from your emails.
> >
> > Regards, Stepan
> >
> > On 3/6/13 6:10 PM, McCormick, Jeremy I. wrote:
> >> Hi, Homer.
> >>
> >> Thanks for the thoughts.
> >>
> >> My view is that user analysis has three possible pathways which make sense to consider:
> >>
> >> 1) Pure Java analysis using lcsim and outputting histograms to AIDA files, viewable in JAS.
> >>
> >> 2) LCIO/ROOT analysis, reading in the LCIO recon files, looping over these events, and making histograms from a ROOT script.
> >>
> >> 3) Pure ROOT analysis, operating on a ROOT DST file.
> >>
> >> I don't really think that we need a DST containing all of the information which is already present in the final LCIO recon file. This level of duplication is not desirable. Rather, the ROOT DST should contain physics objects only, e.g. the equivalent of LCIO ReconstructedParticles, Tracks, and Clusters, along with event information. This should be sufficient for doing a pure physics analysis, i.e. good enough for most users. It is also likely that it could be represented using simple arrays rather than classes, which to me is desirable for this kind of format.
> >>
> >> If one wants to look at the associated hits of the tracks, or something similarly detailed, then it seems to me that it would be better to use the #1 and #2 approaches, as we can then avoid "reinventing the wheel" by making ROOT files that mimic the structure of the existing LCIO output. This approach would require working from the LCIO output, but I really don't see a problem there. It is not onerous at all. The API is straightforward and well-documented, and examples can be provided.
> >> There is already a simple analysis script in my examples that you linked which plots information from Tracks in an LCIO file using ROOT histogramming. Similar plots could easily be made for the hits, etc.
> >>
> >> I suppose one could demand that all this data be put into ROOT, including the hits, but you're left with the same problem. Someone still has to learn the API of whatever classes are used to store the data, and the class headers also need to be loaded to interpret the data. Whether that format is LCIO or ROOT, it is essentially the same level of knowledge that would be required. My feeling is actually that this will be more difficult/cumbersome to work with in ROOT than in LCIO. I wonder why we can't just go with what we already have, i.e. the LCIO API, rather than invent something analogous which does not seem to serve a very clear purpose. One can already use what's there in the linked example to look at the full events, so can we start there and see how far we get?
> >>
> >> If someone has a clear use case where pure ROOT data is needed at the lowest level of detail, I would consider this request, but I have seen nothing concrete so far along these lines.
> >>
> >> --Jeremy
> >>
> >> -----Original Message-----
> >> From: Homer [mailto:[log in to unmask]]
> >> Sent: Wednesday, March 06, 2013 2:51 PM
> >> To: Jaros, John A.; Graham, Mathew Thomas; McCormick, Jeremy I.; Graf, Norman A.; Moreno, Omar; Nelson, Timothy Knight
> >> Subject: DSTs and work on slcio files using C++
> >>
> >> Hi,
> >>
> >> I decided not to comment during the meeting because it might have created more contention, and I also wanted to hear Jeremy's, Norman's and Omar's responses first before throwing this out there. That said, from the point of view of someone who has been doing lcsim SiD analysis on slcio files, I find the problems with using the two formats in HPS a little strange.
> >> For SiD we take slcio files and then run jet clustering and flavor tagging using C++ code in the lcfi and lcfi+ packages. For the flavor tagging we write out ROOT files for lcfi+ running the TMVA training, and then for both the jet clustering and the flavor tagging we write out slcio files. I believe Malachi has done his whole analysis in C++ as a Marlin processor. I had also successfully tested reading slcio files in ROOT using a recipe provided by Jeremy. I dropped using it when I realized that it was quite simple to write the analysis in Java. Perhaps one solution is to stick to doing all development, even for the DST, in Java/lcsim and to just provide examples of how to access the data from C++/ROOT reading slcio files. Jeremy had documented much of this long ago at:
> >>
> >> https://confluence.slac.stanford.edu/display/hpsg/Loading+LCIO+Files+into+ROOT
> >>
> >> If we just provide some examples, wouldn't that help to at least put out the current fires? This would also avoid having to support numerous extra sets of data (DSTs and microDSTs in both formats with multiple passes and subsets).
> >> Maybe I'm wrong, but I think one can provide simple recipes or modules for accessing any of the slcio file contents in ROOT.
> >>
> >> Homer
> >>
> >> ########################################################################
> >> Use REPLY-ALL to reply to list
> >>
> >> To unsubscribe from the HPS-SOFTWARE list, click the following link:
> >> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=HPS-SOFTWARE&A=1
> >
> > <dst.pdf>
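
For reference, the file sizes quoted in Omar's message can be turned into per-event numbers. The following is only a rough sketch, assuming decimal units (1 MB = 10^6 bytes) and reading the quoted sizes as megabytes and gigabytes; the function and variable names are illustrative and not part of any HPS, LCIO or ROOT software.

```python
# Per-event sizes implied by the numbers quoted in the thread.
# Assumptions (not stated in the thread): decimal units, sizes in bytes.

MB = 10**6  # decimal megabyte


def bytes_per_event(total_bytes, n_events):
    """Average on-disk size of one event, in bytes."""
    return total_bytes / n_events


n_events = 500_000              # test run events quoted in the thread
micro_dst_bytes = 29 * MB       # micro DST (ROOT) size quoted in the thread
lcio_recon_bytes = 5400 * MB    # reconstructed LCIO file, 5.4 GB as quoted

micro_dst_per_event = bytes_per_event(micro_dst_bytes, n_events)
lcio_per_event = bytes_per_event(lcio_recon_bytes, n_events)
reduction_factor = lcio_recon_bytes / micro_dst_bytes

print(f"micro DST:  {micro_dst_per_event:.0f} bytes/event")
print(f"LCIO recon: {lcio_per_event / 1000:.1f} kB/event")
print(f"reduction:  about {reduction_factor:.0f}x")
```

Under these assumptions the micro DST comes to roughly 58 bytes per event, against about 10.8 kB per event for the reconstructed LCIO file, i.e. a reduction of about 186x. The size of the original file before reconstruction is not given in the thread, so it is left out here.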