Stepan,

The original EVIO file is 1.5 GB, but I only ran reconstruction on half of it. The final reconstructed LCIO file also stores a lot of extra information, such as FPGA data, that should be removed, so the file size is somewhat inflated. I expect that once we filter out junk events and drop some unnecessary collections, the file size will decrease significantly.

--Omar Moreno

On Wed, Mar 6, 2013 at 7:05 PM, Stepan Stepanyan <[log in to unmask]> wrote:

> Omar,
>
> How big is the original file, before reconstruction?
>
> Thanks, Stepan
>
> On 3/6/13 9:03 PM, Omar Moreno wrote:
>
> > Hello Everyone,
> >
> > Just to give everyone an idea, a micro-DST with basic track information, hit information and Ecal cluster info is approx. 29 MB per 500,000 test run events. The reconstructed LCIO file used to generate the ROOT file was approx. 5.4 GB, and generating the ROOT file took about 4 minutes. I expect the size to increase for data from an electron run, but not by much. I'll go ahead and study this using MC data and see how much bigger the file gets.
> >
> > --Omar Moreno
> >
> > On Wed, Mar 6, 2013 at 4:29 PM, Nelson, Timothy Knight <[log in to unmask]> wrote:
>
>> Hi Stepan,
>>
>> I agree 100%. I think we want exactly what you proposed a year ago: a format with physics objects suitable for physics analysis (the proposed "micro-DST"). This kind of thing is relatively easy to provide and will be very useful to have. In fact, the kind of "flat ntuple" format that Omar began with can, I believe, be read in and operated on with PAW, since the .rz format is the same. However, if he takes the next step, as has been recommended in the software group, and writes classes to the ROOT file that require a dictionary to read back, the data format will be ROOT-only.
>>
>> A couple of points that are important to understand...
>>
>> 1) Homer brings up an important point: the only way we have to write these ROOT files is to use the LCIO C++ API. That is to say, one does the Java reconstruction in lcsim, which creates LCIO objects and writes out an LCIO file. Then one runs a separate C++ program that reads in the LCIO objects with the LCIO C++ API and outputs this ntuple using ROOT classes. Therefore, no information that our reconstruction does not currently persist in the LCIO EDM will ever be available in the ROOT ntuple. So the business of writing out text files for vertexing and other information not currently written to LCIO does not go away by creating ROOT ntuples. The only way to eliminate that issue is to improve the completeness of our LCIO-based EDM. For example, Matt has been writing out vertexing information to text files and reading it back into ROOT. However, LCIO DOES include vertex objects, and if we created these during reconstruction, we would get that information in the LCIO file automatically, and it would then be easily accessible later on via LCIO. There are a few data types we might want to persist that don't have an LCIO class, but LCIO includes a "Generic Object" class that can be used to encapsulate anything we might want to add. Again, only by getting the data we want into LCIO will it ever be accessible in ROOT. So, in my opinion, this is where we should be focusing our attention.
>>
>> 2) As for how to do ROOT-based analysis, Homer again touched on the heart of the matter. One can create a ROOT ntuple and perform analysis on that. In practice, this rarely means using ROOT on the command line, or even CINT macros, since ROOT's C/C++ interpreter is so badly broken that it is not really usable for anything other than making final plots from already-analyzed data.
>> In practice, one usually runs some standalone compiled C++ that uses the ROOT libraries to do the analysis on a ROOT DST. For this reason, it is just as easy to have that compiled C++ use the LCIO C++ API to access the LCIO objects directly from the LCIO DST, and then use all of the familiar ROOT tools in that code to do the analysis, writing out whatever final histograms or post-analysis ntuples one might want into a ROOT file for later plotting. The only difference is that in the former scenario one learns the ROOT EDM that we invent for the DST, and in the latter one learns the LCIO EDM. To the extent that one is a mirror reflection of the other, one has to do just as much work writing the C++ analysis code either way. That is why it doesn't make any sense to duplicate the entire LCIO EDM in ROOT (one file for the price of two!), and why we should really only be considering the creation of a new ROOT-based "micro-DST" format aimed at physics analysis that will be much slimmer than the LCIO. Those who need more than is in the "micro-DST" can very easily run their C++/ROOT analysis code accessing the data directly from LCIO using the LCIO C++ API.
>>
>> Cheers,
>> Tim
>>
>> On Mar 6, 2013, at 3:49 PM, Stepan Stepanyan <[log in to unmask]> wrote:
>>
>> > Hello Homer and Jeremy,
>> >
>> > It seems we all have the right ideas, and very similar ideas, on how analysis of the data should be done. The confusion, it seems to me, comes from the definitions of "analysis" and "DST". When I brought up the question of DSTs about a year ago, and even sent out a possible format (attached document), I basically wanted what Jeremy said in the second sentence after (3): physics objects only. What Omar showed today was very different from what I would describe as DSTs.
>> > I understand Matt's point that in some cases you will need fine details, but I am not sure everyone will need that level of detail. So I still think that if we are talking about DSTs, the format should be "physics objects only". And if Omar can make use of what I proposed a year ago, that would be great.
>> >
>> > As for general analysis, if we stick with (1), then we will make a large number of collaborators who are used to doing analysis in ROOT quite unhappy. I understand that duplicating processed data in many formats is also not a reasonable approach. So, if (2) means (sorry for my ignorance) that we can have some kind of "portal" that connects the LCIO recon file to ROOT, then it is probably the best way to go.
>> >
>> > Again, sorry if I am misinterpreting the issue and/or repeating what was already clear from your emails.
>> >
>> > Regards, Stepan
>> >
>> > On 3/6/13 6:10 PM, McCormick, Jeremy I. wrote:
>> >> Hi, Homer.
>> >>
>> >> Thanks for the thoughts.
>> >>
>> >> My view is that user analysis has three possible pathways which make sense to consider:
>> >>
>> >> 1) Pure Java analysis using lcsim and outputting histograms to AIDA files, viewable in JAS.
>> >>
>> >> 2) LCIO/ROOT analysis, reading in the LCIO recon files, looping over these events, and making histograms from a ROOT script.
>> >>
>> >> 3) Pure ROOT analysis, operating on a ROOT DST file.
>> >>
>> >> I don't really think that we need a DST containing all of the information which is already present in the final LCIO recon file. This level of duplication is not desirable. Rather, the ROOT DST should contain physics objects only, e.g. the equivalent of LCIO ReconstructedParticles, Tracks, and Clusters, along with event information. This should be sufficient for doing a pure physics analysis, i.e. good enough for most users.
>> >> It is also likely that it could be represented using simple arrays rather than classes, which to me is desirable for this kind of format.
>> >>
>> >> If one wants to look at the associated hits of the tracks, or something similarly detailed, then it seems to me that it would be better to use approaches #1 and #2, as we can then avoid "reinventing the wheel", i.e. making ROOT files that mimic the structure of the existing LCIO output. This approach would require working from the LCIO output, but I really don't see a problem there. It is not onerous at all. The API is straightforward and well documented, and examples can be provided. There is already a simple analysis script in my examples that you linked which plots information from Tracks in an LCIO file using ROOT histogramming. Similar plots could easily be made for the hits, etc.
>> >>
>> >> I suppose one could demand that all of this data, including the hits, be put into ROOT, but then you're left with the same problem. Someone still has to learn the API of whatever classes are used to store the data, and the class headers also need to be loaded to interpret the data. Whether that format is LCIO or ROOT, essentially the same level of knowledge is required. My feeling is actually that this will be more difficult and cumbersome to work with in ROOT than in LCIO. I wonder why we can't just go with what we already have, i.e. the LCIO API, rather than invent something analogous which does not seem to serve a very clear purpose. One can already use what's in the linked example to look at the full events, so can we start there and see how far we get?
>> >>
>> >> If someone has a clear use case where pure ROOT data is needed at the lowest level of detail, I would consider this request, but I have seen nothing concrete so far along these lines.
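Jeremy's proposal, echoed by Stepan and Tim, is a DST holding only physics objects (tracks, clusters, event information) stored as simple arrays rather than dictionary-backed classes. The C++ sketch below shows one way such a flat layout could look. Everything in it is hypothetical: `MicroDstEvent`, `fillEvent`, `payloadBytes`, the field names and the per-event capacities are illustrative only and are not the format in Stepan's attached dst.pdf. In ROOT, parallel arrays like these can be stored in a TTree as variable-length leaves with a counter branch (a leaf list of the form `trkPx[nTracks]/F`), so reading them back needs no custom dictionary.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-event capacities; a real format would choose these from data.
constexpr std::size_t kMaxTracks = 32;
constexpr std::size_t kMaxClusters = 32;

// Hypothetical stand-ins for reconstructed objects. In practice these
// quantities would be copied from the reconstruction output (e.g. LCIO
// tracks and clusters); the choice of fields here is illustrative only.
struct TrackIn { float px, py, pz; std::int32_t charge; };
struct ClusterIn { float energy, x, y; };

// One micro-DST event: physics objects only, stored as parallel
// fixed-capacity arrays plus counters, with no custom classes to read back.
struct MicroDstEvent {
    std::int32_t run = 0;
    std::int32_t event = 0;

    std::int32_t nTracks = 0;
    std::array<float, kMaxTracks> trkPx{};
    std::array<float, kMaxTracks> trkPy{};
    std::array<float, kMaxTracks> trkPz{};
    std::array<std::int32_t, kMaxTracks> trkCharge{};

    std::int32_t nClusters = 0;
    std::array<float, kMaxClusters> cluEnergy{};
    std::array<float, kMaxClusters> cluX{};
    std::array<float, kMaxClusters> cluY{};
};

// Copies physics-level quantities into the flat arrays. Returns false
// (leaving `out` untouched) if the event exceeds the fixed capacities.
bool fillEvent(MicroDstEvent& out, std::int32_t run, std::int32_t event,
               const std::vector<TrackIn>& tracks,
               const std::vector<ClusterIn>& clusters) {
    if (tracks.size() > kMaxTracks || clusters.size() > kMaxClusters) {
        return false;
    }
    out = MicroDstEvent{};
    out.run = run;
    out.event = event;

    out.nTracks = static_cast<std::int32_t>(tracks.size());
    for (std::size_t i = 0; i < tracks.size(); ++i) {
        out.trkPx[i] = tracks[i].px;
        out.trkPy[i] = tracks[i].py;
        out.trkPz[i] = tracks[i].pz;
        out.trkCharge[i] = tracks[i].charge;
    }

    out.nClusters = static_cast<std::int32_t>(clusters.size());
    for (std::size_t i = 0; i < clusters.size(); ++i) {
        out.cluEnergy[i] = clusters[i].energy;
        out.cluX[i] = clusters[i].x;
        out.cluY[i] = clusters[i].y;
    }
    return true;
}

// Bytes needed to store one event if only the first nTracks / nClusters
// entries of each array are written (as with variable-length leaves),
// ignoring file-level overhead and compression.
std::size_t payloadBytes(const MicroDstEvent& e) {
    assert(e.nTracks >= 0 && static_cast<std::size_t>(e.nTracks) <= kMaxTracks);
    assert(e.nClusters >= 0 && static_cast<std::size_t>(e.nClusters) <= kMaxClusters);
    const std::size_t header = 4 * sizeof(std::int32_t);  // run, event, nTracks, nClusters
    const std::size_t perTrack = 3 * sizeof(float) + sizeof(std::int32_t);
    const std::size_t perCluster = 3 * sizeof(float);
    return header + static_cast<std::size_t>(e.nTracks) * perTrack +
           static_cast<std::size_t>(e.nClusters) * perCluster;
}
```

For an event with two tracks and one cluster, `payloadBytes` returns 60 (16 bytes of run/event/counter fields, 16 per track, 12 per cluster, with 4-byte floats). The fixed capacities keep each array contiguous so it can be handed to a branch by address; a real format would have to pick limits from data and decide how to treat overflow events, which this sketch simply rejects.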
>> >>
>> >> --Jeremy
>> >>
>> >> -----Original Message-----
>> >> From: Homer [mailto:[log in to unmask]]
>> >> Sent: Wednesday, March 06, 2013 2:51 PM
>> >> To: Jaros, John A.; Graham, Mathew Thomas; McCormick, Jeremy I.; Graf, Norman A.; Moreno, Omar; Nelson, Timothy Knight
>> >> Subject: DSTs and work on slcio files using C++
>> >>
>> >> Hi,
>> >>
>> >> I decided not to comment during the meeting because it might have created more contention, and I also wanted to hear Jeremy's, Norman's and Omar's responses first before throwing this out there. That said, from the point of view of someone who has been doing lcsim SiD analysis on slcio files, I find the problems with using the two formats in HPS a little strange. For SiD we take slcio files and then run jet clustering and flavor tagging using C++ code in the lcfi and lcfi+ packages. For the flavor tagging we write out ROOT files for the lcfi+ TMVA training, and then for both the jet clustering and the flavor tagging we write out slcio files. I believe Malachi has done his whole analysis in C++ as a Marlin processor. I had also successfully tested reading slcio files in ROOT using a recipe provided by Jeremy. I stopped using it when I realized that it was quite simple to write the analysis in Java. Perhaps one solution is to stick to doing all development, even for the DST, in Java/lcsim, and to just provide examples of how to access the data from C++/ROOT by reading slcio files. Jeremy documented much of this long ago at:
>> >>
>> >> https://confluence.slac.stanford.edu/display/hpsg/Loading+LCIO+Files+into+ROOT
>> >>
>> >> If we just provide some examples, wouldn't that help to at least put out the current fires? This would also avoid having to support numerous extra sets of data (DSTs and micro-DSTs in both formats, with multiple passes and subsets).
>> >> Maybe I'm wrong, but I think one can provide simple recipes or modules for accessing any of the slcio file contents in ROOT.
>> >>
>> >> Homer
>> >>
>> >> ########################################################################
>> >> Use REPLY-ALL to reply to list
>> >>
>> >> To unsubscribe from the HPS-SOFTWARE list, click the following link:
>> >> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=HPS-SOFTWARE&A=1
>> >
>> > <dst.pdf>
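Omar's figures at the top of this thread lend themselves to a quick size comparison, and the C++ sketch below does that arithmetic. The struct and function names are hypothetical, and the inputs rest on assumptions the thread implies but never states: that the 29 MB micro-DST and the 5.4 GB LCIO file cover the same 500,000 test-run events, that the quoted sizes are decimal megabytes and gigabytes, and that "half the file" means 0.75 GB of the 1.5 GB EVIO input.

```cpp
#include <cassert>

// Assumed unit conventions (not stated in the thread): decimal units,
// with the emails' "Mb"/"Gb" read as megabytes/gigabytes.
constexpr double kMB = 1.0e6;
constexpr double kGB = 1.0e9;

struct SizeSummary {
    double dstBytesPerEvent;   // micro-DST size per event
    double lcioBytesPerEvent;  // reconstructed LCIO size per event
    double lcioToDstRatio;     // how many times smaller the micro-DST is
    double lcioToEvioRatio;    // growth from raw EVIO input to LCIO output
};

// Back-of-the-envelope size comparison. Assumes the DST and LCIO sizes
// refer to the same nEvents events and that evioBytes is the raw input
// that was actually reconstructed.
SizeSummary summarize(double dstBytes, double lcioBytes, double evioBytes,
                      double nEvents) {
    assert(nEvents > 0.0 && dstBytes > 0.0 && evioBytes > 0.0);
    return SizeSummary{dstBytes / nEvents, lcioBytes / nEvents,
                       lcioBytes / dstBytes, lcioBytes / evioBytes};
}
```

Called as `summarize(29.0 * kMB, 5.4 * kGB, 0.75 * kGB, 500000.0)`, this gives about 58 bytes per event for the micro-DST against about 10.8 kB per event for the LCIO file (a reduction of roughly 186 times), and an LCIO file about 7.2 times the size of the EVIO data it was made from. Omar expects the LCIO size to drop once FPGA data and junk events are removed, so these ratios describe this particular test-run file rather than a final format.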