Omar,

How big is the original file, before reconstruction?

Thanks, Stepan

On 3/6/13 9:03 PM, Omar Moreno wrote:
> Hello Everyone,
>
> Just to give everyone an idea, a micro DST with basic track 
> information, hit information and Ecal cluster info is approx. 29 
> MB per 500,000 test run events.  The reconstructed LCIO file used to 
> generate the ROOT file was approx. 5.4 GB, and the conversion took about 
> 4 minutes.  I expect the size to increase for data from an electron run 
> but it shouldn't be by much.  I'll go ahead and study this using MC 
> data and see how much bigger the file gets.
>
> --Omar Moreno
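As a rough check on the figures quoted above, the per-event sizes and the reduction factor can be worked out directly. This is only a back-of-the-envelope sketch: it assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) and that the 4 minutes refers to the LCIO-to-ROOT conversion step; the variable names are illustrative.

```python
# Figures quoted in the message above. Decimal units are an assumption.
N_EVENTS = 500_000
MICRO_DST_BYTES = 29e6        # micro DST size, read as 29 megabytes
LCIO_RECON_BYTES = 5.4e9      # reconstructed LCIO file, read as 5.4 gigabytes
CONVERSION_SECONDS = 4 * 60   # "about 4 minutes", assumed to be conversion time

micro_per_event = MICRO_DST_BYTES / N_EVENTS       # bytes per event in the micro DST
lcio_per_event = LCIO_RECON_BYTES / N_EVENTS       # bytes per event in the LCIO file
reduction = LCIO_RECON_BYTES / MICRO_DST_BYTES     # how much slimmer the micro DST is
events_per_second = N_EVENTS / CONVERSION_SECONDS  # conversion throughput

print(f"micro DST: {micro_per_event:.0f} bytes/event")
print(f"LCIO recon: {lcio_per_event:.0f} bytes/event")
print(f"reduction factor: {reduction:.0f}x")
print(f"conversion rate: {events_per_second:.0f} events/s")
```

Under those assumptions the micro DST comes to roughly 58 bytes per event against about 10.8 kB per event in the LCIO file, i.e. a factor of about 190 smaller.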
>
> On Wed, Mar 6, 2013 at 4:29 PM, Nelson, Timothy Knight 
> <[log in to unmask]> wrote:
>
>     Hi Stepan,
>
>     I agree 100%.  I think we want exactly what you proposed a year
>     ago; a format with physics objects suitable for physics analysis
>     (the proposed "micro-DST").  This kind of thing is relatively easy
>     to provide and will be a very useful thing to have.  In fact, the
>     kind of "flat ntuple" format that Omar began with can, I believe,
>     be read in and operated on with PAW, since the .rz format is the
>     same.  However, if he goes the next step as has been recommended
>     in the software group, and writes classes to the ROOT file that
>     require a dictionary to read back, the data format will be ROOT only.
>
>     A couple of points that are important to understand...
>
>     1) Homer brings up an important point, which is the fact that the
>     only way we have to write these ROOT files is to use the LCIO C++
>     API.  That is to say, one does the java reconstruction in lcsim
>     that creates LCIO objects and writes out an LCIO file.  Then one
>     runs a separate C++ program that reads in the LCIO objects with
>     the LCIO C++ API and outputs this NTuple using root classes.
>     Therefore, no information that is currently not persisted in the
>     LCIO EDM by our reconstruction will ever be available in the ROOT
>     Ntuple.  So, this business of writing out text files for vertexing
>     and other information not currently being written to LCIO does not
>     go away by creating ROOT Ntuples.  The only way to eliminate that
>     issue is to improve the completeness of our LCIO-based EDM.  For
>     example, Matt has been writing out vertexing information to text
>     files and reading it back into ROOT.  However, LCIO DOES include
>     vertex objects and if we created these during reconstruction, we
>     would get that information in the LCIO file automatically, and it
>     would then easily be accessible later on via LCIO.  There are a
>     few examples of data types we might want to persist that don't
>     have an LCIO class, but LCIO includes a "Generic Object" class
>     that can be used to encapsulate anything we might want to add.
>      Again, only by getting the data we want in LCIO will it ever be
>     accessible in ROOT.  So, in my opinion, this is where we should be
>     focusing our attention.
>
>     2) As far as how to do ROOT-based analysis, Homer again touched on
>     the heart of the matter.  One can create a ROOT Ntuple and perform
>     analysis on that.  In practice, this rarely means using ROOT on
>     the command line, or even CINT macros since ROOT's C interpreter
>     is so badly broken that it is not really usable for anything other
>     than making final plots from already-analyzed data.  In practice,
>     one usually runs some standalone compiled C++ that uses the ROOT
>     libraries to do the analysis on a ROOT DST.  For this reason, it
>     is just as easy to have that compiled C++ use the LCIO C++ API to
>     access the LCIO objects directly from the LCIO DST, and then use
>     all of the familiar ROOT tools in that code to do the analysis,
>     writing out whatever final histograms or post-analysis ntuples one
>     might want into a ROOT file for later plotting.  The only
>     difference is that in the former scenario, one learns the ROOT EDM
>     that we invent for the DST, and for the latter, one learns the
>     LCIO EDM.  To the extent that one is a mirror reflection of the
>     other, one has to do just as much work writing the C++ analysis
>     code either way.  That is why it doesn't make any sense to
>     duplicate the entire LCIO EDM in ROOT (one file for the price of
>     two!) and why we should really only be considering creation of a
>     new ROOT-based "micro-DST" format aimed at physics analysis that
>     will be much slimmer than the LCIO.  Those that need more than is
>     in the "micro-DST" can very easily run their C++/ROOT analysis
>     code accessing the data directly from LCIO using the LCIO C++ API.
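Point 1 above can be illustrated with a toy converter. This is a sketch only: the dictionary standing in for an event, the collection names, and the function `to_ntuple_row` are all hypothetical and are not the LCIO or ROOT API. It shows that a converter can only copy what the reconstruction actually persisted.

```python
# Hypothetical stand-in for an event read from an LCIO file:
# collection name -> list of objects. These names are NOT the LCIO API.
def to_ntuple_row(event):
    """Copy only the collections the event actually holds into a flat row."""
    row = {"n_tracks": len(event.get("Tracks", []))}
    vertices = event.get("Vertices")
    if vertices is not None:
        # Vertex columns exist only if reconstruction wrote the collection.
        row["n_vertices"] = len(vertices)
    return row

# Reconstruction that never persisted vertices: nothing to convert.
event_without_vertices = {"Tracks": [{"p": 1.1}, {"p": 0.8}]}
# Reconstruction that created vertex objects: they come along automatically.
event_with_vertices = {
    "Tracks": [{"p": 1.1}, {"p": 0.8}],
    "Vertices": [{"z": -0.3}],
}

print(to_ntuple_row(event_without_vertices))
print(to_ntuple_row(event_with_vertices))
```

The first row has no vertex column at all, whatever the converter does, which is why the fix has to happen on the reconstruction side.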
>
>     Cheers,
>     Tim
>
>     On Mar 6, 2013, at 3:49 PM, Stepan Stepanyan <[log in to unmask]> wrote:
>
>     > Hello Homer and Jeremy,
>     >
>     > It seems we all have the right ideas, and very similar ideas, on how
>     > analysis of data must be done.
>     > The confusion, it seems to me, comes from the definitions of
>     > "analysis" and "DST". When I brought up the question of DSTs about a
>     > year ago, and even sent out a possible format (attached document), I
>     > basically wanted what Jeremy said in the second sentence after (3):
>     > physics objects only. What Omar showed today was very different from
>     > what I would describe as a DST. I understand Matt's point that in
>     > some cases you will need fine details, but I am not sure that
>     > everyone will need that level of detail.
>     > So I still think that if we are talking about DSTs, the format should
>     > be "physics objects only". And if Omar can make use of what I
>     > proposed a year ago, that will be great.
>     >
>     > As for general analysis, if we stick with (1), then we will make a
>     > large number of collaborators who are used to doing analysis in ROOT
>     > quite unhappy. I understand that duplicating processed data in many
>     > formats is also not a reasonable approach. So, if (2) means (sorry
>     > for my ignorance) that we can have some kind of "portal" that can
>     > connect the LCIO recon file to ROOT, then it is probably the best way
>     > to go.
>     >
>     > Again, sorry if I am misinterpreting the issue and/or repeating what
>     > was already clear from your emails.
>     >
>     > Regards, Stepan
>     >
>     > On 3/6/13 6:10 PM, McCormick, Jeremy I. wrote:
>     >> Hi, Homer.
>     >>
>     >> Thanks for the thoughts.
>     >>
>     >> My view is that user analysis has three possible pathways which
>     >> make sense to consider:
>     >>
>     >> 1) Pure Java analysis using lcsim and outputting histograms to AIDA
>     >> files, viewable in JAS.
>     >>
>     >> 2) LCIO/ROOT analysis, reading in the LCIO recon files, looping over
>     >> these events, and making histograms from a ROOT script.
>     >>
>     >> 3) Pure ROOT analysis, operating on a ROOT DST file.
>     >>
>     >> I don't really think that we need a DST containing all of the
>     >> information which is already present in the final LCIO recon file.
>     >> This level of duplication is not desirable.  Rather, the ROOT DST
>     >> should contain physics objects only, e.g. the equivalent of LCIO
>     >> ReconstructedParticles, Tracks, and Clusters, along with event
>     >> information.  This should be sufficient for doing a pure physics
>     >> analysis, e.g. good enough for most users.  It is also likely that
>     >> it could be represented using simple arrays rather than classes,
>     >> which to me is desirable for this kind of format.
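The "simple arrays rather than classes" idea can be sketched as a per-event count plus one parallel array per quantity, which is the kind of layout that can be read back without any class dictionary. The `Track` class, its fields, and the column names below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Track:
    """Hypothetical reconstructed track; field names are illustrative only."""
    px: float
    py: float
    pz: float
    chi2: float


def flatten_tracks(tracks: List[Track]) -> Dict[str, Union[int, List[float]]]:
    """Turn a list of track objects into a count plus parallel arrays."""
    return {
        "n_tracks": len(tracks),
        "trk_px": [t.px for t in tracks],
        "trk_py": [t.py for t in tracks],
        "trk_pz": [t.pz for t in tracks],
        "trk_chi2": [t.chi2 for t in tracks],
    }


tracks = [Track(0.1, 0.2, 1.0, 3.5), Track(-0.1, 0.0, 0.9, 1.2)]
flat = flatten_tracks(tracks)
print(flat)
```

Element `i` of every `trk_*` array belongs to the same track, so a reader needs only the column names and `n_tracks` to loop over the event.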
>     >>
>     >> If one wants to look at the associated hits of the tracks, or
>     >> something similarly detailed, then it seems to me that it would be
>     >> better to use the #1 and #2 approaches, as we can then avoid
>     >> "reinventing the wheel" by making ROOT files that mimic the
>     >> structure of the existing LCIO output.  This approach would require
>     >> working from the LCIO output, but I really don't see a problem
>     >> there.  It is not onerous at all.  The API is straightforward and
>     >> well-documented, and examples can be provided.  There is already a
>     >> simple analysis script in my examples that you linked which plots
>     >> information from Tracks in an LCIO file using ROOT histogramming.
>     >> Similar plots could easily be made for the hits, etc.
>     >>
>     >> I suppose one could demand that all this data be put into ROOT
>     >> including the hits, but you're left with the same problem.  Someone
>     >> still has to learn the API of whatever classes are used to store the
>     >> data, and the class headers also need to be loaded to interpret the
>     >> data.  Whether that format is LCIO or ROOT, it is essentially the
>     >> same level of knowledge that would be required.  My feeling is
>     >> actually that this will be more difficult/cumbersome to work with in
>     >> ROOT rather than LCIO.  I wonder why we can't just go with what we
>     >> already have, e.g. the LCIO API, rather than invent something
>     >> analogous which does not seem to serve a very clear purpose.  One
>     >> can already use what's there in the linked example to look at the
>     >> full events, so can we start there and see how far we get?
>     >>
>     >> If someone has a clear use case where pure ROOT data is needed at
>     >> the lowest level of detail, I would consider this request, but I
>     >> have seen nothing concrete so far along these lines.
>     >>
>     >> --Jeremy
>     >>
>     >> -----Original Message-----
>     >> From: Homer [mailto:[log in to unmask]]
>     >> Sent: Wednesday, March 06, 2013 2:51 PM
>     >> To: Jaros, John A.; Graham, Mathew Thomas; McCormick, Jeremy I.;
>     >> Graf, Norman A.; Moreno, Omar; Nelson, Timothy Knight
>     >> Subject: DSTs and work on slcio files using C++
>     >>
>     >> Hi,
>     >>
>     >> I decided not to comment during the meeting because it might have
>     >> created more contention and I also wanted to hear Jeremy's,
>     >> Norman's and Omar's responses first before throwing this out there.
>     >> That said, from the point of view of someone who has been doing
>     >> lcsim SiD analysis on slcio files I find the problems with using
>     >> the two formats in HPS a little strange. For SiD we take slcio files
>     >> and then run jet clustering and flavor tagging using C++ code in the
>     >> lcfi and lcfi+ packages. For the flavor tagging we write out root
>     >> files for lcfi+ running the TMVA training and then for both the jet
>     >> clustering and the flavor tagging we write out slcio files. I
>     >> believe Malachi has done his whole analysis in C++ as a Marlin
>     >> processor. I had also successfully tested reading slcio files in
>     >> ROOT using a recipe provided by Jeremy. I dropped using it when I
>     >> realized that it was quite simple to write the analysis in java.
>     >> Perhaps one solution is to stick to doing all development, even for
>     >> the DST, in java/lcsim and to just provide examples of how to access
>     >> the data from C++/ROOT reading slcio files. Jeremy had documented
>     >> much of this long ago at:
>     >>
>     >> https://confluence.slac.stanford.edu/display/hpsg/Loading+LCIO+Files+into+ROOT
>     >>
>     >> If we just provide some examples, wouldn't that help to at least put
>     >> out the current fires? This would also avoid having to support
>     >> numerous extra sets of data (DSTs and microDSTs in both formats with
>     >> multiple passes and subsets)??
>     >> Maybe I'm wrong but I think one can provide simple recipes or
>     >> modules for accessing any of the slcio file contents in ROOT.
>     >>
>     >>     Homer
>     >>
>     >>
>     >>
>     >> ########################################################################
>     >> Use REPLY-ALL to reply to list
>     >>
>     >> To unsubscribe from the HPS-SOFTWARE list, click the following link:
>     >> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=HPS-SOFTWARE&A=1
>     >
>     > <dst.pdf>
>
>
>
>
