Hi Omar,

Thanks for the quick response. It will be very important to know the
final size of the reconstructed event. The number you have is about x10
larger than the original event size, while in the proposal we assumed a
x5 inflation of the event size after reconstruction. At the meeting
today Matt explained that the number in the proposal was not well
motivated, but what you have now looks like a good way to motivate it.
Even after removing the FPGA data, I am not sure the size will go down
by a factor of 10, or by the factor of 50 that we probably want.
Is this large size due to the overhead of the LCIO format?
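
As a rough back-of-envelope check using the numbers quoted below, and
assuming the 5.4 GB reconstructed LCIO file came from roughly half of
the 1.5 GB EVIO file:

    5.4 GB / (1.5 GB / 2)  =  5.4 / 0.75  ~  7x,

which is indeed of the same order as the x10 above.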

We are having these discussions about formats and analysis, and I think
event size will play an important role in them. I do not think the
analysis of the data that HPS will collect can be done on events that
are x10, or even x5, larger than the original events.

Regards, Stepan

On 3/6/13 10:18 PM, Omar Moreno wrote:
> Stepan,
>
> The original EVIO file is 1.5 GB, but I only ran reconstruction on
> half the file.  There is a lot of extra information being stored in
> the final reconstructed LCIO file, such as the FPGA data, that should
> be removed, so I'm sure the file size is a bit inflated.  Once we
> filter out junk events and remove some unnecessary collections, the
> file size should decrease significantly.
>
> --Omar Moreno
>
>
> On Wed, Mar 6, 2013 at 7:05 PM, Stepan Stepanyan
> <[log in to unmask]> wrote:
>
>     Omar,
>
>     How big is the original file before reconstruction?
>
>     Thanks, Stepan
>
>
>     On 3/6/13 9:03 PM, Omar Moreno wrote:
>>     Hello Everyone,
>>
>>     Just to give everyone an idea, a micro DST with basic track
>>     information, hit information, and Ecal cluster info is approx.
>>     29 MB for 500,000 test run events.  The reconstructed LCIO file
>>     used to generate the ROOT file was approx. 5.4 GB, and it took
>>     about 4 minutes.  I expect the size to increase for data from an
>>     electron run, but it shouldn't be by much.  I'll go ahead and
>>     study this using MC data and see how much bigger the file gets.
>>
>>     --Omar Moreno
>>
>>     On Wed, Mar 6, 2013 at 4:29 PM, Nelson, Timothy Knight
>>     <[log in to unmask]> wrote:
>>
>>         Hi Stepan,
>>
>>         I agree 100%.  I think we want exactly what you proposed a
>>         year ago; a format with physics objects suitable for physics
>>         analysis (the proposed "micro-DST").  This kind of thing is
>>         relatively easy to provide and will be a very useful thing to
>>         have.  In fact, the kind of "flat ntuple" format that Omar
>>         began with can, I believe, be read in and operated on with
>>         PAW, since the .rz format is the same.  However, if he goes
>>         the next step as has been recommended in the software group,
>>         and writes classes to the ROOT file that require a dictionary
>>         to read back, the data format will be ROOT only.
>>
>>         A couple of points that are important to understand...
>>
>>         1) Homer brings up an important point, which is the fact that
>>         the only way we have to write these ROOT files is to use the
>>         LCIO C++ API.  That is to say, one does the java
>>         reconstruction in lcsim that creates LCIO objects and writes
>>         out an LCIO file.  Then one runs a separate C++ program that
>>         reads in the LCIO objects with the LCIO C++ API and outputs
>>         this NTuple using root classes. Therefore, no information
>>         that is currently not persisted in the LCIO EDM by our
>>         reconstruction will ever be available in the ROOT Ntuple.
>>          So, this business of writing out text files for vertexing
>>         and other information not currently being written to LCIO
>>         does not go away by creating ROOT Ntuples.  The only way to
>>         eliminate that issue is to improve the completeness of our
>>         LCIO-based EDM.  For example, Matt has been writing out
>>         vertexing information to text files and reading it back into
>>         ROOT.  However, LCIO DOES include vertex objects and if we
>>         created these during reconstruction, we would get that
>>         information in the LCIO file automatically, and it would then
>>         easily be accessible later on via LCIO.  There are a few
>>         examples of data types we might want to persist that don't
>>         have an LCIO class, but LCIO includes a "Generic Object"
>>         class that can be used to encapsulate anything we might want
>>         to add.  Again, only by getting the data we want in LCIO will
>>         it ever be accessible in ROOT.  So, in my opinion, this is
>>         where we should be focusing our attention.
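
To make the LCIO-to-ROOT step described above concrete, here is a
minimal sketch of such a standalone converter, using the LCIO C++ API
and ROOT.  The collection name "Tracks", the file names, and the choice
of track parameters are placeholders rather than the actual HPS names,
and error handling is kept to a bare minimum.

    // lcio2ntuple.cc: read tracks from an LCIO file and write a flat
    // ROOT ntuple of their parameters.
    #include "lcio.h"
    #include "IO/LCReader.h"
    #include "EVENT/LCEvent.h"
    #include "EVENT/LCCollection.h"
    #include "EVENT/Track.h"
    #include "TFile.h"
    #include "TNtuple.h"

    using namespace lcio;

    int main(int argc, char** argv) {
        if (argc < 2) return 1;            // usage: lcio2ntuple recon.slcio

        LCReader* reader = LCFactory::getInstance()->createLCReader();
        reader->open(argv[1]);

        TFile out("tracks.root", "RECREATE");   // placeholder output name
        TNtuple* nt = new TNtuple("tracks", "track parameters",
                                  "d0:phi:omega:z0:tanLambda:chi2");

        while (LCEvent* evt = reader->readNextEvent()) {
            // getCollection() throws if the name is not present; "Tracks" is
            // a placeholder for whatever the reconstruction actually persists.
            LCCollection* col = evt->getCollection("Tracks");
            for (int i = 0; i < col->getNumberOfElements(); ++i) {
                Track* trk = dynamic_cast<Track*>(col->getElementAt(i));
                nt->Fill(trk->getD0(), trk->getPhi(), trk->getOmega(),
                         trk->getZ0(), trk->getTanLambda(), trk->getChi2());
            }
        }
        nt->Write();
        out.Close();
        reader->close();
        return 0;
    }

The same pattern extends to any other collection that is persisted in
the LCIO file, which is exactly the point above: only what is written
into LCIO (including Vertex or GenericObject collections) can ever show
up in the ntuple.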
>>
>>         2) As far as how to do ROOT-based analysis, Homer again
>>         touched on the heart of the matter.  One can create a ROOT
>>         Ntuple and perform analysis on that.  In practice, this
>>         rarely means using ROOT on the command line, or even CINT
>>         macros since ROOT's C interpreter is so badly broken that it
>>         is not really usable for anything other than making final
>>         plots from already-analyzed data.  In practice, one usually
>>         runs some standalone compiled C++ that uses the ROOT
>>         libraries to do the analysis on a ROOT DST.  For this reason,
>>         it is just as easy to have that compiled C++ use the LCIO C++
>>         API to access the LCIO objects directly from the LCIO DST,
>>         and then use all of the familiar ROOT tools in that code to
>>         do the analysis, writing out whatever final histograms or
>>         post-analysis ntuples one might want into a ROOT file for
>>         later plotting.  The only difference is that in the former
>>         scenario, one learns the ROOT EDM that we invent for the DST,
>>         and for the latter, one learns the LCIO EDM.  To the extent
>>         that one is a mirror reflection of the other, one has to do
>>         just as much work writing the C++ analysis code either way.
>>          That is why it doesn't make any sense to duplicate the
>>         entire LCIO EDM in ROOT (one file for the price of two!) and
>>         why we should really only be considering creation of a new
>>         ROOT-based "micro-DST" format aimed at physics analysis that
>>         will be much slimmer than the LCIO.  Those that need more
>>         than is in the "micro-DST" can very easily run their C++/ROOT
>>         analysis code accessing the data directly from LCIO using the
>>         LCIO C++ API.
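
For the second scenario, compiled C++ analysis reading the LCIO file
directly and using ROOT only for histogramming, a correspondingly
minimal sketch is below.  The collection name "EcalClusters", the file
names, and the histogram binning are again placeholders, not the actual
HPS conventions.

    // lcio_analysis.cc: histogram Ecal cluster energies straight from
    // an LCIO file and save the result to a ROOT file for plotting.
    #include "lcio.h"
    #include "IO/LCReader.h"
    #include "EVENT/LCEvent.h"
    #include "EVENT/LCCollection.h"
    #include "EVENT/Cluster.h"
    #include "TFile.h"
    #include "TH1F.h"

    using namespace lcio;

    int main(int argc, char** argv) {
        if (argc < 2) return 1;            // usage: lcio_analysis recon.slcio

        TH1F hE("hE", "Ecal cluster energy;E [GeV];clusters", 100, 0.0, 3.0);

        LCReader* reader = LCFactory::getInstance()->createLCReader();
        reader->open(argv[1]);
        while (LCEvent* evt = reader->readNextEvent()) {
            LCCollection* col = evt->getCollection("EcalClusters");  // placeholder
            for (int i = 0; i < col->getNumberOfElements(); ++i)
                hE.Fill(dynamic_cast<Cluster*>(col->getElementAt(i))->getEnergy());
        }
        reader->close();

        TFile out("analysis.root", "RECREATE");
        hE.Write();
        out.Close();
        return 0;
    }

Whether the event loop reads LCIO objects or a ROOT DST, the body of
the analysis is essentially the same amount of code, which is the point
being made here.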
>>
>>         Cheers,
>>         Tim
>>
>>         On Mar 6, 2013, at 3:49 PM, Stepan Stepanyan
>>         <[log in to unmask]> wrote:
>>
>>         > Hello Homer and Jeremy,
>>         >
>>         > It seems we all have the right ideas, and very similar ideas,
>>         > about how the analysis of the data should be done.  The
>>         > confusion, it seems to me, comes from the definitions of
>>         > "analysis" and "DST".  When I brought up the question of DSTs
>>         > about a year ago, and even sent out a possible format
>>         > (attached document), I basically wanted what Jeremy said in
>>         > the second sentence after (3): physics objects only.  What
>>         > Omar showed today was very different from what I would
>>         > describe as a DST.  I understand Matt's point that in some
>>         > cases you will need fine details, but I am not sure everyone
>>         > will need that level of detail.  So I still think that if we
>>         > are talking about DSTs, the format should be "physics objects
>>         > only".  And if Omar can make use of what I proposed a year
>>         > ago, that will be great.
>>         >
>>         > As for general analysis, if we stick with (1), then we will
>>         > make the large number of collaborators who are used to doing
>>         > their analysis in ROOT quite unhappy.  I understand that
>>         > duplicating the processed data in many formats is also not a
>>         > reasonable approach.  So, if (2) means (sorry for my
>>         > ignorance) that we can have some kind of "portal" that
>>         > connects the LCIO recon file to ROOT, then it is probably the
>>         > best way to go.
>>         >
>>         > Again, sorry if I am misinterpreting the issue and/or
>>         > repeating what was already clear from your emails.
>>         >
>>         > Regards, Stepan
>>         >
>>         > On 3/6/13 6:10 PM, McCormick, Jeremy I. wrote:
>>         >> Hi, Homer.
>>         >>
>>         >> Thanks for the thoughts.
>>         >>
>>         >> My view is that user analysis has three possible pathways
>>         >> which make sense to consider:
>>         >>
>>         >> 1) Pure Java analysis using lcsim and outputting histograms
>>         >> to AIDA files, viewable in JAS.
>>         >>
>>         >> 2) LCIO/ROOT analysis, reading in the LCIO recon files,
>>         >> looping over these events, and making histograms from a
>>         >> ROOT script.
>>         >>
>>         >> 3) Pure ROOT analysis, operating on a ROOT DST file.
>>         >>
>>         >> I don't really think that we need a DST containing all of
>>         >> the information which is already present in the final LCIO
>>         >> recon file.  This level of duplication is not desirable.
>>         >> Rather, the ROOT DST should contain physics objects only,
>>         >> e.g. the equivalent of LCIO ReconstructedParticles, Tracks,
>>         >> and Clusters, along with event information.  This should be
>>         >> sufficient for doing a pure physics analysis, e.g. good
>>         >> enough for most users.  It is also likely that it could be
>>         >> represented using simple arrays rather than classes, which
>>         >> to me is desirable for this kind of format.
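
As an illustration of the "simple arrays rather than classes" idea, a
minimal sketch of what such a flat micro-DST tree could look like in
ROOT follows.  The branch names, the set of variables, and the size cap
are placeholders, not a proposed format.

    // microdst_skeleton.cc: sketch of a flat, array-based micro-DST tree.
    #include "TFile.h"
    #include "TTree.h"

    int main() {
        const int kMax = 100;            // placeholder cap on objects per event

        int   nTracks = 0, nClusters = 0;
        float trkD0[kMax], trkPhi[kMax], trkOmega[kMax];
        float clE[kMax], clX[kMax], clY[kMax];

        TFile out("microdst.root", "RECREATE");
        TTree* t = new TTree("dst", "micro-DST sketch: physics objects only");
        t->Branch("nTracks",   &nTracks,   "nTracks/I");
        t->Branch("trkD0",     trkD0,      "trkD0[nTracks]/F");
        t->Branch("trkPhi",    trkPhi,     "trkPhi[nTracks]/F");
        t->Branch("trkOmega",  trkOmega,   "trkOmega[nTracks]/F");
        t->Branch("nClusters", &nClusters, "nClusters/I");
        t->Branch("clE",       clE,        "clE[nClusters]/F");
        t->Branch("clX",       clX,        "clX[nClusters]/F");
        t->Branch("clY",       clY,        "clY[nClusters]/F");

        // per event: copy the reconstructed quantities into the arrays,
        // set the counters, then call t->Fill();

        t->Write();
        out.Close();
        return 0;
    }

Branches of basic types like these can be read back with TTree::Draw or
a simple macro without loading any class dictionary, which is the
advantage of an array-based layout over a class-based one.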
>>         >>
>>         >> If one wants to look at the associated hits of the tracks,
>>         >> or something similarly detailed, then it seems to me that
>>         >> it would be better to use the #1 and #2 approaches, as we
>>         >> can then avoid "reinventing the wheel" by making ROOT files
>>         >> that mimic the structure of the existing LCIO output.  This
>>         >> approach would require working from the LCIO output, but I
>>         >> really don't see a problem there.  It is not onerous at
>>         >> all.  The API is straightforward and well-documented, and
>>         >> examples can be provided.  There is already a simple
>>         >> analysis script in my examples that you linked which plots
>>         >> information from Tracks in an LCIO file using ROOT
>>         >> histogramming.  Similar plots could easily be made for the
>>         >> hits, etc.
>>         >>
>>         >> I suppose one could demand that all this data be put into
>>         >> ROOT including the hits, but you're left with the same
>>         >> problem.  Someone still has to learn the API of whatever
>>         >> classes are used to store the data, and the class headers
>>         >> also need to be loaded to interpret the data.  Whether that
>>         >> format is LCIO or ROOT, it is essentially the same level of
>>         >> knowledge that would be required.  My feeling is actually
>>         >> that this will be more difficult/cumbersome to work with in
>>         >> ROOT rather than LCIO.  I wonder why we can't just go with
>>         >> what we already have, e.g. the LCIO API, rather than invent
>>         >> something analogous which does not seem to serve a very
>>         >> clear purpose.  One can already use what's there in the
>>         >> linked example to look at the full events, so can we start
>>         >> there and see how far we get?
>>         >>
>>         >> If someone has a clear use case where pure ROOT data is
>>         >> needed at the lowest level of detail, I would consider this
>>         >> request, but I have seen nothing concrete so far along
>>         >> these lines.
>>         >>
>>         >> --Jeremy
>>         >>
>>         >> -----Original Message-----
>>         >> From: Homer [mailto:[log in to unmask]]
>>         >> Sent: Wednesday, March 06, 2013 2:51 PM
>>         >> To: Jaros, John A.; Graham, Mathew Thomas; McCormick, Jeremy I.;
>>         >> Graf, Norman A.; Moreno, Omar; Nelson, Timothy Knight
>>         >> Subject: DSTs and work on slcio files using C++
>>         >>
>>         >> Hi,
>>         >>
>>         >> I decided not to comment during the meeting because it might
>>         >> have created more contention, and I also wanted to hear
>>         >> Jeremy's, Norman's, and Omar's responses first before
>>         >> throwing this out there.  That said, from the point of view
>>         >> of someone who has been doing lcsim SiD analysis on slcio
>>         >> files, I find the problems with using the two formats in HPS
>>         >> a little strange.  For SiD we take slcio files and then run
>>         >> jet clustering and flavor tagging using C++ code in the lcfi
>>         >> and lcfi+ packages.  For the flavor tagging we write out
>>         >> ROOT files for the lcfi+ TMVA training, and for both the jet
>>         >> clustering and the flavor tagging we write out slcio files.
>>         >> I believe Malachi has done his whole analysis in C++ as a
>>         >> Marlin processor.  I had also successfully tested reading
>>         >> slcio files in ROOT using a recipe provided by Jeremy.  I
>>         >> dropped using it when I realized that it was quite simple to
>>         >> write the analysis in Java.  Perhaps one solution is to
>>         >> stick to doing all development, even for the DST, in
>>         >> java/lcsim and to just provide examples of how to access the
>>         >> data from C++/ROOT by reading slcio files.  Jeremy had
>>         >> documented much of this long ago at:
>>         >>
>>         >> https://confluence.slac.stanford.edu/display/hpsg/Loading+LCIO+Files+into+ROOT
>>         >>
>>         >> If we just provide some examples, wouldn't that help to at
>>         >> least put out the current fires?  This would also avoid
>>         >> having to support numerous extra sets of data (DSTs and
>>         >> micro-DSTs in both formats, with multiple passes and
>>         >> subsets)?  Maybe I'm wrong, but I think one can provide
>>         >> simple recipes or modules for accessing any of the slcio
>>         >> file contents in ROOT.
>>         >>
>>         >>     Homer
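
In the spirit of the "simple recipes" suggested above, one possible way
to build a standalone C++/ROOT program against LCIO is sketched below.
It assumes a standard LCIO installation with the LCIO environment
variable set and root-config on the PATH; the exact paths and library
names can differ between installations.

    # possible build line (paths and library names are assumptions)
    g++ -o lcio_analysis lcio_analysis.cc \
        -I"$LCIO/include" -L"$LCIO/lib" -llcio \
        $(root-config --cflags --libs)
    # some installations also need -lsio for the LCIO persistence layer

    # run it on a reconstructed file (placeholder file name)
    ./lcio_analysis hps_recon.slcio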
>>         >>
>>         >
>>         > <dst.pdf>
>>
>

########################################################################
Use REPLY-ALL to reply to list

To unsubscribe from the HPS-SOFTWARE list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=HPS-SOFTWARE&A=1