Stepan,

The original EVIO file is 1.5 GB, but I only ran reconstruction on half of
the file.  There is a lot of extra information being stored in the final
reconstructed LCIO file, such as the FPGA data, that should be removed, so
the file size is a bit inflated.  I'm sure that once we filter out junk
events and remove some unnecessary collections, the file size will decrease
significantly.

--Omar Moreno


On Wed, Mar 6, 2013 at 7:05 PM, Stepan Stepanyan <[log in to unmask]> wrote:

>  Omar,
>
> How big is the original file, before reconstruction?
>
> Thanks, Stepan
>
>
> On 3/6/13 9:03 PM, Omar Moreno wrote:
>
> Hello Everyone,
>
>  Just to give everyone an idea, a micro-DST with basic track information,
> hit information, and Ecal cluster info is approximately 29 MB per 500,000
> test run events.  The reconstructed LCIO file used to generate the ROOT
> file was approximately 5.4 GB, and it took about 4 minutes.  I expect the
> size to increase for data from an electron run, but it shouldn't be by
> much.  I'll go ahead and study this using MC data and see how much bigger
> the file gets.
>
>  --Omar Moreno
>
>
> On Wed, Mar 6, 2013 at 4:29 PM, Nelson, Timothy Knight <
> [log in to unmask]> wrote:
>
>> Hi Stepan,
>>
>> I agree 100%.  I think we want exactly what you proposed a year ago: a
>> format with physics objects suitable for physics analysis (the proposed
>> "micro-DST").  This kind of format is relatively easy to provide and will be
>> a very useful thing to have.  In fact, the kind of "flat ntuple" format
>> that Omar began with can, I believe, be read in and operated on with PAW,
>> since the .rz format is the same.  However, if he goes the next step as has
>> been recommended in the software group, and writes classes to the ROOT file
>> that require a dictionary to read back, the data format will be ROOT only.
>>
>> A couple of points that are important to understand...
>>
>> 1) Homer brings up an important point, which is that the only
>> way we have to write these ROOT files is to use the LCIO C++ API.  That is
>> to say, one does the Java reconstruction in lcsim that creates LCIO objects
>> and writes out an LCIO file.  Then one runs a separate C++ program that
>> reads in the LCIO objects with the LCIO C++ API and outputs this ntuple
>> using ROOT classes.  Therefore, no information that is currently not
>> persisted in the LCIO EDM by our reconstruction will ever be available in
>> the ROOT Ntuple.  So, this business of writing out text files for vertexing
>> and other information not currently being written to LCIO does not go away
>> by creating ROOT Ntuples.  The only way to eliminate that issue is to
>> improve the completeness of our LCIO-based EDM.  For example, Matt has been
>> writing out vertexing information to text files and reading it back into
>> ROOT.  However, LCIO DOES include vertex objects and if we created these
>> during reconstruction, we would get that information in the LCIO file
>> automatically, and it would then easily be accessible later on via LCIO.
>>  There are a few examples of data types we might want to persist that don't
>> have an LCIO class, but LCIO includes a "Generic Object" class that can be
>> used to encapsulate anything we might want to add.  Again, only by getting
>> the data we want in LCIO will it ever be accessible in ROOT.  So, in my
>> opinion, this is where we should be focusing our attention.
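>>
>> (To make the "Generic Object" idea concrete, here is a rough sketch of
>> packing a few extra quantities into one; the function name and value layout
>> are made up, not existing HPS code.  The equivalent object can also be
>> created from the Java side during reconstruction; C++ is shown here only
>> for brevity.)
>>
>>   // Sketch only: an LCGenericObject holding a few extra quantities
>>   // that have no dedicated LCIO class.  The value layout is invented.
>>   #include "IMPL/LCGenericObjectImpl.h"
>>   #include "IMPL/LCCollectionVec.h"
>>   #include "EVENT/LCIO.h"
>>
>>   IMPL::LCCollectionVec* makeExtrasCollection(double quantityA, double quantityB) {
>>     IMPL::LCGenericObjectImpl* obj = new IMPL::LCGenericObjectImpl();
>>     obj->setDoubleVal(0, quantityA);   // slot 0: first extra quantity
>>     obj->setDoubleVal(1, quantityB);   // slot 1: second extra quantity
>>     IMPL::LCCollectionVec* col = new IMPL::LCCollectionVec(EVENT::LCIO::LCGENERICOBJECT);
>>     col->addElement(obj);              // add the object to the collection
>>     return col;                        // attach with evt->addCollection(col, "MyExtras")
>>   }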
>>
>> 2) As far as how to do ROOT-based analysis, Homer again touched on the
>> heart of the matter.  One can create a ROOT Ntuple and perform analysis on
>> that.  In practice, this rarely means using ROOT on the command line, or
>> even CINT macros, since ROOT's C++ interpreter is so badly broken that it is
>> not really usable for anything other than making final plots from
>> already-analyzed data.  In practice, one usually runs some standalone
>> compiled C++ that uses the ROOT libraries to do the analysis on a ROOT DST.
>>  For this reason, it is just as easy to have that compiled C++ use the LCIO
>> C++ API to access the LCIO objects directly from the LCIO DST, and then use
>> all of the familiar ROOT tools in that code to do the analysis, writing out
>> whatever final histograms or post-analysis ntuples one might want into a
>> ROOT file for later plotting.  The only difference is that in the former
>> scenario, one learns the ROOT EDM that we invent for the DST, and for the
>> latter, one learns the LCIO EDM.  To the extent that one is a mirror
>> reflection of the other, one has to do just as much work writing the C++
>> analysis code either way.  That is why it doesn't make any sense to
>> duplicate the entire LCIO EDM in ROOT (one file for the price of two!) and
>> why we should really only be considering creation of a new ROOT-based
>> "micro-DST" format aimed at physics analysis that will be much slimmer than
>> the LCIO output.  Those who need more than is in the "micro-DST" can very easily
>> run their C++/ROOT analysis code accessing the data directly from LCIO
>> using the LCIO C++ API.
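>>
>> (As a rough illustration of that second path, here is a minimal sketch of a
>> standalone compiled C++ analysis that reads tracks from an LCIO file with
>> the LCIO C++ API and fills a ROOT histogram.  The file name, the
>> "MatchedTracks" collection name, and the histogram binning are placeholders,
>> not established conventions.)
>>
>>   // track_plots.cc -- compile against LCIO and ROOT, e.g.
>>   //   g++ track_plots.cc -o track_plots `root-config --cflags --libs` -llcio
>>   #include "IOIMPL/LCFactory.h"
>>   #include "IO/LCReader.h"
>>   #include "EVENT/LCEvent.h"
>>   #include "EVENT/LCCollection.h"
>>   #include "EVENT/Track.h"
>>   #include "TFile.h"
>>   #include "TH1D.h"
>>
>>   int main() {
>>     // Open the LCIO DST with the LCIO C++ API.
>>     IO::LCReader* reader = IOIMPL::LCFactory::getInstance()->createLCReader();
>>     reader->open("hps_recon.slcio");   // placeholder file name
>>
>>     TH1D hOmega("hOmega", "Track curvature;#Omega;Tracks", 100, -0.002, 0.002);
>>
>>     // Loop over events and fill ROOT histograms directly from the LCIO objects.
>>     while (EVENT::LCEvent* evt = reader->readNextEvent()) {
>>       // getCollection throws DataNotAvailableException if the collection is absent.
>>       EVENT::LCCollection* tracks = evt->getCollection("MatchedTracks");
>>       for (int i = 0; i < tracks->getNumberOfElements(); ++i) {
>>         EVENT::Track* trk = dynamic_cast<EVENT::Track*>(tracks->getElementAt(i));
>>         hOmega.Fill(trk->getOmega());
>>       }
>>     }
>>     reader->close();
>>
>>     // Write the final histograms to a ROOT file for later plotting.
>>     TFile out("track_plots.root", "RECREATE");
>>     hOmega.Write();
>>     out.Close();
>>     return 0;
>>   }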
>>
>> Cheers,
>> Tim
>>
>> On Mar 6, 2013, at 3:49 PM, Stepan Stepanyan <[log in to unmask]> wrote:
>>
>> > Hello Homer and Jeremy,
>> >
>> > It seems we all have the right ideas, and very similar ideas, about
>> > how the analysis of data should be done.
>> > The confusion, it seems to me, comes from the definitions of "analysis" and
>> > "DST"s. When about a year ago I
>> > brought up the question of DSTs, and even sent out a possible format
>> > (attached document), I basically
>> > wanted what Jeremy said in the second sentence after (3): physics
>> > objects only. What Omar showed
>> > today was very different from what I would describe as a DST. I
>> > understand Matt's point that in some
>> > cases you will need fine details, but I am not sure that everyone will
>> > need that level of detail.
>> > So I still think that if we are talking about DSTs, the format should be
>> > "physics objects only", and it will be great if Omar
>> > can make use of what I proposed a year ago.
>> >
>> > As for general analysis, if we stick with (1), then we will make a large
>> > number of collaborators who are
>> > used to doing analysis in ROOT quite unhappy. I understand that duplicating
>> > processed data in many
>> > formats is also not a reasonable approach. So, if (2) means (sorry for
>> > my ignorance) that we can have some
>> > kind of "portal" that can connect the LCIO recon file to ROOT, then it is
>> > probably the best way to go.
>> >
>> > Again, sorry if I am misinterpreting the issue and/or repeating what was
>> > already clear from your emails.
>> >
>> > Regards, Stepan
>> >
>> > On 3/6/13 6:10 PM, McCormick, Jeremy I. wrote:
>> >> Hi, Homer.
>> >>
>> >> Thanks for the thoughts.
>> >>
>> >> My view is that user analysis has three possible pathways which make
>> sense to consider:
>> >>
>> >> 1) Pure Java analysis using lcsim and outputting histograms to AIDA
>> files, viewable in JAS.
>> >>
>> >> 2) LCIO/ROOT analysis, reading in the LCIO recon files, looping over
>> these events, and making histograms from a ROOT script.
>> >>
>> >> 3) Pure ROOT analysis, operating on a ROOT DST file.
>> >>
>> >> I don't really think that we need a DST containing all of the
>> information which is already present in the final LCIO recon file.  This
>> level of duplication is not desirable.  Rather, the ROOT DST should contain
>> physics objects only, e.g. the equivalent of LCIO ReconstructedParticles,
>> Tracks, and Clusters, along with event information.  This should be
>> sufficient for doing a pure physics analysis, i.e. good enough for most
>> users.  It is also likely that it could be represented using simple arrays
>> rather than classes, which to me is desirable for this kind of format.
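>> >>
>> >> (For illustration only, a minimal sketch of such an array-based tree; the
>> >> branch names are invented.  Because it uses only plain arrays plus a count
>> >> branch, any ROOT session can read it back without loading class dictionaries.)
>> >>
>> >>   // Sketch of a flat, array-based micro-DST tree (branch names are made up).
>> >>   #include "TFile.h"
>> >>   #include "TTree.h"
>> >>
>> >>   int main() {
>> >>     const int kMaxTracks = 100;
>> >>     int   nTracks = 0;
>> >>     float trkPx[kMaxTracks], trkPy[kMaxTracks], trkPz[kMaxTracks], trkChi2[kMaxTracks];
>> >>
>> >>     TFile f("microDST.root", "RECREATE");
>> >>     TTree t("HPS_DST", "HPS micro-DST");
>> >>     // Plain arrays with a count branch; no dictionary is needed to read this back.
>> >>     t.Branch("nTracks", &nTracks, "nTracks/I");
>> >>     t.Branch("trkPx",   trkPx,    "trkPx[nTracks]/F");
>> >>     t.Branch("trkPy",   trkPy,    "trkPy[nTracks]/F");
>> >>     t.Branch("trkPz",   trkPz,    "trkPz[nTracks]/F");
>> >>     t.Branch("trkChi2", trkChi2,  "trkChi2[nTracks]/F");
>> >>
>> >>     // ... fill the arrays from the reconstruction output for each event, then:
>> >>     t.Fill();
>> >>
>> >>     t.Write();
>> >>     f.Close();
>> >>     return 0;
>> >>   }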
>> >>
>> >> If one wants to look at the associated hits of the tracks, or
>> something similarly detailed, then it seems to me that it would be better
>> to use the #1 and #2 approaches, as we can then avoid "reinventing the
>> wheel" by making ROOT files that mimic the structure of the existing LCIO
>> output.  This approach would require working from the LCIO output, but I
>> really don't see a problem there.  It is not onerous at all.  The API is
>> straightforward and well-documented, and examples can be provided.  There
>> is already a simple analysis script in my examples that you linked which
>> plots information from Tracks in an LCIO file using ROOT histogramming.
>>  Similar plots could easily be made for the hits, etc.
>> >>
>> >> I suppose one could demand that all this data be put into ROOT
>> including the hits, but you're left with the same problem.  Someone still
>> has to learn the API of whatever classes are used to store the data, and
>> the class headers also need to be loaded to interpret the data.  Whether
>> that format is LCIO or ROOT, it is essentially the same level of knowledge
>> that would be required.  My feeling is actually that this will be more
>> difficult/cumbersome to work with in ROOT rather than LCIO.  I wonder why
>> we can't just go with what we already have, e.g. the LCIO API, rather than
>> invent something analogous which does not seem to serve a very clear
>> purpose.  One can already use what's there in the linked example to look at
>> the full events, so can we start there and see how far we get?
>> >>
>> >> If someone has a clear use case where pure ROOT data is needed at the
>> lowest level of detail, I would consider this request, but I have seen
>> nothing concrete so far along these lines.
>> >>
>> >> --Jeremy
>> >>
>> >> -----Original Message-----
>> >> From: Homer [mailto:[log in to unmask]]
>> >> Sent: Wednesday, March 06, 2013 2:51 PM
>> >> To: Jaros, John A.; Graham, Mathew Thomas; McCormick, Jeremy I.; Graf,
>> Norman A.; Moreno, Omar; Nelson, Timothy Knight
>> >> Subject: DSTs and work on slcio files using C++
>> >>
>> >> Hi,
>> >>
>> >> I decided not to comment during the meeting because it might have
>> created more contention and I also wanted to hear Jeremy's, Norman's and
>> Omar's responses first before throwing this out there. That said, from the
>> point of view of someone who has been doing lcsim SiD analysis on slcio
>> files, I find the problems with using the two formats in HPS a little
>> strange. For SiD we take slcio files and then run jet clustering and flavor
>> tagging using C++ code in the lcfi and
>> >> lcfi+ packages. For the flavor tagging, we write out ROOT files for the
>> >> lcfi+ TMVA training, and then for both the jet clustering and the flavor
>> tagging we write out slcio files. I believe Malachi has done his whole
>> analysis in C++ as a Marlin processor. I had also successfully tested
>> reading slcio files in ROOT using a recipe provided by Jeremy. I dropped
>> using it when I realized that it was quite simple to write the analysis in
>> Java. Perhaps one solution is to stick to doing all development, even for
>> the DST, in Java/lcsim, and to just provide examples of how to access the
>> data from C++/ROOT reading slcio files. Jeremy had documented much of this
>> long ago at:
>> >>
>> >>
>> https://confluence.slac.stanford.edu/display/hpsg/Loading+LCIO+Files+into+ROOT
>> >>
>> >> If we just provide some examples, wouldn't that help to at least put
>> out the current fires? This would also avoid having to support numerous
>> extra sets of data (DSTs and microDSTs in both formats with multiple passes
>> and subsets)?
>> >> Maybe I'm wrong, but I think one can provide simple recipes or modules
>> for accessing any of the slcio file contents in ROOT.
>> >>
>> >>     Homer
>> >>
>> >>
>> >>
>> >
>>  > <dst.pdf>
>>
>>
>
>
>

########################################################################
Use REPLY-ALL to reply to list

To unsubscribe from the HPS-SOFTWARE list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=HPS-SOFTWARE&A=1