HPS-SOFTWARE Archives

HPS-SOFTWARE@LISTSERV.SLAC.STANFORD.EDU


Subject: Re: DSTs and work on slcio files using C++
From: "Nelson, Timothy Knight" <[log in to unmask]>
Reply-To: Software for the Heavy Photon Search Experiment <[log in to unmask]>
Date: Wed, 6 Mar 2013 20:51:52 -0800
Content-Type: multipart/mixed
Parts/Attachments: text/plain (137 lines), winmail.dat (137 lines)

Hi Omar,

I know this stuff gets discussed at the software meeting, but I think some key people (e.g. Stepan and John) maybe aren't up to speed on these things.  I understand that the "micro-DST" API is really a lot simpler and easier to deal with for "end users" doing physics analysis and that's great.  What I'm trying to avoid is having the micro-DST bloat into a largely complete replication of the LCIO, which defeats the purpose of the thing and its unique contribution to our capabilities.  So, I'm trying to add my thoughts on what this micro DST is by first considering what it should not be.

For those doing more than "end user" analysis, the LCIO C++ API isn't exactly opaque and is rather well documented:
http://lcio.desy.de/v02-01-02/doc/doxygen_api/html/index.html

For example, I don't imagine any of us would have much trouble figuring out how to do things with these classes in some C++/ROOT code to study the ECal:
http://lcio.desy.de/v02-01-02/doc/doxygen_api/html/classEVENT_1_1ReconstructedParticle.html
http://lcio.desy.de/v02-01-02/doc/doxygen_api/html/classEVENT_1_1Cluster.html
http://lcio.desy.de/v02-01-02/doc/doxygen_api/html/classEVENT_1_1CalorimeterHit.html
http://lcio.desy.de/v02-01-02/doc/doxygen_api/html/classEVENT_1_1RawCalorimeterHit.html

... so for things that require digging deeper than the micro-DST allows, there is already a solution that allows you to code in C++ (and therefore use all your favorite ROOT and CERN libraries on the data) without creating a format that fully captures all of what is in LCIO.

Tim
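
The "slim" idea argued for above can be illustrated with a minimal sketch. It assumes hypothetical stand-in types (MockHit, MockCluster, MockParticle, MicroParticle) that are not the LCIO classes and not the actual micro-DST layout; the point is only that a micro-DST entry keeps particle-level quantities and a few summaries, while hit-level detail stays in the LCIO file.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for full-reconstruction objects.
// These are NOT the LCIO classes; they only mimic the nesting
// particle -> clusters -> hits for illustration.
struct MockHit { double energy; };
struct MockCluster { std::vector<MockHit> hits; };
struct MockParticle {
    double px, py, pz, energy;
    int charge;
    std::vector<MockCluster> clusters;
};

// Hypothetical micro-DST entry: kinematics plus summary numbers only.
struct MicroParticle {
    double px, py, pz, energy;
    int charge;
    std::size_t nClusters;   // how many clusters were attached
    double clusterEnergy;    // sum of all hit energies in those clusters
};

// Reduce one full particle to its micro-DST summary, dropping the hits.
MicroParticle slim(const MockParticle& p) {
    double sum = 0.0;
    for (const MockCluster& c : p.clusters)
        for (const MockHit& h : c.hits)
            sum += h.energy;
    return MicroParticle{p.px, p.py, p.pz, p.energy, p.charge,
                         p.clusters.size(), sum};
}

// Slim every particle in an event.
std::vector<MicroParticle> slimEvent(const std::vector<MockParticle>& event) {
    std::vector<MicroParticle> out;
    out.reserve(event.size());
    for (const MockParticle& p : event) out.push_back(slim(p));
    return out;
}

// Momentum magnitude computed from the slim record alone.
double momentum(const MicroParticle& m) {
    return std::sqrt(m.px * m.px + m.py * m.py + m.pz * m.pz);
}
```

Anything that needs the individual hits again would go back to the full reconstruction output rather than growing MicroParticle.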

On Mar 6, 2013, at 6:21 PM, Omar Moreno <[log in to unmask]> wrote:

> 
> On Wed, Mar 6, 2013 at 4:29 PM, Nelson, Timothy Knight <[log in to unmask]> wrote:
> Hi Stepan,
> 
> I agree 100%.  I think we want exactly what you proposed a year ago: a format with physics objects suitable for physics analysis (the proposed "micro-DST").  This kind of thing is relatively easy to provide and will be a very useful thing to have.  In fact, the kind of "flat ntuple" format that Omar began with can, I believe, be read in and operated on with PAW, since the .rz format is the same.  However, if he goes the next step, as has been recommended in the software group, and writes classes to the ROOT file that require a dictionary to read back, the data format will be ROOT only.
> 
> A couple of points that are important to understand...
> 
> 1) Homer brings up an important point, which is the fact that the only way we have to write these ROOT files is to use the LCIO C++ API.  That is to say, one does the Java reconstruction in lcsim that creates LCIO objects and writes out an LCIO file.  Then one runs a separate C++ program that reads in the LCIO objects with the LCIO C++ API and outputs this ntuple using ROOT classes.  Therefore, no information that is currently not persisted in the LCIO EDM by our reconstruction will ever be available in the ROOT ntuple.  So, this business of writing out text files for vertexing and other information not currently being written to LCIO does not go away by creating ROOT ntuples.  The only way to eliminate that issue is to improve the completeness of our LCIO-based EDM.  For example, Matt has been writing out vertexing information to text files and reading it back into ROOT.  However, LCIO DOES include vertex objects and if we created these during reconstruction, we would get that information in the LCIO file automatically, and it would then easily be accessible later on via LCIO.  There are a few examples of data types we might want to persist that don't have an LCIO class, but LCIO includes a "Generic Object" class that can be used to encapsulate anything we might want to add.  Again, only by getting the data we want in LCIO will it ever be accessible in ROOT.  So, in my opinion, this is where we should be focusing our attention.
> 
> This is something that was discussed at the software meeting and I believe most of us are aware of the issue.  I have begun working on persisting some of this data using classes such as ReconstructedParticle and Vertex, but it's still in the early stages.  Hopefully, I can make some progress on this soon.
> 
> 2) As far as how to do ROOT-based analysis, Homer again touched on the heart of the matter.  One can create a ROOT ntuple and perform analysis on that.  In practice, this rarely means using ROOT on the command line, or even CINT macros, since ROOT's C interpreter is so badly broken that it is not really usable for anything other than making final plots from already-analyzed data.  In practice, one usually runs some standalone compiled C++ that uses the ROOT libraries to do the analysis on a ROOT DST.  For this reason, it is just as easy to have that compiled C++ use the LCIO C++ API to access the LCIO objects directly from the LCIO DST, and then use all of the familiar ROOT tools in that code to do the analysis, writing out whatever final histograms or post-analysis ntuples one might want into a ROOT file for later plotting.  The only difference is that in the former scenario, one learns the ROOT EDM that we invent for the DST, and in the latter, one learns the LCIO EDM.  To the extent that one is a mirror reflection of the other, one has to do just as much work writing the C++ analysis code either way.  That is why it doesn't make any sense to duplicate the entire LCIO EDM in ROOT (one file for the price of two!) and why we should really only be considering creation of a new ROOT-based "micro-DST" format aimed at physics analysis that will be much slimmer than the LCIO.  Those who need more than is in the "micro-DST" can very easily run their C++/ROOT analysis code accessing the data directly from LCIO using the LCIO C++ API.
> 
> I think it's unfair to compare the API that I would be providing to read the DSTs to the LCIO C++ API.  The API I will be providing should be very simple, and anyone who has some experience working with ROOT TTrees should be able to pick it up very quickly.  It is basically an extension of the example given in the ROOT user's manual, so they would have that example to work with along with all of the examples (including one in Python) I will be providing as well.  Of course, this remains to be seen until people start using the API, and for that to happen we need to start making DSTs available.
> 
> 
> Cheers,
> Tim
> 
> On Mar 6, 2013, at 3:49 PM, Stepan Stepanyan <[log in to unmask]> wrote:
> 
> > Hello Homer and Jeremy,
> >
> > It seems we all have the right ideas, and very similar ones, about
> > how analysis of the data should be done.
> > The confusion, it seems to me, comes from the definitions of "analysis"
> > and "DSTs". When I brought up the question of DSTs about a year ago,
> > and even sent out a possible format (attached document), I basically
> > wanted what Jeremy said in the second sentence after (3): physics
> > objects only. What Omar showed today was very different from what I
> > would describe as DSTs. I understand Matt's point that in some
> > cases you will need fine details, but I am not sure everyone will
> > need that level of detail.
> > So I still think that if we are talking about DSTs, the format should
> > be "physics objects only". And if Omar can make use of what I proposed
> > a year ago, that will be great.
> >
> > As for general analysis, if we stick with (1), then we will make a
> > large number of collaborators who are used to doing analysis in ROOT
> > quite unhappy. I understand that duplicating processed data in many
> > formats is also not a reasonable approach. So, if (2) means (sorry for
> > my ignorance) we can have some kind of "portal" that can connect the
> > LCIO recon file to ROOT, then it is probably the best way to go.
> >
> > Again, sorry if I am misinterpreting the issue and/or repeating what was
> > already clear from your emails.
> >
> > Regards, Stepan
> >
> > On 3/6/13 6:10 PM, McCormick, Jeremy I. wrote:
> >> Hi, Homer.
> >>
> >> Thanks for the thoughts.
> >>
> >> My view is that user analysis has three possible pathways which make sense to consider:
> >>
> >> 1) Pure Java analysis using lcsim and outputting histograms to AIDA files, viewable in JAS.
> >>
> >> 2) LCIO/ROOT analysis, reading in the LCIO recon files, looping over these events, and making histograms from a ROOT script.
> >>
> >> 3) Pure ROOT analysis, operating on a ROOT DST file.
> >>
> >> I don't really think that we need a DST containing all of the information which is already present in the final LCIO recon file.  This level of duplication is not desirable.  Rather, the ROOT DST should contain physics objects only, e.g. the equivalent of LCIO ReconstructedParticles, Tracks, and Clusters, along with event information.  This should be sufficient for doing a pure physics analysis, i.e. good enough for most users.  It is also likely that it could be represented using simple arrays rather than classes, which to me is desirable for this kind of format.
> >>
> >> If one wants to look at the associated hits of the tracks, or something similarly detailed, then it seems to me that it would be better to use the #1 and #2 approaches, as we can then avoid "reinventing the wheel" by making ROOT files that mimic the structure of the existing LCIO output.  This approach would require working from the LCIO output, but I really don't see a problem there.  It is not onerous at all.  The API is straightforward and well-documented, and examples can be provided.  There is already a simple analysis script in my examples that you linked which plots information from Tracks in an LCIO file using ROOT histogramming.  Similar plots could easily be made for the hits, etc.
> >>
> >> I suppose one could demand that all this data be put into ROOT including the hits, but you're left with the same problem.  Someone still has to learn the API of whatever classes are used to store the data, and the class headers also need to be loaded to interpret the data.  Whether that format is LCIO or ROOT, it is essentially the same level of knowledge that would be required.  My feeling is actually that this will be more difficult/cumbersome to work with in ROOT rather than LCIO.  I wonder why we can't just go with what we already have, i.e. the LCIO API, rather than invent something analogous which does not seem to serve a very clear purpose.  One can already use what's there in the linked example to look at the full events, so can we start there and see how far we get?
> >>
> >> If someone has a clear use case where pure ROOT data is needed at the lowest level of detail, I would consider this request, but I have seen nothing concrete so far along these lines.
> >>
> >> --Jeremy
> >>
> >> -----Original Message-----
> >> From: Homer [mailto:[log in to unmask]]
> >> Sent: Wednesday, March 06, 2013 2:51 PM
> >> To: Jaros, John A.; Graham, Mathew Thomas; McCormick, Jeremy I.; Graf, Norman A.; Moreno, Omar; Nelson, Timothy Knight
> >> Subject: DSTs and work on slcio files using C++
> >>
> >> Hi,
> >>
> >> I decided not to comment during the meeting because it might have created more contention and I also wanted to hear Jeremy's, Norman's and Omar's responses first before throwing this out there. That said, from the point of view of someone who has been doing lcsim SiD analysis on slcio files, I find the problems with using the two formats in HPS a little strange. For SiD we take slcio files and then run jet clustering and flavor tagging using C++ code in the lcfi and lcfi+ packages. For the flavor tagging we write out ROOT files for lcfi+ running the TMVA training, and then for both the jet clustering and the flavor tagging we write out slcio files. I believe Malachi has done his whole analysis in C++ as a Marlin processor. I had also successfully tested reading slcio files in ROOT using a recipe provided by Jeremy. I dropped using it when I realized that it was quite simple to write the analysis in Java. Perhaps one solution is to stick to doing all development, even for the DST, in Java/lcsim and to just provide examples of how to access the data from C++/ROOT reading slcio files. Jeremy had documented much of this long ago at:
> >>
> >> https://confluence.slac.stanford.edu/display/hpsg/Loading+LCIO+Files+into+ROOT
> >>
> >> If we just provide some examples, wouldn't that help to at least put out the current fires? This would also avoid having to support numerous extra sets of data (DSTs and microDSTs in both formats, with multiple passes and subsets). Maybe I'm wrong, but I think one can provide simple recipes or modules for accessing any of the slcio file contents in ROOT.
> >>
> >>     Homer
> >>
> >>
> >> ########################################################################
> >> Use REPLY-ALL to reply to list
> >>
> >> To unsubscribe from the HPS-SOFTWARE list, click the following link:
> >> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=HPS-SOFTWARE&A=1
> >
> > <dst.pdf>
> 
> 


