I agree with what Jeremy says (except I do want the 1D strip clusters in the DST). As for the extra sets of data, here is what I imagine (propose?) we have:

---->  raw datasets in evio format; all events (obviously)
---->  reconstructed dataset (includes all info in the raw dataset) in lcio format; some loose selection is applied (e.g. at least 1 e+ and 1 e- track), so this is not all events…we'll need to develop this selection so as not to throw out interesting events.
---->  what Omar called microDSTs, which I propose are the only DSTs we have…these will be ROOT files and are subsets (both in events and in the information in them) of the recon files, depending on what specific groups/analyses want.
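
To make the three tiers concrete, here is a rough sketch of how the recon and microDST samples relate to the raw one. It is only an illustration: the event layout (`id`, `tracks`, `hits`), the function names, and the momentum cut are all made up for the example, and "e+ / e- track" is approximated by track charge alone. None of it is the actual evio/lcio/ROOT data model.

```python
# Illustrative only: the event fields and the selections below are assumptions,
# not the actual HPS data model or file formats.

def passes_loose_selection(event):
    """Keep events with at least one positive and one negative track."""
    charges = [t["charge"] for t in event["tracks"]]
    return any(q > 0 for q in charges) and any(q < 0 for q in charges)

def make_recon(raw_events):
    """Recon tier: a subset of events, each keeping all of its information."""
    return [ev for ev in raw_events if passes_loose_selection(ev)]

def make_micro_dst(recon_events, keep_event, keep_fields):
    """microDST tier: a subset of the events AND of the information in each."""
    return [
        {name: ev[name] for name in keep_fields}
        for ev in recon_events
        if keep_event(ev)
    ]

# Three toy "raw" events (hypothetical content).
raw = [
    {"id": 1, "tracks": [{"charge": +1, "p": 1.1}, {"charge": -1, "p": 0.9}], "hits": [3, 5, 8]},
    {"id": 2, "tracks": [{"charge": -1, "p": 2.0}], "hits": [1]},
    {"id": 3, "tracks": [{"charge": +1, "p": 0.4}, {"charge": -1, "p": 0.5}], "hits": [2, 4]},
]

recon = make_recon(raw)

# An analysis-specific microDST: tighter event cut, and the hits are dropped.
micro = make_micro_dst(
    recon,
    keep_event=lambda ev: sum(t["p"] for t in ev["tracks"]) > 1.5,
    keep_fields=("id", "tracks"),
)
```

The point of the sketch is that each analysis group would supply its own `keep_event` and `keep_fields`, so every microDST is derived from the same recon sample rather than maintained as an independent dataset.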

Now, we need to get a much better idea of what the size of these samples would be before we decide on a path forward…






On Mar 6, 2013, at 3:10 PM, "McCormick, Jeremy I." <[log in to unmask]> wrote:

> Hi, Homer.
> 
> Thanks for the thoughts.
> 
> My view is that user analysis has three possible pathways which make sense to consider:
> 
> 1) Pure Java analysis using lcsim and outputting histograms to AIDA files, viewable in JAS.
> 
> 2) LCIO/ROOT analysis, reading in the LCIO recon files, looping over these events, and making histograms from a ROOT script.
> 
> 3) Pure ROOT analysis, operating on a ROOT DST file.
> 
> I don't really think that we need a DST containing all of the information which is already present in the final LCIO recon file. This level of duplication is not desirable. Rather, the ROOT DST should contain physics objects only, e.g. the equivalent of LCIO ReconstructedParticles, Tracks, and Clusters, along with event information. This should be sufficient for doing a pure physics analysis, i.e. good enough for most users. It is also likely that it could be represented using simple arrays rather than classes, which to me is desirable for this kind of format.
> 
> If one wants to look at the associated hits of the tracks, or something similarly detailed, then it seems to me that it would be better to use the #1 and #2 approaches, as we can then avoid "reinventing the wheel" by making ROOT files that mimic the structure of the existing LCIO output.  This approach would require working from the LCIO output, but I really don't see a problem there.  It is not onerous at all.  The API is straightforward and well-documented, and examples can be provided.  There is already a simple analysis script in my examples that you linked which plots information from Tracks in an LCIO file using ROOT histogramming.  Similar plots could easily be made for the hits, etc. 
> 
> I suppose one could demand that all this data be put into ROOT including the hits, but you're left with the same problem. Someone still has to learn the API of whatever classes are used to store the data, and the class headers also need to be loaded to interpret the data. Whether that format is LCIO or ROOT, it is essentially the same level of knowledge that would be required. My feeling is actually that this will be more difficult/cumbersome to work with in ROOT than in LCIO. I wonder why we can't just go with what we already have, i.e. the LCIO API, rather than invent something analogous which does not seem to serve a very clear purpose. One can already use what's there in the linked example to look at the full events, so can we start there and see how far we get?
> 
> If someone has a clear use case where pure ROOT data is needed at the lowest level of detail, I would consider this request, but I have seen nothing concrete so far along these lines.
> 
> --Jeremy
> 
> -----Original Message-----
> From: Homer [mailto:[log in to unmask]] 
> Sent: Wednesday, March 06, 2013 2:51 PM
> To: Jaros, John A.; Graham, Mathew Thomas; McCormick, Jeremy I.; Graf, Norman A.; Moreno, Omar; Nelson, Timothy Knight
> Subject: DSTs and work on slcio files using C++
> 
> Hi,
> 
> I decided not to comment during the meeting because it might have created more contention and I also wanted to hear Jeremy's, Norman's and Omar's responses first before throwing this out there. That said, from the point of view of someone who has been doing lcsim SiD analysis on slcio files I find the problems with using the two formats in HPS a little strange. For SiD we take slcio files and then run jet clustering and flavor tagging using C++ code in the lcfi and lcfi+ packages. For the flavor tagging we write out root files for lcfi+ running the TMVA training and then for both the jet clustering and the flavor tagging we write out slcio files. I believe Malachi has done his whole analysis in C++ as a Marlin processor. I had also successfully tested reading slcio files in ROOT using a recipe provided by Jeremy. I dropped using it when I realized that it was quite simple to write the analysis in java. Perhaps one solution is to stick to doing all development, even for the DST, in java/lcsim and to just provide examples of how to access the data from C++/ROOT reading slcio files. Jeremy had documented much of this long ago at:
> 
> https://confluence.slac.stanford.edu/display/hpsg/Loading+LCIO+Files+into+ROOT
> 
> If we just provide some examples, wouldn't that help to at least put out the current fires? This would also avoid having to support numerous extra sets of data (DSTs and microDSTs in both formats with multiple passes and subsets)??
> Maybe I'm wrong but I think one can provide simple recipes or modules for accessing any of the slcio file contents in ROOT.
> 
>    Homer
> 

