HPS-SOFTWARE Archives


HPS-SOFTWARE@LISTSERV.SLAC.STANFORD.EDU


HPS-SOFTWARE  March 2013

Subject: Re: DSTs and work on slcio files using C++
From: Stepan Stepanyan <[log in to unmask]>
Reply-To: Software for the Heavy Photon Search Experiment <[log in to unmask]>
Date: Thu, 7 Mar 2013 07:21:37 -0500
Content-Type: text/plain


Hello Tim and Omar,

I am sure there is some unnecessary "weight" we can lose here.
It will be good to know exactly what the final event size is before
the proposal deadline.

Thanks, Stepan

On 3/6/13 11:55 PM, Nelson, Timothy Knight wrote:
> If, at the moment, we are keeping all six samples of APV25 readout for each hit, we almost certainly have files that are oversized by a significant factor.
>
> Tim
>
> On Mar 6, 2013, at 8:47 PM, Omar Moreno<[log in to unmask]>  wrote:
>
>> I don't think the issue is with the LCIO overhead; I think we are just storing a lot of collections we can do without.  For example, at the moment the final reconstructed LCIO file contains a collection of all raw tracker hits, a collection of all fitted raw tracker hits, and a collection containing all of the fit parameters from the fit to the six samples of each of the hits.  This is a lot of extra information that we probably don't want to save in the final recon LCIO.
>>
>> I took a quick look at the contents of the reconstructed LCIO file and found that the noisy channels weren't being filtered, which explains the huge file size.  I went ahead and reran the reconstruction, this time filtering noisy channels, and the file size dropped to 5 GB.  It is still a bit large, but I'm sure we can bring the size down quite a bit once we remove all these extra unneeded collections.
>>
>> --Omar Moreno
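The slimming described in this message (dropping whole collections that the final file does not need, and filtering hits on noisy channels) can be sketched in a few lines of Python. This is only a toy model: events are plain dictionaries, and the collection names, the `channel` key, and the `slim_event` helper are hypothetical stand-ins, not the LCIO or lcsim API.

```python
def slim_event(event, keep_collections, noisy_channels):
    """Return a copy of `event` holding only the wanted collections,
    with hits on noisy channels removed."""
    slimmed = {}
    for name, hits in event.items():
        if name not in keep_collections:
            continue  # drop whole collections the final file does not need
        slimmed[name] = [h for h in hits if h["channel"] not in noisy_channels]
    return slimmed


# Hypothetical event: collection name -> list of hits.
event = {
    "RawTrackerHits": [{"channel": 1}, {"channel": 7}, {"channel": 7}],
    "FittedRawTrackerHits": [{"channel": 1}, {"channel": 7}],
    "ShapeFitParameters": [{"channel": 1}, {"channel": 7}],
}

slimmed = slim_event(
    event,
    keep_collections={"FittedRawTrackerHits"},
    noisy_channels={7},
)
print(slimmed)  # {'FittedRawTrackerHits': [{'channel': 1}]}
```

Both steps reduce the stored data independently, which is why the noisy-channel filter alone still left a large file in the numbers quoted here.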
>>
>>
>> On Wed, Mar 6, 2013 at 7:54 PM, Stepan Stepanyan<[log in to unmask]>  wrote:
>> Hi Omar,
>>
>> Thanks for the quick response. It will be very important to know
>> what the final size of the reconstructed event really is. The number you
>> have is x10 larger than the original event size. In the proposal we have a
>> x5 inflation of the event size after reconstruction. At the meeting today
>> Matt explained that the number in the proposal was not well motivated, but
>> what you have seems like a good motivation. Even with removal of the FPGA
>> data I am not sure the size will go down by x10, or by the x50 that we
>> probably want. Is this large size due to the overhead of the LCIO format?
>>
>> We are having these discussions about formats and analysis, and I think event
>> size will play an important role in them. I do not think analysis
>> of the data that HPS will get can be done on events that are x10 or even
>> x5 larger than the original event.
>>
>> Regards, Stepan
>>
>>
>> On 3/6/13 10:18 PM, Omar Moreno wrote:
>>> Stepan,
>>>
>>> The original EVIO file is 1.5 GB, but I only ran reconstruction on half the file.  There is a lot of extra information being stored in the final reconstructed LCIO file, such as FPGA data, that should be removed, so I'm sure that the file size is a bit inflated.  I'm sure once we filter out junk events and remove some unnecessary collections the file size will decrease significantly.
>>>
>>> --Omar Moreno
>>>
>>>
>>> On Wed, Mar 6, 2013 at 7:05 PM, Stepan Stepanyan<[log in to unmask]>  wrote:
>>> Omar,
>>>
>>> How big is the original file, before reconstruction?
>>>
>>> Thanks, Stepan
>>>
>>>
>>> On 3/6/13 9:03 PM, Omar Moreno wrote:
>>>> Hello Everyone,
>>>>
>>>> Just to give everyone an idea, a micro DST with basic track information, hit information and Ecal cluster info is approx. 29 MB per 500,000 test run events.  The reconstructed LCIO file used to generate the ROOT file was approx. 5.4 GB and it took about 4 minutes.  I expect the size to increase for data from an electron run but it shouldn't be by much.  I'll go ahead and study this using MC data and see how much bigger the file gets.
>>>>
>>>> --Omar Moreno
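For orientation, the figures quoted in this thread can be turned into per-event sizes and ratios. The sketch below is back-of-the-envelope arithmetic only and rests on several assumptions: "Mb"/"Gb" are read as megabytes/gigabytes in decimal units, the 5.4 GB reconstructed file is taken to hold the same 500,000 events as the micro DST, and the input to reconstruction is taken to be exactly half of the 1.5 GB EVIO file. These assumptions may not match how the x10 figure mentioned in the thread was derived.

```python
GB = 1e9  # decimal units assumed
MB = 1e6

n_events = 500_000                # test run events in the micro DST
micro_dst_bytes = 29 * MB         # micro DST size
recon_lcio_bytes = 5.4 * GB       # reconstructed LCIO file, before noise filtering
evio_input_bytes = 1.5 * GB / 2   # reconstruction ran on half of the EVIO file

micro_dst_per_event = micro_dst_bytes / n_events
recon_per_event = recon_lcio_bytes / n_events
inflation = recon_lcio_bytes / evio_input_bytes   # recon output vs. raw input
reduction = recon_lcio_bytes / micro_dst_bytes    # recon output vs. micro DST

print(f"micro DST : {micro_dst_per_event:.0f} bytes/event")
print(f"recon LCIO: {recon_per_event:.0f} bytes/event")
print(f"recon / EVIO input : x{inflation:.1f}")
print(f"recon / micro DST  : x{reduction:.0f}")
```

Under these assumptions the micro DST comes to about 58 bytes per event and the reconstructed file to about 10.8 kB per event, an inflation of roughly x7 over the raw input and a reduction of roughly x186 going from the reconstructed file to the micro DST.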
>>>>
>>>>
>>>> On Wed, Mar 6, 2013 at 4:29 PM, Nelson, Timothy Knight<[log in to unmask]>  wrote:
>>>> Hi Stepan,
>>>>
>>>> I agree 100%.  I think we want exactly what you proposed a year ago; a format with physics objects suitable for physics analysis (the proposed "micro-DST").  This kind of thing is relatively easy to provide and will be a very useful thing to have.  In fact, the kind of "flat ntuple" format that Omar began with can, I believe, be read in and operated on with PAW, since the .rz format is the same.  However, if he goes the next step as has been recommended in the software group, and writes classes to the ROOT file that require a dictionary to read back, the data format will be ROOT only.
>>>>
>>>> A couple of points that are important to understand...
>>>>
>>>> 1) Homer brings up an important point, which is the fact that the only way we have to write these ROOT files is to use the LCIO C++ API.  That is to say, one does the java reconstruction in lcsim that creates LCIO objects and writes out an LCIO file.  Then one runs a separate C++ program that reads in the LCIO objects with the LCIO C++ API and outputs this NTuple using root classes. Therefore, no information that is currently not persisted in the LCIO EDM by our reconstruction will ever be available in the ROOT Ntuple.  So, this business of writing out text files for vertexing and other information not currently being written to LCIO does not go away by creating ROOT Ntuples.  The only way to eliminate that issue is to improve the completeness of our LCIO-based EDM.  For example, Matt has been writing out vertexing information to text files and reading it back into ROOT.  However, LCIO DOES include vertex objects and if we created these during reconstruction, we would get that information in the LCIO file automatically, and it would then easily be accessible later on via LCIO.  There are a few examples of data types we might want to persist that don't have an LCIO class, but LCIO includes a "Generic Object" class that can be used to encapsulate anything we might want to add.  Again, only by getting the data we want in LCIO will it ever be accessible in ROOT.  So, in my opinion, this is where we should be focusing our attention.
>>>>
>>>> 2) As far as how to do ROOT-based analysis, Homer again touched on the heart of the matter.  One can create a ROOT Ntuple and perform analysis on that.  In practice, this rarely means using ROOT on the command line, or even CINT macros, since ROOT's C interpreter is so badly broken that it is not really usable for anything other than making final plots from already-analyzed data.  In practice, one usually runs some standalone compiled C++ that uses the ROOT libraries to do the analysis on a ROOT DST.  For this reason, it is just as easy to have that compiled C++ use the LCIO C++ API to access the LCIO objects directly from the LCIO DST, and then use all of the familiar ROOT tools in that code to do the analysis, writing out whatever final histograms or post-analysis ntuples one might want into a ROOT file for later plotting.  The only difference is that in the former scenario, one learns the ROOT EDM that we invent for the DST, and for the latter, one learns the LCIO EDM.  To the extent that one is a mirror reflection of the other, one has to do just as much work writing the C++ analysis code either way.  That is why it doesn't make any sense to duplicate the entire LCIO EDM in ROOT (one file for the price of two!) and why we should really only be considering creation of a new ROOT-based "micro-DST" format aimed at physics analysis that will be much slimmer than the LCIO.  Those that need more than is in the "micro-DST" can very easily run their C++/ROOT analysis code accessing the data directly from LCIO using the LCIO C++ API.
>>>>
>>>> Cheers,
>>>> Tim
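As an illustration of point 1, the sketch below shows the idea behind encapsulating a quantity that has no dedicated class in a generic container of typed values, so that it is stored with the event and can be unpacked by a later reader. It is a toy model in Python, not the LCIO API: `GenericRecord`, `pack_shape_fit`, `unpack_shape_fit`, the collection names, and the choice of fields (channel, amplitude, t0, chi2) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GenericRecord:
    """Toy generic container: nothing but lists of ints and doubles."""
    ints: List[int] = field(default_factory=list)
    doubles: List[float] = field(default_factory=list)


def pack_shape_fit(channel, amplitude, t0, chi2):
    # Which slot holds which quantity is a convention that the writer
    # and every reader must share.
    return GenericRecord(ints=[channel], doubles=[amplitude, t0, chi2])


def unpack_shape_fit(record):
    amplitude, t0, chi2 = record.doubles
    return {"channel": record.ints[0], "amplitude": amplitude,
            "t0": t0, "chi2": chi2}


# An event is modeled as: collection name -> list of stored objects.
event = {
    "ShapeFitParameters": [
        pack_shape_fit(channel=42, amplitude=1250.0, t0=12.5, chi2=3.1),
    ],
}

# A downstream reader can only recover what was stored in the event.
fits = [unpack_shape_fit(r) for r in event.get("ShapeFitParameters", [])]
vertices = event.get("Vertices", [])  # never written, so nothing comes back
print(fits)
print(vertices)
```

The empty `vertices` list is the point of the example: anything that reconstruction did not write into the event cannot reappear in a file derived from it.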
>>>>
>>>> On Mar 6, 2013, at 3:49 PM, Stepan Stepanyan<[log in to unmask]>  wrote:
>>>>
>>>>> Hello Homer and Jeremy,
>>>>>
>>>>> It seems we all have the right ideas, and very similar ideas, on
>>>>> how analysis of the data must be done.
>>>>> The confusion, it looks to me, comes from the definitions of "analysis"
>>>>> and "DST". When I brought up the question of DSTs about a year ago, and
>>>>> even sent out a possible format (attached document), I basically
>>>>> wanted what Jeremy said in the second sentence after (3): physics
>>>>> objects only. What Omar showed today was very different from what I
>>>>> would describe as a DST. I understand Matt's point that in some
>>>>> cases you will need fine details, but I am not sure everyone will
>>>>> need that level of detail.
>>>>> So I still think that if we are talking about DSTs, the format should be
>>>>> "physics objects only". And if Omar can make use of what I proposed
>>>>> a year ago, that will be great.
>>>>>
>>>>> As for general analysis, if we stick with (1), then we will make a large
>>>>> number of collaborators who are used to doing analysis in ROOT quite
>>>>> unhappy. I understand that duplicating processed data in many
>>>>> formats is also not a reasonable approach. So, if (2) means (sorry for
>>>>> my ignorance) we can have some kind of "portal" that can connect the
>>>>> LCIO recon file to ROOT, then it is probably the best way to go.
>>>>>
>>>>> Again, sorry if I am misinterpreting the issue and/or repeating what was
>>>>> already clear from your emails.
>>>>>
>>>>> Regards, Stepan
>>>>>
>>>>> On 3/6/13 6:10 PM, McCormick, Jeremy I. wrote:
>>>>>> Hi, Homer.
>>>>>>
>>>>>> Thanks for the thoughts.
>>>>>>
>>>>>> My view is that user analysis has three possible pathways which make sense to consider:
>>>>>>
>>>>>> 1) Pure Java analysis using lcsim and outputting histograms to AIDA files, viewable in JAS.
>>>>>>
>>>>>> 2) LCIO/ROOT analysis, reading in the LCIO recon files, looping over these events, and making histograms from a ROOT script.
>>>>>>
>>>>>> 3) Pure ROOT analysis, operating on a ROOT DST file.
>>>>>>
>>>>>> I don't really think that we need a DST containing all of the information which is already present in the final LCIO recon file.  This level of duplication is not desirable.  Rather, the ROOT DST should contain physics objects only, e.g. the equivalent of LCIO ReconstructedParticles, Tracks, and Clusters, along with event information.  This should be sufficient for doing a pure physics analysis, e.g. good enough for most users.  It is also likely that it could be represented using simple arrays rather than classes, which to me is desirable for this kind of format.
>>>>>>
>>>>>> If one wants to look at the associated hits of the tracks, or something similarly detailed, then it seems to me that it would be better to use the #1 and #2 approaches, as we can then avoid "reinventing the wheel" by making ROOT files that mimic the structure of the existing LCIO output.  This approach would require working from the LCIO output, but I really don't see a problem there.  It is not onerous at all.  The API is straightforward and well-documented, and examples can be provided.  There is already a simple analysis script in my examples that you linked which plots information from Tracks in an LCIO file using ROOT histogramming.  Similar plots could easily be made for the hits, etc.
>>>>>>
>>>>>> I suppose one could demand that all this data be put into ROOT including the hits, but you're left with the same problem.  Someone still has to learn the API of whatever classes are used to store the data, and the class headers also need to be loaded to interpret the data.  Whether that format is LCIO or ROOT, it is essentially the same level of knowledge that would be required.  My feeling is actually that this will be more difficult/cumbersome to work with in ROOT rather than LCIO.  I wonder why we can't just go with what we already have, e.g. the LCIO API, rather than invent something analogous which does not seem to serve a very clear purpose.  One can already use what's there in the linked example to look at the full events, so can we start there and see how far we get?
>>>>>>
>>>>>> If someone has a clear use case where pure ROOT data is needed at the lowest level of detail, I would consider this request, but I have seen nothing concrete so far along these lines.
>>>>>>
>>>>>> --Jeremy
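To make the "simple arrays rather than classes" idea concrete, here is a minimal Python sketch of a flat, column-style layout for physics objects: one counter per event plus parallel arrays of track quantities. It uses only the standard-library `array` module rather than ROOT, and the column names (`n_tracks`, `track_px`, ...) and helper functions are assumptions made for illustration.

```python
from array import array


def new_columns():
    return {
        "n_tracks": array("i"),  # one entry per event
        "track_px": array("d"),  # one entry per track, all events concatenated
        "track_py": array("d"),
        "track_pz": array("d"),
    }


def fill(columns, tracks):
    """Append one event; `tracks` is a list of (px, py, pz) tuples."""
    columns["n_tracks"].append(len(tracks))
    for px, py, pz in tracks:
        columns["track_px"].append(px)
        columns["track_py"].append(py)
        columns["track_pz"].append(pz)


def events(columns):
    """Yield each event's tracks back as a list of (px, py, pz) tuples."""
    start = 0
    for n in columns["n_tracks"]:
        stop = start + n
        yield list(zip(columns["track_px"][start:stop],
                       columns["track_py"][start:stop],
                       columns["track_pz"][start:stop]))
        start = stop


columns = new_columns()
fill(columns, [(0.1, 0.0, 1.5), (-0.1, 0.02, 0.7)])
fill(columns, [])                    # an event with no tracks
fill(columns, [(0.0, 0.05, 2.0)])

for i, tracks in enumerate(events(columns)):
    print(i, tracks)
```

A reader of such a layout needs only the column names and the convention that `n_tracks` delimits each event, with no class definitions to load.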
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Homer [mailto:[log in to unmask]]
>>>>>> Sent: Wednesday, March 06, 2013 2:51 PM
>>>>>> To: Jaros, John A.; Graham, Mathew Thomas; McCormick, Jeremy I.; Graf, Norman A.; Moreno, Omar; Nelson, Timothy Knight
>>>>>> Subject: DSTs and work on slcio files using C++
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I decided not to comment during the meeting because it might have created more contention and I also wanted to hear Jeremy's, Norman's and Omar's responses first before throwing this out there. That said, from the point of view of someone who has been doing lcsim SiD analysis on slcio files I find the problems with using the two formats in HPS a little strange. For SiD we take slcio files and then run jet clustering and flavor tagging using C++ code in the lcfi and lcfi+ packages. For the flavor tagging we write out root files for lcfi+ running the TMVA training and then for both the jet clustering and the flavor tagging we write out slcio files. I believe Malachi has done his whole analysis in C++ as a Marlin processor. I had also successfully tested reading slcio files in ROOT using a recipe provided by Jeremy. I dropped using it when I realized that it was quite simple to write the analysis in java. Perhaps one solution is to stick to doing all development, even for the DST, in java/lcsim and to just provide examples of how to access the data from C++/ROOT reading slcio files. Jeremy had documented much of this long ago at:
>>>>>>
>>>>>> https://confluence.slac.stanford.edu/display/hpsg/Loading+LCIO+Files+into+ROOT
>>>>>>
>>>>>> If we just provide some examples, wouldn't that help to at least put out the current fires? This would also avoid having to support numerous extra sets of data (DSTs and microDSTs in both formats with multiple passes and subsets)??
>>>>>> Maybe I'm wrong but I think one can provide simple recipes or modules for accessing any of the slcio file contents in ROOT.
>>>>>>
>>>>>>      Homer
>>>>>>
>>>>>>
>>>>>> ########################################################################
>>>>>> Use REPLY-ALL to reply to list
>>>>>>
>>>>>> To unsubscribe from the HPS-SOFTWARE list, click the following link:
>>>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=HPS-SOFTWARE&A=1
>>>>> <dst.pdf>
>>>>
>>>>
>>>>
>>>
>>>
>

