HPS-SOFTWARE Archives


HPS-SOFTWARE@LISTSERV.SLAC.STANFORD.EDU



HPS-SOFTWARE December 2014

Subject: Re: FW: cleaning up the ECAL clustering code
From: "McCormick, Jeremy I." <[log in to unmask]>
Reply-To: Software for the Heavy Photon Search Experiment <[log in to unmask]>
Date: Wed, 17 Dec 2014 00:51:51 +0000
Content-Type: text/plain


Hi, Kyle.

Thanks for the response. This is a very good discussion, and you make valid points.

I have already written a Driver that can run any clustering algorithm which implements the Clusterer interface, and it is working on several of the simpler algorithms (EcalClusterICBasic and the legacy test run algorithm).

ecal-recon/src/main/java/org/hps/recon/ecal/cluster/ClusterDriver.java

This class implements the common Driver API for clustering and includes...

-Setting up the detector- and conditions-related references in detectorChanged.

-Setting of input hit and output cluster collection names.

-Creating the clustering algorithm object from a factory class.

-Flag to indicate whether empty cluster collections should be created and added to the event.

-Flag to raise an error if the input hit collection is not found.

-Flag to skip events where no clusters are created.

-Flag to indicate whether clusters should be stored to the event or marked as transient.

-Setting of numerical cuts in a double array that are passed to the Clusterer via the Driver's setCuts method.

-Default initialization of the Clusterer in startOfData.

-Logging of its configuration and event-processing information to a logger.

The Driver implements a default process method that performs clustering according to the above parameters.

The Clusterer API has a createClusters method, as well as methods for setting and getting numerical cuts by index or by name. (You can have an empty array if there are no cuts.)
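A minimal sketch of what such an interface could look like is below. It is not the actual hps-java Clusterer code: Hit, Cluster, AbstractClusterer, SeedThresholdClusterer and the cut name "seedEnergyThreshold" are hypothetical stand-ins, and the event is passed as a plain Object so the example compiles without LCSim. What it shows is the contract: hits in, clusters out, with cuts reachable by index (as a Driver passing a double array would use them) or by name. An algorithm with no cuts would simply pass empty arrays to the base constructor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal hit and cluster types standing in for the real LCSim/HPS classes.
final class Hit {
    final int id;
    final double energy;
    Hit(int id, double energy) { this.id = id; this.energy = energy; }
}

final class Cluster {
    final List<Hit> hits = new ArrayList<>();
    double energy() {
        double sum = 0.0;
        for (Hit h : hits) sum += h.energy;
        return sum;
    }
}

// The "contract": hits in, clusters out, plus generic cut access by index or by name.
interface Clusterer {
    List<Cluster> createClusters(Object event, List<Hit> hits);
    void setCuts(double[] values);
    double[] getCuts();
    double getCut(int index);
    double getCut(String name);
    void setCut(String name, double value);
    String[] getCutNames();
}

// Shared bookkeeping for cut names and values; concrete algorithms only implement createClusters.
abstract class AbstractClusterer implements Clusterer {
    private final String[] cutNames;
    private final double[] cuts;

    protected AbstractClusterer(String[] cutNames, double[] defaultCuts) {
        if (cutNames.length != defaultCuts.length) {
            throw new IllegalArgumentException("cut names and defaults differ in length");
        }
        this.cutNames = cutNames.clone();
        this.cuts = defaultCuts.clone();
    }

    public void setCuts(double[] values) {
        if (values.length != cuts.length) {
            throw new IllegalArgumentException("expected " + cuts.length + " cuts, got " + values.length);
        }
        System.arraycopy(values, 0, cuts, 0, cuts.length);
    }

    public double[] getCuts() { return cuts.clone(); }
    public double getCut(int index) { return cuts[index]; }
    public double getCut(String name) { return cuts[indexOf(name)]; }
    public void setCut(String name, double value) { cuts[indexOf(name)] = value; }
    public String[] getCutNames() { return cutNames.clone(); }

    private int indexOf(String name) {
        int i = Arrays.asList(cutNames).indexOf(name);
        if (i < 0) throw new IllegalArgumentException("unknown cut: " + name);
        return i;
    }
}

// Toy algorithm (not one of the real HPS clusterers): each hit above a seed threshold
// becomes a one-hit cluster.
class SeedThresholdClusterer extends AbstractClusterer {
    SeedThresholdClusterer() {
        super(new String[] {"seedEnergyThreshold"}, new double[] {0.1});
    }

    public List<Cluster> createClusters(Object event, List<Hit> hits) {
        List<Cluster> clusters = new ArrayList<>();
        double threshold = getCut("seedEnergyThreshold");
        for (Hit h : hits) {
            if (h.energy > threshold) {
                Cluster c = new Cluster();
                c.hits.add(h);
                clusters.add(c);
            }
        }
        return clusters;
    }
}
```

Keeping the cut bookkeeping in one abstract class means each algorithm declares only its cut names and defaults, and a Driver can configure any of them from a plain double array without knowing which algorithm it holds.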

"This seems to me like it is greatly complicating the clustering code."

What I've described is perhaps more complicated than implementing clustering directly via the Driver API, in that it is another level of abstraction, but it allows for far more flexibility and ease of use, and it is better organized. It allows one to easily swap out a clustering algorithm by changing one argument to a Driver (the Clusterer name), rather than needing to understand exactly what a certain clustering Driver has implemented in its API (e.g. collection names, various flags, etc.). If specialized behavior is needed then a specific Driver can be written to wrap that algorithm. Having a common Driver that can run all clustering algorithms also easily allows debugging to be done in an organized/standard way.
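To illustrate swapping algorithms by changing one argument, here is a hypothetical sketch of a generic Driver that looks up its algorithm by name and applies the common flags listed above (creating empty output collections, raising an error on a missing hit collection, skipping events with no clusters). It does not use the real LCSim Driver or EventHeader classes: hits are modeled as energies, the event as a name-to-object map, and the registered algorithm names and collection names are invented for the example. The transient/persistent flag is left out because it has no meaning outside LCSim.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for the Clusterer contract: a hit is just an energy,
// and a cluster is a list of hit energies.
interface SimpleClusterer {
    List<List<Double>> createClusters(List<Double> hits);
}

// Hypothetical factory: algorithms are registered under a name, so changing
// the algorithm only means changing one string.
final class ClustererFactory {
    private static final Map<String, Supplier<SimpleClusterer>> REGISTRY = new HashMap<>();

    static {
        // Toy algorithm: every hit becomes its own cluster.
        REGISTRY.put("OneHitPerCluster", () -> hits -> {
            List<List<Double>> clusters = new ArrayList<>();
            for (Double hit : hits) {
                List<Double> cluster = new ArrayList<>();
                cluster.add(hit);
                clusters.add(cluster);
            }
            return clusters;
        });
        // Toy algorithm: all hits in the event are merged into one cluster.
        REGISTRY.put("SingleCluster", () -> hits -> {
            List<List<Double>> clusters = new ArrayList<>();
            if (!hits.isEmpty()) {
                clusters.add(new ArrayList<>(hits));
            }
            return clusters;
        });
    }

    static SimpleClusterer create(String name) {
        Supplier<SimpleClusterer> supplier = REGISTRY.get(name);
        if (supplier == null) {
            throw new IllegalArgumentException("unknown clusterer: " + name);
        }
        return supplier.get();
    }
}

// Hypothetical generic driver holding the flags that every algorithm shares.
final class SketchClusterDriver {
    String inputHitCollectionName = "InputHits";
    String outputClusterCollectionName = "OutputClusters";
    boolean createEmptyClusterCollection = true;
    boolean raiseErrorNoHitCollection = false;
    boolean skipNoClusterEvents = false;
    private SimpleClusterer clusterer = ClustererFactory.create("OneHitPerCluster");

    void setClustererName(String name) {
        clusterer = ClustererFactory.create(name);
    }

    /** Clusters one event; returns false if the event should be skipped downstream. */
    @SuppressWarnings("unchecked")
    boolean process(Map<String, Object> event) {
        List<Double> hits = (List<Double>) event.get(inputHitCollectionName);
        if (hits == null) {
            if (raiseErrorNoHitCollection) {
                throw new IllegalStateException("missing hit collection: " + inputHitCollectionName);
            }
            return true; // nothing to cluster; let the event pass untouched
        }
        List<List<Double>> clusters = clusterer.createClusters(hits);
        if (!clusters.isEmpty() || createEmptyClusterCollection) {
            event.put(outputClusterCollectionName, clusters);
        }
        return !(skipNoClusterEvents && clusters.isEmpty());
    }
}
```

In this sketch, switching algorithms is just `driver.setClustererName("SingleCluster")`. None of the flag handling has to be reimplemented per algorithm, which is the organizational benefit being argued for.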

Also, the algorithms are not different in terms of their input and output. They all take a list of hits and turn them into a collection of clusters. I have also included a reference to the event in the API method, as this might be useful or required for some algorithms. All the algorithms we have now also include cuts. So to my mind it is useful to have an API that actually implements this "contract". And the clustering algorithms themselves should not need to know about all the Driver-related parameters described above, which can be configured in common.

"This same argument applies to making a utility class. etc."

There are only a few utility methods I have pulled out so far, but I think this can be greatly expanded. As for code duplication, I don't know for sure yet, as this requires going over each algorithm in detail. I would really like to pull out the energy corrections code from EcalClusterIC, which I've already discussed with Holly a bit (I think this is in progress, actually).

--Jeremy

-----Original Message-----
From: Kyle McCarty [mailto:[log in to unmask]]
Sent: Tuesday, December 16, 2014 4:13 PM
To: McCormick, Jeremy I.
Cc: hps-software
Subject: Re: FW: cleaning up the ECAL clustering code

Hello Jeremy,

I wanted to respond to your proposal in JIRA, but JIRA comments are not really good for detailed or long discussions, so I am putting it here instead. Just so the software list doesn't have to reference JIRA to know what I am talking about, let me copy your proposal here.


* Move clustering related classes and Drivers to recon.ecal.cluster. (This is already started.)
* Create a Clusterer API which implements only the clustering algorithms. It would not extend Driver. It should have an API for generically setting cuts and their names, so that a Driver can easily configure it from an input array of doubles.
* Convert existing clustering algorithms from Drivers to extension classes of the new Clusterer interface and its abstract implementation.
* Add a generic ClusterDriver that is able to load different clusterer algorithms from a name string.
* Add specific clustering Drivers if they are needed to wrap certain clusterers (e.g. if the default behavior of ClusterDriver needs to be overridden).
* Add a ClusterUtilities class with common static utility methods that are used in many different clusterers.
* Try to remove as much as possible code duplication between the different types of clusterers (GTP, CTP, IC, cosmic, etc.).
* Move cosmic clustering Drivers from analysis.ecal.cosmic to the new ecal-recon clustering package.
* Fully document each of the clustering algorithms, and include relevant links in the javadoc. (Such as links to CLAS or HPS notes, etc.)



I will respond to several of these points below:


* Move clustering related classes and Drivers to recon.ecal.cluster. (This is already started.)

I think that this is entirely reasonable. Grouping the clustering code together can make the package less messy.


* Add a ClusterUtilities class with common static utility methods that are used in many different clusterers.
* Try to remove as much as possible code duplication between the different types of clusterers (GTP, CTP, IC, cosmic, etc.).

For this, I think we need to have a serious look at the individual clustering algorithms and see how much shared code actually exists between them. I do think that the two GTP algorithms can be abstracted a lot more, and I actually intended to do that once I finish the one that is in progress. However, I'm not sure that there is really a lot of overlap between the remaining clustering algorithms. Obviously they all have "setHitCollectionName" or something along those lines, but what else do they have? Most of them likely have seed energy cuts, but does the cosmic clustering algorithm even have that? Before we start abstracting all clustering algorithms, we will need to compile a list of things that they all have in common and make sure that making an abstract class would actually cut down on the code and not just produce a new class with very few methods. If there is a lot of overlap, I agree that keeping all code centralized is useful, because it means that updates or fixes can be propagated to all affected classes without having to manually make sure that they align, but I'm not sure that there is enough overlap for this to work.


This same argument applies to making a utility class. We need to make sure that there is enough overlap to justify it.


* Move cosmic clustering Drivers from analysis.ecal.cosmic to the new ecal-recon clustering package.

This would make sense. All the clustering code should probably be kept together.


* Fully document each of the clustering algorithms, and include relevant links in the JavaDoc. (Such as links to CLAS or HPS notes, etc.)

I always support thorough documentation, so I am all for this. Just let me know if I need to add anything to mine. I agree with Holly's statement on JIRA that it should probably be the code developer's job to actually write the documentation.



I grouped the last few points together.


* Create a Clusterer API which implements only the clustering algorithms. It would not extend Driver. It should have an API for generically setting cuts and their names, so that a Driver can easily configure it from an input array of doubles.
* Convert existing clustering algorithms from Drivers to extension classes of the new Clusterer interface and its abstract implementation.
* Add a generic ClusterDriver that is able to load different clusterer algorithms from a name string.
* Add specific clustering Drivers if they are needed to wrap certain clusterers (e.g. if the default behavior of ClusterDriver needs to be overridden).

This seems to me like it is greatly complicating the clustering code. It creates a totally new API for clustering and tries to fit a bunch of what I see as fairly disparate algorithms and classes into a single box. I feel like this is going to be difficult to accomplish and will take a fair amount of work and testing, but it lacks an obvious advantage that I can see. Can you explain the goal/benefit you are aiming for with this? Maybe I am misunderstanding what you are trying to do.


Thanks,

Kyle


On Tue, Dec 16, 2014 at 6:12 PM, McCormick, Jeremy I. <[log in to unmask]> wrote:

Thanks...very useful information!


-----Original Message-----
From: Kyle McCarty [mailto:[log in to unmask]]
Sent: Tuesday, December 16, 2014 2:55 PM
To: McCormick, Jeremy I.
Cc: Holly Vance
Subject: Re: cleaning up the ECAL clustering code

Hello Jeremy,


The two clustering algorithms that are mine are GTPEcalClusterer and GTPOnlineEcalClusterer. Both are implementations of the most current hardware clustering algorithm. GTPEcalClusterer is the original algorithm and is used in the readout simulation to simulate the hardware clustering on Monte Carlo data. GTPOnlineEcalClusterer is a work-in-progress version that is designed to run on readout data instead. There are two because the clustering algorithm uses a time window to analyze hits and determine which ones fall into a cluster and which do not. For Monte Carlo, we treat each event as a 2 ns window, so the algorithm builds its time buffer of hits by storing events and treating each one as 2 ns. The readout, by contrast, just outputs a large number of hits that were within a certain time window, and an individual event does not represent any particular length of time. This means that each event must be considered independently, and the time buffer must instead be generated from the hits within the event using their time stamps. Since this is a fairly significant difference in a fundamental aspect of the algorithm, I felt that it was not reasonable to try to make one algorithm that worked for both. This is particularly true because the simulation clusterer has already been tested thoroughly and added to the steering files, so changing it drastically now would risk breaking the Monte Carlo simulation.
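The difference between the two time-buffer strategies can be sketched roughly as follows. This is a simplified illustration, not the GTP clusterer code itself: TimedHit, EventSliceBuffer and TimestampWindow are hypothetical names, and any window lengths you pass in are arbitrary. In the Monte Carlo case, hit times are implied by event position (each event is one 2 ns slice); in the readout case, each event is handled on its own and hits are grouped using their own time stamps.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical hit type: an energy and a time in ns.
final class TimedHit {
    final double energy;
    final double time;

    TimedHit(double energy, double time) {
        this.energy = energy;
        this.time = time;
    }
}

// Monte Carlo style: each event is one 2 ns slice, so the time buffer is a rolling
// window of whole events and hit times are implied by the event's position.
final class EventSliceBuffer {
    static final double NS_PER_EVENT = 2.0;
    private final int maxEvents;
    private final Deque<List<Double>> slices = new ArrayDeque<>();

    EventSliceBuffer(double windowNs) {
        this.maxEvents = Math.max(1, (int) Math.ceil(windowNs / NS_PER_EVENT));
    }

    /** Pushes the hit energies of the newest event and drops events that left the window. */
    void addEvent(List<Double> hitEnergies) {
        slices.addFirst(new ArrayList<>(hitEnergies));
        while (slices.size() > maxEvents) {
            slices.removeLast();
        }
    }

    /** Returns buffered hits timed relative to the newest event: 0 ns, -2 ns, -4 ns, ... */
    List<TimedHit> bufferedHits() {
        List<TimedHit> hits = new ArrayList<>();
        int age = 0;
        for (List<Double> slice : slices) { // ArrayDeque iterates from first (newest) to last
            for (double energy : slice) {
                hits.add(new TimedHit(energy, -age * NS_PER_EVENT));
            }
            age++;
        }
        return hits;
    }
}

// Readout style: one event already spans a long readout window, so it is handled on
// its own and hits are grouped around a seed using their own time stamps.
final class TimestampWindow {
    private final double halfWindowNs;

    TimestampWindow(double halfWindowNs) {
        this.halfWindowNs = halfWindowNs;
    }

    List<TimedHit> hitsNear(TimedHit seed, List<TimedHit> eventHits) {
        List<TimedHit> selected = new ArrayList<>();
        for (TimedHit hit : eventHits) {
            if (Math.abs(hit.time - seed.time) <= halfWindowNs) {
                selected.add(hit);
            }
        }
        return selected;
    }
}
```

The first class needs state carried across events, while the second is stateless per event. That is one concrete reason the two variants are awkward to merge into a single class.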


It might be better, when the online algorithm is finished, to rename them to something like "GTPMonteCarloEcalClusterer" and "GTPReadoutEcalClusterer", since these names more accurately represent their function, but I was holding off on renaming them until the online algorithm is working. Currently, it cannot be completed because it crashes when building clusters: "addHit" in HPSEcalCluster uses "getRawEnergy," which, as we have been discussing on the mailing list, is a problem. Once this issue is resolved, the algorithm will be completed and tested. At that point I will also see if I can abstract the two drivers at all to cut down on repeated code. I did this already for the trigger drivers, but it is trickier for the clustering.


CTPEcalClusterer is the old clustering algorithm from the last run. I believe it is retained largely for legacy and reference purposes. I do not know if it is reasonable to keep. Perhaps it should be moved to a "test-run" package so that it doesn't clutter up the active code?

All of the "IC" clustering codes are Holly's and she would be able to explain them better than I would.


I do agree that it would be most reasonable to have one cluster object if that is possible, but I am not highly familiar with the regular HPSEcalCluster and only loosely familiar with Holly's version. Perhaps she could offer more insight into whether this is possible?


Let me know if I can help with anything,

Kyle


On Tue, Dec 16, 2014 at 3:28 PM, McCormick, Jeremy I. <[log in to unmask]> wrote:

Hi,

I was looking at cleaning up the ECAL clustering code with some changes to packages etc. Right now it is a bit of a mess, because there is quite a lot of code duplication between algorithms, as well as Drivers that are all doing the same thing (setting basic collection arguments, setting common cuts, etc.).

For more details, see this JIRA item where I have outlined a proposal to clean this up and do a heavy restructuring of the existing code.

https://jira.slac.stanford.edu/browse/HPSJAVA-363

I see in ecal.recon these clustering Drivers...

CTPEcalClusterer
EcalClusterIC
EcalClusterICBasic
GTPEcalClusterer
GTPOnlineClusterer
HPSEcalCluster

Could we get a brief description of each clustering Driver for some basic documentation that I can work from to try and do this? This can go on the JIRA page.

I would also like some information about what are the different types of cuts these are using, a brief description of how the algorithm works, etc.

It is also not clear to me that we need or want so many different clustering engines in our recon. Holly suggests discussing this in detail so we can identify common algorithms, and I agree with this.

Then there are now two types of clusters implemented...

HPSEcalCluster
HPSEcalClusterIC

I think we should be working from one cluster class, not two. So I would propose merging them unless there is some technical reason not to do this.

Long term, I'd like to move everything to the new ecal.cluster sub-package and abandon/deprecate/remove the existing Drivers. (I also have a few cosmic clustering Drivers that I will move to ecal.cluster too.)

If you need to make immediate changes (this week) to clustering code for the reconstruction to work, please just modify/fix the classes in ecal.recon for now. I am very aware that we must not break anything in the current data taking and recon steering files, so I am not modifying any of the existing Drivers in place. Meanwhile, I'm working on a sub-package where things can be reimplemented in a more structured way, including pulling the core algorithms out of the actual Driver classes. As we verify each of the clustering algorithms with tests, we can switch to the reimplemented class in the sub-package and then abandon the old Driver.

If you have any concerns or comments, please send them to hps-software or write comments on the JIRA item.

Thanks.

--Jeremy




