Hello Jeremy,

I wanted to respond to your proposal in JIRA, but JIRA comments are not
well suited to long or detailed discussions, so I am putting my response
here instead. So that the software list doesn't have to reference JIRA to
know what I am talking about, let me copy your proposal here.

>
>    - Move clustering related classes and Drivers to recon.ecal.cluster.
>    (This is already started.)
>    - Create a Clusterer API which implements only the clustering
>    algorithms. It would not extend Driver. It should have an API for
>    generically setting cuts and their names, so that a Driver can easily
>    configure it from an input array of doubles.
>    - Convert existing clustering algorithms from Drivers to extension
>    classes of the new Clusterer interface and its abstract implementation.
>    - Add a generic ClusterDriver that is able to load different clusterer
>    algorithms from a name string.
>    - Add specific clustering Drivers if they are needed to wrap certain
>    clusterers (e.g. if the default behavior of ClusterDriver needs to be
>    overridden).
>    - Add a ClusterUtilities class with common static utility methods that
>    are used in many different clusterers.
>    - Try to remove as much as possible code duplication between the
>    different types of clusterers (GTP, CTP, IC, cosmic, etc.).
>    - Move cosmic clustering Drivers from analysis.ecal.cosmic to the new
>    ecal-recon clustering package.
>    - Fully document each of the clustering algorithms, and include
>    relevant links in the javadoc. (Such as links to CLAS or HPS notes, etc.)
>
>
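
To make the Clusterer/ClusterDriver points of the proposal concrete, here is a minimal Java sketch of what such an API could look like. Only the names Clusterer and ClusterDriver come from the proposal. Everything else (AbstractClusterer's methods setCuts/getCutNames/getCut, the ClustererFactory lookup, the toy SeedOnlyClusterer, and the simplified Hit class standing in for a real calorimeter hit) is hypothetical and is not taken from the HPS code. The toy algorithm exists only to exercise the cut handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, simplified stand-in for a calorimeter hit (not the lcsim API).
final class Hit {
    final int ix, iy;     // crystal indices
    final double energy;  // GeV
    final double time;    // ns
    Hit(int ix, int iy, double energy, double time) {
        this.ix = ix; this.iy = iy; this.energy = energy; this.time = time;
    }
}

// Algorithm-only interface: it deliberately does not extend Driver.
interface Clusterer {
    String[] getCutNames();
    void setCuts(double[] values);  // positional, in the order of getCutNames()
    double getCut(String name);
    List<List<Hit>> createClusters(List<Hit> hits);
}

// Hypothetical shared base class: each algorithm only declares its cut names and defaults.
abstract class AbstractClusterer implements Clusterer {
    private final String[] cutNames;
    private final double[] cuts;

    protected AbstractClusterer(String[] cutNames, double[] defaults) {
        if (cutNames.length != defaults.length) {
            throw new IllegalArgumentException("cut names and defaults differ in length");
        }
        this.cutNames = cutNames.clone();
        this.cuts = defaults.clone();
    }

    @Override public String[] getCutNames() { return cutNames.clone(); }

    @Override public void setCuts(double[] values) {
        if (values.length != cuts.length) {
            throw new IllegalArgumentException("expected " + cuts.length + " cuts, got " + values.length);
        }
        System.arraycopy(values, 0, cuts, 0, values.length);
    }

    @Override public double getCut(String name) {
        int i = Arrays.asList(cutNames).indexOf(name);
        if (i < 0) throw new IllegalArgumentException("unknown cut: " + name);
        return cuts[i];
    }
}

// Hypothetical toy algorithm: every hit at or above the seed threshold becomes a one-hit cluster.
class SeedOnlyClusterer extends AbstractClusterer {
    SeedOnlyClusterer() {
        super(new String[] {"seedEnergyThreshold"}, new double[] {0.05});
    }

    @Override public List<List<Hit>> createClusters(List<Hit> hits) {
        double seedMin = getCut("seedEnergyThreshold");
        List<List<Hit>> clusters = new ArrayList<>();
        for (Hit h : hits) {
            if (h.energy >= seedMin) {
                List<Hit> cluster = new ArrayList<>();
                cluster.add(h);
                clusters.add(cluster);
            }
        }
        return clusters;
    }
}

// Hypothetical name-based lookup that a generic ClusterDriver could use.
final class ClustererFactory {
    private static final Map<String, Supplier<Clusterer>> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("SeedOnlyClusterer", SeedOnlyClusterer::new);
    }

    static Clusterer create(String name) {
        Supplier<Clusterer> supplier = REGISTRY.get(name);
        if (supplier == null) throw new IllegalArgumentException("unknown clusterer: " + name);
        return supplier.get();
    }
}
```

With this shape, a generic ClusterDriver would only need to hold a clusterer name and a cut array (presumably set from the steering file) and call ClustererFactory.create(name) followed by setCuts(cuts). The positional cut array is what makes configuration from an input array of doubles simple. The trade-off is that the steering file must list cuts in the order returned by getCutNames().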

I will respond to several of these points below:

>
>    - Move clustering related classes and Drivers to recon.ecal.cluster.
>    (This is already started.)
>
I think that this is entirely reasonable. Grouping the clustering code
together can make the package less messy.

>
>    - Add a ClusterUtilities class with common static utility methods that
>    are used in many different clusterers.
>    - Try to remove as much as possible code duplication between the
>    different types of clusterers (GTP, CTP, IC, cosmic, etc.).
>
For this, I think we need to take a serious look at the individual clustering
algorithms and see how much shared code actually exists between them. I do
think that the two GTP algorithms can be abstracted a lot more, and I
actually intended to do that once I finish the one that is in progress.
However, I'm not sure that there is really a lot of overlap between the
remaining clustering algorithms. Obviously they all have
"setHitCollectionName" or something along those lines, but what else do
they have? Most of them likely have seed energy cuts, but does the cosmic
clustering algorithm even have that? Before we start abstracting all
clustering algorithms, we will need to compile a list of things that they
all have in common and make sure that an abstract class would actually
cut down on the code rather than just produce a new class with very few
methods. If there is a lot of overlap, I agree that keeping all code
centralized is useful, because updates or fixes can then be propagated to
all affected classes without having to manually make sure that they
align. I'm just not sure that there is enough overlap for this to work.

This same argument applies to making a utility class. We need to make sure
that there is enough overlap to justify it.

>
>    - Move cosmic clustering Drivers from analysis.ecal.cosmic to the new
>    ecal-recon clustering package.
>
This would make sense. All the clustering code should probably be kept
together.

>
>    - Fully document each of the clustering algorithms, and include
>    relevant links in the JavaDoc. (Such as links to CLAS or HPS notes, etc.)
>
I always support thorough documentation, so I am all for this. Just let me
know if I need to add anything to mine. I agree with Holly's statement on
JIRA that it should probably be the code developer's job to actually write
the documentation.


I grouped the last few points together.

>
>    - Create a Clusterer API which implements only the clustering
>    algorithms. It would not extend Driver. It should have an API for
>    generically setting cuts and their names, so that a Driver can easily
>    configure it from an input array of doubles.
>    - Convert existing clustering algorithms from Drivers to extension
>    classes of the new Clusterer interface and its abstract implementation.
>    - Add a generic ClusterDriver that is able to load different clusterer
>    algorithms from a name string.
>    - Add specific clustering Drivers if they are needed to wrap certain
>    clusterers (e.g. if the default behavior of ClusterDriver needs to be
>    overridden).
>
This seems to me like it greatly complicates the clustering code. It
creates a totally new API for clustering and tries to fit a number of what I
see as fairly disparate algorithms and classes into a single box. I expect
this to be difficult to accomplish and to take a fair amount of work and
testing, while lacking any obvious advantage that I can see. Can you explain
the goal or benefit you are aiming for with this? Maybe I am
misunderstanding what you are trying to do.

Thanks,

Kyle

On Tue, Dec 16, 2014 at 6:12 PM, McCormick, Jeremy I. <
[log in to unmask]> wrote:
>
> Thanks...very useful information!
>
> -----Original Message-----
> From: Kyle McCarty [mailto:[log in to unmask]]
> Sent: Tuesday, December 16, 2014 2:55 PM
> To: McCormick, Jeremy I.
> Cc: Holly Vance
> Subject: Re: cleaning up the ECAL clustering code
>
> Hello Jeremy,
>
>
> The two clustering algorithms that are mine are GTPEcalClusterer and
> GTPOnlineEcalClusterer. These are both implementations of the hardware
> clustering algorithm that is most current. The GTPEcalClusterer is the
> original algorithm and is used in the readout simulation to simulate the
> hardware clustering on Monte Carlo data. The GTPOnlineEcalClusterer is a
> work-in-progress version that is designed to run on readout data instead.
> There are two because the clustering algorithm uses a time window to
> analyze hits and determine which ones fall into a cluster and which do
> not. For Monte Carlo, we treat each event as a 2 ns window, so the
> algorithm builds its time buffer of hits by storing events and treating
> each one as 2 ns. The readout instead outputs a large number of hits that
> were within a certain time window, and an individual event does not
> represent any particular length of time. This means that each event must
> be considered independently, and a time buffer must instead be generated
> from the hits within the event using their time stamps. Since this is a
> fairly significant difference in a fundamental aspect of the algorithm, I
> felt that it was not reasonable to try to make one algorithm that worked
> for both. This is particularly true because the simulation clusterer has
> already been tested thoroughly and added to the steering files, so
> changing it drastically now would risk breaking the Monte Carlo simulation.
>
>
> It might be better, when the online algorithm is finished, to rename them
> something like "GTPMonteCarloEcalClusterer" and "GTPReadoutEcalClusterer",
> since these names more accurately represent their function, but I was
> holding off on renaming them until the online algorithm is working.
> Currently, it cannot be completed because it crashes when building
> clusters: "addHit" in HPSEcalCluster uses "getRawEnergy", and as we have
> been discussing on the mailing list, that is a problem. Once this issue is
> resolved, the algorithm will be completed and tested. At that point I will
> also see if I can abstract the two drivers at all to cut down on repeated
> code. I have already done this for the trigger drivers, but it is trickier
> for the clustering.
>
>
> CTPEcalClusterer is the old clustering algorithm from the last run. I
> believe it is retained largely for legacy and reference purposes. I do not
> know if it is reasonable to keep. Perhaps it should be moved to a
> "test-run" package so that it doesn't clutter up the active code?
>
> All of the "IC" clustering codes are Holly's and she would be able to
> explain them better than I would.
>
>
> I do agree that it would be most reasonable to have one cluster object if
> that is possible, but I am not highly familiar with the regular
> HPSEcalCluster and only loosely familiar with Holly's version. Perhaps she
> could offer more insight into whether this is possible?
>
>
> Let me know if I can help with anything,
>
> Kyle
>
>
> On Tue, Dec 16, 2014 at 3:28 PM, McCormick, Jeremy I. <
> [log in to unmask]> wrote:
>
>         Hi,
>
>         I was looking at cleaning up the ECAL clustering code with some
> changes to packages etc.  Right now it is a bit of a mess, because there is
> quite a lot of code duplication between algorithms, as well as Drivers that
> are all doing the same thing (setting basic collection arguments, setting
> common cuts, etc.)
>
>         For more details, see this JIRA item where I have outlined a
> proposal to clean this up and do a heavy restructuring of the existing code.
>
>         https://jira.slac.stanford.edu/browse/HPSJAVA-363
>
>         I see in ecal.recon these clustering Drivers...
>
>         CTPEcalClusterer
>         EcalClusterIC
>         EcalClusterICBasic
>         GTPEcalClusterer
>         GTPOnlineClusterer
>         HPSEcalCluster
>
>         Could we get a brief description of each clustering Driver for
> some basic documentation that I can work from to try and do this?  This can
> go on the JIRA page.
>
>         I would also like some information about what are the different
> types of cuts these are using, a brief description of how the algorithm
> works, etc.
>
>         It is also not clear to me that we need or want so many different
> clustering engines in our recon.  Holly suggests discussing this in detail
> so we can identify common algorithms, and I agree with this.
>
>         Then there are now two types of clusters implemented...
>
>         HPSEcalCluster
>         HPSEcalClusterIC
>
>         I think we should be working from one cluster class, not two.  So
> I would propose merging them unless there is some technical reason not to
> do this.
>
>         Long term, I'd like to move everything to the new ecal.cluster
> sub-package and abandon/deprecate/remove the existing Drivers.  (I also
> have a few cosmic clustering Drivers that I will move to ecal.cluster too.)
>
>         If you need to make immediate changes (this week) to clustering
> code for the reconstruction to work, please just modify/fix the classes in
> ecal.recon for now.  I am very aware that we must not break anything with the
> current data taking and recon steering files, so I am not modifying any of
> the existing Drivers in place.  Meanwhile, I'm working on making a
> sub-package where things can be reimplemented in a more structured way,
> including pulling out the core algorithms from the actual Driver classes.
> As we verify each of the clustering algorithms with tests, we can move to
> the re-implementation class in the sub-package and then abandon the old
> Driver.
>
>         If you have any concerns or comments, please send them to
> hps-software or write comments on the JIRA item.
>
>         Thanks.
>
>         --Jeremy
>
>
>
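
To make Kyle's explanation of why there are two GTP clusterers more concrete, here is a minimal Java sketch of the two ways of building the time buffer. The 2 ns per event figure is the one quoted in the thread. Everything else is an assumption: the TimedHit class, the symmetric window around a centre event, and the window parameters (windowSlots, windowNs) are hypothetical. This is not the code in GTPEcalClusterer or GTPOnlineEcalClusterer, and it shows only hit selection, not the clustering rule itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified hit with a time stamp (not the lcsim API).
final class TimedHit {
    final int ix, iy;     // crystal indices
    final double energy;  // GeV
    final double time;    // ns
    TimedHit(int ix, int iy, double energy, double time) {
        this.ix = ix; this.iy = iy; this.energy = energy; this.time = time;
    }
}

// Monte Carlo mode: each event is treated as one 2 ns slot. The buffer keeps
// the last (2 * windowSlots + 1) events so the centre event can be compared
// with hits before and after it (the symmetric window is an assumption).
final class EventSlotBuffer {
    static final double NS_PER_EVENT = 2.0;
    private final int windowSlots;
    private final Deque<List<TimedHit>> events = new ArrayDeque<>();

    EventSlotBuffer(int windowSlots) { this.windowSlots = windowSlots; }

    // Adds one event; returns true once the buffer is full enough to process.
    boolean push(List<TimedHit> eventHits) {
        events.addLast(eventHits);
        if (events.size() > 2 * windowSlots + 1) events.removeFirst();
        return events.size() == 2 * windowSlots + 1;
    }

    // All buffered hits, with their time replaced by the event's slot position.
    List<TimedHit> windowHits() {
        List<TimedHit> out = new ArrayList<>();
        int slot = 0;
        for (List<TimedHit> event : events) {  // oldest event first
            double t = slot * NS_PER_EVENT;
            for (TimedHit h : event) out.add(new TimedHit(h.ix, h.iy, h.energy, t));
            slot++;
        }
        return out;
    }
}

// Readout mode: a single event already spans a time window, so candidate
// hits for a seed are selected by their own time stamps instead.
final class TimestampWindow {
    static List<TimedHit> hitsNear(List<TimedHit> eventHits, TimedHit seed, double windowNs) {
        List<TimedHit> out = new ArrayList<>();
        for (TimedHit h : eventHits) {
            if (Math.abs(h.time - seed.time) <= windowNs) out.add(h);
        }
        return out;
    }
}
```

In this form only the hit-selection step differs between the two modes, which is one possible shape for the later abstraction of the two GTP drivers mentioned in the thread.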
