Hello Jeremy, This is just an update to let you know that the GTPOnlineClusterer driver is now finished and appears to be functioning appropriately. I haven't abstracted it yet (I want to look at trigger diagnostics first, now that it seems the SSP trigger bank exists and I have clusters to work with), but I will within a day or two. Thanks, Kyle On Tue, Dec 16, 2014 at 11:59 PM, Holly Vance <[log in to unmask]> wrote: > > Hi, > > I thought I would add a comment on this: the energy corrections no > longer exist at all in the offline recon clustering. The only driver where > this gets called is in ReconParticleDriver (which occurs separately from > clustering but uses the particle id in track cluster matching). I am > working to add another driver that could be used after clustering and > separately from the SVT. This would allow someone to run clustering and > then run a cluster analysis driver which could perhaps make a good guess > about particle id in a loose pairs trigger and then apply corrections. > > -Holly > > On Tue, Dec 16, 2014 at 9:42 PM, Kyle McCarty <[log in to unmask]> wrote: > >> Hello Jeremy, >> >> Okay, I think that answers most of my concerns. I will let you know if I >> have any further questions, probably after I have had a chance to really >> look over the actual code. >> >> I do think we should make the wrapper classes for the clustering drivers >> to allow for regular parameter names in the steering files and use this as >> the default, though. Otherwise, we actually make it harder for users to >> figure out what is going on even compared to the inconsistent parameter >> names we have now. Beyond that, I think this should work; we'll just want to >> make sure we do some side-by-side tests with the old driver versions to >> make sure everything is running smoothly. >> >> Thanks, >> >> Kyle >> >> On Tue, Dec 16, 2014 at 9:12 PM, McCormick, Jeremy I. < >> [log in to unmask]> wrote: >>> >>> Wow Kyle, you type fast. 
:) >>> >>> Let me address these questions individually below... >>> >>> > For the API, are you basically just passing the events to the >>> clustering algorithm and then mandating that they have a means to get a >>> list of cluster objects? >>> >>> The Clusterer classes must implement this method... >>> >>> List<Cluster> createClusters(EventHeader event, List<CalorimeterHit> >>> hits); >>> >>> This returns a list of clusters from the hits and may access/use event >>> data as needed. The ClusterDriver calls this automatically from its >>> process method. >>> >>> > i.e., the API is essentially an extra layer between the event stream >>> and the, previously, driver object? This would alleviate most of the issues >>> with trying to standardize the clustering methods, since it would allow >>> them to remain significantly different internally. >>> >>> Yes, this is exactly right. It is an extra layer so that the clustering >>> algorithms can vary independently of the common Driver settings. The >>> Driver, for instance, determines the output collection name, input hit >>> collection name, logging, etc. Each of the algorithms can be completely >>> different internally. >>> >>> > Also, how does a driver specify to the API what parameters it needs? >>> >>> There are two ways to do this, one generic that uses the double array, >>> and one specific that uses the Driver API. >>> >>> I have implemented in the Clusterer interface a way to set generic >>> numerical cuts by passing a double array. In this way, the common >>> ClusterDriver can generically pass these to the clustering algorithm >>> without needing specific "set" methods for each one. Internally, the >>> Clusterer may set instance variable values from these cuts in its >>> initialize method, which is called from the Driver's startOfData method. >>> These cuts also have names so you can call for instance setCut("someCut", >>> 1.0) rather than needing to know the numerical indices. 
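The contract described above (one createClusters method plus generic numerical cuts settable by index or by name) could be sketched roughly as follows. This is an illustration only, not the actual hps-java code: the stub classes stand in for lcsim's EventHeader/CalorimeterHit/Cluster types, and the exact method set and cut names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in stubs for the lcsim event-model classes named in the thread.
class EventHeader {}
class CalorimeterHit {}
class Cluster {}

// Sketch of the Clusterer contract: clustering plus generic named cuts.
interface Clusterer {
    // Build clusters from the hits; may also consult the event as needed.
    List<Cluster> createClusters(EventHeader event, List<CalorimeterHit> hits);
    // Set all cuts positionally (what a generic ClusterDriver would call).
    void setCuts(double[] values);
    // Set one cut by name, e.g. setCut("someCut", 1.0).
    void setCut(String name, double value);
    // Called from the Driver's startOfData to copy cuts into fields, etc.
    void initialize();
}

// Minimal implementation showing how named cuts map onto the double array.
class SimpleClusterer implements Clusterer {
    private final String[] cutNames = { "someCut" };
    private final double[] cuts = new double[1];

    public void setCuts(double[] values) {
        System.arraycopy(values, 0, cuts, 0, cuts.length);
    }

    public void setCut(String name, double value) {
        for (int i = 0; i < cutNames.length; i++) {
            if (cutNames[i].equals(name)) { cuts[i] = value; return; }
        }
        throw new IllegalArgumentException("unknown cut: " + name);
    }

    public void initialize() { /* copy cut values into instance fields here */ }

    public List<Cluster> createClusters(EventHeader event, List<CalorimeterHit> hits) {
        return new ArrayList<Cluster>(); // a real algorithm builds clusters here
    }
}
```

The benefit of the name lookup is exactly the one noted above: a steering file or wrapper Driver can say `setCut("someCut", 1.0)` without knowing the cut's position in the array.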
>>> >>> The other way is writing a specialized implementation of ClusterDriver >>> that will directly pass arguments to its Clusterer object. I have done >>> this already for the IC Clusterer, so that can provide an example. It is >>> pretty straightforward, and I don't think it is confusing. These Drivers >>> may also add any additional "set" methods that are needed as well as >>> override/augment any of the standard Driver methods like process, >>> startOfData, etc. >>> >>> > If that is the case, do we still gain much cleanliness of code if we >>> need almost as many wrapper classes as we have drivers? >>> >>> Yes, I think the code is still much cleaner even if there is a wrapper >>> Driver for every Clusterer, because these Driver classes will then only >>> implement required specializations for a particular Clusterer class. This >>> is much better than having every clustering algorithm extend Driver >>> directly, which has already led to a lot of code duplication and >>> inconsistency between them. >>> >>> > One of my concerns with this is with steering file input. [etc.] >>> >>> I agree that the double array of cuts is perhaps less obvious and more >>> prone to error than having individual "set" methods which are accessed via >>> the steering file. This is a side effect of having the Clusterer not >>> implement Driver, but in practice it can be easily alleviated by writing a >>> simple ClusterDriver specialization class (as described above) that has >>> these setters on it and passes the values to its Clusterer. This requires >>> writing very little code. >>> >>> > I'm not against this, but it is worth noting that Holly's corrections >>> are for full-sized clusters with shared hits while the other algorithms use >>> the hardware's 3x3 spatial window. The sampling fraction between the two is also >>> different for this reason. 
(My plots >>> > suggest it is around 0.86 for the GTP clustering algorithm, though >>> this is extracted from distribution plots and not a formal study.) Also, >>> it's important not to apply energy corrections for offline analysis >>> algorithms to the hardware algorithms, as this would >>> > not accurately represent what the hardware actually does anymore. >>> >>> Okay, thanks for the information here. As far as which sampling numbers >>> are applied in which algorithms, I have not changed any of this core >>> behavior, and all the algorithms I have ported to ecal.cluster are left >>> intact. >>> >>> I still think it would be useful to pull out the energy correction code >>> so that if desired one can easily look at the uncorrected energies. >>> Currently, I don't think it is really possible to do this because the >>> corrections occur inline with the actual clustering. >>> >>> By the way, I'm not looking at the implementations in ecal.cluster as >>> some kind of "set in stone" API. We can certainly change and add features >>> as needed, and your suggestions/questions are quite helpful here. Thanks! >>> >>> --Jeremy >>> >>> -----Original Message----- >>> From: Kyle McCarty [mailto:[log in to unmask]] >>> Sent: Tuesday, December 16, 2014 5:50 PM >>> To: McCormick, Jeremy I. >>> Cc: hps-software >>> Subject: Re: FW: cleaning up the ECAL clustering code >>> >>> Hello Jeremy, >>> >>> >>> The driver sounds good to me. That is a lot of useful options and it >>> would be better to have them centralized. I have a few more questions about >>> the API, but this sounds good to have regardless. >>> >>> For the API, are you basically just passing the events to the clustering >>> algorithm and then mandating that they have a means to get a list of >>> cluster objects? i.e., the API is essentially an extra layer between the >>> event stream and the, previously, driver object? 
This would alleviate most of >>> the issues with trying to standardize the clustering methods, since it >>> would allow them to remain significantly different internally. >>> >>> Also, how does a driver specify to the API what parameters it needs? >>> Each driver, especially the GTP algorithm and Holly's algorithm, will need >>> certain fields that are unique to them. Is accounting for this what you >>> intended the wrapper classes for? If that is the case, do we still gain >>> much cleanliness of code if we need almost as many wrapper classes as we >>> have drivers? >>> >>> >>> One of my concerns with this is with steering file input. You mentioned >>> using an array of doubles as an input. While this is acceptable internally, >>> since only code experts need to understand it, I think we should avoid >>> using this as the input to a steering file. Non-experts need to make >>> steering files for customized studies or for testing different settings, and >>> it is very non-intuitive to input a series of numbers with no indication of >>> what those represent. For example, if this were the case with the trigger, >>> we would have input like "0.125 1.300 0.200 1.700 2 0.500 2.000 1.200 1.0 >>> 30", which, while I can understand it because I tend to use the same >>> sequence for trigger cuts, is completely non-obvious to anyone else. >>> Additionally, it makes user input error a significant concern, since it is >>> difficult to check whether everything is in the right position. I think it >>> is important for the steering file to take parameter entries that are >>> descriptive so anyone looking at the steering file can have some idea what >>> it is they are looking at. >>> >>> >>> >>> I really would like to pull out the energy corrections code from >>> EcalClusterIC which I've already discussed with Holly a bit (I think this >>> is in progress actually). 
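The wrapper-Driver approach being discussed (descriptive steering-file parameters forwarding to the Clusterer's generic cut array) might look roughly like the sketch below. The class and cut names here are illustrative assumptions, not the real hps-java API; a real wrapper would extend ClusterDriver and an lcsim steering file would invoke each "set" method by its parameter name.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the Clusterer's generic named-cut API from the thread.
class StubClusterer {
    final Map<String, Double> cuts = new HashMap<String, Double>();
    void setCut(String name, double value) { cuts.put(name, value); }
}

// Sketch of a thin wrapper Driver: each descriptive setter simply
// forwards to the Clusterer's generic cuts, so steering-file authors
// see readable parameter names instead of a bare double array.
// (Hypothetical names; in the real code this would extend ClusterDriver.)
class ExampleClusterDriver {
    private final StubClusterer clusterer = new StubClusterer();

    // e.g. <seedEnergyCut>0.05</seedEnergyCut> in a steering file
    public void setSeedEnergyCut(double value) {
        clusterer.setCut("seedEnergyCut", value);
    }

    public void setClusterWindow(double value) {
        clusterer.setCut("clusterWindow", value);
    }

    StubClusterer getClusterer() { return clusterer; }
}
```

This is why the wrapper classes cost so little code: each setter is one forwarding line, while the Driver plumbing (collections, flags, logging) stays in the shared base class.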
>>> >>> >>> >>> I'm not against this, but it is worth noting that Holly's corrections >>> are for full-sized clusters with shared hits while the other algorithms use >>> the hardware's 3x3 spatial window. The sampling fraction between the two is also >>> different for this reason. (My plots suggest it is around 0.86 for the GTP >>> clustering algorithm, though this is extracted from distribution plots and >>> not a formal study.) Also, it's important not to apply energy corrections >>> for offline analysis algorithms to the hardware algorithms, as this would >>> not accurately represent what the hardware actually does anymore. >>> >>> >>> Thanks, >>> >>> Kyle >>> >>> >>> On Tue, Dec 16, 2014 at 7:51 PM, McCormick, Jeremy I. < >>> [log in to unmask]> wrote: >>> >>> Hi, Kyle. >>> >>> Thanks for the response. This is a very good discussion, and >>> you make valid points. >>> >>> I have already written a Driver that can run any clustering >>> algorithm which implements the Clusterer interface, and it is working on >>> several of the simpler algorithms (EcalClusterICBasic and the legacy test >>> run algorithm). >>> >>> >>> ecal-recon/src/main/java/org/hps/recon/ecal/cluster/ClusterDriver.java >>> >>> This class implements the common Driver API for clustering and >>> includes... >>> >>> -Setting up the detector and conditions related references in >>> detectorChanged. >>> >>> -Setting of input hit and output cluster collection names. >>> >>> -Creating the clustering algorithm object from a factory class. >>> >>> -Flag to indicate whether empty cluster collections should be >>> created and added to the event. >>> >>> -Flag to raise an error if the input hit collection is not found. >>> >>> -Flag to skip events where no clusters are created. >>> >>> -Flag to indicate whether clusters should be stored to the event >>> or marked as transient. >>> >>> -Setting of numerical cuts in a double array that are passed to >>> the Clusterer via the Driver's setCuts method. 
>>> >>> -Default initialization of the Clusterer in startOfData. >>> >>> -Logging of its configuration and event processing info to a logger. >>> >>> The Driver implements a default process method which will >>> perform clustering according to the above parameters. >>> >>> The Clusterer API has a createClusters method, as well as >>> methods for setting/getting of numerical cuts by index or string. (You can >>> have an empty array if there are no cuts.) >>> >>> "This seems to me like it is greatly complicating the clustering >>> code." >>> >>> What I've described is perhaps more complicated than >>> implementing clustering directly via the Driver API, in that it is another >>> level of abstraction, but it allows for far more flexibility and ease of >>> use, and it is better organized. It allows one to easily swap out a >>> clustering algorithm by changing one argument to a Driver (the Clusterer >>> name), rather than needing to understand exactly what a certain clustering >>> Driver has implemented in its API (e.g. collection names, various flags, >>> etc.). If specialized behavior is needed then a specific Driver can be >>> written to wrap that algorithm. Having a common Driver that can run all >>> clustering algorithms also easily allows debugging to be done in an >>> organized/standard way. >>> >>> Also, the algorithms are not different in terms of their input >>> and output. They all take a list of hits and turn them into a collection >>> of clusters. I have also included a reference to the event in the API >>> method as this might be useful/required by some algorithms. All the >>> algorithms we have now also include cuts. So to my mind it is useful to have >>> an API that actually implements this "contract". And the clustering >>> algorithms themselves should not need to know all the Driver related >>> parameters that I described above which can be commonly configured. >>> >>> "This same argument applies to making a utility class. etc." 
>>> >>> There are only a few utility methods I pulled out so far, but I >>> think this can be greatly expanded. As far as code duplication, I don't >>> know for sure yet as this requires going over each algorithm in detail. I >>> really would like to pull out the energy corrections code from >>> EcalClusterIC which I've already discussed with Holly a bit (I think this >>> is in progress actually). >>> >>> --Jeremy >>> >>> -----Original Message----- >>> From: Kyle McCarty [mailto:[log in to unmask]] >>> Sent: Tuesday, December 16, 2014 4:13 PM >>> To: McCormick, Jeremy I. >>> Cc: hps-software >>> Subject: Re: FW: cleaning up the ECAL clustering code >>> >>> Hello Jeremy, >>> >>> I wanted to respond to your proposal in JIRA, but the JIRA >>> comments are not really very good for more detailed or long discussions, so >>> I am putting it here instead. Just so the software list doesn't have to >>> reference JIRA to know what I am talking about, let me copy your proposal >>> here. >>> >>> >>> * Move clustering related classes and Drivers to >>> recon.ecal.cluster. (This is already started.) >>> * Create a Clusterer API which implements only the >>> clustering algorithms. It would not extend Driver. It should have an API >>> for generically setting cuts and their names, so that a Driver can easily >>> configure it from an input array of doubles. >>> * Convert existing clustering algorithms from >>> Drivers to extension classes of the new Clusterer interface and its >>> abstract implementation. >>> * Add a generic ClusterDriver that is able to load >>> different clusterer algorithms from a name string. >>> * Add specific clustering Drivers if they are >>> needed to wrap certain clusterers (e.g. if the default behavior of >>> ClusterDriver needs to be overridden). >>> * Add a ClusterUtilities class with common static >>> utility methods that are used in many different clusterers. 
>>> * Try to remove as much as possible code >>> duplication between the different types of clusterers (GTP, CTP, IC, >>> cosmic, etc.). >>> * Move cosmic clustering Drivers from >>> analysis.ecal.cosmic to the new ecal-recon clustering package. >>> * Fully document each of the clustering >>> algorithms, and include relevant links in the javadoc. (Such as links to >>> CLAS or HPS notes, etc.) >>> >>> >>> >>> I will respond to several of these points below: >>> >>> >>> * Move clustering related classes and Drivers to >>> recon.ecal.cluster. (This is already started.) >>> >>> I think that this is entirely reasonable. Grouping the clustering >>> code together can make the package less messy. >>> >>> >>> * Add a ClusterUtilities class with common static >>> utility methods that are used in many different clusterers. >>> * Try to remove as much as possible code >>> duplication between the different types of clusterers (GTP, CTP, IC, >>> cosmic, etc.). >>> >>> For this, I think we need to have a serious look at the individual >>> clustering algorithms and see how much shared code actually exists between >>> them. I do think that the two GTP algorithms can be abstracted a lot more, >>> and I actually intended to do that once I finish the one that is in progress. >>> However, I'm not sure that there is really a lot of overlap between the >>> remaining clustering algorithms. Obviously they all have >>> "setHitCollectionName" or something along those lines, but what else do >>> they have? Most of them likely have seed energy cuts, but does the cosmic >>> clustering algorithm even have that? Before we start abstracting all >>> clustering algorithms, we will need to compile a list of things that they >>> all have in common and make sure that making an abstract class would >>> actually cut down on the code and not just produce a new class with very >>> few methods. 
If there is a lot of overlap, I agree that keeping all code >>> centralized is useful because it means that updates or fixes can be >>> propagated to all affected classes without having to manually make sure >>> that they align, but I'm not sure that there is enough overlap for this to >>> work. >>> >>> >>> This same argument applies to making a utility class. We need to >>> make sure that there is enough overlap to justify it. >>> >>> >>> * Move cosmic clustering Drivers from >>> analysis.ecal.cosmic to the new ecal-recon clustering package. >>> >>> This would make sense. All the clustering code should probably >>> be kept together. >>> >>> >>> * Fully document each of the clustering >>> algorithms, and include relevant links in the JavaDoc. (Such as links to >>> CLAS or HPS notes, etc.) >>> >>> I always support thorough documentation, so I am all for this. >>> Just let me know if I need to add anything to mine. I agree with Holly's >>> statement on JIRA that it should probably be the code developer's job to >>> actually write the documentation. >>> >>> >>> >>> I grouped the last few points together. >>> >>> >>> * Create a Clusterer API which implements only the >>> clustering algorithms. It would not extend Driver. It should have an API >>> for generically setting cuts and their names, so that a Driver can easily >>> configure it from an input array of doubles. >>> * Convert existing clustering algorithms from >>> Drivers to extension classes of the new Clusterer interface and its >>> abstract implementation. >>> * Add a generic ClusterDriver that is able to load >>> different clusterer algorithms from a name string. >>> * Add specific clustering Drivers if they are >>> needed to wrap certain clusterers (e.g. if the default behavior of >>> ClusterDriver needs to be overridden). >>> >>> >>> This seems to me like it is greatly complicating the clustering >>> code. 
It creates a totally new API for clustering and tries to fit a bunch >>> of what I see as fairly disparate algorithms and classes into a single box. >>> I feel like this is going to be difficult to accomplish and take a fair >>> amount of work and testing, but lacks an obvious advantage that I can see. >>> Can you explain the goal/benefit you are aiming for with this? Maybe I >>> am misunderstanding what you are trying to do. >>> >>> >>> Thanks, >>> >>> Kyle >>> >>> >>> On Tue, Dec 16, 2014 at 6:12 PM, McCormick, Jeremy I. < >>> [log in to unmask]> wrote: >>> >>> Thanks...very useful information! >>> >>> >>> -----Original Message----- >>> From: Kyle McCarty [mailto:[log in to unmask]] >>> Sent: Tuesday, December 16, 2014 2:55 PM >>> To: McCormick, Jeremy I. >>> Cc: Holly Vance >>> Subject: Re: cleaning up the ECAL clustering code >>> >>> Hello Jeremy, >>> >>> >>> The two clustering algorithms that are mine are >>> GTPEcalClusterer and GTPOnlineEcalClusterer. These are both implementations >>> of the hardware clustering algorithm that is most current. The >>> GTPEcalClusterer is the original algorithm and is used in the readout >>> simulation to simulate the hardware clustering on Monte Carlo data. The >>> GTPOnlineEcalClusterer is a work-in-progress version that is designed to >>> run on readout data instead. The reason there are two is that the >>> clustering algorithm uses a time window to analyze hits and determine which >>> ones fall into a cluster and which do not. For Monte Carlo, we treat each >>> event as a 2 ns window, so the algorithm builds its time buffer of hits by >>> storing events and treating each one as 2 ns. The readout just outputs a >>> large number of hits that were within a certain time window and each >>> individual event does not represent any particular time length. This means >>> that each event must be considered independently and a time buffer must be >>> generated from the hits within the event using their time stamp instead. 
>>> Since this is a fairly significant difference in a fundamental aspect of >>> the algorithm, I felt that it was not reasonable to try and make one >>> algorithm that worked for both. This is particularly true because the >>> simulation clusterer has already been tested thoroughly and added to the >>> steering files, so changing it drastically now would risk breaking the >>> Monte Carlo simulation. >>> >>> >>> It might be better, when the online algorithm is >>> finished, to rename them something like "GTPMonteCarloEcalClusterer" and >>> "GTPReadoutEcalClusterer" since these more accurately represent their >>> function, but I was holding off on renaming them until the online algorithm >>> is working. Currently, it cannot be completed because it crashes when >>> building clusters due to the fact that "addHit" in HPSEcalCluster uses >>> "getRawEnergy," and as we have been discussing on the mailing list, that is >>> a problem. Once this issue is resolved, the algorithm will be completed and >>> tested. Also, at this point I will see if I can abstract the two drivers at >>> all to cut down on repeated code. I did this already for the trigger >>> drivers, but it is trickier for the clustering. >>> >>> >>> CTPEcalClusterer is the old clustering algorithm from >>> the last run. I believe it is retained largely for legacy and reference >>> purposes. I do not know if it is reasonable to keep. Perhaps it should be >>> moved to a "test-run" package so that it doesn't clutter up the active code? >>> >>> All of the "IC" clustering codes are Holly's and she >>> would be able to explain them better than I would. >>> >>> >>> I do agree that it would be most reasonable to have one >>> cluster object if that is possible, but I am not highly familiar with the >>> regular HPSEcalCluster and only loosely familiar with Holly's version. >>> Perhaps she could offer more insight into whether this is possible? 
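The two buffering strategies Kyle contrasts above (Monte Carlo events treated as fixed 2 ns steps vs. readout hits bucketed by their own time stamps within one event) can be sketched roughly as follows. This is an illustration of the difference only, with toy types and hypothetical method names, not the GTP code itself:

```java
import java.util.ArrayList;
import java.util.List;

// Toy hit carrying only a time stamp (ns); stands in for CalorimeterHit.
class Hit {
    final double time;
    Hit(double time) { this.time = time; }
}

// Sketch of the two time-buffer styles described in the thread.
class TimeBuffers {
    // Monte Carlo style: each *event* is one fixed 2 ns step, so the
    // buffer is built by pushing whole events and evicting the oldest
    // once the buffer depth (window length / 2 ns) is exceeded.
    static void pushEvent(List<List<Hit>> buffer, List<Hit> eventHits, int depth) {
        buffer.add(eventHits);
        while (buffer.size() > depth) {
            buffer.remove(0);
        }
    }

    // Readout style: one event contains many hits spanning a long time
    // window, so the buffer around a given time is built from the hits'
    // own time stamps instead of from event boundaries.
    static List<Hit> hitsInWindow(List<Hit> eventHits, double center, double halfWidth) {
        List<Hit> window = new ArrayList<Hit>();
        for (Hit h : eventHits) {
            if (Math.abs(h.time - center) <= halfWidth) {
                window.add(h);
            }
        }
        return window;
    }
}
```

The point of the sketch is that the two variants differ in where the time axis comes from (event sequence vs. hit time stamps), which is the "fundamental aspect of the algorithm" that made a single merged implementation unattractive.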
>>> >>> >>> Let me know if I can help with anything, >>> >>> Kyle >>> >>> >>> On Tue, Dec 16, 2014 at 3:28 PM, McCormick, Jeremy I. < >>> [log in to unmask]> wrote: >>> >>> Hi, >>> >>> I was looking at cleaning up the ECAL clustering >>> code with some changes to packages etc. Right now it is a bit of a mess, >>> because there is quite a lot of code duplication between algorithms, as >>> well as Drivers that are all doing the same thing (setting basic collection >>> arguments, setting common cuts, etc.) >>> >>> For more details, see this JIRA item where I >>> have outlined a proposal to clean this up and do a heavy restructuring of >>> the existing code. >>> >>> >>> https://jira.slac.stanford.edu/browse/HPSJAVA-363 >>> >>> I see in ecal.recon these clustering Drivers... >>> >>> CTPEcalClusterer >>> EcalClusterIC >>> EcalClusterICBasic >>> GTPEcalClusterer >>> GTPOnlineClusterer >>> HPSEcalCluster >>> >>> Could we get a brief description of each >>> clustering Driver for some basic documentation that I can work from to try >>> and do this? This can go on the JIRA page. >>> >>> I would also like some information about the >>> different types of cuts these are using, a brief description of how >>> the algorithm works, etc. >>> >>> It is also not clear to me that we need or want >>> so many different clustering engines in our recon. Holly suggests >>> discussing this in detail so we can identify common algorithms, and I agree >>> with this. >>> >>> Then there are now two types of clusters >>> implemented... >>> >>> HPSEcalCluster >>> HPSEcalClusterIC >>> >>> I think we should be working from one cluster >>> class, not two. So I would propose merging them unless there is some >>> technical reason not to do this. >>> >>> Long term, I'd like to move everything to the >>> new ecal.cluster sub-package and abandon/deprecate/remove the existing >>> Drivers. (I also have a few cosmic clustering Drivers that I will move to >>> ecal.cluster too.) 
>>> >>> If you need to make immediate changes (this >>> week) to clustering code for the reconstruction to work, please just >>> modify/fix the classes in ecal.recon for now. I am very aware that we must not >>> break anything with the current data taking and recon steering files, so I >>> am not modifying any of the existing Drivers in place. Meanwhile, I'm >>> working on making a sub-package where things can be reimplemented in a more >>> structured way, including pulling out the core algorithms from the actual >>> Driver classes. As we verify each of the clustering algorithms with tests, >>> we can move to the re-implementation class in the sub-package and then >>> abandon the old Driver. >>> >>> Any concerns/comments, please send to >>> hps-software or write comments on the JIRA item. >>> >>> Thanks. >>> >>> --Jeremy >>> >> ------------------------------ >> >> Use REPLY-ALL to reply to list >> >> To unsubscribe from the HPS-SOFTWARE list, click the following link: >> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=HPS-SOFTWARE&A=1 >> ------------------------------