Some random thoughts on randomness…

1) I am fairly well convinced that the MC tracking digitization code can be made deterministic using the existing configuration options that turn off noise in the readout chip.

As Tim (who wrote the code) states:

"The charge deposition model we use is entirely deterministic.  All variation is added before (in Geant energy deposition modeling) or after (in readout electronics simulation)."

I found the following Driver in the tracking recon which uses a NoiselessReadoutChip:

org.hps.recon.tracking.SimpleTrackerDigiDriver

(Is that the right Driver for this, by the way?  Should I use SimpleSVTReadout instead?)

When I use this Driver in the reconstruction chain, instead of the standard one that includes noise, the digitization always produces the same number of raw tracker hits.  So that configuration seems to remove the randomness from the digitization, or at least from the number of hits produced.

Looking at the digitization code itself, the only place I saw the Java Random class used was in the ReadoutChip classes, so I think we’re doing fine here as long as NoiselessReadoutChip or some other noiseless model is used instead of the classes that include the simulated noise.
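For illustration, here is a minimal sketch of what I mean (these classes are stand-ins I made up, not the actual ReadoutChip or NoiselessReadoutChip code): a readout model backed by an unseeded java.util.Random produces different outputs on every run, while a model with no random generator at all is trivially repeatable.

    import java.util.Random;

    // Hypothetical stand-ins for the real readout chip classes.
    interface ReadoutModel {
        double readout(double depositedCharge);
    }

    // Adds Gaussian noise from an unseeded Random, which is seeded differently
    // on every JVM run, so the number of hits over threshold can vary.
    class NoisyReadoutModel implements ReadoutModel {
        private final Random random = new Random();
        private final double noiseRms;

        NoisyReadoutModel(double noiseRms) {
            this.noiseRms = noiseRms;
        }

        public double readout(double depositedCharge) {
            return depositedCharge + noiseRms * random.nextGaussian();
        }
    }

    // No random number generator at all: the same deposited charge always
    // produces the same readout value, so the digitization is repeatable.
    class NoiselessReadoutModel implements ReadoutModel {
        public double readout(double depositedCharge) {
            return depositedCharge;
        }
    }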

2) The ECAL readout simulation shows some slight variation in its output in terms of the number of triggered events produced.  I ran it over 2.5 million “merged” input MC events generated with SLIC, and sometimes I would get 944 triggered events and other times 945.  As Sho explained in the software meeting, this is perhaps expected at some level due to the use of a “tiebreaker” algorithm.  I don’t think it is so important for this part of the data chain to be perfectly repeatable, and this is a pretty small amount of variation.  Maybe we can live with it as is?

3) The tracking recon output itself does seem to vary in a way that is not desirable.  Running over the 945 input physics events that were generated with the trigger simulation, I see a spread of plus or minus 1 to 3 tracks in the total number of reconstructed tracks (the mean is something like 1768 tracks, though I haven’t calculated it exactly).  I believe this part of the event processing should ideally be perfectly repeatable.  My first experiment was to replace all uses of HashMap in lcsim/tracking/ with LinkedHashMap and re-run with this change, but I still see this slight variation.  I believe this should be investigated more deeply.  Perhaps someone knows exactly which maps are being iterated over in the tracking?  A small standalone illustration of the ordering issue is below.
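Here is a self-contained example (not lcsim code) showing why iterating a HashMap keyed on objects that use the default identity hashCode can give a different order on each JVM run, while a LinkedHashMap always iterates in insertion order.  Note that LinkedHashMap only helps if the insertion order itself is deterministic; otherwise the hits need to be sorted on a stable key before iterating.

    import java.util.HashMap;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class MapOrderDemo {

        // A key class with no hashCode() override, like many hit/track objects:
        // it falls back to the identity hash, which changes from run to run.
        static class Hit {
            final int id;
            Hit(int id) { this.id = id; }
            public String toString() { return "Hit" + id; }
        }

        public static void main(String[] args) {
            Map<Hit, Double> hashMap = new HashMap<>();
            Map<Hit, Double> linkedMap = new LinkedHashMap<>();
            for (int i = 0; i < 5; i++) {
                Hit hit = new Hit(i);
                hashMap.put(hit, (double) i);
                linkedMap.put(hit, (double) i);
            }
            // This ordering can differ between runs of the same program.
            System.out.println("HashMap order:       " + hashMap.keySet());
            // This ordering is always the insertion order, hence repeatable
            // as long as the hits were inserted in a deterministic order.
            System.out.println("LinkedHashMap order: " + linkedMap.keySet());
        }
    }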

4) The detector simulation in SLIC should always be repeatable by setting the random seed through the command-line option (type ‘slic’ to get the list of command options).  By default, the random engine is seeded with the current time in milliseconds, so in the default configuration results are not repeatable.
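The same pattern in Java terms, purely as an analogy (SLIC’s random engine is C++ code, not java.util.Random): seeding from the clock gives a different sequence on every run, while a fixed seed gives the same sequence every time.

    import java.util.Random;

    public class SeedDemo {
        public static void main(String[] args) {
            // Seeded from the current time: a different sequence on every run.
            Random timeSeeded = new Random(System.currentTimeMillis());

            // Fixed seed: the same sequence on every run, so results repeat.
            Random fixedSeeded = new Random(123456789L);

            System.out.println("time-seeded:  " + timeSeeded.nextGaussian());
            System.out.println("fixed-seeded: " + fixedSeeded.nextGaussian());
        }
    }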

—Jeremy

On Sep 18, 2014, at 9:37 AM, Sho Uemura <[log in to unmask]> wrote:



On Thu, 18 Sep 2014, McCormick, Jeremy I. wrote:

Hi,

I understand there is built-in random noise in the ReadoutChip classes within the digitization code, so that’s one known source of (intended) variation.  It would be nice if the random seeds for these objects were somehow externally settable so that results could be made deterministic if required.  For testing the MC recon, I got around this by using a digi Driver that has a NoiselessReadoutChip.  That seemed to reduce the variation somewhat.
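Something along these lines is what I mean by “externally settable” (just a sketch; the setNoiseSeed method and its wiring to a steering parameter are hypothetical, not an existing API):

    import java.util.Random;

    // Hypothetical sketch of a readout chip whose noise generator can be
    // re-seeded from the steering file, making noisy runs repeatable on demand.
    class SeedableNoisyChip {
        private Random noiseGenerator = new Random(); // default: non-repeatable

        // A steering-file parameter could map onto this hypothetical setter.
        public void setNoiseSeed(long seed) {
            this.noiseGenerator = new Random(seed);
        }

        public double addNoise(double signal, double noiseRms) {
            return signal + noiseRms * noiseGenerator.nextGaussian();
        }
    }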


SimpleSVTReadout doesn't use ReadoutChip.

As for the other point, I do understand there will be some variation of results between runs, due to iterating over hash map value lists, whose ordering is determined by object memory ordering that changes each time you run the recon.  I’ve actually done a little bit of hacking to try to replace all usages of HashMap with LinkedHashMap in the tracking code in lcsim.  But I still get slight differences, at the level of 1 or 2 more or fewer tracks per roughly 1000 recon events.

This concerns me, because it means at some level that our MC results are not deterministically repeatable, and it also means that writing meaningful, integrated test cases that use the MC recon is basically impossible.


It means you can't do tests that assert exact event counts or hit counts. Those tests are good for testing that sim/recon are unchanged, but they don't actually test that they're doing the correct things. You can still do tests that assert parameters of distributions (means, resolutions) or put ranges on numbers (e.g. don't assert n=1713, assert n in the range 1713 +/- sqrt(1713)). Those are more meaningful for testing that you're getting good results (the couple of tests Pelle wrote are of this type), but they can only be written by someone who knows what the distributions should look like and what the key performance indicators are.
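For example, a range-style test might look something like this (a sketch assuming JUnit 4 and a made-up runReconAndCountTracks helper; a real test would drive the actual recon chain):

    import static org.junit.Assert.assertTrue;

    import org.junit.Test;

    public class TrackCountTest {

        @Test
        public void testTrackCountInExpectedRange() {
            // Hypothetical helper: run the recon over a known input file and
            // return the total number of reconstructed tracks.
            int nTracks = runReconAndCountTracks("known_input.slcio");

            // Assert a Poisson-style window around the expected mean instead
            // of an exact count: 1713 +/- sqrt(1713).
            double expected = 1713.0;
            double tolerance = Math.sqrt(expected);
            assertTrue("track count " + nTracks + " is outside the expected range",
                    Math.abs(nTracks - expected) <= tolerance);
        }

        // Placeholder so the sketch is self-contained; a real test would run
        // the readout and recon here and count tracks in the output.
        private int runReconAndCountTracks(String inputFile) {
            return 1713;
        }
    }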

This is not the case for the ECAL reconstruction, which always gives the same number of digitized hits and clusters when noise is turned off.

On a related note, the ECAL readout simulation itself still has some amount of variation in the number of events it writes, even when noise is turned off.  For instance, across multiple runs on 2.5 million input events, the output would vary (seemingly randomly) between 944 and 945 triggered events.

Do you think this stuff is worth looking into?  Is having the readout sim and recon be perfectly repeatable a desirable or necessary goal?


I don't need it to be. You should decide if you do.

BTW, I noticed this stuff when writing a test case where I tried to assert that X tracks should be found in a given recon run over a known input file.  This is basically impossible because the number of tracks always varies slightly.  I also wanted to write a chaining test to run the readout and then the recon, and again, that wasn’t possible because the number of output events from the trigger simulation varied slightly between runs.

—Jeremy

On Sep 18, 2014, at 8:38 AM, Sho Uemura <[log in to unmask]> wrote:

SimpleSvtReadout: <addNoise>false</addNoise>

Ordering of hits in collections is probably going to be random since we're iterating over HashMaps.

I think CDFSiSensorSim is deterministic but I don't know for sure. Ask Tim.

Can't guarantee there isn't anything else. This is not something we've been careful about.

On Thu, 18 Sep 2014, McCormick, Jeremy I. wrote:

If I turn off noise in the FADC Driver, is there any other source of randomness in the readout simulation that would prevent event processing from being exactly repeatable?

If so, how can I turn it off?







