docs/pubs/0001-lcdd

lcdd-paper.tex 3133 -> 3134

--- docs/pubs/0001-lcdd/lcdd-paper.tex	2014-05-20 21:59:25 UTC (rev 3133)
+++ docs/pubs/0001-lcdd/lcdd-paper.tex	2014-05-20 23:14:30 UTC (rev 3134)
@@ -24,6 +24,8 @@

 %% $Id: elsarticle-template-num.tex 4 2009-10-24 08:22:58Z rishi $
 %%
 %%

+
+%% target: full-length writeup, shorten as necessary for NIM, CPC, etc.

 \documentclass[preprint,12pt,3p]{elsarticle}
 
 \usepackage{graphicx}

@@ -114,19 +116,7 @@

 %% \[log in to unmask]
 
 \begin{abstract}

-%% geant4 powerful but complex, large and steep learning curve
-%% requires expertise.
-%% want to have a program, not a toolkit
-%% gdml solved some of the problem (e.g. geom) but much remains
-%% e.g. fields, etc.
-%% some applications exist, but are very domain-spcific (e.g. medical) library work and get references.
-Linear Collider Detector Description (LCDD) is an XML format for fully describing complex High Energy Physics experimental detector setups at runtime for the Geant4 simulation engine.
-%% fully defined runtime simulation, not HEP-specific.
-%% parser and data format to replace C++ classes
-%% free the end user from the need to know c++ coding or Geant4 architecture/class specifics
-%% still need to know the Geant4 physics, e.g physics lists, regions, step size...
-%%
-%% target full-length writeup, shorten as necessary for NIM, CPC...

+Geant4 is a powerful framework for simulating the interactions of particles with matter and fields, but it is also large and complex, and using it fully requires considerable expertise.  It is advertised as a toolkit rather than an application.  Most users want a program they can run rather than being required to create their own application.  This requires that many of the simulation parameters be defined at runtime rather than embedded in source code.  Specifically, to remove the need for customized code that defines the experimental setup, the geometry must be defined by a data format rather than by a set of compiled classes.  Some applications exist but tend to be domain specific, such as medical physics simulations for radiotherapy.  GDML provides an XML geometry description, but not the other required information, such as detectors and fields, that makes up what is usually called the detector description.  We present here the Linear Collider Detector Des[...]

 \end{abstract}
 
 %% \begin{keyword}

@@ -146,19 +136,14 @@

 %% main text
 \section{Introduction}

-%% the following should be an implementation example

+%% free the end user from the need to know c++ coding or Geant4 architecture/class specifics
+%% still need to know the Geant4 physics, e.g physics lists, regions, step size...
+%%

 
 Geant4 encapsulates the best knowledge of the interaction of particles with material and fields.  It is distributed as a set of source files with compilation instructions, i.e. a toolkit that the user must assemble into a working application based on provided examples and tutorials.  However, this can be a daunting task, as it requires expertise not only in the toolkit but also in the details of C++ implementations.  Typically, end users must implement their own geometry setup in C++ code and instantiate and configure all the necessary components for the simulation, including the physics list, user actions, fields, etc.  Freeing end users from such requirements has been the goal of the LCDD project, so that they may focus on the physics questions they wish to explore.

-%% FIXME: some duplication with prior paragraph
-In experimental physics, Geant4 has become the de facto software package for modeling complicated geometries and readouts.  It defines a complete set of APIs in the geometry and detector description area.  But this has still required that end users create a C++ implementation for their specific geometry, which requires an expert to code.  There are problems with a code-centric approach to defining geometries for end users, as opposed to a data oriented system that binds to a standard API.  The latter has a number of advantages over the former.

+Individual Monte Carlo programs have typically used custom data formats and APIs for detector description, with complex interfaces that define shapes and readouts.  User applications based on these frameworks implement their detectors using the programmatic interfaces.  Some packages define their own runtime data formats for geometry description.  Historically, the lack of standardization in this area of detector description has hindered data interchangeability between different tools.  When geometry is defined by computer code, usually along with a set of associated run-time parameters, the user’s simulation package tends to greatly increase in size as the number of detector components grows, leading to maintenance issues and a proliferation of code.

-
-
-Linear Collider detector research programs have simulated in detail the response of a number of different detector designs and subdetector technologies.  The Silicon Detector (SiD) collaboration has optimized the design of its full detector concept through many different iterations.  This required the simulation of widely varying geometric layouts and readout schemes and the development of software to support this flexibility.
-
-Individual Monte Carlo programs have typically used custom data formats and APIs for detector description, with complex interfaces that define shapes and readouts.  User applications based on these frameworks implement their detectors using the programmatic interfaces.  Some packages define their own runtime data formats for geometry description.  Historically, the lack of standardization in this area of detector description has hindered data interchangeability between different tools.  When geometry is defined by computer code, usually along with a set of associated run-time parameters, the user’s simulation package tends to greatly increase in size as the number of detector components grows, leading to maintenance issues and code bloat.
-

 \section{GDML}
 
 % topic: GDML/geometry

@@ -167,21 +152,16 @@

 %% how does it answer question? import/export/exchange
 %% geometry is only a part of the project; still need regions, physics limits, fields, visualization, etc.

-\par

 GDML, or Geometry Description Markup Language, is an XML binding to the Geant4 geometry API, based upon work originally done for the LHCb project.  That collaboration’s XML tools were used as the basis for a more generic geometry data format.  In addition to Geant4, there is a GDML binding to the ROOT framework.  This allows users to define their geometries using a data-driven approach rather than having to write C++ source code.  Both frameworks provide services such as locating volumes given 3D points and traversing the geometry hierarchy.

-\par

 GDML can fully describe materials, variables and definitions, geometric solids, and a hierarchical structure of physical and logical volumes.  Materials are defined as either chemical elements or combinations thereof.  Simple constants can be defined, as well as equations that use trigonometric functions.  The support for different types of geometric solids is extensive, including Constructive Solid Geometry (CSG) solids defined algorithmically, and boundary representation (BREP) objects defined by planes forming a closed 3D surface.  A solid plus a material constitutes a logical volume that can be used as a template and placed multiple times.  The logical volumes are then placed in a nested hierarchy, which forms the structure of the geometry.  There is a top volume under which any number of child volumes are placed, which in turn may contain other volumes.  Unlike in CAD software, each of these volumes must be bounded, or closed, and volumes may not overlap each other.
 
 \section{LCDD}

-\par
- GDML provides the geometric data binding, but additional information is required to fully describe a detector’s parameters.  Usually, this complete set of data is called “detector description”, of which geometry is only one part.  Other frameworks that use a data centric approach for the geometry have generally still required other supplementary definitions to be included at runtime in order to fully define the detector.  For instance, several experimental frameworks can read geometric XML but then require macros to be executed at runtime to define the readouts.  But there are problems with this approach.  This information is then not easily recoverable or associated with the appropriate geometry components during a later stage of reconstruction or analysis.

+GDML provides the geometric data binding, but additional information is required to fully describe a detector’s parameters.  Usually, this complete set of data is called “detector description”, of which geometry is only one part.  Other frameworks that use a data-centric approach for the geometry have generally still required supplementary definitions to be included at runtime in order to fully define the detector.  For instance, several experimental frameworks can read geometric XML but then require macros to be executed at runtime to define the readouts.  The problem with this approach is that the information is then not easily recoverable or associated with the appropriate geometry components during a later stage of reconstruction or analysis.

-\par
- A more thorough approach is needed to guarantee the consistency and integrity of the detector data.  The LCDD format was designed to provide a complete description of a complex physics detector at run-time to a Geant4-based program.  It uses the GDML format for the geometric definitions and structure.  GDML is then extended to include readouts, visualization, magnetic fields, physics limits, and identifier definitions. Various types of detectors, ranging from simple test beams to full “4 PI” detectors, are describable to an arbitrary level of detail, using only an XML file.

+A more thorough approach is needed to guarantee the consistency and integrity of the detector data.  The LCDD format was designed to provide a complete description of a complex physics detector at runtime to a Geant4-based program.  It uses the GDML format for the geometric definitions and structure.  GDML is then extended to include readouts, visualization, magnetic fields, physics limits, and identifier definitions.  Various types of detectors, ranging from simple test beams to full “$4\pi$” detectors, are describable to an arbitrary level of detail, using only an XML file.

-\par

 LCDD is built upon the GDML data format and C++ parser.  It extends GDML’s XML data format by using facilities of the XML Schema (XSD) language.  It reuses the GDML code infrastructure by registering additional element handlers with GDML’s parser.  The extension point is the volume element, which may contain references to LCDD objects, such as sensitive detectors.  One benefit of this approach is that the parser, without any alteration, can also read in plain GDML files.
 
 %% put here a skeleton XML document showing the overall LCDD structure

@@ -193,7 +173,6 @@

 
 %% FIXME: too much technical detail here (appendix?)

-\par

 The volume element is the only part of the GDML format that is changed in order to connect it to associated LCDD objects.  The following portion of the XML schema definition extends this part of GDML.
 
 \begin{verbatim}

@@ -211,14 +190,12 @@

 </xs:extension>
 \end{verbatim}

-\par

 Aside from this addition, the GDML XML format is unchanged and is simply in-lined within its LCDD container.  The LCDD extension classes handle these references.  The volume elements can also be read as plain GDML by parsers such as the one in ROOT, as long as they skip over these extension elements.
 
 \section{Volume Element}
 
 %% FIXME: move to end after all elements have been described

-\par

 LCDD extends GDML by adding optional elements to the volume element.  A proper GDML parser will simply ignore these unknown tags when processing the file.  There are no other alterations to standard GDML made by LCDD, so the extension is relatively clean.  Deriving a valid GDML file from LCDD is therefore quite straightforward.  The following example assigns a sensitive detector, a set of physics limits, a region, and a set of visualization attributes to a volume.
 
 \begin{verbatim}

@@ -232,15 +209,12 @@

 </volume>
 \end{verbatim}

-\par

 The LCDD objects named in this volume description are actually references to previously defined elements.  For example, the “EcalBarrel” sensitive detector is defined prior to the volume definition, and the parser will retrieve its definition from an in-memory data structure and assign the sensitive detector to the named volume.  A similar strategy is used for the other objects referenced by the extended volume element.
 
 \section{Header Element}

-\par

 Every LCDD file begins with a header that defines basic metadata about the detector.  An author tag gives the names of the people who created the file, as well as an optional email contact.  The detector tag within the header provides information about the detector, including the name, version, and an external URL, possibly containing additional information about the concept.  Also contained in the header is the name of the external program that generated the file.  This includes the name of the generator, its version, a reference to the original file that produced the LCDD, if there was one, and a checksum on the file, produced using an MD5 algorithm.

-\par

 Here is an example of a header from a detector used for SiD detector studies:
 
 \begin{verbatim}

@@ -253,15 +227,12 @@

 </header>
 \end{verbatim}

-\par

 The name of the detector is “sidloi3”.  It was produced by the GeomConverter software package and the author is .  The name of the detector is particularly useful, as it can be used as a key for looking up conditions information within reconstruction programs, as well as tagging output data files with the detector that was used to run the simulation.  And the generator checksum will show whether two LCDD files are essentially the same.
 
 \section{Sensitive Detectors}

-\par

 The configuration and assignment of sensitive detectors to geometric volumes is a primary component of most Geant4 user applications.  The sensitive detectors are used to accumulate positions and energy measurements from particle interactions in the material.  These records can be used to construct hits that are persisted to data files for reconstruction and analysis.  LCDD provides XML bindings to classes that implement G4VSensitiveDetector.  Each of these models a particular type of detector.  In this scheme, there are three main types of sensitive detectors: trackers, scorers, and calorimeters.

-\par

 All extend a common XML element within the schema, which defines basic settings for every sensitive detector.  Each detector has a name, used to uniquely identify it within the document.  The name is used to connect individual logical volumes with their sensitive detectors.  A sensitive detector also has an associated hits collection where its hits will be stored.  Different detectors may write into the same hits collection.  A flag indicates whether or not the detector is an end cap.  An energy cut setting can be used to discard hits that do not reach a certain threshold.  A verbosity setting controls the printed output from the detector while the simulation is running.  Except for the name of the detector and its corresponding hits collection, which are required, these settings are all optional.  There is also an optional child element on each detector, pointing to the identifier specification that will be used to assign unique identifier[...]
 
 %% describe abstractly concept of detectors

@@ -282,20 +253,16 @@

 </tracker>
 \end{verbatim}

-\par

 The hits from this detector will not be combined, so that every step in the simulation will result in a hit in the output hits collection.  It has no energy cut, so hits of all energies will be recorded.  The endcap flag is turned off.  The IDs of the hits are constructed using the referenced identifier specification, which is defined in another XML section.
 
 \subsection{Scorer}

-\par

 The Scorer type is the simplest of the three sensitive detector implementations.  It records the passage of particles through a volume.  The main difference between the Tracker and the Scorer is that the latter will only record one hit for each unique G4Track that passes through it, whereas the Tracker class records all steps.
 
 \subsection{Calorimeter}

-\par

 The Calorimeter detector type can be used when it would be prohibitively expensive in terms of storage to write out all the individual particle steps, as is done for the trackers.  In general, these are sampling or homogeneous calorimeters with readouts using cells of a certain granularity.  Within each cell, the energies deposited by particles are summed over the event to create one hit per cell in the output hits collection.  To model the cell segmentation, each calorimeter optionally has an associated segmentation object that performs the readout segmentation on the fly.

-\par

 The following XML defines a calorimeter with uniformly sized cells created by a virtual segmentation class.
 
 \begin{verbatim}

@@ -305,23 +272,18 @@

 </calorimeter>
 \end{verbatim}

-\par

 The grid element divides the logical volumes that reference it into cells of that size, in this case 3.5 x 3.5 mm.
 
 \section{Segmentation}

-\par

 Sensitive volumes in a calorimeter detector usually require virtual subdivision in order that energy depositions can be summed and saved to hits with the proper granularity of identifier and position.  In LCDD, the concept of artificially dividing these volumes is called “segmentation.”  For highly segmented calorimeters, such as those found in ILC detector concepts, it would require literally millions of individual cell volumes to model the readouts as built, which could be prohibitive in terms of memory usage, as well as highly complicated to implement in the XML.  There are also cases where simulating a complex readout is more straightforward when performed by an algorithm rather than modeling it geometrically, such as with projective readouts that have many different shapes and sizes of cells.

-\par

 All segmentation elements extend a basic type that has no attributes.  The parameters defining the size of the cells are specific to the sub-type of segmentation.  This element is a child of the calorimeter that will use it, and each calorimeter detector is allowed to have one of these.  The implementation classes define names that can be referenced in identifier descriptions for writing the cell field values at a certain position into the output identifiers of the hits.  These are called bin values because they are similar to histogram bins.
 
 \subsection{Grid XYZ Segmentation}

-\par

 The grid\_xyz segmentation is one of the more generic types.  It segments a volume into cells along its X, Y, or Z axes, or any combination thereof, creating a regular Cartesian readout grid.  The bin values of the cells are available as the fields “x”, “y”, and “z” from the identifier specification.  In this type of segmentation, the cell indices are numbered from $-N$ to $N$, such that no information about the topology or boundaries of the segmented volume is required by the algorithm in order to represent the distance from the origin.
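 
 As an illustration of this indexing, and assuming that the grid is aligned to the local origin and that indices are obtained with a floor operation (an assumption made here for the sketch, not a statement about the LCDD implementation; the symbols $\Delta x$, $i_x$, and $x_c$ are introduced only for this example), a local coordinate $x$ and cell size $\Delta x$ would give
 
 \[
 i_x = \left\lfloor \frac{x}{\Delta x} \right\rfloor, \qquad
 x_c = \left( i_x + \tfrac{1}{2} \right) \Delta x ,
 \]
 
 where $x_c$ is the cell center recovered from the index alone.  For example, with $\Delta x = 10$ mm a hit at $x = -23$ mm falls in the cell $i_x = -3$, whose center is $x_c = -25$ mm.  Only the cell size and the origin enter these expressions, which is why the volume boundaries are not needed.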

-\par

 The following XML creates a segmentation that divides a volume into 1 x 1 cm cells in X and Y.
 
 \begin{verbatim}

@@ -336,31 +298,26 @@

 
 \subsection{Projective Cylinder Segmentation}

-\par

 The projective\_cylinder element divides a series of nested cylindrical tubes into projective towers.  This is an example of a projective cylinder segmentation that divides the theta and phi regions into 1000 and 2000 bins, respectively.
 
 \begin{verbatim}
 <projective_cylinder ntheta="1000" nphi="2000" />
 \end{verbatim}

-\par

 The numbers of theta and phi segments are specified as attributes.  The phi values segment a 360 degree region, and the theta values cover 0 to 180 degrees.  The cell indices indicate segments of these regions, such that 360 bins in phi would map to the cell indices 0 to 359 and correspond to 1 degree each.
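 
 As a rough worked example of this binning, assuming uniform bins whose indices are obtained by truncation (the symbols $\Delta\theta$, $\Delta\phi$, $i_\theta$, and $i_\phi$ are introduced here for illustration and are not part of the LCDD schema), the bin widths and cell indices for angles measured in degrees would be
 
 \[
 \Delta\theta = \frac{180^\circ}{n_\theta}, \qquad
 \Delta\phi = \frac{360^\circ}{n_\phi}, \qquad
 i_\theta = \left\lfloor \frac{\theta}{\Delta\theta} \right\rfloor, \qquad
 i_\phi = \left\lfloor \frac{\phi}{\Delta\phi} \right\rfloor .
 \]
 
 Under these assumptions, the segmentation above with $n_\theta = 1000$ and $n_\phi = 2000$ gives $\Delta\theta = \Delta\phi = 0.18^\circ$, with indices running from 0 to 999 in theta and from 0 to 1999 in phi.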
 
 \subsection{Non-projective Cylinder Segmentation}

-\par

 A nonprojective\_cylinder segmentation element can divide the surfaces of concentric cylinders into cells of equal size.
 
 \begin{verbatim}
 <nonprojective_cylinder grid_size_phi="10.0" grid_size_z="10.0" />
 \end{verbatim}

-\par

 The above segmentation will divide the surface of a cylinder into 10 x 10 mm cells.
 
 \subsection{Projective ZPlane Segmentation}

-\par

 The projective\_zplane segmentation divides an endcap zplane into projective segments.
 
 \begin{verbatim}

@@ -369,7 +326,6 @@

 
 \subsection{Global Grid XYZ Segmentation}

-\par

 The global\_grid\_xyz segmentation divides a global space into regularly sized rectilinear cells.
 
 \begin{verbatim}

@@ -378,13 +334,10 @@

 
 \section{Identifiers}

-\par

 Identifiers associate hits from sensitive detectors to their geometric components, as well as cell indices from the segmentation grid, in the case of calorimeters.  Each sensitive detector may have an identifier specification associated with it.  This is used to construct a unique 64-bit ID from physical volume numbers, such as layer number, and segmentation values, like X and Y cell indices.  The user is ultimately responsible for making sure this combination of values uniquely identifies a hit.

-\par

 All of the identifier specifications are contained in an ID dictionary called the iddict.  Each specification has a corresponding element called the idspec.  The idspec elements contain idfield tags that define a single field within the identifier.  These fields can be from 1 to 32 bits and may be signed or unsigned.

-\par

 Below is an example of an identifier for an ILC ECal detector.
 
 \begin{verbatim}

@@ -400,15 +353,12 @@

 </idspec>
 \end{verbatim}

-\par

 The first five fields of this identifier derive from volume identifier numbers.  The “x” and “y” fields are taken from the segmentation values calculated at the hit’s step position during the simulation.  Together, these values identify a unique cell in the ECal and can be used to recalculate the cell’s position during reconstruction and analysis.
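 
 One way to picture this packing, assuming that the fields occupy non-overlapping bit ranges and that signed values are stored in two's complement (both are assumptions for illustration; the symbols $v_k$, $w_k$, and $o_k$ are introduced here and are not LCDD names), is
 
 \[
 \mathrm{ID} = \sum_{k=1}^{n} \left( v_k \bmod 2^{w_k} \right) 2^{o_k},
 \qquad 1 \le w_k \le 32,
 \qquad o_k + w_k \le 64 ,
 \]
 
 where $v_k$ is the value of field $k$, $w_k$ is its width in bits, and $o_k$ is its starting bit position.  For example, a signed 4-bit field holding $v_k = -3$ would be stored as $-3 \bmod 16 = 13$.  The identifier is unique only if the field values themselves distinguish every cell, which remains the responsibility of the user.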
 
 \section{Physics Limits}

-\par

 Physics limits can be assigned to volumes in order to control the low level behavior of the simulation.  For instance, for performance purposes, the range cut can be increased to control the simulation time of electromagnetic showers.

-\par

 The following example will restrict the step lengths to 5 mm.
 
 \begin{verbatim}

@@ -419,12 +369,10 @@

 </limits>
 \end{verbatim}

-\par

 The limits that can be set include the maximum step length, the maximum track length, the maximum particle lifetime, the minimum particle kinetic energy, and the minimum range (or “range cut” in Geant4 terminology).
 
 \section{Regions}

-\par

 Regions are assigned to geometric volumes using the region element.  These are collections of volumes that share similar characteristics.  A flag specifies whether or not secondary particles produced in the simulation are stored into the output particle collections.  The following example defines a tracking region in which all secondary particles will be stored into the output.
 
 \begin{verbatim}

@@ -434,12 +382,10 @@

 </regions>
 \end{verbatim}

-\par

 The above example also specifies that particles which would not travel more than 10 millimeters will not be produced, and it establishes an energy cut of 1 MeV for storing particles to the output.
 
 \section{Magnetic Field}

-\par

 Realistic simulation of magnetic fields is typically an important part of the detector simulation.  There are currently three types of fields available.  When the field regions overlap, the B-field components are added to each other as an overlay.  The solenoid element has an inner and outer field value.  The following is an example of a solenoid with a 5 Tesla magnetic field in Bz.
 
 \begin{verbatim}

@@ -450,12 +396,10 @@

 </fields>
 \end{verbatim}

-\par

 In addition to the solenoid, a dipole field is available that uses a simple polynomial fit with a variable number of coefficients.  An RZ field map can be used to simulate a non-uniform field in the radial and Z directions with a uniform phi.
 
 \section{Visualization}

-\par

 Geant4 has a number of built-in mechanisms for visualizing detectors.  These range from real-time visualization using OpenGL to generation of file formats such as VRML.  LCDD allows one to attach visualization attributes to geometric volumes with an XML element that binds to the G4VisAttributes class.  The supported visualization attributes include line style (broken or unbroken), drawing style (wireframe or solid), whether or not daughter volumes should be visible, and whether the volume itself is visible.  The color of volumes may also be assigned using individual RGB values from 0.0 to 1.0, as well as the alpha.  A 0.0 alpha value would render the component invisible.  The following is an example of visualization attributes.
 
 \begin{verbatim}

@@ -473,15 +417,13 @@

 
 \section{Examples}

-\par
-The SiD collaboration has simulated a number of “4 PI” detector designs.  The current design for its Detector Baseline Document (DBD) is the sidloi3 detector, which is composed of vertex, tracking, and calorimeter sub-systems.  LCDD was used to model and simulate these sub-detectors in a variety of physics scenarios.  This includes an ECAL with several million readout channels as well as a Silicon Vertex Tracker with thousands of tracking modules per sub-detector.

+Linear Collider detector research programs have simulated in detail the response of a number of different detector designs and subdetector technologies.  The Silicon Detector (SiD) collaboration has optimized the design of its full detector concept through many different iterations.  This required the simulation of widely varying geometric layouts and readout schemes and the development of software to support this flexibility.  The current design for its Detector Baseline Document (DBD) is the sidloi3 detector, which is composed of vertex, tracking, and calorimeter sub-systems, as well as support, masks and dead material.  LCDD was used to model and simulate these sub-detectors in a variety of physics scenarios.  This includes an ECAL with several million readout channels as well as a Silicon Vertex Tracker with thousands of tracking modules per sub-detector.

 
 \begin{figure}[h]
 \caption{The SiD Silicon Vertex Tracker Barrel.}
 \includegraphics[width=0.5\textwidth]{sidloi3_tracker_barrel}
 \end{figure}

-\par

 The Heavy Photon Search (HPS) is a direct Dark Matter search.  Its test run detector was simulated using LCDD, in a variety of configurations.
 
 %% This is a placeholder graphic only from an OpenGL screen scrape!

@@ -492,7 +434,6 @@

 
 \section{Conclusion}

-\par

 LCDD is a robust and complete system for modeling detectors using the Geant4 simulation toolkit.  It has been used by a variety of experimental physics collaborations to prototype various detector designs.
 
 %% \begin{verbatim}

[Note: Some over-long lines of diff output only partially shown]

Commit in `docs/pubs/0001-lcdd` on MAIN
`lcdd-paper.tex`	+10	-69	3133 -> 3134