------ Forwarded Message
From: Michael Ernst <[log in to unmask]>
Date: Mon, 28 Sep 2009 09:20:09 -0400
To: <[log in to unmask]>
Subject: [Usatlas-grid-l] ATLAS User Analysis Test (UAT) in Oct., 2009

Dear Colleagues,
Please take note that there is an important ATLAS computing exercise
coming up in the second half of October. ATLAS computing coordination is
organizing a large-scale, worldwide end-user analysis test, engaging as
many users as possible to run their analysis code over the several
hundred million events we have produced in the U.S. over the course of
the summer. It is important to note that the test is intended to be an
end-to-end exercise: replication of the datasets produced on the grid to
end-user workstations and Tier-3 centers, where they exist, is an
integral part.

This exercise will be an important indicator of the readiness of our
distributed facility for analysis operations, which will be particularly
challenging for the Tier-2 centers. We will use the time until the start
of the test to get our resources and operations prepared as well as
possible. Forums such as our weekly computing meeting on Wednesday, and
the meetings devoted to specific topics like Data Management, will be
used to coordinate the preparatory work and to address specific
questions you may have. Please make sure you can attend these meetings.

Best regards,
Michael
-----Original Message-----
From: Jim Shank [mailto:[log in to unmask]]
Sent: Monday, September 28, 2009 2:14 AM
To: atlas-project-adc-operations
Cc: Fabiola Gianotti; Lankford; Dave Charlton
Subject: User Analysis Test (UAT) in Oct., 2009

Dear ADC ops,
    Below are some details emerging as we firm up plans for the next
analysis test. I suggest we go over this at the next ADC ops meeting.
Regards,
              Jim

User Analysis Test (UAT)
Date: 21-23 Oct. (28-30 Oct. as a backup date)
Coordinator: J. Shank

Scope: to get as many user analysis jobs as possible running over the
worldwide resources. Users are meant to run their normal analysis jobs
during this time. We will have instructions for including (trivially!) a
metric-gathering option for both Ganga and pAthena. We are distributing
a few large datasets that we encourage people to run over. Details of
the datasets are given below.
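
For orientation, here is a rough sketch of what a Ganga (Python) job
definition for such an analysis might look like, using one of the step09
containers listed further down in this message. The class and attribute
names follow the GangaAtlas interface as I remember it and should be
checked against the current Ganga tutorial; the job options file name is
a placeholder, and the UAT metric-gathering option is not shown since
the instructions for it will come separately.

    # Sketch only, to be run inside a ganga session: a user's Athena analysis
    # over one of the step09 AOD containers, submitted through the Panda backend.
    # Check the attribute names against the GangaAtlas documentation;
    # 'MyAnalysis_topOptions.py' is a placeholder for your own job options.
    j = Job()
    j.application = Athena()
    j.application.option_file = ['MyAnalysis_topOptions.py']  # your usual job options
    j.application.prepare()                                   # package the local Athena setup
    j.inputdata = DQ2Dataset()
    j.inputdata.dataset = ['step09.00000011.jetStream_medcut.recon.AOD.a84/']
    j.outputdata = DQ2OutputDataset()                         # ntuples/DPDs registered back in DQ2
    j.splitter = DQ2JobSplitter()
    j.splitter.numsubjobs = 100
    j.backend = Panda()
    j.submit()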

Note: This is NOT an analysis Jamboree (a term we use for the more
tutorial-style meetings). This is a follow-on test to the STEP09
exercise and our last one before data taking.

Pretest (starts now): replication of datasets to as many analysis sites
as possible; set up HammerCloud tests to verify the sites.
Action Item: Kaushik will specify the disk space required.

Details of the test:
   Users will run analysis jobs on a set of large AOD datasets for the
first two days; the third day is for copying output to T3's or local
disks.
   Participants should have some experience running analysis jobs on the
grid (Ganga or pAthena). This is not the exercise in which to learn how
to use the grid: read the Physics Analysis Workbook, take (or follow on
the web) the tutorials, and go to jamborees for that type of
introduction.

The plan is to have five large containers (say 100M AOD events each),
distribute two to each T2, and then give each (expert) user a list of
three to run over (so that no single user has jobs running only on a
single T2). We are of course flexible on this, and I suspect a more
realistic test would be to give users a set of 2-3 containers to run
over which are held at two T2s that are not necessarily in the same
cloud.

In keeping with the request from physics coordination not to disrupt
users any more than necessary, the proposal is for users to run whatever
their usual jobs are over these big samples (for those who run over RAW,
ESD, and cosmic data, we plan to have those distributed as well, though
of course not at the hundred-million-event scale). This has the added
advantage of including potential "problem jobs" that might be missed in
a more controlled test. The plan is to have experienced (power or group)
users run the majority of jobs over the AODs and produce ntuples (DPDs).
We will then have users (including less experienced users) dq2_get the
resulting ntuples to their local storage (T3's, campus clusters, or
desktops).
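
As a sketch of that last step (the output container name and destination
path below are placeholders, not real UAT dataset names, and it assumes
the DQ2 end-user client tools are already set up in the environment):

    # Minimal sketch: copy a UAT ntuple container to local (T3/desktop) storage
    # by calling the dq2-get client from Python. The container name and the
    # target directory are placeholders for illustration only.
    import os
    import subprocess

    output_container = "user09.SomeUser.uat.ntuples/"
    target_dir = "/data/uat_ntuples"

    os.makedirs(target_dir, exist_ok=True)
    # dq2-get fetches the dataset files into a subdirectory of the working directory
    subprocess.check_call(["dq2-get", output_container], cwd=target_dir)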

The large datasets are AOD and should not in general require database
access, but this could depend on the particular user's analysis. The
current plan is instead to provide smaller RAW, ESD, and cosmic
datasets, which do need various kinds of database access, and replicate
them to the various T2s. Since they are smaller, this should not be a
major issue.

Action Item: Jim Cochran will set up a Twiki with details of the large
dataset and instructions on how to include HammerCloud-type metrics.
Action Item: Jim Cochran will identify other datasets that need to be
replicated to analysis sites.
Action Item: Jim Shank and Massimo will push to get more users involved.
Action Item: Massimo and Jim S. will set up a generic UAT Twiki for
this.

Goals:
The aim is to get a measurement of the "efficiency", time to ntuple,
etc. (essentially the same as HammerCloud, but with users actively
involved), including a large amount of file movement with dq2_get.

Large dataset:

Event type in the dataset:
Essentially JF35, which is primarily multijet but with appropriate
amounts of W, Z, J/Psi, DY, ttbar, etc., that satisfy the JF35 cut. It
was noted that most W->munu events (and Z->mumu as well) will be lost,
since the 35 GeV "jet cut" does not include muons. One of the initial
containers (27M events) is actually JF17, but it should still be useful
to include in the test.

The first two containers are:

# step09.00000011.jetStream_medcut.recon.AOD.a84/
   * jet pt > 35 GeV, estimated total size 14900 GB, 9769 files, 97.69M events
   * cross section: 75,075 nb, filter efficiency = 0.1385 -> 10,398 nb. 97.69M events -> 9.4 pb-1 integrated luminosity
# step09.00000011.jetStream_lowcut.recon.AOD.a84/
   * jet pt > 17 GeV, estimated total size 3674 GB, 2749 files, 27.49M events
   * cross section: 1,453,600 nb, filter efficiency = 0.0706 -> 102,624 nb. 27.49M events -> 0.26 pb-1 integrated luminosity
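
For reference, the quoted numbers follow from effective cross section =
cross section x filter efficiency and integrated luminosity = number of
events / effective cross section; a quick check in Python:

    # Quick check of the cross sections and integrated luminosities quoted above.
    PB_PER_NB = 1.0e3  # 1 nb = 1000 pb, so 1 nb^-1 = 1e-3 pb^-1

    samples = [
        ("medcut (jet pt > 35 GeV)",   75075.0, 0.1385, 97.69e6),
        ("lowcut (jet pt > 17 GeV)", 1453600.0, 0.0706, 27.49e6),
    ]
    for name, sigma_nb, filter_eff, n_events in samples:
        sigma_eff_nb = sigma_nb * filter_eff           # effective cross section in nb
        lumi_pb = n_events / sigma_eff_nb / PB_PER_NB  # integrated luminosity in pb^-1
        print("%s: %.0f nb, %.2f pb^-1" % (name, sigma_eff_nb, lumi_pb))
    # -> roughly 10,400 nb and 9.4 pb^-1 for medcut, and roughly 102,600 nb and
    #    0.27 pb^-1 for lowcut, consistent with the figures quoted above.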

These two containers have been fully replicated to all US Tier 2's.
Nurcan has run validation tests on them. They are ready for other users.

   In addition, I have made the following new containers. Their names
still contain the filter used; this can easily be masked by creating a
new container name, if we want to. The following containers are only
available at BNL and have not been replicated to Tier 2's yet. Note that
there is some overlap in events between the containers below and the
containers above (the ones with step09* names):

groupmc08.105807.JF35_pythia_jet_filter.merge.AOD.e418_a84_t53/
  199.88M events, ~30 TB, ~19 pb-1

groupmc08.105807.JF35_pythia_jet_filter.merge.AOD.e359_a84_t53/
  22.81M events

   Finally, we have some unmerged containers. It would take a couple of
days to merge them, if we decide to use them. A small fraction of these
events are also in the step09* containers:

groupmc08.105807.JF35_pythia_jet_filter.recon.AOD.e449_a84/
  109.85M events

groupmc08.105802.JF17_pythia_jet_filter.recon.AOD.e347_a84/
  99.94M events

   So, the total is ~500M events, but the containers are not yet all
non-overlapping, nor is the naming consistent.
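
For completeness, the gross sum of the container sizes listed in this
message, before accounting for the overlaps noted above, comes to about
558M events; the ~500M figure allows for those overlaps:

    # Gross sum of the container sizes listed above (step09 plus groupmc08),
    # in millions of events, before removing the overlaps mentioned in this message.
    events_M = [97.69, 27.49, 199.88, 22.81, 109.85, 99.94]
    print(sum(events_M))  # ~557.7M gross; roughly 500M after overlaps are removed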

Action Item: Kaushik will merge and make new containers.

_______________________________________________
Usatlas-grid-l mailing list
[log in to unmask]
https://lists.bnl.gov/mailman/listinfo/usatlas-grid-l

------ End of Forwarded Message