Richard's corrections below (the list bounced his message).

--
  /------------------------------------+-------------------------\
|Stephen J. Gowdy, SLAC               | CERN     Office: 32-2-A22|
|http://www.slac.stanford.edu/~gowdy/ | CH-1211 Geneva 23        |
|http://calendar.yahoo.com/gowdy      | Switzerland              |
|EMail: [log in to unmask]       | Tel: +41 22 767 5840     |
  \------------------------------------+-------------------------/

---------- Forwarded message ----------
Date: Wed, 16 Aug 2006 20:00:22 -0700
From: "Mount, Richard P." <[log in to unmask]>
To: "Gowdy, Stephen" <[log in to unmask]>,
     atlas-sccs-planning-l <[log in to unmask]>
Subject: RE: Minutes of ATLAS/SCCS Planning Meeting (16th August 2006)

[corrections in square brackets]

-----Original Message-----
From: [log in to unmask] [mailto:[log in to unmask]] On Behalf Of Stephen J. Gowdy
Sent: Wednesday, August 16, 2006 9:28 AM
To: atlas-sccs-planning-l
Subject: Minutes of ATLAS/SCCS Planning Meeting (16th August 2006)

ATLAS SCCS Planning 16Aug2006
-----------------------------

   SCCS Conf Rm A, to call in +1 510 665 5437, press 1, 3935#

Present: Booker, Stephen, Wei, Richard, Chuck, Randy, Charlie, Steffen

Agenda:

1. DQ2 Status/Web Proxy

     Tried to limit access to the bare minimum. Difficult to find out
     what that actually was. Production at SLAC has been stopped since
     Monday. Opened up a little yesterday (to allow updates from inside
     the SLAC network).

     Developers have had feedback about security from their
     management. There are two issues: the web page with the usernames
     and passwords, and leaving that user able to do anything with the
     MySQL database.

     No other feedback yet on the webserver. Pleased with the fact that
     offsite use has been limited. Not sure if it still requires
     permission or not (no offsite access but the administrator can
     change it).

2. Trigger Farm Status

     Nothing to say this week.

3. ATLAS Oracle Server

     The machine is installed (racked and powered). Now working with
     Charley to get it a network connection.

4. Slots for ATLAS Production jobs and other batch related stuff

     Neal is back, so he can set it up now.

     iptables hack has been deployed on the batch system. It looks like
     the success rate recently has been near 100%. We could think about
     increasing the number of jobs that can run for production. Not sure
     we want to raise the limit while everyone is out of town, perhaps
     after we get back next week.

5. Tier-2 Web Page

     Wei has agreed to be responsible for the webpage. Richard has had
     an exchange of email with Su Dong and Doug Olsen about what to do
     with grid certificates. Folk should get a SLAC certificate, as
     people at SLAC will know in general who they are. Arla would be the
     sponsor.
[Olsen --> Olson]
[SLAC certificate --> OSG/SLAC certificate]

     There are some issues to do with adding users to the list of users
     able to use SLAC. It does seem that being added to the ATLAS
     list doesn't actually entitle you to get access, so it is fine that
     this is a different step. For the moment I (or Charlie) can email
     requests for people to be added.

     For normal accounts they should fill in the SLUO form and then fax
     the account request form (and other AUP forms) to Charlie. Not sure
     how we add a new place to the list of institutions on the SLUO
     form. We still need to establish who the PIs are. This is probably
     an issue for SLUO. Perhaps we should add all US institutions.

6. AOB

     Tier-2 Advisory Board meeting was help last Monday. There were Jim
     Shank and everyone but Peter Long and Charlie from the proposal and
     many folk from SCCS. Was appropriate not to have Richard as the
     chair. Nominations are accepted up to Monday (some already
     received) and then there will be an election. Richard described
     our plans to get the Tier-2 up before we get any money. Would like
     to get it
     fully functional by the time of the analysis workshop at LBL in
     October. Talked about the relationship between the top down
     management from ATLAS and the upwards representation from the
     users. The Resource Allocation Board for US ATLAS happens to have a
     person from each Tier-2 so perhaps should have the chair added. The
     current situation is that 80% of the Tier-2 resources go to
     Simulation Production, which is a little strange as 2/3 of the
     money was directed towards storage (which is not used a great deal
     by simulation production). To keep resources flowing into Tier-2s
     it is important to be responsive to local requests. It seems like
     the ability of the Advisory Board to steer decisions will be
     stronger than it might look on paper. As we have no experience of
     running the Tier-2 we wanted advice on whether the split between
     disk and CPU in the proposal was okay. The current setup was
     described, where we have SL machines for building and run RHEL for
     the backend. Jim
     would really like us to make that work for ATLAS but someone has to
     do the validation. Need to arrange for someone to do the
     validation. Not sure what a validation is currently. Production
     does do some sort of validation before running, so something has
     already been done.

     Harvard Workshop:
  	   Agenda:
  	   http://indico.cern.ch/conferenceDisplay.py?confId=4897

  	   Email [log in to unmask] MAC address for wireless
  	   access.
[help last Monday --> held last Monday]
[Shank and everybody but Peter Long --> Shank representing US ATLAS Management and everybody but Peter Loch]




Action Items:
-------------

060816 Wei	Set up ATLAS/SLAC Webpage

060816 Charlie	Talk to SLUO about adding institutions.

060816 Neal	Set up atlas priority group for LSF

060816 Chuck	Check with Bob about web server approval need

060809 Richard	Come back with person responsible for web site
         060816 Done. Found Wei.

060809 Stephen	Ask what dq2user needs to do in MySQL
         060816 No good answer. Limited dq2user so that from offsite it
                can only SELECT from localreplicas. From onsite it can
                SELECT, UPDATE, DELETE and INSERT on either
                localreplicas or queued_transfers_SLAC. We'll see if
                that works or not. Without onsite privileges production
                stopped.

060802 Stephen	Find out about failing ATLAS jobs at SLAC
         060809 Wei asked if there was anything useful; there were 9
                completely successful jobs (but those were tests). We
                should find out if the intermediate files are kept and
                are useful.
         060816 Wei: iptables hack to use the local replica seems to
                solve the problem. Done.

060726 Stephen	Find out about maximum memory and local storage per job
         060802 No news yet.
         060809 No news yet.
         060816 Official answer is 2GB memory... biggest is reco just now
  	      at 1.3GB, still to add magnetic field map. Many simulation
  	      jobs around 500MB. If we don't put more than one ATLAS
  	      job on a node should be okay for the near future. Raises
  	      question of how much pressure we feel to put on non-ATLAS
  	      orders to get more than 1GB/core.

060412 Systems  Provide Oracle service for ATLAS Trigger testing (RT 46089)
         060419 No ticket yet, so nothing done.
         060426 Now have ticket 46089.
         060503 No news.
         060524 Steffen has provided configuration information. Now in Chuck's
  	      hands.
         060628 Randy will ask Chuck about status.
         060726 First on list for V240 but not sure when it will
                happen. Will put a T3a on it.
         060802 John checking for rack space.
         060809 Still needs allocated rack space.

060224 Chuck	Will check on web server request for DQ2 machine
         060301 Waiting for web server request information from Stephen.
         060308 Haven't checked yet; haven't received Stephen's request yet.
         060315 Still not sent Chuck information.
         060405 No update.
         060412 No update.
         060419 No update.
         060426 Resubmitted request as it was lost before.
         060503 Not heard anything for about two weeks.
         060524 Stephen needs to update ticket, moving from yakut to DQ2 machine.
         060628 Request updated now running on atl-dq2, waiting for scan.
         060726 Meeting on Monday about this.
         060802 Requesting new user to run services, blocked offsite web
                access, need to explore MySQL security.
         060809 New user available, more ideas for MySQL security.
         060816 New action item created due to change of request.

060224 Richard	Discuss ATLAS trigger machines with others in SCCS
         060301 Only limited response from John W was resigned
  	      acceptance... need to work on an actual deployment plan as
  	      there are real issues to be solved.
         060308 John aware and in plans as much as anything is. New
  	      engineer will take over.
         060315 No update.
         060405 No update.
         060412 No update.
         060419 No update.
         060426 No update.
         060503 No update.
         060524 RT 45823. Engineer looking at power availability. On track for
  	      August.
         060628 Understand schedule, Randy will make sure John is aware.
         060726 Need to nail down when power will be available. Steffen
                thinks he can make it happen with existing equipment.
         060802 Looks like this will fit in SCCS. Can reuse rack,
                switches and fibres.
         060809 Everything looking good for this now.
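
A rough sketch of how the dq2user limits recorded under the 060809
"Ask what dq2user needs to do in MySQL" item might be written as MySQL
GRANT statements. Only the user name, the two table names and the
privilege lists come from the minutes. The database name (dq2) and the
onsite host pattern (%.slac.stanford.edu) are guesses for illustration
and would need to be replaced with whatever the real setup uses.

```python
# Sketch only: DB and ONSITE_HOST are assumptions, not taken from the minutes.
DB = "dq2"                            # hypothetical database name
USER = "dq2user"                      # from the minutes
ONSITE_HOST = "%.slac.stanford.edu"   # hypothetical onsite host pattern
OFFSITE_HOST = "%"                    # any other host

FULL = ["SELECT", "UPDATE", "DELETE", "INSERT"]

# Privileges per host pattern and table, as described in the 060816 entry.
POLICY = {
    OFFSITE_HOST: {"localreplicas": ["SELECT"]},
    ONSITE_HOST: {
        "localreplicas": FULL,
        "queued_transfers_SLAC": FULL,
    },
}


def grant_statements(policy=POLICY, db=DB, user=USER):
    """Return one MySQL GRANT statement per (host pattern, table) pair."""
    stmts = []
    for host, tables in policy.items():
        for table, privs in tables.items():
            stmts.append(
                "GRANT {} ON `{}`.`{}` TO '{}'@'{}';".format(
                    ", ".join(privs), db, table, user, host
                )
            )
    return stmts


if __name__ == "__main__":
    # Print the statements for review; nothing is sent to a server.
    for stmt in grant_statements():
        print(stmt)
```

MySQL matches a connecting client against the most specific host
pattern first, so an onsite client would pick up the wider privileges
and everything else would fall through to the SELECT-only entry.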