ATLAS SCCS Planning 30Aug2006
-----------------------------

  9am, SCCS Conf Rm A, to call in +1 510 665 5437, press 1, 3935#

Present: Charlie, Wei, Steffen, Stephen, Richard, Gary, Chuck, Su Dong

Agenda:

1. DQ2 Status/Web Proxy

    There is going to be a test of data transfers Tier-0 <-> Tier-1 <->
    Tier-2. The portion that will go to SLAC will only be 70GB. There
    will be a transfer from BNL to SLAC on the 11th/12th and back on
    the 13th/14th. We will not have the full network available because
    power is not available, but that shouldn't be an issue for this
    test.

    Prior to the test we need to make sure that there is enough disk
    space and that transfers are working. Wei should be around the
    week of the tests. It might be useful if the transition to the
    production account for DQ2 could happen before the test.

2. ATLAS Oracle Server

    Over the time we've been discussing this, the importance of Oracle
    for the SLAC tests has gone down. This is partly because the
    product developed to allow scaling is based on MySQL. We may still
    need Oracle in the future, but it doesn't look likely at the
    moment.

3. Slots for ATLAS Production jobs and other batch related stuff

    Raised the number of jobs to 30. Have had 47 jobs running at the
    same time. No real problems beyond the load on the NFS server being
    higher than before: it was at about 25% utilization for 62MB/s.
    Once we can get the jobs running on machines with more local
    scratch space they will not need to use the NFS server.

    The fairshare group is set up (atlas, with one member) but it
    doesn't have any allocation yet. There are 1000 shares: 10% goes to
    the unwashed masses, 10% to LCD, 30% to one GLAST function and 50%
    to another GLAST function. The GLAST allocation should reduce soon,
    so we should be able to allocate 10% to ATLAS. We want to have
    separate controls over the allocation to local vs. grid use.
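
    As a rough sketch only, the current split could be expressed with
    LSF's queue-level fairshare syntax along the following lines. The
    queue and group names are illustrative, not our actual LSF
    configuration:

        # lsb.queues (sketch of the current 1000-share split);
        # "default" covers users not otherwise listed. The ~100 shares
        # for atlas would later come out of the GLAST allocation.
        Begin Queue
        QUEUE_NAME = general
        FAIRSHARE  = USER_SHARES[[default, 100] [lcd, 100] [glast_a, 300] [glast_b, 500]]
        End Queue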

    Could we direct ATLAS Grid jobs to the general queues and other
    Grid jobs to the osgq? Yes. Some of the ATLAS jobs take too long
    for most of the general queues. They do specify a RUNTIME but not a
    CPUTIME.
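
    If we end up mapping those limits onto LSF ourselves, a minimal
    sketch (the queue name and job script are illustrative) would pass
    both a run limit and a CPU limit at submission:

        # cap wall-clock time (-W) and CPU time (-c), both [hours:]minutes
        bsub -q general -W 24:00 -c 24:00 ./run_atlas_job.sh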

    Would like to move the ATLAS OSG jobs to a larger number of
    machines but with a lower priority. They should also be on machines
    with large local disk space. Will need to have the grid jobs
    specify how much space they need. Can we tell user analysis jobs
    from real production jobs as they both use the PANDA system? Not
    sure. Might need to be monitored in the future.
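
    One way a grid job could declare its scratch needs at the LSF level
    is a resource requirement string; a sketch, with the 20GB figure
    purely illustrative:

        # ask for a host with >20GB free in /tmp and reserve that
        # much for the job (the tmp index is in MB)
        bsub -R "select[tmp>20000] rusage[tmp=20000]" ./run_atlas_job.sh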

4. Validation of ATLAS jobs on RHEL

    Currently ATLAS only runs a validation on each kit. This is done by
    production before running at each site, so SLAC is officially
    validated. There is someone working with DavidQ to try to do an
    event-by-event validation, currently looking at event
    generators. We should see if SLAC can help out here; it may save us
    some pain in the future.

5. Further report on Boston meeting

    It was very noticeable when people presented at the meeting that no
    one had any significant storage, even though we were directed
    towards spending 2/3 of the money on storage, as were many past
    successful proposals. The current requirement for a Tier-2 is to
    have 1TB of disk. We need to look at this carefully in the future.
    Expect that in practice the storage fraction will be less than 2/3
    until real data starts to arrive; to reach the 2/3 ratio when that
    happens we may need to start the build-up somewhat earlier. Due to
    the specific SLAC setup we may not deviate too much from the plan,
    as SLAC currently has a significant amount of CPU but very little
    ATLAS disk. There is a recognition that the Tier-2s should be
    managed much like our Western Tier-2 will be, with strong local
    representation, but there is no model yet for how to achieve that.

6. AOB

    - SLAC ATLAS web page

      Consensus that each Tier-2 will have a similar-looking web
      page. They should all follow the same template (which hasn't been
      developed yet). We are trying to get the content out just now for
      comments and can later fit it into the correct framework. We
      should let others know that we want to follow the "standard"
      methods.

    - CERN Users

      May need to be a special case as we do need some people from CERN
      to log on to SLAC.

Action Items:
-------------

060830 Stephen	Talk more to DavidQ about validation.

060830 Richard	Talk to Gregory about "getting" disk

060823 Stephen	Find out what current validation processes exist
        060830 Done.

060823 Wei	Talk to Neal about raising the osgq limit
        060830 Done.

060816 Wei	Setup ATLAS/SLAC Web page
        060823 Wei circulated a note to try to bring back comments for
               next week.
        060830 First draft up.

060816 Charlie	Talk to SLUO about adding institutions.
        060830 Will take time to converge but will be done.

060816 Neal	Setup atlas priority group for LSF
        060823 Not done yet.
        060830 Group setup, not priority given yet.

060816 Chuck	Check with Bob about web server approval need
        060823 To be done.
        060830 To be done.

060809 Stephen Ask what dq2user needs to do in MySQL
        060816 No good answer. dq2user is limited: from offsite it can
               only SELECT from localreplicas; from onsite it can
               SELECT, UPDATE, DELETE and INSERT to either
               localreplicas or queued_transfers_SLAC (sketched below).
               We'll see if that works or not. Without onsite
               privileges production stopped.
        060823 Sounds like things are working again, but no concrete info.
        060830 Mark this as done.
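
        For reference, that split could be expressed with MySQL grants
        along these lines; the database name dq2 and the host patterns
        are assumptions, not the actual values used:

            -- offsite: read-only access to localreplicas
            GRANT SELECT ON dq2.localreplicas TO 'dq2user'@'%';
            -- onsite: full DML on both tables
            GRANT SELECT, INSERT, UPDATE, DELETE
              ON dq2.localreplicas TO 'dq2user'@'%.slac.stanford.edu';
            GRANT SELECT, INSERT, UPDATE, DELETE
              ON dq2.queued_transfers_SLAC TO 'dq2user'@'%.slac.stanford.edu';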

060412 Systems  Provide Oracle service for ATLAS Trigger testing (RT 46089)
        060419 No ticket yet, so nothing done.
        060426 Now have ticket 46089.
        060503 No news.
        060524 Steffen has provided configuration information. Now in
               Chuck's hands.
        060628 Randy will ask Chuck about status.
        060726 First on list for V240 but not sure when it will
               happen. Will put a T3a on it.
        060802 John checking for rack space.
        060809 Still needs allocated rack space.
        060816 Has rack and power, waiting for network.
        060823 Waiting for switch.
        060830 No longer required. "Done."