ATLAS SCCS Planning 26July2006
------------------------------

  SCCS Conf Rm A, to call in +1 510 665 5437, press 1, 3935#

Present: Steffen, Randy, Stephen, Charlie, Len, JohnB, Chuck, Shirley, Wei

Agenda:

1. DQ2 Status/Web Proxy

    There will be a meeting next week about the web server
    approval. There is probably a long list of questions that are not
    well defined. John will try to get a list prior to the meeting.

    The typical clients are the SLAC batch system to get input data and
    put output data. Other major clients are BNL to transfer the data
    out and folk who run production to monitor it.

    Who will complain if it doesn't work? At the moment suspect it will
    be BNL folk.

    Other questions: where does the code come from, who applies
    security updates, and what data is it serving?

2. Trigger Farm Status

    The power will not be available before September, probably more
    like October. Will install in IR2 for now.

3. ATLAS Oracle Server

    Steffen has suggested a name and that it is okay to put it in the
    IFZ. Not sure where this falls in the priority list of John's
    queues. Has power for 128 BaBar machines and after that 64. 24 of
    those are for SSRL and 25 are GLAST, 8 for RD.

4. Slots for ATLAS Production jobs and other batch related stuff

    Might be useful if we can increase the number of jobs. Perhaps
    should make a dedicated queue for this. All production jobs run as
    the same user so having a per-user limit isn't useful in
    general. Will add it to the 45 queues we have...

    Can we add more resources to the new queue? If we look at all the
    public and BaBar job slots, only 32% are used. Typically it has been
    less than 50%. GLAST is getting ready for two data changes (dealing
    with live data from the LAT) starting in August. Is there a forum
    to discuss sharing resources? Neal and Randy are trying to set
    something up.

    Also need some ability to do other simulation for local
    users. Should we continue to use the generic queues or create an
    ATLAS specific queue? Would prefer to use the general queues with a
    fair share list. Should ask for this to be created. Charlie will
    request this (of Neal and unix-admin).

5. Panda use of /tmp

    Panda jobs have been using /tmp. Some need up to 5GB per job. All
    machines beyond the tori (perhaps not with HyperThreading turned
    on) have enough space. Could think about adding a disk to some
    nomas, not that we normally like to make machines different.

    The ATLAS jobs currently have no way to request this but have asked
    Wei to hack the request ('-R scratch > 5') into the lsf.pm module.
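
    As an illustration only, the scratch request quoted above could be
    attached to a job submission roughly as follows. This is a sketch in
    Python, not the actual lsf.pm change: the function name, the script
    name and the queue argument are hypothetical, and it assumes that
    `scratch` is a site-defined LSF resource measured in GB.

```python
import shlex


def bsub_command(job_script, scratch_gb=5, queue=None):
    """Build a bsub argument list that asks LSF for local scratch space.

    Assumption: `scratch` is a site-defined LSF resource reported in GB.
    """
    args = ["bsub"]
    if queue is not None:
        # Optional queue name (hypothetical; not specified in the minutes).
        args += ["-q", queue]
    # Resource requirement string in the form quoted in the minutes.
    args += ["-R", f"scratch > {scratch_gb}"]
    args.append(job_script)
    return args


if __name__ == "__main__":
    # Print a shell-safe command line instead of submitting anything.
    print(shlex.join(bsub_command("run_panda_job.sh", scratch_gb=5)))
```

    Building the command as a list keeps the resource string as a single
    argument, so the spaces and '>' in it are not interpreted by a shell.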

6. DQ2/Data Management Workshop 28/29th September 2006 @ BNL

    SCCS's and SLAC's travel is already overcommitted, so need to get
    this on the list as soon as possible.

    Wei is the point person for ATLAS in SCCS. He will be spending most
    of his time on ATLAS. Wei will hopefully go.

7. AOB

    News: We're officially on ATLAS and the Tier-2 was approved. Wei is
    the lead person for handling ATLAS. The majority of his time will
    be worrying about and taking care of ATLAS.

    Need to also get an idea of how much memory is needed for
    jobs. Also want to know about what is required for local
    storage. The latest production release has been delayed to allow
    developers to reduce the memory usage.

    Money for the Tier-2 is meant to arrive around the 1st
    September. Would like the hardware in production before the end of
    November, so need to get the order out the door before the end of
    September.

    There is a chunk of the lab budget for air cooling but not for
    water cooling, which means that the visible cost would be a lot
    larger for water cooling.

Action Items:
-------------

060726 Randy	Talk to Richard about DQ2 Workshop

060726 Charlie	Request ATLAS LSF priority group

060726 Stephen	Find out about maximum memory and local storage per job

060628 Stephen	Arrange phone conference for future 9am meetings
        060704 Done for today, need to redo for future meetings.
        060726 Done. Same number.

060412 Systems  Provide Oracle service for ATLAS Trigger testing (RT 46089)
        060419 No ticket yet, so nothing done.
        060426 Now have ticket 46089.
        060503 No news.
        060524 Steffen has provided configuration information. Now in
               Chuck's hands.
        060628 Randy will ask Chuck about status.
        060726 First on list for V240 but not sure when it will
               happen. Will put a T3a on it.

060224 Chuck	Will check on web server request for DQ2 machine
        060301 Waiting for web server request information from Stephen.
        060308 Haven't checked yet; haven't received Stephen's request yet.
        060315 Still not sent Chuck information.
        060405 No update.
        060412 No update.
        060419 No update.
        060426 Resubmitted request as it was lost before.
        060503 Not heard anything for about two weeks.
        060524 Stephen needs to update ticket, moving from yakut to DQ2
               machine.
        060628 Request updated now running on atl-dq2, waiting for scan.
        060726 Meeting on Monday about this.

060224 Richard	Discuss ATLAS trigger machines with others in SCCS
        060301 Only limited response from John W was resigned
               acceptance... need to work on an actual deployment plan as
               there are real issues to be solved.
        060308 John aware and in plans as much as anything is. New
               engineer will take over.
        060315 No update.
        060405 No update.
        060412 No update.
        060419 No update.
        060426 No update.
        060503 No update.
        060524 RT 45823. Engineer looking at power availability. On track
               for August.
        060628 Understand schedule, Randy will make sure John is aware.
        060726 Need to nail down when power will be available. Steffen
               thinks he can make it happen with existing equipment.