Minutes of ATLAS/SCCS Planning Meeting 23rd Jan 2008


"Stephen J. Gowdy" <[log in to unmask]>


Wed, 23 Jan 2008 18:58:18 +0100 (CET)






ATLAS SCCS Planning 23Jan2008

  9am, SCCS Conf Rm A, to call in +1 510 665 5437, press 1, 3935#

Present: Stephen, Wei, Randy, Chuck, Peter, Richard, Bob


1. DQ2 Status/Web Proxy

    There is a problem with DQ2. The number of jobs PANDA reports as
    running is much higher than the number actually running at SLAC.
    This is the third time this problem has been seen, and they are
    still working on it. PANDA thinks there are 800 jobs when there are
    really only 500. Over the weekend it thought there were more than
    1k when there were only 700 or 800.

2. Tier-2 Hardware

    The Tier-2 CPUs have arrived but are not installed yet. All the
    Dell machines are being installed, but at the same time the number
    of previously installed machines is being reduced, as they are
    being moved to the new Black Box. Will increase the fair shares a
    little, but not to the final number for a month or so. The Dell
    machines are too long to fit in the Black Box, so machines need to
    be deinstalled from the water-cooled racks and moved to the Black
    Box at IR12; the new systems will then go into the water-cooled
    racks. While the new machines are being installed, the older
    machines in the Black Box will be cabled up. Staffing levels are
    uncertain, so it is hard to give a concrete schedule.

    Making sure that the storage can be plugged in. Looking at what is
    needed from GLAST and BaBar to put an order together, and have a
    fairly reasonable offer in hand. The Thumpers will use 1TB disks.
    Each server will use four gigabit ports, so the storage will use
    almost as many network ports as the CPUs. If this continues, the
    Tier-2 will not lean on the rest of the lab for hardware, but it
    will for support. As we are buying the same hardware for everyone,
    we can make priority-based decisions on who needs it at any
    particular time. It is not evident that there is nowhere to put
    them. An issue has been reported with 10gigE Intel cards in the
    Thumpers, but it shouldn't affect us. We might see a Thumper-like
    device that is really a JBOD with 10gigE on the motherboard.

3. AOB

    - Gatekeeper

    Could probably use a faster machine. We also need more gridFTP
    machines; three 20Zs would probably be enough. Not sure whether
    multicore machines would help. The best way to get the fastest
    machines is to use machines bought for batch, but nobody's money is
    earmarked for this. It was asked whether the BaBar machines could
    be used for ATLAS instead of buying new ones, but it was too late
    for that. Perhaps some of the Dell 1950s could be sold if they are
    not needed. For gridFTP, it is not completely clear where the CPU
    time is spent; it might be interesting to find out what other
    Tier-2s are using. Could take some of the Opteron machines bought
    for ATLAS to do this. Memory is one of the issues just now; could
    add more.

    Will dedicate some boers to these services on an ongoing basis.
    Four machines are needed; taking the last four will bring the count
    down to 135.

    - External Security Review

    In late February an external party will be doing an external scan.
    In the week of March 15th they will start trying to penetrate from
    inside. Meetings will start this afternoon to tighten up security
    at the lab. Some examples will be screen savers with passwords,
    running crack on local passwords, etc. This will be a pretty tough
    review, so everyone needs to help, and some extra staff time will
    need to be put on security.

    For ATLAS, need to make sure things like the MySQL server are up to
    date.

    - Network

    Working on upgrading the external network to 10gigE.

    - Job slots on xxl

    At the moment the limit is 62 slots. The xlong queue limit is about
    a day, and many jobs run longer than that, so they need to be moved
    to xxl. Would like to increase the number of xxl slots to something
    like 256. Wei has a graph showing that 400 jobs had been killed
    because of this. Could also increase the length limit of the xlong
    queue, while trying to keep it around a day; ATLAS jobs are meant to
    take around a day. Also need to make sure jobs don't run forever.
    US ATLAS should define the amount of CPU time needed in the Tier-2
    queues. Will discuss changing the length limits by email, and will
    also have a meeting at SLAC to see whether xxl could accept more
    jobs.

    Is there one type of machine that kills jobs? BaBar is asking for
    the CPUF on the Dells to be decreased, as they are not performing
    as well as expected.

    Working on a proposal to submit jobs to a single queue ("general"),
    where you would specify the CPU time needed. Short jobs would run
    first, but as longer jobs aged they would get higher priority to
    start. This would be useful once jobs start requesting multiple
    cores: short jobs could be put on idle cores while a whole machine
    is being freed up.
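    The single-queue idea above can be illustrated with a minimal
    sketch of shortest-first scheduling with aging. This is not the
    actual proposal or any batch-system feature; the Job fields, the
    priority() formula and the AGE_WEIGHT value are all hypothetical,
    chosen only to show how a long job's priority catches up with short
    jobs as it waits.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical weight: priority points a job gains per hour it waits.
AGE_WEIGHT = 2.0


@dataclass
class Job:
    name: str
    cpu_hours: float   # CPU time requested at submission (hypothetical field)
    submitted: float   # submission time, in hours


def priority(job: Job, now: float) -> float:
    """Higher is better: short requests start ahead, waiting raises priority."""
    waited = now - job.submitted
    return AGE_WEIGHT * waited - job.cpu_hours


def next_job(queue: list[Job], now: float) -> Job:
    """Pick the job to start next from the single 'general' queue."""
    return max(queue, key=lambda j: priority(j, now))
```

    With these assumed numbers, a 1-hour and a 24-hour request
    submitted together start short-first, but once the 24-hour job has
    waited 12 hours it beats a freshly submitted 1-hour job.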

    - Stephen Leaving

    Leaving ATLAS and SLAC and moving to CERN and CMS. This has been
    talked about for a long time but is finally happening, and quickly.
    Peter will be taking over this meeting. Stephen and Peter need to
    talk to come up with a transition plan; expect Stephen will still
    run at least the next meeting.

    - SL4 status

    The new machines are being installed with RHEL4. Have had the green
    light from GLAST on moving to RHEL4 and 64-bit. BaBar has been
    running on it but not compiling on it. ATLAS also doesn't compile
    in 64-bit mode.

    - FDR

    All US ATLAS Tier-2 sites are participating. Need to upgrade the
    network; could perhaps move the needed switches first. Two have
    been done already, but those were the easy ones to schedule. The
    next ones will not be so easy, as they affect more people. It would
    probably be too much work to do them all at the same time; small
    steps are generally better. Have learnt that running name servers
    on local machines has caused problems, so could perhaps stop doing
    that. There is now enough UPS capacity to support the first module
    for the 10gigE uplink to ESnet.

Action Items:

080123 Stephen Find out if there is or can be a US ATLAS batch limit(s)

|Stephen J. Gowdy, SLAC | CERN Office: 32-2-A22|
| | CH-1211 Geneva 23 |
| | Switzerland |
|EMail: [log in to unmask] | Tel: +41 22 767 5840 |
