

ATLAS SCCS Planning 06Jun2007

  9am, SCCS Conf Rm A, to call in +1 510 665 5437, press 1, 3935#

Present: Wei, Stephen, Len, JohnB, Chuck, Charlie, Booker, Richard,
 	 Randy, Tofigh


1. DQ2 Status/Web Proxy

    Not much to say this week. Almost every site complained that they
    do not have enough jobs to run. BNL occasionally has a problem
    with dCache, and we don't think they have fully fixed it. After
    they claimed it was fixed, there were authentication problems and
    timeouts with their dCache system. Kaushik will raise this issue
    as lots of CPUs are going idle.

    Last Friday told SLAC ATLAS folk how to use the new area but not
    sure if anyone else has used it yet.

2. Tier-2 Hardware

    Should get some power to the racks within the next day. If we get
    all the other cabling done, should be able to do the software
    install by next week. Can burn them in on some simulation. So if
    all goes well we could see them at the end of next week.

3. AOB

    - Power problems

    Have had some machines lose power with no explanation. Don't seem
    to be able to track down what is going on. Not sure if Boris has
    been told. It could perhaps be correlated with high harmonics. The
    machines that are affected are in a particular row downstairs. A
    dozen machines that dropped off are not in that row. Boris did see
    high harmonics elsewhere. It seems that a large number of similar
    machines on a transformer will produce voltage harmonics. We've
    had these toris for years but have only seen this recently. If you
    only have current harmonics it doesn't matter, but if it affects
    voltage it alters the behaviour of other power supplies. This
    seems more like weather than laminar flow. Do we
    know if other data centres see this? Might be interesting to do an
    experiment and publish a paper about this. When this was raised at
    conferences, no one else seemed to have noticed it. Getting to the
    point where it would be worthwhile to take 512 machines and do
    things with the power supply to produce a paper that would be very
    useful for the world to know.

    Do have the first harmonics suppressor on order for $30k. This will
    be the first of five if it is effective. They would not all be
    installed until next calendar year. Need to install the single
    system and then get some months' experience, depending on what is
    seen. Installing them needs a power outage so it will take some
    time to schedule that. It seems that 10% or 20% fluctuations from
    these harmonics cause some power supplies to believe there is a
    power problem and turn themselves off.
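
    One common way to put a number on harmonic content is total
    harmonic distortion (THD): the root-sum-square of the harmonic
    amplitudes divided by the fundamental amplitude. Whether THD is
    the measure behind the 10% or 20% figure above is an assumption.
    The function name and the voltages below are illustrative only,
    not measurements from the machine room.

```python
import math


def total_harmonic_distortion(fundamental, harmonics):
    """Return THD as a fraction of the fundamental amplitude.

    fundamental: RMS amplitude of the fundamental (e.g. volts).
    harmonics:   RMS amplitudes of the higher harmonics, same units.
    """
    if fundamental <= 0:
        raise ValueError("fundamental amplitude must be positive")
    # Root-sum-square of the harmonic amplitudes, relative to the fundamental.
    return math.sqrt(sum(h * h for h in harmonics)) / fundamental


# Hypothetical values: 120 V fundamental with 12 V and 9 V harmonics.
thd = total_harmonic_distortion(120.0, [12.0, 9.0])
print(f"THD = {thd:.1%}")
```

    With these made-up values the distortion comes out at 12.5%, which
    sits inside the range quoted above.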

    Should perhaps write up what we know to get a discussion going
    somewhere (HEPiX, vendors?). What number is needed to show it? Is
    64 enough (PetaCache cluster)?

    atl-prod01 and 02 always have this problem but 03 and 04 don't.

    Could use a tori retirement strategy to do this. Would perhaps also
    use T1 or V20z. About to plug in a bunch of machines, could perhaps
    use those with a large drum of cable...

    We should write it down and we can talk to Sun about this.

Action Items:

070530 Wei	Reply to Alexei about AOD replication to SLAC
        070606 Done. His interpretation wasn't exactly what we
 	      wanted. Also noticed lots of AOD coming in without the
 	      archive bit set. Wei will email him about this.

070516 Andy	Set up meeting about xrootd collaboration
        070523 Not done yet. Would like a list from Stephen of
 	      who should be there.
        070530 Will ask tomorrow at xrootd meeting, Stephen emailed
 	      a suggested list of people.
        070606 Will drop this and talk to Andy about it.

070502 Stephen	Arrange meeting about ATLAS TAG data on PetaCache
        070509 Not done yet.
        070516 Not done yet. Did talk a little to David Quarrie.
        070523 Not done yet.
        070530 Not done yet.
        070606 Not done yet.