ATLAS SCCS Planning 06Jun2007
-----------------------------
9am, SCCS Conf Rm A, to call in +1 510 665 5437, press 1, 3935#
Present: Wei, Stephen, Len, JohnB, Chuck, Charlie, Booker, Richard,
Randy, Tofigh
Agenda:
1. DQ2 Status/Web Proxy
Not much to say this week. Noticed that almost every site
complained that they don't have enough jobs to run. BNL occasionally
has a problem with dCache and we don't think they've fully fixed
it. After they claimed it was fixed there were still authentication
problems and timeouts with their dCache system. Kaushik will raise
this issue as lots of CPUs are going idle.
Last Friday told SLAC ATLAS folk how to use the new area but not
sure if anyone else has used it yet.
2. Tier-2 Hardware
Should get some power to the racks in the next day. If we get all
the other cabling done should be able to do the software install by
next week. Can burn them in on some simulation. So if all goes well
we could see them at the end of next week.
3. AOB
- Power problems
Have had some machines lose power with no explanation. Don't seem
to be able to track down what is going on. Not sure if Boris has
been told. It could perhaps be correlated with high harmonics. The
machines that are affected are in a particular row downstairs. A
dozen machines that dropped off are not in that row. Boris did see
harmonics high elsewhere. It seems that having a large number of
similar machines on a transformer will produce voltage harmonics.
We've had these toris for years but have only seen this recently. If
you only have current harmonics it doesn't matter, but if it affects
voltage it alters the behaviour of other power supplies. This seems
more like weather than laminar flow. Do we
know if other data centres see this? Might be interesting to do an
experiment and publish a paper about this. When raising this at
conferences no one else seems to have noticed this. Getting to the
point where it would be worthwhile to take 512 machines and do
things with the power supply to produce a paper that would be very
useful for the world to know.
Do have the first harmonics suppressor on order for $30k. This will
be the first of five if it is effective. They would not all be
installed till next calendar year. Need to install the single
system and then get some months experience, depending on what is
seen. Installing them needs a power outage so it will take some
time to schedule that. It seems that 10% or 20% fluctuations from
these harmonics cause some power supplies to believe there is a
power problem and turn themselves off.
Should perhaps write up what we know to get a discussion going
somewhere (HEPiX, vendors?). What number is needed to show it? Is
64 enough (PetaCache cluster)?
atl-prod01 and 02 always have this problem but 03 and 04 don't.
Could use a tori retirement strategy to do this. Would perhaps also
use T1 or V20z. About to plug in a bunch of machines, could perhaps
use those with a large drum of cable...
We should write it down and we can talk to Sun about this.
Action Items:
-------------
070530 Wei Reply to Alexei about AOD replication to SLAC
070606 Done. His interpretation wasn't exactly what we
wanted. Also noticed lots of AOD coming in without the
archive bit set. Wei will email him about this.
070516 Andy Set up meeting about xrootd collaboration
070523 Not done yet. Would like a list from Stephen of
who should be there.
070530 Will ask tomorrow at xrootd meeting, Stephen emailed
a suggested list of people.
070606 Will drop this and talk to Andy about it.
070502 Stephen Arrange meeting about ATLAS TAG data on PetaCache
070509 Not done yet.
070516 Not done yet. Did talk a little to David Quarrie.
070523 Not done yet.
070530 Not done yet.
070606 Not done yet.