ATLAS SCCS Planning 06Jun2007
-----------------------------

9am, SCCS Conf Rm A, to call in +1 510 665 5437, press 1, 3935#

Present: Wei, Stephen, Len, JohnB, Chuck, Charlie, Booker, Richard, Randy, Tofigh

Agenda:

1. DQ2 Status/Web Proxy

Not much to say this week. Noticed that almost every site complained that they don't have enough jobs running. BNL occasionally has a problem with dCache and we don't think they've fully fixed it. After they claimed it was fixed, we have had authentication problems with their dCache system and timeouts. Kaushik will raise this issue as lots of CPUs are going idle.

Last Friday told SLAC ATLAS folk how to use the new area but not sure if anyone else has used it yet.

2. Tier-2 Hardware

Should get some power to the racks in the next day. If we get all the other cabling done we should be able to do the software install by next week. Can burn them in on some simulation. So if all goes well we could see them at the end of next week.

3. AOB

- Power problems

Have had some machines lose power with no explanation. Don't seem to be able to track down what is going on. Not sure if Boris has been told. It could perhaps be correlated with high harmonics. The machines that are affected are in a particular row downstairs. A dozen machines that dropped off are not in that row. Boris did see high harmonics elsewhere.

It seems that a large number of similar machines on one transformer will produce voltage harmonics. We've had these toris for years but have only seen this recently. If you only have current harmonics it doesn't matter, but if it affects voltage it alters the behaviour of other power supplies. This seems more like weather than laminar flow.

Do we know if other data centres see this? Might be interesting to do an experiment and publish a paper about this. When raising this at conferences no one else seems to have noticed this.
Getting to the point where it would be worthwhile to take 512 machines and do things with the power supply to produce a paper that would be very useful for the world to know.

Do have the first harmonics suppressor on order for $30k. This will be the first of five if it is effective. They would not all be installed until next calendar year. Need to install the single system and then get some months of experience, depending on what is seen. Installing them needs a power outage so it will take some time to schedule that.

It seems that 10% or 20% fluctuations from these harmonics cause some power supplies to believe there is a power problem and turn themselves off.

Should perhaps write up what we know to get a discussion going somewhere (HEPiX, vendors?). What number of machines is needed to show it? Is 64 enough (PetaCache cluster)? atl-prod01 and 02 always have this problem but 03 and 04 don't. Could use a tori retirement strategy to do this. Would perhaps also use T1 or V20z. About to plug in a bunch of machines, could perhaps use those with a large drum of cable...

We should write it down and we can talk to Sun about this.

Action Items:
-------------

070530 Wei      Reply to Alexei about AOD replication to SLAC

        070606 Done. His interpretation wasn't exactly what we wanted. Also noticed lots of AOD coming in without the archive bit set. Wei will email him about this.

070516 Andy     Setup meeting about xrootd collaboration

        070523 Not done yet. Would like a list from Stephen of who should be there.
        070530 Will ask tomorrow at xrootd meeting, Stephen emailed a suggested list of people.
        070606 Will drop this and talk to Andy about it.

070502 Stephen  Arrange meeting about ATLAS TAG data on PetaCache

        070509 Not done yet.
        070516 Not done yet. Did talk a little to David Quarrie.
        070523 Not done yet.
        070530 Not done yet.
        070606 Not done yet.