________________________________

From: [log in to unmask] [mailto:[log in to unmask]] On Behalf Of Popescu, Razvan
Sent: Wednesday, April 25, 2007 12:23 PM
To: [log in to unmask]
Subject: [Usatlas-grid-l] notes F&O 4/25



Status (outstanding issues) Web Page:

 

-          Let's try the HyperNews (HN) forum "Announcements", available at [log in to unmask] and https://www.usatlas.bnl.gov/HyperNews/racf/login.pl?userid=&url=/HyperNews/racf/top.pl

-          Don't forget to subscribe to the forum if you want to receive postings via email.

-          Since the most recent postings are not necessarily a good reflection of current issues (anticipated events, for example, are announced well in advance), we're still working on another tool to complement or replace it. To be followed up...

 

Production:

 

-          Overall OK

-          Site issues:

o        BU: Disk timeouts due to I/O overload. Under investigation.

o        GL: Few or no production files transferred, due to competition with the calibration data replication flow. DQ2/FTS parameters may need to be readjusted - who should look into this? Hiro, Wensheng, Shawn. The site reports that resources are available at all the necessary levels (disk, network, CPU). We should try to crank up the controls and look for signs of bottlenecks; then we may consider replicating services. (But what if a certain service throttles work to match available resources? Then we must know about this feature in advance, otherwise no sign of resource saturation/bottleneck will be found. Tricky!)

 

ACTION ITEM: Hiro, Wensheng - try to increase the amount of data pushed towards AGL by releasing throttles on FTS/DQ2 (see the monitoring sketch below). To be followed up.
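
A minimal monitoring sketch of the kind implied above, purely illustrative and not an existing DQ2/FTS tool (the log path and line format are assumptions): parse recent transfer records and warn when the average rate toward the site stays low even as the limits are raised, i.e. a sign of a disk/network/service bottleneck.

    # Hypothetical sketch: watch transfer throughput toward AGL while FTS/DQ2
    # limits are raised, to spot where a bottleneck appears. The log path and
    # line format below are assumptions, not real DQ2 output.
    import re
    import time

    TRANSFER_LOG = "/var/log/dq2/transfers.log"              # assumed location
    RATE_RE = re.compile(r"bytes=(\d+)\s+seconds=([\d.]+)")  # assumed format

    def observed_rates(path):
        """Yield per-transfer rates in MB/s parsed from the (assumed) log."""
        with open(path) as f:
            for line in f:
                m = RATE_RE.search(line)
                if m:
                    nbytes, secs = int(m.group(1)), float(m.group(2))
                    if secs > 0:
                        yield nbytes / secs / 1e6

    def check_for_saturation(path, target_mbs=30.0):
        """Warn when the average rate sits well below the chosen target."""
        rates = list(observed_rates(path))
        if not rates:
            print("no completed transfers found")
            return
        avg = sum(rates) / len(rates)
        print(f"{len(rates)} transfers, average {avg:.1f} MB/s")
        if avg < target_mbs:
            print("rate below target - possible disk/net/service bottleneck")

    if __name__ == "__main__":
        while True:
            check_for_saturation(TRANSFER_LOG)
            time.sleep(600)   # re-check every 10 minutes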

 

-          Production running steadily at OSCER@OU without local DQ2 services (thanks: Marco, Horst, Paul, Kharthik, et al.). This is a non-USATLAS prototype site using uberftp, driven by PANDA, to access data at a "regional" T2 storage site (OU in this case); see the stage-in sketch after this list. Encouraging results.

-          Overall production efficiency ~50% (1400-1500 jobs/day out of a target of 3000/day). Multiple causes, site-related problems.
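
As a sketch of the OSCER@OU model mentioned above (host and file names are made up, and the uberftp <host> "<commands>" invocation form is an assumption): a worker node with no local DQ2 storage pulls its input from the regional T2 GridFTP server before the job runs.

    # Sketch of remote stage-in for a site without local DQ2/SE: fetch one
    # input file from the regional Tier-2 GridFTP server using uberftp.
    # Host name, paths, and invocation form are illustrative assumptions.
    import subprocess

    def stage_in(gridftp_host, remote_path, local_path):
        """Copy one file from the regional T2 storage to the worker node."""
        cmd = ["uberftp", gridftp_host, f"get {remote_path} {local_path}"]
        subprocess.run(cmd, check=True)

    if __name__ == "__main__":
        stage_in("tier2-se.example.org",            # hypothetical host
                 "/atlas/prod/evgen/input.root",    # hypothetical file
                 "./input.root")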

 

 

Data Management:

 

-          Wensheng: storage endpoint "selector" patch is available. Anyone who wants to try it should feel free to do so (a toy illustration of the idea follows after the action items below).

ACTION ITEM: Wensheng, please send the instructions to the mailing list.

ACTION ITEM: Razvan, Hiro, et al. - clarify whether the addition of storage endpoints requires changes to FTS channels or DQ2 services.
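
For context only, a toy illustration of what an endpoint "selector" does (this is not Wensheng's patch; endpoint URLs and the dataset name are invented): map each dataset to one of several SRM endpoints so that adding an endpoint only means extending a table - which is exactly why the FTS-channel/DQ2 question in the action item above matters.

    # Toy endpoint selector, for illustration only (not the actual patch).
    # It spreads datasets across a list of SRM endpoints deterministically.
    from zlib import crc32

    ENDPOINTS = [   # invented example endpoints
        "srm://se1.example.org:8443/srm/managerv2?SFN=/pnfs/atlas",
        "srm://se2.example.org:8443/srm/managerv2?SFN=/pnfs/atlas",
    ]

    def select_endpoint(dataset):
        """Deterministically pick an endpoint for a dataset (hash-based)."""
        return ENDPOINTS[crc32(dataset.encode()) % len(ENDPOINTS)]

    if __name__ == "__main__":
        print(select_endpoint("mc12.005001.pythia_minbias.evgen"))  # made-up name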

 

 

OSG 0.6:

 

-          Deployed in production at AGL and for testing at UC and UTA. All indications are good.

-          Green light for a rolling deployment in production. Sites will pick their own deployment schedules and have a recovery plan.

 

OSG-EGEE Interoperability:

 

-          A recent example of an ATLAS-submitted client getting out of control due to incorrect assumptions about the computing environment. Who is supposed to correct this: should the client check for a non-LCG-compliant environment, or should USATLAS sites add whatever is necessary to become compliant? (See the environment-check sketch after this list.)

-          John Hover will participate in the Thursday weekly ATLAS Production meetings to provide the needed communication channel between the two (federated) domains.

-          In principle, ATLAS requirements are fed into OSG plans to make the OSG environment compatible with ATLAS needs. Between iterations, temporary solutions can be employed.
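
A hedged sketch of the defensive check discussed above (variable names follow the usual LCG/OSG conventions - VO_ATLAS_SW_DIR on LCG, OSG_APP on OSG - but treat the exact mapping as an assumption): a client can detect which flavour of environment it landed in and fail cleanly instead of running wild.

    # Sketch: detect LCG-style vs OSG-style environment before doing any work.
    # The variable names and the OSG_APP subdirectory are assumptions.
    import os
    import sys

    def atlas_sw_dir():
        """Locate the ATLAS software area on either flavour of site."""
        if "VO_ATLAS_SW_DIR" in os.environ:        # LCG/EGEE convention
            return os.environ["VO_ATLAS_SW_DIR"]
        if "OSG_APP" in os.environ:                # OSG convention
            return os.path.join(os.environ["OSG_APP"], "atlas_app")
        sys.exit("neither LCG nor OSG environment detected - refusing to run")

    if __name__ == "__main__":
        print("ATLAS software area:", atlas_sw_dir())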

 

Site updates:

 

-          MW: MySQL upgrade to v5 works OK. Problem with filenames longer than 250 characters resolved. Teraport work in progress. MWT2_UC back online - not all slots full yet.

-          BU: new person on board - will do DQ2 related work.

-          UTA: shutdown of SWT2 today

-          OU: ITB cluster (OSG 0.6) almost up. Good throughput at OSCER.

-          SLAC: storage almost online (work on OS installation; infrastructure complete). Water cooling work ahead of schedule, might advance CPU upgrade to late June.