Status (outstanding issues)

Web Page:
- Let’s try the HyperNews forum “Announcements”, available at [log in to unmask] and https://www.usatlas.bnl.gov/HyperNews/racf/login.pl?userid=&url=/HyperNews/racf/top.pl
- Don’t forget to subscribe to the forum if you want to receive postings via email.
- Since the most recent postings are not necessarily a good reflection of recent issues (e.g., anticipated events announced well in advance), we’re still working on another tool (complement or substitute). To be followed up…
Production:
- Overall OK.
- Site issues:
o BU: Disk timeouts due to I/O overload. Under investigation.
o GL: No (or not enough) production files transferred, due to competition with the calibration data replication flow. The DQ2/FTS parameters may need to be readjusted – who should look into this? Hiro, Wensheng, Shawn. The site reports that resources are available at all the necessary levels (disk, network, CPU). We should try to crank up the controls and look for signs of bottlenecks; then we may consider replication of services. (But what if a certain service throttles work to match available resources? Then we must know about this feature in advance, otherwise no sign of resource saturation/bottleneck will be found. Tricky!)
ACTION ITEM: Hiro, Wensheng – try to increase the amount of data pushed towards AGL by releasing throttles on FTS/DQ2. To be followed up. (A sketch of such a ramp-up probe appears at the end of this section.)
- Production is running steadily at OSCER@OU without local DQ2 services (thanks: Marco, Horst, Paul, Karthik, et al.). This is a non-USATLAS prototype site using uberftp, driven by PanDA, to access data at a “regional” T2 storage site (OU in this example). Encouraging results. (An illustration of this access mode appears at the end of this section.)
- Overall production efficiency is ~50% (1,400-1,500 jobs/day out of a target of 3,000/day). Multiple causes, mostly site-related problems.
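
Aside on the AGL action item above: the “crank up the controls” step could look like the following sketch, which raises a single concurrency knob stepwise and watches aggregate throughput for flattening – the usual sign of a saturated resource, or of a hidden internal throttle. The function names (set_concurrent_files, measure_throughput_mbps) are hypothetical stand-ins for the real FTS/DQ2 settings and monitoring, not actual tools:

    import time

    def set_concurrent_files(n):
        # Hypothetical stand-in: apply a new concurrent-files limit on the
        # FTS channel (in reality done through the FTS admin tools).
        print("setting concurrent files to %d" % n)

    def measure_throughput_mbps():
        # Hypothetical stand-in: read aggregate throughput, e.g. from the
        # FTS monitoring pages or Ganglia, averaged over the last interval.
        return 0.0

    def probe_for_bottleneck(start=5, step=5, max_files=50, min_gain=0.05):
        # Raise the limit stepwise; stop when throughput gains fall below
        # 5%, i.e. when some resource (or a throttle) is saturated.
        previous = None
        for n in range(start, max_files + 1, step):
            set_concurrent_files(n)
            time.sleep(600)  # let transfers reach steady state
            rate = measure_throughput_mbps()
            print("files=%d -> %.1f MB/s" % (n, rate))
            if previous is not None and rate < previous * (1.0 + min_gain):
                print("throughput flat at files=%d: likely bottleneck" % n)
                return n
            previous = rate
        print("no saturation seen up to files=%d" % max_files)
        return None

    probe_for_bottleneck()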
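
For the OSCER@OU item above, the access mode is roughly the following (an illustration only, not the actual pilot code; the host and paths are invented, and a valid grid proxy is assumed):

    import subprocess
    import sys

    def fetch_input(host, remote_path, local_path):
        # uberftp's one-shot form: uberftp <host> "<command>"
        cmd = ["uberftp", host, "get %s %s" % (remote_path, local_path)]
        ret = subprocess.call(cmd)
        if ret != 0:
            sys.exit("uberftp transfer failed (exit code %d)" % ret)

    # Invented endpoint standing in for the OU storage used in the test.
    fetch_input("gridftp.tier2.example.edu",
                "/atlas/prod/input/EVNT.0001.root",
                "EVNT.0001.root")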
Data Management:
- Wensheng: the storage endpoint “selector” patch is available. Anyone who wants to try it should feel free to do so. (A generic illustration of the idea appears at the end of this section.)
- ACTION ITEM: Wensheng – please send the instructions to the mailing list.
ACTION ITEM: Razvan, Hiro, et al. – clarify whether the addition of storage endpoints requires changes to FTS channels or DQ2 services.
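
The minutes do not describe how the selector patch actually works; purely as a generic illustration of the idea, a selector could deterministically spread datasets over several storage endpoints, e.g. by hashing the dataset name, so every client picks the same door without extra bookkeeping (endpoint URLs below are invented):

    import zlib

    # Invented endpoint URLs; a real deployment would list the site's doors.
    ENDPOINTS = [
        "srm://se01.example.edu:8443/srm/managerv2",
        "srm://se02.example.edu:8443/srm/managerv2",
    ]

    def select_endpoint(dataset):
        # Deterministically map a dataset name onto one endpoint.
        return ENDPOINTS[zlib.crc32(dataset.encode()) % len(ENDPOINTS)]

    print(select_endpoint("mc12.005001.pythia.evgen"))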
OSG 0.6:
- Deployed in production at AGL and for testing at UC and UTA. All indications are good.
- Green light for a rolling deployment in production. Sites will pick their own deployment schedules and have a recovery plan.
OSG-EGEE Interoperability:
- A recent example of an ATLAS-submitted client getting out of control due to incorrect assumptions about the computing environment. Who is supposed to correct this: should the client check for a non-LCG-compliant environment, or should USATLAS sites add everything necessary to become compliant? (A sketch of the client-side check appears at the end of this section.)
- John Hover will participate in the Thursday weekly ATLAS Production meetings to provide the needed communication channel between the two (federated) domains.
- In principle, ATLAS requirements are fed into OSG plans to make the OSG environment compatible with ATLAS needs. Between iterations, temporary solutions could be employed.
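
On the client-side check mentioned above: one way to frame it is for the client to probe for the LCG software-area convention and fall back to the OSG one, rather than assuming either. VO_ATLAS_SW_DIR (LCG/EGEE) and OSG_APP (OSG) are the usual variables of that era; the atlas_app subdirectory is an assumption, and this is a sketch, not actual ATLAS client code:

    import os
    import sys

    def atlas_software_dir():
        # Probe for the LCG convention first, then the OSG one.
        if "VO_ATLAS_SW_DIR" in os.environ:      # LCG/EGEE sites
            return os.environ["VO_ATLAS_SW_DIR"]
        if "OSG_APP" in os.environ:              # OSG sites
            return os.path.join(os.environ["OSG_APP"], "atlas_app")
        sys.exit("neither an LCG nor an OSG software area is defined")

    print(atlas_software_dir())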
Site updates:
- MW: MySQL upgraded to v5; works OK. The problem with filenames longer than 250 characters has been resolved. Teraport work is in progress. MWT2_UC is back online – not all slots are full yet.
- BU: a new person is on board – will do DQ2-related work.
- UTA: shutdown of SWT2 today.
- OU: ITB cluster (v0.6) is almost up. Good throughput at OSCER.
- SLAC: storage is almost online (OS installation in progress; infrastructure complete). Water-cooling work is ahead of schedule, which might advance the CPU upgrade to late June.