Hi Charlie,

The LSF event log file is hosted by the LSF master. We switched the LSF master host during that time and I didn’t notice that. As a result, Grid jobs could continue to come to SLAC, but they could not get their status updates (because the Grid couldn’t access the LSF event log). ATLAS jobs are generic pilots and don’t care about this status (a pilot updates its status with Panda). WLCG submits a probe job to SLAC once an hour and checks its status. That is why we saw no problem from the ATLAS side, but the WLCG reliability and availability tests failed for those 6 days.

regards,
Wei Yang  |  [log in to unmask]  |  650-926-3338(O)



On Jul 19, 2014, at 12:51 AM, Young, Charles C. <[log in to unmask]> wrote:

> Hi Wei, 
> 
> Just to understand the point about the LSF scheduler. Are you saying that issues with accessing LSF log files biased the numbers in the report and they are actually better? Or are you saying that the LSF scheduler problem led to lower availability?
> 
> The numbers for June are about 20% down. Translates to 6 days out of the month. I wasn't paying close attention but did not notice jobs not running for a week. Nor did I get complaints from others — someone must have been running batch jobs separate from Tier-2 production. Was the drop-off not a global one but somehow reduced the number of machines available? Cheers.
> 
> Charlie
> --
> Charles C. Young
> M.S. 43, Stanford Linear Accelerator Center      
> P.O. Box 20450                                        
> Stanford, CA 94309                                      
> [log in to unmask]                                
> voice  (650) 926 2669                        
> fax    (650) 926 2923                      
> CERN GSM +41 76 487 2069
> 
> From: <Yang>, Wei Yang <[log in to unmask]>
> Date: Thursday, July 17, 2014 7:33 PM
> To: atlas-sccs-planning-l <[log in to unmask]>
> Subject: Fwd: T2 Reliability & Availability - June 2014
> 
> FYI, we were pretty low in June. We changed the LSF scheduler master host from Solaris to Linux and ran into subtle issues in accessing LSF log files via NFS. It all happened when I took a few days of sick leave … there was also a scheduled outage in June.
> 
> Wei Yang  |  [log in to unmask]  |  1-650-926-3338
> 
> 
> 
> 
> Begin forwarded message:
> 
>> From: WLCG Office <[log in to unmask]>
>> Subject: RE: T2 Reliability & Availability - June 2014
>> Date: July 17, 2014 at 8:27:43 AM PDT
>> To: "project-wlcg-cb (Members of the WLCG CB)" <[log in to unmask]>
>> Cc: "project-lcg-gdb (LCG - Grid Deployment Board)" <[log in to unmask]>, "sam-support (SAM support)" <[log in to unmask]>, "[log in to unmask]" <[log in to unmask]>, "[log in to unmask]" <[log in to unmask]>
>> 
>> Dear all,
>> 
>> The final T2 reliability & availability for June 2014 is now available at:
>> 
>> https://espace2013.cern.ch/WLCG-document-repository/ReliabilityAvailability/2014/june-14/  under titles starting with "WLCG_All_Sites..."
>> 
>> The reports take into consideration all re-computation requests received in the last 10 calendar days as described in the re-computation policy. 
>> 
>> Kind regards,
>> Cath
>> 
>> 
>> -----------------------------------------------
>> WLCG Office
>> IT Dept - CERN
>> CH-1211 Genève, Switzerland
>> www.cern.ch/wlcg
>> From: WLCG Office
>> Sent: 02 July 2014 11:02
>> To: project-wlcg-cb (Members of the WLCG CB)
>> Cc: project-lcg-gdb (LCG - Grid Deployment Board); sam-support (SAM support); [log in to unmask]; [log in to unmask]
>> Subject: T2 Reliability & Availability - June 2014
>> 
>> Dear all,
>> 
>> The draft T2 reliability & availability reports for June 2014 are now available at:
>> 
>> http://sam-reports.web.cern.ch/sam-reports/2014/201406/wlcg/ under titles starting with "WLCG_All_Sites..."
>> 
>> Please verify your data and send any comments to WLCG Office by 12 July.
>> 
>> Any requests for recomputation must be submitted via GGUS within the next 10 calendar days; full details here.
>> 
>> Kind regards,
>> Cath
>> 
>> 
>> -----------------------------------------------
>> WLCG Office
>> IT Dept - CERN
>> CH-1211 Genève, Switzerland
>> www.cern.ch/wlcg
> 
> 
> Use REPLY-ALL to reply to list
> To unsubscribe from the ATLAS-SCCS-PLANNING-L list, click the following link:
> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=ATLAS-SCCS-PLANNING-L&A=1
