I will keep you posted; hopefully this will be fixed soon.

Begin forwarded message:

From: <[log in to unmask]<mailto:[log in to unmask]>>
Subject: Building 50 machine room cooling issue - surrey04b is down
Date: January 27, 2016 at 1:47:38 PM PST
To: <[log in to unmask]<mailto:[log in to unmask]>>

The following outage has been Modified:

Start:   1/27/2016 8:00 AM
End:     Open

Details: The high temperatures this morning caused multiple disk failures on
surrey04b.  We are working with the vendor in an effort to recover the data,
but there is no estimate at this time; it could take hours or more.  The
following filesystems are unavailable because of this:

/nfs/slac/g/acctest
/nfs/slac/g/ad
/nfs/slac/g/cd
/nfs/slac/g/cd_mccelog
/nfs/slac/g/cd_physicselog3
/nfs/slac/g/estb
/nfs/slac/g/exo
/nfs/slac/g/exo-software
/nfs/slac/g/facet
/nfs/farm/g/glast/u03
/nfs/farm/g/glast/u11
/nfs/surrey04b/vol/vol1/g.glast.u11
/nfs/slac/g/glast/calibrations
/nfs/slac/g/glast/flight
/nfs/slac/g/glast/fmx000
/nfs/slac/g/glast/ground
/nfs/slac/g/glast/ltx000
/nfs/slac/g/glast/online
/nfs/slac/g/glast/users
/nfs/slac/g/glast/workingGroups
/nfs/slac/g/jas
/nfs/slac/g/lcd/cvs
/nfs/slac/g/lcd/mc
/nfs/slac/g/lcd/simu
/nfs/slac/g/lcd/subversion
/nfs/slac/g/lcd/farm_data
/nfs/slac/g/lcls
/nfs/slac/g/lcls.dev
/nfs/slac/g/lcls.dev.cmlog
/nfs/slac/g/lcls.cmlog
/nfs/slac/g/lcls.physics
/nfs/slac/g/scs/net/cando
/nfs/slac/g/scs/libvirt
/nfs/slac/g/scs/linux_migr
/nfs/slac/g/scs/redhat
/nfs/slac/g/scs/redhat.Images
/nfs/slac/g/scs/scilinux
/nfs/slac/g/svac
/nfs/slac/g/testfac
/nfs/surrey04b/g.testfac
/nfs/slac/g/testfac_asta
/nfs/slac/g/testfac_esa
/nfs/slac/g/testfac_esb
/nfs/slac/g/testfac_log
/nfs/slac/work

We were planning to migrate this data over to gpfs and had started that process
a few weeks ago, so some of these filesystems have been duplicated but are a
few weeks out of date.  These are the only ones that were not yet copied
and so are not available on gpfs:

/nfs/slac/g/glast/calibrations
/nfs/slac/g/glast/flight
/nfs/slac/g/glast/fmx000
/nfs/slac/g/glast/ground
/nfs/slac/g/glast/ltx000
/nfs/slac/g/glast/online
/nfs/slac/g/glast/users
/nfs/slac/g/glast/workingGroups
/nfs/slac/g/scs/libvirt
/nfs/slac/g/scs/linux_migr
/nfs/slac/g/scs/redhat
/nfs/slac/g/scs/scilinux

We were planning to do a final sync up and cutover of the data after
coordinating with the users of each filesystem.  If you prefer not to wait
for surrey04b to recover we could change your nfs path to point to the new
gpfs location, as long as you realize that it will be missing whatever
changes were made since it was first copied over.  If/when surrey04b comes
back up, you could copy over whatever is missing.  If you think you would
like to do this, send mail with your request to unix-admin.


-----------------------------

A UPS (Uninterruptible Power Supply) unit failed early this morning, resulting in high temperatures in part of the first-floor computer room. The problem was resolved at 6:05 AM. Some systems shut down to prevent overheating and are being restored to service.

Scheduled: No
Contact:   unix-admin
CC:
User Services Affected: surrey04b nfs
Hardware Affected:
Submitter: Dart, Renata

See: https://www-internal.slac.stanford.edu/comp-out/outage.aspx?id=2814
