
Hi Charlie,

> When one disk fails, how many TBs of data are affected? 

With single parity, the likely configuration on one Thumper box will be three RAID-5 groups: two 13+1 groups and one 14+1 group, with three disks used as hot spares. A single ZFS pool stripes across these three groups, so a double failure in any one group loses ALL data on the Thumper. Other configurations (such as one ZFS pool per group) would bring down only part of the box when a double failure occurs in one group. However, those configurations give much less flexibility if we want to change the space allocation between production and local users, and they also come with a performance penalty.
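To make the failure-domain difference concrete, here is a minimal sketch that models each RAID group as surviving as long as its failed disks do not exceed its parity disks. The class and function names are hypothetical, and the model ignores hot-spare rebuild timing; it only contrasts one striped pool against one pool per group.

```python
from dataclasses import dataclass

@dataclass
class RaidGroup:
    """Hypothetical model of one RAID group (data + parity disks)."""
    data_disks: int
    parity_disks: int

    def survives(self, failed: int) -> bool:
        # A group tolerates up to `parity_disks` concurrent failures.
        return failed <= self.parity_disks

def striped_pool_survives(groups, failures):
    """One pool striped over all groups: it survives only if every group survives."""
    return all(g.survives(f) for g, f in zip(groups, failures))

def separate_pools_surviving(groups, failures):
    """One pool per group: return the indices of pools that stay online."""
    return [i for i, (g, f) in enumerate(zip(groups, failures)) if g.survives(f)]

single_parity = [RaidGroup(13, 1), RaidGroup(13, 1), RaidGroup(14, 1)]
double_parity = [RaidGroup(13, 2), RaidGroup(13, 2), RaidGroup(12, 2)]
two_failures_in_group0 = [2, 0, 0]

print(striped_pool_survives(single_parity, two_failures_in_group0))     # False: whole box lost
print(separate_pools_surviving(single_parity, two_failures_in_group0))  # [1, 2]: partial loss
print(striped_pool_survives(double_parity, two_failures_in_group0))     # True
```

With a single striped pool, one bad group takes everything; with per-group pools only that group is lost, at the cost of the allocation flexibility described above.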

The current double-parity configuration is 13+2, 13+2 and 12+2, plus 2 hot spares. (I would actually prefer 3*(13+1) and 1 hot spare, but that is a separate discussion about hot spares.) So the difference is 2 data disks out of 40, or 5%. Each disk has a raw space of around 500 GB.
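The disk arithmetic above can be checked with a short sketch. It uses only the group sizes and the approximate 500 GB per disk from this message; capacities are raw, before any filesystem overhead. Under those assumptions it also gives a rough answer to the original question: losing the whole pool in the single-parity layout affects on the order of 20 TB of raw data capacity.

```python
DISK_GB = 500  # approximate raw size per disk, as stated above

# (data disks, parity disks) per group, plus hot spares
layouts = {
    "single parity": {"groups": [(13, 1), (13, 1), (14, 1)], "spares": 3},
    "double parity": {"groups": [(13, 2), (13, 2), (12, 2)], "spares": 2},
}

def summarize(groups, spares):
    """Return (data disks, total disks used including spares)."""
    data = sum(d for d, _ in groups)
    total = sum(d + p for d, p in groups) + spares
    return data, total

for name, cfg in layouts.items():
    data, total = summarize(cfg["groups"], cfg["spares"])
    print(f"{name}: {total} disks, {data} data disks, ~{data * DISK_GB / 1000:.0f} TB raw")

single_data, _ = summarize(**layouts["single parity"])
double_data, _ = summarize(**layouts["double parity"])
print(f"space given up for double parity: {(single_data - double_data) / single_data:.0%}")
```

Both layouts use the same 46 disks; double parity trades 2 data disks (about 1 TB raw) for tolerance of a second failure per group.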

> You mean a second disk failing before everything on the array 
> has been reconstructed? Hope the components are not that 
> unreliable. What is the advertised MTBF for a drive? 

I don't know the exact MTBF. However, I believe we have already seen double failures happen on other storage systems. We also worry that as disks get larger, reconstruction time will increase, and with it our vulnerability to a double failure.
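To illustrate why longer reconstruction matters, here is a rough sketch that assumes independent, exponentially distributed disk failures. The MTBF value is a hypothetical placeholder, not a vendor figure, and the model ignores correlated failures and read errors during rebuild, so it likely understates the real risk. It shows only that the chance of a second failure during a rebuild grows roughly in proportion to the rebuild time.

```python
import math

def p_second_failure(surviving_disks: int, rebuild_hours: float, mtbf_hours: float) -> float:
    """Probability that at least one surviving disk in the group fails during
    the rebuild window, assuming independent exponential failures."""
    rate = surviving_disks / mtbf_hours  # combined failure rate of the remaining disks
    return 1.0 - math.exp(-rate * rebuild_hours)

MTBF_HOURS = 500_000  # hypothetical MTBF, NOT an advertised figure

# After one failure in a 13+1 group, 13 disks remain at risk.
for rebuild_hours in (12, 24, 48):
    p = p_second_failure(surviving_disks=13, rebuild_hours=rebuild_hours, mtbf_hours=MTBF_HOURS)
    print(f"rebuild {rebuild_hours:>2} h: P(second failure) ~ {p:.2e}")
```

For small values the result is close to surviving_disks * rebuild_hours / MTBF, so doubling the rebuild time roughly doubles the exposure per incident, and that exposure recurs on every disk failure across every box.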

> Tier 2 is not supposed to be primary storage for anything. 
> True it may be the only storage for production before it is 
> transferred to Tier 1 for archival. However, that is 
> recoverable by regenerating so we actually have infinite 
> number of "backups". On the scale of things, regenerating is cheap. 
>
> Production could be more tolerant. One, data is swept to Tier 
> 1 for archival, so we should not lose more than a day or 
> two's worth. Two, production data is, almost by definition, 
> something with a fairly long lead time and regenerating them 
> would be acceptable. Active analysis data disappearing is 
> likely to have greater impact on users. Can we use our tape 
> system to back these up -- assuming it is much faster getting 
> things back from our tapes than from across the country? 

Because T2 storage is primarily used as a cache, data come and go frequently, so a backup is likely to be inconsistent with the DQ2 database. At SLAC, at BNL, and in Panda production, the effort needed to find out whether such an inconsistency is due to a storage failure or to a problem with DQ2, and to deal with the problems that come out of it, is likely to be significant. For this reason, I think the reliability of the storage system deserves priority. 

Also, if at some point we decide that double parity is unnecessary or that losing 5% of the space is significant, and DQ2 has matured, we may consider switching to single parity, and we can make the adjustment when our next storage purchase arrives.

--
Wei Yang  |  [log in to unmask]  |  650-926-3338(O)