Jacek,

> We are talking here about sizes of *individual*
> chunks that are transferred, my point is that
> data+index are not a single file.

	There are two uses for this data.  One is in my fault tolerance
calculations from a couple of weeks back.  In that case, we are assuming
that one or more nodes have failed and that we need to re-replicate
their chunks across all the tables.  I generally expect that (a) the
replicas of the failed node's chunks will be distributed across the
other nodes of the cluster, although an alternative design is to have
(b) another node that exactly mirrors the failed one.  In case (a), the
re-replication is distributed across the cluster.  In case (b), it's
limited by the size of the output pipe on the surviving mirror node and
the input pipe on the new mirror node.
For fault tolerance, all data and indexes on the failed node need to be
re-replicated.
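
	To make the first case concrete, here is a minimal
back-of-the-envelope sketch (Python) of the two designs.  All of the
numbers and names in it are placeholder assumptions for illustration,
not measured LSST/Qserv figures:

def rereplication_time_distributed(failed_bytes, n_survivors, node_bw):
    """Design (a): the failed node's chunks are re-replicated in
    parallel across all surviving nodes, so throughput scales with
    the cluster size."""
    return failed_bytes / (n_survivors * node_bw)

def rereplication_time_mirror(failed_bytes, node_bw):
    """Design (b): one mirror node streams everything to the new
    mirror, so a single node's pipe is the bottleneck."""
    return failed_bytes / node_bw

failed = 10e12     # 10 TB of data + indexes on the failed node (assumed)
survivors = 99     # surviving nodes in the cluster (assumed)
bw = 1.25e9        # ~10 Gb/s per-node pipe, in bytes/s (assumed)

print("(a) distributed: %.0f s"
      % rereplication_time_distributed(failed, survivors, bw))
print("(b) mirror pair: %.0f s"
      % rereplication_time_mirror(failed, bw))

	With these assumed numbers, case (a) finishes in about 80
seconds while case (b) takes over two hours, which is the crux of the
argument for distributed replicas.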

	The second case is Mario's download-on-demand use case.  (Note
that I'm still not certain how realistic it will prove to be.)  That use
case works better with the distributed replica design.  In this case, I
think we may be more constrained by the outbound bandwidth of the DAC
(Data Access Center) or the inbound bandwidth of the user.
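
	Similarly, a minimal sketch of the download-on-demand
bottleneck, again with made-up parameters: the effective per-user rate
is capped by whichever is smaller, the user's fair share of the DAC's
outbound pipe or the user's own inbound link.

def per_user_rate(dac_outbound_bps, n_concurrent, user_inbound_bps):
    """Effective per-user transfer rate, assuming the DAC's outbound
    bandwidth is shared evenly among concurrent downloaders."""
    return min(dac_outbound_bps / n_concurrent, user_inbound_bps)

rate = per_user_rate(dac_outbound_bps=100e9,  # 100 Gb/s DAC uplink (assumed)
                     n_concurrent=50,         # simultaneous downloads (assumed)
                     user_inbound_bps=1e9)    # 1 Gb/s user link (assumed)
chunk_bits = 8 * 2e9                          # a 2 GB chunk (assumed size)
print("per-user rate: %.2f Gb/s, chunk in %.0f s"
      % (rate / 1e9, chunk_bits / rate))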

-- 
Kian-Tat Lim, LSST Data Management, [log in to unmask]
