I'm not sure this still matters, but I'm confused. I thought we were
talking about the transfer time for a single "chunk". I took "chunk" to
mean a qserv chunk of a table, which means we need to transfer a number
of bytes equal to the on-disk representation of that table. For MyISAM,
that is the raw row data + MyISAM overhead + the index files, and row
data, overhead, and index all need to be accessible for chunk X of
table A to be usable. It is not clear to me that transferring row data
and index concurrently is faster unless you are getting around TCP
window/congestion control (or you have multiple source nodes, multiple
pipes, and multiple destinations). (Some back-of-the-envelope sketches
below the quoted thread.)

Still, I'm not sure this is worth arguing about. I feel that the larger
question, whether it's a good idea to have an external qserv cluster
that has no local (data) storage and uses the LSST cluster as a backing
store, is a bit orthogonal. On that larger question, I am a bit
concerned that I haven't heard of any implementations of auto-caching
HDFS clusters or of distributed DBMS clusters that use other clusters
as backing stores, whether research, proprietary, or open-source. That
concerns me. (On the other hand, it's probably worth a master's or PhD
thesis.)

-Daniel

On 09/24/2013 05:19 PM, Becla, Jacek wrote:
> We are talking here about the sizes of *individual*
> chunks that are transferred; my point is that
> data+index are not a single file.
>
> All 20,000 chunks are going through the same
> pipe too, right? So if we count db and index
> separately, it's 40,000 chunks.
>
> Jacek
>
>
> On 9/24/2013 5:15 PM, Wang, Daniel Liwei wrote:
>> Wait, why is it faster in parallel? Same pipe, right? Unless you are
>> thinking of disjoint sets of source-pipe-dest.
>>
>> -Daniel
>>
>> On 09/24/2013 04:44 PM, Jacek Becla wrote:
>>> As we just discussed, my numbers are for data chunks;
>>> the index is up to 2x larger, so we can use 2x larger
>>> numbers. Data and index come in separate files, so
>>> they can be transferred in parallel, so I think
>>> it'd be unfair to assume 3x my numbers, though.
>>>
>>> Jacek
>>>
>>>
>>> On 9/24/2013 3:07 PM, Jacek Becla wrote:
>>>>> Chunks are expected to be multiple terabytes in size, which
>>>>> means that downloads are hours long.
>>>>
>>>> K-T,
>>>>
>>>> Based on the baseline, which assumes a flat 20K chunks per table,
>>>> the largest chunk will be 255 GB. The numbers are (in GB,
>>>> DR1 --> DR11):
>>>> - Object: 2 --> 4
>>>> - ObjExtra: 25 --> 69
>>>> - Source: 9 --> 255
>>>> - ForcedSrc: 2 --> 98
>>>>
>>>> This is in LDM-141, dbL2, L141 (and nearby).
>>>>
>>>> And that is before compression.
>>>>
>>>> We talked about keeping the chunk size constant rather than the
>>>> number of chunks constant, which would probably make us go with
>>>> DR1-size chunks, keeping the chunk size closer to 25 GB than
>>>> 1/4 TB.
>>>>
>>>> Jacek
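
To make the MyISAM point concrete: each chunk table is stored as a
separate .frm (definition), .MYD (row data plus MyISAM overhead), and
.MYI (index) file, so the bytes to move for chunk X of table A are just
the sum of those files. A minimal sketch in Python; the datadir layout
and the "Table_chunkId" naming are assumptions about our setup, not
something pinned down in this thread:

    import os

    def chunk_footprint_bytes(datadir, db, table, chunk_id):
        """Sum the on-disk MyISAM files for one chunk table.

        Assumes the standard MyISAM layout (.frm/.MYD/.MYI per table)
        and a hypothetical "Table_chunkId" naming convention.
        """
        base = os.path.join(datadir, db, "%s_%d" % (table, chunk_id))
        return sum(os.path.getsize(base + ext)
                   for ext in (".frm", ".MYD", ".MYI")
                   if os.path.exists(base + ext))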
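
On the parallel-transfer question: two concurrent streams over the same
pipe only help if a single stream can't fill it (per-stream TCP
window/congestion limits); otherwise data and index just split the
available bandwidth. A sketch of what "transfer .MYD and .MYI in
parallel" would look like; the source-node URL and file names are
hypothetical:

    import shutil
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    def fetch(url, dest):
        # One TCP stream per file; both streams share the same pipe,
        # so this only wins if a single stream is window-limited.
        with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
            shutil.copyfileobj(resp, out)

    files = [  # hypothetical source node and chunk files
        ("http://source-node/chunks/Source_1234.MYD", "Source_1234.MYD"),
        ("http://source-node/chunks/Source_1234.MYI", "Source_1234.MYI"),
    ]
    with ThreadPoolExecutor(max_workers=2) as pool:
        for url, dest in files:
            pool.submit(fetch, url, dest)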
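
And to sanity-check "downloads are hours long" against the LDM-141
numbers: a naive size/bandwidth estimate for the 255 GB worst-case
Source chunk (data only; up to roughly 3x that with index, before
compression). The link speeds and the 80% goodput factor below are
assumptions, not measurements:

    def transfer_hours(size_gb, link_gbps, efficiency=0.8):
        """Naive estimate: size / effective bandwidth."""
        bits = size_gb * 1e9 * 8
        return bits / (link_gbps * 1e9 * efficiency) / 3600.0

    for gbps in (1, 10, 40):
        print("255 GB at %2d Gb/s: %.2f h"
              % (gbps, transfer_hours(255, gbps)))
    # ~0.71 h at 1 Gb/s, ~0.07 h at 10 Gb/s: "hours" only on slow or
    # heavily shared links, and DR1-size ~25 GB chunks are ~10x less.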