QSERV-L Archives


QSERV-L@LISTSERV.SLAC.STANFORD.EDU


QSERV-L  September 2013

Subject: Re: chunk size
From: "Daniel L. Wang" <[log in to unmask]>
Reply-To: General discussion for qserv (LSST prototype baseline catalog)
Date: Tue, 24 Sep 2013 17:37:40 -0700
Content-Type: text/plain

I'm not sure this still matters, but I'm confused. I thought we were 
talking about the transfer time for a single "chunk". I took "chunk" to 
mean qserv-chunk-of-a-table. This means that we need to transfer a 
number of bytes equal to the on-disk representation of that table. For 
MyISAM, this means the raw row data + MyISAM overhead + index files. 
rowdata+overhead+index all need to be accessible for chunk X of table A to 
be usable. It is not clear to me that transferring rowdata and index 
concurrently is faster unless you are getting around TCP 
window/congestion control (or you have multiple source nodes, multiple 
pipes, and multiple destinations).
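
To make the single-pipe argument concrete, here is a minimal sketch of an idealized model, not a measurement. It assumes perfectly fair bandwidth sharing with no TCP or protocol overhead, a hypothetical 10 Gbit/s link, and an index exactly 2x the size of the row data (the thread only says "up to 2x"). The 255 GB figure is the largest Source chunk quoted further down the thread; the function names are made up for illustration.

```python
def transfer_seconds(size_gb, link_gbit_per_s):
    """Seconds to push size_gb gigabytes through a link of the given speed."""
    return size_gb * 8 / link_gbit_per_s


def sequential_seconds(file_sizes_gb, link_gbit_per_s):
    """Send the files one after another, each using the whole pipe."""
    return sum(transfer_seconds(s, link_gbit_per_s) for s in file_sizes_gb)


def concurrent_seconds(file_sizes_gb, link_gbit_per_s):
    """Start all files together; active streams split the pipe evenly.

    Idealized fair sharing: when a stream finishes, its share goes to
    the streams that are still running.
    """
    elapsed = 0.0
    done_gb = 0.0  # GB already delivered per still-active stream
    active = len(file_sizes_gb)
    for size in sorted(file_sizes_gb):
        # Every active stream moves (size - done_gb) more GB before the
        # smallest remaining file completes; the pipe carries all of it.
        elapsed += transfer_seconds((size - done_gb) * active, link_gbit_per_s)
        done_gb = size
        active -= 1
    return elapsed


# Assumed inputs: 255 GB of row data, index assumed to be 2x that,
# and a hypothetical 10 Gbit/s pipe shared by both files.
rowdata_gb = 255
index_gb = 2 * rowdata_gb
link = 10

seq = sequential_seconds([rowdata_gb, index_gb], link)
par = concurrent_seconds([rowdata_gb, index_gb], link)
print(f"sequential: {seq:.0f} s, concurrent: {par:.0f} s")
```

In this model the two totals are identical, because the same number of bytes crosses the same pipe either way. Concurrency only helps if it changes the effective bandwidth, for example by working around per-connection window limits or by using disjoint source-pipe-destination paths.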

Still not sure this is worth arguing about. I feel that the larger 
question of whether it's a good idea to have an external qserv cluster 
that has no local (data) storage and uses the LSST cluster as a backing 
store is a bit orthogonal. For the larger question, I am a bit concerned 
that I haven't heard of any implementations of auto-caching HDFS clusters 
or distributed DBMS clusters that use other clusters as backing stores, 
whether research, proprietary, or open-source. (On the other hand, it's 
probably worth a master's or PhD thesis.)

-Daniel

On 09/24/2013 05:19 PM, Becla, Jacek wrote:
> We are talking here about sizes of *individual*
> chunks that are transferred, my point is that
> data+index are not a single file.
>
> All 20,000 chunks are going through the same
> pipe too, right? So if we consider db and index,
> it's 40,000 chunks.
>
> Jacek
>
>
>
>
> On 9/24/2013 5:15 PM, Wang, Daniel Liwei wrote:
>> Wait, why is it faster in parallel? Same pipe, right? Unless you are
>> thinking disjoint sets of source-pipe-dest.
>>
>> -Daniel
>>
>> On 09/24/2013 04:44 PM, Jacek Becla wrote:
>>> As we just discussed, my numbers are for data chunks;
>>> the index is up to 2x larger, so we can use 2x larger
>>> numbers. Data and index come in separate files, so
>>> they can be transferred in parallel, and I think
>>> it'd be unfair to assume 3x my numbers.
>>>
>>> Jacek
>>>
>>>
>>>
>>> On 9/24/2013 3:07 PM, Jacek Becla wrote:
>>>>> 	Chunks are expected to be multiple terabytes in size, which
>>>>> means that downloads are hours long.
>>>> K-T,
>>>>
>>>> Based on the baseline, which assumes a flat 20K chunks per table,
>>>> the largest chunk will be 255 GB. The numbers are (in GB,
>>>> DR1 --> DR11)
>>>>       - Object:    2 -->   4
>>>>       - ObjExtra: 25 -->  69
>>>>       - Source:    9 --> 255
>>>>       - ForcedSrc: 2 -->  98
>>>>
>>>> This is in LDM-141, dbL2, L141 (and nearby)
>>>>
>>>> And, that is before compression.
>>>>
>>>> We talked about keeping chunk size constant rather than the number
>>>> of chunks, which will probably make us go with DR1-size
>>>> chunks, thus keeping chunk size closer to 25 GB than 1/4 TB.
>>>>
>>>> Jacek
>>>>
>>> ########################################################################
>>> Use REPLY-ALL to reply to list
>>>
>>> To unsubscribe from the QSERV-L list, click the following link:
>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1
