QSERV-L Archives

QSERV-L@LISTSERV.SLAC.STANFORD.EDU


QSERV-L February 2015

Subject: Re: data set for large scale test
From: "Daniel L. Wang" <[log in to unmask]>
Reply-To: General discussion for qserv (LSST prototype baseline catalog)
Date: Thu, 26 Feb 2015 16:08:04 -0800
Content-Type: text/plain

On 02/26/2015 03:46 PM, Fabrice Jammes wrote:
> Daniel,
>
> I have a question between the lines below.
>
> On 02/26/2015 03:32 PM, Daniel L. Wang wrote:
>> On 02/26/2015 01:45 PM, Kian-Tat Lim wrote:
>>>> The general case is very expensive (lookup position and chunk for each
>>>> position!?), and we are only going to get away with it because our
>>>> bulk-loads for ForcedSource will be spatially-restricted.
>>>     I'm pretty sure that ForcedSource and (final) Object tables will
>>> be available at the same time, so the partitioner could look up the
>>> coordinates based on objectId using a simple merge.  I thought you were
>>> already going to have a central objectId-to-chunk (or even subchunk?)
>>> index anyway; scanning an input ForcedSource table for all its objectIds
>>> and then doing a single query (or at least batching objectIds to reduce
>>> the number of queries by an order of magnitude) to get the mappings
>>> doesn't sound ridiculous.
>> I think we're on the same page. The general case is truly expensive: 
>> loading arbitrary child table rows requires lookups on the director 
>> table. We can certainly batch this, but again, if child table rows 
>> come in an arbitrary order, and don't have the one-to-many 
>> (object<->forcedsource) relationship, it's really expensive.
>>
>> But yes, batching should be really effective because of the shape of 
>> our data, and the patterns in which we produce it. I don't think a 
>> coordinated multi-table partitioning action is scalable: it means 
>> that the director and all its child table rows need to be available 
>> *at the same time*. (Oh, want to add another child table? Oh, I guess 
>> I need to repartition the director and the other 4 child tables and 
>> reload them. Uh, no.)
>>
>> My point is not that we don't know how to scale it, but that the 
>> processing model is different from what we do now. The current 
>> partitioner and loader can load director and child tables one at a 
>> time, without checking existing data, sharing only partitioning 
>> parameters (stripes/substripes). The catch is that it requires the 
>> child table to be pre-joined (effectively) to the director table in 
>> order to have partitioning coordinates. We know this is insufficient.
> Using the secondary index (built from the Object table) to compute the
> chunk of a given Source from its objectId column would avoid this
> join, wouldn't it?
> For example, if source i has an objectId field equal to j, then we can
> query the secondary index on objectId=j to get the chunk of the
> source. This should work.
> Of course, we have to build the secondary index prior to this operation.
This is effectively a join, no? I'm not suggesting sending a SQL join 
query into the czar's normal pipeline. But looking up chunkId with the 
secondary index is an index-only join. I think we might still want to 
create a smaller lookup table for each batch of child table rows, 
depending on how fast we can make the full-index lookups. Is it faster
to do 10 million full-index lookups (on disk), or to do 100k full-index
lookups, build a 100k-entry hash table, and then do 10 million lookups
on the in-memory table? I don't know yet.

  -Daniel
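
The two lookup strategies compared above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Qserv code: `CountingIndex`, `assign_chunks_per_row`, and `assign_chunks_batched` are hypothetical names, the secondary index is stood in for by an in-memory mapping from objectId to chunkId, and child rows are reduced to (sourceId, objectId) pairs. It shows how many full-index lookups each strategy issues; it says nothing about which one is faster on a real on-disk index.

```python
from typing import Dict, List, Tuple

# A child-table row reduced to (sourceId, objectId); a hypothetical layout.
Row = Tuple[int, int]


class CountingIndex:
    """Hypothetical stand-in for the objectId -> chunkId secondary index.

    It wraps an in-memory mapping and counts lookups, so the two
    strategies can be compared by the number of full-index lookups.
    """

    def __init__(self, mapping: Dict[int, int]) -> None:
        self._mapping = mapping
        self.lookups = 0

    def lookup(self, object_id: int) -> int:
        self.lookups += 1
        return self._mapping[object_id]


def assign_chunks_per_row(rows: List[Row], index: CountingIndex) -> List[int]:
    """Strategy A: one full-index lookup for every child row."""
    return [index.lookup(object_id) for _, object_id in rows]


def assign_chunks_batched(rows: List[Row], index: CountingIndex) -> List[int]:
    """Strategy B: one full-index lookup per distinct objectId in the
    batch, then serve every row from a small in-memory table."""
    unique_ids = {object_id for _, object_id in rows}
    local_table = {oid: index.lookup(oid) for oid in unique_ids}
    return [local_table[object_id] for _, object_id in rows]
```

With a batch of 10,000 rows that reference 100 distinct objectIds, strategy A issues 10,000 index lookups and strategy B issues 100, and both return the same chunk assignments. That is the same 100:1 ratio as the 10 million versus 100k case in the message. Whether B wins in practice depends on the cost of a full-index lookup relative to building and probing the in-memory table, which is the open question.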

