QSERV-L Archives

QSERV-L@LISTSERV.SLAC.STANFORD.EDU


Subject: Re: Secondary indexes
From: Serge Monkewitz <[log in to unmask]>
Reply-To: General discussion for qserv (LSST prototype baseline catalog)
Date: Tue, 10 Jun 2014 22:45:16 -0700
Content-Type: text/plain


On Jun 10, 2014, at 7:57 PM, Kian-Tat Lim <[log in to unmask]> wrote:

> Serge,
> 
>> If there were, we could just use it for objectId queries as well. This
>> is tempting to me, except that I think the partitioner needs something
>> like a central "objectId" -> chunk mapping unless we force people to
>> supply an associated object position for all partitioned tables.
> 
> 	It doesn't necessarily need to be central -- we could spray the
> list of objectIds to the workers and have each check its own chunk(s).

If I’m not misunderstanding what you mean, this has computational complexity Θ(N) and requires p*Θ(N) bytes to be sent over the network (where N is the number of input records and p is the number of workers). Using the index structure and parallelizing across p workers, we should get computation costs of Θ(N/p * (log(N/p) + log K)), where K is the number of object IDs, and 2*Θ(N) bytes sent over the network.
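
To make the comparison above concrete, here is a rough Python sketch of the two cost models. It is only an illustration: the function names, the 8-byte ID size, and the unit constant factors are assumptions, and Θ(N) for the broadcast case is read as per-worker work. The asymptotic terms are the ones stated above.

```python
import math


def broadcast_cost(n_records, n_workers, id_bytes=8):
    """Cost model for spraying all N object IDs to every worker.

    id_bytes is an assumed ID size; constant factors are taken as 1.
    """
    per_worker_ops = n_records                       # Theta(N) checks on each worker
    network_bytes = n_workers * n_records * id_bytes  # p * Theta(N) bytes
    return per_worker_ops, network_bytes


def indexed_cost(n_records, n_workers, n_object_ids, id_bytes=8):
    """Cost model for an index lookup parallelized across p workers."""
    per_worker = n_records / n_workers
    # Theta(N/p * (log(N/p) + log K)) per worker
    per_worker_ops = per_worker * (math.log2(per_worker) + math.log2(n_object_ids))
    network_bytes = 2 * n_records * id_bytes          # 2 * Theta(N) bytes
    return per_worker_ops, network_bytes


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to show how the terms scale.
    for n in (10**4, 10**8):
        print(n, broadcast_cost(n, 100), indexed_cost(n, 100, 10**10))
```

The point of the sketch is the network term: broadcast traffic grows with p, while the indexed approach stays at a constant multiple of N regardless of the number of workers.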

I think this could work well if the partitioning input is relatively small (which should be the case when continuously loading small batches), though it does mean we’d have to move components of the partitioner into MySQL UDFs to avoid the costs of a query per objectId. But I’m not so sure it wouldn’t become a bottleneck if someone hands us a really big pile of relatively local data (not coming across some thin WAN pipe) all at once.

Also, wouldn’t the workers serving queries be less insulated from the performance effects of loading data in this setup?

>> And if we need it for the partitioner, we might as well take advantage
>> on the czar. Once we are there, it does not seem like a big stretch to
>> allow using the same sorts of indexes on columns other than the PK of
>> the director table.
> 
> 	True.
> 
> 	I think you're right that the per-chunk overhead is always going
> to be too big.  Too bad, though -- MySQL has indexes down in the
> workers, but we can't use them very effectively.  On top of that, I
> don't think there's much of a way to compress the in-memory
> representation of the objectId (or secondary) index.  Maybe a Bloom
> filter?

For compression, I still think we should do the run-length encoding I proposed in a previous e-mail, as there is little or no storage penalty in the general case, and what sounds like around a factor of 1000 savings for LSST. But the Bloom filter idea is really interesting… if we forget about the partitioner for a second, there’s no reason to require a perfect mapping from objectIds to chunks - it should be perfectly OK to dispatch to more than one chunk (as long as it’s a small number). I’ll have to read up on Bloom filter space/accuracy tradeoffs - it’s definitely something we should think about when we are revisiting the objectId indexing.
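
As a minimal sketch of the "dispatch to more than one chunk" idea, the Python below keeps one Bloom filter per chunk and returns every chunk whose filter reports a possible match. This is not qserv code: the class, the `candidate_chunks` helper, the filter sizes, and the use of BLAKE2 with double hashing are all assumptions made for illustration. A Bloom filter never gives false negatives, so the true chunk is always among the candidates; false positives only cause extra dispatches.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over non-negative 64-bit integer object IDs."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, object_id):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.blake2b(
            object_id.to_bytes(8, "little"), digest_size=16
        ).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # keep the stride non-zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, object_id):
        for pos in self._positions(object_id):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, object_id):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(object_id)
        )


def candidate_chunks(filters, object_id):
    """Return every chunk ID whose filter says the object may be present."""
    return [cid for cid, f in filters.items() if f.might_contain(object_id)]


if __name__ == "__main__":
    # Hypothetical layout: two chunks holding 100 object IDs each.
    filters = {1: BloomFilter(4096, 4), 2: BloomFilter(4096, 4)}
    for oid in range(100):
        filters[1].add(oid)
    for oid in range(100, 200):
        filters[2].add(oid)
    print(candidate_chunks(filters, 42))  # always includes chunk 1
```

Note that probing one filter per chunk costs time proportional to the number of chunks per lookup, so this layout only illustrates the accuracy tradeoff, not a scalable index design.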