
QSERV-L Archives


QSERV-L@LISTSERV.SLAC.STANFORD.EDU


QSERV-L September 2013

Subject: partitioning 8b row table
From: Jacek Becla <[log in to unmask]>
Reply-To: General discussion for qserv (LSST prototype baseline catalog)
Date: Tue, 3 Sep 2013 18:55:18 -0700


I wonder if we could leverage existing tools to partition
the 8b-row table (surprise!) that is coming in Summer2013.
I am thinking about breaking that table into a small number
(8-16 or so) of smaller tables, and gluing them together
through list partitioning.
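
A minimal sketch of what the list-partitioning DDL could look like, assuming the partitioning key is a level-2 HTM index (as suggested further down the thread) stored in a hypothetical column named htmId2. In the usual HTM numbering, level-2 trixel IDs are the 128 values from 128 to 255; the helper below spreads them evenly over the requested number of partitions. The function names and the column name are illustrative only, and the generated statement has not been run against the real table.

```python
def htm_level2_ids():
    """Level-2 HTM trixel IDs in the usual numbering: 128..255 inclusive."""
    return list(range(128, 256))


def list_partition_ddl(table, column, num_partitions):
    """Build an ALTER TABLE ... PARTITION BY LIST statement for MySQL.

    `table` and `column` are supplied by the caller; `column` is assumed
    to hold the level-2 HTM ID of each row.
    """
    ids = htm_level2_ids()
    if num_partitions <= 0 or len(ids) % num_partitions != 0:
        raise ValueError("num_partitions must evenly divide 128")
    per_partition = len(ids) // num_partitions

    clauses = []
    for p in range(num_partitions):
        chunk = ids[p * per_partition:(p + 1) * per_partition]
        values = ", ".join(str(i) for i in chunk)
        clauses.append(f"    PARTITION p{p} VALUES IN ({values})")

    return (
        f"ALTER TABLE {table}\n"
        f"  PARTITION BY LIST ({column}) (\n"
        + ",\n".join(clauses)
        + "\n  );"
    )


if __name__ == "__main__":
    # 8 partitions of 16 trixels each; 16 partitions would give 8 each.
    print(list_partition_ddl("RunDeepForcedSource", "htmId2", 8))
```

One caveat to keep in mind: MySQL requires every unique key on a partitioned table, including the primary key, to contain the partitioning column, so the key definitions would need to account for that.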

Serge: do you think we could use your new partitioner
for that? How long do you think it'd take to partition
the 8b-row table (the input would be in a small number of
databases) if we use lsst10? I guess we could hijack
lsst-dev if needed.

Douglas: I don't worry as much about loading, because
it will be a small number of tables, but if we could use
your script to simplify it, that'd surely be handy.

The important thing to note is that whatever we do must
be production quality: we can't lose any data, we can't
duplicate any rows, etc.
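
To make the "no lost rows, no duplicated rows" requirement concrete, here is a toy-scale sketch of the check, assuming each row carries a unique ID. The function name and the in-memory approach are illustrative assumptions; at 8b rows the same comparison would have to be done with per-partition counts or SQL rather than Python sets.

```python
from collections import Counter


def check_partitioning(source_ids, partitions):
    """Compare input row IDs against the IDs found in the output tables.

    source_ids: iterable of unique row IDs in the input
    partitions: iterable of iterables of row IDs, one per output table

    Returns (missing, duplicated, unexpected) as sets of row IDs.
    """
    seen = Counter()
    for part in partitions:
        seen.update(part)

    source = set(source_ids)
    missing = source - set(seen)                            # lost rows
    duplicated = {rid for rid, n in seen.items() if n > 1}  # written twice
    unexpected = set(seen) - source                         # not in the input
    return missing, duplicated, unexpected


if __name__ == "__main__":
    missing, duplicated, unexpected = check_partitioning(
        [1, 2, 3, 4], [[1, 2], [2, 3]]
    )
    print("missing:", missing)
    print("duplicated:", duplicated)
    print("unexpected:", unexpected)
```

A partitioning run passes only if all three sets come back empty.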


Jacek



On 09/03/2013 04:57 PM, Smith, Douglas A. wrote:
>
> This ingest discussion, is this ingest into just mysql, or
> into qserv?  Which would include partitioning as part
> of it...
>
> I've got the latest qserv load script loading more than
> one table at a time right now, one chunk for each cpu
> on the machine.  I can run tests on having that on or
> off to see how long it takes to load data...
>
> But it would be better to try and test the creation of
> a myisam table without the mysql daemon, but that would
> take linking to the myisam libs in some util we would need
> to create, and we haven't gotten around to that...
>
> Douglas
>
>
> On 09/03/2013 04:52 PM, Jacek Becla wrote:
>> I am worried about the 8 billion row table. Every time I asked
>> in the past I was told the scale will be comparable to W13.
>> 8 vs 3.9 billion will cause more than 2x more pain!
>>
>>
>> Question: if we process in parallel and ingest into
>> separate databases, will all forced sources for a given
>> object always end up in the same table? If that is the
>> case, I'd strongly suggest deduping in parallel in
>> separate databases.
>>
>> If not, I'd compute a level-2 HTM index and partition
>> the tables using list partitioning:
>> http://dev.mysql.com/doc/refman/5.1/en/partitioning-list.html
>> and use that instead of the federating mentioned earlier.
>> That'd come in very handy for reindexing etc.
>>
>>
>> Jacek
>>
>>
>>
>>
>> On 09/03/2013 03:42 PM, Yusra AlSayyad wrote:
>>> Greg,
>>>
>>> Your estimate is consistent with the estimate from your 5 < RA < 7.5 run:
>>> The 2.5 degree half overlap database has 447180788 forced sources. If the complete NCSA RA range is 50 degrees, then 447180788*50/2.5 = 8.9 billion.
>>>
>>> Another consideration is that we'll be running an update statement on the RunDeepForcedSource table to dedupe. This will join the RunDeepForcedSource table with a temporary table of duplicate IDs generated by a group by.  Sorting will have to happen either in the group by or in creating an index (in addition to the regular planned index).
>>>
>>> -Yusra
>>>
>>> On Sep 3, 2013, at 3:13 PM, "Daues, Gregory Edward" <[log in to unmask]> wrote:
>>>
>>>>
>>>> Hi Jacek,
>>>>
>>>>      If I am estimating things properly, the scale is 8 billion rows (~200 sdss 'runs',
>>>> each of which can have 40 million rows on average). The one ingest test that I performed
>>>> suggested this could be three days runtime for ingestion in serial.
>>>>
>>>>             Greg
>>>> ________________________________________
>>>> From: [log in to unmask] [[log in to unmask]] on behalf of Jacek Becla [[log in to unmask]]
>>>> Sent: Tuesday, September 03, 2013 3:15 PM
>>>> To: Kian-Tat Lim
>>>> Cc: [log in to unmask]
>>>> Subject: Re: [LSST-data] DM Status Call Minutes 2013-09-03
>>>>
>>>> K-T
>>>>
>>>> Federating would be instantaneous (but we need to test
>>>> if there are any unexpected side effects, that is human
>>>> time).
>>>>
>>>> As to stitching vs loading:
>>>>     - Simon can comment how much time he saved by loading in
>>>>       parallel (I think it was on the order of 3-4 days)
>>>>     - timing for stitching is captured here:
>>>>       https://dev.lsstcorp.org/trac/wiki/mysqlLargeTables
>>>>       (short summary: <2 days).
>>>>
>>>> Note that we are working closely with mysql to get the
>>>> 2 days down to a few hours - they have a bug in the tool
>>>> that rebuilds an index which prevents us from taking
>>>> full advantage of available hardware.
>>>>
>>>> Note that these numbers are highly dependent on the number
>>>> of rows we are talking about. What numbers are we
>>>> talking about?
>>>>
>>>> Jacek
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 09/03/2013 12:55 PM, Kian-Tat Lim wrote:
>>>>> Jacek,
>>>>>
>>>>>> Most painful part was stitching the tables together.
>>>>>> If we have time to try doing better than a brute force,
>>>>>> I'd like to experiment with using federated engine
>>>>>> this time around.
>>>>>         The most important question is whether the time to
>>>>> stitch/federate the tables was greater than it would have been to load
>>>>> the tables sequentially in the first place.  Do you remember?
>>>>>
>>>>
>
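
The two scale estimates quoted in this thread (the RA extrapolation and the per-run estimate) can be cross-checked with a few lines of arithmetic. The sketch below only reuses numbers stated in the messages above, including the 3.9 billion rows given for W13; the function names are made up for illustration.

```python
def extrapolate_rows(sample_rows, sample_ra_deg, full_ra_deg):
    """Scale a row count from a sampled RA range to the full RA range."""
    return sample_rows * full_ra_deg / sample_ra_deg


def rows_from_runs(num_runs, rows_per_run):
    """Total rows when each run contributes rows_per_run on average."""
    return num_runs * rows_per_run


# 447180788 forced sources in a 2.5 degree range, scaled to 50 degrees.
ra_estimate = extrapolate_rows(447180788, 2.5, 50.0)

# ~200 sdss runs at 40 million rows each on average.
run_estimate = rows_from_runs(200, 40_000_000)

print(f"RA extrapolation: {ra_estimate / 1e9:.1f} billion rows")
print(f"Per-run estimate: {run_estimate / 1e9:.1f} billion rows")
print(f"Growth over W13:  {run_estimate / 3.9e9:.2f}x")
```

Both estimates land in the same range, which is the consistency noted in the thread.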
