Dominique

I think the best approach would be to break the largest tables into
a reasonably small number of smaller tables (say 16), which
should bring the table sizes back into the regime where things
run reasonably fast. I think writing a trivial Python script
to do that would be easier than going through the
partitioner. But indeed, if this is experimental, we could
run it through the partitioner. Would you like instructions on
how to use the partitioner, or will you first try dividing
the largest tables with a simple Python script?
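A minimal sketch of such a script, assuming rows are assigned to parts by a non-negative integer key column modulo 16, and that each part is created with MySQL's CREATE TABLE ... LIKE. It only generates SQL text. The table name "Source" and the column "objectId" used below are hypothetical placeholders, not the actual DC schema.

```python
def split_table_sql(table, key, n_parts=16):
    """Return SQL statements that split `table` into n_parts smaller tables.

    A row goes to part i when `key` modulo n_parts equals i. Assumes `key`
    is a non-negative integer column; table and column names are
    placeholders to adapt to the real schema.
    """
    statements = []
    for i in range(n_parts):
        part = f"{table}_{i:02d}"  # e.g. Source_00 ... Source_15
        # Same columns and indexes as the original table.
        statements.append(f"CREATE TABLE {part} LIKE {table};")
        # Copy the rows belonging to this part (MySQL's % is modulo).
        statements.append(
            f"INSERT INTO {part} SELECT * FROM {table} "
            f"WHERE {key} % {n_parts} = {i};"
        )
    return statements
```

For example, split_table_sql("Source", "objectId") yields 32 statements that could be fed to the mysql client. Note that each INSERT ... SELECT scans the whole source table, so with 16 parts the big table is read 16 times.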

Jacek





On 01/13/2014 10:36 AM, Dominique Boutigny wrote:
> Hi Jacek,
>
> As far as I understand, there are no unsolved issues with the standard
> (non-Qserv) procedure; it is just awfully long (at the limit of
> usability). The idea here was to try Qserv partitioning to see if
> it improves performance for the ingestion step as well as for
> user access.
> There are no issues with the Data Challenge: IN2P3 has already fulfilled
> its commitment to LSST, and the goal is to explore this partitioning
> functionality on a single MySQL node.
>
> Dominique
>
> On 01/13/2014 10:22, Jacek Becla wrote:
>> Fabrice
>>
>> It'd be better to not pull Qserv into any Data Challenges yet.
>>
>> Is there an issue with using the merge engine? If we only use the
>> partitioner to break the data into separate tables, we can
>> do that easily without it, and if we start using the
>> advanced features like overlap, it will quickly get out of hand.
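Assuming "the merge engine" here refers to MySQL's MERGE (MRG_MyISAM) storage engine, split tables can still be queried as a single logical table. The sketch below only builds the CREATE TABLE statement; it assumes the part tables are identically defined MyISAM tables, and all names in the test values are hypothetical.

```python
VALID_INSERT_METHODS = ("NO", "FIRST", "LAST")


def merge_table_sql(name, columns_ddl, part_tables, insert_method="NO"):
    """Build a CREATE TABLE statement for a MySQL MERGE table.

    columns_ddl repeats the column and index definitions shared by the part
    tables, which MERGE requires to be identically defined MyISAM tables.
    """
    if insert_method not in VALID_INSERT_METHODS:
        raise ValueError(f"insert_method must be one of {VALID_INSERT_METHODS}")
    if not part_tables:
        raise ValueError("at least one part table is required")
    union = ", ".join(part_tables)
    return (
        f"CREATE TABLE {name} ({columns_ddl}) "
        f"ENGINE=MERGE UNION=({union}) INSERT_METHOD={insert_method};"
    )
```

INSERT_METHOD=NO keeps the merged view read-only, which seems the safer default for a loaded dataset. Keep in mind that a MERGE table cannot enforce uniqueness across its parts.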
>>
>> Can Osman send more specific information about what is not working?
>> As I said earlier, I have not heard about any issues since I
>> exchanged emails with him on Dec 11 (which I just forwarded
>> to you).
>>
>> Jacek
>>
>>
>>
>> On 01/13/2014 09:53 AM, Fabrice Jammes wrote:
>>> Hello,
>>>
>>> Osman Aidel, a CC-IN2P3 expert in database administration, is
>>> currently trying to load into MySQL the 3 TB dataset produced during
>>> the last data challenge.
>>> Osman and Dominique Boutigny succeeded in loading the whole dataset into
>>> MySQL, but some post-processing steps on this dataset (such as removal
>>> of duplicates) take an impractically long time.
>>>
>>> Note that these issues are good news for Qserv, as they validate
>>> its distributed data model ;-).
>>>
>>> Christian Arnault, French manager for LSST computing, thinks that some
>>> of the tools developed by the Qserv team could help Osman and CC-IN2P3.
>>> Indeed, the partitioning algorithm developed by Serge could be used to
>>> partition the DC dataset into a collection of chunks.
>>> Osman could then load a subset of contiguous chunks from this collection
>>> into a single-node MySQL server.
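To illustrate what "a collection of chunks" and "contiguous chunks" could mean, here is a simplified sky-partitioning sketch. It is an assumption for illustration only, not the algorithm in Serge's partitioner, and it ignores overlap: declination is cut into stripes of fixed height (1 degree by default, a hypothetical choice), and each stripe is cut into RA chunks whose number shrinks with cos(dec).

```python
import math


def chunk_id(ra, dec, stripe_height=1.0):
    """Map a position (ra, dec) in degrees to an integer chunk id.

    Simplified scheme: declination is cut into stripes of equal height, and
    each stripe is cut into RA chunks whose number shrinks with cos(dec), so
    chunks cover roughly equal sky area.
    """
    n_stripes = int(round(180.0 / stripe_height))
    # Stripe index, clamping dec = +90 into the last stripe.
    stripe = min(int((dec + 90.0) / stripe_height), n_stripes - 1)
    dec_lo = -90.0 + stripe * stripe_height
    dec_hi = dec_lo + stripe_height
    # |dec| of the stripe edge nearest the equator (0 if it touches it).
    abs_dec_min = min(abs(dec_lo), abs(dec_hi)) if dec_lo * dec_hi > 0 else 0.0
    max_chunks = 2 * n_stripes  # RA chunks in an equatorial stripe
    n_chunks = max(1, int(max_chunks * math.cos(math.radians(abs_dec_min))))
    chunk = min(int((ra % 360.0) / (360.0 / n_chunks)), n_chunks - 1)
    # Unique id: stripe-major, leaving room for the widest stripe.
    return stripe * max_chunks + chunk
```

Rows could then be bucketed by chunk_id(ra, dec), and a representative sample loaded by taking consecutive ids within a few adjacent stripes, since consecutive ids in one stripe are neighbors in RA.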
>>>
>>> Do you think this proposal could be implemented soon? French
>>> physicists are interested in studying a representative sample of the DC
>>> dataset, and this solution would help them a lot.
>>>
>>> Thanks,
>>>
>>> Fabrice
>>>
>>
>> ########################################################################
>> Use REPLY-ALL to reply to list
>>
>> To unsubscribe from the QSERV-L list, click the following link:
>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1
>
