Jacek,

Thanks for your answer. I will let Fabrice and Osman deal with this.
I guess that Fabrice knows how to run the partitioner, but he may have
some specific questions.

Dominique
On 13/01/2014 11:54, Jacek Becla wrote:
> Dominique
>
> I think the best would be to break the largest tables into
> a reasonably small number of smaller tables (say 16); that
> should bring the table sizes back into the regime where things
> work reasonably fast. I think writing a trivial
> Python script to do that would be easier than going through
> the partitioner. But if this is experimental, we could indeed
> run it through the partitioner. Would you like instructions
> on how to use the partitioner, or will you first try to divide
> the largest tables with a simple Python script?
>
> Jacek
>
>
> On 01/13/2014 10:36 AM, Dominique Boutigny wrote:
>> Hi Jacek,
>>
>> As far as I understand, there are no unsolved issues with the standard
>> (non-Qserv) procedure; it is just awfully slow, to the point of being
>> barely usable. The idea here was to try Qserv partitioning to see if
>> it improves performance for the ingestion step as well as for
>> user access.
>> There are no issues with the Data Challenge, as IN2P3 has already fulfilled
>> its commitment to LSST; the goal is to explore this partitioning
>> functionality on a single MySQL node.
>>
>> Dominique
>>
>> On 13/01/2014 10:22, Jacek Becla wrote:
>>> Fabrice
>>>
>>> It'd be better not to pull Qserv into any Data Challenges yet.
>>>
>>> Is there an issue with using the MySQL MERGE engine? If we use the
>>> partitioner just to break data into separate tables, we can
>>> do that easily without it, and if we start using its
>>> advanced features like overlap, it will quickly get out of hand.
>>>
>>> Can Osman send more specific information about what is not working?
>>> As I said earlier, I have not heard about any issues since I
>>> exchanged emails with him on Dec 11 (which I just forwarded
>>> to you).
>>>
>>> Jacek
>>>
>>>
>>>
>>> On 01/13/2014 09:53 AM, Fabrice Jammes wrote:
>>>> Hello,
>>>>
>>>> Osman Aidel, a CC-IN2P3 expert in database administration, is
>>>> currently working on loading into MySQL the 3 TB dataset produced
>>>> during the last Data Challenge.
>>>> Osman and Dominique Boutigny succeeded in loading the whole dataset into
>>>> MySQL, but some post-processing steps on this dataset (like removal of
>>>> duplicates) take an excessively long time.
>>>>
>>>> Please note that these issues are good news for Qserv, as they
>>>> validate its distributed data model ;-).
>>>>
>>>> Christian Arnault, French manager for LSST computing, thinks that some
>>>> of the tools developed by the Qserv team could help Osman and CC-IN2P3.
>>>> In particular, the partitioning algorithm developed by Serge could be
>>>> used to partition the DC dataset into a collection of chunks.
>>>> Osman could then load a subset of contiguous chunks from this collection
>>>> into a single-node MySQL server.
>>>>
>>>> Do you think this proposal could be achieved soon? French
>>>> physicists are interested in studying a representative sample of the
>>>> DC dataset, and this solution would help them a lot.
>>>>
>>>> Thanks,
>>>>
>>>> Fabrice
>>>>
>>>
>>> ######################################################################## 
>>>
>>> Use REPLY-ALL to reply to list
>>>
>>> To unsubscribe from the QSERV-L list, click the following link:
>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1
>>
>
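A minimal sketch of the "trivial python script" approach suggested in the thread, splitting one large table into 16 smaller ones and, optionally, presenting them again as a single table through the MySQL MERGE engine. It assumes MyISAM tables (MERGE only works over identical MyISAM tables) and a non-negative integer key that is spread roughly evenly. The table name `Source`, the key `sourceId`, the `_NN` / `_all` naming scheme and the column definitions are hypothetical placeholders, not the actual DC schema. The script only prints SQL for review; it does not connect to a server.

```python
"""Hypothetical sketch: generate SQL that splits one large MyISAM table
into N child tables and re-unites them behind a MySQL MERGE table.

All table, column and key names are placeholders, not the DC schema."""

from typing import List


def child_name(table: str, i: int) -> str:
    """Name of the i-th child table, e.g. Source_03 (hypothetical scheme)."""
    return f"{table}_{i:02d}"


def split_table_sql(table: str, key: str, n_parts: int = 16) -> List[str]:
    """SQL statements copying `table` into `n_parts` child tables.

    A row goes to child i when MOD(key, n_parts) = i. This assumes `key`
    is a non-negative integer id spread roughly evenly, so each child
    holds about 1/n_parts of the rows.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    stmts: List[str] = []
    for i in range(n_parts):
        child = child_name(table, i)
        # Empty copy with the same columns, indexes and storage engine.
        stmts.append(f"CREATE TABLE {child} LIKE {table};")
        # Fill it with the rows whose key falls into bucket i.
        stmts.append(
            f"INSERT INTO {child} SELECT * FROM {table} "
            f"WHERE MOD({key}, {n_parts}) = {i};"
        )
    return stmts


def merge_table_sql(table: str, columns_ddl: str, n_parts: int = 16) -> str:
    """CREATE statement for a MERGE table spanning all child tables.

    `columns_ddl` must match the children's column definitions, because
    the MERGE engine requires identical underlying MyISAM tables.
    INSERT_METHOD=NO rejects INSERTs into the MERGE table itself.
    """
    children = ",".join(child_name(table, i) for i in range(n_parts))
    return (
        f"CREATE TABLE {table}_all ({columns_ddl}) "
        f"ENGINE=MERGE UNION=({children}) INSERT_METHOD=NO;"
    )


if __name__ == "__main__":
    # Hypothetical table and columns, for illustration only.
    for stmt in split_table_sql("Source", "sourceId"):
        print(stmt)
    print(merge_table_sql(
        "Source",
        "sourceId BIGINT NOT NULL, ra DOUBLE, decl DOUBLE, INDEX(sourceId)",
    ))
```

The output could be reviewed and then piped into the `mysql` client. Splitting on `MOD(key, 16)` keeps the child tables similar in size, but each `INSERT ... SELECT` scans the parent table once, and rows that are close on the sky are not kept together. Keeping neighbouring sky regions together is what a spatial chunking scheme such as the Qserv partitioner's is meant to provide.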

-- 
Dominique Boutigny  -  CNRS / CC-IN2P3
Now at SLAC National Accelerator Laboratory

Mail     : [log in to unmask] -  [log in to unmask]
Office   : +1 650-926-5759   -  Cellular : +1 774-232-0912
