
Alright, so it sounds to me like the final resolution is to run the
duplicator separately and not add it to the loader itself. Now it's much
clearer to me. Thanks for everyone's help!
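A minimal sketch of that separated approach, written as hypothetical Python test-harness code. It follows the order Serge and Andy describe below: index the input with sph-htm-index, run sph-duplicate (which partitions its own output, so sph-partition is never called), and only then hand the partitioned output to the unchanged loader. The function names, the option lists, and the loader command are placeholders, not real flags. The actual options are described in duplication.md and in each tool's --help output.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def duplication_commands(index_args: Sequence[str],
                         duplicate_args: Sequence[str],
                         loader_cmd: Optional[Sequence[str]] = None) -> List[List[str]]:
    """Return the test-case pipeline as a list of argv lists, in run order.

    1. sph-htm-index bins the input rows into HTM triangles.
    2. sph-duplicate copies those triangles around the sky and partitions
       the result itself, so sph-partition is never invoked.
    3. Optionally, a caller-supplied (hypothetical) loader command runs
       afterwards on the already-partitioned output; the loader is unchanged.
    Option values are passed through verbatim; none are assumed here.
    """
    cmds = [["sph-htm-index", *index_args],
            ["sph-duplicate", *duplicate_args]]
    if loader_cmd is not None:
        cmds.append(list(loader_cmd))
    return cmds


def run_pipeline(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command (dry run) or execute it, stopping on the first failure."""
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

For example, run_pipeline(duplication_commands(["<index options>"], ["<duplicate options>"])) only prints the two commands; pass dry_run=False to execute them. Keeping this in the test-case code rather than in the loader matches the conclusion above.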

-Vaikunth

On Fri, Dec 19, 2014 at 2:54 PM, Salnikov, Andrei A. <
[log in to unmask]> wrote:

>  OK, now I remember that it does not need the partitioner to run first,
> thanks :)
>
> If we do not consider the duplicator a universally useful solution, then
> we should probably avoid adding it to the data loader. It's totally OK
> with me to run the partitioner separately before the data loader. The
> loader already supports pre-partitioned data, so it should be fine to run
> the duplicator first and then pass a few special options to the loader so
> that it uses the duplicated data without partitioning it again.
>
>
> Cheers,
> Andy
>
>
>
> *From:* Serge Monkewitz [mailto:[log in to unmask]]
> *Sent:* Friday, December 19, 2014 12:13 PM
> *To:* Fabrice Jammes
> *Cc:* Salnikov, Andrei A.; Vaikunth Thukral; qserv-l
>
> *Subject:* Re: [QSERV-L] Adding Duplicator to Integration Tests
>
>
>
> Andy and Fabrice,
>
>
>
>     The previous duplicator may have run on partitioned data, but this is
> not how the new one works anymore. With the new one, you first generate an
> index of the input data (basically, you bin it into spherical triangles
> that the duplicator will then copy-paste around to other parts of the sky).
> The executable for that is called sph-htm-index, and its options/inputs
> look pretty similar to the partitioner options/inputs. Once you’ve
> generated an HTM index with sph-htm-index, you then call sph-duplicate,
> which integrates both duplication as well as partitioning and outputs
> partitioned data directly. There is no need to call sph-partition
> beforehand or afterwards.
>
>
>
> Please have a look at the following web-page for documentation (including
> an example of how to run it). I think it’ll help to understand a little of
> what’s going on under the hood:
>
>
>
> https://github.com/LSST/partition/blob/master/docs/duplication.md
>
>
>
> Anyway, the upshot is that it should be possible to add a small amount of
> test case specific code to invoke the duplicator as in the example, without
> disturbing the loader code. It may even be preferable to structure things
> that way - the duplicator is/was intended to be a stop-gap way of
> generating lots of data so we can test at scale before production, and
> maybe it shouldn’t be conflated with / influence the design of our data
> loading scripts.
>
>
>
> Cheers,
>
> Serge
>
>
>
>
>
> On Dec 19, 2014, at 10:08 AM, Fabrice Jammes <[log in to unmask]>
> wrote:
>
>
>
>  Hi Andy, Serge and Vaikunth,
>
> The previous duplicator (partition.py) ran on partitioned data (I don't
> know how it avoided duplicating primary keys).
> I think sph-duplicate, by contrast, runs on non-partitioned data. Could
> you confirm that for us, please, Serge?
>
> sph-duplicate --help provides good documentation on the options, but I
> don't see how to provide input data to this tool.
> If I'm right, Serge, could you please add that?
>
> Thanks,
>
> Fabrice
>
> On 12/19/2014 09:54 AM, Salnikov, Andrei A. wrote:
>
>  Hi Vaikunth,
>
>
> I think it's better to add the duplicator call to the data loader
> itself. It's probably the same amount of work as doing it in the
> integration scripts, but duplication should happen after partitioning,
> so logically we should just put it into the loader script after the
> call to the partitioner. I think it will be straightforward to do; there
> should not be much interaction with other code (except passing all the
> required parameters).
>
> And it can probably be done in DM-1570 if you do not want to open a new
> ticket.
>
> Cheers,
> Andy
>
>
>
>
> *From:* Vaikunth Thukral [mailto:[log in to unmask] <[log in to unmask]>]
> *Sent:* Friday, December 19, 2014 9:23 AM
> *To:* Salnikov, Andrei A.
> *Cc:* qserv-l
> *Subject:* Re: [QSERV-L] Adding Duplicator to Integration Tests
>
>
>
> Hi Andy,
>
> I saw from the code that the loader also enables loading of
> pre-partitioned data, something that the duplicator produces. Another
> possibility could be to use the duplicator upstream and use the loader
> as-is on this now-partitioned data. This could be a quicker solution for
> adding the new test case, but if the ideal way is to have the loader do
> the duplication, then I won't mind looking into it. Would that
> technically block this issue (DM-1570)? What do you guys think?
>
> -Vaikunth
>
>
>
> On Fri, Dec 19, 2014 at 12:38 AM, Salnikov, Andrei A. <
> [log in to unmask]> wrote:
>
> Hi Vaikunth,
>
> duplication should be a part of data loading and needs to be added to
> the data-loader script eventually. Do you want to work on that?
>
>
> Cheers,
> Andy
>
>
>
>
> *From:* [log in to unmask] [mailto:[log in to unmask]] *On
> Behalf Of *Vaikunth Thukral
> *Sent:* Thursday, December 18, 2014 9:14 PM
> *To:* qserv-l
> *Subject:* [QSERV-L] Adding Duplicator to Integration Tests
>
>
>
> Hi all,
>
> Serge, Fabrice and I have been talking about how to go forward with adding
> the test case of using the duplicator (DM-1570). I know Fabrice is working
> on using AndyS's new loader to load data for integration tests using
> partitioning.
>
> Would it be acceptable to modify the loader script to use/add a
> duplication option instead of just partitioning?
>
> -Vaikunth
>
>
>   ------------------------------
>
> Use REPLY-ALL to reply to list
>
> To unsubscribe from the QSERV-L list, click the following link:
> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1
>