Alright, so it sounds to me that the final resolution is to do it separately and not add the duplicator to the loader itself. Now it's much clearer to me. Thanks for everyone's help!

-Vaikunth

On Fri, Dec 19, 2014 at 2:54 PM, Salnikov, Andrei A. <[log in to unmask]> wrote:

> OK, now I remember that it does not need the partitioner running
> first, thanks :)
>
> If we do not consider the duplicator a universally useful solution
> then we should probably avoid adding it to the data loader. It's totally
> OK with me to run the partitioner separately before the data loader; the
> loader already supports pre-partitioned data, so it should be OK to run
> the duplicator first and then pass a few special options to the loader to
> use the duplicated data without partitioning them in the loader.
>
> Cheers,
> Andy
>
> *From:* Serge Monkewitz [mailto:[log in to unmask]]
> *Sent:* Friday, December 19, 2014 12:13 PM
> *To:* Fabrice Jammes
> *Cc:* Salnikov, Andrei A.; Vaikunth Thukral; qserv-l
> *Subject:* Re: [QSERV-L] Adding Duplicator to Integration Tests
>
> Andy and Fabrice,
>
> The previous duplicator may have run on partitioned data, but this is
> not how the new one works anymore. With the new one, you first generate an
> index of the input data (basically, you bin it into spherical triangles
> that the duplicator will then copy-paste around to other parts of the sky).
> The executable for that is called sph-htm-index, and its options/inputs
> look pretty similar to the partitioner options/inputs. Once you’ve
> generated an HTM index with sph-htm-index, you then call sph-duplicate,
> which integrates both duplication and partitioning and outputs
> partitioned data directly. There is no need to call sph-partition
> beforehand or afterwards.
>
> Please have a look at the following web-page for documentation (including
> an example of how to run it).
> I think it’ll help to understand a little of
> what’s going on under the hood:
>
> https://github.com/LSST/partition/blob/master/docs/duplication.md
>
> Anyway, the upshot is that it should be possible to add a small amount of
> test-case-specific code to invoke the duplicator as in the example, without
> disturbing the loader code. It may even be preferable to structure things
> that way - the duplicator is/was intended to be a stop-gap way of
> generating lots of data so we can test at scale before production, and
> maybe it shouldn’t be conflated with / influence the design of our data
> loading scripts.
>
> Cheers,
> Serge
>
> On Dec 19, 2014, at 10:08 AM, Fabrice Jammes <[log in to unmask]> wrote:
>
> Hi Andy, Serge and Vaikunth,
>
> The previous duplicator (partition.py) ran on partitioned data (I
> don't know how it avoided duplicating PKs).
> I think that sph-duplicate, by contrast, runs on non-partitioned data.
> Could you please confirm that, Serge?
>
> sph-duplicate --help provides good documentation on options, but I don't
> see how to provide input data to this tool.
> If I'm right, Serge, could you please add it?
>
> Thanks,
> Fabrice
>
> On 12/19/2014 09:54 AM, Salnikov, Andrei A. wrote:
>
> Hi Vaikunth,
>
> I think it's better to add the duplicator call to the data loader itself.
> It's probably the same amount of work if you do it in the integration
> scripts, but duplication should happen after partitioning, so logically we
> should just stick it into the loader script after the call to the
> partitioner. I think it will be straightforward to do; there should not be
> much interaction with other code (except passing all required parameters).
> And it can probably be done in DM-1570 if you do not want to open a new
> ticket.
>
> Cheers,
> Andy
>
> *From:* Vaikunth Thukral [mailto:[log in to unmask]]
> *Sent:* Friday, December 19, 2014 9:23 AM
> *To:* Salnikov, Andrei A.
> *Cc:* qserv-l
> *Subject:* Re: [QSERV-L] Adding Duplicator to Integration Tests
>
> Hi Andy,
>
> I saw from the code that the loader also enables loading of
> pre-partitioned data, something that the duplicator produces. Another
> possibility could be to use the duplicator upstream and use the loader
> as-is on this now-partitioned data. This could be a quicker solution for
> adding the new test case, but if the ideal way is to just have the loader
> do the duplication then I won't mind looking into it. Would that
> technically block this issue (1570) then? What do you guys think?
>
> -Vaikunth
>
> On Fri, Dec 19, 2014 at 12:38 AM, Salnikov, Andrei A. <[log in to unmask]> wrote:
>
> Hi Vaikunth,
>
> duplication should be a part of data loading and needs to be added to
> data-loader script eventually. Do you want to work on that?
>
> Cheers,
> Andy
>
> *From:* [log in to unmask] [mailto:[log in to unmask]] *On Behalf Of* Vaikunth Thukral
> *Sent:* Thursday, December 18, 2014 9:14 PM
> *To:* qserv-l
> *Subject:* [QSERV-L] Adding Duplicator to Integration Tests
>
> Hi all,
>
> Serge, Fabrice and I have been talking about how to go forward with adding
> the test case of using the duplicator (DM-1570). I know Fabrice is working
> on using AndyS's new loader to load data for integration tests using
> partitioning.
>
> Would it be acceptable to modify the loader script to use/add a
> duplication option instead of just partitioning?
>
> -Vaikunth
>
> ------------------------------
>
> Use REPLY-ALL to reply to list
>
> To unsubscribe from the QSERV-L list, click the following link:
> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1
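
The resolution reached in this thread (index the input with sph-htm-index, run sph-duplicate, then hand the already-partitioned output to the loader) could be sketched as a small test-case driver along the following lines. This is only a minimal sketch under stated assumptions, not the project's code: the two executable names come from Serge's message, while the function names, the argument lists, and the loader command are hypothetical placeholders that the caller must supply. No real command-line options are assumed here; see the duplication documentation linked in the thread for those.

```python
import subprocess
from typing import Callable, List, Sequence

Command = List[str]


def build_duplication_steps(
    index_args: Sequence[str],
    duplicate_args: Sequence[str],
    loader_cmd: Sequence[str],
) -> List[Command]:
    """Return the commands in the order described in the thread.

    All arguments are caller-supplied placeholders (hypothetical):
    this sketch does not know the real options of either tool, nor
    the real loader invocation.
    """
    return [
        # Step 1: build the HTM index of the (non-partitioned) input.
        ["sph-htm-index", *index_args],
        # Step 2: duplicate and partition in one pass; sph-partition
        # is not called before or after this step.
        ["sph-duplicate", *duplicate_args],
        # Step 3: run the loader on the pre-partitioned output.
        list(loader_cmd),
    ]


def run_steps(
    steps: Sequence[Command],
    runner: Callable[[Command], object] = subprocess.check_call,
) -> None:
    """Run each command in order; check_call raises on the first failure."""
    for cmd in steps:
        runner(cmd)
```

Keeping the duplicator calls in test-specific code like this leaves the loader script untouched, which matches the conclusion of the thread. Passing a different `runner` (for example a function that only records the commands) allows the ordering to be checked without the tools installed.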