
Alright, so it sounds to me like the final resolution is to run the duplicator separately and not add it to the loader itself. That's much clearer to me now. Thanks for everyone's help!

-Vaikunth

On Fri, Dec 19, 2014 at 2:54 PM, Salnikov, Andrei A. <[log in to unmask]> wrote:

OK, now I remember that it does not need the partitioner to run first, thanks :)

If we do not consider the duplicator a universally useful solution, then we should probably avoid adding it to the data loader. It's totally OK with me to run the partitioner separately before the data loader. The loader already supports pre-partitioned data, so it should be OK to run the duplicator first and then pass a few special options to the loader so that it uses the duplicated data without partitioning it again.


Cheers,
Andy

 

From: Serge Monkewitz [mailto:[log in to unmask]]
Sent: Friday, December 19, 2014 12:13 PM
To: Fabrice Jammes
Cc: Salnikov, Andrei A.; Vaikunth Thukral; qserv-l


Subject: Re: [QSERV-L] Adding Duplicator to Integration Tests

 

Andy and Fabrice,

 

    The previous duplicator may have run on partitioned data, but this is not how the new one works. With the new one, you first generate an index of the input data (basically, you bin it into spherical triangles that the duplicator will then copy-paste around to other parts of the sky). The executable for that is called sph-htm-index, and its options/inputs look pretty similar to the partitioner options/inputs. Once you’ve generated an HTM index with sph-htm-index, you then call sph-duplicate, which performs both duplication and partitioning and outputs partitioned data directly. There is no need to call sph-partition beforehand or afterwards.
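As a rough illustration of that ordering (a sketch only, not code from this thread), a test case could wrap the two executables along the following lines. The helper names build_duplication_pipeline and run_pipeline are hypothetical, and the option lists are deliberately left to the caller: the real options should be taken from the --help output of sph-htm-index and sph-duplicate rather than from this example.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_duplication_pipeline(index_args: Sequence[str],
                               duplicate_args: Sequence[str]) -> List[List[str]]:
    """Return the two commands in the order described above.

    Hypothetical helper: only the executable names come from the thread.
    The argument lists are passed through untouched, so no option names
    are assumed here.
    """
    return [
        # Step 1: build the HTM index of the input data.
        ["sph-htm-index", *index_args],
        # Step 2: duplicate and partition in one go; no sph-partition call.
        ["sph-duplicate", *duplicate_args],
    ]


def run_pipeline(commands: Sequence[Sequence[str]], dry_run: bool = True) -> List[str]:
    """Print each command and, unless dry_run is set, execute it in order.

    Returns the printable command lines so a test can inspect them.
    """
    rendered = []
    for cmd in commands:
        line = shlex.join(cmd)
        rendered.append(line)
        print(line)
        if not dry_run:
            # check=True stops the pipeline if a step fails.
            subprocess.run(list(cmd), check=True)
    return rendered


if __name__ == "__main__":
    # Placeholder arguments; replace with real options from each tool's --help.
    pipeline = build_duplication_pipeline(
        ["<sph-htm-index options>"],
        ["<sph-duplicate options>"],
    )
    run_pipeline(pipeline, dry_run=True)
```

Keeping the wrapper in test-case code, with dry_run defaulting to True, means the ordering can be checked without the tools installed and without touching the loader.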

 

Please have a look at the following web-page for documentation (including an example of how to run it). I think it’ll help to understand a little of what’s going on under the hood:

 

 

Anyway, the upshot is that it should be possible to add a small amount of test-case-specific code to invoke the duplicator as in the example, without disturbing the loader code. It may even be preferable to structure things that way - the duplicator is/was intended to be a stop-gap way of generating lots of data so we can test at scale before production, and maybe it shouldn’t be conflated with / influence the design of our data loading scripts.

 

Cheers,

Serge

 

 

On Dec 19, 2014, at 10:08 AM, Fabrice Jammes <[log in to unmask]> wrote:



Hi Andy, Serge and Vaikunth,

The previous duplicator (partition.py) ran on partitioned data (I don't know how it avoided duplicating primary keys).
I think sph-duplicate, by contrast, runs on non-partitioned data. Could you please confirm that, Serge?

sph-duplicate --help provides good documentation of the options, but I don't see how to provide input data to this tool.
If I'm right, Serge, could you please add it?

Thanks,

Fabrice

On 12/19/2014 09:54 AM, Salnikov, Andrei A. wrote:

Hi Vaikunth,


I think it's better to add the duplicator call to the data loader itself. It's probably the same amount of work as doing it in the integration scripts, but duplication should happen after partitioning, so logically we should just put it in the loader script after the call to the partitioner. I think it will be straightforward to do; there should not be much interaction with other code (except passing all required parameters).

And it can probably be done in DM-1570 if you do not want to open a new ticket.

Cheers,
Andy


 

From: Vaikunth Thukral [mailto:[log in to unmask]] 
Sent: Friday, December 19, 2014 9:23 AM
To: Salnikov, Andrei A.
Cc: qserv-l
Subject: Re: [QSERV-L] Adding Duplicator to Integration Tests

 

Hi Andy,

I saw from the code that the loader also enables loading of pre-partitioned data, something that the duplicator produces. Another possibility could be to use the duplicator upstream and use the loader as-is on this now-partitioned data. This could be a quicker solution for adding the new test case, but if the ideal way is to just have the loader do the duplication then I won't mind looking into it. Would that technically block this issue (1570) then? What do you guys think?

-Vaikunth

 

On Fri, Dec 19, 2014 at 12:38 AM, Salnikov, Andrei A. <[log in to unmask]> wrote:

Hi Vaikunth,

Duplication should be a part of data loading and needs to be added to the data-loader script eventually. Do you want to work on that?


Cheers,
Andy
 

 

From: [log in to unmask] [mailto:[log in to unmask]] On Behalf Of Vaikunth Thukral
Sent: Thursday, December 18, 2014 9:14 PM
To: qserv-l
Subject: [QSERV-L] Adding Duplicator to Integration Tests

 

Hi all,

Serge, Fabrice and I have been talking about how to go forward with adding the test case of using the duplicator (DM-1570). I know Fabrice is working on using AndyS's new loader to load data for integration tests using partitioning.

Would it be acceptable to modify the loader script to use/add a duplication option instead of just partitioning?

-Vaikunth

 


Use REPLY-ALL to reply to list

To unsubscribe from the QSERV-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1

 

