Hi Fabrice,

OK, I read through that page. I just want to point out that you will have to be careful to partition based on the deep source position, not the deep forced source position. This is because duplicate forced sources are not guaranteed to have identical (ra, decl) coordinates. They will, however, be associated with the same deep source (that is, they will have identical deepSourceId values). Thus, to run the deduplication procedure from the trac page chunk by chunk, you'll need to ensure that duplicates always end up in the same chunk.
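
To make the point concrete, here is a minimal Python sketch. The chunking function is a toy grid that I made up for illustration (it is not the partitioner's actual scheme), and the dictionary layout and the deepForcedSourceId key are assumptions as well. The only thing it is meant to show is that the chunk is looked up from the deep source position, keyed by deepSourceId, so duplicates cannot be separated.

```python
def toy_chunk_id(ra_deg, decl_deg, num_stripes=18, chunks_per_stripe=36):
    """Toy chunking: equal-height declination stripes cut into equal-width
    RA chunks. Illustrative only; not the real partitioner's scheme."""
    stripe = min(int((decl_deg + 90.0) / 180.0 * num_stripes), num_stripes - 1)
    chunk = min(int((ra_deg % 360.0) / 360.0 * chunks_per_stripe),
                chunks_per_stripe - 1)
    return stripe * chunks_per_stripe + chunk


def assign_by_deep_source(forced_sources, deep_source_positions):
    """Assign each forced source to the chunk of its parent deep source.

    forced_sources: iterable of dicts with (assumed) keys
        'deepForcedSourceId' and 'deepSourceId'.
    deep_source_positions: dict mapping deepSourceId -> (ra, decl) in degrees.
    Returns a dict mapping deepForcedSourceId -> chunk ID.
    """
    return {
        fs["deepForcedSourceId"]:
            toy_chunk_id(*deep_source_positions[fs["deepSourceId"]])
        for fs in forced_sources
    }
```

With this toy grid, two duplicates at ra = 9.99999 and ra = 10.00001 would land in different chunks if partitioned on their own positions, but both get the chunk of their shared deep source here.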

I’m not sure whether the deep forced source data includes the (ra, decl) of the deep source it was derived from.

If it does not, things will get “interesting”. It should be the case that duplicates have positions that are extremely close to one another. So the first thing to do would be to go ahead and partition with the deep forced source position anyway. To check that no duplicates were split across chunks, you’ll want to set up the partitioner such that each chunk contains exactly one sub-chunk, and such that the overlap radius is non-zero but small (let’s say an arc-minute). This way, the partitioner will split the input into chunks and, for each chunk, provide nearby rows (the overlap). If the two deep forced sources in a duplicate pair are assigned to different chunks, then one will be in the overlap of the chunk for the other, and vice-versa.
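
The reasoning behind that last sentence can be sketched in a few lines of Python. This is only an illustration: chunk_of stands for whatever function maps a position to a chunk (a hypothetical callable, not a partitioner API), and the one arc-minute radius is the value suggested above. A row is in a chunk's overlap when it lies outside the chunk but within the overlap radius of it, so if two duplicates in different chunks are closer together than the radius, each must be in the overlap of the other's chunk.

```python
import math

OVERLAP_RADIUS_DEG = 1.0 / 60.0  # one arc-minute, as suggested above


def angular_separation_deg(ra1, decl1, ra2, decl2):
    """Great-circle separation in degrees (haversine formula)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, decl1, ra2, decl2))
    h = (math.sin((d2 - d1) / 2.0) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def split_pair_is_detectable(a, b, chunk_of, radius_deg=OVERLAP_RADIUS_DEG):
    """True if positions a and b (each an (ra, decl) tuple in degrees) are in
    different chunks and close enough that each falls in the overlap of the
    other's chunk. chunk_of is a hypothetical position -> chunk ID callable."""
    if chunk_of(*a) == chunk_of(*b):
        return False  # not split, nothing to detect
    return angular_separation_deg(*a, *b) <= radius_deg
```

If duplicates could be further apart than the overlap radius, a split pair would go unnoticed, so the radius needs to be comfortably larger than the largest expected offset between duplicates.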

So, load all chunk and chunk overlap tables, and check for the existence of split duplicate pairs by testing whether equi-joining a chunk and its overlap on deep source ID yields any rows.
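
Something along these lines would do the check. It is a sketch only: it uses Python's sqlite3 module as a stand-in for the real database, and the table naming (a base name followed by _<chunkId> for the chunk table and FullOverlap_<chunkId> for its overlap table) is an assumption that you should adjust to whatever your loaded tables are actually called.

```python
import sqlite3


def find_split_pairs(conn, table, chunk_ids):
    """Equi-join each chunk table with its overlap table on deepSourceId.

    conn: a DB-API connection (sqlite3 here, as a stand-in).
    table: base table name; the chunk and overlap table names are built
        from it using an ASSUMED naming convention.
    Returns a list of (chunk ID, deepSourceId) for every deep source that
    has rows both inside the chunk and in its overlap.
    """
    split = []
    for cid in chunk_ids:
        query = (
            f"SELECT DISTINCT c.deepSourceId "
            f"FROM {table}_{cid} AS c "
            f"JOIN {table}FullOverlap_{cid} AS o "
            f"ON c.deepSourceId = o.deepSourceId "
            f"ORDER BY c.deepSourceId"
        )
        split.extend((cid, row[0]) for row in conn.execute(query))
    return split
```

An empty result for every chunk means no duplicate pair was split. Note that a split pair shows up twice, once for each of the two chunks involved.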

Hopefully you will not find any split pairs, in which case you can just drop all the overlap tables. But if you are unlucky, you’ll need to deal with the annoyance of picking a chunk for each split duplicate pair (I would just assign the pair to the chunk with the smaller ID), and adding/removing rows from chunks to reflect your decisions.
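
The smaller-ID rule is easy to express. Here is a minimal Python sketch that works on in-memory tuples rather than on the actual tables; the tuple layout is an assumption for illustration, and moving the rows between chunk tables is left out.

```python
def resolve_split_duplicates(rows):
    """Reassign every row to the smallest chunk ID seen for its deep source.

    rows: list of (row ID, deepSourceId, chunk ID) tuples (assumed layout).
    Returns a new list in the same order with corrected chunk IDs.
    """
    target = {}
    for _, deep_source_id, chunk in rows:
        # Keep the smallest chunk ID encountered for this deep source.
        target[deep_source_id] = min(chunk, target.get(deep_source_id, chunk))
    return [(row_id, dsid, target[dsid]) for row_id, dsid, _ in rows]
```

Rows whose chunk ID changes in the output are the ones that need to be removed from one chunk table and added to another.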

I’m happy to help if you run into any problems!

Cheers,
Serge

On Jan 16, 2014, at 2:23 PM, Fabrice Jammes <[log in to unmask]> wrote:

> Hello Serge,
> 
> Some interesting information is available here:
> https://dev.lsstcorp.org/trac/wiki/Summer2013/ConfigAndStackTestingPlans/DedupeForcedSources
> 
> Thanks for your offer to help with the partitioner; I may contact you soon.
> 
> Have a nice day,
> 
> Fabrice
> 
> ########################################################################
> Use REPLY-ALL to reply to list
> 
> To unsubscribe from the QSERV-L list, click the following link:
> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1
