Hello Serge,

Thanks for your answer, and sorry for the delay; I'm concurrently 
working on eupspkg.

More answers below:

On 01/20/2014 09:16 PM, Serge Monkewitz wrote:
>
> On Jan 17, 2014, at 6:01 AM, Fabrice Jammes <[log in to unmask] 
> <mailto:[log in to unmask]>> wrote:
>
>> On 01/17/2014 02:54 AM, Serge Monkewitz wrote:
>>> Hi Fabrice,
>>>
>>>     OK, I read through that page. I just want to point out that you 
>>> will have to be careful to partition based on the deep source 
>>> position, not the deep forced source positions. This is because 
>>> duplicate forced sources are not guaranteed to have identical (ra, 
>>> decl) coordinates. They will however be associated with the same 
>>> deep source (have identical deepSourceId values). Thus, to run the 
>>> deduplication procedure from the trac page on chunks, you'll need to 
>>> ensure that duplicates always end up in the same chunk.
>>>
>>> I’m not sure whether the deep forced source data includes the ra,dec 
>>> of the deep source it was derived from.
>> The RunDeepForcedSource table doesn't seem to contain this information.
>> In this table, the only fields that could contain this spatial 
>> information are:
>>
>>        cluster_id BIGINT NULL,
>>        cluster_coord_ra DOUBLE NULL,
>>        cluster_coord_decl DOUBLE NULL,
>
>> Bogdan, Dominique, or Philippe, could you please confirm that the 
>> RunDeepForcedSource table doesn't contain the ra,dec of the DeepSource 
>> entities it references?
>
> I think those are normally added by source association, and probably 
> aren’t populated here.
These columns contain data; I will check whether every RunDeepForcedSource 
row associated with a given DeepSource has the same values. If the values 
differ, it means they are not the DeepSource coordinates.
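To make the check concrete, here is a minimal sketch, with made-up sample rows standing in for an actual query against RunDeepForcedSource (the column names come from the DDL above):

```python
from collections import defaultdict

# Hypothetical sample rows: (deepSourceId, cluster_coord_ra, cluster_coord_decl).
# In practice these would be fetched from RunDeepForcedSource.
rows = [
    (1, 10.0, -5.0),
    (1, 10.0, -5.0),    # same coordinates: consistent with a DeepSource position
    (2, 20.0, 3.0),
    (2, 20.0001, 3.0),  # differing coordinates: not a DeepSource position
]

# Collect the distinct (ra, decl) values seen for each DeepSource.
coords_by_source = defaultdict(set)
for source_id, ra, decl in rows:
    coords_by_source[source_id].add((ra, decl))

# DeepSource ids whose forced sources disagree on (ra, decl).
inconsistent = sorted(sid for sid, c in coords_by_source.items() if len(c) > 1)
print(inconsistent)  # → [2]
```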
>
>>> If it does not, things will get “interesting”. It should be the case 
>>> that duplicates have positions that are extremely close to one 
>>> another. So the first thing to do would be to go ahead and partition 
>>> with the deep forced source position anyway. To check that no 
>>> duplicates were split across chunks, you’ll want to set up the 
>>> partitioner such that each chunk contains exactly one sub-chunk, and 
>>> such that the overlap radius is non-zero but small (let’s say an 
>>> arc-minute). This way, the partitioner will split input into chunks, 
>>> and, for each chunk, provide nearby rows (the overlap). If the two 
>>> deep forced sources in a duplicate pair are assigned to different 
>>> chunks, then one will be in the overlap of the chunk for the other, 
>>> and vice-versa.
>> OK, very good idea; I fully agree with it and will rely on it.
>> One question: can a set of duplicate RunDeepForcedSource entities have 
>> a cardinality greater than 2?
>
> I don’t think so. IIRC the pairs arise from pixel overlap between 
> adjacent coadd “tiles” (I’m not sure what the correct terminology is 
> here) of the stripe 82 coadd; those were laid out in a single row 
> meaning that at most 2 “tiles” cover any particular position (i.e. 
> deep source).
Yes, you're right. Bogdan explained to me that images from the same run 
only overlap on their "right" and "left" sides, and that runs are 
horizontal stripes. So at most two duplicate DeepForcedSource entities 
can be related to one DeepSource within a run.
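As a sanity check on your proposal, a split pair can be detected from chunk membership plus the overlap rows. A rough sketch, with hypothetical ids and chunk contents:

```python
# Hypothetical chunk layout: each chunk lists the forced-source ids assigned
# to it, plus the nearby ids the partitioner placed in its overlap region.
chunks = {
    "chunk_A": {"rows": {101, 201, 202}, "overlap": {102}},
    "chunk_B": {"rows": {102}, "overlap": {101}},
}
# Duplicate pairs (members share a deepSourceId); the ids are made up.
pairs = [(101, 102), (201, 202)]

def pair_split_across_chunks(a, b):
    """A pair is split iff its members landed in different chunks; with a
    small non-zero overlap radius, each member then appears in the other's
    overlap, which is what we look for."""
    for chunk in chunks.values():
        rows = chunk["rows"]
        if a in rows and b in rows:
            return False  # both assigned to the same chunk: not split
        if (a in rows and b in chunk["overlap"]) or \
           (b in rows and a in chunk["overlap"]):
            return True
    return False

print([p for p in pairs if pair_split_across_chunks(*p)])  # → [(101, 102)]
```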
>
>> Maybe we could tell scientists that, for entities placed near the 
>> border of a chunk, related entities like the DeepSource of a 
>> DeepForcedSource could be placed in the nearest neighboring chunk?
>> Furthermore, Osman plans to use the MySQL MERGE engine to manage chunk 
>> unions; that should solve this side effect (except at the border of 
>> the zone covered by the union of contiguous chunks), shouldn't it?
>> This would give scientists consistent data over a significant area 
>> (except at its border), which could be enough for this first test.
>> Don't you think so?
>
> For a first test I think this is perfectly fine.
>
>> If not, I think two new questions arise:
>> - Couldn't we assign the whole pair to the chunk containing the 
>> DeepSource entity the pair references? I think this would allow 
>> spatial queries on the DeepSource (ra,dec) to be consistent.
>
> You certainly could. I’m not sure what you mean by consistent spatial 
> queries though - the chunking of RunDeepForcedSource serves only to 
> bring each chunk down to a manageable size and would otherwise be 
> hidden via the MERGE engine. I was thinking that DeepSource would be 
> loaded as a single table, since it is so small.
>
>> - If the user issues a spatial query on the DeepForcedSource 
>> (ra,dec), they could lose an element of a pair if it was placed in a 
>> contiguous chunk (the one containing the other element of the pair).
>> Is this a problem for scientists? I don't know.
>
> It may or may not be, however scientists would have to account for 
> this even with an unpartitioned table. If that's a concern, we could 
> measure the maximum angular separation between elements of a pair, and 
> tell scientists to expand their search regions by that amount.
>
>> In the future, Qserv may solve these questions, since it manages 
>> chunk overlaps. Maybe, for now, we could let the MERGE engine do the 
>> job? It seems this would lead to consistent data everywhere except on 
>> the border of the merged chunks' union.
>
> Once you’ve assigned split pairs to chunks, I would discard the 
> overlap tables completely, and use the MERGE engine to combine the 
> chunks. I think this avoids any problems, but I’m not sure what you 
> mean by consistent data here, and perhaps I am missing something. Can 
> you elaborate?

I agree with you.
By "consistent" I meant that queries on the (ra,decl) of a 
DeepForcedSource and queries on a deepSourceId return correct results. 
This implies that no matching DeepForcedSource is contained in a chunk 
other than the one used by the query.
Using the MERGE engine, or loading several chunks into the same table, 
will ensure this.
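On your earlier suggestion of measuring the maximum angular separation between the members of a pair (so scientists know how much to expand their search regions), here is a small sketch of that measurement, with hypothetical pair positions:

```python
import math

def angular_sep_arcsec(ra1, decl1, ra2, decl2):
    """Angular separation between two (ra, decl) points in degrees, returned
    in arcseconds, using the haversine formula (numerically stable for the
    tiny separations expected between members of a duplicate pair)."""
    ra1, decl1, ra2, decl2 = map(math.radians, (ra1, decl1, ra2, decl2))
    sin_ddecl = math.sin((decl2 - decl1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddecl ** 2 + math.cos(decl1) * math.cos(decl2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(math.sqrt(a))) * 3600.0

# Hypothetical (ra, decl) positions for two duplicate pairs.
pairs = [((10.0, -5.0), (10.0001, -5.0)), ((20.0, 3.0), (20.0, 3.0002))]
max_sep = max(angular_sep_arcsec(*p1, *p2) for (p1, p2) in pairs)
print(round(max_sep, 3))  # maximum separation in arcseconds
```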

Note that the data must respect this definition of consistency in order 
to set the "is_primary" flag correctly. We have to be careful if this 
flag is set before aggregating chunks; in that case we should use the 
techniques you proposed to put all the DeepForcedSource entities related 
to one DeepSource in the same chunk.
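To sketch why the aggregation order matters: if "is_primary" is chosen per duplicate set only once all chunks are visible together, exactly one member of each set gets the flag. The ids below are made up, and taking the lowest id is an arbitrary but deterministic choice:

```python
from collections import defaultdict

# Hypothetical forced sources: (forcedSourceId, deepSourceId).
# 101 and 102 are a duplicate pair; 201 has no duplicate.
forced = [(101, 1), (102, 1), (201, 2)]

# Group forced sources by the DeepSource they reference.
by_deep = defaultdict(list)
for fid, did in forced:
    by_deep[did].append(fid)

# Mark exactly one member of each duplicate set as primary.
# If this ran per chunk on a split pair, both members could be marked.
is_primary = {fid: fid == min(fids)
              for fids in by_deep.values() for fid in fids}
print(is_primary)  # → {101: True, 102: False, 201: True}
```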

I hope I'm clear enough...

Cheers,

Fabrice

>
> Cheers,
> Serge
>


########################################################################
Use REPLY-ALL to reply to list

To unsubscribe from the QSERV-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1