Jacek, (+Daniel)

On Oct 25, 2013, at 4:54 PM, Jacek Becla wrote:

> In that case, I think we will need to use ra/decl of these sources,
> (our scheme has always been that the location of object drives
> partitioning for all its sources), refMatch is the only exception.
> This will complicate partitioner. I captured that requirement
> on trac at:
>
> https://dev.lsstcorp.org/trac/wiki/db/Qserv/Partitioning
>
> Serge - please keep that in mind when improving/redesigning
> partitioner.

Don't worry. This is a minor detail from the POV of the partitioner.
Recall that we already ran into this very situation with the
association pipeline prototype, in that the source clustering
algorithm it employs doesn't necessarily assign every Source to an
Object. In that case, I ended up setting the objectId of the Source
to NULL, and copying the source ra/dec to the objectRa,objectDec
columns, which is equivalent in spirit to what you propose above.

One major piece that is missing from the new partitioner is that it
currently expects to see a partitioning position in every record
(not an ID). In other words, you currently have to feed it
denormalized data. That's partly because I figured it should be easy
enough for the pipelines to denormalize for me, and partly because I
was too lazy to implement an external index mapping from object ID
to partitioning position (or chunk/sub-chunk ID).

However, the (not too) long term plan is to lift this requirement.
As it turns out, Daniel's got a very nice sounding plan for
secondary indexes (and I believe he's on the way to a working
implementation) that the master will eventually use to look up the
chunk/sub-chunk ID for an object ID. As soon as his work on that
lands, I'll wire that into the partitioner, or modify it to suit my
needs (*), and denormalization prior to partitioning will no longer
be necessary.
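To make the lookup described above concrete, here is a minimal sketch of how a partitioner could resolve the partitioning position of a Source from an external index instead of requiring denormalized input. Everything in it is an assumption for illustration: the function name, the dict-based index, and the record layout are hypothetical, and it is not the planned secondary index implementation. It assumes the index maps object ID to (RA, Dec), and it falls back to the Source's own position when objectId is NULL.

```python
from typing import Dict, Optional, Tuple

# (ra, dec) in degrees. Hypothetical type alias, for illustration only.
Position = Tuple[float, float]


def partitioning_position(source: dict,
                          object_index: Dict[int, Position]) -> Position:
    """Return the position that decides which chunk a Source lands in.

    Assumption: `object_index` maps objectId -> (ra, dec) of the Object.
    A Source with no associated Object (objectId is None) is partitioned
    by its own ra/dec.
    """
    object_id: Optional[int] = source.get("objectId")
    if object_id is None:
        # Unassociated Source: use its own coordinates.
        return (source["ra"], source["dec"])
    # Associated Source: the Object's location drives partitioning.
    return object_index[object_id]
```

With an index like `{1: (10.0, -5.0)}`, a Source with `objectId` 1 is placed at the Object's position, while a Source with `objectId` None keeps its own ra/dec.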
Serge

(*) Daniel - in case you are reading along, my thinking here is that
there are some not strictly partitioning related computations we'll
want to perform prior to being able to serve up a table.
Specifically, we'll want to know the maximum angular separation
between each coordinate pair in the table and the partitioning
coordinates. Given a spatially restricted query against a
non-partitioning coordinate pair, we can then constrain which
(sub-)chunks must be looked at to satisfy the query by padding the
search region with this maximum angular separation, and then looking
at only those (sub-)chunks that overlap this padded region. If the
object ID index used by the partitioner is from object ID to
(RA, Dec) rather than directly to (sub-)chunk ID, then I can easily
roll this computation right into partitioning.

Another reason the partitioner might prefer a mapping to (RA, Dec)
is that it would allow adding overlap to tables other than Object.
I do realize this is not currently in the baseline, but maybe it's
worth keeping that option open.

> On 10/25/2013 04:37 PM, Kian-Tat Lim wrote:
>> Jacek,
>>
>> One thing that came up: Mario and Robert think that we may have
>> Sources that cannot be associated with Objects or DiaObjects (that
>> will become Objects) or SSObjects. These would most likely be
>> noise, so they don't want to create new Objects or, worse,
>> ForcedSources for them. So some Sources might have NULL for both
>> objectId and ssObjectId. This doesn't change the current schema,
>> but it's something we should look out for.
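For reference, the padding computation from the (*) footnote can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the search region is assumed to be a circle, and the separation is computed with the haversine formula. By the triangle inequality, any record whose non-partitioning position lies within `radius` of the search center has a partitioning position within `radius + max_sep` of it, so only (sub-)chunks overlapping the padded circle need to be examined.

```python
import math
from typing import Iterable, Tuple


def angular_separation(ra1: float, dec1: float,
                       ra2: float, dec2: float) -> float:
    """Angular separation in degrees between two (ra, dec) points,
    given in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to guard against rounding slightly above 1.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def max_separation(rows: Iterable[Tuple[float, float, float, float]]) -> float:
    """Maximum separation over rows of (ra, dec, part_ra, part_dec),
    where (part_ra, part_dec) is the partitioning position.
    Returns 0.0 for an empty table."""
    return max((angular_separation(*row) for row in rows), default=0.0)


def padded_radius(radius: float, max_sep: float) -> float:
    """Radius of the padded circular search region, capped at 180 degrees
    (a circle of that radius already covers the whole sky)."""
    return min(180.0, radius + max_sep)
```

The maximum separation would be computed once per coordinate pair while partitioning the table, and the padded radius would then be used at query time when selecting (sub-)chunks.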