Print

Print


Hello,

On 09/04/2013 04:48 PM, Serge Monkewitz wrote:
>
>> Some other notes:
>> - I added LSST__Source (a copy of LSST__Object) to qservMeta. I 
>> probably could have soft-linked it to LSST__Object.*, but for now I 
>> just made a copy off the objectId index table.
>> - The fact that the qserv master doesn't recognize Source.objectId as 
>> special and look up a chunk for it in the objectId index table seems 
>> like a bug.
>
> So, it turns out that the current qserv master code sub-chunks the 
> above join (Daniel - is that expected?).
The join code thinks that there are always benefits to subchunk a 
subchunked table when it is being joined. It is stupid, but at least it 
provides correct results under some circumstances. I added a TODO to 
think about this, because it doesn't seem like something that we can 
design and define in our heads as we write code. You might consider the 
join predicate (hello, USING() and ON() syntax) and what columns are 
indexed (hello, metadata) or approximate sizes of chunk and subchunk 
tables. Sometimes it looks like query analyzer and optimizer problem. It 
looks easy to get wrong.

The code is only starting to realize that there is more than one table 
that might endure objectId lookups. It only recently allowed for the 
column to be named something other than "objectId". You can call it a 
bug. It is in the same class as "not implemented". The whole idea of 
indexing a database by one column needs some design discussion--for me, 
no details have been hashed out other than the 'we really need this 
master or boss table thing' hallway comment.

> In particular, I see stuff like this in the worker logs on ccqserv27 
> for chunk 12833, which contains  the objectId in question:
>
> Runner running Task: msg: session=19 chunk=12833 db=LSST frag: 
> q=SELECT s.ra,s.decl FROM LSST.Source_12833 AS 
> s,Subchunks_LSST_12833.Object_12833_0 AS o WHERE 
> o.objectId=154062577146538502 AND s.objectId=o.objectId,...
>
> But I also see the same for a bunch of different chunks, and the same 
> query shows up on other workers. This is what I would call a missed 
> opportunity by the master-side query generation logic. But the only 
> ill effect is that execution isn't as fast as it could be.
One of many. Should we gather a list? Could be useful for FDR (list of 
known ways to improve performance beyond our benchmarks).


 > If I understand things correctly, this means that the Object 
sub-chunks created on the fly by qserv workers
 > will always be empty, so that Source/Object join and Object 
near-neighbor queries will always come back
 > with nothing on the IN2P3 cluster. Something must  have gone 
seriously wrong with duplication or
 > partitioning of the Object table.

I think it would be useful to come up with a set of sanity-check queries 
to run on a worker. The sanity check probably involves the qms, and 
would be run as a post-load check (not required by qserv worker query 
execution code).
e.g.
SELECT count(chunkId), count(subChunkId), count(*) from <chunked table>
then make sure all are equal.
SELECT count(*) from <chunked table> where chunkId <> <chunk#>
then make sure count is 0.

-Daniel

########################################################################
Use REPLY-ALL to reply to list

To unsubscribe from the QSERV-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1