Hello Qserv clan,

On my run, I was thinking about the subchunking and overlap problem, and 
the dec-stripe solution, and I think I've come up with an in-between 
solution.

One of the reasons that we just stored overlap, and then queried by 
joining against subchunk+overlap, was that we didn't want to compute the 
exact overlap region for each query, for each subchunk. How about this:

* store subchunks, but not overlap.
* query against subchunk + adjacent subchunks. Effectively, this gives 
us overlap ~= subchunk width.
* computing this may have some noticeable cost if the czar has to do it 
(full scan of 10k chunks, each with 200 subchunks = computing adjacency 
for 2 million subchunks), so maybe we can push it to the workers, 
where it can be computed almost for free (actually, we can cache it on 
each worker).
* subchunks are on-the-fly, so we skip computing overlap subchunks 
completely, and build half as many temp tables as before.
* Work around subchunks from adjacent chunks by storing them under 
virtual subchunk numbers.
* This requires somewhat more code and complexity, but also eliminates 
the previous overlap management, so the net cost/complexity increase is 
low(?).
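
A minimal sketch of the adjacency computation from the list above. It assumes, purely for illustration, that each chunk is a flat rectangular grid of subchunks numbered row-major, and that neighbors falling outside the chunk get a virtual number above the real range. The grid shape, the numbering scheme, and every function name are hypothetical, not Qserv's actual scheme, and the sketch ignores spherical geometry (RA wrap-around, poles).

```python
from functools import lru_cache

# Hypothetical layout: 10 rows x 20 columns = 200 subchunks per chunk.
ROWS, COLS = 10, 20


def real_id(row, col):
    """Row-major number of a subchunk inside the chunk."""
    return row * COLS + col


def virtual_id(row, col):
    """Number for a subchunk just outside the chunk (it belongs to an
    adjacent chunk). The chunk is embedded in a padded
    (ROWS + 2) x (COLS + 2) grid, and virtual numbers start after the
    real range so the two can never collide."""
    return ROWS * COLS + (row + 1) * (COLS + 2) + (col + 1)


@lru_cache(maxsize=None)  # cached, as a worker might do
def adjacent_subchunks(subchunk):
    """Return the 8 neighbors of a subchunk, using virtual numbers for
    neighbors that fall outside this chunk."""
    row, col = divmod(subchunk, COLS)
    neighbors = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the subchunk itself
            r, c = row + dr, col + dc
            if 0 <= r < ROWS and 0 <= c < COLS:
                neighbors.append(real_id(r, c))
            else:
                neighbors.append(virtual_id(r, c))
    return tuple(neighbors)
```

Under these assumptions the table depends only on the grid shape, so a worker would compute at most 200 entries once and reuse them for every chunk, instead of the czar computing 10,000 x 200 = 2,000,000 entries per full scan.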


Another, different idea: we build subchunk tables on the fly because the 
MySQL optimizer is too stupid to use a "subchunkid=X" condition in the 
WHERE clause to its fullest effect. Did we try coaxing it by using a 
subquery?

i.e.,
SELECT o1.blah, o2.blah
FROM (SELECT ... FROM Object_N WHERE subchunkid = X) AS o1,
     (SELECT ... FROM Object_N WHERE subchunkid = X) AS o2
WHERE ...
instead of
SELECT o1.blah, o2.blah
FROM Object_N AS o1, Object_N AS o2
WHERE o1.subchunkid = X AND o2.subchunkid = X AND ...;
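
To make the two query shapes concrete, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL. It only checks that both forms return the same rows on toy data; it says nothing about how the MySQL optimizer treats them. The table contents, the objectId and ra columns, and the near-neighbor predicate are all made up for illustration.

```python
import sqlite3

# In-memory toy table standing in for one chunk table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Object_N (objectId INTEGER, subchunkid INTEGER, ra REAL)"
)
conn.executemany(
    "INSERT INTO Object_N VALUES (?, ?, ?)",
    [(1, 7, 10.00), (2, 7, 10.01), (3, 7, 10.50), (4, 8, 10.02), (5, 8, 10.03)],
)

# Form 1: restrict each side to the subchunk in a subquery, then join.
SUBQUERY_FORM = """
SELECT o1.objectId, o2.objectId
FROM (SELECT objectId, ra FROM Object_N WHERE subchunkid = :x) AS o1,
     (SELECT objectId, ra FROM Object_N WHERE subchunkid = :x) AS o2
WHERE o1.objectId < o2.objectId AND ABS(o1.ra - o2.ra) < 0.05
ORDER BY 1, 2
"""

# Form 2: plain self-join with the subchunk condition in the WHERE clause.
PLAIN_FORM = """
SELECT o1.objectId, o2.objectId
FROM Object_N AS o1, Object_N AS o2
WHERE o1.subchunkid = :x AND o2.subchunkid = :x
  AND o1.objectId < o2.objectId AND ABS(o1.ra - o2.ra) < 0.05
ORDER BY 1, 2
"""

rows_sub = conn.execute(SUBQUERY_FORM, {"x": 7}).fetchall()
rows_plain = conn.execute(PLAIN_FORM, {"x": 7}).fetchall()
```

On a real worker, running MySQL's EXPLAIN on each form would show whether the optimizer actually plans them differently.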

Have a great weekend, and thanks for the great week!
-Daniel
