Hello Qserv clan,
On my run, I was thinking about the subchunking and overlap problem, and
the dec-stripe solution, and I think I've come up with an in-between
solution.
One of the reasons that we stored overlap and then queried by joining
against subchunk+overlap was that we didn't want to compute the exact
overlap region for each query, for each subchunk. How about this:
* store subchunks, but not overlap.
* query against subchunk + adjacent subchunks. Effectively, this gives
us overlap ~= subchunk width.
* computing this may have some noticeable cost if the czar has to do it
(full scan of 10k chunks each with 200 subchunks = adjacency computation
for 2 million subchunks), so maybe we can push it to the workers,
where it can be computed almost for free (actually, we can cache it on
each worker).
* subchunks are built on the fly, so we skip the computation of overlap
subchunks completely, and build half as many temp tables as before
* Work around subchunks that belong to adjacent chunks by storing them
under virtual subchunk numbers.
* This requires somewhat more code and complexity, but also eliminates
the previous overlap management, so the net cost/complexity increase is
low(?).
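To make the adjacency idea concrete, here is a minimal Python sketch. It
assumes a flat rectangular grid of 10 x 20 = 200 subchunks per chunk and
ignores spherical geometry entirely. The names (ROWS, COLS, virtual_id,
neighbors) and the padded-grid numbering are hypothetical, chosen only for
illustration; they are not taken from the Qserv code.

```python
from functools import lru_cache

# Assumed layout: each chunk is a flat ROWS x COLS grid of subchunks.
ROWS, COLS = 10, 20  # 10 * 20 = 200 subchunks per chunk


def virtual_id(row, col):
    """Number a cell in a padded (ROWS + 2) x (COLS + 2) grid.

    Cells with 0 <= row < ROWS and 0 <= col < COLS are the chunk's own
    subchunks. The one-cell border stands for subchunks that are owned
    by adjacent chunks (the hypothetical "virtual subchunk numbers").
    """
    return (row + 1) * (COLS + 2) + (col + 1)


def is_foreign(row, col):
    """True if the cell lies in the border, i.e. in an adjacent chunk."""
    return not (0 <= row < ROWS and 0 <= col < COLS)


@lru_cache(maxsize=None)  # stands in for a per-worker cache
def neighbors(row, col):
    """Return the virtual ids of the 8 cells surrounding (row, col)."""
    if is_foreign(row, col):
        raise ValueError("(row, col) must be one of the chunk's own subchunks")
    ids = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the subchunk itself
            ids.append(virtual_id(row + dr, col + dc))
    return tuple(sorted(ids))
```

Under these assumptions the adjacency pattern is identical for every
chunk, so a worker computes at most 200 entries once and reuses them,
rather than the czar recomputing them for all 2 million subchunks.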
Another, separate idea: We build subchunk tables on the fly because the
MySQL optimizer is too stupid to use a "subchunkid=X" condition in the
WHERE clause to its fullest effect. Did we try coaxing it by using a
subquery?
i.e.,
  SELECT o1.blah, o2.blah
  FROM (SELECT ... FROM Object_N WHERE subchunkid = X) AS o1,
       (SELECT ... FROM Object_N WHERE subchunkid = X) AS o2
  WHERE ...;
instead of
  SELECT o1.blah, o2.blah
  FROM Object_N AS o1, Object_N AS o2
  WHERE o1.subchunkid = X AND o2.subchunkid = X AND ...;
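As a quick sanity check that the two forms select the same pairs, here is
a small Python sketch using an in-memory SQLite table. The columns
(objectId, ra), the sample rows, and the join condition
o1.objectId < o2.objectId are made up for illustration. It only checks
that the results match; it says nothing about how the MySQL optimizer
plans either form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for a chunk table; the columns are assumptions.
conn.execute(
    "CREATE TABLE Object_N (objectId INTEGER, subchunkid INTEGER, ra REAL)"
)
rows = [(1, 7, 10.0), (2, 7, 10.1), (3, 7, 10.2), (4, 8, 11.0), (5, 8, 11.1)]
conn.executemany("INSERT INTO Object_N VALUES (?, ?, ?)", rows)

X = 7  # subchunk to self-join

# Form 1: restrict each side to the subchunk inside a subquery.
subquery_form = """
SELECT o1.objectId, o2.objectId
FROM (SELECT objectId, ra FROM Object_N WHERE subchunkid = ?) AS o1,
     (SELECT objectId, ra FROM Object_N WHERE subchunkid = ?) AS o2
WHERE o1.objectId < o2.objectId
ORDER BY o1.objectId, o2.objectId
"""

# Form 2: join the full table to itself and filter in the WHERE clause.
plain_form = """
SELECT o1.objectId, o2.objectId
FROM Object_N AS o1, Object_N AS o2
WHERE o1.subchunkid = ? AND o2.subchunkid = ?
  AND o1.objectId < o2.objectId
ORDER BY o1.objectId, o2.objectId
"""

pairs_subquery = conn.execute(subquery_form, (X, X)).fetchall()
pairs_plain = conn.execute(plain_form, (X, X)).fetchall()
```

To see whether the subquery form actually helps, one would run EXPLAIN on
both statements in MySQL against a real chunk table and compare the plans.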
Have a great weekend, and thanks for the great week!
-Daniel