On 09/04/2013 09:00 PM, Kian-Tat Lim wrote:
> Daniel,
>
>> The join code assumes there is always a benefit to subchunking a
>> subchunked table when it is being joined. It is stupid, but at least
>> it provides correct results under some circumstances. I added a TODO
>> to think about this, because it doesn't seem like something that we
>> can design and define in our heads as we write code. You might
>> consider the join predicate (hello, USING() and ON() syntax),
>> which columns are indexed (hello, metadata), or the approximate sizes
>> of the chunk and subchunk tables. Sometimes it looks like a query
>> analyzer and optimizer problem. It looks easy to get wrong.
> 	I don't think we should be trying to outsmart MySQL (or the DBA)
> here.
I agree that we shouldn't try to outsmart MySQL. But we already are,
explicitly, for spatial self-joins, because those are disastrous without
subchunking. Right now, there is no code to detect equi-joins. There is
also no (well-understood) code to support more than a two-way spatial join.

> I believe the only queries for which subchunking is definitively
> better are spatially-limited self-joins, which hopefully are relatively
> easy to recognize.  All other queries should just be passed through as
> is (at least until subchunking is proven to help them).
You have to know that you can pass them through. This might be simple,
but it hasn't been thought out. It would also be nice to know when a
query can't be executed accurately. These things haven't been defined
precisely, and perhaps I'm really dense, but the first cut at the logic
is broken, and I felt there was hidden, dangerous complexity lurking
underneath. Still, I haven't revisited it since the new parse framework
went in, so maybe it's much simpler now. Dare to dream.
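
To make the question concrete, here is a rough sketch of the decision being discussed: subchunk only for spatially-limited self-joins, pass through what is known to be safe, and flag everything else. It is not the existing join code. JoinInfo, Plan, and choose_plan are hypothetical names, the fields are things the parser would have to supply, and treating equi-joins as safe to pass through is an assumption that has not been verified.

```python
from dataclasses import dataclass
from enum import Enum


class Plan(Enum):
    SUBCHUNK = "subchunk"          # rewrite the join against subchunk tables
    PASS_THROUGH = "pass-through"  # hand the join to MySQL unchanged
    UNDECIDED = "undecided"        # not known to be safe either way


@dataclass(frozen=True)
class JoinInfo:
    # Hypothetical summary of a two-way join; every field is an assumption
    # about what the query analysis could report.
    left_table: str
    right_table: str
    subchunked_tables: frozenset   # names of tables that have subchunks
    has_spatial_limit: bool        # e.g. a distance restriction between sides
    is_equi_join: bool             # detection of this does not exist yet


def choose_plan(join: JoinInfo) -> Plan:
    """Pick a plan for a two-way join (sketch only)."""
    is_self_join = join.left_table == join.right_table
    is_subchunked = join.left_table in join.subchunked_tables

    # The one case where subchunking is believed to be definitively better.
    if is_self_join and is_subchunked and join.has_spatial_limit:
        return Plan.SUBCHUNK

    # ASSUMPTION: an equi-join can be passed through as is.
    if join.is_equi_join:
        return Plan.PASS_THROUGH

    # Everything else: we cannot yet say whether it executes accurately.
    return Plan.UNDECIDED
```

For example, choose_plan(JoinInfo("Object", "Object", frozenset({"Object"}), True, False)) returns Plan.SUBCHUNK, while a non-spatial, non-equi join falls into UNDECIDED, which is the case that still needs a precise definition.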

-Daniel
