Ok, we're on the same page.
Having a real test case should help a lot in finding good
optimizations for this interesting problem.
The in2p3 cluster can of course be used to run this kind of experiment,
and I'm interested in contributing ;-)
Cheers,
Fabrice
On 02/26/2015 04:08 PM, Daniel L. Wang wrote:
>> Using the secondary index (built using the Object table) to compute
>> the chunk of a given Source from its objectId column would avoid
>> this join, wouldn't it?
>> For example, if a source i has its objectId field equal to j, then we
>> can query the secondary index on objectId=j to get the chunk of the
>> source. This should work.
>> Of course we have to build the secondary index prior to this operation.
> This is effectively a join, no? I'm not suggesting sending a SQL join
> query into the czar's normal pipeline. But looking up chunkId with the
> secondary index is an index-only join. I think we might still want to
> create a smaller lookup table for each batch of child table rows,
> depending on how fast we can make the full index lookups: Is it faster
> to do 10 million full-index lookups (on disk), or to do 100k full-index
> lookups, build a 100k-entry hash table, and do 10 million lookups on
> the in-memory table? I don't know yet.
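
To make the two options in the question above concrete, here is a minimal Python sketch. It is not Qserv code: CountingIndex, assign_direct and assign_batched are hypothetical names, and a plain dict stands in for the on-disk secondary index (objectId -> chunkId). It only counts how many full-index lookups each strategy issues; which one is actually faster still has to be measured on real data.

```python
from typing import Dict, Iterable, List, Tuple


class CountingIndex:
    """Hypothetical stand-in for the on-disk secondary index (objectId -> chunkId)."""

    def __init__(self, mapping: Dict[int, int]) -> None:
        self._mapping = dict(mapping)
        self.lookups = 0  # number of "expensive" full-index lookups issued

    def chunk_of(self, object_id: int) -> int:
        self.lookups += 1
        return self._mapping[object_id]


def assign_direct(sources: Iterable[Tuple[int, int]],
                  index: CountingIndex) -> List[Tuple[int, int]]:
    """One full-index lookup per child (Source) row."""
    return [(source_id, index.chunk_of(object_id))
            for source_id, object_id in sources]


def assign_batched(sources: Iterable[Tuple[int, int]],
                   index: CountingIndex) -> List[Tuple[int, int]]:
    """One full-index lookup per distinct objectId in the batch,
    then in-memory hash lookups for every child row."""
    rows = list(sources)
    distinct_ids = {object_id for _, object_id in rows}
    # Small per-batch lookup table built from the full index.
    local = {object_id: index.chunk_of(object_id) for object_id in distinct_ids}
    return [(source_id, local[object_id]) for source_id, object_id in rows]
```

Both functions take (sourceId, objectId) pairs and return (sourceId, chunkId) pairs, so their results can be compared directly. The batched variant only pays off when a batch holds many sources per object, as in the 10 million versus 100k figures above.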