OK, we're on the same page. Having a real test case should help a lot in
finding good optimizations for this interesting problem.
The in2p3 cluster can of course be used to run this kind of experiment, and
of course I'm interested in contributing ;-)

Cheers,

Fabrice

On 02/26/2015 04:08 PM, Daniel L. Wang wrote:
>> Using the secondary index (built using Object table) to compute the
>> chunk of a given Source w.r.t. its objectId column would avoid this
>> join, wouldn't it?
>> For example, if a source i has objectId field equal to j, then we can
>> query the secondary index on objectId=j to get the chunk of the
>> source, this should work.
>> Of course we have to build the secondary index prior to this operation.
> This is effectively a join, no? I'm not suggesting sending a SQL join
> query into the czar's normal pipeline. But looking up chunkId with the
> secondary index is an index-only join. I think we might still want to
> create a smaller lookup table for each batch of child table rows,
> depending on how fast we can make the full index lookups: Is it faster
> to do 10 million full-index lookups (on disk), or 100k full-index
> lookups, create a 100k hash table, and do 10 million lookups on the
> in-memory table? I don't know yet.
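
To make the two options in the quoted question concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not Qserv code: `CountingIndex`, `chunk_of`, `chunks_direct` and `chunks_batched` are hypothetical names, and a plain dict stands in for the on-disk secondary index, so the sketch only counts full-index lookups and says nothing about their real cost.

```python
from typing import Dict, Iterable, List, Tuple

# (sourceId, objectId) pairs for one batch of child (Source) rows.
SourceRow = Tuple[int, int]


class CountingIndex:
    """Hypothetical stand-in for the objectId -> chunkId secondary index.

    A real index would live on disk; here a dict is used and each access
    is counted so the two strategies can be compared.
    """

    def __init__(self, mapping: Dict[int, int]) -> None:
        self._mapping = mapping
        self.lookups = 0  # number of full-index lookups performed so far

    def chunk_of(self, object_id: int) -> int:
        self.lookups += 1
        return self._mapping[object_id]


def chunks_direct(rows: Iterable[SourceRow], index: CountingIndex) -> List[Tuple[int, int]]:
    """One full-index lookup per Source row."""
    return [(source_id, index.chunk_of(object_id)) for source_id, object_id in rows]


def chunks_batched(rows: Iterable[SourceRow], index: CountingIndex) -> List[Tuple[int, int]]:
    """One full-index lookup per distinct objectId in the batch, then
    resolve every row against a small in-memory hash table."""
    batch = list(rows)
    distinct_ids = {object_id for _, object_id in batch}
    local = {object_id: index.chunk_of(object_id) for object_id in distinct_ids}
    return [(source_id, local[object_id]) for source_id, object_id in batch]
```

Both functions return the same (sourceId, chunkId) pairs. The batched variant only reduces full-index lookups when many Source rows in a batch share an objectId, so whether it wins in practice still depends on measuring the real index lookup cost.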