Thanks, everybody, for answering my questions. I'll restrict
index generation to the director table only, then, and will open
a new ticket for that.

Cheers,
Andy


Becla, Jacek wrote on 2014-12-19:
> Yes, the plan always was to build a "secondary index"
> (basically a mapping from objectId to chunkId and subChunkId)
> *only for the Object table* (or, the director table, speaking
> in general terms).
> 
> It is perfectly fine to restrict secondary index to
> a single column, but we should
> a) document it,
> b) catch when someone tries multi-column, and gracefully fail
> 
> Jacek
> 
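A minimal sketch of the two points above (index only the director
table, fail gracefully on a multi-column key), in Python. The names
index_column, IndexConfigError and the is_director flag are
hypothetical and are not taken from the data loader:

```python
class IndexConfigError(ValueError):
    """Raised when a table's key cannot be used for the secondary index."""


def index_column(table_name, pk_columns, is_director):
    """Return the single column to index, or None if no index is built.

    Hypothetical helper for illustration only; the real data loader
    may structure this check differently.
    """
    if not is_director:
        # Non-director partitioned tables get no secondary index.
        return None
    if len(pk_columns) != 1:
        # Fail with a clear message instead of breaking later on.
        raise IndexConfigError(
            "secondary index for director table %r needs exactly one "
            "key column, got %d (%s)"
            % (table_name, len(pk_columns), ", ".join(pk_columns))
        )
    return pk_columns[0]
```

With this, index_column("Object", ["objectId"], True) returns
"objectId", a non-director table returns None, and a director table
with a two-column key raises IndexConfigError.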
> 
> 
> On 12/19/2014 12:34 PM, Serge Monkewitz wrote:
>> Hi Andy,
>> 
>> On Dec 19, 2014, at 12:09 PM, Salnikov, Andrei A.
>> <[log in to unmask]> wrote:
>> 
>>> we just talked with Fabrice about one of the problems that new
>>> data loader created for Fabrice and we need to understand how
>>> to fix it. The essence of it is that data loader tries to create
>>> index table in qservMeta for every partitioned table. This breaks
>>> when the table's primary key has more than one column because
>>> the data loader only supports a one-column PK.
>> 
>> By index, do you mean the PK -> chunkId secondary index? If so, I think
>> we only plan to provide this for director tables. For a table like Source,
>> or the AvgForcedPhotYearly table Fabrice was trying to load, I don't think
>> such an index is expected. (Daniel, please correct me if I'm wrong).
>> 
>>> To understand how to fix it I'd like to get answers to a few
>>> questions:
>>> - Do we need an index for every partitioned table? If not then
>>>   we should add a parameter to config file which disables index
>>>   generation for specific tables. Or do we need index only for
>>>   director table?
>> 
>> I think the answer is only for directors.
>> 
>>> - Do we need an index for tables which have multi-column PK, will
>>>   qserv even support this? If yes then index table needs to have
>>>   the same PK columns as the original table. If not then I can
>>>   just skip generating index for those problematic tables.
>> 
>> For now, I believe we should stick with single column (integer) PKs for
>> directors. Supporting multi-column PKs is of course doable, but it would
>> complicate query analysis. We'd have to look for multiple equality
>> predicate parse tree nodes ANDed together (or worse, split over multiple
>> ON clauses) to identify equijoins.
>> 
>>> Related questions:
>>> - should duplicator support multi-column PK in the future?
>>>   I guess 'id' option in config file needs to be a list in this
>>>   case.
>> 
>> My 2 cents: please no. The duplicator is intended as a stop-gap way to
>> generate lots of data for testing, and is already complicated enough. If I
>> cannot assume that the director and table PKs are 64 bit integers, then
>> generating unique IDs for duplicated records becomes much harder, and in
>> some cases impossible.
>> 
>> I also don't think that we get a lot of value out of the effort that
>> would be needed to do this, but others may disagree.
>> 
>>> - what is the official name for the qservMeta index? I think
>>>   "secondary index" is mentioned, but this does not make too
>>>   much sense to me.
>> 
>> That's a question for Daniel. I think of (chunkId, subChunkId) as the
>> primary way of looking things up in the Qserv system, so it makes sense to
>> me that an index on director PK be "secondary".
>> 
>> Cheers,
>> Serge
>> ########################################################################
>> Use REPLY-ALL to reply to list
>> 
>> To unsubscribe from the QSERV-L list, click the following link:
>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1
>>