Thanks everybody for answering my questions. I'll restrict index
generation to the director table only then, and will make a new ticket
for that.

Cheers,
Andy

Becla, Jacek wrote on 2014-12-19:
> Yes, the plan always was to build a "secondary index"
> (basically a mapping from objectId to chunkId and subChunkId)
> *only for the Object table* (or, the director table, speaking
> in general terms).
>
> It is perfectly fine to restrict secondary index to
> a single column, but we should
>  a) document it,
>  b) catch when someone tries multi-column, and gracefully fail
>
> Jacek
>
> On 12/19/2014 12:34 PM, Serge Monkewitz wrote:
>> Hi Andy,
>>
>> On Dec 19, 2014, at 12:09 PM, Salnikov, Andrei A. wrote:
>>
>>> we just talked with Fabrice about one of the problems that new
>>> data loader created for Fabrice and we need to understand how
>>> to fix it. The essence of it is that data loader tries to create
>>> index table in qservMeta for every partitioned table. This breaks
>>> when the table's primary key has more than one column because
>>> data loader only supports one-column PK.
>>
>> By index, do you mean the PK -> chunkId secondary index? If so, I
>> think we only plan to provide this for director tables. For a table
>> like Source, or the AvgForcedPhotYearly table Fabrice was trying to
>> load, I don't think such an index is expected. (Daniel, please
>> correct me if I'm wrong).
>>
>>> To understand how to fix it I'd like to get an answer to a few
>>> questions:
>>> - Do we need an index for every partitioned table? If not then
>>>   we should add a parameter to config file which disables index
>>>   generation for specific tables. Or do we need index only for
>>>   director table?
>>
>> I think the answer is only for directors.
>>
>>> - Do we need an index for tables which have multi-column PK, will
>>>   qserv even support this? If yes then index table needs to have
>>>   the same PK columns as the original table.
>>>   If not then I can just skip generating index for those
>>>   problematic tables.
>>
>> For now, I believe we should stick with single column (integer) PKs
>> for directors. Supporting multi-column PKs is of course doable, but
>> it would complicate query analysis. We'd have to look for multiple
>> equality predicate parse tree nodes ANDed together (or worse, split
>> over multiple ON clauses) to identify equijoins.
>>
>>> Related questions:
>>> - should duplicator support multi-column PK in the future?
>>>   I guess 'id' option in config file needs to be a list in this
>>>   case.
>>
>> My 2 cents: please no. The duplicator is intended as a stop-gap way
>> to generate lots of data for testing, and is already complicated
>> enough. If I cannot assume that the director and table PKs are 64 bit
>> integers, then generating unique IDs for duplicated records becomes
>> much harder, and in some cases impossible.
>>
>> I also don't think that we get a lot of value out of the effort that
>> would be needed to do this, but others may disagree.
>>
>>> - what is the official name for the qservMeta index, I think
>>>   "secondary index" is mentioned, but this does not make too
>>>   much sense to me.
>>
>> That's a question for Daniel. I think of (chunkId, subChunkId) as the
>> primary way of looking things up in the Qserv system, so it makes
>> sense to me that an index on director PK be "secondary".
>>
>> Cheers,
>> Serge
>> ########################################################################
>> Use REPLY-ALL to reply to list
>>
>> To unsubscribe from the QSERV-L list, click the following link:
>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1
>> ########################################################################
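
The outcome of this thread (build the secondary index only for the
director table, require a single-column key, and fail gracefully on a
multi-column key) can be sketched roughly as below. This is a minimal
illustration in Python, not the data loader's actual code: the table
description dict, its "is_director" and "key_columns" fields, and every
function and class name here are hypothetical and chosen only for the
example. The chunkId and subChunkId row fields follow the names used in
the thread.

```python
from typing import Any, Dict, Iterable, Mapping, Optional, Sequence, Tuple

# Key value (e.g. objectId) -> (chunkId, subChunkId), as described in the thread.
SecondaryIndex = Dict[int, Tuple[int, int]]


class MultiColumnKeyError(ValueError):
    """Raised when a director table declares more than one key column."""


def single_key_column(table: str, key_columns: Sequence[str]) -> str:
    """Return the only key column, or fail with a readable message."""
    if len(key_columns) != 1:
        raise MultiColumnKeyError(
            f"table {table!r}: secondary index needs exactly one key column, "
            f"got {list(key_columns)}"
        )
    return key_columns[0]


def build_secondary_index(
    table_cfg: Mapping[str, Any],
    rows: Iterable[Mapping[str, int]],
) -> Optional[SecondaryIndex]:
    """Build the index for a director table; return None for any other table."""
    if not table_cfg.get("is_director", False):
        return None  # non-director tables (e.g. Source) get no index
    # Validate before touching any rows so a bad config fails early.
    key = single_key_column(table_cfg["name"], table_cfg["key_columns"])
    return {row[key]: (row["chunkId"], row["subChunkId"]) for row in rows}


# Hypothetical usage with made-up rows for a director table named Object.
object_cfg = {"name": "Object", "is_director": True, "key_columns": ["objectId"]}
object_rows = [
    {"objectId": 1, "chunkId": 10, "subChunkId": 3},
    {"objectId": 2, "chunkId": 11, "subChunkId": 0},
]
index = build_secondary_index(object_cfg, object_rows)
```

A real loader would presumably read the director flag and key column
from its configuration file and write the mapping to an index table in
qservMeta rather than to an in-memory dict; the dict only keeps the
sketch self-contained.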