On Jun 10, 2014, at 6:33 PM, Kian-Tat Lim <[log in to unmask]> wrote:

> Serge and Daniel had a conversation on HipChat today about secondary
> indexes.  It sounds like this is meant to handle indexes in addition to
> objectId that map from something to chunk number.  I'm concerned because
> adding indexing on the czar adds another layer, another copy of data,
> more (shared) czar state, and another place for things to get out of
> sync and go wrong.
> 
> Is there some way for us to lower the overhead of issuing queries to
> chunks so that we can just use "normal" local per-chunk indexes instead
> of a central index?

If there were, we could just use it for objectId queries as well. This is tempting to me, except that I think the partitioner needs something like a central “objectId” -> chunk mapping unless we force people to supply an associated object position for all partitioned tables. And if we need it for the partitioner, we might as well take advantage of it on the czar. Once we are there, it does not seem like a big stretch to allow the same sort of index on columns other than the PK of the director table. Still, I’d personally be quite happy to reduce scope.
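
To make the trade-off concrete, here is a toy Python sketch of the two dispatch strategies. It is not Qserv code: the chunk layout, the names `CHUNKS`, `build_central_index`, `chunks_to_query` and `lookup`, and the in-memory dicts are all made up for illustration. The dict lookup inside each chunk stands in for a "normal" local per-chunk index.

```python
from typing import Dict, List, Optional

# Hypothetical toy data: chunkId -> {objectId: row}. Not Qserv's real layout.
CHUNKS: Dict[int, Dict[int, str]] = {
    101: {1: "row-1", 2: "row-2"},
    102: {3: "row-3"},
    103: {4: "row-4", 5: "row-5"},
}


def build_central_index(chunks: Dict[int, Dict[int, str]]) -> Dict[int, int]:
    """Build an objectId -> chunkId map, as a partitioner might while loading."""
    return {oid: cid for cid, rows in chunks.items() for oid in rows}


def chunks_to_query(object_id: int,
                    index: Optional[Dict[int, int]] = None) -> List[int]:
    """With a central index, contact at most one chunk; without, all of them."""
    if index is None:
        return sorted(CHUNKS)  # broadcast to every chunk
    cid = index.get(object_id)
    return [] if cid is None else [cid]


def lookup(object_id: int,
           index: Optional[Dict[int, int]] = None) -> Optional[str]:
    """Find a row, relying on each chunk's local index (a dict lookup here)."""
    for cid in chunks_to_query(object_id, index):
        row = CHUNKS[cid].get(object_id)
        if row is not None:
            return row
    return None


if __name__ == "__main__":
    index = build_central_index(CHUNKS)
    print(chunks_to_query(4, index))  # one chunk contacted
    print(chunks_to_query(4))         # every chunk contacted
```

Both paths return the same row; the difference is how many chunks get a query, which is why the per-query dispatch overhead decides whether the central copy is worth its extra state.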

I don’t really know how much we can reduce overhead for issuing queries - that’s more of a Daniel/AndyH question. I think the new async xrootd client work and result marshaling rewrite should provide nice gains. We could maybe also look at something like sending multiple short queries per chunk to improve throughput without hurting latency too much. But that has its own complexity cost, and I am worried that broadcasting large numbers of short queries might interfere with shared scans.
