Jacek,

I'm happy to discuss it. FYI, the system already accepts multiple
emptyChunk files (one per db), but it defaults to emptyChunks.txt if
no per-database file is found.
Have a look at populateEmptyChunkInfo() in metadata.py.
Also, note that the czar log should show a message saying which file it
was looking for. Sorry, the location is not configurable: it will always
look in $CWD/empty_<dbname>.txt. At the time, I was hoping we could
eliminate the empty chunks file with a lightweight secondary index, but
that got shelved because I didn't have a good solution for handling updates.
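
A minimal Python sketch of the lookup described above, assuming the empty-chunk file holds integer chunk ids separated by whitespace. The names `resolve_empty_chunk_file`, `load_empty_chunks`, the `czar` logger, and the `default_path` parameter are hypothetical illustrations, not the code in populateEmptyChunkInfo():

```python
import logging
import tempfile
from pathlib import Path

log = logging.getLogger("czar")  # hypothetical logger name


def resolve_empty_chunk_file(db_name, default_path, cwd=None):
    """Pick the empty-chunk file for db_name.

    Prefers <cwd>/empty_<db_name>.txt and falls back to default_path
    (the shared emptyChunks.txt) when no per-database file exists.
    """
    base = Path(cwd) if cwd is not None else Path.cwd()
    per_db = base / f"empty_{db_name}.txt"
    chosen = per_db if per_db.is_file() else Path(default_path)
    # Log the chosen file so a stale or unexpected file is easy to spot.
    log.info("empty chunk file for %s: %s", db_name, chosen)
    return chosen


def load_empty_chunks(path):
    """Read chunk ids, assuming whitespace-separated integers."""
    return {int(tok) for tok in Path(path).read_text().split()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        Path(d, "empty_LSST.txt").write_text("3\n7\n")
        lsst_file = resolve_empty_chunk_file("LSST", "emptyChunks.txt", d)
        print(sorted(load_empty_chunks(lsst_file)))  # [3, 7]
        print(resolve_empty_chunk_file("Other", "emptyChunks.txt", d))  # emptyChunks.txt
```

Because the fallback is silent apart from the log line, a database with no per-db file quietly inherits whatever the shared default contains.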

-Daniel

On 05/16/2014 01:03 PM, Becla, Jacek wrote:
> Daniel
>
> I feel like it'd be good to get to the bottom of it this FY;
> at minimum, we should allow multiple emptyChunk lists to
> coexist (one per database). I'll add it to the list of
> to-discuss topics for next Wed.
>
> Thanks,
> Jacek
>
> On 05/16/2014 12:29 PM, Wang, Daniel Liwei wrote:
>> Ideally, we would rebuild it upon any change to the dirTable. There was
>> code in indexing.py (I think?) that was a placeholder attempt at
>> generating it.
>>
>> There are several ways of generating it:
>> 1) From the range of chunk numbers between the min and max chunk
>> number, filter out chunks known to be non-empty because a
>> dirtable_NNN table exists for them. This is what my scripts did, and
>> it was a hassle to get working.
>>
>> 2) Run a special all-chunks query (count(*) or similar), but don't
>> suppress errors: add the chunks that fail to the empty chunks list.
>>
>> 3) Populate the empty chunks file when you create a database, or when a
>> czar becomes aware that a database exists. Every time you load data, you
>> know which chunks you are creating, so remove those chunks from the empty
>> chunks file/list. The czar always checks the db entry in CSS to see
>> whether its empty chunks file is out of date. Partitioned rows cannot be
>> deleted, so a chunk never becomes empty again once it is non-empty.
>>
>> 4) Compute a non-empty chunk list from the secondary index: run
>> select distinct chunkId from blah (will go obsolete soon), build
>> a hash table or std::map from the result, and use it. The czar can
>> compute and cache this the first time the db is accessed.
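Approaches 1, 3 and 4 above mostly reduce to set arithmetic over chunk ids. A minimal Python sketch, assuming chunk tables are named `<dirTable>_<chunkId>` (e.g. `Object_3`) and that valid chunk ids form the contiguous range from the min to the max chunk number. All function and class names are hypothetical; the database access (listing tables, running the select distinct query against the placeholder table) is left out, and approach 2 is skipped because it needs a live query. This is not the code from indexing.py:

```python
import re


def nonempty_from_tables(table_names, dir_table):
    """Approach 1: chunk N is non-empty if a table named <dir_table>_N exists."""
    pattern = re.compile(rf"^{re.escape(dir_table)}_(\d+)$")
    matches = (pattern.match(name) for name in table_names)
    return {int(m.group(1)) for m in matches if m}


def nonempty_from_rows(rows):
    """Approach 4: rows are the result of a SELECT DISTINCT chunkId query."""
    return {int(row[0]) for row in rows}


def empty_chunks(min_chunk, max_chunk, nonempty):
    """Every chunk id in [min_chunk, max_chunk] not known to be non-empty."""
    return set(range(min_chunk, max_chunk + 1)) - set(nonempty)


class EmptyChunkTracker:
    """Approach 3: start with all chunks empty and shrink the set on each load.

    Partitioned rows cannot be deleted, so a chunk never becomes empty
    again; the tracker therefore only ever removes ids.
    """

    def __init__(self, all_chunks):
        self.empty = set(all_chunks)

    def record_load(self, loaded_chunks):
        # Chunks touched by a load are non-empty from now on.
        self.empty.difference_update(loaded_chunks)

    def is_empty(self, chunk):
        return chunk in self.empty


if __name__ == "__main__":
    tables = ["Object_3", "Object_5", "Source_3", "Object"]
    print(sorted(empty_chunks(1, 6, nonempty_from_tables(tables, "Object"))))  # [1, 2, 4, 6]

    tracker = EmptyChunkTracker(range(1, 7))
    tracker.record_load([2, 3])
    print(sorted(tracker.empty))  # [1, 4, 5, 6]
```

With approach 4 the czar need not materialize the empty list at all: testing membership in the non-empty set (the hash table or std::map) answers the same question.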
>>
>> I know, it's annoying.
>> -Daniel
>>
>>
>> On 05/16/2014 11:59 AM, Jacek Becla wrote:
>>> Daniel
>>>
>>> Can you remind me what the plans are with regard to
>>> emptyChunk.txt? Will it be going away any time soon?
>>> It can still lead to a lot of confusion: I had a fully
>>> working environment based on the PT1.1 data set, but after
>>> I ran qserv-testdata, it silently overwrote the version
>>> I had (in build/dist/etc), and as a result I started
>>> getting an error:
>>>
>>> Table 'LSST.Object_1234567890' doesn't exist
>>>
>>> which I did not connect with a bad emptyChunk.txt for
>>> quite some time.
>>>
>>> Thanks,
>>> Jacek
>>>
>>> ########################################################################
>>> Use REPLY-ALL to reply to list
>>>
>>> To unsubscribe from the QSERV-L list, click the following link:
>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1
