On Feb 5, 2016, at 1:52 PM, Gates, John H <[log in to unmask]> wrote:

Hi Jacek,
I was looking through LDM-135 and came across a couple of things.
This is the only scan that specifically mentions the previous data release. I'm not sure why that one is singled out. It also looks like we aren't making any promises at all about other scans for older releases.
- one synchronized full table scan of Object_Narrow and Object_Extra every 8 hours for the latest and previous data releases.
I'm not sure why a node needs enough memory to hold the processed chunk and the next chunk. I would think it only needs to have enough memory to deal with the chunk it is working with right now, assuming this is per shared scan.
- For self-joins, a single shared scan will be sufficient; however, each node must have sufficient memory to hold 2 chunks at any given time (the chunk being processed and the next chunk). Refer to the sizing model [LDM-141] for further details on the cost of shared scans.
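For what it's worth, the two-chunk requirement reads like a double-buffering pattern: the next chunk is prefetched while the current one is being processed, so both are resident at once. A toy sketch of that pattern (not Qserv's actual implementation; load_chunk and process are stand-ins):

```python
from concurrent.futures import ThreadPoolExecutor

def load_chunk(chunk_id):
    # Stand-in for reading one chunk's tables into memory.
    return {"id": chunk_id, "rows": list(range(3))}

def process(chunk):
    # Stand-in for executing the self-join query on one chunk.
    return chunk["id"]

def shared_scan(chunk_ids):
    """Process chunks in order, prefetching the next chunk while the
    current one is being processed. At most two chunks (current plus
    next) are resident at any given time, matching the quoted sizing
    assumption."""
    if not chunk_ids:
        return []
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(load_chunk, chunk_ids[0])
        for i in range(len(chunk_ids)):
            current = future.result()  # chunk being worked on now
            if i + 1 < len(chunk_ids):
                # Prefetch the next chunk in the background.
                future = pool.submit(load_chunk, chunk_ids[i + 1])
            results.append(process(current))
    return results

print(shared_scan([10, 11, 12]))  # -> [10, 11, 12]
```

If the worker processes a chunk strictly before loading the next, one chunk of memory would indeed suffice, which may be the question worth raising.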
I also had a couple of thoughts on the sizing model and shared scans. It looks like we're heading for one cluster per release, which is simple but makes joins across releases difficult. If, on the other hand, it is decided to put all releases on the same cluster and place the same chunks for all releases on one node, the worker scheduler can be modified fairly easily to order scans by chunkId and then by release number. Provided the number of worker nodes and czars increases with each added release, it should work. Joins on Object_Extra between releases could require a fair bit of memory, but if one release is underutilized, the other queries should be faster.
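The ordering change I have in mind is small; a minimal sketch (task list and release names are hypothetical) of ordering scans by chunkId, then release, so every release's copy of a chunk is scanned back-to-back while that chunk is hot:

```python
# Hypothetical pending scan tasks across two co-located data releases,
# given as (chunk_id, release) pairs in arrival order.
tasks = [(5, "DR2"), (3, "DR1"), (5, "DR1"), (3, "DR2")]

# Order by chunkId first, then release number, so all releases' scans
# of the same chunk run consecutively.
ordered = sorted(tasks, key=lambda t: (t[0], t[1]))
print(ordered)  # [(3, 'DR1'), (3, 'DR2'), (5, 'DR1'), (5, 'DR2')]
```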
-John