Nate,

Not today; we were talking about a qserv hangout tomorrow (Wed) at noon.

Jacek

On 12/01/2015 09:40 AM, Pease, Nathan wrote:
> it sounds like there is another meeting on this today; is there a hangout for it? I'd like to join remotely if possible.
>
> thanks,
> nate
> ________________________________________
> From: [log in to unmask] <[log in to unmask]> on behalf of Andrew Hanushevsky <[log in to unmask]>
> Sent: Tuesday, December 1, 2015 8:52 AM
> To: Becla, Jacek
> Cc: qserv-l
> Subject: Re: [QSERV-L] Fwd: Re: shared scan
>
> OK. I mean you are right, there is no point in duplicating the effort. I
> felt that a clean self-contained implementation would be more maintainable
> in the long run. I will probably call at noon.
>
> Andy
>
> On Tue, 1 Dec 2015, Jacek Becla wrote:
>
>> [sending to the list]
>>
>> Andy,
>>
>> We sort of realized you were thinking about implementing the scheduler
>> all on your side; that is why I asked so prominently :), there is no
>> point in wasting efforts. Our perhaps naive reaction was "but the
>> design of the existing ScanScheduler is not too bad, maybe we should
>> at least reuse it?" But I think we can be talked out of it! If you
>> call in tomorrow maybe we can discuss all this a bit.
>>
>> Thanks
>> Jacek
>>
>> -------- Forwarded Message --------
>> Subject: Re: shared scan
>> Date: Tue, 1 Dec 2015 02:23:21 -0800
>> From: Andrew Hanushevsky <[log in to unmask]>
>> To: Becla, Jacek <[log in to unmask]>
>> CC: Gates, John H <[log in to unmask]>, Mueller, Fritz
>> <[log in to unmask]>
>>
>> Hi All,
>>
>> Well, it may seem that we are working at cross-purposes. I was under the
>> impression that the new (proposed) shared scan scheduler would replace what
>> is in qserv at the moment. So, it would seem to me that devoting a lot of
>> time to further improving what's there would be misplaced. I am proposing a
>> rather self-contained interface.
The shared scan scheduler works on all of
>> the worker nodes, trying to maximize the use of locked memory while
>> minimizing the amount that is locked. That is not an easy task. If you
>> look closely at AddQuery() it should be apparent that the scheduler wants
>> to know which tables a query will need to access and whether those tables
>> need to be locked. The tables you pass into Create() can optionally be
>> locked at the front (and I would assume that at least the Object table
>> would always be locked -- otherwise it doesn't make much sense
>> resource-wise). Anyway, it would be good to straighten all of this out. I
>> am not in favor of further massaging the existing code.
>>
>> Andy
>>
>> On Tue, 1 Dec 2015, Becla, Jacek wrote:
>>
>>> John: thanks for writing this up.
>>>
>>> Andy, see below.
>>>
>>> The #1 question is: on which side of the fence are we doing scheduling?
>>> Your side, or the Qserv side (ScanScheduler)? If yours, why?
>>>
>>> I inserted more comments below.
>>>
>>>> On Nov 30, 2015, at 3:41 PM, Gates, John H <[log in to unmask]>
>>>> wrote:
>>>>
>>>> Jacek, Fritz,
>>>>
>>>> Please look this over and see if I missed anything or got something wrong.
>>>>
>>>> Hi Andy,
>>>>
>>>> We (Jacek, Fritz, Nate, and John) had a discussion about the scan
>>>> scheduler today. We'd like to know a bit more about what you have in mind,
>>>> and let you know what we already have.
>>>>
>>>> There is currently a scan scheduler (wsched::ScanScheduler). UserQueries
>>>> are broken into TaskMsgs by the czar and sent to the worker, which turns
>>>> them into Tasks. The Tasks are given to the BlendScheduler, which gives
>>>> any Tasks with scantable_size > 0 to the ScanScheduler. The czar does all
>>>> of the query analysis at this time.
>>>>
>>>> The ScanScheduler has an active heap and a pending heap, both of which are
>>>> minimum value heaps. It tracks the chunk id that is currently being read in from
>>>> disk (lastChunk).
If a new Task with a chunk id higher than lastChunk
>>>> is added, it goes on the active heap. If it is less than or equal to
>>>> lastChunk, it goes on the pending heap. Once the active heap is empty, the
>>>> pending heap is swapped with the active heap and lastChunk is set
>>>> equal to the top element of the new active heap.
>>>>
>>>> The ScanScheduler is currently allowed to advance to the next chunk id as
>>>> soon as ANY query on lastChunk finishes. This is pretty naive and will
>>>> need to change. The current ScanScheduler is concerned with disk i/o and
>>>> not concerned about memory constraints. Changing this is simply a matter
>>>> of changing the _ready() function in the ScanScheduler so that Tasks can
>>>> be started only when enough memory is available, or some other criteria
>>>> are met.
>>>>
>>>> Scanning always goes by chunk id. There are not separate schedulers for
>>>> Source tables and Object tables.
>>>> Scan scheduling will need to consider how much memory is available and the
>>>> size of the files that would need to be locked.
>>>> There are currently 4 different scans that will probably each need their
>>>> own scheduler:
>>>> Object                             1hr per full scan
>>>> Object joined with Source          8hr per full scan
>>>> Object joined with Forced Source   8hr per full scan
>>>> Object joined with Object_Extra    12hr per full scan
>>>> For each one, the appropriate tables need to be locked: the "Object"
>>>> scheduler would only lock the Object table files for its current chunk id
>>>> in memory, while the "Object joined with Source" scheduler would lock the
>>>> Object and Source tables for its current chunk id.
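The active/pending heap ordering described above can be sketched as follows. This is a minimal illustration only, not the actual wsched::ScanScheduler code; the class and member names here are placeholders, and only the chunk-id logic from the description is modeled.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Stand-in for the real Task; only the chunk id matters for ordering.
struct Task { int chunkId; };

struct ChunkGreater {
    bool operator()(Task const& a, Task const& b) const { return a.chunkId > b.chunkId; }
};

// Min-heap keyed on chunk id.
using MinHeap = std::priority_queue<Task, std::vector<Task>, ChunkGreater>;

class TwoHeapScheduler {
public:
    // Tasks at or behind the current disk position wait for the next pass.
    void add(Task t) {
        if (t.chunkId > _lastChunk) _active.push(t);
        else                        _pending.push(t);
    }

    // Pop the next Task in scan order; when the active heap drains,
    // swap in the pending heap and restart from its lowest chunk id.
    Task next() {
        if (_active.empty()) std::swap(_active, _pending);
        Task t = _active.top();
        _active.pop();
        _lastChunk = t.chunkId;
        return t;
    }

    bool empty() const { return _active.empty() && _pending.empty(); }

private:
    int _lastChunk = -1;  // chunk id currently being read from disk
    MinHeap _active;
    MinHeap _pending;
};
```

A Task arriving with a chunk id behind the scan position goes to the pending heap and is only picked up after the current pass over the active heap completes.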
>>>
>>> Note that these are just the core production tables; there will be many more.
>>> Object will be vertically partitioned into several tables, and there will be
>>> many level 3 user tables.
>>>
>>>> Looking at this, it might be better to go with schedulers that run at
>>>> expected rates (1hr/full_scan, 8hr/full_scan, 12hr/full_scan) and have
>>>> flags indicating which tables they want to use. The problem is that the
>>>> number of permutations of joins gets out of hand quickly. It would be
>>>> simple to rank them by chunk id and then group them by which tables are
>>>> needed. (Are there Source-table-only queries? Object_Extra-table-only
>>>> queries? Object, Object_Extra, and Source?)
>>>>
>>>> It might be desirable to have the Object scheduler be able to identify
>>>> slow Tasks and take all Tasks for that UserQuery and move them to the
>>>> Object joined with Source scheduler, so they don't bog down the Object
>>>> scheduler. This would require a unique user query id or something
>>>> similar.
>>>>
>>>> I don't think this would be difficult to do with the current
>>>> BlendScheduler and ScanScheduler. They already contain code to limit the
>>>> number of threads spawned by any scheduler type, and the values
>>>> controlling their limits are easy to change at a high level in the code.
>>>> It's pretty easy to have multiple schedulers and switch between them at
>>>> compile time (or at program start-up if we really want to). Thoughts?
>>>>
>>>> The table sizes should be something like (first-year size -> size
>>>> after 10 years):
>>>> Object         1x
>>>> ObjExtra       10x
>>>> Forced Source  1x -> 10x
>>>> Source         5x -> 40x
>>>>
>>>> Should we do anything for tables required for the query that don't need to
>>>> be locked?
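The slow-query migration idea a few paragraphs up (pull every Task of a lagging UserQuery off the fast Object scheduler and hand them to a slower one) essentially needs the unique user query id mentioned there. A hypothetical sketch of that one operation; none of these names are from the Qserv code:

```cpp
#include <cassert>
#include <list>
#include <vector>

// Each Task carries the id of the UserQuery it came from, so all Tasks
// of one lagging query can be evicted from a scheduler in a single pass.
struct ScanTask {
    int queryId;   // hypothetical unique user-query id
    int chunkId;
};

// Remove every Task belonging to `queryId` from the fast scheduler's
// queue and return them, ready to hand to a slower scheduler.
std::vector<ScanTask> evictQuery(std::list<ScanTask>& fastQueue, int queryId) {
    std::vector<ScanTask> moved;
    for (auto it = fastQueue.begin(); it != fastQueue.end(); ) {
        if (it->queryId == queryId) {
            moved.push_back(*it);
            it = fastQueue.erase(it);
        } else {
            ++it;
        }
    }
    return moved;
}
```

The point of the sketch is only that the migration is trivial once Tasks carry a query id; without one, there is no way to find "all Tasks for that UserQuery".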
>>>
>>> Do we need to pass to your functions the tables that do not need to
>>> be locked as part of shared scans? Say we have a query:
>>>
>>> SELECT <whatever>
>>> FROM Object o
>>> JOIN Source s ON (o.objectId = s.objectId)
>>> JOIN Filter f ON (s.filterId = f.filterId)
>>> WHERE f.filterName = 'r'
>>>
>>> The table Filter is tiny (6 rows) and there is no need to lock it; should
>>> we still pass it? No?
>>> I guess not, but the docs need clarification.
>>>
>>>> We need to ask Mario:
>>>> - Will we have queries that want to see sources and forced sources?
>>>> - Joining between data releases - do we need to handle all DRs' data
>>>> through the same qserv instance?
>>>
>>> I'm going to ask Mario.
>>>
>>>> For scheduling to work, we will need some information available. This will
>>>> need to be part of the interface.
>>>> - Which tables are locked in memory?
>>>> - How many Tasks are using a particular table locked in memory? (Free them
>>>> by reference counting?)
>>>> - How much memory have we locked up?
>>>> - What's the most memory we should have locked up?
>>>> - Before a table is locked in memory, how much room is it likely to take?
>>>>
>>>> Note that the GroupScheduler is working through its own Tasks. Its Tasks
>>>> only involve a couple of chunks, but it still needs some memory to work
>>>> with.
>>>>
>>>> Concerns/clarifications about anything above?
>>>>
>>>> What are the arguments for having your code do the scheduling?
>>>>
>>>> What are the important details of the file locking?
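The bookkeeping questions in the list above (which tables are locked, how many Tasks use each, total bytes pinned, a ceiling on pinned memory) fit a reference-counted lock registry. A hypothetical sketch under that assumption; it is not any actual Qserv or XRootD interface, and real code would do the mlock/munlock where this only tracks counts:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical registry answering the interface questions above:
// what is locked, how many Tasks hold each table, and how many bytes
// are pinned against a fixed budget. Tables unlock when the last
// referencing Task releases them.
class LockRegistry {
public:
    explicit LockRegistry(std::size_t maxBytes) : _maxBytes(maxBytes) {}

    // Lock a table on behalf of one more Task; fails (without locking)
    // if pinning it would exceed the memory budget.
    bool acquire(std::string const& table, std::size_t bytes) {
        auto it = _locked.find(table);
        if (it != _locked.end()) { ++it->second.refs; return true; }  // already pinned
        if (_lockedBytes + bytes > _maxBytes) return false;
        _locked[table] = {1, bytes};
        _lockedBytes += bytes;
        return true;
    }

    // A Task is done with the table; unpin when the last reference drops.
    void release(std::string const& table) {
        auto it = _locked.find(table);
        if (it == _locked.end()) return;
        if (--it->second.refs == 0) {
            _lockedBytes -= it->second.bytes;
            _locked.erase(it);
        }
    }

    std::size_t lockedBytes() const { return _lockedBytes; }
    bool isLocked(std::string const& table) const { return _locked.count(table) != 0; }

private:
    struct Entry { int refs; std::size_t bytes; };
    std::map<std::string, Entry> _locked;
    std::size_t _lockedBytes = 0;
    std::size_t _maxBytes;
};
```

A scheduler's readiness check (e.g. the _ready() change John mentions) could then amount to "does acquire() succeed for every table this Task must lock".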
>>>>
>>>> Thanks,
>>>> John
>>
>> ########################################################################
>> Use REPLY-ALL to reply to list
>>
>> To unsubscribe from the QSERV-L list, click the following link:
>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1