QSERV-L Archives

QSERV-L@LISTSERV.SLAC.STANFORD.EDU

QSERV-L, December 2015

Subject: Fwd: Re: shared scan
From: Jacek Becla <[log in to unmask]>
Reply-To: General discussion for qserv (LSST prototype baseline catalog)
Date: Tue, 1 Dec 2015 08:30:23 -0800
Content-Type: text/plain
Parts/Attachments: text/plain (156 lines)

[sending to the list]

Andy,

We had sort of realized you were thinking about implementing the scheduler
all on your side; that is why I asked so prominently :). There is no
point in wasting effort. Our perhaps naive reaction was "but the
design of the existing ScanScheduler is not too bad, maybe we should
at least reuse it?" But I think we can be talked out of it! If you
call in tomorrow, maybe we can discuss all of this a bit.

Thanks
Jacek






-------- Forwarded Message --------
Subject: Re: shared scan
Date: Tue, 1 Dec 2015 02:23:21 -0800
From: Andrew Hanushevsky <[log in to unmask]>
To: Becla, Jacek <[log in to unmask]>
CC: Gates, John H <[log in to unmask]>, Mueller, Fritz 
<[log in to unmask]>

Hi All,

Well, it may seem that we are working at cross-purposes. I was under the
impression that the new (proposed) shared scan scheduler would replace what
is in qserv at the moment. So it would seem to me that devoting a lot of
time to further improving what's there would be misplaced. I am proposing a
rather self-contained interface. The shared scan scheduler works across all of
the worker nodes, trying to maximize the use of locked memory while
minimizing the amount that is locked. That is not an easy task. If you
look closely at AddQuery(), it should be apparent that the scheduler wants
to know which tables a query will need to access and whether those tables
need to be locked. The tables you pass into Create() can optionally be
locked up front (and I would assume that at least the Object table
would always be locked; otherwise it doesn't make much sense
resource-wise). Anyway, it would be good to straighten all of this out. I
am not in favor of further massaging the existing code.
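
As a very rough illustration of the shape of such an interface (only the
names AddQuery() and Create() come from the paragraph above; the signatures
and types below are guesses, not the actual proposal):

    // Hypothetical sketch only: AddQuery() and Create() are the names
    // discussed in this thread; everything else is invented for illustration.
    #include <string>
    #include <vector>

    struct TableRef {
        std::string name;  // e.g. "Object", "Source"
        bool lock;         // lock this table's chunk files in memory?
    };

    class SharedScanScheduler {
    public:
        // Build a scheduler over a set of tables; tables flagged for locking
        // can be locked up front (presumably at least the Object table).
        static SharedScanScheduler* Create(std::vector<TableRef> const& tables);

        // Register a query: the scheduler wants to know which tables the
        // query will access and whether those tables need to be locked.
        virtual int AddQuery(std::vector<TableRef> const& tablesUsed) = 0;

        virtual ~SharedScanScheduler() = default;
    };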

Andy

On Tue, 1 Dec 2015, Becla, Jacek wrote:

> John: thanks for writing this up.
>
> Andy, see below.
>
> The #1 question is, on which side of the fence are we doing scheduling?
> Your side, or the Qserv side (ScanScheduler)? If yours, why?
>
> I inserted more comments below
>
>
>> On Nov 30, 2015, at 3:41 PM, Gates, John H <[log in to unmask]> wrote:
>>
>> Jacek, Fritz,
>>
>> Please look this over and see if I missed anything or got something wrong.
>>
>>
>>
>> Hi Andy,
>>
>> We (Jacek, Fritz, Nate, and John) had a discussion about the scan scheduler today. We'd like to know a bit more about what you have in mind, and let you know what we already have.
>>
>> There is currently a scan scheduler (wsched::ScanScheduler). UserQueries are broken into TaskMsgs by the czar and sent to the worker, which turns them into Tasks. The Tasks are given to the BlendScheduler, which gives any Tasks with scantable_size > 0 to the ScanScheduler. The czar does all of the query analysis at this time.
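>>
>> (Roughly, that hand-off looks like the sketch below; the types and method names here are simplified stand-ins, not the real Qserv classes.)
>>
>> // Simplified sketch of the hand-off described above.
>> struct Task { int chunkId = 0; int scantableSize = 0; };  // built on the worker from a TaskMsg
>>
>> struct ScanScheduler  { int queued = 0; void queue(Task const&) { ++queued; } };  // shared-scan path
>> struct GroupScheduler { int queued = 0; void queue(Task const&) { ++queued; } };  // interactive path
>>
>> struct BlendScheduler {
>>     ScanScheduler  scan;
>>     GroupScheduler group;
>>     // Tasks that reference scan tables go to the ScanScheduler,
>>     // everything else to the GroupScheduler.
>>     void queue(Task const& t) {
>>         if (t.scantableSize > 0) scan.queue(t);
>>         else                     group.queue(t);
>>     }
>> };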
>>
>> The ScanScheduler has an active heap and a pending heap, both of which are minimum-value heaps. It tracks the chunk id that is currently being read in from disk (lastChunk). If a new Task with a chunk id higher than lastChunk is added, it goes on the active heap; if its chunk id is less than or equal to lastChunk, it goes on the pending heap. Once the active heap is empty, the pending heap is swapped with the active heap and lastChunk is set to the chunk id of the top element of the new active heap.
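>>
>> (In rough code, the two-heap ordering amounts to something like this; ChunkOrder and its methods are invented for illustration, only the active/pending/lastChunk idea comes from the description above.)
>>
>> // Illustration only: two min-heaps keyed on chunk id plus lastChunk.
>> #include <functional>
>> #include <queue>
>> #include <vector>
>>
>> class ChunkOrder {
>>     using MinHeap = std::priority_queue<int, std::vector<int>, std::greater<int>>;
>>     MinHeap _active;      // chunks still ahead in the current disk pass
>>     MinHeap _pending;     // chunks that must wait for the next pass
>>     int _lastChunk = -1;  // chunk id currently being read from disk
>> public:
>>     bool empty() const { return _active.empty() && _pending.empty(); }
>>     void add(int chunkId) {
>>         if (chunkId > _lastChunk) _active.push(chunkId);
>>         else                      _pending.push(chunkId);
>>     }
>>     int next() {                        // precondition: !empty()
>>         if (_active.empty()) {
>>             _active.swap(_pending);     // start a new pass over the chunks
>>         }
>>         _lastChunk = _active.top();     // advance the disk position
>>         _active.pop();
>>         return _lastChunk;
>>     }
>> };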
>>
>> The ScanScheduler is currently allowed to advance to the next chunk id as soon as ANY query on lastChunk finishes. This is pretty naive and will need to change. The current ScanScheduler is concerned with disk I/O and not with memory constraints. Changing this is simply a matter of changing the _ready() function in the ScanScheduler so that Tasks can be started only when enough memory is available, or when some other criterion is met.
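>>
>> (For instance, a memory-aware check could look roughly like this; MemoryTracker and its fields are made up, only _ready()'s role comes from the existing code.)
>>
>> // Sketch of a memory-aware readiness check; MemoryTracker is hypothetical.
>> #include <cstddef>
>>
>> struct MemoryTracker {
>>     std::size_t lockedBytes  = 0;   // currently locked in memory
>>     std::size_t maxLockBytes = 0;   // hard limit on locked memory
>>     std::size_t available() const { return maxLockBytes - lockedBytes; }
>> };
>>
>> struct ScanTask { std::size_t expectedLockBytes = 0; };
>>
>> // A Task may start only if the tables it needs can be locked within the
>> // remaining budget (other criteria could be added here as well).
>> bool ready(ScanTask const& t, MemoryTracker const& mem) {
>>     return t.expectedLockBytes <= mem.available();
>> }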
>>
>> Scanning always goes by chunk id. There are not separate schedulers for Source tables and Object tables.
>> Scan scheduling will need to consider how much memory is available and the size of the files that would need to be locked.
>> There are currently 4 different scans that will probably each need their own scheduler:
>>    Object                               1hr per full scan
>>    Object joined with Source            8hr per full scan
>>    Object joined with Forced Source     8hr per full scan
>>    Object joined with Object_Extra      12hr per full scan
>> For each one, the appropriate tables need to be locked: the "Object" scheduler would only lock the Object table files for its current chunk id in memory, while the "Object joined with Source" scheduler would lock the Object and Source tables for its current chunk id.
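>>
>> (Concretely, the per-scan table sets could be captured as simply as the sketch below; the enum and function names are placeholders.)
>>
>> // Placeholder mapping from scan type to the tables it would lock for
>> // the chunk it is currently working on.
>> #include <string>
>> #include <vector>
>>
>> enum class ScanType { Object, ObjectSource, ObjectForcedSource, ObjectExtra };
>>
>> std::vector<std::string> tablesToLock(ScanType s) {
>>     switch (s) {
>>         case ScanType::Object:             return {"Object"};
>>         case ScanType::ObjectSource:       return {"Object", "Source"};
>>         case ScanType::ObjectForcedSource: return {"Object", "ForcedSource"};
>>         case ScanType::ObjectExtra:        return {"Object", "Object_Extra"};
>>     }
>>     return {};
>> }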
>
> Note that these are just "core" production tables; there will be many more. Object will be vertically partitioned into several tables, and there will be many Level 3 user tables...
>
>
>
>>    Looking at this, it might be better to go with schedulers that run at expected rates (1hr/full_scan, 8hr/full_scan, 12hr/full_scan) and have flags indicating which tables they want to use. The problem is that the number of permutations of joins gets out of hand quickly. It would be simple to rank them by chunk id and then group them by which tables are needed. (Are there Source-table-only queries? Object_Extra-table-only queries? Object, ObjectExtra, and Source?)
>>
>> It might be desirable to have the Object scheduler be able to identify slow Tasks, take all Tasks for that UserQuery, and move them to the "Object joined with Source" scheduler, so they don't bog down the Object scheduler. This would require a unique user query id or something similar.
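>>
>> (A sketch of that demotion, assuming each Task carries the id of its UserQuery; the queue type and function name are invented.)
>>
>> // Sketch of demoting a slow user query: all of its remaining Tasks move
>> // from the "Object" scheduler's queue to the "Object joined with Source"
>> // scheduler's queue. Assumes a unique per-UserQuery id on every Task.
>> #include <cstdint>
>> #include <deque>
>>
>> struct QueryTask { std::uint64_t userQueryId; int chunkId; };
>>
>> struct TaskQueue { std::deque<QueryTask> tasks; };
>>
>> void demoteQuery(std::uint64_t queryId, TaskQueue& objectSched, TaskQueue& objSourceSched) {
>>     std::deque<QueryTask> keep;
>>     for (QueryTask const& t : objectSched.tasks) {
>>         if (t.userQueryId == queryId) objSourceSched.tasks.push_back(t);
>>         else                          keep.push_back(t);
>>     }
>>     objectSched.tasks.swap(keep);
>> }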
>>
>> I don't think this would be difficult to do with the current BlendScheduler and ScanScheduler. They already contain code to limit the number of threads spawned by any scheduler type, and easy-to-change values for controlling their limits at a high level in the code. It's pretty easy to have multiple schedulers and switch between them at compile time (or at program start-up if we really want to). Thoughts?
>>
>> The table sizes should be something like (first year size -> size after 10 years):
>>   Object         1x
>>   ObjExtra       10x
>>   Forced Source  1x -> 10x
>>   Source         5x -> 40x
>>
>> Should we do anything for tables required for the query that don't need to be locked?
>
>
> Do we need to pass to your functions the tables that do not need to
> be locked as part of shared scans? Say we have a query:
>
> SELECT <whatever>
> FROM Object o
> JOIN Source s ON (o.objectId = s.objectId)
> JOIN Filter f ON (s.filterId = f.filterId)
> WHERE f.filterName = 'r'
>
> The table Filter is tiny (6 rows) and there is no need to lock it. Should we still pass it? No?
> I guess not, but the docs need clarification.
>
>
>>
>> We need to ask Mario:
>> - Will we have queries that want to see sources and forced sources?
>> - Joining between data releases: do we need to handle data from all DRs through the same qserv instance?
>
> I'm going to ask Mario
>
>>
>> For scheduling to work, we will need some information available. This will need to be part of the interface (a rough sketch of such an interface follows the list):
>> - Which tables are locked in memory?
>> - How many Tasks are using a particular table locked in memory? (Free them by reference counting?)
>> - How much memory have we locked up?
>> - What's the most memory we should have locked up?
>> - Before a table is locked in memory, how much room is it likely to take?
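>>
>> (Put together, that suggests an accounting interface along these lines; all names and signatures here are placeholders, not an existing API.)
>>
>> // Placeholder accounting interface covering the questions above; the
>> // names and signatures are invented, not existing Qserv or XRootD calls.
>> #include <cstddef>
>> #include <string>
>>
>> class LockAccounting {
>> public:
>>     virtual ~LockAccounting() = default;
>>     virtual bool isLocked(std::string const& table, int chunkId) const = 0;
>>     virtual int  tasksUsing(std::string const& table, int chunkId) const = 0;   // for ref-counted release
>>     virtual std::size_t lockedBytes() const = 0;      // how much we have locked up now
>>     virtual std::size_t maxLockedBytes() const = 0;   // the most we should lock up
>>     virtual std::size_t expectedLockBytes(std::string const& table, int chunkId) const = 0;  // likely footprint
>> };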
>>
>> Note that the GroupScheduler is working through its own Tasks. Its Tasks only involve a couple of chunks, but it still needs some memory to work with.
>>
>>
>>
>>
>> Concerns/clarification for anything above?
>>
>> What are the arguments for having your code do the scheduling?
>>
>> What are the important details of the file locking?
>>
>>
>>
>> Thanks,
>> John
>
>
