Last week, we got memman working with mmap and mlock, and it gave a huge 
speed boost (queries finished in less than half the time). The downside 
was that the mlock calls took several seconds and broke the worker 
scheduling: jobs were only being taken from the highest priority 
scheduler, and interactive jobs had a significant delay (2 or 3 minutes) 
before a worker would get to them. So I've been looking into the problem 
and it looks like I've got it fixed. I don't entirely understand what is 
happening, though.

If things run in the following sequence, the queries run very fast; the 
main slowdown is waiting for mlock to finish. For example:

[2016-06-10T21:28:34.651Z] ... ScanScheduler::commandStart QI=290477:6487;
[2016-06-10T21:28:34.651Z] ... QueryRunner::runQuery() QI=290477:6487;
[2016-06-10T21:28:34.651Z] ... QI=290477:6487; waitForMemMan begin
        Waiting for mlock here (about 15 seconds between waitForMemMan 
begin and end) - we may have had to wait for a few earlier mlock calls 
to finish. I suspect the mlock for this query itself took about 3 seconds.
[2016-06-10T21:28:49.579Z] ... QI=290477:6487; waitForMemMan end
        At this point the SQL query is sent to mysql, and in under 0.2 
seconds it is ready to transmit results
[2016-06-10T21:28:49.707Z] ... _transmit last=1 QI=290477:6487;
[2016-06-10T21:28:49.707Z] ... _transmit last=1 QI=290477:6487;
[2016-06-10T21:28:49.708Z] ... QI=290477:6487; processing sec=15
[2016-06-10T21:28:49.770Z] ... BlendScheduler::commandFinish QI=290477:6487;
        And about 0.2 seconds after the mlock wait ends, the query is done

So, if we go through the trouble of using mmap and mlock, the queries 
run fast, and most of the time (80-90%) is spent in mlock. These are 
very simple queries, but that seems strange, as I didn't expect either 
mmap or mlock to read the file into memory, which is what I thought 
would be expensive. And this works just as well for a single query as 
it does for a group of queries on the same chunk. There's this strange 
mlock bottleneck, but if you pay the price, everything else is much 
faster. It's also worth noting that the system load is much lower when 
using mlock: without mlock the system load would be around 120, but 
with it the load holds fairly steady around 40.
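
For reference, the underlying pattern is just plain POSIX mmap plus 
mlock on a chunk's table file. Here's a minimal sketch (illustrative 
only - lockChunkFile is a made-up name, and the real memman code in the 
ticket is organized quite differently):

    // Map a table file read-only and pin it in RAM. Returns the mapping,
    // or nullptr on error; the caller munlock()s/munmap()s when done.
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdio>

    void* lockChunkFile(char const* path, size_t* sz) {
        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror("open"); return nullptr; }
        struct stat st;
        if (fstat(fd, &st) != 0) { perror("fstat"); close(fd); return nullptr; }
        *sz = static_cast<size_t>(st.st_size);
        // The mapping itself is cheap; nothing is read from disk yet.
        void* addr = mmap(nullptr, *sz, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);  // the mapping stays valid after the fd is closed
        if (addr == MAP_FAILED) { perror("mmap"); return nullptr; }
        // This is the call that takes seconds: it pins the file's pages
        // in RAM, so mysql never waits on disk while scanning the chunk.
        if (mlock(addr, *sz) != 0) {
            perror("mlock"); munmap(addr, *sz); return nullptr;
        }
        return addr;
    }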

For the mlock speedup to work, the mlock call must be completed before 
the query is passed to mysql, or the speedup vanishes. The speedup also 
vanishes if more than one mlock call is running at the same time. And 
to get the scheduler to work properly, waiting for the mlock call must 
happen outside the scheduler.
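
In code, those constraints amount to something like this (again a 
sketch with made-up names - MemMan's real interface in DM-6518 looks 
different). The mlock work is serialized by a mutex, and the worker 
thread blocks on it before the query ever reaches mysql, so the 
schedulers themselves never wait:

    #include <mutex>
    #include <string>

    // Placeholders: the real versions hand the query to mysql and
    // mmap+mlock the chunk's table files (as in the sketch above).
    void runMysqlQuery(std::string const&) {}
    void mmapAndMlockChunk(std::string const&) {}

    class MemMan {
    public:
        // Only one mlock runs at a time; concurrent mlock calls kill the speedup.
        void lockTables(std::string const& chunk) {
            std::lock_guard<std::mutex> guard(_mlockMutex);
            mmapAndMlockChunk(chunk);  // blocks for a few seconds per chunk
        }
    private:
        std::mutex _mlockMutex;
    };

    void runTask(MemMan& memman, std::string const& chunk, std::string const& sql) {
        // Wait for mlock here, in the worker thread that owns the task, not
        // inside the scheduler; otherwise the scheduler stalls and
        // interactive jobs sit for minutes.
        memman.lockTables(chunk);
        // Only after the chunk is locked does the query go to mysql.
        runMysqlQuery(sql);
    }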

I've got code in tickets/DM-6518 that appears to work but needs cleanup 
and a bit more testing. Even so, SELECT COUNT(*) FROM Object; takes 
about 30 sec and SELECT COUNT(*) FROM Source WHERE flux_sinc BETWEEN 1 
AND 2; takes about 30 min, which is much better than the previous 
3.8 min and 1 hr 15 min respectively.


-John


And test results:

Before mlock, 'SELECT COUNT(*) FROM Source WHERE flux_sinc BETWEEN 1 AND 
2;' would take 1 hr 15 min.

DM-5709 - older, without the fix to let the scheduler work properly, but using mlock.
group
SELECT COUNT(*) FROM Source WHERE flux_sinc BETWEEN 1 AND 2;              3539300     27 min 51.44 sec
SELECT count(*) from Object WHERE u_apFluxSigma between 0 and 2.2e-30;    321080583   31 min 11.31 sec
select count(*) FROM Object WHERE u_apFluxSigma between 0 and 2.27e-30;   475244843   31 min 43.64 sec
SELECT COUNT(*) FROM Object;                                              3 min 50 sec

-----------------------------------------------------------------------------------
DM-6518 - latest - has the fix for the scheduler and uses mlock.
Solo queries
SELECT COUNT(*) FROM Source WHERE flux_sinc BETWEEN 1 AND 2;              27 min 39.02 sec
SELECT count(*) from Object WHERE u_apFluxSigma between 0 and 2.2e-30;    5 min 13.92 sec
SELECT COUNT(*) FROM Object;                                              31.61 sec

group
SELECT COUNT(*) FROM Source WHERE flux_sinc BETWEEN 1 AND 2;              30 min 56.24 sec
SELECT count(*) from Object WHERE u_apFluxSigma between 0 and 2.2e-30;    17 min 9.31 sec
select count(*) FROM Object WHERE u_apFluxSigma between 0 and 2.27e-30;   17 min 9.00 sec
SELECT COUNT(*) FROM Source WHERE flux_sinc BETWEEN 2 AND 3;              30 min 53.80 sec
SELECT COUNT(*) FROM Object;                                              55.64 sec and 34.50 sec
