Last week we got memman working with mmap and mlock, and saw a huge
speed boost (queries finished in less than half the time). The downside
was that the mlock call took several seconds and broke worker
scheduling: jobs were only being taken from the highest-priority
scheduler, and interactive jobs saw a significant delay (2 or 3
minutes) before the worker got to them. So I've been looking into the
problem, and it looks like I've got it fixed, though I don't entirely
understand what is happening.
If things run in the following sequence, the queries run very fast; the
main slowdown is waiting for mlock to finish. For example:
[2016-06-10T21:28:34.651Z] ... ScanScheduler::commandStart QI=290477:6487;
[2016-06-10T21:28:34.651Z] ... QueryRunner::runQuery() QI=290477:6487;
[2016-06-10T21:28:34.651Z] ... QI=290477:6487; waitForMemMan begin
Waiting for mlock happens here - we may have had to wait for a few
earlier mlock calls. I suspect the mlock call for this query took
about 3 seconds.
[2016-06-10T21:28:49.579Z] ... QI=290477:6487; waitForMemMan end
At this point the SQL query is sent to mysql, and in under 0.2
seconds it is ready to transmit results:
[2016-06-10T21:28:49.707Z] ... _transmit last=1 QI=290477:6487;
[2016-06-10T21:28:49.707Z] ... _transmit last=1 QI=290477:6487;
[2016-06-10T21:28:49.708Z] ... QI=290477:6487; processing sec=15
[2016-06-10T21:28:49.770Z] ... BlendScheduler::commandFinish QI=290477:6487;
And in 0.3 seconds the query is done
So, if we go through the trouble of using mmap and mlock, the queries
run fast, and most of the time (80-90%) is spent in mlock. These are
very simple queries, but that seems strange, as neither mmap nor mlock
is expected to read the file into memory, which is what I thought would
be expensive. And this works just as well for a single query as it does
for a group of queries on the same chunk. There's this strange mlock
bottleneck, but if you pay the price, everything else is much faster.
It's also worth noting that system load is much lower when using mlock:
without mlock the load would be around 120, but with it the load holds
fairly steady at 40.
For the mlock speedup to work, the mlock call must complete before the
query is passed to mysql, or the speedup vanishes. Also, if more than
one mlock call is running at the same time, the speedup vanishes. And to
get the scheduler to work properly, waiting for the mlock call must
happen outside the scheduler.
I've got code in tickets/DM-6518 that appears to work but needs cleanup
and a bit more testing. With it, SELECT COUNT(*) FROM Object; takes
about 30 sec and SELECT COUNT(*) FROM Source WHERE flux_sinc BETWEEN 1
AND 2; takes about 30 min, much better than the 3.8 min and 1hr 15min
they took before, respectively.
-John
And test results:

Before mlock, 'SELECT COUNT(*) FROM Source WHERE flux_sinc BETWEEN 1
AND 2;' would take 1hr 15min.

DM-5709 - older, without the fix to let the scheduler work properly, but using mlock.
group:
  SELECT COUNT(*) FROM Source WHERE flux_sinc BETWEEN 1 AND 2;             (result 3539300)    27 min 51.44 sec
  SELECT count(*) from Object WHERE u_apFluxSigma between 0 and 2.2e-30;   (result 321080583)  31 min 11.31 sec
  select count(*) FROM Object WHERE u_apFluxSigma between 0 and 2.27e-30;  (result 475244843)  31 min 43.64 sec
  SELECT COUNT(*) FROM Object;                                             3 min 50 sec
-----------------------------------------------------------------------------------
DM-6518 - latest - has the fix for the scheduler and uses mlock.
solo queries:
  SELECT COUNT(*) FROM Source WHERE flux_sinc BETWEEN 1 AND 2;             27 min 39.02 sec
  SELECT count(*) from Object WHERE u_apFluxSigma between 0 and 2.2e-30;   5 min 13.92 sec
  SELECT COUNT(*) FROM Object;                                             31.61 sec
group:
  SELECT COUNT(*) FROM Source WHERE flux_sinc BETWEEN 1 AND 2;             30 min 56.24 sec
  SELECT count(*) from Object WHERE u_apFluxSigma between 0 and 2.2e-30;   17 min 9.31 sec
  select count(*) FROM Object WHERE u_apFluxSigma between 0 and 2.27e-30;  17 min 9.00 sec
  SELECT COUNT(*) FROM Source WHERE flux_sinc BETWEEN 2 AND 3;             30 min 53.80 sec
  SELECT COUNT(*) FROM Object;                                             55.64 sec and 34.50 sec