Attendees: Serge, Daniel, Douglas, Bill Chickering, Srini, Jacek
Measuring memory usage for queries (Srini)
- L1, L2 hits, cache misses
- using perfctr now
- PAPI (http://icl.cs.utk.edu/papi/) would be a good
alternative
- perfctr requires rebuilding kernel, PAPI wouldn't
- intercepting queries in parser code (had to rebuild mysql)
- calculating memory references (read last-level cache
miss counter before and after each query)
- can monitor user level and kernel level separately
- counters are done per query (per thread id)
- could be useful to watch whether cache miss rates go up
as we increase concurrency, to see if we are approaching
memory contention in shared scans
- in short term, instrumenting block reads should be
sufficient for monitoring shared scan
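A minimal sketch of the per-query bookkeeping described above, assuming the counter values have already been read (via perfctr, PAPI or similar) for the thread running the query. Reading the hardware counters themselves is out of scope here. The names `CounterSnapshot`, `query_delta`, `miss_rate` and `contention_suspected`, and the 0.05 tolerance, are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSnapshot:
    """Hypothetical snapshot of one thread's last-level cache (LLC) counters."""
    llc_references: int  # LLC references seen so far on this thread
    llc_misses: int      # LLC misses seen so far on this thread


def query_delta(before: CounterSnapshot, after: CounterSnapshot) -> CounterSnapshot:
    """Counts attributable to one query: read counters before and after it, then subtract."""
    return CounterSnapshot(
        llc_references=after.llc_references - before.llc_references,
        llc_misses=after.llc_misses - before.llc_misses,
    )


def miss_rate(delta: CounterSnapshot) -> float:
    """Fraction of LLC references that missed; 0.0 if the query made no references."""
    if delta.llc_references == 0:
        return 0.0
    return delta.llc_misses / delta.llc_references


def contention_suspected(rates_by_concurrency: dict[int, float],
                         tolerance: float = 0.05) -> bool:
    """Flag possible memory contention when the miss rate keeps rising with concurrency.

    rates_by_concurrency maps the number of concurrent queries to the mean miss rate
    observed at that level. The tolerance is an arbitrary, hypothetical threshold.
    """
    levels = sorted(rates_by_concurrency)
    rates = [rates_by_concurrency[n] for n in levels]
    if len(rates) < 2:
        return False
    rising = all(b >= a for a, b in zip(rates, rates[1:]))
    return rising and rates[-1] - rates[0] > tolerance
```

For example, a query whose thread went from 1000 to 5000 LLC references and from 100 to 500 misses has a miss rate of 0.1. Keeping the raw deltas, rather than only the rate, leaves room to report user-level and kernel-level counts separately.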
Shared scans
- prototype working well, promising:
- 1 query - 3 min, 30 queries - 5 min
from now on, use qserv mailing list
for *all* qserv-related discussion
- connect Bill to mailing list
- issues with objectId
- likely problem: wrong column name used in install
scripts (x_chunkId, x_subChunkId)
- queries with joins - likely related to the above
- try on 1 node first [Douglas]
- try near neighbor [Douglas]
- try shared scans on 300 nodes later if time permits
(after joins, objId, nn demonstrated)
- fixes Serge implemented for concurrency are
"stuck" at in2p3: Serge's account was disabled because
of issues with paperwork, new forms are being processed
- test queries for shared scans (first iteration):
https://dev.lsstcorp.org/trac/wiki/db/Queries/testingSharedScans
- need to check which are not supported by new parser,
rewrite or fix parser, or skip [Serge]
- test environment set up by Douglas on lsst-dbdev5
- use that for testing queries, or the private setup that
Serge has
- installer broken at the tip of master because of new xrootd
that was added for 300-node test as part of emergency fixes
for concurrency
- will merge the working special branch used at in2p3
into the tip of the master, by tomorrow [Douglas]
- concurrency issue
- mysteriously went away after restarting everything
- 300 nodes
- full object scans in ~8 sec, most data cached
- full source scan (read off disk, since Source does not
fit into memory) 6-10 min, not bad
- check disk speed on these machines, compare
effective io with raw disk bandwidth available on
the cluster [Douglas]
- ref match
https://dev.lsstcorp.org/trac/wiki/db/Qserv/partitioning#MatchTables
- realistically, impossible to finish and run pipeqa
before FDR
- Bill will check what Daniel did for near neighbor
(similar code will be needed for ref match)
- Daniel working on "JOIN" syntax now, ~2 more days of work,
this is needed for ref match
- Bill will read up, talk to Serge next week, try to
implement ref match
- qserv hackathon planned for Nov 11-15 @slac
- timing works, but in2p3 can support travel for only
one person, we really need both Fabrice and Emmanuel
- talk to Dominique [Jacek]
- possibly delay until new FY @in2p3 (aligned with calendar
year), and do the meeting in January
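A small sketch of the arithmetic behind the shared-scan and 300-node numbers above. Treating "30 independent scans run one after another" as the baseline is an assumption made for illustration, not something measured. The Source table size is not given in these notes, so the bandwidth helper takes it as a parameter. All function names are hypothetical.

```python
def amortized_seconds_per_query(total_seconds: float, n_queries: int) -> float:
    """Wall-clock time of one shared scan divided among the queries it served."""
    return total_seconds / n_queries


def speedup_vs_independent(single_query_seconds: float,
                           shared_seconds: float,
                           n_queries: int) -> float:
    """Assumed baseline: n full scans run one after another, vs one shared scan for all n."""
    return (single_query_seconds * n_queries) / shared_seconds


def effective_bandwidth_mb_s(table_bytes: float, scan_seconds: float, n_nodes: int) -> float:
    """Average per-node read rate in MB/s, assuming the table is spread evenly over nodes."""
    return table_bytes / n_nodes / scan_seconds / 1e6


def disk_utilization(effective_mb_s: float, raw_mb_s: float) -> float:
    """Fraction of the raw per-node disk bandwidth that the scan actually achieved."""
    return effective_mb_s / raw_mb_s
```

With the timings in these notes (1 query in 3 min = 180 s, 30 queries in 5 min = 300 s), the shared scan costs 10 s per query. Under the serial baseline, that is an 18x speedup over 30 independent scans. For the disk check, the measured raw bandwidth per node would go into `disk_utilization` together with the real Source table size and the observed 6-10 min scan time.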
Jacek