Attendees: Serge, Daniel, Douglas, Bill Chickering, Srini, Jacek

Measuring memory usage for queries (Srini)
 - L1, L2 hits, cache misses
 - using perfctr now
 - PAPI (http://icl.cs.utk.edu/papi/) would be a good alternative
   - perfctr requires rebuilding kernel, PAPI wouldn't
 - intercepting queries in parser code (had to rebuild mysql)
 - calculating memory references (read last level cache miss counter before and after query)
 - can monitor user level and kernel level separately
 - counters are done per query (per thread id)
 - could be useful to watch if cache miss rate goes up as we increase concurrency, to see if we are approaching memory contention in shared scans
 - in short term, instrumenting block reads should be sufficient for monitoring shared scans

Shared scans
 - prototype working well, promising:
   - 1 query - 3 min, 30 queries - 5 min

From now on, use qserv mailing list for *all* qserv-related discussion
 - connect Bill to mailing list

 - issues with objectId
   - likely problem: wrong column name used in install scripts (x_chunkId x_subChunkId)
 - queries with joins
   - likely related to the above
   - try on 1 node first [Douglas]
 - try near neighbor [Douglas]
 - try shared scans on 300 nodes later if time permits (after joins, objId, nn demonstrated)
 - fixes Serge implemented for concurrency "stuck" at in2p3: Serge's account disabled because of issues with paperwork, new forms being processed
 - test queries for shared scans (first iteration): https://dev.lsstcorp.org/trac/wiki/db/Queries/testingSharedScans
   - need to check which are not supported by new parser, rewrite or fix parser, or skip [Serge]
 - test environment set up by Douglas on lsst-dbdev5
   - use that for testing queries, or the private setup that Serge has
 - installer broken at the tip of master because of new xrootd that was added for 300-node test as part of emergency fixes for concurrency
   - will merge the working special branch used at in2p3 into the tip of master, by tomorrow [Douglas]
 - concurrency issue
   - mysteriously went away after restarting everything
 - 300 nodes
   - full object scans in ~8 sec, most data cached
   - full source scan (off disk, does not fit into memory) 6-10 min, not bad
   - check disk speed on these machines, compare effective io with raw disk bandwidth available on the cluster [Douglas]
 - ref match https://dev.lsstcorp.org/trac/wiki/db/Qserv/partitioning#MatchTables
   - realistically, impossible to finish and run pipeqa before fdr
   - Bill will check what Daniel did for near neighbor (similar code will be needed for ref match)
   - Daniel working on "JOIN" syntax now, ~2 more days of work, this is needed for ref match
   - Bill will read up, talk to Serge next week, try to implement ref match
 - qserv hackathon planned for Nov 11-15 @slac
   - timing works, but in2p3 can support travel for only one person, we really need both Fabrice and Emmanuel
   - talk to Dominique [Jacek]
   - possibly delay until new FY @in2p3 (aligned with calendar year), and do the meeting in January

Jacek
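
Addendum: back-of-envelope reading of the shared scan timings quoted in these notes (1 query in ~3 min, 30 queries in ~5 min). This is only a rough sketch. The function name and the serial baseline are assumptions, not something measured: it assumes that without scan sharing the 30 queries would run one after another, each at the single-query time.

```python
# Rough arithmetic on the shared scan prototype timings from the notes.
# ASSUMPTION (not from the notes): without sharing, the queries would run
# serially, each taking the single-query time.

SINGLE_QUERY_MIN = 3.0   # from the notes: 1 query takes ~3 min
SHARED_BATCH_MIN = 5.0   # from the notes: 30 queries take ~5 min
N_QUERIES = 30


def shared_scan_summary(single_min, batch_min, n_queries):
    """Hypothetical helper: derive simple ratios from the two timings."""
    serial_min = single_min * n_queries  # assumed serial baseline
    return {
        "serial_estimate_min": serial_min,
        "speedup_vs_serial": serial_min / batch_min,
        "amortized_sec_per_query": batch_min * 60.0 / n_queries,
        # extra wall-clock time added by each query beyond the first
        "marginal_sec_per_extra_query":
            (batch_min - single_min) * 60.0 / (n_queries - 1),
    }


summary = shared_scan_summary(SINGLE_QUERY_MIN, SHARED_BATCH_MIN, N_QUERIES)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under that serial assumption, the quoted timings correspond to about 18x less wall-clock time than running the 30 queries back to back, or about 10 s amortized per query. The real baseline (concurrent unshared queries competing for disk) could look quite different.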