Attendees: Serge, Daniel, Douglas, Bill Chickering, Srini, Jacek


Measuring memory usage for queries (Srini)
  - L1, L2 hits, cache misses
  - using perfctr now
    - PAPI (http://icl.cs.utk.edu/papi/) would be a good
      alternative
    - perfctr requires rebuilding kernel, PAPI wouldn't
  - intercepting queries in parser code (had to rebuild mysql)
  - calculating memory references (read the last-level cache
    miss counter before and after the query)
  - can monitor user level and kernel level separately
  - counters are done per query (per thread id)
  - could be useful to watch whether the cache miss rate goes up
    as we increase concurrency, to see if we are approaching
    memory contention in shared scans

  - in short term, instrumenting block reads should be
    sufficient for monitoring shared scan
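
A minimal sketch of the before/after counter read described above, kept
per thread id. The read_counter callable is a hypothetical stand-in for
whatever perfctr or PAPI call returns the current last-level cache miss
count; it is not an API from either library, and the fake values below
are illustrative only.

```python
import threading
from contextlib import contextmanager


@contextmanager
def count_llc_misses(read_counter, results):
    """Record the last-level cache miss delta for the enclosed query.

    read_counter: hypothetical callable returning the current per-thread
                  miss count (stand-in for a perfctr or PAPI read).
    results:      dict mapping thread id -> list of per-query deltas.
    """
    before = read_counter()
    try:
        yield
    finally:
        # Delta between the two reads approximates memory references
        # made while the query ran on this thread.
        delta = read_counter() - before
        results.setdefault(threading.get_ident(), []).append(delta)


# Usage with a fake counter in place of real hardware counters.
fake_values = iter([1000, 1750])
results = {}
with count_llc_misses(lambda: next(fake_values), results):
    pass  # the query would execute here
```

Keeping the deltas keyed by thread id matches the per-query (per-thread)
counting noted above, and makes it easy to compare miss counts across
concurrency levels.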


Shared scans
  - prototype working well, promising:
    - 1 query - 3 min, 30 queries - 5 min
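
To put the prototype numbers in context, a small sketch comparing the
shared scan against running the same queries one after another. It
assumes (not stated in the notes) that each query run serially would
cost the same 3 min as the single-query case.

```python
def shared_scan_speedup(single_min, n_queries, shared_min):
    """Return (serial_minutes, speedup) for a batch of queries.

    Assumes every query would take single_min if run alone, back to back.
    """
    serial_min = single_min * n_queries
    return serial_min, serial_min / shared_min


serial_min, speedup = shared_scan_speedup(single_min=3, n_queries=30, shared_min=5)
print(serial_min, speedup)  # 90 18.0
```

Under that assumption, 30 queries would take 90 min serially versus
5 min with the shared scan, roughly an 18x improvement.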


from now on, use qserv mailing list
for *all* qserv-related discussion
  - connect Bill to mailing list


- issues with objectId
   - likely problem: wrong column name used in install
     scripts (x_chunkId x_subChunkId)
   - queries with joins - likely related to the above
- try on 1 node first [Douglas]
- try near neighbor [Douglas]

- try shared scans on 300 nodes later if time permits
   (after joins, objId, nn demonstrated)

- fixes Serge implemented for concurrency are "stuck" at in2p3:
   Serge's account was disabled because of issues with
   paperwork, new forms are being processed

- test queries for shared scans (first iteration):
   https://dev.lsstcorp.org/trac/wiki/db/Queries/testingSharedScans
    - need to check which are not supported by new parser,
      rewrite or fix parser, or skip [Serge]

- test environment setup by Douglas on lsst-dbdev5
    - use that for testing queries, or the private setup that
      Serge has

- installer broken at the tip of master because of new xrootd
   that was added for 300-node test as part of emergency fixes
   for concurrency
- will merge the working special branch used at in2p3
   into the tip of master by tomorrow [Douglas]

- concurrency issue
    - mysteriously went away after restarting everything

- 300 nodes
    - full object scans in ~8 sec, most data cached
    - full source scan (off disk, data does not fit into
      memory) in 6-10 min, not bad
      - check disk speed on these machines, compare
        effective io with raw disk bandwidth available on
        the cluster [Douglas]
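
A rough sketch of the effective-vs-raw I/O comparison mentioned above.
The table size and per-node disk bandwidth are hypothetical placeholders,
not measured values; only the 300-node count and the 10 min upper bound
on scan time come from these notes.

```python
def effective_io_mb_per_s(table_bytes, scan_seconds):
    """Aggregate effective read rate of a full scan, in MB/s."""
    return table_bytes / scan_seconds / 1e6


def fraction_of_raw(effective_mb_s, nodes, raw_mb_s_per_node):
    """Effective rate as a fraction of the cluster's raw disk bandwidth."""
    return effective_mb_s / (nodes * raw_mb_s_per_node)


# Hypothetical inputs: replace with the real table size and measured disk speed.
table_bytes = 9e12          # placeholder table size
scan_seconds = 10 * 60      # upper end of the 6-10 min scan time
nodes = 300
raw_mb_s_per_node = 100.0   # placeholder sequential read speed per node

eff = effective_io_mb_per_s(table_bytes, scan_seconds)
frac = fraction_of_raw(eff, nodes, raw_mb_s_per_node)
print(f"effective: {eff:.0f} MB/s, fraction of raw: {frac:.2f}")
```

A fraction well below 1 would suggest the scan is limited by something
other than raw disk bandwidth.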

  - ref match
    https://dev.lsstcorp.org/trac/wiki/db/Qserv/partitioning#MatchTables
    - realistically, impossible to finish and run pipeqa
      before fdr
    - Bill will check what Daniel did for near neighbor
      (similar code will be needed for ref match)
    - Daniel working on "JOIN" syntax now, ~2 more days of work,
      this is needed for ref match
    - Bill will read up, talk to Serge next week, try to
      implement ref match

  - qserv hackathon planned for Nov 11-15 @slac
    - timing works, but in2p3 can support travel for only
      one person, we really need both Fabrice and Emmanuel
      - talk to Dominique [Jacek]
    - possibly delay until new FY @in2p3 (aligned with calendar
      year), and do the meeting in January

Jacek
