
[we skipped the mtg last Thursday, this was a replacement]

Present: Fabrice, Douglas, Bill, Serge, Jacek



Bill/logging
  - main uses of logging?
    - Debugging
    - Probably won't be doing heavy analytics as
      companies typically do
    - access patterns captured through mysql query log
      - which should probably be integrated into
        the new central logging:
        - eg, mysql can write to a file instead of a table,
          and logging will take it from there
  - don't overdesign logging now!
  - avoid strong coupling with qserv, will make it easier
    to migrate in the future if we need to
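
A minimal sketch of what "avoid strong coupling" could look like, assuming
the logging layer ends up as a thin wrapper over a standard library. This is
an illustration only, not the design agreed at the meeting: JsonFormatter and
get_logger are hypothetical names, and the JSON field names are made up.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload, sort_keys=True)


def get_logger(name, stream=None):
    """Return a logger that knows nothing about qserv internals.

    Callers only see the standard logging API, so the backend can be
    swapped later without touching call sites.
    """
    logger = logging.getLogger(name)
    if not logger.handlers:  # attach the handler only once per logger name
        handler = logging.StreamHandler(stream)  # stream=None means stderr
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.DEBUG)
        logger.propagate = False
    return logger
```

Call sites would then just do log = get_logger("qserv.worker") followed by
log.debug("..."), where "qserv.worker" is only an example name.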


Douglas
  - started looking at zookeeper, reading docs
    and installing
  - not clear yet what it could be used for
  - document all lessons learned, in 2 trac pages:
    dev log and summary
  - btw, kafka uses zookeeper


Fabrice
  - implemented start/stop service for qserv (a la init.d)
  - have script that loads metadata, handy for testing,
    now can test only metadata loading. This all used
    to be hardcoded
  - working on W13 test data set, loaded metadata,
    checking which queries are not working
  - automated testing is working, but many queries broken
    - Daniel working on fixing some of them in the parser
    - should easily work with buildbot
      - need to check with Robyn (but she is busy with FDR now)


Serge
  - is taking one more pass on the big parser ticket
  - looking into getting rid of config file for
    partitioner and duplicator, (custom config file
    format, probably better to swig c++/python, and
    talk to qms from python, it will be all simpler)
  - next: applying fixes for concurrency
  - and after that: getting rid of subchunks

  - 2 layers of code that execute sql, probably related
    to the new shared scan code



Redesign (qserv core)
---------------------
  - a lot of old, unused code in qserv, need to clean up
  - but also need to look at higher level class redesign
  - and even higher. Some of the key open questions:
    - how smart/dumb a worker should be?
       - perhaps a little smarter?
       - What are cons of having smarter worker?
           - want workers to be easily disposable
           - as we increase interface complexity,
             debugging gets harder

Things to redesign on the worker:
  - class redesign probably needed, eg queries are tracked
    by hash, there is no hash collision strategy, nothing ever
    gets deleted, interactions for 2 queries that start at
    ~the same time might get mixed up
  - perhaps premerge on worker instead of sending individual
    chunk results to master
  - perhaps instead of sending individual queries to each
    subchunk, send simple template to worker, and worker
    distributes to subchunks (this increases worker complexity..)
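
To make the query-tracking issue concrete, here is a rough sketch of one
possible fix, assuming we are free to replace the hash key with a per-worker
counter. This is not the existing worker code and not a decided design:
QueryTracker and its methods are hypothetical names. Two identical queries
arriving at ~the same time get distinct ids, and finished entries are removed.

```python
import itertools
import threading


class QueryTracker:
    """Track in-flight queries by a unique id rather than by hash alone."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_id = itertools.count(1)
        self._queries = {}  # query id -> query text

    def register(self, query_text):
        """Store a query and return an id that is never reused."""
        with self._lock:
            query_id = next(self._next_id)
            self._queries[query_id] = query_text
            return query_id

    def lookup(self, query_id):
        """Return the text of a tracked query (KeyError if unknown)."""
        with self._lock:
            return self._queries[query_id]

    def finish(self, query_id):
        """Drop the entry so the table does not grow without bound."""
        with self._lock:
            return self._queries.pop(query_id, None)

    def __len__(self):
        with self._lock:
            return len(self._queries)
```

If the hash has to stay as the key for other reasons, the same idea works
with a (hash, counter) pair as the key.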

modularity
  - master/worker separation is clean, but
    beyond that it quickly gets very murky
  - modules we thought would be useful to define:
    - query rewriting (this is on master only)
    - parsing logic (this is on master only)
    - thread pooling (this is needed both on M and W,
      probably will need one common module, plus two
      implementations: one for M and one for W, but
      this needs more thinking)
    - sql execution (mainly worker, but there are some
      things happening on the master too. Potentially
      the logic could be structured similarly to thread
      pooling: common plus 2 implementations)
    - xrootd glue layer
    - merging logic
    - c++ geometry (on master, could be used on worker)
    - logging (small module, eg for things like turning
      to json, the actual logging is called from everywhere)
    - metadata interface
    - chunk mapping

  - perhaps put each module in separate name space and
    subdirectory?
  - each module should have its own unit testing (perhaps
    its own directory for tests)

  - look at each class in the source code and try to assign
    it to module(s)
    --> action item: Bill will look into that, will have
        something that can be used to drive discussion
        at the next meeting (this Thursday)
      - capture in trac page
        - definition of module, where it belongs (m/w/both), etc
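
A sketch of the "one common module plus two implementations" idea for thread
pooling, shown in Python for brevity even though the real code would be C++.
Everything here is an assumption for discussion: TaskPool, MasterPool and
WorkerPool are hypothetical names, and the pool sizes are placeholders.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor


class TaskPool(ABC):
    """Common part: task submission and shutdown are shared by M and W."""

    def __init__(self):
        self._executor = ThreadPoolExecutor(max_workers=self.pool_size())

    @abstractmethod
    def pool_size(self):
        """Each side supplies its own sizing policy."""

    def submit(self, fn, *args):
        """Queue fn(*args) and return a Future for its result."""
        return self._executor.submit(fn, *args)

    def shutdown(self):
        """Wait for queued tasks to finish, then release the threads."""
        self._executor.shutdown(wait=True)


class MasterPool(TaskPool):
    def pool_size(self):
        return 8  # placeholder value, not a measured setting


class WorkerPool(TaskPool):
    def pool_size(self):
        return 2  # placeholder value, not a measured setting
```

The same split (shared base, M-specific and W-specific subclasses) is what
the sql execution module could follow if we go that way.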


Redesign (qserv installation/administration)
--------------------------------------------

  - two big things that need attention:
    a) packaging
    b) use of scons
    - modularity is not bad

Packaging
  - action item: evaluate eups, makes most sense if Fabrice/Emmanuel
    do that. Document in trac

Scons
  - evaluate use of scons, it is now heavily used in install,
    (and for building qserv)
  - scons-based build system for qserv needs major rework
  - it would be good to get some advice from some experts
    outside of qserv team
  - action item: write down what scons is doing / what we
    want from scons (Fabrice)
  - also, need to get rid of perl scripts

Jacek
