
Attendees: Daniel, Serge, Jacek

* recent update to Red Hat 6.3 broke boost (1.41) on SLAC machines.
   - related Red Hat bugzilla case: #908774
   - ended up downgrading offending packages to get
     back to working version
   - Red Hat is downplaying the importance of this fix, but
     other labs are starting to join us and complain too

* weekly status report
  --> need to update more regularly, many qserv tasks
    not updated

* qserv review
  - tentatively planning for late July

* need to plan and document our approach to
   schema evolution and provenance
  - started trac page about schema evolution
    https://dev.lsstcorp.org/trac/wiki/db/SchemaEvolution
  - provenance harder!
  - dedicated phone call to discuss these issues
    Tue Mar 5 at 10:00am pacific

* upgrading boost to boost 1.53?
  - Serge would find some new features handy
  - no, too aggressive, better to rely on stable,
    mainstream packages that come with OS

* extra feature for partitioner: auto-detecting
   csv separator type
   - would be nice to auto-detect in loader whether
     input rows are tab or comma separated
   - requested/suggested by in2p3
   - ok to assume all files from given group have
     same separator? Yes
   --> defer adding new features like this until ongoing
       work on partitioner finished
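
For reference when this is picked up later, a minimal sketch of one way
to detect the separator. It is not the partitioner's code: the function
name and the sampling approach are assumptions for illustration. It picks
the candidate that occurs the same non-zero number of times on every
sampled line, and it does not handle quoted fields that contain a
separator.

```python
def detect_separator(lines, candidates=(",", "\t")):
    """Guess the field separator from a few sample lines.

    A candidate qualifies if it occurs the same non-zero number of
    times on every non-empty sampled line.
    """
    lines = [ln.rstrip("\r\n") for ln in lines if ln.strip()]
    if not lines:
        raise ValueError("no non-empty lines to sample")
    best = None
    for sep in candidates:
        counts = {ln.count(sep) for ln in lines}
        if len(counts) == 1:
            n = counts.pop()
            # Prefer the candidate that yields more fields per row.
            if n > 0 and (best is None or n > best[1]):
                best = (sep, n)
    if best is None:
        raise ValueError("could not determine separator")
    return best[0]
```

Since all files from a given group are assumed to share a separator,
this would only need to run once per group, on the first few lines of
one file.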

* another extra feature for partitioner:
   read field names from the first line of file
  - would be nice to have
  - implement later, low priority
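
A rough sketch of what reading field names from the first line could
look like, using Python's standard csv module. The function name and the
sample column names (other than objectId) are hypothetical.

```python
import csv
import io

def read_header(stream, delimiter=","):
    """Return (field_names, row_iterator) for a delimited text stream."""
    reader = csv.reader(stream, delimiter=delimiter)
    try:
        names = next(reader)  # first line holds the field names
    except StopIteration:
        raise ValueError("empty input")
    return names, reader

# Example with made-up data.
names, rows = read_header(io.StringIO("objectId,ra,decl\n1,10.5,-3.2\n"))
```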

* to keep things simpler, we should require input data
   to perfectly match schema of table we load into
    - if realignment needed, pre-process input data

* should partitioner be able to reorder columns?
  - don't over-complicate for now

* should partitioner be able to drop columns from
   input data?
  - needed to drop _chunkId column from our input
    data from pt1.2
  - don't over-complicate; drop columns in a separate
    cleanup step, not in partitioner
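
A minimal sketch of such a separate cleanup step, assuming the input is
plain delimited text and the column order is known. The function name
and the "ra" column are made up for the example; only _chunkId and
objectId come from our data.

```python
import csv
import io

def drop_columns(src, dst, header, drop, delimiter=","):
    """Copy rows from src to dst, omitting the columns named in `drop`.

    `header` lists the column names in input order; the input itself
    is assumed to have no header line.
    """
    keep = [i for i, name in enumerate(header) if name not in drop]
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        writer.writerow([row[i] for i in keep])

# Example: strip _chunkId before handing the file to the partitioner.
src = io.StringIO("1,10.5,77\n2,11.5,78\n")
dst = io.StringIO()
drop_columns(src, dst, ["objectId", "ra", "_chunkId"], {"_chunkId"})
```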

* we want partitioner/duplicator to support
   sampling (eg produce 10% of what could be produced)
  - synchronizing ids between tables would be useful
  - implement in duplicator, not partitioner
    - already have ids in memory in duplicator,
      so it is easy
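
One possible way to keep ids synchronized between tables while
sampling, sketched below. This is an illustration, not necessarily how
the duplicator will do it: the keep/drop decision is derived from a hash
of the id alone, so any table keyed by the same id makes the same
choice. The function name, bucket count, and sample rows are
assumptions.

```python
import zlib

def in_sample(object_id, fraction, buckets=10000):
    """Return True if rows carrying this id belong to the sample.

    The decision depends only on the id, so every table keyed by the
    same id keeps or drops it consistently.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    bucket = zlib.crc32(str(object_id).encode("ascii")) % buckets
    return bucket < round(fraction * buckets)

# Made-up rows: (objectId,) for Object, (sourceId, objectId) for Source.
objects = [(oid,) for oid in range(1000)]
sources = [(sid, sid % 1000) for sid in range(5000)]

sampled_objects = [row for row in objects if in_sample(row[0], 0.10)]
sampled_sources = [row for row in sources if in_sample(row[1], 0.10)]
```

Every sampled Source row then refers to an Object row that is also in
the sample. The kept fraction is only approximately the requested one,
since it depends on how the hash spreads the ids.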

* refmatch
  - patching qserv to deal with refmatch will
    take ~1 week
  - do shortly after partitioner ready


* how should refmatch be treated in metadata?
  - just add flag "isMatchTable"
  - should we also store info about which tables are
    related to a given match table?
    - particularly useful if we want to manage
      multiple clusters of match tables, eg
     2mass2object, sdss2object, etc...
     - would be nice, for later
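
To make the two options concrete, a small sketch of what per-table
metadata might hold. This is not the actual qserv metadata schema: the
class, the field names, and the reference table names (SdssRef,
TwoMassRef) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class TableMeta:
    name: str
    is_match_table: bool = False          # the proposed "isMatchTable" flag
    related_tables: Tuple[str, ...] = ()  # optional: tables the match table links

def match_tables_for(tables: List[TableMeta], table_name: str) -> List[str]:
    """List the match tables that reference the given table."""
    return [t.name for t in tables
            if t.is_match_table and table_name in t.related_tables]

tables = [
    TableMeta("Object"),
    TableMeta("sdss2object", True, ("SdssRef", "Object")),
    TableMeta("2mass2object", True, ("TwoMassRef", "Object")),
]
```

With only the flag, we can tell that a table is a match table; with the
related tables stored as well, we can also look up which match tables
belong to a given table.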


DC_W13_Stripe82_40deg does not have index
on RunDeepForcedSource.objectId
  - DC_W13_Stripe82 does
  - need to add index building to scripts used
    for production so that things are consistent
    and not forgotten
    - a mess... ingest for either Source or
      ForcedSource was rewritten, not sure which
      code is exercised, need to talk to KT
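
One way to make sure index building is not forgotten is to keep the
required indexes in a single list that the production scripts turn into
statements. A sketch, assuming MySQL and a made-up index naming scheme
(idx_<column>); the function and list names are hypothetical.

```python
# (table, column) pairs that every production database should index.
REQUIRED_INDEXES = [("RunDeepForcedSource", "objectId")]

def index_statements(database, indexes=REQUIRED_INDEXES):
    """Build one CREATE INDEX statement per required (table, column)."""
    return [
        f"CREATE INDEX `idx_{column}` ON `{database}`.`{table}` (`{column}`)"
        for table, column in indexes
    ]

for stmt in index_statements("DC_W13_Stripe82_40deg"):
    print(stmt)
```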


* MyISAM
  - can't sort by an arbitrary column, only by an indexed one


Daniel redoing logging system for worker as part
of export path work

Jacek working on integrating metadata with qserv.

Helped Dave Monet, no need to use qserv for that.

Still want to try performance for differently
sorted data
  - managed to kill mysqld on lsst10 while sorting
    a copy of stripe82_40deg last night...


Thanks,
Jacek
