Should have sent this to this list too.




-------- Original Message --------
Subject: notes from discussion about logging (db team + Steve)
Date: Tue, 08 Oct 2013 11:57:27 -0700
From: Jacek Becla <[log in to unmask]>
Organization: SLAC National Accelerator Laboratory
To: LSST Data Management <[log in to unmask]>

Attendees: Steve, Serge, Daniel, Bill, Douglas, Jacek


Purpose of the meeting: to capture database
requirements and sync thoughts of the db team,
so that we are well prepared for DM-wide
discussion about logging.


action items:
  - capture requirements / statement of what is needed
    a) db team is capturing db-related input through:
       https://dev.lsstcorp.org/trac/wiki/db/Qserv/Logging
       (will add a section based on this meeting's discussion)
    b) need to do that for middleware / DM-wide. KT will
       surely coordinate
  - document (summarize) what we already learned about
    existing tools
  - Bill will experiment with the most promising systems:
    he will migrate the logging system he wrote for Qserv
    to them, without committing code, just to get some
    hands-on experience. ~1 day's worth of work


Requirements (brought up at the meeting):
  - want flexible system, every piece of code might
    generate different structure. So key/value best
  - want to impose some structure, like timestamp,
    thread id, component id
  - want to dynamically turn on/off parts of logging,
    and query immediately after
  - estimate expected rate... 1 million/sec aggregate???
  - want to easily filter out subsets, drill in various
    directions, eg all info for short window of time,
    all info for a given component, etc
  - want to query on local logs, not only on a centralized
    logging server
  - not too concerned if we lose a line of log here and there
  - need to support simple, easy logging for developers,
    eg log to screen or file is good, setting up a server
    is bad
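
A minimal sketch of how the first few requirements could fit together,
using only the Python standard library: free-form key/value payloads,
a small imposed structure (timestamp, thread id, component id), and
per-component on/off switching at run time. The component name
"qserv.worker", the "kv" convention for passing the payload, and the
field names are assumptions made for illustration; none of them come
from the meeting.

```python
import io
import json
import logging


class KeyValueFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        # Start from the caller's free-form key/value payload, if any.
        entry = dict(getattr(record, "kv", {}))
        # Imposed structure: these fields are always present and
        # override any payload keys with the same name.
        entry.update({
            "timestamp": record.created,   # seconds since the epoch
            "thread_id": record.thread,
            "component": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
        })
        return json.dumps(entry, sort_keys=True)


def make_logger(component, stream):
    """Create a logger for one component writing JSON lines to stream."""
    logger = logging.getLogger(component)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(KeyValueFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


buf = io.StringIO()                       # stands in for screen or file
log = make_logger("qserv.worker", buf)    # hypothetical component name
log.info("chunk query finished",
         extra={"kv": {"chunk": 1234, "elapsed_ms": 87}})

# Turn this component off at run time, then back on.
log.setLevel(logging.CRITICAL + 1)
log.info("suppressed")
log.setLevel(logging.INFO)
log.info("visible again")

lines = buf.getvalue().splitlines()
```

Writing to a stream keeps the developer-facing side simple (screen or
file, no server), while the one-object-per-line output stays easy for a
separate collector to pick up later.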


observation: different people have different ideas
about logging levels; need a project-level policy


best candidates:
  - Apache Flume (Java)
  - Kafka (Java)
  - Facebook's Scribe (C++)


Apache Flume looks most attractive
  - can monitor directories, watch files
  - free format structure
  - common solution is to put data in HDFS and
    query using Hive, which has an SQL-like API
  - can be set up with "routers", where a router collects
    logs for a local application and the data never hits
    the logging server
  - application could write to a file (or cout), and
    Flume would watch the files. Files can be structured,
    e.g. JSON
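
The "application writes a file, a separate agent watches it" pattern
from the last bullet can be sketched as follows. This is not Flume and
uses no Flume API; it is a stand-in written with the Python standard
library, and the function names and the one-JSON-object-per-line file
layout are assumptions made for illustration.

```python
import json
import os
import tempfile


def append_record(path, **fields):
    """Application side: append one JSON object per line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(fields) + "\n")


def read_new_records(path, offset):
    """Watcher side: return (records, new_offset) for the complete
    lines appended since the given byte offset."""
    records = []
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a partially written line
            records.append(json.loads(line))
            offset = f.tell()
    return records, offset


log_path = os.path.join(tempfile.mkdtemp(), "app.log")

append_record(log_path, component="czar", msg="start")
first, pos = read_new_records(log_path, 0)

append_record(log_path, component="czar", msg="done")
second, pos = read_new_records(log_path, pos)
```

The application only needs to know how to write a file; remembering
the offset is what lets the watcher pick up new lines without
re-reading old ones.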


binary vs text logs
  - [Daniel:]
     - want to work with logs without writing text processing
       - if purely text, need to do text parsing
     - happy if we have both binary and text
     - but text only is not good enough
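
To illustrate the point about avoiding text processing: once records
are structured, drilling down by component or time window becomes a
filter over fields rather than a regular-expression exercise. The
sketch below assumes JSON lines with "timestamp" and "component"
fields; the field names and the sample records are invented for
illustration only.

```python
import json


def load_records(lines):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def select(records, component=None, start=None, end=None):
    """Keep records matching a component and a half-open time
    window [start, end); any criterion left as None is ignored."""
    out = []
    for r in records:
        if component is not None and r.get("component") != component:
            continue
        if start is not None and r["timestamp"] < start:
            continue
        if end is not None and r["timestamp"] >= end:
            continue
        out.append(r)
    return out


# Made-up sample data.
raw = [
    '{"timestamp": 100.0, "component": "czar", "msg": "dispatch"}',
    '{"timestamp": 100.5, "component": "worker", "msg": "scan start"}',
    '{"timestamp": 101.2, "component": "worker", "msg": "scan done"}',
]
records = load_records(raw)
worker_window = select(records, component="worker", start=100.0, end=101.0)
```

A binary format would remove even the JSON parsing step, at the cost
of needing a tool to read the logs by eye, which is why having both
forms was considered acceptable.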


Steve:
  - free form definitely needed (apps team expressed this
    many times)
  - pex harness didn't work (logging daemon writing to db);
    the whole backend didn't keep up
  - bigger / larger scale requirements - need to check w/KT


Jacek
