Should have sent to this list too.

-------- Original Message --------
Subject: notes from discussion about logging (db team + Steve)
Date: Tue, 08 Oct 2013 11:57:27 -0700
From: Jacek Becla <[log in to unmask]>
Organization: SLAC National Accelerator Laboratory
To: LSST Data Management <[log in to unmask]>

Attendees: Steve, Serge, Daniel, Bill, Douglas, Jacek

Purpose of the meeting: to capture database requirements and sync the
thoughts of the db team, so that we are well prepared for the DM-wide
discussion about logging.

Action items:
- Capture requirements / a statement of what is needed:
  a) the db team is capturing db-related input through
     https://dev.lsstcorp.org/trac/wiki/db/Qserv/Logging
     (a section will be added based on this meeting's discussions)
  b) the same needs to be done for middleware / DM-wide; KT will
     surely coordinate
- Document (summarize) what we have already learned about existing tools.
- Bill will experiment with the most promising systems: he will migrate
  the logging system he wrote for Qserv, without committing code, just
  to get some hands-on experience (~1 day worth of work).

Requirements (brought up at the meeting):
- want a flexible system; every piece of code might generate a
  different structure, so key/value pairs are best
- want to impose some common structure, e.g. timestamp, thread id,
  component id
- want to dynamically turn parts of logging on/off, and query
  immediately afterwards
- estimated expected rate: ~1 million messages/sec aggregate???
- want to easily filter out subsets and drill in various directions,
  e.g. all info for a short window of time, all info for a given
  component, etc.
- want to query local logs, not only the centralized logging server
- not too concerned if we lose a line of log here and there
- need to support simple, easy logging for developers; e.g. logging to
  screen or to a file is good, having to set up a server is bad

Observation: different people have different ideas about logging
levels; we need a project-level policy.

Best candidates:
- Apache Flume (Java)
- Kafka (Java)
- Facebook's Scribe (C++)

Apache Flume looks most attractive:
- can monitor directories, watch files
- free-format structure
- a common solution is to put the data in HDFS and query it using
  Hive's SQL-like API
- can be set up with "routers", where a router collects logs for a
  local application and the data never hits the logging server
- an application could write to a file (or cout), and Flume would
  watch the files; the files can be structured, e.g. JSON

Binary vs text logs [Daniel]:
- want to work with logs without writing text processing
- if purely text, we need to do text parsing
- happy if we have both binary and text, but text only is not good
  enough

Steve:
- free form is definitely needed (the apps team has expressed this
  many times)
- pex harness didn't work (a logging daemon writing to the db); the
  whole backend didn't keep up
- bigger / larger-scale requirements - need to check with KT

Jacek

########################################################################

Use REPLY-ALL to reply to list

To unsubscribe from the QSERV-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1
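[Editor's note] Several of the points above (key/value records, an imposed
common structure of timestamp / thread id / component id, free-form extra
fields, structured JSON files that a collector such as Flume could watch,
and dynamically turning a component on or off) can be illustrated with a
minimal sketch. This is not any agreed LSST API; the component name
"qserv.worker" and the field names are purely illustrative assumptions.

```python
import json
import logging
import threading
import time

class JsonLineFormatter(logging.Formatter):
    """Emit one JSON object per line: a fixed common structure plus
    free-form key/value fields supplied by the call site (a sketch of
    the 'key/value best' requirement; field names are hypothetical)."""

    def format(self, record):
        entry = {
            "ts": time.time(),                 # imposed: timestamp
            "thread": threading.get_ident(),   # imposed: thread id
            "component": record.name,          # imposed: component id
            "level": record.levelname,
            "msg": record.getMessage(),
        }
        # Free-form payload attached via logging's `extra` mechanism.
        entry.update(getattr(record, "kv", {}))
        return json.dumps(entry)

logger = logging.getLogger("qserv.worker")  # hypothetical component name
handler = logging.StreamHandler()  # a FileHandler would give Flume a file to watch
handler.setFormatter(JsonLineFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("chunk query finished",
            extra={"kv": {"chunkId": 1234, "durationMs": 87}})

# Dynamically silencing one component without touching the others:
logging.getLogger("qserv.worker").setLevel(logging.CRITICAL)
```

Because each line is self-describing JSON, it can be filtered locally
(e.g. by component or time window) without writing ad hoc text parsers,
which also speaks to Daniel's point about avoiding pure-text logs.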