> flag (by email or on the page) issues that
> we should think about, discuss etc.
A few things that came to mind right away:
Every step should produce metrics (number of rows, at a minimum)
that allow for at least minimal checks on progress and correctness.
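Here is a minimal Python sketch of per-step row-count metrics. The names (`StepMetrics`, `run_step`, `check_conserved`) are hypothetical. For simplicity it assumes each step takes and returns an in-memory list of rows, which a real loader would not do for large tables. The point is only that every step reports rows in and rows out, so that steps which should neither drop nor add rows can be checked automatically.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Row = Tuple  # hypothetical: one table row as a tuple of column values


@dataclass
class StepMetrics:
    step: str
    rows_in: int
    rows_out: int


def run_step(
    step: str,
    func: Callable[[Sequence[Row]], List[Row]],
    rows: Sequence[Row],
) -> Tuple[List[Row], StepMetrics]:
    """Run one pipeline step and record how many rows went in and came out."""
    out = func(rows)
    return out, StepMetrics(step, len(rows), len(out))


def check_conserved(metrics: StepMetrics) -> None:
    """Fail loudly if a step that should preserve rows did not."""
    if metrics.rows_in != metrics.rows_out:
        raise RuntimeError(
            f"{metrics.step}: {metrics.rows_in} rows in, {metrics.rows_out} rows out"
        )
```

Steps that legitimately change the row count (filtering, for example) would need their own expected relation between `rows_in` and `rows_out` rather than equality.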
If you don't run manual quality checks after doing a "LOAD DATA
INFILE", why should you need to do so after doing a qserv load? We need
to make the process so bulletproof that no such checks are needed. Yes,
things will go wrong during loading; yes, some of those might even be
things we haven't thought of; even so, we should be able to develop
essentially a mathematical *proof* that the load will happen correctly
if it is reported as successful (at least given certain assumptions, such
as "LOAD DATA INFILE" working correctly).
The partitioning step is very map/reduce-like (with a simple
reduce that could just be concatenation). While we don't have to use an
off-the-shelf M/R system, we should be able to use all available
resources (i.e. workers) to do this.
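Here is a minimal sketch of the partitioning step in map/reduce form, where the reduce is plain concatenation. Each map task takes a block of input rows and groups them by chunk id. The reduce merges those groups chunk by chunk. The chunking rule (fixed-height declination stripes), the `(id, ra, dec)` row layout and all function names are hypothetical stand-ins, not Qserv's actual partitioner. A local `concurrent.futures` executor stands in for the set of workers.

```python
from collections import defaultdict
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Tuple[int, float, float]  # hypothetical (id, ra, dec) row layout
Chunks = Dict[int, List[Row]]

STRIPE_HEIGHT = 10.0  # hypothetical chunk height, degrees of declination
NUM_STRIPES = int(180.0 / STRIPE_HEIGHT)


def chunk_of(row: Row) -> int:
    """Hypothetical chunking rule: fixed-height declination stripes."""
    _, _, dec = row
    # dec == +90 would land one past the last stripe, so clamp it.
    return min(int((dec + 90.0) // STRIPE_HEIGHT), NUM_STRIPES - 1)


def map_block(block: Sequence[Row]) -> Chunks:
    """Map: group one block of input rows by chunk id."""
    out: Chunks = defaultdict(list)
    for row in block:
        out[chunk_of(row)].append(row)
    return dict(out)


def reduce_chunks(partials: Iterable[Chunks]) -> Chunks:
    """Reduce: concatenate the per-block row lists for each chunk."""
    chunks: Chunks = defaultdict(list)
    for partial in partials:
        for chunk_id, rows in partial.items():
            chunks[chunk_id].extend(rows)
    return dict(chunks)


def partition(blocks: Iterable[Sequence[Row]], executor: Executor) -> Chunks:
    """Run the map over all blocks in parallel, then reduce the results."""
    return reduce_chunks(executor.map(map_block, blocks))
```

Swapping the executor changes where the map runs without touching the reduce. A `ProcessPoolExecutor` is one option, and so is something that dispatches blocks to worker nodes.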
--
Kian-Tat Lim, LSST Data Management