QSERV-L Archives

QSERV-L@LISTSERV.SLAC.STANFORD.EDU


Subject: Re: data loading
From: Douglas Smith <[log in to unmask]>
Reply-To: General discussion for qserv (LSST prototype baseline catalog)
Date: Fri, 25 Oct 2013 10:39:01 -0700


On 10/25/2013 01:35 AM, Kian-Tat Lim wrote:
> Jacek,
>
>>> 	If you don't run manual quality checks after doing a "LOAD DATA
>>> INFILE", why should you need to do so after doing a qserv load?
>> Where did I imply it? We didn't plan any manual quality checks,
>> the only "manual" operations that we thought a person would do
>> would be:
>>   a) someone responsible for producing the data we are loading
>>      needs to come up with configuration (e.g., needs to decide
>>      which tables to partition, what are partition parameters etc)
> 	I agree.  (But I don't necessarily like the mention of
> "location" in there; that should be controlled by the system
> exclusively.)
>
>>   b) DBA sends a command "publish data set <x>", I think letting
>>      the system to automatically publish data after it is all
>>      ready would be too much
> 	What is the DBA doing before sending the command?  Is this just
> a matter of timing, or is the DBA supposed to be verifying something?
> The latter is what I was worried about.  I'm also worried about the
> assumption that we need lots of extra disk space for staging.  For
> read-only data, loading chunks into their final locations as directly as
> possible (but only enabling their usage in the metadata when at least
> one copy of each has been loaded) would not seem to be a problem.

The DBA is really just waiting for the production manager process to
report that all the data for the processing has been created and is good.
Hundreds of thousands of tables will be created, so the table quality
will be verified by the system and the result reported to the DBA.
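
A minimal sketch of that reporting step, assuming the system produces
one pass/fail result per table. The names TableCheck and summarize, the
table names, and the report fields are hypothetical and only illustrate
the idea; they are not part of qserv.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableCheck:
    table: str        # hypothetical table name
    ok: bool          # outcome of the automated quality check
    detail: str = ""  # short reason when the check failed


def summarize(checks, max_listed=20):
    """Collapse per-table results into one short report for the DBA."""
    failed = [c for c in checks if not c.ok]
    return {
        "total": len(checks),
        "failed": len(failed),
        # Publishing is only suggested when something was checked
        # and nothing failed.
        "ready_to_publish": len(checks) > 0 and not failed,
        # Cap the listing so the report stays readable at large scale.
        "failures": [(c.table, c.detail) for c in failed[:max_listed]],
    }


checks = [
    TableCheck("Object_1234", True),
    TableCheck("Object_1235", False, "row count mismatch"),
]
report = summarize(checks)
print(report)
```

The DBA would then look at one summary rather than at individual
tables, and send the publish command only when the report says the data
set is ready.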

I am not completely sure about the need for lots of disk space for
staging, but some will be needed.  The extra space on the worker nodes
might be used, but the system needs the freedom not to use the workers.
We will get into a position where the workers have I/O load from
serving the existing data while we have to produce new data, and it
might not be possible to use the existing workers.  Which resources to
use should just be a computing site configuration: either the existing
worker nodes or another set of resources, freely defined.
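
As an illustration only, the site configuration could be as simple as
naming which pool of nodes is used for staging. The keys staging_pool,
workers and dedicated, the host names, and the function staging_nodes
are all assumptions made up for this sketch.

```python
# Hypothetical site configuration: the site picks the staging pool.
SITE_CONFIG = {
    "staging_pool": "dedicated",          # or "workers"
    "workers": ["worker01", "worker02"],  # nodes serving existing data
    "dedicated": ["stage01"],             # separate staging resources
}


def staging_nodes(config):
    """Return the nodes the site has chosen to use for staging."""
    pool = config["staging_pool"]
    if pool not in ("workers", "dedicated"):
        raise ValueError(f"unknown staging pool: {pool!r}")
    nodes = config.get(pool, [])
    if not nodes:
        raise ValueError(f"staging pool {pool!r} has no nodes")
    return list(nodes)


print(staging_nodes(SITE_CONFIG))
```

Switching a site from separate staging resources to the worker nodes is
then a one-line configuration change, with no change to the loader.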

As for producing "into place": because of data replication, this will
never really be true.  One set might be produced into place, but
another replica will then have to be copied from it and moved onto
other worker nodes in any case.  The data placement and replication
service should decide where the data goes.  Because the worker nodes
may change while the data is being produced, this might not be where
the tables were created, and we need that freedom also.
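
A rough sketch of a placement decision that does not depend on where a
chunk was produced. It ranks the currently available workers with a
rendezvous-style hash, which is a technique I am substituting here for
illustration and not something decided in this thread. The function
place_replicas and the worker names are hypothetical.

```python
import hashlib


def place_replicas(chunk_id, available_workers, replicas=2):
    """Pick target workers for a chunk from the workers available now."""
    if replicas > len(available_workers):
        raise ValueError("not enough workers for the requested replicas")

    def score(worker):
        # Deterministic per (chunk, worker) pair, so the ranking does
        # not depend on the order in which workers are listed.
        key = f"{chunk_id}:{worker}".encode()
        return hashlib.sha256(key).hexdigest()

    return sorted(available_workers, key=score)[:replicas]


workers = ["w1", "w2", "w3", "w4"]
targets = place_replicas(7, workers)
print(targets)
```

With this kind of ranking, a worker that leaves the pool only affects
the chunks that were assigned to it; the placement of every other chunk
stays the same.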

