LISTSERV 16.5 - QSERV-L Archives

I would like to add that Qserv will likely run faster with more RAM on the worker nodes. Currently, Qserv works best if it can lock the shared scan tables for 2 or 3 chunks in memory, and not all of the memory can be made available for locking the tables. At in2p3, about 10GB is available for locking tables, which is a bit low for the size of the tables in our chunks. It's worth considering 32GB or more of RAM for the workers in the cluster.
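As a rough illustration of the memory arithmetic above, the following Python sketch divides the lockable memory by the number of chunks held in memory at once. The function name and constants are hypothetical and are not part of any Qserv API. The 10GB figure comes from the message above. The lockable amount on a 32GB node is not stated in this thread, so it is left as a parameter.

```python
# Rough sizing sketch: how large can the shared scan tables of one chunk be
# if a worker has to keep several chunks locked in memory at the same time?
# All names are illustrative; this is not a Qserv API.

GB = 10**9


def max_table_bytes_per_chunk(lockable_bytes: int, chunks_in_memory: int) -> float:
    """Largest combined size of the shared scan tables for a single chunk."""
    if chunks_in_memory <= 0:
        raise ValueError("chunks_in_memory must be positive")
    return lockable_bytes / chunks_in_memory


# About 10GB is lockable on the in2p3 workers, per the message above.
for chunks in (2, 3):
    budget = max_table_bytes_per_chunk(10 * GB, chunks)
    print(f"{chunks} chunks locked: about {budget / GB:.1f} GB of tables per chunk")
```

If the tables for one chunk are larger than this budget, fewer chunks fit in the locked region, which is the situation described above as "a bit low".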

From: [log in to unmask] <[log in to unmask]> on behalf of Fabrice Jammes <[log in to unmask]>
Sent: Tuesday, June 28, 2016 7:50 AM
To: Marcus Ebert
Cc: rgm; Dominique Boutigny; Eckhard Sutorius; George Beckett; qserv-l; Emmanuel GANGLER; Philippe Gris; VULPESCU Bogdan; Yvan Calas; Fabio Hernandez
Subject: Re: [QSERV-L] ingesting data into qserv

Hi Bob,

Please let me know when the mysql database is ready. I'll also be on vacation at the end of July and August.

Here's some information about machines we currently use to run Qserv:

- 50 Dell nodes, each with:

- 10TB hard disk (7TB partition dedicated to data)

- 8 cores (Intel(R) Xeon(R) CPU E5-2603 v2 @ 1.80GHz)

- 16 GB RAM

We currently split this into two clusters, each with an identical ~30TB dataset (i.e. 1.2 TB per node), but we plan to load 120TB in the medium term.
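The per-node figures can be checked with a short Python sketch. The node count, dataset sizes and 7TB data partition are taken from this message. Whether the planned 120TB would go onto one 25-node cluster or across all 50 nodes is not stated, so both cases are shown as assumptions. Replication and index overhead are ignored.

```python
# Back-of-the-envelope check of data volume per worker node.
# Figures come from the message above; the 120TB layout is an assumption.

NODES_TOTAL = 50
CLUSTERS = 2
DATA_PARTITION_TB = 7.0


def data_per_node_tb(dataset_tb: float, nodes: int) -> float:
    """Dataset size divided evenly across the nodes, in TB."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return dataset_tb / nodes


nodes_per_cluster = NODES_TOTAL // CLUSTERS

# Current setup: ~30TB on each of the two clusters.
print(f"current: {data_per_node_tb(30.0, nodes_per_cluster):.1f} TB per node")

# Planned 120TB, under two hypothetical layouts.
for label, nodes in (("one cluster", nodes_per_cluster), ("all nodes", NODES_TOTAL)):
    per_node = data_per_node_tb(120.0, nodes)
    fits = per_node <= DATA_PARTITION_TB
    print(f"120TB on {label}: {per_node:.1f} TB per node, fits in partition: {fits}")
```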

This cluster is used for development purposes, for running large-scale tests against fresh Qserv code.

I think this setup is reasonable; other Qserv developers might give you additional information.

Fabio and Yvan, from the IN2P3 computing center, might also provide additional information about hardware.

Regards,

Fabrice

From: "Marcus Ebert" <[log in to unmask]>
To: "Fabrice Jammes" <[log in to unmask]>
Cc: "rgm" <[log in to unmask]>, "Dominique Boutigny" <[log in to unmask]>, "Eckhard Sutorius" <[log in to unmask]>, "George Beckett" <[log in to unmask]>, [log in to unmask], "Emmanuel GANGLER" <[log in to unmask]>, "Philippe Gris" <[log in to unmask]>, "VULPESCU Bogdan" <[log in to unmask]>
Sent: Monday, 27 June 2016 15:22:31
Subject: Re: [QSERV-L] ingesting data into qserv

Hi Fabrice,

Thank you very much for offering your help and the proposal on how to proceed!

Let me just summarize what we have already done and what we want to do:

We have some qserv installations running in VMs (on SL6 and CentOS7) and ran some general tests with the included integration test datasets in different configurations, to learn about the setup of qserv and the interaction between the worker nodes and the master.

To see how it works under real-life conditions and to plan for future hardware investment, we want to try a test installation with a database that is currently in use. That way we can compare it directly against the traditional single-node installation and scale the requirements to the amount of data LSST will have. We would also like to learn how to manage such a qserv cluster and how to distribute data on it in production, since we will host the database here locally one day, and we also want to gain experience with Docker. In addition to our current test machines, there will also be an OpenStack installation that we can use for these kinds of tests. To do all this, we would like to have this database installed in qserv locally on our machines. Once that is working, usable, and running stably at our site, we would like to open the hosted database to researcher queries too, to see how it behaves in different configurations when used for astronomy analyses.

However, all databases currently used here are in MSSQL format, and we are working right now on a good way to convert them to mysql, especially the schema. We did this successfully with a test database and are now working on one of the UKIDSS releases. After that is done, we need to verify that what we have in mysql is the same as what is in MSSQL, since some of the defaults are defined differently. Once that is done, we could go on to see how it can be loaded into qserv (probably using a mysql dump as the basis?).

Eckhard and I are working on it right now, but we will both be on vacation over the next few weeks, though at different times.

Once we have verified that the mysql database we get out of the conversion gives the same query results as the MSSQL one, we could prepare the small sample to be integrated into the integration tests.
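One possible way to run this kind of check is sketched below in Python. It is only an illustration, not the procedure actually used: the function names are hypothetical, the rows are assumed to come from `cursor.fetchall()` of whichever database drivers are in use (no driver is imported here), and the rounding rule for floats is an assumption that may need tuning per column.

```python
# Compare the rows returned by the same query on two database engines,
# ignoring row order and small floating-point differences.
# Hypothetical helper names; rows are plain tuples as returned by a DB-API cursor.

import math
from collections import Counter


def normalize(value, float_digits=6):
    """Map a value to a form that is comparable across the two engines."""
    if isinstance(value, float):
        if math.isnan(value):
            return "NaN"  # NaN never equals itself, so use a marker instead
        return round(value, float_digits)
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return value


def compare_result_sets(rows_a, rows_b, float_digits=6):
    """Return (only_in_a, only_in_b) as Counters of normalized rows."""
    a = Counter(tuple(normalize(v, float_digits) for v in row) for row in rows_a)
    b = Counter(tuple(normalize(v, float_digits) for v in row) for row in rows_b)
    return a - b, b - a


# Stand-in data; in practice these would be fetched from MSSQL and mysql.
mssql_rows = [(1, 10.5, "a"), (2, None, "b")]
mysql_rows = [(2, None, "b"), (1, 10.5000000001, "a")]

only_mssql, only_mysql = compare_result_sets(mssql_rows, mysql_rows)
print("result sets match:", not only_mssql and not only_mysql)
```

Rows that differ only because a default was converted differently (for example NULL on one side and a sentinel value on the other) would show up in both returned Counters, which makes them easy to list and inspect. Rounding can still separate two values that sit on either side of a rounding boundary, so reported differences in float columns deserve a second look.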

On the hardware side: are there any recommendations on what should be used to run qserv with LSST data on bare metal with a given number of worker nodes? Since it's distributed, I assume one wouldn't want single powerful, fast machines like those used so far for single-node, TB-sized database servers, but rather many machines with enough RAM and reasonably fast disk access?

One release of the UKIDSS data will be on the order of a few TB (1-5TB depending on the release).

Cheers,

Marcus

On 23 June 2016 at 09:53, Fabrice Jammes <[log in to unmask]> wrote:

Bob,

On my side I'm very happy to help some adventurous early adopters :-)

My lab is located in Clermont-Ferrand. The local LSST project manager is Emmanuel Gangler, the local LSST computing manager is Philippe Gris, and Bogdan Vulpescu and I are engineers.

Bogdan explores multiple parts of the LSST stack, on both the technical and science sides, and I'm fully dedicated to Qserv (I have worked closely with SLAC for 3 years).

Are Marcus and Eckhard available right now to start technical discussions? What amount of data do you plan to load?

Here's the plan we could follow:

- The first step is to prepare a consistent, ~10MB, sample of your dataset.

- The second step is to integrate it in our integration tests, so that it will be automatically loaded on a Qserv mono-node instance. I can help here.

- The third step will be to determine a manual data-loading procedure based on the previous step. Bogdan Vulpescu and I might help here.

- The final step will be to set up a Qserv cluster and run the manual data-loading procedure against it. We support CentOS7 + Docker on bare-metal or OpenStack infrastructure.

Regards,

Fabrice

From: "Fabrice Jammes" <[log in to unmask]>
To: "Dominique Boutigny" <[log in to unmask]>, "marcus ebert" <[log in to unmask]>
Cc: "fabrice.jammes" <[log in to unmask]>
Sent: Thursday, 23 June 2016 09:58:38
Subject: Re: ingesting data into qserv

From: "Fabrice Jammes" <[log in to unmask]>
To: "fabrice.jammes" <[log in to unmask]>
Sent: Thursday, 23 June 2016 09:57:16
Subject: Fwd: ingesting data into qserv

---------- Forwarded message ----------
From: Bob Mann <[log in to unmask]>
Date: 22 June 2016 at 17:53
Subject: ingesting data into qserv
To: "[log in to unmask]" <[log in to unmask]>
Cc: Dominique Boutigny <[log in to unmask]>, "[log in to unmask]" <[log in to unmask]>, Eckhard Sutorius <[log in to unmask]>, George Beckett <[log in to unmask]>

Fabrice,

Apologies for emailing you out of the blue, but Dominique suggested your name as a contact from whom to seek advice about ingesting data into qserv.

As you may know, there is now a team in Edinburgh funded to make preparations for a UK DAC, and one of the things we want to do is to load into qserv one of our own sky survey archives - probably one of the recent releases from UKIDSS - for which we have a log of user-executed queries, so that we can see how qserv performs on a realistic query workload.

We had heard from the LSST DM team that bulk ingest into qserv was still not easy, but that it had improved significantly recently, and Dominique suggested that you would be a good contact to find out more about that. Would you be able to offer us some advice on that? At our end, Marcus Ebert and Eckhard Sutorius (both Cc’d above) are the people doing the work, George Beckett is the LSST:UK Project Manager and I am the LSST:UK Project Leader.

Many thanks

Bob Mann

Use REPLY-ALL to reply to list

To unsubscribe from the QSERV-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
