Hi Georges,

In my understanding, we do not index tables on worker for now.
Indeed we have two indexes on the master (geometry, for cone search, and objectId<->chunkId index for search by object id, we call it secondary index) and we try not to add indexes on the worker to save space, and time.
This might change in the future if astronomers really push for indexing a few fields.

FYI, we have tried to do parallel indexing a few years ago, with Monty Widenius help: https://dev.lsstcorp.org/trac/wiki/mysqlLargeTablesAtIn2p3
this feature might be much more supported now.

Regards,

Fabrice


De: "George Beckett" <[log in to unmask]>
À: "Fabrice Jammes" <[log in to unmask]>, "QSERV-L" <[log in to unmask]>
Cc: "LI Teng" <[log in to unmask]>, "Emmanuel GANGLER" <[log in to unmask]>, "Fabio Hernandez" <[log in to unmask]>, "Dominique Boutigny" <[log in to unmask]>
Envoyé: Mercredi 24 Janvier 2018 15:17:17
Objet: RE: Qserv ingest/ indexing performance, questions about

Hi Fabrice (Igor and John),

 

Thanks for your feedback. It sounds as if ingest experiments are scaling up well.

 

Teng and I discussed your note about not having space to store indexes (other than the primary key). This seemed surprising to us, as we have estimated roughly that each index would likely take around 1—2% of the table storage. Is it the case that the return on investment is not worthwhile, if optimising for particular searches can’t be achieved with individual keys?

 

Our main concern is that indexing is time-consuming. We’ve been able to reducing indexing times to tractable levels by increasing the sort-buffer size in MySQL and using multi-threaded sort-recover. However, we’re only dealing with ~1 TB tables, so are several orders of magnitude behind you.

 

It would be good to hear your thoughts on the pros and cons of indexing/ not indexing, and your estimates of space requirements.

 

Thanks again,

George.

 

Cc: Teng, Emmanuel, Fabio, Dominique.

 

--

Dr George Beckett,

University of Edinburgh

 

Email: [log in to unmask]

Tel: +44 131 651 3577 (EPCC) /

      +44 131 6688 255 (Astronomy)

Web: http://www.epcc.ed.ac.uk/

 

The University of Edinburgh is a charitable body, registered in

Scotland, with registration number SC005336.

 

From: Fabrice Jammes [mailto:[log in to unmask]]
Sent: 07 December 2017 13:22
To: BECKETT George <[log in to unmask]>
Cc: LI Teng <[log in to unmask]>; QSERV-L <[log in to unmask]>; Emmanuel GANGLER <[log in to unmask]>; Fabio Hernandez <[log in to unmask]>; Dominique Boutigny <[log in to unmask]>
Subject: Re: Qserv ingest/ indexing performance, questions about

 

Dear Georges en Teng,

 

Sorry, I was travelling and I missed your previous email.

We're now ingesting ~ 100TB dataset inside Qserv and some progress should have been done related to data-loading.

Igor Gaponenko and John Gates are responsible for data ingestion and should have information about current state of the art. Would you like to get in touch with them?

 

FYI, I do not think we add additional indexes on our chunked tables (only PK), for space reason.

 

Regards,

 

Fabrice

 


De: "George Beckett" <[log in to unmask]>
À: "Fabrice Jammes" <[log in to unmask]>
Cc: "LI Teng" <[log in to unmask]>
Envoyé: Jeudi 7 Décembre 2017 11:37:21
Objet: RE: Qserv ingest/ indexing performance, questions about

 

Dear Fabrice,

 

Have you had a chance to consider my previous email? Teng has made some further progress w/ MySQL and Qserv, for ingesting and indexing a ~100GB test table, using ‘myisamchk’ as follows:

 

-        Setting a larger sort_buffer_size produces significant performance improvements, up to around 16GB.

-        The choice of indexing mode (recover vs. safe-recover) is significant and, given we have plenty of spare disk space on our server, the ‘recover’ mode also produces good speed-up.

 

We’re also considering using multithreading (parallel-recover), though note this is classed as ‘Beta’, so wonder if that will lead to problems. Teng is best placed to explain further (and correct anything above that I may have misunderstood).

 

Thanks again,

George.

 

Cc: Teng.

 

--

Dr George Beckett,

University of Edinburgh

 

Email: [log in to unmask]

Tel: +44 131 651 3577 (EPCC) /

      +44 131 6688 255 (Astronomy)

Web: http://www.epcc.ed.ac.uk/

 

The University of Edinburgh is a charitable body, registered in

Scotland, with registration number SC005336.

 

From: BECKETT George
Sent: 16 November 2017 16:40
To: [log in to unmask] <[log in to unmask]>
Cc: LI Teng <[log in to unmask]>
Subject: Qserv ingest/ indexing performance, questions about

 

Dear Fabrice,

 

I hope this email finds you well. I’d like to introduce you to a colleague of mine, Teng Li (cc’ed), who has recently joined the LSST:UK team here in Edinburgh.

 

Teng is working on an evaluation of Qserv, which has a couple of different angles. First, we are creating a suite of benchmark queries with which to test performance of Qserv for what will be crucial use cases for UK astronomers. Second, we are investigating the ingest mechanism for Qserv, given that we expect to need to quickly ingest large tables from LSST (and other surveys) during the Commissioning and Operations phases.

 

Teng has set up a couple of Qserv and MySQL instances into which he is ingesting copies of the UKIDSS and SDSS datasets. However, he has found the process to be very slow for the largest tables (especially indexing). A look through the LSST webpages led us to a report that you prepared some time ago on Qserv ingest/ indexing:

 

https://dev.lsstcorp.org/trac/wiki/mysqlLargeTablesAtIn2p3

 

and

 

https://dev.lsstcorp.org/trac/wiki/mysqlLargeTables.

 

Based on this, we have some ideas of how to improve ingest – e.g. looking at influence of sort_buffer_size and other indexing parameters. However, before we go too far with this investigation, we thought it worthwhile to check with you if there is any more up-to-date results, or if the situation has changed in the last four years.

 

Thanks in advance for any advice. I’ll leave it with Teng to answer any more technical questions you may have on our planned experiments, or our setup.

 

Speak to you soon,

George.

 

Cc: Teng.

 

--

Dr George Beckett,

University of Edinburgh

 

Email: [log in to unmask]

Tel: +44 131 651 3577 (EPCC) /

      +44 131 6688 255 (Astronomy)

Web: http://www.epcc.ed.ac.uk/

 

The University of Edinburgh is a charitable body, registered in

Scotland, with registration number SC005336.

 

 




Use REPLY-ALL to reply to list

To unsubscribe from the QSERV-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1