Print

Print


I meant HTM and not HTML, autocorrect :)

Srini
________________________________________
From: [log in to unmask] [[log in to unmask]] on behalf of Srinivasan Chandrasekharan
Sent: Wednesday, April 06, 2016 6:19 AM
To: Kian-Tat Lim
Cc: qserv-l
Subject: Re: [QSERV-L] spatial DB - quick question.

Thanks KT. What you say makes sense and I had something similar in mind so its good to know I am the right path. I will try out your query on a smaller chunk of the data so that I can see how inefficiently it executes based on the Btree indexing I have on the columns.

I played around with R-trees a while back but primarily as a test case for some research but it was pretty limited. I haven't tried out qserv at all recently (more than a year) but I think I should soon.

I maybe able to cut down on the size of the table also after discussions so I think I may get speed up just by removing unwanted rows.

About the HTML triple indexes, I have no experience with that. Do you know who I can talk to ? to learn more about it ?

Regards,
Srini
________________________________________
From: Kian-Tat Lim
Sent: Wednesday, April 06, 2016 12:35 AM
To: Srinivasan Chandrasekharan
Cc: qserv-l
Subject: Re: [QSERV-L] spatial DB - quick question.

Srini,

> >So I have a catalog of 600 million stars. The data that I have is :
> >
> >ra, dec, muRa, muDec, …. some mag columns etc etc.
> >
> >I want to add a few columns,
> >
> >1. Add column for each row that tells me how many stars I have in the
> >neighborhood with distance x.
> >2. Add column for each row that tells me how many stars I have in the
> >neighborhood with distance x + 1.

This is essentially a spatial (self-)cross-match plus GROUP BY (or
subquery) and COUNT(*) aggregation.

SELECT A.id, COUNT(*) AS neighbors_x FROM Stars AS A, Stars AS B
WHERE A.id != B.id AND SPH_DIST(A.ra, A.dec, B.ra, B.dec) <= x
GROUP BY A.id

Executing this directly will typically be very inefficient, of course.

Adding RA and dec ranges to the WHERE clause (like B.ra BETWEEN A.ra - x
AND A.ra + x, but handling wraparound) could speed things up
significantly if you've B-tree indexed on RA and dec and your database
engine can handle the range indexing well.  I think MySQL does *not* do
this well, SQL Server does (hence its usage in SDSS), and I'm not sure
about PostGreSQL.  Computing zones (RA bands within dec stripes) or HTM
trixel indexes can also help optimize the query.  An R-tree index is
another possibility, but it may require that you transform to x,y,z
coordinates for best speed.

Qserv's chunk/subchunk scheme is supposed to make queries like this more
efficient.  You could try running on a single-node instance.

--
Kian-Tat Lim, LSST Data Management, [log in to unmask]
########################################################################
Use REPLY-ALL to reply to list

To unsubscribe from the QSERV-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1

########################################################################
Use REPLY-ALL to reply to list

To unsubscribe from the QSERV-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1