Print

Print


Hi Jacek,

On 09/13/2013 12:15 PM, Becla, Jacek wrote:
> ##### large area (500 sq deg) object scan with simple aggregation
> SELECT  AVG(parallax)
> FROM    Object
> WHERE   qserv_areaspec_box(-3,7,10, 60)
This works:

SELECT  AVG(gFlux_PS)
FROM    Object
WHERE   qserv_areaspec_box(-3,10,7, 60)

I fixed the coordinate ordering to something you probably meant. We can 
argue about the ordering of coordinates in qserv_areaspec_box() later.

> ##### full object scan, density per declination zone
> SELECT  ROUND(decl_PS, 0) AS declRounded, COUNT(*) AS theCount
> FROM     Object
> GROUP BY declRounded
> ORDER BY theCount;
Now it works.

> ##### near neighbor, 100 sqdeg
> SELECT   COUNT(*)
> FROM     Object o1, Object o2
> WHERE    qserv_areaspec_box(-5,-5, 5, 5)
>     AND    qserv_angSep(o1.ra_PS, o1.decl_PS, o2.ra_PS, o2.decl_PS) < 0.1;
This is really slow, primarily because so much work is wasted on 
creating/destroying empty subchunks. The master doesn't (though it 
could) know what subchunks are empty, so it (a) generates queries acting 
on empty subchunks and (b) requests those empty subchunks to be created.

Solutions:
1. Move subchunk query generation to the worker (protocol changes, 
dependency shuffling to promote the query templating code into common). 
This doesn't really fix anything by itself, except it improves dispatch 
speed.

2. Let the master know what subChunks are populated. We can build an 
index pretty easily from the objectid index. Now the master can skip 
worrying about empty subchunks. (similarly, we can create the 
emptychunks file). Everyone wins here, minus the time to create the 
index. I think we can create it in-memory on startup and maybe cache it 
on disk to avoid reading on startup.

3. Have the worker figure out what subchunks are populated. The worker 
is really in the right position to do this. Coupled with 1., it would be 
comparable to 2.

We can probably do 2. It's the easiest. But not right now, not today.

> plus if maybe you could get that to work...
>
> SELECT
>     ROUND(scisql_fluxToAbMag(uFlux_PS)-scisql_fluxToAbMag(gFlux_PS), 0)
> AS UG,
>     ROUND(scisql_fluxToAbMag(gFlux_PS)-scisql_fluxToAbMag(rFlux_PS), 0)
> AS GR,
>     ROUND(scisql_fluxToAbMag(rFlux_PS)-scisql_fluxToAbMag(iFlux_PS), 0)
> AS RI,
>     ROUND(scisql_fluxToAbMag(iFlux_PS)-scisql_fluxToAbMag(zFlux_PS), 0)
> AS IZ,
>     ROUND(scisql_fluxToAbMag(zFlux_PS)-scisql_fluxToAbMag(yFlux_PS), 0) AS ZY
> FROM Object
> WHERE
>     scisql_fluxToAbMag(uFlux_PS)-scisql_fluxToAbMag(gFlux_PS) +
>     scisql_fluxToAbMag(gFlux_PS)-scisql_fluxToAbMag(rFlux_PS) +
>     scisql_fluxToAbMag(rFlux_PS)-scisql_fluxToAbMag(iFlux_PS) +
>     scisql_fluxToAbMag(iFlux_PS)-scisql_fluxToAbMag(zFlux_PS) +
>     scisql_fluxToAbMag(zFlux_PS)-scisql_fluxToAbMag(yFlux_PS) < <x>;
Actually, this might work. It returns no rows with <x> = 1. It probably 
helped that I cleaned up some of the alias handling for aggregation. 
Don't have time to test it right now.

code on u/danielw/sScan5

-Daniel

########################################################################
Use REPLY-ALL to reply to list

To unsubscribe from the QSERV-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1