Hey guys,

If, on looking into this, you find that the cause is an uncaught/unprocessed error-handling path after a failed query on the xrootd node, I have an almost-ready-to-go change from before the break that addresses that, along with some attendant issues in reporting errors back up the chain.

Let me know if that turns out to be the case, and I can at least put up what I have in its current state for people to borrow from.  (I had wanted to add regression/unit tests and clean up a little before putting it up for general review.)

  --FritzM.

> On Jan 7, 2015, at 9:58 PM, Serge Monkewitz <[log in to unmask]> wrote:
> 
> Jacek,
> 
>    Try logging in to lsst-db1.ipac.caltech.edu. If you can get in, the stack is in /home/lsstadm/stack, qserv is in /home/lsstadm/qserv, and the run directory is /db1/run/qserv-master.
> 
> However, note that Tatiana also requested that I upgrade qserv to tip of master. I tried, but master doesn’t build for me. I’m in the process of installing a fresh stack to see if it helps. Once I’m up and running again, I’ll see if we can reproduce with the latest code.
> 
> Cheers,
> Serge
> 
>> On Jan 7, 2015, at 9:49 PM, Jacek Becla <[log in to unmask]> wrote:
>> 
>> Hmm.... I have an account at ipac now, so
>> a) if you can tell me how to reproduce it (e.g., where
>>   do I find qserv you are talking to etc), and
>> b) if someone tells me where to log in (I didn't try
>>   to log in to ipac machines yet),
>> 
>> I will look into that.
>> 
>> Jacek
>> 
>> 
>> 
>>> On 01/07/2015 04:41 PM, Tatiana Goldina wrote:
>>> Hello,
>>> 
>>> I noticed that xrootd is failing quite consistently when I am trying to constrain an area to a polygon:
>>>    select * from DeepSource where scisql_s2PtInPoly(ra, decl, 0.4, 1.0, 0.4, 1.2, 0.6, 1.2, 0.6, 1.0) = 1;
>>> 
>>> Last time I was trying various spatial constraints. xrootd died at the polygon test.
>>> 
>>> Are there any pointers to what is wrong?
>>> 
>>> Thank you!
>>> 
>>> Tatiana
>>> 
>>> 1. select * from DeepSource where scisql_s2PtInCircle(ra, decl, 0.5, 1.1, 0.2) = 1;
>>> ERROR 4120 (Proxy): Error during execution:
>>> -1 Ref=1 Resource(/chk/LSST/6970): 20150107-15:59:54, Complete (success), 0,
>>> Ref=2 Resource(/chk/LSST/7138): 20150107-15:59:54, Complete (success), 0,
>>> Ref=3 Resource(/chk/LSST/7140): 20150107-15:59:54, Complete (success), 0,
>>> Ref=4 Resource(/chk/LSST/7308) (-1)
>>> 
>>> 2. select * from DeepSource where scisql_s2PtInBox(ra, decl, 0.4, 1.0, 0.6, 1.2) = 1;
>>> 2027 rows in set (0.53 sec)
>>> 
>>> 3. select * from DeepSource where scisql_s2PtInPoly(ra, decl, 0.4, 1.0, 0.4, 1.2, 0.6, 1.2, 0.6, 1.0) = 1;
>>> never returned
>>> 
>>> The report from qserv-status.sh
>>> 
>>> INFO: Qserv execution directory : /db1/run/qserv-master
>>> MySQL running (4049)                                       [  OK  ]
>>> xrootd is dead but PID file exists                         [FAILED]
>>>  see /db1/run/qserv-master/var/run/worker/xrootd.pid
>>> ZooKeeper daemon is running (4110)                         [  OK  ]
>>> mysql-proxy is running (4137)                              [  OK  ]
>>> qserv-czar is running (4149)                               [  OK  ]
>>> 
>>> This is the tail of var/log/qserv-czar.log
>>> 
>>> 0107 16:07:02.153 [0x7f6552bfd700] DEBUG root (build/qdisp/QueryRequest.cc:97) - Requesting [193]LSST?9"LSST.DeepSource*?
>>> oSELECT * FROM LSST.DeepSource_7310 AS QST_1_ WHERE scisql_s2PtInPoly(ra,decl,0.4,1.0,0.4,1.2,0.6,1.2,0.6,1.0)=1+r_18be7db4d2025f994bda94973d3b4770d5_7310_0"Task=0x7f653c234480 processing id=0
>>> Task 0x7f653cae9c00 sess=ok Status = 1 isWrite
>>> Task Handler calling RelBuff.
>>> 0107 16:07:02.153 [0x7f6552bfd700] DEBUG root (build/qdisp/QueryRequest.cc:103) - Early release of request buffer
>>> Task Handler calling trunc.
>>> Task 0x7f653c234480 sess=ok Status = 1 isWrite
>>> Task Handler calling RelBuff.
>>> 0107 16:07:02.154 [0x7f65535fe700] DEBUG root (build/qdisp/QueryRequest.cc:103) - Early release of request buffer
>>> Task Handler calling trunc.
>>> Task 0x7f653c234480 sess=ok Status = 0 isSync
>>> 0107 16:15:02.522 [0x7f65657fb700] ERROR root (build/ccontrol/MergingRequester.cc:129) - Error receiving result.
>>> 0107 16:15:02.522 [0x7f65657fb700] DEBUG root (build/qdisp/QueryRequest.cc:208) - Error finish
>>> Session Complete: cancel=0 task=ok
>>> Task 0x7f653c234480 Kill status = isDone mhpend=0
>>> RelTask dodel=0 id=0
>>> 0107 16:15:02.522 [0x7f65657fb700] INFO  root (build/qdisp/QueryRequest.cc:213) - Request::Finished() with error (clean).
>>> Opening xroot://127.0.0.1:1094//chk/LSST/7310
>>> 0107 16:15:02.522 [0x7f65657fb700] ERROR root (build/qdisp/Executive.cc:328) - Resource provision error /chk/LSST/7310
>>> 0107 16:15:02.522 [0x7f65657fb700] INFO  root (build/qdisp/Executive.cc:416) - Executive (0x2f16380) UNTRACKING id=5
>>> 0107 16:15:02.522 [0x7f65657fb700] DEBUG root (build/qdisp/Executive.cc:334) - Provision was ok
>>> 0107 16:15:07.124 [0x7f65675fe700] INFO  root (build/qdisp/Executive.cc:480) - Still 4 in flight.
>>> ########################################################################
>>> Use REPLY-ALL to reply to list
>>> 
>>> To unsubscribe from the QSERV-L list, click the following link:
>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1