Hey guys,
If, on looking into this, you find that the cause is an uncaught/unprocessed error-handling path after a failed query on the xrootd node, I have an almost-ready-to-go change from before the break that addresses that, along with some attendant issues in reporting errors back up the chain.
Let me know if that turns out to be the case, and I can at least put up what I have in its current state for people to borrow from. (I had wanted to add regression/unit tests and clean up a little before putting it up for general review.)
--FritzM.
> On Jan 7, 2015, at 9:58 PM, Serge Monkewitz wrote:
>
> Jacek,
>
> Try logging in to lsst-db1.ipac.caltech.edu. If you can get in, the stack is in /home/lsstadm/stack, qserv is in /home/lsstadm/qserv, and the run directory is /db1/run/qserv-master.
>
> However, note that Tatiana also requested that I upgrade qserv to tip of master. I tried, but master doesn’t build for me. I’m in the process of installing a fresh stack to see if it helps. Once I’m up and running again, I’ll see if we can reproduce with the latest code.
>
> Cheers,
> Serge
>
>> On Jan 7, 2015, at 9:49 PM, Jacek Becla wrote:
>>
>> Hmm... I have an account at ipac now, so
>> a) if you can tell me how to reproduce it (e.g., where
>> do I find the qserv you are talking to, etc.), and
>> b) if someone tells me where to log in (I didn't try
>> to log in to ipac machines yet),
>>
>> I will look into that.
>>
>> Jacek
>>
>>
>>
>>> On 01/07/2015 04:41 PM, Tatiana Goldina wrote:
>>> Hello,
>>>
>>> I noticed that xrootd fails quite consistently when I try to constrain an area to a polygon:
>>> select * from DeepSource where scisql_s2PtInPoly(ra, decl, 0.4, 1.0, 0.4, 1.2, 0.6, 1.2, 0.6, 1.0) = 1;
>>>
>>> Last time I was trying various spatial constraints. xrootd died at the polygon test.
>>>
>>> Are there any pointers to what is wrong?
>>>
>>> Thank you!
>>>
>>> Tatiana
>>>
>>> 1. select * from DeepSource where scisql_s2PtInCircle(ra, decl, 0.5, 1.1, 0.2) = 1;
>>> ERROR 4120 (Proxy): Error during execution:
>>> -1 Ref=1 Resource(/chk/LSST/6970): 20150107-15:59:54, Complete (success), 0,
>>> Ref=2 Resource(/chk/LSST/7138): 20150107-15:59:54, Complete (success), 0,
>>> Ref=3 Resource(/chk/LSST/7140): 20150107-15:59:54, Complete (success), 0,
>>> Ref=4 Resource(/chk/LSST/7308) (-1)
>>>
>>> 2. select * from DeepSource where scisql_s2PtInBox(ra, decl, 0.4, 1.0, 0.6, 1.2) = 1;
>>> 2027 rows in set (0.53 sec)
>>>
>>> 3. select * from DeepSource where scisql_s2PtInPoly(ra, decl, 0.4, 1.0, 0.4, 1.2, 0.6, 1.2, 0.6, 1.0) = 1;
>>> never returned
>>>
>>> The report from qserv-status.sh
>>>
>>> INFO: Qserv execution directory : /db1/run/qserv-master
>>> MySQL running (4049) [ OK ]
>>> xrootd is dead but PID file exists [FAILED]
>>> see /db1/run/qserv-master/var/run/worker/xrootd.pid
>>> ZooKeeper daemon is running (4110) [ OK ]
>>> mysql-proxy is running (4137) [ OK ]
>>> qserv-czar is running (4149) [ OK ]
>>>
>>> This is the tail of var/log/qserv-czar.log
>>>
>>> 0107 16:07:02.153 [0x7f6552bfd700] DEBUG root (build/qdisp/QueryRequest.cc:97) - Requesting [193]LSST?9"LSST.DeepSource*?
>>> oSELECT * FROM LSST.DeepSource_7310 AS QST_1_ WHERE scisql_s2PtInPoly(ra,decl,0.4,1.0,0.4,1.2,0.6,1.2,0.6,1.0)=1+r_18be7db4d2025f994bda94973d3b4770d5_7310_0"Task=0x7f653c234480 processing id=0
>>> Task 0x7f653cae9c00 sess=ok Status = 1 isWrite
>>> Task Handler calling RelBuff.
>>> 0107 16:07:02.153 [0x7f6552bfd700] DEBUG root (build/qdisp/QueryRequest.cc:103) - Early release of request buffer
>>> Task Handler calling trunc.
>>> Task 0x7f653c234480 sess=ok Status = 1 isWrite
>>> Task Handler calling RelBuff.
>>> 0107 16:07:02.154 [0x7f65535fe700] DEBUG root (build/qdisp/QueryRequest.cc:103) - Early release of request buffer
>>> Task Handler calling trunc.
>>> Task 0x7f653c234480 sess=ok Status = 0 isSync
>>> 0107 16:15:02.522 [0x7f65657fb700] ERROR root (build/ccontrol/MergingRequester.cc:129) - Error receiving result.
>>> 0107 16:15:02.522 [0x7f65657fb700] DEBUG root (build/qdisp/QueryRequest.cc:208) - Error finish
>>> Session Complete: cancel=0 task=ok
>>> Task 0x7f653c234480 Kill status = isDone mhpend=0
>>> RelTask dodel=0 id=0
>>> 0107 16:15:02.522 [0x7f65657fb700] INFO root (build/qdisp/QueryRequest.cc:213) - Request::Finished() with error (clean).
>>> Opening xroot://127.0.0.1:1094//chk/LSST/7310
>>> 0107 16:15:02.522 [0x7f65657fb700] ERROR root (build/qdisp/Executive.cc:328) - Resource provision error /chk/LSST/7310
>>> 0107 16:15:02.522 [0x7f65657fb700] INFO root (build/qdisp/Executive.cc:416) - Executive (0x2f16380) UNTRACKING id=5
>>> 0107 16:15:02.522 [0x7f65657fb700] DEBUG root (build/qdisp/Executive.cc:334) - Provision was ok
>>> 0107 16:15:07.124 [0x7f65675fe700] INFO root (build/qdisp/Executive.cc:480) - Still 4 in flight.
>>> ########################################################################
>>> Use REPLY-ALL to reply to list
>>>
>>> To unsubscribe from the QSERV-L list, click the following link:
>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1