Hey guys,

If, on looking into this, you find that the cause is an uncaught/unprocessed error-handling path after a failed query on the xrootd node, I have an almost-ready-to-go change from before the break that addresses that, along with some attendant reporting issues back up the chain. Let me know if that turns out to be the case, and I can at least put up what I have in its current state for people to borrow from. (I had wanted to add regression/unit tests and clean up a little before putting it up for general review.)

--FritzM.

> On Jan 7, 2015, at 9:58 PM, Serge Monkewitz <[log in to unmask]> wrote:
>
> Jacek,
>
> Try logging in to lsst-db1.ipac.caltech.edu. If you can get in, the stack is in /home/lsstadm/stack, qserv is in /home/lsstadm/qserv, and the run directory is /db1/run/qserv-master.
>
> However, note that Tatiana also requested that I upgrade qserv to tip of master. I tried, but master doesn’t build for me. I’m in the process of installing a fresh stack to see if it helps. Once I’m up and running again, I’ll see if we can reproduce with the latest code.
>
> Cheers,
> Serge
>
>> On Jan 7, 2015, at 9:49 PM, Jacek Becla <[log in to unmask]> wrote:
>>
>> Hmm.... I have an account at ipac now, so
>> a) if you can tell me how to reproduce it (e.g., where
>> do I find qserv you are talking to etc), and
>> b) if someone tells me where to log in (I didn't try
>> to log in to ipac machines yet),
>>
>> I will look into that.
>>
>> Jacek
>>
>>
>>
>>> On 01/07/2015 04:41 PM, Tatiana Goldina wrote:
>>> Hello,
>>>
>>> I noticed that xrootd is failing quite consistently when I am trying to constrain an area to a polygon:
>>> select * from DeepSource where scisql_s2PtInPoly(ra, decl, 0.4, 1.0, 0.4, 1.2, 0.6, 1.2, 0.6, 1.0) = 1;
>>>
>>> Last time I was trying various spatial constraints. xrootd died at the polygon test.
>>>
>>> Are there any pointers to what is wrong?
>>>
>>> Thank you!
>>>
>>> Tatiana
>>>
>>> 1. select * from DeepSource where scisql_s2PtInCircle(ra, decl, 0.5, 1.1, 0.2) = 1;
>>> ERROR 4120 (Proxy): Error during execution:
>>> -1 Ref=1 Resource(/chk/LSST/6970): 20150107-15:59:54, Complete (success), 0,
>>> Ref=2 Resource(/chk/LSST/7138): 20150107-15:59:54, Complete (success), 0,
>>> Ref=3 Resource(/chk/LSST/7140): 20150107-15:59:54, Complete (success), 0,
>>> Ref=4 Resource(/chk/LSST/7308) (-1)
>>>
>>> 2. select * from DeepSource where scisql_s2PtInBox(ra, decl, 0.4, 1.0, 0.6, 1.2) = 1;
>>> 2027 rows in set (0.53 sec)
>>>
>>> 3. select * from DeepSource where scisql_s2PtInPoly(ra, decl, 0.4, 1.0, 0.4, 1.2, 0.6, 1.2, 0.6, 1.0) = 1;
>>> never returned
>>>
>>> The report from qserv-status.sh
>>>
>>> INFO: Qserv execution directory : /db1/run/qserv-master
>>> MySQL running (4049) [ OK ]
>>> xrootd is dead but PID file exists [FAILED]
>>> see /db1/run/qserv-master/var/run/worker/xrootd.pid
>>> ZooKeeper daemon is running (4110) [ OK ]
>>> mysql-proxy is running (4137) [ OK ]
>>> qserv-czar is running (4149) [ OK ]
>>>
>>> This is the tail of var/log/qserv-czar.log
>>>
>>> 0107 16:07:02.153 [0x7f6552bfd700] DEBUG root (build/qdisp/QueryRequest.cc:97) - Requesting [193]LSST?9"LSST.DeepSource*?
>>> oSELECT * FROM LSST.DeepSource_7310 AS QST_1_ WHERE scisql_s2PtInPoly(ra,decl,0.4,1.0,0.4,1.2,0.6,1.2,0.6,1.0)=1+r_18be7db4d2025f994bda94973d3b4770d5_7310_0"Task=0x7
>>> f653c234480 processing id=0
>>> Task 0x7f653cae9c00 sess=ok Status = 1 isWrite
>>> Task Handler calling RelBuff.
>>> 0107 16:07:02.153 [0x7f6552bfd700] DEBUG root (build/qdisp/QueryRequest.cc:103) - Early release of request buffer
>>> Task Handler calling trunc.
>>> Task 0x7f653c234480 sess=ok Status = 1 isWrite
>>> Task Handler calling RelBuff.
>>> 0107 16:07:02.154 [0x7f65535fe700] DEBUG root (build/qdisp/QueryRequest.cc:103) - Early release of request buffer
>>> Task Handler calling trunc.
>>> Task 0x7f653c234480 sess=ok Status = 0 isSync
>>> 0107 16:15:02.522 [0x7f65657fb700] ERROR root (build/ccontrol/MergingRequester.cc:129) - Error receiving result.
>>> 0107 16:15:02.522 [0x7f65657fb700] DEBUG root (build/qdisp/QueryRequest.cc:208) - Error finish
>>> Session Complete: cancel=0 task=ok
>>> Task 0x7f653c234480 Kill status = isDone mhpend=0
>>> RelTask dodel=0 id=0
>>> 0107 16:15:02.522 [0x7f65657fb700] INFO root (build/qdisp/QueryRequest.cc:213) - Request::Finished() with error (clean).
>>> Opening xroot://127.0.0.1:1094//chk/LSST/7310
>>> 0107 16:15:02.522 [0x7f65657fb700] ERROR root (build/qdisp/Executive.cc:328) - Resource provision error /chk/LSST/7310
>>> 0107 16:15:02.522 [0x7f65657fb700] INFO root (build/qdisp/Executive.cc:416) - Executive (0x2f16380) UNTRACKING id=5
>>> 0107 16:15:02.522 [0x7f65657fb700] DEBUG root (build/qdisp/Executive.cc:334) - Provision was ok
>>> 0107 16:15:07.124 [0x7f65675fe700] INFO root (build/qdisp/Executive.cc:480) - Still 4 in flight.
>>> ########################################################################
>>> Use REPLY-ALL to reply to list
>>>
>>> To unsubscribe from the QSERV-L list, click the following link:
>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1