Daniel, I just figure out why my copy of master is hosed. master expects CSS metadata in JSON format, and I still have the old key-value stuff in zookeeper. Is there a docs page on the expected structure of the JSON so I can translate it? Cheers, Serge On Jan 8, 2015, at 12:26 AM, Serge Monkewitz <[log in to unmask]> wrote: > Hey all, > > The xrootd death happened because Tatiana had a small typo in her query (scisql_s2PtInPoly should be scisql_s2PtInCPoly). As Fritz surmised, this led to an uncaught std::runtime_error and subsequent worker death. Since Fritz is in the process of fixing this, hopefully we can just wait for his changes to land. > > However, there are bigger problems. First of all, master doesn’t build for me. I get this error: > > File "/home/lsstadm/qserv/SConstruct", line 104: > env.Alias("dist-core", get_install_targets()) > File "/home/lsstadm/qserv/SConstruct", line 90: > exports=['env', 'ARGUMENTS']) > File "/home/lsstadm/stack/Linux64/scons/2.3.0+1/lib/scons/SCons/Script/SConscript.py", line 609: > return method(*args, **kw) > File "/home/lsstadm/stack/Linux64/scons/2.3.0+1/lib/scons/SCons/Script/SConscript.py", line 546: > return _SConscript(self.fs, *files, **subst_kw) > File "/home/lsstadm/stack/Linux64/scons/2.3.0+1/lib/scons/SCons/Script/SConscript.py", line 260: > exec _file_ in call_stack[-1].globals > File "/home/lsstadm/qserv/build/SConscript", line 39: > canBuild = detect.checkMySql(env) and detect.setXrootd(env) and detect.checkXrootdLink(env) > File "/home/lsstadm/qserv/site_scons/detect.py", line 225: > xrdLibPath = findXrootdLibPath("XrdCl", env["LIBPATH"]) > File "/home/lsstadm/qserv/site_scons/detect.py", line 213: > if os.access(os.path.join(path, fName), os.R_OK): > File "/home/lsstadm/stack/Linux64/anaconda/2.1.0/lib/python2.7/posixpath.py", line 77: > elif path == '' or path.endswith('/'): > > which is caused by the fact that env[“LIBPATH”] looks like: > > [[], '/home/lsstadm/stack/Linux64/antlr/2.7.7/lib', '/home/lsstadm/stack/Linux64/boost/1.55.0.1.lsst2/lib', '/home/lsstadm/stack/Linux64/log4cxx/0.10.0.lsst1+2/lib', '/home/lsstadm/stack/Linux64/xrootd/4.0.0rc4-qsClient2/lib', '/home/lsstadm/stack/Linux64/zookeeper/3.4.6/c-binding/lib', '/home/lsstadm/stack/Linux64/mysql/5.1.65.lsst1/lib', '/home/lsstadm/stack/Linux64/protobuf/2.4.1/lib', '/home/lsstadm/stack/Linux64/log/10.0+3/lib'] > > The first element is [], which comes from https://github.com/LSST/qserv/blob/master/site_scons/state.py#L173 where a PathVariable called SWIG_SWIG_LIB is given a default value of []. I can fix the build by changing the default to an empty string… but I don’t know enough scons to say whether that’s the right thing to do. Can one of the scons gurus confirm that’s the right fix? > > More serious is that after having hacked around that, even the simplest count(*) query just hangs forever. I see this in the worker log: > > ------ xrootd [log in to unmask]:1094 initialization completed. > 150108 00:11:15 23867 XrootdXeq: lsstadm.23942:[log in to unmask] pvt IPv4 login > 0108 00:11:15.598 [0x7f716850a700] INFO root (build/xrdsvc/SsiService.cc:88) - Got provision call where rName is: /chk/LSST/6970 > 0108 00:11:15.599 [0x7f716850a700] INFO root (build/xrdsvc/SsiService.cc:88) - Got provision call where rName is: /chk/LSST/7138 > 0108 00:11:15.638 [0x7f716850a700] INFO root (build/xrdsvc/SsiService.cc:88) - Got provision call where rName is: /chk/LSST/7140 > 0108 00:11:15.638 [0x7f716850a700] INFO root (build/xrdsvc/SsiService.cc:88) - Got provision call where rName is: /chk/LSST/7308 > 0108 00:11:15.638 [0x7f716850a700] INFO root (build/xrdsvc/SsiService.cc:88) - Got provision call where rName is: /chk/LSST/7310 > 0108 00:11:15.639 [0x7f716850a700] INFO root (build/xrdsvc/SsiSession.cc:131) - ProcessRequest, service=/chk/LSST/6970 > 0108 00:11:15.639 [0x7f716850a700] INFO root (build/xrdsvc/SsiSession.cc:135) - BindRequest took 2e-06 seconds > 0108 00:11:15.639 [0x7f716850a700] INFO root (build/xrdsvc/SsiSession.cc:142) - GetRequest took 1e-06 seconds > 0108 00:11:15.639 [0x7f716850a700] INFO root (build/xrdsvc/SsiSession.cc:144) - ### 127 byte request: \x08\x01\x12\x04LSST\x18?6*p\x0a@SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_6970 AS QST_1_\x1a*r_18f199f87464a85cd976643d8bec45a7a_6970_0"\x008\x02 > 0108 00:11:15.639 [0x7f716850a700] WARN root (build/xrdsvc/SsiSession.cc:148) - WARNING: unowned chunk query detected: /chk/LSST/6970 > 150108 00:11:15 23867 ssi_DoIt: lsstadm.23942:[log in to unmask] Invalid session process return!! > 0108 00:11:15.639 [0x7f7160467700] INFO root (build/xrdsvc/SsiSession.cc:131) - ProcessRequest, service=/chk/LSST/7138 > 0108 00:11:15.639 [0x7f7160467700] INFO root (build/xrdsvc/SsiSession.cc:135) - BindRequest took 1e-06 seconds > 0108 00:11:15.639 [0x7f7160164700] INFO root (build/xrdsvc/SsiSession.cc:131) - ProcessRequest, service=/chk/LSST/7308 > 0108 00:11:15.640 [0x7f7160164700] INFO root (build/xrdsvc/SsiSession.cc:135) - BindRequest took 0 seconds > 0108 00:11:15.640 [0x7f7160164700] INFO root (build/xrdsvc/SsiSession.cc:142) - GetRequest took 1e-06 seconds > 0108 00:11:15.640 [0x7f7160164700] INFO root (build/xrdsvc/SsiSession.cc:144) - ### 127 byte request: \x08\x01\x12\x04LSST\x18?9*p\x0a@SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7308 AS QST_1_\x1a*r_18f199f87464a85cd976643d8bec45a7a_7308_0"\x008\x02 > 0108 00:11:15.640 [0x7f7160164700] WARN root (build/xrdsvc/SsiSession.cc:148) - WARNING: unowned chunk query detected: /chk/LSST/7308 > 150108 00:11:15 23972 ssi_DoIt: lsstadm.23942:[log in to unmask] Invalid session process return!! > 0108 00:11:15.639 [0x7f7160366700] INFO root (build/xrdsvc/SsiSession.cc:131) - ProcessRequest, service=/chk/LSST/7310 > 0108 00:11:15.640 [0x7f7160366700] INFO root (build/xrdsvc/SsiSession.cc:135) - BindRequest took 1e-06 seconds > 0108 00:11:15.640 [0x7f7160467700] INFO root (build/xrdsvc/SsiSession.cc:142) - GetRequest took 1e-06 seconds > 0108 00:11:15.640 [0x7f7160366700] INFO root (build/xrdsvc/SsiSession.cc:142) - GetRequest took 1e-06 seconds > 0108 00:11:15.640 [0x7f7160467700] INFO root (build/xrdsvc/SsiSession.cc:144) - ### 127 byte request: \x08\x01\x12\x04LSST\x18?7*p\x0a@SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7138 AS QST_1_\x1a*r_18f199f87464a85cd976643d8bec45a7a_7138_0"\x008\x02 > 0108 00:11:15.640 [0x7f7160366700] INFO root (build/xrdsvc/SsiSession.cc:144) - ### 127 byte request: \x08\x01\x12\x04LSST\x18?9*p\x0a@SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7310 AS QST_1_\x1a*r_18f199f87464a85cd976643d8bec45a7a_7310_0"\x008\x02 > 0108 00:11:15.640 [0x7f7160467700] WARN root (build/xrdsvc/SsiSession.cc:148) - WARNING: unowned chunk query detected: /chk/LSST/7138 > 150108 00:11:15 23969 ssi_DoIt: lsstadm.23942:[log in to unmask] Invalid session process return!! > 0108 00:11:15.640 [0x7f7160366700] WARN root (build/xrdsvc/SsiSession.cc:148) - WARNING: unowned chunk query detected: /chk/LSST/7310 > 150108 00:11:15 23970 ssi_DoIt: lsstadm.23942:[log in to unmask] Invalid session process return!! > 0108 00:11:15.679 [0x7f716850a700] INFO root (build/xrdsvc/SsiSession.cc:131) - ProcessRequest, service=/chk/LSST/7140 > 0108 00:11:15.679 [0x7f716850a700] INFO root (build/xrdsvc/SsiSession.cc:135) - BindRequest took 1e-06 seconds > 0108 00:11:15.679 [0x7f716850a700] INFO root (build/xrdsvc/SsiSession.cc:142) - GetRequest took 1e-06 seconds > 0108 00:11:15.679 [0x7f716850a700] INFO root (build/xrdsvc/SsiSession.cc:144) - ### 127 byte request: \x08\x01\x12\x04LSST\x18?7*p\x0a@SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7140 AS QST_1_\x1a*r_18f199f87464a85cd976643d8bec45a7a_7140_0"\x008\x02 > 0108 00:11:15.679 [0x7f716850a700] WARN root (build/xrdsvc/SsiSession.cc:148) - WARNING: unowned chunk query detected: /chk/LSST/7140 > 150108 00:11:15 23867 ssi_DoIt: lsstadm.23942:[log in to unmask] Invalid session process return!! > > and this in the czar log: > > 0108 00:11:15.588 [0x7f3b34c8c700] DEBUG root (build/qdisp/MessageStore.cc:49) - Msg: -1 100 Dispatch Query. > 0108 00:11:15.588 [0x7f3b34c8c700] DEBUG root (build/ccontrol/userQueryProxy.cc:118) - EXECUTING UserQuery_submit(1) > 0108 00:11:15.588 [0x7f3b34c8c700] INFO root (build/ccontrol/UserQuery.cc:270) - UserQuery::_setupMerger() > 0108 00:11:15.589 [0x7f3b34c8c700] INFO root (build/ccontrol/UserQuery.cc:176) - UserQuery beginning submission > 0108 00:11:15.589 [0x7f3b34c8c700] INFO root (build/qproc/QuerySession.cc:361) - Non-subchunked > 0108 00:11:15.589 [0x7f3b34c8c700] DEBUG root (build/qproc/QuerySession.cc:381) - returning queries: > 0108 00:11:15.589 [0x7f3b34c8c700] DEBUG root (build/qproc/QuerySession.cc:383) - SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_6970 AS QST_1_ > 0108 00:11:15.589 [0x7f3b34c8c700] DEBUG root (build/qproc/TaskMsgFactory2.cc:151) - no nextFragment > 0108 00:11:15.589 [0x7f3b34c8c700] DEBUG root (build/qproc/TaskMsgFactory2.cc:153) - SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_6970 AS QST_1_ > 0108 00:11:15.589 [0x7f3b34c8c700] DEBUG root (build/qdisp/Executive.cc:396) - Executive (0x18d64a0) tracking id=1 > 0108 00:11:15.589 [0x7f3b34c8c700] INFO root (build/qdisp/Executive.cc:172) - Exec add pth=/chk/LSST/6970 > 0108 00:11:15.589 [0x7f3b34c8c700] DEBUG root (build/qdisp/MessageStore.cc:49) - Msg: 6970 1200 Exec add pth=/chk/LSST/6970 > Opening xroot://127.0.0.1:1094//chk/LSST/6970 > 0108 00:11:15.597 [0x7f3b34c8c700] DEBUG root (build/qdisp/Executive.cc:337) - Provision was ok > 0108 00:11:15.597 [0x7f3b34c8c700] INFO root (build/qproc/QuerySession.cc:361) - Non-subchunked > 0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qproc/QuerySession.cc:381) - returning queries: > 0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qproc/QuerySession.cc:383) - SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7138 AS QST_1_ > 0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qproc/TaskMsgFactory2.cc:151) - no nextFragment > 0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qproc/TaskMsgFactory2.cc:153) - SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7138 AS QST_1_ > 0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qdisp/Executive.cc:396) - Executive (0x18d64a0) tracking id=2 > 0108 00:11:15.598 [0x7f3b34c8c700] INFO root (build/qdisp/Executive.cc:172) - Exec add pth=/chk/LSST/7138 > 0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qdisp/MessageStore.cc:49) - Msg: 7138 1200 Exec add pth=/chk/LSST/7138 > Opening xroot://127.0.0.1:1094//chk/LSST/7138 > 0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qdisp/Executive.cc:337) - Provision was ok > 0108 00:11:15.598 [0x7f3b34c8c700] INFO root (build/qproc/QuerySession.cc:361) - Non-subchunked > 0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qproc/QuerySession.cc:381) - returning queries: > 0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qproc/QuerySession.cc:383) - SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7140 AS QST_1_ > 0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qproc/TaskMsgFactory2.cc:151) - no nextFragment > 0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qproc/TaskMsgFactory2.cc:153) - SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7140 AS QST_1_ > 0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qdisp/Executive.cc:396) - Executive (0x18d64a0) tracking id=3 > 0108 00:11:15.599 [0x7f3b34c8c700] INFO root (build/qdisp/Executive.cc:172) - Exec add pth=/chk/LSST/7140 > 0108 00:11:15.599 [0x7f3b34c8c700] DEBUG root (build/qdisp/MessageStore.cc:49) - Msg: 7140 1200 Exec add pth=/chk/LSST/7140 > Opening xroot://127.0.0.1:1094//chk/LSST/7140 > 0108 00:11:15.599 [0x7f3b2cdfa700] INFO root (build/qdisp/QueryResource.cc:55) - Provision done > 0108 00:11:15.599 [0x7f3b1bfff700] INFO root (build/qdisp/QueryResource.cc:55) - Provision done > 0108 00:11:15.599 [0x7f3b2cdfa700] INFO root (build/qdisp/QueryRequest.cc:87) - New QueryRequest with payload(127) > 0108 00:11:15.599 [0x7f3b1bfff700] INFO root (build/qdisp/QueryRequest.cc:87) - New QueryRequest with payload(127) > 0108 00:11:15.599 [0x7f3b2cdfa700] DEBUG root (build/qdisp/QueryRequest.cc:97) - Requesting [127]LSST?6*p > @SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_6970 AS QST_1_*r_18f199f87464a85cd976643d8bec45a7a_6970_0"Task=0x7f3b0c000eb0 processing id=0 > 0108 00:11:15.599 [0x7f3b1bfff700] DEBUG root (build/qdisp/QueryRequest.cc:97) - Requesting [127]LSST?7*p > @SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7138 AS QST_1_*r_18f199f87464a85cd976643d8bec45a7a_7138_0"Task=0x7f3b10002c60 processing id=0 > 0108 00:11:15.599 [0x7f3b34c8c700] DEBUG root (build/qdisp/Executive.cc:337) - Provision was ok > 0108 00:11:15.599 [0x7f3b34c8c700] INFO root (build/qproc/QuerySession.cc:361) - Non-subchunked > 0108 00:11:15.599 [0x7f3b34c8c700] DEBUG root (build/qproc/QuerySession.cc:381) - returning queries: > 0108 00:11:15.599 [0x7f3b34c8c700] DEBUG root (build/qproc/QuerySession.cc:383) - SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7308 AS QST_1_ > 0108 00:11:15.599 [0x7f3b34c8c700] DEBUG root (build/qproc/TaskMsgFactory2.cc:151) - no nextFragment > 0108 00:11:15.599 [0x7f3b34c8c700] DEBUG root (build/qproc/TaskMsgFactory2.cc:153) - SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7308 AS QST_1_ > 0108 00:11:15.600 [0x7f3b34c8c700] DEBUG root (build/qdisp/Executive.cc:396) - Executive (0x18d64a0) tracking id=4 > 0108 00:11:15.600 [0x7f3b34c8c700] INFO root (build/qdisp/Executive.cc:172) - Exec add pth=/chk/LSST/7308 > 0108 00:11:15.600 [0x7f3b34c8c700] DEBUG root (build/qdisp/MessageStore.cc:49) - Msg: 7308 1200 Exec add pth=/chk/LSST/7308 > Opening xroot://127.0.0.1:1094//chk/LSST/7308 > 0108 00:11:15.600 [0x7f3b34c8c700] DEBUG root (build/qdisp/Executive.cc:337) - Provision was ok > 0108 00:11:15.600 [0x7f3b34c8c700] INFO root (build/qproc/QuerySession.cc:361) - Non-subchunked > 0108 00:11:15.600 [0x7f3b34c8c700] DEBUG root (build/qproc/QuerySession.cc:381) - returning queries: > 0108 00:11:15.600 [0x7f3b34c8c700] DEBUG root (build/qproc/QuerySession.cc:383) - SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7310 AS QST_1_ > 0108 00:11:15.600 [0x7f3b34c8c700] DEBUG root (build/qproc/TaskMsgFactory2.cc:151) - no nextFragment > 0108 00:11:15.600 [0x7f3b34c8c700] DEBUG root (build/qproc/TaskMsgFactory2.cc:153) - SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7310 AS QST_1_ > 0108 00:11:15.600 [0x7f3b34c8c700] DEBUG root (build/qdisp/Executive.cc:396) - Executive (0x18d64a0) tracking id=5 > 0108 00:11:15.600 [0x7f3b34c8c700] INFO root (build/qdisp/Executive.cc:172) - Exec add pth=/chk/LSST/7310 > 0108 00:11:15.600 [0x7f3b34c8c700] DEBUG root (build/qdisp/MessageStore.cc:49) - Msg: 7310 1200 Exec add pth=/chk/LSST/7310 > Opening xroot://127.0.0.1:1094//chk/LSST/7310 > 0108 00:11:15.600 [0x7f3b34c8c700] DEBUG root (build/qdisp/Executive.cc:337) - Provision was ok > 0108 00:11:15.601 [0x7f3b34c8c700] INFO root (app.py:565) - Query dispatch (1) took 0.015103 seconds > 0108 00:11:15.601 [0x7f3b34c8c700] INFO root (build/qdisp/Executive.cc:483) - Still 5 in flight. > Task 0x7f3b0c000eb0 sess=ok Status = 1 isWrite > Task Handler calling RelBuff. > 0108 00:11:15.638 [0x7f3b2cdfa700] DEBUG root (build/qdisp/QueryRequest.cc:103) - Early release of request buffer > Task Handler calling trunc. > Task 0x7f3b10002c60 sess=ok Status = 1 isWrite > Task Handler calling RelBuff. > 0108 00:11:15.638 [0x7f3b2cdfa700] DEBUG root (build/qdisp/QueryRequest.cc:103) - Early release of request buffer > Task Handler calling trunc. > 0108 00:11:15.638 [0x7f3b1bfff700] INFO root (build/qdisp/QueryResource.cc:55) - Provision done > 0108 00:11:15.638 [0x7f3b1bfff700] INFO root (build/qdisp/QueryRequest.cc:87) - New QueryRequest with payload(127) > 0108 00:11:15.638 [0x7f3b1bfff700] DEBUG root (build/qdisp/QueryRequest.cc:97) - Requesting [127]LSST?7*p > @SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7140 AS QST_1_*r_18f199f87464a85cd976643d8bec45a7a_7140_0"Task=0x7f3b10000a700108 00:11:15.638 [0x7f3b2cdfa700] INFO root > (build/qdisp/QueryResource.cc:55) - Provision done > 0108 00:11:15.638 [0x7f3b2cdfa700] INFO root (build/qdisp/QueryRequest.cc:87) - New QueryRequest with payload(127) > 0108 00:11:15.639 [0x7f3b2cdfa700] DEBUG root (build/qdisp/QueryRequest.cc:97) - Requesting [127]LSST?9*p > @SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7308 AS QST_1_*r_18f199f87464a85cd976643d8bec45a7a_7308_0"Task=0x7f3b0c001aa0 processing id=0 > 0108 00:11:15.639 [0x7f3b2cdfa700] INFO root (build/qdisp/QueryResource.cc:55) - Provision done > 0108 00:11:15.639 [0x7f3b2cdfa700] INFO root (build/qdisp/QueryRequest.cc:87) - New QueryRequest with payload(127) > 0108 00:11:15.639 [0x7f3b2cdfa700] DEBUG root (build/qdisp/QueryRequest.cc:97) - Requesting [127]LSST?9*p > @SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7310 AS QST_1_*r_18f199f87464a85cd976643d8bec45a7a_7310_0"Task=0x7f3b0c001fd0 processing id=0 > processing id=0 > Task 0x7f3b0c001aa0 sess=ok Status = 1 isWrite > Task Handler calling RelBuff. > 0108 00:11:15.639 [0x7f3b1b5fe700] DEBUG root (build/qdisp/QueryRequest.cc:103) - Early release of request buffer > Task Handler calling trunc. > Task 0x7f3b0c001fd0 sess=ok Status = 1 isWrite > Task Handler calling RelBuff. > 0108 00:11:15.639 [0x7f3b2cdfa700] DEBUG root (build/qdisp/QueryRequest.cc:103) - Early release of request buffer > Task Handler calling trunc. > Task 0x7f3b10000a70 sess=ok Status = 1 isWrite > Task Handler calling RelBuff. > 0108 00:11:15.679 [0x7f3b1b5fe700] DEBUG root (build/qdisp/QueryRequest.cc:103) - Early release of request buffer > Task Handler calling trunc. > > Can anyone else confirm they have a working tip of master? Based on the log files, is there anything I should be looking at? > > Thanks for any help, > Serge > > Use REPLY-ALL to reply to list > > To unsubscribe from the QSERV-L list, click the following link: > https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1 > ######################################################################## Use REPLY-ALL to reply to list To unsubscribe from the QSERV-L list, click the following link: https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1