Print

Print


Hi Serge,

Thanks for this detailed report, master build fine on my machine but you're right there's something strange with LIBPATH, I check and let you know.

Cheers,

Fabrice

On 01/08/2015 12:26 AM, Serge Monkewitz wrote:
[log in to unmask]" type="cite">
Hey all,

    The xrootd death happened because Tatiana had a small typo in her query (scisql_s2PtInPoly should be scisql_s2PtInCPoly). As Fritz surmised, this led to an uncaught std::runtime_error and subsequent worker death. Since Fritz is in the process of fixing this, hopefully we can just wait for his changes to land.

However, there are bigger problems. First of all, master doesn’t build for me. I get this error:

  File "/home/lsstadm/qserv/SConstruct", line 104:
    env.Alias("dist-core", get_install_targets())
  File "/home/lsstadm/qserv/SConstruct", line 90:
    exports=['env', 'ARGUMENTS'])
  File "/home/lsstadm/stack/Linux64/scons/2.3.0+1/lib/scons/SCons/Script/SConscript.py", line 609:
    return method(*args, **kw)
  File "/home/lsstadm/stack/Linux64/scons/2.3.0+1/lib/scons/SCons/Script/SConscript.py", line 546:
    return _SConscript(self.fs, *files, **subst_kw)
  File "/home/lsstadm/stack/Linux64/scons/2.3.0+1/lib/scons/SCons/Script/SConscript.py", line 260:
    exec _file_ in call_stack[-1].globals
  File "/home/lsstadm/qserv/build/SConscript", line 39:
    canBuild = detect.checkMySql(env) and detect.setXrootd(env) and detect.checkXrootdLink(env)
  File "/home/lsstadm/qserv/site_scons/detect.py", line 225:
    xrdLibPath = findXrootdLibPath("XrdCl", env["LIBPATH"])
  File "/home/lsstadm/qserv/site_scons/detect.py", line 213:
    if os.access(os.path.join(path, fName), os.R_OK):
  File "/home/lsstadm/stack/Linux64/anaconda/2.1.0/lib/python2.7/posixpath.py", line 77:
    elif path == '' or path.endswith('/'):

which is caused by the fact that env[“LIBPATH”] looks like:

[[], '/home/lsstadm/stack/Linux64/antlr/2.7.7/lib', '/home/lsstadm/stack/Linux64/boost/1.55.0.1.lsst2/lib', '/home/lsstadm/stack/Linux64/log4cxx/0.10.0.lsst1+2/lib', '/home/lsstadm/stack/Linux64/xrootd/4.0.0rc4-qsClient2/lib', '/home/lsstadm/stack/Linux64/zookeeper/3.4.6/c-binding/lib', '/home/lsstadm/stack/Linux64/mysql/5.1.65.lsst1/lib', '/home/lsstadm/stack/Linux64/protobuf/2.4.1/lib', '/home/lsstadm/stack/Linux64/log/10.0+3/lib']

The first element is [], which comes from https://github.com/LSST/qserv/blob/master/site_scons/state.py#L173 where a PathVariable called SWIG_SWIG_LIB is given a default value of []. I can fix the build by changing the default to an empty string… but I don’t know enough scons to say whether that’s the right thing to do. Can one of the scons gurus confirm that’s the right fix?

More serious is that after having hacked around that, even the simplest count(*) query just hangs forever. I see this in the worker log:

------ xrootd [log in to unmask]:1094 initialization completed.
150108 00:11:15 23867 XrootdXeq: lsstadm.23942:[log in to unmask] pvt IPv4 login
0108 00:11:15.598 [0x7f716850a700] INFO  root (build/xrdsvc/SsiService.cc:88) - Got provision call where rName is: /chk/LSST/6970
0108 00:11:15.599 [0x7f716850a700] INFO  root (build/xrdsvc/SsiService.cc:88) - Got provision call where rName is: /chk/LSST/7138
0108 00:11:15.638 [0x7f716850a700] INFO  root (build/xrdsvc/SsiService.cc:88) - Got provision call where rName is: /chk/LSST/7140
0108 00:11:15.638 [0x7f716850a700] INFO  root (build/xrdsvc/SsiService.cc:88) - Got provision call where rName is: /chk/LSST/7308
0108 00:11:15.638 [0x7f716850a700] INFO  root (build/xrdsvc/SsiService.cc:88) - Got provision call where rName is: /chk/LSST/7310
0108 00:11:15.639 [0x7f716850a700] INFO  root (build/xrdsvc/SsiSession.cc:131) - ProcessRequest, service=/chk/LSST/6970
0108 00:11:15.639 [0x7f716850a700] INFO  root (build/xrdsvc/SsiSession.cc:135) - BindRequest took 2e-06 seconds
0108 00:11:15.639 [0x7f716850a700] INFO  root (build/xrdsvc/SsiSession.cc:142) - GetRequest took 1e-06 seconds
0108 00:11:15.639 [0x7f716850a700] INFO  root (build/xrdsvc/SsiSession.cc:144) - ### 127 byte request: \x08\x01\x12\x04LSST\x18?6*p\x0a@SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_6970 AS QST_1_\x1a*r_18f199f87464a85cd976643d8bec45a7a_6970_0"\x008\x02
0108 00:11:15.639 [0x7f716850a700] WARN  root (build/xrdsvc/SsiSession.cc:148) - WARNING: unowned chunk query detected: /chk/LSST/6970
150108 00:11:15 23867 ssi_DoIt: lsstadm.23942:[log in to unmask] Invalid session process return!!
0108 00:11:15.639 [0x7f7160467700] INFO  root (build/xrdsvc/SsiSession.cc:131) - ProcessRequest, service=/chk/LSST/7138
0108 00:11:15.639 [0x7f7160467700] INFO  root (build/xrdsvc/SsiSession.cc:135) - BindRequest took 1e-06 seconds
0108 00:11:15.639 [0x7f7160164700] INFO  root (build/xrdsvc/SsiSession.cc:131) - ProcessRequest, service=/chk/LSST/7308
0108 00:11:15.640 [0x7f7160164700] INFO  root (build/xrdsvc/SsiSession.cc:135) - BindRequest took 0 seconds
0108 00:11:15.640 [0x7f7160164700] INFO  root (build/xrdsvc/SsiSession.cc:142) - GetRequest took 1e-06 seconds
0108 00:11:15.640 [0x7f7160164700] INFO  root (build/xrdsvc/SsiSession.cc:144) - ### 127 byte request: \x08\x01\x12\x04LSST\x18?9*p\x0a@SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7308 AS QST_1_\x1a*r_18f199f87464a85cd976643d8bec45a7a_7308_0"\x008\x02
0108 00:11:15.640 [0x7f7160164700] WARN  root (build/xrdsvc/SsiSession.cc:148) - WARNING: unowned chunk query detected: /chk/LSST/7308
150108 00:11:15 23972 ssi_DoIt: lsstadm.23942:[log in to unmask] Invalid session process return!!
0108 00:11:15.639 [0x7f7160366700] INFO  root (build/xrdsvc/SsiSession.cc:131) - ProcessRequest, service=/chk/LSST/7310
0108 00:11:15.640 [0x7f7160366700] INFO  root (build/xrdsvc/SsiSession.cc:135) - BindRequest took 1e-06 seconds
0108 00:11:15.640 [0x7f7160467700] INFO  root (build/xrdsvc/SsiSession.cc:142) - GetRequest took 1e-06 seconds
0108 00:11:15.640 [0x7f7160366700] INFO  root (build/xrdsvc/SsiSession.cc:142) - GetRequest took 1e-06 seconds
0108 00:11:15.640 [0x7f7160467700] INFO  root (build/xrdsvc/SsiSession.cc:144) - ### 127 byte request: \x08\x01\x12\x04LSST\x18?7*p\x0a@SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7138 AS QST_1_\x1a*r_18f199f87464a85cd976643d8bec45a7a_7138_0"\x008\x02
0108 00:11:15.640 [0x7f7160366700] INFO  root (build/xrdsvc/SsiSession.cc:144) - ### 127 byte request: \x08\x01\x12\x04LSST\x18?9*p\x0a@SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7310 AS QST_1_\x1a*r_18f199f87464a85cd976643d8bec45a7a_7310_0"\x008\x02
0108 00:11:15.640 [0x7f7160467700] WARN  root (build/xrdsvc/SsiSession.cc:148) - WARNING: unowned chunk query detected: /chk/LSST/7138
150108 00:11:15 23969 ssi_DoIt: lsstadm.23942:[log in to unmask] Invalid session process return!!
0108 00:11:15.640 [0x7f7160366700] WARN  root (build/xrdsvc/SsiSession.cc:148) - WARNING: unowned chunk query detected: /chk/LSST/7310
150108 00:11:15 23970 ssi_DoIt: lsstadm.23942:[log in to unmask] Invalid session process return!!
0108 00:11:15.679 [0x7f716850a700] INFO  root (build/xrdsvc/SsiSession.cc:131) - ProcessRequest, service=/chk/LSST/7140
0108 00:11:15.679 [0x7f716850a700] INFO  root (build/xrdsvc/SsiSession.cc:135) - BindRequest took 1e-06 seconds
0108 00:11:15.679 [0x7f716850a700] INFO  root (build/xrdsvc/SsiSession.cc:142) - GetRequest took 1e-06 seconds
0108 00:11:15.679 [0x7f716850a700] INFO  root (build/xrdsvc/SsiSession.cc:144) - ### 127 byte request: \x08\x01\x12\x04LSST\x18?7*p\x0a@SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7140 AS QST_1_\x1a*r_18f199f87464a85cd976643d8bec45a7a_7140_0"\x008\x02
0108 00:11:15.679 [0x7f716850a700] WARN  root (build/xrdsvc/SsiSession.cc:148) - WARNING: unowned chunk query detected: /chk/LSST/7140
150108 00:11:15 23867 ssi_DoIt: lsstadm.23942:[log in to unmask] Invalid session process return!!

and this in the czar log:

0108 00:11:15.588 [0x7f3b34c8c700] DEBUG root (build/qdisp/MessageStore.cc:49) - Msg: -1 100 Dispatch Query.
0108 00:11:15.588 [0x7f3b34c8c700] DEBUG root (build/ccontrol/userQueryProxy.cc:118) - EXECUTING UserQuery_submit(1)
0108 00:11:15.588 [0x7f3b34c8c700] INFO  root (build/ccontrol/UserQuery.cc:270) - UserQuery::_setupMerger()
0108 00:11:15.589 [0x7f3b34c8c700] INFO  root (build/ccontrol/UserQuery.cc:176) - UserQuery beginning submission
0108 00:11:15.589 [0x7f3b34c8c700] INFO  root (build/qproc/QuerySession.cc:361) - Non-subchunked
0108 00:11:15.589 [0x7f3b34c8c700] DEBUG root (build/qproc/QuerySession.cc:381) - returning  queries: 
0108 00:11:15.589 [0x7f3b34c8c700] DEBUG root (build/qproc/QuerySession.cc:383) - SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_6970 AS QST_1_
0108 00:11:15.589 [0x7f3b34c8c700] DEBUG root (build/qproc/TaskMsgFactory2.cc:151) - no nextFragment
0108 00:11:15.589 [0x7f3b34c8c700] DEBUG root (build/qproc/TaskMsgFactory2.cc:153) - SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_6970 AS QST_1_
0108 00:11:15.589 [0x7f3b34c8c700] DEBUG root (build/qdisp/Executive.cc:396) - Executive (0x18d64a0) tracking id=1
0108 00:11:15.589 [0x7f3b34c8c700] INFO  root (build/qdisp/Executive.cc:172) - Exec add pth=/chk/LSST/6970
0108 00:11:15.589 [0x7f3b34c8c700] DEBUG root (build/qdisp/MessageStore.cc:49) - Msg: 6970 1200 Exec add pth=/chk/LSST/6970
Opening xroot://127.0.0.1:1094//chk/LSST/6970
0108 00:11:15.597 [0x7f3b34c8c700] DEBUG root (build/qdisp/Executive.cc:337) - Provision was ok
0108 00:11:15.597 [0x7f3b34c8c700] INFO  root (build/qproc/QuerySession.cc:361) - Non-subchunked
0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qproc/QuerySession.cc:381) - returning  queries: 
0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qproc/QuerySession.cc:383) - SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7138 AS QST_1_
0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qproc/TaskMsgFactory2.cc:151) - no nextFragment
0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qproc/TaskMsgFactory2.cc:153) - SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7138 AS QST_1_
0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qdisp/Executive.cc:396) - Executive (0x18d64a0) tracking id=2
0108 00:11:15.598 [0x7f3b34c8c700] INFO  root (build/qdisp/Executive.cc:172) - Exec add pth=/chk/LSST/7138
0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qdisp/MessageStore.cc:49) - Msg: 7138 1200 Exec add pth=/chk/LSST/7138
Opening xroot://127.0.0.1:1094//chk/LSST/7138
0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qdisp/Executive.cc:337) - Provision was ok
0108 00:11:15.598 [0x7f3b34c8c700] INFO  root (build/qproc/QuerySession.cc:361) - Non-subchunked
0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qproc/QuerySession.cc:381) - returning  queries: 
0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qproc/QuerySession.cc:383) - SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7140 AS QST_1_
0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qproc/TaskMsgFactory2.cc:151) - no nextFragment
0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qproc/TaskMsgFactory2.cc:153) - SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7140 AS QST_1_
0108 00:11:15.598 [0x7f3b34c8c700] DEBUG root (build/qdisp/Executive.cc:396) - Executive (0x18d64a0) tracking id=3
0108 00:11:15.599 [0x7f3b34c8c700] INFO  root (build/qdisp/Executive.cc:172) - Exec add pth=/chk/LSST/7140
0108 00:11:15.599 [0x7f3b34c8c700] DEBUG root (build/qdisp/MessageStore.cc:49) - Msg: 7140 1200 Exec add pth=/chk/LSST/7140
Opening xroot://127.0.0.1:1094//chk/LSST/7140
0108 00:11:15.599 [0x7f3b2cdfa700] INFO  root (build/qdisp/QueryResource.cc:55) - Provision done
0108 00:11:15.599 [0x7f3b1bfff700] INFO  root (build/qdisp/QueryResource.cc:55) - Provision done
0108 00:11:15.599 [0x7f3b2cdfa700] INFO  root (build/qdisp/QueryRequest.cc:87) - New QueryRequest with payload(127)
0108 00:11:15.599 [0x7f3b1bfff700] INFO  root (build/qdisp/QueryRequest.cc:87) - New QueryRequest with payload(127)
0108 00:11:15.599 [0x7f3b2cdfa700] DEBUG root (build/qdisp/QueryRequest.cc:97) - Requesting [127]LSST?6*p
@SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_6970 AS QST_1_*r_18f199f87464a85cd976643d8bec45a7a_6970_0"Task=0x7f3b0c000eb0 processing id=0
0108 00:11:15.599 [0x7f3b1bfff700] DEBUG root (build/qdisp/QueryRequest.cc:97) - Requesting [127]LSST?7*p
@SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7138 AS QST_1_*r_18f199f87464a85cd976643d8bec45a7a_7138_0"Task=0x7f3b10002c60 processing id=0
0108 00:11:15.599 [0x7f3b34c8c700] DEBUG root (build/qdisp/Executive.cc:337) - Provision was ok
0108 00:11:15.599 [0x7f3b34c8c700] INFO  root (build/qproc/QuerySession.cc:361) - Non-subchunked
0108 00:11:15.599 [0x7f3b34c8c700] DEBUG root (build/qproc/QuerySession.cc:381) - returning  queries: 
0108 00:11:15.599 [0x7f3b34c8c700] DEBUG root (build/qproc/QuerySession.cc:383) - SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7308 AS QST_1_
0108 00:11:15.599 [0x7f3b34c8c700] DEBUG root (build/qproc/TaskMsgFactory2.cc:151) - no nextFragment
0108 00:11:15.599 [0x7f3b34c8c700] DEBUG root (build/qproc/TaskMsgFactory2.cc:153) - SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7308 AS QST_1_
0108 00:11:15.600 [0x7f3b34c8c700] DEBUG root (build/qdisp/Executive.cc:396) - Executive (0x18d64a0) tracking id=4
0108 00:11:15.600 [0x7f3b34c8c700] INFO  root (build/qdisp/Executive.cc:172) - Exec add pth=/chk/LSST/7308
0108 00:11:15.600 [0x7f3b34c8c700] DEBUG root (build/qdisp/MessageStore.cc:49) - Msg: 7308 1200 Exec add pth=/chk/LSST/7308
Opening xroot://127.0.0.1:1094//chk/LSST/7308
0108 00:11:15.600 [0x7f3b34c8c700] DEBUG root (build/qdisp/Executive.cc:337) - Provision was ok
0108 00:11:15.600 [0x7f3b34c8c700] INFO  root (build/qproc/QuerySession.cc:361) - Non-subchunked
0108 00:11:15.600 [0x7f3b34c8c700] DEBUG root (build/qproc/QuerySession.cc:381) - returning  queries: 
0108 00:11:15.600 [0x7f3b34c8c700] DEBUG root (build/qproc/QuerySession.cc:383) - SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7310 AS QST_1_
0108 00:11:15.600 [0x7f3b34c8c700] DEBUG root (build/qproc/TaskMsgFactory2.cc:151) - no nextFragment
0108 00:11:15.600 [0x7f3b34c8c700] DEBUG root (build/qproc/TaskMsgFactory2.cc:153) - SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7310 AS QST_1_
0108 00:11:15.600 [0x7f3b34c8c700] DEBUG root (build/qdisp/Executive.cc:396) - Executive (0x18d64a0) tracking id=5
0108 00:11:15.600 [0x7f3b34c8c700] INFO  root (build/qdisp/Executive.cc:172) - Exec add pth=/chk/LSST/7310
0108 00:11:15.600 [0x7f3b34c8c700] DEBUG root (build/qdisp/MessageStore.cc:49) - Msg: 7310 1200 Exec add pth=/chk/LSST/7310
Opening xroot://127.0.0.1:1094//chk/LSST/7310
0108 00:11:15.600 [0x7f3b34c8c700] DEBUG root (build/qdisp/Executive.cc:337) - Provision was ok
0108 00:11:15.601 [0x7f3b34c8c700] INFO  root (app.py:565) - Query dispatch (1) took 0.015103 seconds
0108 00:11:15.601 [0x7f3b34c8c700] INFO  root (build/qdisp/Executive.cc:483) - Still 5 in flight.
Task 0x7f3b0c000eb0 sess=ok Status = 1 isWrite
Task Handler calling RelBuff.
0108 00:11:15.638 [0x7f3b2cdfa700] DEBUG root (build/qdisp/QueryRequest.cc:103) - Early release of request buffer
Task Handler calling trunc.
Task 0x7f3b10002c60 sess=ok Status = 1 isWrite
Task Handler calling RelBuff.
0108 00:11:15.638 [0x7f3b2cdfa700] DEBUG root (build/qdisp/QueryRequest.cc:103) - Early release of request buffer
Task Handler calling trunc.
0108 00:11:15.638 [0x7f3b1bfff700] INFO  root (build/qdisp/QueryResource.cc:55) - Provision done
0108 00:11:15.638 [0x7f3b1bfff700] INFO  root (build/qdisp/QueryRequest.cc:87) - New QueryRequest with payload(127)
0108 00:11:15.638 [0x7f3b1bfff700] DEBUG root (build/qdisp/QueryRequest.cc:97) - Requesting [127]LSST?7*p
@SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7140 AS QST_1_*r_18f199f87464a85cd976643d8bec45a7a_7140_0"Task=0x7f3b10000a700108 00:11:15.638 [0x7f3b2cdfa700] INFO  root 
(build/qdisp/QueryResource.cc:55) - Provision done
0108 00:11:15.638 [0x7f3b2cdfa700] INFO  root (build/qdisp/QueryRequest.cc:87) - New QueryRequest with payload(127)
0108 00:11:15.639 [0x7f3b2cdfa700] DEBUG root (build/qdisp/QueryRequest.cc:97) - Requesting [127]LSST?9*p
@SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7308 AS QST_1_*r_18f199f87464a85cd976643d8bec45a7a_7308_0"Task=0x7f3b0c001aa0 processing id=0
0108 00:11:15.639 [0x7f3b2cdfa700] INFO  root (build/qdisp/QueryResource.cc:55) - Provision done
0108 00:11:15.639 [0x7f3b2cdfa700] INFO  root (build/qdisp/QueryRequest.cc:87) - New QueryRequest with payload(127)
0108 00:11:15.639 [0x7f3b2cdfa700] DEBUG root (build/qdisp/QueryRequest.cc:97) - Requesting [127]LSST?9*p
@SELECT count(*) AS QS1_COUNT FROM LSST.DeepSource_7310 AS QST_1_*r_18f199f87464a85cd976643d8bec45a7a_7310_0"Task=0x7f3b0c001fd0 processing id=0
 processing id=0
Task 0x7f3b0c001aa0 sess=ok Status = 1 isWrite
Task Handler calling RelBuff.
0108 00:11:15.639 [0x7f3b1b5fe700] DEBUG root (build/qdisp/QueryRequest.cc:103) - Early release of request buffer
Task Handler calling trunc.
Task 0x7f3b0c001fd0 sess=ok Status = 1 isWrite
Task Handler calling RelBuff.
0108 00:11:15.639 [0x7f3b2cdfa700] DEBUG root (build/qdisp/QueryRequest.cc:103) - Early release of request buffer
Task Handler calling trunc.
Task 0x7f3b10000a70 sess=ok Status = 1 isWrite
Task Handler calling RelBuff.
0108 00:11:15.679 [0x7f3b1b5fe700] DEBUG root (build/qdisp/QueryRequest.cc:103) - Early release of request buffer
Task Handler calling trunc.

Can anyone else confirm they have a working tip of master? Based on the log files, is there anything I should be looking at?

Thanks for any help,
Serge


Use REPLY-ALL to reply to list

To unsubscribe from the QSERV-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1




Use REPLY-ALL to reply to list

To unsubscribe from the QSERV-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1