Hi,

The interface between Qserv and the xrdssi API is very interesting, and
both products are very nice, so I would really like to be involved in
fixing this bug.

I think we should look inside the function below, on the xrootd client
side:
https://github.com/lsst/qserv/search?utf8=%E2%9C%93&q=ProcessResponseData
Is that right? Here, in GetResponseData(), the client constrains the
size of the xrootd result message to buffer.size(), doesn't it?
But in Qserv, the buffer.size() parameter seems to be initialized to the
size of the whole xrootd sv answer (i.e. the whole protobuf query result
message), which means the size of the full SQL query result.
In Andy's simple example, by contrast, the buffer seems to have a fixed
size.
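
To make the client-side concern concrete, here is a minimal sketch of
how a client could piece a large answer together from several bounded
buffers instead of expecting it in a single one. ResponseAssembler and
its methods are hypothetical names used for illustration only; they are
not part of Qserv or of the xrdssi API. The sizes in the usage note are
taken from the czar.log quoted later in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not Qserv or xrdssi code): accumulates response
// chunks until the announced message size has been received.
class ResponseAssembler {
public:
    explicit ResponseAssembler(std::size_t expectedSize)
        : _expected(expectedSize) {
        _data.reserve(expectedSize);
    }

    // Append one chunk delivered by the transport layer.
    void append(char const* buf, std::size_t len) {
        assert(len <= remaining());  // a chunk must not overrun the message
        _data.append(buf, len);
    }

    // Bytes still missing; the caller would request another buffer
    // while this is non-zero.
    std::size_t remaining() const { return _expected - _data.size(); }

    bool complete() const { return remaining() == 0; }

    std::string const& data() const { return _data; }

private:
    std::size_t _expected;
    std::string _data;
};
```

With an expected size of 2378306 bytes and a first chunk of 2097152
bytes, remaining() reports 281154 bytes, so the caller would ask for one
more buffer before handing the message to the merger.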

On the xrootd server side, the response stream seems to be built in the
append() method of the following class (cf. l82):
https://github.com/lsst/qserv/blob/ff47e3fd708e3e8dfcef05f59f8395ac3540137f/core/modules/xrdsvc/ChannelStream.cc#L82
but the response buffer seems to be filled with the full SQL query
result in one go. In Andy's example, XrdSsiSession and the XrdSvStream
are aware of the client's constraint on buffer size (i.e. buffer.size())
so that they can chunk the answer correctly, aren't they? Shouldn't this
also be the case in Qserv?
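
As an illustration of what "being aware of the client constraint" could
look like on the server side, here is a minimal sketch of a reply object
that never hands out more bytes than the caller asks for and flags the
last chunk only when the whole message has been delivered. ChunkedReply
and next() are hypothetical names chosen for this example; this is not
the actual ChannelStream or xrdssi code, just a sketch under the
assumption that the stream is asked for at most maxLen bytes per call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical sketch (not Qserv or xrdssi code): serves a full result
// message in pieces no larger than the buffer size the client asked for.
class ChunkedReply {
public:
    explicit ChunkedReply(std::string msg) : _msg(std::move(msg)) {}

    // Copy at most maxLen bytes into buf and return the number copied.
    // 'last' is set to true only once the whole message has been served.
    std::size_t next(char* buf, std::size_t maxLen, bool& last) {
        assert(maxLen > 0);
        std::size_t n = std::min(maxLen, _msg.size() - _offset);
        std::copy_n(_msg.data() + _offset, n, buf);
        _offset += n;
        last = (_offset == _msg.size());
        return n;
    }

private:
    std::string _msg;
    std::size_t _offset = 0;
};
```

For a 2378306-byte result and a 2097152-byte client buffer, the first
call returns 2097152 bytes with last == false, and the second returns
the remaining 281154 bytes with last == true.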

I can try to turn on high verbosity there and then reproduce the error;
would that help us understand?

Thanks

Fabrice


On 11/02/2015 22:08, Andrew Hanushevsky wrote:
> OK, let's work on resolving this. I think the real issue is that 
> getResponseData wants you to piece the buffers together and we can fix 
> that. Let's get together so I completely understand what you are doing.
>
> Andy
>
> On Wed, 11 Feb 2015, Daniel L Wang wrote:
>
>> Yes, this is one of the 2 MB buffer problems.
>>
>> (typed on a small touch screen)
>>
>> On Feb 11, 2015 7:29 PM, Tatiana Goldina <[log in to unmask]> 
>> wrote:
>>>
>>> This is the tail of czar.log - I hope it helps to see what is
>>> wrong.
>>>
>>> 0211 19:23:21.662 [0x7fcb535fe700] DEBUG root 
>>> (build/qproc/TaskMsgFactory2.cc:154) - SELECT * FROM 
>>> LSST.Science_Ccd_Exposure AS QST_1_ WHERE 
>>> scisql_s2PtInCircle(ra,decl,0.5,1.0,0.5)=1 LIMIT 3000
>>> 0211 19:23:21.662 [0x7fcb535fe700] DEBUG root 
>>> (build/qdisp/Executive.cc:397) - Executive (0x2bf3d80) tracking id=1
>>> 0211 19:23:21.662 [0x7fcb535fe700] INFO  root 
>>> (build/qdisp/Executive.cc:173) - Exec add pth=/chk/LSST/1234567890
>>> 0211 19:23:21.662 [0x7fcb535fe700] DEBUG root 
>>> (build/qdisp/MessageStore.cc:49) - Msg: 1234567890 1200 Exec add 
>>> pth=/chk/LSST/1234567890
>>> Opening xroot://127.0.0.1:1094//chk/LSST/1234567890
>>> 0211 19:23:21.662 [0x7fcb535fe700] DEBUG root 
>>> (build/qdisp/Executive.cc:338) - Provision was ok
>>> 0211 19:23:21.663 [0x7fcb3ebfd700] INFO  root 
>>> (build/qdisp/QueryResource.cc:56) - Provision done
>>> 0211 19:23:21.663 [0x7fcb3ebfd700] INFO  root 
>>> (build/qdisp/QueryRequest.cc:89) - New QueryRequest with payload(182)
>>> 0211 19:23:21.663 [0x7fcb3ebfd700] DEBUG root 
>>> (build/qdisp/QueryRequest.cc:99) - Requesting, payload size: [182]
>>> Task=0x7fcb28dab640 processing id=0
>>> 0211 19:23:21.663 [0x7fcb535fe700] INFO  root (app.py:565) - Query 
>>> dispatch (7) took 0.003240 seconds
>>> 0211 19:23:21.663 [0x7fcb535fe700] INFO  root 
>>> (build/qdisp/Executive.cc:484) - Still 1 in flight.
>>> Task 0x7fcb28dab640 sess=ok Status = 1 isWrite
>>> Task Handler calling RelBuff.
>>> 0211 19:23:21.704 [0x7fcb3f5fe700] DEBUG root 
>>> (build/qdisp/QueryRequest.cc:105) - Early release of request buffer
>>> Task Handler calling trunc.
>>> Task 0x7fcb28dab640 sess=ok Status = 1 isSync
>>> Task Handler responding with stream.
>>> 0211 19:23:21.913 [0x7fcb3ebfd700] INFO  root 
>>> (build/qdisp/QueryRequest.cc:148) - GetResponseData with buffer of 0
>>> Task 0x7fcb28dab640 SetBuff Async Status=isReady
>>> 0211 19:23:21.913 [0x7fcb3ebfd700] INFO  root 
>>> (build/qdisp/QueryRequest.cc:150) - Initiated request ok
>>> Task 0x7fcb28dab640 sess=ok Status = 1 isReady
>>> Task Handler calling ProcessResponseData.
>>> 0211 19:23:21.914 [0x7fcb3f5fe700] INFO  root 
>>> (build/qdisp/QueryRequest.cc:180) - ProcessResponse[data] with 
>>> buflen=1 (more)
>>> Task 0x7fcb28dab640 SetBuff Async Status=isReady
>>> Task 0x7fcb28dab640 sess=ok Status = 1 isReady
>>> Task Handler calling ProcessResponseData.
>>> 0211 19:23:21.916 [0x7fcb3ffff700] INFO  root 
>>> (build/qdisp/QueryRequest.cc:180) - ProcessResponse[data] with 
>>> buflen=25 (more)
>>> Task 0x7fcb28dab640 SetBuff Async Status=isReady
>>> Task 0x7fcb28dab640 sess=ok Status = 1 isReady
>>> Task Handler calling ProcessResponseData.
>>> 0211 19:23:21.925 [0x7fcb3ebfd700] INFO  root 
>>> (build/qdisp/QueryRequest.cc:180) - ProcessResponse[data] with 
>>> buflen=2097152 (last)
>>> 0211 19:23:21.925 [0x7fcb3ebfd700] ERROR root 
>>> (build/ccontrol/MergingRequester.cc:70) - MergingRequester size 
>>> mismatch: expected 2378306  got 2097152
>>> 0211 19:23:26.663 [0x7fcb535fe700] INFO  root 
>>> (build/qdisp/Executive.cc:432) - Executive (0x2bf3d80) REAPED id=1
>>> 0211 19:23:31.664 [0x7fcb535fe700] INFO  root 
>>> (build/qdisp/Executive.cc:189) - entry state:0x7fcb400360c0 
>>> Resource(/chk/LSST/1234567890): 20150211-19:23:21, Error merging 
>>> result, 1420, Result message MD5 mismatch)
>>> 0211 19:23:31.664 [0x7fcb535fe700] INFO  root 
>>> (build/qdisp/Executive.cc:194) - Query exec finish. 1 dispatched.
>>> 0211 19:23:31.664 [0x7fcb535fe700] DEBUG root 
>>> (build/qdisp/MessageStore.cc:49) - Msg: 1234567890 1215 Error 
>>> merging result 1420 (Result message MD5 mismatch) 1423711401
>>> 0211 19:23:31.664 [0x7fcb535fe700] INFO  root 
>>> (build/qdisp/Executive.cc:197) - Query exec error:. 1 != 0
>>> 0211 19:23:31.664 [0x7fcb535fe700] INFO  root 
>>> (build/rproc/InfileMerger.cc:325) - Merged 
>>> qservResult.result_4492663602 into qservResult.result_4492663602
>>> 0211 19:23:31.664 [0x7fcb535fe700] ERROR root 
>>> (build/ccontrol/UserQuery.cc:221) - Joined everything (failure!)
>>> 0211 19:23:31.665 [0x7fcb535fe700] INFO  root (app.py:569) - Query 
>>> exec (7) took 10.001105 seconds
>>> 0211 19:23:31.665 [0x7fcb535fe700] ERROR root 
>>> (build/qdisp/Executive.cc:307) - Ref=1 
>>> Resource(/chk/LSST/1234567890): 20150211-19:23:21, Error merging 
>>> result, 1420, Result message MD5 mismatch
>>> 0211 19:23:31.666 [0x7fcb535fe700] DEBUG root (app.py:389) - 
>>> reporting -1 -1 Ref=1 Resource(/chk/LSST/1234567890): 
>>> 20150211-19:23:21, Error merging result, 1420, Result message MD5 
>>> mismatch
>>> 0211 19:23:31.666 [0x7fcb535fe700] ERROR root 
>>> (build/qdisp/MessageStore.cc:47) - Msg: -1 -1 Ref=1 
>>> Resource(/chk/LSST/1234567890): 20150211-19:23:21, Error merging 
>>> result, 1420, Result message MD5 mismatch
>>> 0211 19:23:31.667 [0x7fcb535fe700] INFO  root (app.py:574) - Final 
>>> state of all queries error
>>> 0211 19:23:31.668 [0x7fcb521fc700] INFO  root 
>>> (build/ccontrol/UserQuery.cc:251) - Discarded UserQuery(7)
>>>
>>> On Feb 11, 2015, at 2:36 PM, Fabrice Jammes 
>>> <[log in to unmask]> wrote:
>>>
>>>> Hi Tatiana,
>>>>
>>>> I think you're right. The 2MB xrootd limit can produce this error
>>>> message. qserv-czar.log provides more accurate information, but I
>>>> don't know if you can access it?
>>>>
>>>> Cheers,
>>>>
>>>> Fabrice
>>>>
>>>> On 02/11/2015 02:23 PM, Tatiana Goldina wrote:
>>>>> I don't think I saw this error before, but it may be related to 
>>>>> 2MB xroot limit (DM-1847). If the limit is lower, I do get the data.
>>>>>
>>>>> sql> select * from Science_Ccd_Exposure where 
>>>>> scisql_s2PtInCircle(ra, decl, 0.5, 1.0, 0.5)=1 LIMIT 3000
>>>>> [2015-02-11 13:43:17] [42S02][1051] Unknown table 'result_4414650179'
>>>>> [2015-02-11 13:43:17] [Proxy][4120] Error during execution:
>>>>> -1 Ref=1 Resource(/chk/qservTest_caseSUI_qserv/1234567890): 
>>>>> 20150211-13:43:06, Error merging result, 1420, Result message MD5 
>>>>> mismatch (-1)
>>>>>
>>>>> ######################################################################## 
>>>>>
>>>>> Use REPLY-ALL to reply to list
>>>>>
>>>>> To unsubscribe from the QSERV-L list, click the following link:
>>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1
>>>>
>>>
>>
>>
>>
>

