Hi Serge,
Disclaimer: It wouldn't surprise me if one of the possible paths still
leaked resources. This code is complicated. The primary problem was the
normal successful operations were leaking mysql connections, of which we
don't have that many to leak. This needed to be fixed quickly because
tests were breaking.
I see your point, and will say that when I do a '^c' to cancel a
request, it isn't leaking mysql connections. It does look like there's
an issue with _finishStatus that bears a closer look.
I think an easier way would be to change
if (_finishStatus != ACTIVE) {
return;
}
to:
if (_finishStatus != ACTIVE) {
if (_finishStatus == CANCELLED) {
_cleanup();
}
return;
}
I see this as safe as either way we would want to get rid of those
pointers at that point. Resetting the shared_ptr's at the point you do
would cause the current object to be destroyed before calling
Finished(true).
I do think this could use a closer look, but I need to go.
-John
On 05/08/15 15:30, Serge Monkewitz wrote:
> Hi John and Fritz,
>
> I have a question regarding the patch from 2716 - sorry to be dense. I
> basically don't understand how the memory leaks are getting fixed in the
> cancellation case. In this e-mail I'm just going to write down what I think
> I understand, and maybe if you have the time, one of you could tell me what I'm
> missing.
>
> From what I can see, ResponseRequester holds a shared_ptr to a CancelFunc,
> which holds a raw pointer to a QueryRequest that it owns. The only way the
> CancelFunc (and hence QueryRequest) can be destroyed is if
> ResponseRequester.registerCancel() is called (it replaces the previous
> CancelFunc), or if the ResponseRequester is destroyed. registerCancel() is
> called once per try, but the CancelFunc set by the last retry will not get
> destroyed by a subsequent registerCancel() call.
>
> Prior to 2716, one reason the ResponseRequester would never be destroyed is
> that QueryRequest held a shared_ptr back to its ResponseRequester, which is
> cyclic. To fix that, QueryRequest::_cleanup() was introduced; it resets that
> shared_ptr (and the retry functor pointer) when processing for that QueryRequest
> has finished. This _cleanup method is called at the end of _errorFinish and
> _finish.
>
> However, there are exit paths in these functions where _cleanup() is not
> called - in both cases, if (_finishStatus != ACTIVE), there is a bare return.
> _finishStatus is set in 4 places: the constructor, _finish and _errorFinish,
> but also in cancel(). If I'm reading the code correctly, _cleanup() will not be
> called if QueryRequest::cancel() is ever called.
>
> Because cancel() resets _retryFunc separately, the other reference cycle
> involving Executive::DispatchAction is broken, but... how is MergingRequester
> getting destroyed, given this other cycle?
>
> Also, if you look at _errorFinish again, Finished(shouldCancel) is only called
> if _finishStatus == ACTIVE, which means cancel() does not invoke
> XrdSsiRequest::Finished(true). It seems like this might mean we are not
> notifying xrootd of request cancellation.
>
> Another problem is that QueryRequest::ProcessResponse calls cancel() if
> cancelled(), even though the only way for cancelled() to return true is if
> cancel() was called previously (which should have cleaned up the request).
>
> Based on the above, I was thinking QueryRequest::cancel() should look
> more like:
>
> void QueryRequest::cancel() {
> {
> boost::lock_guard<boost::mutex> lock(_finishStatusMutex);
> if(_finishStatus != ACTIVE) {
> return; // Don't do anything if already finished or cancelled.
> }
> _finishStatus = CANCELLED;
> }
> _cleanup(); // Prevent retries and allow requester deletion.
> _status.report(ExecStatus::CANCEL);
> if(!Finished(true)) {
> LOGF_ERROR("Error cleaning up QueryRequest");
> } else {
> LOGF_INFO("QueryRequest cancelled");
> }
> }
>
>
> Where am I going wrong?
########################################################################
Use REPLY-ALL to reply to list
To unsubscribe from the QSERV-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1
|