Print

Print


Hi John and Fritz,

    I have a question regarding the patch from 2716 - sorry to be dense. I
basically don't understand how the memory leaks are getting fixed in the
cancellation case. In this e-mail I'm just going to write down what I think
I understand, and maybe if you have the time, one of you could tell me what I'm
missing.

From what I can see, ResponseRequester holds a shared_ptr to a CancelFunc,
which holds a raw pointer to a QueryRequest that it owns. The only way the
CancelFunc (and hence QueryRequest) can be destroyed is if
ResponseRequester.registerCancel() is called (it replaces the previous
CancelFunc), or if the ResponseRequester is destroyed. registerCancel() is
called once per try, but the CancelFunc set by the last retry will not get
destroyed by a subsequent registerCancel() call.

Prior to 2716, one reason the ResponseRequester would never be destroyed is
that QueryRequest held a shared_ptr back to its ResponseRequester, which is
cyclic. To fix that, QueryRequest::_cleanup() was introduced; it resets that
shared_ptr (and the retry functor pointer) when processing for that QueryRequest
has finished. This _cleanup method is called at the end of _errorFinish and
_finish.

However, there are exit paths in these functions where _cleanup() is not
called - in both cases, if (_finishStatus != ACTIVE), there is a bare return.
_finishStatus is set in 4 places: the constructor, _finish and _errorFinish,
but also in cancel(). If I'm reading the code correctly, _cleanup() will not be
called if QueryRequest::cancel() is ever called.

Because cancel() resets _retryFunc separately, the other reference cycle
involving Executive::DispatchAction is broken, but... how is MergingRequester
getting destroyed, given this other cycle?

Also, if you look at _errorFinish again, Finished(shouldCancel) is only called
if _finishStatus == ACTIVE, which means cancel() does not invoke
XrdSsiRequest::Finished(true). It seems like this might mean we are not
notifying xrootd of request cancellation.

Another problem is that QueryRequest::ProcessResponse calls cancel() if
cancelled(), even though the only way for cancelled() to return true is if
cancel() was called previously (which should have cleaned up the request).

Based on the above, I was thinking QueryRequest::cancel() should look
more like:

    void QueryRequest::cancel() {
        {
            boost::lock_guard<boost::mutex> lock(_finishStatusMutex);
            if(_finishStatus != ACTIVE) {
                return; // Don't do anything if already finished or cancelled.
            }
            _finishStatus = CANCELLED;
        }
        _cleanup(); // Prevent retries and allow requester deletion.
        _status.report(ExecStatus::CANCEL);
        if(!Finished(true)) {
            LOGF_ERROR("Error cleaning up QueryRequest");
        } else {
            LOGF_INFO("QueryRequest cancelled");
        }
    }


Where am I going wrong?
########################################################################
Use REPLY-ALL to reply to list

To unsubscribe from the QSERV-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1