Hi Serge,

Well, XrdSsiProviderServer is a simple global extern symbol so there is 
nothing to destroy there. Of course, a provider object address is assigned 
as the symbol's content. So, at exit that object would not be destroyed. 
The question is whether or not there is a need to destroy that object if 
you are exiting anyway (yes, for valgrind purposes and, perhaps, for other 
reasons you would want all destructors to be called).
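
To make the distinction concrete, here is a minimal sketch. Provider is a 
hypothetical stand-in (not the real XrdSsiProvider), and a block scope 
stands in for process exit: an object reached only through a raw pointer 
is never destroyed implicitly, while an owning handle destroys it when the 
handle itself goes away.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for a provider class; not the real XrdSsiProvider.
struct Provider {
    explicit Provider(int* counter) : destroyed(counter) { assert(counter); }
    ~Provider() { ++*destroyed; }
    int* destroyed;  // incremented each time the destructor runs
};

// Mimics "Provider* p = new Provider": only an explicit delete runs the
// destructor. Returns the destructor count before and after that delete.
std::pair<int, int> rawPointerCounts() {
    int count = 0;
    Provider* raw = new Provider(&count);
    {
        Provider* alias = raw;  // the pointer leaving scope destroys nothing
        (void)alias;
    }
    int before = count;
    delete raw;
    return {before, count};
}

// An owning handle runs the destructor when the handle is destroyed.
int owningHandleCount() {
    int count = 0;
    {
        std::unique_ptr<Provider> owned(new Provider(&count));
    }
    return count;
}
```

Even an owning global is only destroyed on exit() or a normal return from 
main(), so this does not help when the process is simply killed.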

Now, xrootd was never architected to call exit(), which is why you never 
see that in XrdMain.cc. Over the years (though this has probably changed 
by now), exiting from the main thread while other threads were active 
caused more problems than it was worth trying to deal with, so xrootd was 
written to simply accept a hard "kill" with no damaging side-effects. This 
may not work all that well with the qserv plug-in, as there may be state 
information that has to be cleaned up. That said, you still have to deal 
with the fact that a server may crash and you never get a chance to 
shut down cleanly anyway. So, the best course of action is to never put 
yourself in the position where you have to correct state information in 
that case (i.e. simply accept the fact that destructors will not always be 
called at shutdown). Of course, that's easy for me to say.
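
One way to read "never have to correct state" is to make a leftover lock 
self-describing, so the next start can tell it is stale. The sketch below 
is only an illustration under assumptions: LockRecord, processIsAlive and 
canAcquire are hypothetical names, and it assumes the lock row records the 
owner's PID on the same host, which is not how the qserv lock works today.

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical lock row: whether the lock is held, and by which process.
struct LockRecord {
    bool held;
    pid_t owner;
};

// kill(pid, 0) delivers no signal; it only checks that the process exists.
// EPERM means the process exists but we may not signal it.
bool processIsAlive(pid_t pid) {
    if (pid <= 0) return false;
    if (kill(pid, 0) == 0) return true;
    return errno == EPERM;
}

// A lock whose owner has died is treated as stale and may be taken over,
// so a crash or hard kill leaves nothing that needs manual cleanup.
bool canAcquire(const LockRecord& rec) {
    return !rec.held || !processIsAlive(rec.owner);
}
```

PIDs can be reused, so a real version would want something stronger than 
a bare PID (a start time, for instance).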

Perhaps the best solution is for either you or xrootd to implement a 
SIGTERM handler (though what xrootd's would do is another story). Let me 
know your thoughts on this.
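
For reference, a minimal sketch of the sigwait() flavour of such handling, 
which avoids doing any work inside an asynchronous signal handler. The two 
function names are hypothetical; nothing like this exists in XrdMain.cc. 
It assumes the signals are blocked before any other thread is started, so 
that every thread inherits the mask.

```cpp
#include <csignal>
#include <pthread.h>
#include <signal.h>

// Block SIGTERM and SIGINT in the calling thread and record the set.
// Threads created afterwards inherit this mask. Returns true on success.
bool blockTerminationSignals(sigset_t* set) {
    sigemptyset(set);
    sigaddset(set, SIGTERM);
    sigaddset(set, SIGINT);
    return pthread_sigmask(SIG_BLOCK, set, nullptr) == 0;
}

// Sleep until one of the blocked signals is pending, then consume it.
// Returns the signal number, or -1 if sigwait() fails.
int waitForTerminationSignal(const sigset_t* set) {
    int sig = 0;
    return sigwait(set, &sig) == 0 ? sig : -1;
}
```

The thread that waits would then stop the workers, run whatever cleanup is 
needed and return from main() normally, at which point global destructors 
run.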

Andy

On Fri, 28 Aug 2015, Serge Monkewitz wrote:

> [+AndyH]
>
> XrdSsiProviderServer is a global that will only be destroyed if you explicitly destruct it, if you call exit(), or if you return normally from main(). It might also get called if somebody dlclose()s the worker shared library, but I really don't know for sure what's guaranteed in that case.
>
> Our worker is being stopped via delivery of SIGTERM to the xrootd process (followed up by SIGKILL if that doesn't work). Since exit() is not guaranteed to be async-signal-safe and we have no idea who might be using the object at arbitrary points, I don't think we can tie SIGTERM to a signal handler that does any of those things and call it a day. I'm not sure it's possible to do this correctly from inside an xrootd plugin, i.e. xrootd might have to provide graceful shutdown for us to be able to assume that our global destructors will get called.
>
> From what I've read, one is supposed to do this sort of thing by having main() sigwait() for SIGTERM or SIGINT, or by having main() read from a "termination" pipe that is written to from inside signal handlers. When a graceful shutdown request is received, one runs cleanup code and returns from main() normally. Xrd/XrdMain.cc doesn't seem to do that, but I could easily be missing another approach or not understanding the code very well.
>
> Andy - can you advise?
>
> Serge
>
>> On Aug 28, 2015, at 12:16 PM, John Gates <[log in to unmask]> wrote:
>>
>> If that one destructor gets called, it should get everything on the list. The thing is we need to know when xrootd is done with it.
>>
>> -John
>>
>> On 08/28/15 12:08, Becla, Jacek wrote:
>>> "a list of destructors"... that sounds like it might be less trivial than I expected. If you think you need more than 1SP go ahead and tweak it today
>>>
>>> Jacek
>>>
>>>
>>>> On Aug 28, 2015, at 9:00 AM, Gates, John H <[log in to unmask]> wrote:
>>>>
>>>> Also, there's a long list of destructors that aren't being called, including the one that should release the lock. I think this isn't being deleted.
>>>>    XrdSsiProvider *XrdSsiProviderServer = new lsst::qserv::xrdsvc::SsiProviderServer;
>>>>
>>>> On 08/27/15 20:54, Becla, Jacek wrote:
>>>>> For the record, I have a fix in DM-3618
>>>>>
>>>>> John, FYI, AndyS is restarting xrootd in wmgr because apparently that is the only way to force xrootd to refresh chunk inventory (at the moment)
>>>>>
>>>>> Jacek
>>>>>
>>>>>
>>>>>
>>>>>> On Aug 27, 2015, at 7:39 PM, Becla, Jacek <[log in to unmask]> wrote:
>>>>>>
>>>>>> John
>>>>>>
>>>>>> Your latest code is giving me troubles. Integration test killed xrootd, the tail of the log is similar to what you observed intermittently:
>>>>>>
>>>>>> [2015-08-27T21:32:13.084-0500] [0x7f92c2df8720] INFO  root (build/xrdsvc/SsiService.cc:142) - Cleaning up scratchDb: qservScratch.
>>>>>> [2015-08-27T21:32:13.086-0500] [0x7f92c2df8720] WARN  root (build/wdb/ChunkResource.cc:302) - memLockStatus LOCKED_OTHER wrong uid. Expected 29962 got 28572 err=
>>>>>> [2015-08-27T21:32:13.086-0500] [0x7f92c2df8720] WARN  root (build/wdb/ChunkResource.cc:316) - Memory tables were not released cleanly! LockStatus=LOCKED_OTHER
>>>>>> [2015-08-27T21:32:13.086-0500] [0x7f92c2df8720] DEBUG root (build/wdb/ChunkResource.cc:272) - execLockSql CREATE DATABASE IF NOT EXISTS q_memoryLockDb;CREATE TABLE IF NOT EXISTS q_memoryLockDb.memoryLockTbl ( keyId INT UNIQUE, uid INT ) ENGINE = MEMORY;
>>>>>> [2015-08-27T21:32:13.086-0500] [0x7f92c2df8720] DEBUG root (build/wdb/ChunkResource.cc:272) - execLockSql INSERT INTO q_memoryLockDb.memoryLockTbl (keyId, uid) VALUES(1, 29962 )
>>>>>> [2015-08-27T21:32:13.086-0500] [0x7f92c2df8720] ERROR root (build/wdb/ChunkResource.cc:373) - Lock failed, exiting. query=INSERT INTO q_memoryLockDb.memoryLockTbl (keyId, uid) VALUES(1, 29962 ) err=Error 1062: Duplicate entry '1' for key 'keyId' Unable to execute query: INSERT INTO q_memoryLockDb.memoryLockTbl (keyId, uid) VALUES(1, 29962 )
>>>>>>
>>>>>>
>>>>>> I saved full log here:
>>>>>>
>>>>>> /home/becla/qserv-run/2015_08/var/log/worker/xrootd.log
>>>>>>
>>>>>> I hope we will be able to resolve it very soon; if we can't, the backup plan is:
>>>>>> a) back off all changes
>>>>>> b) disable the code that is causing xrootd to die while you investigate
>>>>>>
>>>>>> I'll try to debug it tonight
>>>>>>
>>>>>> Jacek
>>>>>>
>>>>>> Use REPLY-ALL to reply to list
>>>>>>
>>>>>> To unsubscribe from the QSERV-L list, click the following link:
>>>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1
>>>>>
>>>
>>
>>
>
