QSERV-L Archives

QSERV-L@LISTSERV.SLAC.STANFORD.EDU


QSERV-L  September 2015

Subject:

Re: xrootd graceful shutdown? (was: memoryLockDb troubles)

From:

Andrew Hanushevsky <[log in to unmask]>

Reply-To:

General discussion for qserv (LSST prototype baseline catalog)

Date:

Thu, 3 Sep 2015 14:34:28 -0700

Content-Type:

MULTIPART/MIXED

Hi Serge,

Well, XrdSsiProviderServer is a simple global extern symbol, so there is
nothing to destroy there. Of course, a provider object's address is assigned
as the symbol's content, so at exit that object would not be destroyed.
The question is whether there is a need to destroy that object if
you are exiting anyway (yes for valgrind purposes and, perhaps, for other
reasons you would want all destructors to be called).
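
A minimal sketch of that difference, assuming simplified stand-in types (the XrdSsiProvider and SsiProviderServer structs below are placeholders, not the real xrootd or qserv headers): if the provider object itself has static storage duration and the extern pointer merely points at it, its destructor runs on a normal return from main() or a call to exit(), whereas an object created with new and stored in the pointer is never deleted. Neither variant helps when the process is killed.

```cpp
#include <cstdio>

// Simplified stand-ins for the real xrootd/qserv classes (hypothetical).
struct XrdSsiProvider {
    virtual ~XrdSsiProvider() = default;
};

struct SsiProviderServer : XrdSsiProvider {
    // Shutdown cleanup (e.g. releasing a lock) would go here.
    ~SsiProviderServer() override { std::puts("SsiProviderServer destroyed"); }
};

// Pattern from the thread: the pointer is a plain global, but the object it
// points to lives on the heap and is never deleted, so no destructor runs:
//   XrdSsiProvider *XrdSsiProviderServer = new SsiProviderServer;

// Alternative: give the object static storage duration and export its address.
// Its destructor runs during normal termination (return from main() or exit()),
// in reverse order of construction, but NOT on SIGKILL, an unhandled SIGTERM,
// or _exit().
static SsiProviderServer providerInstance;
XrdSsiProvider *XrdSsiProviderServer = &providerInstance;
```

Note that running the destructor at exit() is only safe if no other thread is still using the provider at that moment, which is exactly the concern raised in the quoted message below.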

Now, xrootd was never architected to call exit(), which is why you never
see that in XrdMain.cc. Historically (this has probably changed by now),
exiting from the main thread while other threads were active caused more
problems than it was worth trying to deal with, so xrootd was written to
simply accept a hard "kill" with no damaging side effects. This may not
work all that well with the qserv plug-in, as there may be state
information that has to be cleaned up. That said, you still have to deal
with the fact that a server may crash, in which case you never get a chance
to shut down cleanly anyway. So the best course of action is to never put
yourself in the position where you have to correct state information in
that case (i.e. simply accept the fact that destructors will not always be
called at shutdown). Of course, that's easy for me to say.

Perhaps the best solution is either for you to implement a SIGTERM handler
or for xrootd to implement one (though what it would do is another story).
Let me know your thoughts on this.

Andy
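
The sigwait() approach Serge describes in the quoted message below could look roughly like this. It is a sketch, not xrootd code, and the function names are hypothetical. It only works if the signals are blocked in main() before any other thread is started, because threads inherit the signal mask; that is why it would have to live in xrootd's main rather than in a plug-in. Since the signal is collected synchronously instead of inside a handler, the cleanup that runs afterwards does not have to be async-signal-safe.

```cpp
#include <pthread.h>
#include <signal.h>

// Blocks SIGTERM and SIGINT in the calling thread and returns the set.
// Threads created afterwards inherit this mask, so the signals stay pending
// until some thread collects them with sigwait().
sigset_t blockShutdownSignals() {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGTERM);
    sigaddset(&set, SIGINT);
    pthread_sigmask(SIG_BLOCK, &set, nullptr);
    return set;
}

// Waits synchronously for one of the signals in set and returns its number,
// or -1 on error. No signal handler is involved, so ordinary cleanup code
// (destructors, database calls, logging) can run after it returns.
int waitForShutdown(const sigset_t& set) {
    int sig = 0;
    if (sigwait(&set, &sig) != 0) return -1;
    return sig;
}
```

In main() the sequence would be: call blockShutdownSignals(), start the worker threads, call waitForShutdown(), stop the workers and run cleanup, then return normally so that static destructors also run.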

On Fri, 28 Aug 2015, Serge Monkewitz wrote:

> [+AndyH]
>
> XrdSsiProviderServer is a global that will only be destroyed if you explicitly destruct it, if you call exit(), or if you return normally from main(). It might also get called if somebody dlclose()s the worker shared library, but I really don't know for sure what's guaranteed in that case.
>
> Our worker is being stopped via delivery of SIGTERM to the xrootd process (followed up by SIGKILL if that doesn't work). Since exit() is not guaranteed to be async-signal-safe and we have no idea who might be using the object at arbitrary points, I don't think we can tie SIGTERM to a signal handler that does any of those things and call it a day. I'm not sure it's possible to do this correctly from inside an xrootd plugin, i.e. xrootd might have to provide graceful shutdown for us to be able to assume that our global destructors will get called.
>
> From what I've read, one is supposed to do this sort of thing by having main() sigwait() for SIGTERM or SIGINT, or by having main() read from a "termination" pipe that is written to from inside signal handlers. When a graceful shutdown request is received, one runs cleanup code and returns from main() normally. Xrd/XrdMain.cc doesn't seem to do that, but I could easily be missing another approach or not understanding the code very well.
>
> Andy - can you advise?
>
> Serge
>
>> On Aug 28, 2015, at 12:16 PM, John Gates <[log in to unmask]> wrote:
>>
>> If that one destructor gets called, it should get everything on the list. The thing is we need to know when xrootd is done with it.
>>
>> -John
>>
>> On 08/28/15 12:08, Becla, Jacek wrote:
>>> "a list of destructors"... that sounds like it might be less trivial than I expected. If you think you need more than 1SP, go ahead and tweak it today
>>>
>>> Jacek
>>>
>>>
>>>> On Aug 28, 2015, at 9:00 AM, Gates, John H <[log in to unmask]> wrote:
>>>>
>>>> Also, there's a long list of destructors that aren't being called, including the one that should release the lock. I think this isn't being deleted.
>>>>    XrdSsiProvider *XrdSsiProviderServer = new lsst::qserv::xrdsvc::SsiProviderServer;
>>>>
>>>> On 08/27/15 20:54, Becla, Jacek wrote:
>>>>> For the record, I have a fix in DM-3618
>>>>>
>>>>> John, FYI, AndyS is restarting xrootd in wmgr because apparently that is the only way to force xrootd to refresh chunk inventory (at the moment)
>>>>>
>>>>> Jacek
>>>>>
>>>>>
>>>>>
>>>>>> On Aug 27, 2015, at 7:39 PM, Becla, Jacek <[log in to unmask]> wrote:
>>>>>>
>>>>>> John
>>>>>>
>>>>>> Your latest code is giving me trouble. The integration test killed xrootd; the tail of the log is similar to what you observed intermittently:
>>>>>>
>>>>>> [2015-08-27T21:32:13.084-0500] [0x7f92c2df8720] INFO  root (build/xrdsvc/SsiService.cc:142) - Cleaning up scratchDb: qservScratch.
>>>>>> [2015-08-27T21:32:13.086-0500] [0x7f92c2df8720] WARN  root (build/wdb/ChunkResource.cc:302) - memLockStatus LOCKED_OTHER wrong uid. Expected 29962 got 28572 err=
>>>>>> [2015-08-27T21:32:13.086-0500] [0x7f92c2df8720] WARN  root (build/wdb/ChunkResource.cc:316) - Memory tables were not released cleanly! LockStatus=LOCKED_OTHER
>>>>>> [2015-08-27T21:32:13.086-0500] [0x7f92c2df8720] DEBUG root (build/wdb/ChunkResource.cc:272) - execLockSql CREATE DATABASE IF NOT EXISTS q_memoryLockDb;CREATE TABLE IF NOT EXISTS q_memoryLockDb.memoryLockTbl ( keyId INT UNIQUE, uid INT ) ENGINE = MEMORY;
>>>>>> [2015-08-27T21:32:13.086-0500] [0x7f92c2df8720] DEBUG root (build/wdb/ChunkResource.cc:272) - execLockSql INSERT INTO q_memoryLockDb.memoryLockTbl (keyId, uid) VALUES(1, 29962 )
>>>>>> [2015-08-27T21:32:13.086-0500] [0x7f92c2df8720] ERROR root (build/wdb/ChunkResource.cc:373) - Lock failed, exiting. query=INSERT INTO q_memoryLockDb.memoryLockTbl (keyId, uid) VALUES(1, 29962 ) err=Error 1062: Duplicate entry '1' for key 'keyId' Unable to execute query: INSERT INTO q_memoryLockDb.memoryLockTbl (keyId, uid) VALUES(1, 29962 )
>>>>>>
>>>>>>
>>>>>> I saved full log here:
>>>>>>
>>>>>> /home/becla/qserv-run/2015_08/var/log/worker/xrootd.log
>>>>>>
>>>>>> I hope we will be able to resolve it very soon. If we don't, the backup plan is:
>>>>>> a) back off all changes
>>>>>> b) disable the code that is causing xrootd to die while you investigate
>>>>>>
>>>>>> I'll try to debug it tonight
>>>>>>
>>>>>> Jacek
>>>>>>
>>>>>> Use REPLY-ALL to reply to list
>>>>>>
>>>>>> To unsubscribe from the QSERV-L list, click the following link:
>>>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1 <https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1>
>>>>>
>>>
>>
>>
>
>
