URL:
<http://savannah.cern.ch/bugs/?82184>
Summary: Problem for startup using /etc/init.d and proxy services
Project: XROOTD
Submitted by: bdouglas
Submitted on: 2011-05-13 03:24
Severity: 5 - Blocker
Priority: 7 - High
Status: None
Privacy: Public
Assigned to: None
Originator Email:
Open/Closed: Open
Discussion Lock: Any
Fixed by commit(s):
_______________________________________________________
Details:
According to the xrootd manual, any component may be started in any order and
will simply wait until it has the right resources to proceed. This is great
because it relieves the sysadmin of having to work out start-up ordering when
starting the system or restarting some part of it.
However, this seems to be in direct opposition to how init.d works. That is,
while xrootd assumes a parallel start-up order, init.d assumes a serial one.
This essentially means that, depending on how you specify things in init.d,
you may get into a deadlock situation. For instance, starting a proxy server
before starting its manager on the same machine will hang, because the server
will wait for the manager and the manager will never be started. While this
would seem to be avoided by reversing the order, that does not necessarily
solve anything, because the manager could be waiting for resources from other
machines in the cluster, which delays the proxy, which in turn delays init.d.
This puts the sysadmin in the unenviable position of trying to figure out the
state of dozens of machines when init.d hangs, when all of this could have
been avoided if init.d just started everything in parallel. There seems to be
a fundamental mismatch here.
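To make the mismatch concrete, here is a minimal Python sketch of the
"start everything in parallel" behaviour described above. It is not how
init.d or the xrootd init scripts actually work. The function name
start_all, the grace_seconds parameter, and the component names are all
hypothetical, and the commands passed in are stand-ins for whatever really
starts each component. The idea is only that every launcher is started
before any of them is waited on, and the wait is bounded, so one blocked
component is reported instead of hanging the whole start-up.

```python
import subprocess
import time


def start_all(commands, grace_seconds=2.0):
    """Launch every command without waiting, then report status.

    commands: dict mapping a (hypothetical) component name to its argv list.
    Returns a dict mapping each name to "exited:<code>" or "still starting".
    """
    # Start all launchers first, so none waits on another being started.
    procs = {name: subprocess.Popen(argv) for name, argv in commands.items()}

    # Poll until every launcher has returned or the grace period expires.
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if all(p.poll() is not None for p in procs.values()):
            break
        time.sleep(0.05)

    # Report instead of blocking: a launcher that has not returned is left
    # running and flagged, rather than stalling the rest of the start-up.
    status = {}
    for name, proc in procs.items():
        code = proc.poll()
        status[name] = "still starting" if code is None else f"exited:{code}"
    return status
```

With a serial loop that waits on each launcher in turn, a proxy listed before
its manager would block forever. Here the manager is launched regardless, and
a proxy that is still waiting after the grace period is simply reported as
"still starting" for the sysadmin to look at.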
Today, Andy and I had a heck of a time debugging this problem.
Initially, a typo in the config file caused the proxy xrootd server child
process not to finish, and the parent process was locked. After we fixed
the mistake in the config file, we restarted things and got into a deadlock
condition because none of the data servers were running, which is not an
uncommon occurrence after a power outage. The child proxy process was hung
again, waiting for a response from the redirector. These locking occurrences
were a direct result of how /etc/init.d daemonizes things.
_______________________________________________________
Reply to this item at:
<http://savannah.cern.ch/bugs/?82184>
_______________________________________________
Message sent via/by LCG Savannah
http://savannah.cern.ch/