
URL:
  <http://savannah.cern.ch/bugs/?82184>

                 Summary: Problem for startup using /etc/init.d and proxy
services
                 Project: XROOTD
            Submitted by: bdouglas
            Submitted on: 2011-05-13 03:24
                Severity: 5 - Blocker
                Priority: 7 - High
                  Status: None
                 Privacy: Public
             Assigned to: None
        Originator Email: 
             Open/Closed: Open
         Discussion Lock: Any
      Fixed by commit(s): 

    _______________________________________________________

Details:

According to the xrootd manual, components may be started in any order; each
one simply waits until the resources it needs are available before proceeding.
This is great because it spares the sysadmin from having to work out a
start-up sequence when bringing up the system or restarting part of it.

However, this seems to be in direct opposition to how init.d works: xrootd
assumes a parallel start-up, while init.d starts services serially. Depending
on the order in which things are specified in init.d, this can lead to a
deadlock. For instance, starting a proxy server before its manager on the same
machine will hang: the server waits for the manager, and the manager is never
started because init.d is still waiting on the server. Reversing the order
does not necessarily help either, because the manager may itself be waiting
for resources from other machines in the cluster. That delays the proxy, which
in turn delays init.d. This puts the sysadmin in the unenviable position of
trying to work out the state of dozens of machines whenever init.d hangs, when
all of this could have been avoided if init.d simply started everything in
parallel. There seems to be a fundamental mismatch here.

Today, Andy and I had a heck of a time debugging this problem. Initially, a
typo in the config file caused the proxy xrootd server's child process not to
finish, and the parent process was locked. After we fixed the mistake in the
config file, we restarted things and got into a deadlock because none of the
data servers were running (not an uncommon occurrence after a power outage).
The child proxy process was hung again, waiting for a response from the
redirector. These lockups were a direct result of how /etc/init.d daemonizes
things.
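
The ordering problem can be illustrated with a small Python model. This is a
toy sketch under stated assumptions, not how xrootd or init.d actually work.
The component names, the (name, dependency) tuples and the timeout are all
hypothetical, and the timeout only stands in for the indefinite hang a real
daemon would show. With the proxy listed before its manager, the serial start
stalls at the proxy. Launching everything at once lets both come up.

```python
import threading

# Toy model (hypothetical): each component is a (name, dependency) tuple.
# A component becomes "ready" only after its dependency is ready. The
# timeout stands in for the indefinite hang a real daemon would show.


def start_serial(components, timeout):
    """Start components one at a time, as a serial init.d sequence does.

    Returns (started, blocked_at): the names that came up, and the
    component that stalled waiting for a dependency (None if all started).
    """
    ready = {name: threading.Event() for name, _ in components}
    started = []
    for name, dep in components:
        # The next component is not launched until this one finishes
        # starting, so a dependency listed later can never become ready.
        if dep is not None and not ready[dep].wait(timeout):
            return started, name
        ready[name].set()
        started.append(name)
    return started, None


def start_parallel(components, timeout):
    """Launch every component at once; each waits only on its own dependency."""
    ready = {name: threading.Event() for name, _ in components}
    results = {}

    def run(name, dep):
        if dep is not None and not ready[dep].wait(timeout):
            results[name] = "timed out"
            return
        ready[name].set()
        results[name] = "running"

    threads = [threading.Thread(target=run, args=c) for c in components]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Hypothetical init.d ordering: proxy listed before the manager it needs.
    order = [("proxy", "manager"), ("manager", None)]
    print("serial:", start_serial(order, timeout=1.0))
    print("parallel:", start_parallel(order, timeout=1.0))
```

Reversing the order makes the serial case succeed in this model, but only
because the model has no remote dependencies. A manager waiting on data
servers elsewhere in the cluster would still stall a serial sequence.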




    _______________________________________________________

Reply to this item at:

  <http://savannah.cern.ch/bugs/?82184>

_______________________________________________
  Message sent via/by LCG Savannah
  http://savannah.cern.ch/