Hi,
a few questions, just to see if I got it right.

In an autoconf cluster all nodes besides the managers are specified as 
superservers. A superserver is first of all (at startup) a normal 
data server. This means there are two processes running, an xrootd 
(target) and an olbd (data server) process. If new supervisors are needed, 
the manager tells a superserver to start a supervisor, hence another 
process is forked (an xrootd including olbd - right?). So on a 
superserver node that also acts as a supervisor, in total 3 processes are 
running (1 olbd, 2 xrootds)?

Have you thought about which available superserver should fork a 
supervisor? Just take the next available one, or apply an algorithm (for 
example an election?) to determine the best candidate?

cheers,
guenter

Andrew Hanushevsky wrote:
> Hi,
> 
> I have been investigating on how to meet various needs in terms of
> 0-config clusters and simplifying the overall setup. The two main items
> that were brought up: 1) 0-config, and 2) combining the xrootd and olbd so
> that one does not have to deal with two daemons seem like workable
> solutions that, in fact, are very related. So, here is a proposal and
> please feel free to rip it apart :-)
> 
> In general, manager and supervisor nodes can combine the xrootd and olbd
> as one daemon. This is because they perform one simple function -- lookup
> and redirection. The thread demands are largely homogeneous and one would
> expect a smooth flow through the daemon. Data servers, on the other hand,
> pose a problem since combining the two functions is like mixing apples and
> oranges. Data servers never really need olbd functionality and, indeed,
> the thread demands for data serving would compete with the services a data
> server olbd provides to the cluster. Depending on the thread contention,
> long delays can be introduced into the olbd path that would cause
> unpredictable behaviour in terms of locating files. Hence, the two
> functions really need to live in separate processes.
> 
> So, here is what I can do to work within these constraints.
> 
> 1) We introduce a new directive (optional):
> 
> xrootd.olb <path>
> 
> which specifies the location of the olbd "plugin", libXrdOlb.so. The
> default is to use whatever LD_LIBRARY_PATH happens to be set to.
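> 
> For example (the path here is only a placeholder, not a required location):
> 
> xrootd.olb /opt/xrootd/lib/libXrdOlb.so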
> 
> 2) Manager and supervisor xrootd’s simply load the plugin and use it via
> an object interface. Data server xrootd’s load the plugin then fork and
> execute the plugin in a separate process. The process verifies that a
> previous incarnation is not running and if it is, exits since the xrootd
> will simply use the previous incarnation. This allows all functionality
> to be controlled by simply starting an xrootd with the appropriate
> parameters. No more starting a separate daemon.
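> 
> To illustrate the fork-and-check part, here is a very rough sketch in plain
> C++. It is in no way the actual xrootd code: the pid-file test is just one
> possible way to detect a previous incarnation, the paths are made up, and
> the child exec's a standalone olbd binary as a stand-in for running the
> plugin in the forked process.
> 
>   #include <cerrno>
>   #include <csignal>
>   #include <cstdio>
>   #include <sys/types.h>
>   #include <unistd.h>
> 
>   // True if a previous olbd incarnation still appears to be alive.
>   static bool previousIncarnationRunning(const char *pidFile)
>   {  FILE *f = fopen(pidFile, "r");
>      if (!f) return false;
>      long pid = 0;
>      int  n   = fscanf(f, "%ld", &pid);
>      fclose(f);
>      return n == 1 && pid > 0 && (kill((pid_t)pid, 0) == 0 || errno == EPERM);
>   }
> 
>   int main()
>   {  const char *olbdProg = "/opt/xrootd/bin/olbd";   // placeholder path
>      const char *pidFile  = "/var/run/olbd.pid";      // placeholder path
>      pid_t child = fork();
>      if (child == 0)
>         {// Child: quit if an olbd is already there, else become the olbd.
>          if (previousIncarnationRunning(pidFile)) _exit(0);
>          execl(olbdProg, "olbd", "-c", "/etc/xrootd/olbd.cf", (char *)0);
>          _exit(127);                                   // exec failed
>         }
>      printf("xrootd continues; olbd left to pid %ld\n", (long)child);
>      return 0;
>   }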
> 
> 3) Introduce a new directive (mandatory for auto-config clusters):
> 
> olb.xrootd <command line>
> 
> This directive specifies how to start an xrootd that will function as a
> supervisor. I suppose we can come up with defaults but the problem is that
> the xrd layer strips out parameters before passing the command line to
> xrootd so we can never know things like where the log file should go.
> However, I don’t think that this parameter is unwieldy since it’s pretty
> much fixed once you dream up the configuration.
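> 
> Something along these lines, say (the log and config file locations here are
> purely illustrative):
> 
> olb.xrootd /opt/xrootd/bin/xrootd -l /var/log/xrootd/supervisor.log -c /etc/xrootd/xrootd.cf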
> 
> 4) Introduce a new directive (optional):
> 
> olb.ftlevel x%
> 
> This specifies the fault tolerance level (default is 20%). The manager
> will start enough supervisors to handle x% more data servers than is really
> needed (e.g., by default 1.2 times as many supervisors as would be
> needed are started).
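> 
> To put numbers on it: assuming the usual fan-out of 64 data servers per
> supervisor (the figure implied by the 4,096-server limit mentioned below),
> a 200-node cluster needs ceil(200/64) = 4 supervisors, and with the default
> 20% level the manager would keep ceil(4 * 1.2) = 5 of them started.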
> 
> 5) Modify the existing role directive:
> 
> olb.role {manager | server | supervisor | superserver} [if <conds>]
> 
> The difference is that you can specify that a data server olb can also
> function as a supervisor olb if you specify
> 
> olb.role superserver
> 
> In general, auto-clusters would always have that directive (the default
> being manual configuration). This also provides a convenient way to limit
> which nodes can act as superservers.
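> 
> For instance (using a hostname pattern as the <conds> part; the domain is
> just an example), nodes matching the pattern would act as superservers:
> 
> olb.role superserver if sup*.mydomain.org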
> 
> The algorithm would work as follows:
> 
> 1) As data server olb’s connect they tell the manager (as is now) how many
> rounds they have done without finding a supervisor. Once that number
> reaches 3 (arbitrary -- you can give me another one), the manager asks a
> superserver that it has not asked before to start a supervisor and the
> data server olbd is asked to restart its search (a rough sketch of this
> bookkeeping follows step 4).
> 
> 2) When a superserver olbd is asked to start a supervisor (and it has not
> done so already) it launches a supervisor using the olb.xrootd command.
> The trick here (which I haven’t figured out yet) is how to know that a
> supervisor has been launched across a restart of the data server
> xrootd/olbd. Tricky, very tricky.
> 
> 3) The manager tallies how many data servers it knows about and always
> makes sure that olb.ftlevel supervisors have been started.
> 
> 4) This leads to a good possibility that the superserver requests can be
> cascaded so that auto-clustering can work past 4,096 servers.
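> 
> To make step 1 concrete, here is a very rough sketch of the manager-side
> bookkeeping (plain C++, nothing like the real olbd internals; all names
> are made up):
> 
>   #include <cstdio>
>   #include <set>
>   #include <string>
>   #include <vector>
> 
>   struct AutoClusterManager {
>      std::vector<std::string> superServers; // nodes that said olb.role superserver
>      std::set<std::string>    asked;        // superservers already asked
>      static const int MaxRounds = 3;        // the (arbitrary) threshold in step 1
> 
>      // A data server reports how many rounds it went without a supervisor.
>      void reportRounds(const std::string &dataServer, int rounds)
>      {  if (rounds < MaxRounds) return;
>         for (size_t i = 0; i < superServers.size(); ++i)
>            {if (asked.count(superServers[i])) continue;
>             asked.insert(superServers[i]);
>             std::printf("ask %s to start a supervisor (olb.xrootd)\n",
>                         superServers[i].c_str());
>             std::printf("tell %s to restart its search\n", dataServer.c_str());
>             return;
>            }
>         std::printf("no unasked superserver left for %s\n", dataServer.c_str());
>      }
>   };
> 
>   int main()
>   {  AutoClusterManager mgr;
>      mgr.superServers.push_back("nodeA");
>      mgr.superServers.push_back("nodeB");
>      mgr.reportRounds("ds01", 3);   // nodeA gets asked
>      mgr.reportRounds("ds02", 3);   // nodeB gets asked
>      return 0;
>   }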
> 
> There are still many details to work out (the devil is in the details):
> 
> a) How one controls this algorithm in the presence of load balanced
> managers. That is, you can start x managers and, somehow, one has to
> prevent these managers from starting supervisors willy-nilly. This is not
> an easy problem to solve as managers work independently and are loath to
> contact each other (in fact that’s one of the strengths of the current
> scheme).
> 
> b) What are the administrative interface relationships? This is another
> one that I haven’t solved. It’s easy when xrootd’s and olbd’s are separate
> but difficult to address when some are and some are not.
> 
> c) What are the cache side-effects in the presence of combined
> xrootd/olbd’s? Not clear. What I do know is that there will be more cache
> activity as things come and go at the supervisor level. How that sorts out
> is unknown. However, one good thing here is that this investigation did
> bring to light a failing in the reconfiguration algorithm. Currently, the
> system does not completely handle port reassignment across partial
> reconfigurations (i.e., xrootd going then coming back with a different
> port number). Something to fix.
> 
> e) How will this affect existing schemes to automatically restart failed
> servers? Note that data servers start an ephemeral olbd. This puts the
> xrootd in the situation where it has to make sure that the ephemeral olbd
> is restarted should it fail. It also adds in the big nit that this
> knowledge is lost across data server restarts and it’s not clear how to
> handle that situation.
> 
> f) Should the architecture change in terms of the xrootd/olbd
> relationship? Currently, olbd interactions occur at the ofs layer. In a
> combined xrootd/olbd these interactions could occur at the xroot protocol
> layer. In some ways this is cleaner but it also is more restrictive in how
> you can reuse components.
> 
> g) How long will it take to reach stability? The answer is obviously
> longer than it takes now since, if for no other reason, supervisors cannot
> be pre-started. Unknown what production effect this will have.
> 
> h) There are probably a lot of other end conditions that I don’t know
> about. So, please speak up and ask questions on how things would be
> handled in strange situations.
> 
> All in all, the above is a workable solution but not something that I can
> implement in a day. So, please comment because once I start down this road
> it will be hard to change things.
> 
> Andy
> 
> 
>