Hi,

A few questions, just to see if I got it right. In an auto-config
cluster all nodes besides the managers are specified as superservers.
A superserver is, first of all (at startup), a normal data server.
This means two processes are running: an xrootd (target) and an olbd
(data server) process. If new supervisors are needed, the manager
tells a superserver to start a supervisor, so another process is
forked (an xrootd including an olbd, right?). So on a superserver node
that also acts as a supervisor, three processes are running in total
(1 olbd, 2 xrootds)?

Have you thought about which available superserver should fork a
supervisor? Just take the next available one, or apply an algorithm
(for example, an election) to determine the best candidate?

cheers, guenter

Andrew Hanushevsky wrote:
> Hi,
>
> I have been investigating how to meet various needs in terms of
> 0-config clusters and simplifying the overall setup. The two main
> items that were brought up, 1) 0-config, and 2) combining the xrootd
> and olbd so that one does not have to deal with two daemons, seem
> like workable solutions that are, in fact, very related. So, here is
> a proposal and please feel free to rip it apart :-)
>
> In general, manager and supervisor nodes can combine the xrootd and
> olbd as one daemon. This is because they perform one simple function
> -- lookup and redirection. The thread demands are largely homogeneous
> and one would expect a smooth flow through the daemon. Data servers,
> on the other hand, pose a problem, since combining the two functions
> is like mixing apples and oranges. Data servers never really need
> olbd functionality and, indeed, the thread demands for data serving
> would compete with the services a data server olbd provides to the
> cluster. Depending on the thread contention, long delays could be
> introduced into the olbd path that would cause unpredictable
> behaviour in terms of locating files. Hence, the two functions really
> need to live in separate processes.
>
> So, here is what I can do to work within these constraints.
>
> 1) We introduce a new directive (optional):
>
>    xrootd.olb <path>
>
> which specifies the location of the olbd "plugin", libXrdOlb.so. The
> default is to use whatever LD_LIBRARY_PATH happens to be set to.
>
> 2) Manager and supervisor xrootds simply load the plugin and use it
> via an object interface. Data server xrootds load the plugin, then
> fork and execute the plugin in a separate process. The new process
> verifies that a previous incarnation is not running; if one is, it
> exits, since the xrootd will simply use the previous incarnation.
> This allows all functionality to be controlled by simply starting an
> xrootd with the appropriate parameters. No more starting a separate
> daemon. A rough sketch of what this could look like follows.
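> To make item 2 concrete, here is a minimal C++ sketch of that load
> path. The entry point name XrdOlbRun() and its signature are made up
> for illustration; the real plugin would expose a richer object
> interface:
>
>    #include <dlfcn.h>
>    #include <unistd.h>
>    #include <cstdio>
>    #include <cstdlib>
>
>    // Hypothetical plugin entry point, named for illustration only.
>    typedef int (*XrdOlbRun_t)(int argc, char **argv);
>
>    int startOlb(const char *path, int argc, char **argv,
>                 bool isDataServer)
>    {
>       // Fall back to the LD_LIBRARY_PATH search when no xrootd.olb
>       // path was configured.
>       void *h = dlopen(path ? path : "libXrdOlb.so", RTLD_NOW);
>       if (!h) {fprintf(stderr, "olb: %s\n", dlerror()); return 1;}
>       XrdOlbRun_t olbRun = (XrdOlbRun_t)dlsym(h, "XrdOlbRun");
>       if (!olbRun) {fprintf(stderr, "olb: no entry point\n"); return 1;}
>
>       // Managers and supervisors run the olb in-process.
>       if (!isDataServer) return olbRun(argc, argv);
>
>       // Data servers fork, so the olbd's cluster services never
>       // compete with data-serving threads.
>       pid_t pid = fork();
>       if (pid < 0) return 1;
>       if (pid == 0)
>          {// Child: first exit quietly if a previous incarnation
>           // already holds a well-known lock (check omitted here);
>           // otherwise become the olbd.
>           exit(olbRun(argc, argv));
>          }
>       return 0;  // Parent continues as a plain data server xrootd.
>    }
>
> The previous-incarnation check in the child would amount to taking a
> lock on a well-known pid file before calling the entry point.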
> 3) Introduce a new directive (mandatory for auto-config clusters):
>
>    olb.xrootd <command line>
>
> This directive specifies how to start an xrootd that will function as
> a supervisor. I suppose we can come up with defaults, but the problem
> is that the xrd layer strips out parameters before passing the
> command line to xrootd, so we can never know things like where the
> log file should go. However, I don't think this parameter is
> unwieldy, since it's pretty much fixed once you dream up the
> configuration.
>
> 4) Introduce a new directive (optional):
>
>    olb.ftlevel x%
>
> This specifies the fault tolerance level (the default is 20%). The
> manager will start enough supervisors to handle x% more data servers
> than are really needed (e.g., by default, 1.2 times as many
> supervisors as would otherwise be needed are started).
>
> 5) Modify the existing role directive:
>
>    olb.role {manager | server | supervisor | superserver} [if <conds>]
>
> The difference is that you can specify that a data server olb can
> also function as a supervisor olb if you specify
>
>    olb.role superserver
>
> In general, auto-config clusters would always have that directive
> (the default being manual configuration). This also provides a
> convenient way to limit which nodes can act as superservers.
>
> The algorithm would work as follows (a sketch of the manager side
> appears after the list):
>
> 1) As data server olbs connect, they tell the manager (as they do
> now) how many rounds they have done without finding a supervisor.
> Once that number reaches 3 (arbitrary -- you can give me another
> one), the manager asks a superserver that it has not asked before to
> start a supervisor, and the data server olbd is asked to restart its
> search.
>
> 2) When a superserver olbd is asked to start a supervisor (and it has
> not done so already), it launches one using the olb.xrootd command.
> The trick here (which I haven't figured out yet) is how to know that
> a supervisor has been launched across a restart of the data server
> xrootd/olbd. Tricky, very tricky.
>
> 3) The manager tallies how many data servers it knows about and
> always makes sure that olb.ftlevel supervisors have been started.
>
> 4) This leads to a good possibility that the superserver requests can
> be cascaded so that auto-clustering can work past 4,096 servers.
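> As a minimal sketch of the manager side of steps 1 and 3, assuming a
> fan-out of 64 data servers per supervisor (the figure that makes two
> levels top out near 4,096 servers) and some stubbed-out messaging
> helpers that are purely illustrative:
>
>    #include <cmath>
>    #include <cstdio>
>    #include <set>
>    #include <string>
>
>    const int    MaxRounds = 3;     // rounds before the manager acts
>    const int    FanOut    = 64;    // assumed servers per supervisor
>    const double ftLevel   = 1.20;  // olb.ftlevel default of 20%
>
>    std::set<std::string> superServers; // olb.role superserver nodes
>    std::set<std::string> asked;        // superservers already asked
>
>    // Hypothetical cluster messaging, stubbed for illustration.
>    void askToStartSupervisor(const std::string &node)
>    {  printf("ask %s to start a supervisor\n", node.c_str());
>    }
>    void restartSearch(const std::string &node)
>    {  printf("tell %s to restart its search\n", node.c_str());
>    }
>
>    // Step 3: supervisors that n data servers call for, padded by
>    // the fault tolerance level.
>    int supervisorsNeeded(int n)
>    {  return (int)std::ceil((double)n / FanOut * ftLevel);
>    }
>
>    // Step 1: a data server reports another round without finding
>    // a supervisor.
>    void onFailedRound(const std::string &dataServer, int rounds)
>    {  if (rounds < MaxRounds) return;
>       for (const std::string &ss : superServers)
>           if (!asked.count(ss))
>              {asked.insert(ss);
>               askToStartSupervisor(ss); // step 2 runs on that node
>               restartSearch(dataServer);
>               return;
>              }
>    }
>
> With the defaults, 100 data servers would call for
> ceil(100/64 * 1.2) = 2 supervisors. Note that this simply takes the
> next superserver that has not yet been asked; nothing here tries to
> elect a best candidate.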
> There are still many details to work out (the devil is in the
> details):
>
> a) How one controls this algorithm in the presence of load-balanced
> managers. That is, you can start x managers and, somehow, one has to
> prevent these managers from starting supervisors willy-nilly. This is
> not an easy problem to solve, as managers work independently and are
> loath to contact each other (in fact, that's one of the strengths of
> the current scheme).
>
> b) What are the administrative interface relationships? This is
> another one that I haven't solved. It's easy when xrootds and olbds
> are separate, but difficult to address when some are and some are
> not.
>
> c) What are the cache side-effects in the presence of combined
> xrootd/olbds? Not clear. What I do know is that there will be more
> cache activity as things come and go at the supervisor level. How
> that sorts out is unknown. However, one good thing here is that this
> investigation did bring to light a failing in the reconfiguration
> algorithm. Currently, the system does not completely handle port
> reassignment across partial reconfigurations (i.e., an xrootd going
> away, then coming back with a different port number). Something to
> fix.
>
> d) How will this affect existing schemes to automatically restart
> failed servers? Note that data servers start an ephemeral olbd. This
> puts the xrootd in the situation where it has to make sure that the
> ephemeral olbd is restarted should it fail. It also adds the big nit
> that this knowledge is lost across data server restarts, and it's not
> clear how to handle that situation.
>
> e) Should the architecture change in terms of the xrootd/olbd
> relationship? Currently, olbd interactions occur at the ofs layer. In
> a combined xrootd/olbd these interactions could occur at the xroot
> protocol layer. In some ways this is cleaner, but it is also more
> restrictive in how you can reuse components.
>
> f) How long will it take to reach stability? The answer is obviously
> longer than it takes now since, if for no other reason, supervisors
> cannot be pre-started. It is unknown what production effect this will
> have.
>
> g) There are probably a lot of other end conditions that I don't know
> about. So, please speak up and ask questions on how things would be
> handled in strange situations.
>
> All in all, the above is a workable solution, but not something that
> I can implement in a day. So, please comment, because once I start
> down this road it will be hard to change things.
>
> Andy