Hi,

I have been investigating how to meet various needs in terms of zero-config clusters and simplifying the overall setup. The two main items that were brought up, 1) zero-config and 2) combining xrootd and olbd so that one does not have to deal with two daemons, seem like workable solutions that are, in fact, closely related. So, here is a proposal; please feel free to rip it apart :-)

In general, manager and supervisor nodes can combine xrootd and olbd into one daemon. This works because they perform one simple function -- lookup and redirection. The thread demands are largely homogeneous and one would expect a smooth flow through the daemon.

Data servers, on the other hand, pose a problem, since combining the two functions is like mixing apples and oranges. Data servers never really need olbd functionality and, indeed, the thread demands of data serving would compete with the services a data server's olbd provides to the cluster. Depending on the thread contention, long delays could be introduced into the olbd path, causing unpredictable behaviour in terms of locating files. Hence, the two functions really need to live in separate processes.

So, here is what I can do to work within these constraints:

1) Introduce a new directive (optional):

   xrootd.olb <path>

which specifies the location of the olbd "plugin", libXrdOlb.so. The default is to use whatever LD_LIBRARY_PATH happens to be set to.

2) Manager and supervisor xrootd's simply load the plugin and use it via an object interface. Data server xrootd's load the plugin, then fork and execute it in a separate process. That process verifies that a previous incarnation is not running; if one is, it exits, since the xrootd will simply use the previous incarnation. This allows all functionality to be controlled by simply starting an xrootd with the appropriate parameters. No more starting a separate daemon.
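For illustration, a combined node's config file might carry the new directive like this (the path below is invented for the example; the real default is to search LD_LIBRARY_PATH):

```
# Hypothetical example only -- the path is made up.
# Point xrootd at the olbd plugin; omit the directive to rely on
# LD_LIBRARY_PATH instead.
xrootd.olb /opt/xrootd/lib/libXrdOlb.so
```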
3) Introduce a new directive (mandatory for auto-config clusters):

   olb.xrootd <command line>

This directive specifies how to start an xrootd that will function as a supervisor. I suppose we can come up with defaults, but the problem is that the xrd layer strips out parameters before passing the command line to xrootd, so we can never know things like where the log file should go. However, I don't think this parameter is unwieldy, since it's pretty much fixed once you dream up the configuration.

4) Introduce a new directive (optional):

   olb.ftlevel x%

This specifies the fault tolerance level (default 20%). The manager will start enough supervisors to handle x% more data servers than are really needed (e.g., by default 1.2 times as many supervisors as would be needed are started).

5) Modify the existing role directive:

   olb.role {manager | server | supervisor | superserver} [if <conds>]

The difference is that you can specify that a data server olbd can also function as a supervisor olbd by specifying

   olb.role superserver

In general, auto-clusters would always have that directive (the default being manual configuration). This also provides a convenient way to limit which nodes can act as superservers.

The algorithm would work as follows:

1) As data server olbd's connect, they tell the manager (as they do now) how many rounds they have done without finding a supervisor. Once that number reaches 3 (arbitrary -- you can give me another one), the manager asks a superserver that it has not asked before to start a supervisor, and the data server olbd is asked to restart its search.

2) When a superserver olbd is asked to start a supervisor (and it has not already done so), it launches a supervisor using the olb.xrootd command. The trick here (which I haven't figured out yet) is how to know that a supervisor has been launched across a restart of the data server xrootd/olbd. Tricky, very tricky.
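To make the manager-side arithmetic concrete, here is a toy sketch in Python -- not real olbd code. The round limit of 3 and the 20% fault-tolerance level come from the proposal above; the fan-out of 64 subscribers per supervisor is my assumption (64 x 64 = 4,096, which matches the scaling figure this design aims past), as is the Manager class shape itself:

```python
# Toy model of the proposed supervisor-provisioning algorithm.
# FANOUT is an assumption; ROUND_LIMIT and FT_LEVEL are from the proposal.
import math
from typing import List, Optional

FANOUT = 64        # assumed max subscribers per supervisor
ROUND_LIMIT = 3    # rounds a data server may search before the manager acts
FT_LEVEL = 0.20    # olb.ftlevel default: 20% spare supervisors

def supervisors_needed(n_servers: int, ftlevel: float = FT_LEVEL) -> int:
    """Supervisors the manager should keep running for n_servers data
    servers, padded by the fault-tolerance level."""
    if n_servers <= 0:
        return 0
    base = math.ceil(n_servers / FANOUT)
    return math.ceil(base * (1.0 + ftlevel))

class Manager:
    def __init__(self, superservers: List[str]):
        self.unasked = list(superservers)  # superservers not yet asked
        self.started = 0                   # supervisors launched so far

    def on_server_report(self, rounds: int) -> Optional[str]:
        """Step 1: a data server reports how many rounds it has gone
        without finding a supervisor.  At ROUND_LIMIT, pick a superserver
        we have not asked before; it will launch a supervisor via its
        olb.xrootd command, and the data server restarts its search."""
        if rounds >= ROUND_LIMIT and self.unasked:
            self.started += 1
            return self.unasked.pop(0)
        return None

    def top_up(self, n_servers: int) -> int:
        """Keep the supervisor count at the fault-tolerance level:
        how many more supervisors should be started right now."""
        return max(supervisors_needed(n_servers) - self.started, 0)
```

With these defaults, for example, 100 data servers call for ceil(ceil(100/64) * 1.2) = 3 supervisors.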
3) The manager tallies how many data servers it knows about and always makes sure that enough supervisors, per olb.ftlevel, have been started.

4) This opens up the good possibility that the superserver requests can be cascaded, so that auto-clustering can work past 4,096 servers.

There are still many details to work out (the devil is in the details):

a) How one controls this algorithm in the presence of load-balanced managers. That is, you can start x managers and, somehow, one has to prevent these managers from starting supervisors willy-nilly. This is not an easy problem to solve, as managers work independently and are loath to contact each other (in fact, that's one of the strengths of the current scheme).

b) What are the administrative interface relationships? This is another one I haven't solved. It's easy when xrootd's and olbd's are separate, but difficult to address when some are and some are not.

c) What are the cache side-effects in the presence of combined xrootd/olbd's? Not clear. What I do know is that there will be more cache activity as things come and go at the supervisor level. How that sorts out is unknown.

d) One good thing here is that this investigation did bring to light a failing in the reconfiguration algorithm. Currently, the system does not completely handle port reassignment across partial reconfigurations (i.e., an xrootd going away and then coming back with a different port number). Something to fix.

e) How will this affect existing schemes to automatically restart failed servers? Note that data servers start an ephemeral olbd. This puts the xrootd in the situation where it has to make sure that the ephemeral olbd is restarted should it fail. It also adds in the big nit that this knowledge is lost across data server restarts, and it's not clear how to handle that situation.

f) Should the architecture change in terms of the xrootd/olbd relationship? Currently, olbd interactions occur at the ofs layer.
In a combined xrootd/olbd, these interactions could occur at the xroot protocol layer. In some ways this is cleaner, but it is also more restrictive in how you can reuse components.

g) How long will it take to reach stability? The answer is obviously longer than it takes now since, if for no other reason, supervisors cannot be pre-started. What effect this will have on production is unknown.

h) There are probably a lot of other end conditions that I don't know about. So, please speak up and ask questions about how things would be handled in strange situations.

All in all, the above is a workable solution, but not something that I can implement in a day. So, please comment, because once I start down this road it will be hard to change things.

Andy