Gregory J. Sharp wrote:
> Andy,
> I am trying to understand the olbd caching system. However, self-doubt 
> persists and I wonder if you could help me.
> In the case of load-based selection (as opposed to round-robin), it 
> looks like the following to me:
> 1. Each time a file is opened, the olbd manager makes a list (well, a 
> mask) of servers that have the file (a primary) or that could stage the 
> file.
> 2. It then checks the currently cached load information for each server 
> from step 1 and takes the first one that isn't overloaded.
> 3. If the selected server doesn't already have the file, then the master 
> tells it to stage the file and notes in the manager's cache that the 
> file should be on this server in the future.
> 4. It then redirects the open request to the selected server.
> Now for the ignorant questions:
> If the manager olbd crashes, then the manager's cache of what files are 
> already staged on a machine would be lost. When it restarts, does it 
> rebuild that knowledge only by responding to requests to open files (at 
> which point it may behave inefficiently by assuming that there are no 
> primary servers for a file)? Or does it broadcast some directive to get 
> information about which servers have the file already staged?

  Andy may be able to give more precise details, but in short: the 
knowledge is rebuilt incrementally as new requests arrive. When a 
request for an unknown file comes in, that olbd broadcasts an 
information request to its child nodes and then collects the responses 
asynchronously. So it is not supposed to behave inefficiently; at most 
it may pause the client that is asking for the file until the responses 
have come in.
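
  To make the idea concrete, here is a rough sketch in Python. It is 
only a toy model of the behaviour described above, not the olbd code: 
the names Manager, locate, deliver_responses and on_have are made up 
for illustration, and the asynchronous replies are simulated by an 
explicit call.

```python
# Toy model only: these names are invented for illustration and are
# not part of the olbd API.

class Manager:
    def __init__(self, children):
        self.children = children   # child name -> set of files it holds
        self.cache = {}            # file -> set of children known to hold it
        self.pending = set()       # files with a query currently in flight

    def locate(self, path):
        """Return ('redirect', server) or ('wait', None)."""
        holders = self.cache.get(path)
        if holders:
            return ("redirect", sorted(holders)[0])
        if path not in self.pending:
            # Unknown file: query every child; answers arrive later.
            self.pending.add(path)
        # The client is paused and expected to ask again.
        return ("wait", None)

    def deliver_responses(self, path):
        """Stand-in for the asynchronous replies from the children."""
        for name, files in self.children.items():
            if path in files:
                self.on_have(name, path)
        self.pending.discard(path)

    def on_have(self, child, path):
        # Each positive answer rebuilds one more piece of the cache.
        self.cache.setdefault(path, set()).add(child)


# A freshly restarted manager knows nothing about file locations.
m = Manager({"srvA": {"/data/f1"}, "srvB": {"/data/f1", "/data/f2"}})
first = m.locate("/data/f1")     # cache is empty, so the client waits
m.deliver_responses("/data/f1")  # the children answer
second = m.locate("/data/f1")    # now the cache has the answer
```

  In this sketch only the first request for a file after a restart pays 
the cost of the broadcast; later requests for the same file are answered 
from the rebuilt cache.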

> If a server drops a staged file to make space for something else, does 
> it also tell the manager so that the manager updates its cache of what 
> files are where? Or does the central manager always direct what should 
> be dropped (since it has more global knowledge about what is heavily 
> replicated)?

  Usually that is handled by the client protocol. If a client, after 
following a path through one or more redirectors, cannot find a file 
where it was supposed to be, it goes back and asks for a cache refresh 
for that file. This way the peripheral data servers are quite 
autonomous in choosing which files to drop, and the same mechanism 
should also help in recovering from other kinds of failure.
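
  Again as a rough Python sketch, with the same caveat: DataServer, 
Redirector, refresh and open_file are hypothetical names chosen for 
illustration, and the real client protocol certainly differs in its 
details. The point is only that the server drops a file without telling 
anyone, and the stale cache entry is repaired when a client trips over 
it.

```python
# Toy model only: these names are invented for illustration and are
# not part of the real client protocol.

class DataServer:
    def __init__(self, name, files):
        self.name = name
        self.files = set(files)

    def has(self, path):
        return path in self.files


class Redirector:
    def __init__(self, servers):
        self.servers = {s.name: s for s in servers}
        self.cache = {}   # file -> set of server names believed to hold it

    def refresh(self, path):
        """Discard what is cached for path and ask the servers again."""
        self.cache[path] = {n for n, s in self.servers.items() if s.has(path)}

    def locate(self, path):
        if path not in self.cache:
            self.refresh(path)
        holders = self.cache[path]
        return self.servers[sorted(holders)[0]] if holders else None


def open_file(redirector, path, max_tries=2):
    """Client side: follow the redirect; on a miss, ask for a refresh."""
    for _ in range(max_tries):
        server = redirector.locate(path)
        if server is None:
            return None
        if server.has(path):
            return server.name
        # Stale cache entry: the server dropped the file on its own.
        redirector.refresh(path)
    return None


a = DataServer("srvA", ["/f"])
b = DataServer("srvB", ["/f"])
r = Redirector([a, b])
before = open_file(r, "/f")   # served by srvA
a.files.discard("/f")         # srvA drops the file without notice
after = open_file(r, "/f")    # miss on srvA, refresh, then srvB
```

  Nothing in this sketch requires the data server to report what it 
drops; the redirector's view is corrected lazily, only for files that 
clients actually ask for.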


> -- 
> Gregory J. Sharp                   email: [log in to unmask]
> Wilson Synchrotron Laboratory      url:
> Dryden Rd                          ph:  +1 607 255 4882
> Ithaca, NY 14853                   fax: +1 607 255 8062