Ah, yes, Fabrizio and I are working on a paper that goes into detail about
this. I'll give you the short answer....
> If the manager olbd crashes, then the manager's cache of what files are
> already staged on a machine would be lost. When it restarts, does it
> rebuild that knowledge only by responding to requests to open files (at
> which point it may behave inefficiently by assuming that there are no
> primary servers for a file)? Or does it broadcast some directive to get
> information about which servers have the file already staged?
Caches almost always get rebuilt on demand. So, the olbd requires that the
server assigned to stage a file must, by convention, claim to have the file
from then on, even if the staging hasn't actually completed yet.
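To make the convention concrete, here's a minimal sketch in Python. All the names (Manager, Server, locate) are illustrative, not the actual olbd interfaces; the point is just that a restarted manager with an empty cache still finds the file, because the assigned stager answers "yes" even before the file is on disk.

```python
# Hypothetical sketch of the on-demand cache-rebuild convention.
# Not the olbd protocol -- just the idea described above.

class Server:
    def __init__(self, name):
        self.name = name
        self.staged = set()     # files actually on disk
        self.assigned = set()   # files this server agreed to stage

    def assign_stage(self, path):
        # By convention, once assigned, the server claims to have the
        # file even before staging completes.
        self.assigned.add(path)

    def has_file(self, path):
        return path in self.staged or path in self.assigned


class Manager:
    def __init__(self, servers):
        self.servers = servers
        self.cache = {}         # path -> server name, rebuilt lazily

    def locate(self, path):
        # Rebuild cache entries on demand: if the location is unknown,
        # poll the servers; the assigned stager will claim the file.
        if path not in self.cache:
            for s in self.servers:
                if s.has_file(path):
                    self.cache[path] = s.name
                    break
        return self.cache.get(path)


a, b = Server("a"), Server("b")
mgr = Manager([a, b])
b.assign_stage("/data/f1")    # b was told to stage the file
mgr.cache.clear()             # simulate a manager crash/restart
print(mgr.locate("/data/f1")) # b still claims the file -> "b"
```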
> If a server drops a staged file to make space for something else, does it
> also tell the manager so that the manager updates its cache of what files
> are where? Or does the central manager always direct what should be
> dropped (since it has more global knowledge about what is heavily used)?
Since there is no central point of control in this system (remember that
even the top-most olbd doesn't know anything beyond its immediate set of 64
neighbors), file discards are determined by the holder of the resource
(i.e., the leaf node). When a file is discarded, a message is sent to the
manager so that it can update its cache. Having "more global" knowledge
here is moot.
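Here's a small illustrative sketch of that flow, under stated assumptions: the class and method names (Manager, LeafServer, note_discarded) and the simple oldest-first eviction are mine, not the olbd protocol. The leaf decides what to drop and notifies the manager, which prunes its location cache.

```python
# Sketch of leaf-initiated discards with a notification to the manager.
# Names and the eviction policy are assumptions for illustration only.

class Manager:
    def __init__(self):
        self.cache = {}  # path -> set of server names holding it

    def note_staged(self, path, server):
        self.cache.setdefault(path, set()).add(server)

    def note_discarded(self, path, server):
        # Discard message from a leaf node: update the cache so future
        # lookups don't route clients to a server that dropped the file.
        holders = self.cache.get(path)
        if holders:
            holders.discard(server)
            if not holders:
                del self.cache[path]


class LeafServer:
    def __init__(self, name, manager, capacity):
        self.name, self.manager = name, manager
        self.capacity = capacity
        self.files = []  # oldest first: a crude stand-in for real policy

    def stage(self, path):
        while len(self.files) >= self.capacity:
            victim = self.files.pop(0)  # the leaf decides what to drop
            self.manager.note_discarded(victim, self.name)
        self.files.append(path)
        self.manager.note_staged(path, self.name)


mgr = Manager()
leaf = LeafServer("leaf1", mgr, capacity=2)
leaf.stage("/a"); leaf.stage("/b"); leaf.stage("/c")  # "/a" gets evicted
print(sorted(mgr.cache))  # ['/b', '/c']
```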
It winds up that files rarely get replicated (at least in our environment).
As far as we can tell, that's because the algorithm for server selection
works remarkably well in practice (it wasn't luck - we went through several
iterations of bad algorithms until we hit on a good one).
> Gregory J. Sharp
> Wilson Synchrotron Laboratory
> Dryden Rd, Ithaca, NY 14853
> ph: +1 607 255 4882   fax: +1 607 255 8062