Hi Gregory,

Ah, yes, Fabrizio and I are working on a paper that goes into detail about 
this. I'll give you the short answer....

> If the manager olbd crashes, then the manager's cache of what files are 
> already staged on a machine would be lost. When it restarts, does it 
> rebuild that knowledge only by responding to requests to open files (at 
> which point it may behave inefficiently by assuming that there are no 
> primary servers for a file)? Or does it broadcast some directive to get 
> information about which servers have the file already staged?
Caches almost always get rebuilt on demand. So, the olbd requires, by
convention, that the server assigned to stage a file claim to have it from
then on, even if the staging hasn't actually completed yet.
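To make the convention concrete, here is a minimal sketch (the class and
method names are mine, not the actual olbd API): once the manager assigns a
server to stage a file, its cache immediately records that server as holding
the file, so a restarted manager rebuilding its cache on demand converges to
the same assignment instead of triggering duplicate stages.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.staged = set()

    def stage(self, path):
        # Stand-in for an asynchronous stage request; the real transfer
        # may take a long time, but the claim is made immediately.
        self.staged.add(path)


class Manager:
    def __init__(self, servers):
        self.servers = servers
        self.cache = {}  # path -> name of the server claiming the file

    def locate(self, path):
        """Return the server for `path`, assigning a stager on a miss."""
        if path not in self.cache:
            # Cache miss (fresh start or first request): pick a server,
            # ask it to stage, and record it as the holder from now on.
            server = self.pick_server(path)
            server.stage(path)
            self.cache[path] = server.name
        return self.cache[path]

    def pick_server(self, path):
        # Placeholder for the real server-selection algorithm.
        return self.servers[hash(path) % len(self.servers)]
```

The point is that `locate` is deterministic after the first request: a crash
that wipes `cache` only costs one extra round of on-demand lookups, not a
second copy of every file.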

> If a server drops a staged file to make space for something else, does it 
> also tell the manager so that the manager updates its cache of what files 
> are where? Or does the central manager always direct what should be 
> dropped (since it has more global knowledge about what is heavily 
> replicated)?
Since there is no central point of control in this system (remember that 
even the top-most olbd doesn't know anything beyond its immediate set of 64 
neighbors), file discards are decided by the holder of the resource (i.e., 
the leaf node). When a file is discarded, a message is sent to the manager 
so that it can update its cache. Having "more global" knowledge here is moot. 
It winds up that files rarely get replicated (at least in our environment). 
As far as we can tell, that's because the server-selection algorithm works 
remarkably well in practice (it wasn't luck - we went through several 
iterations of bad algorithms before we hit on a good one).
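A rough sketch of that discard path (again, the names and eviction policy
here are mine for illustration, not the olbd's actual ones): the leaf server
decides what to evict to reclaim space and simply notifies its manager, which
does nothing more than update its cache.

```python
class ManagerCache:
    def __init__(self):
        self.cache = {}  # path -> set of server names believed to hold it

    def note_staged(self, path, server):
        self.cache.setdefault(path, set()).add(server)

    def note_discarded(self, path, server):
        # Message from the leaf node: drop it from the cache entry.
        holders = self.cache.get(path)
        if holders:
            holders.discard(server)
            if not holders:
                del self.cache[path]


class LeafServer:
    def __init__(self, name, manager):
        self.name = name
        self.manager = manager
        self.files = {}  # path -> size in bytes

    def discard_for_space(self, needed):
        """Evict files (largest-first, a stand-in policy) until
        `needed` bytes are free, telling the manager about each one."""
        freed = 0
        for path, size in sorted(self.files.items(), key=lambda kv: -kv[1]):
            if freed >= needed:
                break
            del self.files[path]
            freed += size
            self.manager.note_discarded(path, self.name)
        return freed
```

The direction of the message is the whole point: the manager never orders an
eviction, it just keeps its cache consistent with what the leaves report.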


> --
> Gregory J. Sharp                   email: [log in to unmask]
> Wilson Synchrotron Laboratory      url: 
> Dryden Rd                          ph:  +1 607 255 4882
> Ithaca, NY 14853                   fax: +1 607 255 8062