I have a problem with xrootd which I can't really explain, although I 
have some suspicions.
My systems run RHEL 3. As a result, I am using round-robin scheduling 
rather than load balancing.
I have one director (D) and three data servers (S1, S2 & S3).  I have 
set olb.delay drop 1m.
I have three files cached, one on each of the data servers. The HSM 
system is up and running, and able to deliver all three files to any 
server that requests them.

Now if I run a program that reads all three files, then D caches the 
location status. If I now kill S1 and S2 and try to run my program 
again, it hangs for 1 minute waiting for D to drop S1 and S2. After 
that, D sits in an (apparently) endless loop of

     041210 15:05:38 010 do_Select Select delay XXXXXX.lns.cornell.edu 15

(where XXXXXX is the deleted hostname) instead of caching the two files 
to S3 that are not already cached there.

I suspect that this is because it cached the negative responses from S3 
regarding the two files not cached on it, which were made when S1 and 
S2 were still running. I have to hope that when S1 and S2 were dropped, 
all the files that D believed they cached were removed from the olb 
cache. But when that happens, it might also be a good idea to drop all 
negative cache information.
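
To make the suspicion concrete, here is a toy Python model of the 
behavior described above. It is only a sketch of the hypothesis, not 
olbd's actual code: the class LocationCache, its methods, the 
clear_negative switch and the file names f1, f2, f3 are all invented 
for illustration. It assumes the director keeps separate "has it" and 
"does not have it" records per file, and never asks a server to stage a 
file while a negative record for that server is still cached.

```python
# Toy model of the suspected behavior; NOT olbd's real implementation.
# All names here (LocationCache, clear_negative, f1..f3) are hypothetical.

class LocationCache:
    def __init__(self, servers):
        self.servers = set(servers)
        self.has = {}      # file -> servers that answered "I have it"
        self.has_not = {}  # file -> servers that answered "I don't have it"

    def record(self, file, server, present):
        # Store a positive or a negative response from a server.
        table = self.has if present else self.has_not
        table.setdefault(file, set()).add(server)

    def drop_server(self, server, clear_negative=False):
        # Forget everything the dropped server itself told us.
        self.servers.discard(server)
        for holders in self.has.values():
            holders.discard(server)
        for missing in self.has_not.values():
            missing.discard(server)
        # Proposed change: also forget all negative responses from survivors.
        if clear_negative:
            self.has_not.clear()

    def select(self, file):
        # Prefer a live server known to hold the file.
        holders = self.has.get(file, set()) & self.servers
        if holders:
            return sorted(holders)[0]
        # Otherwise pick a live server not known to lack it (it can stage).
        candidates = self.servers - self.has_not.get(file, set())
        if candidates:
            return sorted(candidates)[0]
        return None  # nobody eligible: the client is told to wait


def demo(clear_negative):
    owners = {"f1": "S1", "f2": "S2", "f3": "S3"}
    cache = LocationCache(["S1", "S2", "S3"])
    # First run: every server reports on every file.
    for file, owner in owners.items():
        for server in ("S1", "S2", "S3"):
            cache.record(file, server, present=(server == owner))
    # S1 and S2 are killed and eventually dropped by the director.
    cache.drop_server("S1", clear_negative)
    cache.drop_server("S2", clear_negative)
    return {file: cache.select(file) for file in sorted(owners)}


if __name__ == "__main__":
    print(demo(clear_negative=False))
    print(demo(clear_negative=True))
```

In this model, with clear_negative=False the stale negative records 
from S3 leave no eligible server for the two files that lived on S1 and 
S2, which would match the endless "Select delay" loop. With 
clear_negative=True, S3 becomes eligible for all three files and could 
stage the missing two from the HSM.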

Is there a parameter to shorten the lifetime of the negative cache 
responses? olb.fxhold seems like a candidate, but that deletes positive 
caching information as well, which I don't want to do.

Am I barking up the wrong tree here, and do I need to tweak something 
else to stop this aberrant behavior?
Any help would be greatly appreciated.

--
Gregory J. Sharp                   email: [log in to unmask]
Wilson Synchrotron Laboratory      url: http://www.lepp.cornell.edu/~gregor
Dryden Rd                          ph:  +1 607 255 4882
Ithaca, NY 14853                   fax: +1 607 255 8062