I have a problem with xrootd which I can't really explain, although I
have some suspicions.
My systems run RHEL 3. As a result, I am using round-robin scheduling
rather than load balancing.
I have one director (D) and three data servers (S1, S2 & S3). I have
set olbd.delay drop 1m.
I have three files cached, one on each of the data servers. The HSM
system is up and running and able to deliver any of the three files to
any server that requests it.
Now if I run a program that reads all three files, then D caches the
location status. If I now kill S1 and S2 and try to run my program
again, it hangs for 1 minute waiting for D to drop S1 and S2. After
that, D sits in an (apparently) endless loop of
041210 15:05:38 010 do_Select Select delay XXXXXX.lns.cornell.edu 15
(where XXXXXX is the deleted hostname) instead of caching the two files
to S3 that are not already cached there.
I suspect that this is because it cached the negative responses from S3
regarding the two files not cached on it, which were made when S1 and
S2 were still running. I would hope that when S1 and S2 were dropped,
all the files that D believed they cached were removed from the
olb cache. But when that happens, it might also be a good idea to drop
all negative cache information.
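To make that suggestion concrete, here is a toy Python model of a director's location cache. It is a hypothetical sketch, not how olbd actually stores its cache, and the class and method names are made up for illustration. Each file records which servers answered "I have it" and which answered "I don't"; dropping a server also throws away every negative answer, so a later request for a file whose only holder disappeared would get a fresh query instead of waiting on stale information.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class FileLocation:
    have: Set[str] = field(default_factory=set)  # servers that said they hold the file
    lack: Set[str] = field(default_factory=set)  # servers that said they do not


class LocationCache:
    """Toy model of a director's location cache (hypothetical, not olbd's code)."""

    def __init__(self) -> None:
        self.entries: Dict[str, FileLocation] = {}

    def record(self, path: str, server: str, has_file: bool) -> None:
        """Store one server's answer to 'do you have <path>?'."""
        loc = self.entries.setdefault(path, FileLocation())
        if has_file:
            loc.have.add(server)
            loc.lack.discard(server)
        else:
            loc.lack.add(server)
            loc.have.discard(server)

    def drop_server(self, server: str) -> None:
        """Forget a dropped server and, as proposed, all negative answers."""
        for path in list(self.entries):
            loc = self.entries[path]
            loc.have.discard(server)
            # Negative answers (e.g. "S3 lacks f1") were given while the holder
            # was still up, so they no longer justify waiting; clear them all.
            loc.lack.clear()
            # With no known holder left, forget the file entirely so the next
            # request would trigger a fresh query (and a stage) instead of a wait.
            if not loc.have:
                del self.entries[path]

    def lookup(self, path: str) -> Optional[FileLocation]:
        return self.entries.get(path)
```

In the scenario above (S1 holds f1, S2 holds f2, S3 holds f3, and every server has answered for every file), dropping S1 and S2 leaves only the f3 entry, pointing at S3 with no negative answers; f1 and f2 are forgotten, so the next open would ask S3 again.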
Is there a parameter to shorten the lifetime of the negative cache
responses? olb.fxhold seems like a candidate, but that deletes positive
caching information as well, which I don't want to do.
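The kind of setting being asked for can be sketched as a cache where positive and negative answers carry separate lifetimes, so the negative one could be made short without touching the positive one. This is a purely illustrative Python model; `pos_ttl` and `neg_ttl` are hypothetical names, not real olb directives, and nothing here describes how olb.fxhold is actually implemented.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLLocationCache:
    """Hypothetical sketch: positive and negative answers expire independently.

    pos_ttl and neg_ttl are made-up names for illustration; they are not
    olbd directives.
    """

    def __init__(self, pos_ttl: float, neg_ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.pos_ttl = pos_ttl
        self.neg_ttl = neg_ttl
        self.clock = clock
        # (path, server) -> (has_file, time the answer was recorded)
        self.answers: Dict[Tuple[str, str], Tuple[bool, float]] = {}

    def record(self, path: str, server: str, has_file: bool) -> None:
        self.answers[(path, server)] = (has_file, self.clock())

    def query(self, path: str, server: str) -> Optional[bool]:
        """Cached answer if still fresh, else None (meaning: ask the server again)."""
        entry = self.answers.get((path, server))
        if entry is None:
            return None
        has_file, stamp = entry
        # Pick the lifetime that matches the kind of answer that was cached.
        ttl = self.pos_ttl if has_file else self.neg_ttl
        if self.clock() - stamp > ttl:
            del self.answers[(path, server)]  # expired: forget it
            return None
        return has_file
```

With a 60-second negative lifetime, "S3 lacks f1" would be forgotten a minute after it was recorded, while "S3 has f3" would be kept for the full positive lifetime.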
Am I barking up the wrong tree here, and do I need to tweak something
else to stop this aberrant behavior?
Any help would be greatly appreciated.
--
Gregory J. Sharp
Wilson Synchrotron Laboratory url: http://www.lepp.cornell.edu/~gregor
Dryden Rd ph: +1 607 255 4882
Ithaca, NY 14853 fax: +1 607 255 8062