Hello Chris,

I saw something similar at SLAC. I put a file into the kan xrootd cluster
but not into HPSS. Using xrdcp to access the file, I saw failures from time
to time because xrootd redirected the client to a server that didn't have
the file. The client then went back to the redirector, but it was again
redirected to a different server that didn't have the file, and at some
point the client gave up and failed (note that the server with the file was
up the whole time).

AFAIK, if the client asks for a file again (with the refresh bit turned on),
xrootd will locate the file again and should find that it is on a
particular server. However, xrootd might not select that server because it
is heavily loaded. In all of the old xrootd versions the load values are
totally off, so it could be that xrootd is not selecting a server even
though it is not that busy.

Do you have the olbd log file (from the redirector) from when you saw this
problem? Maybe the load value for the data server was very high and the
machine didn't get selected. This is just a guess, but I will see if I can
test it.

Cheers,
Wilko

On Thu, 27 Oct 2005, Brew, CAJ (Chris) wrote:

> Hi,
>
> I still don't really understand why this is a client issue and not a
> server issue, but whatever it is, it still seems to be present in the
> latest SP release.
>
> A quick recap of the problem as it presented in this case.
>
> The disk containing one of the BkgTrigger files was taken offline for
> checks. This was a run 5 collection that had just been imported and so
> hadn't yet been written to tape.
>
> When a client job requested the file, the olbd redirected it to one of
> the stage servers to get it from tape, which obviously failed, as did
> the subsequent jobs.
>
> However, when the disk was brought back online and the file was once
> again available, 50% of the jobs continued to fail.
>
> Looking at the olbd logs, I see it redirecting these jobs to the stage
> server irrespective of the fact that it FAILED to import the file.
>
> As far as I can tell, the only way to fix this is to stop and start the
> olbds on both redirectors.
>
> Chris.