Hi Remi,

If my last response was not copied to xrootd-l (please look at the cc
list), please forward it to xrootd-l; thanks!

Andy

On Thu, 25 Nov 2004, Remi Mommsen wrote:

> Hi Andy,
>
> On Nov 25, 2004, at 2:31 AM, Andrew Hanushevsky wrote:
>
> > Hi Remi,
> >
> >> This is indeed solved, but the load balancer cashes the information
> >> once it has retrieved it.
> > I don't understand this. There is no core file and there is nothing in
> > the log indicating that a crash/restart occurred. Hence, there is no
> > trail of a crash. So what is really happening?
>
> I don't say it crashes, but it caches the results (sorry for the typo,
> but it should have been clear from the context).
>
> If you ask the load balancer for a file at a time when this file does
> not exist, it claims the file does not exist for several hours (8 ?)
> even when the file was created in the meantime. The opposite is true as
> well: if you delete a file, the load balancer still reports that the
> file exists and even reports a checksum for files which were deleted
> hours before.
>
> At best this is confusing, but for the skim production it is also very
> inconvenient. We need to merge several collections from the skim
> production which were copied to the /prod area. A merge operation fails
> from time to time and we need to redo it. As we create roughly 170
> merges for different skims in parallel, we do not want to remerge all
> of them because just 1 merge failed. Thus, we ask the load balancer at
> the beginning of the job if it already has the file (which is uniquely
> named). If it does not have the file, we start the merge and create the
> file, which is then copied to xrootd. However, due to the cache of the
> load balancer, any further check on that file fails for the next 8
> hours as the load balancer claims the file does not exist.
> In the worst case, the merge is restarted before the cache is cleared
> and, as no merge existed at all in the first round, the load balancer
> will report that none of the files exist. The whole merge is redone and
> the output files are reproduced, duplicating the files already in
> xrootd.
>
> Another issue is when the copy of files into xrootd fails. We copy the
> file (with xrdcp) and then ask for the checksum of the copied file and
> compare it to the local checksum. Assume that the copy corrupted the
> file and the checksums do not match. We want to delete the file and
> retransfer it. But the load balancer will keep reporting the cached
> (wrong) checksum regardless of whether the retransfer copied the file
> correctly or not.
>
> I hope it became clear that we either have to switch off the caching
> (if that is possible) or make the caching more clever. I guess the
> latter will be hard as long as xrdcp does not work via the load
> balancer.
>
> Cheers,
> Remi
>
> ---------------------------------------------------------------------
> Computers are like air-conditioners, they stop working properly when
> you open Windows. (Anonymous)
>
> *********************************************************************
> Remigius K. Mommsen                 e-mail: [log in to unmask]
> University of California, Irvine    URL:    http://cern.ch/mommsen
> c/o SLAC                            voice:  ++1 (650) 926-3595
> 2575 Sand Hill Road #35             fax:    ++1 (650) 926-3882
> Menlo Park, CA 94025, US            home:   ++1 (650) 233-9041
> *********************************************************************
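
To illustrate the behaviour Remi describes, here is a minimal toy model in Python of a locator that caches both "exists" and "does not exist" answers for a fixed time. It is only a sketch under stated assumptions and not how the xrootd load balancer is actually implemented: the class name CachingLocator, the file path /prod/merge-001.root, and the 8-hour lifetime (taken from Remi's guess of "8 ?") are all hypothetical.

```python
import time
from typing import Callable, Dict, List, Set, Tuple


class CachingLocator:
    """Toy stand-in for a load balancer that caches lookup answers (hypothetical)."""

    def __init__(self, backend: Set[str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.backend = backend          # files that really exist right now
        self.ttl_seconds = ttl_seconds  # how long a cached answer is reused
        self.clock = clock              # injectable clock so the demo can skip ahead
        self._cache: Dict[str, Tuple[bool, float]] = {}

    def exists(self, path: str) -> bool:
        now = self.clock()
        cached = self._cache.get(path)
        if cached is not None and now - cached[1] < self.ttl_seconds:
            return cached[0]            # reused answer, possibly stale
        answer = path in self.backend   # fresh lookup against the real state
        self._cache[path] = (answer, now)
        return answer


def demo() -> List[bool]:
    now = [0.0]                         # fake clock, in seconds
    files: Set[str] = set()
    locator = CachingLocator(files, ttl_seconds=8 * 3600, clock=lambda: now[0])
    path = "/prod/merge-001.root"       # hypothetical, uniquely named output

    results = [locator.exists(path)]    # job start: file not there yet
    files.add(path)                     # merge finishes, file is copied in
    now[0] += 3600                      # one hour later
    results.append(locator.exists(path))  # cached "does not exist" is returned
    now[0] += 8 * 3600                  # cache lifetime has passed
    results.append(locator.exists(path))  # fresh lookup sees the file
    return results
```

In this model the second check still answers "does not exist" although the file is present, which is the situation that makes a restarted merge redo everything. The same stale-answer effect would apply to a cached checksum after a delete and retransfer.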