Hi Remi,

If my last response was not copied to xrootd-l (please look at the cc
list), please forward it to xrootd-l. Thanks!

Andy

On Thu, 25 Nov 2004, Remi Mommsen wrote:

> Hi Andy,
>
> On Nov 25, 2004, at 2:31 AM, Andrew Hanushevsky wrote:
>
> > Hi Remi,
> >
> >> This is indeed solved, but the load balancer cashes the information
> >> once it has retrieved it.
> > I don't understand this. There is no core file and there is nothing in
> > the log indicating that a crash/restart occurred. Hence, there is no
> > trail of a crash. So what is really happening?
>
> I didn't say it crashes; I said it caches the results (sorry for the
> typo, but it should have been clear from the context).
>
> If you ask the load balancer for a file at a time when the file does not
> exist, it claims the file does not exist for several hours (8?), even
> when the file has been created in the meantime. The opposite is true as
> well: if you delete a file, the load balancer still reports that the
> file exists and even reports a checksum for files that were deleted
> hours before.
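To make the behaviour described above concrete, here is a minimal Python model of a lookup cache that stores both positive and negative answers for a fixed lifetime. The class name, the 8-hour `ttl` and the injected clock are assumptions for illustration only; this is not the actual xrootd load-balancer code.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLLookupCache:
    """Hypothetical model: remembers every answer, positive or negative,
    for `ttl` seconds without asking the data servers again."""

    def __init__(self, lookup: Callable[[str], Optional[str]], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self._lookup = lookup    # asks the servers; returns a checksum or None
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Optional[str]]] = {}

    def query(self, path: str) -> Optional[str]:
        now = self._clock()
        hit = self._entries.get(path)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]        # cached answer, possibly stale
        answer = self._lookup(path)
        self._entries[path] = (now, answer)
        return answer


servers = {}                     # path -> checksum actually on disk
fake_now = [0.0]
cache = TTLLookupCache(servers.get, ttl=8 * 3600, clock=lambda: fake_now[0])

print(cache.query("/prod/skim.root"))   # None: the file is not there yet
servers["/prod/skim.root"] = "abc123"   # the file is created afterwards
fake_now[0] = 3600.0
print(cache.query("/prod/skim.root"))   # still None: stale negative entry
fake_now[0] = 8 * 3600.0
print(cache.query("/prod/skim.root"))   # abc123: the entry has expired
```

Deleting a file produces the mirror image: a cached checksum keeps being returned until the entry expires.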
>
> At best this is confusing, but for the skim production it is also very
> inconvenient. We need to merge several collections from the skim
> production which were copied to the /prod area. A merge operation fails
> from time to time and we need to redo it. As we create roughly 170
> merges for different skims in parallel, we do not want to redo all of
> them just because one merge failed. Thus, at the beginning of the job we
> ask the load balancer whether it already has the file (which is uniquely
> named). If it does not have the file, we start the merge and create the
> file, which is then copied to xrootd. However, due to the load
> balancer's cache, any further check on that file fails for the next 8
> hours, because the load balancer claims the file does not exist. In the
> worst case, the merge is restarted before the cache is cleared and, as
> no merge output existed at all in the first round, the load balancer
> reports that none of the files exist. The whole merge is redone and the
> output files are reproduced, duplicating the files already in xrootd.
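The restart logic described above amounts to filtering the list of merge outputs by an existence check. A minimal sketch, assuming a hypothetical `exists` callable that wraps the load-balancer query (no real xrootd API is implied): with fresh answers only the failed merge is rerun, but with answers cached from the first round every merge is rerun.

```python
from typing import Callable, Iterable, List


def merges_to_run(outputs: Iterable[str],
                  exists: Callable[[str], bool]) -> List[str]:
    """Return the merge outputs that still have to be produced.

    `exists` is a hypothetical callable asking whether a uniquely named
    output file is already in xrootd."""
    return [name for name in outputs if not exists(name)]


in_xrootd = {"skim001.root", "skim002.root"}   # already copied
stale_answers = set()                          # cached from round one: nothing existed
outputs = ["skim001.root", "skim002.root", "skim003.root"]

print(merges_to_run(outputs, lambda n: n in in_xrootd))      # ['skim003.root']
print(merges_to_run(outputs, lambda n: n in stale_answers))  # all three: duplicates
```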
>
> Another issue arises when copying files into xrootd fails. We copy the
> file (with xrdcp), then ask for the checksum of the copied file and
> compare it to the local checksum. Assume that the copy corrupted the
> file and the checksums do not match. We want to delete the file and
> retransfer it. But the load balancer will keep reporting the cached
> (wrong) checksum regardless of whether the retransfer copied the file
> correctly.
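The copy-verify-retry procedure above can be sketched as a small loop. All callables here (`copy`, `remote_checksum`, `delete`) are hypothetical stand-ins, for example for an xrdcp invocation and a checksum query through the load balancer; the `scenario` helper simulates a first copy that is corrupted and shows that a cached checksum answer prevents the loop from ever succeeding.

```python
from typing import Callable


def copy_with_verification(copy: Callable[[], None],
                           remote_checksum: Callable[[], str],
                           local_checksum: str,
                           delete: Callable[[], None],
                           max_attempts: int = 3) -> bool:
    """Copy, compare checksums, and delete and retry on a mismatch."""
    for _ in range(max_attempts):
        copy()
        if remote_checksum() == local_checksum:
            return True
        delete()          # remove the corrupted copy before retrying
    return False


def scenario(cache_answers: bool) -> bool:
    copies = iter(["bad", "good", "good"])   # checksum produced by each copy
    state = {}
    seen = []

    def copy() -> None:
        state["sum"] = next(copies)

    def checksum() -> str:
        if cache_answers and seen:
            return seen[0]                    # stale answer from the first query
        seen.append(state["sum"])
        return state["sum"]

    def delete() -> None:
        state.pop("sum", None)

    return copy_with_verification(copy, checksum, "good", delete)


print(scenario(cache_answers=False))  # True: the second copy verifies
print(scenario(cache_answers=True))   # False: the cached "bad" checksum never changes
```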
>
> I hope it is now clear that we either have to switch off the caching
> (if that is possible) or make the caching more clever. I guess the
> latter will be hard as long as xrdcp does not work via the load
> balancer.
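The two options mentioned above can be illustrated with a hypothetical cache that can be switched off entirely or told to forget a path when a file is created, rewritten or deleted. This is a sketch under the assumption that writers can notify the cache; as noted above, that is exactly what is missing while xrdcp does not go through the load balancer. The class and method names are invented for illustration.

```python
from typing import Callable, Dict, Optional


class InvalidatingCache:
    """Hypothetical lookup cache with an off switch and explicit invalidation."""

    def __init__(self, lookup: Callable[[str], Optional[str]],
                 enabled: bool = True):
        self._lookup = lookup        # asks the servers; returns a checksum or None
        self._enabled = enabled      # False: always ask the servers
        self._entries: Dict[str, Optional[str]] = {}

    def query(self, path: str) -> Optional[str]:
        if self._enabled and path in self._entries:
            return self._entries[path]
        answer = self._lookup(path)
        if self._enabled:
            self._entries[path] = answer
        return answer

    def invalidate(self, path: str) -> None:
        """To be called by whoever creates, rewrites or deletes `path`."""
        self._entries.pop(path, None)


servers = {}
cache = InvalidatingCache(servers.get)
print(cache.query("/prod/merge.root"))   # None
servers["/prod/merge.root"] = "abc123"
cache.invalidate("/prod/merge.root")     # the writer reports the new file
print(cache.query("/prod/merge.root"))   # abc123
```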
>
> Cheers,
> 		Remi
>
> ---------------------------------------------------------------------
> Computers are like air-conditioners, they stop working properly when
> you open Windows.                                         (Anonymous)
>
> *********************************************************************
> Remigius K. Mommsen                 e-mail: [log in to unmask]
> University of California, Irvine       URL:    http://cern.ch/mommsen
> c/o SLAC                             voice:        ++1 (650) 926-3595
> 2575 Sand Hill Road #35                fax:        ++1 (650) 926-3882
> Menlo Park, CA 94025, US              home:        ++1 (650) 233-9041
> *********************************************************************
>
>