Hello Pete


On Sat, 20 Nov 2004, Peter Elmer wrote:

>   Hi Wilko,
>
> On Sat, Nov 20, 2004 at 11:26:06AM -0800, Wilko Kroeger wrote:
> > On Sat, 20 Nov 2004, Peter Elmer wrote:
> > > On Fri, Nov 19, 2004 at 04:29:28PM -0800, Remi Mommsen wrote:
> > > > I have many erratic problems with the bbrprod0X servers inhibiting the
> > > > use of xrootd for the skim production. I cannot reliably reproduce the
> > > > errors, but about 30% of the transfers fail. The tracebacks are similar
> > > > to the one posted by Alvise and myself to xrootd-l.
> > > >
> > > > Questions:
> > > > - Are you (or somebody else) actively looking into these issues? We
> > > > need to get this solved by early next week.
> > > > - Which version(s) of xrootd are running on bbrprod0X? Can you please
> > > > start the latest version on all of them?
> > >
> > >   Andy shouldn't be doing this. We arranged things such that this decision
> > > should be _entirely_ in the hands of Wilko, Artem, etc. (i.e. Andy
> > > shouldn't even need to be in the loop to distribute the software via
> > > "taylor" at SLAC). Wilko, is that not true?
> >
> > Artem and I do the restart and also do some of the configuration of the
> > xrootds but setting up a release so that it gets distributed by taylor
> > is done by Andy.
>
>   Ok, this is different from what Andy, Chuck and I agreed. What we
> wanted was that (a) someone (me) makes the releases and (b) you/Artem
> have complete control over distributing them with taylor and starting them
> on machines at SLAC. This takes Andy completely out of the operational
> loop. We need to fix that.
>
>   (You are of course supposed to ignore the fact that we are taking the
> person normally ~10 meters from you out of the loop while a person 8000km
> from you remains in the loop...)
>
> > Anyhow, I am restarting the xrootd servers from the
> > latest version, 20041118-0948, but right now it doesn't start, which looks
> > like a configuration issue.

The new version also needs the ooss.stagecmd option, which wasn't
specified in the xrootd.cf file and wasn't needed until now. I just
added it and now xrootd starts up. Right now it runs on bbrprod04/05,
and I will restart the other three and the redirector in a few minutes.
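
To keep track of which servers still need a restart, something like the
following Python sketch could compare the versions Pete lists further down
against the target release. The host/version table is copied by hand from
his message (in practice it would come from Ganglia), and the names TARGET,
reported, outdated_hosts and group_by_version are made up for illustration.

```python
# Target release mentioned in this thread.
TARGET = "20041118-0948"

# Versions as listed in Pete's message; hand-copied, not queried live.
reported = {
    "bbrprod01": "20041022-0258",
    "bbrprod02": "20040830-0105",
    "bbrprod03": "20040830-0105",
    "bbrprod04": "20040830-0105",
    "bbrprod05": "20041022-0258",
}


def outdated_hosts(versions, target):
    """Return the hosts whose version differs from target, sorted by name."""
    return sorted(host for host, v in versions.items() if v != target)


def group_by_version(versions):
    """Map each version string to the sorted list of hosts running it."""
    groups = {}
    for host, v in sorted(versions.items()):
        groups.setdefault(v, []).append(host)
    return groups


if __name__ == "__main__":
    for version, hosts in sorted(group_by_version(reported).items()):
        print(version, " ".join(hosts))
    print("need restart:", " ".join(outdated_hosts(reported, TARGET)))
```

With the table above, every host shows up as needing a restart, since none
of them was on 20041118-0948 when the list was made.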



Wilko


>
>   What happens?
>
>                                    Pete
>
>
> > > > - I can get a checksum only from bbrprod05. Do you know what the
> > > > problem is?
> > >
> > >   There is clearly a big mess with the versions. I see:
> > >
> > >   bbrprod01  20041022-0258
> > >   bbrprod02  20040830-0105
> > >   bbrprod03  20040830-0105
> > >   bbrprod04  20040830-0105
> > >   bbrprod05  20041022-0258
> > >
> > > and of course I've no idea if they have all been started with the new
> > > version of the config file which includes the external checksum script.
> > >
> > >   Actually, you can always check the versions in Ganglia:
> > >
> > >   http://www-gmon.slac.stanford.edu:8080/ganglia/?m=xrootd_version&r=hour&s=by%2520hostname&c=xrootd-prod&h=&sh=1&hc=4
> > >
> > >   Wilko, could you please sort this out?
> > >
> > > > There is a test perl script at
> > > > /afs/slac.stanford.edu/u/br/bbrskim/releases/test-16.0.1a/workdir/
> > > > testPAdmin.pl
> > > > which exercises the functionality which we need.
> > > >
> > > > BTW: we gave up on getting it to work using olb on the time scale of
> > > > next week. We will be happy if the functionality required by testPAdmin.pl
> > > > works for all 5 bbrprod0X machines.
> > >
> > >   I'll take a look at it once they start the latest version of the
> > > server (20041118-0948) on all 5 machines with the config file containing
> > > the directive with the external checksum script.
> > >
> > >   BTW, the fact that you are using your own compiled version of (HEAD of) the
> > > client instead of the version installed in afs is also a bit confusing. I'll
> > > try to sort out the debug version for linux to help this along.
> > >
> > >                                  thanks,
> > >                                    Pete
> > >
>
>
>
> -------------------------------------------------------------------------
> Peter Elmer     E-mail: [log in to unmask]      Phone: +41 (22) 767-4644
> Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
> -------------------------------------------------------------------------
>