Hello Pete


On Sat, 20 Nov 2004, Peter Elmer wrote:

>   Hi Remi,
>
> On Fri, Nov 19, 2004 at 04:29:28PM -0800, Remi Mommsen wrote:
> > I have many erratic problems with the bbrprod0X servers inhibiting the
> > use of xrootd for the skim production. I cannot reliably reproduce the
> > errors, but about 30% of the transfers fail. The tracebacks are similar
> > to the one posted by Alvise and myself to xrootd-l.
> >
> > Questions:
> > - Are you (or somebody else) actively looking into these issues? We
> > need to get this solved by early next week.
> > - Which version(s) of xrootd are running on bbrprod0X? Can you please
> > start the latest version on all of them?
>
>   Andy shouldn't be doing this. We arranged things such that this decision
> should be _entirely_ in the hands of Wilko, Artem, etc. (i.e. Andy
> shouldn't even need to be in the loop to distribute the software via
> "taylor" at SLAC). Wilko, is that not true?

Artem and I do the restarts and also some of the configuration of the
xrootds, but setting up a release so that it gets distributed by taylor
is done by Andy. Anyhow, I am restarting the xrootd servers with the
latest version, 20041118-0948, but right now it doesn't start, which looks
like a configuration issue.

Wilko


>
> > - I can get a checksum only from bbrprod05. Do you know what the
> > problem is?
>
>   There is clearly a big mess with the versions. I see:
>
>   bbrprod01  20041022-0258
>   bbrprod02  20040830-0105
>   bbrprod03  20040830-0105
>   bbrprod04  20040830-0105
>   bbrprod05  20041022-0258
>
> and of course I've no idea if they have all been started with the new
> version of the config file which includes the external checksum script.
>
>   Actually, you can always check the versions in Ganglia:
>
>   http://www-gmon.slac.stanford.edu:8080/ganglia/?m=xrootd_version&r=hour&s=by%2520hostname&c=xrootd-prod&h=&sh=1&hc=4
>
>   Wilko, could you please sort this out?
>
> > There is a test perl script at
> > /afs/slac.stanford.edu/u/br/bbrskim/releases/test-16.0.1a/workdir/
> > testPAdmin.pl
> > which exercises the functionality which we need.
> >
> > BTW: we gave up on getting it to work using olb on the time scale of next
> > week. We will be happy if the functionality required by testPAdmin.pl
> > works for all 5 bbrprod0X machines.
>
>   I'll take a look at it once they start the latest version of the
> server (20041118-0948) on all 5 machines with the config file containing
> the directive with the external checksum script.
>
>   BTW, the fact that you are using your own compiled version of (HEAD of) the
> client instead of the version installed in afs is also a bit confusing. I'll
> try to sort out the debug version for linux to help this along.
>
>                                  thanks,
>                                    Pete
>
> -------------------------------------------------------------------------
> Peter Elmer     E-mail: [log in to unmask]      Phone: +41 (22) 767-4644
> Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
> -------------------------------------------------------------------------
>