Hello Pete,

On Sat, 20 Nov 2004, Peter Elmer wrote:

> Hi Remi,
>
> On Fri, Nov 19, 2004 at 04:29:28PM -0800, Remi Mommsen wrote:
> > I have many erratic problems with the bbrprod0X servers inhibiting the
> > use of xrootd for the skim production. I cannot reliably reproduce the
> > errors, but about 30% of the transfers fail. The tracebacks are similar
> > to the one posted by Alvise and myself to xrootd-l.
> >
> > Questions:
> > - Are you (or somebody else) actively looking into these issues? We
> >   need to get this solved by early next week.
> > - Which version(s) of xrootd are running on bbrprod0X? Can you please
> >   start the latest version on all of them?
>
> Andy shouldn't be doing this. We arranged things such that this decision
> should be _entirely_ in the hands of Wilko, Artem, etc. (i.e. Andy
> shouldn't even need to be in the loop to distribute the software via
> "taylor" at SLAC). Wilko, is that not true?

Artem and I do the restarts and also some of the configuration of the
xrootds, but setting up a release so that it gets distributed by taylor
is done by Andy.

Anyhow, I am restarting the xrootd servers with the latest version,
20041118-0948, but right now it doesn't start, which looks like a
configuration issue.

Wilko

>
> > - I can get a checksum only from bbrprod05. Do you know what the
> >   problem is?
>
> There is clearly a big mess with the versions. I see:
>
> bbrprod01 20041022-0258
> bbrprod02 20040830-0105
> bbrprod03 20040830-0105
> bbrprod04 20040830-0105
> bbrprod05 20041022-0258
>
> and of course I've no idea if they have all been started with the new
> version of the config file which includes the external checksum script.
>
> Actually, you can always check the versions in Ganglia:
>
> http://www-gmon.slac.stanford.edu:8080/ganglia/?m=xrootd_version&r=hour&s=by%2520hostname&c=xrootd-prod&h=&sh=1&hc=4
>
> Wilko, could you please sort this out?
>
> > There is a test Perl script at
> >   /afs/slac.stanford.edu/u/br/bbrskim/releases/test-16.0.1a/workdir/testPAdmin.pl
> > which exercises the functionality which we need.
> >
> > BTW: we gave up on getting it to work using olb on the time scale of
> > next week. We will be happy if the functionality required by
> > testPAdmin.pl works for all 5 bbrprod0X machines.
>
> I'll take a look at it once they start the latest version of the
> server (20041118-0948) on all 5 machines with the config file containing
> the directive with the external checksum script.
>
> BTW, the fact that you are using your own compiled version of (HEAD of) the
> client instead of the version installed in afs is also a bit confusing. I'll
> try to sort out the debug version for linux to help this along.
>
> thanks,
> Pete
>
> -------------------------------------------------------------------------
> Peter Elmer     E-mail: [log in to unmask]     Phone: +41 (22) 767-4644
> Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
> -------------------------------------------------------------------------