Hello Pete

On Sat, 20 Nov 2004, Peter Elmer wrote:

> Hi Wilko,
>
> On Sat, Nov 20, 2004 at 11:26:06AM -0800, Wilko Kroeger wrote:
> > On Sat, 20 Nov 2004, Peter Elmer wrote:
> > > On Fri, Nov 19, 2004 at 04:29:28PM -0800, Remi Mommsen wrote:
> > > > I have many erratic problems with the bbrprod0X servers inhibiting the
> > > > use of xrootd for the skim production. I cannot reliable reproduce the
> > > > errors, but about 30% of the transfers fail. The tracebacks are similar
> > > > to the one posted by Alvise and myself to xrootd-l.
> > > >
> > > > Questions:
> > > > - Are you (or somebody else) actively looking into these issues? We
> > > >   need to get this solved by early next week.
> > > > - Which version(s) of xrootd are running on bbrprod0X? Can you please
> > > >   start the latest version on all of them?
> > >
> > > Andy shouldn't be doing this. We arranged things such this decision
> > > should be _entirely_ in the hands of Wilko, Artem, etc. (i.e. Andy
> > > shouldn't even need to be in the loop to distribute the software via
> > > "taylor" at SLAC). Wilko, is that not true?
> >
> > Artem and I do the restart and also do some of the configuration of the
> > xrootds but setting up a release so that it gets distributed by taylor
> > is done by Andy.
>
> Ok, this is different from what Andy, Chuck and I agreed. What we
> wanted was that (a) someone (me) makes the releases and (b) you/Artem
> have complete control over distributing them with taylor and starting them
> on machines at SLAC. This takes Andy completely out of the operational
> loop. We need to fix that.
>
> (You are of course supposed to ignore the fact that we are taking the
> person normally ~10 meters from you out of the loop while a person 8000km
> from you remains in the loop...)
>
> > Any how, I am restarting the xrootd servers from the
> > latest version, 20041118-0948, but right now it doesn't start which looks
> > like a configuration issue.

The new version also needed the ooss.stagecmd option, which wasn't specified
in the xrootd.cf file and wasn't needed so far. I just added it and now xrootd
starts up. Right now it runs on bbrprod04/05, and I will restart the other
three and the redirector in a few minutes.

Wilko

>
> What happens?
>
> Pete
>
> >
> > > > - I can get a checksum only from bbrprod05. Do you know what the
> > > >   problem is?
> > >
> > > There is clearly a big mess for the versions. I see:
> > >
> > >   bbrprod01 20041022-0258
> > >   bbrprod02 20040830-0105
> > >   bbrprod03 20040830-0105
> > >   bbrprod04 20040830-0105
> > >   bbrprod05 20041022-0258
> > >
> > > and of course I've no idea if they have all been started with the new
> > > version of the config file which includes the external checksum script.
> > >
> > > Actually, you can always check the versions in Ganglia:
> > >
> > > http://www-gmon.slac.stanford.edu:8080/ganglia/?m=xrootd_version&r=hour&s=by%2520hostname&c=xrootd-prod&h=&sh=1&hc=4
> > >
> > > Wilko, could you please sort this out?
> > >
> > > > There is a test perl script at
> > > > /afs/slac.stanford.edu/u/br/bbrskim/releases/test-16.0.1a/workdir/
> > > > testPAdmin.pl
> > > > which exercises the functionality which we need.
> > > >
> > > > BTW: we gave up to get it to work using olb on the time scale of next
> > > > week. We will be happy if the functionality required by testPAdmin.pl
> > > > works for all 5 bbrprod0X machines.
> > >
> > > I'll take a look at it once they start the latest version of the
> > > server (20041118-0948) on all 5 machines with the config file containing
> > > the directive with the external checksum script.
> > >
> > > BTW, the fact that you are using your own compiled version of (HEAD of) the
> > > client instead of the version installed in afs is also a bit confusing. I'll
> > > try to sort out the debug version for linux to help this along.
> > >
> > > thanks,
> > > Pete
> > >
> >
>
> -------------------------------------------------------------------------
> Peter Elmer E-mail: [log in to unmask] Phone: +41 (22) 767-4644
> Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
> -------------------------------------------------------------------------
>