Hi Pete,

Sometimes minor things leak in that have major impacts. Usually I go by
what is actually running successfully elsewhere to determine the
probability of success. However, you do bring up a good point.

Chris, please make sure that you are using -w on the olbd consistently
with -t on the xrootd data server. If you specify -w but not -t then you
will see exactly what you described. Also, the logs from start-up time to
hang time would be helpful (i.e., redirector: xrootd and olbd, and data
server: xrootd and olbd). Please clearly identify which is which. Thanks.

Andy

On Thu, 17 Feb 2005, Peter Elmer wrote:

> Hi Andy,
>
> From:
>
> http://xrootd.slac.stanford.edu/xrootd.History
>
> the only differences between version 20040907-0403 (the one we currently
> label "production") and 20040830-0105 are small changes to the ./configure
> and makefiles, but nothing of substance that would lead to problems with
> the olbd. I suspect that there is something else going on. (e.g. the famous
> wait/-w problems?)
>
> Pete
>
> On Thu, Feb 17, 2005 at 07:12:08AM -0800, Andrew Hanushevsky wrote:
> > Hi Chris,
> >
> > Those two particular releases seem to have had some problems. I assume
> > you are not mixing releases here (i.e., running either on all servers
> > causes you to see the problem).
> >
> > I do know that 20040830 is a stable release. We run that everywhere at
> > SLAC for analysis. I'd suggest going with that one until we test out
> > the latest release that should have fixed some other problem relating
> > to writing files.
> >
> > Andy
> >
> > On Thu, 17 Feb 2005, Brew, CAJ (Chris) wrote:
> >
> > > Hi,
> > >
> > > Since increasing the number of servers at RAL from 8 to 21 we seem to be
> > > seeing a new failure mode.
> > >
> > > All the processes seem to be running fine and you can read a file by
> > > going directly to the server that holds it, but the server does not seem
> > > to respond via the olbd network, so if you try to access a file via the
> > > load balancer you fail.
> > >
> > > Restarting the load balancer on the data server fixes the problem.
> > >
> > > There is nothing unusual in the logs at either end, or anything
> > > missing either, as far as I can tell.
> > >
> > > This is on data servers running RH73 and xrootd-20040907-0403 or
> > > xrootd-20041214-1142.
> > >
> > > Has anyone else seen this? Is there a fix?
> > >
> > > Thanks,
> > > Chris.
> > >
> >
>
> -------------------------------------------------------------------------
> Peter Elmer     E-mail: [log in to unmask]     Phone: +41 (22) 767-4644
> Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
> -------------------------------------------------------------------------
>