Hi Pete,

Sometimes minor things leak in that have major impacts. Usually I go by
what is actually running successfully elsewhere to judge the probability
of success. However, you do bring up a good point. Chris, please make sure
that you are using -w on the olbd consistently with -t on the xrootd data
server. If you specify -w but not -t then you will see exactly what you
described. Also, the logs from start-up time to hang time would be
helpful (i.e., redirector: xrootd and olbd, and data server: xrootd and olbd).
Please clearly identify which is which. Thanks.
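
The consistency rule above (-w on the olbd paired with -t on the xrootd)
can be checked mechanically. What follows is only a rough sketch: the
function name and the example command lines are hypothetical, and it
assumes "consistently" means both options present or both absent. Only
the "-w without -t" case is known to produce the hang described.

```python
def flags_consistent(olbd_argv, xrootd_argv):
    """Return True when -w (olbd) and -t (xrootd) are both set or both unset.

    Assumption: the two options must be used as a pair. The argument
    lists are the daemon command lines split into tokens.
    """
    has_w = "-w" in olbd_argv
    has_t = "-t" in xrootd_argv
    return has_w == has_t


if __name__ == "__main__":
    # Hypothetical command lines; real ones carry more options.
    olbd_cmd = ["olbd", "-w"]
    xrootd_cmd = ["xrootd"]
    if not flags_consistent(olbd_cmd, xrootd_cmd):
        print("mismatch: check -w on the olbd against -t on the xrootd")
```

Running such a check against the actual start-up commands on each data
server would show quickly whether any of the servers is started with
only one of the two options.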

Andy

On Thu, 17 Feb 2005, Peter Elmer wrote:

>   Hi Andy,
>
>   From:
>
>   http://xrootd.slac.stanford.edu/xrootd.History
>
> the only differences between version 20040907-0403 (the one we currently
> label "production") and 20040830-0105 are small changes to the ./configure
> and makefiles, but nothing of substance that would lead to problems with
> the olbd. I suspect that there is something else going on. (e.g. the famous
> wait/-w problems?)
>
>                                    Pete
>
> On Thu, Feb 17, 2005 at 07:12:08AM -0800, Andrew Hanushevsky wrote:
> > Hi Chris,
> >
> > Those two particular releases seem to have had some problems. I assume
> > you are not mixing releases here (i.e., running either on all servers
> > causes you to see the problem).
> >
> > I do know that 20040830 is a stable release. We run that everywhere at
> > SLAC for analysis. I'd suggest going with that one until we test out
> > the latest release that should have fixed some other problem relating
> > to writing files.
> >
> > Andy
> >
> > On Thu, 17 Feb 2005, Brew, CAJ (Chris) wrote:
> >
> > > Hi,
> > >
> > > Since increasing the number of servers at RAL from 8 to 21 we seem to be
> > > seeing a new failure mode.
> > >
> > > All the processes seem to be running fine and you can read a file by
> > > going directly to the server that holds it, but the server does not seem
> > > to respond via the olbd network, so if you try to access a file via the
> > > load balancer you fail.
> > >
> > > Restarting the load balancer on the data server fixes the problem.
> > >
> > > There is nothing unusual in the logs at either end, nor anything
> > > missing, as far as I can tell.
> > >
> > > This is on data servers running RH73 and xrootd-20040907-0403 or
> > > xrootd-20041214-1142.
> > >
> > > Has anyone else seen this? Is there a fix?
> > >
> > > Thanks,
> > > Chris.
> > >
>
>
>
> -------------------------------------------------------------------------
> Peter Elmer     E-mail: [log in to unmask]      Phone: +41 (22) 767-4644
> Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
> -------------------------------------------------------------------------
>