Hi Chris,

Yes, especially if it thinks one of the two is not particularly responsive
to its requests. The xrootd redirectors are programmed to be greedy.

Andy
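
Editorial aside: the olbd debug excerpt quoted further down in this thread
is easier to follow once the mail-wrapped lines are rejoined and grouped by
event. The following is a minimal sketch in Python, not part of xrootd or
olbd. It assumes the line layout "YYMMDD HH:MM:SS <pid> <message>" and the
three message shapes visible in that excerpt; the function name
parse_olbd_log and the field names (client, server, path, seconds) are
hypothetical labels chosen for illustration, and the meaning of each field
is inferred from its appearance only.

```python
import re
from datetime import datetime

# Lines copied from the excerpt in this thread, with the mail wrapping undone.
SAMPLE_LOG = """\
050301 16:46:39 24895 do_Select Lookup delay xrootd107.gridpp.rl.ac.uk 5
050301 16:46:39 24895 Receive From csfnfs49.rl.ac.uk:1094: 7@0 have r /store/test/csfnfs49.01.root
050301 16:46:44 24895 Receive From xrootd107.gridpp.rl.ac.uk: 35 select r /store/test/csfnfs49.01.root
050301 16:46:44 24895 do_Select Redirect xrootd107.gridpp.rl.ac.uk -> csfnfs49.rl.ac.uk:1094 for /store/test/csfnfs49.01.root
"""

# Assumed layout: "YYMMDD HH:MM:SS <pid> <message>".
LINE_RE = re.compile(r"^(\d{6} \d{2}:\d{2}:\d{2}) (\d+) (.*)$")

# Message shapes seen in the excerpt; the group names are illustrative only.
PATTERNS = {
    "delay": re.compile(
        r"^do_Select Lookup delay (?P<client>\S+) (?P<seconds>\d+)$"),
    "have": re.compile(
        r"^Receive From (?P<server>\S+): \S+ have \S+ (?P<path>\S+)$"),
    "redirect": re.compile(
        r"^do_Select Redirect (?P<client>\S+) -> (?P<server>\S+) for (?P<path>\S+)$"),
}


def parse_olbd_log(lines):
    """Return one dict per recognised line; unrecognised lines are skipped."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp, pid, message = match.groups()
        when = datetime.strptime(stamp, "%y%m%d %H:%M:%S")
        for kind, pattern in PATTERNS.items():
            hit = pattern.match(message)
            if hit:
                events.append(
                    {"time": when, "pid": int(pid), "kind": kind, **hit.groupdict()})
                break
    return events


if __name__ == "__main__":
    for event in parse_olbd_log(SAMPLE_LOG.splitlines()):
        details = {k: v for k, v in event.items() if k not in ("time", "pid", "kind")}
        print(event["time"].time(), event["kind"], details)
```

Running the same function over both redirectors' olbd logs and comparing the
"have" and "redirect" events for one path is one way to check whether a
request seen on one load balancer was answered through the other.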

On Tue, 1 Mar 2005, Brew, CAJ (Chris) wrote:

> Hi,
>
> The clients should only know about one of the load balancers but the
> load balancers know about each other from the olbd "network" since they
> share a config file and log into each other. So when the client asks the
> xrootd on the LB server for a file, could it be asking the olbd on the
> other LB server to find the file for it?
>
> That could explain why what looked to be an intermittent problem on the
> LB server's olbd affected finding files via both LB servers in the same
> way at the same time.
>
> Yours,
> Chris.
>
> > -----Original Message-----
> > From: Andrew Hanushevsky [mailto:[log in to unmask]]
> > Sent: 01 March 2005 17:40
> > To: Brew, CAJ (Chris)
> > Cc: Olaiya, EO (Emmanuel); [log in to unmask]
> > Subject: RE: olbd tracing
> >
> > Hi Chris,
> >
> > Yes and no. Depends on what client you use. The client code has gone
> > through several changes. Some always ask only one redirector, others
> > switch back and forth. But, in any case, you should have looked at both
> > redirector logs. Presumably, the xrootd defaults are being used (i.e.,
> > fail-over mode). Take a look at the xrootd logs on the redirector to see
> > if anything strange is going on there.
> >
> > Andy
> >
> >
> > On Tue, 1 Mar 2005, Brew, CAJ (Chris) wrote:
> >
> > > Hmmm...
> > >
> > > Actually there are some debug messages in the log file now:
> > >
> > > When I just ran another test I got:
> > >
> > > 050301 16:46:39 24895 do_Select Lookup delay xrootd107.gridpp.rl.ac.uk 5
> > > 050301 16:46:39 24895 Receive From csfnfs49.rl.ac.uk:1094: 7@0 have r /store/test/csfnfs49.01.root
> > > 050301 16:46:44 24895 Receive From xrootd107.gridpp.rl.ac.uk: 35 select r /store/test/csfnfs49.01.root
> > > 050301 16:46:44 24895 do_Select Redirect xrootd107.gridpp.rl.ac.uk -> csfnfs49.rl.ac.uk:1094 for /store/test/csfnfs49.01.root
> > >
> > > Is it possible that because there was another load balancer in the
> > > set-up it was asking that to find the files for it? That other machine
> > > has now gone down to be reinstalled with SL3 and now we're getting more
> > > logging info.
> > >
> > > Weird.
> > >
> > > Chris.
> > >
> > > > -----Original Message-----
> > > > From: [log in to unmask]
> > > > [mailto:[log in to unmask]] On Behalf Of
> > > > Brew, CAJ (Chris)
> > > > Sent: 01 March 2005 16:13
> > > > To: Andrew Hanushevsky; Olaiya, EO (Emmanuel)
> > > > Cc: [log in to unmask]
> > > > Subject: RE: olbd tracing
> > > >
> > > > cc'd to Manny in case he doesn't catch it on the list.
> > > >
> > > > The machine I'm trying to turn the logging on on is
> > > > xrootd107.gridpp.rl.ac.uk, our new master load balancer. We don't have
> > > > root access on the box but can restart the daemons and modify the
> > > > config file.
> > > >
> > > > I started the daemon with the -d option by running StopOLB and
> > > > StartOLB -d rather than using the sudo /sbin/service olbd start|stop
> > > > we normally do.
> > > >
> > > > We haven't had any problems since I reduced the number of xrootd
> > > > servers in the cluster but I've now only got about 7TB free on the
> > > > servers left in the cluster so will run out of room in about a week.
> > > >
> > > > Yours,
> > > > Chris.
> > > >
> > > > > -----Original Message-----
> > > > > From: Andrew Hanushevsky [mailto:[log in to unmask]]
> > > > > Sent: 01 March 2005 15:53
> > > > > To: Brew, CAJ (Chris)
> > > > > Cc: Andrew Hanushevsky; [log in to unmask]
> > > > > Subject: RE: olbd tracing
> > > > >
> > > > > Hi Chris,
> > > > >
> > > > > OK, something seems to be amiss with the overall configuration if
> > > > > even this doesn't work. Let me get together with Manny and take a
> > > > > look at what is actually running and how it is put together. Manny
> > > > > when?
> > > > >
> > > > > Andy
> > > > >
> > > > > On Tue, 1 Mar 2005, Brew, CAJ (Chris) wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I've got:
> > > > > >
> > > > > > olb.trace debug
> > > > > > odc.trace debug
> > > > > > ofs.trace debug
> > > > > >
> > > > > > > in my xrootd.cf file and started the olbd with -d on the LB server
> > > > > > > [xrootd107] /opt/xrootd/etc > ps fwwwU bbdatsrv
> > > > > > >   PID TTY      STAT   TIME COMMAND
> > > > > > > 18801 ?        S      0:00 sshd: bbdatsrv@pts/0
> > > > > > > 18803 pts/0    S      0:00 -bash
> > > > > > > 25484 pts/0    R      0:00  \_ ps fwwwU bbdatsrv
> > > > > > > 24895 pts/0    S      0:00 /opt/xrootd/bin/olbd -d -m -l /opt/xrootd/logs/olbdlog -c /opt/xrootd//etc/xrootd.cf
> > > > > > > 23940 pts/0    S      0:00 /opt/xrootd/bin/xrootd -r -l /opt/xrootd/logs/xrdlog -c /opt/xrootd/etc/xrootd.cf
> > > > > > > 23975 pts/0    S      0:00 /opt/xrootd/bin/xrootd -r -l /opt/xrootd/logs/xrdlog -c /opt/xrootd/etc/xrootd.cf
> > > > > >
> > > > > > > but am still not getting any debug info on how it's locating the
> > > > > > > files.
> > > > > > >
> > > > > > > The olb.trace debug on the Data Servers does get me:
> > > > > > >
> > > > > > > 050301 11:45:08 616 Receive From xrootd108.gridpp.rl.ac.uk: 7@0  state /store...
> > > > > >
> > > > > > when looking for a file.
> > > > > >
> > > > > > Anyone know what else I need on the LB server?
> > > > > >
> > > > > > Thanks,
> > > > > > Chris.
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Andrew Hanushevsky [mailto:[log in to unmask]]
> > > > > > > Sent: 23 February 2005 20:59
> > > > > > > To: Brew, CAJ (Chris)
> > > > > > > Cc: [log in to unmask]
> > > > > > > Subject: Re: olbd tracing
> > > > > > >
> > > > > > > Hi Chris,
> > > > > > >
> > > > > > > That's starting the olbd with the -d option (for debugging).
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > > From: "Brew, CAJ (Chris)" <[log in to unmask]>
> > > > > > > To: "Andrew Hanushevsky" <[log in to unmask]>
> > > > > > > Cc: <[log in to unmask]>
> > > > > > > Sent: Wednesday, February 23, 2005 11:01 AM
> > > > > > > Subject: RE: olbd tracing
> > > > > > >
> > > > > > >
> > > > > > > > Hi Andy,
> > > > > > > >
> > > > > > > > > I don't think the odc.trace redirect is the one I'm looking
> > > > > > > > > for. What's the directive that puts the "have ?" and "have"
> > > > > > > > > replies into the olbd log?
> > > > > > > > >
> > > > > > > > > Once I narrow it down to the manager not asking the server
> > > > > > > > > or the server not replying correctly I can turn debug on on
> > > > > > > > > the relevant machine. I'm reluctant to turn it on on all
> > > > > > > > > machines because it's a fair time before the problem
> > > > > > > > > manifests itself.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Chris.
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Andrew Hanushevsky [mailto:[log in to unmask]]
> > > > > > > > > Sent: 22 February 2005 21:26
> > > > > > > > > To: Brew, CAJ (Chris); [log in to unmask]
> > > > > > > > > Subject: Re: olbd tracing
> > > > > > > > >
> > > > > > > > > Hi Chris,
> > > > > > > > >
> > > > > > > > > Try:
> > > > > > > > >
> > > > > > > > > odc.trace redirect
> > > > > > > > >
> > > > > > > > > > for the olb try using the '-d' option; though you may get
> > > > > > > > > > more information than really needed.
> > > > > > > > >
> > > > > > > > > Andy
> > > > > > > > >
> > > > > > > > > ----- Original Message -----
> > > > > > > > > From: "Brew, CAJ (Chris)" <[log in to unmask]>
> > > > > > > > > To: <[log in to unmask]>
> > > > > > > > > Sent: Tuesday, February 22, 2005 6:53 AM
> > > > > > > > > Subject: olbd tracing
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > > What's the trace argument to add to the xrootd.cf file
> > > > > > > > > > > to get it to output the queries to locate files to the
> > > > > > > > > > > logs?
> > > > > > > > > > >
> > > > > > > > > > > We're still having problems at RAL with files/servers
> > > > > > > > > > > not being available via the load balancers when they
> > > > > > > > > > > are if you contact them directly.
> > > > > > > > > >
> > > > > > > > > > Yours,
> > > > > > > > > > Chris.
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > >   Chris Brew  ([log in to unmask])  +44 1235 446326
> > > > > > > > > >   Particle Physics Department
> > > > > > > > > >   Rutherford Appleton Laboratory
> > > > > > > > > >   Chilton, Didcot. Oxfordshire.
> > > > > > > > > >   OX11 0QX. United Kingdom.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
>
>