On Thursday 11 November 2004 13:14, Peter Elmer wrote:
>   Hi Manfred and Rolf,
>
>   It doesn't look like xrootd/olbd are running on babar2 right now. Rolf,
> will you be starting it?

I am teaching today.  I won't have time until 1830.

Sorry, Rolf

> I would have thought that xinetd would have 
> started it, though.  Ah, does xinetd itself need to be restarted in order
> to see the changes in /etc/xinetd/?
>
>   One other small thing: it looks like different config files are being
> used for the xrootd and the olbd:
>
> fzk-babar2> grep server_args /etc/xinetd.d/xrootd /etc/xinetd.d/olbd
> /etc/xinetd.d/xrootd:   server_args     = -r -l /var/log/xrootd -c /opt/xrootd/etc/redirector.cf
> /etc/xinetd.d/olbd:     server_args     = -m -l /var/log/olbd -c /opt/xrootd/etc/xrootd_redirector.cf
>
> Was that intentional? In general you should be able to use a single config
> file for both the xrootd and olbd on any given single machine.
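
One way to spot this kind of mismatch is to pull the -c argument out of
each server_args value and compare them. A minimal sketch in Python; the
function names config_path and configs_match are made up for this
illustration, and the two strings are the server_args values from the grep
output quoted above:

```python
import shlex

def config_path(server_args):
    """Return the value following -c in a server_args string, or None."""
    tokens = shlex.split(server_args)
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "-c":
            return value
    return None

def configs_match(args_by_service):
    """True if every service is started with the same -c config file."""
    paths = {config_path(args) for args in args_by_service.values()}
    return len(paths) == 1

# server_args values as they appear in the two xinetd.d files
services = {
    "xrootd": "-r -l /var/log/xrootd -c /opt/xrootd/etc/redirector.cf",
    "olbd": "-m -l /var/log/olbd -c /opt/xrootd/etc/xrootd_redirector.cf",
}
for name, args in services.items():
    print(name, config_path(args))
print("same config file:", configs_match(services))
```

With the values above the last line reports False, since the two daemons
point at different files.
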
>
>                                    Pete
>
> On Thu, Nov 11, 2004 at 10:46:38AM +0100, Manfred Alef wrote:
> > Hi Pete,
> >
> > I have enabled the old xinetd.d files. You may now be able
> > to start olbd and xrootd as babaradm.
> >
> > The xinetd.d daemons use the old configuration file
> > /opt/xrootd/etc/xrootd_redirector.cf.
> >
> > I have killed the daemons running as root.
> >
> > Regards
> > Manfred
> >
> > Peter Elmer wrote:
> > >   Hi Manfred and Rolf,
> > >
> > >   Two other small things, I see:
> > >
> > > fzk-babar2> ps -ef | grep xrootd
> > > > root      3349  3082  0 Nov10 pts/0    00:00:11 /opt/xrootd/bin/xrootd -r -l /var/log/babar2.xrdlog -c /opt/xrootd/etc/redirector.cf
> > > > root      3361  3082  0 Nov10 pts/0    00:00:12 /opt/xrootd/bin/olbd -m -l /var/log/babar2.olblog -c /opt/xrootd/etc/redirector.cf
> > > > elmer     8970  8225  0 10:02 pts/3    00:00:00 grep xrootd
> > > fzk-babar2> less /opt/xrootd/etc/redirector.cf
> > >
> > >   For security reasons, the daemon should not run as root, but as some
> > > generic babar account which would normally own the data in the unix
> > > file owner sense (e.g. whatever account you use for importing babar
> > > > data). I believe Andy has disallowed running as root in later versions
> > > > of xrootd/olbd, too.
> > >
> > >   Also the xrootd protocol allows clients to survive server crashes (or
> > > restarts), but you need to set something up to automatically restart
> > > the xrootd should it crash for some reason. Checking and restarting the
> > > server every 10 minutes should be within the period in which the client
> > > will keep retrying to connect (eventually it times out and just gives
> > > up).
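
A periodic check of this kind could be sketched as below, in Python. This
is only an illustration under stated assumptions: the start commands are
copied from the xinetd entries quoted in this thread (with one shared
config file for both daemons), the names DAEMONS, missing_daemons,
running_process_names and restart_missing are made up here, and whether
the daemons detach on their own has not been checked.

```python
import os
import subprocess

# Start commands per daemon. Paths and flags are taken from the xinetd
# entries quoted in this thread; adjust them to the local installation.
DAEMONS = {
    "xrootd": ["/opt/xrootd/bin/xrootd", "-r", "-l", "/var/log/xrootd",
               "-c", "/opt/xrootd/etc/redirector.cf"],
    "olbd": ["/opt/xrootd/bin/olbd", "-m", "-l", "/var/log/olbd",
             "-c", "/opt/xrootd/etc/redirector.cf"],
}

def missing_daemons(process_names, wanted=DAEMONS):
    """Return the daemon names in `wanted` that are not currently running."""
    running = set(process_names)
    return [name for name in wanted if name not in running]

def running_process_names():
    """Command names of all processes, as reported by `ps -e -o comm=`."""
    result = subprocess.run(["ps", "-e", "-o", "comm="],
                            capture_output=True, text=True, check=True)
    return [os.path.basename(line.strip())
            for line in result.stdout.splitlines() if line.strip()]

def restart_missing():
    """Start every watched daemon that is absent from the process table."""
    for name in missing_daemons(running_process_names()):
        # Detach so the daemon outlives this short-lived cron job.
        subprocess.Popen(DAEMONS[name], stdin=subprocess.DEVNULL,
                         stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
                         start_new_session=True)

# Dry run of the decision logic with a made-up process list:
print(missing_daemons(["init", "sshd", "xrootd"]))
```

Calling restart_missing() from a crontab entry of the unprivileged babar
account, scheduled every 10 minutes, would match the interval suggested
above.
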
> > >
> > >                                    Pete
> > >
> > > On Thu, Nov 11, 2004 at 10:06:24AM +0100, Peter Elmer wrote:
> > >>  Hi Manfred,
> > >>
> > >>On Thu, Nov 11, 2004 at 09:52:28AM +0100, Manfred Alef wrote:
> > >>>the redirector server was upgraded to SL 3.03. Now we could
> > >>>start olbd from xrootd's RHEL RPM without any problem.
> > >>
> > >>  Ok, that is strange. The only guess I have is that it could have been
> > >> the xinetd stuff interfering with starting by hand (if that is what
> > >> you did).
> > >>
> > >>  In any case, I looked at the logs for the xrootd/olbd on babar2 and
> > >> there is still a problem. Normally the xrootd should connect to the
> > >> olbd on the same machine, but there are errors in the xrootd log file:
> > >>
> > >>041110 09:43:54 3349 odc_Manager: Connected to l01-001-122
> > > >>041110 09:43:54 3349 odc_GetLine: Unable to reading request ; connection reset by peer
> > > >>041110 09:43:54 3349 odc_Manager: Unable to receive msg from l01-001-122; connection reset by peer
> > >>
> > >>and in the olbd log file:
> > >>
> > > >>041110 09:43:54 3361 olb_Accept: Unable to accept connection from l01-001-122.gridka.de; permission denied
> > >>
> > >>  Looking at the config file, I see:
> > >>
> > >>olb.allow host l01-001-122      # babar2.fzk.de
> > >>olb.allow host f01-001-121
> > >>olb.allow host f01-001-122
> > >>
> > >>  I think you may need to specify the full hostname, including domain,
> > >> i.e.
> > >>
> > >>olb.allow host l01-001-122.gridka.de     # babar2.fzk.de
> > >>olb.allow host f01-001-121.gridka.de
> > >>olb.allow host f01-001-122.gridka.de
> > >>
> > >>  Does that work?
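
To find every olb.allow entry that lacks a domain part, something like the
following Python sketch could be run over the config file. The function
name short_allow_hosts is made up for this example, and the idea that the
olbd needs fully qualified names here is only the guess made above, not
confirmed behaviour.

```python
def short_allow_hosts(config_text):
    """Return hosts on 'olb.allow host' lines that contain no domain part."""
    short = []
    for line in config_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments
        if fields[:2] == ["olb.allow", "host"] and len(fields) >= 3:
            if "." not in fields[2]:
                short.append(fields[2])
    return short

# Sample input mixing short and fully qualified names
config = """\
#+olb.allow host kanrdr.slac.stanford.edu
olb.allow host l01-001-122      # babar2.fzk.de
olb.allow host f01-001-121.gridka.de
olb.allow host f01-001-122
"""
print(short_allow_hosts(config))
```

Lines that are commented out are skipped, so only active entries are
reported.
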
> > >>
> > >>                                   Pete
> > >>
> > >>>Peter Elmer wrote:
> > >>>>  Hi Manfred and Rolf,
> > >>>>
> > >>>>  Sorry for the late reply. (You picked a somewhat awkward time to
> > >>>> try this since Andy is away and I'm just back from vacation in a
> > >>>> series of meetings/transits this past week!)
> > >>>>
> > >>>>  I'll give this a try to see if I can reproduce it. I see, however,
> > >>>> that you restarted things yesterday:
> > >>>>
> > > >>>>root      3349  3082  0 Nov10 pts/0    00:00:11 /opt/xrootd/bin/xrootd -r -l /var/log/babar2.xrdlog -c /opt/xrootd/etc/redirector.cf
> > > >>>>root      3361  3082  0 Nov10 pts/0    00:00:11 /opt/xrootd/bin/olbd -m -l /var/log/babar2.olblog -c /opt/xrootd/etc/redirector.cf
> > >>>>
> > >>>>and I don't see anything in the log files about "Unable to bind
> > >>>> socket; address already in use". There are other problems related to
> > >>>> the dataservers connecting to the redirector, I think, but I'll look
> > >>>> at those now.
> > >>>>
> > >>>>  One thing that I recall is that Jos and I looked at setting up
> > > >>>> xinetd-style restarts of the server. That wasn't still there, was
> > >>>> it? (I don't see it now, but presumably it would have interfered
> > >>>> with separate attempts to start the daemons by hand.)
> > >>>>
> > >>>>                                   Pete
> > >>>>
> > >>>>On Fri, Nov 05, 2004 at 01:00:36PM +0100, Manfred Alef wrote:
> > >>>>>Hi Pete,
> > >>>>>
> > > >>>>>the config files are from
> > > >>>>>http://xrootd.slac.stanford.edu/examples/multserver/index.html.
> > >>>>>
> > >>>>>Best regards
> > >>>>>Manfred
> > >>>>>
> > >>>>>babar2 # cat redirector.cf
> > >>>>>#
> > >>>>># redirector.cf
> > >>>>>#
> > >>>>># xrootd
> > >>>>>#+xrootd.fslib /opt/xrootd/lib/libXrdOfs.so
> > >>>>>xrootd.fslib /usr/local/xrootd/lib/i386_linux24/libXrdOfs.so
> > >>>>>xrootd.export /data
> > >>>>>odc.manager l01-001-122 3121
> > >>>>>odc.trace redirect
> > >>>>># olbd
> > >>>>>olb.port 3121
> > >>>>>#+olb.allow host kanrdr.slac.stanford.edu
> > >>>>>#+olb.allow host kan001.slac.stanford.edu
> > >>>>>#+olb.allow host kan002.slac.stanford.edu
> > >>>>>olb.allow host l01-001-122    # babar2.fzk.de
> > >>>>>olb.allow host f01-001-121
> > >>>>>olb.allow host f01-001-122
> > >>>>>babar2 #
> > >>>>>
> > >>>>>[root@f01-001-122 etc]# cat dataserver.cf
> > >>>>>#
> > >>>>># dataserver.cf
> > >>>>>#
> > >>>>># xrootd
> > >>>>>#+xrootd.fslib /opt/xrootd/lib/libXrdOfs.so
> > >>>>>xrootd.fslib /usr/local/xrootd/lib/i386_linux24/libXrdOfs.so
> > >>>>>xrootd.export /data
> > >>>>>oss.readonly
> > >>>>>odc.manager l01-001-122 3121
> > >>>>># olbd
> > >>>>>olb.port 3121
> > >>>>>olb.subscribe l01-001-122 3121
> > >>>>>[root@f01-001-122 etc]#
> > >>>>>
> > >>>>>Peter Elmer wrote:
> > >>>>>> [CC the xrootd mailing list]
> > >>>>>>
> > >>>>>> Hi Rolf,
> > >>>>>>
> > >>>>>> Do you have the config files you are using to try to start xrootd
> > >>>>>> and the olbd (on the redirector and the file servers)?
> > >>>>>>
> > >>>>>>                                  Pete
> > >>>>>>
> > >>>>>>On Fri, Nov 05, 2004 at 11:33:47AM +0100, Manfred Alef wrote:
> > >>>>>>>Hi Pete,
> > >>>>>>>
> > > >>>>>>>I am sitting here at GridKa together with Manfred Alef and we are
> > > >>>>>>> trying to install xrootd on two of the fileservers and on babar2,
> > > >>>>>>> a login machine which will also be the redirector.
> > > >>>>>>>We use the current production version and had no problems starting
> > > >>>>>>> xrootd and olbd on one of the fileservers.  However, when we try
> > > >>>>>>> to start the olbd on the redirector, it exits with exit code 1.
> > > >>>>>>> The logfile is attached.  We made sure nothing else is going on
> > > >>>>>>> on the machine (reboot) and also removed any old socket we could
> > > >>>>>>> find in /tmp/.olb/. Do you have an idea what could go wrong or
> > > >>>>>>> what else we could try?
> > >>>>>>>
> > >>>>>>>Cheers, Rolf
> > >>>>>>>
> > >>>>>>>---------------------------------------------------------------
> > >>>>>>>41105 10:44:31 32156 olb_Config: (c) 2004 SLAC olbd version
> > >>>>>>>20040907-0403 initializing as Manager
> > >>>>>>>041105 10:44:31 32156 olb_Bind: Unable to bind socket; address
> > >>>>>>> already in use
> > >>>>>>>041105 10:44:31 32156 olb_Config: Manager initialization failed.
> > >>>>>>>041105 10:46:15 32191 olb_Config: (c) 2004 SLAC olbd version
> > >>>>>>>20040907-0403 initializing as Manager
> > >>>>>>>041105 10:46:15 32191 olb_Bind: Unable to bind socket; address
> > >>>>>>> already in use
> > >>>>>>>041105 10:46:15 32191 olb_Config: Manager initialization failed.
> > >>>>>>>041105 10:48:49 32248 Schedule scheduling midnight runner in 47471
> > >>>>>>> seconds 041105 10:48:49 32248 olb_Config: (c) 2004 SLAC olbd
> > >>>>>>> version 20040907-0403 initializing as Manager
> > >>>>>>>041105 10:48:49 32248 olb_Bind: Unable to bind socket; address
> > >>>>>>> already in use
> > >>>>>>>041105 10:48:49 32248 olb_Config: Manager initialization failed.
> > >>>>>>>041105 11:10:37 3175 olb_Config: (c) 2004 SLAC olbd version
> > >>>>>>>20040907-0403 initializing as Manager
> > >>>>>>>041105 11:10:37 3175 olb_Bind: Unable to bind socket; address
> > >>>>>>> already in use 041105 11:10:37 3175 olb_Config: Manager
> > >>>>>>> initialization failed. 041105 11:18:16 3332 olb_Config: (c) 2004
> > >>>>>>> SLAC olbd version 20040907-0403 initializing as Manager
> > >>>>>>>041105 11:18:16 3332 olb_Bind: Unable to bind socket; address
> > >>>>>>> already in use 041105 11:18:16 3332 olb_Config: Manager
> > >>>>>>> initialization failed.
> > >>>>>>> ----------------------------------------------------------------
> > >
> > > -----------------------------------------------------------------------
> > >-- Peter Elmer     E-mail: [log in to unmask]      Phone: +41 (22)
> > > 767-4644 Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23,
> > > Switzerland
> > > -----------------------------------------------------------------------
> > >--
>
> -------------------------------------------------------------------------
> Peter Elmer     E-mail: [log in to unmask]      Phone: +41 (22) 767-4644
> Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
> -------------------------------------------------------------------------

-- 
contacts: http://www.physi.uni-heidelberg.de/~dubitzky