Peter Elmer wrote:
> Hi Manfred and Rolf,
>
> It doesn't look like xrootd/olbd are running on babar2 right now. Rolf,
> will you be starting it? I would have thought that xinetd would have started
> it, though. Ah, does xinetd itself need to be restarted in order to see
> the changes in /etc/xinetd.d/?
Hi Pete,
I have restarted xinetd.
>
> One other small thing: it looks like different config files are being
> used for the xrootd and the olbd:
>
> fzk-babar2> grep server_args /etc/xinetd.d/xrootd /etc/xinetd.d/olbd
> /etc/xinetd.d/xrootd: server_args = -r -l /var/log/xrootd -c /opt/xrootd/etc/redirector.cf
> /etc/xinetd.d/olbd: server_args = -m -l /var/log/olbd -c /opt/xrootd/etc/xrootd_redirector.cf
>
Sorry. The entry in /etc/xinetd.d/olbd was wrong.
Manfred
> Was that intentional? In general you should be able to use a single config
> file for both the xrootd and olbd on any given single machine.
>
> Pete
>
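A minimal sketch of the consistency check described above: it pulls the path following -c out of each server_args line and reports whether both daemons point at the same config file. The two input lines are copied from the grep output above; the helper name config_path is an assumption made for illustration and is not part of xrootd or xinetd.

```python
import shlex


def config_path(server_args_line):
    """Return the path following -c in a grep-style 'file: server_args = ...' line."""
    # Keep only the text after the first '=' (the argument string itself).
    _, _, args = server_args_line.partition("=")
    tokens = shlex.split(args)
    if "-c" in tokens and tokens.index("-c") + 1 < len(tokens):
        return tokens[tokens.index("-c") + 1]
    return None


# Lines as shown by: grep server_args /etc/xinetd.d/xrootd /etc/xinetd.d/olbd
lines = [
    "/etc/xinetd.d/xrootd: server_args = -r -l /var/log/xrootd -c /opt/xrootd/etc/redirector.cf",
    "/etc/xinetd.d/olbd: server_args = -m -l /var/log/olbd -c /opt/xrootd/etc/xrootd_redirector.cf",
]

# Map each xinetd.d file to the config file its daemon would read.
paths = {line.split(":", 1)[0]: config_path(line) for line in lines}
same_config = len(set(paths.values())) == 1
print(paths)
print("same config file:", same_config)
```

With the two entries above, the paths differ, which is the mismatch reported in this message.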
> On Thu, Nov 11, 2004 at 10:46:38AM +0100, Manfred Alef wrote:
>
>>Hi Pete,
>>
>>I have enabled the old xinetd.d files. You may now be able
>>to start olbd and xrootd as babaradm.
>>
>>The daemons started via xinetd.d use the old configuration file
>>/opt/xrootd/etc/xrootd_redirector.cf.
>>
>>I have killed the daemons running as root.
>>
>>Regards
>>Manfred
>>
>>
>>Peter Elmer wrote:
>>
>>> Hi Manfred and Rolf,
>>>
>>> Two other small things, I see:
>>>
>>>fzk-babar2> ps -ef | grep xrootd
>>>root 3349 3082 0 Nov10 pts/0 00:00:11 /opt/xrootd/bin/xrootd -r -l /var/log/babar2.xrdlog -c /opt/xrootd/etc/redirector.cf
>>>root 3361 3082 0 Nov10 pts/0 00:00:12 /opt/xrootd/bin/olbd -m -l /var/log/babar2.olblog -c /opt/xrootd/etc/redirector.cf
>>>elmer 8970 8225 0 10:02 pts/3 00:00:00 grep xrootd
>>>fzk-babar2> less /opt/xrootd/etc/redirector.cf
>>>
>>> For security reasons, the daemon should not run as root, but as some
>>>generic babar account which would normally own the data in the unix file
>>>owner sense (e.g. whatever account you use for importing babar data). I
>>>believe Andy has disallowed running as root in later versions of xrootd/olbd,
>>>too.
>>>
>>> Also the xrootd protocol allows clients to survive server crashes (or
>>>restarts), but you need to set something up to automatically restart the
>>>xrootd should it crash for some reason. Checking and restarting the
>>>server every 10 minutes should be within the period in which the client will
>>>keep retrying to connect (eventually it times out and just gives up).
>>>
>>> Pete
>>>
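One possible shape for the periodic check-and-restart suggested above, as a sketch only. It assumes pgrep is available, that the script is run every 10 minutes (for example from the crontab of the non-root babar account), and that the command line matches the one in the ps output above. The function names are hypothetical.

```python
import subprocess


def is_running(name):
    """True if pgrep finds a process whose name is exactly `name`."""
    result = subprocess.run(["pgrep", "-x", name], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def ensure_running(name, start_cmd, check=is_running, start=subprocess.Popen):
    """Start `start_cmd` if no process called `name` is found.

    Returns True if a restart was attempted, False if the process was alive.
    `check` and `start` are injectable so the logic can be tested without
    touching real daemons.
    """
    if check(name):
        return False
    start(start_cmd)
    return True


# Assumed command line, copied from the ps output in this thread.
XROOTD_CMD = ["/opt/xrootd/bin/xrootd", "-r", "-l", "/var/log/babar2.xrdlog",
              "-c", "/opt/xrootd/etc/redirector.cf"]

# Intended use, once per scheduled run (not executed here):
#   ensure_running("xrootd", XROOTD_CMD)
```

The same function can be called a second time for olbd with its own command line. If xinetd is already responsible for starting the daemons, a separate watchdog like this would be redundant.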
>>>
>>>
>>>On Thu, Nov 11, 2004 at 10:06:24AM +0100, Peter Elmer wrote:
>>>
>>>
>>>> Hi Manfred,
>>>>
>>>>On Thu, Nov 11, 2004 at 09:52:28AM +0100, Manfred Alef wrote:
>>>>
>>>>
>>>>>the redirector server was upgraded to SL 3.03. Now we could
>>>>>start olbd from xrootd's RHEL RPM without any problem.
>>>>
>>>> Ok, that is strange. The only guess I have is that it could have been the
>>>>xinetd stuff interfering with starting by hand (if that is what you did).
>>>>
>>>> In any case, I looked at the logs for the xrootd/olbd on babar2 and there
>>>>is still a problem. Normally the xrootd should connect to the olbd on
>>>>the same machine, but there are errors in the xrootd log file:
>>>>
>>>>041110 09:43:54 3349 odc_Manager: Connected to l01-001-122
>>>>041110 09:43:54 3349 odc_GetLine: Unable to reading request ; connection reset by peer
>>>>041110 09:43:54 3349 odc_Manager: Unable to receive msg from l01-001-122; connection reset by peer
>>>>
>>>>and in the olbd log file:
>>>>
>>>>041110 09:43:54 3361 olb_Accept: Unable to accept connection from l01-001-122.gridka.de; permission denied
>>>>
>>>> Looking at the config file, I see:
>>>>
>>>>olb.allow host l01-001-122 # babar2.fzk.de
>>>>olb.allow host f01-001-121
>>>>olb.allow host f01-001-122
>>>>
>>>> I think you may need to specify the full hostname, including domain, i.e.
>>>>
>>>>olb.allow host l01-001-122.gridka.de # babar2.fzk.de
>>>>olb.allow host f01-001-121.gridka.de
>>>>olb.allow host f01-001-122.gridka.de
>>>>
>>>> Does that work?
>>>>
>>>> Pete
>>>>
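The suggested change to the olb.allow lines can be sketched as a small rewrite step. This only illustrates the edit proposed above. It does not model how olbd matches hostnames, and whether full hostnames are actually required is the open question in this message. The function name is hypothetical, and the domain gridka.de is taken from the olbd log line.

```python
def qualify_allow_hosts(config_text, domain):
    """Append `domain` to unqualified hosts on 'olb.allow host' lines.

    Comment-only lines and all other directives pass through unchanged.
    """
    out = []
    for line in config_text.splitlines():
        # Split off any trailing comment so it can be re-attached afterwards.
        body, sep, comment = line.partition("#")
        fields = body.split()
        if (len(fields) == 3 and fields[:2] == ["olb.allow", "host"]
                and "." not in fields[2]):
            line = f"olb.allow host {fields[2]}.{domain}"
            if sep:
                line += " #" + comment
        out.append(line)
    return "\n".join(out)


sample = """olb.port 3121
#+olb.allow host kanrdr.slac.stanford.edu
olb.allow host l01-001-122 # babar2.fzk.de
olb.allow host f01-001-121
olb.allow host f01-001-122"""

print(qualify_allow_hosts(sample, "gridka.de"))
```

Entries that already contain a dot are left alone, so running the rewrite twice gives the same result as running it once.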
>>>>
>>>>
>>>>
>>>>>Peter Elmer wrote:
>>>>>
>>>>>
>>>>>> Hi Manfred and Rolf,
>>>>>>
>>>>>> Sorry for the late reply. (You picked a somewhat awkward time to try this
>>>>>>since Andy is away and I'm just back from vacation in a series of
>>>>>>meetings/transits this past week!)
>>>>>>
>>>>>> I'll give this a try to see if I can reproduce it. I see, however, that
>>>>>>you restarted things yesterday:
>>>>>>
>>>>>>root 3349 3082 0 Nov10 pts/0 00:00:11 /opt/xrootd/bin/xrootd -r -l /var/log/babar2.xrdlog -c /opt/xrootd/etc/redirector.cf
>>>>>>root 3361 3082 0 Nov10 pts/0 00:00:11 /opt/xrootd/bin/olbd -m -l /var/log/babar2.olblog -c /opt/xrootd/etc/redirector.cf
>>>>>>
>>>>>>and I don't see anything in the log files about "Unable to bind socket;
>>>>>>address already in use". There are other problems related to the dataservers
>>>>>>connecting to the redirector, I think, but I'll look at those now.
>>>>>>
>>>>>> One thing that I recall is that Jos and I looked at setting up xinetd
>>>>>>style restarts of the server. That wasn't still there, was it? (I don't
>>>>>>see it now, but presumably it would have interfered with separate attempts
>>>>>>to start the daemons by hand.)
>>>>>>
>>>>>> Pete
>>>>>>
>>>>>>On Fri, Nov 05, 2004 at 01:00:36PM +0100, Manfred Alef wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>>Hi Pete,
>>>>>>>
>>>>>>>the config files are from
>>>>>>>http://xrootd.slac.stanford.edu/examples/multserver/index.html.
>>>>>>>
>>>>>>>Best regards
>>>>>>>Manfred
>>>>>>>
>>>>>>>babar2 # cat redirector.cf
>>>>>>>#
>>>>>>># redirector.cf
>>>>>>>#
>>>>>>># xrootd
>>>>>>>#+xrootd.fslib /opt/xrootd/lib/libXrdOfs.so
>>>>>>>xrootd.fslib /usr/local/xrootd/lib/i386_linux24/libXrdOfs.so
>>>>>>>xrootd.export /data
>>>>>>>odc.manager l01-001-122 3121
>>>>>>>odc.trace redirect
>>>>>>># olbd
>>>>>>>olb.port 3121
>>>>>>>#+olb.allow host kanrdr.slac.stanford.edu
>>>>>>>#+olb.allow host kan001.slac.stanford.edu
>>>>>>>#+olb.allow host kan002.slac.stanford.edu
>>>>>>>olb.allow host l01-001-122 # babar2.fzk.de
>>>>>>>olb.allow host f01-001-121
>>>>>>>olb.allow host f01-001-122
>>>>>>>babar2 #
>>>>>>>
>>>>>>>[root@f01-001-122 etc]# cat dataserver.cf
>>>>>>>#
>>>>>>># dataserver.cf
>>>>>>>#
>>>>>>># xrootd
>>>>>>>#+xrootd.fslib /opt/xrootd/lib/libXrdOfs.so
>>>>>>>xrootd.fslib /usr/local/xrootd/lib/i386_linux24/libXrdOfs.so
>>>>>>>xrootd.export /data
>>>>>>>oss.readonly
>>>>>>>odc.manager l01-001-122 3121
>>>>>>># olbd
>>>>>>>olb.port 3121
>>>>>>>olb.subscribe l01-001-122 3121
>>>>>>>[root@f01-001-122 etc]#
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>Peter Elmer wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>[CC the xrootd mailing list]
>>>>>>>>
>>>>>>>>Hi Rolf,
>>>>>>>>
>>>>>>>>Do you have the config files you are using to try to start xrootd and
>>>>>>>>the olbd (on the redirector and the file servers)?
>>>>>>>>
>>>>>>>> Pete
>>>>>>>>
>>>>>>>>On Fri, Nov 05, 2004 at 11:33:47AM +0100, Manfred Alef wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>Hi Pete,
>>>>>>>>>
>>>>>>>>>I am sitting here at GridKa together with Manfred Alef and we are trying
>>>>>>>>>to install xrootd on two of the fileservers and on babar2, a login
>>>>>>>>>machine which will also be the redirector.
>>>>>>>>>We use the current production version and had no problems starting xrootd
>>>>>>>>>and olbd on one of the fileservers. However, when we try to start the
>>>>>>>>>olbd on the redirector, it exits with exit code 1. The logfile is
>>>>>>>>>attached. We made sure nothing else is going on on the machine (reboot)
>>>>>>>>>and also removed any old socket we could find in /tmp/.olb/.
>>>>>>>>>Do you have an idea what could go wrong or what else we could try?
>>>>>>>>>
>>>>>>>>>Cheers, Rolf
>>>>>>>>>
>>>>>>>>>---------------------------------------------------------------
>>>>>>>>>041105 10:44:31 32156 olb_Config: (c) 2004 SLAC olbd version 20040907-0403 initializing as Manager
>>>>>>>>>041105 10:44:31 32156 olb_Bind: Unable to bind socket; address already in use
>>>>>>>>>041105 10:44:31 32156 olb_Config: Manager initialization failed.
>>>>>>>>>041105 10:46:15 32191 olb_Config: (c) 2004 SLAC olbd version 20040907-0403 initializing as Manager
>>>>>>>>>041105 10:46:15 32191 olb_Bind: Unable to bind socket; address already in use
>>>>>>>>>041105 10:46:15 32191 olb_Config: Manager initialization failed.
>>>>>>>>>041105 10:48:49 32248 Schedule scheduling midnight runner in 47471 seconds
>>>>>>>>>041105 10:48:49 32248 olb_Config: (c) 2004 SLAC olbd version 20040907-0403 initializing as Manager
>>>>>>>>>041105 10:48:49 32248 olb_Bind: Unable to bind socket; address already in use
>>>>>>>>>041105 10:48:49 32248 olb_Config: Manager initialization failed.
>>>>>>>>>041105 11:10:37 3175 olb_Config: (c) 2004 SLAC olbd version 20040907-0403 initializing as Manager
>>>>>>>>>041105 11:10:37 3175 olb_Bind: Unable to bind socket; address already in use
>>>>>>>>>041105 11:10:37 3175 olb_Config: Manager initialization failed.
>>>>>>>>>041105 11:18:16 3332 olb_Config: (c) 2004 SLAC olbd version 20040907-0403 initializing as Manager
>>>>>>>>>041105 11:18:16 3332 olb_Bind: Unable to bind socket; address already in use
>>>>>>>>>041105 11:18:16 3332 olb_Config: Manager initialization failed.
>>>>>>>>>----------------------------------------------------------------
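One way to narrow down the "address already in use" error in the log above is to probe whether a TCP port is already held by another process before starting the daemon. The sketch below assumes the conflict is on a TCP port such as the olb.port value 3121 from the config files in this thread. The log does not say which socket failed to bind, so a stale local socket is equally possible. The function name is hypothetical.

```python
import errno
import socket


def port_in_use(port, host=""):
    """Return True if binding a TCP socket to (host, port) fails with EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # e.g. permission problems are a different failure
    return False


# Intended use on the redirector before starting olbd by hand (not executed here):
#   port_in_use(3121)
```

If the port turns out to be held by xinetd, that would be consistent with the later guess in this thread that the xinetd setup interfered with starting the daemons by hand.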
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>
>>>
>>>-------------------------------------------------------------------------
>>>Peter Elmer E-mail: [log in to unmask] Phone: +41 (22) 767-4644
>>>Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
>>>-------------------------------------------------------------------------
>>>
>>
>