Peter Elmer wrote:
> Hi Manfred and Rolf,
>
> It doesn't look like xrootd/olbd are running on babar2 right now. Rolf,
> will you be starting it? I would have thought that xinetd would have started
> it, though. Ah, does xinetd itself need to be restarted in order to see
> the changes in /etc/xinetd.d/?

Hi Pete,

I have restarted the xinetd.

>
> One other small thing: it looks like different config files are being
> used for the xrootd and the olbd:
>
> fzk-babar2> grep server_args /etc/xinetd.d/xrootd /etc/xinetd.d/olbd
> /etc/xinetd.d/xrootd: server_args = -r -l /var/log/xrootd -c /opt/xrootd/etc/redirector.cf
> /etc/xinetd.d/olbd: server_args = -m -l /var/log/olbd -c /opt/xrootd/etc/xrootd_redirector.cf
>

Sorry. The entry in /etc/xinetd.d/olbd was wrong.

Manfred

> Was that intentional? In general you should be able to use a single config
> file for both the xrootd and olbd on any given single machine.
>
> Pete
>
> On Thu, Nov 11, 2004 at 10:46:38AM +0100, Manfred Alef wrote:
>
>>Hi Pete,
>>
>>I have enabled the old xinetd.d files. You may now be able
>>to start olbd and xrootd as babaradm.
>>
>>The xinetd.d daemons use the old configuration file
>>/opt/xrootd/etc/xrootd_redirector.cf.
>>
>>I have killed the daemons running as root.
>>
>>Regards
>>Manfred
>>
>>Peter Elmer wrote:
>>
>>> Hi Manfred and Rolf,
>>>
>>> Two other small things, I see:
>>>
>>>fzk-babar2> ps -ef | grep xrootd
>>>root 3349 3082 0 Nov10 pts/0 00:00:11 /opt/xrootd/bin/xrootd -r -l /var/log/babar2.xrdlog -c /opt/xrootd/etc/redirector.cf
>>>root 3361 3082 0 Nov10 pts/0 00:00:12 /opt/xrootd/bin/olbd -m -l /var/log/babar2.olblog -c /opt/xrootd/etc/redirector.cf
>>>elmer 8970 8225 0 10:02 pts/3 00:00:00 grep xrootd
>>>fzk-babar2> less /opt/xrootd/etc/redirector.cf
>>>
>>> For security reasons, the daemon should not run as root, but as some
>>>generic babar account which would normally own the data in the unix file
>>>owner sense (e.g. whatever account you use for importing babar data).
>>>I believe Andy has disallowed running as root in later versions of
>>>xrootd/olbd, too.
>>>
>>> Also the xrootd protocol allows clients to survive server crashes (or
>>>restarts), but you need to set something up to automatically restart the
>>>xrootd should it crash for some reason. Checking and restarting the
>>>server every 10 minutes should be within the period in which the client will
>>>keep retrying to connect (eventually it times out and just gives up).
>>>
>>> Pete
>>>
>>>On Thu, Nov 11, 2004 at 10:06:24AM +0100, Peter Elmer wrote:
>>>
>>>> Hi Manfred,
>>>>
>>>>On Thu, Nov 11, 2004 at 09:52:28AM +0100, Manfred Alef wrote:
>>>>
>>>>>the redirector server was upgraded to SL 3.03. Now we could
>>>>>start olbd from xrootd's RHEL RPM without any problem.
>>>>
>>>> Ok, that is strange. The only guess I have is that it could have been the
>>>>xinetd stuff interfering with starting by hand (if that is what you did).
>>>>
>>>> In any case, I looked at the logs for the xrootd/olbd on babar2 and there
>>>>is still a problem. Normally the xrootd should connect to the olbd on
>>>>the same machine, but there are errors in the xrootd log file:
>>>>
>>>>041110 09:43:54 3349 odc_Manager: Connected to l01-001-122
>>>>041110 09:43:54 3349 odc_GetLine: Unable to reading request ; connection reset by peer
>>>>041110 09:43:54 3349 odc_Manager: Unable to receive msg from l01-001-122; connection reset by peer
>>>>
>>>>and in the olbd log file:
>>>>
>>>>041110 09:43:54 3361 olb_Accept: Unable to accept connection from l01-001-122.gridka.de; permission denied
>>>>
>>>> Looking at the config file, I see:
>>>>
>>>>olb.allow host l01-001-122 # babar2.fzk.de
>>>>olb.allow host f01-001-121
>>>>olb.allow host f01-001-122
>>>>
>>>> I think you may need to specify the full hostname, including domain, i.e.
>>>>
>>>>olb.allow host l01-001-122.gridka.de # babar2.fzk.de
>>>>olb.allow host f01-001-121.gridka.de
>>>>olb.allow host f01-001-122.gridka.de
>>>>
>>>> Does that work?
>>>>
>>>> Pete
>>>>
>>>>>Peter Elmer wrote:
>>>>>
>>>>>> Hi Manfred and Rolf,
>>>>>>
>>>>>> Sorry for the late reply. (You picked a somewhat awkward time to try this
>>>>>>since Andy is away and I'm just back from vacation in a series of
>>>>>>meetings/transits this past week!)
>>>>>>
>>>>>> I'll give this a try to see if I can reproduce it. I see, however, that
>>>>>>you restarted things yesterday:
>>>>>>
>>>>>>root 3349 3082 0 Nov10 pts/0 00:00:11 /opt/xrootd/bin/xrootd -r -l /var/log/babar2.xrdlog -c /opt/xrootd/etc/redirector.cf
>>>>>>root 3361 3082 0 Nov10 pts/0 00:00:11 /opt/xrootd/bin/olbd -m -l /var/log/babar2.olblog -c /opt/xrootd/etc/redirector.cf
>>>>>>
>>>>>>and I don't see anything in the log files about "Unable to bind socket;
>>>>>>address already in use". There are other problems related to the dataservers
>>>>>>connecting to the redirector, I think, but I'll look at those now.
>>>>>>
>>>>>> One thing that I recall is that Jos and I looked at setting up xinetd
>>>>>>style restarts of the server. That wasn't still there, was it? (I don't
>>>>>>see it now, but presumably it would have interfered with separate attempts
>>>>>>to start the daemons by hand.)
>>>>>>
>>>>>> Pete
>>>>>>
>>>>>>On Fri, Nov 05, 2004 at 01:00:36PM +0100, Manfred Alef wrote:
>>>>>>
>>>>>>>Hi Pete,
>>>>>>>
>>>>>>>the config files are from
>>>>>>>http://xrootd.slac.stanford.edu/examples/multserver/index.html.
>>>>>>>
>>>>>>>Best regards
>>>>>>>Manfred
>>>>>>>
>>>>>>>babar2 # cat redirector.cf
>>>>>>>#
>>>>>>># redirector.cf
>>>>>>>#
>>>>>>># xrootd
>>>>>>>#+xrootd.fslib /opt/xrootd/lib/libXrdOfs.so
>>>>>>>xrootd.fslib /usr/local/xrootd/lib/i386_linux24/libXrdOfs.so
>>>>>>>xrootd.export /data
>>>>>>>odc.manager l01-001-122 3121
>>>>>>>odc.trace redirect
>>>>>>># olbd
>>>>>>>olb.port 3121
>>>>>>>#+olb.allow host kanrdr.slac.stanford.edu
>>>>>>>#+olb.allow host kan001.slac.stanford.edu
>>>>>>>#+olb.allow host kan002.slac.stanford.edu
>>>>>>>olb.allow host l01-001-122 # babar2.fzk.de
>>>>>>>olb.allow host f01-001-121
>>>>>>>olb.allow host f01-001-122
>>>>>>>babar2 #
>>>>>>>
>>>>>>>[root@f01-001-122 etc]# cat dataserver.cf
>>>>>>>#
>>>>>>># dataserver.cf
>>>>>>>#
>>>>>>># xrootd
>>>>>>>#+xrootd.fslib /opt/xrootd/lib/libXrdOfs.so
>>>>>>>xrootd.fslib /usr/local/xrootd/lib/i386_linux24/libXrdOfs.so
>>>>>>>xrootd.export /data
>>>>>>>oss.readonly
>>>>>>>odc.manager l01-001-122 3121
>>>>>>># olbd
>>>>>>>olb.port 3121
>>>>>>>olb.subscribe l01-001-122 3121
>>>>>>>[root@f01-001-122 etc]#
>>>>>>>
>>>>>>>Peter Elmer wrote:
>>>>>>>
>>>>>>>>[CC the xrootd mailing list]
>>>>>>>>
>>>>>>>>Hi Rolf,
>>>>>>>>
>>>>>>>>Do you have the config files you are using to try to start xrootd and
>>>>>>>>the olbd (on the redirector and the file servers)?
>>>>>>>>
>>>>>>>> Pete
>>>>>>>>
>>>>>>>>On Fri, Nov 05, 2004 at 11:33:47AM +0100, Manfred Alef wrote:
>>>>>>>>
>>>>>>>>>Hi Pete,
>>>>>>>>>
>>>>>>>>>I am sitting here at GridKa together with Manfred Alef and we are trying
>>>>>>>>>to install xrootd on two of the fileservers and on babar2, a login
>>>>>>>>>machine which will also be the redirector.
>>>>>>>>>We use the current production version and had no problems starting xrootd
>>>>>>>>>and olbd on one of the fileservers. However, when we try to start the
>>>>>>>>>olbd on the redirector, it exits with exit code 1. The logfile is
>>>>>>>>>attached.
>>>>>>>>>We made sure nothing else is going on on the machine (reboot)
>>>>>>>>>and also removed any old socket we could find in /tmp/.olb/
>>>>>>>>>Do you have an idea what could go wrong or what else we could try?
>>>>>>>>>
>>>>>>>>>Cheers, Rolf
>>>>>>>>>
>>>>>>>>>---------------------------------------------------------------
>>>>>>>>>41105 10:44:31 32156 olb_Config: (c) 2004 SLAC olbd version 20040907-0403 initializing as Manager
>>>>>>>>>041105 10:44:31 32156 olb_Bind: Unable to bind socket; address already in use
>>>>>>>>>041105 10:44:31 32156 olb_Config: Manager initialization failed.
>>>>>>>>>041105 10:46:15 32191 olb_Config: (c) 2004 SLAC olbd version 20040907-0403 initializing as Manager
>>>>>>>>>041105 10:46:15 32191 olb_Bind: Unable to bind socket; address already in use
>>>>>>>>>041105 10:46:15 32191 olb_Config: Manager initialization failed.
>>>>>>>>>041105 10:48:49 32248 Schedule scheduling midnight runner in 47471 seconds
>>>>>>>>>041105 10:48:49 32248 olb_Config: (c) 2004 SLAC olbd version 20040907-0403 initializing as Manager
>>>>>>>>>041105 10:48:49 32248 olb_Bind: Unable to bind socket; address already in use
>>>>>>>>>041105 10:48:49 32248 olb_Config: Manager initialization failed.
>>>>>>>>>041105 11:10:37 3175 olb_Config: (c) 2004 SLAC olbd version 20040907-0403 initializing as Manager
>>>>>>>>>041105 11:10:37 3175 olb_Bind: Unable to bind socket; address already in use
>>>>>>>>>041105 11:10:37 3175 olb_Config: Manager initialization failed.
>>>>>>>>>041105 11:18:16 3332 olb_Config: (c) 2004 SLAC olbd version 20040907-0403 initializing as Manager
>>>>>>>>>041105 11:18:16 3332 olb_Bind: Unable to bind socket; address already in use
>>>>>>>>>041105 11:18:16 3332 olb_Config: Manager initialization failed.
>>>>>>>>>----------------------------------------------------------------
>>>
>>>-------------------------------------------------------------------------
>>>Peter Elmer Phone: +41 (22) 767-4644
>>>Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
>>>-------------------------------------------------------------------------
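
The "permission denied" problem in this thread came from olb.allow host
entries that named hosts without their domain, while the olbd logged the
connecting host as l01-001-122.gridka.de. A check of that kind can be run
against a config file before starting the daemons. What follows is only a
minimal sketch in Python: the function name unqualified_allow_hosts is
hypothetical, and treating "no dot in the name" as "not fully qualified" is
an assumption made for illustration, not a description of how the olbd
itself matches host names.

```python
def unqualified_allow_hosts(config_text):
    """Return hosts from 'olb.allow host' lines that carry no domain part.

    Assumption (not taken from the olbd documentation): a host name
    without any '.' is considered unqualified.
    """
    short_names = []
    for raw_line in config_text.splitlines():
        # Drop everything after '#'; fully commented lines become empty.
        line = raw_line.split("#", 1)[0].strip()
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "olb.allow" and fields[1] == "host":
            if "." not in fields[2]:
                short_names.append(fields[2])
    return short_names


if __name__ == "__main__":
    # Sample text modelled on the redirector.cf quoted in the thread.
    sample = """\
olb.port 3121
#+olb.allow host kanrdr.slac.stanford.edu
olb.allow host l01-001-122   # babar2.fzk.de
olb.allow host f01-001-121
olb.allow host f01-001-122.gridka.de
"""
    for host in unqualified_allow_hosts(sample):
        print("olb.allow host entry without domain:", host)
```

Any name the function reports is a candidate for the change suggested
above, i.e. appending the domain (here .gridka.de) to the entry.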