>> 1) There are two network interfaces on the redirector. The issue is that
>> using the wrong one in the KanAccess file (the one that is not the same as
>> in the xrootd config files) causes the xrootd process (and not olbd) on
>> the redirector to crash.
>> The core file reveals:
>>> gdb /opt/xrootd/bin/xrootd core.9418
>> #0 0x0806349c in XrdNetwork::getHostName ()
> Yes, this is a known problem. The xrootd will crash if it cannot do a
> reverse translation of IP address to DNS name for its own machine. It
> should be more graceful and issue an error message and exit. That's on the
> list of corrections.
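The "more graceful" behaviour described above could look roughly like this. It is a minimal sketch, assuming the server resolves its own IPv4 address with the standard POSIX getnameinfo() call; the function name reverse_lookup_ipv4 is hypothetical and is not part of the XrdNetwork API. On failure it prints an error and returns -1, so the caller can log and exit cleanly instead of crashing.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not the XrdNetwork API): resolve an IPv4 address
   string to a host name. Returns 0 on success, or prints an error and
   returns -1 on failure instead of crashing. */
int reverse_lookup_ipv4(const char *ip, char *host, size_t hostlen)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;

    /* Reject anything that is not a dotted-quad IPv4 address. */
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1) {
        fprintf(stderr, "error: '%s' is not a valid IPv4 address\n", ip);
        return -1;
    }

    /* NI_NAMEREQD makes getnameinfo fail instead of silently returning
       the numeric address when no reverse (PTR) entry exists. */
    int rc = getnameinfo((struct sockaddr *)&sa, sizeof sa,
                         host, (socklen_t)hostlen, NULL, 0, NI_NAMEREQD);
    if (rc != 0) {
        fprintf(stderr, "error: cannot reverse-resolve %s: %s\n",
                ip, gai_strerror(rc));
        return -1;
    }
    return 0;
}
```

A caller would check the return value and exit with a clear message (for example, naming the interface address that has no DNS entry) rather than dereferencing an empty host name.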
>> 2) Strange disk size in the olb redirector log file. I'm using the sep 04
>> prod version.
>> 050216 18:49:32 9419 olb_a2i: tot dsk value -1463812096 is too small
>> 050216 18:49:32 9419 olb_Server: invalid response from f01-001-118.gridka.de:1094
> What release are you using? This was a problem in some early releases
> dealing with the way parameters were being passed to a function and how
> the compiler treated them. It worked with Sun CC but not with g++.
As I said, it's the September 2004 production version that I'm running. I
think I remember Jean-Yves telling me that he is not seeing this problem at
IN2P3 with the same release. He's not running on Linux there, though.
>> 3) Probably it's a result of the computing center's instability last
>> week, but most of the dataservers are not registering with the olbd on the
>> redirector. Only the dataservers that I restarted are now registering. The
>> olb log files of the ones not registering show:
>> 050211 00:00:00 20383 olb_Config: (c) 2004 SLAC olbd version 20040907-0403 executing as Server
>> 050211 14:33:02 20388 olb_Manager: Manager babar2 appears to be dead.
>> 050211 14:35:15 20388 olb_Manager: Manager h^RBh^RB appears to be dead.
>> 050211 15:03:05 1143 olb_Config: (c) 2004 SLAC olbd version 20040907-0403 initializing as Server
>> 050211 15:03:05 1143 olb_Config: Server initialization completed.
>> 050211 15:03:05 1155 olb_Start: Waiting for primary server to login.
>> 050211 15:03:06 1157 olb_Admin_Login: Primary server 1142 logged in
>> 050211 15:03:06 1143 olb_Server: Logged into babar2
>> 050211 16:14:01 1149 olb_Manager: Manager babar2 appears to be dead.
>> 050211 16:16:14 1149 olb_Manager: Manager h^RBh^RB appears to be dead.
>> 050211 16:18:27 1149 olb_Manager: Manager p^p^ appears to be dead.
>> 050211 16:20:40 1149 olb_Manager: Manager p^p^ appears to be dead.
>> 050211 16:22:54 1149 olb_Manager: Manager p^p^ appears to be dead.
>> 050211 16:25:07 1149 olb_Manager: Manager p^p^ appears to be dead.
>> Indeed, babar2 was rebooted that day (11/02/2005) around 16:12. This
>> message kept appearing in the log file until I restarted all the
>> dataservers' processes yesterday.
> Other people have complained about this and it appears that the 20040907
> release is definitely bad. Please switch to the 200408 release we are
> using for BaBar analysis (or try the 200502 development release if you are
> adventurous). Also, it appears that the DNS name is getting garbled (at
> least in the messages). Please do a gcore and send me the executable and
> core file (or place them in an accessible area).
One problem with the xrootd page is that I can't find any link to that
version, and I cannot list the download directory. I'm not sure I want to
install this version, but where can I find it?
How and where should I do this gcore? (On the redirector or on the
dataservers?)
Please also note that since the babar2 reboot I've killed and restarted all
xrootd/olbd processes, so I won't have any processes in the state described
above until the next babar2 reboot.
> All of this points to a packaging problem we have. The only way we
> really test releases is to create what we call a development release. That,
> unfortunately, makes it available to everyone else, even before we can
> certify it as being materially correct. I do know we've had some
> development releases that should never have seen the light of day, but
> unfortunately the process lets them out. We are trying to put a new
> process into place that will *never* cut a release unless we know that it
> will actually work on a reasonably sized system.
-------------- Dr. Gregory Schott --------------
Institut fuer Experimentelle Kernphysik (IEKP)
Universitaet Karlsruhe - Postfach 3640
76021 Karlsruhe (Germany)
e-mail: [log in to unmask]