Hello Andrew,

>> 1) There are two network interfaces on the redirector. The issue is that
>> using the wrong one in the KanAccess file (the one that is not the same as
>> in the xrootd config files) causes the xrootd (and not olbd) process on
>> the redirector to crash.
>>
>> The core file reveals:
>>> gdb /opt/xrootd/bin/xrootd core.9418
>> #0 0x0806349c in XrdNetwork::getHostName ()

> Yes, this is a known problem. The xrootd will crash if it cannot do a
> reverse translation of IP address to DNS name for its own machine. I
> should be more graceful and issue an error message and exit. That's on the
> list of corrections.

>> 2) Strange disk size in the olb redirector log file. I'm using the Sep 04
>> prod version.
>>
>> 050216 18:49:32 9419 olb_a2i: tot dsk value -1463812096 is too small
>> 050216 18:49:32 9419 olb_Server: invalid response from f01-001-118.gridka.de:1094

> What release are you using? This was a problem in some early releases
> dealing with the way parameters were being passed to a function and how
> the compiler treated them. It worked using Sun CC but not in g++.

As I said, it's the September 2004 production version that I'm running.
I think I remember Jean-Yves telling me that he is not seeing this problem
at IN2P3 with the same release. He's not running on Linux there, though.

>> 3) Probably it's a result of the computing center's instability of
>> last week, but most of the dataservers are not registering with olbd on
>> the redirector. Only the dataservers that I restarted are now registering.
>> The olb log files of the ones not registering show:
>>
>> 050211 00:00:00 20383 olb_Config: (c) 2004 SLAC olbd version 20040907-0403 executing as Server
>> 050211 14:33:02 20388 olb_Manager: Manager babar2 appears to be dead.
>> 050211 14:35:15 20388 olb_Manager: Manager h^RBh^RB appears to be dead.
>> 050211 15:03:05 1143 olb_Config: (c) 2004 SLAC olbd version 20040907-0403 initializing as Server
>> 050211 15:03:05 1143 olb_Config: Server initialization completed.
>> 050211 15:03:05 1155 olb_Start: Waiting for primary server to login.
>> 050211 15:03:06 1157 olb_Admin_Login: Primary server 1142 logged in
>> 050211 15:03:06 1143 olb_Server: Logged into babar2
>> 050211 16:14:01 1149 olb_Manager: Manager babar2 appears to be dead.
>> 050211 16:16:14 1149 olb_Manager: Manager h^RBh^RB appears to be dead.
>> 050211 16:18:27 1149 olb_Manager: Manager p^p^ appears to be dead.
>> 050211 16:20:40 1149 olb_Manager: Manager p^p^ appears to be dead.
>> 050211 16:22:54 1149 olb_Manager: Manager p^p^ appears to be dead.
>> 050211 16:25:07 1149 olb_Manager: Manager p^p^ appears to be dead.
>>
>> Indeed, babar2 was rebooted that day (11/02/2005) around 16:12. This
>> message kept appearing in the log file until I restarted all the
>> dataservers' processes yesterday.

> Other people have complained about this and it appears that the 20040907
> release is definitely bad. Please switch to the 200408 release we are
> using for BaBar analysis (or try the 200502 development release if you are
> adventurous). Also, it appears that the DNS name is getting screwed up (at
> least in the messages). Please do a gcore and send me the executable and
> core file (or place it in an accessible area).

One problem with the xrootd page is that I can't find any link to that
version and cannot list the download directory. I'm not sure I want to
install this version, but where can I find it?

How and where should I do this gcore? (On the redirector or a dataserver?)
Please also note that since the babar2 reboot I've killed and restarted all
xrootd/olbd processes, so I won't have processes in the state described
above until the next babar2 reboot.

-- Gregory

> All of this points to a packaging problem we have. The only way we
> really test releases is to create what we call a development release.
> That, unfortunately, makes it available to everyone else -- even before
> we can certify it as being materially correct. I do know we've had some
> development releases that should have never seen the light of day, but
> unfortunately the process lets them out. We are trying to get a new
> process into place that will *never* cut a release unless we know that it
> will actually work on a reasonably sized system.
>
> Andy

-------------- Dr. Gregory Schott --------------
Institut fuer Experimentelle Kernphysik (IEKP)
Universitaet Karlsruhe - Postfach 3640
76021 Karlsruhe (Germany)
tel.: +49-(0)724782-3537 fax.: +49-(0)724782-3414
-----------------------------------------------
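
Regarding point 1: the behaviour Andy describes as the intended fix (issue an error message and exit, rather than crash, when the server cannot reverse-translate its own IP address to a DNS name) might look roughly like the Python sketch below. This is only an illustration and not xrootd code. The function names, the injectable `resolver` parameter, and the wording of the error message are assumptions made up for this example; only `socket.gethostbyaddr` and `sys.exit` are real standard-library calls.

```python
import socket
import sys


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the DNS name for `ip`, or None if no reverse mapping exists.

    `resolver` is a hypothetical hook (not part of any real xrootd API)
    so the lookup can be replaced in tests; it must behave like
    socket.gethostbyaddr and return (hostname, aliases, addresses).
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None
    return hostname


def own_hostname_or_exit(ip, resolver=socket.gethostbyaddr):
    """Resolve the server's own address, or exit with a readable error."""
    name = reverse_lookup(ip, resolver)
    if name is None:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(
            f"cannot translate {ip} to a DNS name; "
            "check DNS and the configured network interface"
        )
    return name
```

With two interfaces on the redirector, a check like this could be run against each interface address before starting the daemons, to see which address lacks a reverse DNS entry.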