Hi,

Just an update on this issue in case someone else comes across something
similar in the future:

I don't know exactly why it didn't work, but the host where the redirector
cms was running had three network interfaces configured: two with local
network IPs and one with an external IP. After deactivating one of the
locally configured interfaces, everything works now.

Cheers,
Marcus

On Fri, Jan 9, 2015 at 5:31 PM, Marcus Ebert <[log in to unmask]> wrote:
> Hi Andy,
>
> Thanks! It seems this is the problem.... (see below)
>
> Both are running 4.1.1
> How would one diagnose what exactly is going wrong with the routing of
> the data packets here? ( I don't have root access on the manager
> machine)
>
> I focused so far only on the connection to the meta manager, but
> looking more careful to the log, it seems that also the local data
> server cms can not connect to the local manager although both, local
> manager and data server, run on the same machine...
> That's the log until the login try, don't know if there is anything
> else unusual:
>
> ++++++ cmsd anon@MANAGER initialization started.
> Config using configuration file /home/xrootd/config/redirector.cf
> =====> xrd.trace all
> 150109 19:44:10 5325 XrdSched: scheduling underused thread monitor in
> 780 seconds
> 150109 19:44:10 5325 XrdSched: Set min_Workers=8 max_Workers=4095
> 150109 19:44:10 5325 XrdSched: Set stk_Workers=3583 max_Workidl=780
> =====> xrd.sched mint 8 maxt 4095 avlt 512 idle 780
> Config maximum number of connections restricted to 4096
> Config maximum number of threads restricted to 54510
> 150109 19:44:18 5325 XrdConfig: sendfile enabled.
> 150109 19:44:18 5325 XrdSched: time event underused thread monitor cancelled
> 150109 19:44:18 5325 XrdSched: scheduling underused thread monitor in
> 780 seconds
> 150109 19:44:18 5325 XrdSched: Starting with 2 workers
> 150109 19:44:18 5325 XrdLink: Allocating 8 link objects at a time
> 150109 19:44:18 5325 XrdPoll: Starting poller 0
> 150109 19:44:18 5348 XrdXeq: Poller thread started
> 150109 19:44:18 5325 XrdPoll: Starting poller 1
> 150109 19:44:18 5344 XrdXeq: Worker thread started
> 150109 19:44:18 5342 XrdXeq: Worker thread started
> 150109 19:44:18 5340 XrdXeq: Buffer Manager reshaper thread started
> 150109 19:44:18 5350 XrdXeq: Poller thread started
> 150109 19:44:18 5341 XrdXeq: Time scheduler thread started
> 150109 19:44:18 5325 XrdPoll: Starting poller 2
> 150109 19:44:18 5352 XrdXeq: Poller thread started
> 150109 19:44:18 5325 XrdProtocol: getting port from protocol cmsd
> Copr. 2007 Stanford University/SLAC cmsd.
> ++++++ anon@MANAGER phase 1 initialization started.
> =====> all.role manager
> =====> all.manager meta METAMANAGER:1213
> =====> all.manager MANAGER:3121
> =====> cms.trace all
> =====> cms.sched cpu 100
> The following paths are available to the meta-manager:
> r /
>
> ------ anon@MANAGER phase 1 manager initialization completed.
> 150109 19:44:18 5325 XrdConfig: LCL port 3121 wsz=87380 (87380)
> 150109 19:44:18 5325 XrdProtocol: getting protocol object cmsd
> ++++++ anon@MANAGER phase 2 manager initialization started.
> 150109 19:44:18 5371 XrdXeq: Cache Clock thread started
> 150109 19:44:18 5325 Replenish old free 0 + 4096 = 4096
> Config warning: adminpath resides in /tmp and may be unstable!
> 150109 19:44:18 5325 Configure2 Global System Identification: anon-m 3121MANAGER
> ++++++ Storage system initialization started.
> =====> oss.localroot /home/xrootd/disk/
> ++++++ Configuring standalone mode . . .
> 150109 19:44:18 5325 oss_AioInit: started AIO read signal thread; tid=777754368
> 150109 19:44:18 5325 oss_AioInit: started AIO write signal thread; tid=776701696
> Config effective /home/xrootd/config/redirector.cf oss configuration:
> oss.alloc 0 0 0
> oss.cachescan 600
> oss.fdlimit 2048 4096
> oss.maxsize 0
> oss.localroot /home/xrootd/disk/
> oss.trace fff
> oss.xfr 1 deny 10800 keep 1200
> oss.memfile off max 3667929088
> oss.defaults r/w nocheck nodread nomig norcreate nopurge nostage xattr
> ------ Storage system initialization completed.
> 150109 19:44:18 5325 Start Srv=0 dfs=0 lcl=0 Pre=1 dmLife=0 0
> 150109 19:44:18 5325 Start Lim=0 0 fix=0 Qmax=1
> 150109 19:44:18 5374 XrdXeq: Performance monitor thread started
> 150109 19:44:18 5375 XrdXeq: Refcount monitor thread started
> 150109 19:44:18 5376 XrdXeq: Request Responder thread started
> 150109 19:44:18 5377 XrdXeq: Request Timeout thread started
> ------ anon@MANAGER phase 2 manager initialization completed.
> 150109 19:44:18 5344 XrdSched: running cmsd startup inq=0
> 150109 19:44:18 5344 XrdSched: scheduling ping clock in 60 seconds
> 150109 19:44:18 5380 XrdXeq: Admin traffic thread started
> 150109 19:44:18 5379 XrdXeq: Prep handler thread started
> 150109 19:44:18 5380 Start: Waiting for primary server to login.
> 150109 19:44:18 5378 XrdXeq: Notification handler thread started
> ------ cmsd anon@MANAGER:3121 initialization completed.
> 150109 19:44:18 5342 XrdSched: Now have 3 workers
> 150109 19:44:18 5342 XrdSched: running main accept inq=0
> 150109 19:44:18 5382 XrdXeq: Worker thread started
> 150109 19:44:22 5342 XrdInet: Accepted connection from 19@MANAGER
> 150109 19:44:22 5382 XrdSched: Now have 4 workers
> 150109 19:44:22 5382 XrdSched: running main accept inq=0
> 150109 19:44:22 5383 XrdXeq: Worker thread started
> 150109 19:44:22 5342 XrdProtocol: matched protocol cmsd
> 150109 19:44:22 5342 ?:19@MANAGER XrdPoll: FD 19 attached to poller 0; num=1
> 150109 19:44:28 5384 Admin_Login initial request: 'login p 5326 port 1095'
> 150109 19:44:28 5384 Update FrontEnd Parm1=1 Parm2=1095
> 150109 19:44:28 5384 do_Login:: Primary server 5326 logged in; data port is 1095
> 150109 19:44:28 5385 XrdXeq: State monitor thread started
> 150109 19:44:28 5383 XrdSched: Now have 5 workers
> 150109 19:44:28 5383 XrdSched: running cms protocol handler inq=0
> 150109 19:44:28 5383 Pander manager services to METAMANAGER:1213
> 150109 19:44:28 5383 Pander trying to connect to lvl 0 METAMANAGER:1213
> 150109 19:44:28 5386 XrdXeq: Worker thread started
> 150109 19:44:28 5383 XrdInet: Connected to METAMANAGER:1213
> 150109 19:44:32 5342 Protocol: redirector.5326:19@MANAGER logged in.
> 150109 19:44:32 5342 Admit_Redirector redirector.5326:19@MANAGER assigned slot 1
> 150109 19:44:32 5382 XrdInet: Accepted connection from 21@MANAGER
> 150109 19:44:32 5386 XrdSched: Now have 6 workers
> 150109 19:44:32 5386 XrdSched: running main accept inq=0
> 150109 19:44:32 5403 XrdXeq: Worker thread started
> 150109 19:44:33 5382 XrdProtocol: ?:21@Manager terminated login not received
> 150109 19:44:38 5383 Add METAMANAGER to manager config; id=0
> 150109 19:44:38 5383 manager.0:25@METAMANAGER XrdLink: No RecvAll()
> data; errno=0
> 150109 19:44:38 5383 Login: METAMANAGER login failed; rejected
> 150109 19:44:38 5383 Remove completed METAMANAGER manager 0.2
> 150109 19:44:38 5383 Manager: manager.0:25@METAMANAGER removed; lost connection
>
>
> Changing the configuration for the data server cms from "all.manager
> HOSTNAME:3121" to "localhost:3121" allows the local data server
> process to login:
>
> 150109 20:08:30 5507 XrdInet: Accepted connection from 21@localhost
> 150109 20:08:30 5506 XrdSched: running main accept inq=0
> 150109 20:08:30 5507 XrdProtocol: matched protocol cmsd
> 150109 20:08:30 5507 ?:21@localhost XrdPoll: FD 21 attached to poller 1; num=1
> 150109 20:08:30 5507 Add Reconnect server.5526:20@localhost:1094 to
> cluster anon-s 3121localhost slot 0.3 (nodecnt=1 supn=1)
> 150109 20:08:30 5507 Update Counts Parm1=1 Parm2=1
> 150109 20:08:30 5507 Admit localhost TSpace=2513GB NumFS=1
> FSpace=2573624MB MinFR=51473 MB Util=1 Share=100 TZone=-5
> 150109 20:08:30 5507 Admit localhost adding path: ws /store
> 150109 20:08:30 5507 Admit localhost adding path: ws /prod
> 150109 20:08:30 5507 server.5526:20@localhost:1094 do_Space: 2573624MB
> free; 1% util
> 150109 20:08:30 5507 Protocol: Primary server.5526:20@localhost:1094 logged in.
>
> So it seems you found the problem being a routing problem and probably
> I should focus first to get the local data server process register
> with the local manager using the hostname of the machine instead of
> localhost...
> But how could this be solved....?
>
>
> Cheers,
> Marcus
>

--
---------------------------------------------------------------------
Marcus Ebert
SLAC National Accelerator Laboratory
Mail Stop 34
2575 Sand Hill Road
Menlo Park, CA, 94025 USA
------
office: ROB 129
Tel. : +1 (650) 926-3565
=====================================================================
Please, don't send Word, Excel or PowerPoint attachments -
use pdf, plain text or HTML format instead.
See also http://www.gnu.org/philosophy/no-word-attachments.html
---------------------------------------------------------------------
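
Note for anyone debugging a similar setup without root access: one rough
way to narrow this down is to check, from an unprivileged account, which
addresses the manager's hostname resolves to and whether the cmsd port
accepts a TCP connection on each of them. The following is only an
illustrative Python sketch and not something taken from this thread; the
function names are made up for the example, and 3121 is simply the manager
port that appears in the log above. If the hostname resolves to an address
where the connection fails while localhost works, that would be consistent
with an interface problem like the one described at the top, though it does
not prove it.

```python
import socket


def resolve_ipv4(hostname, port):
    """Return the distinct IPv4 addresses the resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses = []
    for _family, _socktype, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def can_connect(address, port, timeout=3.0):
    """Return True if a plain TCP connection to address:port succeeds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def check_host(hostname, port, timeout=3.0):
    """Map each resolved IPv4 address to whether the port is reachable on it."""
    return {
        address: can_connect(address, port, timeout)
        for address in resolve_ipv4(hostname, port)
    }
```

Calling check_host() with the manager's hostname and port 3121, and again
with "localhost", gives a dictionary of address to True/False that can be
compared between the two names. This only tests TCP reachability; it says
nothing about the cmsd login itself.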