Hi,

Just an update on this issue in case someone else comes across
something similar in the future:
I still don't know exactly why it didn't work, but the host where the
redirector cms was running had three network interfaces configured:
two with local network IPs and one with an external IP. After
deactivating one of the locally configured interfaces, everything
works now.
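
For anyone who wants to check for a similar situation without root
access, here is a minimal Python sketch. It is only an illustration,
not part of xrootd: the function names are made up for this example,
and port 3121 is simply the manager port from the configuration quoted
below. It lists every address a hostname resolves to and tries a plain
TCP connection to each one, which may show whether the hostname points
at an interface the local cmsd cannot be reached on.

```python
import socket


def resolve_addresses(host, port):
    """Return the distinct IP addresses that `host` resolves to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        addr = sockaddr[0]
        if addr not in addresses:
            addresses.append(addr)
    return addresses


def can_connect(addr, port, timeout=3.0):
    """Return True if a TCP connection to addr:port can be opened."""
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_manager(host, port):
    """Map each resolved address of `host` to its TCP reachability."""
    return {addr: can_connect(addr, port)
            for addr in resolve_addresses(host, port)}


# Hypothetical usage on the manager machine (port 3121 is an assumption
# taken from the "all.manager MANAGER:3121" line in the quoted log):
#   for addr, ok in check_manager(socket.gethostname(), 3121).items():
#       print(addr, "reachable" if ok else "NOT reachable")
```

If the hostname resolves to an address that is not reachable while
localhost is, that would be consistent with the behaviour in the logs
below, although it does not by itself explain the cause.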

Cheers,
  Marcus

On Fri, Jan 9, 2015 at 5:31 PM, Marcus Ebert <[log in to unmask]> wrote:
> Hi Andy,
>
> Thanks! It seems this is the problem.... (see below)
>
> Both are running 4.1.1
> How would one diagnose what exactly is going wrong with the routing of
> the data packets here? (I don't have root access on the manager
> machine.)
>
> So far I have focused only on the connection to the meta manager, but
> looking more carefully at the log, it seems that the local data
> server cms also cannot connect to the local manager, although both
> the local manager and the data server run on the same machine...
> This is the log up to the login attempt; I don't know if there is
> anything else unusual:
>
> ++++++ cmsd anon@MANAGER initialization started.
> Config using configuration file /home/xrootd/config/redirector.cf
> =====> xrd.trace all
> 150109 19:44:10 5325 XrdSched: scheduling underused thread monitor in
> 780 seconds
> 150109 19:44:10 5325 XrdSched: Set min_Workers=8 max_Workers=4095
> 150109 19:44:10 5325 XrdSched: Set stk_Workers=3583 max_Workidl=780
> =====> xrd.sched mint 8 maxt 4095 avlt 512 idle 780
> Config maximum number of connections restricted to 4096
> Config maximum number of threads restricted to 54510
> 150109 19:44:18 5325 XrdConfig: sendfile enabled.
> 150109 19:44:18 5325 XrdSched: time event underused thread monitor cancelled
> 150109 19:44:18 5325 XrdSched: scheduling underused thread monitor in
> 780 seconds
> 150109 19:44:18 5325 XrdSched: Starting with 2 workers
> 150109 19:44:18 5325 XrdLink: Allocating 8 link objects at a time
> 150109 19:44:18 5325 XrdPoll: Starting poller 0
> 150109 19:44:18 5348 XrdXeq: Poller thread started
> 150109 19:44:18 5325 XrdPoll: Starting poller 1
> 150109 19:44:18 5344 XrdXeq: Worker thread started
> 150109 19:44:18 5342 XrdXeq: Worker thread started
> 150109 19:44:18 5340 XrdXeq: Buffer Manager reshaper thread started
> 150109 19:44:18 5350 XrdXeq: Poller thread started
> 150109 19:44:18 5341 XrdXeq: Time scheduler thread started
> 150109 19:44:18 5325 XrdPoll: Starting poller 2
> 150109 19:44:18 5352 XrdXeq: Poller thread started
> 150109 19:44:18 5325 XrdProtocol: getting port from protocol cmsd
> Copr.  2007 Stanford University/SLAC cmsd.
> ++++++ anon@MANAGER phase 1 initialization started.
> =====> all.role manager
> =====> all.manager meta METAMANAGER:1213
> =====> all.manager MANAGER:3121
> =====> cms.trace all
> =====> cms.sched cpu 100
> The following paths are available to the meta-manager:
> r  /
>
> ------ anon@MANAGER phase 1 manager initialization completed.
> 150109 19:44:18 5325 XrdConfig: LCL port 3121 wsz=87380 (87380)
> 150109 19:44:18 5325 XrdProtocol: getting protocol object cmsd
> ++++++ anon@MANAGER phase 2 manager initialization started.
> 150109 19:44:18 5371 XrdXeq: Cache Clock thread started
> 150109 19:44:18 5325 Replenish old free 0 + 4096 = 4096
> Config warning: adminpath resides in /tmp and may be unstable!
> 150109 19:44:18 5325 Configure2 Global System Identification: anon-m 3121MANAGER
> ++++++ Storage system initialization started.
> =====> oss.localroot /home/xrootd/disk/
> ++++++ Configuring standalone mode . . .
> 150109 19:44:18 5325 oss_AioInit: started AIO read signal thread; tid=777754368
> 150109 19:44:18 5325 oss_AioInit: started AIO write signal thread; tid=776701696
> Config effective /home/xrootd/config/redirector.cf oss configuration:
>        oss.alloc        0 0 0
>        oss.cachescan    600
>        oss.fdlimit      2048 4096
>        oss.maxsize      0
>        oss.localroot /home/xrootd/disk/
>        oss.trace        fff
>        oss.xfr          1 deny 10800 keep 1200
>        oss.memfile off  max 3667929088
>        oss.defaults  r/w  nocheck nodread nomig norcreate nopurge nostage xattr
> ------ Storage system initialization completed.
> 150109 19:44:18 5325 Start Srv=0 dfs=0 lcl=0 Pre=1 dmLife=0 0
> 150109 19:44:18 5325 Start Lim=0 0 fix=0 Qmax=1
> 150109 19:44:18 5374 XrdXeq: Performance monitor thread started
> 150109 19:44:18 5375 XrdXeq: Refcount monitor thread started
> 150109 19:44:18 5376 XrdXeq: Request Responder thread started
> 150109 19:44:18 5377 XrdXeq: Request Timeout thread started
> ------ anon@MANAGER phase 2 manager initialization completed.
> 150109 19:44:18 5344 XrdSched: running cmsd startup inq=0
> 150109 19:44:18 5344 XrdSched: scheduling ping clock in 60 seconds
> 150109 19:44:18 5380 XrdXeq: Admin traffic thread started
> 150109 19:44:18 5379 XrdXeq: Prep handler thread started
> 150109 19:44:18 5380 Start: Waiting for primary server to login.
> 150109 19:44:18 5378 XrdXeq: Notification handler thread started
> ------ cmsd anon@MANAGER:3121 initialization completed.
> 150109 19:44:18 5342 XrdSched: Now have 3 workers
> 150109 19:44:18 5342 XrdSched: running main accept inq=0
> 150109 19:44:18 5382 XrdXeq: Worker thread started
> 150109 19:44:22 5342 XrdInet: Accepted connection from 19@MANAGER
> 150109 19:44:22 5382 XrdSched: Now have 4 workers
> 150109 19:44:22 5382 XrdSched: running main accept inq=0
> 150109 19:44:22 5383 XrdXeq: Worker thread started
> 150109 19:44:22 5342 XrdProtocol: matched protocol cmsd
> 150109 19:44:22 5342 ?:19@MANAGER XrdPoll: FD 19 attached to poller 0; num=1
> 150109 19:44:28 5384 Admin_Login initial request: 'login p 5326 port 1095'
> 150109 19:44:28 5384 Update FrontEnd Parm1=1 Parm2=1095
> 150109 19:44:28 5384 do_Login:: Primary server 5326 logged in; data port is 1095
> 150109 19:44:28 5385 XrdXeq: State monitor thread started
> 150109 19:44:28 5383 XrdSched: Now have 5 workers
> 150109 19:44:28 5383 XrdSched: running cms protocol handler inq=0
> 150109 19:44:28 5383 Pander manager services to METAMANAGER:1213
> 150109 19:44:28 5383 Pander trying to connect to lvl 0 METAMANAGER:1213
> 150109 19:44:28 5386 XrdXeq: Worker thread started
> 150109 19:44:28 5383 XrdInet: Connected to METAMANAGER:1213
> 150109 19:44:32 5342 Protocol: redirector.5326:19@MANAGER logged in.
> 150109 19:44:32 5342 Admit_Redirector redirector.5326:19@MANAGER assigned slot 1
> 150109 19:44:32 5382 XrdInet: Accepted connection from 21@MANAGER
> 150109 19:44:32 5386 XrdSched: Now have 6 workers
> 150109 19:44:32 5386 XrdSched: running main accept inq=0
> 150109 19:44:32 5403 XrdXeq: Worker thread started
> 150109 19:44:33 5382 XrdProtocol: ?:21@Manager terminated login not received
> 150109 19:44:38 5383 Add METAMANAGER to manager config; id=0
> 150109 19:44:38 5383 manager.0:25@METAMANAGER XrdLink: No RecvAll()
> data; errno=0
> 150109 19:44:38 5383 Login: METAMANAGER login failed; rejected
> 150109 19:44:38 5383 Remove completed METAMANAGER manager 0.2
> 150109 19:44:38 5383 Manager: manager.0:25@METAMANAGER removed; lost connection
>
>
> Changing the configuration for the data server cms from "all.manager
> HOSTNAME:3121" to "all.manager localhost:3121" allows the local data
> server process to log in:
>
> 150109 20:08:30 5507 XrdInet: Accepted connection from 21@localhost
> 150109 20:08:30 5506 XrdSched: running main accept inq=0
> 150109 20:08:30 5507 XrdProtocol: matched protocol cmsd
> 150109 20:08:30 5507 ?:21@localhost XrdPoll: FD 21 attached to poller 1; num=1
> 150109 20:08:30 5507 Add Reconnect server.5526:20@localhost:1094 to
> cluster anon-s 3121localhost slot 0.3 (nodecnt=1 supn=1)
> 150109 20:08:30 5507 Update Counts Parm1=1 Parm2=1
> 150109 20:08:30 5507 Admit localhost TSpace=2513GB NumFS=1
> FSpace=2573624MB MinFR=51473 MB Util=1 Share=100 TZone=-5
> 150109 20:08:30 5507 Admit localhost adding path: ws /store
> 150109 20:08:30 5507 Admit localhost adding path: ws /prod
> 150109 20:08:30 5507 server.5526:20@localhost:1094 do_Space: 2573624MB
> free; 1% util
> 150109 20:08:30 5507 Protocol: Primary server.5526:20@localhost:1094 logged in.
>
> So it seems you identified the problem as a routing problem, and I
> should probably focus first on getting the local data server process
> to register with the local manager using the hostname of the machine
> instead of localhost...
> But how could this be solved?
>
>
> Cheers,
>   Marcus
> --



-- 
---------------------------------------------------------------------
Marcus Ebert
SLAC National Accelerator Laboratory
Mail Stop 34
2575 Sand Hill Road
Menlo Park, CA, 94025
USA
------
office: ROB 129
Tel.  : +1 (650) 926-3565
=====================================================================
      Please, don't send Word, Excel or PowerPoint attachments -
            use pdf, plain text or HTML format instead.
   See also http://www.gnu.org/philosophy/no-word-attachments.html
---------------------------------------------------------------------
