Hi Tapas,

I have submitted a set of patches that should solve the supervisor issues 
that you have encountered. It would be extremely helpful if you could test 
the patches. In your case, that means installing the "git head" RPM on 
your top-level redirector (the one that redirects servers to supervisors). 
No need to change anything else (assuming you run fewer than 64 supervisors, 
which I think is the case). You can find the relevant RPMs at:

https://teamcity-dss.cern.ch:8443/viewLog.html?buildId=45007&tab=artifacts&buildTypeId=bt45

Simply pick the platform and install the relevant RPMs (i.e., the server 
set of RPMs). You will likely need to uninstall the existing RPMs before 
installing the new set. Alternatively, if it's easy to set up a test 
redirector, testing there might be easier.
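If you want to confirm the fewer-than-64-supervisors assumption rather than rely on memory, one rough check is to count the distinct supervisor hosts that the top-level manager's cmsd has added. The Python sketch below keys on lines of the form `Add supervisor.19907:18@s15n01:31094 to cluster ...`, as they appear in the manager log quoted in this thread. That pattern is an assumption inferred from this single excerpt, not a documented log format, and the names `supervisor_hosts` and `under_limit` are hypothetical.

```python
import re

# Assumed pattern, inferred from the manager-side cmsd log excerpt in this
# thread, e.g.:
#   ... Add supervisor.19907:18@s15n01:31094 to cluster anon-u ...
# Group 1 captures the supervisor host name (here "s15n01").
ADD_SUPERVISOR = re.compile(r"Add supervisor\.\S+@([^:\s]+):\d+ to cluster")

# Limit taken from the "fewer than 64 supervisors" remark above.
MAX_SUPERVISORS = 64


def supervisor_hosts(lines):
    """Return the set of distinct supervisor hosts seen in the log lines."""
    hosts = set()
    for line in lines:
        match = ADD_SUPERVISOR.search(line)
        if match:
            hosts.add(match.group(1))
    return hosts


def under_limit(lines, limit=MAX_SUPERVISORS):
    """True if fewer than `limit` distinct supervisors were added."""
    return len(supervisor_hosts(lines)) < limit


# Two lines copied from the manager log in this thread; only the second one
# ("Add ... to cluster") should match, the "AddNode" line should not.
sample = [
    "150410 11:52:15 384 AddNode srv supervisor.19907:18@s15n01:31094 "
    "cluster 1213cmsxrootd.hep.wisc.edu mask=1 anum=0",
    "150410 11:52:15 384 Add supervisor.19907:18@s15n01:31094 to cluster "
    "anon-u 1213cmsxrootd.hep.wisc.edu slot 0.2 (nodecnt=1 supn=1)",
]
```

To run it on a real log, pass an open file object, e.g. `supervisor_hosts(open(path))`, where `path` is wherever your cmsd log lives. A set is used so that a supervisor appearing in more than one "Add" line is counted only once.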

Andy

  On Fri, 10 Apr 2015, Tapas Sarangi wrote:

> @abh3
> The suggested mitigation of `-I v4` in `xrootd` and `cmsd` for the supervisors and the manager doesn't seem to be working...
>
> Here are cmsd logs from manager:
>
> ```
> 150410 11:52:13 364 Protocol: redirector.529:17@cmsxrootd logged in.
> 150410 11:52:13 364 Admit_Redirector redirector.529:17@cmsxrootd assigned slot 1
> 150410 11:52:15 384 AddNode srv supervisor.19907:18@s15n01:31094 cluster 1213cmsxrootd.hep.wisc.edu mask=1 anum=0
> 150410 11:52:15 384 Add supervisor.19907:18@s15n01:31094 to cluster anon-u 1213cmsxrootd.hep.wisc.edu slot 0.2 (nodecnt=1 supn=1)
> 150410 11:52:15 384 Update Counts Parm1=0 Parm2=0
> 150410 11:52:15 384 Admit s15n01 TSpace=1GB NumFS=0 FSpace=0MB MinFR=0 MB Util=0 Share=100 TZone=-6
> 150410 11:52:15 384 Admit s15n01 adding path: w /
> 150410 11:52:15 384 supervisor.19907:18@s15n01:31094 do_Space: 0MB free; 0% util
> 150410 11:52:15 384 Protocol: Primary supervisor.19907:18@s15n01:31094 logged in suspended.
> =====> Routing for s15n01.hep.wisc.edu: local pub4 prv4 pub6 prv6
> =====> Route all4: s15n01.hep.wisc.edu Dest=[::144.92.181.127]:31094
> =====> Route all6: s15n01.hep.wisc.edu Dest=[2607:f388:101c:1000::335]:31094
> 150410 11:52:23 362 Update Stage Parm1=-1 Parm2=0
> 150410 11:52:23 362 Update Active Parm1=-1 Parm2=0
> 150410 11:52:23 362 Config: manager service enabled.
> 150410 11:52:23 381 State: Status changed to suspended + nostaging
> 150410 11:52:23 381 Send status to redirector.529:17@cmsxrootd
> 150410 11:52:25 384 Dispatch supervisor.19907:18@s15n01:31094 for status dlen=0
> 150410 11:52:25 384 supervisor.19907:18@s15n01:31094 do_Status: suspend nostage
> 150410 11:54:34 364 Dispatch redirector.529:17@cmsxrootd for update dlen=0
> 150410 11:55:27 364 Dispatch redirector.529:17@cmsxrootd for update dlen=0
> 150410 11:56:06 364 Dispatch redirector.529:17@cmsxrootd for update dlen=0
> 150410 11:56:41 364 Dispatch redirector.529:17@cmsxrootd for update dlen=0
> 150410 11:57:11 364 Dispatch redirector.529:17@cmsxrootd for update dlen=0
> 150410 11:58:05 364 Dispatch redirector.529:17@cmsxrootd for update dlen=0
> 150410 11:58:53 364 Dispatch redirector.529:17@cmsxrootd for update dlen=0
> 150410 11:59:27 364 Dispatch redirector.529:17@cmsxrootd for update dlen=0
> 150410 12:00:06 364 Dispatch redirector.529:17@cmsxrootd for update dlen=0
> 150410 12:00:40 364 Dispatch redirector.529:17@cmsxrootd for update dlen=0
> 150410 12:01:15 364 Dispatch redirector.529:17@cmsxrootd for update dlen=0
> 150410 12:01:46 364 Dispatch redirector.529:17@cmsxrootd for update dlen=0
> 150410 12:02:13 384 Dispatch supervisor.19907:18@s15n01:31094 for load dlen=12
> 150410 12:02:13 384 supervisor.19907:18@s15n01:31094 do_Load: cpu=0 net=0 xeq=0 mem=0 pag=0 dsk=0% 0MB load=0 mass=0
> ```
>
> Logs from supervisor:
> ```
> ------ cmsd [log in to unmask]:49972 initialization completed.
> 150410 11:52:15 19927 Inet: Accepted connection from 7@localhost
> 150410 11:52:15 20090 Admin_Login initial request: 'login p 20073 port 31094'
> 150410 11:52:15 20090 Update FrontEnd Parm1=1 Parm2=31094
> 150410 11:52:15 20090 do_Login:: Primary server 20073 logged in; data port is 31094
> 150410 11:52:15 19927 Protocol: redirector.20073:7@localhost logged in.
> 150410 11:52:15 19927 Admit_Redirector redirector.20073:7@localhost assigned slot 1
> 150410 11:52:15 19930 Pander supervisor services to cmsxrootd.hep.wisc.edu:1213
> 150410 11:52:15 19930 Pander trying to connect to lvl 0 cmsxrootd.hep.wisc.edu:1213
> 150410 11:52:15 19930 Add cmsxrootd.hep.wisc.edu to manager config; id=0
> 150410 11:52:15 19930 ManTree: Now connected to 1 root node(s)
> 150410 11:52:15 19930 Protocol: Logged into cmsxrootd
> 150410 11:52:25 19911 Update Stage Parm1=-1 Parm2=0
> 150410 11:52:25 19911 Update Active Parm1=-1 Parm2=0
> 150410 11:52:25 19911 Config: supervisor service enabled.
> 150410 11:52:25 20091 State: Status changed to suspended + nostaging
> 150410 11:52:25 20091 Inform cmsxrootd.hep.wisc.edu status
> 150410 12:02:13 19930 Dispatch manager.0:21@cmsxrootd for usage dlen=0
> 150410 12:02:13 19930 Report_Usage cpu=0 net=0 xeq=0 mem=0 pag=0 dsk=0 0
> 150410 12:12:13 19930 Dispatch manager.0:21@cmsxrootd for usage dlen=0
> 150410 12:12:13 19930 Report_Usage cpu=0 net=0 xeq=0 mem=0 pag=0 dsk=0 0
> ```
>
> logs from one of the data servers:
>
> ```
> 150410 12:13:43 4656 Pander trying to connect to lvl 0 cmsxrootd.hep.wisc.edu:1213
> 150410 12:13:43 4656 XrdOpen: Unable to connect socket to cmsxrootd.hep.wisc.edu; connection refused
> 150410 12:13:46 4656 Pander trying to connect to lvl 1 [2607:42672
> 150410 12:13:52 4656 Pander trying to connect to lvl 0 cmsxrootd.hep.wisc.edu:1213
> 150410 12:13:55 4656 Pander trying to connect to lvl 1 [2607:42672
> 150410 12:14:01 4656 Pander trying to connect to lvl 0 cmsxrootd.hep.wisc.edu:1213
> 150410 12:14:04 4656 Pander trying to connect to lvl 1 [2607:42672
> 150410 12:14:10 4656 Pander trying to connect to lvl 0 cmsxrootd.hep.wisc.edu:1213
> 150410 12:14:13 4656 Pander trying to connect to lvl 1 [2607:42672
> 150410 12:14:13 4656 XrdOpen: Unable to create socket for ' [2607 '; invalid IPv6 address
> ```
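The data server's log shows it trying to reach `[2607:42672`, which appears to be a bracketed IPv6 literal with no closing `]`, followed by "invalid IPv6 address". As an illustration of why such a target cannot be used, here is a minimal Python sketch that splits a `host:port` or `[ipv6]:port` string and validates it. It is not xrootd's own parsing code, which is not shown in this thread; the function name `split_host_port` is hypothetical.

```python
import ipaddress


def split_host_port(target):
    """Split 'host:port' or '[ipv6]:port' into (host, port).

    Raises ValueError on malformed input, such as an unterminated '['
    like the '[2607:42672' target in the data-server log above.
    """
    target = target.strip()
    if target.startswith("["):
        end = target.find("]")
        if end == -1:
            raise ValueError(f"unterminated IPv6 literal: {target!r}")
        host = target[1:end]
        rest = target[end + 1:]
        if not rest.startswith(":"):
            raise ValueError(f"missing port after IPv6 literal: {target!r}")
        # Raises ipaddress.AddressValueError (a ValueError subclass)
        # if the bracketed text is not a valid IPv6 address.
        ipaddress.IPv6Address(host)
        port_text = rest[1:]
    else:
        host, sep, port_text = target.rpartition(":")
        if not sep or not host:
            raise ValueError(f"missing port: {target!r}")
        if ":" in host:
            # Unbracketed IPv6 is ambiguous ("2607::1:80"), so reject it.
            raise ValueError(f"IPv6 literal must be bracketed: {target!r}")
    port = int(port_text)  # ValueError if the port is not an integer
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return host, port
```

With this approach, the well-formed targets from the manager's routing lines, such as `[2607:f388:101c:1000::335]:31094`, split cleanly, while `[2607:42672` and ` [2607 ` are rejected before any socket is created.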
>
> ---
> Reply to this email directly or view it on GitHub:
> https://github.com/xrootd/xrootd/issues/227#issuecomment-91626234


---
Reply to this email directly or view it on GitHub:
https://github.com/xrootd/xrootd/issues/227#issuecomment-92688157

########################################################################
Use REPLY-ALL to reply to list

To unsubscribe from the XROOTD-DEV list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1