Hi Tommaso,
You mentioned that the fnal.gov addresses are worker nodes. Why are they connecting to the cmsd?
Andy
On Fri, 4 Sep 2015, Tommaso Boccali wrote:
By the way, yesterday I upgraded the EU redir to 4.2.3. It seems to work fine,
although we have less than one day of statistics for the moment...
Tom
On 04 Sep 2015 01:33 AM, "Gerard Bernabeu" <[log in to unmask]> wrote:
The fnal.gov address is from a WorkerNode (probably running a CMS job).
Gerard
On Thu, Sep 3, 2015 at 4:54 PM, Andrew Hanushevsky <[log in to unmask]>
wrote:
Hi Tommaso,
What are fw-nat-inside-outside.gridka.de and cmswn2148.fnal.gov? The
message clearly shows that whatever they sent over was incorrect. Yes,
4.2.2 would crash in this case, sigh.
Andy
On Wed, 26 Aug 2015, Tommaso Boccali wrote:
Ciao, another piece of info which might be interesting:
I was looking into the Bari EU redir, which uses xrootd-4.1.1-1.el5.
The cmsd.log has TONS of messages like
150826 05:18:00 30442 XrdInet: Accepted connection from [log in to unmask]
150826 05:18:00 30442 ?:[log in to unmask] XrdPoll: FD attached to poller 0; num=23
150826 05:18:00 30442 Pup: buffer overrun unpacking short arg 0: ident.
150826 05:18:00 30442 Login: fw-nat-inside-outside.gridka.de login failed; invalid login data
150826 05:18:00 30442 ?:[log in to unmask] XrdPoll: FD detached from poller 0; num=22
from many servers, most of them at FNAL:
150826 21:41:28 3396 Login: cmswn2148.fnal.gov login failed; invalid login data
150826 21:41:28 3436 Login: cmswn2146.fnal.gov login failed; invalid login data
150826 21:41:35 3461 Login: cmswn2131.fnal.gov login failed; invalid login data
150826 21:41:36 2475 Login: cmswn2158.fnal.gov login failed; invalid login data
150826 21:41:40 3461 Login: cmswn2150.fnal.gov login failed; invalid login data
150826 21:41:45 3458 Login: cmswn2160.fnal.gov login failed; invalid login data
150826 21:41:47 3396 Login: cmswn2131.fnal.gov login failed; invalid login data
150826 21:41:50 3461 Login: cmswn2140.fnal.gov login failed; invalid login data
150826 21:41:56 3458 Login: cmswn2147.fnal.gov login failed; invalid login data
Apparently we did not notice because 4.1.1-1 does not crash as 4.2.2 does, but
moves along...
tom
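To see which hosts are responsible for the failed logins quoted above, the cmsd.log lines can be tallied per host. The following is only a minimal sketch: it assumes each message sits on a single line in the real log (the wrapping above comes from the mail client), and the names `FAILED_LOGIN` and `count_failed_logins` are hypothetical helpers, not part of xrootd.

```python
import re
from collections import Counter

# Matches lines of the form seen in the thread, e.g.
#   150826 21:41:28 3396 Login: cmswn2148.fnal.gov login failed; invalid login data
# Assumption: one message per line, fields separated by single spaces.
FAILED_LOGIN = re.compile(
    r"^(?P<date>\d{6}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<tid>\d+) "
    r"Login: (?P<host>\S+) login failed; invalid login data"
)


def count_failed_logins(lines):
    """Return a Counter mapping host name to its number of failed logins."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.match(line.strip())
        if match:
            counts[match.group("host")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "150826 21:41:28 3396 Login: cmswn2148.fnal.gov login failed; invalid login data",
        "150826 21:41:35 3461 Login: cmswn2131.fnal.gov login failed; invalid login data",
        "150826 21:41:47 3396 Login: cmswn2131.fnal.gov login failed; invalid login data",
        "150826 05:18:00 30442 Pup: buffer overrun unpacking short arg 0: ident.",
    ]
    # Print hosts ordered by number of failures, most frequent first.
    for host, n in count_failed_logins(sample).most_common():
        print(host, n)
```

For a real log, passing an open file object (for example `open("cmsd.log")`, path assumed) instead of `sample` gives the same per-host tally.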
On Tue, Aug 25, 2015 at 9:07 PM, Marian Zvada <[log in to unmask]>
wrote:
On 8/25/15 11:58 AM, Tommaso Boccali wrote:
Well, but: isn't the global redir only subscribed to by regional redirs
(so not many)?
you're right, I neglected this fact (outsmarted myself ;))...
Probably the EU redirs are the most connected, with close to 64 cmsd
entering... It's just normal we saw the problem there.
ok, this is alarming and we should revise the current setup and introduce
more redirectors if needed in EU. Btw, I recently talked with Andy about
this - it looks a much more promising way to handle the 64 limits - to think about
supervisors:
http://xrootd.org/doc/dev42/cms_config.htm#_Toc405927050
I'm going to do this in the transitional federation where there is one
redirector for all T3s and then those subscribers who will be kicked
off from the production federation and subscribed there instead.
-Marian
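As a rough illustration of why supervisors help with the limit of 64 mentioned above, the arithmetic can be sketched as follows. This is a back-of-the-envelope sketch only: it assumes that every redirector or supervisor accepts at most 64 direct subscribers and that supervisors are stacked in layers until the top level fits; `min_supervisors` is a hypothetical helper, not an xrootd tool.

```python
def min_supervisors(n_subscribers, fanout=64):
    """Minimum number of intermediate nodes needed so that no node
    has more than `fanout` direct subscribers.

    Assumption: nodes are grouped layer by layer, each group of up to
    `fanout` nodes reporting to one supervisor, until the remaining
    count fits under a single redirector.
    """
    total = 0
    count = n_subscribers
    while count > fanout:
        # Ceiling division: supervisors needed for this layer.
        count = (count + fanout - 1) // fanout
        total += count
    return total


if __name__ == "__main__":
    for n in (60, 64, 65, 200, 4096, 4097):
        print(n, "subscribers ->", min_supervisors(n), "supervisors")
```

Under these assumptions a redirector sitting right at 64 subscribers needs no supervisor, while a single extra subscriber already calls for two.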
IFCA said it has 3.3.6-1, which is fairly common. I guess it cannot be
due to (just) the release....
Andy, did you understand the source of the bad login data? Is it worth
trying to debug it?
Tom
On 25 Aug 2015 06:21 PM, "Jan Iven" <[log in to unmask]> wrote:
On 08/25/2015 05:56 PM, Marian Zvada wrote:
Hi Tom,
[..]
yeah, that is my guess too, but then we have global redirectors at CERN
running 4.2.2 dealing with a hell of a lot of cmsd subscriptions so I'd expect
some visible trouble there as well. So maybe we're lucky there too so
far... (I believe that autorestart of cmsd if it crashes is disabled
there, Jan?)
No, the CMS global redirectors are on CC7, and will auto-restart
cmsd on "unclean" exit (Restart=on-abort). I hope that SEGV
counts as such...
Not sure whether we'd even notice the occasional restart, unless
another tool (abrt) picks this up.
Cheers
jan
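On the question of whether a SEGV counts as an "unclean" exit: as far as the systemd documentation describes it, Restart=on-abort restarts a service only when its process is terminated by an uncaught signal that is not listed as a clean exit status, and a segmentation fault is such a termination. The sketch below does not involve cmsd or systemd at all; it only shows, on a POSIX system, how a process killed by SIGSEGV differs from one that exits normally. `run_and_classify` is a hypothetical helper.

```python
import signal
import subprocess
import sys


def run_and_classify(argv):
    """Run a command and report whether it exited or was killed by a signal."""
    proc = subprocess.run(argv)
    if proc.returncode < 0:
        # On POSIX, a negative return code means termination by that signal.
        sig = signal.Signals(-proc.returncode)
        return f"killed by {sig.name}"
    return f"exited with status {proc.returncode}"


# Child process that sends SIGSEGV to itself, standing in for a crash.
CRASH = [sys.executable, "-c",
         "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
# Child process that exits cleanly.
CLEAN = [sys.executable, "-c", "raise SystemExit(0)"]

if __name__ == "__main__":
    print(run_and_classify(CRASH))
    print(run_and_classify(CLEAN))
```

Whether anyone notices such a restart is a separate matter, as noted above.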
--
Tommaso Boccali
INFN Pisa
########################################################################
Use REPLY-ALL to reply to list
To unsubscribe from the XROOTD-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
--
Deputy Department Head
Distributed Computing Services Operations
Fermi National Accelerator Laboratory
630 840 6509 office
www.fnal.gov