Hi Tommaso,

You mentioned that the fnal.gov addresses are worker nodes. Why are they connecting to the cmsd?

Andy

On Fri, 4 Sep 2015, Tommaso Boccali wrote:

> By the way, yesterday I upgraded the eu redir to 423. Seems to work fine,
> even if the statistics cover less than 1 day for the moment....
>
> Tom
> On 04 Sep 2015 01:33 AM, "Gerard Bernabeu" <[log in to unmask]> wrote:
>
>> the fnal.gov address is from a WorkerNode (probably running a CMS job).
>>
>> Gerard
>>
>> On Thu, Sep 3, 2015 at 4:54 PM, Andrew Hanushevsky <[log in to unmask]>
>> wrote:
>>
>>> Hi Tommaso,
>>>
>>> What are fw-nat-inside-outside.gridka.de and cmswn2148.fnal.gov? The
>>> message clearly shows that whatever they sent over was incorrect. Yes,
>>> 4.2.2 would crash in this case, sigh.
>>>
>>> Andy
>>>
>>> On Wed, 26 Aug 2015, Tommaso Boccali wrote:
>>>
>>>> ciao, another piece of info which might be interesting:
>>>>
>>>> I was looking into the bari eu redir, which uses xrootd
>>>>
>>>> xrootd-4.1.1-1.el5
>>>>
>>>> the cmsd.log has TONS of messages like
>>>>
>>>> 150826 05:18:00 30442 XrdInet: Accepted connection from [log in to unmask]
>>>> 150826 05:18:00 30442 ?:[log in to unmask] XrdPoll: FD 90 attached to poller 0; num=23
>>>> 150826 05:18:00 30442 Pup: buffer overrun unpacking short arg 0: ident.
>>>> 150826 05:18:00 30442 Login: fw-nat-inside-outside.gridka.de login failed; invalid login data
>>>> 150826 05:18:00 30442 ?:[log in to unmask] XrdPoll: FD 90 detached from poller 0; num=22
>>>>
>>>> from many servers, most from FNAL
>>>>
>>>> 150826 21:41:28 3396 Login: cmswn2148.fnal.gov login failed; invalid login data
>>>> 150826 21:41:28 3436 Login: cmswn2146.fnal.gov login failed; invalid login data
>>>> 150826 21:41:35 3461 Login: cmswn2131.fnal.gov login failed; invalid login data
>>>> 150826 21:41:36 2475 Login: cmswn2158.fnal.gov login failed; invalid login data
>>>> 150826 21:41:40 3461 Login: cmswn2150.fnal.gov login failed; invalid login data
>>>> 150826 21:41:45 3458 Login: cmswn2160.fnal.gov login failed; invalid login data
>>>> 150826 21:41:47 3396 Login: cmswn2131.fnal.gov login failed; invalid login data
>>>> 150826 21:41:50 3461 Login: cmswn2140.fnal.gov login failed; invalid login data
>>>> 150826 21:41:56 3458 Login: cmswn2147.fnal.gov login failed; invalid login data
>>>>
>>>> apparently, we did not notice since 4.1.1-1 does not crash like 4.2.2 does, but
>>>> moves along ...
>>>>
>>>> tom
>>>>
>>>> On Tue, Aug 25, 2015 at 9:07 PM, Marian Zvada <[log in to unmask]> wrote:
>>>>
>>>>> On 8/25/15 11:58 AM, Tommaso Boccali wrote:
>>>>>
>>>>>> Well, but: isn't the global redir only subscribed to by regional redirs (so
>>>>>> not many)?
>>>>>>
>>>>>
>>>>> you're right, I neglected this fact (outsmarted myself ;))...
>>>>>
>>>>>> Probably eu redirs are the most connected, with close to 64 cmsd
>>>>>> entering... It's just normal we saw the problem there.
>>>>>>
>>>>>
>>>>> ok, this is alarming and we should revise the current setup and introduce more
>>>>> redirectors if needed in EU.
>>>>> Btw, I recently talked with Andy about this - it looks like a much more
>>>>> promising way to handle the 64 limit is to think about supervisors:
>>>>>
>>>>> http://xrootd.org/doc/dev42/cms_config.htm#_Toc405927050
>>>>>
>>>>> I'm going to do this in the transitional federation, where there is one global
>>>>> redirector for all T3s and then those subscribers who will be kicked off
>>>>> from the production federation and subscribed there instead.
>>>>>
>>>>> -Marian
>>>>>
>>>>>> Ifca said it has 336-1, which is fairly common. I guess it cannot be due
>>>>>> to (just) the release....
>>>>>>
>>>>>> Andy, did you understand the source of the bad login data? Is it worth
>>>>>> trying and debugging it?
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>> On 25 Aug 2015 06:21 PM, "Jan Iven" <[log in to unmask]> wrote:
>>>>>>
>>>>>>     On 08/25/2015 05:56 PM, Marian Zvada wrote:
>>>>>>
>>>>>>         Hi Tom,
>>>>>>
>>>>>>         [..]
>>>>>>
>>>>>>         yeah, that is my guess too, but then we have global redirectors
>>>>>>         at CERN running 4.2.2 dealing with a hell of a lot of cmsd
>>>>>>         subscriptions, so I'd expect some visible trouble there as well.
>>>>>>         So maybe we're lucky there too so far... (I believe that
>>>>>>         autorestart of cmsd if it crashes is disabled there, Jan?)
>>>>>>
>>>>>>
>>>>>>     No, the CMS global redirectors are on CC7, and will auto-restart
>>>>>>     cmsd on "unclean" exit (Restart=on-abort). I hope that SEGV counts
>>>>>>     as such...
>>>>>>
>>>>>>     Not sure whether we'd even notice the occasional restart, unless
>>>>>>     another tool (abrt) picks this up.
>>>>>>
>>>>>>     Cheers
>>>>>>     jan
>>>>>>
>>>>>>
>>>>
>>>>
>>>> --
>>>> Tommaso Boccali
>>>> INFN Pisa
>>>>
>>>
>>> ########################################################################
>>> Use REPLY-ALL to reply to list
>>>
>>> To unsubscribe from the XROOTD-L list, click the following link:
>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
>>>
>>
>>
>>
>> --
>> *Gerard Bernabeu Altayó*
>> Deputy Department Head
>>
>> Distributed Computing Services Operations
>> Fermi National Accelerator Laboratory
>> 630 840 6509 office
>> www.fnal.gov
>>
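
For anyone who wants to see which hosts are sending the bad login data, here is a minimal sketch that tallies the "login failed; invalid login data" lines per host. It assumes each message sits on a single line in the format quoted in this thread (date, time, thread id, then "Login: <host> ..."); the function name count_failed_logins and the SAMPLE text are only illustrative, and a real cmsd.log may contain variations this pattern does not cover.

```python
import re
from collections import Counter

# Matches lines shaped like the ones quoted in this thread, e.g.
#   150826 21:41:28 3396 Login: cmswn2148.fnal.gov login failed; invalid login data
# The exact layout is an assumption based on those quoted lines only.
FAILED_LOGIN = re.compile(
    r"^(?P<date>\d{6}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<tid>\d+) "
    r"Login: (?P<host>\S+) login failed; invalid login data"
)


def count_failed_logins(lines):
    """Return a Counter mapping host name to number of failed logins."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.match(line.strip())
        if match:
            counts[match.group("host")] += 1
    return counts


# A few lines copied from the messages above, used here as sample input.
SAMPLE = """\
150826 05:18:00 30442 Login: fw-nat-inside-outside.gridka.de login failed; invalid login data
150826 05:18:00 30442 Pup: buffer overrun unpacking short arg 0: ident.
150826 21:41:28 3396 Login: cmswn2148.fnal.gov login failed; invalid login data
150826 21:41:35 3461 Login: cmswn2131.fnal.gov login failed; invalid login data
150826 21:41:47 3396 Login: cmswn2131.fnal.gov login failed; invalid login data
"""

if __name__ == "__main__":
    # Print hosts with the most failures first.
    for host, n in count_failed_logins(SAMPLE.splitlines()).most_common():
        print(f"{n:4d}  {host}")
```

To run it against a real log, pass an open file instead of SAMPLE.splitlines(), since the function accepts any iterable of lines.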