ciao andrew,
are you sure these are connections from servers and not from clients? I ask since their names are suspicious:

160219 08:53:38 16932 Pup: buffer overrun unpacking short arg 0: ident.
160219 08:53:38 16932 Login: sbgwn20.in2p3.fr login failed; invalid login data
160219 08:57:02 16934 Pup: buffer overrun unpacking short arg 0: ident.
160219 08:57:02 16934 Login: gw1.cis.gov.pl login failed; invalid login data
160219 08:45:08 64858 Pup: buffer overrun unpacking short arg 0: ident.
160219 08:45:08 64858 Login: sbgwn12.in2p3.fr login failed; invalid login data
...
160219 08:56:34 24733 Login: g28n03.hep.wisc.edu login failed; invalid login data
...

they seem to be WN names, not server names .... still, I am asking them

tom

On Thu, Feb 18, 2016 at 10:25 PM, Andrew Hanushevsky wrote:

> Hi Tommaso,
>
> Yes, you should care. These sites are not joining your cluster as they
> cannot login. Could you tell me what version the sites that are getting the
> errors are running?
>
> Andy
>
> *From:* Tommaso Boccali
> *Sent:* Thursday, February 18, 2016 8:12 AM
> *To:* Andrew Hanushevsky
> *Cc:* Gerard Bernabeu ; Marian Zvada ; Jan Iven
> *Subject:* Re: problem in transitioning a redirector from 3.3.6 to 4.2.2
>
> ciao, coming back to this, a few months later (on 4.2.3)
>
> i still see TONS of
>
> 160218 17:08:17 42171 Pup: buffer overrun unpacking short arg 0: ident.
> 160218 17:08:17 42171 Login: gridlink.hephy.oeaw.ac.at login failed; invalid login data
> ...
> 160218 17:07:52 42118 Login: grid-wn080.physik.rwth-aachen.de login failed; invalid login data
> ...
> 160218 17:04:06 40163 Login: fw-nat-inside-outside.gridka.de login failed; invalid login data
> ...
> 160218 16:53:56 25501 Login: wna033.jinr-t1.ru login failed; invalid login data
>
> in cmsd.log
>
> not sure it has any bad effect ... but: should we care?
>
> this is at least 1 Hz, and comes from multiple sites ....
>
> tom
>
> On Fri, Sep 4, 2015 at 7:55 AM, Andrew Hanushevsky wrote:
>
>> Hi Tommaso,
>>
>> You mentioned that the fnal.gov addresses are worker nodes. Why are they
>> connecting to the cmsd?
>>
>> Andy
>>
>> On Fri, 4 Sep 2015, Tommaso Boccali wrote:
>>
>>> By the way, yesterday I upgraded the eu redir to 4.2.3. Seems to work fine,
>>> even if we have less than 1 day of statistics for the moment....
>>>
>>> Tom
>>>
>>> On 04 Sep 2015 01:33 AM, "Gerard Bernabeu" wrote:
>>>
>>>> the fnal.gov address is from a WorkerNode (probably running a CMS job).
>>>>
>>>> Gerard
>>>>
>>>> On Thu, Sep 3, 2015 at 4:54 PM, Andrew Hanushevsky wrote:
>>>>
>>>>> Hi Tommaso,
>>>>>
>>>>> What are fw-nat-inside-outside.gridka.de and cmswn2148.fnal.gov? The
>>>>> message clearly shows that whatever they sent over was incorrect. Yes,
>>>>> 4.2.2 would crash in this case, sigh.
>>>>>
>>>>> Andy
>>>>>
>>>>> On Wed, 26 Aug 2015, Tommaso Boccali wrote:
>>>>>
>>>>>> ciao, another piece of info which might be interesting:
>>>>>>
>>>>>> I was looking into the bari eu redir, which uses xrootd
>>>>>>
>>>>>> xrootd-4.1.1-1.el5
>>>>>>
>>>>>> the cmsd.log has TONS of messages like
>>>>>>
>>>>>> 150826 05:18:00 30442 XrdInet: Accepted connection from [log in to unmask]
>>>>>> 150826 05:18:00 30442 ?:[log in to unmask] XrdPoll: FD 90 attached to poller 0; num=23
>>>>>> 150826 05:18:00 30442 Pup: buffer overrun unpacking short arg 0: ident.
>>>>>> 150826 05:18:00 30442 Login: fw-nat-inside-outside.gridka.de login failed; invalid login data
>>>>>> 150826 05:18:00 30442 ?:[log in to unmask] XrdPoll: FD 90 detached from poller 0; num=22
>>>>>>
>>>>>> from many servers, most from FNAL
>>>>>>
>>>>>> 150826 21:41:28 3396 Login: cmswn2148.fnal.gov login failed; invalid login data
>>>>>> 150826 21:41:28 3436 Login: cmswn2146.fnal.gov login failed; invalid login data
>>>>>> 150826 21:41:35 3461 Login: cmswn2131.fnal.gov login failed; invalid login data
>>>>>> 150826 21:41:36 2475 Login: cmswn2158.fnal.gov login failed; invalid login data
>>>>>> 150826 21:41:40 3461 Login: cmswn2150.fnal.gov login failed; invalid login data
>>>>>> 150826 21:41:45 3458 Login: cmswn2160.fnal.gov login failed; invalid login data
>>>>>> 150826 21:41:47 3396 Login: cmswn2131.fnal.gov login failed; invalid login data
>>>>>> 150826 21:41:50 3461 Login: cmswn2140.fnal.gov login failed; invalid login data
>>>>>> 150826 21:41:56 3458 Login: cmswn2147.fnal.gov login failed; invalid login data
>>>>>>
>>>>>> apparently, we did not notice since 4.1.1-1 does not crash as 4.2.2 does,
>>>>>> but moves along ...
>>>>>>
>>>>>> tom
>>>>>>
>>>>>> On Tue, Aug 25, 2015 at 9:07 PM, Marian Zvada wrote:
>>>>>>
>>>>>>> On 8/25/15 11:58 AM, Tommaso Boccali wrote:
>>>>>>>
>>>>>>>> Well, but: isn't the global redir only subscribed to by regional redirs
>>>>>>>> (so not many)?
>>>>>>>
>>>>>>> you're right, I neglected this fact (outsmarted myself ;))...
>>>>>>>
>>>>>>>> Probably eu redirs are the most connected, with close to 64 cmsd
>>>>>>>> entering... It's just normal we saw the problem there.
>>>>>>>
>>>>>>> ok, this is alarming and we should revise the current setup and introduce
>>>>>>> more redirectors if needed in EU. Btw, I recently talked with Andy about
>>>>>>> this - a much more promising way to handle the 64 limit looks to be
>>>>>>> supervisors:
>>>>>>>
>>>>>>> http://xrootd.org/doc/dev42/cms_config.htm#_Toc405927050
>>>>>>>
>>>>>>> I'm going to do this in the transitional federation, where there is one
>>>>>>> global redirector for all T3s plus those subscribers who will be kicked
>>>>>>> off the production federation and subscribed there instead.
>>>>>>>
>>>>>>> -Marian
>>>>>>>
>>>>>>>> Ifca said it has 3.3.6-1, which is fairly common. I guess it cannot be
>>>>>>>> due to (just) the release....
>>>>>>>>
>>>>>>>> Andy, did you understand the source of the bad login data? Is it worth
>>>>>>>> trying to debug it?
>>>>>>>>
>>>>>>>> Tom
>>>>>>>>
>>>>>>>> On 25 Aug 2015 06:21 PM, "Jan Iven" wrote:
>>>>>>>>
>>>>>>>>     On 08/25/2015 05:56 PM, Marian Zvada wrote:
>>>>>>>>
>>>>>>>>         Hi Tom,
>>>>>>>>
>>>>>>>>         [..]
>>>>>>>>
>>>>>>>>         yeah, that is my guess too, but then we have global redirectors
>>>>>>>>         at CERN running 4.2.2 dealing with a hell of a lot of cmsd
>>>>>>>>         subscriptions, so I'd expect some visible trouble there as well.
>>>>>>>>         So maybe we're lucky there too so far... (I believe that
>>>>>>>>         autorestart of cmsd if it crashes is disabled there, Jan?)
>>>>>>>>
>>>>>>>>     No, the CMS global redirectors are on CC7, and will auto-restart
>>>>>>>>     cmsd on "unclean" exit (Restart=on-abort). I hope that SEGV counts
>>>>>>>>     as such...
>>>>>>>>
>>>>>>>>     Not sure whether we'd even notice the occasional restart, unless
>>>>>>>>     another tool (abrt) picks this up.
>>>>>>>>
>>>>>>>>     Cheers
>>>>>>>>     jan
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Tommaso Boccali
>>>>>> INFN Pisa
>>>>>>
>>>>>
>>>>> ########################################################################
>>>>> Use REPLY-ALL to reply to list
>>>>>
>>>>> To unsubscribe from the XROOTD-L list, click the following link:
>>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
>>>>>
>>>>
>>>> --
>>>> *Gerard Bernabeu Altayó*
>>>> Deputy Department Head
>>>>
>>>> Distributed Computing Services Operations
>>>> Fermi National Accelerator Laboratory
>>>> 630 840 6509 office
>>>> www.fnal.gov
>>>>
>
> --
> Tommaso Boccali
> INFN Pisa
>

--
Tommaso Boccali
INFN Pisa
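
To check which hosts and sites the failed logins come from (worker nodes or real servers), the cmsd.log lines quoted in this thread could be tallied with something like the Python sketch below. It is only a sketch: the regular expression is inferred from the few log lines pasted above, and the function name `count_failed_logins` and the rule "everything after the first dot is the site" are assumptions made for illustration, not anything provided by xrootd.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this thread, e.g.
# 160219 08:53:38 16932 Login: sbgwn20.in2p3.fr login failed; invalid login data
FAILED_LOGIN = re.compile(
    r"^\d{6} \d{2}:\d{2}:\d{2} \d+ Login: (?P<host>\S+) "
    r"login failed; invalid login data"
)


def count_failed_logins(lines):
    """Return (per-host Counter, per-domain Counter) of failed cmsd logins."""
    hosts = Counter()
    domains = Counter()
    for line in lines:
        match = FAILED_LOGIN.match(line.strip())
        if not match:
            continue  # skip Pup/XrdPoll/other lines
        host = match.group("host")
        hosts[host] += 1
        # Assumption: treat everything after the first dot as the "site".
        domains[host.split(".", 1)[-1]] += 1
    return hosts, domains


# Sample taken from the log excerpt at the top of the thread.
sample = [
    "160219 08:53:38 16932 Pup: buffer overrun unpacking short arg 0: ident.",
    "160219 08:53:38 16932 Login: sbgwn20.in2p3.fr login failed; invalid login data",
    "160219 08:57:02 16934 Pup: buffer overrun unpacking short arg 0: ident.",
    "160219 08:57:02 16934 Login: gw1.cis.gov.pl login failed; invalid login data",
    "160219 08:45:08 64858 Login: sbgwn12.in2p3.fr login failed; invalid login data",
    "160219 08:56:34 24733 Login: g28n03.hep.wisc.edu login failed; invalid login data",
]

hosts, domains = count_failed_logins(sample)
for domain, n in domains.most_common():
    print(domain, n)
```

On a real log, one would pass the open file object instead of `sample`, for example `count_failed_logins(open("cmsd.log"))`; the per-domain counts then give a quick list of sites to contact.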