ciao,

coming back to this a few months later (now on 4.2.3), I still see TONS of

160218 17:08:17 42171 Pup: buffer overrun unpacking short arg 0: ident.
160218 17:08:17 42171 Login: gridlink.hephy.oeaw.ac.at login failed; invalid login data
...
160218 17:07:52 42118 Login: grid-wn080.physik.rwth-aachen.de login failed; invalid login data
...
160218 17:04:06 40163 Login: fw-nat-inside-outside.gridka.de login failed; invalid login data
...
160218 16:53:56 25501 Login: wna033.jinr-t1.ru login failed; invalid login data

in cmsd.log.

I am not sure it has any bad effect ... but: should we care? This happens at
a rate of at least 1 Hz, and comes from multiple sites ....

tom

On Fri, Sep 4, 2015 at 7:55 AM, Andrew Hanushevsky <[log in to unmask]> wrote:

> Hi Tommaso,
>
> You mentioned that the fnal.gov addresses are worker nodes. Why are they
> connecting to the cmsd?
>
> Andy
>
>
> On Fri, 4 Sep 2015, Tommaso Boccali wrote:
>
>> By the way, yesterday I upgraded the eu redir to 4.2.3. Seems to work fine,
>> even if the statistics cover less than 1 day for the moment....
>>
>> Tom
>> On 04 Sep 2015 01:33 AM, "Gerard Bernabeu" <[log in to unmask]> wrote:
>>
>>> the fnal.gov address is from a WorkerNode (probably running a CMS job).
>>>
>>> Gerard
>>>
>>> On Thu, Sep 3, 2015 at 4:54 PM, Andrew Hanushevsky <[log in to unmask]>
>>> wrote:
>>>
>>>> Hi Tommaso,
>>>>
>>>> What are fw-nat-inside-outside.gridka.de and cmswn2148.fnal.gov? The
>>>> message clearly shows that whatever they sent over was incorrect. Yes,
>>>> 4.2.2 would crash in this case, sigh.
>>>>
>>>> Andy
>>>>
>>>> On Wed, 26 Aug 2015, Tommaso Boccali wrote:
>>>>
>>>>> ciao, another piece of info which might be interesting:
>>>>>
>>>>> I was looking into the bari eu redir, which uses xrootd
>>>>>
>>>>> xrootd-4.1.1-1.el5
>>>>>
>>>>> the cmsd.log has TONS of messages like
>>>>>
>>>>> 150826 05:18:00 30442 XrdInet: Accepted connection from [log in to unmask]
>>>>> 150826 05:18:00 30442 ?:[log in to unmask] XrdPoll: FD 90 attached to poller 0; num=23
>>>>> 150826 05:18:00 30442 Pup: buffer overrun unpacking short arg 0: ident.
>>>>> 150826 05:18:00 30442 Login: fw-nat-inside-outside.gridka.de login failed; invalid login data
>>>>> 150826 05:18:00 30442 ?:[log in to unmask] XrdPoll: FD 90 detached from poller 0; num=22
>>>>>
>>>>> from many servers, most from FNAL
>>>>>
>>>>> 150826 21:41:28 3396 Login: cmswn2148.fnal.gov login failed; invalid login data
>>>>> 150826 21:41:28 3436 Login: cmswn2146.fnal.gov login failed; invalid login data
>>>>> 150826 21:41:35 3461 Login: cmswn2131.fnal.gov login failed; invalid login data
>>>>> 150826 21:41:36 2475 Login: cmswn2158.fnal.gov login failed; invalid login data
>>>>> 150826 21:41:40 3461 Login: cmswn2150.fnal.gov login failed; invalid login data
>>>>> 150826 21:41:45 3458 Login: cmswn2160.fnal.gov login failed; invalid login data
>>>>> 150826 21:41:47 3396 Login: cmswn2131.fnal.gov login failed; invalid login data
>>>>> 150826 21:41:50 3461 Login: cmswn2140.fnal.gov login failed; invalid login data
>>>>> 150826 21:41:56 3458 Login: cmswn2147.fnal.gov login failed; invalid login data
>>>>>
>>>>> apparently, we did not notice since 4.1.1-1 does not crash as 4.2.2,
>>>>> but moves along ...
>>>>>
>>>>> tom
>>>>>
>>>>> On Tue, Aug 25, 2015 at 9:07 PM, Marian Zvada <[log in to unmask]> wrote:
>>>>>
>>>>>> On 8/25/15 11:58 AM, Tommaso Boccali wrote:
>>>>>>
>>>>>>> Well, but: isn't the global redir only subscribed by regional redirs
>>>>>>> (so not many)?
>>>>>>
>>>>>> you're right, I neglected this fact (outsmarted myself ;))...
>>>>>>
>>>>>>> Probably eu redirs are the most connected, with close to 64 cmsd
>>>>>>> entering... It's just normal we saw the problem there.
>>>>>>
>>>>>> ok, this is alarming and we should revise the current setup and introduce
>>>>>> more redirectors if needed in EU. Btw, I recently talked with Andy about
>>>>>> this - it looks like a much more promising way to handle the 64 limit -
>>>>>> to think about supervisors:
>>>>>>
>>>>>> http://xrootd.org/doc/dev42/cms_config.htm#_Toc405927050
>>>>>>
>>>>>> I'm going to do this in the transitional federation, where there is one
>>>>>> global redirector for all T3s and then those subscribers who will be
>>>>>> kicked off from the production federation and subscribed there instead.
>>>>>>
>>>>>> -Marian
>>>>>>
>>>>>>> Ifca said it has 336-1, which is fairly common. I guess it cannot be
>>>>>>> due to (just) the release....
>>>>>>>
>>>>>>> Andy, did you understand the source of the bad login data? Is it
>>>>>>> worth trying and debugging it?
>>>>>>>
>>>>>>> Tom
>>>>>>>
>>>>>>> On 25 Aug 2015 06:21 PM, "Jan Iven" <[log in to unmask]> wrote:
>>>>>>>
>>>>>>>> On 08/25/2015 05:56 PM, Marian Zvada wrote:
>>>>>>>>
>>>>>>>>> Hi Tom,
>>>>>>>>>
>>>>>>>>> [..]
>>>>>>>>>
>>>>>>>>> yeah, that is my guess too, but then we have global redirectors
>>>>>>>>> at CERN running 4.2.2 dealing with a hell of a lot of cmsd
>>>>>>>>> subscriptions so I'd expect some visible trouble there as well.
>>>>>>>>> So maybe we're lucky there too so far... (I believe that autorestart
>>>>>>>>> of cmsd if it crashes is disabled there, Jan?)
>>>>>>>>
>>>>>>>> No, the CMS global redirectors are on CC7, and will auto-restart
>>>>>>>> cmsd on "unclean" exit (Restart=on-abort). I hope that SEGV
>>>>>>>> counts as such...
>>>>>>>>
>>>>>>>> Not sure whether we'd even notice the occasional restart, unless
>>>>>>>> another tool (abrt) picks this up.
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>> jan
>>>>>
>>>>> --
>>>>> Tommaso Boccali
>>>>> INFN Pisa
>>>>
>>>> ########################################################################
>>>> Use REPLY-ALL to reply to list
>>>>
>>>> To unsubscribe from the XROOTD-L list, click the following link:
>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
>>>
>>> --
>>> *Gerard Bernabeu Altayó*
>>> Deputy Department Head
>>>
>>> Distributed Computing Services Operations
>>> Fermi National Accelerator Laboratory
>>> 630 840 6509 office
>>> www.fnal.gov

--
Tommaso Boccali
INFN Pisa
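
To put a number on the "at least 1 Hz, from multiple sites" estimate at the top of this thread, the failures in cmsd.log can be tallied per host. What follows is only a minimal sketch in Python, assuming every log line has exactly the layout of the excerpts quoted in this thread (date as YYMMDD, time, a numeric id, then "Login: <host> login failed; invalid login data"). The function name tally_failed_logins and the grouping by the last two hostname labels are illustrative choices, not anything provided by xrootd.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout, copied from the excerpts in this thread:
#   160218 17:08:17 42171 Login: <host> login failed; invalid login data
FAILED_LOGIN = re.compile(
    r"^(\d{6} \d{2}:\d{2}:\d{2}) \d+ Login: (\S+) login failed; invalid login data"
)


def tally_failed_logins(lines):
    """Return (per-host counts, per-domain counts, mean rate in Hz or None)."""
    per_host = Counter()
    per_domain = Counter()
    first = last = None
    for line in lines:
        m = FAILED_LOGIN.match(line.strip())
        if m is None:
            continue  # e.g. the "Pup: buffer overrun" or XrdPoll lines
        stamp = datetime.strptime(m.group(1), "%y%m%d %H:%M:%S")
        first = stamp if first is None else min(first, stamp)
        last = stamp if last is None else max(last, stamp)
        host = m.group(2)
        per_host[host] += 1
        # Crude site grouping: keep only the last two labels of the hostname.
        per_domain[".".join(host.split(".")[-2:])] += 1
    total = sum(per_host.values())
    span = (last - first).total_seconds() if total > 1 else 0.0
    rate_hz = total / span if span > 0 else None
    return per_host, per_domain, rate_hz
```

Passing it an open file handle on cmsd.log (the path depends on the local installation) returns the two counters plus the mean failure rate over the time span covered by the matching lines. The rate is None when fewer than two distinct timestamps are present.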