Hi Tommaso,

Yes, you should care. These sites are not joining your cluster because they cannot log in. Could you tell me what version the sites that are getting the errors are running?

Andy

From: Tommaso Boccali 
Sent: Thursday, February 18, 2016 8:12 AM
To: Andrew Hanushevsky 
Cc: Gerard Bernabeu ; [log in to unmask] ; Marian Zvada ; Jan Iven 
Subject: Re: problem in transitioning a redirector from 3.3.6 to 4.2.2

ciao, coming back to this, a few months later (on 4.2.3) 

i still see TONS of

160218 17:08:17 42171 Pup: buffer overrun unpacking short arg 0: ident.
160218 17:08:17 42171 Login: gridlink.hephy.oeaw.ac.at login failed; invalid login data
...
160218 17:07:52 42118 Login: grid-wn080.physik.rwth-aachen.de login failed; invalid login data

...
160218 17:04:06 40163 Login: fw-nat-inside-outside.gridka.de login failed; invalid login data

...
160218 16:53:56 25501 Login: wna033.jinr-t1.ru login failed; invalid login data



in cmsd.log

not sure it has any bad effect ... but: should we care?

this is at least 1 Hz, and comes from multiple sites ....
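
To see which sites are affected, the failures can be tallied per host. The following is a minimal sketch, assuming every failure line in cmsd.log has exactly the shape of the lines quoted above; the function name and the regular expression are illustrative and not part of xrootd.

```python
import re
from collections import Counter

# Matches lines shaped like the ones quoted above, e.g.
# 160218 17:08:17 42171 Login: gridlink.hephy.oeaw.ac.at login failed; invalid login data
# The pattern is an assumption derived from those samples only.
LOGIN_FAILED = re.compile(
    r"^\d{6} \d{2}:\d{2}:\d{2} \d+ Login: "
    r"(?P<host>\S+) login failed; invalid login data$"
)


def failed_logins_by_host(lines):
    """Count 'login failed; invalid login data' lines per reporting host."""
    counts = Counter()
    for line in lines:
        match = LOGIN_FAILED.match(line.strip())
        if match:
            counts[match.group("host")] += 1
    return counts


# Sample lines copied from the log excerpt above.
SAMPLE = [
    "160218 17:08:17 42171 Pup: buffer overrun unpacking short arg 0: ident.",
    "160218 17:08:17 42171 Login: gridlink.hephy.oeaw.ac.at login failed; invalid login data",
    "160218 17:07:52 42118 Login: grid-wn080.physik.rwth-aachen.de login failed; invalid login data",
    "160218 17:04:06 40163 Login: fw-nat-inside-outside.gridka.de login failed; invalid login data",
    "160218 16:53:56 25501 Login: wna033.jinr-t1.ru login failed; invalid login data",
]

if __name__ == "__main__":
    for host, count in failed_logins_by_host(SAMPLE).most_common():
        print(f"{count:6d}  {host}")
```

To run it against a real log, pass an open file object instead of SAMPLE; the path to cmsd.log depends on the local installation.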


tom

On Fri, Sep 4, 2015 at 7:55 AM, Andrew Hanushevsky <[log in to unmask]> wrote:

  Hi Tommaso,

  You mentioned that the fnal.gov addresses are worker nodes. Why are they connecting to the cmsd?

  Andy 


  On Fri, 4 Sep 2015, Tommaso Boccali wrote:


    By the way, yesterday I upgraded the EU redir to 4.2.3. It seems to work
    fine, although so far there is less than a day of statistics....

    Tom
    On 04/Sep/2015 01:33 AM, "Gerard Bernabeu" <[log in to unmask]> wrote:


      the fnal.gov address is from a WorkerNode (probably running a CMS job).

      Gerard

      On Thu, Sep 3, 2015 at 4:54 PM, Andrew Hanushevsky <[log in to unmask]>
      wrote:


        Hi Tommaso,

        What are fw-nat-inside-outside.gridka.de and cmswn2148.fnal.gov? The
        message clearly shows that whatever they sent over was incorrect. Yes,
        4.2.2 would crash in this case, sigh.

        Andy

        On Wed, 26 Aug 2015, Tommaso Boccali wrote:


          ciao, another piece of info which might be interesting:

          I was looking into the bari eu redir, which uses xrootd

          xrootd-4.1.1-1.el5

          the cmsd.log has TONS of messages like

          150826 05:18:00 30442 XrdInet: Accepted connection from [log in to unmask]
          150826 05:18:00 30442 ?:[log in to unmask] XrdPoll: FD 90 attached to poller 0; num=23
          150826 05:18:00 30442 Pup: buffer overrun unpacking short arg 0: ident.
          150826 05:18:00 30442 Login: fw-nat-inside-outside.gridka.de login failed; invalid login data
          150826 05:18:00 30442 ?:[log in to unmask] XrdPoll: FD 90 detached from poller 0; num=22

          from many servers, most from FNAL

          150826 21:41:28 3396 Login: cmswn2148.fnal.gov login failed; invalid login data
          150826 21:41:28 3436 Login: cmswn2146.fnal.gov login failed; invalid login data
          150826 21:41:35 3461 Login: cmswn2131.fnal.gov login failed; invalid login data
          150826 21:41:36 2475 Login: cmswn2158.fnal.gov login failed; invalid login data
          150826 21:41:40 3461 Login: cmswn2150.fnal.gov login failed; invalid login data
          150826 21:41:45 3458 Login: cmswn2160.fnal.gov login failed; invalid login data
          150826 21:41:47 3396 Login: cmswn2131.fnal.gov login failed; invalid login data
          150826 21:41:50 3461 Login: cmswn2140.fnal.gov login failed; invalid login data
          150826 21:41:56 3458 Login: cmswn2147.fnal.gov login failed; invalid login data

          apparently, we did not notice because 4.1.1-1 does not crash as 4.2.2
          does, but carries on ...

          tom

          On Tue, Aug 25, 2015 at 9:07 PM, Marian Zvada <[log in to unmask]> wrote:



            On 8/25/15 11:58 AM, Tommaso Boccali wrote:


              Well, but: isn't the global redir only subscribed to by regional
              redirs (so not many)?



            you're right, I neglected this fact (outsmarted myself ;))...

              Probably eu redirs are the most connected, with close to 64 cmsd
              entering... It's just normal we saw the problem there.



            ok, this is alarming; we should revise the current setup and
            introduce more redirectors in the EU if needed. Btw, I recently
            talked with Andy about this: supervisors look like a much more
            promising way to handle the 64 limit:

            http://xrootd.org/doc/dev42/cms_config.htm#_Toc405927050

            I'm going to do this in the transitional federation, where there is
            one global redirector for all T3s plus those subscribers who will be
            kicked off the production federation and subscribed there instead.

            -Marian
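
The arithmetic behind the supervisor idea can be sketched as follows. This is only an illustration: it takes the limit of 64 subscribers per redirector from the messages above and assumes, which this thread does not state, that each supervisor accepts the same number of subscribers and that all servers sit at the lowest layer. The function names are hypothetical.

```python
FANOUT = 64  # subscriber limit per redirector, as mentioned in the thread


def max_servers(depth, fanout=FANOUT):
    """Servers reachable with `depth` layers below the redirector.

    depth=1: servers subscribe directly to the redirector.
    depth=2: one layer of supervisors between redirector and servers.
    Assumes every supervisor has the same fan-out as the redirector.
    """
    return fanout ** depth


def depth_needed(n_servers, fanout=FANOUT):
    """Smallest depth whose capacity covers n_servers."""
    depth = 1
    while max_servers(depth, fanout) < n_servers:
        depth += 1
    return depth


if __name__ == "__main__":
    for n in (60, 64, 65, 500, 5000):
        print(n, "servers ->", depth_needed(n), "layer(s)")
```

Under these assumptions a single layer of supervisors already raises the capacity from 64 to 64 * 64 subscribers.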

              Ifca said it has 3.3.6-1, which is fairly common. I guess it cannot
              be due to (just) the release....

              Andy, did you understand the source of the bad login data? Is it worth
              trying and debugging it?

              Tom

              On 25/Aug/2015 06:21 PM, "Jan Iven" <[log in to unmask]> wrote:

                  On 08/25/2015 05:56 PM, Marian Zvada wrote:

                      Hi Tom,

                  [..]

                      yeah, that is my guess too, but then we have global
                      redirectors at CERN running 4.2.2 dealing with a hell of
                      a lot of cmsd subscriptions, so I'd expect some visible
                      trouble there as well. So maybe we've just been lucky
                      there so far... (I believe that autorestart of cmsd if
                      it crashes is disabled there, Jan?)


                  No, the CMS global redirectors are on CC7, and will auto-restart
                  cmsd on "unclean" exit (Restart=on-abort).  I hope that SEGV
                  counts as such...

                  Not sure whether we'd even notice the occasional restart, unless
                  another tool (abrt) picks this up.

                  Cheers
                  jan





          --
          Tommaso Boccali
          INFN Pisa



        ########################################################################
        Use REPLY-ALL to reply to list

        To unsubscribe from the XROOTD-L list, click the following link:
        https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1





      --

      *Gerard Bernabeu Altayó*
      Deputy Department Head

      Distributed Computing Services Operations
      Fermi National Accelerator Laboratory
      630 840 6509 office
      www.fnal.gov






-- 

Tommaso Boccali
INFN Pisa
