Ciao Andrew, are you sure these are connections from servers and not from
clients? I ask because the host names look suspicious:

160219 08:53:38 16932 Pup: buffer overrun unpacking short arg 0: ident.
160219 08:53:38 16932 Login: sbgwn20.in2p3.fr login failed; invalid login data

160219 08:57:02 16934 Pup: buffer overrun unpacking short arg 0: ident.
160219 08:57:02 16934 Login: gw1.cis.gov.pl login failed; invalid login data

160219 08:45:08 64858 Pup: buffer overrun unpacking short arg 0: ident.
160219 08:45:08 64858 Login: sbgwn12.in2p3.fr login failed; invalid login data

...
160219 08:56:34 24733 Login: g28n03.hep.wisc.edu login failed; invalid login data
...

They look like worker node (WN) names, not server names ....

still, I am asking them
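
To see which hosts and domains the failed logins come from, and how often, the
log can be tallied with a short script. This is only a sketch in Python. It
assumes each cmsd.log entry sits on a single line in the format quoted above
(the wrapping in this mail comes from the mail client). The function name
summarize_failed_logins and the grouping by domain (everything after the first
host label) are illustrative choices, not part of xrootd.

```python
import re
from collections import Counter
from datetime import datetime

# Matches entries such as:
# 160219 08:53:38 16932 Login: sbgwn20.in2p3.fr login failed; invalid login data
LOGIN_FAILED = re.compile(
    r"^(?P<stamp>\d{6} \d{2}:\d{2}:\d{2}) \d+ "
    r"Login: (?P<host>\S+) login failed; invalid login data"
)


def summarize_failed_logins(lines):
    """Return (per-host counts, per-domain counts, failures per second)."""
    hosts = Counter()
    domains = Counter()
    stamps = []
    for line in lines:
        m = LOGIN_FAILED.match(line.strip())
        if m is None:
            continue  # skip Pup:, XrdInet:, XrdPoll: and any other entries
        host = m.group("host")
        hosts[host] += 1
        # Assumed grouping: drop the first label, keep the rest as the domain.
        domains[host.split(".", 1)[-1]] += 1
        stamps.append(datetime.strptime(m.group("stamp"), "%y%m%d %H:%M:%S"))
    # Rate over the time span covered by the matched entries.
    span = (max(stamps) - min(stamps)).total_seconds() if len(stamps) > 1 else 0.0
    rate = len(stamps) / span if span > 0 else float("nan")
    return hosts, domains, rate


# Sample taken from the entries quoted in this mail.
sample = [
    "160219 08:53:38 16932 Pup: buffer overrun unpacking short arg 0: ident.",
    "160219 08:53:38 16932 Login: sbgwn20.in2p3.fr login failed; invalid login data",
    "160219 08:57:02 16934 Login: gw1.cis.gov.pl login failed; invalid login data",
    "160219 08:45:08 64858 Login: sbgwn12.in2p3.fr login failed; invalid login data",
]
hosts, domains, rate = summarize_failed_logins(sample)
print(domains.most_common())
print(f"{rate:.4f} failed logins per second")
```

Run over a full cmsd.log (for example by passing an open file object instead of
the sample list), the per-domain counts would show at a glance whether the
offending hosts are worker nodes at a handful of sites.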

tom

On Thu, Feb 18, 2016 at 10:25 PM, Andrew Hanushevsky <[log in to unmask]>
wrote:

> Hi Tommaso,
>
> Yes, you should care. These sites are not joining your cluster because they
> cannot log in. Could you tell me what version the sites that are getting the
> errors are running?
>
> Andy
>
> *From:* Tommaso Boccali <[log in to unmask]>
> *Sent:* Thursday, February 18, 2016 8:12 AM
> *To:* Andrew Hanushevsky <[log in to unmask]>
> *Cc:* Gerard Bernabeu <[log in to unmask]> ; [log in to unmask] ; Marian
> Zvada <[log in to unmask]> ; Jan Iven <[log in to unmask]>
> *Subject:* Re: problem in transitioning a redirector from 3.3.6 to 4.2.2
>
> ciao, coming back to this, a few months later (on 4.2.3)
>
> i still see TONS of
>
> 160218 17:08:17 42171 Pup: buffer overrun unpacking short arg 0: ident.
> 160218 17:08:17 42171 Login: gridlink.hephy.oeaw.ac.at login failed; invalid login data
> ...
> 160218 17:07:52 42118 Login: grid-wn080.physik.rwth-aachen.de login failed; invalid login data
> ...
> 160218 17:04:06 40163 Login: fw-nat-inside-outside.gridka.de login failed; invalid login data
> ...
> 160218 16:53:56 25501 Login: wna033.jinr-t1.ru login failed; invalid login data
>
>
> in cmsd.log
>
> not sure it has any bad effect ... but: should we care?
>
> this is at least 1 Hz, and comes from multiple sites ....
>
>
> tom
>
> On Fri, Sep 4, 2015 at 7:55 AM, Andrew Hanushevsky <[log in to unmask]>
> wrote:
>
>> Hi Tommaso,
>>
>> You mentioned that the fnal.gov addresses are worker nodes. Why are they
>> connecting to the cmsd?
>>
>> Andy
>>
>>
>> On Fri, 4 Sep 2015, Tommaso Boccali wrote:
>>
>> By the way, yesterday I upgraded the EU redir to 4.2.3. It seems to work fine,
>>> even if we have less than 1 day of statistics for the moment....
>>>
>>> Tom
>>> On 04 Sep 2015 01:33 AM, "Gerard Bernabeu" <[log in to unmask]> wrote:
>>>
>>> the fnal.gov address is from a WorkerNode (probably running a CMS job).
>>>>
>>>> Gerard
>>>>
>>>> On Thu, Sep 3, 2015 at 4:54 PM, Andrew Hanushevsky <
>>>> [log in to unmask]>
>>>> wrote:
>>>>
>>>> Hi Tommaso,
>>>>>
>>>>> What are fw-nat-inside-outside.gridka.de and cmswn2148.fnal.gov? The
>>>>> message clearly shows that whatever they sent over was incorrect. Yes,
>>>>> 4.2.2 would crash in this case, sigh.
>>>>>
>>>>> Andy
>>>>>
>>>>> On Wed, 26 Aug 2015, Tommaso Boccali wrote:
>>>>>
>>>>> ciao, another piece of info which might be interesting:
>>>>>>
>>>>>> I was looking into the bari eu redir, which uses xrootd
>>>>>>
>>>>>> xrootd-4.1.1-1.el5
>>>>>>
>>>>>> the cmsd.log has TONS of messages like
>>>>>>
>>>>>> 150826 05:18:00 30442 XrdInet: Accepted connection from [log in to unmask]
>>>>>> 150826 05:18:00 30442 ?:[log in to unmask] XrdPoll: FD 90 attached to poller 0; num=23
>>>>>> 150826 05:18:00 30442 Pup: buffer overrun unpacking short arg 0: ident.
>>>>>> 150826 05:18:00 30442 Login: fw-nat-inside-outside.gridka.de login failed; invalid login data
>>>>>> 150826 05:18:00 30442 ?:[log in to unmask] XrdPoll: FD 90 detached from poller 0; num=22
>>>>>>
>>>>>> from many servers, most from FNAL
>>>>>>
>>>>>> 150826 21:41:28 3396 Login: cmswn2148.fnal.gov login failed; invalid login data
>>>>>> 150826 21:41:28 3436 Login: cmswn2146.fnal.gov login failed; invalid login data
>>>>>> 150826 21:41:35 3461 Login: cmswn2131.fnal.gov login failed; invalid login data
>>>>>> 150826 21:41:36 2475 Login: cmswn2158.fnal.gov login failed; invalid login data
>>>>>> 150826 21:41:40 3461 Login: cmswn2150.fnal.gov login failed; invalid login data
>>>>>> 150826 21:41:45 3458 Login: cmswn2160.fnal.gov login failed; invalid login data
>>>>>> 150826 21:41:47 3396 Login: cmswn2131.fnal.gov login failed; invalid login data
>>>>>> 150826 21:41:50 3461 Login: cmswn2140.fnal.gov login failed; invalid login data
>>>>>> 150826 21:41:56 3458 Login: cmswn2147.fnal.gov login failed; invalid login data
>>>>>>
>>>>>> apparently, we did not notice because 4.1.1-1 does not crash like 4.2.2
>>>>>> does, but carries on ...
>>>>>>
>>>>>> tom
>>>>>>
>>>>>> On Tue, Aug 25, 2015 at 9:07 PM, Marian Zvada <[log in to unmask]> wrote:
>>>>>>
>>>>>> On 8/25/15 11:58 AM, Tommaso Boccali wrote:
>>>>>>>
>>>>>>>> Well, but: isn't the global redir only subscribed to by regional redirs
>>>>>>>> (so not many)?
>>>>>>>>
>>>>>>> you're right, I neglected this fact (outsmarted myself ;))...
>>>>>>>
>>>>>>>> Probably the EU redirs are the most connected, with close to 64 cmsd
>>>>>>>> entering... It's only natural that we saw the problem there.
>>>>>>>>
>>>>>>>>
>>>>>>> ok, this is alarming, and we should revise the current setup and introduce
>>>>>>> more redirectors in the EU if needed. Btw, I recently talked with Andy about
>>>>>>> this: a much more promising way to handle the limit of 64 seems to be
>>>>>>> supervisors:
>>>>>>>
>>>>>>> http://xrootd.org/doc/dev42/cms_config.htm#_Toc405927050
>>>>>>>
>>>>>>> I'm going to do this in the transitional federation, where there is one
>>>>>>> global redirector for all T3s plus those subscribers who will be kicked off
>>>>>>> the production federation and subscribed there instead.
>>>>>>>
>>>>>>> -Marian
>>>>>>>
>>>>>>>> Ifca said it has 3.3.6-1, which is fairly common. I guess it cannot be due
>>>>>>>> to (just) the release....
>>>>>>>>
>>>>>>>> Andy, did you understand the source of the bad login data? Is it worth
>>>>>>>> trying to debug it?
>>>>>>>>
>>>>>>>> Tom
>>>>>>>>
>>>>>>>> On 25 Aug 2015 06:21 PM, "Jan Iven" <[log in to unmask]
>>>>>>>> <mailto:[log in to unmask]>> wrote:
>>>>>>>>
>>>>>>>>     On 08/25/2015 05:56 PM, Marian Zvada wrote:
>>>>>>>>
>>>>>>>>         Hi Tom,
>>>>>>>>
>>>>>>>>     [..]
>>>>>>>>
>>>>>>>>         yeah, that is my guess too, but then we have global redirectors
>>>>>>>>         at CERN running 4.2.2 dealing with a hell of a lot of cmsd
>>>>>>>>         subscriptions, so I'd expect some visible trouble there as well.
>>>>>>>>         So maybe we're lucky there too so far... (I believe that
>>>>>>>>         autorestart of cmsd if it crashes is disabled there, Jan?)
>>>>>>>>
>>>>>>>>
>>>>>>>>     No, the CMS global redirectors are on CC7, and will auto-restart
>>>>>>>>     cmsd on "unclean" exit (Restart=on-abort).  I hope that SEGV counts
>>>>>>>>     as such...
>>>>>>>>
>>>>>>>>     Not sure whether we'd even notice the occasional restart, unless
>>>>>>>>     another tool (abrt) picks this up.
>>>>>>>>
>>>>>>>>     Cheers
>>>>>>>>     jan
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Tommaso Boccali
>>>>>> INFN Pisa
>>>>>>
>>>>>>
>>>>>
>>>>> ########################################################################
>>>>> Use REPLY-ALL to reply to list
>>>>>
>>>>> To unsubscribe from the XROOTD-L list, click the following link:
>>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Gerard Bernabeu Altayó*
>>>> Deputy Department Head
>>>>
>>>> Distributed Computing Services Operations
>>>> Fermi National Accelerator Laboratory
>>>> 630 840 6509 office
>>>> www.fnal.gov
>>>>
>>>>
>
>
> --
> Tommaso Boccali
> INFN Pisa
>



-- 
Tommaso Boccali
INFN Pisa
