Hi Tommaso,

You mentioned that the fnal.gov addresses are worker nodes. Why are they
connecting to the cmsd?

Andy

On Fri, 4 Sep 2015, Tommaso Boccali wrote:

> By the way, yesterday I upgraded the EU redir to 4.2.3. It seems to work fine,
> although there is less than one day of statistics so far....
>
> Tom
> On 04 Sep 2015 01:33 AM, "Gerard Bernabeu" <[log in to unmask]> wrote:
>
>> the fnal.gov address is from a WorkerNode (probably running a CMS job).
>>
>> Gerard
>>
>> On Thu, Sep 3, 2015 at 4:54 PM, Andrew Hanushevsky <[log in to unmask]>
>> wrote:
>>
>>> Hi Tommaso,
>>>
>>> What are fw-nat-inside-outside.gridka.de and cmswn2148.fnal.gov? The
>>> message clearly shows that whatever they sent over was incorrect. Yes,
>>> 4.2.2 would crash in this case, sigh.
>>>
>>> Andy
>>>
>>> On Wed, 26 Aug 2015, Tommaso Boccali wrote:
>>>
>>>> ciao, another piece of info which might be interesting:
>>>>
>>>> I was looking into the bari eu redir, which uses xrootd
>>>>
>>>> xrootd-4.1.1-1.el5
>>>>
>>>> the cmsd.log has TONS of messages like
>>>>
>>>> 150826 05:18:00 30442 XrdInet: Accepted connection from [log in to unmask]
>>>> 150826 05:18:00 30442 ?:[log in to unmask] XrdPoll: FD 90 attached to poller 0; num=23
>>>> 150826 05:18:00 30442 Pup: buffer overrun unpacking short arg 0: ident.
>>>> 150826 05:18:00 30442 Login: fw-nat-inside-outside.gridka.de login failed; invalid login data
>>>> 150826 05:18:00 30442 ?:[log in to unmask] XrdPoll: FD 90 detached from poller 0; num=22
>>>>
>>>> from many servers, most from FNAL
>>>>
>>>> 150826 21:41:28 3396 Login: cmswn2148.fnal.gov login failed; invalid login data
>>>> 150826 21:41:28 3436 Login: cmswn2146.fnal.gov login failed; invalid login data
>>>> 150826 21:41:35 3461 Login: cmswn2131.fnal.gov login failed; invalid login data
>>>> 150826 21:41:36 2475 Login: cmswn2158.fnal.gov login failed; invalid login data
>>>> 150826 21:41:40 3461 Login: cmswn2150.fnal.gov login failed; invalid login data
>>>> 150826 21:41:45 3458 Login: cmswn2160.fnal.gov login failed; invalid login data
>>>> 150826 21:41:47 3396 Login: cmswn2131.fnal.gov login failed; invalid login data
>>>> 150826 21:41:50 3461 Login: cmswn2140.fnal.gov login failed; invalid login data
>>>> 150826 21:41:56 3458 Login: cmswn2147.fnal.gov login failed; invalid login data
>>>>
>>>> Apparently we did not notice because 4.1.1-1 does not crash the way 4.2.2
>>>> does; it just carries on ...
>>>>
>>>> tom
>>>>
>>>> On Tue, Aug 25, 2015 at 9:07 PM, Marian Zvada <[log in to unmask]> wrote:
>>>>
>>>>> On 8/25/15 11:58 AM, Tommaso Boccali wrote:
>>>>>
>>>>>> Well, but: isn't the global redir only subscribed to by regional redirs
>>>>>> (so not many)?
>>>>>>
>>>>>
>>>>> you're right, I neglected this fact (outsmarted myself ;))...
>>>>>
>>>>>> Probably the EU redirs are the most connected, with close to 64 cmsd
>>>>>> entering... It's just normal we saw the problem there.
>>>>>>
>>>>>
>>>>> OK, this is alarming; we should revise the current setup and introduce
>>>>> more redirectors in the EU if needed. Btw, I recently talked with Andy
>>>>> about this: a much more promising way to handle the 64 limit seems to
>>>>> be supervisors:
>>>>>
>>>>> http://xrootd.org/doc/dev42/cms_config.htm#_Toc405927050
>>>>>
>>>>> I'm going to do this in the transitional federation, where there is one
>>>>> global redirector for all T3s plus those subscribers who will be kicked
>>>>> off the production federation and subscribed there instead.
>>>>>
>>>>> -Marian
>>>>>
>>>>>> Ifca said it has 3.3.6-1, which is fairly common. I guess it cannot be
>>>>>> due to (just) the release....
>>>>>>
>>>>>> Andy, did you understand the source of the bad login data? Is it worth
>>>>>> trying to debug it?
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>> On 25 Aug 2015 06:21 PM, "Jan Iven" <[log in to unmask]
>>>>>> <mailto:[log in to unmask]>> wrote:
>>>>>>
>>>>>>     On 08/25/2015 05:56 PM, Marian Zvada wrote:
>>>>>>
>>>>>>         Hi Tom,
>>>>>>
>>>>>>     [..]
>>>>>>
>>>>>>         yeah, that is my guess too, but then we have global redirectors
>>>>>>         at CERN running 4.2.2 and dealing with a hell of a lot of cmsd
>>>>>>         subscriptions, so I'd expect some visible trouble there as
>>>>>>         well. So maybe we're lucky there too so far... (I believe that
>>>>>>         autorestart of cmsd if it crashes is disabled there, Jan?)
>>>>>>
>>>>>>
>>>>>>     No, the CMS global redirectors are on CC7, and will auto-restart
>>>>>>     cmsd on "unclean" exit (Restart=on-abort).  I hope that SEGV
>>>>>>     counts as such...
>>>>>>
>>>>>>     Not sure whether we'd even notice the occasional restart, unless
>>>>>>     another tool (abrt) picks this up.
>>>>>>
>>>>>>     Cheers
>>>>>>     jan
>>>>>>
>>>>>>
>>>>
>>>>
>>>> --
>>>> Tommaso Boccali
>>>> INFN Pisa
>>>>
>>>
>>> ########################################################################
>>> Use REPLY-ALL to reply to list
>>>
>>> To unsubscribe from the XROOTD-L list, click the following link:
>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
>>>
>>
>>
>>
>> --
>> *Gerard Bernabeu Altayó*
>> Deputy Department Head
>>
>> Distributed Computing Services Operations
>> Fermi National Accelerator Laboratory
>> 630 840 6509 office
>> www.fnal.gov
>>
>
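
To see which hosts are behind the "login failed; invalid login data" messages in the cmsd.log excerpt quoted above, something along these lines might help. It is only a rough sketch: it assumes each log entry occupies a single line in the actual file, the regular expression is inferred from the quoted lines alone, and the function names and the command-line argument are made up for illustration.

```python
import re
import sys
from collections import Counter

# Pattern inferred from the quoted log lines, e.g.:
# 150826 21:41:28 3396 Login: cmswn2148.fnal.gov login failed; invalid login data
FAILED_LOGIN = re.compile(
    r"^\d{6} \d{2}:\d{2}:\d{2} \d+ Login: (?P<host>\S+) "
    r"login failed; invalid login data"
)


def count_failed_logins(lines):
    """Return a Counter mapping host name to number of failed-login lines."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.match(line.strip())
        if match:
            counts[match.group("host")] += 1
    return counts


def count_by_domain(counts):
    """Aggregate per-host counts by the last two labels of the host name."""
    domains = Counter()
    for host, n in counts.items():
        domains[".".join(host.split(".")[-2:])] += n
    return domains


# Hypothetical usage: python failed_logins.py /path/to/cmsd.log
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        hosts = count_failed_logins(log)
    for domain, n in count_by_domain(hosts).most_common():
        print(f"{n:6d}  {domain}")
    for host, n in hosts.most_common(20):
        print(f"{n:6d}  {host}")
```

Grouping by domain gives a quick check of the "most from FNAL" impression, and the per-host list shows whether a few worker nodes or many are involved.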
