Hi folks,

Ignore the thread, but take a look at the link from Monday (SEGV on log rotation).

Has the bug been reported anywhere besides the WLCG meeting?  I haven’t seen it in the tracker...

Brian

Begin forwarded message:

Date: October 21, 2014 at 5:46:20 AM CDT
To: <[log in to unmask]>
From: Nicolo Magini <[log in to unmask]>
Subject: Re: Caltech xrootd failures


*** Discussion title: WAN Data Access

Hi all,

FWIW, a Tier-1 also reported to WLCG Ops an issue in xrootd-4.0.3:

https://twiki.cern.ch/twiki/bin/view/LCG/WLCGDailyMeetingsWeek141020#Monday

"xrootd v.4.0.3 in EPEL7 contains a serious problem that leads to memory
corruption and eventual daemon crash. Discovered at logrotate time. The
script running at 3am systematically coincided (entailed?) the crash."

I'll let Marian and the experts comment if it's better to upgrade to
4.0.3 anyway, or to wait for the next release instead.

Cheers
N.

On 10/21/2014 04:36 AM, Marian Zvada wrote:

Hi Samir,

v4.0.0 does indeed have several issues (IPv6 especially), and it's
recommended to go to 4.0.3. This is the latest production release as far as
the xrootd developers can tell:

http://xrootd.org/dload.html

It's currently only in the osg-testing repo, but it doesn't carry a higher
risk than staying on 4.0.0:
http://repo.grid.iu.edu/osg/3.2/el6/testing/x86_64/

So I'd recommend reconsidering.

My 2c,
Marian

On 10/20/14, 2:14 PM, Samir Cury wrote:

Hi Brian,

I upgraded one of the servers last week with the (hopefully) latest
packages in OSG 3.2 stable:

xrootd4-libs-4.0.0-1.9.osg32.el6.x86_64
xrootd-lcmaps-0.0.7-7.osg32.el6.x86_64
xrootd4-client-libs-4.0.0-1.9.osg32.el6.x86_64
xrootd4-4.0.0-1.9.osg32.el6.x86_64
xrootd-hdfs-1.8.4-2.osg32.el6.x86_64
xrootd4-server-libs-4.0.0-1.9.osg32.el6.x86_64
xrootd-cmstfc-1.5.1-8.osg32.el6.x86_64

And this is the server that showed the problem.

For stability reasons, I'm not really considering upgrading to testing versions.

The good news is that this error hasn't appeared again so far.

Thanks,
Samir

On Sun, Oct 19, 2014 at 7:05 PM, Brian Bockelman <[log in to unmask]> wrote:

Hi Samir,

What version of Xrootd are you using?  There are some known IPv6 issues prior to 4.0.3.
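
A quick way to check installed packages against that threshold is sketched below. It is only a rough illustration: it assumes the package strings follow the usual RPM name-version-release.arch layout and that the version is purely numeric, and it does not reproduce RPM's own version comparison. The helper names (rpm_version, predates_fix) are made up for this example.

```python
# First release reported in this thread as having the IPv6 fixes.
MIN_FIXED = (4, 0, 3)


def rpm_version(package):
    """Return the version of a 'name-version-release.arch' string as a tuple of ints."""
    # The name may itself contain dashes, so split from the right.
    _name, version, _release_arch = package.rsplit("-", 2)
    return tuple(int(part) for part in version.split("."))


def predates_fix(package, fixed=MIN_FIXED):
    """True if the package version is older than the given fixed version."""
    return rpm_version(package) < fixed
```

Feeding it the output of rpm -qa for the xrootd4 packages would flag any that are still below 4.0.3.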

Brian

On Oct 18, 2014, at 3:02 PM, Samir Cury <[log in to unmask]> wrote:


Interesting. Looking at the cmsd logs:

141018 12:49:01 15811 Login: xrootd.unl.edu login failed; timed out
141018 12:49:01 15811 Manager: manager.0:[log in to unmask] removed; lost connection
141018 12:49:13 15811 XrdSetIF: Skipping duplicate public interface [2600:900:6:1101:5054:ff:fe00:70cb]
141018 12:50:13 15811 Login: xrootd.unl.edu login failed; timed out
141018 12:50:13 15811 Manager: manager.0:[log in to unmask] removed; lost connection
141018 12:50:25 15811 XrdSetIF: Skipping duplicate public interface [2600:900:6:1101:5054:ff:fe00:70cb]
141018 12:51:25 15811 Login: xrootd.unl.edu login failed; timed out
141018 12:51:25 15811 Manager: manager.0:[log in to unmask] removed; lost connection
141018 12:51:37 15811 XrdSetIF: Skipping duplicate public interface [2600:900:6:1101:5054:ff:fe00:70cb]
141018 12:52:37 15811 Login: xrootd.unl.edu login failed; timed out

Looks like cmsd has been given an IPv6 service to talk to? We're using
the FNAL central redirector as recommended.
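
One way to see whether a host is being handed out over IPv6 is to ask the resolver directly. The following is a minimal sketch using only the Python standard library; the function names (resolve, split_by_family) are made up for this example, and it only shows what the local resolver returns, not which address cmsd actually picks.

```python
import ipaddress
import socket


def resolve(host):
    """Return the unique addresses the local resolver gives for host."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def split_by_family(addresses):
    """Group address strings into (ipv4_list, ipv6_list)."""
    v4, v6 = [], []
    for addr in addresses:
        if ipaddress.ip_address(addr).version == 6:
            v6.append(addr)
        else:
            v4.append(addr)
    return v4, v6
```

Running split_by_family(resolve("xrootd.unl.edu")) on the affected server would show whether the manager name resolves to an IPv6 address there.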

The configuration was fine; everything worked after a restart:

141018 12:57:17 001 Xrd: main:
root://cmsxrootd.fnal.gov:1094//store/test/xrootd/T2_US_Caltech/store/mc/SAM/GenericTTbar/GEN-SIM-RECO/CMSSW_5_3_1_START53_V5-v1/0013/CE4D66EB-5AAE-E111-96D6-003048D37524.root
--> /tmp/teste2.root
141018 12:57:17 8658 Xrd: Read: Hole in the cache: offs=0, len=8388608
^Crootd] Total 492.25 MB |==========>.........| 53.63 % [18.7 MB/s]
-bash-4.1$ nslookup xrootd.unl.edu

Cheers,
Samir



