*** Discussion title: WAN Data Access
Hi all,
FWIW, a Tier-1 also reported to WLCG Ops an issue in xrootd-4.0.3:
https://twiki.cern.ch/twiki/bin/view/LCG/WLCGDailyMeetingsWeek141020#Monday
"xrootd v.4.0.3 in EPEL7 contains a serious problem that leads to memory
corruption and eventual daemon crash. Discovered at logrotate time. The
script running at 3am systematically coincided (entailed?) the crash."
I'll let Marian and the experts comment on whether it's better to upgrade
to 4.0.3 anyway, or to wait for the next release instead.
Cheers
N.
On 10/21/2014 04:36 AM, Marian Zvada wrote:
Hi Samir,
Version 4.0.0 does indeed have several issues (IPv6 especially), and it's
recommended to go for 4.0.3. This is the latest production release
according to the xrootd developers:
http://xrootd.org/dload.html
It is currently only in the osg-testing repo, but it doesn't carry a higher
risk than staying on 4.0.0:
http://repo.grid.iu.edu/osg/3.2/el6/testing/x86_64/
So I'd recommend reconsidering.
My 2c,
Marian
On 10/20/14, 2:14 PM, Samir Cury wrote:
Hi Brian,
I upgraded one of the servers last week, with the (hopefully) latest
packages in OSG 3.2 stable:
xrootd4-libs-4.0.0-1.9.osg32.el6.x86_64
xrootd-lcmaps-0.0.7-7.osg32.el6.x86_64
xrootd4-client-libs-4.0.0-1.9.osg32.el6.x86_64
xrootd4-4.0.0-1.9.osg32.el6.x86_64
xrootd-hdfs-1.8.4-2.osg32.el6.x86_64
xrootd4-server-libs-4.0.0-1.9.osg32.el6.x86_64
xrootd-cmstfc-1.5.1-8.osg32.el6.x86_64
And this is the server which presented the problem.
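For anyone who wants to check a node against the 4.0.3 threshold mentioned in this thread, here is a minimal sketch. It assumes package strings of the form name-version-release.arch, as in the list above, and uses a plain numeric comparison rather than RPM's own version ordering. The helper names (parse_nvra, predates) are made up for illustration.

```python
def parse_nvra(package):
    """Split an RPM 'name-version-release.arch' string into its four parts."""
    nvr, arch = package.rsplit(".", 1)           # arch is the last dot-separated field
    name, version, release = nvr.rsplit("-", 2)  # the name itself may contain dashes
    return name, version, release, arch


def predates(package, minimum=(4, 0, 3)):
    """True if the package version is numerically lower than `minimum`.

    Simplification: assumes a purely numeric dotted version; this is not
    a reimplementation of RPM's version comparison.
    """
    _, version, _, _ = parse_nvra(package)
    return tuple(int(part) for part in version.split(".")) < minimum


# Package strings copied from the list above.
installed = [
    "xrootd4-libs-4.0.0-1.9.osg32.el6.x86_64",
    "xrootd-lcmaps-0.0.7-7.osg32.el6.x86_64",
    "xrootd4-client-libs-4.0.0-1.9.osg32.el6.x86_64",
    "xrootd4-4.0.0-1.9.osg32.el6.x86_64",
]

# Only the xrootd4* packages carry the xrootd version; plugins are versioned separately.
core = [p for p in installed if parse_nvra(p)[0].startswith("xrootd4")]
affected = [p for p in core if predates(p)]
for p in affected:
    print("older than 4.0.3:", p)
```

On the packages listed here, all three xrootd4* entries are reported as older than 4.0.3.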
I'm not really considering upgrading to testing versions for stability reasons.
The good news is that this error hasn't appeared again so far.
Thanks,
Samir
On Sun, Oct 19, 2014 at 7:05 PM, Brian Bockelman <[log in to unmask]> wrote:
Hi Samir,
What version of Xrootd are you using? There are some known IPv6 issues prior to 4.0.3.
Brian
On Oct 18, 2014, at 3:02 PM, Samir Cury <[log in to unmask]> wrote:
Interesting, looking at the cmsd logs:
141018 12:49:01 15811 Login: xrootd.unl.edu login failed; timed out
141018 12:49:01 15811 Manager: manager.0:[log in to unmask] removed; lost connection
141018 12:49:13 15811 XrdSetIF: Skipping duplicate public interface [2600:900:6:1101:5054:ff:fe00:70cb]
141018 12:50:13 15811 Login: xrootd.unl.edu login failed; timed out
141018 12:50:13 15811 Manager: manager.0:[log in to unmask] removed; lost connection
141018 12:50:25 15811 XrdSetIF: Skipping duplicate public interface [2600:900:6:1101:5054:ff:fe00:70cb]
141018 12:51:25 15811 Login: xrootd.unl.edu login failed; timed out
141018 12:51:25 15811 Manager: manager.0:[log in to unmask] removed; lost connection
141018 12:51:37 15811 XrdSetIF: Skipping duplicate public interface [2600:900:6:1101:5054:ff:fe00:70cb]
141018 12:52:37 15811 Login: xrootd.unl.edu login failed; timed out
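The excerpt above shows a regular retry cycle. A minimal sketch for pulling the "login failed" timestamps out of such a log and measuring the gap between attempts follows. It assumes the line layout seen here (yymmdd date, time, pid, component name followed by a colon); the function names are made up for illustration.

```python
import re
from datetime import datetime

# Assumed layout: "yymmdd hh:mm:ss pid Component: message"
LOG_LINE = re.compile(r"^(\d{6} \d{2}:\d{2}:\d{2}) (\d+) (\w+): (.*)$")


def parse_failures(lines):
    """Return the timestamps of all 'login failed' entries."""
    stamps = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group(3) == "Login" and "login failed" in match.group(4):
            stamps.append(datetime.strptime(match.group(1), "%y%m%d %H:%M:%S"))
    return stamps


def retry_intervals(stamps):
    """Seconds between consecutive failures."""
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


# Lines taken from the excerpt above.
sample = [
    "141018 12:49:01 15811 Login: xrootd.unl.edu login failed; timed out",
    "141018 12:49:13 15811 XrdSetIF: Skipping duplicate public interface [2600:900:6:1101:5054:ff:fe00:70cb]",
    "141018 12:50:13 15811 Login: xrootd.unl.edu login failed; timed out",
    "141018 12:51:25 15811 Login: xrootd.unl.edu login failed; timed out",
]

print(retry_intervals(parse_failures(sample)))  # -> [72, 72]
```

In this excerpt each failure follows the previous one by 72 seconds: the XrdSetIF message appears 12 seconds after a failure, and the next login attempt times out 60 seconds after that.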
Looks like cmsd has been given an IPv6 service to talk to? We're using
the FNAL central redirector as recommended.
The configuration was fine; everything worked after a restart:
141018 12:57:17 001 Xrd: main:
root://cmsxrootd.fnal.gov:1094//store/test/xrootd/T2_US_Caltech/store/mc/SAM/GenericTTbar/GEN-SIM-RECO/CMSSW_5_3_1_START53_V5-v1/0013/CE4D66EB-5AAE-E111-96D6-003048D37524.root
--> /tmp/teste2.root
141018 12:57:17 8658 Xrd: Read: Hole in the cache: offs=0, len=8388608
^Crootd] Total 492.25 MB |==========>.........| 53.63 % [18.7 MB/s]
-bash-4.1$ nslookup xrootd.unl.edu
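As a complement to the nslookup check, here is a minimal sketch that lists which IPv4 and IPv6 addresses a host name resolves to, using only the Python standard library. The default port 1094 is taken from the redirector URL above and is only an assumption for the cmsd case. The function name is made up for illustration.

```python
import socket


def addresses_by_family(host, port=1094):
    """Resolve `host` and group the returned addresses by IP family."""
    result = {"IPv4": [], "IPv6": []}
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, proto=socket.IPPROTO_TCP):
        if family == socket.AF_INET:
            key = "IPv4"
        elif family == socket.AF_INET6:
            key = "IPv6"
        else:
            continue  # ignore any other address family
        if sockaddr[0] not in result[key]:
            result[key].append(sockaddr[0])
    return result
```

Calling addresses_by_family("xrootd.unl.edu") on the affected node would show whether the resolver hands back an IPv6 address for the manager; the result depends on the local DNS setup, so no output is shown here.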
Cheers,
Samir