With XRootD 5.4.1 running on CentOS Linux 7.9.2009, we have observed xrootd failing initialization after a re-start. The xrootd.log shows xrootd attempting to delete files listed in /var/run/xrootd/t2wisc/.ofs/posc.log, but each attempt fails with "permission denied". After these failed attempts, the log reports:
------ File system server initialization failed.
220616 04:46:12 2228 XrootdConfig: Unable to load base file system using default
------ xroot protocol initialization failed.
220616 04:46:12 2228 XrdProtocol: Protocol xroot could not be loaded
------ xrootd [log in to unmask]:-1 initialization failed.
We have discovered that deleting /var/run/xrootd/t2wisc/.ofs/posc.log and then starting xrootd results in a successful start-up without errors.
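For anyone who wants to keep the evidence rather than delete it, here is a minimal sketch of the same workaround that moves posc.log aside before xrootd is started. The function name and the backup directory are hypothetical, and it assumes xrootd is stopped when this runs.

```python
import os
import shutil
import time

def set_aside_posc_log(path, backup_dir):
    """Move posc.log into backup_dir under a timestamped name.

    Returns the new location, or None if there was no posc.log to move.
    Both arguments are supplied by the caller; nothing here is an xrootd API.
    """
    if not os.path.exists(path):
        return None
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d%H%M%S")
    dest = os.path.join(backup_dir, f"posc.log.{stamp}")
    # shutil.move also works when backup_dir is on a different file system.
    shutil.move(path, dest)
    return dest
```

Example call (the backup directory is only an illustration): set_aside_posc_log("/var/run/xrootd/t2wisc/.ofs/posc.log", "/var/tmp/posc-backup")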
We have hundreds of xrootd nodes, but only a handful show this initialization failure. On most of our nodes the posc.log is 64 bytes in size. On the nodes that show the failures, the posc.log is larger and seems to contain file names.
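The size difference gives a cheap way to spot nodes that may fail on the next start. The sketch below flags a posc.log larger than the 64 bytes we see on healthy nodes. The 64-byte threshold comes only from our own observation, not from any documented format, and the function name is hypothetical.

```python
import os

# Size in bytes observed on our healthy nodes; an empirical value,
# not something taken from XRootD documentation.
EXPECTED_IDLE_SIZE = 64

def posc_log_looks_suspect(path, expected_size=EXPECTED_IDLE_SIZE):
    """Return True when posc.log exists and is larger than expected_size."""
    try:
        return os.path.getsize(path) > expected_size
    except FileNotFoundError:
        # No posc.log at all: nothing pending, so not suspect.
        return False
```

Example call: posc_log_looks_suspect("/var/run/xrootd/t2wisc/.ofs/posc.log")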
Here is an example of a file that helps trigger this failure. Before xrootd re-start, xrootd reports a problem with this file:
220612 12:19:18 5913 acc_Audit: cmspilot.367:188@[::ffff:129.93.227.92] grant gsi osg_cmsuser@[::ffff:18.12.8.19] create /store/temp/user/santocch.a98004385da0008d4e6aefa9b0a59973eeb30ede/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/crab_wjets-madgraph-2018/220612_150920/0000/log/cmsRun_9.log.tar.gz
220612 12:19:18 5913 hdfs_stat: Unable to stat /store/temp/user/santocch.a98004385da0008d4e6aefa9b0a59973eeb30ede/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/crab_wjets-madgraph-2018/220612_150920/0000/log/cmsRun_9.log.tar.gz; No such file or directory
File we will access: /store/temp/user/santocch.a98004385da0008d4e6aefa9b0a59973eeb30ede/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/crab_wjets-madgraph-2018/220612_150920/0000/log/cmsRun_9.log.tar.gz
220612 12:19:18 5913 ofs_open: cmspilot.367:188@[::ffff:129.93.227.92] Unable to fchmod /store/temp/user/santocch.a98004385da0008d4e6aefa9b0a59973eeb30ede/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/crab_wjets-madgraph-2018/220612_150920/0000/log/cmsRun_9.log.tar.gz; is a directory
hdfsDelete(path=/store/temp/user/santocch.a98004385da0008d4e6aefa9b0a59973eeb30ede/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/crab_wjets-madgraph-2018/220612_150920/0000/log/cmsRun_9.log.tar.gz, recursive=0): FileSystem#delete error:
RemoteException: Permission denied: user=nobody, access=WRITE, inode="/store/temp/user/santocch.a98004385da0008d4e6aefa9b0a59973eeb30ede/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/crab_wjets-madgraph-2018/220612_150920/0000/log":osg_cmsuser:cmsprod:drwxr-xr-x
This file is created as a zero-size file:
-rw-r--r-- 1 osg_cmsuser 0 Jun 12 12:19 /hdfs/store/temp/user/santocch.a98004385da0008d4e6aefa9b0a59973eeb30ede/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/crab_wjets-madgraph-2018/220612_150920/0000/log/cmsRun_9.log.tar.gz
It is not a directory, even though the xrootd.log claims it "is a directory".
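As a reading aid for the RemoteException line in the excerpt above: the denied inode is the parent "log" directory rather than the file, the requesting user is nobody, and the directory is owned by osg_cmsuser with mode drwxr-xr-x. HDFS checks write permission on the parent directory when deleting a file, so this line is consistent with the delete being issued as nobody. Why it would be issued as nobody is not something the log tells us. The sketch below pulls those fields out of such a line. The regular expression is written against the one line shown here, so the format is an assumption, and the function names are hypothetical.

```python
import re

# Pattern fitted to the single RemoteException line in our log; other
# Hadoop versions may format the message differently.
DENIED_RE = re.compile(
    r'Permission denied: user=(?P<user>[^,]+), access=(?P<access>\w+), '
    r'inode="(?P<inode>[^"]+)":(?P<owner>[^:]+):(?P<group>[^:]+):'
    r'(?P<mode>[-dlrwxsStT]{10})'
)

def parse_denied(line):
    """Return the fields of a 'Permission denied' line as a dict, or None."""
    m = DENIED_RE.search(line)
    return m.groupdict() if m else None

def only_owner_can_write(info):
    """True when the mode grants write to the owner alone and the
    requesting user is not that owner."""
    mode = info["mode"]
    owner_w, group_w, other_w = mode[2] == "w", mode[5] == "w", mode[8] == "w"
    return owner_w and not group_w and not other_w and info["user"] != info["owner"]
```

Feeding it the line from our log gives user "nobody", access "WRITE", owner "osg_cmsuser", and only_owner_can_write returns True.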
After this initial failure to delete the file, when xrootd is re-started, it again fails to delete the file, along with other files, and then reports that initialization failed.
There are at least two questions. First, why does xrootd fail to delete the file in the first place, even though it would seem to have permission? Second, why does xrootd keep trying to delete files that it did not have permission to delete when it first tried?