With XRootD 5.4.1 running on CentOS Linux 7.9.2009, we have observed `xrootd` failing to initialize after a restart. The `xrootd.log` shows `xrootd` attempting to delete the files listed in `/var/run/xrootd/t2wisc/.ofs/posc.log`, but each attempt fails with "permission denied". After these failed attempts, the log reports:
```
------ File system server initialization failed.
220616 04:46:12 2228 XrootdConfig: Unable to load base file system using default
------ xroot protocol initialization failed.
220616 04:46:12 2228 XrdProtocol: Protocol xroot could not be loaded
------ xrootd [log in to unmask]:-1 initialization failed.
```
We have found that deleting `/var/run/xrootd/t2wisc/.ofs/posc.log` and then starting `xrootd` results in a successful start-up without errors.

We have hundreds of xrootd nodes, but only a handful show this initialization failure. On most of our nodes the `posc.log` is 64 bytes in size. On the nodes that show the failures, the `posc.log` is larger and seems to contain file names.
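
To make the size comparison above easy to repeat on each node, here is a minimal Python sketch. It only uses the path and the 64-byte size mentioned in this report. The on-disk format of `posc.log` is not something we have checked, so the extraction step simply assumes that file names appear as runs of printable ASCII starting with `/`; the function name `inspect_posc_log` is our own.

```python
import os
import re

POSC_LOG = "/var/run/xrootd/t2wisc/.ofs/posc.log"  # path from this report
EXPECTED_SIZE = 64  # size seen on the nodes that start cleanly


def inspect_posc_log(path=POSC_LOG, expected_size=EXPECTED_SIZE):
    """Return (size, paths); paths is empty when the file is not oversized."""
    size = os.path.getsize(path)
    if size <= expected_size:
        return size, []
    with open(path, "rb") as fh:
        data = fh.read()
    # Assumption: file names are stored as printable ASCII beginning with '/'.
    # This is a guess at the layout, not the documented posc.log format.
    found = re.findall(rb"/[\x21-\x7e]+", data)
    return size, [p.decode("ascii") for p in found]


if __name__ == "__main__":
    if os.path.exists(POSC_LOG):
        size, paths = inspect_posc_log()
        print(f"{POSC_LOG}: {size} bytes")
        for p in paths:
            print("  ", p)
```

The script only reads the file, so it can be run on a node before deciding whether to remove `posc.log`.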

Here is an example of a file that helps trigger this failure. Before the `xrootd` restart, the `xrootd.log` reports a problem with this file:
```
220612 12:19:18 5913 acc_Audit: cmspilot.367:188@[::ffff:129.93.227.92] grant gsi osg_cmsuser@[::ffff:18.12.8.19] create /store/temp/user/santocch.a98004385da0008d4e6aefa9b0a59973eeb30ede/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/crab_wjets-madgraph-2018/220612_150920/0000/log/cmsRun_9.log.tar.gz
220612 12:19:18 5913 hdfs_stat: Unable to stat /store/temp/user/santocch.a98004385da0008d4e6aefa9b0a59973eeb30ede/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/crab_wjets-madgraph-2018/220612_150920/0000/log/cmsRun_9.log.tar.gz; No such file or directory
File we will access: /store/temp/user/santocch.a98004385da0008d4e6aefa9b0a59973eeb30ede/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/crab_wjets-madgraph-2018/220612_150920/0000/log/cmsRun_9.log.tar.gz
220612 12:19:18 5913 ofs_open: cmspilot.367:188@[::ffff:129.93.227.92] Unable to fchmod /store/temp/user/santocch.a98004385da0008d4e6aefa9b0a59973eeb30ede/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/crab_wjets-madgraph-2018/220612_150920/0000/log/cmsRun_9.log.tar.gz; is a directory
hdfsDelete(path=/store/temp/user/santocch.a98004385da0008d4e6aefa9b0a59973eeb30ede/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/crab_wjets-madgraph-2018/220612_150920/0000/log/cmsRun_9.log.tar.gz, recursive=0): FileSystem#delete error:
RemoteException: Permission denied: user=nobody, access=WRITE, inode="/store/temp/user/santocch.a98004385da0008d4e6aefa9b0a59973eeb30ede/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/crab_wjets-madgraph-2018/220612_150920/0000/log":osg_cmsuser:cmsprod:drwxr-xr-x
```
This file is created as a zero-size file:
```
-rw-r--r-- 1 osg_cmsuser 0 Jun 12 12:19 /hdfs/store/temp/user/santocch.a98004385da0008d4e6aefa9b0a59973eeb30ede/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/crab_wjets-madgraph-2018/220612_150920/0000/log/cmsRun_9.log.tar.gz
```
It is not a directory, even though the `xrootd.log` claims it "is a directory".
After this initial failure to delete the file, when `xrootd` is restarted, it again fails to delete this file, along with other files, and then reports that initialization failed.
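
For comparing the denied deletes across nodes, the fields of the HDFS `Permission denied` line can be pulled apart with a short Python sketch like the one below. The helper names `parse_denied` and `may_write` are our own. The write check assumes plain POSIX-style owner/other semantics and ignores group membership and any HDFS superuser rules, neither of which is visible in the log line.

```python
import re

# Pattern for the HDFS "Permission denied" line as it appears in xrootd.log above.
DENIED = re.compile(
    r'Permission denied: user=(?P<user>[^,]+), access=(?P<access>\w+), '
    r'inode="(?P<inode>[^"]+)":(?P<owner>[^:]+):(?P<group>[^:]+):(?P<mode>[-a-z]{10})'
)

# The exception line quoted in this report.
LINE = (
    'RemoteException: Permission denied: user=nobody, access=WRITE, '
    'inode="/store/temp/user/santocch.a98004385da0008d4e6aefa9b0a59973eeb30ede/'
    'WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/crab_wjets-madgraph-2018/'
    '220612_150920/0000/log":osg_cmsuser:cmsprod:drwxr-xr-x'
)


def parse_denied(line):
    """Return the named fields of a 'Permission denied' line, or None."""
    m = DENIED.search(line)
    return m.groupdict() if m else None


def may_write(info):
    """Owner/other write-bit check on the mode string (group is not evaluated)."""
    mode = info["mode"]
    if info["user"] == info["owner"]:
        return mode[2] == "w"
    # Assumption: the user is treated as "other"; group membership is unknown here.
    return mode[8] == "w"


if __name__ == "__main__":
    info = parse_denied(LINE)
    print(info["user"], info["owner"], info["mode"], may_write(info))
```

For the line quoted above, the parsed user is `nobody`, the owner of the `log` directory is `osg_cmsuser`, and `may_write` returns `False` under these assumptions.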

There are at least two questions. First, why does `xrootd` fail to delete the file in the first place, even though it would seem to have permission? Second, why does `xrootd` keep trying to delete files that it did not have permission to delete when it first tried?

-- 
Reply to this email directly or view it on GitHub:
https://github.com/xrootd/xrootd/issues/1721