Summary
In xrootd-5, the cksum paths (at least for hdfs) are sometimes calculated by prefixing "/cksums/" to the Logical File Name (LFN), which is the file name requested by the user (e.g. root://foo.bar:1094/LFN), instead of to the Physical File Name (PFN), as was always done before.

The PFN is the actual location of the file within the server. It is translated from the LFN by the CMS Trivial File Catalog through an XRootD plugin, which is loaded with the following directive:
oss.namelib /usr/lib64/libXrdCmsTfc.so file:/etc/xrootd/storage.xml?protocol=hadoop
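
To make the two path constructions concrete, here is a minimal Python sketch. It is not the xrootd or xrootd-cmstfc code: the "/cms/phedex" root is only inferred from the log excerpts in this report, the function names are hypothetical, and the plain string concatenation (which keeps the double slash seen in the logs) is an assumption about how the prefix is applied.

```python
CKSUM_PREFIX = "/cksums/"  # prefix as it appears in the log excerpts


def lfn_to_pfn(lfn: str, pfn_root: str = "/cms/phedex") -> str:
    """Hypothetical stand-in for the TFC rule in storage.xml.

    The real mapping is done by libXrdCmsTfc.so; the pfn_root default is
    an assumption inferred from the paths shown in the logs.
    """
    return pfn_root + lfn


def cksum_path(name: str) -> str:
    """Prefix a file name with the cksum directory (plain concatenation)."""
    return CKSUM_PREFIX + name


lfn = "/store/unmerged/example.root"   # made-up LFN for illustration
pfn = lfn_to_pfn(lfn)

pfn_based = cksum_path(pfn)   # the location used before xrootd-5
lfn_based = cksum_path(lfn)   # the location sometimes used by xrootd-5

print(pfn_based)
print(lfn_based)
```

The only difference between the two results is whether the name passed to the prefixing step has already gone through the LFN -> PFN translation.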

Relevant Versions:
xrootd : 5.0.2-1.osgup.el7.x86_64
xrootd-hdfs: 2.1.8-1.1.osgup.el7.x86_64
xrootd-cmstfc: 1.5.2-6.osgup.el7.x86_64

Observations:

  1. One could be inclined to blame the LFN -> PFN mapping, but I have seen cases in which the same file (exact same LFN) gets its cksum written in the right place and later, after being deleted and written again, hits the wrong case and gets its cksum written using the LFN instead.

  2. When the wrong case is hit, the cksum actually gets written in both the wrong (LFN-based) and the right (PFN-based) places. If you later request the checksum of one of these files from:
    a) an xrootd-4 door, you'll get it served from the right path
    b) an xrootd-5 door, you'll get it served from the wrong path
    but you will always get the right cksum

The above makes this issue seem harmless, but I guess the problem arises when the LFN-based path makes no sense or is not writable on the server.
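
One way to check the "always the right cksum" part of observation 2 is to ask both doors for the checksum of the same LFN and compare the answers. The sketch below wraps `xrdfs <door> query checksum <path>`; the door host names are placeholders, the helper names are hypothetical, and comparing the raw command output assumes both doors print the result in the same format.

```python
import subprocess


def checksum_cmd(door: str, lfn: str) -> list:
    """Build the xrdfs command line that asks a door for a file checksum."""
    return ["xrdfs", door, "query", "checksum", lfn]


def query_checksum(door: str, lfn: str, timeout: int = 60) -> str:
    """Run the query and return the raw, stripped output of xrdfs."""
    result = subprocess.run(
        checksum_cmd(door, lfn),
        capture_output=True,
        text=True,
        check=True,      # raise if xrdfs exits with a non-zero status
        timeout=timeout,
    )
    return result.stdout.strip()


def doors_agree(door_a: str, door_b: str, lfn: str, query=query_checksum) -> bool:
    """Return True if both doors report the same checksum output."""
    return query(door_a, lfn) == query(door_b, lfn)


# Example call (placeholder host names, not run here):
# doors_agree("xrootd4-door.example.org:1094",
#             "xrootd5-door.example.org:1094",
#             "/store/unmerged/example.root")
```

The `query` parameter exists only so the comparison can be exercised without a live server.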

Examples
Here you can see the logs for the case where the cksum is written in the correct path [1] and for the case where it is not [2].
You can see that the sequence of events differs between the two cases:

in [1]:

  1. the file is written
  2. the cksum file is written

in [2]:

  1. an attempt is made to write the cksum file in the wrong path, which raises a hadoop exception
  2. the normal file gets written
  3. the cksum file gets written in the wrong path, this time successfully
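
The two cases can be told apart mechanically from the "File we will access:" lines in the logs. The following is a rough Python sketch that labels each cksum access as PFN-based or LFN-based; the "/cms/phedex" root is an assumption taken from the excerpts below, and the function name is hypothetical.

```python
import re

ACCESS_RE = re.compile(r"^File we will access: (\S+)")
CKSUM_PREFIX = "/cksums/"


def classify_cksum_accesses(lines, pfn_root="/cms/phedex"):
    """Return (kind, path) for every cksum path found in the log lines.

    kind is "pfn-based" when the part after the cksum prefix starts with
    pfn_root (an assumed site-specific root), otherwise "lfn-based".
    """
    results = []
    for line in lines:
        match = ACCESS_RE.match(line)
        if not match:
            continue
        path = match.group(1)
        if not path.startswith(CKSUM_PREFIX):
            continue  # access to the data file itself, not its cksum
        # Normalise the double slash left by the prefixing step.
        target = "/" + path[len(CKSUM_PREFIX):].lstrip("/")
        kind = "pfn-based" if target.startswith(pfn_root + "/") else "lfn-based"
        results.append((kind, path))
    return results
```

Feeding it the lines of excerpt [1] should yield only "pfn-based" entries, while excerpt [2] should yield "lfn-based" ones, under the assumption about the PFN root stated above.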

[1]

201110 03:10:51 31695 XrootdBridge: uscms.7583:245@sdsc-24 login as uscms
File we will access: /cms/phedex/store/temp/user/mdyulend.4af711dff5b10062beb8e2ab48d501b23799834a/phys_b2g/BprimeKit_Ntuples_CMSSW_106X/ST_s-channel_4f_leptonDecays_TuneCP5_13TeV-amcatnlo-pythia8/crab_BPK_106X_ST_s-channel_4f_leptonDecays_TuneCP5_13TeV-amcatnlo-pythia8/201110_080948/0000/log/cmsRun_91.log.tar.gz
Readahead buffer stats for /cms/phedex/store/temp/user/mdyulend.4af711dff5b10062beb8e2ab48d501b23799834a/phys_b2g/BprimeKit_Ntuples_CMSSW_106X/ST_s-channel_4f_leptonDecays_TuneCP5_13TeV-amcatnlo-pythia8/crab_BPK_106X_ST_s-channel_4f_leptonDecays_TuneCP5_13TeV-amcatnlo-pythia8/201110_080948/0000/log/cmsRun_91.log.tar.gz : 0 misses, 0 hits, 0 partial hits, 0 unbuffered, 0 buffered bytes used of 0 read (0.00%)
No lfn2pfn mapping for /cksums//cms/phedex/store/temp/user/mdyulend.4af711dff5b10062beb8e2ab48d501b23799834a/phys_b2g/BprimeKit_Ntuples_CMSSW_106X/ST_s-channel_4f_leptonDecays_TuneCP5_13TeV-amcatnlo-pythia8/crab_BPK_106X_ST_s-channel_4f_leptonDecays_TuneCP5_13TeV-amcatnlo-pythia8/201110_080948/0000/log/cmsRun_91.log.tar.gz
File we will access: /cksums//cms/phedex/store/temp/user/mdyulend.4af711dff5b10062beb8e2ab48d501b23799834a/phys_b2g/BprimeKit_Ntuples_CMSSW_106X/ST_s-channel_4f_leptonDecays_TuneCP5_13TeV-amcatnlo-pythia8/crab_BPK_106X_ST_s-channel_4f_leptonDecays_TuneCP5_13TeV-amcatnlo-pythia8/201110_080948/0000/log/cmsRun_91.log.tar.gz
Readahead buffer stats for /cksums//cms/phedex/store/temp/user/mdyulend.4af711dff5b10062beb8e2ab48d501b23799834a/phys_b2g/BprimeKit_Ntuples_CMSSW_106X/ST_s-channel_4f_leptonDecays_TuneCP5_13TeV-amcatnlo-pythia8/crab_BPK_106X_ST_s-channel_4f_leptonDecays_TuneCP5_13TeV-amcatnlo-pythia8/201110_080948/0000/log/cmsRun_91.log.tar.gz : 0 misses, 0 hits, 0 partial hits, 0 unbuffered, 0 buffered bytes used of 0 read (0.00%)
201110 03:10:51 13937 XrootdXeq: uscms.7583:245@sdsc-24 disc 0:00:00 (send failure)

[2]

201110 03:15:11 4148 XrootdBridge: cmswrite.7602:249@sdsc-42 login as cmswriter
No lfn2pfn mapping for /cksums//store/unmerged/RunIISummer20UL16MiniAODAPV/WminusJetsToTauNu_TauToMu_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/MINIAODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2/230007/3AD46E81-53C5-744F-A287-ABB8A5074FE9.root
File we will access: /cksums//store/unmerged/RunIISummer20UL16MiniAODAPV/WminusJetsToTauNu_TauToMu_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/MINIAODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2/230007/3AD46E81-53C5-744F-A287-ABB8A5074FE9.root
hdfsOpenFile(/cksums//store/unmerged/RunIISummer20UL16MiniAODAPV/WminusJetsToTauNu_TauToMu_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/MINIAODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2/230007/3AD46E81-53C5-744F-A287-ABB8A5074FE9.root): FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;) error:
RemoteException: File does not exist: /cksums/store/unmerged/RunIISummer20UL16MiniAODAPV/WminusJetsToTauNu_TauToMu_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/MINIAODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2/230007/3AD46E81-53C5-744F-A287-ABB8A5074FE9.root
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2038)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2008)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1921)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:572)
        at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:89)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:421)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)
java.io.FileNotFoundException: File does not exist: /cksums/store/unmerged/RunIISummer20UL16MiniAODAPV/WminusJetsToTauNu_TauToMu_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/MINIAODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2/230007/3AD46E81-53C5-744F-A287-ABB8A5074FE9.root
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2038)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2008)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1921)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:572)
        at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:89)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:421)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)
        at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:527)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1284)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1269)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1257)
        at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:326)
        at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:292)
        at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:284)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1588)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:335)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:331)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:331)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /cksums/store/unmerged/RunIISummer20UL16MiniAODAPV/WminusJetsToTauNu_TauToMu_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/MINIAODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2/230007/3AD46E81-53C5-744F-A287-ABB8A5074FE9.root
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2038)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2008)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1921)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:572)
        at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:89)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:421)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)
        at org.apache.hadoop.ipc.Client.call(Client.java:1504)
        at org.apache.hadoop.ipc.Client.call(Client.java:1441)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
        at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:256)
        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:607)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:260)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
        at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1282)
        ... 10 more
File we will access: /cms/phedex/store/unmerged/RunIISummer20UL16MiniAODAPV/WminusJetsToTauNu_TauToMu_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/MINIAODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2/230007/3AD46E81-53C5-744F-A287-ABB8A5074FE9.root
Readahead buffer stats for /cms/phedex/store/unmerged/RunIISummer20UL16MiniAODAPV/WminusJetsToTauNu_TauToMu_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/MINIAODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2/230007/3AD46E81-53C5-744F-A287-ABB8A5074FE9.root : 0 misses, 0 hits, 0 partial hits, 140 unbuffered, 0 buffered bytes used of 0 read (0.00%)
No lfn2pfn mapping for /cksums//store/unmerged/RunIISummer20UL16MiniAODAPV/WminusJetsToTauNu_TauToMu_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/MINIAODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2/230007/3AD46E81-53C5-744F-A287-ABB8A5074FE9.root
File we will access: /cksums//store/unmerged/RunIISummer20UL16MiniAODAPV/WminusJetsToTauNu_TauToMu_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/MINIAODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2/230007/3AD46E81-53C5-744F-A287-ABB8A5074FE9.root
Readahead buffer stats for /cksums//store/unmerged/RunIISummer20UL16MiniAODAPV/WminusJetsToTauNu_TauToMu_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/MINIAODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2/230007/3AD46E81-53C5-744F-A287-ABB8A5074FE9.root : 0 misses, 0 hits, 0 partial hits, 0 unbuffered, 0 buffered bytes used of 0 read (0.00%)
201110 03:15:13 15284 XrootdXeq: cmswrite.7602:249@sdsc-42 disc 0:00:02



View this issue on GitHub: https://github.com/xrootd/xrootd/issues/1331
