Hi Olivier,

This is to be expected when you join a federation and all of a sudden have to deal with tens of thousands of new requests. There are several mitigations, each with its own set of drawbacks. I would suggest you consult with CMS experts on some of these.

a) You can artificially restrict the number of threads in the cmsd. The drawback here is that there is no distinction between local requests and federated requests, so you may wind up crowding out local requests. Use the xrd.sched directive for this:
https://xrootd.slac.stanford.edu/doc/dev55/xrd_config.htm#_Toc88513978

b) You can specify the percentage of federated requests you are willing to handle. This is not as precise as it seems, as the global redirector might have no choice but to violate your request if your site is the only source of a file or when every other site has reached its global share. Use the cms.sched directive for this:
https://xrootd.slac.stanford.edu/doc/dev55/xrd_config.htm#_Toc88513978

c) Scale up to meet the demand. Nothing stops you from running a cluster of redirectors for the federation. By using the all.manager all option, the load is equally split amongst all of the available sub-redirectors. The drawback is that you need more hardware to do this, but given the load you are experiencing, that is the only solution available that does not constrain local resources. See the all.manager directive:
https://xrootd.slac.stanford.edu/doc/dev54/cms_config.htm#_Toc53611061

The issue here is that your current setup is insufficient for the number of requests the federation is trying to handle. I would suggest talking to the CMS federation managers to see whether you can employ one of the above options or whether they have other alternatives.

Andy

On Thu, 19 May 2022, Olivier Devroede wrote:
> Dear xrootd experts,
>
> we upgraded to version 5.4.2 of xrootd two days ago.
>
> It worked flawlessly for 1.5 days, but now cmsd spawns thousands of
> threads.
> This causes huge loads (up to 30.000) on the machine.
> Restarting the daemon does not solve the problem.
>
> Extra info: we are part of the xrootd federation of the cms experiment.
>
> Do you have any idea how we can fix/debug this problem? The logs do not
> tell us a lot. It's mostly requests for files in the cmsd logs [1] and
> nothing special in the xrootd logs [2]
>
> Any help is greatly appreciated.
>
> Olivier.
>
> [1] cmsd.log
>
> 220519 15:42:40 22203 cms_Dispatch: manager.0:[log in to unmask] for state dlen=156
> 220519 15:42:40 6266 manager.0:[log in to unmask] cms_do_State: /store/user/nshadski/customNano/QCD_Pt_470to600_TuneCP5_13TeV_pythia8/KIT_CustomNanoV9_MC_2016postVFP/211229_123611/0000/MC_2016postVFP_NanoAODv9_1-32.root
> 220519 15:42:40 22265 cms_Dispatch: manager.0:[log in to unmask] for state dlen=156
> 220519 15:42:40 6267 manager.0:[log in to unmask] cms_do_State: /store/user/nshadski/customNano/QCD_Pt_470to600_TuneCP5_13TeV_pythia8/KIT_CustomNanoV9_MC_2016postVFP/211229_123611/0000/MC_2016postVFP_NanoAODv9_1-32.root
> 220519 15:42:40 22265 cms_Dispatch: manager.0:[log in to unmask] for state dlen=105
> 220519 15:42:40 6268 manager.0:[log in to unmask] cms_do_State: /store/data/Run2018D/EGamma/MINIAOD/12Nov2019_UL2018-v4/260000/E8C2D279-4422-E743-904A-4233F0BF230E.root
> 220519 15:42:40 22266 cms_Dispatch: manager.0:[log in to unmask] for state dlen=156
> 220519 15:42:40 6269 manager.0:[log in to unmask] cms_do_State: /store/user/nshadski/customNano/QCD_Pt_470to600_TuneCP5_13TeV_pythia8/KIT_CustomNanoV9_MC_2016postVFP/211229_123611/0000/MC_2016postVFP_NanoAODv9_1-32.root
>
> [2] xrootd.log
> 220519 15:42:30 22178 sysThrottleManager: Current IO counter is 0; total IO wait time is 0ms.
> 220519 15:42:31 22178 sysThrottleManager: Round ops allocation -1
>
> ########################################################################
> Use REPLY-ALL to reply to list
>
> To unsubscribe from the XROOTD-L list, click the following link:
> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
> ########################################################################
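[Editor's note: as a rough illustration of the three mitigations Andy describes, a configuration sketch might look like the following. All numeric values and the hostname are placeholders, and the exact option names (in particular the cms.sched federation-share option) should be verified against the documentation links above before use.]

```
# (a) Cap the number of cmsd worker threads via xrd.sched.
#     Note: this throttles local and federated requests alike.
xrd.sched mint 8 maxt 1024 avlt 32 idle 780

# (b) Limit the share of requests accepted on behalf of the
#     federation (a percentage; best-effort, not a hard cap).
cms.sched gshr 50

# (c) Spread federation traffic across a cluster of sub-redirectors;
#     the host+ form expands to all addresses registered under that
#     DNS name, and "all" distributes the load among them.
all.manager all fed-redirector.example.org+ 1213
```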