Fixes an issue where XRootD.client causes the Python interpreter to deadlock if files are not closed before the interpreter exits.

This can be reproduced in both XRootD 4.12.3 and 5.0.0. If there is any intention to make further releases in the 4.x series, it would be nice to have this backported.

You can find some more details about this below and in: scikit-hep/uproot#504

Reproducer

  1. Create a file called xrootd_bug.py containing:

     import XRootD.client
     f = XRootD.client.File()
     f.open('root://eospublic.cern.ch//eos/opendata/lhcb/AntimatterMatters2017/data/B2HHH_MagnetDown.root')

  2. Run python xrootd_bug.py

Technical details

This is caused by the atexit function shutting down the worker thread pool:

xrootd/bindings/python/libs/client/finalize.py

Lines 28 to 35 in 912673b

@atexit.register
def finalize():
    """Python atexit handler, will stop all XRootD client threads
    (XrdCl JobManager, TaskManager and Poller) in order to ensure
    no Python APIs are called after the Python Interpreter gets
    finalized.
    """
    client.__XrdCl_Stop_Threads()
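
The ordering is what makes this handler problematic: in CPython, atexit handlers run before module-level objects that are still alive get destroyed. The sketch below illustrates that ordering with a child interpreter. The Resource class and the handler are hypothetical stand-ins, not XRootD code.

```python
import subprocess
import sys
import textwrap

# Script run in a child interpreter so that its shutdown can be observed.
# Resource is a hypothetical stand-in for an object that was never closed.
CHILD_SCRIPT = textwrap.dedent('''
    import atexit

    class Resource:
        def __del__(self):
            print("__del__ called")

    @atexit.register
    def handler():
        print("atexit handler called")

    leaked = Resource()  # still alive when the interpreter starts to exit
''')


def run_child():
    """Run the script in a fresh interpreter and return its output lines."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


events = run_child()
print(events)
```

If the destructor of such an object needs the threads that the handler has already stopped, it can never complete.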

The debug output shows that the handler shuts down all the threads, and only afterwards is the command to close the file sent:

***** Calling client.__XrdCl_Stop_Threads()
[2020-07-24 08:38:40.953763 +0200][Debug  ][JobMgr            ] Stopping the job manager...
[2020-07-24 08:38:40.954512 +0200][Debug  ][JobMgr            ] Job manager stopped
[2020-07-24 08:38:40.954574 +0200][Debug  ][TaskMgr           ] Stopping the task manager...
[2020-07-24 08:38:40.954863 +0200][Debug  ][TaskMgr           ] Task manager stopped
[2020-07-24 08:38:40.954921 +0200][Debug  ][Poller            ] Stopping the poller...
***** Finished client.__XrdCl_Stop_Threads()
[2020-07-24 08:38:40.990681 +0200][Debug  ][File              ] [0x84000b80@root://xrootd-lhcb.cr.cnaf.infn.it:1094//storage/gpfs_lhcb/lhcb/disk/LHCb/Collision18/LEPTONIC.MDST/00077054/0001/00077054_00019123_1.leptonic.mdst?xrdcl.requuid=17781476-5b88-4609-98f3-6b4d78e188eb] Sending a close command for handle 0x0 to xrootd-lhcb.cr.cnaf.infn.it:1094
[2020-07-24 08:38:40.990769 +0200][Debug  ][ExDbgMsg          ] [xrootd-lhcb.cr.cnaf.infn.it:1094] MsgHandler created: 0x83d0e9c0 (message: kXR_close (handle: 0x00000000) ).

Because the close command is sent after the thread pool has been shut down, there is nothing left to process the request and the process hangs forever.
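
The hang can be illustrated without XRootD. The following is a minimal sketch, assuming a single-worker pool as a stand-in for the XrdCl job manager. TinyPool and close_completes are hypothetical names, and a timeout replaces the indefinite wait so that the example terminates.

```python
import queue
import threading


class TinyPool:
    """Hypothetical single-worker pool standing in for the XrdCl job manager."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            job = self._jobs.get()
            if job is None:  # sentinel: stop processing jobs
                return
            job()

    def submit(self, job):
        self._jobs.put(job)

    def stop(self):
        self._jobs.put(None)
        self._worker.join()


def close_completes(pool, timeout=0.5):
    """Submit a 'close' job and report whether it finished within the timeout."""
    done = threading.Event()
    pool.submit(done.set)
    return done.wait(timeout)


pool = TinyPool()
before_stop = close_completes(pool)  # the worker is alive and handles the job
pool.stop()
after_stop = close_completes(pool)   # nobody is left to handle the job
print(before_stop, after_stop)
```

Without the timeout, the second call would wait forever, which matches the behaviour seen at interpreter exit.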

Solution in this PR

This PR works around the problem by looping over all objects known to Python and closing any XRootD.client.file.File objects. There may be other ways this bug can manifest. Someone more familiar with the code should check whether any other commands are added to the thread pool when objects are deleted.
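
A rough sketch of the idea follows. It is not the code from the PR. FakeFile is a hypothetical stand-in so that the example runs without XRootD installed, and close_open_files is an illustrative helper name. In the real handler such a loop would presumably run before client.__XrdCl_Stop_Threads().

```python
import gc


class FakeFile:
    """Hypothetical stand-in for XRootD.client.file.File."""

    def __init__(self):
        self._open = True

    def is_open(self):
        return self._open

    def close(self):
        self._open = False


def close_open_files(file_type):
    """Close every still-open instance of file_type tracked by the garbage collector."""
    closed = 0
    for obj in gc.get_objects():
        try:
            if isinstance(obj, file_type) and obj.is_open():
                obj.close()
                closed += 1
        except ReferenceError:
            # A weak proxy whose referent is already gone; nothing to close.
            continue
    return closed


leaked = FakeFile()  # opened but never closed explicitly
closed_count = close_open_files(FakeFile)
print(closed_count, leaked.is_open())
```

Note that gc.get_objects() only returns objects tracked by the garbage collector, so this approach relies on the file objects being tracked.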


You can view, comment on, or merge this pull request online at:

  https://github.com/xrootd/xrootd/pull/1260
