Fixes an issue where `XRootD.client` deadlocks the Python interpreter at exit if any files are still open when the interpreter shuts down.

This can be reproduced in both XRootD 4.12.3 and 5.0.0. If there is any intention to make further releases in the 4.x series, it would be nice to have this backported.

You can find some more details about this below and in: https://github.com/scikit-hep/uproot/issues/504

## Reproducer

1. Create a file called `xrootd_bug.py` containing:

```python
import XRootD.client
f = XRootD.client.File()
f.open('root://eospublic.cern.ch//eos/opendata/lhcb/AntimatterMatters2017/data/B2HHH_MagnetDown.root')
```

2. Run `python xrootd_bug.py`

## Technical details

This is caused by the `atexit` function shutting down the worker thread pool:

https://github.com/xrootd/xrootd/blob/912673bfbc75ff0d9f661556716238f65e671a6a/bindings/python/libs/client/finalize.py#L28-L35

The debug output shows that it shuts down all the threads and then sends the call to close the file:
```
***** Calling client.__XrdCl_Stop_Threads()
[2020-07-24 08:38:40.953763 +0200][Debug  ][JobMgr            ] Stopping the job manager...
[2020-07-24 08:38:40.954512 +0200][Debug  ][JobMgr            ] Job manager stopped
[2020-07-24 08:38:40.954574 +0200][Debug  ][TaskMgr           ] Stopping the task manager...
[2020-07-24 08:38:40.954863 +0200][Debug  ][TaskMgr           ] Task manager stopped
[2020-07-24 08:38:40.954921 +0200][Debug  ][Poller            ] Stopping the poller...
***** Finished client.__XrdCl_Stop_Threads()
[2020-07-24 08:38:40.990681 +0200][Debug  ][File              ] [0x84000b80@root://xrootd-lhcb.cr.cnaf.infn.it:1094//storage/gpfs_lhcb/lhcb/disk/LHCb/Collision18/LEPTONIC.MDST/00077054/0001/00077054_00019123_1.leptonic.mdst?xrdcl.requuid=17781476-5b88-4609-98f3-6b4d78e188eb] Sending a close command for handle 0x0 to xrootd-lhcb.cr.cnaf.infn.it:1094
[2020-07-24 08:38:40.990769 +0200][Debug  ][ExDbgMsg          ] [xrootd-lhcb.cr.cnaf.infn.it:1094] MsgHandler created: 0x83d0e9c0 (message: kXR_close (handle: 0x00000000) ).
```

Because the close command is sent after the thread pool has been shut down, there is nothing left to process the request and the process hangs forever.
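
The ordering can be illustrated without XRootD. The following is a minimal sketch, not the binding's actual code: `stop_pool` and `Handle` are hypothetical stand-ins for the finalize hook and for an unclosed file object. It runs a small script in a fresh interpreter and collects what it prints; on CPython the `atexit` handler is expected to run before the destructor of an object that is still alive at exit.

```python
import subprocess
import sys
import textwrap

# Script executed in a child interpreter. All names in it are hypothetical
# stand-ins and are not part of the XRootD bindings.
CHILD_SCRIPT = textwrap.dedent(
    """
    import atexit

    def stop_pool():
        # Stands in for the finalize hook that stops the worker threads.
        print("atexit: pool stopped", flush=True)

    class Handle:
        def __del__(self):
            # Stands in for the implicit close issued on destruction.
            print("del: close requested", flush=True)

    atexit.register(stop_pool)
    handle = Handle()  # deliberately never closed
    """
)


def run_demo():
    """Run the child script in a fresh interpreter and return its output lines."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

In the real bindings the second step needs the worker threads that the first step has already stopped, which is where the hang comes from.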

## Solution in this PR

This PR works around the problem by looping over all objects known to Python and closing any `XRootD.client.file.File` objects before the threads are stopped. There may be other ways this bug can manifest. Someone more familiar with the code should check whether any other commands are submitted to the thread pool when objects are deleted.
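
The general idea can be sketched as follows. This is an illustration only and not the code in the patch: `FakeFile` is a hypothetical stand-in for `XRootD.client.file.File`, and `close_all` is a helper name made up for this example. It walks the objects tracked by the garbage collector and closes every instance that is still open.

```python
import gc


class FakeFile:
    """Hypothetical stand-in for XRootD.client.file.File."""

    def __init__(self):
        self.closed = False

    def is_open(self):
        return not self.closed

    def close(self):
        self.closed = True


def close_all(file_type):
    """Close every open instance of file_type tracked by the garbage collector.

    Returns the number of objects that were closed.
    """
    closed = 0
    for obj in gc.get_objects():
        try:
            is_match = isinstance(obj, file_type)
        except ReferenceError:
            # A weakref proxy whose referent is already gone; skip it.
            continue
        if is_match and obj.is_open():
            obj.close()
            closed += 1
    return closed


if __name__ == "__main__":
    leaked = FakeFile()  # never closed by the caller
    print("closed", close_all(FakeFile), "file(s)")
```

Such a helper would have to run inside the `atexit` hook before the threads are stopped, so that each close request can still be processed.
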
You can view, comment on, or merge this pull request online at:

  https://github.com/xrootd/xrootd/pull/1260

-- Commit Summary --

  * Prevent deadlock in Python bindings from XRootD.client.finalize.finalize

-- File Changes --

    M bindings/python/libs/client/finalize.py (13)

-- Patch Links --

https://github.com/xrootd/xrootd/pull/1260.patch
https://github.com/xrootd/xrootd/pull/1260.diff

