
In Python 3, a `str` is a sequence of Unicode code points, not raw bytes. When the C API builds a `str` from a C buffer, it decodes the bytes as UTF-8, so one cannot construct a `str` from an arbitrary byte sequence: decoding throws a `UnicodeDecodeError` on failure. Instead, byte sequences should be returned as `bytes` objects.
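The failure can be reproduced in pure Python without touching the bindings. The following is a minimal sketch; the helper name `try_decode` and the sample payload are made up for illustration and are not part of `pyxrootd`.

```python
# Arbitrary binary data, e.g. the first bytes of a file that is not text.
payload = b"\xff\xfe\x00\x01"


def try_decode(data: bytes):
    """Return the decoded str, or None if data is not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Plain ASCII is valid UTF-8, so it decodes cleanly.
decoded_text = try_decode(b"hello")

# 0xFF can never appear in valid UTF-8, so decoding fails.
decoded_binary = try_decode(payload)
```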

As an example where this fails in `pyxrootd`, the [`File::Read` method] tries to build a string from the result of reading a file:

```cpp
pyresponse = Py_BuildValue( "s#", buffer, bytesRead );
```

The `s#` notation [means]:

> Convert a C string and its length to a Python `str` object using `'utf-8'` encoding. If the C string pointer is _NULL_, the length is ignored and `None` is returned.

This will fail in general, as not every byte sequence is valid UTF-8. A sufficient fix might be to use `y#` instead, which builds a `bytes` object from the buffer and its length without decoding it.
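From the caller's side, a read that returns `bytes` leaves decoding as an explicit choice. The sketch below illustrates that contract with an ordinary local file; `read_raw` is a hypothetical stand-in, not the `pyxrootd` API, and the file contents are invented for the example.

```python
import os
import tempfile


def read_raw(path: str, size: int = -1) -> bytes:
    """Hypothetical stand-in for a read call that returns raw bytes."""
    with open(path, "rb") as handle:
        return handle.read(size)


# Write some non-UTF-8 data to a temporary file to read back.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x80\x81binary\xff")
    tmp_path = tmp.name

try:
    data = read_raw(tmp_path)
finally:
    os.remove(tmp_path)

# Decoding is now the caller's decision, including the error policy.
text = data.decode("utf-8", errors="replace")
```

With `bytes` as the return type, binary files round-trip unchanged, and callers who know their data is text can still decode it with whatever encoding and error handling they need.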

[`File::Read` method]: https://github.com/xrootd/xrootd/blob/4b98210385f1fb9eafb49db58c1f8d6983cee055/bindings/python/src/PyXRootDFile.cc#L143-L209
[means]: https://docs.python.org/3/c-api/arg.html#c.Py_BuildValue

https://github.com/xrootd/xrootd/issues/632
