In Python 3, str objects hold Unicode text rather than raw bytes, so building a str from a C buffer means decoding that buffer, as UTF-8 for the Py_BuildValue format discussed here. This means one cannot use an arbitrary byte sequence to construct a str: Python will try to decode the bytes as UTF-8 and raise a UnicodeDecodeError on failure. Instead, raw byte sequences should be returned as bytes objects.
As an example where this fails in pyxrootd, the File::Read method tries to build a string from the result of reading a file:
pyresponse = Py_BuildValue( "s#", buffer, bytesRead );
The s# notation means: "Convert a C string and its length to a Python str object using 'utf-8' encoding. If the C string pointer is NULL, the length is ignored and None is returned."
This will fail in general, as not all byte sequences are valid UTF-8. A sufficient fix might be to use y# instead, which converts a C string and its length to a Python bytes object without any decoding.
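To make "not all byte sequences are valid UTF-8" concrete, here is a minimal, self-contained sketch of a strict UTF-8 check in plain C. The function name is_valid_utf8 is hypothetical (it is not part of CPython or XRootD), and the sketch assumes the RFC 3629 rules that Python's strict 'utf-8' codec also enforces: no overlong encodings, no UTF-16 surrogates, nothing above U+10FFFF. A buffer this check rejects is one that s# cannot turn into a str, whereas y# would hand it back unchanged as bytes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical illustration helper: returns true iff buf[0..len) is
 * well-formed UTF-8 per RFC 3629. NUL bytes are allowed, since the
 * length is passed explicitly (as with the "s#" and "y#" formats). */
bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t extra;                       /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */

        if (b <= 0x7F) {                    /* single-byte (ASCII) */
            i++;
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) {
            extra = 1;
        } else if (b == 0xE0) {
            extra = 2; lo = 0xA0;           /* reject overlong 3-byte forms */
        } else if (b == 0xED) {
            extra = 2; hi = 0x9F;           /* reject surrogates U+D800..U+DFFF */
        } else if ((b >= 0xE1 && b <= 0xEC) || b == 0xEE || b == 0xEF) {
            extra = 2;
        } else if (b == 0xF0) {
            extra = 3; lo = 0x90;           /* reject overlong 4-byte forms */
        } else if (b >= 0xF1 && b <= 0xF3) {
            extra = 3;
        } else if (b == 0xF4) {
            extra = 3; hi = 0x8F;           /* reject code points above U+10FFFF */
        } else {
            /* 0x80..0xC1 (stray continuation or overlong lead) or 0xF5..0xFF */
            return false;
        }

        if (len - i - 1 < extra)
            return false;                   /* truncated multi-byte sequence */
        if (buf[i + 1] < lo || buf[i + 1] > hi)
            return false;
        for (size_t k = 2; k <= extra; k++) {
            if (buf[i + k] < 0x80 || buf[i + k] > 0xBF)
                return false;               /* not a continuation byte */
        }
        i += extra + 1;
    }
    return true;
}
```

Since a lone 0xFF byte is never valid UTF-8, arbitrary binary file contents will very often fail this check. One could check first and then pick s# or y#, but that would make the Python type returned by Read depend on the file's contents; returning bytes unconditionally keeps the type predictable.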