unicode Archives

How to set sys.stdout encoding in Python 3?

August 21, 2022 by Magenaut

Setting the default output encoding in Python 2 is a well-known idiom:

How to get string objects instead of Unicode from JSON?

August 21, 2022 by Magenaut

I’m using Python 2 to parse JSON from ASCII encoded text files.

u'\ufeff' in Python string

August 20, 2022 by Magenaut

I got an error with the following exception message:

Normalizing Unicode

August 20, 2022 by Magenaut

Is there a standard way in Python to normalize a Unicode string so that it contains only the simplest Unicode entities that can be used to represent it?
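As a minimal sketch of what such normalization can look like, the standard library's `unicodedata.normalize` converts between composed and decomposed forms. The sample strings are illustrative assumptions, and which form counts as "simplest" depends on the use case:

```python
import unicodedata

# "é" written as two code points: the letter "e" plus a combining acute accent.
decomposed = "e\u0301"

# NFC composes the sequence into the single precomposed code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# NFD goes the other way and splits U+00E9 back into base letter + accent.
round_trip = unicodedata.normalize("NFD", composed)

print(len(decomposed), len(composed), len(round_trip))
```

NFC is usually the form to pick when the goal is the fewest code points. NFD is handy when the accents need to be inspected or stripped separately.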

Python regex matching Unicode properties

August 20, 2022 by Magenaut

Perl and some other current regex engines support Unicode properties, such as the category, in a regex. E.g. in Perl you can use \p{Ll} to match an arbitrary lower-case letter, or \p{Zs} for any space separator. I don’t see support for this in either the 2.x or 3.x lines of Python (with due regrets). Is anybody aware of a good strategy to get a similar effect? Homegrown solutions are welcome.
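One homegrown approach, sketched here under the assumption that filtering characters one at a time is acceptable, is to look up each character's general category with the standard library's `unicodedata.category`. The helper name `chars_in_category` and the sample string are hypothetical. The third-party `regex` package is another option, since it accepts `\p{...}` classes directly.

```python
import unicodedata


def chars_in_category(text, category):
    """Return the characters of `text` whose Unicode general category
    equals `category` (for example "Ll" or "Zs").

    Hypothetical helper for illustration; it is not a regex engine.
    """
    return [ch for ch in text if unicodedata.category(ch) == category]


# Illustrative sample: lower-case letters, upper-case letters,
# a no-break space (U+00A0) and an ordinary space.
sample = "a\u00c9\u00df\u00a0 Z"

lowercase = chars_in_category(sample, "Ll")  # rough analogue of \p{Ll}
spaces = chars_in_category(sample, "Zs")     # rough analogue of \p{Zs}

print(lowercase)
print(spaces)
```

This only classifies single characters. Building a character class for the built-in `re` module from these categories is possible but can produce very large patterns.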

Unicode (UTF-8) reading and writing to files in Python

August 20, 2022 by Magenaut

I’m having some brain failure in understanding reading and writing text to a file (Python 2.4).

How can I convert surrogate pairs to normal string in Python?

August 20, 2022 by Magenaut

This is a follow-up to Converting to Emoji. In that question, the OP had a json.dumps()-encoded file with an emoji represented as a surrogate pair – \ud83d\ude4f. S/he was having problems reading the file and translating the emoji correctly, and the correct answer was to json.loads() each line from the file, and the json module would handle the conversion from surrogate pair back to (I’m assuming UTF8-encoded) emoji.
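The `json.loads()` behaviour described above can be checked with a short sketch, assuming Python 3. The variable names are illustrative. In Python 3 the result is a `str` holding a single code point rather than UTF-8 bytes. The second half shows a different technique, not taken from the question, for a string that already contains two lone surrogates.

```python
import json

# A JSON string literal as json.dumps() writes it by default:
# the emoji is escaped as a UTF-16 surrogate pair.
encoded = '"\\ud83d\\ude4f"'

# json.loads() combines the escaped pair into one code point, U+1F64F.
decoded = json.loads(encoded)
print(len(decoded), hex(ord(decoded)))

# Alternative (an assumption, not the approach from the question): when a
# Python str already holds two separate surrogates, round-tripping through
# UTF-16 with the "surrogatepass" error handler joins them.
broken = "\ud83d\ude4f"
repaired = broken.encode("utf-16", "surrogatepass").decode("utf-16")
print(len(broken), len(repaired))
```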

removing emojis from a string in Python

Writing Unicode text to a text file?

Replace non-ASCII characters with a single space