character-encoding Archives - Page 3 of 4

How to unquote a urlencoded unicode string in python?

August 13, 2022 by Magenaut

I have a Unicode string like “Tanım” which has somehow been encoded as “Tan%u0131m”. How can I convert this encoded string back to the original Unicode string?
Apparently urllib.unquote does not support Unicode.
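The %uXXXX form is not standard percent-encoding (it is what JavaScript's legacy escape() function produces), so the standard unquoting functions leave it untouched. Below is a minimal Python 3 sketch, where unquote lives in urllib.parse. It assumes every %u escape has exactly four hex digits naming a character in the Basic Multilingual Plane. The helper name unquote_percent_u is hypothetical.

```python
import re
from urllib.parse import unquote

# Matches the non-standard %uXXXX escape (four hex digits, BMP only).
_PERCENT_U = re.compile(r"%u([0-9A-Fa-f]{4})")

def unquote_percent_u(s: str) -> str:
    """Decode %uXXXX escapes first, then ordinary %XX (UTF-8) escapes."""
    # Replace each %uXXXX with the code point it names.
    s = _PERCENT_U.sub(lambda m: chr(int(m.group(1), 16)), s)
    # urllib.parse.unquote handles the standard %XX escapes.
    return unquote(s)

print(unquote_percent_u("Tan%u0131m"))  # Tanım
```

Because the sketch runs in two passes, %u0025 decodes to a literal % that the second pass may read as the start of another escape. Inputs that can contain that sequence need a single-pass decoder instead.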

Understanding Unix file name encoding

August 10, 2022 by Magenaut

I have a hard time understanding how file name encoding works. On unix.SE I find contradictory explanations.
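On Linux, the kernel treats a file name as an opaque sequence of bytes in which only / and the NUL byte are special. The character encoding those bytes represent is a convention of the user's locale, not something the file system records. The Python sketch below shows one consequence. A name that is not valid UTF-8 can still be decoded losslessly with the surrogateescape error handler, which is also what os.fsdecode uses on POSIX systems. Re-encoding then returns the original bytes. The helper names decode_name and encode_name are hypothetical.

```python
def decode_name(raw: bytes) -> str:
    # Undecodable bytes become lone surrogates (U+DC80..U+DCFF) instead of raising.
    return raw.decode("utf-8", errors="surrogateescape")

def encode_name(name: str) -> bytes:
    # The same error handler maps those surrogates back to the original bytes.
    return name.encode("utf-8", errors="surrogateescape")

latin1_name = "café.txt".encode("latin-1")  # b"caf\xe9.txt": not valid UTF-8
utf8_name = "café.txt".encode("utf-8")      # b"caf\xc3\xa9.txt"

print(repr(decode_name(latin1_name)))  # 'caf\udce9.txt'
print(repr(decode_name(utf8_name)))    # 'café.txt'
assert encode_name(decode_name(latin1_name)) == latin1_name
```

Both byte strings are legal names in the same directory, but to the kernel they are two different files, even though a human would read both as “café.txt”.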

Unexpected non-null encoding of /proc/pid/cmdline

August 10, 2022 by Magenaut

I am parsing the /proc/pid/cmdline value for a number of processes on my Linux system (Ubuntu 16.04) and have found that while most of the entries are null-delimited, as expected, at least one uses spaces as delimiters, which I did not expect.
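/proc/pid/cmdline exposes the memory area that holds the process's argument vector, so arguments are normally NUL-terminated. However, a process may overwrite that area (for example, to show a custom title in ps), and afterwards the area can hold a single space-separated string instead. Below is a hedged Python sketch of a parser with a heuristic fallback for that case. parse_cmdline and read_cmdline are hypothetical names. The fallback will wrongly split a genuine single argument that contains spaces.

```python
from pathlib import Path

def parse_cmdline(data: bytes) -> list[bytes]:
    # Arguments are normally separated (and terminated) by NUL bytes.
    stripped = data.rstrip(b"\0")
    if not stripped:
        return []  # kernel threads and zombies have an empty cmdline
    args = stripped.split(b"\0")
    # Heuristic: one NUL-free string containing spaces probably means the
    # process rewrote its argv area; split on runs of whitespace instead.
    if len(args) == 1 and b" " in args[0]:
        args = args[0].split()
    return args

def read_cmdline(pid: int) -> list[bytes]:
    # Read as bytes: arguments need not be valid in any particular encoding.
    return parse_cmdline(Path(f"/proc/{pid}/cmdline").read_bytes())
```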

How to determine the character encoding that a terminal uses in a C/C++ program?

August 10, 2022 by Magenaut

I’ve noticed that SyncTERM uses a different character encoding than the default macOS terminal emulator, and the two are incompatible. For example, say you want to print a block character in a format string. In SyncTERM, which uses the IBM Extended ASCII character encoding, you would use an octal escape sequence like \261. In Terminal.app (and probably iTerm2 as well), this just prints a question mark. Since these terminals use UTF-8, you need to use the \uxxxx escape sequence instead.
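In C, the usual way to find out which character encoding the environment claims is to call setlocale(LC_CTYPE, "") and then nl_langinfo(CODESET). Python's locale module wraps the same calls, so the sketch below uses Python. It is Unix only, because nl_langinfo is not available on Windows. This approach only reports what locale variables such as LANG say and never asks the terminal itself. If the locale does not match what the terminal actually does, which is plausible with SyncTERM, the answer is wrong. The helper names are illustrative, and so is the assumption that any non-UTF-8 terminal uses IBM code page 437.

```python
import locale

BLOCK = "\u2592"  # MEDIUM SHADE: byte 0xB1 (octal 261) in code page 437

def terminal_codeset() -> str:
    # Equivalent of setlocale(LC_CTYPE, "") + nl_langinfo(CODESET) in C.
    try:
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # the environment names a locale that is not installed; keep the current one
    return locale.nl_langinfo(locale.CODESET)

def encode_for_terminal(text: str, codeset: str) -> bytes:
    if codeset.upper().replace("-", "") == "UTF8":
        return text.encode("utf-8")  # U+2592 becomes b"\xe2\x96\x92"
    # Assumption: any other codeset is treated as IBM code page 437.
    return text.encode("cp437", errors="replace")

# Usage: sys.stdout.buffer.write(encode_for_terminal(BLOCK * 3 + "\n", terminal_codeset()))
```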

How to print all printable ASCII chars in CLI?

August 9, 2022 by Magenaut

How can I list all the printable ASCII characters in the terminal?
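Printable ASCII is the contiguous range from 0x20 (space) to 0x7E (~), 95 characters in all. A short Python sketch that prints them follows. Note that Python's string.printable is a different set, because it also includes whitespace such as tab and newline.

```python
def printable_ascii() -> str:
    # 0x20 (space) up to and including 0x7E (tilde); 0x7F (DEL) is a control character.
    return "".join(chr(code) for code in range(0x20, 0x7F))

print(printable_ascii())
```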

Some apps don’t accept some characters from the «Compose» key

August 8, 2022 by Magenaut

The problem is that the Compose key works fine, but some applications don’t accept some of the characters it produces. E.g. I can type the character ∞ in Emacs (Compose+8+8), but this won’t work in Firefox, Konsole and Kate. Yet many other characters, e.g. €, typed there (in Firefox, Konsole and Kate) with Compose work just fine. I can also insert the problematic symbols with a simple copy-paste (from either of the two clipboards).

How to use grep/ack with files in arbitrary encoding?

August 8, 2022 by Magenaut

On my Linux desktop I have a UTF-8 locale. When I try to search some KOI8-R encoded files with grep (ack), it fails. If I manually encode the pattern into KOI8-R and pass that as an argument, it works.
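A pattern typed in a UTF-8 locale is made of UTF-8 bytes, and those byte sequences do not occur in KOI8-R text, so grep finds nothing. One workaround is to decode the file before matching. In a shell, that means piping the file through iconv -f KOI8-R -t UTF-8 into grep. The same idea as a hedged Python sketch is shown below. grep_decoded is a hypothetical helper, and the file's encoding must be known in advance.

```python
import re
from typing import Iterator, Tuple

def grep_decoded(data: bytes, pattern: str, encoding: str = "koi8_r") -> Iterator[Tuple[int, str]]:
    # Decode the whole file with its real encoding; the pattern stays a normal str.
    text = data.decode(encoding, errors="replace")
    regex = re.compile(pattern)
    for lineno, line in enumerate(text.splitlines(), start=1):
        if regex.search(line):
            yield lineno, line

sample = "привет, мир\nhello\n".encode("koi8_r")
for lineno, line in grep_decoded(sample, "мир"):
    print(lineno, line)
```

For a real file, pass Path(name).read_bytes() as data.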

Script failing with “command not found: ^M”

August 8, 2022 by Magenaut

When I try to run the following script in zsh, via the command /bin/zsh ~/.set_color_scheme.sh, I get the following error:
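^M is how the shell displays a carriage return (0x0D). The most common cause is a script saved with Windows-style CRLF line endings. In that case, every line ends in an invisible \r, which zsh reads as part of the command. Tools such as dos2unix fix this. Below is a minimal Python equivalent, which assumes the whole file should use Unix LF endings. The helper names are hypothetical.

```python
from pathlib import Path

def to_unix_newlines(data: bytes) -> bytes:
    # Turn CRLF pairs into LF; lone CRs elsewhere are left untouched.
    return data.replace(b"\r\n", b"\n")

def fix_script(path: str) -> bool:
    """Rewrite the file in place; return True if anything changed."""
    p = Path(path).expanduser()
    original = p.read_bytes()
    fixed = to_unix_newlines(original)
    if fixed != original:
        p.write_bytes(fixed)
    return fixed != original

# Example (path taken from the question): fix_script("~/.set_color_scheme.sh")
```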

Bulk rename (or correctly display) files with special characters

August 7, 2022 by Magenaut

I have a bunch of directories and subdirectories that contain files with special characters, like this file:
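The excerpt does not say which encoding the problem names use, so the following is only a sketch under an explicit assumption: any name that is not valid UTF-8 is treated as Latin-1 and re-encoded as UTF-8. The code walks the tree bottom-up, so directories are renamed after their contents. It works on bytes paths so that undecodable names never raise errors, and it defaults to a dry run. The helper names and the Latin-1 guess are hypothetical; substitute the real source encoding.

```python
import os
from typing import List, Tuple

def fixed_name(raw: bytes, assumed: str = "latin-1") -> bytes:
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, leave it alone
    except UnicodeDecodeError:
        # Assumption: invalid UTF-8 means the name is in `assumed`.
        # Latin-1 decodes every byte; other codecs may raise here.
        return raw.decode(assumed).encode("utf-8")

def bulk_rename(root: bytes, assumed: str = "latin-1", dry_run: bool = True) -> List[Tuple[bytes, bytes]]:
    changes = []
    # topdown=False visits children before their parent directory.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            new = fixed_name(name, assumed)
            if new == name:
                continue
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new)
            if os.path.lexists(new_path):
                continue  # don't clobber an existing entry; resolve it by hand
            changes.append((old_path, new_path))
            if not dry_run:
                os.rename(old_path, new_path)
    return changes
```

Review the list returned by bulk_rename(b".") first, then run it again with dry_run=False.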

How can I correctly decompress a ZIP archive of files with Hebrew names?

August 7, 2022 by Magenaut

Someone sent me a ZIP file containing files with Hebrew names (created on Windows, not sure with which tool). I use LXDE on Debian Stretch. The Gnome archive manager manages to unzip the file, but the Hebrew characters are garbled. I think I’m getting UTF-8 octets extended into Unicode characters. For example, I have a file whose name has four characters and a .doc suffix, and the characters are: 0x008E 0x0087 0x008E 0x0085. Using the command-line unzip utility is even worse: it refuses to decompress altogether, complaining about an “Invalid or incomplete multibyte or wide character”.
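When a ZIP entry's UTF-8 flag (general-purpose bit 11) is not set, the name is stored in a legacy code page that the archive does not record, so readers have to guess. Python's zipfile, for example, assumes code page 437. The code points in the question, 0x8E 0x87 0x8E 0x85, all fall in the range 0x80 to 0x9A. That is exactly where code page 862 (DOS Hebrew) places the Hebrew letters, so CP862 is a plausible guess for an archive made on a Hebrew Windows system. The hedged Python sketch below undoes zipfile's CP437 decoding and re-decodes the name with a chosen code page. The helper names and the CP862 default are assumptions; try cp1255 if the result looks wrong. On Python 3.11 and later, zipfile.ZipFile(path, metadata_encoding="cp862") achieves the same result without the workaround.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: file name is stored as UTF-8

def recover_name(info: zipfile.ZipInfo, legacy: str = "cp862") -> str:
    if info.flag_bits & UTF8_FLAG:
        return info.filename  # already decoded correctly
    # zipfile decoded the raw bytes as cp437; reverse that to get the bytes back.
    raw = info.filename.encode("cp437")
    return raw.decode(legacy, errors="replace")

def extract_all(zip_path: str, dest: str, legacy: str = "cp862") -> None:
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            # Only the target name changes; zipfile verifies the entry against
            # orig_filename, which is left untouched.
            info.filename = recover_name(info, legacy)
            zf.extract(info, dest)
```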