unicode Archives - Page 10 of 11

How can I enable UTF-8 support in the Linux console?

August 9, 2022 by Magenaut

Right now, it looks like this:

Why is printf “shrinking” umlaut?

August 9, 2022 by Magenaut

If I execute the following simple script:

Why are these fancy characters not shown in my prompt?

August 8, 2022 by Magenaut

I am using Ubuntu 14.04 and 16.04. On 14.04 I additionally installed Terminology.

Linux alternative to alt+numpad codes

August 8, 2022 by Magenaut

I sometimes need to type “alt codes” to get symbols, and on Linux it can be inefficient to press Shift+Ctrl+U and then type the code, for example Shift+Ctrl+U followed by 00a7 for §, when on Windows it’s Alt+21.

How to make the login shell xterm use utf-8?

August 8, 2022 by Magenaut

I use xterm (X-Win32 2012 Build 30 from StarNet Communications Corp) to log in from a Windows 7 PC to a Red Hat Enterprise Linux 6 (RHEL6) machine.

Heirloom Toolchest tr: error(s) trying to delete the complement of a set containing a multibyte character?

August 8, 2022 by Magenaut

I’m trying to use the tr command from the Heirloom Toolchest to overcome a current limitation of the coreutils implementation, so as to be able to “pump” (with the -dc options) multibyte characters from a “random” generator (/dev/urandom) to the terminal. Noteworthy is the fact that this has been compiled from source on Archbang after having failed to do so using the AUR version(s).

Convert between Unicode Normalization Forms on the unix command-line

August 7, 2022 by Magenaut

In Unicode, some character combinations have more than one representation.
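As a small illustration of that statement, the sketch below uses Python's standard `unicodedata` module to convert between two normalization forms. The helper name `normalize` and the sample character are chosen here for illustration only; this is not taken from the original post.

```python
import unicodedata

def normalize(text, form="NFC"):
    """Return text converted to the given Unicode normalization form.

    form is one of "NFC", "NFD", "NFKC", "NFKD".
    """
    return unicodedata.normalize(form, text)

# "é" as one precomposed code point (U+00E9) ...
composed = "\u00e9"
# ... and as "e" followed by a combining acute accent (U+0301).
decomposed = "e\u0301"

# The two strings render the same but compare unequal until normalized.
same_before = composed == decomposed
same_after = normalize(composed, "NFD") == normalize(decomposed, "NFD")
```

Here `same_before` is false and `same_after` is true, which is the practical reason for normalizing text before comparing it.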

How can I correctly decompress a ZIP archive of files with Hebrew names?

August 7, 2022 by Magenaut

Someone sent me a ZIP file containing files with Hebrew names (created on Windows, not sure with which tool). I use LXDE on Debian Stretch. The Gnome archive manager manages to unzip the file, but the Hebrew characters are garbled. I think I’m getting UTF-8 octets extended into Unicode characters, e.g. I have a file whose name has four characters and a .doc suffix, and the characters are: 0x008E 0x0087 0x008E 0x0085. Using the command-line unzip utility is even worse – it refuses to decompress altogether, complaining about an “Invalid or incomplete multibyte or wide character”.
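One way to examine the four code points quoted above is to turn them back into raw bytes and test a few decodings. The sketch below does that in Python. Trying CP862, a legacy DOS Hebrew code page, is purely an assumption made here for illustration; the excerpt does not say which encoding the archive actually used.

```python
# The four garbled characters reported in the question, as code points.
garbled = "\u008e\u0087\u008e\u0085"

# Latin-1 maps code points 0x00-0xFF to the same byte values, so this
# recovers the raw bytes that were "extended" into Unicode characters.
raw = garbled.encode("latin-1")

# Check whether those bytes could be UTF-8 at all.
try:
    raw.decode("utf-8")
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False

# Assumption: try a legacy Hebrew code page instead (CP862 is a guess).
candidate = raw.decode("cp862")
all_hebrew = all("\u05d0" <= ch <= "\u05ea" for ch in candidate)
```

With these inputs the bytes are not valid UTF-8, while the CP862 reading yields four Hebrew letters. That is consistent with a legacy code page rather than UTF-8, but it does not prove it.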

Filtering invalid utf8

August 7, 2022 by Magenaut

I have a text file in an unknown or mixed encoding. I want to see the lines that contain a byte sequence that is not valid UTF-8 (by piping the text file into some program). Equivalently, I want to filter out the lines that are valid UTF-8. In other words, I’m looking for grep [notutf8].
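A minimal sketch of such a filter, assuming Python is available: read the input as bytes, split it into lines, and keep only the lines that fail a strict UTF-8 decode. The function names `is_valid_utf8` and `invalid_lines` are made up for this example.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def invalid_lines(stream):
    """Yield each line (as bytes) of a binary stream that is not valid UTF-8."""
    for line in stream:
        if not is_valid_utf8(line):
            yield line
```

To use it as a pipe filter, iterate `invalid_lines(sys.stdin.buffer)` and write each result to `sys.stdout.buffer`, so the offending bytes pass through unchanged.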

Can vim display ASCII characters only, and treat other bytes as binary data?

August 7, 2022 by Magenaut

I already know vim -b, however, depending on the locale used, it displays multi-byte characters (like UTF-8) as single letters.