How can I enable UTF-8 support in the Linux console?
Right now, it looks like this:
If I execute the following simple script:
I am using Ubuntu 14.04 and 16.04. On 14.04 I additionally installed Terminology.
I sometimes need to type “alt codes” to enter symbols. On Linux, pressing Shift+Ctrl+U and then typing the hex code can be inefficient: for example, § requires Shift+Ctrl+U followed by 00a7, whereas on Windows it’s just Alt+21.
I use xterm (X-Win32 2012 Build 30 from StarNet Communications Corp) to log in from a Windows 7 PC to a Red Hat Enterprise Linux 6 (RHEL6) machine.
I’m trying to use the tr command from the Heirloom Toolchest to work around a current limitation of the coreutils implementation, so that I can “pump” (with the -dc options) multibyte characters from a “random” generator (/dev/urandom) to the terminal. Note that I compiled it from source on ArchBang after failing to build the AUR version(s).
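For comparison, here is a minimal sketch of what a character-aware `tr -dc SET` does, written in Python rather than with Heirloom tr (a swapped-in technique, not the tr implementation). It assumes the input stream is meant to be UTF-8: the random bytes are decoded with invalid sequences discarded, and every character not in the allowed set is then deleted. The function names `keep_only` and `random_chars` are hypothetical.

```python
import os


def keep_only(data: bytes, allowed: str) -> str:
    """Analogue of `tr -dc SET` on characters rather than bytes.

    Bytes that do not form valid UTF-8 are silently dropped by the
    decoder; every remaining character outside `allowed` is deleted.
    """
    text = data.decode("utf-8", errors="ignore")
    allowed_set = set(allowed)
    return "".join(ch for ch in text if ch in allowed_set)


def random_chars(allowed: str, n_bytes: int = 1 << 20) -> str:
    """Filter n_bytes of OS randomness (like /dev/urandom) down to `allowed`."""
    return keep_only(os.urandom(n_bytes), allowed)
```

Because a specific multibyte sequence such as `c3 a9` (é) appears only rarely in random bytes, the output for non-ASCII sets is sparse; that is inherent to the delete-the-complement approach, not to the decoding.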
In Unicode, some character combinations have more than one representation.
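A common example is “é”, which can be stored either as the single precomposed code point U+00E9 or as “e” followed by U+0301 COMBINING ACUTE ACCENT. The sketch below uses Python’s standard `unicodedata.normalize` to show that the two strings differ code point by code point but become identical after normalization to NFC (composed) or NFD (decomposed).

```python
import unicodedata

precomposed = "\u00e9"   # é as one code point
decomposed = "e\u0301"   # e + combining acute accent

# Different code point sequences, although they render the same.
print(precomposed == decomposed)

# NFC composes, NFD decomposes; after normalization they compare equal.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)
```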
Someone sent me a ZIP file containing files with Hebrew names (created on Windows, though I’m not sure with which tool). I use LXDE on Debian Stretch. The GNOME archive manager manages to unzip the file, but the Hebrew characters are garbled. I think I’m getting UTF-8 octets extended into Unicode characters; e.g. I have a file whose name has four characters and a .doc suffix, and the characters are 0x008E 0x0087 0x008E 0x0085. Using the command-line unzip utility is even worse – it refuses to decompress at all, complaining about an “Invalid or incomplete multibyte or wide character”.
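One way to check the UTF-8 hypothesis is to pack the four code points back into bytes (all lie in U+0080–U+009F, which is what raw bytes look like when read as ISO-8859-1) and try decoding them. The sketch below does that in Python. It also tries `cp862`, the DOS Hebrew code page; that is an assumption about which legacy encoding the Windows archiver used, not something the file itself confirms. The helper name `try_decodings` is hypothetical.

```python
def try_decodings(code_points, encodings=("utf-8", "cp862")):
    """Reinterpret code points 0x00-0xFF as raw bytes and try each encoding."""
    raw = bytes(code_points)
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError as exc:
            # e.g. 0x8E is a UTF-8 continuation byte and cannot start a character
            results[enc] = f"<not valid {enc}: {exc.reason}>"
    return results


for enc, text in try_decodings([0x8E, 0x87, 0x8E, 0x85]).items():
    print(enc, text)
```

If the `cp862` result reads as a sensible Hebrew word, that suggests the names were stored in a DOS code page rather than UTF-8.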
I have a text file in an unknown or mixed encoding. I want to see the lines that contain a byte sequence that is not valid UTF-8 (by piping the text file into some program). Equivalently, I want to filter out the lines that are valid UTF-8. In other words, I’m looking for grep [notutf8].
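A minimal sketch of such a `grep [notutf8]` filter in Python, assuming it is acceptable to use Python instead of grep: it reads the input as raw bytes, tries to decode each line as UTF-8, and emits only the lines that fail. Checking line by line is sound because the newline byte `0x0A` never occurs inside a valid multibyte UTF-8 sequence. The names `invalid_utf8_lines` and `main` are hypothetical.

```python
import sys


def invalid_utf8_lines(lines):
    """Yield each raw byte line that is not valid UTF-8."""
    for raw in lines:
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            yield raw


def main() -> None:
    # Use the binary buffers so Python does no decoding of its own.
    for raw in invalid_utf8_lines(sys.stdin.buffer):
        sys.stdout.buffer.write(raw)
```

To use it as a filter, add a call to `main()` at the end of the file and run something like `python3 notutf8.py < file.txt`. Inverting the test (yield on success) gives the opposite filter, i.e. only the valid UTF-8 lines.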
I already know about vim -b; however, depending on the locale in use, it displays multi-byte characters (such as UTF-8 sequences) as single letters.
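A hex dump sidesteps the locale entirely, because it prints every byte on its own; existing tools such as `xxd` (shipped with vim) and `od` do this. As an illustration of the idea, here is a minimal Python sketch (the `hex_dump` name is hypothetical) that shows the offset, each byte in hex, and an ASCII column where any byte outside printable ASCII, including every byte of a UTF-8 multibyte sequence, appears as `.`.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Return an xxd-like dump: offset, hex bytes, printable-ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable and non-ASCII bytes are shown as '.'
        text_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


print(hex_dump("aé".encode("utf-8")))
```

Here “é” shows up as the two separate bytes `c3 a9` regardless of the terminal’s locale.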