I have a text file which, according to the file command, is encoded as follows:
ISO-8859 text, with CRLF line terminators
This file contains French text with accents. My shell is able to display accents, and
emacs in console mode displays them correctly.
My problem is that the cat, more and less tools don't display this file correctly. I guess this means that these tools don't support this character encoding. Is this true? Which character encodings do these tools support?
Your shell can display accents because it is probably using UTF-8. Since the file in question uses a different encoding, less, more and cat try to read it as UTF-8 and fail. You can check your current encoding with echo $LANG.
You have two choices: you can either change your default encoding, or convert the file to UTF-8. To change your encoding, open a terminal and type:
$ echo $LANG
en_US.UTF-8
$ cat foo.txt
J'ai mal � la t�te, c'est chiant!
$ export LANG="fr_FR.ISO-8859-1"
$ xterm    # open a new terminal
$ cat foo.txt
J'ai mal à la tête, c'est chiant!
If you are using gnome-terminal or similar, you may need to activate the encoding in the terminal itself; in terminator, for example, right-click and select the encoding.
Your other (better) option is to change the file’s encoding:
$ cat foo.txt
J'ai mal � la t�te, c'est chiant!
$ iconv -f ISO-8859-1 -t UTF-8 foo.txt > bar.txt
$ cat bar.txt
J'ai mal à la tête, c'est chiant!
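If you would rather convert the file in place than keep two copies, be aware that redirecting iconv's output onto its own input (iconv ... foo.txt > foo.txt) truncates the file before it is read. Below is a minimal sketch of one way around that, assuming mktemp is available; the sample line written by printf is made up for the demonstration.

```shell
# Sample ISO-8859-1 file for the demo (\340 = à, \352 = ê); use your own file instead.
printf "J'ai mal \340 la t\352te\n" > foo.txt

# Convert via a temporary file and replace the original only if iconv succeeded.
tmp=$(mktemp) &&
    iconv -f ISO-8859-1 -t UTF-8 foo.txt > "$tmp" &&
    mv "$tmp" foo.txt

cat foo.txt
```

Note that the replaced file takes on the temporary file's permissions, so you may want to chmod it afterwards.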
ISO-8859 character encodings are a bit outdated for Linux systems. Your whole Linux system is likely using UTF-8 all the way, including your terminal emulator and your shell.
cat, more and less do not do any encoding transformation, so your ISO-8859/latin1 file ends up being treated as UTF-8, which will not work.
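To see why this fails, it can help to look at the raw bytes. The sketch below writes the same word in both encodings (the file names latin1.txt and utf8.txt are made up for the example), dumps the bytes with od, and uses a UTF-8 to UTF-8 pass through iconv as a rough validity check.

```shell
# The word "tête" in two encodings, written with octal escapes for the raw bytes
printf 't\352te\n'     > latin1.txt   # ê is the single byte 0xEA in ISO-8859-1
printf 't\303\252te\n' > utf8.txt     # ê is the two bytes 0xC3 0xAA in UTF-8

# Dump the raw bytes in hex
od -An -tx1 latin1.txt
od -An -tx1 utf8.txt

# iconv rejects input that is not valid in the source encoding,
# so a UTF-8 to UTF-8 pass works as a cheap validity check.
for f in latin1.txt utf8.txt; do
    if iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
        echo "$f: valid UTF-8"
    else
        echo "$f: not valid UTF-8"
    fi
done
```

A lone 0xEA byte followed by plain ASCII is not a complete UTF-8 sequence, which is why a UTF-8 terminal shows a replacement character in its place.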
If emacs is able to display them, it's because it tries to autodetect the encoding used and apparently succeeds. Tell emacs to save the file as UTF-8 and you will be able to use cat, more, less, grep or whatever on it.
If you know the exact character encoding (ISO-8859 is a collection of them, you have to know the exact one: ISO-8859-1 or ISO-8859-15 or worse), you can also convert your files from the command line:
iconv --from-code ISO-8859-15 --to-code UTF-8 your_file -o your_file_as_utf8
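Picking the right member of the ISO-8859 family matters because the same byte can map to different characters. As a small illustration (not taken from the original file), byte 0xA4 is the generic currency sign in ISO-8859-1 but the euro sign in ISO-8859-15:

```shell
# Byte 0xA4 (octal 244) decoded under two different ISO-8859 variants
printf '\244\n' | iconv -f ISO-8859-1  -t UTF-8    # prints ¤
printf '\244\n' | iconv -f ISO-8859-15 -t UTF-8    # prints €
```

For plain French text the two variants mostly agree, so a wrong guess tends to show up only in a few characters such as €, œ or Œ.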
Cat, More and Less are just doing their job of displaying the file. Translating between encodings isn’t in their job description. The encoding of newlines isn’t a problem as CRLF is displayed just like the normal line ending LF, but your terminal is probably expecting UTF-8-encoded text, which is the de facto standard nowadays.
Luit translates between supported encodings and UTF-8. You tell Luit which encoding to translate by setting the LC_CTYPE environment variable or with the -encoding option. For example, to display a latin-1 (a.k.a. ISO 8859-1) file:
LC_CTYPE=en_US luit less somefile
luit -encoding ISO8859-1 less somefile
If the file is in some exotic encoding that Luit doesn’t support, you can pipe it through a translator program. Iconv supports many encodings.
iconv -f latin1 somefile
iconv -f latin1 somefile | less
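If you do this often, the pipeline can be wrapped in a small shell function. The sketch below is only an illustration: viewenc is a made-up name, not a standard tool, and it assumes your terminal expects UTF-8.

```shell
# Hypothetical helper: page a file after converting it to UTF-8.
# Usage: viewenc FILE [ENCODING]   (ENCODING defaults to latin1)
viewenc() {
    # $PAGER is left unquoted on purpose so that values like "less -R" still work
    iconv -f "${2:-latin1}" -t UTF-8 "$1" | ${PAGER:-less}
}
```

With that defined, viewenc somefile pages a latin1 file, and viewenc somefile ISO-8859-15 picks another source encoding.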