Character encodings supported by more, cat and less
According to the file command, my text file is encoded as:
ISO-8859 text, with CRLF line terminators
This file contains French text with accents. My shell is able to display accents, and emacs in console mode displays these accents correctly.
My problem is that the more, cat and less tools don’t display this file correctly. I guess this means that these tools don’t support this character encoding. Is this true? Which character encodings do these tools support?
Answers:
Method 1
Your shell can display accents because your terminal is probably using UTF-8. Since the file in question uses a different encoding, less, more and cat end up treating it as UTF-8, and the accented characters come out garbled. You can check your current encoding with
echo $LANG
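LANG is only part of the picture: LC_ALL and LC_CTYPE take precedence over it for character handling. A more direct check is to ask which character map the active locale uses. A minimal sketch, assuming a glibc-style locale command (the variable name charset is just for illustration):

```shell
#!/bin/sh
# Show every locale variable currently in effect (LANG, LC_CTYPE, LC_ALL, ...).
locale

# Show only the codeset of the active LC_CTYPE, e.g. "UTF-8" on most
# modern systems, or "ANSI_X3.4-1968" in the plain C locale.
charset=$(locale charmap)
echo "Text is expected to be in: $charset"
```

If this prints UTF-8, any ISO-8859 file shown with cat, more or less will look garbled until it is converted or the locale is changed.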
You have two choices: either change your default encoding, or convert the file to UTF-8. To change your encoding, open a terminal and type the following (the fr_FR.ISO-8859-1 locale must be installed on your system; locale -a lists the available ones):
export LANG="fr_FR.ISO-8859-1"
For example:
$ echo $LANG
en_US.UTF-8
$ cat foo.txt
J'ai mal � la t�te, c'est chiant!
$ export LANG="fr_FR.ISO-8859-1"
$ xterm <-- open a new terminal
$ cat foo.txt
J'ai mal à la tête, c'est chiant!
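Exporting LANG only helps if the target locale has actually been generated; otherwise the C library silently falls back. The sketch below checks this before switching. It assumes glibc, whose locale -a prints codesets in normalized form (lowercase, punctuation removed, so ISO-8859-1 appears as iso88591); has_locale is a hypothetical helper, not a standard command.

```shell
#!/bin/sh
# Hypothetical helper: succeed if locale "$1" is listed by `locale -a`.
# glibc normalizes codeset names (ISO-8859-1 -> iso88591), so the codeset
# part of "$1" is normalized the same way, then compared case-insensitively.
has_locale() {
    want=$1
    case $want in
        *.*)
            # Keep letters, digits and '@' (for modifiers such as @euro).
            codeset=$(printf '%s' "${want#*.}" | tr -cd '[:alnum:]@')
            want="${want%%.*}.$codeset"
            ;;
    esac
    locale -a | grep -Fixq -- "$want"
}

if has_locale fr_FR.ISO-8859-1; then
    export LANG=fr_FR.ISO-8859-1
else
    echo "fr_FR.ISO-8859-1 is not installed; generate it first (localedef, or locale-gen on Debian/Ubuntu)" >&2
fi
```

If the locale is missing, generate it with your distribution's tools before opening the new terminal.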
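If you are using gnome-terminal or a similar terminal emulator, you may also need to switch the terminal's own encoding. In terminator, for example, right-click and pick the encoding from the context menu; gnome-terminal offers a comparable character-encoding setting.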
Your other (better) option is to change the file’s encoding:
$ cat foo.txt
J'ai mal � la t�te, c'est chiant!
$ iconv -f ISO-8859-1 -t UTF-8 foo.txt > bar.txt
$ cat bar.txt
J'ai mal à la tête, c'est chiant!
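To see what the conversion does at the byte level, here is a self-contained sketch. It builds a small ISO-8859-1 sample with CRLF line endings (the sample text and file names are made up for the example), converts it with iconv, and optionally strips the carriage returns with tr.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# ISO-8859-1 sample with a CRLF line ending, like the file in the question.
# Octal escapes: \351 = é, \350 = è, \352 = ê (one byte each in Latin-1).
printf 'Caf\351 cr\350me, t\352te\r\n' > latin1.txt

# Re-encode to UTF-8: each accented letter becomes a two-byte sequence.
iconv -f ISO-8859-1 -t UTF-8 latin1.txt > utf8.txt

# Optional: drop the CR of each CRLF pair to get Unix (LF) line endings.
tr -d '\r' < utf8.txt > utf8-unix.txt

cat utf8-unix.txt
```

Redirecting to a new file matters: writing iconv's output to its own input file (iconv ... foo.txt > foo.txt) would truncate the input before it is read.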
Method 2
ISO-8859 character encodings are somewhat outdated on Linux systems. Your whole Linux system, including your terminal emulator and your shell, is most likely using UTF-8 throughout.
However, cat, grep and less do not perform any encoding conversion: they will treat your ISO-8859 (Latin-1) file as UTF-8, which does not work.
If emacs is able to display the accents, it is because it tries to autodetect the encoding and apparently succeeds. Tell emacs to save the file as UTF-8 (for example with C-x RET f utf-8 RET followed by C-x C-s) and you will be able to use cat, grep or any other tool on it.
If you know the exact character encoding (ISO-8859 is a family of encodings, so you need the exact one: ISO-8859-1, ISO-8859-15 or another member), you can also convert your files from the command line:
iconv --from-code ISO-8859-15 --to-code UTF-8 your_file -o your_file_as_utf8
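Picking the right member of the ISO-8859 family matters because the same byte can mean different characters. The sketch below decodes byte 0xA4 (octal 244) both ways. The sample line and file names are made up for illustration.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# One line whose last byte is 0xA4.
printf 'Prix : 5 \244\n' > price.txt

# ISO-8859-1 maps 0xA4 to the currency sign U+00A4 (¤) ...
iconv -f ISO-8859-1 -t UTF-8 price.txt > as_latin1.txt
# ... while ISO-8859-15 (Latin-9) maps it to the euro sign U+20AC (€).
iconv -f ISO-8859-15 -t UTF-8 price.txt > as_latin9.txt

cat as_latin1.txt as_latin9.txt
```

Both conversions succeed without any error, so a wrong guess will not be reported by iconv; you have to check the output yourself.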
Method 3
cat, more and less are just doing their job of displaying the file; translating between encodings isn’t part of their job description. The line endings aren’t the problem, since CRLF is displayed just like the normal LF line ending, but your terminal is probably expecting UTF-8-encoded text, which is the de facto standard nowadays.
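If you want to confirm that the file really uses CRLF line endings (which file reports as "with CRLF line terminators"), you can count the lines that end in a carriage return. A minimal POSIX sh sketch; the sample file is made up for the example.

```shell
#!/bin/sh
cd "$(mktemp -d)"

# Two-line sample with DOS/Windows (CRLF) line endings.
printf 'ligne un\r\nligne deux\r\n' > crlf.txt

# A literal CR character, so the pattern stays portable across shells.
cr=$(printf '\r')

# Count lines whose last character before the newline is a CR.
crlf_lines=$(grep -c "$cr\$" crlf.txt)
echo "$crlf_lines CRLF lines"

# od -c shows the raw bytes, including \r \n at each line end.
od -c crlf.txt
```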
Luit translates between supported encodings and UTF-8. You tell luit which encoding to translate either by setting the LC_CTYPE environment variable or with the -encoding option. For example, to display a Latin-1 (a.k.a. ISO 8859-1) file:
LC_CTYPE=en_US luit less somefile
luit -encoding ISO8859-1 less somefile
If the file is in some exotic encoding that luit doesn’t support, you can pipe it through a conversion program instead; iconv supports many encodings:
iconv -f latin1 somefile
iconv -f latin1 somefile | less
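The two approaches can be combined in a small wrapper. This is a sketch, and show_latin1 is a hypothetical helper: it uses luit when luit is installed and standard output is a terminal (luit is meant to sit between a program and a terminal), and otherwise falls back to converting with iconv and piping into the pager.

```shell
#!/bin/sh
# Hypothetical helper: page a Latin-1 file on a UTF-8 terminal.
# ${PAGER:-less} is left unquoted on purpose so that PAGER='less -R' works.
show_latin1() {
    if [ -t 1 ] && command -v luit >/dev/null 2>&1; then
        # Real terminal and luit available: let luit translate.
        luit -encoding ISO8859-1 ${PAGER:-less} "$1"
    else
        # No luit, or output goes to a pipe/file: convert explicitly.
        iconv -f latin1 -t UTF-8 "$1" | ${PAGER:-less}
    fi
}

# Example: a one-line Latin-1 file ("Café"), paged through cat so the
# result can be captured. Inside $(...) stdout is a pipe, so iconv is used.
cd "$(mktemp -d)"
printf 'Caf\351\n' > sample.txt
out=$(PAGER=cat show_latin1 sample.txt)
echo "$out"
```

Interactively you would just run show_latin1 somefile and get less.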
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.