Character encodings supported by more, cat and less

According to the file command, my text file is encoded as follows:

ISO-8859 text, with CRLF line terminators

This file contains French text with accents. My shell is able to display accents, and emacs in console mode displays them correctly.

My problem is that more, cat and less don’t display this file correctly. I guess this means that these tools don’t support this character encoding. Is this true? Which character encodings do these tools support?

Answers:

Method 1

Your terminal can display accents because it is probably using UTF-8. The file is in a different encoding, so when less, more and cat output its bytes, they are interpreted as UTF-8 and the accented characters fail to decode. You can check your current locale, which includes the encoding, with

echo $LANG
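
To see the mismatch directly, here is a minimal sketch that compares what the locale expects with the bytes actually stored in a file. The file name sample_latin1.txt is made up for illustration, and the sketch assumes a system with the locale, od and iconv commands available.

```shell
# Show the character set implied by the current locale (often UTF-8).
locale charmap

# Build a small Latin-1 sample: "tête" with ê stored as the single byte 0xEA.
# The file name is hypothetical.
printf 't\352te\n' > sample_latin1.txt

# Dump the raw bytes to see what cat would send to the terminal.
od -An -tx1 sample_latin1.txt    # prints: 74 ea 74 65 0a

# iconv rejects input that is not valid in the stated source encoding,
# so a UTF-8 to UTF-8 pass works as a validity check.
if iconv -f UTF-8 -t UTF-8 sample_latin1.txt > /dev/null 2>&1; then
    echo "valid UTF-8"
else
    echo "not valid UTF-8"
fi
```

A lone 0xEA byte followed by an ASCII letter is not a valid UTF-8 sequence, which is why a UTF-8 terminal shows a replacement character in its place.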

You have two choices: either change your default encoding or convert the file to UTF-8. To change your encoding, open a terminal and type the following (the locale has to be installed on your system; locale -a lists the available ones):
export LANG="fr_FR.ISO-8859-1"

For example:
$ echo $LANG 
en_US.UTF-8
$ cat foo.txt 
J'ai mal � la t�te, c'est chiant!
$ export LANG="fr_FR.ISO-8859-1"
$ xterm   # open a new terminal that inherits the new locale
$ cat foo.txt 
J'ai mal à la tête, c'est chiant!

If you are using gnome-terminal or a similar emulator, you may also need to select the encoding in the emulator itself. In terminator, for example, right-click in the window and choose the encoding from the context menu. gnome-terminal has a comparable character encoding setting.

Your other (better) option is to change the file’s encoding:

$ cat foo.txt 
J'ai mal � la t�te, c'est chiant!
$ iconv -f ISO-8859-1 -t UTF-8  foo.txt > bar.txt
$ cat bar.txt 
J'ai mal à la tête, c'est chiant!
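
Since the file in the question also has CRLF line terminators, the conversion can drop the carriage returns in the same pass. The following is only a sketch under that assumption; foo_sample.txt and bar_sample.txt are hypothetical names, and the sample is generated here so the commands can be run as-is.

```shell
# Build a sample ISO-8859-1 file with a CRLF line ending (hypothetical name).
# \340 is à and \352 is ê in ISO-8859-1.
printf 'J'\''ai mal \340 la t\352te\r\n' > foo_sample.txt

# Convert to UTF-8, then strip the carriage returns left over from CRLF.
iconv -f ISO-8859-1 -t UTF-8 foo_sample.txt | tr -d '\r' > bar_sample.txt

# Inspect the converted bytes: à becomes c3 a0 and ê becomes c3 aa.
od -An -tx1 bar_sample.txt
```

Writing to a second file rather than redirecting onto the input matters: the shell truncates the output file before iconv starts reading.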

Method 2

ISO-8859 character encodings are a bit outdated for Linux systems. Your whole Linux system is likely using UTF-8 all the way, including your terminal emulator and your shell.

However, cat, grep and less do not do any encoding conversion, so your ISO-8859/latin1 file ends up being treated as UTF-8, which will not work.

If emacs is able to display the accents, it’s because it tries to autodetect the encoding used and apparently succeeds. Tell emacs to save the file as UTF-8 and you will be able to use cat/grep/whatever on it.

If you know the exact character encoding (ISO-8859 is a collection of them, you have to know the exact one: ISO-8859-1 or ISO-8859-15 or worse), you can also convert your files from the command line:

iconv --from-code ISO-8859-15 --to-code UTF-8 your_file -o your_file_as_utf8
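
To illustrate why the exact member of the ISO-8859 family matters, here is a small sketch that decodes the same byte under two of them. Byte 0xA4 is the generic currency sign in ISO-8859-1 but the euro sign in ISO-8859-15; the choice of that byte as the example is mine.

```shell
# The same input byte, 0xA4, decoded as ISO-8859-1: the currency sign U+00A4.
printf '\244\n' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1    # prints: c2 a4 0a

# Decoded as ISO-8859-15 it is the euro sign U+20AC instead.
printf '\244\n' | iconv -f ISO-8859-15 -t UTF-8 | od -An -tx1   # prints: e2 82 ac 0a
```

Both conversions succeed without any error, so a wrong guess shows up only as wrong characters in the output.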

Method 3

cat, more and less are just doing their job of displaying the file. Translating between encodings isn’t in their job description. The CRLF line endings aren’t a problem, since CRLF is displayed just like the normal LF line ending, but your terminal is probably expecting UTF-8-encoded text, which is the de facto standard nowadays.

luit translates between supported encodings and UTF-8. You tell luit which encoding to translate from by setting the LC_CTYPE environment variable or with the -encoding option. For example, to display a Latin-1 (a.k.a. ISO 8859-1) file:

LC_CTYPE=en_US luit less somefile
luit -encoding ISO8859-1 less somefile

If the file is in some exotic encoding that luit doesn’t support, you can pipe it through a translator program such as iconv, which supports many encodings:
iconv -f latin1 somefile
iconv -f latin1 somefile | less
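
If you do this often, the pipeline can be wrapped in a small shell function. This is only a sketch: view_as_utf8 is a hypothetical name, not a standard command, and it assumes your terminal expects UTF-8. It falls back to less when PAGER is unset.

```shell
# view_as_utf8 ENCODING FILE
# Hypothetical helper: convert FILE from ENCODING to UTF-8 and page the result.
view_as_utf8() {
    if [ "$#" -ne 2 ]; then
        echo "usage: view_as_utf8 ENCODING FILE" >&2
        return 2
    fi
    # PAGER is left unquoted on purpose so it may carry options, e.g. "less -R".
    iconv -f "$1" -t UTF-8 "$2" | ${PAGER:-less}
}

# Example call (not run here): view_as_utf8 ISO-8859-1 somefile
```

The original file is left untouched; only the stream shown by the pager is converted.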


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.