On my Linux desktop I have a UTF-8 locale. When I try to search some KOI8-R encoded files with grep (ack), it fails. If I manually encode the pattern into KOI8-R and pass that as an argument, it works.
Is it possible to tell grep what encoding to use for the pattern? Or any other tool?
Answers:
Method 1
If all the files you’re searching in have the same encoding:
LC_CTYPE=ru_RU.KOI8-R luit ack-grep "$(echo 'привет' | iconv -t KOI8-R)" *.txt
Or, in bash or zsh, with a here-string instead of echo:
LC_CTYPE=ru_RU.KOI8-R luit ack-grep "$(iconv -t KOI8-R <<<'привет')" *.txt
Or start a child shell in the desired encoding:
$ LC_CTYPE=ru_RU.KOI8-R luit
$ ack-grep 'привет' *.txt
$ exit
Luit (shipped with XFree86 and X.org) runs the program specified on its command line in the locale given by the LC_CTYPE setting, assuming a UTF-8 terminal. When no program is specified, it starts a shell, which is what the child-shell variant relies on. So the command runs in the desired locale, and Luit translates its terminal output to UTF-8.
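If luit is not available, roughly the same effect can be sketched with iconv alone: re-encode the pattern to KOI8-R, let grep compare raw bytes in the C locale, and convert the matches back to UTF-8 for the terminal. This is not part of the original answer; the function name koi8_grep is hypothetical, and the sketch assumes plain grep (with the common -H option) rather than ack-grep, and ASCII file names, since the whole output stream is converted from KOI8-R.

```shell
#!/bin/sh
# Hypothetical helper: search KOI8-R files for a UTF-8 pattern without luit.
# Usage: koi8_grep PATTERN FILE...
koi8_grep() {
    pattern=$1
    shift
    # Re-encode the pattern from UTF-8 into the encoding used by the files.
    koi8_pattern=$(printf '%s' "$pattern" | iconv -f UTF-8 -t KOI8-R) || return 2
    # LC_ALL=C makes grep match byte by byte; the matching lines (still
    # KOI8-R) are then converted to UTF-8 for display.
    # Assumption: file names are ASCII, because they pass through iconv too.
    LC_ALL=C grep -H -e "$koi8_pattern" -- "$@" | iconv -f KOI8-R -t UTF-8
}
```

For example, koi8_grep 'привет' *.txt prints each matching line prefixed with its file name. Because matching is done on bytes, locale-aware features such as case-insensitive search (-i) will not work on Cyrillic text with this approach.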
Another approach, if you have a directory tree with a lot of files in a different encoding, is to mount a view of that directory tree in your preferred encoding. I think the fuseflt filesystem can do this (untested).
mkdir /utf8-view
fuseflt iconv-koi8r-utf8.conf /some/dir /utf8-view
ack-grep 'привет' /utf8-view/*.txt.utf8
fusermount -u /utf8-view
where the configuration file iconv-koi8r-utf8.conf contains
ext_in =
ext_out = *.utf8
flt_in =
flt_out = .utf8
flt_cmd = iconv -f KOI8-R -t UTF-8
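Since the fuseflt setup is untested, a FUSE-free sketch of the same idea may be useful: walk the tree with find and convert each file to UTF-8 on the fly before it reaches grep. This is an illustration, not the original author's method. The function name koi8_tree_grep is hypothetical, the *.txt filter is an assumption carried over from the examples above, and --label is a GNU grep option (also present in some BSD greps).

```shell
#!/bin/sh
# Hypothetical helper: recursively search *.txt files under DIR, converting
# each one from KOI8-R to UTF-8 on the fly.
# Usage: koi8_tree_grep PATTERN DIR
koi8_tree_grep() {
    find "$2" -type f -name '*.txt' -exec sh -c '
        pattern=$1
        shift
        for file do
            # grep reads the converted text from stdin; --label makes -H
            # print the real file name instead of "(standard input)".
            iconv -f KOI8-R -t UTF-8 "$file" |
                grep -H --label="$file" -e "$pattern"
        done
    ' sh "$1" {} +
}
```

For example, koi8_tree_grep 'привет' /some/dir searches every .txt file below /some/dir. Unlike the mounted view, this runs iconv once per file on every search, so it trades speed for not needing FUSE.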
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.