In bash, how can I convert a Unicode Codepoint [0-9A-F] into a printable character?

I have a list of Unicode codepoints, but I don’t know of a “simple” way to convert these hex values into the actual characters they represent…

I’ve heard that zsh has echo -e '\u0965', but I use bash 4.1.

Is there something as simple as the zsh method, for bash?

Answers:

Method 1

You can use bash’s echo or /bin/echo from GNU coreutils in combination with iconv:

echo -ne '\x09\x65' | iconv -f utf-16be

By default, iconv converts to your locale’s encoding. Perl is perhaps more portable than relying on a specific shell or echo command. Almost every UNIX system I am aware of will have Perl available, and it even has several Windows ports.

perl -C -e 'print chr 0x0965'

Most of the time when I need to do this, I’m in an editor like Vim/GVim, which has built-in support. While in insert mode, hit Ctrl-V followed by u, then type four hex characters. If you want a character beyond U+FFFF, use a capital U and type 8 hex characters. Vim also supports custom keymaps, which are easy to make; a keymap converts a series of characters to another symbol. For example, I have a keymap I developed called www that converts TM to ™, (C) to ©, (R) to ®, and so on. I also have a keymap for Klingon for when that becomes necessary. I’m sure Emacs has something similar. If you are in a GTK+ app, which includes GVim and GNOME Terminal, you can try Control-Shift-u followed by 4 hex characters to create a Unicode character. I’m sure KDE/Qt has something similar.

UPDATE: As of Bash 4.2, it seems to be a built in feature now:

echo $'\u0965'

UPDATE: Also, nowadays a Python example would probably be preferred to Perl. This works in both Python 2 and 3:

python -c 'print(u"\u0965")'

Method 2

Bash 4.2 (released in 2011) added support for echo -e '\u0965'; printf '\u0965', printf %b '\u0965' and echo $'\u0965' also work.

http://tiswww.case.edu/php/chet/bash/FAQ:

o   $'...', echo, and printf understand \uXXXX and \UXXXXXXXX escape sequences.
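
Because these escapes only exist from bash 4.2 onward, a script that may also run on an older bash can check the version before relying on them. This is only a minimal sketch: the function name supports_unicode_escapes is hypothetical, and it assumes a version string in the usual major.minor.patch form that $BASH_VERSION uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bash): succeeds when the given version
# string, by default $BASH_VERSION, is 4.2 or newer.
# Assumption: the string starts with numeric "major.minor".
supports_unicode_escapes() {
    local ver major rest minor
    ver=${1:-${BASH_VERSION:-0.0}}
    major=${ver%%.*}      # text before the first dot
    rest=${ver#*.}        # text after the first dot
    minor=${rest%%.*}     # text up to the next dot
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 2 ]; }
}

if supports_unicode_escapes; then
    echo "Unicode escapes should be available in this shell"
else
    echo "fall back to iconv, Perl or Python"
fi
```

Passing a version string explicitly, for example supports_unicode_escapes "4.1.5(1)-release", makes it easy to try the check against versions other than the running shell.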

Method 3

If you have GNU coreutils, try printf:

$ printf '\u0965\n'
॥

echo can do the job if your console is using UTF-8 and you know the character’s UTF-8 encoding:

$ echo -e '\xE0\xA5\xA5'

You can find a table of Unicode to UTF-8 hex encodings here: http://www.utf8-chartable.de/. You can convert Unicode code points to their UTF-8 hex encoding using a number of scripting languages. Here is an example using Python 2 (unichr and the 'hex' codec do not exist in Python 3):

python -c "print(unichr(int('0965', 16)).encode('utf-8').encode('hex'))"

The following is a Perl script that will convert arguments to the correct hex value (many unnecessary parentheses here):

#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use Encode;

foreach (@ARGV) {
    say unpack('H*', encode('utf8', chr(hex($_))))
}

For instance,

./uni2utf 0965
e0a5a5

Of course, if you have Perl or Python you could also just use those to print the characters.
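
If neither the escapes nor another scripting language are available, the code point to UTF-8 conversion described above can also be done with shell arithmetic. The following is a rough sketch rather than a tested tool: the function name codepoint_to_utf8 is hypothetical, it expects plain hex digits such as 0965, and it does not reject surrogates or values above U+10FFFF.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the UTF-8 bytes for one hexadecimal code point
# to stdout, using only shell arithmetic and octal printf escapes.
# Assumption: $1 is valid hex (e.g. 0965) and a legal Unicode scalar value.
codepoint_to_utf8() {
    local cp bytes b
    cp=$(( 0x$1 ))
    if [ "$cp" -lt 128 ]; then        # 1 byte:  0xxxxxxx
        bytes="$cp"
    elif [ "$cp" -lt 2048 ]; then     # 2 bytes: 110xxxxx 10xxxxxx
        bytes="$(( 0xC0 | (cp >> 6) )) $(( 0x80 | (cp & 0x3F) ))"
    elif [ "$cp" -lt 65536 ]; then    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        bytes="$(( 0xE0 | (cp >> 12) )) $(( 0x80 | ((cp >> 6) & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))"
    else                              # 4 bytes: 11110xxx followed by three 10xxxxxx
        bytes="$(( 0xF0 | (cp >> 18) )) $(( 0x80 | ((cp >> 12) & 0x3F) )) $(( 0x80 | ((cp >> 6) & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))"
    fi
    # Emit each byte through an octal escape such as \340.
    for b in $bytes; do
        printf "\\$(printf '%03o' "$b")"
    done
}

# Convert a small list of code points, one character per line.
for cp in 0965 03BB 2211; do
    codepoint_to_utf8 "$cp"
    echo
done
```

Since the bytes are computed by hand, this does not depend on the shell version or the current locale; whether the result displays correctly still depends on the terminal using UTF-8.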

Method 4

UPDATE: Here is a bash way to do a single Unicode value (by “bash” I mean: not using any other scripting language), thanks to Gilles for a suggestion in this askubuntu Q/A.
According to this link: recode (Obsoletes iconv, dos2unix, unix2dos). Edit: but as per the comment below, “obsoletes” may just mean “alternative”.

      echo -n 0x0965 |recode UTF-16BE/x4..UTF-8

Here is a method to process a raw hex dump as input (i.e. no escape prefixes like \u0965, and no \x09\x65).
xxd is a hex-dump utility (packaged with vim-common) which can revert a raw hex dump to the characters the dump represents. For code points up to U+FFFF, the four hex digits of the code point are exactly its UTF-16 big-endian encoding, which is what the hex dump contains.
xxd in revert mode accepts a stream of hex values with line breaks, which are ignored.

This script creates a UTF-16BE stream, which it then reverts to the original chars.
The last line contains the two needed commands; xxd and iconv

for line in \
  "Matsuo Basho (1644-1694)" \
  "  pond" \
  "  frog jumps in" \
  "  plop!"
do 
  echo "$line" |iconv -f "$(locale charmap)" -t "UTF-16BE" |xxd -ps -u 
done |
#    (---this is the **revert** code---) 
tee >(xxd -p -u -r |iconv -f "UTF-16BE") ;echo

Here is the output (showing the UTF-16BE hex-dump input first).
Note: xxd segments its own output with a newline every 60 hex digits. The revert option ignores these newlines; it ignores all newlines, as they aren’t hex digits.

004D0061007400730075006F00200042006100730068006F002000280031
003600340034002D00310036003900340029000A
002000200070006F006E0064000A
0020002000660072006F00670020006A0075006D0070007300200069006E
000A
002000200070006C006F00700021000A

Matsuo Basho (1644-1694)
  pond
  frog jumps in
  plop!

Method 5

Using Pattern substitution in bash version 4.2 (and higher):

${parameter/pattern/string}

as described here http://steve-parker.org/sh/tips/pattern-substitution/

UNICODE_HEX="U+02211"
printf ${UNICODE_HEX/U+/"\U"}
∑

UNICODE_HEX="U+03BB"
printf ${UNICODE_HEX/U+/"\U"}
λ
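
When the U+XXXX values come from a list of unknown quality, it may help to validate and normalize them before handing them to printf. This is a sketch under assumptions: the helper name codepoint_to_escape is hypothetical, and it only builds the escape text; it does not print the character itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "U+03BB", "u+3bb" or "03BB" into the literal
# text \U000003BB, or fail if the input is not a valid code point.
codepoint_to_escape() {
    local hex
    hex=${1#[Uu]+}                      # drop an optional "U+" prefix
    case $hex in
        ''|*[!0-9A-Fa-f]*)
            echo "not a hex code point: $1" >&2
            return 1 ;;
    esac
    if [ "$(( 0x$hex ))" -gt 1114111 ]; then   # 1114111 is 0x10FFFF
        echo "beyond U+10FFFF: $1" >&2
        return 1
    fi
    printf '\\U%08X\n' "0x$hex"         # always eight hex digits
}

# In bash 4.2 or later with a UTF-8 locale, the result can be used as a
# printf format string:
#   printf "$(codepoint_to_escape U+03BB)\n"
```

Padding to eight digits avoids the case where a hex digit that happens to follow the code point in the surrounding text is read as part of the escape.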

Method 6

Assuming the default encoding for your OS is UTF-8 (true for most current distros)
then you can use bash directly to convert any UNICODE code point:

echo -e "Unicode Character 'DEVANAGARI DOUBLE DANDA' (U+0965) \U0965"

Of course, the glyph will appear correctly only if you have the correct font.
As of bash 4.3 all code points will work correctly. These two built-in options will also work:

printf "%b" "Unicode Character (U+0965) \U0965 \n"
echo $'Unicode Character (U+0965) \U0965'

Note that in bash 4.2 the Unicode code points from 0x80 to 0xFF are encoded incorrectly (a bash bug). To work around this issue, take a look at the program at this site (also good for a deep look into the issue of converting numbers to chars).

Method 7

Thanks @illucent
I can now see how to script Unicode mappings in filenames.
So, for anyone who’s interested, here is a script that can serve as a template for fixing some non-ASCII filenames.

function makelist {
    # Dry run by default (rename -n); pass any argument to really rename.
    RUN=false
    if [[ -n ${1+X} ]] ; then RUN=true; fi
    local -A UNI
    # U+2000..U+200F are replaced by a plain space.
    A=$( printf "%d\n" 0x2000 )
    B=$( printf "%d\n" 0x200f )
    for N in $( seq ${A} ${B} ) ; do
        V=$( printf "%04X\n" "${N}" )
        UNI["${V}"]=' '
    done
    # U+2010..U+2015 (hyphens and dashes) are replaced by an ASCII hyphen.
    A=$( printf "%d\n" 0x2010)
    B=$( printf "%d\n" 0x2015)
    for N in $( seq ${A} ${B} ) ; do
        V=$( printf "%04X\n" "${N}" )
        UNI["${V}"]='-'
    done
    # Individual characters and their ASCII replacements.
    local -A MORE=( [0xA1]='!' [0xBF]='?' [0x300]='' [0x301]='' [0xB4]="'" [0xAB]="(" [0xBB]=')' [0x60]="'" [0x2018]="'" [0x2019]="'" [0x201C]="'" [0x201D]="'" [0x2032]="'" [0x2035]="'" )
    for N in ${!MORE[@]} ; do
        V=$( printf "%04X\n" "${N}" )
        UNI["${V}"]="${MORE[$N]}"
    done
    if $RUN ; then
        ECHO= ;
    else
        ECHO='-n'
    fi

    for XX in "${!UNI[@]}" ; do
        UHEX=$( printf "U+%04X\n" 0x${XX} )
        # Turn U+XXXX into \UXXXX and let printf produce the character.
        CHR=$( printf "${UHEX/U+/\\U}" )
        printf -- "---------Search-> %s => [%s] => [%s] ----\n" "${XX}" "${CHR}" "${UNI[$XX]}"
        find * -name "*${CHR}*" -print -exec rename $ECHO -E "s/$CHR/${UNI[$XX]}/g;" {} \;
    done

}

But hey… it’s not an armourclad script, just a quickie to get something done, so use at your own peril.

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
