Bash script to get ASCII values for alphabet

How do I get the ASCII value of the alphabet?

For example, 97 for a?

Answers:

Method 1

Define these two functions (usually available in other languages):

chr() {
  [ "$1" -lt 256 ] || return 1
  printf "\\$(printf '%03o' "$1")"
}

ord() {
  LC_CTYPE=C printf '%d' "'$1"
}

Usage:

chr 65
A

ord A
65
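
As a quick check of the two functions, the following sketch walks the lowercase codes 97 to 122 and converts each one to a character and back. The loop and the variable names code and letter are illustrative additions, not part of the original answer.

```shell
# chr: print the character for a decimal code (values below 256 only).
chr() {
  [ "$1" -lt 256 ] || return 1
  # The inner printf renders the code as three octal digits;
  # the outer printf interprets the resulting \ooo escape.
  printf "\\$(printf '%03o' "$1")"
}

# ord: print the decimal code of the first character of $1.
ord() {
  LC_CTYPE=C printf '%d' "'$1"
}

# Convert each code to a letter, then back to a number.
code=97
while [ "$code" -le 122 ]; do
  letter=$(chr "$code")
  printf '%s %d\n' "$letter" "$(ord "$letter")"
  code=$((code + 1))
done
```

The first line printed is "a 97" and the last is "z 122".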

Method 2

You can see the entire set with:

$ man ascii

You’ll get tables in octal, hex, and decimal.

Method 3

This works well,

echo "A" | tr -d "\n" | od -An -t uC

echo "A"                              ### Emit a character.
         | tr -d "\n"                ### Remove the "newline" character.
                      | od -An -t uC  ### Use od (octal dump) to print:
                                      ### -An  means Address none
                                      ### -t  select a type
                                      ###  u  type is unsigned decimal.
                                      ###  C  of size (one) char.

exactly equivalent to:

echo -n "A" | od -An -tuC        ### Not all shells honor the '-n'.
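
Since not every shell honors echo -n, a variant built on printf may be more portable. This is a minimal sketch; the function name byte_values is made up for illustration, and it prints one decimal value per byte of its argument.

```shell
# Hypothetical helper: print the decimal value of every byte in $1, one per line.
byte_values() {
  # printf '%s' emits the string without a trailing newline,
  # so there is nothing to strip before od sees it.
  printf '%s' "$1" | od -An -tuC | tr -s ' ' '\n' | sed '/^$/d'
}

byte_values "A"      # 65
byte_values "abc"    # 97, 98 and 99 on separate lines
```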

Method 4

If you want to extend it to UTF-8 characters (assuming you’re in a UTF-8 locale):

$ perl -CA -le 'print ord shift' 😈
128520

$ perl -CS -le 'print chr shift' 128520
😈

With bash, ksh or zsh builtins:

$ printf "\U$(printf %08x 128520)\n"
😈

Method 5

I’m going for the simple (and elegant?) Bash solution:

for i in {a..z}; do echo $(printf "%s %d" "$i" "'$i"); done

In a script you can use the following:

CharValue="A"
AscValue=`printf "%d" "'$CharValue"`

Notice the single quote before $CharValue. It is required…
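
The {a..z} brace expansion is not available in every POSIX shell. As a sketch that avoids it, and also avoids the extra echo and command substitution, the letters can be listed explicitly. The function name alphabet_codes is an illustrative assumption.

```shell
# Hypothetical helper: list each lowercase letter with its decimal code.
alphabet_codes() {
  for ch in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
    # The leading single quote tells printf to use the character's numeric value.
    printf '%s %d\n' "$ch" "'$ch"
  done
}

alphabet_codes
```

The output runs from "a 97" to "z 122", one pair per line.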

Method 6

ctbl()  for O                   in      0 1 2 3
        do  for o               in      0 1 2 3 4 5 6 7
                do for  _o      in      7 6 5 4 3 2 1 0
                        do      case    $((_o=(_o+=O*100+o*10)?_o:200)) in
                                (*00|*77) set   "${1:+ \"}\\$_o${1:-\"}";;
                                (140|42)  set   '\\'"\\$_o$1"           ;;
                                (*)       set   "\\$_o$1"               ;esac
                        done;   printf   "$1";   shift
                done
        done
eval '
ctbl(){
        ${1:+":"}       return "$((OPTARG=0))"
        set     "" ""   "${1%"${1#?}"}"
        for     c in    ${a+"a=$a"} ${b+"b=$b"} ${c+"c=$c"}\
                        ${LC_ALL+"LC_ALL=$LC_ALL"}
        do      while   case  $c in     (*\'\''*) ;; (*) ! \
                                 set "" "${c%%=*}='''${c#*=}$1''' $2" "$3"
                        esac;do  set    "'"'''${c##*'}"'$@";  c=${c%'''*}
        done;   done;   LC_ALL=C a=$3 c=;set "" "$2 OPTARG='''${#a}*("
        while   [ 0 -ne "${#a}" ]
        do      case $a in      ([[:print:][:cntrl:]]*)
                        case    $a in   (['"$(printf \\1-\\77)"']*)
                                        b=0;;   (*)     b=1
                        esac;;  (['"$(  printf  \\200-\\277)"']*)
                                        b=2;;   (*)     b=3
                esac;    set    '"$(ctbl)"'     "$@"
                eval "   set    "${$((b+1))%"'''"${a%"${a#?}"}"*}" "$6"'''
                a=${a#?};set    "$((b=b*100+${#1}+${#1}/8*2)))" \
                                "$2(o$((c+=1))=$b)>=(d$c=$((0$b)))|"
        done;   eval "   unset   LC_ALL  a b c;${2%?})'''"
        return  "$((${OPTARG%%\**}-1))"
}'

The first ctbl() – at the top there – only ever runs the one time. It generates the following output (which has been filtered through sed -n l for printability’s sake):

ctbl | sed -n l

 "\200\001\002\003\004\005\006\a\b\t$
\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\
\035\036\037 !\\"#$%&'()*+,-./0123456789:;<=>?" "@ABCDEFGHIJKLMNOPQRS\
TUVWXYZ[\\]^_\\`abcdefghijklmnopqrstuvwxyz{|}~\177" "\200\201\202\203\
\204\205\206\207\210\211\212\213\214\215\216\217\220\221\222\223\224\
\225\226\227\230\231\232\233\234\235\236\237\240\241\242\243\244\245\
\246\247\250\251\252\253\254\255\256\257\260\261\262\263\264\265\266\
\267\270\271\272\273\274\275\276\277" "\300\301\302\303\304\305\306\
\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\327\
\330\331\332\333\334\335\336\337\340\341\342\343\344\345\346\347\350\
\351\352\353\354\355\356\357\360\361\362\363\364\365\366\367\370\371\
\372\373\374\375\376\377"$

…which are all 8-bit bytes (less NUL), divided into four shell-quoted strings split evenly at 64-byte boundaries. The strings might be represented with octal ranges like \200\1-\77, \100-\177, \200-\277, \300-\377, where byte 128 is used as a placeholder for NUL.

The first ctbl()’s entire purpose is to generate those strings so that eval may define the second ctbl() function with them literally embedded thereafter. In that way they can be referenced in the function without needing to generate them again each time they are needed. When eval does define the second ctbl() function the first will cease to be.

The top half of the second ctbl() function is mostly ancillary here – it is designed to portably and safely serialize any current shell state it might affect when it is called. The top loop will quote any quotes in the values of any variables it might want to use, and then stack all of the results in its positional parameters.

The first two lines, though, first immediately return 0 and set $OPTARG to same if the function’s first argument does not contain at least one character. And if it does, the second line immediately truncates its first argument to only its first character – because the function only handles a character at a time. Importantly, it does this in the current locale context, which means that if a character might comprise more than a single byte, then, provided the shell properly supports multi-byte chars, it will not discard any bytes except those which are not in the first character of its first argument.

        ${1:+":"}       return "$((OPTARG=0))"
        set     "" ""   "${1%"${1#?}"}"

It then does the save loop if at all necessary, and afterward it redefines the current locale context to the C locale for every category by assigning to the LC_ALL variable. From this point on, a character can only consist of a single byte, and so if there were multiple bytes in the first character of its first argument, these should now be each addressable as individual characters in their own right.

        LC_ALL=C

It is for this reason that the second half of the function is a while loop, as opposed to a singly run sequence. In most cases it will probably execute only once per call, but, if the shell in which ctbl() is defined properly handles multi-byte characters, it might loop.

        while   [ 0 -ne "${#a}" ]
        do      case $a in      ([[:print:][:cntrl:]]*)
                        case    $a in   (['"$(printf \\1-\\77)"']*)
                                        b=0;;   (*)     b=1
                        esac;;  (['"$(  printf  \\200-\\277)"']*)
                                        b=2;;   (*)     b=3
                esac;    set    '"$(ctbl)"'     "$@"

Note that the above $(ctbl) command substitution is only ever evaluated once – by eval when the function is initially defined – and that forever after that token is replaced with the literal output of that command substitution as saved into the shell’s memory. The same is true of the two case pattern command substitutions. This function does not ever call a subshell or any other command. It will also never attempt to read or write input/output (except in the case of some shell diagnostic message – which probably indicates a bug).

Also note that the test for loop continuity is not simply [ -n "$a" ], because, as I found to my frustration, for some reason a bash shell does:

char=$(printf \\1)
[ -n "$char" ] || echo but it\'s not null!

but it's not null!

…and so I explicitly compare $a’s len to 0 for each iteration, which, also inexplicably, behaves differently (read: correctly).

The case checks the first byte for inclusion in any of our four strings and stores a reference to the byte’s set in $b. Afterward the shell’s first four positional parameters are set to the strings embedded by eval and written by ctbl()’s predecessor.

Next, whatever remains of the first argument is again temporarily truncated to its first character – which should now be assured to be a single byte. This first byte is used as a reference to strip from the tail of the string which it matched, and the reference in $b is eval’d to represent a positional parameter so everything from the reference byte to the last byte in the string can be substituted away. The other three strings are dropped from the positional parameters entirely.

               eval "   set    "${$((b+1))%"'''"${a%"${a#?}"}"*}" "$6"'''
               a=${a#?};set    "$((b=b*100+${#1}+${#1}/8*2)))" \
                                "$2(o$((c+=1))=$b)>=(d$c=$((0$b)))|"

At this point the byte’s value (modulo 64) can be referenced as the string’s len:

str=$(printf '\200\1\2\3\4\5\6\7')
ref=$(printf \\4)
str=${str%"$ref"*}
echo "${#str}"

4
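
The same length-as-index trick can be tried with printable characters, which makes it easier to follow at a terminal. This is only a sketch: the helper name index_of is hypothetical, and it assumes the needle occurs exactly once in the string, as each byte does in ctbl()'s tables.

```shell
# Hypothetical helper: position of a character within a string of unique characters.
index_of() {
  haystack=$1 needle=$2
  # Drop the needle and everything after it; what remains is the prefix before it.
  prefix=${haystack%"$needle"*}
  # The length of that prefix is the zero-based position of the needle.
  printf '%d\n' "${#prefix}"
}

index_of abcdefgh e   # 4
index_of abcdefgh a   # 0
```

If the needle is absent, nothing is stripped and the full length of the string is printed instead.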

A little math is then done to reconcile the modulus based on the value in $b, the first byte in $a is permanently stripped away, and output for the current cycle is appended to a stack pending completion before the loop recycles to check if $a is actually empty.

    eval "   unset   LC_ALL  a b c;${2%?})'''"
    return  "$((${OPTARG%%\**}-1))"

When $a definitely is empty, all names and state – with the exception of $OPTARG – that the function affected throughout the course of its execution are restored to their previous state – whether set and not null, set and null, or unset – and the output is saved to $OPTARG as the function returns. The actual return value is one less than the total number of bytes in the first character of its first argument – so any single byte character returns zero and any multi-byte char will return more than zero – and its output format is a little strange.

The value ctbl() saves to $OPTARG is a valid shell arithmetic expression that, if evaluated, will concurrently set variable names of the forms $o1, $d1, $o2, $d2 to the octal and decimal values of all respective bytes in the first character of its first argument, but ultimately evaluate to the total number of bytes in that character. I had a specific kind of workflow in mind when writing this, and I think maybe a demonstration is in order.

I often find a reason to take a string apart with getopts like:

str=some\ string OPTIND=1
while   getopts : na  -"$str"
do      printf %s\\n "$OPTARG"
done

s
o
m
e

s
t
r
i
n
g

I probably do a little more than just print it a char per line, but anything’s possible. In any case, I haven’t yet found a getopts that will properly do (strike that – dash’s getopts does it char by char, but bash definitely doesn’t):

str=ŐőŒœŔŕŖŗŘřŚśŜŝŞş  OPTIND=1
while   getopts : na  -"$str"
do      printf %s\\n "$OPTARG"
done|   od -tc

0000000 305  \n 220  \n 305  \n 221  \n 305  \n 222  \n 305  \n 223  \n
0000020 305  \n 224  \n 305  \n 225  \n 305  \n 226  \n 305  \n 227  \n
0000040 305  \n 230  \n 305  \n 231  \n 305  \n 232  \n 305  \n 233  \n
0000060 305  \n 234  \n 305  \n 235  \n 305  \n 236  \n 305  \n 237  \n
0000100

Ok. So I tried…

str=ŐőŒœŔŕŖŗŘřŚśŜŝŞş
while   [ 0 -ne "${#str}" ]
do      printf %c\\n "$str"    #identical results for %.1s
        str=${str#?}
done|   od -tc

#dash
0000000 305  \n 220  \n 305  \n 221  \n 305  \n 222  \n 305  \n 223  \n
0000020 305  \n 224  \n 305  \n 225  \n 305  \n 226  \n 305  \n 227  \n
0000040 305  \n 230  \n 305  \n 231  \n 305  \n 232  \n 305  \n 233  \n
0000060 305  \n 234  \n 305  \n 235  \n 305  \n 236  \n 305  \n 237  \n
0000100

#bash
0000000 305  \n 305  \n 305  \n 305  \n 305  \n 305  \n 305  \n 305  \n
*
0000040

That kind of workflow – the byte for byte/char for char kind – is one I often get into when doing tty stuff. At the leading edge of input you need to know char values as soon as you read them, and you need their sizes (especially when counting columns), and you need characters to be whole characters.

And so now I have ctbl():

str=ŐőŒœŔŕŖŗŘřŚśŜŝŞş
while [ 0 -ne "${#str}" ]
do    ctbl "$str"
      printf "%.$(($OPTARG))s\t::\t$OPTARG\t::\t$?\t::\t\\$o1\\$o2\n" "$str"
      str=${str#?}
done

Ő   ::  2*((o1=305)>=(d1=197)|(o2=220)>=(d2=144))   ::  1   ::  Ő
ő   ::  2*((o1=305)>=(d1=197)|(o2=221)>=(d2=145))   ::  1   ::  ő
Œ   ::  2*((o1=305)>=(d1=197)|(o2=222)>=(d2=146))   ::  1   ::  Œ
œ   ::  2*((o1=305)>=(d1=197)|(o2=223)>=(d2=147))   ::  1   ::  œ
Ŕ   ::  2*((o1=305)>=(d1=197)|(o2=224)>=(d2=148))   ::  1   ::  Ŕ
ŕ   ::  2*((o1=305)>=(d1=197)|(o2=225)>=(d2=149))   ::  1   ::  ŕ
Ŗ   ::  2*((o1=305)>=(d1=197)|(o2=226)>=(d2=150))   ::  1   ::  Ŗ
ŗ   ::  2*((o1=305)>=(d1=197)|(o2=227)>=(d2=151))   ::  1   ::  ŗ
Ř   ::  2*((o1=305)>=(d1=197)|(o2=230)>=(d2=152))   ::  1   ::  Ř
ř   ::  2*((o1=305)>=(d1=197)|(o2=231)>=(d2=153))   ::  1   ::  ř
Ś   ::  2*((o1=305)>=(d1=197)|(o2=232)>=(d2=154))   ::  1   ::  Ś
ś   ::  2*((o1=305)>=(d1=197)|(o2=233)>=(d2=155))   ::  1   ::  ś
Ŝ   ::  2*((o1=305)>=(d1=197)|(o2=234)>=(d2=156))   ::  1   ::  Ŝ
ŝ   ::  2*((o1=305)>=(d1=197)|(o2=235)>=(d2=157))   ::  1   ::  ŝ
Ş   ::  2*((o1=305)>=(d1=197)|(o2=236)>=(d2=158))   ::  1   ::  Ş
ş   ::  2*((o1=305)>=(d1=197)|(o2=237)>=(d2=159))   ::  1   ::  ş

Note that ctbl() doesn’t actually define the $[od][12...] variables – it never has any lasting effect on any state but $OPTARG – but only puts the string in $OPTARG that can be used to define them – which is how I get the second copy of each char above by doing printf "\\$o1\\$o2" because they are set each time I evaluate $(($OPTARG)). But where I do it I’m also declaring a field length modifier to printf’s %s string argument format, and because the expression always evaluates to the total number of bytes in a character, I get the whole character on output when I do:

printf %.2s "$str"
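
To see what evaluating such an expression does without ctbl() itself, here is a small sketch that takes the string shown for Ő in the output above and evaluates it by hand. The variable names expr_str and nbytes are illustrative only.

```shell
# One expression of the kind ctbl() leaves in $OPTARG (the one shown for Ő).
expr_str='2*((o1=305)>=(d1=197)|(o2=220)>=(d2=144))'

# Evaluating it assigns o1, d1, o2 and d2 as a side effect
# and yields the number of bytes in the character.
nbytes=$(($expr_str))

printf 'bytes=%d o1=%s d1=%s o2=%s d2=%s\n' "$nbytes" "$o1" "$d1" "$o2" "$d2"

# The octal digits can be fed back to printf as escapes to rebuild the bytes.
printf "\\$o1\\$o2\n"
```

The first printf reports bytes=2 o1=305 d1=197 o2=220 d2=144, and the second emits the two bytes \305\220, which a UTF-8 terminal displays as Ő.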

Method 7

Not a shell script, but works

awk 'BEGIN{for( i=97; i<=122;i++) printf "%c %d\n",i,i }'

Sample output

xieerqi:$ awk 'BEGIN{for( i=97; i<=122;i++) printf "%c %d\n",i,i }' | head -n 5
a 97
b 98
c 99
d 100
e 101
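
The same one-liner can be wrapped so that the range is passed in rather than hard-coded. This is a sketch under assumptions: the function name ascii_range and the awk variable names first and last are invented for illustration.

```shell
# Hypothetical wrapper: print "char code" pairs for an inclusive range of codes.
ascii_range() {
  # Adding 0 forces awk to treat the -v values as numbers,
  # so %c prints the character with that code.
  awk -v first="$1" -v last="$2" \
    'BEGIN { for (i = first + 0; i <= last + 0; i++) printf "%c %d\n", i, i }'
}

ascii_range 65 70   # A 65 through F 70, one pair per line
```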

Method 8

  • select the symbol, then press CTRL+C
  • open konsole
  • and type: xxd<press enter>
  • then press <SHIFT+INSERT><CTRL+D>

you get something like:

mariank@dd903c5n1 ~ $ xxd
û0000000: fb

you know the symbol you pasted has hex code 0xfb
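
For a non-interactive version of the same idea, the symbol can be piped in instead of pasted. The sketch below swaps xxd for od, which POSIX specifies, so it is a substitution rather than the method above; the function name hex_bytes is hypothetical.

```shell
# Hypothetical helper: print the bytes of $1 as one run of hex digits.
hex_bytes() {
  # od -An -tx1 prints each byte as two hex digits; tr removes the spacing.
  printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
  echo
}

hex_bytes A    # 41
hex_bytes az   # 617a
```

A character that occupies several bytes in the current encoding prints all of them, so the length of the output also shows how many bytes the symbol takes.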

Method 9

If you want to print out the decimal representation of the UTF-8 value, I endorse dsmsk80’s solution. If, on the other hand, you need to assign the value to a variable, there is a mechanism within Bash’s printf that works faster. Let us assume that you want to assign the ASCII value of “A” (which is 65 in decimal, and which we have assigned to a variable, theChar) to a variable myVar. Inlining dsmsk80’s ord() function we would get:

LC_CTYPE=C myVar=$(printf "%d" "'$theChar")

In order for this assignment to take place, the value harvested from '$theChar must be formatted in decimal characters and then parsed from decimal to the number 65 that is then stored in myVar. To avoid this formatting and parsing we can take advantage of the -v flag for printf, which assigns the value to be printed directly. The syntax is as follows:

LC_CTYPE=C printf -v myVar "%d" "'$theChar"

I discovered this because I needed to create a Bash script that gave me the Fowler-Noll-Vo hash for each line of a text file, which I quote here:

#!/bin/bash

export LC_CTYPE=C
prime=16777619                                             #FNV prime
ofset=2166136261                                           #FNV offset
mask=0xffffffff                                            #bitmask
cat "$1" | while IFS= read -r line || [[ -n $line ]]       #foreach line in file (w/o end return)
do
    hash=$ofset                                            #set hash to offset for line.
    for (( i=0; i<${#line}; i++ ))                         #foreach char in line
    do
        printf -v charVal "%d" "'${line:$i:1}"             #use printf -v trick.
        hash=$(( ( ( hash ^ charVal ) * prime ) & mask ))  #update FNV1-a hash for char.
    done
    printf "%08X\n" $hash                                 #print hash result for line.
done

Using the -v option, for assignment with printf, resulted in a 50X performance improvement, when run in Cygwin64.
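
To check the arithmetic against known FNV-1a values, the inner loop can be lifted into a function. This is a minimal sketch that assumes bash (printf -v and ${s:i:1} are not POSIX) and ASCII-only input; the function name fnv1a32 is made up for illustration.

```shell
# Hypothetical function: FNV-1a 32-bit hash of an ASCII string, as 8 hex digits.
fnv1a32() {
  local s=$1 hash=2166136261 prime=16777619 mask=0xffffffff i charVal
  for (( i = 0; i < ${#s}; i++ )); do
    # printf -v stores the code directly in charVal, with no command substitution.
    printf -v charVal '%d' "'${s:i:1}"
    # XOR the byte in, multiply by the FNV prime, keep the low 32 bits.
    hash=$(( ((hash ^ charVal) * prime) & mask ))
  done
  printf '%08X\n' "$hash"
}

fnv1a32 ""    # 811C9DC5 (the offset basis, since no bytes are mixed in)
fnv1a32 "a"   # E40C292C
```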


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
