How to efficiently generate large, uniformly distributed, random integers in bash?

I have been wondering what would be the best way to get good randomness in bash, i.e., what would be a procedure to get a random positive integer between MIN and MAX such that

The range can be arbitrarily large (or at least, say, up to 2³²-1);
Values are uniformly distributed (i.e., no bias);
It is efficient.

An efficient way to get randomness in bash is to use the $RANDOM variable. However, this only samples a value between 0 and 2¹⁵-1, which may not be large enough for all purposes. People typically use a modulo to get it into the range they want, e.g.,

MIN=0
MAX=12345
rnd=$(( $RANDOM % ($MAX + 1 - $MIN) + $MIN ))

This, additionally, creates a bias unless $MAX happens to divide 2¹⁵-1=32767. E.g., if $MIN is 0 and $MAX is 9, then the values 0 through 7 are slightly more probable than the values 8 and 9, as $RANDOM will never be 32768 or 32769. This bias gets worse as the range increases, e.g., if $MIN is 0 and $MAX is 9999, then the numbers 0 through 2767 have a probability of ⁴/₃₂₇₆₇, while the numbers 2768 through 9999 only have a probability of ³/₃₂₇₆₇.

So while the above method fulfills condition 3, it does not fulfill conditions 1 and 2.

The best method that I came up with so far in trying to satisfy conditions 1 and 2 was to use /dev/urandom as follows:

MIN=0
MAX=1234567890
while
  rnd=$(cat /dev/urandom | tr -dc 0-9 | fold -w${#MAX} | head -1 | sed 's/^0*//;')
  [ -z $rnd ] && rnd=0
  (( $rnd < $MIN || $rnd > $MAX ))
do :
done

Basically, just collect randomness from /dev/urandom (might consider to use /dev/random instead if a cryptographically strong pseudorandom number generator is desired, and if you have lots of time, or else maybe a hardware random number generator), delete every character that’s not a decimal digit, fold the output to the length of $MAX and cut leading 0’s. If we happened to only get 0’s then $rnd is empty, so in this case set rnd to 0. Check if the result is outside our range and if so, then repeat. I forced the “body” of the while loop into the guard here so as to force execution of the body at least once, in the spirit of emulating a do ... while loop, since rnd is undefined to start with.

I think I fulfilled conditions 1 and 2 here, but now I screwed up condition 3. It’s kinda slow. Takes up to a second or so (tenth of a second when I’m lucky). Actually, the loop is not even guaranteed to terminate (although the probability of termination converges to 1 as time increases).

Is there an efficient way to get unbiased random integers, within a pre-specified and potentially large range, in bash? (I’ll continue to investigate when time allows, but in the meantime I thought someone here might have a cool idea!)

Contents hide

Table of Answers

Answers:

Method 1

Method 2

So how does it work?

Method 3

Method 4

Method 5

Method 6

Method 7

Method 8

Method 9

Method 10

Table of Answers

The most basic (and hence portable) idea is to generate a random bitstring just long enough. There are different ways of generating a random bitstring, either using bash’s built-in $RANDOM variable or using od and /dev/urandom (or /dev/random). If the random number is greater than $MAX, start over.
Alternatively, it is possible to use external tools.
- The Perl solution
  - Pro: quite portable, simple, flexible
  - Contra: not for very large numbers above 2³²-1
- The Python solution
  - Pro: simple, flexible, works even for large numbers
  - Contra: less portable
- The zsh solution
  - Pro: good for people who use zsh anyway
  - Contra: probably even less portable

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

I see another interesting method from here.

rand=$(openssl rand 4 | od -DAn)

This one also seems to be a good option. It reads 4 bytes from the random device and formats them as unsigned integer between 0 and 2^32-1.

rand=$(od -N 4 -t uL -An /dev/urandom | tr -d " ")

Method 2

Thank you all for all your great answers. I ended up with the following solution, that I would like to share.

Before I go into any more detail about the whys and hows, here’s the tl;dr: my shiny new script 🙂

#!/usr/bin/env bash
#
# Generates a random integer in a given range

# computes the ceiling of log2
# i.e., for parameter x returns the lowest integer l such that 2**l >= x
log2() {
  local x=$1 n=1 l=0
  while (( x>n && n>0 ))
  do
    let n*=2 l++
  done
  echo $l
}

# uses $RANDOM to generate an n-bit random bitstring uniformly at random
#  (if we assume $RANDOM is uniformly distributed)
# takes the length n of the bitstring as parameter, n can be up to 60 bits
get_n_rand_bits() {
  local n=$1 rnd=$RANDOM rnd_bitlen=15
  while (( rnd_bitlen < n ))
  do
    rnd=$(( rnd<<15|$RANDOM ))
    let rnd_bitlen+=15
  done
  echo $(( rnd>>(rnd_bitlen-n) ))
}

# alternative implementation of get_n_rand_bits:
# uses /dev/urandom to generate an n-bit random bitstring uniformly at random
#  (if we assume /dev/urandom is uniformly distributed)
# takes the length n of the bitstring as parameter, n can be up to 56 bits
get_n_rand_bits_alt() {
  local n=$1
  local nb_bytes=$(( (n+7)/8 ))
  local rnd=$(od --read-bytes=$nb_bytes --address-radix=n --format=uL /dev/urandom | tr --delete " ")
  echo $(( rnd>>(nb_bytes*8-n) ))
}

# for parameter max, generates an integer in the range {0..max} uniformly at random
# max can be an arbitrary integer, needs not be a power of 2
rand() {
  local rnd max=$1
  # get number of bits needed to represent $max
  local bitlen=$(log2 $((max+1)))
  while
    # could use get_n_rand_bits_alt instead if /dev/urandom is preferred over $RANDOM
    rnd=$(get_n_rand_bits $bitlen)
    (( rnd > max ))
  do :
  done
  echo $rnd
}

# MAIN SCRIPT

# check number of parameters
if (( $# != 1 && $# != 2 ))
then
  cat <<EOF 1>&2
Usage: $(basename $0) [min] max

Returns an integer distributed uniformly at random in the range {min..max}
min defaults to 0
(max - min) can be up to 2**60-1  
EOF
  exit 1
fi

# If we have one parameter, set min to 0 and max to $1
# If we have two parameters, set min to $1 and max to $2
max=0
while (( $# > 0 ))
do
  min=$max
  max=$1
  shift
done

# ensure that min <= max
if (( min > max ))
then
  echo "$(basename $0): error: min is greater than max" 1>&2
  exit 1
fi

# need absolute value of diff since min (and also max) may be negative
diff=$((max-min)) && diff=${diff#-}

echo $(( $(rand $diff) + min ))

Save that to ~/bin/rand and you have at your availability a sweet random function in bash that can sample an integer in a given arbitrary range. The range may contain negative and positive integers and can be up to 2⁶⁰-1 in length:

$ rand 
Usage: rand [min] max

Returns an integer distributed uniformly at random in the range {min..max}
min defaults to 0
(max - min) can be up to 2**60-1  
$ rand 1 10
9
$ rand -43543 -124
-15757
$ rand -3 3
1
$ for i in {0..9}; do rand $((2**60-1)); done
777148045699177620
456074454250332606
95080022501817128
993412753202315192
527158971491831964
336543936737015986
1034537273675883580
127413814010621078
758532158881427336
924637728863691573

All ideas by the other answerers were great. The answers by terdon, J.F. Sebastian, and jimmij used external tools to do the task in a simple and efficient manner. However, I preferred a true bash solution for maximum portability, and maybe a little bit, simply out of love for bash 😉

Ramesh‘s and l0b0‘s answers used /dev/urandom or /dev/random in combination with od. That’s good, however, their approaches had the disadvantage of only being able to sample random integers in the range 0 to 2⁸ⁿ-1 for some n, since this method samples bytes, i.e., bitstrings of length 8. These are quite big jumps with increasing n.

Finally, Falco‘s answer describes the general idea how this could be done for arbitrary ranges (not only powers of two). Basically, for a given range {0..max}, we can determine what the next power of two is, i.e., exactly how many bits are required to represent max as a bitstring. Then we can sample just that many bits and see whether this bistring, as an integer, is greater than max. If so, repeat. Since we sample just as many bits as are required to represent max, each iteration has a probability greater or equal than 50% of succeeding (50% in the worst case, 100% in the best case). So this is very efficient.

My script is basically a concrete implementation of Falco’s answer, written in pure bash and highly efficient since it uses bash’s built-in bitwise operations to sample bitstrings of the desired length. It additionally honors an idea by Eliah Kagan that suggests to use the built-in $RANDOM variable by concatening bitstrings resulting from repeated invocations of $RANDOM. I actually implemented both the possibilities to use /dev/urandom and $RANDOM. By default the above script uses $RANDOM. (And ok, if using /dev/urandom we need od and tr, but these are backed by POSIX.)

So how does it work?

Before I get into this, two observations:

It turns out bash can’t handle integers larger than 2⁶³-1. See for yourself:
```
$ echo $((2**63-1))
9223372036854775807
$ echo $((2**63))
-9223372036854775808
```
It would appear that bash internally uses signed 64-bit integers to store integers. So, at 2⁶³ it “wraps around” and we get a negative integer. So we can’t hope to get any range larger than 2⁶³-1 with whatever random function we use. Bash simply can’t handle it.
Whenever we want to sample a value in an arbitrary range between min and max with possibly min != 0, we can simply sample a value between 0 and max-min instead and then add min to the final result. This works even if min and possibly also max are negative, but we need to be careful to sample a value between 0 and the absolute value of max-min. So then, we can focus on how to sample a random value between 0 and an arbitrary positive integer max. The rest is easy.

Step 1: Determine how many bits are needed to represent an integer (the logarithm)

So for a given value max, we want to know just how many bits are needed to represent it as a bitstring. This is so that later we can randomly sample only just as many bits as are needed, which makes the script so efficient.

Let’s see. Since with n bits, we can represent up to the value 2ⁿ-1, then the number n of bits needed to represent an arbitrary value x is ceiling(log₂(x+1)). So, we need a function to compute the ceiling of a logarithm to the base 2. It is rather self-explanatory:

log2() {
  local x=$1 n=1 l=0
  while (( x>n && n>0 ))
  do
    let n*=2 l++
  done
  echo $l
}

We need the condition n>0 so if it grows too great, wraps around and becomes negative, the loop is guaranteed to terminate.

Step 2: Sample a random a bitstring of length n

The most portable ideas are to either use /dev/urandom (or even /dev/random if there is a strong reason) or bash’s built-in $RANDOM variable. Let’s look at how to do it with $RANDOM first.

Option A: Using $RANDOM

This uses the idea mentioned by Eliah Kagan. Basically, since $RANDOM samples a 15-bit integer, we can use $((RANDOM<<15|RANDOM)) to sample a 30-bit integer. That means, shift a first invocation of $RANDOM by 15 bits to the left, and apply a bitwise or with a second invocation of $RANDOM, effectively concatening two independently sampled bitstrings (or at least as independent as bash’s built-in $RANDOM goes).

We can repeat this to obtain a 45-bit or 60-bit integer. After that bash can’t handle it anymore, but this means we can easily sample a random value between 0 and 2⁶⁰-1. So, to sample an n-bit integer, we repeat the procedure until our random bitstring, whose length grows in 15-bit steps, has a length greater or equal than n. Finally, we cut off the bits that are too much by appropriately bitwise shifting to the right, and we end up with a n-bit random integer.

get_n_rand_bits() {
  local n=$1 rnd=$RANDOM rnd_bitlen=15
  while (( rnd_bitlen < n ))
  do
    rnd=$(( rnd<<15|$RANDOM ))
    let rnd_bitlen+=15
  done
  echo $(( rnd>>(rnd_bitlen-n) ))
}

Option B: Using /dev/urandom

Alternatively, we can use od and /dev/urandom to sample an n-bit integer. od will read bytes, i.e., bitstrings of length 8. Similarly as in the previous method, we sample just so many bytes that the equivalent number of sampled bits is greater or equal than n, and cut off the bits that are too much.

The lowest number of bytes needed to get at least n bits is the lowest multiple of 8 that is greater or equal than n, i.e., floor((n+7)/8).

This only works up to 56-bit integers. Sampling one more byte would get us an 64-bit integer, i.e., a value up to 2⁶⁴-1, which bash can’t handle.

get_n_rand_bits_alt() {
  local n=$1
  local nb_bytes=$(( (n+7)/8 ))
  local rnd=$(od --read-bytes=$nb_bytes --address-radix=n --format=uL /dev/urandom | tr --delete " ")
  echo $(( rnd>>(nb_bytes*8-n) ))
}

Putting the pieces together: Get random integers in arbitrary ranges

We can sample n-bit bitstrings now, but we want to sample integers in a range from 0 to max, uniformly at random, where max may be arbitrary, not necessarily a power of two. (We can’t use modulo as that creates a bias.)

The whole point why we tried so hard to sample just as many bits as are needed to represent the value max, is that we can now safely (and efficiently) use a loop to repeatedly sample an n-bit bitstring until we sample a value that is lower or equal to max. In the worst case (max is a power of two), each iteration terminates with a probability of 50%, and in the best case (max is a power of two minus one), the first iteration terminates with certainty.

rand() {
  local rnd max=$1
  # get number of bits needed to represent $max
  local bitlen=$(log2 $((max+1)))
  while
    # could use get_n_rand_bits_alt instead if /dev/urandom is preferred over $RANDOM
    rnd=$(get_n_rand_bits $bitlen)
    (( rnd > max ))
  do :
  done
  echo $rnd
}

Wrapping things up

Finally, we want to sample integers between min and max, where min and max can be arbitrary, even negative. As previously mentioned, this is now trivial.

Let’s put it all in a bash script. Do some argument parsing stuff… We want two arguments min and max, or only one argument max, where min defaults to 0.

# check number of parameters
if (( $# != 1 && $# != 2 ))
then
  cat <<EOF 1>&2
Usage: $(basename $0) [min] max

Returns an integer distributed uniformly at random in the range {min..max}
min defaults to 0
(max - min) can be up to 2**60-1  
EOF
  exit 1
fi

# If we have one parameter, set min to 0 and max to $1
# If we have two parameters, set min to $1 and max to $2
max=0
while (( $# > 0 ))
do
  min=$max
  max=$1
  shift
done

# ensure that min <= max
if (( min > max ))
then
  echo "$(basename $0): error: min is greater than max" 1>&2
  exit 1
fi

…and, finally, to sample uniformly at random a value between min and max, we sample a random integer between 0 and the absolute value of max-min, and add min to the final result. 🙂

diff=$((max-min)) && diff=${diff#-}

echo $(( $(rand $diff) + min ))

Inspired by this, I might try to use dieharder to test and benchmark this PRNG, and put my findings in here. 🙂

Method 3

Can it be zsh?

zmodload zsh/mathfunc
max=1000
integer rnd='rand48() * max'

(for random numbers between 0 and 999)

You may want to use seed as well with rand48(seed). See man zshmodules and man 3 erand48 for detailed description if interested.

Method 4

$ python -c 'import random as R; print(R.randint(-3, 5**1234))'

python is available on Redhat, Debian-based systems.

Method 5

If you want a number from 0 through (2^n)-1 where n mod 8 = 0 you can simply get n / 8 bytes from /dev/random. For example, to get the decimal representation of a random int you could:

od --read-bytes=4 --address-radix=n --format=u4 /dev/random | awk '{print $1}'

If you want to take just n bits you can first take ceiling(n / 8) bytes and right shift to the amount you want. For example if you want 15 bits:

echo $(($(od --read-bytes=2 --address-radix=n --format=u4 /dev/random | awk '{print $1}') >> 1))

If you are absolutely sure that you don’t care about the quality of the randomness and you want to guarantee a minimal run time you can use /dev/urandom instead of /dev/random. Make sure you know what you are doing before using /dev/urandom!

Method 6

Assuming you don’t object to using external tools, this should fulfill your requirements:

rand=$(perl -e 'print int(rand(2**32-1))');

It’s using perl’s rand function which takes an upper limit as a parameter. You can set it to whatever you like. How close this is to true randomness in the abstract mathematical definition is beyond the scope of this site but it should be fine unless you need it for extremely sensitive encryption or the like. Perhaps even there but I won’t venture an opinion.

Method 7

You should get the nearest (2^X)-1 equal or grater than your desired maximum and get the number of bits. Then just call /dev/random multiple times and append all the bits together until you have enough, truncating all bits which are too much.
If the resulting number is bigger than your max repeat. In the worst case you have a greater than 50% chance of getting a random number below your Maximum so (for this worst case) you will take two calls on average.

Method 8

For generating 32 bit random integers, just do:

min=0
max=4294967295
rnd=$((SRANDOM % ( max - min + 1 ) + min))
echo "$rnd"

$SRANDOM

is available in bash since version 5.1
supports a range of 32 bit (unsigned, with values uniformly distributed between 0 and 2³² – 1)
is self seeding and doesn’t need and can’t be seeded externally

Method 9

For generating p.e. 32 bit or 63 bit random integers, just do:

min=0  # min
max=4294967295  # for 32 bit 4.294.967.295, for 63 bit 9223372036854775807
n=1    # count of numbers, dont use a higher count than 1

shuf -rn "$n" -i "$min-$max"

Supports a range of up to 63 bit (unsigned, with values uniformly distributed between 0 and 9223372036854775807

Remark:

Dont generate more than one number by one call, if you need uniformly distributed numbers. P.e. if you need uniformly distributed numbers, you have to call it 10 times.

Method 10

Your answer is interesting but quite long.

If you want arbitrarily large numbers, then you could join multiple random numbers in a helper:

# $1 - number of 'digits' of size base
function random_helper()
{
  base=32768
  random=0
  for((i=0; i<$1; ++i)); do
    let "random+=$RANDOM*($base**$i)"
  done
  echo $random
}

If the problem is bias, then just remove it.

# $1 - min value wanted
# $2 - max value wanted
function random()
{
  MAX=32767
  min=$1
  max=$(($2+1))
  size=$((max-min))
  bias_range=$((MAX/size))
  while
    random=$RANDOM
  [ $((random/size)) -eq $bias_range ]; do :; done
  echo $((random%size+min))
}

Joining these functions together

# $1 - min value wanted
# $2 - max value wanted
# $3 - number of 'digits' of size base
function random()
{
  base=32768
  MAX=$((base**$3-1))
  min=$1
  max=$(($2+1))
  size=$((max-min))
  bias_range=$((MAX/size))
  while
    random=$(random_helper)
  [ $((random/size)) -eq $bias_range ]; do :; done
  echo $((random%size+min))
}

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes

Article Rating