Generating discrete random variables with specified weights using SciPy or NumPy

I am looking for a simple function that can generate an array of specified random values based on their corresponding (also specified) probabilities. I only need it to generate float values, but I don’t see why it shouldn’t be able to generate any scalar. I can think of many ways of building this from existing functions, but I think I probably just missed an obvious SciPy or NumPy function.

E.g.:

>>> values = [1.1, 2.2, 3.3]
>>> probabilities = [0.2, 0.5, 0.3]
>>> print(some_function(values, probabilities, size=10))
(2.2, 1.1, 3.3, 3.3, 2.2, 2.2, 1.1, 2.2, 3.3, 2.2)

Note: I found scipy.stats.rv_discrete but I don’t understand how it works. Specifically, I do not understand what this (below) means nor what it should do:

numargs = generic.numargs
[ <shape(s)> ] = ['Replace with resonable value', ]*numargs

If rv_discrete is what I should be using, could you please provide me with a simple example and an explanation of the above “shape” statement?

Answers:

Method 1

Drawing from a discrete distribution is built directly into NumPy.
The function is numpy.random.choice (it is difficult to find because the NumPy docs make no reference to discrete distributions).

import numpy as np

elements = [1.1, 2.2, 3.3]
probabilities = [0.2, 0.5, 0.3]
# Draw 10 samples; p gives the probability of each element
np.random.choice(elements, 10, p=probabilities)
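
Newer NumPy versions expose the same sampler through the Generator API. The following is a minimal sketch, assuming NumPy 1.17 or later; the seed 0 is an arbitrary choice made only so the run is repeatable, and is not part of the original answer.

```python
import numpy as np

elements = [1.1, 2.2, 3.3]
probabilities = [0.2, 0.5, 0.3]

# default_rng returns a Generator; the seed is arbitrary (assumption, for repeatability)
rng = np.random.default_rng(0)

# Draw 10 samples, each element chosen with its corresponding probability
samples = rng.choice(elements, size=10, p=probabilities)
print(samples)
```

The result is a NumPy array of floats; passing a tuple as size should give a multi-dimensional array instead.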

Method 2

Here is a short, relatively simple function that returns weighted values. It uses NumPy’s digitize, accumulate, and random_sample.

import numpy as np
from numpy.random import random_sample

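def weighted_values(values, probabilities, size):
    # Cumulative sums give the right-hand edge of each value's bin
    bins = np.add.accumulate(probabilities)
    # Map each uniform draw in [0, 1) to the index of the bin it falls into
    return values[np.digitize(random_sample(size), bins)]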

values = np.array([1.1, 2.2, 3.3])
probabilities = np.array([0.2, 0.5, 0.3])

print(weighted_values(values, probabilities, 10))
# Sample output:
# [2.2 2.2 1.1 2.2 2.2 3.3 3.3 2.2 3.3 3.3]

It works like this:

First, accumulate turns the probabilities into cumulative bin edges.
Then random_sample generates a batch of random numbers in [0, 1).
digitize finds which bin each of these numbers falls into.
Finally, the bin indexes are used to return the corresponding values.
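
To trace the mapping by hand, here is a small sketch that replaces the random numbers with three hand-picked draws (the name draws and its values are made up for illustration), one per bin:

```python
import numpy as np

values = np.array([1.1, 2.2, 3.3])
probabilities = np.array([0.2, 0.5, 0.3])

# Right-hand bin edges: cumulative sums, roughly [0.2, 0.7, 1.0]
bins = np.add.accumulate(probabilities)

# Hand-picked stand-ins for random numbers in [0, 1), one per bin
draws = np.array([0.05, 0.45, 0.95])

# For each draw, digitize returns the index i with bins[i-1] <= draw < bins[i]
indices = np.digitize(draws, bins)
print(indices)          # [0 1 2]
print(values[indices])  # [1.1 2.2 3.3]
```

A draw below 0.2 lands in bin 0, a draw between 0.2 and 0.7 in bin 1, and anything above that in bin 2, so each value is picked with its own probability.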

Method 3

You were going in a good direction: the built-in scipy.stats.rv_discrete() quite directly creates a discrete random variable. Here is how it works:

>>> import numpy
>>> from scipy.stats import rv_discrete

>>> values = numpy.array([1.1, 2.2, 3.3])
>>> probabilities = [0.2, 0.5, 0.3]

>>> distrib = rv_discrete(values=(range(len(values)), probabilities))  # This defines a Scipy probability distribution

>>> distrib.rvs(size=10)  # 10 samples from range(len(values))
array([1, 2, 0, 2, 2, 0, 2, 1, 0, 2])

>>> values[_]  # Conversion to specific discrete values (the fact that values is a NumPy array is used for the indexing)
array([2.2, 3.3, 1.1, 3.3, 3.3, 1.1, 3.3, 2.2, 1.1, 3.3])

The distribution distrib above thus returns indexes into the values array.

More generally, rv_discrete() takes a sequence of integer values as the first element of its values=(…,…) argument and returns those values directly. When the values you want are already integers, there is no need for the index-to-value conversion shown above. Here is an example:

>>> values = [10, 20, 30]
>>> probabilities = [0.2, 0.5, 0.3]
>>> distrib = rv_discrete(values=(values, probabilities))
>>> distrib.rvs(size=10)
array([20, 20, 20, 20, 20, 20, 20, 30, 20, 20])

Here the (integer) input values are returned directly, each with the desired probability.
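
As a quick check on the index-based approach, the sketch below puts the steps into one script and inspects the probability mass function. It assumes SciPy is installed; the random_state seed is an arbitrary choice for repeatability and is not part of the original answer.

```python
import numpy as np
from scipy.stats import rv_discrete

values = np.array([1.1, 2.2, 3.3])
probabilities = [0.2, 0.5, 0.3]

# The distribution is defined over the integer indexes 0 .. len(values) - 1
distrib = rv_discrete(values=(np.arange(len(values)), probabilities))

# pmf returns the probability attached to each index
print(distrib.pmf([0, 1, 2]))

# Sample indexes, then convert them to the float values by array indexing
indexes = distrib.rvs(size=10, random_state=0)
samples = values[indexes]
print(samples)
```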

Method 4

The simplest DIY way would be to sum up the probabilities into a cumulative distribution.
This way, you split the unit interval into sub-intervals whose lengths equal your original probabilities. Now generate a single random number, uniform on [0,1), and see which interval it lands in; the value associated with that interval is your sample.
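
The idea above can be sketched with NumPy as follows. This is one possible implementation rather than the answerer's own code: the helper name sample_discrete is hypothetical, and using cumsum with searchsorted is an assumption about how to do the interval lookup.

```python
import numpy as np

def sample_discrete(values, probabilities, size, rng=None):
    """Hypothetical helper: draw samples by inverting the cumulative distribution."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values)
    # Right-hand edges of the sub-intervals of [0, 1)
    cdf = np.cumsum(probabilities)
    # Normalize so the last edge is exactly 1, guarding against rounding error
    cdf = cdf / cdf[-1]
    # Uniform draws on [0, 1)
    u = rng.random(size)
    # Index of the sub-interval each draw lands in: cdf[i-1] <= u < cdf[i]
    idx = np.searchsorted(cdf, u, side="right")
    return values[idx]

# The seed is arbitrary, used only to make the run repeatable
samples = sample_discrete([1.1, 2.2, 3.3], [0.2, 0.5, 0.3], 1000,
                          rng=np.random.default_rng(0))
print(samples[:10])
```

Dividing by the last cumulative sum also means the weights do not have to add up to one, so raw frequencies such as [2, 5, 3] should work as well.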

Method 5

You could also use Lea, a pure Python package dedicated to discrete probability distributions.

>>> distrib = Lea.fromValFreqs((1.1,2),(2.2,5),(3.3,3))
>>> distrib
1.1 : 2/10
2.2 : 5/10
3.3 : 3/10
>>> distrib.random(10)
(2.2, 2.2, 1.1, 2.2, 2.2, 2.2, 1.1, 3.3, 1.1, 3.3)

Et voilà!

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
