# NumPy version of “Exponential weighted moving average”, equivalent to pandas.ewm().mean()

How do I get the exponential weighted moving average in NumPy just like the following in pandas?

```
import pandas as pd
import pandas_datareader as pdr
from datetime import datetime

# Declare variables
ibm = pdr.get_data_yahoo(symbols='IBM', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1)).reset_index(drop=True)['Adj Close']
windowSize = 20

# Get PANDAS exponential weighted moving average
ewm_pd = pd.DataFrame(ibm).ewm(span=windowSize, min_periods=windowSize).mean().values  # .as_matrix() in older pandas

print(ewm_pd)
```

I tried the following with NumPy

```
import numpy as np
import pandas_datareader as pdr
from datetime import datetime

# From this post: http://stackoverflow.com/a/40085052/3293881 by @Divakar
def strided_app(a, L, S):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size - L) // S) + 1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))

def numpyEWMA(price, windowSize):
    weights = np.exp(np.linspace(-1., 0., windowSize))
    weights /= weights.sum()

    a2D = strided_app(price, windowSize, 1)

    returnArray = np.empty(price.shape)
    returnArray.fill(np.nan)
    for index in range(a2D.shape[0]):
        returnArray[index + windowSize-1] = np.convolve(weights, a2D[index])[windowSize - 1:-windowSize + 1]
    return np.reshape(returnArray, (-1, 1))

# Declare variables
ibm = pdr.get_data_yahoo(symbols='IBM', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1)).reset_index(drop=True)['Adj Close']
windowSize = 20

# Get NumPy exponential weighted moving average
ewma_np = numpyEWMA(ibm.values, windowSize)

print(ewma_np)
```

But the results are not the same as the ones from pandas.

Is there maybe a better approach to calculate the exponential weighted moving average directly in NumPy and get the exact same result as the `pandas.ewm().mean()`?

Over 60,000 runs of the pandas solution, I get about 230 seconds. I am sure that with pure NumPy this can be decreased significantly.


### Method 1

I think I have finally cracked it!

Here’s a vectorized version of the `numpy_ewma` function that is claimed to produce the correct results from @RaduS's post –

```
def numpy_ewma_vectorized(data, window):

    alpha = 2 /(window + 1.0)
    alpha_rev = 1-alpha

    scale = 1/alpha_rev
    n = data.shape[0]

    r = np.arange(n)
    scale_arr = scale**r
    offset = data[0]*alpha_rev**(r+1)
    pw0 = alpha*alpha_rev**(n-1)

    mult = data*pw0*scale_arr
    cumsums = mult.cumsum()
    out = offset + cumsums*scale_arr[::-1]
    return out
```

Further boost

We can boost it further with some code re-use, like so –

```
def numpy_ewma_vectorized_v2(data, window):

    alpha = 2 /(window + 1.0)
    alpha_rev = 1-alpha
    n = data.shape[0]

    pows = alpha_rev**(np.arange(n+1))

    scale_arr = 1/pows[:-1]
    offset = data[0]*pows[1:]
    pw0 = alpha*alpha_rev**(n-1)

    mult = data*pw0*scale_arr
    cumsums = mult.cumsum()
    out = offset + cumsums*scale_arr[::-1]
    return out
```

Runtime test

Let’s time these two against the same loopy function for a big dataset.

```
In : data = np.random.randint(2,9,(5000))
...: window = 20
...:

In : np.allclose(numpy_ewma(data, window), numpy_ewma_vectorized(data, window))
Out: True

In : np.allclose(numpy_ewma(data, window), numpy_ewma_vectorized_v2(data, window))
Out: True

In : %timeit numpy_ewma(data, window)
100 loops, best of 3: 6.03 ms per loop

In : %timeit numpy_ewma_vectorized(data, window)
1000 loops, best of 3: 665 µs per loop

In : %timeit numpy_ewma_vectorized_v2(data, window)
1000 loops, best of 3: 357 µs per loop

In : 6030/357.0
Out: 16.89075630252101
```

There is around a 17 times speedup!
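As a sanity check (my addition, not part of the original answer), both vectorized functions should reproduce pandas’ `ewm(adjust=False).mean()`, since the recursion is seeded with the first data point; this matches the sample outputs shown later in Method 3:

```
# Sanity check (assumes pandas is available): the vectorized functions
# reproduce pandas' ewm(span=window, adjust=False).mean()
import numpy as np
import pandas as pd

data = np.random.random(1000)
window = 20
expected = pd.Series(data).ewm(span=window, adjust=False).mean().to_numpy()
assert np.allclose(numpy_ewma_vectorized_v2(data, window), expected)
```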

### Method 2

Updated 08/06/2019

**Pure NumPy, fast & vectorized solution for large inputs**

- `out` parameter for in-place computation
- `dtype` parameter
- index `order` parameter

This function is equivalent to pandas’ `ewm(adjust=False).mean()`, but much faster. `ewm(adjust=True).mean()` (the default for pandas) can produce different values at the start of the result. I am working to add the `adjust` functionality to this solution.

@Divakar’s answer leads to floating point precision problems when the input is too large. This is because `(1-alpha)**(n+1) -> 0` when `n -> inf` and `alpha -> 1`, leading to divide-by-zero errors and `NaN` values popping up in the calculation.

Here is my fastest solution with no precision problems, nearly fully vectorized. It has gotten a little complicated, but the performance is great, especially for really huge inputs. Without using in-place calculations (which are possible via the `out` parameter, saving memory-allocation time): 3.62 seconds for a 100M-element input vector, 3.2 ms for a 100K-element input vector, and 293 µs for a 5000-element input vector on a pretty old PC (results will vary with different `alpha`/`row_size` values).

```
# tested with python3 & numpy 1.15.2
import numpy as np

def ewma_vectorized_safe(data, alpha, row_size=None, dtype=None, order='C', out=None):
    """
    Reshapes data before calculating EWMA, then iterates once over the rows
    to calculate the offset without precision issues
    :param data: Input data, will be flattened.
    :param alpha: scalar float in range (0,1)
        The alpha parameter for the moving average.
    :param row_size: int, optional
        The row size to use in the computation. High row sizes need higher precision,
        low values will impact performance. The optimal value depends on the
        platform and the alpha being used. Higher alpha values require lower
        row size. Default depends on dtype.
    :param dtype: optional
        Data type used for calculations. Defaults to float64 unless
        data.dtype is float32, then it will use float32.
    :param order: {'C', 'F', 'A'}, optional
        Order to use when flattening the data. Defaults to 'C'.
    :param out: ndarray, or None, optional
        A location into which the result is stored. If provided, it must have
        the same shape as the desired output. If not provided or `None`,
        a freshly-allocated array is returned.
    :return: The flattened result.
    """
    data = np.array(data, copy=False)

    if dtype is None:
        if data.dtype == np.float32:
            dtype = np.float32
        else:
            dtype = np.float64
    else:
        dtype = np.dtype(dtype)

    row_size = int(row_size) if row_size is not None \
        else get_max_row_size(alpha, dtype)

    if data.size <= row_size:
        # The normal function can handle this input, use that
        return ewma_vectorized(data, alpha, dtype=dtype, order=order, out=out)

    if data.ndim > 1:
        # flatten input
        data = np.reshape(data, -1, order=order)

    if out is None:
        out = np.empty_like(data, dtype=dtype)
    else:
        assert out.shape == data.shape
        assert out.dtype == dtype

    row_n = int(data.size // row_size)  # the number of rows to use
    trailing_n = int(data.size % row_size)  # the amount of data leftover
    first_offset = data[0]

    if trailing_n > 0:
        # set temporary results to slice view of out parameter
        out_main_view = np.reshape(out[:-trailing_n], (row_n, row_size))
        data_main_view = np.reshape(data[:-trailing_n], (row_n, row_size))
    else:
        out_main_view = out
        data_main_view = data

    # get all the scaled cumulative sums with 0 offset
    ewma_vectorized_2d(data_main_view, alpha, axis=1, offset=0, dtype=dtype,
                       order='C', out=out_main_view)

    scaling_factors = (1 - alpha) ** np.arange(1, row_size + 1)
    last_scaling_factor = scaling_factors[-1]

    # create offset array
    offsets = np.empty(out_main_view.shape[0], dtype=dtype)
    offsets[0] = first_offset
    # iteratively calculate offset for each row
    for i in range(1, out_main_view.shape[0]):
        offsets[i] = offsets[i - 1] * last_scaling_factor + out_main_view[i - 1, -1]

    # add the offsets to the result
    out_main_view += offsets[:, np.newaxis] * scaling_factors[np.newaxis, :]

    if trailing_n > 0:
        # process trailing data in the 2nd slice of the out parameter
        ewma_vectorized(data[-trailing_n:], alpha, offset=out_main_view[-1, -1],
                        dtype=dtype, order='C', out=out[-trailing_n:])
    return out

def get_max_row_size(alpha, dtype=float):
    assert 0. <= alpha < 1.
    # This will return the maximum row size possible on
    # your platform for the given dtype. I can find no impact on accuracy
    # at this value on my machine.
    # Might not be the optimal value for speed, which is hard to predict
    # due to numpy's optimizations
    # Use np.finfo(dtype).eps if you are worried about accuracy
    # and want to be extra safe.
    epsilon = np.finfo(dtype).tiny
    # If this produces an OverflowError, make epsilon larger
    return int(np.log(epsilon)/np.log(1-alpha)) + 1
```

The 1D ewma function:

```
def ewma_vectorized(data, alpha, offset=None, dtype=None, order='C', out=None):
    """
    Calculates the exponential moving average over a vector.
    Will fail for large inputs.
    :param data: Input data
    :param alpha: scalar float in range (0,1)
        The alpha parameter for the moving average.
    :param offset: optional
        The offset for the moving average, scalar. Defaults to data[0].
    :param dtype: optional
        Data type used for calculations. Defaults to float64 unless
        data.dtype is float32, then it will use float32.
    :param order: {'C', 'F', 'A'}, optional
        Order to use when flattening the data. Defaults to 'C'.
    :param out: ndarray, or None, optional
        A location into which the result is stored. If provided, it must have
        the same shape as the input. If not provided or `None`,
        a freshly-allocated array is returned.
    """
    data = np.array(data, copy=False)

    if dtype is None:
        if data.dtype == np.float32:
            dtype = np.float32
        else:
            dtype = np.float64
    else:
        dtype = np.dtype(dtype)

    if data.ndim > 1:
        # flatten input
        data = data.reshape(-1, order=order)

    if out is None:
        out = np.empty_like(data, dtype=dtype)
    else:
        assert out.shape == data.shape
        assert out.dtype == dtype

    if data.size < 1:
        # empty input, return empty array
        return out

    if offset is None:
        offset = data[0]

    alpha = np.array(alpha, copy=False).astype(dtype, copy=False)

    # scaling_factors -> 0 as len(data) gets large
    # this leads to divide-by-zeros below
    scaling_factors = np.power(1. - alpha, np.arange(data.size + 1, dtype=dtype),
                               dtype=dtype)
    # create cumulative sum array
    np.multiply(data, (alpha * scaling_factors[-2]) / scaling_factors[:-1],
                dtype=dtype, out=out)
    np.cumsum(out, dtype=dtype, out=out)

    # cumsums / scaling
    out /= scaling_factors[-2::-1]

    if offset != 0:
        offset = np.array(offset, copy=False).astype(dtype, copy=False)
        out += offset * scaling_factors[1:]

    return out
```

The 2D ewma function:

```
def ewma_vectorized_2d(data, alpha, axis=None, offset=None, dtype=None, order='C', out=None):
    """
    Calculates the exponential moving average over a given axis.
    :param data: Input data, must be 1D or 2D array.
    :param alpha: scalar float in range (0,1)
        The alpha parameter for the moving average.
    :param axis: The axis to apply the moving average on.
        If axis==None, the data is flattened.
    :param offset: optional
        The offset for the moving average. Must be scalar or a
        vector with one element for each row of data. If set to None,
        defaults to the first value of each row.
    :param dtype: optional
        Data type used for calculations. Defaults to float64 unless
        data.dtype is float32, then it will use float32.
    :param order: {'C', 'F', 'A'}, optional
        Order to use when flattening the data. Ignored if axis is not None.
    :param out: ndarray, or None, optional
        A location into which the result is stored. If provided, it must have
        the same shape as the desired output. If not provided or `None`,
        a freshly-allocated array is returned.
    """
    data = np.array(data, copy=False)

    assert data.ndim <= 2

    if dtype is None:
        if data.dtype == np.float32:
            dtype = np.float32
        else:
            dtype = np.float64
    else:
        dtype = np.dtype(dtype)

    if out is None:
        out = np.empty_like(data, dtype=dtype)
    else:
        assert out.shape == data.shape
        assert out.dtype == dtype

    if data.size < 1:
        # empty input, return empty array
        return out

    if axis is None or data.ndim < 2:
        # use 1D version
        if isinstance(offset, np.ndarray):
            offset = offset[0]
        return ewma_vectorized(data, alpha, offset, dtype=dtype, order=order,
                               out=out)

    assert -data.ndim <= axis < data.ndim

    # create reshaped data views
    out_view = out
    if axis < 0:
        axis = data.ndim - int(abs(axis))

    if axis == 0:
        # transpose data views so columns are treated as rows
        data = data.T
        out_view = out_view.T

    if offset is None:
        # use the first element of each row as the offset
        offset = np.copy(data[:, 0])
    elif np.size(offset) == 1:
        offset = np.reshape(offset, (1,))

    alpha = np.array(alpha, copy=False).astype(dtype, copy=False)

    # calculate the moving average
    row_size = data.shape[1]
    row_n = data.shape[0]
    scaling_factors = np.power(1. - alpha, np.arange(row_size + 1, dtype=dtype),
                               dtype=dtype)
    # create a scaled cumulative sum array
    np.multiply(
        data,
        np.multiply(alpha * scaling_factors[-2], np.ones((row_n, 1), dtype=dtype),
                    dtype=dtype)
        / scaling_factors[np.newaxis, :-1],
        dtype=dtype, out=out_view
    )
    np.cumsum(out_view, axis=1, dtype=dtype, out=out_view)
    out_view /= scaling_factors[np.newaxis, -2::-1]

    if not (np.size(offset) == 1 and offset == 0):
        offset = offset.astype(dtype, copy=False)
        # add the offsets to the scaled cumulative sums
        out_view += offset[:, np.newaxis] * scaling_factors[np.newaxis, 1:]

    return out
```

Usage:

```
data_n = 100000000
data = ((0.5*np.random.randn(data_n)+0.5) % 1) * 100

span = 5000  # span >= 1
alpha = 2/(span+1)  # for pandas' span parameter

# com = 1000  # com >= 0
# alpha = 1/(1+com)  # for pandas' center-of-mass parameter

# halflife = 100  # halflife > 0
# alpha = 1 - np.exp(np.log(0.5)/halflife)  # for pandas' half-life parameter

result = ewma_vectorized_safe(data, alpha)
```
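Since the author states equivalence to `ewm(adjust=False).mean()`, here is a quick way to verify that on a smaller input (my addition, not from the original answer):

```
# Verification sketch: compare against pandas on a slice of the data
import pandas as pd

small = data[:100000]
expected = pd.Series(small).ewm(alpha=alpha, adjust=False).mean().to_numpy()
assert np.allclose(ewma_vectorized_safe(small, alpha), expected)
```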

Just a tip

It is easy to calculate a ‘window size’ (technically, exponential averages have infinite ‘windows’) for a given `alpha`, based on the contribution of the data in that window to the average. This is useful, for example, to choose how much of the start of the result to treat as unreliable due to border effects.

```
def window_size(alpha, sum_proportion):
    # Increases with increased sum_proportion and decreased alpha
    # solve (1-alpha)**window_size = (1-sum_proportion) for window_size
    return int(np.log(1-sum_proportion) / np.log(1-alpha))

alpha = 0.02
sum_proportion = .99  # window covers 99% of contribution to the moving average
window = window_size(alpha, sum_proportion)  # = 227
sum_proportion = .75  # window covers 75% of contribution to the moving average
window = window_size(alpha, sum_proportion)  # = 68
```

The `alpha = 2 / (window_size + 1.0)` relation used in this thread (the ‘span’ option from pandas) is a very rough approximation of the inverse of the above function (with `sum_proportion ≈ 0.87`). `alpha = 1 - np.exp(np.log(1-sum_proportion)/window_size)` is more accurate (the ‘half-life’ option from pandas equals this formula with `sum_proportion=0.5`).
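For illustration, here is how the conversions above might look in code (my numbers, for a 227-sample window):

```
# Converting a window size back to alpha (values are illustrative)
import numpy as np

window = 227
alpha_span = 2 / (window + 1.0)                          # pandas 'span' relation, ~0.0088
alpha_accurate = 1 - np.exp(np.log(1 - 0.99) / window)   # sum_proportion=0.99, ~0.02
alpha_halflife = 1 - np.exp(np.log(0.5) / window)        # pandas 'half-life' (sum_proportion=0.5)
```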

In the following example, `data` represents a continuous noisy signal. `cutoff_idx` is the first position in `result` where at least 99% of the value depends on separate values in `data` (i.e. less than 1% depends on `data[0]`). The data up to `cutoff_idx` is excluded from the final result because it is too dependent on the first value in `data` and would therefore skew the average.

```
result = ewma_vectorized_safe(data, alpha, row_size)
sum_proportion = .99
cutoff_idx = window_size(alpha, sum_proportion)
result = result[cutoff_idx:]
```

To illustrate the problem the above solves, you can run this a few times; notice the often-appearing false start of the red line, which is skipped after `cutoff_idx`:

```
data_n = 100000
data = np.random.rand(data_n) * 100
window = 1000
sum_proportion = .99
alpha = 1 - np.exp(np.log(1-sum_proportion)/window)

result = ewma_vectorized_safe(data, alpha)

cutoff_idx = window_size(alpha, sum_proportion)
x = np.arange(start=0, stop=result.size)

import matplotlib.pyplot as plt
plt.plot(x[:cutoff_idx+1], result[:cutoff_idx+1], '-r',
         x[cutoff_idx:], result[cutoff_idx:], '-b')
plt.show()
```

Note that `cutoff_idx == window` here because alpha was set with the inverse of the `window_size()` function, using the same `sum_proportion`.
This is similar to how pandas applies `ewm(span=window, min_periods=window)`.

### Method 3

#### Fastest EWMA: 23x faster than `pandas`

The question strictly asks for a `numpy` solution; however, it seems that the OP was actually just after a pure `numpy` solution to speed up runtime.

I solved a similar problem but instead looked towards `numba.jit`, which massively speeds up the compute time:

```
In : a = np.random.random(10**7)
...: df = pd.Series(a)
In : %timeit numpy_ewma(a, 10)               # /a/42915307/4013571
...: %timeit df.ewm(span=10).mean()          # pandas
...: %timeit numpy_ewma_vectorized_v2(a, 10) # best w/o numba: /a/42926270/4013571
...: %timeit _ewma(a, 10)                    # fastest accurate (below)
...: %timeit _ewma_infinite_hist(a, 10)      # fastest overall (below)
4.14 s ± 116 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
991 ms ± 52.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
396 ms ± 8.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
181 ms ± 1.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
39.6 ms ± 979 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

Scaling down to smaller arrays of `a = np.random.random(100)` (results in the same order)

```
41.6 µs ± 491 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
945 ms ± 12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
16 µs ± 93.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
1.66 µs ± 13.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.14 µs ± 5.57 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

It is also worth pointing out that my functions below are aligned identically with `pandas` (see the examples in the docstrings), whereas a few of the answers here take various different approximations. For example,

```
In : print(pd.DataFrame([1,2,3]).ewm(span=2).mean().values.ravel())
...: print(numpy_ewma_vectorized_v2(np.array([1,2,3]), 2))
...: print(numpy_ewma(np.array([1,2,3]), 2))
[1.         1.75       2.61538462]
[1.         1.66666667 2.55555556]
[1.         1.18181818 1.51239669]
```

The source code, which I have documented for my own library:

```
import numpy as np
from numba import jit
from numba import float64
from numba import int64

@jit((float64[:], int64), nopython=True, nogil=True)
def _ewma(arr_in, window):
    r"""Exponentially weighted moving average specified by a decay ``window``
    to provide better adjustments for small windows via:

        y[t] = (x[t] + (1-a)*x[t-1] + (1-a)^2*x[t-2] + ... + (1-a)^n*x[t-n]) /
               (1 + (1-a) + (1-a)^2 + ... + (1-a)^n).

    Parameters
    ----------
    arr_in : np.ndarray, float64
        A single dimensional numpy array
    window : int64
        The decay window, or 'span'

    Returns
    -------
    np.ndarray
        The EWMA vector, same length / shape as ``arr_in``

    Examples
    --------
    >>> import pandas as pd
    >>> a = np.arange(5, dtype=float)
    >>> exp = pd.DataFrame(a).ewm(span=10, adjust=True).mean()
    >>> np.array_equal(_ewma(a, 10), exp.values.ravel())
    True
    """
    n = arr_in.shape[0]
    ewma = np.empty(n, dtype=float64)
    alpha = 2 / float(window + 1)
    w = 1
    ewma_old = arr_in[0]
    ewma[0] = ewma_old
    for i in range(1, n):
        w += (1-alpha)**i
        ewma_old = ewma_old*(1-alpha) + arr_in[i]
        ewma[i] = ewma_old / w
    return ewma

@jit((float64[:], int64), nopython=True, nogil=True)
def _ewma_infinite_hist(arr_in, window):
    r"""Exponentially weighted moving average specified by a decay ``window``
    assuming infinite history via the recursive form:

        (i)  y[0] = x[0]; and
        (ii) y[t] = a*x[t] + (1-a)*y[t-1] for t > 0.

    This method is less accurate than ``_ewma`` but
    much faster:

        In : import numpy as np, bars
        ...: arr = np.random.random(100000)
        ...: %timeit bars._ewma(arr, 10)
        ...: %timeit bars._ewma_infinite_hist(arr, 10)
        3.74 ms ± 60.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
        262 µs ± 1.54 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    Parameters
    ----------
    arr_in : np.ndarray, float64
        A single dimensional numpy array
    window : int64
        The decay window, or 'span'

    Returns
    -------
    np.ndarray
        The EWMA vector, same length / shape as ``arr_in``

    Examples
    --------
    >>> import pandas as pd
    >>> a = np.arange(5, dtype=float)
    >>> exp = pd.DataFrame(a).ewm(span=10, adjust=False).mean()
    >>> np.array_equal(_ewma_infinite_hist(a, 10), exp.values.ravel())
    True
    """
    n = arr_in.shape[0]
    ewma = np.empty(n, dtype=float64)
    alpha = 2 / float(window + 1)
    ewma[0] = arr_in[0]
    for i in range(1, n):
        ewma[i] = arr_in[i] * alpha + ewma[i-1] * (1 - alpha)
    return ewma
```
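A usage sketch (my addition): per the docstrings above, `_ewma` should track pandas’ default `adjust=True` behaviour and `_ewma_infinite_hist` the `adjust=False` recursion:

```
# Usage sketch: compare both numba functions against pandas
import numpy as np
import pandas as pd

a = np.random.random(10000)
assert np.allclose(_ewma(a, 10),
                   pd.Series(a).ewm(span=10, adjust=True).mean().to_numpy())
assert np.allclose(_ewma_infinite_hist(a, 10),
                   pd.Series(a).ewm(span=10, adjust=False).mean().to_numpy())
```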

### Method 4

Here is an implementation using NumPy that is equivalent to using `df.ewm(alpha=alpha).mean()`. After reading the documentation, it is just a few matrix operations. The trick is constructing the right matrices.

It is worth noting that because we are creating float matrices, you can quickly eat through your memory if the input array is too large.

```
import pandas as pd
import numpy as np

def ewma(x, alpha):
    '''
    Returns the exponentially weighted moving average of x.

    Parameters:
    -----------
    x : array-like
    alpha : float {0 <= alpha <= 1}

    Returns:
    --------
    ewma: numpy array
          the exponentially weighted moving average
    '''
    # Coerce x to an array
    x = np.array(x)
    n = x.size

    # Create an initial weight matrix of (1-alpha), and a matrix of powers
    # to raise the weights by
    w0 = np.ones(shape=(n, n)) * (1 - alpha)
    p = np.vstack([np.arange(i, i - n, -1) for i in range(n)])

    # Create the weight matrix
    w = np.tril(w0**p, 0)

    # Calculate the ewma
    return np.dot(w, x) / w.sum(axis=1)
```

Let’s test it:

```
alpha = 0.55
x = np.random.randint(0,30,15)
df = pd.DataFrame(x, columns=['A'])
df.ewm(alpha=alpha).mean()

# returns:
#             A
# 0   13.000000
# 1   22.655172
# 2   20.443268
# 3   12.159796
# 4   14.871955
# 5   15.497575
# 6   20.743511
# 7   20.884818
# 8   24.250715
# 9   18.610901
# 10  17.174686
# 11  16.528564
# 12  17.337879
# 13   7.801912
# 14  12.310889

ewma(x=x, alpha=alpha)

# returns:
# array([ 13.        ,  22.65517241,  20.44326778,  12.1597964 ,
#        14.87195534,  15.4975749 ,  20.74351117,  20.88481763,
#        24.25071484,  18.61090129,  17.17468551,  16.52856393,
#        17.33787888,   7.80191235,  12.31088889])
```
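To put the memory caveat above into numbers (my estimate, not from the original answer), the `n × n` float64 weight matrix dominates the footprint:

```
# Rough memory estimate for the n x n float64 weight matrix
n = 20_000
print(f"~{n * n * 8 / 1e9:.1f} GB")  # ~3.2 GB for a single matrix
```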

### Method 5

Given `alpha` and `windowSize`, here’s an approach to simulate the corresponding behavior with NumPy –

```
def numpy_ewm_alpha(a, alpha, windowSize):
    wghts = (1-alpha)**np.arange(windowSize)
    wghts /= wghts.sum()
    out = np.full(a.shape, np.nan)
    out[windowSize-1:] = np.convolve(a, wghts, 'valid')
    return out
```

Sample runs for verification –

```
In : alpha = 0.55
...: windowSize = 20
...:

In : df = pd.DataFrame(np.random.randint(2,9,(100)))

In : out0 = df.ewm(alpha = alpha, min_periods=windowSize).mean().values.ravel()
...: out1 = numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
...: print("Max. error : " + str(np.nanmax(np.abs(out0 - out1))))
...:
Max. error : 5.10531254605e-07

In : alpha = 0.75
...: windowSize = 30
...:

In : out0 = df.ewm(alpha = alpha, min_periods=windowSize).mean().values.ravel()
...: out1 = numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
...: print("Max. error : " + str(np.nanmax(np.abs(out0 - out1))))

Max. error : 8.881784197e-16
```

Runtime test on bigger dataset –

```
In : alpha = 0.55
...: windowSize = 20
...:

In : df = pd.DataFrame(np.random.randint(2,9,(10000)))

In : %timeit df.ewm(alpha = alpha, min_periods=windowSize).mean()
1000 loops, best of 3: 851 µs per loop

In : %timeit numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
1000 loops, best of 3: 204 µs per loop
```

Further boost

For a further performance boost, we could avoid the initialization with NaNs and instead use the array output from `np.convolve`, like so –

```
def numpy_ewm_alpha_v2(a, alpha, windowSize):
    wghts = (1-alpha)**np.arange(windowSize)
    wghts /= wghts.sum()
    out = np.convolve(a, wghts)
    out[:windowSize-1] = np.nan
    return out[:a.size]
```

Timings –

```
In : alpha = 0.55
...: windowSize = 20
...:

In : df = pd.DataFrame(np.random.randint(2,9,(10000)))

In : %timeit numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
1000 loops, best of 3: 204 µs per loop

In : %timeit numpy_ewm_alpha_v2(df.values.ravel(), alpha = alpha, windowSize = windowSize)
10000 loops, best of 3: 195 µs per loop
```

### Method 6

A very simple solution that avoids numba, and that is within a factor of 2 of Alexander McFarlane’s solution for large arrays, is to use scipy’s `lfilter` function (because an EWMA is a linear filter):

```
from scipy.signal import lfiltic, lfilter
# careful not to mix between scipy.signal and standard python signal
# (https://docs.python.org/3/library/signal.html) if your code handles some processes

def ewma_linear_filter(array, window):
    alpha = 2 / (window + 1)
    b = [alpha]
    a = [1, alpha - 1]
    # seed the filter state so the filter starts from array[0]
    zi = lfiltic(b, a, array[0:1])
    # lfilter returns (output, final_state) when zi is given
    return lfilter(b, a, array, zi=zi)[0]
```

Timings are as follows:

```
window = 100  # doesn't have any impact on run time
for n in [1000, 10_000, 100_000, 1_000_000, 10_000_000, 100_000_000]:
    data = np.random.normal(0, 1, n)
    print(f'n={n:,d}, window={window}')
    %timeit _ewma_infinite_hist(data, window)
    %timeit ewma_linear_filter(data, window)
    print()

n=1,000, window=100
5.01 µs ± 23.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
58.4 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

n=10,000, window=100
39 µs ± 101 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
134 µs ± 387 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

n=100,000, window=100
373 µs ± 2.56 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
845 µs ± 2.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

n=1,000,000, window=100
5.35 ms ± 22 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.77 ms ± 78.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

n=10,000,000, window=100
53.7 ms ± 200 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
96.6 ms ± 2.28 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

n=100,000,000, window=100
547 ms ± 5.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
963 ms ± 4.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
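A quick equivalence check (my addition): seeding the filter state via `lfiltic` with the first sample makes the output match pandas’ `ewm(adjust=False)`:

```
# Equivalence check against pandas (adjust=False)
import numpy as np
import pandas as pd

data = np.random.normal(0, 1, 1000)
expected = pd.Series(data).ewm(span=100, adjust=False).mean().to_numpy()
assert np.allclose(ewma_linear_filter(data, 100), expected)
```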

### Method 7

This answer may seem irrelevant, but for those who also need to calculate the exponentially weighted variance (and standard deviation) with NumPy, the following solution will be useful:

```
import numpy as np

def ew(a, alpha, winSize):
    _alpha = 1 - alpha
    ws = _alpha ** np.arange(winSize)
    w_sum = ws.sum()
    ew_mean = np.convolve(a, ws)[winSize - 1] / w_sum
    bias = (w_sum ** 2) / ((w_sum ** 2) - (ws ** 2).sum())
    ew_var = (np.convolve((a - ew_mean) ** 2, ws)[winSize - 1] / w_sum) * bias
    ew_std = np.sqrt(ew_var)
    return (ew_mean, ew_var, ew_std)
```

### Method 8

Here is another solution I came up with in the meantime. It is about four times faster than the pandas solution.

```
def numpy_ewma(data, window):
    returnArray = np.empty(data.shape)
    returnArray.fill(np.nan)
    e = data[0]
    alpha = 2 / float(window + 1)
    for s in range(data.shape[0]):
        e = ((data[s] - e) * alpha) + e
        returnArray[s] = e
    return returnArray
```

I used the standard EMA update formula, `e = alpha*(data[s] - e) + e`, as a starting point. I am sure that this can be improved even more, but it is at least a starting point.

### Method 9

@Divakar’s answer seems to cause overflow when dealing with

`numpy_ewma_vectorized(np.random.random(500000), 10)`

What I have been using is:

```
def EMA(input, time_period=10): # For time period = 10
    t_ = time_period - 1
    ema = np.zeros_like(input, dtype=float)
    multiplier = 2.0 / (time_period + 1)
    #multiplier = 1 - multiplier
    for i in range(len(input)):
        # Special Case
        if i > t_:
            ema[i] = (input[i] - ema[i-1]) * multiplier + ema[i-1]
        else:
            ema[i] = np.mean(input[:i+1])
    return ema
```

However, this is way slower than the pandas solution:

```
from pandas import ewma as pd_ema  # legacy top-level pandas function, removed in newer pandas
def EMA_fast(X, time_period=10):
    out = pd_ema(X, span=time_period, min_periods=time_period)
    out[:time_period-1] = np.cumsum(X[:time_period-1]) / np.asarray(range(1, time_period))
    return out
```

### Method 10

Derivation: dividing the recursion `y[t] = a*p[t] + (1-a)*y[t-1]` through by `(1-a)^t` turns it into a cumulative sum of `a*p[t]/(1-a)^t`, which is then scaled back by `(1-a)^t`. Implementation:

```
def ema(p: np.ndarray, a: float) -> np.ndarray:
    o = np.empty(p.shape, p.dtype)
    #   (1-α)^0, (1-α)^1, (1-α)^2, ..., (1-α)^n
    np.power(1.0 - a, np.arange(0.0, p.shape[0], 1.0, p.dtype), o)
    #   α*P0, α*P1, α*P2, ..., α*Pn
    np.multiply(a, p, p)
    #   α*P0/(1-α)^0, α*P1/(1-α)^1, α*P2/(1-α)^2, ..., α*Pn/(1-α)^n
    np.divide(p, o, p)
    #   α*P0/(1-α)^0, α*P0/(1-α)^0 + α*P1/(1-α)^1, ...
    np.cumsum(p, out=p)
    #   (α*P0/(1-α)^0)*(1-α)^0, (α*P0/(1-α)^0 + α*P1/(1-α)^1)*(1-α)^1, ...
    np.multiply(p, o, o)
    return o
```

Note: the input will be overwritten.
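Since the input is overwritten, a copy is needed whenever the original series must survive; a minimal usage sketch (my addition):

```
# Pass a copy to preserve the original input array
import numpy as np

prices = np.random.random(1000)
smoothed = ema(prices.copy(), 0.1)
```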

### Method 11

Building on top of Divakar's great answer, here is an implementation which corresponds to the `adjust=True` flag of the pandas function, i.e. using weights rather than recursion.

```
def numpy_ewma(data, window):
    alpha = 2 /(window + 1.0)
    n = data.shape[0]
    scale_arr = (1-alpha)**(-1*np.arange(n))
    weights = (1-alpha)**np.arange(n)
    pw0 = (1-alpha)**(n-1)
    mult = data*pw0*scale_arr
    cumsums = mult.cumsum()
    out = cumsums*scale_arr[::-1] / weights.cumsum()

    return out
```
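A quick check (my addition): this should match pandas’ default, which uses `adjust=True`:

```
# Check against pandas' default adjust=True behaviour
import numpy as np
import pandas as pd

data = np.random.random(1000)
expected = pd.Series(data).ewm(span=20).mean().to_numpy()  # adjust=True is the default
assert np.allclose(numpy_ewma(data, 20), expected)
```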

### Method 12

Thanks to @Divakar’s solution, which is really fast. However, it causes the overflow problem pointed out by @Danny: the function doesn’t return correct answers when the length is greater than about 13835 on my end.

The following is my solution, based on Divakar’s solution and `pandas.ewm().mean()`:

```
def numpy_ema(data, com=None, span=None, halflife=None, alpha=None):
    """Summary
    Calculate ema with automatically-generated alpha. Weight of past effect
    decreases as the length of the window increases.

    # this function reproduces the pandas result when the flag adjust=False is set.
    References:
    https://stackoverflow.com/questions/42869495/numpy-version-of-exponential-weighted-moving-average-equivalent-to-pandas-ewm

    Args:
        data (TYPE): Description
        com (float, optional): Specify decay in terms of center of mass, alpha=1/(1+com), for com>=0
        span (float, optional): Specify decay in terms of span, alpha=2/(span+1), for span>=1
        halflife (float, optional): Specify decay in terms of half-life, alpha=1-exp(log(0.5)/halflife), for halflife>0
        alpha (float, optional): Specify smoothing factor alpha directly, 0<alpha<=1

    Returns:
        TYPE: Description

    Raises:
        ValueError: Description
    """
    n_input = sum(map(bool, [com, span, halflife, alpha]))
    if n_input != 1:
        raise ValueError(
            'com, span, halflife, and alpha are mutually exclusive')

    nrow = data.shape[0]
    if np.isnan(data).any() or (nrow > 13835) or (data.ndim == 2):
        df = pd.DataFrame(data)
        df_ewm = df.ewm(com=com, span=span, halflife=halflife, alpha=alpha)
        out = df_ewm.mean().values.squeeze()
    else:
        if com:
            alpha = 1 / (1 + com)
        elif span:
            alpha = 2 / (span + 1.0)
        elif halflife:
            alpha = 1 - np.exp(np.log(0.5) / halflife)

        alpha_rev = 1 - alpha
        pows = alpha_rev**(np.arange(nrow + 1))

        scale_arr = 1 / pows[:-1]
        offset = data[0] * pows[1:]
        pw0 = alpha * alpha_rev**(nrow - 1)

        mult = data * pw0 * scale_arr

        cumsums = np.cumsum(mult)
        out = offset + cumsums * scale_arr[::-1]
    return out
```

### Method 13

Here’s my implementation for 1D input arrays with infinite window size. As it uses large numbers, it only works with input arrays whose elements have absolute value < 1e16 when using float32, but that should normally be the case.

The idea is to reshape the input array into slices of a limited length, so that no overflow occurs, and then to perform the EWMA calculation on each slice separately.

```
def ewm(x, alpha):
    """
    Returns the exponentially weighted mean y of a numpy array x with scaling factor alpha
    y[0] = x[0]
    y[j] = (1. - alpha) * y[j-1] + alpha * x[j],  for j > 0

    x -- 1D numpy array
    alpha -- float
    """

    n = int(-100. / np.log(1.-alpha)) # Makes sure that the first and last elements in f are very big and very small (about 1e22 and 1e-22)
    f = np.exp(np.arange(1-n, n, 2) * (0.5 * np.log(1. - alpha))) # Scaling factor for each slice
    tmp = (np.resize(x, ((len(x) + n - 1) // n, n)) / f * alpha).cumsum(axis=1) * f # Get ewm for each slice of length n

    # Add the last value of each previous slice to the next slice with corresponding scaling factor f and return result
    return np.resize(tmp + np.tensordot(np.append(x[0], np.roll(tmp.T[n-1], 1)[1:]), f * ((1. - alpha) / f[0]), axes=0), len(x))
```
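A sanity check (my addition, assuming the `y[0] = x[0]` seeding stated in the docstring): the result should track pandas’ `ewm(alpha=alpha, adjust=False)`:

```
# Sanity check against pandas (adjust=False); agreement is approximate because
# the slicing scheme truncates contributions smaller than ~exp(-100)
import numpy as np
import pandas as pd

x = np.random.random(10000)
alpha = 0.1
expected = pd.Series(x).ewm(alpha=alpha, adjust=False).mean().to_numpy()
assert np.allclose(ewm(x, alpha), expected)
```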

### Method 14

Isn’t the exponential filter the same as a first-order IIR filter?
Why don’t you try this:

```
from scipy import signal
signal.lfilter([alpha], [1, alpha-1], data)
```

where alpha ranges from 0 to 1
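A minimal usage sketch (my addition): note that without an initial filter state (`zi`, as used in Method 6), `lfilter` assumes the signal was zero before `t=0`, so the first few outputs differ from pandas’ `ewm(adjust=False)`:

```
# Minimal usage (alpha chosen for illustration)
from scipy import signal
import numpy as np

alpha = 0.1
data = np.random.random(1000)
smoothed = signal.lfilter([alpha], [1, alpha - 1], data)
```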

### Method 15

The other answers don’t consider missing values, so here is my version, which handles `nan`s and matches the pandas EWM result. I use numba for speed-up, and it is ten times faster than pandas’ implementation.

```
import numpy as np
from numba import jit, prange, float32

matrix = df.values

@jit(nopython=True, nogil=True, parallel=True)
def ewm_mean(arr_in, com):
    '''
    calculate the exponential moving average for each column

    y_t = ewm_t / w_t
    ewm_t = ewm_{t-1} * (1 - alpha) + x_t
    w_t = w_{t-1} * (1 - alpha) + 1

    arr_in->ndarray(dtype=float64): ewm per column
    com->int: alpha = 1/(com + 1)
    '''

    t, m = arr_in.shape
    ewma = np.empty((t, m), dtype=float32)

    alpha = 1 / (com + 1)

    # the size of blocks depends on the device; the number of blocks should match the number of cores on your machine
    sizeOfblock = 1000
    numberOfblock = m // sizeOfblock
    assert sizeOfblock*numberOfblock == m, "wrong split"
    # main loop
    for nb in prange(numberOfblock):       # split columns into blocks

        w = np.where(np.isnan(arr_in[0, nb*sizeOfblock: (nb+1)*sizeOfblock]), 0., 1.)
        ewma_old = arr_in[0, nb*sizeOfblock: (nb+1)*sizeOfblock]
        ewma[0, nb*sizeOfblock: (nb+1)*sizeOfblock] = ewma_old

        for i in range(1, t):              # accumulate row by row
            data_now = arr_in[i, nb*sizeOfblock: (nb+1)*sizeOfblock]
            ewma_old = np.where(np.isnan(ewma_old), 0., ewma_old)*(1-alpha) + data_now
            if np.isnan(np.sum(data_now)): # check nan
                nan_pos = np.isnan(data_now)
                w = w * (1 - alpha) + np.where(nan_pos, 0., 1.)
                d = ewma_old / w
                ewma[i, nb*sizeOfblock: (nb+1)*sizeOfblock] = np.where(nan_pos, ewma[i-1, nb*sizeOfblock: (nb+1)*sizeOfblock], d)
            else:
                w = w * (1 - alpha) + 1.
                d = ewma_old / w
                ewma[i, nb*sizeOfblock: (nb+1)*sizeOfblock] = d
    return ewma

np.isclose(df.ewm(com=2).mean().values, ewm_mean(matrix, com=2), equal_nan=True).all()
```

The dataset looks like this.

```
import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame(np.random.normal(0, 10, size=(4000, 5000)))
df.iloc[0:800, 1000:2000] = np.nan
df.iloc[800:1600, 2000:3000] = np.nan
df.iloc[1600:2400, 3000:4000] = np.nan
df.iloc[2400:3200, 4000:5000] = np.nan
```

Notice: this code spends a lot of time handling NaN values, so I split the data into blocks and compute them in parallel. If your data doesn’t have NaN values, you can make minor changes and speed it up dramatically. Also, you can write more complex logic to assign data to cores. `sizeOfblock` is a parameter; play with it.