I’m trying to fit a Gaussian for my data (which is already a rough gaussian). I’ve already taken the advice of those here and tried curve_fit and leastsq but I think that I’m missing something more fundamental (in that I have no idea how to use the command).
Here’s a look at the script I have so far
import pylab as plb
import matplotlib.pyplot as plt
# Read in data -- first 2 rows are header in this example.
data = plb.loadtxt('part 2.csv', skiprows=2, delimiter=',')
x = data[:,2]
y = data[:,3]
mean = sum(x*y)
sigma = sum(y*(x - mean)**2)
def gauss_function(x, a, x0, sigma):
return a*np.exp(-(x-x0)**2/(2*sigma**2))
popt, pcov = curve_fit(gauss_function, x, y, p0 = [1, mean, sigma])
plt.plot(x, gauss_function(x, *popt), label='fit')
# plot data
plt.plot(x, y,'b')
# Add some axis labels
plt.legend()
plt.title('Fig. 3 - Fit for Time Constant')
plt.xlabel('Time (s)')
plt.ylabel('Voltage (V)')
plt.show()
What I get from this is a gaussian-ish shape which is my original data, and a straight horizontal line.
Also, I’d like to plot my graph using points, instead of having them connected.
Any input is appreciated!
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
Here is corrected code:
import pylab as plb
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy import asarray as ar,exp
x = ar(range(10))
y = ar([0,1,2,3,4,5,4,3,2,1])
n = len(x) #the number of data
mean = sum(x*y)/n #note this correction
sigma = sum(y*(x-mean)**2)/n #note this correction
def gaus(x,a,x0,sigma):
return a*exp(-(x-x0)**2/(2*sigma**2))
popt,pcov = curve_fit(gaus,x,y,p0=[1,mean,sigma])
plt.plot(x,y,'b+:',label='data')
plt.plot(x,gaus(x,*popt),'ro:',label='fit')
plt.legend()
plt.title('Fig. 3 - Fit for Time Constant')
plt.xlabel('Time (s)')
plt.ylabel('Voltage (V)')
plt.show()
result:

Method 2
Explanation
You need good starting values such that the curve_fit function converges at “good” values. I can not really say why your fit did not converge (even though the definition of your mean is strange – check below) but I will give you a strategy that works for non-normalized Gaussian-functions like your one.
Example
The estimated parameters should be close to the final values (use the weighted arithmetic mean – divide by the sum of all values):
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
x = np.arange(10)
y = np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1])
# weighted arithmetic mean (corrected - check the section below)
mean = sum(x * y) / sum(y)
sigma = np.sqrt(sum(y * (x - mean)**2) / sum(y))
def Gauss(x, a, x0, sigma):
return a * np.exp(-(x - x0)**2 / (2 * sigma**2))
popt,pcov = curve_fit(Gauss, x, y, p0=[max(y), mean, sigma])
plt.plot(x, y, 'b+:', label='data')
plt.plot(x, Gauss(x, *popt), 'r-', label='fit')
plt.legend()
plt.title('Fig. 3 - Fit for Time Constant')
plt.xlabel('Time (s)')
plt.ylabel('Voltage (V)')
plt.show()
I personally prefer using numpy.
Comment on the definition of the mean (including Developer’s answer)
Since the reviewers did not like my edit on #Developer’s code, I will explain for what case I would suggest an improved code. The mean of developer does not correspond to one of the normal definitions of the mean.
Your definition returns:
>>> sum(x * y) 125
Developer’s definition returns:
>>> sum(x * y) / len(x) 12.5 #for Python 3.x
The weighted arithmetic mean:
>>> sum(x * y) / sum(y) 5.0
Similarly you can compare the definitions of standard deviation (sigma). Compare with the figure of the resulting fit:
Comment for Python 2.x users
In Python 2.x you should additionally use the new division to not run into weird results or convert the the numbers before the division explicitly:
from __future__ import division
or e.g.
sum(x * y) * 1. / sum(y)
Method 3
You get a horizontal straight line because it did not converge.
Better convergence is attained if the first parameter of the fitting (p0) is put as max(y), 5 in the example, instead of 1.
Method 4
After losing hours trying to find my error, the problem is your formula:
sigma = sum(y*(x-mean)**2)/n
This previous formula is wrong, the correct formula is the square root of this!;
sqrt(sum(y*(x-mean)**2)/n)
Hope this helps
Method 5
There is another way of performing the fit, which is by using the ‘lmfit’ package. It basically uses the cuve_fit but is much better in fitting and offers complex fitting as well.
Detailed step by step instructions are given in the below link.
http://cars9.uchicago.edu/software/python/lmfit/model.html#model.best_fit
Method 6
sigma = sum(y*(x - mean)**2)
should be
sigma = np.sqrt(sum(y*(x - mean)**2))
Method 7
Actually, you do not need to do a first guess. Simply doing
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy import asarray as ar,exp
x = ar(range(10))
y = ar([0,1,2,3,4,5,4,3,2,1])
n = len(x) #the number of data
mean = sum(x*y)/n #note this correction
sigma = sum(y*(x-mean)**2)/n #note this correction
def gaus(x,a,x0,sigma):
return a*exp(-(x-x0)**2/(2*sigma**2))
popt,pcov = curve_fit(gaus,x,y)
#popt,pcov = curve_fit(gaus,x,y,p0=[1,mean,sigma])
plt.plot(x,y,'b+:',label='data')
plt.plot(x,gaus(x,*popt),'ro:',label='fit')
plt.legend()
plt.title('Fig. 3 - Fit for Time Constant')
plt.xlabel('Time (s)')
plt.ylabel('Voltage (V)')
plt.show()
works fine. This is simpler because making a guess is not trivial. I had more complex data and did not manage to do a proper first guess, but simply removing the first guess worked fine 🙂
P.S.: use numpy.exp() better, says a warning of scipy
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0
