I know that to find the distance between two latitude, longitude points I need to use the haversine function:
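from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    km = 6367 * c  # 6367 is the Earth radius in km used here
    return km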
I have a DataFrame where one column is latitude and another column is longitude. I want to find out how far these points are from a set point, -56.7213600, 37.2175900. How do I take the values from the DataFrame and put them into the function?
Example DataFrame:
    SEAZ     LAT          LON
1   296.40,  58.7312210,  28.3774110
2   274.72,  56.8148320,  31.2923240
3   192.25,  52.0649880,  35.8018640
4   34.34,   68.8188750,  67.1933670
5   271.05,  56.6699880,  31.6880620
6   131.88,  48.5546220,  49.7827730
7   350.71,  64.7742720,  31.3953780
8   214.44,  53.5192920,  33.8458560
9   1.46,    67.9433740,  38.4842520
10  273.55,  53.3437310,  4.4716664
Answers:
Method 1
I can’t confirm if the calculations are correct but the following worked:
In [11]:
from numpy import cos, sin, arcsin, sqrt
from math import radians

def haversine(row):
    lon1 = -56.7213600
    lat1 = 37.2175900
    lon2 = row['LON']
    lat2 = row['LAT']
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * arcsin(sqrt(a))
    km = 6367 * c
    return km

df['distance'] = df.apply(lambda row: haversine(row), axis=1)
df
Out[11]:
         SEAZ        LAT        LON     distance
index
1      296.40  58.731221  28.377411  6275.791920
2      274.72  56.814832  31.292324  6509.727368
3      192.25  52.064988  35.801864  6990.144378
4       34.34  68.818875  67.193367  7357.221846
5      271.05  56.669988  31.688062  6538.047542
6      131.88  48.554622  49.782773  8036.968198
7      350.71  64.774272  31.395378  6229.733699
8      214.44  53.519292  33.845856  6801.670843
9        1.46  67.943374  38.484252  6418.754323
10     273.55  53.343731   4.471666  4935.394528
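Since the correctness of the calculation was left unconfirmed, here is a minimal sketch of a sanity check that needs only the standard library. It reuses the scalar `haversine` from the question; the variable names `same_point`, `quarter` and `row_1` are my own and purely illustrative. It checks a zero distance, a quarter of a great circle (which should equal `6367 * pi / 2`), and row 1 of the table, which should come out close to the 6275.79 km shown above.

```python
from math import radians, sin, cos, asin, sqrt, pi

def haversine(lon1, lat1, lon2, lat2):
    # Great-circle distance in km between two points given in decimal degrees.
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * asin(sqrt(a))
    return 6367 * c  # same Earth radius (km) as in the question

# A point is at distance zero from itself.
same_point = haversine(-56.7213600, 37.2175900, -56.7213600, 37.2175900)

# Equator to pole along one meridian is a quarter of a great circle.
quarter = haversine(0.0, 0.0, 0.0, 90.0)

# Row 1 of the example DataFrame (LON, LAT) against the set point.
row_1 = haversine(-56.7213600, 37.2175900, 28.3774110, 58.7312210)

print(same_point, quarter, 6367 * pi / 2, row_1)
```

If the three printed distances agree with these expectations, the apply-based column is at least consistent with the formula as written.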
The following vectorized code (which assumes `import math` and `import numpy as np`) is actually slower on such a small DataFrame, but I applied it to a 100,000-row df:
In [35]:
%%timeit
df['LAT_rad'], df['LON_rad'] = np.radians(df['LAT']), np.radians(df['LON'])
df['dLON'] = df['LON_rad'] - math.radians(-56.7213600)
df['dLAT'] = df['LAT_rad'] - math.radians(37.2175900)
df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin(df['dLAT']/2)**2 + math.cos(math.radians(37.2175900)) * np.cos(df['LAT_rad']) * np.sin(df['dLON']/2)**2))

1 loops, best of 3: 17.2 ms per loop
Compared to the apply function, which took 4.3 s on the same 100,000 rows, this is nearly 250 times quicker, which is worth noting for the future.
If we compress all of the above into a one-liner:
In [39]:
%timeit df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin((np.radians(df['LAT']) - math.radians(37.2175900))/2)**2 + math.cos(math.radians(37.2175900)) * np.cos(np.radians(df['LAT'])) * np.sin((np.radians(df['LON']) - math.radians(-56.7213600))/2)**2))

100 loops, best of 3: 12.6 ms per loop
We observe a further speed-up: this is now ~341 times quicker than the apply version.
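The one-liner repeats the reference coordinates in several places, which makes it easy to mistype one of them. Below is a minimal sketch of how the same vectorized calculation could be wrapped in a function. The name `haversine_np` and the `radius_km` parameter are hypothetical and not part of the original answer, and the two sample rows are taken from the example DataFrame.

```python
import numpy as np

def haversine_np(lon1, lat1, lon2, lat2, radius_km=6367):
    # Vectorized haversine: inputs are decimal degrees and may be
    # scalars, NumPy arrays, or pandas Series (NumPy broadcasts them).
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return radius_km * 2 * np.arcsin(np.sqrt(a))

# First two rows of the example DataFrame, as plain arrays.
lats = np.array([58.7312210, 56.8148320])
lons = np.array([28.3774110, 31.2923240])

# Distance of every row from the set point (lon, lat).
distances = haversine_np(-56.7213600, 37.2175900, lons, lats)
print(distances)
```

With the DataFrame from the question, the equivalent call would presumably be `df['distance'] = haversine_np(-56.7213600, 37.2175900, df['LON'], df['LAT'])`, which keeps the whole computation in NumPy rather than looping row by row.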
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.