How To Find The Distance Between A Pair Of Points

Working with geo data is really fun and exciting, especially once you have cleaned up all the data and loaded it into a dataframe or an array. The real work starts when you have to find distances between two coordinates or cities and generate a distance matrix to find the distance of each city from every other.

We will discuss in detail some performance-oriented ways to find these distances and the tools available to achieve that without much hassle.

In this post we will see how to find the distance between two geo-coordinates using scipy and numpy vectorized methods

Distance Matrix

As per wiki definition

In mathematics, computer science and especially graph theory, a distance matrix is a square matrix containing the distances, taken pairwise, between the elements of a set. If there are N elements, this matrix will have size N×N. In graph-theoretic applications the elements are more often referred to as points, nodes or vertices

Here is an example: a distance matrix showing the distance of each of these Indian cities from each other
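To make the definition concrete, here is a minimal sketch that builds an N×N distance matrix for a toy set of points using numpy broadcasting. The points are made-up values chosen only for illustration (they are not city coordinates), and plain Euclidean distance is assumed.

```python
import numpy as np

# Three illustrative points in the plane (made-up values, not city coordinates).
points = np.array([[0.0, 0.0],
                   [3.0, 4.0],
                   [6.0, 8.0]])

# Broadcasting: (N, 1, 2) - (1, N, 2) gives an (N, N, 2) array of coordinate differences.
diff = points[:, None, :] - points[None, :, :]

# The Euclidean norm over the last axis gives the N x N distance matrix.
dist_matrix = np.sqrt((diff ** 2).sum(axis=-1))

print(dist_matrix)
```

The result is square, symmetric, and has zeros on the diagonal, since every point is at distance zero from itself.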

Haversine Distance Metric using the scikit-learn DistanceMetric Class

Create a Dataframe

Let's create a dataframe of 6 Indian cities with their corresponding Latitude/Longitude

    from math import radians

    import numpy as np
    import pandas as pd
    # On scikit-learn versions before 1.0, import DistanceMetric from sklearn.neighbors instead.
    from sklearn.metrics import DistanceMetric

    cities_df = pd.DataFrame({
        'city': ['bangalore', 'Mumbai', 'Delhi', 'kolkatta', 'chennai', 'bhopal'],
        'lat': [12.9716, 19.076, 28.7041, 22.5726, 13.0827, 23.2599],
        'lon': [77.5946, 72.877, 77.1025, 88.639, 80.2707, 77.4126],
    })

Convert the Lat/Long degrees to Radians

In this step we will convert the Lat/Long values from degrees to radians, because the haversine distance metric takes Lat/Long input as radians

    cities_df['lat'] = np.radians(cities_df['lat'])
    cities_df['lon'] = np.radians(cities_df['lon'])

DistanceMetric get_metric()

scikit-learn has a DistanceMetric class that provides fast distance metrics. You can access the metrics listed in its documentation using the get_metric() method of this class and use them to find the distance between two points

[Table: available distance metrics, from the original DistanceMetric documentation]

Please check the documentation for other metrics to be used for other vector spaces

            dist = DistanceMetric.get_metric('haversine')                      

DistanceMetric pairwise()

We have created a dist object with the haversine metric above, and now we will use the pairwise() function to calculate the haversine distance between every pair of elements in this array

pairwise() accepts a 2D matrix in the form of [latitude, longitude] in radians and computes the distance matrix as output in radians as well.

Input:

Input to the pairwise() function is a numpy.ndarray. So we have created a 2D matrix containing the Lat/Long of all the cities in the above dataframe

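    cities_df[['lat','lon']].to_numpy()

    # Output as printed in the original post; the values are shown in degrees,
    # i.e. as they were before the radians conversion above.
    array([[12.9716, 77.5946],
           [19.076 , 72.877 ],
           [28.7041, 77.1025],
           [22.5726, 88.639 ],
           [13.0827, 80.2707],
           [23.2599, 77.4126]])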

We will pass this ndarray to the pairwise() function, which returns the output as an ndarray too

    dist.pairwise(cities_df[['lat','lon']].to_numpy())*6373

Output:

The final output of the pairwise function is a numpy matrix, which we will convert to a dataframe to view the results with city labels as a distance matrix

Taking the earth's spherical radius as 6373 km, multiply the result by 6373 to get the distance in km. For miles, multiply by 3958

    dist.pairwise(cities_df[['lat','lon']].to_numpy())*6373

    array([[   0.        ,  845.62832501, 1750.66416275, 1582.52517566,  290.26311647, 1144.52705214],
           [ 845.62832501,    0.        , 1153.62973323, 1683.20328341, 1033.47995206,  661.62108356],
           [1750.66416275, 1153.62973323,    0.        , 1341.80906015, 1768.20631663,  606.34972183],
           [1582.52517566, 1683.20328341, 1341.80906015,    0.        , 1377.28350373, 1152.40418062],
           [ 290.26311647, 1033.47995206, 1768.20631663, 1377.28350373,    0.        , 1171.47693568],
           [1144.52705214,  661.62108356,  606.34972183, 1152.40418062, 1171.47693568,    0.        ]])
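As a cross-check of what the haversine pairwise computation does, the same quantity can be sketched in plain numpy with broadcasting. The function name haversine_pairwise and the three-city subset are illustrative choices, not part of any library; the 6373 km radius is the one used in this post.

```python
import numpy as np

def haversine_pairwise(latlon_rad):
    """Pairwise central angles (radians) for an (N, 2) array of [lat, lon] in radians."""
    lat = latlon_rad[:, 0]
    lon = latlon_rad[:, 1]
    # Differences between every pair of points, shape (N, N).
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2.0) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2.0) ** 2)
    return 2.0 * np.arcsin(np.sqrt(a))

coords_deg = np.array([[12.9716, 77.5946],   # Bangalore
                       [19.0760, 72.8770],   # Mumbai
                       [13.0827, 80.2707]])  # Chennai

angles = haversine_pairwise(np.radians(coords_deg))
km = angles * 6373  # scale the unit-sphere result by the earth radius in km
print(km.round(2))
```

The Bangalore to Chennai entry should agree with the roughly 290 km shown in the pairwise output above.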

Create Dataframe of Distance Matrix

From the above output ndarray we will create a dataframe of the distance matrix, which will show the distance of each of these cities from each other

So the index of this dataframe is the list of cities, and the columns are the same cities

Now if you look at the row of one city and the column of another, the cell will show the distance between them

    pd.DataFrame(dist.pairwise(cities_df[['lat','lon']].to_numpy())*6373,
                 columns=cities_df.city.unique(),
                 index=cities_df.city.unique())
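Once the matrix is labelled, individual distances can be looked up by city name. The sketch below assumes a three-city subset with the distances copied from the pairwise output above; the names dist_df, masked and nearest are illustrative.

```python
import numpy as np
import pandas as pd

cities = ['bangalore', 'Mumbai', 'chennai']

# Distances in km copied from the pairwise output above (subset of three cities).
km = np.array([[0.0, 845.62832501, 290.26311647],
               [845.62832501, 0.0, 1033.47995206],
               [290.26311647, 1033.47995206, 0.0]])
dist_df = pd.DataFrame(km, index=cities, columns=cities)

# Row label first, column label second; the matrix is symmetric.
print(dist_df.loc['bangalore', 'chennai'])

# Nearest other city per row: hide the zero diagonal, then take the column of the minimum.
masked = dist_df.mask(np.eye(len(cities), dtype=bool))
nearest = masked.idxmin(axis=1)
print(nearest)
```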

Euclidean Distance Metric using the Scipy Spatial pdist function

The scipy spatial distance module is used to compute a distance matrix from vectors stored in a rectangular array

We will use the pdist function to find the pairwise distances between observations in n-dimensional space

Here is the simple calling format:

    Y = pdist(X, 'euclidean')

We will use the same dataframe that we used above to find the distance matrix using the scipy spatial pdist function

    from scipy.spatial.distance import pdist, squareform

    pd.DataFrame(squareform(pdist(cities_df.iloc[:, 1:])),
                 columns=cities_df.city.unique(),
                 index=cities_df.city.unique())

We are using squareform, which is another function that converts a vector-form distance vector to a square-form distance matrix, and vice versa
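The relationship between the two forms is easier to see on a tiny input. This is a minimal sketch on made-up points (not the city data): pdist returns the condensed vector of N*(N-1)/2 distances, and squareform converts between that vector and the full N×N matrix.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Three illustrative points (made-up values).
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])

# Condensed form: one entry per unordered pair, in the order d(0,1), d(0,2), d(1,2).
condensed = pdist(X, 'euclidean')

# Square form: symmetric N x N matrix with a zero diagonal.
square = squareform(condensed)

# Calling squareform on the square matrix goes back to the condensed vector.
round_trip = squareform(square)

print(condensed)
print(square)
```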

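Here too the Lat/Long values have been converted from degrees to radians, and the output type is the same numpy.ndarray. Note that the Euclidean metric measures straight-line distance in coordinate space, so these values are not great-circle distances on the earth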

Numpy Vectorize approach to calculate haversine distance between two points

For this we first have to define a vectorized function, which takes a nested sequence of objects or numpy arrays as inputs and returns a single numpy array or a tuple of numpy arrays

Haversine Vectorize Function

Let's create a haversine function using numpy

    import numpy as np

    def haversine_vectorize(lon1, lat1, lon2, lat2):
        # Convert all inputs from degrees to radians
        lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])

        newlon = lon2 - lon1
        newlat = lat2 - lat1

        haver_formula = np.sin(newlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(newlon/2.0)**2

        dist = 2 * np.arcsin(np.sqrt(haver_formula))
        km = 6367 * dist  # 6367 for distance in km; for miles use 3958
        return km
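For reference, the function above follows the standard haversine formula. The notation here is an assumption introduced for illustration: φ is latitude and λ is longitude, both in radians, and R is the earth radius (6367 km in the code above).

```latex
\begin{aligned}
\Delta\varphi &= \varphi_2 - \varphi_1, \qquad \Delta\lambda = \lambda_2 - \lambda_1 \\
a &= \sin^2\!\left(\frac{\Delta\varphi}{2}\right)
   + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\Delta\lambda}{2}\right) \\
d &= 2R \,\arcsin\!\left(\sqrt{a}\right)
\end{aligned}
```

Here a corresponds to haver_formula in the code, and d to the returned km value.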

Now here we need two sets of lat and long, because we are trying to calculate the distance between two cities or points

Dataframe with Origin and Destination Lat/Long

Let's create another dataframe with origin and destination Lat/Long columns

    orig_dest_df = pd.DataFrame({
        'origin_city': ['Bangalore', 'Bombay', 'Delhi', 'Kolkatta', 'Chennai', 'Bhopal'],
        'orig_lat': [12.9716, 19.076, 28.7041, 22.5726, 13.0827, 23.2599],
        'orig_lon': [77.5946, 72.877, 77.1025, 88.639, 80.2707, 77.4126],
        'dest_lat': [23.2599, 12.9716, 19.076, 13.0827, 28.7041, 22.5726],
        'dest_lon': [77.4126, 77.5946, 72.877, 80.2707, 77.1025, 88.639],
        'destination_city': ['Bhopal', 'Bangalore', 'Bombay', 'Chennai', 'Delhi', 'Kolkatta'],
    })

Calculate distance between origin and dest

Let's calculate the haversine distance between the origin and destination cities using the numpy vectorized haversine function

    haversine_vectorize(orig_dest_df['orig_lon'], orig_dest_df['orig_lat'],
                        orig_dest_df['dest_lon'], orig_dest_df['dest_lat'])

    0    1143.449512
    1     844.832190
    2    1152.543623
    3    1375.986830
    4    1766.541600
    5    1151.319225
    dtype: float64

Add column to Dataframe using vectorize function

Let's create a new column called haversine_dist and add it to the original dataframe

            orig_dest_df['haversine_dist'] = haversine_vectorize(orig_dest_df['orig_lon'],orig_dest_df['orig_lat'],orig_dest_df['dest_lon'],orig_dest_df['dest_lat'])                      

It's way faster than normal Python looping, and using the timeit function I can see the performance for myself.

    %%timeit
    orig_dest_df['haversine_dist'] = haversine_vectorize(orig_dest_df['orig_lon'],orig_dest_df['orig_lat'],orig_dest_df['dest_lon'],orig_dest_df['dest_lat'])

18.5 ms ± 4.49 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

We have a small dataset, but this vectorized approach also works fast for really large data with millions of rows
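One way to check this claim on your own machine is to time a plain Python loop against the vectorized version on synthetic data. This is only a sketch: the coordinates are random, the sample size is arbitrary, and the function names haversine_scalar and haversine_numpy are illustrative. Actual timings will vary by machine, so none are quoted here.

```python
import math
import timeit

import numpy as np

R_KM = 6367  # same radius as haversine_vectorize in this post

def haversine_scalar(lon1, lat1, lon2, lat2):
    """Haversine distance in km for a single pair of points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return R_KM * 2 * math.asin(math.sqrt(min(1.0, a)))

def haversine_numpy(lon1, lat1, lon2, lat2):
    """Same formula applied to whole arrays at once."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return R_KM * 2 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

rng = np.random.default_rng(0)
n = 20_000  # synthetic sample size, chosen only for illustration
lon1, lon2 = rng.uniform(-180, 180, (2, n))
lat1, lat2 = rng.uniform(-90, 90, (2, n))

def run_loop():
    return np.array([haversine_scalar(*row) for row in zip(lon1, lat1, lon2, lat2)])

def run_vectorized():
    return haversine_numpy(lon1, lat1, lon2, lat2)

loop_result = run_loop()
vec_result = run_vectorized()

loop_time = timeit.timeit(run_loop, number=1)
vec_time = timeit.timeit(run_vectorized, number=1)
print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.4f} s")
```

Both versions compute the same distances, so any difference in the printed times comes from the looping overhead rather than the formula.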

Conclusion:

So far we have seen different ways to calculate the pairwise distance and compute the distance matrix using Scipy's spatial distance functions and the DistanceMetric class.

These distance functions are a fast and easy way to compute the distance matrix for a sequence of coordinates in the form of [lat, long] in a 2D array. The output is a numpy.ndarray, which can be imported into a pandas dataframe

Using numpy and a vectorized function, we have seen how to calculate the haversine distance between two points or geo-coordinates really fast and without explicit looping

Do you know any other methods or functions to calculate the distance matrix between vectors? Please write your comments and let us know


Source: https://kanoki.org/2019/12/27/how-to-calculate-distance-in-python-and-pandas-using-scipy-spatial-and-distance-functions/

Posted by: brownthorthamme.blogspot.com
