Reverse Geocoding with Geopy

Nancy Amandi
4 min read · Dec 30, 2022

Learn how to reverse geocode a dataset and extract sub-addresses from it

Photo by Markus Spiske on Unsplash

Why this article?

With this article, you can learn how to convert longitudes and latitudes in a large dataset to actual addresses and extract sub-addresses from them.

You might have a dataset with nothing but latitude and longitude points and wonder what insights you could possibly extract from it. Is the dataset useless after all?

Or maybe not.

You can derive lots of variables from latitude and longitude points alone, if you're ready to follow me through this article.

Ready? Let’s go!

What is Reverse Geocoding?

In simple terms, reverse geocoding is the process of converting latitude and longitude coordinate points to addresses that humans can understand.

It’s just like converting these coordinates:

Latitude: 33.804504

Longitude: -84.1587461

To this address:

1000 Robert E Lee Blvd, Stone Mountain, GA 30083, United States — Stone Mountain Park

Now that you know what I mean by reverse geocoding, let’s go straight to business, shall we?

Import the necessary libraries

# Only pandas and geopy are needed for the steps below
import pandas as pd
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

Read CSV file into pandas dataframe

We’ll be using one of Uber's open-source datasets. You can find the dataset here.

df = pd.read_csv("uber-raw-data-aug14.csv")
# info() prints its summary directly (and returns None), so no print() is needed
df.info()

Notice the large number of entries (829,274) and the Lat and Lon columns, which hold the latitude and longitude coordinates.

Image by Author

To keep this walkthrough manageable, we will randomly sample 500 rows, which should still cover locations from across the whole dataset. Geocoding every row through a free service limited to one request per second would take 829,274 seconds, which is more than nine days.

Because the sampling is random, your rows (and outputs) will differ from the ones shown here unless you pass a fixed random_state to sample().

df = df.sample(n=500, replace=False)
print(df.shape)
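If you want to get the same 500 rows on every run, DataFrame.sample accepts a random_state seed. The sketch below runs on a small made-up frame (the coordinates are illustrative, not from the Uber file) to show that two draws with the same seed return the same rows.

```python
import pandas as pd

# Toy stand-in for the Uber data; these coordinates are made up for illustration
toy = pd.DataFrame({
    "Lat": [40.70, 40.75, 40.68, 40.72, 40.80, 40.65],
    "Lon": [-73.99, -73.98, -73.95, -74.00, -73.96, -73.78],
})

# Same seed, same rows: the sample can be reproduced across runs
first = toy.sample(n=3, replace=False, random_state=42)
second = toy.sample(n=3, replace=False, random_state=42)

print(first.index.equals(second.index))  # True
```

On the real data this would be df.sample(n=500, replace=False, random_state=42); any fixed integer works as the seed.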

Join the latitude and longitude points into one column

Nominatim's reverse() accepts the coordinates as a single "latitude, longitude" string, so we join the two columns into one, with latitude first.

df["coordinates"] = df["Lat"].astype(str) + ", " + df["Lon"].astype(str)
print(df.head(10))
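The order of the concatenation matters, because geopy reads the first number as the latitude. As a sketch on two made-up rows, here is the string column next to an alternative tuple column; geopy's reverse() also accepts a (latitude, longitude) tuple, so either form can be passed to it.

```python
import pandas as pd

# Two made-up rows standing in for the sampled Uber data
df = pd.DataFrame({"Lat": [40.7, 40.75], "Lon": [-73.99, -73.98]})

# "lat, lon" strings: latitude first, then longitude
df["coordinates"] = df["Lat"].astype(str) + ", " + df["Lon"].astype(str)

# Alternative: (lat, lon) tuples built by zipping the two columns
df["coord_tuple"] = list(zip(df["Lat"], df["Lon"]))

print(df["coordinates"].tolist())  # ['40.7, -73.99', '40.75, -73.98']
```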

Notice the new column called coordinates.

Image by Author

Reverse geocode

Several geocoding services that work with Geopy offer reverse geocoding, such as Google and Bing, but they generally require an API key. We'll use OpenStreetMap's Nominatim because it's free, although its usage policy limits how fast you can send requests.

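geolocator = Nominatim(user_agent="Nancy Amandi", timeout=10)
# Space requests at least 1 second apart, as Nominatim's usage policy requires
rgeocode = RateLimiter(geolocator.reverse, min_delay_seconds=1)
# One reverse-geocoding request per row
df["location"] = df["coordinates"].apply(rgeocode)
print(df.head(10))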

Setting timeout matters: it is the number of seconds geopy waits for each individual request before raising a timeout error, and the default is only 1 second. I used 10 seconds to give slow responses time to arrive.

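The min_delay_seconds parameter sets the minimum pause between consecutive requests. Nominatim's usage policy allows at most one request per second, so this should be at least 1; shorter delays risk getting your requests throttled or blocked. RateLimiter also retries failed requests and, by default, returns None for a row that still fails instead of stopping the whole run.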

The user_agent parameter identifies your application to the Nominatim servers. You can use any descriptive name, so I used my name.
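To make the min_delay_seconds idea concrete, here is a simplified, hypothetical wrapper (min_delay_wrapper is not part of geopy) that enforces a minimum gap between calls. It only illustrates the spacing that RateLimiter provides; geopy's real class also handles retries and errors, so use RateLimiter in practice. The clock and sleep functions are injectable so the sketch can be checked without actually waiting.

```python
import time
from typing import Any, Callable, List, Optional


def min_delay_wrapper(
    func: Callable[..., Any],
    min_delay_seconds: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[..., Any]:
    """Hypothetical helper: call func, but never start two calls less than
    min_delay_seconds apart."""
    last_call: Optional[float] = None  # start time of the previous call

    def wrapped(*args: Any, **kwargs: Any) -> Any:
        nonlocal last_call
        if last_call is not None:
            wait = min_delay_seconds - (clock() - last_call)
            if wait > 0:
                sleep(wait)  # pause until the minimum gap has passed
        last_call = clock()
        return func(*args, **kwargs)

    return wrapped


# Fake clock for demonstration: "sleeping" just advances a counter
now: List[float] = [0.0]
waits: List[float] = []


def fake_clock() -> float:
    return now[0]


def fake_sleep(seconds: float) -> None:
    waits.append(seconds)
    now[0] += seconds


# A stand-in lookup function instead of a real network call
lookup = min_delay_wrapper(lambda q: f"address for {q}", 1.0, fake_clock, fake_sleep)
results = [lookup(q) for q in ["40.7, -73.99", "40.75, -73.98", "40.68, -73.95"]]
print(waits)  # [1.0, 1.0]
```

Only the second and third calls wait, because the first has nothing to space itself from. At one request per second, the 500 sampled rows need at least 500 seconds (a little over 8 minutes) of geocoding.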

Image by Author

Voilà! We now have a new column called location containing the addresses.

Extract sub-addresses

From each derived location, we will extract the suburb, city and state.

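Each value in the location column is a geopy Location object (or None if the lookup failed), not a plain tuple. Its raw attribute already holds Nominatim's full response as a dictionary, and the address components sit in a nested dictionary under the "address" key, so we can extract the sub-addresses by key.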
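The key lookup works on any object with a raw dictionary. In the sketch below, types.SimpleNamespace is a hypothetical stand-in for a geopy Location, and the address values are invented placeholders rather than real Nominatim output. Chained dict.get calls return None instead of raising KeyError when a key is missing.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a geopy Location; the address values are placeholders
location = SimpleNamespace(raw={
    "display_name": "Example Street, Example Suburb, Example City, Example State",
    "address": {
        "road": "Example Street",
        "suburb": "Example Suburb",
        "city": "Example City",
        "state": "Example State",
    },
})

address = location.raw.get("address", {})  # nested dict of address components
print(address.get("suburb"))    # Example Suburb
print(address.get("postcode"))  # None: the key is missing, but no KeyError is raised
```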

Each function below takes a row, reads its location value and returns the suburb, city or state entry from location.raw["address"]. Passing it to df.apply with axis=1 runs it on every row.

Where a key doesn't exist in the dictionary, or the lookup returned None, the except branch returns None, which pandas treats as a missing value.

def get_suburb(row):
    location = row["location"]
    try:
        return location.raw["address"]["suburb"]
    except (AttributeError, KeyError):
        # AttributeError: no Location for this row (None); KeyError: no "suburb" key
        return None

def get_city(row):
    location = row["location"]
    try:
        return location.raw["address"]["city"]
    except (AttributeError, KeyError):
        # AttributeError: no Location for this row (None); KeyError: no "city" key
        return None

def get_state(row):
    location = row["location"]
    try:
        return location.raw["address"]["state"]
    except (AttributeError, KeyError):
        # AttributeError: no Location for this row (None); KeyError: no "state" key
        return None

# Each new column is filled by its matching function
df["SUBURB"] = df.apply(get_suburb, axis=1)
df["CITY"] = df.apply(get_city, axis=1)
df["STATE"] = df.apply(get_state, axis=1)

print(df.head(10))

Compare each row's location with its new columns: the suburb, city and state values all appear in the corresponding address.

Yes! The sub-addresses have been extracted into their own columns; rows where Nominatim didn't return a component are left empty.
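The three extraction functions differ only in the key they read, so they can be folded into one helper. The sketch below is one possible refactor, not the method used above: get_address_part is a hypothetical name, it works on the location objects directly through Series.map, and it accepts fallback keys. The fallback is an assumption worth checking against your own results: Nominatim may report smaller places under keys such as "town" or "village" rather than "city". SimpleNamespace objects with placeholder values again stand in for geopy Location results.

```python
from types import SimpleNamespace
from typing import Any, Optional

import pandas as pd


def get_address_part(location: Any, *keys: str) -> Optional[str]:
    """Hypothetical helper: return the first of `keys` found in location.raw["address"]."""
    if location is None:  # the reverse lookup returned nothing for this row
        return None
    address = location.raw.get("address", {})
    for key in keys:  # try each key in order, e.g. "city", then "town", then "village"
        if key in address:
            return address[key]
    return None


# Placeholder stand-ins for geopy Location objects (the last one simulates a failed lookup)
locations = pd.Series([
    SimpleNamespace(raw={"address": {"suburb": "Example Suburb", "city": "Example City", "state": "Example State"}}),
    SimpleNamespace(raw={"address": {"town": "Example Town", "state": "Example State"}}),
    None,
])

parts = pd.DataFrame({
    "SUBURB": locations.map(lambda loc: get_address_part(loc, "suburb")),
    "CITY": locations.map(lambda loc: get_address_part(loc, "city", "town", "village")),
    "STATE": locations.map(lambda loc: get_address_part(loc, "state")),
})
print(parts)
```

On the real data, the same calls would run on df["location"] instead of the placeholder Series. One helper keeps the missing-value handling in a single place, at the cost of slightly less obvious column-by-column code.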

Conclusion

Thank you for following me through this guide. I would love to know how it helped you reverse geocode your dataset, so please leave a comment.

Connect with me on Twitter and LinkedIn
