When calculating distance between two points in SQL, there are a few different ways to approach the problem depending on your use case.
- Cartesian Distance (aka Euclidean distance), or the distance on a flat map calculated using a straightforward SQL query
- Haversine Distance, or the flying distance calculated using latitude and longitude points in SQL
- Driving Distance, using a Python package and the Google Sheets API
I’ll explain how to use each method in the three examples below, using the distance between San Francisco, CA and Cleveland, OH as my location examples.
This example is especially relevant for me since I’m flying this exact route to visit home for the holidays! As you’ll see, each query will generate a slightly (or dramatically) different result.
The simplest method: Calculating the linear distance between two points in SQL
The simplest way to calculate linear distance between two points in SQL relies on the Pythagorean formula to calculate the hypotenuse of a triangle (A² + B² = C²). I won’t get too deep into the weeds about the math, but the Pythagorean formula is the geometric principle that will help you calculate Euclidean and/or Cartesian distance.
The SQL code looks like this:
Use AirOps Data Sidekick to “automagically” write SQL
Not to brag, but I’m pretty fast with SQL.
I’ve manually written SQL to calculate the simple, linear distance between two points countless times throughout my data career.
I honestly didn’t think I could get much faster, but AirOps AI Data Sidekick proved me wrong.
Instead of writing the SQL query, I can type a question in plain English, click Generate SQL, and automatically get the code I need and the answer to my question.
Here’s what the code looks like in Data Sidekick, and you can also see the answer at the bottom (approximately 2,828 miles).
This would have saved me so much time over the years.
Is linear distance the best way to measure distance in SQL?
While this first method is fast, simple, and accurate for certain use cases, there’s an important caveat to keep in mind:
It calculates distance “as the crow flies” and is only helpful if you need to calculate short distances.
You can still use SQL to calculate longer distances for use cases that require more exact measurements. You just need something that accounts for the curvature of the earth.
A more advanced method: Using SQL to calculate the distance between two latitude and longitude points
Before we dive into the SQL, think about some potential use cases you might encounter that would require calculating the distance between two latitude and longitude points. Here are a few examples:
- You need to determine the best areas to expand a business’s physical footprint based on existing locations within a city.
- You’re working on a consumer e-scooter rental app, and you need to match users to nearby scooters.
- You want to build an email drip campaign based on a user’s IP address.
- You need to help the marketing team deliver geo-targeted digital ads based on a lead’s location.
For each of these examples, linear distance isn’t going to cut it – it won’t give you an accurate number.
That’s because the distance between two points on a curved surface, like a globe, isn’t linear.
Additionally, the distances between lines of latitude and longitude aren’t equidistant. The closer you are to the equator, the further apart lines of latitude become.
Since the Earth isn't a flat surface, you'll need an approach that considers all of this. That's where the Haversine formula comes in:
The Haversine formula allows you to calculate the distance between two locations using latitudinal and longitudinal coordinates. Luckily, you don’t need to do the calculation by hand.
I share this background because it helps frame the Haversine formula’s key benefit: It provides a much more accurate distance "as the crow flies" over long distances.
It may not make much difference when calculating short distances, like e-scooter locations within a small, dense city center. However, it provides far more accurate data when measuring longer distances, which I saw when writing these examples.
Using my San Francisco → Cleveland example, here’s what the SQL query to calculate the distance between the two latitude and longitude points looks like:
Automatically write the SQL query to calculate distance between two locations
AI Data Sidekick would have saved me tons of hours writing this SQL query, too.
In the first example, I specified Euclidean distance in my question.
This time, I’ll simply ask for the distance between Cleveland and San Francisco. The AI automatically “knows” to generate SQL that accounts for latitude and longitude points. It also provides the answer to the query.
Here’s what that looks like in Data Sidekick:
So, using this method, we get a result of approximately 2,165 miles between Cleveland and San Francisco.
A bonus method for advanced use cases: Calculating driving/road distance with a Python package
While the two methods we just covered will suit a variety of use cases, some queries aren't a good fit for SQL.
For example, if you need to calculate the actual driving distance/road distance between two cities, the two SQL examples I shared above won’t yield the most accurate results. If you’ve ever driven around a state with tons of lakes, mountains, or other geographical features, you know how much mileage it adds to your total driving distance.
Luckily, you can use a Python Package called haversine and Google Maps to quickly and easily calculate road/driving distance using Python.
Here’s what the package looks like in action:
The output that you’ll get from this query is that the driving distance between San Francisco, California and Cleveland, Ohio is 2,456 miles.
An easier way to calculate distances in SQL and Python
While a working knowledge of Python is valuable for any data analyst, you don’t necessarily need to be an expert in the language to use it in your work. There’s a way to harness the power of Python (and SQL) without the need to write line after line of code.
…it’s AirOps AI Data Sidekick, if you haven’t guessed already. 🙃
Sidekick uses AI to make working with Python and SQL significantly faster.
Data Sidekick can read a SQL query, understand its output, anticipate the query’s structure, and generate Python operations to perform the next step.
The foundation is the Recipe Book, our list of AI-powered recipes that help SQL users and other data practitioners work faster.
Here’s a quick look at some of our current recipes:
I may be biased since I’m so close to the product, but I can honestly say that it’s an awesome tool for anyone who works with data. Whether you’re learning SQL for the first time, trying to level up, or want to be more efficient in your work with data, Data Sidekick can make your life easier.
If you want to take it for a spin, click here to try Sidekick today! It’s completely free for individuals and small teams.