Shortest Path Analysis with Python (2024)

Lesson objectives

This tutorial focuses on spatial networks and shows how to construct a routable directed graph for NetworkX and find shortest paths along a given street network based on travel time or distance by car. In addition, we will learn how to calculate travel time matrices by public transport using the r5py library.

Run this code in Binder

Before you can run this Notebook and/or do any programming, you need to launch the Binder instance. You can find the buttons for activating the Python environment at the top-right of this page.

Working with Jupyter Notebooks

Jupyter Notebooks are documents that contain computer code and rich text elements (such as text, figures, tables and links) and that can be run inside the JupyterLab programming environment.

A couple of hints:

You can execute a cell by clicking a given cell that you want to run and pressing Shift + Enter (or by clicking the “Play” button on top)
You can change the cell-type between Markdown (for writing text) and Code (for writing/executing code) from the dropdown menu above.

See further details and help for using Notebooks and JupyterLab from here.

Tutorial

In this tutorial, we will focus on network analysis methods that relate to way-finding. Finding the shortest path from A to B using a specific street network is a very common spatial analytics problem with many practical applications.

Python provides easy-to-use tools for conducting spatial network analysis. One of the easiest ways to start is to use a library called NetworkX, a Python module that provides many tools for analyzing networks in various ways. It also contains algorithms, such as Dijkstra’s algorithm or the A* algorithm, that are commonly used to find shortest paths along a transportation network.
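The following plain-Python sketch shows what Dijkstra’s algorithm does under the hood. It is not the NetworkX implementation. The toy graph, its node names and the dijkstra helper are hypothetical. Edge weights are assumed to be non-negative, which Dijkstra’s algorithm requires. The cost attribute is selected by name, similar in spirit to the weight parameter of NetworkX.

```python
import heapq


def dijkstra(graph, source, target, weight):
    """Return (total_cost, path) of the cheapest path from source to target.

    graph: dict mapping node -> {neighbor: {attribute_name: value}}
    weight: name of the edge attribute used as the cost (e.g. "length")
    """
    queue = [(0.0, source)]  # priority queue of (cost so far, node)
    best_cost = {source: 0.0}
    previous = {}
    visited = set()

    while queue:
        cost, node = heapq.heappop(queue)  # always expand the cheapest node
        if node in visited:
            continue  # stale entry: a cheaper route to this node was already found
        visited.add(node)
        if node == target:
            break
        for neighbor, attrs in graph.get(node, {}).items():
            new_cost = cost + attrs[weight]
            if new_cost < best_cost.get(neighbor, float("inf")):
                best_cost[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))

    if target not in visited:
        raise ValueError(f"No path from {source} to {target}")

    # Walk back from the target to the source to reconstruct the path
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best_cost[target], path[::-1]


# Hypothetical directed street graph: A -> B -> D is shorter (30 kmph roads),
# A -> C -> D is longer but faster (50 kmph roads)
graph = {
    "A": {"B": {"length": 300, "travel_time_seconds": 36.0},
          "C": {"length": 400, "travel_time_seconds": 28.8}},
    "B": {"D": {"length": 300, "travel_time_seconds": 36.0}},
    "C": {"D": {"length": 400, "travel_time_seconds": 28.8}},
    "D": {},
}

print(dijkstra(graph, "A", "D", weight="length"))               # (600.0, ['A', 'B', 'D'])
print(dijkstra(graph, "A", "D", weight="travel_time_seconds"))  # (57.6, ['A', 'C', 'D'])
```

The two calls show why the choice of cost attribute matters: the shortest route by distance is not necessarily the fastest one.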

Next, we will learn how to do spatial network analysis in practice.

Typical workflow for routing

If you want to conduct network analysis (in any programming language), there are a few basic steps that typically need to be done before you can start routing. These steps are:

Retrieve data (such as street network from OSM or Digiroad + possibly transit data if routing with PT).
Modify the network by adding/calculating edge weights (such as travel times based on speed limit and length of the road segment).
Build a routable graph for the routing tool that you are using (e.g. for NetworkX, igraph or OpenTripPlanner).
Conduct network analysis (such as shortest path analysis) with the routing tool of your choice.

1. Retrieve data

As a first step, we need to obtain data for routing. Pyrosm library makes it really easy to retrieve routable networks from OpenStreetMap (OSM) with different transport modes (walking, cycling and driving).

Let’s first extract drivable OSM data for Helsinki. In pyrosm, we can use a method called osm.get_network(), which retrieves data from OpenStreetMap. You can specify what kind of roads should be retrieved from OSM with the network_type parameter (supports walking, cycling, driving).

from pyrosm import OSM, get_data
import geopandas as gpd
import pandas as pd
import networkx as nx

# We will use test data for Helsinki that comes with pyrosm
osm = OSM(get_data("helsinki_pbf"))

# Parse roads that can be driven by car
roads = osm.get_network(network_type="driving")
roads.plot(figsize=(10,10))


roads.head(2)

	access	area	bicycle	bridge	cycleway	foot	footway	highway	int_ref	lanes	...	surface	tunnel	width	id	timestamp	version	tags	osm_type	geometry	length
0	None	None	None	None	None	None	None	unclassified	None	2	...	paved	None	None	4236349	1380031970	21	{"name:fi":"Erottajankatu","name:sv":"Skillnad...	way	MULTILINESTRING ((24.94327 60.16651, 24.94337 ...	14.0
1	None	None	None	None	None	None	None	unclassified	None	2	...	paved	None	None	4243035	1543430213	12	{"name:fi":"Korkeavuorenkatu","name:sv":"H\u00...	way	MULTILINESTRING ((24.94567 60.16767, 24.94567 ...	51.0

2 rows × 30 columns

Okay, now we have drivable roads for the city center of Helsinki as a GeoDataFrame. If you scroll to the right in the GeoDataFrame, you can see that pyrosm has also calculated the length of each road segment for us (in meters). The geometries are stored as MultiLineString objects. The map above shows that the data also includes short pieces of road that do not lead anywhere (i.e. they are isolated). This is a typical issue with real-world data such as roads, so at some point we need to deal with them. The typical solution is to remove them, but you could also connect them to other parts of the network.

In OSM, the information about the allowed direction of movement is stored in the column oneway. Let’s take a look at what kind of values we have in that column:

roads["oneway"].unique()

array(['yes', None, 'no'], dtype=object)

As we can see, the unique values in that column are "yes", "no" or None. We can use this information to construct a directed graph for routing by car. For walking and cycling, you typically want to create a bidirectional graph, because travel is typically allowed in both directions, at least in Finland. Notice that the rules vary by country. In Copenhagen, for example, oneway rules also apply to bikes, but you can typically still travel along a road in both directions (you just need to switch to the other side of the road if you want to make a U-turn). The column maxspeed contains information about the speed limit of a given road:

roads["maxspeed"].unique()

array(['30', '40', None, '20', '10', '5', '50'], dtype=object)

As we can see, there are also None values in the data, meaning that the speed limit has not been tagged for some roads. This is typical, and you often need to fill in the missing speed limits yourself. You can do this by taking advantage of the road class, which is always present in the column highway:

roads["highway"].unique()

array(['unclassified', 'residential', 'secondary', 'service', 'tertiary', 'primary', 'primary_link', 'cycleway', 'footway', 'tertiary_link', 'pedestrian', 'trail', 'crossing'], dtype=object)

Based on these values, we can make assumptions, e.g. that residential roads in Helsinki have a speed limit of 30 kmph, and use this information to fill the missing values in maxspeed. As we can see, the current version of the pyrosm tool seems to have a bug, because some non-drivable roads (e.g. footway, cycleway) also leaked into our network. If you notice these kinds of issues with any library you use, please notify the developers by raising an Issue in GitHub. This way, you can help improve the software. An issue has already been raised for this problem, so you don’t need to raise it again. Before adding a new issue, it’s always good to check whether a related one already exists in GitHub.

Okay, but how can we make a routable graph out of this data? Let’s recall the basic elements of a graph that we went through in the lecture slides.

So, to create a graph we need nodes and edges. We already have a GeoDataFrame of edges, but where are the nodes? They do not exist yet, but pyrosm can easily retrieve them as well if we specify nodes=True when parsing the streets:

# Parse nodes and edges
nodes, edges = osm.get_network(network_type="driving", nodes=True)

# Plot the data
ax = edges.plot(figsize=(10,10), color="gray", lw=1.0)
ax = nodes.plot(ax=ax, color="red", markersize=2)

# Zoom in to take a closer look
#ax.set_xlim([24.9375, 24.945])
ax.set_ylim([60.17, 60.173])


Okay, now we have both the roads (i.e. edges) and the nodes (in red) that connect the street elements together, which are typically intersections. However, many of the nodes are in locations that are clearly not intersections. This is intended behavior and ensures full connectivity in our network. At a later stage, we can clean and simplify this network by merging all roads that belong to the same link (i.e. street elements between two intersections). This also reduces the size of the network.

Note

In OSM, the street topology is typically not directly suitable for graph traversal, because nodes are missing at intersections and the roads are therefore not split at those locations. As a consequence, it is not possible to make a turn where no intersection is present in the data structure. Hence, pyrosm separates all road segments/geometries into individual rows in the data.
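Here is a hypothetical illustration of what splitting means, not pyrosm’s actual code. An OSM way is stored as an ordered list of node ids. Splitting it yields one edge per pair of consecutive nodes, which gives the from/to (u/v) structure that the edges data uses. The function name and the ids below are made up for the example.

```python
def split_way(way_id, node_ids):
    """Split an OSM way (ordered node ids) into consecutive node-to-node segments."""
    return [
        {"id": way_id, "u": u, "v": v}  # keep the original way id on every segment
        for u, v in zip(node_ids[:-1], node_ids[1:])
    ]


# Hypothetical way with four nodes -> three segments
segments = split_way(1, [101, 102, 103, 104])
print(len(segments))  # 3
print(segments[0])    # {'id': 1, 'u': 101, 'v': 102}
```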

Let’s take a look at what our nodes data looks like:

nodes.head()

	lon	lat	tags	timestamp	version	id	geometry
0	24.943271	60.166514	None	1390926206	2	1372477605	POINT (24.94327 60.16651)
1	24.943365	60.166444	{'highway': 'crossing', 'crossing': 'traffic_s...	1383915357	6	292727220	POINT (24.94337 60.16644)
2	24.943403	60.166408	None	1374595731	1	2394117042	POINT (24.94340 60.16641)
3	24.945668	60.167668	{'highway': 'crossing', 'crossing': 'uncontrol...	1290714658	5	296250563	POINT (24.94567 60.16767)
4	24.945671	60.167630	{'traffic_calming': 'divider'}	1354578076	1	2049084195	POINT (24.94567 60.16763)

As we can see, the nodes GeoDataFrame contains the coordinates of each node as well as a unique id for each node. These id values determine the connectivity in our network. For this reason, pyrosm has also added two columns to the edges GeoDataFrame that specify the from and to ids of each edge: column u contains the from-id and column v the to-id:

# Check last four columns
edges.iloc[:5,-4:]

	geometry	u	v	length
0	LINESTRING (24.94327 60.16651, 24.94337 60.16644)	1372477605	292727220	9.370
1	LINESTRING (24.94337 60.16644, 24.94340 60.16641)	292727220	2394117042	4.499
2	LINESTRING (24.94567 60.16767, 24.94567 60.16763)	296250563	2049084195	4.174
3	LINESTRING (24.94567 60.16763, 24.94569 60.16744)	2049084195	60072359	21.692
4	LINESTRING (24.94569 60.16744, 24.94571 60.16726)	60072359	6100704327	19.083

We can see that the geometries are now stored as LineString instead of MultiLineString. At this point, we can fix the issue of having some pedestrian roads in our network. We do this by removing from our GeoDataFrame all edges whose highway value is one of 'cycleway', 'footway', 'pedestrian', 'trail' or 'crossing':

edges = edges.loc[~edges["highway"].isin(['cycleway', 'footway', 'pedestrian', 'trail', 'crossing'])].copy()
edges.plot()


Now we can see that some of the isolated edges were removed from the data. The character ~ (tilde) in the command above is a negation operator. It is handy when you want to, e.g., remove rows from your GeoDataFrame based on a criterion such as the one we used here.

2. Modify the data

At this stage, we have the components needed to build a routable graph based on distance (nodes and edges). In real life, however, network distance is not the best cost metric, because the shortest path by distance is not necessarily the optimal route in terms of travel time. People typically value time more (and it is easier to comprehend). So at this stage we add a new cost attribute to our edges GeoDataFrame that converts the metric distance to travel time (in seconds) using the following formula:

<distance-in-meters> / (<speed-limit-kmph> / 3.6)
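As a worked example of the formula: a road segment of 9.370 meters with a speed limit of 30 kmph (30 / 3.6 ≈ 8.33 m/s) takes 9.370 / 8.33 ≈ 1.12 seconds to traverse. Below is the same calculation as a small Python function. The name travel_time_seconds is our own helper, not a library function.

```python
def travel_time_seconds(length_m, maxspeed_kmph):
    """Travel time in seconds for a road segment driven at the speed limit."""
    speed_ms = maxspeed_kmph / 3.6  # convert km/h to m/s
    return length_m / speed_ms


print(round(travel_time_seconds(9.370, 30), 4))  # 1.1244
```

With pandas, the same expression works column-wise, e.g. edges["length"] / (edges["maxspeed"] / 3.6).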

Before we can do this calculation, we need to ensure that every row in the maxspeed column has information about the speed limit. Let’s check the value counts of the column and include the NaN values by using the dropna parameter:

# Count values
edges["maxspeed"].value_counts(dropna=False)

30     1110
NaN     628
40      422
10       50
20       36
5        21
50        2
Name: maxspeed, dtype: int64

As we can see, the rows without speed limit information form the second largest group in our data. Hence, we need a rule to fill these gaps. We can base it on the following “rule of thumb” criteria used in Finland (notice that these vary from country to country):

Road class	Speed limit within urban region (km/h)	Speed limit outside urban region (km/h)
motorway	100	120
motorway_link	80	80
trunk	60	100
trunk_link	60	60
primary	50	80
primary_link	50	50
secondary	50	50
secondary_link	50	50
tertiary	50	60
tertiary_link	50	50
unclassified	50	80
unclassified_link	50	50
residential	50	80
living_street	20	NA
service	30	NA
other	50	80
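The rule of thumb above can be written as a lookup that falls back to 50 kmph (the "other" row) for any road class that is not listed. This is a minimal sketch that assumes the within-urban-region column. URBAN_SPEED_LIMITS, road_class_to_kmph and fill_maxspeed are hypothetical names. To keep the example self-contained, the edges are plain dicts instead of a GeoDataFrame.

```python
# Speed limits (km/h) within urban regions, taken from the table above
URBAN_SPEED_LIMITS = {
    "motorway": 100,
    "motorway_link": 80,
    "trunk": 60,
    "trunk_link": 60,
    "primary": 50,
    "primary_link": 50,
    "secondary": 50,
    "secondary_link": 50,
    "tertiary": 50,
    "tertiary_link": 50,
    "unclassified": 50,
    "unclassified_link": 50,
    "residential": 50,
    "living_street": 20,
    "service": 30,
}
DEFAULT_SPEED_LIMIT = 50  # the "other" row


def road_class_to_kmph(road_class):
    """Return a typical urban speed limit (km/h) for an OSM highway class."""
    return URBAN_SPEED_LIMITS.get(road_class, DEFAULT_SPEED_LIMIT)


def fill_maxspeed(edges):
    """Fill missing 'maxspeed' values (in place) from the 'highway' class; keep tagged values."""
    for edge in edges:
        if edge.get("maxspeed") is None:
            edge["maxspeed"] = road_class_to_kmph(edge["highway"])
    return edges


sample_edges = [
    {"highway": "service", "maxspeed": None},
    {"highway": "residential", "maxspeed": 30},  # tagged value is kept
    {"highway": "living_street", "maxspeed": None},
]
print([edge["maxspeed"] for edge in fill_maxspeed(sample_edges)])  # [30, 30, 20]
```

With a GeoDataFrame, you could apply the same lookup to the rows with a missing speed limit. For example, use mask = edges["maxspeed"].isna() and then edges.loc[mask, "maxspeed"] = edges.loc[mask, "highway"].map(road_class_to_kmph).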

For simplicity, we can assume that all roads in the Helsinki Region follow the within-urban-region speed limits, although this is not exactly true (the higher speed limits start somewhere in the outer parts of the city region). To make the speed limit values more robust/correct, you could use urban/rural classification data, which is available in Finland from the Finnish Environment Institute. Let’s first convert our maxspeed values to integers using the astype() method:

edges["maxspeed"] = edges["maxspeed"].astype(float).astype(pd.Int64Dtype())
edges["maxspeed"].unique()

<IntegerArray>
[30, 40, <NA>, 20, 10, 5, 50]
Length: 7, dtype: Int64

After the missing speed limits are filled based on the road class (using the within-urban-region values in the table above), service roads without a tagged speed limit, for instance, get a value of 30 kmph:
	maxspeed	highway
30	30	service
47	30	service
48	30	service
49	30	service
50	30	service

Finally, the speed limits are converted to travel times in seconds for each edge with the formula presented earlier. The first ten edges look like this:
	u	v	length	travel_time_seconds
0	1372477605	292727220	9.370	1.12440
1	292727220	2394117042	4.499	0.53988
2	296250563	2049084195	4.174	0.50088
3	2049084195	60072359	21.692	2.60304
4	60072359	6100704327	19.083	2.28996
5	6100704327	296250223	6.027	0.72324
6	264015226	25345665	9.644	1.15728
7	25345665	296248024	7.016	0.84192
8	296248024	426911766	4.137	0.49644
9	426911766	60072364	21.132	2.53584

3. Build a directed graph with pyrosm

We can use the pyrosm library (as well as OSMnx) to build a directed graph easily. Let’s see how to create a routable NetworkX graph with pyrosm using a single command:

G = osm.to_graph(nodes, edges, graph_type="networkx")
G

<networkx.classes.multidigraph.MultiDiGraph at 0x7f9385274610>
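Conceptually, the directions of the edges in a directed graph like this come from the oneway column we inspected earlier. A one-way road becomes a single directed edge, and a two-way road becomes two edges, one in each direction. This minimal sketch assumes that a missing value (None) means two-way traffic, which is the default for most OSM road types (some, such as motorways, are one-way by default). OSM also uses the value -1 for roads that are one-way against the drawing direction. That value does not occur in this data, so the sketch simply rejects unknown values. directed_edges is a hypothetical helper, not a pyrosm function.

```python
def directed_edges(u, v, oneway):
    """Return the directed (from, to) edges for a road segment between nodes u and v.

    Assumption: "yes" means travel only from u to v, while "no" and None
    (missing tag) both mean that the segment can be driven in both directions.
    """
    if oneway == "yes":
        return [(u, v)]
    if oneway in ("no", None):
        return [(u, v), (v, u)]
    raise ValueError(f"Unhandled oneway value: {oneway!r}")


# Hypothetical road segments: (from-node, to-node, oneway tag)
segments = [(1, 2, "yes"), (2, 3, None), (3, 4, "no")]
graph_edges = [edge for u, v, oneway in segments for edge in directed_edges(u, v, oneway)]
print(graph_edges)  # [(1, 2), (2, 3), (3, 2), (3, 4), (4, 3)]
```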

Now we have a routable graph, but pyrosm also performs some additional steps in the background. By default, pyrosm removes unconnected edges from the graph and keeps only the edges that belong to the connected part of the network. In addition, pyrosm automatically adjusts the graph attributes so that they are compatible with OSMnx, which provides many handy functionalities for working with graphs, such as plotting an interactive map based on the graph:

import osmnx as ox

ox.plot_graph_folium(G)
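One simple way to remove unconnected parts of a network is to ignore edge directions, find the connected components with a breadth-first search and keep only the largest component. This is a hypothetical plain-Python illustration of the idea, not necessarily what pyrosm does internally. The edge list and the largest_component name are made up for the example.

```python
from collections import deque


def largest_component(edges):
    """Return the node set of the largest connected component.

    Edge directions are ignored (i.e. weakly connected components).
    edges: iterable of (u, v) node-id pairs
    """
    # Build an undirected adjacency list
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    seen = set()
    best = set()
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from start
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Hypothetical network: a small street grid plus an isolated two-node stub
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (10, 11)]
keep = largest_component(edges)
print(sorted(keep))  # [1, 2, 3, 4]
print([(u, v) for u, v in edges if u in keep and v in keep])  # [(1, 2), (2, 3), (3, 1), (3, 4)]
```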


4. Routing with NetworkX

Now we have everything we need to start routing with NetworkX (based on driving distance or travel time). But first, let’s again go through some basics about routing.

Basic logic in routing

Most (if not all) routing algorithms work in a broadly similar manner. The basic steps for finding an optimal route from A to B are:

Find the nearest node for the origin location * (+ get info about its node-id and the distance between the origin and the node)
Find the nearest node for the destination location * (+ get info about its node-id and the distance between the destination and the node)
Use a routing algorithm to find the shortest path between A and B
Retrieve edge attributes for the given route(s) and summarize them (can be distance, time, CO2, or whatever)

* in more advanced implementations you might search for the closest edge

The same logic always applies, whether you search for an optimal route from a single origin to a single destination or calculate one-to-many routing queries (producing e.g. travel time matrices).
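Steps 1, 2 and 4 of the routing logic can be sketched in plain Python as follows. Step 3 is the shortest path search itself, e.g. Dijkstra’s algorithm, so the path is given here. The node coordinates, edge attributes and helper names are hypothetical. The nearest node is found by brute force with the haversine (great-circle) distance, which works for a small example. Libraries typically use spatial indexes instead for large networks.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS84 points."""
    r = 6_371_000  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_node(nodes, lon, lat):
    """Steps 1-2: return (node_id, distance_m) of the node closest to (lon, lat)."""
    return min(
        ((node_id, haversine_m(lon, lat, x, y)) for node_id, (x, y) in nodes.items()),
        key=lambda item: item[1],
    )


def summarize_path(edge_attrs, path, attribute):
    """Step 4: sum an edge attribute (e.g. length or travel time) along a path."""
    return sum(edge_attrs[(u, v)][attribute] for u, v in zip(path[:-1], path[1:]))


# Hypothetical nodes {id: (lon, lat)} and edge attributes {(u, v): {...}}
nodes = {1: (24.9360, 60.1696), 2: (24.9430, 60.1680), 3: (24.9512, 60.1664)}
edge_attrs = {
    (1, 2): {"length": 450.0, "travel_time_seconds": 54.0},
    (2, 3): {"length": 520.0, "travel_time_seconds": 62.4},
}

orig_id, orig_dist = nearest_node(nodes, 24.9361, 60.1697)
dest_id, dest_dist = nearest_node(nodes, 24.9511, 60.1665)
path = [orig_id, 2, dest_id]  # in practice, the result of step 3
print(orig_id, dest_id, summarize_path(edge_attrs, path, "length"))  # 1 3 970.0
```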

Find the optimal route between two locations

Next, we will learn how to find the shortest path between two locations using Dijkstra’s algorithm.

First, let’s find the closest nodes for two locations in the area. OSMnx provides a handy function for geocoding an address, ox.geocode(). We can use it to retrieve the x and y coordinates of our origin and destination.

# OSM data is in WGS84, so typically we need to use lat/lon coordinates when searching for the closest node

# Origin
orig_address = "Simonkatu 3, Helsinki"
orig_y, orig_x = ox.geocode(orig_address)  # notice the coordinate order (y, x)!

# Destination
dest_address = "Unioninkatu 33, Helsinki"
dest_y, dest_x = ox.geocode(dest_address)

print("Origin coords:", orig_x, orig_y)
print("Destination coords:", dest_x, dest_y)

Origin coords: 24.9360071 60.1696202
Destination coords: 24.9512311 60.1664348

Okay, now we have coordinates for our origin and destination.

Find the nearest nodes

Next, we need to find the closest nodes in the graph for both of our locations. To find the closest node, we use the ox.distance.nearest_nodes() function and specify return_dist=True to also get the distance in meters.

# 1. Find the closest nodes for origin and destination
orig_node_id, dist_to_orig = ox.distance.nearest_nodes(G, X=orig_x, Y=orig_y, return_dist=True)
dest_node_id, dist_to_dest = ox.distance.nearest_nodes(G, X=dest_x, Y=dest_y, return_dist=True)

print("Origin node-id:", orig_node_id, "and distance:", dist_to_orig, "meters.")
print("Destination node-id:", dest_node_id, "and distance:", dist_to_dest, "meters.")

Origin node-id: 659998487 and distance: 27.888258768172825 meters.
Destination node-id: 3367881248 and distance: 1.3705603583619679 meters.

Now we are ready to start the actual routing with NetworkX.

Find the shortest route by distance / time

Now we can do the routing and find the shortest path between the origin and target locations with the dijkstra_path() function of NetworkX. To get only the cumulative cost of the trip, we can use the dijkstra_path_length() function directly. It returns the total cost (e.g. the travel time) without the actual path.

With the weight parameter, we can specify the attribute that we want to use as cost/impedance. We now have two possible weight attributes available: 'length' and 'travel_time_seconds'.

Let’s first calculate the shortest route based on distance and the fastest route based on travel time, and also retrieve the total length and travel time:

# Calculate the shortest path by distance and the fastest path by travel time
metric_path = nx.dijkstra_path(G, source=orig_node_id, target=dest_node_id, weight='length')
time_path = nx.dijkstra_path(G, source=orig_node_id, target=dest_node_id, weight='travel_time_seconds')

# Get also the total length and travel time (summarize)
travel_length = nx.dijkstra_path_length(G, source=orig_node_id, target=dest_node_id, weight='length')
travel_time = nx.dijkstra_path_length(G, source=orig_node_id, target=dest_node_id, weight='travel_time_seconds')

Okay, that was it! Let’s now visualize the results.

For visualization purposes, we can use a handy function again from OSMnx called ox.plot_graph_route() (for static) or ox.plot_route_folium() (for interactive plot).

Let’s first make static maps:

# Shortest path based on distance
fig, ax = ox.plot_graph_route(G, metric_path)

# Add the route length as x-axis label
ax.set_xlabel("Shortest path distance {t: .1f} meters.".format(t=travel_length))

Text(0.5, 51.0, 'Shortest path distance 1138.1 meters.')

# Fastest path based on travel time
fig, ax = ox.plot_graph_route(G, time_path)

# Add the travel time as x-axis label
ax.set_xlabel("Travel time {t: .1f} minutes.".format(t=travel_time/60))

Text(0.5, 51.0, 'Travel time 2.4 minutes.')

Great! Now we have found the shortest and the fastest routes between our origin and destination, along with estimates of the route length and of the driving time between the locations. Whether the two routes coincide depends on the speed limits along the alternative roads. The fastest route can be longer than the shortest one if it uses roads with higher speed limits.

Finally, let’s see how you can plot a nice interactive map of our results with OSMnx:

ox.plot_route_folium(G, time_path, popup_attribute='travel_time_seconds')
