Zhuojun Jiang, Lei Dong, Lun Wu, and Yu Liu
The complexity of navigation in cities has increased with the expansion of urban areas, creating challenging transportation problems that drive many studies on the navigability of networks. However, due to the lack of individual mobility data, large-scale empirical analysis of the wayfinder's real-world navigation is rare. Here, using 225 million subway trips from three major cities in China, we quantify navigation difficulty from an information perspective. Our results reveal that 1) people conserve a small number of repeatedly used routes, and 2) the navigation information in the sub-networks formed by those routes is much smaller than the theoretical value in the global network, suggesting that the decision cost for actual trips is significantly smaller than the theoretical upper limit found in previous studies. By modeling routing behaviors in growing networks, we show that while the global network can become difficult to navigate, navigability can be improved in sub-networks. We further present a universal linear relationship between the empirical and theoretical search information, which allows the two metrics to predict each other. Our findings demonstrate how large-scale observations can quantify real-world navigation behaviors and aid in evaluating transportation planning.
Cite: Zhuojun Jiang, Lei Dong, Lun Wu, Yu Liu, Quantifying navigation complexity in transportation networks, PNAS Nexus, 1(3), 2022, pgac126.
Data to replicate the results of the paper.
Smart card records in Beijing, Shanghai and Shenzhen. The datasets provide OD at the station level grouped by travel time (see data prepoccessing for details):
- bj_2019.csv: Beijing subway in May 2019.
- sh_2015.csv: Shanghai subway in April 2015.
- sz_2017.csv: Shenzhen subway in October 2017.
Column | Definition | Data type |
---|---|---|
lineid_o | identication of the starting subway line | int |
stationid_o | identication of the starting subway station | int |
stationname_o | name of the starting subway station | str |
lineid_d | identication of the terminal subway line | int |
stationid_d | identication of the terminal subway station | int |
stationname_d | name of the terminal subway station | str |
d_time | interval between the entry and exit timestamps (in minutes) | int |
count | number of trips with a d_time travel time between this station pair | int |
Attributes of subway lines and stations in Beijing/Shanghai/Shenzhen (by 2020):
- lines_[bj/sh/sz].csv: list of subway lines.
- stations_[bj/sh/sz].csv: list of subway stations.
- Eudist_[bj/sh/sz].csv: Euclidean distance between each station pair.
Subway/information networks of three cities from 2000 to 2020:
- PrimalGraph_[bj/sh/sz]_[yyyy].gml: the subway network of Beijing/Shanghai/Shenzhen in the year of 'yyyy'.
- Node: each node represents a station.
- Edge: each edge represents the subway line connecting adjacent stations.
Node attribute | Definition | Data type |
---|---|---|
id | serial number of the node in the network (starting from 0) | int |
label | each node is labelled as 'lineid-stationid' | str |
Edge attribute | Definition | Data type |
---|---|---|
source | the serial number of the starting node | int |
target | the serial number of the terminal node | int |
key/nid | identication of the subway line | int |
duration | the in-vehicle time of the section | double |
distance | the Euclidean distance along the subway line between the connected stations | double |
- DualGraph_[bj/sh/sz]_[yyyy].gml: the information network of Beijing/Shanghai/Shenzhen in the year of 'yyyy'.
- Node: each node represents a subway line.
- Edge: each edge represents the transfer station connecting the two intersecting lines.
Node attribute | Definition | Data type |
---|---|---|
id | serial number of the node in the network (starting from 0) | int |
label | each node is labelled by 'lineid', where lineid represents the line ID | str |
name | name of the line | int |
Edge attribute | Definition | Data type |
---|---|---|
source | the serial number of the starting node | int |
target | the serial number of the terminal node | int |
key/sid | identication of the subway station | int |
crossing | name of the transfer station | str |
Usage (Python 3.7)
Import the networkx module with
>>> import networkx as nx
Import the subway network with
>>> H = nx.read_gml('PrimalGraph_bj_2019.gml')
272 subway trips with known routes and duration through questionnaires in the three studied cities.
Column | Definition | Data type |
---|---|---|
id | questionnaire id | int |
city | abbreviation of the city name | str |
line_o | name of the starting subway line | str |
station_o | name of the starting subway station | str |
line_d | name of the terminal subway line | str |
station_d | name of the terminal subway station | str |
line1 | name of the 1st subway line taken in this trip | str |
line2 | name of the 2nd subway line taken in this trip (blank if none) | str |
line3 | name of the 3rd subway line taken in this trip (blank if none) | str |
line4 | name of the 4th subway line taken in this trip (blank if none) | str |
line5 | name of the 5th subway line taken in this trip (blank if none) | str |
line6 | name of the 6th subway line taken in this trip (blank if none) | str |
transfer1 | name of the 1st transfer station taken in this trip (blank if none) | str |
transfer2 | name of the 2nd transfer station taken in this trip (blank if none) | str |
transfer3 | name of the 3rd transfer station taken in this trip (blank if none) | str |
transfer4 | name of the 4th transfer station taken in this trip (blank if none) | str |
transfer5 | name of the 5th transfer station taken in this trip (blank if none) | str |
hour_o | Hour of the entry timestamp | int |
minute_o | Minute of the entry timestamp | int |
second_o | Second of the entry timestamp | int |
hour_d | Hour of the exit timestamp | int |
minute_d | Minute of the exit timestamp | int |
second_d | Second of the exit timestamp | int |
d_time | interval between the entry and exit timestamps (in seconds) | int |
nroutes | number of subway lines taken along the trip | int |
choice | the way passengers choose this route(1: by intuitive/habit/experience, 2: selected from multiple routes, 3: directly select the first route recommended by the navigatison software, 4: others) | int |
age | passenger's age group(1: <18, 2: 18-25, 3: 26-30, 4: 31-40, 5: 41-50, 6: 51-60, 7: >60) | int |
The official published ridership of the Beijing subway for May 2019. The numbers are obtained from Beijing Subway Company’s official Weibo.
Column | Definition | Data type |
---|---|---|
linename | name of the subway line | str |
2019/5/1 | Ridership on 2019/5/1 (in millions) | double |
2019/5/2 | Ridership on 2019/5/2 (in millions) | double |
... | ... | double |
2019/5/31 | Ridership on 2019/5/31 (in millions) | double |
Data used to generate the figures in the paper, which is the output of the code for the paper (see gen_figures for details).
- matrix_matched_path_[bj/sh/sz]_[yyyy].pkl: the results of matched paths from Route matching. The file contains:
- A matrix of dataframes where the dataframe of matched paths from station i to station j is stored at (i,j).
Column | Definition | Data type |
---|---|---|
seq_stops | sequence of stations along the path | str[] |
pathturns | transfer stations along the path | str[] |
seq_lines | sequence of lines along the path | int[] |
nroutes | number of lines taken along the path | int |
duration | the travel time of the path | double |
distance | the Euclidean distance of the path | double |
avg_counts | number of trips on the path | double |
total | total number of trips between the OD stations | double |
- ESI/res_stationlevel_card_[bj/sh/sz]_[yyyy].pkl: the results of the station-level empirical search information calculated based on the matched paths from Calculate the empirical search information (ESI). The file contains:
- A dataframe of empirical search information values for all matched path in the information network
- A dataframe of the empirical search information between station pairs
dataframe of empirical search information for matched paths
Column | Definition | Data type |
---|---|---|
i | start station id | int |
j | terminal station id | int |
seq_lines | sequence of nodes along the path in the information network | int[] |
pathturns | transfer stations along the path | str[] |
nroutes | number of subway lines taken along the trip | int |
diff_nroutes | difference between the transfer number of this path and the minimum transfer number of all the matched paths between the OD stations | int |
duration | the travel time of the path | double |
avg_counts | number of trips on the path | double |
Ktot_sub | number of connections in the sub-network of the OD stations | int |
S_sub | empirical search information of taking the path | double |
dataframe of empirical search information for station pairs
Column | Definition | Data type |
---|---|---|
i | start station id | int |
j | terminal station id | int |
nroutes | number of subway lines taken between the OD stations | double |
min_nroutes | minimum number of subway lines taken on the matched paths between the OD stations | int |
duration | the travel time between the OD stations | double |
Ktot_sub | number of connections in the sub-network of the OD stations | int |
avg_counts | number of trips between the OD stations | double |
k_paths | number of matched paths between the OD stations | int |
S_sub | empirical search information between the OD stations | double |
-
ESI/res_linelevel_card_[bj/sh/sz]_[yyyy].pkl: the results of the line-level empirical search information calculated based on the matched paths from Calculate the empirical search information (ESI). The file contains:
- Matrices of line-level empirical search information between line pairs with all/1/2/3 transfers on the travel paths.
-
ESI/ksp_[k]_[bj/sh/sz]_[yyyy].pkl: the results of the empirical search information calculated based on the k shortest paths from Calculate the empirical search information (ESI). The file contains:
- A matrix of station-level empirical search information values.
- A matrix of number of subway lines taken between station pairs.
- A matrix of travel time between station pairs.
- A matrix of travel distance between station pairs.
- Matrices of line-level empirical search information between line pairs with all/1/2/3 transfers.
-
TSI/[bj/sh/sz]_[yyyy].pkl: the results of the theoretical search information calculated from Calculate the theoretical search information (TSI). The file contains:
- A matrix of station-level theoretical search information values.
- A matrix of number of subway lines on the fastest simplest path between station pairs.
- A matrix of travel time of the fastest simplest path between station pairs.
- A matrix of travel distance of the fastest simplest path between station pairs.
- Matrices of line-level theoretical search information between line pairs with all/1/2/3 transfers on the fastest simplest paths.
Usage (Python 3.7)
Import the I/O functions with
>>> from iofiles import *
Import the data for figures with
>>> data = load_variable('data/output/ESI/res_stationlevel_card_bj_2019.pkl')
>>> data[0] # view the dataframe
-
data_prepoccessing.sql is used to pre-process smart card data.
-
route_matching.py is used to match paths from the OD records.
-
ESI_from_matching.py is used to calculate the empirical search information from the matched paths.
-
ESI_from_ksp.py is used to calculate the empirical search information from the k shortest paths.
-
TSI.py reproduces the amount of theoretical global search information according to Gallotti et al, 2016.
-
funcs.py includes the basic functions for the calculations in networks.
-
iofiles.py includes the I/O functions for saving and loading files.
-
gen_figures.ipynb reproduces figures in the paper.
To generate the route matching results using the provided scripts, run in the terminal:
$ python route_matching.py
The output will be in output/
.
To generate the ESI results based on the matched path from the OD records, run in the terminal:
$ python ESI_from_matching.py
The output will be in output/ESI
.
When no smart card data (matched paths) are available from 2000 to 2020, the ESI of the subway networks is calculated using the k shortest paths for each year (Beijing: k=13, Shanghai: k=12, Shenzhen k=6, see paper for detail). To generate the ESI results based on the k shortest paths, run in the terminal:
$ python ESI_from_ksp.py
Reproduce the amount of theoretical global search information according to Gallotti et al, 2016.
To generate the TSI results using the provided scripts, run in the terminal:
$ python TSI.py
The output will be in output/TSI
.
Contact: [email protected]