Redwood Rangers: US National Parks in CA Road Trip Route Optimization

Part 1: Find and Prepare Data

In this stage of the project, you need to collect data on the locations that will be included. Each location should contain a longitude, a latitude, and a location name for later visualization. We recommend selecting at most 9 locations so that the optimal route can be found in a reasonable time.
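The 9-location guideline reflects how quickly brute-force search grows. For a round trip with a fixed starting point, there are (n - 1)! orderings of the remaining stops. If travel costs are the same in both directions, a tour and its reverse are equivalent, so only half as many need checking. The helper below is a hypothetical illustration of that count, not part of the project template.

```python
import math


def count_round_trips(n_locations: int, symmetric: bool = False) -> int:
    """Number of distinct closed tours through n locations with a fixed start.

    With the start fixed there are (n - 1)! orderings of the other stops;
    if costs are symmetric, each tour and its reverse cost the same,
    so only half of them need to be compared.
    """
    tours = math.factorial(n_locations - 1)
    return tours // 2 if symmetric else tours


for n in (5, 9, 12, 15):
    print(n, count_round_trips(n))
```

Driving distances usually differ slightly by direction, so the larger, non-symmetric count is the safer estimate for a road trip.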

You can use any source for locations, but if you feel stuck, try searching for datasets on Kaggle. Many Kaggle datasets list cities that you can use to select locations. If you take a dataset from Kaggle, check its quality and preprocess the data if necessary. For example, look for incorrect values (points outside a reasonable range) and for missing values.
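Here is a minimal sketch of the range and missing-value checks described above. It assumes the dataset has `lat` and `lon` columns that should hold numbers. The function name and the sample rows are hypothetical.

```python
import pandas as pd


def find_bad_coordinates(frame: pd.DataFrame, lat_col: str = "lat",
                         lon_col: str = "lon") -> pd.DataFrame:
    """Return rows whose coordinates are missing, non-numeric, or out of range."""
    # non-numeric strings become NaN, so they are flagged together with missing values
    lat = pd.to_numeric(frame[lat_col], errors="coerce")
    lon = pd.to_numeric(frame[lon_col], errors="coerce")
    # between() is False for NaN, so missing values fail this check too
    valid = lat.between(-90, 90) & lon.between(-180, 180)
    return frame[~valid]


# hypothetical sample: one good row and three problem rows
sample = pd.DataFrame({
    "name": ["Ok", "Missing", "Swapped", "Typo"],
    "lat": [36.8, None, -119.5, "3x.1"],
    "lon": [-118.55, -120.0, 37.8, -121.0],
})
print(find_bad_coordinates(sample)["name"].tolist())
```

For a single-state trip, a tighter bounding box around that state would catch more mistakes than the global limits used here.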

In [2]:
pip install folium
In [2]:
pip install ortools
In [3]:
pip install osmnx
In [3]:
# import libraries
import pandas as pd
import numpy as np
import re

# for plotting
import folium
from folium import plugins
import plotly.express as px
import plotly.graph_objs as go
import matplotlib.pyplot as plt 
import seaborn as sns

# for simple routing
import osmnx as ox  #1.2.2
import networkx as nx  #3.0

# for advanced routing 
from ortools.constraint_solver import pywrapcp  #9.6
from ortools.constraint_solver import routing_enums_pb2
In [5]:
# import data
df = pd.read_csv("C:\\Users\\User\\Documents\\TripleTen\\Code Jam\\Dataset\\df_2.csv")
In [6]:
# general info on df
df.info()
df.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 63 entries, 0 to 62
Data columns (total 8 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   Unnamed: 0                       63 non-null     int64  
 1   Name                             63 non-null     object 
 2   Image                            0 non-null      float64
 3   Location                         63 non-null     object 
 4   Date established as park[7][12]  63 non-null     object 
 5   Area (2021)[13]                  63 non-null     object 
 6   Recreation visitors (2021)[11]   63 non-null     int64  
 7   Description                      63 non-null     object 
dtypes: float64(1), int64(2), object(5)
memory usage: 4.1+ KB
Out[6]:
Unnamed: 0 Name Image Location Date established as park[7][12] Area (2021)[13] Recreation visitors (2021)[11] Description
0 0 Acadia NaN Maine.mw-parser-output .geo-default,.mw-parser... February 26, 1919 49,071.40 acres (198.6 km2) 4069098 Covering most of Mount Desert Island and other...
1 1 American Samoa NaN American Samoa14°15′S 170°41′W / 14.25°S 170... October 31, 1988 8,256.67 acres (33.4 km2) 8495 The southernmost national park is on three Sam...
2 2 Arches NaN Utah38°41′N 109°34′W / 38.68°N 109.57°W November 12, 1971 76,678.98 acres (310.3 km2) 1806865 This site features more than 2,000 natural san...
3 3 Badlands NaN South Dakota43°45′N 102°30′W / 43.75°N 102.50°W November 10, 1978 242,755.94 acres (982.4 km2) 1224226 The Badlands are a collection of buttes, pinna...
4 4 Big Bend NaN Texas29°15′N 103°15′W / 29.25°N 103.25°W June 12, 1944 801,163.21 acres (3,242.2 km2) 581220 Named for the prominent bend in the Rio Grande...

Upon initial inspection, the only missing values are in the 'Image' column (0 of 63 rows non-null). This column is not needed and will be dropped. Because we will only be working with the 9 rows for California's national parks, duplicates can be ruled out by inspecting those rows directly once the data has been filtered.

Data Preprocessing

In [7]:
# drop unnecessary columns
df = df.drop(['Image','Unnamed: 0'], axis=1)
In [8]:
# rename columns to fit style guidelines
df = df.rename(columns = {'Name':'name_of_park', 'Location':'location', 'Date established as park[7][12]': 'date_established_as_park', 'Area (2021)[13]': 'area_in_acres', 'Recreation visitors (2021)[11]': 'visitors_in_2021', 'Description':'description'})
In [9]:
# check changes
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 63 entries, 0 to 62
Data columns (total 6 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   name_of_park              63 non-null     object
 1   location                  63 non-null     object
 2   date_established_as_park  63 non-null     object
 3   area_in_acres             63 non-null     object
 4   visitors_in_2021          63 non-null     int64 
 5   description               63 non-null     object
dtypes: int64(1), object(5)
memory usage: 3.1+ KB
In [10]:
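# find all parks located (at least partly) in the State of California
# .copy() avoids SettingWithCopyWarning when new columns are added below
df = df.loc[df['location'].str.contains("California")].copy()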
In [11]:
# check changes
df
Out[11]:
name_of_park location date_established_as_park area_in_acres visitors_in_2021 description
11 Channel Islands California34°01′N 119°25′W / 34.01°N 119.42°W March 5, 1980 249,561.00 acres (1,009.9 km2) 319252 Five of the eight Channel Islands are protecte...
15 Death Valley California, Nevada36°14′N 116°49′W / 36.24°N... October 31, 1994 3,408,395.63 acres (13,793.3 km2) 1146551 Death Valley is the hottest, lowest, and dries...
34 Joshua Tree California33°47′N 115°54′W / 33.79°N 115.90°W October 31, 1994 795,155.85 acres (3,217.9 km2) 3064400 Covering large areas of the Colorado and Mojav...
37 Kings Canyon California36°48′N 118°33′W / 36.80°N 118.55°W March 4, 1940 461,901.20 acres (1,869.2 km2) 562918 Home to several giant sequoia groves and the G...
40 Lassen Volcanic California40°29′N 121°31′W / 40.49°N 121.51°W August 9, 1916 106,589.02 acres (431.4 km2) 359635 Lassen Peak, the largest lava dome volcano in ...
48 Pinnacles California36°29′N 121°10′W / 36.48°N 121.16°W January 10, 2013 26,685.73 acres (108.0 km2) 348857 Named for the eroded leftovers of a portion of...
49 Redwood * California41°18′N 124°00′W / 41.30°N 124.00°W October 2, 1968 138,999.37 acres (562.5 km2) 435879 This park and the co-managed state parks prote...
52 Sequoia California36°26′N 118°41′W / 36.43°N 118.68°W September 25, 1890 404,062.63 acres (1,635.2 km2) 1059548 This park protects the Giant Forest, which boa...
61 Yosemite * California37°50′N 119°30′W / 37.83°N 119.50°W October 1, 1890 761,747.50 acres (3,082.7 km2) 3287595 Yosemite features sheer granite cliffs, except...

The filtered df contains no missing or duplicate values and consists of the 9 national parks located in California (Death Valley straddles California and Nevada). These are the locations between which we will optimize the route.

In [12]:
# convert 'date_established_as_park' to datetime data type
df['date_established_as_park'] = pd.to_datetime(df['date_established_as_park'])

Feature Engineering

Extract Latitude and Longitude from Location Column

In [13]:
# split location at its first digit into: state name | captured digit | rest of the coordinates
location = df.location.str.split(r'(\d)',n=1,expand=True)
In [14]:
# rename 
location = location.rename(columns={location.columns[0]:'state'})
In [15]:
# re-attach the captured first digit to the rest of the coordinates
location[2] = location[1] + location[2]
In [16]:
# drop separator column
location = location.drop([1],axis=1)
In [17]:
# split lat-lon coordinates into 2 columns
location[['loc1','loc2']] = location[2].str.split('/',expand=True)
In [18]:
# drop the combined string and the decimal-degree copy (loc2), keeping the degrees-minutes pair in loc1
location = location.drop([2,'loc2'],axis=1)
In [19]:
# check changes
location
Out[19]:
state loc1
11 California 34°01′N 119°25′W
15 California, Nevada 36°14′N 116°49′W
34 California 33°47′N 115°54′W
37 California 36°48′N 118°33′W
40 California 40°29′N 121°31′W
48 California 36°29′N 121°10′W
49 California 41°18′N 124°00′W
52 California 36°26′N 118°41′W
61 California 37°50′N 119°30′W
In [20]:
# split loc1 column into lat and lon columns
location[['lat','lon']] = location.loc1.str.split(' ',n=1,expand=True)
In [21]:
# check changes
location
Out[21]:
state loc1 lat lon
11 California 34°01′N 119°25′W 34°01′N 119°25′W
15 California, Nevada 36°14′N 116°49′W 36°14′N 116°49′W
34 California 33°47′N 115°54′W 33°47′N 115°54′W
37 California 36°48′N 118°33′W 36°48′N 118°33′W
40 California 40°29′N 121°31′W 40°29′N 121°31′W
48 California 36°29′N 121°10′W 36°29′N 121°10′W
49 California 41°18′N 124°00′W 41°18′N 124°00′W
52 California 36°26′N 118°41′W 36°26′N 118°41′W
61 California 37°50′N 119°30′W 37°50′N 119°30′W
In [22]:
# create function to convert from degrees-decimal-minutes format (e.g. 34°01′N) to decimal degrees
def ddm2dec(dms_str):
    # southern and western hemispheres are negative
    sign = -1 if re.search('[swSW]', dms_str) else 1
    # raw string avoids the invalid-escape warning for \D
    numbers = re.split(r'\D+', dms_str)

    degree = numbers[0]
    minute_decimal = numbers[1]

    return sign * (int(degree) + float(minute_decimal) / 60)
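The conversion in ddm2dec follows the usual degrees-and-decimal-minutes rule: divide the minutes by 60, add the degrees, and make the result negative in the southern and western hemispheres. Here it is worked through for Channel Islands (34°01′N 119°25′W), matching the values shown after conversion:

```latex
\text{decimal} = s\left(d + \frac{m}{60}\right), \qquad
s = \begin{cases} -1 & \text{for S or W} \\ +1 & \text{for N or E} \end{cases}

34^\circ 01' \,\mathrm{N} \;\to\; 34 + \tfrac{1}{60} = 34.016667, \qquad
119^\circ 25' \,\mathrm{W} \;\to\; -\left(119 + \tfrac{25}{60}\right) = -119.416667
```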
In [23]:
# replace prime symbols (′) with plain apostrophes (optional: ddm2dec splits on any non-digit)
location['lat'] = location.lat.str.replace("′","'")
location['lon'] = location.lon.str.replace("′","'")
In [24]:
# convert lat lon columns to degree decimal format
location['lat'] = location['lat'].apply(ddm2dec)
location['lon'] = location['lon'].apply(ddm2dec)
In [25]:
# check changes
location
Out[25]:
state loc1 lat lon
11 California 34°01′N 119°25′W 34.016667 -119.416667
15 California, Nevada 36°14′N 116°49′W 36.233333 -116.816667
34 California 33°47′N 115°54′W 33.783333 -115.900000
37 California 36°48′N 118°33′W 36.800000 -118.550000
40 California 40°29′N 121°31′W 40.483333 -121.516667
48 California 36°29′N 121°10′W 36.483333 -121.166667
49 California 41°18′N 124°00′W 41.300000 -124.000000
52 California 36°26′N 118°41′W 36.433333 -118.683333
61 California 37°50′N 119°30′W 37.833333 -119.500000
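The split, rename, and drop steps above can also be collapsed into one regular expression. This is a sketch that assumes every location string contains its degrees-and-minutes pair in the form 34°01′N 119°25′W. That holds for the nine rows shown but has not been checked for the full dataset. The pattern and function name are hypothetical.

```python
import pandas as pd

# degrees, minutes and hemisphere for latitude, then the same for longitude
DDM_PATTERN = (r"(?P<lat_d>\d+)°(?P<lat_m>\d+)′(?P<lat_h>[NS])\s+"
               r"(?P<lon_d>\d+)°(?P<lon_m>\d+)′(?P<lon_h>[EW])")


def extract_lat_lon(location: pd.Series) -> pd.DataFrame:
    """Parse the first degrees-decimal-minutes pair in each string into decimal degrees."""
    parts = location.str.extract(DDM_PATTERN)
    # astype(float) keeps non-matching rows as NaN instead of raising
    lat = parts["lat_d"].astype(float) + parts["lat_m"].astype(float) / 60
    lon = parts["lon_d"].astype(float) + parts["lon_m"].astype(float) / 60
    # southern latitudes and western longitudes are negative
    lat = lat.where(parts["lat_h"] == "N", -lat)
    lon = lon.where(parts["lon_h"] == "E", -lon)
    return pd.DataFrame({"lat": lat, "lon": lon})


sample = pd.Series([
    "California34°01′N 119°25′W / 34.01°N 119.42°W",  # Channel Islands
    "California33°47′N 115°54′W / 33.79°N 115.90°W",  # Joshua Tree
    "California37°50′N 119°30′W / 37.83°N 119.50°W",  # Yosemite
])
print(extract_lat_lon(sample).round(6))
```

Strings that do not match the pattern come back as NaN rows rather than raising an error, so they are easy to find with `isna()`.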
In [26]:
# create latitude, longitude, and state columns in original df

df['lat'] = location['lat']
df['lon'] = location['lon']
df['state'] = location['state']
In [27]:
# check changes
df
Out[27]:
name_of_park location date_established_as_park area_in_acres visitors_in_2021 description lat lon state
11 Channel Islands California34°01′N 119°25′W / 34.01°N 119.42°W 1980-03-05 249,561.00 acres (1,009.9 km2) 319252 Five of the eight Channel Islands are protecte... 34.016667 -119.416667 California
15 Death Valley California, Nevada36°14′N 116°49′W / 36.24°N... 1994-10-31 3,408,395.63 acres (13,793.3 km2) 1146551 Death Valley is the hottest, lowest, and dries... 36.233333 -116.816667 California, Nevada
34 Joshua Tree California33°47′N 115°54′W / 33.79°N 115.90°W 1994-10-31 795,155.85 acres (3,217.9 km2) 3064400 Covering large areas of the Colorado and Mojav... 33.783333 -115.900000 California
37 Kings Canyon California36°48′N 118°33′W / 36.80°N 118.55°W 1940-03-04 461,901.20 acres (1,869.2 km2) 562918 Home to several giant sequoia groves and the G... 36.800000 -118.550000 California
40 Lassen Volcanic California40°29′N 121°31′W / 40.49°N 121.51°W 1916-08-09 106,589.02 acres (431.4 km2) 359635 Lassen Peak, the largest lava dome volcano in ... 40.483333 -121.516667 California
48 Pinnacles California36°29′N 121°10′W / 36.48°N 121.16°W 2013-01-10 26,685.73 acres (108.0 km2) 348857 Named for the eroded leftovers of a portion of... 36.483333 -121.166667 California
49 Redwood * California41°18′N 124°00′W / 41.30°N 124.00°W 1968-10-02 138,999.37 acres (562.5 km2) 435879 This park and the co-managed state parks prote... 41.300000 -124.000000 California
52 Sequoia California36°26′N 118°41′W / 36.43°N 118.68°W 1890-09-25 404,062.63 acres (1,635.2 km2) 1059548 This park protects the Giant Forest, which boa... 36.433333 -118.683333 California
61 Yosemite * California37°50′N 119°30′W / 37.83°N 119.50°W 1890-10-01 761,747.50 acres (3,082.7 km2) 3287595 Yosemite features sheer granite cliffs, except... 37.833333 -119.500000 California
In [28]:
# drop the original location column (loc1 only exists in the helper 'location' frame)
df = df.drop('location',axis=1)
In [29]:
# check changes
display(df)
name_of_park date_established_as_park area_in_acres visitors_in_2021 description lat lon state
11 Channel Islands 1980-03-05 249,561.00 acres (1,009.9 km2) 319252 Five of the eight Channel Islands are protecte... 34.016667 -119.416667 California
15 Death Valley 1994-10-31 3,408,395.63 acres (13,793.3 km2) 1146551 Death Valley is the hottest, lowest, and dries... 36.233333 -116.816667 California, Nevada
34 Joshua Tree 1994-10-31 795,155.85 acres (3,217.9 km2) 3064400 Covering large areas of the Colorado and Mojav... 33.783333 -115.900000 California
37 Kings Canyon 1940-03-04 461,901.20 acres (1,869.2 km2) 562918 Home to several giant sequoia groves and the G... 36.800000 -118.550000 California
40 Lassen Volcanic 1916-08-09 106,589.02 acres (431.4 km2) 359635 Lassen Peak, the largest lava dome volcano in ... 40.483333 -121.516667 California
48 Pinnacles 2013-01-10 26,685.73 acres (108.0 km2) 348857 Named for the eroded leftovers of a portion of... 36.483333 -121.166667 California
49 Redwood * 1968-10-02 138,999.37 acres (562.5 km2) 435879 This park and the co-managed state parks prote... 41.300000 -124.000000 California
52 Sequoia 1890-09-25 404,062.63 acres (1,635.2 km2) 1059548 This park protects the Giant Forest, which boa... 36.433333 -118.683333 California
61 Yosemite * 1890-10-01 761,747.50 acres (3,082.7 km2) 3287595 Yosemite features sheer granite cliffs, except... 37.833333 -119.500000 California
In [30]:
# extract only the numeric acreage from area_in_acres & convert to numeric
df['area_in_acres'] = df['area_in_acres'].str.extract(r'(\d+[,.\d]*)', expand=False)
df['area_in_acres'] = df['area_in_acres'].str.replace(",", "", regex=False)
df['area_in_acres'] = pd.to_numeric(df['area_in_acres'])
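The extraction above keeps only the acreage, but the same strings also carry the area in km², which allows a cheap unit check. Converted to km², the parsed acres should reproduce the rounded km² figure in the same string. This sketch assumes every area string has the form "N acres (M km2)", as in the rows shown. The helper name is hypothetical.

```python
import pandas as pd

ACRES_TO_KM2 = 0.0040468564224  # 1 acre = 4,046.8564224 m²


def parse_area(area: pd.Series) -> pd.DataFrame:
    """Split strings like '249,561.00 acres (1,009.9 km2)' into numeric acres and km²."""
    parts = area.str.extract(r"(?P<acres>[\d,.]+) acres \((?P<km2>[\d,.]+) km2\)")
    # strip thousands separators column by column, then convert to numbers
    return parts.apply(lambda col: pd.to_numeric(col.str.replace(",", "", regex=False)))


sample = pd.Series([
    "249,561.00 acres (1,009.9 km2)",     # Channel Islands
    "26,685.73 acres (108.0 km2)",        # Pinnacles
    "3,408,395.63 acres (13,793.3 km2)",  # Death Valley
])
areas = parse_area(sample)
# the km² figure is rounded to 0.1, so the converted acres should agree within 0.05
print((areas["acres"] * ACRES_TO_KM2 - areas["km2"]).abs().max())
```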
In [31]:
# check changes
df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 9 entries, 11 to 61
Data columns (total 8 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   name_of_park              9 non-null      object        
 1   date_established_as_park  9 non-null      datetime64[ns]
 2   area_in_acres             9 non-null      float64       
 3   visitors_in_2021          9 non-null      int64         
 4   description               9 non-null      object        
 5   lat                       9 non-null      float64       
 6   lon                       9 non-null      float64       
 7   state                     9 non-null      object        
dtypes: datetime64[ns](1), float64(3), int64(1), object(3)
memory usage: 648.0+ bytes
In [32]:
# remove asterisks from park names
df['name_of_park'] = df['name_of_park'].str.replace(' *', '', regex=False)
In [33]:
df.name_of_park.unique()
Out[33]:
array(['Channel Islands', 'Death Valley', 'Joshua Tree', 'Kings Canyon',
       'Lassen Volcanic', 'Pinnacles', 'Redwood', 'Sequoia', 'Yosemite'],
      dtype=object)
In [34]:
df.describe()
Out[34]:
date_established_as_park area_in_acres visitors_in_2021 lat lon
count 9 9.000000e+00 9.000000e+00 9.000000 9.000000
mean 1954-06-07 00:00:00 7.058998e+05 1.176071e+06 37.040741 -119.505556
min 1890-09-25 00:00:00 2.668573e+04 3.192520e+05 33.783333 -124.000000
25% 1916-08-09 00:00:00 1.389994e+05 3.596350e+05 36.233333 -121.166667
50% 1968-10-02 00:00:00 4.040626e+05 5.629180e+05 36.483333 -119.416667
75% 1994-10-31 00:00:00 7.617475e+05 1.146551e+06 37.833333 -118.550000
max 2013-01-10 00:00:00 3.408396e+06 3.287595e+06 41.300000 -115.900000
std NaN 1.049722e+06 1.175138e+06 2.543771 2.470605

EDA

In [35]:
fig = px.scatter_geo(df, lat='lat',lon='lon', hover_name='name_of_park', size='visitors_in_2021', locationmode='USA-states',
                     height=500, width=800,
                     labels={'visitors_in_2021':'Visitors','lat':'Latitude','lon':'Longitude'})
fig.update_layout(
    title_text='California National Parks and Visitors',
    geo_scope='usa')
fig.show()
In [36]:
# named fig_map to avoid shadowing Python's built-in map()
fig_map = px.scatter_mapbox(df, lat='lat', lon='lon', hover_name='name_of_park',
               title='US National Parks in California',size='visitors_in_2021',
               center={'lat':36.7783,'lon':-119.4179},zoom=5,
               width=700,height=900,
               labels={'visitors_in_2021':'Visitors','lat':'Latitude','lon':'Longitude'})
# geo settings do not apply to mapbox maps, so only the tile style is set
fig_map.update_layout(mapbox_style="open-street-map")
fig_map.show()
In [37]:
fig = px.scatter_geo(df, lat='lat', lon='lon', hover_name='name_of_park', size='area_in_acres', locationmode='USA-states',
                     height=500, width=800,
                     labels={'area_in_acres':'Area (acres)','lat':'Latitude','lon':'Longitude'})
fig.update_layout(
    title_text='California National Parks & Area in Acres',
    geo_scope='usa')
fig.show()
In [38]:
# named fig_map to avoid shadowing Python's built-in map()
fig_map = px.scatter_mapbox(df, lat='lat', lon='lon', hover_name='name_of_park',
               title='US National Parks in California',size='area_in_acres',
               center={'lat':36.7783,'lon':-119.4179},zoom=5,
               width=700,height=900,
               labels={'area_in_acres':'Area (Acres)','lat':'Latitude','lon':'Longitude'})
# geo settings do not apply to mapbox maps, so only the tile style is set
fig_map.update_layout(mapbox_style="open-street-map")
fig_map.show()
In [39]:
# bar plot of recreation visitors by park
fig = plt.figure(figsize = (13, 5))

plt.bar(df['name_of_park'], df['visitors_in_2021'], color ='khaki', 
        width = 0.8)

plt.xlabel("California National Park")
plt.ylabel("No. of visitors in 2021")
plt.title("Recreational Visitors at California's National Parks in 2021")
plt.show()
In [40]:
# plot dates established as parks, in chronological order
date = df.sort_values(by='date_established_as_park')

fig = px.scatter(date, x='date_established_as_park', y='name_of_park',
                 labels={'date_established_as_park': 'Year', 'name_of_park': 'Park'},
                 title='Dates Established as US National Parks in the State of California')
fig.show()
In [41]:
# label keys must match the column names for the axis titles to change
fig = px.scatter(df, x='date_established_as_park', y='visitors_in_2021',
                 color='name_of_park', symbol='name_of_park',
                 title='Year of Establishment and Number of Visitors in 2021',
                 labels={'date_established_as_park': 'Date Established as Park',
                         'visitors_in_2021': 'Number of Visitors in 2021',
                         'name_of_park': 'Park'})
fig.update_traces(marker_size=10)
fig.show()

Fun Fact:

Joshua Tree National Park in California had 3,064,400 visitors in 2021. This was second only to Yosemite (3,287,595 visitors), which was established as a park in 1890, over 100 years before Joshua Tree (1994). Joshua Tree's popularity may have something to do with its driving distance from, and accessibility to, major cities such as Los Angeles and San Diego.

In [42]:
# bar plot of area of parks
fig = plt.figure(figsize = (13, 5))

plt.bar(df['name_of_park'], df['area_in_acres'], color ='maroon', 
        width = 0.8)

plt.xlabel("National Park")
plt.ylabel("Area in Acres")
plt.title("California's National Parks by Size")
plt.show()
In [43]:
#correlation between visitors in 2021 and area in acres
corr = df['visitors_in_2021'].corr(df['area_in_acres'])

print('The correlation between the area of the parks in acres and the number of visitors is:', corr)
The correlation between the area of the parks in acres and the number of visitors is: 0.22832525356723213
In [44]:
sns.regplot(data=df, x="area_in_acres", y="visitors_in_2021")
plt.title('Correlation of Park Size to Visitors in 2021')
plt.xlabel('Park Size (acres)')
plt.ylabel('Visitors in 2021')
plt.show()

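The low Pearson correlation coefficient (0.228) suggests at most a weak linear relationship between park size and number of visitors. With only nine parks, and Death Valley's area more than four times that of any other park, this estimate should be read with caution.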
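A rank-based (Spearman) coefficient offers one way to check whether the weak Pearson value is driven by Death Valley's outsized area. Spearman's coefficient is Pearson's computed on ranks, so the largest park counts only as "largest" regardless of its size. This sketch copies the nine values from the table below and computes the ranks with pandas, so SciPy is not needed.

```python
import pandas as pd

# area and 2021 visitor counts for the nine parks, copied from the df output
parks = pd.DataFrame({
    "area_in_acres": [249561.00, 3408395.63, 795155.85, 461901.20, 106589.02,
                      26685.73, 138999.37, 404062.63, 761747.50],
    "visitors_in_2021": [319252, 1146551, 3064400, 562918, 359635,
                         348857, 435879, 1059548, 3287595],
}, index=["Channel Islands", "Death Valley", "Joshua Tree", "Kings Canyon",
          "Lassen Volcanic", "Pinnacles", "Redwood", "Sequoia", "Yosemite"])

# Pearson: linear association, sensitive to extreme values
pearson = parks["area_in_acres"].corr(parks["visitors_in_2021"])
# Spearman: Pearson on ranks, so only the ordering of the values matters
spearman = parks["area_in_acres"].rank().corr(parks["visitors_in_2021"].rank())
print(f"Pearson: {pearson:.3f}, Spearman: {spearman:.3f}")
```

With nine observations, neither coefficient is a strong basis for conclusions, but a large gap between them is a sign that a single outlier dominates the linear measure.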

In [44]:
df
Out[44]:
name_of_park date_established_as_park area_in_acres visitors_in_2021 description lat lon state
11 Channel Islands 1980-03-05 249561.00 319252 Five of the eight Channel Islands are protecte... 34.016667 -119.416667 California
15 Death Valley 1994-10-31 3408395.63 1146551 Death Valley is the hottest, lowest, and dries... 36.233333 -116.816667 California, Nevada
34 Joshua Tree 1994-10-31 795155.85 3064400 Covering large areas of the Colorado and Mojav... 33.783333 -115.900000 California
37 Kings Canyon 1940-03-04 461901.20 562918 Home to several giant sequoia groves and the G... 36.800000 -118.550000 California
40 Lassen Volcanic 1916-08-09 106589.02 359635 Lassen Peak, the largest lava dome volcano in ... 40.483333 -121.516667 California
48 Pinnacles 2013-01-10 26685.73 348857 Named for the eroded leftovers of a portion of... 36.483333 -121.166667 California
49 Redwood 1968-10-02 138999.37 435879 This park and the co-managed state parks prote... 41.300000 -124.000000 California
52 Sequoia 1890-09-25 404062.63 1059548 This park protects the Giant Forest, which boa... 36.433333 -118.683333 California
61 Yosemite 1890-10-01 761747.50 3287595 Yosemite features sheer granite cliffs, except... 37.833333 -119.500000 California
In [45]:
# hover text: park name and area
df['text'] = df['name_of_park'] + '<br>Area ' + df['area_in_acres'].map('{:,.0f}'.format) + ' acres'
# sort by area (largest first) so each color marks a size tier rather than an arbitrary row range
df_by_area = df.sort_values('area_in_acres', ascending=False)
limits = [(0,2),(2,5),(5,7),(7,9)]
colors = ["royalblue","crimson","lightseagreen","orange"]
scale = 5000

fig = go.Figure()

for i, lim in enumerate(limits):
    df_sub = df_by_area.iloc[lim[0]:lim[1]]  # positional slice of the sorted rows
    fig.add_trace(go.Scattergeo(
        locationmode = 'USA-states',
        lon = df_sub['lon'],
        lat = df_sub['lat'],
        text = df_sub['text'],
        marker = dict(
            size = df_sub['area_in_acres']/scale,
            color = colors[i],
            line_color='rgb(40,40,40)',
            line_width=0.5,
            sizemode = 'area'
        ),
        name = 'Size rank {0}-{1}'.format(lim[0] + 1, lim[1])))

fig.update_layout(
        title_text = 'California National Park Areas<br>(Click legend to toggle traces)',
        showlegend = True,
        geo = dict(
            scope = 'usa',
            landcolor = 'rgb(217, 217, 217)',
        )
    )

fig.show()
In [46]:
df.set_index('name_of_park',inplace=True)
In [47]:
sites = df.loc[:,['lat','lon']]
sites
Out[47]:
lat lon
name_of_park
Channel Islands 34.016667 -119.416667
Death Valley 36.233333 -116.816667
Joshua Tree 33.783333 -115.900000
Kings Canyon 36.800000 -118.550000
Lassen Volcanic 40.483333 -121.516667
Pinnacles 36.483333 -121.166667
Redwood 41.300000 -124.000000
Sequoia 36.433333 -118.683333
Yosemite 37.833333 -119.500000
In [48]:
# function to get the driving distance between two points from the public OSRM server
import requests

def get_distance(point1, point2) -> float:
    """Return the driving distance in meters between two points, each a mapping
    (dict or pandas Series) with "lat" and "lon" keys, using the OSRM route
    service: http://project-osrm.org/docs/v5.10.0/api/"""
    url = (
        "http://router.project-osrm.org/route/v1/driving/"
        f"{point1['lon']},{point1['lat']};{point2['lon']},{point2['lat']}"
        "?overview=false&alternatives=false"
    )
    r = requests.get(url, timeout=30)
    r.raise_for_status()
    data = r.json()
    if data.get("code") != "Ok":
        raise ValueError(f"OSRM returned {data.get('code')}: {data.get('message')}")
    # distance (meters) of the first, best route
    return data["routes"][0]["distance"]
In [49]:
import requests
import json
In [50]:
# function to create distance matrix based on distance from open source routing
def compute_distance_matrix(dsites, dist_metric=get_distance):
    """ Creates an N x N distance matrix from a dataframe of N locations
    with a latitude column ('lat') and a longitude column ('lon') """
    df_dist_matrix = pd.DataFrame(index=dsites.index, 
                                  columns=dsites.index)

    for orig, orig_loc in dsites.iterrows():  # for each origin
        for dest, dest_loc in dsites.iterrows():  # for each destination
            df_dist_matrix.at[orig, dest] = dist_metric(orig_loc, dest_loc)
    return df_dist_matrix


df_distances = compute_distance_matrix(sites)

display(df_distances)
name_of_park Channel Islands Death Valley Joshua Tree Kings Canyon Lassen Volcanic Pinnacles Redwood Sequoia Yosemite
name_of_park
Channel Islands 0 498682.3 358268.7 439515.7 896276.7 393402.5 1090445.6 371054.6 579655
Death Valley 498476.5 0 571796.3 637199.1 846903 669984.5 1191139.9 568738 446748.7
Joshua Tree 361101.1 570260.3 0 672882.5 1129643.4 689940.8 1408445.6 604421.4 722871.4
Kings Canyon 439411.8 637869.8 669904.6 0 677296.3 380772.8 956098.5 180047.8 350752.9
Lassen Volcanic 896165.3 857908.1 1126658.2 677830.7 0 564056.9 390421.1 679712.8 528779.4
Pinnacles 393380.4 670252.7 686432.2 380905.9 563178.1 0 709139.7 302240.4 359043.5
Redwood 1090903.7 1201219.4 1361859.8 954949.3 389495.6 709848.3 0 956831.4 817947.8
Sequoia 370857.4 569315.3 601350.2 179888.9 679050.3 302263.6 957852.4 0 357060.1
Yosemite 574687.3 446762.7 723479.8 346161.4 528635.2 358983.6 819256.6 352728.7 0
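An offline great-circle distance takes the same two arguments as get_distance (two mappings with "lat" and "lon"), which makes it a handy sanity check: a driving distance should never be shorter than the straight-line distance over the Earth's surface. This sketch assumes a spherical Earth with a mean radius of 6,371 km. haversine_distance is a hypothetical helper, not part of the original notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_distance(point1, point2) -> float:
    """Great-circle distance in meters between two points with 'lat' and 'lon' keys."""
    lat1, lon1 = math.radians(point1["lat"]), math.radians(point1["lon"])
    lat2, lon2 = math.radians(point2["lat"]), math.radians(point2["lon"])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * 1000 * math.asin(math.sqrt(a))


kings_canyon = {"lat": 36.800000, "lon": -118.550000}
sequoia = {"lat": 36.433333, "lon": -118.683333}
# the OSRM matrix above gives 180047.8 m for Kings Canyon -> Sequoia by road
print(round(haversine_distance(kings_canyon, sequoia)))
```

The straight-line distance between these two neighboring parks is only a fraction of the road distance. This shows how much the road network, not geography alone, shapes the optimal route.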
In [51]:
# convert distance matrix to array
distance_matrix = df_distances.values
distance_matrix
Out[51]:
array([[0, 498682.3, 358268.7, 439515.7, 896276.7, 393402.5, 1090445.6,
        371054.6, 579655],
       [498476.5, 0, 571796.3, 637199.1, 846903, 669984.5, 1191139.9,
        568738, 446748.7],
       [361101.1, 570260.3, 0, 672882.5, 1129643.4, 689940.8, 1408445.6,
        604421.4, 722871.4],
       [439411.8, 637869.8, 669904.6, 0, 677296.3, 380772.8, 956098.5,
        180047.8, 350752.9],
       [896165.3, 857908.1, 1126658.2, 677830.7, 0, 564056.9, 390421.1,
        679712.8, 528779.4],
       [393380.4, 670252.7, 686432.2, 380905.9, 563178.1, 0, 709139.7,
        302240.4, 359043.5],
       [1090903.7, 1201219.4, 1361859.8, 954949.3, 389495.6, 709848.3, 0,
        956831.4, 817947.8],
       [370857.4, 569315.3, 601350.2, 179888.9, 679050.3, 302263.6,
        957852.4, 0, 357060.1],
       [574687.3, 446762.7, 723479.8, 346161.4, 528635.2, 358983.6,
        819256.6, 352728.7, 0]], dtype=object)
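The notebook installs python-tsp and imports OR-Tools for the optimization itself. As an independent, dependency-free cross-check, the sketch below applies the Held-Karp dynamic program, an exact TSP method, to the matrix above. Assumptions: the trip is a closed loop that starts and ends at Channel Islands (row 0), and costs are the asymmetric OSRM meters as printed, typed in as floats because `df_distances.values` has dtype object. held_karp and tour_length are hypothetical helpers, not the project's solver.

```python
from itertools import combinations, permutations

NAMES = ["Channel Islands", "Death Valley", "Joshua Tree", "Kings Canyon",
         "Lassen Volcanic", "Pinnacles", "Redwood", "Sequoia", "Yosemite"]
# driving distances in meters (row = origin, column = destination), copied from the output above
DIST = [
    [0, 498682.3, 358268.7, 439515.7, 896276.7, 393402.5, 1090445.6, 371054.6, 579655],
    [498476.5, 0, 571796.3, 637199.1, 846903, 669984.5, 1191139.9, 568738, 446748.7],
    [361101.1, 570260.3, 0, 672882.5, 1129643.4, 689940.8, 1408445.6, 604421.4, 722871.4],
    [439411.8, 637869.8, 669904.6, 0, 677296.3, 380772.8, 956098.5, 180047.8, 350752.9],
    [896165.3, 857908.1, 1126658.2, 677830.7, 0, 564056.9, 390421.1, 679712.8, 528779.4],
    [393380.4, 670252.7, 686432.2, 380905.9, 563178.1, 0, 709139.7, 302240.4, 359043.5],
    [1090903.7, 1201219.4, 1361859.8, 954949.3, 389495.6, 709848.3, 0, 956831.4, 817947.8],
    [370857.4, 569315.3, 601350.2, 179888.9, 679050.3, 302263.6, 957852.4, 0, 357060.1],
    [574687.3, 446762.7, 723479.8, 346161.4, 528635.2, 358983.6, 819256.6, 352728.7, 0],
]


def held_karp(dist):
    """Exact shortest closed tour starting and ending at node 0 (asymmetric costs allowed)."""
    n = len(dist)
    # best[(mask, j)] = (cost, previous node) of the cheapest path that leaves node 0,
    # visits exactly the nodes whose bits are set in mask, and ends at node j
    best = {(1 << j, j): (dist[0][j], 0) for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << k for k in subset)
            for j in subset:
                prev_mask = mask & ~(1 << j)
                best[(mask, j)] = min((best[(prev_mask, k)][0] + dist[k][j], k)
                                      for k in subset if k != j)
    full = (1 << n) - 2  # every node except 0
    cost, last = min((best[(full, j)][0] + dist[j][0], j) for j in range(1, n))
    # walk the stored predecessors back from the last stop to rebuild the order
    path, mask, j = [], full, last
    while j != 0:
        path.append(j)
        mask, j = mask & ~(1 << j), best[(mask, j)][1]
    return [0] + path[::-1] + [0], cost


def tour_length(tour):
    return sum(DIST[a][b] for a, b in zip(tour, tour[1:]))


tour, total_m = held_karp(DIST)
print(" -> ".join(NAMES[i] for i in tour))
print(f"total driving distance: {total_m / 1000:.1f} km")

# cross-check against brute force over all 8! = 40,320 orderings (feasible only for small n)
brute = min(tour_length((0, *p, 0)) for p in permutations(range(1, len(DIST))))
print(abs(brute - total_m) < 1e-3)
```

Held-Karp needs on the order of n²·2ⁿ steps rather than (n - 1)!, which is why it stays fast at 9 stops. An exact solver run on the same matrix should report the same total distance. OR-Tools' routing solver needs integer costs and uses heuristics by default, so its result may not match exactly.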
In [54]:
pip install python-tsp
Collecting python-tsp
  Downloading python_tsp-0.4.1-py3-none-any.whl.metadata (3.5 kB)
Collecting numpy<2.0.0,>=1.26.3 (from python-tsp)
  Downloading numpy-1.26.4-cp311-cp311-win_amd64.whl.metadata (61 kB)
     ---------------------------------------- 0.0/61.0 kB ? eta -:--:--
     ---------------------------------------- 0.0/61.0 kB ? eta -:--:--
     ------------------- ------------------ 30.7/61.0 kB 435.7 kB/s eta 0:00:01
     -------------------------------------- 61.0/61.0 kB 540.4 kB/s eta 0:00:00
Requirement already satisfied: requests<3.0.0,>=2.28.0 in c:\users\user\anaconda3\lib\site-packages (from python-tsp) (2.31.0)
Collecting tsplib95<0.8.0,>=0.7.1 (from python-tsp)
  Downloading tsplib95-0.7.1-py2.py3-none-any.whl.metadata (6.3 kB)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\users\user\anaconda3\lib\site-packages (from requests<3.0.0,>=2.28.0->python-tsp) (2.0.4)
Requirement already satisfied: idna<4,>=2.5 in c:\users\user\anaconda3\lib\site-packages (from requests<3.0.0,>=2.28.0->python-tsp) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\users\user\anaconda3\lib\site-packages (from requests<3.0.0,>=2.28.0->python-tsp) (1.26.16)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\user\anaconda3\lib\site-packages (from requests<3.0.0,>=2.28.0->python-tsp) (2024.6.2)
Requirement already satisfied: Click>=6.0 in c:\users\user\anaconda3\lib\site-packages (from tsplib95<0.8.0,>=0.7.1->python-tsp) (8.0.4)
Collecting Deprecated~=1.2.9 (from tsplib95<0.8.0,>=0.7.1->python-tsp)
  Downloading Deprecated-1.2.14-py2.py3-none-any.whl.metadata (5.4 kB)
Collecting networkx~=2.1 (from tsplib95<0.8.0,>=0.7.1->python-tsp)
  Downloading networkx-2.8.8-py3-none-any.whl.metadata (5.1 kB)
Requirement already satisfied: tabulate~=0.8.7 in c:\users\user\anaconda3\lib\site-packages (from tsplib95<0.8.0,>=0.7.1->python-tsp) (0.8.10)
Requirement already satisfied: colorama in c:\users\user\anaconda3\lib\site-packages (from Click>=6.0->tsplib95<0.8.0,>=0.7.1->python-tsp) (0.4.6)
Requirement already satisfied: wrapt<2,>=1.10 in c:\users\user\anaconda3\lib\site-packages (from Deprecated~=1.2.9->tsplib95<0.8.0,>=0.7.1->python-tsp) (1.14.1)
Downloading python_tsp-0.4.1-py3-none-any.whl (26 kB)
Downloading numpy-1.26.4-cp311-cp311-win_amd64.whl (15.8 MB)
Downloading tsplib95-0.7.1-py2.py3-none-any.whl (25 kB)
Downloading Deprecated-1.2.14-py2.py3-none-any.whl (9.6 kB)
Downloading networkx-2.8.8-py3-none-any.whl (2.0 MB)
Installing collected packages: numpy, networkx, Deprecated, tsplib95, python-tsp
  Attempting uninstall: numpy
    Found existing installation: numpy 1.24.3
    Uninstalling numpy-1.24.3:
      Successfully uninstalled numpy-1.24.3
  Attempting uninstall: networkx
    Found existing installation: networkx 3.1
    Uninstalling networkx-3.1:
      Successfully uninstalled networkx-3.1
Successfully installed Deprecated-1.2.14 networkx-2.8.8 numpy-1.26.4 python-tsp-0.4.1 tsplib95-0.7.1
Note: you may need to restart the kernel to use updated packages.
  WARNING: Failed to remove contents in a temporary directory 'C:\Users\User\anaconda3\Lib\site-packages\~umpy'.
  You can safely remove it manually.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gensim 4.3.0 requires FuzzyTM>=0.4.0, which is not installed.
tables 3.8.0 requires blosc2~=2.0.0, which is not installed.
tables 3.8.0 requires cython>=0.29.21, which is not installed.
numba 0.57.1 requires numpy<1.25,>=1.21, but you have numpy 1.26.4 which is incompatible.
streamlit 1.28.0 requires protobuf<5,>=3.20, but you have protobuf 5.27.1 which is incompatible.

In [55]:
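from python_tsp.exact import solve_tsp_brute_force

# Exhaustive search over every tour that starts and ends at location 0;
# returns the best visiting order and its total length (same units as the matrix).
order, distance_tot = solve_tsp_brute_force(distance_matrix)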
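`solve_tsp_brute_force` fixes location 0 as the start, tries every ordering of the remaining stops, and keeps the shortest closed tour. With 9 locations that is 8! = 40,320 orderings, which is why the project brief caps the selection at nine. The pure-Python sketch below illustrates that exhaustive search. It is not python-tsp's actual source, and the helper names and the `closed` flag are hypothetical. Setting `closed=False` drops the leg back to the start. This is equivalent to the approach the python-tsp README describes for an open path: set the first column of the distance matrix to zero.

```python
from itertools import permutations

# Distance matrix from Out[51]; units assumed to be meters.
D = [
    [0, 498682.3, 358268.7, 439515.7, 896276.7, 393402.5, 1090445.6, 371054.6, 579655],
    [498476.5, 0, 571796.3, 637199.1, 846903, 669984.5, 1191139.9, 568738, 446748.7],
    [361101.1, 570260.3, 0, 672882.5, 1129643.4, 689940.8, 1408445.6, 604421.4, 722871.4],
    [439411.8, 637869.8, 669904.6, 0, 677296.3, 380772.8, 956098.5, 180047.8, 350752.9],
    [896165.3, 857908.1, 1126658.2, 677830.7, 0, 564056.9, 390421.1, 679712.8, 528779.4],
    [393380.4, 670252.7, 686432.2, 380905.9, 563178.1, 0, 709139.7, 302240.4, 359043.5],
    [1090903.7, 1201219.4, 1361859.8, 954949.3, 389495.6, 709848.3, 0, 956831.4, 817947.8],
    [370857.4, 569315.3, 601350.2, 179888.9, 679050.3, 302263.6, 957852.4, 0, 357060.1],
    [574687.3, 446762.7, 723479.8, 346161.4, 528635.2, 358983.6, 819256.6, 352728.7, 0],
]


def tour_length(D, tour, closed=True):
    """Sum D[a][b] over consecutive stops; optionally add the leg back to the start."""
    total = sum(D[a][b] for a, b in zip(tour, tour[1:]))
    if closed:
        total += D[tour[-1]][tour[0]]
    return total


def brute_force_tsp(D, closed=True):
    """Try every ordering of stops 1..n-1 with stop 0 fixed as the start."""
    n = len(D)
    best_tour, best_len = None, float("inf")
    for perm in permutations(range(1, n)):
        tour = [0, *perm]
        length = tour_length(D, tour, closed)
        if length < best_len:  # keep the first strictly shorter tour found
            best_tour, best_len = tour, length
    return best_tour, best_len


# The closed version should reproduce Out[56] and Out[57].
order, distance_tot = brute_force_tsp(D)
# Open path: still starts at 0, but no return leg is counted.
open_order, open_distance = brute_force_tsp(D, closed=False)
print(order, round(distance_tot, 1))
print(open_order, round(open_distance, 1))
```

Because the matrix is asymmetric, a tour and its reverse generally have different lengths. Exhaustive search handles this without any special treatment.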
In [56]:
order
Out[56]:
[0, 2, 1, 8, 4, 6, 5, 7, 3]
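The solver always starts at index 0. However, `order` describes a closed loop, so the trip can begin at any park in that loop without changing its total length. The directed legs stay the same, and only the order in which they are summed changes. The sketch below assumes you want to start at index 8, which is Yosemite according to the `df_distances` rows shown above. `rotate_tour` and `closed_length` are hypothetical helpers. The labels for indices 0-2 are placeholders, because those rows fall outside this excerpt.

```python
# Distance matrix from Out[51]; units assumed to be meters.
D = [
    [0, 498682.3, 358268.7, 439515.7, 896276.7, 393402.5, 1090445.6, 371054.6, 579655],
    [498476.5, 0, 571796.3, 637199.1, 846903, 669984.5, 1191139.9, 568738, 446748.7],
    [361101.1, 570260.3, 0, 672882.5, 1129643.4, 689940.8, 1408445.6, 604421.4, 722871.4],
    [439411.8, 637869.8, 669904.6, 0, 677296.3, 380772.8, 956098.5, 180047.8, 350752.9],
    [896165.3, 857908.1, 1126658.2, 677830.7, 0, 564056.9, 390421.1, 679712.8, 528779.4],
    [393380.4, 670252.7, 686432.2, 380905.9, 563178.1, 0, 709139.7, 302240.4, 359043.5],
    [1090903.7, 1201219.4, 1361859.8, 954949.3, 389495.6, 709848.3, 0, 956831.4, 817947.8],
    [370857.4, 569315.3, 601350.2, 179888.9, 679050.3, 302263.6, 957852.4, 0, 357060.1],
    [574687.3, 446762.7, 723479.8, 346161.4, 528635.2, 358983.6, 819256.6, 352728.7, 0],
]
order = [0, 2, 1, 8, 4, 6, 5, 7, 3]  # Out[56]

# Indices 3-8 match the df_distances rows above; 0-2 are placeholder labels.
names = ["stop 0", "stop 1", "stop 2", "Kings Canyon", "Lassen Volcanic",
         "Pinnacles", "Redwood", "Sequoia", "Yosemite"]


def rotate_tour(tour, start):
    """Return the same cyclic tour, beginning at `start`."""
    i = tour.index(start)
    return tour[i:] + tour[:i]


def closed_length(D, tour):
    """Length of the loop, including the leg from the last stop back to the first."""
    return sum(D[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))


from_yosemite = rotate_tour(order, 8)
print(" -> ".join(names[i] for i in from_yosemite + [from_yosemite[0]]))
print(round(closed_length(D, order), 1), round(closed_length(D, from_yosemite), 1))
```

For an open route (no return leg), rotating is not enough. The choice of starting point then does affect the total, so the open variant would have to be re-solved for each candidate start.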
In [57]:
distance_tot
Out[57]:
3925723.3999999994
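`distance_tot` includes the final leg from the last stop (index 3) back to the starting location (index 0), because python-tsp solves for a closed tour. The sketch below breaks the total into individual legs using the values from Out[51]. It assumes the matrix is in meters, in which case 3925723.4 corresponds to roughly 3,925.7 km. The trailing `.3999999994` in the output is ordinary floating-point rounding, not extra precision in the data.

```python
# Distance matrix from Out[51]; units assumed to be meters.
D = [
    [0, 498682.3, 358268.7, 439515.7, 896276.7, 393402.5, 1090445.6, 371054.6, 579655],
    [498476.5, 0, 571796.3, 637199.1, 846903, 669984.5, 1191139.9, 568738, 446748.7],
    [361101.1, 570260.3, 0, 672882.5, 1129643.4, 689940.8, 1408445.6, 604421.4, 722871.4],
    [439411.8, 637869.8, 669904.6, 0, 677296.3, 380772.8, 956098.5, 180047.8, 350752.9],
    [896165.3, 857908.1, 1126658.2, 677830.7, 0, 564056.9, 390421.1, 679712.8, 528779.4],
    [393380.4, 670252.7, 686432.2, 380905.9, 563178.1, 0, 709139.7, 302240.4, 359043.5],
    [1090903.7, 1201219.4, 1361859.8, 954949.3, 389495.6, 709848.3, 0, 956831.4, 817947.8],
    [370857.4, 569315.3, 601350.2, 179888.9, 679050.3, 302263.6, 957852.4, 0, 357060.1],
    [574687.3, 446762.7, 723479.8, 346161.4, 528635.2, 358983.6, 819256.6, 352728.7, 0],
]
order = [0, 2, 1, 8, 4, 6, 5, 7, 3]  # Out[56]

# Pair each stop with the next one; order[:1] wraps the last stop back to the start.
legs = [(a, b, D[a][b]) for a, b in zip(order, order[1:] + order[:1])]

for a, b, d in legs:
    print(f"{a} -> {b}: {d / 1000:,.1f} km")  # meters -> km (assumed units)

total_m = sum(d for _, _, d in legs)
return_leg_m = legs[-1][2]
print(f"total: {total_m / 1000:,.1f} km (return leg: {return_leg_m / 1000:,.1f} km)")
```

If the trip does not need to end where it started, the return leg (about 439.4 km under the meters assumption) is not part of the route. In that case the open-path variant of the problem is the one to solve.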