• Ⅰ. Introduction
  • Ⅱ. Data
    • 2.1 Data Wrangling
    • 2.2 Feature Engineering
  • Ⅲ. Exploratory Analysis
    • 3.1 Time Analysis
    • 3.2 Continuous Features Analysis
    • 3.3 Land Cover Feature Analysis
  • Ⅳ. Risk Estimation Model
    • 4.1 Build Fire Risk Prediction Model
    • 4.2 Model Prediction
  • Ⅴ. Model Validation
    • 5.1 Goodness of Fit
    • 5.2 ROC (AUC)
    • 5.3 Cross Validation
    • 5.4 Weight Scenario Analysis
    • 5.5 Maximize Weighted Value (Optimize Threshold)
    • 5.6 Optimized model
  • Ⅵ. Wildfire Risk Prediction Map
    • 6.1 Year Prediction Map
    • 6.2 Monthly Prediction Map
  • Ⅶ. Conclusion

Ⅰ. Introduction

Since the 1980s, the size and intensity of wildfires in California have notably increased, and the amount of land burned has risen steeply in the past five years. In 2021, 8,619 wildfires burned almost 2.6 million acres; there were three fatalities, and 3,629 structures were damaged or destroyed. Although the government has paid considerable attention to this problem and made efforts to mitigate it, the situation has not improved.

As the climate changes, wildfires are becoming more frequent, so it is essential to prevent them in more proactive ways. Our solution uses logistic regression to predict the probability of wildfire occurrence in California by merging datasets such as elevation, land cover, and precipitation. By visualizing these predictions, firefighters can become aware of upcoming dangers and take preventive measures. After constructing the model, we ran a 100-fold cross-validation to test its generalizability. Finally, we conducted a weight scenario analysis to find the optimal threshold. For more granular guidance, we also built a separate model for each of the twelve months.

Favigator, the application below, was built on the results of our predictions. Its basic function is simple: click a month and the corresponding wildfire risk prediction map is shown, and you can zoom to the fire unit you are interested in. Its second, and main, use is to suggest an optimal patrol route through risky areas based on the user’s location. Click here to access the Favigator introduction video.


Ⅱ. Data

2.1 Data Wrangling

This project includes 8 datasets in total: California Wildfires, California Neighborhoods, Elevation, Land Cover, Precipitation, Temperature, Facilities, and Ground Water Stations. The details are as follows.

2.1.1 Initial Data: California Wildfires & Fishnet Grid


California wildfire data are collected from the California Open Data fire perimeters dataset, which records fires from 1878 to the latest date; the records are complete from 2002 to the present. It is an ESRI ArcGIS file geodatabase with three layers:

  • A layer depicting wildfire perimeters from contributing agencies current as of the previous fire year;
  • A layer depicting prescribed fires supplied from contributing agencies current as of the previous fire year;
  • A layer representing non-prescribed-fire fuel reduction projects that were initially included in the database; such projects are no longer included.

We chose the years 2017, 2018, and 2019 as our training data and 2020 as the test data. The neighborhoods data are from California Open Data, and we created a 2-mile fishnet grid as the basic spatial analysis unit. The following raster and vector data are all merged and aggregated to the grid cells.

#------------------- California Wildfire Data-------------------#

California_Wildfire <- st_read("https://opendata.arcgis.com/api/v3/datasets/e3802d2abf8741a187e73a9db49d68fe_0/downloads/data?format=geojson&spatialRefId=4326&where=1%3D1") %>%
  st_transform('EPSG:3310') %>%
  filter(YEAR_ >= 1910) 

California_Wildfire <- California_Wildfire %>%
  mutate(SHAPE_Area_m2 = st_area(California_Wildfire),
         SHAPE_Area_m2_ln = log(SHAPE_Area_m2))

#------------------ California County Boundaries----------------#

County_boudaries<- st_read("https://opendata.arcgis.com/api/v3/datasets/8713ced9b78a4abb97dc130a691a8695_0/downloads/data?format=geojson&spatialRefId=4326&where=1%3D1") %>%
  st_transform('EPSG:3310') %>%
  dplyr::select(COUNTY_NAME, geometry) %>%
  unique()

#------------------California 2miles fishnet maps --------------#

study_area <-
  st_union(County_boudaries) %>%
  st_transform('EPSG:3310')

fishnet <-
  st_make_grid(study_area,
               cellsize = 10560, #2miles
               square = TRUE) %>%
  .[study_area] %>%
  st_sf() %>%
  mutate(uniqueID = rownames(.)) %>%
  st_transform('EPSG:3310')


#----------------------Wildfire distribution map------------------#

fg2111 <- ggplot()  +
  annotation_map_tile("cartolight", forcedownload = FALSE)+
  geom_sf(data = study_area, fill = greyPalette5[1], color = "transparent") +
  geom_sf(data = California_Wildfire, aes(fill = q5(SHAPE_Area_m2)), color = "transparent") +
  scale_fill_manual(values = bluePalette5,
                    labels = round(as.numeric(qBr(California_Wildfire, "SHAPE_Area_m2"))/1000,2),
                    name = "Area of \nWildfires\n(1,000m2\nquantiles)",
                    na.translate=FALSE)+
  labs(title= "California Wildfires",
       subtitle = "Total 21,533 cases\n Since 1910",
       caption = "Figure 2-1-1-1") +
  mapThememin2()

#-------------------count wildfires of each fishnet--------------#

fishnet <- st_join(fishnet, California_Wildfire) %>% dplyr::select(uniqueID) %>% group_by(uniqueID) %>% summarise(Wildfires = n())

fg2112 <- ggplot() +
  annotation_map_tile("cartolight", forcedownload = FALSE)+
  geom_sf(data = study_area, fill = greyPalette5[1], color = "transparent") +
  geom_sf(data = fishnet, aes(fill = Wildfires), color = "transparent") +
    scale_fill_gradient(low = greyPalette5[1], high = bluePalette5[5]) +
  labs(title = "Wildfire counts by Fishnet",
       subtitle = "Wildfire Frequency Mapping\n Since 1910",
       caption = "Figure 2-1-1-2") +
  mapThememin2()

fg2111+fg2112
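The year-based split described above does not appear in the wrangling code itself; a minimal sketch of how it could be done (assuming the YEAR_ attribute used earlier holds the fire year) is:

#------------------- Train / test split by year (sketch) -------------------#

## Training years 2017-2019, test year 2020
train_fires <- California_Wildfire %>% filter(YEAR_ %in% 2017:2019)
test_fires  <- California_Wildfire %>% filter(YEAR_ == 2020)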

2.1.2 Raster Data

2.1.2.1 USGS Elevation Data

Elevation data for the United States can be accessed from the USGS. The data used for this project are the 3D Elevation Program (3DEP) products at 1/3 arc-second (approximately 10 m) resolution, which cover the entire state of California. We downloaded 61 TIFF images, merged them in QGIS, and used the “Clip Raster” tool to extract the California area. The resulting TIFF is about 23.9 GB.
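The merging and clipping were done in QGIS; a rough R equivalent (a sketch only — the tile folder ./Data/Elevation_tiles is a hypothetical path, and study_area is the county union created above) could use the raster package directly:

#------------------- Merge and clip DEM tiles (sketch) -------------------#

## Read all downloaded 3DEP tiles, mosaic them, and clip to California
tile_files <- list.files('./Data/Elevation_tiles', pattern = '\\.tif$', full.names = TRUE)
tiles <- lapply(tile_files, raster)
California_elevation <- do.call(merge, tiles)   # mosaic the 61 tiles

study_area_sp <- as_Spatial(st_transform(study_area, crs(California_elevation)))
California_elevation <- mask(crop(California_elevation, study_area_sp), study_area_sp)

## writeRaster(California_elevation, './Data/Califoria_elevation.tif')  # save for reuse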

## Elevation Data
California_elevation <- raster('./Data/Califoria_elevation.tif')

## Merge to fishnet
fishnet <- cbind(fishnet, exact_extract(California_elevation, 
                                        fishnet %>% st_transform(crs(California_elevation)), 
                                        'mean')) %>%
  rename(elevation = exact_extract.California_elevation..fishnet.....st_transform.crs.California_elevation....)


fg2121 <- ggplot() +
  annotation_map_tile("cartolight", forcedownload = FALSE)+
  geom_sf(data = study_area, fill = greyPalette5[1], color = "transparent", alpha = 0.5) +
  geom_sf(data = fishnet, aes(fill = elevation), color = "transparent") +
    scale_fill_gradient(low = greyPalette5[1], high = bluePalette5[5]) +
  labs(title = "Mean Elevation in California",
       subtitle = "Source : USGS",
       caption = "Figure 2-1-2-1") +
  mapThememin2()
2.1.2.2 NLCD 2019 Land Cover Data

Land cover data are of great importance for our analysis. The National Land Cover Database (NLCD) provides nationwide data on land cover and land cover change at 30 m resolution. We downloaded the full dataset and clipped it to the California area; the resulting file is about 910 MB.

## Land Cover

California_landcover <- raster('./Data/Land Cover/California_landcover.tif')

## Create a dataframe defining the classifications based on the legend
reclass_df <- c(0, 10, 0,
                10, 11, 11, # Open Water
                11, 12, 12, # Perennial Ice/Snow
                12, 20, 0,
                20, 21, 21, # Developed, Open Space
                21, 22, 22, # Developed, Low Intensity
                22, 23, 23, # Developed, Medium Intensity 
                23, 24, 24, # Developed High Intensity
                24, 30, 0,
                30, 31, 31, # Barren Land (Rock/Sand/Clay)
                31, 40, 0,
                40, 41, 41, # Deciduous Forest
                41, 42, 42, # Evergreen Forest
                42, 43, 43, # Mixed Forest
                44, 51, 0,
                51, 52, 52, # Shrub/Scrub
                52, 70, 0,
                70, 71, 71, # Grassland/Herbaceous
                71, 80, 0, 
                80, 81, 81, # Pasture/Hay
                81, 82, 82, # Cultivated Crops
                82, 89, 0,
                89, 90, 90, # Woody Wetlands
                90, 94, 0,
                94, 95, 95, # Emergent Herbaceous Wetlands
                95, 255, 0)

## Reshape
reclass_m <- matrix(reclass_df,
                ncol = 3,
                byrow = TRUE)

## Reclassification
California_landcover <- reclassify(California_landcover, reclass_m)

# set a min and max value for the raster
California_landcover <- setMinMax(California_landcover)

fishnet <- cbind(fishnet, exact_extract(California_landcover, 
                                        fishnet %>% st_transform(crs(California_landcover)), 
                                        'mode')) %>%
  rename(landcover =    
exact_extract.California_landcover..fishnet.....st_transform.crs.California_landcover....) %>%
  mutate(landcover = as.factor(landcover)) %>%
  rename(lc_num = landcover) %>%
  mutate(landcover = case_when(lc_num == "11" ~ "Water",
                               lc_num == "12" ~ "Snow",
                               lc_num == "21" ~ "Developed_open",
                               lc_num == "22" ~ "Developed_low",
                               lc_num == "23" ~ "Developed_med",
                               lc_num == "24" ~ "Developed_high",
                               lc_num == "31" ~ "Barren",
                               lc_num == "41" ~ "Deciduous",
                               lc_num == "42" ~ "Evergreen",
                               lc_num == "43" ~ "Mixed",
                               lc_num == "52" ~ "Shrub",
                               lc_num == "71" ~ "Herbaceous",
                               lc_num == "81" ~ "Pasture",
                               lc_num == "82" ~ "Crops",
                               lc_num == "90" ~ "Woody_wetlands",
                               lc_num == "95" ~ "Herbaceous_wetlands")) %>%
  mutate(landcover = make.names(landcover)) %>%
  mutate(landcover = as.factor(landcover)) %>%
  filter(lc_num != 0)

fg2122 <- ggplot() +
  annotation_map_tile("cartolight", forcedownload = FALSE)+
  geom_sf(data = fishnet, aes(fill = landcover), colour = NA) +
  scale_fill_manual(values = c("Water" = "#5475A8",
                               "Snow" = "#ffffff",
                               "Developed_open" = "#E8D1D1",
                               "Developed_low" = "#E29E8C",
                               "Developed_med" = "#ff0000",
                               "Developed_high" = "#B50000",
                               "Barren" = "#D2CDC0",
                               "Deciduous" = "#85C77E",
                               "Evergreen" = "#38814E",
                               "Mixed" = "#D4E7B0",
                               "Shrub" = "#DCCA8F",
                               "Herbaceous" = "#FDE9AA",
                               "Pasture" = "#FBF65D",
                               "Crops" = "#CA9146",
                               "Woody_wetlands" = "#C8E6F8",
                               "Herbaceous_wetlands" = "#64B3D5")) +
  labs(title = "Land Cover in California",
       subtitle = "Source : National Land Cover Database",
       caption = "Figure 2-1-2-2") +
  mapThememin2()

fg2121 + fg2122

2.1.2.3 Precipitation Data

The precipitation data are available from the PRISM Climate Group at Oregon State University. We downloaded the rasters for 2017-2020 at a 4 km × 4 km resolution.

## Precipitation Data

Precipitation_2019 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2019_all_bil/PRISM_ppt_stable_4kmM3_2019_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_2019, 
                                        fishnet %>% st_transform(crs(Precipitation_2019)), 
                                        'mean')) %>%
  rename(Precipitation_2019 = exact_extract.Precipitation_2019..fishnet.....st_transform.crs.Precipitation_2019....)



fg2123 <- ggplot() +
  annotation_map_tile("cartolight", forcedownload = FALSE)+
  geom_sf(data = study_area, fill = greyPalette5[1], color = "transparent", alpha = 0.5) +
  geom_sf(data = fishnet, aes(fill = Precipitation_2019), color = "transparent") +
    scale_fill_gradient(low = greyPalette5[1], high = bluePalette5[5]) +
  labs(title = "Mean Precipitation in California",
       subtitle = "2019",
       caption = "Figure 2-1-2-3") +
  mapThememin2()

## 2017

Precipitation_201701 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2017_all_bil/PRISM_ppt_stable_4kmM3_201701_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201701, 
                                        fishnet %>% st_transform(crs(Precipitation_201701)), 
                                        'mean')) %>%
  rename(Precipitation_201701 = exact_extract.Precipitation_201701..fishnet.....st_transform.crs.Precipitation_201701....)

Precipitation_201702 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2017_all_bil/PRISM_ppt_stable_4kmM3_201702_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201702, 
                                        fishnet %>% st_transform(crs(Precipitation_201702)), 
                                        'mean')) %>%
  rename(Precipitation_201702 = exact_extract.Precipitation_201702..fishnet.....st_transform.crs.Precipitation_201702....)

Precipitation_201703 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2017_all_bil/PRISM_ppt_stable_4kmM3_201703_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201703, 
                                        fishnet %>% st_transform(crs(Precipitation_201703)), 
                                        'mean')) %>%
  rename(Precipitation_201703 = exact_extract.Precipitation_201703..fishnet.....st_transform.crs.Precipitation_201703....)

Precipitation_201704 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2017_all_bil/PRISM_ppt_stable_4kmM3_201704_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201704, 
                                        fishnet %>% st_transform(crs(Precipitation_201704)), 
                                        'mean')) %>%
  rename(Precipitation_201704 = exact_extract.Precipitation_201704..fishnet.....st_transform.crs.Precipitation_201704....)

Precipitation_201705 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2017_all_bil/PRISM_ppt_stable_4kmM3_201705_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201705, 
                                        fishnet %>% st_transform(crs(Precipitation_201705)), 
                                        'mean')) %>%
  rename(Precipitation_201705 = exact_extract.Precipitation_201705..fishnet.....st_transform.crs.Precipitation_201705....)

Precipitation_201706 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2017_all_bil/PRISM_ppt_stable_4kmM3_201706_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201706, 
                                        fishnet %>% st_transform(crs(Precipitation_201706)), 
                                        'mean')) %>%
  rename(Precipitation_201706 = exact_extract.Precipitation_201706..fishnet.....st_transform.crs.Precipitation_201706....)

Precipitation_201707 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2017_all_bil/PRISM_ppt_stable_4kmM3_201707_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201707, 
                                        fishnet %>% st_transform(crs(Precipitation_201707)), 
                                        'mean')) %>%
  rename(Precipitation_201707 = exact_extract.Precipitation_201707..fishnet.....st_transform.crs.Precipitation_201707....)

Precipitation_201708 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2017_all_bil/PRISM_ppt_stable_4kmM3_201708_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201708, 
                                        fishnet %>% st_transform(crs(Precipitation_201708)), 
                                        'mean')) %>%
  rename(Precipitation_201708 = exact_extract.Precipitation_201708..fishnet.....st_transform.crs.Precipitation_201708....)

Precipitation_201709 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2017_all_bil/PRISM_ppt_stable_4kmM3_201709_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201709, 
                                        fishnet %>% st_transform(crs(Precipitation_201709)), 
                                        'mean')) %>%
  rename(Precipitation_201709 = exact_extract.Precipitation_201709..fishnet.....st_transform.crs.Precipitation_201709....)

Precipitation_201710 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2017_all_bil/PRISM_ppt_stable_4kmM3_201710_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201710, 
                                        fishnet %>% st_transform(crs(Precipitation_201710)), 
                                        'mean')) %>%
  rename(Precipitation_201710 = exact_extract.Precipitation_201710..fishnet.....st_transform.crs.Precipitation_201710....)

Precipitation_201711 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2017_all_bil/PRISM_ppt_stable_4kmM3_201711_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201711, 
                                        fishnet %>% st_transform(crs(Precipitation_201711)), 
                                        'mean')) %>%
  rename(Precipitation_201711 = exact_extract.Precipitation_201711..fishnet.....st_transform.crs.Precipitation_201711....)

Precipitation_201712 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2017_all_bil/PRISM_ppt_stable_4kmM3_201712_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201712, 
                                        fishnet %>% st_transform(crs(Precipitation_201712)), 
                                        'mean')) %>%
  rename(Precipitation_201712 = exact_extract.Precipitation_201712..fishnet.....st_transform.crs.Precipitation_201712....)

## 2018

Precipitation_201801 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2018_all_bil/PRISM_ppt_stable_4kmM3_201801_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201801, 
                                        fishnet %>% st_transform(crs(Precipitation_201801)), 
                                        'mean')) %>%
  rename(Precipitation_201801 = exact_extract.Precipitation_201801..fishnet.....st_transform.crs.Precipitation_201801....)

Precipitation_201802 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2018_all_bil/PRISM_ppt_stable_4kmM3_201802_bil.bil')


fishnet <- cbind(fishnet, exact_extract(Precipitation_201802, 
                                        fishnet %>% st_transform(crs(Precipitation_201802)), 
                                        'mean')) %>%
  rename(Precipitation_201802 = exact_extract.Precipitation_201802..fishnet.....st_transform.crs.Precipitation_201802....)

Precipitation_201803 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2018_all_bil/PRISM_ppt_stable_4kmM3_201803_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201803, 
                                        fishnet %>% st_transform(crs(Precipitation_201803)), 
                                        'mean')) %>%
  rename(Precipitation_201803 = exact_extract.Precipitation_201803..fishnet.....st_transform.crs.Precipitation_201803....)

Precipitation_201804 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2018_all_bil/PRISM_ppt_stable_4kmM3_201804_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201804, 
                                        fishnet %>% st_transform(crs(Precipitation_201804)), 
                                        'mean')) %>%
  rename(Precipitation_201804 = exact_extract.Precipitation_201804..fishnet.....st_transform.crs.Precipitation_201804....)

Precipitation_201805 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2018_all_bil/PRISM_ppt_stable_4kmM3_201805_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201805, 
                                        fishnet %>% st_transform(crs(Precipitation_201805)), 
                                        'mean')) %>%
  rename(Precipitation_201805 = exact_extract.Precipitation_201805..fishnet.....st_transform.crs.Precipitation_201805....)

Precipitation_201806 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2018_all_bil/PRISM_ppt_stable_4kmM3_201806_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201806, 
                                        fishnet %>% st_transform(crs(Precipitation_201806)), 
                                        'mean')) %>%
  rename(Precipitation_201806 = exact_extract.Precipitation_201806..fishnet.....st_transform.crs.Precipitation_201806....)

Precipitation_201807 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2018_all_bil/PRISM_ppt_stable_4kmM3_201807_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201807, 
                                        fishnet %>% st_transform(crs(Precipitation_201807)), 
                                        'mean')) %>%
  rename(Precipitation_201807 = exact_extract.Precipitation_201807..fishnet.....st_transform.crs.Precipitation_201807....)

Precipitation_201808 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2018_all_bil/PRISM_ppt_stable_4kmM3_201808_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201808, 
                                        fishnet %>% st_transform(crs(Precipitation_201808)), 
                                        'mean')) %>%
  rename(Precipitation_201808 = exact_extract.Precipitation_201808..fishnet.....st_transform.crs.Precipitation_201808....)

Precipitation_201809 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2018_all_bil/PRISM_ppt_stable_4kmM3_201809_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201809, 
                                        fishnet %>% st_transform(crs(Precipitation_201809)), 
                                        'mean')) %>%
  rename(Precipitation_201809 = exact_extract.Precipitation_201809..fishnet.....st_transform.crs.Precipitation_201809....)

Precipitation_201810 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2018_all_bil/PRISM_ppt_stable_4kmM3_201810_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201810, 
                                        fishnet %>% st_transform(crs(Precipitation_201810)), 
                                        'mean')) %>%
  rename(Precipitation_201810 = exact_extract.Precipitation_201810..fishnet.....st_transform.crs.Precipitation_201810....)

Precipitation_201811 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2018_all_bil/PRISM_ppt_stable_4kmM3_201811_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201811, 
                                        fishnet %>% st_transform(crs(Precipitation_201811)), 
                                        'mean')) %>%
  rename(Precipitation_201811 = exact_extract.Precipitation_201811..fishnet.....st_transform.crs.Precipitation_201811....)

Precipitation_201812 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2018_all_bil/PRISM_ppt_stable_4kmM3_201812_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201812, 
                                        fishnet %>% st_transform(crs(Precipitation_201812)), 
                                        'mean')) %>%
  rename(Precipitation_201812 = exact_extract.Precipitation_201812..fishnet.....st_transform.crs.Precipitation_201812....)

## 2019

Precipitation_201901 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2019_all_bil/PRISM_ppt_stable_4kmM3_201901_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201901, 
                                        fishnet %>% st_transform(crs(Precipitation_201901)), 
                                        'mean')) %>%
  rename(Precipitation_201901 = exact_extract.Precipitation_201901..fishnet.....st_transform.crs.Precipitation_201901....)

Precipitation_201902 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2019_all_bil/PRISM_ppt_stable_4kmM3_201902_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201902, 
                                        fishnet %>% st_transform(crs(Precipitation_201902)), 
                                        'mean')) %>%
  rename(Precipitation_201902 = exact_extract.Precipitation_201902..fishnet.....st_transform.crs.Precipitation_201902....)

Precipitation_201903 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2019_all_bil/PRISM_ppt_stable_4kmM3_201903_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201903, 
                                        fishnet %>% st_transform(crs(Precipitation_201903)), 
                                        'mean')) %>%
  rename(Precipitation_201903 = exact_extract.Precipitation_201903..fishnet.....st_transform.crs.Precipitation_201903....)

Precipitation_201904 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2019_all_bil/PRISM_ppt_stable_4kmM3_201904_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201904, 
                                        fishnet %>% st_transform(crs(Precipitation_201904)), 
                                        'mean')) %>%
  rename(Precipitation_201904 = exact_extract.Precipitation_201904..fishnet.....st_transform.crs.Precipitation_201904....)

Precipitation_201905 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2019_all_bil/PRISM_ppt_stable_4kmM3_201905_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201905, 
                                        fishnet %>% st_transform(crs(Precipitation_201905)), 
                                        'mean')) %>%
  rename(Precipitation_201905 = exact_extract.Precipitation_201905..fishnet.....st_transform.crs.Precipitation_201905....)

Precipitation_201906 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2019_all_bil/PRISM_ppt_stable_4kmM3_201906_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201906, 
                                        fishnet %>% st_transform(crs(Precipitation_201906)), 
                                        'mean')) %>%
  rename(Precipitation_201906 = exact_extract.Precipitation_201906..fishnet.....st_transform.crs.Precipitation_201906....)

Precipitation_201907 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2019_all_bil/PRISM_ppt_stable_4kmM3_201907_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201907, 
                                        fishnet %>% st_transform(crs(Precipitation_201907)), 
                                        'mean')) %>%
  rename(Precipitation_201907 = exact_extract.Precipitation_201907..fishnet.....st_transform.crs.Precipitation_201907....)

Precipitation_201908 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2019_all_bil/PRISM_ppt_stable_4kmM3_201908_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201908, 
                                        fishnet %>% st_transform(crs(Precipitation_201908)), 
                                        'mean')) %>%
  rename(Precipitation_201908 = exact_extract.Precipitation_201908..fishnet.....st_transform.crs.Precipitation_201908....)

Precipitation_201909 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2019_all_bil/PRISM_ppt_stable_4kmM3_201909_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201909, 
                                        fishnet %>% st_transform(crs(Precipitation_201909)), 
                                        'mean')) %>%
  rename(Precipitation_201909 = exact_extract.Precipitation_201909..fishnet.....st_transform.crs.Precipitation_201909....)

Precipitation_201910 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2019_all_bil/PRISM_ppt_stable_4kmM3_201910_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201910, 
                                        fishnet %>% st_transform(crs(Precipitation_201910)), 
                                        'mean')) %>%
  rename(Precipitation_201910 = exact_extract.Precipitation_201910..fishnet.....st_transform.crs.Precipitation_201910....)

Precipitation_201911 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2019_all_bil/PRISM_ppt_stable_4kmM3_201911_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201911, 
                                        fishnet %>% st_transform(crs(Precipitation_201911)), 
                                        'mean')) %>%
  rename(Precipitation_201911 = exact_extract.Precipitation_201911..fishnet.....st_transform.crs.Precipitation_201911....)

Precipitation_201912 <- raster('./Data/Precipitation/PRISM_ppt_stable_4kmM3_2019_all_bil/PRISM_ppt_stable_4kmM3_201912_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Precipitation_201912, 
                                        fishnet %>% st_transform(crs(Precipitation_201912)), 
                                        'mean')) %>%
  rename(Precipitation_201912 = exact_extract.Precipitation_201912..fishnet.....st_transform.crs.Precipitation_201912....)

rm(Precipitation_201701, Precipitation_201702, Precipitation_201703, Precipitation_201704, Precipitation_201705, Precipitation_201706,
   Precipitation_201707, Precipitation_201708, Precipitation_201709, Precipitation_201710, Precipitation_201711, Precipitation_201712,
   Precipitation_201801, Precipitation_201802, Precipitation_201803, Precipitation_201804, Precipitation_201805, Precipitation_201806,
   Precipitation_201807, Precipitation_201808, Precipitation_201809, Precipitation_201810, Precipitation_201811, Precipitation_201812,
   Precipitation_201901, Precipitation_201902, Precipitation_201903, Precipitation_201904, Precipitation_201905, Precipitation_201906,
   Precipitation_201907, Precipitation_201908, Precipitation_201909, Precipitation_201910, Precipitation_201911, Precipitation_201912)
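
The month-by-month blocks above repeat the same read-extract-rename pattern 36 times; the same result could be obtained with a loop (a sketch, assuming the PRISM folder layout shown above — the temperature rasters in the next subsection follow the identical pattern with ppt replaced by tdmean):

#------------------- Loop version of the monthly extraction (sketch) -------------------#

for (ym in format(seq(as.Date("2017-01-01"), as.Date("2019-12-01"), by = "month"), "%Y%m")) {
  yr <- substr(ym, 1, 4)
  f  <- sprintf('./Data/Precipitation/PRISM_ppt_stable_4kmM3_%s_all_bil/PRISM_ppt_stable_4kmM3_%s_bil.bil',
                yr, ym)
  r  <- raster(f)
  ## mean precipitation of each fishnet cell, stored as e.g. Precipitation_201701
  fishnet[[paste0("Precipitation_", ym)]] <-
    exact_extract(r, fishnet %>% st_transform(crs(r)), 'mean')
}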
2.1.2.4 Temperature Data

The temperature data can also be obtained from PRISM; we downloaded the rasters for 2017-2020 at the same 4 km × 4 km resolution.

## Temperature Data

## 2017

Temperature_201701 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2017_all_bil/PRISM_tdmean_stable_4kmM3_201701_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201701, 
                                        fishnet %>% st_transform(crs(Temperature_201701)), 
                                        'mean')) %>%
  rename(Temperature_201701 = exact_extract.Temperature_201701..fishnet.....st_transform.crs.Temperature_201701....)

Temperature_201702 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2017_all_bil/PRISM_tdmean_stable_4kmM3_201702_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201702, 
                                        fishnet %>% st_transform(crs(Temperature_201702)), 
                                        'mean')) %>%
  rename(Temperature_201702 = exact_extract.Temperature_201702..fishnet.....st_transform.crs.Temperature_201702....)

Temperature_201703 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2017_all_bil/PRISM_tdmean_stable_4kmM3_201703_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201703, 
                                        fishnet %>% st_transform(crs(Temperature_201703)), 
                                        'mean')) %>%
  rename(Temperature_201703 = exact_extract.Temperature_201703..fishnet.....st_transform.crs.Temperature_201703....)

Temperature_201704 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2017_all_bil/PRISM_tdmean_stable_4kmM3_201704_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201704, 
                                        fishnet %>% st_transform(crs(Temperature_201704)), 
                                        'mean')) %>%
  rename(Temperature_201704 = exact_extract.Temperature_201704..fishnet.....st_transform.crs.Temperature_201704....)

Temperature_201705 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2017_all_bil/PRISM_tdmean_stable_4kmM3_201705_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201705, 
                                        fishnet %>% st_transform(crs(Temperature_201705)), 
                                        'mean')) %>%
  rename(Temperature_201705 = exact_extract.Temperature_201705..fishnet.....st_transform.crs.Temperature_201705....)

Temperature_201706 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2017_all_bil/PRISM_tdmean_stable_4kmM3_201706_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201706, 
                                        fishnet %>% st_transform(crs(Temperature_201706)), 
                                        'mean')) %>%
  rename(Temperature_201706 = exact_extract.Temperature_201706..fishnet.....st_transform.crs.Temperature_201706....)

Temperature_201707 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2017_all_bil/PRISM_tdmean_stable_4kmM3_201707_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201707, 
                                        fishnet %>% st_transform(crs(Temperature_201707)), 
                                        'mean')) %>%
  rename(Temperature_201707 = exact_extract.Temperature_201707..fishnet.....st_transform.crs.Temperature_201707....)

Temperature_201708 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2017_all_bil/PRISM_tdmean_stable_4kmM3_201708_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201708, 
                                        fishnet %>% st_transform(crs(Temperature_201708)), 
                                        'mean')) %>%
  rename(Temperature_201708 = exact_extract.Temperature_201708..fishnet.....st_transform.crs.Temperature_201708....)

Temperature_201709 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2017_all_bil/PRISM_tdmean_stable_4kmM3_201709_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201709, 
                                        fishnet %>% st_transform(crs(Temperature_201709)), 
                                        'mean')) %>%
  rename(Temperature_201709 = exact_extract.Temperature_201709..fishnet.....st_transform.crs.Temperature_201709....)

Temperature_201710 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2017_all_bil/PRISM_tdmean_stable_4kmM3_201710_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201710, 
                                        fishnet %>% st_transform(crs(Temperature_201710)), 
                                        'mean')) %>%
  rename(Temperature_201710 = exact_extract.Temperature_201710..fishnet.....st_transform.crs.Temperature_201710....)

Temperature_201711 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2017_all_bil/PRISM_tdmean_stable_4kmM3_201711_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201711, 
                                        fishnet %>% st_transform(crs(Temperature_201711)), 
                                        'mean')) %>%
  rename(Temperature_201711 = exact_extract.Temperature_201711..fishnet.....st_transform.crs.Temperature_201711....)

Temperature_201712 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2017_all_bil/PRISM_tdmean_stable_4kmM3_201712_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201712, 
                                        fishnet %>% st_transform(crs(Temperature_201712)), 
                                        'mean')) %>%
  rename(Temperature_201712 = exact_extract.Temperature_201712..fishnet.....st_transform.crs.Temperature_201712....)

## 2018

Temperature_201801 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2018_all_bil/PRISM_tdmean_stable_4kmM3_201801_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201801, 
                                        fishnet %>% st_transform(crs(Temperature_201801)), 
                                        'mean')) %>%
  rename(Temperature_201801 = exact_extract.Temperature_201801..fishnet.....st_transform.crs.Temperature_201801....)

Temperature_201802 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2018_all_bil/PRISM_tdmean_stable_4kmM3_201802_bil.bil')


fishnet <- cbind(fishnet, exact_extract(Temperature_201802, 
                                        fishnet %>% st_transform(crs(Temperature_201802)), 
                                        'mean')) %>%
  rename(Temperature_201802 = exact_extract.Temperature_201802..fishnet.....st_transform.crs.Temperature_201802....)

Temperature_201803 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2018_all_bil/PRISM_tdmean_stable_4kmM3_201803_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201803, 
                                        fishnet %>% st_transform(crs(Temperature_201803)), 
                                        'mean')) %>%
  rename(Temperature_201803 = exact_extract.Temperature_201803..fishnet.....st_transform.crs.Temperature_201803....)

Temperature_201804 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2018_all_bil/PRISM_tdmean_stable_4kmM3_201804_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201804, 
                                        fishnet %>% st_transform(crs(Temperature_201804)), 
                                        'mean')) %>%
  rename(Temperature_201804 = exact_extract.Temperature_201804..fishnet.....st_transform.crs.Temperature_201804....)

Temperature_201805 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2018_all_bil/PRISM_tdmean_stable_4kmM3_201805_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201805, 
                                        fishnet %>% st_transform(crs(Temperature_201805)), 
                                        'mean')) %>%
  rename(Temperature_201805 = exact_extract.Temperature_201805..fishnet.....st_transform.crs.Temperature_201805....)

Temperature_201806 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2018_all_bil/PRISM_tdmean_stable_4kmM3_201806_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201806, 
                                        fishnet %>% st_transform(crs(Temperature_201806)), 
                                        'mean')) %>%
  rename(Temperature_201806 = exact_extract.Temperature_201806..fishnet.....st_transform.crs.Temperature_201806....)

Temperature_201807 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2018_all_bil/PRISM_tdmean_stable_4kmM3_201807_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201807, 
                                        fishnet %>% st_transform(crs(Temperature_201807)), 
                                        'mean')) %>%
  rename(Temperature_201807 = exact_extract.Temperature_201807..fishnet.....st_transform.crs.Temperature_201807....)

Temperature_201808 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2018_all_bil/PRISM_tdmean_stable_4kmM3_201808_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201808, 
                                        fishnet %>% st_transform(crs(Temperature_201808)), 
                                        'mean')) %>%
  rename(Temperature_201808 = exact_extract.Temperature_201808..fishnet.....st_transform.crs.Temperature_201808....)

Temperature_201809 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2018_all_bil/PRISM_tdmean_stable_4kmM3_201809_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201809, 
                                        fishnet %>% st_transform(crs(Temperature_201809)), 
                                        'mean')) %>%
  rename(Temperature_201809 = exact_extract.Temperature_201809..fishnet.....st_transform.crs.Temperature_201809....)

Temperature_201810 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2018_all_bil/PRISM_tdmean_stable_4kmM3_201810_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201810, 
                                        fishnet %>% st_transform(crs(Temperature_201810)), 
                                        'mean')) %>%
  rename(Temperature_201810 = exact_extract.Temperature_201810..fishnet.....st_transform.crs.Temperature_201810....)

Temperature_201811 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2018_all_bil/PRISM_tdmean_stable_4kmM3_201811_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201811, 
                                        fishnet %>% st_transform(crs(Temperature_201811)), 
                                        'mean')) %>%
  rename(Temperature_201811 = exact_extract.Temperature_201811..fishnet.....st_transform.crs.Temperature_201811....)

Temperature_201812 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2018_all_bil/PRISM_tdmean_stable_4kmM3_201812_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201812, 
                                        fishnet %>% st_transform(crs(Temperature_201812)), 
                                        'mean')) %>%
  rename(Temperature_201812 = exact_extract.Temperature_201812..fishnet.....st_transform.crs.Temperature_201812....)

## 2019

Temperature_201901 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2019_all_bil/PRISM_tdmean_stable_4kmM3_201901_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201901, 
                                        fishnet %>% st_transform(crs(Temperature_201901)), 
                                        'mean')) %>%
  rename(Temperature_201901 = exact_extract.Temperature_201901..fishnet.....st_transform.crs.Temperature_201901....)

Temperature_201902 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2019_all_bil/PRISM_tdmean_stable_4kmM3_201902_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201902, 
                                        fishnet %>% st_transform(crs(Temperature_201902)), 
                                        'mean')) %>%
  rename(Temperature_201902 = exact_extract.Temperature_201902..fishnet.....st_transform.crs.Temperature_201902....)

Temperature_201903 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2019_all_bil/PRISM_tdmean_stable_4kmM3_201903_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201903, 
                                        fishnet %>% st_transform(crs(Temperature_201903)), 
                                        'mean')) %>%
  rename(Temperature_201903 = exact_extract.Temperature_201903..fishnet.....st_transform.crs.Temperature_201903....)

Temperature_201904 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2019_all_bil/PRISM_tdmean_stable_4kmM3_201904_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201904, 
                                        fishnet %>% st_transform(crs(Temperature_201904)), 
                                        'mean')) %>%
  rename(Temperature_201904 = exact_extract.Temperature_201904..fishnet.....st_transform.crs.Temperature_201904....)

Temperature_201905 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2019_all_bil/PRISM_tdmean_stable_4kmM3_201905_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201905, 
                                        fishnet %>% st_transform(crs(Temperature_201905)), 
                                        'mean')) %>%
  rename(Temperature_201905 = exact_extract.Temperature_201905..fishnet.....st_transform.crs.Temperature_201905....)

Temperature_201906 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2019_all_bil/PRISM_tdmean_stable_4kmM3_201906_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201906, 
                                        fishnet %>% st_transform(crs(Temperature_201906)), 
                                        'mean')) %>%
  rename(Temperature_201906 = exact_extract.Temperature_201906..fishnet.....st_transform.crs.Temperature_201906....)

Temperature_201907 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2019_all_bil/PRISM_tdmean_stable_4kmM3_201907_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201907, 
                                        fishnet %>% st_transform(crs(Temperature_201907)), 
                                        'mean')) %>%
  rename(Temperature_201907 = exact_extract.Temperature_201907..fishnet.....st_transform.crs.Temperature_201907....)

Temperature_201908 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2019_all_bil/PRISM_tdmean_stable_4kmM3_201908_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201908, 
                                        fishnet %>% st_transform(crs(Temperature_201908)), 
                                        'mean')) %>%
  rename(Temperature_201908 = exact_extract.Temperature_201908..fishnet.....st_transform.crs.Temperature_201908....)

Temperature_201909 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2019_all_bil/PRISM_tdmean_stable_4kmM3_201909_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201909, 
                                        fishnet %>% st_transform(crs(Temperature_201909)), 
                                        'mean')) %>%
  rename(Temperature_201909 = exact_extract.Temperature_201909..fishnet.....st_transform.crs.Temperature_201909....)

Temperature_201910 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2019_all_bil/PRISM_tdmean_stable_4kmM3_201910_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201910, 
                                        fishnet %>% st_transform(crs(Temperature_201910)), 
                                        'mean')) %>%
  rename(Temperature_201910 = exact_extract.Temperature_201910..fishnet.....st_transform.crs.Temperature_201910....)

Temperature_201911 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2019_all_bil/PRISM_tdmean_stable_4kmM3_201911_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201911, 
                                        fishnet %>% st_transform(crs(Temperature_201911)), 
                                        'mean')) %>%
  rename(Temperature_201911 = exact_extract.Temperature_201911..fishnet.....st_transform.crs.Temperature_201911....)

Temperature_201912 <- raster('./Data/Temperature/PRISM_tdmean_stable_4kmM3_2019_all_bil/PRISM_tdmean_stable_4kmM3_201912_bil.bil')

fishnet <- cbind(fishnet, exact_extract(Temperature_201912, 
                                        fishnet %>% st_transform(crs(Temperature_201912)), 
                                        'mean')) %>%
  rename(Temperature_201912 = exact_extract.Temperature_201912..fishnet.....st_transform.crs.Temperature_201912....)

rm(Temperature_201701, Temperature_201702, Temperature_201703, Temperature_201704, Temperature_201705, Temperature_201706,
   Temperature_201707, Temperature_201708, Temperature_201709, Temperature_201710, Temperature_201711, Temperature_201712,
   Temperature_201801, Temperature_201802, Temperature_201803, Temperature_201804, Temperature_201805, Temperature_201806,
   Temperature_201807, Temperature_201808, Temperature_201809, Temperature_201810, Temperature_201811, Temperature_201812,
   Temperature_201901, Temperature_201902, Temperature_201903, Temperature_201904, Temperature_201905, Temperature_201906,
   Temperature_201907, Temperature_201908, Temperature_201909, Temperature_201910, Temperature_201911, Temperature_201912)

fishnet <- fishnet %>% mutate(Temperature_2019 = (Temperature_201901+Temperature_201902+Temperature_201903
                                                  +Temperature_201904+Temperature_201905+Temperature_201906
                                                  +Temperature_201907+Temperature_201908+Temperature_201909
                                                  +Temperature_201910+Temperature_201911+Temperature_201912)/12)

fg2124 <- ggplot() +
  annotation_map_tile("cartolight", forcedownload = FALSE)+
  geom_sf(data = study_area, fill = greyPalette5[1], color = "transparent", alpha = 0.5) +
  geom_sf(data = fishnet, aes(fill = Temperature_2019), color = "transparent") +
    scale_fill_gradient(low = greyPalette5[1], high = bluePalette5[5]) +
  labs(title = "Mean Temperature in California",
       subtitle = "2019",
       caption = "Figure 2-1-2-4") +
  mapThememin2()

fg2123+fg2124

2.1.3 Vector Data

2.1.3.1 Facilities for Wildland Fire Protection

Wildfire-related facilities include fire stations, air attack and helitack bases, conservation camps, and support facilities; the data are available from FRAP.

#------------ Facilities for wildland fire protection ---------------#

Facilities <- st_read("https://opendata.arcgis.com/api/v3/datasets/8e72bb9b01954c83bf910cef4174bb3a_0/downloads/data?format=geojson&spatialRefId=4326&where=1%3D1") %>%
  st_drop_geometry() %>%
  dplyr::select(Y = LAT, X = LON) %>%
  filter(Y > 0, X < 0) %>%
  na.omit() %>%
  st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant") %>%  # WGS84 lon/lat
  st_transform('EPSG:3310') %>%
  mutate(Legend = "Facilities")

fg2131 <- ggplot() +
  annotation_map_tile("cartolight", forcedownload = FALSE)+
  geom_sf(data = study_area %>% st_transform('EPSG:3310'), fill = greyPalette5[1], color = "transparent", alpha = 0.5) +
  geom_sf(data = Facilities, fill = bluePalette5[4], color = bluePalette5[4], size = 0.5, alpha = 0.5) +
  labs(title = "Facilities for wildland fire proection",
       subtitle = "Total 1,259 facilities",
       caption = "Figure 2-1-3-1") +
  mapThememin2()+
  theme(plot.margin=unit(x=c(0,15,0,0),units="mm"))
2.1.3.2 Ground Water Stations

A groundwater station is a facility that draws water from underground. We assume groundwater may be related to soil moisture, so we included it in the data on the expectation that it could be correlated with the occurrence of wildfires.

#---------------------- Ground Water Stations------------------------#

GW_Station <- st_read("https://opendata.arcgis.com/api/v3/datasets/a47c05451dbe4a82a1b948a1d6e25852_0/downloads/data?format=geojson&spatialRefId=4326&where=1%3D1") %>%
  st_transform('EPSG:3310') %>%
  mutate(Legend = "GW_Station") %>%
  dplyr::select(Legend)

#--------------------- vector fishinetting --------------------------#

fishnet <- 
  rbind(Facilities, GW_Station) %>%
  st_join(., fishnet, join=st_within) %>%
  st_drop_geometry() %>%
  group_by(uniqueID, Legend) %>%
  summarize(count = n()) %>%
    full_join(fishnet) %>%
    spread(Legend, count, fill=0) %>%
    st_sf() %>%
    dplyr::select(-`<NA>`) %>%
    na.omit() %>%
    ungroup()

fishnet_ct <- st_centroid(fishnet)
fishnet_ct_c <- st_c(fishnet_ct)
GW_Station_c <- st_c(GW_Station)
Facilities_c <- st_c(Facilities) %>% na.omit()

fishnet <- fishnet %>%
    dplyr::mutate(GW_Station_nn3 = nn_function(fishnet_ct_c, GW_Station_c, k=3),
                  Facilities_nn3 = nn_function(fishnet_ct_c, Facilities_c, k=3))
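
st_c() and nn_function() are helper functions assumed to be defined in the setup code (they are not reproduced here); a minimal sketch of the conventional versions — coordinate extraction and the average distance to the k nearest points, using the FNN package — is:

#------------------- Assumed nearest-neighbor helpers (sketch) -------------------#

st_c <- function(x) st_coordinates(x)   # coordinate matrix of an sf object

nn_function <- function(measureFrom, measureTo, k) {
  ## distance from each origin point to its k nearest destination points
  nn_dist <- FNN::get.knnx(measureTo, measureFrom, k = k)$nn.dist
  rowMeans(nn_dist)                      # mean of the k distances, per origin point
}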

fg2132 <- ggplot() +
  annotation_map_tile("cartolight", forcedownload = FALSE)+
  geom_sf(data = study_area %>% st_transform('EPSG:3310'), fill = greyPalette5[1], color = "transparent", alpha = 0.5) +
  geom_sf(data = GW_Station, fill = bluePalette5[4], color = bluePalette5[4], size = 0.5, alpha = 0.5) +
  labs(title = "Ground Water Stations",
       subtitle = "Total 41,169 Stations",
       caption = "Figure 2-1-3-2") +
  mapThememin2()+
  theme(plot.margin=unit(x=c(0,0,0,15),units="mm"))

fg2131 + fg2132

2.2 Feature Engineering

2.2.1 Time Features

We examined the temporal features of wildfire events by month, day of the week, and year. As the following figures show, wildfires have an obvious seasonality. On the other hand, there appears to be no correlation with the day of the week. The most important finding is that annual wildfire counts have been increasing for over 100 years, since 1910.

#--------------------fires since 1910 2022----------------------------#

California_Wildfire <- California_Wildfire %>% mutate(interval60 = floor_date(ymd_hms(ALARM_DATE), unit = "hour"),
                                                      month = month(ymd_hms(ALARM_DATE)),
                                                      year = year(ymd_hms(ALARM_DATE)),
                                                      dotw = wday(ymd_hms(ALARM_DATE)),
                                                      time_of_day = case_when(hour(interval60) < 8 | hour(interval60) > 20 ~ "Overnight",
                                                                              hour(interval60) >= 8 & hour(interval60) <= 20 ~ "Daytime")) %>% filter(year >= 1910) %>% filter(year <= 2022)

#----------------------------month------------------------------------#

fg2311 <- ggplot(California_Wildfire, aes(x = month))+
  geom_bar(fill=bluePalette5[3], color = "white", size = 0.5)+
  labs(title = "Wildfire counts by month",
       subtitle = "",
       x = "month",
       y = "counts",
       caption = "Figure 2-2-1-1")+
  geom_hline(yintercept = 0, size = 0.25, color = greyPalette5[3]) +
  corTheme2()+
  theme(panel.background = element_blank(),
        panel.grid.major.y = element_line(linetype = "longdash", size = 0.25, color = greyPalette5[2]))

#------------------------------year-----------------------------------#

fg2312 <- ggplot(California_Wildfire, aes(x = year))+
  geom_bar(fill=bluePalette5[3], color = "white", size = 0.5)+
  labs(title = "Wildfire counts by year",
       subtitle = "",
       x = "year",
       y = "counts",
       caption = "Figure 2-2-1-2")+
  geom_hline(yintercept = 0, size = 0.25, color = greyPalette5[3]) +
  corTheme2()+
  theme(panel.background = element_blank(),
        panel.grid.major.y = element_line(linetype = "longdash", size = 0.25, color = greyPalette5[2]))

#-------------------------day of the week----------------------------#

fg2313 <- ggplot(California_Wildfire, aes(x = dotw))+
  geom_bar(fill=bluePalette5[3], color = "white", size = 0.5)+
  labs(title = "Wildfire counts by day of the week",
       subtitle = "",
       x = "day of the week",
       y = "counts",
       caption = "Figure 2-2-1-3")+
  geom_hline(yintercept = 0, size = 0.25, color = greyPalette5[3]) +
  corTheme2()+
  theme(panel.background = element_blank(),
        panel.grid.major.y = element_line(linetype = "longdash", size = 0.25, color = greyPalette5[2]))

grid.arrange(arrangeGrob(fg2311, fg2313, ncol=2), fg2312, nrow = 2)

2.2.2 Spatial Features

This section discusses how the burned attribute is incorporated into the fishnet grid. We define a grid cell as having the spatial feature “burned” if the total area of all overlapping fire polygons over the roughly 100-year record exceeds 25% of the area of a single fishnet cell. The plots below show the burned and not-burned cells for each month. From January to May, the number of cells that have experienced wildfires gradually increases; from June to September, wildfires have occurred in almost 80% of California’s area; and the count then decreases until December. The results again show that wildfire occurrence is highly seasonal.

#---------------------------wildfire----------------------------#

burned <- 
  st_intersection(fishnet, California_Wildfire)
  
burned <- burned %>%  
  mutate(burned_area_m2 = round(as.numeric(st_area(burned$geometry))))

burned_JAN <- burned %>% dplyr::filter(month == 1)
burned_FEB <- burned %>% dplyr::filter(month == 2)
burned_MAR <- burned %>% dplyr::filter(month == 3)
burned_APR <- burned %>% dplyr::filter(month == 4)
burned_MAY <- burned %>% dplyr::filter(month == 5)
burned_JUN <- burned %>% dplyr::filter(month == 6)
burned_JUL <- burned %>% dplyr::filter(month == 7)
burned_AUG <- burned %>% dplyr::filter(month == 8)
burned_SEP <- burned %>% dplyr::filter(month == 9)
burned_OCT <- burned %>% dplyr::filter(month == 10)
burned_NOV <- burned %>% dplyr::filter(month == 11)
burned_DEC <- burned %>% dplyr::filter(month == 12)

burned_ALL <- burned %>% group_by(uniqueID) %>% summarise(total_burned_area_m2 = sum(burned_area_m2)) %>%
  mutate(burned_ratio = total_burned_area_m2/10359952) %>%
  mutate(burned = case_when(burned_ratio < 0.25 ~ "not-burned",
                            burned_ratio >= 0.25 ~ "burned")) %>% 
  mutate(burned_numeric = case_when(burned == "not-burned" ~ 0,
                                    burned == "burned" ~ 1)) %>%
           dplyr::select(uniqueID, burned, burned_numeric) %>% st_drop_geometry()
         
burned_JAN <- burned_JAN %>% group_by(uniqueID) %>% summarise(total_burned_area_m2 = sum(burned_area_m2)) %>%
  mutate(burned_ratio = total_burned_area_m2/10359952) %>%
  mutate(burned_JAN = case_when(burned_ratio < 0.25 ~ "not-burned",
                            burned_ratio >= 0.25 ~ "burned")) %>% 
  mutate(burned_JAN_numeric = case_when(burned_JAN == "not-burned" ~ 0,
                                        burned_JAN == "burned" ~ 1)) %>%
  dplyr::select(uniqueID, burned_JAN, burned_JAN_numeric) %>% st_drop_geometry()

burned_FEB <- burned_FEB %>% group_by(uniqueID) %>% summarise(total_burned_area_m2 = sum(burned_area_m2)) %>%
  mutate(burned_ratio = total_burned_area_m2/10359952) %>%
  mutate(burned_FEB = case_when(burned_ratio < 0.25 ~ "not-burned",
                            burned_ratio >= 0.25 ~ "burned")) %>%
  mutate(burned_FEB_numeric = case_when(burned_FEB == "not-burned" ~ 0,
                                        burned_FEB == "burned" ~ 1)) %>%
  dplyr::select(burned_FEB, burned_FEB_numeric, uniqueID) %>% st_drop_geometry()

burned_MAR <- burned_MAR %>% group_by(uniqueID) %>% summarise(total_burned_area_m2 = sum(burned_area_m2)) %>%
  mutate(burned_ratio = total_burned_area_m2/10359952) %>%
  mutate(burned_MAR = case_when(burned_ratio < 0.25 ~ "not-burned",
                            burned_ratio >= 0.25 ~ "burned")) %>%
  mutate(burned_MAR_numeric = case_when(burned_MAR == "not-burned" ~ 0,
                                        burned_MAR == "burned" ~ 1)) %>%
  dplyr::select(burned_MAR, burned_MAR_numeric, uniqueID) %>% st_drop_geometry()


burned_APR <- burned_APR %>% group_by(uniqueID) %>% summarise(total_burned_area_m2 = sum(burned_area_m2)) %>%
  mutate(burned_ratio = total_burned_area_m2/10359952) %>%
  mutate(burned_APR = case_when(burned_ratio < 0.25 ~ "not-burned",
                            burned_ratio >= 0.25 ~ "burned")) %>%
  mutate(burned_APR_numeric = case_when(burned_APR == "not-burned" ~ 0,
                                        burned_APR == "burned" ~ 1)) %>%
  dplyr::select(burned_APR, burned_APR_numeric, uniqueID) %>% st_drop_geometry()


burned_MAY <- burned_MAY %>% group_by(uniqueID) %>% summarise(total_burned_area_m2 = sum(burned_area_m2)) %>%
  mutate(burned_ratio = total_burned_area_m2/10359952) %>%
  mutate(burned_MAY = case_when(burned_ratio < 0.25 ~ "not-burned",
                            burned_ratio >= 0.25 ~ "burned")) %>% 
  mutate(burned_MAY_numeric = case_when(burned_MAY == "not-burned" ~ 0,
                                        burned_MAY == "burned" ~ 1)) %>%
  dplyr::select(burned_MAY, burned_MAY_numeric, uniqueID) %>% st_drop_geometry()

burned_JUN <- burned_JUN %>% group_by(uniqueID) %>% summarise(total_burned_area_m2 = sum(burned_area_m2)) %>%
  mutate(burned_ratio = total_burned_area_m2/10359952) %>%
  mutate(burned_JUN = case_when(burned_ratio < 0.25 ~ "not-burned",
                            burned_ratio >= 0.25 ~ "burned")) %>%  
  mutate(burned_JUN_numeric = case_when(burned_JUN == "not-burned" ~ 0,
                                        burned_JUN == "burned" ~ 1)) %>%
  dplyr::select(burned_JUN, burned_JUN_numeric, uniqueID) %>% st_drop_geometry()

burned_JUL <- burned_JUL %>% group_by(uniqueID) %>% summarise(total_burned_area_m2 = sum(burned_area_m2)) %>%
  mutate(burned_ratio = total_burned_area_m2/10359952) %>%
  mutate(burned_JUL = case_when(burned_ratio < 0.25 ~ "not-burned",
                            burned_ratio >= 0.25 ~ "burned")) %>%   
  mutate(burned_JUL_numeric = case_when(burned_JUL == "not-burned" ~ 0,
                                        burned_JUL == "burned" ~ 1)) %>%
  dplyr::select(burned_JUL, burned_JUL_numeric, uniqueID) %>% st_drop_geometry()

burned_AUG <- burned_AUG %>% group_by(uniqueID) %>% summarise(total_burned_area_m2 = sum(burned_area_m2)) %>%
  mutate(burned_ratio = total_burned_area_m2/10359952) %>%
  mutate(burned_AUG = case_when(burned_ratio < 0.25 ~ "not-burned",
                            burned_ratio >= 0.25 ~ "burned")) %>%   
  mutate(burned_AUG_numeric = case_when(burned_AUG == "not-burned" ~ 0,
                                        burned_AUG == "burned" ~ 1)) %>%
  dplyr::select(burned_AUG, burned_AUG_numeric, uniqueID) %>% st_drop_geometry()

burned_SEP <- burned_SEP %>% group_by(uniqueID) %>% summarise(total_burned_area_m2 = sum(burned_area_m2)) %>%
  mutate(burned_ratio = total_burned_area_m2/10359952) %>%
  mutate(burned_SEP = case_when(burned_ratio < 0.25 ~ "not-burned",
                            burned_ratio >= 0.25 ~ "burned")) %>%    
  mutate(burned_SEP_numeric = case_when(burned_SEP == "not-burned" ~ 0,
                                        burned_SEP == "burned" ~ 1)) %>%
  dplyr::select(burned_SEP, burned_SEP_numeric, uniqueID) %>% st_drop_geometry()

burned_OCT <- burned_OCT %>% group_by(uniqueID) %>% summarise(total_burned_area_m2 = sum(burned_area_m2)) %>%
  mutate(burned_ratio = total_burned_area_m2/10359952) %>%
  mutate(burned_OCT = case_when(burned_ratio < 0.25 ~ "not-burned",
                            burned_ratio >= 0.25 ~ "burned")) %>%    
  mutate(burned_OCT_numeric = case_when(burned_OCT == "not-burned" ~ 0,
                                        burned_OCT == "burned" ~ 1)) %>%
  dplyr::select(burned_OCT, burned_OCT_numeric, uniqueID) %>% st_drop_geometry()

burned_NOV <- burned_NOV %>% group_by(uniqueID) %>% summarise(total_burned_area_m2 = sum(burned_area_m2)) %>%
  mutate(burned_ratio = total_burned_area_m2/10359952) %>%
  mutate(burned_NOV = case_when(burned_ratio < 0.25 ~ "not-burned",
                            burned_ratio >= 0.25 ~ "burned")) %>%    
  mutate(burned_NOV_numeric = case_when(burned_NOV == "not-burned" ~ 0,
                                        burned_NOV == "burned" ~ 1)) %>%
  dplyr::select(burned_NOV, burned_NOV_numeric, uniqueID) %>% st_drop_geometry()

burned_DEC <- burned_DEC %>% group_by(uniqueID) %>% summarise(total_burned_area_m2 = sum(burned_area_m2)) %>%
  mutate(burned_ratio = total_burned_area_m2/10359952) %>%
  mutate(burned_DEC = case_when(burned_ratio < 0.25 ~ "not-burned",
                            burned_ratio >= 0.25 ~ "burned")) %>%     
  mutate(burned_DEC_numeric = case_when(burned_DEC == "not-burned" ~ 0,
                                        burned_DEC == "burned" ~ 1)) %>%
  dplyr::select(burned_DEC, burned_DEC_numeric, uniqueID) %>% st_drop_geometry()

burned <- purrr::reduce(list(burned_ALL, burned_JAN, burned_FEB, burned_MAR, burned_APR, burned_MAY, burned_JUN, burned_JUL, burned_AUG, burned_SEP, burned_OCT, burned_NOV, burned_DEC), dplyr::left_join, by='uniqueID')


fishnet <- left_join(fishnet, burned, by='uniqueID')

fishnet_monthly_grid <- fishnet %>% dplyr::select(uniqueID, burned_JAN, burned_FEB, burned_MAR, burned_APR, burned_MAY, burned_JUN, burned_JUL, burned_AUG, burned_SEP, burned_OCT, burned_NOV, burned_DEC) %>% replace(is.na(.), "not-burned") %>% st_drop_geometry() %>% gather(Variable, Value, -uniqueID)

fishnet_monthly_grid <-  left_join(fishnet %>% dplyr::select(uniqueID), fishnet_monthly_grid, by = "uniqueID")

ggplot() +
  geom_sf(data = study_area, fill = greyPalette5[1], color = "transparent", alpha = 0.5) +
  geom_sf(data = fishnet_monthly_grid %>%
            mutate(across(Variable, factor, levels=c("burned_JAN", "burned_FEB", "burned_MAR", "burned_APR",
                                                     "burned_MAY", "burned_JUN", "burned_JUL", "burned_AUG",
                                                     "burned_SEP", "burned_OCT", "burned_NOV", "burned_DEC"))), aes(fill = Value), color = "transparent") +
  facet_wrap(~Variable) +
  scale_fill_manual(values = c("not-burned" = bluePalette5[1], "burned" = bluePalette5[4])) +
  labs(title = "California Burned Area by month",
       subtitle = "Burned Area \n where sum of wildfire area since 1910 is over 25% of each grid",
       caption = "Figure 2-2-2") +
  mapThememin2()

   

2.2.3 Weather features

2.2.3.1 Precipitation

The figures below show the average precipitation in California by month. The months with the least precipitation are June through September, which coincide with the high-risk wildfire season. There is a clear negative correlation between precipitation and wildfire occurrence.

fishnet <- fishnet %>% mutate(Precipitation = (Precipitation_201701+Precipitation_201801+Precipitation_201901
                                               +Precipitation_201702+Precipitation_201802+Precipitation_201902
                                               +Precipitation_201703+Precipitation_201803+Precipitation_201903
                                               +Precipitation_201704+Precipitation_201804+Precipitation_201904
                                               +Precipitation_201705+Precipitation_201805+Precipitation_201905
                                               +Precipitation_201706+Precipitation_201806+Precipitation_201906
                                               +Precipitation_201707+Precipitation_201807+Precipitation_201907
                                               +Precipitation_201708+Precipitation_201808+Precipitation_201908
                                               +Precipitation_201709+Precipitation_201809+Precipitation_201909
                                               +Precipitation_201710+Precipitation_201810+Precipitation_201910
                                               +Precipitation_201711+Precipitation_201811+Precipitation_201911
                                               +Precipitation_201712+Precipitation_201812+Precipitation_201912)/36,
                              Precipitation_JAN = (Precipitation_201701+Precipitation_201801+Precipitation_201901)/3,
                              Precipitation_FEB = (Precipitation_201702+Precipitation_201802+Precipitation_201902)/3,
                              Precipitation_MAR = (Precipitation_201703+Precipitation_201803+Precipitation_201903)/3,
                              Precipitation_APR = (Precipitation_201704+Precipitation_201804+Precipitation_201904)/3,
                              Precipitation_MAY = (Precipitation_201705+Precipitation_201805+Precipitation_201905)/3,
                              Precipitation_JUN = (Precipitation_201706+Precipitation_201806+Precipitation_201906)/3,
                              Precipitation_JUL = (Precipitation_201707+Precipitation_201807+Precipitation_201907)/3,
                              Precipitation_AUG = (Precipitation_201708+Precipitation_201808+Precipitation_201908)/3,
                              Precipitation_SEP = (Precipitation_201709+Precipitation_201809+Precipitation_201909)/3,
                              Precipitation_OCT = (Precipitation_201710+Precipitation_201810+Precipitation_201910)/3,
                              Precipitation_NOV = (Precipitation_201711+Precipitation_201811+Precipitation_201911)/3,
                              Precipitation_DEC = (Precipitation_201712+Precipitation_201812+Precipitation_201912)/3) %>%
  dplyr::select(-Precipitation_201701, -Precipitation_201702, -Precipitation_201703, -Precipitation_201704, -Precipitation_201705, -Precipitation_201706,
                -Precipitation_201707, -Precipitation_201708, -Precipitation_201709, -Precipitation_201710, -Precipitation_201711, -Precipitation_201712,
                -Precipitation_201801, -Precipitation_201802, -Precipitation_201803, -Precipitation_201804, -Precipitation_201805, -Precipitation_201806,
                -Precipitation_201807, -Precipitation_201808, -Precipitation_201809, -Precipitation_201810, -Precipitation_201811, -Precipitation_201812,
                -Precipitation_201901, -Precipitation_201902, -Precipitation_201903, -Precipitation_201904, -Precipitation_201905, -Precipitation_201906,
                -Precipitation_201907, -Precipitation_201908, -Precipitation_201909, -Precipitation_201910, -Precipitation_201911, -Precipitation_201912)

fishnet_Precipitation_grid <- fishnet %>% dplyr::select(uniqueID, Precipitation_JAN, Precipitation_FEB, Precipitation_MAR, Precipitation_APR, Precipitation_MAY, Precipitation_JUN, Precipitation_JUL, Precipitation_AUG, Precipitation_SEP, Precipitation_OCT, Precipitation_NOV, Precipitation_DEC) %>% st_drop_geometry() %>% gather(Variable, Value, -uniqueID)

fishnet_Precipitation_grid <-  left_join(fishnet %>% dplyr::select(uniqueID), fishnet_Precipitation_grid, by = "uniqueID")

ggplot() +
  geom_sf(data = study_area, fill = greyPalette5[1], color = "transparent", alpha = 0.5) +
  geom_sf(data = fishnet_Precipitation_grid %>% 
            mutate(across(Variable, factor, levels=c("Precipitation_JAN", "Precipitation_FEB", "Precipitation_MAR", "Precipitation_APR",
                                                     "Precipitation_MAY", "Precipitation_JUN", "Precipitation_JUL", "Precipitation_AUG",
                                                     "Precipitation_SEP", "Precipitation_OCT", "Precipitation_NOV", "Precipitation_DEC"))), aes(fill = q5(Value)), color = "transparent") +
  facet_wrap(~Variable) +
  scale_fill_manual(values = bluePalette5,
                    labels = round(as.numeric(qBr(fishnet_Precipitation_grid, "Value")),2),
                    name = "Precipitation(㎜\nquantiles)",
                    na.translate=FALSE)+
  labs(title = "Average Precipitation in California by month",
       subtitle = "2017 - 2019",
       caption = "Figure 2-2-3-1") +
  mapThememin2()

   

2.2.3.2 Temperature

Temperature shows the opposite pattern to precipitation: the higher the temperature, the higher the probability of wildfire occurrence, especially in June, July, and August.

fishnet <- fishnet %>% mutate(Temperature = (Temperature_201701+Temperature_201801+Temperature_201901
                                               +Temperature_201702+Temperature_201802+Temperature_201902
                                               +Temperature_201703+Temperature_201803+Temperature_201903
                                               +Temperature_201704+Temperature_201804+Temperature_201904
                                               +Temperature_201705+Temperature_201805+Temperature_201905
                                               +Temperature_201706+Temperature_201806+Temperature_201906
                                               +Temperature_201707+Temperature_201807+Temperature_201907
                                               +Temperature_201708+Temperature_201808+Temperature_201908
                                               +Temperature_201709+Temperature_201809+Temperature_201909
                                               +Temperature_201710+Temperature_201810+Temperature_201910
                                               +Temperature_201711+Temperature_201811+Temperature_201911
                                               +Temperature_201712+Temperature_201812+Temperature_201912)/36,
                              Temperature_JAN = (Temperature_201701+Temperature_201801+Temperature_201901)/3,
                              Temperature_FEB = (Temperature_201702+Temperature_201802+Temperature_201902)/3,
                              Temperature_MAR = (Temperature_201703+Temperature_201803+Temperature_201903)/3,
                              Temperature_APR = (Temperature_201704+Temperature_201804+Temperature_201904)/3,
                              Temperature_MAY = (Temperature_201705+Temperature_201805+Temperature_201905)/3,
                              Temperature_JUN = (Temperature_201706+Temperature_201806+Temperature_201906)/3,
                              Temperature_JUL = (Temperature_201707+Temperature_201807+Temperature_201907)/3,
                              Temperature_AUG = (Temperature_201708+Temperature_201808+Temperature_201908)/3,
                              Temperature_SEP = (Temperature_201709+Temperature_201809+Temperature_201909)/3,
                              Temperature_OCT = (Temperature_201710+Temperature_201810+Temperature_201910)/3,
                              Temperature_NOV = (Temperature_201711+Temperature_201811+Temperature_201911)/3,
                              Temperature_DEC = (Temperature_201712+Temperature_201812+Temperature_201912)/3) %>%
  dplyr::select(-Temperature_201701, -Temperature_201702, -Temperature_201703, -Temperature_201704, -Temperature_201705, -Temperature_201706,
                -Temperature_201707, -Temperature_201708, -Temperature_201709, -Temperature_201710, -Temperature_201711, -Temperature_201712,
                -Temperature_201801, -Temperature_201802, -Temperature_201803, -Temperature_201804, -Temperature_201805, -Temperature_201806,
                -Temperature_201807, -Temperature_201808, -Temperature_201809, -Temperature_201810, -Temperature_201811, -Temperature_201812,
                -Temperature_201901, -Temperature_201902, -Temperature_201903, -Temperature_201904, -Temperature_201905, -Temperature_201906,
                -Temperature_201907, -Temperature_201908, -Temperature_201909, -Temperature_201910, -Temperature_201911, -Temperature_201912)

fishnet_Temperature_grid <- fishnet %>% dplyr::select(uniqueID, Temperature_JAN, Temperature_FEB, Temperature_MAR, Temperature_APR, Temperature_MAY, Temperature_JUN, Temperature_JUL, Temperature_AUG, Temperature_SEP, Temperature_OCT, Temperature_NOV, Temperature_DEC) %>% st_drop_geometry() %>% gather(Variable, Value, -uniqueID)

fishnet_Temperature_grid <-  left_join(fishnet %>% dplyr::select(uniqueID), fishnet_Temperature_grid, by = "uniqueID")

ggplot() +
  geom_sf(data = study_area, fill = greyPalette5[1], color = "transparent", alpha = 0.5) +
  geom_sf(data = fishnet_Temperature_grid %>% 
            mutate(across(Variable, factor, levels=c("Temperature_JAN", "Temperature_FEB", "Temperature_MAR", "Temperature_APR",
                                                     "Temperature_MAY", "Temperature_JUN", "Temperature_JUL", "Temperature_AUG",
                                                     "Temperature_SEP", "Temperature_OCT", "Temperature_NOV", "Temperature_DEC"))), aes(fill = q5(Value)), color = "transparent") +
  facet_wrap(~Variable) +
  scale_fill_manual(values = bluePalette5,
                    labels = round(as.numeric(qBr(fishnet_Temperature_grid, "Value")),2),
                    name = "Temperature(℃\nquantiles)",
                    na.translate=FALSE)+
  labs(title = "Average Temperature in California by month",
       subtitle = "2017 - 2019",
       caption = "Figure 2-2-3-2") +
  mapThememin2()

       

Ⅲ. Exploratory Analysis

   

3.1 Time Analysis

We first counted the total burned area in each grid cell for each month since 1910, then defined a cell as burned when its total burned area exceeds 25% of the cell's area. The spatial distribution map and bar plots show that May through October are the months with the highest incidence of wildfires; in July and August in particular, the number of burned cells is almost equal to the number of not-burned cells.

fishnet$burned_numeric[is.na(fishnet$burned_numeric)] <- 0
fishnet$burned_JAN_numeric[is.na(fishnet$burned_JAN_numeric)] <- 0
fishnet$burned_FEB_numeric[is.na(fishnet$burned_FEB_numeric)] <- 0
fishnet$burned_MAR_numeric[is.na(fishnet$burned_MAR_numeric)] <- 0
fishnet$burned_APR_numeric[is.na(fishnet$burned_APR_numeric)] <- 0
fishnet$burned_MAY_numeric[is.na(fishnet$burned_MAY_numeric)] <- 0
fishnet$burned_JUN_numeric[is.na(fishnet$burned_JUN_numeric)] <- 0
fishnet$burned_JUL_numeric[is.na(fishnet$burned_JUL_numeric)] <- 0
fishnet$burned_AUG_numeric[is.na(fishnet$burned_AUG_numeric)] <- 0
fishnet$burned_SEP_numeric[is.na(fishnet$burned_SEP_numeric)] <- 0
fishnet$burned_OCT_numeric[is.na(fishnet$burned_OCT_numeric)] <- 0
fishnet$burned_NOV_numeric[is.na(fishnet$burned_NOV_numeric)] <- 0
fishnet$burned_DEC_numeric[is.na(fishnet$burned_DEC_numeric)] <- 0

fishnet <- fishnet %>% mutate(burned = as.character(burned_numeric)) %>%
  mutate(burned = case_when(burned == "0" ~ "not_burned",
                            burned == "1" ~ "burned"))

fishnet_monthly <- fishnet %>% dplyr::select(burned_JAN, burned_FEB, burned_MAR, burned_APR, burned_MAY, burned_JUN, burned_JUL, burned_AUG, burned_SEP, burned_OCT, burned_NOV, burned_DEC) %>% 
  st_drop_geometry() %>% gather(Variable, Value) %>% replace(is.na(.), "not-burned")

ggplot() +
  geom_sf(data = study_area, fill = greyPalette5[1], color = "transparent", alpha = 0.5) +
  geom_sf(data = fishnet_monthly_grid %>%
            mutate(across(Variable, factor, levels=c("burned_JAN", "burned_FEB", "burned_MAR", "burned_APR",
                                                     "burned_MAY", "burned_JUN", "burned_JUL", "burned_AUG",
                                                     "burned_SEP", "burned_OCT", "burned_NOV", "burned_DEC"))), aes(fill = Value), color = "transparent") +
  facet_wrap(~Variable) +
  scale_fill_manual(values = c("not-burned" = bluePalette5[1], "burned" = bluePalette5[4])) +
  labs(title = "California Burned Area by month",
       subtitle = "Burned Area \n where sum of wildfire area since 1910 is over 25% of each grid",
       caption = "Figure 3-1-1") +
  mapThememin2()

fg31 <- ggplot(fishnet_monthly %>%
            mutate(across(Variable, factor, levels=c("burned_JAN", "burned_FEB", "burned_MAR", "burned_APR",
                                                     "burned_MAY", "burned_JUN", "burned_JUL", "burned_AUG",
                                                     "burned_SEP", "burned_OCT", "burned_NOV", "burned_DEC"))), aes(fill = Value)) +
  geom_bar(mapping = aes(x = Value), width = 0.2)+
  facet_wrap(~Variable, scales = "free") +
  scale_fill_manual(values = c(bluePalette5[4],bluePalette5[2]),
                    name = "Wildfires")+
  labs(title = "Monthly wildfire Distribution",
       subtitle = "counts by fishnet",
       caption = "Figure 3-1-2",
       x = "Wildfires")+
  geom_hline(yintercept = 0, size = 0.25, color = greyPalette5[3]) +
  corTheme2()+
  theme(panel.background = element_blank(),
        panel.grid.major.y = element_line(linetype = "longdash", size = 0.25, color = greyPalette5[2]),
        legend.key.size = unit(0.25, "cm"))

fg31

       

3.2 Continuous features Analysis

The plots below show the distributions of the continuous features for burned and not-burned cells. Wildfire density peaks at an elevation of roughly 250 feet and gradually declines as elevation increases. The curves for distance to facilities and distance to ground water stations show no notable difference between burned and not-burned cells. For precipitation, the density of burned cells peaks at a lower value than that of not-burned cells, meaning wildfires are more likely to occur in places with less precipitation.

fishnet_cf <- fishnet %>% dplyr::select(burned, elevation, GW_Station_nn3, Facilities_nn3, Precipitation) %>% 
  st_drop_geometry() %>% gather(Variable, Value, -burned) 

ggplot(fishnet_cf) + 
  geom_density(aes(Value, linetype = burned, color = burned), fill = "transparent", size = 0.8) + 
  facet_wrap(~Variable, scales = "free") +
  scale_color_manual(values = c(bluePalette5[3],bluePalette5[4])) +
  labs(title = "Feature Distribution of wildfires",
       subtitle = "continous features",
       caption = "Figure 3-2-1")+
  geom_hline(yintercept = 0, size = 0.25, color = greyPalette5[3]) +
  corTheme2()+
  theme(panel.background = element_blank(),
        panel.grid.major.y = element_line(linetype = "longdash", size = 0.25, color = greyPalette5[2]),
        legend.key.size = unit(0.25, "cm"))+
  guides(linetype=guide_legend(title = "Wildfires"),
         color=guide_legend(title = "Wildfires"))

       

3.3 Landcover feature Analysis

For land cover, we counted the number of burned and not-burned grid cells within each land cover category. In evergreen and herbaceous areas, the probability of wildfire is clearly higher, and the land cover type with the largest number of burned cells is shrub, at nearly 800 cells.

fishnet_landcover <- fishnet %>% dplyr::select(burned, landcover) %>% 
  st_drop_geometry()

ggplot(fishnet_landcover, aes(fill = burned)) +   
    geom_bar(mapping = aes(x = burned), width = 0.2) +
    facet_wrap(~landcover, scales="free") +
    scale_fill_manual(values = c(bluePalette5[4], bluePalette5[2])) +
    labs(x="Burned", y="Count",
         title = "Landcover distribution of wildfires",
         subtitle = "",
         caption = "figure 3-3") +
  geom_hline(yintercept = 0, size = 0.25, color = greyPalette5[3]) +
  corTheme2()+
  theme(panel.background = element_blank(),
        panel.grid.major.y = element_line(linetype = "longdash", size = 0.25, color = greyPalette5[2]),
        axis.text.x = element_text(size = 7),
    legend.key.size = unit(0.25, "cm"))

               

Ⅳ. Risks Estimation Model

4.1 Build Fire risk Prediction Model

We use a logit model, a binomial generalized linear model, to predict the risk of wildfires; the binary outcome represents whether an area will burn in the next year. We first randomly split the fishnet cells into a training set and a test set at a ratio of 7:3.

The following summary shows the performance of the model. McFadden's R-squared indicates how much of the variation is explained by the model; here the value is about 0.33. Daniel McFadden, the creator of the metric, described values between 0.2 and 0.4 as an excellent fit, so our model performs well by this standard.

set.seed(3456)

trainIndex <- createDataPartition(y = paste(fishnet$burned, fishnet$landcover), 
                                  p = .70,
                                  list = FALSE,
                                  times = 1)
burnTrain <- fishnet[ trainIndex,]
burnTest <- fishnet[-trainIndex,]


reg_ALL <- glm(burned_numeric ~ .,
            data = as.data.frame(burnTrain) %>% 
            dplyr::select(burned_numeric, landcover, elevation, Facilities_nn3, GW_Station_nn3, Precipitation, Temperature),
            family="binomial"(link="logit"))

pR2(reg_ALL)
## fitting null model for pseudo-r2
##           llh       llhNull            G2      McFadden          r2ML 
## -1166.1211134 -1747.6455150  1163.0488032     0.3327473     0.3655691 
##          r2CU 
##     0.4905301
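
As a quick sanity check, the McFadden value printed above can be reproduced directly from the two reported log-likelihoods, as one minus the ratio of the fitted model's log-likelihood to the null model's:

1 - (-1166.1211134 / -1747.6455150)   # ~0.333, matching the McFadden entry above
# Equivalently, from the fitted objects:
# 1 - as.numeric(logLik(reg_ALL)) / as.numeric(logLik(update(reg_ALL, . ~ 1)))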

       

4.2 Model Prediction

The output of a logistic model is a probability, so this chapter deals with determining the probability cutoff above which a cell should be classified as "burned." The chart below shows the distribution of predicted probabilities for cells that did not burn and cells that did. The model separates the two classes reasonably well, with "not burned" probabilities concentrated toward the left and "burned" probabilities toward the right. The threshold itself, however, depends on how the model is evaluated under a given scenario.
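
For illustration only, a single arbitrary cutoff turns these probabilities into binary calls; the sketch below uses 0.5, not the threshold ultimately chosen in Section V:

# Illustrative cutoff of 0.5; the threshold actually used is selected in Section V.
head(ifelse(predict(reg_ALL, burnTest, type = "response") > 0.5, "burn", "not burn"))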

testProbs = data.frame(Outcome = as.factor(burnTest$burned_numeric),
                       Probs = predict(reg_ALL, burnTest, type= "response"))

ggplot(testProbs, aes(x = Probs, fill = as.factor(Outcome))) + 
  geom_density() +
  facet_grid(Outcome ~ .) +
  scale_fill_manual(values = c(bluePalette5[2], bluePalette5[4])) + xlim(0, 1) +
  geom_hline(yintercept = 0, size = 0.25, color = greyPalette5[3]) +
  geom_vline(xintercept = 0.5, size = 0.4, color = bluePalette5[3], linetype = "longdash") +
  labs(x = "Burned", y = "Density",
       title = "Regression Model",
       subtitle = "",
       caption = "Figure 4-2")  +
  corTheme2()+
  theme(panel.background = element_blank(),
        panel.grid.major.y = element_line(linetype = "dashed", size = 0.25, color = greyPalette5[2]),
        legend.position = "none")

       

Ⅴ. Model validation

Model evaluation generally covers accuracy and generalizability. We evaluated the accuracy of the model with the most common tools for logistic models, the ROC curve and the AUC, and verified its generalizability through a 100-fold cross-validation test.

   

5.1 Goodness of fit

Based on the distribution of predicted probabilities on the test set, we set the outcome threshold to 60%. We use the confusion matrix to present the number and proportion of true positives, true negatives, false positives, and false negatives, and it provides an overall accuracy; in this case the value is 0.82, which is relatively high.

In addition, the confusion matrix provides the sensitivity and specificity of the predictions, which are the true positive rate and true negative rate. A high sensitivity is our priority, since accurately predicting where wildfires will occur matters most to us. In this case, sensitivity is about 0.86 and specificity about 0.77, both relatively high.

testProbs <- testProbs %>%
  mutate(predOutcome  = as.character(ifelse(testProbs$Probs > 0.6 , 1, 0)),
         predOutcome_ft  = as.factor(ifelse(testProbs$Probs > 0.6 , 1, 0)))

cf_mtrix <- caret::confusionMatrix(testProbs$predOutcome_ft, testProbs$Outcome, positive = "1")

cf_mtrix

cf_mtrix_table <- tidy(cf_mtrix$table)

eval <- evaluate(
  data = testProbs,
  target_col = "Outcome",
  prediction_cols = "predOutcome",
  type = "binomial"
)

plot_confusion_matrix(eval[["Confusion Matrix"]][[1]])
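
As a cross-check, the accuracy, sensitivity, and specificity quoted above can be recovered directly from the four counts in the confusion matrix. The lines below are a minimal sketch, not part of the original pipeline, and assume the testProbs object built above:

tab <- table(predicted = testProbs$predOutcome_ft, observed = testProbs$Outcome)
accuracy    <- sum(diag(tab)) / sum(tab)         # (TP + TN) / total
sensitivity <- tab["1", "1"] / sum(tab[, "1"])   # TP / (TP + FN)
specificity <- tab["0", "0"] / sum(tab[, "0"])   # TN / (TN + FP)
round(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity), 2)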

       

5.2 ROC(AUC)

The receiver operating characteristic (ROC) curve is another tool for visualizing the trade-off between model outcomes as the threshold changes: as the number of true positives rises, the number of false positives rises as well. We place more emphasis on increasing the true positive rate than on decreasing the false positive rate. The "coin flip" line is the diagonal in the figure below; any classifier whose ROC curve lies along that diagonal is no better than a coin flip, because its true positive rate equals its false positive rate and its area under the curve (AUC) is 0.5. Here our AUC is 0.85, which indicates a high goodness of fit.
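
To make this trade-off concrete, the sketch below (illustrative only, using the testProbs object from above) sweeps a range of cutoffs and records the true and false positive rates that the ROC curve plots against each other:

obs <- as.numeric(as.character(testProbs$Outcome))
roc_points <- t(sapply(seq(0.05, 0.95, by = 0.05), function(th) {
  pred <- as.numeric(testProbs$Probs > th)
  c(threshold = th,
    tpr = sum(pred == 1 & obs == 1) / sum(obs == 1),   # true positive rate
    fpr = sum(pred == 1 & obs == 0) / sum(obs == 0))   # false positive rate
}))
head(roc_points)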

auc <- round(pROC::auc(testProbs$Outcome, testProbs$Probs),3)

ggplot(testProbs, aes(d = as.numeric(testProbs$Outcome), m = Probs)) +
  geom_roc(n.cuts = 100, labels = FALSE, colour = bluePalette5[3], pointsize = 0.4, size = 0.6) +
  style_roc(theme = theme_grey) +
  geom_abline(slope = 1, intercept = 0, size = 0.4, color = greyPalette5[3]) +
  labs(title = "ROC curve",
       subtitle = auc,
       caption = "Figure 5-2",
       y = "True Positive Fraction") +
  corTheme2()+
  theme(panel.background = element_blank(),
        panel.grid.major.y = element_line(linetype = "dashed", size = 0.25, color = greyPalette5[2]),
        panel.grid.major.x = element_line(linetype = "dashed", size = 0.25, color = greyPalette5[2]),
        legend.position = "none")

   

5.3 Cross Validation

To test our model's generalizability, we perform k-fold cross-validation with k = 100 and gather the ROC (AUC), sensitivity, and specificity results for each fold. The results are shown in the plots below: the average area under the ROC curve is consistent with the test-set result, while the average sensitivity is substantially higher and the average specificity slightly lower.

From the distribution of sensitivity, most values are concentrated around the mean, with only a few folds near 0.75 or 0.5, which indicates the model is useful and generalizes well across randomly drawn folds.

ctrl <- trainControl(method = "cv", number = 100, classProbs=TRUE, summaryFunction=twoClassSummary)

cv <- train(burned ~ .,
              data=fishnet %>% st_drop_geometry() %>%
                dplyr::select(burned, landcover, elevation, Facilities_nn3, GW_Station_nn3, Precipitation, Temperature), 
                method="glm", family="binomial", metric="ROC", trControl = ctrl)

cv_gfm <- round(cv$results[2:4],4)

cv_resample <- cv$resample %>%
  dplyr::select(-Resample) %>%
  gather(metric, value) %>% # column to row with value (100obj > 300obj)
  left_join(gather(cv$results[2:4], metric, mean))#metric is key to left join


dplyr::select(cv$resample, -Resample) %>%
  gather(metric, value) %>%
  left_join(gather(cv$results[2:4], metric, mean)) %>%
  ggplot(aes(value)) + 
    geom_histogram(bins=35, fill = bluePalette5[3]) +
    facet_wrap(~metric) +
    geom_hline(yintercept = 0, size = 0.25, color = greyPalette5[3]) +
    geom_vline(aes(xintercept = mean), colour = bluePalette5[4], linetype = 4, size = 0.7) +
    scale_x_continuous(limits = c(0, 1)) +
    labs(x="Goodness of Fit", y="Count", title="CV Goodness of Fit Metrics",
         subtitle = "Across-fold mean reprented as dotted lines",
         caption = "Figure 4-4-2")+
  corTheme2()+
  theme(panel.background = element_blank(),
        panel.grid.major.y = element_line(linetype = "dashed", size = 0.25, color = greyPalette5[2]),
        panel.grid.major.x = element_line(linetype = "dashed", size = 0.25, color = greyPalette5[2]),
        legend.position = "none")

       

5.4 Weight Scenario Analysis

To compare thresholds in terms of real-world consequences, we assign a weight to each confusion-matrix outcome:

True Positive : predicted burned where a fire actually occurred

True Negative : predicted not burned where no fire actually occurred

False Positive : predicted burned where no fire actually occurred

False Negative : predicted not burned where a fire actually occurred

Variable         Weight    Description
True Positive       50     Predicted correctly, so we can reduce the impact of the fire
True Negative        0     Predicted correctly, so essentially nothing happens
False Positive     -25     Predicted incorrectly, so resources are allocated and wasted
False Negative    -100     Predicted incorrectly, resulting in disastrous loss of lives and property
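
Plugging hypothetical counts into these weights shows how they combine into a single score; the real counts come from the test-set predictions in the chunk below.

# Hypothetical counts for illustration only.
weighted_value <- function(tp, tn, fp, fn) 50 * tp + 0 * tn - 25 * fp - 100 * fn
weighted_value(tp = 100, tn = 300, fp = 40, fn = 20)   # 5000 + 0 - 1000 - 2000 = 2000
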
cost_benefit_table <-
   testProbs %>%
      count(predOutcome, Outcome) %>%
      summarize(True_Negative = sum(n[predOutcome==0 & Outcome==0]),
                True_Positive = sum(n[predOutcome==1 & Outcome==1]),
                False_Negative = sum(n[predOutcome==0 & Outcome==1]),
                False_Positive = sum(n[predOutcome==1 & Outcome==0])) %>%
       gather(Variable, Count) %>%
       mutate(Weighted_Value =
               case_when(Variable == "True_Negative" ~ (Count * 0),
                         Variable == "True_Positive" ~ (Count * 50),
                         Variable == "False_Negative" ~ (Count * (-100)),
                         Variable == "False_Positive"~ (Count * (-25)))) %>%
    bind_cols(data.frame(Description = c(
              "Predicted correctly where no fire actually occurred",
              "Predicted correctly where a fire actually occurred",
              "Predicted incorrectly where a fire actually occurred",
              "Predicted incorrectly where no fire actually occurred")))


kable(cost_benefit_table, caption = "<center><span style='font-size:14px; color:black; font-family: Arial'>Table 5-1. Weight Table at the 60% threshold</span></center>",
      align = "c") %>% 
      kable_minimal(full_width = T, html_font = "Arial", font_size = 15) %>%
      column_spec(1, bold = TRUE) %>%
      row_spec(c(2,4), background = greyPalette5[1]) %>%
      row_spec(0:4, extra_css = "line-height: 35px") 

       

5.5 Maximize Weighted Value (Optimize Threshold)

For each threshold there is a different confusion matrix, a different set of errors, and therefore a different weight-scenario calculation; the optimal threshold is the one with the greatest total weighted value. The plot below shows how the weighted value of each confusion-matrix type changes with the threshold: the true positive value remains stable up to a threshold of about 0.6 and then gradually declines toward 0, while the false negative value starts at 0 and falls continuously.

The weighted value plot shows that the maximum total weighted value is reached at a threshold of 0.25. The higher the threshold, the more burned grid cells go unpredicted, and the higher the cost to the public.

iterateThresholds <- function(data) {
  x = .01
  all_prediction <- data.frame()
  while (x <= 1) {
    this_prediction <- data %>%   # use the function argument rather than the global testProbs
      mutate(predOutcome = ifelse(Probs > x, 1, 0)) %>%
      count(predOutcome, Outcome) %>%
      summarize(True_Negative = sum(n[predOutcome==0 & Outcome==0]),
                True_Positive = sum(n[predOutcome==1 & Outcome==1]),
                False_Negative = sum(n[predOutcome==0 & Outcome==1]),
                False_Positive = sum(n[predOutcome==1 & Outcome==0])) %>%
       gather(Variable, Count) %>%
       mutate(Weighted_Value =
               case_when(Variable == "True_Negative" ~ (Count * 0),
                         Variable == "True_Positive" ~ (Count * 50),
                         Variable == "False_Negative" ~ (Count * (-100)),
                         Variable == "False_Positive"~ (Count * (-25))),
              Threshold = x)
    all_prediction <- rbind(all_prediction, this_prediction)
    x <- x + .01
  }
return(all_prediction)
}

whichThreshold <- iterateThresholds(testProbs)

whichThreshold %>%
  ggplot(.,aes(Threshold, Weighted_Value, colour = Variable)) +
  geom_point() +
  scale_colour_manual(values = bluePalette5[2:5]) +    
  labs(title = "Weighted Value by confusion matrix type and threshold",
       y = "WeightedValue",
       caption = "Figure 5-5-1") +
  corTheme2()+
  theme(panel.background = element_blank(),
        panel.grid.major.y = element_line(linetype = "dashed", size = 0.35, color = greyPalette5[2]),
        panel.grid.major.x = element_line(linetype = "dashed", size = 0.35, color = greyPalette5[2]),
        legend.position = "bottom",
        legend.background = element_rect(fill = "white"),
        legend.key = element_rect(colour = NA, fill = NA)) +
  guides(colour=guide_legend(title = "Confusion Matrix")) 

whichThreshold_cb <- whichThreshold %>% 
  mutate(blocks = Count) %>%
  group_by(Threshold) %>% 
  summarize(Weighted_Value = sum(Weighted_Value),
            blocks = sum(blocks))

optimal_th <- pull(arrange(whichThreshold_cb, -Weighted_Value)[1,1])

ggplot(whichThreshold_cb, aes(Threshold, Weighted_Value)) +
  geom_point(color = bluePalette5[3], size = 1.2) +
  scale_colour_manual(values = bluePalette5) +    
  geom_vline(xintercept =  optimal_th, colour = bluePalette5[4], linetype = 4, size = 1) +
  labs(title = "Weighted Value by threshold",
       subtitle = "Maximizing Weghted Value Threshold : 0.25",
       y = "Weighted Value",
       caption = "Figure 5-5-2") +
  corTheme2()+
  theme(panel.background = element_blank(),
        panel.grid.major.y = element_line(linetype = "dashed", size = 0.45, color = greyPalette5[2]),
        panel.grid.major.x = element_line(linetype = "dashed", size = 0.45, color = greyPalette5[2]),
        legend.position = "bottom",
        axis.text.y = element_text(angle = 90)) +
  guides(colour=guide_legend(title = "Confusion Matrix")) 

whichThreshold_cb %>%
  filter(Threshold == "0.5" | Threshold == "0.25") %>%
  mutate(Scenario = case_when(Threshold == "0.5" ~ "50% Threshold", Threshold == "0.25" ~ "Optimal Threshold")) %>%
  dplyr::select(Scenario, Weighted_Value) %>%
  kable( caption = "<center><span style='font-size:14px; color:black; font-family: Arial'>Table 5-3. Weighted Value Comparison Table</span></center>",
      align = "c") %>% 
      kable_minimal(full_width = T, html_font = "Arial", font_size = 16)%>%
      column_spec(1:2, bold = TRUE) %>%
      row_spec(1, color = orangePalette5[5]) %>%
      row_spec(2, color = bluePalette5[4]) %>%
      row_spec(1:2, extra_css = "line-height: 30px") 

       

5.6 Optimized model

When we set the threshold to 0.25, the results shown below indicate that sensitivity is extremely high across all 100 folds, while specificity drops considerably, which means a large number of false positives.

testProbs <- testProbs %>%
  mutate(predOutcome_opt  = as.factor(ifelse(testProbs$Probs > 0.25 , 1, 0)))

#caret::confusionMatrix(testProbs$predOutcome_opt, testProbs$Outcome, positive = "1")

ctrl <- trainControl(method = "cv", number = 100, classProbs=TRUE, savePredictions = "all", summaryFunction=twoClassSummary)

cv_opt <- train(burned ~ .,
              data=fishnet %>% st_drop_geometry() %>%
                dplyr::select(burned, landcover, elevation, Facilities_nn3, GW_Station_nn3, Precipitation, Temperature), 
                method="glm", family="binomial", metric="ROC", trControl = ctrl)
thresholder(cv_opt, .25, final = TRUE) %>%
  dplyr::select( Threshold = prob_threshold, Sensitivity, Specificity) %>%
  kable(., caption = "<center><span style='font-size:14px; color:black; font-family: Arial'>Table 5-6. Optimized model </span></center>",
      align = "c") %>% 
      kable_minimal(full_width = T, html_font = "Arial", font_size = 15) %>%
      row_spec(0:1, extra_css = "line-height: 30px") 
Table 5-6. Optimized model
Threshold Sensitivity Specificity
0.25 0.9540476 0.5179583

           

Ⅵ. Wildfire Risk Prediction Map

We have created two types of models based on different time intervals: a yearly model and a set of monthly models.

6.1 Year Prediction Map

Wildfire probabilities and predictions for the year are mapped below. The left map presents the calculated probabilities, a continuous value; the right map shows the predicted outcome, from which it can be seen that almost the entire state is predicted to burn in the next year. We intended the risk map to help firefighters plan their patrols, but the yearly map is limited in practice because its predictions cover such a wide temporal and spatial range.

fishnet <- fishnet %>%
  mutate(Probs = predict(reg_ALL, fishnet, type= "response"))
  fishnet$predOutcome = as.factor(ifelse(fishnet$Probs > optimal_th , "burned", "not"))

fg611 <- ggplot() +
  annotation_map_tile("cartolight", forcedownload = FALSE)+
  geom_sf(data = fishnet, aes(fill = q5(Probs)), colour = NA) +
  scale_fill_manual(values = bluePalette5,
                    labels = qBr2(fishnet, "Probs"),
                    name = "Probability",
                    na.translate=FALSE)+
  labs(title= "California Wildfires Predictions of Year",
       subtitle = "Using 2018~2020 Climate data",
       caption = "Figure 6-1-1") +
  mapThememin2()


fg612 <- ggplot() +
  annotation_map_tile("cartolight", forcedownload = FALSE)+
  geom_sf(data = fishnet, aes(fill = predOutcome), colour = NA) +
  scale_fill_manual(values = c("not" = bluePalette5[1], "burned" = bluePalette5[4])) +
  labs(title = "Wildfire Prevention Area to Maximize Resource Effect",
       subtitle = "Based on 25% threshold of Wildfire Risk Prediction",
       caption = "Figure 6-1-2") +
  mapThememin2()

fg611+fg612

6.2 Monthly Prediction Map

To predict wildfire locations more precisely, we therefore created a separate regression model for each month (12 in total). The predicted maps for each month are shown below and are consistent with common sense and with our earlier conclusions.
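
The chunk below fits the twelve models one by one. A more compact, behavior-equivalent alternative (a sketch only, assuming the burned_XXX_numeric, Precipitation_XXX, and Temperature_XXX column-name pattern used above) would loop over the month abbreviations:

months <- c("JAN","FEB","MAR","APR","MAY","JUN","JUL","AUG","SEP","OCT","NOV","DEC")
monthly_models <- lapply(setNames(months, months), function(m) {
  f <- as.formula(paste0("burned_", m, "_numeric ~ landcover + elevation + ",
                         "Facilities_nn3 + GW_Station_nn3 + ",
                         "Precipitation_", m, " + Temperature_", m))
  glm(f, data = as.data.frame(burnTrain), family = binomial(link = "logit"))
})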

#------------------monthly reg model-----------------------#

reg_JAN <- glm(burned_JAN_numeric ~ .,
            data = as.data.frame(burnTrain) %>% 
              dplyr::select(burned_JAN_numeric, landcover, elevation, Facilities_nn3, GW_Station_nn3, Precipitation_JAN, Temperature_JAN),
            family="binomial"(link="logit"))

reg_FEB <- glm(burned_FEB_numeric ~ .,
            data = as.data.frame(burnTrain) %>% 
              dplyr::select(burned_FEB_numeric, landcover, elevation, Facilities_nn3, GW_Station_nn3, Precipitation_FEB, Temperature_FEB),
            family="binomial"(link="logit"))

reg_MAR <- glm(burned_MAR_numeric ~ .,
            data = as.data.frame(burnTrain) %>% 
              dplyr::select(burned_MAR_numeric, landcover, elevation, Facilities_nn3, GW_Station_nn3, Precipitation_MAR, Temperature_MAR),
            family="binomial"(link="logit"))

reg_APR <- glm(burned_APR_numeric ~ .,
            data = as.data.frame(burnTrain) %>% 
              dplyr::select(burned_APR_numeric, landcover, elevation, Facilities_nn3, GW_Station_nn3, Precipitation_APR, Temperature_APR),
            family="binomial"(link="logit"))

reg_MAY <- glm(burned_MAY_numeric ~ .,
            data = as.data.frame(burnTrain) %>% 
              dplyr::select(burned_MAY_numeric, landcover, elevation, Facilities_nn3, GW_Station_nn3, Precipitation_MAY, Temperature_MAY),
            family="binomial"(link="logit"))

reg_JUN <- glm(burned_JUN_numeric ~ .,
            data = as.data.frame(burnTrain) %>% 
              dplyr::select(burned_JUN_numeric, landcover, elevation, Facilities_nn3, GW_Station_nn3, Precipitation_JUN, Temperature_JUN),
            family="binomial"(link="logit"))

reg_JUL <- glm(burned_JUL_numeric ~ .,
            data = as.data.frame(burnTrain) %>% 
              dplyr::select(burned_JUL_numeric, landcover, elevation, Facilities_nn3, GW_Station_nn3, Precipitation_JUL, Temperature_JUL),
            family="binomial"(link="logit"))

reg_AUG <- glm(burned_AUG_numeric ~ .,
            data = as.data.frame(burnTrain) %>% 
              dplyr::select(burned_AUG_numeric, landcover, elevation, Facilities_nn3, GW_Station_nn3, Precipitation_AUG, Temperature_AUG),
            family="binomial"(link="logit"))

reg_SEP <- glm(burned_SEP_numeric ~ .,
            data = as.data.frame(burnTrain) %>% 
              dplyr::select(burned_SEP_numeric, landcover, elevation, Facilities_nn3, GW_Station_nn3, Precipitation_SEP, Temperature_SEP),
            family="binomial"(link="logit"))

reg_OCT <- glm(burned_OCT_numeric ~ .,
            data = as.data.frame(burnTrain) %>% 
              dplyr::select(burned_OCT_numeric, landcover, elevation, Facilities_nn3, GW_Station_nn3, Precipitation_OCT, Temperature_OCT),
            family="binomial"(link="logit"))

reg_NOV <- glm(burned_NOV_numeric ~ .,
            data = as.data.frame(burnTrain) %>% 
              dplyr::select(burned_NOV_numeric, landcover, elevation, Facilities_nn3, GW_Station_nn3, Precipitation_NOV, Temperature_NOV),
            family="binomial"(link="logit"))

reg_DEC <- glm(burned_DEC_numeric ~ .,
            data = as.data.frame(burnTrain) %>% 
              dplyr::select(burned_DEC_numeric, landcover, elevation, Facilities_nn3, GW_Station_nn3, Precipitation_DEC, Temperature_DEC),
            family="binomial"(link="logit"))


pR2(reg_JAN)
pR2(reg_FEB)
pR2(reg_MAR)
pR2(reg_APR)
pR2(reg_MAY)
pR2(reg_JUN)
pR2(reg_JUL)
pR2(reg_AUG)
pR2(reg_SEP)
pR2(reg_OCT)
pR2(reg_NOV)
pR2(reg_DEC)
#------------------monthly optimal threshold-----------------------#

##---JAN---##

testProbs_JAN = data.frame(Outcome = as.factor(burnTest$burned_JAN_numeric),
                           Probs = predict(reg_JAN, burnTest, type= "response"))

iterateThresholds <- function(data) {
  x = .01
  all_prediction <- data.frame()
  while (x <= 1) {
    this_prediction <- testProbs_JAN %>%
      mutate(predOutcome = ifelse(Probs > x, 1, 0)) %>%
      count(predOutcome, Outcome) %>%
      summarize(True_Negative = sum(n[predOutcome==0 & Outcome==0]),
                True_Positive = sum(n[predOutcome==1 & Outcome==1]),
                False_Negative = sum(n[predOutcome==0 & Outcome==1]),
                False_Positive = sum(n[predOutcome==1 & Outcome==0])) %>%
       gather(Variable, Count) %>%
       mutate(Weighted_Value =
               case_when(Variable == "True_Negative" ~ (Count * 0),
                         Variable == "True_Positive" ~ (Count * 50),
                         Variable == "False_Negative" ~ (Count * (-100)),
                         Variable == "False_Positive"~ (Count * (-25))),
              Threshold = x)
    all_prediction <- rbind(all_prediction, this_prediction)
    x <- x + .01
  }
return(all_prediction)
}

whichThreshold_JAN <- iterateThresholds(testProbs_JAN)

whichThreshold_JAN <- whichThreshold_JAN %>% 
  mutate(blocks = Count) %>%
  group_by(Threshold) %>% 
  summarize(Weighted_Value = sum(Weighted_Value),
            blocks = sum(blocks))

optimal_th_JAN <- pull(arrange(whichThreshold_JAN, -Weighted_Value)[1,1])

##---FEB---##

testProbs_FEB = data.frame(Outcome = as.factor(burnTest$burned_FEB_numeric),
                       Probs = predict(reg_FEB, burnTest, type= "response"))

iterateThresholds <- function(data) {
  x = .01
  all_prediction <- data.frame()
  while (x <= 1) {
    this_prediction <- testProbs_FEB %>%
      mutate(predOutcome = ifelse(Probs > x, 1, 0)) %>%
      count(predOutcome, Outcome) %>%
      summarize(True_Negative = sum(n[predOutcome==0 & Outcome==0]),
                True_Positive = sum(n[predOutcome==1 & Outcome==1]),
                False_Negative = sum(n[predOutcome==0 & Outcome==1]),
                False_Positive = sum(n[predOutcome==1 & Outcome==0])) %>%
       gather(Variable, Count) %>%
       mutate(Weighted_Value =
               case_when(Variable == "True_Negative" ~ (Count * 0),
                         Variable == "True_Positive" ~ (Count * 50),
                         Variable == "False_Negative" ~ (Count * (-100)),
                         Variable == "False_Positive"~ (Count * (-25))),
              Threshold = x)
    all_prediction <- rbind(all_prediction, this_prediction)
    x <- x + .01
  }
return(all_prediction)
}

whichThreshold_FEB <- iterateThresholds(testProbs_FEB)

whichThreshold_FEB <- whichThreshold_FEB %>% 
  mutate(blocks = Count) %>%
  group_by(Threshold) %>% 
  summarize(Weighted_Value = sum(Weighted_Value),
            blocks = sum(blocks))

optimal_th_FEB <- pull(arrange(whichThreshold_FEB, -Weighted_Value)[1,1])

##---MAR---##

testProbs_MAR = data.frame(Outcome = as.factor(burnTest$burned_MAR_numeric),
                       Probs = predict(reg_MAR, burnTest, type= "response"))

iterateThresholds <- function(data) {
  x = .01
  all_prediction <- data.frame()
  while (x <= 1) {
    this_prediction <- testProbs_MAR %>%
      mutate(predOutcome = ifelse(Probs > x, 1, 0)) %>%
      count(predOutcome, Outcome) %>%
      summarize(True_Negative = sum(n[predOutcome==0 & Outcome==0]),
                True_Positive = sum(n[predOutcome==1 & Outcome==1]),
                False_Negative = sum(n[predOutcome==0 & Outcome==1]),
                False_Positive = sum(n[predOutcome==1 & Outcome==0])) %>%
       gather(Variable, Count) %>%
       mutate(Weighted_Value =
               case_when(Variable == "True_Negative" ~ (Count * 0),
                         Variable == "True_Positive" ~ (Count * 50),
                         Variable == "False_Negative" ~ (Count * (-100)),
                         Variable == "False_Positive"~ (Count * (-25))),
              Threshold = x)
    all_prediction <- rbind(all_prediction, this_prediction)
    x <- x + .01
  }
return(all_prediction)
}

whichThreshold_MAR <- iterateThresholds(testProbs_MAR)

whichThreshold_MAR <- whichThreshold_MAR %>% 
  mutate(blocks = Count) %>%
  group_by(Threshold) %>% 
  summarize(Weighted_Value = sum(Weighted_Value),
            blocks = sum(blocks))

optimal_th_MAR <- pull(arrange(whichThreshold_MAR, -Weighted_Value)[1,1])

##---APR---##

testProbs_APR = data.frame(Outcome = as.factor(burnTest$burned_APR_numeric),
                       Probs = predict(reg_APR, burnTest, type= "response"))

iterateThresholds <- function(data) {
  x = .01
  all_prediction <- data.frame()
  while (x <= 1) {
    this_prediction <- testProbs_APR %>%
      mutate(predOutcome = ifelse(Probs > x, 1, 0)) %>%
      count(predOutcome, Outcome) %>%
      summarize(True_Negative = sum(n[predOutcome==0 & Outcome==0]),
                True_Positive = sum(n[predOutcome==1 & Outcome==1]),
                False_Negative = sum(n[predOutcome==0 & Outcome==1]),
                False_Positive = sum(n[predOutcome==1 & Outcome==0])) %>%
       gather(Variable, Count) %>%
       mutate(Weighted_Value =
               case_when(Variable == "True_Negative" ~ (Count * 0),
                         Variable == "True_Positive" ~ (Count * 50),
                         Variable == "False_Negative" ~ (Count * (-100)),
                         Variable == "False_Positive"~ (Count * (-25))),
              Threshold = x)
    all_prediction <- rbind(all_prediction, this_prediction)
    x <- x + .01
  }
return(all_prediction)
}

whichThreshold_APR <- iterateThresholds(testProbs_APR)

whichThreshold_APR <- whichThreshold_APR %>% 
  mutate(blocks = Count) %>%
  group_by(Threshold) %>% 
  summarize(Weighted_Value = sum(Weighted_Value),
            blocks = sum(blocks))

optimal_th_APR <- pull(arrange(whichThreshold_APR, -Weighted_Value)[1,1])

##---MAY---##

testProbs_MAY = data.frame(Outcome = as.factor(burnTest$burned_MAY_numeric),
                       Probs = predict(reg_MAY, burnTest, type= "response"))

iterateThresholds <- function(data) {
  x = .01
  all_prediction <- data.frame()
  while (x <= 1) {
    this_prediction <- testProbs_MAY %>%
      mutate(predOutcome = ifelse(Probs > x, 1, 0)) %>%
      count(predOutcome, Outcome) %>%
      summarize(True_Negative = sum(n[predOutcome==0 & Outcome==0]),
                True_Positive = sum(n[predOutcome==1 & Outcome==1]),
                False_Negative = sum(n[predOutcome==0 & Outcome==1]),
                False_Positive = sum(n[predOutcome==1 & Outcome==0])) %>%
       gather(Variable, Count) %>%
       mutate(Weighted_Value =
               case_when(Variable == "True_Negative" ~ (Count * 0),
                         Variable == "True_Positive" ~ (Count * 50),
                         Variable == "False_Negative" ~ (Count * (-100)),
                         Variable == "False_Positive"~ (Count * (-25))),
              Threshold = x)
    all_prediction <- rbind(all_prediction, this_prediction)
    x <- x + .01
  }
return(all_prediction)
}

whichThreshold_MAY <- iterateThresholds(testProbs_MAY)

whichThreshold_MAY <- whichThreshold_MAY %>% 
  mutate(blocks = Count) %>%
  group_by(Threshold) %>% 
  summarize(Weighted_Value = sum(Weighted_Value),
            blocks = sum(blocks))

optimal_th_MAY <- pull(arrange(whichThreshold_MAY, -Weighted_Value)[1,1])

##---JUN---##

testProbs_JUN = data.frame(Outcome = as.factor(burnTest$burned_JUN_numeric),
                       Probs = predict(reg_JUN, burnTest, type= "response"))

iterateThresholds <- function(data) {
  x = .01
  all_prediction <- data.frame()
  while (x <= 1) {
    this_prediction <- testProbs_JUN %>%
      mutate(predOutcome = ifelse(Probs > x, 1, 0)) %>%
      count(predOutcome, Outcome) %>%
      summarize(True_Negative = sum(n[predOutcome==0 & Outcome==0]),
                True_Positive = sum(n[predOutcome==1 & Outcome==1]),
                False_Negative = sum(n[predOutcome==0 & Outcome==1]),
                False_Positive = sum(n[predOutcome==1 & Outcome==0])) %>%
       gather(Variable, Count) %>%
       mutate(Weighted_Value =
               case_when(Variable == "True_Negative" ~ (Count * 0),
                         Variable == "True_Positive" ~ (Count * 50),
                         Variable == "False_Negative" ~ (Count * (-100)),
                         Variable == "False_Positive"~ (Count * (-25))),
              Threshold = x)
    all_prediction <- rbind(all_prediction, this_prediction)
    x <- x + .01
  }
return(all_prediction)
}

whichThreshold_JUN <- iterateThresholds(testProbs_JUN)

whichThreshold_JUN <- whichThreshold_JUN %>% 
  mutate(blocks = Count) %>%
  group_by(Threshold) %>% 
  summarize(Weighted_Value = sum(Weighted_Value),
            blocks = sum(blocks))

optimal_th_JUN <- pull(arrange(whichThreshold_JUN, -Weighted_Value)[1,1])

##---JUL---##

testProbs_JUL = data.frame(Outcome = as.factor(burnTest$burned_JUL_numeric),
                       Probs = predict(reg_JUL, burnTest, type= "response"))

iterateThresholds <- function(data) {
  x = .01
  all_prediction <- data.frame()
  while (x <= 1) {
    this_prediction <- testProbs_JUL %>%
      mutate(predOutcome = ifelse(Probs > x, 1, 0)) %>%
      count(predOutcome, Outcome) %>%
      summarize(True_Negative = sum(n[predOutcome==0 & Outcome==0]),
                True_Positive = sum(n[predOutcome==1 & Outcome==1]),
                False_Negative = sum(n[predOutcome==0 & Outcome==1]),
                False_Positive = sum(n[predOutcome==1 & Outcome==0])) %>%
       gather(Variable, Count) %>%
       mutate(Weighted_Value =
               case_when(Variable == "True_Negative" ~ (Count * 0),
                         Variable == "True_Positive" ~ (Count * 50),
                         Variable == "False_Negative" ~ (Count * (-100)),
                         Variable == "False_Positive"~ (Count * (-25))),
              Threshold = x)
    all_prediction <- rbind(all_prediction, this_prediction)
    x <- x + .01
  }
return(all_prediction)
}

whichThreshold_JUL <- iterateThresholds(testProbs_JUL)

whichThreshold_JUL <- whichThreshold_JUL %>% 
  mutate(blocks = Count) %>%
  group_by(Threshold) %>% 
  summarize(Weighted_Value = sum(Weighted_Value),
            blocks = sum(blocks))

optimal_th_JUL <- pull(arrange(whichThreshold_JUL, -Weighted_Value)[1,1])

##---AUG---##

testProbs_AUG = data.frame(Outcome = as.factor(burnTest$burned_AUG_numeric),
                       Probs = predict(reg_AUG, burnTest, type= "response"))

iterateThresholds <- function(data) {
  x = .01
  all_prediction <- data.frame()
  while (x <= 1) {
    this_prediction <- data %>%
      mutate(predOutcome = ifelse(Probs > x, 1, 0)) %>%
      count(predOutcome, Outcome) %>%
      summarize(True_Negative = sum(n[predOutcome==0 & Outcome==0]),
                True_Positive = sum(n[predOutcome==1 & Outcome==1]),
                False_Negative = sum(n[predOutcome==0 & Outcome==1]),
                False_Positive = sum(n[predOutcome==1 & Outcome==0])) %>%
       gather(Variable, Count) %>%
       mutate(Weighted_Value =
               case_when(Variable == "True_Negative" ~ (Count * 0),
                         Variable == "True_Positive" ~ (Count * 50),
                         Variable == "False_Negative" ~ (Count * (-100)),
                         Variable == "False_Positive"~ (Count * (-25))),
              Threshold = x)
    all_prediction <- rbind(all_prediction, this_prediction)
    x <- x + .01
  }
return(all_prediction)
}

whichThreshold_AUG <- iterateThresholds(testProbs_AUG)

whichThreshold_AUG <- whichThreshold_AUG %>% 
  mutate(blocks = Count) %>%
  group_by(Threshold) %>% 
  summarize(Weighted_Value = sum(Weighted_Value),
            blocks = sum(blocks))

optimal_th_AUG <- pull(arrange(whichThreshold_AUG, -Weighted_Value)[1,1])

##---SEP---##

testProbs_SEP = data.frame(Outcome = as.factor(burnTest$burned_SEP_numeric),
                       Probs = predict(reg_SEP, burnTest, type= "response"))

iterateThresholds <- function(data) {
  x = .01
  all_prediction <- data.frame()
  while (x <= 1) {
    this_prediction <- data %>%
      mutate(predOutcome = ifelse(Probs > x, 1, 0)) %>%
      count(predOutcome, Outcome) %>%
      summarize(True_Negative = sum(n[predOutcome==0 & Outcome==0]),
                True_Positive = sum(n[predOutcome==1 & Outcome==1]),
                False_Negative = sum(n[predOutcome==0 & Outcome==1]),
                False_Positive = sum(n[predOutcome==1 & Outcome==0])) %>%
       gather(Variable, Count) %>%
       mutate(Weighted_Value =
               case_when(Variable == "True_Negative" ~ (Count * 0),
                         Variable == "True_Positive" ~ (Count * 50),
                         Variable == "False_Negative" ~ (Count * (-100)),
                         Variable == "False_Positive"~ (Count * (-25))),
              Threshold = x)
    all_prediction <- rbind(all_prediction, this_prediction)
    x <- x + .01
  }
return(all_prediction)
}

whichThreshold_SEP <- iterateThresholds(testProbs_SEP)

whichThreshold_SEP <- whichThreshold_SEP %>% 
  mutate(blocks = Count) %>%
  group_by(Threshold) %>% 
  summarize(Weighted_Value = sum(Weighted_Value),
            blocks = sum(blocks))

optimal_th_SEP <- pull(arrange(whichThreshold_SEP, -Weighted_Value)[1,1])

##---OCT---##

testProbs_OCT = data.frame(Outcome = as.factor(burnTest$burned_OCT_numeric),
                       Probs = predict(reg_OCT, burnTest, type= "response"))

iterateThresholds <- function(data) {
  x = .01
  all_prediction <- data.frame()
  while (x <= 1) {
    this_prediction <- data %>%
      mutate(predOutcome = ifelse(Probs > x, 1, 0)) %>%
      count(predOutcome, Outcome) %>%
      summarize(True_Negative = sum(n[predOutcome==0 & Outcome==0]),
                True_Positive = sum(n[predOutcome==1 & Outcome==1]),
                False_Negative = sum(n[predOutcome==0 & Outcome==1]),
                False_Positive = sum(n[predOutcome==1 & Outcome==0])) %>%
       gather(Variable, Count) %>%
       mutate(Weighted_Value =
               case_when(Variable == "True_Negative" ~ (Count * 0),
                         Variable == "True_Positive" ~ (Count * 50),
                         Variable == "False_Negative" ~ (Count * (-100)),
                         Variable == "False_Positive"~ (Count * (-25))),
              Threshold = x)
    all_prediction <- rbind(all_prediction, this_prediction)
    x <- x + .01
  }
return(all_prediction)
}

whichThreshold_OCT <- iterateThresholds(testProbs_OCT)

whichThreshold_OCT <- whichThreshold_OCT %>% 
  mutate(blocks = Count) %>%
  group_by(Threshold) %>% 
  summarize(Weighted_Value = sum(Weighted_Value),
            blocks = sum(blocks))

optimal_th_OCT <- pull(arrange(whichThreshold_OCT, -Weighted_Value)[1,1])

##---NOV---##

testProbs_NOV = data.frame(Outcome = as.factor(burnTest$burned_NOV_numeric),
                       Probs = predict(reg_NOV, burnTest, type= "response"))

iterateThresholds <- function(data) {
  x = .01
  all_prediction <- data.frame()
  while (x <= 1) {
    this_prediction <- data %>%
      mutate(predOutcome = ifelse(Probs > x, 1, 0)) %>%
      count(predOutcome, Outcome) %>%
      summarize(True_Negative = sum(n[predOutcome==0 & Outcome==0]),
                True_Positive = sum(n[predOutcome==1 & Outcome==1]),
                False_Negative = sum(n[predOutcome==0 & Outcome==1]),
                False_Positive = sum(n[predOutcome==1 & Outcome==0])) %>%
       gather(Variable, Count) %>%
       mutate(Weighted_Value =
               case_when(Variable == "True_Negative" ~ (Count * 0),
                         Variable == "True_Positive" ~ (Count * 50),
                         Variable == "False_Negative" ~ (Count * (-100)),
                         Variable == "False_Positive"~ (Count * (-25))),
              Threshold = x)
    all_prediction <- rbind(all_prediction, this_prediction)
    x <- x + .01
  }
return(all_prediction)
}

whichThreshold_NOV <- iterateThresholds(testProbs_NOV)

whichThreshold_NOV <- whichThreshold_NOV %>% 
  mutate(blocks = Count) %>%
  group_by(Threshold) %>% 
  summarize(Weighted_Value = sum(Weighted_Value),
            blocks = sum(blocks))

optimal_th_NOV <- pull(arrange(whichThreshold_NOV, -Weighted_Value)[1,1])

##---DEC---##

testProbs_DEC = data.frame(Outcome = as.factor(burnTest$burned_DEC_numeric),
                       Probs = predict(reg_DEC, burnTest, type= "response"))

iterateThresholds <- function(data) {
  x = .01
  all_prediction <- data.frame()
  while (x <= 1) {
    this_prediction <- data %>%
      mutate(predOutcome = ifelse(Probs > x, 1, 0)) %>%
      count(predOutcome, Outcome) %>%
      summarize(True_Negative = sum(n[predOutcome==0 & Outcome==0]),
                True_Positive = sum(n[predOutcome==1 & Outcome==1]),
                False_Negative = sum(n[predOutcome==0 & Outcome==1]),
                False_Positive = sum(n[predOutcome==1 & Outcome==0])) %>%
       gather(Variable, Count) %>%
       mutate(Weighted_Value =
               case_when(Variable == "True_Negative" ~ (Count * 0),
                         Variable == "True_Positive" ~ (Count * 50),
                         Variable == "False_Negative" ~ (Count * (-100)),
                         Variable == "False_Positive"~ (Count * (-25))),
              Threshold = x)
    all_prediction <- rbind(all_prediction, this_prediction)
    x <- x + .01
  }
return(all_prediction)
}

whichThreshold_DEC <- iterateThresholds(testProbs_DEC)

whichThreshold_DEC <- whichThreshold_DEC %>% 
  mutate(blocks = Count) %>%
  group_by(Threshold) %>% 
  summarize(Weighted_Value = sum(Weighted_Value),
            blocks = sum(blocks))

optimal_th_DEC <- pull(arrange(whichThreshold_DEC, -Weighted_Value)[1,1])
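
The twelve per-month threshold searches above repeat the same logic verbatim. As a sketch only, not how the workflow was actually run, the search could be collapsed into a single loop, assuming the reg_<MON> models and burned_<MON>_numeric columns follow the naming used above and that iterateThresholds() operates on its data argument; the months vector here is introduced purely for illustration.

# Sketch: one loop over all twelve months instead of copy-pasted blocks.
# Assumes reg_<MON> models and burned_<MON>_numeric columns exist as above.
months <- c("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
            "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")

optimal_th_all <- sapply(months, function(mon) {
  testProbs_mon <- data.frame(
    Outcome = as.factor(burnTest[[paste0("burned_", mon, "_numeric")]]),
    Probs   = predict(get(paste0("reg_", mon)), burnTest, type = "response"))
  whichThreshold_mon <- iterateThresholds(testProbs_mon) %>%
    group_by(Threshold) %>%
    summarize(Weighted_Value = sum(Weighted_Value))
  pull(arrange(whichThreshold_mon, -Weighted_Value)[1, 1])
})
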
fishnet <- fishnet %>%
  mutate(Probs_JAN = predict(reg_JAN, fishnet, type = "response"))
fishnet$predOutcome_JAN <- as.factor(ifelse(fishnet$Probs_JAN > optimal_th_JAN, "burned", "not"))

fishnet <- fishnet %>%
  mutate(Probs_FEB = predict(reg_FEB, fishnet, type = "response"))
fishnet$predOutcome_FEB <- as.factor(ifelse(fishnet$Probs_FEB > optimal_th_FEB, "burned", "not"))

fishnet <- fishnet %>%
  mutate(Probs_MAR = predict(reg_MAR, fishnet, type = "response"))
fishnet$predOutcome_MAR <- as.factor(ifelse(fishnet$Probs_MAR > optimal_th_MAR, "burned", "not"))

fishnet <- fishnet %>%
  mutate(Probs_APR = predict(reg_APR, fishnet, type = "response"))
fishnet$predOutcome_APR <- as.factor(ifelse(fishnet$Probs_APR > optimal_th_APR, "burned", "not"))

fishnet <- fishnet %>%
  mutate(Probs_MAY = predict(reg_MAY, fishnet, type = "response"))
fishnet$predOutcome_MAY <- as.factor(ifelse(fishnet$Probs_MAY > optimal_th_MAY, "burned", "not"))

fishnet <- fishnet %>%
  mutate(Probs_JUN = predict(reg_JUN, fishnet, type = "response"))
fishnet$predOutcome_JUN <- as.factor(ifelse(fishnet$Probs_JUN > optimal_th_JUN, "burned", "not"))

fishnet <- fishnet %>%
  mutate(Probs_JUL = predict(reg_JUL, fishnet, type = "response"))
fishnet$predOutcome_JUL <- as.factor(ifelse(fishnet$Probs_JUL > optimal_th_JUL, "burned", "not"))

fishnet <- fishnet %>%
  mutate(Probs_AUG = predict(reg_AUG, fishnet, type = "response"))
fishnet$predOutcome_AUG <- as.factor(ifelse(fishnet$Probs_AUG > optimal_th_AUG, "burned", "not"))

fishnet <- fishnet %>%
  mutate(Probs_SEP = predict(reg_SEP, fishnet, type = "response"))
fishnet$predOutcome_SEP <- as.factor(ifelse(fishnet$Probs_SEP > optimal_th_SEP, "burned", "not"))

fishnet <- fishnet %>%
  mutate(Probs_OCT = predict(reg_OCT, fishnet, type = "response"))
fishnet$predOutcome_OCT <- as.factor(ifelse(fishnet$Probs_OCT > optimal_th_OCT, "burned", "not"))

fishnet <- fishnet %>%
  mutate(Probs_NOV = predict(reg_NOV, fishnet, type = "response"))
fishnet$predOutcome_NOV <- as.factor(ifelse(fishnet$Probs_NOV > optimal_th_NOV, "burned", "not"))

fishnet <- fishnet %>%
  mutate(Probs_DEC = predict(reg_DEC, fishnet, type = "response"))
fishnet$predOutcome_DEC <- as.factor(ifelse(fishnet$Probs_DEC > optimal_th_DEC, "burned", "not"))
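
The same consolidation applies to the prediction step. As a sketch (reusing the illustrative months vector from above, and assuming the reg_<MON> models and optimal_th_<MON> thresholds exist as in the preceding code), one loop can fill in all twelve probability and outcome columns:

# Sketch: fill Probs_<MON> and predOutcome_<MON> for every month in one loop.
for (mon in months) {
  fishnet[[paste0("Probs_", mon)]] <-
    predict(get(paste0("reg_", mon)), fishnet, type = "response")
  fishnet[[paste0("predOutcome_", mon)]] <- as.factor(
    ifelse(fishnet[[paste0("Probs_", mon)]] > get(paste0("optimal_th_", mon)),
           "burned", "not"))
}
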
fishnet_Prediction_grid <- fishnet %>%
  dplyr::select(uniqueID, predOutcome_JAN, predOutcome_FEB, predOutcome_MAR, predOutcome_APR,
                predOutcome_MAY, predOutcome_JUN, predOutcome_JUL, predOutcome_AUG,
                predOutcome_SEP, predOutcome_OCT, predOutcome_NOV, predOutcome_DEC) %>%
  st_drop_geometry() %>%
  gather(Variable, Value, -uniqueID)

fishnet_Prediction_grid <- left_join(fishnet %>% dplyr::select(uniqueID), fishnet_Prediction_grid, by = "uniqueID")

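# Animated map of the monthly classified predictions (burned / not), one frame per month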
Monthly_animation <-
  ggplot() +
  annotation_map_tile("cartolight", forcedownload = FALSE)+
  geom_sf(data = study_area, fill = greyPalette5[1], color = "transparent", alpha = 0.5) +
  geom_sf(data = fishnet_Prediction_grid %>% 
            mutate(Variable = factor(Variable, levels = c("predOutcome_JAN", "predOutcome_FEB", "predOutcome_MAR", "predOutcome_APR",
                                                          "predOutcome_MAY", "predOutcome_JUN", "predOutcome_JUL", "predOutcome_AUG",
                                                          "predOutcome_SEP", "predOutcome_OCT", "predOutcome_NOV", "predOutcome_DEC"))),
          aes(fill = Value), color = "transparent") +
  scale_fill_manual(values = c("not" = bluePalette5[1], "burned" = bluePalette5[4]))+
  labs(title = "Wildfire Prediction Area to Maximize Resource Effect",
       subtitle = "Month: {current_frame}",
       caption = "") +
    transition_manual(Variable) +
  mapThememin2()


Monthly_animation_plot <- animate(Monthly_animation, duration=10, renderer = gifski_renderer())

fishnet_Probability_grid <- fishnet %>%
  dplyr::select(uniqueID, Probs_JAN, Probs_FEB, Probs_MAR, Probs_APR,
                Probs_MAY, Probs_JUN, Probs_JUL, Probs_AUG,
                Probs_SEP, Probs_OCT, Probs_NOV, Probs_DEC) %>%
  st_drop_geometry() %>%
  gather(Variable, Value, -uniqueID)

fishnet_Probability_grid <- left_join(fishnet %>% dplyr::select(uniqueID), fishnet_Probability_grid, by = "uniqueID")

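# Animated map of the predicted monthly wildfire probabilities (quintile breaks via q5), one frame per month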
Monthly_animation_Prob <-
  ggplot() +
  annotation_map_tile("cartolight", forcedownload = FALSE)+
  geom_sf(data = study_area, fill = greyPalette5[1], color = "transparent", alpha = 0.5) +
  geom_sf(data = fishnet_Probability_grid %>% 
            mutate(Variable = factor(Variable, levels = c("Probs_JAN", "Probs_FEB", "Probs_MAR", "Probs_APR",
                                                          "Probs_MAY", "Probs_JUN", "Probs_JUL", "Probs_AUG",
                                                          "Probs_SEP", "Probs_OCT", "Probs_NOV", "Probs_DEC"))),
          aes(fill = q5(Value)), color = "transparent") +
  scale_fill_manual(values = bluePalette5,
                    labels = round(as.numeric(qBr2(fishnet_Probability_grid, "Value"))*100),
                    name = "Probability(%)",
                    na.translate=FALSE)+
  labs(title = "Probablity of Wildfire",
       subtitle = "Month: {current_frame}",
       caption = "") +
    transition_manual(Variable) +
  mapThememin2()


Monthly_animation_Prob_plot <- animate(Monthly_animation_Prob, duration=10, renderer = gifski_renderer())

Monthly_animation_Prob_plot

Monthly_animation_plot

Ⅶ. Conclusion

In conclusion, we built Logistic Regression models to predict the probability of wildfire occurrence in California, with the goal of helping the California Department of Forestry and Fire Protection reduce wildfire risk, and we designed a web application to better support their work. We created two types of models based on different time intervals.

The first model is annual. Its AUC reaches 0.85, indicating good discriminative power. After 100-fold cross-validation, the sensitivity is about 0.89 while the specificity is close to 0.75, so the model generalizes well on true positives but its specificity needs improvement. We then conducted a weight scenario analysis to find the optimal threshold, assigning a different weight to each confusion-matrix outcome; the result shows that a threshold of 0.25 yields the highest weighted value. The optimized model gains sensitivity but loses a substantial amount of specificity. However, the yearly prediction map is of limited practical use, since it flags wildfire risk almost everywhere and carries no information about timing.
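
As a quick illustration of the weighting scheme (the counts below are hypothetical; only the weights come from our analysis), the weighted value at a given threshold is simply the sum of each confusion-matrix count multiplied by its weight:

# Hypothetical counts at one threshold; weights as in the analysis:
# true positive +50, true negative 0, false negative -100, false positive -25.
weighted_value <- 50 * 50 + 1000 * 0 + 5 * (-100) + 40 * (-25)
weighted_value   # 2500 + 0 - 500 - 1000 = 1000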

We therefore built 12 monthly models and produced a prediction map for each month; these results are more convincing. To further improve the model's accuracy, we could add more wildfire-related factors such as drought conditions and reservoir locations.

While there is room for improvement, we believe our model is useful for predicting wildfire risk in California and that our application can provide better service to the responsible government agencies.