⚠️ This is the first operational version of the handbook, but it is still a work in progress and will be heavily updated during 2024! ⚠️

Hazard Assessment for Wildfire - Machine Learning Approach#

Click Fire to go to this workflow’s GitHub repository.

Contributors:

This Jupyter Notebook computes a hazard map for an area of interest from the following inputs: DEM, CORINE land cover, fire data and administrative units (NUTS). Moreover, it is used to analyze fire hazard data and perform various preprocessing and analysis tasks.

Introduction#

In this section, an overview of the project and the goals of the analysis will be provided. This analysis is based on the hazard mapping works of Tonini et al. 2020, Trucchia et al. 2023.

The workflow is based on the following steps:

  • Data gathering and preprocessing.

  • Building a model for wildfire susceptibility using present climate conditions and synoptic wildfire events.

  • Projecting the model to future climate conditions.

  • For both cases, susceptibility can be evolved to hazard by considering the different plant functional types, which are a proxy for the intensity of potential wildfires. See Trucchia et al. 2023 for more details.

  • Damage assessment for different vulnerability categories and exposed elements (roads) in order to obtain risk maps.

  • Regarding climate, the analysis revolves around a high-resolution gridded climate dataset for Europe based on bias-corrected EURO-CORDEX: the ECLIPS-2.0 dataset. ECLIPS (European CLimate Index ProjectionS) contains gridded data for 80 annual, seasonal, and monthly climate variables for two past periods (1961-1990, 1991-2010) and five future periods (2011-2020, 2021-2040, 2041-2060, 2061-2080, 2081-2100). The future data are based on five Regional Climate Models (RCMs) driven by two greenhouse gas concentration scenarios, RCP 4.5 and 8.5. See Chakraborty et al. 2021 for more details.

Preparation Work#

At the present stage, the analysis case study is the Catalonia region in Spain. A prepared sample dataset for Catalonia is available from the CLIMAAX cloud storage.

Assessing risk for other regions

To assess the wildfire hazard and risk for a different region, data equivalent to the provided Catalonia sample have to be provided and substituted throughout the workflow. This includes:

  • A shapefile of the region (sample data in data_cat/adm_level_stanford/)

  • Digital Elevation Model (DEM) raster data (sample data in dem_processing/)

  • Land cover data (sample data from CORINE in data_cat/hazard/input_hazard/)

  • A dataset of historical fires (sample data in data_cat/hazard/input_hazard/fires/)

  • A climate dataset with parameters relating to fuel availability and fire danger (sample data based on ECLIPS-2.0 in resized_climate/).

  • Vulnerability data (sample data from JRC with European coverage in data_cat/Vul_data/; risk assessment only)

  • Exposure data, e.g. for critical infrastructure (sample data in data_cat/risk/; risk assessment only)

  • NUTS level data for aggregation (sample data for Spain in data_cat/administrative_units_NUTS/; risk assessment only)

To simplify the creation of the climate dataset for input in the machine learning model, CLIMAAX provides a (partial) dataset mirror of relevant parameters from the ECLIPS-2.0 dataset. For the purposes of this workflow, we recommend downloading the ECLIPS-2.0 data via this mirror, as the original dataset is only provided as a single 87.2 GB-large compressed archive file.

Most of the analysis is based on raster calculations. The “base” raster is a clipped DEM file (data_cat/hazard/input_hazard/dem_3035_clip.tif), which has been clipped to the extent of the Catalonia administrative shapefile. The raster is metric, using the EPSG:3035 projection, with 100 m resolution and with extent (left, bottom, right, top) given by:

3488731.355 1986586.650 3769731.355 2241986.650
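
As a quick sanity check, the properties of the base raster can be inspected with rasterio (a minimal sketch; the path follows the sample data layout described above):

import rasterio

# Inspect the base raster: CRS, resolution and extent
with rasterio.open("./data_cat/hazard/input_hazard/dem_3035_clip.tif") as src:
    print(src.crs)     # expected: EPSG:3035
    print(src.res)     # expected: (100.0, 100.0)
    print(src.bounds)  # should match the extent quoted above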

Import libraries#

import os

import pooch
from tqdm import tqdm

import numpy as np
from osgeo import gdal, ogr
import geopandas as gpd
import pandas as pd
from scipy import signal
import rasterio
from rasterio import features
from rasterio.plot import show
from rasterio.warp import calculate_default_transform, reproject, Resampling
import rasterio.plot
from rasterio.mask import mask

import sklearn
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
from matplotlib import colors
import matplotlib.patches as mpatches
import contextily as cx  # used by plot_raster_V2 below when basemap=True

Sample data#

The data is stored in three different folders:

  • data_cat

  • dem_processing

  • resized_climate

Sample data for the Catalonia case study is available from the CLIMAAX cloud storage. We download all files here first so that they are available for the rest of the workflow. A registry for the data (about 100 files, 1.3 GB total) that can be used by the pooch package is included in this workflow (files_registry.txt). We load it here and retrieve all files. If any files were downloaded before, pooch inspects the local file contents and skips the download if the contents match expectations.

# Pooch downloader for workflow directory and CLIMAAX cloud storage
sample_pooch = pooch.create(
    path=".",
    base_url="https://object-store.os-api.cci1.ecmwf.int/climaax/wildfire_sample_cat/"
)
sample_pooch.load_registry("files_registry.txt")

# Download all files from the attached registry
for path in sample_pooch.registry:
    sample_pooch.fetch(path)

ECLIPS-2.0 dataset#

To overlay wildfire polygons with climate data, and to perform future projections of wildfire hazard, a climate dataset is needed.

Here, we download the ECLIPS-2.0 dataset from Zenodo with pooch.

Important

The data is available as an 87.2 GB-large 7z archive that has to be unpacked after downloading. This requires a corresponding amount of free disk space and a 7z-capable unpacking application.

pooch.retrieve(
    "doi:10.5281/zenodo.3952159/ECLIPS2.0.7z",
    path=".",
    fname="ECLIPS2.0.7z",
    known_hash="e70aee55e365f734b2487c7b8b90f02fc57f6ff80f23f3980d1576293fbadea6",
    progressbar=True
)

Choose your preferred unpacking application to extract the contents of the ECLIPS2.0.7z file into the workflow directory. Here, we use the command line application 7za that should have been installed with the p7zip package in the climaax_fire conda environment. This will take some time.

! 7za x "ECLIPS2.0.7z"
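
Alternatively, the archive can be unpacked from Python. A minimal sketch using the py7zr package (an assumption: py7zr is not part of the climaax_fire environment and has to be installed separately):

import py7zr

# Extract the archive into the workflow directory (this will take some time)
with py7zr.SevenZipFile("ECLIPS2.0.7z", mode="r") as archive:
    archive.extractall(path=".")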

Structure of the ECLIPS2.0 folder#

Four different folders:

  • ECLIPS2.0_196190: Historical data from 1961 to 1990

  • ECLPS2.0_199110: Historical data from 1991 to 2010 (note the typo in the folder name)

  • ECLIPS2.0_RCP45: Emission scenario RCP4.5

  • ECLIPS2.0_RCP85: Emission scenario RCP8.5

Structure of the ECLIPS2.0_RCP45 folder#

This folder, analogous in structure to the RCP85 folder, contains 5 subfolders:

  • CLMcom_CCLM_4.5

  • CLMcom_CLM_4.5

  • DMI_HIRAM_4.5

  • KMNI_RAMCO_4.5

  • MPI_CSC_REMO2009_4.5

I assumed that CCLM stands for the regional nonhydrostatic Consortium for Small-Scale Modeling (COSMO) model in Climate Mode (COSMO-CLM, abbreviated as CCLM) [1], but this is not in line with the description of the dataset [2].

CLMcom stands for Climate Limited-Area Modelling Community [2]. In [2], the authors specified that they used five daily bias-corrected regional climate model results out of the nine projections available in the EURO-CORDEX database at the time of their study. The criteria used to select the five climate projections were as follows: (a) representation of all available RCMs and GCMs; (b) two RCMs being nested in the same driving GCM; and (c) one RCM being driven by two different GCMs. These criteria were adopted to ensure the representativeness of all combinations of RCMs and GCMs available in the EURO-CORDEX database. The models were driven by two Representative Concentration Pathway scenarios, RCP4.5 and RCP8.5 (Moss et al., 2010). The simulations were run for the EUR-11 domain with 0.11° × 0.11° horizontal resolution (Giorgi et al., 2009; Jacob et al., 2014).

  • [1] Rolf Zentek and Günther Heinemann: Verification of the regional atmospheric model CCLM v5.0 with conventional data and lidar measurements in Antarctica. Geoscientific Model Development, 13, 1809–1825, 2020. https://doi.org/10.5194/gmd-13-1809-2020

  • [2] Debojyoti Chakraborty, Laura Dobor, Anita Zolles, Tomáš Hlásny, and Silvio Schueler: High-resolution gridded climate data for Europe based on bias-corrected EURO-CORDEX: The ECLIPS dataset. Geoscience Data Journal, 2021; 8: 121–131. DOI: 10.1002/gdj3.110

Data in the CLMcom_CCLM_4.5 folder#

Variable table

| Acronym | Variable name | Unit |
|---|---|---|
| MWMT | Mean warmest month temperature | °C |
| MCMT | Mean coldest month temperature | °C |
| TD | Continentality | °C |
| AHM | Annual heat:moisture index | °C/mm |
| SHM | Summer heat:moisture index | °C/mm |
| DDbelow0 | Degree-days below 0°C | °C |
| DDabove5 | Degree-days above 5°C | °C |
| DDbelow18 | Degree-days below 18°C | °C |
| DDabove18 | Degree-days above 18°C | °C |
| NFFD | Number of frost-free days | - |
| FFP | Longest frost-free period | days |
| bFFP | Beginning of FFP | day |
| eFFP | End of FFP | day |
| EMT | Extreme minimum temperature | °C |
| MAT | Annual mean temperature | °C |
| MAP | Annual total precipitation | mm |
| Tmin_an | Annual mean of minimum temperature | °C |
| Tmax_an | Annual mean of maximum temperature | °C |
| Tmax_01 to Tmax_12 | Maximum monthly temperatures | °C |
| Tmin_01 to Tmin_12 | Minimum monthly temperatures | °C |
| Tave_01 to Tave_12 | Mean monthly temperatures | °C |
| Tave_at | Mean autumn temperature | °C |
| Tave_sm | Mean summer temperature | °C |
| Tave_sp | Mean spring temperature | °C |
| Tave_wt | Mean winter temperature | °C |
| Tmax_at | Maximum autumn temperature | °C |
| Tmax_sm | Maximum summer temperature | °C |
| Tmax_sp | Maximum spring temperature | °C |
| Tmax_wt | Maximum winter temperature | °C |
| Tmin_at | Minimum autumn temperature | °C |
| Tmin_sm | Minimum summer temperature | °C |
| Tmin_sp | Minimum spring temperature | °C |
| Tmin_wt | Minimum winter temperature | °C |
| PPT_at | Mean autumn precipitation | mm |
| PPT_sm | Mean summer precipitation | mm |
| PPT_sp | Mean spring precipitation | mm |
| PPT_wt | Mean winter precipitation | mm |
| PPT_01 to PPT_12 | Mean monthly precipitation | mm |

The data is structured like this:

| Feature | Description |
|---|---|
| Resolution | 30 arcsec |
| Coordinate System | WGS 84 |
| Projection | CRS ("+proj=longlat +datum=WGS84"), (EPSG:4326) |
| Data Format | GeoTIFF |
| Extent | -32.65000, 69.44167, 30.87892, 71.57893 (xmin, xmax, ymin, ymax) |
| Temporal scale | Past climate: means of 1961-1990 & 1991-2010; future periods: means of 2011-2020, 2021-2040, 2041-2060, 2061-2080, 2081-2100 |
| Climate forcing scenarios | RCP 8.5 and RCP 4.5 |
| Number of variables | 80 |
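
These properties can be verified on one of the extracted GeoTIFF files with rasterio (a minimal sketch; the file name follows the naming pattern used in the path configuration below):

# Check one ECLIPS-2.0 layer against the table above
with rasterio.open("./ECLIPS2.0/ECLIPS2.0_196190/MAT_196190.tif") as src:
    print(src.crs)     # expected: EPSG:4326
    print(src.res)     # expected: roughly (0.00833, 0.00833), i.e. 30 arcsec
    print(src.bounds)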

The data we want is the following set of variables:

  • MWMT, Mean warmest month temperature

  • TD, Continentality

  • AHM, Annual Heat-Moisture Index

  • SHM, Summer Heat-Moisture Index

  • DDbelow0, Degree-days below 0°C

  • DDabove18, Degree-days above 18°C

  • MAT, Annual mean temperature

  • MAP, Annual total precipitation

  • Tave_sm, Mean summer temperature

  • Tmax_sm, Maximum summer temperature

  • PPT_at, Mean autumn precipitation

  • PPT_sm, Mean summer precipitation

  • PPT_sp, Mean spring precipitation

  • PPT_wt, Mean winter precipitation

These descriptors capture features related to fuel availability and fire danger. MWMT, TD, DDbelow0, DDabove18, MAT, Tave_sm and Tmax_sm are related to temperature, with a particular focus on the summer season. AHM and SHM are related to the interplay between heat and moisture, also with a focus on the summer period. MAP, PPT_at, PPT_sm, PPT_sp and PPT_wt are related to precipitation, with a particular focus on seasonality.

Path configuration#

Paths to data for nearly all the layers used in the notebook (mostly raster files and shapefiles).

ECLIPS2p0_path = "./ECLIPS2.0/"
dem_path = "./data_cat/hazard/input_hazard/dem2_3035.tif"
dem_path_clip = "./data_cat/hazard/input_hazard/dem_3035_clip.tif"
slope_path = "./dem_processing/slope.tif"
aspect_path = "./dem_processing/aspect.tif"
easting_path = "./dem_processing/easting.tif"

northing_path = "./dem_processing/northing.tif"
roughness_path = "./dem_processing/roughness.tif"

clc_path = "./data_cat/hazard/input_hazard/veg_corine_reclass.tif"
clc_path_clip = "./data_cat/hazard/input_hazard/veg_corine_reclass_clip.tif"
clc_path_clip_nb = "./data_cat/hazard/input_hazard/veg_corine_reclass_clip_nb.tif"
fires_raster_path = "./data_cat/hazard/input_hazard/fires_raster.tif"

res_clim_dir_path = "./resized_climate/"
folder_1961_1990 = ECLIPS2p0_path + "ECLIPS2.0_196190/"
folder_1991_2010 = ECLIPS2p0_path + "ECLPS2.0_199110/"  # the typo is in the dataset's folder name
folder_rcp45 = ECLIPS2p0_path + "ECLIPS2.0_RCP45/"
folder_rcp85 = ECLIPS2p0_path + "ECLIPS2.0_RCP85/"
CLMcom_CCLM_4p5_path = folder_rcp45 + "CLMcom_CCLM_4.5/"
CLMcom_CCLM_8p5_path = folder_rcp85 + "CLMcom_CCLM_8.5/"

Names of the climate variables taken from the ECLIPS2.0 dataset and related file names for all time periods and climate scenarios:

var_names = ["MWMT", "TD", "AHM", "SHM", "DDbelow0", "DDabove18", "MAT", "MAP", "Tave_sm", "Tmax_sm", "PPT_at", "PPT_sm", "PPT_sp", "PPT_wt"]

# Historical
f_hist_1961_1990 = [folder_1961_1990 + vv + "_196190.tif" for vv in var_names ]
f_hist_1991_2010 = [folder_1991_2010 + vv + "_199110.tif" for vv in var_names ]

# RCP 4.5
f_rcp45_2011_2020 = [ CLMcom_CCLM_4p5_path + vv + "_201120.tif" for vv in var_names ]
f_rcp45_2021_2040 = [ CLMcom_CCLM_4p5_path + vv + "_202140.tif" for vv in var_names ]
f_rcp45_2041_2060 = [ CLMcom_CCLM_4p5_path + vv + "_204160.tif" for vv in var_names ]
f_rcp45_2061_2080 = [ CLMcom_CCLM_4p5_path + vv + "_206180.tif" for vv in var_names ]

# RCP 8.5
f_rcp85_2011_2020 = [ CLMcom_CCLM_8p5_path + vv + "_201120.tif" for vv in var_names ]
f_rcp85_2021_2040 = [ CLMcom_CCLM_8p5_path + vv + "_202140.tif" for vv in var_names ]
f_rcp85_2041_2060 = [ CLMcom_CCLM_8p5_path + vv + "_204160.tif" for vv in var_names ]
f_rcp85_2061_2080 = [ CLMcom_CCLM_8p5_path + vv + "_206180.tif" for vv in var_names ]
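
Before moving on, it can be useful to check that the expected climate rasters exist on disk (a minimal sketch; it assumes the ECLIPS2.0 archive was extracted as described above):

# Optional sanity check: report any missing climate raster
for file_list in [f_hist_1961_1990, f_hist_1991_2010,
                  f_rcp45_2021_2040, f_rcp45_2041_2060,
                  f_rcp85_2021_2040, f_rcp85_2041_2060]:
    for f in file_list:
        if not os.path.exists(f):
            print("missing:", f)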

Outputs and misc paths:

# Present Susceptibility and Hazard
output_susc_present = "./data_cat/hazard/Hazard_output/my_suscep_present.tif"
output_hazard_present = "./data_cat/hazard/Hazard_output/my_hazard_present.tif"

# RCP 4.5 Susceptibility and Hazard
output_susc_4p5_2021_2040 = "./data_cat/hazard/Hazard_output/my_suscep_4p5_2021_2040.tif"
output_susc_4p5_2041_2060 = "./data_cat/hazard/Hazard_output/my_suscep_4p5_2041_2060.tif"
output_hazard_4p5_2021_2040 = "./data_cat/hazard/Hazard_output/my_hazard_4p5_2021_2040.tif"
output_hazard_4p5_2041_2060 = "./data_cat/hazard/Hazard_output/my_hazard_4p5_2041_2060.tif"

# RCP 8.5 Susceptibility and Hazard
output_susc_8p5_2021_2040 = "./data_cat/hazard/Hazard_output/my_suscep_8p5_2021_2040.tif"
output_susc_8p5_2041_2060 = "./data_cat/hazard/Hazard_output/my_suscep_8p5_2041_2060.tif"
output_hazard_8p5_2021_2040 = "./data_cat/hazard/Hazard_output/my_hazard_8p5_2021_2040.tif"
output_hazard_8p5_2041_2060 = "./data_cat/hazard/Hazard_output/my_hazard_8p5_2041_2060.tif"

os.makedirs("./data_cat/hazard/Hazard_output", exist_ok=True)

In the following cell, the DEM raster is clipped using the extent of the shapefile of the sample region. Execute the cell only if you want to redo the clipping: for the Catalonia sample data this has already been done. If a shapefile of your region is available, the DEM can be clipped to it.

# Clip raster with shapefile of area of interest (Catalonia - ALREADY DONE)
# The DEM file has been clipped to the extent of shapefile.

# Load the shapefile: example file for Catalonia
catalonia_adm_path = "./data_cat/adm_level_stanford/catalonia_adm_3035.shp"
gdf = gpd.read_file(catalonia_adm_path)
shapes = gdf.geometry.values

# Clip the DEM file
with rasterio.open(dem_path) as src:
    out_image, out_transform = mask(src, shapes, crop=True)
    out_meta = src.meta.copy()

out_meta.update({"driver": "GTiff",
                 "height": out_image.shape[1],
                 "width": out_image.shape[2],
                 "transform": out_transform})
# Save the clipped raster
with rasterio.open("./provaclip.tiff", "w", **out_meta) as dest:
    dest.write(out_image)

Extract bounds of area of interest’s DEM#

We want to clip all climate data to this extent:

with rasterio.open(dem_path_clip) as src:
    left, bottom, right, top = src.bounds
    print(src.bounds)
BoundingBox(left=3488731.355109199, bottom=1986586.6503754416, right=3769731.355109199, top=2241986.6503754416)

Resizing of the climate rasters#

Execute this part only if you want to resize the rasters.

# This is the path of the folder where the resized climate rasters will be stored
output_folder = "./resized_climate/"

# Create the output folders if they don't exist
os.makedirs(output_folder, exist_ok=True)

# In the folder there will be 4 subfolders, one for each time period
raster_subfolders = ["rcp45_2011_2020", "rcp45_2021_2040", "rcp45_2041_2060",
                     'rcp85_2011_2020', 'rcp85_2021_2040', 'rcp85_2041_2060',
                     "hist_1961_1990", "hist_1991_2010"]

# Loop through the lists and perform gdalwarp for each raster... I need to do a zip with the subfolder list and the raster list
for raster_list, raster_subfolder_name in zip([ f_rcp45_2011_2020, f_rcp45_2021_2040, f_rcp45_2041_2060,
                                                f_rcp85_2011_2020, f_rcp85_2021_2040, f_rcp85_2041_2060,
                                                f_hist_1961_1990, f_hist_1991_2010],
                                                raster_subfolders):
    for raster_path in tqdm(raster_list, desc = f"Processing {raster_subfolder_name}"):
        # Get the filename from the path
        filename = os.path.basename(raster_path)
        output_subfolder = os.path.join(output_folder, raster_subfolder_name)
        # Construct the output path by joining the output folder, the subfolder and the filename
        output_path = os.path.join(output_subfolder, filename)
        # create the output subfolder if it doesn't exist
        os.makedirs(output_subfolder, exist_ok=True)
        # Perform gdalwarp using the string command. I use the data of the blueprint DEM raster as a reference
        os.system(f"gdalwarp -t_srs EPSG:3035 -tr 100 -100 -te {left} {bottom} {right} {top} -r bilinear  -overwrite -co COMPRESS=LZW -co TILED=YES -co BIGTIFF=YES -co BLOCKXSIZE=512 -co BLOCKYSIZE=512 {raster_path} {output_path}")

Check if the dimensions of the output rasters are the same as the reference raster:

# Path to the reference raster
reference_raster_path = dem_path_clip
check_resizing = False
if check_resizing:
    # Loop through the output files and compare their dimensions with the reference raster
    for raster_list, raster_subfolder_name in zip([f_rcp45_2011_2020, f_rcp45_2021_2040, f_rcp45_2041_2060,
                                                    f_rcp85_2011_2020, f_rcp85_2021_2040, f_rcp85_2041_2060,
                                                    f_hist_1961_1990, f_hist_1991_2010], raster_subfolders):
        for raster_path in tqdm(raster_list, desc=f"Processing {raster_subfolder_name}"):
            # Open the output raster
            output_raster_path = os.path.join(output_folder, raster_subfolder_name, os.path.basename(raster_path))
            with rasterio.open(output_raster_path) as output_raster:
                # Open the reference raster
                with rasterio.open(reference_raster_path) as reference_raster:
                    # Get the dimensions of the output raster
                    output_rows, output_cols = output_raster.shape
                    # Get the dimensions of the reference raster
                    reference_rows, reference_cols = reference_raster.shape

                    # Compare the dimensions
                    if output_rows == reference_rows and output_cols == reference_cols:
                        print(f"{raster_path} has the same dimensions as the reference raster.")
                    else:
                        print(f"{raster_path} does NOT have the same dimensions as the reference raster.")

Clip the CORINE raster to the extent of the area of interest:

raster_clc_path = "./data_cat/hazard/input_hazard/veg_corine_reclass.tif"
output_clc_path = "./data_cat/hazard/input_hazard/veg_corine_reclass_clip.tif"

# Clip, warp and explore the corine raster: DONE for Catalonia example data
# and reprojection to EPSG:3035 (no need to run again)!
os.system(f"gdalwarp -t_srs EPSG:3035 -tr 100 -100 -te {left} {bottom} {right} {top} -r near  -overwrite -co COMPRESS=LZW -co TILED=YES -co BIGTIFF=YES -co BLOCKXSIZE=512 -co BLOCKYSIZE=512 {raster_clc_path} {output_clc_path}")
Creating output file that is 2810P x 2554L.
Processing ./data_cat/hazard/input_hazard/veg_corine_reclass.tif [1/1] : 0Using internal nodata values (e.g. 0) for image ./data_cat/hazard/input_hazard/veg_corine_reclass.tif.
Copying nodata values from source ./data_cat/hazard/input_hazard/veg_corine_reclass.tif to destination ./data_cat/hazard/input_hazard/veg_corine_reclass_clip.tif.
...10...20...30...40...50...60...70...80...90...100 - done.
0

Obtain the slope and aspect layers#

Process DEM: calculate slope, aspect and roughness and save them into the output folder.

def save_raster_as(array, output_file, reference_file, **kwargs):
    """Save a raster from a 2D numpy array using another raster as reference to get the spatial extent and projection.
    
    :param array: 2D numpy array with the data
    :param output_file: Path to the output raster
    :param reference_file: Path to a raster whose geotransform and projection will be used
    :param kwargs: Keyword arguments to be passed to rasterio.open when creating the output raster
    """
    with rasterio.open(reference_file) as f:
        profile = f.profile
        profile.update(**kwargs)
        with rasterio.open(output_file, 'w', **profile) as dst:
            dst.write(array.astype(profile['dtype']), 1)


def plot_raster_V2(raster,ref_arr, cmap = 'seismic', title = '', figsize = (10, 8), dpi = 300, outpath = None,
                    array_classes = [], classes_colors = [], classes_names = [], shrink_legend = 1, xy = (0.5, 1.1), labelsize = 10,
                    basemap = False,
                    basemap_params = {'crs' : 'EPSG:4326', 'source' : None, 'alpha' : 0.5, 'zoom' : '11'},
                    add_to_ax: tuple = None):
    '''Plot a raster object with possibility to add basemap and continuing to build upon the same ax.

    Example with discrete palette:
    array_classes = [-1, 0, 0.2, 0.4, 0.6, 0.8, 1],  # all the values including nodata
    classes_colors = ['#0bd1f700','#0bd1f8', '#1ff238', '#ea8d1b', '#dc1721', '#ff00ff'], # a color for each range
    classes_names = [ 'no data', 'Very Low', 'Low', 'Medium', 'High', 'Extreme'], # names

    add_to_ax: pass an axs to overlay other object to the same ax. it is a tuple (fig, ax)
    '''
    if add_to_ax is None:
        fig, ax = plt.subplots(figsize=figsize, dpi = dpi)
    else:
        fig = add_to_ax[0]
        ax = add_to_ax[1]

    if len(array_classes) > 0 and len(classes_colors) > 0 and len(classes_names) > 0:
        cmap = colors.ListedColormap(classes_colors)
        norm = colors.BoundaryNorm(array_classes, cmap.N)

        # plot the raster
        f = show(np.where(ref_arr == -9999, np.nan, raster), ax = ax,
                cmap = cmap, norm=norm, interpolation='none')

        img = f.get_images()[0]

        # trick to shift ticks labels in the center of each color
        cumulative = np.cumsum(array_classes, dtype = float)
        cumulative[2:] = cumulative[2:] - cumulative[:-2]
        ticks_postions_ = cumulative[2 - 1:]/2
        ticks_postions = []
        ticks_postions.extend(list(ticks_postions_))

        # plot colorbar
        cbar = fig.colorbar(img, boundaries=array_classes, ticks=ticks_postions, shrink = shrink_legend)
        cbar.ax.set_yticklabels(classes_names)
        cbar.ax.tick_params(labelsize = labelsize)
    else:
        # use imshow so that we have something to map the colorbar to
        image = show(np.where(ref_arr == -9999, np.nan, raster), ax = ax,
                    cmap = cmap)
        img = image.get_images()[0]
        cbar = fig.colorbar(img, ax=ax,  shrink = shrink_legend)
        cbar.ax.tick_params(labelsize = labelsize)

    ax.set_xticks([])
    ax.set_yticks([])
    for s in [ "top", 'bottom', "left", 'right']:
        ax.spines[s].set_visible(False)

    ax.annotate(title,
                xy = xy, xytext = xy, va = 'center', ha = 'center',
                xycoords  ='axes fraction', fontfamily='sans-serif', fontsize = 12, fontweight='bold')

    if basemap:
        if basemap_params['source'] is None:
            cx.add_basemap(ax, crs = basemap_params['crs'], source = cx.providers.OpenStreetMap.Mapnik, alpha = basemap_params['alpha'],
                            zorder = -1)
        else:
            cx.add_basemap(ax, crs = basemap_params['crs'], source = basemap_params['source'], alpha = basemap_params['alpha'],
                            zorder = -1, zoom = basemap_params['zoom'])

    if outpath is not None:
        fig.savefig(outpath, dpi = dpi, bbox_inches = 'tight')

    return fig, ax


def save_raster_as_h(array, output_file, reference_file, **kwargs):
    """Save a raster from a 2D numpy array using another raster as reference to get the spatial extent and projection.
    
    :param array: 2D numpy array with the data
    :param output_file: Path to the output raster
    :param reference_file: Path to a raster whose geotransform and projection will be used
    :param kwargs: Keyword arguments to be passed to rasterio.open when creating the output raster
    """
    with rasterio.open(reference_file) as f:
        profile = f.profile
        profile.update(**kwargs)
        mask = array == profile['nodata']
        array  = np.ma.array(array, mask=mask)
        with rasterio.open(output_file, 'w', **profile) as dst:
            dst.write(array.astype(profile['dtype']), 1)


def process_dem(dem_path, output_folder, verbose = False):
    """Calculate slope and aspect from a DEM and save them to the output folder.
    
    :param dem_path: Path to the DEM
    :param output_folder: Path to the output folder
    :param verbose: If True, print some messages
    :return: Nothing
    """
    slope_path = os.path.join(output_folder, "slope.tif")
    aspect_path = os.path.join(output_folder, "aspect.tif")
    northing_path = os.path.join(output_folder, "northing.tif")
    easting_path = os.path.join(output_folder, "easting.tif")
    roughness_path = os.path.join(output_folder, "roughness.tif")
    # Create the output folder if it doesn't exist
    os.makedirs(output_folder, exist_ok=True)

    with rasterio.open(dem_path) as src:
        if verbose:
            print(f'Reading dem file {dem_path}')
        dem = src.read(1, masked  = True)
        if verbose:
            print(f'This is what {dem_path} looks like')
            plt.imshow(dem)
            plt.title('DEM')
            plt.colorbar(shrink = 0.5)
            plt.xticks([])
            plt.yticks([])
            plt.show()
            print('Calculating slope, aspect and roughness')
        # Compute slope, aspect and roughness regardless of the verbosity setting
        gdal.DEMProcessing(slope_path, dem_path, 'slope')
        gdal.DEMProcessing(aspect_path, dem_path, 'aspect')
        gdal.DEMProcessing(roughness_path, dem_path, 'roughness')

    with rasterio.open(aspect_path) as f:
        if verbose:
            print(f'Calculating northing and easting files')
            print(f'Reading aspect file {aspect_path}')
        aspect = f.read(1,   masked = True)
        if verbose:
            print(f'Aspect looks like this...')
            plt.imshow(aspect)
            plt.title('Aspect')
            plt.colorbar(shrink = 0.5)
            plt.xticks([])
            plt.yticks([])
            plt.show()
        #aspect[aspect <= -9999] = np.NaN
    northing = np.cos(aspect * np.pi/180.0)
    print(f'Saving northing file {northing_path}')
    save_raster_as(northing, northing_path, aspect_path)
    del northing
    print(f'Saving easting file {easting_path}')
    easting = np.sin(aspect * np.pi/180.0)
    save_raster_as(easting, easting_path, aspect_path)
    del easting


def resize_rasters(path_reference_raster, path_rasters_to_resize = None, output_folder = None, verbose = True):
    """This function resizes the rasters to the same size as the last raster.
    
    path_reference_raster: path to the reference raster of the simulation.
    path_rasters_to_resize: list of paths to the rasters to resize. If None, it will resize all the rasters in the folder
    which are not the reference raster.
    verbose: if True, it will print the path of the rasters that are being resized as well as additional information.
    """
    with rasterio.open(path_reference_raster) as src:
        dst_crs = src.crs
        dst_transform = src.transform
        dst_height = src.height
        dst_width = src.width
        dst_bounds = src.bounds
        dst_meta = src.meta
        if verbose:
            print("destination raster crs:", dst_crs)
            print("destination raster transform:", dst_transform)
            print("destination raster height:", dst_height)
            print("destination raster width:", dst_width)
            print("destination raster bounds:", dst_bounds)

    if path_rasters_to_resize is None:
        # in this case, I will search for all the rasters in the folder of the last raster, which are not the last raster
        path_rasters_to_resize = []
        # I get the folder given the path of the last raster
        input_folder = os.path.dirname(path_reference_raster)
        # I get the name of the last raster
        target_file = os.path.basename(path_reference_raster)
        for filename in os.listdir(input_folder):
            if filename.endswith('.tiff') and (os.path.basename(filename) != target_file):
                path_rasters_to_resize.append(os.path.join(input_folder, filename))
    if verbose:
        print("rasters to resize:", path_rasters_to_resize)

    for path_raster in path_rasters_to_resize:
        if verbose:
            print("resizing raster:", path_raster)
        with rasterio.open(path_raster) as src:
            src_crs = src.crs
            src_transform = src.transform
            src_height = src.height
            src_width = src.width
            src_bounds = src.bounds
            if verbose:
                print("source raster crs:", src_crs)
                print("source raster transform:", src_transform)
                print("source raster height:", src_height)
                print("source raster width:", src_width)
                print("source raster bounds:", src_bounds)
            #calculate transform array and shape of reprojected raster
            transform, width, height = calculate_default_transform(
            src_crs, dst_crs, dst_width, dst_height, *dst_bounds)
            if verbose:
                print("transform array of source raster")
                print(src_transform)
                print("transform array of destination raster")
                print(transform)
            #working off the meta for the destination raster
            kwargs = src.meta.copy()
            kwargs.update({
            'crs': dst_crs,
            'transform': transform,
            'width': width,
            'height': height
            })
            # open  newly resized raster, write it and save it
            # I will use the same name as the original raster, but with the suffix "_resized"
            # I will save it in the same folder as the original raster
            if output_folder is None:
                output_folder = os.path.dirname(path_raster)
            with rasterio.open(os.path.join(output_folder, os.path.basename(path_raster).split('.tiff')[0] + "_resized0.tiff"), 'w', **kwargs)  as resized_raster:
                    #reproject and save raster band data
                    for i in range(1, src.count + 1):
                        reproject(
                        source=rasterio.band(src, i),
                        destination=rasterio.band(resized_raster, i),
                        src_transform=src_transform,
                        src_crs=src_crs,
                        dst_transform=transform,
                        dst_crs=dst_crs,
                        resampling=Resampling.nearest)


def clean_resized_tiff(folder_name):
    """This function removes the resized tiff files.
    
    It deletes any tiff file with the suffix "_resized" in its name.
    """
    for filename in os.listdir(folder_name):
        if filename.endswith('.tiff') and ("_resized" in filename):
            os.remove(os.path.join(folder_name, filename))
            print("removed file:", filename)


def rasterize_numerical_feature(gdf, reference_file, column=None, verbose = True):
    """Rasterize a vector file using a reference raster to get the shape and the transform.
    
    :param gdf: GeoDataFrame with the vector data
    :param reference_file: Path to the reference raster
    :param column: Name of the column to rasterize. If None, it will rasterize the geometries
    :return: Rasterized version of the vector file
    """
    with rasterio.open(reference_file) as f:
        out = f.read(1,   masked = True)
        myshape = out.shape
        mytransform = f.transform
    del out
    if verbose:
        print("Shape of the reference raster:", myshape)
        print("Transform of the reference raster:", mytransform)
    out_array = np.zeros(myshape)
    # this is where we create a generator of geom, value pairs to use in rasterizing
    if column is not None:
        shapes = ((geom, value) for geom, value in zip(gdf.geometry, gdf[column]))
    else:
        shapes = ((geom, 1) for geom in gdf.geometry)
    print("Features.rasterize is launched...")
    burned = features.rasterize(shapes=shapes, fill=np.nan, out=out_array, transform=mytransform)#, all_touched=True)
    print("Features.rasterize is done...")
    return burned
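
The plotting helper plot_raster_V2 is not exercised in this section; a minimal usage sketch on the clipped DEM (assuming, as elsewhere in this notebook, that -9999 marks nodata) could look like this:

# Hypothetical example: plot the clipped DEM with plot_raster_V2,
# reusing the DEM array as its own nodata reference
with rasterio.open(dem_path_clip) as src:
    dem_arr = src.read(1)
fig, ax = plot_raster_V2(dem_arr, dem_arr, cmap='terrain', title='DEM')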

Run the function to calculate slope, aspect and roughness:

process_dem(dem_path_clip, "./dem_processing", verbose = True)
Reading dem file ./data_cat/hazard/input_hazard/dem_3035_clip.tif
This is what ./data_cat/hazard/input_hazard/dem_3035_clip.tif looks like
[figure: DEM]
Calculating slope, aspect and roughness
Calculating northing and easting files
Reading aspect file ./dem_processing/aspect.tif
Aspect looks like this...
[figure: aspect]
Saving northing file ./dem_processing/northing.tif
Saving easting file ./dem_processing/easting.tif

Land cover data#

# Create an info dataframe on CLC land cover (DONE)
clc_leg = pd.read_excel("./data_cat/hazard/input_hazard/clc2000legend.xls")

# Replace NaNs in the RGB column with "0-0-0"
clc_leg["RGB"] = clc_leg["RGB"].fillna("0-0-0")

# Create dataframe with CLC_CODE, R, G, B (using split of RGB of clc_leg)
clc_leg["R"] = clc_leg["RGB"].apply(lambda x: int(x.split("-")[0]))
clc_leg["G"] = clc_leg["RGB"].apply(lambda x: int(x.split("-")[1]))
clc_leg["B"] = clc_leg["RGB"].apply(lambda x: int(x.split("-")[2]))

# I need to create a row for accounting for 0 values in the raster
clc_leg = pd.concat([clc_leg, pd.DataFrame({"CLC_CODE":0, "LABEL3":"No data/Not Burnable", "R":0, "G":0, "B":0}, index = [0])]).reset_index(drop = True)

# create a colormap for the land cover raster
clc_colormap = {}

# Create a colormap from the DataFrame
# For all clc codes
cmap = mcolors.ListedColormap(clc_leg[['R', 'G', 'B']].values/255.0, N=clc_leg['CLC_CODE'].nunique())
# For all clc codes except not burnable codes
cmap2 = mcolors.ListedColormap(clc_leg[['R', 'G', 'B']].values/255.0, N=clc_leg['CLC_CODE'].nunique() - 1)

# Create a list to hold the legend elements
legend_elements = []
# Iterate over the rows of the DataFrame
for _, row in clc_leg.iterrows():
    # Create a patch for each CLC code with the corresponding color and label
    color = (row['R']/255.0, row['G']/255.0, row['B']/255.0)
    label = str(row['CLC_CODE'])
    patch = mpatches.Patch(color=color, label=label)
    # Add the patch to the legend elements list
    legend_elements.append(patch)

# PLOT THE LAND COVER RASTER
# Open the shapefile
catalonia_adm_path = "./data_cat/adm_level_stanford/catalonia_adm_3035.shp"
catalonia = gpd.read_file(catalonia_adm_path)

# Open the raster data
raster_clc_clipped = "./data_cat/hazard/input_hazard/veg_corine_reclass_clip.tif"
with rasterio.open(raster_clc_clipped) as src:
    # Read the raster band
    band = src.read(1)

    fig, ax = plt.subplots(figsize=(15, 15))
    # Plot the raster
    rasterio.plot.show(src, ax=ax, cmap = cmap)
    # Plot the shapefile
    catalonia.plot(ax=ax, facecolor="none", edgecolor="black", linewidth=2)
    # band is the raster data.  I want to know nrows, ncols, NODATA_value, dtype.
    ax.legend(handles=legend_elements,  loc='lower left' , bbox_to_anchor = (-.1,0), title="CLC_CODE")
    ax.set_title("Land cover", fontsize=20)
    ax.set_xticks([])
    ax.set_yticks([])
    print('The shape(rows-columns) of LC map is: ', band.shape,'\n','Datatype is: ', band.dtype,'\n', 'The range of values are: ' , band.min(),band.max())
    print('Values of LC codes are: ', '\n', np.unique(band))
The shape(rows-columns) of LC map is:  (2554, 2810) 
 Datatype is:  int16 
 The range of values are:  0 523
Values of LC codes are:  
 [  0 111 112 121 122 123 124 131 132 133 141 142 211 212 213 221 222 223
 231 241 242 243 311 312 313 321 322 323 324 331 332 333 334 335 411 421
 422 511 512 521 523]
[figure: land cover map with CLC_CODE legend]

Save the raster of clc with all non-burnable classes set to 0:

list_of_non_burnable_clc = [111,112,121,122,123,124,131,132,133,141,142,
                            331,332,333,335,411,412,421,422,423,511,512,521,522,523]

# Below, some considerations on the 3XX classes of the CLC and their degree of burnability.
# 331, 332, 333 are respectively beaches, dunes, sands; bare rocks; sparsely vegetated areas: considered non-burnable here
# 334 is burnt areas: considered burnable here
# 335 is glaciers and perpetual snow: considered non-burnable here

# Creating legends for newclc
burnables = set(clc_leg['CLC_CODE'].tolist()) - set(list_of_non_burnable_clc)
burnables = list(burnables)
cmap2_gdf = clc_leg.loc[:,['CLC_CODE','R', 'G', 'B']]
cmap2_gdf = cmap2_gdf[cmap2_gdf['CLC_CODE'].isin(burnables)]
cmap2 = mcolors.ListedColormap(cmap2_gdf[['R', 'G', 'B']].values/255.0, N=cmap2_gdf['CLC_CODE'].nunique()-1)

# Open the raster
output_clc_path = "./data_cat/hazard/input_hazard/veg_corine_reclass_clip_nb.tif"
ref = rasterio.open(reference_raster_path).read(1)
with rasterio.open(raster_clc_clipped) as src:
    # Read the raster band
    band = src.read(1)
    # First plot, left side of the multiplot
    fig, ax = plt.subplots(figsize=(10, 10))
    # Plot the raster
    plt.imshow(np.where(ref == -9999, np.nan, band), cmap = cmap)
    plt.title('Original CLC')
    plt.colorbar(shrink = 0.5, label = 'CLC_CODE', ticks = [122, 221, 334, 512])
    plt.show()
    # Set the values in list_of_non_burnable_clc to 0
    band[np.isin(band, list_of_non_burnable_clc)] = 0
    # Plot the raster, right side of the multiplot
    fig, ax = plt.subplots(figsize=(10, 10))
    plt.imshow(np.where(ref == -9999, np.nan, band), cmap = cmap2)
    plt.title('Modified CLC by removing non-burnable classes')
    plt.colorbar(shrink = 0.5, label = 'CLC_CODE', ticks = [122, 221, 334, 512])
    plt.show()
    # Save the modified raster
    with rasterio.open(output_clc_path, 'w', **src.profile) as dst:
        dst.write(band, 1)
[figures: original CLC; modified CLC with non-burnable classes removed]

Cleaning Fire Data#

Data cleaning of the Catalan wildfires. Discard all rows whose "YEAR_FIRE" is not 4 characters long. How many rows are discarded? The discarded rows seem to be the overlapping fires, which is why they have multiple date entries.

fires_path = "./data_cat/hazard/input_hazard/fires/Forest_Fire_1986_2022_filtrat_3035.shp"
fires = gpd.read_file(fires_path)
print("Number of rows...", len(fires))

rows_discarded = len(fires) - len(fires[fires['YEAR_FIRE'].str.len() == 4])
fires_2 = fires[fires['YEAR_FIRE'].str.len() == 4]
fires_0 = fires[fires['YEAR_FIRE'].str.len() != 4]
print("Number of rows discarded:", rows_discarded)
print("These are the buggy entries in the FIRE_YEAR column...",fires_0.YEAR_FIRE.unique())
print("......")
print("I will convert to int the YEAR_FIRE column of the non buggy data...")

#fires_2.loc['YEAR_FIRE'] = fires_2['YEAR_FIRE'].astype(int)
fires_2.loc[:, 'YEAR_FIRE'] = fires_2['YEAR_FIRE'].astype(int) # to avoid setting with copy warning
print("The filtered dataset comprises of ", len(fires_2), "rows, spanning the years" , fires_2.YEAR_FIRE.min(), "to", fires_2.YEAR_FIRE.max())

# in the following, fires_2 will be our geo dataframe with the fires.
# Now selecting just the fires_2 with year > 1990
fires_2 = fires_2[fires_2['YEAR_FIRE'] > 1990]
print("After the filtering the final filtered dataset comprises of ", len(fires_2), "rows, spanning the years" , fires_2.YEAR_FIRE.min(), "to", fires_2.YEAR_FIRE.max())
Number of rows... 1217
Number of rows discarded: 395
These are the buggy entries in the FIRE_YEAR column... ['Unknown' '1995 - 2015 - 2012' '1995 - 2015 - 2021' '1995 - 2012 - 2021'
 '1995 - 2006 - 2010' '1995 - 1999 - 2002' '1995 - 2000 - 2021'
 '1986 - 1995 - 2019' '1986 - 1998 - 2004' '1988 - 1993 - 2001'
 '1986 - 2011' '1986 - 2012' '1986 - 1997' '1986 - 1995' '1998 - 2020'
 '1986 - 1999 - 2012' '1986 - 2003 - 2012' '1986 - 2004 - 2012'
 '1986 - 2006 - 2012' '1986 - 2007 - 2012' '1989 - 1994' '1989 - 2003'
 '1989 - 2005' '1989 - 2020' '1989 - 2014' '1989 -2012' '1990 - 1994'
 '1995 - 2010' '1995 - 2017' '1986 - 2000 - 2021' '1986 - 2001 - 2003'
 '1986 - 2001 - 2022' '1986 - 2002 - 2022' '1988 - 2001' '1988 - 2003'
 '1988 - 2009' '1988 - 2012' '1988 - 1991' '1988 - 1994' '1988 - 2007'
 '1986 - 2016' '1986 - 2022' '1986 - 1994' '1986 - 2005' '1994 - 2003'
 '1994 - 2011' '1994 - 1987' '1990 - 2016' '1990 - 2020' '2005 - 2020'
 '2005 - 2013' '2006 - 2012' '1986 - 2004 - 2022' '1986 - 1994 - 1997'
 '1986 - 1994 - 1996' '1994 - 2005' '1994 - 2015' '1994 - 2002'
 '1994 - 2000' '2002 - 2020' '2003 - 2004' '1994 - 2006' '2006 - 2010'
 '2007 - 2012' '2008 - 2010' '2008 - 2015' '2008 - 2020' '1986 - 1993'
 '1986 - 2000' '1986 - 2001' '1993 - 2005' '2003 - 2011' '2003 - 2012'
 '2003 - 2015' '2003 - 2020' '2003 - 2019' '1999 - 2002' '1999 - 2004'
 '1999 - 2006' '1999 - 2012' '1990 - 1993' '1990 - 2001' '1993 - 1995'
 '1993 - 1994' '1995 - 2012' '2004 - 2017' '2004 - 2012' '2004 - 2015'
 '2004 - 2006' '2004 - 2022' '1994 - 2003 - 2011' '1994 - 2005 - 2013'
 '1994 - 2005 - 2015' '1995 - 2000 - 2015' '1995 - 2000 - 2012'
 '1990 - 1991' '1990 - 2019' '1994 - 1998' '1995 - 2009' '1995 - 2006'
 '1986 - 1994 - 2003' '1986 - 1994 - 2011' '2005 - 2007' '2005 - 2017'
 '2005 - 2008' '2005 - 2021' '1986 - 2007' '1986 - 2013' '2009 - 2020'
 '2009 - 2016' '1986 - 1995 - 1999' '1986 - 1995 - 2002' '1986 - 2002'
 '1986 - 2003' '1986 - 2021' '1986 - 1988' '1986 - 1996' '1986 - 2004'
 '1986 - 1999 - 2002' '1986 - 2003 - 2011' '1986 - 1995 - 1997'
 '1986 - 2000 - 2010' '1986 - 2000 - 2022' '1994 - 2016' '1994 - 1995'
 '1994 - 2017' '2009 - 2019' '2018 - 2019' '2016 - 2019' '1993 - 2001'
 '1993 - 2003' '1993 - 2000' '1993 - 2022' '2012 - 2016' '2016 - 2021'
 '2015 - 2014' '2014 - 2022' '2012 - 2021' '2000 - 2018' '2000 - 2016'
 '2000 - 2021' '2000 - 2006' '2000 - 2003' '2000 - 2012' '2000 - 2007'
 '1988 - 2001 - 2003' '1989 - 2012 - 2017' '1991 - 1993' '1991 - 1994'
 '1991 - 2016' '1994 - 2012' '1995 - 1999' '1995 - 2002' '1996 - 2006'
 '1996 - 2000' '1989 - 1994 - 2001' '1989 - 1992 - 1994' '1986 - 2009'
 '1986 - 1999' '1986 - 1998' '1994 - 2007' '1994 - 2009' '1987 - 2012'
 '1986 - 1989 - 2000' '1986 - 1999 - 2004' '1986 - 1999 - 2006'
 '1989 - 2017' '1989 - 2012' '1989 - 2013' '1989 - 2001' '1988 - 1993'
 '1994 - 2013' '2013 - 2019' '2015 - 2019' '2012 - 2019' '2018 - 2012'
 '2018 - 2016' '1986 - 2015' '1994 - 1999' '1989 - 1993' '1989 - 1991'
 '1994 - 2020' '2000 - 2010' '2001 - 2003' '2001 - 2020' '1986 - 1989'
 '1991 - 2009' '1991 - 2019' '1997 - 2015' '1989 - 1992' '1989 - 2000'
 '1989 - 2007' '2001 - 2010' '2001 - 2018' '2002 - 2017' '1991 - 2020'
 '1991 - 1998' '1991 - 2000' '1995 - 2000' '1995 - 2015' '1997 - 2014'
 '1997 - 2006' '1997 - 2007' '1997 - 2005' '1989 - 2000 - 2020'
 '1989 - 2000 - 2018' '1994 - 2001' '1986 - 2012 - 2018'
 '1986 - 1993 - 2001' '1986 - 1993 - 2000' '1986 - 1993 - 2002'
 '1986 - 1993 - 2022' '1995 - 2021' '1995 - 1997' '1995 - 2013'
 '2012 - 2018' '2017 - 2020' '2017 - 2016' '1990 - 2016 - 2021'
 '1990 - 1993 - 1994' '1990 - 1993 - 2005' '1993 - 1994 - 2002'
 '1994 - 1995 - 1999' '1994 - 2014' '1997 - 2014 - 2015'
 '1998 - 2009 - 2020' '1999 - 2004 - 2012' '2012 - 2017' '2016 - 2020'
 '1994 - 1997' '1999 - 2006 - 2012' '2014 - 2015 - 2019'
 '2012 - 2016 - 2018' '1986 - 1999 - 2004 - 2012'
 '1986 - 1999 - 2006 - 2012' '1986 - 1993 - 2001 - 2022'
 '1986 - 1993 - 2002 - 2022' '1986 - 1994 - 2003 - 2011'
 '1986 - 1995 - 1999 - 2002' '1986 - 2006' '1992 - 1994' '1992 - 1995'
 '1992 - 2011' '1992 - 2020' '1998 - 2003' '1998 - 2009' '1994 - 1996'
 '1998 - 2006' '1995 - 2000 - 2012 - 2015' '1995 - 2000 - 2015 - 2021'
 '1995 - 2000 - 2012 - 2021' '1995 - 2012 - 2015 - 2021'
 '1999 - 2004 - 2006 - 2012' '1986 - 1999 - 2004 - 2006 - 2012'
 '1995 - 2000 - 2012 - 2015 - 2021' '1993 - 2002' '1998 - 2005'
 '1998 - 2004' '1986 - 2016 - 2022' '1986 - 2005 - 2007']
......
I will convert to int the YEAR_FIRE column of the non buggy data...
The filtered dataset comprises 822 rows, spanning the years 1986 to 2022
After the filtering, the final dataset comprises 717 rows, spanning the years 1991 to 2022

To avoid overfitting, I am going to delete two big fires (OBJECTID 198969.0 and 199033.0) from the dataset. They are located in the central part of Catalonia and do not represent the classical fire regime of the region.

fires_2 = fires_2.loc[(fires_2['OBJECTID'] != 199033.0) & (fires_2['OBJECTID'] != 198969.0),:]

The geodataframe of the fires is rasterized and saved to a raster file, using the CORINE land cover raster as a reference.

# Rasterize the fires...
raster_clc_clipped = "./data_cat/hazard/input_hazard/veg_corine_reclass_clip.tif"
fires_rasterized = rasterize_numerical_feature(fires_2, raster_clc_clipped, column=None)
# save to file
save_raster_as(fires_rasterized, fires_raster_path, raster_clc_clipped)
Shape of the reference raster: (2554, 2810)
Transform of the reference raster: | 100.00, 0.00, 3488731.36|
| 0.00,-100.00, 2241986.65|
| 0.00, 0.00, 1.00|
Features.rasterize is launched...
Features.rasterize is done...

Visualization#

Check that the rasterized fires take only the values 0 and 1 and contain no NaN values.

# Values of the rasterized fires
print('The values of fire rasterised file are: ', np.unique(fires_rasterized))

# Does the rasterized files have nan?
print('There is a NaN in file: ', np.isnan(fires_rasterized).any())

# Visualize the rasterized fires
fig, ax = plt.subplots(figsize=(10, 10))
im = ax.imshow(np.where(ref == -9999, np.nan, fires_rasterized), cmap = 'inferno')
cbar = fig.colorbar(im, ax=ax, shrink = 0.5, ticks = [0, 1])
ax.set_title('Rasterized Historical Fires')
ax.set_xticks([])
ax.set_yticks([])
plt.show()
The values of fire rasterised file are:  [0. 1.]
There is a NaN in file:  False
[figure: rasterized historical fires]

Creating our model#

A class is defined to handle the path and metadata of a raster:

class MyRaster:
    """
    dem_raster = MyRaster(dem_path, "dem")
    dem_raster.read_raster()
    slope_raster = MyRaster(slope_path, "slope")
    slope_raster.read_raster()
    northing_raster = MyRaster(aspect_path, "aspect")
    northing_raster.read_raster()
    easting_raster = MyRaster(easting_path, "easting")
    easting_raster.read_raster()
    roughness_raster = MyRaster(roughness_path, "roughness")
    dem_rasters = [dem_raster, slope_raster, northing_raster, easting_raster, roughness_raster]
    """
    def __init__(self, path, label):
        self.path = path
        self.label = label
        self.data = None
        self.mask = None
        self.nodata = None
        self.read_raster()

    def read_raster(self):
        with rasterio.open(self.path) as src:
            self.data = src.read(1, masked=True)
            self.mask = src.read_masks(1)
            self.nodata = src.nodata
            # Store commonly used metadata so the getters below work
            self.transform = src.transform
            self.crs = src.crs
            self.bounds = src.bounds
            self.meta = src.meta
            self.height = src.height
            self.width = src.width
            self.dtype = src.dtypes[0]
            self.count = src.count
            self.nodatavals = src.nodatavals

    def set_data(self, data):
        self.data = data

    def get_data(self):
        return self.data

    def get_mask(self):
        return self.mask

    def get_nodata(self):
        return self.nodata

    def get_label(self):
        return self.label

    def get_path(self):
        return self.path

    def get_shape(self):
        return self.data.shape

    def get_transform(self):
        return self.transform

    def get_crs(self):
        return self.crs

    def get_bounds(self):
        return self.bounds

    def get_meta(self):
        return self.meta

    def get_height(self):
        return self.height

    def get_width(self):
        return self.width

    def get_dtype(self):
        return self.dtype

    def get_count(self):
        return self.count

    def get_index(self):
        return self.index

    def get_nodatavals(self):
        return self.nodatavals

Some functions are defined to assemble the dictionaries with the rasters to be used in the model:

climate_paths = []
clim_var_names = ["MWMT", "TD", "AHM", "SHM", "DDbelow0", "DDabove18", "MAT", "MAP", "Tave_sm", "Tmax_sm", "PPT_at", "PPT_sm", "PPT_sp", "PPT_wt"]
climate_root = "./resized_climate"


def assemble_climate_dict(clim_category, clim_cat_namefile, clim_var_names, climate_root = "./resized_climate"):
    """
    Parameters
    ----------
    clim_category : string that can have the following values: "rcp45_2011_2020", "rcp45_2021_2040", "hist_1961_1990", "hist_1991_2010"
    clim_cat_namefile: string which classifies the filenames of the category (typically the years)
        examples are: "201120", "202140", "196190", "199110"
    clim_var_names: a list of strings with the names of the climate variables to be used in the model. An example of such list is:
    ["MWMT", "TD", "AHM", "SHM", "DDbelow0", "DDabove18", "MAT", "MAP", "Tave_sm", "Tmax_sm", "PPT_at", "PPT_sm", "PPT_sp", "PPT_wt"]
    Returns: a dictionary with the climate variables as keys and the MyRaster objects as values
    sample usage:
    climate_dict = assemble_climate_dict("hist_1991_2010", "1991_2010", clim_var_names)
    future_climate_dict = assemble_climate_dict("rcp45_2021_2040", "202140", clim_var_names)
    """

    climate_paths = []
    for clim_var_name in clim_var_names:
        climate_paths.append(os.path.join(climate_root, clim_category, clim_var_name + "_" + clim_cat_namefile + ".tif"))

    climate_dict = {}
    for path, label in zip(climate_paths, clim_var_names):
        climate_dict[label] = MyRaster(path, label)
        climate_dict[label].read_raster()
    return climate_dict


def assemble_dem_dict(dem_paths, dem_labels):
    """Assemble the dictionary with all the rasters to be used in the model which are related to topography.
    
    :param dem_paths: paths to the dem and related rasters
    :param dem_labels: labels of the dem and related rasters
    :return: a dictionary with all the rasters to be used in the model
    usage example:
    dem_paths = [dem_path, slope_path, aspect_path, easting_path, northing_path, roughness_path]
    dem_labels = ["dem", "slope", "aspect", "easting", "northing", "roughness"]
    dem_dict = assemble_dem_dict(dem_paths, dem_labels)
    """
    # Create the dictionary
    dem_dict = {}
    for path, label in zip(dem_paths, dem_labels):
        dem_dict[label] = MyRaster(path, label)
        dem_dict[label].read_raster()

    return dem_dict


def assemble_veg_dictionary(veg_path, dem_path, verbose = False):
    """Assemble the dictionary with all the rasters to be used in the model which are related to vegetation.
    
    This comprises:
    - the vegetation raster
    - the rasters of vegetation densities: a fuzzy counting of the neighboring vegetation for each type of vegetation
        See Tonini et al. 2020 for more details.

    :param veg_path: path to the vegetation raster.
    :return: a dictionary with all the veg rasters to be used in the model
    
    usage example:
    veg_dict, mask = assemble_veg_dictionary(veg_path, dem_path, verbose = True)
    """
    # Create the dictionary
    veg_dict = {}
    veg_dict["veg"] = MyRaster(veg_path, "veg")
    veg_dict["veg"].read_raster()
    veg_arr = veg_dict["veg"].data.astype(int)
    dem_raster = MyRaster(dem_path, "dem")
    dem_raster.read_raster()
    dem_arr = dem_raster.data
    dem_nodata = dem_raster.nodata

    veg_mask = np.where(veg_arr == 0, 0, 1)
    # complete the mask selecting the points where also dem exists.
    mask = (veg_mask == 1) & (dem_arr != dem_nodata)

    # evaluation of perc just in vegetated area, non vegetated are grouped in code 0
    veg_int = veg_arr.astype(int)
    veg_int = np.where(mask == 1, veg_int, 0)
    window_size = 2
    types = np.unique(veg_int)
    if verbose:
        print("types of vegetation in the veg raster:", types)
    types_presence = {}

    counter = np.ones((window_size*2+1, window_size*2+1))
    take_center = 1
    counter[window_size, window_size] = take_center
    counter = counter / np.sum(counter)

    # perc --> neighbouring vegetation generation
    for t in tqdm(types):
        density_entry = 'perc_' + str(int(t))
        if verbose:
            print(f'Processing vegetation density {density_entry}')
        temp_data = 100 * signal.convolve2d(veg_int==t, counter, boundary='fill', mode='same')
        temp_raster = MyRaster(dem_path, density_entry) # the path is dummy... I need just the other metadata.
        temp_raster.read_raster()
        temp_raster.set_data(temp_data)
        veg_dict[density_entry] = temp_raster

    return veg_dict, mask


def preprocessing(dem_dict, veg_dict, climate_dict, fires_raster, mask):
    """
    Usage:
    X_all, Y_all, columns = preprocessing(dem_dict, veg_dict, climate_dict, fires_raster, mask)
    """
    # create X and Y datasets
    n_pixels = len(dem_dict["dem"].data[mask])

    # the number of features is given by all the dem layers, all the veg layers, all the climate layers.
    # TODO: add the possibility to add other layers which belong to a "misc" category.
    n_features = len(dem_dict.keys())+ len(veg_dict.keys()) + len(climate_dict.keys())
    #create the dictionary with all the data
    data_dict = {**dem_dict, **veg_dict, **climate_dict}

    X_all = np.zeros((n_pixels, n_features)) # This is going to be big... Maybe use dask?
    Y_all = fires_raster.data[mask]

    print('Creating dataset for RandomForestClassifier')
    columns = data_dict.keys()
    for col, k in enumerate(data_dict):
        print(f'Processing column: {k}')
        data = data_dict[k]
        # data is a MyRaster object and data.data is the numpy array with the data
        X_all[:, col] = data.data[mask]

    return X_all, Y_all, columns


def prepare_sample(X_all, Y_all, percentage=0.1):
    """
    Usage:
    model, X_train, X_test, y_train, y_test = prepare_sample(X_all, Y_all, percentage)
    
    parameters:
    X_all: the X dataset with the descriptive features
    Y_all: the Y dataset with the target variable (burned or not burned)
    percentage: the percentage of the dataset to be used for training
    """
    # randomforest parameters
    max_depth = 10
    number_of_trees = 100
    # filter df taking info in the burned points
    fires_rows = Y_all.data != 0
    print(f'Number of burned points: {np.sum(fires_rows)}')
    X_presence = X_all[fires_rows]

    # sampling training set
    print(' I am random sampling the dataset ')
    # reduction of burned points --> reduction of training points
    reduction = int(X_presence.shape[0] * percentage)
    print(f"reduced df points: {reduction} of {X_presence.shape[0]}")

    # sampling and update presences
    X_presence_indexes = np.random.choice(X_presence.shape[0], size=reduction, replace=False)

    X_presence = X_presence[X_presence_indexes, :]
    # select the unburned points (pseudo-absences)
    X_absence = X_all[~fires_rows]
    print("X_absence.shape[0]", X_absence.shape[0])
    print("X_presence.shape[0]", X_presence.shape[0])
    X_absence_choices_indexes = np.random.choice(X_absence.shape[0], size=X_presence.shape[0], replace=False)

    X_pseudo_absence = X_absence[X_absence_choices_indexes, :]
    # create X and Y with same number of burned and not burned points
    X = np.concatenate([X_presence, X_pseudo_absence], axis=0)
    Y = np.concatenate([np.ones((X_presence.shape[0],)), np.zeros((X_presence.shape[0],))])
    # create training and testing df with random sampling
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=42)

    print(f'Running RF on data sample: {X_train.shape}')
    model  = RandomForestClassifier(n_estimators=number_of_trees, max_depth = max_depth, verbose = 2)

    return model, X_train, X_test, y_train, y_test
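
# prepare_sample() balances the classes: each sub-sampled burned pixel is paired
# with one randomly drawn unburned pixel, so the classifier trains on a 50/50
# sample. A minimal sketch of the idea (toy data, hypothetical sizes):
#
#   rng = np.random.default_rng(42)
#   presence = X_all[Y_all != 0]                  # burned pixels
#   absence = X_all[Y_all == 0]                   # unburned pixels
#   idx = rng.choice(absence.shape[0], size=presence.shape[0], replace=False)
#   X = np.concatenate([presence, absence[idx]])  # balanced feature matrix
#   Y = np.concatenate([np.ones(len(presence)), np.zeros(len(presence))])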


def fit_and_print_stats(model, X_train, y_train, X_test, y_test, columns):
    """Fit the model and prints the stats on the training and test datasets.
    
    Input:
    model: the model to fit
    X_train: the training dataset
    y_train: the training labels
    X_test: the test dataset
    y_test: the test labels
    columns: the columns of the dataset (list of strings that were the keys of the dictionary)
    
    example usage:
    fit_and_print_stats(model, X_train, y_train, X_test, y_test, columns)
    """
    # fit model
    model.fit(X_train, y_train)
    # stats on training df
    p_train = model.predict_proba(X_train)[:,1]

    auc_train = sklearn.metrics.roc_auc_score(y_train, p_train)
    print(f'AUC score on train: {auc_train:.2f}')

    # stats on test df
    p_test = model.predict_proba(X_test)[:,1]
    auc_test = sklearn.metrics.roc_auc_score(y_test, p_test)
    print(f'AUC score on test: {auc_test:.2f}')
    mse = sklearn.metrics.mean_squared_error(y_test, p_test)
    print(f'MSE: {mse:.2f}')
    p_test_binary = model.predict(X_test)
    accuracy = sklearn.metrics.accuracy_score(y_test, p_test_binary)
    print(f'accuracy: {accuracy:.2f}')

    # feature importances
    print('Evaluating feature importances')
    imp = model.feature_importances_

    perc_imp_list = list()
    list_imp_noPerc = list()

    # separate the perc_ features from the others
    for i,j in zip(columns, imp):
        if i.startswith('perc_'):
            perc_imp_list.append(j)
        else:
            list_imp_noPerc.append(j)

    # aggregate perc importances
    perc_imp = sum(perc_imp_list)
    # add the aggregated result
    list_imp_noPerc.append(perc_imp)

    # list of columns of interest
    cols = [col for col in columns if not col.startswith('perc_')]
    cols.append('perc')

    # print results
    print('importances')
    dict_imp = dict(zip(cols, list_imp_noPerc))
    dict_imp_sorted = {k: v for k, v in sorted(dict_imp.items(),
                                                key=lambda item: item[1],
                                                reverse=True)}
    for i in dict_imp_sorted:
        print('{} : {}'.format(i, round(dict_imp_sorted[i], 2)))


def get_results(model, X_all, dem_arr, mask):
    """Run the fitted model on all valid pixels and return the susceptibility array.
    
    Input:
    model: the fitted model
    X_all: the dataset with the descriptive features
    dem_arr: the dem array data
    mask: the mask of the dem array with all the valid and burnable pixels
    
    example usage:
    Y_raster = get_results(model, X_all, dem_arr, mask)
    """
    # prediction over all the points
    Y_out = model.predict_proba(X_all)
    # array of predictions over the valid pixels
    Y_raster = np.zeros_like(dem_arr)
    Y_raster[mask] = Y_out[:,1]
    # set -1 outside the mask (where the DEM does not exist)
    Y_raster[~mask] = -1
    return Y_raster

Train the model#

Using the previously defined functions:

climate_dict = assemble_climate_dict("hist_1991_2010", "199110", clim_var_names)
dem_paths = [dem_path_clip, slope_path, aspect_path, easting_path, northing_path, roughness_path]
dem_labels = ["dem", "slope", "aspect", "easting", "northing", "roughness"]
dem_dict = assemble_dem_dict(dem_paths, dem_labels)
veg_dict, mask = assemble_veg_dictionary(clc_path_clip_nb, dem_path_clip, verbose = True)
fires_raster = MyRaster(fires_raster_path, "fires")

X_all, Y_all, columns = preprocessing(dem_dict, veg_dict, climate_dict, fires_raster, mask)

model, X_train, X_test, y_train, y_test = prepare_sample(X_all, Y_all, percentage=0.1)
fit_and_print_stats(model, X_train, y_train, X_test, y_test, columns)
types of vegetation in the veg raster: [  0 211 212 213 221 222 223 231 241 242 243 311 312 313 321 322 323 324
 334]
Processing vegetation density perc_0
Processing vegetation density perc_211
Processing vegetation density perc_212
Processing vegetation density perc_213
Processing vegetation density perc_221
Processing vegetation density perc_222
Processing vegetation density perc_223
Processing vegetation density perc_231
Processing vegetation density perc_241
Processing vegetation density perc_242
Processing vegetation density perc_243
Processing vegetation density perc_311
Processing vegetation density perc_312
Processing vegetation density perc_313
Processing vegetation density perc_321
Processing vegetation density perc_322
Processing vegetation density perc_323
Processing vegetation density perc_324
Processing vegetation density perc_334
100%|██████████| 19/19 [00:13<00:00,  1.36it/s]
Creating dataset for RandomForestClassifier
Processing column: dem
Processing column: slope
Processing column: aspect
Processing column: easting
Processing column: northing
Processing column: roughness
Processing column: veg
Processing column: perc_0
Processing column: perc_211
Processing column: perc_212
Processing column: perc_213
Processing column: perc_221
Processing column: perc_222
Processing column: perc_223
Processing column: perc_231
Processing column: perc_241
Processing column: perc_242
Processing column: perc_243
Processing column: perc_311
Processing column: perc_312
Processing column: perc_313
Processing column: perc_321
Processing column: perc_322
Processing column: perc_323
Processing column: perc_324
Processing column: perc_334
Processing column: MWMT
Processing column: TD
Processing column: AHM
Processing column: SHM
Processing column: DDbelow0
Processing column: DDabove18
Processing column: MAT
Processing column: MAP
Processing column: Tave_sm
Processing column: Tmax_sm
Processing column: PPT_at
Processing column: PPT_sm
Processing column: PPT_sp
Processing column: PPT_wt
Number of burned points: 152232
Randomly sampling the dataset
sampled presence points: 15223 of 152232
X_absence.shape[0] 7024508
X_presence.shape[0] 15223
Running RF on data sample: (20398, 40)
building tree 1 of 100
building tree 2 of 100
...
building tree 100 of 100
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    3.3s finished
AUC score on train: 0.98
AUC score on test: 0.97
MSE: 0.07
accuracy: 0.91
Evaluating feature importances
importances
perc : 0.18
dem : 0.12
slope : 0.1
roughness : 0.08
aspect : 0.06
MWMT : 0.06
AHM : 0.06
veg : 0.05
DDabove18 : 0.04
SHM : 0.03
PPT_sm : 0.03
Tave_sm : 0.03
PPT_wt : 0.02
PPT_sp : 0.02
MAT : 0.02
DDbelow0 : 0.02
Tmax_sm : 0.02
MAP : 0.02
PPT_at : 0.02
TD : 0.02
northing : 0.01
easting : 0.01

Run the model for the present data#

# In this cell, the results of the model are obtained. The result is still a plain array.
Y_raster = get_results(model, X_all, dem_dict["dem"].get_data().data, mask)

# Convert the array to a raster and save it to file
save_raster_as(Y_raster, output_susc_present, dem_path_clip)
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s remaining:    0.0s
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:   30.3s finished

Extract the 0.50, 0.75, 0.90 and 0.95 quantiles of Y_raster[Y_raster >= 0.0]. Since the susceptibility is the probability output of the random forest classifier, its values are distributed between 0 and 1, while no-data pixels were set to -1. The quantiles are therefore computed on the valid (non-negative) values only, so that the resulting class boundaries are meaningful.

print(
    'Min susc is: ', Y_raster[Y_raster>=0.0].min(), '\n'
    'Max susc is: ', Y_raster[Y_raster>=0.0].max(), '\n'
    'Mean susc is: ', Y_raster[Y_raster>=0.0].mean(), '\n'
    'Standard deviation is:', Y_raster[Y_raster>=0.0].std(), '\n\n'
    'q50:', np.quantile(Y_raster[Y_raster>=0.0], 0.5), '\n'
    'q75:', np.quantile(Y_raster[Y_raster>=0.0], 0.75), '\n'
    'q90:', np.quantile(Y_raster[Y_raster>=0.0], 0.9), '\n'
    'q95:', np.quantile(Y_raster[Y_raster>=0.0], 0.95)
)
Min susc is:  0.0 
Max susc is:  0.9891565 
Mean susc is:  0.18083496 
Standard deviation is: 0.27911165 

q50: 0.0008383446838706732 
q75: 0.29957183450460434 
q90: 0.6595171868801117 
q95: 0.8097123712301253

Visualizing the present Susceptibility#

Plot the susceptibility for the present climate:

# Open the raster
raster_susceptibility_path = output_susc_present
# Open the shapefile
catalonia_adm_path = "./data_cat/adm_level_stanford/catalonia_adm_3035.shp"
catalonia = gpd.read_file(catalonia_adm_path)

with rasterio.open(raster_susceptibility_path) as src:
    # Read the raster band
    band = src.read(1)
    plot_raster_V2(band, ref, cmap='nipy_spectral', title='Susceptibility Present Climate', dpi=200)
[Figure: wildfire susceptibility map for the present climate over Catalonia]

Run the model on future data#

To project the model into the future, the same steps as above are repeated with the future climate data. The DEM and the vegetation layers are kept unchanged; assuming a static land cover is a simplification, but it is acceptable for now.

# RCP45
future_climate_dict_4p5_202140 = assemble_climate_dict("rcp45_2021_2040", "202140", clim_var_names)
future_climate_dict_4p5_204160 = assemble_climate_dict("rcp45_2041_2060", "204160", clim_var_names)

# RCP85
future_climate_dict_8p5_202140 = assemble_climate_dict('rcp85_2021_2040', '202140', clim_var_names)
future_climate_dict_8p5_204160 = assemble_climate_dict('rcp85_2041_2060', '204160', clim_var_names)
# RCP45
X_all_4p5_202140, _, columns_202140 = preprocessing(dem_dict, veg_dict, future_climate_dict_4p5_202140, fires_raster, mask)
X_all_4p5_204160, _, columns_204160 = preprocessing(dem_dict, veg_dict, future_climate_dict_4p5_204160, fires_raster, mask)

# RCP85
X_all_8p5_202140, _, columns_202140 = preprocessing(dem_dict, veg_dict, future_climate_dict_8p5_202140, fires_raster, mask)
X_all_8p5_204160, _, columns_204160 = preprocessing(dem_dict, veg_dict, future_climate_dict_8p5_204160, fires_raster, mask)

Create Susceptibility maps for future climate and save them#

Obtain the susceptibility for the future climate:

# RCP45
Y_raster_4p5_202140 = get_results(model, X_all_4p5_202140, dem_dict["dem"].get_data().data, mask)
save_raster_as(Y_raster_4p5_202140, output_susc_4p5_2021_2040, dem_path_clip)
Y_raster_4p5_204160 = get_results(model, X_all_4p5_204160, dem_dict["dem"].get_data().data, mask)
save_raster_as(Y_raster_4p5_204160, output_susc_4p5_2041_2060, dem_path_clip)

# RCP85
Y_raster_8p5_202140 = get_results(model, X_all_8p5_202140, dem_dict["dem"].get_data().data, mask)
save_raster_as(Y_raster_8p5_202140, output_susc_8p5_2021_2040, dem_path_clip)
Y_raster_8p5_204160 = get_results(model, X_all_8p5_204160, dem_dict["dem"].get_data().data, mask)
save_raster_as(Y_raster_8p5_204160, output_susc_8p5_2041_2060, dem_path_clip)
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.6s remaining:    0.0s
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:   33.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.4s remaining:    0.0s
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:   32.4s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s remaining:    0.0s
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:   32.0s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.4s remaining:    0.0s
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:   32.5s finished

Visualization of future susceptibility maps#

catalonia_adm_path = "./data_cat/adm_level_stanford/catalonia_adm_3035.shp"
susc_map_paths = [output_susc_present,
                output_susc_4p5_2021_2040, output_susc_4p5_2041_2060,
                output_susc_8p5_2021_2040, output_susc_8p5_2041_2060]
catalonia = gpd.read_file(catalonia_adm_path)

# Function to plot the susceptibility together with the historical fire perimeters
def plot_susc_with_fires(susc_path: str, catalonia: gpd.GeoDataFrame, fires: gpd.GeoDataFrame):
    with rasterio.open(susc_path) as src:
        fig, ax = plt.subplots(figsize=(15, 15))
        # Plot the raster
        rasterio.plot.show(src, ax=ax, vmin=0, vmax=1, cmap='viridis')
        name = os.path.basename(susc_path).split('suscep_')[-1].split('.tif')[0]
        plt.title(f'Susceptibility {name} climate')
        # Overlay the administrative boundary and the fire perimeters
        catalonia.plot(ax=ax, facecolor="none", edgecolor="red", linewidth=2)
        fires.plot(ax=ax, facecolor="none", edgecolor="black", linewidth=1)


for susc_path in susc_map_paths:
    plot_susc_with_fires(susc_path, catalonia, fires_2)
[Figures: susceptibility maps for the present climate and for RCP4.5 / RCP8.5, periods 2021-2040 and 2041-2060]

Hazard#

Functions to define the hazard:

def corine_to_fuel_type(corine_codes_array, converter_dict, visualize_result=False):
    """Convert the CORINE land cover raster into a raster of fuel types.
    
    The fuel types are defined in the converter_dict dictionary; codes missing
    from the dictionary are mapped to -1.
    """
    converted_band = np.vectorize(converter_dict.get)(corine_codes_array)
    converted_band[converted_band == None] = -1
    #convert to int
    converted_band = converted_band.astype(int)
    if visualize_result:
        plt.matshow(converted_band)
        # discrete colorbar
        cbar = plt.colorbar(ticks=[0, 1, 2, 3, 4, 5, 6])
    return converted_band


def susc_classes(susc_arr, quantiles):
    '''Take a raster map and a list of quantiles and return a categorical raster map with the quantile classes.
    
    Parameters:
    susc_arr: the susceptibility array
    quantiles: the quantiles used to create the classes (see the np.digitize documentation)
    '''
    bounds = list(quantiles)
    # Convert the raster map into a categorical map based on the quantile values
    out_arr = np.digitize(susc_arr, bounds, right=True)
    out_arr = out_arr.astype(np.int8)
    return out_arr
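
# A quick check of the quantile binning (toy values): with bounds [0.2, 0.6],
# np.digitize(..., right=True) maps x <= 0.2 to 0, 0.2 < x <= 0.6 to 1 and
# x > 0.6 to 2, so for example:
#
#   susc_classes(np.array([0.1, 0.4, 0.9]), [0.2, 0.6])  # -> array([0, 1, 2])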


def hazard_matrix(arr1, arr2):
    '''Take two arrays and return a matrix with the hazard values.
    
    arr1 takes values on the rows -> susceptibility, arr2 on the columns -> intensity
    
    Parameters:
    arr1: the susceptibility array (classes 1-3)
    arr2: the intensity array (classes 1-4)
    #   s\i   1  2  3  4
    #    1 |  1  2  3  4
    #    2 |  2  3  4  5
    #    3 |  3  3  5  6
    '''
    matrix_values = np.array([
        [1, 2, 3, 4],
        [2, 3, 4, 5],
        [3, 3, 5, 6]])
    # using fancy indexing
    combined_array = matrix_values[arr1 - 1, arr2 - 1]
    return combined_array


def contingency_matrix_on_array(xarr, yarr, xymatrix, nodatax, nodatay):
    '''
    xarr: 2D array, rows entry of the contingency matrix
    yarr: 2D array, cols entry of the contingency matrix
    xymatrix: 2D array, the contingency matrix
    nodatax: no-data value in xarr (if the array uses np.nan as no data, pass 1)
    nodatay: no-data value in yarr (if the array uses np.nan as no data, pass 1)
    '''
    # if the arrays contain NaNs, replace them with the lowest class
    xarr = np.where(np.isnan(xarr), 1, xarr)
    yarr = np.where(np.isnan(yarr), 1, yarr)
    # convert to int
    xarr = xarr.astype(int)
    yarr = yarr.astype(int)

    # boolean mask of the valid pixels
    mask = ~((xarr == nodatax) | (yarr == nodatay))

    # put the lowest class in place of no data
    xarr[~mask] = 1
    yarr[~mask] = 1

    # apply the contingency matrix
    output = xymatrix[xarr - 1, yarr - 1]
    # mask out no data
    output[~mask] = 0
    return output
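
As a quick sanity check of the class crossing, here is a minimal sketch on toy arrays (hypothetical class values, reusing contingency_matrix_on_array from above):

susc_toy = np.array([[1, 2],
                     [3, 1]])   # susceptibility classes 1-3
fuel_toy = np.array([[1, 4],
                     [2, 0]])   # fuel types 1-4; 0 marks no data

toy_matrix = np.array([[1, 2, 3, 4],
                       [2, 3, 4, 5],
                       [3, 3, 5, 6]])

# the pixel with fuel_toy == 0 is masked out to 0 in the result
print(contingency_matrix_on_array(susc_toy, fuel_toy, toy_matrix, 0, 0))
# [[1 5]
#  [3 0]]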

Create the intensity raster by converting the CLC land cover raster to fuel types:

my_clc_raster = MyRaster(clc_path_clip_nb, "clc")

converter = pd.read_excel("./CORINE_to_FuelType.xlsx")
converter_dict = dict(zip(converter.veg.values, converter.aggr.values))

# obtain the array of fuel types by converting the CORINE land cover raster
converted_band = corine_to_fuel_type(my_clc_raster.data.data, converter_dict)

# dtype of the converted band
print('data type is: ', converted_band.dtype)
print('values of the original map (here CORINE) are:', '\n', list(converter_dict.keys()))
print('values of the fuel map are:', '\n', list(converter_dict.values()))
data type is:  int64
values of the original map (here CORINE) are: 
 [111, 112, 121, 122, 123, 124, 131, 132, 133, 141, 142, 211, 212, 213, 221, 222, 223, 224, 231, 241, 242, 243, 244, 311, 312, 313, 321, 322, 323, 324, 331, 332, 333, 334, 335, 411, 412, 421, 422, 423, 511, 512, 521, 522, 523, 999]
values of the fuel map are: 
 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 4, 4, 1, 3, 3, 3, 1, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Obtain the hazard map for the present climate by crossing the susceptibility map with the intensity (fuel type) map:

# compute quantiles for Y_raster
quantiles = np.quantile(Y_raster[Y_raster >= 0.0], [0.5, 0.75])
print('quantiles are: ', quantiles)

# compute the discrete susceptibility array (adding 1 to avoid 0 values)
susc_arr = susc_classes(Y_raster, quantiles) + 1
print("Now I have just the susc classes", np.unique(susc_arr))

# hazard matrix: susceptibility classes on the rows, fuel types on the columns
matrix_values = np.array([
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [3, 3, 5, 6]])

# compute the discrete hazard array for the present climate
hazard_arr = contingency_matrix_on_array(susc_arr, converted_band, matrix_values, 0, -1)

# reload the future susceptibility rasters (RCP45 and RCP85) from file
Y_raster_4p5_202140 = MyRaster(output_susc_4p5_2021_2040, "susc_202140").data
Y_raster_4p5_204160 = MyRaster(output_susc_4p5_2041_2060, "susc_204160").data
Y_raster_8p5_202140 = MyRaster(output_susc_8p5_2021_2040, "susc_202140").data
Y_raster_8p5_204160 = MyRaster(output_susc_8p5_2041_2060, "susc_204160").data

# compute the discrete susceptibility arrays for the future scenarios (adding 1 to avoid 0 values)
susc_arr_4p5_202140 = susc_classes(Y_raster_4p5_202140, quantiles) + 1
susc_arr_4p5_204160 = susc_classes(Y_raster_4p5_204160, quantiles) + 1
susc_arr_8p5_202140 = susc_classes(Y_raster_8p5_202140, quantiles) + 1
susc_arr_8p5_204160 = susc_classes(Y_raster_8p5_204160, quantiles) + 1

# compute the discrete hazard arrays for the future scenarios
hazard_arr_4p5_202140 = contingency_matrix_on_array(susc_arr_4p5_202140, converted_band, matrix_values, 0, -1)
hazard_arr_4p5_204160 = contingency_matrix_on_array(susc_arr_4p5_204160, converted_band, matrix_values, 0, -1)
hazard_arr_8p5_202140 = contingency_matrix_on_array(susc_arr_8p5_202140, converted_band, matrix_values, 0, -1)
hazard_arr_8p5_204160 = contingency_matrix_on_array(susc_arr_8p5_204160, converted_band, matrix_values, 0, -1)

# save the hazard arrays to file (rasters)
save_raster_as_h(hazard_arr, output_hazard_present, clc_path_clip_nb)
save_raster_as_h(hazard_arr_4p5_202140, output_hazard_4p5_2021_2040, clc_path_clip_nb)
save_raster_as_h(hazard_arr_4p5_204160, output_hazard_4p5_2041_2060, clc_path_clip_nb)
save_raster_as_h(hazard_arr_8p5_202140, output_hazard_8p5_2021_2040, clc_path_clip_nb)
save_raster_as_h(hazard_arr_8p5_204160, output_hazard_8p5_2041_2060, clc_path_clip_nb)
quantiles are:  [0.00083834 0.29957183]
Now I have just the susc classes [1 2 3]

Visualizing the Hazards#

Visualize the future hazard classes and present hazard classes:

hazards_path = [output_hazard_present,
                output_hazard_4p5_2021_2040, output_hazard_4p5_2041_2060,
                output_hazard_8p5_2021_2040, output_hazard_8p5_2041_2060]

# hazard colormap: class boundaries and colours (class 0 is transparent no-data)
values = [0, 0.9, 1.1, 2.1, 3.1, 4.1, 5.1, 6.1]
colors_ = ['#00000000', '#0bd1f8', '#1ff238', '#f3db02', '#ea8d1b', '#dc1721', '#ff00ff']

catalonia.to_crs(epsg=3035, inplace=True)
# Loop over the hazard paths and plot each map
for hazard_path in hazards_path:
    with rasterio.open(hazard_path) as src:
        haz_arr = src.read(1)
    name = os.path.basename(hazard_path).split('hazard_')[-1].split('.tif')[0]
    plot_raster_V2(haz_arr, ref, array_classes=values, classes_colors=colors_,
                   classes_names=['no data', 'Very Low', 'Low', 'Medium', 'High', 'Very High', 'Extreme'],
                   title=f'Hazard {name}', dpi=150)
[Figures: hazard maps for the present climate and for RCP4.5 / RCP8.5, periods 2021-2040 and 2041-2060]

References#

  • Tonini, M.; D’Andrea, M.; Biondi, G.; Degli Esposti, S.; Trucchia, A.; Fiorucci, P. A Machine Learning-Based Approach for Wildfire Susceptibility Mapping. The Case Study of the Liguria Region in Italy. Geosciences 2020, 10, 105. https://doi.org/10.3390/geosciences10030105

  • Trucchia, A.; Meschi, G.; Fiorucci, P.; Gollini, A.; Negro, D. Defining Wildfire Susceptibility Maps in Italy for Understanding Seasonal Wildfire Regimes at the National Level. Fire 2022, 5, 30. https://doi.org/10.3390/fire5010030

  • Trucchia, A.; Meschi, G.; Fiorucci, P.; Provenzale, A.; Tonini, M.; Pernice, U. Wildfire hazard mapping in the eastern Mediterranean landscape. International Journal of Wildland Fire 2023, 32, 417-434. https://doi.org/10.1071/WF22138

  • Chakraborty, D.; Dobor, L.; Zolles, A.; Hlásny, T.; Schueler, S. High-resolution gridded climate data for Europe based on bias-corrected EURO-CORDEX: the ECLIPS-2.0 dataset [Data set]. Zenodo, 2020. https://doi.org/10.5281/zenodo.3952159