Hazard assessment for river flooding using river discharge statistics

Hazard assessment for river flooding using river discharge statistics#

Analyzing historical river discharge statistics#

This workflow uses the dataset of hydrological climate impact indicators by SMHI that is available via the Copernicus Data Store.

To execute this notebook, the dataset of European river discharges will first be retrieved from the CLIMAAX data server, where a prepared copy is stored for easier and faster access. However, if you would like to download the data directly from CDS or if the dataset copy is not available, you can use the previous notebook 01_river_discharges_get_data.ipynb to download the data.

Select area of interest#

First, we will define the coordinates of the area of interest. Based on these coordinates we will be able to identify the relevant catchment within the river discharges dataset and subset the data for further processing.

The river discharges dataset contants river discharges corresponding to a large number of small-scale catchments. Below we can specify coordinates of a location (WGS84 coordinates) that corresponds to the section of a river that we are interested in. This location will subsequently be snapped to the corresponding catchment in the dataset.

The areaname variable will be used when adding titles to figures or saving output files.

areaname = 'Putna_Romania'
loc = [26.84194444, 45.88861111]

# areaname = 'Zilina_Slovakia'
# loc = [18.717, 49.25]

# areaname = 'Maastricht_NL'
# loc = [5.697, 50.849]

Load libraries#

Find more info about the libraries used in this workflow here

In this notebook we will use the following Python libraries:

os - provides a way to interact with the operating system, allowing the creation of directories and file manipulation.
glob - library for finding pathnames containing a specified patterns.
numpy - A powerful library for numerical computations in Python, widely used for array operations and mathematical functions.
xarray - library for working with labelled multi-dimensional arrays.
geopandas - library for working with geospatial datasets.
shapely - library for manipulation and analysis of geometric objects.
plotly - interactive plotting library.
json - JSON encoder and decoder library.

These libraries collectively enable the download, processing, analysis, and visualization of geospatial and numerical data in this workflow.

import os
from glob import glob
import numpy as np
import xarray as xr
import geopandas as gpd
from shapely.geometry import Point
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import json

Create the directory structure#

Firt we need to create the directory structure where the inputs and outputs of the workflow are stored.
In the next cell will create the directory called ‘FLOOD_RIVER_discharges’ in the same directory where this notebook is saved.

If you have already ran the notebook for downloading the river discharges dataset directly from CDS (01_river_discharges_get_data.ipynb), the workflow folder may already be present on your computer, and the pre-processed data is already found in the path defined under data_folder. Otherwise, the preprocessed data will be retrieved in this notebook from the CLIMAAX data server.

# Define the folder for the flood workflow
workflow_folder = 'FLOOD_RIVER_discharges'
os.makedirs(workflow_folder, exist_ok=True)

data_folder = os.path.join(workflow_folder, 'data')
os.makedirs(data_folder, exist_ok=True)

# Define directory for plots within the previously defined workflow folder
plot_dir = os.path.join(workflow_folder, f'plots_{areaname}')
os.makedirs(plot_dir, exist_ok=True)

Selecting the catchment of interest#

The river discharges dataset contains discharges corresponding to a large number of small-scale catchments, identified using a catchment ID (coordinate id in the dataset). For further analysis, we need to be able to subset the dataset to a specific catchment in our area of interest. In order to know the ID of the catchment we are interested in, we need to consult the map of catchments (sub-basins) available in shp format on Zenodo. Please download the dataset of subbasins from Zenodo and unzip it in the folder specied as data_folder_subbasins located in the data_folder (this folder is created below).

Reference to the dataset of subbasin contours:
Isberg, K. (2017). EHYPE3_subbasins.zip [Data set]. Zenodo. https://doi.org/10.5281/zenodo.581451

data_folder_subbasins = os.path.join(data_folder, 'EHYPE3_subbasins')
os.makedirs(data_folder_subbasins, exist_ok=True)

Checking if the files are present in the folder:

filelist_subbasins = glob(os.path.join(data_folder_subbasins, 'EHYPE3_subbasins.*'))

if filelist_subbasins == []:
    print('Files not found. Please download the map of EHYPE3 catchments from Zenodo into this folder.')
else:
    print('Files found:')
    print(filelist_subbasins)

Files found:
['FLOOD_RIVER_discharges\\data\\EHYPE3_subbasins\\EHYPE3_subbasins.dbf', 'FLOOD_RIVER_discharges\\data\\EHYPE3_subbasins\\EHYPE3_subbasins.prj', 'FLOOD_RIVER_discharges\\data\\EHYPE3_subbasins\\EHYPE3_subbasins.shp', 'FLOOD_RIVER_discharges\\data\\EHYPE3_subbasins\\EHYPE3_subbasins.shx']

We can open the dataset of catchment contours as a GeoDataFrame variable:

# Open dataset with subbasin contours
try:
    catchments = gpd.GeoDataFrame.from_file(os.path.join(data_folder_subbasins, 'EHYPE3_subbasins.shp'))
    print("Dataset loaded.")
except:
    print(f"Dataset with subbasin contours not found. Please download it and place it in the folder {data_folder_subbasins}")

catchments

Dataset loaded.

	SUBID	HAROID	geometry
0	8801544.0	8801544.0	MULTIPOLYGON (((-22.9068 65.75671, -22.92437 6...
1	8801548.0	8801548.0	POLYGON ((-24.42223 65.55144, -24.39406 65.537...
2	8000005.0	8000006.0	MULTIPOLYGON (((9.3944 59.15315, 9.41203 59.14...
3	8115258.0	8000006.0	POLYGON ((8.5962 59.30061, 8.59918 59.29174, 8...
4	8115717.0	8000006.0	POLYGON ((9.27409 59.01988, 9.27962 59.00213, ...
...	...	...	...
35403	9566395.0	9566395.0	POLYGON ((0.15417 49.37083, 0.15417 49.3625, 0...
35404	9581818.0	9581818.0	MULTIPOLYGON (((-4.89583 55.7375, -4.87917 55....
35405	9524166.0	9524166.0	MULTIPOLYGON (((-1.12917 45.34583, -1.12083 45...
35406	9581815.0	9581815.0	MULTIPOLYGON (((-4.89583 56.15417, -4.89583 56...
35407	9723401.0	9723401.0	MULTIPOLYGON (((29.05417 36.6875, 29.0625 36.6...

35408 rows × 3 columns

Now we need to identify the id of the catchment where the point of interest is located based on the coordinates stored in the variable loc:

point = Point((loc[0],loc[1]))

in_catchment = catchments.geometry.contains(point)

catch_id = int(catchments.iloc[in_catchment.index[in_catchment==True]].SUBID.values.tolist()[0])
print(f'Catchment ID in the E-HYPEcatch dataset: {catch_id}')

Catchment ID in the E-HYPEcatch dataset: 9601909

The catchment ID and contours will be stored in the variable catchment.

catchment = catchments[catchments.SUBID==catch_id]
catchment

	SUBID	HAROID	geometry
29282	9601909.0	9600704.0	POLYGON ((26.77084 46.00417, 26.77084 45.99583...

Below the catchment contours will be plotted on a map to check whether correct catchment is selected. The coordinates of the selected location are indicated as a red dot, and contours of the corresponding catchment are displayed, as well as the surrounding catchments.

# Creating a marker for the selected catchment (used in the visualization on the map)
catchments['select'] = np.where(catchments.SUBID==catch_id, 1, 0)

# Select only the nearby catchments within the radius of 1 degree
catchments_sel = catchments.cx[(loc[0]-1):(loc[0]+1), (loc[1]-1):(loc[1]+1)]

fig = go.Figure()

fig.add_trace(go.Scattermapbox(
    lat=[loc[1]],  # Latitude coordinates
    lon=[loc[0]],   # Longitude coordinates
    mode='markers',
    marker=go.scattermapbox.Marker(size=14, color='red'),
    text=['Location of interest'],  # Labels for the points
    name=''
))

fig.add_trace(go.Choroplethmapbox(
    geojson=json.loads(catchments_sel.to_json()),
    locations=catchments_sel.index,
    z=catchments_sel["select"],  
    hoverinfo='text',
    text=catchments_sel['SUBID'],  
    colorscale="RdPu",
    marker=dict(line=dict(color="black", width=1.5)),
    marker_opacity=0.2,  
    showscale=False
))

fig.update_layout(
    mapbox_center={'lat':loc[1], 'lon':loc[0]},
    mapbox_zoom=9,
    margin={"r":0,"t":0,"l":0,"b":0},
    mapbox={"style":"open-street-map"})

Based on the map above, we can evaluate whether we want to proceed with analysing modelled river discharges for the selected catchment. If you would like to change the catchment selection, you can go back to the start of this notebook and adjust the location coordinates.

Evaluating daily values of historical river discharges#

We will create dataset variable ds_day_sel which will contain the daily timeseries of river discharges only for the selected catchment:

ds_day_sel = ds_day.sel(id=catch_id); del ds_day

We can explore the timeseries of daily river discharges for the historical model period by plotting the entire dataset. By default we have selected a 5 year period (2001-2005). The plot below shows the timeseries based on different GCM-RCM climate model combinations in separate panels. Please note that these climate models are not constrained by real-world observations, but are aiming to represent one possible realization of the climate in that period. Therefore, the daily timeseries are not representative for analyzing individual past weather events - in the plot below we see different discharge patterns for the different GCM-RCM combinations.

The different lines on each plot correspond to the model results for different catchment models, and are helpful in assessing the uncertainty caused by the different assumptions in the hydrological model.

# Create a figure
fig = make_subplots(rows=len(ds_day_sel.gcm_rcm), cols=1,
                    shared_xaxes=True,
                    y_title="River discharge [m3/s]",
                    vertical_spacing=0.05,
                    subplot_titles=[f'{ds_day_sel.isel(gcm_rcm=x).gcm_rcm.values}' for x in range(len(ds_day_sel.gcm_rcm))])

colorlist = px.colors.cyclical.mrybm[::2]
visible=[True, False, False, False, False, False]

for ii in range(8):
    for mm, gcm_rcm in enumerate(ds_day_sel.gcm_rcm.values):
        fig.add_trace(go.Scatter(x=ds_day_sel.time, y=ds_day_sel.rdis.sel(gcm_rcm=gcm_rcm).isel(catchmodel=ii), 
                                mode='lines', line={'color': colorlist[ii]}, name=f'Model {ii:02}', 
                                legendgroup=f'Model {ii:02}', showlegend=visible[mm]), row=mm+1, col=1)

fig.update_yaxes(range=[0, np.max(ds_day_sel.rdis.values)])

# Customize layout
fig.update_layout(
    height=1000, width=1200, 
    title_text="<b>Daily timeseries of river discharges based on historical model period</b>",
    legend_title_text="Catchment model",
)

# Show the figure
fig.show()

Going forward, we will take the mean values across the catchmodel models in order to analyze a single timeseries per GCM-RCM combination. Below we plot the timeseries averaged over catchment models for all GCM-RCM combinations in one plot:

ds_day_sel_catchmean = ds_day_sel.rdis.mean(dim='catchmodel')

# Create a figure
fig = go.Figure()

for ii in range(len(ds_day_sel.gcm_rcm.values)):
    fig.add_trace(go.Scatter(x=ds_day_sel.time, y=ds_day_sel_catchmean.isel(gcm_rcm=ii), mode='lines', 
                             name=f'{ds_day_sel.isel(gcm_rcm=ii).gcm_rcm.values}'))

fig.update_yaxes(range=[0, np.max(ds_day_sel_catchmean.values)])

# Customize layout
fig.update_layout(
    height=500, width=1200, 
    title_text="<b>Daily timeseries of river discharges based on historical model period</b><br>\
for different GCM-RCM combinations and based on a mean of the catchment model emsemble",
    yaxis_title="River discharge [m3/s]",
    showlegend=True,
    legend_title_text="GCM-RCM combination",
    legend=dict(x=0.7, y=0.95),
)

# Show the figure
fig.show()

Flow-duration curve#

Daily discharge statistics are a useful metric for understanding how representative the model results are compared to the local observations. If we have access to measured daily discharge values, we can compare the exceedance curves of the daily discharges, also known as the flow-duration curve. We can compute the flow duration curve based on the modelled data below:

ds_flow_curve = xr.DataArray(
    data=-np.sort(-ds_day_sel_catchmean.values, axis=1),
    dims=["gcm_rcm", "exceedance"],
    coords=dict(
        gcm_rcm=ds_day_sel_catchmean.gcm_rcm.values,
        exceedance=np.arange(1.,len(ds_day_sel_catchmean.time)+1) / len(ds_day_sel_catchmean.time) * 100,
    ))

The resulting flow-duration curve is plotted below. The plot is interactive and makes it possible to zoom in and inspect the data, as well as to (de)select GCM-RCM model combinations in the legend.

# Create a figure
fig = go.Figure()

for ii in range(len(ds_day_sel.gcm_rcm.values)):
    fig.add_trace(go.Scatter(x=ds_flow_curve.exceedance, y=ds_flow_curve.isel(gcm_rcm=ii), mode='lines', 
                             name=f'{ds_day_sel.isel(gcm_rcm=ii).gcm_rcm.values}'))

fig.update_yaxes(range=[0, np.max(ds_flow_curve.values)])

# Customize layout
fig.update_layout(
    height=500, width=1200, 
    title_text="<b>Flow-duration curve</b><br>based on modelled river discharges for the historical period (1971-2000)",
    yaxis_title="River discharge [m3/s]",
    xaxis_title="Exceedance [%]",
    showlegend=True,
    legend_title_text="GCM-RCM combination",
    legend=dict(x=0.7, y=0.95),
)

# Show the figure
fig.show()

If you have access to flow-duration curves for the location of interest based on the local measurements, it is strongly recommended to compare the flow-duration curves in order to understand the bias between the modelled and observed discharges, especially focusing on the discharges with low probability (extreme events).

The statistics of some GCM-RCM combinations might be closer to the observed discharge statistics than others. Based on this, it can be useful exclude certain GCM-RCM combinations from the subsequent analysis, e.g. if the mode data showa statistical behavior that is very different from observations and from the other GCM-RCM combinations. This can be done by adding the name of the GCM-RCM combination to the list variable exclude_gcm_rcm below.

If you do not want to exclude any GCM-RCM combinations, please set the variable exclude_gcm_rcm to an empty list [].

print('Avalable GCM-RCM combinations:')
for ii in range(len(ds_day_sel.gcm_rcm)):
    print(ds_day_sel.gcm_rcm.isel(gcm_rcm=ii).values)

Avalable GCM-RCM combinations:
ICHEC-EC-EARTH_CLMcom-CCLM4-8-17
ICHEC-EC-EARTH_KNMI-RACMO22E
MOHC-HadGEM2-ES_KNMI-RACMO22E
MOHC-HadGEM2-ES_SMHI-RCA4
MPI-M-MPI-ESM-LR_MPI-CSC-REMO2009
MPI-M-MPI-ESM-LR_SMHI-RCA4

exclude_gcm_rcm = [] # set to empty list [] if you do not want to exclude any model combinations

if exclude_gcm_rcm == []:
    print(f'No model combination is excluded.')
else:
    for ii, gcm_rcm in enumerate(exclude_gcm_rcm):
        if gcm_rcm in ds_day_sel.gcm_rcm.values:
            print(f'{gcm_rcm} will be excluded from further analysis.')
        else:
            print(f'Model combination {gcm_rcm} not found, please check correctness.')

No model combination is excluded.

Hazard assessment for river flooding using river discharge statistics

Contents