Hazard assessment for river flooding using river discharge statistics

A workflow from the CLIMAAX Handbook and FLOODS GitHub repository.
See our how to use risk workflows page for information on how to run this notebook.

This workflow uses the dataset of hydrological climate impact indicators by SMHI that is available via the Copernicus Data Store.

To execute this notebook, the dataset of European river discharges will first be retrieved from the CLIMAAX data server, where a prepared copy is stored for easier and faster access. However, if you would like to download the data directly from CDS or if the dataset copy is not available, you can use the previous notebook to download the data.

Note

The plots produced in this notebook are made interactive (when executed by the user), making it possible to explore the data by zooming in or enabling/disabling the data layers. We encourage you to make full use of this functionality to obtain a clear picture of the data.

Preparation work#

Load libraries#

Find more info about the libraries used in this workflow here

In this notebook we will use the following Python libraries:

os - provides a way to interact with the operating system, allowing the creation of directories and file manipulation.
pooch - library for downloading and caching data files, used here to retrieve the sub-basin dataset from Zenodo.
numpy - a powerful library for numerical computations in Python, widely used for array operations and mathematical functions.
xarray - library for working with labelled multi-dimensional arrays.
dask - library for parallel and out-of-memory computing, used by xarray to load large datasets lazily.
geopandas, pyogrio - libraries for working with geospatial datasets.
shapely - library for manipulation and analysis of geometric objects.
plotly - interactive plotting library.

These libraries collectively enable the download, processing, analysis, and visualization of geospatial and numerical data in this workflow.

import os

import pooch
import numpy as np
import xarray as xr
import geopandas as gpd
import pyogrio
from shapely.geometry import Point
import dask.diagnostics
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

Select area of interest#

First, we will define the coordinates of the area of interest. Based on these coordinates we will be able to identify the relevant catchment within the river discharges dataset and subset the data for further processing.

The river discharges dataset contains river discharges for a large number of small-scale catchments. Below, we specify the coordinates of a location on the section of the river that we are interested in, as [longitude, latitude] in WGS84. This location will subsequently be snapped to the corresponding catchment in the dataset.

The locname variable will be used when adding titles to figures or saving output files.

locname = 'Putna_Romania'
loc = [26.84194444, 45.88861111]  # [longitude, latitude] in WGS84

# locname = 'Zilina_Slovakia'
# loc = [18.717, 49.25]

# locname = 'Maastricht_NL'
# loc = [5.697, 50.849]

Create the directory structure#

First, we need to create the directory structure where the inputs and outputs of the workflow are stored.
The next cell creates a directory called FLOOD_RIVER_discharges in the same directory where this notebook is saved.

If you have already run the notebook for downloading the river discharges dataset directly from CDS, the workflow folder may already be present on your computer, with the pre-processed data stored in the path defined by data_folder. Otherwise, the pre-processed data will be retrieved from the CLIMAAX data server in this notebook.

# Define the folder for the flood workflow
workflow_folder = 'FLOOD_RIVER_discharges'
os.makedirs(workflow_folder, exist_ok=True)

data_folder = os.path.join(workflow_folder, 'data')
os.makedirs(data_folder, exist_ok=True)

# Define directory for plots within the previously defined workflow folder
plot_dir = os.path.join(workflow_folder, f'plots_{locname}')
os.makedirs(plot_dir, exist_ok=True)

Data selection#

Checking data availability#

In this step we check whether the necessary data is already available in the workflow folder; if it is not, the data is accessed from the CLIMAAX cloud storage.

def open_from_climaax_mirror(file):
    return xr.open_zarr(f'https://object-store.os-api.cci1.ecmwf.int/climaax/river_discharge_mirror/{file}.zarr')

# Folder of a local EHYPE dataset (if available, see the get_data notebook).
# If the folder doesn't exist, data is accessed from the CLIMAAX cloud storage.
data_folder_catch = os.path.join(data_folder, 'EHYPEcatch')

We will open the four datasets (daily river discharges, monthly mean river discharges, and absolute and relative extreme river discharges) for use in the subsequent analysis.

def preprocess_daily(ds):
    filename = os.path.basename(ds.encoding['source'])
    _, _, catchmodel, gcm, _, _, rcm, _, _, _ = filename.split('_')
    return ds.expand_dims({
        'catchmodel': [catchmodel],
        'gcm_rcm': [f'{gcm}_{rcm}'],
    })

if not os.path.isdir(data_folder_catch):
    print('Accessing daily river discharges from CLIMAAX cloud storage')
    ds_day = open_from_climaax_mirror('rdis_day_E-HYPEcatch_allmodels')
else:
    print('Loading from local daily river discharges dataset')
    ds_day = xr.open_mfdataset(
        os.path.join(data_folder_catch, 'rdis_day_E-HYPEcatch*-EUR-11_*_catch_v1.nc'),
        preprocess=preprocess_daily,
        chunks="auto"
        )

Accessing daily river discharges from CLIMAAX cloud storage
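The preprocess_daily function relies on the position of each underscore-separated field in the file name. The sketch below applies the same unpacking to a hypothetical file name that was constructed only to match the glob pattern 'rdis_day_E-HYPEcatch*-EUR-11_*_catch_v1.nc'; the real CDS file names may differ in the fields left unnamed here. The helper parse_daily_filename and its length check are additions for illustration, not part of the workflow.

```python
import os


def parse_daily_filename(path):
    """Extract the catchment model and GCM-RCM labels from a daily file name.

    Mirrors the unpacking in preprocess_daily, which assumes exactly ten
    underscore-separated fields.
    """
    filename = os.path.basename(path)
    fields = filename.split('_')
    if len(fields) != 10:
        raise ValueError(f'Expected 10 underscore-separated fields, got {len(fields)}: {filename}')
    _, _, catchmodel, gcm, _, _, rcm, _, _, _ = fields
    return {'catchmodel': catchmodel, 'gcm_rcm': f'{gcm}_{rcm}'}


# Hypothetical file name, built only to match the glob pattern used above
example = 'rdis_day_E-HYPEcatch00-EUR-11_ICHEC-EC-EARTH_historical_r12i1p1_KNMI-RACMO22E_v1_catch_v1.nc'
print(parse_daily_filename(example))
# {'catchmodel': 'E-HYPEcatch00-EUR-11', 'gcm_rcm': 'ICHEC-EC-EARTH_KNMI-RACMO22E'}
```

The explicit length check reports the offending file name, which is easier to debug than the generic unpacking error raised when a file with an unexpected name is picked up by the glob.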

def preprocess_monthly_mean(ds):
    filename = os.path.basename(ds.encoding['source'])
    _, _, _, catchmodel, gcm, scenario, _, rcm, _, time_period, _, _ = filename.split('_')
    ds['time'] = ds.time.dt.month
    return ds.expand_dims({
        'time_period': [time_period],
        'scenario': [scenario],
        'catchmodel': [catchmodel],
        'gcm_rcm': [f'{gcm}_{rcm}'],
    })

if not os.path.isdir(data_folder_catch):
    print('Accessing monthly mean river discharges from CLIMAAX cloud storage')
    ds_mon = open_from_climaax_mirror('rdis_ymonmean_abs_E-HYPEcatch_allmodels')
else:
    print('Loading from local monthly mean river discharges dataset')
    ds_mon = xr.open_mfdataset(
        os.path.join(data_folder_catch, 'rdis_ymonmean_abs_E-HYPEcatch*-EUR-11_*_na_*_catch_v1.nc'),
        preprocess=preprocess_monthly_mean,
        chunks="auto"
    )

Accessing monthly mean river discharges from CLIMAAX cloud storage

def preprocess_flood_occurence(ds):
    filename = os.path.basename(ds.encoding['source'])
    _, _, _, _, gcm, scenario, _, rcm, _, time_period, _, _ = filename.split('_')
    return ds.expand_dims({
        'scenario': [scenario],
        'gcm_rcm': [f'{gcm}_{rcm}']
    }).assign_coords({
        'time_period': ('time', [time_period])
    })

if not os.path.isdir(data_folder_catch):
    print('Accessing absolute extreme river discharges from CLIMAAX cloud storage')
    ds_flood = open_from_climaax_mirror('rdis_extremes_abs_E-HYPEcatch_allmodels')
else:
    print('Loading from local absolute extreme river discharges dataset')
    ds_flood = xr.open_mfdataset(
        os.path.join(data_folder_catch, 'rdisreturnmax*_tmean_abs_E-HYPEcatch*_catch_v1.nc'),
        preprocess=preprocess_flood_occurence
    )

Accessing absolute extreme river discharges from CLIMAAX cloud storage

if not os.path.isdir(data_folder_catch):
    print('Accessing relative extreme river discharges from CLIMAAX cloud storage')
    ds_flood_rel = open_from_climaax_mirror('rdis_extremes_rel_E-HYPEcatch_allmodels')
else:
    print('Loading from local relative extreme river discharges dataset')
    ds_flood_rel = xr.open_mfdataset(
        os.path.join(data_folder_catch, 'rdisreturnmax*_tmean_rel_E-HYPEcatch*_catch_v1.nc'),
        preprocess=preprocess_flood_occurence
    )

Accessing relative extreme river discharges from CLIMAAX cloud storage
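The four cells above repeat the same pattern: use the local copy in data_folder_catch if that folder exists, otherwise open the CLIMAAX cloud mirror. A minimal sketch of how this could be factored into one helper is shown below; open_local_or_cloud is a hypothetical name, and passing the two openers as zero-argument callables (e.g. lambdas) means only the chosen source is ever opened.

```python
import os


def open_local_or_cloud(local_dir, open_local, open_cloud):
    """Open a dataset from a local folder if it exists, otherwise from the cloud mirror.

    open_local and open_cloud are zero-argument callables, so only the chosen
    source is actually opened. Returns a (source_label, dataset) tuple.
    """
    if os.path.isdir(local_dir):
        return 'local', open_local()
    return 'cloud', open_cloud()


# Hypothetical usage for the daily discharges (names as defined in the cells above):
# source, ds_day = open_local_or_cloud(
#     data_folder_catch,
#     open_local=lambda: xr.open_mfdataset(
#         os.path.join(data_folder_catch, 'rdis_day_E-HYPEcatch*-EUR-11_*_catch_v1.nc'),
#         preprocess=preprocess_daily, chunks='auto'),
#     open_cloud=lambda: open_from_climaax_mirror('rdis_day_E-HYPEcatch_allmodels'),
# )
# print(f'Daily river discharges opened from the {source} copy')
```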

Load catchment dataset#

The river discharges dataset contains discharges corresponding to a large number of small-scale catchments, identified using a catchment ID (coordinate id in the dataset). For further analysis, we need to be able to subset the dataset to a specific catchment in our area of interest. In order to know the ID of the catchment we are interested in, we need to consult the map of catchments (sub-basins).

data_folder_subbasins = os.path.join(data_folder, 'EHYPE3_subbasins')
os.makedirs(data_folder_subbasins, exist_ok=True)

Download the dataset of sub-basins from Zenodo: the next cell retrieves the files automatically and places them in the folder data_folder_subbasins (created inside data_folder in the previous cell). Alternatively, you can download the dataset from the Zenodo web interface, unzip it and place it in that folder manually.

pooch.retrieve(
    'doi:10.5281/zenodo.581451/EHYPE3_subbasins.zip',
    known_hash='ce1a48393adba92443fb99cb2651b7cfadf60af9e4cce0ad9cae8e7b52d3c684',
    fname='EHYPE3_subbasins.zip',
    path=data_folder,
    downloader=pooch.DOIDownloader(),
    processor=pooch.Unzip(extract_dir=os.path.basename(data_folder_subbasins))
)

We can open the dataset of catchment contours as a GeoDataFrame variable:

try:
    catchments = gpd.GeoDataFrame.from_file(
        os.path.join(data_folder_subbasins, 'EHYPE3_subbasins.shp')
    )
    print('Dataset loaded.')
except pyogrio.errors.DataSourceError:
    print(
        'Dataset with subbasin contours not found. '
        f'Please download it and place it in the folder {data_folder_subbasins}'
    )
    raise  # stop here: the cells below need the catchments dataset

# Use the integer sub-basin ID as index so that catchments can be selected by ID
catchments = catchments.set_index(catchments['SUBID'].astype(int))
catchments

Dataset loaded.

	SUBID	HAROID	geometry
SUBID
8801544	8801544.0	8801544.0	MULTIPOLYGON (((-22.9068 65.75671, -22.92437 6...
8801548	8801548.0	8801548.0	POLYGON ((-24.42223 65.55144, -24.39406 65.537...
8000005	8000005.0	8000006.0	MULTIPOLYGON (((9.3944 59.15315, 9.41203 59.14...
8115258	8115258.0	8000006.0	POLYGON ((8.5962 59.30061, 8.59918 59.29174, 8...
8115717	8115717.0	8000006.0	POLYGON ((9.27409 59.01988, 9.27962 59.00213, ...
...	...	...	...
9566395	9566395.0	9566395.0	POLYGON ((0.15417 49.37083, 0.15417 49.3625, 0...
9581818	9581818.0	9581818.0	MULTIPOLYGON (((-4.89583 55.7375, -4.87917 55....
9524166	9524166.0	9524166.0	MULTIPOLYGON (((-1.12917 45.34583, -1.12083 45...
9581815	9581815.0	9581815.0	MULTIPOLYGON (((-4.89583 56.15417, -4.89583 56...
9723401	9723401.0	9723401.0	MULTIPOLYGON (((29.05417 36.6875, 29.0625 36.6...

35408 rows × 3 columns

Selecting the catchment of interest#

Now we need to identify the id of the catchment where the point of interest is located based on the coordinates stored in the variable loc:

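point = Point((loc[0], loc[1]))

in_catchment = catchments.geometry.contains(point)
if not in_catchment.any():
    raise ValueError(f'No catchment contains the location {loc}; please check the coordinates.')

catch_id = catchments[in_catchment].index.values[0]
print(f'Catchment ID in the E-HYPEcatch dataset: {catch_id}')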

Catchment ID in the E-HYPEcatch dataset: 9601909
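The lookup above takes the first catchment whose polygon contains the point. Note that shapely's contains returns False for a point lying exactly on a polygon boundary, so a location on the border between two catchments (or just outside a coastline) produces no match. The sketch below, using two made-up square catchments rather than the E-HYPE data, shows the same containment test with a nearest-polygon fallback; the fallback is an assumption for illustration, not part of the workflow, and its distances are in degrees because the coordinates are unprojected.

```python
from shapely.geometry import Point, box

# Two made-up square "catchments" keyed by hypothetical IDs (not E-HYPE data)
catchments_demo = {
    101: box(26.0, 45.0, 27.0, 46.0),
    102: box(27.0, 45.0, 28.0, 46.0),
}


def find_catchment(lon, lat, polygons):
    """Return the ID of the first polygon containing (lon, lat).

    If no polygon strictly contains the point (e.g. it lies on a shared boundary
    or outside all polygons), fall back to the nearest polygon. The fallback is
    an illustrative assumption; distances are in degrees.
    """
    point = Point(lon, lat)  # shapely uses (x, y) = (longitude, latitude)
    for cid, geom in polygons.items():
        if geom.contains(point):
            return cid
    return min(polygons, key=lambda cid: polygons[cid].distance(point))


print(find_catchment(26.84194444, 45.88861111, catchments_demo))  # 101: inside the first square
print(find_catchment(27.0, 45.5, catchments_demo))  # 101: on the shared edge, both at distance 0
print(find_catchment(28.5, 45.5, catchments_demo))  # 102: outside both, nearest is the second
```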

The catchment ID and contours will be stored in the variable catchment.

catchment = catchments.loc[[catch_id]]
catchment

	SUBID	HAROID	geometry
SUBID
9601909	9601909.0	9600704.0	POLYGON ((26.77084 46.00417, 26.77084 45.99583...

Below, the catchment contours will be plotted on a map to check whether the correct catchment is selected. The coordinates of the selected location are indicated as a red dot, and contours of the corresponding catchment are displayed, as well as the surrounding catchments.

# Creating a marker for the selected catchment (used in the visualization on the map)
catchments['select'] = np.where(catchments.index==catch_id, 1, 0)

# Select only the catchments intersecting a box of 1 degree around the location in each direction
catchments_sel = catchments.cx[(loc[0]-1):(loc[0]+1), (loc[1]-1):(loc[1]+1)]

fig = go.Figure()

fig.add_trace(go.Scattermapbox(
    lat=[loc[1]],  # Latitude coordinates
    lon=[loc[0]],   # Longitude coordinates
    mode='markers',
    marker=go.scattermapbox.Marker(size=14, color='red'),
    text=['Location of interest'],  # Labels for the points
    name=''
))

fig.add_trace(go.Choroplethmapbox(
    geojson=catchments_sel.to_geo_dict(),
    locations=catchments_sel.index,
    z=catchments_sel['select'],  
    hoverinfo='text',
    text=catchments_sel['SUBID'],  
    colorscale='RdPu',
    marker={'line': {'color': 'black', 'width': 1.5}},
    marker_opacity=0.2,  
    showscale=False
))

fig.update_layout(
    mapbox_center={'lat': loc[1], 'lon': loc[0]},
    mapbox_zoom=8,
    margin={'r': 0, 't': 0, 'l': 0, 'b': 0},
    mapbox={'style': 'open-street-map'})

fig.show()

fig.write_image(os.path.join(plot_dir, f'{locname}_location_catchment_map.png'))

[Figure: map of the location of interest (red dot) and the contours of the selected and surrounding catchments]

Based on the map above, we can evaluate whether we want to proceed with analysing modelled river discharges for the selected catchment. If you would like to change the catchment selection, you can go back to the start of this notebook and adjust the location coordinates.

Evaluating daily values of historical river discharges#

We will create the dataset variable ds_day_sel, which contains the daily timeseries of river discharges for the selected catchment only:

with dask.diagnostics.ProgressBar():
    ds_day_sel = ds_day.sel(id=catch_id).compute()

# Save data for selected catchment to disk for reuse in other steps of the workflow
ds_day_sel.to_netcdf(os.path.join(data_folder, f'rdis_day_E-HYPEcatch_allmodels_{catch_id}.nc'))

We can explore the timeseries of daily river discharges for the historical model period by plotting the entire dataset, which here covers the years 1991-2005 (see the output of the next cell). The plot below shows the timeseries for each GCM-RCM climate model combination in a separate panel. Please note that these climate models are not constrained by real-world observations; each aims to represent one possible realization of the climate in that period. The daily timeseries are therefore not suitable for analysing individual past weather events, which is why the plot below shows different discharge patterns for the different GCM-RCM combinations.

The different lines in each panel correspond to the results of the different catchment models, and help assess the uncertainty arising from the different assumptions in these hydrological models.
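The spread between the catchment-model lines can also be summarised numerically. As a sketch, assuming a DataArray with the same dimension names as ds_day_sel.rdis (catchmodel, gcm_rcm, time) but filled with synthetic values, the ensemble mean, the max-min range and the range relative to the mean can be computed with xarray reductions. The relative-spread metric is an illustrative choice, not part of the workflow.

```python
import numpy as np
import xarray as xr

# Synthetic discharges [m3/s] with the same dimension names as ds_day_sel.rdis;
# the values are made up for illustration only.
rdis = xr.DataArray(
    np.array([
        [[10.0, 12.0, 30.0]],  # catchment model M0
        [[14.0, 12.0, 50.0]],  # catchment model M1
    ]),
    dims=['catchmodel', 'gcm_rcm', 'time'],
    coords={'catchmodel': ['M0', 'M1'], 'gcm_rcm': ['GCM_RCM_A'], 'time': [0, 1, 2]},
)

ens_mean = rdis.mean(dim='catchmodel')  # one timeseries per GCM-RCM combination
ens_spread = rdis.max(dim='catchmodel') - rdis.min(dim='catchmodel')
# Spread relative to the ensemble mean, in percent (undefined where the mean is zero)
rel_spread = 100 * ens_spread / ens_mean

print(ens_mean.values)
print(rel_spread.values)
```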

str_daily_timerange = f'{ds_day_sel.time.dt.year.values[0]}-{ds_day_sel.time.dt.year.values[-1]}'
str_daily_timerange

'1991-2005'

# Create a figure
fig = make_subplots(
    rows=len(ds_day_sel.gcm_rcm),
    cols=1,
    shared_xaxes=True,
    y_title='River discharge [m3/s]',
    vertical_spacing=0.05,
    subplot_titles=ds_day_sel.gcm_rcm.values
)

colorlist = px.colors.cyclical.mrybm[::2]

for ii, catchmodel in enumerate(ds_day_sel.catchmodel.values):
    for mm, gcm_rcm in enumerate(ds_day_sel.gcm_rcm.values):
        fig.add_trace(go.Scatter(
            x=ds_day_sel.time, y=ds_day_sel.rdis.sel(gcm_rcm=gcm_rcm, catchmodel=catchmodel), 
            mode='lines', line={'color': colorlist[ii]}, name=f'Model {ii:02}', 
            legendgroup=f'Model {ii:02}', showlegend=(mm==0)
        ), row=mm+1, col=1)

fig.update_yaxes(range=[0, np.max(ds_day_sel.rdis.values)])

# Customize layout
fig.update_layout(
    height=900, width=1100, 
    title_text='<b>Daily timeseries of river discharges based on historical model period</b>',
    legend_title_text='Catchment model',
)

# Show the figure
fig.show()

# Save figure
fig.write_image(os.path.join(plot_dir,f'{locname}_daily_timeseries_{str_daily_timerange}.png'))

[Figure: Daily timeseries of river discharges based on historical model period; one panel per GCM-RCM combination, one line per catchment model, y-axis river discharge in m3/s]

Going forward, we will take the mean across the catchment models in order to analyse a single timeseries per GCM-RCM combination. Below, we plot the catchment-model mean for all GCM-RCM combinations in one figure:

ds_day_sel_catchmean = ds_day_sel.rdis.mean(dim='catchmodel')

# Create a figure
fig = go.Figure()

for gcm_rcm in ds_day_sel.gcm_rcm.values:
    fig.add_trace(go.Scatter(x=ds_day_sel.time, y=ds_day_sel_catchmean.sel(gcm_rcm=gcm_rcm), mode='lines', name=gcm_rcm))

fig.update_yaxes(range=[0, np.max(ds_day_sel_catchmean.values)])

# Customize layout
fig.update_layout(
    height=500, width=1100, 
    title_text=(
        '<b>Daily timeseries of river discharges based on historical model period</b>'
        '<br>for different GCM-RCM combinations and based on a mean of the catchment model ensemble'
    ),
    yaxis_title='River discharge [m3/s]',
    showlegend=True,
    legend_title_text='GCM-RCM combination',
    legend={'x': 0.7, 'y': 0.95},
)

# Show the figure
fig.show()

# Save figure
fig.write_image(os.path.join(plot_dir, f'{locname}_daily_timeseries_{str_daily_timerange}_catchmodelmean.png'))

[Figure: Daily timeseries of river discharges based on historical model period, catchment-model ensemble mean for each GCM-RCM combination; y-axis river discharge in m3/s]

Flow-duration curve#

Daily discharge statistics are useful for understanding how representative the model results are of local conditions. If measured daily discharge values are available, we can compare the exceedance curves of the modelled and observed daily discharges, also known as flow-duration curves. We compute the flow-duration curve from the modelled data below:

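# Sort each GCM-RCM timeseries in descending order (transpose first so that time is the last axis);
# the k-th largest value (k = 1..N) is assigned an exceedance of k / N * 100 percent
ds_flow_curve = xr.DataArray(
    data=-np.sort(-ds_day_sel_catchmean.transpose('gcm_rcm', 'time').values, axis=1),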
    dims=['gcm_rcm', 'exceedance'],
    coords={
        'gcm_rcm': ds_day_sel_catchmean.gcm_rcm.values,
        'exceedance': np.arange(1., len(ds_day_sel_catchmean.time)+1) / len(ds_day_sel_catchmean.time) * 100,
    }
)
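The computation in the cell above can be expressed as a small standalone function for a single discharge series, which is convenient for computing the same curve from observed data. This is a sketch: it uses the same rank / N * 100 exceedance as the cell above, and the removal of missing values (NaN) is an addition assumed to be appropriate for gauge records with gaps.

```python
import numpy as np


def flow_duration_curve(q):
    """Return (exceedance_percent, sorted_discharge) for a 1-D discharge series.

    Uses rank / N * 100 as the exceedance, where rank 1 is the largest value.
    """
    q = np.asarray(q, dtype=float)
    q = q[~np.isnan(q)]          # drop missing values before ranking
    q_sorted = np.sort(q)[::-1]  # descending order
    n = q_sorted.size
    exceedance = np.arange(1, n + 1) / n * 100
    return exceedance, q_sorted


exc, q_sorted = flow_duration_curve([5.0, 1.0, 3.0, 2.0])
print(exc)       # exceedance of each sorted value, in percent
print(q_sorted)  # discharges from largest to smallest
```

A common alternative is the Weibull plotting position rank / (N + 1) * 100, which avoids assigning 100 % exceedance to the lowest value; with 15 years of daily data the difference between the two is negligible.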

The resulting flow-duration curve is plotted below. The plot is interactive and makes it possible to zoom in and inspect the data, as well as to (de)select GCM-RCM model combinations in the legend.

# Create a figure
fig = go.Figure()

for gcm_rcm in ds_day_sel.gcm_rcm.values:
    fig.add_trace(go.Scatter(x=ds_flow_curve.exceedance, y=ds_flow_curve.sel(gcm_rcm=gcm_rcm), mode='lines', name=gcm_rcm))

fig.update_yaxes(range=[0, np.max(ds_flow_curve.values)])

# Customize layout
fig.update_layout(
    height=500, width=1100, 
    title_text=f'<b>Flow-duration curve</b><br>based on modelled daily river discharges in the period {str_daily_timerange}',
    yaxis_title='River discharge [m3/s]',
    xaxis_title='Exceedance [%]',
    showlegend=True,
    legend_title_text='GCM-RCM combination',
    legend={'x': 0.7, 'y': 0.95},
)

# Show the figure
fig.show()

# Save figure
fig.write_image(os.path.join(plot_dir,f'{locname}_flow-duration_curve_{str_daily_timerange}.png'))

[Figure: Flow-duration curve based on modelled daily river discharges in the period 1991-2005; x-axis exceedance in %, y-axis river discharge in m3/s]

A robust flow-duration curve requires a sufficiently long time period; as a minimum, we recommend 15 years of data (e.g. 1991-2005).

The flow-duration curve above can be compared to the curve derived from local river discharge observations in order to understand how well the model represents both the seasonal and the extreme discharges. For this workflow, the representation of extreme (low-probability) discharges is especially important.

Consider the following questions:

Are the extremes of the same order of magnitude?
Is the shape of the flow-duration curve based on model data similar to that based on observed data?
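To answer these questions quantitatively, one option is to read off the discharge at a few exceedance levels (e.g. 1 %, 5 %, 50 % and 95 %) from both curves and compare them. The sketch below does this with np.interp; the modelled and observed series are synthetic gamma-distributed values standing in for real data, and the chosen levels are illustrative assumptions.

```python
import numpy as np


def discharge_at_exceedance(q, levels):
    """Discharge exceeded for the given percentages of time (rank / N * 100 positions)."""
    q_sorted = np.sort(np.asarray(q, dtype=float))[::-1]                # descending
    exceedance = np.arange(1, q_sorted.size + 1) / q_sorted.size * 100  # increasing
    # np.interp requires increasing x values, which exceedance already satisfies
    return np.interp(levels, exceedance, q_sorted)


levels = [1, 5, 50, 95]  # percent of time the discharge is exceeded

# Synthetic stand-ins for about 15 years of modelled and observed daily discharge
rng = np.random.default_rng(seed=0)
q_model = rng.gamma(shape=2.0, scale=20.0, size=15 * 365)
q_obs = rng.gamma(shape=2.0, scale=25.0, size=15 * 365)

ratio = discharge_at_exceedance(q_model, levels) / discharge_at_exceedance(q_obs, levels)
for level, r in zip(levels, ratio):
    print(f'Q{level}: modelled / observed = {r:.2f}')
```

Ratios close to 1 at the low exceedance levels (Q1, Q5) suggest that the extremes are of the same order of magnitude, while ratios that change markedly from one level to the next point to a difference in the shape of the curve.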

Possibility to exclude certain GCM-RCM model combinations from the analysis#

The statistics of some GCM-RCM combinations might be closer to the observed discharge statistics than others. It can therefore be useful to exclude certain GCM-RCM combinations from the subsequent analysis, e.g. if the model data show statistical behavior that is very different from the observations and from the other GCM-RCM combinations. This can be done by adding the name of the GCM-RCM combination to the list variable exclude_gcm_rcm below.

Tip

If you do not want to exclude any GCM-RCM combinations, please set the variable exclude_gcm_rcm to an empty list [].

print('Available GCM-RCM combinations:')
print('\n'.join(ds_day_sel.gcm_rcm.values))

Available GCM-RCM combinations:
ICHEC-EC-EARTH_CLMcom-CCLM4-8-17
ICHEC-EC-EARTH_KNMI-RACMO22E
MOHC-HadGEM2-ES_KNMI-RACMO22E
MOHC-HadGEM2-ES_SMHI-RCA4
MPI-M-MPI-ESM-LR_MPI-CSC-REMO2009
MPI-M-MPI-ESM-LR_SMHI-RCA4

exclude_gcm_rcm = [] # set to empty list [] if you do not want to exclude any model combinations

if not exclude_gcm_rcm:
    print('No model combination is excluded.')
else:
    for gcm_rcm in exclude_gcm_rcm:
        if gcm_rcm in ds_day_sel.gcm_rcm.values:
            print(f'{gcm_rcm} will be excluded from further analysis.')
        else:
            print(f'Model combination {gcm_rcm} not found, please check correctness.')

No model combination is excluded.
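The cell above only reports which combinations are listed for exclusion. As a minimal sketch of how the list could be applied to an xarray dataset, assuming a dataset with a gcm_rcm dimension like ds_day_sel (here a small synthetic stand-in), Dataset.drop_sel removes the listed labels; errors='ignore' skips names that are not present, matching the warning-only behavior above.

```python
import numpy as np
import xarray as xr

# Small synthetic stand-in for ds_day_sel, using GCM-RCM labels printed above
ds_demo = xr.Dataset(
    {'rdis': (('gcm_rcm', 'time'), np.ones((3, 4)))},
    coords={
        'gcm_rcm': [
            'ICHEC-EC-EARTH_CLMcom-CCLM4-8-17',
            'ICHEC-EC-EARTH_KNMI-RACMO22E',
            'MOHC-HadGEM2-ES_SMHI-RCA4',
        ],
        'time': np.arange(4),
    },
)

exclude_demo = ['MOHC-HadGEM2-ES_SMHI-RCA4', 'NOT-A-MODEL']  # second name is deliberately absent

# Drop the listed combinations; errors='ignore' skips names not in the dataset
ds_filtered = ds_demo.drop_sel(gcm_rcm=exclude_demo, errors='ignore')
print(ds_filtered.gcm_rcm.values.tolist())
# ['ICHEC-EC-EARTH_CLMcom-CCLM4-8-17', 'ICHEC-EC-EARTH_KNMI-RACMO22E']
```

In a later step, the same call could be applied to the real datasets, e.g. ds_day_sel.drop_sel(gcm_rcm=exclude_gcm_rcm, errors='ignore').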

Next step#

Continue with a validation against observations.

References#

Berg, P., Photiadou, C., Bartosova, A., Biermann, J., Capell, R., Chinyoka, S., Fahlesson, T., Franssen, W., Hundecha, Y., Isberg, K., Ludwig, F., Mook, R., Muzuusa, J., Nauta, L., Rosberg, J., Simonsson, L., Sjökvist, E., Thuresson, J., and van der Linden, E. (2021). Hydrology related climate impact indicators from 1970 to 2100 derived from bias adjusted European climate projections [Data set]. Copernicus Climate Change Service (C3S) Climate Data Store (CDS). https://doi.org/10.24381/cds.73237ad6
Isberg, K. (2017). EHYPE3_subbasins.zip [Data set]. Zenodo. https://doi.org/10.5281/zenodo.581451

Contributors#

Author of the workflow: Natalia Aleksandrova (Deltares)
