Hazard assessment for heatwaves based on satellite-derived data#

Under climate change, the occurrence of heatwaves is expected to increase in the future in Europe. The main negative impacts caused by heatwave events are related to the overheating of the urban areas, which lowers the comfort of living or causes health issues among vulnerable population (see also: Integrated Assessment of Urban Overheating Impacts on Human Life), drought, and water scarcity. Nowadays, there are a lot of studies and methodologies on how we can mitigate the impacts of these events. This toolbox wants to answer simple questions that are more frequently asked by crisis management local authorities, urban planners, or policymakers.

These questions are:

  1. What are the problematic areas? (most overheated areas)

  2. Who or what is exposed?

Note

The video does not reflect the current workflow structure. Notebooks have been renamed and/or moved since the recording.

This workflow contains the following steps used to generate a risk map for your selected area:

  • Preparatory work: installing packages and creating directory structure

  • Understanding the trends in the occurence of hot days in the climate change scenarios

  • Identifying the heat islands (areas most exposed to heat) in your selected area, based on the observed land surface temperature (from RSLab Landsat8, resolution: 30x30m)

  • Analysing the distribution of vulnerable population, based on population distribution data.

  • Creating a heatwave risk map based on the risk matrix where risk = (level of exposure to heat in the area) x (level of vulnerability)

Preparation work#

Prepare your workspace#

import zipfile
import os
from glob import glob
from datetime import datetime

import numpy as np
import xarray as xr
import pandas as pd
import rasterio

import matplotlib.pyplot as plt
import plotly.express as px
# Set host forwarding for remote jupyterlab sessions
if 'JUPYTERHUB_SERVICE_PREFIX' in os.environ:
    os.environ['LOCALTILESERVER_CLIENT_PREFIX'] = f"{os.environ['JUPYTERHUB_SERVICE_PREFIX']}/proxy/{{port}}"

Create directory structure#

# Define the directory for the flashflood workflow preprocess
workflow_folder = 'Heatwave_risk'

# Define directories for data and results within the previously defined workflow directory
data_dir = os.path.join(workflow_folder, 'data')
LST_dir = os.path.join(data_dir, 'LST')
pop_dir = os.path.join(data_dir, 'population')
results_dir = os.path.join(workflow_folder, 'results')

# Create the workflow directory along with subdirectories for data and results
for path in [workflow_folder, data_dir, LST_dir, pop_dir, results_dir]:
    os.makedirs(path, exist_ok=True)

Identify overheated areas in the urban environment#

Data description#

Heat islands are urbanized areas that experience higher temperatures than surrounding areas. Structures such as buildings, roads, and other infrastructure absorb and re-emit the sun’s heat, thus resulting in warmer temperatures than in the natural landscapes (such as forests and water bodies). Urban areas, where these structures are highly concentrated and greenery is limited, become “islands” of higher temperatures relative to outlying areas (see also: Heat islands definition). For the identification of the heat islands from the historical data we can use these data:

Satellite-derived land surface temperature data in this workflow is based on land surface temperature (LST) calculated from Landsat 8 or 9 imagery (spatial resolution: 15-30m; at 8-16 days interval). We use this data product for the identification of heat islands.

Obtain land surface temperature#

LST data is available from the RSLab website (option 1) or can be calculated here from the Landsat imagery bands (option 2).

Tip

Pick your preferred option to obtain the LST data and skip over the cells of the other.

Option 1: download LST from the RSLab portal#

Land surface temperature data needs to be manually downloaded from the RSLab portal. To do this, you need to follow these steps:

  1. Go to the RSLab portal and draw a closed polygon around your area of interest by zooming in and clicking on the map. If you are focusing on urban areas, it is important to try to select only the urban areas in the rectangular polygon. You may encounter “Very big request” errors when selecting areas larger than the allowed limit or for complex selections (too many polygon vertices - try a simple bounding box instead).

  2. Select the time period, e.g. June to August in a specific year, where particularly high temperatures were observed. From Landsat8, data is available for 2013-2024, it is recommended to choose a year in this period (optional: consult the local meteorological authorities for more information on past heat waves).

  3. Select your preferred Landsat source and emissivity. For the dates in the period 2013-2024, please choose Landsat8 as the source. For urban areas, it is recommended to use MODIS and NDVI-based emissivity (ASTER emissivity is preferred for natural landscapes). Refer to the documentation for more details.

  4. Click the “Calculate LST” button.

  5. The results will be presented below the map. For each LST result you will find a “show” and “download” buttons. “Show” button allows to display the LST as a layer on the map and calculates the mean/min/max LST value for the selected image.

  6. To dowload the images, please press “Download - All images”, this will start the download of the “AllImages_LST.zip” file that contains .tiff images.

  7. Once downloaded, please move the downloaded file to the folder specified above under variable LST_dir (folder ./Heatwave_risk/data_dir/LST).

This code unzips all downloaded LST data:

# Loop through all files in the directory
for file in os.listdir(LST_dir):
    file_path = os.path.join(LST_dir, file)
    # Check if the file is a zipfile
    if zipfile.is_zipfile(file_path):
        # Open the zipfile
        with zipfile.ZipFile(file_path, 'r') as zip_ref:
            # Extract all contents of the zipfile into the working directory
            zip_ref.extractall(LST_dir)

Option 2: download Landsat 8 & 9 imagery and calculate LST#

While the public RSLab data portal provides convenient access to Landsat-derived land surface temperature values, it imposes restrictions on the size of the region for which data is downloaded. Alternatively, we can access Landsat imagery directly, e.g., via the US Geological Survey (USGS) Machine to Machine (M2M) interface and compute LST ourselves. Download volumes following this option are much larger (approx 1.3 GB per Landsat scene) compared to the direct download of LST from the RSLab data portal, but allow for the processing of larger regions.

We use the eodag package to search the USGS catalogue for Landsat collection 2 level 1 products and download them. The data access requires credentials to access the USGS M2M interface. An account with USGS can be registered here.

Land surface temperature is calaculated with the pylandtemp package.

import eodag
import pylandtemp

# Output folder for Landsat data
landsat_dir = os.path.join(data_dir, "landsat")

Before querying the USGS data catalogue, you have to configure the usgs source of eodag to use your account credentials. If choosing a yaml-file based configuration, an entry such as

usgs:
  api:
    credentials:
      username: ...
      password: ...

must be added with your account credentials. See the eodag configuration documentation for all configuration options.

Attention

  • Since February 2025, the USGS M2M only accepts token-based authentication. Put your access token in the password field for eodag.

  • The credentials in the eodag configuration file for the usgs provider must be placed under the api entry (not auth as used by some other providers).

  • If you are seeing UnsupportedProvider errors when searching for products, it can indicate that the usgs provider for eodag is not correctly set up in your configuration.

dag = eodag.EODataAccessGateway()  # also accepts a path to a configuration file as an argument

We search the catalogue for matching Landsat Collection 2 Level 1 (LANDSAT_C2L1) products:

  • geom defines the area of interest. Here, lat-lon coordinates of a bounding box are given, but a shapely geometry object would also be accepted. Alternatively, a recognized location can be specified with the locations parameter. See the documentation of search_all() for more information.

  • A time period is selected with the start and end parameters.

  • We fix the data provider here to USGS, but other providers of Landsat data may be available to you. Run dag.available_providers("LANDSAT_C2L1") to find alternatives.

# Bounding box for the area of interest
area = {
    "lonmin": 18.65, "lonmax": 18.85,
    "latmin": 49.16, "latmax": 49.27
}

products = dag.search_all(
    productType="LANDSAT_C2L1",
    geom=area,
    start="2015-06-01",
    end="2015-08-31",
    provider="usgs"
)

Tip

The example is configured to download a single year only, but data from multiple years can be selected. Products then may require additional filtering, e.g., to only include scenes from summer months.

Only consider images that fully contain our region of interest (minimum overlap = 100%):

products = products.filter_overlap(geometry=area, minimum_overlap=100)

Products selected for further processing:

products

Download each product, only keep values for cloud-free pixels, clip to the region of interest and reproject to lat-lon coordinate system for the calculation of land surface temperature:

Hide code cell source

def open_landsat_file(path, suffix):
    """Find and open a Landsat band file in a product folder"""
    for file in os.listdir(path):
        if file.endswith(f"_{suffix}.TIF"):
            # Bands are split into files, therefore each file is single-band
            # and we can squeeze out the band dimension to get a 2D array
            return xr.open_dataarray(os.path.join(path, file)).squeeze()
    raise ValueError(f"band/product {suffix} not found in {path}")

def open_and_preprocess(path, bands, mask_clouds=True, crs=None, clip_bbox=None):
    """Load, mask, reproject and clip a selection of bands from a Landsat product"""
    ds = xr.Dataset({f"band_{i}": open_landsat_file(path, f"B{i}") for i in bands})
    # Use QA_Pixel value to mask cloudy pixels if desired
    # https://www.usgs.gov/landsat-missions/landsat-collection-2-quality-assessment-bands
    if mask_clouds:
        qa = open_landsat_file(path, "QA_PIXEL")
        is_clear = (qa == 21824)
        ds = ds.where(is_clear, np.nan)
    # Reproject and clip before computation (note the reuse of crs for clip bbox)
    if crs is not None and ds.rio.crs != crs:
        ds = ds.rio.reproject(crs, nodata=np.nan)
    if clip_bbox is not None:
        ds = ds.rio.clip_box(*clip_bbox, crs=crs)
    return ds

def compute_land_surface_temperature(ds):
    """Compute LST in °C from a dataset with Landsat bands"""
    # Pick your preferred algorithm for calculating LST, see
    # https://github.com/pylandtemp/pylandtemp?tab=readme-ov-file#supported-algorithms-and-their-reference-keys
    lst = pylandtemp.split_window(
        ds["band_10"].values,
        ds["band_11"].values,
        ds["band_4"].values,
        ds["band_5"].values,
        lst_method="jiminez-munoz",
        emissivity_method="avdan"
    )
    # Convert from Kelvin to degrees Celsius
    lst = lst - 273.15
    # Return with spatial metadata taken from the input dataset
    return xr.DataArray(lst, dims=["y", "x"], coords=ds.coords).rio.write_crs(ds.rio.crs)
for product in products:

    # Download Landsat product (previously downloaded products will be reused)
    product_outdir = product.download(output_dir=landsat_dir, extract=True, delete_archive=True)

    # Create a dataset from the bands required to compute LST, reproject to lat-lon coords
    # and clip to the area of interest (TODO: consider using eodag-cube's get_data)
    product_data = open_and_preprocess(
        product_outdir,
        bands=[4, 5, 10, 11],
        mask_clouds=True,
        crs="EPSG:4326",
        clip_bbox=[area["lonmin"], area["latmin"], area["lonmax"], area["latmax"]]
    )

    # Compute land surface temperature with one of pylandtemp's algorithms
    product_lst = compute_land_surface_temperature(product_data)

    # Save LST such that it can be read in the same way as the data obtained in option 1
    product_date = datetime.fromisoformat(product.properties["startTimeFromAscendingNode"])
    product_lst.rio.to_raster(os.path.join(LST_dir, f"LST.{product_date:%Y%m%d_%H%M%S}.tif"), driver="gtiff")

Note

The output land surface temperature data is written by the code above such that it is compatible with the RSLab products. From here on, further processing in the workflow is identical for both options.

Preprocessing#

The code below extracts all dates from LST data files:

# This code extraxcts all dates from LST tif.files
# Create a list of file paths to your LST files using glob
L8list = glob(f"{LST_dir}/*LST*.tif")
# Initialize an empty list to store the datetime objects
lst_dates = []
# Loop through each filename
for file in L8list:
    # Extract the filename without the directory path
    filename = file.split('/')[-1]   
    # Extract the date and time part from the filename
    if "AllImages_LST" in filename:
        date_time_str = filename.split('.')[1]
        date_str = date_time_str.split('_')[0]
        time_str = date_time_str.split('_')[1]
    else:
        date_str = filename[4:12]  # Extract date part directly
        time_str = filename[13:19]  # Extract time part directly    
    # Combine date and time strings into a single string
    datetime_str = date_str + '_' + time_str    
    # Convert the combined datetime string to a datetime object
    datetime_obj = datetime.strptime(datetime_str, '%Y%m%d_%H%M%S')   
    # Append the datetime object to the lst_dates list
    lst_dates.append(datetime_obj)

lst_dates
[datetime.datetime(2015, 7, 12, 0, 0),
 datetime.datetime(2015, 7, 28, 0, 0),
 datetime.datetime(2015, 8, 29, 0, 0),
 datetime.datetime(2015, 8, 13, 0, 0),
 datetime.datetime(2015, 7, 19, 0, 0),
 datetime.datetime(2015, 8, 4, 0, 0),
 datetime.datetime(2015, 8, 20, 0, 0)]

This code will create a raster stack from all downloaded data:

# Load the data and create a raster stack from all maps
with rasterio.open(L8list[0]) as src0:
    meta = src0.meta

meta.update(count=len(L8list))
# Save a data to working directory
with rasterio.open(f'{data_dir}/L8_raster_stack.tif', 'w', **meta) as dst:
    for id, layer in enumerate(L8list, start=1):
        with rasterio.open(layer) as src1:
            dst.write_band(id, src1.read(1))

This code calculates mean values from the LST rasters which are not significantly influenced by clouds:

# Open the raster stack
stack_path = f'{data_dir}/L8_raster_stack.tif'
raster_dataset = xr.open_dataset(stack_path)

# Count the number of pixels greater than 0 for each band
num_pixels_gt_zero = (raster_dataset > 0).sum(dim=('x', 'y'))

# Calculate the total number of pixels for each band
total_pixels = raster_dataset.sizes['x'] * raster_dataset.sizes['y']

# Calculate the proportion of pixels greater than 0
prop_gt_zero = num_pixels_gt_zero / total_pixels

# Filter bands where at least 50% of pixels are greater than 0
filtered_bands = raster_dataset.where(prop_gt_zero >= 0.5)

# Calculate the mean values for the filtered bands
mean_values_filtered = filtered_bands.mean(dim=('x', 'y'))
#mean_values_filtered = mean_values_filtered.fillna(0)
mean_values_filtered = mean_values_filtered['band_data']
mean_values_list = mean_values_filtered.values.tolist()

Classify based on the LST#

In this step, you reclassify the LST into 5 categories (Very low - Very high) based on the temperature, each category will contain two values, we divided the values into 10 groups because of the better sensitivity in the urban areas. You can change the treshold values for each category:

  • Very low < 20-25°C [values 1-2]

  • Low 25-35 °C [values 3-4]

  • Medium 35-45 °C [values 5-6]

  • High 45-55 °C [values 7-8]

  • Very High 55-60 <°C [values 9-10]

Note

If needed, you can change the threshold values for each category to see more of a difference between the different areas in the city.

Important

The LST values represent the surface temperature, not the air temperature. The surface temperature reaches higher values than the air temperature. The ground surface temperature is an important influencing factor for the weather as it is perceived by humans. The temperature of the ground surface can be more than 10°C higher than the air temperature on a sunny day, and up to 10°C below air temperature on clear nights when the surface loses heat by radiation. (see also: Land surface temperature vs air temperature).

# Define classes for grouping LST data
lc_bins = [20, 25, 30, 35, 40, 45, 50, 55, 60]  # Effect on the human health
lc_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Load the data and calculate maximum values from raster stack
L8_path = f'{data_dir}/L8_raster_stack.tif'
L8 = xr.open_dataset(L8_path)
L8 = L8.max(dim='band', skipna=True, keep_attrs=True)
L8lst2016 = L8['band_data']
lstbbox = L8.rio.bounds()

# Function to reclassify data using numpy
def reclassify_with_numpy_ge(array, bins, new_values):
    reclassified_array = np.zeros_like(array)
    reclassified_array[array < bins[0]] = new_values[0]
    for i in range(len(bins) - 1):
        reclassified_array[(array >= bins[i]) & (array < bins[i + 1])] = new_values[i + 1]
    reclassified_array[array >= bins[-1]] = new_values[-1]
    return reclassified_array

# Apply the reclassification function
lc_class = reclassify_with_numpy_ge(L8lst2016.values, bins=lc_bins, new_values=lc_values)

# Convert the reclassified array back to xarray DataArray for plotting
lc_class_da = xr.DataArray(lc_class, coords=L8lst2016.coords, dims=L8lst2016.dims)

# Plot the data
fig, ax = plt.subplots()
cmap = plt.get_cmap('Reds', 10) 
oa = lc_class_da.plot(ax=ax, cmap=cmap, add_colorbar=False)
cbar = fig.colorbar(oa, ticks=[1, 3.25, 5.5, 7.75, 10])
cbar.ax.set_yticklabels(['Very Low', 'Low', 'Medium', 'High', 'Very High'], size=10)
ax.set_xlabel('lon [deg_east]')
ax.set_ylabel('lat [deg_north]')
plt.title('Overheated areas in the area of interest', size=16, pad=10)
plt.show()
../../../../_images/c280a7dd45a29cc1c3b93acb1d75cf8f77834b18f5246d6fcb185c2384def867.png

Plot LST#

  • We display the measured data of the 2m air temperature together with the LST data, to see for which days we downloaded the LST data.

  • This step will give us the information if we downloaded the data for the days with the highest air temperature.

# Create a dictionary from your data
data = {'Dates': lst_dates, 'Values': mean_values_list}
# Create a DataFrame from the dictionary
df = pd.DataFrame(data)

# Plot using Plotly Express
fig = px.scatter(df, x='Dates', y='Values', labels={'Dates': 'Date', 'Values': 'LST Mean Values'}, color='Values')
fig.update_traces(marker=dict(color='green'))  # Set dot color to green
fig.update_traces(mode='markers')  # Display as dots
# Add title and center it
fig.update_layout(title_text='LST mean values (cloud coverage<50%)',
                  title_x=0.5,  # Center the title horizontally
                  title_y=0.9)  # Adjust vertical position if needed
fig.update_xaxes(tickformat='%d-%m-%Y')
fig.show()

This plot gives you the information about land surface temperature mean values for the selected area, per Landsat image. Only the days corresponding to the LST pictures with cloud coverage lower then 50% are plotted.

References#

Authors#