How to use Deep Learning for Flood Forecasting

Saving 460 Million Lives with more accurate early warnings

Devansh
9 min read · Jan 3, 2025

Following is an excerpt from my article- “How Google Built an Open Source AI to Provide Accurate Flood Warnings for 460 Million People.”

The original article covers several ideas, such as:

  1. Why Weather Forecasting is so difficult.
  2. Why LSTMs are well suited for weather forecasting.
  3. Google’s (open-source) cutting-edge architecture for the Flood Forecasting System deployed globally.
  4. And what policymakers should learn from it.

The following excerpt focuses on point 3. It breaks down the Flood Forecasting System (FFS) in more detail-

Understanding Google’s Amazing ‘AI Model’ for Flood Forecasting

You can skip this section if you have read the paper and understand the FFS architecture and its decisions (although in that case, I would really appreciate your input on why they made certain architectural decisions).

We’ll first cover the data used to develop FFS.

FFS and Training Data (section copied from the announcement)

The model uses three types of publicly available data inputs, mostly from governmental sources:

  1. Static watershed attributes representing geographical and geophysical variables: From the HydroATLAS project, including data like long-term climate indexes (precipitation, temperature, snow fractions), land cover, and anthropogenic attributes (e.g., a nighttime lights index as a proxy for human development).
  2. Historical meteorological time-series data: Used to spin up the model for one year prior to the issue time of a forecast. The data comes from NASA IMERG, NOAA CPC Global Unified Gauge-Based Analysis of Daily Precipitation, and the ECMWF ERA5-land reanalysis. Variables include daily total precipitation, air temperature, solar and thermal radiation, snowfall, and surface pressure.
  3. Forecasted meteorological time series over a seven-day forecast horizon: Used as input for the forecast LSTM. These data are the same meteorological variables listed above, and come from the ECMWF HRES atmospheric model.

Training data are daily streamflow values from the Global Runoff Data Center over the time period 1980–2023. A single streamflow forecast model is trained using data from 5,680 diverse watershed streamflow gauges (shown below) to improve accuracy.
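
To make that data layout concrete, here is a rough numpy sketch of how the three input types might line up as training tensors. Every shape and variable name below is my own illustration, not Google’s actual pipeline-

```python
import numpy as np

# All shapes here are illustrative assumptions, not Google's actual pipeline.
batch_size   = 32    # watersheds sampled per training batch
hindcast_len = 365   # one year of historical daily weather (model spin-up)
forecast_len = 7     # seven-day forecast horizon
n_dynamic    = 6     # e.g. precipitation, temperature, radiation, snowfall, pressure
n_static     = 20    # HydroATLAS attributes: climate indexes, land cover, etc.

# Historical meteorology (NASA IMERG, NOAA CPC, ERA5-Land) -> hindcast LSTM.
hindcast_weather = np.random.randn(batch_size, hindcast_len, n_dynamic)

# Forecasted meteorology (ECMWF HRES) -> forecast LSTM.
forecast_weather = np.random.randn(batch_size, forecast_len, n_dynamic)

# Static watershed attributes are repeated along the time axis and
# concatenated onto both dynamic sequences.
static_attrs = np.random.randn(batch_size, n_static)
hindcast_inputs = np.concatenate(
    [hindcast_weather, np.repeat(static_attrs[:, None, :], hindcast_len, axis=1)],
    axis=-1)
forecast_inputs = np.concatenate(
    [forecast_weather, np.repeat(static_attrs[:, None, :], forecast_len, axis=1)],
    axis=-1)

# Targets: daily streamflow from GRDC gauges over the forecast horizon.
targets = np.random.randn(batch_size, forecast_len)
print(hindcast_inputs.shape, forecast_inputs.shape, targets.shape)
# (32, 365, 26) (32, 7, 26) (32, 7)
```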

Location of 5,680 streamflow gauges that supply training data for the river forecast model from the Global Runoff Data Center.

Understanding the FFS Architecture

Here is how the researchers describe their architecture-

Our river forecast model uses two LSTMs applied sequentially: (1) a “hindcast” LSTM ingests historical weather data (dynamic hindcast features) up to the present time (or rather, the issue time of a forecast), and (2) a “forecast” LSTM ingests states from the hindcast LSTM along with forecasted weather data (dynamic forecast features) to make future predictions. One year of historical weather data are input into the hindcast LSTM, and seven days of forecasted weather data are input into the forecast LSTM. Static features include geographical and geophysical characteristics of watersheds that are input into both the hindcast and forecast LSTMs and allow the model to learn different hydrological behaviors and responses in various types of watersheds.

Output from the forecast LSTM is fed into a “head” layer that uses mixture density networks to produce a probabilistic forecast (i.e., predicted parameters of a probability distribution over streamflow). Specifically, the model predicts the parameters of a mixture of heavy-tailed probability density functions, called asymmetric Laplacian distributions, at each forecast time step. The result is a mixture density function, called a Countable Mixture of Asymmetric Laplacians (CMAL) distribution, which represents a probabilistic prediction of the volumetric flow rate in a particular river at a particular time.

Here is what I understand about what they did (and why they did what they did)-

Overall Structure

The core idea is to use two LSTMs in a chained fashion for river forecasting:

  • Hindcast LSTM: This is like establishing a baseline; it learns from historical weather data.
  • Forecast LSTM: This builds on the baseline, fine-tuning predictions using forecasted weather data.

Hindcast LSTM

  • Input: One year of historical weather data. This gives the model a long-term understanding of seasonal patterns and typical weather in the region.
  • Static Features: Geographical/geophysical data about the watersheds is included. This teaches the model how different terrains and landscapes respond to weather (e.g., runoff amounts, timings).
  • Purpose: The hindcast LSTM develops a robust representation of the river system’s typical behavior under varying historical weather conditions. This acts similarly to a prior in Bayesian Learning.

Forecast LSTM

  • Input 1: The internal states from the hindcast LSTM (the ‘memory’ of the river system).
  • Input 2: 7 days of forecasted weather data. This is essential for immediate, short-term predictions.
  • Static Features (Again): The same geographical/geophysical data is re-used to ensure the forecast stage is consistent with the knowledge developed during the hindcast.
  • Purpose: The forecast LSTM takes the baseline understanding and “fine-tunes” it based on the expected upcoming weather (see the sketch after this list).
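
Here is a minimal PyTorch sketch of that chaining, based on my reading of the paper. I simply pass the hindcast LSTM’s final hidden/cell states in as the forecast LSTM’s initial states (the published model uses a learned state handoff, and its real head is the CMAL layer covered below)-

```python
import torch
import torch.nn as nn

class HindcastForecastLSTM(nn.Module):
    """Sketch of the two-LSTM chain; NOT Google's exact implementation."""

    def __init__(self, n_dynamic: int, n_static: int, hidden: int = 128):
        super().__init__()
        in_size = n_dynamic + n_static  # static attrs concatenated at every step
        self.hindcast_lstm = nn.LSTM(in_size, hidden, batch_first=True)
        self.forecast_lstm = nn.LSTM(in_size, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # placeholder; the real head is a CMAL MDN

    def forward(self, hindcast_x, forecast_x, static_x):
        # Broadcast static watershed attributes along each time axis.
        h_static = static_x[:, None, :].expand(-1, hindcast_x.shape[1], -1)
        f_static = static_x[:, None, :].expand(-1, forecast_x.shape[1], -1)

        # Hindcast LSTM spins up on one year of historical weather.
        _, (h_n, c_n) = self.hindcast_lstm(torch.cat([hindcast_x, h_static], -1))

        # Forecast LSTM starts from the hindcast states (the "memory" handoff)
        # and rolls forward over the 7-day forecast horizon.
        out, _ = self.forecast_lstm(torch.cat([forecast_x, f_static], -1), (h_n, c_n))
        return self.head(out).squeeze(-1)  # one value per forecast day

model = HindcastForecastLSTM(n_dynamic=6, n_static=20)
pred = model(torch.randn(4, 365, 6), torch.randn(4, 7, 6), torch.randn(4, 20))
print(pred.shape)  # torch.Size([4, 7])
```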

“Head” Layer with Mixture Density Networks

Why Probabilistic: Weather and river flow are inherently uncertain. Instead of a single value (e.g., “the river will rise 2 meters”), a probabilistic forecast is far more useful because it conveys the range of possibilities. I super-duper love this call b/c it will also feed into simulations better. Let’s understand the math terms.

Mixture Density Networks (MDNs): Imagine multiple probability distributions (like bell curves) mixed together. An MDN is a neural network that outputs the parameters of such a mixture, which lets it represent complex shapes in real-world data that no single type of distribution describes well. The underlying idea is that a mixture of simple distributions (classically, normal distributions) can approximate almost any general distribution. For example, a bimodal distribution can be represented as the sum of two simpler ones:

Red + Blue = Purple
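
To make the “Red + Blue = Purple” idea concrete, here is a tiny numpy sketch: two Gaussians with different means, mixed with weights that sum to 1, produce a bimodal density that neither component could represent on its own-

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-6, 6, 1000)

# Two "component" distributions (the red and blue curves).
red  = norm.pdf(x, loc=-2.0, scale=1.0)
blue = norm.pdf(x, loc=2.5, scale=0.8)

# The mixture (the purple curve): a weighted sum of the components.
# The weights must sum to 1 so the result is still a valid density.
purple = 0.6 * red + 0.4 * blue

# The mixture is bimodal -- no single Gaussian can match this shape.
print("modes near x =", x[np.argmax(red)], "and x =", x[np.argmax(blue)])
print("integrates to ~1:", (purple * (x[1] - x[0])).sum())
```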

Asymmetric Laplacian Distributions: This is a specific type of probability distribution suited to data with steeper peaks and heavier tails (those heavy tails are why the Laplace distribution is sometimes called the normal distribution of finance). Heavier tails make it better at capturing volatility and extreme values, which is likely why it was used here. Take a look at the comparison of a Laplace distribution vs a normal distribution to see how their properties differ-

In that comparison, the peak represents the majority of days where there’s no flooding. The short decay captures mild to moderate floods that become less frequent as their severity increases. And the long tail represents rare but extreme flood events that, while infrequent, have significant impacts.
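
You can check the “steeper peak, heavier tails” claim numerically. The snippet below uses scipy’s symmetric Laplace for simplicity (the asymmetric version additionally lets the two tails decay at different rates)-

```python
from scipy.stats import norm, laplace

# Scale the Laplace so both distributions have unit variance
# (Var(Laplace) = 2*b^2, so b = 1/sqrt(2)).
n = norm(loc=0, scale=1)
l = laplace(loc=0, scale=2 ** -0.5)

print("density at the peak:", n.pdf(0), "vs", l.pdf(0))        # Laplace is sharper
print("P(|X| > 4)         :", 2 * n.sf(4), "vs", 2 * l.sf(4))  # Laplace tail is heavier
```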

Countable Mixture of Asymmetric Laplacians (CMAL): This combines multiple asymmetric Laplacian distributions into a single, even more flexible distribution. It can capture a wide range of possible values (river flow rates) and their corresponding probabilities. CMAL seems to be based on the same philosophy as ensemble models: let different submodels handle specific skews/distributions and combine them.
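
Here is a sketch of the asymmetric Laplacian log-density (in the quantile parameterization the UMAL paper uses) and the CMAL negative log-likelihood built on top of it. The three-component mixture at the bottom is a hypothetical example I made up, not values from the model-

```python
import numpy as np

def ald_logpdf(y, mu, b, tau):
    """Log-density of an asymmetric Laplacian in the quantile parameterization:
    f(y) = tau*(1-tau)/b * exp(-rho_tau((y-mu)/b)),
    where rho_tau(u) = u * (tau - 1[u < 0]) is the pinball/quantile loss."""
    u = (y - mu) / b
    rho = u * (tau - (u < 0))
    return np.log(tau) + np.log(1 - tau) - np.log(b) - rho

def cmal_nll(y, weights, mus, bs, taus):
    """Negative log-likelihood of y under a (countable) mixture of ALDs.
    In the real model, the neural "head" layer would predict these parameters."""
    log_comps = np.log(weights) + ald_logpdf(y, mus, bs, taus)
    # log-sum-exp over components for numerical stability
    m = log_comps.max()
    return -(m + np.log(np.exp(log_comps - m).sum()))

# Hypothetical 3-component mixture for one gauge at one forecast step:
weights = np.array([0.7, 0.2, 0.1])    # most days: low flow; rarely: floods
mus     = np.array([5.0, 20.0, 80.0])  # typical, elevated, extreme flow (m^3/s)
bs      = np.array([1.0, 5.0, 30.0])   # spread grows with severity
taus    = np.array([0.5, 0.6, 0.8])    # asymmetry: extreme modes skew upward

print(cmal_nll(15.0, weights, mus, bs, taus))  # NLL of observing 15 m^3/s
```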


This is what the paper cited by Google for CMAL has to say:

In regression tasks, aleatoric uncertainty is commonly addressed by considering a parametric distribution of the output variable, which is based on strong assumptions such as symmetry, unimodality or by supposing a restricted shape. These assumptions are too limited in scenarios where complex shapes, strong skews or multiple modes are present…. Apart from synthetic cases, we apply this model to room price forecasting and to predict financial operations in personal bank accounts. We demonstrate that UMAL produces proper distributions, which allows us to extract richer insights and to sharpen decision-making.

- CMAL is just UMAL, but with a countable number of asymmetric Laplacians.

Figure 1: Regression problem with heterogeneous output distributions modelled with UMAL.

CMAL gives FFS a specialized toolbox of different ALDs, allowing it to capture the nuances of regional differences (topography changes flooding impacts), seasonal shifts, and even unpredictable extremes in flood patterns. Overall, CMAL delivers more precision, better uncertainty quantification, and more flexibility than single-distribution systems.
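
This is also where the probabilistic forecast pays off operationally: instead of a single number, you can read off the probability of exceeding a flood-warning threshold. A hypothetical sketch, sampling from the same made-up CMAL-style mixture as above-

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ald(mu, b, tau, rng):
    """Sample an asymmetric Laplacian via its inverse CDF (quantile function)."""
    u = rng.uniform(size=mu.shape)
    return np.where(u < tau,
                    mu + (b / (1 - tau)) * np.log(u / tau),
                    mu - (b / tau) * np.log((1 - u) / (1 - tau)))

# Same hypothetical 3-component mixture as in the previous sketch.
weights = np.array([0.7, 0.2, 0.1])
mus     = np.array([5.0, 20.0, 80.0])
bs      = np.array([1.0, 5.0, 30.0])
taus    = np.array([0.5, 0.6, 0.8])

# Sample the mixture: pick a component per draw, then sample that component.
n = 100_000
comp = rng.choice(3, size=n, p=weights)
flows = sample_ald(mus[comp], bs[comp], taus[comp], rng)

threshold = 60.0  # hypothetical flood-warning level, m^3/s
print(f"P(flow > {threshold}) ~= {(flows > threshold).mean():.3f}")
```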

That is my analysis of their architecture and why it works (based on the two pillars of chocolate milk and Googling). If you have anything to add, I would love to hear it. Maybe I missed it here, but I really wish AI papers spent more time explaining why they decided to do what they did (this is something that Amazon Science does extremely well in their engineering blogs). It would save everyone a lot of time and can only help in pushing science forward. I don’t understand why it’s not common practice.

Using the dataset described herein, the AI model takes a few hours to train on a single NVIDIA-V100 graphics processing unit. The exact wall time depends on how often validation is done during training. We use 50 validation steps (every 1,000 batches), resulting in a 10-hour train time for the full global model.

- Good, effective AI doesn’t have to be a leviathan. Can we please start focusing more on systems like this?

I can think of a few interesting ways to improve this: using HPC (high-performance computing) for more comprehensive simulations, synthetic data for augmentation, and other techniques that might be useful for forecasting. I’ll touch upon these when I do deep dives into AI for weather forecasting/simulations. If you want to help me with the research/testing for that, shoot me a message and we can talk.

a,b, Confusion matrices of out-of-sample predictions about whether F1 scores from GloFAS (a) and the AI model (b) at each gauge are above or below the mean F1 score from the same model over all gauges. The numbers shown on the confusion matrices are micro-averaged precision and recall, and the colours serve as a visual indication of these same numbers. c, Correlations between F1 scores and HydroATLAS catchment attributes that have the highest feature importance ranks from these trained score classifier models. GloFAS simulation data from the Climate Data Store.

I absolutely love this application of AI, and I think the people at Google did a phenomenal job here. I just wish it had been marketed better, since this is the kind of implementation of AI/tech that actually does a lot of good in the world. Spreading awareness about such uses is key to pushing more investment into these projects, so please help me share this project with more people. Once again, the full breakdown of this project is available here.

Thank you for reading, and I hope you have a wonderful day. Go kill it.

Dev ❤.

I put a lot of work into writing this newsletter. To do so, I rely on you for support. If a few more people choose to become paid subscribers, the Chocolate Milk Cult can continue to provide high-quality and accessible education and opportunities to anyone who needs it. If you think this mission is worth contributing to, please consider a premium subscription. You can do so for less than the cost of a Netflix Subscription (pay what you want here).

Help me buy chocolate milk

If you liked this article and wish to share it, please refer to the following guidelines.

That is it for this piece. I appreciate your time. As always, if you’re interested in working with me or checking out my other work, my links will be at the end of this email/post. And if you found value in this write-up, I would appreciate you sharing it with more people. It is word-of-mouth referrals like yours that help me grow. You can share your testimonials over here.

Reach out to me

Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.

Small Snippets about Tech, AI and Machine Learning over here

AI Newsletter- https://artificialintelligencemadesimple.substack.com/

My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/

My (imaginary) sister’s favorite MLOps Podcast- https://open.spotify.com/show/7wZygk3mUUqBaRbBGB1lgh?si=b93afa69de994c88&nd=1&dlsi=ac0f8d9ac35642d5

Check out my other articles on Medium: https://rb.gy/zn1aiu

My YouTube: https://rb.gy/88iwdd

Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y

My Instagram: https://rb.gy/gmvuy9

My Twitter: https://twitter.com/Machine01776819
