Groundsource: Google’s AI turns news articles into a flood history dataset

Google Research just dropped something I’ve been waiting for — a way to turn the noise of news reports into structured, usable data. They call it Groundsource, and the first output is a global flash flood dataset with 2.6 million records. That’s not a typo. 2.6 million.

Here’s the problem they’re solving: we have decent data for big earthquakes or hurricanes that satellites can see. But flash floods? Those quick, localized disasters that kill people in urban areas? They slip through the cracks. Traditional databases like the Dartmouth Flood Observatory or GDACS capture maybe tens of thousands of events, mostly the big ones. That’s nowhere near enough to train AI models that need to predict where the next flood will hit.

Groundsource uses Gemini to read news articles — millions of them — and extract location, date, severity, and other details. No manual tagging. No waiting for satellite passes. Just AI parsing text and turning it into a map of historical floods going back to the year 2000, covering over 150 countries.

I’ll be honest: the idea isn’t entirely new. Researchers have been scraping news for disaster data for years. What impresses me is the scale: 2.6 million records is one to two orders of magnitude more than the tens of thousands of events in the traditional databases. And Google is releasing the dataset openly, which is the right call. Proprietary datasets help no one when you’re trying to save lives.

The methodology itself is interesting. They don’t just grab headlines and call it a day. The pipeline filters for relevance, extracts structured fields using Gemini’s long-context capabilities, and then validates against known sources. They claim high precision on location extraction, which is the hardest part — news articles often say “flooding in the city center” without giving coordinates.
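To make those three stages concrete, here is a minimal Python sketch of a filter, extract, validate loop. This is my own illustration, not Google’s code: the FloodRecord fields, the keyword filter, the date-range check and the regex-based toy_extract (a stand-in for the Gemini call) are all assumptions, and the real pipeline is certainly more sophisticated.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List, Optional


@dataclass
class FloodRecord:
    # Hypothetical record layout; the real Groundsource schema may differ.
    place: str
    event_date: date
    severity: str


# Assumed keyword list for the relevance stage.
FLOOD_TERMS = ("flash flood", "flooding", "flooded")


def is_relevant(article: str) -> bool:
    """Stage 1: keep only articles that mention a flood term."""
    text = article.lower()
    return any(term in text for term in FLOOD_TERMS)


# Toy pattern for sentences like "... in Jakarta on 2020-01-01 ...".
PATTERN = re.compile(r"in (?P<place>[A-Z][\w ]*?) on (?P<day>\d{4}-\d{2}-\d{2})")


def toy_extract(article: str) -> Optional[FloodRecord]:
    """Stage 2 stand-in: a regex where the real system would call an LLM."""
    match = PATTERN.search(article)
    if match is None:
        return None
    return FloodRecord(match["place"], date.fromisoformat(match["day"]), "unknown")


def is_valid(record: FloodRecord, today: date) -> bool:
    """Stage 3 stand-in: basic sanity checks instead of cross-source validation."""
    has_place = bool(record.place.strip())
    in_range = date(2000, 1, 1) <= record.event_date <= today
    return has_place and in_range


def run_pipeline(
    articles: Iterable[str],
    extract: Callable[[str], Optional[FloodRecord]],
    today: date,
) -> List[FloodRecord]:
    """Run filter -> extract -> validate and return the surviving records."""
    records = []
    for article in articles:
        if not is_relevant(article):
            continue
        record = extract(article)
        if record is not None and is_valid(record, today):
            records.append(record)
    return records
```

Passing the extractor in as a function keeps the filter and validation logic testable without a model in the loop. A sentence like "Flooding reported downtown." passes the relevance filter but yields no record here, which is exactly the vague-location problem described above.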

Of course, there are caveats. News coverage is biased toward wealthier regions and English-language media. A flash flood in rural Bangladesh might not make international headlines the same way one in London does. The dataset will have blind spots, and the researchers acknowledge that. But even with those gaps, it’s a massive improvement over what we had.
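If you want a rough feel for those blind spots before trusting the data, one simple check is to normalize record counts by population and look for suspiciously quiet places. The sketch below assumes you have already pulled a country label out of each record and have your own population table. The function name and the toy inputs are mine, not part of the dataset.

```python
from collections import Counter
from typing import Dict, Iterable


def records_per_million(
    countries: Iterable[str],
    population_millions: Dict[str, float],
) -> Dict[str, float]:
    """Return flood records per million inhabitants for each listed country.

    `countries` holds one country label per record (hypothetical field);
    countries with no records at all come out as 0.0.
    """
    counts = Counter(countries)
    return {
        name: counts[name] / population
        for name, population in population_millions.items()
    }


# Made-up placeholder inputs, purely to show the call shape.
sample_countries = ["A"] * 6 + ["B"] * 2
sample_population = {"A": 3.0, "B": 4.0, "C": 5.0}
rates = records_per_million(sample_countries, sample_population)
```

A low rate does not prove that coverage is missing, since some places simply flood less. But a populous, flood-prone country sitting near zero is a good hint that the news sources never reached it.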

What I’d really like to see next is this same framework applied to other hazards: wildfires, landslides, heatwaves, droughts. The infrastructure is there. If they can do for those what they did for floods, that would be genuinely transformative.

For now, this is a solid step forward. If you work in climate modeling, disaster response, or urban planning, go grab the dataset. It’s open access, it’s free, and it might just help save lives.
