
How Kettle Uses Earthmover to Power Wildfire Risk Modeling at Scale

The Company

Kettle is not a typical insurance company. Using AI to build smarter insurance products, Kettle provides insurance for property owners in areas affected by catastrophic climate events, with a particular focus on wildfire. Their AI models consume over 130 terabytes of satellite, weather, real estate, and utility data, extracting risk indicators to simulate millions of wildfire scenarios.

As climate change reshapes fire behavior across the American West and beyond, the traditional actuarial playbook — look back five or ten years, extrapolate forward — just doesn’t work as well as it used to. As the Kettle team puts it: wildfire is fundamentally random and changes year to year. The conditions that drive ignition and spread are changing faster than historical data can capture. Their response has been to build some of the most sophisticated wildfire risk models in the industry: a platform that combines cutting-edge climate science, geospatial data engineering, and probabilistic simulation to give underwriters a clearer, more accurate picture of risk at the parcel level, across a wide range of future scenarios.

The Challenge: Brittle, Bespoke Data Infrastructure

Principal Data Engineer Blake Haugen and Principal Software Engineer Max Dion work in the heart of Kettle’s data management operation: they sit at the intersection of data engineering and climate science, responsible for ingesting, managing, and serving the massive raster datasets that power Kettle’s models. Kettle’s data sources span the public and private sectors: NASA satellite sensors (e.g. MODIS), CAL FIRE, the National Interagency Fire Center (NIFC), and others.

The datasets themselves are diverse and complex — weather variables such as wind speed, temperature, and relative humidity; topographic data including elevation and slope; land cover and vegetation indices; and historical wildfire occurrence and spread records. Most of this data arrives as large multidimensional arrays: three-dimensional raster grids that encode values across space and time. Working with this kind of data at the scale Kettle demands requires specialized tooling, careful storage design, and thoughtful pipeline architecture.

Before Earthmover, the team had cobbled together a workable solution built around the tabular Parquet data format. Scripts would download data from each source, apply the necessary transformations, and write outputs to storage. Spatial queries — pulling a slice of data for a particular geographic region, time window, or variable — required pre-computing those slices in advance. Max and Blake had built elaborate pre-computation infrastructure to make this fast and usable. And it worked.

But it was time-consuming to build, complex to maintain, and difficult to extend as new data sources and modeling needs emerged.

“Blake and I actually set up some elaborate pipelines to pre-compute all that… it was tedious to set up, not enjoyable, though the pipelines worked.”

— Max Dion, Principal Software Engineer

Data versioning was another persistent headache. Whenever a dataset required an update — for instance, when a new monthly update to the vegetation index arrived — it was tricky to write the new version without breaking pipelines or queries reading the old one. Without a proper versioning layer, the team had to build their own safeguards, adding complexity and operational risk at every turn.

Finding Earthmover

Blake had been working in the geospatial space long enough to know the landscape of available tools. When the team began evaluating options for better array data management, Earthmover stood out on three dimensions.

The architecture fit. Earthmover is built around array-native storage, designed from the ground up for the kind of multidimensional raster data Kettle works with every day. Unlike general-purpose data lakes or columnar formats optimized for tabular data, Earthmover handles the multidimensional (spatial, temporal, and more) nature of climate datasets natively.

The versioning story was compelling. Earthmover’s Arraylake allows teams to store data in one place, write new versions incrementally, and serve reads from a stable, consistent view — without the risk of a write operation disrupting a downstream model run or a production dashboard query.

The open-source foundation mattered. Kettle wanted a solution they could work with deeply and flexibly, not a black box. Earthmover’s commitment to open data format standards (Zarr & Icechunk) meant the team could integrate it into their existing Python-based workflows without a steep migration cost.

“We needed something to better manage, query, and store that data… having the extra infrastructure around storing and versioning was really the key piece for us to get moving.”

— Blake Haugen, Principal Data Engineer
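The versioning model behind this — Icechunk's snapshots and branch pointers over a Zarr store — can be sketched in miniature. The class and method names below are illustrative only, not the actual Icechunk API; the point is the mechanism: writers build a new immutable snapshot via copy-on-write, then atomically advance the branch pointer, while readers keep working against whatever snapshot they opened.

```python
class VersionedStore:
    """Toy snapshot-isolated store (illustrative, not the Icechunk API)."""

    def __init__(self):
        self.snapshots = {"s0": {}}     # snapshot_id -> {chunk_key: bytes}
        self.branches = {"main": "s0"}  # branch name -> snapshot_id
        self._counter = 0

    def checkout(self, branch="main"):
        """Readers pin a snapshot; later commits never mutate it."""
        return self.snapshots[self.branches[branch]]

    def commit(self, branch, updates):
        """Copy-on-write: derive a new snapshot, then flip the pointer."""
        base = self.snapshots[self.branches[branch]]
        self._counter += 1
        new_id = f"s{self._counter}"
        self.snapshots[new_id] = {**base, **updates}  # base is untouched
        self.branches[branch] = new_id                # atomic pointer swap
        return new_id


store = VersionedStore()
reader_view = store.checkout("main")                # pinned before the write
store.commit("main", {"evi/2024-07": b"chunk-bytes"})
print("evi/2024-07" in reader_view)                 # False: old view is stable
print("evi/2024-07" in store.checkout("main"))      # True: new readers see it
```

Because a commit never rewrites an existing snapshot, an ingestion pipeline can land a new month of data while a concurrent model run reads the previous version to completion — the race condition the Kettle team previously had to guard against by hand.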

Kettle and Earthmover

Today, Earthmover sits at the heart of Kettle’s data pipeline, serving as the authoritative store for the geospatial datasets that feed their wildfire models. The team maintains their own ingestion pipelines — downloading raw data, applying transformations, and writing outputs to Arraylake — with an increasingly automated workflow as their data footprint grows.

One of the most frequently updated datasets is the Enhanced Vegetation Index (EVI) from NASA, a monthly product that captures the density and health of vegetation across the landscape. Vegetation is among the most important variables in wildfire modeling: dry, dense fuels burn faster and hotter, and fine-grained vegetation data dramatically improves the accuracy of both ignition and spread models. Beyond EVI, the team stores weather variables, topographic datasets, and historical fire records, all versioned, queryable, and available to model developers on demand.

Model outputs are surfaced to Kettle’s underwriting team through an internal dashboard built and maintained in-house, including burn probability map layers that allow underwriters to assess risk at the property level and make more informed decisions.

And the Kettle team is happier and more productive on Earthmover. Why?

A dramatically better developer experience. Working with large array datasets is notoriously difficult — files are big, formats are complex, and debugging data issues can eat hours. Earthmover’s tooling and platform made Kettle’s datasets more stable, more predictable, and easier to reason about.

“I think what we’ve seen is that our data sets are so much more stable and so much easier to work with and reason about… the developer experience has improved dramatically.”

— Blake Haugen, Principal Data Engineer

Eliminated brittle pre-computation infrastructure. The elaborate pre-computation workflows the team had built to make spatial queries usable have been replaced by Arraylake. Thanks to Icechunk/Zarr’s stellar performance characteristics, fetching an arbitrary spatial slice — a specific region, time range, or variable — is now simpler and more direct, without requiring the team to anticipate every query pattern in advance and pre-build outputs for each one. Array workloads demand array-native tooling.
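In practice, that kind of query is a label-based selection on a lazily loaded array. Here is a minimal sketch with xarray over synthetic in-memory data (the coordinates and values are made up for illustration); in production, the same `.sel(...)` call would run against a dataset opened from Arraylake, and Zarr's chunked layout means only the chunks overlapping the requested box are actually read.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a gridded dataset like EVI, with dimensions
# (time, lat, lon). Coordinates use exact 0.25-degree steps so the
# slice bounds below land exactly on grid points.
times = np.array(["2024-05-01", "2024-06-01", "2024-07-01"],
                 dtype="datetime64[ns]")
lats = 32.0 + 0.25 * np.arange(41)    # 32.0 .. 42.0 degrees north
lons = -124.0 + 0.25 * np.arange(41)  # -124.0 .. -114.0 degrees east
evi = xr.DataArray(
    np.random.default_rng(0).random((3, 41, 41)),
    coords={"time": times, "lat": lats, "lon": lons},
    name="evi",
)

# An arbitrary spatial/temporal slice, with no pre-computed extracts:
subset = evi.sel(
    time=slice("2024-05-01", "2024-06-30"),
    lat=slice(36.0, 38.0),
    lon=slice(-122.0, -120.0),
)
print(subset.shape)  # (2, 9, 9): two months over a 2-degree box
```

Because any region, time range, or variable can be addressed this way on demand, there is no need to anticipate query patterns and materialize each one ahead of time.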

Reliable data versioning. The ability to write new versions of datasets without disrupting concurrent reads has removed a significant source of operational risk. The team no longer needs to hand-roll versioning logic or worry about race conditions between ingestion pipelines and model runs.

Greater independence and control. For datasets where Kettle has historically relied on third-party platforms like Google Earth Engine, Earthmover offers a path to greater control. Google Earth Engine provides useful capabilities but without service guarantees — a meaningful concern for a company whose models inform real underwriting decisions. Building on Earthmover gives Kettle the ability to manage their own data, on their own terms, for the datasets where it matters most.

Exceptional support. Both Blake and Max highlighted the quality of support they’ve received from the Earthmover team — not just during onboarding, but on an ongoing basis. Quick responses, willingness to engage on deep technical issues, and fast turnaround on bug reports have made the partnership feel collaborative rather than transactional.

“The product is great, but Earthmover support is fantastic — just top-notch.”

— Blake Haugen, Principal Data Engineer

Results

  • 130+ terabytes of multi-modal data — satellite, weather, real estate, and utility data — managed with Earthmover and processed across Kettle’s models
  • High throughput for AI operations: many risk indicators extracted per property from multi-modal data to simulate wildfire scenarios; 10,000 annual simulations run to build burn probability surfaces
  • AI-scale data management: Kettle’s three proprietary AI models (ignition, spread, and vulnerability) powered by data managed in Earthmover
  • Seamless dataset updates like the monthly vegetation index (EVI) versioned and managed in Arraylake