Blog

Articles, announcements, and case studies from the Earthmover team.

Announcing the Earthmover Data Marketplace

Earthmover launches the world's first marketplace for AI-ready weather and climate data, offering instant access to analysis-ready cloud-optimized data cubes from leading providers in the open-source Icechunk format.

Ryan Abernathey

CEO & Co-founder

I/O-Maxing Tensors in the Cloud

Zarr Python with Icechunk or Obstore now fully saturates the network between EC2 and S3, achieving the maximum physically possible throughput for reading and writing tensor data in the cloud. Benchmarks compare Zarr, Tensorstore, TileDB, and Parquet stacks across a range of chunk sizes and instance types.

Ryan Abernathey

CEO & Co-founder

Building the Future of Scientific Data at the Zarr Summit

Earthmover co-organizes the Zarr Summit in Rome, bringing together developers and adopters to advance the open-source cloud-native array format as adoption accelerates across major organizations like ESA, NASA, and NVIDIA.

Ryan Abernathey

CEO & Co-founder

Plotting NYC heatwaves during NYC Climate Week

A hands-on walkthrough of calculating historical heatwave frequency over NYC using ERA5 reanalysis data on the Earthmover platform with Arraylake, Icechunk, Xarray, and open-source climate tools.

Tom Nicholas

Software Engineer

Multi-Player Mode: Why Teams That Use Zarr Need Icechunk

Zarr lacks built-in support for concurrent readers and writers, leading to inconsistent reads and conflicting writes in team settings. Icechunk solves this by adding atomic updates, consistent snapshots, and Git-like version control on top of Zarr.

Lindsey Nield

Software Engineer

Icechunk 1.0: Production-Grade Cloud-Native Array Storage Is Here

Icechunk 1.0 is now stable and production-ready, bringing transactional safety, efficient versioning, high-performance Rust-based I/O, and virtual references for HDF5 and NetCDF to cloud-native array storage. The release includes manifest splitting, distributed writes, conflict resolution, and a 30 TB ERA5 sample dataset.

Ryan Abernathey

CEO & Co-founder

Meet the Earthmover Team at SciPy 2025 in Tacoma!

The Earthmover team is attending SciPy 2025 in Tacoma, Washington, with a tutorial on Xarray DataTree and Zarr, multiple talks and posters on Icechunk and Xarray, and a booth showcasing the Earthmover Platform.

Joe Hamman

CTO & Co-founder

The Untapped Promise of Weather Radar Data

Weather radar captures rich four-dimensional atmospheric data, but legacy binary formats and fragmented archives make large-scale analysis painfully difficult. A modern, cloud-native data model could unlock radar's vast scientific potential.

Alfonso Ladino-Rincon

Data Scientist

Announcing Fine-Grained Access Controls

Arraylake now supports fine-grained, repository-level permissions with admin, write, and read privilege levels, giving teams precise control over data access and secure sharing with external collaborators.

Brian Davis

Software Engineer

Xarray for Biology

Xarray's labeled, multidimensional data structures can solve common pain points in biological data analysis, from tracking microscopy metadata to managing complex genomic datasets. Adoption has been limited by awareness, technical rough edges, and lack of tool integration, but the community is actively working to change that.

Ian Hunt-Isaak

Xarray Community Developer

Icechunk: Efficient storage of versioned array data

Icechunk stores versioned array data efficiently by never copying or rewriting existing chunks, so each new version only consumes storage for the data that actually changed. Older versions can be expired and garbage-collected when they are no longer needed.

Sebastian Galkin

Staff Engineer

TensorOps: Scientific Data Doesn't Have to Hurt

Scientific data pipelines are plagued by data swamps, duplicated code, fragile workflows, and siloed teams. TensorOps is a vision for modern practices that bring collaboration, velocity, and reliability to scientific data engineering.

Brian Davis

Software Engineer

Zarr takes Cloud-Native Geospatial by storm

At the 2025 Cloud-Native Geospatial conference, Zarr adoption was surging across the geospatial domain, with Copernicus Sentinel, USGS Landsat, Google Earth Engine, and Esri ArcGIS all embracing the format for cloud-optimized array data.

Joe Hamman

CTO & Co-founder

Fundamentals: Tensors vs. Tables

Multidimensional array data about the physical world is fundamentally incompatible with the tabular data model. Benchmarks show that array-native tools like Xarray and Zarr outperform DuckDB and Parquet by up to 10x for common weather data queries.

Ryan Abernathey

CEO & Co-founder

How Our Customers Use NOAA Data

Earthmover customers share how NOAA climate and weather data powers their businesses, from wildfire risk modeling and energy trading to carbon market ratings and precipitation enhancement.

Ryan Abernathey

CEO & Co-founder

Accelerating Xarray with Zarr-Python 3

Last month, we released Zarr-Python 3.0, a ground-up rewrite of the library. Beyond the exciting new features in Zarr V3, we put a lot of work into addressing some long-standing performance issues with Zarr-Python 2. With the improvements described in this blog post, we’ve achieved a 14x speedup in loading the ARCO ERA5 dataset! Zarr-Python 2 had a paradoxical performance quirk: although the library could generate massive petabyte-scale datasets, it struggled to perform well when managing large or highly nested hierarchies. For example, listing the contents of a large Zarr group could be painfully slow, particularly if that Zarr group was stored on a high-latency storage backend.

Davis Bennet

Software Engineer

Zarr-Python 3 is here!

Zarr-Python 3.0 is released with full support for the Zarr V3 specification, chunk-sharding for more flexible storage, major performance improvements from a fully asynchronous core, and a modernized extensible codebase.

Joe Hamman

CTO & Co-founder

Announcing Icechunk!

Earthmover announces Icechunk, an open-source transactional storage engine for Zarr that brings ACID transactions, time travel, data versioning, and high-performance Rust-based I/O to multidimensional array data in cloud object storage.

Ryan Abernathey

CEO & Co-founder

Vector data cubes in Xarray

Vector data cubes extend the familiar raster data cube concept to geospatial vector data, using arrays indexed by geometries instead of gridded coordinates. The Xvec package brings this capability to Xarray, enabling powerful multidimensional analysis of point, line, and polygon data.

Emma Marshall

Software Engineer

Case Study: ALIVE at The University of Wisconsin-Madison

The ALIVE research team at UW-Madison uses Arraylake to manage GOES-R satellite data for near real-time carbon and water flux estimation, benefiting from version control, ACID transactions, and seamless remote collaboration.

Ryan Abernathey

CEO & Co-founder

Toward Zarr-Python 3.0

The Zarr-Python project is undergoing a major refactor toward version 3.0, bringing full support for the Zarr V3 specification, new asynchronous APIs for better performance, and a modernized plugin system for codecs and storage backends.

Joe Hamman

CTO & Co-founder

Case Study: Sylvera

Carbon market ratings company Sylvera adopted Arraylake to centralize millions of scattered GeoTIFF files into cloud-optimized arrays, enabling incremental data ingestion and version-tracked auditing across their geospatial pipelines.

Ryan Abernathey

CEO & Co-founder

Earthmover and Pangeo at AGU 2023

Earthmover will be at AGU 2023 in booth 1007 alongside Coiled and Pangeo, demoing Arraylake and presenting three talks on cloud-native scientific data workflows.

Ryan Abernathey

CEO & Co-founder

Arraylake Now Available in Private Beta

Earthmover launches Arraylake in private beta, a cloud-native data lake platform purpose-built for multidimensional arrays with a built-in data catalog, ACID transactions, version control, and virtual file support.

Ryan Abernathey

CEO & Co-founder

Earthmover is hiring

Earthmover is hiring two founding engineers to help build a modern data stack for science, tackling climate and planetary challenges with cloud-native software.

Joe Hamman

CTO & Co-founder

Why we started Earthmover

Earthmover was founded to build a modern cloud data stack for scientific data, inspired by the success of the Pangeo open-source community and the urgent need for better tooling around multidimensional array datasets in climate tech and beyond.

Ryan Abernathey

CEO & Co-founder