A new extension to Zarr just landed: the rectilinear chunk grid lets you specify arbitrarily sized chunks along each axis, aligning chunk boundaries with the natural structure of your data instead of forcing a regular grid.
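To make the idea concrete, here is a minimal sketch in plain Python of how a rectilinear grid can map an array index to a chunk. It does not use the Zarr API: the `chunk_edges` and `locate` helpers and the month-length example are assumptions for illustration only, and the real extension may represent chunk sizes differently.

```python
from bisect import bisect_right
from itertools import accumulate


def chunk_edges(sizes):
    """Return the cumulative chunk boundaries for one axis, starting at 0."""
    return [0, *accumulate(sizes)]


def locate(index, chunk_sizes_per_axis):
    """Map an N-d array index to (chunk coordinates, offsets inside that chunk).

    Hypothetical helper: `chunk_sizes_per_axis` holds one list of chunk
    lengths per axis, and the lengths along an axis need not be equal.
    """
    chunk_coord, offset = [], []
    for i, sizes in zip(index, chunk_sizes_per_axis):
        edges = chunk_edges(sizes)
        if not 0 <= i < edges[-1]:
            raise IndexError(f"index {i} is outside an axis of length {edges[-1]}")
        # The chunk containing i is the last one whose start edge is <= i.
        c = bisect_right(edges, i) - 1
        chunk_coord.append(c)
        offset.append(i - edges[c])
    return tuple(chunk_coord), tuple(offset)


# Illustrative grid: a daily time axis chunked by calendar month
# (January to March of a non-leap year) and a regularly chunked spatial axis.
chunks = [[31, 28, 31], [100, 100]]
result = locate((45, 150), chunks)
print(result)
```

With a regular grid the same lookup is a single integer division; the cumulative-sum search is the extra work that variable chunk sizes require.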
The Company: Kettle is not a typical insurance company. Using AI to build smarter insurance products, Kettle provides insurance for property owners in areas affected by catastrophic climate events, with a particular focus on wildfire. Their AI models consume over 130 terabytes of satellite, weather,
Eoliann builds proprietary climate risk models that estimate the physical impact of extreme weather events — floods, wildfires, and storms — on critical physical infrastructure: electricity transmission lines, substations, and gas pipelines…
When we released Icechunk 1.0 last July, we declared it production-ready and committed to format stability. Since then, adoption has exceeded our expectations. Teams across weather forecasting, climate science, neuroscience, and AI/ML have pushed Icechunk into scenarios we didn't fully anticipate--r
Zarr Python with Icechunk or Obstore now fully saturates the network between EC2 and S3, achieving the maximum physically possible throughput for reading and writing tensor data in the cloud. Benchmarks compare Zarr, Tensorstore, TileDB, and Parquet stacks across a range of chunk sizes and instance types.
Earthmover co-organizes the Zarr Summit in Rome, bringing together developers and adopters to advance the open-source cloud-native array format as adoption accelerates across major organizations like ESA, NASA, and NVIDIA.
Zarr lacks built-in support for concurrent readers and writers, leading to inconsistent reads and conflicting writes in team settings. Icechunk solves this by adding atomic updates, consistent snapshots, and Git-like version control on top of Zarr.
Introducing the Radar DataTree, a new data model that organizes thousands of fragmented weather radar scans into a single time-aware, cloud-native, version-controlled dataset using xarray-datatree, Zarr, and Icechunk.
Icechunk 1.0 is now stable and production-ready, bringing transactional safety, efficient versioning, high-performance Rust-based I/O, and virtual references for HDF5 and NetCDF to cloud-native array storage. The release includes manifest splitting, distributed writes, conflict resolution, and a 30 TB ERA5 sample dataset.
Zarr is an open-source, cloud-native protocol for storing chunked, compressed N-dimensional arrays. This guide covers how Zarr works, its ecosystem of tools like Xarray and Icechunk, and when to use it for large-scale scientific and ML data.
At the 2025 Cloud-Native Geospatial conference, Zarr adoption was surging across the geospatial domain, with Copernicus Sentinel, USGS Landsat, Google Earth Engine, and ESRI ArcGIS all embracing the format for cloud-optimized array data.
A practical walkthrough of how Icechunk uses transactions and conflict detection to guarantee data consistency when multiple processes write concurrently. The post demonstrates optimistic concurrency control and the rebase workflow using a bank-account transfer example.
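The general pattern can be sketched with a toy in-memory store. This is not Icechunk's API: `Store`, `Transaction`, `ConflictError`, and `transfer` are hypothetical names, and the sketch recovers from a conflict by retrying on a fresh snapshot, which is a simpler stand-in for the rebase workflow the post describes.

```python
class ConflictError(Exception):
    """Raised when the store changed after the transaction took its snapshot."""


class Store:
    """Toy versioned key-value store (hypothetical, for illustration only)."""

    def __init__(self, data):
        self.version = 0
        self.data = dict(data)

    def begin(self):
        return Transaction(self)

    def commit(self, txn):
        # Optimistic check: accept only if nobody committed since the snapshot.
        if txn.base_version != self.version:
            raise ConflictError("store changed since the transaction began")
        self.data.update(txn.writes)
        self.version += 1


class Transaction:
    def __init__(self, store):
        self.base_version = store.version
        self.snapshot = dict(store.data)  # consistent view for all reads
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value


def transfer(store, src, dst, amount, retries=3):
    """Move `amount` between accounts, retrying on a fresh snapshot on conflict."""
    for _ in range(retries):
        txn = store.begin()
        if txn.read(src) < amount:
            raise ValueError("insufficient funds")
        txn.write(src, txn.read(src) - amount)
        txn.write(dst, txn.read(dst) + amount)
        try:
            store.commit(txn)
            return
        except ConflictError:
            continue
    raise ConflictError("gave up after repeated conflicts")


store = Store({"alice": 100, "bob": 50})

# Two writers start from the same snapshot; only the first commit succeeds.
t1, t2 = store.begin(), store.begin()
t1.write("alice", 90)
t1.write("bob", 60)
store.commit(t1)

t2.write("alice", 70)
try:
    store.commit(t2)
    second_commit_rejected = False
except ConflictError:
    second_commit_rejected = True

transfer(store, "alice", "bob", 30)
print(second_commit_rejected, store.data)
```

Because the stale write is rejected instead of silently applied, the combined balance across both accounts stays at 150 throughout.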
Why traditional scientific file formats like NetCDF perform poorly on cloud object storage, and how cloud-optimized formats like Zarr and Icechunk solve the problem by separating metadata and chunking data.
zarr-python’s performance paradox: Last month, we released Zarr-Python 3.0, a ground-up rewrite of the library (read more about it in this post). Beyond the exciting new features in Zarr V3, we put a lot of work into addressing some long-standing performance issues with Zarr-Python 2. With the improvements described in this blog post, we’ve achieved a 14x speedup in loading the ARCO ERA5 dataset! Zarr-Python 2 had a paradoxical performance quirk: although the library could generate massive petabyte-scale datasets, it struggled to perform well when managing large or highly nested hierarchies. For example, listing the contents of a large Zarr group could be painfully slow, particularly if that Zarr group was stored on a high-latency storage backend. Zarr users would experience this as long
Zarr-Python 3.0 is released with full support for the Zarr V3 specification, chunk-sharding for more flexible storage, major performance improvements from a fully asynchronous core, and a modernized extensible codebase.
Earthmover announces Icechunk, an open-source transactional storage engine for Zarr that brings ACID transactions, time travel, data versioning, and high-performance Rust-based I/O to multidimensional array data in cloud object storage.
The Zarr-Python project is undergoing a major refactor toward version 3.0, bringing full support for the Zarr V3 specification, new asynchronous APIs for better performance, and a modernized plugin system for codecs and storage backends.