Tag

#open source

21 posts

Variable length chunks in Zarr

May 5, 2026

A new extension to Zarr just landed: the rectilinear chunk grid lets you specify arbitrarily sized chunks along each axis, aligning chunk boundaries with the natural structure of your data instead of forcing a regular grid.

Joe Hamman

CTO & Co-founder

Max Jones

Cloud Engineer @ DevSeed

Davis Bennet

Software Engineer (Freelance)

Announcement

Announcing Icechunk 2: Better Consistency, Performance, and Reliability for Tensor Storage

April 9, 2026

When we released Icechunk 1.0 last July, we declared it production-ready and committed to format stability. Since then, adoption has exceeded our expectations. Teams across weather forecasting, climate science, neuroscience, and AI/ML have pushed Icechunk into scenarios we didn't fully anticipate…

Ryan Abernathey

CEO & Co-founder

Sebastian Galkin

Staff Engineer

Announcement

Cursed venvs, Confident Releases: Testing Icechunk Across Major Versions

March 20, 2026

Earthmover built third-wheel, an open-source tool that rewrites Python wheels to install multiple versions of a library in one environment, enabling cross-version compatibility testing for the Icechunk V2 release.

Ian Hunt-Isaak

Xarray Community Developer

Announcement

Evolving our Tensor Storage Engine: A Preview of Icechunk 2

December 2, 2025

A preview of Icechunk 2, featuring node rename, chunk reindexing, rectilinear grids, repository-level metadata, and significant performance improvements with a smooth migration path from Icechunk 1.

Sebastian Galkin

Staff Engineer

Blog Post

I/O-Maxing Tensors in the Cloud

November 25, 2025

Zarr Python with Icechunk or Obstore now fully saturates the network between EC2 and S3, achieving the maximum physically possible throughput for reading and writing tensor data in the cloud. Benchmarks compare Zarr, Tensorstore, TileDB, and Parquet stacks across a range of chunk sizes and instance types.

Ryan Abernathey

CEO & Co-founder

Blog Post

Building the Future of Scientific Data at the Zarr Summit

October 10, 2025

Earthmover co-organizes the Zarr Summit in Rome, bringing together developers and adopters to advance the open-source cloud-native array format as adoption accelerates across major organizations like ESA, NASA, and NVIDIA.

Ryan Abernathey

CEO & Co-founder

Blog Post

From Files to Datasets: FM-301 and the Future of Radar Interoperability

July 30, 2025

An introduction to the WMO FM-301 standard for weather radar data and how open-source tools like Xradar are turning fragmented binary radar files into structured, analysis-ready datasets.

Alfonso Ladino-Rincon

Data Scientist

Announcement

Icechunk 1.0: Production-Grade Cloud-Native Array Storage Is Here

July 10, 2025

Icechunk 1.0 is now stable and production-ready, bringing transactional safety, efficient versioning, high-performance Rust-based I/O, and virtual references for HDF5 and NetCDF to cloud-native array storage. The release includes manifest splitting, distributed writes, conflict resolution, and a 30 TB ERA5 sample dataset.

Ryan Abernathey

CEO & Co-founder

Blog Post

The Untapped Promise of Weather Radar Data

June 30, 2025

Weather radar captures rich four-dimensional atmospheric data, but legacy binary formats and fragmented archives make large-scale analysis painfully difficult. A modern, cloud-native data model could unlock radar's vast scientific potential.

Alfonso Ladino-Rincon

Data Scientist

Blog Post

Ergonomic seasonal grouping and resampling in Xarray

June 18, 2025

Xarray introduces SeasonGrouper and SeasonResampler, two new Grouper objects that enable custom, overlapping, and variable-length seasonal aggregations without workarounds.

Deepak Cherian

Forward Deployed Engineer

Blog Post

Xarray for Biology

June 6, 2025

Xarray's labeled, multidimensional data structures can solve common pain points in biological data analysis, from tracking microscopy metadata to managing complex genomic datasets. Adoption has been limited by awareness, technical rough edges, and lack of tool integration, but the community is actively working to change that.

Ian Hunt-Isaak

Xarray Community Developer

Blog Post

Everything you need to know about Icechunk garbage collection

May 30, 2025

A practical guide to Icechunk's garbage collection and expiration operations, explaining when and how to safely reclaim storage from unused snapshots and dangling objects.

Sebastian Galkin

Staff Engineer

Blog Post

Zarr takes Cloud-Native Geospatial by storm

May 6, 2025

At the 2025 Cloud-Native Geospatial conference, Zarr adoption was surging across the geospatial domain, with Copernicus Sentinel, USGS Landsat, Google Earth Engine, and ESRI ArcGIS all embracing the format for cloud-optimized array data.

Joe Hamman

CTO & Co-founder

Blog Post

Learning about Icechunk consistency with a clichéd but instructive example

April 23, 2025

A practical walkthrough of how Icechunk uses transactions and conflict detection to guarantee data consistency when multiple processes write concurrently. The post demonstrates optimistic concurrency control and the rebase workflow using a bank-account transfer example.

Sebastian Galkin

Staff Engineer

Blog Post

Fundamentals: What is Cloud-Optimized Scientific Data?

April 17, 2025

Why traditional scientific file formats like NetCDF perform poorly on cloud object storage, and how cloud-optimized formats like Zarr and Icechunk solve the problem by separating metadata and chunking data.

Tom Nicholas

Software Engineer

Blog Post

Exploring Icechunk scalability: untangling S3's prefix story

April 10, 2025

Demystifying how S3 prefix sharding actually works and demonstrating that Icechunk can scale to hundreds of thousands of requests per second, far beyond the single-prefix limit.

Sebastian Galkin

Staff Engineer

Blog Post

Solving NASA's Cloud Data Dilemma: How Icechunk Revolutionizes Earth Data Access

March 27, 2025

Earthmover and Development Seed partnered with NASA to pilot Icechunk, an open-source tensor storage engine that enables 100x faster cloud-native data access for archival Earth science datasets without costly data migration.

Ryan Abernathey

CEO & Co-founder

Blog Post

Accelerating Xarray with Zarr-Python 3

February 20, 2025

Zarr-Python’s performance paradox: Last month, we released Zarr-Python 3.0, a ground-up rewrite of the library (read more about it in this post). Beyond the exciting new features in Zarr V3, we put a lot of work into addressing some long-standing performance issues with Zarr-Python 2. With the improvements described in this blog post, we’ve achieved a 14x speedup in loading the ARCO ERA5 dataset! Zarr-Python 2 had a paradoxical performance quirk: although the library could generate massive petabyte-scale datasets, it struggled to perform well when managing large or highly nested hierarchies. For example, listing the contents of a large Zarr group could be painfully slow, particularly if that Zarr group was stored on a high-latency storage backend. Zarr users would experience this as long…

Davis Bennet

Software Engineer (Freelance)

Announcement

Zarr-Python 3 is here!

January 9, 2025

Zarr-Python 3.0 is released with full support for the Zarr V3 specification, chunk-sharding for more flexible storage, major performance improvements from a fully asynchronous core, and a modernized extensible codebase.

Joe Hamman

CTO & Co-founder

Announcement

Announcing Icechunk!

October 15, 2024

Earthmover announces Icechunk, an open-source transactional storage engine for Zarr that brings ACID transactions, time travel, data versioning, and high-performance Rust-based I/O to multidimensional array data in cloud object storage.

Ryan Abernathey

CEO & Co-founder

Blog Post

Toward Zarr-Python 3.0

May 9, 2024

The Zarr-Python project is undergoing a major refactor toward version 3.0, bringing full support for the Zarr V3 specification, new asynchronous APIs for better performance, and a modernized plugin system for codecs and storage backends.

Joe Hamman

CTO & Co-founder
