Tag

#icechunk

17 posts

Cursed venvs, Confident Releases: Testing Icechunk Across Major Versions

tl;dr: Technical details of how we do surgery on Python wheels to test cross-version compatibility. We're getting ready to release Icechunk V2, the next evolution of our tensor storage engine. People run real workloads on Icechunk V1. They're not all going to upgrade on the same day.

Ian Hunt-Isaak

Xarray Community Developer

Evolving our Tensor Storage Engine: A Preview of Icechunk 2

Earthmover is building the cloud platform for scientific data, focusing on weather, climate and geospatial use cases. In these domains, tensors, not tables, are the ideal data model. We have devoted major engineering effort for the past year to Icechunk, our open-source transactional tensor storage

Sebastian Galkin

Staff Engineer

I/O-Maxing Tensors in the Cloud

The critical role of I/O in data science and AI/ML. For both analytics and AI workloads, fast I/O is the foundation of good performance. Most of these workloads involve fluxing a large amount of data from storage into RAM, and then to the CPU or GPU. In the cloud, where data reside on object storage,

Ryan Abernathey

CEO & Co-founder

Multi-Player Mode: Why Teams That Use Zarr Need Icechunk

Bring reliability, scalability, and version control to your Zarr datasets, without giving up performance. Zarr is a powerful protocol for storing large-scale, multi-dimensional arrays. It's fast, scalable, and cloud-native, which is why it's used across a variety of domains like climate science and

Lindsey Nield

Software Engineer

Radar DataTree: Transforming thousands of scans into a single cohesive model

From structure to scale, radar needs a model that organizes complete collections as time-aware, cloud-native datasets. In our second post, we looked at how new standards and open-source tools are transforming weather radar from raw binary blobs into structured, metadata-rich datasets. FM-301—an offi

Alfonso Ladino-Rincon

Data Scientist

Icechunk 1.0: Production-Grade Cloud-Native Array Storage Is Here

A year ago, we made an important internal decision which set Earthmover on a new course—we decided to refactor and open source our core technology for storing array-based data in the cloud. This took the form of the Icechunk project, an open source package and specification enabling database-style t

Ryan Abernathey

CEO & Co-founder

Everything you need to know about Icechunk garbage collection

We will talk about two powerful Icechunk operations: expiration and garbage collection. They are related, so we usually refer to both under the name of garbage collection or simply GC. We will explain what each of them does, why you may want to use them, and how to do it safely and effectively. The

Sebastian Galkin

Staff Engineer

Fundamentals: What Is Zarr? A Cloud-Native Format for Tensor Data

Why scientists, data engineers, and developers are turning to Zarr. Often the biggest bottleneck in your workflow isn’t your code or your hardware, but the way your data is stored. Data formats can limit, or unlock, what you’re able to do with your data. In modern science and data-intensive computing,

Lindsey Nield

Software Engineer

Icechunk: Efficient storage of versioned array data

We recently got an interesting question in Icechunk's community Slack channel (thank you Iury Simoes-Sousa for motivating this post): "I'm new to Icechunk. How is the storage managed for redundant information between different versions of a data repository?" Icechunk keeps your data versioned, allowin

Sebastian Galkin

Staff Engineer

Zarr takes Cloud-Native Geospatial by storm

Our takeaways from the Cloud-Native Geospatial conference on Zarr's surging adoption and its impact on the future of Earth Observation data. Our team just returned from an action-packed week at the Cloud-Native Geospatial conference in beautiful Snowbird, Utah, and the key takeaway was unmistakable:

Joe Hamman

CTO & Co-founder

Learning about Icechunk consistency with a clichéd but instructive example

In this post we'll show what can happen when more than one process writes to the same Icechunk repository concurrently, and how Icechunk uses transactions and conflict resolution to guarantee consistency. For this, we'll use a commonplace example: bank account transfers. This is not a problem you wou

Sebastian Galkin

Staff Engineer

Fundamentals: What is Cloud-Optimized Scientific Data?

Why naively lifting scientific data to the cloud falls flat. Scientific formats predate the cloud. There are exabytes of scientific data out in the wild, with more being generated every year. At Earthmover we believe the best place for it to reside is in the cloud, in object storage. Cloud platforms

Tom Nicholas

Software Engineer

Exploring Icechunk scalability: untangling S3's prefix story

We at Earthmover recently released the Icechunk tensor storage engine, a novel cloud-optimized storage format and library for large-scale array data. Built on Rust’s tokio async runtime, Icechunk delivers impressive gains in performance over today’s array storage engines (e.g. Zarr V2, netCDF). The

Sebastian Galkin

Staff Engineer

Announcing Icechunk!

TLDR: We are excited to announce the release of the Icechunk storage engine, a new open-source library and specification for the storage of multidimensional array (a.k.a. tensor) data in cloud object storage. Icechunk works together with Zarr, augmenting the Zarr core data model with features that en

Ryan Abernathey

CEO & Co-founder