Tag

#open source

19 posts

Cursed venvs, Confident Releases: Testing Icechunk Across Major Versions

tl;dr: Technical details of how we do surgery on Python wheels in order to do cross-version compatibility testing. We're getting ready to release Icechunk V2: the next evolution of our tensor storage engine. People run real workloads on Icechunk V1. They're not all going to upgrade on the same day.

Ian Hunt-Isaak

Xarray Community Developer

Evolving our Tensor Storage Engine: A Preview of Icechunk 2

Earthmover is building the cloud platform for scientific data, focusing on weather, climate and geospatial use cases. In these domains, tensors, not tables, are the ideal data model. We have devoted major engineering effort over the past year to Icechunk, our open-source transactional tensor storage…

Sebastian Galkin

Staff Engineer

I/O-Maxing Tensors in the Cloud

The critical role of I/O in data science and AI/ML. For both analytics and AI workloads, fast I/O is the foundation of good performance. Most of these workloads involve moving large amounts of data from storage into RAM, and then to the CPU or GPU. In the cloud, where data reside on object storage,…

Ryan Abernathey

CEO & Co-founder

Building the Future of Scientific Data at the Zarr Summit

Next week we are convening the Zarr community in Rome, Italy, for a week of fast-paced collaboration and conversation. Given the recent acceleration in Zarr adoption across major data providers in Weather Forecasting, Earth Observation and Bioimaging, this in-person event is critical for aligning sta…

Ryan Abernathey

CEO & Co-founder

From Files to Datasets: FM-301 and the Future of Radar Interoperability

At Earthmover, we’re interested in weather radar data for two reasons:
- First, radar data are uniquely valuable for our customers thanks to their ability to characterize precipitation, atmospheric turbulence, and phenomena like tornadoes and hurricanes in real time with fine spatial and temporal re…

Alfonso Ladino-Rincon

Data Scientist

Icechunk 1.0: Production-Grade Cloud-Native Array Storage Is Here

A year ago, we made an important internal decision that set Earthmover on a new course: we decided to refactor and open source our core technology for storing array-based data in the cloud. This took the form of the Icechunk project, an open source package and specification enabling database-style t…

Ryan Abernathey

CEO & Co-founder

The Untapped Promise of Weather Radar Data

Weather radar is one of the most powerful observational tools in atmospheric science. Every few minutes, it captures reflectivity, among other variables, in a rotating sample volume, forming a high-resolution, four-dimensional dataset (x, y, z, t) that tracks storms in real time, revealing fine-s…

Alfonso Ladino-Rincon

Data Scientist

Ergonomic seasonal grouping and resampling in Xarray

At Earthmover, we help maintain and drive forward a range of community open-source projects, including Xarray and Zarr. The following post, cross-posted from the Xarray developer blog, describes a new API for seasonal aggregation in Xarray. TL;DR: Two new Grouper objects, SeasonGrouper an…

Deepak Cherian

Forward Deployed Engineer

Xarray for Biology

This was originally published on the Xarray blog (https://xarray.dev/blog/xarray-biology). Hi! I'm Ian, a multimodal microscopist, and the new "Xarray Community Developer." I am funded by the Chan Zuckerberg Institute to support the use of Xarray in biological and biomedical applications. I believe Xa…

Ian Hunt-Isaak

Xarray Community Developer

Everything you need to know about Icechunk garbage collection

We will talk about two powerful Icechunk operations: expiration and garbage collection. They are related, so we usually refer to both under the name garbage collection, or simply GC. We will explain what each of them does, why you may want to use them, and how to do so safely and effectively. The…

Sebastian Galkin

Staff Engineer

Zarr takes Cloud-Native Geospatial by storm

Our takeaways from the Cloud-Native Geospatial conference on Zarr's surging adoption and its impact on the future of Earth Observation data. Our team just returned from an action-packed week at the Cloud-Native Geospatial conference in beautiful Snowbird, Utah, and the key takeaway was unmistakable:…

Joe Hamman

CTO & Co-founder

Learning about Icechunk consistency with a clichéd but instructive example

In this post we'll show what can happen when more than one process writes to the same Icechunk repository concurrently, and how Icechunk uses transactions and conflict resolution to guarantee consistency. For this, we'll use a commonplace example: bank account transfers. This is not a problem you wou…

Sebastian Galkin

Staff Engineer

Fundamentals: What is Cloud-Optimized Scientific Data?

Why naively lifting scientific data to the cloud falls flat. Scientific formats predate the cloud. There are exabytes of scientific data out in the wild, with more being generated every year. At Earthmover we believe the best place for it to reside is in the cloud, in object storage. Cloud platforms…

Tom Nicholas

Software Engineer

Exploring Icechunk scalability: untangling S3's prefix story

We at Earthmover recently released the Icechunk tensor storage engine, a novel cloud-optimized storage format and library for large-scale array data. Built on Rust’s tokio async runtime, Icechunk delivers impressive gains in performance over today’s array storage engines (e.g. Zarr V2, netCDF). The…

Sebastian Galkin

Staff Engineer

Accelerating Xarray with Zarr-Python 3

Zarr-Python’s performance paradox. Last month, we released Zarr-Python 3.0, a ground-up rewrite of the library (read more about it in this post). Beyond the exciting new features in Zarr V3, we put a lot of work into addressing some long-standing performance issues with Zarr-Python 2. With the improvements described in this blog post, we’ve achieved a 14x speedup in loading the ARCO ERA5 dataset! Zarr-Python 2 had a paradoxical performance quirk: although the library could generate massive petabyte-scale datasets, it struggled to perform well when managing large or highly nested hierarchies. For example, listing the contents of a large Zarr group could be painfully slow, particularly if that Zarr group was stored on a high-latency storage backend. Zarr users would experience this as long…

Davis Bennett

Software Engineer

Zarr-Python 3 is here!

Note: This post was originally published on the Zarr developer blog. After more than a year of development, we’re thrilled to announce the release of Zarr-Python 3! This major release brings full support for the Zarr v3 specification, including the new chunk-sharding extension, major performance enh…

Ryan Abernathey

CEO & Co-founder

Announcing Icechunk!

TL;DR: We are excited to announce the release of the Icechunk storage engine, a new open-source library and specification for the storage of multidimensional array (a.k.a. tensor) data in cloud object storage. Icechunk works together with Zarr, augmenting the Zarr core data model with features that en…

Ryan Abernathey

CEO & Co-founder

Toward Zarr-Python 3.0

Note: This post was originally published on the Zarr developer blog. We released Zarr-Python 2.18.0 this week. Although this release was quite light in terms of user-facing changes, it represents the beginning of a new phase for the project. In this post, we’ll walk through our plan for Zarr-Python…

Ryan Abernathey

CEO & Co-founder