Latest Posts
Zarr-Python 3 is here! This release brings support for Zarr's v3 specification, new extensions, and major performance improvements.
Note: This post was originally published on the Zarr developer blog.
After more than a year of development, we’re thrilled to announce the release of Zarr-Python 3! This major release brings full support for the Zarr v3 specification, including the new chunk-sharding extension, major performance enhancements, and a thoroughly modernized codebase. Whether you use Zarr to manage large multi-dimensional datasets in the cloud or for high-performance machine learning applications, we’ve built Zarr-Python 3 to help you. Let’s dive into some of the details of this release!
Zarr-Python 3 is available today on PyPI and Conda-Forge. It is compatible with Python 3.11 and above.
pip install --upgrade zarr
# or
conda install --channel conda-forge zarr
Support for Zarr’s v3 specification
The most n…
Read More
Icechunk is a brand new open-source transactional storage engine for tensor / ND-array data designed for use on cloud object storage. Icechunk works together with Zarr, augmenting the Zarr core data model with features that enhance performance, collaboration, and safety in a multi-user cloud-computing context.
TLDR
We are excited to announce the release of the Icechunk storage engine, a new open-source library and specification for the storage of multidimensional array (a.k.a. tensor) data in cloud object storage. Icechunk works together with Zarr, augmenting the Zarr core data model with features that enhance performance, collaboration, and safety in a multi-user cloud-computing context. With the release of Icechunk, powerful capabilities such as isolated transactions and time travel, which were previously only available to Earthmover customers via our Arraylake platform, are now free and open source. Head over to icechunk.io to get started!
This is a blog version of a webinar that took place on October 22, 2024. View the presentation slide deck or check out the video of that webinar:
The …
Read More
Thanks to Xvec and developments across a number of packages, the Xarray ecosystem now supports data cubes with vector geometries as coordinate locations.
This is a blog version of a webinar that took place on August 27, 2024. Here’s a video of that webinar:
Geospatial datasets representing information about real-world features such as points, lines, and polygons are increasingly large, complex, and multidimensional. They are naturally represented as vector data cubes: n-dimensional arrays where at least one dimension is a set of vector geometries. The Xarray ecosystem now supports vector data cubes thanks to Xvec, a package designed for working with vector geometries within the Xarray data model 🎉. For those familiar with GeoPandas, Xvec is to Xarray as GeoPandas is to Pandas.
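To make the definition concrete, here is a minimal sketch of a vector data cube in plain Python. It does not use Xvec or Xarray; the class name, method names, and sample values are all hypothetical and chosen only to illustrate the idea of an array whose first dimension is indexed by geometries rather than by grid coordinates.

```python
from dataclasses import dataclass

# Illustration only: hypothetical names and made-up values, not Xvec's API.
Point = tuple[float, float]  # (longitude, latitude)


@dataclass
class VectorDataCube:
    geometries: list[Point]    # the vector dimension: one geometry per row
    times: list[str]           # an ordinary dimension: one label per column
    values: list[list[float]]  # values[i][j] is geometries[i] at times[j]

    def __post_init__(self) -> None:
        # Check that the array shape matches the two coordinate lists.
        if len(self.values) != len(self.geometries):
            raise ValueError("need one row of values per geometry")
        if any(len(row) != len(self.times) for row in self.values):
            raise ValueError("need one value per time step in every row")

    def at_geometry(self, geometry: Point) -> dict[str, float]:
        """Select along the vector dimension: the time series at one geometry."""
        i = self.geometries.index(geometry)
        return dict(zip(self.times, self.values[i]))

    def at_time(self, time: str) -> dict[Point, float]:
        """Select along the time dimension: one value per geometry."""
        j = self.times.index(time)
        return {g: row[j] for g, row in zip(self.geometries, self.values)}


# Two point geometries observed on two days (values invented for the example).
cube = VectorDataCube(
    geometries=[(-89.4, 43.07), (-122.33, 47.61)],
    times=["2024-08-01", "2024-08-02"],
    values=[[21.5, 23.0], [18.2, 19.1]],
)

print(cube.at_geometry((-89.4, 43.07)))
print(cube.at_time("2024-08-02"))
```

In a raster data cube, the spatial dimensions would be regular x and y grid coordinates. Here, the geometries themselves act as the coordinate labels, which is the property that distinguishes a vector data cube.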
This blog post is geared toward analysts working with geospatial datasets. We introduce vector data cubes, discuss how they differ from raster data cubes, and d…
Read More
How Arraylake is enabling scientific research.
Background
The University of Wisconsin-Madison is home to a research team called Advanced Baseline Imager Live Imaging of Vegetated Ecosystems (ALIVE). The team, working remotely and led by Prof. Paul Stoy, PhD, is building a gradient-boosting regression model using geostationary satellites to estimate terrestrial carbon and water fluctuations in near real-time. The team trains its models using GOES-R and other public satellite and meteorological datasets.
In trying to process this data, they ran into the central problem of working with raster data for time-series analysis: the data’s formats, mainly NetCDF and GeoTIFF, are not conducive to it. This experience inspired them to strive to create output datasets that are analysis-ready for various applications.
During AM…
Read More