Latest Posts

We set up a high-performance PyTorch dataloader using data stored as Zarr in the cloud

Machine learning has become essential to making use of weather, climate, and geospatial data. Sophisticated models such as GraphCast, ClimaX, and Clay are emerging within these domains. Their advancement is driven largely by the widespread availability of cloud computing resources, particularly GPUs, and by the abundance of data stored in cloud repositories. Despite this progress, there are still no established best practices for efficiently managing machine learning training pipelines, owing to the diverse range of formats used to store scientific data. In this blog post, we discuss an architecture that we have found highly effective for integrating multidimensional arrays from cloud storage into machine learning frameworks. The problem At …

The Earthmover team will be at AGU, along with a big crew from the Pangeo community.

December is here, and that means that thousands of Earth System Scientists are getting ready for the annual pilgrimage to AGU! At my first AGU in 2011, I was a fresh-faced grad student, excited to present my latest research on modeling Southern Ocean circulation. Since then, my relationship with this massive conference has evolved a lot. First it was an opportunity to reunite with old grad-school friends and catch up on the latest science. Then it became an important opportunity to help my own students and postdocs build their networks and find their next job opportunity. Now I’m excited to return wearing a new hat: as a vendor in the exhibit hall! 🤠 The Earthmover crew will be on site for the duration of the conference. Come find us at booth 1007! We’ve teamed up with Coiled and Pan…

Arraylake empowers scientific data teams to build faster and collaborate more effectively.

At Earthmover, we believe that scientific data are key to solving humanity’s greatest challenges. And we know that scientists today are struggling with tools that don’t understand scientific data formats and data models. For the past year, we’ve been hard at work building a platform to transform how scientists interact with data in the cloud. Today, we are thrilled to announce the launch of Arraylake in private beta. Arraylake is a data lake platform built around collections of multidimensional numerical arrays (a.k.a. ND-arrays, tensors)—the native data model of physical, biological, and computational sciences, not to mention deep learning. Given the centrality of tensors to so many disciplines, we were frustrated that they are poorly supported by today’s cloud data infrastructure. Ex…

We are hiring two founding engineers to help us build our first products.

How do we best utilize software and data to tackle our planet’s most urgent challenges? Being part of the answer to this question is why I am so excited about what we’re building at Earthmover. I was trained as an engineer and a climate scientist, and previously co-founded CarbonPlan, a non-profit working to improve the quality of climate solutions through open data and tools. For the past ten years, I’ve been developing open source software and community projects that help scientists and engineers to make better use of climate and weather data. Collectively, we have a lot of work to do to address the climate crisis, and today, I’m more sure than ever that software and data are going to be key elements of our collective response to the challenges ahead. A few months ago, Ryan Abernathey an…