Announcing the Earthmover Data Marketplace: Subscribe to ARCO datasets from ECMWF, NOAA, and more. Explore the marketplace .

Announcing the Arraylake MCP Server

Announcing the Arraylake MCP Server
Matt Iannucci
Matt Iannucci

Engineering

TLDR

The Arraylake MCP server is now publicly available at https://app.earthmover.io/mcp. Connect any Model Context Protocol client, including Claude.ai, ChatGPT, Claude Code, or anything else that speaks Streamable HTTP, and your AI assistant can browse your organizations and repositories, walk the tree of any dataset, query Flux services with OGC EDR, render interactive maps directly in chat, and link straight back to the Arraylake web app. Head to the docs to get started.

Why MCP?

AI agents are rapidly transforming science and data analytics. Natural-language-based interfaces allow scientists and developers to interact with their data in the most familiar, comfortable, and efficient way possible. This post is the first in a series of announcements about the Earthmover platform is evolving to support this exciting direction.

Model Context Protocol (MCP) is an open standard that lets AI assistants include context from external tools and data sources. The Arraylake MCP server is a remote MCP server, allowing you to connect your AI assistant to Arraylake without needing to run a local server.

A big part of working with scientific data is exploration:

  • What’s in this repo? What variables does it have?
  • What’s the latest forecast init time?
  • What does the temperature field look like over my region of interest right now?

These questions have always been answerable in Arraylake, through the web app, the Python client, or Flux, but every answer required multiple tools: Open a tab, open a notebook, paste a coordinate, run a cell, look at the plot. The Arraylake MCP server brings your Arraylake context directly to your AI assistant, speeding up your data exploration and analysis workflows.

Arraylake is AI-ready data

The MCP server is the interface, but the reason it works so well is the data underneath. Arraylake was built on two foundations that turn out to be exactly what an AI agent needs:

  • ARCO data, directly addressable over the internet. Arraylake stores data as analysis-ready, cloud-optimized chunked arrays in object storage, using Zarr and Icechunk. An agent doesn’t have to download a 200 GB NetCDF file to answer a question about one grid point, it (or Flux on its behalf) reads only the chunks it needs on demand. Agents are token-bound and time-bound and ARCO data fits that constraint natively.
  • Self-describing data with an always-accurate catalog. Zarr and CF conventions mean every dataset carries its own dimensions, coordinates, units, and attributes inline. An agent that opens a repo immediately knows what t2m means, what its units are, and how it’s gridded. And because the Arraylake catalog is the data (not a sidecar pointing at it), what the agent discovers is what’s actually there. There is no drift between metadata and reality.

ARCO + self-describing data + a unified catalog is already a powerful recipe for humans. The same recipe empowers agents to autonomously discover, inspect, slice, and visualize petabyte-scale archives.

Working with the MCP

Once you’ve connected a client, the assistant has access to a curated set of tools that streamline your data exploration and analysis workflows with Arraylake:

  • Discover: browse available repositories, search the public Marketplace.
  • Explore: walk the group/array tree of a repo, inspect xarray schemas (dimensions, coordinates, variables, attributes), read commit history and diffs.
  • Query: run OGC EDR position, area, and cube queries against your Flux compute services and get CSV, CoverageJSON, NetCDF, or GeoTIFF back.
  • Visualize: render live pan/zoom maps of any variable directly in the chat, or mint Flux tile URLs to drop into a custom app or static page.
  • Link: every discovery and lookup tool returns a deep link to the matching page in the Arraylake web app, so it’s easy to jump straight from your agent to the data you’re interested in.

Check out the Tool Reference to see the full list of available tools.

What it looks like

Here’s a real exchange against the earthmover/aifs repo (ECMWF’s AI-based weather model on Arraylake, created via a Marketplace Subscription to one of Dynamical’s free Listings):

🧑 You: Show me a temperature plot in Fahrenheit for tonight in Narragansett, Rhode Island using the earthmover/aifs repo on Arraylake.

🤖 Claude: I explored the earthmover/aifs repo on Arraylake — it’s the ECMWF AIFS Single forecast dataset (AI-based weather model, 0.25° global, 6-hourly forecasts out to 15 days). Latest initialization available: 2026-04-23 12:00 UTC.

Queried the EDR service at the grid point nearest Narragansett, RI (41.5°N, 71.5°W) for the temperature_2m variable, then converted from Celsius to Fahrenheit.

Forecast for tonight (EDT)

Valid time (EDT)Lead timeTemp (°C)Temp (°F)
2pm today (18Z)+6h17.162.8
8pm tonight (00Z)+12h13.155.5
2am overnight (06Z)+18h7.645.7
8am tomorrow (12Z)+24h7.946.2

A typical spring overnight — cooling about 10°F from evening into the pre-dawn hours, bottoming out in the mid-40s before sunrise.

Temperature forecast for Narragansett, RI

Source: ECMWF AIFS Single forecast, init 2026-04-23 12:00 UTC · earthmover/aifs on Arraylake.

Visualization works the same way. Ask for a map and the assistant renders one inline:

🧑 You: Using Arraylake can you show me a map of where it will be raining in the world tomorrow May 13, 2026 at 2 PM EDT according to aifs?

🤖 Claude: Tomorrow May 13, 2026 at 2 PM EDT = 18:00 UTC. The AIFS model initializes every 6 hours, so the most recent initialization available is the May 12 00:00 UTC run, with a lead time of 42 hours.

Here’s your global precipitation forecast:

AIFS precipitation rate forecast for 2026-05-13 18:00 UTC

Darker blue = heavier precipitation rate. The colorscale runs from ~0 to 0.0005 kg/m²/s (roughly 0 to 1.8 mm/hr); anything below the threshold is transparent so you only see where it’s actually raining.

The same machinery handles repo-auditing questions (“who was the last person to update this repo and how many chunks did they touch?”) and marketplace discovery (“find me a public ERA5 dataset with surface temperature”). See the Examples page for more.

Connecting a client

The server speaks the standard MCP Streamable HTTP transport with OAuth 2.1, so any compliant client works. A few of the common ones:

Claude.ai: Settings → Connectors → Add custom connector. Name it Arraylake, URL https://app.earthmover.io/mcp. Sign in and you’re done.

Claude Code:

claude mcp add --transport http arraylake https://app.earthmover.io/mcp

ChatGPT, Codex, Cursor, and others: see the setup guide for step-by-step instructions.

On first use, the client opens a browser, you sign into Arraylake the same way you would in the web app, and the token is stored. For headless setups (CI jobs, remote dev environments, anything that can’t pop a browser), you can also pass an Arraylake API token as a static Authorization header to skip OAuth entirely and customize the permissions for your specific workflow.

How do permissions work?

Tools inherit the permissions of whichever identity the token was issued for, exactly as if you were calling the API directly. There’s no new permission model to learn and no new attack surface to worry about.

Early access: we want to hear from you!

This is the first public release of the MCP server, and we’re treating it as an actively evolving surface. The tools you see today cover discovery, exploration, EDR queries, and visualization. Notably absent for now is server-side Python execution; we’re working on extending Arraylake to support more flexible compute workloads, stay tuned!

If you are building agentic workflows that integrate with Arraylake, we’d love if you tried the MCP and let us know how we can help to build the features you need. Reach out to us at support@earthmover.io or send us a message on our Slack and let’s get building!

Matt Iannucci
Matt Iannucci

Engineering