The cloud platform for scientific data teams

Introducing Arraylake, a cloud data lake platform for multidimensional scientific data. Effortlessly analyze, organize, build, and collaborate.

Built for your favorite open-source libraries and data formats

Transform scattered data into a single source of truth

Most teams working with array data use a patchwork of file formats and storage solutions, making it hard to fully realize the data's value.

With Arraylake, all of your team's data are accessible through a central, unified catalog and are searchable and queryable via a high-performance, cloud-native API.

Collaborate with confidence

ACID transactions and a rich permission structure allow users to safely and efficiently evolve a shared body of data.

Immutable data references enable reproducible, verifiable data-driven workflows.

A true home for scientific data

Existing data storage providers are built for tabular data and struggle to support multidimensional arrays.

Arraylake is purpose-built for multidimensional scientific data, born out of our founders' own challenges in the field.

Unlock the value of your metadata

Scientific data (e.g. NetCDF files) come with extremely rich metadata, but most tools simply throw it away. Arraylake uses this metadata to enable rich browse and search experiences.

Example search: cell_methods == time: mean
Array Metadata Screenshot

Harness the power of the cloud

Object storage is great for storing raw bytes, but it's not a data catalog. Arraylake lets you leverage the scalability and performance of object storage while providing the flexibility and queryability of a database.

Shortcut the development cycle

Save months of development time by using an off-the-shelf solution instead of building and maintaining a bespoke system for cataloging array data in object storage.

Array Terminal Screenshot

Why Arraylake

Traditional data warehouses are designed to only work with tabular data, relegating arrays to the purgatory of "unstructured data".

That's why most teams working with array data choose to roll their own solutions, building custom data management systems on top of cloud object storage. Arraylake offers the best of both worlds:

A turnkey, cloud-native data lake platform built around the array data model.
Managed Data Lakehouse (e.g. Snowflake, Google BigQuery, Databricks)

  • Data must be massaged into tabular formats
  • Loss of data fidelity
  • Poor performance on array workloads
  • Turnkey solution
  • Transactions, data versioning, catalog
  • Integrated compute layer

Arraylake

  • Native array data model
  • Integration with scientists' preferred tools
  • Industry-leading performance
  • Turnkey solution
  • Transactions, data versioning, catalog
  • 🚀 Automatic optimization (coming soon)

Cloud Object Storage (e.g. Google Cloud Storage, Amazon S3, Microsoft Azure Blob Storage)

  • Data model agnostic
  • Integration with scientists' preferred tools
  • Good performance is possible
  • Custom engineering required
  • No transactions, data versioning, or catalog
  • No integrated compute layer

Sylvera Website Screenshot

"From testing Arraylake, it became clear that this new pipeline was much quicker than with their previous workflow..."

Case Study: Sylvera x Earthmover

ALIVE Model Visualization

"Arraylake takes our work and makes it more real, tractable, and dynamic...."

Case Study: ALIVE x Earthmover

Get started with your scientific data lake:

Interested in learning more?

Book a demo today and join our mailing list to stay up to date on releases and new features.