The cloud platform for scientific data teams
Introducing Arraylake, a cloud data lake platform for multidimensional scientific data. Effortlessly analyze, organize, build, and collaborate.
Built for your favorite open source libraries and data formats
Transform scattered data into a single source of truth
Most teams working with array data use a patchwork of file formats and storage solutions, making it hard to fully realize the data's value.
With Arraylake, all your team's data are accessible through a central, unified catalog that is searchable and queryable via a high-performance, cloud-native API.
Collaborate with confidence
ACID transactions and a rich permission structure allow users to safely and efficiently evolve a shared body of data.
Immutable data references enable reproducible, verifiable data-driven workflows.
A true home for scientific data
Existing data storage providers are built for tabular data and struggle to support multidimensional arrays.
Arraylake is purpose-built for multidimensional scientific data and was born out of our founders' own challenges in the field.
Unlock the value of your metadata
Scientific data (e.g. NetCDF files) come with extremely rich metadata, but most tools simply throw it away. Arraylake uses this metadata to enable rich browse and search experiences.
Shortcut the development cycle
Save months of development time by using an off-the-shelf solution instead of building and maintaining a bespoke system for cataloging array data in object storage.
Why Arraylake
Traditional data warehouses are designed to work only with tabular data, relegating arrays to the purgatory of "unstructured data".
That's why most teams working with array data choose to roll their own solutions, building custom data management systems on top of cloud object storage. Arraylake offers the best of both worlds:
A turnkey, cloud-native data lake platform built around the array data model.
Managed Data Lakehouse
Cons:
- Data must be massaged into tabular formats
- Loss of data fidelity
- Poor performance on array workloads
Pros:
- Turnkey solution
- Transactions, data versioning, catalog
- Integrated compute layer
Cloud Object Storage
Pros:
- Data model agnostic
- Integration with scientists' preferred tools
- Good performance is possible
Cons:
- Custom engineering required
- No transactions, data versioning, catalog
- No integrated compute layer