MaxCompute: What is MaxCompute?

Last Updated: Sep 25, 2025

MaxCompute is an enterprise-grade, Software as a Service (SaaS) cloud data warehouse built for data analytics. Its serverless architecture delivers a fast, fully managed online data warehouse service that removes the scalability and elasticity constraints of traditional data platforms. This approach minimizes your operational overhead and lets you analyze and process massive datasets economically and efficiently.

As data collection methods evolve and industry data accumulates, data volumes have grown to terabyte (TB), petabyte (PB), and even exabyte (EB) scales, reaching levels that traditional software cannot handle. MaxCompute provides both offline and real-time data ingestion, and supports large-scale data processing and query acceleration. It offers versatile data warehouse solutions and analytical modeling services for a wide range of computing scenarios. With comprehensive data import solutions and a variety of classic distributed computing models, you can easily analyze big data without the complexity of managing and maintaining distributed systems.

MaxCompute is designed for storage and compute needs ranging from 100 GB to the exabyte (EB) level and has been battle-tested at scale within Alibaba Group. It is ideal for use cases such as data warehousing and BI analytics for large internet companies, website log analysis, e-commerce transaction analysis, and analyzing user behavior and interests.

MaxCompute is deeply integrated with the following Alibaba Cloud products:

  • DataWorks

    An end-to-end platform for data synchronization, workflow design, data development, management, and operations.

  • Platform for AI (PAI)

    A machine learning platform with algorithm components for training models on MaxCompute data.

  • Hologres

    A real-time data warehouse that can accelerate queries on MaxCompute data through external tables, or run interactive analysis on data exported from MaxCompute.

  • Quick BI

    A business intelligence tool for creating reports and visually analyzing MaxCompute data.

Core features

Feature

Description

Fully managed serverless online service

  • An out-of-the-box online service accessed through APIs.

  • Provides a large-scale pre-provisioned resource cluster that you can use on demand with a pay-as-you-go billing method.

  • Requires no platform maintenance, minimizing your operational workload.

Elasticity and scalability

  • Storage and compute scale independently, allowing enterprises to connect and analyze all their data assets on a single platform, eliminating data silos.

  • Supports dynamic resource allocation based on business peaks and troughs.

Unified and rich computing and storage capabilities

  • MaxCompute supports various computing models and rich UDFs.

  • Uses columnar storage, which typically achieves a 5x compression ratio and significantly reduces storage costs.

Data modeling, development, and governance capabilities

You can centralize, integrate, process, and govern all your data with DataWorks, a one-stop data development and data governance platform. DataWorks supports MaxCompute project management and web-based query editing.

Integrated AI capabilities

  • Seamlessly integrates with the Platform for AI (PAI) to provide powerful machine learning processing capabilities.

  • Lets you run intelligent analysis using the familiar Spark-ML.

  • Supports third-party Python machine learning libraries.

Deep integration with the Spark engine

  • Provides a built-in Apache Spark engine with complete Spark functionality.

  • Deeply integrates with MaxCompute's computing resources, data, and permission system.

Lakehouse

  • Integrates access and analysis for data in a data lake, such as OSS or Hadoop Distributed File System (HDFS). You can analyze data in the lake by mapping it to an external table or by accessing it directly with Spark.

  • Enables joint analysis of data across a data lake and a data warehouse within a unified data warehouse service and user interface.

For more information, see Lakehouse of MaxCompute.

Unified offline and real-time processing

  • Deeply integrates with Hologres, a real-time data warehouse. It supports querying associated external tables and direct reads from the storage layer, achieving over 5 times higher query efficiency than other external table types.

  • Hologres provides query acceleration for MaxCompute, delivering a 10x or greater performance boost without moving data.

  • Hologres supports batch import of MaxCompute metadata, eliminating the need to create external tables manually.

Support for stream writing and near real-time analytics

  • Supports real-time writing of streaming data for analysis within the data warehouse.

  • Deeply integrates with major cloud streaming services, making it easy to ingest streaming data from various sources.

  • Supports high-performance, second-level elastic concurrent queries for near real-time analytics scenarios.

Continuous SaaS-based data protection in the cloud

Provides over 20 security features that meet Level 3 standards for classified information security protection. These features cover infrastructure, data centers, networks, power supply, platform security, permission management, and privacy protection, combining the security capabilities of both open-source big data and managed databases.

Product architecture

The following figure shows the MaxCompute architecture.

[Figure: MaxCompute architecture]

The core modules are described below.

Module

Description

Storage engine

MaxCompute provides the MaxCompute Storage Engine (internal storage) to store MaxCompute tables and resources. You can also directly read data stored in other products like OSS, Tablestore, and RDS using external tables.

The MaxCompute Storage Engine primarily uses columnar storage, which typically achieves a 5x compression ratio.
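
As a rough illustration of the compression figure above, the following sketch estimates the stored size for a given raw volume and uses a toy run-length encoder to show one reason column-oriented layouts tend to compress well: the values within a single column are often repetitive. The function names, the sample column, and the encoder are assumptions made for illustration. This page does not describe the actual MaxCompute file format or codecs.

```python
from itertools import groupby


def estimated_stored_size(raw_bytes: float, compression_ratio: float = 5.0) -> float:
    """Estimate on-disk size for a given raw size and compression ratio.

    The default of 5.0 mirrors the "typically 5x" figure quoted above;
    real ratios depend on the data.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return raw_bytes / compression_ratio


def run_length_encode(column):
    """Toy run-length encoding: collapse consecutive repeats into (value, count)."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(column)]


# Hypothetical low-cardinality column, stored contiguously as in a columnar layout.
country_column = ["cn"] * 600 + ["us"] * 300 + ["de"] * 100
encoded = run_length_encode(country_column)

TIB = 1024 ** 4
print(estimated_stored_size(10 * TIB) / TIB, "TiB stored for 10 TiB raw")
print(len(country_column), "values ->", len(encoded), "runs")
```

In a row-oriented layout the same country values would be interleaved with other fields, so long runs like these would not appear next to each other.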

Compute engine

MaxCompute provides the MaxCompute SQL Engine and the CUPID computing platform.

  • MaxCompute SQL Engine: Directly runs MaxCompute SQL tasks. For command syntax, function requirements, and development examples for MaxCompute SQL tasks, see Overview of MaxCompute SQL.

  • CUPID computing platform: Runs tasks from third-party engines like Spark and Mars. For development requirements and examples for multiple engines, see PyODPS.

Cloud service layer

MaxCompute allows you to create different task queues and configure unique resources and priorities for each, enabling fine-grained control over task execution. To enhance overall system efficiency, its powerful scheduling system manages and optimizes the allocation and use of computing resources. To ensure data security and privacy, MaxCompute also provides multi-layered data protection, including project-level isolation, access control, and data encryption.
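
The queue-and-priority idea above can be sketched with a simple priority queue. This is a conceptual model only: the `Task` fields, the queue names, and the rule that a lower number means higher priority are assumptions made for illustration. It models neither per-queue resources nor MaxCompute's actual scheduler or API.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Task:
    priority: int                     # assumption: lower value runs first
    seq: int                          # tie-breaker that preserves submission order
    name: str = field(compare=False)
    queue: str = field(compare=False)


class PriorityScheduler:
    """Toy scheduler: always pops the highest-priority task across all queues."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, name, queue, priority):
        heapq.heappush(self._heap, Task(priority, next(self._counter), name, queue))

    def run_order(self):
        """Drain the heap and return task names in execution order."""
        order = []
        while self._heap:
            order.append(heapq.heappop(self._heap).name)
        return order


scheduler = PriorityScheduler()
scheduler.submit("adhoc_query", queue="analysts", priority=5)
scheduler.submit("daily_etl", queue="production", priority=1)
scheduler.submit("weekly_report", queue="production", priority=3)
scheduler.submit("backfill", queue="analysts", priority=5)

order = scheduler.run_order()
print(order)
```

Tasks with equal priority run in submission order because the sequence counter breaks ties.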

Unified metadata and security systems

MaxCompute's offline, tenant-level metadata is provided through Information Schema. You can also use Information Schema to query historical usage data logs, enabling you to analyze metrics like resource consumption, run duration, and data processed. This helps you optimize jobs or plan resource capacity.

MaxCompute also offers a comprehensive security management system with features like access control, data encryption, and dynamic data masking to ensure data security. For more security-related information, see Security features.
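
The usage analysis that Information Schema enables (resource consumption, run duration, data processed) might look like the following once job history records have been exported. The record fields (`job`, `cpu_core_seconds`, `duration_s`, `input_gb`) are hypothetical placeholders rather than actual Information Schema column names, and the values are made-up sample data.

```python
from collections import defaultdict

# Hypothetical exported job-history rows; field names and values are illustrative only.
job_history = [
    {"job": "daily_etl", "cpu_core_seconds": 1200, "duration_s": 300, "input_gb": 50.0},
    {"job": "daily_etl", "cpu_core_seconds": 1400, "duration_s": 340, "input_gb": 55.0},
    {"job": "adhoc_query", "cpu_core_seconds": 90, "duration_s": 12, "input_gb": 2.5},
]


def summarize(rows):
    """Aggregate run count, total CPU, average duration, and total input per job."""
    totals = defaultdict(
        lambda: {"runs": 0, "cpu_core_seconds": 0, "duration_s": 0, "input_gb": 0.0}
    )
    for row in rows:
        t = totals[row["job"]]
        t["runs"] += 1
        t["cpu_core_seconds"] += row["cpu_core_seconds"]
        t["duration_s"] += row["duration_s"]
        t["input_gb"] += row["input_gb"]

    return {
        job: {
            "runs": t["runs"],
            "cpu_core_seconds": t["cpu_core_seconds"],
            "avg_duration_s": t["duration_s"] / t["runs"],
            "input_gb": t["input_gb"],
        }
        for job, t in totals.items()
    }


summary = summarize(job_history)

# Rank jobs by CPU consumption to surface candidates for optimization.
for job, stats in sorted(
    summary.items(), key=lambda kv: kv[1]["cpu_core_seconds"], reverse=True
):
    print(job, stats)
```

Ranking by total CPU rather than by single-run duration highlights frequent jobs whose cumulative cost is high, which is usually the more useful view for capacity planning.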

User interfaces and openness

MaxCompute provides the following user interfaces:

Data ecosystem support

MaxCompute is deeply integrated with Alibaba Cloud DataWorks to provide one-stop data development, analytics, and governance. It also supports various other data development and analysis scenarios:

  • Data lake

  • Data integration

  • Data governance

  • Data development by using a third-party engine

  • Visualized data analytics

TopConsole (MaxCompute console)

Provides basic configuration and management capabilities, including MaxCompute project management, quota management, and tenant management. It also offers fundamental O&M features like job O&M and resource observation, as well as enhanced features like materialized views and cost analytics and optimization. For more information, see Resource management and use.

Product advantages

MaxCompute offers the following key advantages:

  • Easy to use

    • High-performance storage and compute optimized for data warehousing.

    • Pre-integrated with services and supports standard SQL for simple development.

    • Built-in management and security capabilities.

    • Fully managed with a pay-as-you-go model; you incur no compute costs when not in use.

  • Elasticity that matches business growth

    With decoupled storage and compute, resources scale independently and dynamically. This on-demand elasticity meets sudden business growth without upfront capacity planning.

  • Support for various analytics scenarios

    Supports an open data ecosystem, providing a unified platform for data warehousing, BI, near-real-time analytics, data lake analysis, and machine learning.

  • Open platform

    • Provides open APIs and a rich ecosystem, offering flexibility for data and application migration and for custom development.

    • Flexibly combines with open-source and commercial products like Airflow and Tableau to build a wide range of data applications.

Contact us

If you have any questions or suggestions while using MaxCompute, please fill out the DingTalk group application form to join our DingTalk group for feedback.