
MaxCompute:Data Lakehouse Solution 2.0 user guide

Last Updated: Sep 02, 2025

MaxCompute provides Data Lakehouse Solution 2.0. This solution lets you create management objects that define the metadata and data access methods for external data sources. Using an external schema mapping mechanism, you can directly access all tables within a database or schema of an external data source. This solution eliminates the silos between data lakes and data warehouses by combining the flexibility and rich multi-engine ecosystem of data lakes with the enterprise-grade capabilities of data warehouses. This helps you build an integrated data management platform. This feature is in public preview.

Basic concepts

  • Data warehouse vs. data lake

    • Data warehouse: Emphasizes the management of, and constraints on, structured and semi-structured data entering the warehouse. It relies on strong management capabilities to achieve better computing performance and more standardized management.

    • Data lake: Emphasizes open data storage and common data formats. It supports multiple engines for producing or consuming data as needed. To preserve flexibility, it provides only weak management capabilities. It is compatible with unstructured data and supports a schema-on-read approach, offering a more flexible data management method.

  • MaxCompute data warehouse

    MaxCompute is a cloud-native data warehouse that uses a serverless architecture. With MaxCompute, you can:

    • Model your data warehouse.

    • Use extract, transform, and load (ETL) tools to load and store data in model tables with defined schemas.

    • Use a standard SQL engine to process massive amounts of data in the data warehouse, and analyze the data with the Hologres online analytical processing (OLAP) engine.

  • Scenarios for using MaxCompute with data lakes and federated computing

    In a data lake scenario, data resides in the lake and is produced or consumed by various engines. The MaxCompute compute engine can act as one of these engines to process and use the data. In this case, MaxCompute needs to read data generated upstream in the data lake, be compatible with various mainstream open-source data formats, perform computations within its engine, and continue to produce data for downstream workflows.

    As a secure, high-performance, and cost-effective data warehouse that aggregates high-value data, MaxCompute also needs to retrieve metadata and data from the data lake. This allows for in-engine computation on external data and federated computing with internal data to extract value and converge data into the fully managed data warehouse.

    Beyond the data lake, MaxCompute as a data warehouse also needs to retrieve data from various external data sources, such as Hadoop and Hologres, to perform federated computing with its internal data. In federated computing scenarios, MaxCompute must also support reading metadata and data from external systems.

  • MaxCompute Data Lakehouse Solution 2.0

    MaxCompute Data Lakehouse Solution 2.0 is based on the MaxCompute compute engine. It supports accessing Alibaba Cloud metadata or storage services over the Alibaba Cloud service interconnection network. It also supports accessing external data sources in a VPC over a leased line network. The solution lets you create management objects that define the metadata and data access methods for external data sources. Using an external schema, you can map to a database or schema of an external data source to directly access all tables within it.

    (Figure: Data Lakehouse Solution 2.0 architecture. External schemas in a MaxCompute project map to databases or schemas of external data sources.)

    • Network connectivity

      For more information, see the Networklink section in VPC access solution (direct connection over a leased line). MaxCompute can access data sources in a VPC network, such as EMR and RDS instances, through a network connection. Support for RDS instances is in preparation. Data Lake Formation (DLF), Object Storage Service (OSS), and Hologres are located on the Alibaba Cloud service interconnection network. MaxCompute can access data in these services directly without setting up a Networklink object.

    • Foreign Server

      A foreign server contains information for accessing metadata and data. It also includes authentication information, location details, and connection protocols for the data source system. A foreign server is a tenant-level management object defined by a tenant administrator.

      If project-level tenant resource access control is enabled, the tenant administrator attaches the foreign server to the project that will use it. The project administrator then uses a policy to grant users within the project permission to use the foreign server.

    • External Schema

      An external schema is a special type of schema in a MaxCompute data warehouse project. As shown in the preceding figure, it can map to a database or schema of a data source. This allows direct access to the tables and data within that database or schema. Tables mapped to a remote database through an external schema are called federated foreign tables.

      MaxCompute does not store metadata for federated foreign tables. Instead, MaxCompute retrieves it in real time from the metadata service defined in the foreign server object. When you run a query, you do not need to create an external table in the data warehouse using a Data Definition Language (DDL) statement. Instead, you can directly reference the original table name from the data source, using the project name and external schema name as the namespace. If the table schema or data in the data source changes, the federated foreign table immediately reflects the latest state of the source table. The data source level that an external schema can map to is determined by the system hierarchy of the data source and the access level of the identity defined in the foreign server.
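
      For example, the following is a minimal sketch, assuming a hypothetical project my_project with an external schema es_sales that maps to a source database containing a table named orders:

      -- No CREATE EXTERNAL TABLE DDL is needed; table metadata is resolved in real time from the source
      SELECT COUNT(*) FROM my_project.es_sales.orders;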

    • External Project

      In Data Lakehouse Solution 1.0, an external project uses a two-layer model. Similar to an external schema, it maps to a database or schema of a data source and requires a data warehouse project to read and compute external data. However, external projects have a higher hierarchy level. Mapping a data source database or schema can lead to an excessive number of external projects, and they cannot be used with three-layer model data warehouse projects. MaxCompute is gradually deprecating Data Lakehouse Solution 1.0 external projects. Existing users can migrate to external schemas.

      In Data Lakehouse Solution 2.0, you can use an external schema to handle all the capabilities of an external project from Data Lakehouse Solution 1.0. An external project directly maps to a catalog or database of a three-layer model data source. It provides direct visibility into the databases under a DLF catalog or the schemas under a Hologres database. You can then access the data source tables as federated foreign tables. The external project feature for Data Lakehouse Solution 2.0 will be launched later. For updates, see the official documentation.

    The following table summarizes the mapping levels and authentication methods for each data source type.

    | Data source type | Foreign server level | External schema mapping level | Data Lakehouse Solution 2.0 external project mapping level (coming soon) | Data Lakehouse Solution 1.0 external project mapping level (being phased out) | Authentication method |
    | --- | --- | --- | --- | --- | --- |
    | DLF+OSS | Region-level DLF and OSS services | DLF Catalog.Database | DLF Catalog | DLF Catalog.Database | RAMRole |
    | Hive+HDFS | EMR instance | Hive Database | Not supported | Hive Database | No authentication |
    | Hologres | Database of a Hologres instance | Schema | Database | Not supported | RAMRole |

    Note

    Different data sources support various authentication methods. MaxCompute will gradually provide more authentication methods in future versions, such as using the current user's identity to access Hologres or using Kerberos authentication to access Hive.

Limits

  • The Data Lakehouse Solution 2.0 feature is supported only in the China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Hong Kong), Singapore, and Germany (Frankfurt) regions.

  • MaxCompute, DLF, and OSS must be deployed in the same region.

  • External schema operations must be performed in an internal project. Therefore, the internal project must have the three-layer model enabled. You can enable it in the console. For more information, see Enable the feature.

Precautions

  • You must enable the schema switch for project-level metadata support for the MaxCompute project where you will create the external schema. For more information, see Schema switch for project-level metadata support.

  • Before you perform schema operations or query data from an external schema, you must enable the schema switch for SQL syntax support. For more information, see Schema switch for SQL syntax support.

  • When you run the SHOW TABLES IN <external_schema_name> command to access the data source system:

    • If you run the command in the DataWorks Data Development integrated development environment (IDE) and the system reports that the schema cannot be parsed, in addition to meeting the preceding requirements, you must also ensure that the MaxCompute CMD integrated with the DataWorks resource group is V0.46.8 or later. If the CMD version is earlier than V0.46.8, you can submit a ticket to contact MaxCompute technical support for an upgrade.

    • If you run the command in the MaxCompute client (odpscmd), you must also upgrade the client to V0.46.8 or later.

    Note

    You can run the SHOW VERSION; command to view the current version.

  • You can view the list of created external schemas, view schema information, and delete schemas. The commands and usage are the same as for internal schemas. For more information, see Schema operations.
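
    For example, the same commands used for internal schemas apply (es_dlf is a hypothetical external schema name):

    SHOW SCHEMAS;       -- list all schemas in the current project, including external schemas
    DESC SCHEMA es_dlf; -- view information about a specific schema
    DROP SCHEMA es_dlf; -- delete a schema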

View created foreign servers

You can view the created foreign servers and data sources in the foreign server management list. The following table describes the parameters.

Parameter

Description

Data source name

The name of the foreign server.

Type

The type of the foreign server. Currently, DLF+OSS, Hive+HDFS, and Hologres are supported.

Network connectivity

For a foreign server that uses a network connection, the network connection name is displayed. Currently, only the Hive+HDFS type supports using a network connection.

Owner account

The account information of the user who created the foreign server. External schemas and other objects rely on the foreign server to access the source system. The access permissions are derived from the identity specified by the creator in the foreign server.

  • If project-level tenant resource access control is enabled: The creator can specify which projects can use the foreign server by setting up an attachment between the foreign server and the projects. The project administrator then grants users within the project permission to use the foreign server through a policy.

  • If project-level tenant resource access control is not enabled: Any user who creates an external project or external schema can use this foreign server. Access to the external system is also based on the RAMRole permissions specified by the creator in the foreign server.

Attached projects

The number of projects to which the foreign server is attached.

Creation time

The time when the foreign server was created.

Update time

The time when the editable properties of the foreign server were last modified and saved.

Operations - Attach project

The creator sets the attachment relationship between the foreign server and projects.

  • When project-level tenant resource access control is enabled for a project, if the tenant administrator has not configured the attachment between this project and the referenced foreign server, the project cannot access the external system based on this foreign server.

  • If project-level tenant resource access control is not enabled for a project, the attachment relationship between the project and the referenced foreign server is not checked.

Operations - Details

View the properties of the foreign server.

Operations - Edit

Modify the properties of the foreign server.

Important

Modifying information such as permissions may invalidate authorization relationships configured within a project due to changes in the scope of visible objects.

Operations - Delete

Delete the current foreign server.

Important

After a foreign server is deleted, all tasks that depend on it will lose access to the external system. All configured attachment relationships between this foreign server and any projects will also be deleted.

Create and use a DLF+OSS data lakehouse

A DLF+OSS data source is a common data lake implementation that uses OSS as the data lake storage service and DLF as the metadata management service for the lake. MaxCompute supports creating this type of foreign server. By combining MaxCompute with DLF and OSS, you can integrate your data warehouse and data lake, which provides more flexible and efficient data management and processing capabilities.

Important

If the schema syntax switch has not been enabled for your current account at the tenant level, you must add the SET odps.namespace.schema=true; statement before the following SQL statements to execute schema-related statements.

Step 1: Grant permissions

When you build a lakehouse integration using MaxCompute with DLF and OSS, the MaxCompute project account cannot access DLF or OSS by default, so you must grant it access first. You can use either of the following authorization methods:

  • One-click authorization: If you use the same account to create the MaxCompute project and deploy DLF and OSS, we recommend that you perform one-click authorization on the Cloud Resource Access Authorization page in the Resource Access Management (RAM) console.

  • Custom authorization: You can use this method regardless of whether the same account is used to create the MaxCompute project and deploy DLF and OSS. For more information, see Authorize a RAM user to access DLF.

Step 2: Create a DLF+OSS foreign server

  1. Log on to the MaxCompute console and select a region in the upper-left corner.

  2. In the left navigation pane, choose Tenant Management > Foreign Server.

  3. On the Foreign Server page, click Create Foreign Server.

  4. In the Add Foreign Server dialog box, configure the parameters as described in the following table.

    Parameter

    Description

    Foreign server type

    Select DLF+OSS.

    Foreign server name

    Enter a custom name. The naming conventions are as follows:

    • It must start with a letter and can contain only lowercase letters, underscores (_), and digits.

    • It cannot exceed 128 characters in length.

    Foreign server description

    Enter a description as needed.

    Region

    The current region is selected by default.

    DLF Endpoint

    The DLF endpoint for the current region is used by default.

    OSS Endpoint

    The OSS endpoint for the current region is used by default.

    RoleARN

    The Alibaba Cloud Resource Name (ARN) of the RAM role. This role must have permissions to access both DLF and OSS services.

    You can log on to the RAM console. In the navigation pane on the left, choose Identity Management > Roles. Click the name of the target RAM role to get its ARN from the Basic Information section.

    Example: acs:ram::124****:role/aliyunodpsdefaultrole.

    Additional foreign server properties

    Additional properties for the foreign server. After you specify them, tasks that use this foreign server can access the source system according to the behavior defined by the parameters.

    Note

    Supported parameters will be released gradually as the product evolves. For updates, see the official documentation.

  5. To create the foreign server, click Confirm.

Step 3: Create an external schema

The following command creates an external schema that references a DLF+OSS foreign server:

CREATE EXTERNAL SCHEMA IF NOT EXISTS <external_schema_name>
WITH <dlfoss_foreign_server_name>
ON '<dlf_catalog_id>.<database_name>';

The following list describes the parameters.

  • external_schema_name: The name of the external schema.

  • dlfoss_foreign_server_name: The name of the foreign server you created. The project to which the external schema belongs must be in the same region as the foreign server.

  • dlf_catalog_id: The ID of the DLF data catalog. For information about how to create a data catalog, see Create a data catalog.

  • database_name: The name of the database under the specified data catalog in DLF. For more information, see Databases, tables, and functions.
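
For example, the following is a minimal sketch with hypothetical values: a foreign server named fs_dlf_oss, a DLF catalog ID of clg-12345abcde, and a database named sales_db.

CREATE EXTERNAL SCHEMA IF NOT EXISTS es_dlf
WITH fs_dlf_oss
ON 'clg-12345abcde.sales_db';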

Step 4: Use SQL to access the data source system

  • Command format: List the table names from DLF and OSS.

    • Method 1:

      SHOW TABLES IN <external_schema_name>; --external_schema_name is the name of the external schema.
    • Method 2:

      USE SCHEMA <external_schema_name>; --external_schema_name is the name of the external schema.
      SHOW TABLES;
  • Usage examples:

    1. Query all table names under the external schema named es_dlf.

      USE SCHEMA es_dlf;
      SHOW TABLES;

      Sample result:

      ALIYUN$xxx@test.aliyunid.com:hellocsv
      ALIYUN$xxx@test.aliyunid.com:t1
      ALIYUN$xxx@test.aliyunid.com:t2
      ALIYUN$xxx@test.aliyunid.com:t3
    2. Query data from the hellocsv table in the es_dlf schema of the lakehouse47_3 project.

      SELECT * FROM lakehouse47_3.es_dlf.hellocsv;

      Sample result:

      +------------+------------+------------+------------+
      | col1       | col2       | col3       | col4       |
      +------------+------------+------------+------------+
      | 1          | hello      | test       | world      |
      +------------+------------+------------+------------+
    3. Copy data from the federated foreign table hellocsv in the data source to the data warehouse.

      -- Copy data from the federated foreign table to the data warehouse
      CREATE TABLE hellocsv_copy AS SELECT * FROM lakehouse47_3.es_dlf.hellocsv;
      
      -- Query the copied table data in the data warehouse
      SELECT * FROM hellocsv_copy;

      Sample result:

      +------------+------------+------------+------------+
      | col1       | col2       | col3       | col4       |
      +------------+------------+------------+------------+
      | 1          | hello      | test       | world      |
      +------------+------------+------------+------------+
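
Federated foreign tables can also be joined directly with internal tables for federated computing. The following is a minimal sketch that assumes a hypothetical internal dimension table named dim_codes:

-- Join a federated foreign table in the data lake with an internal data warehouse table
SELECT t.col1, d.description
FROM lakehouse47_3.es_dlf.hellocsv t
JOIN dim_codes d ON t.col2 = d.code;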

Create and use a Hive+HDFS federation

Hive is a common open-source data warehouse solution for big data. Its metadata is mostly stored in Hive Metastore (HMS), and its data is mostly stored on HDFS. MaxCompute supports creating this type of foreign server. By federating MaxCompute with Hive, you can enable the data warehouse to access and aggregate data from open-source big data systems.

Important
  • During the public preview, pay-as-you-go SQL federated computing jobs for Hive+HDFS mode are free of charge.

  • If the schema syntax switch has not been enabled for your current account at the tenant level, you must add the SET odps.namespace.schema=true; statement before the SQL statements to execute schema-related statements.

Step 1: Create a Hive+HDFS foreign server

  1. Log on to the MaxCompute console and select a region in the upper-left corner.

  2. In the left navigation pane, choose Tenant Management > Foreign Server.

  3. On the Foreign Server page, click Create Foreign Server.

  4. In the Add Foreign Server dialog box, configure the parameters as described in the following table.

    Parameter

    Description

    Foreign server type

    Select Hive+HDFS.

    Foreign server name

    Enter a custom name. The naming conventions are as follows:

    • It must start with a letter and can contain only lowercase letters, underscores (_), and digits.

    • It cannot exceed 128 characters in length.

    Foreign server description

    Enter a description as needed.

    Networklink object

    The name of the network connection. Select or create a connection from MaxCompute to the Alibaba Cloud E-MapReduce or Hadoop VPC network. For parameter details, see the Create a network connection between MaxCompute and the target VPC network step in VPC access solution (direct connection over a leased line).

    Note
    • For information about the basic concepts of network connections, see Networklink.

    • The VPC must be in the same region as the MaxCompute foreign server and the projects to which the foreign server is attached.

    Cluster name

    The name used to refer to the NameNode in a high-availability Hadoop cluster environment.

    To get the cluster name for an EMR cluster, follow these steps:

    1. Log on to the EMR console and click the target Cluster ID to go to the cluster details page.

    2. On the Cluster Services tab, click Configuration for the HDFS service to go to the Configuration page.

    3. Switch to the hdfs-site.xml tab and search for dfs.nameservices in the Configuration Item Name column. The value of this configuration item is the cluster name.

    NameNode address

    The service addresses and port numbers of the active and standby NameNodes of the target Hadoop cluster. The port number is usually 8020.

    To get the NameNode addresses for an EMR cluster, follow these steps:

    1. Log on to the EMR console and click the target Cluster ID to go to the cluster details page.

    2. On the Cluster Services tab, click the HDFS service to go to the Status page.

    3. In the Component List area, click the expand icon in front of NameNode to expand the topology list.

    4. Get the Internal IP corresponding to the master-1-1 node. The NameNode Address format is Internal IP:8020.

    HMS service address

    The addresses and port numbers of the active and standby Hive Metastore (HMS) services of the target Hadoop cluster. The port number is usually 9083.

    The format is Internal IP:9083.

    Authentication type

    Currently, only No authentication is supported.

    vSwitch

    MaxCompute accesses data sources through a VPC using the reverse access 2.0 solution by default. This solution requires configuring a vSwitch in a specific zone to establish the metadata access link. The available zones for vSwitches in each region are indicated in the interface prompts. You need to select an existing vSwitch or create a new one that meets the zone requirements in the VPC where the data source you want to access is located.

    Note

    Currently, this field is required only in the China (Shanghai) region.

    Additional foreign server properties

    Additional properties for the foreign server. After you specify them, tasks that use this foreign server can access the source system according to the behavior defined by the parameters.

    Note

    Supported parameters will be released gradually as the product evolves. For updates, see the official documentation.

  5. To create the foreign server, click Confirm.

Step 2: Create an external schema

An external schema is an object within a project and can be manipulated using SQL. If the schema syntax switch has not been enabled for your current account at the tenant level, you must add the SET odps.namespace.schema=true; statement before the SQL statements to execute schema-related commands.

The following command creates an external schema that references a Hive+HDFS foreign server:

CREATE EXTERNAL SCHEMA IF NOT EXISTS <external_schema_name>
WITH <hive_foreign_server_name>
ON '<database_name>';

The following list describes the parameters.

  • external_schema_name: The name of the external schema.

  • hive_foreign_server_name: The name of the foreign server you created.

  • database_name: The name of the Hive database.
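
For example, the following is a minimal sketch with hypothetical values: a foreign server named fs_hive that maps the Hive database named default.

CREATE EXTERNAL SCHEMA IF NOT EXISTS es_hive
WITH fs_hive
ON 'default';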

Step 3: Use SQL to access the data source system

  • Command format: List the table names from Hive.

    • Method 1:

      SHOW TABLES IN <external_schema_name>; --external_schema_name is the name of the external schema.
    • Method 2:

      USE SCHEMA <external_schema_name>; --external_schema_name is the name of the external schema.
      SHOW TABLES;
  • Usage examples:

    1. Query all table names under the external schema named es_hive3.

      USE SCHEMA es_hive3;
      SHOW TABLES;

      Sample result:

      ALIYUN$xxx@test.aliyunid.com:t1
    2. Query data from the t1 table in the es_hive3 schema of the lakehouse47_3 project.

      SELECT * FROM lakehouse47_3.es_hive3.t1;

      Sample result:

      +------------+
      | id         |
      +------------+
      | 1          |
      +------------+
    3. Copy data from the federated foreign table t1 in the data source to the data warehouse.

      -- Copy data from the federated foreign table to the data warehouse
      CREATE TABLE t1_copy AS SELECT * FROM lakehouse47_3.es_hive3.t1;
      
      -- Query the copied table data in the data warehouse
      SELECT * FROM t1_copy;

      Sample result:

      +------------+
      | id         |
      +------------+
      | 1          |
      +------------+

Create and use a Hologres federation

Hologres is a one-stop real-time data warehouse engine. It supports real-time writing, updating, and analysis of massive data; standard SQL (compatible with the PostgreSQL protocol); multidimensional analysis (OLAP) and ad hoc analysis of petabyte-scale data; and high-concurrency, low-latency online data services (Serving). Hologres is deeply integrated with MaxCompute and supports creating, analyzing, and querying OLAP models in a MaxCompute data warehouse. MaxCompute supports creating Hologres foreign servers. By federating with Hologres, you can implement the following scenarios.

  • Read data from a real-time data warehouse into a large-scale data warehouse for data archiving.

  • Read dimension data or data mart model data for contextual computing with fact tables from the operation data store (ODS), data warehouse detail (DWD), and data warehouse service (DWS) layers.

  • Read OLAP model data for high-performance, low-cost offline computing, and return the model results to the real-time data warehouse for analysis using Hologres external tables or remote function invocation.

Important

If the schema syntax switch has not been enabled for your current account at the tenant level, you must add the SET odps.namespace.schema=true; statement before the following SQL statements to execute schema-related statements.

Step 1: Create a Hologres foreign server

  1. Log on to the MaxCompute console and select a region in the upper-left corner.

  2. In the left navigation pane, choose Tenant Management > Foreign Server.

  3. On the Foreign Server page, click Create Foreign Server.

  4. In the Add Foreign Server dialog box, configure the parameters as described in the following table.

    Parameter

    Description

    Foreign server type

    Select Hologres.

    Foreign server name

    Enter a custom name. The naming conventions are as follows:

    • It must start with a letter and can contain only lowercase letters, underscores (_), and digits.

    • It cannot exceed 128 characters in length.

    Foreign server description

    Enter a description as needed.

    Connection method

    Currently, only classic network access (internal network) is supported.

    Host

    The host information of the Hologres instance.

    You can log on to the Hologres Management Console. In the navigation pane on the left, choose Instance List. Click the corresponding instance ID to get the host from the Network Information section of the Instance Details page.

    Example: hgpostcn-cn-3m***-cn-shanghai-internal.hologres.aliyuncs.com.

    Important

    Only access through a classic network domain name is supported. Accessing Hologres through a VPC network domain name is not supported.

    Port

    The port information of the Hologres instance.

    You can log on to the Hologres Management Console. In the navigation pane on the left, choose Instance List. Click the corresponding instance ID to get the port from the Network Information section of the Instance Details page. The port is generally 80.

    DBNAME

    The database name of the Hologres instance.

    Authentication and authorization

    • Alibaba Cloud RAM Role: Uses RAM role-based authentication and authorization, which supports accessing Hologres across different Alibaba Cloud accounts. Federated foreign tables currently only support the RAMRole method. For RoleARN, an example is acs:ram::uid:role/aliyunodpsholorole. For information about how to configure and authorize the aliyunodpsholorole role, see Create a Hologres external table (STS mode).

    • Task Executor Identity: A user identity recognition mode between MaxCompute and Hologres. It allows the current user to see tables and data they have access to in both systems based on the same account. You can directly execute commands using the CALL EXEC_EXTERNAL_QUERY function without setting additional authentication information.

      Important

      The ExecuteWithUserAuth mode does not currently support usage with federated foreign tables.

    RoleARN

    The ARN of the RAM role. This role must have permissions to access the Hologres service. This parameter is required when you select RAMRole as the authentication method.

    You can log on to the RAM console. In the navigation pane on the left, choose Identity Management > Roles. Click the name of the target RAM role to get its ARN from the Basic Information section.

    Additional foreign server properties

    Additional properties for the foreign server. After you specify them, tasks that use this foreign server can access the source system according to the behavior defined by the parameters.

    Note

    Supported parameters will be released gradually as the product evolves. For updates, see the official documentation.

  5. To create the foreign server, click Confirm.

Step 2: Create an external schema

An external schema is an object within a project and can be manipulated using SQL. If the schema syntax switch has not been enabled for your current account at the tenant level, you must add the SET odps.namespace.schema=true; statement before the SQL statements to execute schema-related commands.

The following command creates an external schema that references a Hologres foreign server:

CREATE EXTERNAL SCHEMA IF NOT EXISTS <external_schema_name>
WITH <holo_foreign_server_name>
ON '<holoschema_name>';

The following list describes the parameters.

  • external_schema_name: The name of the external schema.

  • holo_foreign_server_name: The name of the foreign server you created.

  • holoschema_name: The name of the Hologres schema to be mapped.
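
For example, the following is a minimal sketch with hypothetical values: a foreign server named fs_holo that maps the Hologres schema named public.

CREATE EXTERNAL SCHEMA IF NOT EXISTS es_holo
WITH fs_holo
ON 'public';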

Step 3: Use SQL to access the data source system

  • Command format: List the table names in the Hologres schema.

    • Method 1:

      SHOW TABLES IN <external_schema_name>; --external_schema_name is the name of the external schema.
    • Method 2:

      USE SCHEMA <external_schema_name>; --external_schema_name is the name of the external schema.
      SHOW TABLES;
  • Usage examples:

    1. Query all table names under the external schema named es_holo_rolearn_nonl:

      SET odps.namespace.schema=true;
      USE SCHEMA es_holo_rolearn_nonl;
      SHOW TABLES;

      Sample result:

      ALIYUN$xxx@test.aliyunid.com:mc_holo_external
    2. Query data from the mc_holo_external table in the es_holo_rolearn_nonl schema of the lakehouse47_3 project.

      SELECT * FROM lakehouse47_3.es_holo_rolearn_nonl.mc_holo_external;

      Sample result:

      +------------+------------+------------+------------+
      | col1       | col2       | col3       | col4       |
      +------------+------------+------------+------------+
      | 1          | hello      | test       | world      |
      +------------+------------+------------+------------+
    3. Copy data from the federated foreign table mc_holo_external in the data source to the data warehouse.

      -- Copy data from the federated foreign table to the data warehouse
      CREATE TABLE mc_holo_external_copy AS SELECT * FROM lakehouse47_3.es_holo_rolearn_nonl.mc_holo_external;
      
      -- Query the copied table data in the data warehouse
      SELECT * FROM mc_holo_external_copy;

      Sample result:

      +------------+------------+------------+------------+
      | col1       | col2       | col3       | col4       |
      +------------+------------+------------+------------+
      | 1          | hello      | test       | world      |
      +------------+------------+------------+------------+

Submit commands to Hologres using the current user's identity

MaxCompute supports using the CALL command to run the EXEC_EXTERNAL_QUERY function, which submits an executable SQL command to Hologres.

  • Command syntax

    CALL EXEC_EXTERNAL_QUERY (
      '<holo_ExecuteWithUserAuth_foreign_server_name>',
      r"###(
        <holo_query>)###");
    • holo_ExecuteWithUserAuth_foreign_server_name: The name of the Hologres foreign server created in ExecuteWithUserAuth mode.

    • r: Indicates that the command to execute is written as a raw string literal.

    • holo_query: The specific Hologres SQL command written between the delimiters.

    Note

    This function uses a delimiter syntax similar to C++ raw string literals: the command is wrapped as r"<delimiter>( <holo_query> )<delimiter>". This prevents special characters in holo_query from conflicting with the enclosing quotation marks and parentheses. You can choose your own delimiter string, but the opening and closing delimiter strings must match, and they cannot contain the characters <>. A recommended form is r"###(<holo_query>)###".

  • Usage examples

    • Example 1: In MaxCompute, submit an SQL statement to perform an INSERT OVERWRITE operation on the Hologres internal table public.current_user_test. The data source is a Hologres query that retrieves the current user's UID: SELECT current_user;

      1. Execute the following SQL in MaxCompute.

        CALL EXEC_EXTERNAL_QUERY (
          'fs_holo_ExecuteWithUserAuth_nonl_y',
          r"###(
            CALL hg_insert_overwrite(
              'public.current_user_test',
              $$SELECT current_user$$
        );)###");
      2. In Hologres, query the data of the internal table public.current_user_test.

        SELECT * FROM current_user_test;

        Result: Returns the UID of the user identity that is the same on both the MaxCompute and Hologres sides, such as 1117xxxxxx519.

    • Example 2: In MaxCompute, submit an SQL statement to perform an INSERT OVERWRITE operation on the Hologres internal table public.hologres_parent_insert1, with the partition sub-table being 2020. The data source is an external table in Hologres based on a MaxCompute table: SELECT * FROM mc_external_table WHERE a='2020';

      1. Execute the following SQL in MaxCompute.

        CALL EXEC_EXTERNAL_QUERY (
          'fs_holo_ExecuteWithUserAuth_nonl_y',
          r"###(
            CALL hg_insert_overwrite(
              'public.hologres_parent_insert1',
              '2020',
              $$SELECT * FROM mc_external_table WHERE a='2020'$$
        );)###");
      2. In Hologres, query the data of the internal table public.hologres_parent_insert1.

        -- Query the table in Hologres; the overwritten partition now contains the newly written row
        SELECT * FROM hologres_parent_insert1;

        Result:

        +------+---+----------------------------+---+
        | a    | b | c                          | d |
        +------+---+----------------------------+---+
        | 2020 | 1 | 2024-06-19 10:27:46.201636 | a |
        +------+---+----------------------------+---+

Configure permissions for foreign servers

A foreign server is a tenant-level resource in MaxCompute. A RAM user can perform operations on a foreign server only if a tenant administrator grants the required permissions in the Policies section of Permission Management in the RAM console. For more information about how to create a custom policy, see Create a custom policy in script editor mode.

Example: Use an Alibaba Cloud account to create a policy named ForeignServerTest and grant it to a RAM user. The following is a sample policy:

{
    "Version": "1",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "odps:CreateForeignServer",
            "Resource": "acs:odps:*:12xxxxxxxx07:foreignservers/*"
        }
    ]
}

Add permissions to query all foreign servers and retrieve information about a specific foreign server:

Note
  • To specify a NetworkLink when creating a foreign server (CreateForeignServer), you must add networklink to the Resource.

  • If you specify a RAM role, you need the ram:PassRole permission for that RAM role.

{
    "Version": "1",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "odps:CreateForeignServer",
                "odps:GetForeignServer",
                "odps:ListForeignServers"
            ],
            "Resource": "acs:odps:*:12xxxxxxxx07:foreignservers/*"
        }
    ]
}
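
For the ram:PassRole requirement mentioned in the preceding note, the following is a sketch of a separate policy statement, assuming a hypothetical account ID and role name:

{
    "Version": "1",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ram:PassRole",
            "Resource": "acs:ram::12xxxxxxxx07:role/aliyunodpsdefaultrole"
        }
    ]
}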

The following table describes other actions.

Action name

Description

CreateForeignServer

Create a foreign server.

UpdateForeignServer

Update a foreign server.

DeleteForeignServer

Delete a foreign server.

GetForeignServer

Get information about a specific foreign server.

ListForeignServers

Query all foreign servers.

After you grant the created policy to a RAM user, the user can perform the corresponding operations on the authorized foreign servers.

Project-level tenant resource access control

A foreign server is a tenant-level resource controlled by RAM policies. In contrast, project-level objects, such as tables, are controlled by the project administrator. Two security models manage how a tenant object is used by a project and how its permissions are reallocated within that project.

  • If project-level tenant resource access control is enabled, the creator can specify which projects can use the foreign server by setting up an attachment between the foreign server and the projects. The project administrator then uses a policy to grant users within the project permission to use the foreign server.

  • If project-level tenant resource access control is not enabled, any user who creates an external project or external schema can use this foreign server. Access to the external system is also based on the RAM role permissions specified by the creator in the foreign server. In addition to the foreign servers introduced in Data Lakehouse Solution 2.0, network connections, custom images, and quota groups are also tenant-level objects controlled by the project-level tenant resource access control switch.

Note

For more information about project-level tenant resource access control, see Project-level tenant resource access control. This feature is currently in preview and does not support enabling checks yet.

To use a foreign server and an external schema to access a federated foreign table with project-level tenant resource access control enabled, perform the following operations:

  1. Configure the attachment between the tenant object and the project. To do this, click the tenant object, such as a foreign server, select the project to attach it to, and confirm the configuration. The attached tenant object appears in the project's list of attached tenant objects.

  2. Configure a policy for the tenant object that is attached to the project. For information about how to configure a policy, see Policy-based access control.

    • Procedure

      1. In the Workspace section of the MaxCompute console, on the Project Management page, click Manage in the Actions column for the target project.

      2. On the Roles and Permissions tab, click Edit Role in the Actions column for the target role.

      3. In the Edit Role dialog box, set Authorization Method to Policy.

      4. You can modify the role policy in the script editor for Policy Authorization.

    • Example

      Taking a foreign server as an example, the following policy allows user 'a' to use the foreign server fs_hive:

      {
          "Statement":[
              {
                  "Action":[
                      "odps:Usage"
                  ],
                  "Effect":"Allow",
                  "Resource":[
                      "acs:odps:*:servers/fs_hive"
                  ]
              }
          ],
          "Version":"1"
      }

      When project-level tenant resource access control is enabled, you must first grant a user permission to use a tenant resource before you can control the usage permissions for that resource at the user or role level.

  3. Enable the tenant object check switch for the project (which enables project-level tenant resource access control).

    1. Log on to the MaxCompute console and select a region.

    2. In the navigation pane on the left, under Workspace, on the Project Management page, click Manage in the Actions column for the target project.

    3. On the Parameter Configuration tab, in the Permission Properties section, click Edit.

    4. Turn on the Enable Project-level Tenant Resource Access Control switch and click Submit.

    Important

    After you enable this feature, the project immediately verifies the permissions for all tenant objects that are in use or will be used, including foreign servers, network connections, custom images, and quota groups. Therefore, do not enable this switch until you have fully configured the attachments between tenant objects and the project, and granted the necessary policy permissions. If permissions are missing, tasks that depend on them may fail.

Grant permissions on federated foreign tables within an external schema

After an external schema is created, its tables are owned by the creator of the external schema. To grant permissions on the external schema or table operations to other users, perform the following operations.

Important
  • Because MaxCompute does not store persistent metadata from the data source in data lakehouse mode, access policies manage permissions based on the data source object's name. If an object's name changes, the authorization becomes invalid. You must revoke the permission by using the REVOKE command or by deleting the policy (see the sketch after this note). If you do not promptly remove the permission, a new object with the same name might inherit the old policy's permissions, which could grant the original user unintended access to the new object.

  • After tenant object authentication is enabled within a project, you must attach the current project to the foreign server before creating an external schema. If you do not attach it in advance, an error will occur when you create the external schema.
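
For the name-change case in the preceding note, the following is a minimal sketch of revoking a stale grant, using the hypothetical user and table names from the examples below:

REVOKE SELECT ON TABLE es_hive3.t1 FROM USER RAM$xxx@test.aliyunid.com:test_user;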

Assume the project for creating and managing the external schema is named test_lakehouse_project:

Operation

Required permissions

Example procedure

Create an external schema

  • CreateSchema permission on the project.

  • If project-level tenant resource access control is enabled, Usage permission on the foreign server.

  1. Specify the current project and grant CreateSchema permission on the project.

    -- Specify the current project
    use test_lakehouse_project;
    
    -- Grant CreateSchema permission on the project to a RAM user
    GRANT CreateSchema ON project test_lakehouse_project TO USER RAM$xxx@test.aliyunid.com:test_user;
  2. (Optional) If project-level tenant resource access control is enabled and the foreign server has been attached to the current project, grant permissions to the user within the project as follows:

    -- Add the RAM account system to the current project space
    ADD accountprovider ram;
    
    -- Add a RAM user
    ADD USER `RAM$xxx@test.aliyunid.com:test_user`;
    
    -- Create a project role
    CREATE role test_lakehouse_role;
    
    -- Add a policy to the project role to allow usage of the foreign server
    put policy D:\bin\allow.policy ON role test_lakehouse_role;
    
    -- Add the RAM user to the project role
    GRANT role test_lakehouse_role TO `RAM$xxx@test.aliyunid.com:test_user`;

    The content of the allow.policy file is as follows:

    {
      "Version": "1",
      "Statement": [{
        "Action": "odps:Usage",
        "Effect": "Allow",
        "Resource": ["acs:odps:*:servers/fs_hive"]
      }]
    }
  3. Create an external schema.

    CREATE EXTERNAL SCHEMA IF NOT EXISTS es_hive3
    WITH fs_hive
    ON 'default';

Query all external schemas

CreateInstance and List permissions on the project.

  1. Grant CreateInstance permission on the project to a RAM user.

    GRANT CreateInstance ON project test_lakehouse_project TO USER RAM$xxx@test.aliyunid.com:test_user;
  2. Grant List permission on the project.

    GRANT List ON project test_lakehouse_project TO USER RAM$xxx@test.aliyunid.com:test_user;
  3. View all schemas.

    SHOW schemas;

Query a specific external schema

Describe permission on the schema.

  1. Grant Describe permission on a schema to a RAM user.

    GRANT DESCRIBE ON SCHEMA es_hive3 TO USER RAM$xxx@test.aliyunid.com:test_user;
  2. Query a specific external schema.

    DESC SCHEMA es_hive3;

Modify external schema properties

Modifying external schema properties is not currently supported.

Delete a specific external schema

Drop permission on the schema.

  1. Grant Drop permission on a schema to a RAM user.

    GRANT DROP ON SCHEMA es_hive3 TO USER RAM$xxx@test.aliyunid.com:test_user;
  2. Delete a specific external schema.

    DROP SCHEMA es_hive3;

Use a specific external schema

  • Not applicable to standard three-layer model projects.

  • When a project is upgraded from a two-layer model to a three-layer model, Describe permission on the schema is required.

Specify the current external schema to use.

USE SCHEMA es_hive3;

If the project is upgraded from a two-layer model to a three-layer model, you must grant Describe permission on the specified schema.

GRANT DESCRIBE ON SCHEMA es_hive3 TO USER RAM$xxx@test.aliyunid.com:test_user;

Query a table in a specific external schema

Select permission on the table in the schema.

  1. Grant Select permission on a table in a schema to a RAM user.

    GRANT SELECT ON TABLE es_hive3.t1 TO USER RAM$xxx@test.aliyunid.com:test_user;
  2. Query a table in a specific external schema.

    SELECT * FROM es_hive3.t1;

Import data from a table in an external schema to an internal table

  • Select permission on the table in the schema.

  • CreateTable and CreateInstance permissions on the project.

  1. Grant CreateTable permission on the project to a RAM user.

    GRANT CreateTable ON project test_lakehouse_project TO USER RAM$xxx@test.aliyunid.com:test_user;
  2. Grant CreateInstance permission on the project.

    GRANT CreateInstance ON project test_lakehouse_project TO USER RAM$xxx@test.aliyunid.com:test_user;
  3. Grant Select permission on a specific table.

    GRANT SELECT ON TABLE es_hive3.t1 TO USER RAM$xxx@test.aliyunid.com:test_user;
  4. Import data from a table in an external schema to an internal table.

    CREATE TABLE default.t1_copy_ram3 AS SELECT * FROM es_hive3.t1;