
DataWorks:Supported data sources and synchronization solutions

Last Updated: Oct 16, 2025

DataWorks Data Integration enables seamless data synchronization across a wide range of data sources, including MySQL, MaxCompute, Hologres, and Kafka. It provides batch synchronization, real-time synchronization, and whole-database migration solutions for use cases such as batch ETL, real-time data replication with second-level latency, and whole-database migration.

Synchronization solutions

| Solution | Source | Destination | Latency | Use case |
| --- | --- | --- | --- | --- |
| Single-table sync (batch) | A single table | A single table or partition | Daily batch or periodic sync | Periodic full or incremental sync |
| Sharded database and table sync (batch) | Multiple tables sharing identical schema | A single table or partition | Daily or custom intervals | Periodic full or periodic incremental sync |
| Single-table sync (real-time) | A single table | A single table or partition | A few minutes or seconds | Real-time incremental (CDC) |
| Sharded database and table sync (real-time) | Multiple logical tables (aggregated from physical tables) | One or multiple tables | A few minutes or seconds | Full + real-time incremental (CDC) |
| Whole-database sync (batch) | An entire database or multiple tables | Multiple tables and their partitions | One-time or periodic | One-time or periodic full; one-time or periodic incremental; one-time full + periodic incremental |
| Whole-database sync (real-time) | An entire database or multiple tables | Multiple tables and their partitions | A few minutes or seconds | Full + real-time incremental (CDC) |
| Whole-database full and incremental sync (near real-time) | An entire database or multiple tables | Multiple tables and their partitions | Initial load: full batch; ongoing updates: daily incremental sync | Full + real-time incremental (CDC) |

Recommended synchronization solutions

Choose your data synchronization approach based on these key factors:

  1. Data freshness requirements: batch or real-time.

  2. Data scale and complexity: The number of tables to sync and their processing logic.

Based on these factors, we recommend two main categories of synchronization solutions: batch and real-time.
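The two factors above can be expressed as a small decision sketch. This is illustrative only: the function name and its inputs (`needs_realtime`, `table_count`) are hypothetical, and the returned strings mirror the solution names used in this article.

```python
# Illustrative sketch only: maps the two key factors (data freshness and
# data scale) to the solution categories recommended in this article.
def recommend_solution(needs_realtime: bool, table_count: int) -> str:
    """Pick a synchronization solution from freshness and scale."""
    if needs_realtime:
        # Real-time solutions require a CDC-capable source or a message queue.
        return ("Single-table sync (real-time)" if table_count == 1
                else "Whole-database sync (real-time)")
    # Batch: single-table tasks suit custom logic on a few tables;
    # whole-database tasks suit large numbers of homogeneous tables.
    return ("Single-table sync (batch)" if table_count == 1
            else "Whole-database sync (batch)")
```

For example, syncing 100 tables with daily freshness would map to `recommend_solution(False, 100)`, that is, "Whole-database sync (batch)".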

1. Batch synchronization solutions (daily batch or periodic sync)

These solutions suit use cases that do not require high data timeliness (for example, daily batches) and involve periodic batch processing.

Important

Incremental synchronization requires a field to track data changes, such as a timestamp column (last_modified) or auto-incrementing ID. Without such a field, run full sync periodically instead.
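As a sketch of what field-based incremental tracking looks like in practice, the following uses a hypothetical `src` table with the `last_modified` column mentioned above; SQLite stands in for the actual source database, and the function name is an assumption for illustration.

```python
import sqlite3

# Hypothetical sketch of watermark-based incremental extraction.
# "src" and its "last_modified" column follow the example in the note above.
def pull_increment(conn: sqlite3.Connection, watermark: str):
    """Return rows changed since the last sync, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, payload, last_modified FROM src "
        "WHERE last_modified > ? ORDER BY last_modified",
        (watermark,),
    ).fetchall()
    # Advance the watermark to the newest change seen in this run.
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark
```

Each run reads only rows newer than the stored watermark, which is exactly why a reliable change-tracking column is required: without it, there is no cheap predicate to isolate the increment, and a periodic full sync is the fallback.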

a. Select "Single-table sync (batch)"

Ideal for custom processing logic on a limited number of diverse data sources.

  • Core advantage: flexible processing logic.

    • Advanced transformations: Enables complex field mapping, filtering, enrichment, and AI-powered processing.

    • Heterogeneous source integration: The best choice for processing non-standard data sources like APIs and log files.

  • Core limitation: expensive to scale.

    • Configuration overhead: Managing individual tasks becomes costly at scale.

    • High resource consumption: Each task is scheduled independently. The resource consumption of syncing 100 independent tables is far greater than that of one whole-database task.

See also: Single-table batch synchronization tasks

b. Select "Whole-database sync (batch)"

Efficiently migrate large volumes of homogeneous tables between systems.

  • Core advantages: High operational efficiency and low cost.

    • Efficient: Configure hundreds of tables at once with automatic object matching, greatly improving development efficiency.

    • Cost-effective: Resources are optimized through unified scheduling, resulting in extremely low costs (for example, one whole-database task may consume 2 CUs versus 100 CUs for equivalent single-table tasks).

    • Typical scenarios: Building the ODS layer of a data warehouse, periodic database backups, and data cloud migration.

  • Core limitation: Simple processing logic.

    • Primarily designed for replication and does not support complex transformation logic for individual tables.

Recommended solution: Offline whole-database synchronization tasks.

2. Real-time synchronization solutions (sub-minute latency)

Real-time solutions are suitable for applications that require capturing real-time data changes (inserts, deletes, updates) from the source to support real-time analytics and fast decision-making.

Important

The source must support Change Data Capture (CDC) or be a message queue. For example, MySQL requires binary logging (binlog) to be enabled, while Kafka functions as a native message queue.
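A quick way to verify the MySQL prerequisite is to query the `log_bin` server variable. The helper below is a hypothetical sketch, not part of DataWorks; it accepts any DB-API-style cursor connected to the source.

```python
# Hypothetical helper: checks whether a MySQL source has binary logging
# enabled, a prerequisite for CDC-based real-time synchronization.
# "cursor" can be any DB-API cursor (e.g. from mysql-connector or PyMySQL).
def binlog_enabled(cursor) -> bool:
    cursor.execute("SHOW VARIABLES LIKE 'log_bin'")
    row = cursor.fetchone()          # e.g. ('log_bin', 'ON') or None
    return bool(row) and row[1] == "ON"
```

If this returns `False`, the source cannot feed a real-time solution until binary logging is turned on.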

Select "Single-table sync (real-time)" or "Whole-database sync (real-time)"

  • Single-table real-time: Suitable for cases requiring complex processing of real-time change streams from a single table.

  • Whole-database real-time: The standard solution for building real-time data warehouses and data lakes and for implementing real-time database disaster recovery. It offers significant advantages in efficiency and cost-effectiveness.

Recommended solutions: Real-time single-table synchronization task; Data Integration-side synchronization task.

3. Special case: syncing real-time data to append-only tables

Important

Real-time synchronization captures CDC events including inserts, updates, and deletes. For append-only storage systems, such as MaxCompute non-Delta tables, which do not natively support physical Update and Delete operations, writing a raw CDC stream directly results in data inconsistencies (for example, delete operations are ignored).

  • DataWorks solution: Base + Log tables

    • This solution resolves the issue by creating a Base table (full snapshot) and a Log table (incremental changes) at the destination.

    • Write method: CDC data streams to the Log table in real time. Daily, the system automatically schedules a task to merge the changes from the Log table into the Base table, generating an up-to-date full snapshot. This approach ensures that changes are written to the incremental table within minutes and merged into the Base table daily.

Recommended solution: Synchronize full and incremental data in a database to MaxCompute in quasi real time.
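The daily merge described above can be sketched as follows. This is a simplified, hypothetical model: keys, operations, and values are abstract, and the actual service additionally handles event ordering, partitions, and schema changes.

```python
# Hypothetical sketch of the daily Base + Log merge for append-only storage.
# Each Log record is (key, op, value), ordered by event time; the latest
# record per key wins, so deletes are honored at merge time.
def merge_base_and_log(base: dict, log: list) -> dict:
    """Apply ordered CDC events from the Log table to the Base snapshot."""
    snapshot = dict(base)                # start from yesterday's full snapshot
    for key, op, value in log:
        if op == "delete":
            snapshot.pop(key, None)      # physical delete applied here
        else:                            # "insert" or "update"
            snapshot[key] = value
    return snapshot                      # today's up-to-date full snapshot
```

Because the Log table only ever appends events, the destination never needs in-place updates; correctness is restored once per day when the merge produces a new Base snapshot.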

Data source read/write capabilities

| Data source | Single-table sync (batch) | Single-table sync (real-time) | Whole-database sync (batch) | Whole-database sync (real-time) | Whole-database full and incremental (near real-time) |
| --- | --- | --- | --- | --- | --- |
| Amazon S3 | Read/Write | - | - | - | - |
| Amazon Redshift | Read/Write | - | - | - | - |
| AnalyticDB for MySQL 2.0 | Read/Write | - | - | - | - |
| AnalyticDB for MySQL 3.0 | Read/Write | Write | Read | Write | - |
| AnalyticDB for PostgreSQL | Read/Write | - | Read | - | - |
| ApsaraDB for OceanBase | Read/Write | - | - | Read | Read |
| Azure Blob Storage | Read | - | - | - | - |
| BigQuery | Read | - | - | - | - |
| ClickHouse | Read/Write | - | - | - | - |
| DataHub | Read/Write | Read/Write | - | Write | - |
| DLF | Write | Write | - | Write | - |
| Db2 | Read/Write | - | Read | - | - |
| Doris | Read/Write | Write | - | - | - |
| DM | Read/Write | - | Read | - | - |
| DRDS (PolarDB-X 1.0) | Read/Write | - | Read | - | - |
| Elasticsearch | Read/Write | Write | Write | Write | - |
| FTP | Read/Write | - | - | - | - |
| GBase8a | Read/Write | - | - | - | - |
| HBase | HBase: Read/Write; HBase 2.0.x SQL: Read; HBase 1.1.x SQL: Write | - | - | - | - |
| HDFS | Read/Write | - | - | - | - |
| Hive | Read/Write | - | Write | - | - |
| Hologres | Read/Write | Read/Write | Read/Write | Write | - |
| HttpFile | Read | - | - | - | - |
| Kafka | Read/Write | Read/Write | - | Write | - |
| KingbaseES | Read/Write | - | - | - | - |
| Lindorm | Read/Write | - | - | - | - |
| Simple Log Service | Read/Write | Read | - | - | - |
| MaxCompute | Read/Write | Write | Write | - | Write |
| MariaDB | Read/Write | - | - | - | - |
| Maxgraph | Write | - | - | - | - |
| Memcache | Write | - | - | - | - |
| MetaQ | Read | - | - | - | - |
| Milvus | Write | - | - | - | - |
| MongoDB | Read/Write | - | - | Read | - |
| MySQL | Read/Write | Read | Read | Read | Read |
| OpenSearch | Write | - | - | - | - |
| Oracle | Read/Write | Read | Read | Read | Read |
| OSS | Read/Write | Write | Write | - | - |
| OSS-HDFS | Read/Write | Write | - | - | - |
| PolarDB | Read/Write | Read | Read | Read | Read |
| PolarDB-X 2.0 | Read/Write | - | Read | Read | - |
| PostgreSQL | Read/Write | - | Read | Read | - |
| Redis | Write | - | - | - | - |
| RestAPI (HTTP) | Read/Write | - | - | - | - |
| Salesforce | Read/Write | - | - | - | - |
| SAP HANA | Read/Write | - | - | - | - |
| Sensors Data | Write | - | - | - | - |
| StarRocks | Read/Write | Write | Write | - | - |
| SQL Server | Read/Write | - | Read | - | - |
| Tablestore | Read/Write | Write | - | - | - |
| Tablestore Stream | Read/Write | - | - | - | - |
| TiDB | Read/Write | - | - | - | - |
| TSDB | Write | - | - | - | - |
| Vertica | Read/Write | - | - | - | - |
| TOS | Read | - | - | - | - |


References

The following Data Integration documents help you get started quickly.