All Products
Search
Document Center

DataWorks:Find IP geolocation by using MaxCompute UDFs

Last Updated:Sep 24, 2025

MaxCompute cannot directly call external API operations. This tutorial provides a complete case study that shows you how to use an offline IP database and a user-defined function (UDF) to efficiently analyze IP address geolocations in MaxCompute.

Background

MaxCompute is a closed computing environment and cannot directly call external API operations via HTTP, such as the Taobao IP database API, to query IP geolocations in real time. The following three solutions can be used to analyze the geolocation of IP addresses in MaxCompute.

Solution

Implementation

Pros

Cons

Conclusion

Export data for query

Export the IP addresses to be analyzed from MaxCompute to a local device, and then use a script to cyclically call API operations for queries.

Simple

Low efficiency. Subject to API rate limits (such as 10 QPS), making it unsuitable for large-scale data.

Not recommended

Query a local database

Download both the IP database and the IP addresses to be analyzed to a local device, and then perform matching using a local program.

No API rate limits

The data is separated from the data warehouse and cannot be associated with other business data (such as order tables) in MaxCompute for analysis.

Not recommended

Import the IP database for query

Upload the entire IP address database to a MaxCompute table, and then use a UDF in SQL to perform queries.

  • Efficient: Fully utilizes MaxCompute's parallel computing capabilities.

  • Integrated: Can be directly associated with any MaxCompute table.

  • Scalable: Suitable for any data scale.

Requires regular manual updates to the IP database.

Recommended

Conclusion: The third solution is the only one that can meet the demands for high-performance and highly integrated analysis in big data use cases. This tutorial details how to implement this solution.

Solution overview

This solution consists of three main modules:

  1. Data preparation: Upload a public IP address database file to a MaxCompute table as a reference.

  2. UDF development: Create a UDF to convert string IP addresses into integers. This is essential for efficient range matching.

  3. SQL analysis: Write SQL statements that call the UDF to look up IP addresses in the MaxCompute table and obtain their geolocations.

Preparations

  1. Create a workspace that has not opted in to the Data Studio public preview. This tutorial uses a DataWorks workspace in Basic mode.

  2. Bind a serverless resource group to the workspace.

  3. Bind a MaxCompute computing resource to the workspace and test its network connectivity.

  4. In the target workspace, create a business flow.

Procedure

Prepare an IP address database

  • Sample database: ipdata.csv.

  • Sample data structure:

    • The data format is UTF-8.

    • The first four data fields are the start and end addresses of the IP range. The first two are in decimal integer format, and the next two are in dotted-decimal format.

Upload the IP address database to MaxCompute

  1. Create a table.

    1. Go to the DataStudio page.

      Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and O&M > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.

    2. On the DataStudio page, right-click the target business flow and choose Create Node > MaxCompute > ODPS SQL.

    3. In the dialog box, select the computing resource you prepared for Engine Instance, enter a name, and click Confirm.

    4. On the node editor page, enter the following code:

      -- If a table named ipresource already exists, delete it first.
      DROP TABLE IF EXISTS ipresource;
      -- Create a table to store IP addresses.
      CREATE TABLE IF NOT EXISTS ipresource 
      (
          start_ip     BIGINT,   -- Start address of the IP range (in decimal integer format).
          end_ip       BIGINT,   -- End address of the IP range (in decimal integer format).
          start_ip_arg STRING,   -- Start address of the IP range (in dotted-decimal string format, such as "1.0.1.x").
          end_ip_arg   STRING,   -- End address of the IP range (in dotted-decimal string format, such as "1.0.3.x").
          country      STRING,   -- Country name.
          area         STRING,   -- Area name (usually empty or same as the country).
          city         STRING,   -- City name (may be a province for some data).
          county       STRING,   -- District/county name.
          isp          STRING    -- Internet service provider (such as China Telecom, China Unicom, and China Mobile).
      );
    5. Click the ** icon to run the code. In the Parameters dialog box, select the serverless resource group you bound and click Run. Wait for Estimate MaxCompute Computing Cost to complete, then click Run.

  1. Upload data to the table.

    1. Click the image icon in the upper-left corner. In the displayed page, click All Products > Data Integration > Upload and Download.

    2. Click the image icon in the left navigation pane.

    3. Click Upload Data. Configure the parameters as described in the following table.

      Parameter

      Description

      Specify Data to Be Uploaded

      Select File

      Upload the ipdata.csv file.

      Configure Destination Table

      Compute Engine

      MaxCompute

      MaxComputeProject Name

      Select the taget MaxCompute project.

      Destination Table

      Select the ipresource table.

      Important

      If the newly created table is not in the list, go to Data Map > My Data > Refresh Table Metadata and manually refresh the metadata for odps.<project_name>.ipresource.

      Resource Group

      Select the serverless resource group you prepared.

      Preview Data of Uploaded File

      Click Mapping by Order to map the data in the .csv file to the fields of the ipresource table.

    4. Click Upload Data.

  2. Verify that the data is uploaded.

    Return to the ODPS SQL node editor on the DataStudio page. Execute the following SQL query. If you can see the number of data rows and sample data, the upload was successful.

    -- Query the number of data rows in the table.
    SELECT COUNT(*) FROM ipresource;
    -- Preview the first 10 rows of data.
    SELECT * FROM ipresource limit 10;

Develop a UDF

To look up the geolocation of an IP address in SQL, you need a UDF that converts a dotted-decimal IP string (such as '1.0.2xx.4x') into a comparable decimal integer.

  1. Create a MaxCompute Python resource.

    1. On the DataStudio page, right-click the target business flow and choose Create Resource > MaxCompute > Python.

    2. In the dialog box, enter a name, such as mc.py. Select Upload to MaxCompute and click Create.

    3. On the node editor page, enter the following code:

      from functools import reduce
      from odps.udf import annotate
      
      @annotate("string->bigint")
      class IPtoInt(object):
          def evaluate(self, ip):
              try:
                  return reduce(lambda x, y: (x << 8) + y, map(int, ip.split('.')))
              except:
                  return 0
    4. Click the image.png icon.

  2. Create a MaxCompute UDF.

    1. On the DataStudio page, right-click the target business flow and choose Create Function > MaxCompute > Function.

    2. In the dialog box, enter a name, such as ip2int, and click Create. This will be the UDF name and cannot be changed after creation.

    3. On the editor page, configure the parameters. Key parameters are described below:

      For more parameter descriptions, see Register a MaxCompute function.

      Parameter

      Description

      Function Type

      Keep the default setting Other Function.

      Class Name

      The main class that implements the UDF. Example value: mc.IPtoInt.

      Note

      When the resource type is Python, the class name format is Python resource name.Class name (the .py in the resource name is not required).

      Resources

      Select the resource file mc.py from the previous step.

    4. Click the image icon to save the function.

    5. Click the image.png icon. In the dialog box, enter the change description and click Confirm.

Find IP geolocation by using SQL

  1. Return to the ODPS SQL node editor. Execute the following SQL query:

    -- Replace this with the actual IP address you want to query.
    SELECT * FROM ipresource
    WHERE ip2int('1.0.2xx.4x') >= start_ip
    AND ip2int('1.0.2xx.4x') <= end_ip;
  2. Click the ** icon to run the code. In the Parameters dialog box, select the serverless resource group you bound and click Run. Wait for Estimate MaxCompute Computing Cost to complete, then click Run.

  3. View the execution result. You can find the location of the IP address.

References