Creates a user-defined function (UDF) in a MaxCompute project.
Prerequisites
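Before you register a UDF, you must add the resources that the UDF requires to the MaxCompute project. For a JAR package, run the ADD JAR <localfile> [COMMENT '<comment>'][-f]; command. Other resource types, such as Python scripts, are added with the corresponding ADD command (for example, ADD PY). For more information, see ADD JAR.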
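If you script resource uploads, it can help to generate the statement text programmatically. The following Python sketch only composes an ADD JAR statement that follows the syntax above. The function name build_add_jar is hypothetical, it does not connect to MaxCompute, and rejecting single quotation marks in the comment is a simplifying assumption rather than documented escaping behavior. See ADD JAR for the exact meaning of the -f flag.

```python
from typing import Optional


def build_add_jar(local_file: str, comment: Optional[str] = None,
                  force: bool = False) -> str:
    """Compose text that follows: ADD JAR <localfile> [COMMENT '<comment>'][-f];

    Hypothetical helper: it only builds the statement string and does not
    contact MaxCompute.
    """
    if not local_file:
        raise ValueError("local_file is required")
    parts = ["ADD JAR", local_file]
    if comment is not None:
        # Assumption: reject quotes instead of guessing MaxCompute's escaping rules.
        if "'" in comment:
            raise ValueError("comment must not contain single quotation marks")
        parts.append(f"COMMENT '{comment}'")
    if force:
        parts.append("-f")
    return " ".join(parts) + ";"


print(build_add_jar("my_lower.jar", comment="lower-case UDF", force=True))
# ADD JAR my_lower.jar COMMENT 'lower-case UDF' -f;
```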
Limits
Function names must be unique in a project. You cannot create a function that has the same name as an existing function in the project.
By default, UDFs cannot overwrite built-in functions of MaxCompute. Only the project owner can create a UDF that overwrites a built-in function. If you use a UDF that overwrites a built-in function, a warning is displayed in the Summary section of the job's Logview after the SQL statement is executed.
Syntax
CREATE FUNCTION <function_name> AS <'package_to_class'> USING <'resource_list'>;
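The following minimal Python sketch assembles a CREATE FUNCTION statement from its three parameters, enclosing the class name and the comma-separated resource list in single quotation marks. The helper build_create_function is hypothetical and only produces text. Refusing quotation marks and commas inside names is an assumption made to keep the sketch simple, not a documented MaxCompute rule.

```python
from typing import Sequence


def build_create_function(function_name: str, class_name: str,
                          resources: Sequence[str]) -> str:
    """Compose: CREATE FUNCTION <function_name> AS '<class>' USING '<resources>';"""
    if not function_name or not class_name:
        raise ValueError("function_name and class_name are required")
    if not resources:
        raise ValueError("at least one resource is required")
    # Assumption: reject characters that would break the quoting instead of escaping them.
    for value in (class_name, *resources):
        if "'" in value:
            raise ValueError(f"single quotation mark not allowed: {value!r}")
    if any("," in name for name in resources):
        raise ValueError("resource names must not contain commas")
    # Resource names are separated by commas inside one quoted string.
    resource_list = ",".join(resources)
    return (f"CREATE FUNCTION {function_name} AS '{class_name}' "
            f"USING '{resource_list}';")


print(build_create_function("my_lower", "org.alidata.odps.udf.examples.Lower",
                            ["my_lower.jar"]))
# CREATE FUNCTION my_lower AS 'org.alidata.odps.udf.examples.Lower' USING 'my_lower.jar';
```

With the values from Example 1, the sketch produces the same statement that is shown in that example.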
Parameters
function_name: required. The name of the UDF that you want to create.
package_to_class: required. The class of the UDF that you want to create. This parameter is case-sensitive and must be enclosed in single quotation marks (').
For a Java UDF, specify this name as a fully qualified class name from the top-level package name to the UDF class name.
For a Python UDF, specify this name in the <Python script name>.<Class name> format.
Note: The Python script name refers to the underlying resource name, which uniquely identifies the resource. For example, if you upload a resource as pyudf_test.py and then rename it to PYUDF_TEST.py in DataStudio or overwrite it by using the MaxCompute client, the underlying resource name remains pyudf_test.py. Therefore, when you register the UDF, the class name must be pyudf_test.SampleUDF, where SampleUDF is the Python class and the .py extension is omitted. You can run the LIST RESOURCES; command to view the underlying names of all resources.
resource_list: required. The list of resources used by the UDF.
The resource list must include the resources that contain the UDF code. Make sure that the resources are uploaded to MaxCompute.
If the code calls the Distributed Cache API to read resource files, the resource list must also include those resource files.
The resource list consists of one or more resource names separated by commas (,), and the entire list must be enclosed in single quotation marks (').
To specify a resource from a different project, use the <project_name>/resources/<resource_name> format.
Note: If the schema feature is enabled and you need to use resources from other projects, see Work with objects in a schema.
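The naming rules for package_to_class and resource_list can be sketched as two small Python helpers. Both names, python_udf_class and resource_ref, are hypothetical. python_udf_class assumes that you pass the underlying resource name reported by LIST RESOURCES;, not a display name, and resource_ref covers only the <project_name>/resources/<resource_name> format, not the schema-enabled case.

```python
from typing import Optional


def python_udf_class(resource_name: str, class_name: str) -> str:
    """Build '<Python script name>.<Class name>' for package_to_class.

    resource_name must be the underlying resource name (as listed by
    LIST RESOURCES;), not a name displayed after a rename in DataStudio.
    """
    suffix = ".py"
    if not resource_name.endswith(suffix):
        raise ValueError(f"expected a .py resource name, got {resource_name!r}")
    script_name = resource_name[: -len(suffix)]
    # The value is case-sensitive, so the names are used exactly as given.
    return f"{script_name}.{class_name}"


def resource_ref(resource_name: str, project: Optional[str] = None) -> str:
    """Build one resource_list entry, qualified when it lives in another project."""
    if project is None:
        return resource_name
    return f"{project}/resources/{resource_name}"


print(python_udf_class("pyudf_test.py", "SampleUDF"))  # pyudf_test.SampleUDF
print(resource_ref("pyudf_test.py", "test_project"))   # test_project/resources/pyudf_test.py
```

The two calls reproduce the class name pyudf_test.SampleUDF from the note on Python script names and the cross-project resource reference used in Example 2.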
Examples
Example 1: Create the my_lower function. The Java UDF class org.alidata.odps.udf.examples.Lower is in my_lower.jar.
CREATE FUNCTION my_lower AS 'org.alidata.odps.udf.examples.Lower' USING 'my_lower.jar';
Example 2: Create the my_lower function. The Python UDF class MyLower is in the pyudf_test.py script, which is a resource of the test_project project.
CREATE FUNCTION my_lower AS 'pyudf_test.MyLower' USING 'test_project/resources/pyudf_test.py';
Example 3: Create the test_udtf function. The Java UDF class com.aliyun.odps.examples.udf.UDTFResource is in udtfexample1.jar, and the function also depends on the file resource file_resource.txt, the table resource table_resource1, and the archive resource test_archive.zip.
CREATE FUNCTION test_udtf AS 'com.aliyun.odps.examples.udf.UDTFResource' USING 'udtfexample1.jar, file_resource.txt, table_resource1, test_archive.zip';
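The contents of the pyudf_test.py script used in Example 2 are not shown in this topic. As a hedged illustration, the following is one way the MyLower class could be written, assuming that it converts a string to lowercase as its name suggests. The annotate decorator from odps.udf with a "string->string" signature and the evaluate method reflect the MaxCompute Python UDF convention; check the exact signature syntax in the Python UDF documentation. The ImportError fallback is a local-testing shim only, so that the class can be exercised without the MaxCompute runtime.

```python
# Hypothetical contents of pyudf_test.py (not shown in the original topic).
try:
    # Decorator provided by the MaxCompute Python UDF runtime.
    from odps.udf import annotate
except ImportError:
    # Local-testing shim only: a no-op stand-in that returns the class unchanged.
    # It does not validate the signature string.
    def annotate(signature):
        def decorator(cls):
            return cls
        return decorator


@annotate("string->string")
class MyLower(object):
    """Assumed behavior: return the lowercase form of the input string."""

    def evaluate(self, arg0):
        # Propagate NULL input as NULL output.
        if arg0 is None:
            return None
        return arg0.lower()


print(MyLower().evaluate("Hello MaxCompute"))  # hello maxcompute
```

Because the underlying script name is pyudf_test and the class is MyLower, the class reference used in Example 2 is 'pyudf_test.MyLower'.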
Related statements
FUNCTION: If you do not need to store SQL functions in the metadata system of MaxCompute, you can create temporary SQL functions. These functions apply only to the current SQL script.
DELETE FUNCTION: Deletes a function. You can delete a UDF by calling the delete_function() method of a MaxCompute entry object.
DROP FUNCTION: Deletes an existing UDF from a MaxCompute project.
DESC FUNCTION: Displays information about a specified UDF in a MaxCompute project, including the name, owner, creation time, class name, and resource list of the UDF.
LIST FUNCTIONS: Displays information about all UDFs in a MaxCompute project.
UPDATE FUNCTION: Updates a function. You can update a UDF by calling the update method of MaxCompute.