CN115964050A - Method and system for realizing user-defined function - Google Patents

Publication number
CN115964050A
Authority
CN
China
Prior art keywords
udf
source code
jvm
user
sql
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211669162.5A
Other languages
Chinese (zh)
Inventor
张凤
周成祖
魏超
朱海勇
王杰诚
苏海峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Meiya Pico Information Co Ltd
Original Assignee
Xiamen Meiya Pico Information Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Meiya Pico Information Co Ltd
Priority to CN202211669162.5A
Publication of CN115964050A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Devices For Executing Special Programs (AREA)

Abstract

The method comprises: a system service interface receiving and parsing a Flink SQL task and UDF request parameters submitted by a user; in response to a UDF request parameter providing only a UDF unique identifier, looking up the UDF unique identifier in a system cache; in response to UDF source code being provided in the UDF request parameter, obtaining the interpreter or compiler for the development language of the UDF source code, dynamically converting one or more pieces of UDF source code written in different languages from text form into JVM objects, and registering the UDF functions pointed to by the JVM objects in the Flink SQL runtime environment; and, in response to successful execution of the UDF-related Flink SQL task, persistently storing the UDF source code related information and writing the UDF unique identifier into the system cache. The method and system reduce the remote connection overhead of persistent storage, avoid repeated compilation of UDFs, speed up debugging, and enable collaborative development between developers in different technical fields and ordinary users.

Description

Method and system for realizing user-defined function
Technical Field
The invention relates to the technical field of computer software, in particular to a method and a system for realizing a user-defined function.
Background
A UDF (User-Defined Function) accepts parameters, performs an operation, and returns the result of the operation. In the broad sense, UDF covers three kinds of functions: the user-defined scalar function (UDF), the user-defined table-valued function (UDTF), and the user-defined aggregate function (UDAF). In the narrow sense, UDF denotes only the user-defined scalar function.
In big data service scenarios, Flink is currently the most widely used distributed computing component. It lets users issue computing requests in SQL, which lowers the development and usage threshold for people in different technical fields. When the built-in functions cannot support complex or personalized services, a user can write custom code logic, modeled on the implementation of the built-in Flink SQL functions, to create a user-defined function that extends the query capability of Flink SQL and meets the needs of diversified big data computation and service analysis.
Before a UDF can be formally used, developers usually have to write the UDF source program in an offline editor, in a language such as Java, Scala, or Python, according to the service requirements, compile and package it into a JAR package, and upload the JAR package together with the corresponding Flink job to a distributed cluster for submission and execution. If the job fails, the program must again go through modification, compilation and packaging, and job submission. The steps are cumbersome and debugging is difficult, the need for developers in different technical fields to develop collaboratively with ordinary users cannot be met, and users' ability to debug and use UDFs is severely limited.
One existing solution lets a UDF be edited in a standalone visual component, stores the compiled UDF file in a database, and then has the component submit the unique file identifier of the UDF to the cluster as a parameter of the target task. Another solution allows UDF source code to be submitted online and compiled at runtime, but it tends to recompile UDFs that have already been verified, and it cannot simultaneously support online mixed programming in common big data development languages such as Java, Scala, and Python, so it cannot meet the needs of ordinary users and professional technicians for collaborative development and debugging.
Disclosure of Invention
To solve the above technical problems in the prior art, the invention provides a method and a system for implementing a user-defined function.
According to one aspect of the present invention, a method for implementing a user-defined function is provided, including:
S1: a system service interface receives and parses a Flink SQL task and UDF request parameters submitted by a user;
S2: in response to a UDF request parameter submitted by the user providing only a UDF unique identifier, looking up the UDF unique identifier in a system cache;
in response to UDF source code being provided in the UDF request parameter, obtaining the interpreter or compiler for the development language of the UDF source code, dynamically converting one or more pieces of UDF source code written in different languages from text form into JVM objects, and registering the UDF functions pointed to by the JVM objects in the Flink SQL runtime environment;
S3: in response to successful execution of the UDF-related Flink SQL task, persistently storing the UDF source code related information, and writing the UDF unique identifier into the system cache.
In some specific embodiments, S2 specifically includes:
in response to the UDF unique identifier existing in the system cache, continuing with S3;
in response to the UDF unique identifier not existing in the system cache, acquiring the corresponding UDF source code, writing the UDF unique identifier into a system global cache, and continuing with S3.
In some specific embodiments, obtaining the interpreter or compiler for the development language of the UDF source code and dynamically parsing and compiling the UDF source code at runtime specifically includes:
in response to the UDF source code being Python, dynamically parsing the UDF source code into a JVM object through Jython;
in response to the UDF source code being Scala, dynamically parsing the UDF source code into a JVM object through the Scala Toolbox;
and in response to the UDF source code being Java, dynamically compiling the UDF source code through JavaCompiler, and loading the generated Java bytecode into the JVM with a class loader for execution.
In some specific embodiments, executing the UDF-related Flink SQL task in S3 specifically includes the following steps:
in response to the UDF function already existing in the system cache, skipping compilation and registration;
in response to the UDF function being a JVM object produced by an interpreter or compiler, registering the UDF function pointed to by the JVM object in the Flink SQL runtime environment, wherein during registration the UDF unique identifier is used as the UDF function name so that users can reuse it in the Flink SQL runtime environment;
running the SQL job submitted by the user in the Flink SQL runtime environment, wherein the SQL includes one or more UDF functions implemented in different development languages.
In some specific embodiments, persistently storing the UDF source code related information and writing the UDF unique identifier into the system cache in S3 specifically includes the following steps:
in response to UDF source code being provided in the UDF parameters, overwriting the same-name UDF function in the current Flink SQL runtime environment or in persistent storage, dynamically parsing or compiling the UDF source code at runtime, and registering the compiled JVM object in the Flink SQL runtime environment;
in response to no UDF source code being provided in the UDF parameters and the UDF unique identifier not existing in the system cache, acquiring the UDF source code from persistent storage, dynamically parsing or compiling it at runtime, and registering the compiled JVM object in the Flink SQL runtime environment;
submitting and running the Flink SQL task and, in response to successful execution of the Flink SQL task, persistently storing the UDF source code related information and writing the UDF unique identifier into the system cache.
In some specific embodiments, the UDF parameters include a UDF unique identification, a UDF development language, and a UDF source code.
According to a second aspect of the invention, a computer-readable storage medium is proposed, on which one or more computer programs are stored, which when executed by a computer processor implement the method of any of the above.
According to a third aspect of the present invention, a system for implementing a user-defined function is provided, where the system includes:
the parameter parsing module is configured to receive and parse a Flink SQL task and UDF request parameters submitted by a user;
the compiling module is configured to dynamically parse and compile, at runtime, UDF source code written in different development languages through the corresponding interpreters or compilers, generating objects that can run in the JVM environment;
the task execution module is configured to run the SQL job submitted by the user in the Flink SQL runtime environment, wherein the SQL includes one or more UDF functions implemented in different development languages;
and the storage module is configured to persistently store the UDF source code related information and write the UDF unique identifier into a system cache.
In some specific embodiments, the parameter parsing module receives a Flink SQL task request parameter submitted by a user, each task providing one or more UDF functions developed in different languages, and the parameter of each UDF function includes a UDF unique identifier, a UDF development language, and a UDF source code.
In some specific embodiments, after receiving the UDF source code parameter, the compiling module obtains the corresponding interpreter or compiler according to the UDF development language type and dynamically parses and compiles the UDF source code at runtime, which specifically includes:
in response to the UDF source code being Python, dynamically parsing the UDF source code into a JVM object through Jython;
in response to the UDF source code being Scala, dynamically parsing the UDF source code into a JVM object through the Scala Toolbox;
and in response to the UDF source code being Java, dynamically compiling the UDF source code through JavaCompiler, and loading the generated Java bytecode into the JVM with a class loader for execution.
In some specific embodiments, the task execution module is configured to perform the following:
in response to the UDF function already existing in the system cache, skipping compilation and registration;
in response to the UDF function being a JVM object produced by an interpreter or compiler, registering the UDF function pointed to by the JVM object in the Flink SQL runtime environment, wherein during registration the UDF unique identifier is used as the UDF function name so that users can reuse it in the Flink SQL runtime environment;
and running the SQL job submitted by the user in the Flink SQL runtime environment, wherein the SQL includes one or more UDF functions implemented in different development languages.
In some specific embodiments, the storage module improves UDF reuse and repeatable debugging through a system cache and persistent storage; the system cache is a JVM in-process cache, and the persistent storage is a document-based NoSQL database.
The method and system for implementing a user-defined function provided by the invention persistently store successfully debugged UDFs to improve UDF reuse, reduce the remote connection overhead of persistent storage through a system cache mechanism, avoid repeated compilation of UDFs, speed up debugging, and address collaborative development through multi-language online mixed programming.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments and together with the description serve to explain the principles of the invention. Other embodiments and many of the intended advantages of embodiments will be readily appreciated as they become better understood by reference to the following detailed description. Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 is a flow chart of a method for implementing a user-defined function according to an embodiment of the present application;
FIG. 2 is a block diagram of a system for implementing a user-defined function according to a specific embodiment of the present application;
FIG. 3 is a flow diagram of an implementation of a compiling module of a specific embodiment of the present application;
FIG. 4 is a flowchart of an implementation of a task execution module of a particular embodiment of the present application;
FIG. 5 is a flow diagram of an implementation of a storage module of a particular embodiment of the present application;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing an electronic device according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
FIG. 1 shows a flowchart of a method for implementing a user-defined function according to an embodiment of the present application. As shown in fig. 1, the method includes:
S101: the system service interface receives and parses the Flink SQL task and UDF request parameters submitted by the user.
S102: in response to a UDF request parameter submitted by the user providing only a UDF unique identifier, looking up the UDF unique identifier in a system cache;
in response to UDF source code being provided in the UDF request parameter, obtaining the interpreter or compiler for the development language of the UDF source code, dynamically converting one or more pieces of UDF source code written in different languages from text form into JVM objects, and registering the UDF functions pointed to by the JVM objects in the Flink SQL runtime environment.
In a specific embodiment, in response to the UDF unique identifier existing in the system cache, execution continues with S103; in response to the UDF unique identifier not existing, the corresponding UDF source code is acquired, the UDF unique identifier is written into a system global cache, and execution continues with S103.
In a specific embodiment, the interpreter or compiler for the development language of the UDF source code is obtained, and the UDF source code is dynamically parsed and compiled at runtime: in response to the UDF source code being Python, the UDF source code is dynamically parsed into a JVM object through Jython; in response to the UDF source code being Scala, the UDF source code is dynamically parsed into a JVM object through the Scala Toolbox; and in response to the UDF source code being Java, the UDF source code is dynamically compiled through JavaCompiler, and the generated Java bytecode is loaded into the JVM with a class loader for execution.
S103: in response to successful execution of the UDF-related Flink SQL task, persistently storing the UDF source code related information, and writing the UDF unique identifier into the system cache.
In a specific embodiment, executing the UDF-related Flink SQL task specifically includes: in response to the UDF function already existing in the system cache, skipping compilation and registration; in response to the UDF function being a JVM object produced by an interpreter or compiler, registering the UDF function pointed to by the JVM object in the Flink SQL runtime environment, wherein during registration the UDF unique identifier is used as the UDF function name so that users can reuse it in the Flink SQL runtime environment; and running the SQL job submitted by the user in the Flink SQL runtime environment, wherein the SQL includes one or more UDF functions implemented in different development languages.
In a specific embodiment, persistently storing the UDF source code related information and writing the UDF unique identifier into the system cache specifically includes: in response to UDF source code being provided in the UDF parameters, overwriting the same-name UDF function in the current Flink SQL runtime environment or in persistent storage, dynamically parsing or compiling the UDF source code at runtime, and registering the compiled JVM object in the Flink SQL runtime environment; in response to no UDF source code being provided in the UDF parameters and the UDF unique identifier not existing in the system cache, acquiring the UDF source code from persistent storage, dynamically parsing or compiling it at runtime, and registering the compiled JVM object in the Flink SQL runtime environment; submitting and running the Flink SQL task and, in response to successful execution of the Flink SQL task, persistently storing the UDF source code related information and writing the UDF unique identifier into the system cache.
With continued reference to FIG. 2, FIG. 2 illustrates a framework diagram of a system for implementing a user-defined function according to an embodiment of the invention. The system architecture includes: a parameter parsing module configured to receive and parse a Flink SQL task and UDF request parameters submitted by a user; a compiling module configured to dynamically parse and compile, at runtime, UDF source code written in different development languages through the corresponding interpreters or compilers, generating objects that can run in the JVM environment; a task execution module configured to run the SQL job submitted by the user in the Flink SQL runtime environment, wherein the SQL includes one or more UDF functions implemented in different development languages; and a storage module configured to persistently store the UDF source code related information and write the UDF unique identifier into a system cache.
In a specific embodiment, the implementation flow steps of the system are as follows:
(1) The Flink SQL task and the UDF-related request parameters submitted by the user are received and parsed through a system service interface.
(2) If a UDF request parameter submitted by the user provides only a UDF unique identifier and no UDF source code, the UDF unique identifier is looked up in the system cache:
(2.1) If the UDF unique identifier exists, the UDF is already registered in the current Flink SQL runtime environment, and the system can call the UDF function directly for service processing without any further handling. Execution continues with step (4).
(2.2) If the UDF unique identifier does not exist, the UDF is not registered in the current Flink SQL runtime environment but may exist in the history records of the persistent storage system. The system therefore connects to the persistent storage system, acquires the related information of the corresponding UDF source code, and writes the UDF unique identifier into a system global cache. Execution continues with step (4).
(3) If UDF source code is provided in the UDF request parameter, or UDF source code was obtained in step (2.2), the system obtains the interpreter or compiler for the development language of the UDF source code, dynamically converts one or more pieces of UDF source code written in different languages from text form into JVM objects, and registers the UDF functions pointed to by the JVM objects in the Flink SQL runtime environment.
(4) The UDF-related Flink SQL task is executed to complete service processing. If execution succeeds, the related information of the UDF source code is persistently stored and the UDF unique identifier is written into the system cache, which avoids recompiling verified UDF source code in the same Flink SQL runtime environment.
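The branching in steps (2) and (3) can be sketched as a small routing function. This is only an illustration under stated assumptions: the class name `UdfRouting`, the enum constants, and the use of a plain `Set<String>` as the system cache are hypothetical and do not come from the patent text.

```java
import java.util.Set;

/** Hypothetical sketch of how one UDF request parameter could be routed. */
final class UdfRouting {

    enum Action {
        REUSE_REGISTERED,              // step (2.1): identifier found in the system cache
        LOAD_FROM_STORE_THEN_COMPILE,  // step (2.2): fetch source from persistent storage, then compile
        COMPILE_SUBMITTED_SOURCE       // step (3): source code was submitted with the request
    }

    /** systemCache stands in for the cache of already verified UDF identifiers. */
    static Action route(String udfId, String sourceCode, Set<String> systemCache) {
        boolean hasSource = sourceCode != null && !sourceCode.trim().isEmpty();
        if (hasSource) {
            return Action.COMPILE_SUBMITTED_SOURCE;
        }
        if (systemCache.contains(udfId)) {
            return Action.REUSE_REGISTERED;
        }
        return Action.LOAD_FROM_STORE_THEN_COMPILE;
    }
}
```

Keeping the decision separate from compilation and storage makes each branch easy to test without a running Flink cluster.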
In a specific embodiment, the system exposes a RESTful external service interface that receives the Flink SQL task request parameters submitted by a front-end user. Each SQL task may provide one or more UDF functions developed in different languages. The parameters of each UDF function are shown in Table 1 below; other task parameters unrelated to UDFs are omitted:
TABLE 1 UDF-related parameters
[Table 1 appears as an image (BDA0004015040660000071) in the original publication and is not reproduced in this text.]
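As a rough illustration of the per-UDF parameters, the sketch below models the three items the text names elsewhere (UDF unique identifier, UDF development language, UDF source code). The class name, field names, and validation rule are assumptions made for this example; the actual layout of Table 1 is not available here.

```java
import java.util.Objects;

/** Hypothetical model of one UDF entry in a task request. */
final class UdfParam {

    enum Language { JAVA, SCALA, PYTHON }

    final String id;          // UDF unique identifier, later used as the SQL function name
    final Language language;  // development language; may be null when only the id is sent
    final String sourceCode;  // source text; null or blank when only the id is sent

    UdfParam(String id, Language language, String sourceCode) {
        this.id = Objects.requireNonNull(id, "id");
        this.language = language;
        this.sourceCode = sourceCode;
        // Assumed rule: source code cannot be compiled without knowing its language.
        if (hasSource() && language == null) {
            throw new IllegalArgumentException("source code requires a development language");
        }
    }

    /** True when the request carries source code rather than only an identifier. */
    boolean hasSource() {
        return sourceCode != null && !sourceCode.trim().isEmpty();
    }
}
```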
In a specific embodiment, the compiling module dynamically parses and compiles, at runtime, UDF source code written in different development languages through the corresponding interpreters or compilers, generating objects that can run in the JVM environment, so that users in different technical fields can develop collaboratively through multi-language online mixed programming. FIG. 3 shows a flowchart of an implementation of the compiling module according to a specific embodiment of the present application. As shown in FIG. 3, the flow mainly includes the following steps:
(1) After receiving the UDF source code related parameters, the compiling module obtains the corresponding interpreter or compiler according to the UDF development language type and dynamically parses and compiles the UDF source code at runtime:
(1.1) If the development language of the UDF source code is Python, the system dynamically parses the UDF source code into a JVM object through Jython. Jython is a Python interpreter and compiler written in Java that can compile Python code into Java bytecode dynamically or statically, allowing Python programs to integrate seamlessly with Java.
(1.2) If the development language of the UDF source code is Scala, the system dynamically parses the UDF source code into a JVM object through the Scala Toolbox. The Scala Toolbox is a utility class provided by the Scala compiler module that can dynamically compile and run Scala code by means of reflection.
(1.3) If the development language of the UDF source code is Java, the system dynamically compiles the UDF source code through JavaCompiler and loads the generated Java bytecode into the JVM with a class loader for execution. JavaCompiler is the Java source code compiler interface provided by the Java API (javax.tools); together with StandardJavaFileManager it makes dynamic compilation of Java source code straightforward.
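For the Java branch in step (1.3), a minimal sketch of in-memory compilation with the standard `javax.tools` API is given below. It is not the patent's implementation: the class and method names (`JavaUdfCompilerSketch`, `compileAndLoad`) are hypothetical, the bytecode is kept in memory as a design choice for this example, and a JDK (not just a JRE) is assumed so that a system compiler is available. The Python and Scala branches would need Jython and the Scala compiler on the classpath and are not shown.

```java
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: compile Java source held in a String and load the resulting class. */
final class JavaUdfCompilerSketch {

    /** Source "file" backed by a String. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }
        @Override public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
    }

    /** Class "file" that captures the generated bytecode in memory. */
    static final class BytecodeSink extends SimpleJavaFileObject {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        BytecodeSink(String className) {
            super(URI.create("mem:///" + className.replace('.', '/') + Kind.CLASS.extension), Kind.CLASS);
        }
        @Override public OutputStream openOutputStream() { return bytes; }
    }

    static Class<?> compileAndLoad(String className, String source) throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler available; a JDK is required");
        }
        final Map<String, BytecodeSink> sinks = new ConcurrentHashMap<>();
        StandardJavaFileManager standard = compiler.getStandardFileManager(null, null, null);
        // Redirect every generated class file into an in-memory sink.
        try (JavaFileManager manager = new ForwardingJavaFileManager<StandardJavaFileManager>(standard) {
            @Override
            public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location, String name,
                                                       JavaFileObject.Kind kind, FileObject sibling) {
                BytecodeSink sink = new BytecodeSink(name);
                sinks.put(name, sink);
                return sink;
            }
        }) {
            Boolean ok = compiler.getTask(null, manager, null, null, null,
                    Collections.singletonList(new StringSource(className, source))).call();
            if (!Boolean.TRUE.equals(ok)) {
                throw new IllegalArgumentException("Compilation failed for " + className);
            }
        }
        // Class loader that defines classes from the captured bytecode.
        ClassLoader loader = new ClassLoader(JavaUdfCompilerSketch.class.getClassLoader()) {
            @Override
            protected Class<?> findClass(String name) throws ClassNotFoundException {
                BytecodeSink sink = sinks.get(name);
                if (sink == null) throw new ClassNotFoundException(name);
                byte[] b = sink.bytes.toByteArray();
                return defineClass(name, b, 0, b.length);
            }
        };
        return loader.loadClass(className);
    }
}
```

A caller would pass the class name and its source text, instantiate the returned class reflectively, and hand the instance to the registration step. Compiler diagnostics go to standard error here because no diagnostic listener is supplied.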
With continuing reference to fig. 4, fig. 4 shows a flowchart of an implementation of a task execution module according to a specific embodiment of the present application, and as shown in fig. 4, the flowchart mainly includes the following steps:
(1) The task execution module first determines the source of the UDF:
(1.1) If the UDF already exists in the system cache, the UDF is already registered in the current Flink SQL runtime environment, and compilation and registration do not need to be repeated.
(1.2) If the UDF is a JVM object produced by an interpreter or compiler in the compiling module, the system registers the UDF function pointed to by the JVM object in the Flink SQL runtime environment. During registration, the UDF unique identifier is used as the UDF function name, so that users can reuse it in the Flink SQL runtime environment.
(2) The SQL job submitted by the user is run in the Flink SQL runtime environment. The SQL may include one or more UDF functions implemented in different development languages to complete service processing.
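The registration rule in steps (1.1) and (1.2) can be pictured with a plain in-memory registry keyed by the UDF unique identifier. This is a stand-in only: `FunctionRegistrySketch` and its methods are hypothetical, and a `java.util.function.Function` takes the place of a real Flink function object. In an actual Flink deployment the registration would presumably go through the Table API (for example `TableEnvironment.createTemporarySystemFunction`), which the patent text does not name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for the function catalog of a SQL runtime environment. */
final class FunctionRegistrySketch {

    private final Map<String, Function<String, String>> functions = new ConcurrentHashMap<>();

    /** Step (1.1): an identifier that is already registered needs no compile or register step. */
    boolean isRegistered(String udfId) {
        return functions.containsKey(udfId);
    }

    /** Step (1.2): register a compiled object under its unique identifier, replacing any same-name function. */
    void register(String udfId, Function<String, String> udf) {
        functions.put(udfId, udf);
    }

    /** Step (2) stand-in: resolve the function by the name used in the query and apply it. */
    String call(String udfId, String argument) {
        Function<String, String> udf = functions.get(udfId);
        if (udf == null) {
            throw new IllegalStateException("UDF not registered: " + udfId);
        }
        return udf.apply(argument);
    }
}
```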
In a specific embodiment, the storage module improves UDF reuse and repeatable debugging through a system cache and persistent storage. To reduce unnecessary overhead, a JVM in-process cache may be used as the system cache, and a document-based NoSQL database may be used as the UDF persistent storage system. FIG. 5 shows a flowchart of an implementation of the storage module according to an embodiment of the present invention. As shown in FIG. 5, the flow mainly includes the following steps:
(1) The storage module determines whether the UDF parameter provides only a UDF unique identifier without UDF source code:
(1.1) If UDF source code is provided in the UDF parameter, the front-end user has changed the UDF source code and needs to overwrite the same-name UDF function in the current Flink SQL runtime environment or in persistent storage. The compiling module is entered, the UDF source code is dynamically parsed or compiled at runtime, and the compiled JVM object is registered in the Flink SQL runtime environment. Execution continues with step (3).
(1.2) If no UDF source code is provided in the UDF parameter, the UDF source code has already been verified in an earlier debugging session and persistently stored. Execution continues with step (2).
(2) Whether the UDF unique identifier exists in the system cache is determined:
(2.1) If the UDF unique identifier exists in the system cache, the UDF is already registered in the current Flink SQL runtime environment and has been verified.
(2.2) If the UDF unique identifier does not exist in the system cache, the UDF may be stored in the persistent storage system in source code form. The UDF source code information is acquired from the persistent storage system, the compiling module is entered, the UDF source code is dynamically parsed or compiled at runtime, and the compiled JVM object is registered in the Flink SQL runtime environment.
(3) The task execution module is entered, and the Flink SQL task is submitted and run to verify the UDF's service processing capability. If the SQL task executes successfully, the UDF source code related information is written into the persistent storage system and the UDF unique identifier is written into the system cache.
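The commit-on-success rule in step (3) can be sketched as follows. Everything named here is illustrative: `UdfStoreSketch`, `runAndCommit`, and the use of in-memory collections in place of the JVM in-process cache and the document database are assumptions for the example, and the `Callable` merely stands in for submitting a Flink SQL task.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: persist and cache a UDF only after its task has run successfully. */
final class UdfStoreSketch {

    // Stand-in for the JVM in-process cache of verified UDF identifiers.
    final Set<String> systemCache = ConcurrentHashMap.newKeySet();
    // Stand-in for the document-based persistent store (identifier -> source code).
    final Map<String, String> persistentStore = new ConcurrentHashMap<>();

    /** Runs the task; a failed or throwing task leaves both cache and store untouched. */
    boolean runAndCommit(String udfId, String sourceCode, Callable<Boolean> sqlTask) {
        boolean succeeded;
        try {
            succeeded = Boolean.TRUE.equals(sqlTask.call());
        } catch (Exception e) {
            succeeded = false;
        }
        if (succeeded) {
            persistentStore.put(udfId, sourceCode);
            systemCache.add(udfId);
        }
        return succeeded;
    }
}
```

Because nothing is written on failure, only UDFs that have passed at least one real run are ever served from the cache or the store.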
With the method and system for implementing a user-defined function described above, persistently storing successfully debugged UDFs improves UDF reuse; the system cache mechanism reduces the remote connection overhead of persistent storage, avoids repeated compilation of UDFs, and speeds up debugging; and multi-language online mixed programming enables collaborative development between developers in different technical fields and ordinary users.
Referring now to FIG. 6, shown is a block diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the use range of the embodiment of the present application.
As shown in fig. 6, the computer system includes a Central Processing Unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a Liquid Crystal Display (LCD) and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read from it can be installed into the storage section 608 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the method of the present application when executed by the Central Processing Unit (CPU) 601. It should be noted that the computer readable medium of the present application can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In this application, by contrast, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present application may be implemented by software or hardware.
As another aspect, the present application also provides a computer-readable storage medium, which may be included in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receive and parse, at a system service interface, a Flink SQL task and UDF request parameters submitted by a user; in response to the UDF request parameters providing only a UDF unique identifier, acquire the UDF unique identifier from a system cache; in response to UDF source code being provided in the UDF request parameters, acquire an interpreter or a compiler corresponding to the development language of the UDF source code, dynamically convert one or more pieces of UDF source code developed in different languages from text form into JVM objects, and register the UDF function pointed to by each JVM object in a Flink SQL runtime environment; and in response to successful execution of the UDF-related Flink SQL task, persistently store the UDF source code related information and write the UDF unique identifier into the system cache.
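The cache-first flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the class `UdfResolver`, its `execute` method, and the two in-memory maps standing in for the system cache and the persistent store are all hypothetical names, and the "compiler" is any function from source text to an object, so no Flink, Jython, or Scala dependency is needed to run it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical sketch of the identifier-first UDF lookup; not the patented system itself. */
public class UdfResolver {
    // Stand-in for the system cache: UDF unique identifier -> compiled object.
    private final Map<String, Object> cache = new ConcurrentHashMap<>();
    // Stand-in for persistent storage: UDF unique identifier -> source text.
    private final Map<String, String> store = new ConcurrentHashMap<>();
    // Assumed compiler abstraction: turns source text into a runnable object.
    private final Function<String, Object> compiler;
    private int compileCount = 0;

    public UdfResolver(Function<String, Object> compiler) {
        this.compiler = compiler;
    }

    /**
     * Runs a task with the UDF named by id. A null source means the request
     * carried only the identifier, so the cache is consulted first.
     */
    public boolean execute(String id, String source, Predicate<Object> task) {
        Object udf = (source == null) ? cache.get(id) : null;
        String effectiveSource = source;
        if (udf == null) {
            if (effectiveSource == null) {
                effectiveSource = store.get(id); // fall back to persisted source
            }
            if (effectiveSource == null) {
                throw new IllegalArgumentException("Unknown UDF: " + id);
            }
            udf = compiler.apply(effectiveSource); // compile (or recompile) from text
            compileCount++;
        }
        boolean succeeded = task.test(udf);
        // Persist and cache only after the task has run successfully.
        if (succeeded && effectiveSource != null) {
            store.put(id, effectiveSource);
            cache.put(id, udf);
        }
        return succeeded;
    }

    public int compileCount() {
        return compileCount;
    }
}
```

In this sketch, a second request that supplies only the identifier reuses the cached object without recompiling, while a request that supplies source always recompiles and, on success, overwrites the earlier entry.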
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (12)

1. A method for implementing a user-defined function, characterized by comprising the following steps:
S1: a system service interface receives and parses a Flink SQL task and UDF request parameters submitted by a user;
S2: in response to the UDF request parameters submitted by the user providing only a UDF unique identifier, acquiring the UDF unique identifier from a system cache;
in response to UDF source code being provided in the UDF request parameters, acquiring an interpreter or a compiler corresponding to the development language of the UDF source code, dynamically converting one or more pieces of UDF source code developed in different languages from text form into JVM objects, and registering the UDF function pointed to by each JVM object in a Flink SQL runtime environment;
S3: in response to successful execution of the UDF-related Flink SQL task, persistently storing the UDF source code related information, and writing the UDF unique identifier into the system cache.
2. The method for implementing a user-defined function according to claim 1, wherein S2 specifically includes:
in response to the UDF unique identifier already existing, continuing to execute S3;
and in response to the UDF unique identifier not existing, acquiring the corresponding UDF source code, writing the UDF unique identifier into a system global cache, and continuing to execute S3.
3. The method according to claim 2, wherein acquiring an interpreter or a compiler corresponding to the development language according to the UDF source code, and dynamically parsing and compiling the UDF source code at runtime, specifically includes:
in response to the UDF source code being Python, dynamically parsing the UDF source code into a JVM object through Jython;
in response to the UDF source code being Scala, dynamically parsing the UDF source code into a JVM object through the Scala Toolbox;
and in response to the UDF source code being Java, dynamically compiling the UDF source code through a JavaCompiler, and loading the generated Java bytecode into the JVM with a class loader for execution.
4. The method according to claim 3, wherein executing the UDF-related Flink SQL task in S3 specifically comprises the following steps:
in response to the UDF function already existing in the system cache, skipping repeated compilation and registration;
in response to the UDF function being a JVM object produced by the interpreter or the compiler, registering the UDF function pointed to by the JVM object in the Flink SQL runtime environment, wherein during registration the UDF unique identifier is used as the UDF function name so that the user can reuse it in the Flink SQL runtime environment;
running the user-submitted SQL job in the Flink SQL runtime environment, wherein the SQL includes one or more UDF functions implemented in different development languages.
5. The method according to claim 4, wherein persistently storing the UDF source code related information in S3 and writing the UDF unique identifier into the system cache specifically includes the steps of:
in response to only UDF source code being provided in the UDF parameters, overwriting the UDF function of the same name in the current Flink SQL runtime environment or in persistent storage, dynamically parsing or compiling the UDF source code at runtime, and registering the compiled JVM object in the Flink SQL runtime environment;
in response to the UDF source code not being provided in the UDF parameters and the UDF unique identifier not existing in the system cache, dynamically parsing or compiling the UDF source code at runtime, and registering the compiled JVM object in the Flink SQL runtime environment;
submitting and running the Flink SQL task and, in response to successful execution of the Flink SQL task, persistently storing the UDF source code related information and writing the UDF unique identifier into the system cache.
6. The method of claim 1, wherein the UDF parameters include a UDF unique identifier, a UDF development language, and a UDF source code.
7. A computer-readable storage medium having one or more computer programs stored thereon, which when executed by a computer processor perform the method of any of claims 1 to 6.
8. A system for implementing a user-defined function, the system comprising:
a parameter parsing module configured to receive and parse a Flink SQL task and UDF request parameters submitted by a user;
a compiling module configured to dynamically parse and compile, at runtime, UDF source code written in different development languages through the corresponding interpreters or compilers, generating objects capable of running in a JVM environment;
a task execution module configured to run the SQL job submitted by the user in a Flink SQL runtime environment, wherein the SQL includes one or more UDF functions implemented in different development languages;
and a storage module configured to persistently store the UDF source code related information and write the UDF unique identifier into a system cache.
9. The system for implementing user-defined functions of claim 8, wherein the parameter parsing module receives Flink SQL task request parameters submitted by a user, each task providing one or more UDF functions developed in different languages, and the parameters of each UDF function include a unique UDF identifier, a UDF development language, and a UDF source code.
10. The system according to claim 8, wherein the compiling module, after receiving the UDF source code parameter, obtains the corresponding interpreter or compiler according to the UDF development language type and dynamically parses and compiles the UDF source code at runtime, specifically including:
in response to the UDF source code being Python, dynamically parsing the UDF source code into a JVM object through Jython;
in response to the UDF source code being Scala, dynamically parsing the UDF source code into a JVM object through the Scala Toolbox;
and in response to the UDF source code being Java, dynamically compiling the UDF source code through a JavaCompiler, and loading the generated Java bytecode into the JVM with a class loader for execution.
11. The system for implementing a user-defined function according to claim 8, wherein the task execution module operates as follows:
in response to the UDF function already existing in the system cache, skipping repeated compilation and registration;
in response to the UDF function being a JVM object produced by the interpreter or the compiler, registering the UDF function pointed to by the JVM object in the Flink SQL runtime environment, wherein during registration the UDF unique identifier is used as the UDF function name so that the user can reuse it in the Flink SQL runtime environment;
and running the SQL job submitted by the user in the Flink SQL runtime environment, wherein the SQL includes one or more UDF functions implemented in different development languages.
12. The system of claim 8, wherein the storage module improves UDF reuse and repeatable debugging by using a system cache and persistent storage, the system cache being implemented as a JVM in-process cache and the persistent storage being implemented with a document-based NoSQL database.
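As an illustration of the Java branch recited in claims 3 and 10, the sketch below compiles source text with the standard `javax.tools.JavaCompiler` API and loads the resulting bytecode with a class loader. It is a minimal sketch under stated assumptions: the class name `DynamicJavaUdf` and the helper `StringSource` are hypothetical, class files are written to a temporary directory rather than kept in memory, and registration with Flink is deliberately left out (in a real job the loaded class would presumably extend a Flink function base class and be registered under the UDF unique identifier).

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

/** Hypothetical sketch: compile Java source text at runtime and load the class. */
public class DynamicJavaUdf {

    /** Presents a source string to the compiler as if it were a .java file. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/')
                    + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Compiles the given source and returns the loaded class. */
    public static Class<?> compile(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler available; a JDK is required");
        }
        try {
            // Assumption: class files go to a temporary directory on disk.
            Path outDir = Files.createTempDirectory("udf-classes");
            Boolean ok = compiler.getTask(
                    null,                                   // diagnostics go to System.err
                    null,                                   // default file manager
                    null,                                   // default diagnostic listener
                    Arrays.asList("-d", outDir.toString()), // output directory
                    null,
                    Collections.singletonList(new StringSource(className, source))).call();
            if (!ok) {
                throw new IllegalArgumentException("Compilation failed for " + className);
            }
            // Load the generated bytecode with a class loader rooted at the output directory.
            URLClassLoader loader = new URLClassLoader(
                    new URL[] {outDir.toUri().toURL()},
                    DynamicJavaUdf.class.getClassLoader());
            return loader.loadClass(className);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Could not load compiled class " + className, e);
        }
    }
}
```

A caller could then obtain a method reflectively, for example `compile("UpperUdf", src).getMethod("eval", String.class)`, and invoke it; the method name `eval` here is only an example chosen for the sketch.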
CN202211669162.5A 2022-12-23 2022-12-23 Method and system for realizing user-defined function Pending CN115964050A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211669162.5A CN115964050A (en) 2022-12-23 2022-12-23 Method and system for realizing user-defined function

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211669162.5A CN115964050A (en) 2022-12-23 2022-12-23 Method and system for realizing user-defined function

Publications (1)

Publication Number Publication Date
CN115964050A true CN115964050A (en) 2023-04-14

Family

ID=87357489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211669162.5A Pending CN115964050A (en) 2022-12-23 2022-12-23 Method and system for realizing user-defined function

Country Status (1)

Country Link
CN (1) CN115964050A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination