CN116954607A - Multi-source heterogeneous real-time task processing method, system, equipment and medium


Info

Publication number
CN116954607A
Authority
CN
China
Prior art keywords
real-time
plug-in
data
SeaTunnel
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310952410.5A
Other languages
Chinese (zh)
Inventor
曹林
刘洋
涂平
靖琦东
贺群雄
张林宇
刘准
梁春峰
仇亚龙
刘博
彭中益
王斯政
廖佳佳
李志超
贺若龙
彭东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Power Industry Internet Co ltd
Original Assignee
China Power Industry Internet Co ltd
Application filed by China Power Industry Internet Co., Ltd.
Priority to CN202310952410.5A
Publication of CN116954607A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/30 Creation or generation of source code
    • G06F 8/38 Creation or generation of source code for implementing user interfaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/70 Software maintenance or management
    • G06F 8/71 Version control; Configuration management


Abstract

The application relates to a multi-source heterogeneous real-time task processing method, system, equipment and medium. The method provides WEB visual interface task configuration through the design of a real-time data source management page and a target table metadata management page, and efficiently completes the construction process of the real-time task configuration through the design of links such as the Kafka protocol format standard, the Kafka data protocol analysis method, the SeaTunnel real-time data integration pipeline model, the SeaTunnel real-time integration plug-in types, the SeaTunnel execution environment plug-in, the SeaTunnel real-time input plug-in, the SeaTunnel real-time conversion plug-in, the SeaTunnel real-time output plug-in, the SeaTunnel real-time data integration pipeline plug-in configuration, the SeaTunnel pipeline plug-in combination arrangement, the SeaTunnel data storage model, the SeaTunnel target table data structure analysis, and the SeaTunnel real-time scheduling task construction.

Description

Multi-source heterogeneous real-time task processing method, system, equipment and medium
Technical Field
The application belongs to the technical field of data processing, and relates to a multi-source heterogeneous real-time task processing method, a system, equipment and a medium.
Background
With the wide application of 5G technology in the industrial field, the types of industrial equipment at the edge have diversified, and such equipment generates a large amount of heterogeneous working-condition data. Monitoring equipment health in time is therefore very important: even though the data protocols of different industrial devices differ, the heterogeneous data must be transmitted to a cloud or big data platform promptly and efficiently, so that equipment abnormalities can be detected in real time and alarms raised. At present, most real-time integration systems face certain technical difficulties in links such as handling multiple protocols, multiple networks, complex processes, and task construction.
At present, the Flink computing engine is the mainstream real-time technology, and many researchers have studied methods of constructing Flink real-time tasks in depth, for example: constructing a Flink real-time integrated task by dynamically loading job configuration, by dragging source-end operators, by cross-network configuration on an operation page, or by uploading a Flink JAR package through the front end. These methods still suffer from the technical problem of low task data processing efficiency when facing heterogeneous data processing tasks from a large number of cloud-edge industrial devices.
Disclosure of Invention
Aiming at the problems in the traditional method, the invention provides a multi-source heterogeneous real-time task processing method, a multi-source heterogeneous real-time task processing system, a computer device and a computer readable storage medium, which can greatly improve the task data processing efficiency.
In order to achieve the above object, the embodiment of the present invention adopts the following technical scheme:
in one aspect, a method for processing a multi-source heterogeneous real-time task is provided, including the steps of:
according to the real-time task scene of the heterogeneous data source, configuring a real-time data source management page of the task; configuring a real-time data source management page comprises adding a Kafka real-time input source, a Phoenix target output source and a MySQL target output source in the real-time data source management page, storing connection configuration information of the input source and the output source, and completing connectivity test and verification;
Configuring a target table metadata management page of a task; the configuration process comprises the steps of selecting a Phoenix target output source or a MySQL target output source, pulling and storing metadata information in the target output source, wherein the metadata information comprises a data set, a data field, a field type, a primary key, precision and length;
according to the Kafka protocol format standard template, packaging Kafka real-time streaming data in a Json data format using a List+Map-based nested structure;
analyzing the Kafka real-time streaming data based on a preset Mapping data protocol according to a Kafka data protocol analysis template; the composition of Mapping data protocol includes protocol field, field alias, field position and data type;
creating a SeaTunnel real-time data integration pipeline according to the SeaTunnel real-time data integration pipeline model; the pipeline module of the SeaTunnel real-time data integration pipeline comprises a Flink execution environment module, a Flink input module, a Flink conversion module and a Flink output module;
dynamically selecting an integrated plug-in and configuring plug-in parameter information based on a DAG dragging mode in a pipeline module composition of a SeaTunnel real-time data integrated pipeline according to a SeaTunnel real-time data integrated pipeline plug-in model; the integrated plug-ins comprise an execution environment plug-in, an input plug-in, a conversion plug-in and an output plug-in, wherein the execution environment plug-in is selected as a Flink execution environment plug-in, the input plug-in is selected as a Kafka real-time input source plug-in, the conversion plug-in is selected as a Filter conversion plug-in, the output plug-in is selected as a MySQL target output source plug-in or a Phoenix target output source plug-in, and the plug-in parameter information comprises an execution environment parameter, a Mapping data protocol, filtering conditions and target output table parameters;
According to the SeaTunnel real-time integrated plug-in combination model, combining and arranging integrated plug-ins in a pipeline module composition of a SeaTunnel real-time data integrated pipeline; the combination arrangement comprises the steps of carrying out association combination on an input plug-in, a conversion plug-in and an output plug-in;
according to a Kafka data analysis flow model, configuring and utilizing Mapping protocol analysis rules to analyze Kafka real-time streaming data of a Json nested structure in a Kafka real-time input source plug-in of a Flink input module, and converting the Kafka real-time streaming data into streaming data of a plurality of single-layer structures;
according to the SeaTunnel data storage model, configuring a data storage mode in a MySQL target output source plug-in or a Phoenix target output source plug-in of the Flink output module; the data storage mode comprises a detail model, a primary key model and an aggregation model;
constructing a SeaTunnel real-time task configuration template according to the task parameter configuration template based on the Flink;
according to the SeaTunnel real-time Task configuration template, in the pipeline module composition of the SeaTunnel real-time data integration pipeline, an input plug-in in the Flink input module, a conversion plug-in in the Flink conversion module and an output plug-in in the Flink output module are packaged into a real-time Task entity Task;
According to the SeaTunnel target table data structure analysis model, module parameter analysis and conversion processing are carried out in Task configuration of a real-time Task entity Task, and a target table data structure is obtained; the module parameter analysis and conversion processing comprises the steps of obtaining Mapping protocol analysis rules of Kafka real-time source input plugins in an input module, analyzing data fields and types of the input module, obtaining parameter configuration of filter conversion plugins or sql plugins in a conversion module, analyzing data fields and types of the conversion module, and simultaneously converting the data fields and the types according to a SeaTunnel real-time integration plugin combination model to obtain a target table data structure of a final output module;
and combining the Flink calculation engine, the Task configuration, the data storage mode, the target table data structure and the target table creation statement according to the SeaTunnel real-time scheduling Task model, and packaging the combination into a SeaTunnel real-time scheduling Task instance, so as to complete SeaTunnel real-time Task scheduling of the real-time Task scene.
In another aspect, a multi-source heterogeneous real-time task processing system is provided, including:
the first configuration module is used for configuring a real-time data source management page of a task according to a real-time task scene of the heterogeneous data source; configuring a real-time data source management page comprises adding a Kafka real-time input source, a Phoenix target output source and a MySQL target output source in the real-time data source management page, storing connection configuration information of the input source and the output source, and completing connectivity test and verification;
The second configuration module is used for configuring a target table metadata management page of the task; the configuration process comprises the steps of selecting a Phoenix target output source or a MySQL target output source, pulling and storing metadata information in the target output source, wherein the metadata information comprises a data set, a data field, a field type, a primary key, precision and length;
the data packaging module is used for packaging the Kafka real-time streaming data in a Json data format by adopting a List+Map-based nested structure according to the Kafka protocol format standard template;
the protocol analysis module is used for analyzing the Kafka real-time streaming data based on a preset Mapping data protocol according to the Kafka data protocol analysis template; the composition of Mapping data protocol includes protocol field, field alias, field position and data type;
the pipeline creation module is used for creating a SeaTunnel real-time data integration pipeline according to the SeaTunnel real-time data integration pipeline model; the pipeline module of the SeaTunnel real-time data integration pipeline comprises a Flink execution environment module, a Flink input module, a Flink conversion module and a Flink output module;
the plug-in configuration module is used for dynamically selecting an integrated plug-in and configuring plug-in parameter information based on a DAG dragging mode in the pipeline module composition of the SeaTunnel real-time data integrated pipeline according to the SeaTunnel real-time data integrated pipeline plug-in model; the integrated plug-ins comprise an execution environment plug-in, an input plug-in, a conversion plug-in and an output plug-in, wherein the execution environment plug-in is selected as a Flink execution environment plug-in, the input plug-in is selected as a Kafka real-time input source plug-in, the conversion plug-in is selected as a Filter conversion plug-in, the output plug-in is selected as a MySQL target output source plug-in or a Phoenix target output source plug-in, and the plug-in parameter information comprises an execution environment parameter, a Mapping data protocol, filtering conditions and target output table parameters;
The plug-in arrangement module is used for carrying out combination arrangement on the integrated plug-ins in the pipeline module composition of the SeaTunnel real-time data integration pipeline according to the SeaTunnel real-time integrated plug-in combination model; the combination arrangement comprises the steps of carrying out association combination on an input plug-in, a conversion plug-in and an output plug-in;
the analysis configuration module is used for configuring and utilizing Mapping protocol analysis rules to analyze the Kafka real-time streaming data of the Json nested structure in the Kafka real-time input source plug-in of the Flink input module according to the Kafka data analysis flow model, and converting the Kafka real-time streaming data into streaming data of a plurality of single-layer structures;
the storage selection module is used for configuring a data storage mode in a MySQL target output source plug-in or a Phoenix target output source plug-in of the Flink output module according to the SeaTunnel data storage model; the data storage mode comprises a detail model, a primary key model and an aggregation model;
the template construction module is used for constructing a SeaTunnel real-time task configuration template according to the task parameter configuration template based on the Flink;
and the Task entity module is used for packaging an input plug-in in the Flink input module, a conversion plug-in in the Flink conversion module and an output plug-in in the Flink output module into a real-time Task entity Task in the pipeline module composition of the SeaTunnel real-time data integration pipeline according to the SeaTunnel real-time Task configuration template.
The structure analysis module is used for carrying out module parameter analysis and conversion processing in Task configuration of the real-time Task entity Task according to the SeaTunnel target table data structure analysis model to obtain a target table data structure; the module parameter analysis and conversion processing comprises the steps of obtaining Mapping protocol analysis rules of Kafka real-time source input plugins in an input module, analyzing data fields and types of the input module, obtaining parameter configuration of filter conversion plugins or sql plugins in a conversion module, analyzing data fields and types of the conversion module, and simultaneously converting the data fields and the types according to a SeaTunnel real-time integration plugin combination model to obtain a target table data structure of a final output module;
and the Task scheduling module is used for combining the Flink calculation engine, the Task configuration, the data storage mode, the target table data structure and the target table creation statement according to the SeaTunnel real-time scheduling Task model, and packaging the combination into a SeaTunnel real-time scheduling Task instance to complete SeaTunnel real-time Task scheduling of a real-time Task scene.
In yet another aspect, a computer device is provided, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the multi-source heterogeneous real-time task processing method described above when executing the computer program.
In yet another aspect, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the steps of the multi-source heterogeneous real-time task processing method described above.
One of the above technical solutions has the following advantages and beneficial effects:
According to the multi-source heterogeneous real-time task processing method, system, equipment and medium, WEB visual interface task configuration is provided through the design of the real-time data source management page and the target table metadata management page; the construction process of the real-time task configuration is efficiently completed through the design of links such as the Kafka protocol format standard, the Kafka data protocol analysis method, the SeaTunnel real-time data integration pipeline model, the SeaTunnel real-time data integration pipeline plug-in types, the SeaTunnel execution environment plug-in, the SeaTunnel real-time input plug-in, the SeaTunnel real-time conversion plug-in, the SeaTunnel real-time output plug-in, the SeaTunnel real-time data integration pipeline plug-in configuration, the SeaTunnel pipeline plug-in combination arrangement, the SeaTunnel data storage model, the SeaTunnel target table data structure analysis, and the SeaTunnel real-time scheduling task construction.
Compared with the prior art, the technical scheme of the application builds multi-source heterogeneous real-time tasks based on SeaTunnel and creates a set of simple and efficient one-stop real-time task construction standards to reduce the technical difficulty of Flink task development; each link of real-time integration is decoupled by SeaTunnel plug-in modules, simplifying the Flink real-time data processing process; a unified standard protocol format is designed to eliminate the protocol differences between devices of different manufacturers; a unified protocol analysis specification is designed to improve multi-source heterogeneous real-time conversion capability; various real-time data source plug-ins are designed and dynamically arranged and configured in advance, solving the configuration problem of real-time tasks across multiple networks; and WEB visual interface task configuration enhances the user interaction experience. Finally, through this set of real-time task construction standards and specification techniques, the technical problem of real-time integration of cloud-edge equipment with multiple service types, multiple data protocols and multiple network types is solved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments or the conventional techniques of the present application, the drawings required for the descriptions of the embodiments or the conventional techniques will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to the drawings without inventive effort for those skilled in the art.
FIG. 1 is a flow diagram of a method for processing multi-source heterogeneous real-time tasks in one embodiment;
FIG. 2 is a schematic diagram of a real-time data integration pipeline in one embodiment;
FIG. 3 is a diagram illustrating real-time plug-in classification and relationships in one embodiment;
FIG. 4 is a flow diagram of automatic updating of a table structure in one embodiment;
FIG. 5 is a schematic diagram of a module composition framework of a multi-source heterogeneous real-time task processing system in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. It is noted that reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art will appreciate that the embodiments described herein may be combined with other embodiments. The term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
In the process of realizing the invention, the inventor found that, in the prior art, most methods adopt a relatively traditional hard-coding approach when constructing real-time tasks, and most integrated systems lack an effective Flink task configuration scheme; in cloud-edge real-time integration across many equipment types, a hard-coding approach greatly increases development difficulty, while packaging and uploading a JAR program increases manual operation and maintenance costs. For example, the method of constructing a Flink real-time integrated task by dynamically loading job configuration, when used for data transmission over various equipment protocols, makes the task configuration overly customized and reduces processing efficiency; the method of constructing a Flink real-time integrated task by dragging source-end operators lacks decoupling and splitting of the real-time data integration link, so task parameters are configured repeatedly; the method of constructing a Flink real-time integrated task through cross-network configuration on an operation page, used to execute database operations such as querying data, inserting, updating and deleting data, and creating and modifying database table structures, requires SQL (Structured Query Language, a programming language for managing relational databases) to be custom-developed for different service requirements, so the flow becomes more complicated as services grow; and the method of constructing a Flink real-time integrated task by uploading a Flink JAR package at the front end adopts a hard-coded task configuration mode, so development efficiency is low.
Aiming at the defects of the prior art, the invention aims to create a set of real-time task construction schemes and systems based on SeaTunnel (Apache SeaTunnel, a distributed, high-performance, easily extensible data integration platform for real-time synchronization and transformation of massive data), which can load multiple heterogeneous data sources simultaneously, is compatible with multiple device protocols and low-code plug-in configuration, reduces the development difficulty of Flink tasks, and simplifies the real-time data processing flow.
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Referring to fig. 1, in one embodiment, a multi-source heterogeneous real-time task processing method is provided, which includes the following processing steps S10 to S22:
s10, configuring a real-time data source management page of a task according to a real-time task scene of a heterogeneous data source; configuring the real-time data source management page comprises adding a Kafka real-time input source, a Phoenix target output source and a MySQL target output source in the real-time data source management page, storing connection configuration information of the input source and the output source, and completing connectivity test and verification.
It can be understood that each industrial device in the current application scenario serves as a heterogeneous data source. When a task system is built for the real-time task scenario based on the SeaTunnel platform, a Kafka real-time input source, a Phoenix target output source, a MySQL target output source and the like are added in the real-time data source management page of the platform, the connection information of the data sources is saved, and connectivity testing and verification of the data sources are completed, so that the data flow does not go wrong during subsequent operation.
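For illustration only, such a connectivity test might be sketched as follows, using a JDBC validity check for the MySQL/Phoenix output sources and the Kafka admin client for the input source; the class and method names here are hypothetical, as the patent does not prescribe an implementation:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class SourceConnectivityChecker {

    // Verifies a JDBC output source (MySQL or Phoenix) by opening a connection.
    public static boolean checkJdbc(String url, String user, String password) {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            return conn.isValid(5); // 5-second validation timeout
        } catch (Exception e) {
            return false;
        }
    }

    // Verifies a Kafka input source by listing topics through the admin API.
    public static boolean checkKafka(String bootstrapServers) {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, "5000");
        try (AdminClient admin = AdminClient.create(props)) {
            admin.listTopics().names().get(5, TimeUnit.SECONDS);
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}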
S11, configuring a target table metadata management page of a task; the configuration process comprises selecting a Phoenix target output source or a MySQL target output source, pulling and storing metadata information in the target output source, wherein the metadata information comprises a data set, a data field, a field type, a primary key, precision and length.
It can be understood that in the target table metadata management page of the platform, a Phoenix/MySQL target output source is selected, metadata information such as a data set, a data field, a field type, a primary key, precision, length and the like in the target output source is pulled, and the information is stored in a relevant configuration so as to complete the importing of the target table and the field metadata information in the real-time data source.
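A minimal sketch of pulling this metadata over JDBC is shown below: DatabaseMetaData supplies the column name, field type, precision/length and primary-key information described above, while the class and record names are assumptions of this illustration:

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class TargetTableMetadataPuller {

    // One column of target-table metadata: field, type, precision/length, primary-key flag.
    public record ColumnMeta(String field, String type, int precision, boolean primaryKey) {}

    public static List<ColumnMeta> pull(String url, String user, String password,
                                        String schema, String table) throws Exception {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            DatabaseMetaData md = conn.getMetaData();

            // Collect the primary-key column names first.
            List<String> pkColumns = new ArrayList<>();
            try (ResultSet pk = md.getPrimaryKeys(null, schema, table)) {
                while (pk.next()) {
                    pkColumns.add(pk.getString("COLUMN_NAME"));
                }
            }

            // Pull field name, type and precision/length for every column.
            List<ColumnMeta> columns = new ArrayList<>();
            try (ResultSet rs = md.getColumns(null, schema, table, "%")) {
                while (rs.next()) {
                    String name = rs.getString("COLUMN_NAME");
                    columns.add(new ColumnMeta(
                            name,
                            rs.getString("TYPE_NAME"),
                            rs.getInt("COLUMN_SIZE"),
                            pkColumns.contains(name)));
                }
            }
            return columns;
        }
    }
}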
S12, according to the Kafka protocol format standard template, packaging Kafka real-time streaming data in a Json data format using a List+Map-based nested structure.
It will be appreciated that it is next necessary to define a Kafka protocol format standard template to normalize the Kafka device real-time data format, and to encapsulate the device real-time streaming data in a Json data format based on this protocol format. The Kafka protocol format standard refers to a message format specification defined in the Apache Kafka project for communication between devices and applications. Apache Kafka is a distributed streaming data platform for building highly reliable, scalable and fault-tolerant real-time streaming applications. The Kafka protocol format standard defines the structure and content of messages so that devices and applications can communicate in a consistent manner. This protocol format standard defines the key-value pair structure of a message, where the key identifies the type or attribute of the message and the value contains the specific message data. The Kafka protocol format standard typically uses Json or a binary format to represent messages. The Json format is easy to read and parse, while the binary format can provide higher performance and smaller data transfers. Using the Kafka protocol format standard, a device may publish messages to the Kafka cluster and an application may subscribe to and process those messages. This event-based messaging mode allows devices and applications to exchange data and interoperate in real time. It should be noted that the Kafka protocol format standard is not an official standard, but a convention formulated and generalized by the Kafka community. Therefore, when the Kafka protocol format standard is actually used, appropriate expansion and customization can be performed according to specific requirements and service scenarios.
The List+Map-based nested structure is a commonly used data structure for representing data with hierarchical relationships. In such a structure, a complex data model can be constructed using a combination of a list (List) and a map (Map). For example, each map contains keys (key1, key2, key3, key4) and corresponding values. Among these key-value pairs, the value corresponding to key3 may itself be a nested map, while the value corresponding to key4 may be a nested linked list. By using the nested List+Map structure, multi-level data relationships can be conveniently represented, so that data can be organized and accessed hierarchically. This structure is widely used in many programming languages and data exchange formats (e.g., Json) to meet the requirements of complex data models. Through this device protocol standard design, device protocol differences are eliminated and the Flink data conversion capability is improved.
Specifically, a nested structure based on List+Map is designed, a device working condition protocol format is standardized, and Kafka real-time streaming data is packaged by adopting a Json data format. The Kafka protocol format standard template can be used as follows:
{"field1":"value1","field2":value2,"field3":"value3","field4":"value4",...,"list":[{"field5":"value5","map":{"field6":"value6","field7":value7,"field8":value8,...},"fiel d9":value9}]}。
wherein the variable field is the protocol field prefix, value is the actual data prefix, list is a linear linked-list data structure, and map is a key-value data structure; list and map together form the multi-layer nested structure. In this embodiment, a configured List+Map-based nested structure protocol sample may be as follows:
{"traceId":"1111111111","productId":null,"dataType":"XXXXX","tenantId":"222222","deviceId":"3333333","data":[{"identifier":"111","value":{"handle":"zzzzzzz","gasConcentration":4444,"vibrationX":555,"vibrationY":666,},"ts":1677650773507}]}。
The SeaTunnel real-time integration plug-in types are designed by decoupling and splitting the real-time data integration link into units, yielding the following four unit real-time task plug-in types: execution environment plug-ins, input plug-ins, conversion plug-ins, and output plug-ins. The SeaTunnel-based real-time input plug-ins may include, for example, Kafka, RabbitMQ, FlinkCDC, Hudi and Paimon plug-ins; the SeaTunnel-based real-time output plug-ins may include, for example, MySQL, Doris, Kafka, Phoenix and ClickHouse plug-ins; and the SeaTunnel-based real-time conversion plug-ins may include Protocol conversion, SQL conversion, Filter data filtering and Window calculation plug-ins. The SeaTunnel execution environment plug-in configuration is also set to pull the execution environment plug-in and configure its Flink execution parameters.
S13, analyzing the Kafka real-time streaming data based on a preset Mapping data protocol according to a Kafka data protocol analysis template; the composition of the Mapping data protocol includes a protocol field, a field alias, a field location, and a data type. The Kafka data protocol analysis template used can be as follows:
{"field_1":"field1|type1","field_2":"field2|type2","field_3":"field3|type3","field_4":"field4|type4","field_5":"list[].field5|type5","field_6":"list[].map.field6|type6","fi eld_7":"list[].map.field7|type7","field_8":"list[].map.field8|type8","field_9:"field9|ty pe9"}。
wherein the variable field is the protocol field prefix, field_ is the field alias prefix, and type is the data type, covering existing data types such as string, int, long, double and float; list[] is a linear linked list and map is a key-value pair.
It will be appreciated that it is then necessary to define the Kafka data protocol parsing method: a Mapping-based data protocol analysis template is designed, and a Kafka protocol rule analysis (rule) method is defined according to the protocol template so as to be used for analyzing equipment working condition protocols. Mapping-based data protocol parsing templates are templates for defining data transmission formats and communication specifications, which use the structure of key-value pairs to describe the properties and values of messages, and can be extended and customized according to specific requirements. The protocol template consists of a top-level Mapping (Mapping) that contains a plurality of key-value pairs. Each key (field 1, field2, field3, field 4) corresponds to a value, which may be a simple data type (e.g., string, number, etc.) or a complex data structure such as a nested map, linked list, etc. The protocol templates specify the data structure and content in the transmission of the message by defining fields and their values. The sender and the receiver encode and decode the data according to the agreement of the protocol template to ensure the correct analysis and processing of the data. The protocol templates can be expanded and customized according to actual requirements. For example, new fields may be added, existing field data types may be modified, or constraints may be changed to accommodate different application scenarios and data exchange requirements. Mapping-based data protocol parsing templates are widely used in various fields and technologies, such as network communication protocols, data transmission formats (e.g., json, XML, etc.), and API design and data exchange interface specifications. It provides a uniform way to define and interpret data for reliable data communication between different systems and components.
In this embodiment, the analysis sample configured based on Mapping data protocol may be as follows:
{ "device_id": "device|string", "data_type": "datatype|string", "identifier": "data [ ]. Identifier|string", "handle": "data [ ]. Handle|string", "visualization x": "data [ ]. Visualization x|long", "ts": "data [ ]. Ts|long" }. The definitions of the variables in the sample may be understood by referring to the same variables in the Mapping data protocol, and will not be described in detail herein.
S14, creating a SeaTunnel real-time data integration pipeline according to the SeaTunnel real-time data integration pipeline model; the pipeline module of the SeaTunnel real-time data integration pipeline comprises a Flink execution environment module, a Flink input module, a Flink conversion module and a Flink output module.
It may be appreciated that in this embodiment, a real-time data integration pipeline for the current task scenario may be created by using a SeaTunnel real-time data integration pipeline model based on the DAG view operation mode, and the composition of the integration pipeline module may include a Flink execution environment module, a Flink input module, a Flink conversion module, and a Flink output module.
Specifically, a DAG (directed acyclic graph) based view operation is first adopted to create an empty real-time data integration pipeline, which is divided into four Flink pipeline modules: a Flink execution environment module, a Flink input module, a Flink conversion module and a Flink output module, as shown in fig. 2 and 3. The SeaTunnel real-time data integration pipeline model used can be expressed as formula (1):

$$\mathrm{FlinkPipe}() = \left\{\, FE,\ \{FI_i\}_{i=1}^{n_1},\ \{FT_j\}_{j=1}^{n_2},\ \{FO_k\}_{k=1}^{n_3} \,\right\} \tag{1}$$

wherein FlinkPipe() is the real-time data integration pipeline model, $n_1$ is the number of input modules, $n_2$ is the number of conversion modules, $n_3$ is the number of output modules, FE is the Flink execution environment module, FI is a Flink input module, FT is a Flink conversion module, and FO is a Flink output module.
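As a plain illustration of formula (1), the pipeline composition can be modeled as a simple container type; the record below is a hypothetical sketch for exposition, not the platform's actual class:

import java.util.List;

// One execution environment plus n1 input, n2 conversion and n3 output modules.
public record FlinkPipe(
        ExecEnv environment,                 // FE
        List<InputModule> inputs,            // FI_1 .. FI_n1
        List<TransformModule> transforms,    // FT_1 .. FT_n2
        List<OutputModule> outputs           // FO_1 .. FO_n3
) {
    public record ExecEnv(String name) {}
    public record InputModule(String plugin) {}
    public record TransformModule(String plugin) {}
    public record OutputModule(String plugin) {}

    // An empty pipeline, as created by the DAG view operation.
    public static FlinkPipe empty() {
        return new FlinkPipe(new ExecEnv("flink"), List.of(), List.of(), List.of());
    }
}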
S15, dynamically selecting an integrated plug-in and configuring plug-in parameter information based on a DAG dragging mode in a pipeline module composition of a SeaTunnel real-time data integrated pipeline according to a SeaTunnel real-time data integrated pipeline plug-in model; the integrated plug-ins comprise an execution environment plug-in, an input plug-in, a conversion plug-in and an output plug-in, wherein the execution environment plug-in is selected as a Flink execution environment plug-in, the input plug-in is selected as a Kafka real-time input source plug-in, the conversion plug-in is selected as a Filter conversion plug-in, the output plug-in is selected as a MySQL target output source plug-in or a Phoenix target output source plug-in, and the plug-in parameter information comprises an execution environment parameter, a Mapping data protocol, filtering conditions and target output table parameters. The SeaTunnel real-time data integration pipeline plug-in model used can be expressed as formula (2):

$$\mathrm{FlinkPlugin}() = \left\{\, FPE,\ \{FPI_i\}_{i=1}^{n_1},\ \{FPT_j\}_{j=1}^{n_2},\ \{FPO_k\}_{k=1}^{n_3} \,\right\},\quad FPT \in \{\mathrm{Protocol},\ \mathrm{SQL},\ \mathrm{Filter},\ \mathrm{Window}\} \tag{2}$$

wherein FlinkPlugin() is the real-time data integration pipeline plug-in model, $n_1$ is the number of input module plug-ins, $n_2$ is the number of conversion module plug-ins, $n_3$ is the number of output module plug-ins, FPE is the execution environment module plug-in, FPI is an input module plug-in, FPT is a conversion module plug-in, FPO is an output module plug-in, Protocol is the protocol conversion plug-in, SQL is the SQL conversion plug-in, Filter is the data filtering plug-in, Window is the window calculation plug-in, and { } denotes a plug-in set.
It is understood that the DAG (directed acyclic graph) -based drag manner refers to constructing and editing a dataflow graph of tasks through drag operations. A DAG is a graph structure in which nodes represent tasks or operations and edges represent dependencies between tasks. In the DAG-based drag-and-drop approach, users can interact through a graphical interface (GUI) to visually create and edit a dataflow graph of tasks. This approach is typically used in a visualization data processing tool or task orchestration tool, where a user can define input, output, and conversion relationships of tasks through simple drag and drop operations, without having to write complex code.
The general flow of the DAG-based drag-and-drop mode is as follows. Preparing task components: task components are the available operation or task elements that a user can drag to the workspace. Creating tasks: the user selects a desired task from the task components and drags it into the workspace. Connecting tasks: the user builds dependencies between tasks by dragging connection lines; a connection line represents the flow direction of the data stream, from the output of one task to the input of another. Configuring tasks: the user configures the parameters and attributes of a task by double-clicking the task node or using other interaction means, which may include setting the execution environment, selecting input sources, defining conversion operations, specifying output targets, and so forth. Adjusting task order: the user adjusts the order of task nodes by dragging them, thereby controlling the execution order of the tasks. Verifying and running tasks: once the dataflow graph of the task is complete, the user can verify it to ensure that its structure and dependencies are correct, then execute the task and observe its output. The DAG-based drag-and-drop approach provides an intuitive and interactive way of constructing tasks that allows even non-programming professionals to easily create and manage data processing tasks; it improves the efficiency of task development and reduces the complexity of writing and debugging code.
Specifically, when performing plug-in configuration on the SeaTunnel real-time data integration pipeline, plug-in instances and parameter configurations are dynamically selected based on the DAG dragging mode, for example: the execution environment plug-in is selected as a Flink execution environment plug-in, the input plug-in is selected as a Kafka real-time input source plug-in, the conversion plug-in is selected as a Filter conversion plug-in, the output plug-in is selected as a MySQL target output source plug-in or a Phoenix target output source plug-in, and the plug-in parameter information comprises execution environment parameters, a Mapping data protocol, filtering conditions and target output table parameters. Through this SeaTunnel real-time integration plug-in configuration, the method is compatible with various heterogeneous data sources and suitable for Flink multi-service real-time integration.
S16, according to a SeaTunnel real-time integrated plug-in combination model, combining and arranging integrated plug-ins in a pipeline module composition of a SeaTunnel real-time data integrated pipeline; the composition arrangement includes an associative combination of an input plug-in, a conversion plug-in, and an output plug-in.
It can be understood that combining and arranging the SeaTunnel real-time data integration plug-ins means combining and arranging the relationships of the three module plug-ins (input plug-ins, conversion plug-ins and output plug-ins) in the real-time data integration pipeline, wherein the SeaTunnel real-time integration plug-in combination model can be expressed as formula (3):

$$\mathrm{FlinkCombine}() = \left\{\, C_{n_1}^{a_1}(FPI),\ C_{n_2}^{a_2}(FPT),\ C_{n_3}^{1}(FPO) \,\right\},\quad a_1 \le n_1,\ a_2 \le n_2 \tag{3}$$

wherein FlinkCombine() is the real-time integrated plug-in combination model, $C_{n_3}^{1}(FPO)$ selects one output plug-in from the $n_3$ output module plug-ins, $C_{n_1}^{a_1}(FPI)$ selects $a_1$ input plug-ins from the $n_1$ input module plug-ins, $C_{n_2}^{a_2}(FPT)$ selects $a_2$ conversion plug-ins from the $n_2$ conversion module plug-ins, FPI is a Flink input plug-in, FPT is a Flink conversion plug-in, and FPO is a Flink output plug-in. Specifically, to combine and arrange the SeaTunnel real-time integration plug-ins, the output module plug-in FPO is first taken as the marking center, selection and combination are then performed among the input module plug-ins FPI and the conversion module plug-ins FPT, and the three combined plug-ins are arranged in relation by means of DAG node connection. Through the combination arrangement of the SeaTunnel real-time integration plug-ins, low-code data development is realized and the Flink real-time task construction process is simplified.
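The combination arrangement of formula (3) amounts to building DAG edges from the chosen input plug-ins through the chosen conversion plug-ins to the anchor output plug-in; the following is a hypothetical sketch of that edge-building step, not the platform's actual orchestration code:

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PipelineArranger {

    // DAG edges as adjacency lists: node id -> downstream node ids.
    public static Map<String, List<String>> arrange(List<String> inputs,
                                                    List<String> transforms,
                                                    String output) {
        Map<String, List<String>> edges = new LinkedHashMap<>();
        // Each input feeds the first conversion; conversions chain; the last feeds the output.
        String head = transforms.isEmpty() ? output : transforms.get(0);
        for (String in : inputs) {
            edges.computeIfAbsent(in, k -> new ArrayList<>()).add(head);
        }
        for (int i = 0; i + 1 < transforms.size(); i++) {
            edges.computeIfAbsent(transforms.get(i), k -> new ArrayList<>())
                 .add(transforms.get(i + 1));
        }
        if (!transforms.isEmpty()) {
            edges.computeIfAbsent(transforms.get(transforms.size() - 1), k -> new ArrayList<>())
                 .add(output);
        }
        return edges;
    }
}

For example, arrange(List.of("kafka"), List.of("filter"), "phoenix") yields the edges kafka -> filter -> phoenix.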
S17, according to a Kafka data analysis flow model, the Kafka real-time streaming data of the Json nested structure is analyzed by configuring and utilizing Mapping protocol analysis rules in a Kafka real-time input source plug-in of the Flink input module, and the Kafka real-time streaming data is converted into streaming data of a plurality of single-layer structures.
It can be understood that after step S15 is completed, a protocol parsing rule is further configured for the Kafka real-time input source plug-in: according to the Mapping-based data protocol parsing template, a Mapping protocol parsing rule is configured for the Kafka real-time input source plug-in to parse the List+Map-based Json format data defined above into single-layer, simple-structure streaming data. The Kafka data analysis flow model can be expressed as formula (4):

$$\mathrm{KafkaDecode}(): \{\mathrm{Json}_1,\dots,\mathrm{Json}_k\} \xrightarrow{\ \mathrm{Mapping}\ } \{\mathrm{Single}_1,\dots,\mathrm{Single}_m\} \tag{4}$$

wherein KafkaDecode() is the Kafka data parsing process, $n_1$ is the number of input module plug-ins, $k$ is the number of Json format data items of a single plug-in, $m$ is the number of parsed data items, Mapping is the protocol parsing rule, Single is a single-layer data structure, the arrow indicates the data parsing direction, and { } denotes a data structure composition.
S18, configuring a data storage mode in the MySQL target output source plug-in or Phoenix target output source plug-in of the Flink output module according to the SeaTunnel data storage model; the data storage mode is one of a detail model DM, a primary key model PKM, or an aggregation model AM. The SeaTunnel data storage model can be expressed as formula (5):

$$\mathrm{FlinkStore}(T) = \left\{\, a_1 \cdot DM,\ a_2 \cdot PKM,\ a_3 \cdot AM \,\right\}_{n_3},\quad a_1 + a_2 + a_3 = 1 \tag{5}$$

wherein FlinkStore() is the output table storage model, the output table T is of MySQL type or Phoenix type, $n_3$ is the number of output module plug-ins, $a_1, a_2, a_3$ are integer parameters with $a_1 + a_2 + a_3 = 1$ (so exactly one mode is selected), DM is the detail model, PKM is the primary key model, AM is the aggregation model, and { } denotes the target table type composition.
It will be appreciated that it is then also necessary to select a data storage schema for the target output plug-in Phoenix/MySQL. Specifically, through an interface visual operation mode, one data storage mode of a detail model, a primary key model and an aggregation model is selected from the target output plug-in.
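The three storage modes and one possible write-statement consequence for a MySQL-type target table are sketched below; the SQL shown is an assumption made for illustration, not the statements the platform actually generates:

public enum StorageModel {
    DETAIL,       // DM: append-only detail rows
    PRIMARY_KEY,  // PKM: upsert by primary key
    AGGREGATE;    // AM: pre-aggregated rows, merged on key

    // Illustrative write statement for a MySQL-type target table.
    public String writeSql(String table, String columns, String placeholders, String updates) {
        String insert = "INSERT INTO " + table + " (" + columns + ") VALUES (" + placeholders + ")";
        return switch (this) {
            case DETAIL -> insert;
            case PRIMARY_KEY -> insert + " ON DUPLICATE KEY UPDATE " + updates;
            case AGGREGATE -> insert + " ON DUPLICATE KEY UPDATE " + updates; // e.g. col = col + VALUES(col)
        };
    }
}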
S19, constructing a SeaTunnel real-time task configuration template according to the task parameter configuration template based on the Flink.
It can be understood that after the data storage mode selection is completed, a SeaTunnel real-time task configuration template is also required to be constructed, and according to the SeaTunnel real-time data integration design concept and in combination with the task configuration standard specification, a task parameter configuration template based on Flink is designed, which can be specifically shown as follows:
{env{job-name="task-seatunnel-flink-0-0001",parallelism=1},source{Kafka{}},transform{filter{},sql{}},sink{Phoenix{}}}.
wherein env{…} represents the Flink execution environment module configuration, job-name is the application name, parallelism is the data processing parallelism, source{…} represents the input module plug-in configuration, transform{…} represents the conversion module plug-in configuration, sink{…} represents the output module plug-in configuration, and Kafka, filter, sql and Phoenix are plug-in definition identifiers.
S20, according to the SeaTunnel real-time Task configuration template, in the pipeline module composition of the SeaTunnel real-time data integration pipeline, an input plug-in in the Flink input module, a conversion plug-in in the Flink conversion module and an output plug-in in the Flink output module are packaged into a real-time Task entity Task. The SeaTunnel real-time task configuration template used can be expressed as formula (6):

$$\mathrm{FlinkTask}(): \left\{\, \{FPI_i\}_{i=1}^{n_1},\ \{FPT_j\}_{j=1}^{n_2},\ \{FPO_k\}_{k=1}^{n_3} \,\right\} \xrightarrow{\ \mathrm{Template}\ } \mathrm{Task} \tag{6}$$

wherein FlinkTask() represents the real-time task configuration template, $n_1$ is the number of input module plug-ins, $n_2$ is the number of conversion module plug-ins, $n_3$ is the number of output module plug-ins, FPI is an input module plug-in, FPT is a conversion module plug-in, FPO is an output module plug-in, Template is the Flink Task configuration template, Task is the real-time Task entity, and the arrow indicates the task conversion direction.
It can be understood that after the above steps are completed, the four types of integration plug-ins in the SeaTunnel real-time data integration pipeline, that is, the execution environment plug-in configuration in the Flink execution environment module, the input plug-in configuration in the Flink input module, the conversion plug-in configuration in the Flink conversion module and the output plug-in configuration in the Flink output module, are converted, using the Flink-based task parameter configuration template, into the SeaTunnel real-time Task entity Task and its configuration.
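Steps S19 and S20 thus amount to rendering the plug-in configurations into the configuration template text shown above; a hypothetical rendering helper might look like this (a sketch, not actual SeaTunnel code):

import java.util.Map;
import java.util.stream.Collectors;

public class SeaTunnelTaskRenderer {

    // Renders env/source/transform/sink plug-in configurations into the template text.
    public static String render(Map<String, String> env,
                                String sourcePlugin, Map<String, String> sourceConf,
                                String transformPlugin, Map<String, String> transformConf,
                                String sinkPlugin, Map<String, String> sinkConf) {
        return "env{" + kv(env) + "},"
             + "source{" + sourcePlugin + "{" + kv(sourceConf) + "}},"
             + "transform{" + transformPlugin + "{" + kv(transformConf) + "}},"
             + "sink{" + sinkPlugin + "{" + kv(sinkConf) + "}}";
    }

    private static String kv(Map<String, String> conf) {
        return conf.entrySet().stream()
                .map(e -> e.getKey() + "=\"" + e.getValue() + "\"")
                .collect(Collectors.joining(","));
    }
}

For instance, render(Map.of("job-name", "task-seatunnel-flink-0-0001"), "Kafka", Map.of(), "filter", Map.of(), "Phoenix", Map.of()) produces text in the same shape as the template above.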
S21, according to a SeaTunnel target table data structure analysis model, module parameter analysis and conversion processing are carried out in Task configuration of a real-time Task entity Task, and a target table data structure is obtained; the module parameter analysis and conversion processing comprises the steps of obtaining Mapping protocol analysis rules of Kafka real-time source input plugins in an input module, analyzing data fields and types of the input module, obtaining parameter configuration of filter conversion plugins or sql plugins in a conversion module, analyzing data fields and types of the conversion module, and simultaneously converting the data fields and the types according to a SeaTunnel real-time integration plugin combination model to obtain a target table data structure of a final output module.
It can be understood that the three plug-in configurations in the real-time Task entity Task are analyzed by using the SeaTunnel target table data structure analysis model together with the SeaTunnel real-time integrated plug-in combination model and converted into a target table data structure, and the target table creation statement is automatically spliced, thereby realizing automatic creation of the SeaTunnel target table and automatic change of the target table structure. The defined SeaTunnel target table data structure analysis model can be expressed as formula (7):

$$\mathrm{FlinkStruct}(): \mathrm{FlinkCombine}\left(\{FSI_i\}_{i=1}^{n_1},\ \{FST_j\}_{j=1}^{n_2},\ \{FSO_k\}_{k=1}^{n_3}\right) \longrightarrow \mathrm{FOTS} \tag{7}$$

wherein FlinkStruct() represents the data structure analysis model, FlinkCombine is the SeaTunnel real-time integration plug-in combination model, $n_1$ is the number of input module plug-ins, $n_2$ is the number of conversion module plug-ins, $n_3$ is the number of output module plug-ins, FSI is an input module plug-in configuration, FST is a conversion module plug-in configuration, FSO is an output module plug-in configuration, FOTS is the target table data structure, and the arrow indicates the analysis direction of the data structure.
Specifically, in the configuration of the real-time Task entity Task, mapping protocol analysis rules in a Kafka real-time source input plug-in of an input module are obtained, and data fields and types of the input module are analyzed; acquiring parameter configuration of a filter plug-in or an sql plug-in a conversion module, and analyzing data fields and types of the conversion module; and obtaining the target table name of the output plugin in the output module, and converting according to the real-time integrated plugin combination model to obtain a final target table data structure.
S22, according to the SeaTunnel real-time scheduling Task model, the Flink calculation engine, the Task configuration, the data storage mode, the target table data structure and the target table creation statement are combined and packaged into a SeaTunnel real-time scheduling Task instance, and SeaTunnel real-time Task scheduling of the real-time Task scene is completed.
Wherein, the defined SeaTunnel real-time scheduling task model can be expressed as formula (8):

$$\mathrm{SeaTunnel}() = \left\{\, \mathrm{Flink},\ \mathrm{Task},\ \mathrm{Store},\ \mathrm{FOTS},\ \mathrm{DDL} \,\right\}_{n_3} \tag{8}$$

wherein SeaTunnel() is the real-time scheduling task composition, $n_3$ is the number of output module plug-ins, Flink is the Flink real-time computing engine, Task is the task configuration, Store is the data storage mode, FOTS is the target table data structure, and DDL is the target table creation statement.
According to the multi-source heterogeneous real-time task processing method, WEB visual interface task configuration is provided through the design of the real-time data source management page and the target table metadata management page; the construction process of the real-time task configuration is efficiently completed through the design of links such as the Kafka protocol format standard, the Kafka data protocol analysis method, the SeaTunnel real-time data integration pipeline model, the SeaTunnel real-time integration plug-in types, the SeaTunnel execution environment plug-in, the SeaTunnel real-time input plug-in, the SeaTunnel real-time conversion plug-in, the SeaTunnel real-time output plug-in, the SeaTunnel real-time data integration pipeline plug-in configuration, the SeaTunnel pipeline plug-in combination arrangement, the SeaTunnel data storage model, the SeaTunnel target table data structure analysis, and the SeaTunnel real-time scheduling task construction.
Compared with the prior art, the technical scheme of the application builds multi-source heterogeneous real-time tasks based on SeaTunnel and creates a set of simple and efficient one-stop real-time task construction standards to reduce the technical difficulty of Flink task development; each link of real-time integration is decoupled by SeaTunnel plug-in modules, simplifying the Flink real-time data processing process; a unified standard protocol format is designed to eliminate the protocol differences between devices of different manufacturers; a unified protocol analysis specification is designed to improve multi-source heterogeneous real-time conversion capability; various real-time data source plug-ins are designed and dynamically arranged and configured in advance, solving the configuration problem of real-time tasks across multiple networks; and WEB visual interface task configuration enhances the user interaction experience. Finally, through this set of real-time task construction standards and specification techniques, the technical problem of real-time integration of cloud-edge equipment with multiple service types, multiple data protocols and multiple network types is solved.
In one embodiment, further, as shown in fig. 4, the above-mentioned multi-source heterogeneous real-time task processing method may further include such processing steps as:
S23, extracting the Mapping protocol analysis rule from the Kafka real-time input source plug-in configuration of the SeaTunnel real-time scheduling task instance;
S24, analyzing the data fields and types of the source input module according to the Mapping protocol analysis rule, and analyzing the parameters of the transform conversion module, to obtain the data structure of the final sink output module;
S25, analyzing the SeaTunnel target table data structure from the data structure of the sink output module and splicing the target table creation statement;
S26, using a JDBC connection to obtain the existing target table creation statement and judging whether the SeaTunnel target table exists, and if so, comparing the obtained target table creation statement with the data structure of the sink output module;
S27, if the comparison result determines that the SeaTunnel target table data structure is unchanged, keeping the SeaTunnel target table data structure unchanged.
It can be understood that, in order to automatically identify changes of the data structure and implement the automatic change operation, the data structure of the SeaTunnel target table may first be analyzed: according to the plug-in relationships dynamically arranged for the SeaTunnel real-time plug-ins above, the Mapping protocol of the Kafka real-time input source plug-in from step S17 is extracted, the data fields and types of the source input module are analyzed, and the plug-in parameters of the transform conversion module are analyzed, so as to obtain the data structure of the final sink output module target table.
Automatic creation of the SeaTunnel target table is then realized: the sink output module target table data structure is analyzed from the above steps, the creation statement DDL of the target table is spliced, and a JDBC connection is used to judge whether the SeaTunnel target table exists. If it exists, the obtained creation statement is compared with the data structure of the sink output module to judge whether the table structure has changed; if it has not changed, the table structure is kept unchanged. In this way the automatic update of the target table structure is completed for this case, changes of the data structure are automatically identified and the change operation is realized, reducing manual hard coding and improving the efficiency of Flink real-time integrated development.
Furthermore, the multi-source heterogeneous real-time task processing method can further comprise the following processing steps:
if it is judged over the Jdbc connection, according to the target table construction statement, that the SeaTunnel target table does not exist, automatically creating the SeaTunnel target table according to the spliced target table construction statement.
Specifically, after the target table construction statement (DDL) has been spliced, when the Jdbc connection determines that the SeaTunnel target table does not exist, the table can be created directly and automatically according to the SeaTunnel flow, so that automatic construction of the SeaTunnel target table is realized and the task processing efficiency is improved.
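As a concrete illustration of the existence check and automatic creation, here is a minimal Jdbc sketch; the spliced DDL string is assumed to be supplied by the statement-splicing step above, and the connection URL is a placeholder:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class TargetTableCreator {

    // Judge whether the target table exists via Jdbc metadata; if it is absent,
    // execute the spliced CREATE TABLE statement to create it automatically.
    static void ensureTable(String jdbcUrl, String tableName, String createTableDdl)
            throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            boolean exists;
            try (ResultSet rs = conn.getMetaData()
                    .getTables(null, null, tableName, new String[] {"TABLE"})) {
                exists = rs.next();                    // a row means the table exists
            }
            if (!exists) {
                try (Statement stmt = conn.createStatement()) {
                    stmt.execute(createTableDdl);      // automatic table construction
                }
            }
        }
    }
}
```

When the table does exist, the same metadata lookup is the natural entry point for the comparison branch of steps S26 and S27.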
Furthermore, the multi-source heterogeneous real-time task processing method can further comprise the following processing steps:
and if the comparison result determines that the SeaTunnel target table data structure has changed, changing the SeaTunnel target table data structure to the latest data structure corresponding to the target table construction statement.
Specifically, the obtained target table construction statement is compared with the data structure of the sink output module to judge whether the table structure has changed. If it has changed, the SeaTunnel target table data structure is changed to the latest data structure corresponding to the target table construction statement, so that the automatic change operation of the table structure is finally completed and the task processing efficiency is further improved.
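The change branch can be sketched in the same style. The sketch below reuses the hypothetical SinkColumn record from the earlier derivation sketch and assumes that additive column changes are sufficient, which is a simplification of the change operation described above:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TargetTableMigrator {

    // Compare the live table's columns with the latest derived structure and
    // add any missing columns so the target table follows the latest structure.
    static void syncColumns(Connection conn, String tableName, List<SinkColumn> target)
            throws SQLException {
        Set<String> existing = new HashSet<>();
        try (ResultSet rs = conn.getMetaData().getColumns(null, null, tableName, null)) {
            while (rs.next()) {
                existing.add(rs.getString("COLUMN_NAME").toLowerCase());
            }
        }
        try (Statement stmt = conn.createStatement()) {
            for (SinkColumn col : target) {
                if (!existing.contains(col.name().toLowerCase())) {
                    stmt.execute("ALTER TABLE " + tableName + " ADD COLUMN "
                            + col.name() + " " + col.sqlType());
                }
            }
        }
    }
}
```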
It should be understood that, although the steps in the flowcharts of fig. 1 and fig. 4 are displayed in the order indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the execution order of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in fig. 1 and fig. 4 may include a plurality of sub-steps or stages, which are not necessarily executed at the same moment but may be executed at different moments, and whose execution order is not necessarily sequential; they may be executed in turn or alternately with at least part of the other steps, or with sub-steps or stages of the other steps.
In one embodiment, as shown in fig. 5, a multi-source heterogeneous real-time task processing system 100 is provided, which includes a first configuration module 11, a second configuration module 12, a data encapsulation module 13, a protocol parsing module 14, a pipeline creation module 15, a plug-in configuration module 16, a plug-in orchestration module 17, a parsing configuration module 18, a storage selection module 19, a template construction module 20, a task entity module 21, a structure parsing module 22, and a task scheduling module 23. Wherein:
The first configuration module 11 is used for configuring a real-time data source management page of a task according to a real-time task scene of heterogeneous data sources; configuring the real-time data source management page comprises adding a Kafka real-time input source, a Phoenix target output source and a MySQL target output source in the page, storing the connection configuration information of the input source and the output sources, and completing the connectivity test and verification. The second configuration module 12 is used for configuring a target table metadata management page of the task; the configuration process comprises selecting the Phoenix target output source or the MySQL target output source, and pulling and storing the metadata information of the target output source, the metadata information comprising a data set, data fields, field types, the primary key, precision and length.
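For the connectivity test and verification mentioned above, one plausible realization (a sketch under assumptions, not the application's prescribed implementation) is to probe the Jdbc output sources and the Kafka input source directly; the URLs, credentials and timeout are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class ConnectivityTester {

    // Verify a Jdbc output source (MySQL or Phoenix) by opening a connection
    // and asking the driver whether it is valid within a 5-second timeout.
    static boolean testJdbc(String url, String user, String password) {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            return conn.isValid(5);
        } catch (Exception e) {
            return false;
        }
    }

    // Verify a Kafka input source by fetching the cluster id from the brokers.
    static boolean testKafka(String bootstrapServers) {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        try (AdminClient admin = AdminClient.create(props)) {
            admin.describeCluster().clusterId().get(5, TimeUnit.SECONDS);
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```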
The data encapsulation module 13 is used for encapsulating Kafka real-time streaming data in the Json data format with a nested structure based on List+Map, according to the standard template of the Kafka protocol format. The protocol parsing module 14 is used for parsing the Kafka real-time streaming data based on a preset Mapping data protocol, according to the Kafka data protocol parsing template; the composition of the Mapping data protocol includes the protocol field, field alias, field position and data type. The pipeline creation module 15 is used for creating a SeaTunnel real-time data integration pipeline according to the SeaTunnel real-time data integration pipeline model; the pipeline modules of the SeaTunnel real-time data integration pipeline include a Flink execution environment module, a Flink input module, a Flink conversion module and a Flink output module. The plug-in configuration module 16 is used for dynamically selecting integrated plug-ins in a DAG drag mode and configuring the plug-in parameter information within the pipeline module composition of the SeaTunnel real-time data integration pipeline, according to the SeaTunnel real-time data integration pipeline plug-in model; the integrated plug-ins include an execution environment plug-in, an input plug-in, a conversion plug-in and an output plug-in, wherein the execution environment plug-in is selected as the Flink execution environment plug-in, the input plug-in as the Kafka real-time input source plug-in, the conversion plug-in as the Filter conversion plug-in, and the output plug-in as the MySQL target output source plug-in or the Phoenix target output source plug-in; the plug-in parameter information includes execution environment parameters, the Mapping data protocol, filtering conditions and target output table parameters. The plug-in arrangement module 17 is used for combining and arranging the integrated plug-ins within the pipeline module composition of the SeaTunnel real-time data integration pipeline, according to the SeaTunnel real-time integrated plug-in combination model; the combination arrangement includes the associative combination of the input plug-in, the conversion plug-in and the output plug-in. The parsing configuration module 18 is used for configuring and applying the Mapping protocol analysis rules in the Kafka real-time input source plug-in of the Flink input module, according to the Kafka data parsing flow model, to parse the Kafka real-time streaming data of the Json nested structure and convert it into streaming data of a plurality of single-layer structures. The storage selection module 19 is used for configuring the data storage mode in the MySQL target output source plug-in or the Phoenix target output source plug-in of the Flink output module, according to the SeaTunnel data storage model; the data storage modes include a detail model, a primary key model and an aggregation model. The template construction module 20 is used for constructing a SeaTunnel real-time task configuration template according to the Flink-based task parameter configuration template.
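To make the List+Map nesting and the single-layer conversion concrete, the sketch below shows a plausible payload in that shape and its flattening; the field names (deviceId, ts, metrics) are hypothetical examples, not the protocol fields mandated by the application:

```java
import java.util.List;
import java.util.Map;

public class KafkaPayloadFlattener {

    // A nested List+Map payload as it might look after Json deserialization:
    // the outer Map carries shared header fields, the inner List holds per-point Maps.
    static final Map<String, Object> SAMPLE = Map.of(
            "deviceId", "dev-001",                     // hypothetical header field
            "ts", 1690770000000L,
            "metrics", List.of(                        // nested List of Maps
                    Map.of("name", "temperature", "value", 36.5),
                    Map.of("name", "pressure", "value", 101.3)));

    // Flatten one nested payload into several single-layer rows, copying the
    // header fields onto every element of the nested list.
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> flatten(Map<String, Object> payload) {
        var rows = new java.util.ArrayList<Map<String, Object>>();
        var metrics = (List<Map<String, Object>>) payload.get("metrics");
        for (Map<String, Object> metric : metrics) {
            var row = new java.util.HashMap<String, Object>();
            row.put("deviceId", payload.get("deviceId"));
            row.put("ts", payload.get("ts"));
            row.putAll(metric);                        // one single-layer structure
            rows.add(row);
        }
        return rows;
    }
}
```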
The Task entity module 21 is used for packaging, within the pipeline module composition of the SeaTunnel real-time data integration pipeline, the input plug-in of the Flink input module, the conversion plug-in of the Flink conversion module and the output plug-in of the Flink output module into a real-time Task entity Task, according to the SeaTunnel real-time task configuration template. The structure analysis module 22 is used for performing module parameter analysis and conversion processing in the Task configuration of the real-time Task entity Task according to the SeaTunnel target table data structure analysis model, to obtain the target table data structure; the module parameter analysis and conversion processing comprises obtaining the Mapping protocol analysis rules of the Kafka real-time source input plug-in of the input module and analyzing the data fields and types of the input module, obtaining the parameter configuration of the Filter conversion plug-in or Sql plug-in of the conversion module and analyzing the data fields and types of the conversion module, and converting the data fields and types according to the SeaTunnel real-time integrated plug-in combination model to obtain the target table data structure of the final output module. The Task scheduling module 23 is used for combining and packaging the Flink calculation engine, the Task configuration, the data storage mode, the target table data structure and the target table construction statement into a SeaTunnel real-time scheduling Task instance according to the SeaTunnel real-time scheduling Task model, so as to complete the SeaTunnel real-time Task scheduling of the real-time task scene.
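A rough shape for the Task entity and the scheduling instance is sketched below, again under loud assumptions: the record layouts, plug-in types and parameter keys are invented for illustration, and the hypothetical SinkColumn record from the earlier sketch is reused:

```java
import java.util.List;
import java.util.Map;

// Generic plug-in reference: plug-in type plus its parameter configuration.
record PluginConfig(String type, Map<String, String> params) {}

// Real-time Task entity packaging the input, conversion and output plug-ins.
record Task(PluginConfig input, PluginConfig conversion, PluginConfig output) {}

// Scheduling instance bundling the engine, Task configuration, storage mode,
// target table data structure and target table construction statement.
record ScheduleInstance(String engine, Task task, String storageMode,
                        List<SinkColumn> targetStructure, String createTableDdl) {}

public class TaskAssembler {
    public static void main(String[] args) {
        Task task = new Task(
                new PluginConfig("Kafka", Map.of("topic", "device-data")),
                new PluginConfig("Filter", Map.of("fields", "deviceId,ts,name,value")),
                new PluginConfig("Jdbc", Map.of("table", "ods_device_metric")));
        ScheduleInstance instance = new ScheduleInstance(
                "Flink", task, "primary-key-model",
                List.of(new SinkColumn("deviceId", "VARCHAR(64)"),
                        new SinkColumn("value", "DOUBLE")),
                "CREATE TABLE IF NOT EXISTS ods_device_metric (deviceId VARCHAR(64), value DOUBLE)");
        System.out.println(instance);      // the packaged real-time scheduling instance
    }
}
```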
The multi-source heterogeneous real-time task processing system 100 provides task configuration through a WEB visual interface by means of the design of the real-time data source management page and the target table metadata management page, and efficiently completes the construction of the real-time task configuration through the design of the Kafka protocol format standard, the Kafka data protocol parsing method, the SeaTunnel real-time data integration pipeline model, the SeaTunnel real-time integrated plug-in types, the SeaTunnel execution environment plug-in, the SeaTunnel real-time input plug-in, the SeaTunnel real-time conversion plug-in, the SeaTunnel real-time output plug-in, the SeaTunnel real-time data integration pipeline plug-in configuration, the SeaTunnel pipeline plug-in combination arrangement, the SeaTunnel data storage model, the SeaTunnel target table data structure analysis and the SeaTunnel real-time scheduling task construction.
Compared with the prior art, the multi-source heterogeneous real-time task processing system 100 achieves the same advantages as the method described above, through to solving the technical problem of real-time integration of cloud edge equipment with multiple service types, multiple data protocols and multiple network types.
In one embodiment, the multi-source heterogeneous real-time task processing system 100 can further include:
The rule extraction module is used for extracting the Mapping protocol analysis rules according to the Kafka real-time input source plug-in configuration of the SeaTunnel real-time scheduling task instance.
The structure acquisition module is used for analyzing the data fields and types of the source input module according to the Mapping protocol analysis rules, and for analyzing the parameters of the transform conversion module, to obtain the data structure of the final sink output module.
The statement splicing module is used for resolving the SeaTunnel target table data structure from the data structure of the sink output module and splicing the target table construction statement.
The structure comparison module is used for obtaining the target table construction statement over a Jdbc connection and judging whether the SeaTunnel target table exists, and if so, comparing the obtained target table construction statement with the data structure of the sink output module.
The structure processing module is used for maintaining the SeaTunnel target table data structure unchanged when the comparison result determines that it is unchanged.
In one embodiment, the structure comparison module is further configured to automatically create the SeaTunnel target table according to the spliced target table construction statement when it determines, over the Jdbc connection, that the SeaTunnel target table does not exist.
In one embodiment, the structure processing module is further configured to change the SeaTunnel target table data structure to the latest data structure corresponding to the target table construction statement when the comparison result determines that the SeaTunnel target table data structure has changed.
For specific limitations of the multi-source heterogeneous real-time task processing system 100, reference may be made to the corresponding limitations of the multi-source heterogeneous real-time task processing method above, which are not repeated here. The modules of the multi-source heterogeneous real-time task processing system 100 may be implemented in whole or in part by software, hardware or a combination thereof. The modules may be embedded in, or independent of, a device with a data processing function in hardware form, or stored in the memory of the device in software form, so that the processor can call and execute the operations corresponding to the modules; the device may be, but is not limited to, any of the various data computing and processing devices existing in the art.
In one embodiment, there is also provided a computer device including a memory and a processor, the memory storing a computer program, the processor implementing, when executing the computer program, the processing steps of the multi-source heterogeneous real-time task processing method of any of the embodiments described above, from configuring the real-time data source management page and the target table metadata management page through to packaging the SeaTunnel real-time scheduling Task instance that completes the SeaTunnel real-time task scheduling of the real-time task scene.
It will be appreciated that the above-mentioned computer device may include, besides the memory and the processor, other software and hardware components not listed in this specification, which are determined by the model of the specific computer device in the particular application scenario and are not enumerated in detail here.
In one embodiment, when executing the computer program, the processor may also implement the added steps or sub-steps of the embodiments of the multi-source heterogeneous real-time task processing method described above.
In one embodiment, there is also provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the processing steps of the multi-source heterogeneous real-time task processing method of any of the embodiments described above.
In one embodiment, the computer program, when executed by a processor, may also implement the added steps or sub-steps of the embodiments of the multi-source heterogeneous real-time task processing method described above.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by a computer program stored on a non-transitory computer readable storage medium, which, when executed, may include the flows of the above-described method embodiments. Any reference to the memory, storage, database or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. The volatile memory may include random access memory (RAM) or an external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus DRAM (RDRAM) and direct Rambus DRAM (DRDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features of the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered to be within the scope of this specification.
The foregoing examples represent only a few embodiments of the application; their description is specific and detailed, but is not to be construed as limiting the scope of the application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the concept of the application, and all of these fall within the protection scope of the application. The protection scope of the application shall therefore be subject to the appended claims.

Claims (10)

1. The multi-source heterogeneous real-time task processing method is characterized by comprising the following steps of:
according to the real-time task scene of the heterogeneous data source, configuring a real-time data source management page of the task; configuring the real-time data source management page comprises adding a Kafka real-time input source, a Phoenix target output source and a MySQL target output source in the real-time data source management page, storing connection configuration information of the input source and the output source, and completing connectivity test and verification;
Configuring a target table metadata management page of the task; the configuration process comprises the steps of selecting a Phoenix target output source or a MySQL target output source, and pulling and storing metadata information in the target output source, wherein the metadata information comprises a data set, a data field, a field type, a primary key, precision and length;
according to a standard template of the Kafka protocol format, adopting a nested structure based on List+Map and adopting a Json data format to package Kafka real-time streaming data;
analyzing the Kafka real-time streaming data based on a preset Mapping data protocol according to a Kafka data protocol analysis template; the composition of the Mapping data protocol comprises a protocol field, a field alias, a field position and a data type;
creating a SeaTunnel real-time data integration pipeline according to the SeaTunnel real-time data integration pipeline model; the pipeline module of the SeaTunnel real-time data integration pipeline comprises a Flink execution environment module, a Flink input module, a Flink conversion module and a Flink output module;
dynamically selecting an integrated plug-in and configuring plug-in parameter information based on a DAG dragging mode in a pipeline module composition of the SeaTunnel real-time data integration pipeline according to a SeaTunnel real-time data integration pipeline plug-in model; the integrated plug-in comprises an execution environment plug-in, an input plug-in, a conversion plug-in and an output plug-in, wherein the execution environment plug-in is selected as a Flink execution environment plug-in, the input plug-in is selected as a Kafka real-time input source plug-in, the conversion plug-in is selected as a Filter conversion plug-in, the output plug-in is selected as a MySQL target output source plug-in or a Phoenix target output source plug-in, and the plug-in parameter information comprises an execution environment parameter, a Mapping data protocol, filtering conditions and target output table parameters;
According to a SeaTunnel real-time integrated plug-in combination model, combining and arranging integrated plug-ins in a pipeline module composition of the SeaTunnel real-time data integrated pipeline; the combination arrangement comprises the steps of performing association combination on the input plug-in, the conversion plug-in and the output plug-in;
according to a Kafka data analysis flow model, configuring and utilizing Mapping protocol analysis rules to analyze Kafka real-time streaming data of a Json nested structure in a Kafka real-time input source plug-in of the Flink input module, and converting the Kafka real-time streaming data into streaming data of a plurality of single-layer structures;
according to a SeaTunnel data storage model, configuring a data storage mode in a MySQL target output source plug-in or a Phoenix target output source plug-in of the Flink output module; the data storage mode comprises a detail model, a primary key model and an aggregation model;
constructing a SeaTunnel real-time task configuration template according to the task parameter configuration template based on the Flink;
according to a SeaTunnel real-time Task configuration template, in the pipeline module composition of the SeaTunnel real-time data integration pipeline, an input plug-in a Flink input module, a conversion plug-in a Flink conversion module and an output plug-in a Flink output module are packaged into a real-time Task entity Task;
According to the SeaTunnel target table data structure analysis model, module parameter analysis and conversion processing are carried out in Task configuration of the real-time Task entity Task, and a target table data structure is obtained; the module parameter analysis and conversion processing comprises the steps of obtaining Mapping protocol analysis rules of Kafka real-time source input plugins in an input module, analyzing data fields and types of the input module, obtaining parameter configuration of filter conversion plugins or sql plugins in a conversion module, analyzing the data fields and types of the conversion module, and simultaneously converting the data fields and types according to the SeaTunnel real-time integration plugin combination model to obtain a target table data structure of a final output module;
and combining the Flink calculation engine, the Task configuration, the data storage mode, the target table data structure and the target table construction statement according to the SeaTunnel real-time scheduling Task model, and packaging the combination into a SeaTunnel real-time scheduling Task instance, so as to complete the SeaTunnel real-time Task scheduling of the real-time Task scene.
2. The multi-source heterogeneous real-time task processing method according to claim 1, further comprising the steps of:
extracting Mapping protocol analysis rules according to the Kafka real-time input source plug-in configuration of the SeaTunnel real-time scheduling task instance;
Analyzing the data field and the type of the source input module according to the Mapping protocol analysis rule, and analyzing the parameters of the transform conversion module to obtain the data structure of the final sink output module;
analyzing a SeaTunnel target table data structure according to the data structure of the sink output module and splicing a target table construction statement;
obtaining the target table construction statement by adopting Jdbc connection, judging whether a SeaTunnel target table exists, and if so, comparing the obtained target table construction statement with the data structure of the sink output module;
and if the comparison result determines that the SeaTunnel target table data structure is unchanged, maintaining the SeaTunnel target table data structure unchanged.
3. The multi-source heterogeneous real-time task processing method according to claim 2, further comprising the steps of:
if it is judged over the Jdbc connection, according to the target table construction statement, that the SeaTunnel target table does not exist, automatically creating the SeaTunnel target table according to the spliced target table construction statement.
4. The multi-source heterogeneous real-time task processing method according to claim 2, further comprising the steps of:
and if the comparison result determines that the SeaTunnel target table data structure is changed, changing the SeaTunnel target table data structure into the latest SeaTunnel target table data structure corresponding to the target table construction statement.
5. A multi-source heterogeneous real-time task processing system, comprising:
the first configuration module is used for configuring a real-time data source management page of a task according to a real-time task scene of the heterogeneous data source; configuring the real-time data source management page comprises adding a Kafka real-time input source, a Phoenix target output source and a MySQL target output source in the real-time data source management page, storing connection configuration information of the input source and the output source, and completing connectivity test and verification;
the second configuration module is used for configuring a target table metadata management page of the task; the configuration process comprises the steps of selecting a Phoenix target output source or a MySQL target output source, and pulling and storing metadata information in the target output source, wherein the metadata information comprises a data set, a data field, a field type, a primary key, precision and length;
the data encapsulation module is used for packaging the Kafka real-time streaming data by adopting a nested structure based on List+Map and adopting a Json data format according to a Kafka protocol format standard template;
the protocol analysis module is used for analyzing the Kafka real-time streaming data based on a preset Mapping data protocol according to the Kafka data protocol analysis template; the composition of the Mapping data protocol comprises a protocol field, a field alias, a field position and a data type;
The pipeline creation module is used for creating a SeaTunnel real-time data integration pipeline according to the SeaTunnel real-time data integration pipeline model; the pipeline module of the SeaTunnel real-time data integration pipeline comprises a Flink execution environment module, a Flink input module, a Flink conversion module and a Flink output module;
the plug-in configuration module is used for dynamically selecting integrated plug-ins and configuring plug-in parameter information based on a DAG dragging mode in the pipeline module composition of the SeaTunnel real-time data integrated pipeline according to a SeaTunnel real-time data integrated pipeline plug-in model; the integrated plug-in comprises an execution environment plug-in, an input plug-in, a conversion plug-in and an output plug-in, wherein the execution environment plug-in is selected as a Flink execution environment plug-in, the input plug-in is selected as a Kafka real-time input source plug-in, the conversion plug-in is selected as a Filter conversion plug-in, the output plug-in is selected as a MySQL target output source plug-in or a Phoenix target output source plug-in, and the plug-in parameter information comprises an execution environment parameter, a Mapping data protocol, filtering conditions and target output table parameters;
the plug-in arrangement module is used for carrying out combination arrangement on the integrated plug-ins in the pipeline module composition of the SeaTunnel real-time data integration pipeline according to a SeaTunnel real-time integrated plug-in combination model; the combination arrangement comprises the steps of performing association combination on the input plug-in, the conversion plug-in and the output plug-in;
The analysis configuration module is used for configuring and utilizing Mapping protocol analysis rules to analyze the Kafka real-time streaming data of the Json nested structure in the Kafka real-time input source plug-in of the Flink input module according to the Kafka data analysis flow model, and converting the Kafka real-time streaming data into streaming data of a plurality of single-layer structures;
the storage selection module is used for configuring a data storage mode in a MySQL target output source plug-in or a Phoenix target output source plug-in of the Flink output module according to a SeaTunnel data storage model; the data storage mode comprises a detail model, a primary key model and an aggregation model;
the template construction module is used for constructing a SeaTunnel real-time task configuration template according to the task parameter configuration template based on the Flink;
and the Task entity module is used for packaging an input plug-in the Flink input module, a conversion plug-in the Flink conversion module and an output plug-in the Flink output module into a real-time Task entity Task in the pipeline module composition of the SeaTunnel real-time data integration pipeline according to the SeaTunnel real-time Task configuration template.
The structure analysis module is used for carrying out module parameter analysis and conversion processing in Task configuration of the real-time Task entity Task according to a SeaTunnel target table data structure analysis model to obtain a target table data structure; the module parameter analysis and conversion processing comprises the steps of obtaining Mapping protocol analysis rules of Kafka real-time source input plugins in an input module, analyzing data fields and types of the input module, obtaining parameter configuration of filter conversion plugins or sql plugins in a conversion module, analyzing the data fields and types of the conversion module, and simultaneously converting the data fields and types according to the SeaTunnel real-time integration plugin combination model to obtain a target table data structure of a final output module;
and the Task scheduling module is used for combining the Flink calculation engine, the Task configuration, the data storage mode, the target table data structure and the target table construction statement according to the SeaTunnel real-time scheduling Task model, and packaging the combined result into a SeaTunnel real-time scheduling Task instance to complete the SeaTunnel real-time Task scheduling of the real-time Task scene.
6. A multi-source heterogeneous real time task processing system as claimed in claim 5, further comprising:
the rule extraction module is used for extracting Mapping protocol analysis rules according to the Kafka real-time input source plug-in configuration of the SeaTunnel real-time scheduling task instance;
the structure acquisition module is used for analyzing the data field and the type of the source input module according to the Mapping protocol analysis rule and analyzing the parameters of the transformation conversion module to acquire the data structure of the final sink output module;
the statement splicing module is used for analyzing a SeaTunnel target table data structure according to the data structure of the sink output module and splicing target table construction statement;
the structure comparison module is used for acquiring the target table construction statement by adopting Jdbc connection and judging whether a SeaTunnel target table exists, if so, comparing the acquired target table construction statement with the data structure of the sink output module;
And the structure processing module is used for maintaining the SeaTunnel target table data structure unchanged when the comparison result determines that the SeaTunnel target table data structure is unchanged.
7. The system of claim 6, wherein the structure comparison module is further configured to automatically create the SeaTunnel target table based on the spliced target table construction statement when it is determined over the Jdbc connection, based on the target table construction statement, that the SeaTunnel target table does not exist.
8. The system according to claim 6, wherein the structure processing module is further configured to change the SeaTunnel target table data structure to the latest SeaTunnel target table data structure corresponding to the target table construction statement when the comparison result determines that the SeaTunnel target table data structure is changed.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the multi-source heterogeneous real-time task processing method of any of claims 1 to 4 when the computer program is executed.
10. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor realizes the steps of the multi-source heterogeneous real-time task processing method of any of claims 1 to 4.
CN202310952410.5A 2023-07-31 2023-07-31 Multi-source heterogeneous real-time task processing method, system, equipment and medium Pending CN116954607A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310952410.5A CN116954607A (en) 2023-07-31 2023-07-31 Multi-source heterogeneous real-time task processing method, system, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310952410.5A CN116954607A (en) 2023-07-31 2023-07-31 Multi-source heterogeneous real-time task processing method, system, equipment and medium

Publications (1)

Publication Number Publication Date
CN116954607A true CN116954607A (en) 2023-10-27

Family

ID=88444374

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310952410.5A Pending CN116954607A (en) 2023-07-31 2023-07-31 Multi-source heterogeneous real-time task processing method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN116954607A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117370315A (en) * 2023-12-04 2024-01-09 成都数之联科技股份有限公司 Multi-type data source acquisition and warehousing method, device, equipment and medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination