CN115357309B - Data processing method, device, system and computer readable storage medium - Google Patents

Publication number
CN115357309B
Authority
CN
China
Prior art keywords
data processing
data
target
model
debugging
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211304175.2A
Other languages
Chinese (zh)
Other versions
CN115357309A (en
Inventor
胡建宇
何文杰
陈飞
陈紫良
胡文广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd
Priority to CN202211304175.2A
Publication of CN115357309A
Application granted
Publication of CN115357309B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44505 Configuring for program initiating, e.g. using registry, configuration files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the application discloses a data processing method, which comprises the following steps: obtaining a target data processing model based on user operations on a plurality of data processing modules on a canvas, wherein each data processing module has data processing logic for processing data to realize a corresponding function; submitting the target data processing model to a debugging node for debugging, and obtaining an intermediate debugging result for each data processing module and a target debugging result for the target data processing model; and submitting the target data processing model to an operation node under the condition that the intermediate debugging result and the target debugging result represent that the target data processing model is successfully debugged. In this way, the correctness of the target data processing model can be verified quickly and accurately based on the intermediate debugging result for each data processing module and the target debugging result for the target data processing model. The embodiment of the application also discloses a data processing device, a system and a computer readable storage medium.

Description

Data processing method, device, system and computer readable storage medium
Technical Field
The present application relates to data processing technology in the field of big data processing, and in particular, to a data processing method, apparatus, system, and computer readable storage medium.
Background
Flink is a computing framework for real-time processing of big data and is used for performing stateful computation on data streams. Generally, after Flink acquires a data stream, the data stream is processed according to pre-configured data processing logic. However, after the data processing logic is configured through Flink, a large number of complicated operations and a long time are required to verify the correctness of the data processing logic and of the processing result, which results in long time consumption and poor accuracy.
Disclosure of Invention
In order to solve the above technical problems, it is desirable to provide a data processing method, apparatus, system and computer readable storage medium that solve the problems of long time consumption and poor accuracy in big data development, which arise because a large number of complicated operations and a long time are required to verify the correctness of data processing logic and processing results.
The technical scheme of the application is realized as follows:
a method of data processing, the method comprising:
obtaining a target data processing model based on user operation of a plurality of data processing modules on the canvas; the data processing module is provided with data processing logic for processing data to realize corresponding functions;
submitting the target data processing model to a debugging node for debugging, and obtaining an intermediate debugging result for each data processing module and a target debugging result for the target data processing model;
and submitting the target data processing model to an operation node under the condition that the intermediate debugging result and the target debugging result represent that the target data processing model is successfully debugged.
In the above scheme, the obtaining the target data processing model based on the operation of the user on the plurality of data processing modules on the canvas includes:
determining a plurality of data processing modules based on user operations on the canvas;
acquiring target configuration data for each data processing module for realizing a target function;
and obtaining the target data processing model based on the data processing module and the target configuration data.
In the above scheme, the operation node is configured to process the target data according to the target data processing model.
In the above scheme, the debugging node includes a Flink node, and the operation node includes a Yet Another Resource Negotiator (YARN) node.
In the above solution, the obtaining the target configuration data for each data processing module for implementing the target function includes:
acquiring first configuration data for each data processing module;
receiving a determining instruction for the first configuration data, and determining intermediate configuration data for each data processing module from configuration data to be selected based on the determining instruction;
obtaining second configuration data of each data processing module based on the intermediate configuration data and the first configuration data;
and determining the second configuration data as the target configuration data of each data processing module under the condition that the second configuration data is determined to pass syntax parsing.
In the above scheme, the determining a plurality of data processing modules based on the operation of the user on the canvas includes:
determining a first data processing module representing a data source, a second data processing module having a data processing function and a third data processing module representing a storage location of the processed data based on a user operation on the canvas; wherein the data processing module comprises the first data processing module, the second data processing module and the third data processing module;
correspondingly, acquiring the first configuration data for each data processing module includes:
receiving first sub-configuration data for the first data processing module, second sub-configuration data for the second data processing module, and third sub-configuration data for the third data processing module; wherein the first configuration data includes the first sub-configuration data, the second sub-configuration data, and the third sub-configuration data.
In the above scheme, the method further comprises:
determining first abnormal data in the second configuration data under the condition that the second configuration data is determined to fail syntax parsing;
and determining a first abnormality cause and a first optimization scheme based on the first abnormality data and displaying the first abnormality cause and the first optimization scheme.
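The syntax check and abnormality report described above can be illustrated with a minimal sketch. It assumes the configuration data is an SQL statement and uses Python's built-in `sqlite3` parser purely as a stand-in for whatever parser the system actually uses; the function name `check_syntax`, the `events` table and the wording of the suggestion are hypothetical.

```python
import sqlite3


def check_syntax(sql: str, schema_ddl: str):
    """Return None if `sql` parses, else a dict describing the abnormality."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)   # create the tables the statement refers to
        conn.execute("EXPLAIN " + sql)   # compile the statement without running the query
        return None
    except sqlite3.Error as exc:
        return {
            "abnormal_data": sql,        # the configuration data that failed to parse
            "cause": str(exc),           # parser message, shown as the abnormality cause
            "suggestion": "Review the statement near the reported token and resubmit.",
        }
    finally:
        conn.close()


# Hypothetical schema and statements, for illustration only.
SCHEMA = "CREATE TABLE events (src_ip TEXT, bytes INTEGER);"
ok = check_syntax("SELECT src_ip FROM events WHERE bytes > 100", SCHEMA)
bad = check_syntax("SELEC src_ip FROM events", SCHEMA)
```

Here `ok` is `None`, so the statement could be accepted as target configuration data, while `bad` carries the failing statement and the parser's message for display to the user.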
In the above solution, the obtaining a target data processing model based on the data processing module and the target configuration data includes:
obtaining an initial data processing model based on each data processing module, a first association relation among the data processing modules and each target configuration data;
carrying out integrity check on the initial data processing model;
and determining the initial data processing model as the target data processing model under the condition of passing the integrity check.
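As a rough sketch of the model construction and integrity check, the following assumes that a model is a set of named modules plus directed associations between them, and that "integrity" means a source and a sink exist, every module is configured, and every module is connected. These criteria, and the names `Module`, `Model` and `integrity_errors`, are assumptions made for illustration; the application does not enumerate the checks.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    name: str
    kind: str          # "source", "process" or "sink" (hypothetical labels)
    config: str = ""   # target configuration data, e.g. an SQL statement


@dataclass
class Model:
    modules: dict = field(default_factory=dict)  # name -> Module
    edges: list = field(default_factory=list)    # (upstream, downstream) pairs


def integrity_errors(model: Model) -> list:
    """Collect human-readable problems; an empty list means the check passed."""
    errors = []
    kinds = {m.kind for m in model.modules.values()}
    for required in ("source", "sink"):
        if required not in kinds:
            errors.append(f"missing {required} module")
    for m in model.modules.values():
        if not m.config.strip():
            errors.append(f"module {m.name!r} has no configuration")
    connected = set()
    for up, down in model.edges:
        for end in (up, down):
            if end not in model.modules:
                errors.append(f"edge references unknown module {end!r}")
        connected.update((up, down))
    for name in model.modules:
        if len(model.modules) > 1 and name not in connected:
            errors.append(f"module {name!r} is not connected")
    return errors


initial = Model(
    modules={
        "src": Module("src", "source", "SELECT * FROM kafka_events"),
        "filter": Module("filter", "process", "SELECT * FROM src WHERE bytes > 100"),
        "out": Module("out", "sink", "INSERT INTO alerts SELECT * FROM filter"),
    },
    edges=[("src", "filter"), ("filter", "out")],
)
# The initial model becomes the target model only if the check passes.
target = initial if not integrity_errors(initial) else None
```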
In the above solution, before submitting the target data processing model to the debugging node for debugging and obtaining the intermediate debugging result for each data processing module and the target debugging result for the target data processing model, the method further includes:
checking the executability of the target data processing model;
correspondingly, submitting the target data processing model to the debugging node for debugging, and obtaining the intermediate debugging result for each data processing module and the target debugging result for the target data processing model, includes:
under the condition that the target data processing model is confirmed to be executable, submitting the target data processing model to the debugging node for analysis to obtain data processing logic;
and processing the debugging data based on the data processing logic to obtain the intermediate debugging result and the target debugging result.
In the above solution, submitting the target data processing model to the debug node for analysis to obtain data processing logic includes:
submitting the target data processing model to the debugging node for analysis to obtain a processing operator and an output operator in each data processing module; the processing operator is used for processing data, and the output operator is used for outputting a processing result of each data processing module;
determining a second association relation between the processing operators and a third association relation between the processing operators and the output operators;
determining first data processing logic for each of the data processing modules based on the processing operator and the output operator;
determining second data processing logic for the target data processing model based on the processing operator, the output operator, the second association, and the third association; wherein the data processing logic comprises the first data processing logic and the second data processing logic.
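The split between processing operators and output operators can be sketched as follows. This is a simplified illustration that assumes a linear chain of modules, so the association relations reduce to list order; the class `ModuleLogic`, the function `run_model` and the sample records are hypothetical and do not reflect how a Flink job graph is actually built.

```python
from typing import Callable


class ModuleLogic:
    """First data processing logic: one processing operator plus one output operator."""

    def __init__(self, name: str, process: Callable[[list], list]):
        self.name = name
        self.process = process   # processing operator
        self.captured = []       # filled in by the output operator

    def output(self, records: list) -> None:
        # Output operator: keeps a copy of this module's result for display.
        self.captured = list(records)

    def run(self, records: list) -> list:
        result = self.process(records)
        self.output(result)
        return result


def run_model(chain: list, debug_data: list):
    """Second data processing logic: the operators linked in chain order."""
    records = debug_data
    for module in chain:
        records = module.run(records)
    intermediate = {m.name: m.captured for m in chain}
    return intermediate, records


chain = [
    ModuleLogic("source", lambda rs: rs),
    ModuleLogic("filter", lambda rs: [r for r in rs if r["bytes"] > 100]),
    ModuleLogic("project", lambda rs: [{"src_ip": r["src_ip"]} for r in rs]),
]
debug_data = [
    {"src_ip": "10.0.0.1", "bytes": 50},
    {"src_ip": "10.0.0.2", "bytes": 500},
]
intermediate, target_result = run_model(chain, debug_data)
```

After the run, `intermediate` holds one result per module and `target_result` holds the final output of the whole chain, which corresponds to the intermediate and target debugging results respectively.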
In the above scheme, the method further comprises:
periodically acquiring first operation data, the intermediate debugging result and the target debugging result when the target data processing model is debugged through the debugging node;
displaying a first message for representing successful debugging of the target data processing model under the condition that the intermediate debugging result and the target debugging result represent successful debugging of the target data processing model;
analyzing the first operation data, the intermediate debugging result and the target debugging result to determine second abnormal data arising when the target data processing model is debugged, under the condition that the intermediate debugging result and the target debugging result represent that debugging of the target data processing model is abnormal, or that the target debugging result is not obtained within a target time threshold;
and determining a second abnormality cause and a second optimization scheme based on the second abnormality data and displaying the second abnormality cause and the second optimization scheme.
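The periodic acquisition with a time threshold can be sketched as a simple polling loop. The status dictionary with a `state` key, the function names `poll_debug` and `fake_job`, and the message texts are all assumptions for illustration; a real debugging node would expose its own status interface.

```python
import time


def poll_debug(fetch_status, timeout_s: float, interval_s: float = 0.01):
    """Poll `fetch_status` until a final state is reported or the threshold passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status()  # hypothetical: returns a dict with a "state" key
        if status["state"] == "success":
            return {"message": "debugging succeeded", "status": status}
        if status["state"] == "failed":
            return {"message": "debugging abnormal", "status": status}
        time.sleep(interval_s)
    return {"message": "no target debugging result within the time threshold", "status": None}


def fake_job(states):
    """Build a status function that replays `states` and then repeats the last one."""
    remaining = list(states)

    def fetch():
        state = remaining.pop(0) if len(remaining) > 1 else remaining[0]
        return {"state": state}

    return fetch


finished = poll_debug(fake_job(["running", "running", "success"]), timeout_s=1.0)
stuck = poll_debug(fake_job(["running"]), timeout_s=0.05)
```

In this sketch `finished` reports success on the third poll, while `stuck` falls through to the timeout branch, which is the case in which the operation data would be analyzed for the second abnormal data.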
In the above scheme, the method further comprises:
acquiring second operation data when the target data processing model is operated through the operation node;
analyzing the second operation data through a monitoring node to determine the operation state and the data processing condition of the target data processing model;
under the condition that the running state is normal, displaying the running state and the data processing condition;
under the condition that the running state is abnormal, analyzing the second running data and the data processing condition to obtain third abnormal data aiming at the running of the target data processing model;
determining a third abnormality cause and a third optimization scheme based on the third abnormality data and displaying the third abnormality cause and the third optimization scheme;
and processing the target data based on the target data processing model under the condition that the running state is abnormal and the data processing system meets the model recovery condition.
A data processing apparatus, the apparatus comprising:
an acquisition unit for acquiring a target data processing model based on user operations on a plurality of data processing modules on a canvas; the data processing module is provided with data processing logic for processing data to realize corresponding functions;
a first processing unit for submitting the target data processing model to a debugging node for debugging, and obtaining an intermediate debugging result for each data processing module and a target debugging result for the target data processing model;
and a second processing unit for submitting the target data processing model to an operation node under the condition that the intermediate debugging result and the target debugging result represent that the target data processing model is successfully debugged.
A data processing system, the system comprising: a processor, a memory, and a communication bus;
the communication bus is used for realizing communication connection between the processor and the memory;
the processor is configured to execute the data processing program in the memory, so as to implement the steps of the data processing method.
A computer readable storage medium storing one or more programs executable by one or more processors to implement the steps of any of the data processing methods described above.
A computer program product comprising instructions which, when executed by a computer, cause the computer to perform the steps of any of the data processing methods described above.
With the data processing method, device, system and computer readable storage medium provided by the embodiment of the application, a target data processing model is obtained based on user operations on a plurality of data processing modules on a canvas, wherein each data processing module has data processing logic for processing data to realize a corresponding function. The target data processing model is then submitted to a debugging node for debugging, and an intermediate debugging result for each data processing module and a target debugging result for the target data processing model are obtained. The target data processing model is submitted to an operation node under the condition that the intermediate debugging result and the target debugging result represent that the target data processing model is successfully debugged. In this way, the correctness of the target data processing model can be verified quickly and accurately based on the intermediate debugging result for each data processing module and the target debugging result for the target data processing model. This solves the problem in the related art that, after data processing logic is configured through Flink, a large number of complicated operations and a long time are required to verify the correctness of the data processing logic, resulting in long time consumption and poor accuracy.
Drawings
FIG. 1 is a schematic flowchart of a data processing method according to an embodiment of the present application;
FIG. 2 is a flowchart of another data processing method according to an embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating another data processing method according to an embodiment of the present disclosure;
FIG. 4 is a diagram of a canvas in a data processing system provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of filling first configuration data in a data processing method according to an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating planning of second configuration data in a data processing method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of syntax parsing in a data processing method according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an initial data processing model in a data processing system according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a debug result in a data processing method according to an embodiment of the present application;
FIG. 10 is a schematic diagram of another debug result in a data processing method according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a case of failing syntax parsing in a data processing method according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a further debugging result in a data processing method according to an embodiment of the present disclosure;
FIG. 13 is a schematic diagram of a debug result in another data processing method according to an embodiment of the present application;
FIG. 14 is a diagram of an operation and maintenance interface in a data processing system according to an embodiment of the present application;
FIG. 15 is a schematic diagram of a detail interface for a data processing model in a data processing system according to an embodiment of the present application;
FIG. 16 is a diagram of a data processing situation presentation interface in a data processing system according to an embodiment of the present application;
FIG. 17 is a schematic diagram of an exception log in a data processing method according to an embodiment of the present disclosure;
FIG. 18 is a schematic diagram of a plurality of templates in a data processing system according to an embodiment of the present application;
FIG. 19 is a schematic diagram of a non-timing correlation template in a data processing system according to an embodiment of the present application;
FIG. 20 is a diagram of a statistical template in a data processing system according to an embodiment of the present application;
FIG. 21 is a schematic diagram of a timing correlation template in a data processing system according to an embodiment of the present application;
FIG. 22 is a schematic diagram of a duplicate template in a data processing system according to an embodiment of the present application;
FIG. 23 is a flowchart of a data processing method according to another embodiment of the present disclosure;
FIG. 24 is a flowchart illustrating a method for debugging a target data processing model according to another embodiment of the present disclosure;
FIG. 25 is a schematic diagram of a data processing apparatus according to an embodiment of the present disclosure;
FIG. 26 is a schematic diagram of a data processing system according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
It should be appreciated that reference throughout this specification to "an embodiment of the present application" or "the foregoing embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrase "in an embodiment of the present application" or "in the foregoing embodiments" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In various embodiments of the present application, the sequence number of each process does not mean the sequence of execution, and the execution sequence of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application. The foregoing embodiment numbers of the present application are merely for describing, and do not represent advantages or disadvantages of the embodiments.
Unless otherwise specified, any step in the embodiments of the present application may be performed by an electronic device, and specifically by a processor of the electronic device. It is further noted that the embodiments of the present application do not limit the order in which the electronic device performs the following steps. In addition, the manner in which data is processed in different embodiments may be the same or different. It should be further noted that any step in the embodiments of the present application may be executed independently by the electronic device; that is, when the electronic device executes any step in the embodiments described below, it need not depend on the execution of other steps.
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
An embodiment of the present application provides a data processing method, which may be applied to a data processing system. Referring to FIG. 1, the method includes the following steps:
Step 101, obtaining a target data processing model based on user operations on a plurality of data processing modules on a canvas.
Wherein the data processing module has data processing logic for processing data to implement corresponding functions.
In the embodiment of the application, the data processing system may be composed of a master node, a debugging node and an operation node. The master node is mainly used for constructing the target data processing model; the debugging node is used for debugging the data processing model to determine the correctness of the target data processing model; the operation node is used for processing data based on the data processing model. Each node may be deployed on one or more servers. A front-end interface and a data factory may run on the master node. The front-end interface is the interface through which the user interacts with the system. In one possible approach, the front-end interface may include a canvas for big data development; the canvas includes a plurality of data processing modules for building a data processing model, and each data processing module has data processing logic for processing data to implement a corresponding function. The data processing model characterizes the overall data processing logic applied when processing data and is used for realizing the target function. Because the front-end interface presents visualized modules, a user who is unfamiliar with the underlying code can still configure the required data processing logic to realize the corresponding function. In addition, the user can input configuration data for each data processing module through the canvas to configure data processing rules, so that data processing logic is generated based on the data processing rules and the corresponding function is realized. The data factory is used for generating a data processing model based on the data processing modules and the configuration data for each data processing module; in other words, the data factory is a process for processing data, that is, a background process in the data processing system.
It should be noted that the debugging node and the operation node may be the same or different, which is not limited in the embodiment of the present application. Preferably, the debugging node may be a Flink node, and the operation node may be a Yet Another Resource Negotiator (YARN) node (YARN may be deployed locally or in a non-local environment, such as the cloud). In the prior art, both debugging and running of a target data processing model are carried out on a local or non-local YARN node; submitting the target data processing model to the YARN node for debugging incurs obvious network delay, the task starts slowly, and a scheduling process is needed before debugging can begin. The embodiment of the application can place the debugging and the running of the target data processing model on different nodes. Specifically, the target data processing model is debugged on a local Flink node (i.e. the debugging node), and after debugging succeeds it is submitted to a YARN node (i.e. the operation node), which uses the target data processing model to process data. Because the target data processing model is debugged on a local node, the task starts quickly; compared with the prior art, in which tasks are uniformly submitted to YARN nodes for both debugging and running, unnecessary flow and network transmission overhead are reduced, network delay is extremely low, and debugging is faster. Meanwhile, because debugging and running are placed on different nodes, they do not interfere with each other, which ensures the safety and reliability of the service environment.
In one implementation, a user may drag, configure, modify, etc., a plurality of data processing modules on a canvas to construct a target data processing model.
Step 102, submitting the target data processing model to a debugging node for debugging, and obtaining an intermediate debugging result aiming at each data processing module and a target debugging result aiming at the target data processing model.
In the embodiment of the application, the intermediate debugging result is a result of each data processing module after data processing in the debugging process; in one possible manner, the intermediate debug result for each data processing module may be a result output by each data processing module in the process of debugging the entire target data processing model through debug data, or may be a result output after each data processing module is independently debugged through debug data. The target debugging result is the final output result after the whole target data processing model is debugged through the debugging data.
In one implementation, after the target data processing model is built by the master node, the target data processing model may be submitted to the debugging node for debugging, to determine the correctness of the target data processing model.
Step 103, submitting the target data processing model to the operation node under the condition that the intermediate debugging result and the target debugging result represent that the target data processing model is successfully debugged.
The operation node is used for processing the data based on the data processing model.
In the embodiment of the application, whether the target data processing model is successfully debugged can be determined according to the intermediate debugging result for each data processing module and the target debugging result for the target data processing model; if the target data processing model is successfully debugged, the target data can be processed based on the target data processing model to realize the target function.
In the embodiment of the application, both the intermediate debugging result for each data processing module and the target debugging result for the target data processing model are obtained. Therefore, when debugging of the target data processing model fails, the model can be analyzed according to the intermediate debugging results and the target debugging result to locate the abnormal part of the model more accurately, so that the abnormal data processing module can be analyzed to obtain the cause of the abnormality. This makes abnormality localization more precise and reduces the difficulty of troubleshooting. In addition, because successful debugging is determined from both the intermediate debugging result for each data processing module and the target debugging result for the target data processing model, the determination that the model has been debugged successfully is more reliable.
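How per-module intermediate results narrow down an abnormality can be shown with a small sketch. It assumes the modules form an ordered chain and that each module has a simple pass/fail check on its intermediate result; the function `first_abnormal_module`, the non-empty-output check and the sample data are hypothetical.

```python
def first_abnormal_module(order, intermediate, checks):
    """Return the name of the first module whose intermediate result fails its check."""
    for name in order:
        if not checks[name](intermediate.get(name)):
            return name
    return None


def non_empty(rows):
    # Hypothetical check: a module is considered healthy if it produced any output.
    return bool(rows)


order = ["source", "filter", "sink"]
intermediate = {
    "source": [{"bytes": 50}, {"bytes": 500}],
    "filter": [],   # nothing came out of the filter
    "sink": [],     # so nothing reached the sink either
}
checks = {name: non_empty for name in order}
suspect = first_abnormal_module(order, intermediate, checks)
```

With only the final result, the empty sink output would say nothing about where the data was lost; with the intermediate results, `suspect` is `"filter"`, which points the user at that module's configuration.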
According to the data processing method provided by the embodiment of the application, the target data processing model is submitted to the debugging node for debugging so as to obtain the intermediate debugging result for each data processing module and the target debugging result for the target data processing model, the accuracy of the target data processing model can be rapidly and accurately verified based on the intermediate debugging result for each data processing model and the target debugging result for the data processing model, the time consumption is short, the accuracy is high, and the problems that after the data processing logic is configured through the Flink in the related technology, a large number of operations (such as breaking points in codes or complex operations such as analysis output logs) and a long time are needed to verify the accuracy of the data processing logic are solved, so that the time consumption is long and the accuracy is poor are solved.
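The overall flow of steps 101 to 103 can be summarized in a minimal sketch in which the model reaches the operation node only after the debugging node reports success. The classes `LocalDebugNode` and `RunNode`, the dictionary layout of the model and the trivial "debugging" rule are stand-ins invented for illustration; they do not call Flink or YARN.

```python
from dataclasses import dataclass


@dataclass
class DebugReport:
    intermediate_ok: dict  # module name -> whether its intermediate result is acceptable
    target_ok: bool        # whether the final result is acceptable

    @property
    def success(self) -> bool:
        return self.target_ok and all(self.intermediate_ok.values())


class LocalDebugNode:
    """Stand-in for the debugging node; a real one would run the model on debug data."""

    def debug(self, model: dict) -> DebugReport:
        # Toy rule: a module "passes" if it has non-empty configuration.
        ok = {name: bool(cfg) for name, cfg in model["modules"].items()}
        return DebugReport(intermediate_ok=ok, target_ok=all(ok.values()))


class RunNode:
    """Stand-in for the operation node; it only records what was submitted."""

    def __init__(self):
        self.submitted = []

    def submit(self, model: dict) -> None:
        self.submitted.append(model["name"])


def debug_then_submit(model: dict, debug_node, run_node) -> bool:
    report = debug_node.debug(model)
    if report.success:          # submit only after successful debugging
        run_node.submit(model)
    return report.success


run_node = RunNode()
good = {"name": "alerts", "modules": {"src": "SELECT 1", "out": "INSERT ..."}}
bad = {"name": "draft", "modules": {"src": "SELECT 1", "out": ""}}
good_ok = debug_then_submit(good, LocalDebugNode(), run_node)
bad_ok = debug_then_submit(bad, LocalDebugNode(), run_node)
```

Keeping the two node types behind separate objects mirrors the point made in the description that debugging and running do not interfere with each other.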
Based on the foregoing embodiments, an embodiment of the present application provides a data processing method, referring to fig. 2, including the following steps:
Step 201, the data processing system determines a plurality of data processing modules based on user operations on the canvas.
Wherein the data processing module has data processing logic for processing data to implement corresponding functions.
In one implementation, a user may determine a plurality of data processing modules currently desired by performing operations such as dragging, modifying, configuring, etc. on the plurality of data processing modules presented on the canvas.
Step 202, the data processing system obtains target configuration data for the data processing module for implementing the target function.
In the embodiment of the present application, the target function is the function that the user needs to implement. In a feasible manner, the target function may be acquiring abnormal data from network data, or acquiring events that meet a preset event rule from an event stream; it may also be determined according to actual service requirements. The target configuration data is the configuration data of each data processing module and is used for configuring data processing rules so as to refine the data processing logic. In one possible approach, the user may enter the target configuration data for each data processing module through the canvas.
Wherein, step 202 may be implemented by:
Step 202a, the data processing system obtains first configuration data for each data processing module.
In the embodiment of the present application, the first configuration data is configuration data of each data processing module; in one possible manner, a user may input first configuration data for each data processing module to the data processing system through the canvas, such that the first configuration data for each data processing module may be obtained; wherein the first configuration data may be a structured query language (Structured Query Language, SQL) statement input by a user through a SQL edit box of the canvas.
Step 202b, the data processing system receives the determining instruction for the first configuration data, and determines intermediate configuration data for each data processing module from the configuration data to be selected based on the determining instruction.
In an embodiment of the present application, the determining instruction is an instruction issued for the first configuration data and is used for determining the intermediate configuration data for each data processing module. The configuration data to be selected is configuration data related to the first configuration data, and may be information such as fields or keywords related to the first configuration data. The intermediate configuration data is the configuration data selected from the configuration data to be selected based on the determining instruction, and may likewise be information such as keywords or fields related to the first configuration data. In one possible manner, after determining the data processing modules, the data processing system may determine the keywords and fields corresponding to each data processing module and use them as that module's configuration data to be selected. After the first configuration data for each data processing module is acquired, the corresponding configuration data to be selected can then be displayed on the canvas; the user clicks the item needed, which sends a determining instruction to the data processing system; and after receiving the determining instruction, the data processing system determines the clicked configuration data to be selected as the intermediate configuration data of each data processing module.
Step 202c, the data processing system obtains the second configuration data of each data processing module based on the intermediate configuration data and the first configuration data.
In this embodiment of the present application, the second configuration data is configuration data obtained based on the intermediate configuration data and the first configuration data, and the second configuration data may be an SQL statement obtained by supplementing information such as an SQL statement and a keyword or a field related to the SQL statement, that is, the first configuration data is an incomplete SQL statement, and the second configuration data is an SQL statement obtained by supplementing the incomplete SQL statement based on information such as a keyword or a field related to the incomplete SQL statement. The second configuration data is determined based on the intermediate configuration data and the first configuration data, so that after the first configuration data is input by a user, information such as fields, keywords and the like related to the first configuration data is automatically supplemented, and the convenience of user operation is improved.
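The completion of an incomplete SQL statement described above can be illustrated with a minimal sketch. It is not the implementation of the application: the function names `candidate_fields` and `complete` are hypothetical, the field names are borrowed from the fig. 5 example of this description, and prefix matching on the last token is an assumed way of relating the configuration data to be selected to the first configuration data.

```python
def candidate_fields(partial_sql: str, known_fields: list[str]) -> list[str]:
    """Return the known fields that could complete the trailing token.

    Assumption: candidates are related to the input by simple prefix matching.
    """
    tokens = partial_sql.split()
    prefix = tokens[-1] if tokens and not partial_sql.endswith(" ") else ""
    return [f for f in known_fields if f.startswith(prefix)]


def complete(partial_sql: str, chosen: str) -> str:
    """Replace the trailing partial token with the field the user clicked."""
    if not partial_sql or partial_sql.endswith(" "):
        return partial_sql + chosen
    head, _, _ = partial_sql.rpartition(" ")
    return f"{head} {chosen}" if head else chosen


# Hypothetical inputs for illustration only.
fields = ["@a.intsy99", "@a.operation", "@a.time1"]
first = "SELECT @a.op"                      # first configuration data (incomplete)
options = candidate_fields(first, fields)   # configuration data to be selected
second = complete(first, options[0])        # second configuration data
print(second)
```

In this sketch the user's click corresponds to picking `options[0]`; a real canvas would pass whichever candidate was clicked.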
Step 202d, in the case where it is determined that the second configuration data passes parsing, the data processing system determines the second configuration data as the target configuration data of each data processing module.
In this embodiment of the present application, the second configuration data of each data processing module may be parsed by the data factory, and if the second configuration data of a certain data processing module passes the syntax parsing, the second configuration data of the certain data processing module is determined as the target configuration data.
In one possible approach, the data factory may parse the second configuration data based on the dynamic data management framework Calcite. Specifically, the second configuration data (i.e. the completed SQL statement) is converted into a structured query language parser (SQL Parser) object, i.e. a syntax analysis object; the parseStmtList method is then called to convert the completed SQL statement into an abstract syntax tree; the completed SQL statement is then split into individual nodes based on the abstract syntax tree; and finally a list of structured query language nodes (SQL Node) is formed. If all of these steps run normally, the second configuration data passes syntax parsing.
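The "passes if every step runs normally" criterion can be sketched as follows. This is a deliberately simplified stand-in and does not use Calcite: the tokenizer, the set of allowed statement heads and the parenthesis check are all assumptions made for illustration, and splitting on semicolons ignores semicolons inside string literals.

```python
import re

# Assumption: an illustrative subset of statement-leading keywords.
ALLOWED_HEADS = {"SELECT", "INSERT", "CREATE"}


def parse_stmt_list(sql: str) -> list[list[str]]:
    """Split SQL text into statements and each statement into token nodes.

    Raises ValueError at the first problem, which plays the role of
    "a parsing step failed to run normally".
    """
    statements = [s.strip() for s in sql.split(";") if s.strip()]
    if not statements:
        raise ValueError("empty SQL text")
    node_list = []
    for stmt in statements:
        tokens = re.findall(r"[A-Za-z_@][\w.@]*|\d+|[^\s\w]", stmt)
        if not tokens or tokens[0].upper() not in ALLOWED_HEADS:
            raise ValueError(f"unexpected statement start in: {stmt!r}")
        if tokens.count("(") != tokens.count(")"):
            raise ValueError("unbalanced parentheses")
        node_list.append(tokens)
    return node_list


def passes_syntax_parsing(sql: str) -> bool:
    """Return True only if every parsing step completes without error."""
    try:
        parse_stmt_list(sql)
        return True
    except ValueError:
        return False
```

A production check would delegate to a real SQL parser; the sketch only shows how success of the whole chain of steps maps to a pass or fail result.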
Step 203, the data processing system obtains a target data processing model based on the data processing module and the target configuration data.
In the embodiment of the application, the target data processing model is an overall data processing logic for representing processing data based on the data processing module and the target configuration data, that is, the target function can be realized by processing the data based on the target data processing model.
Step 204, the data processing system obtains an initial data processing model based on each data processing module, the first association relationship among the data processing modules, and each target configuration data.
In the embodiment of the application, a certain association relationship exists among a plurality of data processing modules for realizing the target function, and the first association relationship is the association relationship among each data processing module. The initial data processing model is a data processing model obtained based on each data processing module, a first association relationship among the data processing modules, and each target configuration data.
Step 205, the data processing system performs integrity check on the initial data processing model.
In the embodiment of the application, after the initial data processing model is obtained, the integrity of the initial data processing model is required to be checked to determine whether the initial data processing model has abnormality such as deletion; in one possible approach, the initial data processing model may be integrity checked by a data factory daemon to determine that the initial data processing model is not deficient in the necessary data to process the data.
Step 206, in the case of passing the integrity check, the data processing system determines the initial data processing model as the target data processing model.
In an embodiment of the present application, if the initial data processing model passes the integrity check, determining the initial data processing model as the target data processing model; if the initial data processing model fails the integrity check, a prompt message indicating that the initial data processing model fails the integrity check can be displayed on a canvas so that a user can learn the message; of course, the location, reason, and solution of the initial data processing model failing the integrity check may also be displayed on the canvas for reference by the user.
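As a rough sketch of what such an integrity check might look for, the following function reports missing module kinds, missing target configuration data and unconnected modules. The specific checks, the dictionary layout and the names `integrity_problems`, `kind` and `config` are assumptions for illustration; the application only states that the model must not lack data necessary for processing.

```python
def integrity_problems(modules: dict[str, dict],
                       edges: list[tuple[str, str]]) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    kinds = {m.get("kind") for m in modules.values()}
    for required in ("source", "process", "destination"):
        if required not in kinds:
            problems.append(f"missing {required} module")
    for name, module in modules.items():
        if not module.get("config"):
            problems.append(f"{name}: missing target configuration data")
    connected = {n for edge in edges for n in edge}
    for name in modules:
        if name not in connected:
            problems.append(f"{name}: not connected to any other module")
    for edge in edges:
        for n in edge:
            if n not in modules:
                problems.append(f"edge references unknown module {n}")
    return problems


# Hypothetical initial model: two modules configured, one left empty.
modules = {
    "kafka_in": {"kind": "source", "config": "topic=demo"},
    "flinksql": {"kind": "process", "config": ""},
    "kafka_out": {"kind": "destination", "config": "topic=result"},
}
edges = [("kafka_in", "flinksql"), ("flinksql", "kafka_out")]
print(integrity_problems(modules, edges))
```

Returning the problems as messages, rather than a bare boolean, matches the idea of showing the location and reason of a failed check on the canvas.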
Step 207, the data processing system verifies the executability of the target data processing model.
In the embodiment of the application, verifying the executability of the target data processing model means determining whether the debugging node that is to debug the target data processing model is currently available and whether the debugging node can satisfy the memory, execution conditions and other requirements of the target data processing model.
In other embodiments of the present application, a user may input debug data through the canvas and click a debug button to send a debug instruction instructing the data processing system to debug the target data processing model. After receiving the debug instruction, the data processing system can create a debug identifier, such as a new debug topic, debug table or debug index, to identify the current debug task; after creating the debug identifier, it then detects whether the debug node is available.
Step 208, in the case that it is determined that the target data processing model is executable, the data processing system submits the target data processing model to the debug node for analysis to obtain data processing logic.
In other embodiments of the present application, determining whether the target data processing model is executable means determining whether the debug node currently assigned to the target data processing model can debug it normally. The target data processing model is obtained based on the data processing modules realizing the target function, the association relationship among the data processing modules, and the target configuration data, and the data processing logic is implicit in it; the data processing logic can therefore be obtained by analyzing the target data processing model.
Step 209, the data processing system processes the debug data based on the data processing logic to obtain an intermediate debug result and a target debug result.
In the embodiment of the application, the debugging data is used for debugging the target data processing model to determine whether the target data processing model has an abnormality. In one possible manner, the user may input debug data through the canvas, or may determine a data source from the data processing module, determine debug data from the data source, and then process the debug data based on the data processing logic to obtain an intermediate debug result and a target debug result, thereby determining the accuracy of the target data processing model.
Step 210, the data processing system submits the target data processing model to the running node in case the intermediate debugging result and the target debugging result represent that the target data processing model is successfully debugged.
The running node is used for processing data based on the data processing model.
It should be noted that, in this embodiment, the descriptions of the same steps and the same content as those in other embodiments may refer to the descriptions in other embodiments, and are not repeated here.
According to the data processing method provided by the embodiment of the application, the target data processing model is submitted to the debugging node for debugging to obtain the intermediate debugging result for each data processing module and the target debugging result for the target data processing model. Based on these results, the accuracy of the target data processing model can be verified quickly and accurately. This solves the problem in the related art that, after data processing logic is configured through Flink, verifying its accuracy requires a large number of operations and a long time, which makes verification time-consuming and inaccurate.
Based on the foregoing embodiments, an embodiment of the present application provides a data processing method, referring to fig. 3, including the following steps:
Step 301, the data processing system determines a first data processing module characterizing a source of data, a second data processing module having data processing functionality, and a third data processing module characterizing a storage location of the processed data based on a user operation on the canvas.
The data processing module is provided with data processing logic for processing data to realize corresponding functions; the plurality of data processing modules in the data processing system comprise a first data processing module, a second data processing module and a third data processing module.
In an embodiment of the present application, the first data processing module is configured to configure a data source; the second data processing module is used for configuring data processing rules, triggering conditions and other information; the third data processing module is used for determining the storage place of the processed data.
In one possible manner, as shown in FIG. 4, the canvas of the data processing system may include data sources, data destinations, relational operators and conditions, and a plurality of pre-constructed templates, wherein the data source is the first data processing module, the relational operators and conditions are the second data processing module, and the data destination is the third data processing module. Multiple databases such as Kafka and the Hadoop Distributed File System (HDFS) can be preset among the data sources of the canvas, so that a user can pick a data source from the existing ones. Relational operators such as AND, OR and NOT, as well as options such as whether a single condition or multiple conditions must be satisfied, can be preset on the canvas for the user to select. Multiple templates, such as a common template, a statistical template, a time sequence association template, a non-time sequence association template and a repetition time template, can be built in advance, so that a user can construct the required model directly from an existing template. In addition, the canvas may also include a debug key, a publish key, a go to operation and maintenance key, a withdraw key, a reread key, and a store key; the canvas can be operated through a full screen display key, a zoom-in key, a zoom-out key, a 1:1 restore key, a download key and an eye protection key at its lower right; and the update time may be displayed on the canvas so that the user can learn when the configuration of the target data processing model was last updated. The multiple data processing modules obtained by the user operating the canvas may be as shown in FIG. 4: the first data processing module (i.e. the data source) may be Kafka, with the topic Flinky2; the second data processing module may be FlinkSQL; and the third data processing module may consist of two data destinations, namely a data destination Kafka and a data destination splitter.
Step 302, the data processing system receives first sub-configuration data for a first data processing module, second sub-configuration data for a second data processing module, and third sub-configuration data for a third data processing module.
Wherein the first configuration data includes first, second and third sub-configuration data.
In this embodiment of the present application, the first sub-configuration data is configuration data of the first data processing module and may include basic component information such as a component name and a component description; the second sub-configuration data is configuration data of the second data processing module and may include a data source, an alias, a time window, a time field, and a delay tolerance time; the third sub-configuration data is configuration data of the third data processing module, and the information it includes may be the same as or different from that of the first sub-configuration data, which is not limited in the embodiment of the present application. In a possible manner, as shown in fig. 5, after the first, second and third data processing modules are determined, the user may click each data processing module to enter its configuration interface and configure its information. For example, through the canvas shown in fig. 5, the user may configure the alias, time window, time field and delay tolerance time of the data source in the first data processing module and set a rule for the data source through the data processing rule, thereby obtaining the first configuration data for the data source.
Step 303, the data processing system receives a determining instruction for the first sub-configuration data, the second sub-configuration data and the third sub-configuration data, and determines first intermediate configuration data for the first data processing module, second intermediate configuration data for the second data processing module and third intermediate configuration data for the third data processing module from the first, second and third configuration data to be selected, respectively, based on the determining instruction.
The configuration data to be selected comprises first configuration data to be selected, second configuration data to be selected and third configuration data to be selected; the intermediate configuration data includes first intermediate configuration data, second intermediate configuration data, and third intermediate configuration data.
In the embodiment of the application, the first to-be-selected configuration data is configuration data related to the first sub-configuration data, the second to-be-selected configuration data is configuration data related to the second sub-configuration data, and the third to-be-selected configuration data is configuration data related to the third sub-configuration data; the first intermediate configuration data may be information such as a keyword or a field related to the first data processing module, the second intermediate configuration data may be information such as a keyword or a field related to the second data processing module, and the third intermediate configuration data may be information such as a keyword or a field related to the third data processing module.
Step 304, the data processing system determines fourth sub-configuration data of the first data processing module based on the first intermediate configuration data and the first sub-configuration data, determines fifth sub-configuration data of the second data processing module based on the second intermediate configuration data and the second sub-configuration data, and determines sixth sub-configuration data of the third data processing module based on the third intermediate configuration data and the third sub-configuration data.
Wherein the second configuration data includes fourth, fifth and sixth sub-configuration data.
In the embodiment of the present application, the fourth sub-configuration data is configuration data determined based on the first intermediate configuration data and the first sub-configuration data, that is, configuration data after being complemented with respect to the first data processing module; the fifth sub-configuration data is configuration data determined based on the second intermediate configuration data and the second sub-configuration data, that is, the configuration data complemented for the second data processing module; the sixth sub-configuration data is configuration data determined based on the third intermediate configuration data and the third sub-configuration data.
In a possible manner, as shown in fig. 5, when the user sets the data processing rule through the canvas, information such as keywords or fields related to the input first sub-configuration data (i.e., the first to-be-selected configuration data) is displayed for the user to select, for example @a.intsy99, @a.operation and @a.time1 in the area indicated by the arrow at the field deduction and filling position in fig. 5. After the user clicks an item of the first to-be-selected configuration data, the data processing system automatically supplements the first sub-configuration data based on the clicked item to obtain the fourth sub-configuration data, thereby realizing the field deduction and filling function, facilitating user operation, and saving operation time. The implementation process of configuring the configuration data of the second data processing module and the third data processing module is similar to that of configuring the first data processing module, which is not limited in the embodiment of the present application.
In other embodiments of the present application, after the second configuration data is obtained, that is, after the user edits the data processing rule on the canvas, the user may send a normalization instruction to the data processing system through a key set on the canvas for normalizing the second configuration data; after the data processing system receives the normalization instruction, the normalization of the second configuration data can be realized through a data factory, so that the subsequent operation is convenient; the regions corresponding to the data processing rules shown in fig. 6 (a) and fig. 6 (b) are used for the user to input the second configuration data, and normalized keys are arranged at the data processing rules of the canvas of fig. 6 (a), so that the second configuration data before normalization shown in fig. 6 (a) can be converted into the second configuration data after normalization shown in fig. 6 (b) through the normalized keys.
Step 305, in the case that the fourth sub-configuration data, the fifth sub-configuration data, and the sixth sub-configuration data are determined to pass parsing, the data processing system determines the fourth sub-configuration data, the fifth sub-configuration data, and the sixth sub-configuration data as the target configuration data of the first data processing module, the second data processing module, and the third data processing module, respectively.
In the embodiment of the application, after the user inputs the data processing rule (namely, the second configuration data) through the canvas, an analysis instruction can be sent to the data processing system through a key set arranged on the canvas and used for carrying out grammar analysis on the second configuration data; after receiving the analysis instruction, the data processing system carries out grammar analysis on the second configuration data through the data factory to obtain a grammar detection result, and the grammar detection result is displayed on a canvas for a user to check. As shown in fig. 7, a grammar analysis key is set at the data processing rule of the canvas shown in fig. 7, after the configuration data of the current data processing module is written in the area corresponding to the data processing rule of the canvas shown in fig. 7, the written configuration data can be subjected to grammar analysis through the grammar analysis key, and then the grammar detection result (i.e. no problem is found) is displayed in the information area below the data processing rule for the user to review.
Step 306, the data processing system obtains an initial data processing model based on each data processing module, the first association relationship among the data processing modules, and each target configuration data.
In this embodiment of the present application, as shown in fig. 8, the first data processing module includes a data source Kafka1 and a data source Kafka2, the second data processing module is FlinkSQL, the third data processing module includes a data destination Kafka, and the connection lines between the data processing modules are the first association relationship among them. The data sources Kafka1 and Kafka2 indicate that the data in Kafka1 and Kafka2 needs to be processed, FlinkSQL is used for aggregating the data in Kafka1 and Kafka2, and the data destination Kafka is the storage place of the processed data. The purpose of the initial data processing model constructed from the data processing modules shown in fig. 8 is therefore to aggregate the two event streams through FlinkSQL and then write the result into Kafka.
Step 307, the data processing system performs an integrity check on the initial data processing model.
Step 308, in the case of passing the integrity check, the data processing system determines the initial data processing model as the target data processing model.
Step 309, the data processing system verifies the executability of the target data processing model.
Step 310, in the case that the target data processing model is determined to be executable, the data processing system submits the target data processing model to the debugging node for analysis to obtain the processing operator and the output operator in each data processing module.
The processing operator is used for processing the data, and the output operator is used for outputting the processing result of each data processing module.
In this embodiment of the present application, after the data processing system acquires the target data processing model, the target data processing model may be stored as a JavaScript Object Notation (JSON) file, and the resulting Java archive (jar) package and JSON file of the target data processing model are submitted to the debugging node, which debugs the target data processing model. Each data processing module processes data based on its processing operator; the output operator obtains the processing result of each data processing module, which makes it easier to find abnormalities within a data processing module. In one possible way, an output operator is additionally connected to every operator except the data source operator and the destination operator, to obtain the output result of each intermediate processing operator. The destination operator outputs the final result after data processing is completed, and the intermediate processing operators are the operators that process data along the way.
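Connecting an output operator to every operator other than the data source and destination operators can be sketched as a small graph transformation. The representation of operators as a name-to-kind mapping, the kind labels and the `__out` naming scheme are hypothetical and chosen only for illustration.

```python
def attach_output_operators(operators: dict[str, str],
                            edges: list[tuple[str, str]]):
    """Return (operators, edges) extended with one output operator per
    intermediate processing operator.

    `operators` maps operator name -> kind: "source", "process" or
    "destination" (labels assumed for this sketch).
    """
    new_ops = dict(operators)
    new_edges = list(edges)
    for name, kind in operators.items():
        if kind == "process":          # skip data source and destination operators
            tap = f"{name}__out"       # hypothetical naming scheme
            new_ops[tap] = "output"
            new_edges.append((name, tap))
    return new_ops, new_edges


# Operators loosely modelled on the fig. 8 example of this description.
ops = {"kafka1": "source", "kafka2": "source",
       "flinksql": "process", "kafka_out": "destination"}
edges = [("kafka1", "flinksql"), ("kafka2", "flinksql"), ("flinksql", "kafka_out")]
ops2, edges2 = attach_output_operators(ops, edges)
print(sorted(ops2))
```

The original mapping and edge list are left unmodified, so the same model description can still be submitted to a running node without the debugging taps.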
Step 311, the data processing system determines a second association relationship between the processing operators and a third association relationship between the processing operators and the output operators.
In the embodiment of the application, the second association relationship is an association relationship between processing operators, and the third association relationship is an association relationship between the processing operators and output operators. The operations when processing the data have certain dependency relationship, so that a second association relationship between the processing operators and a third association relationship between the processing operators and the output operators need to be determined to obtain logic when processing the data.
Step 312, the data processing system determines a first data processing logic for each data processing module based on the processing operator and the output operator.
In an embodiment of the present application, the first data processing logic is data processing logic of each data processing module. The data processing logic for each data processing module can be obtained based on the processing operator and the output operator of each data processing module, so that the output result of each data processing module is obtained.
Step 313, the data processing system determines a second data processing logic for the target data processing model based on the processing operator, the output operator, the second association, and the third association.
Wherein the data processing logic comprises first data processing logic and second data processing logic.
In an embodiment of the present application, the second data processing logic is the data processing logic of the entire target data processing model. It is obtained based on the output operator and the processing operator of each data processing module, the second association relationship between the processing operators, and the third association relationship between the processing operators and the output operators. In one possible approach, after the jar package and JSON file for the target data processing model are obtained, they may be parsed to obtain one or more directed acyclic graphs (Directed Acyclic Graph, DAG), and the data processing logic is characterized by the DAG.
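How a DAG of operators yields an execution order can be illustrated with the standard-library `graphlib` module of Python 3.9 and later. The operator names and the edge-list representation are assumptions for this sketch, which does not reflect how the debugging node actually builds its graph.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(edges: list[tuple[str, str]]) -> list[str]:
    """Return operator names in an order that respects every association.

    Each edge is (upstream, downstream). Raises CycleError if the
    associations do not form a directed acyclic graph.
    """
    sorter = TopologicalSorter()
    for upstream, downstream in edges:
        sorter.add(downstream, upstream)   # downstream depends on upstream
    return list(sorter.static_order())


# Hypothetical operators: two sources aggregated, then written out,
# with an output operator attached to the processing operator.
dag_edges = [("kafka1", "flinksql"), ("kafka2", "flinksql"),
             ("flinksql", "kafka_out"), ("flinksql", "flinksql__out")]
order = execution_order(dag_edges)
print(order)
```

Any order returned places both data sources before the processing operator and the processing operator before its output and destination operators; the relative order of independent operators is not fixed.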
Step 314, the data processing system processes the debug data based on the data processing logic to obtain an intermediate debug result and a target debug result.
In the embodiment of the application, the data processing logic includes a first data processing logic for each data processing module and a second data processing logic for the target data processing model. The output result of each data processing module, that is, the intermediate debugging result, can be obtained through the first data processing logic; the final output result of the target data processing model, namely the target debugging result, can be obtained through the second data processing logic.
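The relationship between intermediate debugging results, the target debugging result and abnormality localization can be sketched as below. The linear chain of stages, the `debug_run` function and the sample records are assumptions for illustration; a real model may branch, and the actual debugging node is not described at this level of detail.

```python
from typing import Callable, Optional

Record = dict
Stage = Callable[[list[Record]], list[Record]]


def debug_run(stages: list[tuple[str, Stage]], debug_data: list[Record]):
    """Run debug data through each module in order.

    Returns (intermediate_results, target_result, failed_module), where
    failed_module is the name of the first module that raised, or None.
    """
    intermediate: dict[str, object] = {}
    data = debug_data
    failed: Optional[str] = None
    for name, stage in stages:
        try:
            data = stage(data)
        except Exception as exc:           # record where debugging broke
            intermediate[name] = f"error: {exc}"
            failed = name
            return intermediate, None, failed
        intermediate[name] = data          # intermediate debugging result
    return intermediate, data, failed      # data is the target debugging result


# Hypothetical two-module model and debug data.
stages = [
    ("filter", lambda rows: [r for r in rows if r["price"] > 20]),
    ("project", lambda rows: [{"product": r["product"]} for r in rows]),
]
debug_data = [{"product": "commodity 1", "price": 20},
              {"product": "commodity 2", "price": 21}]
inter, target, failed = debug_run(stages, debug_data)
print(inter, target, failed)
```

Because each module's output is recorded separately, a failure or an unexpected value can be traced to the module at which it first appears.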
As shown in FIG. 9, for debugging of an exemplary target data processing model, a user may view in real time, through the canvas, the debugging conditions such as the intermediate debugging results, the debugging progress and the target debugging results. The following records shown in the canvas of FIG. 9 are intermediate debugging results:
1 {"product": "commodity 1", "price": 20, "name": "test 1", "age": 18}
2 {"product": "commodity 1", "price": 20, "name": "test 2", "age": 18}
3 {"product": "commodity 2", "price": 21, "name": "test 1", "age": 18}
4 {"product": "commodity 2", "price": 21, "name": "test 2", "age": 18}
5 {"product": "commodity 3", "price": 22, "name": "test 1", "age": 18}
6 {"product": "commodity 3", "price": 22, "name": "test 2", "age": 18}
7 {"product": "commodity 1", "price": 20, "name": "test 3", "age": 18}
8 {"product": "commodity 2", "price": 21, "name": "test 3", "age": 18}
9 {"product": "commodity 3", "price": 22, "name": "test 3", "age": 18}
The target debugging results are not shown in the figure because the debugging task shown in fig. 9 is not completed. FIG. 10 shows the debugging results of another target data processing model, which can be seen to be used for parsing data; the debugging results shown in the output data details of the canvas in FIG. 10 are four normal data and one abnormal data. As can be seen from the sample log and the contents of the fields, the four fields time1, country, product and risk are normal data, and the field uuld is abnormal data.
Step 315, the data processing system submits the target data processing model to the operation node in the case that the intermediate debugging result and the target debugging result indicate that the target data processing model is successfully debugged.
The operation node is used for processing the data based on the data processing model.
The data processing system may process data based on the data processing model through the operation node by the following steps:
Step 315a, the data processing system determines, through the operation node, the target data based on the first data processing module.
In the embodiment of the present application, when the target data processing model is successfully debugged, the target data may be processed by the target data processing model. First, the target data is determined from the data source in the first data processing module. The target data may be all the data in the data source, or only the data in the data source that meets a target condition; that is, either all or part of the data in the data source may be processed, which is not limited in the embodiment of the present application.
Step 315b, the data processing system processes the target data based on the second data processing module to obtain processed data.
In the embodiment of the application, the target data is processed through the data processing rule set in the second data processing module, so as to obtain the processed data.
Step 315c, the data processing system stores the processed data based on the third data processing module.
In an embodiment of the present application, the third data processing module includes a data destination, and therefore the processed data is stored to the data destination based on the third data processing module. In one possible manner, the target data is a plurality of event streams in Kafka, the second data processing module sets an event rule (i.e., a data processing rule), where the event rule may be to acquire two events that occur at the same time in the event streams, and the data destination is a splitter; the multiple event streams are then analyzed through the event rule, and the two events occurring in the event streams are acquired and stored in the data destination.
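Steps 315a to 315c can be pictured as a three-stage flow: source, rule, destination. The following is only a minimal in-memory sketch under stated assumptions: the function name `run_model`, the `condition` and `rule` callables and the `total` field are hypothetical and stand in for the first, second and third data processing modules; no Kafka or Flink API is used.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def run_model(source: Iterable[Record],
              condition: Callable[[Record], bool],
              rule: Callable[[Record], Record],
              destination: List[Record]) -> List[Record]:
    """Mimic steps 315a-315c with in-memory stand-ins (hypothetical helper)."""
    # Step 315a: determine the target data from the data source.
    target_data = [record for record in source if condition(record)]
    # Step 315b: apply the data processing rule of the second module.
    processed = [rule(record) for record in target_data]
    # Step 315c: store the processed data in the data destination.
    destination.extend(processed)
    return destination


source = [
    {"product": "commodity 1", "price": 20},
    {"product": "commodity 2", "price": 21},
    {"product": "commodity 3", "price": 22},
]
sink: List[Record] = []
run_model(
    source,
    condition=lambda r: r["price"] >= 21,          # target condition (assumed)
    rule=lambda r: {**r, "total": r["price"] * 2},  # processing rule (assumed)
    destination=sink,
)
```

Passing `condition=lambda r: True` corresponds to the case in which all the data in the data source is processed.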
Based on the foregoing embodiments, in other embodiments of the present application, steps 316 to 317 may be further performed after step 304 is performed.
Step 316, the data processing system determines the first abnormal data in the second configuration data in the case that it is determined that the second configuration data fails to parse.
In the embodiment of the present application, the first abnormal data is the abnormal data in the second configuration data. If the second configuration data fails to parse, the second configuration data may contain abnormalities such as grammar errors or missing keywords; in this case, the abnormal data in the second configuration data, that is, the first abnormal data, can be located directly.
In this embodiment of the present application, fig. 11 shows a case in which the second configuration data fails to parse. In that case, the first abnormal data in the second configuration data and the abnormality cause are displayed on the canvas for the user to view. In the canvas shown in fig. 11, the configuration data is input in the region corresponding to the data processing rule, and the syntax detection result is displayed in the information region below the data processing rule. The configuration data that fails to parse in fig. 11 is "SELECTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT". The abnormality cause of the parsing failure is not shown in fig. 11, but an area for displaying it may be added to the information region.
Step 317, the data processing system determines a first abnormality cause and a first optimization scheme based on the first abnormality data and displays the first abnormality cause and the first optimization scheme.
In the embodiment of the present application, the first abnormality cause is the cause of the first abnormal data; the first optimization scheme is a solution to the first abnormal data. In one possible manner, the data processing system may store in advance, in table form, the abnormality cause and the optimization scheme corresponding to each grammar abnormality, so that after the first abnormal data is obtained, the first abnormality cause and the first optimization scheme can be looked up in the table.
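The table lookup described above might be sketched as follows. This is an illustrative assumption, not the stored table of the embodiment: the keys `syntax_error` and `missing_keyword`, the texts, and the names `Diagnosis`, `GRAMMAR_EXCEPTION_TABLE` and `diagnose` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnosis:
    cause: str         # abnormality cause shown to the user
    optimization: str  # optimization scheme shown to the user


# Hypothetical pre-stored table: one entry per kind of grammar abnormality.
GRAMMAR_EXCEPTION_TABLE = {
    "syntax_error": Diagnosis(
        cause="The statement contains a grammar error.",
        optimization="Correct the statement near the reported position.",
    ),
    "missing_keyword": Diagnosis(
        cause="A required keyword is missing.",
        optimization="Add the missing keyword and check the syntax again.",
    ),
}

# Fallback used when the abnormality is not in the table (assumed behavior).
UNKNOWN = Diagnosis(
    cause="Unknown cause.",
    optimization="Inspect the configuration data manually.",
)


def diagnose(abnormality_kind: str) -> Diagnosis:
    """Look up the cause and optimization scheme for one abnormality kind."""
    return GRAMMAR_EXCEPTION_TABLE.get(abnormality_kind, UNKNOWN)
```

The same lookup shape would apply to the tables used for debugging abnormalities and running abnormalities, with different keys.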
Based on the foregoing embodiment, in other embodiments of the present application, the data processing method may further include the following steps:
Step 318, the data processing system periodically obtains, through the debugging node, the first operation data, the intermediate debugging result and the target debugging result when the target data processing model is debugged.
In the embodiment of the application, the first operation data is data when the target data processing model is debugged through the debugging node; the first operational data may include an operational log including loaded data, current state, processing time, etc. In one possible manner, after the data factory submits the target data processing model to the debug node, a polling request may be periodically sent to the debug node to obtain the first operational data, the intermediate debug result, and the target debug result.
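The periodic polling, together with the target time threshold used later in step 320, could look roughly like the sketch below. The function `poll_debug_node`, the snapshot keys (`run_log`, `intermediate`, `target_result`) and the injectable `sleep` and `clock` parameters are assumptions made for illustration; the real request format of the debugging node is not given in the text.

```python
import time
from typing import Callable, Optional


def poll_debug_node(fetch: Callable[[], dict],
                    interval_s: float,
                    timeout_s: float,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> Optional[dict]:
    """Poll until a target debugging result arrives or the threshold passes.

    `fetch` stands in for one polling request to the debugging node and is
    assumed to return a dict with a `target_result` key.
    Returns the last snapshot on success, or None on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        snapshot = fetch()
        if snapshot.get("target_result") is not None:
            return snapshot
        if clock() >= deadline:
            return None
        sleep(interval_s)


# Simulated debugging node that finishes on the third poll.
responses = iter([
    {"run_log": "loading", "intermediate": [], "target_result": None},
    {"run_log": "running", "intermediate": [1], "target_result": None},
    {"run_log": "done", "intermediate": [1, 2], "target_result": [3]},
])
result = poll_debug_node(lambda: next(responses), interval_s=0.0,
                         timeout_s=5.0, sleep=lambda _: None)
```

A `None` return corresponds to the case in which the target debugging result is not obtained within the target time threshold.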
It should be noted that, after the step 318 is performed, the steps 319 or 320-321 may be performed.
Step 319, in case the intermediate debugging result and the target debugging result represent that the target data processing model is successfully debugged, the data processing system displays a first message for representing that the target data processing model is successfully debugged.
In the embodiment of the application, the first message is used for representing that the target data processing model is successfully debugged. In one possible manner, if the target data processing model is determined to be successfully debugged based on the intermediate debugging result and the target debugging result, a first message is displayed on the canvas so that the user can know that the target data processing model is successfully debugged.
Step 320, in the case that the intermediate debugging result and the target debugging result indicate that debugging of the target data processing model is abnormal, or the target debugging result is not obtained within the target time threshold, the data processing system analyzes the first operation data, the intermediate debugging result and the target debugging result, and determines second abnormal data when the target data processing model is debugged.
In this embodiment of the present application, the second abnormal data is abnormal data occurring when the target data processing model is debugged. It may be abnormal data of the target data processing model itself, or abnormal data caused by a disconnected debugging node or insufficient memory; it may be any abnormal data occurring when the target data processing model is debugged.
As shown in fig. 12, the left side of fig. 12 shows the currently constructed target data processing model, which is used for parsing data in Kafka; "run complete" shown in the debugging result area on the right side of fig. 12 indicates that debugging of the target data processing model has completed; the area corresponding to the output data details on the right side of fig. 12 shows the abnormal data that occurred during debugging and its cause, where "1@ #% # @ … …" is the abnormal data and "this is dirty data" is the abnormality cause; the data details on the right side of fig. 12 also show the exception log parsing rules for the user to view. If debugging is interrupted while the target data processing model is being debugged, a debugging result indicating the interruption is displayed in the canvas, as shown in fig. 13: "Bad! The operation is interrupted". In fig. 13, the abnormality is caused by a connection abnormality of the dialer-default component, and the corresponding solution is to check whether dialer-default works normally. Displaying the debugging result, the abnormality cause and the solution on the canvas lets the user understand the debugging condition of the target data processing model and handle the abnormality, which improves testing efficiency.
Step 321, the data processing system determines a second abnormality cause and a second optimization scheme based on the second abnormal data and displays the second abnormality cause and the second optimization scheme.
In the embodiment of the present application, the second abnormality cause is the cause of the second abnormal data; the second optimization scheme is a solution to the second abnormal data. In one possible manner, the data processing system may store in advance, in table form, the abnormal data that may occur during debugging together with the corresponding abnormality causes and optimization schemes, so that after the second abnormal data is obtained, the second abnormality cause and the second optimization scheme can be looked up in the table.
Based on the foregoing embodiment, in other embodiments of the present application, the data processing method further includes the following steps:
Step 322, the data processing system obtains, through the operation node, second operation data when the target data processing model is operated.
In the embodiment of the application, the second operation data is data when the target data processing model is operated; the second operational data may include indicators of the amount of data processed, the amount of pile-up data, the rate of processing, and the like.
Step 323, the data processing system analyzes the second operation data through the monitoring node of the data processing system, and determines the operation state and the data processing condition of the target data processing model.
In an embodiment of the present application, the monitoring node may be Prometheus, which is used for monitoring the operation of the target data processing model. The monitoring node can analyze the second operation data to obtain the operation state and the data processing condition of the target data processing model; the operation state includes online, abnormal offline, etc.
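As a rough illustration of how second operation data could be turned into an operation state and a data processing condition, consider the sketch below. It does not use any Prometheus API; the sample keys (`ts`, `processed_total`, `backlog`), the `heartbeat_ok` flag and the function `summarize_run` are hypothetical.

```python
from typing import Dict, List


def summarize_run(samples: List[Dict[str, float]],
                  heartbeat_ok: bool) -> Dict[str, object]:
    """Derive a coarse operation state and data processing condition.

    Each sample is assumed to hold a timestamp in seconds (`ts`), a running
    total of processed records (`processed_total`) and the current amount of
    piled-up data (`backlog`).
    """
    first, last = samples[0], samples[-1]
    elapsed = last["ts"] - first["ts"]
    processed = last["processed_total"] - first["processed_total"]
    # Processing rate over the observed interval, in records per second.
    rate = processed / elapsed if elapsed > 0 else 0.0
    return {
        "state": "online" if heartbeat_ok else "abnormal offline",
        "processed": processed,
        "backlog": last["backlog"],
        "rate_per_s": rate,
    }


report = summarize_run(
    [
        {"ts": 0, "processed_total": 0, "backlog": 5},
        {"ts": 10, "processed_total": 200, "backlog": 3},
    ],
    heartbeat_ok=True,
)
```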
Step 324, in the case that the operation state is normal, the data processing system displays the operation state and the data processing condition.
In one possible manner, the operation and maintenance interface, namely the interface shown in fig. 14, can be entered through a "go to operation and maintenance" button on the canvas. The operation and maintenance interface mainly displays the data processing condition in chart form so that the user can check it conveniently. For each target data processing model, the data processing condition includes the amount of data processed within the target time, the total accumulated data amount, the data inflow rate, the data processing rate, the model state and the like; the target time can be selected by the user and may be, for example, 1 hour. After a target data processing model is clicked, a detail interface for that model is entered, as shown in fig. 15. The detail interface includes indexes such as the model name, the amount of data processed in the last 1 hour, the amount of data output in the last 1 hour, the total accumulated data amount, the abnormal data amount, the corresponding operations, the update time and the CPU occupation.
Step 325, under the condition that the operation state is abnormal, the data processing system analyzes the second operation data and the data processing condition to obtain third abnormal data aiming at the operation of the target data processing model.
In the embodiment of the application, the third abnormal data is abnormal data occurring when the target data processing model runs. By analyzing the second operation data and the data processing condition, it can be determined whether the operation of the target data processing model is abnormal.
Step 326, the data processing system determines a third abnormality cause and a third optimization scheme based on the third abnormal data and displays the third abnormality cause and the third optimization scheme.
In the embodiment of the present application, the third abnormality cause is the cause of the third abnormal data; the third optimization scheme is a solution to the third abnormal data. In one possible manner, the data processing system may store in advance, in table form, the abnormal data that may occur while a model runs together with the corresponding abnormality causes and optimization schemes, so that after the third abnormal data is obtained, the third abnormality cause and the third optimization scheme can be looked up in the table. Fig. 16 includes the operation state and the data processing condition of each of model 1, model 2, model 3 and model 4; if an abnormality occurs, the abnormal data, the abnormality cause and the solution are displayed on the operation and maintenance interface for the user to view, such as the abnormality log shown on the right side of fig. 16. In addition, the current abnormality log interface can be refreshed through the refresh key shown in fig. 16, the current abnormality log can be downloaded through the download key, and the current abnormality log can be displayed in full screen through the full-screen key. The abnormal conditions, abnormal events and the like of each model can also be recorded, so that the historical abnormal conditions of each model can be checked conveniently. Fig. 17 shows the detail interface of a target data processing model named "111222"; through the abnormality log button on this interface it can be seen that the model was created at 2011-11-30 20:44:00, went abnormally offline at 2011-11-30 20:47:00, was successfully recovered at 2011-11-30 20:51:00 with the number of cores increased by 1 and the core memory increased by 1GB, and went offline without being able to self-recover at 2011-11-30 20:56:00.
Step 327, in the case that the operation state is abnormal and the data processing system meets the model recovery condition, the data processing system processes the target data based on the target data processing model.
In the embodiment of the application, the model recovery condition can be preset. The model recovery condition may be that the initially allocated memory is insufficient while a large amount of cluster memory is still available; in this case, the abnormally offline target data processing model is brought online again, which implements the model self-recovery function and avoids service interruption. As shown in fig. 17, the self-recovery events of the model can be written into the abnormality log so that the user can check them conveniently.
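One way to read the model recovery condition is as a small decision function. The sketch below is an assumption-laden illustration: the function `plan_recovery`, the string labels for state and cause, and the 1024 MB step (chosen only to echo the "increased by 1GB" entry in the log of fig. 17) are hypothetical, and the text does not quantify "a large amount" of cluster memory.

```python
from typing import Optional


def plan_recovery(state: str,
                  cause: str,
                  allocated_mb: int,
                  cluster_free_mb: int,
                  step_mb: int = 1024) -> Optional[int]:
    """Return the new memory allocation in MB, or None to stay offline.

    Recovery is planned only when the model is abnormally offline, the cause
    is insufficient memory, and the cluster can still supply `step_mb`.
    """
    if state != "abnormal offline":
        return None
    if cause != "insufficient memory":
        return None
    if cluster_free_mb < step_mb:
        return None
    return allocated_mb + step_mb


new_allocation = plan_recovery("abnormal offline", "insufficient memory",
                               allocated_mb=2048, cluster_free_mb=8192)
```

When the function returns `None`, the model stays offline, which matches the "offline and not able to self-recover" entry in the abnormality log.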
In other embodiments of the present application, the data processing system also provides CEP semantic analysis, which means that if events hit a configured event rule within a time range, the hit events are output by the analysis. The data processing system is pre-configured with templates of various data processing models. As shown in fig. 18, these include a common template for filtering data streams by AND/OR conditions, a statistical template for computing statistics over data streams by an aggregation function and a trigger rule, a timing correlation template for acquiring events that occur in order in the data streams by an ordered event rule, a non-timing correlation template for acquiring a plurality of events in the data streams by a non-timing event rule, and a repeated event template for acquiring repeated events in the data streams by configuring a repeated event rule. In addition, a data processing model can be added through areas such as the model name and the description.
As shown in fig. 19, the non-timing correlation template is used for matching events that have no timing correlation, namely event 1 and event 2, which are to be determined from the data source Hdfs; event 1 is an event with an fx field and event 2 is an event with an sj field. Component basic information, an association condition, a time window, a matching policy and the like can also be configured through the non-timing correlation template. The component basic information includes a component name, a component description, a data packet field and the like; the association condition includes a mode, which is used for setting event rules; the time window includes an event field and a window time, where the event field is used for limiting the event time; the matching policy specifies from which data matching should begin when events are matched again. In one possible manner, if the configured event rule is (a, b, c) and the events input by the data stream within the specified time range are a, b, c, a, b, d, e, f in turn, then, because no timing correlation is required, the three hit event combinations (a, b, c), (b, c, a) and (c, a, b) are output. In this way, a complex matching rule can be implemented by configuring a simple template in the canvas and output to the production environment.
The statistical template shown in fig. 20 includes component basic information, function configuration, grouping statistics, trigger rules, a time window, etc. The component basic information includes a component name and a component description; the function configuration includes at least one function configuration, each including at least one function, at least one summing field and at least one output field; the grouping statistics include a grouping field, which identifies the group; the trigger rule is used for limiting the matching time of the event; the time window includes a window type (sliding window, rolling window, or no time window), a window time, a step length, a time field and the like. The conditions in the statistical template may be configured based on user requirements, which is not limited in the embodiments of the present application. Fig. 21 shows a timing correlation template including component basic information, association conditions, a time window, event strict proximity, a matching policy, etc. The component basic information includes a component name, a component description, a data packet field and the like; the association condition includes an option mode and an expression mode; the time window includes a time field and a window time; event strict proximity means that the events must occur in sequence; the matching policy specifies from which data matching should begin when events are matched again. Fig. 22 shows a repeated event template that includes component basic information, association conditions, a time window, event strict proximity, a matching policy, etc.
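The non-timing matching example given for fig. 19 (rule (a, b, c) over the stream a, b, c, a, b, d, e, f) can be reproduced with a short sketch. It assumes that a hit is a run of consecutive events that contains exactly the events of the rule in any order; this contiguity assumption is chosen because it yields the three combinations listed in the text, and it is not confirmed as the matching policy actually implemented. The function name `match_unordered` is hypothetical and no Flink CEP API is involved.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def match_unordered(rule: Sequence[str],
                    events: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return every run of consecutive events that equals `rule` up to order."""
    size = len(rule)
    wanted = Counter(rule)
    hits = []
    for start in range(len(events) - size + 1):
        window = tuple(events[start:start + size])
        # Order is ignored: only the multiset of events has to match the rule.
        if Counter(window) == wanted:
            hits.append(window)
    return hits


hits = match_unordered(("a", "b", "c"),
                       ["a", "b", "c", "a", "b", "d", "e", "f"])
# hits == [("a", "b", "c"), ("b", "c", "a"), ("c", "a", "b")]
```

A timing correlation template would instead require the window to equal the rule in order, which in this stream leaves only the first combination.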
As shown in fig. 23, the user configures the target data processing model by dragging data processing modules onto the canvas and inputting target configuration data for each data processing module. The data factory background program verifies the integrity of the target data processing model and parses the target data processing model through an engine layer to generate a JSON file and a jar package for the target data processing model. When debugging is needed, the JSON file and the jar package are submitted to the Flink cluster; the Flink cluster debugs the target data processing model based on the debugging data and outputs an intermediate debugging result and a target debugging result. The first operation data, the intermediate debugging result and the target debugging result when the target data processing model is debugged can be obtained through polling requests; the background process of the data factory can analyze them to determine whether debugging is abnormal and, if so, the debugging result, the abnormal data and the abnormality cause are displayed at the front end. If the target data processing model is formally published and run, then after the user clicks start, the available resources on the YARN cluster are checked first; if the resources are insufficient, the target data processing model is not executable. Otherwise, the information of the task to be published and run is stored in a task table of the database, and the table is scanned periodically by a timed task, which implements asynchronous submission of real-time jobs to the YARN cluster.
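The publish path described above (resource check, task table, periodic scan, asynchronous submission) might be sketched as follows. SQLite from the Python standard library is used here purely as a stand-in for the database, and the table layout, the status values and the functions `publish` and `scan_and_submit` are hypothetical; the `submit` callable stands in for the actual submission to the YARN cluster.

```python
import sqlite3
from typing import Callable


def create_task_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE task (id INTEGER PRIMARY KEY, model TEXT, status TEXT)"
    )


def publish(conn: sqlite3.Connection, model: str,
            required_mb: int, available_mb: int) -> bool:
    """Check resources first; only executable models enter the task table."""
    if available_mb < required_mb:
        return False  # insufficient resources: the model is not executable
    conn.execute("INSERT INTO task (model, status) VALUES (?, 'pending')",
                 (model,))
    conn.commit()
    return True


def scan_and_submit(conn: sqlite3.Connection,
                    submit: Callable[[str], None]) -> int:
    """One pass of the timed task: submit every pending job exactly once."""
    rows = conn.execute(
        "SELECT id, model FROM task WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    for task_id, model in rows:
        submit(model)
        conn.execute("UPDATE task SET status = 'submitted' WHERE id = ?",
                     (task_id,))
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
create_task_table(conn)
accepted = publish(conn, "model-1", required_mb=512, available_mb=1024)
rejected = publish(conn, "model-2", required_mb=4096, available_mb=1024)
submitted = []
first_pass = scan_and_submit(conn, submitted.append)
second_pass = scan_and_submit(conn, submitted.append)
```

Decoupling the click from the submission through the table means the user request returns immediately, while the timed task picks up the job on its next scan.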
In the operation stage, the jar package of the engine layer and the JSON file are submitted to the YARN cluster together. The jar package of the engine layer parses the JSON file to generate one or more directed acyclic graphs (namely the data processing logic), where a directed acyclic graph includes objects such as pipelines, operators and data streams; an operation environment corresponding to the engine layer is then created, the operation parameters are configured, the related data structures are initialized, and the execution method is called to run in the YARN cluster. In addition, if the task is started from the debugging page, the front end continuously polls the running state of the task, and the running result is finally returned after being integrated by the data factory back-end process. If the task is formally published and run, the state of the YARN cluster can be monitored through the Prometheus alarm system; when the user views the monitoring operation and maintenance page, a request is sent to the background process of the data factory, which queries the monitoring state in the database and returns the result. For a model group whose monitoring state is abnormal, if the user has enabled the model self-recovery function, the abnormally offline model is brought online again when the model recovery condition is met (the initially allocated memory is insufficient while a large amount of cluster memory is still available), so that service interruption is avoided. In addition, the abnormality cause diagnosis module of the background process of the data factory can perform log analysis on the model group in the abnormal state, and the diagnosed abnormality cause and the suggested solution are displayed together with the abnormality log shown to the user.
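Parsing the JSON file into a directed acyclic graph can be illustrated with the standard-library `graphlib` module (Python 3.9 or later). The JSON layout below (`operators` and `edges` keys, operator ids and types) is an assumption made for this sketch; the actual file format produced by the engine layer is not given in the text.

```python
import json
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hypothetical model description; the real JSON schema is not specified.
MODEL_JSON = """
{
  "operators": [
    {"id": "source", "type": "kafka_source"},
    {"id": "parse", "type": "processing"},
    {"id": "sink", "type": "output"}
  ],
  "edges": [["source", "parse"], ["parse", "sink"]]
}
"""


def build_dag(model_json: str) -> Dict[str, Set[str]]:
    """Map every operator id to the set of its upstream operator ids."""
    model = json.loads(model_json)
    predecessors: Dict[str, Set[str]] = {
        op["id"]: set() for op in model["operators"]
    }
    for upstream, downstream in model["edges"]:
        predecessors[downstream].add(upstream)
    return predecessors


def execution_order(predecessors: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which upstream operators come first."""
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError as error:
        raise ValueError("the model is not a directed acyclic graph") from error


order = execution_order(build_dag(MODEL_JSON))
```

Rejecting cyclic input at this point is one plausible place to enforce that the generated data processing logic really is acyclic.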
As shown in fig. 24, before the target data processing model is debugged, the integrity of the configured target data processing model and the connectivity of its components need to be checked to ensure that the target data processing model can be executed normally. A debugging environment is then created and the availability of the Flink cluster is checked to ensure that the Flink cluster can debug the target data processing model; next, the JSON file and the jar package are built and the debugging task is submitted to the Flink cluster. The running state of the debugging task, the running log and the intermediate debugging result can be obtained through polling requests; the intermediate debugging result is stored in a MySQL database. The overall state of the debugging task can be obtained from the intermediate debugging result and the polled running state, and from the overall state of the debugging task and the intermediate debugging result a progress bar can be calculated and it can be decided whether polling can stop. Whether the task has timed out or finished can be determined from the polled running log after keyword filtering, the overall state of the debugging task and the intermediate debugging result; the task can be canceled when it times out. After the task finishes or is canceled, log diagnosis can be performed by checking the task at regular intervals to determine whether an abnormality exists; if so, the front end displays the abnormal data, the abnormality cause and the solution.
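The text does not say how the progress bar is calculated. One plausible sketch, under the assumption that progress is the share of data processing modules that have already reported an intermediate debugging result, is shown below; the state labels and the function `debug_progress` are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Assumed labels for states in which polling may stop.
TERMINAL_STATES = {"FINISHED", "FAILED", "CANCELED"}


def debug_progress(module_ids: Sequence[str],
                   intermediate_results: Dict[str, object],
                   overall_state: str) -> Tuple[int, bool]:
    """Return (progress in percent, whether polling can stop)."""
    if overall_state == "FINISHED" or not module_ids:
        percent = 100
    else:
        # Count the modules that already have an intermediate result.
        done = sum(1 for module_id in module_ids
                   if module_id in intermediate_results)
        percent = int(100 * done / len(module_ids))
    return percent, overall_state in TERMINAL_STATES


progress = debug_progress(
    ["source", "parse", "filter", "sink"],
    {"source": ["..."], "parse": ["..."]},
    overall_state="RUNNING",
)
```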
In the embodiment of the application, by encapsulating the Flink framework and its native application programming interface (Application Programming Interface, API), the distributed data processing logic is refined into visual operators, and a distributed workflow can be configured by dragging connecting lines and configuring data processing rules on the canvas. This achieves a high degree of visualization, provides rich operator types, supports services better, and lowers the development threshold. Secondly, the target data processing model configured by the user is encapsulated a second time, the task is submitted to the Flink cluster to run, and the running results of all operators and the log data are collected for diagnosis and analysis. The debugging environment is effectively isolated from the production environment, so there is no need to worry that the results and abnormalities generated by a debugging workflow will pollute the production environment, and the time taken to debug a model is reduced to a little over ten seconds. Meanwhile, through diagnosis and analysis of the log, the user can not only view the normal data and the abnormal data in the result separately but also, for a workflow that runs abnormally, obtain a clear abnormality cause and solution without the assistance of professional staff in most scenarios. CEP semantics that are not available in Flink, such as the non-timing correlation template (orFollowBy), are provided by deeply modifying the Flink native code.
In addition, a separate SQL editing box is provided for the user to enter FlinkSQL statements, and the FlinkSQL statements are parsed based on Calcite, so that a developer can check the grammar of a FlinkSQL statement at any time; practical functions such as field completion and statement formatting are also provided, so that the user can develop as if using a database tool. By monitoring the real-time workflow at regular intervals and displaying the monitored operation data in chart form, the user can conveniently check key indexes of the target data processing model such as the operation state, the data processing rate, the accumulated data amount, the resource allocation and the abnormality information. The system also provides functions such as dynamic resource allocation, a model self-recovery mechanism for models that go offline or restart, and intelligent diagnosis and analysis of abnormality logs, which addresses the industry problem that such systems are difficult to maintain.
It should be noted that, in this embodiment, the descriptions of the same steps and the same content as those in other embodiments may refer to the descriptions in other embodiments, and are not repeated here.
According to the data processing method provided by the embodiment of the application, the target data processing model is submitted to the debugging node for debugging to obtain the intermediate debugging result for each data processing module and the target debugging result for the target data processing model. Based on these results, the correctness of the target data processing model can be verified quickly and accurately. This solves the problem in the related art that, after data processing logic is configured through Flink, a large number of operations and a long time are required to verify the correctness of the data processing logic, which is time-consuming and yields poor accuracy.
Based on the foregoing embodiments, the present application provides a data processing apparatus, which may be applied to the data processing method provided in the embodiment corresponding to fig. 1 to 3, and referring to fig. 25, the data processing apparatus 4 may include:
an obtaining unit 41 for obtaining a target data processing model based on user operations on a plurality of data processing modules on a canvas; the data processing module is provided with data processing logic for processing data to realize corresponding functions;
A first processing unit 42, configured to submit the target data processing model to a debug node for debug, to obtain an intermediate debug result for each data processing module and a target debug result for the target data processing model;
the second processing unit 43 is configured to submit the target data processing model to the operation node if the intermediate debugging result and the target debugging result represent that the target data processing model is successfully debugged.
In other embodiments of the present application, the obtaining unit 41 is specifically configured to perform the following steps:
determining a plurality of data processing modules based on user operations on the canvas;
acquiring target configuration data for each data processing module for realizing a target function;
and obtaining a target data processing model based on the data processing module and the target configuration data.
In other embodiments of the present application, the operational node is configured to process the target data according to a target data processing model.
In other embodiments of the present application, the debugging node comprises a Flink node and the operation node comprises a Yet Another Resource Negotiator (YARN) node.
In other embodiments of the present application, the obtaining unit 41 is specifically configured to perform the following steps:
Acquiring first configuration data for each data processing module;
receiving a determining instruction aiming at the first configuration data, and determining intermediate configuration data aiming at each data processing module from the configuration data to be selected based on the determining instruction;
obtaining second configuration data of each data processing module based on the intermediate configuration data and the first configuration data;
in the case where it is determined that the second configuration data is parsed, it is determined that the second configuration data is target configuration data of each data processing module.
In other embodiments of the present application, the obtaining unit 41 is specifically configured to perform the following steps:
determining a first data processing module representing a data source, a second data processing module having a data processing function and a third data processing module representing a storage location of the processed data based on a user operation on the canvas; the data processing module comprises a first data processing module, a second data processing module and a third data processing module;
accordingly, the obtaining unit 41 is specifically configured to perform the following steps:
receiving first sub-configuration data for a first data processing module, second sub-configuration data for a second data processing module, and third sub-configuration data for a third data processing module; wherein the first configuration data includes first, second and third sub-configuration data.
In other embodiments of the present application, the obtaining unit 41 is further configured to perform the following steps:
determining first abnormal data in the second configuration data in the case where it is determined that the second configuration data fails syntax parsing;
and determining a first abnormality cause and a first optimization scheme based on the first abnormality data and displaying the first abnormality cause and the first optimization scheme.
In other embodiments of the present application, the obtaining unit 41 is specifically configured to perform the following steps:
obtaining an initial data processing model based on each data processing module, a first association relation among the data processing modules and each target configuration data;
carrying out integrity check on the initial data processing model;
in the case of passing the integrity check, the initial data processing model is determined to be the target data processing model.
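The application does not state what the integrity check covers. As one possible reading, offered only as an assumption, the Python sketch below treats the initial model as a directed graph and requires that every module lie on a path from a data source module to a storage module. The function name `integrity_check` and the module kinds `"source"`, `"process"` and `"sink"` are hypothetical.

```python
from collections import deque


def integrity_check(modules: dict[str, str], edges: list[tuple[str, str]]) -> bool:
    """Return True if every module lies on a path from a source to a sink.

    modules maps a module name to its kind: "source", "process" or "sink".
    edges holds the first association relation as (upstream, downstream).
    """
    sources = [m for m, kind in modules.items() if kind == "source"]
    sinks = [m for m, kind in modules.items() if kind == "sink"]
    if not sources or not sinks:
        return False
    # Every association must connect two known modules.
    if any(a not in modules or b not in modules for a, b in edges):
        return False

    def reachable(starts: list[str], forward: bool) -> set[str]:
        """Breadth-first search along (forward) or against the edges."""
        seen, queue = set(starts), deque(starts)
        while queue:
            node = queue.popleft()
            for a, b in edges:
                if forward and a == node:
                    nxt = b
                elif not forward and b == node:
                    nxt = a
                else:
                    continue
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    on_path = reachable(sources, True) & reachable(sinks, False)
    return on_path == set(modules)
```

A model that passes such a check would then be taken as the target data processing model; a dangling module with no route to a storage module would fail it.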
In other embodiments of the present application, the first processing unit 42 is further configured to perform the following steps:
checking the executability of the target data processing model;
accordingly, the first processing unit 42 is specifically configured to perform the following steps:
in the case where the target data processing model is determined to be executable, submitting the target data processing model to the debugging node for analysis to obtain data processing logic;
and processing the debugging data based on the data processing logic to obtain an intermediate debugging result and a target debugging result.
In other embodiments of the present application, the first processing unit 42 is specifically configured to perform the following steps:
submitting the target data processing model to a debugging node for analysis to obtain a processing operator and an output operator in each data processing module; the processing operator is used for processing the data, and the output operator is used for outputting the processing result of each data processing module;
determining a second association relation between the processing operators and a third association relation between the processing operators and the output operators;
determining a first data processing logic for each data processing module based on the processing operator and the output operator;
determining second data processing logic for the target data processing model based on the processing operator, the output operator, the second association and the third association; wherein the data processing logic comprises first data processing logic and second data processing logic.
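The relationship between the operators and the two levels of data processing logic can be sketched in Python under simplifying assumptions: modules are arranged in a linear chain, a processing operator is a plain function on a list of rows, and the output operator is modeled as recording a copy of each module's result. The function `run_debug` and the sample modules are hypothetical and are not Flink operators.

```python
from typing import Callable

Operator = Callable[[list], list]


def run_debug(modules: list[tuple[str, list[Operator]]], debug_data: list):
    """Run debug data through the modules in order.

    Each module is (name, processing operators). Returns the per-module
    intermediate results and the final target result.
    """
    intermediate = {}
    data = debug_data
    for name, processing_ops in modules:
        # First data processing logic: chain the module's own operators.
        for op in processing_ops:
            data = op(data)
        # Output operator: emit this module's processing result.
        intermediate[name] = list(data)
    # Second data processing logic: the chain of modules as a whole.
    return intermediate, data


modules = [
    ("source", [lambda rows: rows]),
    ("filter", [lambda rows: [r for r in rows if r >= 0]]),
    ("double", [lambda rows: [r * 2 for r in rows]]),
]
intermediate, target = run_debug(modules, [3, -1, 4])
```

Here `intermediate` plays the role of the intermediate debugging results, one per module, and `target` plays the role of the target debugging result for the whole model.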
In other embodiments of the present application, the first processing unit 42 is further configured to perform the following steps:
periodically acquiring first operation data, an intermediate debugging result and a target debugging result when the target data processing model is debugged through a debugging node;
under the condition that the intermediate debugging result and the target debugging result represent that the target data processing model is successfully debugged, displaying a first message for representing that the target data processing model is successfully debugged;
under the condition that the intermediate debugging result and the target debugging result represent the debugging abnormality of the target data processing model, or the target debugging result is not obtained within the target time threshold, analyzing the first operation data, the intermediate debugging result and the target debugging result, and determining second abnormal data when the target data processing module is debugged;
and determining a second abnormality cause and a second optimization scheme based on the second abnormality data and displaying the second abnormality cause and the second optimization scheme.
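The periodic acquisition and the time threshold described above amount to polling with a deadline. The following Python sketch shows that pattern only; `poll_debug` and the shape of the value returned by `fetch` are assumptions, and no real debugging node API is implied.

```python
import time
from typing import Callable, Optional


def poll_debug(fetch: Callable[[], Optional[dict]], timeout_s: float,
               interval_s: float = 0.01) -> str:
    """Poll a debug job until it reports a result or the threshold elapses.

    fetch is a hypothetical callable returning None while the job is still
    running, or a dict such as {"ok": True} once a target result exists.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = fetch()
        if result is not None:
            return "success" if result.get("ok") else "abnormal"
        time.sleep(interval_s)
    # No target debugging result within the target time threshold.
    return "timeout"
```

Both the `"abnormal"` and the `"timeout"` outcomes would lead to the analysis step that determines the second abnormal data.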
In other embodiments of the present application, the second processing unit 43 is further configured to perform the following steps:
acquiring second operation data when the target data processing model is operated through the operation node;
analyzing the second operation data through the monitoring node to determine the operation state and the data processing condition of the target data processing model;
under the condition that the running state is normal, displaying the running state and the data processing condition;
under the condition that the running state is abnormal, analyzing the second running data and the data processing condition to obtain third abnormal data aiming at the running of the target data processing model;
determining a third abnormality cause and a third optimization scheme based on the third abnormality data and displaying the third abnormality cause and the third optimization scheme;
And under the condition that the running state is abnormal and the data processing system meets the model recovery condition, processing the target data based on the target data processing model.
It should be noted that, for specific descriptions of the steps executed by each unit, reference may be made to the data processing method provided by the embodiments corresponding to FIGS. 1 to 3, and details are not repeated here.
According to the data processing device provided by the embodiments of the present application, the target data processing model is submitted to the debugging node for debugging, so as to obtain an intermediate debugging result for each data processing module and a target debugging result for the target data processing model. Based on these results, the accuracy of the target data processing model can be verified quickly and accurately. This solves the problem in the related art that, after data processing logic is configured through Flink, a large number of operations and a long time are required to verify the accuracy of the data processing logic, resulting in long time consumption and poor accuracy.
Based on the foregoing embodiments, the embodiments of the present application provide a data processing system, which may be applied to the data processing method provided in the corresponding embodiments of fig. 1 to 3, and referring to fig. 26, the data processing system 5 may include: a processor 51, a memory 52 and a communication bus 53, wherein:
A communication bus 53 for enabling communication connection between the processor 51 and the memory 52;
the processor 51 is configured to execute a data processing program in the memory 52 to implement the steps of:
obtaining a target data processing model based on user operation of a plurality of data processing modules on the canvas; the data processing module is provided with data processing logic for processing data to realize corresponding functions;
submitting the target data processing model to a debugging node for debugging, and obtaining an intermediate debugging result aiming at each data processing module and a target debugging result aiming at the target data processing model;
and submitting the target data processing model to the operation node under the condition that the intermediate debugging result and the target debugging result represent that the target data processing model is successfully debugged.
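The gating between debugging and deployment in the three steps above can be summarized in a short Python sketch. The callables `debug` and `submit` are hypothetical stand-ins for the debugging node and the operation node, and the `{"ok": ...}` result shape is an assumption made for illustration.

```python
def debug_then_submit(model, debug, submit) -> bool:
    """Submit the model to the operation node only if debugging succeeds.

    debug(model) is assumed to return (intermediate results keyed by
    module name, target result), each result being a dict with an "ok" flag.
    """
    intermediate, target = debug(model)
    succeeded = all(r["ok"] for r in intermediate.values()) and target["ok"]
    if succeeded:
        submit(model)
    return succeeded
```

Requiring every per-module result to pass, and not only the final one, reflects the description's use of both the intermediate and the target debugging results.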
In other embodiments of the present application, the processor 51 is configured to execute a data processing program in the memory 52 to obtain a target data processing model based on user operation of a plurality of data processing modules on a canvas to implement the steps of:
determining a plurality of data processing modules based on user operations on the canvas;
acquiring target configuration data for each data processing module for realizing a target function;
And obtaining a target data processing model based on the data processing module and the target configuration data.
In other embodiments of the present application, the operation node is configured to process the target data according to the target data processing model.
In other embodiments of the present application, the debugging node comprises a Flink node and the operation node comprises a Yet Another Resource Negotiator (YARN) node.
In other embodiments of the present application, the processor 51 is configured to execute the data processing program in the memory 52 to acquire the target configuration data for each data processing module for implementing the target function, so as to implement the following steps:
acquiring first configuration data for each data processing module;
receiving a determining instruction aiming at the first configuration data, and determining intermediate configuration data aiming at each data processing module from the configuration data to be selected based on the determining instruction;
obtaining second configuration data of each data processing module based on the intermediate configuration data and the first configuration data;
in the case where it is determined that the second configuration data passes syntax parsing, determining the second configuration data as the target configuration data of each data processing module.
In other embodiments of the present application, the processor 51 is configured to execute a data processing program in the memory 52 to determine a plurality of data processing modules based on user operations on a canvas to implement the steps of:
Determining a first data processing module representing a data source, a second data processing module having a data processing function and a third data processing module representing a storage location of the processed data based on a user operation on the canvas; the data processing module comprises a first data processing module, a second data processing module and a third data processing module;
accordingly, in other embodiments of the present application, the processor 51 is configured to execute the data processing program in the memory 52 to acquire the first configuration data for each data processing module, so as to implement the following steps:
receiving first sub-configuration data for a first data processing module, second sub-configuration data for a second data processing module, and third sub-configuration data for a third data processing module; wherein the first configuration data includes first, second and third sub-configuration data.
In other embodiments of the present application, the processor 51 is configured to execute the data processing program in the memory 52, and the following steps may be implemented:
determining first abnormal data in the second configuration data in the case where it is determined that the second configuration data fails syntax parsing;
And determining a first abnormality cause and a first optimization scheme based on the first abnormality data and displaying the first abnormality cause and the first optimization scheme.
In other embodiments of the present application, the processor 51 is configured to execute a data processing program in the memory 52 to obtain a target data processing model based on the data processing module and the target configuration data, so as to implement the following steps:
obtaining an initial data processing model based on each data processing module, a first association relation among the data processing modules and each target configuration data;
carrying out integrity check on the initial data processing model;
in the case of passing the integrity check, the initial data processing model is determined to be the target data processing model.
In other embodiments of the present application, the processor 51 is configured to execute the data processing program in the memory 52, submit the target data processing model to the debug node for debugging, obtain an intermediate debug result for each data processing module and a target debug result for the target data processing model, and further implement the following steps:
checking the executability of the target data processing model;
accordingly, the processor 51 is configured to execute the data processing program in the memory 52, submit the target data processing model to the debug node for debugging, and obtain an intermediate debug result for each data processing module and a target debug result for the target data processing model, so as to implement the following steps:
in the case where the target data processing model is determined to be executable, submitting the target data processing model to the debugging node for analysis to obtain data processing logic;
and processing the debugging data based on the data processing logic to obtain an intermediate debugging result and a target debugging result.
In other embodiments of the present application, processor 51 is configured to execute a data processing program in memory 52 to submit a target data processing model to a debug node for analysis to obtain data processing logic to implement the steps of:
submitting the target data processing model to a debugging node for analysis to obtain a processing operator and an output operator in each data processing module; the processing operator is used for processing the data, and the output operator is used for outputting the processing result of each data processing module;
determining a second association relation between the processing operators and a third association relation between the processing operators and the output operators;
determining a first data processing logic for each data processing module based on the processing operator and the output operator;
determining second data processing logic for the target data processing model based on the processing operator, the output operator, the second association and the third association; wherein the data processing logic comprises first data processing logic and second data processing logic.
In other embodiments of the present application, the processor 51 is configured to execute the data processing program in the memory 52, and the following steps may be implemented:
periodically acquiring first operation data, an intermediate debugging result and a target debugging result when the target data processing model is debugged through a debugging node;
under the condition that the intermediate debugging result and the target debugging result represent that the target data processing model is successfully debugged, displaying a first message for representing that the target data processing model is successfully debugged;
under the condition that the intermediate debugging result and the target debugging result represent the debugging abnormality of the target data processing model, or the target debugging result is not obtained within the target time threshold, analyzing the first operation data, the intermediate debugging result and the target debugging result, and determining second abnormal data when the target data processing module is debugged;
and determining a second abnormality cause and a second optimization scheme based on the second abnormality data and displaying the second abnormality cause and the second optimization scheme.
In other embodiments of the present application, the processor 51 is configured to execute the data processing program in the memory 52, and the following steps may be implemented:
analyzing the second operation data through the monitoring node to determine the operation state and the data processing condition of the target data processing model;
Under the condition that the running state is normal, displaying the running state and the data processing condition;
under the condition that the running state is abnormal, analyzing the second running data and the data processing condition to obtain third abnormal data aiming at the running of the target data processing model;
determining a third abnormality cause and a third optimization scheme based on the third abnormality data and displaying the third abnormality cause and the third optimization scheme;
and under the condition that the running state is abnormal and the data processing system meets the model recovery condition, processing the target data based on the target data processing model.
It should be noted that, for specific descriptions of the steps executed by the processor, reference may be made to the data processing method provided by the embodiments corresponding to FIGS. 1 to 3, and details are not repeated here.
According to the data processing system provided by the embodiments of the present application, the target data processing model is submitted to the debugging node for debugging, so as to obtain an intermediate debugging result for each data processing module and a target debugging result for the target data processing model. Based on these results, the accuracy of the target data processing model can be verified quickly and accurately. This solves the problem in the related art that, after data processing logic is configured through Flink, a large number of operations and a long time are required to verify the accuracy of the data processing logic, resulting in long time consumption and poor accuracy.
Based on the foregoing embodiments, embodiments of the present application provide a computer readable storage medium storing one or more programs executable by one or more processors to implement the steps of the data processing method provided by the corresponding embodiments of fig. 1 to 3.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the present application.

Claims (15)

1. A method of data processing, the method comprising:
obtaining a target data processing model based on user operation of a plurality of data processing modules on the canvas; the data processing module is provided with data processing logic for processing data to realize corresponding functions;
Submitting the target data processing model to a debugging node for debugging, and obtaining an intermediate debugging result aiming at each data processing module and a target debugging result aiming at the target data processing model;
submitting the target data processing model to an operation node under the condition that the intermediate debugging result and the target debugging result represent that the target data processing model is successfully debugged;
the step of submitting the target data processing model to a debugging node for debugging, and obtaining an intermediate debugging result aiming at each data processing module and a target debugging result aiming at the target data processing model comprises the following steps:
under the condition that the target data processing model is confirmed to be executable, submitting the target data processing model to a debugging node for analysis to obtain a processing operator and an output operator in each data processing module; the processing operator is used for processing data, and the output operator is used for outputting a processing result of each data processing module;
determining data processing logic for the target data processing model according to the processing operator and the output operator; the data processing logic comprises first data processing logic for each data processing module and second data processing logic for the target data processing model, wherein the first data processing logic is obtained according to the processing operator and the output operator, and the second data processing logic is obtained according to the processing operator, the output operator, a second association relation among the processing operators and a third association relation between the processing operator and the output operator;
And processing the debugging data based on the data processing logic to obtain an intermediate debugging result and a target debugging result.
2. The method of claim 1, wherein the obtaining the target data processing model based on user operation of the plurality of data processing modules on the canvas comprises:
determining a plurality of data processing modules based on user operations on the canvas;
acquiring target configuration data for each data processing module for realizing a target function;
and obtaining the target data processing model based on the data processing module and the target configuration data.
3. The method of claim 1, wherein the operation node is configured to process target data according to the target data processing model.
4. A method according to any one of claims 1-3, wherein the debugging node comprises a Flink node and the operation node comprises a Yet Another Resource Negotiator (YARN) node.
5. The method of claim 2, wherein the obtaining target configuration data for each of the data processing modules for implementing a target function comprises:
acquiring first configuration data for each data processing module;
Receiving a determining instruction for the first configuration data, and determining intermediate configuration data for each data processing module from configuration data to be selected based on the determining instruction;
obtaining second configuration data of each data processing module based on the intermediate configuration data and the first configuration data;
and determining the second configuration data as target configuration data of each data processing module under the condition that the second configuration data is determined to pass syntax parsing.
6. The method of claim 2, wherein the determining a plurality of data processing modules based on user operations on the canvas comprises:
determining a first data processing module representing a data source, a second data processing module having a data processing function and a third data processing module representing a storage location of the processed data based on a user operation on the canvas; wherein the data processing module comprises the first data processing module, the second data processing module and the third data processing module;
correspondingly, acquiring the first configuration data for each data processing module includes:
receiving first sub-configuration data for the first data processing module, second sub-configuration data for the second data processing module, and third sub-configuration data for the third data processing module; wherein the first configuration data includes the first sub-configuration data, the second sub-configuration data, and the third sub-configuration data.
7. The method of claim 5, wherein the method further comprises:
determining first abnormal data in the second configuration data under the condition that the second configuration data is determined to fail syntax parsing;
and determining a first abnormality cause and a first optimization scheme based on the first abnormality data and displaying the first abnormality cause and the first optimization scheme.
8. The method of claim 2, wherein the deriving a target data processing model based on the data processing module and the target configuration data comprises:
obtaining an initial data processing model based on each data processing module, a first association relation among the data processing modules and each target configuration data;
carrying out integrity check on the initial data processing model;
and determining the initial data processing model as the target data processing model under the condition of passing the integrity check.
9. The method of claim 1, wherein before the submitting the target data processing model to a debugging node for debugging and obtaining an intermediate debugging result for each of the data processing modules and a target debugging result for the target data processing model, the method further comprises:
checking the executability of the target data processing model.
10. The method of claim 1, wherein the determining data processing logic for the target data processing model from the processing operator and the output operator comprises:
determining a second association relation between the processing operators and a third association relation between the processing operators and the output operators;
determining first data processing logic for each of the data processing modules based on the processing operator and the output operator;
determining second data processing logic for the target data processing model based on the processing operator, the output operator, the second association, and the third association; wherein the data processing logic comprises the first data processing logic and the second data processing logic.
11. The method according to claim 10, wherein the method further comprises:
periodically acquiring first operation data, the intermediate debugging result and the target debugging result when the target data processing model is debugged through the debugging node;
Displaying a first message for representing successful debugging of the target data processing model under the condition that the intermediate debugging result and the target debugging result represent successful debugging of the target data processing model;
analyzing the first operation data, the intermediate debugging result and the target debugging result under the condition that the intermediate debugging result and the target debugging result represent the target data processing model debugging abnormality, or the target debugging result is not obtained within a target time threshold value, and determining second abnormal data when the target data processing module is debugged;
and determining a second abnormality cause and a second optimization scheme based on the second abnormality data and displaying the second abnormality cause and the second optimization scheme.
12. The method according to claim 1, wherein the method further comprises:
acquiring second operation data when the target data processing model is operated through the operation node;
analyzing the second operation data through a monitoring node to determine the operation state and the data processing condition of the target data processing model;
under the condition that the running state is normal, displaying the running state and the data processing condition;
Under the condition that the running state is abnormal, analyzing the second running data and the data processing condition to obtain third abnormal data aiming at the running of the target data processing model;
determining a third abnormality cause and a third optimization scheme based on the third abnormality data and displaying the third abnormality cause and the third optimization scheme;
and processing the target data based on the target data processing model under the condition that the running state is abnormal and the data processing system meets the model recovery condition.
13. A data processing apparatus, the apparatus comprising:
an acquisition unit for acquiring a target data processing model based on user operations on a plurality of data processing modules on a canvas; the data processing module is provided with data processing logic for processing data to realize corresponding functions;
the first processing unit is used for submitting the target data processing model to a debugging node for debugging, and obtaining an intermediate debugging result aiming at each data processing module and a target debugging result aiming at the target data processing model;
the second processing unit is used for submitting the target data processing model to an operation node under the condition that the intermediate debugging result and the target debugging result represent that the target data processing model is successfully debugged;
The first processing unit is specifically configured to submit the target data processing model to a debug node for analysis under the condition that the target data processing model is determined to be executable, so as to obtain a processing operator and an output operator in each data processing module; the processing operator is used for processing data, and the output operator is used for outputting a processing result of each data processing module; determining data processing logic for the target data processing model according to the processing operator and the output operator; the data processing logic comprises first data processing logic for each data processing module and second data processing logic for the target data processing model, wherein the first data processing logic is obtained according to the processing operator and the output operator, and the second data processing logic is obtained according to the processing operator, the output operator, a second association relation among the processing operators and a third association relation between the processing operator and the output operator; and processing the debugging data based on the data processing logic to obtain an intermediate debugging result and a target debugging result.
14. A data processing system, the system comprising: a processor, a memory, and a communication bus;
the communication bus is used for realizing communication connection between the processor and the memory;
the processor is configured to execute a data processing program in the memory, so as to implement the steps of the data processing method according to any one of claims 1 to 12.
15. A computer readable storage medium storing one or more programs executable by one or more processors to implement the steps of the data processing method of any one of claims 1-12.
CN202211304175.2A 2022-10-24 2022-10-24 Data processing method, device, system and computer readable storage medium Active CN115357309B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211304175.2A CN115357309B (en) 2022-10-24 2022-10-24 Data processing method, device, system and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN115357309A (en) 2022-11-18
CN115357309B (en) 2023-07-14

Family

ID=84008792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211304175.2A Active CN115357309B (en) 2022-10-24 2022-10-24 Data processing method, device, system and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN115357309B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104133717A (en) * 2014-08-26 2014-11-05 中电海康集团有限公司 Service automatic generation and deployment method for data open system
CN104520814A (en) * 2012-08-07 2015-04-15 超威半导体公司 System and method for configuring cloud computing systems
CN106952426A (en) * 2017-02-24 2017-07-14 百富计算机技术(深圳)有限公司 Data processing method and device
CN109144735A (en) * 2018-09-29 2019-01-04 百度在线网络技术(北京)有限公司 Method and apparatus for handling data
CN113806429A (en) * 2020-06-11 2021-12-17 深信服科技股份有限公司 Canvas type log analysis method based on large data stream processing framework
CN113868127A (en) * 2021-09-22 2021-12-31 南京苏宁电子信息技术有限公司 Online debugging method and device, computer equipment and storage medium

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9336023B2 (en) * 2009-12-18 2016-05-10 Sybase, Inc. Visual generation of mobile applications based on data models
EP2523107B1 (en) * 2011-04-19 2018-11-07 LG Electronics Inc. Mobile terminal and system for managing applications using the same
US20160179161A1 (en) * 2014-12-22 2016-06-23 Robert P. Adler Decode information library
KR102401772B1 (en) * 2015-10-02 2022-05-25 삼성전자주식회사 Apparatus and method for executing application in electronic deivce
EP3376373A1 (en) * 2017-03-15 2018-09-19 Siemens Aktiengesellschaft A method for deployment and execution of a machine learning model on a field device
CN110597678B (en) * 2019-09-09 2022-05-31 腾讯科技(深圳)有限公司 Debugging method and debugging unit
CN111352616A (en) * 2020-02-20 2020-06-30 苏宁云计算有限公司 Real-time calculation visualization development system and application method thereof
CN111752665A (en) * 2020-06-30 2020-10-09 北京来也网络科技有限公司 Flow generation method and device for RPA flow generation end and storage medium
CN112632391A (en) * 2020-12-30 2021-04-09 深圳市华傲数据技术有限公司 Data processing method, device and storage medium
CN112988130A (en) * 2021-02-24 2021-06-18 恒安嘉新(北京)科技股份公司 Visual modeling method, device, equipment and medium based on big data
CN112948467B (en) * 2021-03-18 2023-10-10 北京中经惠众科技有限公司 Data processing method and device, computer equipment and storage medium
CN114416779A (en) * 2022-03-21 2022-04-29 北京德塔精要信息技术有限公司 Data processing method, device and system

Also Published As

Publication number Publication date
CN115357309A (en) 2022-11-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant