CN114281549A

CN114281549A - Data processing method and device

Info

Publication number: CN114281549A
Application number: CN202111633475.0A
Authority: CN
Inventors: 李斌; 李昱; 王全礼; 张圳; 郭程建
Original assignee: China Construction Bank Corp
Current assignee: China Construction Bank Corp
Priority date: 2021-12-28
Filing date: 2021-12-28
Publication date: 2022-04-05

Abstract

The application provides a data processing method and device in the technical field of data processing. The method mainly comprises the following steps: first, in response to an adding request triggered by a user, adding a plurality of task operators to the original data of a target task, wherein the task operators are used for triggering operations on the data to be processed of the target task; then, establishing a dependency relationship among the plurality of task operators to generate task data of the target task; and finally, sending a processing request of the target task to a server, wherein the processing request comprises the task data of the target task. With this method, the task data of the target task can be generated from the plurality of task operators and a processing request containing the task data can be sent to the server, so that the server can complete the whole processing flow for the data to be processed of the target task according to the processing request, which improves the efficiency with which the server processes the data to be processed of the target task.

Description

Data processing method and device

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a data processing method and apparatus.

Background

In the financial industry, content data is mainly unstructured or semi-structured text data, such as customer service interaction records, financial information, work orders and statistical reports, and extracting the required information from such content data is a problem to be solved.

Conventionally, content data is processed with Natural Language Processing (NLP) techniques and machine learning techniques, and statistics on the content data are computed with Structured Query Language (SQL), so as to extract the required information from the content data.

However, in the conventional approach, multiple processing or statistical procedures must be performed separately to extract the required information from the content data, which makes processing of the content data inefficient.

Disclosure of Invention

The application provides a data processing method and device, which aim to solve the technical problem that the processing efficiency of content data is low in the prior art.

In a first aspect, the present application provides a method for processing data, the method including:

in response to an adding request triggered by a user, adding a plurality of task operators to original data of a target task, wherein the task operators are used for triggering operations on the data to be processed of the target task;

establishing a dependency relationship among the task operators to generate task data of the target task;

and sending a processing request of the target task to a server, wherein the processing request comprises task data of the target task.

According to the data processing method provided by the first aspect, the terminal device can generate the task data of the target task according to the plurality of task operators and send the processing request containing the task data to the server, so that the server can complete the whole processing flow of the data to be processed of the target task according to the processing request, and the processing efficiency of the server on the data to be processed of the target task is improved.

In an optional implementation, the adding a plurality of task operators to the original data of the target task includes:

adding the plurality of task operators to the original data of the target task according to a preset adding order of the task operators.

In an optional implementation manner, the preset adding order of the task operators is a first task operator, at least one second task operator, and a third task operator in sequence;

the first task operator is used for acquiring the data to be processed, the second task operator is used for processing or counting the data to be processed, and the third task operator is used for outputting result data corresponding to the data to be processed.

According to the data processing method provided by this embodiment, the plurality of task operators can be added to the original data of the target task in the preset adding order, so that input and output tasks, natural language processing tasks, machine learning tasks, statistical analysis tasks and the like can be unified into the task data of a single target task, which improves developers' development and collaboration efficiency for the target task.

In an optional embodiment, before the establishing the dependency relationship between the plurality of task operators, the method further comprises:

determining operator parameters respectively corresponding to the plurality of task operators contained in the adding request;

and configuring the task operators according to the operator parameters respectively corresponding to the task operators.

In an optional implementation manner, if a task operator is the first task operator, the configuring the plurality of task operators includes:

configuring an acquisition path of the first task operator for the data to be processed according to the operator parameter corresponding to the first task operator.

In an optional implementation manner, if the task operator is the third task operator, the configuring the plurality of task operators includes:

configuring an output path of the third task operator for the result data according to the operator parameter corresponding to the third task operator.

According to the data processing method provided by this embodiment, the task operators can be configured for different target tasks by setting operator parameters, so that task operators of the same type can be applied to different data processing flows, which improves the utilization rate of the task operators.

In an optional embodiment, after the generating task data of the target task, the method further includes:

responding to a test request triggered by a user, and determining test data corresponding to the target task from test set data;

determining a test result corresponding to the test data according to the test data and the task data.

In an optional implementation manner, the sending a processing request of the target task to a server includes:

if the test result corresponding to the test data is successful, sending the processing request of the target task to the server.

By the data processing method provided by the embodiment, the task data of the target task can be tested in advance through the test set data, and the processing request of the target task is sent after the test result is successful, so that the accuracy of the processing result corresponding to the data to be processed is improved.

In an optional embodiment, the establishing a dependency relationship between the task operators includes:

establishing a dependency relationship among the plurality of task operators according to the structure of a directed acyclic graph, wherein the structure of the directed acyclic graph comprises task nodes corresponding to the plurality of task operators, and the task paths among the task nodes form no closed loop.

According to the data processing method provided by the embodiment, the dependency relationship among the task operators can be established according to the structure of the directed acyclic graph, so that the server can flexibly and orderly complete the target task according to the dependency relationship among the task operators.

In a second aspect, the present application provides a method for processing data, the method including:

receiving a processing request of a target task sent by a terminal device, wherein the processing request comprises task data of the target task, and the task data of the target task is generated by establishing a dependency relationship among a plurality of task operators;

and determining result data corresponding to the data to be processed according to the task data of the target task.

In an optional embodiment, the task data of the target task includes a first task operator, at least one second task operator, and a third task operator;

the first task operator is used for acquiring the data to be processed according to an acquisition path, the at least one second task operator is used for determining result data corresponding to the data to be processed, and the third task operator is used for outputting the result data corresponding to the data to be processed according to an output path.
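On the server side, determining the result data from the task data amounts to running each task operator only after the operators it depends on. The following is a minimal sketch, not the application's implementation, assuming Python 3.9+ (for the standard-library `graphlib` module), that operators are plain callables taking and returning lists of rows, and that nodes without dependencies receive `source_rows` as a stand-in for the data the first task operator would fetch via its acquisition path. `execute_task` and its parameters are hypothetical names.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List

Operator = Callable[[list], list]


def execute_task(operators: Dict[str, Operator],
                 depends_on: Dict[str, List[str]],
                 source_rows: list) -> Dict[str, list]:
    """Run every operator after its upstream operators and return each node's output.

    depends_on maps an operator name to the names it depends on. A node with no
    dependencies receives source_rows (a stand-in for data fetched via the
    acquisition path); any other node receives the concatenated outputs of its
    upstream nodes.
    """
    outputs: Dict[str, list] = {}
    # static_order() yields dependencies before dependents and raises
    # graphlib.CycleError if the dependency graph contains a closed loop.
    for name in TopologicalSorter(depends_on).static_order():
        upstream = depends_on.get(name, [])
        if upstream:
            data = [row for dep in upstream for row in outputs[dep]]
        else:
            data = source_rows
        outputs[name] = operators[name](data)
    return outputs
```

For example, with `depends_on = {"upper": ["acquire"], "output": ["upper"]}` the operators run in the order acquire, upper, output, and the output of the last node plays the role of the result data.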

In an optional implementation manner, the processing request of the target task includes an identifier of a target service system, and after the result data corresponding to the data to be processed is determined, the method further includes:

sending the result data to the terminal device corresponding to the target service system according to the identifier of the target service system.
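Sending the result data according to the identifier of the target service system implies a lookup from that identifier to one or more terminal devices. A minimal sketch follows, assuming a plain in-memory registry dictionary as a hypothetical stand-in for however the server actually resolves terminals; `route_results` and `terminals_by_system` are illustrative names, and actual delivery over the network is left out.

```python
from typing import Dict, List


def route_results(result_rows: list,
                  target_system_id: str,
                  terminals_by_system: Dict[str, List[str]]) -> Dict[str, list]:
    """Map each terminal device registered for the target service system to a copy
    of the result data. terminals_by_system is a hypothetical registry."""
    terminals = terminals_by_system.get(target_system_id)
    if not terminals:
        raise KeyError(f"no terminal device registered for service system {target_system_id!r}")
    return {terminal: list(result_rows) for terminal in terminals}
```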

In an optional implementation manner, after receiving the processing request of the target task sent by the terminal device, the method further includes:

acquiring, according to a preset monitoring index, monitoring data corresponding to the task data of the target task in the running state, wherein the monitoring index is used for indicating whether the running state of the task data of the target task is normal.
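The application does not specify what a monitoring index contains. As one hedged illustration, a monitoring index could be modeled as a metric name with an upper bound, and the running state judged normal when each collected sample stays within its bound. `MonitoringIndex`, `check_running_state` and the metric names in the example are hypothetical, and treating a missing sample as abnormal is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MonitoringIndex:
    """Hypothetical monitoring index: a metric name and its maximum normal value."""
    metric: str
    max_value: float


def check_running_state(samples: Dict[str, float],
                        indices: List[MonitoringIndex]) -> Dict[str, bool]:
    """Return metric -> True if normal. A metric with no collected sample is
    treated as abnormal (an assumption of this sketch)."""
    return {
        idx.metric: idx.metric in samples and samples[idx.metric] <= idx.max_value
        for idx in indices
    }
```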

In a third aspect, the present application provides an apparatus for processing data, the apparatus comprising:

the response module is used for responding to an adding request triggered by a user and adding a plurality of task operators in original data of a target task, wherein the task operators are used for triggering the operation on the data to be processed of the target task;

the generating module is used for establishing a dependency relationship among the task operators and generating task data of the target task;

and the sending module is used for sending a processing request of the target task to a server, wherein the processing request comprises task data of the target task.

In an optional implementation manner, the response module is specifically configured to add the plurality of task operators to the original data of the target task according to a preset adding order of the task operators.

In an optional implementation manner, the preset adding order of the task operators is a first task operator, at least one second task operator, and a third task operator in sequence; the first task operator is used for acquiring the data to be processed, the second task operator is used for processing or counting the data to be processed, and the third task operator is used for outputting result data corresponding to the data to be processed.

In an optional implementation manner, the response module is further configured to determine operator parameters respectively corresponding to the plurality of task operators included in the adding request, and to configure the plurality of task operators according to their respective operator parameters.

In an optional implementation manner, if the task operator is the first task operator, the response module is specifically configured to configure, according to the operator parameter corresponding to the first task operator, an acquisition path of the first task operator for the data to be processed.

In an optional implementation manner, if the task operator is the third task operator, the response module is specifically configured to configure, according to the operator parameter corresponding to the third task operator, an output path of the third task operator for the result data.

In an optional embodiment, the generating module is further configured to determine, in response to a test request triggered by a user, test data corresponding to the target task from test set data, and to determine a test result corresponding to the test data according to the test data and the task data.

In an optional implementation manner, the sending module is specifically configured to send the processing request of the target task to the server if the test result corresponding to the test data is successful.

In an optional implementation manner, the generating module is specifically configured to establish a dependency relationship between the plurality of task operators according to a structure of a directed acyclic graph, where the structure of the directed acyclic graph includes task nodes corresponding to the plurality of task operators, and a task path between the task nodes has no closed loop.

In a fourth aspect, the present application provides an apparatus for processing data, the apparatus comprising:

the receiving module is used for receiving a processing request of a target task sent by a terminal device, wherein the processing request comprises task data of the target task, and the task data of the target task is generated by establishing a dependency relationship among a plurality of task operators;

and the processing module is used for determining result data corresponding to the data to be processed according to the task data of the target task.

In an optional embodiment, the task data of the target task includes a first task operator, at least one second task operator, and a third task operator; the first task operator is used for acquiring the data to be processed according to an acquisition path, the at least one second task operator is used for determining result data corresponding to the data to be processed, and the third task operator is used for outputting the result data corresponding to the data to be processed according to an output path.

In an optional implementation manner, the processing request of the target task includes an identifier of a target service system, and the processing module is further configured to send the result data to a terminal device corresponding to the target service system according to the identifier of the target service system.

In an optional implementation manner, the processing module is further configured to acquire monitoring data corresponding to the task data of the target task in an operating state according to a preset monitoring index, where the monitoring index is used to indicate whether the operating state of the task data of the target task is normal.

In a fifth aspect, the present application further provides a computer program product comprising a computer program which, when executed by a processor, performs the method according to any implementation of the first aspect.

In a sixth aspect, the present application further provides a computer program product comprising a computer program which, when executed by a processor, performs the method according to any implementation of the second aspect.

In a seventh aspect, the present application further provides a computer storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the method according to any implementation of the first aspect.

In an eighth aspect, the present application further provides a computer storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the method according to any implementation of the second aspect.

In a ninth aspect, the present application further provides an electronic device, comprising a processor and a memory, wherein the memory stores a computer program adapted to be loaded by the processor to perform the method according to any implementation of the first aspect.

In a tenth aspect, the present application further provides an electronic device, comprising a processor and a memory, wherein the memory stores a computer program adapted to be loaded by the processor to perform the method according to any implementation of the second aspect.

According to the data processing method and device, firstly, in response to an adding request triggered by a user, a plurality of task operators are added into original data of a target task, and the task operators are used for triggering operation on to-be-processed data of the target task; then, establishing a dependency relationship among a plurality of task operators to generate task data of the target task; and finally, sending a processing request of the target task to the server, wherein the processing request comprises task data of the target task. By the method, the terminal device can generate the task data of the target task according to the plurality of task operators and send the processing request containing the task data to the server, and the server can complete the whole processing flow of the data to be processed of the target task according to the processing request, so that the processing efficiency of the server on the data to be processed of the target task is improved.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the following briefly introduces the drawings needed to be used in the description of the embodiments or the prior art, and obviously, the drawings in the following description are some embodiments of the present invention, and those skilled in the art can obtain other drawings according to the drawings without inventive labor.

Fig. 1 is a schematic view of an application scenario of a data processing method according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application;

Fig. 3 is a schematic view of a DAG structure of task data of a target task according to an embodiment of the present application;

Fig. 4 is a schematic flowchart of another data processing method according to an embodiment of the present application;

Fig. 5 is a block diagram of a data processing system according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;

Fig. 7 is a schematic structural diagram of another data processing apparatus according to an embodiment of the present application;

Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application;

Fig. 9 is a schematic structural diagram of another electronic device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

With the development of big data and artificial intelligence technologies, many industries have attempted or completed the construction of big data and artificial intelligence platforms. In the financial industry, content data is mainly unstructured or semi-structured text data, such as customer service interaction records, financial information, work orders and statistical reports, and extracting the required information from such content data is a problem to be solved.

In order to solve the above technical problems, embodiments of the present application provide a data processing method and apparatus, which generate task data of a target task according to a plurality of task operators, and send a processing request including the task data to a server, so that the server can complete a whole processing flow of to-be-processed data of the target task according to the processing request, thereby improving processing efficiency of the server on the to-be-processed data of the target task.

An application scenario of the data processing method according to the present application is described below.

Fig. 1 is a schematic view of an application scenario of a data processing method according to an embodiment of the present application. As shown in Fig. 1, the scenario includes a terminal device 101 and a server 102. First, the terminal device 101 may add a plurality of task operators to the original data of the target task according to an adding request triggered by a user, so as to generate the task data of the target task, and send a processing request containing the task data of the target task to the server 102. After receiving the processing request, the server 102 may trigger operations such as acquisition, processing, statistics and output of the data to be processed according to the processing request, and send the processing result data corresponding to the data to be processed to the terminal device 101.

The terminal device 101 may include one terminal device or a plurality of terminal devices. When the terminal device 101 includes a single terminal device, that terminal device may send the processing request including the task data of the target task to the server, and may also receive the processing result data corresponding to the data to be processed from the server. When the terminal device 101 includes a plurality of terminal devices, some of them may send the processing request including the task data of the target task to the server, while others may receive the processing result data corresponding to the data to be processed from the server.

The terminal device may be a mobile phone, a tablet computer (pad), a computer with a wireless transceiving function, a Virtual Reality (VR) terminal device, an Augmented Reality (AR) terminal device, a wireless terminal in self driving, a wireless terminal in remote medical surgery, a wireless terminal in a smart grid, a wireless terminal in a smart home, and the like. In the embodiments of the present application, the apparatus for implementing the functions of the terminal may be the terminal itself, or an apparatus capable of supporting the terminal in implementing those functions, such as a chip system, which may be installed in the terminal. In the embodiments of the present application, the chip system may consist of a chip, or may include a chip and other discrete devices.

The server may be, but is not limited to, a single web server, a server group consisting of a plurality of web servers, or a cloud based on cloud computing consisting of a large number of computers or web servers.

It should be understood that the technical solution may be applied to the data processing scenario shown in Fig. 1, but is not limited thereto, and may also be applied to other scenarios that require data processing.

It can be understood that the above data processing method can be implemented by the data processing apparatus provided in the embodiments of the present application, and the data processing apparatus can be part or all of a certain device, such as a terminal device or a server.

The technical solutions of the embodiments of the present application will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.

Fig. 2 is a flowchart illustrating a data processing method according to an embodiment of the present application, where the embodiment relates to a process in which a terminal device sends a processing request of a target task to a server according to an addition request triggered by a user. As shown in fig. 2, the method includes:

S201, the terminal device, in response to an adding request triggered by a user, adds a plurality of task operators to the original data of a target task, wherein the task operators are used for triggering operations on the to-be-processed data of the target task.

In this embodiment of the present application, the terminal device may first add a plurality of task operators in the original data of the target task in response to an addition request triggered by a user.

The data to be processed may be any type of data. In some embodiments, the data to be processed may be unstructured or semi-structured text data, for example customer service interaction records, financial information, work orders and statistical reports. The target task may be any type of task performed on the data to be processed. In some embodiments, the type of the target task may be determined according to business process requirements. Illustratively, the target tasks may include "analyze new words from the last 7 days", "count popular topics from the last month", "calculate the pairwise similarity of news items from the last 3 months", and the like. It should be understood that the target task may indicate that the data to be processed is to be acquired within a preset time range.
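A target task such as "analyze new words from the last 7 days" implies that the first task operator selects only records inside a preset time range. The sketch below illustrates that selection under stated assumptions: each record is a hypothetical `Record` with a creation timestamp and text, and the window is inclusive at both ends. It is an illustration, not the application's acquisition logic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Record:
    """Hypothetical shape of one item of data to be processed."""
    created_at: datetime
    text: str


def select_in_window(records: List[Record], now: datetime, days: int) -> List[Record]:
    """Keep records whose timestamp falls within the last `days` days,
    inclusive of both the window start and `now`."""
    start = now - timedelta(days=days)
    return [r for r in records if start <= r.created_at <= now]
```

For example, with `now = datetime(2021, 12, 28)` and `days=7`, a record created on 2021-12-21 at midnight is kept while one created on 2021-12-01 is dropped.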

The embodiments of the present application do not limit how the adding request is triggered. In some embodiments, the triggering manner may be determined according to the type of the terminal device. Illustratively, when the terminal device is a mobile phone or a tablet computer, the adding request may be triggered by dragging the visual icon corresponding to a task operator into a target area.

The embodiments of the present application do not limit the types of the task operators. In some embodiments, a task operator may be a data processing model constructed with Natural Language Processing (NLP) techniques or machine learning techniques; illustratively, such models may perform text clustering, classification, word segmentation, similarity calculation, named entity recognition, multi-label classification and the like. For example, a "classification task operator" may implement a classification algorithm to classify input data, and a "word segmentation task operator" may implement a word segmentation algorithm to segment data. In other embodiments, a task operator may be a Structured Query Language (SQL) statement that performs a statistical analysis task; illustratively, this includes SQL that performs grouped sorting, grouped aggregation and similar statistical analysis. In still other embodiments, the task operators may further include data input and output operators, for example an input operator for acquiring the data to be processed and an output operator for outputting the result data. The embodiments of the present application also do not limit the source of the task operators. In some embodiments, the source of a task operator may be determined according to its type. Illustratively, task operators may be built from a general-purpose, standard NLP toolkit or machine learning model, or from a general SQL-based statistical analysis task.
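To make the two main operator families concrete, the sketch below gives them one shared signature (a list of row dictionaries in, a list of row dictionaries out), which is an assumption rather than the application's interface. The NLP-style operator uses whitespace splitting purely as a placeholder for a real word segmentation toolkit, and the statistical operator expresses a grouped aggregation in SQL, run on an in-memory SQLite database via Python's standard `sqlite3` module. Operator names and the `category` column are hypothetical.

```python
import sqlite3
from typing import Callable, Dict, List

# Assumed common signature: an operator maps a list of rows to a list of rows.
Rows = List[Dict[str, object]]
Operator = Callable[[Rows], Rows]


def segmentation_operator(rows: Rows) -> Rows:
    """NLP-style operator placeholder: split each row's text on whitespace.
    A real deployment would call an NLP toolkit instead."""
    return [{**row, "tokens": str(row["text"]).split()} for row in rows]


def sql_group_count_operator(rows: Rows) -> Rows:
    """Statistical operator: a grouped aggregation with grouped sorting, written in SQL
    and executed on an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE input (category TEXT)")
        conn.executemany("INSERT INTO input (category) VALUES (?)",
                         [(row["category"],) for row in rows])
        cursor = conn.execute(
            "SELECT category, COUNT(*) AS n FROM input "
            "GROUP BY category ORDER BY n DESC, category")
        return [{"category": category, "n": n} for category, n in cursor.fetchall()]
    finally:
        conn.close()
```

Because both functions satisfy the same `Operator` type, either can be placed as a node in the task's dependency graph without special handling.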

In some embodiments, the set of task operators may also be extended: a unified interface may be provided for writing task operators with the required functions; an existing third-party data model may be loaded and used directly as a task operator; or a statistical-analysis task operator may be written as SQL statements (for example, Spark SQL or HiveQL). In other embodiments, a plurality of task operators may be combined according to the structural requirements of a Directed Acyclic Graph (DAG) to generate a combined task operator. Illustratively, the classification task operator may be combined with the word segmentation task operator to generate a classification-segmentation task operator.
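The simplest way to combine operators into one combined task operator is a linear chain, which is itself a DAG. The sketch below assumes the same hypothetical row-list signature as elsewhere; `combine`, `segment` and `classify` are illustrative names, and the keyword rule inside `classify` is a placeholder for a trained classifier. Segmentation runs first here because the placeholder classifier reads the tokens it produces. Combinations with branches are not covered.

```python
from functools import reduce
from typing import Callable, Dict, List

Rows = List[Dict[str, object]]
Operator = Callable[[Rows], Rows]


def combine(*operators: Operator) -> Operator:
    """Chain operators into one combined operator: each operator's output feeds the next."""
    def combined(rows: Rows) -> Rows:
        return reduce(lambda data, op: op(data), operators, rows)
    return combined


def segment(rows: Rows) -> Rows:
    # Whitespace splitting stands in for a real word segmentation model.
    return [{**r, "tokens": str(r["text"]).split()} for r in rows]


def classify(rows: Rows) -> Rows:
    # Hypothetical keyword rule standing in for a trained classification model.
    return [{**r, "label": "complaint" if "refund" in r["tokens"] else "other"} for r in rows]


segment_then_classify = combine(segment, classify)
```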

The embodiments of the present application do not limit how the terminal device adds the plurality of task operators to the original data of the target task. In some embodiments, the terminal device may add the plurality of task operators to the original data of the target task according to a preset adding order of the task operators. For example, according to the preset adding order, the first task operator may be added first, then the at least one second task operator, and finally the third task operator. The first task operator is used for acquiring the data to be processed, the second task operator is used for processing or computing statistics on the data to be processed, and the third task operator is used for outputting the result data corresponding to the data to be processed.
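The preset adding order can be checked mechanically before the dependency relationship is built. This sketch assumes exactly one first and one third task operator (the text only says "at least one" for the second kind), and the `OperatorKind` enum and `follows_preset_order` function are hypothetical names.

```python
from enum import Enum
from typing import List


class OperatorKind(Enum):
    INPUT = "first"      # acquires the data to be processed
    PROCESS = "second"   # processes or computes statistics on the data
    OUTPUT = "third"     # outputs the result data


def follows_preset_order(kinds: List[OperatorKind]) -> bool:
    """True if the added operators are: one INPUT, then at least one PROCESS,
    then one OUTPUT."""
    if len(kinds) < 3:
        return False
    first, *middle, last = kinds
    return (first is OperatorKind.INPUT
            and last is OperatorKind.OUTPUT
            and all(kind is OperatorKind.PROCESS for kind in middle))
```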

It should be understood that the addition request triggered by the user may further include operator parameters corresponding to the plurality of task operators. The embodiment of the application does not limit how the terminal device configures the task operator. In some embodiments, the terminal device may further determine operator parameters corresponding to the plurality of task operators, and then configure the plurality of task operators according to the operator parameters corresponding to the plurality of task operators. Exemplarily, if the type of the task operator is a first task operator, the terminal device may configure an acquisition path of the first task operator to the data to be processed according to an operator parameter corresponding to the first task operator; if the type of the task operator is the third task operator, the terminal device may configure an output path of the third task operator to the result data according to an operator parameter corresponding to the third task operator.
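Configuring operators from their operator parameters can be sketched as follows. The `TaskOperator` dataclass, the `kind` strings and the `"path"` parameter key are hypothetical conventions chosen for illustration; the application only states that the first operator receives an acquisition path and the third operator an output path.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TaskOperator:
    name: str
    kind: str                           # "first", "second" or "third"
    source_path: Optional[str] = None   # acquisition path (first task operator only)
    sink_path: Optional[str] = None     # output path (third task operator only)
    params: Dict[str, str] = field(default_factory=dict)


def configure(op: TaskOperator, operator_params: Dict[str, str]) -> TaskOperator:
    """Apply operator parameters. The 'path' key is a hypothetical convention and
    raises KeyError if a first or third operator is configured without it."""
    op.params.update(operator_params)
    if op.kind == "first":
        op.source_path = operator_params["path"]
    elif op.kind == "third":
        op.sink_path = operator_params["path"]
    return op
```

The path values in the example below, such as `hdfs:///data/work_orders`, are made up for illustration.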

In this step, adding the plurality of task operators to the original data of the target task allows data input and output tasks, NLP tasks, machine learning tasks, statistical analysis tasks and the like to be unified in one data processing flow, and allows the same task operator to be reused in different data processing flows, thereby improving the processing efficiency of the data to be processed and the utilization rate of the task operators.

S202, the terminal device establishes a dependency relationship among a plurality of task operators to generate task data of a target task.

In this step, after adding the plurality of task operators to the original data of the target task, the terminal device may establish a dependency relationship between the plurality of task operators to generate the task data of the target task.

It should be understood that the task operators required to complete a target task depend on one another, for example through the order in which they are executed. The embodiments of the present application do not limit how the dependency relationship among the task operators is established. In some embodiments, the terminal device may establish the dependency relationship among the plurality of task operators according to the structure of a DAG. The resulting DAG contains task nodes corresponding to the plurality of task operators, and, starting from any task node, no task path formed by following the execution order of the task operators returns to that node, that is, no path forms a closed loop.
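Whether a set of dependencies satisfies the DAG requirement can be checked with Kahn's algorithm, a standard technique swapped in here for illustration; the application does not say how the check is done. The sketch produces an execution order when no closed loop exists and raises an error otherwise. Node names and the edge convention (an edge `(a, b)` means operator `b` depends on `a`) are assumptions.

```python
from collections import deque
from typing import Dict, List, Tuple


def topological_order(nodes: List[str], edges: List[Tuple[str, str]]) -> List[str]:
    """Return an execution order of the task nodes using Kahn's algorithm.

    An edge (a, b) means operator b depends on operator a. Raises ValueError
    if the dependencies contain a closed loop.
    """
    indegree: Dict[str, int] = {n: 0 for n in nodes}
    children: Dict[str, List[str]] = {n: [] for n in nodes}
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1
    # Start with nodes that depend on nothing.
    ready = deque(n for n in nodes if indegree[n] == 0)
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    # Nodes left unvisited sit on, or downstream of, a cycle.
    if len(order) != len(nodes):
        raise ValueError("dependency graph contains a closed loop")
    return order
```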

Exemplarily, Fig. 3 is a schematic diagram of a DAG structure of task data of a target task according to an embodiment of the present application. As shown in Fig. 3, the DAG includes a first task operator 301, second task operators 302 and a third task operator 303. The first task operator 301 may be used to obtain work order data. The second task operators 302 include a word segmentation operator 3021, a vectorization operator 3022, a feature merging operator 3023, a text classification operator 3024 and a word frequency statistics operator 3025. The word frequency statistics operator 3025 is configured to perform word frequency statistics on the data after text classification and extract the N words with the highest word frequency. The third task operator 303 is configured to output these N words to an HBase result table and an HDFS file for storage. In other embodiments, the task operators corresponding to the second task operators 302 may also be combined into one combined operator, which performs word segmentation, vectorization, feature merging, text classification and other operations on the work order data, performs word frequency statistics on the classified data, and extracts the N words with the highest word frequency.
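The word frequency statistics operator 3025 can be sketched with Python's `collections.Counter`. This covers only that one operator, not the rest of Fig. 3. It assumes each classified row carries a `tokens` list (from the segmentation step) and a `label` (from text classification), and that the optional `label` filter restricts counting to one class; the filter is an assumption, since the text only says statistics are computed on the classified data.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def top_n_words(classified_rows: List[Dict[str, object]],
                n: int,
                label: Optional[str] = None) -> List[Tuple[str, int]]:
    """Count tokens across classified rows (optionally only rows with `label`)
    and return the n most frequent words with their counts."""
    counts = Counter(
        token
        for row in classified_rows
        if label is None or row["label"] == label
        for token in row["tokens"]
    )
    return counts.most_common(n)
```

The returned list is what the third task operator 303 would then write to its configured outputs.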

In some embodiments, after generating the task data of the target task, the terminal device may further receive a test request triggered by the user, determine, according to the test request, test data corresponding to the target task from test set data, and then determine a test result corresponding to the test data according to the test data and the task data. The embodiment of the application does not limit how the test data corresponding to the target task is determined; in some embodiments, it may be determined according to the type of the target task.
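
A minimal sketch of the test step, assuming the test set data is keyed by task type and each test case is an (input, expected output) pair. Modelling the task data as a Python callable, the `TaskTestResult` type, and the "at least one case and no failures" success rule are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Case = Tuple[Any, Any]  # (input sample, expected output)


@dataclass
class TaskTestResult:
    passed: int
    failed: int

    @property
    def successful(self) -> bool:
        # Assumed rule: success means at least one case ran and none failed.
        return self.passed > 0 and self.failed == 0


def select_test_data(test_set: Dict[str, List[Case]], task_type: str) -> List[Case]:
    """Choose the test cases registered for this task type; empty if none exist."""
    return test_set.get(task_type, [])


def run_test(task: Callable[[Any], Any], cases: List[Case]) -> TaskTestResult:
    """Execute the task data (modelled as a callable) on each case and compare outputs."""
    passed = sum(1 for given, expected in cases if task(given) == expected)
    return TaskTestResult(passed=passed, failed=len(cases) - passed)
```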

S203, the terminal device sends a processing request of the target task to the server, and the processing request comprises task data of the target task.

In this step, after generating the task data of the target task, the terminal device may send a processing request of the target task to the server.

The embodiment of the application does not limit when the terminal device sends the processing request of the target task to the server. In some embodiments, the terminal device may send the processing request directly after generating the task data of the target task. In other embodiments, the terminal device may send the processing request only after the test result corresponding to the test data indicates success.
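
The two timing options above (send immediately, or send only after a successful test) can be captured in one small guard. This is a hypothetical sketch: the request layout, the `send` callback standing in for the network call to the server, and the `require_test` switch are illustrative assumptions.

```python
from typing import Callable, Optional


def submit_if_tested(task_data: dict, test_passed: Optional[bool],
                     send: Callable[[dict], None], require_test: bool = True) -> bool:
    """Send the processing request; when require_test is set, only after a successful test."""
    if require_test and test_passed is not True:
        # Test missing or failed: hold the request back.
        return False
    send({"type": "process_target_task", "task_data": task_data})
    return True
```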

In the data processing method provided in this embodiment, first, in response to an addition request triggered by a user, a plurality of task operators are added to the original data of a target task, the task operators being used to trigger operations on the data to be processed of the target task; then, a dependency relationship is established among the plurality of task operators to generate task data of the target task; finally, a processing request of the target task, containing the task data, is sent to the server. In this way, the terminal device generates the task data of the target task from the plurality of task operators and sends it to the server in a processing request, and the server can complete the entire processing flow for the data to be processed of the target task according to that request, which improves the efficiency with which the server processes the data to be processed of the target task.

On the basis of the above embodiment, how the server determines the result data corresponding to the data to be processed according to the processing request is described below. Fig. 4 is a schematic flowchart of another data processing method provided in an embodiment of the present application, and as shown in fig. 4, the method includes:

S401, the server receives a processing request of a target task sent by the terminal device.

In this step, the server receives the processing request of the target task sent by the terminal device.

The processing request comprises task data of a target task, and the task data of the target task is generated by establishing a dependency relationship among a plurality of task operators.

S402, the server determines result data corresponding to the data to be processed according to the task data of the target task.

In this step, after receiving the processing request of the target task, the server may determine, according to the task data of the target task, result data corresponding to the data to be processed.

The embodiment of the application does not limit how the server determines the result data corresponding to the data to be processed. In some embodiments, the server may determine the result data according to a first task operator, at least one second task operator, and a third task operator included in the task data of the target task. Illustratively, the server may first acquire the data to be processed from the acquisition path by means of the first task operator; then process, or perform statistical analysis on, the data to be processed by means of the at least one second task operator to determine the result data; and finally output the result data to the output path by means of the third task operator.
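
For the linear case just described, the server-side flow reduces to acquire, process, output. A minimal sketch under stated assumptions: in-memory dictionaries stand in for the real data sources and result storage, and the second task operators are modelled as plain functions on lists of records. All names are hypothetical.

```python
from typing import Callable, Dict, List

Records = List[dict]


def run_linear_task(sources: Dict[str, Records], acquisition_path: str,
                    processors: List[Callable[[Records], Records]],
                    sinks: Dict[str, Records], output_path: str) -> Records:
    # First task operator: fetch the data to be processed from the acquisition path.
    data = list(sources[acquisition_path])
    # Second task operators: process or aggregate the data, one after another.
    for process in processors:
        data = process(data)
    # Third task operator: write the result data to the output path.
    sinks[output_path] = data
    return data
```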

Nor does the embodiment of the application limit the order in which the server executes the task operators in the task data. In some embodiments, the server may traverse the task data with a graph traversal algorithm and execute the task operators in their execution order to determine the result data corresponding to the data to be processed. Illustratively, the graph traversal algorithm may include a breadth-first search (BFS) algorithm or the like.
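
A plain BFS visits nodes by distance from the start node and, when branches have different lengths, can reach an operator before all of its inputs are ready. A common BFS-based variant that does respect every dependency is Kahn's topological sort, sketched below as an illustrative substitute; the application names BFS only generically. The adjacency-list format (every operator present as a key) is an assumption.

```python
from collections import deque
from typing import Dict, List


def execution_order(graph: Dict[str, List[str]]) -> List[str]:
    """Breadth-first topological order (Kahn's algorithm).

    An operator is enqueued only after all of its upstream operators have been
    emitted, so the returned order respects every dependency in the task data.
    """
    indegree = {node: 0 for node in graph}
    for successors in graph.values():
        for nxt in successors:
            indegree[nxt] += 1
    queue = deque(node for node, deg in indegree.items() if deg == 0)
    order: List[str] = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(graph):
        raise ValueError("task data contains a cycle; no valid execution order")
    return order
```

Applied to the fig. 3 structure (acquisition, segmentation, vectorization, feature merging, classification, word frequency, then the HBase and HDFS outputs), the order follows the chain and emits both outputs last.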

The embodiment of the application does not limit the type of the result data output for the data to be processed. In some embodiments, the result data may be of one type or of multiple types. In other embodiments, the output path of the result data may be determined according to the storage form of the result data, which may include a relational database, a comma-separated values (CSV) file, an Avro data serialization file, the Hadoop Distributed File System (HDFS), an HBase table (a non-relational distributed database), and the like. In some embodiments, the server may also store intermediate data generated while determining the result data. Illustratively, the server may send the intermediate data, via a remote procedure call (RPC), to a data receiving server, to a distributed message system (Kafka), or the like.
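
Determining the output path from the storage form can be as simple as a lookup table consulted by the third task operator. In the hypothetical sketch below, the storage-form keys and the path templates are illustrative assumptions; only the `hdfs:///` URI style and HBase's `namespace:table` naming follow the conventions of those systems.

```python
from typing import Dict

# Hypothetical templates; a real deployment would take these from operator parameters.
OUTPUT_TEMPLATES: Dict[str, str] = {
    "rdb": "result_{name}",                 # table name in a relational database
    "csv": "/data/results/{name}.csv",      # local CSV file
    "avro": "/data/results/{name}.avro",    # Avro data serialization file
    "hdfs": "hdfs:///results/{name}",       # HDFS directory
    "hbase": "result:{name}",               # HBase namespace:table
}


def resolve_output_path(storage_form: str, name: str) -> str:
    """Map a storage form to a concrete output path for the third task operator."""
    try:
        template = OUTPUT_TEMPLATES[storage_form.lower()]
    except KeyError:
        raise ValueError(f"unsupported storage form: {storage_form}") from None
    return template.format(name=name)
```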

The embodiment of the application does not limit how processed or statistical data is transferred between different task operators. In some embodiments, a distributed data set, for example a Spark DataFrame, may be used to transfer data between different task operators. In other embodiments, a second task operator may have only one output data set; the number of outputs is not limited in this embodiment of the present application.

It should be understood that the result data corresponding to the data to be processed can be applied to business systems according to business requirements. The embodiment of the application does not limit how the server sends the result data to the terminal device. In some embodiments, the processing request of the target task may further include an identifier of a target service system, and the server may push the result data, in a data service manner, to the terminal device corresponding to the target service system according to a configured scheduled task and the identifier of the target service system. In other embodiments, the server may provide a unified data access interface, receive through it a data access request sent by the terminal device, and send the result data corresponding to the data access request to the terminal device.
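
The two delivery modes above, pushing by target service system identifier and pulling through a unified data access interface, can be sketched with a single in-memory router. This is a hypothetical illustration: `ResultDistributor`, its methods, and the callback used as a stand-in for a terminal device are assumptions; a real deployment would deliver results over the network and trigger `push` from the configured scheduled task.

```python
from typing import Callable, Dict


class ResultDistributor:
    """Routes result data to business systems (hypothetical sketch)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, Callable[[dict], None]] = {}
        self._store: Dict[str, dict] = {}

    def register(self, system_id: str, deliver: Callable[[dict], None]) -> None:
        """Associate a target service system identifier with its terminal device."""
        self._subscribers[system_id] = deliver

    def push(self, system_id: str, task_id: str, result: dict) -> None:
        """Data-service mode: keep the result and push it to the system's terminal device."""
        self._store[task_id] = result
        self._subscribers[system_id](result)

    def access(self, task_id: str) -> dict:
        """Unified data access interface: the terminal device pulls a result on request."""
        return self._store[task_id]
```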

It should be noted that, in other embodiments, after receiving the processing request of the target task, the server may further collect, according to preset monitoring indexes, monitoring data of the task data of the target task while it is running. The monitoring indexes indicate whether the task data of the target task is running normally. The embodiment of the application does not limit the specific content of the monitoring indexes, which can be set according to the actual situation. In some embodiments, the monitoring indexes may include task progress, resource usage, number of records, number of abnormal records, running logs, other custom indexes, and the like.
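
A sketch of how collected monitoring indexes might be turned into a normal or abnormal judgement. The field names, the thresholds (at most 1% abnormal records, at most 16 CPU cores), and the `is_running_normally` rule are hypothetical; the application leaves the concrete indexes to be set according to the actual situation. Running logs are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TaskMetrics:
    progress: float = 0.0        # fraction of work completed, 0.0 to 1.0
    records: int = 0             # records processed so far
    exception_records: int = 0   # records that failed processing
    cpu_cores: float = 0.0       # current resource usage
    custom: Dict[str, float] = field(default_factory=dict)  # user-defined indexes


def is_running_normally(m: TaskMetrics, max_exception_ratio: float = 0.01,
                        max_cores: float = 16.0) -> bool:
    """Judge the running state from the collected indexes (thresholds are assumptions)."""
    if m.records and m.exception_records / m.records > max_exception_ratio:
        return False
    return m.cpu_cores <= max_cores and 0.0 <= m.progress <= 1.0
```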

In other embodiments, the DAG may need to be verified before the result data corresponding to the data to be processed is determined. The verification is performed at several levels: first, metadata verification, in which the time ranges of the data to be processed and of the result data, together with the associated data models, are verified at the data input and output stages; second, verification of the operator parameters of each task operator, including non-null checks and validity checks on the parameter values; and third, verification of the DAG structure corresponding to the task data of the target task, including checking whether a cyclic dependency exists in the structure, whether any dependency prohibited by a constraint rule exists, and whether the number of inputs and outputs of each operator is correct.
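
The three verification levels can be combined into one function that collects every error rather than stopping at the first failure. This is a hypothetical sketch: the input shapes (per-operator parameter dicts, edge pairs, expected (inputs, outputs) counts, a set of forbidden edges, per-parameter validator functions, and an integer time range) are assumptions, and metadata verification is reduced to a time-range sanity check.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple


def verify_task(params: Dict[str, Dict[str, object]],
                edges: List[Tuple[str, str]],
                arity: Dict[str, Tuple[int, int]],
                forbidden: Set[Tuple[str, str]],
                validators: Dict[str, Callable[[object], bool]],
                time_range: Tuple[int, int]) -> List[str]:
    """Run layered checks and return error messages (an empty list means the task passes)."""
    errors: List[str] = []

    # Level 1, metadata: the time range of the data must be well formed.
    if time_range[0] > time_range[1]:
        errors.append("metadata: time range start is after end")

    # Level 2, operator parameters: non-null check, then validity check.
    for op, values in params.items():
        for key, value in values.items():
            if value is None or value == "":
                errors.append(f"param: {op}.{key} is empty")
            elif key in validators and not validators[key](value):
                errors.append(f"param: {op}.{key}={value!r} is invalid")

    # Level 3, DAG structure: forbidden edges, input/output counts, cycles.
    ins = {op: 0 for op in params}
    outs = {op: 0 for op in params}
    succ: Dict[str, List[str]] = {op: [] for op in params}
    for u, v in edges:
        if u not in params or v not in params:
            errors.append(f"dag: edge {u}->{v} references an unknown operator")
            continue
        if (u, v) in forbidden:
            errors.append(f"dag: dependency {u}->{v} violates a constraint rule")
        outs[u] += 1
        ins[v] += 1
        succ[u].append(v)
    for op, expected in arity.items():
        if op in params and (ins[op], outs[op]) != expected:
            errors.append(f"dag: {op} has {ins[op]} inputs/{outs[op]} outputs, "
                          f"expected {expected[0]}/{expected[1]}")
    # Kahn-style pass: operators never freed from dependencies lie on a cycle.
    indegree = dict(ins)
    queue = deque(op for op, d in indegree.items() if d == 0)
    seen = 0
    while queue:
        op = queue.popleft()
        seen += 1
        for nxt in succ[op]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if seen != len(params):
        errors.append("dag: cyclic dependency detected")
    return errors
```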

The technical terms, technical effects, technical features, and alternative embodiments of S401 to S402 can be understood with reference to S201 to S203 shown in fig. 2, and repeated descriptions thereof will not be repeated here.

On the basis of the above embodiments, a data processing system according to an embodiment of the present application is described below. Fig. 5 is a schematic structural diagram of a data processing system according to an embodiment of the present application. As shown in fig. 5, the data processing system includes a data management unit 501, an intelligent analysis unit 502, a task monitoring unit 503, and a business linkage policy management unit 504.

The data management unit 501 is mainly responsible for acquiring newly added data from the target data source device, integrating and classifying the acquired data, and then storing the classified newly added data in a database according to different business requirements. The newly added data may include financial information, customer service work orders, statistical reports, and the like, which is not limited in this embodiment of the application. The server can access newly added data from the data source device in real time, or receive newly added data uploaded by the data source device at regular intervals. The storage type of the newly added data may include a database type, a disk file type, a Hadoop Distributed File System (HDFS) type, a non-relational distributed database (HBase) type, a distributed message system (Kafka topic) type, a search engine (ES) type, and the like, which is not limited in this embodiment of the application. The storage information of the newly added data may include a file path or a table name, and each data file and data record in the newly added data carries a timestamp indicating when the data was generated.
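
For newly added data uploaded at regular intervals, the per-record timestamps make incremental collection straightforward: keep a watermark of the latest timestamp already ingested and pick up only newer records, grouped by category for storage. This watermark scheme and the `ts` and `category` field names are assumptions for illustration; the application states only that timestamps indicate generation time.

```python
from typing import Dict, Iterable, List, Tuple


def collect_new_data(records: Iterable[dict], watermark: int) -> Tuple[Dict[str, List[dict]], int]:
    """Pick records generated after `watermark` and group them by data category.

    Each record is assumed to carry a generation timestamp `ts` (epoch seconds)
    and a `category` field; both field names are hypothetical.
    """
    grouped: Dict[str, List[dict]] = {}
    latest = watermark
    for rec in records:
        if rec["ts"] > watermark:
            grouped.setdefault(rec["category"], []).append(rec)
            latest = max(latest, rec["ts"])
    # The returned watermark is used as the starting point of the next collection run.
    return grouped, latest
```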

In the intelligent analysis unit 502, a data processing platform may be built on big data technologies (e.g., Hadoop, Spark, HBase, ES), providing a unified interface for writing task operators. Task data of a target task can be constructed as a directed acyclic graph (DAG), and processed or statistical data can be transferred between task operators by means of a distributed data set (such as a Spark DataFrame). Through the visual icon link or name link of a task operator, the type information and function description of the task operator, together with its operator parameter metadata table, can be displayed.
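
The operator parameter metadata table displayed through an operator's icon or name link could be backed by a simple registry. The sketch below is hypothetical: `ParamMeta`, `OperatorSpec`, `REGISTRY`, and the column layout returned by `parameter_table` are illustrative assumptions, not the platform's actual interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ParamMeta:
    name: str
    dtype: str
    required: bool
    description: str


@dataclass
class OperatorSpec:
    name: str
    op_type: str  # type information, e.g. "source", "transform", "sink"
    description: str  # function description shown to the user
    params: List[ParamMeta] = field(default_factory=list)


REGISTRY: Dict[str, OperatorSpec] = {}


def register(spec: OperatorSpec) -> None:
    """Add an operator to the registry; names must be unique."""
    if spec.name in REGISTRY:
        raise ValueError(f"operator {spec.name} already registered")
    REGISTRY[spec.name] = spec


def parameter_table(name: str) -> List[Tuple[str, str, str, str]]:
    """Rows shown when the user clicks the operator's icon or name link."""
    spec = REGISTRY[name]
    return [(p.name, p.dtype, "yes" if p.required else "no", p.description) for p in spec.params]
```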

The task monitoring unit 503 provides a monitoring system for fine-grained monitoring and visual management of the target analysis task. A monitoring service component is built on monitoring service interfaces such as the Spark listener, accumulators, and the monitoring service interfaces of YARN, Livy and Spark. Monitoring of the indexes of the target task and real-time tracking of the target task are achieved through mechanisms such as the Spark listener, accumulators and a custom remote procedure call (RPC) framework. Meanwhile, system monitoring indexes are obtained in real time through the monitoring interfaces of YARN, Spark and Livy and are displayed visually.

The configured monitoring indexes include task progress, resource usage, number of records, number of abnormal records, running logs, custom indexes, and the like. It should be understood that one business application scenario may require multiple target analysis tasks, with scheduling dependencies among them. Monitoring is therefore divided into two levels: the first level monitors scheduling across the target analysis tasks, and the second level monitors each specific analysis task. While a target analysis task is running, the monitoring service component continuously refreshes the monitoring interface and updates the monitoring data corresponding to the monitoring indexes through remote procedure calls (RPC).

In the business linkage policy management unit 504, two linkage modes between different business systems are supported: service interface invocation and data service. It should be appreciated that different business systems, or different functions of the same business system, may require similar target analysis tasks or monitoring to be performed on the same data to be processed. Therefore, the processing result corresponding to the data to be processed can be applied, through the business linkage policy management platform, to a specific business scenario in the target business system.

Technical terms, technical effects, technical features, and alternative embodiments of the present application can be understood with reference to S201 to S203 shown in fig. 2, and repeated descriptions thereof will not be repeated here.

Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer readable storage medium, and when executed, performs steps comprising the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.

Fig. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. The data processing device may be implemented by software, hardware or a combination of both, and may be, for example, the terminal device in the foregoing embodiments, to execute the data processing method in the foregoing embodiments. As shown in fig. 6, the data processing apparatus 600 includes:

a response module 601, configured to add, in response to an addition request triggered by a user, a plurality of task operators in original data of a target task, where the task operators are used to trigger an operation on to-be-processed data of the target task;

a generating module 602, configured to establish a dependency relationship among multiple task operators, and generate task data of a target task;

the sending module 603 is configured to send a processing request of the target task to the server, where the processing request includes task data of the target task.

In an optional implementation manner, the response module 601 is specifically configured to add a plurality of task operators in the original data of the target task according to a preset addition sequence of the task operators.

In an optional implementation manner, the preset adding sequence of the task operators is, in order, a first task operator, at least one second task operator, and a third task operator; the first task operator is used for acquiring the data to be processed, the second task operator is used for processing or performing statistics on the data to be processed, and the third task operator is used for outputting the result data corresponding to the data to be processed.

In an optional implementation manner, the response module 601 is further configured to determine operator parameters corresponding to a plurality of task operators included in the addition request; and configuring the plurality of task operators according to the operator parameters respectively corresponding to the plurality of task operators.

In an optional implementation manner, if the task operator is a first task operator, the response module 601 is specifically configured to configure an acquisition path of the data to be processed by the first task operator according to an operator parameter corresponding to the first task operator.

In an optional implementation manner, if the task operator is a third task operator, the response module 601 is specifically configured to configure an output path of the third task operator to the result data according to an operator parameter corresponding to the third task operator.

In an optional embodiment, the generating module 602 is further configured to determine, in response to a test request triggered by a user, test data corresponding to a target task from test set data; and determining a test result corresponding to the test data according to the test data and the task data.

In an optional implementation manner, the sending module 603 is specifically configured to send a processing request of the target task to the server if the test result corresponding to the test data is successful.

In an optional implementation manner, the generating module 602 is specifically configured to establish a dependency relationship between multiple task operators according to a structure of a directed acyclic graph, where the structure of the directed acyclic graph includes task nodes corresponding to the multiple task operators, and a task path between the task nodes has no closed loop.

It should be noted that the data processing apparatus provided in the embodiment shown in fig. 6 may be used to execute the data processing method provided in the foregoing embodiment, and the specific implementation manner and the technical effect are similar, and are not described again here.

Fig. 7 is a schematic structural diagram of another data processing apparatus according to an embodiment of the present application. The data processing device may be implemented by software, hardware or a combination of the two, and may be, for example, the server in the above embodiment, to execute the data processing method in the above embodiment. As shown in fig. 7, the data processing apparatus 700 includes:

a receiving module 701, configured to receive a processing request of a target task sent by a terminal device, where the processing request includes task data of the target task, and the task data of the target task is generated by establishing a dependency relationship among multiple task operators;

the processing module 702 is configured to determine, according to the task data of the target task, result data corresponding to the data to be processed.

In an optional embodiment, the task data of the target task includes a first task operator, at least one second task operator, and a third task operator; the first task operator is used for acquiring data to be processed according to the acquisition path, the at least one second task operator is used for determining result data corresponding to the data to be processed, and the third task operator is used for outputting the result data corresponding to the data to be processed according to the output path.

In an optional implementation manner, the processing request of the target task includes an identifier of the target service system, and the processing module 702 is further configured to send the result data to the terminal device corresponding to the target service system according to the identifier of the target service system.

In an optional implementation manner, the processing module 702 is further configured to collect, according to a preset monitoring index, monitoring data corresponding to task data of a target task in an operating state, where the monitoring index is used to indicate whether the operating state of the task data of the target task is normal.

It should be noted that the data processing apparatus provided in the embodiment shown in fig. 7 may be configured to execute the data processing method provided in the foregoing embodiment, and the specific implementation manner and the technical effect are similar, and are not described again here.

Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 8, the electronic device 800 may include at least one processor 801 and a memory 802. Fig. 8 takes one processor as an example.

The memory 802 stores programs. In particular, the program may include program code including computer operating instructions.

The memory 802 may include a high-speed RAM and may also include a non-volatile memory, such as at least one disk memory.

The processor 801 is used for executing computer execution instructions stored in the memory 802 to realize the data processing method;

the processor 801 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement the embodiments of the present Application.

Optionally, in a specific implementation, if the communication interface, the memory 802 and the processor 801 are implemented independently, they may be connected to and communicate with one another through a bus. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and the like; referring to it as a single bus does not mean that there is only one bus or only one type of bus.

Alternatively, in a specific implementation, if the communication interface, the memory 802 and the processor 801 are integrated into a chip, the communication interface, the memory 802 and the processor 801 may complete communication through an internal interface.

Fig. 9 is a schematic structural diagram of another electronic device according to an embodiment of the present application. As shown in fig. 9, the electronic device 900 may include at least one processor 901 and a memory 902. Fig. 9 takes one processor 901 as an example. The processor 901 is configured to execute computer-executable instructions stored in the memory 902 to implement the above data processing method. The structure and function of each part of the electronic device shown in fig. 9 can be understood with reference to the electronic device shown in fig. 8, and repeated content is not described here.

The embodiment of the application also provides a chip comprising a processor and an interface. The interface is used to input and output data or instructions processed by the processor, and the processor is used to execute the data processing method, provided in the above method embodiments, whose execution subject is the terminal device. The chip can be applied to a data processing apparatus.

The embodiment of the application also provides another chip comprising a processor and an interface. The interface is used to input and output data or instructions processed by the processor, and the processor is used to execute the data processing method, provided in the above method embodiments, whose execution subject is the server. The chip can be applied to a data processing apparatus.

The embodiment of the present application further provides a program which, when executed by a processor, performs the data processing method, provided in the above method embodiments, whose execution subject is the terminal device.

The embodiment of the present application further provides another program which, when executed by a processor, performs the data processing method, provided in the above method embodiments, whose execution subject is the server.

The present application further provides a program product, such as a computer-readable storage medium, storing instructions which, when run on a computer, cause the computer to execute the data processing method, provided in the above method embodiments, whose execution subject is the terminal device.

The present application further provides another program product, such as a computer-readable storage medium, storing instructions which, when run on a computer, cause the computer to execute the data processing method, provided in the above method embodiments, whose execution subject is the server.

The present application also provides a computer-readable storage medium, which may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk. Specifically, the computer-readable storage medium stores program instructions for the above data processing method.

All or part of the above embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method of processing data, the method comprising:

2. The method of claim 1, wherein adding a plurality of task operators to the original data of the target task comprises:

3. The method according to claim 2, wherein the preset order of adding the task operators is a first task operator, at least one second task operator, and a third task operator in sequence;

4. The method of claim 3, wherein prior to said establishing dependencies between said plurality of task operators, said method further comprises:

5. The method of claim 4, wherein if a task operator is the first task operator, the configuring the plurality of task operators comprises:

6. The method of claim 4, wherein if a task operator is the third task operator, the configuring the plurality of task operators comprises:

7. The method of claim 1, wherein after the generating task data for the target task, the method further comprises:

and determining a test result corresponding to the test data according to the test data and the task data.

8. The method of claim 7, wherein sending the processing request of the target task to the server comprises:

9. The method of claim 1, wherein establishing dependencies between the plurality of task operators comprises:

10. A method of processing data, the method comprising:

11. The method of claim 10, wherein the task data of the target task comprises a first task operator, at least one second task operator, a third task operator;

12. The method according to claim 10, wherein the processing request of the target task includes an identifier of a target business system, and after the determining result data corresponding to the data to be processed, the method further comprises:

13. The method according to claim 10, wherein after receiving the processing request of the target task sent by the terminal device, the method further comprises:

14. An apparatus for processing data, the apparatus comprising:

15. A computer storage medium having computer executable instructions stored thereon, which when executed by a processor, are configured to implement the method of any one of claims 1 to 9.

16. An electronic device, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method according to any of claims 1 to 9.

17. A computer program product comprising a computer program, characterized in that the computer program realizes the method of any of claims 1 to 9 when executed by a processor.