CN111026568A - Data and task relation construction method and device, computer equipment and storage medium

Info

Publication number: CN111026568A
Application number: CN201911229154.7A
Authority: CN
Inventors: 孙朝和; 申志彬; 谢瑶
Original assignee: Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Current assignee: Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Priority date: 2019-12-04
Filing date: 2019-12-04
Publication date: 2020-04-17
Anticipated expiration: 2039-12-04
Also published as: CN111026568B

Abstract

The invention relates to a method and a device for constructing data and task relationships, a computer device, and a storage medium. The method comprises: acquiring a task creation request to obtain a task to be created; generating the relevant attributes of a task model according to the task to be created to obtain target attributes; determining, according to the target attributes, the dependency relationship between the task to be created and the tasks in the data model to obtain a blood relationship (that is, the lineage between tasks); analyzing the task to be created to obtain the association relationships between the task and its input table and output table; and updating the blood relationship and the association relationships among the task, the input table, and the output table into the metadata blood relationship management system, so that the terminal can locate the related tasks through the metadata blood relationship management system when a data table has a problem. The invention constructs the association relationship between tasks and the data model, which yields a more detailed and clearer blood relationship between tasks and the data model, makes it easier to look up and locate the related tasks quickly, and improves the efficiency of solving problems.

Description

Data and task relation construction method and device, computer equipment and storage medium

Technical Field

The invention relates to the field of computers, and in particular to a data and task relation construction method, a data and task relation construction device, a computer device, and a storage medium.

Background

Big data technology refers to the ability to quickly obtain valuable information from many different types of data. Technologies applicable to big data include massively parallel processing databases, data mining systems, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems.

In the metadata blood relationship management system of an existing big data platform, only the blood relationships of the data models of components such as Hive, Kafka, and HBase are recorded. The metadata management system shows the flow of data within these components well, but it still cannot meet three needs: viewing the blood relationships among tasks, viewing the data tables of each component that a task involves, and viewing the tasks that involve a given component's data model. As a result, when there are many data tables and some of them have problems, the related tasks cannot be located quickly to handle those problems, which reduces the efficiency of solving them.

Therefore, it is necessary to design a new method that constructs the association relationship between tasks and the data model, so as to build a more detailed and clearer blood relationship between tasks and the data model, make it easier to look up and locate the related tasks quickly, and improve the efficiency of solving problems.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a data and task relation construction method, a data and task relation construction device, a computer device and a storage medium.

In order to achieve the purpose, the invention adopts the following technical scheme: the data and task relation construction method comprises the following steps:

acquiring a task creating request to obtain a task to be created;

generating related attributes of a task model according to a task to be created to obtain target attributes;

determining the dependency relationship between the task to be created and the task in the data model according to the target attribute so as to obtain a blood relationship;

analyzing the task to be created to obtain an association relation between the task and an input table and an output table;

and updating the blood relationship and the association relationships among the task, the input table, and the output table into the metadata blood relationship management system, so that the related tasks can be located through the metadata blood relationship management system when a data table has a problem.

The further technical scheme is as follows: before the acquiring of the task creation request to obtain the task to be created, the method further comprises:

defining an initial task model in the metadata blood relationship management system.

The further technical scheme is as follows: the generating of the relevant attributes of the task model according to the task to be created to obtain the target attributes comprises:

calling a corresponding initial task model in the metadata blood relationship management system according to the task to be created, and generating the relevant attributes of the task model to obtain the target attributes.

The further technical scheme is as follows: the target attributes comprise a task name, a task type, a task identifier, a task creator and task creation time.

The further technical scheme is as follows: the analyzing the task to be created to obtain the association relationship between the task and the input table and the output table comprises the following steps:

acquiring a script related to a task to be created in a storage process;

analyzing the task to be created by adopting a script analysis engine corresponding to the script to obtain an input source and an output target of the task to be created;

establishing an association relationship between the task and the input table, and between the input table and the task, according to the input source of the task to be created;

and establishing an association relationship between the task and the output table, and between the output table and the task, according to the output target of the task to be created.

The invention also provides a data and task relation construction device, which comprises:

the request acquisition unit is used for acquiring a task creation request to obtain a task to be created;

the attribute generation unit is used for generating the relevant attributes of the task model according to the task to be created so as to obtain the target attributes;

the blood relationship building unit is used for determining the dependency relationship between the task to be created and the task in the data model according to the target attribute so as to obtain a blood relationship;

the analysis unit is used for analyzing the task to be created to obtain the association relation between the task and the input table and the output table;

and the updating unit is used for updating the blood relationship and the association relationships among the task, the input table, and the output table into the metadata blood relationship management system, so that the related tasks can be located through the metadata blood relationship management system when a data table has a problem.

The further technical scheme is as follows: the device further comprises:

a model defining unit, used for defining an initial task model in the metadata blood relationship management system.

The further technical scheme is as follows: the parsing unit further includes:

the script obtaining subunit is used for obtaining a script related to the task to be created in the storage process;

the engine analysis subunit is used for analyzing the task to be created by adopting a script analysis engine corresponding to the script so as to obtain an input source and an output target of the task to be created;

the first relation building subunit is used for building the association relationship between the task and the input table, and between the input table and the task, according to the input source of the task to be created;

and the second relation construction subunit is used for constructing the association relation between the task and the output table and between the output table and the task according to the output target of the task to be created.

The invention also provides computer equipment which comprises a memory and a processor, wherein the memory is stored with a computer program, and the processor realizes the method when executing the computer program.

The invention also provides a storage medium storing a computer program which, when executed by a processor, is operable to carry out the method as described above.

Compared with the prior art, the invention has the following beneficial effects: relevant attributes are generated during task creation, the dependency relationship between tasks is determined according to these attributes, the task to be created is analyzed, and the association relationships between task and table and between table and task are constructed from the analysis result. This builds the association relationship between tasks and the data model and yields a more detailed and clearer blood relationship between them, which makes it easier to look up and locate the related tasks quickly and improves the efficiency of solving problems.

The invention is further described below with reference to the accompanying drawings and specific embodiments.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic view of an application scenario of a data and task relationship construction method provided in an embodiment of the present invention;

FIG. 2 is a schematic flow chart of a data and task relationship construction method according to an embodiment of the present invention;

FIG. 3 is a sub-flow diagram of a data and task relationship construction method provided by an embodiment of the present invention;

FIG. 4 is a diagram illustrating data and task relationships provided by an embodiment of the present invention;

FIG. 5 is a flowchart illustrating a data and task relationship construction method according to another embodiment of the present invention;

FIG. 6 is a schematic block diagram of a data and task relationship building apparatus provided by an embodiment of the present invention;

FIG. 7 is a schematic block diagram of a parsing unit of a data and task relationship building apparatus provided by an embodiment of the present invention;

FIG. 8 is a schematic block diagram of a data and task relationship construction apparatus provided by another embodiment of the present invention;

FIG. 9 is a schematic block diagram of a computer device provided by an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

Referring to fig. 1 and fig. 2, fig. 1 is a schematic view of an application scenario of a data and task relationship construction method according to an embodiment of the present invention, and fig. 2 is a schematic flowchart of the method. The data and task relationship construction method is applied to a server that hosts a metadata blood relationship management system. The server may be a server in a distributed service platform. It exchanges data with a terminal and also with an executor in order to execute tasks. In response to a request initiated by the terminal, the server generates the association relationship between the data model and the task and updates it into the metadata blood relationship management system. Once a data table has a problem, the information in the metadata blood relationship management system can be retrieved to locate the related tasks, so that the problem can be solved quickly.

As shown in fig. 2, the method includes the following steps S110 to S150.

S110, acquiring a task creation request to obtain a task to be created.

In this embodiment, the task to be created refers to information such as the type and the executor related to the task. For example, if a task that automatically identifies the category of a commodity needs to be created, the related type is a classification model, and the executor may be the executor where the classification model resides; that is, the task identifies the category automatically by calling that executor.

S120, generating the relevant attributes of the task model according to the task to be created to obtain the target attributes.

In this embodiment, the target attribute refers to an attribute that needs to be constructed in a process of generating a task model by using the task to be created, and specifically, the target attribute includes a task name, a task type, a task identifier, a task creator, and task creation time.

Specifically, a corresponding initial task model in the metadata blood relationship management system is called according to the task to be created, and the relevant attributes of the task model are generated to obtain the target attributes.

A corresponding initial task model is defined in the metadata blood relationship management system. When the task scheduling system creates a task, the information required by the task model is acquired and data in JSON format is generated, such as {"jobName": "stg_m_i_h_base_address", "projName": "stg", "jobType": "sqoop", "uuid": "stg.
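The attribute generation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_task_attributes` and the field names `creator` and `createTime` are hypothetical, and the identifier format `<project>.<job name>` is an assumption inferred from the task identifier that appears in the table-to-task example later in the text.

```python
import json
from datetime import datetime, timezone


def build_task_attributes(job_name, proj_name, job_type, creator, created_at=None):
    """Assemble the target attributes of a task as a JSON-serialisable dict."""
    created_at = created_at or datetime.now(timezone.utc)
    return {
        "jobName": job_name,
        "projName": proj_name,
        "jobType": job_type,
        # Assumption: the task identifier is "<project>.<job name>".
        "uuid": f"{proj_name}.{job_name}",
        # Hypothetical field names for the task creator and creation time.
        "creator": creator,
        "createTime": created_at.isoformat(),
    }


attrs = build_task_attributes("stg_m_i_h_base_address", "stg", "sqoop", "alice")
payload = json.dumps(attrs)  # JSON text handed to the metadata system
```

Keeping the attributes in a plain dict means the same structure can be serialised for the metadata system and reused when the task's relationships are built.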

S130, determining the dependency relationship between the task to be created and the task in the data model according to the target attribute so as to obtain a blood relationship.

In the present embodiment, the blood relationship refers to the dependency relationship between the task to be created and the tasks within the data model.

The task scheduling system can construct a dependency relationship between two tasks, from which the project and task name of the current task and the project and task name of the task it depends on can be obtained, such as {"projectName": "bi", "jobName": "bi_base_address", "depProjectName": "stg", "depJobName": "stg_m_i_h_base_address"}, together with a blood relationship {"lineaddress": "stg.
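As a rough sketch of this step, a dependency record from the scheduler can be turned into a directed edge between two task identifiers. The key names of the dependency record follow the example above as far as it can be read and should be treated as assumptions; the `upstream`/`downstream` edge format and both function names are hypothetical, since the blood relationship record in the text is truncated.

```python
def task_uuid(project, job):
    """Assumed identifier format: '<project>.<job name>'."""
    return f"{project}.{job}"


def build_task_lineage(dependency):
    """Turn a scheduler dependency record into an upstream -> downstream edge."""
    return {
        # The task that is depended on runs first, so it is the upstream end.
        "upstream": task_uuid(dependency["depProjectName"], dependency["depJobName"]),
        "downstream": task_uuid(dependency["projectName"], dependency["jobName"]),
    }


dependency = {
    "projectName": "bi",
    "jobName": "bi_base_address",
    "depProjectName": "stg",
    "depJobName": "stg_m_i_h_base_address",
}
edge = build_task_lineage(dependency)
```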

S140, analyzing the task to be created to obtain the association relation between the task and the input table and the output table.

In this embodiment, the association relationship between the task and the input table and the output table refers to both the relationship from the task to its associated input and output tables and the relationship from those tables back to the task. As shown in fig. 4, JOB A, JOB B, and JOB C are tasks, TABLE A, TABLE B, and TABLE C are input and output tables, and the arrows in the figure represent the association relationships.

In an embodiment, referring to fig. 3, the step S140 may include steps S141 to S144.

S141, acquiring a script related to the task to be created in the storage process.

Various related scripts, such as SQL scripts and Sqoop scripts, are stored in the task to be created. These scripts are the actual execution content of the task, and the type of script is determined by the type of task. For example, if the task operates on data in a database, the script is an SQL script.

S142, analyzing the task to be created by adopting a script analysis engine corresponding to the script to obtain an input source and an output target of the task to be created.

Different script analysis engines are selected according to the task type to obtain the input sources of the task to be created, such as a MySQL table, a Hive table, an HDFS path, or an HBase table, and likewise its output targets, such as a MySQL table, a Hive table, an HDFS path, or an HBase table.
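Selecting an engine by task type can be sketched as a small registry keyed on the task type. Everything named here (`PARSERS`, `register`, `parse_task`, `parse_sqoop_import`) is hypothetical, and the Sqoop reader is a deliberately rough stand-in that only looks at the `--table` and `--target-dir` arguments of a `sqoop import` command line.

```python
import shlex
from typing import Callable, Dict, List, Tuple

ParseResult = Tuple[List[str], List[str]]  # (input sources, output targets)
PARSERS: Dict[str, Callable[[str], ParseResult]] = {}


def register(job_type):
    """Decorator that registers a script analysis engine for one task type."""
    def wrap(fn):
        PARSERS[job_type] = fn
        return fn
    return wrap


@register("sqoop")
def parse_sqoop_import(script):
    """Rough reader for a `sqoop import` command line (illustration only)."""
    tokens = shlex.split(script)
    # Pair each "--option" with the token that follows it.
    opts = {
        tokens[i]: tokens[i + 1]
        for i in range(len(tokens) - 1)
        if tokens[i].startswith("--")
    }
    inputs = [opts["--table"]] if "--table" in opts else []
    outputs = [opts["--target-dir"]] if "--target-dir" in opts else []
    return inputs, outputs


def parse_task(job_type, script):
    """Pick the engine that matches the task type and run it on the script."""
    try:
        parser = PARSERS[job_type]
    except KeyError:
        raise ValueError(f"no script analysis engine for {job_type!r}") from None
    return parser(script)


cmd = "sqoop import --connect jdbc:mysql://host/db1 --table tableA --target-dir /warehouse/stg/tableA"
sources, targets = parse_task("sqoop", cmd)
```

A registry keeps the dispatch separate from the engines, so supporting a new task type only requires registering one more function.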

S143, establishing an association relationship between the task and the input table, and between the input table and the task, according to the input source of the task to be created.

S144, establishing an association relationship between the task and the output table, and between the output table and the task, according to the output target of the task to be created.

The relationships between task and table and between table and task are constructed and then updated into the metadata blood relationship management system. The task-to-table relationship is recorded as:

{"type": "joba", "jobName": "stg_m_i_h_base_address", "objectName": "stg", "jobType": "sqoop", "uuid": "stg.stg.stg_m_i_h_base_address", "jobType": "sqoop", "referrence": [{"uuid": "ip:host:db1.tableA", "name": "tableA"}, {"uuid": "ip:host:2.tableB", "name": "tableB"}]}

The table-to-task relationship is recorded as:

{"type": "table", "name": "tableA", "uuid": "ip:host:db1.tableA", "references": [{"uuid": "stg.stg_m_i_h_base_address", "name": "stg_m_i_h_base_address"}]}
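The two directions of the association can be produced from one list of tables, as in the sketch below. It is illustrative only: the function name `build_association_records` is hypothetical, the `references` key is taken from the table-to-task example, and the records do not distinguish input tables from output tables because the examples in the text do not show how that distinction is stored.

```python
def build_association_records(task, tables):
    """Return (task -> tables record, list of table -> task records)."""
    # Forward direction: the task record lists every table it touches.
    task_record = dict(task)
    task_record["references"] = [
        {"uuid": t["uuid"], "name": t["name"]} for t in tables
    ]
    # Reverse direction: each table record points back at the task.
    back_reference = {"uuid": task["uuid"], "name": task["jobName"]}
    table_records = [
        {
            "type": "table",
            "name": t["name"],
            "uuid": t["uuid"],
            "references": [dict(back_reference)],
        }
        for t in tables
    ]
    return task_record, table_records


task = {
    "jobName": "stg_m_i_h_base_address",
    "jobType": "sqoop",
    "uuid": "stg.stg_m_i_h_base_address",
}
tables = [{"uuid": "ip:host:db1.tableA", "name": "tableA"}]
task_record, table_records = build_association_records(task, tables)
```

Storing both directions costs some duplication but lets a lookup start from either a task or a table without scanning all records.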

Taking a Hive task as an example, the Hive SQL is analyzed to find the input sources and output targets it involves, and the association relationships between task and table and between table and task are constructed, where the tables include input tables and output tables.
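For the Hive case, a toy extraction of input and output tables might look like the following. The text does not say which SQL parser is used, so regular expressions are swapped in here purely for illustration; they handle only simple `INSERT ... SELECT ... FROM ... JOIN` statements and would miss subqueries, CTEs, and comments. The function name `parse_hive_sql` is hypothetical.

```python
import re

# Tables written by the statement: INSERT OVERWRITE/INTO [TABLE] <name>
_OUTPUT = re.compile(
    r"\binsert\s+(?:overwrite|into)\s+(?:table\s+)?([\w.]+)", re.IGNORECASE
)
# Tables read by the statement: FROM <name> or JOIN <name>
_INPUT = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def parse_hive_sql(sql):
    """Return (input tables, output tables) for one simple Hive statement."""
    outputs = sorted(set(_OUTPUT.findall(sql)))
    inputs = sorted(set(_INPUT.findall(sql)) - set(outputs))
    return inputs, outputs


sql = (
    "INSERT OVERWRITE TABLE bi.base_address "
    "SELECT a.id, b.city FROM stg.address a "
    "JOIN stg.city b ON a.city_id = b.id"
)
input_tables, output_tables = parse_hive_sql(sql)
```

A production system would more likely walk the parse tree produced by a real SQL parser, which is what makes the extraction robust to nested queries.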

S150, updating the blood relationship and the association relationships among the task, the input table, and the output table into the metadata blood relationship management system, so that the related tasks can be located through the metadata blood relationship management system when a data table has a problem.

Updating the metadata blood relationship management system makes the blood relationship between tasks and the data model more detailed and clearer, so that when some data tables have problems, the related tasks can be located quickly to handle them, which improves the efficiency of solving problems.
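Locating the related tasks for a problem table can be illustrated with a small in-memory store. This is a sketch under stated assumptions rather than the system's actual storage: the class `LineageStore` and its methods are hypothetical, and following task-to-task dependencies transitively is a design choice made here, not something the text specifies.

```python
from collections import defaultdict, deque


class LineageStore:
    """Toy stand-in for the metadata blood relationship management system."""

    def __init__(self):
        self.table_to_tasks = defaultdict(set)  # table uuid -> tasks reading or writing it
        self.downstream = defaultdict(set)      # task uuid -> tasks that depend on it

    def add_table_reference(self, table_uuid, task_uuid):
        self.table_to_tasks[table_uuid].add(task_uuid)

    def add_task_dependency(self, upstream, downstream):
        self.downstream[upstream].add(downstream)

    def related_tasks(self, table_uuid):
        """Tasks touching the table, plus every task downstream of them."""
        seen = set(self.table_to_tasks.get(table_uuid, ()))
        queue = deque(seen)
        while queue:  # breadth-first walk over task dependencies
            task = queue.popleft()
            for nxt in self.downstream.get(task, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen


store = LineageStore()
store.add_table_reference("ip:host:db1.tableA", "stg.stg_m_i_h_base_address")
store.add_task_dependency("stg.stg_m_i_h_base_address", "bi.bi_base_address")
affected = store.related_tasks("ip:host:db1.tableA")
```

Here a problem in `tableA` surfaces both the task that references the table directly and the task that depends on it.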

By analyzing the task information in the task scheduling system, the data model information of each big data component is obtained and the association relationship between tasks and the data model is established, which achieves a more detailed and clearer blood relationship between tasks and the data model.

In the data and task relationship construction method described above, relevant attributes are generated during task creation, the dependency relationship between tasks is determined according to these attributes, the task to be created is analyzed, and the association relationships between task and table and between table and task are constructed from the analysis result. This builds the association relationship between tasks and the data model and yields a more detailed and clearer blood relationship between them, which makes it easier to look up and locate the related tasks quickly and improves the efficiency of solving problems.

Fig. 5 is a flowchart illustrating a data and task relationship building method according to another embodiment of the present invention. As shown in fig. 5, the data and task relationship construction method of the present embodiment includes steps S210 to S260. Steps S220 to S260 are similar to steps S110 to S150 in the above embodiments, and are not described herein again. The added step S210 in the present embodiment is explained in detail below.

S210, defining an initial task model in the metadata blood relationship management system.

A model of the task entity is defined in the metadata blood relationship management system. The model includes, but is not limited to, attributes such as task name, task type, task identifier, task creator, and task creation time. The initial task model is a model for recording the relevant attributes of different types of tasks.
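One way to picture such an initial task model is as a typed record with the five listed attributes. The sketch below uses a Python dataclass; the class name `TaskModel`, its field names, and the helper `instantiate` are all hypothetical, and a real metadata system would define the model in its own type system.

```python
from dataclasses import asdict, dataclass, fields
from datetime import datetime


@dataclass(frozen=True)
class TaskModel:
    """Initial task model: the attributes every task entity records."""
    job_name: str
    job_type: str
    uuid: str
    creator: str
    create_time: datetime


def instantiate(model_cls, **attributes):
    """Create a task entity, rejecting attributes the model does not define."""
    allowed = {f.name for f in fields(model_cls)}
    unknown = set(attributes) - allowed
    if unknown:
        raise ValueError(f"attributes not in the model: {sorted(unknown)}")
    return model_cls(**attributes)


task = instantiate(
    TaskModel,
    job_name="stg_m_i_h_base_address",
    job_type="sqoop",
    uuid="stg.stg_m_i_h_base_address",
    creator="alice",
    create_time=datetime(2019, 12, 4),
)
record = asdict(task)
```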

Fig. 6 is a schematic block diagram of a data and task relationship building apparatus 300 according to an embodiment of the present invention. As shown in fig. 6, the present invention also provides a data and task relationship building apparatus 300 corresponding to the above data and task relationship building method. The data and task relationship building apparatus 300 includes units for performing the above-described data and task relationship construction method, and may be configured in a server. Specifically, referring to fig. 6, the data and task relationship building apparatus 300 includes a request obtaining unit 302, an attribute generating unit 303, a blood relationship building unit 304, a parsing unit 305, and an updating unit 306.

The request obtaining unit 302 is configured to acquire a task creation request to obtain a task to be created. The attribute generating unit 303 is configured to generate the relevant attributes of the task model according to the task to be created, so as to obtain the target attributes. The blood relationship building unit 304 is configured to determine, according to the target attributes, the dependency relationship between the task to be created and the tasks in the data model, so as to obtain the blood relationship. The parsing unit 305 is configured to analyze the task to be created to obtain the association relationships between the task and the input table and the output table. The updating unit 306 is configured to update the blood relationship and the association relationships among the task, the input table, and the output table into the metadata blood relationship management system, so that when a data table has a problem, the related tasks can be located through the metadata blood relationship management system.

In an embodiment, as shown in fig. 7, the parsing unit 305 includes a script obtaining sub-unit 3051, an engine parsing sub-unit 3052, a first relationship building sub-unit 3053, and a second relationship building sub-unit 3054.

The script obtaining sub-unit 3051 is configured to obtain a script related to the task to be created in the storage process. The engine parsing sub-unit 3052 is configured to analyze the task to be created by using a script analysis engine corresponding to the script, so as to obtain an input source and an output target of the task to be created. The first relationship building sub-unit 3053 is configured to build, according to the input source of the task to be created, the association relationship between the task and the input table and between the input table and the task. The second relationship building sub-unit 3054 is configured to build, according to the output target of the task to be created, the association relationship between the task and the output table and between the output table and the task.

Fig. 8 is a schematic block diagram of a data and task relationship building apparatus 300 according to another embodiment of the present invention. As shown in fig. 8, the data and task relationship building apparatus 300 of the present embodiment adds a model definition unit 301 to the above-described embodiment.

The model definition unit 301 is configured to define an initial task model in the metadata blood relationship management system.

It should be noted that, as can be clearly understood by those skilled in the art, the specific implementation processes of the data and task relationship constructing apparatus 300 and each unit may refer to the corresponding descriptions in the foregoing method embodiments, and for convenience and conciseness of description, no further description is provided herein.

The data and task relationship construction apparatus 300 may be implemented in the form of a computer program that can be run on a computer device as shown in fig. 9.

Referring to fig. 9, fig. 9 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 may be a server, wherein the server may be an independent server or a server cluster composed of a plurality of servers.

Referring to fig. 9, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.

The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032 includes program instructions that, when executed, cause the processor 502 to perform a data and task relationship construction method.

The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.

The internal memory 504 provides an environment for the execution of the computer program 5032 in the non-volatile storage medium 503, and when the computer program 5032 is executed by the processor 502, the processor 502 can be caused to execute a data and task relationship construction method.

The network interface 505 is used for network communication with other devices. Those skilled in the art will appreciate that the configuration shown in fig. 9 is a block diagram of only the portion of the configuration relevant to the present application and does not limit the computer device 500 to which the present application may be applied; a particular computer device 500 may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

Wherein the processor 502 is configured to run the computer program 5032 stored in the memory to implement the following steps:

acquiring a task creation request to obtain a task to be created; generating the relevant attributes of a task model according to the task to be created to obtain target attributes; determining the dependency relationship between the task to be created and the task in the data model according to the target attributes so as to obtain a blood relationship; analyzing the task to be created to obtain the association relationships between the task and an input table and an output table; and updating the blood relationship and the association relationships among the task, the input table, and the output table into the metadata blood relationship management system, so that the related tasks can be located through the metadata blood relationship management system when a data table has a problem.

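In an embodiment, before implementing the step of acquiring the task creation request to obtain the task to be created, the processor 502 further implements the following step:

defining an initial task model in the metadata blood relationship management system.

In an embodiment, when the processor 502 implements the step of generating the relevant attributes of the task model according to the task to be created to obtain the target attributes, the following step is specifically implemented:

calling a corresponding initial task model in the metadata blood relationship management system according to the task to be created, and generating the relevant attributes of the task model to obtain the target attributes.

The target attributes comprise a task name, a task type, a task identifier, a task creator, and task creation time.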

In an embodiment, when the processor 502 implements the step of analyzing the task to be created to obtain the association relationship between the task and the input table and the output table, the following steps are specifically implemented:

acquiring a script related to the task to be created in the storage process; analyzing the task to be created by adopting a script analysis engine corresponding to the script to obtain an input source and an output target of the task to be created; establishing an association relationship between the task and the input table, and between the input table and the task, according to the input source of the task to be created; and establishing an association relationship between the task and the output table, and between the output table and the task, according to the output target of the task to be created.

It should be understood that, in the embodiments of the present application, the processor 502 may be a central processing unit (CPU), or it may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. The general-purpose processor may be a microprocessor or any conventional processor.

It will be understood by those skilled in the art that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program instructing associated hardware. The computer program includes program instructions, and the computer program may be stored in a storage medium, which is a computer-readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.

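Accordingly, the present invention also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program, wherein the computer program, when executed by a processor, causes the processor to perform the following steps:

acquiring a task creation request to obtain a task to be created; generating the relevant attributes of a task model according to the task to be created to obtain target attributes; determining the dependency relationship between the task to be created and the task in the data model according to the target attributes so as to obtain a blood relationship; analyzing the task to be created to obtain the association relationships between the task and an input table and an output table; and updating the blood relationship and the association relationships among the task, the input table, and the output table into the metadata blood relationship management system, so that the related tasks can be located through the metadata blood relationship management system when a data table has a problem.

In an embodiment, before executing the computer program to implement the step of acquiring the task creation request to obtain the task to be created, the processor further implements the following step:

defining an initial task model in the metadata blood relationship management system.

In an embodiment, when the processor executes the computer program to implement the step of generating the relevant attributes of the task model according to the task to be created to obtain the target attributes, the following step is specifically implemented:

calling a corresponding initial task model in the metadata blood relationship management system according to the task to be created, and generating the relevant attributes of the task model to obtain the target attributes.

In an embodiment, when the processor executes the computer program to implement the step of analyzing the task to be created to obtain the association relationship between the task and the input table and the output table, the following steps are specifically implemented:

acquiring a script related to the task to be created in the storage process; analyzing the task to be created by adopting a script analysis engine corresponding to the script to obtain an input source and an output target of the task to be created; establishing an association relationship between the task and the input table, and between the input table and the task, according to the input source of the task to be created; and establishing an association relationship between the task and the output table, and between the output table and the task, according to the output target of the task to be created.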

The storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disk, or any other computer-readable storage medium capable of storing program code.

Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both. To illustrate the interchangeability of hardware and software clearly, the components and steps of the examples have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative. For example, the division of each unit is only one logic function division, and there may be another division manner in actual implementation. For example, various elements or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented.

The steps in the methods of the embodiments of the invention may be reordered, combined, or deleted according to actual needs. The units in the devices of the embodiments of the invention may be merged, divided, or deleted according to actual needs. In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention.

While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A data and task relationship construction method, characterized by comprising the following steps:

acquiring a task creating request to obtain a task to be created;

2. The method for constructing data and task relationships according to claim 1, wherein before obtaining the task creation request to obtain the task to be created, the method further comprises:

3. The method for constructing data and task relationships according to claim 1, wherein generating the relevant attributes of the task model according to the task to be created to obtain the target attributes comprises:

4. The data and task relationship construction method of claim 3, wherein the target attributes include task name, task type, task identification, task creator, and task creation time.

5. The method for constructing the data and task relationship according to claim 1, wherein the analyzing the task to be created to obtain the association relationship between the task and the input table and the output table comprises:

acquiring the script in a stored procedure that relates to the task to be created;

6. A data and task relationship building apparatus, characterized by comprising the following components:

7. The data and task relationship building apparatus of claim 6, further comprising:

8. The data and task relationship building apparatus of claim 7, wherein the parsing unit further comprises:

9. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program, and the processor implementing the method according to any one of claims 1 to 5 when executing the computer program.

10. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 5.