CN111158800B - Method and device for constructing task DAG based on mapping relation


Info

Publication number
CN111158800B
Authority
CN
China
Prior art keywords
task
stage
entity
mapping
tasks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911419978.0A
Other languages
Chinese (zh)
Other versions
CN111158800A (en)
Inventor
堵新政
张毅然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN201911419978.0A
Publication of CN111158800A
Application granted
Publication of CN111158800B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/448 Execution paradigms, e.g. implementations of programming paradigms
    • G06F9/4482 Procedural
    • G06F9/4484 Executing subprograms
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a method and a device for constructing a task DAG based on a mapping relation, wherein the method comprises the following steps: respectively creating a source entity and a target entity according to the acquired source table and target table; establishing a mapping relation between a source entity and a target entity, wherein the mapping relation comprises table mapping and field mapping; generating tasks of each stage of business data management; and constructing a task DAG structure through the dependency relationship of the tasks in each stage. In the invention, the DAG of the tasks in each stage is automatically constructed by establishing the mapping relation between the source entity and the target entity, and information required by executing each task is generated, thereby reducing the workload of manual participation and improving the efficiency of DAG construction.

Description

Method and device for constructing task DAG based on mapping relation
Technical Field
The invention relates to the field of data management, in particular to a method and a device for constructing a task DAG based on a mapping relation.
Background
In the data management process, a task is generally broken down into several subtasks according to the stages of data processing. The subtasks depend on one another in sequence and therefore form a Directed Acyclic Graph (DAG). A task DAG is typically constructed in one of two ways. In the first, the DAG is generated automatically by a tool or framework, as with Spark tasks. In the second, the DAG is constructed manually according to the business flow of data processing; for example, the DAG acquisition -> conversion -> mapping -> integration can be constructed according to the data governance flow. In the second case, the common approach is to create each subtask on a workflow canvas, configure information such as the execution command and parameters of each subtask, and then establish the dependency relationships between the tasks. When there are many tasks, this approach is clearly inefficient.
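To make the manual approach concrete, the following minimal Python sketch writes out the acquisition -> conversion -> mapping -> integration flow by hand. The script names and the `depends_on` key are hypothetical and stand in for whatever a workflow canvas would store; the point is only that every command and every dependency has to be entered individually.

```python
# Hand-written workflow definition. All commands and key names below are
# illustrative assumptions, not taken from the patent.
manual_tasks = {
    "acquisition": {"command": "run_acquisition.sh", "depends_on": []},
    "conversion": {"command": "run_conversion.sh", "depends_on": ["acquisition"]},
    "mapping": {"command": "run_mapping.sh", "depends_on": ["conversion"]},
    "integration": {"command": "run_integration.sh", "depends_on": ["mapping"]},
}


def edges(tasks):
    """Return the (parent, child) pairs implied by the depends_on lists."""
    return [
        (parent, name)
        for name, spec in tasks.items()
        for parent in spec["depends_on"]
    ]


if __name__ == "__main__":
    for parent, child in edges(manual_tasks):
        print(f"{parent} -> {child}")
```

With one such entry per subtask and per table, the amount of hand-maintained configuration grows with the number of tables being processed.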
Disclosure of Invention
The embodiment of the invention provides a method and a device for constructing a task DAG based on a mapping relation, which are used for at least solving the problem of low DAG construction efficiency of tasks in a data management flow in the related technology.
According to an embodiment of the present invention, there is provided a method for constructing a task DAG based on a mapping relationship, including: respectively creating a source entity and a target entity according to the acquired source table and target table; establishing a mapping relation between a source entity and a target entity, wherein the mapping relation comprises table mapping and field mapping; generating tasks of each stage of business data management; and constructing a task DAG structure through the dependency relationship of the tasks in each stage.
Optionally, generating the tasks of each stage of business data management includes: dividing the data management process into stages according to the business processing flow, and defining template information for task execution at each stage; and generating the entity tasks of each stage from the source entity to the target entity according to the defined data management stages, and generating instance information for task execution from the template information of each stage.
Optionally, constructing the task DAG structure through the dependency relationships of the tasks of each stage includes: constructing the DAG structure of the tasks according to the task dependency relationships generated when the tasks of each stage are generated.
Optionally, the task dependency relationship includes at least one of the following: a dependency relationship between the task and its parent task; the task being a root task.
According to another embodiment of the present invention, there is provided an apparatus for constructing a task DAG based on a mapping relationship, including: the acquisition module is used for respectively creating a source entity and a target entity according to the acquired source table and target table; the mapping module is used for establishing a mapping relation between the source entity and the target entity, wherein the mapping relation comprises table mapping and field mapping; the task module is used for generating tasks of each stage of business data management; and the construction module is used for constructing a task DAG structure through the dependency relationship of the tasks of each stage.
Optionally, the task module includes: a dividing unit, used for dividing the data management process into stages according to the business processing flow and defining template information for task execution at each stage; and a generating unit, used for generating the entity tasks of each stage from the source entity to the target entity according to the defined data management stages, and generating instance information for task execution from the template information of each stage.
Optionally, the construction module includes: a construction unit, used for constructing the DAG structure of the tasks according to the task dependency relationships generated when the tasks of each stage are generated.
Optionally, the task dependency relationship includes at least one of the following: a dependency relationship between the task and its parent task; the task being a root task.
According to a further embodiment of the invention, there is also provided a storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
According to a further embodiment of the invention, there is also provided an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
In the embodiment of the invention, the DAG of each stage of task of data management is automatically constructed by establishing the mapping relation between the source entity and the target entity, and information required by execution of each task is generated, so that the workload of manual participation is reduced, and the efficiency of DAG construction is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a flow chart of building a task DAG based on a mapping relationship according to an embodiment of the invention;
FIG. 2 is a flow chart of a build task DAG according to an embodiment of the invention;
FIG. 3 is a flow chart of a build task DAG according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a task DAG according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a build task DAG device architecture according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a build task DAG device architecture according to an alternative embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the drawings in conjunction with embodiments. It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
In this embodiment, a method for constructing a task DAG based on a mapping relationship is provided, and fig. 1 is a flowchart of a method according to an embodiment of the present invention, as shown in fig. 1, where the flowchart includes the following steps:
step S102, respectively creating a source entity and a target entity according to the acquired source table and target table;
step S104, establishing a mapping relation between a source entity and a target entity, wherein the mapping relation comprises table mapping and field mapping;
step S106, generating tasks of each stage of business data management;
step S108, constructing a task DAG structure through the dependency relationship of the tasks in each stage.
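Steps S102 and S104 can be pictured with a small data model. The sketch below is only an illustration under stated assumptions: the class names `Entity` and `Mapping`, the helper functions, and the example tables `src_user` and `dw_user` are hypothetical and do not appear in the patent. It shows an entity created from a table's columns, and a mapping that carries both the table-level pair and the field-level pairs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """An entity derived from one table (hypothetical structure)."""
    name: str
    fields: tuple
    role: str  # "source" or "target"


@dataclass
class Mapping:
    """Table mapping plus field mapping between two entities."""
    source: Entity
    target: Entity
    field_map: dict = field(default_factory=dict)

    @property
    def table_mapping(self):
        return (self.source.name, self.target.name)


def create_entity(table_name, columns, role):
    """Step S102 (sketch): create an entity from a table's column list."""
    return Entity(name=table_name, fields=tuple(columns), role=role)


def map_entities(source, target, field_map):
    """Step S104 (sketch): build a mapping, rejecting unknown field names."""
    unknown_source = set(field_map) - set(source.fields)
    unknown_target = set(field_map.values()) - set(target.fields)
    if unknown_source or unknown_target:
        raise ValueError(
            f"unknown fields: {sorted(unknown_source | unknown_target)}"
        )
    return Mapping(source=source, target=target, field_map=dict(field_map))


if __name__ == "__main__":
    src = create_entity("src_user", ["id", "name"], "source")
    tgt = create_entity("dw_user", ["user_id", "user_name"], "target")
    mapping = map_entities(src, tgt, {"id": "user_id", "name": "user_name"})
    print(mapping.table_mapping, mapping.field_map)
```

Validating field names when the mapping is created is a design choice of this sketch; the patent does not state whether or where such validation happens.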
Step S106 of the present embodiment may include the following: dividing the data management process into stages according to the business processing flow, and defining template information for task execution at each stage; and generating the entity tasks of each stage from the source entity to the target entity according to the defined data management stages, and generating instance information for task execution from the template information of each stage.
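One way to read "template information" and "instance information" is as a command template per stage that is filled in for a specific entity. The sketch below uses Python's standard `string.Template` for this; the command strings, the placeholder name `$entity`, and the function `instantiate` are assumptions made for illustration, since the patent does not specify a template syntax.

```python
from string import Template

# Hypothetical per-stage command templates (the patent only says that each
# stage has template information for task execution).
STAGE_TEMPLATES = {
    "acquisition": Template("collect --entity $entity"),
    "preprocessing": Template("preprocess --entity $entity"),
    "cleaning": Template("clean --source $entity --target $target"),
    "fusion": Template("fuse --target $target"),
}


def instantiate(stage, **params):
    """Fill a stage template with concrete values to get instance information.

    Template.substitute raises KeyError if a placeholder is left unfilled,
    so an incomplete task definition fails early.
    """
    return STAGE_TEMPLATES[stage].substitute(**params)


if __name__ == "__main__":
    print(instantiate("acquisition", entity="EA"))
    print(instantiate("cleaning", entity="EA", target="ET"))
```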
In step S108 of the present embodiment, a DAG structure of each stage task is constructed from the task dependency relationship generated when the task is generated.
In step S108 of the present embodiment, the task dependency relationship includes at least one of the following: a dependency relationship between the task and its parent task; the task being a root task.
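The two kinds of dependency information named above (a link to a parent task, or the fact that a task is a root) can be sketched as follows. The `Task` class and its attribute names are hypothetical; the only behavior taken from the text is that a task without a parent is a root task.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A stage task with links to its parent tasks (hypothetical structure)."""
    task_id: str
    parents: list = field(default_factory=list)

    @property
    def is_root(self):
        # A task with no parent task is treated as a root task.
        return not self.parents


def root_ids(tasks):
    """Return the ids of all root tasks, in input order."""
    return [task.task_id for task in tasks if task.is_root]


if __name__ == "__main__":
    first = Task("stage-1")
    second = Task("stage-2", parents=[first.task_id])
    print(root_ids([first, second]))
```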
In order to facilitate understanding of the technical solutions provided by the present invention, the following detailed description will be made with reference to embodiments of specific scenarios.
The embodiment provides a method for automatically constructing a task DAG based on a mapping relation. In this embodiment, a source entity and a target entity are created based on a source table and a target table, a mapping relationship between the source entity and the target entity is created, tasks of each stage of business data management are generated, and a task DAG is constructed.
As shown in fig. 2, the method of this embodiment mainly includes the following steps:
Step S201, creating a source entity and a target entity: a source entity and a target entity are created according to the acquired source table and target table, respectively.
Step S202, defining the data management stages: the data management process is divided into stages according to the business processing flow, and template information (Template) for executing the tasks of each stage is defined.
Step S203, establishing a mapping: a mapping relation (Mapping) is established between the source entity and the target entity, wherein the mapping relation comprises table mapping and field mapping.
Step S204, generating stage tasks: according to the defined data management stages, a Task builder generates the entity tasks (Task) of each stage from the source entity to the target entity, and generates instance information for task execution from the template information of each stage.
Step S205, constructing a task DAG: the dependency relationship between a task and its parent task is generated when the stage tasks are generated; a task without a parent task is a root task, namely a first-stage task of data management. From the dependency relationships of the stage tasks, the DAGBuilder constructs the task DAG structure.
The above steps are described in detail below in connection with a specific example. In the present embodiment, assume a source table A (with fields a1, a2, a3), a source table B (with fields b1, b2, b3), and a target table T (with fields t1, t2, t3).
As shown in fig. 3, the method mainly comprises the following specific steps:
step S301, creating source entities EA (a1, a2, a3) and EB (b1, b2, b3) from the source tables A and B, and a target entity ET (t1, t2, t3) from the target table T;
step S302, assuming that the defined data governance stages are the 4 stages of acquisition (S1), preprocessing (S2), cleaning (S3) and fusion (S4), with task execution template information S1-CMD, S2-CMD, S3-CMD and S4-CMD respectively;
step S303, establishing the mapping relations between the source entities EA, EB and the target entity ET: EA-ET (a1-t1, a2-t2, a3-t3) and EB-ET (b1-t1, b2-t2, b3-t3);
step S304, for the source-to-target entity mappings EA-ET and EB-ET, generating the tasks of the 4 stages defined in step S302: TASK-EA-S1, TASK-EA-S2, TASK-EA-ET-S3; TASK-EB-S1, TASK-EB-S2, TASK-EB-ET-S3; TASK-ET-S4;
step S305, generating the DAG structure of the task generated in step S304 according to the dependency relationship of the stage task, as shown in fig. 4.
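The worked example can be reproduced in a few lines of Python. The task names below are exactly those listed in step S304. The dependency edges, however, are an assumption: fig. 4 is not reproduced in this text, so the sketch assumes the most natural reading of the names, namely that S1 and S2 run per source entity, S3 runs per mapping, and S4 runs once per target entity after all of its S3 tasks.

```python
def build_tasks(mappings):
    """Return {task_id: [parent task ids]} for (source, target) entity pairs.

    Assumed dependency shape (not confirmed by the source, which refers to
    fig. 4): S1 -> S2 per source entity, S2 -> S3 per mapping, and every
    S3 of a target entity -> that target's single S4 task.
    """
    parents = {}
    for source, target in mappings:
        s1 = f"TASK-{source}-S1"
        s2 = f"TASK-{source}-S2"
        s3 = f"TASK-{source}-{target}-S3"
        s4 = f"TASK-{target}-S4"
        parents.setdefault(s1, [])      # acquisition: root task
        parents.setdefault(s2, [s1])    # preprocessing follows acquisition
        parents.setdefault(s3, [s2])    # cleaning follows preprocessing
        fusion_parents = parents.setdefault(s4, [])
        if s3 not in fusion_parents:    # fusion waits for every cleaning task
            fusion_parents.append(s3)
    return parents


if __name__ == "__main__":
    dag = build_tasks([("EA", "ET"), ("EB", "ET")])
    for task_id, task_parents in dag.items():
        print(task_id, "<-", task_parents)
```

Under these assumptions the two mappings yield the seven tasks of step S304, with TASK-EA-S1 and TASK-EB-S1 as root tasks and TASK-ET-S4 as the single sink.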
In the embodiment, the task DAG is constructed based on the mapping relation, so that the workload of manual participation configuration is greatly reduced, the construction efficiency of the task DAG is improved, and the correctness of the task dependency relation can be ensured.
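The patent does not describe how correctness of the dependency relationships would be checked. One common check, shown here purely as an illustration, is a topological sort: it fails if the generated dependencies contain a cycle and otherwise yields a valid execution order. The sketch uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+); the function name `execution_order` and the input format (task id mapped to its parent ids) are assumptions.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(parents):
    """Return one valid execution order for {task_id: [parent ids]}.

    Raises ValueError if the dependencies are not acyclic.
    """
    try:
        # TopologicalSorter expects each node mapped to its predecessors,
        # which is exactly the parent relationship.
        return list(TopologicalSorter(parents).static_order())
    except CycleError as exc:
        raise ValueError(f"task dependencies contain a cycle: {exc.args[1]}") from exc


if __name__ == "__main__":
    example = {
        "TASK-EA-S1": [],
        "TASK-EA-S2": ["TASK-EA-S1"],
        "TASK-EA-ET-S3": ["TASK-EA-S2"],
        "TASK-ET-S4": ["TASK-EA-ET-S3"],
    }
    print(execution_order(example))
```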
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The present embodiment also provides a device for constructing a task DAG based on a mapping relationship. The device is used to implement the foregoing embodiments and preferred embodiments, and what has already been described is not repeated here. As used below, the term "module" or "unit" may be a combination of software and/or hardware that implements a predetermined function. While the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 5 is a block diagram of an apparatus for constructing a task DAG based on a mapping relationship according to an embodiment of the present invention, and as shown in fig. 5, the apparatus includes an acquisition module 10, a mapping module 20, a task module 30, and a construction module 40.
The acquisition module 10 is configured to create a source entity and a target entity according to the acquired source table and target table, respectively.
The mapping module 20 is configured to establish a mapping relationship between the source entity and the target entity, where the mapping relationship includes a table mapping and a field mapping.
The task module 30 is configured to generate the tasks of each stage of business data management.
The construction module 40 is configured to construct a task DAG structure according to the dependency relationships of the tasks of each stage.
Fig. 6 is a block diagram of an apparatus for constructing a task DAG based on a mapping relationship according to an embodiment of the present invention, as shown in fig. 6, the apparatus includes, in addition to all the modules shown in fig. 5, a task module 30 including: the dividing unit 301 is configured to divide each stage in the data management process according to the service processing flow, and define template information of task execution of each stage; the generating unit 302 is configured to generate entity tasks from the source entity to the target entity according to the defined data governance stages, and generate instance information of task execution according to template information of task execution of each stage.
In the present embodiment, the construction module 40 includes a construction unit 401, configured to construct the DAG structure of the tasks according to the task dependency relationships generated when the tasks of each stage are generated.
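When the device is implemented in software, the four modules could correspond to four small classes that are called in sequence. The sketch below is one possible arrangement, not the patented implementation: the class and method names are hypothetical, the dividing unit is reduced to an ordered list of stage names, and each mapping is simplified to a single linear chain of stage tasks.

```python
class AcquisitionModule:
    """Module 10 (sketch): create entities from table definitions."""

    def create_entities(self, tables):
        # tables: {table_name: [column names]}; entity names get an "E" prefix.
        return {f"E{name}": list(columns) for name, columns in tables.items()}


class MappingModule:
    """Module 20 (sketch): record the table mapping and the field mapping."""

    def map(self, source, target, field_map):
        return {"table": (source, target), "fields": dict(field_map)}


class TaskModule:
    """Module 30 (sketch): stage list (dividing unit) and task generation."""

    def __init__(self, stages):
        self.stages = list(stages)

    def generate(self, mapping):
        source, target = mapping["table"]
        tasks, previous = {}, None
        for stage in self.stages:
            task_id = f"{source}-{target}:{stage}"
            tasks[task_id] = [previous] if previous else []
            previous = task_id
        return tasks  # {task_id: [parent ids]}


class ConstructionModule:
    """Module 40 (sketch): turn parent links into a parent -> children DAG."""

    def build(self, tasks):
        children = {task_id: [] for task_id in tasks}
        for task_id, parents in tasks.items():
            for parent in parents:
                children[parent].append(task_id)
        return children


if __name__ == "__main__":
    sources = AcquisitionModule().create_entities({"A": ["a1", "a2", "a3"]})
    targets = AcquisitionModule().create_entities({"T": ["t1", "t2", "t3"]})
    mapping = MappingModule().map("EA", "ET", {"a1": "t1", "a2": "t2", "a3": "t3"})
    tasks = TaskModule(["acquisition", "preprocessing", "cleaning", "fusion"]).generate(mapping)
    print(ConstructionModule().build(tasks))
```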
It should be noted that each of the above modules or units may be implemented by software or hardware, and for the latter, may be implemented by, but not limited to: the modules are all located in the same processor; alternatively, the above modules may be located in different processors in any combination.
An embodiment of the invention also provides a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
Alternatively, in the present embodiment, the above-described storage medium may be configured to store a computer program for performing the steps of:
s1, respectively creating a source entity and a target entity according to an acquired source table and a target table;
s2, establishing a mapping relation between a source entity and a target entity, wherein the mapping relation comprises table mapping and field mapping;
s3, generating tasks of each stage of business data management;
s4, constructing a task DAG structure through the dependency relationship of the tasks in each stage.
Alternatively, in the present embodiment, the storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing a computer program.
An embodiment of the invention also provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
s1, respectively creating a source entity and a target entity according to an acquired source table and a target table;
s2, establishing a mapping relation between a source entity and a target entity, wherein the mapping relation comprises table mapping and field mapping;
s3, generating tasks of each stage of business data management;
s4, constructing a task DAG structure through the dependency relationship of the tasks in each stage.
It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented in a general purpose computing device, they may be concentrated on a single computing device, or distributed across a network of computing devices, they may alternatively be implemented in program code executable by computing devices, so that they may be stored in a memory device for execution by computing devices, and in some cases, the steps shown or described may be performed in a different order than that shown or described, or they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps within them may be fabricated into a single integrated circuit module for implementation. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A method of constructing a task DAG based on a mapping relationship, comprising:
respectively creating a source entity and a target entity according to the acquired source table and target table;
establishing a mapping relation between the source entity and the target entity, wherein the mapping relation comprises table mapping and field mapping;
generating tasks of each stage of business data management according to the mapping relation between the source entity and the target entity, wherein each stage comprises: collecting, preprocessing, cleaning and fusing;
constructing a task DAG structure through the dependency relationship of the tasks of each stage, wherein the task of the first stage is the root task of the task DAG structure;
wherein, the tasks of each stage of generating business data management comprise:
dividing each stage in the data treatment process according to the business processing flow, and defining template information of task execution of each stage;
and generating entity tasks from the source entity to each stage of the target entity according to the defined data management stage, and generating instance information of task execution according to template information of task execution of each stage.
2. The method of claim 1, wherein building a task DAG structure from dependencies of tasks of each stage comprises:
and constructing a DAG structure of the task according to the task dependency relationship generated when the task of each stage is generated.
3. The method of claim 1, wherein the task dependencies include at least one of: a dependency relationship between the task and its parent task; the task being a root task.
4. An apparatus for constructing a task DAG based on a mapping relationship, comprising:
the acquisition module is used for respectively creating a source entity and a target entity according to the acquired source table and target table;
the mapping module is used for establishing a mapping relation between the source entity and the target entity, wherein the mapping relation comprises table mapping and field mapping;
the task module is used for generating tasks of each stage of business data management through the mapping relation between the source entity and the target entity, wherein each stage comprises: collecting, preprocessing, cleaning and fusing;
the construction module is used for constructing a task DAG structure according to the dependency relationship of the tasks in each stage, wherein the task in the first stage is the root task of the task DAG structure;
wherein the task module comprises:
the dividing unit is used for dividing each stage of the data treatment process according to the business processing flow and defining template information of task execution of each stage;
the generating unit is used for generating entity tasks from the source entity to each stage of the target entity according to the defined data management stage, and generating instance information of task execution according to template information of task execution of each stage.
5. The apparatus of claim 4, wherein the build module comprises:
and the construction unit is used for constructing the DAG structure of the task according to the task dependency relationship generated when the task of each stage is generated.
6. The apparatus of claim 4, wherein the task dependencies comprise at least one of: a dependency relationship between the task and its parent task; the task being a root task.
7. A computer readable storage medium, characterized in that the storage medium has stored therein a computer program, wherein the computer program is arranged to perform the method of any of the claims 1 to 3 when run.
8. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to run the computer program to perform the method of any of the claims 1 to 3.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911419978.0A CN111158800B (en) 2019-12-31 2019-12-31 Method and device for constructing task DAG based on mapping relation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911419978.0A CN111158800B (en) 2019-12-31 2019-12-31 Method and device for constructing task DAG based on mapping relation

Publications (2)

Publication Number Publication Date
CN111158800A CN111158800A (en) 2020-05-15
CN111158800B true CN111158800B (en) 2024-02-23

Family

ID=70560526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911419978.0A Active CN111158800B (en) 2019-12-31 2019-12-31 Method and device for constructing task DAG based on mapping relation

Country Status (1)

Country Link
CN (1) CN111158800B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111625692B (en) * 2020-05-27 2023-08-22 抖音视界有限公司 Feature extraction method, device, electronic equipment and computer readable medium
CN111753238A (en) * 2020-06-05 2020-10-09 北京有竹居网络技术有限公司 Data mapping method and device and electronic equipment
CN111651460A (en) * 2020-06-11 2020-09-11 上海德易车信息科技有限公司 Data management method and device, electronic equipment and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20160112830A (en) * 2015-03-20 2016-09-28 한국전자통신연구원 Method and Apparatus for Generating Optimal Task based Data Processing Service
CN109800226A (en) * 2018-12-25 2019-05-24 北京明略软件系统有限公司 A kind of data administer in task management method and device
CN110516081A (en) * 2019-09-02 2019-11-29 北京明略软件系统有限公司 The display methods and device of tables of data mapping relations
CN110555038A (en) * 2018-03-28 2019-12-10 阿里巴巴集团控股有限公司 Data processing system, method and device

Also Published As

Publication number Publication date
CN111158800A (en) 2020-05-15

Similar Documents

Publication Publication Date Title
CN111158800B (en) Method and device for constructing task DAG based on mapping relation
CN110471754B (en) Data display method, device, equipment and storage medium in job scheduling
CN107807815B (en) Method and device for processing tasks in distributed mode
EP3113020A1 (en) Data processing device and method for processing serial tasks
CN109684319B (en) Data cleaning system, method, device and storage medium
CN110781180B (en) Data screening method and data screening device
CN111143446A (en) Data structure conversion processing method and device of data object and electronic equipment
CN102810184A (en) Method and device for dynamically executing workflow and enterprise system
CN106339802A (en) Task allocation method, task allocation device and electronic equipment
US9542161B2 (en) Method and system for generating a source code for a computer program for execution and simulation of a process
CN111966597B (en) Test data generation method and device
CN111435329A (en) Automatic testing method and device
CN114253845A (en) Automatic testing method and device for special-shaped architecture integration system
CA3102814A1 (en) System and method for data ingestion and workflow generation
CN114860566A (en) Source code testing method and device, electronic equipment and storage medium
CN106127026A (en) Authority configuring method and device
CN111160403A (en) Method and device for multiplexing and discovering API (application program interface)
CN110928876A (en) Credit data storage method and device
CN114116181B (en) Distributed data analysis task scheduling system and method
CN112000414B (en) Configurable display method and device for parameter information
CN113742052B (en) Batch task processing method and device
CN110858806B (en) Generation method and device of node deployment file, node deployment method and device and electronic equipment
CN106557359B (en) Task scheduling method and system
Sungur et al. Identifying relevant resources and relevant capabilities of collaborations-a case study
Guedes et al. Goals and Scenarios to Software Product Lines: the GS2SPL Approach.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant