CN116383172A - Data backtracking method, storage medium and electronic device - Google Patents

Data backtracking method, storage medium and electronic device

Info

Publication number
CN116383172A
Authority
CN
China
Prior art keywords
task
backtracking
data
dependent
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310177924.8A
Other languages
Chinese (zh)
Inventor
杨猛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Haier Technology Co Ltd
Haier Smart Home Co Ltd
Haier Uplus Intelligent Technology Beijing Co Ltd
Original Assignee
Qingdao Haier Technology Co Ltd
Haier Smart Home Co Ltd
Haier Uplus Intelligent Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Haier Technology Co Ltd, Haier Smart Home Co Ltd, Haier Uplus Intelligent Technology Beijing Co Ltd
Priority to CN202310177924.8A
Publication of CN116383172A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/219 Managing data history or versioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/54 Indexing scheme relating to G06F9/54
    • G06F2209/548 Queue
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data backtracking method, a storage medium and an electronic device, and relates to the technical field of intelligent household appliances, wherein the data backtracking method comprises the following steps: receiving a task backtracking request sent by a client; the task backtracking request carries identification information of the data backtracking task; performing dependency check based on a preset task dependency relationship knowledge graph to acquire a dependent task item corresponding to the data backtracking task; the task dependency relationship knowledge graph is a topological relationship constructed based on a predetermined task dependency relationship; and outputting the dependent task item to a preset Kafka message queue, obtaining a task dependent link, and carrying out data backtracking based on the task dependent link. The method provided by the application can greatly improve the retrieval capability of task dependent items, effectively improve the efficiency and accuracy of data backtracking, and reduce the error rate and the consumption of resources during scheduling operation.

Description

Data backtracking method, storage medium and electronic device
Technical Field
The application relates to the technical field of intelligent household appliances, in particular to a data backtracking method, a storage medium and an electronic device.
Background
In ETL (Extract-Transform-Load) tasks, data often needs to be backtracked, and the dependency relationships among tasks must be sorted out during backtracking. Task dependency and task scheduling are the key problems in this process, because the dependencies among the various tasks cannot be resolved by a single scheduling run when the tasks are triggered. In existing platform integration capabilities, the complete task dependency data is stored in a relational database, and every task is scheduled into one large flow arrangement. Such a system has a severe bottleneck in retrieval performance and consumes considerable resources during scheduling. A running backtracking task can occupy the resources of the whole platform and undermine the stability of the platform. How to provide a more effective data backtracking scheme that improves data backtracking efficiency has therefore become a problem to be solved.
Disclosure of Invention
The application provides a data backtracking method which is used for solving the defects that in the prior art, task dependent item retrieval efficiency is low and backtracking task operation is affected.
The application provides a data backtracking method, which comprises the following steps:
receiving a task backtracking request sent by a client; the task backtracking request carries identification information of the data backtracking task;
performing dependency check based on a preset task dependency relationship knowledge graph to acquire a dependent task item corresponding to the data backtracking task; the task dependency relationship knowledge graph is a topological relationship constructed based on a predetermined task dependency relationship;
and outputting the dependent task item to a preset Kafka message queue, obtaining a task dependent link, and carrying out data backtracking based on the task dependent link.
Further, performing dependency check based on a preset task dependency relationship knowledge graph to obtain a dependency task item corresponding to the data backtracking task, which specifically includes:
performing dependency check on upstream table data of the data backtracking task based on a preset task dependency relationship knowledge graph, and outputting prompt information of incompletion of the backtracking task under the condition that the upstream table data does not meet the preset backtracking task requirement;
under the condition that the upstream table data meets the requirement of a preset backtracking task, performing dependency check on downstream table data of the data backtracking task based on a preset task dependency relationship knowledge graph, and acquiring dependent task items corresponding to the downstream table data layer by utilizing a graph calculation mode until target dependent task items are acquired and completed, and determining the target dependent task items as dependent task items corresponding to the data backtracking task; the target dependent task item comprises a dependent task item corresponding to downstream table data acquired layer by layer.
Further, the data backtracking method further includes, after outputting the dependent task item to a preset Kafka message queue:
and reading the dependent task items in the Kafka message queue based on a preset Flink application program, and carrying out task scheduling on the dependent task items to obtain the task dependent link.
Further, the data backtracking method, before receiving the task backtracking request sent by the client, further includes:
predetermining the task dependency relationship corresponding to the data backtracking task;
and storing the task dependency relationship into a preset graph database to construct and form a corresponding topological relationship, and outputting the task dependency relationship knowledge graph based on the topological relationship.
Further, the data backtracking method further includes, after performing data backtracking based on the task dependent link:
acquiring a processing result of a data backtracking task; the processing result of the data backtracking task comprises the number of task nodes running, the duration of the task nodes running and the information of whether the task nodes run successfully or not; returning the processing result of the data backtracking task to the client; the task dependency relationship knowledge graph comprises at least one task node.
Further, the data backtracking method further includes: under the condition that the upstream table data meet the preset backtracking task requirement, the dependent task items corresponding to the upstream table data are acquired layer by layer; the target dependent task item further comprises a dependent task item corresponding to the upstream table data acquired layer by layer.
The application also provides a data backtracking device, including:
the backtracking request receiving unit is used for receiving a task backtracking request sent by the client; the task backtracking request carries identification information of the data backtracking task;
the dependent task item acquisition unit is used for carrying out dependency check based on a preset task dependency relationship knowledge graph to acquire a dependent task item corresponding to the data backtracking task; the task dependency relationship knowledge graph is a topological relationship constructed based on a predetermined task dependency relationship;
and the data backtracking processing unit is used for outputting the dependent task item to a preset Kafka message queue, obtaining a task dependent link and carrying out data backtracking based on the task dependent link.
Further, the dependent task item obtaining unit is specifically configured to:
performing dependency check on upstream table data of the data backtracking task based on a preset task dependency relationship knowledge graph, and outputting prompt information of incompletion of the backtracking task under the condition that the upstream table data does not meet the preset backtracking task requirement;
under the condition that the upstream table data meets the requirement of a preset backtracking task, performing dependency check on downstream table data of the data backtracking task based on a preset task dependency relationship knowledge graph, and acquiring dependent task items corresponding to the downstream table data layer by utilizing a graph calculation mode until target dependent task items are acquired and completed, and determining the target dependent task items as dependent task items corresponding to the data backtracking task; the target dependent task item comprises a dependent task item corresponding to downstream table data acquired layer by layer.
Further, the data backtracking device, after outputting the dependent task item to a preset Kafka message queue, further includes:
and the task scheduling unit is used for reading the dependent task items in the Kafka message queue based on a preset Flink application program and performing task scheduling on the dependent task items so as to obtain the task dependent link.
Further, before receiving the task backtracking request sent by the client, the data backtracking device further includes:
the task dependency relationship determining unit is used for determining task dependency relationships corresponding to the data backtracking tasks in advance;
the knowledge graph construction unit is used for storing the task dependency relationship into a preset graph database to construct and form a corresponding topological relationship, and outputting the task dependency relationship knowledge graph based on the topological relationship.
Further, the data backtracking device further includes, after performing data backtracking based on the task dependent link:
the processing result feedback unit is used for acquiring the processing result of the data backtracking task; the processing result of the data backtracking task comprises the number of task nodes running, the duration of the task nodes running and the information of whether the task nodes run successfully or not; returning the processing result of the data backtracking task to the client; the task dependency relationship knowledge graph comprises at least one task node.
Further, in the data backtracking device, the dependent task item obtaining unit is further configured to obtain, in a case where the upstream table data meets a preset backtracking task requirement, a dependent task item corresponding to the upstream table data layer by layer; the target dependent task item further comprises a dependent task item corresponding to the upstream table data acquired layer by layer.
The present application also provides an electronic device comprising a memory in which a computer program is stored and a processor arranged to implement a data backtracking method as described in any of the above by execution of the computer program.
The present application also provides a computer-readable storage medium including a stored program, wherein the program when executed performs a data backtracking method as described in any one of the above.
The present application also provides a computer program product comprising a computer program which when executed by a processor implements a data backtracking method as described in any one of the above.
According to the data backtracking method, the task backtracking request sent by the client is received, dependency check is conducted based on a preset task dependency relationship knowledge graph, and the dependency task item corresponding to the data backtracking task is obtained; and outputting the dependent task item to a preset Kafka message queue, obtaining a task dependent link, and carrying out data backtracking based on the task dependent link. The method can greatly improve the retrieval capability of task dependent items, effectively improve the efficiency and accuracy of data backtracking, and reduce the error rate and the consumption of resources during scheduling operation.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic diagram of a hardware environment of a data backtracking method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of the data backtracking method provided in the present application;
FIG. 3 is a schematic diagram of a specific application of the data backtracking method provided in the present application;
FIG. 4 is a second schematic diagram of a specific application of the data backtracking method provided in the present application;
FIG. 5 is a schematic structural diagram of a data backtracking device provided in the present application;
FIG. 6 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the present application are described in detail below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the embodiments herein without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description of the present application and the above-described figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to one aspect of the embodiments of the present application, a data backtracking method is provided. The data backtracking method is widely applicable to whole-house intelligent digital control scenarios such as the smart home (Smart Home), the smart home device ecosystem, and the intelligent house (Intelligence House) ecosystem. Optionally, in this embodiment, the data backtracking method may be applied to a hardware environment consisting of the terminal device 102 and the server 104 shown in FIG. 1. As shown in FIG. 1, the server 104 is connected to the terminal device 102 through a network and may provide services (such as application services) for the terminal or for a client installed on the terminal. A database may be set up on the server or independently of the server to provide data storage services for the server 104, and cloud computing and/or edge computing services may be configured on the server or independently of the server to provide data computing services for the server 104.
The network may include, but is not limited to, at least one of a wired network and a wireless network. The wired network may include, but is not limited to, at least one of a wide area network, a metropolitan area network, and a local area network. The wireless network may include, but is not limited to, at least one of Wi-Fi (Wireless Fidelity) and Bluetooth. The terminal device 102 may be, but is not limited to, a PC, a mobile phone, a tablet computer, an intelligent air conditioner, an intelligent range hood, an intelligent refrigerator, an intelligent oven, an intelligent cooking range, an intelligent washing machine, an intelligent water heater, an intelligent washing device, an intelligent dish washer, an intelligent projection device, an intelligent television, an intelligent clothes hanger, an intelligent curtain, an intelligent video, an intelligent socket, an intelligent sound box, an intelligent fresh air device, an intelligent kitchen and toilet device, an intelligent bathroom device, an intelligent sweeping robot, an intelligent window cleaning robot, an intelligent mopping robot, an intelligent air purifying device, an intelligent steam box, an intelligent microwave oven, an intelligent kitchen appliance, an intelligent purifier, an intelligent water dispenser, an intelligent door lock, and the like.
Embodiments thereof are described in detail below based on the data backtracking method described in the present application. As shown in fig. 2, which is a flow chart of the data backtracking method provided in the present application, a specific implementation process includes the following steps:
Step 201: receiving a task backtracking request sent by the client. The task backtracking request carries identification information of the data backtracking task.
In this embodiment of the application, before the task backtracking request sent by the client is received, the task dependency relationship corresponding to the data backtracking task, that is, the dependency relationship of the data to be backtracked, must be determined in advance. The task dependency relationship is stored in a preset graph database to construct the corresponding topological relationship, and the task dependency relationship knowledge graph is output based on that topological relationship. The task backtracking request sent by the client is received after the task dependency relationship knowledge graph has been obtained.
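As a minimal sketch of the topological relationship described above, the dependency pairs can be held as two adjacency maps, one per direction. The in-memory dictionaries stand in for the graph database, and the function name and task names are hypothetical illustrations that do not appear in the application.

```python
def build_dependency_graph(edges):
    """Build upstream/downstream adjacency maps from (upstream, downstream) pairs."""
    upstream, downstream = {}, {}
    for up, down in edges:
        downstream.setdefault(up, set()).add(down)
        upstream.setdefault(down, set()).add(up)
        # Make sure every task appears as a key in both maps.
        downstream.setdefault(down, set())
        upstream.setdefault(up, set())
    return upstream, downstream


# Hypothetical task names, used only for illustration.
EDGES = [
    ("ods_orders", "dwd_orders"),
    ("dwd_orders", "dws_sales"),
    ("dwd_orders", "ads_report"),
]
upstream, downstream = build_dependency_graph(EDGES)
print(sorted(downstream["dwd_orders"]))
```

Keeping both directions makes the upward check and the downward traversal equally cheap; a real graph database would answer the same two questions with a query instead of a dictionary lookup.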
Step 202: performing dependency check based on a preset task dependency relationship knowledge graph to acquire a dependent task item corresponding to the data backtracking task; the task dependency relationship knowledge graph is a topological relationship constructed based on a predetermined task dependency relationship.
In this step, a dependency check may be performed on the upstream table data of the data backtracking task based on the preset task dependency relationship knowledge graph. When the upstream table data does not meet the preset backtracking task requirement, prompt information indicating that the data backtracking task cannot be completed is output. When the upstream table data meets the preset backtracking task requirement, a dependency check is performed on the downstream table data of the data backtracking task based on the preset task dependency relationship knowledge graph, and the dependent task items corresponding to the downstream table data are acquired layer by layer by means of graph computation until acquisition of the target dependent task items is complete; the target dependent task items are then determined as the dependent task items corresponding to the data backtracking task. The target dependent task items comprise the dependent task items corresponding to the downstream table data acquired layer by layer. Further, when the upstream table data meets the preset backtracking task requirement, the dependent task items corresponding to the upstream table data are also acquired layer by layer; the target dependent task items then further comprise the dependent task items corresponding to the upstream table data acquired layer by layer.
Specifically, when a task backtracking request triggered by a user is obtained, a module is triggered first to check whether the upstream dependent task items meet the preset backtracking task requirement, that is, whether all data of the required tables is in place. If the upstream data does not meet the backtracking task requirement, acquisition of the upward dependent task items is triggered and the data is checked layer by layer, starting from the first layer; if the top-layer data does not meet the backtracking task requirement, the backtracking request cannot be completed. If the upstream data meets the backtracking task requirement, scheduling of the current backtracking task starts; after that scheduling completes, acquisition of the downward dependent task items is triggered, and the dependency list (comprising a plurality of dependent task items) is acquired layer by layer by means of graph computation. As shown in FIG. 3, the dependency check runs in two directions. Upward, it checks whether the data tables meet the backtracking task requirement; if the upstream table data does not, the result is output directly and the backtracking task cannot be completed. Downward, it uses the preset graph computation capability to acquire the downstream dependency list layer by layer.
The method uses the task dependency relationship knowledge graph as both the storage medium and the computation engine. The task lineage is acquired (from the scheduling tool configuration side and the application analysis side), task entities are established according to the specified relationships, and the upstream and downstream dependency relationships are attached. The task entities and dependency relationships are output to a graph database. When a data backtracking task needs to backtrack data, the dependent task list can be acquired both upward and downward, and the tasks that need to be rerun are output layer by layer to a Kafka message queue. The Kafka queue is read in real time, and event-driven, layer-by-layer task scheduling is carried out from the data in the message queue, completing data backtracking for the whole link. Storing the dependencies in the knowledge graph and using the Kafka message queue to break the work into step-by-step scheduling of intermediate tasks has three benefits: the time needed to acquire task dependencies is reduced, the operating pressure on the platform is relieved, and event-driven execution is supported, so that downstream tasks can be triggered according to how the upstream tasks ran. The task dependency relationship knowledge graph is a knowledge graph with a topological structure formed from the dependency relationships of the backtracking task.
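The two-direction check can be sketched as follows. This is a simplified illustration rather than the application's implementation: it treats the backtracking request as unsatisfiable as soon as any upstream table at any layer is missing, and it uses breadth-first traversal as a stand-in for the graph computation. The function names, the `table_ready` map and the task names are hypothetical.

```python
def upstream_ready(task, upstream, table_ready):
    """Return True only if every upstream table, at every layer, is in place."""
    seen = set()
    frontier = set(upstream.get(task, ()))
    while frontier:
        if not all(table_ready.get(t, False) for t in frontier):
            return False
        seen |= frontier
        frontier = {u for t in frontier for u in upstream.get(t, ())} - seen
    return True


def downstream_layers(task, downstream):
    """Collect downstream dependent tasks layer by layer (breadth-first)."""
    layers, seen = [], {task}
    frontier = sorted(set(downstream.get(task, ())) - seen)
    while frontier:
        layers.append(frontier)
        seen.update(frontier)
        frontier = sorted({d for t in frontier for d in downstream.get(t, ())} - seen)
    return layers


# Hypothetical dependency data, used only for illustration.
upstream = {"dwd_orders": {"ods_orders"}, "dws_sales": {"dwd_orders"},
            "ads_report": {"dwd_orders"}, "ads_dashboard": {"dws_sales"}}
downstream = {"ods_orders": {"dwd_orders"},
              "dwd_orders": {"dws_sales", "ads_report"},
              "dws_sales": {"ads_dashboard"}}

if upstream_ready("dwd_orders", upstream, {"ods_orders": True}):
    print(downstream_layers("dwd_orders", downstream))
else:
    print("backtracking task cannot be completed")
```

Returning the result as a list of layers, rather than a flat list, preserves the order in which the tasks have to be rerun.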
Step 203: outputting the dependent task item to a preset Kafka message queue, obtaining a task dependent link, and carrying out data backtracking based on the task dependent link.
After the dependent task item is output to the preset Kafka message queue, the method further comprises: reading the dependent task items in the Kafka message queue by means of a preset Flink application program, and performing task scheduling on the dependent task items to obtain the task dependent link. In a specific implementation, a Flink application program reads the scheduling tool configuration library Azkaban and the dependency analysis library Parse in real time, obtains all task dependency chains of the current department, generates data in the format of task entities and dependency relationships, and outputs it to neo4j. Alternatively, Spark may read the scheduling tool configuration library Azkaban and the dependency analysis library Parse offline, obtain all task dependency chains of the current department, generate data in the format of task entities and dependency relationships, and output it to neo4j.
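One possible shape for the "task entity and dependency relationship" format is a list of parameterized Cypher statements, sketched below. The node label `Task`, the property `name` and the relationship type `DOWNSTREAM` are assumptions made for illustration; the application does not specify the schema. The sketch only builds the statements and does not connect to a database.

```python
def to_cypher(edges):
    """Turn (upstream, downstream) pairs into parameterized Cypher MERGE statements."""
    # MERGE creates the node or relationship only if it does not exist yet,
    # so replaying the same dependency chain does not create duplicates.
    query = (
        "MERGE (u:Task {name: $up}) "
        "MERGE (d:Task {name: $down}) "
        "MERGE (u)-[:DOWNSTREAM]->(d)"
    )
    return [(query, {"up": up, "down": down}) for up, down in edges]


# Hypothetical dependency chain, used only for illustration.
statements = to_cypher([("ods_orders", "dwd_orders"), ("dwd_orders", "dws_sales")])
for query, params in statements:
    print(params, "->", query)
```

Each `(query, params)` pair could then be handed to whatever graph database client the deployment uses.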
In addition, after the data backtracking based on the task dependent link, the method further comprises: acquiring a processing result of a data backtracking task; the processing result of the data backtracking task comprises the number of task nodes running, the duration of the task nodes running and the information of whether the task nodes run successfully or not; returning the processing result of the data backtracking task to the client; the task dependency relationship knowledge graph comprises at least one task node.
Specifically, the dependent task item is written into the Kafka message queue, the Flink task reads the data in the Kafka message queue, task scheduling is performed, each task module performs downward dependent task item acquisition after scheduling is completed, and then the dependent task item is written into the Kafka message queue until the whole task dependent link is completed. After the whole backtracking link (i.e. task dependent link) is completed, the backtracking task requirement is output, and the upward and downward task requirement is respectively scheduled by how many dependent task items and how much time is consumed; before the backtracking task requirement starts, the user can pop up, how many tasks are needed upwards and downwards to schedule and predict time consumption, and can choose to start or stop. When task dependency relationship knowledge graphs are used as storage media and a computing engine, an executor reads the task dependency relationship knowledge graphs, acquires task dependency items, outputs the task dependency items to Kafka, buffers information, subscribes Kafka-topic in real time to acquire the dependency task items, calls Azkaban-API to trigger task operation, and after each task operation node is finished, triggers the executor to read the task dependency relationship knowledge graphs to acquire whether the trace task has the dependency task items or not, and triggers and outputs a Kafka information queue and a task operation step until the whole task dependency link is completely executed.
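The event-driven loop described above can be sketched with an in-process queue. Here `queue.Queue` stands in for the Kafka topic, the `run_task` callback stands in for the call that triggers a task through the scheduler, and a downstream task is released only once all of its affected upstream tasks have finished. These stand-ins, the function name and the task names are assumptions for illustration; the sketch uses neither Kafka, Flink nor Azkaban.

```python
import queue


def run_backtrack(start, upstream, downstream, run_task):
    """Event-driven, layer-by-layer scheduling driven by a message queue."""
    # Tasks affected by the backtrack: the start task and everything below it.
    affected, stack = {start}, [start]
    while stack:
        for d in downstream.get(stack.pop(), ()):
            if d not in affected:
                affected.add(d)
                stack.append(d)

    topic = queue.Queue()  # stand-in for the Kafka topic
    topic.put(start)
    done, order = set(), []
    while not topic.empty():
        task = topic.get()
        if task in done:
            continue
        run_task(task)  # stand-in for triggering the task in the scheduler
        done.add(task)
        order.append(task)
        # Completion event: release downstream tasks whose affected
        # upstream tasks have all finished.
        for d in sorted(downstream.get(task, ())):
            pending = (upstream.get(d, set()) & affected) - done
            if not pending and d not in done:
                topic.put(d)
    return order


# Hypothetical diamond-shaped dependency: a -> b, a -> c, b -> d, c -> d.
upstream = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
downstream = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}}
ran = []
order = run_backtrack("a", upstream, downstream, ran.append)
print(order)  # ['a', 'b', 'c', 'd']
```

Because a task is enqueued only by the last of its upstream tasks to finish, each task runs once even when several paths lead to it.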
When task dependent items are output to the Kafka message queue, they are output layer by layer: after one layer has been consumed downstream, the next lower layer of task dependent items is acquired, and the cycle continues until all task dependent items have been acquired. The acquired task dependent items are sent to the execution module, which schedules and runs them layer by layer; each time a task dependent item finishes running, the downstream task dependent items are acquired once more, and the process repeats. After all task dependent items in the task dependent link have been run, the data backtracking execution result is output to the output module; it includes, for example, the number of task nodes run, the running duration of the task nodes, and whether each task node ran successfully. Finally, the execution result of the backtracking task is sent to the user.
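The layer-by-layer, event-driven loop described above can be sketched as follows. This is an illustrative simplification under stated assumptions, not the method of the present application: an in-memory `deque` stands in for the Kafka topic, the `run_task` callback stands in for the scheduler call that triggers a job, and the function name `run_backtrack` and the example task names are hypothetical. The sketch also does not wait for every upstream task of a node when branches of the graph have unequal depth.

```python
import time
from collections import deque


def run_backtrack(root, downstream, run_task):
    """Run `root` and its downstream tasks one layer at a time.

    Each message on `topic` is one layer of task items. After a task
    succeeds, its downstream items are collected into the next layer,
    which is then published as a new message.
    """
    topic = deque([[root]])          # stand-in for the message queue
    finished = set()
    outcomes = []                    # (task, succeeded) in execution order
    started = time.perf_counter()
    while topic:
        layer = topic.popleft()      # "consume" one layer
        next_layer = []
        for task in layer:
            if task in finished:
                continue
            ok = bool(run_task(task))
            finished.add(task)
            outcomes.append((task, ok))
            if not ok:
                continue             # do not schedule dependents of a failed task
            for child in downstream.get(task, []):
                if child not in finished and child not in next_layer:
                    next_layer.append(child)
        if next_layer:
            topic.append(next_layer)  # "publish" the next layer
    return {
        "nodes_run": len(outcomes),
        "seconds": time.perf_counter() - started,
        "all_succeeded": all(ok for _, ok in outcomes),
        "order": [task for task, _ in outcomes],
    }


# Hypothetical dependency graph: task -> tasks that depend on it
graph = {
    "ods_orders": ["dwd_orders"],
    "dwd_orders": ["dws_daily", "dws_user"],
    "dws_daily": ["ads_report"],
    "dws_user": ["ads_report"],
}
result = run_backtrack("ods_orders", graph, lambda task: True)
```

The returned dictionary mirrors the execution result named in the text: the number of task nodes run, the running duration, and whether the nodes ran successfully.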
For example, as shown in FIG. 4, the user first configures a backtracking task. The table data dependency is checked and the table data dependency result is returned; the task dependent items are then checked downward and the task dependency result is returned. One layer of task dependencies is output and that layer of tasks is executed; after that layer has finished executing, the next lower layer of task dependencies is acquired and executed. After execution is complete, the result is output and a result reminder is sent to the client. If the table dependency check is not satisfied, an unsatisfied result is output.
The task dependency relationship is stored in a graph database, and during data backtracking the task dependent items are output by means of graph computing capability. The dependent task items are combined with the Kafka message queue and Flink to form an event-driven mechanism that schedules the data backtracking task items. That is, a task dependency relationship knowledge graph and an event-driven capability are added to the conventional data backtracking task: task backtracking is performed on the task dependency relationship knowledge graph, and the dependent task items are scheduled in an event-driven manner. This can greatly improve the retrieval capability for task dependent items and relieve the pressure on the platform's operating resources during data backtracking task scheduling, so that the data backtracking link is executed completely without affecting the normal running of other tasks.
In the data backtracking method, a task backtracking request sent by a client is received; a dependency check is performed based on a preset task dependency relationship knowledge graph to obtain the dependent task item corresponding to the data backtracking task; and the dependent task item is output to a preset Kafka message queue to obtain a task dependent link, based on which data backtracking is performed. The method can greatly improve the retrieval capability for task dependent items, effectively improve the efficiency and accuracy of data backtracking, and reduce the error rate and resource consumption during scheduling.
The data backtracking device provided by the present application is described below; the data backtracking device described below and the data backtracking method described above may be read with reference to each other.
Fig. 5 is a schematic structural diagram of a data backtracking device provided in the present application.
The data backtracking device specifically comprises the following parts:
a backtracking request receiving unit 501, configured to receive a task backtracking request sent by a client; the task backtracking request carries identification information of the data backtracking task;
the dependent task item obtaining unit 502 is configured to perform dependency check based on a preset task dependency relationship knowledge graph, and obtain a dependent task item corresponding to the data backtracking task; the task dependency relationship knowledge graph is a topological relationship constructed based on a predetermined task dependency relationship;
and the data backtracking processing unit 503 is configured to output the dependent task item to a preset Kafka message queue, obtain a task dependent link, and perform data backtracking based on the task dependent link.
Further, the dependent task item obtaining unit is specifically configured to:
performing dependency check on upstream table data of the data backtracking task based on a preset task dependency relationship knowledge graph, and outputting prompt information of incompletion of the backtracking task under the condition that the upstream table data does not meet the preset backtracking task requirement;
under the condition that the upstream table data meets the requirement of a preset backtracking task, performing dependency check on downstream table data of the data backtracking task based on a preset task dependency relationship knowledge graph, and acquiring dependent task items corresponding to the downstream table data layer by utilizing a graph calculation mode until target dependent task items are acquired and completed, and determining the target dependent task items as dependent task items corresponding to the data backtracking task; the target dependent task item comprises a dependent task item corresponding to downstream table data acquired layer by layer.
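The two steps above (checking the upstream table data first, then acquiring downstream dependent task items layer by layer) can be illustrated with a short Python sketch. It assumes that "meeting the backtracking task requirement" means every upstream table is present in a set of ready tables, which is only one possible interpretation; the function name `check_and_collect` and all parameter names are hypothetical, and plain dictionaries stand in for the task dependency relationship knowledge graph.

```python
def check_and_collect(task, upstream_tables, downstream, ready_tables):
    """Check upstream tables, then gather downstream items by layer.

    Returns a dict with `ok`, the list of `missing` upstream tables,
    and `layers`, a list of lists of downstream task items.
    """
    missing = [t for t in upstream_tables.get(task, []) if t not in ready_tables]
    if missing:
        # Upstream data does not meet the requirement: report and stop.
        return {"ok": False, "missing": missing, "layers": []}

    layers, seen, frontier = [], {task}, [task]
    while frontier:
        next_layer = []
        for node in frontier:
            for child in downstream.get(node, []):
                if child not in seen:      # each item is collected once
                    seen.add(child)
                    next_layer.append(child)
        if next_layer:
            layers.append(next_layer)
        frontier = next_layer
    return {"ok": True, "missing": [], "layers": layers}


# Hypothetical example: task "a" reads tables t1 and t2
downstream = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
upstream_tables = {"a": ["t1", "t2"]}
plan = check_and_collect("a", upstream_tables, downstream, {"t1", "t2"})
blocked = check_and_collect("a", upstream_tables, downstream, {"t1"})
```

In this sketch, `plan["layers"]` plays the role of the target dependent task items, and `blocked["missing"]` is the information that a prompt about an incomplete backtracking task could report.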
Further, the data backtracking device, after outputting the dependent task item to a preset Kafka message queue, further includes:
and the task scheduling unit is used for reading the dependent task items in the Kafka message queue based on a preset Flink application program and performing task scheduling on the dependent task items so as to obtain the task dependent link.
Further, before receiving the task backtracking request sent by the client, the data backtracking device further includes:
the task dependency relationship determining unit is used for determining task dependency relationships corresponding to the data backtracking tasks in advance;
the knowledge graph construction unit is used for storing the task dependency relationship into a preset graph database to construct and form a corresponding topological relationship, and outputting the task dependency relationship knowledge graph based on the topological relationship.
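As a rough illustration of turning predetermined task dependency relationships into a topological relationship, the sketch below builds upstream and downstream adjacency maps from (upstream task, downstream task) pairs and verifies that the result is acyclic. It uses Python's standard `graphlib` module (Python 3.9 or later) in place of a graph database, and the function name `build_dependency_graph` is hypothetical; the acyclicity check is an added assumption that the source does not spell out.

```python
from graphlib import CycleError, TopologicalSorter


def build_dependency_graph(edges):
    """Build adjacency maps from (upstream_task, downstream_task) pairs.

    Returns (upstream, downstream, order), where `order` is one valid
    topological order. Raises CycleError if the dependencies are cyclic.
    """
    upstream, downstream = {}, {}
    for parent, child in edges:
        upstream.setdefault(child, set()).add(parent)
        upstream.setdefault(parent, set())
        downstream.setdefault(parent, set()).add(child)
        downstream.setdefault(child, set())
    # TopologicalSorter expects a map of node -> predecessors.
    order = list(TopologicalSorter(upstream).static_order())
    return upstream, downstream, order


upstream, downstream, order = build_dependency_graph([("a", "b"), ("b", "c")])
```

Keeping both directions of the relationship allows the same structure to answer upward checks on upstream data and downward acquisition of dependent task items.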
Further, the data backtracking device further includes, after performing data backtracking based on the task dependent link:
the processing result feedback unit is used for acquiring the processing result of the data backtracking task; the processing result of the data backtracking task comprises the number of task nodes running, the duration of the task nodes running and the information of whether the task nodes run successfully or not; returning the processing result of the data backtracking task to the client; the task dependency relationship knowledge graph comprises at least one task node.
Further, in the data backtracking device, the dependent task item obtaining unit is further configured to obtain, in a case where the upstream table data meets a preset backtracking task requirement, a dependent task item corresponding to the upstream table data layer by layer; the target dependent task item further comprises a dependent task item corresponding to the upstream table data acquired layer by layer.
The task dependency relationship is stored in a graph database, and during data backtracking the task dependent items are output by means of graph computing capability. The dependent task items are combined with the Kafka message queue and Flink to form an event-driven mechanism that schedules the data backtracking task items. That is, a task dependency relationship knowledge graph and an event-driven capability are added to the conventional data backtracking task: task backtracking is performed on the task dependency relationship knowledge graph, and the dependent task items are scheduled in an event-driven manner. This can greatly improve the retrieval capability for task dependent items and relieve the pressure on the platform's operating resources during data backtracking task scheduling, so that the data backtracking link is executed completely without affecting the normal running of other tasks.
In the data backtracking device, a task backtracking request sent by a client is received; a dependency check is performed based on a preset task dependency relationship knowledge graph to obtain the dependent task item corresponding to the data backtracking task; and the dependent task item is output to a preset Kafka message queue to obtain a task dependent link, based on which data backtracking is performed. The device can greatly improve the retrieval capability for task dependent items, effectively improve the efficiency and accuracy of data backtracking, and reduce the error rate and resource consumption during scheduling.
Fig. 6 illustrates a physical structure of an electronic device. As shown in fig. 6, the electronic device may include: processor 601, communication interface (Communications Interface) 604, memory 602 and communication bus 603, wherein processor 601, communication interface 604, memory 602 accomplish the communication between each other through communication bus 603. The processor 601 may invoke logic instructions in the memory 602 to perform a data backtracking method comprising: receiving a task backtracking request sent by a client; the task backtracking request carries identification information of the data backtracking task; performing dependency check based on a preset task dependency relationship knowledge graph to acquire a dependent task item corresponding to the data backtracking task; the task dependency relationship knowledge graph is a topological relationship constructed based on a predetermined task dependency relationship; and outputting the dependent task item to a preset Kafka message queue, obtaining a task dependent link, and carrying out data backtracking based on the task dependent link.
Further, the logic instructions in the memory 602 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In another aspect, the present application further provides a computer program product, where the computer program product includes a computer program, where the computer program can be stored on a computer readable storage medium, where the computer program when executed by a processor can perform a data backtracking method provided by the above methods, and the method includes: receiving a task backtracking request sent by a client; the task backtracking request carries identification information of the data backtracking task; performing dependency check based on a preset task dependency relationship knowledge graph to acquire a dependent task item corresponding to the data backtracking task; the task dependency relationship knowledge graph is a topological relationship constructed based on a predetermined task dependency relationship; and outputting the dependent task item to a preset Kafka message queue, obtaining a task dependent link, and carrying out data backtracking based on the task dependent link.
In still another aspect, the present application further provides a computer readable storage medium, where the computer readable storage medium includes a stored program, where the program executes a data backtracking method provided by the above methods, and the method includes: receiving a task backtracking request sent by a client; the task backtracking request carries identification information of the data backtracking task; performing dependency check based on a preset task dependency relationship knowledge graph to acquire a dependent task item corresponding to the data backtracking task; the task dependency relationship knowledge graph is a topological relationship constructed based on a predetermined task dependency relationship; and outputting the dependent task item to a preset Kafka message queue, obtaining a task dependent link, and carrying out data backtracking based on the task dependent link.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (10)

1. A data backtracking method, comprising:
receiving a task backtracking request sent by a client; the task backtracking request carries identification information of the data backtracking task;
performing dependency check based on a preset task dependency relationship knowledge graph to acquire a dependent task item corresponding to the data backtracking task; the task dependency relationship knowledge graph is a topological relationship constructed based on a predetermined task dependency relationship;
and outputting the dependent task item to a preset Kafka message queue, obtaining a task dependent link, and carrying out data backtracking based on the task dependent link.
2. The data backtracking method according to claim 1, wherein the dependency check is performed based on a preset task dependency relationship knowledge graph to obtain a dependency task item corresponding to the data backtracking task, and the method specifically comprises:
performing dependency check on upstream table data of the data backtracking task based on a preset task dependency relationship knowledge graph, and outputting prompt information of incompletion of the backtracking task under the condition that the upstream table data does not meet the preset backtracking task requirement;
under the condition that the upstream table data meets the requirement of a preset backtracking task, performing dependency check on downstream table data of the data backtracking task based on a preset task dependency relationship knowledge graph, and acquiring dependent task items corresponding to the downstream table data layer by utilizing a graph calculation mode until target dependent task items are acquired and completed, and determining the target dependent task items as dependent task items corresponding to the data backtracking task; the target dependent task item comprises a dependent task item corresponding to downstream table data acquired layer by layer.
3. The data backtracking method of claim 1, further comprising, after outputting the dependent task item to a preset Kafka message queue:
and reading the dependent task items in the Kafka message queue based on a preset Flink application program, and carrying out task scheduling on the dependent task items to obtain the task dependent link.
4. The data backtracking method of claim 1, further comprising, prior to receiving the task backtracking request sent by the client:
task dependency relations corresponding to the data backtracking tasks are predetermined;
and storing the task dependency relationship into a preset graph database to construct and form a corresponding topological relationship, and outputting the task dependency relationship knowledge graph based on the topological relationship.
5. The data backtracking method of claim 1, further comprising, after backtracking data based on the task dependent link:
acquiring a processing result of a data backtracking task; the processing result of the data backtracking task comprises the number of task nodes running, the duration of the task nodes running and the information of whether the task nodes run successfully or not; returning the processing result of the data backtracking task to the client; the task dependency relationship knowledge graph comprises at least one task node.
6. The data backtracking method of claim 2, further comprising: under the condition that the upstream table data meet the preset backtracking task requirement, the dependent task items corresponding to the upstream table data are acquired layer by layer; the target dependent task item further comprises a dependent task item corresponding to the upstream table data acquired layer by layer.
7. A data backtracking apparatus, comprising:
the backtracking request receiving unit is used for receiving a task backtracking request sent by the client; the task backtracking request carries identification information of the data backtracking task;
the dependent task item acquisition unit is used for carrying out dependency check based on a preset task dependency relationship knowledge graph to acquire a dependent task item corresponding to the data backtracking task; the task dependency relationship knowledge graph is a topological relationship constructed based on a predetermined task dependency relationship;
and the data backtracking processing unit is used for outputting the dependent task item to a preset Kafka message queue, obtaining a task dependent link and carrying out data backtracking based on the task dependent link.
8. The data backtracking apparatus of claim 7, wherein the dependent task item acquisition unit is specifically configured to:
performing dependency check on upstream table data of the data backtracking task based on a preset task dependency relationship knowledge graph, and outputting prompt information of incompletion of the backtracking task under the condition that the upstream table data does not meet the preset backtracking task requirement;
under the condition that the upstream table data meets the requirement of a preset backtracking task, performing dependency check on downstream table data of the data backtracking task based on a preset task dependency relationship knowledge graph, and acquiring dependent task items corresponding to the downstream table data layer by utilizing a graph calculation mode until target dependent task items are acquired and completed, and determining the target dependent task items as dependent task items corresponding to the data backtracking task; the target dependent task item comprises a dependent task item corresponding to downstream table data acquired layer by layer.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program, wherein the program when run performs the data backtracking method of any one of claims 1 to 6.
10. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, the processor being arranged to perform the data backtracking method of any one of claims 1 to 6 by means of the computer program.
CN202310177924.8A 2023-02-28 2023-02-28 Data backtracking method, storage medium and electronic device Pending CN116383172A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310177924.8A CN116383172A (en) 2023-02-28 2023-02-28 Data backtracking method, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310177924.8A CN116383172A (en) 2023-02-28 2023-02-28 Data backtracking method, storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN116383172A true CN116383172A (en) 2023-07-04

Family

ID=86979680

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310177924.8A Pending CN116383172A (en) 2023-02-28 2023-02-28 Data backtracking method, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN116383172A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116738009A (en) * 2023-08-09 2023-09-12 北京谷器数据科技有限公司 Method for archiving and backtracking data
CN116738009B (en) * 2023-08-09 2023-11-21 北京谷器数据科技有限公司 Method for archiving and backtracking data

Similar Documents

Publication Publication Date Title
CN116383172A (en) Data backtracking method, storage medium and electronic device
CN117573320A (en) Task node execution method and device, storage medium and electronic device
CN114598719A (en) Smart city Internet of things event management method, device and readable medium
CN114911535B (en) Application program component configuration method, storage medium and electronic device
CN116361155A (en) Method and device for testing software development kit, storage medium and electronic device
CN114915514B (en) Method and device for processing intention, storage medium and electronic device
CN116360584A (en) Virtual target product generation method and device, storage medium and electronic device
CN116033006A (en) Data processing method, system, storage medium and electronic device
CN115174296B (en) Equipment function access method and device, storage medium and electronic device
CN114826899B (en) Debugging method and device for equipment control service, storage medium and electronic device
CN114760235B (en) Method and device for executing dial testing task, storage medium and electronic device
CN116301767A (en) Interface file generation method and device, storage medium and electronic device
CN115296986B (en) Event recording method and device, storage medium and electronic device
CN116501698A (en) Timing task processing method and device, storage medium and electronic device
CN116756480A (en) Data statistics method and device, storage medium and electronic device
CN115344240A (en) Data processing method, data processing device, storage medium and electronic device
CN117376360A (en) Data processing method, storage medium and electronic device
CN116541012A (en) Basic parameter modification method and device, storage medium and electronic device
CN116467070A (en) Task processing method and device, storage medium and electronic device
CN115481317A (en) Recommendation method for workbench scene, storage medium and electronic device
CN117749548A (en) Method and device for controlling initiation of interactive service, storage medium and electronic device
CN117573735A (en) Label acquisition method and device, storage medium and electronic device
Babczyński et al. Performance evaluation of multiagent personalized information system
CN115550214A (en) Task monitoring method and device, storage medium and electronic device
CN117857621A (en) Data pushing method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination