CN116932252B

CN116932252B - Asynchronous task compensation method and device based on batch data import pipeline

Info

Publication number: CN116932252B
Application number: CN202311198930.8A
Authority: CN
Inventors: 范荷华; 樊现刚; 陈轶欧
Original assignee: Beijing Cssca Technologies Co ltd
Current assignee: Beijing Cssca Technologies Co ltd
Priority date: 2023-09-18
Filing date: 2023-09-18
Publication date: 2024-01-26
Anticipated expiration: 2043-09-18
Also published as: CN116932252A

Abstract

The application provides an asynchronous task compensation method and device based on a batch data import pipeline, belonging to the field of database management. The method comprises the following steps: creating a message queue of a plurality of tasks according to the database operation sequence; distributing the tasks to different task control monitors according to the order of the tasks in the message queue, importing each task in turn, and recording the import execution condition of each task; and, if a task is imported successfully, importing the next task until all tasks are imported. By introducing the message queue, the method and device allow tasks to be controlled and their execution to be monitored, and allow the integrity of the imported data to be traced along the whole link. When a task fails, its data is not lost and the normal import of other data is not affected, so the whole database transaction does not have to be re-imported from the start, which improves import efficiency.

Description

Asynchronous task compensation method and device based on batch data import pipeline

Technical Field

The present disclosure relates to the field of database management, and in particular, to an asynchronous task compensation method and apparatus based on batch data import pipeline.

Background

To improve the consistency of imported data, program methods generally use database transactions. A database transaction is an indivisible sequence of database operations and the basic unit of concurrency control in a database. Its execution result must preserve consistency: a transaction guarantees that a logical set of operations is either performed in full or not performed at all.

The usual transaction-based flow in the prior art is as follows: the system receives a data import request and processes the imported data step by step in a pipeline; if the import succeeds, the information is stored, and if it fails, the information is not stored. After a failed import, the next attempt has to start again from the first step, because the failed task cannot be resumed from the step at which it failed. In addition, when the volume of imported data is large, the import occupies a large amount of system resources.

Disclosure of Invention

The invention aims to solve the problems in the prior art and provide an asynchronous task compensation method and device based on batch data import pipeline.

The application provides an asynchronous task compensation method based on batch data import pipeline, which comprises the following steps:

creating a message queue of a plurality of tasks according to the database operation sequence;

distributing the tasks to different task control monitors according to the sequence of the tasks in the message queue, sequentially importing each task and recording the importing execution condition of each task;

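judging according to the execution condition: if the import of a task fails, re-executing the import of the task; and if the task is imported successfully, importing the next task until all tasks are imported.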

Optionally, before re-executing the import of the task, the method includes:

analyzing the recorded information of the failed task, and repairing the system design problem or the imported data problem that caused the failure.

Optionally, the step of performing the next task if the importing is successful includes:

and modifying the task state to 1, storing the successful execution data into a database, sending information to inform the message queue that the current task is completed, and continuing the next task.

Optionally, if the task import fails, re-executing the task import includes:

modifying the task state to -1, storing the data of the failed execution in the database, and not sending a message to the message queue.

Optionally, the sequentially performing import of each task and recording import execution condition of each task includes:

judging whether the task ID corresponding to the task control monitor is empty or not;

if the task ID is empty, generating an ID for the task and executing the operation; if the task ID is not empty and the execution state is successful, not executing the task.

The application provides an asynchronous task compensation device based on batch data import pipeline, which comprises:

a message queue for creating a message queue of a plurality of tasks according to the database operation sequence;

the execution module is used for distributing the tasks to different task control monitors according to the sequence of the tasks in the message queue, sequentially importing each task and recording the importing execution condition of each task;

the task control console is used for judging according to the execution conditions: and if the task is successfully imported, the next task is imported until all the tasks are imported.

Optionally, before the task is re-executed, the detection module includes:

Optionally, after the next task is successfully imported, the detection module includes:

Optionally, the detection module after re-executing the task import if the task import fails, includes:

Optionally, the executing module sequentially performs importing of each task and records importing and executing conditions of each task, including:

The application has the advantages and beneficial effects that:

The application provides an asynchronous task compensation method based on a batch data import pipeline, which comprises the following steps: creating a message queue of a plurality of tasks according to the database operation sequence; distributing the tasks to different task control monitors according to the order of the tasks in the message queue, importing each task in turn, and recording the import execution condition of each task; and, if a task is imported successfully, importing the next task until all tasks are imported. By introducing the message queue, the method and device allow tasks to be controlled and their execution to be monitored, and allow the integrity of the imported data to be traced along the whole link. When a task fails, its data is not lost and the normal import of other data is not affected, so the whole database transaction does not have to be re-imported from the start, which improves import efficiency.

Drawings

FIG. 1 is a schematic diagram of batch data import in the prior art.

FIG. 2 is a schematic diagram of asynchronous task compensation based on batch data import pipeline in the present application.

FIG. 3 is a schematic diagram of batch data importation in the present application.

FIG. 4 is a schematic diagram of an asynchronous task compensation device based on a batch data import pipeline in the present application.

Detailed Description

The present application is further described in conjunction with the drawings and detailed embodiments so that those skilled in the art may better understand the present application and practice it.

The following are examples of specific implementation provided for the purpose of illustrating the technical solutions to be protected in this application in detail, but this application may also be implemented in other ways than described herein, and one skilled in the art may implement this application by using different technical means under the guidance of the conception of this application, so this application is not limited by the following specific embodiments.

The patent relates to batch data import systems and provides a mechanism for compensating failed pipeline tasks, so that pipeline tasks complete correctly without affecting the continued import of the other data in the same batch.

Referring to fig. 1, in order to improve consistency of imported data, a system generally uses a transaction technique in a program method, which is a common practice in the industry, as mentioned in the background art. Database transactions are an inseparable sequence of database operations, and are also the basic unit of concurrency control of a database. The transaction guarantees that a logical set of operations is either all performed or none performed, thereby ensuring data consistency.

Specifically, the transaction technique ensures the integrity of the entire data import process. If any error or exception occurs during the import, the transaction rolls all data back to its state before the import, thus avoiding data inconsistencies.

Further, a single thread executes the multiple steps, so when the volume of imported batch data is large, the import occupies a large amount of system resources.
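The all-or-nothing behaviour described above can be illustrated with a minimal sketch. It uses Python's standard `sqlite3` module only as a stand-in for the database; the table name `imported`, its columns, and the helper names are hypothetical and do not come from the application.

```python
import sqlite3


def make_db():
    """Create an in-memory table whose NOT NULL columns can reject bad rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE imported (name TEXT NOT NULL, amount INTEGER NOT NULL)"
    )
    return conn


def import_in_one_transaction(conn, rows):
    """Import every row inside a single transaction: all rows or none."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for name, amount in rows:
                conn.execute(
                    "INSERT INTO imported (name, amount) VALUES (?, ?)",
                    (name, amount),
                )
        return True
    except sqlite3.IntegrityError:
        return False


def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM imported").fetchone()[0]


conn = make_db()
# The third row violates NOT NULL, so the two valid rows are rolled back too.
ok = import_in_one_transaction(conn, [("a", 1), ("b", 2), (None, 3)])
print(ok, row_count(conn))
```

In this sketch one bad row discards the whole batch, which is the restart-from-the-first-step cost that the application sets out to avoid.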

The method and device of the present application first remove the database transaction setting, so that the tasks are no longer bound together to be either all executed or none executed.

FIG. 2 is a schematic diagram of asynchronous task compensation based on batch data import pipeline in the present application. FIG. 3 is a schematic diagram of batch data importation in the present application.

Referring to fig. 2 and 3, an asynchronous task compensation method based on batch data import pipeline includes the steps of:

S101, creating a message queue of a plurality of tasks according to a database operation sequence.

A message queue is a technique for platform-independent data communication, and for integrating distributed systems on top of that communication, using an efficient and reliable messaging mechanism. The sender and the receiver communicate indirectly through a shared queue, which decouples them and lets each be developed and evolved independently. Message queues offer asynchronous communication, loose coupling, and reliability, and they can improve the scalability and performance of a system while simplifying its design. A message queue typically provides a persistence mechanism to keep messages reliable during sending and receiving: if the receiver is temporarily unavailable after a message has been sent, the message is retained in the queue and is not lost. It also provides a mechanism for handling failures and exceptions in the system.

Based on the above, the present application adds a message queue to the prior-art flow. Specifically, during batch data import into the database, the batch data first passes through the message queue, which serves the following purposes:

Decoupling: separating the batch data import operation from the database operation makes the two processes independent of each other and reduces the coupling between them. The import process and the database operation can therefore run in parallel, which improves the efficiency of the system.

Asynchronous processing: the message queue supports asynchronous processing, which means that after the import operation is completed, the result can be notified to the corresponding handler or system through the message queue. The asynchronous processing mode can improve the response speed of the system.

Data reliability: the batch data may be temporarily stored in the queue via the message queue awaiting subsequent processing. Even if faults or abnormal conditions occur in the data importing process, the data cannot be lost, and the reliability of the data is ensured.

Specifically, a message queue is created first. An existing message queue system, such as RabbitMQ or Kafka, can be used, or a simple message queue can be implemented in-house. A connection is then established with the created message queue. The specific manner of connection depends on the message queue system used; the connection is made through the corresponding library or API.

Depending on the requirements of the message queue system used, necessary initialization operations are performed, including setting the serialization of the messages, creating topics or queues, etc.
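As a minimal sketch of step S101, the snippet below builds a queue of task messages in database operation order. It assumes the "simple self-implemented queue" option and uses Python's standard `queue.Queue` with JSON serialization; the message fields (`seq`, `task_id`, `operation`) and the sample operations are hypothetical, and a real deployment on RabbitMQ or Kafka would use that system's client library instead.

```python
import json
import queue


def build_task_queue(operations):
    """Enqueue one serialized task message per database operation, in order."""
    task_queue = queue.Queue()  # FIFO, so the operation sequence is preserved
    for seq, operation in enumerate(operations, start=1):
        # task_id stays empty here; it is generated later by the listener.
        message = json.dumps({"seq": seq, "task_id": None, "operation": operation})
        task_queue.put(message)
    return task_queue


operations = [
    {"table": "customers", "rows": [["c1", "Alice"]]},
    {"table": "orders", "rows": [["o1", "c1"]]},
]
task_queue = build_task_queue(operations)
first = json.loads(task_queue.get())
print(first["seq"], first["operation"]["table"])
```

Serializing each message to JSON mirrors the "setting the serialization of the messages" initialization step; any other serialization format would serve equally well.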

S102, distributing the tasks to different task control monitors according to the sequence of the tasks in the message queue, sequentially importing each task and recording the importing execution condition of each task.

A task control listener is created to receive messages from the message queue and execute the corresponding task according to the message content. For example, a listener program can be written, or the task control listener can be created with a client library provided by the message queue system.

The task control listener is coupled to the message queue such that the listener can receive and process messages in the message queue.

The task control listener may obtain the task data associated with a message from the database. If the task is a data import, the listener may need to obtain details of the data to be imported, such as the data source and data format. The specific query depends on the structure and design of the database. By querying the database, the task control listener also obtains its own execution data, which includes information such as the task status and the task ID.

When the import of each task is sequentially carried out, the task control monitor records the import execution condition of each task.

Specifically, after the execution data of the task control listener is obtained, it is further determined whether the flow task ID corresponding to the task control listener is empty:

if the task ID is empty, an ID needs to be generated for the task;

if it is not empty, the process continues to the next step.

Further, if the flow task ID corresponding to the task control listener is empty, the ID of the task is generated. The specific generation scheme can be designed according to business requirements.

If the flow task ID corresponding to the task control listener is not empty and the state is successful, the task of the task control listener is not executed.
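The task ID check above can be sketched as a small function. This is only an illustration: the record layout, the use of `uuid` for ID generation, and the function name `prepare_task` are assumptions, since the description leaves the generation scheme to business requirements. A record with an ID whose state is not successful is treated here as "continue to the next step", that is, it is executed.

```python
import uuid

STATUS_SUCCESS = 1   # state values as assumed in the description
STATUS_FAILED = -1


def prepare_task(record):
    """Return (record, should_execute) according to the task ID check."""
    if not record.get("task_id"):
        # Empty ID: generate one and execute the operation.
        return dict(record, task_id=uuid.uuid4().hex), True
    if record.get("status") == STATUS_SUCCESS:
        # ID present and already successful: do not execute again.
        return record, False
    # ID present but not yet successful: continue to the next step.
    return record, True


new_record, run_new = prepare_task({"task_id": None, "status": None})
done_record, run_done = prepare_task({"task_id": "t-42", "status": STATUS_SUCCESS})
retry_record, run_retry = prepare_task({"task_id": "t-43", "status": STATUS_FAILED})
print(run_new, run_done, run_retry)
```

Skipping records that already carry a successful state is what keeps a re-sent message from importing the same data twice.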

S103, judging according to the execution condition: if the import of a task fails, re-executing the task; if the task is imported successfully, importing the next task, until all tasks are imported.

If the task is a data import, the data is extracted from the message queue and imported into the database. After the operation completes, the state of the task is set to 1, which indicates that the task executed successfully. The task state values may be designed according to the actual situation; here, state 1 is assumed to indicate successful execution.

And saving the data of successful task execution in a database for subsequent inquiry and audit. The saved data may include information such as task ID, task execution time, task result, etc.

A message is sent to the message queue informing the message queue that the task has been performed. In this way, the message queue can continue processing the next message without waiting for further processing of the task. After the completion message is sent, processing of the next task is continued. This process may repeat the above steps until all tasks are performed.

After a task fails to execute, the state of the task is set to -1, which indicates that the task execution failed. The task state values may be designed according to the actual situation; here, state -1 is assumed to indicate failed execution. The data of the failed execution is saved in the database for subsequent query and audit. The saved data may include information such as the task ID, the task execution time, and the cause of the failure.

After the task execution fails, no failed message needs to be sent to the message queue. Thus, the message queue can be prevented from processing the failed message, and the additional overhead of the system is reduced.
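The success and failure handling described above can be sketched as follows. This is a hedged illustration, not the application's implementation: `sqlite3` stands in for the database, a second `queue.Queue` named `completed` stands in for the completion notification sent to the message queue, and the table `task_log` and all function names are hypothetical.

```python
import queue
import sqlite3

STATUS_SUCCESS = 1   # state values as assumed in the description
STATUS_FAILED = -1


def make_log_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_log ("
        "task_id TEXT PRIMARY KEY, status INTEGER, detail TEXT)"
    )
    return conn


def save_result(conn, task_id, status, detail):
    """Persist the execution record for later query and audit."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO task_log (task_id, status, detail) "
            "VALUES (?, ?, ?)",
            (task_id, status, detail),
        )


def run_and_record(conn, completed, task_id, import_fn):
    """Run one import; notify the queue only when it succeeds."""
    try:
        detail = str(import_fn())
    except Exception as exc:
        # Failure: state -1 is saved, and no message is sent to the queue.
        save_result(conn, task_id, STATUS_FAILED, repr(exc))
        return False
    # Success: state 1 is saved, then the completion message is sent.
    save_result(conn, task_id, STATUS_SUCCESS, detail)
    completed.put(task_id)
    return True


def failing_import():
    raise ValueError("bad row")


conn = make_log_db()
completed = queue.Queue()
run_and_record(conn, completed, "t1", lambda: "3 rows")
run_and_record(conn, completed, "t2", failing_import)
statuses = dict(conn.execute("SELECT task_id, status FROM task_log"))
print(statuses, completed.qsize())
```

Because each task is recorded on its own, the failure of `t2` leaves the stored result of `t1` untouched.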

The application provides a task console that connects to the database and queries the corresponding log table or log file to obtain the execution log of the task control listener.

From the query result, the task console can filter out the failed records. These records typically contain information such as the failed task status and the task ID. The task console may analyze the failed records to find the cause of the failure, for example by examining the specific operations the task performed, the results of those operations, and the error information.

After analyzing the cause of the failure, the task console may generate a new task and resend it to the message queue if it is determined that the task can be re-executed. Therefore, manual re-execution of tasks can be avoided, and the automation degree of the system is improved.

Further, the task console monitors the task execution condition of each task control monitor in real time, including information such as the state of the task, the task ID, the execution time and the like. Thus, a system administrator or developer can timely know the execution state of the task, find out the task which fails to be executed and process the task.

The task console is connected to the database and queries the execution log of the task control listener. Checking the log reveals the execution process and result of each task, so the failure cause can be found and repaired. The task console analyzes each failed task to find the reason for the failure. This analysis determines whether the failure was caused by a system design problem or by a problem in the imported data.

Based on the analyzed failure cause, the task console can take corresponding repair measures. If it is a system design problem, it may be necessary to modify the program code or adjust the system configuration; if it is a problem with the imported data, it may be necessary to repair the data source or adjust the way the data is imported.

After repairing the problem, the task console may regenerate and send the failed task to the message queue. Thus, the message queue can re-execute the task according to the normal flow, and successful completion of the task is ensured.
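The task console's compensation step can be sketched as below. The sketch assumes the execution log lives in a table named `task_log` with a `status` column where -1 marks failure; the predicate `is_retryable` is a hypothetical placeholder for the analysis and repair decision described above, and `sqlite3` and `queue.Queue` again stand in for the real database and message queue.

```python
import queue
import sqlite3
import uuid


def make_log_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_log ("
        "task_id TEXT PRIMARY KEY, status INTEGER, detail TEXT)"
    )
    with conn:
        conn.executemany("INSERT INTO task_log VALUES (?, ?, ?)", rows)
    return conn


def find_failed(conn):
    """Filter the execution log down to the failed records (status -1)."""
    return conn.execute(
        "SELECT task_id, detail FROM task_log WHERE status = -1 ORDER BY task_id"
    ).fetchall()


def resend_failed(conn, task_queue, is_retryable):
    """Generate a new task for each retryable failure and resend it."""
    resent = []
    for task_id, detail in find_failed(conn):
        if is_retryable(detail):
            # A new task is generated; it remembers which task it retries.
            task_queue.put({"task_id": uuid.uuid4().hex, "retry_of": task_id})
            resent.append(task_id)
    return resent


conn = make_log_db([
    ("t1", 1, "ok"),
    ("t2", -1, "timeout"),
    ("t3", -1, "schema mismatch"),
])
task_queue = queue.Queue()
resent = resend_failed(conn, task_queue, lambda detail: detail == "timeout")
print(resent, task_queue.qsize())
```

Only the failed task re-enters the queue; the successful task `t1` is never re-imported, which is the point of compensating at task granularity.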

In order to better monitor the state and execution condition of each task, a task control monitor is introduced. These listeners can track the execution of each task, including information on the status of the task, task ID, execution time, execution results, etc. With this information, the task console can quickly locate a failed task and re-execute the task without having to re-execute the entire import process.

To further increase import efficiency, asynchronous multithreading is used. Each task control listener can then execute tasks independently and concurrently, without waiting for other listeners or for the entire import process to complete. This approach can significantly reduce the import time and improve the throughput of the system.
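A minimal sketch of concurrent listeners follows, assuming Python threads and a shared `queue.Queue`. The doubling of `payload` is a placeholder for the real import work, and the task fields and function names are hypothetical.

```python
import queue
import threading


def listener(task_queue, results, lock):
    """Pull tasks until the queue is empty and record each task's state."""
    while True:
        try:
            task = task_queue.get_nowait()
        except queue.Empty:
            return
        try:
            task["payload"] * 2  # placeholder for the real import work
            status = 1
        except Exception:
            status = -1  # a failure here does not stop the other listeners
        with lock:
            results[task["task_id"]] = status
        task_queue.task_done()


def run_listeners(tasks, n_listeners=3):
    """Start several listeners that drain the same queue concurrently."""
    task_queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)
    results, lock = {}, threading.Lock()
    threads = [
        threading.Thread(target=listener, args=(task_queue, results, lock))
        for _ in range(n_listeners)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


results = run_listeners([
    {"task_id": "t1", "payload": 10},
    {"task_id": "t2", "payload": None},  # fails: None cannot be multiplied
    {"task_id": "t3", "payload": 30},
])
print(sorted(results.items()))
```

Each listener handles its own failures, so one failed task is marked -1 while the remaining tasks still complete.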

To better understand and monitor the data importation process, a data link tracking function is introduced. By this function, the complete process of each piece of data from the beginning of the import to the successful import can be tracked. This can help developers and system administrators locate and solve problems quickly, improving reliability and maintainability of the system.
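One possible shape for the data link tracking function is sketched below. The application does not specify how tracking is stored, so the class `LinkTracker`, the stage names, and the in-memory storage are all assumptions made for illustration.

```python
import time
from collections import defaultdict


class LinkTracker:
    """Record the stages each piece of data passes through during import."""

    def __init__(self):
        self._events = defaultdict(list)

    def record(self, record_id, stage):
        self._events[record_id].append((stage, time.time()))

    def trace(self, record_id):
        """Return the ordered list of stages seen for one piece of data."""
        return [stage for stage, _ in self._events.get(record_id, [])]

    def incomplete(self, final_stage="imported"):
        """List the records whose last stage is not the final one."""
        return sorted(
            record_id
            for record_id, events in self._events.items()
            if not events or events[-1][0] != final_stage
        )


tracker = LinkTracker()
for stage in ("queued", "received", "imported"):
    tracker.record("row-1", stage)
tracker.record("row-2", "queued")
tracker.record("row-2", "failed")
print(tracker.trace("row-1"), tracker.incomplete())
```

Querying `incomplete()` gives a developer or administrator the set of records that never reached a successful import, which is where troubleshooting would start.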

The application also provides an asynchronous task compensation device based on batch data import pipeline, comprising: message queue 301, execution module, task console 303.

The asynchronous task compensation device based on a batch data import pipeline is shown in FIG. 4.

The message queue 301 is used for creating a message queue of a plurality of tasks according to the database operation sequence.

The message queue 301 is a technique for platform-independent data communication, and for integrating distributed systems on top of that communication, using an efficient and reliable messaging mechanism. The sender and the receiver communicate indirectly through a shared queue, which decouples them and lets each be developed and evolved independently. The message queue 301 offers asynchronous communication, loose coupling, and reliability, and it can improve the scalability and performance of the system while simplifying its design. The message queue 301 typically provides a persistence mechanism to keep messages reliable during sending and receiving: if the receiver is temporarily unavailable after a message has been sent, the message is retained in the queue and is not lost. It also provides a mechanism for handling failures and exceptions in the system.

Based on this, the present application adds a message queue 301 to the prior-art flow. Specifically, during batch data import into the database, the batch data first passes through the message queue 301, which serves the following purposes:

Asynchronous processing: the message queue 301 supports asynchronous processing, which means that after the import operation is completed, the result can be notified to the corresponding handler or system through the message queue 301. The asynchronous processing mode can improve the response speed of the system.

Data reliability: the batch data may be temporarily stored in a queue via message queue 301 awaiting subsequent processing. Even if faults or abnormal conditions occur in the data importing process, the data cannot be lost, and the reliability of the data is ensured.

Specifically, a message queue 301 is created first. An existing message queue system, such as RabbitMQ or Kafka, can be used, or a simple message queue 301 can be implemented in-house. A connection is then established with the created message queue 301. The specific manner of connection depends on the message queue 301 used; the connection is made through the corresponding library or API.

Depending on the requirements of the message queue 301 used, necessary initialization operations are performed, including setting the serialization of the messages, creating topics or queues, etc.

The execution module comprises a task control monitor 302, the tasks are distributed to different task control monitors 302 according to the sequence of the tasks in the message queue 301, the import of each task is sequentially carried out, and the import execution condition of each task is recorded.

A task control listener 302 is created to receive messages from the message queue 301 and execute the corresponding task according to the message content. For example, a listener program can be written, or the task control listener 302 can be created with a client library provided by the message queue 301.

Task control listener 302 is coupled to message queue 301 so that the listener can receive and process messages in message queue 301.

The task control listener 302 may obtain the task data associated with the message from a database, and if the task is to perform a data import, the task control listener 302 may need to obtain the detailed information of the import data from the database, such as data sources, data formats, etc. The specific manner of querying depends on the structure and design of the database. By querying the database, the task control listener 302 obtains the execution data for the task control listener 302. The data includes information such as the status of the task, task ID, and the like.

As the importation of each task proceeds in turn, the task control listener 302 records the importation execution of each task.

Specifically, after the execution data of the task control listener 302 is obtained, it is further determined whether the flow task ID corresponding to the task control listener 302 is empty:

if the task is empty, an ID of the task needs to be generated;

if not, the process continues to the next step.

Further, if the flow task ID corresponding to the task control listener 302 is null, the ID of the task is generated. The specific generation mode can be designed according to the service requirement.

If the flow task ID corresponding to the task control listener 302 is not null and the status is successful, the task of the task control listener 302 is not executed.

The task console 303 judges according to the execution condition: if the import of a task fails, the task is re-executed; if the task is imported successfully, the next task is imported, until all tasks are imported.

If the task is to conduct data importation, then operations need to be performed to extract and import data from the message queue 301 into the database. After the execution operation is completed, the state of the task is modified to be 1, which indicates that the task execution is successful. The task state may be designed according to the actual situation, where it is assumed that state 1 indicates that the task execution is successful.

A message is sent to the message queue 301 informing the message queue 301 that the task has been performed. In this way, message queue 301 can continue processing the next message without waiting for further processing of the task. After the completion message is sent, processing of the next task is continued. This process may repeat the above steps until all tasks are performed.

After a task execution failure, there is no need to send the failed message to the message queue 301. This avoids message queue 301 from handling failed messages and reduces overhead on the system.

The application provides a task console 303 that connects to the database and queries the corresponding log table or log file to obtain the execution log of the task control listener 302.

From the query result, the task console 303 can filter out the failed records. These records typically contain information such as the failed task status and the task ID. The task console 303 may analyze the failed records to find the cause of the failure, for example by examining the specific operations the task performed, the results of those operations, and the error information.

After analyzing the cause of the failure, if it is determined that the task can be re-executed, the task console 303 may generate a new task and re-send it to the message queue 301. Therefore, manual re-execution of tasks can be avoided, and the automation degree of the system is improved.

Further, the task console 303 monitors the task execution status of each task control monitor 302 in real time, including information such as the status of the task, the task ID, and the execution time. Thus, a system administrator or developer can timely know the execution state of the task, find out the task which fails to be executed and process the task.

The task console 303 is connected to the database and queries the execution log of the task control listener 302. Checking the log reveals the execution process and result of each task, so the failure cause can be found and repaired. The task console 303 analyzes each failed task to find the reason for the failure. This analysis determines whether the failure was caused by a system design problem or by a problem in the imported data.

Based on the analyzed failure cause, the task console 303 may take corresponding repair measures. If it is a system design problem, it may be necessary to modify the program code or adjust the system configuration; if it is a problem with the imported data, it may be necessary to repair the data source or adjust the way the data is imported.

Re-executing the failed task: after repairing the problem, the task console 303 may regenerate and send the failed task to the message queue 301. In this way, the message queue 301 can re-execute the task in accordance with the normal flow, ensuring successful completion of the task.

To better monitor the status and execution of each task, the present application introduces a task control listener 302. These listeners can track the execution of each task, including information on the status of the task, task ID, execution time, execution results, etc. With this information, the task console 303 can quickly locate a failed task and re-execute the task without having to re-execute the entire import procedure.

To further increase import efficiency, asynchronous multithreading is used. Each task control listener 302 can then execute tasks independently and concurrently, without waiting for other listeners or for the entire import process to complete. This approach can significantly reduce the import time and improve the throughput of the system.

Claims

1. An asynchronous task compensation method based on batch data import pipeline is characterized by comprising the following steps:

removing the database transaction setting, so that the tasks are no longer bound together to be either all executed or none executed;

creating a message queue of a plurality of tasks according to the database operation sequence; decoupling a batch data importing operation from a database operation, and enabling the batch data to execute the operation of the message queue in advance in the process of importing the batch data of the database; batch data can be temporarily stored in the queue through the message queue to wait for subsequent processing;

the tasks are distributed to different task control monitors according to the sequence of the tasks in the message queue; connecting the task control monitor with the message queue to enable the task control monitor to receive and process the message in the message queue, sequentially importing each task and recording the importing execution condition of each task; introducing a data link tracking function when each task import execution condition is recorded, wherein the data link tracking function is used for tracking the complete process from the beginning of import to the successful import of each piece of data;

the message queue adopts an asynchronous multithreading processing mode, so that each task control monitor independently and parallelly executes tasks;

after acquiring the execution data of a task control monitor, judging whether the task ID corresponding to the task control monitor is empty; if it is empty, generating the ID of the task and executing the operation; if it is not empty and the execution state is successful, not executing the task;

judging according to the execution condition: if the import of a task fails, the failure message does not need to be sent to the message queue; the failure cause is analyzed and repaired, and if it is determined that the task can be re-executed, a task console generates a new task and resends it to the message queue, and the import of the task is re-executed; if the import succeeds, the next task is imported, until all tasks are imported; the task console is connected to the database, queries the execution log of the task control monitor, screens out the failure records according to the query result, analyzes the failure records, finds the failure cause, and repairs it.

2. The method for asynchronous task compensation based on a batch data import pipeline according to claim 1, wherein before the importing of the task is re-executed, comprising:

3. The asynchronous task compensation method based on a batch data import pipeline according to claim 1, wherein the importing of the next task if the importing is successful comprises:

4. The method for asynchronous task compensation based on a batch data import pipeline according to claim 1, wherein if the task import fails, re-executing the import of the task comprises:

5. An asynchronous task compensation device based on a batch data import pipeline for performing the asynchronous task compensation method based on a batch data import pipeline according to any one of claims 1 to 4, comprising:

the message queue is used for creating a message queue of a plurality of tasks according to the database operation sequence, and connecting the task control monitor with the message queue so that the task control monitor receives and processes the messages in the message queue;

the execution module is used for distributing the tasks to different task control monitors according to the sequence of the tasks in the message queue, sequentially importing each task and recording the import execution condition of each task; after acquiring the execution data of a task control monitor, judging whether the task ID corresponding to the task control monitor is empty, and if so, generating the ID of the task and executing the operation; if the task ID is not empty and the execution state is successful, not executing the task;

the task console is used for judging according to the execution condition: if the import of a task fails, the failure message does not need to be sent to the message queue; after the failure cause is analyzed and repaired, if it is determined that the task can be re-executed, the task console generates a new task and resends it to the message queue, and the task is re-executed; if the task is imported successfully, the next task is imported, until all tasks are imported; the task console is connected to the database and queries the corresponding log table or log file to obtain the execution log of the task control monitor; according to the query result, the task console can screen out the failure records, which contain the failed task state and the task ID information, and the task console analyzes the failure records to find the cause of the failure.

6. The asynchronous task compensation device based on batch data import pipeline of claim 5, comprising, prior to re-executing the import of the task:

7. The asynchronous task compensation device based on a batch data import pipeline according to claim 5, comprising, after the next task is imported if the importing is successful:

8. The asynchronous task compensation device based on a batch data import pipeline according to claim 5, comprising, after re-executing the import of the task if the import of the task fails: