CN115858245A - Data backup job scheduling system and backup job scheduling method

Info

Publication number: CN115858245A
Application number: CN202211674441.0A
Authority: CN
Inventors: 陈浩; 张有成
Original assignee: Nanjing Unary Information Technology Co ltd
Current assignee: Nanjing Unary Information Technology Co ltd
Priority date: 2022-12-26
Filing date: 2022-12-26
Publication date: 2023-03-28

Abstract

The invention discloses a data backup job scheduling system and a backup job scheduling method, belonging to the technical field of data backup. The system comprises: a job engine module for splitting and arranging jobs and job phases; a job manager for managing the job life cycle and job state, controlling the job flow, and receiving the scheduling results of the job engine; an event responder for receiving and processing the job control signals sent by the job manager; and a phase executor for starting and executing job phase processes. The system and method decouple the core backup services of different types of data sources and reuse common services, so that the whole backup process is clearly structured, easy to maintain and extend, and less difficult and costly to develop and maintain; the upgrade cost is reduced; and the computing performance required of the backup client and its resource usage are greatly reduced.

Description

Data backup job scheduling system and backup job scheduling method

Technical Field

The invention belongs to the technical field of data backup, and particularly relates to a data backup job scheduling system and a backup job scheduling method.

Background

In the current big data era, more and more enterprises are undergoing digital transformation; meanwhile, the state has issued laws and regulations on data security, and more enterprises are paying attention to data security; therefore, to protect data against loss, a large number of enterprises have begun to use data backup system software; larger enterprises with stricter requirements place higher demands on the data backup system, such as the types of backup supported, the backup performance parameters, and the degree of impact on the production environment;

the traditional data backup system consists of a backup server, a media server and a plurality of backup clients, wherein the backup server and the media server belong to the backup server side; the backup server is mainly used for backup process control, and the media server is mainly used for backup data storage; the backup client provides the backup data source and contains a job execution module; a scheduled backup policy is created through the backup server, and once the scheduled trigger time is reached, the backup policy is issued to the backup client; after receiving the backup command and the policy, the backup client starts the job execution module to execute the backup task; the backup job first performs a series of preprocessing operations and then transmits the data to the media server for backup;

the whole execution process of the backup job is controlled by the backup client; a complex job can be split into several sub-jobs, but the splitting granularity is still relatively coarse, and the sub-job flow is often executed serially in a single thread; the conventional data backup system has the following disadvantages:

1. as a data backup system needs to support backup services for more and more types of data sources, the program code inevitably becomes bloated and highly coupled, and maintenance and extension become increasingly difficult;

2. when the job execution module of the backup client performs service processing, complex service processing occupies a certain amount of the backup client's computing resources in addition to data transmission and interaction with the server, which affects the customer's business system in the production environment to some degree;

3. because the server and the client are interdependent throughout the backup process, upgrading the server may require upgrading the clients as well; when a backup system has many clients dispersed across regions, upgrading is costly;

therefore, it is necessary to develop a new data backup job scheduling system and a new backup job scheduling method to solve the existing problems.

Disclosure of Invention

The invention aims to provide a data backup job scheduling system and a backup job scheduling method, so as to solve the problems of high resource usage on the backup client and high maintenance and upgrade costs.

In order to achieve the purpose, the invention provides the following technical scheme: a data backup job scheduling system, comprising:

a job engine module, used for splitting and arranging jobs and job phases;

a job manager, used for managing the life cycle and state of the job, controlling the job flow and receiving the scheduling results of the job engine;

an event responder, used for receiving and processing the job control signals sent by the job manager;

and a phase executor, used for starting and executing job phase processes.

Preferably, the job engine module includes:

a sub-job generator for splitting the entire job into a plurality of sub-jobs according to a backup policy;

a job scheduler for determining the execution order of the sub-jobs and arranging the sub-job list obtained by splitting according to priority, to obtain the sub-jobs that can run concurrently;

and a job phase scheduler for splitting one sub-job into a plurality of job phases, obtaining the job phases that can run concurrently and determining the running order among the job phases.

Preferably, the data backup job scheduling system is configured at a node, and when the node is a backup server, the data backup job scheduling system includes: a job engine module, a job manager, an event responder and a phase executor.

Preferably, the data backup job scheduling system is configured at a node, and the node includes a backup client and/or a backup server.

The invention also provides a data backup job scheduling method, which comprises the following steps:

the job manager informs the job engine to split the job;

the job manager receives the splitting result to obtain a sub-job list;

the job manager notifies the job engine to perform job arrangement, receives the job arrangement result, obtains a job execution list and triggers sub-job execution;

the job manager notifies the job engine to perform job phase arrangement, and the job engine returns the arrangement result, yielding the job phases to be executed concurrently;

the job manager issues the job phases to be executed to the nodes; the event responder of each node receives the request and creates a job phase process, and the job phase process executes the service operation;

after a job phase finishes, the job manager notifies the job engine to continue arranging job phases until all job phases have been executed and the sub-job is finished;

and the job manager notifies the job engine to arrange the jobs again until all sub-jobs have been executed and the job is finished.

Preferably, a job phase completes one specific piece of work in the job; a job phase is executed on only one node and is atomic, and a job phase is usually one process. Checkpoints, state and progress are reported during execution of the job phase process.

Preferably, the method for triggering the sub-job includes:

the policy triggers execution, and the job manager starts a job main thread;

the job manager calls the job-splitting interface of the job engine; the job engine splits the whole job into different sub-jobs according to the backup sources and the parameter configuration, and returns an ordered sub-job list;

the job manager stores the sub-job list and adds it to the queue of sub-jobs to be scheduled;

the job manager calls the job engine interface, passing in the queue of sub-jobs to be scheduled, and the job engine returns the list of sub-jobs to be executed concurrently in the current round according to the parameter settings, the current environment and the service requirements;

if the concurrent sub-job list is not empty, the execution of those sub-jobs is triggered, the sub-jobs are removed from the queue to be scheduled, and their state is monitored;

the state of the concurrent sub-job list is monitored until any sub-job finishes;

after any sub-job finishes, a new round of sub-job scheduling is triggered, and the queue of sub-jobs to be scheduled is passed in again;

if the concurrent sub-job list is empty, no sub-job remains to be scheduled, and no sub-job is running, the job ends;

one backup source forms one sub-job, or a plurality of backup sources together form one sub-job.
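The scheduling rounds described above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the patented implementation: the names `pick_concurrent`, `run_job` and `max_concurrent` are hypothetical, the job engine's decision is reduced to a simple concurrency limit, and "waiting until any sub-job finishes" is simulated by finishing the sub-job that started earliest.

```python
from collections import deque


def pick_concurrent(pending, running, max_concurrent):
    """Stand-in for the job engine: choose the sub-jobs for the current round.

    Assumption: the only constraint is a concurrency limit. The source also
    mentions the current environment and service requirements.
    """
    free_slots = max(max_concurrent - len(running), 0)
    return list(pending)[:free_slots]


def run_job(sub_jobs, max_concurrent=2):
    """Run scheduling rounds until nothing is pending and nothing is running."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    pending = deque(sub_jobs)  # queue of sub-jobs to be scheduled
    running = deque()          # sub-jobs currently executing
    log = []
    while pending or running:
        batch = pick_concurrent(pending, running, max_concurrent)
        for sub_job in batch:  # non-empty list: trigger and dequeue
            pending.remove(sub_job)
            running.append(sub_job)
            log.append(("start", sub_job))
        # Monitor until any sub-job ends; simulated here by finishing the
        # sub-job that started earliest, which triggers the next round.
        finished = running.popleft()
        log.append(("end", finished))
    return log
```

Calling `run_job(["a", "b", "c"], max_concurrent=2)` starts `a` and `b` together and only starts `c` once a slot is freed. The loop exits exactly when the queue to be scheduled and the running set are both empty, which mirrors the end condition stated above.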

Preferably, the monitoring of the concurrent sub-job list state includes:

if the concurrent sub-job list is empty, determining whether any sub-job is still being executed;

if the concurrent sub-job list is empty and a sub-job is being executed, continuing to wait, monitoring the running sub-job, and waiting for it to finish;

if the concurrent sub-job list is empty and no sub-job is being executed, determining whether any sub-job to be scheduled has not yet been executed;

and if a sub-job to be scheduled has not yet been executed, continuing to wait, monitoring the running of the sub-jobs, and waiting for them to finish.

Preferably, the sub-job execution step includes:

the sub-job is triggered for execution, and the job manager starts a sub-job thread;

the job manager calls the job engine interface, passing in the relevant job and phase parameters, and obtains the list of job phases to be executed concurrently in the current round;

if the list of concurrently executed job phases is not empty, the job phases are sent to the corresponding execution nodes, and each execution node starts a job phase process to execute the specific service;

if the list of concurrently executed job phases is empty, determining whether the sub-job still has a job phase being executed;

if the current sub-job still has a job phase running, continuing to wait, monitoring the execution of the job phase, and waiting for it to complete;

when any job phase finishes, a new round of job phase scheduling is triggered: the relevant job and phase parameters are passed in again, and the list of job phases to be executed concurrently in the new round is obtained;

and if the list of concurrently executed job phases is empty and no job phase of the current sub-job is running, all job phases of the sub-job are determined to have been executed, and the execution of the sub-job ends.
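One way to picture the phase scheduling rounds is with a dependency graph. The sketch below is an assumption-laden illustration: the source does not say how the job phase scheduler decides which phases may run concurrently, so dependencies between phases are used here as a stand-in. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+). The function name `run_sub_job` and the example dependency map are hypothetical, and phase completion is simulated by finishing the oldest running phase.

```python
from graphlib import TopologicalSorter


def run_sub_job(phase_deps):
    """Return the batches of job phases dispatched in each scheduling round.

    phase_deps maps a phase name to the set of phases that must finish first.
    """
    sorter = TopologicalSorter(phase_deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    running = []      # phases issued to nodes but not yet finished
    rounds = []
    while sorter.is_active():
        # Ask the "scheduler" which phases can start in this round.
        ready = sorted(sorter.get_ready())
        if ready:
            running.extend(ready)
            rounds.append(ready)
        # Empty or not, wait for any running phase to finish (simulated),
        # then loop to trigger a new scheduling round.
        finished = running.pop(0)
        sorter.done(finished)
    return rounds


# Hypothetical arrangement: both preparation phases are independent, the data
# copy needs both, and the client finish phase follows the copy.
EXAMPLE_DEPS = {
    "server_prepare": set(),
    "client_prepare": set(),
    "data_copy": {"server_prepare", "client_prepare"},
    "client_finish": {"data_copy"},
}
```

With `EXAMPLE_DEPS`, the first round dispatches both preparation phases together, and the sub-job ends once no phase is ready and none is running.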

Preferably, when the execution of a job phase is abnormal, the rollback phase corresponding to that job phase is executed first, and then the rollback phases of the previously executed job phases are executed in the reverse of their execution order.

The technical effects and advantages of the invention are as follows. The data backup job scheduling system and backup job scheduling method decouple the core backup services of different types of data sources and allow common services to be reused. The scope of reuse is not limited: whether the service processing belongs to different types of data sources, to different products in different product lines, or to other scenarios, any common service can be reused. For example, both copy data management and real-time backup use volume copy, so the volume copy logic can be extracted as an independent common job phase and used directly in both products without being rewritten in each implementation. Implementers of the upper-layer service do not need to care how the underlying service is implemented and can focus on the upper-layer service; the code logic is clear and redundancy is reduced; the whole backup process is clearly structured and easy to maintain and extend, which reduces the difficulty and cost of development and maintenance.

The parts of the backup client that handle complex business logic and data computation are split into a plurality of job phases and moved to the server for execution. This greatly reduces the computing performance required of the backup client and its resource usage; the backup client retains only basic command operations and simple general services. When the server is upgraded, the client does not need to be upgraded in most scenarios, so the upgrade cost and the client upgrade frequency are reduced. The impact on the customer's production environment is reduced, the computing resources of the server are fully used, and the coupling between the backup client and the server is reduced.

Later maintenance and upgrading are easier, and the development process is simpler and more flexible, which lowers development and maintenance costs. Developers and maintainers do not need to attend to every detail of job execution, which benefits the developers and maintainers of different services and makes it easier to reuse the same business flow. Exception handling is less difficult to develop because exception rollback becomes a mandatory part of the development flow. The scheme also helps improve overall performance, refine jobs, and track the whole process.

Most existing agentless backup schemes depend on specific conditions, such as a virtualization environment or a cloud environment, or on shared storage technology, and therefore lack generality; the present scheme addresses this problem while reducing the resource usage of the client. Through job splitting, it also solves the problems that jobs are not split finely enough, that the business processing is only roughly classified, that a job still contains much complex processing logic and many operations, and that one sub-job still has to do many pieces of work.

Drawings

FIG. 1 is a schematic block diagram of the system of the present invention;

FIG. 2 is a flowchart of a data backup job scheduling method of the present invention;

FIG. 3 is a flow chart of a sub-job scheduling method of the present invention;

FIG. 4 is a flowchart of a method for scheduling job phases of the present invention;

FIG. 5 is a schematic diagram of the phase rollback process of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in FIG. 1, the invention provides a data backup job scheduling system involving a backup server and a backup client;

the backup server comprises a job manager, a job engine, an event responder and a phase executor; the backup client comprises an event responder and a phase executor;

wherein:

the job manager is used for managing the life cycle of the job, managing the job state, controlling the job flow and receiving the scheduling result of the job engine;

the job engine is used for splitting and arranging jobs and job phases and comprises a sub-job generator, a job scheduler and a job phase scheduler; the sub-job generator is used for splitting the whole job into a plurality of sub-jobs according to certain rules and the backup policy;

the job scheduler determines the execution order of the sub-jobs, and is used for arranging the split sub-job list according to priority to obtain the sub-jobs that can run concurrently;

the job phase scheduler is responsible for splitting one sub-job into a plurality of job phases, obtaining the job phases that can run concurrently and determining the running order among the job phases;

a plurality of job engine modules can be provided according to service requirements, each processing a different type of service and responding to the job scheduling control of the job manager;

the event responder is responsible for receiving and processing the job control signal sent by the job manager;

the phase executor is responsible for starting and executing the operation phase process;

the backup policy refers to the configuration information of the data to be protected, and comprises connection information, the protection period, the backup storage medium and the like;

a job refers to one execution of a backup policy; a job consists of a plurality of job phases; different phases of a job can be executed on different nodes, including the backup server; after a job fails, it can be added to the queue again and rescheduled;

a job phase completes one specific piece of work in a job; a job phase is composed of a plurality of job steps; a job phase can only be executed on a single node and is atomic; certain operations, such as network operations, can be retried within a job phase, and a job phase can also be re-executed; a job phase is usually one process;

a job step is one or more operations that complete a particular function item within a job phase;

when a job is triggered for execution, the whole job is split into a plurality of sub-jobs, and each sub-job is split into a plurality of job phases, so that the execution of the whole job becomes a permutation and combination of job phases;
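The hierarchy defined above (job, sub-job, job phase, job step) can be sketched as a small data model. This is only an illustrative sketch: all class names, field names and the example values are assumptions introduced here and do not come from the source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JobStep:
    name: str  # one or more operations completing a function item


@dataclass
class JobPhase:
    name: str
    node: str  # a phase runs on exactly one node
    steps: List[JobStep] = field(default_factory=list)


@dataclass
class SubJob:
    name: str
    phases: List[JobPhase] = field(default_factory=list)


@dataclass
class Job:
    policy_name: str  # the backup policy this job is one execution of
    sub_jobs: List[SubJob] = field(default_factory=list)

    def phases_by_node(self):
        """Group phase names by the node each phase is executed on."""
        grouped = {}
        for sub_job in self.sub_jobs:
            for phase in sub_job.phases:
                grouped.setdefault(phase.node, []).append(phase.name)
        return grouped


def example_job():
    """Build a hypothetical job with one sub-job and two phases."""
    prepare = JobPhase("server_prepare", "backup_server",
                       [JobStep("allocate_storage")])
    copy = JobPhase("data_copy", "backup_client",
                    [JobStep("read_blocks"), JobStep("send_blocks")])
    return Job("nightly_policy", [SubJob("source_1", [prepare, copy])])
```

Here `example_job().phases_by_node()` shows phases of a single job landing on different nodes, while each individual phase stays bound to one node.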

different services have different processing flows; these flows are refined and organized into different job phases, and by arranging the job phases the job is executed in order according to the arrangement;

each phase completes only one specific piece of work in the job, which leaves a great deal of freedom in arrangement, and different effects can be achieved by arranging the phases in different ways;

checkpoint information is generated during the execution of a job phase and records the operations performed in that phase; once an exception occurs, the failed job phase can be retried as required, and rollback is performed in the reverse of the phase execution order; rollback is based on the checkpoint information, so exactly what was done before is what is rolled back;

each normal-flow job phase is required to have a corresponding rollback phase; when an exception occurs during execution and rollback is needed, the corresponding rollback phase is executed; the rollback phase undoes the earlier operations according to the checkpoint information of its corresponding normal phase, so the phase does not need to care how the whole job is rolled back;
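The idea that rollback undoes exactly what the checkpoints recorded can be sketched as follows. This is a minimal illustration under assumptions: the class `CheckpointedPhase`, the function `run_with_rollback` and the operation names are hypothetical, and "performing an operation" is simulated by appending its name to a list.

```python
class CheckpointedPhase:
    """A job phase that records a checkpoint after each completed operation."""

    def __init__(self, operations):
        self.operations = list(operations)
        self.checkpoints = []  # operations actually performed, in order

    def run(self, resources, fail_on=None):
        for op in self.operations:
            if op == fail_on:  # simulate an exception in this operation
                raise RuntimeError(f"operation failed: {op}")
            resources.append(op)         # do the work
            self.checkpoints.append(op)  # record that it was done

    def rollback(self, resources):
        # Undo only what the checkpoints say was done, newest first.
        while self.checkpoints:
            op = self.checkpoints.pop()
            resources.remove(op)


def run_with_rollback(operations, fail_on=None):
    """Run one phase and roll it back from its checkpoints on failure."""
    resources = []
    phase = CheckpointedPhase(operations)
    try:
        phase.run(resources, fail_on)
        return "finished", resources
    except RuntimeError:
        phase.rollback(resources)
        return "rolled_back", resources
```

Because the rollback walks the recorded checkpoints rather than the full operation list, an operation that never ran is never undone.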

some common operations and services are defined as common job phases, realizing reuse among different services;

when a new function is needed, only a new job phase has to be developed and then added directly to the job arrangement to make the function available, so extension is easy;
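A common job phase shared by several services, and extension by adding one new phase, can be sketched with a simple registry. The registry, the decorator `register_phase`, and the phase names other than volume copy are hypothetical names chosen for illustration; the source only states that volume copy can serve as a common phase for copy data management and real-time backup.

```python
PHASE_REGISTRY = {}  # phase name -> callable implementing the phase


def register_phase(name):
    """Decorator that makes a phase available to any job arrangement."""
    def decorator(func):
        PHASE_REGISTRY[name] = func
        return func
    return decorator


@register_phase("volume_copy")
def volume_copy(trace):
    # Common phase: written once, reused by both arrangements below.
    trace.append("volume_copy")


@register_phase("cdm_mount")  # hypothetical product-specific phase
def cdm_mount(trace):
    trace.append("cdm_mount")


@register_phase("realtime_sync")  # hypothetical product-specific phase
def realtime_sync(trace):
    trace.append("realtime_sync")


def run_arrangement(phase_names):
    """Execute the named phases in order and return what ran."""
    trace = []
    for name in phase_names:
        PHASE_REGISTRY[name](trace)
    return trace


# Two services reuse the same common phase in their own arrangements.
COPY_DATA_MANAGEMENT = ["volume_copy", "cdm_mount"]
REALTIME_BACKUP = ["volume_copy", "realtime_sync"]
```

Adding a function then amounts to registering one more phase and listing its name in an arrangement; existing phases are left untouched.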

when the job phases are arranged, job phases that require complex service logic or a large amount of computation and that consume computing resources are issued to the backup server for execution, and the job phases issued to the backup client only perform basic command calls and simple basic logic processing; this reduces the resource usage of the backup client and makes full use of server resources, while also reducing the coupling and dependency between the client and the server, which facilitates subsequent version upgrades and compatibility;

when the job phases are arranged, sub-jobs or job phases are executed concurrently according to the backup policy settings or the current environment conditions, which effectively improves the overall performance and throughput of the job.
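The placement rule described above (compute-heavy phases on the backup server, lightweight command phases on the backup client) can be sketched in a few lines. The `compute_heavy` flag, the class `PhaseSpec`, the node labels and the example phase names are all assumptions for illustration; the source does not specify how a phase is classified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseSpec:
    name: str
    compute_heavy: bool  # hypothetical flag set by the phase developer


def assign_node(phase):
    """Send heavy phases to the server and light ones to the client."""
    return "backup_server" if phase.compute_heavy else "backup_client"


def plan_placement(phases):
    """Map each phase name to the node that should execute it."""
    return {phase.name: assign_node(phase) for phase in phases}
```

For example, a hypothetical `PhaseSpec("deduplicate", True)` would be planned onto the backup server, while `PhaseSpec("run_snapshot_command", False)` would stay on the backup client.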

As shown in FIG. 2, the invention provides a data backup job scheduling method, which comprises the following steps:

when the job trigger time is reached, execution of the job starts;

the job manager notifies the job engine to split the job; in this embodiment, the notification may take the form of sending an instruction;

the job manager notifies the job engine to perform job arrangement;

the job manager receives the job arrangement result, obtains the sub-jobs to be executed concurrently in the current round, and triggers their execution;

the job manager notifies the job engine to arrange the job phases;

the job engine returns the arrangement result, yielding the job phases to be executed concurrently in the current round;

the job manager issues the job phases to be executed to each node; the event responder of each node receives the request and creates a process corresponding to the job phase, and the job phase process executes the specific service operation;

the job phase process reports checkpoints, state and progress during execution;

when a job phase finishes, the job manager notifies the job engine to arrange the job phases again, starting a new round of job phase scheduling, until all job phases are finished and the sub-job is finished;

when a sub-job finishes, the job manager notifies the job engine to arrange the jobs again, starting a new round of sub-job scheduling, until all sub-jobs have been executed and the job is finished.
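The step in which an event responder creates a process for an issued job phase, and that process reports checkpoints, state and progress, can be sketched with the standard `subprocess` module. This is an assumption-heavy illustration: the source does not describe the report format or transport, so JSON lines on standard output are used here as a stand-in, and the names `PHASE_SCRIPT` and `handle_phase_request` as well as the checkpoint labels are hypothetical.

```python
import json
import subprocess
import sys

# Code run by the child process that stands in for a job phase process.
# It prints one JSON report per line: checkpoint, state and progress.
PHASE_SCRIPT = """
import json, sys
phase = sys.argv[1]
for progress, checkpoint in ((50, "half_copied"), (100, "copied")):
    state = "finished" if progress == 100 else "running"
    report = {"phase": phase, "state": state,
              "progress": progress, "checkpoint": checkpoint}
    print(json.dumps(report), flush=True)
"""


def handle_phase_request(phase_name):
    """Event responder sketch: start one process per issued job phase.

    Returns the reports emitted by the phase process, in order.
    """
    completed = subprocess.run(
        [sys.executable, "-c", PHASE_SCRIPT, phase_name],
        capture_output=True, text=True, check=True,
    )
    return [json.loads(line)
            for line in completed.stdout.splitlines() if line.strip()]
```

Running each phase in its own process keeps a failing phase from taking down the responder, which fits the statement that a job phase is usually one process.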

As shown in FIG. 3, the method for triggering sub-jobs includes:

the job manager calls the job-splitting interface of the job engine; the job engine splits the whole job into different sub-jobs according to the backup sources and the various parameter configurations, and returns an ordered sub-job list;

the job manager calls the job engine interface, passing in the queue of sub-jobs to be scheduled, and the job engine returns the list of sub-jobs to be executed concurrently according to the parameter settings, the current environment and the service requirements;

if the concurrent sub-job list of the current round is not empty, the execution of those sub-jobs is triggered, the sub-jobs are removed from the queue to be scheduled, and their state is monitored;

if the concurrent sub-job list of the current round is empty, determining whether any sub-job is being executed;

if the concurrent sub-job list is empty and a sub-job is being executed, continuing to wait, monitoring the running sub-job, and waiting for it to finish;

if the concurrent sub-job list of the current round is empty and no sub-job is being executed, determining whether any sub-job to be scheduled has not yet been executed;

if a sub-job to be scheduled has not yet been executed, continuing to wait, monitoring the running of the sub-jobs, and waiting for them to finish;

after any sub-job finishes, the job manager calls the job engine interface again to trigger a new round of sub-job scheduling;

if the concurrent sub-job list of the current round is empty, no sub-job remains to be scheduled and no sub-job is running, the job is finished;

job splitting may make each backup source its own sub-job, may group a plurality of backup sources into one sub-job, or may divide sub-jobs according to other criteria; the list of sub-jobs to be executed concurrently in the current round may contain only a single sub-job, so concurrency is not required.
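The two splitting options named above, one backup source per sub-job or several backup sources grouped into one sub-job, can be sketched as a single function. The function name `split_job`, the parameter `sources_per_sub_job` and the source names are hypothetical; other splitting criteria mentioned in the text are not modeled.

```python
def split_job(backup_sources, sources_per_sub_job=1):
    """Split a job into an ordered list of sub-jobs.

    Each sub-job is represented as the list of backup sources it covers.
    With the default of 1, every backup source becomes its own sub-job.
    """
    if sources_per_sub_job < 1:
        raise ValueError("sources_per_sub_job must be at least 1")
    return [
        backup_sources[start:start + sources_per_sub_job]
        for start in range(0, len(backup_sources), sources_per_sub_job)
    ]
```

For three hypothetical sources `["db1", "db2", "db3"]`, the default yields three single-source sub-jobs, while `sources_per_sub_job=2` yields `[["db1", "db2"], ["db3"]]`, preserving the original order.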

As shown in FIG. 4, sub-job execution includes:

the sub-job is triggered for execution, and the job manager starts a sub-job thread;

if the list of job phases to be executed concurrently in the current round is not empty, the job phases are sent to the corresponding execution nodes, and each execution node starts a job phase process to execute the specific service;

if the list of job phases to be executed concurrently in the current round is empty, determining whether the sub-job still has a job phase being executed;

if the current sub-job still has a job phase running, continuing to wait, monitoring the execution of the job phase, and waiting for it to complete;

after any job phase finishes, the job manager calls the job engine interface again, triggering a new round of job phase scheduling and determining the job phases to be executed concurrently in the new round;

if the list of job phases to be executed concurrently in the current round is empty and no job phase of the current sub-job is running, all job phases of the sub-job are determined to have been executed, and the execution of the sub-job ends;

the list of job phases to be executed concurrently in the current round may contain only one job phase, so concurrency is not required.

Phase rollback process:

as shown in FIG. 5, a sub-job is arranged by job phase and consists, in order, of a server preparation phase, a client preparation phase, a data copy phase, a client completion phase and a service period completion phase, and each phase must have a corresponding rollback phase;

if the execution of a phase is abnormal and needs to be rolled back, the rollback phase corresponding to that phase is executed first, and then the rollback phases of the previously executed job phases are executed in the reverse of their execution order;

if the data copy phase is executed abnormally and needs to be rolled back, the execution sequence is as follows:

server preparation phase, client preparation phase, data copy rollback phase, client preparation rollback phase and server preparation rollback phase;

depending on the service, a rollback phase may be implemented as empty or as a very simple operation, but each phase must have a corresponding rollback phase; that is, each normal service job phase corresponds to an exception rollback job phase, and the two exist in pairs.
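The rollback order in the example above can be sketched as follows. This is a minimal illustration: the phase identifiers are hypothetical renderings of the five phases named in the text, rollback phases are represented only by a `rollback:` prefix in the trace, and the trace also records the attempt at the failing phase itself.

```python
# Hypothetical identifiers for the five phases of FIG. 5, in execution order.
PHASES = [
    "server_prepare",
    "client_prepare",
    "data_copy",
    "client_finish",
    "service_period_finish",
]


def run_phases(phases, failing_phase=None):
    """Run phases in order; on failure, run paired rollbacks in reverse.

    Returns the trace of everything that was executed.
    """
    trace = []
    executed = []  # phases attempted so far, in execution order
    for phase in phases:
        trace.append(phase)
        executed.append(phase)
        if phase == failing_phase:
            # The failing phase's own rollback runs first, then the
            # rollbacks of earlier phases, newest to oldest.
            for done in reversed(executed):
                trace.append(f"rollback:{done}")
            return trace
    return trace
```

With `failing_phase="data_copy"`, the trace ends with the data copy rollback, then the client preparation rollback, then the server preparation rollback, matching the sequence given above; the two completion phases never run.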

The data backup job scheduling system and backup job scheduling method decouple the core backup services of different types of data sources and allow common services to be reused. The scope of reuse is not limited: whether the service processing belongs to different types of data sources, to different products in different product lines, or to other scenarios, any common service can be reused. For example, both copy data management and real-time backup use volume copy, so the volume copy logic can be extracted as an independent common job phase and used directly in both products without being rewritten in each implementation. Implementers of the upper-layer service do not need to care how the underlying service is implemented and can focus on the upper-layer service; the code logic is clear and redundancy is reduced; the whole backup process is clearly structured and easy to maintain and extend, which reduces the difficulty and cost of development and maintenance.

The parts of the backup client that handle complex business logic and data computation are split into a plurality of job phases and moved to the server for execution. This greatly reduces the computing performance required of the backup client and its resource usage; the backup client retains only basic command operations and simple general services. When the server is upgraded, the client does not need to be upgraded in most scenarios, so the upgrade cost and the client upgrade frequency are reduced. The impact on the customer's production environment is reduced, the computing resources of the server are fully used, and the coupling between the backup client and the server is reduced.

Later maintenance and upgrading are easier, and the development process is simpler and more flexible, which lowers development and maintenance costs. Developers and maintainers do not need to attend to every detail of job execution, which benefits the developers and maintainers of different services and makes it easier to reuse the same business flow. The development of exception handling is made easier because exception rollback becomes a mandatory part of the development flow. The scheme also helps improve overall performance, refine jobs, and track the whole course of job execution.

Most existing agentless backup schemes depend on specific conditions, such as a virtualization environment or a cloud environment, or on shared storage technology, and therefore lack generality; the present scheme addresses this problem while reducing the resource usage of the client. Through job splitting, it also solves the problems that jobs are not split finely enough, that the business processing is only roughly classified, that a job still contains much complex processing logic and many operations, and that one sub-job still has to do many pieces of work.

Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments or portions thereof without departing from the spirit and scope of the invention.

Claims

1. A data backup job scheduling system, characterized by: the method comprises the following steps:

2. The data backup job scheduling system according to claim 1, wherein: the job engine module includes:

3. The data backup job scheduling system according to claim 1, wherein: the data backup job scheduling system is configured at a node, and when the node is a backup server, the data backup job scheduling system comprises: the system comprises a job engine module, a job manager, an event responder and a phase executor.

4. The data backup job scheduling system according to claim 1, wherein: the data backup job scheduling system is configured at a node, and the node comprises a backup client and/or a backup server.

5. A data backup job scheduling method is characterized in that: the method comprises the following steps:

the job manager informs the job engine to split the job;

6. The data backup job scheduling method according to claim 5, wherein: a job phase completes one specific piece of work in the job; a job phase is executed on only one node and is atomic, and a job phase is usually one process; and checkpoints, state and progress are reported during execution of the job phase process.

7. The data backup job scheduling method according to claim 5, wherein: the method for triggering the sub-jobs comprises the following steps:

if the concurrent sub-job list is not empty, triggering the execution of the sub-job, removing the sub-job from the to-be-scheduled list, and monitoring the state of the sub-job;

if the concurrent sub-job list is empty, no sub-job to be scheduled exists, and no sub-job is running, the job is finished;

8. The data backup job scheduling method according to claim 5, wherein: the monitoring of the concurrent sub-job list state comprises:

9. The data backup job scheduling method according to claim 5, wherein: the sub-job execution step includes:

the job manager calls the job engine interface, passes in the relevant job and phase parameters, and obtains the list of job phases to be executed concurrently in the current round;

and if the list of concurrently executed job phases is empty and no job phase of the current sub-job is running, determining that all job phases of the sub-job have been executed and ending the execution of the sub-job.

10. A data backup job scheduling method according to any one of claims 5 to 9, wherein: when the execution of a job phase is abnormal, the rollback phase corresponding to that job phase is executed, and the rollback phases of the previously executed job phases are executed in the reverse of their execution order.