CN113032385A

CN113032385A - Easily-extensible configurable data backup system and method

Info

Publication number: CN113032385A
Application number: CN202110597523.9A
Authority: CN
Inventors: 衣得平
Original assignee: Beijing Jiangrongxin Technology Co ltd
Current assignee: Beijing Jiangrongxin Technology Co ltd
Priority date: 2021-05-31
Filing date: 2021-05-31
Publication date: 2021-06-25
Anticipated expiration: 2041-05-31
Also published as: CN113032385B

Abstract

The invention discloses an easily extensible and configurable data backup system and method. The system comprises a management platform service module, a flow processing engine module and a multi-data-source service module. The management platform service module is used for configuring system parameters and parameters related to data backup, providing a migration flow monitoring function, and providing service registration, service discovery and service monitoring. The flow processing engine module is used for executing migration tasks based on the parameters configured by the management platform service module: between migration groups the migration tasks are executed in hierarchical code order, and within a migration group they are executed in batches according to the order set in the system parameters. The multi-data-source service module is used for configuring and initializing a plurality of data sources and managing their transactions. The system migrates data in many small batches, and uses error handling to ensure the integrity of the data migration when abnormal conditions occur during migration.

Description

Easily-extensible configurable data backup system and method

Technical Field

The invention relates to the technical field of data storage and data processing, in particular to an easily extensible and configurable data backup system and method.

Background

As the digitalization of bank online credit business systems deepens and these systems run continuously and stably over time, the large volume of business data they generate accumulates in database tables. This gradual growth causes several problems: intraday business responses slow down and exceed the maximum response time required by the business; end-of-day processing takes so long that it can delay the start of normal business on the following day; queries for data requested by higher authorities (banking regulators, the People's Bank of China and internal bank departments) become slow, which makes reporting inconvenient, and may even fail because oversized data exhausts the temporary space; and the business system may be unable to operate normally because the physical machine hosting the database runs out of storage space, and the network.

In order to solve the above problems, banks currently back up the production database manually and periodically into backup files, and then manually clean the production system, that is, delete expired transaction data and process data to release database storage space. Although this approach solves the above problems, it has the following defects:

(1) whole-database backups contain much redundant data and the backup files occupy a large amount of disk space: the current scheme backs up the whole database, which is coarse-grained, and the backup data contains a large amount of redundant (non-business) data, so the generated backup files are extremely large;

(2) backup files cannot be queried or used directly: database backup files cannot be screened or queried directly; they must first be restored into a database;

(3) each backup takes too long and affects daytime business: the backup is a full backup, and because the database is large, a backup cannot be completed in a short time and daytime business cannot be carried out while it runs, so daily transactions must be suspended during each backup, which affects their normal operation;

(4) database backup files are cumbersome to use, which affects the timeliness of data reporting: higher authorities (banking regulators, the People's Bank of China and internal bank departments) often issue notices requiring data to be reported; the combination of dimensions of the requested data is unpredictable (for example, the principal balance of joint loans as of a certain past date), and the data must be reported strictly by the deadline; to meet such a request, the backup file of a certain time node must be retrieved and restored, and then queried and reported as required, so the whole process is cumbersome and very time-consuming, which greatly reduces reporting efficiency;

(5) manual data cleaning places high demands on personnel and is error-prone: data cleaning is currently done manually, which requires personnel to be familiar with the system and carries a risk of errors.

Disclosure of Invention

In view of the above problems, an object of the present invention is to provide an easily extensible and configurable data backup system. The system uses a backup database in place of the original backup files, and periodically (in small amounts and at high frequency) migrates expired data into the backup database through a migration (cleaning) system, which solves the problems that the original approach takes a long time and that backup data cannot be queried directly. The backup database can be expanded horizontally, and data migration (data cleaning) can be carried out among a plurality of physical machines and a plurality of data sources. The system backs up only business data (non-business data does not need to be backed up), which reduces the physical space occupied by the backup. Data backup is flexibly configurable, and the whole data migration (data cleaning) process is executed by a program without manual intervention, which automates the data backup process and avoids the possibility of manual error.

The second object of the invention is to provide an easily extensible and configurable data backup method, which periodically migrates data in business database tables across data sources, in a configurable manner and according to the actual state of the business data, so as to back the data up, and which deletes expired data or migrates infrequently used data so as to improve the performance of the business database tables and reduce the disk space occupied by the database.

The first technical scheme adopted by the invention is as follows: an easily extensible and configurable data backup system comprises a management platform service module, a flow processing engine module and a multi-data-source service module;

the management platform service module is used for configuring system parameters and parameters related to data backup, providing a migration flow monitoring function, and providing service registration, service discovery and service monitoring; the parameters related to the data backup comprise SQL parameter variables and migration SQL templates; defining the same batch of related data in a group mode in the migration SQL template to form a migration group;

the flow processing engine module is used for executing migration tasks based on parameters configured by the management platform service module, executing the migration tasks based on a hierarchical code sequence among the migration groups, and executing the migration tasks in batches based on a sequence set in system parameters in the migration groups;

the multi-data source service module is used for configuring, initializing and managing transactions of a plurality of data sources.

Preferably, the management platform service module comprises a registration center submodule, a scheduling management submodule, a configuration center submodule and a monitoring center submodule;

the registration center submodule is used for service registration, service discovery and service monitoring functions;

the scheduling management submodule is used for configuring system parameters, including a migration task cron expression, a migration scheduling execution frequency and a migration time-consumption threshold, and for triggering the execution process;

the configuration center submodule is used for configuring the settings related to data backup, including the SQL parameter variables and the migration SQL template, wherein configuring the migration SQL template comprises configuring a processing type, a migration group, a migration service channel, a data source, a migration rule and a data dictionary;

the monitoring center submodule is used for providing a migration flow monitoring function and monitoring various indexes in the migration process.

Preferably, the flow processing engine module comprises a configuration parser, a flow processor, a log tracker and an error handler;

the configuration parser is used for dynamically generating the SQL parameter variable into a current-day SQL variable, generating a current-day executable SQL statement by combining the current-day SQL variable and the migration SQL template, and storing the executable SQL statement into a migration task table;

the flow processor is used for executing a migration task;

the log tracker is used for recording the migrated task log;

the error handler is to compensate for an abnormally terminated migration task.

Preferably, the multi-data-source service module comprises a data access and persistence submodule and a transaction control submodule;

the data access and persistence sub-module is used for initializing and persisting the data source;

and the transaction control submodule is used for performing transaction control on the data source.

Preferably, the plurality of data sources are all MySQL databases, and include a management platform database, a data migration source database and a data migration target database.

The second technical scheme adopted by the invention is as follows: an extensible and configurable data backup method comprises the following steps:

s100: configuring system parameters, SQL parameter variables and a migration SQL template; defining the same batch of related data in a group mode in the migration SQL template to form a migration group;

s200: generating a current day SQL parameter based on the SQL parameter variable, and generating a migration task based on the current day SQL parameter and the migration SQL template;

s300: scheduling and executing the migration task until the migration is finished; the migration tasks are sequentially executed among the migration groups based on the hierarchical codes, and the migration tasks are executed in batches based on the sequence set in the system parameters in the migration groups.

Preferably, the step S200 includes the following sub-steps:

s210: extracting the SQL parameter variable to generate a current day SQL parameter;

s220: extracting the migration SQL template;

s230: configuring and synthesizing SQL sentences executable by the current migration task based on the SQL parameters of the current day and the migration SQL template;

s240: and saving the SQL statement as a migration task.

Preferably, the step S300 includes the following sub-steps:

s310: when the migration scheduling operation starting date set in the system parameters is reached, starting a migration task;

s320: judging whether a migration task left unfinished from the previous run exists; if so, extracting the unfinished migration tasks and the unexecuted migration tasks in time order, processing the unfinished migration tasks first, and then processing the unexecuted migration tasks; if not, directly extracting the unexecuted migration tasks for processing.

Preferably, the step S300 of executing migration tasks based on the hierarchical code sequence among the migration groups specifically includes:

(1) sorting the plurality of migration groups based on the hierarchical code to generate a migration group dimension;

(2) and circularly processing the migration tasks of each migration group based on the migration group dimension, and if all the migration tasks in the current migration group are processed, continuing to process the migration tasks in the next migration group.

Preferably, the executing of the migration tasks in batches based on the sequence set in the system parameters in the migration group in step S300 specifically includes:

(1) extracting migration data in batches based on values set in system parameters;

(2) judging whether the data can be extracted or not, if the data cannot be extracted, finishing the processing of the current migration task, and continuing the next migration task;

if the data can be extracted, inserting the data into the target database table, deleting the extracted data from the original database table, and judging whether the current time has reached the migration scheduling operation stop date or whether the processing is abnormal; if the stop date has been reached or the processing is abnormal, marking the migration task as an unfinished migration task; if the stop date has not been reached and no exception has occurred, completing the migration of the batch, then continuing to extract migration data in batches based on the values set in the system parameters, and looping until all migration tasks in the migration group are completed.

The beneficial effects of the above technical scheme are that:

(1) the easily-extensible configurable data backup system disclosed by the invention adopts the backup database to replace the original backup file, and the expired data is migrated to the backup database regularly (in a high-frequency small amount) through the migration (cleaning) system, so that the problems that the original migration consumes a long time and the backup data cannot be directly inquired are solved.

(2) The easily-extensible configurable data backup system only backs up business data (non-business data does not need to be backed up), reduces the occupation of backup physical space, and enables data backup to be more refined, namely, data migration of the system is not migration of a whole library or a whole table, but data migration customized based on business, namely migration based on business data dimensionality, and only migration of business data meeting configuration conditions.

(3) The data backup system disclosed by the invention adopts multi-data-source configuration and scales out by adding MySQL databases, so that data migration (data cleaning) can be performed among a plurality of physical machines and a plurality of data sources.

(4) The SQL parameter variables and the migration SQL template can flexibly configure the migration mode and the dependency relationship according to the condition of the actual service base table, so that the data migration (data cleaning) process can be configured according to the actual service condition according to the development requirement of a service system, the service change can be flexibly coped with, and the flexible configuration of data backup is realized; and the migration time can be flexibly controlled.

(5) In the invention, the whole process of data migration (data cleaning) is executed by a program without manual intervention, thereby realizing the automation of the data backup process and avoiding the possibility of manual error.

(6) The data backup system disclosed by the invention adopts a micro-service architecture based on the SSM framework (Spring MVC + Spring + MyBatis), and is developed on top of the MySQL database; the system is deployed on a single machine and does not depend on any cluster technology (such as Kubernetes or Hadoop), so the learning cost is low, and the system can be configured and used simply by deploying it directly on a Linux system.

(7) The data backup system disclosed by the invention does not adopt a message middleware mode to migrate data, adopts a paging (batch) processing mode to perform data migration in small batches and multiple batches for processing large data, and adopts an error processing mode to ensure the integrity of data migration for abnormal conditions occurring in the migration process.

Drawings

FIG. 1 is a schematic structural diagram of an easily extensible and configurable data backup system according to an embodiment of the present invention;

FIG. 2 is a flowchart of a method for easily extensible and configurable data backup according to an embodiment of the present invention;

FIG. 3 is a flow diagram of generating migration tasks provided by one embodiment of the invention;

FIG. 4 is a flow diagram of performing a migration task according to one embodiment of the present invention;

FIG. 5 is a flowchart of processing migration tasks between and within migration groups according to an embodiment of the present invention.

Detailed Description

The embodiments of the present invention will be described in further detail with reference to the drawings and examples. The following detailed description of the embodiments and the accompanying drawings are provided to illustrate the principles of the invention and are not intended to limit the scope of the invention, which is defined by the claims, i.e., the invention is not limited to the preferred embodiments described.

In the description of the present invention, it is to be noted that, unless otherwise specified, "a plurality" means two or more; the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance; the specific meaning of the above terms in the present invention can be understood as appropriate to those of ordinary skill in the art.

Example 1

As shown in fig. 1, the embodiment discloses an easily extensible and configurable data backup system, which includes a management platform service module, a flow processing engine module, and a multi-data-source service module.

The management platform service module is used for configuring relevant parameters and system parameters of data backup; providing a migration flow monitoring function, and monitoring various indexes in the migration process; the role of a registration center in the service is assumed, and service registration, service discovery and service monitoring are provided; the management platform service module comprises a registration center submodule, a scheduling management submodule, a configuration center submodule and a monitoring center submodule;

the registry submodule is used for realizing unified service registration, service discovery and service monitoring functions, and a caller can discover and use the service through the registry;

the scheduling management submodule is used for configuring system parameters, including a migration task cron expression, a migration scheduling execution frequency (namely, the amount of data migrated and deleted per batch) and a migration time-consumption threshold (namely, the migration scheduling operation start date and the migration scheduling operation stop date), and for triggering the execution process;

the configuration center submodule is used for configuring the settings related to data backup, including the SQL (structured query language) parameter variables and the migration SQL template, wherein configuring the migration SQL template comprises configuring parameters such as a processing type, a migration group, a migration service channel, a data source, a migration rule and a data dictionary; the processing type is either data migration or data cleaning; a migration group is configured so that data of the same batch that are related to one another are defined as a group when migrated;

The flow processing engine module is used for carrying out the specific migration actions based on the parameters configured by the management platform service module; its functions include rule parsing, converting and synthesizing parameters and SQL statements, executing the data migration (data cleaning) rules, registering process logs, results and errors during execution, and error handling; the flow processing engine module comprises a configuration parser, a flow processor, a log tracker and an error handler;

the configuration parser is used for dynamically generating the current day SQL variables from the configuration defined by the SQL parameter variable table, generating current day executable SQL statements by combining the current day SQL variables and the migration SQL template, and storing the current day executable SQL statements into the migration task table;

the flow processor is used for executing migration tasks, the migration groups execute according to the hierarchical code sequence, and the migration groups execute in batches according to the sequence set in the system parameters;

the log tracker is used for recording the task log of the migration, so that the migration progress and abnormal conditions can be conveniently checked;

the error handler is to compensate for the abnormally terminated migration task.

The multi-data-source service module is used for configuring, initializing and managing the transactions of multiple data sources across database tables and machines, based on Spring and MyBatis; the data sources comprise a management platform database, a data migration source database and a data migration target database, all of which are MySQL databases; the management platform database is used by the data backup system itself; there is at least one data migration source database and at least one data migration target database; the multi-data-source service module comprises a data access and persistence submodule and a transaction control submodule;

the data access and persistence sub-module is used for carrying out data source initialization and persistence management on the management platform database, the data migration source library and the data migration target library;

and the transaction control submodule is used for performing transaction control on the management platform database, the data migration source database and the data migration target database so as to ensure the consistency and the integrity of data.

The system adopts multi-data-source configuration and scales out by adding MySQL databases: a database configuration is added to the configuration file, and after the service is restarted a data source is created for that MySQL database and its transactions are managed; data migration (data cleaning) can thus be performed among a plurality of physical machines and a plurality of data sources.

The data backup system disclosed by the invention adopts a micro-service architecture based on the SSM framework (Spring MVC + Spring + MyBatis), and is developed on top of the MySQL database; the system is deployed on a single machine and does not depend on any cluster technology (such as Kubernetes or Hadoop), so the learning cost is low, and the system can be configured and used simply by deploying it directly on a Linux system.

The easily extensible and configurable data backup system disclosed by the invention can flexibly configure which database tables to back up (clean) according to actual business needs; when processing large volumes of data, batching can be configured so that data is processed in small batches at high frequency, and the processing runs stably with data on the order of ten million records; the time consumed by migration (cleaning) is controllable; when an exception occurs, data integrity is preserved and no data is lost; through the error handling function, processing can resume from the position where the last error occurred, which ensures the integrity of data migration; the backup database tables can also be expanded horizontally; in addition, a backup data recovery function is supported to meet data query needs in special business scenarios.

Example 2

As shown in fig. 2, the present embodiment discloses an easily extensible and configurable data backup method, which includes the following steps:

s100: configuring system parameters, SQL parameter variables and a migration SQL template; when the migration SQL template is configured, data of the same batch that are related to one another are defined as a group, forming a migration group, and the related tables of the data in the migration group are migrated one by one in order;

wherein the configuration of the system parameters is shown in Table 1; the configuration of the SQL parameter variables is shown in Table 2, and once configured, the parameters are matched by parameter name against the variables in the SQL template and substituted for them to obtain the actual operating parameters; the configuration of the migration SQL template is shown in Table 3, and the migration SQL statement template is written according to the actual business situation, that is, the SQL parameter variables and the migration SQL template allow the migration mode and the dependency relationships to be configured flexibly according to the actual business tables.

Table 1 configuring system parameters

The parameters corn_rule and start_times in the system parameters of Table 1 control when and how often the migration schedule is executed; the parameter end_time controls when migration ends: migration proceeds in batches, and at the end of each batch it is judged whether the current time exceeds end_time; if so, the current migration processing ends, so the migration time can be flexibly controlled.

TABLE 2 parameter configuration description for configuring SQL parameter variables

Table 3 configuration migration SQL template correlation definition

S200: generating a current day SQL parameter based on the SQL parameter variable, and generating a migration task based on the current day SQL parameter and a migration SQL template;

the specific time at which migration tasks are generated each day is configured according to the actual situation; for example, at 00:00 every day the migration tasks are generated based on the current day SQL parameters and the migration SQL template;

as shown in fig. 3, generating a migration task includes the following sub-steps:

s210: extracting SQL parameter variables to generate current day SQL parameters;

s220: extracting a migration SQL template;

s230: configuring and synthesizing SQL sentences executable by the current migration task based on the current day SQL parameters and the migration SQL template;

s240: and storing the SQL statement as a migration task, and finishing the generation of the migration task.

For example, suppose the loans to be migrated are those whose loan date is one year before the current date (e.g., 2021.5.14) and which are currently cleared. The variables ${currDate} and ${offset_days} are configured in the SQL parameter variables, where currDate is the current date and offset_days is the date offset in days, with a value of -365;

the SQL statement template in the migration SQL template is configured as follows:

select * from contract_info where encash_date >= ${currDate} and contract_status = 'CLEAR';

the SQL statement executable by the current migration task, generated from the current day SQL parameters and the migration SQL template, is then:

select * from contract_info where encash_date >= '2021-05-14' and contract_status = 'CLEAR';
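The substitution step can be illustrated with a minimal Python sketch. It is not the patent's implementation (which is described as Java-based); it only assumes that placeholders are matched by name, and it uses Python's standard `string.Template`, whose `${name}` syntax happens to match the template above. The helper names `build_daily_params` and `render_sql`, and the derived variable `offsetDate`, are hypothetical.

```python
from datetime import date, timedelta
from string import Template


def build_daily_params(today: date, offset_days: int) -> dict:
    """Turn the configured parameter variables into concrete values for one day."""
    return {
        # Quoted ISO date, matching the literal in the generated statement above.
        "currDate": f"'{today.isoformat()}'",
        # Hypothetical derived variable: the current date shifted by offset_days.
        "offsetDate": f"'{(today + timedelta(days=offset_days)).isoformat()}'",
    }


def render_sql(template: str, params: dict) -> str:
    """Replace each ${name} placeholder with the value of the same name."""
    return Template(template).substitute(params)


TEMPLATE = (
    "select * from contract_info "
    "where encash_date >= ${currDate} and contract_status = 'CLEAR'"
)

params = build_daily_params(date(2021, 5, 14), offset_days=-365)
statement = render_sql(TEMPLATE, params)
print(statement)
```

Plain text substitution is only reasonable here because the values come from trusted configuration; values taken from user input would need bound parameters instead.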

for the business data to be migrated, the SQL statement template and the dynamic parameters are configured according to the actual state of the business data; at execution time the migration engine synthesizes the SQL statement template and the dynamic parameters into the migration SQL statements to be executed that day, and carries out the migration group by group in the specified order; for primary-key/foreign-key relationships in the business data, migration along the business data dimension is achieved by using the foreign key as a variable and combining it with an SQL statement template to synthesize the migration statements for the child tables; therefore, as the business system evolves, the data migration (data cleaning) process of the invention can be configured according to the actual business situation, can cope flexibly with business changes, and realizes flexible configuration of data backup; and the migration time can be flexibly controlled.

The invention only backs up the business data (the non-business data does not need to be backed up), reduces the occupied backup physical space and enables the data backup to be more refined, namely, the data migration of the invention does not carry out the migration of the whole library or the whole table, but carries out the data migration based on the business customization, is the migration based on the business data dimensionality and only migrates the business data which accords with the configuration condition.

S300: calling the flow processor in the flow processing engine module to process the migration tasks until the migration is finished; as shown in fig. 4, this specifically comprises the following sub-steps:

s320: judging, according to the execution state, whether the migration group contains a migration task left unfinished from the previous run; if so, extracting the unfinished migration tasks and the unexecuted migration tasks in time order, for example, extracting the unfinished migration tasks first and then the unexecuted migration tasks, and merging the two sets so that the unfinished migration tasks take priority and are processed first, followed by the unexecuted migration tasks; if not, directly extracting the unexecuted migration tasks for processing; wherein the execution state is either unfinished or unexecuted.
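The priority rule in s320 can be sketched as a sort over pending tasks. This is an illustrative Python sketch only; the `MigrationTask` fields and the state labels are assumptions, since the text does not give the layout of the migration task table.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical state labels; the text only names the two states.
UNFINISHED = "UNFINISHED"   # started in an earlier run but stopped before completion
UNEXECUTED = "UNEXECUTED"   # generated but never started


@dataclass
class MigrationTask:
    task_id: int
    state: str
    created_at: datetime


def order_tasks(tasks):
    """Unfinished tasks first, then unexecuted ones; oldest first within each state."""
    pending = [t for t in tasks if t.state in (UNFINISHED, UNEXECUTED)]
    # False sorts before True, so unfinished tasks come first.
    return sorted(pending, key=lambda t: (t.state != UNFINISHED, t.created_at))


tasks = [
    MigrationTask(1, UNEXECUTED, datetime(2021, 5, 14)),
    MigrationTask(2, UNFINISHED, datetime(2021, 5, 13)),
    MigrationTask(3, UNFINISHED, datetime(2021, 5, 12)),
    MigrationTask(4, "DONE", datetime(2021, 5, 11)),
]
ordered_ids = [t.task_id for t in order_tasks(tasks)]
print(ordered_ids)  # [3, 2, 1]
```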

As shown in fig. 5, for both unfinished and unexecuted migration tasks, the migration tasks are executed between migration groups according to the migration group dimension (hierarchical code order), and within a migration group in batches according to the order set in the system parameters; that is, the execution order of the migration groups is sorted by hierarchical code, and the data of the database tables within a migration group is migrated in batches according to batch_limit in the system parameters.
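The inter-group ordering can be sketched as follows. This Python sketch assumes that every task carries the hierarchical code of its migration group as a zero-padded string, so that plain sorting gives the intended order; the field names and the table names `contract_detail` and `repay_plan` are hypothetical.

```python
from collections import defaultdict


def order_groups(tasks):
    """Return one list of tasks per migration group, sorted by hierarchical code."""
    groups = defaultdict(list)
    for task in tasks:
        groups[(task["level_code"], task["group"])].append(task)
    return [groups[key] for key in sorted(groups)]


def run_all(tasks, run_task):
    """Finish every task of one group before moving on to the next group."""
    for group_tasks in order_groups(tasks):
        for task in group_tasks:
            run_task(task)


tasks = [
    {"group": "repayment", "level_code": "02", "table": "repay_plan"},
    {"group": "contract", "level_code": "01", "table": "contract_info"},
    {"group": "contract", "level_code": "01", "table": "contract_detail"},
]
executed = []
run_all(tasks, lambda task: executed.append(task["table"]))
print(executed)  # ['contract_info', 'contract_detail', 'repay_plan']
```

Within a group the tasks keep the order in which they were configured, which matches the statement that related tables are migrated one by one in sequence.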

(1) Executing migration tasks among the migration groups based on the migration group dimensions;

sorting the plurality of migration groups based on the hierarchical code to generate a migration group dimension;

and circularly processing the migration tasks of each migration group based on the dimensionality of the migration groups, and if all the migration tasks in the current migration group are processed, continuing to process the migration tasks in the next migration group.

(2) Executing the migration tasks on the data of the database tables in the migration group in batches, based on the order set in the system parameters;

1) extracting migration data in batches according to the value of batch_limit (the amount of data migrated and deleted per batch) set in the system parameters;

2) judging whether data can be extracted:

if no data can be extracted, the current migration task has been fully processed, and processing continues with the next migration task;

if data can be extracted, inserting the data into the target database table and deleting the extracted data from the original database table, then judging whether the current time has reached the migration scheduling operation stop date or whether an exception occurred during processing: if an exception occurred, terminating the data migration, rolling back the data, and marking the task as an unfinished migration task whose reason is abnormal termination of the program; if the migration scheduling operation stop date has been reached, marking the task as an unfinished migration task whose reason is termination by the end time; if the stop date has not been reached and no exception occurred, returning to step 1) to extract the next batch, and repeating the loop until all the migration tasks in the migration group are completed.
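Steps 1) and 2) above can be sketched as a single loop. This is an illustration under stated assumptions, not the patented implementation: it uses Python's built-in `sqlite3` with one connection instead of MySQL with separate source and target data sources, and the function name, the `id` and `payload` columns, and the returned status strings are hypothetical. Table names are interpolated directly and are assumed to come from trusted configuration.

```python
import sqlite3
from datetime import datetime


def migrate_in_batches(conn, source, target, batch_limit, stop_at, now=datetime.now):
    """Move rows from `source` to `target`, batch_limit rows at a time.

    Returns (status, reason), mirroring the task marking described above.
    """
    while True:
        # 1) extract at most batch_limit rows
        rows = conn.execute(
            f"SELECT id, payload FROM {source} ORDER BY id LIMIT ?", (batch_limit,)
        ).fetchall()
        # 2) nothing left to extract: the current task is complete
        if not rows:
            return ("finished", None)
        try:
            # The connection context manager commits on success and
            # rolls back the whole batch if any statement raises.
            with conn:
                conn.executemany(
                    f"INSERT INTO {target} (id, payload) VALUES (?, ?)", rows
                )
                conn.executemany(
                    f"DELETE FROM {source} WHERE id = ?", [(r[0],) for r in rows]
                )
        except sqlite3.Error:
            return ("unfinished", "abnormal termination of the program")
        if now() >= stop_at:
            return ("unfinished", "terminated by the end time")
```

Because each batch is its own transaction, a failure or a stop-time cutoff leaves every earlier batch fully migrated, and the next run simply resumes with the rows still present in the source table.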

The invention does not migrate data through message middleware. Instead, it handles large data volumes by paging (batch processing), migrating the data in many small batches, and it uses an error processing mode to ensure the integrity of the data migration when an exception occurs during migration (an unfinished migration task whose reason is abnormal termination of the program). Specifically, the redundant data left behind by the abnormal termination of the previous migration task is deleted, and the migration then continues from the position of the failed data, thereby ensuring that the data is complete and that none is lost.
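One possible reading of this compensation step is sketched below. It assumes that the source and target are separate databases, so a batch may have been committed to the target while its deletion from the source was lost, and that batches are extracted in ascending `id` order, so only the first `batch_limit` rows still in the source can be duplicated in the target. The function name, the `id` column, and the use of `sqlite3` in place of MySQL are all assumptions made for illustration.

```python
import sqlite3


def compensate(source_conn, target_conn, table, batch_limit):
    """Remove rows of the failed batch from the target and report where to resume.

    Returns (deleted_count, resume_id); resume_id is None when the source is empty.
    """
    # The failed batch is assumed to be the next batch that would be extracted.
    ids = [
        row[0]
        for row in source_conn.execute(
            f"SELECT id FROM {table} ORDER BY id LIMIT ?", (batch_limit,)
        )
    ]
    with target_conn:  # commit the cleanup, or roll it back on error
        cursor = target_conn.executemany(
            f"DELETE FROM {table} WHERE id = ?", [(i,) for i in ids]
        )
        deleted = cursor.rowcount
    resume_id = ids[0] if ids else None
    return deleted, resume_id
```

After the cleanup, the normal batch loop can be restarted; it will pick up at `resume_id` because those rows are still the first ones in the source table.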

While the invention has been described with reference to a preferred embodiment, various modifications may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In particular, the technical features mentioned in the embodiments can be combined in any way as long as there is no structural conflict. It is intended that the invention not be limited to the particular embodiments disclosed, but that the invention will include all embodiments falling within the scope of the appended claims. Matters not described in detail in the invention are part of the common general knowledge of a person skilled in the art.

Claims

1. An easily extensible and configurable data backup system is characterized by comprising a management platform service module, a process processing engine module and a multi-data source service module;

2. The data backup system of claim 1, wherein the management platform service module comprises a registry submodule, a scheduling management submodule, a configuration center submodule and a monitoring center submodule;

3. The data backup system of claim 1, wherein the flow processing engine module comprises a configuration parser, a flow processor, a log tracker, and an error processor;

the flow processor is used for executing a migration task;

the log tracker is used for recording the migrated task log;

the error processor is used for compensating for an abnormally terminated migration task.

4. The data backup system of claim 1, wherein the multiple data source service module comprises a data access and persistence sub-module and a transaction control sub-module;

5. The data backup system of claim 1, wherein the plurality of data sources are all MySQL databases and comprise a management platform database, a data migration source library, and a data migration target library.

6. An extensible and configurable data backup method is characterized by comprising the following steps:

7. The data backup method according to claim 6, wherein the step S200 comprises the following sub-steps:

S220: extracting the migration SQL template;

S240: saving the SQL statement as a migration task.

8. The data backup method according to claim 6, wherein the step S300 comprises the following sub-steps:

9. The data backup method according to claim 8, wherein the step S300 of executing migration tasks based on the hierarchical code sequence among the migration groups specifically comprises:

10. The data backup method according to claim 8, wherein the executing of the migration tasks in the migration group in batches based on the sequence set in the system parameters in step S300 specifically includes: