CN116186007A

CN116186007A - Archiving system and method for historical data

Info

Publication number: CN116186007A
Application number: CN202211717410.9A
Authority: CN
Inventors: 杨亮亮
Original assignee: Chongqing Fumin Bank Co Ltd
Current assignee: Chongqing Fumin Bank Co Ltd
Priority date: 2022-12-29
Filing date: 2022-12-29
Publication date: 2023-05-30

Abstract

The invention relates to the technical field of data processing, and in particular to a system and method for archiving historical data. The system comprises a server that includes the following modules: a data configuration module, used to acquire a user's configuration data, which comprises data source configuration data, archiving configuration data and scheduling configuration data; a task scheduling module, used to match the historical data to be archived from a database according to the scheduling configuration data and the data source configuration data, to establish a parallel historical data archiving task set and a serial historical data archiving task set, and to process the historical data to be archived in batches; and a data archiving module, used to extract, through the parallel and serial historical data archiving task sets, the historical data from the database according to the archiving configuration data, write it into an archive library, and delete it from the database. The system and method can clean database data in a timely and efficient manner.

Description

Archiving system and method for historical data

Technical Field

The invention relates to the technical field of data processing, and in particular to an archiving system and method for historical data.

Background

With the rapid development of business, the volume of data generated by each line of business grows sharply, so the space occupied by the database keeps increasing. Backing up and restoring the database becomes more difficult, and the rapidly expanding data volume places enormous strain on the load capacity, system resources and operating efficiency of application systems. Moreover, the older the data, the less likely it is to be accessed. Most systems in an enterprise do not adopt a database splitting scheme; when a database resource alert is raised, the data is generally cleaned centrally with manually written scripts, so cleaning is not timely and its efficiency is low.

Disclosure of Invention

It is an object of the present invention to provide an archiving system for historical data that enables timely and efficient cleaning of the data of a database.

To achieve the above object, an archiving system for historical data is provided, comprising a server that includes the following modules:

a data configuration module, used to acquire configuration data of a user, the configuration data comprising data source configuration data, archiving configuration data and scheduling configuration data;

a task scheduling module, used to match the historical data to be archived from a database according to the scheduling configuration data and the data source configuration data, and to establish a parallel historical data archiving task set and a serial historical data archiving task set; the task scheduling module is further used to process the historical data to be archived in batches according to the parallel and serial historical data archiving task sets;

and a data archiving module, used to extract, through the parallel historical data archiving task set and the serial historical data archiving task set, the historical data from the database according to the archiving configuration data, write the historical data into an archive library, and delete the historical data from the database.
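The extract, write and delete sequence of the data archiving module can be illustrated with a minimal sketch. It assumes, purely for illustration, that the production database and the archive library are tables in one SQLite database reached through Python's standard `sqlite3` module, and that the archiving configuration data has already been turned into a SQL `WHERE` clause; the function name `archive_batch` and the table names are hypothetical. The copy and the delete share one transaction, so a failure leaves the source rows untouched.

```python
import sqlite3


def archive_batch(conn: sqlite3.Connection, source: str, archive: str,
                  where: str, params: tuple = ()) -> int:
    """Move the rows of `source` matching `where` into `archive`.

    Hypothetical helper: copy and delete run in one transaction, so either
    both take effect or neither does. Returns the number of rows moved.
    """
    # Table names cannot be bound as parameters; they are assumed to come
    # from trusted archiving configuration, never from end-user input.
    with conn:  # commits on success, rolls back if an exception escapes
        copied = conn.execute(
            f"INSERT INTO {archive} SELECT * FROM {source} WHERE {where}",
            params).rowcount
        deleted = conn.execute(
            f"DELETE FROM {source} WHERE {where}", params).rowcount
        if deleted != copied:
            # Defensive check: never delete rows that were not archived.
            raise RuntimeError("copied and deleted row counts differ")
    return copied
```

In a real deployment the archive library would usually be a separate database, in which case one local transaction no longer covers both sides and the archived copy would have to be verified before the source rows are deleted.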

Further, the server also comprises the following modules:

a process recording module, used to record, in real time, the real-time execution flow record of the data archiving module for each historical data archiving subtask in the parallel and serial historical data archiving task sets;

and a flow comparison module, used to compare and analyze the real-time execution flow record against a preset standard execution flow record; if the analysis shows a flow error in the real-time execution flow record, the module backtracks, according to the real-time execution flow record, to the flow step preceding the error, generates backtracking information, and sends the backtracking information to the data archiving module, which performs the backtracking.
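The comparison against the standard execution flow and the choice of backtracking point can be sketched as follows. The sketch assumes that a flow record is an ordered list of `(step_name, succeeded)` pairs and that the standard flow is an ordered list of step names; the step names, the `BacktrackInfo` type and `compare_flow` are hypothetical. The first step that is out of order or failed is treated as the flow error, and the step before it is the backtracking target.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BacktrackInfo:
    error_index: int      # position of the first out-of-order or failed step
    last_good_index: int  # step to backtrack to; -1 means restart from scratch


def compare_flow(actual: list[tuple[str, bool]],
                 standard: list[str]) -> Optional[BacktrackInfo]:
    """Compare a real-time flow record with the standard flow.

    Returns None while the record is a successful prefix of the standard
    flow, otherwise the backtracking information for the first error.
    """
    for i, (step, succeeded) in enumerate(actual):
        out_of_order = i >= len(standard) or step != standard[i]
        if out_of_order or not succeeded:
            return BacktrackInfo(error_index=i, last_good_index=i - 1)
    return None


# Hypothetical standard execution flow for one archiving subtask.
STANDARD = ["match", "extract", "write_archive", "verify", "delete_source"]
```

Returning the last good step rather than the failed one lets the data archiving module undo only the work done after that point.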

Further, the server also comprises the following module:

a data splitting module, used to acquire the data packet size information of the historical data corresponding to each historical data archiving subtask and compare it with a data packet splitting threshold; if the data packet size is larger than the threshold, the historical data is split according to the threshold, and the resulting sub-data packets are numbered and their association is recorded.
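A minimal sketch of threshold-based splitting with numbered, associated sub-packets follows. It assumes that the packet size is the length in bytes of the serialized historical data and that the associated record consists of the parent packet identifier, the sequence number and the total count stored with each sub-packet; `SubPacket`, `split_packet` and `reassemble` are hypothetical names. Keeping the association is what allows the original packet to be rebuilt and checked for missing parts later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubPacket:
    parent_id: str  # identifier of the original, unsplit packet
    index: int      # 1-based sequence number
    total: int      # number of sub-packets produced from the parent
    payload: bytes


def split_packet(parent_id: str, payload: bytes,
                 threshold: int) -> list[SubPacket]:
    """Split `payload` into pieces of at most `threshold` bytes."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if len(payload) <= threshold:
        # Not larger than the threshold: archive as a single packet.
        return [SubPacket(parent_id, 1, 1, payload)]
    chunks = [payload[i:i + threshold]
              for i in range(0, len(payload), threshold)]
    return [SubPacket(parent_id, n, len(chunks), chunk)
            for n, chunk in enumerate(chunks, start=1)]


def reassemble(parts: list[SubPacket]) -> bytes:
    """Rebuild the original payload, rejecting incomplete or mixed sets."""
    if not parts:
        raise ValueError("no sub-packets given")
    ordered = sorted(parts, key=lambda p: p.index)
    first = ordered[0]
    if any(p.parent_id != first.parent_id for p in ordered):
        raise ValueError("sub-packets belong to different parents")
    if [p.index for p in ordered] != list(range(1, first.total + 1)):
        raise ValueError("missing or duplicate sub-packets")
    return b"".join(p.payload for p in ordered)
```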

Further, the server also comprises the following module:

a visual management module, used to display the real-time execution flow record of each historical data archiving subtask on a display screen for visual presentation.

Another object of the present invention is to provide a method for archiving historical data, applied to the above system and specifically comprising the following steps:

a data configuration step: acquiring configuration data of a user, wherein the configuration data comprises data source configuration data, archiving configuration data and scheduling configuration data;

a task scheduling step: matching the historical data to be archived from a database according to the scheduling configuration data and the data source configuration data, and establishing a parallel historical data archiving task set and a serial historical data archiving task set; this step further comprises processing the historical data to be archived in batches according to the parallel and serial historical data archiving task sets;

and a data archiving step: extracting, through the parallel and serial historical data archiving task sets, the historical data from the database according to the archiving configuration data, writing the historical data into the archive library, and deleting the historical data from the database.

Further, the method also comprises the following steps:

a process recording step: recording, in real time, the real-time execution flow record of each historical data archiving subtask in the parallel and serial historical data archiving task sets during the data archiving step;

and a flow comparison step: comparing and analyzing the real-time execution flow record against a preset standard execution flow record; if the analysis shows a flow error, backtracking, according to the real-time execution flow record, to the flow step preceding the error, generating backtracking information, and performing the backtracking in the data archiving step according to the backtracking information.

Further, the method also comprises the following step:

a data splitting step: acquiring the data packet size information of the historical data corresponding to each historical data archiving subtask and comparing it with a data packet splitting threshold; if the data packet size is larger than the threshold, splitting the historical data according to the threshold, numbering the resulting sub-data packets and recording their association.

Further, the method also comprises the following step:

a visual management step: displaying the real-time execution flow record of each historical data archiving subtask on a display screen for visual presentation.

Principle and advantages:

1. The scheme follows a strategy of matching and analysis first, archiving next and deletion last, and it also supports restoring archived data to the production library, so normal access to and calls on the data are not affected, and the problems of insufficient database resources and untimely cleaning are greatly alleviated.

2. Multi-threaded parallel processing is supported, which greatly improves the efficiency of data archiving.

3. If any step of the data archiving process fails, a full rollback is performed, ensuring the integrity of the final data.

4. The scheme provides visual management of each historical data archiving subtask, making it easy to consult the real-time execution flow record of each subtask.
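Advantage 1 states that archived data can be restored to the production library. Assuming, for illustration only, that the production library and the archive library are tables in one SQLite database accessed through Python's standard `sqlite3` module, a restore is the reverse copy. The function `restore_rows` and the table names are hypothetical, and because the source does not say whether the archived copy is removed after a restore, this sketch leaves it in place.

```python
import sqlite3


def restore_rows(conn: sqlite3.Connection, archive: str, production: str,
                 where: str, params: tuple = ()) -> int:
    """Copy archived rows matching `where` back into the production table.

    Hypothetical helper. The archived copy is kept; restoring the same rows
    twice is prevented here only by a primary key on `production`.
    """
    # Table names are assumed to come from trusted configuration.
    with conn:  # the copy is committed atomically or not at all
        return conn.execute(
            f"INSERT INTO {production} SELECT * FROM {archive} WHERE {where}",
            params).rowcount
```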

Drawings

FIG. 1 is a logical block diagram of an archiving system for historical data in accordance with an embodiment of the present invention;

FIG. 2 is a schematic flow chart of data archiving.

Detailed Description

The invention is described in further detail below by way of an embodiment:

Embodiment

An archiving system for historical data, as shown in FIG. 1 and FIG. 2, comprises a server, a user side and a display screen, the user side and the display screen both being communicatively connected with the server. The user side is a desktop computer, a notebook computer or a tablet computer. The display screen is a large data-monitoring screen or several small screens assembled side by side, used for visual monitoring of the archiving of historical data in the database. The server comprises the following modules:

a data configuration module, used to acquire the configuration data sent by the user through the user side, the configuration data comprising data source configuration data, archiving configuration data and scheduling configuration data. The data source configuration data mainly comprises the database to be archived and the computing resources planned for the server. The archiving configuration data consists of the screening conditions that identify which historical data in the database needs to be archived, such as time conditions, call-frequency conditions, data-type conditions and data-size conditions. The data source configuration data further comprises the standard execution flow of data archiving, that is, the standard execution flow record. The scheduling configuration data is set according to the computing resources planned for the server and the volume of historical data to be archived; it includes the rules for the execution order of the historical data archiving subtasks in the serial historical data archiving task set, the rules for the execution order of the subtasks in the parallel historical data archiving task set, the number of subtasks executed in parallel in the parallel historical data archiving task set, and so on.
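The three kinds of configuration data described above could be modeled as plain data classes, as in the sketch below. The field names, the direction of each screening condition (archive rows older than a cutoff, called at most a given number of times, of listed types, of at least a given size) and the record keys `created_at`, `call_count`, `data_type` and `size_bytes` are assumptions made for illustration; the source only names the kinds of conditions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class DataSourceConfig:
    database: str      # database to be archived
    max_workers: int   # computing resources planned for the server
    standard_flow: list[str] = field(default_factory=list)  # standard execution flow


@dataclass
class ArchiveConfig:
    """Screening conditions; a condition left as None is not applied."""
    older_than: Optional[datetime] = None   # time condition
    max_calls: Optional[int] = None         # call-frequency condition
    data_types: Optional[set[str]] = None   # data-type condition
    min_size_bytes: Optional[int] = None    # data-size condition

    def matches(self, record: dict) -> bool:
        """Return True if `record` satisfies every configured condition."""
        if self.older_than is not None and record["created_at"] >= self.older_than:
            return False
        if self.max_calls is not None and record["call_count"] > self.max_calls:
            return False
        if self.data_types is not None and record["data_type"] not in self.data_types:
            return False
        if self.min_size_bytes is not None and record["size_bytes"] < self.min_size_bytes:
            return False
        return True


@dataclass
class SchedulingConfig:
    serial_order: list[str]     # execution order of serial subtasks
    parallel_tasks: list[str]   # subtasks allowed to run concurrently
    parallelism: int = 4        # number of parallel subtasks run at once
```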

a task scheduling module, used to match the historical data to be archived from the database according to the data source configuration data and the archiving configuration data, and to establish a parallel historical data archiving task set and a serial historical data archiving task set according to the scheduling configuration data; the module is further used to process the historical data to be archived in batches according to the two task sets. The parallel historical data archiving task set is built from one or more historical data archiving subtasks that can be executed in parallel to improve efficiency, while the serial historical data archiving task set manages the execution order. Combining the two makes maximal use of the server's computing resources and thereby improves the processing efficiency of data archiving.
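One possible reading of how the two task sets combine is sketched below: the serial task set is treated as an ordered list of stages, and the subtasks inside a stage form a parallel task set run on a thread pool whose size comes from the scheduling configuration. This interpretation, the batching helper, and the names `make_batches` and `run_task_sets` are assumptions; the source does not specify the scheduler's internals.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")


def make_batches(ids: list[int], batch_size: int) -> list[list[int]]:
    """Cut the matched record ids into batches of at most `batch_size`."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]


def run_task_sets(stages: list[list[Callable[[], T]]],
                  max_workers: int) -> list[list[T]]:
    """Run stages one after another; subtasks inside a stage run in parallel.

    A stage starts only after every subtask of the previous stage finished.
    The first subtask exception propagates and no later stage is started.
    """
    results: list[list[T]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for stage in stages:
            # map() keeps input order; list() waits for the whole stage.
            results.append(list(pool.map(lambda task: task(), stage)))
    return results
```

Each batch produced by `make_batches` could become one subtask in a stage, so the number of rows handled per transaction stays bounded.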

a flow comparison module, used to compare and analyze the real-time execution flow record against a preset standard execution flow record; if the analysis shows a flow error in the real-time execution flow record, the module backtracks, according to the real-time execution flow record, to the flow step preceding the error, generates backtracking information, and sends the backtracking information to the data archiving module, which performs the backtracking. The entire archiving process is thus managed end to end: if any step fails, a full rollback is performed, ensuring the integrity of the final data.

a data splitting module, used to acquire the data packet size information of the historical data corresponding to each historical data archiving subtask and compare it with a data packet splitting threshold; if the data packet size is larger than the threshold, the historical data is split according to the threshold, and the resulting sub-data packets are numbered and their association is recorded. On the one hand, this avoids errors caused by oversized data packets during archiving and the time wasted on the resulting backtracking, which improves archiving efficiency; on the other hand, it avoids storage confusion that would hinder later lookup and retrieval.

and a visual management module, used to display the real-time execution flow record of each historical data archiving subtask on the display screen for visual presentation. In this embodiment, visual management of the tasks can be provided through a WEB-based management interface shown on the display screen. Alongside the visualization, the interface provides functions such as start, stop, restart, skip and retry, so that users can operate on the tasks according to their own needs.
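The start, stop, restart, skip and retry functions imply a small state machine for each subtask. A hypothetical transition table is sketched below; the state names, which commands are allowed from which states, and the distinction assumed between retry (only after a failure) and restart (rerun a stopped, failed or finished subtask) are illustrative choices, not taken from the source.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    RUNNING = "running"
    STOPPED = "stopped"
    FAILED = "failed"
    SKIPPED = "skipped"
    DONE = "done"


# Operator command -> (states it may be applied from, resulting state).
# RUNNING -> DONE / FAILED is driven by execution itself, not by a command.
TRANSITIONS = {
    "start": ({State.PENDING}, State.RUNNING),
    "stop": ({State.RUNNING}, State.STOPPED),
    "restart": ({State.STOPPED, State.FAILED, State.DONE}, State.RUNNING),
    "retry": ({State.FAILED}, State.RUNNING),
    "skip": ({State.PENDING, State.STOPPED, State.FAILED}, State.SKIPPED),
}


def apply_command(state: State, command: str) -> State:
    """Return the new state, or raise ValueError for an invalid command."""
    try:
        allowed, target = TRANSITIONS[command]
    except KeyError:
        raise ValueError(f"unknown command: {command}") from None
    if state not in allowed:
        raise ValueError(f"cannot {command} a subtask in state {state.value}")
    return target
```

Rejecting invalid commands centrally keeps the web interface from, for example, skipping a subtask that is still writing to the archive library.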

The archiving method for historical data, applied to the above system, comprises the following steps:

The foregoing is merely an embodiment of the present invention, and common knowledge such as well-known specific structures and characteristics is not described at length here. A person of ordinary skill in the art knows all the ordinary technical knowledge in the technical field of the invention before the filing date or the priority date, has access to all the prior art in that field, and is able to apply the routine experimental means available before that date. Under the teaching given in this application, such a person can refine and implement this solution using their own abilities, and some typical well-known structures or methods should not be an obstacle to implementing this application. It should be noted that those skilled in the art can make several variations and improvements without departing from the structure of the present invention; these should also be regarded as falling within the protection scope of the invention and do not affect the effect of implementing the invention or the practicality of the patent. The protection scope of this application shall be determined by the content of the claims, and the specific embodiments and other descriptions in the specification may be used to interpret the claims.

Claims

1. An archiving system for historical data, characterized by: the server comprises the following modules:

2. An archiving system for historical data according to claim 1, wherein: the server also comprises the following modules:

3. An archiving system for historical data according to claim 2, wherein: the server also comprises the following modules:

4. An archiving system for historical data according to claim 3, wherein: the server also comprises the following modules:

5. A method for archiving historical data, comprising the steps of:

6. The archiving method for historical data of claim 5, wherein: the method also comprises the following steps:

7. A method for archiving historical data according to claim 6, wherein: the method also comprises the following steps:

8. A method for archiving historical data according to claim 7, wherein: the method also comprises the following steps: