CN116186007A - Archiving system and method for historical data - Google Patents
- Publication number
- CN116186007A (application number CN202211717410.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- archiving
- historical data
- historical
- task set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1415—Saving, restoring, recovering or retrying at system level
- G06F11/1438—Restarting or rejuvenating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/11—File system administration, e.g. details of archiving or snapshots
- G06F16/113—Details of archiving
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/957—Browsing optimisation, e.g. caching or content distillation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the technical field of data processing, and in particular to a system and a method for archiving historical data. The system comprises a server, and the server comprises the following modules. A data configuration module acquires configuration data of a user, the configuration data comprising data source configuration data, archiving configuration data and scheduling configuration data. A task scheduling module matches the historical data to be archived from a database according to the scheduling configuration data and the data source configuration data, establishes a parallel historical data archiving task set and a serial historical data archiving task set, and processes the historical data to be archived in batches. A data archiving module, for the parallel historical data archiving task set and the serial historical data archiving task set, extracts the historical data from the database according to the archiving configuration data, writes the historical data into an archive library, and deletes the historical data from the database. The system and method can clean the data of the database in a timely and efficient manner.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to an archiving system and method for historical data.
Background
With the rapid development of business, the volume of data generated by each line of business has increased sharply, so the space occupied by databases keeps growing. Backing up and restoring databases becomes more difficult, and the rapidly expanding data volume places great strain on the load capacity, system resources and operating efficiency of application systems. Moreover, as data ages, the likelihood that it will be accessed steadily decreases. Most systems in an enterprise do not adopt a database splitting scheme; when a database resource warning is raised, cleanup is generally carried out centrally with manually written scripts, so data is not cleaned in time and cleaning efficiency is low.
Disclosure of Invention
It is an object of the present invention to provide an archiving system for historical data that enables timely and efficient cleaning of the data of a database.
In order to achieve the above object, an archiving system for historical data is provided, comprising a server that includes the following modules:
a data configuration module, configured to acquire configuration data of a user, wherein the configuration data comprises data source configuration data, archiving configuration data and scheduling configuration data;
a task scheduling module, configured to match historical data to be archived from a database according to the scheduling configuration data and the data source configuration data, and to establish a parallel historical data archiving task set and a serial historical data archiving task set; and further configured to process the historical data to be archived in batches according to the parallel historical data archiving task set and the serial historical data archiving task set;
a data archiving module, configured to, for the parallel historical data archiving task set and the serial historical data archiving task set, extract the historical data from the database according to the archiving configuration data, write the historical data into an archive library, and delete the historical data from the database.
Further, the server also comprises the following modules:
a process recording module, configured to record, in real time, the real-time execution flow record of each historical data archiving subtask that the data archiving module executes in the parallel historical data archiving task set and the serial historical data archiving task set;
a flow comparison module, configured to compare and analyze the real-time execution flow record against a preset standard execution flow record and, if the analysis finds a flow error in the real-time execution flow record, to trace back to the flow step preceding the error according to the real-time execution flow record, generate backtracking information, and send the backtracking information to the data archiving module for backtracking.
Further, the server also comprises the following modules:
a data splitting module, configured to acquire data packet size information of the historical data corresponding to each historical data archiving subtask, compare the data packet size information with a data packet splitting threshold, split the historical data according to the data packet splitting threshold if the data packet size is larger than the threshold, and number the resulting sub-data packets and record their association.
Further, the server also comprises the following modules:
a visual management module, configured to show the real-time execution flow record of each historical data archiving subtask on a display screen for visual display.
Another object of the present invention is to provide a method for archiving historical data, the method being applied to the above system and specifically comprising the following steps:
a data configuration step: acquiring configuration data of a user, wherein the configuration data comprises data source configuration data, archiving configuration data and scheduling configuration data;
a task scheduling step: matching historical data to be archived from a database according to the scheduling configuration data and the data source configuration data, and establishing a parallel historical data archiving task set and a serial historical data archiving task set; and further processing the historical data to be archived in batches according to the parallel historical data archiving task set and the serial historical data archiving task set;
a data archiving step: for the parallel historical data archiving task set and the serial historical data archiving task set, extracting the historical data from the database according to the archiving configuration data, writing the historical data into an archive library, and deleting the historical data from the database.
Further, the method also comprises the following steps:
a process recording step: recording, in real time, the real-time execution flow record of each historical data archiving subtask in the parallel historical data archiving task set and the serial historical data archiving task set during the data archiving step;
a flow comparison step: comparing and analyzing the real-time execution flow record against a preset standard execution flow record and, if the analysis finds a flow error in the real-time execution flow record, tracing back to the flow step preceding the error according to the real-time execution flow record, generating backtracking information, and performing the backtracking through the data archiving step according to the backtracking information.
Further, the method also comprises the following steps:
a data splitting step: acquiring data packet size information of the historical data corresponding to each historical data archiving subtask, comparing the data packet size information with a data packet splitting threshold, splitting the historical data according to the data packet splitting threshold if the data packet size is larger than the threshold, and numbering the resulting sub-data packets and recording their association.
Further, the method also comprises the following steps:
a visual management step: showing the real-time execution flow record of each historical data archiving subtask on a display screen for visual display.
Principles and advantages:
1. The scheme adopts a strategy of first matching and analyzing, then archiving, and finally deleting, and also supports restoring archived data to the production database. Normal access to and invocation of the data are therefore not affected, and the problems of scarce database resources and untimely cleaning are greatly alleviated.
2. Multithreaded parallel processing is supported, which greatly improves the efficiency of data archiving.
3. Throughout the data archiving process, if any step fails, a full rollback is performed, ensuring the integrity of the final data.
4. The scheme provides visual management of each historical data archiving subtask, making it easy to consult the real-time execution flow record of each subtask.
Drawings
FIG. 1 is a logical block diagram of an archiving system for historical data in accordance with an embodiment of the present invention;
FIG. 2 is a schematic flow chart of data archiving.
Detailed Description
The following is a further detailed description of the embodiments:
Embodiment
An archiving system for historical data, substantially as shown in FIG. 1 and FIG. 2, comprises a server, a user terminal and a display screen, the user terminal and the display screen both being communicatively connected to the server. The user terminal is a desktop computer, a notebook computer or a tablet computer. The display screen is a large data monitoring screen, or several small screens assembled side by side, and is used for visual monitoring of the archiving of historical data in the database. The server comprises the following modules:
Data configuration module: configured to acquire configuration data transmitted by a user through the user terminal, wherein the configuration data comprises data source configuration data, archiving configuration data and scheduling configuration data. The data source configuration data mainly comprises the database that requires archiving and the planned computing resources of the server. The archiving configuration data is the set of screening conditions that identify the historical data in the database that specifically requires archiving, such as time screening conditions, call frequency screening conditions, data type screening conditions and data size screening conditions. The data source configuration data also comprises the standard execution flow of data archiving, that is, the standard execution flow record. The scheduling configuration data is set according to the planned computing resources of the server and the volume of historical data to be archived; it comprises the rules for the execution order of the historical data archiving subtasks in the serial historical data archiving task set, the rules for the execution order of the historical data archiving subtasks in the parallel historical data archiving task set, the number of historical data archiving subtasks executed in parallel in the parallel historical data archiving task set, and the like.
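As a rough illustration of the three kinds of configuration data described above, the following Python sketch models them as dataclasses. All class names, field names and the row layout (`created_at`, `call_count`, `type`, `size`) are assumptions made for the example; the patent does not specify a concrete schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple


@dataclass
class DataSourceConfig:
    database_url: str                 # database that requires archiving (hypothetical field)
    planned_workers: int              # planned computing resources of the server
    # Standard execution flow record; the step names are assumed for illustration.
    standard_flow: Tuple[str, ...] = ("extract", "write", "delete")


@dataclass
class ArchiveConfig:
    older_than_days: Optional[int] = None   # time screening condition
    max_call_count: Optional[int] = None    # call frequency screening condition
    data_types: Tuple[str, ...] = ()        # data type screening condition (empty = any type)
    min_size_bytes: Optional[int] = None    # data size screening condition

    def matches(self, row: dict, now: datetime) -> bool:
        """Return True if the row satisfies every configured screening condition."""
        if self.older_than_days is not None:
            if now - row["created_at"] < timedelta(days=self.older_than_days):
                return False
        if self.max_call_count is not None and row["call_count"] > self.max_call_count:
            return False
        if self.data_types and row["type"] not in self.data_types:
            return False
        if self.min_size_bytes is not None and row["size"] < self.min_size_bytes:
            return False
        return True


@dataclass
class ScheduleConfig:
    serial_order: Tuple[str, ...]     # execution order of subtasks in the serial task set
    parallel_order: Tuple[str, ...]   # execution order of subtasks in the parallel task set
    max_parallel: int                 # number of subtasks executed in parallel
```

A row is selected for archiving only when `matches` returns True, so unset conditions (left as `None` or empty) simply do not filter.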
Task scheduling module: configured to match the historical data to be archived from the database according to the data source configuration data and the archiving configuration data, and to establish a parallel historical data archiving task set and a serial historical data archiving task set according to the scheduling configuration data; and further configured to process the historical data to be archived in batches according to the parallel historical data archiving task set and the serial historical data archiving task set. The parallel historical data archiving task set consists of one or more historical data archiving subtasks that can be executed in parallel to improve efficiency. The serial historical data archiving task set manages the execution order and, combined with parallel processing, makes maximum use of the computing resources of the server, thereby improving the processing efficiency of data archiving.
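One possible reading of the combination of serial and parallel task sets is sketched below: the serial set is an ordered list of stages, and each stage is a parallel set whose subtasks run concurrently in a thread pool. The function name `run_schedule` and the representation of subtasks as zero-argument callables are assumptions for illustration, not details taken from the patent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def run_schedule(
    serial_stages: Sequence[Sequence[Callable[[], object]]],
    max_parallel: int,
) -> List[List[object]]:
    """Run stages one after another; run the subtasks inside a stage in parallel."""
    results = []
    for stage in serial_stages:  # serial task set: stages execute strictly in order
        with ThreadPoolExecutor(max_workers=max_parallel) as pool:
            # Parallel task set: up to max_parallel subtasks execute at once.
            # pool.map returns results in submission order, not completion order.
            results.append(list(pool.map(lambda task: task(), stage)))
        # Leaving the with-block waits for the whole stage before the next one starts.
    return results
```

Here `max_parallel` would correspond to the number of subtasks executed in parallel that the scheduling configuration data specifies.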
Data archiving module: configured to, for the parallel historical data archiving task set and the serial historical data archiving task set, extract the historical data from the database according to the archiving configuration data, write the historical data into the archive library, and delete the historical data from the database.
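The extract, write and delete sequence can be sketched with Python's standard `sqlite3` module. This is a minimal sketch under stated assumptions: the production database and the archive library are modeled as two schemas on one SQLite connection so that a single transaction covers both, and the table name `orders`, its columns and the helper `make_demo_connection` are hypothetical. A real deployment with separate database servers would need a different mechanism to make the write and the delete atomic.

```python
import sqlite3


def make_demo_connection() -> sqlite3.Connection:
    """Build an in-memory production schema (main) and archive schema (archive)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS archive")
    for schema in ("main", "archive"):
        conn.execute(
            f"CREATE TABLE {schema}.orders "
            "(id INTEGER PRIMARY KEY, payload TEXT, created_at TEXT)"
        )
    conn.executemany(
        "INSERT INTO main.orders VALUES (?, ?, ?)",
        [(1, "a", "2020-01-01"), (2, "b", "2021-06-01"), (3, "c", "2023-01-01")],
    )
    conn.commit()
    return conn


def archive_rows(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move rows created before cutoff (ISO date string) into the archive schema."""
    # Extract: read the historical rows that match the archiving condition.
    rows = conn.execute(
        "SELECT id, payload, created_at FROM main.orders WHERE created_at < ?",
        (cutoff,),
    ).fetchall()
    # Write and delete inside one transaction: the connection context manager
    # commits on success and rolls back if any statement raises.
    with conn:
        conn.executemany("INSERT INTO archive.orders VALUES (?, ?, ?)", rows)
        conn.execute("DELETE FROM main.orders WHERE created_at < ?", (cutoff,))
    return len(rows)
```

If the write into the archive fails, the delete never takes effect, so the production rows stay in place.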
Process recording module: configured to record, in real time, the real-time execution flow record of each historical data archiving subtask that the data archiving module executes in the parallel historical data archiving task set and the serial historical data archiving task set.
Flow comparison module: configured to compare and analyze the real-time execution flow record against a preset standard execution flow record and, if the analysis finds a flow error in the real-time execution flow record, to trace back to the flow step preceding the error according to the real-time execution flow record, generate backtracking information, and send the backtracking information to the data archiving module for backtracking. The whole procedure is managed end to end: if any step fails, a full rollback is performed, ensuring the integrity of the final data.
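The comparison between the real-time record and the standard record can be illustrated as follows. This sketch assumes both records are simple sequences of step names and that a "flow error" means the first position where the real-time record deviates from the standard one; `BacktrackInfo` and `compare_flow` are hypothetical names, and the patent does not define the record format.

```python
from typing import NamedTuple, Optional, Sequence


class BacktrackInfo(NamedTuple):
    error_index: int             # position of the first deviating step
    backtrack_to: Optional[str]  # last step that still matched; None = restart from the beginning


def compare_flow(
    standard: Sequence[str], actual: Sequence[str]
) -> Optional[BacktrackInfo]:
    """Return backtracking information for the first flow error, or None if there is none."""
    for i, step in enumerate(actual):
        # A step is erroneous if it differs from the standard step at the same
        # position, or if the real-time record runs past the end of the standard flow.
        if i >= len(standard) or step != standard[i]:
            return BacktrackInfo(i, standard[i - 1] if i > 0 else None)
    return None  # the real-time record is a prefix of the standard flow so far
```

For example, with a standard flow of extract, write, delete, a real-time record of extract followed directly by delete yields an error at index 1 and a backtrack target of extract.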
Data splitting module: configured to acquire data packet size information of the historical data corresponding to each historical data archiving subtask, compare the data packet size information with a data packet splitting threshold, split the historical data according to the data packet splitting threshold if the data packet size is larger than the threshold, and number the resulting sub-data packets and record their association. On the one hand, this avoids errors during archiving caused by an oversized data packet, and the excessive time that backtracking would then waste, thereby improving archiving efficiency. On the other hand, it avoids storage confusion that would affect later lookup and retrieval.
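The threshold-based splitting with numbering and association can be sketched as below. The packet is modeled as a byte string and each sub-packet as a dictionary carrying the subtask identifier, a sequence number and the total count; these field names and the function `split_packet` are assumptions for illustration only.

```python
from typing import Dict, List


def split_packet(task_id: str, payload: bytes, threshold: int) -> List[Dict[str, object]]:
    """Split payload into sub-packets of at most threshold bytes, numbered in order."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if len(payload) <= threshold:
        # Not larger than the threshold: keep the packet whole.
        chunks = [payload]
    else:
        chunks = [payload[i:i + threshold] for i in range(0, len(payload), threshold)]
    # task_id associates every sub-packet with its subtask; seq and total allow
    # the original packet to be reassembled and checked for completeness.
    return [
        {"task_id": task_id, "seq": n, "total": len(chunks), "data": chunk}
        for n, chunk in enumerate(chunks)
    ]
```

Concatenating the `data` fields in `seq` order restores the original packet, which is what keeps later lookup of the archived data unambiguous.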
Visual management module: configured to show the real-time execution flow record of each historical data archiving subtask on the display screen for visual display. In this embodiment, visual management of tasks is provided through a web-based management interface on the display screen. Alongside the visualization, functions such as start, stop, restart, skip and retry are provided, so that users can perform the corresponding operations according to their own needs.
The archiving method for historical data is applied to the above system and comprises the following steps:
a data configuration step: acquiring configuration data of a user, wherein the configuration data comprises data source configuration data, archiving configuration data and scheduling configuration data;
a task scheduling step: matching historical data to be archived from a database according to the scheduling configuration data and the data source configuration data, and establishing a parallel historical data archiving task set and a serial historical data archiving task set; and further processing the historical data to be archived in batches according to the parallel historical data archiving task set and the serial historical data archiving task set;
a data archiving step: for the parallel historical data archiving task set and the serial historical data archiving task set, extracting the historical data from the database according to the archiving configuration data, writing the historical data into the archive library, and deleting the historical data from the database;
a process recording step: recording, in real time, the real-time execution flow record of each historical data archiving subtask in the parallel historical data archiving task set and the serial historical data archiving task set during the data archiving step;
a flow comparison step: comparing and analyzing the real-time execution flow record against a preset standard execution flow record and, if the analysis finds a flow error in the real-time execution flow record, tracing back to the flow step preceding the error according to the real-time execution flow record, generating backtracking information, and performing the backtracking through the data archiving step according to the backtracking information;
a data splitting step: acquiring data packet size information of the historical data corresponding to each historical data archiving subtask, comparing the data packet size information with a data packet splitting threshold, splitting the historical data according to the data packet splitting threshold if the data packet size is larger than the threshold, and numbering the resulting sub-data packets and recording their association;
a visual management step: showing the real-time execution flow record of each historical data archiving subtask on a display screen for visual display.
The foregoing is merely an embodiment of the present invention. Common knowledge such as well-known specific structures and characteristics of the scheme is not described in detail here. A person of ordinary skill in the art knows all the ordinary technical knowledge in the technical field of the invention before the filing date or the priority date, is able to learn all of the prior art in that field, and has the ability to apply the conventional experimental means available before that date. With the teaching given in this application, a person of ordinary skill in the art can perfect and implement the scheme in combination with their own abilities, and some typical well-known structures or well-known methods should not become an obstacle to implementing this application. It should be noted that a person skilled in the art may make several modifications and improvements without departing from the structure of the present invention; these should also be regarded as falling within the protection scope of the present invention and do not affect the effect of implementing the invention or the utility of the patent. The protection scope claimed by this application shall be determined by the content of the claims, and the specific embodiments and other descriptions in the specification may be used to interpret the content of the claims.
Claims (8)
1. An archiving system for historical data, characterized by comprising a server, the server comprising the following modules:
a data configuration module, configured to acquire configuration data of a user, wherein the configuration data comprises data source configuration data, archiving configuration data and scheduling configuration data;
a task scheduling module, configured to match historical data to be archived from a database according to the scheduling configuration data and the data source configuration data, and to establish a parallel historical data archiving task set and a serial historical data archiving task set; and further configured to process the historical data to be archived in batches according to the parallel historical data archiving task set and the serial historical data archiving task set;
a data archiving module, configured to, for the parallel historical data archiving task set and the serial historical data archiving task set, extract the historical data from the database according to the archiving configuration data, write the historical data into an archive library, and delete the historical data from the database.
2. The archiving system for historical data according to claim 1, wherein the server further comprises the following modules:
a process recording module, configured to record, in real time, the real-time execution flow record of each historical data archiving subtask that the data archiving module executes in the parallel historical data archiving task set and the serial historical data archiving task set;
a flow comparison module, configured to compare and analyze the real-time execution flow record against a preset standard execution flow record and, if the analysis finds a flow error in the real-time execution flow record, to trace back to the flow step preceding the error according to the real-time execution flow record, generate backtracking information, and send the backtracking information to the data archiving module for backtracking.
3. The archiving system for historical data according to claim 2, wherein the server further comprises the following module:
a data splitting module, configured to acquire data packet size information of the historical data corresponding to each historical data archiving subtask, compare the data packet size information with a data packet splitting threshold, split the historical data according to the data packet splitting threshold if the data packet size is larger than the threshold, and number the resulting sub-data packets and record their association.
4. The archiving system for historical data according to claim 3, wherein the server further comprises the following module:
a visual management module, configured to show the real-time execution flow record of each historical data archiving subtask on a display screen for visual display.
5. A method for archiving historical data, comprising the following steps:
a data configuration step: acquiring configuration data of a user, wherein the configuration data comprises data source configuration data, archiving configuration data and scheduling configuration data;
a task scheduling step: matching historical data to be archived from a database according to the scheduling configuration data and the data source configuration data, and establishing a parallel historical data archiving task set and a serial historical data archiving task set; and further processing the historical data to be archived in batches according to the parallel historical data archiving task set and the serial historical data archiving task set;
a data archiving step: for the parallel historical data archiving task set and the serial historical data archiving task set, extracting the historical data from the database according to the archiving configuration data, writing the historical data into an archive library, and deleting the historical data from the database.
6. The method for archiving historical data according to claim 5, further comprising the following steps:
a process recording step: recording, in real time, the real-time execution flow record of each historical data archiving subtask in the parallel historical data archiving task set and the serial historical data archiving task set during the data archiving step;
a flow comparison step: comparing and analyzing the real-time execution flow record against a preset standard execution flow record and, if the analysis finds a flow error in the real-time execution flow record, tracing back to the flow step preceding the error according to the real-time execution flow record, generating backtracking information, and performing the backtracking through the data archiving step according to the backtracking information.
7. The method for archiving historical data according to claim 6, further comprising the following step:
a data splitting step: acquiring data packet size information of the historical data corresponding to each historical data archiving subtask, comparing the data packet size information with a data packet splitting threshold, splitting the historical data according to the data packet splitting threshold if the data packet size is larger than the threshold, and numbering the resulting sub-data packets and recording their association.
8. The method for archiving historical data according to claim 7, further comprising the following step:
a visual management step: showing the real-time execution flow record of each historical data archiving subtask on a display screen for visual display.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211717410.9A CN116186007A (en) | 2022-12-29 | 2022-12-29 | Archiving system and method for historical data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116186007A true CN116186007A (en) | 2023-05-30 |
Family
ID=86447102
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211717410.9A Pending CN116186007A (en) | 2022-12-29 | 2022-12-29 | Archiving system and method for historical data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116186007A (en) |
- 2022-12-29: CN application CN202211717410.9A (publication CN116186007A), active, Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||