CN116186007A - Archiving system and method for historical data - Google Patents

Archiving system and method for historical data

Info

Publication number
CN116186007A
Authority
CN
China
Prior art keywords
data
archiving
historical data
historical
task set
Legal status
Pending
Application number
CN202211717410.9A
Other languages
Chinese (zh)
Inventor
杨亮亮
Current Assignee
Chongqing Fumin Bank Co Ltd
Original Assignee
Chongqing Fumin Bank Co Ltd
Priority date
2022-12-29
Filing date
2022-12-29
Publication date
2023-05-30
Application filed by Chongqing Fumin Bank Co Ltd
Priority to CN202211717410.9A
Publication of CN116186007A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1438 Restarting or rejuvenating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/113 Details of archiving
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data processing, and in particular to a system and method for archiving historical data. The system comprises a server that includes the following modules: a data configuration module, used to acquire a user's configuration data, the configuration data comprising data source configuration data, archiving configuration data and scheduling configuration data; a task scheduling module, used to match the historical data to be archived in a database according to the scheduling configuration data and the data source configuration data, to establish a parallel historical data archiving task set and a serial historical data archiving task set, and to process the historical data to be archived in batches; and a data archiving module, used to extract, through the parallel and serial historical data archiving task sets, the historical data from the database according to the archiving configuration data, write it into the archive library, and delete it from the database. The system and method enable the data in a database to be cleaned in a timely and efficient manner.

Description

Archiving system and method for historical data
Technical Field
The invention relates to the technical field of data processing, and in particular to an archiving system and method for historical data.
Background
With the rapid development of business, the volume of data generated by each line of business has grown sharply, so the space occupied by the database keeps increasing. Backing up and restoring the database becomes more difficult, and the rapidly expanding data volume places heavy strain on the load capacity, system resources and operating efficiency of the application system. Moreover, the older the data, the less likely it is to be accessed. Most systems in an enterprise do not adopt a database-splitting scheme; when a database resource alert is raised, the data is generally cleaned in a centralized way using manual scripts, so cleaning is neither timely nor efficient.
Disclosure of Invention
It is an object of the present invention to provide an archiving system for historical data that enables timely and efficient cleaning of the data of a database.
In order to achieve the above object, there is provided an archiving system for historical data, comprising a server including the following modules:
a data configuration module: used to acquire a user's configuration data, the configuration data comprising data source configuration data, archiving configuration data and scheduling configuration data;
a task scheduling module: used to match the historical data to be archived in a database according to the scheduling configuration data and the data source configuration data, and to establish a parallel historical data archiving task set and a serial historical data archiving task set; the module also processes the historical data to be archived in batches according to the parallel and serial historical data archiving task sets; and
a data archiving module: used to extract, through the parallel and serial historical data archiving task sets, the historical data from the database according to the archiving configuration data, write it into the archive library, and delete it from the database.
Further, the server also comprises the following modules:
a process recording module: used to record, in real time, the execution flow (the real-time execution flow record) of each historical data archiving subtask in the parallel and serial historical data archiving task sets as executed by the data archiving module;
a flow comparison module: used to compare and analyze the real-time execution flow record against a preset standard execution flow record; if the analysis finds a flow error, the module traces back, according to the real-time execution flow record, to the flow step preceding the error, generates backtracking information, and sends it to the data archiving module, which performs the backtracking.
Further, the server also comprises the following module:
a data splitting module: used to acquire the packet size of the historical data corresponding to each historical data archiving subtask and compare it with a packet splitting threshold; if the packet size exceeds the threshold, the historical data is split according to the threshold, and the resulting sub-packets are numbered and recorded with their associations.
Further, the server also comprises the following module:
a visual management module: used to display the real-time execution flow record of each historical data archiving subtask on a display screen.
Another object of the present invention is to provide a method for archiving historical data, the method being applied to the above system, and specifically comprising the following steps:
a data configuration step: acquiring a user's configuration data, the configuration data comprising data source configuration data, archiving configuration data and scheduling configuration data;
a task scheduling step: matching the historical data to be archived in a database according to the scheduling configuration data and the data source configuration data, and establishing a parallel historical data archiving task set and a serial historical data archiving task set; this step also processes the historical data to be archived in batches according to the parallel and serial historical data archiving task sets;
a data archiving step: through the parallel and serial historical data archiving task sets, extracting the historical data from the database according to the archiving configuration data, writing it into the archive library, and deleting it from the database.
Further, the method also comprises the following steps:
a process recording step: recording, in real time, the execution flow (the real-time execution flow record) of each historical data archiving subtask in the parallel and serial historical data archiving task sets during the data archiving step;
a flow comparison step: comparing and analyzing the real-time execution flow record against a preset standard execution flow record; if the analysis finds a flow error, tracing back, according to the real-time execution flow record, to the flow step preceding the error, generating backtracking information, and performing the backtracking through the data archiving step.
Further, the method also comprises the following step:
a data splitting step: acquiring the packet size of the historical data corresponding to each historical data archiving subtask and comparing it with a packet splitting threshold; if the packet size exceeds the threshold, splitting the historical data according to the threshold, and numbering the resulting sub-packets and recording their associations.
Further, the method also comprises the following step:
a visual management step: displaying the real-time execution flow record of each historical data archiving subtask on a display screen.
Principles and advantages:
1. The scheme adopts a strategy of matching and analysis first, archiving next, and deletion last, and also supports restoring archived data to the production database, so normal access to and calling of the data are not affected. This greatly relieves the problems of insufficient database resources and untimely cleaning.
2. Multithreaded parallel processing is supported, which greatly improves the efficiency of data archiving.
3. Throughout the data archiving process, if any step fails, a full rollback is performed, ensuring the integrity of the final data.
4. The scheme provides visual management of each historical data archiving subtask, making it easy to consult the real-time execution flow record of each subtask.
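Restoring archived data to the production database is the reverse of archiving. The following is a minimal Python sketch, assuming SQLite with the archive library attached to the same connection under the hypothetical schema name `archive` and an identically structured table. `restore_rows` is a hypothetical helper, and moving (rather than copying) the rows back is a design choice of this sketch, not something the source specifies.

```python
import sqlite3


def restore_rows(conn: sqlite3.Connection, table: str, where: str, params: list) -> int:
    """Move rows matching `where` from archive.<table> back to main.<table>.

    Both statements run in one transaction, so a failure leaves both tables
    unchanged. `where` is assumed to come from trusted configuration, since it
    is interpolated into the SQL text; only its values are parameterised.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        moved = conn.execute(
            f"INSERT INTO main.{table} SELECT * FROM archive.{table} WHERE {where}", params
        ).rowcount
        conn.execute(f"DELETE FROM archive.{table} WHERE {where}", params)
    return moved
```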
Drawings
FIG. 1 is a logical block diagram of an archiving system for historical data in accordance with an embodiment of the present invention;
FIG. 2 is a schematic flow chart of data archiving.
Detailed Description
The following is a further detailed description of the embodiments:
Embodiment
An archiving system for historical data, substantially as shown in FIG. 1 and FIG. 2, comprises a server, a user terminal and a display screen, the user terminal and the display screen both being communicatively connected to the server. The user terminal is a desktop computer, laptop computer or tablet computer; the display screen is a large data-monitoring screen or several small screens assembled side by side, used for visual monitoring of the archiving of historical data in the database. The server comprises the following modules:
A data configuration module: used to acquire the configuration data transmitted by the user through the user terminal, the configuration data comprising data source configuration data, archiving configuration data and scheduling configuration data. The data source configuration data mainly specifies the database to be archived and the server computing resources planned for the job, and also includes the standard execution flow for data archiving (the standard execution flow record). The archiving configuration data consists of the screening conditions that identify which historical data in the database need to be archived, such as a time condition, a call-frequency condition, a data-type condition and a data-size condition. The scheduling configuration data, set according to the planned server computing resources and the volume of historical data to be archived, comprises rules for the execution order of the historical data archiving subtasks in the serial historical data archiving task set, rules for the execution order of the subtasks in the parallel historical data archiving task set, the number of subtasks executed concurrently in the parallel task set, and so on.
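The three kinds of configuration data can be modelled as plain records. The Python sketch below is illustrative only: the class and field names (`DataSourceConfig`, `ArchiveConfig`, `ScheduleConfig`, `older_than_days`, `max_access_count`, `parallel_limit`, and the `access_count` column) are hypothetical, and only the time and call-frequency screening conditions are shown, turned into a parameterised SQL `WHERE` fragment.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DataSourceConfig:
    """Which database/table to archive and the computing resources planned for it."""
    database: str
    table: str
    max_workers: int = 4  # hypothetical stand-in for "planned server computing resources"


@dataclass
class ArchiveConfig:
    """Screening conditions; only time and call frequency are modelled here."""
    time_column: str
    older_than_days: int
    max_access_count: Optional[int] = None  # assumes a hypothetical access_count column

    def where_clause(self, now: datetime) -> tuple[str, list]:
        """Return a parameterised WHERE fragment and its parameters."""
        # ISO-8601 strings compare correctly as text if stored in the same format.
        cutoff = (now - timedelta(days=self.older_than_days)).isoformat()
        clauses = [f"{self.time_column} < ?"]
        params: list = [cutoff]
        if self.max_access_count is not None:
            clauses.append("access_count <= ?")
            params.append(self.max_access_count)
        return " AND ".join(clauses), params


@dataclass
class ScheduleConfig:
    """Execution order for the serial task set and concurrency for the parallel one."""
    serial_order: list[str] = field(default_factory=list)
    parallel_limit: int = 2


cfg = ArchiveConfig(time_column="created_at", older_than_days=365, max_access_count=10)
print(cfg.where_clause(datetime(2024, 1, 1)))
# ('created_at < ? AND access_count <= ?', ['2023-01-01T00:00:00', 10])
```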
A task scheduling module: used to match the historical data to be archived in the database according to the data source configuration data and the archiving configuration data, and to establish a parallel historical data archiving task set and a serial historical data archiving task set according to the scheduling configuration data; the module also processes the historical data to be archived in batches according to the two task sets. The parallel historical data archiving task set is built from one or more historical data archiving subtasks that can be executed in parallel to improve efficiency, while the serial historical data archiving task set manages the execution order. Combining the two makes maximal use of the server's computing resources and improves the efficiency of data archiving.
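One way to read the combination of serial and parallel task sets is as an ordered list of stages: stages run one after another, and the subtasks inside a stage run concurrently up to a configured limit. The sketch below assumes that reading; the function name `run_task_sets` and the `parallel_limit` parameter are hypothetical, and each subtask is modelled as a named callable that returns the number of rows it archived.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A subtask is modelled as (name, zero-argument callable returning rows archived).
Subtask = tuple[str, Callable[[], int]]


def run_task_sets(serial_stages: list[list[Subtask]], parallel_limit: int) -> dict[str, int]:
    """Run stages in order (serial task set); within a stage, run subtasks
    concurrently (parallel task set) on at most `parallel_limit` threads."""
    results: dict[str, int] = {}
    for stage in serial_stages:
        with ThreadPoolExecutor(max_workers=parallel_limit) as pool:
            futures = {name: pool.submit(fn) for name, fn in stage}
        # Leaving the `with` block waits for every subtask in the stage,
        # so the next stage only starts after this one has finished.
        for name, fut in futures.items():
            results[name] = fut.result()  # re-raises an exception raised by the subtask
    return results


stages = [
    [("orders_2019", lambda: 120), ("orders_2020", lambda: 80)],  # run in parallel
    [("order_items", lambda: 300)],                               # runs afterwards
]
print(run_task_sets(stages, parallel_limit=2))
# {'orders_2019': 120, 'orders_2020': 80, 'order_items': 300}
```

Using a fresh pool per stage keeps the stage boundary explicit; a single long-lived pool with explicit waits would work equally well.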
A data archiving module: used to extract, through the parallel and serial historical data archiving task sets, the historical data from the database according to the archiving configuration data, write it into the archive library, and delete it from the database.
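The extract, write and delete sequence, together with the full rollback described for the flow comparison module, maps naturally onto a single database transaction. The following sketch assumes SQLite, with the archive library attached to the same connection under the hypothetical schema name `archive` and an identically structured table. `archive_rows` is a hypothetical helper, and the `where` fragment is assumed to come from trusted configuration, since it is interpolated into the SQL text.

```python
import sqlite3


def archive_rows(conn: sqlite3.Connection, table: str, where: str, params: list) -> int:
    """Copy matching rows into archive.<table>, then delete them from
    main.<table>, in one transaction. Any error rolls back both steps."""
    with conn:  # commits on success, rolls back if an exception is raised
        copied = conn.execute(
            f"INSERT INTO archive.{table} SELECT * FROM main.{table} WHERE {where}", params
        ).rowcount
        deleted = conn.execute(f"DELETE FROM main.{table} WHERE {where}", params).rowcount
        if copied != deleted:  # defensive check before committing
            raise RuntimeError("row count mismatch; rolling back")
    return deleted
```

Because the copy and the delete commit together, a failure in either step (for example a primary-key conflict in the archive table) leaves the source table untouched. Archiving into a separate database server would need a different coordination mechanism, which this sketch does not cover.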
A process recording module: used to record, in real time, the execution flow (the real-time execution flow record) of each historical data archiving subtask in the parallel and serial historical data archiving task sets as executed by the data archiving module.
A flow comparison module: used to compare and analyze the real-time execution flow record against a preset standard execution flow record; if the analysis finds a flow error, the module traces back, according to the real-time execution flow record, to the flow step preceding the error, generates backtracking information, and sends it to the data archiving module, which performs the backtracking. In this way the entire process is managed end to end: if any step fails, a full rollback is performed, ensuring the integrity of the final data.
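The comparison against the standard execution flow can be sketched as a walk over the real-time record that stops at the first step which is out of order or did not succeed. In this sketch the step names in `STANDARD_FLOW`, the `(step, status)` record format and the `BacktrackInfo` fields are all hypothetical; the returned information lists the completed steps to undo, newest first, in keeping with the full rollback described above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical standard execution flow for one archiving subtask.
STANDARD_FLOW = ["extract", "write_archive", "verify_count", "delete_source"]


@dataclass
class BacktrackInfo:
    subtask: str
    failed_step: str
    resume_from: Optional[str]  # last step that matched the standard flow
    undo_steps: list[str]       # completed steps to undo, newest first


def compare_flow(subtask: str, record: list[tuple[str, str]],
                 standard: list[str] = STANDARD_FLOW) -> Optional[BacktrackInfo]:
    """Compare a real-time record [(step, status), ...] with the standard flow.

    Returns None if the record matches so far (a partial record of a running
    subtask counts as no error); otherwise returns the backtracking info.
    """
    for i, (step, status) in enumerate(record):
        expected = standard[i] if i < len(standard) else None
        if step != expected or status != "ok":
            done = [s for s, _ in record[:i]]
            return BacktrackInfo(
                subtask=subtask,
                failed_step=step,
                resume_from=done[-1] if done else None,
                undo_steps=list(reversed(done)),
            )
    return None
```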
A data splitting module: used to acquire the packet size of the historical data corresponding to each historical data archiving subtask and compare it with a packet splitting threshold; if the packet size exceeds the threshold, the historical data is split according to the threshold, and the resulting sub-packets are numbered and recorded with their associations. On the one hand, this prevents errors caused by oversized packets during archiving, and the time that would be lost backtracking after such errors, thereby improving archiving efficiency; on the other hand, it avoids storage confusion that would hamper later lookup and retrieval.
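A minimal sketch of threshold-based splitting, assuming a packet is a byte string and the threshold is measured in bytes. Each sub-packet carries the parent identifier, its sequence number and the total count, which is enough to reassemble the packet in order later. `SubPacket`, `split_packet` and `reassemble` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SubPacket:
    parent_id: str  # associates the sub-packet with its original packet
    seq: int        # 1-based number within the parent
    total: int      # how many sub-packets the parent was split into
    payload: bytes


def split_packet(parent_id: str, payload: bytes, threshold: int) -> list[SubPacket]:
    """Split `payload` into chunks of at most `threshold` bytes if it exceeds
    the threshold; otherwise return it as a single numbered packet."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    chunks = [payload[i:i + threshold] for i in range(0, len(payload), threshold)] or [b""]
    return [SubPacket(parent_id, n, len(chunks), c) for n, c in enumerate(chunks, start=1)]


def reassemble(parts: list[SubPacket]) -> bytes:
    """Rebuild the original packet using the recorded numbering."""
    return b"".join(p.payload for p in sorted(parts, key=lambda p: p.seq))
```

For a 10-byte payload and a threshold of 4, `split_packet` yields three sub-packets of 4, 4 and 2 bytes numbered 1 to 3.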
A visual management module: used to display the real-time execution flow record of each historical data archiving subtask on the display screen. In this embodiment, visual task management can be provided through a web-based management interface shown on the display screen. Alongside the visualization, functions such as start, stop, restart, skip and retry are provided, so the user can operate on tasks as needed.
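The start, stop, restart, skip and retry functions can be modelled as a small state machine per subtask, whose event log is what a web monitoring page would display. The states, the allowed transitions and the names `State`, `TRANSITIONS` and `SubtaskControl` below are assumptions made for illustration; the source does not define them.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    RUNNING = "running"
    STOPPED = "stopped"
    FAILED = "failed"
    SKIPPED = "skipped"
    DONE = "done"


# Hypothetical transition table: action -> (states it may be applied from, resulting state).
TRANSITIONS = {
    "start":   ({State.PENDING}, State.RUNNING),
    "stop":    ({State.RUNNING}, State.STOPPED),
    "restart": ({State.STOPPED, State.FAILED, State.DONE}, State.RUNNING),
    "skip":    ({State.PENDING, State.STOPPED, State.FAILED}, State.SKIPPED),
    "retry":   ({State.FAILED}, State.RUNNING),
}


class SubtaskControl:
    """Tracks one archiving subtask's state and keeps an event log for display."""

    def __init__(self, name: str):
        self.name = name
        self.state = State.PENDING
        self.events: list[str] = []

    def apply(self, action: str) -> State:
        """Apply an operator action, rejecting it if the current state forbids it."""
        allowed, target = TRANSITIONS[action]
        if self.state not in allowed:
            raise ValueError(f"cannot {action} a subtask that is {self.state.value}")
        self.events.append(f"{action}: {self.state.value} -> {target.value}")
        self.state = target
        return self.state

    def finish(self, ok: bool) -> None:
        """Called by the archiving worker when a running subtask ends."""
        if self.state is not State.RUNNING:
            raise ValueError("only a running subtask can finish")
        self.state = State.DONE if ok else State.FAILED
        self.events.append(f"finished: {self.state.value}")
```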
The archiving method for historical data is applied to the above system and comprises the following steps:
a data configuration step: acquiring a user's configuration data, the configuration data comprising data source configuration data, archiving configuration data and scheduling configuration data;
a task scheduling step: matching the historical data to be archived in a database according to the scheduling configuration data and the data source configuration data, and establishing a parallel historical data archiving task set and a serial historical data archiving task set; this step also processes the historical data to be archived in batches according to the parallel and serial historical data archiving task sets;
a data archiving step: through the parallel and serial historical data archiving task sets, extracting the historical data from the database according to the archiving configuration data, writing it into the archive library, and deleting it from the database;
a process recording step: recording, in real time, the execution flow (the real-time execution flow record) of each historical data archiving subtask in the parallel and serial historical data archiving task sets during the data archiving step;
a flow comparison step: comparing and analyzing the real-time execution flow record against a preset standard execution flow record; if the analysis finds a flow error, tracing back, according to the real-time execution flow record, to the flow step preceding the error, generating backtracking information, and performing the backtracking through the data archiving step;
a data splitting step: acquiring the packet size of the historical data corresponding to each historical data archiving subtask and comparing it with a packet splitting threshold; if the packet size exceeds the threshold, splitting the historical data according to the threshold, and numbering the resulting sub-packets and recording their associations;
a visual management step: displaying the real-time execution flow record of each historical data archiving subtask on a display screen.
The foregoing is merely an embodiment of the present invention; common knowledge such as well-known specific structures and characteristics is not described in detail herein. A person skilled in the art knows all the ordinary technical knowledge in the technical field to which the invention belongs before the application date or the priority date, has access to all the prior art in that field, and is able to apply conventional experimental means before that date; in light of this application, such a person can improve and implement this scheme using their own abilities, and certain typical known structures or known methods should not be an obstacle to implementing this application. It should be noted that a person skilled in the art may make several modifications and improvements without departing from the structure of the present invention; these should also be regarded as falling within the protection scope of the present invention and will not affect the effect of implementing the invention or the practicability of the patent. The protection scope claimed by this application shall be subject to the content of the claims, and the specific embodiments and other descriptions in the specification may be used to interpret the content of the claims.

Claims (8)

1. An archiving system for historical data, characterized in that it comprises a server, the server comprising the following modules:
a data configuration module: used to acquire a user's configuration data, the configuration data comprising data source configuration data, archiving configuration data and scheduling configuration data;
a task scheduling module: used to match the historical data to be archived in a database according to the scheduling configuration data and the data source configuration data, and to establish a parallel historical data archiving task set and a serial historical data archiving task set; the module also processes the historical data to be archived in batches according to the parallel and serial historical data archiving task sets; and
a data archiving module: used to extract, through the parallel and serial historical data archiving task sets, the historical data from the database according to the archiving configuration data, write it into the archive library, and delete it from the database.
2. The archiving system for historical data according to claim 1, wherein the server further comprises the following modules:
a process recording module: used to record, in real time, the execution flow (the real-time execution flow record) of each historical data archiving subtask in the parallel and serial historical data archiving task sets as executed by the data archiving module;
a flow comparison module: used to compare and analyze the real-time execution flow record against a preset standard execution flow record; if the analysis finds a flow error, the module traces back, according to the real-time execution flow record, to the flow step preceding the error, generates backtracking information, and sends it to the data archiving module, which performs the backtracking.
3. The archiving system for historical data according to claim 2, wherein the server further comprises the following module:
a data splitting module: used to acquire the packet size of the historical data corresponding to each historical data archiving subtask and compare it with a packet splitting threshold; if the packet size exceeds the threshold, the historical data is split according to the threshold, and the resulting sub-packets are numbered and recorded with their associations.
4. The archiving system for historical data according to claim 3, wherein the server further comprises the following module:
a visual management module: used to display the real-time execution flow record of each historical data archiving subtask on a display screen.
5. A method for archiving historical data, comprising the following steps:
a data configuration step: acquiring a user's configuration data, the configuration data comprising data source configuration data, archiving configuration data and scheduling configuration data;
a task scheduling step: matching the historical data to be archived in a database according to the scheduling configuration data and the data source configuration data, and establishing a parallel historical data archiving task set and a serial historical data archiving task set; this step also processes the historical data to be archived in batches according to the parallel and serial historical data archiving task sets;
a data archiving step: through the parallel and serial historical data archiving task sets, extracting the historical data from the database according to the archiving configuration data, writing it into the archive library, and deleting it from the database.
6. The archiving method for historical data according to claim 5, further comprising the following steps:
a process recording step: recording, in real time, the execution flow (the real-time execution flow record) of each historical data archiving subtask in the parallel and serial historical data archiving task sets during the data archiving step;
a flow comparison step: comparing and analyzing the real-time execution flow record against a preset standard execution flow record; if the analysis finds a flow error, tracing back, according to the real-time execution flow record, to the flow step preceding the error, generating backtracking information, and performing the backtracking through the data archiving step.
7. The method for archiving historical data according to claim 6, further comprising the following step:
a data splitting step: acquiring the packet size of the historical data corresponding to each historical data archiving subtask and comparing it with a packet splitting threshold; if the packet size exceeds the threshold, splitting the historical data according to the threshold, and numbering the resulting sub-packets and recording their associations.
8. The method for archiving historical data according to claim 7, further comprising the following step:
a visual management step: displaying the real-time execution flow record of each historical data archiving subtask on a display screen.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211717410.9A CN116186007A 2022-12-29 2022-12-29 Archiving system and method for historical data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211717410.9A CN116186007A 2022-12-29 2022-12-29 Archiving system and method for historical data

Publications (1)

Publication Number Publication Date
CN116186007A 2023-05-30

Family

ID=86447102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211717410.9A Pending CN116186007A 2022-12-29 2022-12-29 Archiving system and method for historical data

Country Status (1)

Country Link
CN (1) CN116186007A

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination