CN116361066A - Data processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN116361066A
Authority
CN
China
Prior art keywords
backup
time point
files
file
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111619612.5A
Other languages
Chinese (zh)
Inventor
裴庭伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Suzhou Software Technology Co Ltd
Priority to CN202111619612.5A
Publication of CN116361066A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1464 Management of the backup or restore process for networked environments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45562 Creating, deleting, cloning virtual machine instances
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data processing method, a data processing device, electronic equipment and a storage medium. The data processing method comprises: determining a first time point when the usage rate of a first backup space is smaller than a set threshold; and generating a first backup file of a first database and storing the first backup file into the first backup space. The first backup file represents a full backup file, and the backup time point corresponding to the first backup file is the first time point; the first backup space is different from the storage space of the first database.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a data processing method, a data processing device, an electronic device, and a storage medium.
Background
A cloud database is a database deployed in a virtual computing environment. When a service requires it, a user needs to restore the data of the cloud database to a target time point. In general, data recovery is performed by selecting the last full backup file before the target time point, together with the incremental backup files and log backup files between the time point corresponding to that full backup file and the target time point. The recovery time depends on the frequency of full backups: the higher the frequency, the fewer incremental backup files and log backup files need to be restored, and the shorter the recovery time. However, frequent full backups place a heavy load on the cloud database and affect its application performance. That is, in the related art, data recovery efficiency and the application performance of the cloud database cannot both be achieved when data recovery is performed.
Disclosure of Invention
In view of the foregoing, a main objective of the embodiments of the present application is to provide a data processing method, apparatus, electronic device, and storage medium, so as to solve the problem in the related art that data recovery efficiency and the application performance of the cloud database cannot both be achieved when data recovery is performed.
In order to achieve the above purpose, the technical solution of the embodiments of the present application is implemented as follows:
An embodiment of the present application provides a data processing method, comprising:
determining a first time point in a case where the usage rate of a first backup space is smaller than a set threshold;
generating a first backup file of a first database, and storing the first backup file into the first backup space; wherein
the first backup file represents a full backup file, and the backup time point corresponding to the first backup file is the first time point; the first backup space is different from a storage space of the first database.
In the above aspect, the determining the first time point includes:
determining a second time point when a first number of continuous second backup files exist in the first backup space and are all log backup files; the second time point represents a backup time point corresponding to a first second backup file in the continuous second backup files;
The first point in time is determined based on the second point in time.
In the above aspect, the determining the first time point based on the second time point includes:
determining a third time point in the continuous second backup files at intervals of N second backup files by taking the second backup files corresponding to the second time point as a starting point, and obtaining at least one third time point;
determining the first time point from the at least one third time point; wherein
N is an integer greater than 1 and less than the first number.
In the above aspect, the determining the first time point in the at least one third time point includes:
determining the last determined third time point as the first time point.
In the above solution, the generating the first backup file of the first database includes:
determining a fourth time point; the fourth time point represents the backup time point corresponding to a third backup file; the third backup file represents the last full backup file before the first time point;
generating the first backup file based on the third backup file and a second number of fourth backup files; wherein
the second number of fourth backup files represent backup files whose corresponding backup time points are located between the fourth time point and the first time point, and the second number of fourth backup files do not include full backup files.
In the above solution, the generating the first backup file based on the third backup file and the second number of fourth backup files includes:
taking the third backup file and the second number of fourth backup files out of the first backup space and storing them in a newly built second database;
generating the first backup file by backing up the second database.
In the above aspect, before the determining the first time point, the method further includes:
determining a fifth backup file; the fifth backup file represents the first full backup file stored in the first backup space;
clearing a third number of sixth backup files in a case where the third number of sixth backup files exist whose backup time points precede the backup time point corresponding to the fifth backup file.
The embodiment of the application also provides a data processing device, which comprises:
a determining unit, configured to determine a first time point in a case where the usage rate of a first backup space is smaller than a set threshold;
a storage unit, configured to generate a first backup file of a first database and store the first backup file into the first backup space; wherein
the first backup file represents a full backup file, and the backup time point corresponding to the first backup file is the first time point; the first backup space is different from a storage space of the first database.
The embodiment of the application also provides electronic equipment, which comprises: a processor and a memory for storing a computer program capable of running on the processor, wherein,
the processor is configured to perform the steps of any of the methods described above when the computer program is run.
The present application also provides a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods described above.
In the embodiments of the present application, a first time point is determined when the usage rate of the first backup space is smaller than the set threshold; a first backup file of the first database is generated and stored into the first backup space. The first backup file represents a full backup file, and the backup time point corresponding to the first backup file is the first time point; the first backup space is different from the storage space of the first database. In this way, when the usage rate of the first backup space corresponding to the first database meets the condition, as many full backup files as possible can be generated, maximizing the proportion of full backup files in the first backup space; the more full backup files there are, the faster the data recovery and the higher the data recovery efficiency. In addition, the full backup process in the embodiments of the present application is not performed on the first database, so even if full backup is performed multiple times, the load on the first database is not increased and its performance is not affected.
Drawings
Fig. 1 is a schematic implementation flow chart of a data processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of determining a first time point according to an embodiment of the present application;
FIG. 3 is a schematic diagram of generating a first backup file according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram illustrating cleaning a third number of sixth backup files according to an embodiment of the present application;
fig. 5 is a schematic implementation flow chart of a data processing method provided by an application embodiment of the present application;
FIG. 6 is a schematic flowchart of another implementation of a data processing method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a first backup space before and after adjustment according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
fig. 9 is a schematic diagram of a hardware composition structure of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
The technical solutions described in the embodiments of the present application may be arbitrarily combined without any conflict.
In addition, in the embodiments of the present application, the terms "first," "second," etc. are used to distinguish similar objects and are not necessarily used to describe a particular order or precedence. The term "and/or" is merely an association relationship describing an associated object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. In addition, the term "at least one" means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, may mean including any one or more elements selected from the group consisting of A, B and C.
A cloud database is a database deployed in a virtual computing environment. It offers advantages such as on-demand payment, on-demand scaling, flexible backup, flexible recovery, high availability and storage integration, and is widely favored by small and medium enterprises. A system that provides services such as cloud database subscription, deletion, backup and restoration is referred to as a cloud database system. The user can select the specifications of the cloud database and the capacity of the corresponding backup space on the subscription interface of the cloud database system.
A user can select a backup mode of the cloud database on a management interface of the cloud database system, and the backup mode mainly comprises three backup modes of full backup, incremental backup and log backup.
Full backup refers to completely backing up all data of a cloud database at a certain time point. Based on the full backup file, all data of the cloud database at that backup time point can be restored.
Incremental backup refers to backing up all data changed since the previous backup, where the previous backup may be a full backup or an incremental backup. Based on the incremental backup file and the previous backup files, all data of the cloud database at the current backup time point can be restored.
Log backup refers to backing up the binary log file (Binlog) of the cloud database. Based on the log backup files and the previous backup files, the data of the cloud database at any time point between the time point corresponding to the earliest full backup file and the time point of the current backup can be restored.
Full backup and incremental backup are one-off actions: the backup process ends once the backup is completed. Log backup is a long-running, continuous action: after log backup is started, a log backup is performed whenever the cloud database generates a new binlog.
Data recovery based on a full backup file is the fastest; however, only the data at the specific time point corresponding to the full backup file can be recovered, and performing a full backup places a heavy load on the cloud database, which may affect application performance. Data recovery based on log backup files is the slowest, but the data corresponding to any reasonable time point can be recovered, and performing a log backup places no load pressure on the cloud database and does not affect application performance.
At present, the industry mainly develops and deploys a cloud database based on an open source platform Kubernetes, and builds a public cloud platform of the database, namely a cloud database system. The backup files of the cloud database comprise full backup files, incremental backup files and log backup files, and are sequentially stored in the corresponding backup space according to time sequence.
In a cloud database system, data recovery is a very important capability that plays a key role in scenarios such as disaster recovery and data rollback. The mainstream practice in the industry is to restore from backup files created manually by the user or automatically by the system. When a service requires it, a user needs to restore the data of the cloud database to a target time point. In general, data recovery is performed by selecting the last full backup file before the target time point, together with the incremental backup files and log backup files between the time point corresponding to that full backup file and the target time point. The recovery time depends on the number of full, incremental and log backup files that need to be restored: the greater the number, the longer the recovery takes. The recovery time is inversely related to the frequency of full backups: the higher the frequency, the fewer incremental backup files and log backup files need to be restored, and the shorter the recovery time. However, frequent full backups place a heavy load on the cloud database and affect its application performance. The backup files of the cloud database are stored in the corresponding backup space, whose capacity is fixed: the capacity specified by the user when subscribing is the maximum storable capacity of the backup space. In an actual backup process, this maximum capacity is typically not reached, so a large amount of space in the backup space usually remains unused.
That is, in the related art, data recovery efficiency and the application performance of the cloud database cannot both be achieved when data recovery is performed.
Based on this, the embodiments of the present application provide a data processing method, a data processing device, an electronic device and a storage medium, in which a first time point is determined when the usage rate of a first backup space is smaller than a set threshold, and a first backup file of a first database is generated and stored into the first backup space. The first backup file represents a full backup file, and the backup time point corresponding to the first backup file is the first time point; the first backup space is different from the storage space of the first database. In this way, when the usage rate of the first backup space corresponding to the first database meets the condition, as many full backup files as possible can be generated, maximizing the proportion of full backup files in the first backup space; the more full backup files there are, the faster the data recovery and the higher the data recovery efficiency. In addition, the full backup process in the embodiments of the present application is not performed on the first database, so even if full backup is performed multiple times, the load on the first database is not increased and its performance is not affected.
The present application is described in further detail below with reference to the accompanying drawings and examples.
Fig. 1 is a schematic implementation flow chart of a data processing method according to an embodiment of the present application. As shown in fig. 1, the method includes:
Step 101: determining a first time point in a case where the usage rate of the first backup space is smaller than a set threshold.
Here, the usage rate of the first backup space corresponding to the first database is detected first. If the usage rate of the first backup space is smaller than the set threshold, a large amount of unused space remains in the first backup space, and in this case the first time point is determined. The first time point is the time point corresponding to the full backup.
The set threshold may be any value between 20% and 90%, and the specific value of the set threshold is set according to the actual situation, which is not limited in the embodiment of the present application.
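The check in step 101 can be sketched as follows. This is a minimal illustration only: the function name `has_spare_backup_space`, its byte-based parameters, and the default threshold of 0.5 are assumptions made for the example and do not appear in the application.

```python
def has_spare_backup_space(used_bytes: int, capacity_bytes: int,
                           threshold: float = 0.5) -> bool:
    """Return True when the usage rate of the backup space is strictly
    below the set threshold (hypothetical helper, not from the source)."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    usage_rate = used_bytes / capacity_bytes
    # Step 101 only proceeds when the usage rate is *smaller than* the threshold.
    return usage_rate < threshold


# 30 GB used out of 100 GB with a 50% threshold: spare space is available.
print(has_spare_backup_space(30, 100, threshold=0.5))  # True
```

A strict comparison is used because the text requires the usage rate to be smaller than the threshold, so a usage rate exactly equal to the threshold does not trigger the determination of a first time point.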
In an embodiment, the determining the first time point includes:
determining a second time point when a first number of continuous second backup files exist in the first backup space and are all log backup files; the second time point represents a backup time point corresponding to a first second backup file in the continuous second backup files;
The first point in time is determined based on the second point in time.
Here, the first backup space stores the full backup files, the incremental backup files, and the log backup files of the first database, and all of these backup files are sequentially stored in the first backup space according to the sequence of the corresponding backup time points.
That is, when a first number of consecutive log backup files exist in the first backup space, the backup time point corresponding to the first log backup file among these consecutive log backup files is determined as the second time point.
The first number may be 5, 10, 15, and the specific value of the first number is set according to the actual situation, which is not limited in the embodiment of the present application.
Illustratively, there are 5 consecutive log backup files in the first backup space, and the correspondence between the 5 consecutive log backup files and the corresponding backup time points is as follows:
12:01-log backup file 1,
12:02-log backup file 2,
12:03-log backup file 3,
12:04-log backup file 4,
12:05-log backup file 5.
The backup time point corresponding to the first log backup file in the 5 log backup files is determined as a second time point, so the second time point is 12:01.
It should be noted that the closer the backup time point corresponding to a backup file is to the current time point, the greater the probability that the user selects the backup file at that backup time point for data recovery. Therefore, the first backup space is searched in reverse chronological order, from the most recent backup backwards, for the first set of a first number of consecutive log backup files, and the second time point is determined based on that set.
Continuing with the example above, assume the current time point is 12:08, and the first backup space contains 5 consecutive log backup files between 12:01 and 12:05 and another 5 consecutive log backup files between 11:40 and 11:45. In this case, searching in reverse chronological order, the first set encountered is the 5 consecutive log backup files between 12:01 and 12:05, so the second time point is determined based on these files rather than on the 5 consecutive log backup files between 11:40 and 11:45.
After the second time point is determined, the first time point is determined based on the second time point.
The first time point can be accurately determined by determining the second time point and then determining the first time point based on the second time point.
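The reverse-order search for the second time point can be sketched as below. The `Backup` record, the `find_log_run` function and the sample data are hypothetical; the sketch also assumes that the backup list is already sorted by backup time point and that, once a first number of consecutive log backup files has been counted from the newest end, the search stops and that window is used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Backup:
    time: str  # backup time point as zero-padded "HH:MM"
    kind: str  # "full", "incremental" or "log"


def find_log_run(backups: List[Backup], first_number: int) -> Optional[List[Backup]]:
    """Scan from newest to oldest and return the first window of
    `first_number` consecutive log backups, ordered oldest first.
    Returns None when no such window exists."""
    run: List[Backup] = []
    for backup in reversed(backups):
        if backup.kind == "log":
            run.append(backup)
            if len(run) == first_number:
                return list(reversed(run))
        else:
            run = []  # a non-log backup breaks the consecutive run
    return None


# Illustrative data: two runs of 5 log backups; the later run must win.
backups = (
    [Backup("11:30", "full")]
    + [Backup(f"11:{m}", "log") for m in range(40, 45)]
    + [Backup("12:00", "incremental")]
    + [Backup(f"12:0{m}", "log") for m in range(1, 6)]
)
run = find_log_run(backups, first_number=5)
second_time_point = run[0].time
print(second_time_point)  # 12:01
```

The second time point is the backup time point of the first (oldest) file in the returned window, which matches the 12:01 example in the text.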
In an embodiment, the determining the first time point based on the second time point includes:
determining a third time point in the continuous second backup files at intervals of N second backup files by taking the second backup files corresponding to the second time point as a starting point, and obtaining at least one third time point;
determining the first time point from the at least one third time point; wherein
N is an integer greater than 1 and less than the first number.
Here, a third time point is determined from the continuous log backup files at intervals of N log backup files with the log backup file corresponding to the second time point as a starting point, so as to obtain at least one third time point. The first point in time is determined from the at least one third point in time. Wherein N is an integer greater than 1 and less than the first number.
It should be noted that, when a third time point is determined every N log backup files, the log backup file corresponding to the second time point is included in the count.
Fig. 2 is a schematic diagram of determining a first time point according to an embodiment of the present application, as shown in fig. 2:
the triangle pattern represents a second backup file, that is, a log backup file. Searching the first backup space in reverse chronological order, a first number (here, 5) of consecutive log backup files are found, and the backup time point corresponding to the first of these 5 consecutive log backup files is determined as the second time point.
Assuming that N is 2, a third time point is determined every 2 log backup files among the 5 consecutive log backup files, starting from the log backup file corresponding to the second time point. As shown in fig. 2, two third time points are determined, and the first time point is determined from these two third time points. When a third time point is determined every 2 log backup files, the log backup file corresponding to the second time point is included in the count.
By determining at least one third time point, and determining the first time point in the at least one third time point, the first time point can be accurately determined.
In an embodiment, said determining said first point in time in said at least one third point in time comprises:
determining the last determined third time point as the first time point.
Here, since the last determined third time point is closest to the current time point, the last determined third time point is determined as the first time point.
As shown in fig. 2, the second one of the determined two third time points is determined as the first time point.
By determining the last determined third time point as the first time point, not only the first time point can be accurately determined, but also the determined first time point can be nearest to the current time point.
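The selection of third time points and of the first time point can be sketched as follows. The function names and the "HH:MM" strings are illustrative assumptions; the sketch counts the log backup file at the second time point as the first file of each group of N, as the text describes.

```python
from typing import List


def third_time_points(run_times: List[str], n: int) -> List[str]:
    """Pick every n-th backup time point in the run of consecutive log
    backups, counting the file at the second time point as number 1."""
    if not 1 < n < len(run_times):
        raise ValueError("N must be greater than 1 and less than the first number")
    return run_times[n - 1::n]


def first_time_point(run_times: List[str], n: int) -> str:
    """The last determined third time point is closest to the current
    time, so it becomes the first time point."""
    return third_time_points(run_times, n)[-1]


run_times = ["12:01", "12:02", "12:03", "12:04", "12:05"]
print(third_time_points(run_times, 2))  # ['12:02', '12:04']
print(first_time_point(run_times, 2))   # 12:04
```

With 5 consecutive log backup files and N equal to 2, two third time points are obtained, which agrees with the fig. 2 description; the later one is taken as the first time point.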
Step 102: generating a first backup file of a first database, and storing the first backup file into the first backup space; wherein
the first backup file represents a full-scale backup file, and the backup time point corresponding to the first backup file is the first time point; the first backup space is different from a storage space of the first database.
Here, after the first point in time is determined, a first backup file of the first database is generated, that is, a full-size backup file of the first database is generated, and the full-size backup file is stored in the first backup space. The backup time point corresponding to the first backup file is a first time point. In this way, the client may restore all data corresponding to the first database at the first point in time based on the first backup file.
It should be noted that, the first backup space is not the same as the storage space of the first database itself, and the generation process of the first backup file is not performed in the storage space of the first database itself.
In one embodiment, the generating the first backup file of the first database includes:
determining a fourth time point; the fourth time point represents the backup time point corresponding to a third backup file; the third backup file represents the last full backup file before the first time point;
generating the first backup file based on the third backup file and a second number of fourth backup files; wherein
the second number of fourth backup files characterizes backup files with corresponding backup time points located between the fourth time point and the first time point, and the second number of fourth backup files does not include the full backup files.
Here, when the first backup file of the first database is generated, a third backup file is determined first, where the third backup file is the last full-size backup file before the first time point, that is, the full-size backup file whose corresponding backup time point is closest to the first time point. And after the third backup file is determined, determining a fourth time point. The fourth time point is a backup time point corresponding to the third backup file.
After the fourth time point is determined, a second number of fourth backup files, of which corresponding backup time points are located between the fourth time point and the first time point, are determined. The second number of fourth backup files are incremental backup files and/or log backup files, excluding full backup files.
A first backup file of the first database is generated based on the third backup file and the second number of fourth backup files.
For example, suppose the first time point is 17:17 and the backup time point of the last full backup file before the first time point, that is, of the third backup file, is 17:00. The fourth time point is then 17:00, and 20 fourth backup files whose backup time points lie between 17:00 and 17:17 are determined. The first backup file is generated based on the third backup file of 17:00 and the 20 fourth backup files between 17:00 and 17:17.
Because the first backup file is generated from the full backup file whose time point (the fourth time point) is nearest to the first time point, together with the second number of fourth backup files between the fourth time point and the first time point, the number of fourth backup files that must be restored is kept as small as possible, which improves the efficiency of generating the first backup file.
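As a non-limiting illustration, the selection of the third backup file and the fourth backup files described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the `BackupFile` record, the `select_sources` function, the calendar date, and the 51-second spacing of the log backup files are all hypothetical and chosen only so that the example reproduces the 17:00 / 17:17 / 20-file figures above. It also assumes that a backup file whose time point equals the first time point is included in the fourth backup files.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class BackupFile:
    kind: str             # "full", "incremental", or "log"
    time_point: datetime  # backup time point of this file


def select_sources(backups, first_time_point):
    """Return (third backup file, fourth backup files) for a first time point."""
    fulls = [b for b in backups
             if b.kind == "full" and b.time_point < first_time_point]
    if not fulls:
        raise ValueError("no full backup file precedes the first time point")
    # Third backup file: the last full backup file before the first time point.
    third = max(fulls, key=lambda b: b.time_point)
    # Fourth backup files: non-full files after the fourth time point,
    # up to and including the first time point (inclusion is an assumption).
    fourth = sorted(
        (b for b in backups
         if b.kind != "full"
         and third.time_point < b.time_point <= first_time_point),
        key=lambda b: b.time_point,
    )
    return third, fourth


# Hypothetical data: full backups at 16:00 and 17:00, then 20 log backups
# spaced 51 seconds apart, the last of which falls exactly at 17:17.
day = datetime(2021, 12, 27)  # arbitrary date for the example
five_pm = day.replace(hour=17)
backups = [BackupFile("full", day.replace(hour=16)), BackupFile("full", five_pm)]
backups += [BackupFile("log", five_pm + timedelta(seconds=51 * i))
            for i in range(1, 21)]

third, fourth = select_sources(backups, day.replace(hour=17, minute=17))
print(third.time_point.strftime("%H:%M"), len(fourth))
```

Choosing the nearest preceding full backup file, rather than an earlier one, is what keeps the list of fourth backup files short.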
In an embodiment, the generating the first backup file based on the third backup file and the second number of fourth backup files includes:
taking the third backup file and the second number of fourth backup files out of the first backup space and storing them in a newly built second database;
and generating the first backup file by backing up the second database.
Here, when the first backup file is generated, the third backup file and the second number of fourth backup files are taken out of the first backup space and stored in a newly built second database, and the first backup file of the first database is generated by backing up the second database.
It should be noted that the second database is newly built and stores only the third backup file and the second number of fourth backup files, so the second database can be backed up directly to obtain the first backup file.
Fig. 3 is a schematic diagram of generating a first backup file according to an embodiment of the present application. As shown in Fig. 3:
The process of generating the first backup file starts.
Step 301: read the backup files. The third backup file and the second number of fourth backup files are read from the first backup space.
Step 302: store in the second database. The read third backup file and second number of fourth backup files are stored in the newly built second database.
Step 303: generate the first backup file. The first backup file is obtained by backing up the second database.
Step 304: store in the first backup space. After the first backup file is generated, it is stored in the first backup space, so that the client can restore all data of the first database as of the first time point based on the first backup file.
The process of generating the first backup file ends.
By storing the third backup file and the second number of fourth backup files in the newly built second database and then backing up the second database, the first backup file can be generated accurately.
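Steps 301 to 304 can be illustrated with a small in-memory simulation. The sketch below rests on simplifying assumptions that are not part of the embodiment: the first backup space is modelled as a Python dictionary, a full backup file as a snapshot of key/value data, and each fourth backup file as a set of key/value changes; the function name `generate_first_backup` and all keys are hypothetical. A real system would restore into an actual database instance and use that database's own backup tooling.

```python
def generate_first_backup(backup_space, third_key, fourth_keys, first_key):
    """Simulate steps 301-304 using dictionaries in place of real storage."""
    # Step 301: read the third backup file and the fourth backup files.
    third = backup_space[third_key]
    fourth = [backup_space[key] for key in fourth_keys]

    # Step 302: store them in a newly built second database.
    second_db = dict(third["data"])         # restore the full backup
    for backup in fourth:                   # replay changes in time order
        second_db.update(backup["changes"])

    # Step 303: back up the second database to obtain the first backup file.
    first_backup = {"kind": "full", "data": dict(second_db)}

    # Step 304: store the first backup file in the first backup space.
    backup_space[first_key] = first_backup
    return first_backup


# Hypothetical contents of the first backup space.
space = {
    "17:00-full": {"kind": "full", "data": {"a": 1, "b": 2}},
    "17:05-log": {"kind": "log", "changes": {"b": 3}},
    "17:17-log": {"kind": "log", "changes": {"c": 4}},
}
result = generate_first_backup(
    space, "17:00-full", ["17:05-log", "17:17-log"], "17:17-full"
)
print(result["data"])
```

Note that the first database never appears in the function: all work is done on the backup space and the second database, which matches the statement that generating the first backup file does not load the first database.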
In an embodiment, before said determining the first point in time, the method further comprises:
determining a fifth backup file; the fifth backup file characterizes a first full backup file stored in the first backup space;
and clearing a third number of sixth backup files in a case where the third number of sixth backup files exist before the backup time point corresponding to the fifth backup file.
Here, before the first time point is determined, the fifth backup file is determined, that is, the first full backup file stored in the first backup space, and it is determined whether any sixth backup files exist before the backup time point corresponding to the fifth backup file. If a third number of sixth backup files exist before that backup time point, the third number of sixth backup files are cleared. The third number of sixth backup files include incremental backup files and/or log backup files.
A user can manually clear full backup files that were created by manual backup, whereas restoring from incremental backup files and log backup files depends on a previously created full backup file. If the user clears only a manually created full backup file, and a third number of incremental backup files and/or log backup files follow the cleared full backup file, those incremental backup files and/or log backup files become invalid backup files. Therefore, to ensure the validity of the backup files in the first backup space, the third number of incremental backup files and/or log backup files in the first backup space that can no longer be used for data recovery need to be cleared.
FIG. 4 is a schematic diagram illustrating cleaning a third number of sixth backup files according to an embodiment of the present application, as shown in FIG. 4:
the large circles represent full backed up files.
First, the first full backup file stored in the first backup space, that is, the fifth backup file, is determined. In this example, four sixth backup files (a third number equal to 4) exist before the backup time point corresponding to the fifth backup file. In an actual backup process these four sixth backup files are invalid backup files, so they are cleared from the first backup space.
By clearing the third number of sixth backup files before the backup time point corresponding to the fifth backup file, the validity of the backup files in the first backup space can be ensured.
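The clearing rule can be sketched as follows. This is a minimal illustration with hypothetical names and data: backup files are represented as `(time point, kind)` tuples already sorted by backup time point, and the case in which the backup space contains no full backup file at all, which the text above does not address, is handled here by leaving the list unchanged (an assumption).

```python
def purge_before_first_full(backups):
    """Split a time-ordered backup list into (kept, purged).

    `backups` is a list of (time_point, kind) tuples sorted by time point.
    Everything before the first full backup file (the fifth backup file)
    is treated as an invalid sixth backup file and purged.
    """
    first_full = next(
        (i for i, (_, kind) in enumerate(backups) if kind == "full"), None
    )
    if first_full is None:
        # Assumption: with no full backup file present, nothing is purged.
        return list(backups), []
    return backups[first_full:], backups[:first_full]


# Hypothetical backup space resembling Fig. 4: four orphaned files
# precede the first full backup file.
backups = [
    ("16:10", "log"),
    ("16:20", "incremental"),
    ("16:30", "log"),
    ("16:40", "log"),
    ("17:00", "full"),
    ("17:05", "log"),
]
kept, purged = purge_before_first_full(backups)
print(len(purged), kept[0])
```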
In some embodiments, if the usage rate of the first backup space is greater than or equal to the set threshold, it is determined whether invalid backup files exist in the first backup space; if so, the invalid backup files are removed from the first backup space, and after a set duration it is detected whether the usage rate of the first backup space is less than the set threshold. The set duration may be, for example, 4 hours, 5 hours, or 6 hours; its specific value can be set according to actual conditions and is not limited in the embodiments of the present application.
If the usage rate of the first backup space is less than the set threshold after the set duration has elapsed, step 101 and step 102 are executed.
Fig. 5 is a schematic implementation flowchart of a data processing method provided by an application embodiment of the present application. As shown in Fig. 5:
First, invalid backup files in the first backup space are cleaned.
It is then detected whether the usage rate of the first backup space is less than a set threshold T. If so, a first time point is determined and a first backup file is generated; the first backup file is a full backup file whose corresponding backup time point is the first time point. After the first backup file is generated, the usage rate of the first backup space is detected again, and this process repeats as long as the usage rate of the first backup space is less than T.
If the usage rate of the first backup space is greater than or equal to T, the usage rate is detected again after a set time interval, and invalid backup files in the first backup space are cleaned within that interval.
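The control flow of Fig. 5 can be sketched as a loop driven by callbacks. Everything in the sketch is an assumption made for illustration: the function `run_cycles`, the callback names, the fixed number of cycles, and the simulated backup space in which usage is expressed in whole percent, each generated full backup file adds 20 percent, and 30 percent of the space becomes invalid during a wait (for example because a user deleted a manually created full backup file).

```python
def run_cycles(read_usage, threshold, clean_invalid, generate_first_backup,
               wait, cycles):
    """Run the Fig. 5 loop for a fixed number of usage checks."""
    clean_invalid()                       # first, clean invalid backup files
    for _ in range(cycles):
        if read_usage() < threshold:
            generate_first_backup()       # determine first time point, back up
        else:
            wait()                        # wait for the set time interval
            clean_invalid()               # clean invalid files meanwhile


# Simulated backup space (all figures are hypothetical).
state = {"usage": 50, "invalid": 0}
log = []


def clean_invalid():
    state["usage"] -= state["invalid"]
    state["invalid"] = 0
    log.append("clean")


def generate_first_backup():
    state["usage"] += 20
    log.append("generate")


def wait():
    state["invalid"] = 30                 # files become invalid during the wait
    log.append("wait")


run_cycles(lambda: state["usage"], 80, clean_invalid,
           generate_first_backup, wait, cycles=4)
print(log, state["usage"])
```

In a deployment the loop would run indefinitely and `wait` would sleep for the set duration; a bounded number of cycles is used here only so that the example terminates.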
Fig. 6 is a schematic implementation flowchart of another data processing method provided in an application embodiment of the present application. As shown in Fig. 6:
A cloud database system, that is, a public cloud platform for databases, is developed and deployed based on Kubernetes.
A user management module and a backup recovery management module of the cloud database are deployed on the Kubernetes Master node of the management plane. On the Kubernetes Nodes of the resource plane, operations such as ordering, deleting, backing up, and restoring cloud databases are carried out according to the scheduling of the management plane, with each Node corresponding to one cloud database. The user management module mainly provides services such as ordering and deleting cloud databases. The backup recovery module backs up the data of the cloud database and provides a service for restoring the data of the cloud database as of any valid specified time point.
The user performs full backups and incremental backups of the cloud database data as needed. The backup recovery module dynamically generates first backup files according to the usage rate of the backup space and the actual storage status of the backup files. Backup data of each cloud database is stored in its corresponding backup space. The backup space corresponding to each cloud database stores, in order of their corresponding backup time points, manually created full backup files, dynamically generated first backup files (which are full backup files), manually created incremental backup files, and log backup files.
When the data of the cloud database as of a specified time point needs to be restored, the full backup file corresponding to the specified time point is taken out of the corresponding backup space to restore the data.
Fig. 7 is a schematic diagram of a first backup space before and after adjustment provided in an embodiment of the application. As shown in Fig. 7:
The large circles represent full backup files.
The left side shows the first backup space when first backup files are not dynamically generated. As can be seen, the first backup space contains few full backup files, so data recovery must rely on a large number of backup files other than full backup files; the required recovery time is longer and data recovery efficiency is lower.
The right side shows the first backup space after first backup files are dynamically generated; each time a first time point is determined, a first backup file is generated. Data recovery therefore relies on only a small number of backup files other than full backup files; the required recovery time is shorter and data recovery efficiency is improved.
In the embodiments of the present application, a first time point is determined when the usage rate of the first backup space is less than the set threshold; a first backup file of the first database is generated and stored in the first backup space. The first backup file is a full backup file whose corresponding backup time point is the first time point, and the first backup space is different from the storage space of the first database. In this way, while the usage rate of the first backup space corresponding to the first database satisfies the condition, as many full backup files as possible can be generated, maximizing the proportion of full backup files in the first backup space; the more full backup files there are, the faster and more efficient data recovery is. In addition, the full backup process in the embodiments of the present application is not performed on the first database, so even if full backups are performed many times, the load on the first database does not increase and its performance is not affected.
In order to implement the method of the embodiment of the present application, the embodiment of the present application further provides a data processing apparatus, and fig. 8 is a schematic diagram of the data processing apparatus provided in the embodiment of the present application, as shown in fig. 8, where the apparatus includes:
a determining unit 801, configured to determine a first time point when the usage rate of the first backup space is less than a set threshold.
a storage unit 802, configured to generate a first backup file of a first database, and store the first backup file in the first backup space; wherein
the first backup file represents a full-scale backup file, and the backup time point corresponding to the first backup file is the first time point; the first backup space is different from a storage space of the first database.
In an embodiment, the determining unit 801 is further configured to determine a second time point when a first number of consecutive second backup files exist in the first backup space, and the consecutive second backup files are all log backup files; the second time point represents a backup time point corresponding to a first second backup file in the continuous second backup files;
the first point in time is determined based on the second point in time.
In an embodiment, the determining unit 801 is further configured to determine, taking the second backup file corresponding to the second time point as a starting point, one third time point every N second backup files among the consecutive second backup files, obtaining at least one third time point;
determine the first time point from the at least one third time point; wherein
N is an integer greater than 1 and less than the first number.
In an embodiment, the determining unit 801 is further configured to determine the last determined third time point as the first time point.
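One possible reading of how the determining unit derives the first time point can be sketched as follows. The sketch is an interpretation, not a statement of the claimed method: it assumes that "every N second backup files" means marking the file N positions after the starting file and every N-th file thereafter, that the first number equals the length of the run of consecutive log backup files, and that time points can be represented as plain values (minutes past the hour here). The function name `determine_first_time_point` is hypothetical.

```python
def determine_first_time_point(log_time_points, n):
    """Return (first time point, third time points) for a run of log backups.

    `log_time_points` holds the backup time points of the consecutive log
    backup files (the second backup files), in order; its first element is
    the second time point. `n` is the interval N.
    """
    first_number = len(log_time_points)
    if not 1 < n < first_number:
        raise ValueError("N must be greater than 1 and less than the first number")
    # Third time points: every N-th file counted from the starting file.
    third_time_points = log_time_points[n::n]
    # The last determined third time point is taken as the first time point.
    return third_time_points[-1], third_time_points


# Hypothetical run of 10 consecutive log backups at minutes 0..9, with N = 3.
first, thirds = determine_first_time_point(list(range(10)), 3)
print(first, thirds)
```

Because N is less than the first number, the slice always contains at least one element, which is consistent with the requirement that at least one third time point is obtained.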
In an embodiment, the apparatus further comprises: a generating unit, configured to determine a fourth time point; the fourth time point represents the backup time point corresponding to a third backup file; the third backup file represents the last full backup file before the first time point;
generate the first backup file based on the third backup file and a second number of fourth backup files; wherein
the second number of fourth backup files represents backup files whose corresponding backup time points lie between the fourth time point and the first time point, and the second number of fourth backup files does not include full backup files.
In an embodiment, the generating unit is further configured to take the third backup file and the second number of fourth backup files out of the first backup space and store the third backup file and the second number of fourth backup files in a newly built second database;
and generating the first backup file by backing up the second database.
In an embodiment, the device further comprises: the clearing unit is used for determining a fifth backup file; the fifth backup file characterizes a first full backup file stored in the first backup space;
and clearing the third number of sixth backup files under the condition that the third number of sixth backup files correspondingly exist before the backup time point corresponding to the fifth backup files.
In practical applications, the determining unit 801, the storage unit 802, the generating unit, and the clearing unit may be implemented by a processor in a terminal, such as a central processing unit (CPU, Central Processing Unit), a digital signal processor (DSP, Digital Signal Processor), a microcontroller unit (MCU, Microcontroller Unit), or a field-programmable gate array (FPGA, Field-Programmable Gate Array).
It should be noted that, when the data processing apparatus provided in the above embodiment performs data processing, the division into the program modules described above is used only as an example. In practical applications, the processing may be allocated to different program modules as needed; that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the data processing apparatus provided in the above embodiment and the data processing method embodiments belong to the same concept; the specific implementation process of the apparatus is detailed in the method embodiments and is not repeated here.
Based on the hardware implementation of the program modules, and in order to implement the method of the embodiment of the application, the embodiment of the application also provides an electronic device. Fig. 9 is a schematic diagram of a hardware composition structure of an electronic device according to an embodiment of the present application, where, as shown in fig. 9, the electronic device includes:
a communication interface 901 capable of information interaction with other devices such as a network device and the like;
a processor 902, connected to the communication interface 901 to exchange information with other devices, and configured to execute, when running a computer program, the method provided by one or more of the technical solutions on the terminal side; the computer program is stored in the memory 903.
Specifically, the processor 902 is configured to determine a first time point when the usage rate of the first backup space is less than a set threshold;
generating a first backup file of a first database, and storing the first backup file into the first backup space; wherein
the first backup file represents a full-scale backup file, and the backup time point corresponding to the first backup file is the first time point; the first backup space is different from a storage space of the first database.
In an embodiment, the processor 902 is further configured to determine a second point in time when there is a first number of consecutive second backup files in the first backup space, and the consecutive second backup files are all log backup files; the second time point represents a backup time point corresponding to a first second backup file in the continuous second backup files;
the first point in time is determined based on the second point in time.
In an embodiment, the processor 902 is further configured to determine, taking the second backup file corresponding to the second time point as a starting point, one third time point every N second backup files among the consecutive second backup files, obtaining at least one third time point;
determine the first time point from the at least one third time point; wherein
N is an integer greater than 1 and less than the first number.
In an embodiment, the processor 902 is further configured to determine the last determined third time point as the first time point.
In an embodiment, the processor 902 is further configured to determine a fourth time point; the fourth time point represents the backup time point corresponding to a third backup file; the third backup file represents the last full backup file before the first time point;
generate the first backup file based on the third backup file and a second number of fourth backup files; wherein
the second number of fourth backup files represents backup files whose corresponding backup time points lie between the fourth time point and the first time point, and the second number of fourth backup files does not include full backup files.
In an embodiment, the processor 902 is further configured to fetch the third backup file and the second number of fourth backup files from the first backup space and store the third backup file and the second number of fourth backup files in a newly created second database;
and generating the first backup file by backing up the second database.
In an embodiment, before the determining the first time point, the processor 902 is further configured to determine a fifth backup file; the fifth backup file characterizes a first full backup file stored in the first backup space;
and clearing the third number of sixth backup files under the condition that the third number of sixth backup files correspondingly exist before the backup time point corresponding to the fifth backup files.
Of course, in actual practice, the various components in the electronic device would be coupled together by bus system 904. It is appreciated that the bus system 904 is used to facilitate connected communications between these components. The bus system 904 includes a power bus, a control bus, and a status signal bus in addition to a data bus. But for clarity of illustration, the various buses are labeled as bus system 904 in fig. 9.
The memory 903 in the embodiment of the present application is used to store various types of data to support the operation of the electronic device. Examples of such data include: any computer program for operating on an electronic device.
It is to be appreciated that the memory 903 can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The nonvolatile memory may be read-only memory (ROM, Read-Only Memory), programmable read-only memory (PROM, Programmable Read-Only Memory), erasable programmable read-only memory (EPROM, Erasable Programmable Read-Only Memory), electrically erasable programmable read-only memory (EEPROM, Electrically Erasable Programmable Read-Only Memory), magnetic random access memory (FRAM, Ferromagnetic Random Access Memory), flash memory (Flash Memory), magnetic surface memory, optical disk, or compact disc read-only memory (CD-ROM, Compact Disc Read-Only Memory); the magnetic surface memory may be a disk memory or a tape memory. The volatile memory may be random access memory (RAM, Random Access Memory), which serves as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static random access memory (SRAM, Static Random Access Memory), synchronous static random access memory (SSRAM, Synchronous Static Random Access Memory), dynamic random access memory (DRAM, Dynamic Random Access Memory), synchronous dynamic random access memory (SDRAM, Synchronous Dynamic Random Access Memory), double data rate synchronous dynamic random access memory (DDR SDRAM, Double Data Rate Synchronous Dynamic Random Access Memory), enhanced synchronous dynamic random access memory (ESDRAM, Enhanced Synchronous Dynamic Random Access Memory), synchronous link dynamic random access memory (SLDRAM, SyncLink Dynamic Random Access Memory), and direct Rambus random access memory (DRRAM, Direct Rambus Random Access Memory). The memory 903 described in the embodiments of the present application is intended to comprise, without being limited to, these and any other suitable types of memory.
The methods disclosed in the embodiments of the present application may be applied to the processor 902 or implemented by the processor 902. The processor 902 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the methods described above may be performed by integrated logic circuitry in hardware or instructions in software in the processor 902. The processor 902 described above may be a general purpose processor, DSP, or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. The processor 902 may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly embodied in a hardware decoding processor or implemented by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium in the memory 903 and the processor 902 reads the program in the memory 903, in combination with its hardware, to perform the steps of the method described above.
The processor 902 implements the respective flows in the respective methods of the embodiments of the present application when executing the program.
In an exemplary embodiment, the present application also provides a storage medium, i.e. a computer storage medium, in particular a computer readable storage medium, for example comprising a memory 903 storing a computer program executable by the processor 902 for performing the steps of the aforementioned method. The computer readable storage medium may be FRAM, ROM, PROM, EPROM, EEPROM, flash Memory, magnetic surface Memory, optical disk, or CD-ROM.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, terminal, and method may be implemented in other manners. The apparatus embodiments described above are only illustrative; for example, the division of the units is only a division by logical function, and other divisions are possible in practice: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
The units described as separate units may or may not be physically separate, and units displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated in one unit; the integrated units may be implemented in hardware or in hardware plus software functional units.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware associated with program instructions, where the foregoing program may be stored in a computer readable storage medium, and when executed, the program performs steps including the above method embodiments; and the aforementioned storage medium includes: a removable storage device, ROM, RAM, magnetic or optical disk, or other medium capable of storing program code.
Alternatively, if the integrated units described above are implemented in the form of software functional modules and sold or used as a stand-alone product, they may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to cause an electronic device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a removable storage device, ROM, RAM, magnetic or optical disk, or other medium capable of storing program code.
The foregoing is merely specific embodiments of the present application, but the scope of protection of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims (10)

1. A method of data processing, the method comprising:
determining a first time point under the condition that the utilization rate of the first backup space is smaller than a set threshold value;
generating a first backup file of a first database, and storing the first backup file into the first backup space; wherein
the first backup file represents a full-scale backup file, and the backup time point corresponding to the first backup file is the first time point; the first backup space is different from a storage space of the first database.
2. The data processing method of claim 1, wherein the determining a first point in time comprises:
determining a second time point when a first number of continuous second backup files exist in the first backup space and are all log backup files; the second time point represents a backup time point corresponding to a first second backup file in the continuous second backup files;
the first point in time is determined based on the second point in time.
3. The data processing method of claim 2, wherein the determining the first point in time based on the second point in time comprises:
taking the second backup file corresponding to the second time point as a starting point, determining one third time point every N second backup files among the consecutive second backup files, and obtaining at least one third time point;
determining the first time point from the at least one third time point; wherein
N is an integer greater than 1 and less than the first number.
4. A data processing method according to claim 3, wherein said determining said first point in time from said at least one third point in time comprises:
and determining the last determined third time point as the first time point.
5. The data processing method of claim 1, wherein generating the first backup file of the first database comprises:
determining a fourth time point; the fourth time point represents the backup time point corresponding to a third backup file; the third backup file represents the last full backup file before the first time point;
generating the first backup file based on the third backup file and a second number of fourth backup files; wherein
the second number of fourth backup files represents backup files whose corresponding backup time points lie between the fourth time point and the first time point, and the second number of fourth backup files does not include full backup files.
6. The data processing method of claim 5, wherein generating the first backup file based on the third backup file and the second number of fourth backup files comprises:
taking the third backup file and the second number of fourth backup files out of the first backup space and storing them in a newly built second database;
and generating the first backup file by backing up the second database.
7. The data processing method of claim 1, wherein prior to said determining a first point in time, the method further comprises:
determining a fifth backup file; the fifth backup file characterizes a first full backup file stored in the first backup space;
and clearing the third number of sixth backup files under the condition that the third number of sixth backup files correspondingly exist before the backup time point corresponding to the fifth backup files.
8. A data processing apparatus, the apparatus comprising:
a determining unit configured to determine a first time point if a usage rate of the first backup space is smaller than a set threshold;
a storage unit configured to generate a first backup file of a first database and store the first backup file into the first backup space; wherein
the first backup file represents a full backup file, and the backup time point corresponding to the first backup file is the first time point; the first backup space is different from a storage space of the first database.
9. An electronic device, comprising: a processor and a memory for storing a computer program capable of running on the processor, wherein,
the processor being adapted to perform the steps of the method of any of claims 1-7 when the computer program is run.
10. A storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the method according to any of claims 1-7.
CN202111619612.5A 2021-12-27 2021-12-27 Data processing method and device, electronic equipment and storage medium Pending CN116361066A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111619612.5A CN116361066A (en) 2021-12-27 2021-12-27 Data processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111619612.5A CN116361066A (en) 2021-12-27 2021-12-27 Data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116361066A true CN116361066A (en) 2023-06-30

Family

ID=86928911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111619612.5A Pending CN116361066A (en) 2021-12-27 2021-12-27 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116361066A (en)

Similar Documents

Publication Publication Date Title
CN108170555B (en) Data recovery method and equipment
CN107844268B (en) Data distribution method, data storage method, related device and system
CN106776130B (en) Log recovery method, storage device and storage node
CN105718548B (en) Based on the system and method in de-duplication storage system for expansible reference management
CN110750382B (en) Minimum storage regeneration code coding method and system for improving data repair performance
US11698728B2 (en) Data updating technology
CN109783014B (en) Data storage method and device
CN104899071A (en) Recovery method and recovery system of virtual machine in cluster
EP3474143B1 (en) Method and apparatus for incremental recovery of data
CN104077380A (en) Method and device for deleting duplicated data and system
US8433864B1 (en) Method and apparatus for providing point-in-time backup images
CN110333971A (en) SSD bad block table backup method, device, computer equipment and storage medium
CN110825559A (en) Data processing method and equipment
CN106708865B (en) Method and device for accessing window data in stream processing system
CN114116321A (en) Redundant data management method and device, computer equipment and storage medium
CN107329699B (en) Erasure rewriting method and system
CN113419897A (en) File processing method and device, electronic equipment and storage medium thereof
CN116361066A (en) Data processing method and device, electronic equipment and storage medium
US10489252B2 (en) Rotating incremental data backup
CN110941597B (en) Method and device for cleaning decompressed file, computing equipment and computer storage medium
CN115756955A (en) Data backup and data recovery method and device and computer equipment
CN111625397B (en) Service log backup method, cluster, device, electronic equipment and storage medium
CN112596959A (en) Distributed storage cluster data backup method and device
CN112650447B (en) Backup method, system and device for ceph distributed block storage
CN111444040B (en) Metadata backup method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination