CN107391306B - Heterogeneous database backup file recovery method - Google Patents


Publication number
CN107391306B
Authority
CN
China
Prior art keywords
data
database
backup
file
field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710622124.7A
Other languages
Chinese (zh)
Other versions
CN107391306A (en
Inventor
刘赛
杨华飞
聂庆节
刘嘉华
刘军
张磊
马悦皎
缪骞云
张翼
张迎星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Nari Information and Communication Technology Co
Nanjing NARI Group Corp
Electric Power Research Institute of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Nari Information and Communication Technology Co
Nanjing NARI Group Corp
Electric Power Research Institute of State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Nari Information and Communication Technology Co, Nanjing NARI Group Corp, Electric Power Research Institute of State Grid Jiangsu Electric Power Co Ltd
Priority to CN201710622124.7A
Publication of CN107391306A
Application granted
Publication of CN107391306B
Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F11/1402: Saving, restoring, recovering or retrying
    • G06F11/1446: Point-in-time backing up or restoration of persistent data
    • G06F11/1458: Management of the backup or restore process
    • G06F11/1469: Backup restoration techniques

Abstract

The invention discloses a heterogeneous database backup file recovery method comprising the following steps: normalizing and converting data in a heterogeneous source database; clustering the data blocks as a preprocessing step with a K-medoids-based DELTA compression algorithm, so that data blocks with high similarity are grouped into one class; compressing the data blocks of each class with the DELTA compression algorithm; and recovering the data with an SQL reproduction method, in which the database version at the recovery end is read from a configuration file, the metadata model is converted according to conversion rules into SQL statements supported by that database version, and the SQL statements are imported into the database after a data consistency check. This provides stable backup and recovery for heterogeneous databases. By extending the mapping rules, the invention can support a variety of source databases; it backs up heterogeneous databases, supports efficient file compression, and reduces backup cost.

Description

Heterogeneous database backup file recovery method
Technical Field
The invention relates to a heterogeneous database backup file recovery method, and belongs to the technical field of database backup and recovery.
Background
In recent years, with the development of information technology, information management systems have come into wide use. Being fast, efficient, and convenient, they have become platforms for publishing information and conducting transactions, and they further advance the digitization and informatization of society as a whole; together, these information systems make up today's "information world".
Every industry depends on data: product data, customer data, financial data, and so on, and the survival and growth of enterprises increasingly depend on IT systems. Computer viruses, network intrusion, physical damage, operator error, and similar causes can damage data on a large scale, leaving an information system unable to provide normal service and causing heavy economic losses in sectors such as banking, electric power, and communications. Protecting data through backup allows local data to be recovered quickly after a failure.
Database backup research is a demand-driven field in which large companies started relatively early, and some backup technologies have been used for a long time in a variety of application environments. Research on backup software abroad began in the mid-1980s, and mature commercial backup products now include Tivoli from IBM, NetVault from BakBone, and BrightStor from CA.
The Software Research Institute of Zhongshan University (Sun Yat-sen University) and Cantonese Network Technologies Ltd. jointly developed NetBunker2, a network backup and recovery product for Linux backup servers. Heartone Backup Enterprise, from the Zhongshan same-direction company, provides distributed backup, implements intelligent backup and recovery, and simplifies server and network storage environments.
In the open source field, backup software has developed vigorously and a large number of capable tools have appeared, including Amanda, Bacula, BackupPC, Restore, and Burt. Although their technology is public, these tools support only the most basic backup tasks and are not well suited to commercial scenarios. Theoretical study of some commercial-grade functions is therefore necessary.
As enterprises grow, their data becomes large in volume, broad in origin, varied in type, and complex in structure. Enterprises accumulate large amounts of business data that are essential to normal operation, and because different database systems were used at different stages, backing up heterogeneous data has become a key problem in the field of data backup. Although large databases such as Oracle and SQL Server provide their own backup and restore tools, these tools support only a single database product and do not solve the heterogeneity problem in the backup process.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a heterogeneous database backup file recovery method, solving the technical problem that heterogeneous data cannot be effectively backed up and recovered in the prior art.
To solve this technical problem, the invention adopts the following technical scheme: a heterogeneous database backup file recovery method comprising the following steps:
(1) Normalizing and converting the data in a heterogeneous source database;
(2) Clustering the data blocks as a preprocessing step, compressing the data blocks of each class with a DELTA compression algorithm to generate the corresponding binary storage files, and writing the compressed backup files to a backup medium;
(3) Recovering the metadata in the backup file with an SQL reproduction method, and reading the database version of the recovery end from the configuration file;
(4) Converting the metadata model into SQL statements supported by the database of the corresponding version according to the conversion rules, performing a data consistency check, and importing the SQL statements into the database, thereby recovering the backup file of the heterogeneous database.
The specific method of step (1) is as follows:
101. Loading a driver: importing the driver into the development environment and loading the driver class (in JDBC this is typically done with Class.forName());
102. Creating a connection: after the driver is loaded, creating a database connection object through the getConnection() function of DriverManager, where the connection URL contains the protocol name, IP address, port number, and database name;
103. Creating a Statement object: creating a Statement object through the createStatement() function of Connection;
104. Executing the SQL statement: executeQuery() is used when an SQL statement produces a single result set, executeUpdate() when no result set is returned, and execute() when multiple result sets may be returned;
105. Obtaining the result: the result set produced by executeQuery() or execute() of Statement is a ResultSet object, whose rows are read by advancing its cursor with the next() function;
106. Loading the conversion rules according to the database type and converting the heterogeneous data into metadata of a unified standard through the excData() function; each element of the metadata contains a key field, the identifier, which is used to check data consistency during recovery, and the identifier is set to 1 if the metadata is changed during the backup process;
107. Writing the obtained data to a file in XML format through the wrtData() function to generate the corresponding backup file;
108. Closing the connection: when the database is no longer in use, closing the database connection with the close() method.
The metadata is the minimum unit of the data model, and its structure is expressed by formula (1):
M=CS+SS (1)
where CS is the content structure, which defines the constituent elements of the metadata and their content, and SS is the syntax structure, which defines the format of the metadata and the specific description method;
The content structure is expressed by formula (2):
CS=(T,Z,S,F) (2)
T represents a source table, that is, the table structure of a multi-source database; it stores the table structure information of the data to be backed up, including the source table sequence number, source table name, identifier, number of fields, field names, and field type information;
Z represents a field, that is, a data value of the multi-source database; it stores the specific values of the fields in a table, including the field sequence number, field name, field type, field value, table name, and identifier;
S represents a preset set, the basic unit of backup, comprising the preset set number, source server, target server, start time, end time, backup sequence number, source table sequence number, and field sequence number; it defines the backup objects and subdivides the backup process into units, so that an interrupted backup task can be continued from the point of interruption;
F represents a constraint; the constraint element describes the field constraint information of a table and records the special column information in the table, including the table name, constraint sequence number, primary key column name, foreign key column name, index column name, and identifier.
The special column information is recorded separately so that the table structure is described completely.
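The content structure CS = (T, Z, S, F) can be pictured as four record types. The following is only a minimal sketch in Python: the class and attribute names are hypothetical and chosen for illustration, and the patent itself stores these elements in XML rather than in Python objects.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class SourceTable:  # T: table structure of the data to be backed up
    seq: int
    name: str
    identifier: int
    field_names: List[str]
    field_types: List[str]

    @property
    def field_count(self) -> int:
        return len(self.field_names)


@dataclass
class FieldValue:  # Z: one concrete field value of a table
    seq: int
    name: str
    type_name: str
    value: Any
    table_name: str
    identifier: int


@dataclass
class PresetSet:  # S: basic unit of backup
    number: int
    source_server: str
    target_server: str
    start_time: str
    end_time: str
    backup_seq: int
    source_table_seqs: List[int] = field(default_factory=list)
    field_seqs: List[int] = field(default_factory=list)


@dataclass
class Constraint:  # F: special-column information of one table
    table_name: str
    seq: int
    primary_key: List[str]
    foreign_key: List[str]
    index: List[str]
    identifier: int


@dataclass
class ContentStructure:  # CS = (T, Z, S, F)
    tables: List[SourceTable] = field(default_factory=list)
    fields: List[FieldValue] = field(default_factory=list)
    preset_sets: List[PresetSet] = field(default_factory=list)
    constraints: List[Constraint] = field(default_factory=list)


# Hypothetical sample data: one table with two fields and one stored value.
cs = ContentStructure()
cs.tables.append(SourceTable(1, "meter", 1, ["id", "label"], ["integer", "character"]))
cs.fields.append(FieldValue(1, "id", "integer", 42, "meter", 1))
```

Keeping the identifier on every T, Z, and F element is what allows the recovery step to decide, element by element, whether anything still needs to be restored.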
The specific method of step (2) is as follows:
201. Segmenting the file to be compressed into blocks of 1 MB, performing Delta compression between every pair of blocks, and storing the size of each Delta-compressed result in a temporary matrix arr_DELTA[N][N], which serves as the similarity between data blocks;
202. Clustering the data blocks with the K-medoids clustering algorithm, using the similarity information stored in the similarity matrix as the clustering basis;
203. Selecting a feature set from the file by a content-independent method, and determining the number of intermediate fingerprints to generate and the file size according to the size of the allocatable memory;
204. Setting the size of a sliding window, moving the window forward continuously, computing the data fingerprint under the window at each position, and mapping the data fingerprints into super features (a super fingerprint set) with a hash function;
205. If the super fingerprints match, searching the feature database for the reference file with the highest similarity and, once it is found, compressing against it with the compression function D;
206. Encoding the ordered symbol string with the compression function D using two commands: the ADD command, of the form (ADD, L, S), adds a character string S of length L at the specified position in V; the COPY command, of the form (COPY, L, O), copies the character string of length L at offset O in R to the specified position in V;
207. Recombining the compressed data blocks into a backup file.
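The ADD and COPY commands of step 206 can be illustrated by the decoding side, which rebuilds V from R and a command list. This is a minimal sketch under stated assumptions: R is taken to be the reference data and V the data being rebuilt, commands are applied strictly in order so the "specified position" is always the current end of V, and the function name apply_delta and the tuple layout are hypothetical.

```python
def apply_delta(reference: bytes, commands) -> bytes:
    """Rebuild the target V from the reference R and a list of ADD/COPY commands."""
    out = bytearray()
    for command in commands:
        op = command[0]
        if op == "ADD":
            # (ADD, L, S): append the literal string S of length L.
            _, length, data = command
            if len(data) != length:
                raise ValueError("ADD length does not match its data")
            out += data
        elif op == "COPY":
            # (COPY, L, O): append L bytes taken from offset O of the reference.
            _, length, offset = command
            if offset < 0 or offset + length > len(reference):
                raise ValueError("COPY range lies outside the reference")
            out += reference[offset:offset + length]
        else:
            raise ValueError(f"unknown command: {op!r}")
    return bytes(out)


# Hypothetical example: reuse "hello " from the reference and add new text.
reference = b"hello world"
delta = [("COPY", 6, 0), ("ADD", 5, b"there")]
rebuilt = apply_delta(reference, delta)
```

The more of V that can be expressed as COPY commands, the smaller the delta, which is why grouping similar blocks before compression pays off.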
The specific method for compressing data blocks of the same class with the DELTA compression algorithm is as follows:
The backup file is partitioned into blocks and the set of data blocks is denoted S = {S1, S2, S3, …, Sn}. The data objects in S are clustered into K classes C = {C1', C2', C3', …, Ck'}, and the similarity between two data blocks is expressed as the DELTA distance between them, namely:
dist(Si,Sj)=delta(Si,Sj) (3)
K data blocks in S are randomly selected as the initial cluster center points, denoted {m1, m2, m3, …, mk}, and each remaining data block is assigned to the cluster with the nearest center point, giving the clusters C = {C1, C2, C3, …, Ck};
For each cluster Ci, i ∈ {1, 2, 3, …, k}, every non-center object Sj in the cluster is traversed, and the total cost between data block Sj and the remaining data blocks Sk in the cluster is calculated with formula (4),
and the point with the minimum total cost in each cluster is selected as the new center point of that cluster; these steps are iterated until no cluster's center point changes, finally yielding the K clusters C = {C1', C2', C3', …, Ck'}.
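Formula (4) is not reproduced in this text. Assuming the usual K-medoids definition, in which the total cost of a candidate center is the sum of its distances to the other members of its cluster, it would plausibly take the following form; this is an assumption built from formula (3) and the surrounding description, not the patent's own notation:

```latex
\mathrm{cost}(S_j) \;=\; \sum_{\substack{S_k \in C_i \\ k \neq j}} \mathrm{dist}(S_j, S_k),
\qquad
m_i' \;=\; \arg\min_{S_j \in C_i} \mathrm{cost}(S_j)
```

Here m_i' denotes the new center point of cluster C_i chosen in the update step.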
The specific method of step (3) is as follows:
301. Reading the type and version number of the database at the recovery end, and loading the corresponding mapping rules according to the database version;
302. Reading the preset set sequence number of the corresponding task from the recovery task information, and looking up the source table sequence numbers, constraint sequence numbers, and field sequence numbers to be recovered according to the preset set sequence number;
303. Looking up the corresponding source table elements and constraint elements in the metadata according to the source table sequence numbers and constraint sequence numbers, and checking their identifiers: if the identifier is 1, executing step 304; otherwise executing step 305;
304. Acquiring the specific information of the source table and its constraints, including the table name, the field names and field types in the table, the primary key, the foreign key, and the index; generating the corresponding SQL statements and storing them in an SQL file; and setting the identifier to 0 once the file has been generated;
305. Acquiring the corresponding field elements according to the field sequence numbers and checking their identifiers: if the identifier is 1, executing step 306; otherwise executing step 307;
306. Acquiring the specific field information, including the field name, field type, field value, and the name of the source table the field belongs to; generating the corresponding INSERT statements from this information to add the data; storing the statements in the SQL file; and setting the identifier to 0 once the file has been generated;
307. Restoring the data to the database by executing the SQL file through a control command.
When the SQL reproduction method is used to recover the metadata in the backup file, the value of the identifier in the metadata file is checked first:
If the identifier is 1, the data has not yet been recovered, and the content in the backup file is converted into SQL statements by applying the syntax mapping rules in reverse;
If the identifier is 0, the content was already restored to the database by a previous recovery task, and no conversion or recovery is needed.
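The identifier check in steps 303 to 307 can be sketched as follows. This is an illustration only: the dictionary layout, the function names, and the grouping of field elements into rows are assumptions, the literal quoting is deliberately minimal, and an in-memory SQLite database stands in for the recovery-end database.

```python
import sqlite3


def sql_literal(value):
    """Render a Python value as an SQL literal (minimal, for this sketch only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def build_restore_sql(tables, rows):
    """Emit SQL only for elements whose identifier is 1, then reset it to 0."""
    statements = []
    for table in tables:  # steps 303-304: table structure and constraints
        if table["identifier"] == 1:
            cols = ", ".join(f"{name} {ctype}" for name, ctype in table["columns"])
            if table.get("primary_key"):
                cols += ", PRIMARY KEY (" + ", ".join(table["primary_key"]) + ")"
            statements.append(f"CREATE TABLE {table['name']} ({cols});")
            table["identifier"] = 0
    for row in rows:  # steps 305-306: field values become INSERT statements
        if row["identifier"] == 1:
            names = ", ".join(row["values"])
            values = ", ".join(sql_literal(v) for v in row["values"].values())
            statements.append(f"INSERT INTO {row['table']} ({names}) VALUES ({values});")
            row["identifier"] = 0
    return statements


# Hypothetical metadata: one pending table, one pending row, one row already restored.
tables = [{"name": "meter", "columns": [("id", "INTEGER"), ("label", "VARCHAR(20)")],
           "primary_key": ["id"], "identifier": 1}]
rows = [{"table": "meter", "values": {"id": 1, "label": "O'Neil"}, "identifier": 1},
        {"table": "meter", "values": {"id": 2, "label": "old"}, "identifier": 0}]

script = build_restore_sql(tables, rows)
conn = sqlite3.connect(":memory:")
conn.executescript("\n".join(script))  # step 307: execute the generated SQL
restored = conn.execute("SELECT id, label FROM meter").fetchall()
conn.close()
```

Because every processed element has its identifier reset to 0, running the same recovery task a second time generates no statements, which matches the rule that content restored by a previous task is skipped.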
Compared with the prior art, the invention has the following beneficial effects:
The invention designs a universal metadata model, defines the mapping rules between that model and the data in the current mainstream databases Oracle, MySQL, and PostgreSQL, normalizes the data into metadata, and stores the metadata in an XML file;
An improved DELTA compression algorithm is provided that deduplicates the backup files and reduces backup cost;
The problem of information silos caused by heterogeneous databases within an enterprise can be solved; a consistent backup framework oriented to enterprise requirements is provided, which improves the utilization of backup media and reduces backup cost;
For a recovery task, the metadata is converted into SQL statements supported by the database of the specified version according to the database configuration, and the data is imported by executing those SQL statements; during recovery, data is restored selectively according to the modification marks in the metadata model, which ensures data consistency.
Drawings
FIG. 1 is a schematic diagram of a backup system hierarchy;
FIG. 2 is a flow chart of the present invention;
FIG. 3 is a heterogeneous data extraction flow diagram;
FIG. 4 is a flow chart of data compression based on K-medoids clustering;
FIG. 5 is a data recovery flow chart.
Detailed Description
The invention provides a heterogeneous database backup file recovery method comprising the following: a metadata model is designed, the data in the heterogeneous source database is normalized and converted, and the metadata model is stored in an XML file; a DELTA compression algorithm based on K-medoids clustering is provided, in which the data blocks are clustered as a preprocessing step so that highly similar blocks fall into one class, and the blocks of each class are compressed with the Delta compression algorithm; the data is recovered with an SQL reproduction method, in which the database version at the recovery end is read from a configuration file, the metadata model is converted according to conversion rules into SQL statements supported by that database version, and the SQL statements are imported into the database after a data consistency check, so that stable backup and recovery of a heterogeneous database are achieved.
The invention is further described below with reference to the accompanying drawings. The following embodiments serve only to illustrate the technical solutions of the invention more clearly and do not limit its scope of protection.
The backup system provides three functions: data extraction, data processing, and data recovery. Data extraction describes the data types of different databases uniformly through the metadata model, and extracts the source data and stores it in a backup file according to the backup task. Data processing compresses the repeated content in the backup file with a compression algorithm to generate the corresponding binary storage file, and writes the compressed backup file to a backup medium. Data recovery converts the metadata in the backup file with the SQL reproduction method into SQL files executable by databases of various versions, and finally imports the data into the database system.
As shown in FIG. 1, the backup system is organized into three layers: a common connection layer, a business layer, and an application layer.
(1) Common connection layer
The common connection layer is the bottom layer of the system. It implements database connectivity, provides database connection and query services to the business layer, and provides encryption and decryption when backing up databases with a higher security level, so that the connections established with heterogeneous data sources are reliable. Connections to the different databases are established mainly through JDBC.
(2) Business layer
The business layer implements the core functions of the system; all the component steps of database backup and recovery are implemented in this layer. Data conversion maps between metadata and database data in both directions, and uses the mapping rules to hide the differences in data formats, constraint rules, and SQL syntax among heterogeneous databases, which is the main difficulty in backing up heterogeneous databases.
The data compression function uses the DELTA compression algorithm based on K-medoids clustering. Its efficiency is double that of the basic DELTA compression algorithm, and backup files can be compressed to about one quarter of their original size, which reduces backup cost while increasing backup speed.
The consistency check function protects data reliability by ensuring that the content of the database after a recovery task is the same as the content of the backup.
These functions depend on one another in the functional flow. In the backup stage, data conversion is performed first, and the converted data is then compressed and stored on the backup medium. In the recovery stage, the compressed file is decompressed back into a data file, the identifiers in the data are checked to determine which content must be recovered, and that content is then converted into SQL statements by the conversion rules and imported into the database.
(3) Application layer
The application layer uses the services provided by the business layer and the common connection layer to solve practical problems, and mainly consists of backup and recovery tasks or plans customized by the user. Its interface is built on Qt, which supports the portability and extensibility of the whole system.
As shown in FIGS. 2 to 5, the heterogeneous database backup file recovery method provided by the invention specifically comprises the following steps:
(1) The data in the heterogeneous source database is normalized and converted as follows:
101. Loading a driver: importing the driver into the development environment and loading the driver class (in JDBC this is typically done with Class.forName());
102. Creating a connection: after the driver is loaded, a database connection object is created through the getConnection() function of DriverManager, for example Connect = DriverManager.getConnection("url", "UserName", "PassWord"). Although the URLs of different databases have different formats, each contains information such as the protocol name, IP address, port number, and database name. UserName and PassWord are the user name and password used to connect to the database management system;
103. Creating a Statement object: a Statement object is created through the createStatement() function of Connection; the Statement class is used mainly to execute SQL statements and obtain the results they produce;
104. Executing the SQL statement: Statement executes SQL statements mainly through executeQuery(), executeUpdate(), and execute(). executeQuery() is used when an SQL statement produces a single result set, executeUpdate() when no result set is returned, and execute() when multiple result sets may be returned;
105. Obtaining the result: the result set produced by executeQuery() or execute() of Statement is a ResultSet object, whose rows are read by advancing its cursor with the next() function;
106. Loading the conversion rules according to the database type and converting the heterogeneous data into metadata of a unified standard through the excData() function; each element of the metadata contains a key field, the identifier, which is used to check data consistency during recovery, and the identifier is set to 1 if the metadata is changed during the backup process;
107. Writing the obtained data to a file in XML format through the wrtData() function to generate the corresponding backup file;
108. Closing the connection: to avoid wasting resources, the database connection is closed with the close() method when the database is no longer in use.
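The flow of steps 101 to 108 can be sketched end to end. The patent describes it with JDBC; the sketch below uses Python's built-in sqlite3 module as a stand-in source database so that it runs without any driver setup. The function name extract_table_to_xml and the XML element and attribute names are hypothetical; the patent's own excData() and wrtData() functions are not specified in enough detail to reproduce.

```python
import sqlite3
import xml.etree.ElementTree as ET


def extract_table_to_xml(conn, table):
    """Read every row of one table and serialise it as XML metadata."""
    cur = conn.cursor()                      # step 103: statement/cursor object
    cur.execute(f"SELECT * FROM {table}")    # step 104: table name is trusted in this sketch
    columns = [d[0] for d in cur.description]
    root = ET.Element("backup")
    # Identifier "1" marks content that has not been restored yet.
    table_el = ET.SubElement(root, "sourceTable", name=table, identifier="1")
    for row_seq, row in enumerate(cur.fetchall(), start=1):  # step 105
        row_el = ET.SubElement(table_el, "row", seq=str(row_seq))
        for col, value in zip(columns, row):
            field_el = ET.SubElement(row_el, "field", name=col, identifier="1")
            field_el.text = "" if value is None else str(value)
    cur.close()
    return ET.tostring(root, encoding="unicode")  # step 107: XML backup content


conn = sqlite3.connect(":memory:")           # steps 101-102: open a connection
conn.execute("CREATE TABLE meter (id INTEGER, label TEXT)")
conn.executemany("INSERT INTO meter VALUES (?, ?)", [(1, "north"), (2, "south")])
backup_xml = extract_table_to_xml(conn, "meter")
conn.close()                                 # step 108: release the connection
```

A real implementation would also record field types through the mapping rules and write the XML to the backup file; both are omitted here to keep the sketch short.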
The data formats of the various heterogeneous databases differ from the metadata format. Today's mainstream databases offer powerful syntax and a rich set of data types; for example, Oracle's basic character types include CHAR, VARCHAR2, NCHAR, and NVARCHAR2. At the same time, a given database system may not support some of the data types of another, so mapping rules are defined for conversion. The data mapping rules, also called the metadata dictionary, are the basis for normalizing heterogeneous data. They are designed around the data types of the heterogeneous source databases, which can be classified by the meaning they express into character, real number, integer, and byte types. The specific mapping relationships are shown in Table 1.
TABLE 1 data type mapping rules
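The contents of Table 1 are not reproduced in this text. As an illustration of what such a metadata dictionary could look like, the sketch below maps a handful of native types onto the four unified categories named above. Only the Oracle character types come from the description; every other entry, and the names TYPE_RULES and to_unified_type, are assumptions made for the example and should not be read as the patent's actual table.

```python
# Illustrative mapping rules: native type -> unified category.
# Only the Oracle character types are taken from the text; the rest are assumed.
TYPE_RULES = {
    "oracle": {
        "CHAR": "character", "VARCHAR2": "character",
        "NCHAR": "character", "NVARCHAR2": "character",
        "BINARY_DOUBLE": "real", "BLOB": "byte",
    },
    "mysql": {
        "VARCHAR": "character", "INT": "integer",
        "DOUBLE": "real", "BLOB": "byte",
    },
    "postgresql": {
        "TEXT": "character", "INTEGER": "integer",
        "DOUBLE PRECISION": "real", "BYTEA": "byte",
    },
}


def to_unified_type(database: str, native_type: str) -> str:
    """Look up the unified category for a native column type such as 'VARCHAR2(30)'."""
    base = native_type.split("(", 1)[0].strip().upper()  # drop length/precision
    try:
        return TYPE_RULES[database.lower()][base]
    except KeyError:
        raise ValueError(f"no mapping rule for {database}:{native_type}") from None
```

Supporting an additional source database then amounts to adding one more dictionary of rules, which is the extension mechanism the description relies on.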
The metadata is the minimum unit of the data model, and its structure is expressed by formula (1):
M=CS+SS (1)
where CS (content structure) defines the constituent elements of the metadata and their content, and SS (syntax structure) defines the format of the metadata and the specific description method;
The content structure is expressed by formula (2):
CS=(T,Z,S,F) (2)
Source table (T): this element represents the table structure of a multi-source database and stores the table structure information of the data to be backed up, including the source table sequence number, source table name, identifier, number of fields, field names, and field type information.
Field (Z): this element represents a data value of the multi-source database and stores the specific values of the fields in a table, including the field sequence number, field name, field type, field value, table name, and identifier.
Preset set (S): defining backup objects subdivides the backup process into units, so that an interrupted backup task can be continued from the point of interruption. This mechanism saves time and improves backup efficiency, helps ensure the consistency of the backup result, and prevents the data redundancy that would result from backing up the same data twice. The preset set is defined as the basic unit of backup and contains the objects to be backed up. It includes the preset set number, source server, target server, start time, end time, backup sequence number, source table sequence number, and field sequence number.
Constraint (F): the constraint element describes the field constraint information of a table and records the special column information in the table, including the table name, constraint sequence number, primary key column name, foreign key column name, index column name, and identifier. Because of its special function, the special column information must be recorded separately so that the table structure is described completely.
(2) Performing clustering preprocessing on the data blocks, compressing data blocks of the same class with a Delta compression algorithm to generate the corresponding binary storage files, and backing up the compressed backup files to a backup medium, wherein the specific method is as follows:
201. Segmenting the file to be compressed using 1 MB as the division unit, performing Delta compression between every pair of the resulting file blocks, storing the size of each Delta-compressed result in a temporary matrix arr_DELTA[N][N], and taking that size as the similarity between data blocks;
202. Clustering the data blocks with the K-medoids clustering algorithm, using the similarity information stored in the similarity matrix as the clustering basis, so that data blocks within the same class are highly similar to one another;
203. Selecting a feature set from the file by a content-independent method, and determining the number of intermediate fingerprints to generate and the file size according to the amount of allocable memory;
204. Setting the size of a sliding window, moving the sliding window forward continuously, and calculating the data fingerprint under the window at each position; to improve retrieval speed and reduce search time, a hash function is used to map the data fingerprints into super features, or a super fingerprint set;
205. Matching super fingerprints indicate that two files are highly similar; searching the feature database for a highly similar reference file and, once it is found, compressing against it according to a compression function D;
206. Encoding the ordered symbol string with the compression function D using two commands: the ADD command, whose format is (ADD, L, S), adds a character string S of length L at the specified position in V; the COPY command, whose format is (COPY, L, O), copies a character string of length L at offset O from R to the specified position in V;
207. Recombining the compressed data blocks into a backup file.
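The ADD and COPY commands of step 206 can be illustrated with a small encoder and decoder. This is a minimal sketch under stated assumptions, not the compression function D of the patent: it assumes that R is the reference file and V is the file being compressed, it finds matches with a simple greedy substring index, and the names `delta_encode`, `delta_decode` and `min_match` are hypothetical.

```python
from typing import Dict, List, Tuple, Union

# ("ADD", L, S) carries literal bytes; ("COPY", L, O) points into the reference.
Command = Union[Tuple[str, int, bytes], Tuple[str, int, int]]


def delta_encode(ref: bytes, ver: bytes, min_match: int = 4) -> List[Command]:
    """Greedily encode `ver` as ADD/COPY commands against `ref`."""
    # Index every min_match-byte substring of the reference by its first offset.
    index: Dict[bytes, int] = {}
    for i in range(len(ref) - min_match + 1):
        index.setdefault(ref[i:i + min_match], i)

    commands: List[Command] = []
    literal = bytearray()          # bytes not found in the reference
    pos = 0
    while pos < len(ver):
        off = index.get(ver[pos:pos + min_match])
        if off is None:
            literal.append(ver[pos])
            pos += 1
            continue
        # Extend the match as far as both inputs agree.
        length = min_match
        while (pos + length < len(ver) and off + length < len(ref)
               and ver[pos + length] == ref[off + length]):
            length += 1
        if literal:                # flush pending literal bytes first
            commands.append(("ADD", len(literal), bytes(literal)))
            literal.clear()
        commands.append(("COPY", length, off))
        pos += length
    if literal:
        commands.append(("ADD", len(literal), bytes(literal)))
    return commands


def delta_decode(ref: bytes, commands: List[Command]) -> bytes:
    """Rebuild the encoded file by replaying the commands against `ref`."""
    out = bytearray()
    for name, length, arg in commands:
        if name == "ADD":
            out += arg[:length]
        else:                      # COPY: arg is the offset in the reference
            out += ref[arg:arg + length]
    return bytes(out)


ref = b"the quick brown fox jumps over the lazy dog"
ver = b"the quick red fox jumps over the lazy cat"
commands = delta_encode(ref, ver)
restored = delta_decode(ref, commands)
```

The fewer literal bytes the ADD commands have to carry, the smaller the encoded result, which is why the size of the Delta output can serve as the similarity measure in step 201.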
The specific method for compressing data blocks of the same class with the Delta compression algorithm is as follows:
Partitioning the backup file into blocks and denoting the set of data blocks as S = {S1, S2, S3, …, Sn}; clustering the data objects in the set S so that the data blocks are divided into K classes C = {C1', C2', C3', …, Ck'}, with the similarity between two data blocks expressed as the Delta distance between them, namely:
dist(Si,Sj)=delta(Si,Sj) (3)
Randomly selecting K data blocks in S as the initial cluster center points, denoted {m1, m2, m3, …, mk}, and assigning the points representing the remaining data blocks to the nearest center point to obtain the clusters C = {C1, C2, C3, …, Ck};
For each cluster Ci, i ∈ {1, 2, 3, …, k}, traversing each non-center object Sj in the cluster and calculating, by formula (4), the total cost between the data block Sj and the remaining data blocks Sk in the cluster,
and selecting the point with the minimum total cost in each cluster as the new center point of that cluster; iterating the above steps until the center point of every cluster no longer changes, finally obtaining the K clusters C = {C1', C2', C3', …, Ck'}.
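The clustering loop above can be sketched over a precomputed distance matrix such as arr_DELTA. Two assumptions are made loudly here. First, formula (4) is not reproduced in this text, so the total cost of a candidate center is taken to be the sum of its distances to the other members of its cluster, which is the usual K-medoids choice. Second, the initial center points are passed in explicitly instead of being drawn at random, so that the example is repeatable. The function name `k_medoids` is hypothetical.

```python
from typing import Dict, List, Sequence


def k_medoids(dist: Sequence[Sequence[float]],
              medoids: List[int],
              max_iter: int = 100) -> Dict[int, List[int]]:
    """Cluster points 0..n-1 given pairwise distances and initial medoids.

    Returns a mapping from each final medoid index to its member indices.
    """
    n = len(dist)
    medoids = list(medoids)
    clusters: Dict[int, List[int]] = {}
    for _ in range(max_iter):
        # Assignment: each point joins the cluster of its nearest medoid.
        clusters = {m: [] for m in medoids}
        for p in range(n):
            nearest = min(medoids, key=lambda m: dist[p][m])
            clusters[nearest].append(p)

        # Update: the member with the minimum total cost becomes the medoid.
        new_medoids = []
        for m, members in clusters.items():
            if not members:        # keep the old medoid if a cluster is empty
                new_medoids.append(m)
                continue
            best = min(members,
                       key=lambda c: sum(dist[c][q] for q in members))
            new_medoids.append(best)

        if sorted(new_medoids) == sorted(medoids):
            return clusters        # medoids no longer change
        medoids = new_medoids
    return clusters


# Six blocks whose pairwise "Delta sizes" form two obvious groups.
positions = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in positions] for a in positions]
clusters = k_medoids(dist, medoids=[0, 1])
```

With real data, `dist[i][j]` would be the Delta-compressed size of block i against block j, so blocks that compress well against each other end up in the same class.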
(3) Restoring the metadata in the backup file by the SQL rendition method: reading the database version of the recovery end according to the configuration file, then applying the conversion rules in reverse to restore the metadata information into SQL statements that the database can recognize, and generating the corresponding SQL files. The specific method is as follows:
301. Reading the type and version number of the database at the recovery end, and loading the corresponding mapping rules according to the database version;
302. Reading the preset set sequence number of the corresponding task according to the recovery task information, and looking up the source table sequence number, constraint sequence number and field sequence number to be recovered according to the preset set sequence number;
303. Looking up the corresponding source table element and constraint element in the metadata according to the source table sequence number and the constraint sequence number, and checking the corresponding identifier: if the identifier is 1, executing step 304; otherwise executing step 305;
304. Acquiring the specific information of the source table and its constraints, including the table name, the field names in the table, the field types, the primary key, the foreign key and the index; generating the corresponding SQL statement and storing it in an SQL file; and setting the identifier to 0 after the file is generated;
305. Acquiring the corresponding field element according to the field sequence number and checking the corresponding identifier: if the identifier is 1, executing step 306; otherwise executing step 307;
306. Acquiring the specific field information, including the field name, field type, field value and the name of the source table to which the field belongs; generating the corresponding INSERT statement from the acquired information to add the data; storing the statement in the SQL file; and setting the identifier to 0 after the file is generated;
307. Executing the SQL file through a control command to restore the data to the database.
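Steps 301 to 307 can be sketched as a function that turns metadata elements into SQL text and clears each identifier once its statement has been generated. This is a minimal sketch under stated assumptions: the dictionary layout, the `TYPE_RULES` mapping and the function names are hypothetical, each field element is assumed to carry the values of one row, and Python's built-in `sqlite3` module stands in for the recovery-end database.

```python
import sqlite3
from typing import Any, Dict, List

# Hypothetical mapping rules: neutral metadata type -> target database type.
TYPE_RULES = {"sqlite": {"INT": "INTEGER", "STRING": "TEXT"}}


def sql_literal(value: Any) -> str:
    """Render a Python value as an SQL literal (numbers, strings, NULL)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def render_sql(metadata: Dict[str, List[dict]], db_type: str) -> List[str]:
    """Generate SQL for every element whose identifier is still 1."""
    rules = TYPE_RULES[db_type]                     # step 301
    statements: List[str] = []
    for table in metadata["tables"]:                # steps 303-304
        if table["identifier"] != 1:
            continue                                # already restored
        cols = [f"{n} {rules[t]}"
                for n, t in zip(table["field_names"], table["field_types"])]
        if table.get("primary_key"):
            cols.append("PRIMARY KEY (" + ", ".join(table["primary_key"]) + ")")
        statements.append(f"CREATE TABLE {table['name']} ({', '.join(cols)});")
        table["identifier"] = 0
    for row in metadata["fields"]:                  # steps 305-306
        if row["identifier"] != 1:
            continue
        names = ", ".join(row["names"])
        values = ", ".join(sql_literal(v) for v in row["values"])
        statements.append(
            f"INSERT INTO {row['table_name']} ({names}) VALUES ({values});")
        row["identifier"] = 0
    return statements


metadata = {
    "tables": [{"name": "meter", "field_names": ["id", "name"],
                "field_types": ["INT", "STRING"], "primary_key": ["id"],
                "identifier": 1}],
    "fields": [
        {"table_name": "meter", "names": ["id", "name"],
         "values": [1, "feeder A"], "identifier": 1},
        # identifier 0: restored by an earlier task, so it is skipped
        {"table_name": "meter", "names": ["id", "name"],
         "values": [2, "feeder B"], "identifier": 0},
    ],
}
statements = render_sql(metadata, "sqlite")
conn = sqlite3.connect(":memory:")
conn.executescript("\n".join(statements))           # step 307
```

Because every identifier is set to 0 as its statement is emitted, calling `render_sql` a second time on the same metadata produces no statements, which is the resume behavior the identifier is meant to provide.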
(4) Converting the metadata model into SQL statements supported by the corresponding database version according to the conversion rules, performing data consistency checking, and importing the SQL statements into the database to complete backup file recovery for the heterogeneous database.
When the SQL rendition method is used to restore the metadata in the backup file, the value of the identifier in the metadata file is checked first:
If the identifier is 1, the data has not yet been recovered, and the content in the backup file is converted into SQL statements by applying the syntax mapping rules in reverse;
If the identifier is 0, the content was already restored to the database by a previous recovery task, and no conversion or restoration is needed.
The above is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and such modifications and variations shall also be regarded as falling within the protection scope of the present invention.

Claims (7)

1. A method for recovering backup files of a heterogeneous database, characterized by comprising the following steps:
(1) Normalizing and converting the data in the heterogeneous source databases;
(2) Performing clustering preprocessing on the data blocks, compressing data blocks of the same class with a Delta compression algorithm to generate the corresponding binary storage files, and backing up the compressed backup files to a backup medium;
(3) Restoring the metadata in the backup file by the SQL rendition method, and reading the database version of the recovery end according to the configuration file, wherein the specific method is as follows:
301. Reading the type and version number of the database at the recovery end, and loading the corresponding mapping rules according to the database version;
302. Reading the preset set sequence number of the corresponding task according to the recovery task information, and looking up the source table sequence number, constraint sequence number and field sequence number to be recovered according to the preset set sequence number;
303. Looking up the corresponding source table element and constraint element in the metadata according to the source table sequence number and the constraint sequence number, and checking the corresponding identifier: if the identifier is 1, executing step 304; otherwise executing step 305;
304. Acquiring the specific information of the source table and its constraints, including the table name, the field names in the table, the field types, the primary key, the foreign key and the index; generating the corresponding SQL statement and storing it in an SQL file; and setting the identifier to 0 after the file is generated;
305. Acquiring the corresponding field element according to the field sequence number and checking the corresponding identifier: if the identifier is 1, executing step 306; otherwise executing step 307;
306. Acquiring the specific field information, including the field name, field type, field value and the name of the source table to which the field belongs; generating the corresponding INSERT statement from the acquired information to add the data; storing the statement in the SQL file; and setting the identifier to 0 after the file is generated;
307. Executing the SQL file to restore the data to the database;
(4) Converting the metadata model into SQL statements supported by the corresponding database version according to the conversion rules, performing data consistency checking, and importing the SQL statements into the database to complete backup file recovery for the heterogeneous database.
2. The method for restoring the backup files of the heterogeneous database according to claim 1, wherein the specific method in step (1) is as follows:
101. Loading a driver: importing a driver into a development environment, and loading the driver through a class.
102. Creating a connection: after the driver is loaded, creating a database connection object through the getConnect() function of DriverManage, wherein the connection object comprises the protocol name, IP address, port number and database name;
103. Creating a Statement object: creating a Statement object through the createStatement() function of Connection;
104. Executing the SQL statement: when the SQL statement produces a single result set, using executeQuery(); when no result is returned, using executeUpdate(); when multiple result sets are returned, using execute();
105. Obtaining the result: when execute() or executeQuery() of the Statement is executed, the returned result is a ResultSet object, and the data in it is read by advancing the cursor of the object with the next() function;
106. Loading the conversion rules according to the database type, and converting the heterogeneous data into metadata of a unified standard through the excData() function, wherein each element in the metadata contains a key field, the identifier, which is used to check data consistency during data recovery; if the metadata is changed during the backup process, the identifier is set to 1;
107. Writing the obtained data into a file in XML format through the wrtData() function to generate the corresponding backup file;
108. Closing the connection: if the database is no longer in use, closing the database connection with the close() method.
3. The method for recovering backup files of a heterogeneous database according to claim 1, wherein the metadata is the minimum unit of the data model, and the metadata structure is expressed by formula (1):
M=CS+SS (1)
wherein CS is the content structure, which refers to the constituent elements of the metadata and their contents, and SS is the syntax structure, which defines the format structure and the specific description method of the metadata;
the content structure is expressed by formula (2):
CS=(T,Z,S,F) (2)
T represents the source table, which is the table structure of the multi-source database and stores the table structure information of the data to be backed up, including the source table sequence number, source table name, identifier, number of fields, field names and field type information;
Z represents the field, which is a data value of the multi-source database and stores the specific value of a field in a table, including the field sequence number, field name, field type, field value, table name and identifier;
S represents the preset set, which is the basic unit of backup and comprises the preset set number, source server, target server, start time, end time, backup sequence number, source table sequence number and field sequence number; the preset set is used to define the backup objects and to subdivide the backup process into units, so that when a backup task is interrupted it can be continued from the point of interruption;
F represents the constraint; the constraint element describes the field constraint information in a table and records the special column information of the table, including the table name, constraint sequence number, primary key column name, foreign key column name, index column name and identifier.
4. The method of claim 3, wherein the special column information is recorded separately to describe the integrity of the table structure.
5. The method for recovering backup files of a heterogeneous database according to claim 1, wherein the specific method in step (2) is as follows:
201. Segmenting the file to be compressed using 1 MB as the division unit, performing Delta compression between every pair of the resulting file blocks, storing the size of each Delta-compressed result in a temporary matrix arr_DELTA[N][N], and taking that size as the similarity between data blocks;
202. Clustering the data blocks with the K-medoids clustering algorithm, using the similarity information stored in the similarity matrix as the clustering basis;
203. Selecting a feature set from the file by a content-independent method, and determining the number of intermediate fingerprints to generate and the file size according to the amount of allocable memory;
204. Setting the size of a sliding window, moving the sliding window forward continuously, calculating the data fingerprint under the window at each position, and mapping the data fingerprints into super features, or a super fingerprint set, with a hash function;
205. If the super fingerprints match, searching the feature database for the reference file with the highest similarity and, once it is found, compressing against it according to a compression function D;
206. Encoding the ordered symbol string with the compression function D using two commands: the ADD command, whose format is (ADD, L, S), adds a character string S of length L at the specified position in V; the COPY command, whose format is (COPY, L, O), copies a character string of length L at offset O from R to the specified position in V;
207. Recombining the compressed data blocks into a backup file.
6. The method for recovering backup files of a heterogeneous database according to claim 5, wherein the specific method for compressing data blocks of the same class with the Delta compression algorithm is as follows:
Partitioning the backup file into blocks and denoting the set of data blocks as S = {S1, S2, S3, …, Sn}; clustering the data objects in the set S so that the data blocks are divided into K classes C = {C1', C2', C3', …, Ck'}, with the similarity between two data blocks expressed as the Delta distance between them, namely:
dist(Si,Sj)=delta(Si,Sj) (3)
Randomly selecting K data blocks in S as the initial cluster center points, denoted {m1, m2, m3, …, mk}, and assigning the points representing the remaining data blocks to the nearest center point to obtain the clusters C = {C1, C2, C3, …, Ck};
For each cluster Ci, i ∈ {1, 2, 3, …, k}, traversing each non-center object Sj in the cluster and calculating, by formula (4), the total cost between the data block Sj and the remaining data blocks Sk in the cluster,
and selecting the point with the minimum total cost in each cluster as the new center point of that cluster; iterating the above steps until the center point of every cluster no longer changes, finally obtaining the K clusters C = {C1', C2', C3', …, Ck'}.
7. The method for recovering backup files of a heterogeneous database according to claim 1, wherein, when the SQL rendition method is used to restore the metadata in the backup file, the value of the identifier in the metadata file is checked first:
If the identifier is 1, the data has not yet been recovered, and the content in the backup file is converted into SQL statements by applying the syntax mapping rules in reverse;
If the identifier is 0, the content was already restored to the database by a previous recovery task, and no conversion or restoration is needed.
CN201710622124.7A 2017-07-27 2017-07-27 Heterogeneous database backup file recovery method Active CN107391306B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710622124.7A CN107391306B (en) 2017-07-27 2017-07-27 Heterogeneous database backup file recovery method

Publications (2)

Publication Number Publication Date
CN107391306A CN107391306A (en) 2017-11-24
CN107391306B true CN107391306B (en) 2019-12-10

Family

ID=60341216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710622124.7A Active CN107391306B (en) 2017-07-27 2017-07-27 Heterogeneous database backup file recovery method

Country Status (1)

Country Link
CN (1) CN107391306B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165260A (en) * 2018-09-25 2019-01-08 安徽信息工程学院 Method of data transfer based on ORACLE data basd link
CN109298976B (en) * 2018-10-17 2022-04-12 成都索贝数码科技股份有限公司 Heterogeneous database cluster backup system and method
CN109271463B (en) * 2018-11-30 2022-06-07 四川巧夺天工信息安全智能设备有限公司 Method for recovering inodb compressed data of MySQL database
CN109614434A (en) * 2018-12-14 2019-04-12 万翼科技有限公司 Data lead-in method, device and computer readable storage medium
CN110515764B (en) * 2019-07-30 2022-12-06 国云科技股份有限公司 System and method for cloud database backup and cross-cloud recovery
CN112685223A (en) * 2019-10-17 2021-04-20 伊姆西Ip控股有限责任公司 File type based file backup
CN110928899B (en) * 2019-11-29 2023-06-20 中孚安全技术有限公司 Universal database backup method and system
CN111427938B (en) * 2020-03-18 2023-08-29 中国建设银行股份有限公司 Data transfer method and device
CN112347189A (en) * 2020-11-05 2021-02-09 江苏电力信息技术有限公司 Cloud computing-based financial data consistency failure discovery and recovery method
CN113806138A (en) * 2021-02-05 2021-12-17 京东科技控股股份有限公司 Backup recovery detection method and device for database, electronic equipment and storage medium
CN112882866B (en) * 2021-02-24 2023-12-15 上海泰宇信息技术股份有限公司 Backup method suitable for mass files
CN115145884A (en) * 2021-03-30 2022-10-04 华为技术有限公司 Data compression method and device
CN114443739A (en) * 2022-04-08 2022-05-06 北京华顺信安科技有限公司 Method and device for extracting product version number
CN115757461B (en) * 2022-11-09 2023-06-23 北京新数科技有限公司 Result clustering method for bank database application system
CN115994056B (en) * 2023-03-24 2023-06-13 无锡芯享信息科技有限公司 Method and system for archiving and recovering relational database

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8612396B1 (en) * 2009-03-31 2013-12-17 Amazon Technologies, Inc. Cloning and recovery of data volumes
CN105160012A (en) * 2015-09-23 2015-12-16 烽火通信科技股份有限公司 Management system and method of heterogeneous database
US9304756B1 (en) * 2005-01-21 2016-04-05 Callwave Communications, Llc Methods and systems for transferring data over a network
CN105868343A (en) * 2016-03-28 2016-08-17 上海携程商务有限公司 Database migration method and system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1217270C (en) * 2002-03-14 2005-08-31 上海网上乐园信息技术有限公司 System for backing up isomerous data in same network and its realization method
CN102426609B (en) * 2011-12-28 2013-02-13 厦门市美亚柏科信息股份有限公司 Index generation method and index generation device based on MapReduce programming architecture
CN103838755A (en) * 2012-11-23 2014-06-04 景幂机械(上海)有限公司 Remote heterogeneous disaster tolerant system of database
CN103198159B (en) * 2013-04-27 2016-01-06 国家计算机网络与信息安全管理中心 A kind of many copy consistency maintaining methods of isomeric group reformed based on affairs
CN105574187B (en) * 2015-12-23 2019-02-19 武汉达梦数据库有限公司 A kind of Heterogeneous Database Replication transaction consistency support method and system

Also Published As

Publication number Publication date
CN107391306A (en) 2017-11-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant