CN107391306A - Heterogeneous database backup file recovery method - Google Patents

Heterogeneous database backup file recovery method

Info

Publication number
CN107391306A
Authority
CN
China
Prior art keywords
data
backup
database
field
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710622124.7A
Other languages
Chinese (zh)
Other versions
CN107391306B (en)
Inventor
刘赛
杨华飞
聂庆节
刘嘉华
刘军
张磊
马悦皎
缪骞云
张翼
张迎星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Nari Information and Communication Technology Co
Nanjing NARI Group Corp
Electric Power Research Institute of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Nari Information and Communication Technology Co
Nanjing NARI Group Corp
Electric Power Research Institute of State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Nari Information and Communication Technology Co, Nanjing NARI Group Corp, Electric Power Research Institute of State Grid Jiangsu Electric Power Co Ltd
Priority to CN201710622124.7A
Publication of CN107391306A
Application granted
Publication of CN107391306B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F11/1402: Saving, restoring, recovering or retrying
    • G06F11/1446: Point-in-time backing up or restoration of persistent data
    • G06F11/1458: Management of the backup or restore process
    • G06F11/1469: Backup restoration techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a heterogeneous database backup file recovery method, comprising: normalizing and converting the data in the heterogeneous source databases; applying a DELTA compression algorithm based on K-medoids clustering, in which the data blocks are first clustered so that blocks with high similarity fall into the same class, and the blocks within each class are then compressed with the DELTA compression algorithm; and restoring the data by the "SQL reproduction method", in which the version of the restore-side database is read from a configuration file, the metadata model is converted according to conversion rules into SQL statements supported by that database version, and the data is imported into the database after a data consistency check, providing stable backup and recovery for heterogeneous databases. By extending the mapping rules, the invention can support multiple source databases and back up heterogeneous databases; it also supports efficient file compression, which reduces backup cost.

Description

Heterogeneous database backup file recovery method
Technical field
The present invention relates to a heterogeneous database backup file recovery method.
Background
In recent years, with the development of information technology, information management systems have become widespread. Being fast, efficient and convenient, they have become platforms for publishing and trading information and have further advanced the digitization and informatization of society as a whole; a wide variety of information systems make up today's "information world".
Every industry depends on data: product data, customer data, financial data and so on, and the survival and growth of enterprises rely more and more on IT systems. Computer viruses, network intrusions, physical damage and operator error can damage information data on a large scale and leave information systems unable to provide normal service; in sectors tied closely to economic interests, such as banking, electric power and communications, this can also cause huge economic losses. Protecting data by means of data backup ensures that local data can be recovered quickly after a failure occurs.
Database backup research is a demand-driven field. Major companies began research in this area early, and some backup techniques have been in use across many application environments for a long time. Research on backup software abroad began in the mid-1980s, and mature commercial backup products to date include Tivoli, NetVault from BakBone, BrightStor from CA, and offerings associated with EMC.
The Software Research Institute of Sun Yat-sen University (Zhongshan University) and GuangZhou WeiTeng Networks Science Co., Ltd jointly developed NetBunker2 for network backup and recovery on Linux backup servers. HeartOne Backup Enterprise, from a company in Zhongshan, provides distributed backup, implements intelligent backup and recovery, and simplifies server and network storage environments.
In the open-source field, backup software is flourishing and a large number of good open-source backup tools have appeared; the better-known ones include Amanda, Bacula, BackupPC, Restore and Burt. Although the technology in open-source software is public, its functionality covers only the most basic backup tasks and does not suit business scenarios. Theoretical research on some commercial-grade functions is therefore needed.
As an enterprise grows, its data becomes large in volume, broad in origin, varied in kind and complex. Enterprises have accumulated large amounts of business data that matter greatly to their normal operation. Because different database systems were adopted at different stages, how to back up heterogeneous data has become a key problem in the data backup field. Although large databases such as Oracle and SQL Server ship with their own backup and restore tools, these tools only support backing up a centralized database and cannot address heterogeneity in the database backup process.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a heterogeneous database backup file recovery method that solves the technical problem that heterogeneous data cannot be backed up and restored effectively in the prior art.
To solve the above technical problem, the technical solution adopted by the invention is a heterogeneous database backup file recovery method comprising the following steps:
(1) normalize and convert the data in the heterogeneous source databases;
(2) perform cluster preprocessing on the data blocks, compress the data blocks of each class with the DELTA compression algorithm, generate the corresponding binary storage file, and store the compressed backup file on the backup medium;
(3) restore the metadata in the backup file using the "SQL reproduction method", reading the restore-side database version from the configuration file;
(4) convert the metadata model, according to the conversion rules, into SQL statements supported by the corresponding database version, and import the data into the database after a data consistency check, thereby recovering the heterogeneous database backup file.
The specific method of step (1) is as follows:
101. Load the driver: import the driver into the development environment and load it with the Class.forName() function;
102. Create a connection: after the driver is loaded, create a database connection object with the DriverManager.getConnection() function; the connection information includes the protocol name, IP address, port number and database name;
103. Create a Statement object: create a Statement object with the Connection.createStatement() function;
104. Execute SQL statements: use executeQuery() when the SQL statement produces a single result set, executeUpdate() when no result is returned, and execute() when multiple result sets are returned;
105. Obtain the results: the results returned by execute() and executeQuery() are ResultSet objects; the data in the returned result is read through the object's cursor using the next() function;
106. Load the conversion rules according to the database type and convert the heterogeneous data into metadata of a unified standard with the excData() function; every element in the metadata contains a key field, the identifier, which is used to check data consistency during data recovery; if the metadata is modified during the backup process, the identifier is set to 1;
107. Write the resulting data to a file in XML format with the wrtData() function, generating the corresponding backup file;
108. Close the connection: when the database is no longer in use, close the database connection with the close() method.
Metadata is the smallest unit of the data model; the metadata structure is expressed by formula (1):
M = CS + SS    (1)
where CS is the content structure, which defines the constituent elements of the metadata and their content, and SS is the syntax structure, which defines the format structure of the metadata and how it is described;
The content structure is expressed by formula (2):
CS = (T, Z, S, F)    (2)
T denotes the source table, that is, the table structure of each database. It stores the table structure information of the data to be backed up, comprising the source table sequence number, source table name, identifier, number of fields, field names and field types;
Z denotes the field, that is, the data values of each database. It stores the concrete values of the fields in a table, comprising the field sequence number, field name, field type, field value, table name and identifier;
S denotes the predetermined set, the basic unit of backup, comprising the predetermined set number, source server, destination server, start time, end time, backup sequence number, source table sequence numbers and field sequence numbers. It defines the objects to be backed up and divides the backup process into units, so that when a backup task is interrupted it can continue from the point of interruption;
F denotes the constraint. The constraint element describes the field constraints in a table and records its special columns, comprising the table name, constraint sequence number, primary key column name, foreign key column name, index column name and identifier.
The special column information is recorded separately so that the table structure can be described completely.
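The content structure CS = (T, Z, S, F) can be pictured as four plain data classes. The following is a minimal sketch only: the class and field names are hypothetical, chosen to mirror the attributes listed above, and the patent does not prescribe any particular in-memory layout.

```java
import java.util.Arrays;
import java.util.List;

public class MetadataModelSketch {
    /** T: source table element, the table structure of the data to back up. */
    static final class SourceTable {
        final int tableSeq;
        final String tableName;
        final List<String> fieldNames;
        final List<String> fieldTypes;
        int identifier; // 1 = still needs restoring, 0 = already restored

        SourceTable(int tableSeq, String tableName, List<String> fieldNames,
                    List<String> fieldTypes, int identifier) {
            this.tableSeq = tableSeq;
            this.tableName = tableName;
            this.fieldNames = fieldNames;
            this.fieldTypes = fieldTypes;
            this.identifier = identifier;
        }

        int fieldCount() { return fieldNames.size(); }
    }

    /** Z: field element, one concrete field value of a table. */
    static final class FieldElement {
        final int fieldSeq;
        final String fieldName, fieldType, fieldValue, tableName;
        int identifier;

        FieldElement(int fieldSeq, String fieldName, String fieldType,
                     String fieldValue, String tableName, int identifier) {
            this.fieldSeq = fieldSeq;
            this.fieldName = fieldName;
            this.fieldType = fieldType;
            this.fieldValue = fieldValue;
            this.tableName = tableName;
            this.identifier = identifier;
        }
    }

    /** F: constraint element, the special columns of a table. */
    static final class Constraint {
        final String tableName;
        final int constraintSeq;
        final String primaryKeyColumn, foreignKeyColumn, indexColumn;
        int identifier;

        Constraint(String tableName, int constraintSeq, String primaryKeyColumn,
                   String foreignKeyColumn, String indexColumn, int identifier) {
            this.tableName = tableName;
            this.constraintSeq = constraintSeq;
            this.primaryKeyColumn = primaryKeyColumn;
            this.foreignKeyColumn = foreignKeyColumn;
            this.indexColumn = indexColumn;
            this.identifier = identifier;
        }
    }

    /** S: predetermined set, the basic unit of backup. */
    static final class PredeterminedSet {
        final int setNumber, backupSeq;
        final String sourceServer, destinationServer;
        final long startTime, endTime;
        final List<Integer> tableSeqs, fieldSeqs;

        PredeterminedSet(int setNumber, int backupSeq, String sourceServer,
                         String destinationServer, long startTime, long endTime,
                         List<Integer> tableSeqs, List<Integer> fieldSeqs) {
            this.setNumber = setNumber;
            this.backupSeq = backupSeq;
            this.sourceServer = sourceServer;
            this.destinationServer = destinationServer;
            this.startTime = startTime;
            this.endTime = endTime;
            this.tableSeqs = tableSeqs;
            this.fieldSeqs = fieldSeqs;
        }
    }

    public static void main(String[] args) {
        SourceTable t = new SourceTable(1, "users",
                Arrays.asList("id", "name"), Arrays.asList("INTEGER", "CHARACTER"), 1);
        System.out.println(t.tableName + " has " + t.fieldCount() + " fields");
    }
}
```

Keeping the identifier mutable while every other attribute is final reflects the fact that the identifier is the only value the method describes as changing over the life of an element.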
The specific method of step (2) is as follows:
201. Split the file to be compressed into blocks, using a file size of 1M as the division unit; perform DELTA compression pairwise between the resulting blocks and store the size of each DELTA-compressed output in a temporary matrix arr_delta[N][N], which serves as the similarity between data blocks;
202. Using the similarity information held in the similarity matrix as the basis for clustering, cluster the data blocks with the K-medoids clustering algorithm;
203. Select a feature set from the file using a content-independent method, and determine the number of intermediate fingerprints to produce and the file size according to the amount of memory that can be allocated;
204. Set the size of a sliding window, move the window forward continuously, compute the fingerprint of the data under the window, and map the fingerprints with a hash function into super features, or a super fingerprint set;
205. If the super fingerprints match, search the feature database for the reference file with the highest similarity; once the reference file is found, compress according to compression function D;
206. Compression function D encodes the input as a sequence of commands. The ADD command has the format (ADD, L, S) and adds the string S of length L at the specified position in V; the COPY command has the format (COPY, L, O) and copies the string of length L at offset O in R to the specified position in V;
207. Merge the compressed data blocks back into the backup file.
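Steps 204 and 205 can be illustrated with a rolling hash. The following is a minimal sketch under stated assumptions: the polynomial rolling hash, its constants, and the choice of the k smallest distinct window fingerprints as the feature set are illustrative choices, not taken from the patent, which names neither the fingerprint function nor the hash function.

```java
import java.nio.charset.StandardCharsets;
import java.util.TreeSet;

public class SuperFingerprintSketch {
    private static final long BASE = 257L;            // assumed hash base
    private static final long MOD = 1_000_000_007L;   // assumed hash modulus

    /** Computes a rolling polynomial fingerprint for every window position. */
    static long[] windowFingerprints(byte[] data, int window) {
        if (data.length < window) return new long[0];
        long[] out = new long[data.length - window + 1];
        long high = 1; // BASE^(window-1) mod MOD, weight of the byte leaving the window
        for (int i = 1; i < window; i++) high = high * BASE % MOD;
        long h = 0;
        for (int i = 0; i < window; i++) h = (h * BASE + (data[i] & 0xff)) % MOD;
        out[0] = h;
        for (int i = window; i < data.length; i++) {
            // Slide the window one byte: drop data[i - window], append data[i].
            h = (h - (data[i - window] & 0xff) * high % MOD + MOD) % MOD;
            h = (h * BASE + (data[i] & 0xff)) % MOD;
            out[i - window + 1] = h;
        }
        return out;
    }

    /** Hashes the k smallest distinct window fingerprints into one super fingerprint. */
    static long superFingerprint(byte[] data, int window, int k) {
        TreeSet<Long> distinct = new TreeSet<>();
        for (long f : windowFingerprints(data, window)) distinct.add(f);
        long sf = 17;
        int taken = 0;
        for (long f : distinct) {
            if (taken++ == k) break;
            sf = sf * 31 + f;
        }
        return sf;
    }

    public static void main(String[] args) {
        byte[] a = "backup block one, backup block one".getBytes(StandardCharsets.UTF_8);
        byte[] b = "backup block one, backup block one".getBytes(StandardCharsets.UTF_8);
        System.out.println(superFingerprint(a, 8, 4) == superFingerprint(b, 8, 4));
    }
}
```

Two files whose super fingerprints are equal become candidates for DELTA compression against each other; comparing one value per file is what keeps the lookup in the feature database cheap.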
The specific method of compressing the data blocks of each class with the DELTA compression algorithm is as follows:
Split the backup file into blocks and denote the set of data blocks S = {S1, S2, S3 ... Sn}. The data objects in S are clustered, dividing the data blocks into K classes C' = {C1', C2', C3' ... Ck'}. The similarity between two data blocks is expressed as the DELTA distance between them, namely:
Dist(Si, Sj) = delta(Si, Sj)    (3)
Arbitrarily select K data blocks in S as the cluster centers, denoted {m1, m2, m3 ... mk}; assign each remaining data block to the cluster whose center is nearest to it, obtaining the clusters C = {C1, C2, C3 ... Ck};
For each cluster Ci, i ∈ {1, 2, 3 ... k}, traverse every non-center object Sj in the cluster and use formula (4) to compute the total cost between each data block Sj in the cluster and the remaining data blocks Sk;
Select the point with the minimum total cost in the cluster as the new cluster center and repeat the above steps until the center of every cluster no longer changes, finally obtaining the K clusters C' = {C1', C2', C3' ... Ck'}.
The specific method of step (3) is as follows:
301. Read the type and version number of the restore-side database, and load the corresponding mapping rules according to the database version;
302. Read the predetermined set sequence number of the task from the recovery task information, and use it to look up the source table sequence numbers, constraint sequence numbers and field sequence numbers to be restored;
303. Look up the corresponding source table elements and constraint elements in the metadata by source table sequence number and constraint sequence number, and check the corresponding identifier: if the identifier is 1, perform step 304; otherwise perform step 305;
304. Obtain the details of the source table and its dependencies, including the table name, the field names in the table, field types, primary key, foreign keys and indexes; once these are obtained, generate the corresponding SQL statements and store them in a .sql file, and set the identifier to 0 after the file has been generated;
305. Obtain the corresponding field elements by field sequence number and check the corresponding identifier: if the identifier is 1, perform step 306; otherwise perform step 307;
306. Obtain the details of the field, including the field name, field type, field value and the name of the source table the field belongs to; from this information generate the corresponding INSERT statements that add the data, store them in the .sql file, and set the identifier to 0 after the file has been generated;
307. Invoke the control command and restore the data to the database by executing the .sql file.
When the metadata in the backup file is restored using the "SQL reproduction method", the identifier value in the metadata file is checked first:
If the identifier is 1, the data has not yet been restored, and the syntax mapping rules are applied in reverse to convert the backup file content into SQL statements;
If the identifier is 0, the content was already restored to the database by an earlier recovery task and does not need to be converted and restored again.
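The identifier check in steps 305 and 306 can be sketched as follows. This is an illustration under assumptions rather than the patent's implementation: the FieldElement class is hypothetical, every value is emitted as a quoted string literal, and one single-column INSERT is generated per field element, which a real restore would likely group by row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SqlReproductionSketch {
    /** Hypothetical field element; identifier 1 = not yet restored, 0 = restored. */
    static final class FieldElement {
        final String tableName, fieldName, fieldValue;
        int identifier;

        FieldElement(String tableName, String fieldName, String fieldValue, int identifier) {
            this.tableName = tableName;
            this.fieldName = fieldName;
            this.fieldValue = fieldValue;
            this.identifier = identifier;
        }
    }

    /** Builds one INSERT statement, doubling single quotes inside the value. */
    static String toInsert(FieldElement f) {
        String escaped = f.fieldValue.replace("'", "''");
        return "INSERT INTO " + f.tableName + " (" + f.fieldName + ") VALUES ('" + escaped + "');";
    }

    /** Emits SQL only for elements whose identifier is 1, then clears the identifier. */
    static List<String> restore(List<FieldElement> elements) {
        List<String> script = new ArrayList<>();
        for (FieldElement f : elements) {
            if (f.identifier == 1) {
                script.add(toInsert(f));
                f.identifier = 0; // a later recovery task will skip this element
            }
        }
        return script;
    }

    public static void main(String[] args) {
        List<FieldElement> elements = Arrays.asList(
                new FieldElement("users", "name", "O'Brien", 1),
                new FieldElement("users", "name", "Alice", 0));
        for (String line : restore(elements)) System.out.println(line);
    }
}
```

Because the identifier is cleared as each statement is generated, running the same recovery task twice produces an empty script the second time, which is the selective-restore behaviour the method relies on for consistency.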
Compared with the prior art, the beneficial effects achieved by the invention are:
The invention designs a general metadata model and defines mapping rules between this model and the data in the current mainstream databases Oracle, MySQL and PostgreSQL; data is normalized into metadata and then stored in an XML file;
An improved DELTA compression algorithm is proposed that deduplicates the backup file and reduces backup cost;
The "information island" problem caused by heterogeneous databases within an enterprise can be overcome, and a consistent backup framework oriented to enterprise needs is provided, while the utilization of the backup medium is improved and backup cost is reduced;
For a recovery task, the metadata is converted, according to the database configuration, into SQL statements supported by the specified database version, and the data is imported into the database by executing those SQL statements; during recovery, data is restored selectively according to the modification marks in the source data model, which ensures data consistency.
Brief description of the drawings
Fig. 1 is a schematic diagram of the layered structure of the backup system;
Fig. 2 is a flow chart of the invention;
Fig. 3 is a flow chart of heterogeneous data extraction;
Fig. 4 is a flow chart of data compression based on K-medoids clustering;
Fig. 5 is a flow chart of data recovery.
Detailed description of the embodiments
The invention provides a heterogeneous database backup file recovery method, comprising: designing a metadata model, normalizing and converting the data in the heterogeneous source databases, and storing the metadata model in an XML file; proposing a DELTA compression algorithm based on K-medoids clustering, in which the data blocks are first clustered so that blocks with high similarity fall into the same class, and the blocks within each class are then compressed with the DELTA compression algorithm; and restoring the data by the "SQL reproduction method", in which the restore-side database version is read from a configuration file, the metadata model is converted according to conversion rules into SQL statements supported by that database version, and the data is imported into the database after a data consistency check, providing stable backup and recovery for heterogeneous databases.
The invention is further described below with reference to the accompanying drawings. The following embodiments serve only to illustrate the technical solution of the invention more clearly and do not limit its scope of protection.
The backup system provides three functions: data extraction, data processing and data recovery. Data extraction uses the metadata model to describe the data types of different databases in a unified way, and extracts the source data according to the backup task and stores it in the backup file. Data processing uses the compression algorithm to compress duplicate content in the backup file, generates the corresponding binary storage file, and stores the compressed backup file on the backup medium. Data recovery uses the SQL reproduction method to convert the metadata in the backup file, generates a .sql file that the database of each version can execute, and finally imports the data into the database system to complete the recovery.
As shown in Fig. 1, the backup system has a three-layer structure: the common connection layer, the business layer and the application layer.
(1) Common connection layer
The common connection layer sits at the bottom of the system and implements the database connection function. It provides database connection and query services to the business layer, and also provides encryption and decryption when databases with a higher security level are backed up, guaranteeing a reliable connection to the heterogeneous data sources. Connections to the different databases are established mainly through JDBC.
(2) Business layer
The business layer implements the core functions of the system; every link of database backup and recovery is implemented in this layer. Data conversion maps between metadata and database data, using the mapping rules to mask the differences in data format, constraint rules and SQL syntax among heterogeneous databases; this is one of the difficulties of heterogeneous database backup.
The data compression function uses the DELTA compression algorithm based on K-medoids clustering. It doubles the efficiency of the basic DELTA compression algorithm and can compress a backup file to about a quarter of its original size, increasing backup speed while reducing backup cost.
The consistency check function protects data reliability, ensuring that after a recovery task runs the content of the database is identical to the content at backup time.
These functions depend on one another in the workflow. In the backup stage, data conversion is performed first, and the converted file is then compressed and stored on the backup medium. In the recovery stage, the compressed file is restored to a data file by reversing the compression; the identifiers in the data are checked to determine which content to restore, and that content is converted by the conversion rules into SQL statements and imported into the database.
(3) Application layer
The application layer uses the services provided by the business layer and the common connection layer to solve practical problems, mainly user-defined backup and recovery tasks or backup and recovery plans. The interface of this layer is built on Qt, which ensures the portability and extensibility of the whole system.
As shown in Figs. 2-5, the heterogeneous database backup file recovery method provided by the invention specifically comprises the following steps:
(1) Normalize and convert the data in the heterogeneous source databases; the specific method is as follows:
101. Load the driver: import the driver into the development environment and load it with the Class.forName() function;
102. Create a connection: after the driver is loaded, create a database connection object with the DriverManager.getConnection() function, for example Connection connect = DriverManager.getConnection("url", "UserName", "PassWord"). Although the url has a different format for each database, it should contain the protocol name, IP address, port number and database name. UserName and PassWord are the user name and password used to connect to the database management system;
103. Create a Statement object: create a Statement object with the Connection.createStatement() function; the Statement class is mainly used to execute SQL statements and obtain the results they produce;
104. Execute SQL statements: Statement offers three main methods for executing SQL, namely executeQuery(), executeUpdate() and execute(). Use executeQuery() when the SQL statement produces a single result set, executeUpdate() when no result is returned, and execute() when multiple result sets are returned;
105. Obtain the results: the results returned by execute() and executeQuery() are ResultSet objects; the data in the returned result is read through the object's cursor using the next() function;
106. Load the conversion rules according to the database type and convert the heterogeneous data into metadata of a unified standard with the excData() function; every element in the metadata contains a key field, the identifier, which is used to check data consistency during data recovery; if the metadata is modified during the backup process, the identifier is set to 1;
107. Write the resulting data to a file in XML format with the wrtData() function, generating the corresponding backup file;
108. Close the connection: to avoid wasting resources, close the database connection with the close() method when the database is no longer in use.
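Steps 101 to 105 and 108 map directly onto the standard java.sql API. The following is a minimal sketch, assuming a URL of the form jdbc:protocol://host:port/database (used by MySQL and PostgreSQL; other databases use different formats, as noted above). The class name and the buildUrl and extract helpers are hypothetical; the patent's excData() and wrtData() functions are not reproduced here.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class JdbcExtractSketch {
    /** Assembles protocol name, IP address, port number and database name into a URL. */
    static String buildUrl(String protocol, String host, int port, String database) {
        return "jdbc:" + protocol + "://" + host + ":" + port + "/" + database;
    }

    /** Loads the driver, runs one query and returns every row as a list of strings. */
    static List<List<String>> extract(String driverClass, String url, String user,
                                      String password, String query)
            throws ClassNotFoundException, SQLException {
        Class.forName(driverClass); // step 101: load the driver
        List<List<String>> rows = new ArrayList<>();
        // Steps 102-104: connection, Statement, single-result-set query.
        // try-with-resources performs step 108 (close) even if an exception is thrown.
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(query)) {
            ResultSetMetaData meta = rs.getMetaData();
            while (rs.next()) { // step 105: advance the cursor row by row
                List<String> row = new ArrayList<>();
                for (int c = 1; c <= meta.getColumnCount(); c++) {
                    row.add(rs.getString(c));
                }
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Only the URL is built here; extract() needs a reachable database and its driver.
        System.out.println(buildUrl("mysql", "127.0.0.1", 3306, "backup_src"));
    }
}
```

Using try-with-resources rather than an explicit close() call is a design choice of this sketch: it releases the connection on every exit path, which serves the same purpose as step 108.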
The data format of each heterogeneous database differs to some degree from the metadata format. Current database data formats offer rich data types together with powerful syntax; Oracle's basic character types, for example, include CHAR, VARCHAR2, NCHAR and NVARCHAR2. In addition, some data types are not supported by every database system, so mapping rules are defined to convert them. The data mapping rules, also called the metadata dictionary, are the basis for normalizing heterogeneous data. The mapping rules are designed from the data types in the heterogeneous source databases; according to the meaning each type is meant to express, the types are classified as character, real, integer and byte types. The specific mapping relations are shown in Table 1.
Table 1: Data type mapping rules
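A mapping rule of this kind can be sketched as a lookup table from source type names to the four metadata categories. In the sketch below only the four Oracle character types come from the text; the NUMBER, INTEGER and BLOB entries, the enum, and the method name are assumptions added for illustration, since the contents of Table 1 are not available here.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class TypeMappingSketch {
    /** The four metadata categories named in the description. */
    enum MetaType { CHARACTER, REAL, INTEGER, BYTE }

    private static final Map<String, MetaType> ORACLE_RULES = new HashMap<>();
    static {
        // Character types listed in the description.
        ORACLE_RULES.put("CHAR", MetaType.CHARACTER);
        ORACLE_RULES.put("VARCHAR2", MetaType.CHARACTER);
        ORACLE_RULES.put("NCHAR", MetaType.CHARACTER);
        ORACLE_RULES.put("NVARCHAR2", MetaType.CHARACTER);
        // Assumed entries for illustration only; not taken from Table 1.
        ORACLE_RULES.put("NUMBER", MetaType.REAL);
        ORACLE_RULES.put("INTEGER", MetaType.INTEGER);
        ORACLE_RULES.put("BLOB", MetaType.BYTE);
    }

    /** Maps an Oracle column type such as "VARCHAR2(64)" to its metadata category. */
    static MetaType mapOracle(String sourceType) {
        String base = sourceType.trim().toUpperCase(Locale.ROOT);
        int paren = base.indexOf('(');
        if (paren >= 0) base = base.substring(0, paren).trim(); // drop length/precision
        MetaType type = ORACLE_RULES.get(base);
        if (type == null) {
            throw new IllegalArgumentException("No mapping rule for type: " + sourceType);
        }
        return type;
    }

    public static void main(String[] args) {
        System.out.println(mapOracle("varchar2(64)"));
    }
}
```

Supporting another source database then amounts to adding one more rule table, which is how the method extends to new databases by expanding the mapping rules.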
Metadata is the smallest unit of the data model; the metadata structure is expressed by formula (1):
M = CS + SS    (1)
where CS (Content Structure) is the content structure, which defines the constituent elements of the metadata and their content, and SS (Syntax Structure) is the syntax structure, which defines the format structure of the metadata and how it is described;
The content structure is expressed by formula (2):
CS = (T, Z, S, F)    (2)
Source table (T): this element represents the table structure of each database and stores the table structure information of the data to be backed up, comprising the source table sequence number, source table name, identifier, number of fields, field names and field types.
Field (Z): this element represents the data values of each database and stores the concrete values of the fields in a table, comprising the field sequence number, field name, field type, field value, table name and identifier.
Predetermined set (S): defines the objects to be backed up and divides the backup process into units, so that when a backup task is interrupted it can continue from the point of interruption. This mechanism saves time and improves backup efficiency, and it also helps keep the backup result consistent by preventing data that has already been backed up from being backed up again and causing redundancy. The predetermined set is defined as the basic unit of backup and contains the objects to be backed up. Its members are the predetermined set number, source server, destination server, start time, end time, backup sequence number, source table sequence numbers and field sequence numbers.
Constraint (F): the constraint element describes the field constraints in a table and records its special columns, comprising the table name, constraint sequence number, primary key column name, foreign key column name, index column name and identifier. Because of their special function, the special columns must be recorded separately so that the table structure can be described completely.
(2) Perform cluster preprocessing on the data blocks, compress the data blocks of each class with the DELTA compression algorithm, generate the corresponding binary storage file, and store the compressed backup file on the backup medium; the specific method is as follows:
201. Split the file to be compressed into blocks, using a file size of 1M as the division unit; perform DELTA compression pairwise between the resulting blocks and store the size of each DELTA-compressed output in a temporary matrix arr_delta[N][N], which serves as the similarity between data blocks;
202. Using the similarity information held in the similarity matrix as the basis for clustering, cluster the data blocks with the K-medoids clustering algorithm; the clustering result ensures that the data blocks within one class are highly similar to each other;
203. Select a feature set from the file using a content-independent method, and determine the number of intermediate fingerprints to produce and the file size according to the amount of memory that can be allocated;
204. Set the size of a sliding window, move the window forward continuously, and compute the fingerprint of the data under the window. To speed up retrieval and shorten lookup time, the fingerprints are mapped with a hash function into super features, or a super fingerprint set;
205. If the super fingerprints match, the two files are highly similar. Search the feature database for a highly similar reference file; once the reference file is found, compress according to compression function D;
206. Compression function D encodes the input as a sequence of commands. The ADD command has the format (ADD, L, S) and adds the string S of length L at the specified position in V; the COPY command has the format (COPY, L, O) and copies the string of length L at offset O in R to the specified position in V;
207. Merge the compressed data blocks back into the backup file.
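The ADD and COPY commands of step 206 are easiest to follow from the decoding side, where a version V is rebuilt from a reference R. The following is a minimal sketch with hypothetical class and method names; it works on strings for readability, and it appends each command's output at the current end of V, which is an assumed reading of "the specified position in V".

```java
import java.util.Arrays;
import java.util.List;

public class DeltaDecodeSketch {
    /** One delta command: either (ADD, L, S) or (COPY, L, O). */
    static final class Command {
        final boolean isCopy;
        final int length;     // L
        final int offset;     // O, used by COPY only
        final String literal; // S, used by ADD only

        private Command(boolean isCopy, int length, int offset, String literal) {
            this.isCopy = isCopy;
            this.length = length;
            this.offset = offset;
            this.literal = literal;
        }

        static Command add(String s) { return new Command(false, s.length(), 0, s); }

        static Command copy(int length, int offset) { return new Command(true, length, offset, null); }
    }

    /** Rebuilds V by replaying the commands against the reference R. */
    static String apply(String reference, List<Command> delta) {
        StringBuilder v = new StringBuilder();
        for (Command c : delta) {
            if (c.isCopy) {
                // COPY: take L characters starting at offset O in R.
                v.append(reference, c.offset, c.offset + c.length);
            } else {
                // ADD: insert the literal string S carried by the command itself.
                v.append(c.literal);
            }
        }
        return v.toString();
    }

    public static void main(String[] args) {
        String r = "hello world";
        List<Command> delta = Arrays.asList(
                Command.copy(6, 0), Command.add("brave "), Command.copy(5, 6));
        System.out.println(apply(r, delta));
    }
}
```

Only the ADD commands carry new bytes, so the more of V that can be expressed as COPY ranges of R, the smaller the delta; this is why clustering similar blocks together before compression pays off.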
The specific method for compressing data blocks of the same class with the DELTA compression algorithm is as follows:
Partition the backup file into blocks and denote the set of data blocks S = {S1, S2, S3, ..., Sn}. Cluster the data objects in S so that the data blocks are divided into K classes C' = {C1', C2', C3', ..., Ck'}. The similarity between two data blocks is expressed as their DELTA distance, i.e.:
dist(Si, Sj) = delta(Si, Sj) (3)
Arbitrarily select K data blocks in S as the center points of the clusters, denoted {m1, m2, m3, ..., mk}, and assign each remaining data block to the cluster whose center is nearest to it, obtaining the clusters C = {C1, C2, C3, ..., Ck};
For each cluster Ci, i ∈ {1, 2, 3, ..., k}, traverse every non-center object Sj in the cluster and use formula (4) to compute the total cost between data block Sj and the remaining data blocks Sk of the cluster:
$$\mathrm{cost} = \sum_{k=1}^{l_i} \mathrm{dist}(s_j, s_k) = \sum_{k=1}^{l_i} \mathrm{delta}(s_j, s_k) \tag{4}$$
Select the point with the minimum total cost in each cluster as the new center of that cluster, and iterate the above steps until the center point of every cluster no longer changes, finally obtaining the K clusters C' = {C1', C2', C3', ..., Ck'}.
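The iteration described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the pairwise distances are taken to be precomputed in a square matrix `dist` (playing the role of the DELTA distances of formula (3)), and the names `assign`, `k_medoids`, `seed` and `max_iter` are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def assign(dist: Sequence[Sequence[float]], medoids: List[int]) -> Dict[int, List[int]]:
    """Assign every block index to the cluster of its nearest center point."""
    clusters: Dict[int, List[int]] = {m: [] for m in medoids}
    for i in range(len(dist)):
        # A center always stays in its own cluster, so no cluster becomes empty.
        owner = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
        clusters[owner].append(i)
    return clusters


def k_medoids(dist: Sequence[Sequence[float]], k: int,
              seed: int = 0, max_iter: int = 100) -> Dict[int, List[int]]:
    """Cluster block indices 0..n-1 into k clusters, keyed by center index."""
    rng = random.Random(seed)
    # Arbitrary initial centers, as in the text.
    medoids = sorted(rng.sample(range(len(dist)), k))
    for _ in range(max_iter):
        clusters = assign(dist, medoids)
        # Formula (4): the new center is the member with the minimum total
        # distance to the other members of its cluster.
        new_medoids = sorted(
            min(members, key=lambda j: sum(dist[j][q] for q in members))
            for members in clusters.values()
        )
        if new_medoids == medoids:
            break  # the centers no longer change
        medoids = new_medoids
    return assign(dist, medoids)
```

Because only the distance matrix is consulted, the same routine works whether the distance is a delta-compressed size or any other dissimilarity measure.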
(3) Restore the metadata in the backup file using the "SQL reproduction method": read the restore-side database version from the configuration file, then use the conversion rules to reverse-convert the metadata information into SQL statements that the database can recognize, and generate the corresponding .sql file. The specific method is as follows:
301. Read the restore-side database type and version number, and load the corresponding mapping rules according to the database version;
302. Read the predetermined-set sequence number of the corresponding task from the recovery task information, and use it to look up the source table sequence numbers, constraint sequence numbers and field sequence numbers to be restored;
303. Look up the corresponding source table elements and constraint elements in the metadata by source table sequence number and constraint sequence number, and check the corresponding identifier: if the identifier is 1, perform step 304; otherwise perform step 305;
304. Obtain the details of the source table and its dependencies, including the table name, the field names in the table, field types, primary key, foreign keys and indexes. Once these are obtained, generate the corresponding SQL statements and store them in the .sql file; after the file has been generated, set the identifier to 0;
305. Obtain the corresponding field elements by field sequence number and check the corresponding identifier: if the identifier is 1, perform step 306; otherwise perform step 307;
306. Obtain the field details, including field name, field type, field value and the name of the source table the field belongs to. From this information generate the corresponding INSERT statements to add the data, and store them in the .sql file; after the file has been generated, set the identifier to 0;
307. Invoke the control command and restore the data to the database by executing the .sql file.
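Steps 301 to 307 can be sketched in Python as follows. Everything concrete here is an assumption made for illustration: the metadata is modeled as plain dictionaries rather than the XML backup file, field values are grouped into rows, `TYPE_RULES` is a hypothetical two-entry mapping table, and an in-memory SQLite database stands in for the restore-side database. The sketch only shows the identifier check (1 means not yet restored, 0 means already restored) and the generation of the SQL script.

```python
import sqlite3

# Hypothetical mapping rules: unified metadata type -> target database type.
TYPE_RULES = {
    "sqlite": {"INT": "INTEGER", "STRING": "TEXT"},
    "mysql": {"INT": "INT", "STRING": "VARCHAR(255)"},
}


def table_sql(table, rules):
    """Step 304: build a CREATE TABLE statement from a source table element."""
    cols = [f"{name} {rules[typ]}" for name, typ in table["fields"]]
    if table.get("primary_key"):
        cols.append(f"PRIMARY KEY ({table['primary_key']})")
    return f"CREATE TABLE {table['name']} ({', '.join(cols)});"


def sql_literal(value):
    """Render a Python value as an SQL literal, doubling single quotes."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def row_sql(row):
    """Step 306: build an INSERT statement from stored field values."""
    names = ", ".join(row["values"])
    vals = ", ".join(sql_literal(v) for v in row["values"].values())
    return f"INSERT INTO {row['table']} ({names}) VALUES ({vals});"


def reproduce_sql(metadata, target):
    """Emit SQL only for elements whose identifier is 1, then reset it to 0."""
    rules = TYPE_RULES[target]  # step 301: load rules for the target database
    statements = []
    for table in metadata["tables"]:  # steps 303-304
        if table["identifier"] == 1:
            statements.append(table_sql(table, rules))
            table["identifier"] = 0
    for row in metadata["rows"]:  # steps 305-306
        if row["identifier"] == 1:
            statements.append(row_sql(row))
            row["identifier"] = 0
    return "\n".join(statements)


def run_script(script):
    """Step 307: execute the generated script (here against in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(script)
    return conn
```

Resetting the identifier after generation is what makes a repeated recovery task skip content that has already been restored: a second call to `reproduce_sql` on the same metadata yields an empty script.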
(4) Convert the metadata model into SQL statements supported by the corresponding database version according to the conversion rules, and import them into the database after a data consistency check, thereby recovering the heterogeneous database backup file.
When the metadata in the backup file is restored using the "SQL reproduction method", the value of the identifier in the metadata file is checked first:
If the identifier is 1, the data has not yet been restored, and the content of the backup file is reverse-converted into SQL statements using the syntax mapping rules;
If the identifier is 0, the content was already restored to the database by an earlier recovery task and does not need to be converted and restored again.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can make improvements and modifications without departing from the technical principles of the invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the invention.

Claims (8)

  1. A heterogeneous database backup file recovery method, characterized in that it comprises the following steps:
    (1) normalizing and converting the data in the heterogeneous source databases;
    (2) performing clustering preprocessing on the data blocks, compressing data blocks of the same class using the DELTA compression algorithm, generating the corresponding binary storage file, and backing up the compressed backup file to a backup medium;
    (3) restoring the metadata in the backup file using the "SQL reproduction method", the restore-side database version being read from a configuration file;
    (4) converting the metadata model into SQL statements supported by the corresponding database version according to the conversion rules, and importing them into the database after a data consistency check, thereby recovering the heterogeneous database backup file.
  2. The heterogeneous database backup file recovery method according to claim 1, characterized in that the specific method of step (1) is as follows:
    101. Load the driver: import the driver into the development environment and load it with the Class.forName() function;
    102. Create a connection: after the driver has been loaded, create a database connection object with the DriverManager.getConnection() function; the connection object includes: protocol name, IP address, port number and database name;
    103. Create a Statement object: create the Statement object with the Connection.createStatement() function;
    104. Execute the SQL statement: use executeQuery() when the SQL statement produces a single result set; use executeUpdate() when no result is returned; use execute() when multiple result sets are returned;
    105. Obtain the result: the result returned when the statement's execute() or executeQuery() is executed is a ResultSet object, and the data in the returned result is obtained by advancing the cursor of that object with the next() function;
    106. Load the conversion rules according to the database type and convert the heterogeneous data into metadata of a unified standard with the excData() function; each element in the metadata contains a key field, the identifier, which is used to check data consistency during data recovery, and the identifier is set to 1 if the metadata is modified during backup;
    107. Write the obtained data to a file in XML format with the wrtData() function, generating the corresponding backup file;
    108. Close the connection: if the database is no longer used, close the database connection with the close() method.
  3. The heterogeneous database backup file recovery method according to claim 1, characterized in that the metadata is the smallest unit of the data model, and the metadata structure is expressed as formula (1):
    M = CS + SS (1)
    where CS is the content structure, which defines the constituent elements of the metadata and their content, and SS is the syntax structure, which defines the format structure of the metadata and its specific description method;
    the content structure is expressed as formula (2):
    CS = (T, Z, S, F) (2)
    T denotes the source table, i.e. the table structure of each type of database; it stores the table structure information of the data to be backed up, comprising: source table sequence number, source table name, identifier, number of fields, field names and field type information;
    Z denotes the field, i.e. the data values of each type of database; it stores the concrete values of the fields in a table, including: field sequence number, field name, field type, field value, table name and identifier;
    S denotes the predetermined set, the basic unit of backup, including predetermined-set number, source server, destination server, start time, end time, backup sequence number, source table sequence numbers and field sequence numbers; it defines the backup objects and subdivides the backup process into individual units, so that when a backup task is interrupted it can continue from the point of interruption;
    F denotes the constraint; the constraint element describes the constraint information of the fields in a table and records the special column information in the table, including table name, constraint sequence number, primary key column names, foreign key column names, index column names and identifier.
  4. The heterogeneous database backup file recovery method according to claim 3, characterized in that the special column information is recorded separately so that the table structure can be described completely.
  5. The heterogeneous database backup file recovery method according to claim 1, characterized in that the specific method of step (2) is as follows:
    201. Split the file to be compressed using a file size of 1 MB as the division unit, perform DELTA compression between every pair of the resulting file blocks, and store the file size after DELTA compression in the temporary matrix arr_delta[N][N] as the similarity between data blocks;
    202. Using the similarity information stored in the similarity matrix as the clustering basis, cluster the data blocks with the K-medoids clustering algorithm;
    203. Select a feature set from the file using a content-independent method and, according to the amount of memory that can be allocated, determine the number of intermediate fingerprints to generate and the file size;
    204. Set the size of a sliding window and move the window forward continuously, computing the fingerprint of the data under the window, and map the fingerprints with a hash function into super-features, i.e. a super-fingerprint set;
    205. If the super-fingerprints match, search the feature database for the reference file with the highest similarity to the file and, once the reference file is found, compress against it using compression function D;
    206. Compression function D encodes the command string with two coding commands: the ADD command, whose format is (ADD, L, S), adds the string S of length L at the specified position in V; the COPY command, whose format is (COPY, L, O), copies the string of length L at offset O in R to the specified position in V;
    207. Merge the compressed data blocks back into a backup file.
  6. The heterogeneous database backup file recovery method according to claim 5, characterized in that the specific method for compressing data blocks of the same class with the DELTA compression algorithm is:
    Partition the backup file into blocks and denote the set of data blocks S = {S1, S2, S3, ..., Sn}; cluster the data objects in S so that the data blocks are divided into K classes C' = {C1', C2', C3', ..., Ck'}; the similarity between two data blocks is expressed as their DELTA distance, i.e.:
    dist(Si, Sj) = delta(Si, Sj) (3)
    Arbitrarily select K data blocks in S as the center points of the clusters, denoted {m1, m2, m3, ..., mk}, and assign each remaining data block to the cluster whose center is nearest to it, obtaining the clusters C = {C1, C2, C3, ..., Ck};
    For each cluster Ci, i ∈ {1, 2, 3, ..., k}, traverse every non-center object Sj in the cluster and use formula (4) to compute the total cost between data block Sj and the remaining data blocks Sk of the cluster,
    $$\mathrm{cost} = \sum_{k=1}^{l_i} \mathrm{dist}(s_j, s_k) = \sum_{k=1}^{l_i} \mathrm{delta}(s_j, s_k) \tag{4}$$
    Select the point with the minimum total cost in each cluster as the new center of that cluster, and iterate the above steps until the center point of every cluster no longer changes, finally obtaining the K clusters C' = {C1', C2', C3', ..., Ck'}.
  7. The heterogeneous database backup file recovery method according to claim 1, characterized in that the specific method of step (3) is as follows:
    301. Read the restore-side database type and version number, and load the corresponding mapping rules according to the database version;
    302. Read the predetermined-set sequence number of the corresponding task from the recovery task information, and use it to look up the source table sequence numbers, constraint sequence numbers and field sequence numbers to be restored;
    303. Look up the corresponding source table elements and constraint elements in the metadata by source table sequence number and constraint sequence number, and check the corresponding identifier: if the identifier is 1, perform step 304; otherwise perform step 305;
    304. Obtain the details of the source table and its dependencies, including the table name, the field names in the table, field types, primary key, foreign keys and indexes; once these are obtained, generate the corresponding SQL statements and store them in the .sql file, and set the identifier to 0 after the file has been generated;
    305. Obtain the corresponding field elements by field sequence number and check the corresponding identifier: if the identifier is 1, perform step 306; otherwise perform step 307;
    306. Obtain the field details, including field name, field type, field value and the name of the source table the field belongs to; from this information generate the corresponding INSERT statements to add the data, store them in the .sql file, and set the identifier to 0 after the file has been generated;
    307. Invoke the control command and restore the data to the database by executing the .sql file.
  8. The heterogeneous database local backup and recovery method according to claim 1, characterized in that when the metadata in the backup file is restored using the "SQL reproduction method", the value of the identifier in the metadata file is checked first:
    If the identifier is 1, the data has not yet been restored, and the content of the backup file is reverse-converted into SQL statements using the syntax mapping rules;
    If the identifier is 0, the content was already restored to the database by an earlier recovery task and does not need to be converted and restored again.
CN201710622124.7A 2017-07-27 2017-07-27 Heterogeneous database backup file recovery method Active CN107391306B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710622124.7A CN107391306B (en) 2017-07-27 2017-07-27 Heterogeneous database backup file recovery method

Publications (2)

Publication Number Publication Date
CN107391306A true CN107391306A (en) 2017-11-24
CN107391306B CN107391306B (en) 2019-12-10

Family

ID=60341216

Country Status (1)

Country Link
CN (1) CN107391306B (en)

Also Published As

Publication number Publication date
CN107391306B (en) 2019-12-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant