CN107391306A - A heterogeneous database backup file recovery method - Google Patents
A heterogeneous database backup file recovery method
- Publication number
- CN107391306A CN201710622124.7A
- Authority
- CN
- China
- Prior art keywords
- data
- backup
- database
- field
- file
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1469—Backup restoration techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a heterogeneous database backup file recovery method, comprising: normalizing the data in heterogeneous source databases; applying a DELTA compression algorithm based on K-medoids clustering, in which the data blocks are first clustered so that highly similar blocks fall into the same class and the blocks of each class are then compressed with the DELTA compression algorithm; and restoring the data with the "SQL replay method", in which the version of the restore-side database is read from a configuration file, the metadata model is converted by transformation rules into SQL statements supported by that database version, and the data are imported into the database after a consistency check, achieving stable backup and recovery of heterogeneous databases. By extending the mapping rules, the invention can support multiple source databases, back up heterogeneous databases, and compress files efficiently, reducing backup cost.
Description
Technical field
The present invention relates to a heterogeneous database backup file recovery method.
Background
In recent years, with the development of information technology, information management systems have become widespread. Fast, efficient, and convenient, they have become platforms for publishing and trading information and have further advanced the digitization and informatization of society as a whole; information systems of every kind together make up today's "information world".
Every industry depends on data: product data, customer data, financial data, and so on. The survival and growth of an enterprise increasingly depend on its IT systems. Computer viruses, network intrusions, physical damage, and operator error can cause large-scale damage to information data and leave an information system unable to provide normal service; in industries with direct economic stakes, such as banking, electric power, and communications, this can also cause huge economic losses. Protecting data by means of backup ensures that local data can be recovered quickly after a failure occurs.
Database backup research is a demand-driven field. Major companies began related research early, and some backup techniques have been in use in a variety of application environments for quite a long time. Research on backup software outside China began in the mid-1980s, and the mature commercial backup products to date include Tivoli, products from EMC, BakBone's NetVault, and CA's BrightStor.
The Software Research Institute of Zhongshan University and GuangZhou WeiTeng Networks Science Co., Ltd jointly developed NetBunker2 for network backup and recovery on Linux backup servers. HeartOne Backup Enterprise, from the Zhongshan Tongxiang company, provides distributed backup, implements intelligent backup and recovery, and simplifies server and network storage environments.
In the open-source field, backup software is flourishing, and a large number of excellent open-source backup tools have appeared; the better known include Amanda, Bacula, BackupPC, Restore, and Burt. Although the technology of open-source software is public, its functionality covers only the most basic backup tasks and is not suited to commercial scenarios. Theoretical research into certain commercial-grade features is therefore necessary.
As enterprises grow, their business data become large in volume, wide in origin, varied in type, and complex in structure. Enterprises have accumulated large amounts of business data that are of great importance to their normal operation. Because the database systems used at each stage differ, how to back up heterogeneous data has become a key problem in the data backup field. Although some large databases such as Oracle and SQL Server provide their own backup and restore tools, these tools support only centralized database backup and cannot address the heterogeneity of the database backup process.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art by providing a heterogeneous database backup file recovery method that solves the technical problem that heterogeneous data cannot be backed up and restored effectively in the prior art.
To solve the above technical problem, the technical solution adopted by the present invention is a heterogeneous database backup file recovery method comprising the following steps:
(1) normalize the data in the heterogeneous source databases;
(2) perform cluster preprocessing on the data blocks, compress the data blocks of each class with the DELTA compression algorithm, generate the corresponding binary storage file, and write the compressed backup file to the backup medium;
(3) restore the metadata in the backup file using the "SQL replay method", reading the restore-side database version from the configuration file;
(4) convert the metadata model into SQL statements supported by the corresponding database version according to the transformation rules, and import the data into the database after a data consistency check, thereby recovering the heterogeneous database backup file.
The specific method of step (1) is as follows:
101. Load the driver: import the driver into the development environment and load it with the Class.forName() function;
102. Create the connection: after the driver is loaded, create the database connection object with the DriverManager.getConnection() function; the connection information includes the protocol name, IP address, port number, and database name;
103. Create the Statement object: create it with the Connection.createStatement() function;
104. Execute SQL statements: use executeQuery() when the SQL statement produces a single result set, executeUpdate() when it returns no result set, and execute() when it may return multiple result sets;
105. Obtain the result: the result produced by executing the statement with executeQuery(), or retrieved after execute(), is a ResultSet object; the cursor of that object is advanced with the next() function to read the returned data;
106. Load the transformation rules according to the database type and convert the heterogeneous data into metadata of a unified standard with the excData() function; each element in the metadata contains a key field, the identifier, which is used to check data consistency during recovery; if the metadata are modified during backup, the identifier is set to 1;
107. Write the resulting data to a file in XML format with the wrtData() function, generating the corresponding backup file;
108. Close the connection: when the database is no longer in use, close the database connection with the close() method.
The metadata is the smallest unit of the data model; the metadata structure is expressed by formula (1):
M = CS + SS (1)
where CS is the content structure, which defines the constituent elements of the metadata and their content, and SS is the syntax structure, which defines the format structure of the metadata and the concrete description method;
The content structure is expressed by formula (2):
CS = (T, Z, S, F) (2)
T denotes the source table, i.e. the table structure of each database; it stores the table-structure information of the data to be backed up, comprising the source table sequence number, source table name, identifier, number of fields, field names, and field types;
Z denotes the field, i.e. the data values of each database; it stores the concrete values of the fields in a table, comprising the field sequence number, field name, field type, field value, table name, and identifier;
S denotes the predetermined set, the basic unit of backup, comprising the predetermined set number, source server, destination server, start time, end time, backup sequence number, source table sequence number, and field sequence number; it defines the backup objects and subdivides the backup process into units, so that when a backup task is interrupted it can resume from the point of interruption;
F denotes the constraint; the constraint element describes the constraints on the fields of a table and records the table's special columns, comprising the table name, constraint sequence number, primary key column name, foreign key column name, index column name, and identifier.
The special column information is recorded separately so that the table structure can be described completely.
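As an illustration of the content structure CS = (T, Z, S, F), the following is a minimal Python sketch, not the patent's implementation. The class and attribute names are hypothetical; only the listed members (sequence numbers, names, types, identifier, and so on) come from the text. The helper `pending_fields` is likewise an assumption, showing how a predetermined set could select the field elements whose identifier is still 1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SourceTable:              # T: table structure of the data to back up
    table_seq: int
    table_name: str
    field_names: List[str]
    field_types: List[str]
    identifier: int = 1         # 1 = modified during backup, not yet restored

    @property
    def field_count(self) -> int:
        return len(self.field_names)


@dataclass
class FieldValue:               # Z: one concrete field value
    field_seq: int
    field_name: str
    field_type: str
    value: str
    table_name: str
    identifier: int = 1


@dataclass
class PredeterminedSet:         # S: basic unit of backup
    set_id: int
    source_server: str
    target_server: str
    start_time: str
    end_time: str
    backup_seq: int
    table_seqs: List[int]
    field_seqs: List[int]


@dataclass
class Constraint:               # F: special columns of a table
    constraint_seq: int
    table_name: str
    primary_key: List[str] = field(default_factory=list)
    foreign_keys: List[str] = field(default_factory=list)
    indexes: List[str] = field(default_factory=list)
    identifier: int = 1


@dataclass
class ContentStructure:         # CS = (T, Z, S, F)
    tables: List[SourceTable]
    fields: List[FieldValue]
    sets: List[PredeterminedSet]
    constraints: List[Constraint]


def pending_fields(cs: ContentStructure, set_id: int) -> List[FieldValue]:
    """Field elements of one predetermined set whose identifier is still 1."""
    wanted = {seq for s in cs.sets if s.set_id == set_id for seq in s.field_seqs}
    return [f for f in cs.fields if f.field_seq in wanted and f.identifier == 1]
```

Keeping the identifier on each element, rather than on the set, is what lets an interrupted task skip elements that were already processed.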
The specific method of step (2) is as follows:
201. Split the file to be compressed into blocks, using a file size of 1 MB as the division unit; DELTA-compress the blocks pairwise and store the size of each DELTA-compressed result in a temporary matrix arr_delta[N][N], which serves as the similarity between data blocks;
202. Using the similarity information stored in the similarity matrix as the clustering basis, cluster the data blocks with the K-medoids clustering algorithm;
203. Select a feature set from the file using a content-independent method, and determine the number of intermediate fingerprints to generate and the file size according to the memory that can be allocated;
204. Set the size of a sliding window and move the window forward continuously, computing the fingerprint of the data under the window and mapping the fingerprints with a hash function into super-features, i.e. a super-fingerprint set;
205. If the super-fingerprints match, search the feature database for the reference file most similar to the file; once the reference file is found, compress according to compression function D;
206. Compression function D encodes the file as a command string using two commands, where R is the reference file and V is the file being encoded: the ADD command, with format (ADD, L, S), adds the string S of length L at the specified position in V; the COPY command, with format (COPY, L, O), copies the string of length L at offset O in R to the specified position in V;
207. Merge the compressed data blocks back into the backup file.
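The ADD and COPY commands of step 206 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's compression function D: matches are found with Python's standard `difflib.SequenceMatcher`, which the patent does not mention, and the function names and the `min_copy` threshold are hypothetical.

```python
from difflib import SequenceMatcher


def delta_encode(ref: bytes, ver: bytes, min_copy: int = 4):
    """Encode ver against ref as (ADD, L, S) and (COPY, L, O) commands."""
    commands = []
    pos = 0  # next position in ver that has not been encoded yet
    matcher = SequenceMatcher(None, ref, ver, autojunk=False)
    for m in matcher.get_matching_blocks():
        # m.a is the offset in ref, m.b the offset in ver, m.size the length
        if m.size < min_copy:
            continue  # too short for a COPY; a later ADD covers these bytes
        if m.b > pos:
            literal = ver[pos:m.b]
            commands.append(("ADD", len(literal), literal))
        commands.append(("COPY", m.size, m.a))
        pos = m.b + m.size
    if pos < len(ver):
        literal = ver[pos:]
        commands.append(("ADD", len(literal), literal))
    return commands


def delta_decode(ref: bytes, commands) -> bytes:
    """Rebuild the encoded file from ref and the command list."""
    out = bytearray()
    for op, length, arg in commands:
        if op == "ADD":
            out += arg                      # arg is the literal string S
        else:
            out += ref[arg:arg + length]    # arg is the offset O in ref
    return bytes(out)
```

Decoding needs only the reference block and the command list, which is why blocks in the same cluster, sharing a good reference, compress well.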
The specific method of compressing the data blocks of each class with the DELTA compression algorithm is:
Divide the backup file into blocks and denote the set of data blocks S = {S1, S2, S3, ..., Sn}; cluster the data objects in S so that the data blocks are divided into K classes C' = {C1', C2', C3', ..., Ck'}; the similarity between two data blocks is expressed as their DELTA distance, i.e.:
Dist(Si, Sj) = delta(Si, Sj) (3)
Arbitrarily select K data blocks in S as the initial cluster centers, denoted {m1, m2, m3, ..., mk}, and assign each remaining data block to the cluster whose center is nearest to it, obtaining the clusters C = {C1, C2, C3, ..., Ck};
For each cluster Ci, i ∈ {1, 2, 3, ..., k}, traverse each non-center object Sj in the cluster and use formula (4) to compute the total cost between data block Sj and the remaining data blocks Sk in the cluster;
Select the point with the smallest total cost in the cluster as the new cluster center and iterate the above steps until the center of every cluster no longer changes, finally obtaining the K clusters C' = {C1', C2', C3', ..., Ck'}.
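The clustering loop described above can be sketched in Python as follows. Formula (4) is not reproduced in the text, so this sketch assumes the usual K-medoids total cost, the sum of the distances from a candidate block to every other block in its cluster. The function names are hypothetical, the initial centers are simply the first K blocks, and distinct blocks are assumed to have a positive distance.

```python
def assign(dist, medoids):
    """Assign every block to the cluster of its nearest medoid."""
    clusters = {m: [] for m in medoids}
    for i in range(len(dist)):
        nearest = min(medoids, key=lambda m: dist[i][m])
        clusters[nearest].append(i)
    return clusters


def k_medoids(dist, k, max_iter=100):
    """Cluster block indices 0..n-1 given a pairwise distance matrix."""
    medoids = list(range(k))            # arbitrary initial centers
    clusters = assign(dist, medoids)
    for _ in range(max_iter):
        new_medoids = []
        for m, members in clusters.items():
            # assumed total cost: sum of distances to the rest of the cluster
            best = min(members or [m],
                       key=lambda c: sum(dist[c][o] for o in members))
            new_medoids.append(best)
        if set(new_medoids) == set(medoids):
            break                       # centers no longer change
        medoids = new_medoids
        clusters = assign(dist, medoids)
    return clusters
```

For four blocks whose distances behave like the points 0, 1, 10, and 11 on a line, `k_medoids(dist, 2)` groups blocks 0 and 1 together and blocks 2 and 3 together.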
The specific method of step (3) is as follows:
301. Read the type and version number of the restore-side database and load the corresponding mapping rules according to the database version;
302. Read the predetermined set number of the task from the recovery task information, and use it to look up the source table sequence numbers, constraint sequence numbers, and field sequence numbers to be restored;
303. Look up the corresponding source table elements and constraint elements in the metadata by source table sequence number and constraint sequence number, and check the corresponding identifier: if the identifier is 1, perform step 304; otherwise perform step 305;
304. Obtain the details of the source table and its dependencies, including the table name, the field names and field types in the table, the primary key, the foreign keys, and the indexes; then generate the corresponding SQL statements and store them in a .sql file; once the file has been generated, set the identifier to 0;
305. Obtain the corresponding field element by field sequence number and check its identifier: if the identifier is 1, perform step 306; otherwise perform step 307;
306. Obtain the field details, including the field name, field type, field value, and the name of the source table the field belongs to; from this information generate the corresponding INSERT statements that add the data, and store them in the .sql file; once the file has been generated, set the identifier to 0;
307. Invoke the control command and restore the data to the database by executing the .sql file.
When the metadata in the backup file are restored with the "SQL replay method", the identifier value in the metadata file is checked first:
If the identifier is 1, the data have not yet been restored, and the content of the backup file is converted back into SQL statements using the syntax mapping rules;
If the identifier is 0, the content was already restored to the database by an earlier recovery task and need not be converted and restored again.
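The identifier check can be sketched as follows. This is a minimal illustration under stated assumptions: the metadata elements are plain dictionaries with hypothetical keys, SQLite (via Python's standard `sqlite3` module) stands in for the restore-side database, and the statements are returned as parameterized pairs instead of being written to a .sql file as the method describes.

```python
import sqlite3


def replay_sql(tables, rows):
    """Build SQL for every element whose identifier is 1, then reset it to 0."""
    statements = []
    for t in tables:                       # source table and constraint elements
        if t["identifier"] == 1:
            cols = ", ".join(name + " " + typ for name, typ in t["fields"])
            if t["primary_key"]:
                cols += ", PRIMARY KEY (" + ", ".join(t["primary_key"]) + ")"
            statements.append(("CREATE TABLE " + t["name"] + " (" + cols + ")", ()))
            t["identifier"] = 0            # mark as restored
    for r in rows:                         # field elements, grouped per row here
        if r["identifier"] == 1:
            names = ", ".join(r["values"])
            marks = ", ".join("?" for _ in r["values"])
            sql = "INSERT INTO " + r["table"] + " (" + names + ") VALUES (" + marks + ")"
            statements.append((sql, tuple(r["values"].values())))
            r["identifier"] = 0
    return statements


def restore(conn: sqlite3.Connection, tables, rows) -> int:
    """Execute the pending statements and return how many were run."""
    pending = replay_sql(tables, rows)
    for sql, params in pending:
        conn.execute(sql, params)
    conn.commit()
    return len(pending)
```

Calling `restore` a second time on the same metadata runs nothing, because every identifier is already 0; this is the selective recovery the identifier is meant to provide.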
Compared with the prior art, the beneficial effects achieved by the present invention are:
The invention designs a general metadata model and defines mapping rules between this model and the data of the current mainstream databases Oracle, MySQL, and PostgreSQL; after the data are normalized into metadata, they are stored in XML files;
An improved DELTA compression algorithm is proposed that deduplicates the backup file and reduces backup cost;
The "information island" problem caused by heterogeneous databases within an enterprise can be overcome; a consistent backup framework oriented to enterprise needs is provided, which also improves backup-medium utilization and reduces backup cost;
For a recovery task, the metadata are converted, according to the database configuration, back into SQL statements supported by the specified database version, and the data are imported into the database by executing those SQL statements; during recovery, data are restored selectively according to the modification marks in the metadata model, ensuring data consistency.
Brief description of the drawings
Fig. 1 is a schematic diagram of the layered structure of the backup system;
Fig. 2 is the flow chart of the present invention;
Fig. 3 is the flow chart of heterogeneous data extraction;
Fig. 4 is the flow chart of data compression based on K-medoids clustering;
Fig. 5 is the flow chart of data recovery.
Detailed description
The present invention provides a heterogeneous database backup file recovery method, comprising: designing a metadata model, normalizing the data in the heterogeneous source databases, and storing the metadata model in XML files; proposing a DELTA compression algorithm based on K-medoids clustering, in which the data blocks are first clustered so that highly similar blocks fall into the same class and the blocks of each class are then compressed with the DELTA compression algorithm; and restoring the data with the "SQL replay method", in which the restore-side database version is read from the configuration file, the metadata model is converted by the transformation rules into SQL statements supported by that database version, and the data are imported into the database after a consistency check, achieving stable backup and recovery of heterogeneous databases.
The invention is further described below with reference to the accompanying drawings. The following embodiments serve only to illustrate the technical solution of the invention more clearly and do not limit its scope of protection.
The backup system comprises three functions: data extraction, data processing, and data recovery. Data extraction uses the metadata model to describe the data types of different databases uniformly, extracts the source data according to the backup task, and stores them in the backup file. Data processing compresses the duplicated content in the backup file with the compression algorithm, generates the corresponding binary storage file, and writes the compressed backup file to the backup medium. Data recovery converts the metadata in the backup file using the SQL replay method, generates .sql files that databases of each version can execute, and finally imports the data into the database system.
Fig. 1 is a schematic diagram of the layered structure of the backup system, which is divided into three layers: the common connection layer, the business layer, and the application layer.
(1) Common connection layer
The common connection layer is at the bottom of the system and implements database connectivity, providing database connection and query services to the business layer. When a database with a higher security level is backed up, it also provides encryption and decryption to guarantee a reliable connection with the heterogeneous data sources. Connections to the different databases are established mainly through JDBC.
(2) Business layer
The business layer implements the core functions of the system; every link in database backup and recovery is implemented in this layer.
Data conversion maps between metadata and database data. The mapping rules mask the differences among heterogeneous databases in data format, constraint rules, and SQL syntax, which is one of the difficulties of heterogeneous database backup.
The data compression function uses the DELTA compression algorithm based on K-medoids clustering, which doubles the efficiency of the basic DELTA compression algorithm and can compress a backup file to about a quarter of its original size, reducing backup cost while increasing backup speed.
The consistency check function protects data reliability, ensuring that after a recovery task completes the content of the database is identical to its content at backup time.
These functions depend on one another in the workflow. The backup stage first performs data conversion and then compresses the converted file and stores it on the backup medium. The recovery stage decompresses the compressed file into a data file, determines which data to restore by checking the identifiers in the data, and then converts them by the transformation rules into SQL statements that are imported into the database.
(3) Application layer
The application layer uses the services provided by the business layer and the common connection layer to solve practical problems, mainly user-defined backup and recovery tasks or backup and recovery plans. The interface of this layer is built on Qt, ensuring the portability and extensibility of the whole system.
As shown in Figs. 2 to 5, the heterogeneous database backup file recovery method provided by the invention comprises the following steps:
(1) Normalize the data in the heterogeneous source databases; the specific method is as follows:
101. Load the driver: import the driver into the development environment and load it with the Class.forName() function;
102. Create the connection: after the driver is loaded, create the database connection object with the DriverManager.getConnection() function, e.g. Connection connect = DriverManager.getConnection("url", "UserName", "PassWord"). Although URL formats differ between databases, the URL should contain the protocol name, IP address, port number, and database name. UserName and PassWord are the user name and password used to connect to the database management system;
103. Create the Statement object: create it with the Connection.createStatement() function; the Statement class is mainly used to execute SQL statements and obtain the results they produce;
104. Execute SQL statements: Statement offers three main methods for executing SQL, namely executeQuery(), executeUpdate(), and execute(). Use executeQuery() when the SQL statement produces a single result set, executeUpdate() when it returns no result set, and execute() when it may return multiple result sets;
105. Obtain the result: the result produced by executing the statement with executeQuery(), or retrieved after execute(), is a ResultSet object; the cursor of that object is advanced with the next() function to read the returned data;
106. Load the transformation rules according to the database type and convert the heterogeneous data into metadata of a unified standard with the excData() function; each element in the metadata contains a key field, the identifier, which is used to check data consistency during recovery; if the metadata are modified during backup, the identifier is set to 1;
107. Write the resulting data to a file in XML format with the wrtData() function, generating the corresponding backup file;
108. Close the connection: to avoid wasting resources, close the database connection with the close() method when the database is no longer in use.
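The extraction flow of steps 102 to 108 can be sketched in Python as follows. The method itself uses JDBC; here Python's DB-API with the standard `sqlite3` module stands in for it, and the function name `extract_to_xml` and the XML element and attribute names are hypothetical, since the text does not give the backup file's schema.

```python
import sqlite3
import xml.etree.ElementTree as ET


def extract_to_xml(conn: sqlite3.Connection, table: str) -> str:
    """Read one table and serialize it as metadata elements in XML."""
    # The table name is assumed to come from a trusted backup task definition.
    cur = conn.execute("SELECT * FROM " + table)      # steps 103 and 104
    columns = [d[0] for d in cur.description]
    root = ET.Element("backup")
    src = ET.SubElement(root, "sourceTable", name=table, identifier="1")
    for seq, row in enumerate(cur, start=1):          # step 105: iterate results
        for col, val in zip(columns, row):
            # step 106: every element carries the identifier field
            elem = ET.SubElement(src, "field", seq=str(seq), name=col,
                                 type=type(val).__name__, identifier="1")
            elem.text = "" if val is None else str(val)
    return ET.tostring(root, encoding="unicode")      # step 107: XML text
```

A caller would open the connection (step 102), write the returned string to the backup file, and close the connection afterwards (step 108).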
The data formats of the heterogeneous databases differ from the metadata format. Current database data formats are rich in data types and powerful in functional syntax; Oracle's basic character types, for example, include CHAR, VARCHAR2, NCHAR, and NVARCHAR2. Different database systems also have data types that the others do not support, so mapping rules are set up for conversion. The data mapping rules, also called the metadata dictionary, are the basis for normalizing heterogeneous data. The mapping rules are designed from the data types in the heterogeneous source databases; according to the meaning each data type is meant to express, the types are classified as character, real, integer, and byte. The specific mapping relations are shown in Table 1.
Table 1: Data type mapping rules
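A mapping rule of this kind can be sketched as a two-stage lookup: source type to category, then category to target type. The contents of Table 1 are not reproduced in the text, so every entry below is an illustrative assumption rather than the patent's actual rule set, and `map_type` is a hypothetical name.

```python
# Stage 1: source database type -> metadata category (assumed entries)
CATEGORY_OF = {
    "oracle": {"CHAR": "character", "VARCHAR2": "character",
               "NCHAR": "character", "NVARCHAR2": "character",
               "NUMBER": "real", "INTEGER": "integer", "BLOB": "byte"},
    "mysql": {"CHAR": "character", "VARCHAR": "character",
              "DOUBLE": "real", "INT": "integer", "BLOB": "byte"},
}

# Stage 2: metadata category -> restore-side database type (assumed entries)
TARGET_TYPE = {
    "postgresql": {"character": "TEXT", "real": "NUMERIC",
                   "integer": "BIGINT", "byte": "BYTEA"},
    "mysql": {"character": "TEXT", "real": "DOUBLE",
              "integer": "BIGINT", "byte": "BLOB"},
}


def map_type(src_db: str, src_type: str, dst_db: str) -> str:
    """Translate a source column type into a restore-side column type."""
    try:
        category = CATEGORY_OF[src_db][src_type.upper()]
        return TARGET_TYPE[dst_db][category]
    except KeyError:
        raise ValueError(f"no mapping rule for {src_db}.{src_type} -> {dst_db}")
```

Routing every type through one of the four categories means that supporting a new source database only requires a new stage 1 table, which matches the claim that mapping rules can be extended.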
The metadata is the smallest unit of the data model; the metadata structure is expressed by formula (1):
M = CS + SS (1)
where CS (Content Structure) is the content structure, which defines the constituent elements of the metadata and their content, and SS (Syntax Structure) is the syntax structure, which defines the format structure of the metadata and the concrete description method;
The content structure is expressed by formula (2):
CS = (T, Z, S, F) (2)
Source table (T): this element represents the table structure of each database and stores the table-structure information of the data to be backed up, comprising the source table sequence number, source table name, identifier, number of fields, field names, and field types.
Field (Z): this element represents the data values of each database and stores the concrete values of the fields in a table, comprising the field sequence number, field name, field type, field value, table name, and identifier.
Predetermined set (S): this element defines the backup objects and subdivides the backup process into units, so that when a backup task is interrupted it can resume from the point of interruption. This mechanism saves time and improves backup efficiency, and it also helps ensure the consistency of the backup result by preventing data that have already been backed up from being backed up again and causing redundancy. The predetermined set is defined as the basic unit of backup and contains the objects to be backed up. Its members are the predetermined set number, source server, destination server, start time, end time, backup sequence number, source table sequence number, and field sequence number.
Constraint (F): the constraint element describes the constraints on the fields of a table and records the table's special columns, comprising the table name, constraint sequence number, primary key column name, foreign key column name, index column name, and identifier. Because of their special function, the special columns must be recorded separately so that the table structure can be described completely.
(2) Perform cluster preprocessing on the data blocks, compress the data blocks of each class with the DELTA compression algorithm, generate the corresponding binary storage file, and write the compressed backup file to the backup medium; the specific method is as follows:
201. Split the file to be compressed into blocks, using a file size of 1 MB as the division unit; DELTA-compress the blocks pairwise and store the size of each DELTA-compressed result in a temporary matrix arr_delta[N][N], which serves as the similarity between data blocks;
202. Using the similarity information stored in the similarity matrix as the clustering basis, cluster the data blocks with the K-medoids clustering algorithm; the clustering result ensures that the data blocks within a class are highly similar to one another;
203. Select a feature set from the file using a content-independent method, and determine the number of intermediate fingerprints to generate and the file size according to the memory that can be allocated;
204. Set the size of a sliding window and move the window forward continuously, computing the fingerprint of the data under the window. To speed up retrieval and shorten lookup time, map the fingerprints with a hash function into super-features, i.e. a super-fingerprint set;
205. If the super-fingerprints match, the two files are highly similar. Search the feature database for a reference file highly similar to the file; once the reference file is found, compress according to compression function D;
206. Compression function D encodes the file as a command string using two commands, where R is the reference file and V is the file being encoded: the ADD command, with format (ADD, L, S), adds the string S of length L at the specified position in V; the COPY command, with format (COPY, L, O), copies the string of length L at offset O in R to the specified position in V;
207. Merge the compressed data blocks back into the backup file.
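Step 204 can be sketched as a rolling hash over a sliding window followed by a reduction to a single super-fingerprint. The text names neither the hash function nor the way fingerprints are combined, so the polynomial rolling hash, the choice of the smallest `keep` fingerprints, and all names and parameters below are assumptions.

```python
import hashlib

BASE = 257
MOD = (1 << 61) - 1


def window_fingerprints(data: bytes, window: int = 8):
    """Polynomial rolling hash of every window-sized slice of data."""
    if len(data) < window:
        return []
    h = 0
    for b in data[:window]:
        h = (h * BASE + b) % MOD
    top = pow(BASE, window - 1, MOD)     # weight of the byte leaving the window
    fingerprints = [h]
    for i in range(window, len(data)):
        # drop the oldest byte, shift, and append the newest byte
        h = ((h - data[i - window] * top) * BASE + data[i]) % MOD
        fingerprints.append(h)
    return fingerprints


def super_fingerprint(data: bytes, window: int = 8, keep: int = 4) -> str:
    """Hash the `keep` smallest window fingerprints into one super-fingerprint."""
    selected = sorted(set(window_fingerprints(data, window)))[:keep]
    joined = ",".join(str(fp) for fp in selected)
    return hashlib.sha1(joined.encode("ascii")).hexdigest()
```

Because each window hash is updated in constant time, a block is fingerprinted in one pass, and comparing super-fingerprints replaces a byte-by-byte comparison when searching the feature database.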
The specific method of compressing the data blocks of each class with the DELTA compression algorithm is:
Divide the backup file into blocks and denote the set of data blocks S = {S1, S2, S3, ..., Sn}; cluster the data objects in S so that the data blocks are divided into K classes C' = {C1', C2', C3', ..., Ck'}; the similarity between two data blocks is expressed as their DELTA distance, i.e.:
Dist(Si, Sj) = delta(Si, Sj) (3)
Arbitrarily select K data blocks in S as the initial cluster centers, denoted {m1, m2, m3, ..., mk}, and assign each remaining data block to the cluster whose center is nearest to it, obtaining the clusters C = {C1, C2, C3, ..., Ck};
For each cluster Ci, i ∈ {1, 2, 3, ..., k}, traverse each non-center object Sj in the cluster and use formula (4) to compute the total cost between data block Sj and the remaining data blocks Sk in the cluster;
Select the point with the smallest total cost in the cluster as the new cluster center and iterate the above steps until the center of every cluster no longer changes, finally obtaining the K clusters C' = {C1', C2', C3', ..., Ck'}.
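The similarity matrix arr_delta of step 201, which supplies the distances of formula (3), can be sketched as follows. As a stand-in for the DELTA compressor, this sketch uses `zlib` with the reference block as a preset dictionary; that substitution is an assumption, and since DEFLATE only looks back 32 KB it is a weak approximation for 1 MB blocks. The matrix is filled symmetrically, which is also an assumption.

```python
import zlib


def delta_size(ref: bytes, target: bytes) -> int:
    """Size of target when compressed with ref as a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=ref)
    return len(comp.compress(target) + comp.flush())


def similarity_matrix(blocks):
    """arr_delta[i][j]: compressed size of block j relative to block i."""
    n = len(blocks)
    arr_delta = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            size = delta_size(blocks[i], blocks[j])
            arr_delta[i][j] = size       # smaller size means more similar
            arr_delta[j][i] = size       # mirrored to keep the matrix symmetric
    return arr_delta
```

The resulting matrix can be passed directly to a K-medoids routine as its distance input: two blocks that compress well against each other end up close together.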
(3) The metadata in the backup file is restored using the "SQL reproduction method": the database version of the restore
target is read from the configuration file, and the conversion rules are applied in reverse to turn the metadata into SQL statements that the target database can recognize and to generate the corresponding
.sql file. The specific method is as follows:
301. Read the database type and version number of the restore target, and load the corresponding mapping rules according to the database version;
302. Read the predetermined set sequence number of the corresponding task from the recovery task information, and use it to look up the
source table sequence numbers, constraint sequence numbers, and field sequence numbers to be restored;
303. Look up the corresponding source table elements and constraint elements in the metadata by source table sequence number and constraint sequence number, and check
the corresponding identifier: if the identifier is 1, perform step 304; otherwise perform step 305;
304. Obtain the details of the source table and its dependencies, including the table name, the field names in the table, field types, primary keys, foreign keys, and
indexes; once they have been obtained, generate the corresponding SQL statements and store them in the .sql file; after file generation has finished, set the identifier
to 0;
305. Obtain the corresponding field elements by field sequence number and check the corresponding identifier: if the identifier is 1, perform
step 306; otherwise perform step 307;
306. Obtain the field details, including the field name, field type, field value, and the name of the source table the field belongs to; from
this information generate the corresponding INSERT statements that add the data, and store them in the .sql file; after file generation
has finished, set the identifier to 0;
307. Call the control command to restore the data to the database by executing the .sql file.
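Steps 303 to 307 can be sketched in Python as follows. This is a hedged illustration only: the dictionary layout of the table and row elements, the function names `sql_literal`, `build_restore_sql`, and `restore`, and the use of an in-memory SQLite database in place of the real restore target are all assumptions made for the example. The sketch also groups field values per row and clears each identifier right after its statement is generated, whereas the text clears it once file generation has finished.

```python
import sqlite3


def sql_literal(value):
    """Render a Python value as a SQL literal (strings are quoted and escaped)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def build_restore_sql(tables, rows):
    """Emit SQL only for elements whose identifier is 1, then set it to 0."""
    statements = []
    for table in tables:  # steps 303-304: table structure and constraints
        if table["identifier"] != 1:
            continue  # already restored by an earlier recovery task
        cols = ", ".join(f"{name} {ctype}" for name, ctype in table["columns"])
        pk = f", PRIMARY KEY ({table['primary_key']})" if table.get("primary_key") else ""
        statements.append(f"CREATE TABLE {table['name']} ({cols}{pk});")
        table["identifier"] = 0
    for row in rows:  # steps 305-306: field values become INSERT statements
        if row["identifier"] != 1:
            continue
        names = ", ".join(row["values"])
        values = ", ".join(sql_literal(v) for v in row["values"].values())
        statements.append(f"INSERT INTO {row['table']} ({names}) VALUES ({values});")
        row["identifier"] = 0
    return statements


def restore(conn, statements):
    """Step 307: execute the generated script against the target database."""
    conn.executescript("\n".join(statements))
```

Because the identifiers are cleared as statements are generated, calling `build_restore_sql` a second time on the same metadata returns an empty list, which mirrors the resume behaviour the identifier is meant to provide.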
(4) The metadata model is converted according to the conversion rules into SQL statements supported by the corresponding database version, and the data is
imported into the database after a data consistency check, which completes the recovery of the heterogeneous database backup file.
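As an illustration of version-specific conversion rules, the following Python sketch maps generic metadata types to the column types of a particular target database. The source does not list its actual rules, so the rule table `MAPPING_RULES`, the generic type names, the chosen database versions, and the functions `load_rules` and `column_definition` are all hypothetical.

```python
# Hypothetical rule table: (database type, version) -> generic type -> dialect type.
MAPPING_RULES = {
    ("mysql", "5.7"): {
        "STRING": "VARCHAR(255)",
        "INTEGER": "INT",
        "TIMESTAMP": "DATETIME",
    },
    ("oracle", "11g"): {
        "STRING": "VARCHAR2(255)",
        "INTEGER": "NUMBER(10)",
        "TIMESTAMP": "TIMESTAMP",
    },
}


def load_rules(db_type, version):
    """Load the mapping rules for the restore target read from the configuration."""
    try:
        return MAPPING_RULES[(db_type.lower(), version)]
    except KeyError:
        raise ValueError(f"no mapping rule for {db_type} {version}") from None


def column_definition(rules, name, generic_type):
    """Translate one metadata field into a column definition for the target dialect."""
    return f"{name} {rules[generic_type]}"
```

Supporting another source or target database then amounts to adding one more entry to the rule table, which is the kind of extension by mapping rules that the abstract describes.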
When the metadata in the backup file is restored using the "SQL reproduction method", the value of the identifier in the metadata file
is checked first:
If the identifier is 1, the data has not yet been restored, and the syntax mapping rules are applied in reverse to convert the
content of the backup file into SQL statements;
If the identifier is 0, the content was already restored to the database by an earlier recovery task and does not need to be
converted and restored again.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art
may make improvements and modifications without departing from the technical principles of the invention, and such improvements and modifications
shall also be regarded as falling within the protection scope of the present invention.
Claims (8)
- 1. A heterogeneous database backup file recovery method, characterized by comprising the following steps:
  (1) normalizing and converting the data in the heterogeneous source databases;
  (2) performing cluster preprocessing on the data blocks, compressing data blocks of the same class with the DELTA compression algorithm, generating the corresponding binary storage file, and backing up the compressed backup file to the backup medium;
  (3) restoring the metadata in the backup file using the "SQL reproduction method", and reading the database version of the restore target from the configuration file;
  (4) converting the metadata model according to the conversion rules into SQL statements supported by the corresponding database version, and importing the data into the database after a data consistency check, thereby recovering the heterogeneous database backup file.
- 2. The heterogeneous database backup file recovery method according to claim 1, characterized in that the specific method of step (1) is as follows:
  101. Load the driver: import the driver into the development environment and load it with the Class.forName() function;
  102. Create the connection: after the driver has been loaded, create the database connection object with the getConnect() function of DriverManage; the connection object includes the protocol name, IP address, port number, and database name;
  103. Create the Statement object: create the Statement object with the createStatement() function of Connection;
  104. Execute SQL statements: when a SQL statement produces a single result set, use executeQuery(); when it returns no result, use executeUpdate(); when it returns multiple result sets, use execute();
  105. Obtain the result: the result returned by the Statement's execute() and executeQuery() is a ResultSet object, and the data in the returned result is obtained by advancing the pointer into that object with the next() function;
  106. Load the conversion rules according to the database type, and convert the heterogeneous data into metadata of a unified standard with the excData() function; each element of the metadata contains a key field, the identifier, which is used to check data consistency during recovery; if the metadata is modified during backup, the identifier is set to 1;
  107. Write the obtained data to a file in XML format with the wrtData() function, generating the corresponding backup file;
  108. Close the connection: when the database is no longer in use, close the database connection with the close() method.
- 3. The heterogeneous database backup file recovery method according to claim 1, characterized in that the metadata is the minimum unit of the data model, and the metadata structure is expressed by formula (1):
  M = CS + SS (1)
  where CS is the content structure, which defines the constituent elements of the metadata and their contents, and SS is the syntax structure, which defines the format structure of the metadata and the specific description method;
  the content structure is expressed by formula (2):
  CS = (T, Z, S, F) (2)
  T represents the source table, i.e. the table structure of the various databases; it stores the table structure information of the data to be backed up, including the source table sequence number, source table name, identifier, number of fields, field names, and field type information;
  Z represents the field, i.e. the data values of the various databases; it stores the concrete values of the fields in a table, including the field sequence number, field name, field type, field value, table name, and identifier;
  S represents the predetermined set, the basic unit of backup, including the predetermined set number, source server, destination server, start time, end time, backup sequence number, source table sequence number, and field sequence number; it defines the backup object and subdivides the backup process into units, so that when a backup task is interrupted it can be continued from the point of interruption;
  F represents the constraint; the constraint element describes the field constraint information of a table and records the special column information in the table, including the table name, constraint sequence number, primary key column name, foreign key column name, index column name, and identifier.
- 4. The heterogeneous database backup file recovery method according to claim 3, characterized in that the special column information is recorded separately so that the table structure can be described completely.
- 5. The heterogeneous database backup file recovery method according to claim 1, characterized in that the specific method of step (2) is as follows:
  201. Split the file to be compressed, using a file size of 1M as the division unit; perform DELTA compression between every pair of the resulting file blocks, and store the file size after DELTA compression in the temporary matrix arr_delta[N][N] as the similarity between data blocks;
  202. Using the similarity information stored in the similarity matrix as the clustering basis, cluster the data blocks with the K-medoids clustering algorithm;
  203. Select a feature set from the file using a content-independent method, and determine the number of intermediate fingerprints to generate and the file size according to the amount of memory that can be allocated;
  204. Set the size of a sliding window and keep moving the window forward, computing the data fingerprint under the moving window and mapping it with hash functions into super-features or a super-fingerprint set;
  205. If the super-fingerprints match, search the feature database for the reference file with the highest similarity; once the reference file is found, compress according to the compression function D;
  206. The compression function D encodes a command string using two commands: the ADD encoding command, whose format is (ADD, L, S) and which adds the string S of length L at the specified position in V; and the COPY encoding command, whose format is (COPY, L, O) and which copies the string of length L at offset O from R to the specified position in V;
  207. Merge the compressed data blocks back into the backup file.
- 6. The heterogeneous database backup file recovery method according to claim 5, characterized in that the specific method of compressing data blocks of the same class with the DELTA compression algorithm is:
  The backup file is divided into blocks, and the set of data blocks is denoted S = {S1, S2, S3, ..., Sn}; the data objects in set S are clustered, dividing the data blocks into K classes C' = {C1', C2', C3', ..., Ck'}; the similarity between two data blocks is expressed as their DELTA distance, i.e.:
  Dist(Si, Sj) = delta(Si, Sj) (3)
  K data blocks are arbitrarily selected from S as the initial cluster center points, denoted {m1, m2, m3, ..., mk}; each remaining data block is assigned to the cluster whose center point is nearest to it, yielding the clusters C = {C1, C2, C3, ..., Ck};
  for each cluster Ci, i ∈ {1, 2, 3, ..., k}, every non-center data block Sj in the cluster is traversed, and formula (4) is used to calculate the total cost between Sj and the other data blocks Sk in the cluster:
  $$\mathrm{cost} = \sum_{k=1}^{l_i} \mathrm{dist}(S_j, S_k) = \sum_{k=1}^{l_i} \mathrm{delta}(S_j, S_k) \quad (4)$$
  the data block with the minimum total cost in each cluster is selected as that cluster's new center point, and the above steps are iterated until the center point of every cluster no longer changes, which yields the final K clusters C' = {C1', C2', C3', ..., Ck'}.
- 7. The heterogeneous database backup file recovery method according to claim 1, characterized in that the specific method of step (3) is as follows:
  301. Read the database type and version number of the restore target, and load the corresponding mapping rules according to the database version;
  302. Read the predetermined set sequence number of the corresponding task from the recovery task information, and use it to look up the source table sequence numbers, constraint sequence numbers, and field sequence numbers to be restored;
  303. Look up the corresponding source table elements and constraint elements in the metadata by source table sequence number and constraint sequence number, and check the corresponding identifier: if the identifier is 1, perform step 304; otherwise perform step 305;
  304. Obtain the details of the source table and its dependencies, including the table name, the field names in the table, field types, primary keys, foreign keys, and indexes; once they have been obtained, generate the corresponding SQL statements and store them in the .sql file; after file generation has finished, set the identifier to 0;
  305. Obtain the corresponding field elements by field sequence number and check the corresponding identifier: if the identifier is 1, perform step 306; otherwise perform step 307;
  306. Obtain the field details, including the field name, field type, field value, and the name of the source table the field belongs to; from this information generate the corresponding INSERT statements that add the data, and store them in the .sql file; after file generation has finished, set the identifier to 0;
  307. Call the control command to restore the data to the database by executing the .sql file.
- 8. The heterogeneous database local backup and recovery method according to claim 1, characterized in that when the metadata in the backup file is restored using the "SQL reproduction method", the value of the identifier in the metadata file is checked first:
  if the identifier is 1, the data has not yet been restored, and the syntax mapping rules are applied in reverse to convert the content of the backup file into SQL statements;
  if the identifier is 0, the content was already restored to the database by an earlier recovery task and does not need to be converted and restored again.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710622124.7A CN107391306B (en) | 2017-07-27 | 2017-07-27 | Heterogeneous database backup file recovery method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107391306A (en) | 2017-11-24 |
CN107391306B (en) | 2019-12-10 |
Family
ID=60341216
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710622124.7A Active CN107391306B (en) | 2017-07-27 | 2017-07-27 | Heterogeneous database backup file recovery method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107391306B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||