CN107391306A

CN107391306A - A heterogeneous database backup file recovery method

Info

Publication number: CN107391306A
Application number: CN201710622124.7A
Authority: CN
Inventors: 刘赛; 杨华飞; 聂庆节; 刘嘉华; 刘军; 张磊; 马悦皎; 缪骞云; 张翼; 张迎星
Original assignee: State Grid Corp of China SGCC; Nari Information and Communication Technology Co; Nanjing NARI Group Corp; Electric Power Research Institute of State Grid Jiangsu Electric Power Co Ltd
Current assignee: State Grid Corp of China SGCC; Nari Information and Communication Technology Co; Nanjing NARI Group Corp; Electric Power Research Institute of State Grid Jiangsu Electric Power Co Ltd
Priority date: 2017-07-27
Filing date: 2017-07-27
Publication date: 2017-11-24
Anticipated expiration: 2037-07-27
Also published as: CN107391306B

Abstract

The invention discloses a heterogeneous database backup file recovery method with three parts.

First, the data in the heterogeneous source databases are normalized and converted.

Second, a DELTA compression algorithm based on K-medoids clustering is applied. The data blocks are first clustered so that highly similar blocks fall into the same class, and each class is then compressed with the DELTA compression algorithm.

Third, the data are restored with an "SQL replay method". The version of the restore-side database is read from a configuration file, and the metadata model is converted by transformation rules into SQL statements supported by that database version. The statements are imported into the database after a data consistency check. Together these steps provide stable backup and recovery for heterogeneous databases.

By extending the mapping rules, the invention can support a variety of source databases. It thereby enables backup of heterogeneous databases, supports efficient file compression and reduces backup cost.

Description

A heterogeneous database backup file recovery method

Technical field

The present invention relates to a heterogeneous database backup file recovery method and belongs to the technical field of data backup and recovery.

Background

In recent years, information management systems have become widespread as information technology has developed. Because they are fast, efficient and convenient, they have become platforms for publishing and trading information. They have also driven the digitization and informatization of society as a whole, and information systems of every kind are being built in today's "information world".

No industry can develop without data such as product data, customer data and financial data, and enterprises depend more and more on their IT systems to survive and grow. Computer viruses, network intrusions, physical damage and operator errors can damage information data on a large scale and stop an information system from providing normal service. In industries with major economic interests at stake, such as banking, electric power and telecommunications, this can also cause huge economic losses. Protecting data by means of backup ensures that local data can be recovered quickly after a failure.

Database backup research is a demand-driven field. Major companies began related research early, and some redundancy techniques have long been used in many kinds of application environments. Research on backup software abroad started in the mid-1980s. Mature commercial backup products to date include Tivoli, BakBone's NetVault and CA's BrightStor.

The Software Research Institute of Zhongshan University and GuangZhou WeiTeng Networks Science Co., Ltd jointly developed NetBunker2 for network backup and recovery with Linux backup servers. HeartOne Backup Enterprise, from the Zhongshan company Tongxiang, provides distributed backup and intelligent backup and recovery, and simplifies server and network storage environments.

Backup software is flourishing in the open-source world, where many excellent open-source backup packages have appeared. Well-known examples include Amanda, Bacula, BackupPC, Restore and Burt. Although the technology of open-source software is public, its functionality covers only the most basic backup tasks and does not suit business scenarios. Theoretical research on some commercial features is therefore still needed.

As enterprises grow, their business data become large in volume, wide in origin, diverse in kind and complex in structure. Enterprises have accumulated large amounts of business data that are vital to their normal operation. Because the database systems used at different stages differ, backing up heterogeneous data has become a key problem in the data backup field. Some large databases such as Oracle and SQL Server provide their own backup and restore tools. However, these tools only support backing up a centralized database and cannot handle the heterogeneity of the database backup process.

Summary of the invention

The object of the invention is to overcome the shortcomings of the prior art by providing a heterogeneous database backup file recovery method. The method solves the technical problem that heterogeneous data cannot be effectively backed up and restored in the prior art.

To solve the above technical problem, the invention adopts the following technical solution: a heterogeneous database backup file recovery method comprising the following steps:

(1) normalizing and converting the data in the heterogeneous source databases;

(2) clustering the data blocks as preprocessing, compressing each class of data blocks with the DELTA compression algorithm to generate the corresponding binary storage file, and backing up the compressed backup file to the backup medium;

(3) restoring the metadata in the backup file with the "SQL replay method", reading the restore-side database version from the configuration file;

(4) converting the metadata model, according to the transformation rules, into SQL statements supported by the database of the corresponding version, and importing them into the database after a data consistency check, thereby recovering the heterogeneous database backup file.

Step (1) is carried out as follows:

101. Load the driver: import the driver into the development environment and load it with the Class.forName() function;

102. Create the connection: after the driver is loaded, create the database connection object with DriverManager.getConnection(); the connection information includes the protocol name, IP address, port number and database name;

103. Create the Statement object with Connection.createStatement();

104. Execute the SQL statement: use executeQuery() when the statement produces a single result set, executeUpdate() when it returns no result set, and execute() when it may return multiple result sets;

105. Obtain the results: executeQuery() returns a ResultSet object, and after execute() the ResultSet is obtained with getResultSet(); the returned data are read by moving the ResultSet cursor forward with next();

106. Load the transformation rules for the database type and convert the heterogeneous data into metadata of a unified standard with the excData() function. Each element of the metadata contains a key identifier field, which is used to check data consistency during recovery; if a metadata element is modified during backup, its identifier is set to 1;

107. Write the resulting data to a file in XML format with the wrtData() function, generating the corresponding backup file;

108. Close the connection: when the database is no longer needed, close the connection with the close() method.
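Steps 101 to 108 describe a JDBC workflow. The sketch below is a minimal, runnable version of the same sequence under one substitution: it uses Python's standard-library sqlite3 module (DB-API) in place of JDBC. Here connect() plays the role of DriverManager.getConnection(), a cursor plays the role of Statement, and iterating the cursor mirrors ResultSet.next(). ElementTree stands in for wrtData().

Three choices are assumptions for illustration, not the patent's format: the function name extract_table, the XML tag layout, and marking freshly extracted elements with identifier "1".

```python
import sqlite3
import xml.etree.ElementTree as ET


def extract_table(conn: sqlite3.Connection, table: str) -> ET.Element:
    """Read one table and return it as an XML metadata element."""
    cur = conn.cursor()  # roughly Connection.createStatement()
    # Table structure: SQLite's PRAGMA table_info rows are
    # (cid, name, type, notnull, dflt_value, pk).
    cur.execute(f"PRAGMA table_info({table})")  # table name assumed trusted
    columns = [(row[1], row[2]) for row in cur.fetchall()]

    # identifier="1": newly extracted, i.e. still to be restored (assumption)
    root = ET.Element("table", name=table, identifier="1")
    for name, decl_type in columns:
        ET.SubElement(root, "column", name=name, type=decl_type)

    cur.execute(f"SELECT * FROM {table}")  # roughly executeQuery()
    for row in cur:  # iterating the cursor mirrors ResultSet.next()
        rec = ET.SubElement(root, "row", identifier="1")
        for (name, _), value in zip(columns, row):
            field = ET.SubElement(rec, "field", name=name)
            field.text = "" if value is None else str(value)
    cur.close()
    return root


# Usage with a throwaway in-memory database.
conn = sqlite3.connect(":memory:")  # roughly DriverManager.getConnection()
conn.execute("CREATE TABLE meter (id INTEGER PRIMARY KEY, site TEXT, kwh REAL)")
conn.executemany("INSERT INTO meter VALUES (?, ?, ?)",
                 [(1, "Nanjing", 12.5), (2, "Suzhou", 7.0)])
xml_text = ET.tostring(extract_table(conn, "meter"), encoding="unicode")
conn.close()  # step 108
print(xml_text)
```

In a real backup run the element would be written to the backup file, for example with ET.ElementTree(root).write(path), instead of being printed.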

Metadata is the smallest unit of the data model; its structure is given by formula (1):

$$M = CS + SS \tag{1}$$

where CS is the content structure, which defines the constituent elements of the metadata and their content, and SS is the syntax structure, which defines the format structure of the metadata and the specific method of describing it;

The content structure is given by formula (2):

$$CS = (T, Z, S, F) \tag{2}$$

T denotes the source table, i.e. the table structure of each database. It stores the table structure information of the data to be backed up, including the source table sequence number, source table name, identifier, number of fields, field names and field types;

Z denotes the field, i.e. the data values of each database. It stores the concrete values of the fields in a table, including the field sequence number, field name, field type, field value, table name and identifier;

S denotes the predefined set, the basic unit of backup. It includes the predefined set number, source server, destination server, start time, end time, backup sequence number, source table sequence number and field sequence number. The predefined set defines the backup objects and subdivides the backup process into units, so that an interrupted backup task can be resumed from the point of interruption;

F denotes the constraint. The constraint element describes the field constraint information of a table and records its special columns, including the table name, constraint sequence number, primary key column names, foreign key column names, index column names and identifier.

The special column information is recorded separately so that the table structure can be described completely.
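The metadata model M = CS + SS can be pictured as a set of plain records. The sketch below is one possible reading of it. Each dataclass holds the members listed for T, Z, S and F, and to_xml() is a hypothetical syntax structure (SS) that renders every element as an XML tag. All class and attribute names are assumptions, not taken from the patent.

```python
from dataclasses import asdict, dataclass
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class SourceTable:  # T
    table_no: int
    table_name: str
    identifier: int
    field_count: int
    field_names: List[str]
    field_types: List[str]


@dataclass
class Field:  # Z
    field_no: int
    field_name: str
    field_type: str
    field_value: str
    table_name: str
    identifier: int


@dataclass
class PredefinedSet:  # S
    set_no: int
    source_server: str
    target_server: str
    start_time: str
    end_time: str
    backup_no: int
    table_no: int
    field_no: int


@dataclass
class Constraint:  # F
    table_name: str
    constraint_no: int
    primary_key_columns: List[str]
    foreign_key_columns: List[str]
    index_columns: List[str]
    identifier: int


@dataclass
class ContentStructure:  # CS = (T, Z, S, F)
    tables: List[SourceTable]
    fields: List[Field]
    sets: List[PredefinedSet]
    constraints: List[Constraint]


def to_xml(cs: ContentStructure) -> ET.Element:
    """One possible syntax structure (SS): each element becomes an XML tag."""
    root = ET.Element("metadata")
    groups = (("T", cs.tables), ("Z", cs.fields),
              ("S", cs.sets), ("F", cs.constraints))
    for tag, items in groups:
        for item in items:
            el = ET.SubElement(root, tag)
            for key, value in asdict(item).items():
                # Lists are flattened to comma-separated attribute values.
                el.set(key, ",".join(value) if isinstance(value, list) else str(value))
    return root


cs = ContentStructure(
    tables=[SourceTable(1, "meter", 1, 2, ["id", "kwh"], ["integer", "real"])],
    fields=[Field(1, "kwh", "real", "12.5", "meter", 1)],
    sets=[PredefinedSet(1, "db-src", "backup-01", "2017-07-27 01:00", "", 1, 1, 1)],
    constraints=[Constraint("meter", 1, ["id"], [], [], 1)],
)
doc = ET.tostring(to_xml(cs), encoding="unicode")
print(doc)
```

Keeping the content structure separate from its serialization mirrors the CS/SS split: a different syntax structure (for example JSON) could be swapped in without touching the element definitions.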

Step (2) is carried out as follows:

201. Split the file to be compressed into blocks of 1 MB and perform DELTA compression between every pair of blocks. Store the size of each DELTA-compressed result in the temporary matrix arr_delta[N][N] as the similarity between data blocks; a smaller compressed size indicates more similar blocks;

202. Using the similarity information stored in the similarity matrix as the clustering criterion, cluster the data blocks with the K-medoids clustering algorithm;

203. Select a feature set from the file with a content-independent method, and determine the number of intermediate fingerprints to generate and the file size according to the amount of memory that can be allocated;

204. Set the size of a sliding window and keep moving it forward, computing the fingerprint of the data under the window, and map the fingerprints with hash functions into super-features (super-fingerprints);

205. If the super-fingerprints match, search the feature database for the reference file with the highest similarity; once it is found, compress against it with the compression function D;

206. The compression function D encodes the target as a string of commands, where V is the file being encoded and R is the reference file. The ADD command has the format (ADD, L, S) and adds a string S of length L at the specified position in V. The COPY command has the format (COPY, L, O) and copies a string of length L at offset O from R to the specified position in V;

207. Merge the compressed data blocks back into the backup file.
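Step 206 fixes only the command formats. The sketch below shows a minimal greedy encoder that produces them, and it is an illustration of the ADD/COPY idea rather than the patent's compression function D. It assumes a fixed k-byte seed index over the reference R, with k = 4 as an arbitrary choice. Positions in V are implicit because the commands are appended in order.

```python
from typing import List, Tuple, Union

# ("ADD", L, S) or ("COPY", L, O), following the formats of step 206.
Command = Union[Tuple[str, int, bytes], Tuple[str, int, int]]


def delta_encode(ref: bytes, target: bytes, k: int = 4) -> List[Command]:
    """Greedily encode target V against reference R as ADD/COPY commands."""
    # Index the first offset of every k-byte substring of R.
    index = {}
    for off in range(len(ref) - k + 1):
        index.setdefault(ref[off:off + k], off)

    commands: List[Command] = []
    literal = bytearray()  # pending bytes for the next ADD command
    i = 0
    while i < len(target):
        off = index.get(target[i:i + k]) if i + k <= len(target) else None
        if off is None:
            literal.append(target[i])
            i += 1
            continue
        # Extend the seed match as far as R and V keep agreeing.
        length = k
        while (i + length < len(target) and off + length < len(ref)
               and target[i + length] == ref[off + length]):
            length += 1
        if literal:
            commands.append(("ADD", len(literal), bytes(literal)))
            literal = bytearray()
        commands.append(("COPY", length, off))
        i += length
    if literal:
        commands.append(("ADD", len(literal), bytes(literal)))
    return commands


def delta_decode(ref: bytes, commands: List[Command]) -> bytes:
    """Rebuild V by replaying the commands in order."""
    out = bytearray()
    for op, length, arg in commands:
        if op == "ADD":
            out += arg                    # arg is the literal string S
        else:
            out += ref[arg:arg + length]  # arg is the offset O in R
    return bytes(out)


ref = b"the quick brown fox jumps over the lazy dog"
target = b"the quick red fox jumps over the lazy cat"
cmds = delta_encode(ref, target)
print(cmds)
# → [('COPY', 10, 0), ('ADD', 3, b'red'), ('COPY', 25, 15), ('ADD', 3, b'cat')]
```

Only 6 of the 41 target bytes are stored literally; the rest become two COPY references into R. That is why grouping similar blocks before compression pays off.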

The data blocks of the same class are compressed with the DELTA compression algorithm as follows:

The backup file is divided into blocks, giving the data block set S = {S1, S2, S3, ..., Sn}. The data objects in S are clustered into K classes C' = {C1', C2', C3', ..., Ck'}, and the similarity between two similar data blocks is expressed as their DELTA distance, i.e.:

$$\operatorname{Dist}(S_i, S_j) = \operatorname{delta}(S_i, S_j) \tag{3}$$

Arbitrarily select K data blocks in S as the cluster centers, denoted {m1, m2, m3, ..., mk}, and assign each remaining data block to the nearest cluster center, obtaining the clusters C = {C1, C2, C3, ..., Ck};

For each cluster Ci, i ∈ {1, 2, 3, ..., k}, traverse the non-center objects Sj in the cluster. Use formula (4) to compute the total cost of each data block S_j, i.e. the sum of its DELTA distances to the remaining data blocks S_k of the cluster:

$$\operatorname{Cost}(S_j) = \sum_{S_k \in C_i,\ k \neq j} \operatorname{Dist}(S_j, S_k) \tag{4}$$

Select the point with the smallest total cost in each cluster as its new center. Repeat the above steps until no cluster center changes, finally obtaining the K clusters C' = {C1', C2', C3', ..., Ck'}.
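Formulas (3) and (4) can be tried on a toy input. In the sketch below, the DELTA distance is approximated by a stand-in: the size of zlib output when the target block is compressed with the reference block as a preset dictionary (zdict). Because zlib's window is 32 KB, this proxy only behaves sensibly for small blocks, not for the 1 MB blocks of step 201.

The distance is asymmetric: arr[i][j] is the cost of encoding block j against block i, which is exactly the cost that matters when members are later encoded against their medoid. The loop alternates assignment and medoid update until the medoids stop changing. The function names and the seeded random initialization are assumptions.

```python
import random
import zlib
from typing import Dict, List


def delta_size(ref: bytes, target: bytes) -> int:
    """Proxy for delta(ref, target): zlib size with ref as preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=ref)
    return len(comp.compress(target) + comp.flush())


def build_arr_delta(blocks: List[bytes]) -> List[List[int]]:
    """arr_delta[i][j] = cost of encoding block j against block i (formula (3))."""
    n = len(blocks)
    return [[0 if i == j else delta_size(blocks[i], blocks[j]) for j in range(n)]
            for i in range(n)]


def assign(dist: List[List[int]], medoids: List[int]) -> Dict[int, List[int]]:
    """Put every block into the cluster of its nearest medoid."""
    clusters: Dict[int, List[int]] = {m: [] for m in medoids}
    for p in range(len(dist)):
        # dist[m][m] == 0 and off-diagonal sizes are > 0, so each medoid
        # always lands in its own cluster and no cluster is empty.
        nearest = min(medoids, key=lambda m: dist[m][p])
        clusters[nearest].append(p)
    return clusters


def k_medoids(dist: List[List[int]], k: int, seed: int = 0,
              max_iter: int = 100) -> Dict[int, List[int]]:
    medoids = random.Random(seed).sample(range(len(dist)), k)
    for _ in range(max_iter):
        clusters = assign(dist, medoids)
        # Formula (4): total cost of candidate c = sum of Dist(c, q) in its cluster.
        new_medoids = [min(members, key=lambda c: sum(dist[c][q] for q in members))
                       for members in clusters.values()]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return assign(dist, medoids)


# Two families of near-identical random blocks.
rng = random.Random(42)
base_a = bytes(rng.getrandbits(8) for _ in range(2000))
base_b = bytes(rng.getrandbits(8) for _ in range(2000))


def mutate(block: bytes, pos: int) -> bytes:
    changed = bytearray(block)
    changed[pos] ^= 0xFF
    return bytes(changed)


blocks = [base_a, mutate(base_a, 500), mutate(base_a, 1500),
          base_b, mutate(base_b, 700), mutate(base_b, 1200)]
clusters = k_medoids(build_arr_delta(blocks), k=2)
print(sorted(sorted(c) for c in clusters.values()))
```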

Step (3) is carried out as follows:

301. Read the type and version number of the restore-side database and load the mapping rules for that version;

302. Read the predefined set sequence number of the task from the recovery task information, and use it to look up the source table, constraint and field sequence numbers to be restored;

303. Look up the source table element and constraint element in the metadata by their sequence numbers and check their identifiers: if the identifier is 1, go to step 304, otherwise go to step 305;

304. Obtain the source table and its dependency details, including the table name, the field names and field types in the table, the primary key, foreign keys and indexes. Generate the corresponding SQL statements, store them in a .sql file, and set the identifier to 0 once the file has been generated;

305. Obtain the field elements by field sequence number and check their identifiers: if the identifier is 1, go to step 306, otherwise go to step 307;

306. Obtain the field details, including the field name, field type, field value and the name of the source table the field belongs to. Generate the corresponding INSERT statements from this information to add the data, store them in the .sql file, and set the identifier to 0 once the file has been generated;

307. Call the control command to restore the data to the database by executing the .sql file.

When the metadata in the backup file is restored with the "SQL replay method", the identifier values in the metadata file are checked first:

If the identifier is 1, the data have not yet been restored, and the backup file content must be converted into SQL statements using the syntax mapping rules;

If the identifier is 0, the content was already restored to the database by an earlier recovery task and does not need to be converted and restored again.
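Steps 301 to 307 and the identifier rule above can be sketched as a generator of .sql text. The sketch makes several hypothetical simplifications: the target type rules, the SourceTable/Row classes, folding the primary-key constraint into the table, and grouping a record's Z elements into one Row. Only elements whose identifier is 1 produce statements, and each is reset to 0 afterwards, so a second run emits nothing.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

Value = Union[None, int, float, str]

# Hypothetical mapping rules: unified category -> column type per target database.
TYPE_RULES: Dict[str, Dict[str, str]] = {
    "mysql": {"character": "VARCHAR(255)", "integer": "INT",
              "real": "DOUBLE", "byte": "BLOB"},
    "postgresql": {"character": "VARCHAR(255)", "integer": "INTEGER",
                   "real": "DOUBLE PRECISION", "byte": "BYTEA"},
}


@dataclass
class SourceTable:  # simplified T element with its primary-key constraint (F)
    name: str
    columns: List[Tuple[str, str]]  # (field name, unified category)
    primary_key: Optional[str] = None
    identifier: int = 1  # 1 = not yet restored


@dataclass
class Row:  # simplified: the Z elements of one record grouped together
    table: str
    values: Dict[str, Value]
    identifier: int = 1


def sql_literal(v: Value) -> str:
    if v is None:
        return "NULL"
    if isinstance(v, (int, float)):
        return str(v)
    return "'" + v.replace("'", "''") + "'"  # standard SQL quote doubling


def replay(tables: List[SourceTable], rows: List[Row], dialect: str) -> List[str]:
    """Emit SQL only for elements flagged 1, then reset their flag to 0."""
    rules = TYPE_RULES[dialect]  # step 301: rules for the restore-side database
    statements: List[str] = []
    for t in tables:  # steps 303-304
        if t.identifier != 1:
            continue  # already restored by an earlier task
        cols = [f"{name} {rules[cat]}" for name, cat in t.columns]
        if t.primary_key:
            cols.append(f"PRIMARY KEY ({t.primary_key})")
        statements.append(f"CREATE TABLE {t.name} ({', '.join(cols)});")
        t.identifier = 0
    for r in rows:  # steps 305-306
        if r.identifier != 1:
            continue
        names = ", ".join(r.values)
        values = ", ".join(sql_literal(v) for v in r.values.values())
        statements.append(f"INSERT INTO {r.table} ({names}) VALUES ({values});")
        r.identifier = 0
    return statements


tables = [SourceTable("meter", [("id", "integer"), ("site", "character")], "id")]
rows = [Row("meter", {"id": 1, "site": "O'Hare"})]
sql_file_text = "\n".join(replay(tables, rows, "mysql"))  # contents of the .sql file
print(sql_file_text)
```

Step 307 would then hand this file to the target database's own client for execution.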

Compared with the prior art, the invention achieves the following beneficial effects:

The invention designs a general metadata model and defines mapping rules between this model and the data of the mainstream databases Oracle, MySQL and PostgreSQL; data are normalized into metadata and stored in XML files;

An improved DELTA compression algorithm is proposed that deduplicates backup files, reducing backup cost;

The invention overcomes the "information island" problem caused by heterogeneous databases within an enterprise and provides a consistency-oriented backup framework for enterprise needs, while also improving backup medium utilization and reducing backup cost;

For recovery tasks, metadata is converted, according to the database configuration, into SQL statements supported by the specified database version, and the data are imported into the database by executing these statements. During recovery, data are restored selectively according to the modification flags in the source data model, ensuring data consistency.

Brief description of the drawings

Fig. 1 is a schematic diagram of the hierarchical structure of the backup system;

Fig. 2 is a flow chart of the invention;

Fig. 3 is a flow chart of heterogeneous data extraction;

Fig. 4 is a flow chart of data compression based on K-medoids clustering;

Fig. 5 is a flow chart of data recovery.

Embodiments

The invention provides a heterogeneous database backup file recovery method with three parts.

First, a metadata model is designed, the data in the heterogeneous source databases are normalized and converted, and the metadata model is stored in XML files.

Second, a DELTA compression algorithm based on K-medoids clustering is proposed. It first clusters the data blocks so that highly similar blocks fall into the same class, and then compresses each class with the DELTA compression algorithm.

Third, the data are restored with the "SQL replay method". The restore-side database version is read from the configuration file, and the metadata model is converted by the transformation rules into SQL statements supported by that database version. The data are imported into the database after a data consistency check. Together these steps provide stable backup and recovery for heterogeneous databases.

The invention is further described below with reference to the accompanying drawings. The following embodiments only serve to illustrate the technical solution of the invention more clearly and do not limit its scope of protection.

The backup system has three functions: data extraction, data processing and data recovery.

Data extraction uses the metadata model to describe the data types of the different databases uniformly. It extracts the source data according to the backup task and stores them in backup files.

Data processing uses the compression algorithm to compress duplicate content in the backup files and generates the corresponding binary storage files. It then backs up the compressed files to the backup medium.

Data recovery converts the metadata in the backup files with the SQL replay method and generates .sql files that databases of each version can execute. Finally, it imports the data into the database system.

As shown in Fig. 1, the backup system has a three-layer hierarchical structure: the common connection layer, the business layer and the application layer.

(1) Common connection layer

The common connection layer is at the bottom of the system and implements the database connection function. It provides database connection and query services to the business layer. When databases with higher security levels are backed up, it also provides encryption and decryption, ensuring reliable connections to the heterogeneous data sources. Connections to the different databases are established mainly through JDBC.

(2) Business layer

The business layer implements the core functions of the system, and every link of database backup and recovery is implemented in this layer. Data conversion maps metadata to and from database data. The mapping rules shield the differences in data formats, constraint rules and SQL syntax among heterogeneous databases, which is one of the difficulties of heterogeneous database backup.

The data compression function uses the DELTA compression algorithm based on K-medoids clustering. It doubles the efficiency of the basic DELTA compression algorithm and can compress backup files to about a quarter of their original size, reducing backup cost while increasing backup speed.

The consistency detection function safeguards data reliability by ensuring that, after a recovery task is executed, the content of the database is identical to its content at backup time.

These functions depend on one another in sequence. In the backup stage, data conversion is performed first, and the converted files are then compressed and stored on the backup medium. In the recovery stage, the compressed files are restored to data files by the decompression technique. The identifiers in the data then determine which content to restore, and that content is converted into SQL statements by the transformation rules and imported into the database.

(3) Application layer

The application layer uses the services of the business layer and the common connection layer to solve practical problems, mainly user-defined backup and restore tasks or backup and restore plans. Its interface is built with Qt, ensuring the portability and extensibility of the whole system.

As shown in Figs. 2 to 5, the heterogeneous database backup file recovery method provided by the invention comprises the following steps:

(1) Normalize and convert the data in the heterogeneous source databases, as follows:

102. Create the connection: after the driver is loaded, create the database connection object with DriverManager.getConnection(), e.g. Connection connect = DriverManager.getConnection("url", "UserName", "PassWord"). Although the URL format differs between databases, it should contain the protocol name, IP address, port number, database name and similar information. UserName and PassWord are the user name and password for connecting to the database management system;

103. Create the Statement object with Connection.createStatement(); the Statement class is used to execute SQL statements and obtain the results they produce;

104. Execute the SQL statement: Statement provides three main methods for executing SQL, namely executeQuery(), executeUpdate() and execute(). executeQuery() is used when the statement produces a single result set, executeUpdate() when it returns no result set, and execute() when it may return multiple result sets;

108. Close the connection: to avoid wasting resources, close the database connection with the close() method once the database is no longer used.

The data formats of the heterogeneous databases differ to some extent from the metadata format. Sybase, for example, offers rich data types together with powerful functions and syntax, and Oracle's basic character types include CHAR, VARCHAR2, NCHAR and NVARCHAR2. Moreover, some database systems do not support certain data types of others, so mapping rules are defined for the conversion.

The data mapping rules, also called the metadata dictionary, are the basis for normalizing heterogeneous data. They are designed around the data types of the heterogeneous source databases and, according to the meaning each type expresses, classify them as character, real, integer or byte types. The specific mapping relationships are shown in Table 1.

Table 1 Data type mapping rules
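The contents of Table 1 are not reproduced in this text. The sketch below illustrates how such a metadata dictionary could be applied: it maps declared column types to the four unified categories (character, real, integer, byte). The Oracle character types come from the paragraph above. Every other entry is an assumption, including treating Oracle NUMBER as real regardless of its scale.

```python
from typing import Dict

# Illustrative metadata dictionary: declared type -> unified category.
# Only the Oracle character types come from the text; the rest are assumptions.
TYPE_RULES: Dict[str, Dict[str, str]] = {
    "oracle": {"CHAR": "character", "VARCHAR2": "character",
               "NCHAR": "character", "NVARCHAR2": "character",
               "NUMBER": "real", "BINARY_DOUBLE": "real",
               "INTEGER": "integer", "RAW": "byte", "BLOB": "byte"},
    "mysql": {"CHAR": "character", "VARCHAR": "character", "TEXT": "character",
              "INT": "integer", "BIGINT": "integer", "SMALLINT": "integer",
              "FLOAT": "real", "DOUBLE": "real", "DECIMAL": "real",
              "BLOB": "byte", "VARBINARY": "byte"},
    "postgresql": {"CHAR": "character", "VARCHAR": "character", "TEXT": "character",
                   "INTEGER": "integer", "BIGINT": "integer", "SMALLINT": "integer",
                   "REAL": "real", "DOUBLE PRECISION": "real", "NUMERIC": "real",
                   "BYTEA": "byte"},
}


def unify(db: str, declared: str) -> str:
    """Map a declared column type such as 'VARCHAR2(50)' to its unified category."""
    base = declared.split("(")[0].strip().upper()  # drop length/precision
    try:
        return TYPE_RULES[db.lower()][base]
    except KeyError:
        raise ValueError(f"no mapping rule for {db} type {declared!r}") from None


print(unify("oracle", "VARCHAR2(50)"), unify("postgresql", "double precision"))
```

Raising on an unknown type, rather than silently defaulting, makes gaps in the dictionary visible. Extending support to a new source database then amounts to adding one more entry to TYPE_RULES.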

Metadata is the smallest unit of the data model; its structure is given by formula (1):

$$M = CS + SS \tag{1}$$

where CS (Content Structure) is the content structure, which defines the constituent elements of the metadata and their content, and SS (Syntax Structure) is the syntax structure, which defines the format structure of the metadata and the specific method of describing it;

The content structure is given by formula (2):

$$CS = (T, Z, S, F) \tag{2}$$

Source table (T): this element represents the table structure of each database and stores the table structure information of the data to be backed up, including the source table sequence number, source table name, identifier, number of fields, field names and field types.

Field (Z): this element represents the data values of each database and stores the concrete values of the fields in a table, including the field sequence number, field name, field type, field value, table name and identifier.

Predefined set (S): this element defines the backup objects and subdivides the backup process into units, so that an interrupted backup task can be resumed from the point of interruption. This mechanism saves time and improves backup efficiency. It also helps keep backup results consistent by preventing already backed-up data from being backed up again and causing redundancy. The predefined set is the basic unit of backup and contains the objects to be backed up. Its members are the predefined set number, source server, destination server, start time, end time, backup sequence number, source table sequence number and field sequence number.

Constraint (F): the constraint element describes the field constraint information of a table and records its special columns, including the table name, constraint sequence number, primary key column names, foreign key column names, index column names and identifier. Because of their specific functions, special columns must be recorded separately so that the table structure can be described completely.
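The resume-from-interruption behaviour of the predefined set can be illustrated with a checkpoint set. This is a minimal sketch: BackupUnit, run_backup and the in-memory done set are hypothetical. A real system would persist the checkpoint alongside the backup so that it survives a crash.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class BackupUnit:  # hypothetical: one predefined set (S element), reduced to two members
    set_no: int
    table_no: int


def run_backup(units: Iterable[BackupUnit], done: Set[int],
               backup_one: Callable[[BackupUnit], None]) -> List[int]:
    """Back up every unit not yet in `done`; record each one as it completes."""
    processed: List[int] = []
    for unit in sorted(units, key=lambda u: u.set_no):
        if unit.set_no in done:
            continue  # finished before the interruption, so not backed up twice
        backup_one(unit)
        done.add(unit.set_no)  # checkpoint; a real system would persist this
        processed.append(unit.set_no)
    return processed


units = [BackupUnit(n, 100 + n) for n in range(1, 6)]
done: Set[int] = set()


def flaky(unit: BackupUnit) -> None:
    if unit.set_no == 3:
        raise ConnectionError("link to source server lost")  # simulated interruption


try:
    run_backup(units, done, flaky)
except ConnectionError:
    pass
resumed = run_backup(units, done, lambda unit: None)
print(done, resumed)
```

The second run skips units 1 and 2 and continues from unit 3, which is the redundancy-avoiding behaviour described for the predefined set.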

(2) Cluster the data blocks as preprocessing, compress each class of data blocks with the DELTA compression algorithm to generate the corresponding binary storage file, and back up the compressed backup file to the backup medium, as follows:

202. Using the similarity information stored in the similarity matrix as the clustering criterion, cluster the data blocks with the K-medoids clustering algorithm, so that data blocks within the same class are highly similar;

204. Set the size of a sliding window and keep moving it forward, computing the fingerprint of the data under the window. To speed up retrieval and reduce lookup time, the fingerprints are mapped with hash functions into super-features (super-fingerprints);

205. If the super-fingerprints match, the two files are highly similar. Search the feature database for a highly similar reference file; once it is found, compress against it with the compression function D;

207. Merge the compressed data blocks back into the backup file.

$$\operatorname{Dist}(S_i, S_j) = \operatorname{delta}(S_i, S_j) \tag{3}$$
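Steps 203 to 205 resemble the common super-feature approach to resemblance detection. The sketch below illustrates that general technique and is not the patent's exact procedure. A Rabin-Karp rolling hash fingerprints every w-byte window, each feature is the maximum of a fixed linear transform of those fingerprints, and groups of features are hashed into super-features. The window size, the number of features, the transform coefficients and the group size are all assumptions.

```python
import hashlib
import random
from typing import Dict, List, Optional

MOD = (1 << 61) - 1  # Mersenne prime modulus for all hash arithmetic


def window_fingerprints(data: bytes, w: int = 8) -> List[int]:
    """Rabin-Karp rolling hash of every w-byte sliding window (step 204)."""
    if len(data) < w:
        return []
    base = 257
    h = 0
    for b in data[:w]:
        h = (h * base + b) % MOD
    top = pow(base, w - 1, MOD)  # weight of the byte leaving the window
    fps = [h]
    for i in range(w, len(data)):
        h = ((h - data[i - w] * top) * base + data[i]) % MOD
        fps.append(h)
    return fps


def super_features(data: bytes, n_features: int = 12, group: int = 3) -> List[str]:
    fps = window_fingerprints(data)
    if not fps:
        return []  # block shorter than one window
    # Feature i: max of a fixed linear transform of all window fingerprints.
    feats = [max(((2 * i + 3) * fp + 7 * i + 11) % MOD for fp in fps)
             for i in range(n_features)]
    # Super-feature: hash of each group of consecutive features.
    return [hashlib.sha1(repr(feats[i:i + group]).encode()).hexdigest()[:16]
            for i in range(0, n_features, group)]


def best_reference(db: Dict[str, List[str]], sfs: List[str]) -> Optional[str]:
    """Return the stored block sharing the most super-features, if any (step 205)."""
    scores = {name: len(set(sfs) & set(ref)) for name, ref in db.items()}
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None


rng = random.Random(7)
block_a = bytes(rng.getrandbits(8) for _ in range(4000))
block_b = bytes(rng.getrandbits(8) for _ in range(4000))
edited = bytearray(block_a)
edited[2000] ^= 0xFF  # a one-byte edit of block_a
feature_db = {"A": super_features(block_a), "B": super_features(block_b)}
print(best_reference(feature_db, super_features(bytes(edited))))
```

A small edit changes only the few windows that cover it, so most features, and therefore most super-features, survive. That is why a single matching super-feature is a cheap, strong hint of similarity.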

(3) Restore the metadata in the backup file with the "SQL replay method", reading the restore-side database version from the configuration file. The transformation rules are applied in reverse to turn the metadata into SQL statements that the database can recognize, and the corresponding .sql files are generated. The specific method is as follows:

The above is only a preferred embodiment of the invention. Those of ordinary skill in the art can make several improvements and variations without departing from the technical principles of the invention, and these improvements and variations shall also be regarded as falling within the scope of protection of the invention.

Claims

1. A heterogeneous database backup file recovery method, characterized by comprising the following steps:

(1) normalizing and converting the data in the heterogeneous source databases;

(2) clustering the data blocks as preprocessing, compressing each class of data blocks with the DELTA compression algorithm to generate the corresponding binary storage file, and backing up the compressed backup file to the backup medium;

(3) restoring the metadata in the backup file with the "SQL replay method", reading the restore-side database version from the configuration file;

(4) converting the metadata model, according to the transformation rules, into SQL statements supported by the database of the corresponding version, and importing them into the database after a data consistency check, thereby recovering the heterogeneous database backup file.
2. The heterogeneous database backup file recovery method according to claim 1, characterized in that step (1) specifically comprises:

101. Load the driver: import the driver into the development environment and load it with the Class.forName() function;

102. Create the connection: after the driver is loaded, create the database connection object with DriverManager.getConnection(); the connection information includes the protocol name, IP address, port number and database name;

103. Create the Statement object with Connection.createStatement();

104. Execute the SQL statement: use executeQuery() when the statement produces a single result set, executeUpdate() when it returns no result set, and execute() when it may return multiple result sets;

105. Obtain the results: executeQuery() returns a ResultSet object, and after execute() the ResultSet is obtained with getResultSet(); the returned data are read by moving the ResultSet cursor forward with next();

106. Load the transformation rules for the database type and convert the heterogeneous data into metadata of a unified standard with the excData() function. Each element of the metadata contains a key identifier field, which is used to check data consistency during recovery; if a metadata element is modified during backup, its identifier is set to 1;

107. Write the resulting data to a file in XML format with the wrtData() function, generating the corresponding backup file;

108. Close the connection: when the database is no longer needed, close the connection with the close() method.
3. The heterogeneous database backup file recovery method according to claim 1, characterized in that the metadata is the smallest unit of the data model, and its structure is given by formula (1):

$$M = CS + SS \tag{1}$$

where CS is the content structure, which defines the constituent elements of the metadata and their content, and SS is the syntax structure, which defines the format structure of the metadata and the specific method of describing it;

The content structure is given by formula (2):

$$CS = (T, Z, S, F) \tag{2}$$

T denotes the source table, i.e. the table structure of each database. It stores the table structure information of the data to be backed up, including the source table sequence number, source table name, identifier, number of fields, field names and field types;

Z denotes the field, i.e. the data values of each database. It stores the concrete values of the fields in a table, including the field sequence number, field name, field type, field value, table name and identifier;

S denotes the predefined set, the basic unit of backup. It includes the predefined set number, source server, destination server, start time, end time, backup sequence number, source table sequence number and field sequence number. The predefined set defines the backup objects and subdivides the backup process into units, so that an interrupted backup task can be resumed from the point of interruption;

F denotes the constraint. The constraint element describes the field constraint information of a table and records its special columns, including the table name, constraint sequence number, primary key column names, foreign key column names, index column names and identifier.
4. The heterogeneous database backup file recovery method according to claim 3, characterized in that the special column information is recorded separately so that the table structure can be described completely.
5. The heterogeneous database backup file recovery method according to claim 1, characterized in that step (2) specifically comprises:

201st, treat compressed file and carry out cutting, using 1M file sizes as dividing unit, to the blocks of files of division two-by-two it Between carry out DELTA compressions, DELTA compression after file size be stored in provisional matrix arr_delta [N] [N], as number According to the similarity between block；

202. Using the similarity information stored in the similarity matrix as the clustering criterion, cluster the data blocks with the K-medoids clustering algorithm;

203. Select a feature set from the file using a content-independent method, and determine the number of intermediate fingerprints and the file size according to the amount of memory that can be allocated;

204. Set the size of a sliding window and move the window forward continuously, computing the data fingerprint under the window at each position; map these fingerprints with hash functions into super features (super-fingerprint sets);
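Steps 203 and 204 do not specify the fingerprint or mapping functions. One common way to derive super features, assumed here rather than taken from the claim, is to compute a Rabin-Karp rolling hash at every window position, keep the maximum of several linear transforms of those hashes as features, and hash fixed-size groups of features into super features. The window size, transform constants and group size are arbitrary illustrative values.

```python
import hashlib
from typing import List

MOD = (1 << 61) - 1    # Mersenne prime modulus for the rolling hash
BASE = 257             # rolling-hash base
# Hypothetical feature transforms f_k(x) = (a*x + b) mod MOD; constants are arbitrary.
TRANSFORMS = [(2 * k + 3, 7919 * k + 1) for k in range(12)]

def window_fingerprints(data: bytes, window: int = 48) -> List[int]:
    """Rabin-Karp hash of each `window`-byte window as the window slides forward by one byte."""
    if len(data) < window:
        return []
    top = pow(BASE, window - 1, MOD)            # weight of the byte leaving the window
    h = 0
    for b in data[:window]:
        h = (h * BASE + b) % MOD
    fps = [h]
    for i in range(window, len(data)):
        h = ((h - data[i - window] * top) * BASE + data[i]) % MOD
        fps.append(h)
    return fps

def super_features(data: bytes, window: int = 48, group: int = 4) -> List[str]:
    """Features = max of each transform over all windows; super features = hash of feature groups."""
    fps = window_fingerprints(data, window)
    if not fps:
        return []
    feats = [max((a * fp + b) % MOD for fp in fps) for a, b in TRANSFORMS]
    supers = []
    for g in range(0, len(feats), group):
        packed = b"".join(f.to_bytes(8, "big") for f in feats[g:g + group])
        supers.append(hashlib.sha256(packed).hexdigest()[:16])
    return supers
```

With 12 features in groups of 4, each block yields 3 super features; two blocks that share a super feature are candidates for delta compression against each other.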

205. If the super fingerprints match, search the feature database for the reference file with the highest similarity to the file; once that reference file is found, compress the file against it using the compression function D;
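The feature-database lookup of step 205 can be approximated with an inverted index from super feature to stored block. This is a hypothetical sketch: the class and method names are invented, and "highest similarity" is assumed to mean "shares the most super features".

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional

class FeatureIndex:
    """Hypothetical feature database: maps each super feature to the blocks that contain it."""

    def __init__(self) -> None:
        self._index: Dict[str, List[str]] = defaultdict(list)

    def add(self, block_id: str, super_features: List[str]) -> None:
        for sf in super_features:
            self._index[sf].append(block_id)

    def best_reference(self, super_features: List[str]) -> Optional[str]:
        """Return the stored block sharing the most super features, or None if nothing matches."""
        votes: Counter = Counter()
        for sf in super_features:
            for block_id in self._index.get(sf, []):
                votes[block_id] += 1
        if not votes:
            return None
        return votes.most_common(1)[0][0]
```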

206. The compression function D encodes the target file V against the reference file R as a sequence of commands: an ADD command, of the form (ADD, L, S), adds the string S of length L at the specified position in V; a COPY command, of the form (COPY, L, O), copies the string of length L starting at offset O in R to the specified position in V;
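A minimal sketch of a compression function D for step 206, assuming a greedy encoder: it indexes every k-byte substring of the reference R, emits (COPY, L, O) for the longest match it can extend at each position of V, and gathers unmatched bytes into (ADD, L, S) commands. The function names and k = 4 are assumptions; production delta encoders (for example, VCDIFF-style ones) are considerably more elaborate. A decoder is included to show that the commands reproduce V.

```python
from typing import Dict, List, Tuple

def delta_encode(reference: bytes, target: bytes, k: int = 4) -> List[Tuple]:
    """Greedy COPY/ADD encoding of target V against reference R."""
    index: Dict[bytes, int] = {}
    for o in range(len(reference) - k + 1):
        index.setdefault(reference[o:o + k], o)    # first offset of each k-byte substring
    commands: List[Tuple] = []
    literal = bytearray()                           # unmatched bytes awaiting an ADD
    i = 0
    while i < len(target):
        o = index.get(target[i:i + k])              # tail slices shorter than k never match
        if o is None:
            literal.append(target[i])
            i += 1
            continue
        length = k                                  # extend the match as far as possible
        while (i + length < len(target) and o + length < len(reference)
               and target[i + length] == reference[o + length]):
            length += 1
        if literal:
            commands.append(("ADD", len(literal), bytes(literal)))
            literal.clear()
        commands.append(("COPY", length, o))
        i += length
    if literal:
        commands.append(("ADD", len(literal), bytes(literal)))
    return commands

def delta_decode(reference: bytes, commands: List[Tuple]) -> bytes:
    """Rebuild V by replaying the commands against R."""
    out = bytearray()
    for op, length, arg in commands:
        if op == "ADD":
            out += arg                              # arg is the literal string S
        else:
            out += reference[arg:arg + length]      # arg is the offset O in R
    return bytes(out)
```

For R = "the quick brown fox jumps over the lazy dog" and V = "the quick red fox jumps over the lazy cat", the encoder produces (COPY, 10, 0), (ADD, 3, "red"), (COPY, 25, 15), (ADD, 3, "cat").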

207. Merge the compressed data blocks back together to form the backup file.
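The claim does not define the layout of the merged backup file in step 207. Purely as an assumption, each compressed block could be stored as a length-prefixed record that also notes the index of its reference block (-1 when the block is stored without a reference), so that the file can be split back into blocks during recovery. The layout and function names below are hypothetical.

```python
import struct
from typing import List, Tuple

# Hypothetical layout, big-endian: uint32 block count, then for each block
# int32 reference index (-1 = no reference), uint32 payload length, payload bytes.

def pack_backup(blocks: List[Tuple[int, bytes]]) -> bytes:
    out = bytearray(struct.pack(">I", len(blocks)))
    for ref, payload in blocks:
        out += struct.pack(">iI", ref, len(payload))
        out += payload
    return bytes(out)

def unpack_backup(data: bytes) -> List[Tuple[int, bytes]]:
    (count,) = struct.unpack_from(">I", data, 0)
    pos = 4
    blocks = []
    for _ in range(count):
        ref, length = struct.unpack_from(">iI", data, pos)
        pos += 8                                    # 4-byte reference index + 4-byte length
        blocks.append((ref, data[pos:pos + length]))
        pos += length
    return blocks
```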
6. The heterogeneous database backup file recovery method according to claim 5, wherein the specific method of compressing data blocks of the same class with the DELTA compression algorithm is as follows:

The backup file is divided into blocks, and the set of data blocks is denoted S = {S1, S2, S3, ..., Sn}. The data objects in S are clustered so that the data blocks are divided into K classes C' = {C1', C2', C3', ..., Ck'}. The similarity between two data blocks is expressed as the DELTA distance between them, i.e.:

Dist (Si, Sj)=delta (Si, Sj) (3)

Select any K data blocks in S as the cluster centers, denoted {m1, m2, m3, ..., mk}, and assign each remaining data block to the cluster whose center is nearest to it, obtaining the clusters C = {C1, C2, C3, ..., Ck};

For each cluster Ci, i ∈ {1, 2, 3, ..., k}, traverse each non-center object S_j in the cluster and use formula (4) to compute the total cost of S_j with respect to the remaining data blocks S_k of the cluster, where l_i is the number of data blocks in Ci:

$$\mathrm{cost} = \sum_{k=1}^{l_i} \mathrm{Dist}(S_j, S_k) = \sum_{k=1}^{l_i} \mathrm{delta}(S_j, S_k) \qquad (4)$$

Select the point with the minimum total cost in each cluster as the new cluster center, and iterate the above steps until no cluster center changes any more, finally obtaining the K clusters C' = {C1', C2', C3', ..., Ck'}.
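The clustering loop of claim 6 can be sketched as below, operating on a precomputed distance matrix such as arr_delta from step 201 (entry [i][j] read as the delta size of block j against block i). The function names are hypothetical, and the optional `init` argument, which fixes the initial centers, is an addition for reproducibility. As with any K-medoids variant of this kind, the result can depend on the initial centers.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

def assign(dist: Sequence[Sequence[float]], medoids: List[int]) -> Dict[int, List[int]]:
    """Each center forms its own cluster; every other block joins the nearest center."""
    clusters = {m: [m] for m in medoids}
    for s in range(len(dist)):
        if s not in clusters:
            nearest = min(medoids, key=lambda m: dist[m][s])
            clusters[nearest].append(s)
    return clusters

def k_medoids(dist: Sequence[Sequence[float]], k: int, init: Optional[List[int]] = None,
              seed: int = 0, max_iter: int = 100) -> Tuple[List[int], List[List[int]]]:
    """Assign blocks, re-pick each cluster's minimum-cost member, stop when centers are stable."""
    medoids = list(init) if init is not None else random.Random(seed).sample(range(len(dist)), k)
    for _ in range(max_iter):
        clusters = assign(dist, medoids)
        # Formula (4): cost of candidate j = sum of delta(S_j, S_k) over the cluster's members.
        new_medoids = [min(members, key=lambda j: sum(dist[j][x] for x in members))
                       for members in clusters.values()]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return medoids, list(assign(dist, medoids).values())
```

For example, with five blocks whose pairwise distances are the gaps between positions 0, 1, 2, 10 and 11, and initial centers [0, 1], the loop settles on centers 1 and 3 with clusters {0, 1, 2} and {3, 4}.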
7. The heterogeneous database backup file recovery method according to claim 1, wherein step (3) is specifically as follows:

301. Read the database type and version number of the restore side, and load the corresponding mapping rules according to the database version;
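Step 301 leaves the format of the mapping rules open. A minimal sketch, assuming rules are keyed by database type and major version and map generic metadata types to target-specific column types; the rule table, the generic type names and the function names are illustrative, not the patent's own rules.

```python
from typing import Dict, Tuple

# Hypothetical mapping rules: generic metadata type -> target-specific SQL type.
MAPPING_RULES: Dict[Tuple[str, int], Dict[str, str]] = {
    ("mysql", 5): {"INT": "INT", "STRING": "VARCHAR({len})", "DATETIME": "DATETIME", "TEXT": "LONGTEXT"},
    ("oracle", 11): {"INT": "NUMBER(10)", "STRING": "VARCHAR2({len})", "DATETIME": "DATE", "TEXT": "CLOB"},
}

def load_rules(db_type: str, version: str) -> Dict[str, str]:
    """Pick the rule set for the restore side from its type and major version number."""
    major = int(version.split(".")[0])
    try:
        return MAPPING_RULES[(db_type.lower(), major)]
    except KeyError:
        raise ValueError(f"no mapping rules for {db_type} {version}") from None

def map_type(rules: Dict[str, str], generic: str, length: int = 255) -> str:
    """Translate one generic column type; unused format fields are simply ignored."""
    return rules[generic].format(len=length)
```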

302. Read the predetermined set sequence number of the corresponding task from the recovery task information, and use it to look up the sequence numbers of the source tables, constraints and fields to be restored;

303. Look up the corresponding source table element and constraint element in the metadata according to the source table sequence number and constraint sequence number, and check the corresponding identifier: if the identifier is 1, perform step 304; otherwise perform step 305;

304. Obtain the specific information on which the source table depends, including the table name and the field names, field types, primary key, foreign keys and indexes of the table; after acquisition, generate the corresponding SQL statements and store them in a .sql file, and set the identifier to 0 once the file has been generated;
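Steps 303 and 304 can be sketched as a function that emits DDL for a source table only while its identifier is 1 and then sets it to 0. The dictionary keys, the (column, "table(column)") form of foreign keys, the index-naming scheme and the use of a list of SQL lines in place of the .sql file are all assumptions for illustration.

```python
from typing import Dict, List

def create_table_sql(table: Dict, constraint: Dict) -> List[str]:
    """Build CREATE TABLE (with key constraints) and CREATE INDEX statements."""
    cols = [f"{n} {t}" for n, t in zip(table["field_names"], table["field_types"])]
    if constraint.get("primary_key"):
        cols.append(f"PRIMARY KEY ({', '.join(constraint['primary_key'])})")
    for col, ref in constraint.get("foreign_keys", []):     # ref like "dept(id)"
        cols.append(f"FOREIGN KEY ({col}) REFERENCES {ref}")
    name = table["table_name"]
    stmts = [f"CREATE TABLE {name} ({', '.join(cols)});"]
    for col in constraint.get("index_columns", []):
        stmts.append(f"CREATE INDEX idx_{name}_{col} ON {name} ({col});")
    return stmts

def restore_table(table: Dict, constraint: Dict, sql_lines: List[str]) -> bool:
    """Steps 303-304: emit DDL only if the identifier is 1, then mark the table as done."""
    if table["identifier"] != 1:
        return False
    sql_lines.extend(create_table_sql(table, constraint))
    table["identifier"] = 0
    return True
```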

305. Obtain the corresponding field element according to the field sequence number and check its identifier: if the identifier is 1, perform step 306; otherwise perform step 307;

306. Obtain the specific field information, including the field name, field type, field value and the name of the source table to which the field belongs; generate the corresponding INSERT statements from this information to add the data, store them in the .sql file, and set the identifier to 0 once the file has been generated;
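Step 306 can be sketched in the same way for data rows. The claim attaches identifiers to individual field elements but does not say how fields are grouped into rows, so this sketch assumes the caller passes the field elements of one row and skips the row only when every field is already marked 0. Single quotes are escaped by doubling them, as standard SQL requires; the set of unquoted numeric types is an assumption.

```python
from typing import Dict, List, Optional

# Assumed set of numeric base types whose values are emitted without quotes.
NUMERIC_TYPES = {"INT", "INTEGER", "BIGINT", "SMALLINT", "DECIMAL", "NUMERIC", "NUMBER", "FLOAT", "REAL"}

def sql_literal(value: Optional[str], field_type: str) -> str:
    """Render a stored field value as an SQL literal."""
    if value is None:
        return "NULL"
    if field_type.split("(")[0].strip().upper() in NUMERIC_TYPES:
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"    # standard SQL quote escaping

def insert_sql(row: List[Dict]) -> Optional[str]:
    """Step 306 for one row of field elements; returns None if the row was already restored."""
    if all(f["identifier"] == 0 for f in row):
        return None
    table = row[0]["table_name"]
    names = ", ".join(f["field_name"] for f in row)
    values = ", ".join(sql_literal(f["value"], f["field_type"]) for f in row)
    for f in row:
        f["identifier"] = 0                              # mark as restored (0 = done, claim 8)
    return f"INSERT INTO {table} ({names}) VALUES ({values});"
```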

307. Invoke a control command to restore the data to the database by executing the .sql file.
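Step 307 only states that a control command executes the .sql file. As a stand-in that runs anywhere, the sketch below uses Python's built-in `sqlite3` module and `Connection.executescript`; against a real target database one would instead invoke that database's own client tool. Wrapping the script in a single transaction is a design choice assumed here, so that a failed restore leaves no partial data in SQLite. The function names are hypothetical.

```python
import sqlite3
from pathlib import Path
from typing import Iterable

def write_sql_file(path: Path, statements: Iterable[str]) -> None:
    """Store the generated statements in a .sql file, one per line."""
    path.write_text("\n".join(statements) + "\n", encoding="utf-8")

def execute_sql_file(conn: sqlite3.Connection, path: Path) -> None:
    """Run the whole .sql file in one transaction; roll back if any statement fails."""
    script = path.read_text(encoding="utf-8")
    try:
        conn.executescript("BEGIN;\n" + script + "\nCOMMIT;")
    except sqlite3.Error:
        conn.rollback()
        raise
```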
8. The heterogeneous database local backup and recovery method according to claim 1, wherein, when the metadata in the backup file is restored using the "SQL reproduction method", the value of the identifier in the metadata file is checked first:

If the identifier is 1, the data have not yet been restored, and the content of the backup file is converted into SQL statements using the syntax mapping rules;

If the identifier is 0, the content was already restored to the database in a previous recovery task and does not need to be converted and restored again.