CN107145523A

CN107145523A - Large-scale heterogeneous knowledge base alignment method based on iterative matching

Info

Publication number: CN107145523A
Application number: CN201710237034.6A
Authority: CN
Inventors: 陈岭; 顾伟东
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2017-04-12
Filing date: 2017-04-12
Publication date: 2017-09-08
Anticipated expiration: 2037-04-12
Also published as: CN107145523B

Abstract

The invention discloses a large-scale heterogeneous knowledge base alignment method based on iterative matching, implemented as follows: 1) screen the data in the original knowledge bases and unify the data format, then obtain the relations in the knowledge bases and the initial matched entity pairs; 2) partition the preprocessed knowledge bases using the relations in the knowledge bases, and simplify the blocks; 3) match blocks using the matched entity pairs to obtain matched block pairs; 4) select candidate entity pairs within the matched block pairs, and confirm them using a similarity measure and a threshold; 5) repeat the above steps until no new candidate entity pair can be found, which yields all matched entity pairs. The invention applies the idea of iterative matching to the alignment of heterogeneous knowledge bases and has broad application prospects in fields such as knowledge base alignment, data fusion and automatic question answering.

Description

Large-scale heterogeneous knowledge base alignment method based on iterative matching

Technical field

The present invention relates to the field of knowledge base alignment, and more particularly to a large-scale heterogeneous knowledge base alignment method based on iterative matching.

Background art

With the arrival of Web 3.0, structured knowledge bases appear on the Internet with increasing frequency. These knowledge bases are widely used in semantic applications such as automatic question answering, search services and social services. However, the limited information in any single knowledge base restricts the functionality of these applications. Against this background, knowledge base alignment has great room for development. Knowledge base alignment usually refers to entity alignment between knowledge bases, that is, automatically finding two entities that represent the same real-world thing and linking them.

As knowledge bases keep growing, knowledge base alignment methods generally divide the alignment process into two steps: discovering candidate entity pairs and confirming candidate entity pairs. Discovery typically uses a small number of attributes to quickly screen out a few candidate entities for each entity; confirmation compares the two entities comprehensively and uses a similarity measure and a threshold to decide whether they match. Because this avoids an exact comparison between every pair of entities, it greatly improves overall efficiency. At present, the bottleneck of knowledge base alignment methods is that the discovered candidate entity pairs are often incomplete, so that entity pairs that could be matched are never found.

To improve the quality of candidate entity pairs, researchers have proposed iterative matching: each round discovers a small number of matched entity pairs, which then serve as the basis for discovering candidate entity pairs in the next round. However, traditional knowledge base alignment methods mostly address the alignment of homogeneous knowledge bases, that is, two knowledge bases that share many alignable relations. Their basic assumption is that if a pair of entities matches and the two entities have an aligned relation, then their "compatible neighbors" are more likely to match, so the "compatible neighbors" are taken as candidate entity pairs. When few relations can be aligned between the knowledge bases, however, these methods miss some of the candidate entity pairs. To solve this problem, researchers have proposed class-based knowledge base alignment methods, which group instances with the same features into the same class and exclude candidate entities whose content is unrelated to the class, thereby determining the candidate entity pairs. However, because such a method obtains candidate entity pairs only once, at the initial stage of the model, using a classical partitioning technique, it also misses many candidate entity pairs when few attributes can be aligned between the two knowledge bases.

Summary of the invention

In view of the above, the present invention proposes a large-scale heterogeneous knowledge base alignment method based on iterative matching. The method applies the idea of iterative matching to knowledge base alignment: within an iterative framework it traverses the relations to partition the knowledge bases, which enlarges the search space of candidate entity pairs; at the same time, it selects and confirms candidate entity pairs in a divide-and-conquer manner, so that each entity needs to be compared comprehensively with only a few candidate entities, which improves the efficiency of the method.

A large-scale heterogeneous knowledge base alignment method based on iterative matching specifically comprises:

Data preprocessing stage: the data in any two original knowledge bases KB₁ and KB₂ are screened, their data format is unified and meaningless characters are removed; the relation set R₁ corresponding to the processed knowledge base KB′₁ and the relation set R₂ corresponding to the processed knowledge base KB′₂ are collected; and the initial matched entity pair set is obtained by comparison.

Knowledge base alignment stage: the knowledge bases KB′₁ and KB′₂ are partitioned using the relations in relation sets R₁ and R₂, and each block is simplified to obtain the simplified block sets B′₁ and B′₂; then the blocks in B′₁ and B′₂ are matched using the initial matched entity pair set to obtain matched block pairs; finally, candidate entity pairs are selected within the matched block pairs and confirmed using a similarity measure and the threshold δ_e.

The data preprocessing stage comprises the following steps:

(1-1) Input any two original knowledge bases KB₁ and KB₂, and remove the information in KB₁ and KB₂ that is irrelevant to the alignment task;

(1-2) Unify the data format of the literals L₁ in knowledge base KB₁ and the literals L₂ in knowledge base KB₂, expressing dates, numbers and names in a uniform form;

(1-3) Remove stop-word characters, symbol characters and language-tag characters from the literals L₁ in knowledge base KB₁ and the literals L₂ in knowledge base KB₂, obtaining the processed knowledge bases KB′₁ and KB′₂;

(1-4) Collect the relation set R₁ corresponding to knowledge base KB′₁ and the relation set R₂ corresponding to knowledge base KB′₂;

(1-5) Compare all entities in knowledge base KB′₁ and knowledge base KB′₂ to obtain the initial matched entity pair set.

A knowledge base is defined as a six-tuple (E, L, R, P, F_R, F_P), where E, L, R and P denote the sets of entities, literals, relations and attributes, respectively; F_R ⊆ E×R×E denotes the set of entity-relation-entity triples, i.e. relation facts whose object is an entity; F_P ⊆ E×P×L denotes the set of entity-attribute-literal triples, i.e. attribute facts whose object is a literal. Both F_R and F_P contain meaningless information; for example, some knowledge bases include the original text corpus from which the triples were extracted, and such information reduces the efficiency of the algorithm. In addition, triples containing the "sameAs" relation should also be removed.

The detailed process of step (1-4) is:

For knowledge base KB′₁, traverse all (entity-relation-entity) triples in the triple set F_R1 belonging to that knowledge base and collect the relation set R₁; for knowledge base KB′₂, traverse all (entity-relation-entity) triples in the triple set F_R2 belonging to that knowledge base and collect the relation set R₂. The relation sets R₁ and R₂ are used in the subsequent knowledge base partitioning.

In step (1-5), the initial matched entity pair set is obtained as follows:

First, extract all entities in knowledge base KB′₁ to form the entity set E₁ and all entities in knowledge base KB′₂ to form the entity set E₂; take the Cartesian product of E₁ and E₂, each element pairing one entity of E₁ with one entity of E₂, as the entity pair set;

Then, retain the entity pairs in the entity pair set whose two entity name attributes have identical string representations, obtaining the pre-initial matched entity pair set;

Finally, select from the pre-initial matched entity pair set the entity pairs that stand in a one-to-one matching relationship, and take them as the initial matched entity pair set.

The knowledge base alignment stage comprises the following steps:

(2-1) Input the knowledge bases KB′₁ and KB′₂, the relation sets R₁ and R₂ and the initial matched entity pair set; set the block similarity threshold δ_b, the entity similarity threshold δ_e, the in-block entity count threshold δ₁ and the in-block matched entity ratio threshold δ₂; initialize the matched entity pair set M_e to the initial matched entity pair set;

(2-2) Randomly select a relation from relation set R₁ or relation set R₂, and use the relation to divide the entities in knowledge bases KB′₁ and KB′₂ into several blocks, obtaining the block set B₁ corresponding to KB′₁ and the block set B₂ corresponding to KB′₂;

(2-3) Remove from block sets B₁ and B₂ the blocks that are likely to incur a high computational cost or unlikely to yield matched entity pairs, obtaining the simplified block sets B′₁ and B′₂;

(2-4) Using all matched entity pairs in the matched entity pair set M_e, measure the similarity between every block in B′₁ and every block in B′₂, and match each two blocks whose similarity exceeds the block similarity threshold δ_b, obtaining the matched block pair set;

(2-5) For each matched block pair in the matched block pair set, take the Cartesian product of the unmatched entities in one block of the pair and the unmatched entities in the other block as candidate entity pairs, forming the candidate entity pair set;

(2-6) Determine whether no new candidate entity pair has been found; if not, proceed to step (2-7); if so, end the iteration and output the matched entity pair set M_e;

(2-7) Calculate the similarity between the two entities of each candidate entity pair in the candidate entity pair set, add the candidate entity pairs whose similarity exceeds the entity similarity threshold δ_e to the matched entity pair set M_e, and discard the remaining candidate entity pairs;

(2-8) Determine whether the number of iterations has reached the iteration threshold; if not, return to step (2-2); if so, end the iteration and output the matched entity pair set M_e.

In step (2-2), the entities in a knowledge base are divided into several blocks using a relation as follows:

First, for the triple set F_R1 of knowledge base KB′₁, count the n distinct object entities in F_R1;

Then, for each distinct object entity, group together all subject entities in F_R1 that correspond to it, obtaining one block; the n distinct object entities yield n blocks, which form the block set B₁;

The block set B₂ is obtained in the same way.

In step (2-3), the blocks that are likely to incur a high computational cost or unlikely to yield matched entity pairs include: blocks whose number of entities exceeds the threshold δ₁, blocks whose ratio of matched entities is below the threshold δ₂, and blocks in which all entities have already been matched.

In step (2-4), the similarity between blocks is obtained as follows:

Each block is regarded as a set of entities, and a matched entity pair is regarded as an element common to the two sets, so the similarity between blocks can be measured by set similarity. The similarity sim_block(b_k,b_l) is calculated as:

sim_block(b_k,b_l)=|b_k∩b_l|/|b_k∪b_l|

where b_k and b_l denote two blocks, |b_k∩b_l| denotes the number of matched entity pairs in the two blocks, and |b_k∪b_l| denotes the total number of entities in the two blocks.

In step (2-7), the similarity between entities is given by:

sim(e_i,e_j)=α sim_string(e_i,e_j)+(1-α)sim_block(b_k,b_l)

s.t. e_i∈b_k, e_j∈b_l

where b_k and b_l denote the blocks containing entities e_i and e_j, respectively, sim_string(e_i,e_j) and sim_block(b_k,b_l) denote the string similarity and the block similarity between the entities, respectively, and α is the weight of the string similarity, with values in [0,1].

Preferably, similarity functions based on the Levenshtein distance, the Jaro-Winkler distance, q-grams and I-SUB are used, and these similarity functions are combined by linear weighting to calculate the string similarity.
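The linear weighting of string similarity functions can be illustrated with a minimal Python sketch. It implements only two of the four named functions (Levenshtein and q-gram); Jaro-Winkler and I-SUB are omitted for brevity. The function names, the q-gram set formulation and the equal weights are assumptions made for illustration and are not taken from the patent.

```python
def levenshtein_similarity(a: str, b: str) -> float:
    """Return 1 - (edit distance / longer length), a value in [0, 1]."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))


def qgram_similarity(a: str, b: str, q: int = 2) -> float:
    """Jaccard overlap of the character q-gram sets (an assumed formulation)."""
    grams_a = {a[i:i + q] for i in range(len(a) - q + 1)}
    grams_b = {b[i:i + q] for i in range(len(b) - q + 1)}
    if not grams_a and not grams_b:
        return 1.0 if a == b else 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def string_similarity(a: str, b: str, weights=(0.5, 0.5)) -> float:
    """Linear weighted combination; the weights here are illustrative only."""
    scores = (levenshtein_similarity(a, b), qgram_similarity(a, b))
    return sum(w * s for w, s in zip(weights, scores))
```

Further functions can be added by extending `scores` and `weights` in parallel, provided the weights sum to 1 so that the result stays in [0, 1].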

The present invention applies the idea of iterative matching to the alignment of heterogeneous knowledge bases: within an iterative framework it traverses the relations to partition the knowledge bases, which enlarges the search space of candidate entity pairs; at the same time, it selects and confirms candidate entity pairs in a divide-and-conquer manner, so that each entity needs to be compared comprehensively with only a few candidate entities, which improves the efficiency of the method. Compared with existing methods, its advantages are:

(1) Knowledge base alignment is treated as an iterative process. In different iterations, each relation is traversed to partition the knowledge bases, and candidate entity pairs are selected using matched block pairs, so that the alignment method does not depend on alignable relations and attributes between the knowledge bases.

(2) In each iteration, only a small number of matched entity pairs are found, and these are then used to select candidate entity pairs. Because the selection of candidate entity pairs draws on the information of more and more matched entity pairs, the quality of the candidate entity pairs improves.

Brief description of the drawings

Fig. 1 is a flow block diagram of the large-scale heterogeneous knowledge base alignment method based on iterative matching according to the present invention;

Fig. 2 is a flow chart of the data preprocessing stage of the large-scale heterogeneous knowledge base alignment method based on iterative matching according to the present invention;

Fig. 3 is a flow chart of the knowledge base alignment stage of the large-scale heterogeneous knowledge base alignment method based on iterative matching according to the present invention.

Detailed description of the embodiments

In order to describe the present invention more specifically, the technical solution of the present invention is described in detail below with reference to the accompanying drawings and an embodiment.

As shown in Fig. 1, the large-scale heterogeneous knowledge base alignment method based on iterative matching according to the present invention is divided into two parts: data preprocessing and knowledge base alignment. Data preprocessing: the data in the original knowledge bases are screened, the data format is unified, and the relations in the knowledge bases and the initial matched entity pairs are obtained. Knowledge base alignment: first, the preprocessed knowledge bases are partitioned using the relations in the knowledge bases and the blocks are simplified; next, blocks are matched using the matched entity pairs to obtain matched block pairs; then candidate entity pairs are selected within the matched block pairs and confirmed using a similarity measure and a threshold; finally, these steps are repeated until no new candidate entity pair can be found, which yields all matched entity pairs.

Fig. 2 shows the flow chart of the data preprocessing stage. According to Fig. 2, the stage comprises the following steps:

S1-1, input any two original knowledge bases KB₁ and KB₂, and remove the information in KB₁ and KB₂ that is irrelevant to the alignment task.

A knowledge base is defined as a six-tuple (E, L, R, P, F_R, F_P), where E, L, R and P denote the sets of entities, literals, relations and attributes, respectively; F_R ⊆ E×R×E denotes the set of entity-relation-entity triples, i.e. relation facts whose object is an entity; F_P ⊆ E×P×L denotes the set of entity-attribute-literal triples, i.e. attribute facts whose object is a literal. Both F_R and F_P contain meaningless information; for example, some knowledge bases include the original text corpus from which the triples were extracted, and such information reduces the efficiency of the algorithm. In addition, triples containing the "sameAs" relation should also be removed.
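As a rough illustration of this definition, the six-tuple can be held in a small Python structure that stores only the two fact sets and derives E, L, R and P from them, which also covers the collection of the relation set in step S1-4. The class name, field names, the helper `drop_same_as` and the sample triples are all hypothetical; the patent does not prescribe a storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Sketch of the six-tuple (E, L, R, P, F_R, F_P); names are illustrative."""
    relation_facts: set = field(default_factory=set)   # F_R: (entity, relation, entity)
    attribute_facts: set = field(default_factory=set)  # F_P: (entity, attribute, literal)

    @property
    def entities(self):    # E
        subjects = {s for s, _, _ in self.relation_facts | self.attribute_facts}
        objects = {o for _, _, o in self.relation_facts}
        return subjects | objects

    @property
    def relations(self):   # R
        return {r for _, r, _ in self.relation_facts}

    @property
    def attributes(self):  # P
        return {p for _, p, _ in self.attribute_facts}

    @property
    def literals(self):    # L
        return {v for _, _, v in self.attribute_facts}


def drop_same_as(kb, same_as_names=("sameAs", "owl:sameAs")):
    """Return a copy without "sameAs" triples; the name list is an assumption."""
    kept = {t for t in kb.relation_facts if t[1] not in same_as_names}
    return KnowledgeBase(kept, set(kb.attribute_facts))


kb = KnowledgeBase(
    relation_facts={("Hangzhou", "locatedIn", "China"),
                    ("Hangzhou", "sameAs", "kb2:Hangzhou")},
    attribute_facts={("Hangzhou", "name", "Hangzhou")},
)
cleaned = drop_same_as(kb)
```

Deriving the four sets from the facts keeps them consistent with the triples after any filtering step.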

S1-2, unify the data format of the literals L₁ in knowledge base KB₁ and the literals L₂ in knowledge base KB₂, expressing dates, numbers and names in a uniform form.

Literals such as names, dates and numbers may be expressed differently in different knowledge bases, for example "2016-01-01" and "01.01.2016". Unifying them facilitates subsequent comparison. In addition, the method converts literals to lowercase.
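A minimal Python sketch of this kind of normalization is given below. The list of accepted date layouts, the ISO target format and the handling of thousands separators are assumptions chosen for illustration; the patent only states that dates, numbers and names are unified and that literals are lowercased.

```python
import re
from datetime import datetime

# Assumed input date layouts; a real knowledge base may need more.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%Y/%m/%d")
# Numbers written with thousands separators, e.g. "1,234,567" or "1,234.5".
GROUPED_NUMBER = re.compile(r"\d{1,3}(,\d{3})+(\.\d+)?")


def normalize_literal(value: str) -> str:
    """Map a literal to one canonical textual form."""
    text = value.strip()
    for fmt in DATE_FORMATS:
        try:
            # Any recognized date is rewritten as YYYY-MM-DD.
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    if GROUPED_NUMBER.fullmatch(text):
        return text.replace(",", "")
    # Everything else (names and free text) is lowercased.
    return text.lower()
```

With this sketch, the two date spellings from the example above map to the same string, so a later string comparison treats them as equal.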

S1-3, remove meaningless characters such as stop-word characters, symbol characters and language tags from the literals L₁ in knowledge base KB₁ and the literals L₂ in knowledge base KB₂, obtaining the processed knowledge bases KB′₁ and KB′₂.

The descriptions of entity attributes in a knowledge base may contain meaningless characters, for example stop words such as "the", "a" and "an", symbols such as "#", "!" and "*", and language tags such as "@en". These characters affect the similarity measurement of entity pairs and are therefore removed.
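The removal of meaningless characters might look like the following Python sketch. The stop-word list and symbol class contain only the examples named above, and the regular expression for trailing language tags is an assumption; a production list would be larger.

```python
import re

STOP_WORDS = {"the", "a", "an"}   # only the examples named in the text
SYMBOLS = re.compile(r"[#!*]")    # only the symbols named in the text
# Trailing language tag such as "@en" or "@en-US" (assumed pattern).
LANG_TAG = re.compile(r"@[A-Za-z]{2,3}(-[A-Za-z0-9]+)*$")


def clean_literal(value: str) -> str:
    """Strip the language tag, then symbols, then stop words."""
    text = LANG_TAG.sub("", value.strip())
    text = SYMBOLS.sub(" ", text)
    tokens = [t for t in text.split() if t.lower() not in STOP_WORDS]
    return " ".join(tokens)
```

The language tag is removed first because the symbol and stop-word passes could otherwise split or alter it.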

S1-4, collect the relation set R₁ corresponding to knowledge base KB′₁ and the relation set R₂ corresponding to knowledge base KB′₂.

In this step, for knowledge base KB′₁, all (entity-relation-entity) triples in the triple set F_R1 belonging to that knowledge base are traversed to collect the relation set R₁; for knowledge base KB′₂, all (entity-relation-entity) triples in the triple set F_R2 belonging to that knowledge base are traversed to collect the relation set R₂. The relation sets R₁ and R₂ are used in the subsequent knowledge base partitioning.

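S1-5, compare all entities in knowledge base KB′₁ and knowledge base KB′₂ to obtain the initial matched entity pair set.

In this step, the initial matched entity pair set is obtained as follows: first, all entities in KB′₁ are extracted to form the entity set E₁ and all entities in KB′₂ to form the entity set E₂, and the Cartesian product of E₁ and E₂ forms the entity pair set; then the entity pairs whose two name attributes have identical string representations are retained as the pre-initial matched entity pair set; finally, the entity pairs that stand in a one-to-one matching relationship are selected from it as the initial matched entity pair set.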
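The initial matching of step S1-5 can be sketched in Python as follows. Instead of enumerating the full Cartesian product, the sketch indexes entities by name, which yields the same pairs when every entity has exactly one name; this substitution, the function name and the dictionary layout (entity id mapped to its preprocessed name) are assumptions made for illustration.

```python
from collections import defaultdict


def initial_matches(names1, names2):
    """Return one-to-one pairs of entities whose name strings are identical.

    names1 / names2 map an entity id to its preprocessed name (assumed layout).
    """
    by_name1, by_name2 = defaultdict(list), defaultdict(list)
    for entity, name in names1.items():
        by_name1[name].append(entity)
    for entity, name in names2.items():
        by_name2[name].append(entity)

    matches = set()
    for name, ents1 in by_name1.items():
        ents2 = by_name2.get(name, [])
        # Keep the pair only if the name is unique on both sides,
        # i.e. the name-equality match is one-to-one.
        if len(ents1) == 1 and len(ents2) == 1:
            matches.add((ents1[0], ents2[0]))
    return matches


names_kb1 = {"a1": "paris", "a2": "springfield", "a3": "springfield", "a4": "rome"}
names_kb2 = {"b1": "paris", "b2": "springfield", "b3": "berlin"}
seed_pairs = initial_matches(names_kb1, names_kb2)
```

In the sample data, "springfield" is ambiguous in the first knowledge base and is therefore excluded, leaving only the pair for "paris".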

Fig. 3 shows the flow chart of the knowledge base alignment stage. According to Fig. 3, the stage comprises the following steps:

S2-1, input the knowledge bases KB′₁ and KB′₂, the relation sets R₁ and R₂ and the initial matched entity pair set; set the block similarity threshold δ_b to 0.2, the entity similarity threshold δ_e to 0.65, the in-block entity count threshold δ₁ to 50 and the in-block matched entity ratio threshold δ₂ to 0.3; initialize the matched entity pair set M_e to the initial matched entity pair set.

S2-2, randomly select a relation from relation set R₁ or relation set R₂, and use the relation to divide the entities in knowledge bases KB′₁ and KB′₂ into several blocks, obtaining the block set B₁ corresponding to KB′₁ and the block set B₂ corresponding to KB′₂.

In this step, the entities in a knowledge base are divided into several blocks using a relation as follows:

First, for the triple set F_R1 of knowledge base KB′₁, count the n distinct object entities in F_R1;

Then, for each distinct object entity, group together all subject entities in F_R1 that correspond to it, obtaining one block; the n distinct object entities yield n blocks, which form the block set B₁.

The block set B₂ is obtained in the same way, namely:

First, for the triple set F_R2 of knowledge base KB′₂, count the n distinct object entities in F_R2;

Then, for each distinct object entity, group together all subject entities in F_R2 that correspond to it, obtaining one block; the n distinct object entities yield n blocks, which form the block set B₂.
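The partitioning described above amounts to grouping subject entities by object entity for the selected relation. A minimal Python sketch follows; the function name and the tuple layout (subject, relation, object) are assumptions for illustration.

```python
from collections import defaultdict


def partition_by_relation(relation_facts, relation):
    """Group the subjects of `relation` by object entity; one block per object."""
    blocks = defaultdict(set)
    for subject, rel, obj in relation_facts:
        if rel == relation:
            blocks[obj].add(subject)
    return dict(blocks)


facts = {
    ("a1", "bornIn", "c1"),
    ("a2", "bornIn", "c1"),
    ("a3", "bornIn", "c2"),
    ("a1", "worksFor", "o1"),
}
blocks_born_in = partition_by_relation(facts, "bornIn")
```

Because an entity may be the subject of several triples with different objects, the resulting blocks can overlap.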

S2-3, remove from block sets B₁ and B₂ the blocks that are likely to incur a high computational cost or unlikely to yield matched entity pairs, obtaining the simplified block sets B′₁ and B′₂.

In this step, the blocks that are likely to incur a high computational cost or unlikely to yield matched entity pairs include: blocks whose number of entities exceeds the threshold δ₁, blocks whose ratio of matched entities is below the threshold δ₂, and blocks in which all entities have already been matched.
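The three pruning rules can be sketched in Python as below, using the threshold values of this embodiment (δ₁ = 50, δ₂ = 0.3) as defaults. The function name, the dictionary layout of the block set and the `matched` argument (the set of already matched entities of the same knowledge base) are assumptions for illustration.

```python
def prune_blocks(blocks, matched, max_size=50, min_matched_ratio=0.3):
    """Drop blocks that are too large, too sparsely matched, or fully matched.

    blocks  : dict mapping a block key to a set of entity ids
    matched : set of entity ids of this knowledge base that are already matched
    """
    kept = {}
    for key, members in blocks.items():
        if not members:
            continue
        n_matched = len(members & matched)
        if len(members) > max_size:                      # costly to compare
            continue
        if n_matched / len(members) < min_matched_ratio:  # little evidence
            continue
        if n_matched == len(members):                    # nothing left to find
            continue
        kept[key] = members
    return kept


sample_blocks = {
    "x": {"e1", "e2", "e3"},               # 1 of 3 matched: kept
    "y": {"e1", "e4"},                     # all matched: dropped
    "z": {"e5", "e6", "e7", "e8"},         # none matched: dropped
    "big": {f"n{i}" for i in range(51)},   # more than 50 entities: dropped
}
pruned = prune_blocks(sample_blocks, matched={"e1", "e4"})
```

The rules are applied independently, so a block is removed as soon as any one of them holds.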

S2-4, using all matched entity pairs in the matched entity pair set M_e, measure the similarity between every block in the simplified block set B′₁ and every block in the simplified block set B′₂, and match each two blocks whose similarity exceeds the block similarity threshold δ_b, obtaining the matched block pair set.
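A hedged Python sketch of the block matching in S2-4 is shown below. It treats each matched entity pair as one shared element, so the union is taken as the two block sizes minus the number of shared pairs; this Jaccard-style reading of "total number of entities in the two blocks", the assumption that the matched pairs are one-to-one, and all function names are illustrative rather than taken from the patent.

```python
def block_similarity(block1, block2, matched_pairs):
    """Set similarity in which a matched pair counts as one common element."""
    shared = sum(1 for e1, e2 in matched_pairs if e1 in block1 and e2 in block2)
    union = len(block1) + len(block2) - shared
    return shared / union if union else 0.0


def match_blocks(blocks1, blocks2, matched_pairs, delta_b=0.2):
    """Return the key pairs of all blocks whose similarity exceeds delta_b."""
    return [
        (k1, k2)
        for k1, b1 in blocks1.items()
        for k2, b2 in blocks2.items()
        if block_similarity(b1, b2, matched_pairs) > delta_b
    ]


blocks_kb1 = {"p": {"a1", "a2", "a3"}}
blocks_kb2 = {"q": {"b1", "b2"}, "r": {"b8", "b9"}}
known_pairs = {("a1", "b1")}
block_pairs = match_blocks(blocks_kb1, blocks_kb2, known_pairs)
```

In the sample, blocks "p" and "q" share one matched pair out of four distinct elements, giving 0.25, which exceeds the embodiment's threshold of 0.2.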

S2-5, for each matched block pair in the matched block pair set, take the Cartesian product of the unmatched entities in one block of the pair and the unmatched entities in the other block as candidate entity pairs, forming the candidate entity pair set.
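Candidate generation in S2-5 can be sketched in Python as a Cartesian product restricted to unmatched entities. The function name and the representation of block sets as dictionaries keyed by block id are assumptions for illustration.

```python
from itertools import product


def candidate_pairs(block_pairs, blocks1, blocks2, matched_pairs):
    """Pair every unmatched entity of one block with every unmatched entity
    of its matched partner block."""
    matched1 = {e1 for e1, _ in matched_pairs}
    matched2 = {e2 for _, e2 in matched_pairs}
    candidates = set()
    for k1, k2 in block_pairs:
        free1 = blocks1[k1] - matched1
        free2 = blocks2[k2] - matched2
        candidates.update(product(free1, free2))
    return candidates


cands = candidate_pairs(
    block_pairs=[("p", "q")],
    blocks1={"p": {"a1", "a2", "a3"}},
    blocks2={"q": {"b1", "b2"}},
    matched_pairs={("a1", "b1")},
)
```

Restricting the product to matched block pairs is what keeps the number of comprehensive comparisons per entity small.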

S2-6, determine whether no new candidate entity pair has been found; if not, proceed to S2-7; if so, end the iteration and output the matched entity pair set M_e.

S2-7, calculate the similarity between the two entities of each candidate entity pair in the candidate entity pair set, add the candidate entity pairs whose similarity exceeds the entity similarity threshold δ_e to the matched entity pair set M_e, and discard the remaining candidate entity pairs.

In this step, the similarity between entities is measured in two ways, string similarity and block similarity, and the two similarities are combined with a given weight according to the following formula:

sim(e_i,e_j)=α sim_string(e_i,e_j)+(1-α)sim_block(b_k,b_l)

s.t. e_i∈b_k, e_j∈b_l

where sim_string(e_i,e_j) and sim_block(b_k,b_l) denote the string similarity and the block similarity between the entities, respectively, b_k and b_l denote the blocks containing entities e_i and e_j, respectively, and α is the weight of the string similarity, set to 0.6. For the attribute pairs shared by entities e_i and e_j (for example, the name), the string similarity measures the similarity of the attribute values. The method uses several similarity functions, such as those based on the Levenshtein distance, the Jaro-Winkler distance, q-grams and I-SUB, and combines them by linear weighting. The block similarity expresses the similarity of two entities through the similarity of the blocks that contain them. Once the similarity between the entities is obtained, the threshold δ_e is used to decide whether the pair matches, and the newly found matched entity pairs are added to the set of all matched entity pairs.
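The weighted combination and the threshold test can be checked with a short Python sketch that uses this embodiment's values α = 0.6 and δ_e = 0.65 as defaults. The function names are hypothetical, and the two component similarities are taken as given inputs rather than computed here.

```python
def entity_similarity(sim_string: float, sim_block: float, alpha: float = 0.6) -> float:
    """sim = alpha * sim_string + (1 - alpha) * sim_block."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * sim_string + (1.0 - alpha) * sim_block


def is_match(sim_string: float, sim_block: float,
             alpha: float = 0.6, delta_e: float = 0.65) -> bool:
    """A candidate pair is confirmed when its similarity exceeds delta_e."""
    return entity_similarity(sim_string, sim_block, alpha) > delta_e
```

For example, with a block similarity of 0.25, a string similarity of 0.9 gives 0.6 × 0.9 + 0.4 × 0.25 = 0.64, which falls just below the threshold, while identical strings give 0.6 + 0.1 = 0.70 and are accepted.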

S2-8, determine whether the number of iterations has reached the iteration threshold; if not, return to S2-2; if so, end the iteration and output the matched entity pair set M_e.
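Steps S2-1 to S2-8 can be tied together in a compact Python sketch. It makes several simplifying assumptions that are not stated in the patent: one relation is drawn from R₁ for KB′₁ and one from R₂ for KB′₂ in each round (one possible reading of S2-2), block pruning (S2-3) is omitted for brevity, `difflib.SequenceMatcher` stands in for the weighted string similarity functions, a candidate pair counts as "new" only if it has not been evaluated in an earlier round, and the iteration threshold of 20 is arbitrary. All names are hypothetical.

```python
import random
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import product


def partition(facts, relation):
    """Group subjects of `relation` by object entity (S2-2)."""
    blocks = defaultdict(set)
    for s, r, o in facts:
        if r == relation:
            blocks[o].add(s)
    return blocks


def block_sim(b1, b2, matched):
    """Set similarity with matched pairs as common elements (S2-4)."""
    shared = sum(1 for e1, e2 in matched if e1 in b1 and e2 in b2)
    union = len(b1) + len(b2) - shared
    return shared / union if union else 0.0


def align(facts1, facts2, names1, names2, initial, delta_b=0.2,
          delta_e=0.65, alpha=0.6, max_iters=20, seed=0):
    rng = random.Random(seed)
    rels1 = sorted({r for _, r, _ in facts1})
    rels2 = sorted({r for _, r, _ in facts2})
    matched, seen = set(initial), set()
    for _ in range(max_iters):                       # S2-8: bounded iteration
        blocks1 = partition(facts1, rng.choice(rels1))
        blocks2 = partition(facts2, rng.choice(rels2))
        done1 = {a for a, _ in matched}
        done2 = {b for _, b in matched}
        scored = {}                                  # candidate -> block similarity
        for b1, b2 in product(blocks1.values(), blocks2.values()):
            sb = block_sim(b1, b2, matched)
            if sb > delta_b:                         # matched block pair
                for pair in product(b1 - done1, b2 - done2):   # S2-5
                    if pair not in seen:
                        scored[pair] = max(sb, scored.get(pair, 0.0))
        if not scored:                               # S2-6: no new candidates
            break
        seen.update(scored)
        for (e1, e2), sb in scored.items():          # S2-7: confirm candidates
            ss = SequenceMatcher(None, names1[e1], names2[e2]).ratio()
            if alpha * ss + (1 - alpha) * sb > delta_e:
                matched.add((e1, e2))
    return matched


kb1_facts = {("a1", "bornIn", "c1"), ("a2", "bornIn", "c1")}
kb2_facts = {("b1", "birthPlace", "d1"), ("b2", "birthPlace", "d1")}
kb1_names = {"a1": "alice smith", "a2": "bob jones"}
kb2_names = {"b1": "alice smith", "b2": "bob jones"}
result = align(kb1_facts, kb2_facts, kb1_names, kb2_names, {("a1", "b1")})
```

In the sample, the two knowledge bases use different relation names, yet the seed pair makes their single blocks similar enough (1/3) to propose and confirm the second pair, which illustrates why the method does not need alignable relations.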

The above embodiment describes the technical solution and beneficial effects of the present invention in detail. It should be understood that the foregoing is only the most preferred embodiment of the present invention and is not intended to limit the invention; any modification, supplement or equivalent substitution made within the scope of the principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A large-scale heterogeneous knowledge base alignment method based on iterative matching, specifically comprising:

Data preprocessing stage: the data in any two original knowledge bases KB₁ and KB₂ are screened, their data format is unified and meaningless characters are removed; the relation set R₁ corresponding to the processed knowledge base KB′₁ and the relation set R₂ corresponding to the processed knowledge base KB′₂ are collected; and the initial matched entity pair set is obtained by comparison;

Knowledge base alignment stage: the knowledge bases KB′₁ and KB′₂ are partitioned using the relations in relation sets R₁ and R₂, and each block is simplified to obtain the simplified block sets B′₁ and B′₂; then the blocks in B′₁ and B′₂ are matched using the initial matched entity pair set to obtain matched block pairs; finally, candidate entity pairs are selected within the matched block pairs and confirmed using a similarity measure and the threshold δ_e.

2. The large-scale heterogeneous knowledge base alignment method based on iterative matching according to claim 1, characterized in that the data preprocessing stage comprises the following steps:

(1-1) Input any two original knowledge bases KB₁ and KB₂, and remove the information in KB₁ and KB₂ that is irrelevant to the alignment task;

(1-2) Unify the data format of the literals L₁ in knowledge base KB₁ and the literals L₂ in knowledge base KB₂, expressing dates, numbers and names in a uniform form;

(1-3) Remove stop-word characters, symbol characters and language-tag characters from the literals L₁ in knowledge base KB₁ and the literals L₂ in knowledge base KB₂, obtaining the processed knowledge bases KB′₁ and KB′₂;

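(1-4) Collect the relation set R₁ corresponding to knowledge base KB′₁ and the relation set R₂ corresponding to knowledge base KB′₂;

(1-5) Compare all entities in knowledge base KB′₁ and knowledge base KB′₂ to obtain the initial matched entity pair set.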

3. The large-scale heterogeneous knowledge base alignment method based on iterative matching according to claim 2, characterized in that the detailed process of step (1-4) is:

For knowledge base KB′₁, traverse all entity-relation-entity triples in the triple set F_R1 belonging to that knowledge base and collect the relation set R₁; for knowledge base KB′₂, traverse all entity-relation-entity triples in the triple set F_R2 belonging to that knowledge base and collect the relation set R₂.

4. The large-scale heterogeneous knowledge base alignment method based on iterative matching according to claim 2, characterized in that, in step (1-5), the initial matched entity pair set is obtained as follows:

First, extract all entities in knowledge base KB′₁ to form the entity set E₁ and all entities in knowledge base KB′₂ to form the entity set E₂; take the Cartesian product of E₁ and E₂, each element pairing one entity of E₁ with one entity of E₂, as the entity pair set;

Then, retain the entity pairs in the entity pair set whose two entity name attributes have identical string representations, obtaining the pre-initial matched entity pair set;

Finally, select from the pre-initial matched entity pair set the entity pairs that stand in a one-to-one matching relationship, and take them as the initial matched entity pair set.

5. The large-scale heterogeneous knowledge base alignment method based on iterative matching according to claim 1, characterized in that the knowledge base alignment stage comprises the following steps:

(2-1) Input the knowledge bases KB′₁ and KB′₂, the relation sets R₁ and R₂ and the initial matched entity pair set; set the block similarity threshold δ_b, the entity similarity threshold δ_e, the in-block entity count threshold δ₁ and the in-block matched entity ratio threshold δ₂; initialize the matched entity pair set M_e to the initial matched entity pair set;

(2-2) Randomly select a relation from relation set R₁ or relation set R₂, and use the relation to divide the entities in knowledge bases KB′₁ and KB′₂ into several blocks, obtaining the block set B₁ corresponding to KB′₁ and the block set B₂ corresponding to KB′₂;

(2-3) Remove from block sets B₁ and B₂ the blocks that are likely to incur a high computational cost or unlikely to yield matched entity pairs, obtaining the simplified block sets B′₁ and B′₂;

(2-4) Using all matched entity pairs in the matched entity pair set M_e, measure the similarity between every block in B′₁ and every block in B′₂, and match each two blocks whose similarity exceeds the block similarity threshold δ_b, obtaining the matched block pair set;

(2-5) For each matched block pair in the matched block pair set, take the Cartesian product of the unmatched entities in one block of the pair and the unmatched entities in the other block as candidate entity pairs, forming the candidate entity pair set;

(2-6) Determine whether no new candidate entity pair has been found; if not, proceed to step (2-7); if so, end the iteration and output the matched entity pair set M_e;

(2-7) Calculate the similarity between the two entities of each candidate entity pair in the candidate entity pair set, add the candidate entity pairs whose similarity exceeds the entity similarity threshold δ_e to the matched entity pair set M_e, and discard the remaining candidate entity pairs;

(2-8) Determine whether the number of iterations has reached the iteration threshold; if not, return to step (2-2); if so, end the iteration and output the matched entity pair set M_e.

6. The large-scale heterogeneous knowledge base alignment method based on iterative matching according to claim 5, characterized in that, in step (2-2), the entities in a knowledge base are divided into several blocks using a relation as follows:

First, for the triple set F_R1 of knowledge base KB′₁, count the n distinct object entities in F_R1;

Then, for each distinct object entity, group together all subject entities in F_R1 that correspond to it, obtaining one block; the n distinct object entities yield n blocks, which form the block set B₁;

The block set B₂ is obtained in the same way.

7. The large-scale heterogeneous knowledge base alignment method based on iterative matching according to claim 5, characterized in that, in step (2-3), the blocks that are likely to incur a high computational cost or unlikely to yield matched entity pairs include: blocks whose number of entities exceeds the threshold δ₁, blocks whose ratio of matched entities is below the threshold δ₂, and blocks in which all entities have already been matched.

8. The large-scale heterogeneous knowledge base alignment method based on iterative matching according to claim 5, characterized in that, in step (2-4), the similarity between blocks is obtained as follows:

Each block is regarded as a set of entities, and a matched entity pair is regarded as an element common to the two sets, so the similarity between blocks is measured by set similarity; the similarity sim_block(b_k,b_l) is calculated as:

sim_block(b_k,b_l)=|b_k∩b_l|/|b_k∪b_l|

where b_k and b_l denote two blocks, |b_k∩b_l| denotes the number of matched entity pairs in the two blocks, and |b_k∪b_l| denotes the total number of entities in the two blocks.

9. The large-scale heterogeneous knowledge base alignment method based on iterative matching according to claim 8, characterized in that, in step (2-7), the similarity between entities is given by:

sim(e_i,e_j)=α sim_string(e_i,e_j)+(1-α)sim_block(b_k,b_l)

s.t. e_i∈b_k, e_j∈b_l

where b_k and b_l denote the blocks containing entities e_i and e_j, respectively, sim_string(e_i,e_j) and sim_block(b_k,b_l) denote the string similarity and the block similarity between the entities, respectively, and α is the weight of the string similarity, with values in [0,1].

10. The large-scale heterogeneous knowledge base alignment method based on iterative matching according to claim 9, characterized in that similarity functions based on the Levenshtein distance, the Jaro-Winkler distance, q-grams and I-SUB are used, and these similarity functions are combined by linear weighting to calculate the string similarity.