CN106951526A - Entity set expansion method and device - Google Patents
Entity set expansion method and device
- Publication number
- CN106951526A (application CN201710168839.XA)
- Authority
- CN
- China
- Prior art keywords
- entity
- path
- candidate
- node
- fructification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An embodiment of the present invention provides an entity set expansion method and device. According to a predetermined seed entity set, candidate entities are extracted from a target knowledge graph to form a candidate entity set. Meta-paths between seed entities are determined from a heterogeneous information network corresponding to the target knowledge graph, where a meta-path is a connection path, composed of entity types and relation types, between two node types in the heterogeneous information network, and the two node types are the node types corresponding to different seed entities. A first importance of each meta-path is determined according to the number of seed entity pairs that the meta-path connects. A second importance of each candidate entity in the candidate entity set is determined according to the first importance of each meta-path. Candidate entities whose second importance satisfies a first preset condition are determined to be entities to be expanded and are added to the seed entity set. The invention enables effective entity set expansion.
Description
Technical field
The present invention relates to the technical field of entity set expansion, and in particular to an entity set expansion method and device.
Background
Entity set expansion means that, given several seed entities of a particular semantic type (that is, entities sharing a particular common feature), more entities of that semantic type are obtained according to certain rules. For example, given the seed entity set {Beijing, Washington, Moscow} of the semantic type "national capital", the task is to find more national capitals, such as {Seoul, Tokyo, Kuala Lumpur, ...}. Entity set expansion is already widely used, for example in dictionary expansion and query suggestion expansion.
The most common approach is to choose a data source, process it according to certain rules, and take the other entities found to have the same semantic type as the seed entities as the expansion elements of the entity set. Existing methods mostly use text or web pages as the data source. However, because the amount of data contained in a single text or web page is limited, the effectiveness of the expansion is unsatisfactory and cannot meet the growing demand for entity set expansion.
Summary of the invention
The purpose of the embodiments of the present invention is to provide an entity set expansion method and device, so as to improve the effectiveness of entity set expansion.
To achieve the above purpose, in a first aspect, an embodiment of the present invention provides an entity set expansion method, the method including:
extracting candidate entities from a target knowledge graph according to a predetermined seed entity set, and forming a candidate entity set from the extracted candidate entities, where the target knowledge graph includes at least the seed entities in the seed entity set;
determining meta-paths between seed entities from a heterogeneous information network corresponding to the target knowledge graph, where a meta-path is a connection path, composed of entity types and relation types, between two node types in the heterogeneous information network, and the two node types are the node types corresponding to different seed entities in the seed entity set;
determining a first importance of each meta-path according to the number of seed entity pairs connected by that meta-path;
determining a second importance of each candidate entity in the candidate entity set according to the first importance of each meta-path;
determining the candidate entities in the candidate entity set whose second importance satisfies a first preset condition as entities to be expanded, and adding the entities to be expanded to the seed entity set.
Optionally, extracting candidate entities from the target knowledge graph according to the predetermined seed entity set includes:
determining the entity type set of each seed entity in the predetermined seed entity set;
determining the intersection of all the entity type sets as an initial entity type set;
determining a final entity type set corresponding to the seed entity set according to the hierarchical relationships among the entity types in the initial entity type set, and taking the entities in the target knowledge graph whose entity type belongs to the final entity type set as candidate entities.
Optionally, determining the final entity type set according to the hierarchical relationships among the entity types in the initial entity type set includes:
determining at least one hierarchical relationship corresponding to the initial entity type set, where any hierarchical relationship is a subordination relationship among at least two entity types;
determining the entity type at the bottom level of each hierarchical relationship as a final entity type, and forming the final entity type set from the determined final entity types.
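The bottom-level selection described above can be sketched as follows. This is a minimal illustration, assuming each type has at most one parent type; the `parent_of` mapping and the type names are hypothetical and are not taken from any real knowledge graph.

```python
def final_entity_types(initial_types, parent_of):
    """Keep only the types in initial_types that sit at the bottom of a hierarchy.

    initial_types: set of type names (the initial entity type set).
    parent_of: hypothetical mapping child type -> parent type.
    """
    def ancestors(t):
        seen = set()
        while t in parent_of and parent_of[t] not in seen:
            t = parent_of[t]
            seen.add(t)
        return seen

    # A type is not bottom-level if it is an ancestor of another type in the set.
    non_bottom = set()
    for t in initial_types:
        non_bottom |= ancestors(t) & initial_types
    return initial_types - non_bottom


# Hypothetical hierarchy: "director" and "actor" are subtypes of "person".
parent_of = {"director": "person", "actor": "person", "person": "entity"}
final_types = final_entity_types({"person", "director"}, parent_of)
print(final_types)
```

Here "person" is dropped because it is an ancestor of "director", so only the more specific type is kept for candidate extraction.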
Optionally, determining the meta-paths between seed entities from the heterogeneous information network corresponding to the target knowledge graph includes:
determining, from the heterogeneous information network corresponding to the target knowledge graph, the node set corresponding to the seed entity set, where the node set includes the nodes corresponding to the seed entities in the seed entity set;
taking each node in the node set as a first node;
taking each first node as a current source node, accessing in the heterogeneous information network the current target nodes connected to each current source node by an edge of a preset type, and building multiple candidate structure data tables, one per edge type; where any candidate structure data table includes: the first entity pairs formed by each first node and the current target nodes connected to it by edges of the edge type corresponding to that table, the similarity of each first entity pair, the path traversed so far, and a similarity score; the similarity score is the sum of the similarities of all first entity pairs;
for each candidate structure data table, judging whether the current target node connected to each current source node in that table is a second node; if so, recording the similarity of the first entity pair corresponding to that current source node in the table as a first value and determining the traversed path corresponding to that current source node as a meta-path instance; otherwise, recording the similarity as a second value; where the second node is a node in the node set that differs from the first node corresponding to the current source node;
selecting, from the candidate structure data tables, the table that satisfies a second preset condition as the current structure data table; the second preset condition includes: the table stores the largest number of distinct seed entities; when several candidate structure data tables store the largest number of distinct seed entities, the second preset condition further includes: the table stores the smallest number of first entity pairs;
updating each current target node in the current structure data table to be a current source node, and returning to the step of accessing in the heterogeneous information network the current target nodes connected to each current source node by an edge of a preset type;
when the length of the traversed path in each current structure data table is greater than a third preset value, or when the number of seed entities in each current structure data table is less than a fourth preset value, collecting all the determined meta-path instances, and obtaining the meta-paths corresponding to all the meta-path instances according to the entity types and relation types that they contain.
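The core idea of the steps above (walk outward from each seed node, record a path instance whenever another seed node is reached, then abstract instances into type sequences) can be illustrated with a deliberately simplified sketch. It is not the structure-data-table procedure itself: it uses a plain breadth-first walk with a length limit and omits the similarity bookkeeping and the table selection rule. The helper names, the inverse-edge convention `^-1`, and the toy graph are all assumptions made for illustration.

```python
from collections import defaultdict


def build_edges(triples):
    """Adjacency list with inverse edges (an assumption, so paths can run both ways)."""
    edges = defaultdict(list)
    for subj, pred, obj in triples:
        edges[subj].append((pred, obj))
        edges[obj].append((pred + "^-1", subj))
    return edges


def find_meta_paths(edges, phi, seeds, max_len):
    """Return {meta_path: set of (source seed, target seed)}.

    A meta-path is encoded as a tuple (type, relation, type, ..., type).
    Only the two end nodes of a recorded path are seed entities.
    """
    seeds = set(seeds)
    found = defaultdict(set)
    for src in seeds:
        # Each frontier item: (current node, type sequence so far, visited nodes).
        frontier = [(src, (phi[src],), {src})]
        for _ in range(max_len):
            next_frontier = []
            for node, meta, visited in frontier:
                for rel, nbr in edges.get(node, []):
                    if nbr in visited:
                        continue
                    new_meta = meta + (rel, phi[nbr])
                    if nbr in seeds:
                        found[new_meta].add((src, nbr))  # one path instance
                    else:
                        next_frontier.append((nbr, new_meta, visited | {nbr}))
            frontier = next_frontier
    return dict(found)


# Toy graph with hypothetical node names.
triples = [
    ("a1", "actedIn", "m1"),
    ("a2", "actedIn", "m2"),
    ("a3", "actedIn", "m2"),
    ("d1", "directed", "m1"),
    ("d1", "directed", "m2"),
]
phi = {"a1": "Actor", "a2": "Actor", "a3": "Actor",
       "m1": "Movie", "m2": "Movie", "d1": "Director"}

meta_paths = find_meta_paths(build_edges(triples), phi, ["a1", "a2", "a3"], max_len=4)
for path, pairs in meta_paths.items():
    print(" -> ".join(path), "connects", len(pairs), "seed pairs")
```

In this toy graph the walk finds two meta-paths: a short one through a shared movie and a longer one through a shared director. The pair counts per meta-path are exactly the quantities used later to weight the meta-paths.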
Optionally, selecting, from the candidate structure data tables, the table that satisfies the second preset condition as the current structure data table includes:
selecting, from the candidate structure data tables whose similarity score is not greater than a first preset value, the table that satisfies the second preset condition as the current structure data table.
Optionally, determining the first importance of each meta-path according to the number of seed entity pairs connected by that meta-path includes:
determining, from all the seed entity pairs connected by each meta-path, the total number of seed entity pairs connected by that meta-path;
determining the first importance of each meta-path according to the total number of seed entity pairs it connects and a first preset model;
where the first preset model is:
[...]
where $W_k$ is the first importance of meta-path $P_k$; $l$ is the number of meta-paths; $SP_k$ is the total number of seed entity pairs connected by meta-path $P_k$; $m$ is the number of seed entities; and [...] is the total number of seed entity pairs.
Optionally, determining the second importance of each candidate entity in the candidate entity set according to the first importance of each meta-path includes:
determining the second importance of each candidate entity in the candidate entity set according to the first importance of each meta-path and a second preset model;
where the second preset model is:
[...] $s_j \in S$, $i \in \{1, 2, 3, \ldots, n\}$,
where $R(c_i, S)$ is the second importance of candidate entity $c_i$; $n$ is the number of candidate entities; $s_j$ is a seed entity; $S$ is the seed entity set; $m$ is the number of seed entities; $W_k$ is the first importance of meta-path $P_k$; $l$ is the number of meta-paths; and $r\{(c_i, s_j) \mid P_k\}$ indicates whether meta-path $P_k$ connects seed entity $s_j$ and candidate entity $c_i$: $r = 1$ if it does, and $r = 0$ otherwise.
Optionally, determining the candidate entities in the candidate entity set whose second importance satisfies the first preset condition as entities to be expanded includes:
determining the candidate entities in the candidate entity set whose second importance is greater than a second preset value as entities to be expanded.
Optionally, determining the candidate entities in the candidate entity set whose second importance satisfies the first preset condition as entities to be expanded includes:
sorting the candidate entities in the candidate entity set in descending order of second importance to obtain a first candidate entity set, and selecting the first preset number of top-ranked candidate entities from the first candidate entity set as entities to be expanded.
To achieve the above purpose, in a second aspect, an embodiment of the present invention provides an entity set expansion device, the device including:
a candidate entity set determining module, configured to extract candidate entities from a target knowledge graph according to a predetermined seed entity set, and to form a candidate entity set from the extracted candidate entities, where the target knowledge graph includes at least the seed entities in the seed entity set;
a meta-path determining module, configured to determine meta-paths between seed entities from a heterogeneous information network corresponding to the target knowledge graph, where a meta-path is a connection path, composed of entity types and relation types, between two node types in the heterogeneous information network, and the two node types are the node types corresponding to different seed entities in the seed entity set;
a first importance determining module, configured to determine a first importance of each meta-path according to the number of seed entity pairs connected by that meta-path;
a second importance determining module, configured to determine a second importance of each candidate entity in the candidate entity set according to the first importance of each meta-path;
an entity set expansion module, configured to determine the candidate entities in the candidate entity set whose second importance satisfies a first preset condition as entities to be expanded, and to add the entities to be expanded to the seed entity set.
With the entity set expansion method and device provided by the embodiments of the present invention, on the one hand, a target knowledge graph containing a huge amount of data is used as the data source for entity set expansion. On the other hand, meta-paths between seed entities are determined from the heterogeneous information network corresponding to the target knowledge graph. Because every determined meta-path connects seed entity pairs, these meta-paths accurately reflect the particular common features shared by the seed entities. The second importance of each candidate entity, determined from the first importance of each meta-path, is therefore more reliable, and so are the entities to be expanded that are determined from the second importance. The method and device provided by the embodiments of the present invention can therefore improve the effectiveness of entity set expansion.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an entity set expansion method provided by an embodiment of the present invention;
Fig. 2 is a partial schematic diagram of the Yago knowledge graph;
Fig. 3 is a partial schematic diagram of the hierarchical relationships among entity types in the Yago knowledge graph;
Fig. 4 is a detailed flowchart of step S102 in the embodiment shown in Fig. 1;
Fig. 5 is a schematic diagram of the principle of determining meta-paths using the detailed flowchart shown in Fig. 4;
Fig. 6A to Fig. 6D are schematic diagrams of the effectiveness verification results of an entity set expansion method provided by an embodiment of the present invention; the entity types corresponding to Fig. 6A to Fig. 6D are, in order: actors in films directed by Steven Spielberg; films directed by directors who have won a National Film Award; software produced by companies located in Mountain View, California; and scientists at universities located in Cambridge, Massachusetts;
Fig. 7 is a structural block diagram of an entity set expansion device provided by an embodiment of the present invention;
Fig. 8 is a detailed block diagram of module 702 in the embodiment shown in Fig. 7.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
To solve the problems in the prior art, the embodiments of the present invention provide an entity set expansion method and device, which are described below with reference to specific embodiments.
The entity set expansion method provided by the embodiments of the present invention is described first.
As shown in Fig. 1, an entity set expansion method provided by an embodiment of the present invention includes the following steps:
S101: according to a predetermined seed entity set, extract candidate entities from a target knowledge graph, and form a candidate entity set from the extracted candidate entities; the target knowledge graph includes at least the seed entities in the seed entity set.
Seed entities can be set in advance according to a given semantic type, and the set of all seed entities is the seed entity set. For example, if the given semantic type is film director, Ang Lee, Chen Kaige and Zhang Yimou can be chosen in advance as seed entities, forming the seed entity set {Ang Lee, Chen Kaige, Zhang Yimou}.
A knowledge graph is a very large data set consisting mainly of triples of the form <subject, predicate, object>. In the Yago knowledge graph shown in Fig. 2, for example, one triple is <Spielberg, directed, War Horse>, which means that Spielberg directed the film War Horse. Besides Yago, other knowledge graphs exist in the prior art, such as DBpedia and Freebase.
In the embodiments of the present invention, the target knowledge graph is the knowledge graph related to the predetermined seed entities. A person skilled in the art will appreciate that accurate entity set expansion is possible only when the data source used is related to the seed entities.
Specifically, the target knowledge graph includes at least the seed entities in the seed entity set.
In the embodiments of the present invention, a candidate entity is an entity that shares a particular common feature with the seed entities, where the particular common feature includes having the same entity type.
S102: from the heterogeneous information network corresponding to the target knowledge graph, determine the meta-paths between seed entities; a meta-path is a connection path, composed of entity types and relation types, between two node types in the heterogeneous information network, where the two node types are the node types corresponding to different seed entities in the seed entity set.
A heterogeneous information network (HIN) is a directed graph $G = (V, E)$, where $V$ is the set of all entity nodes and $E$ is the set of all relation edges. In this directed graph, the number of entity object types satisfies $|A| > 1$, or the number of relation types linking different entity objects satisfies $|R| > 1$. In the network, a node represents an entity object (an entity for short), and an edge represents the relation between the two entity objects it connects. There is also a node type mapping function $\phi: V \to A$ and an edge type mapping function $\psi: E \to R$: each entity object $v \in V$ belongs to one particular object type $\phi(v) \in A$, and each edge $e \in E$ belongs to one particular relation type $\psi(e) \in R$.
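The definition above can be made concrete with a small sketch that turns knowledge-graph triples into a typed directed graph. The second triple, the type labels, and the inverse-edge convention `^-1` are illustrative assumptions, not data taken from Yago.

```python
from collections import defaultdict

# Triples in <subject, predicate, object> form; the second one is illustrative.
triples = [
    ("Steven Spielberg", "directed", "War Horse"),
    ("Toby Kebbell", "actedIn", "War Horse"),
]

# Node type mapping phi: V -> A (hypothetical type labels).
phi = {
    "Steven Spielberg": "Director",
    "Toby Kebbell": "Actor",
    "War Horse": "Movie",
}


def build_hin(triples):
    """Adjacency list: node -> list of (relation type, neighbor).

    The edge label doubles as the edge type mapping psi. Inverse edges are
    added (an assumption) so that paths can be followed in both directions.
    """
    edges = defaultdict(list)
    for subj, pred, obj in triples:
        edges[subj].append((pred, obj))
        edges[obj].append((pred + "^-1", subj))
    return edges


hin = build_hin(triples)
node_types = set(phi.values())                  # the type set A
relation_types = {pred for _, pred, _ in triples}  # the relation set R
is_heterogeneous = len(node_types) > 1 or len(relation_types) > 1
print(is_heterogeneous)
```

The final check mirrors the condition $|A| > 1$ or $|R| > 1$ that distinguishes a heterogeneous network from a homogeneous one.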
A meta-path is a connection path, composed of entity types and relation types, between two node types in the heterogeneous information network; it represents the semantic relation between the two node types. A meta-path $\Pi$ is defined as $A_1 \xrightarrow{R_1} A_2 \xrightarrow{R_2} \cdots \xrightarrow{R_l} A_{l+1}$, a sequence of entity types (node types) and relation types (edge types). It describes a path between a node of type $A_1$ and a node of type $A_{l+1}$ that passes through a series of nodes of types $A_1, \ldots, A_{l+1}$ connected by edges of types $R_1, \ldots, R_l$. The node type $A_1$ is called the source node type, and $A_{l+1}$ is called the target node type.
In heterogeneous information networks, meta-paths are widely used to capture rich semantic information. A path $p = a_1 \xrightarrow{e_1} a_2 \xrightarrow{e_2} \cdots \xrightarrow{e_l} a_{l+1}$ between objects $a_1$ and $a_{l+1}$ is a path instance of meta-path $P$ if, for all $i$, $\phi(a_i) = A_i$ and $\psi(e_i) \in R_i$.
Generally, one meta-path may have multiple path instances. For example, one path instance is: [...] and another path instance is: [...]
Because both path instances satisfy the meta-path [...], both are path instances of that meta-path.
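The path-instance condition can be checked mechanically. The sketch below is a minimal illustration in which node names, type labels, and relation labels are hypothetical, and an edge's label is treated as its relation type.

```python
def is_path_instance(nodes, relations, meta_types, meta_relations, phi):
    """Check whether a concrete path is an instance of a meta-path.

    nodes: [a_1, ..., a_{l+1}]; relations: relation label of each edge e_i.
    meta_types: [A_1, ..., A_{l+1}]; meta_relations: [R_1, ..., R_l].
    phi: mapping node -> node type.
    """
    if len(nodes) != len(meta_types) or len(relations) != len(meta_relations):
        return False
    if len(nodes) != len(relations) + 1:
        return False
    types_match = all(phi[a] == A for a, A in zip(nodes, meta_types))
    relations_match = all(r == R for r, R in zip(relations, meta_relations))
    return types_match and relations_match


# Hypothetical nodes and a hypothetical meta-path Actor -> Movie -> Director.
phi = {"a1": "Actor", "a2": "Actor", "m1": "Movie", "m2": "Movie", "d1": "Director"}
meta_types = ["Actor", "Movie", "Director"]
meta_relations = ["actedIn", "directedBy"]

first = is_path_instance(["a1", "m1", "d1"], ["actedIn", "directedBy"],
                         meta_types, meta_relations, phi)
second = is_path_instance(["a2", "m2", "d1"], ["actedIn", "directedBy"],
                          meta_types, meta_relations, phi)
wrong_end = is_path_instance(["a1", "m1", "a2"], ["actedIn", "directedBy"],
                             meta_types, meta_relations, phi)
print(first, second, wrong_end)
```

The first two paths differ in their concrete nodes but share the same type sequence, so both are instances of the same meta-path; the third ends at an actor rather than a director and is rejected.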
A knowledge graph consists mainly of <subject, predicate, object> triples, in which the subject and the object can each correspond to an entity and the predicate can represent a relation or attribute between them. A knowledge graph also contains more than one type of subject and object and more than one type of relation or attribute between them. A heterogeneous information network can therefore be built in advance from a knowledge graph.
For example, in Fig. 2, "directed" and "acted in" are two different relation types, "actor" and "film" are different entity types, and [...] is a meta-path between Toby Kebbell and Steven Spielberg.
In addition, in Fig. 2, Toby Kebbell and Martin McCain both belong to the actor class, while Toby Kebbell and Nigel Havers belong not only to the actor class but also to the class of actors in films directed by Steven Spielberg. To distinguish the two, the former is called a coarse-grained entity type and the latter a fine-grained entity type. Candidate entities determined from a fine-grained entity type are more likely to be confirmed as entities to be expanded.
Building a heterogeneous information network from a knowledge graph belongs to the prior art, so the process is not described in detail here.
In the embodiments of the present invention, the two nodes are the nodes corresponding to different seed entities in the seed entity set, and the node pair they form can be called a "seed entity pair".
Table 1 lists the seed entity pairs formed by the nodes corresponding to the seed entities when the seed entity set is $\{s_1, s_2, \ldots, s_m\}$. As shown in Table 1, when the source node is $s_1$, the target node is any one of $\{s_2, \ldots, s_m\}$; when the source node is $s_2$, the target node is any one of $\{s_1, s_3, \ldots, s_m\}$; the same applies by analogy when the source node is any other node.
Table 1
It should also be noted that, in the embodiments of the present invention, only the source node and the target node of each meta-path correspond to seed entities; the other nodes correspond to non-seed entities.
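The pairing of source and target seed nodes described for Table 1 amounts to enumerating ordered pairs of distinct seeds. A minimal sketch, with placeholder seed names:

```python
from itertools import permutations


def seed_entity_pairs(seeds):
    """All ordered (source, target) pairs of distinct seed entities."""
    return list(permutations(seeds, 2))


pairs = seed_entity_pairs(["s1", "s2", "s3"])
print(pairs)
```

For $m$ seeds this yields $m(m-1)$ ordered pairs, since every seed serves as a source node for each of the other $m-1$ seeds.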
S103: determine the first importance of each meta-path according to the number of seed entity pairs connected by that meta-path.
In one implementation provided by the embodiments of the present invention, step S103 includes:
Step 1: determine, from all the seed entity pairs connected by each meta-path, the total number of seed entity pairs connected by that meta-path.
Specifically, because each path instance connects one seed entity pair, the total number of seed entity pairs connected by a meta-path is the sum of the numbers of seed entity pairs connected by all the path instances of that meta-path.
Step 2: determine the first importance of each meta-path according to the total number of seed entity pairs it connects and a first preset model;
where the first preset model is:
[...]
where $W_k$ is the first importance of meta-path $P_k$; $l$ is the number of meta-paths; $SP_k$ is the total number of seed entity pairs connected by meta-path $P_k$; $m$ is the number of seed entities; and [...] is the total number of seed entity pairs.
All the important meta-paths are determined in step S102, but their importance differs. The applicant has shown through extensive experiments that the importance of a meta-path is related to the total number of seed entity pairs it connects: the larger that total, the better the meta-path reflects the common features of the seed entities, and the more important the meta-path is.
In view of this, the embodiments of the present invention propose determining the first importance of each meta-path according to the first preset model. It can be seen from the first preset model that the larger the total number of seed entity pairs connected by meta-path $P_k$, the larger its first importance.
It should be noted that the method for determining the first importance of each meta-path is not limited to the one above; other methods in the prior art for determining the first importance of each meta-path are also applicable to the present invention.
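The text does not reproduce the first preset model, so the following is only a sketch of one weighting that is consistent with the stated property (a meta-path that connects more seed entity pairs gets a larger first importance). Both the coverage ratio $SP_k / (m(m-1))$ and the normalization over the $l$ meta-paths are assumptions, as are the function name and the toy data.

```python
def first_importance(pairs_by_meta_path, m):
    """Assumed weighting: share of seed pairs covered, normalized to sum to 1.

    pairs_by_meta_path: {meta-path id: set of (source seed, target seed)}.
    m: number of seed entities.
    """
    total_pairs = m * (m - 1)  # assumption: ordered seed entity pairs
    coverage = {
        path: len(pairs) / total_pairs
        for path, pairs in pairs_by_meta_path.items()
    }
    norm = sum(coverage.values())
    if norm == 0:
        return coverage
    return {path: c / norm for path, c in coverage.items()}


# Toy input with hypothetical meta-path ids and seed names.
pairs_by_meta_path = {
    "P1": {("s1", "s2"), ("s2", "s1"), ("s1", "s3"), ("s3", "s1")},
    "P2": {("s2", "s3"), ("s3", "s2")},
}
weights = first_importance(pairs_by_meta_path, m=3)
print(weights)
```

With this choice, P1 connects twice as many seed pairs as P2 and therefore receives twice the weight; any other monotone function of $SP_k$ would preserve the same ordering.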
S104: determine the second importance of each candidate entity in the candidate entity set according to the first importance of each meta-path.
In one implementation provided by the embodiments of the present invention, step S104 includes:
determining the second importance of each candidate entity in the candidate entity set according to the first importance of each meta-path and a second preset model;
where the second preset model is:
[...] $s_j \in S$, $i \in \{1, 2, 3, \ldots, n\}$,
where $R(c_i, S)$ is the second importance of candidate entity $c_i$; $n$ is the number of candidate entities; $s_j$ is a seed entity; $S$ is the seed entity set; $m$ is the number of seed entities; $W_k$ is the first importance of meta-path $P_k$; $l$ is the number of meta-paths; and $r\{(c_i, s_j) \mid P_k\}$ indicates whether meta-path $P_k$ connects seed entity $s_j$ and candidate entity $c_i$: $r = 1$ if it does, and $r = 0$ otherwise.
It can be seen that the second importance is positively correlated with the first importance. The larger the first importance of a meta-path, the better that meta-path reflects the particular common features shared by the seed entities, so the second importance of a candidate entity determined from the first importance is more reliable.
It should likewise be noted that the method for determining the second importance of each candidate entity is not limited to the one above; other methods in the prior art for determining the second importance of each candidate entity are also applicable to the embodiments of the present invention.
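The left-hand part of the second preset model is not reproduced in the text. The sketch below assumes one form that uses every symbol in the definitions: a sum of $W_k \cdot r\{(c_i, s_j) \mid P_k\}$ over all seeds $s_j$ and meta-paths $P_k$, averaged over the $m$ seeds. That form, the function name, and the toy data are assumptions for illustration only.

```python
def second_importance(candidates, seeds, weights, connects):
    """Assumed scoring: R(c, S) = (1/m) * sum_j sum_k W_k * r{(c, s_j) | P_k}.

    weights: {meta-path id: first importance W_k}.
    connects(path, c, s): True if the meta-path connects candidate c and seed s.
    """
    m = len(seeds)
    scores = {}
    for c in candidates:
        total = 0.0
        for s in seeds:
            for path, w in weights.items():
                if connects(path, c, s):  # indicator r = 1
                    total += w
        scores[c] = total / m
    return scores


# Toy data: which (meta-path, candidate, seed) triples are connected.
connected = {
    ("P1", "c1", "s1"), ("P1", "c1", "s2"), ("P2", "c1", "s1"),
    ("P1", "c2", "s1"),
}
weights = {"P1": 0.6, "P2": 0.4}
scores = second_importance(
    ["c1", "c2"], ["s1", "s2"], weights,
    lambda path, c, s: (path, c, s) in connected,
)
print(scores)
```

Candidate c1 is reached from both seeds and through both meta-paths, so it scores higher than c2, which is reached from only one seed through one meta-path.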
S105: determine the candidate entities in the candidate entity set whose second importance satisfies the first preset condition as entities to be expanded, and add the entities to be expanded to the seed entity set.
In one implementation provided by the embodiments of the present invention, step S105 includes:
determining the candidate entities in the candidate entity set whose second importance is greater than a second preset value as entities to be expanded.
In another implementation provided by the embodiments of the present invention, step S105 includes:
sorting the candidate entities in the candidate entity set in descending order of second importance to obtain a first candidate entity set, and selecting the first preset number of top-ranked candidate entities from the first candidate entity set as entities to be expanded.
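The two selection rules of step S105 can be sketched as follows. The score values, the threshold, and the cutoff `k` are hypothetical stand-ins for the second importance, the second preset value, and the first preset number.

```python
def select_by_threshold(scores, threshold):
    """Candidates whose second importance is greater than the threshold."""
    return [c for c, s in scores.items() if s > threshold]


def select_top_k(scores, k):
    """The k candidates with the highest second importance, in descending order."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


scores = {"c1": 0.8, "c2": 0.3, "c3": 0.5}
seed_set = {"s1", "s2"}

above = select_by_threshold(scores, threshold=0.4)
top = select_top_k(scores, k=2)
seed_set |= set(top)  # add the entities to be expanded to the seed entity set
print(above, top, sorted(seed_set))
```

The threshold rule lets the size of the expansion vary with the score distribution, while the top-k rule fixes the number of entities added per round.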
The applicant verified the effectiveness of the selected first preset number of entities to be expanded using the ranking metric corresponding to the target knowledge graph, which confirmed the effectiveness of the method.
In the entity set extension method provided by the embodiment of the present invention, on the one hand, a target knowledge graph containing a huge amount of data serves as the data source for entity set extension. On the other hand, the meta paths between seed entities are determined from the heterogeneous information network corresponding to the target knowledge graph. Because each determined meta path is a path that connects seed entity pairs, these meta paths can accurately reflect the specific common features among the seed entities. The second importance degree of each candidate entity, which is determined from the first importance degree of each meta path, is therefore more reliable, and so are the entities to be extended that are determined according to the second importance degree. Accordingly, the entity set extension method provided by the embodiment of the present invention can improve the validity of entity set extension.
In addition, knowledge graphs such as Yago have become a tool for retrieving information quickly. As knowledge graphs have become popular, many researchers have begun to use them to help improve the accuracy of entity set extension in text or web pages. However, little work so far performs entity set extension with a knowledge graph as the sole data source. Doing so is nevertheless worthwhile, for the following reasons: (1) traditional entity set extension methods based on text or web page information require complex natural language processing, which affects the accuracy of the extension to some degree, whereas using a knowledge graph as the sole data source does not require such complex preprocessing; (2) a knowledge graph contains abundant entities and semantic relations, which greatly benefits entity set extension.
In one implementation provided by the embodiment of the present invention, in step S101 above, the step of extracting candidate entities from the target knowledge graph according to the predetermined seed entity set may include:
Step 1: determine the entity type set of each seed entity in the predetermined seed entity set.
For example, for the seed entity Li An in the seed entity set {Li An, Chen Kaige, Zhang Yimou} determined above, the corresponding entity type set is {person, director}; for the seed entities Chen Kaige and Zhang Yimou, the corresponding entity type set is {person, director, actor}.
Step 2: determine the intersection of all the entity type sets as the initial entity type set.
Because shared entity types better reflect the common features among entities, determining the intersection of all entity type sets as the initial entity type set allows entity set extension to be carried out more effectively.
Specifically, the intersection of the entity type set {person, director} and the entity type set {person, director, actor} determined in step 1 is {person, director}; that is, the initial entity type set is determined to be {person, director}.
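Steps 1 and 2 amount to a set intersection over the entity type sets of the seed entities. The following is a minimal sketch using the example above; the function name and the dictionary layout are assumptions made for illustration.

```python
def initial_entity_type_set(type_sets_by_seed):
    """Return the intersection of the entity type sets of all seed entities."""
    type_sets = [set(types) for types in type_sets_by_seed.values()]
    if not type_sets:
        return set()
    return set.intersection(*type_sets)


# Entity type sets taken from the example in the text.
type_sets_by_seed = {
    "Li An": {"person", "director"},
    "Chen Kaige": {"person", "director", "actor"},
    "Zhang Yimou": {"person", "director", "actor"},
}
print(sorted(initial_entity_type_set(type_sets_by_seed)))  # ['director', 'person']
```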
Step 3: determine the final entity type set corresponding to the seed entity set according to the hierarchical relations among the entity types in the initial entity type set; and take the entities in the target knowledge graph whose entity types belong to the final entity type set as candidate entities.
Although the entity type "person" in the initial entity type set {person, director} reflects a common feature of the seed entities, its granularity is coarse, which makes the semantics of the resulting candidate entities ambiguous. Therefore, in the embodiment of the present invention, the final entity type set corresponding to the seed entity set is further determined according to the hierarchical relations among the entity types in the initial entity type set.
In the embodiment of the present invention, an entity type that contains more subtypes is called a "coarse-grained" entity type, and the corresponding subtypes are called "fine-grained" entity types. For example, of the two entity types "person" and "director", "person" is coarse-grained and "director" is fine-grained. Those skilled in the art will understand that coarse and fine granularity of entity types are relative notions.
Specifically, the hierarchical relations among the entity types refer to the subordination relations among the entity types in the initial entity type set. For example, in the initial entity type set {person, director}, the entity type "director" is subordinate to the entity type "person".
More specifically, step 3 above may include:
Sub-step 1: determine at least one hierarchical relation corresponding to the initial entity type set, where any hierarchical relation is a subordination relation among at least two entity types;
Sub-step 2: determine the entity type located at the bottom level of each hierarchical relation as a final entity type, and form the final entity type set from the determined final entity types.
Entity types and relation types in a knowledge graph are often organized hierarchically. Such a hierarchy describes the subordination relations (also called parent-child relations) among entity types or relation types. Fig. 3 shows a partial schematic diagram of the hierarchy of entity types; all these types share one root node, "thing".
As shown in Fig. 3, when the entity type set is {thing, person, movie director, actor, artifact, film}, the following hierarchical relations can be constructed: movie director is subordinate to person, person is subordinate to thing, actor is subordinate to person, film is subordinate to artifact, and artifact is subordinate to thing. In Fig. 3, the entity types located at the bottom level are movie director, actor and film.
For the initial entity type set {person, director} determined in step 2, the entity type located at the bottom level is "director". Therefore, the final entity type is "director", and the resulting final entity type set is {director}.
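Sub-steps 1 and 2 can be sketched as follows, assuming the hierarchy is available as a child-to-parent mapping (the name `parent_of` and the helper functions are hypothetical). A type is kept as a final entity type when no other type in the set is subordinate to it.

```python
def ancestors(entity_type, parent_of):
    """Collect every type that entity_type is directly or indirectly subordinate to."""
    seen = set()
    current = entity_type
    while current in parent_of and parent_of[current] not in seen:
        current = parent_of[current]
        seen.add(current)
    return seen


def final_entity_type_set(initial_types, parent_of):
    """Keep only the bottom-level types: those that are not an ancestor of another type in the set."""
    covered = set()
    for entity_type in initial_types:
        covered |= ancestors(entity_type, parent_of)
    return {t for t in initial_types if t not in covered}


# Hierarchy of Fig. 3 as described in the text (child -> parent).
parent_of = {
    "movie director": "person",
    "actor": "person",
    "person": "thing",
    "film": "artifact",
    "artifact": "thing",
}
print(sorted(final_entity_type_set({"person", "movie director"}, parent_of)))
```

Applied to the full type set of Fig. 3, the same function returns movie director, actor and film, which matches the bottom-level types named in the text.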
Those skilled in the art will understand that the final entity type set may contain one entity type or several entity types; either is reasonable.
It is easy to see that, in this embodiment, on the one hand, the initial entity type set is the intersection of the entity type sets of the seed entities, and the entity types in that intersection better reflect the common features of the seed entities. On the other hand, the bottom-level entity types in the initial entity type set better represent the semantics of the seed entities, and the final entity type set is determined according to the hierarchical relations among the entity types in the initial entity type set. Therefore, candidate entities selected according to the final entity type set are more likely to share specific common features with the seed entities and more likely to be added to the seed entity set as entities to be extended. This provides a preliminary guarantee of the validity of the entity set extension method provided by the embodiment of the present invention.
It should also be noted that the method of determining candidate entities is not limited to the one provided in this embodiment; other methods of determining candidate entities known in the prior art are also applicable to the embodiment of the present invention.
In one implementation provided by the embodiment of the present invention, in step S102 of the embodiment shown in Fig. 1, determining the meta paths between seed entities from the heterogeneous information network corresponding to the target knowledge graph includes:
Step 1: from the heterogeneous information network corresponding to the target knowledge graph, determine a group of nodes corresponding to the seed entities in the seed entity set;
Step 2: taking each determined node as a source node, traverse the heterogeneous information network; when the target node is a seed entity other than the source node itself, determine the path connecting the source node and the target node as a meta path instance;
Step 3: collect all the determined meta path instances, and obtain the meta paths corresponding to all the meta path instances according to the entity types and relation types that the instances contain.
It is easy to see that, because only the determined group of nodes corresponding to the seed entities in the seed entity set is used as source nodes when traversing the heterogeneous information network to determine the important meta paths, the traversal range for determining meta paths is reduced. This not only improves the efficiency of determining meta paths but also helps to save computing resources.
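Steps 1 to 3 can be illustrated with a simplified exhaustive traversal. The sketch below is not the table-based procedure of Fig. 4; the function name, the edge representation, the `^-1` suffix for inverse edges and the toy network are all assumptions introduced here. Each path instance that starts at one seed node and ends at a different seed node is abstracted into its sequence of entity types and relation types, and the seed pairs connected by each such meta path are collected.

```python
from collections import defaultdict


def meta_paths_between_seeds(edges, node_type, seeds, max_hops):
    """Map each meta path (type/relation sequence) to the seed pairs it connects."""
    adjacency = defaultdict(list)
    for source, relation, target in edges:
        adjacency[source].append((relation, target))
        adjacency[target].append((relation + "^-1", source))  # inverse edge

    pairs_by_meta_path = defaultdict(set)

    def walk(start, node, visited, signature, hops):
        if node != start and node in seeds:
            # A path instance connecting two different seed entities.
            pairs_by_meta_path[tuple(signature)].add((start, node))
        if hops >= max_hops:
            return
        for relation, neighbor in adjacency[node]:
            if neighbor in visited:  # avoid loops
                continue
            walk(start, neighbor, visited | {neighbor},
                 signature + [relation, node_type[neighbor]], hops + 1)

    for seed in sorted(seeds):
        walk(seed, seed, {seed}, [node_type[seed]], 0)
    return dict(pairs_by_meta_path)


# Toy network, invented for illustration: two actors appear in the same film.
edges = [("actor_1", "actedIn", "film_12"), ("actor_2", "actedIn", "film_12")]
node_type = {"actor_1": "actor", "actor_2": "actor", "film_12": "film"}
found = meta_paths_between_seeds(edges, node_type, {"actor_1", "actor_2"}, max_hops=2)
for meta_path, pairs in found.items():
    print(" -> ".join(meta_path), len(pairs))
```

The number of seed pairs collected for each meta path is the quantity on which the first importance degree is based.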
Refer now to Fig. 4 and Fig. 5 together. Fig. 4 shows a detailed flowchart of step S102 in the embodiment shown in Fig. 1, that is, a flowchart of a meta path determination method. Fig. 5 is a schematic diagram of the principle of determining meta paths using the detailed flowchart shown in Fig. 4.
In one implementation provided by the embodiment of the present invention, as shown in Fig. 4, in step S102 of the embodiment shown in Fig. 1, determining the meta paths between seed entities from the heterogeneous information network corresponding to the target knowledge graph includes:
S401: from the heterogeneous information network corresponding to the target knowledge graph, determine a node set corresponding to the seed entity set, where the node set includes the nodes corresponding to the seed entities in the seed entity set;
In one implementation, the node set includes nodes that are equal in number to, and in one-to-one correspondence with, the seed entities in the seed entity set. For example, if the seed entity set is actors {1, 2, 3}, the corresponding node set is actors {1, 2, 3}.
In the embodiment of the present invention, the purpose of using as the node set a set of nodes equal in number to, and in one-to-one correspondence with, the seed entities in the seed entity set is to narrow the search range, reduce the amount of computation needed to determine each meta path, and save computing resources.
Of course, those skilled in the art will understand that, when computing resources are plentiful, the node set may also be formed from nodes that correspond to the seed entities but outnumber them; this is also reasonable. For example, if the seed entity set is actors {1, 2, 3}, the corresponding node set may be actors {1, 2, 3, 1, 2, 3}.
S402: take each node in the node set as a first node;
For ease of description, this embodiment is illustrated with the seed entity set being actors {1, 2, 3} and the corresponding node set being actors {1, 2, 3}.
Specifically, each node in the node set actors {1, 2, 3} is taken as a first node.
S403: take each first node as a current source node;
Optionally, for ease of explanation, an initial structured data table may first be established.
In the embodiment of the present invention, the basic form of the structured data table is shown in Table 2. In Table 2, (s, t) denotes the entity pair formed by source node s and target node t; σ(s, t | Π) denotes the similarity of the entity pair (s, t) under the current path Π: if the entity pair (s, t) connected by the current path Π is a seed entity pair, the similarity is a first value; otherwise, the similarity is a second value. In the embodiment of the present invention, the first value is greater than the second value; typically, the first value equals 1 and the second value equals 0. (s, ..., t) denotes all the nodes visited along path Π from source node s to the target node t connected to it. Of course, (s, ..., t) does not necessarily have to be included in the structured data table.
Table 2
Specifically, the initial structured data table is shown as table A in Fig. 5. In the initial state, the node currently visited is the first node itself. Therefore, the source node and the target node are both the first node, the similarity of the entity pair formed by the source node and the target node is 0, the visited nodes consist of the first node itself, and the similarity score of the initial structured data table is also 0.
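One possible in-memory form of a row of the structured data table, together with the similarity σ(s, t | Π) and the table-level similarity score, is sketched below. The class name `Row`, the field names and the node identifiers are assumptions for illustration; the first value is taken as 1 and the second value as 0, as in the typical case described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    source: str      # source node s
    target: str      # target node t reached so far
    visited: tuple   # nodes visited along the current path, (s, ..., t)


def similarity(row, seeds):
    """Return 1 if (s, t) is a pair of two different seed entities, otherwise 0."""
    return 1 if row.target in seeds and row.target != row.source else 0


def similarity_score(rows, seeds):
    """Sum of the similarities of all entity pairs stored in the table."""
    return sum(similarity(row, seeds) for row in rows)


def initial_table(seeds):
    """Initial table: every first node is both the source and the target."""
    return [Row(seed, seed, (seed,)) for seed in sorted(seeds)]


seeds = {"actor_1", "actor_2", "actor_3"}
table_a = initial_table(seeds)
print(similarity_score(table_a, seeds))  # 0
```

The initial table scores 0 because no row yet connects two different seed entities, which agrees with the description of table A.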
S404: in the heterogeneous information network, visit the current target nodes connected to each current source node through edges of preset types, and establish a plurality of candidate structured data tables, one for each edge type;
Here, any candidate structured data table includes: the first entity pairs, each formed by a first node and the current target node connected to it through edges of the edge type corresponding to that candidate structured data table; the similarity of each first entity pair; the visited path; and a similarity score, which is the sum of the similarities of all the first entity pairs;
Specifically, as shown in Fig. 5, starting from the initial structured data table A, the heterogeneous information network is searched for the current target nodes connected to current source nodes 1, 2 and 3 through "acted in" edges, and for the current target nodes connected to current source nodes 1, 2 and 3 through "born in" edges. Only the two edge types "acted in" and "born in" are expanded here as an example. Those skilled in the art will understand that, in practical applications, there may be one, two, or more than two preset edge types connecting each current source node to the current target nodes.
In Fig. 5, two candidate structured data tables, table B and table C, are established by way of example, corresponding to the two edge types "acted in" and "born in" respectively.
S405: for each candidate structured data table, judge whether the current target node connected to each current source node in that table is a second node; if so, record the similarity of the first entity pair corresponding to that current source node in the table as the first value, and determine the visited path corresponding to that current source node as a meta path instance; otherwise, record the similarity as the second value. Here, the second node is a node in the seed entity set that differs from the first node corresponding to the current source node;
Specifically, in table B and table C in Fig. 5, the current target node corresponding to each first node is not a second node, so the similarity of each first entity pair is marked as 0 by way of example.
S406: from the candidate structured data tables, select the one that satisfies a second preset condition as the current structured data table; the second preset condition includes: the number of kinds of seed entities stored in the candidate structured data table is the largest;
Optionally, when more than one candidate structured data table stores the largest number of kinds of seed entities, the second preset condition further includes: the number of first entity pairs stored in the candidate structured data table is the smallest.
Specifically, in Fig. 5, candidate structured data table B stores more kinds of seed entities than candidate structured data table C, so candidate structured data table B may be selected as the current structured data table.
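The second preset condition can be read as a two-level ordering: prefer the table that stores the most kinds of seed entities, and break ties by the fewest first entity pairs. The sketch below assumes that "kinds of seed entities" means the number of distinct source seeds appearing in the table; this interpretation, the function name and the table contents are assumptions made for illustration.

```python
def choose_current_table(candidate_tables):
    """Pick the table with the most distinct seed sources; ties go to the fewest pairs."""
    def sort_key(name):
        pairs = candidate_tables[name]
        distinct_seeds = len({source for source, _ in pairs})
        return (-distinct_seeds, len(pairs))
    return min(candidate_tables, key=sort_key)


# Invented contents: table B covers three seeds, table C only two.
candidate_tables = {
    "B": [("actor_1", "film_12"), ("actor_2", "film_17"), ("actor_3", "film_18")],
    "C": [("actor_1", "city_5"), ("actor_2", "city_5")],
}
print(choose_current_table(candidate_tables))  # B
```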
S407: update each current target node in the current structured data table to be a current source node, and return to the step of visiting, in the heterogeneous information network, the current target nodes connected to each current source node through edges of preset types; that is, return to step S404;
Specifically, as shown in Fig. 5, the current target nodes film 12, film 17 and film 18 in the current structured data table B are each updated to be current source nodes, and step S404 is performed again on table B.
In Fig. 5, after step S404 is performed on table B, two candidate structured data tables, table D and table E, are established by way of example, corresponding to the two edge types "directed⁻¹" and "created⁻¹" respectively.
It should be noted that, in Fig. 5, the superscript "-1" in the edges "directed⁻¹" and "created⁻¹" denotes the inverse relation; that is, "directed⁻¹" denotes the inverse of the relation "directed". For example, when film 12 is connected to person 7 through the edge "directed⁻¹", film 12 was directed by person 7; when person 7 is connected to film 12 through the edge "directed", person 7 directed film 12. In addition, the ellipsis in the last row of structured data tables B and D to H stands for first entity pairs that are not listed.
Similarly, in table D and table E in Fig. 5, the current target node corresponding to each first node is not a second node, so the similarity of each first entity pair is marked as 0 by way of example.
Further, in Fig. 5, candidate structured data table D stores more kinds of seed entities than candidate structured data table E, so candidate structured data table D may be selected as the current structured data table, and the procedure returns to step S404.
After step S404 is performed on table D, two candidate structured data tables F and G are established by way of example, corresponding to the two edge types "created" and "edited". After steps S405 and S406 are performed on tables F and G, the current structured data table is determined to be table H. In table H, the current target nodes corresponding to first nodes 1, 2 and 3 are all second nodes, so the similarities of the first entity pairs (1, 2), (2, 3) and (3, 1) can be marked as 1 by way of example.
S408: when the length of the visited path in each current structured data table is greater than a third preset value, or when the number of seed entities in each current structured data table is less than a fourth preset value, collect all the determined meta path instances and obtain the meta paths corresponding to all the meta path instances.
Here, the third preset value may be a preset maximum length of the visited path, and the fourth preset value may be a preset minimum number of seed entities that a structured data table should contain.
Finally, as shown in table H, an important meta path with a length of 4 hops can be determined by way of example:
In this embodiment, the determined meta paths are important meta paths that connect seed entity pairs, so they reflect the specific common features among the seed entities more accurately. When the important meta paths determined by the meta path determination method of the embodiment shown in Fig. 4 are used for entity set extension, the validity is higher.
Optionally, in the embodiment shown in Fig. 4, each candidate structured data table further includes all the visited nodes, and a row of a candidate structured data table consisting of "a first entity pair, the similarity of that first entity pair, and all the visited nodes corresponding to that first entity pair" is called a tuple; that is, a row of Table 2 consisting of "(s, t), σ(s, t | Π) and (s, ..., t)" is called a tuple. On this basis, after step S404 and before step S405, the meta path determination method further includes:
judging whether each current target node is a visited node stored in the tuple in which that current target node is located;
if not, performing step S405; if so, deleting the tuple in which that current target node is located from the corresponding candidate structured data table, and then performing step S405.
It can be seen that, in this embodiment, each tuple of a candidate structured data table also records all the visited nodes, and each current target node is checked against the visited nodes when it is determined. This prevents the determined meta paths from forming loops, which in turn avoids endless traversal of the heterogeneous information network and improves the efficiency of determining meta paths.
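The loop check performed between steps S404 and S405 can be sketched as a filter over tuples. The sketch assumes that each tuple stores the nodes visited before the current hop in a `visited` field; this layout, the function name and the sample tuples are hypothetical.

```python
def drop_looping_tuples(tuples):
    """Delete tuples whose current target node was already visited on the same path."""
    return [t for t in tuples if t["target"] not in t["visited"]]


# Invented tuples: the second one returns to a node already on its path.
tuples = [
    {"source": "actor_1", "target": "person_7", "visited": ("actor_1", "film_12")},
    {"source": "actor_2", "target": "actor_2", "visited": ("actor_2", "film_17")},
]
kept = drop_looping_tuples(tuples)
print([t["target"] for t in kept])  # ['person_7']
```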
Optionally, in one implementation provided by the embodiment of the present invention, step S406 in the embodiment shown in Fig. 4, namely selecting from the candidate structured data tables the one that satisfies the second preset condition as the current structured data table, includes:
selecting, from the candidate structured data tables whose similarity scores are not greater than a first preset value, the candidate structured data table that satisfies the second preset condition as the current structured data table.
It is easy to see that selecting the current structured data table only from the candidate structured data tables whose similarity scores are not greater than the first preset value further narrows the meta path search range and reduces the amount of computation, which helps to save further computing resources.
To further illustrate the validity of the entity set extension method provided by the embodiment of the present invention, the applicant verified the method experimentally. The verification process is as follows:
1) Determine the target knowledge graph
The applicant uses the classical Yago knowledge graph as the target knowledge graph. The data in the Yago knowledge graph come mainly from Wikipedia, WordNet and GeoNames. The current Yago data set contains about 10 million entities and 120 million facts. Here, the three parts "yagoFacts", "yagoSimpleTypes" and "yagoTaxonomy" of the Yago knowledge graph are mainly used as the data source; together these three parts contain 35 kinds of relations, 1.3 million entities and more than 3,000 entity types. Table 3 gives a detailed description of the three parts.
Table 3
2) Determine the validation sets
The applicant selected four representative validation sets to verify the validity of the entity set extension method provided by the embodiment of the present invention. The four validation sets are: actors who appeared in films directed by Steven Spielberg; software produced by companies located in Mountain View, California; films whose directors have won the National Film Award; and scientists at universities located in Cambridge, Massachusetts. The entities in these four validation sets are denoted Actor*, Software*, Film* and Scientist* respectively, and the numbers of entities in the four validation sets are 112, 98, 653 and 202 respectively.
3) Evaluation criteria
Effectiveness is measured with the p@k and MAP criteria. p@k denotes the percentage of positive examples among the top k results after the candidate entities in the candidate entity set are sorted by importance.
Here, the three criteria p@30, p@60 and p@90 are mainly used. The MAP criterion is the average of the precision values p@30, p@60 and p@90, expressed as $\mathrm{MAP} = \frac{1}{3}\sum_{k \in \{30, 60, 90\}} \frac{1}{k}\sum_{i=1}^{k} rel_i$, where $rel_i = 1$ if the candidate entity at rank $i$ is a positive example and $rel_i = 0$ otherwise.
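The two criteria can be computed as in the following minimal sketch, which follows the definitions given in the text (p@k as the fraction of positive examples among the top k results, MAP as the average of the p@k values). The function names and the relevance list are assumptions; smaller cut-offs are used in the example only to keep it short.

```python
def precision_at_k(relevance, k):
    """Fraction of positive examples among the top k ranked candidates."""
    return sum(relevance[:k]) / k


def mean_of_precisions(relevance, cutoffs=(30, 60, 90)):
    """Average of p@k over the given cut-offs, as MAP is described in the text."""
    return sum(precision_at_k(relevance, k) for k in cutoffs) / len(cutoffs)


# Invented ranking: 1 marks a positive example, 0 a negative one.
relevance = [1, 0, 1, 1]
print(precision_at_k(relevance, 2))               # 0.5
print(mean_of_precisions(relevance, (2, 4)))      # 0.625
```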
4) Determine the comparison methods
The entity set extension method provided by the embodiment of the present invention (Meta Path based Entity Set Expansion, MP_ESE for short) is compared with the following three methods:
(1) Link-Based entity set extension. Inspired by pattern-based methods for text or web pages, this method performs entity set extension based on the one-hop link relations of entities.
(2) Nearest-Neighbor entity set extension. This method performs entity set extension by considering both one-hop links and the nearest neighbors among one-hop entities.
(3) Path-Constrained Random Walk (PCRW) entity set extension. This is a path-based random walk method for heterogeneous networks, and it performs entity set extension based on two-hop link relations.
For each method, three seeds are selected at random from the validation set for testing, each method is run 30 times, and the averaged results are compared. In the entity set extension method provided by the embodiment of the present invention, the first preset value is set to m*(m-1)/2+1, where m is the number of seed entities, and the maximum path length of a meta path is set to 4.
5) Verification results
The verification results are shown in Fig. 6A to Fig. 6D, which correspond in turn to the entity types Actor*, Film*, Software* and Scientist*. Fig. 6A to Fig. 6D show that the accuracy is higher when entity set extension is performed with the method provided by the embodiment of the present invention than with the baseline methods, especially for the two categories Actor* and Film*. In these two categories, the accuracy of the baseline methods is low because one-hop or two-hop links cannot distinguish fine-grained entity categories well, whereas the meta paths used by the method provided by the embodiment of the present invention have more hops and can distinguish fine-grained entity categories well, which yields higher accuracy. In the Software* category, the accuracy of the method provided by the embodiment of the present invention is close to that of the PCRW method. The reason is that Software* is an overlapping category: in addition to the given entity category, it also coincides with another, coarse-grained entity category, namely software produced by the same company.
In addition, Fig. 6A to Fig. 6D show that the accuracy of the Link-Based algorithm in every category is significantly lower than that of the entity set extension method provided by the embodiment of the present invention. The reason is that the Link-Based algorithm relies on one-hop links, which carry very little semantic information and cannot accurately reflect the specific common features among the seed entities. The entity set extension method provided by the embodiment of the present invention instead uses multi-hop links (meta paths) that can accurately reflect the specific common features among the seed entities, so it captures the precise semantic information of the seed entities and thereby improves the accuracy of entity set extension.
To illustrate the validity of the entity set extension method provided by the embodiment of the present invention more intuitively, Table 4 lists the three most important meta paths determined by the method in the Actor* category. As Table 4 shows, these meta paths reflect the potential specific common features among the seed entities of the Actor* category, and they can further be used to determine more entities belonging to this category as entities to be extended.
Table 4
In summary, compared with the three baseline methods above, the entity set extension method provided by the embodiment of the present invention is more effective.
Corresponding to the method embodiment above, the embodiment of the present invention further provides an entity set extension device, which is described in detail below.
As shown in Fig. 7, the embodiment of the present invention provides an entity set extension device, which includes: a candidate entity set determining module 701, a meta path determining module 702, a first importance degree determining module 703, a second importance degree determining module 704 and an entity set extension module 705;
the candidate entity set determining module 701 is configured to extract candidate entities from a target knowledge graph according to a predetermined seed entity set, and to form a candidate entity set from the extracted candidate entities; the target knowledge graph includes at least the seed entities in the seed entity set;
the meta path determining module 702 is configured to determine the meta paths between seed entities from the heterogeneous information network corresponding to the target knowledge graph; a meta path is a connection path, composed of entity types and relation types, between two node types in the heterogeneous information network, where the two node types are the node types corresponding to different seed entities in the seed entity set;
the first importance degree determining module 703 is configured to determine the first importance degree of each meta path according to the number of seed entity pairs connected by that meta path;
the second importance degree determining module 704 is configured to determine the second importance degree of each candidate entity in the candidate entity set according to the first importance degree of each meta path;
the entity set extension module 705 is configured to determine, in the candidate entity set, each candidate entity whose second importance degree satisfies a first preset condition as an entity to be extended, and to add the entity to be extended to the seed entity set.
In the entity set extension device provided by the embodiment of the present invention, on the one hand, a target knowledge graph containing a huge amount of data serves as the data source for entity set extension. On the other hand, the meta paths between seed entities are determined from the heterogeneous information network corresponding to the target knowledge graph. Because each determined meta path is a path that connects seed entity pairs, these meta paths can accurately reflect the potential common features among the seed entities. The second importance degree of each candidate entity, which is determined from the first importance degree of the meta paths, is therefore more reliable, and so are the entities to be extended that are determined according to the second importance degree. Accordingly, the entity set extension method provided by the embodiment of the present invention can improve the validity of entity set extension.
In one implementation provided by the embodiment of the present invention, the candidate entity set determining module 701 in the embodiment shown in Fig. 7 may specifically include: an entity type set determining sub-module, an initial entity type set determining sub-module and a final entity type set determining sub-module;
the entity type set determining sub-module is configured to determine the entity type set of each seed entity in the predetermined seed entity set;
the initial entity type set determining sub-module is configured to determine the intersection of all the entity type sets as the initial entity type set;
the final entity type set determining sub-module is configured to determine the final entity type set corresponding to the seed entity set according to the hierarchical relations among the entity types in the initial entity type set, and to take the entities in the target knowledge graph whose entity types belong to the final entity type set as candidate entities.
More specifically, the final entity type set determining submodule may include a first determining unit and a second determining unit.
The first determining unit is configured to determine at least one hierarchical relationship corresponding to the initial entity type set, wherein any hierarchical relationship is a subordination relationship among at least two entity types;
The second determining unit is configured to determine the entity type located at the bottom level of each hierarchical relationship as a final entity type, and to form the final entity type set from the determined final entity types.
It is not hard to see that, in this embodiment, on the one hand, the initial entity type set is the intersection of the entity type sets of the seed entities, and the entity types in this intersection better reflect the common features of the seed entities. On the other hand, the entity types located at the bottom level of the initial entity type set better represent the semantics of the seed entities. Because the final entity type set is determined according to the hierarchical relationships among the entity types in the initial entity type set, the candidate entities selected according to the final entity type set are more likely to share specific common features with the seed entities and more likely to be added to the seed entity set as entities to be extended, which better ensures the effectiveness of entity set extension.
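The candidate-selection steps above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the mappings `entity_types` and `parent_of` and all function names are hypothetical, the type hierarchy is assumed to be an acyclic child-to-parent mapping, and an entity is assumed to qualify as a candidate when it has at least one final entity type and is not itself a seed.

```python
def initial_type_set(seeds, entity_types):
    """Intersect the entity type sets of all seed entities."""
    type_sets = [set(entity_types[s]) for s in seeds]
    return set.intersection(*type_sets) if type_sets else set()


def final_type_set(initial_types, parent_of):
    """Keep only bottom-level types: those that are not an ancestor
    of any other type in the initial set (hierarchy assumed acyclic)."""
    ancestors = set()
    for t in initial_types:
        p = parent_of.get(t)
        while p is not None:
            ancestors.add(p)
            p = parent_of.get(p)
    return {t for t in initial_types if t not in ancestors}


def candidate_entities(seeds, entity_types, parent_of):
    """Non-seed entities that carry at least one final entity type (assumption)."""
    final_types = final_type_set(initial_type_set(seeds, entity_types), parent_of)
    seed_set = set(seeds)
    return {e for e, types in entity_types.items()
            if e not in seed_set and final_types & set(types)}
```

With seeds typed as both "Person" and "Actor", where "Actor" is a subtype of "Person", only "Actor" survives as a final type, so entities that are merely persons are not proposed as candidates.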
In a specific implementation provided by the embodiments of the present invention, the meta-path determining module 702 in the embodiment shown in Fig. 7 may include: a node determining submodule, a traversal module, and a determining submodule.
The node determining submodule is configured to determine, from the heterogeneous information network corresponding to the target knowledge graph, a group of nodes corresponding to the seed entities in the seed entity set;
The traversal module is configured to traverse the heterogeneous information network with each determined node as a source node and, when a target node is a seed entity other than the source node itself, to determine the path connecting the source node and the target node as a meta-path instance;
The determining submodule is configured to collect all the determined meta-path instances and to obtain, according to the entity types and relation types contained in all the meta-path instances, the meta-paths corresponding to all the meta-path instances.
It is not hard to see that, because only the determined group of nodes corresponding to the seed entities in the seed entity set is used as source nodes when traversing the heterogeneous information network to determine the important meta-paths, the traversal range for determining meta-paths is reduced. This not only improves the efficiency of determining meta-paths but also helps save computing resources.
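The traversal described above, in which only seed nodes act as sources and every path that reaches another seed counts as a meta-path instance (an instance of a type-level path between seed entities), might look like the sketch below. It is an illustration under stated assumptions: the adjacency mapping `edges`, the mapping `node_type`, the hop limit `max_len`, and the choice to keep walking after a seed is reached are all hypothetical and not taken from the patent.

```python
from collections import defaultdict


def find_meta_paths(seeds, edges, node_type, max_len=3):
    """Return {meta_path: set of (source_seed, target_seed)}.

    edges maps a node to a list of (relation_type, neighbour) pairs.
    A meta-path is the alternating tuple of entity types and relation
    types along an instance path, e.g. (Actor, starsIn, Movie, stars, Actor).
    """
    seed_set = set(seeds)
    pairs_by_path = defaultdict(set)

    def walk(source, node, visited, signature):
        for relation, nxt in edges.get(node, []):
            if nxt in visited:          # never revisit a node on this path
                continue
            new_sig = signature + (relation, node_type[nxt])
            if nxt in seed_set:         # reached a different seed entity
                pairs_by_path[new_sig].add((source, nxt))
            if len(new_sig) // 2 < max_len:   # number of hops so far
                walk(source, nxt, visited | {nxt}, new_sig)

    for s in seeds:                     # only seed nodes are source nodes
        walk(s, s, {s}, (node_type[s],))
    return dict(pairs_by_path)
```

Grouping instances by their type signature is what turns concrete paths into meta-paths; the seed pairs collected per signature are the counts used later to weight each meta-path.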
As shown in Fig. 8, in a specific implementation provided by the embodiments of the present invention, the meta-path determining module 702 may include: a node set determining submodule 801, a first node determining submodule 802, a current source node determining submodule 803, a candidate structure data table establishing submodule 804, a first judging submodule 805, a selecting submodule 806, an updating submodule 807, and a meta-path determining submodule 808;
The node set determining submodule 801 is configured to determine, from the heterogeneous information network corresponding to the target knowledge graph, the node set corresponding to the seed entity set, wherein the node set includes the nodes corresponding to the seed entities in the seed entity set;
The first node determining submodule 802 is configured to take each node in the node set as a first node;
The current source node determining submodule 803 is configured to take each first node as a current source node;
The candidate structure data table establishing submodule 804 is configured to access, in the heterogeneous information network, the current target nodes connected to each current source node through edges of preset types, and to establish multiple candidate structure data tables, one per edge type;
The first judging submodule 805 is configured to judge, for each candidate structure data table, whether the current target node connected to each current source node in that table is a second node; if so, to record the similarity of the first entity pair corresponding to that current source node in the table as a first value and to determine the accessed path corresponding to that current source node as a meta-path instance; otherwise, to record the similarity as a second value; wherein the second node is a node, among the nodes corresponding to the seed entity set, that differs from the first node corresponding to the current source node;
The selecting submodule 806 is configured to select, from the candidate structure data tables, a candidate structure data table that satisfies a second preset condition as the current structure data table; the second preset condition includes: the candidate structure data table stores the largest number of kinds of seed entities;
The updating submodule 807 is configured to update each current target node in the current structure data table to be a current source node, and to trigger the candidate structure data table establishing submodule 804;
The meta-path determining submodule 808 is configured to, when the length of the accessed path in each current structure data table is greater than a third preset value, or when the number of seed entities in each current structure data table is less than a fourth preset value, collect all the determined meta-path instances and obtain, according to the entity types and relation types contained in all the meta-path instances, the meta-paths corresponding to all the meta-path instances.
Here, the third preset value may be a preset maximum length of the accessed path, and the fourth preset value may be a preset minimum number of seed entities that a structure data table should contain.
In this embodiment, because the determined meta-paths are the important meta-paths that connect seed entity pairs, these meta-paths reflect the specific common features among the seed entities more accurately. When the important meta-paths determined by the device provided in the embodiment shown in Fig. 8 are used for entity set extension, the accuracy is higher.
Optionally, in the embodiment shown in Fig. 8 of the present invention, the candidate structure data table also contains all the accessed nodes, and a row of the candidate structure data table consisting of "a first entity pair, the similarity of that first entity pair, and all the accessed nodes corresponding to that first entity pair" is called a tuple. On this basis, after the candidate structure data table establishing submodule 804 is triggered and before the first judging submodule 805 is triggered, the meta-path determining module 702 may further include:
A second judging submodule, configured to judge whether each current target node is one of the accessed nodes stored in the tuple in which that current target node is located;
A triggering submodule, configured to trigger the candidate structure data table establishing submodule 804 when the judgment result obtained by the second judging submodule is no; and, when the judgment result obtained by the second judging submodule is yes, to delete the tuple in which that current target node is located from the corresponding candidate structure data table and then trigger the candidate structure data table establishing submodule 804.
It can be seen that, in this embodiment, every tuple of the candidate structure data table also records all the accessed nodes, and each current target node, once determined, is checked against those accessed nodes. This prevents the determined meta-paths from forming loops, avoids endless traversal of the heterogeneous information network, and improves the efficiency of meta-path determination.
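The loop check on tuples can be illustrated with a small sketch. The `PathTuple` record, its field names, and the use of 1 and 0 for the first and second values are assumptions made for this example; the text above fixes only that each tuple stores an entity pair, its similarity, and the nodes accessed so far.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathTuple:
    source: str        # first node of the pair (a seed entity)
    target: str        # current target node of the pair
    similarity: int    # assumed: 1 if target is another seed, else 0
    visited: tuple     # nodes accessed before reaching target, in order


def extend(t, new_target, seeds):
    """Extend a tuple by one hop, recording the old target as accessed."""
    similarity = 1 if (new_target in seeds and new_target != t.source) else 0
    return PathTuple(t.source, new_target, similarity, t.visited + (t.target,))


def drop_looping_tuples(tuples):
    """Delete tuples whose current target node was already accessed."""
    return [t for t in tuples if t.target not in t.visited]
```

Because a tuple is deleted as soon as its target repeats an accessed node, every surviving path is simple, so the traversal cannot cycle indefinitely.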
Optionally, in a specific implementation provided by the embodiments of the present invention, the selecting submodule 806 in the embodiment shown in Fig. 8 is configured to select, from the multiple candidate structure data tables whose similarity score is not greater than a first preset value, a candidate structure data table that satisfies the second preset condition as the current structure data table.
It is not hard to see that selecting the current structure data table only from the candidate structure data tables whose similarity score is not greater than the first preset value further narrows the meta-path search range and reduces the amount of computation, which helps further improve the efficiency of meta-path determination and save computing resources.
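A possible reading of this selection rule is sketched below. The table layout (a mapping from edge type to rows of `(source, target, similarity)`), the function name, and counting "kinds of seed entities" as distinct seed sources are assumptions for illustration. The tie-break on the fewest entity pairs follows the additional condition stated in claim 4.

```python
def select_current_table(tables, seeds, max_score=None):
    """Pick the edge type whose table satisfies the second preset condition.

    tables: {edge_type: [(source, target, similarity), ...]}
    max_score: optional first preset value; tables whose similarity
               score (sum of similarities) exceeds it are skipped.
    """
    seed_set = set(seeds)
    best_key, best_rank = None, None
    for edge_type, rows in tables.items():
        score = sum(sim for _, _, sim in rows)
        if max_score is not None and score > max_score:
            continue
        seed_kinds = len({src for src, _, _ in rows if src in seed_set})
        # Most seed kinds first; among ties, fewest entity pairs.
        rank = (seed_kinds, -len(rows))
        if best_rank is None or rank > best_rank:
            best_key, best_rank = edge_type, rank
    return best_key
```

Skipping tables whose similarity score is already high avoids re-expanding edge types that have mostly connected seed pairs, which is one way to read the reduced search range described above.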
In a specific implementation provided by the embodiments of the present invention, the first importance determining module 703 in the embodiment shown in Fig. 7 is specifically configured to determine, from all the seed entity pairs connected by each meta-path, the total number of seed entity pairs connected by that meta-path, and to determine the first importance of each meta-path according to that total number and a first preset model;
Wherein, the first preset model is: [formula not reproduced in this text]
The physical meaning of each parameter is the same as in the method embodiment above and is not repeated here.
It is not hard to see that the first importance is positively correlated with the total number of seed entity pairs connected by a meta-path: the more seed entity pairs a meta-path connects, the better that meta-path reflects the specific common features among the seed entities. The first importance determined from the total number of seed entity pairs connected by a meta-path is therefore more accurate.
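To make the positive correlation concrete, the sketch below weights each meta-path by the fraction of all seed pairs it connects. This normalization is an assumption chosen for illustration and is not the patent's first preset model, which is not reproduced in this text; the function name and the input layout are likewise hypothetical, and seed pairs are assumed to be unordered.

```python
from math import comb


def first_importance(pairs_by_path, m):
    """Assumed weighting: (seed pairs connected by the path) / C(m, 2).

    pairs_by_path: {meta_path: set of (seed, seed) tuples}
    m: number of seed entities
    """
    total_pairs = comb(m, 2)            # unordered seed pairs (assumption)
    weights = {}
    for path, pairs in pairs_by_path.items():
        connected = {frozenset(p) for p in pairs}   # (a, b) == (b, a)
        weights[path] = len(connected) / total_pairs if total_pairs else 0.0
    return weights
```

Any monotonically increasing function of the connected-pair count would preserve the property stated above; the ratio is used here only because it keeps the weights in the range 0 to 1.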
In a specific implementation provided by the embodiments of the present invention, the second importance determining module 704 in the embodiment shown in Fig. 7 is configured to determine the second importance of each candidate entity in the candidate entity set according to the first importance of each meta-path and a second preset model;
Wherein, the second preset model is: [formula not reproduced in this text]
The physical meaning of each parameter is the same as in the method embodiment above and is not repeated here.
It can be seen that the second importance is positively correlated with the first importance. The greater the first importance of a meta-path, the better that meta-path reflects the specific common features among the seed entities, so the second importance of a candidate entity determined from the first importance is more accurate.
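One way to combine the meta-path weights into a candidate score is sketched below. It assumes that a candidate's score is the sum, over every seed entity and every meta-path, of the meta-path's first importance whenever that meta-path connects the candidate to the seed, averaged over the number of seeds. This aggregation is an assumption that is merely consistent with the parameters described in this document, not a reproduction of the second preset model; the `connected` set of `(candidate, seed, meta_path)` triples is a hypothetical input.

```python
def second_importance(candidates, seeds, weights, connected):
    """Assumed scoring: mean over seeds of the summed weights of the
    meta-paths that link the candidate to each seed.

    weights: {meta_path: first importance}
    connected: set of (candidate, seed, meta_path) triples for which an
               instance of meta_path links candidate and seed
    """
    m = len(seeds)
    scores = {}
    for c in candidates:
        total = 0.0
        for s in seeds:
            for path, w in weights.items():
                if (c, s, path) in connected:   # indicator r = 1
                    total += w
        scores[c] = total / m if m else 0.0
    return scores
```

Under this assumed form, a candidate linked to many seeds through highly weighted meta-paths scores highest, which matches the positive correlation described above.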
In a specific implementation provided by the embodiments of the present invention, the entity set extension module 705 in the embodiment shown in Fig. 7 is specifically configured to determine, as entities to be extended, the candidate entities in the candidate entity set whose second importance is greater than a second preset value.
In another specific implementation provided by the embodiments of the present invention, the entity set extension module 705 in the embodiment shown in Fig. 7 is specifically configured to sort the candidate entities in the candidate entity set in descending order of second importance to obtain a first candidate entity set, and to select from the first candidate entity set the top-ranked first preset number of candidate entities as entities to be extended.
The applicant verified the effectiveness of the selected first preset number of entities to be extended using a ranking metric corresponding to the target knowledge graph, which confirmed the effectiveness of the method.
In the two implementations above, the entities to be extended are determined according to the second importance. Because the second importance correctly reflects the specific common features shared by a candidate entity and the seed entities, the entities to be extended determined according to the second importance are more effective, which ensures the effectiveness of entity set extension.
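The two selection strategies can be sketched in a few lines. The function names and the alphabetical tie-break in the ranking are assumptions added for determinism; `scores` is assumed to map each candidate entity to its second importance.

```python
def select_by_threshold(scores, threshold):
    """Candidates whose second importance exceeds the second preset value."""
    return {c for c, r in scores.items() if r > threshold}


def select_top_k(scores, k):
    """The k highest-scoring candidates, in descending order of importance."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [c for c, _ in ranked[:k]]


def extend_seed_set(seeds, selected):
    """Add the selected entities to the seed entity set."""
    return set(seeds) | set(selected)
```

The threshold variant returns a variable number of entities, while the top-k variant always returns at most `k`, which is convenient when the extended set is evaluated with a ranking metric.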
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
The embodiments in this specification are described in a related manner. Identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the device embodiment is substantially similar to the method embodiment, it is described relatively briefly; for the relevant details, see the corresponding description of the method embodiment.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.
Claims (10)
1. An entity set extension method, characterized in that the method comprises:
extracting candidate entities from a target knowledge graph according to a predetermined seed entity set, and forming a candidate entity set from the extracted candidate entities; the target knowledge graph includes at least the seed entities in the seed entity set;
determining meta-paths between seed entities from a heterogeneous information network corresponding to the target knowledge graph; a meta-path is a connection path, composed of entity types and relation types, between two node types in the heterogeneous information network; wherein the two node types are the node types corresponding to different seed entities in the seed entity set;
determining a first importance of each meta-path according to the number of seed entity pairs connected by that meta-path;
determining a second importance of each candidate entity in the candidate entity set according to the first importance of each meta-path;
determining, as entities to be extended, the candidate entities in the candidate entity set whose second importance satisfies a first preset condition, and adding the entities to be extended to the seed entity set.
2. The method according to claim 1, characterized in that extracting candidate entities from the target knowledge graph according to the predetermined seed entity set comprises:
determining the entity type set of each seed entity in the predetermined seed entity set;
determining the intersection of all the entity type sets as an initial entity type set;
determining, according to the hierarchical relationships among the entity types in the initial entity type set, a final entity type set corresponding to the seed entity set; and taking the entities in the target knowledge graph that satisfy the entity types in the final entity type set as candidate entities.
3. The method according to claim 2, characterized in that determining the final entity type set according to the hierarchical relationships among the entity types in the initial entity type set comprises:
determining at least one hierarchical relationship corresponding to the initial entity type set, wherein any hierarchical relationship is a subordination relationship among at least two entity types;
determining the entity type located at the bottom level of each hierarchical relationship as a final entity type, and forming the final entity type set from the determined final entity types.
4. The method according to claim 1, characterized in that determining meta-paths between seed entities from the heterogeneous information network corresponding to the target knowledge graph comprises:
determining, from the heterogeneous information network corresponding to the target knowledge graph, a node set corresponding to the seed entity set, wherein the node set includes the nodes corresponding to the seed entities in the seed entity set;
taking each node in the node set as a first node;
taking each first node as a current source node, accessing, in the heterogeneous information network, the current target nodes connected to each current source node through edges of preset types, and establishing multiple candidate structure data tables, one per edge type; wherein any candidate structure data table includes: the first entity pairs, each consisting of a first node and a current target node connected to it through an edge of the edge type corresponding to that candidate structure data table; the similarity of each first entity pair; the accessed paths; and a similarity score; the similarity score is the sum of the similarities of all the first entity pairs;
for each candidate structure data table, judging whether the current target node connected to each current source node in that table is a second node; if so, recording the similarity of the first entity pair corresponding to that current source node in the table as a first value, and determining the accessed path corresponding to that current source node as a meta-path instance; otherwise, recording the similarity as a second value; wherein the second node is a node in the node set that differs from the first node corresponding to the current source node;
selecting, from the candidate structure data tables, a candidate structure data table that satisfies a second preset condition as the current structure data table; the second preset condition includes: the candidate structure data table stores the largest number of kinds of seed entities; when more than one candidate structure data table stores the largest number of kinds of seed entities, the second preset condition further includes: the candidate structure data table stores the fewest first entity pairs;
updating each current target node in the current structure data table to be a current source node, and returning to the step of accessing, in the heterogeneous information network, the current target nodes connected to each current source node through edges of preset types;
when the length of the accessed path in each current structure data table is greater than a third preset value, or when the number of seed entities in each current structure data table is less than a fourth preset value, collecting all the determined meta-path instances, and obtaining, according to the entity types and relation types contained in all the meta-path instances, the meta-paths corresponding to all the meta-path instances.
5. The method according to claim 4, characterized in that selecting, from the candidate structure data tables, a candidate structure data table that satisfies the second preset condition as the current structure data table comprises:
selecting, from the multiple candidate structure data tables whose similarity score is not greater than a first preset value, a candidate structure data table that satisfies the second preset condition as the current structure data table.
6. The method according to any one of claims 1-4, characterized in that determining the first importance of each meta-path according to the number of seed entity pairs connected by that meta-path comprises:
determining, from all the seed entity pairs connected by each meta-path, the total number of seed entity pairs connected by that meta-path;
determining the first importance of each meta-path according to the total number of seed entity pairs connected by that meta-path and a first preset model;
wherein the first preset model is: [formula not reproduced in this text]
where $W_k$ is the first importance corresponding to meta-path $P_k$; $l$ is the number of meta-paths; $SP_k$ is the total number of seed entity pairs connected by meta-path $P_k$; $m$ is the number of seed entities; and the remaining term of the model [symbol not reproduced in this text] is the total number of seed entity pairs.
7. The method according to any one of claims 1-4, characterized in that determining the second importance of each candidate entity in the candidate entity set according to the first importance of each meta-path comprises:
determining the second importance of each candidate entity in the candidate entity set according to the first importance of each meta-path and a second preset model;
wherein the second preset model is: [formula not reproduced in this text]
with $s_j \in S$ and $i \in \{1, 2, 3, \ldots, n\}$, where $R(c_i, S)$ denotes the second importance of candidate entity $c_i$; $n$ is the number of candidate entities; $s_j$ denotes a seed entity; $S$ denotes the seed entity set; $m$ is the number of seed entities; $W_k$ is the first importance corresponding to meta-path $P_k$; $l$ is the number of meta-paths; and $r\{(c_i, s_j) \mid P_k\}$ indicates whether meta-path $P_k$ connects seed entity $s_j$ and candidate entity $c_i$: $r = 1$ if it does, and $r = 0$ otherwise.
8. The method according to any one of claims 1-4, characterized in that determining, as entities to be extended, the candidate entities in the candidate entity set whose second importance satisfies the first preset condition comprises:
determining, as entities to be extended, the candidate entities in the candidate entity set whose second importance is greater than a second preset value.
9. The method according to any one of claims 1-4, characterized in that determining, as entities to be extended, the candidate entities in the candidate entity set whose second importance satisfies the first preset condition comprises:
sorting the candidate entities in the candidate entity set in descending order of second importance to obtain a first candidate entity set; and selecting from the first candidate entity set the top-ranked first preset number of candidate entities as entities to be extended.
10. An entity set extension device, characterized in that the device comprises:
a candidate entity set determining module, configured to extract candidate entities from a target knowledge graph according to a predetermined seed entity set, and to form a candidate entity set from the extracted candidate entities; the target knowledge graph includes at least the seed entities in the seed entity set;
a meta-path determining module, configured to determine meta-paths between seed entities from a heterogeneous information network corresponding to the target knowledge graph; a meta-path is a connection path, composed of entity types and relation types, between two node types in the heterogeneous information network; wherein the two node types are the node types corresponding to different seed entities in the seed entity set;
a first importance determining module, configured to determine a first importance of each meta-path according to the number of seed entity pairs connected by that meta-path;
a second importance determining module, configured to determine a second importance of each candidate entity in the candidate entity set according to the first importance of each meta-path;
an entity set extension module, configured to determine, as entities to be extended, the candidate entities in the candidate entity set whose second importance satisfies a first preset condition, and to add the entities to be extended to the seed entity set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710168839.XA CN106951526B (en) | 2017-03-21 | 2017-03-21 | Entity set extension method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106951526A true CN106951526A (en) | 2017-07-14 |
CN106951526B CN106951526B (en) | 2020-08-07 |
Family
ID=59472639
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710168839.XA Active CN106951526B (en) | 2017-03-21 | 2017-03-21 | Entity set extension method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106951526B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080270458A1 (en) * | 2007-04-24 | 2008-10-30 | Gvelesiani Aleksandr L | Systems and methods for displaying information about business related entities |
CN102844755A (en) * | 2010-04-27 | 2012-12-26 | 惠普发展公司,有限责任合伙企业 | Method of extracting named entity |
CN103488724A (en) * | 2013-09-16 | 2014-01-01 | 复旦大学 | Book-oriented reading field knowledge map construction method |
CN105913125A (en) * | 2016-04-12 | 2016-08-31 | 北京邮电大学 | Heterogeneous information network element determining method, link prediction method, heterogeneous information network element determining device and link prediction device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106951526A (en) | A kind of entity set extended method and device | |
CN106250412B (en) | Knowledge mapping construction method based on the fusion of multi-source entity | |
Yin et al. | Building taxonomy of web search intents for name entity queries | |
Jin et al. | Distance-constraint reachability computation in uncertain graphs | |
CN103927302B (en) | A kind of file classification method and system | |
CN110309289A (en) | Sentence generation method, sentence generation device and intelligent equipment | |
CN103902545B (en) | Classification path identification method and system |
JP2011258235A (en) | System and method for ranking result of search by using click distance | |
CN111027743B (en) | OD optimal path searching method and device based on hierarchical road network | |
CN108520166A (en) | Drug target prediction method based on random walks over multiple similarity networks |
CN106951524A (en) | Overlapping community discovery method based on node influence |
CN106407302A (en) | Method for supporting function of calling specific functions of middleware database through simple SQL | |
CN104158748B (en) | Topology detection method for cloud computing systems |
CN104133868B (en) | Strategy for classifying and integrating vertical crawler data |
CN108345609A (en) | Method and apparatus for processing POI information |
CN103810260A (en) | Complex network community discovery method based on topological characteristics | |
CN110119478A (en) | Item recommendation method based on similarity combining multiple types of user feedback data |
Agarwal et al. | A social identity approach to identify familiar strangers in a social network | |
Zhang et al. | Exploring time factors in measuring the scientific impact of scholars | |
Ahmadi et al. | Unsupervised matching of data and text | |
CN106649731A (en) | Node similarity searching method based on large-scale attribute network | |
Hutair et al. | Social community detection based on node distance and interest | |
CN106126681B (en) | Incremental stream data clustering method and system |
CN107133274A (en) | Distributed information retrieval collection selection method based on a graph knowledge base |
Partyka et al. | Semantic schema matching without shared instances |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |