CN105913125A - Heterogeneous information network element determining method, link prediction method, heterogeneous information network element determining device and link prediction device - Google Patents
- Publication number
- CN105913125A (application number CN201610225725.XA)
- Authority
- CN
- China
- Prior art keywords
- path
- data structure
- structure body
- module
- candidate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
Landscapes
- Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Mathematical Analysis (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Computational Mathematics (AREA)
- Artificial Intelligence (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Algebra (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An embodiment of the invention provides a meta-path determining method for heterogeneous information networks, a link prediction method, a meta-path determining device and a link prediction device. The meta-path determining method comprises: creating a first data structure and inserting it into a candidate set; selecting one data structure from the candidate set according to the comprehensive similarity scores of the data structures stored there, and checking whether that data structure contains a third entity pair identical to any first entity pair; if so, storing the meta-path that links the third entity pair in that data structure into a meta-path set, deleting the data structure from the candidate set, and selecting the next data structure from the candidate set; otherwise, creating a third data structure, inserting it into the candidate set, and then selecting the next data structure from the candidate set, until the candidate set is empty. The meta-path determining method and device determine useful meta-paths quickly and accurately, and the link prediction method and device obtain more accurate prediction results.
Description
Technical field
The present invention relates to the technical field of meta-path determination in heterogeneous information networks, and in particular to a meta-path determining method and device.
Background technology
In recent years, research on heterogeneous information networks has grown rapidly, and much data mining work is carried out on such networks. A heterogeneous information network (Heterogeneous Information Network) is a network in which the number of entity object types satisfies |A| > 1 or the number of relation types linking different entity objects satisfies |R| > 1. In such a network, a node represents an entity object (an entity for short), and an edge represents the relation between the two entity objects it connects.
Link prediction is the basis of data mining on heterogeneous information networks, for example data cleaning and recommendation. The general process of link prediction is: determine multiple training entity pairs linked by a specific edge in the heterogeneous information network, and enumerate all meta-paths of these training entity pairs; build a prediction model from the enumerated meta-paths of the training entity pairs; use the prediction model to calculate the probability that an entity pair to be predicted is linked by the above specific meta-paths; when this probability is greater than a preset value, the entity pair to be predicted is taken to be linked by the specific edge.
A meta-path is a sequence combining the different paths that connect two entities in a heterogeneous information network; it represents the semantic relation between the entities. A meta-path $\Pi$ is defined as $\Pi = R_1 \xrightarrow{L_1} R_2 \xrightarrow{L_2} \cdots \xrightarrow{L_l} R_{l+1}$. It describes a path between nodes $R_1$ and $R_{l+1}$ that passes through the series of nodes $R_1, \ldots, R_{l+1}$ and link edges $L_1, \ldots, L_l$.
Take the Dbpedia knowledge graph shown in Fig. 1 as an example. It contains many different types of nodes and edges, such as the nodes Person, City and Country and the edges bornIn, locatedIn, diedIn and hasCapital^-1. Two nodes can be linked by several meta-paths; for example, two meta-paths link the nodes Person and Country (the two path expressions are not reproduced in this text). Meta-paths are widely used in link prediction. Therefore, the first task in link prediction is to determine the meta-paths in the heterogeneous information network.
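The definition of a meta-path above can be sketched as a small Python type. This is only an illustration: the class name `MetaPath` and its methods are hypothetical, and the example path merely combines node and edge type names mentioned in the Dbpedia example; it is not claimed to be one of the two paths the original shows.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MetaPath:
    """A meta-path R_1 -L_1-> R_2 ... -L_l-> R_{l+1} (illustrative type)."""
    node_types: Tuple[str, ...]   # R_1, ..., R_{l+1}
    edge_types: Tuple[str, ...]   # L_1, ..., L_l

    def __post_init__(self) -> None:
        # A path with l edges visits exactly l + 1 node types.
        if len(self.node_types) != len(self.edge_types) + 1:
            raise ValueError("need exactly one more node type than edge types")

    @property
    def hops(self) -> int:
        """Hop count l of the meta-path."""
        return len(self.edge_types)

    def extend(self, edge_type: str, node_type: str) -> "MetaPath":
        """Return a new meta-path that is one hop longer."""
        return MetaPath(self.node_types + (node_type,),
                        self.edge_types + (edge_type,))


# Type names come from the Dbpedia example; this combination is illustrative only.
path = MetaPath(("Person", "City"), ("bornIn",)).extend("locatedIn", "Country")
print(path.hops, path.node_types)
```

Keeping the type immutable makes it usable as a dictionary key, which is convenient when meta-paths are collected into a set.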
Most existing link prediction methods target heterogeneous information networks with simple schemas, for example the bipartite heterogeneous information network formed by users and items on an e-commerce website, or the star-schema heterogeneous information network in the bibliographic database DBLP, in which papers, authors, conferences and keywords are linked together. In these networks, the meta-paths used to build a link prediction model can be enumerated manually. Many real heterogeneous information networks, however, are far more complex: their structure cannot be described by a simple network schema, so the meta-paths needed to build a link prediction model cannot be enumerated, and even if they could be, many of the resulting meta-paths would be unimportant. Dbpedia, for example, is a knowledge graph that records more than 38 million objects and 3 billion relations; enumerating the meta-paths of this network is impossible.
There is therefore an urgent need for a meta-path determining method for heterogeneous information networks with complex schemas, so that the meta-paths it finds can be used to build a link prediction model and perform link prediction in such networks.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a heterogeneous information network meta-path determining method, a link prediction method and corresponding devices, so as to determine the meta-paths of heterogeneous information networks with complex schemas quickly and accurately, and to use these meta-paths to build a link prediction model for link prediction in such networks.
To achieve the above purpose, an embodiment of the present invention discloses a heterogeneous information network meta-path determining method, the method comprising:
S101: determine, in the heterogeneous information network, multiple first entity pairs whose meta-paths are to be determined, wherein each first entity pair comprises a source node and a target node, and each first entity pair is linked at least by an edge of a first preset type;
S102: determine an initial data structure from the multiple first entity pairs; the initial data structure comprises the entity pairs each formed by the source node of a first entity pair and that source node itself;
S103: according to the edge types in the heterogeneous information network, generate multiple first candidate meta-paths with a hop count of 1; after steps A to D have been performed for each first candidate meta-path, perform step S104:
A. According to the heterogeneous information network, the initial data structure and the first candidate meta-path, generate the multiple second entity pairs linked by the first candidate meta-path; the source node of a second entity pair is the source node of an entity pair in the initial data structure, and its target node is a node in the heterogeneous information network other than the source nodes of the first entity pairs;
B. According to a first preset model, calculate the similarity measure value of each second entity pair when linked by the first candidate meta-path; save the first candidate meta-path, each second entity pair and its corresponding similarity measure value to a first data structure;
C. According to a second preset model, calculate the comprehensive similarity score of the first data structure and save it to the first data structure;
D. Insert the first data structure into a candidate set;
S104: according to the magnitude of the comprehensive similarity scores, select one data structure from the candidate set and denote it the second data structure; check whether the second data structure contains a third entity pair identical to any first entity pair;
S105: if so, save the meta-path that links the third entity pair in the second data structure, together with the third entity pair, to a meta-path set; delete the second data structure from the candidate set; and perform step S104;
S106: if not, generate multiple third candidate meta-paths according to the second candidate meta-path saved in the second data structure and the edge types in the heterogeneous information network, the hop count of each third candidate meta-path differing from that of the second candidate meta-path by 1; delete the second data structure from the candidate set; after steps E to H have been performed for each third candidate meta-path, perform step S104:
E. According to the heterogeneous information network, the second data structure and the third candidate meta-path, generate the multiple fourth entity pairs linked by the third candidate meta-path; the source node of a fourth entity pair is the source node of an entity pair in the second data structure, and its target node is a node in the heterogeneous information network other than the source nodes of the first entity pairs;
F. According to the first preset model, calculate the similarity measure value of each fourth entity pair when linked by the third candidate meta-path; save the third candidate meta-path, each fourth entity pair and its corresponding similarity measure value to a third data structure;
G. According to the second preset model, calculate the comprehensive similarity score of the third data structure and save it to the third data structure;
H. Insert the third data structure into the candidate set;
wherein the first preset model is: (formula image not reproduced in this text), where σ(s, t_i | Π_{1…i}) denotes the similarity measure value of source node s and target node t_i on meta-path Π_{1…i}; Π_{1…i} denotes the (i-1)-hop meta-path linking source node s and target node t_i; I(V_{i-1}) denotes the set of target nodes reachable by walking from source node s along meta-path Π_{1…i-1}; x is a node in I(V_{i-1}); R(x, t_i) indicates whether target node t_i can be reached through edge R_{i-1}, being 1 if it can and 0 otherwise; and R(x) denotes the number of nodes that node x can reach through edge R_{i-1};
the second preset model is: (formula image not reproduced in this text), where S denotes the comprehensive similarity score of a data structure; s is a source node; t is a target node reachable through meta-path Π; τ is the number of reachable target nodes; σ(s, t | Π) is the similarity measure value of entity pair (s, t) on meta-path Π; and r(s) = 1 - α * N, where r(s) denotes the contribution ability of source node s to the current data structure and serves to balance the selection of data structures, α is the decay factor of the contribution ability, and N is the number of first entity pairs with source node s that are linked by meta-paths already saved to the meta-path set.
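The formula images for the two preset models are missing from this text, so the sketch below is a guess, not the patented formula. It assumes the first model is a random-walk recursion in which each node x splits its score evenly among the |R(x)| nodes it reaches over the current edge type; this matches the symbol definitions for R(x, t_i) and R(x), but the exact form is an assumption. The function `contribution` implements r(s) = 1 - α * N exactly as stated. The second model's score S is not sketched because its form cannot be recovered from the definitions. All function names and the toy graph are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Set

Node = Hashable
# graph[edge_type][x] is the set of nodes reachable from x over one edge of that type.
Graph = Dict[str, Dict[Node, Set[Node]]]


def walk_similarity(graph: Graph, source: Node,
                    edge_types: Sequence[str]) -> Dict[Node, float]:
    """Return sigma(source, t | path) for every t reached along edge_types.

    ASSUMPTION: at each hop a node x passes score / |R(x)| to every node t
    with R(x, t) = 1; the source formula image is not available.
    """
    scores: Dict[Node, float] = {source: 1.0}      # zero hops: only (s, s)
    for edge_type in edge_types:
        adjacency = graph.get(edge_type, {})
        next_scores: Dict[Node, float] = defaultdict(float)
        for x, score in scores.items():
            neighbours = adjacency.get(x, set())
            for t in neighbours:                    # R(x, t) = 1 for these t
                next_scores[t] += score / len(neighbours)
        scores = dict(next_scores)
    return scores


def contribution(alpha: float, covered_pairs: int) -> float:
    """r(s) = 1 - alpha * N, as stated in the text."""
    return 1.0 - alpha * covered_pairs


# Hypothetical toy graph with two edge types "a" and "b".
toy: Graph = {"a": {1: {2, 3}}, "b": {2: {4}, 3: {4, 5}}}
result = walk_similarity(toy, 1, ["a", "b"])
print(result)
```

In the toy graph, node 4 is reached through both intermediate nodes and therefore scores higher than node 5, which is reached through only one of them.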
An embodiment of the present invention also discloses a link prediction method, the method comprising:
determining an entity pair to be predicted;
determining, according to a fourth preset model and the meta-path set, the probability that the entity pair to be predicted is linked by an edge of the first preset type; the fourth preset model is: (formula image not reproduced in this text), where η(s, t | γ) is the probability that the entity pair to be predicted is linked by an edge of the first preset type; (s, t) is the entity pair to be predicted, with source node s and target node t; γ is the meta-path set; i is the index of a meta-path in γ; σ(s, t | Π_i) is the similarity measure value of the entity pair (s, t) on the i-th meta-path Π_i; ω_i is the weight of meta-path Π_i; and ω_0 is a correction factor;
judging whether the probability is greater than a third preset value, and if so, determining that the third entity pair is linked by an edge of the preset type.
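The prediction step can be sketched as follows. The formula image for the fourth preset model is missing, so the logistic form used here, 1 / (1 + exp(-(ω_0 + Σ ω_i σ_i))), is an assumption chosen because it maps a weighted sum of per-meta-path similarities with a correction term to a probability; the patent's actual formula may differ. The function and parameter names are hypothetical.

```python
import math
from typing import Sequence


def link_probability(similarities: Sequence[float],
                     weights: Sequence[float],
                     bias: float) -> float:
    """ASSUMED logistic model over per-meta-path similarity values.

    similarities[i] plays the role of sigma(s, t | path_i), weights[i] of
    omega_i and bias of the correction factor omega_0.
    """
    if len(similarities) != len(weights):
        raise ValueError("one weight is required per meta-path")
    z = bias + sum(w * s for w, s in zip(weights, similarities))
    # Numerically stable logistic function for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_link(similarities: Sequence[float], weights: Sequence[float],
                 bias: float, threshold: float) -> bool:
    """Report a link when the probability exceeds the preset value."""
    return link_probability(similarities, weights, bias) > threshold


print(predict_link([1.0, 0.5], [2.0, 1.0], -1.0, 0.5))
```

Under this assumption the weights and the correction factor would be fitted on the training entity pairs, for example by ordinary logistic regression.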
An embodiment of the present invention also discloses a heterogeneous information network meta-path determining device, the device comprising: a first determining module, a second determining module, a first trigger module, a third determining module, a first computing module, a second computing module, a first inserting module, a first selecting module, a second trigger module, a third trigger module, a fourth determining module, a third computing module, a fourth computing module and a second inserting module, wherein:
the first determining module is configured to determine, in the heterogeneous information network, multiple first entity pairs whose meta-paths are to be determined, wherein each first entity pair comprises a source node and a target node, and each first entity pair is linked at least by an edge of a first preset type;
the second determining module is configured to determine an initial data structure from the multiple first entity pairs; the initial data structure comprises the entity pairs each formed by the source node of a first entity pair and that source node itself;
the first trigger module is configured to generate, according to the edge types in the heterogeneous information network, multiple first candidate meta-paths with a hop count of 1, and, after triggering in turn the third determining module, the first computing module, the second computing module and the first inserting module for each first candidate meta-path, to trigger the first selecting module;
the third determining module is configured to generate, according to the heterogeneous information network, the initial data structure and the first candidate meta-path, the multiple second entity pairs linked by the first candidate meta-path; the source node of a second entity pair is the source node of an entity pair in the initial data structure, and its target node is a node in the heterogeneous information network other than the source nodes of the first entity pairs;
the first computing module is configured to calculate, according to a first preset model, the similarity measure value of each second entity pair when linked by the first candidate meta-path, and to save the first candidate meta-path, each second entity pair and its corresponding similarity measure value to a first data structure; the first preset model is: (formula image not reproduced in this text), where σ(s, t_i | Π_{1…i}) denotes the similarity measure value of source node s and target node t_i on meta-path Π_{1…i}; Π_{1…i} denotes the (i-1)-hop meta-path linking source node s and target node t_i; I(V_{i-1}) denotes the set of target nodes reachable by walking from source node s along meta-path Π_{1…i-1}; x is a node in I(V_{i-1}); R(x, t_i) indicates whether target node t_i can be reached through edge R_{i-1}, being 1 if it can and 0 otherwise; and R(x) denotes the number of nodes that node x can reach through edge R_{i-1};
the second computing module is configured to calculate, according to a second preset model, the comprehensive similarity score of the first data structure and to save it to the first data structure; the second preset model is: (formula image not reproduced in this text), where S denotes the comprehensive similarity score of a data structure; s is a source node; t is a target node reachable through meta-path Π; τ is the number of reachable target nodes; σ(s, t | Π) is the similarity measure value of entity pair (s, t) on meta-path Π; and r(s) = 1 - α * N, where r(s) denotes the contribution ability of source node s to the current data structure and serves to balance the selection of data structures, α is the decay factor of the contribution ability, and N is the number of first entity pairs with source node s that are linked by meta-paths already saved to the meta-path set;
the first inserting module is configured to insert the first data structure into a candidate set;
the first selecting module is configured to select, according to the magnitude of the comprehensive similarity scores, one data structure from the candidate set, denoted the second data structure, and to check whether the second data structure contains a third entity pair identical to any first entity pair;
the second trigger module is configured, when the check result obtained by the first selecting module is yes, to save the meta-path that links the third entity pair in the second data structure, together with the third entity pair, to a meta-path set, to delete the second data structure from the candidate set, and to trigger the first selecting module;
the third trigger module is configured, when the check result obtained by the first selecting module is no, to generate multiple third candidate meta-paths according to the second candidate meta-path saved in the second data structure and the edge types in the heterogeneous information network, the hop count of each third candidate meta-path differing from that of the second candidate meta-path by 1; to delete the second data structure from the candidate set; and, after triggering in turn the fourth determining module, the third computing module, the fourth computing module and the second inserting module for each third candidate meta-path, to trigger the first selecting module;
the fourth determining module is configured to generate, according to the heterogeneous information network, the second data structure and the third candidate meta-path, the multiple fourth entity pairs linked by the third candidate meta-path; the source node of a fourth entity pair is the source node of an entity pair in the second data structure, and its target node is a node in the heterogeneous information network other than the source nodes of the first entity pairs;
the third computing module is configured to calculate, according to the first preset model, the similarity measure value of each fourth entity pair when linked by the third candidate meta-path, and to save the third candidate meta-path, each fourth entity pair and its corresponding similarity measure value to a third data structure;
the fourth computing module is configured to calculate, according to the second preset model, the comprehensive similarity score of the third data structure and to save it to the third data structure;
the second inserting module is configured to insert the third data structure into the candidate set.
An embodiment of the present invention also discloses a link prediction device, the device comprising: a to-be-predicted entity pair determining module, a probability determining module and a fourth judging module, wherein:
the to-be-predicted entity pair determining module is configured to determine an entity pair to be predicted;
the probability determining module is configured to determine, according to a fourth preset model and the meta-path set, the probability that the entity pair to be predicted is linked by an edge of the first preset type; the fourth preset model is: (formula image not reproduced in this text), where η(s, t | γ) is the probability that the entity pair to be predicted is linked by an edge of the first preset type; (s, t) is the entity pair to be predicted, with source node s and target node t; γ is the meta-path set; i is the index of a meta-path in γ; σ(s, t | Π_i) is the similarity measure value of the entity pair (s, t) on the i-th meta-path Π_i; ω_i is the weight of meta-path Π_i; and ω_0 is a correction factor;
the fourth judging module is configured to judge whether the probability is greater than a third preset value, and if so, to determine that the third entity pair is linked by an edge of the preset type.
With the heterogeneous information network meta-path determining method, link prediction method and devices provided by the embodiments of the present invention, a first data structure is created and inserted into a candidate set. According to the comprehensive similarity scores of the data structures stored in the candidate set, a data structure with a high comprehensive similarity score is selected from the candidate set in turn and denoted the second data structure, and it is checked whether this data structure contains a third entity pair identical to any first entity pair. If so, the meta-path that links the third entity pair in the second data structure, together with the third entity pair, is saved to the meta-path set, the second data structure is deleted from the candidate set, and the next data structure is selected from the candidate set according to the comprehensive similarity scores. If not, a third data structure is created and inserted into the candidate set, and the next data structure is again selected from the candidate set according to the comprehensive similarity scores. This continues until the candidate set is empty. Because the meta-path determining method and device follow the comprehensive similarity scores of the data structures in the candidate set, they determine the meta-paths that link the first entity pairs (training pairs) in order of relevance, the more relevant ones first. The meta-path determining method and device provided by the present invention are therefore not only efficient at determining meta-paths but also yield meta-paths that are more useful. Because the link prediction method and device build their prediction model from meta-paths determined by the meta-path determining method provided by the embodiments of the present invention, the prediction results they obtain are more accurate. Of course, a product or method implementing the present invention need not achieve all of the above advantages at the same time.
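The selection loop summarised above is a best-first search, which can be sketched with a priority queue. This is a simplified sketch under stated assumptions, not the patented procedure: the similarity update is an assumed random-walk split, the score function is passed in by the caller (here a plain mean, since the second preset model's formula is not available), the r(s) balancing term is omitted, and `max_hops` is an added guard so that the loop terminates on cyclic graphs. The toy graph and all function names are hypothetical.

```python
import heapq
import itertools
from typing import Callable, Dict, Hashable, List, Set, Tuple

Node = Hashable
Pair = Tuple[Node, Node]
Path = Tuple[str, ...]                        # meta-path as a sequence of edge types
Graph = Dict[str, Dict[Node, Set[Node]]]      # graph[edge_type][x] -> neighbours of x
Scores = Dict[Pair, float]                    # entity pair -> similarity value


def extend(graph: Graph, pairs: Scores, edge_type: str,
           sources: Set[Node]) -> Scores:
    """Follow one more edge of edge_type from the target of every stored pair."""
    out: Scores = {}
    for (s, x), sigma in pairs.items():
        neighbours = graph.get(edge_type, {}).get(x, set())
        for t in neighbours:
            if t not in sources:              # targets exclude training source nodes
                out[(s, t)] = out.get((s, t), 0.0) + sigma / len(neighbours)
    return out


def find_meta_paths(graph: Graph, training: List[Pair],
                    score: Callable[[Scores], float],
                    max_hops: int = 3) -> Dict[Path, Set[Pair]]:
    sources = {s for s, _ in training}
    found: Dict[Path, Set[Pair]] = {}
    heap: List[Tuple[float, int, Path, Scores]] = []
    tie = itertools.count()                   # tie-breaker so paths are never compared

    def push(path: Path, pairs: Scores) -> None:
        if pairs:                             # candidates linking no pair are dropped
            heapq.heappush(heap, (-score(pairs), next(tie), path, pairs))

    initial: Scores = {(s, s): 1.0 for s in sources}
    for edge_type in graph:                   # 1-hop candidates, one per edge type
        push((edge_type,), extend(graph, initial, edge_type, sources))

    while heap:                               # until the candidate set is empty
        _, _, path, pairs = heapq.heappop(heap)   # highest score first
        hits = set(pairs) & set(training)
        if hits:                              # contains a training pair: keep the path
            found[path] = hits
        elif len(path) < max_hops:            # otherwise expand by one hop
            for edge_type in graph:
                push(path + (edge_type,), extend(graph, pairs, edge_type, sources))
    return found


def mean_similarity(pairs: Scores) -> float:
    """Stand-in score; NOT the second preset model."""
    return sum(pairs.values()) / len(pairs)


# Hypothetical toy graph; edge type names are borrowed from the text for readability.
graph: Graph = {
    "isCitizenOf": {1: {8}, 2: {8}, 3: {9}, 4: {9}},
    "wasBornIn": {1: {5}, 2: {6}, 3: {7}},
    "isLocatedin": {5: {8}, 6: {8}, 7: {9}},
}
training = [(1, 8), (2, 8), (3, 9), (4, 9)]
print(find_meta_paths(graph, training, mean_similarity))
```

On this toy graph the search returns the direct `isCitizenOf` path and the two-hop `wasBornIn`, `isLocatedin` path, the latter covering three of the four training pairs.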
Brief description of the drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a heterogeneous information network meta-path determining method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of steps A to D in the embodiment shown in Fig. 1;
Fig. 3 is a flowchart of steps E to H in the embodiment shown in Fig. 1;
Fig. 4 is a heterogeneous information network subgraph in a practical application provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of a heterogeneous information network meta-path determining method in a practical application provided by an embodiment of the present invention;
Fig. 6 is a flowchart of a link prediction method provided by an embodiment of the present invention;
Fig. 7 is a comparison chart of the prediction results of link prediction performed with a link prediction method provided by an embodiment of the present invention;
Fig. 8 is a structural diagram of a heterogeneous information network meta-path determining device provided by an embodiment of the present invention;
Fig. 9 is a structural diagram of a link prediction device provided by an embodiment of the present invention.
Detailed description of the invention
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
The embodiments of the present invention provide a heterogeneous information network meta-path determining method and device, applied to a server; the present invention also provides a link prediction method and device, applied to a server. Each is described below.
First, a heterogeneous information network meta-path determining method is described.
Referring to Figs. 1 to 3, an embodiment of the present invention provides a heterogeneous information network meta-path determining method, which may include:
S101: determine, in the heterogeneous information network, multiple first entity pairs whose meta-paths are to be determined, wherein each first entity pair comprises a source node and a target node, and each first entity pair is linked at least by an edge of a first preset type;
The general process of link prediction is: determine multiple training entity pairs linked by a specific edge in the heterogeneous information network, and enumerate all meta-paths of these training entity pairs; build a prediction model from the enumerated meta-paths of the training entity pairs; use the prediction model to calculate the probability that an entity pair to be predicted is linked by the above specific meta-paths; when this probability is greater than a preset value, the entity pair to be predicted is taken to be linked by the specific edge.
Therefore, when the meta-paths serve link prediction, the first entity pairs may be called training pairs, and the set they form may be called the training set; the edge of the first preset type is the specific type of edge in link prediction;
The number of first entity pairs can be chosen according to the actual situation. Experiments carried out for the present invention show that when the number of first entity pairs is below 10, link prediction accuracy improves quickly as the number increases; when the number exceeds 10, however, it has essentially no effect on the accuracy of the prediction results. Analysis indicates that a training set with too few first entity pairs cannot cover all important meta-paths, while too many first entity pairs introduce noise. The number of first entity pairs is therefore preferably in the range of 10 to 20: within this range all useful meta-paths can be found well, excessive noise is avoided, and time and space resources are saved as well;
Specifically, to illustrate the heterogeneous information network meta-path determining method provided by the embodiment of the present invention more intuitively, Fig. 4 provides a heterogeneous information network subgraph from a practical application. This network contains five types of edge: isCitizenOf, WorkAt, wasBornIn, isLocatedin and Owns. The edge type "isCitizenOf" is designated the first preset type, and the four entity pairs (1,8), (2,8), (3,9) and (4,9) are designated the first entity pairs (training pairs).
Steps S102 to S106 are described in detail below with reference to Fig. 4 and Fig. 5.
S102: determine an initial data structure from the multiple first entity pairs; the initial data structure comprises the entity pairs each formed by the source node of a first entity pair and that source node itself;
After the first entity pairs have been determined, the embodiment provided by the present invention uses "data structures" to record the process of determining the meta-paths that link the first entity pairs;
First, an initial data structure is determined. It comprises the entity pairs each formed by the source node of a first entity pair and that source node itself, as in table No.1 in Fig. 5.
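Step S102 can be sketched in a few lines. The function name is hypothetical, and the initial similarity value of 1.0 for each pair (s, s) is an assumption; the text only states which pairs the initial data structure contains.

```python
from typing import Dict, Hashable, Iterable, Tuple

Node = Hashable
Pair = Tuple[Node, Node]


def initial_structure(training_pairs: Iterable[Pair]) -> Dict[Pair, float]:
    """Pair the source node of every training pair with itself (zero hops)."""
    # ASSUMPTION: each (s, s) pair starts with similarity 1.0.
    return {(s, s): 1.0 for s, _ in training_pairs}


# First entity pairs from the Fig. 4 example in the text.
training = [(1, 8), (2, 8), (3, 9), (4, 9)]
print(sorted(initial_structure(training)))   # [(1, 1), (2, 2), (3, 3), (4, 4)]
```

A dictionary keyed by pair removes duplicates automatically when several training pairs share a source node.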
S103: according to the edge types in the heterogeneous information network, generate multiple first candidate meta-paths with a hop count of 1; after steps A to D have been performed for each first candidate meta-path, perform step S104.
As many first candidate meta-paths with a hop count of 1 are generated as there are edge types in the heterogeneous information network. For example, as shown in Fig. 5, because the heterogeneous information network shown in Fig. 4 has five types of edge, five first candidate meta-paths with a hop count of 1 are generated, one per edge type (the five path expressions are not reproduced in this text).
Steps A to D are performed on each first candidate meta-path:
A. According to the heterogeneous information network, the initial data structure and the first candidate meta-path, generate a plurality of second entity pairs linked by the first candidate meta-path; the source node of each second entity pair is the source node of an entity pair in the initial data structure, and the target node of each second entity pair is a node in the heterogeneous information network other than the source nodes of the first entity pairs.
The second entity pairs are entity pairs that actually exist in the heterogeneous information network and are linked by the first candidate meta-path.
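As an illustration only, steps S102 and A can be sketched in Python on a toy edge list. The isCitizenOf pairs are those given in the text; the edge-type names attached to the other pairs, and the helper names `build_adjacency`, `initial_structure` and `extend_pairs`, are assumptions made for this sketch and are not taken from Fig. 4.

```python
from collections import defaultdict

# Toy edge list: (source, edge type, target).
# The isCitizenOf pairs follow the text; the type names attached to the
# remaining pairs are assumptions, since Fig. 4 is not reproduced here.
EDGES = [
    (1, "isCitizenOf", 8), (2, "isCitizenOf", 8),
    (3, "isCitizenOf", 9), (4, "isCitizenOf", 9),
    (1, "WorkAt", 5), (2, "WorkAt", 6), (3, "WorkAt", 7),
    (4, "Owns", 6), (4, "Owns", 7),
]
TRAIN_PAIRS = [(1, 8), (2, 8), (3, 9), (4, 9)]


def build_adjacency(edges):
    """Map (node, edge type) to the list of nodes reachable over that edge type."""
    adjacency = defaultdict(list)
    for source, edge_type, target in edges:
        adjacency[(source, edge_type)].append(target)
    return adjacency


def initial_structure(train_pairs):
    """Step S102: pair each training source node with itself."""
    return sorted({(source, source) for source, _ in train_pairs})


def extend_pairs(adjacency, pairs, edge_type, train_pairs):
    """Step A: follow one edge of `edge_type` from the end node of each pair.

    Targets that are source nodes of training pairs are excluded.
    """
    train_sources = {source for source, _ in train_pairs}
    extended = []
    for source, end in pairs:
        for target in adjacency.get((end, edge_type), []):
            if target not in train_sources:
                extended.append((source, target))
    return extended


adjacency = build_adjacency(EDGES)
start = initial_structure(TRAIN_PAIRS)
for edge_type in ["isCitizenOf", "WorkAt", "wasBornIn", "isLocatedin", "Owns"]:
    print(edge_type, extend_pairs(adjacency, start, edge_type, TRAIN_PAIRS))
```

On this toy data, two of the five edge types yield no pairs, which mirrors the case described in the text where no entity pairs can be generated for two of the candidate meta-paths.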
For example, from the heterogeneous information network subgraph shown in Fig. 4, the initial data structure (table No.1 in Fig. 5) and the first of the first candidate meta-paths listed in Fig. 5, the second entity pairs linked by this 1-hop meta-path are generated: (1,8), (2,8), (3,9) and (4,9).
Similarly, the second entity pairs linked by a second 1-hop meta-path are generated: (1,5), (2,6) and (3,7); and the second entity pairs linked by a third 1-hop meta-path are generated: (4,6) and (4,7).
Because the heterogeneous information network subgraph shown in Fig. 4 contains no entity pairs linked by the remaining two 1-hop meta-paths, no entity pairs can be generated for those two first candidate meta-paths.
B. Calculate, according to a first preset model, the similarity measure of each second entity pair when linked by the first candidate meta-path; save the first candidate meta-path, each second entity pair and the corresponding similarity measure to a first data structure.
The first preset model is the similarity measurement algorithm PCRW (Path-Constrained Random Walk) disclosed in the prior art.
The first preset model is specifically: [formula not reproduced here]
where σ(s, t_i | Π_{1…i}) denotes the similarity measure of source node s and target node t_i on meta-path Π_{1…i}; Π_{1…i} denotes the (i-1)-hop meta-path linking source node s and target node t_i; V_i denotes a target node reachable from source node s by walking along meta-path Π_{1…i-1}; I(V_{i-1}) denotes the set of target nodes reachable from source node s by walking along meta-path Π_{1…i-1}, and x is a node in I(V_{i-1}); R(x, t_i) indicates whether node x can reach target node t_i through edge R_{i-1}: if so, R(x, t_i) equals 1, otherwise R(x, t_i) equals 0; R(x) denotes the number of network nodes that node x can reach through edge R_{i-1}.
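The formula itself does not survive in this text. One form that is consistent with the symbol definitions given for the first preset model and with the usual statement of the PCRW recurrence is shown below. It is a reconstruction offered for orientation, not the patent's own typesetting, and the base case (similarity 1 for the source node with itself before any hop is taken) is an assumption.

```latex
\sigma(s, t_i \mid \Pi_{1 \ldots i})
  = \sum_{x \in I(V_{i-1})}
    \sigma(s, x \mid \Pi_{1 \ldots i-1}) \cdot \frac{R(x, t_i)}{R(x)},
\qquad
\sigma(s, s \mid \Pi_{1 \ldots 1}) = 1
```

Read this way, each hop distributes the similarity accumulated at node x evenly over the R(x) nodes that x can reach through the next edge of the meta-path.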
For example, the similarity measure calculated according to the first preset model for each of the second entity pairs (1,8), (2,8), (3,9) and (4,9) linked by the first of the first candidate meta-paths is 1. This first candidate meta-path, the second entity pairs it links, namely (1,8), (2,8), (3,9) and (4,9), and the similarity measure corresponding to each second entity pair are saved to table No.2 in Fig. 5.
Similarly, the other two first candidate meta-paths, the second entity pairs linked by them and the similarity measure corresponding to each second entity pair are saved to table No.3 and table No.4 in Fig. 5, respectively.
Table No.2, table No.3 and table No.4 in Fig. 5 are the first data structures.
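The similarity calculation of step B can be sketched as a short random-walk routine. The sketch assumes the standard PCRW recurrence, in which the walker at each hop chooses uniformly among the neighbours reachable over the required edge type. The toy edges other than the isCitizenOf pairs, and the function names `build_adjacency` and `pcrw`, are hypothetical.

```python
from collections import defaultdict

# Toy edges; only the isCitizenOf pairs are taken from the text, the rest are assumed.
EDGES = [
    (1, "isCitizenOf", 8), (2, "isCitizenOf", 8),
    (3, "isCitizenOf", 9), (4, "isCitizenOf", 9),
    (1, "WorkAt", 5), (2, "WorkAt", 6), (3, "WorkAt", 7),
    (4, "Owns", 6), (4, "Owns", 7),
]


def build_adjacency(edges):
    """Map (node, edge type) to the list of nodes reachable over that edge type."""
    adjacency = defaultdict(list)
    for source, edge_type, target in edges:
        adjacency[(source, edge_type)].append(target)
    return adjacency


def pcrw(adjacency, source, meta_path):
    """Return {target: similarity} for a walk from `source` constrained to `meta_path`."""
    distribution = {source: 1.0}
    for edge_type in meta_path:
        next_distribution = defaultdict(float)
        for node, probability in distribution.items():
            neighbours = adjacency.get((node, edge_type), [])
            for target in neighbours:
                # Uniform choice among the neighbours over this edge type.
                next_distribution[target] += probability / len(neighbours)
        distribution = dict(next_distribution)
    return distribution


adjacency = build_adjacency(EDGES)
print(pcrw(adjacency, 1, ["isCitizenOf"]))  # {8: 1.0}
print(pcrw(adjacency, 4, ["Owns"]))         # {6: 0.5, 7: 0.5}
```

On this toy data every isCitizenOf pair receives similarity 1, in agreement with the value stated in the text for the pairs (1,8), (2,8), (3,9) and (4,9).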
C. Calculate the comprehensive similarity score of the first data structure according to a second preset model and save it to the first data structure.
The second preset model is: [formula not reproduced here]
where S denotes the comprehensive similarity score of the data structure; s is a source node, t is a target node reachable through meta-path Π, and τ is the number of reachable target nodes; σ(s, t | Π) is the similarity measure of entity pair (s, t) on meta-path Π; r(s) = 1 - α·N, where r(s) denotes the ability of source node s to contribute to the current data structure and serves to balance the selection of data structures, α is the decay factor of the contribution ability, and N denotes the number of first entity pairs with s as source node whose linking meta-paths have already been saved to the meta-path set.
For example, the comprehensive similarity scores S of the first data structures corresponding to the three first candidate meta-paths, calculated according to the second preset model, are 4, 3 and 0.5 respectively; they are saved to table No.2, table No.3 and table No.4 in Fig. 5, respectively.
After step C has been performed on each first candidate meta-path, each first data structure includes at least: the first candidate meta-path, the second entity pairs linked by the first candidate meta-path, the similarity measure corresponding to each second entity pair, and the comprehensive similarity score S of this first data structure.
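Because the formula for S is not reproduced in this text, the following sketch is only one possible reading of the symbol definitions: for each source node s it averages σ(s, t | Π) over the τ targets reachable from s, multiplies the result by r(s) = 1 - α·N, and sums over the source nodes. This form is an assumption; it was chosen because, on the similarity values assumed below for tables No.3 and No.4, it yields the scores 4, 3 and 0.5 stated in the text. The value α = 0.1 and the function name `comprehensive_score` are likewise hypothetical.

```python
from collections import defaultdict

# Rows of each table: {(source, target): similarity}.
TABLE_NO2 = {(1, 8): 1.0, (2, 8): 1.0, (3, 9): 1.0, (4, 9): 1.0}  # stated in the text
TABLE_NO3 = {(1, 5): 1.0, (2, 6): 1.0, (3, 7): 1.0}               # assumed values
TABLE_NO4 = {(4, 6): 0.5, (4, 7): 0.5}                            # assumed values


def comprehensive_score(rows, found_counts=None, alpha=0.1):
    """Assumed form: S = sum over s of r(s) * mean over t of sigma(s, t | path).

    `found_counts[s]` plays the role of N, the number of training pairs of s
    whose linking meta-paths are already in the meta-path set.
    """
    found_counts = found_counts or {}
    per_source = defaultdict(list)
    for (source, _target), similarity in rows.items():
        per_source[source].append(similarity)
    score = 0.0
    for source, similarities in per_source.items():
        contribution = 1.0 - alpha * found_counts.get(source, 0)  # r(s)
        score += contribution * sum(similarities) / len(similarities)
    return score


for name, table in [("No.2", TABLE_NO2), ("No.3", TABLE_NO3), ("No.4", TABLE_NO4)]:
    print(name, comprehensive_score(table))
```

With N greater than 0 for a source node, r(s) shrinks, so structures that only re-cover source nodes already served by saved meta-paths receive lower scores.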
D. Insert the first data structure into the candidate set.
For example, table No.2, table No.3 and table No.4 in Fig. 5 are inserted into the "candidate set".
S104: according to the magnitude of the comprehensive similarity scores, select one data structure from the candidate set and denote it the second data structure; check whether the second data structure contains a third entity pair identical to any of the first entity pairs; if so, perform step S105; otherwise, perform step S106.
For example, according to the magnitude of the comprehensive similarity scores, the data structure with the largest score, table No.2, is selected first from the "candidate set" and denoted the second data structure; because the second data structure table No.2 contains third entity pairs identical to the first entity pairs determined in step S101, step S105 is performed.
As another example, if the data structure selected from the candidate set according to the magnitude of the comprehensive similarity scores is table No.3 and it is denoted the second data structure, then, because the second data structure table No.3 contains no third entity pair identical to a first entity pair determined in step S101, step S106 is performed.
S105: save the meta-path in the second data structure that links the third entity pair, together with the third entity pair, to the meta-path set; delete the second data structure from the candidate set; and perform step S104.
For example, the third entity pairs (1,8), (2,8), (3,9) and (4,9) in the second data structure table No.2 and the corresponding meta-path are saved together to the meta-path set; that is, one meta-path has been determined for each first entity pair. The second data structure table No.2 is then deleted from the candidate set, so that the data structures remaining in the candidate set are table No.3 and table No.4, and step S104 is performed again on this candidate set.
S106: according to the second candidate meta-path saved in the second data structure and the edge types in the heterogeneous information network, generate a plurality of third candidate meta-paths, the hop count of each third candidate meta-path differing from that of the second candidate meta-path by 1; delete the second data structure from the candidate set; after steps E to H have been performed on each third candidate meta-path, perform step S104.
When the second data structure is table No.3, the second candidate meta-path is the 1-hop meta-path saved in table No.3, and a third candidate meta-path adds one further hop to the second candidate meta-path; for example, the third candidate meta-path may be either of two 2-hop meta-paths [path expressions not reproduced here].
Steps E to H are performed on each third candidate meta-path:
E. According to the heterogeneous information network, the second data structure and the third candidate meta-path, generate a plurality of fourth entity pairs linked by the third candidate meta-path; the source node of each fourth entity pair is the source node of an entity pair in the second data structure, and the target node of each fourth entity pair is a node in the heterogeneous information network other than the source nodes of the first entity pairs.
The method of determining the fourth entity pairs is the same as the method of determining the second entity pairs in step A and is not repeated here.
F. Calculate, according to the first preset model, the similarity measure of each fourth entity pair when linked by the third candidate meta-path, and save the third candidate meta-path, each fourth entity pair and the corresponding similarity measure to a third data structure.
The first preset model is the first preset model described in step B.
The method of determining the third data structure is the same as the method of determining the first data structure in step B and is not repeated here.
G. Calculate the comprehensive similarity score of the third data structure according to the second preset model and save it to the third data structure.
The second preset model is the second preset model in step C.
The specific method of calculating and saving the comprehensive similarity score of the third data structure is the same as in step C and is not repeated here.
H. Insert the third data structure into the candidate set.
For example, table No.5 and table No.6 in Fig. 5, which represent third data structures, are inserted into the candidate set.
After the third data structure corresponding to each third candidate meta-path has been inserted into the candidate set, the method returns to step S104 and repeats until the candidate set is empty. An empty candidate set indicates that all useful meta-paths corresponding to each first entity pair have been found.
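The loop formed by steps S103 to S106 can be sketched end to end as a best-first search over a priority queue. This is a minimal sketch under several assumptions: all edges other than the isCitizenOf pairs are invented for the example, the score uses an assumed form of the second preset model (r(s) times the mean similarity per source node, summed over source nodes), the candidate set is held in a `heapq`, and a hop limit of 4 guarantees termination. All function names are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import count

EDGE_TYPES = ["isCitizenOf", "WorkAt", "wasBornIn", "isLocatedin", "Owns"]
# Only the isCitizenOf pairs come from the text; all other edges are assumed.
EDGES = [
    (1, "isCitizenOf", 8), (2, "isCitizenOf", 8),
    (3, "isCitizenOf", 9), (4, "isCitizenOf", 9),
    (1, "WorkAt", 5), (2, "WorkAt", 6), (3, "WorkAt", 7),
    (4, "Owns", 6), (4, "Owns", 7),
    (5, "isLocatedin", 8), (6, "isLocatedin", 8), (7, "isLocatedin", 9),
]
TRAIN_PAIRS = [(1, 8), (2, 8), (3, 9), (4, 9)]

ADJ = defaultdict(list)
for u, etype, v in EDGES:
    ADJ[(u, etype)].append(v)


def pcrw(source, meta_path):
    """Path-constrained random walk: {target: similarity}."""
    dist = {source: 1.0}
    for etype in meta_path:
        nxt = defaultdict(float)
        for node, p in dist.items():
            nbrs = ADJ.get((node, etype), [])
            for t in nbrs:
                nxt[t] += p / len(nbrs)
        dist = dict(nxt)
    return dist


def build_structure(meta_path, sources):
    """Steps A-B and E-F: entity pairs linked by `meta_path`, with similarities."""
    rows = {}
    for s in sources:
        for t, sim in pcrw(s, meta_path).items():
            if t not in sources:
                rows[(s, t)] = sim
    return rows


def score(rows, found, alpha=0.1):
    """Steps C and G under the assumed form of the second preset model."""
    per_source = defaultdict(list)
    for (s, _), sim in rows.items():
        per_source[s].append(sim)
    return sum((1 - alpha * found[s]) * sum(v) / len(v) for s, v in per_source.items())


def find_meta_paths(train_pairs, max_hops=4):
    sources = sorted({s for s, _ in train_pairs})
    train = set(train_pairs)
    found = defaultdict(int)   # N in r(s): training pairs of s already covered
    heap, tick = [], count()   # max-heap via negated score; tick breaks ties
    meta_path_set = {}

    def push_children(prefix):
        for etype in EDGE_TYPES:
            path = prefix + (etype,)
            rows = build_structure(path, sources)
            if rows:  # candidates that link no entity pair are not inserted
                heapq.heappush(heap, (-score(rows, found), next(tick), path, rows))

    push_children(())                            # S103: all 1-hop candidates
    while heap:                                  # S104: best-scoring structure
        _, _, path, rows = heapq.heappop(heap)   # popping also deletes it
        hits = sorted(pair for pair in rows if pair in train)
        if hits:                                 # S105: save path and linked pairs
            meta_path_set[path] = hits
            for s, _ in hits:
                found[s] += 1
        elif len(path) < max_hops:               # S106: extend by one hop
            push_children(path)
    return meta_path_set


for path, pairs in find_meta_paths(TRAIN_PAIRS).items():
    print(" -> ".join(path), pairs)
```

Using a heap keeps the selection of the highest-scoring data structure at logarithmic cost per operation; any container that returns the maximum-score element would serve equally well for the candidate set.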
It should be noted that the purpose of calculating the similarity measure of the entity pairs in steps B and F, and the comprehensive similarity score in steps C and G, is to ensure that the meta-paths determined by the heterogeneous information network meta-path determining method provided by the embodiment of the present invention are those most relevant to linking the training pairs; the more relevant a meta-path linking the training pairs is, the more useful it is when building the prediction model. Such meta-paths not only link more training pairs, but also indicate a closer relationship between the source node and the target node of a training pair, and thus reveal the latent features of the training set. For example, because the data structure table No.2 has the largest comprehensive similarity score in the candidate set, the meta-path in table No.2 is the first meta-path found in Fig. 5; it is not only the shortest edge relation, but also the most relevant of all candidate meta-paths.
In addition, because the data structure selected from the candidate set each time is the one with the largest comprehensive similarity score among all data structures in the candidate set, the meta-path determined at each step is also the most useful and most relevant one in the candidate set at that time, which ensures that the meta-paths relevant to the training pairs are found in descending order of relevance.
This method of finding useful meta-paths step by step, starting from the source nodes of the training pairs, is a greedy algorithm: at each step, the meta-path selected is the most relevant one and reaches the most target nodes. It is then judged whether this meta-path links a training pair. If it does, the meta-path and the training pairs it links are selected and saved to the meta-path set; otherwise the greedy search continues until the candidate set is empty. Finally, the meta-path set γ is generated.
By applying the heterogeneous information network meta-path determining method provided by the embodiment of the present invention, the meta-paths relevant to linking the first entity pairs (training pairs) can be determined one by one in descending order of relevance, according to the magnitude of the comprehensive similarity scores of the data structures in the candidate set; not only is the efficiency of determining meta-paths higher, but the determined meta-paths are also more useful.
Preferably, on the basis of the embodiment shown in Fig. 1, in order to make the determined meta-paths still more useful and more relevant, after step C has been performed on each first candidate meta-path and before step D is performed on it, the method further comprises the following step:
I. Judge whether the comprehensive similarity score corresponding to the first data structure is not less than a first preset value; if so, perform step D; otherwise, discard the first data structure.
Specifically, it is judged whether the comprehensive similarity score S of the first data structure is not less than l, where l = ε·|A|; ε is a limiting coefficient determined according to the actual application scenario, and |A| is the size of the first data structure, i.e. the number of entity pairs in the first data structure.
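The check in step I amounts to a single comparison. A minimal sketch follows, in which the function name `passes_threshold` is hypothetical and the default ε = 0.005 is the value reported for the experiment later in this description:

```python
def passes_threshold(score, num_pairs, epsilon=0.005):
    """Keep a data structure only if S >= l, with l = epsilon * |A|."""
    return score >= epsilon * num_pairs


# A structure with S = 0.5 and 2 entity pairs is kept (l = 0.01);
# one with S = 0.001 and 4 entity pairs is discarded (l = 0.02).
print(passes_threshold(0.5, 2), passes_threshold(0.001, 4))
```

Because l grows with the number of entity pairs, a large data structure whose pairs are only weakly linked is pruned even when its total score is comparable to that of a small but strongly linked one.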
And/or, after step G has been performed on each third candidate meta-path and before step H is performed on it, the method further comprises the following step:
J. Judge whether the comprehensive similarity score corresponding to the third data structure is not less than the first preset value; if so, perform step H; otherwise, discard the third data structure.
The specific judging method is the same as that described in step I and is not repeated here.
After the processing of step I or step J, the first data structures and third data structures inserted into the candidate set are preferred data structures, which further ensures that the generated meta-paths are more relevant and better describe the relationship within each training pair, and that the determined meta-paths linking the training pairs do not introduce noise by being too numerous.
Preferably, on the basis of the embodiment shown in Fig. 1 of the present invention, in order to make the determined meta-paths still more useful and more relevant, before step E is performed on each third candidate meta-path, the method further comprises the following step:
Judge whether the hop count of the third candidate meta-path is not greater than a second preset value; if so, perform step E.
For example, the second preset value may be set to 4, because a meta-path whose hop count exceeds 4 has almost no actual semantic meaning.
Therefore, limiting the hop count of the third candidate meta-paths corresponding to the third data structures inserted into the candidate set ensures that the third data structures inserted into the candidate set are preferred data structures, which further ensures that the determined meta-paths are more relevant and better describe the relationship within each training pair, and that the determined meta-paths linking the training pairs do not introduce noise by being too numerous.
Although every meta-path linking the training pairs that is determined by the heterogeneous information network meta-path determining method provided by the embodiment shown in Fig. 1 of the present invention is useful and relevant, each meta-path has a different degree of influence when these meta-paths are used to build a link prediction model and link prediction is performed according to that model. It is therefore important to find a method of measuring the relevance of each meta-path and to integrate the meta-paths effectively into the prediction model.
Therefore, preferably, on the basis of the embodiment shown in Fig. 1, the heterogeneous information network meta-path determining method provided by the embodiment of the present invention may further include:
determining, according to a third preset model, the weight corresponding to each meta-path in the meta-path set and saving it to the meta-path set.
The third preset model is: [formula not reproduced here]
where h denotes the output value of the third preset model; ω is the vector composed of the weights corresponding to the meta-paths in the meta-path set, ω = [ω_1, ω_2, …, ω_i, …], where ω_i is the weight corresponding to the meta-path with sequence number i in the meta-path set; assuming that M meta-paths in total are saved in the meta-path set, i = 1, …, M and ω_i ≥ 0.
When the output value h of the third preset model is at its maximum, the weight vector ω in the above formula is optimal, and each ω_i in ω is also optimal.
Here, x+ is the vector composed of the similarity measures of a positive sample on all meta-paths and is called a positive example value; x- is the vector composed of the similarity measures of a negative sample on all meta-paths and is called a negative example value. A positive sample is a first entity pair; a negative sample is a pair between which no link exists, formed by replacing the target node of a positive sample with another node of the same type as that target node. Q+ is the similarity matrix composed of all positive example values x+; Q- is the similarity matrix composed of all negative example values x-; the remaining term of the formula is a correction term.
It should be noted that link prediction can be regarded as a special classification problem. The present invention therefore uses positive samples and negative samples to set, in a supervised manner, a weight for each determined meta-path, so that the prediction model trained with the weighted meta-paths is more effective.
Experimental verification shows that, compared with link prediction using prediction models built by assigning the meta-paths random weights or uniform weights, the weights determined by the meta-path weight learning method provided by the present invention, and the prediction model built from the meta-paths carrying these weights, significantly improve the accuracy of prediction.
The heterogeneous information network meta-path determining method provided by the embodiment of the present invention can create first data structures and insert them into a candidate set; according to the magnitude of the comprehensive similarity scores of the data structures saved in the candidate set, select from the candidate set the data structure with the largest comprehensive similarity score, denote it the second data structure, and check whether it contains a third entity pair identical to any of the first entity pairs; if so, save the meta-path in the second data structure that links the third entity pair, together with the third entity pair, to the meta-path set, delete the second data structure from the candidate set, and continue to select the next data structure from the candidate set according to the magnitude of the comprehensive similarity scores; if not, create third data structures and insert them into the candidate set, then continue to select the next data structure from the candidate set according to the magnitude of the comprehensive similarity scores; until the candidate set is empty. Because the method determines the meta-paths relevant to linking the first entity pairs (training pairs) one by one in descending order of relevance, according to the magnitude of the comprehensive similarity scores of the data structures in the candidate set, the meta-path determining method provided by the present invention not only determines meta-paths efficiently, but the determined meta-paths are also more useful.
As shown in Fig. 6, the embodiment of the present invention further provides a link prediction method, which may include the following steps:
S401: determine an entity pair to be predicted.
In the heterogeneous information network, the entity pair to be predicted may be any entity pair other than the training pairs.
S402: according to a fourth preset model and the meta-path set, determine the probability that the entity pair to be predicted is linked by an edge of the first preset type.
Here, the edge of the first preset type is the edge of the first preset type described in the embodiment shown in Fig. 1; each training pair is linked at least by an edge of the first preset type.
The fourth preset model is: [formula not reproduced here]
where η(s, t | γ) is the probability that the entity pair to be predicted is linked by an edge of the first preset type; (s, t) is the entity pair to be predicted, where s is the source node and t is the target node; γ is the meta-path set; i is the sequence number of a meta-path in γ; σ(s, t | Π_i) is the similarity measure of the entity pair to be predicted (s, t) on the i-th meta-path Π_i; ω_i is the weight of meta-path Π_i; ω_0 is a correction factor. γ is obtained using the method embodiment shown in Fig. 1, and ω_i is determined according to the third preset model.
S403: judge whether the probability is greater than a third preset value; if so, determine that the entity pair to be predicted is linked by an edge of the preset type.
The third preset value may be set according to the actual application scenario and is generally 0.7.
For example, when the probability, calculated according to the fourth preset model, that the entity pair to be predicted is linked by an edge of the first preset type is 0.8, the link "edge of the first preset type" does exist between the entities of the pair to be predicted.
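The formula of the fourth preset model is not reproduced in this text. Purely as an illustration of steps S402 and S403, the sketch below assumes a logistic combination of the weighted similarity measures, η = 1 / (1 + exp(-(ω_0 + Σ ω_i·σ_i))). This form, the weights and the similarity values used here are all assumptions and may differ from the model in the patent; only the threshold 0.7 is taken from the text.

```python
import math


def link_probability(similarities, weights, w0=0.0):
    """Assumed logistic form of eta(s, t | gamma); not confirmed by the source."""
    z = w0 + sum(w * sim for w, sim in zip(weights, similarities))
    return 1.0 / (1.0 + math.exp(-z))


def predict_link(similarities, weights, w0=0.0, threshold=0.7):
    """Step S403: a link is predicted when the probability exceeds the third preset value."""
    return link_probability(similarities, weights, w0) > threshold


# Hypothetical similarity measures of one entity pair on two meta-paths,
# and hypothetical weights for those meta-paths.
sims = [1.0, 0.5]
weights = [2.0, 1.0]
print(round(link_probability(sims, weights), 3), predict_link(sims, weights))
```

Whatever the exact form of the model, the decision rule of step S403 is the same: compare the computed probability with the third preset value.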
To demonstrate more intuitively the effectiveness of the link prediction method provided by the embodiment of the present invention, the present invention verifies the link prediction method by experiment; the specific verification procedure is as follows:
1) Determine the data set
The experiment uses the Yago data set for verification. Yago is a large-scale knowledge graph containing more than 10 million entities and more than 120 million fact records. The present invention uses only its core fact part, "YagoFact", which includes 35 edge types, 4484914 relations and 1369931 entities belonging to 3455 entity types. A relation is represented in RDF format as a triple (entity, relation, entity); a specific example is (New York, is located in, United States).
2) Determine the evaluation criterion
The present invention uses the ROC (Receiver Operating Characteristic) curve to measure the performance of the different methods, with the true positive rate (TPR) on the y-axis and the false positive rate (FPR) on the x-axis, as shown in Fig. 7. TPR is the ratio of the number of positive samples correctly predicted as positive to the actual number of positive samples, and FPR is the ratio of the number of negative samples predicted as positive to the actual number of negative samples. The larger the area under the curve, the more accurate the prediction. Here, "predicted as positive" means that the entity pair is predicted to be linked by an edge of the preset type.
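The evaluation criterion can be sketched in a few lines of Python. The sketch computes the ROC points by lowering the decision threshold one sample at a time and integrates the curve with the trapezoidal rule; it assumes that all scores are distinct and that both classes are present. The function names and the sample scores are hypothetical.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR), lowering the threshold one sample at a time.

    Assumes distinct scores and at least one positive and one negative label.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, label in sorted(zip(scores, labels), key=lambda item: -item[0]):
        if label == 1:
            tp += 1   # a positive sample predicted as positive
        else:
            fp += 1   # a negative sample predicted as positive
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical predicted probabilities and true labels (1 = link exists).
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(auc(roc_points(scores, labels)))  # 0.75
```

A predictor that ranks every positive sample above every negative sample reaches an area of 1.0, while a random ranking gives about 0.5.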
3) Determine the objects of comparison
Because no results are currently available for link prediction in heterogeneous information networks with complex schemas, a basic link prediction method disclosed in the prior art for heterogeneous information networks with simple schemas is compared with the link prediction method provided by the present invention.
The basic link prediction method disclosed in the prior art is specifically: enumerate all meta-paths of the training pairs, calculate the similarity measure on each meta-path according to the PCRW algorithm, give each meta-path the same weight, then build a link prediction model and use it for prediction.
Because meta-paths of more than 4 hops have almost no actual semantic meaning, the present invention limits the maximum hop count of the meta-paths determined in the basic link prediction method to 1, 2, 3 and 4 respectively, which yields four basic link prediction methods, labeled PCRW-1, PCRW-2, PCRW-3 and PCRW-4; these four basic link prediction methods serve as the objects of comparison for the link prediction method provided by the present invention.
In the experiment, two different types of link to be predicted were chosen [link type expressions not reproduced here]; the corresponding prediction results are shown in Fig. 7(a) and Fig. 7(b) respectively. For each link type, 200 entity pairs linked by that type were chosen from the Yago data set; 100 of these pairs were used as training entity pairs and the other 100 as test entity pairs, and it was assumed that these links did not exist during the prediction experiment.
In the experiment, ε was set to 0.005, and the maximum hop count of the candidate meta-paths was limited to 4.
4) Experimental results
The experimental results are shown in Fig. 7. As can be seen from Fig. 7, the prediction accuracy of the link prediction method provided by the embodiment of the present invention is markedly higher than that of the basic link prediction methods of the prior art. This shows that the link prediction model built from the meta-paths determined by the heterogeneous information network meta-path determining method provided by the embodiment of the present invention is more effective and performs link prediction more accurately.
The link prediction method provided by the embodiment of the present invention can determine an entity pair to be predicted; determine, according to the fourth preset model [formula not reproduced here] and the meta-path set, the probability that the entity pair to be predicted is linked by an edge of the first preset type; judge whether the probability is greater than the third preset value; and, if so, determine that the entity pair to be predicted is linked by an edge of the preset type. Because the fourth preset model, which serves as the prediction model, is trained from the meta-paths determined by the heterogeneous information network meta-path determining method provided by the embodiment of the present invention, and also takes the weight of each meta-path into account, the prediction results obtained by applying the link prediction method provided by the embodiment of the present invention are more accurate.
Corresponding to the above method embodiment, the embodiment of the present invention further provides a heterogeneous information network meta-path determining device, as shown in Fig. 8. The device includes: a first determining module 101, a second determining module 102, a first triggering module 103, a third determining module 201, a first computing module 202, a second computing module 203, a first inserting module 204, a first selecting module 104, a second triggering module 105, a third triggering module 106, a fourth determining module 301, a third computing module 302, a fourth computing module 303 and a second inserting module 304.
The first determining module 101 is configured to determine, in the heterogeneous information network, a plurality of first entity pairs whose meta-paths are to be determined, where each first entity pair includes a source node and a target node and is linked at least by an edge of the first preset type.
The general process of link prediction is: determine a plurality of training entity pairs linked by a specific edge in the heterogeneous information network and enumerate all meta-paths of these training entity pairs; build a prediction model from the enumerated meta-paths of the training entity pairs; calculate, according to the prediction model, the probability that an entity pair to be predicted is linked by the specific edge; when this probability is greater than a preset value, the entity pair to be predicted is linked by the specific edge.
Therefore, when the meta-paths serve link prediction, the first entity pairs are the training pairs, and the set composed of the first entity pairs may be called the training set; the edge of the first preset type is the edge of the specific type in link prediction.
The number of first entity pairs may be determined according to the actual scale of the heterogeneous information network; it is generally 10 pairs or more, and preferably between 10 and 20 pairs.
The second determining module 102 is configured to determine an initial data structure according to the plurality of first entity pairs; the initial data structure includes entity pairs each composed of the source node of one of the first entity pairs and that source node itself.
After the first entity pairs have been determined, the embodiment of the present invention uses "data structures" to record the process of determining the meta-paths that link the first entity pairs.
An initial data structure is determined first. It includes the entity pairs composed of the source node of each first entity pair and that source node itself.
The first triggering module 103 is configured to generate, according to the edge types in the heterogeneous information network, a plurality of first candidate meta-paths whose hop count is 1, and, after triggering in turn the third determining module 201, the first computing module 202, the second computing module 203 and the first inserting module 204 for each first candidate meta-path, to trigger the first selecting module.
Specifically, as many first candidate meta-paths with a hop count of 1 are generated as there are edge types in the heterogeneous information network.
The third determining module 201 is configured to generate, according to the heterogeneous information network, the initial data structure and the first candidate meta-path, a plurality of second entity pairs linked by the first candidate meta-path; the source node of each second entity pair is the source node of an entity pair in the initial data structure, and the target node of each second entity pair is a node in the heterogeneous information network other than the source nodes of the first entity pairs.
The second entity pairs are entity pairs that actually exist in the heterogeneous information network and are linked by the first candidate meta-path.
The first computing module 202 is configured to calculate, according to a first preset model, the similarity measure of each second entity pair when linked by the first candidate meta-path, and to save the first candidate meta-path, each second entity pair and the corresponding similarity measure to a first data structure;
The first preset model is the similarity measurement algorithm PCRW (Path-Constrained Random Walk) disclosed in the prior art;
The first preset model is specifically:
$$\sigma(s, t_i \mid \Pi_{1\ldots i}) = \sum_{x \in I(V_{i-1})} \sigma(s, x \mid \Pi_{1\ldots i-1}) \cdot \frac{R(x, t_i)}{R(x)}$$
Wherein, $\sigma(s, t_i \mid \Pi_{1\ldots i})$ denotes the similarity measure of source node $s$ and target node $t_i$ on meta-path $\Pi_{1\ldots i}$; $\Pi_{1\ldots i}$ denotes the $(i-1)$-hop meta-path linking source node $s$ and target node $t_i$; $I(V_{i-1})$ denotes the set of target nodes that can be reached by walking from source node $s$ along meta-path $\Pi_{1\ldots i-1}$; $x$ is a node in $I(V_{i-1})$; $R(x, t_i)$ indicates whether target node $t_i$ can be reached from $x$ via edge $R_{i-1}$, and equals 1 if it can and 0 otherwise; $R(x)$ denotes the number of network nodes that node $x$ can reach via edge $R_{i-1}$;
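The PCRW recurrence passes the similarity accumulated at each intermediate node on, in equal shares, to the nodes reachable over the next edge type. A minimal sketch of one such step follows; the function names `build_index` and `pcrw_step` and the toy network are hypothetical.

```python
from collections import defaultdict


def build_index(edges):
    """Map (node, edge type) -> list of neighbours reached over that edge type."""
    index = defaultdict(list)
    for src, etype, dst in edges:
        index[(src, etype)].append(dst)
    return index


def pcrw_step(prev, edge_type, index):
    """Extend similarity measures by one hop.

    prev maps (s, x) -> similarity of s and x on the path so far; the result
    maps (s, t) -> sum over x of prev[(s, x)] * R(x, t) / R(x).
    """
    nxt = defaultdict(float)
    for (s, x), sigma in prev.items():
        neighbours = index.get((x, edge_type), [])
        for t in neighbours:                        # R(x, t) = 1 exactly for these t
            nxt[(s, t)] += sigma / len(neighbours)  # divide by R(x)
    return dict(nxt)


edges = [
    ("alice", "writes", "paper1"),
    ("alice", "writes", "paper2"),
    ("paper1", "published_in", "venueA"),
    ("paper2", "published_in", "venueA"),
    ("paper2", "published_in", "venueB"),
]
index = build_index(edges)
hop1 = pcrw_step({("alice", "alice"): 1.0}, "writes", index)
hop2 = pcrw_step(hop1, "published_in", index)
```

In this toy network the walk from `alice` splits evenly over her two papers; `paper1` forwards its whole share to `venueA`, while `paper2` splits its share between `venueA` and `venueB`.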
The second computing module 203 is configured to calculate the comprehensive similarity score of the first data structure according to a second preset model and to save it to the first data structure;
The second preset model is defined in terms of the following quantities:
$S$ denotes the comprehensive similarity score of a data structure; $s$ is a source node, $t$ is a target node reachable via meta-path $\Pi$, and $\tau$ is the number of reachable target nodes; $\sigma(s, t \mid \Pi)$ is the similarity measure of the entity pair $(s, t)$ on meta-path $\Pi$; $r(s) = 1 - \alpha \cdot N$, where $r(s)$ denotes the contribution ability of source node $s$ to the current data structure and is used to balance the selection of data structures, $\alpha$ is the decay factor of the contribution ability, and $N$ denotes the number of first entity pairs whose source node is $s$ and that are linked by meta-paths already saved to the meta-path set;
After step C has been executed for each first candidate meta-path, each first data structure includes at least: the first candidate meta-path, the second entity pairs linked by the first candidate meta-path, the similarity measure corresponding to each second entity pair, and the comprehensive similarity score $S$ of the first data structure.
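The following sketch shows how such a score could be computed. The combining form used here, the sum of each pair's similarity weighted by r(s) and divided by the number τ of reachable target nodes, is an assumption made for illustration and may differ from the patent's second preset model; only r(s) = 1 - α·N is taken from the text. The function names and the value of `alpha` are likewise hypothetical.

```python
def contribution(source, linked_counts, alpha=0.1):
    """r(s) = 1 - alpha * N, with N the number of first entity pairs of s already linked."""
    return 1.0 - alpha * linked_counts.get(source, 0)


def comprehensive_score(pairs, linked_counts, alpha=0.1):
    """ASSUMED form: S = (1 / tau) * sum of sigma(s, t | path) * r(s).

    pairs maps (s, t) -> similarity measure; tau counts distinct reachable targets.
    """
    tau = len({t for _, t in pairs})
    if tau == 0:
        return 0.0
    total = sum(sigma * contribution(s, linked_counts, alpha)
                for (s, _), sigma in pairs.items())
    return total / tau


pairs = {("alice", "venueA"): 0.75, ("alice", "venueB"): 0.25, ("bob", "venueA"): 1.0}
fresh = comprehensive_score(pairs, linked_counts={})
damped = comprehensive_score(pairs, linked_counts={"alice": 2}, alpha=0.25)
```

Once two of `alice`'s first entity pairs have been linked by saved meta-paths, her contribution drops from 1.0 to 0.5, which lowers the score of data structures that mostly serve her and lets other source nodes catch up.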
The first inserting module 204 is configured to insert the first data structure into a candidate set;
The first selection module 104 is configured to select one data structure from the candidate set according to the magnitude of the comprehensive similarity score, denoted the second data structure, and to check whether the second data structure contains a third entity pair identical to any of the first entity pairs;
The second trigger module 105 is configured to, when the check result obtained by the first selection module is yes, save the meta-path that links the third entity pair in the second data structure, together with the third entity pair, correspondingly to a meta-path set, delete the second data structure from the candidate set, and trigger the first selection module 104;
The third trigger module 106 is configured to, when the check result obtained by the first selection module is no, generate a plurality of third candidate meta-paths according to the second candidate meta-path saved in the second data structure and the edge types in the heterogeneous information network, the hop count of each third candidate meta-path differing from the hop count of the second candidate meta-path by 1; delete the second data structure from the candidate set; and, for each third candidate meta-path, trigger in turn the fourth determining module 301, the third computing module 302, the fourth computing module 303 and the second inserting module 304, and then trigger the first selection module 104;
The fourth determining module 301 is configured to generate, according to the heterogeneous information network, the second data structure and the third candidate meta-path, a plurality of fourth entity pairs linked by the third candidate meta-path; the source node of each fourth entity pair is the source node of an entity pair in the second data structure, and the target node of each fourth entity pair is a node in the heterogeneous information network other than the source nodes of the first entity pairs;
The method of determining the fourth entity pairs is the same as the method by which the third determining module 201 determines the second entity pairs, and is not repeated here.
The third computing module 302 is configured to calculate, according to the first preset model, the similarity measure of each fourth entity pair when linked by the third candidate meta-path, and to save the third candidate meta-path, each fourth entity pair and its corresponding similarity measure to a third data structure;
The first preset model is the first preset model used in the first computing module 202;
The method of determining the third data structure is the same as the method used in the first computing module 202, and is not repeated here.
The fourth computing module 303 is configured to calculate the comprehensive similarity score of the third data structure according to the second preset model and to save it to the third data structure;
The second preset model is the second preset model used in the second computing module 203;
The specific method of calculating and saving the comprehensive similarity score of the third data structure is the same as in the second computing module 203, and is not repeated here.
The second inserting module 304 is configured to insert the third data structure into the candidate set.
After the third data structure corresponding to each third candidate meta-path has been inserted into the candidate set, the first selection module 104 is triggered, and this continues until the candidate set is empty. An empty candidate set indicates that all useful meta-paths corresponding to each first entity pair have been found.
It should be noted that the purpose of calculating the similarity measures of entity pairs in the first computing module 202 and the third computing module 302, and of calculating the comprehensive similarity scores in the second computing module 203 and the fourth computing module 303, is to ensure that the meta-paths determined by the heterogeneous information network meta-path determining method provided by the embodiment of the present invention are meta-paths relevant to the training pairs they link. Such meta-paths not only link more training pairs but also express a closer relationship between the source node and the target node of each training pair, and thus reveal the latent features of the training set.
In addition, each time the first selection module 104 chooses a data structure from the candidate set, it chooses the one whose comprehensive similarity score is the largest among all data structures in the candidate set. The meta-path determined at each step is therefore the most relevant one in the candidate set at that time, which ensures that the meta-paths relevant to the training pairs are found in order of decreasing relevance;
This method of starting from the source nodes of the training pairs and finding useful meta-paths step by step is called a greedy algorithm. At each step, the meta-path that is determined is the most relevant one and reaches the most target nodes; it is then judged whether this meta-path links a training pair. If it does, the meta-path and the training pairs it links are selected and saved to the meta-path set; otherwise the search continues greedily, until the candidate set is empty. Finally, the meta-path set $\gamma$ is generated.
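The greedy loop can be sketched with a priority queue standing in for the candidate set. This is a simplified illustration under stated assumptions, not the patent's implementation: the function name `greedy_meta_paths` is hypothetical, the score is a plain mean similarity per reachable target (it omits the r(s) balancing term), and `max_hops` borrows the optional hop limit so that the loop is guaranteed to terminate on cyclic networks.

```python
import heapq
from collections import defaultdict
from itertools import count


def greedy_meta_paths(edges, training_pairs, max_hops=4):
    """Greedy search sketch: always examine the highest-scoring candidate next."""
    index = defaultdict(list)
    for src, etype, dst in edges:
        index[(src, etype)].append(dst)
    edge_types = sorted({etype for _, etype, _ in edges})
    training = set(training_pairs)
    sources = {s for s, _ in training}

    def extend(prev, etype):
        # One PCRW hop; targets that are training source nodes are skipped.
        nxt = defaultdict(float)
        for (s, x), sigma in prev.items():
            nbrs = index.get((x, etype), [])
            for t in nbrs:
                if t not in sources:
                    nxt[(s, t)] += sigma / len(nbrs)
        return dict(nxt)

    def score(pairs):
        # Simplified stand-in score: mean similarity per reachable target.
        return sum(pairs.values()) / len({t for _, t in pairs})

    tie = count()   # tie-breaker so the heap never has to compare dicts
    heap = []       # the candidate set, ordered by descending score

    def push(path, pairs):
        if pairs:   # candidates that link no entity pair are not inserted
            heapq.heappush(heap, (-score(pairs), next(tie), path, pairs))

    start = {(s, s): 1.0 for s in sources}
    for etype in edge_types:
        push((etype,), extend(start, etype))

    found = {}      # meta-path -> training pairs it links
    while heap:
        _, _, path, pairs = heapq.heappop(heap)   # select and delete in one step
        hits = training & set(pairs)
        if hits:
            found[path] = hits
        elif len(path) < max_hops:
            for etype in edge_types:
                push(path + (etype,), extend(pairs, etype))
    return found


edges = [
    ("alice", "writes", "paper1"),
    ("bob", "writes", "paper2"),
    ("paper1", "published_in", "venueA"),
    ("paper2", "published_in", "venueA"),
    ("paper1", "cites", "paper2"),
]
found = greedy_meta_paths(edges, [("alice", "venueA")])
```

On this toy network the search returns two meta-paths that link the training pair: the direct `writes` then `published_in` path, and the longer path that passes through a citation.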
By applying the heterogeneous information network meta-path determining device provided by the embodiment of the present invention, the meta-paths relevant to the first entity pairs (training pairs) they link can be determined one after another, in order of decreasing relevance, according to the magnitude of the comprehensive similarity scores of the data structures in the candidate set. Not only is the efficiency of determining meta-paths higher, but the determined meta-paths are also more useful.
Preferably, on the basis of the embodiment shown in Fig. 8, in order to make the determined meta-paths even more relevant, the device further includes: a first judging module,
The first judging module is configured to judge, after the second computing module has been triggered and before the first inserting module is triggered, whether the comprehensive similarity score corresponding to the first data structure is not less than a first preset value; if so, the first inserting module is triggered;
Specifically, it is judged whether the comprehensive similarity score $S$ of the first data structure is not less than $l$;
Wherein, $l = \varepsilon \cdot |A|$; $\varepsilon$ is a limiting coefficient determined according to the actual application scenario; $|A|$ is the scale of the first data structure, i.e. the number of entity pairs in the first data structure;
And/or, the device further includes: a second judging module,
The second judging module is configured to judge, after the fourth computing module has been triggered and before the second inserting module is triggered, whether the comprehensive similarity score corresponding to the third data structure is not less than the first preset value; if so, the second inserting module is triggered;
The specific judging method is the same as the method used in the first judging module, and is not repeated here.
After processing by the first judging module and/or the second judging module, the first data structures and/or third data structures inserted into the candidate set are the preferred data structures. This further ensures that the generated meta-paths are more relevant and better describe the relationship within each training pair, and that the meta-paths determined to link training pairs do not introduce noise by being too numerous.
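The threshold test applied before insertion can be sketched in a few lines. The function name and the value chosen for the limiting coefficient are hypothetical; the text leaves that coefficient to the application scenario.

```python
def passes_score_threshold(score, num_pairs, epsilon=0.05):
    """Keep a data structure only if S >= l, where l = epsilon * |A|."""
    return score >= epsilon * num_pairs


# With 10 entity pairs and epsilon = 0.05 the threshold l is 0.5.
keep = passes_score_threshold(score=0.6, num_pairs=10, epsilon=0.05)
drop = passes_score_threshold(score=0.4, num_pairs=10, epsilon=0.05)
```

Because the threshold grows with the number of entity pairs, a data structure that reaches many targets must also carry proportionally more similarity to stay in the candidate set.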
Preferably, on the basis of the embodiment shown in Fig. 8 of the present invention, in order to make the determined meta-paths even more relevant, the device further includes: a third judging module,
The third judging module is configured to judge, before the fourth determining module is triggered, whether the hop count of the third candidate meta-path is not greater than a second preset value; if so, the fourth determining module is triggered.
For example, the second preset value can be set to 4; that is, when the hop count of a third candidate meta-path is greater than 4, the meta-path has almost no actual semantic meaning.
Therefore, limiting the hop count of the third candidate meta-path corresponding to each third data structure inserted into the candidate set makes the inserted third data structures the preferred data structures. This further ensures that the determined meta-paths are more relevant and better describe the relationship within each training pair, and that the meta-paths determined to link training pairs do not introduce noise by being too numerous.
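A short sketch of how the hop limit could gate the generation of longer candidates follows. The function name `extend_candidates` and the tuple-of-edge-types representation are assumptions for illustration; the default limit of 4 follows the example value given in the text.

```python
def extend_candidates(path, edge_types, max_hops=4):
    """Append each edge type to `path`, unless the result would exceed max_hops."""
    if len(path) + 1 > max_hops:
        return []
    return [path + (etype,) for etype in edge_types]


edge_types = ["cites", "writes"]
grown = extend_candidates(("writes",), edge_types)
blocked = extend_candidates(("writes", "cites", "cites", "cites"), edge_types)
```

A four-hop path is not extended any further, so no similarity computation is spent on candidates that would be discarded anyway.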
Although every meta-path linking training pairs that is determined by the heterogeneous information network meta-path determining device provided by the embodiment shown in Fig. 8 of the present invention is useful and relevant, the degree of influence of each meta-path differs when these meta-paths are used to build a link prediction model and link prediction is carried out according to that model. It is therefore very important to find a method of measuring the degree of relevance of each meta-path and to integrate the meta-paths effectively into the prediction model.
Therefore, preferably, the device shown in Fig. 8 may further include: a fifth computing module,
The fifth computing module is configured to determine, according to a third preset model, the weight corresponding to each meta-path in the meta-path set, and to save each weight correspondingly to the meta-path set;
The third preset model is defined in terms of the following quantities:
$h$ denotes the output value of the third preset model; $\omega$ is the vector formed by the weights corresponding to the meta-paths in the meta-path set, $\omega = [\omega_1, \omega_2, \ldots, \omega_i, \ldots]$, where $\omega_i$ is the weight corresponding to the meta-path with sequence number $i$ in the meta-path set; assuming that the meta-path set holds $M$ meta-paths in total, $i = 1, \ldots, M$ and $\omega_i \ge 0$;
When the output value $h$ of the third preset model is at its maximum, the vector $\omega$ formed by the weights of the meta-paths in the meta-path set is optimal, and each $\omega_i$ in $\omega$ is also optimal;
$x^{+}$ is the vector formed by the similarity measures of a positive example sample on all meta-paths, and is referred to as a positive example value; $x^{-}$ is the vector formed by the similarity measures of a negative example sample on all meta-paths, and is referred to as a negative example value; the positive example samples are the first entity pairs; a negative example sample is a sample with no link, formed by replacing the target node of a positive example sample with a node of the same type as that target node; $q^{+}$ is the similarity matrix formed by all positive example values $x^{+}$; $q^{-}$ is the similarity matrix formed by all negative example values $x^{-}$; the remaining term of the model is a correction term.
It should be noted that link prediction can be regarded as a special classification problem. The present invention therefore uses positive example samples and negative example samples to set, in a supervised manner, a weight for each determined meta-path, so that the prediction model trained with the weighted meta-paths is more effective.
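The construction of negative example samples can be sketched as follows. The function name, the `node_types` mapping, and the `linked` set of pairs known to be joined by an edge of the first preset type are hypothetical inputs introduced for the example; the random choice among eligible replacement nodes is also an assumption, since the text does not say how the replacement node is picked.

```python
import random


def negative_samples(positive_pairs, node_types, linked, seed=0):
    """For each positive (s, t), swap t for a same-type node that s is not linked to."""
    rng = random.Random(seed)
    negatives = []
    for s, t in positive_pairs:
        options = sorted(n for n, typ in node_types.items()
                         if typ == node_types[t] and n != t and (s, n) not in linked)
        if options:   # skip a positive pair if no unlinked same-type node exists
            negatives.append((s, rng.choice(options)))
    return negatives


node_types = {"alice": "author", "bob": "author",
              "venueA": "venue", "venueB": "venue", "venueC": "venue"}
linked = {("alice", "venueA"), ("bob", "venueB"), ("alice", "venueC")}
positives = [("alice", "venueA"), ("bob", "venueB")]
negatives = negative_samples(positives, node_types, linked)
```

Each negative pair keeps the source node of its positive counterpart, so the positive and negative similarity vectors can be compared on the same meta-paths.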
A heterogeneous information network meta-path determining device provided by the embodiment of the present invention can create first data structures and insert them into a candidate set; select from the candidate set, according to the magnitude of the comprehensive similarity scores of the data structures saved in it, the data structure with the largest comprehensive similarity score, denoted the second data structure, and check whether that data structure contains a third entity pair identical to any of the first entity pairs; if it does, save the meta-path that links the third entity pair in the second data structure, together with the third entity pair, correspondingly to the meta-path set, delete the second data structure from the candidate set, and continue to select the next data structure from the candidate set according to the magnitude of the comprehensive similarity score; if it does not, create third data structures and insert them into the candidate set, and then continue to select the next data structure from the candidate set according to the magnitude of the comprehensive similarity score; until the candidate set is empty. Because the device determines the meta-paths relevant to the first entity pairs (training pairs) they link one after another, in order of decreasing relevance, according to the magnitude of the comprehensive similarity scores of the data structures in the candidate set, the meta-path determining device provided by the present invention not only determines meta-paths efficiently but also determines meta-paths that are more useful.
As shown in Fig. 9, the embodiment of the present invention further provides a link prediction device, which includes: a to-be-predicted entity pair determining module 401, a probability determining module 402 and a fourth judging module 403,
The to-be-predicted entity pair determining module 401 is configured to determine an entity pair to be predicted;
In the heterogeneous information network, the entity pair to be predicted may be any entity pair other than the training pairs.
The probability determining module 402 is configured to determine, according to a fourth preset model and the meta-path set, the probability that the entity pair to be predicted is linked by an edge of the first preset type;
The fourth preset model is defined in terms of the following quantities:
$\eta(s, t \mid \gamma)$ is the probability that the entity pair to be predicted is linked by an edge of the first preset type; $(s, t)$ is the entity pair to be predicted, where $s$ is the source node and $t$ is the target node; $\gamma$ is the meta-path set; $i$ is the sequence number of a meta-path in $\gamma$; $\sigma(s, t \mid \Pi_i)$ is the similarity measure of the entity pair to be predicted $(s, t)$ on the $i$-th meta-path $\Pi_i$; $\omega_i$ is the weight of meta-path $\Pi_i$; $\omega_0$ is a correction coefficient; wherein $\gamma$ is obtained using the method embodiment shown in Fig. 1, and $\omega_i$ is determined according to the third preset model.
The fourth judging module 403 is configured to judge whether the probability is greater than a third preset value and, if so, to determine that the third entity pair is linked by an edge of the preset type;
The third preset value can be set according to the actual application scenario and is generally 0.7;
For example, when the probability, calculated according to the fourth preset model, that the entity pair to be predicted is linked by an edge of the first preset type is 0.8, this indicates that the link "edge of the first preset type" does exist between the entities of the pair to be predicted.
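The thresholding step can be sketched as below. The logistic combination of the weighted similarity measures is an assumption made purely for illustration and may not match the patent's fourth preset model; the function names, weights and correction coefficient are likewise hypothetical. Only the threshold of 0.7 comes from the text.

```python
import math


def link_probability(similarities, weights, bias):
    """ASSUMED logistic form: 1 / (1 + exp(-(bias + sum_i w_i * sigma_i)))."""
    z = bias + sum(w * s for w, s in zip(weights, similarities))
    return 1.0 / (1.0 + math.exp(-z))


def predict_link(similarities, weights, bias, threshold=0.7):
    """Report a link when the probability exceeds the third preset value."""
    return link_probability(similarities, weights, bias) > threshold


sims = [0.75, 0.25]    # similarity of the pair to be predicted on two meta-paths
weights = [4.0, 1.0]   # hypothetical weights of those meta-paths
p = link_probability(sims, weights, bias=-1.0)
linked = predict_link(sims, weights, bias=-1.0)
unlinked = predict_link([0.0, 0.0], weights, bias=-1.0)
```

A pair with no similarity on any meta-path falls back to the correction coefficient alone, which in this sketch gives a probability well below the threshold.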
A link prediction device provided by the embodiment of the present invention can determine an entity pair to be predicted; determine, according to the fourth preset model and the meta-path set, the probability that the entity pair to be predicted is linked by an edge of the first preset type; and judge whether the probability is greater than the third preset value and, if so, determine that the third entity pair is linked by an edge of the preset type. Because the fourth preset model, which serves as the prediction model, is trained on the meta-paths determined by the heterogeneous information network meta-path determining method provided by the embodiment of the present invention, and also takes the weight of each meta-path into account, the prediction results obtained by applying the link prediction device provided by the embodiment of the present invention are more accurate.
It should be noted that the heterogeneous information network meta-path determining method and the link prediction method of the embodiments of the present invention can be implemented by a software program.
As for the device embodiments, since they are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, see the description of the method embodiments.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements that are not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the statement "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, see the description of the method embodiments.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the protection scope of the present invention. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention falls within the protection scope of the present invention.
Claims (10)
1. A heterogeneous information network meta-path determining method, characterized in that the method comprises:
S101, determining a plurality of first entity pairs for which meta-paths are to be determined in a heterogeneous information network, wherein each first entity pair includes a source node and a target node, and each first entity pair is linked by at least an edge of a first preset type;
S102, determining an initial data structure according to the plurality of first entity pairs; the initial data structure includes: for each first entity pair, an entity pair consisting of that pair's source node and the source node itself;
S103, generating, according to the edge types in the heterogeneous information network, a plurality of first candidate meta-paths with a hop count of 1, and performing step S104 after steps A to D have been executed for each first candidate meta-path:
A. generating, according to the heterogeneous information network, the initial data structure and the first candidate meta-path, a plurality of second entity pairs linked by the first candidate meta-path; wherein the source node of each second entity pair is the source node of an entity pair in the initial data structure, and the target node of each second entity pair is a node in the heterogeneous information network other than the source nodes of the first entity pairs;
B. calculating, according to a first preset model, the similarity measure of each second entity pair when linked by the first candidate meta-path; saving the first candidate meta-path, each second entity pair and the corresponding similarity measure to a first data structure;
C. calculating the comprehensive similarity score of the first data structure according to a second preset model and saving it to the first data structure;
D. inserting the first data structure into a candidate set;
S104, selecting one data structure from the candidate set according to the magnitude of the comprehensive similarity score, denoted the second data structure; checking whether the second data structure contains a third entity pair identical to any of the first entity pairs;
S105, if it does, saving the meta-path that links the third entity pair in the second data structure, together with the third entity pair, correspondingly to a meta-path set, deleting the second data structure from the candidate set, and performing step S104;
S106, if it does not, generating a plurality of third candidate meta-paths according to the second candidate meta-path saved in the second data structure and the edge types in the heterogeneous information network, the hop count of each third candidate meta-path differing from the hop count of the second candidate meta-path by 1; deleting the second data structure from the candidate set; and performing step S104 after steps E to H have been executed for each third candidate meta-path;
E. generating, according to the heterogeneous information network, the second data structure and the third candidate meta-path, a plurality of fourth entity pairs linked by the third candidate meta-path, wherein the source node of each fourth entity pair is the source node of an entity pair in the second data structure, and the target node of each fourth entity pair is a node in the heterogeneous information network other than the source nodes of the first entity pairs;
F. calculating, according to the first preset model, the similarity measure of each fourth entity pair when linked by the third candidate meta-path, and saving the third candidate meta-path, each fourth entity pair and the corresponding similarity measure to a third data structure;
G. calculating the comprehensive similarity score of the third data structure according to the second preset model and saving it to the third data structure;
H. inserting the third data structure into the candidate set;
Wherein, the first preset model is:
$$\sigma(s, t_i \mid \Pi_{1\ldots i}) = \sum_{x \in I(V_{i-1})} \sigma(s, x \mid \Pi_{1\ldots i-1}) \cdot \frac{R(x, t_i)}{R(x)}$$
Wherein, $\sigma(s, t_i \mid \Pi_{1\ldots i})$ denotes the similarity measure of source node $s$ and target node $t_i$ on meta-path $\Pi_{1\ldots i}$; $\Pi_{1\ldots i}$ denotes the $(i-1)$-hop meta-path linking source node $s$ and target node $t_i$; $I(V_{i-1})$ denotes the set of target nodes that can be reached by walking from source node $s$ along meta-path $\Pi_{1\ldots i-1}$; $x$ is a node in $I(V_{i-1})$; $R(x, t_i)$ indicates whether target node $t_i$ can be reached from $x$ via edge $R_{i-1}$, and is 1 if it can and 0 otherwise; $R(x)$ denotes the number of network nodes that node $x$ can reach via edge $R_{i-1}$;
The second preset model is defined in terms of the following quantities: $S$ denotes the comprehensive similarity score of a data structure; $s$ is a source node, $t$ is a target node reachable via meta-path $\Pi$, and $\tau$ is the number of reachable target nodes; $\sigma(s, t \mid \Pi)$ is the similarity measure of the entity pair $(s, t)$ on meta-path $\Pi$; $r(s) = 1 - \alpha \cdot N$, where $r(s)$ denotes the contribution ability of source node $s$ to the current data structure and is used to balance the selection of data structures, $\alpha$ is the decay factor of the contribution ability, and $N$ denotes the number of first entity pairs whose source node is $s$ and that are linked by meta-paths already saved to the meta-path set.
2. The method according to claim 1, characterized in that, after step C has been executed and before step D is executed, the method further comprises:
I. judging whether the comprehensive similarity score corresponding to the first data structure is not less than a first preset value; if so, executing step D;
And/or, after step G has been executed and before step H is executed, the method further comprises:
J. judging whether the comprehensive similarity score corresponding to the third data structure is not less than the first preset value; if so, executing step H.
3. The method according to claim 1, characterized in that, before step E is executed, the method further comprises:
judging whether the hop count of the third candidate meta-path is not greater than a second preset value; if so, executing step E.
4. The method according to claim 1, characterized in that the method further comprises:
determining, according to a third preset model, the weight corresponding to each meta-path in the meta-path set, and saving each weight correspondingly to the meta-path set; the third preset model is defined in terms of the following quantities:
$h$ denotes the output value of the third preset model; $x^{+}$ is the vector formed by the similarity measures of a positive example sample on all meta-paths, and is referred to as a positive example value; $x^{-}$ is the vector formed by the similarity measures of a negative example sample on all meta-paths, and is referred to as a negative example value; the positive example samples are the first entity pairs; a negative example sample is a sample with no link, formed by replacing the target node of a positive example sample with a node of the same type as that target node; $\omega$ is the vector formed by the weights corresponding to the meta-paths in the meta-path set; $q^{+}$ is the similarity matrix formed by all positive example values $x^{+}$; $q^{-}$ is the similarity matrix formed by all negative example values $x^{-}$; the remaining term of the model is a correction term.
5. A method for performing link prediction by applying the method according to claim 4, characterized in that the link prediction method comprises:
determining an entity pair to be predicted;
determining, according to a fourth preset model and the meta-path set, the probability that the entity pair to be predicted is linked by an edge of the first preset type; the fourth preset model is defined in terms of the following quantities: $\eta(s, t \mid \gamma)$ is the probability that the entity pair to be predicted is linked by an edge of the first preset type; $(s, t)$ is the entity pair to be predicted, where $s$ is the source node and $t$ is the target node; $\gamma$ is the meta-path set; $i$ is the sequence number of a meta-path in $\gamma$; $\sigma(s, t \mid \Pi_i)$ is the similarity measure of the entity pair to be predicted $(s, t)$ on the $i$-th meta-path $\Pi_i$; $\omega_i$ is the weight of meta-path $\Pi_i$; $\omega_0$ is a correction coefficient;
judging whether the probability is greater than a third preset value and, if so, determining that the third entity pair is linked by an edge of the preset type.
6. a heterogeneous information network element path determines device, it is characterised in that described device includes: first determine module,
Two determine module, the first trigger module, the 3rd determine module, the first computing module, the second computing module, first insert module,
First select module, the second trigger module, the 3rd trigger module, the 4th determine module, the 3rd computing module, the 4th computing module
Module is inserted with second,
Described first determines module, for determining multiple first instances pair in unit to be determined path in heterogeneous information network, wherein,
Each described first instance is to including source node and destination node, and each described first instance is at least by the first preset kind
Side chain connects;
Described second determines module, is used for according to the plurality of first instance determining primary data structure body;Described initial number
Include according to structure: the entity pair self being made up of with this source node the source node of each described first instance centering;
the first trigger module is configured to generate, according to the edge types in the heterogeneous information network, multiple first candidate meta-paths with a hop count of 1, and, after triggering the third determining module, the first calculation module, the second calculation module and the first insertion module in turn for each first candidate meta-path, to trigger the first selection module;
the third determining module is configured to generate, according to the heterogeneous information network, the initial data structure and the first candidate meta-path, multiple second entity pairs linked by the first candidate meta-path; wherein the source node of each second entity pair is the source node of an entity pair in the initial data structure, and the target node of each second entity pair is a node of the heterogeneous information network other than the source nodes of the first entity pairs;
the first calculation module is configured to calculate, according to a first preset model, the similarity measure value of each second entity pair when linked by the first candidate meta-path, and to save the first candidate meta-path, each second entity pair and the corresponding similarity measure value to a first data structure; wherein the first preset model is:
wherein σ(s, t_i | ∏_{1…i}) denotes the similarity measure value of source node s and target node t_i on meta-path ∏_{1…i}; ∏_{1…i} denotes the meta-path of i-1 hops linking source node s and target node t_i; I(V_{i-1}) denotes the set of target nodes reachable by walking from source node s along meta-path ∏_{1…i-1}, and x is a node in I(V_{i-1}); R(x, t_i) indicates whether target node t_i can be reached via edge R_{i-1}, taking the value 1 if so and 0 otherwise; R(x) denotes the number of network nodes that node x can reach via edge R_{i-1};
the second calculation module is configured to calculate, according to a second preset model, the comprehensive similarity score of the first data structure and to save it to the first data structure; wherein the second preset model is:
wherein S denotes the comprehensive similarity score of a data structure; s is a source node, t is a target node reachable via meta-path ∏, and τ is the number of reachable target nodes; σ(s, t | ∏) is the similarity measure value of entity pair (s, t) on meta-path ∏; r(s) = 1 - α * N, where r(s) denotes the contribution ability of source node s to the current data structure and is used to balance the selection of data structures, α is the decay factor of the contribution ability, and N denotes the number of first entity pairs whose source node is s and that are linked by meta-paths already saved to the meta-path set;
the first insertion module is configured to insert the first data structure into a candidate set;
the first selection module is configured to select one data structure from the candidate set according to the magnitude of the comprehensive similarity score, denoted the second data structure, and to check whether the second data structure contains a third entity pair identical to any of the first entity pairs;
the second trigger module is configured, in the case where the check result obtained by the first selection module is yes, to save the meta-path that links the third entity pair in the second data structure, together with the third entity pair, correspondingly to a meta-path set, to delete the second data structure from the candidate set, and to trigger the first selection module;
the third trigger module is configured, in the case where the check result obtained by the first selection module is no, to generate multiple third candidate meta-paths according to the second candidate meta-path saved in the second data structure and the edge types in the heterogeneous information network, the hop count of each third candidate meta-path differing from the hop count of the second candidate meta-path by 1; to delete the second data structure from the candidate set; and, after triggering the fourth determining module, the third calculation module, the fourth calculation module and the second insertion module in turn for each third candidate meta-path, to trigger the first selection module;
the fourth determining module is configured to generate, according to the heterogeneous information network, the second data structure and the third candidate meta-path, multiple fourth entity pairs linked by the third candidate meta-path, wherein the source node of each fourth entity pair is the source node of an entity pair in the second data structure, and the target node of each fourth entity pair is a node of the heterogeneous information network other than the source nodes of the first entity pairs;
the third calculation module is configured to calculate, according to the first preset model, the similarity measure value of each fourth entity pair when linked by the third candidate meta-path, and to save the third candidate meta-path, each fourth entity pair and the corresponding similarity measure value to a third data structure;
the fourth calculation module is configured to calculate, according to the second preset model, the comprehensive similarity score of the third data structure and to save it to the third data structure;
the second insertion module is configured to insert the third data structure into the candidate set.
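The interplay of the modules in this claim can be read as a best-first search over candidate meta-paths. The Python sketch below is only an illustration under stated assumptions, not the patented implementation: the formulas of the first and second preset models are not reproduced in the text, so `extend` assumes a random-walk update in which each σ(s, x) is split evenly over the nodes reachable from x, and `score` assumes the mean of σ(s, t) * r(s) with r(s) = 1 - α * N. The names `extend`, `score`, `discover`, `edges` and `max_hops` are hypothetical, and the hop limit is added here only so that the sketch terminates on cyclic graphs.

```python
import heapq
from collections import defaultdict

def extend(sims, adj, excluded):
    """One hop of the assumed random-walk model: each sigma(s, x) is split
    evenly over the nodes reachable from x; excluded targets are dropped."""
    out = defaultdict(float)
    for (s, x), sigma in sims.items():
        neighbours = adj.get(x, [])
        for t in neighbours:
            if t not in excluded:
                out[(s, t)] += sigma / len(neighbours)
    return dict(out)

def score(sims, covered, alpha):
    """Assumed comprehensive score: mean of sigma(s, t) * r(s), r(s) = 1 - alpha * N."""
    n = defaultdict(int)  # N: already-covered first entity pairs per source node
    for s, _t in covered:
        n[s] += 1
    return sum(sig * (1 - alpha * n[s]) for (s, _t), sig in sims.items()) / len(sims)

def discover(edges, pairs, alpha=0.1, max_hops=3):
    """edges: {edge_type: {node: [neighbours]}}; pairs: first entity pairs (s, t)."""
    wanted = set(pairs)
    sources = {s for s, _t in pairs}
    covered = set()      # first entity pairs linked by a saved meta-path
    meta_paths = {}      # meta-path set: path -> {entity pair: similarity}
    heap, tick = [], 0   # candidate set, ordered by descending score

    def expand(path, sims):
        nonlocal tick
        for etype, adj in edges.items():
            new_sims = extend(sims, adj, sources)
            if new_sims:  # skip candidate paths that reach no node
                entry = (-score(new_sims, covered, alpha), tick, path + (etype,), new_sims)
                heapq.heappush(heap, entry)
                tick += 1  # tie-breaker so dicts are never compared

    # Initial data structure: every source node paired with itself.
    expand((), {(s, s): 1.0 for s in sources})
    while heap:
        _neg, _tick, path, sims = heapq.heappop(heap)  # select and delete
        hits = {p: v for p, v in sims.items() if p in wanted}
        if hits:                      # a first entity pair is linked: save the path
            meta_paths[path] = hits
            covered.update(hits)
        elif len(path) < max_hops:    # otherwise grow the path by one hop
            expand(path, sims)
    return meta_paths
```

With a toy network in which `a1` writes `p1` and `p2`, published in `v1` and `v2` respectively, `discover` returns the two-hop path `("writes", "published_in")` for the pair `("a1", "v1")` with similarity 0.5 under the assumed model.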
7. The device according to claim 6, characterized in that the device further comprises: a first judging module,
the first judging module is configured to judge, after the second calculation module is triggered and before the first insertion module is triggered, whether the comprehensive similarity score corresponding to the first data structure is not less than a first preset value, and if so, to trigger the first insertion module;
and/or, the device further comprises: a second judging module,
the second judging module is configured to judge, after the fourth calculation module is triggered and before the second insertion module is triggered, whether the comprehensive similarity score corresponding to the third data structure is not less than the first preset value, and if so, to trigger the second insertion module.
8. The device according to claim 6, characterized in that the device further comprises: a third judging module,
the third judging module is configured to judge, before the fourth determining module is triggered, whether the hop count of the third candidate meta-path is not greater than a second preset value, and if so, to trigger the fourth determining module.
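The score threshold and the hop limit described above act as two pruning gates on the search. A minimal sketch of the two boundary conditions follows; the function names `may_insert` and `may_expand` are hypothetical, and both comparisons are inclusive because the text says "not less than" and "not greater than".

```python
def may_insert(score, first_preset_value):
    """Gate before insertion: keep a data structure only if its
    comprehensive similarity score is not less than the threshold."""
    return score >= first_preset_value

def may_expand(hop_count, second_preset_value):
    """Gate before generating entity pairs: follow a candidate
    meta-path only if its hop count is not greater than the limit."""
    return hop_count <= second_preset_value
```

A candidate whose score equals the first preset value, or whose hop count equals the second preset value, therefore still passes.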
9. The device according to claim 6, characterized in that the device further comprises: a fifth calculation module,
the fifth calculation module is configured to determine, according to a third preset model, the weight corresponding to each meta-path in the meta-path set and to save it correspondingly to the meta-path set; the third preset model is:
wherein h denotes the output value of the third preset model; x⁺ is the vector composed of the similarity measure values of a positive sample x⁺ on all meta-paths, and is called the positive example value; x⁻ is the vector composed of the similarity measure values of a negative sample x⁻ on all meta-paths, and is called the negative example value; a positive sample x⁺ is one of the first entity pairs; a negative sample x⁻ is a sample with no existing link, formed by replacing the target node of a positive sample with a node of the same type as that target node; ω is the vector composed of the weights corresponding to the meta-paths in the meta-path set; q⁺ is the similarity matrix composed of all positive example values x⁺; q⁻ is the similarity matrix composed of all negative example values x⁻; and […] is the correction term.
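The construction of negative samples described above (replace the target node of a positive sample with a same-type node so that no link exists) can be sketched as follows. This covers only the sampling step, not the third preset model itself, whose formula is not reproduced in the text. The names `negative_samples`, `node_type` and `linked` are hypothetical, and choosing one replacement at random per positive sample is an assumption.

```python
import random

def negative_samples(positives, node_type, linked, rng):
    """For each positive pair (s, t), pick a node of the same type as t
    that is not t and is not linked to s, and return the pair (s, node)."""
    by_type = {}
    for node, kind in node_type.items():
        by_type.setdefault(kind, []).append(node)
    negatives = []
    for s, t in positives:
        candidates = [c for c in sorted(by_type[node_type[t]])
                      if c != t and (s, c) not in linked]
        if candidates:  # skip positives with no valid replacement
            negatives.append((s, rng.choice(candidates)))
    return negatives
```

Passing a seeded `random.Random` instance as `rng` keeps the sampling reproducible.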
10. A device for performing link prediction using the device according to claim 9, characterized in that the link prediction device comprises: a to-be-predicted entity pair determining module, a probability determining module and a fourth judging module,
the to-be-predicted entity pair determining module is configured to determine an entity pair to be predicted;
the probability determining module is configured to determine, according to a fourth preset model and the meta-path set, the probability that the entity pair to be predicted is linked by an edge of the first preset type; the fourth preset model is:
wherein η(s, t | γ) is the probability that the entity pair to be predicted is linked by an edge of the first preset type; (s, t) is the entity pair to be predicted, where s is the source node and t is the target node; γ is the meta-path set; i is the index of a meta-path in γ; σ(s, t | ∏_i) is the similarity measure value of the entity pair to be predicted (s, t) on the i-th meta-path ∏_i; ω_i is the weight of meta-path ∏_i; ω_0 is the correction coefficient;
the fourth judging module is configured to judge whether the probability is greater than a third preset value, and if so, to determine that the third entity pair is linked by an edge of the preset type.
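The formula of the fourth preset model is not reproduced in the text, so the sketch below assumes a logistic form over the weighted meta-path similarities, η = 1 / (1 + exp(-(ω_0 + Σ ω_i σ_i))). That form is an assumption chosen because it maps the listed symbols to a probability; the function names `link_probability` and `predict_link` are hypothetical.

```python
import math

def link_probability(sims, weights, omega0):
    """Assumed logistic model: sims[i] is sigma(s, t | path_i),
    weights[i] is omega_i, omega0 is the correction coefficient."""
    z = omega0 + sum(w * s for w, s in zip(weights, sims))
    return 1.0 / (1.0 + math.exp(-z))

def predict_link(sims, weights, omega0, third_preset_value):
    """Predict an edge only when the probability is strictly greater
    than the threshold, matching 'greater than' in the claim."""
    return link_probability(sims, weights, omega0) > third_preset_value
```

Under this assumed form, an entity pair with zero similarity on every meta-path and ω_0 = 0 gets probability 0.5, which does not exceed a threshold of 0.5.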
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610225725.XA CN105913125B (en) | 2016-04-12 | 2016-04-12 | Method and device for heterogeneous information network meta-path determination and link prediction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105913125A (en) | 2016-08-31 |
CN105913125B (en) | 2018-05-25 |
Family
ID=56746047
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610225725.XA Active CN105913125B (en) | Method and device for heterogeneous information network meta-path determination and link prediction | 2016-04-12 | 2016-04-12 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105913125B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050083848A1 (en) * | 2003-10-20 | 2005-04-21 | Huai-Rong Shao | Selecting multiple paths in overlay networks for streaming data |
CN103559320A (en) * | 2013-11-21 | 2014-02-05 | 北京邮电大学 | Method for sequencing objects in heterogeneous network |
CN103955535A (en) * | 2014-05-14 | 2014-07-30 | 南京大学镇江高新技术研究院 | Personalized recommendation method and system based on meta-path |
Non-Patent Citations (3)
Title |
---|
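Yizhou Sun, Jiawei Han: "Meta-Path-Based Search and Mining in Heterogeneous Information Networks", Tsinghua Science and Technology * |
孟晓峰: "Research on Similarity Measures Based on Heterogeneous Information Networks", China Master's Theses Full-text Database, Information Science and Technology * |
黄立威 et al.: "A Meta-Path-Based Link Prediction Model for Heterogeneous Information Networks", Chinese Journal of Computers * |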
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106951526A (en) * | 2017-03-21 | 2017-07-14 | 北京邮电大学 | Entity set extension method and device |
CN106951526B (en) * | 2017-03-21 | 2020-08-07 | 北京邮电大学 | Entity set extension method and device |
CN107944629A (en) * | 2017-11-30 | 2018-04-20 | 北京邮电大学 | Recommendation method and device based on heterogeneous information network representation |
CN107944629B (en) * | 2017-11-30 | 2020-08-07 | 北京邮电大学 | Recommendation method and device based on heterogeneous information network representation |
CN110555050A (en) * | 2018-03-30 | 2019-12-10 | 华东师范大学 | Heterogeneous network node representation learning method based on meta-path |
CN110555050B (en) * | 2018-03-30 | 2023-03-31 | 华东师范大学 | Heterogeneous network node representation learning method based on meta-path |
CN109299285A (en) * | 2018-09-11 | 2019-02-01 | 中国医学科学院医学信息研究所 | Pharmacogenomics knowledge graph construction method and system |
CN109635201A (en) * | 2018-12-18 | 2019-04-16 | 苏州大学 | Heterogeneous social network cross-platform associated user account mining method |
CN109635201B (en) * | 2018-12-18 | 2020-07-31 | 苏州大学 | Heterogeneous social network cross-platform associated user account mining method |
CN109800504A (en) * | 2019-01-21 | 2019-05-24 | 北京邮电大学 | Embedding method and device for heterogeneous information networks |
CN112380434A (en) * | 2020-11-16 | 2021-02-19 | 吉林大学 | Interpretable recommendation system method fusing heterogeneous information network |
CN112380434B (en) * | 2020-11-16 | 2022-09-16 | 吉林大学 | Interpretable recommendation method fusing heterogeneous information network |
Also Published As
Publication number | Publication date |
---|---|
CN105913125B (en) | 2018-05-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||