CN105913125A - Heterogeneous information network element determining method, link prediction method, heterogeneous information network element determining device and link prediction device - Google Patents

Heterogeneous information network element determining method, link prediction method, heterogeneous information network element determining device and link prediction device


Publication number
CN105913125A
Authority
CN
China
Prior art keywords
path
data structure
structure body
module
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610225725.XA
Other languages
Chinese (zh)
Other versions
CN105913125B (en)
Inventor
石川
曹晓欢
郑玉艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201610225725.XA
Publication of CN105913125A
Application granted
Publication of CN105913125B
Legal status: Active


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 - Computing arrangements based on specific mathematical models

Landscapes

  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Algebra (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides a heterogeneous information network meta-path determining method, a link prediction method, a heterogeneous information network meta-path determining device, and a link prediction device. The meta-path determining method comprises: creating a first data structure and inserting it into a candidate set; selecting one data structure from the candidate set according to the comprehensive similarity scores of the data structures stored in the candidate set, and checking whether the selected data structure contains a third entity pair identical to any first entity pair; if so, storing the meta-path in the data structure that links the third entity pair into a meta-path set, deleting the data structure from the candidate set, and selecting the next data structure from the candidate set; otherwise, creating a third data structure, inserting it into the candidate set, and then selecting the next data structure from the candidate set, until the candidate set is empty. With the meta-path determining method and device, useful meta-paths can be determined quickly and accurately. With the link prediction method and device, more accurate prediction results can be obtained.

Description

Heterogeneous information network meta-path determining method, link prediction method, and devices
Technical field
The present invention relates to the technical field of meta-path determination in heterogeneous information networks, and in particular to a meta-path determining method and device.
Background
In recent years, research on heterogeneous information networks has become increasingly active, and much data mining work is carried out on such networks. A heterogeneous information network (Heterogeneous Information Network) is a network in which the number of entity object types satisfies |A| > 1 or the number of relation types linking different entity objects satisfies |R| > 1. In such a network, a node represents an entity object (an entity for short), and an edge represents the relation between the two entity objects it connects.
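As a minimal sketch of this definition, a heterogeneous information network can be held as typed nodes plus typed directed edges. The class name `HIN`, its methods, and the sample node ids (`p1`, `c1`, `k1`) are illustrative assumptions and do not come from the patent; only the condition |A| > 1 or |R| > 1 is taken from the text.

```python
from collections import defaultdict


class HIN:
    """Toy heterogeneous information network: typed nodes, typed directed edges."""

    def __init__(self):
        self.node_type = {}  # node id -> entity type
        self.out = defaultdict(lambda: defaultdict(set))  # src -> edge type -> {dst}

    def add_node(self, node, ntype):
        self.node_type[node] = ntype

    def add_edge(self, src, etype, dst):
        self.out[src][etype].add(dst)

    def entity_types(self):
        return set(self.node_type.values())

    def edge_types(self):
        return {etype for nbrs in self.out.values() for etype in nbrs}

    def is_heterogeneous(self):
        # The definition above: |A| > 1 or |R| > 1.
        return len(self.entity_types()) > 1 or len(self.edge_types()) > 1


# Hypothetical sample data using node and edge types named in the text.
net = HIN()
net.add_node("p1", "Person")
net.add_node("c1", "City")
net.add_node("k1", "Country")
net.add_edge("p1", "bornIn", "c1")
net.add_edge("c1", "locatedIn", "k1")
print(net.is_heterogeneous())  # True
```

A nested dictionary keyed by edge type keeps "follow one edge type from this node" cheap, which is the operation that meta-path processing repeats most often.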
Link prediction is the basis of data mining tasks on heterogeneous information networks, such as data cleaning and recommendation. The general process of link prediction is: determine multiple training entity pairs in the heterogeneous information network that are linked by a specific edge, and enumerate all meta-paths of these training entity pairs; build a prediction model from the enumerated meta-paths of the training entity pairs; use the prediction model to compute the probability that an entity pair to be predicted is linked by the specific edge; when this probability exceeds a preset value, the entity pair to be predicted is taken to be linked by the specific edge.
A meta-path is a sequence of relations connecting two entities in a heterogeneous information network, and it expresses the semantic relation between the entities. A meta-path Π is defined as $\Pi = R_1 \xrightarrow{L_1} R_2 \xrightarrow{L_2} \cdots \xrightarrow{L_l} R_{l+1}$. It describes a path between nodes $R_1$ and $R_{l+1}$ that passes through a series of nodes $R_1, \ldots, R_{l+1}$ and link edges $L_1, \ldots, L_l$.
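To make the definition concrete, the sketch below represents a meta-path by its sequence of edge types $L_1, \ldots, L_l$ and follows it from a source node. The toy graph, the node ids, and the function name `reachable` are hypothetical; the node types $R_i$ are left implicit here.

```python
# Toy graph: node -> edge type -> set of neighbour nodes (all ids hypothetical).
graph = {
    "p1": {"bornIn": {"c1"}, "diedIn": {"c2"}},
    "c1": {"locatedIn": {"k1"}},
    "c2": {"locatedIn": {"k1"}},
}


def reachable(graph, source, meta_path):
    """Return the nodes reached from `source` by following the edge types in order."""
    frontier = {source}
    for edge_type in meta_path:
        frontier = {
            nxt
            for node in frontier
            for nxt in graph.get(node, {}).get(edge_type, ())
        }
    return frontier


born_in_country = ("bornIn", "locatedIn")  # a 2-hop meta-path
print(reachable(graph, "p1", born_in_country))  # {'k1'}
```

Two different meta-paths can reach the same destination (here `("diedIn", "locatedIn")` also reaches `k1`), which is why the same pair of nodes can carry several distinct semantic relations.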
Take the DBpedia knowledge graph shown in Fig. 1 as an example. It contains many different types of nodes and edges, such as the nodes Person, City, and Country and the edges bornIn, locatedIn, diedIn, and hasCapital^-1. Two nodes can be linked by several meta-paths; for example, two meta-paths link the nodes Person and Country [the two meta-paths are not reproduced in this text]. Meta-paths are widely used in link prediction. Therefore, the first task in link prediction is to determine the meta-paths in the heterogeneous information network.
Most existing link prediction methods target heterogeneous information networks with simple schemas, for example the bipartite heterogeneous information network formed by users and items on an e-commerce website, or the star-schema heterogeneous information network in the bibliographic database DBLP, in which papers, authors, conferences, and keywords are linked together. In these networks, the meta-paths used to build the link prediction model can be enumerated manually. However, many real heterogeneous information networks are far more complex, and their structure cannot be described by a simple network schema, so the meta-paths needed to build a link prediction model cannot be enumerated; even if they could be, many unimportant meta-paths would be produced. For example, DBpedia, a knowledge graph, records more than 38 million objects and 3 billion relations, and enumerating the meta-paths of this network is impossible.
Hence there is an urgent need for a meta-path determining method for heterogeneous information networks with complex schemas, and for using the resulting meta-paths to build a link prediction model that performs link prediction in such networks.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a heterogeneous information network meta-path determining method, a link prediction method, and corresponding devices, so as to determine the meta-paths of a heterogeneous information network with a complex schema quickly and accurately, and to use these meta-paths to build a link prediction model for link prediction in such a network.
To achieve the above purpose, an embodiment of the present invention discloses a heterogeneous information network meta-path determining method, the method comprising:
S101: determine multiple first entity pairs in the heterogeneous information network whose meta-paths are to be determined, where each first entity pair comprises a source node and a destination node and is linked at least by an edge of a first preset type;
S102: determine an initial data structure from the multiple first entity pairs; the initial data structure comprises the entity pairs formed by the source node of each first entity pair and that source node itself;
S103: according to the edge types in the heterogeneous information network, generate multiple first candidate meta-paths with hop count 1; after performing steps A to D on each first candidate meta-path, perform step S104:
A. according to the heterogeneous information network, the initial data structure, and the first candidate meta-path, generate the multiple second entity pairs linked by the first candidate meta-path, where the source node of each second entity pair is the source node of an entity pair in the initial data structure, and the destination node of each second entity pair is a node in the heterogeneous information network other than the source nodes of the first entity pairs;
B. compute, according to a first preset model, the similarity measure value of each second entity pair when linked by the first candidate meta-path; save the first candidate meta-path, each second entity pair, and the corresponding similarity measure values to a first data structure;
C. compute the comprehensive similarity score of the first data structure according to a second preset model and save it to the first data structure;
D. insert the first data structure into a candidate set;
S104: according to the magnitude of the comprehensive similarity scores, select one data structure from the candidate set, denoted the second data structure; check whether the second data structure contains a third entity pair identical to any of the first entity pairs;
S105: if so, save the meta-path in the second data structure that links the third entity pair, together with the third entity pair, to a meta-path set; delete the second data structure from the candidate set; and perform step S104;
S106: if not, generate multiple third candidate meta-paths according to the second candidate meta-path saved in the second data structure and the edge types in the heterogeneous information network, where the hop count of each third candidate meta-path differs from that of the second candidate meta-path by 1; delete the second data structure from the candidate set; after performing steps E to H on each third candidate meta-path, perform step S104;
E. according to the heterogeneous information network, the second data structure, and the third candidate meta-path, generate the multiple fourth entity pairs linked by the third candidate meta-path, where the source node of each fourth entity pair is the source node of an entity pair in the second data structure, and the destination node of each fourth entity pair is a node in the heterogeneous information network other than the source nodes of the first entity pairs;
F. compute, according to the first preset model, the similarity measure value of each fourth entity pair when linked by the third candidate meta-path; save the third candidate meta-path, each fourth entity pair, and the corresponding similarity measure values to a third data structure;
G. compute the comprehensive similarity score of the third data structure according to the second preset model and save it to the third data structure;
H. insert the third data structure into the candidate set;
Here, the first preset model is: [formula not reproduced in this text], where $\sigma(s, t_i \mid \Pi_{1 \ldots i})$ denotes the similarity measure value of source node $s$ and destination node $t_i$ on meta-path $\Pi_{1 \ldots i}$; $\Pi_{1 \ldots i}$ denotes the $(i-1)$-hop meta-path linking source node $s$ and destination node $t_i$; $I(V_{i-1})$ denotes the set of destination nodes that can be reached by walking from source node $s$ along meta-path $\Pi_{1 \ldots i-1}$; $x$ is a node in $I(V_{i-1})$; $R(x, t_i)$ indicates whether destination node $t_i$ can be reached from $x$ through edge $R_{i-1}$, taking the value 1 if so and 0 otherwise; and $R(x)$ denotes the number of nodes that node $x$ can reach through edge $R_{i-1}$;
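The formula of the first preset model is not reproduced above, so the following is only a sketch under an assumption: that the symbols describe a path-constrained random walk of the form $\sigma(s, t_i \mid \Pi_{1 \ldots i}) = \sum_{x \in I(V_{i-1})} \sigma(s, x \mid \Pi_{1 \ldots i-1}) \cdot R(x, t_i) / |R(x)|$, with the source node having similarity 1 to itself before any hop. The function name `similarity` and the toy graph are hypothetical.

```python
from collections import defaultdict


def similarity(graph, source, meta_path):
    """Random-walk similarity of `source` to every node reachable along `meta_path`.

    Assumed recursion (not confirmed by the text):
        sigma(s, t_i) = sum over x of sigma(s, x) * R(x, t_i) / |R(x)|
    """
    sigma = {source: 1.0}  # zero hops: the source reaches itself
    for edge_type in meta_path:
        nxt = defaultdict(float)
        for x, weight in sigma.items():
            neighbours = graph.get(x, {}).get(edge_type, set())
            for t in neighbours:  # R(x, t) = 1 exactly for these t
                nxt[t] += weight / len(neighbours)  # divide by |R(x)|
        sigma = dict(nxt)
    return sigma


# Hypothetical toy graph: s -a-> {x1, x2}, x1 -b-> {t1}, x2 -b-> {t1, t2}.
toy = {
    "s": {"a": {"x1", "x2"}},
    "x1": {"b": {"t1"}},
    "x2": {"b": {"t1", "t2"}},
}
sims = similarity(toy, "s", ("a", "b"))
print(sims["t1"], sims["t2"])  # 0.75 0.25
```

Under this assumption each hop spreads a node's similarity evenly over its neighbours of the required edge type, so destinations reached along many short, narrow branches score higher than those reached through high-degree nodes.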
The second preset model is: [formula not reproduced in this text], where $S$ denotes the comprehensive similarity score of a data structure; $s$ is a source node, $t$ is a destination node reachable via meta-path $\Pi$, and $\tau$ is the number of reachable destination nodes; $\sigma(s, t \mid \Pi)$ is the similarity measure value of entity pair $(s, t)$ on meta-path $\Pi$; $r(s) = 1 - \alpha N$, where $r(s)$ denotes the contribution ability of source node $s$ to the current data structure and serves to balance the selection of data structures, $\alpha$ is the decay factor of the contribution ability, and $N$ is the number of first entity pairs with source node $s$ that are linked by meta-paths already saved to the meta-path set.
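Since the formula of the second preset model is also missing, the sketch below assumes the form $S = \frac{1}{\tau} \sum_{s} \sum_{t} \sigma(s, t \mid \Pi) \cdot r(s)$, which is consistent with the symbol definitions but is not confirmed by the text. Only $r(s) = 1 - \alpha N$ is taken directly from the source. The function names, the argument layout, the default `alpha=0.1`, and the choice to count distinct destination nodes for $\tau$ are assumptions.

```python
def contribution(n_covered, alpha):
    """r(s) = 1 - alpha * N, as defined in the text."""
    return 1.0 - alpha * n_covered


def comprehensive_score(pair_sims, covered_count, alpha=0.1):
    """Assumed form: S = (1 / tau) * sum over (s, t) of sigma(s, t | path) * r(s).

    pair_sims:     {(s, t): sigma(s, t | path)} for one data structure
    covered_count: {s: N}, first entity pairs with source s already covered
    """
    targets = {t for _, t in pair_sims}
    tau = len(targets)  # assumed: number of distinct reachable destination nodes
    if tau == 0:
        return 0.0
    total = sum(
        sim * contribution(covered_count.get(s, 0), alpha)
        for (s, _), sim in pair_sims.items()
    )
    return total / tau


# Hypothetical values: source 1 already has one covered training pair.
example = {(1, 8): 1.0, (2, 8): 0.5, (3, 9): 0.5}
print(comprehensive_score(example, {1: 1}, alpha=0.5))  # 0.75
```

The decay term lowers the weight of source nodes whose training pairs are already explained by saved meta-paths, which steers the search toward data structures that cover the remaining pairs.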
An embodiment of the present invention also discloses a link prediction method, the link prediction method comprising:
determining an entity pair to be predicted;
determining, according to a fourth preset model and the meta-path set, the probability that the entity pair to be predicted is linked by an edge of the first preset type; the fourth preset model is: [formula not reproduced in this text], where $\eta(s, t \mid \gamma)$ is the probability that the entity pair to be predicted is linked by an edge of the first preset type; $(s, t)$ is the entity pair to be predicted, with $s$ the source node and $t$ the destination node; $\gamma$ is the meta-path set; $i$ is the index of a meta-path in $\gamma$; $\sigma(s, t \mid \Pi_i)$ is the similarity measure value of the entity pair to be predicted $(s, t)$ on the $i$-th meta-path $\Pi_i$; $\omega_i$ is the weight of meta-path $\Pi_i$; and $\omega_0$ is a correction factor;
judging whether the probability is greater than a third preset value, and if so, determining that the third entity pair is linked by an edge of the preset type.
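The fourth preset model is likewise not reproduced. The symbols (a weight $\omega_i$ per meta-path plus a correction factor $\omega_0$) suggest a logistic regression, so this sketch assumes $\eta(s, t \mid \gamma) = e^{z} / (e^{z} + 1)$ with $z = \omega_0 + \sum_i \omega_i \, \sigma(s, t \mid \Pi_i)$. That form, the function names, and the default threshold of 0.5 are assumptions, not statements from the source.

```python
import math


def link_probability(sims, weights, bias):
    """Assumed logistic model: eta = 1 / (1 + exp(-z)), z = bias + sum(w_i * sigma_i).

    sims:    similarity of the pair on each meta-path in the meta-path set
    weights: one weight per meta-path (omega_i)
    bias:    the correction factor (omega_0)
    """
    z = bias + sum(w * s for w, s in zip(weights, sims))
    return 1.0 / (1.0 + math.exp(-z))


def predict_link(sims, weights, bias, threshold=0.5):
    """Return True when the probability exceeds the preset value (threshold assumed)."""
    return link_probability(sims, weights, bias) > threshold


# Hypothetical weights; in practice they would be fitted on the training pairs.
print(predict_link([1.0, 0.5], [2.0, 1.0], -1.0))  # True
```

In such a model the weights would be learned from the training pairs, so meta-paths that reliably separate linked from unlinked pairs receive larger weights.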
An embodiment of the present invention also discloses a heterogeneous information network meta-path determining device, the device comprising: a first determining module, a second determining module, a first trigger module, a third determining module, a first computing module, a second computing module, a first insertion module, a first selection module, a second trigger module, a third trigger module, a fourth determining module, a third computing module, a fourth computing module, and a second insertion module.
The first determining module is configured to determine multiple first entity pairs in the heterogeneous information network whose meta-paths are to be determined, where each first entity pair comprises a source node and a destination node and is linked at least by an edge of a first preset type;
The second determining module is configured to determine an initial data structure from the multiple first entity pairs; the initial data structure comprises the entity pairs formed by the source node of each first entity pair and that source node itself;
The first trigger module is configured to generate, according to the edge types in the heterogeneous information network, multiple first candidate meta-paths with hop count 1, and, after triggering in turn the third determining module, the first computing module, the second computing module, and the first insertion module for each first candidate meta-path, to trigger the first selection module;
The third determining module is configured to generate, according to the heterogeneous information network, the initial data structure, and the first candidate meta-path, the multiple second entity pairs linked by the first candidate meta-path, where the source node of each second entity pair is the source node of an entity pair in the initial data structure, and the destination node of each second entity pair is a node in the heterogeneous information network other than the source nodes of the first entity pairs;
The first computing module is configured to compute, according to a first preset model, the similarity measure value of each second entity pair when linked by the first candidate meta-path, and to save the first candidate meta-path, each second entity pair, and the corresponding similarity measure values to a first data structure. The first preset model is: [formula not reproduced in this text], where $\sigma(s, t_i \mid \Pi_{1 \ldots i})$ denotes the similarity measure value of source node $s$ and destination node $t_i$ on meta-path $\Pi_{1 \ldots i}$; $\Pi_{1 \ldots i}$ denotes the $(i-1)$-hop meta-path linking source node $s$ and destination node $t_i$; $I(V_{i-1})$ denotes the set of destination nodes that can be reached by walking from source node $s$ along meta-path $\Pi_{1 \ldots i-1}$; $x$ is a node in $I(V_{i-1})$; $R(x, t_i)$ indicates whether destination node $t_i$ can be reached from $x$ through edge $R_{i-1}$, taking the value 1 if so and 0 otherwise; and $R(x)$ denotes the number of nodes that node $x$ can reach through edge $R_{i-1}$;
The second computing module is configured to compute the comprehensive similarity score of the first data structure according to a second preset model and to save it to the first data structure. The second preset model is: [formula not reproduced in this text], where $S$ denotes the comprehensive similarity score of a data structure; $s$ is a source node, $t$ is a destination node reachable via meta-path $\Pi$, and $\tau$ is the number of reachable destination nodes; $\sigma(s, t \mid \Pi)$ is the similarity measure value of entity pair $(s, t)$ on meta-path $\Pi$; $r(s) = 1 - \alpha N$, where $r(s)$ denotes the contribution ability of source node $s$ to the current data structure and serves to balance the selection of data structures, $\alpha$ is the decay factor of the contribution ability, and $N$ is the number of first entity pairs with source node $s$ that are linked by meta-paths already saved to the meta-path set;
The first insertion module is configured to insert the first data structure into a candidate set;
The first selection module is configured to select, according to the magnitude of the comprehensive similarity scores, one data structure from the candidate set, denoted the second data structure, and to check whether the second data structure contains a third entity pair identical to any of the first entity pairs;
The second trigger module is configured, when the check result obtained by the first selection module is yes, to save the meta-path in the second data structure that links the third entity pair, together with the third entity pair, to a meta-path set, to delete the second data structure from the candidate set, and to trigger the first selection module;
The third trigger module is configured, when the check result obtained by the first selection module is no, to generate multiple third candidate meta-paths according to the second candidate meta-path saved in the second data structure and the edge types in the heterogeneous information network, where the hop count of each third candidate meta-path differs from that of the second candidate meta-path by 1; to delete the second data structure from the candidate set; and, after triggering in turn the fourth determining module, the third computing module, the fourth computing module, and the second insertion module for each third candidate meta-path, to trigger the first selection module;
The fourth determining module is configured to generate, according to the heterogeneous information network, the second data structure, and the third candidate meta-path, the multiple fourth entity pairs linked by the third candidate meta-path, where the source node of each fourth entity pair is the source node of an entity pair in the second data structure, and the destination node of each fourth entity pair is a node in the heterogeneous information network other than the source nodes of the first entity pairs;
The third computing module is configured to compute, according to the first preset model, the similarity measure value of each fourth entity pair when linked by the third candidate meta-path, and to save the third candidate meta-path, each fourth entity pair, and the corresponding similarity measure values to a third data structure;
The fourth computing module is configured to compute the comprehensive similarity score of the third data structure according to the second preset model and to save it to the third data structure;
The second insertion module is configured to insert the third data structure into the candidate set.
An embodiment of the present invention also discloses a link prediction device, the link prediction device comprising: a to-be-predicted entity pair determining module, a probability determining module, and a fourth judging module.
The to-be-predicted entity pair determining module is configured to determine an entity pair to be predicted;
The probability determining module is configured to determine, according to a fourth preset model and the meta-path set, the probability that the entity pair to be predicted is linked by an edge of the first preset type. The fourth preset model is: [formula not reproduced in this text], where $\eta(s, t \mid \gamma)$ is the probability that the entity pair to be predicted is linked by an edge of the first preset type; $(s, t)$ is the entity pair to be predicted, with $s$ the source node and $t$ the destination node; $\gamma$ is the meta-path set; $i$ is the index of a meta-path in $\gamma$; $\sigma(s, t \mid \Pi_i)$ is the similarity measure value of the entity pair to be predicted $(s, t)$ on the $i$-th meta-path $\Pi_i$; $\omega_i$ is the weight of meta-path $\Pi_i$; and $\omega_0$ is a correction factor;
The fourth judging module is configured to judge whether the probability is greater than a third preset value, and if so, to determine that the third entity pair is linked by an edge of the preset type.
With the heterogeneous information network meta-path determining and link prediction methods and devices provided by the embodiments of the present invention, a first data structure can be created and inserted into the candidate set. According to the comprehensive similarity scores of the data structures saved in the candidate set, the data structure with the largest comprehensive similarity score is selected from the candidate set in turn and denoted the second data structure, and it is checked whether this data structure contains a third entity pair identical to any first entity pair. If so, the meta-path in the second data structure that links the third entity pair, together with the third entity pair, is saved to the meta-path set, the second data structure is deleted from the candidate set, and the next data structure is selected from the candidate set according to the comprehensive similarity scores. If not, a third data structure is created and inserted into the candidate set, and the next data structure is then selected from the candidate set according to the comprehensive similarity scores. This continues until the candidate set is empty. Because the meta-path determining method and device select data structures by their comprehensive similarity scores, they determine the meta-paths linking the first entity pairs (training pairs) in order of relevance, from strongest to weakest. The meta-path determining method and device provided by the present invention therefore not only determine meta-paths efficiently but also yield more useful meta-paths. Because the link prediction method and device build the prediction model from meta-paths determined by the meta-path determining method provided by the embodiments of the present invention, the prediction results obtained with the link prediction method and device are more accurate. Of course, a product or method implementing the present invention need not achieve all of the above advantages at the same time.
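The loop summarised above (steps S102 to S106) can be sketched as a best-first search over a priority queue. This is an illustrative sketch, not the patented implementation: the similarity and scoring formulas are assumed (random-walk similarity and a mean of similarity times $1 - \alpha N$), the `max_hops` cap is added here only to guarantee termination on cyclic graphs, and all names and the toy graph are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import count


def expand(graph, pairs, edge_type, train_sources):
    """Extend every (s, x) pair by one hop of `edge_type` (steps A/B and E/F)."""
    out = defaultdict(float)
    for (s, x), sim in pairs.items():
        nbrs = graph.get(x, {}).get(edge_type, set())
        for t in nbrs:
            if t not in train_sources:  # destinations exclude training sources
                out[(s, t)] += sim / len(nbrs)  # assumed random-walk similarity
    return dict(out)


def score(pairs, covered, alpha):
    """Assumed comprehensive score: sum of sigma * r(s), divided by target count."""
    tau = len({t for _, t in pairs})
    return sum(v * (1 - alpha * covered[s]) for (s, _), v in pairs.items()) / tau


def find_meta_paths(graph, edge_types, train_pairs, alpha=0.1, max_hops=4):
    train = set(train_pairs)
    sources = {s for s, _ in train}
    covered = defaultdict(int)  # N per source node
    found = {}  # meta-path -> training pairs it links
    heap, tie = [], count()  # `tie` keeps heap entries comparable

    def push(path, pairs):
        if pairs:  # skip candidates that link nothing
            entry = (-score(pairs, covered, alpha), next(tie), path, pairs)
            heapq.heappush(heap, entry)

    initial = {(s, s): 1.0 for s in sources}  # S102
    for e in edge_types:  # S103: one-hop candidates
        push((e,), expand(graph, initial, e, sources))

    while heap:  # S104: pop the highest score (this also deletes it)
        _, _, path, pairs = heapq.heappop(heap)
        hits = train & set(pairs)
        if hits:  # S105: the path links training pairs
            found[path] = hits
            for s, _ in hits:
                covered[s] += 1
        elif len(path) < max_hops:  # S106: grow the path by one hop
            for e in edge_types:
                push(path + (e,), expand(graph, pairs, e, sources))
    return found


# Hypothetical toy network; the target edge itself is left out of the graph.
toy = {1: {"wasBornIn": {5}}, 2: {"wasBornIn": {6}},
       5: {"isLocatedIn": {8}}, 6: {"isLocatedIn": {8}}}
result = find_meta_paths(toy, ["wasBornIn", "isLocatedIn"], [(1, 8), (2, 8)])
print(result)
```

Scores already in the queue are not recomputed when `covered` changes; the text does not say whether they should be, so this sketch keeps the simpler behaviour.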
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a heterogeneous information network meta-path determining method provided by an embodiment of the present invention;
Fig. 2 is a flow chart of steps A to D in the embodiment shown in Fig. 1;
Fig. 3 is a flow chart of steps E to H in the embodiment shown in Fig. 1;
Fig. 4 is a heterogeneous information network subgraph from a practical application, provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of a heterogeneous information network meta-path determining method in a practical application, provided by an embodiment of the present invention;
Fig. 6 is a flow chart of a link prediction method provided by an embodiment of the present invention;
Fig. 7 is a comparison chart of prediction results obtained by applying a link prediction method provided by an embodiment of the present invention;
Fig. 8 is a structure diagram of a heterogeneous information network meta-path determining device provided by an embodiment of the present invention;
Fig. 9 is a structure diagram of a link prediction device provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
The embodiments of the present invention provide a heterogeneous information network meta-path determining method and device, applied to a server; the present invention also provides a link prediction method and device, applied to a server. Each is described below.
The heterogeneous information network meta-path determining method is described first.
Referring to Figs. 1 to 3, an embodiment of the present invention provides a heterogeneous information network meta-path determining method, which may include:
S101: determine multiple first entity pairs in the heterogeneous information network whose meta-paths are to be determined, where each first entity pair comprises a source node and a destination node and is linked at least by an edge of a first preset type;
The general process of link prediction is: determine multiple training entity pairs in the heterogeneous information network that are linked by a specific edge, and enumerate all meta-paths of these training entity pairs; build a prediction model from the enumerated meta-paths of the training entity pairs; use the prediction model to compute the probability that an entity pair to be predicted is linked by the specific edge; when this probability exceeds a preset value, the entity pair to be predicted is taken to be linked by the specific edge.
Therefore, when the meta-paths serve link prediction, a first entity pair can be called a training pair, and the set of first entity pairs can be called the training set; the edge of the first preset type is the specific type of edge in link prediction;
The number of first entity pairs can be chosen according to the actual situation. Experiments for the present invention show that when the number of first entity pairs is below 10, link prediction accuracy improves quickly as the number increases; when the number exceeds 10, however, it has essentially no effect on the accuracy of the prediction results. Analysis suggests the reason is that a training set with too few first entity pairs cannot cover all important meta-paths, while too many first entity pairs introduce noise. Preferably, therefore, the number of first entity pairs lies in the range of 10 to 20; in this range all useful meta-paths can be found well and too much noise is avoided, and time and space resources are saved as well;
Specifically, to illustrate the heterogeneous information network meta-path determining method provided by the embodiment more intuitively, Fig. 4 shows a heterogeneous information network subgraph from a practical application. This network contains edges of five types: isCitizenOf, WorkAt, wasBornIn, isLocatedin, and Owns. The edge type isCitizenOf is taken as the first preset type, and the four entity pairs (1,8), (2,8), (3,9), and (4,9) are taken as the first entity pairs (training pairs).
Below in conjunction with Fig. 4 and Fig. 5, step S102 to S106 is specifically described.
S102: determine an initial data structure according to the plurality of first entity pairs; the initial data structure includes, for the source node of each first entity pair, the entity pair formed by that source node and itself;
After the first entity pairs are determined, the embodiment of the present invention uses "data structures" to record the process of determining the meta paths that link the first entity pairs;
First, an initial data structure is determined, which includes the entity pairs formed by the source node of each first entity pair and that source node itself, as in table No.1 of Fig. 5.
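The initial data structure can be pictured as a small table keyed by entity pair. The following is a minimal sketch only: the class name `PathTable`, its fields, and the choice to store a similarity of 1.0 for a node paired with itself are assumptions made for illustration and are not specified in the text.

```python
from dataclasses import dataclass, field


@dataclass
class PathTable:
    """One "data structure": a meta path, the entity pairs it links, and a score."""
    meta_path: tuple                              # edge-type names; () means zero hops
    pairs: dict = field(default_factory=dict)     # (source, target) -> similarity value
    score: float = 0.0                            # comprehensive similarity score S


def initial_table(training_pairs):
    """Build the initial data structure: each training source paired with itself."""
    table = PathTable(meta_path=())
    for source, _target in training_pairs:
        table.pairs[(source, source)] = 1.0       # assumed starting similarity
    return table


training_pairs = [(1, 8), (2, 8), (3, 9), (4, 9)]
table_no1 = initial_table(training_pairs)
print(sorted(table_no1.pairs))   # [(1, 1), (2, 2), (3, 3), (4, 4)]
```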
S103: according to the edge types in the heterogeneous information network, generate multiple first candidate meta paths with a hop count of 1; after steps A to D have been performed for each first candidate meta path, perform step S104.
One first candidate meta path with a hop count of 1 is generated for each edge type in the heterogeneous information network. For example, as shown in Fig. 5, since the heterogeneous information network of Fig. 4 has edges of five types, five first candidate meta paths with a hop count of 1 are generated, one for each edge type.
Steps A to D are performed for each first candidate meta path:
A. According to the heterogeneous information network, the initial data structure and the first candidate meta path, generate multiple second entity pairs linked by the first candidate meta path; the source node of each second entity pair is the source node of an entity pair in the initial data structure, and the destination node of each second entity pair is a node in the heterogeneous information network other than the source nodes of the first entity pairs;
Here, a second entity pair is an entity pair that actually exists in the heterogeneous information network and is linked by the first candidate meta path.
For example, according to the heterogeneous information network subgraph of Fig. 4, the initial data structure table No.1 of Fig. 5 and one of the first candidate meta paths of Fig. 5, the second entity pairs linked by this 1-hop meta path are generated: (1,8), (2,8), (3,9) and (4,9);
Similarly, the second entity pairs linked by another 1-hop meta path are generated: (1,5), (2,6) and (3,7); and the second entity pairs linked by a further 1-hop meta path are generated: (4,6) and (4,7);
Since the heterogeneous information network subgraph of Fig. 4 contains no entity pairs linked by the remaining two 1-hop meta paths, no entity pairs can be generated for these two first candidate meta paths.
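Step A can be sketched as a one-hop expansion over an edge list. The toy graph below is an assumption: only "isCitizenOf" is named in the text for the training pairs, so `edge_b` to `edge_e` are placeholder edge-type names, and the function name `expand_pairs` is hypothetical.

```python
# Toy edge lists keyed by edge type. "isCitizenOf" follows the training pairs in the
# text; "edge_b" ... "edge_e" are placeholder names, because the text does not say
# which edge types link the other pairs.
EDGES = {
    "isCitizenOf": [(1, 8), (2, 8), (3, 9), (4, 9)],
    "edge_b": [(1, 5), (2, 6), (3, 7)],
    "edge_c": [(4, 6), (4, 7)],
    "edge_d": [],
    "edge_e": [],
}


def expand_pairs(current_pairs, edge_type, training_sources, edges=EDGES):
    """Step A sketch: extend each (source, end) pair by one hop of edge_type."""
    new_pairs = set()
    for source, end in current_pairs:
        for head, tail in edges[edge_type]:
            # keep only real edges leaving `end` whose target is not a training source
            if head == end and tail not in training_sources:
                new_pairs.add((source, tail))
    return sorted(new_pairs)


training_sources = {1, 2, 3, 4}
initial_pairs = [(s, s) for s in sorted(training_sources)]
for edge_type in EDGES:
    print(edge_type, expand_pairs(initial_pairs, edge_type, training_sources))
```

Edge types that link nothing from the current pairs simply return an empty list, which corresponds to the two candidate meta paths for which no entity pairs can be generated.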
B. Calculate, according to a first preset model, the similarity measure value of each second entity pair when linked by the first candidate meta path; save the first candidate meta path, each second entity pair and the corresponding similarity measure value to a first data structure;
The first preset model is the similarity measure algorithm PCRW (Path-Constrained Random Walk) disclosed in the prior art;
The first preset model is specifically:
where σ(s, t_i | Π_{1…i}) denotes the similarity measure value of source node s and destination node t_i on meta path Π_{1…i}; Π_{1…i} denotes the (i-1)-hop meta path linking source node s and destination node t_i; V_{i-1} denotes the destination nodes reachable by walking from source node s along meta path Π_{1…i-1}, and I(V_{i-1}) denotes the set of these nodes; x is a node in I(V_{i-1}); R(x, t_i) indicates whether destination node t_i can be reached from x through edge R_{i-1}: R(x, t_i) equals 1 if it can and 0 otherwise; |R(x)| denotes the number of network nodes that node x can reach through edge R_{i-1};
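The formula itself is not reproduced in the text above. A form that is consistent with the listed symbols, and with the usual path-constrained random walk recurrence, would be the following; it is a reconstruction under that assumption, and the base case σ(s, s | Π_1) = 1 is likewise assumed.

```latex
\sigma(s, t_i \mid \Pi_{1 \ldots i})
  = \sum_{x \in I(V_{i-1})}
    \sigma(s, x \mid \Pi_{1 \ldots i-1}) \cdot \frac{R(x, t_i)}{|R(x)|}
```

Read this way, a walker standing on node x after i-2 hops splits its probability evenly among the |R(x)| nodes it can reach through edge R_{i-1}.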
For example, the similarity measure values calculated according to the first preset model for the second entity pairs (1,8), (2,8), (3,9) and (4,9) linked by the first candidate meta path are all 1; the first candidate meta path, the second entity pairs (1,8), (2,8), (3,9) and (4,9) linked by it, and the similarity measure value of each second entity pair are saved to table No.2 of Fig. 5;
Similarly, for each of the other two first candidate meta paths, the meta path, the second entity pairs linked by it and the similarity measure value of each second entity pair are saved to table No.3 and table No.4 of Fig. 5, respectively;
Table No.2, table No.3 and table No.4 of Fig. 5 are first data structures.
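One hop of a path-constrained random walk can be sketched as follows. This assumes the even-split reading of PCRW described for the first preset model; the toy edge lists and the names `edge_b`, `edge_c` and `pcrw_step` are placeholders for illustration, and the restriction on destination nodes from step A is left out to keep the sketch short.

```python
from collections import defaultdict

# Toy edges; only "isCitizenOf" is named in the text, the other names are placeholders.
EDGES = {
    "isCitizenOf": [(1, 8), (2, 8), (3, 9), (4, 9)],
    "edge_b": [(1, 5), (2, 6), (3, 7)],
    "edge_c": [(4, 6), (4, 7)],
}


def pcrw_step(prev_sims, edge_type, edges=EDGES):
    """One PCRW hop: spread each walker's probability evenly over its out-neighbours."""
    out = defaultdict(list)
    for head, tail in edges[edge_type]:
        out[head].append(tail)
    sims = defaultdict(float)
    for (source, x), prob in prev_sims.items():
        for t in out[x]:
            sims[(source, t)] += prob / len(out[x])   # sigma(s, x) * R(x, t) / |R(x)|
    return dict(sims)


start = {(s, s): 1.0 for s in (1, 2, 3, 4)}
print(pcrw_step(start, "isCitizenOf"))  # every training pair gets similarity 1.0
print(pcrw_step(start, "edge_c"))       # node 4 splits its walk: 0.5 to each target
```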
C. Calculate the comprehensive similarity score of the first data structure according to a second preset model and save it to the first data structure;
The second preset model is:
where S denotes the comprehensive similarity score of a data structure; s is a source node, t is a destination node reachable through meta path Π, and τ is the number of reachable destination nodes; σ(s, t | Π) is the similarity measure value of entity pair (s, t) on meta path Π; r(s) = 1 - α·N, where r(s) denotes the ability of source node s to contribute to the current data structure and serves to balance the selection of data structures, α is the decay factor of the contribution ability, and N is the number of first entity pairs with s as source node that are linked by meta paths already saved to the meta path set;
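The formula for S is not reproduced in the text above. One form that uses exactly the listed symbols is given below; it is a reconstruction, and reading τ as the number of destination nodes reachable from each individual source node s is an assumption.

```latex
S = \sum_{s} \left[ \frac{1}{\tau} \sum_{t} \sigma(s, t \mid \Pi) \right] r(s),
\qquad r(s) = 1 - \alpha N
```

Under this reading, each source node contributes its average similarity to the destinations it reaches, discounted by r(s) once some of its training pairs are already covered by saved meta paths.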
For example, according to the second preset model, the comprehensive similarity scores S of the first data structures corresponding to the three first candidate meta paths are calculated as 4, 3 and 0.5, respectively, and are saved to table No.2, table No.3 and table No.4 of Fig. 5.
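The scores 4, 3 and 0.5 can be reproduced with a short sketch, assuming that S sums, over source nodes, the average similarity to the reachable destinations multiplied by r(s). The similarity values for table No.3 and table No.4 are assumed to be the PCRW values 1 and 0.5, and `alpha=0.1` and the function name are hypothetical.

```python
from collections import defaultdict


def comprehensive_score(pair_sims, saved_counts=None, alpha=0.1):
    """Sum over sources of (mean similarity to reachable targets) * r(s)."""
    saved_counts = saved_counts or {}
    by_source = defaultdict(list)
    for (source, _target), sim in pair_sims.items():
        by_source[source].append(sim)
    score = 0.0
    for source, sims in by_source.items():
        tau = len(sims)                                  # reachable targets of this source
        r = 1.0 - alpha * saved_counts.get(source, 0)    # r(s) = 1 - alpha * N
        score += r * sum(sims) / tau
    return score


table_no2 = {(1, 8): 1.0, (2, 8): 1.0, (3, 9): 1.0, (4, 9): 1.0}
table_no3 = {(1, 5): 1.0, (2, 6): 1.0, (3, 7): 1.0}
table_no4 = {(4, 6): 0.5, (4, 7): 0.5}
print([comprehensive_score(t) for t in (table_no2, table_no3, table_no4)])  # [4.0, 3.0, 0.5]
```

Passing `saved_counts={1: 1}` lowers the contribution of source node 1, which is how r(s) steers later selections toward training pairs that are not yet covered.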
After step C has been performed for each first candidate meta path, each first data structure includes at least: the first candidate meta path, the second entity pairs linked by it, the similarity measure value of each second entity pair, and the comprehensive similarity score S of the first data structure.
D. Insert the first data structure into the candidate set.
For example, table No.2, table No.3 and table No.4 of Fig. 5 are inserted into the "candidate set".
S104: according to the comprehensive similarity scores, select one data structure from the candidate set and denote it the second data structure; check whether the second data structure contains a third entity pair identical to any first entity pair; if so, perform step S105; otherwise, perform step S106;
For example, according to the comprehensive similarity scores, the data structure with the largest score, table No.2, is first selected from the "candidate set" and denoted the second data structure; since table No.2 contains third entity pairs identical to the first entity pairs determined in step S101, step S105 is performed;
As another example, if the data structure selected from the candidate set according to the comprehensive similarity scores is table No.3, it is denoted the second data structure; since table No.3 contains no third entity pair identical to a first entity pair determined in step S101, step S106 is performed.
S105: save the meta path in the second data structure that links the third entity pair, together with the third entity pair, to the meta path set; delete the second data structure from the candidate set; and perform step S104;
For example, the third entity pairs (1,8), (2,8), (3,9) and (4,9) in table No.2 and their meta path are saved to the meta path set; that is, one meta path has been determined for each first entity pair. Table No.2 is then deleted from the candidate set, which now contains table No.3 and table No.4, and step S104 is performed again on this candidate set.
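The selection in step S104 and the save in step S105 can be sketched with plain dictionaries. The field names `name`, `pairs` and `score`, and the use of table labels in place of the meta paths themselves, are assumptions made for illustration.

```python
def select_best(candidate_set):
    """S104 sketch: pick the table with the largest comprehensive score."""
    return max(candidate_set, key=lambda table: table["score"])


def matched_training_pairs(table, training_pairs):
    """Return the 'third entity pairs': table pairs that are also training pairs."""
    return sorted(set(table["pairs"]) & set(training_pairs))


training_pairs = [(1, 8), (2, 8), (3, 9), (4, 9)]
candidate_set = [
    {"name": "No.2", "pairs": [(1, 8), (2, 8), (3, 9), (4, 9)], "score": 4.0},
    {"name": "No.3", "pairs": [(1, 5), (2, 6), (3, 7)], "score": 3.0},
    {"name": "No.4", "pairs": [(4, 6), (4, 7)], "score": 0.5},
]
meta_path_set = []

best = select_best(candidate_set)
hits = matched_training_pairs(best, training_pairs)
candidate_set.remove(best)        # the selected table leaves the candidate set either way
if hits:
    meta_path_set.append((best["name"], hits))   # S105: keep the path and the pairs it links
print(best["name"], hits, [t["name"] for t in candidate_set])
```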
S106: according to the second candidate meta path saved in the second data structure and the edge types in the heterogeneous information network, generate multiple third candidate meta paths, the hop count of each third candidate meta path exceeding that of the second candidate meta path by 1; delete the second data structure from the candidate set; after steps E to H have been performed for each third candidate meta path, perform step S104;
When the second data structure is table No.3, the second candidate meta path is the 1-hop meta path saved in table No.3, and each third candidate meta path adds one further hop to this second candidate meta path, giving 2-hop meta paths.
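Generating the third candidate meta paths amounts to appending each edge type to the current path. In this sketch a meta path is represented as a tuple of edge-type names, which is an assumption, and choosing "WorkAt" as the 1-hop path is purely illustrative.

```python
EDGE_TYPES = ["isCitizenOf", "WorkAt", "wasBornIn", "isLocatedin", "Owns"]


def extend_meta_path(meta_path, edge_types=EDGE_TYPES):
    """S106 sketch: append each edge type to get candidates one hop longer."""
    return [meta_path + (edge_type,) for edge_type in edge_types]


second_candidate = ("WorkAt",)            # hypothetical choice of a 1-hop path
third_candidates = extend_meta_path(second_candidate)
print(len(third_candidates), third_candidates[0])   # 5 ('WorkAt', 'isCitizenOf')
```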
Steps E to H are performed for each third candidate meta path:
E. According to the heterogeneous information network, the second data structure and the third candidate meta path, generate multiple fourth entity pairs linked by the third candidate meta path; the source node of each fourth entity pair is the source node of an entity pair in the second data structure, and the destination node of each fourth entity pair is a node in the heterogeneous information network other than the source nodes of the first entity pairs;
The fourth entity pairs are determined in the same way as the second entity pairs in step A, which is not repeated here.
F. Calculate, according to the first preset model, the similarity measure value of each fourth entity pair when linked by the third candidate meta path; save the third candidate meta path, each fourth entity pair and the corresponding similarity measure value to a third data structure;
The first preset model is the first preset model described in step B;
The third data structure is determined in the same way as the first data structure in step B, which is not repeated here.
G. Calculate the comprehensive similarity score of the third data structure according to the second preset model and save it to the third data structure;
The second preset model is the second preset model of step C;
The comprehensive similarity score of the third data structure is calculated and saved in the same way as in step C, which is not repeated here.
H. Insert the third data structure into the candidate set.
For example, table No.5 and table No.6 of Fig. 5, which represent third data structures, are inserted into the candidate set.
After the third data structure of each third candidate meta path has been inserted into the candidate set, the method returns to step S104 and continues until the candidate set is empty; an empty candidate set indicates that all useful meta paths of each first entity pair have been found.
It should be noted that the purpose of calculating the similarity measure values of entity pairs in steps B and F, and the comprehensive similarity scores in steps C and G, is to ensure that the meta paths determined by the heterogeneous information network meta path determining method of the embodiment of the present invention are the ones most relevant to the training pairs they link; the more relevant a meta path linking training pairs is, the more useful it is for building the prediction model. Such meta paths not only link more training pairs but also indicate a closer relation between the source nodes and destination nodes of the training pairs, thereby revealing latent features of the training set. For example, since table No.2 has the largest comprehensive similarity score in the candidate set, the meta path of table No.2 is the first meta path found in Fig. 5; it is not only the shortest edge relation but also the most relevant of all candidate meta paths;
Furthermore, since the data structure chosen from the candidate set each time is the one with the largest comprehensive similarity score among all data structures then in the candidate set, the meta path determined at each step is also the most useful and most relevant one in the candidate set at that time, which ensures that the meta paths relevant to the training pairs are found in descending order of relevance;
This method of finding useful meta paths step by step, starting from the source nodes of the training pairs, is a greedy algorithm: at each step, the meta path chosen is the most relevant one and the one that reaches the most destination nodes. It is then judged whether this meta path links training pairs. If it does, the meta path and the training pairs it links are selected and saved to the meta path set; otherwise, the search continues greedily until the candidate set is empty. Finally, the meta path set γ is generated.
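The greedy loop described above can be put together in one compact sketch. It is an illustration under several assumptions rather than the patented implementation: the function name `greedy_meta_paths`, the toy graph, the placeholder edge names `edge_b`, `edge_c` and `edge_f`, `alpha=0.1`, and the use of a heap ordered by the score computed when a data structure is created. A hop limit is included so that the loop terminates on cyclic graphs.

```python
import heapq
from collections import defaultdict


def greedy_meta_paths(edges, training_pairs, alpha=0.1, max_hops=4):
    """Greedy meta path search over edges = {edge_type: [(head, tail), ...]}."""
    sources = {s for s, _ in training_pairs}
    training = set(training_pairs)
    saved = defaultdict(int)      # N in r(s): training pairs of s already saved
    meta_path_set = []            # [(meta_path, [training pairs it links])]
    heap, counter = [], 0         # max-heap emulated with negated scores

    def expand(pair_sims, edge_type):
        out = defaultdict(list)
        for head, tail in edges[edge_type]:
            out[head].append(tail)
        sims = defaultdict(float)
        for (s, x), p in pair_sims.items():
            for t in out[x]:
                if t not in sources:                   # destinations exclude training sources
                    sims[(s, t)] += p / len(out[x])    # PCRW step
        return dict(sims)

    def score(pair_sims):
        by_source = defaultdict(list)
        for (s, _t), sim in pair_sims.items():
            by_source[s].append(sim)
        return sum((1 - alpha * saved[s]) * sum(v) / len(v) for s, v in by_source.items())

    def push_children(path, pair_sims):
        nonlocal counter
        for edge_type in edges:
            child = expand(pair_sims, edge_type)
            if child:                                  # skip paths that link nothing
                counter += 1
                heapq.heappush(heap, (-score(child), counter, path + (edge_type,), child))

    push_children((), {(s, s): 1.0 for s in sources})  # steps S102 and S103
    while heap:                                        # step S104
        _neg, _n, path, pair_sims = heapq.heappop(heap)
        hits = sorted(set(pair_sims) & training)
        if hits:                                       # step S105
            meta_path_set.append((path, hits))
            for s, _t in hits:
                saved[s] += 1
        elif len(path) < max_hops:                     # step S106
            push_children(path, pair_sims)
    return meta_path_set


EDGES = {
    "isCitizenOf": [(1, 8), (2, 8), (3, 9), (4, 9)],
    "edge_b": [(1, 5), (2, 6), (3, 7)],
    "edge_c": [(4, 6), (4, 7)],
    "edge_f": [(5, 8), (6, 8), (7, 9)],
}
TRAINING = [(1, 8), (2, 8), (3, 9), (4, 9)]
for path, pairs in greedy_meta_paths(EDGES, TRAINING):
    print(path, pairs)
```

On this toy graph the 1-hop path is found first, followed by the two 2-hop paths in order of their scores, which mirrors the descending-relevance order described in the text.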
With the heterogeneous information network meta path determining method provided by the embodiment of the present invention, the relevant meta paths linking the first entity pairs (training pairs) are determined one after another, in descending order of relevance, according to the comprehensive similarity scores of the data structures in the candidate set. The meta paths are therefore determined more efficiently, and the meta paths determined are more useful.
Preferably, on the basis of the embodiment shown in Fig. 1, to make the determined meta paths still more useful and more relevant, the method further includes the following step after step C has been performed for each first candidate meta path and before step D is performed for it:
I. Judge whether the comprehensive similarity score of the first data structure is not less than a first preset value; if so, perform step D; otherwise, discard the first data structure;
Specifically, it is judged whether the comprehensive similarity score S of the first data structure is not less than l;
where l = ε·|A|; ε is a limiting coefficient determined according to the actual application scenario; |A| is the size of the first data structure, i.e. the number of entity pairs in the first data structure;
And/or, after step G has been performed for each third candidate meta path and before step H is performed for it, the method further includes the following step:
J. Judge whether the comprehensive similarity score of the third data structure is not less than the first preset value; if so, perform step H; otherwise, discard the third data structure;
The judgment is made in the same way as in step I, which is not repeated here.
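The check in steps I and J reduces to a single comparison. The function name is hypothetical, and the default ε = 0.005 is simply the value reported for the experiment later in the description.

```python
def passes_threshold(score, pair_count, epsilon=0.005):
    """Keep a data structure only if S >= l, where l = epsilon * |A|."""
    return score >= epsilon * pair_count


print(passes_threshold(0.5, 2))     # True: 0.5 >= 0.005 * 2
print(passes_threshold(0.004, 1))   # False: 0.004 < 0.005 * 1
```

Because l grows with the number of entity pairs, a large data structure whose pairs are all only weakly similar is discarded, while a small one with the same total score may be kept.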
After step I or step J, the first data structures and third data structures inserted into the candidate set are preferred data structures. This further ensures that the generated meta paths are more relevant and better describe the relation within each training pair, and that the determined meta paths linking training pairs do not introduce noise by being too numerous.
Preferably, on the basis of the embodiment shown in Fig. 1 of the present invention, to make the determined meta paths still more useful and more relevant, the method further includes the following step before step E is performed for each third candidate meta path:
Judge whether the hop count of the third candidate meta path is not greater than a second preset value; if so, perform step E;
For example, the second preset value can be set to 4, because a meta path with a hop count greater than 4 has almost no practical semantic meaning.
By limiting the hop count of the third candidate meta paths whose third data structures are inserted into the candidate set, the third data structures in the candidate set are preferred data structures. This further ensures that the determined meta paths are more relevant and better describe the relation within each training pair, and that the determined meta paths linking training pairs do not introduce noise by being too numerous.
Although every meta path linking training pairs that is determined by the heterogeneous information network meta path determining method of the embodiment shown in Fig. 1 is useful and relevant, the meta paths differ in how strongly they influence the result when they are used to build a link prediction model and link prediction is performed with that model. It is therefore important to find a way of measuring the relevance of each meta path and to integrate the meta paths effectively into the prediction model.
Preferably, therefore, on the basis of the embodiment shown in Fig. 1, the heterogeneous information network meta path determining method provided by the embodiment of the present invention may further include:
According to a third preset model, determine the weight of each meta path in the meta path set and save the weights to the meta path set;
The third preset model is:
$$\max h = \sum_{x^{+} \in Q^{+}} \frac{\ln\left(t(\omega, x^{+})\right)}{|Q^{+}|} + \sum_{x^{-} \in Q^{-}} \frac{\ln\left(1 - t(\omega, x^{-})\right)}{|Q^{-}|} - \lVert \omega \rVert_{2}^{2}$$
where h denotes the output value of the third preset model; ω is the vector formed by the weights of the meta paths in the meta path set, ω = [ω_1, ω_2, …, ω_i, …]; ω_i is the weight of the meta path with index i in the meta path set; assuming the meta path set holds M meta paths in total, i = 1, …, M and ω_i ≥ 0;
When the output value h of the third preset model is maximal, the vector ω formed by the weights of the meta paths in the above formula is optimal, and each ω_i in ω is optimal as well;
Here, x+ is the vector formed by the similarity measure values of a positive sample on all meta paths, and is called a positive value; x- is the vector formed by the similarity measure values of a negative sample on all meta paths, and is called a negative value; a positive sample is a first entity pair; a negative sample is a sample without a link, formed by replacing the destination node of a positive sample with another node of the same type as that destination node; Q+ is the similarity matrix formed by all positive values x+; Q- is the similarity matrix formed by all negative values x-; the last term of the formula is the correction term.
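A small numerical sketch may help to see how such weights could be learned. Several things here are assumptions: the definition of t(ω, x) is not legible in the text, so a logistic function of the weighted sum is assumed; the correction term is taken as the squared norm of ω scaled by a hypothetical coefficient `reg`; and projected gradient ascent is one possible optimizer, chosen here for brevity, not a procedure stated in the text. The sample vectors are made up for illustration.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def objective(w, positives, negatives, reg=1.0):
    """h = mean ln t(w, x+) + mean ln(1 - t(w, x-)) - reg * ||w||^2 (assumed form)."""
    dot = lambda x: sum(wi * xi for wi, xi in zip(w, x))
    pos = sum(math.log(logistic(dot(x))) for x in positives) / len(positives)
    neg = sum(math.log(1.0 - logistic(dot(x))) for x in negatives) / len(negatives)
    return pos + neg - reg * sum(wi * wi for wi in w)


def learn_weights(positives, negatives, steps=500, lr=0.1, reg=1.0):
    """Projected gradient ascent on h, keeping every weight non-negative."""
    m = len(positives[0])
    w = [0.0] * m
    for _ in range(steps):
        grad = [-2.0 * reg * wi for wi in w]
        for x in positives:
            t = logistic(sum(wi * xi for wi, xi in zip(w, x)))
            for j in range(m):
                grad[j] += (1.0 - t) * x[j] / len(positives)
        for x in negatives:
            t = logistic(sum(wi * xi for wi, xi in zip(w, x)))
            for j in range(m):
                grad[j] -= t * x[j] / len(negatives)
        w = [max(0.0, wi + lr * gi) for wi, gi in zip(w, grad)]   # enforce w_i >= 0
    return w


# Made-up similarity vectors on two meta paths: the first path separates the
# positive samples from the negative ones, the second does not.
POS = [[1.0, 0.5], [1.0, 0.0], [0.5, 0.5]]
NEG = [[0.0, 0.5], [0.0, 0.0], [0.5, 0.5]]
weights = learn_weights(POS, NEG)
print(weights)
```

In this toy setting the meta path that distinguishes positive from negative samples ends up with the larger weight, while the uninformative one stays at zero.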
It should be noted that link prediction can be regarded as a special classification problem. The present invention therefore uses positive samples and negative samples to set a weight for each determined meta path in a supervised manner, so that the prediction model trained with the weighted meta paths is more effective.
Experiments for the present invention show that, compared with link prediction using prediction models built by giving the meta paths random weights or equal (average) weights, a prediction model built from meta paths carrying the weights determined by the meta path weight learning method of the present invention significantly improves prediction accuracy.
The heterogeneous information network meta path determining method provided by the embodiment of the present invention creates first data structures and inserts them into the candidate set; according to the comprehensive similarity scores of the data structures in the candidate set, it selects the data structure with the largest comprehensive similarity score from the candidate set, denotes it the second data structure, and checks whether it contains a third entity pair identical to any first entity pair. If so, the meta path in the second data structure that links the third entity pair and the third entity pair are saved to the meta path set, the second data structure is deleted from the candidate set, and the next data structure is selected from the candidate set according to the comprehensive similarity scores. If not, third data structures are created and inserted into the candidate set, and the next data structure is then selected from the candidate set according to the comprehensive similarity scores. This continues until the candidate set is empty. Because the method determines the relevant meta paths linking the first entity pairs (training pairs) one after another, in descending order of relevance, according to the comprehensive similarity scores of the data structures in the candidate set, the meta path determining method provided by the present invention determines meta paths efficiently, and the meta paths determined are more useful.
As shown in Fig. 6, the embodiment of the present invention further provides a link prediction method, which may include the following steps:
S401: determine an entity pair to be predicted;
In the heterogeneous information network, the entity pair to be predicted can be any entity pair other than the training pairs.
S402: according to a fourth preset model and the meta path set, determine the probability that the entity pair to be predicted is linked by an edge of the first preset type;
Here, the edge of the first preset type is the edge of the first preset type described in the embodiment shown in Fig. 1; each training pair is linked at least by an edge of the first preset type.
The fourth preset model is:
where η(s, t | γ) is the probability that the entity pair to be predicted is linked by an edge of the first preset type; (s, t) is the entity pair to be predicted, s being the source node and t the destination node; γ is the meta path set; i is the index of a meta path in γ; σ(s, t | Π_i) is the similarity measure value of the entity pair to be predicted (s, t) on the i-th meta path Π_i; ω_i is the weight of meta path Π_i; ω_0 is a correction coefficient; γ is obtained with the method embodiment shown in Fig. 1, and ω_i is determined according to the third preset model.
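The formula of the fourth preset model is not reproduced in the text above. Since η is a probability built from a correction coefficient ω_0 and weighted similarity values, a logistic form is one reading that fits the listed symbols; it is offered here as an assumption, not as the wording of the original.

```latex
\eta(s, t \mid \gamma)
  = \frac{\exp\!\left(\omega_0 + \sum_{i} \omega_i \, \sigma(s, t \mid \Pi_i)\right)}
         {1 + \exp\!\left(\omega_0 + \sum_{i} \omega_i \, \sigma(s, t \mid \Pi_i)\right)}
```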
S403: judge whether the probability is greater than a third preset value; if so, determine that the entity pair to be predicted is linked by an edge of the first preset type;
The third preset value can be set according to the actual application scenario and is typically 0.7;
For example, when the probability, calculated according to the fourth preset model, that the entity pair to be predicted is linked by an edge of the first preset type is 0.8, the link "edge of the first preset type" does exist between the entities of the pair to be predicted.
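Steps S402 and S403 can be sketched in a few lines, assuming a logistic form for the fourth preset model. The function names, the example weights and the bias value are hypothetical; the threshold of 0.7 is the typical third preset value mentioned above.

```python
import math


def link_probability(similarities, weights, bias=0.0):
    """Assumed logistic model: bias plays the role of the correction coefficient."""
    z = bias + sum(w * s for w, s in zip(weights, similarities))
    return 1.0 / (1.0 + math.exp(-z))


def predict_link(similarities, weights, bias=0.0, threshold=0.7):
    """S403 sketch: predict a link when the probability exceeds the threshold."""
    return link_probability(similarities, weights, bias) > threshold


# Similarity of one candidate pair on two meta paths (made-up numbers).
prob = link_probability([1.0, 0.5], [2.0, 1.0], bias=-0.5)
print(round(prob, 4), predict_link([1.0, 0.5], [2.0, 1.0], bias=-0.5))
```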
To show the effectiveness of the link prediction method provided by the embodiment of the present invention more intuitively, the method was verified by experiment as follows:
1) Determine the data set
In the experiment, the Yago data set is used for verification. It is a large-scale knowledge graph containing more than 10 million entities and more than 120 million facts. The present invention uses only its core fact part, "YagoFact", which includes edges of 35 types, 4484914 relations and 1369931 entities belonging to 3455 entity types. A relation is represented as an RDF triple (entity, relation, entity); for example, (New York, is located in, United States).
2) Determine the evaluation criterion
The present invention uses the ROC (Receiver Operating Characteristic) curve to measure the performance of the different methods, with the true positive rate (TPR) as the y-axis and the false positive rate (FPR) as the x-axis, as shown in Fig. 7. TPR is the ratio of the number of positive samples predicted as positive to the number of actual positive samples, and FPR is the ratio of the number of negative samples predicted as positive to the number of actual negative samples. The larger the area under the curve, the more accurate the prediction. "Predicted as positive" here means that the entity pair is predicted to be linked by an edge of the preset type.
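The evaluation criterion can be illustrated with a short, dependency-free sketch that sweeps the decision threshold over ranked scores. The scores and labels are made up, the function names are hypothetical, and tied scores are not treated specially.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for _score, label in ranked:
        if label:
            tp += 1          # a positive sample predicted as positive
        else:
            fp += 1          # a negative sample predicted as positive
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.6, 0.4, 0.2]   # predicted link probabilities (made up)
labels = [1, 1, 0, 1, 0]             # 1 = link actually exists
points = roc_points(scores, labels)
print(points[-1], round(area_under_curve(points), 4))
```

For these five samples the area equals 5/6, the fraction of positive and negative pairs that are ranked in the right order.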
3) Determine the comparison objects
Since there are currently no published results on link prediction in heterogeneous information networks with complex schemas, a basic link prediction method disclosed in the prior art for heterogeneous information networks with simple schemas is used for comparison with the link prediction method provided by the present invention.
The basic link prediction method disclosed in the prior art is specifically: enumerate all meta paths of the training pairs, calculate the similarity measure value on each meta path with the PCRW algorithm, give every meta path the same weight, then build the link prediction model and use it for prediction.
Since meta paths with more than 4 hops have almost no practical semantic meaning, the maximum hop count of the meta paths determined in the basic prior-art link prediction method is limited to 1, 2, 3 and 4 in turn. This yields four basic link prediction methods, labelled PCRW-1, PCRW-2, PCRW-3 and PCRW-4, which serve as the comparison objects for the link prediction method provided by the present invention.
In the experiment, two different types of links to be predicted are chosen, and the corresponding prediction results are shown in Fig. 7(a) and Fig. 7(b), respectively. For each link type, 200 entity pairs having that link are chosen from the Yago data set; 100 of them are used as training entity pairs and the other 100 as test entity pairs, and the links of the test pairs are assumed not to exist in the prediction experiment.
In the experiment, ε is set to 0.005, and the maximum hop count of candidate meta paths is limited to 4.
4) Experimental results
The experimental results are shown in Fig. 7. As Fig. 7 shows, the prediction accuracy of the link prediction method provided by the embodiment of the present invention is clearly higher than that of the basic prior-art link prediction methods. This indicates that a link prediction model built from the meta paths determined by the heterogeneous information network meta path determining method of the embodiment of the present invention is more effective and predicts links more accurately.
The link prediction method provided by the embodiment of the present invention determines an entity pair to be predicted; determines, according to the fourth preset model and the meta path set, the probability that the entity pair to be predicted is linked by an edge of the first preset type; and judges whether the probability is greater than the third preset value, and if so, determines that the entity pair to be predicted is linked by an edge of the preset type. Because the fourth preset model, which serves as the prediction model, is trained from the meta paths determined by the heterogeneous information network meta path determining method of the embodiment of the present invention, and also takes the weight of each meta path into account, the prediction results obtained with the link prediction method provided by the embodiment of the present invention are more accurate.
Corresponding to the above method embodiment, the embodiment of the present invention further provides a heterogeneous information network meta path determining device, shown in Fig. 8. The device includes: a first determining module 101, a second determining module 102, a first trigger module 103, a third determining module 201, a first computing module 202, a second computing module 203, a first insertion module 204, a first selection module 104, a second trigger module 105, a third trigger module 106, a fourth determining module 301, a third computing module 302, a fourth computing module 303 and a second insertion module 304.
The first determining module 101 is configured to determine, in a heterogeneous information network, multiple first entity pairs whose meta paths are to be determined, where each first entity pair includes a source node and a destination node and is linked at least by an edge of the first preset type;
Link prediction generally proceeds as follows: determine multiple training entity pairs that are linked by a specific edge in the heterogeneous information network, and enumerate all meta paths of these training entity pairs; build a prediction model from the enumerated meta paths; use the prediction model to calculate the probability that an entity pair to be predicted is linked by the specific edge; when this probability is greater than a preset value, the entity pair to be predicted is taken to be linked by the specific edge.
Therefore, when meta paths serve link prediction, a first entity pair may be called a training pair, and the set of first entity pairs may be called a training set; the edge of the first preset type is the edge of the specific type in link prediction;
The number of first entity pairs can be determined according to the actual scale of the heterogeneous information network; it is generally 10 or more, and preferably between 10 and 20;
The second determining module 102 is configured to determine an initial data structure according to the plurality of first entity pairs; the initial data structure includes, for the source node of each first entity pair, the entity pair formed by that source node and itself;
After the first entity pairs are determined, the embodiment of the present invention uses "data structures" to record the process of determining the meta paths that link the first entity pairs;
First, an initial data structure is determined, which includes the entity pairs formed by the source node of each first entity pair and that source node itself.
The first trigger module 103 is configured to generate, according to the edge types in the heterogeneous information network, multiple first candidate meta-paths with a hop count of 1, and, after triggering the third determining module 201, the first computing module 202, the second computing module 203 and the first insertion module 204 in turn for each first candidate meta-path, to trigger the first selection module 104;
Specifically, as many first candidate meta-paths with a hop count of 1 are generated as there are edge types in the heterogeneous information network.
The third determining module 201 is configured to generate, according to the heterogeneous information network, the initial data structure and the first candidate meta-path, multiple second entity pairs linked by the first candidate meta-path; wherein the source node of each second entity pair is the source node of an entity pair in the initial data structure, and the target node of each second entity pair is a node in the heterogeneous information network other than the source nodes of the first entity pairs;
Here, a second entity pair is an entity pair that actually exists in the heterogeneous information network and is linked by the first candidate meta-path.
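The generation of one-hop candidates and their second entity pairs can be illustrated with a short sketch. The graph layout (`{edge_type: {node: set_of_neighbours}}`), the function name and the sample node names are assumptions introduced for the example; only the rules themselves (one candidate per edge type, targets may not be source nodes of the first entity pairs) come from the text.

```python
def one_hop_candidates(graph, initial_pairs, training_sources):
    """Return {meta_path: [(source, target), ...]} for every edge type.

    graph: {edge_type: {node: set_of_neighbours}} (hypothetical layout).
    initial_pairs: the (s, s) pairs of the initial data structure.
    training_sources: source nodes of the first entity pairs, which are
    excluded as target nodes, as the text requires.
    """
    candidates = {}
    for edge_type, adjacency in graph.items():
        pairs = []
        for source, current in initial_pairs:
            # Only pairs that really exist in the network are produced.
            for target in sorted(adjacency.get(current, ())):
                if target not in training_sources:
                    pairs.append((source, target))
        candidates[(edge_type,)] = pairs   # a one-hop meta-path
    return candidates


graph = {
    "writes": {"a1": {"p1", "p2"}, "a4": {"p2"}},
    "works_at": {"a1": {"o1"}},
}
initial_pairs = [("a1", "a1"), ("a4", "a4")]
candidates = one_hop_candidates(graph, initial_pairs, {"a1", "a4"})
```

With two edge types in the toy network, exactly two one-hop candidates are produced.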
The first computing module 202 is configured to calculate, according to a first preset model, the similarity measure value of each second entity pair when linked by the first candidate meta-path, and to save the first candidate meta-path, each second entity pair and the corresponding similarity measure value to a first data structure;
The first preset model is the similarity measure PCRW (Path-Constrained Random Walk) disclosed in the prior art;
The first preset model is specifically: [formula image not reproduced]
where $\sigma(s, t_i \mid \Pi_{1\ldots i})$ denotes the similarity measure value of source node $s$ and target node $t_i$ on meta-path $\Pi_{1\ldots i}$; $\Pi_{1\ldots i}$ denotes the $(i-1)$-hop meta-path linking source node $s$ and target node $t_i$; $I(V_{i-1})$ denotes the set of target nodes reachable by walking from source node $s$ along meta-path $\Pi_{1\ldots i-1}$, and $x$ is a node in $I(V_{i-1})$; $R(x, t_i)$ indicates whether target node $t_i$ can be reached from $x$ via edge $R_{i-1}$: $R(x, t_i)$ equals 1 if it can, and 0 otherwise; $|R(x)|$ denotes the number of nodes that node $x$ can reach via edge $R_{i-1}$;
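Since the formula itself is not reproduced here, the following sketch assumes the usual PCRW recursion, which is consistent with the symbol definitions above: $\sigma(s, t_i \mid \Pi_{1\ldots i}) = \sum_{x \in I(V_{i-1})} \sigma(s, x \mid \Pi_{1\ldots i-1}) \cdot R(x, t_i) / |R(x)|$. The graph layout and the function name `pcrw` are hypothetical.

```python
def pcrw(graph, source, meta_path):
    """Return {target: sigma(source, target | meta_path)} under the assumed recursion.

    graph: {edge_type: {node: set_of_neighbours}} (hypothetical layout).
    meta_path: tuple of edge types followed in order.
    """
    dist = {source: 1.0}                 # zero-hop walk: all mass on the source
    for edge_type in meta_path:
        adjacency = graph.get(edge_type, {})
        nxt = {}
        for x, sigma_x in dist.items():
            neighbours = adjacency.get(x, ())
            for t in neighbours:         # R(x, t) = 1 exactly for these t
                # Spread sigma(s, x) evenly over the |R(x)| reachable nodes.
                nxt[t] = nxt.get(t, 0.0) + sigma_x / len(neighbours)
        dist = nxt
    return dist


graph = {
    "writes": {"a1": {"p1", "p2"}},
    "written_by": {"p1": {"a1", "a2"}, "p2": {"a1"}},
}
sigma = pcrw(graph, "a1", ("writes", "written_by"))
```

In this toy network the walk from `a1` splits evenly over two papers and then over each paper's authors, so `a2` receives 0.5 × 0.5 = 0.25 and `a1` receives 0.25 + 0.5 = 0.75.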
The second computing module 203 is configured to calculate the comprehensive similarity score of the first data structure according to a second preset model and to save it to the first data structure;
The second preset model is: [formula image not reproduced]
where $S$ denotes the comprehensive similarity score of a data structure; $s$ is a source node, $t$ is a target node reachable via meta-path $\Pi$, and $\tau$ is the number of reachable target nodes; $\sigma(s, t \mid \Pi)$ is the similarity measure value of entity pair $(s, t)$ on meta-path $\Pi$; $r(s) = 1 - \alpha \cdot N$, where $r(s)$ denotes the contribution ability of source node $s$ to the current data structure and is used to balance the selection of data structures, $\alpha$ is the decay factor of the contribution ability, and $N$ is the number of first entity pairs with source node $s$ that are linked by meta-paths already saved to the meta-path set;
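The contribution term $r(s) = 1 - \alpha \cdot N$ is stated in the text, but the aggregate formula for $S$ is not reproduced. The sketch below therefore assumes one plausible form, $S = \sum_s r(s) \cdot \frac{1}{\tau} \sum_t \sigma(s, t \mid \Pi)$, purely for illustration; the function names and input layout are hypothetical.

```python
def contribution(alpha, n_linked):
    """r(s) = 1 - alpha * N, as stated in the text."""
    return 1.0 - alpha * n_linked


def comprehensive_score(reach, linked_counts, alpha):
    """Assumed aggregate: sum over s of r(s) * mean over t of sigma(s, t | path).

    reach: {s: {t: sigma(s, t | path)}} for one candidate meta-path.
    linked_counts: {s: N}, pairs with source s already covered by saved meta-paths.
    """
    total = 0.0
    for s, sigmas in reach.items():
        if not sigmas:
            continue
        tau = len(sigmas)                # number of reachable targets for s
        r_s = contribution(alpha, linked_counts.get(s, 0))
        total += r_s * sum(sigmas.values()) / tau
    return total


reach = {"a1": {"p1": 0.5, "p2": 0.5}, "a4": {"p2": 1.0}}
score = comprehensive_score(reach, {"a1": 2}, alpha=0.1)
```

Under this assumed form, a source node whose training pairs are already covered by saved meta-paths contributes less, which steers the selection towards data structures that serve still-uncovered source nodes.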
After step C has been performed for each first candidate meta-path, each first data structure includes at least: the first candidate meta-path, the second entity pairs linked by the first candidate meta-path, the similarity measure value corresponding to each second entity pair, and the comprehensive similarity score S of this first data structure.
The first insertion module 204 is configured to insert the first data structure into a candidate set;
The first selection module 104 is configured to select, according to the magnitude of the comprehensive similarity scores, one data structure from the candidate set, denoted the second data structure, and to check whether the second data structure contains a third entity pair identical to any of the first entity pairs;
The second trigger module 105 is configured, when the check result obtained by the first selection module is yes, to save to a meta-path set, in correspondence, the meta-path in the second data structure that links the third entity pair together with the third entity pair, to delete the second data structure from the candidate set, and to trigger the first selection module 104;
The third trigger module 106 is configured, when the check result obtained by the first selection module is no, to generate multiple third candidate meta-paths according to the second candidate meta-path saved in the second data structure and the edge types in the heterogeneous information network, the hop count of each third candidate meta-path differing from the hop count of the second candidate meta-path by 1; to delete the second data structure from the candidate set; and, after triggering the fourth determining module 301, the third computing module 302, the fourth computing module 303 and the second insertion module 304 in turn for each third candidate meta-path, to trigger the first selection module 104;
The fourth determining module 301 is configured to generate, according to the heterogeneous information network, the second data structure and the third candidate meta-path, multiple fourth entity pairs linked by the third candidate meta-path, wherein the source node of each fourth entity pair is the source node of an entity pair in the second data structure, and the target node of each fourth entity pair is a node in the heterogeneous information network other than the source nodes of the first entity pairs;
The method of determining the fourth entity pairs is the same as the method by which the third determining module 201 determines the second entity pairs, and is not repeated here.
The third computing module 302 is configured to calculate, according to the first preset model, the similarity measure value of each fourth entity pair when linked by the third candidate meta-path, and to save the third candidate meta-path, each fourth entity pair and the corresponding similarity measure value to a third data structure;
The first preset model here is the first preset model used in the first computing module 202;
The method of determining the third data structure is the same as the method used in the first computing module 202, and is not repeated here.
The fourth computing module 303 is configured to calculate the comprehensive similarity score of the third data structure according to the second preset model and to save it to the third data structure;
The second preset model here is the second preset model used in the second computing module 203;
The specific method of calculating and saving the comprehensive similarity score of the third data structure is the same as in the second computing module 203, and is not repeated here.
The second insertion module 304 is configured to insert the third data structure into the candidate set.
After the third data structure corresponding to each third candidate meta-path has been inserted into the candidate set, the first selection module 104 is triggered, until the candidate set is empty; an empty candidate set indicates that all useful meta-paths corresponding to each first entity pair have been found.
It should be noted that the purpose of calculating the similarity measure values of entity pairs in the first computing module 202 and the third computing module 302, and of calculating the comprehensive similarity score in the second computing module 203 and the fourth computing module 303, is to ensure that the meta-paths determined by the meta-path determining method of the embodiment of the present invention are meta-paths relevant to the linked training pairs; such meta-paths not only link more training pairs but also express a closer relationship between the source node and the target node of a training pair, and thereby reveal the latent features of the training set.
Further, since the first selection module 104 each time selects from the candidate set the data structure whose comprehensive similarity score is the largest among all data structures in the candidate set, the meta-path determined at each step is also the most relevant one in the candidate set at that time, which ensures that the meta-paths relevant to the training pairs are found in descending order of relevance;
This way of finding useful meta-paths step by step, starting from the source nodes of the training pairs, is called a greedy algorithm. At each step, the meta-path determined is the most relevant one and the one that reaches the most target nodes; it is then judged whether this meta-path links a training pair. If it does, the meta-path and the training pairs it links are selected and saved to the meta-path set; otherwise the greedy search continues, until the candidate set is empty. Finally, the meta-path set γ is generated.
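The greedy search described above can be sketched with a priority queue standing in for the candidate set. This is a simplified illustration under several assumptions: the graph layout, all function names, the scoring form inside `score`, and the `max_hops` cut-off (which guarantees termination on cyclic networks) are not taken from the text, and the exclusion of training source nodes as targets is omitted for brevity.

```python
import heapq
from itertools import count


def expand(graph, reach, edge_type):
    """Extend every source's walk by one edge type (one assumed PCRW step)."""
    out = {}
    for s, dist in reach.items():
        nxt = {}
        for x, p in dist.items():
            nbrs = graph.get(edge_type, {}).get(x, ())
            for t in nbrs:
                nxt[t] = nxt.get(t, 0.0) + p / len(nbrs)
        if nxt:
            out[s] = nxt
    return out


def score(reach, covered, alpha):
    """Assumed comprehensive score: sum of r(s) * mean sigma over reachable targets."""
    return sum((1.0 - alpha * covered.get(s, 0)) * sum(d.values()) / len(d)
               for s, d in reach.items())


def find_meta_paths(graph, training_pairs, alpha=0.1, max_hops=4):
    """Return {meta_path: set of training pairs it links}."""
    pairs = set(training_pairs)
    covered, selected = {}, {}
    heap, tie = [], count()              # tie-breaker keeps heap entries comparable

    def push(path, reach):
        if reach:
            heapq.heappush(heap, (-score(reach, covered, alpha), next(tie), path, reach))

    initial = {s: {s: 1.0} for s, _ in training_pairs}
    for edge_type in graph:              # one 1-hop candidate per edge type
        push((edge_type,), expand(graph, initial, edge_type))

    while heap:                          # until the candidate set is empty
        _, _, path, reach = heapq.heappop(heap)   # largest score first
        hits = {(s, t) for s, d in reach.items() for t in d if (s, t) in pairs}
        if hits:                         # the path links training pairs: save it
            selected[path] = hits
            for s, _t in hits:
                covered[s] = covered.get(s, 0) + 1
        elif len(path) < max_hops:       # otherwise extend it by one hop
            for edge_type in graph:
                push(path + (edge_type,), expand(graph, reach, edge_type))
    return selected


graph = {
    "writes": {"a1": {"p1"}, "a2": {"p1"}},
    "written_by": {"p1": {"a1", "a2"}},
}
found = find_meta_paths(graph, [("a1", "a2")])
```

In this toy network the one-hop path `writes` links no training pair, so it is extended; the two-hop path `writes`, `written_by` reaches `a2` from `a1` and is saved to the result.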
With the heterogeneous information network meta-path determining device provided by the embodiment of the present invention, the meta-paths relevant to the linked first entity pairs (training pairs) can be determined in turn, in descending order of relevance, according to the comprehensive similarity scores of the data structures in the candidate set; the meta-paths are thus determined more efficiently, and the determined meta-paths are more useful.
Preferably, on the basis of the embodiment shown in Fig. 8, in order to make the determined meta-paths still more relevant, the device further includes: a first judging module,
The first judging module is configured to judge, after the second computing module has been triggered and before the first insertion module is triggered, whether the comprehensive similarity score corresponding to the first data structure is not less than a first preset value; if so, the first insertion module is triggered;
Specifically, it is judged whether the comprehensive similarity score S of the first data structure is not less than l;
where l = ε * |A|; ε is a limiting coefficient determined according to the actual application scenario; |A| is the size of the first data structure, i.e., the number of entity pairs in the first data structure;
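The threshold test l = ε * |A| amounts to a one-line check. In the sketch below the function name and the sample value of ε are illustrative assumptions; the text leaves ε to the application scenario.

```python
def passes_threshold(score, pairs, epsilon):
    """Return True when S is not less than l = epsilon * |A|.

    score: comprehensive similarity score S of the data structure.
    pairs: the entity pairs stored in the data structure, so |A| = len(pairs).
    epsilon: limiting coefficient (value chosen here only for illustration).
    """
    return score >= epsilon * len(pairs)


pairs = [("a1", "p1"), ("a1", "p2"), ("a4", "p2")]
keep = passes_threshold(1.4, pairs, epsilon=0.4)   # l is roughly 1.2
drop = passes_threshold(1.0, pairs, epsilon=0.4)
```

Only data structures for which the check succeeds are inserted into the candidate set.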
And/or, the device further includes: a second judging module,
The second judging module is configured to judge, after the fourth computing module has been triggered and before the second insertion module is triggered, whether the comprehensive similarity score corresponding to the third data structure is not less than the first preset value; if so, the second insertion module is triggered;
The specific judging method is the same as the method used in the first judging module, and is not repeated here.
After processing by the first judging module and/or the second judging module, the first data structures and/or third data structures inserted into the candidate set are preferred data structures, which further ensures that the generated meta-paths are more relevant and better describe the relationship within each training pair, and that the determined meta-paths linking the training pairs do not introduce noise by being too numerous.
Preferably, on the basis of the embodiment shown in Fig. 8 of the present invention, in order to make the determined meta-paths still more relevant, the device further includes: a third judging module,
The third judging module is configured to judge, before the fourth determining module is triggered, whether the hop count of the third candidate meta-path is not greater than a second preset value; if so, the fourth determining module is triggered.
For example, the second preset value may be set to 4; that is, when the hop count of a third candidate meta-path exceeds 4, the meta-path carries almost no actual semantic relationship.
Therefore, limiting the hop count of the third candidate meta-paths corresponding to the third data structures inserted into the candidate set makes the inserted third data structures preferred data structures, which further ensures that the determined meta-paths are more relevant and better describe the relationship within each training pair, and that the determined meta-paths linking the training pairs do not introduce noise by being too numerous.
Although every meta-path linking training pairs that is determined by the heterogeneous information network meta-path determining device of the embodiment shown in Fig. 8 is useful and relevant, the meta-paths differ in their degree of influence when they are used to build a link prediction model and link prediction is carried out according to that model. It is therefore very important to find a way to measure the relevance of each meta-path and to integrate the meta-paths effectively into the prediction model.
Preferably, therefore, the device shown in Fig. 8 may further include: a fifth computing module,
The fifth computing module is configured to determine, according to a third preset model, the weight corresponding to each meta-path in the meta-path set and to save it, in correspondence, to the meta-path set;
The third preset model is:
$$\max\; h = \sum_{x^+ \in q^+} \frac{\ln\big(t(\omega, x^+)\big)}{|q^+|} + \sum_{x^- \in q^-} \frac{\ln\big(1 - t(\omega, x^-)\big)}{|q^-|} - \lVert \omega \rVert_2^2$$
where $h$ denotes the output value of the third preset model; $\omega$ is the vector formed by the weights corresponding to the meta-paths in the meta-path set, $\omega = [\omega_1, \omega_2, \ldots, \omega_i, \ldots]$, where $\omega_i$ is the weight corresponding to the meta-path with index $i$ in the meta-path set; assuming the meta-path set holds $M$ meta-paths in total, $i = 1, \ldots, M$ and $\omega_i \ge 0$;
When the output value $h$ of the third preset model is maximal, the weight vector $\omega$ in the above formula is optimal, and each $\omega_i$ in $\omega$ is also optimal;
where [definition of $t(\omega, x)$: formula image not reproduced]; $x^+$ is the vector formed by the similarity measure values of a positive sample on all meta-paths, and is called a positive value; $x^-$ is the vector formed by the similarity measure values of a negative sample on all meta-paths, and is called a negative value; a positive sample is one of the first entity pairs; a negative sample is a sample with no link, formed by replacing the target node of a positive sample with another node of the same type as that target node; $q^+$ is the similarity matrix formed by all positive values $x^+$; $q^-$ is the similarity matrix formed by all negative values $x^-$; the final term of the objective, the norm of $\omega$, is a correction term.
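The construction of a negative sample from a positive one can be sketched as follows. The lookup tables `node_type` and `nodes_by_type`, the set `linked` of known links, and the extra exclusion of the source node itself are assumptions made for the example; the text only requires a same-type replacement target for which no link exists.

```python
import random


def make_negative(positive, node_type, nodes_by_type, linked, rng):
    """Replace the target of a positive pair with a same-type node that is not linked.

    positive: (source, target) training pair.
    node_type: {node: type}; nodes_by_type: {type: set of nodes} (hypothetical tables).
    linked: set of (source, target) pairs known to be linked by the preset edge type.
    Returns None when no suitable replacement exists.
    """
    s, t = positive
    pool = [n for n in sorted(nodes_by_type[node_type[t]])
            if n != t and n != s and (s, n) not in linked]
    return (s, rng.choice(pool)) if pool else None


node_type = {"a1": "author", "a2": "author", "a3": "author", "p1": "paper"}
nodes_by_type = {"author": {"a1", "a2", "a3"}, "paper": {"p1"}}
linked = {("a1", "a2")}
negative = make_negative(("a1", "a2"), node_type, nodes_by_type, linked, random.Random(0))
```

Here `a3` is the only author that is neither the source, the original target, nor already linked to `a1`, so it becomes the replacement target.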
It should be noted that link prediction can be regarded as a special classification problem. The present invention therefore uses positive samples and negative samples to assign, in a supervised manner, a weight to each determined meta-path, so that the prediction model trained with the weighted meta-paths is more effective.
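The objective of the third preset model can be evaluated as in the sketch below. Two points are assumptions rather than statements from the text: $t(\omega, x)$ is taken to be the logistic function of the inner product $\omega \cdot x$ (its definition is not reproduced here), and the correction term is read as the squared L2 norm of $\omega$. An optimizer that maximizes $h$ subject to $\omega_i \ge 0$ is not shown.

```python
import math


def t(omega, x):
    """Assumed form of t(omega, x): logistic function of the inner product."""
    z = sum(w * v for w, v in zip(omega, x))
    return 1.0 / (1.0 + math.exp(-z))


def objective(omega, positives, negatives):
    """Value h of the third preset model for a given weight vector.

    positives / negatives: lists of similarity vectors x+ / x-, one entry per
    meta-path in the meta-path set (the rows of q+ and q-).
    """
    pos = sum(math.log(t(omega, x)) for x in positives) / len(positives)
    neg = sum(math.log(1.0 - t(omega, x)) for x in negatives) / len(negatives)
    reg = sum(w * w for w in omega)      # correction term, read as ||omega||_2^2
    return pos + neg - reg


positives = [[0.8, 0.1], [0.6, 0.3]]
negatives = [[0.1, 0.0], [0.0, 0.2]]
h_zero = objective([0.0, 0.0], positives, negatives)
```

With all weights at zero every sample gets t = 0.5, so h equals 2 ln 0.5; training would search for non-negative weights that raise this value.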
The heterogeneous information network meta-path determining device provided by the embodiment of the present invention can create first data structures and insert them into a candidate set; according to the comprehensive similarity scores of the data structures held in the candidate set, select from the candidate set, in turn, the data structure with the largest comprehensive similarity score, denoted the second data structure, and check whether this data structure contains a third entity pair identical to any of the first entity pairs; if it does, save to a meta-path set, in correspondence, the meta-path in the second data structure that links the third entity pair together with the third entity pair, delete the second data structure from the candidate set, and continue to select the next data structure from the candidate set according to the comprehensive similarity scores; if it does not, create third data structures and insert them into the candidate set, and then continue to select the next data structure from the candidate set according to the comprehensive similarity scores, until the candidate set is empty. Since this device determines the meta-paths relevant to the linked first entity pairs (training pairs) in turn, in descending order of relevance, according to the comprehensive similarity scores of the data structures in the candidate set, the meta-path determining device provided by the present invention determines meta-paths efficiently, and the determined meta-paths are more useful.
As shown in Fig. 9, an embodiment of the present invention further provides a link prediction device, which includes: a to-be-predicted entity pair determining module 401, a probability determining module 402 and a fourth judging module 403,
The to-be-predicted entity pair determining module 401 is configured to determine an entity pair to be predicted;
In the heterogeneous information network, the entity pair to be predicted can be any entity pair other than the training pairs.
The probability determining module 402 is configured to determine, according to a fourth preset model and the meta-path set, the probability that the entity pair to be predicted is linked by an edge of the first preset type;
The fourth preset model is: [formula image not reproduced]
where $\eta(s, t \mid \gamma)$ is the probability that the entity pair to be predicted is linked by an edge of the first preset type; $(s, t)$ is the entity pair to be predicted, with $s$ the source node and $t$ the target node; $\gamma$ is the meta-path set; $i$ is the index of a meta-path in $\gamma$; $\sigma(s, t \mid \Pi_i)$ is the similarity measure value of the entity pair to be predicted $(s, t)$ on the $i$-th meta-path $\Pi_i$; $\omega_i$ is the weight of meta-path $\Pi_i$; $\omega_0$ is a correction coefficient; here $\gamma$ is obtained with the method embodiment shown in Fig. 1, and $\omega_i$ is determined according to the third preset model.
The fourth judging module 403 is configured to judge whether the probability is greater than a third preset value and, if so, to determine that the third entity pair is linked by an edge of the preset type;
The third preset value can be set according to the actual application scenario, and is generally 0.7;
For example, when the probability, calculated according to the fourth preset model, that the entity pair to be predicted is linked by an edge of the first preset type is 0.8, this indicates that the link "edge of the first preset type" does exist between the entities of the pair to be predicted.
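The prediction step can be sketched as follows. Because the fourth preset model is not reproduced here, the sketch assumes a logistic combination $\eta = 1 / (1 + e^{-(\omega_0 + \sum_i \omega_i \sigma_i)})$ of the symbols defined above; the function names and sample numbers are illustrative only. The threshold of 0.7 is the typical value given in the text.

```python
import math


def link_probability(sigmas, omega, omega0):
    """Assumed fourth preset model: logistic of omega0 + sum_i omega_i * sigma_i.

    sigmas: similarity values sigma(s, t | path_i), one per meta-path in the set.
    omega: the learned weight of each meta-path; omega0: correction coefficient.
    """
    z = omega0 + sum(w * s for w, s in zip(omega, sigmas))
    return 1.0 / (1.0 + math.exp(-z))


def is_linked(probability, threshold=0.7):
    """The pair is predicted as linked when the probability exceeds the threshold."""
    return probability > threshold


p = link_probability([0.5, 0.2], [2.0, 1.0], 0.0)
```

For these sample values the weighted sum is 1.2, giving a probability of about 0.77, which exceeds 0.7, so the pair would be predicted as linked.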
The link prediction device provided by the embodiment of the present invention can determine an entity pair to be predicted; determine, according to the fourth preset model and the meta-path set, the probability that the entity pair to be predicted is linked by an edge of the first preset type; and judge whether the probability is greater than the third preset value and, if so, determine that the third entity pair is linked by an edge of the preset type. Since the fourth preset model, which serves as the prediction model, is trained with the meta-paths determined by the heterogeneous information network meta-path determining method provided by the embodiment of the present invention, and since the fourth preset model also takes the weight of each meta-path into account, the prediction results obtained with the link prediction device provided by the embodiment of the present invention are more accurate.
It should be noted that the heterogeneous information network meta-path determining method and the link prediction method of the embodiments of the present invention can be implemented by a software program.
Since the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, see the description of the method embodiments.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to be non-exclusive, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes that element.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, see the description of the method embodiments.
The foregoing is only the preferred embodiments of the present invention and is not intended to limit the protection scope of the present invention. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention falls within the protection scope of the present invention.

Claims (10)

1. A method for determining meta-paths in a heterogeneous information network, characterized in that the method comprises:
S101: determining, in a heterogeneous information network, multiple first entity pairs whose meta-paths are to be determined, wherein each first entity pair includes a source node and a target node, and each first entity pair is linked at least by an edge of a first preset type;
S102: determining an initial data structure according to the multiple first entity pairs, the initial data structure including: entity pairs each formed by the source node of a first entity pair and that source node itself;
S103: generating, according to the edge types in the heterogeneous information network, multiple first candidate meta-paths with a hop count of 1, and, after performing steps A to D for each first candidate meta-path, performing step S104:
A. generating, according to the heterogeneous information network, the initial data structure and the first candidate meta-path, multiple second entity pairs linked by the first candidate meta-path; wherein the source node of each second entity pair is the source node of an entity pair in the initial data structure, and the target node of each second entity pair is a node in the heterogeneous information network other than the source nodes of the first entity pairs;
B. calculating, according to a first preset model, the similarity measure value of each second entity pair when linked by the first candidate meta-path, and saving the first candidate meta-path, each second entity pair and the corresponding similarity measure value to a first data structure;
C. calculating the comprehensive similarity score of the first data structure according to a second preset model and saving it to the first data structure;
D. inserting the first data structure into a candidate set;
S104: selecting, according to the magnitude of the comprehensive similarity scores, one data structure from the candidate set, denoted the second data structure; and checking whether the second data structure contains a third entity pair identical to any of the first entity pairs;
S105: if so, saving to a meta-path set, in correspondence, the meta-path in the second data structure that links the third entity pair together with the third entity pair, deleting the second data structure from the candidate set, and performing step S104;
S106: if not, generating multiple third candidate meta-paths according to the second candidate meta-path saved in the second data structure and the edge types in the heterogeneous information network, wherein the hop count of each third candidate meta-path differs from the hop count of the second candidate meta-path by 1; deleting the second data structure from the candidate set; and, after performing steps E to H for each third candidate meta-path, performing step S104;
E. generating, according to the heterogeneous information network, the second data structure and the third candidate meta-path, multiple fourth entity pairs linked by the third candidate meta-path, wherein the source node of each fourth entity pair is the source node of an entity pair in the second data structure, and the target node of each fourth entity pair is a node in the heterogeneous information network other than the source nodes of the first entity pairs;
F. calculating, according to the first preset model, the similarity measure value of each fourth entity pair when linked by the third candidate meta-path, and saving the third candidate meta-path, each fourth entity pair and the corresponding similarity measure value to a third data structure;
G. calculating the comprehensive similarity score of the third data structure according to the second preset model and saving it to the third data structure;
H. inserting the third data structure into the candidate set;
wherein the first preset model is: [formula image not reproduced], where $\sigma(s, t_i \mid \Pi_{1\ldots i})$ denotes the similarity measure value of source node $s$ and target node $t_i$ on meta-path $\Pi_{1\ldots i}$; $\Pi_{1\ldots i}$ denotes the $(i-1)$-hop meta-path linking source node $s$ and target node $t_i$; $I(V_{i-1})$ denotes the set of target nodes reachable by walking from source node $s$ along meta-path $\Pi_{1\ldots i-1}$, and $x$ is a node in $I(V_{i-1})$; $R(x, t_i)$ indicates whether target node $t_i$ can be reached via edge $R_{i-1}$, being 1 if it can and 0 otherwise; $|R(x)|$ denotes the number of nodes that node $x$ can reach via edge $R_{i-1}$;
the second preset model is: [formula image not reproduced], where $S$ denotes the comprehensive similarity score of a data structure; $s$ is a source node, $t$ is a target node reachable via meta-path $\Pi$, and $\tau$ is the number of reachable target nodes; $\sigma(s, t \mid \Pi)$ is the similarity measure value of entity pair $(s, t)$ on meta-path $\Pi$; $r(s) = 1 - \alpha \cdot N$, where $r(s)$ denotes the contribution ability of source node $s$ to the current data structure and is used to balance the selection of data structures, $\alpha$ is the decay factor of the contribution ability, and $N$ is the number of first entity pairs with source node $s$ that are linked by meta-paths saved to the meta-path set.
2. The method according to claim 1, characterized in that, after step C has been performed and before step D is performed, the method further comprises:
I. judging whether the comprehensive similarity score corresponding to the first data structure is not less than a first preset value; if so, performing step D;
and/or, after step G has been performed and before step H is performed, the method further comprises:
J. judging whether the comprehensive similarity score corresponding to the third data structure is not less than the first preset value; if so, performing step H.
3. The method according to claim 1, characterized in that, before step E is performed, the method further comprises:
judging whether the hop count of the third candidate meta-path is not greater than a second preset value; if so, performing step E.
4. The method according to claim 1, characterized in that the method further comprises:
determining, according to a third preset model, the weight corresponding to each meta-path in the meta-path set and saving it, in correspondence, to the meta-path set; the third preset model is:
$$\max\; h = \sum_{x^+ \in q^+} \frac{\ln\big(t(\omega, x^+)\big)}{|q^+|} + \sum_{x^- \in q^-} \frac{\ln\big(1 - t(\omega, x^-)\big)}{|q^-|} - \lVert \omega \rVert_2^2$$
where $h$ denotes the output value of the third preset model; [definition of $t(\omega, x)$: formula image not reproduced]; $x^+$ is the vector formed by the similarity measure values of a positive sample on all meta-paths, and is called a positive value; $x^-$ is the vector formed by the similarity measure values of a negative sample on all meta-paths, and is called a negative value; a positive sample is one of the first entity pairs; a negative sample is a sample with no link, formed by replacing the target node of a positive sample with another node of the same type as that target node; $\omega$ is the vector formed by the weights corresponding to the meta-paths in the meta-path set; $q^+$ is the similarity matrix formed by all positive values $x^+$; $q^-$ is the similarity matrix formed by all negative values $x^-$; the final term of the objective, the norm of $\omega$, is a correction term.
5. A method for performing link prediction by applying the method of claim 4, characterized in that the link prediction method comprises:
determining an entity pair to be predicted;
determining, according to a fourth preset model and the meta-path set, the probability that the entity pair to be predicted is linked by an edge of the first preset type; the fourth preset model is: [formula image not reproduced], where $\eta(s, t \mid \gamma)$ is the probability that the entity pair to be predicted is linked by an edge of the first preset type; $(s, t)$ is the entity pair to be predicted, with $s$ the source node and $t$ the target node; $\gamma$ is the meta-path set; $i$ is the index of a meta-path in $\gamma$; $\sigma(s, t \mid \Pi_i)$ is the similarity measure value of the entity pair to be predicted $(s, t)$ on the $i$-th meta-path $\Pi_i$; $\omega_i$ is the weight of meta-path $\Pi_i$; $\omega_0$ is a correction coefficient;
judging whether the probability is greater than a third preset value and, if so, determining that the third entity pair is linked by an edge of the preset type.
6. A device for determining meta-paths in a heterogeneous information network, characterized in that the device includes: a first determining module, a second determining module, a first trigger module, a third determining module, a first computing module, a second computing module, a first insertion module, a first selection module, a second trigger module, a third trigger module, a fourth determining module, a third computing module, a fourth computing module and a second insertion module,
the first determining module is configured to determine, in a heterogeneous information network, multiple first entity pairs whose meta-paths are to be determined, wherein each first entity pair includes a source node and a target node, and each first entity pair is linked at least by an edge of a first preset type;
the second determining module is configured to determine an initial data structure according to the multiple first entity pairs; the initial data structure includes: entity pairs each formed by the source node of a first entity pair and that source node itself;
Described first trigger module, for according to the limit type in described heterogeneous information network, generation jumping figure is multiple the first of 1 Candidate unit path, each described first candidate unit path is triggered successively the described 3rd determine module, described first computing module, After described second computing module and described first inserts module, trigger described first and select module;
Described 3rd determines module, for according to described heterogeneous information network, described primary data structure body and described first marquis Choosing unit path, generates by multiple second instances pair of described first candidate unit path link;Wherein, the source of described second instance pair Node is the source node of the entity pair in described primary data structure body, and the destination node of described second instance pair is described heterogeneous Node in addition to the source node of described first instance centering in information network;
Described first computing module, for calculating each described second instance to by described first candidate unit road according to the first preset model The similarity measure values during link of footpath;By described first candidate unit path, each described second instance to and the similarity measurements of correspondence Value preserves to the first data structure body;Wherein, described first preset model is: Wherein, σ (s, ti|∏1…i) represent source node s and destination node tiAt unit path ∏1…iOn similarity measure values;∏1…iRepresent Linked source node s and destination node tiFirst path of jumping of i-1, I (Vi-1) represent from source node s start unit path ∏1…i-1The set of the accessibility destination node of upper migration, x is I (Vi-1A node in);R(x,ti) indicate whether to pass through Limit Ri-1Arrive destination node ti, can be 1, be otherwise 0;R (x) represents that node x is by limit Ri-1Accessibility number of network nodes;
Described second computing module, for calculating the comprehensive similarity scores of described first data structure body according to the second preset model And preserve to described first data structure body;Wherein, described second preset model is: Wherein, S represents the comprehensive similarity scores of data structure body;S is source node, t be by unit path ∏ up to destination node, τ is the number up to destination node;σ (s, t | ∏) it is that entity is to (s, t) similarity measure values on unit path ∏;R (s)= 1-α * N, r (s) represent that source node s contributes ability with the selection of balanced structure body for current data structure, and α is contribution energy The degradation factor of power, N represents that the source node preserving the first path link integrated to described unit path is as the described first instance pair of s Number;
Described first inserts module, for described first data structure body is inserted candidate collection;
Described first selects module, for the size according to described comprehensive similarity scores, concentrates from described candidate and selects one Data structure body, is designated as the second data structure body;Whether the second data structure body described in procuratorial work exists and arbitrary described first Entity is to the 3rd identical entity pair;
Described second trigger module, in the case of the inspection result in described first selection module acquisition is for being, by described In second data structure body, correspondence is preserved to unit path by the first path and described 3rd entity that link described 3rd entity pair Collection, deletes the described second data structure body that described candidate is concentrated, and triggers described first selection module;
Described 3rd trigger module, in the case of the described first inspection result selecting module to obtain is no, according to described Limit type in the second candidate unit path preserved in second data structure body and described heterogeneous information network, generates the multiple 3rd Candidate unit path, the jumping figure in described 3rd candidate unit path is 1 with the difference of the jumping figure in described second candidate unit path;Delete described The described second data structure body that candidate is concentrated;Each described 3rd candidate unit path is triggered the described 4th successively and determines mould After block, described 3rd computing module, described 4th computing module and described second insert module, trigger described first and select mould Block;
Described 4th determines module, for according to described heterogeneous information network, described second data structure body and described 3rd marquis Choosing unit path, generates multiple 4th entities pair connected by described 3rd candidate unit path, the source node of described 4th entity pair For the source node of the entity pair in described second data structure body, the destination node of described 4th entity pair is described heterogeneous information Node in addition to the source node of described first instance pair in network;
Described 3rd computing module, for calculating each described 4th entity to by the described 3rd according to described first preset model The similarity measure values during link of candidate unit path, by described 3rd candidate unit path, each described 4th entity to and right The similarity measure values answered preserves to the 3rd data structure body;
Described 4th computing module, for calculating the most similar of described 3rd data structure body according to described second preset model Property mark preserving to described 3rd data structure body;
Described second inserts module, for described 3rd data structure body is inserted described candidate collection.
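The quantities defined for the first preset model (a set $I(V_{i-1})$ of nodes reached so far, an indicator $R(x, t_i)$, and a reachable-node count $R(x)$) resemble one step of a path-constrained random walk, in which the similarity already accumulated at an intermediate node $x$ is split evenly among the nodes that $x$ reaches through the next edge type. The Python sketch below illustrates a single extension step under that assumption. The function name, the dictionary layout and the toy author/paper/venue network are hypothetical, and the filtering of destination nodes described in the claim is omitted for brevity.

```python
from collections import defaultdict


def extend_similarity(prev_sim, edges):
    """Extend similarity values by one hop along a single edge type.

    prev_sim: {(s, x): similarity of s and x on the current meta-path}
    edges:    {x: set of nodes reachable from x via the next edge type}
    ASSUMPTION: each node x passes an equal share of its value to
    every node it can reach (random-walk style propagation).
    """
    next_sim = defaultdict(float)
    for (s, x), value in prev_sim.items():
        neighbours = edges.get(x, ())
        if not neighbours:
            continue  # x is a dead end for this edge type
        share = value / len(neighbours)
        for t in neighbours:
            next_sim[(s, t)] += share
    return dict(next_sim)


# Hypothetical toy network: author -> papers -> venues.
initial = {("a", "a"): 1.0}  # each source node paired with itself
writes = {"a": {"p1", "p2"}}
published_in = {"p1": {"v1"}, "p2": {"v1", "v2"}}

one_hop = extend_similarity(initial, writes)
two_hop = extend_similarity(one_hop, published_in)
```

Starting from pairs of the form (source, source) mirrors the initial data structure described in the claim, so that every longer candidate meta-path can reuse the values computed for its prefix.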
7. The device according to claim 6, characterized in that the device further comprises a first judging module,
The first judging module is configured to judge, after the second computing module is triggered and before the first inserting module is triggered, whether the comprehensive similarity score corresponding to the first data structure is not less than a first preset value, and if so, to trigger the first inserting module;
And/or, the device further comprises a second judging module,
The second judging module is configured to judge, after the fourth computing module is triggered and before the second inserting module is triggered, whether the comprehensive similarity score corresponding to the third data structure is not less than the first preset value, and if so, to trigger the second inserting module.
8. The device according to claim 6, characterized in that the device further comprises a third judging module,
The third judging module is configured to judge, before the fourth determining module is triggered, whether the hop count of the third candidate meta-path is not greater than a second preset value, and if so, to trigger the fourth determining module.
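The interplay of the candidate set, the score-based selection, and the two pruning checks (minimum score and maximum hop count) can be illustrated as a best-first search over candidate meta-paths. The Python sketch below is a simplified illustration only: each data structure is reduced to its meta-path, and the callbacks `expand`, `score` and `covers` are hypothetical stand-ins for the modules that generate longer candidates, compute the comprehensive similarity score, and check for a matching first entity pair.

```python
import heapq


def search_meta_paths(initial, expand, score, covers,
                      min_score=0.0, max_hops=3):
    """Best-first search over candidate meta-paths (tuples of edge types).

    initial:      1-hop candidate meta-paths
    expand(path): candidate meta-paths one hop longer than `path`
    score(path):  comprehensive similarity score (higher is better)
    covers(path): True if the path links at least one first entity pair
    """
    heap = []  # max-heap emulated by pushing negated scores
    for path in initial:
        s = score(path)
        if s >= min_score:  # first preset value check
            heapq.heappush(heap, (-s, path))

    found = []
    while heap:
        _, path = heapq.heappop(heap)  # select and delete the best candidate
        if covers(path):
            found.append(path)  # save to the meta-path set
            continue
        for longer in expand(path):
            if len(longer) > max_hops:  # second preset value check
                continue
            s = score(longer)
            if s >= min_score:
                heapq.heappush(heap, (-s, longer))
    return found
```

Bounding the hop count guarantees that the loop ends when the set of edge types is finite, and the score threshold keeps weak candidates out of the candidate set altogether.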
9. The device according to claim 6, characterized in that the device further comprises a fifth computing module,
The fifth computing module is configured to determine, according to a third preset model, the weight corresponding to each meta-path in the meta-path set and to save it correspondingly to the meta-path set; the third preset model is:
$$\max h = \frac{\sum_{x^{+} \in q^{+}} \ln\bigl(t(\omega, x^{+})\bigr)}{|q^{+}|} + \frac{\sum_{x^{-} \in q^{-}} \ln\bigl(1 - t(\omega, x^{-})\bigr)}{|q^{-}|} - \lVert \omega \rVert_2^2$$
Wherein $h$ denotes the output value of the third preset model; $x^{+}$ is the vector composed of the similarity measure values of a positive example sample $x^{+}$ on all meta-paths, and is called a positive example value; $x^{-}$ is the vector composed of the similarity measure values of a negative example sample $x^{-}$ on all meta-paths, and is called a negative example value; each positive example sample $x^{+}$ is one of the first entity pairs; each negative example sample $x^{-}$ is a sample with no existing link, formed by replacing the destination node of a positive example sample with another node of the same type as that destination node; $\omega$ is the vector composed of the weights corresponding to the meta-paths in the meta-path set; $q^{+}$ is the similarity matrix composed of all positive example values $x^{+}$; $q^{-}$ is the similarity matrix composed of all negative example values $x^{-}$; and the final term $\lVert\omega\rVert_2^2$ is the correction term.
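To make the structure of the third preset model concrete, the Python sketch below evaluates the objective $h$ for a given weight vector: the mean log-likelihood of the positive example values, plus the mean log-likelihood of the negative example values, minus the squared norm of the weights. The definition of $t(\omega, x)$ is not given in this text, so the sketch assumes it is the logistic function of the inner product $\omega \cdot x$. That choice and all function names are assumptions for illustration.

```python
import math


def t(weights, x):
    """ASSUMED form of t(omega, x): logistic function of omega . x."""
    z = sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def objective(weights, positives, negatives):
    """Value of h for the given weights.

    positives: list of positive example values (one vector per sample)
    negatives: list of negative example values (one vector per sample)
    """
    pos_term = sum(math.log(t(weights, x)) for x in positives) / len(positives)
    neg_term = sum(math.log(1.0 - t(weights, x)) for x in negatives) / len(negatives)
    correction = sum(w * w for w in weights)  # squared L2 norm of omega
    return pos_term + neg_term - correction
```

Any general-purpose optimizer could then search for the weights that maximize this value. The correction term penalizes large weights, which discourages the model from relying too heavily on a single meta-path.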
10. A device for performing link prediction by applying the device according to claim 9, characterized in that the link prediction device comprises: a module for determining an entity pair to be predicted, a probability determining module and a fourth judging module,
The module for determining an entity pair to be predicted is configured to determine the entity pair to be predicted;
The probability determining module is configured to determine, according to a fourth preset model and the meta-path set, the probability that the entity pair to be predicted is linked by an edge of the first preset type; the fourth preset model is: [formula image not reproduced]; wherein $\eta(s, t \mid \gamma)$ is the probability that the entity pair to be predicted is linked by an edge of the first preset type; $(s, t)$ is the entity pair to be predicted, where $s$ is the source node and $t$ is the destination node; $\gamma$ is the meta-path set; $i$ is the index of a meta-path in $\gamma$; $\sigma(s, t \mid \Pi_i)$ is the similarity measure value of the entity pair $(s, t)$ on the $i$-th meta-path $\Pi_i$; $\omega_i$ is the weight of meta-path $\Pi_i$; $\omega_0$ is a correction coefficient;
The fourth judging module is configured to judge whether the probability is greater than a third preset value, and if so, to determine that the third entity pair is linked by an edge of the preset type.
CN201610225725.XA 2016-04-12 2016-04-12 Heterogeneous information network element path determines, link prediction method and device Active CN105913125B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610225725.XA CN105913125B (en) 2016-04-12 2016-04-12 Heterogeneous information network element path determines, link prediction method and device

Publications (2)

Publication Number Publication Date
CN105913125A true CN105913125A (en) 2016-08-31
CN105913125B CN105913125B (en) 2018-05-25

Family

ID=56746047

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610225725.XA Active CN105913125B (en) 2016-04-12 2016-04-12 Heterogeneous information network element path determines, link prediction method and device

Country Status (1)

Country Link
CN (1) CN105913125B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050083848A1 (en) * 2003-10-20 2005-04-21 Huai-Rong Shao Selecting multiple paths in overlay networks for streaming data
CN103559320A (en) * 2013-11-21 2014-02-05 北京邮电大学 Method for sequencing objects in heterogeneous network
CN103955535A (en) * 2014-05-14 2014-07-30 南京大学镇江高新技术研究院 Individualized recommending method and system based on element path

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
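YIZHOU SUN, JIAWEI HAN: "Meta-Path-Based Search and Mining in Heterogeneous Information Networks", Tsinghua Science and Technology *
Meng Xiaofeng (孟晓峰): "Research on Similarity Measures Based on Heterogeneous Information Networks", China Master's Theses Full-text Database, Information Science and Technology *
Huang Liwei (黄立威) et al.: "A Meta-Path-Based Link Prediction Model for Heterogeneous Information Networks", Chinese Journal of Computers *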

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951526A (en) * 2017-03-21 2017-07-14 北京邮电大学 A kind of entity set extended method and device
CN106951526B (en) * 2017-03-21 2020-08-07 北京邮电大学 Entity set extension method and device
CN107944629A (en) * 2017-11-30 2018-04-20 北京邮电大学 A kind of recommendation method and device based on heterogeneous information network representation
CN107944629B (en) * 2017-11-30 2020-08-07 北京邮电大学 Recommendation method and device based on heterogeneous information network representation
CN110555050A (en) * 2018-03-30 2019-12-10 华东师范大学 heterogeneous network node representation learning method based on meta-path
CN110555050B (en) * 2018-03-30 2023-03-31 华东师范大学 Heterogeneous network node representation learning method based on meta-path
CN109299285A (en) * 2018-09-11 2019-02-01 中国医学科学院医学信息研究所 A kind of pharmacogenomics knowledge mapping construction method and system
CN109635201A (en) * 2018-12-18 2019-04-16 苏州大学 The heterogeneous cross-platform association user account method for digging of social networks
CN109635201B (en) * 2018-12-18 2020-07-31 苏州大学 Heterogeneous social network cross-platform associated user account mining method
CN109800504A (en) * 2019-01-21 2019-05-24 北京邮电大学 A kind of embedding grammar and device of heterogeneous information network
CN112380434A (en) * 2020-11-16 2021-02-19 吉林大学 Interpretable recommendation system method fusing heterogeneous information network
CN112380434B (en) * 2020-11-16 2022-09-16 吉林大学 Interpretable recommendation method fusing heterogeneous information network

Also Published As

Publication number Publication date
CN105913125B (en) 2018-05-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant