CN109543893A

CN109543893A - Heterogeneous information network relationship prediction method, readable storage medium, and terminal

Info

Publication number: CN109543893A
Application number: CN201811358462.5A
Authority: CN
Inventors: 陈可佳; 吴桐
Original assignee: Nanjing Post and Telecommunication University
Current assignee: Nanjing Post and Telecommunication University; Nanjing University of Posts and Telecommunications
Priority date: 2018-11-14
Filing date: 2018-11-14
Publication date: 2019-03-29

Abstract

A heterogeneous information network relationship prediction method, a readable storage medium, and a terminal. The method comprises: obtaining an originally input set of target node pairs; preprocessing the set of target node pairs to obtain a corresponding positive example set and unlabeled example set; based on the obtained positive example set, extracting corresponding unlabeled examples from the unlabeled example set to form a corresponding reliable negative example set; training a corresponding heterogeneous information network relationship prediction model using the positive example set and the obtained reliable negative example set; and predicting unknown relationships between nodes in the network to be predicted using the trained model. This scheme can improve the accuracy of relationship prediction between nodes in a heterogeneous information network.

Description

Heterogeneous information network relationship prediction method, readable storage medium, and terminal

Technical field

The invention belongs to the technical field of data analysis, and in particular relates to a heterogeneous information network relationship prediction method, a readable storage medium, and a terminal.

Background art

With the rapid development of science and technology, the ways in which people socialize have become increasingly diverse, and a variety of complex networks have emerged as a result. From ant colonies to social systems, from nervous systems to ecosystems, and from transportation systems to power systems, complex systems in the real world can be topologically approximated as complex networks. Objects in a complex system are abstracted as nodes in the network, and the interactions between objects are abstracted as links between nodes. In complex network research, link prediction has received wide attention from researchers because of its great application value.

At present, most link prediction research targets homogeneous complex networks, that is, networks with a single type of node and a single type of link. However, most real complex networks are heterogeneous, with multiple types of nodes and complex dependencies between nodes. A homogeneous network is essentially a homogeneous slice of a heterogeneous network, so studying only homogeneous networks loses important information. For example, a real social network contains not only user nodes and links representing friendships, but also nodes of types such as logs, words, locations, and timestamps, together with links representing the containment relationship between logs and words, the check-in relationship between logs and places, and so on. Nodes in a medical network have types such as patient, doctor, disease, drug, and hospital site. This information has a potential influence on the prediction of target links. In a heterogeneous information network, the relationship of a node pair can be represented by a single direct link, or by a path that mixes multiple types of nodes and links. The target to be predicted may therefore be a simple link, but is more likely to be a relationship composed of several links. In this way, the link prediction problem extends to a relationship prediction problem.

Summary of the invention

The technical problem solved by the present invention is how to improve the accuracy of relationship prediction between nodes in a heterogeneous information network.

To achieve the above object, the present invention provides a heterogeneous information network relationship prediction method, the method comprising:

obtaining an originally input set of target node pairs;

preprocessing the set of target node pairs to obtain a corresponding positive example set and unlabeled example set;

based on the obtained positive example set, extracting corresponding unlabeled examples from the unlabeled example set to form a corresponding reliable negative example set;

training a corresponding heterogeneous information network relationship prediction model using the positive example set and the obtained reliable negative example set;

predicting unknown relationships between nodes in the network to be predicted using the trained heterogeneous information network relationship prediction model.

Optionally, preprocessing the set of target node pairs comprises:

constructing, for a given network, a set of meta-paths that start from the target node type;

calculating, for each node pair in the set of target node pairs, the path count and random walk features on each corresponding meta-path;

constructing the example corresponding to each node pair from its path count and random walk features on each meta-path, to form an example set;

taking the examples in the example set that correspond to node pairs having the target relationship as positive examples, and the examples that correspond to node pairs without the target relationship as unlabeled examples, to obtain the positive example set and the unlabeled example set.

Optionally, extracting corresponding unlabeled examples from the unlabeled example set based on the obtained positive example set to form a corresponding reliable negative example set comprises:

clustering the positive example set and the unlabeled example set separately, to obtain corresponding local positive clusters and local unlabeled clusters;

calculating the distance from each local positive cluster to each local unlabeled cluster, based on the features of the examples in the local positive clusters and local unlabeled clusters;

having each local positive cluster vote for the local unlabeled clusters whose distance from it is greater than a preset distance threshold, and taking the unlabeled examples in a preset number of top-ranked local unlabeled clusters by total votes as negative examples, to obtain the reliable negative example set.

Optionally, the number of local positive clusters and the number of local unlabeled clusters satisfy the following relationship:

where N denotes the number of local positive clusters, K denotes the number of local unlabeled clusters, |U| denotes the number of examples in the unlabeled example set, and |P| denotes the number of examples in the positive example set.

Optionally, clustering the positive example set and the unlabeled example set comprises:

clustering the positive example set and the unlabeled example set separately using the K-means clustering algorithm.
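The clustering step can be illustrated with a minimal pure-Python sketch of K-means (Lloyd's algorithm). The function names, the iteration cap, and the seeded random initialisation are assumptions made for this illustration; the patent only names the K-means algorithm and does not specify these details. The same routine would be called once on the positive example set and once on the unlabeled example set.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(examples, k, iterations=20, seed=0):
    """Partition `examples` (lists of floats) into at most `k` non-empty clusters.

    Hypothetical helper: initial centroids are sampled from the data with a
    fixed seed, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    centroids = rng.sample(examples, k)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every example to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for example in examples:
            nearest = min(range(k), key=lambda c: squared_distance(example, centroids[c]))
            clusters[nearest].append(example)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            [sum(col) / len(cluster) for col in zip(*cluster)] if cluster else centroids[c]
            for c, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return [cluster for cluster in clusters if cluster]


points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
local_clusters = k_means(points, 2)
```

On this toy input the two well-separated groups end up in separate clusters regardless of which points are drawn as initial centroids.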

Optionally, the distance from each local positive cluster to each local unlabeled cluster is calculated using the following formulas:

and:

where d(LP_i, ULC_j) denotes the distance from the local positive cluster LP_i to the local unlabeled cluster ULC_j, x denotes an example in the local positive cluster LP_i, x_i denotes the i-th feature of an example in LP_i, x' denotes an example in the local unlabeled cluster ULC_j, x'_i denotes the i-th feature of an example in ULC_j, min() denotes the minimum operation, and m denotes the number of features in an example.
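The two formulas referred to above are not reproduced in this text. A plausible reconstruction, assuming the Euclidean distance mentioned in the detailed description and using only the symbols defined above, is sketched below; the exact form in the published patent may differ. Note that, as in the symbol definitions, the index i is used both for the cluster LP_i and for the feature position.

```latex
% Cluster-to-cluster distance: minimum over all example pairs (assumed form)
d(LP_i, ULC_j) = \min_{x \in LP_i,\; x' \in ULC_j} d(x, x')

% Example-to-example distance: Euclidean distance over m features (assumed form)
d(x, x') = \sqrt{\sum_{i=1}^{m} \left( x_i - x'_i \right)^2}
```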

Optionally, training a corresponding heterogeneous information network relationship prediction model using the positive example set and the obtained reliable negative example set comprises:

labeling each example in the positive example set as "+1" and each example in the reliable negative example set as "-1", to form a corresponding training set;

inputting the training set into a preset naive Bayes classifier for training, to obtain the heterogeneous information network relationship prediction model.

An embodiment of the invention also provides a computer-readable storage medium storing computer instructions which, when run, execute the steps of the heterogeneous information network relationship prediction method described in any of the above.

An embodiment of the invention also provides a terminal comprising a memory and a processor, the memory storing computer instructions that can run on the processor, wherein the processor, when running the computer instructions, executes the steps of the heterogeneous information network relationship prediction method described in any of the above.

Compared with the prior art, the beneficial effects of the invention are as follows:

In the above scheme, corresponding unlabeled examples are extracted from the unlabeled example set based on the obtained positive example set to form a corresponding reliable negative example set, and the positive example set and the obtained reliable negative example set are used to train a corresponding heterogeneous information network relationship prediction model. This improves the credibility of the negative examples in the negative example set, which in turn improves the accuracy of the trained model and hence the accuracy of node relationship prediction.

Brief description of the drawings

In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without any creative effort.

Fig. 1 is a flow diagram of a heterogeneous information network relationship prediction method according to an embodiment of the present invention;

Fig. 2 is a structural schematic diagram of a heterogeneous information network relationship prediction device according to an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. The described embodiments are obviously only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present application, without creative effort, fall within the protection scope of the present application. Directional indications in the embodiments of the present invention (such as up, down, left, right, front, and back) are only used to explain the relative positions, movements, and so on of the components in a particular pose (as shown in the drawings); if that pose changes, the directional indications change accordingly.

As stated in the background art, most existing link/relationship prediction methods for complex networks use a supervised learning framework: node pairs in the network that have the target link/relationship are labeled as positive examples, and all, or a randomly selected part, of the node pairs that do not have the target link/relationship are labeled as negative examples. A large amount of labeled positive and negative training data is needed during training to improve the prediction accuracy of the classifier. However, node pairs labeled as negative examples may form the target link/relationship in the future, so they are not necessarily credible negative examples. In addition, complex networks are generally large, so sampling is used to turn a big-data problem into a small-data problem, and extracting negative examples at random is likely to reduce the performance of the trained prediction model. Meanwhile, the numbers of positive samples (node pairs that have the target link/relationship) and unlabeled samples (node pairs that do not yet have the target link/relationship) are extremely imbalanced: there is a large number of unlabeled samples, and how to select representative and credible negative examples from them is a problem worth studying.

In the technical solution of the present invention, corresponding unlabeled examples are extracted from the unlabeled example set based on the obtained positive example set to form a corresponding reliable negative example set, and the positive example set and the obtained reliable negative example set are used to train a corresponding heterogeneous information network relationship prediction model. This improves the credibility of the negative examples in the negative example set, which in turn improves the accuracy of the trained model and hence the accuracy of node relationship prediction.

To make the above objects, features, and beneficial effects of the invention clearer and easier to understand, specific embodiments of the invention are described in detail below with reference to the drawings.

Fig. 1 is a flow diagram of a heterogeneous information network relationship prediction method according to an embodiment of the present invention. Referring to Fig. 1, a heterogeneous information network relationship prediction method may specifically include the following steps:

Step S101: obtain an originally input set of target node pairs.

Step S102: preprocess the set of target node pairs to obtain a corresponding positive example set and unlabeled example set.

In a specific implementation, when the acquired, originally input set of target node pairs is preprocessed, a set of meta-paths starting from the target node type is first constructed for the given network. Then, according to the constructed meta-path set, the path count and random walk features of each target node pair on each corresponding meta-path are calculated, and these features are used as the elements of the example for that node pair. This yields the example corresponding to each node pair and thus the corresponding example set. After the example set is formed, the examples corresponding to node pairs that have the target relationship are added to the positive example set as positive examples, and the examples corresponding to node pairs that do not have the target relationship are added to the unlabeled example set as unlabeled examples, finally giving the positive example set and the unlabeled example set.
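The feature construction in step S102 can be sketched as follows. This is a minimal illustration under stated assumptions: the toy author/paper network, the `EDGES` layout, and the function names are hypothetical, and the random walk feature is taken to be the path count to the target divided by the total number of paths leaving the source along the same meta-path, which is a common definition but is not spelled out in the text.

```python
from collections import defaultdict

# Hypothetical toy network: (source_type, target_type) -> {source_node: [target_nodes]}.
EDGES = {
    ("author", "paper"): {"a1": ["p1", "p2"], "a2": ["p2", "p3"], "a3": ["p3"]},
    ("paper", "author"): {"p1": ["a1"], "p2": ["a1", "a2"], "p3": ["a2", "a3"]},
}

# One meta-path starting from the target node type (author - paper - author).
APA = ("author", "paper", "author")


def path_counts(start, meta_path):
    """Count path instances from `start` to every reachable end node along `meta_path`."""
    counts = {start: 1}
    for src_type, dst_type in zip(meta_path, meta_path[1:]):
        adjacency = EDGES[(src_type, dst_type)]
        next_counts = defaultdict(int)
        for node, n_paths in counts.items():
            for neighbour in adjacency.get(node, []):
                next_counts[neighbour] += n_paths
        counts = dict(next_counts)
    return counts


def pair_features(source, target, meta_paths):
    """Build the example for one node pair: [path count, random walk] per meta-path."""
    features = []
    for meta_path in meta_paths:
        counts = path_counts(source, meta_path)
        path_count = counts.get(target, 0)
        total = sum(counts.values())
        # Assumed random walk feature: fraction of paths from `source` that end at `target`.
        random_walk = path_count / total if total else 0.0
        features.extend([path_count, random_walk])
    return features


example_a1_a2 = pair_features("a1", "a2", [APA])
example_a1_a3 = pair_features("a1", "a3", [APA])
```

Each resulting feature list is one example; it would go into the positive example set if the node pair already has the target relationship, and into the unlabeled example set otherwise.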

Step S103: based on the obtained positive example set, extract corresponding unlabeled examples from the unlabeled example set to form a corresponding reliable negative example set.

In a specific implementation, since the labeled data in PU (positive-unlabeled) learning contains only positive examples, how to use these positive examples to extract reliable negative examples from the unlabeled set is critical.

In an embodiment of the present invention, a clustering algorithm, such as the K-means clustering algorithm, is first used to cluster the positive example set and the unlabeled example set separately, giving N local positive clusters and K local unlabeled clusters. The numbers of local positive clusters and local unlabeled clusters obtained satisfy the following relationship:

where N denotes the number of local positive clusters, K denotes the number of local unlabeled clusters, |U| denotes the number of examples in the unlabeled example set, and |P| denotes the number of examples in the positive example set.

After the N local positive clusters and K local unlabeled clusters are obtained, a distance measure such as Euclidean distance is used to calculate the distance from each local positive cluster to each local unlabeled cluster. In an embodiment of the present invention, the distance from each local positive cluster to each local unlabeled cluster is calculated using the following formulas:

and:

Once the distance from each local positive cluster to each local unlabeled cluster has been calculated, each distance is compared with a preset distance threshold. A local unlabeled cluster whose distance from a local positive cluster is greater than the preset distance threshold is treated as far from that local positive cluster, and each local positive cluster votes for each of the local unlabeled clusters that are far from it. Each local positive cluster casts the same number of votes for each of its far local unlabeled clusters. When voting ends, the total votes of each local unlabeled cluster are counted and the local unlabeled clusters are sorted by total votes from high to low. The unlabeled examples in the first preset number of local unlabeled clusters are added to the reliable negative example set as negative examples, which finally gives the reliable negative example set.
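The voting procedure described above can be sketched in Python. This is an illustration only: the function names and the toy clusters are hypothetical, the cluster-to-cluster distance is assumed to be the minimum Euclidean distance over example pairs, and each vote is assumed to count as one.

```python
import math


def cluster_distance(positive_cluster, unlabeled_cluster):
    """Assumed distance: minimum Euclidean distance over all example pairs."""
    return min(
        math.dist(x, x_prime)
        for x in positive_cluster
        for x_prime in unlabeled_cluster
    )


def reliable_negatives(positive_clusters, unlabeled_clusters, distance_threshold, top_n):
    """Return (reliable negative examples, vote totals per local unlabeled cluster)."""
    votes = [0] * len(unlabeled_clusters)
    for positive_cluster in positive_clusters:
        for j, unlabeled_cluster in enumerate(unlabeled_clusters):
            # A local positive cluster votes only for clusters farther than the threshold.
            if cluster_distance(positive_cluster, unlabeled_cluster) > distance_threshold:
                votes[j] += 1
    # Rank local unlabeled clusters by total votes, highest first (ties keep input order).
    ranked = sorted(range(len(unlabeled_clusters)), key=lambda j: votes[j], reverse=True)
    negatives = []
    for j in ranked[:top_n]:
        negatives.extend(unlabeled_clusters[j])
    return negatives, votes


positive_clusters = [[[0.0, 0.0], [1.0, 0.0]], [[0.0, 1.0]]]
unlabeled_clusters = [[[0.5, 0.5]], [[10.0, 10.0], [9.0, 9.0]], [[3.0, 0.0]]]
negatives, votes = reliable_negatives(
    positive_clusters, unlabeled_clusters, distance_threshold=1.5, top_n=1
)
```

In this toy run the unlabeled cluster that sits among the positives receives no votes, so its examples are never selected as negatives; this is the intended effect of filtering by distance before voting.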

Step S104: train a corresponding heterogeneous information network relationship prediction model using the positive example set and the obtained reliable negative example set.

In a specific implementation, once the positive example set and the reliable negative example set have been obtained, they are used for training to obtain the corresponding heterogeneous information network relationship prediction model. Specifically, each example in the positive example set is first labeled "+1" and each example in the reliable negative example set is labeled "-1" to form a corresponding training set; the training set is then input into a preset naive Bayes classifier for training, giving the heterogeneous information network relationship prediction model. To further improve the accuracy of the model, a test set can be used to evaluate the performance of the obtained model and select the optimal model parameters, so that the model achieves its best relationship prediction performance after testing.
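The training step can be sketched with a small pure-Python naive Bayes classifier. The text does not say which naive Bayes variant is used; a Gaussian likelihood per feature is assumed here because the features are numeric. The function names, the variance floor, and the toy data are likewise assumptions of this sketch.

```python
import math


def train_gaussian_nb(examples, labels, var_floor=1e-9):
    """Fit a class prior and per-feature Gaussian parameters for each label."""
    model = {}
    for label in set(labels):
        rows = [x for x, y in zip(examples, labels) if y == label]
        columns = list(zip(*rows))
        means = [sum(col) / len(rows) for col in columns]
        # A small floor keeps the variance positive when a feature is constant.
        variances = [
            sum((v - mu) ** 2 for v in col) / len(rows) + var_floor
            for col, mu in zip(columns, means)
        ]
        model[label] = (len(rows) / len(examples), means, variances)
    return model


def predict(model, example):
    """Return the label with the highest log posterior for `example`."""
    def log_posterior(label):
        prior, means, variances = model[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
            for value, mu, var in zip(example, means, variances)
        )
        return math.log(prior) + log_likelihood
    return max(model, key=log_posterior)


# Toy training set: positives labeled +1, reliable negatives labeled -1.
positives = [[3.0, 0.6], [4.0, 0.5], [5.0, 0.7]]
reliable_negs = [[0.0, 0.0], [1.0, 0.1], [0.0, 0.05]]
training_set = positives + reliable_negs
labels = [+1] * len(positives) + [-1] * len(reliable_negs)
model = train_gaussian_nb(training_set, labels)
```

Calling `predict(model, features)` on the feature vector of an unconnected node pair then gives +1 if the target relationship is predicted and -1 otherwise.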

Step S105: predict unknown relationships between nodes in the network to be predicted using the trained heterogeneous information network relationship prediction model.

In a specific implementation, once the heterogeneous information network relationship prediction model has been trained, it can be used to predict the unknown part of the current network, that is, to predict the unknown connection relationships between currently unconnected target node pairs, giving the new network after prediction.

The heterogeneous information network relationship prediction method in the embodiments of the present invention has been described in detail above; the device corresponding to the method is introduced below.

Fig. 2 shows a structural schematic diagram of a heterogeneous information network relationship prediction device in an embodiment of the present invention. Referring to Fig. 2, a heterogeneous information network relationship prediction device 20 may include a set acquiring unit 201, a set preprocessing unit 202, a set construction unit 203, a model training unit 204, and a relationship prediction unit 205, in which:

The set acquiring unit 201 is adapted to obtain an originally input set of target node pairs.

The set preprocessing unit 202 is adapted to preprocess the set of target node pairs to obtain a corresponding positive example set and unlabeled example set.

The set construction unit 203 is adapted to extract, based on the obtained positive example set, corresponding unlabeled examples from the unlabeled example set to form a corresponding reliable negative example set.

The model training unit 204 is adapted to train a corresponding heterogeneous information network relationship prediction model using the positive example set and the obtained reliable negative example set.

The relationship prediction unit 205 is adapted to predict unknown relationships between nodes in the network to be predicted using the trained heterogeneous information network relationship prediction model.

In a specific implementation, the set preprocessing unit 202 is adapted to: construct, for a given network, a set of meta-paths starting from the target node type; calculate, for each node pair in the set of target node pairs, the path count and random walk features on each corresponding meta-path; construct the example corresponding to each node pair from its path count and random walk features on each meta-path, to form an example set; and take the examples corresponding to node pairs that have the target relationship as positive examples and the examples corresponding to node pairs without the target relationship as unlabeled examples, to obtain the positive example set and the unlabeled example set.

In a specific implementation, the set construction unit 203 is adapted to: cluster the positive example set and the unlabeled example set separately, to obtain corresponding local positive clusters and local unlabeled clusters; calculate the distance from each local positive cluster to each local unlabeled cluster, based on the features of the examples in the local positive clusters and local unlabeled clusters; and have each local positive cluster vote for the local unlabeled clusters whose distance from it is greater than a preset distance threshold, taking the unlabeled examples in a preset number of top-ranked local unlabeled clusters by total votes as negative examples, to obtain the reliable negative example set. In an embodiment of the present invention, the numbers of local positive clusters and local unlabeled clusters satisfy the following relationship:

where N denotes the number of local positive clusters, K denotes the number of local unlabeled clusters, |U| denotes the number of examples in the unlabeled example set, and |P| denotes the number of examples in the positive example set.

In an embodiment of the present invention, the set construction unit 203 is adapted to cluster the positive example set and the unlabeled example set separately using the K-means clustering algorithm.

In an embodiment of the present invention, the set construction unit 203 is adapted to calculate the distance from each local positive cluster to each local unlabeled cluster using the following formulas:

and:

In a specific implementation, the model training unit 204 is adapted to label each example in the positive example set as "+1" and each example in the reliable negative example set as "-1", to form a corresponding training set, and to input the training set into a preset naive Bayes classifier for training, to obtain the heterogeneous information network relationship prediction model.

An embodiment of the invention also provides a computer-readable storage medium storing computer instructions which, when run, execute the steps of the heterogeneous information network relationship prediction method. For the heterogeneous information network relationship prediction method, refer to the description above; it is not repeated here.

An embodiment of the invention also provides a terminal comprising a memory and a processor, the memory storing computer instructions that can run on the processor, wherein the processor, when running the computer instructions, executes the steps of the heterogeneous information network relationship prediction method. For the heterogeneous information network relationship prediction method, refer to the description above; it is not repeated here.

With the above scheme in the embodiments of the present invention, corresponding unlabeled examples are extracted from the unlabeled example set based on the obtained positive example set to form a corresponding reliable negative example set, and the positive example set and the obtained reliable negative example set are used to train a corresponding heterogeneous information network relationship prediction model. This improves the credibility of the negative examples in the negative example set, which in turn improves the accuracy of the trained model and hence the accuracy of node relationship prediction.

The basic principles, main features, and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited by the above embodiments; the embodiments and the description only illustrate the principles of the invention. Various changes and improvements may be made to the invention without departing from its spirit and scope. The scope of protection claimed by the invention is defined by the appended claims and their equivalents.

Claims

1. A heterogeneous information network relationship prediction method, characterized by comprising:

obtaining an originally input set of target node pairs;

preprocessing the set of target node pairs to obtain a corresponding positive example set and unlabeled example set;

based on the obtained positive example set, extracting corresponding unlabeled examples from the unlabeled example set to form a corresponding reliable negative example set;

training a corresponding heterogeneous information network relationship prediction model using the positive example set and the obtained reliable negative example set;

predicting unknown relationships between nodes in the network to be predicted using the trained heterogeneous information network relationship prediction model.

2. The heterogeneous information network relationship prediction method according to claim 1, characterized in that preprocessing the set of target node pairs comprises:

constructing, for a given network, a set of meta-paths that start from the target node type;

calculating, for each node pair in the set of target node pairs, the path count and random walk features on each corresponding meta-path;

constructing the example corresponding to each node pair from its path count and random walk features on each meta-path, to form an example set;

taking the examples in the example set that correspond to node pairs having the target relationship as positive examples, and the examples that correspond to node pairs without the target relationship as unlabeled examples, to obtain the positive example set and the unlabeled example set.

3. The heterogeneous information network relationship prediction method according to claim 1, characterized in that extracting corresponding unlabeled examples from the unlabeled example set based on the obtained positive example set to form a corresponding reliable negative example set comprises:

clustering the positive example set and the unlabeled example set separately, to obtain corresponding local positive clusters and local unlabeled clusters;

calculating the distance from each local positive cluster to each local unlabeled cluster, based on the features of the examples in the local positive clusters and local unlabeled clusters;

having each local positive cluster vote for the local unlabeled clusters whose distance from it is greater than a preset distance threshold, and taking the unlabeled examples in a preset number of top-ranked local unlabeled clusters by total votes as negative examples, to obtain the reliable negative example set.

4. The heterogeneous information network relationship prediction method according to claim 3, characterized in that the numbers of local positive clusters and local unlabeled clusters satisfy the following relationship:

where N denotes the number of local positive clusters, K denotes the number of local unlabeled clusters, |U| denotes the number of examples in the unlabeled example set, and |P| denotes the number of examples in the positive example set.

5. The heterogeneous information network relationship prediction method according to claim 3, characterized in that clustering the positive example set and the unlabeled example set comprises:

clustering the positive example set and the unlabeled example set separately using the K-means clustering algorithm.

6. The heterogeneous information network relationship prediction method according to claim 3, characterized in that the distance from each local positive cluster to each local unlabeled cluster is calculated using the following formulas:

and:

where d(LP_i, ULC_j) denotes the distance from the local positive cluster LP_i to the local unlabeled cluster ULC_j, x denotes an example in the local positive cluster LP_i, x_i denotes the i-th feature of an example in LP_i, x' denotes an example in the local unlabeled cluster ULC_j, x'_i denotes the i-th feature of an example in ULC_j, min() denotes the minimum operation, and m denotes the number of features in an example.

7. The heterogeneous information network relationship prediction method according to claim 3, characterized in that training a corresponding heterogeneous information network relationship prediction model using the positive example set and the obtained reliable negative example set comprises:

labeling each example in the positive example set as "+1" and each example in the reliable negative example set as "-1", to form a corresponding training set;

inputting the training set into a preset naive Bayes classifier for training, to obtain the heterogeneous information network relationship prediction model.

8. A computer-readable storage medium storing computer instructions, characterized in that the computer instructions, when run, execute the steps of the heterogeneous information network relationship prediction method according to any one of claims 1 to 7.

9. A terminal, characterized by comprising a memory and a processor, the memory storing computer instructions that can run on the processor, wherein the processor, when running the computer instructions, executes the steps of the heterogeneous information network relationship prediction method according to any one of claims 1 to 7.