CN106959967B

CN106959967B - Training method for a link prediction model and link prediction method

Info

Publication number: CN106959967B
Application number: CN201610018320.9A
Authority: CN
Inventors: 张艳; 李太松; 颜永红
Original assignee: Institute of Acoustics CAS; Beijing Kexin Technology Co Ltd
Current assignee: Institute of Acoustics CAS; Beijing Kexin Technology Co Ltd
Priority date: 2016-01-12
Filing date: 2016-01-12
Publication date: 2019-11-19
Anticipated expiration: 2036-01-12
Also published as: CN106959967A

Abstract

The present invention provides a training method for a link prediction model, the method comprising: step S1) pre-processing crawled network data and extracting a training set from the pre-processed network data; step S2) performing feature extraction on the network constructed from the training set and assembling the extracted features into a feature set, the features comprising neighbour-based features and features based on random walks over the network; step S3) transforming the feature set with a gradient boosting decision tree model to obtain a new multi-dimensional feature set; step S4) merging the feature set with the new multi-dimensional feature set, feeding the merged set into the link prediction model, and training to obtain the parameters of the link prediction model, thereby obtaining a trained link prediction model. The method enlarges the feature set starting only from the existing feature set; no new features need to be extracted from the network, which greatly reduces the difficulty of feature extraction, and the predictive performance and robustness of the model are improved.

Description

Training method for a link prediction model and link prediction method

Technical field

The present invention relates to the field of computer networks, and in particular to a training method for a link prediction model and a link prediction method.

Background art

With the rapid development of the Internet and mobile communication technology, people are becoming ever more closely connected. Through the Internet and communication networks, people form a huge complex network. The interactions, exchanges and influence among people in this network have permeated every aspect of life. Research on social networks has gradually attracted attention and has become one of the current research hotspots in science. Many people hope that, by analysing the structure and evolution of social networks, they can discover the principles by which nodes in a network connect, understand the regularities hidden beneath general phenomena and the relationships among the topological features of social networks, the attribute features of nodes and the behavioural trends of network nodes, and thereby uncover the essence of how social networks evolve. This information can help people allocate resources and process information more effectively, and can guide management, interpretation and decision-making in areas such as commodity production, daily life, population management and natural planning. Link prediction is an important research topic concerning the behavioural trends of network nodes.

Link prediction methods describe the future development trend of a network and can predict, at a fine-grained level, the connections between nodes; they can also find missing or hidden edges in an existing incomplete network. Traditional link prediction methods generally use network topology features and node attributes and make predictions with machine learning methods; however, because of bottlenecks in both the quality and the quantity of the features, the attributes of the network cannot be mined in depth, and prediction performance is therefore limited.

Summary of the invention

The object of the present invention is to overcome the limited number of features and the insufficiently rich feature dimensionality of current link prediction methods by providing a training method for a link prediction model. The method transforms the feature set extracted from the training set with a model to obtain a multi-dimensional feature set, and merges the feature set with the multi-dimensional feature set as the input to the link prediction model, which improves the predictive performance of the model; using the model for link prediction improves link prediction performance.

To achieve the above object, the present invention provides a training method for a link prediction model, the method comprising:

Step S1) pre-processing crawled network data, and extracting a training set from the pre-processed network data;

Step S2) performing feature extraction on the network constructed from the training set, and assembling the extracted features into a feature set; the features comprise neighbour-based features and features based on random walks over the network;

Step S3) transforming the feature set with a gradient boosting decision tree model to obtain a new multi-dimensional feature set;

Step S4) merging the feature set with the new multi-dimensional feature set, feeding the merged set into the link prediction model, and training to obtain the parameters of the link prediction model, thereby obtaining a trained link prediction model.

In the above technical solution, step S1) specifically comprises:

Step S1-1) crawling a large amount of network data from the Internet;

The network data includes time information for each edge;

Step S1-2) pre-processing the crawled network data, the pre-processing consisting of deleting network data that contains isolated nodes or isolated node pairs;

Step S1-3) extracting a training set from the pre-processed network data in chronological order;

Node pairs that are two hops apart are selected from the pre-processed network data. Because the positive and negative samples are imbalanced, the node pairs are sampled so that the numbers of positive and negative samples are equal; the sampled node pairs are labelled and used as training samples; the set of all training samples forms the training set.
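The sampling in step S1-3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text does not say how the chronological split is made or how pairs are labelled, so the sketch assumes a single cut-off time `split_time`, treats a two-hop pair as positive if its edge appears after the cut-off, and downsamples the larger class. The function names and the `(u, v, t)` edge format are hypothetical.

```python
import random
from collections import defaultdict

def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def two_hop_pairs(adj):
    """Return unordered node pairs that share a neighbour but are not linked."""
    pairs = set()
    for mid in adj:
        nbrs = sorted(adj[mid])
        for i, u in enumerate(nbrs):
            for v in nbrs[i + 1:]:
                if v not in adj[u]:
                    pairs.add((u, v))
    return pairs

def make_training_set(timed_edges, split_time, seed=0):
    """Split edges by time, label two-hop pairs, and balance the two classes."""
    # Assumption: edges up to split_time form the observed network,
    # later edges tell us which candidate pairs became linked.
    past = [(u, v) for u, v, t in timed_edges if t <= split_time]
    future = {tuple(sorted((u, v))) for u, v, t in timed_edges if t > split_time}
    adj = build_adjacency(past)
    candidates = sorted(two_hop_pairs(adj))
    positives = [p for p in candidates if p in future]
    negatives = [p for p in candidates if p not in future]
    # Downsample the larger class so both classes have the same size.
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    sample = [(p, 1) for p in rng.sample(positives, n)]
    sample += [(p, 0) for p in rng.sample(negatives, n)]
    return adj, sample
```

The returned adjacency map describes the observed network on which features are later computed, and each element of `sample` is a `((u, v), label)` tuple.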

In the above technical solution, step S3) is specifically implemented as follows: the feature vectors in the feature set are classified and thereby assigned to leaf nodes of the gradient boosting decision tree model, and the output values of the leaf nodes are taken directly as the new multi-dimensional feature set.

Based on the link prediction model established by the above method, the present invention also provides a link prediction method, the method comprising:

Step T1) extracting features from the prediction-set data to form a prediction feature set;

Step T2) feeding the prediction feature set into the link prediction model, the output being the probability that a link between the nodes will form in the future.
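One possible reading of this transform is sketched below. It rests on several assumptions that the text does not confirm: the trees are depth-1 stumps fitted to log-loss gradient residuals, and the "output" of a leaf is encoded as a one-hot indicator of which leaf a sample reaches in each tree (using the leaf's numeric value instead would be another reading). The names `fit_stump`, `fit_gbdt` and `leaf_features` are hypothetical.

```python
import math

def fit_stump(X, residuals):
    """Pick the (feature, threshold) split with the lowest squared error."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[f] <= thr]
            right = [r for row, r in zip(X, residuals) if row[f] > thr]
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            err = sum((r - lmean) ** 2 for r in left)
            err += sum((r - rmean) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, f, thr, lmean, rmean)
    if best is None:  # every feature is constant: send all samples left
        mean = sum(residuals) / len(residuals)
        return 0, math.inf, mean, mean
    return best[1:]

def fit_gbdt(X, y, n_trees=5, learning_rate=0.5):
    """Fit a small boosted ensemble of stumps for binary labels y in {0, 1}."""
    scores = [0.0] * len(X)
    trees = []
    for _ in range(n_trees):
        # Negative gradient of the log-loss: label minus predicted probability.
        residuals = [yi - 1.0 / (1.0 + math.exp(-s)) for yi, s in zip(y, scores)]
        f, thr, lval, rval = fit_stump(X, residuals)
        trees.append((f, thr, lval, rval))
        scores = [s + learning_rate * (lval if row[f] <= thr else rval)
                  for row, s in zip(X, scores)]
    return trees

def leaf_features(trees, row):
    """One-hot encode the leaf reached in every tree (two leaves per stump)."""
    out = []
    for f, thr, _, _ in trees:
        out += [1.0, 0.0] if row[f] <= thr else [0.0, 1.0]
    return out
```

With `n_trees` stumps, each input vector yields `2 * n_trees` new dimensions, which is how a short feature vector can be expanded without extracting anything further from the network.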

The advantages of the present invention are as follows:

1. The training method for a link prediction model of the invention enlarges the feature set starting only from the existing feature set; no new features need to be extracted from the network, which greatly reduces the difficulty of feature extraction;

2. The link prediction method of the invention improves link prediction performance;

3. The method of the invention is generally applicable to different networks, and is also robust to training and test sets obtained with different temporal splits.

Brief description of the drawings

Fig. 1 is a flow chart of the training method for a link prediction model according to the invention.

Detailed description of the embodiments

The method of the invention is described in further detail below with reference to the accompanying drawing.

As shown in Fig. 1, a training method for a link prediction model comprises:

Step S1) pre-processing crawled network data, and extracting a training set from the pre-processed network data; specifically comprising:

Step S1-1) crawling a large amount of network data from the Internet;

The network data includes time information for each edge;

Node pairs that are two hops apart are selected from the pre-processed network data. Because the positive and negative samples are imbalanced, the node pairs are sampled so that the numbers of positive and negative samples are equal; the sampled node pairs are labelled and used as training samples; the set of all training samples forms the training set;

Step S2) performing feature extraction on the network constructed from the training set, and assembling the extracted features into a feature set;

The extracted features comprise neighbour-based features and features based on random walks over the network; in this embodiment, the extracted features are Adamic-Adar and Rooted PageRank.
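The two features named in this embodiment can be sketched with their standard definitions. The patent text does not give formulas or parameters, so the restart probability `alpha`, the iteration count and the adjacency format (a dict mapping each node to the set of its neighbours) are assumptions made for illustration.

```python
import math

def adamic_adar(adj, u, v):
    """Sum 1 / log(degree) over the common neighbours of u and v."""
    score = 0.0
    for w in adj[u] & adj[v]:
        degree = len(adj[w])
        if degree > 1:  # guard against log(1) == 0
            score += 1.0 / math.log(degree)
    return score

def rooted_pagerank(adj, root, alpha=0.15, iterations=100):
    """Stationary visit probabilities of a random walk that restarts at root.

    At every step the walker jumps back to root with probability alpha,
    otherwise it moves to a uniformly chosen neighbour.
    """
    rank = {node: 0.0 for node in adj}
    rank[root] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in adj}
        for node, mass in rank.items():
            if mass == 0.0 or not adj[node]:
                continue
            share = (1.0 - alpha) * mass / len(adj[node])
            for nbr in adj[node]:
                nxt[nbr] += share
        # Restart mass (plus any mass stuck on neighbourless nodes) returns to root.
        nxt[root] += 1.0 - sum(nxt.values())
        rank = nxt
    return rank
```

For a candidate pair `(u, v)`, a feature vector could then be `[adamic_adar(adj, u, v), rooted_pagerank(adj, u)[v]]`; whether the score is symmetrised over both directions is a design choice the text leaves open.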

Step S3) transforming the feature set with a gradient boosting decision tree (Gradient Boosting Decision Trees) model to obtain a new multi-dimensional feature set;

The feature vectors in the feature set are classified and thereby assigned to leaf nodes of the gradient boosting decision tree model, and the output values of the leaf nodes are then taken directly as the new multi-dimensional feature set;

Step S4) merging the feature set with the new multi-dimensional feature set, feeding the merged set into the link prediction model, and training to obtain the parameters of the link prediction model, thereby obtaining a trained link prediction model.
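The merge and training in step S4 can be illustrated with a small sketch. The text does not name the type of link prediction model, so logistic regression trained by batch gradient descent is assumed here purely for illustration; `merge_features`, `train_logistic` and `predict_proba` are hypothetical names, and the learning rate and epoch count are arbitrary.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def merge_features(original, transformed):
    """Concatenate each original feature vector with its transformed counterpart."""
    return [list(o) + list(t) for o, t in zip(original, transformed)]

def train_logistic(X, y, epochs=500, lr=0.5):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for row, target in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - target
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, row):
    """Probability that the node pair described by row will become linked."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)
```

At prediction time the same merge is applied to the features of each candidate pair, and `predict_proba` returns the estimated probability of a future link.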

Based on the link prediction model obtained by the above method, the present invention also provides a link prediction method, the method comprising:

Claims

1. A training method for a link prediction model, the method comprising:

Step S2) performing feature extraction on the network constructed from the training set, and assembling the extracted features into a feature set; the features comprise neighbour-based features and features based on random walks over the network;

Step S4) merging the feature set with the new multi-dimensional feature set, feeding the merged set into the link prediction model, and training to obtain the parameters of the link prediction model, thereby obtaining a trained link prediction model;

Step S1) specifically comprises:

Step S1-1) crawling a large amount of network data from the Internet;

The network data includes time information for each edge;

Step S1-2) pre-processing the crawled network data, the pre-processing consisting of deleting network data that contains isolated nodes or isolated node pairs;

Node pairs that are two hops apart are selected from the pre-processed network data; because the positive and negative samples are imbalanced, the node pairs are sampled so that the numbers of positive and negative samples are equal; the sampled node pairs are labelled and used as training samples; the set of all training samples forms the training set;

Step S3) is specifically implemented as follows: the feature vectors in the feature set are classified and thereby assigned to leaf nodes of the gradient boosting decision tree model, and the output values of the leaf nodes are taken directly as the new multi-dimensional feature set.

2. A link prediction method, implemented with the link prediction model established by the method of claim 1, the method comprising:

Step T1) extracting features from the prediction-set data to form a prediction feature set;

Step T2) feeding the prediction feature set into the link prediction model, the output being the probability that a link between the nodes will form in the future.