CN106959967A - A training method for a link prediction model and a link prediction method
- Publication number
- CN106959967A (application number CN201610018320.9A)
- Authority
- CN
- China
- Prior art keywords
- feature
- training
- link prediction
- prediction model
- network data
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/12—Discovery or management of network topologies
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/02—Topology update or discovery
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention provides a training method for a link prediction model, the method comprising: Step S1) preprocessing crawled network data and extracting a training set from the preprocessed network data; Step S2) extracting features from the network constructed from the training set and assembling the extracted features into a feature set, the features including neighbor-based features and random-walk-based features; Step S3) transforming the feature set with a gradient boosting decision tree model to obtain a new multidimensional feature set; Step S4) fusing the feature set with the new multidimensional feature set, inputting the result into the link prediction model, and training to obtain the parameters of the link prediction model, thereby obtaining the trained link prediction model. The method expands the number of features starting only from the existing feature set; no new features need to be extracted from the network, which considerably reduces the difficulty of feature extraction and improves the predictive performance and robustness of the model.
Description
Technical field
The present invention relates to the field of computer networks, and in particular to a training method for a link prediction model and a link prediction method.
Background art
With the rapid development of the Internet and mobile communication technology, contact between people has become ever closer. Through the Internet and communication networks, people together form a huge complex network. Interaction, communication and influence between people in this network have merged into every aspect of life. Research on social networks has gradually attracted attention and has become one of the research hotspots of contemporary science. Many people hope, by analyzing the structure and evolution of social networks, to discover the principles that govern contact between nodes, to learn the regularities hidden beneath common phenomena and the relationship between the topological and node-attribute features of a social network and the behavioral trends of its nodes, and thereby to uncover the essence of how social networks evolve. This information can help people allocate resources and process information more effectively, and can guide management, interpretation and decision-making in areas such as commodity production, daily life, population management and natural planning.
One important research topic concerning the behavioral trends of network nodes is link prediction.
Link prediction methods describe the future development trend of a network and can be refined down to predicting connections between individual nodes; they can also find missing or hidden edges in an existing, incomplete network. Traditional link prediction methods generally use network topology features and node attributes and make predictions with machine learning methods. However, bottlenecks in the quality and quantity of the features hinder deeper mining of the properties of the network, so prediction performance is limited.
Summary of the invention
The object of the present invention is to overcome the problems of existing link prediction methods, namely a limited number of features and insufficiently rich feature dimensions, by proposing a training method for a link prediction model. The method applies a model-based transformation to the feature set extracted from the training set to obtain a multidimensional feature set, and fuses the feature set with the multidimensional feature set as the input of the link prediction model. This improves the predictive performance of the model, and using the model for link prediction improves link prediction performance.
To achieve the above object, the present invention provides a training method for a link prediction model, the method comprising:
Step S1) preprocessing crawled network data and extracting a training set from the preprocessed network data;
Step S2) extracting features from the network constructed from the training set and assembling the extracted features into a feature set, the features including neighbor-based features and random-walk-based features;
Step S3) transforming the feature set with a gradient boosting decision tree model to obtain a new multidimensional feature set;
Step S4) fusing the feature set with the new multidimensional feature set, inputting the result into the link prediction model, and training to obtain the parameters of the link prediction model, thereby obtaining the trained link prediction model.
In the above technical solution, step S1) specifically comprises:
Step S1-1) crawling a large amount of network data from the Internet, the network data including the time information of the edges;
Step S1-2) preprocessing the crawled network data, the preprocessing including deleting the network data of isolated nodes or node pairs;
Step S1-3) extracting the training set from the preprocessed network data in chronological order: node pairs that are two hops apart are selected from the preprocessed network data; because positive and negative samples are imbalanced, the node pairs must be sampled so that the numbers of positive and negative samples are equal; the sampled node pairs, once labeled, serve as training samples; the set of all training samples forms the training set.
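Step S1-3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: edges are assumed to be `(u, v, timestamp)` triples, a single hypothetical `split_time` separates the earlier graph from the later edges used for labeling, and the function names `build_adjacency`, `two_hop_pairs` and `make_training_set` are invented for this example. Nodes without any edge never enter the adjacency structure, so they drop out on their own.

```python
import random
from collections import defaultdict

def build_adjacency(edges):
    """Undirected adjacency sets from (u, v, timestamp) triples; self-loops are skipped."""
    adj = defaultdict(set)
    for u, v, _ in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def two_hop_pairs(adj):
    """Unordered node pairs that share a neighbor but are not directly linked."""
    pairs = set()
    for w in adj:
        nbrs = sorted(adj[w])
        for i, u in enumerate(nbrs):
            for v in nbrs[i + 1:]:
                if v not in adj[u]:
                    pairs.add((u, v))  # u < v because nbrs is sorted
    return pairs

def make_training_set(edges, split_time, seed=0):
    """Label two-hop pairs of the earlier graph by whether they link later, then balance."""
    past = [e for e in edges if e[2] <= split_time]
    future = {tuple(sorted((u, v))) for u, v, t in edges if t > split_time}
    candidates = two_hop_pairs(build_adjacency(past))
    positives = sorted(p for p in candidates if p in future)
    negatives = sorted(p for p in candidates if p not in future)
    k = min(len(positives), len(negatives))  # equal class sizes after sampling
    rng = random.Random(seed)
    sample = [(p, 1) for p in rng.sample(positives, k)]
    sample += [(p, 0) for p in rng.sample(negatives, k)]
    return sample
```

Downsampling the larger class to the size of the smaller one is only one way to equalize the classes; the source states the goal (equal numbers of positive and negative samples) but not the sampling scheme.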
In the above technical solution, step S3) is implemented as follows: the feature vectors in the feature set are classified and thereby assigned to leaf nodes of the gradient boosting decision tree model, and the output values of the leaf nodes are taken directly as the new multidimensional feature set.
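The transformation of step S3) can be illustrated with a deliberately small gradient boosting model. The sketch below assumes squared-error boosting with depth-1 trees (stumps), which the source does not specify; `fit_stump`, `fit_gbdt`, `transform`, `n_trees` and `learning_rate` are hypothetical names. Each sample is routed to one leaf per tree, and the output values of those leaves form its new feature vector, one dimension per tree.

```python
def fit_stump(X, residuals):
    """Best single-split regression tree (two leaves) under squared error."""
    mean = sum(residuals) / len(residuals)
    # (error, feature index, threshold, left leaf value, right leaf value);
    # the default sends every sample to the left leaf.
    best = (float("inf"), 0, float("inf"), mean, mean)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if err < best[0]:
                best = (err, j, thr, lv, rv)
    return best[1:]

def leaf_value(stump, row):
    """Output value of the leaf that `row` falls into."""
    j, thr, lv, rv = stump
    return lv if row[j] <= thr else rv

def fit_gbdt(X, y, n_trees=5, learning_rate=0.5):
    """Fit each stump to the residuals left by the stumps before it."""
    pred = [sum(y) / len(y)] * len(X)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * leaf_value(stump, row) for pi, row in zip(pred, X)]
    return stumps

def transform(stumps, X):
    """New feature vector per sample: the leaf output value from every tree."""
    return [[leaf_value(s, row) for s in stumps] for row in X]
```

A common variant one-hot encodes the index of the leaf instead of using its output value; the text above takes the leaf output values directly, so the sketch does the same.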
Based on the link prediction model established by the above method, the present invention also provides a link prediction method, the method comprising:
Step T1) extracting features from the prediction set data to form a prediction feature set;
Step T2) inputting the prediction feature set into the link prediction model, the output being the probability that a node pair will form a link in the future.
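The fusion, training and prediction steps can be sketched as below. The source does not name the type of link prediction model, so logistic regression trained by batch gradient descent is used here purely as an assumption; fusion is assumed to mean concatenating each original feature vector with its transformed counterpart. All function names and the hyperparameters `epochs` and `lr` are hypothetical.

```python
import math

def fuse(original, transformed):
    """Concatenate each original feature vector with its transformed counterpart."""
    return [list(a) + list(b) for a, b in zip(original, transformed)]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, epochs=500, lr=0.5):
    """Batch gradient descent on the log loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - label
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, X):
    """Probability that each node pair forms a link, one value per fused feature vector."""
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) for row in X]
```

At prediction time the same feature extraction and the same fitted tree transformation would have to be applied to the prediction set before calling `predict_proba`, so that the fused vectors have the layout the model was trained on.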
The advantages of the present invention are:
1. The training method for the link prediction model expands the number of features starting only from the existing feature set; no new features need to be extracted from the network, which considerably reduces the difficulty of feature extraction;
2. The link prediction method of the present invention improves link prediction performance;
3. The method of the present invention is not only generally applicable to different networks, but is also robust to training and test sets divided at different points in time.
Brief description of the drawings
Fig. 1 is a flow chart of the training method for the link prediction model of the present invention.
Detailed description of the embodiments
The method of the present invention is described in further detail below with reference to the accompanying drawing.
As shown in Fig. 1, a training method for a link prediction model comprises:
Step S1) preprocessing crawled network data and extracting a training set from the preprocessed network data; this specifically comprises:
Step S1-1) crawling a large amount of network data from the Internet, the network data including the time information of the edges;
Step S1-2) preprocessing the crawled network data, the preprocessing including deleting the network data of isolated nodes or node pairs;
Step S1-3) extracting the training set from the preprocessed network data in chronological order: node pairs that are two hops apart are selected from the preprocessed network data; because positive and negative samples are imbalanced, the node pairs must be sampled so that the numbers of positive and negative samples are equal; the sampled node pairs, once labeled, serve as training samples; the set of all training samples forms the training set;
Step S2) extracting features from the network constructed from the training set and assembling the extracted features into a feature set;
The extracted features include neighbor-based features and random-walk-based features; in this embodiment, the extracted features are Adamic-Adar and Rooted PageRank.
Step S3) transforming the feature set with a gradient boosting decision tree (Gradient Boosting Decision Trees, GBDT) model to obtain a new multidimensional feature set;
The feature vectors in the feature set are classified and thereby assigned to leaf nodes of the gradient boosting decision tree model, and the output values of the leaf nodes are then taken directly as the new multidimensional feature set;
Step S4) fusing the feature set with the new multidimensional feature set, inputting the result into the link prediction model, and training to obtain the parameters of the link prediction model, thereby obtaining the trained link prediction model.
Based on the link prediction model obtained by the above method, the present invention also provides a link prediction method, the method comprising:
Step T1) extracting features from the prediction set data to form a prediction feature set;
Step T2) inputting the prediction feature set into the link prediction model, the output being the probability that a node pair will form a link in the future.
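The two features named in this embodiment can be computed roughly as follows. The formulas are the commonly used definitions, assumed here because the source only names the features: Adamic-Adar sums 1/log(degree) over the common neighbors of a pair, and Rooted PageRank is the visit probability of a random walk that jumps back to the root node with a fixed probability at every step. The graph is assumed to be a dict mapping every node to the set of its neighbors; `restart`, `iterations` and `pair_features` are hypothetical names and parameters.

```python
import math

def adamic_adar(adj, x, y):
    """Sum of 1/log(degree) over the common neighbors of x and y."""
    return sum(1.0 / math.log(len(adj[z]))
               for z in adj[x] & adj[y] if len(adj[z]) > 1)

def rooted_pagerank(adj, root, restart=0.15, iterations=100):
    """Visit probabilities of a random walk that returns to `root` with probability `restart`."""
    rank = {node: 0.0 for node in adj}
    rank[root] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in adj}
        for node, mass in rank.items():
            if not adj[node]:
                nxt[root] += mass  # dead end: send all mass back to the root
                continue
            share = (1.0 - restart) * mass / len(adj[node])
            for nbr in adj[node]:
                nxt[nbr] += share
            nxt[root] += restart * mass
        rank = nxt
    return rank

def pair_features(adj, x, y):
    """Feature vector [Adamic-Adar score, Rooted PageRank of y seen from x]."""
    return [adamic_adar(adj, x, y), rooted_pagerank(adj, x)[y]]
```

Rooted PageRank is not symmetric in the two nodes, so a symmetric feature could be built by adding the scores from both directions; whether the embodiment does so is not stated.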
Claims (4)
1. A training method for a link prediction model, the method comprising:
Step S1) preprocessing crawled network data and extracting a training set from the preprocessed network data;
Step S2) extracting features from the network constructed from the training set and assembling the extracted features into a feature set, the features including neighbor-based features and random-walk-based features;
Step S3) transforming the feature set with a gradient boosting decision tree model to obtain a new multidimensional feature set;
Step S4) fusing the feature set with the new multidimensional feature set, inputting the result into the link prediction model, and training to obtain the parameters of the link prediction model, thereby obtaining the trained link prediction model.
2. The training method for a link prediction model according to claim 1, characterized in that step S1) specifically comprises:
Step S1-1) crawling a large amount of network data from the Internet, the network data including the time information of the edges;
Step S1-2) preprocessing the crawled network data, the preprocessing including deleting the network data of isolated nodes or node pairs;
Step S1-3) extracting the training set from the preprocessed network data in chronological order: node pairs that are two hops apart are selected from the preprocessed network data; because positive and negative samples are imbalanced, the node pairs must be sampled so that the numbers of positive and negative samples are equal; the sampled node pairs, once labeled, serve as training samples; the set of all training samples forms the training set.
3. The training method for a link prediction model according to claim 1, characterized in that step S3) is implemented as follows: the feature vectors in the feature set are classified and thereby assigned to leaf nodes of the gradient boosting decision tree model, and the output values of the leaf nodes are taken directly as the new multidimensional feature set.
4. A link prediction method, implemented on the basis of the link prediction model established by the method of any one of claims 1-3, the method comprising:
Step T1) extracting features from the prediction set data to form a prediction feature set;
Step T2) inputting the prediction feature set into the link prediction model, the output being the probability that a node pair will form a link in the future.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610018320.9A CN106959967B (en) | 2016-01-12 | 2016-01-12 | A training method for a link prediction model and a link prediction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106959967A (en) | 2017-07-18 |
CN106959967B (en) | 2019-11-19 |
Family
ID=59480620
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610018320.9A Active CN106959967B (en) | A training method for a link prediction model and a link prediction method | 2016-01-12 | 2016-01-12 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106959967B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||