CN106959967A - Training method for a link prediction model and link prediction method - Google Patents

Training method for a link prediction model and link prediction method

Info

Publication number
CN106959967A
Authority
CN
China
Prior art keywords
feature
training
link prediction
prediction model
network data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610018320.9A
Other languages
Chinese (zh)
Other versions
CN106959967B (en)
Inventor
张艳
李太松
颜永红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Original Assignee
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS and Beijing Kexin Technology Co Ltd
Priority to CN201610018320.9A
Publication of CN106959967A
Application granted
Publication of CN106959967B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12: Discovery or management of network topologies
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00: Routing or path finding of packets in data switching networks
    • H04L45/02: Topology update or discovery

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a training method for a link prediction model, the method comprising: step S1) preprocessing crawled network data and extracting a training set from the preprocessed network data; step S2) performing feature extraction on the network constructed from the training set and assembling the extracted features into a feature set, the features including neighbor-based features and network-walk-based features; step S3) transforming the feature set with a gradient boosting decision tree (GBDT) model to obtain a new multidimensional feature set; step S4) fusing the feature set with the new multidimensional feature set, inputting the result into the link prediction model, and training to obtain the parameters of the link prediction model, thereby obtaining a trained link prediction model. The method expands the number of features starting only from the existing feature set; no new features need to be extracted from the network, which greatly reduces the difficulty of feature extraction and improves the predictive performance and robustness of the model.

Description

Training method for a link prediction model and link prediction method
Technical field
The present invention relates to the field of computer networks, and in particular to a training method for a link prediction model and a link prediction method.
Background
With the rapid development of the Internet and mobile communication technology, connections between people have become ever closer. Through the Internet and communication networks, people form a huge complex network. The interaction, exchange, and influence among people in this network have become part of every aspect of life. Research on social networks has gradually attracted attention and become one of the research hotspots of contemporary science. Many people hope, by analyzing the structure and evolution of social networks, to discover the principles by which nodes connect, to learn the rules hidden beneath ordinary phenomena, and to understand the relationship between the topological and node-attribute characteristics of a social network and the behavioral trends of its nodes, and thereby to uncover the nature of social network evolution. This information can help people allocate resources and process information more effectively, and can guide management, interpretation, and decision-making in areas such as commodity production, daily life, population management, and natural planning. One important research topic concerning the behavioral trends of network nodes is link prediction.
Link prediction methods describe the future development trend of a network, down to predicting connections between individual nodes; they can also find missing or hidden edges in an existing incomplete network. Traditional link prediction methods generally use network topology features and node attributes and make predictions with machine learning methods. However, because of bottlenecks in the quality and quantity of features, they are ill-suited to mining the deeper attributes of the network, and their prediction performance is therefore limited.
Summary of the invention
The object of the present invention is to overcome the limited number of features and the insufficiently rich feature dimensionality of current link prediction methods. To this end, a training method for a link prediction model is proposed. The method applies a model-based transformation to the feature set extracted from the training set to obtain a multidimensional feature set, and fuses the feature set with the multidimensional feature set as the input of the link prediction model. This improves the predictive performance of the model, and using the model for link prediction improves link prediction performance.
To achieve the above object, the present invention provides a training method for a link prediction model, the method comprising:
Step S1) preprocessing crawled network data, and extracting a training set from the preprocessed network data;
Step S2) performing feature extraction on the network constructed from the training set, and assembling the extracted features into a feature set; the features include neighbor-based features and network-walk-based features;
Step S3) transforming the feature set with a gradient boosting decision tree model to obtain a new multidimensional feature set;
Step S4) fusing the feature set with the new multidimensional feature set, inputting the result into the link prediction model, and training to obtain the parameters of the link prediction model, thereby obtaining a trained link prediction model.
In the above technical solution, step S1) specifically comprises:
Step S1-1) crawling a large amount of network data from the Internet;
the network data includes the time information of the edges;
Step S1-2) preprocessing the crawled network data, the preprocessing including deleting the network data of isolated nodes or node pairs;
Step S1-3) extracting a training set from the preprocessed network data in chronological order;
node pairs that are two hops apart are selected from the preprocessed network data; because positive and negative samples are imbalanced, the node pairs need to be sampled so that the numbers of positive and negative samples are equal; the sampled node pairs are labeled and used as training samples; the collection of all training samples forms the training set.
In the above technical solution, step S3) is implemented as follows: the feature vectors in the feature set are classified and thereby assigned to leaf nodes of the gradient boosting decision tree model, and the output values of those leaf nodes are taken directly as the new multidimensional feature set.
Based on the link prediction model established by the above method, the present invention also provides a link prediction method, the method comprising:
Step T1) extracting the features of the prediction set data to form a prediction feature set;
Step T2) inputting the prediction feature set into the link prediction model; the output is the probability that a node pair will form a connection in the future.
The advantages of the present invention are:
1. The training method of the invention expands the number of features starting only from the existing feature set; no new features need to be extracted from the network, which greatly reduces the difficulty of feature extraction;
2. The link prediction method of the invention improves link prediction performance;
3. The method of the invention is not only applicable across different networks, but is also robust to training and test sets divided at different points in time.
Brief description of the drawings
Fig. 1 is a flow chart of the training method of the link prediction model of the present invention.
Detailed description of the embodiments
The method of the present invention is described in further detail below with reference to the accompanying drawing.
As shown in Fig. 1, a training method for a link prediction model comprises:
Step S1) preprocessing crawled network data, and extracting a training set from the preprocessed network data; specifically:
Step S1-1) crawling a large amount of network data from the Internet;
the network data includes the time information of the edges;
Step S1-2) preprocessing the crawled network data, the preprocessing including deleting the network data of isolated nodes or node pairs;
Step S1-3) extracting a training set from the preprocessed network data in chronological order;
node pairs that are two hops apart are selected from the preprocessed network data; because positive and negative samples are imbalanced, the node pairs need to be sampled so that the numbers of positive and negative samples are equal; the sampled node pairs are labeled and used as training samples; the collection of all training samples forms the training set;
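Steps S1-2) and S1-3) can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the (u, v, t) edge format, the `split_time` parameter, and the rule that a two-hop pair is positive when an edge between its nodes appears after `split_time` are all assumptions, since the text does not spell out how samples are labeled. Isolated nodes never appear in an edge list, so only isolated node pairs are removed explicitly.

```python
import random
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v, t) edges, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v, _ in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def drop_isolated_pairs(edges):
    """Remove edges whose two endpoints are connected only to each other."""
    adj = build_adjacency(edges)
    return [(u, v, t) for u, v, t in edges
            if u != v and (len(adj[u]) > 1 or len(adj[v]) > 1)]


def two_hop_pairs(adj):
    """Return unordered node pairs that share a neighbour but are not adjacent."""
    pairs = set()
    for w in adj:
        for u, v in combinations(sorted(adj[w]), 2):
            if v not in adj[u]:
                pairs.add((u, v))
    return pairs


def make_training_set(edges, split_time, seed=0):
    """Split edges by time, then build a balanced set of labelled two-hop pairs."""
    # Edges up to split_time form the observed network (assumed split rule).
    observed = drop_isolated_pairs([e for e in edges if e[2] <= split_time])
    # Edges after split_time are used only to label candidate pairs (assumption).
    future = {tuple(sorted((u, v))) for u, v, t in edges if t > split_time}
    adj = build_adjacency(observed)
    candidates = sorted(two_hop_pairs(adj))
    positives = [p for p in candidates if p in future]
    negatives = [p for p in candidates if p not in future]
    # Down-sample the larger class so both classes have the same size.
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    samples = [(p, 1) for p in rng.sample(positives, n)]
    samples += [(p, 0) for p in rng.sample(negatives, n)]
    return adj, samples
```

Calling `make_training_set(edges, split_time)` returns the observed adjacency map, from which features can later be extracted, together with a list of `((u, v), label)` training samples.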
Step S2) performing feature extraction on the network constructed from the training set, and assembling the extracted features into a feature set;
the extracted features include neighbor-based features and network-walk-based features; in this embodiment, the extracted features are Adamic-Adar and Rooted PageRank.
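The two features named in this embodiment could be computed as below. The sketch assumes an undirected graph stored as a dict of neighbor sets; the restart probability, the iteration count, and the symmetric sum used for the Rooted PageRank pair score are illustrative choices that the text does not specify.

```python
import math


def adamic_adar(adj, u, v):
    """Sum 1 / log(degree) over the common neighbours of u and v."""
    return sum(1.0 / math.log(len(adj[z]))
               for z in adj[u] & adj[v] if len(adj[z]) > 1)


def rooted_pagerank(adj, root, restart=0.15, iterations=50):
    """Approximate the stationary distribution of a random walk that
    jumps back to `root` with probability `restart` at every step."""
    rank = {node: 0.0 for node in adj}
    rank[root] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in adj}
        for node, mass in rank.items():
            if mass == 0.0 or not adj[node]:
                continue
            # Spread the non-restarting share evenly over the neighbours.
            share = (1.0 - restart) * mass / len(adj[node])
            for neighbour in adj[node]:
                nxt[neighbour] += share
        # The restart mass (and any mass stuck on dead ends) returns to root.
        nxt[root] += 1.0 - sum(nxt.values())
        rank = nxt
    return rank


def pair_features(adj, u, v):
    """Feature vector for a node pair: [Adamic-Adar, symmetric Rooted PageRank]."""
    rpr = rooted_pagerank(adj, u)[v] + rooted_pagerank(adj, v)[u]
    return [adamic_adar(adj, u, v), rpr]
```

Adamic-Adar plays the role of the neighbor-based feature and Rooted PageRank the role of the walk-based feature; `pair_features` simply places the two side by side to form one row of the feature set.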
Step S3) transforming the feature set with a gradient boosting decision tree (Gradient Boosting Decision Trees, GBDT) model to obtain a new multidimensional feature set;
the feature vectors in the feature set are classified and thereby assigned to leaf nodes of the GBDT model, and the output values of those leaf nodes are then taken directly as the new multidimensional feature set;
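A toy version of the transformation in step S3) is sketched below. To stay dependency-free it boosts depth-1 regression trees on squared-error residuals, which is a simplification of a full GBDT; `n_trees`, `learning_rate`, and all function names are hypothetical. Each tree routes a feature vector to one leaf, and that leaf's output value becomes one dimension of the new feature vector.

```python
def fit_stump(X, residuals):
    """Fit a depth-1 regression tree by choosing the (feature, threshold)
    split with the smallest squared error on the residuals."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_mean, right_mean)
    if best is None:
        # No valid split (all rows identical): fall back to a single leaf.
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def stump_output(stump, row):
    """Return the output value of the leaf that `row` falls into."""
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_gbdt(X, y, n_trees=3, learning_rate=0.5):
    """Fit trees one after another, each on the residuals of the ensemble so far."""
    prediction = [sum(y) / len(y)] * len(y)
    trees = []
    for _ in range(n_trees):
        residuals = [target - p for target, p in zip(y, prediction)]
        stump = fit_stump(X, residuals)
        trees.append(stump)
        prediction = [p + learning_rate * stump_output(stump, row)
                      for p, row in zip(prediction, X)]
    return trees


def gbdt_transform(trees, X):
    """Map each feature vector to the leaf output values of all trees."""
    return [[stump_output(tree, row) for tree in trees] for row in X]
```

With `n_trees` trees, every two-dimensional input vector is mapped to an `n_trees`-dimensional vector, which is how the transformation enlarges the feature set without extracting anything new from the network.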
Step S4) fusing the feature set with the new multidimensional feature set, inputting the result into the link prediction model, and training to obtain the parameters of the link prediction model, thereby obtaining a trained link prediction model.
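The text does not name the link prediction model used in step S4). The sketch below assumes logistic regression trained by batch gradient descent, and assumes that fusion means concatenating each original feature vector with its transformed counterpart; both are illustrative choices.

```python
import math


def fuse(original, transformed):
    """Concatenate each original feature vector with its transformed counterpart."""
    return [list(a) + list(b) for a, b in zip(original, transformed)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model, row):
    """Probability that the node pair described by `row` will be linked."""
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)


def train_logistic_regression(X, y, learning_rate=0.5, epochs=500):
    """Learn weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(X, y):
            error = predict_proba((weights, bias), row) - label
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias
```

The returned `(weights, bias)` pair stands in for the trained parameters of the link prediction model; any other probabilistic classifier could take its place.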
Based on the link prediction model obtained by the above method, the present invention also provides a link prediction method, the method comprising:
Step T1) extracting the features of the prediction set data to form a prediction feature set;
Step T2) inputting the prediction feature set into the link prediction model; the output is the probability that a node pair will form a connection in the future.
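Steps T1) and T2) amount to scoring every candidate node pair in the prediction set. The sketch below takes the feature extractor, the feature transformation, and the trained model as hypothetical callables; applying the same transformation and fusion as in training is an assumption, made because the model was trained on fused feature vectors.

```python
def predict_links(pairs, extract_features, transform, predict_proba):
    """Score candidate node pairs and return them sorted by link probability.

    extract_features(pair) -> list of floats   (step T1)
    transform(features)    -> list of floats   (assumed: same transform as training)
    predict_proba(vector)  -> float in [0, 1]  (step T2)
    """
    results = []
    for pair in pairs:
        features = list(extract_features(pair))
        fused = features + list(transform(features))
        results.append((pair, predict_proba(fused)))
    # Most likely future links first.
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Sorting by probability is a convenience for inspecting the most likely future links; the method itself only requires the per-pair probabilities.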

Claims (4)

1. A training method for a link prediction model, the method comprising:
Step S1) preprocessing crawled network data, and extracting a training set from the preprocessed network data;
Step S2) performing feature extraction on the network constructed from the training set, and assembling the extracted features into a feature set; the features include neighbor-based features and network-walk-based features;
Step S3) transforming the feature set with a gradient boosting decision tree model to obtain a new multidimensional feature set;
Step S4) fusing the feature set with the new multidimensional feature set, inputting the result into the link prediction model, and training to obtain the parameters of the link prediction model, thereby obtaining a trained link prediction model.
2. The training method for a link prediction model according to claim 1, characterized in that step S1) specifically comprises:
Step S1-1) crawling a large amount of network data from the Internet;
the network data includes the time information of the edges;
Step S1-2) preprocessing the crawled network data, the preprocessing including deleting the network data of isolated nodes or node pairs;
Step S1-3) extracting a training set from the preprocessed network data in chronological order;
node pairs that are two hops apart are selected from the preprocessed network data; because positive and negative samples are imbalanced, the node pairs need to be sampled so that the numbers of positive and negative samples are equal; the sampled node pairs are labeled and used as training samples; the collection of all training samples forms the training set.
3. The training method for a link prediction model according to claim 1, characterized in that step S3) is implemented as follows: the feature vectors in the feature set are classified and thereby assigned to leaf nodes of the gradient boosting decision tree model, and the output values of those leaf nodes are taken directly as the new multidimensional feature set.
4. A link prediction method, implemented on the basis of a link prediction model established by the method of any one of claims 1-3, the method comprising:
Step T1) extracting the features of the prediction set data to form a prediction feature set;
Step T2) inputting the prediction feature set into the link prediction model; the output is the probability that a node pair will form a connection in the future.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610018320.9A CN106959967B (en) 2016-01-12 2016-01-12 Training method for a link prediction model and link prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610018320.9A CN106959967B (en) 2016-01-12 2016-01-12 Training method for a link prediction model and link prediction method

Publications (2)

Publication Number Publication Date
CN106959967A (en) 2017-07-18
CN106959967B (en) 2019-11-19

Family

ID=59480620

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610018320.9A Training method for a link prediction model and link prediction method 2016-01-12 2016-01-12 (Active, CN106959967B (en))

Country Status (1)

Country Link
CN (1) CN106959967B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214599A (en) * 2018-10-25 2019-01-15 北京师范大学 Method for predicting link of complex network
CN110020379A (en) * 2018-01-04 2019-07-16 中国科学院声学研究所 Link prediction method based on deep dynamic network embedded representation model
CN110083778A (en) * 2019-04-08 2019-08-02 清华大学 Method and device for constructing a graph convolutional neural network that learns disentangled representations
CN110190909A (en) * 2019-06-06 2019-08-30 北京邮电大学 Signal equalization method and device for optical communication
CN110335165A (en) * 2019-06-28 2019-10-15 京东数字科技控股有限公司 Link prediction method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103886169A (en) * 2012-12-19 2014-06-25 电子科技大学 Link prediction algorithm based on AdaBoost
CN104751200A (en) * 2015-04-10 2015-07-01 中国电力科学研究院 SVM network business classification method
CN104767692A (en) * 2015-04-15 2015-07-08 中国电力科学研究院 Network traffic classification method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李勇军 et al.: "Link prediction in microblog propagation networks based on a maximum entropy model", Acta Physica Sinica (《物理学报》) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110020379A (en) * 2018-01-04 2019-07-16 中国科学院声学研究所 Link prediction method based on deep dynamic network embedded representation model
CN110020379B (en) * 2018-01-04 2021-02-09 中国科学院声学研究所 Link prediction method based on deep dynamic network embedded representation model
CN109214599A (en) * 2018-10-25 2019-01-15 北京师范大学 Method for predicting link of complex network
CN109214599B (en) * 2018-10-25 2022-02-15 北京师范大学 Method for predicting link of complex network
CN110083778A (en) * 2019-04-08 2019-08-02 清华大学 Method and device for constructing a graph convolutional neural network that learns disentangled representations
CN110190909A (en) * 2019-06-06 2019-08-30 北京邮电大学 Signal equalization method and device for optical communication
CN110335165A (en) * 2019-06-28 2019-10-15 京东数字科技控股有限公司 Link prediction method and device
CN110335165B (en) * 2019-06-28 2021-03-30 京东数字科技控股有限公司 Link prediction method and device

Also Published As

Publication number Publication date
CN106959967B (en) 2019-11-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant