CN106959967B - Training method for a link prediction model and link prediction method - Google Patents


Info

Publication number
CN106959967B
Authority
CN
China
Prior art keywords
feature
training
link prediction
prediction model
network data
Prior art date
2016-01-12
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610018320.9A
Other languages
Chinese (zh)
Other versions
CN106959967A (en)
Inventor
张艳
李太松
颜永红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Original Assignee
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-01-12
Filing date
2016-01-12
Publication date
2019-11-19
Application filed by Institute of Acoustics CAS, Beijing Kexin Technology Co Ltd
Priority to CN201610018320.9A
Publication of CN106959967A
Application granted
Publication of CN106959967B
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12 Discovery or management of network topologies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/02 Topology update or discovery

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present invention provides a training method for a link prediction model. The method comprises: step S1) preprocessing crawled network data and extracting a training set from the preprocessed network data; step S2) performing feature extraction on the network constructed from the training set and composing the extracted features into a feature set, the features including neighbor-based features and random-walk-based features; step S3) transforming the feature set with a gradient boosting decision tree model to obtain a new multidimensional feature set; and step S4) merging the feature set with the new multidimensional feature set as the input of the link prediction model, and training to obtain the parameters of the link prediction model, thereby obtaining a trained link prediction model. The method expands the feature set starting only from the existing features, without extracting new features from the network again, which considerably reduces the difficulty of feature extraction; it also improves the prediction performance and robustness of the model.

Description

Training method for a link prediction model and link prediction method
Technical field
The present invention relates to the field of computer networks, and in particular to a training method for a link prediction model and a link prediction method.
Background art
With the rapid development of the Internet and mobile communication technology, people have become increasingly closely connected. Through the Internet and communication networks, people form a huge complex network, and their interaction, communication, and influence within this network now reach into every aspect of life. Research on social networks has accordingly attracted growing attention and has become one of the current research hotspots in science. Many researchers hope to analyze the structure and evolution of social networks in order to discover the principles by which nodes connect, uncover the rules hidden beneath general phenomena, understand the relationships among network topology, node attributes, and node behavior trends, and thereby find the essence of how social networks evolve. Such information can help people allocate resources and process information more effectively, and can guide management, interpretation, and decision-making in commodity production, daily life, population management, natural-environment planning, and other areas. One important research topic concerning node behavior trends is link prediction.
Link prediction methods describe how a network will develop in the future and can predict connections between individual nodes; they can also recover missing or hidden edges in an existing, incomplete network. Traditional link prediction methods generally combine network topological features and node attributes with machine learning to make predictions. However, because the quality and quantity of available features form a bottleneck, these methods cannot mine the properties of the network in depth, which limits their prediction performance.
Summary of the invention
The object of the present invention is to overcome the problem that current link prediction methods have a limited number of features and insufficiently rich feature dimensions. To this end, the invention proposes a training method for a link prediction model: the feature set extracted from the training set is transformed by a model to obtain a multidimensional feature set, and the feature set and the multidimensional feature set are merged to form the input of the link prediction model. This improves the prediction performance of the model, and using the model for link prediction improves link prediction performance.
To achieve the above object, the present invention provides a training method for a link prediction model, the method comprising:
Step S1) preprocess the crawled network data and extract a training set from the preprocessed network data;
Step S2) perform feature extraction on the network constructed from the training set, and compose the extracted features into a feature set; the features include neighbor-based features and random-walk-based features;
Step S3) transform the feature set with a gradient boosting decision tree (GBDT) model to obtain a new multidimensional feature set;
Step S4) merge the feature set and the new multidimensional feature set as the input of the link prediction model, and train to obtain the parameters of the link prediction model, thereby obtaining the trained link prediction model.
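The patent does not name the classifier used as the link prediction model. As a minimal sketch of step S4, assuming a logistic regression model trained by batch gradient descent, the original feature vectors and the GBDT-derived vectors for the same node pairs are concatenated and the model's weights and bias are fitted. All function names and toy values below are hypothetical.

```python
import math

def merge_features(base_features, gbdt_features):
    """Step S4 (sketch): concatenate each original vector with its GBDT-derived vector."""
    if len(base_features) != len(gbdt_features):
        raise ValueError("both feature sets must describe the same node pairs")
    return [list(a) + list(b) for a, b in zip(base_features, gbdt_features)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Assumed model choice: fit weights and bias by batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - label
            for k in range(d):
                grad_w[k] += err * row[k]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Toy inputs (hypothetical): one original feature and one GBDT leaf output per node pair.
base = [[0.0], [0.2], [0.8], [1.0]]
gbdt = [[-0.25], [-0.25], [0.25], [0.25]]
labels = [0, 0, 1, 1]
X = merge_features(base, gbdt)
w, b = train_logistic_regression(X, labels)
```

Any probabilistic classifier could take the merged vectors; logistic regression is used here only because its output is directly readable as a probability.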
In the above technical solution, step S1) specifically includes:
Step S1-1) crawl a large amount of network data from the Internet;
the network data includes temporal information for each edge;
Step S1-2) preprocess the crawled network data, the preprocessing deleting network data that contains isolated nodes or isolated node pairs;
Step S1-3) extract the training set from the preprocessed network data in chronological order;
select node pairs that are two hops apart from the preprocessed network data; because positive and negative samples are imbalanced, the node pairs are sampled so that the numbers of positive and negative samples are equal; the sampled node pairs are labeled and used as training samples, and the set of all training samples forms the training set.
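Steps S1-2 and S1-3 can be sketched as follows, under stated assumptions. The patent does not define how samples are labeled; this sketch assumes that edges with a timestamp up to a hypothetical `split_time` form the observed graph, and that a two-hop pair is positive if it becomes linked after `split_time` and negative otherwise. It also reads "isolated node pair" as two nodes linked only to each other, and balances the classes by random undersampling. Function names are illustrative.

```python
import random
from collections import defaultdict

def adjacency(edges):
    """Undirected adjacency {node: set(neighbors)} built from (u, v, t) edges."""
    adj = defaultdict(set)
    for u, v, _t in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def drop_isolated(nodes, edges):
    """S1-2 (sketch): remove degree-0 nodes and pairs of nodes linked only to each other."""
    adj = adjacency(edges)
    all_nodes = set(nodes) | set(adj)
    drop = set()
    for n in all_nodes:
        nbrs = adj.get(n, set())
        if not nbrs:
            drop.add(n)  # isolated node
        elif len(nbrs) == 1:
            (m,) = nbrs
            if adj[m] == {n}:
                drop.update((n, m))  # isolated node pair
    kept_edges = [e for e in edges if e[0] not in drop and e[1] not in drop]
    return all_nodes - drop, kept_edges

def sample_training_set(edges, split_time, seed=0):
    """S1-3 (sketch): chronological split, two-hop candidate pairs, balanced labels."""
    past = adjacency([e for e in edges if e[2] <= split_time])
    future = {frozenset(e[:2]) for e in edges if e[2] > split_time}
    candidates = set()
    for nbrs in past.values():  # any two neighbors of a common node...
        for x in nbrs:
            for y in nbrs:
                if x != y and y not in past[x]:  # ...not yet linked are two hops apart
                    candidates.add(tuple(sorted((x, y))))
    candidates = sorted(candidates)
    pos = [p for p in candidates if frozenset(p) in future]
    neg = [p for p in candidates if frozenset(p) not in future]
    k = min(len(pos), len(neg))  # undersample the larger class
    rng = random.Random(seed)
    return [(p, 1) for p in rng.sample(pos, k)] + [(p, 0) for p in rng.sample(neg, k)]
```

For example, with edges `(1,2,1), (2,3,1), (3,4,2), (4,5,2), (1,3,5)` and `split_time=2`, the candidates are `(1,3)`, `(2,4)` and `(3,5)`; only `(1,3)` links later, so one positive and one randomly chosen negative are kept.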
In the above technical solution, step S3) is implemented as follows: each feature vector in the feature set is classified and thereby assigned to leaf nodes of the GBDT model, and the output values of those leaf nodes are taken directly as the new multidimensional feature set.
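Step S3 can be illustrated with a deliberately tiny, self-contained gradient boosting model. This is a minimal sketch, not the patent's implementation: it assumes depth-1 trees (stumps), the logistic loss, first-order leaf values (the mean negative gradient in each leaf, scaled by a learning rate), and the hypothetical names `StumpGBDT` and `transform`. Each sample is routed to one leaf per tree, and that leaf's output value becomes one dimension of the new feature vector.

```python
import math

def best_stump(X, r):
    """Return (sse, feature, threshold, left_mean, right_mean) minimizing squared error on r."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= thr]
            right = [ri for row, ri in zip(X, r) if row[j] > thr]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, ml, mr)
    return best

class StumpGBDT:
    """Gradient-boosted decision stumps for 0/1 labels under the logistic loss (sketch)."""

    def __init__(self, n_trees=3, lr=0.5):
        self.n_trees, self.lr = n_trees, lr
        self.trees = []  # each tree: (feature, threshold, left_leaf_value, right_leaf_value)

    @staticmethod
    def leaf_value(tree, row):
        j, thr, left_val, right_val = tree
        return left_val if row[j] <= thr else right_val

    def fit(self, X, y):
        p0 = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
        F = [math.log(p0 / (1 - p0))] * len(X)  # start from the prior log-odds
        for _ in range(self.n_trees):
            # Negative gradient of the logistic loss with respect to F is y - sigmoid(F).
            r = [yi - 1.0 / (1.0 + math.exp(-fi)) for yi, fi in zip(y, F)]
            best = best_stump(X, r)
            if best is None:  # every feature is constant: no split possible
                break
            _, j, thr, ml, mr = best
            tree = (j, thr, self.lr * ml, self.lr * mr)
            self.trees.append(tree)
            F = [fi + self.leaf_value(tree, row) for fi, row in zip(F, X)]
        return self

    def transform(self, X):
        """New features: one leaf output value per tree, as described for step S3."""
        return [[self.leaf_value(t, row) for t in self.trees] for row in X]

X = [[0.1], [0.2], [0.8], [0.9]]  # toy one-dimensional feature set (hypothetical)
model = StumpGBDT(n_trees=2, lr=0.5).fit(X, [0, 0, 1, 1])
new_features = model.transform(X)
```

In practice a library implementation would normally be used. Note that scikit-learn's `GradientBoostingClassifier.apply` returns the index of the leaf reached in each tree (the form used by one-hot leaf encodings), not the leaf output value this paragraph describes.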
Based on the link prediction model established by the above method, the present invention also provides a link prediction method, the method comprising:
Step T1) extract the features of the prediction set data to form a prediction feature set;
Step T2) input the prediction feature set into the link prediction model; the output is the probability that a link will form between the node pair in the future.
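Step T2 can be sketched as scoring and ranking candidate node pairs. This assumes a logistic regression link prediction model whose output is read as the link probability, and assumes each prediction feature vector is built the same way as the training inputs. The weights and bias below are hypothetical placeholders, not trained values from the patent.

```python
import math

def link_probability(weights, bias, features):
    """Logistic output, read as the probability that the node pair links in the future."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def rank_candidate_pairs(weights, bias, pair_features):
    """pair_features maps (x, y) -> feature vector; returns [(pair, prob)] sorted high to low."""
    scored = [(pair, link_probability(weights, bias, f)) for pair, f in pair_features.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical weights for a [base feature, GBDT leaf output] input vector.
weights, bias = [2.0, 4.0], -1.0
ranking = rank_candidate_pairs(weights, bias, {("a", "d"): [1.0, 0.25], ("b", "e"): [0.0, -0.25]})
for pair, prob in ranking:
    print(pair, round(prob, 4))
```

Ranking rather than thresholding is a common way to consume such probabilities when only the most likely future links are of interest.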
The present invention has the advantages that
1, the training method of link prediction model of the invention only needs to start with from existing feature set, extension feature collection Quantity;It does not need to extract new feature from network again, considerably reduces feature extraction difficulty;
2, link prediction method of the invention is able to ascend link prediction performance;
3, method of the invention not only has universality to heterogeneous networks, but also tests the training set that different timing divide Collection has good robustness.
Brief description of the drawings
Fig. 1 is a flowchart of the training method for the link prediction model of the invention.
Detailed description of the embodiments
The method of the invention is described in further detail below with reference to the accompanying drawing.
As shown in Fig. 1, a training method for a link prediction model comprises:
Step S1) preprocess the crawled network data and extract a training set from the preprocessed network data; this specifically includes:
Step S1-1) crawl a large amount of network data from the Internet;
the network data includes temporal information for each edge;
Step S1-2) preprocess the crawled network data, the preprocessing deleting network data that contains isolated nodes or isolated node pairs;
Step S1-3) extract the training set from the preprocessed network data in chronological order;
select node pairs that are two hops apart from the preprocessed network data; because positive and negative samples are imbalanced, the node pairs are sampled so that the numbers of positive and negative samples are equal; the sampled node pairs are labeled and used as training samples, and the set of all training samples forms the training set;
Step S2) perform feature extraction on the network constructed from the training set, and compose the extracted features into a feature set;
The extracted features include neighbor-based features and random-walk-based features; in this embodiment, the extracted features are Adamic-Adar and Rooted PageRank.
Step S3) transform the feature set with a Gradient Boosting Decision Trees (GBDT) model to obtain a new multidimensional feature set;
Each feature vector in the feature set is classified and thereby assigned to leaf nodes of the GBDT model, and the output values of those leaf nodes are taken directly as the new multidimensional feature set;
Step S4) merge the feature set and the new multidimensional feature set as the input of the link prediction model, and train to obtain the parameters of the link prediction model, thereby obtaining the trained link prediction model.
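The two features named in this embodiment can be computed directly from an adjacency map. This sketch assumes an undirected graph stored as `{node: set(neighbors)}`; Adamic-Adar follows its standard definition, and Rooted PageRank is computed by power iteration of a random walk that restarts at the source node with a hypothetical probability `restart` (0.15 here). Rooted PageRank is asymmetric, so taking the score from `x` to `y` as the pair feature, as `pair_features` does, is one choice among several.

```python
import math
from collections import defaultdict

def adamic_adar(adj, x, y):
    """Neighbor-based feature: sum of 1/log(deg(z)) over common neighbors z of x and y."""
    score = 0.0
    for z in adj.get(x, set()) & adj.get(y, set()):
        deg = len(adj[z])
        if deg > 1:  # avoid log(1) = 0; a common neighbor of two distinct nodes has deg >= 2
            score += 1.0 / math.log(deg)
    return score

def rooted_pagerank(adj, source, restart=0.15, iters=200, tol=1e-12):
    """Random-walk feature: stationary distribution of a walk that restarts at source."""
    pi = {source: 1.0}
    for _ in range(iters):
        nxt = defaultdict(float)
        nxt[source] += restart  # restart mass (total probability stays 1)
        for u, mass in pi.items():
            nbrs = adj.get(u)
            if nbrs:
                share = (1 - restart) * mass / len(nbrs)
                for v in nbrs:
                    nxt[v] += share
            else:
                nxt[source] += (1 - restart) * mass  # dead end: return the walker to source
        diff = sum(abs(nxt.get(k, 0.0) - pi.get(k, 0.0)) for k in set(nxt) | set(pi))
        pi = dict(nxt)
        if diff < tol:
            break
    return pi

def pair_features(adj, x, y):
    """Hypothetical step S2 feature vector for node pair (x, y)."""
    return [adamic_adar(adj, x, y), rooted_pagerank(adj, x).get(y, 0.0)]
```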
Based on the link prediction model obtained by the above method, the present invention also provides a link prediction method, the method comprising:
Step T1) extract the features of the prediction set data to form a prediction feature set;
Step T2) input the prediction feature set into the link prediction model; the output is the probability that a link will form between the node pair in the future.

Claims (2)

1. A training method for a link prediction model, the method comprising:
Step S1) preprocessing crawled network data and extracting a training set from the preprocessed network data;
Step S2) performing feature extraction on the network constructed from the training set, and composing the extracted features into a feature set, the features including neighbor-based features and random-walk-based features;
Step S3) transforming the feature set with a gradient boosting decision tree model to obtain a new multidimensional feature set;
Step S4) merging the feature set and the new multidimensional feature set as the input of the link prediction model, and training to obtain the parameters of the link prediction model, thereby obtaining a trained link prediction model;
wherein step S1) specifically comprises:
Step S1-1) crawling a large amount of network data from the Internet;
the network data including temporal information for the edges;
Step S1-2) preprocessing the crawled network data, the preprocessing deleting network data that contains isolated nodes or isolated node pairs;
Step S1-3) extracting the training set from the preprocessed network data in chronological order;
selecting node pairs that are two hops apart from the preprocessed network data; because positive and negative samples are imbalanced, sampling the node pairs so that the numbers of positive and negative samples are equal; labeling the sampled node pairs and using them as training samples, the set of all training samples forming the training set;
wherein step S3) is implemented as follows: classifying each feature vector in the feature set so that it is assigned to leaf nodes of the gradient boosting decision tree model, and directly taking the output values of those leaf nodes as the new multidimensional feature set.
2. A link prediction method, implemented based on the link prediction model established by the method of claim 1, the method comprising:
Step T1) extracting features of the prediction set data to form a prediction feature set;
Step T2) inputting the prediction feature set into the link prediction model, the output being the probability that a link will form between the node pair in the future.
CN201610018320.9A 2016-01-12 2016-01-12 Training method for a link prediction model and link prediction method Active CN106959967B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610018320.9A CN106959967B (en) 2016-01-12 2016-01-12 Training method for a link prediction model and link prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610018320.9A CN106959967B (en) 2016-01-12 2016-01-12 Training method for a link prediction model and link prediction method

Publications (2)

Publication Number Publication Date
CN106959967A 2017-07-18
CN106959967B 2019-11-19

Family

ID=59480620

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610018320.9A Active CN106959967B (en) 2016-01-12 2016-01-12 Training method for a link prediction model and link prediction method

Country Status (1)

Country Link
CN (1) CN106959967B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110020379B (en) * 2018-01-04 2021-02-09 中国科学院声学研究所 Link prediction method based on deep dynamic network embedded representation model
CN109214599B (en) * 2018-10-25 2022-02-15 北京师范大学 Method for predicting link of complex network
CN110083778A (en) * 2019-04-08 2019-08-02 清华大学 Method and device for constructing a graph convolutional neural network that learns disentangled representations
CN110190909B (en) * 2019-06-06 2020-09-29 北京邮电大学 Signal equalization method and device for optical communication
CN110335165B (en) * 2019-06-28 2021-03-30 京东数字科技控股有限公司 Link prediction method and device

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN103886169A (en) * 2012-12-19 2014-06-25 电子科技大学 Link prediction algorithm based on AdaBoost
CN104751200B (en) * 2015-04-10 2019-05-21 中国电力科学研究院 SVM network traffic classification method
CN104767692B (en) * 2015-04-15 2018-05-29 中国电力科学研究院 Network traffic classification method

Also Published As

Publication number Publication date
CN106959967A (en) 2017-07-18

Similar Documents

Publication Publication Date Title
CN106959967B (en) Training method for a link prediction model and link prediction method
CN106156286B (en) Type extraction system and method for technical literature knowledge entities
CN102035698B (en) HTTP tunnel detection method based on decision tree classification algorithm
Tang et al. Graphgpt: Graph instruction tuning for large language models
Shi et al. Event detection and identification of influential spreaders in social media data streams
CN105959372B (en) Internet user data analysis method based on mobile applications
WO2013170587A1 (en) Multimedia question and answer system and method
CN104090931A (en) Information prediction and acquisition method based on webpage link parameter analysis
CN106980651B (en) Crawling seed list updating method and device based on knowledge graph
CN110210294A (en) Evaluation method and device for an optimization model, storage medium, and computer equipment
Kim et al. Event diffusion patterns in social media
CN105488211A (en) Method for determining user group based on feature analysis
CN109871686A (en) Malicious program recognition method and device based on icon representation and software behavior consistency analysis
CN108737290A (en) Non-encrypted traffic identification method based on payload mapping and random forest
CN105512301A (en) User grouping method based on social content
Chu et al. Prefix-graph: A versatile log parsing approach merging prefix tree with probabilistic graph
Zhang et al. An automatic and efficient malware traffic classification method for secure Internet of Things
Keyvanpour A survey on community detection methods based on the nature of social networks
CN105184654A (en) Public opinion hotspot real-time acquisition method and acquisition device based on community division
CN108270608A (en) Establishment of a link prediction model and link prediction method
CN110609898A (en) Self-classification method for unbalanced text data
CN116805152A (en) Context-aware event prediction method, device and medium based on graphic entanglement
CN103853720A (en) User attention based network sensitive information monitoring system and method
CN105589935A (en) Social group recognition method
Wang et al. Data mining in IoT era: A method based on improved frequent items mining algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant