CN106960251A - Method for predicting edge weights in undirected networks based on node similarity - Google Patents

Info

Publication number
CN106960251A
Authority
CN
China
Prior art keywords
index
gamma
node
similarity
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710136070.3A
Other languages
Chinese (zh)
Inventor
宣琦
赵明浩
虞烨炜
周鸣鸣
傅晨波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN201710136070.3A
Publication of CN106960251A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Databases & Information Systems (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method for predicting edge weights in an undirected network based on node similarity, comprising the following steps: 1) building an undirected network graph from an existing data set that includes nodes and edge weights; 2) calculating three classes of similarity indices for the network built in 1): local similarity indices, global similarity indices and semi-local similarity indices; 3) using the three classes of similarity indices calculated in 2) in a multiple linear regression model to predict the edge weights in the test set, then evaluating the average performance of the model by ten-fold cross-validation, measured by the Pearson correlation coefficient and the root-mean-square error. The invention uses node similarity and a multiple linear regression model to predict missing edge weights; the model is simple and the prediction results are good.

Description

Method for predicting edge weights in undirected networks based on node similarity
Technical Field
The invention relates to the field of link prediction and data mining, and in particular to a method for predicting edge weights based on network node similarity.
Background
Many real systems can be abstracted as complex networks: the individual objects in the system are abstracted as nodes, and the relationships between individuals are abstracted as edges. Examples include social networks, protein interaction networks and power networks. Edges act as bridges connecting individual objects and play an important role in revealing network structure. The edges of many real networks are weighted, and the weights have definite physical meanings. For various reasons, some edge weights may be missing; predicting them is critical, especially when the missing weights carry important information about network structure.
Disclosure of Invention
To overcome the poor prediction results of existing models when network edge weights are missing, the invention uses the similarity of network nodes and a multiple linear regression model to predict the missing edge weights, providing an edge weight prediction method based on undirected network node similarity with better prediction results.
The technical solution adopted by the invention to solve this technical problem is as follows:
A method for predicting edge weights based on network node similarity comprises the following steps:
S1: Construct an undirected network graph G = (V, E) from an existing undirected network data set that includes the network nodes and the edge weights between nodes;
S2: From the graph G = (V, E), using node similarity theory from link prediction, calculate the following three classes of features: local similarity indices, global similarity indices and semi-local similarity indices. The local similarity indices comprise common neighbors (CN), the Salton index, the Jaccard index, the Sørensen index, the hub promoted index (HPI), the hub depressed index (HDI), the LHN-I index, the preferential attachment index (PA), the Adamic-Adar index (AA) and the resource allocation index (RA); the global similarity indices comprise the Katz index, the LHN-II index, average commute time (ACT), random-walk-based cosine similarity (cos+), random walk with restart (RWR), the SimRank index (SimR) and the matrix forest index (MFI); the semi-local similarity indices comprise the local path index (LP), the local random walk index (LRW) and the superposed random walk index (SRW);
S3: Following the ten-fold cross-validation method, divide the edge weights in the data set evenly into ten parts, with nine parts used as the training set and the remaining part as the test set; perform multiple linear regression analysis in R on the features calculated in step S2, and finally obtain the following evaluation metrics from the fitted results and the original data: the Pearson correlation coefficient and the root-mean-square error.
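Step S3 can be sketched as follows. The patent performs the regression in R; this sketch uses plain Python purely for illustration, and every function name in it (`fit_ols`, `ten_fold_cv` and so on) is hypothetical rather than taken from the patent. It fits ordinary least squares through the normal equations, which assumes the feature columns are not collinear, and it reports the root-mean-square error between predicted and true weights, which is an assumption about what the evaluation measures.

```python
import math
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]  # fails if the features are collinear
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(xs, ys):
    """Return [intercept, b1, ..., bp] minimizing the squared error."""
    rows = [[1.0] + list(r) for r in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict(coef, xs):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in xs]


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def rmse(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


def ten_fold_cv(xs, ys, seed=0):
    """Average Pearson coefficient and RMSE over ten held-out folds.

    xs holds one feature row (similarity indices) per edge, ys the edge weights.
    Assumes enough samples that every fold has at least two distinct weights.
    """
    idx = list(range(len(ys)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::10] for k in range(10)]  # ten near-equal parts
    pccs, errs = [], []
    for test in folds:
        held = set(test)
        train = [i for i in idx if i not in held]
        coef = fit_ols([xs[i] for i in train], [ys[i] for i in train])
        pred = predict(coef, [xs[i] for i in test])
        truth = [ys[i] for i in test]
        pccs.append(pearson(pred, truth))
        errs.append(rmse(pred, truth))
    return sum(pccs) / 10, sum(errs) / 10
```

Calling `ten_fold_cv(features, weights)` returns the pair (mean Pearson coefficient, mean RMSE). Because many of the similarity indices are strongly correlated with one another, a practical implementation would likely prefer a QR-based solver, as R's `lm` uses, over the normal equations shown here.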
The beneficial effects of the invention are as follows: by using node similarity and a multiple linear regression model to predict missing edge weights, the model is simple and the prediction results are good.
Drawings
Fig. 1 is a flowchart of the undirected network edge weight prediction method based on node similarity in an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawing.
Referring to Fig. 1, a method for predicting edge weights in an undirected network based on node similarity includes the following steps:
S1: Construct an undirected network graph G = (V, E) from an existing nematode (C. elegans) neural network data set, in which nodes represent the neurons of the nematode and edges represent synapses or gap junctions between neurons;
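S2: The adjacency matrix of graph G is $A = (a_{ij})_{n \times n}$, $i, j \in \{1, 2, \ldots, n\}$, where $a_{ij}$ describes the edge between nodes $i$ and $j$.
According to the adjacency matrix A, the following similarity indices are calculated:
1) Common neighbors (CN):
$$s_{xy}^{CN} = |\Gamma(x) \cap \Gamma(y)|$$
where $|Q|$ denotes the number of elements of set $Q$, $\Gamma(x)$ is the set of neighbor nodes of node $x$, and $s_{xy}^{CN}$ is the CN index value between node $x$ and node $y$;
2) Salton index:
$$s_{xy}^{Salton} = \frac{|\Gamma(x) \cap \Gamma(y)|}{\sqrt{k_x \times k_y}}$$
where $k_x$ denotes the degree of node $x$;
3) Jaccard index:
$$s_{xy}^{Jaccard} = \frac{|\Gamma(x) \cap \Gamma(y)|}{|\Gamma(x) \cup \Gamma(y)|}$$
4) Sørensen index:
$$s_{xy}^{S\phi rensen} = \frac{2\,|\Gamma(x) \cap \Gamma(y)|}{k_x + k_y}$$
5) Hub promoted index (HPI):
$$s_{xy}^{HPI} = \frac{|\Gamma(x) \cap \Gamma(y)|}{\min\{k_x, k_y\}}$$
6) Hub depressed index (HDI):
$$s_{xy}^{HDI} = \frac{|\Gamma(x) \cap \Gamma(y)|}{\max\{k_x, k_y\}}$$
7) LHN-I index:
$$s_{xy}^{LHN1} = \frac{|\Gamma(x) \cap \Gamma(y)|}{k_x k_y}$$
8) Preferential attachment index (PA):
$$s_{xy}^{PA} = k_x \times k_y$$
9) Adamic-Adar index (AA):
$$s_{xy}^{AA} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log k_z}$$
10) Resource allocation index (RA):
$$s_{xy}^{RA} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k_z}$$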
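11) Katz index:
$$S^{Katz} = (I - \beta A)^{-1} - I$$
where $I$ is the identity matrix, and the value of the parameter $\beta$ must be less than the reciprocal of the largest eigenvalue $\lambda_1$ of the adjacency matrix $A$ to ensure convergence;
12) LHN-II index:
$$S^{LHN2} = 2M\lambda_1 D^{-1} \left( I - \frac{\phi A}{\lambda_1} \right)^{-1} D^{-1}$$
where $\delta_{xy}$ is the Kronecker function, with $\delta_{xy} = 1$ when $x = y$ and $\delta_{xy} = 0$ otherwise; $D$ is the degree matrix of the undirected network graph G, i.e. $D_{ij} = k_i \delta_{ij}$; $k_x$ denotes the degree of node $x$; $\phi$ is an adjustable parameter with value range (0, 1); $\lambda_1$ is the largest eigenvalue of the adjacency matrix $A$; and $M$ is the total number of edges of the network;
13) Average commute time (ACT):
$$s_{xy}^{ACT} = \frac{1}{l_{xx}^{+} + l_{yy}^{+} - 2\,l_{xy}^{+}}$$
where $L^{+}$ is the pseudo-inverse of the Laplacian matrix $L$ ($L = D - A$) of the network G, and $l_{xy}^{+}$ denotes the elements of $L^{+}$;
14) Random-walk-based cosine similarity (cos+):
$$s_{xy}^{cos+} = \cos(x, y)^{+} = \frac{l_{xy}^{+}}{\sqrt{l_{xx}^{+} \cdot l_{yy}^{+}}}$$
15) Random walk with restart (RWR):
$$s_{xy}^{RWR} = \pi_{xy} + \pi_{yx}$$
where
$$\vec{\pi}_x = (1 - c)\,(I - cP^{T})^{-1}\,\vec{e}_x$$
The element $\pi_{xy}$ is the probability that a particle starting from node $x$ is eventually located at node $y$, $(1 - c)$ is the probability that the particle returns to its starting node, and $P$ is the Markov transition probability matrix of the network, whose element $P_{xy}$ is the probability that a particle at node $x$ moves to node $y$ in the next step;
16) SimRank index (SimR):
$$s_{xy}^{SimR} = C \cdot \frac{\sum_{v_z \in \Gamma(x)} \sum_{v_{z'} \in \Gamma(y)} s_{zz'}^{SimR}}{k_x k_y}$$
where $s_{xx} = 1$ and $C \in [0, 1]$ is the decay parameter for similarity propagation;
17) Matrix forest index (MFI):
$$S^{MFI} = (I + \alpha L)^{-1}, \quad \alpha > 0$$
where $L$ is the Laplacian matrix of the network G ($L = D - A$) and $I$ is the identity matrix;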
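18) Local path index (LP):
$$S^{LP} = A^{2} + \varepsilon A^{3}$$
where $\varepsilon$ is an adjustable parameter; when $\varepsilon = 0$, LP is equivalent to CN;
19) Local random walk index (LRW):
$$s_{xy}^{LRW}(t) = q_x \pi_{xy}(t) + q_y \pi_{yx}(t)$$
where $q_x$ is the initial resource distribution of node $x$, and $\vec{\pi}_x(0)$ is an $n \times 1$ vector in which only the $x$-th element is 1 and all other elements are 0; the distribution evolves as $\vec{\pi}_x(t+1) = P^{T} \vec{\pi}_x(t)$, $t \geq 0$;
20) Superposed random walk index (SRW):
$$s_{xy}^{SRW}(t) = \sum_{\tau=1}^{t} s_{xy}^{LRW}(\tau) = \sum_{\tau=1}^{t} \left[ q_x \pi_{xy}(\tau) + q_y \pi_{yx}(\tau) \right]$$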
S3: Following the ten-fold cross-validation method, divide the edge weights in the data set evenly into ten parts, with nine parts used as the training set and the remaining part as the test set; perform multiple linear regression analysis in R on the features calculated in step S2 to obtain fitted values for the test set, and compare them with the original data to obtain the following evaluation metrics: the Pearson correlation coefficient and the root-mean-square error. The model of the invention is simple and obtains good prediction results.
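The ten local similarity indices of step S2 depend only on neighbor sets and degrees, so they can be computed without forming the adjacency matrix. The following is a minimal Python sketch, assuming the data set is given as a list of `(u, v, weight)` triples; the function names and the dictionary keys are illustrative choices, not part of the patent. The Salton index is taken with the square root of the degree product in its denominator, which is its usual definition in the link prediction literature.

```python
import math
from collections import defaultdict


def build_neighbors(edges):
    """Map each node to the set of its neighbors in the undirected graph."""
    nbrs = defaultdict(set)
    for u, v, _weight in edges:  # weights are the regression target, not used here
        nbrs[u].add(v)
        nbrs[v].add(u)
    return dict(nbrs)


def local_indices(nbrs, x, y):
    """Local similarity indices for two distinct nodes x and y.

    Every node in nbrs has degree >= 1, and any common neighbor of x and y
    has degree >= 2, so none of the divisions or logarithms below is singular.
    """
    gx, gy = nbrs[x], nbrs[y]
    kx, ky = len(gx), len(gy)
    common = gx & gy
    cn = len(common)
    return {
        "CN": cn,
        "Salton": cn / math.sqrt(kx * ky),
        "Jaccard": cn / len(gx | gy),
        "Sorensen": 2 * cn / (kx + ky),
        "HPI": cn / min(kx, ky),
        "HDI": cn / max(kx, ky),
        "LHN1": cn / (kx * ky),
        "PA": kx * ky,
        "AA": sum(1 / math.log(len(nbrs[z])) for z in common),
        "RA": sum(1 / len(nbrs[z]) for z in common),
    }
```

For example, `local_indices(build_neighbors(edges), "a", "d")` returns one feature value per index for the node pair (a, d); repeating this for every edge with a known weight yields the local part of the feature table used by the regression.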
The above describes the method for predicting edge weights in an undirected network graph. The method combines the similarity of network nodes with a multiple linear regression model; the final prediction results are good and meet the requirements of practical use. The present invention is to be considered as illustrative and not restrictive. It will be understood by those skilled in the art that various changes, modifications and equivalents may be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
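The path-based indices can be illustrated with plain matrix powers. The Katz matrix $(I - \beta A)^{-1} - I$ equals the series $\sum_{l \geq 1} \beta^{l} A^{l}$ whenever $\beta$ is below the reciprocal of the largest eigenvalue of $A$, and the local path index keeps only the two- and three-step terms, $A^{2} + \varepsilon A^{3}$. The sketch below is a hedged Python illustration: the function names and the truncation length `max_len` are assumptions introduced here, and the truncated series only approximates the exact matrix inverse described in the patent.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def local_path(adj, eps):
    """Local path index: A^2 + eps * A^3 (reduces to common neighbors at eps = 0)."""
    a2 = mat_mul(adj, adj)
    a3 = mat_mul(a2, adj)
    n = len(adj)
    return [[a2[i][j] + eps * a3[i][j] for j in range(n)] for i in range(n)]


def katz_truncated(adj, beta, max_len):
    """Approximate the Katz index by summing beta^l * A^l for l = 1..max_len.

    Converges to (I - beta*A)^-1 - I as max_len grows, provided beta is
    smaller than the reciprocal of the largest eigenvalue of adj.
    """
    n = len(adj)
    total = [[0.0] * n for _ in range(n)]
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0
    for length in range(1, max_len + 1):
        power = mat_mul(power, adj)  # now holds A^length
        coeff = beta ** length
        for i in range(n):
            for j in range(n):
                total[i][j] += coeff * power[i][j]
    return total
```

On the three-node path graph `[[0, 1, 0], [1, 0, 1], [0, 1, 0]]`, `local_path(adj, 0)` gives the common-neighbor counts, and `katz_truncated(adj, 0.5, 60)` agrees with the exact Katz matrix to well below 1e-6. A dense linear algebra library would be the natural choice for a network the size of the C. elegans data set.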

Claims (2)

1. A method for predicting edge weights in an undirected network based on node similarity, characterized in that the method comprises the following steps:
S1: Construct an undirected network graph G = (V, E) from an existing undirected network data set that includes the network nodes and the edge weights between nodes;
S2: From the graph G = (V, E), using node similarity theory from link prediction, calculate the following three classes of features: local similarity indices, global similarity indices and semi-local similarity indices. The local similarity indices comprise common neighbors (CN), the Salton index, the Jaccard index, the Sørensen index, the hub promoted index (HPI), the hub depressed index (HDI), the LHN-I index, the preferential attachment index (PA), the Adamic-Adar index (AA) and the resource allocation index (RA); the global similarity indices comprise the Katz index, the LHN-II index, average commute time (ACT), random-walk-based cosine similarity (cos+), random walk with restart (RWR), the SimRank index (SimR) and the matrix forest index (MFI); the semi-local similarity indices comprise the local path index (LP), the local random walk index (LRW) and the superposed random walk index (SRW);
S3: Following the ten-fold cross-validation method, divide the edge weights in the data set evenly into ten parts, with nine parts used as the training set and the remaining part as the test set; perform multiple linear regression analysis in R on the features calculated in step S2, and finally obtain the following evaluation metrics from the fitted results and the original data: the Pearson correlation coefficient and the root-mean-square error.
2. The method for predicting edge weights in an undirected network based on node similarity as claimed in claim 1, wherein: in step S2, the adjacency matrix of graph G is $A = (a_{ij})_{n \times n}$, $i, j \in \{1, 2, \ldots, n\}$, where $a_{ij}$ describes the edge between nodes $i$ and $j$.
According to the adjacency matrix A, the following similarity indices are calculated:
1) Common neighbors (CN):
$$s_{xy}^{CN} = |\Gamma(x) \cap \Gamma(y)|$$
where $|Q|$ denotes the number of elements of set $Q$, $\Gamma(x)$ is the set of neighbor nodes of node $x$, and $s_{xy}^{CN}$ is the CN index value between node $x$ and node $y$;
2) Salton index:
$$s_{xy}^{Salton} = \frac{|\Gamma(x) \cap \Gamma(y)|}{\sqrt{k_x \times k_y}}$$
where $k_x$ denotes the degree of node $x$;
3) Jaccard index:
$$s_{xy}^{Jaccard} = \frac{|\Gamma(x) \cap \Gamma(y)|}{|\Gamma(x) \cup \Gamma(y)|}$$
4) Sørensen index:
$$s_{xy}^{S\phi rensen} = \frac{2\,|\Gamma(x) \cap \Gamma(y)|}{k_x + k_y}$$
5) Hub promoted index (HPI):
$$s_{xy}^{HPI} = \frac{|\Gamma(x) \cap \Gamma(y)|}{\min\{k_x, k_y\}}$$
6) Hub depressed index (HDI):
$$s_{xy}^{HDI} = \frac{|\Gamma(x) \cap \Gamma(y)|}{\max\{k_x, k_y\}}$$
7) LHN-I index:
$$s_{xy}^{LHN1} = \frac{|\Gamma(x) \cap \Gamma(y)|}{k_x k_y}$$
8) Preferential attachment index (PA):
$$s_{xy}^{PA} = k_x \times k_y$$
9) Adamic-Adar index (AA):
$$s_{xy}^{AA} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log k_z}$$
10) Resource allocation index (RA):
$$s_{xy}^{RA} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k_z}$$
11) Katz index:
$$S^{Katz} = (I - \beta A)^{-1} - I$$
where $I$ is the identity matrix, and the value of the parameter $\beta$ must be less than the reciprocal of the largest eigenvalue $\lambda_1$ of the adjacency matrix $A$ to ensure convergence;
12) LHN-II index:
$$S^{LHN2} = 2M\lambda_1 D^{-1} \left( I - \frac{\phi A}{\lambda_1} \right)^{-1} D^{-1}$$
where $\delta_{xy}$ is the Kronecker function, with $\delta_{xy} = 1$ when $x = y$ and $\delta_{xy} = 0$ otherwise; $D$ is the degree matrix of the undirected network graph G, i.e. $D_{ij} = k_i \delta_{ij}$; $k_x$ denotes the degree of node $x$; $\phi$ is an adjustable parameter with value range (0, 1); $\lambda_1$ is the largest eigenvalue of the adjacency matrix $A$; and $M$ is the total number of edges of the network;
13) Average commute time (ACT):
$$s_{xy}^{ACT} = \frac{1}{l_{xx}^{+} + l_{yy}^{+} - 2\,l_{xy}^{+}}$$
where $L^{+}$ is the pseudo-inverse of the Laplacian matrix $L$ ($L = D - A$) of the network G, and $l_{xy}^{+}$ denotes the elements of $L^{+}$;
14) Random-walk-based cosine similarity (cos+):
$$s_{xy}^{cos+} = \cos(x, y)^{+} = \frac{l_{xy}^{+}}{\sqrt{l_{xx}^{+} \cdot l_{yy}^{+}}}$$
15) Random walk with restart (RWR):
$$s_{xy}^{RWR} = \pi_{xy} + \pi_{yx}$$
where
$$\vec{\pi}_x = (1 - c)\,(I - cP^{T})^{-1}\,\vec{e}_x$$
The element $\pi_{xy}$ is the probability that a particle starting from node $x$ is eventually located at node $y$, $(1 - c)$ is the probability that the particle returns to its starting node, and $P$ is the Markov transition probability matrix of the network, whose element $P_{xy}$ is the probability that a particle at node $x$ moves to node $y$ in the next step;
16) SimRank index (SimR):
$$s_{xy}^{SimR} = C \cdot \frac{\sum_{v_z \in \Gamma(x)} \sum_{v_{z'} \in \Gamma(y)} s_{zz'}^{SimR}}{k_x k_y}$$
where $s_{xx} = 1$ and $C \in [0, 1]$ is the decay parameter for similarity propagation;
17) Matrix forest index (MFI):
$$S^{MFI} = (I + \alpha L)^{-1}, \quad \alpha > 0$$
where $L$ is the Laplacian matrix of the network G ($L = D - A$) and $I$ is the identity matrix;
18) Local path index (LP):
$$S^{LP} = A^{2} + \varepsilon A^{3}$$
where $\varepsilon$ is an adjustable parameter; when $\varepsilon = 0$, LP is equivalent to CN;
19) Local random walk index (LRW):
$$s_{xy}^{LRW}(t) = q_x \pi_{xy}(t) + q_y \pi_{yx}(t)$$
where $q_x$ is the initial resource distribution of node $x$, and $\vec{\pi}_x(0)$ is an $n \times 1$ vector in which only the $x$-th element is 1 and all other elements are 0; the distribution evolves as $\vec{\pi}_x(t+1) = P^{T} \vec{\pi}_x(t)$, $t \geq 0$;
20) Superposed random walk index (SRW):
$$s_{xy}^{SRW}(t) = \sum_{\tau=1}^{t} s_{xy}^{LRW}(\tau) = \sum_{\tau=1}^{t} \left[ q_x \pi_{xy}(\tau) + q_y \pi_{yx}(\tau) \right].$$
CN201710136070.3A 2017-03-09 2017-03-09 Method for predicting edge weights in undirected networks based on node similarity Pending CN106960251A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710136070.3A CN106960251A (en) 2017-03-09 2017-03-09 Method for predicting edge weights in undirected networks based on node similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710136070.3A CN106960251A (en) 2017-03-09 2017-03-09 Method for predicting edge weights in undirected networks based on node similarity

Publications (1)

Publication Number Publication Date
CN106960251A true CN106960251A (en) 2017-07-18

Family

ID=59469992

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710136070.3A Pending CN106960251A (en) 2017-03-09 2017-03-09 Method for predicting edge weights in undirected networks based on node similarity

Country Status (1)

Country Link
CN (1) CN106960251A (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108449311B (en) * 2018-01-29 2020-08-04 浙江工业大学 Social relationship hiding method based on attack node similarity
CN108449311A (en) * 2018-01-29 2018-08-24 浙江工业大学 A kind of social networks hiding method based on attack node similitude
CN108399491B (en) * 2018-02-02 2021-10-29 浙江工业大学 Employee diversity ordering method based on network graph
CN108399491A (en) * 2018-02-02 2018-08-14 浙江工业大学 A kind of employee's diversity ranking method based on network
CN108491511B (en) * 2018-03-23 2022-03-18 腾讯科技(深圳)有限公司 Data mining method and device based on graph data and model training method and device
CN108491511A (en) * 2018-03-23 2018-09-04 腾讯科技(深圳)有限公司 Data digging method and device, model training method based on diagram data and device
CN108811028B (en) * 2018-07-23 2021-07-16 南昌航空大学 Opportunity network link prediction method and device and readable storage medium
CN108811028A (en) * 2018-07-23 2018-11-13 南昌航空大学 A kind of prediction technique, device and the readable storage medium storing program for executing of opportunistic network link
CN109101629A (en) * 2018-08-14 2018-12-28 合肥工业大学 A kind of network representation method based on depth network structure and nodal community
CN109829561A (en) * 2018-11-15 2019-05-31 西南石油大学 Accident forecast method based on smoothing processing Yu network model machine learning
CN109829561B (en) * 2018-11-15 2021-03-16 西南石油大学 Accident prediction method based on smoothing processing and network model machine learning
CN109726297A (en) * 2018-12-28 2019-05-07 沈阳航空航天大学 A kind of two subnetwork node prediction algorithms based on mutual exclusion strategy
CN109726297B (en) * 2018-12-28 2022-12-23 沈阳航空航天大学 Bipartite network node prediction algorithm based on mutual exclusion strategy
CN111310822A (en) * 2020-02-12 2020-06-19 山西大学 PU learning and random walk based link prediction method and device
CN111865690A (en) * 2020-07-21 2020-10-30 南昌航空大学 Opportunistic network link prediction method based on network structure and time sequence
CN111865690B (en) * 2020-07-21 2022-06-03 南昌航空大学 Opportunistic network link prediction method based on network structure and time sequence

Similar Documents

Publication Publication Date Title
CN106960251A (en) Method for predicting edge weights in undirected networks based on node similarity
Hao et al. Probabilistic dual hesitant fuzzy set and its application in risk evaluation
Liu et al. A FTA-based method for risk decision-making in emergency response
Kabak et al. A fuzzy multi-criteria decision making approach to assess building energy performance
CN109523021B (en) Dynamic network structure prediction method based on long-time and short-time memory network
CN110149237B (en) Hadoop platform computing node load prediction method
CN106230773A (en) Risk evaluating system based on fuzzy matrix analytic hierarchy process (AHP)
Yang et al. Neural network and GA approaches for dwelling fire occurrence prediction
CN112580902B (en) Object data processing method and device, computer equipment and storage medium
CN103577876A (en) Credible and incredible user recognizing method based on feedforward neural network
Fan et al. An improved approach to generate generalized basic probability assignment based on fuzzy sets in the open world and its application in multi-source information fusion
Li et al. Analysis and modelling of flood risk assessment using information diffusion and artificial neural network
CN105228185A (en) A kind of method for Fuzzy Redundancy node identities in identification communication network
Jat et al. Applications of statistical techniques and artificial neural networks: A review
CN113761217A (en) Artificial intelligence-based question set data processing method and device and computer equipment
Moradi et al. Sensitivity analysis of ordered weighted averaging operator in earthquake vulnerability assessment
Roshanfar et al. Predicting fatigue life of shear connectors in steel‐concrete composite bridges using artificial intelligence techniques
Li et al. Research on financial risk crisis prediction of listed companies based on IWOA-BP neural network
CN102902875A (en) Network-based method for evaluating reliability degree of failure-relevant system
Cissé et al. Impact of neighborhood structure on epidemic spreading by means of cellular automata approach
Zhang et al. Intrusion detection method based on improved growing hierarchical self-organizing map
Kim et al. A study on influence of human personality to location selection
Bai et al. Failure propagation of dependency networks with recovery mechanism
Mohammadian et al. Intelligent decision making and analysis using fuzzy cognitive maps for disaster recovery planning
Runge et al. Introduction to risk analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20170718)