CN106960251A - Edge weight prediction method for undirected networks based on node similarity - Google Patents
- Publication number
- CN106960251A (application number CN201710136070.3A)
- Authority
- CN
- China
- Prior art keywords
- index
- gamma
- node
- similarity
- matrix
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- Economics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Entrepreneurship & Innovation (AREA)
- Development Economics (AREA)
- Game Theory and Decision Science (AREA)
- Databases & Information Systems (AREA)
- General Business, Economics & Management (AREA)
- Tourism & Hospitality (AREA)
- Quality & Reliability (AREA)
- Educational Administration (AREA)
- Operations Research (AREA)
- Marketing (AREA)
- Software Systems (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
An edge weight prediction method for undirected networks based on node similarity comprises the following steps: 1) build an undirected graph from an existing data set containing the nodes and the edge weights; 2) compute three classes of similarity indices for the network from step 1): local, global and semi-local similarity indices; 3) using the three classes of indices computed in step 2), predict the edge weights in the test set with a multiple linear regression model, and evaluate the average performance of the model with ten-fold cross-validation, using the Pearson correlation coefficient and the root-mean-square error. The invention uses node similarity and a multiple linear regression model to predict missing edge weights; the model is simple and the prediction results are good.
Description
Technical Field
The invention relates to the fields of link prediction and data mining, and in particular to a method for predicting edge weights based on the similarity of network nodes.
Background
Many real-world systems can be abstracted as complex networks, in which individual objects are represented as nodes and the relationships between them as edges; examples include social networks, protein interaction networks and power grids. Edges link individual objects and play an important role in revealing network structure. In many real networks the edges are weighted, and the weights have a definite physical meaning. For various reasons, some edge weights may be missing; predicting them is critical, especially when the missing weights carry important information about the network structure.
Disclosure of Invention
To overcome the poor prediction results caused by missing edge weights in existing networks, the invention uses the similarity of network nodes and a multiple linear regression model to predict the missing edge weights, providing an edge weight prediction method for undirected networks based on node similarity that gives better prediction results.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A method for predicting edge weights based on network node similarity comprises the following steps:
S1: construct an undirected network graph G = (V, E) from an existing undirected network data set containing the network nodes and the weights of the edges between them;
S2: from the graph G = (V, E), using node similarity theory from link prediction, compute three types of features: local, global and semi-local similarity indices. The local similarity indices are common neighbors (CN), the Salton index, the Jaccard index, the Sørensen index, the hub promoted index (HPI), the hub depressed index (HDI), the LHN-I index, the preferential attachment index (PA), the Adamic-Adar index (AA) and the resource allocation index (RA); the global similarity indices are the Katz index, the LHN-II index, the average commute time (ACT), random-walk-based cosine similarity (cos+), random walk with restart (RWR), the SimRank index (SimR) and the matrix forest index (MFI); the semi-local similarity indices are the local path index (LP), the local random walk index (LRW) and the superposed local random walk index (SRW);
S3: following ten-fold cross-validation, divide the edge weights in the data set evenly into ten parts, using nine as the training set and the remaining one as the test set; perform multiple linear regression in the R language on the features computed in step S2, and finally compute the following evaluation metrics from the fitted results and the original data: the Pearson correlation coefficient and the root-mean-square error.
Beneficial effects of the invention: node similarity and a multiple linear regression model are used to predict missing edge weights; the model is simple and the prediction results are good.
Drawings
Fig. 1 is a flowchart of the edge weight prediction method for undirected networks based on node similarity in an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to Fig. 1, an edge weight prediction method for undirected networks based on node similarity includes the following steps:
S1: construct an undirected network graph G = (V, E) from an existing neural network data set of the nematode C. elegans, in which nodes represent the nematode's neurons and edges represent synapses or gap junctions between neurons;
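As a rough illustration of step S1 and of the adjacency matrix used in step S2, the sketch below builds the graph from a list of (u, v, weight) triples. It assumes plain Python lists, a 0/1 adjacency matrix (a_ij = 1 when i and j are linked) and the hypothetical helper name `build_graph`; the patent does not specify a data format, and loading the actual C. elegans data set is not shown.

```python
def build_graph(weighted_edges):
    """Build an undirected graph from (u, v, weight) triples (hypothetical helper).

    Returns the sorted node list, the 0/1 adjacency matrix A and a dict that maps
    each unordered pair of node indices to its edge weight (the regression target).
    """
    nodes = sorted({u for u, _, _ in weighted_edges} | {v for _, v, _ in weighted_edges})
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    A = [[0] * n for _ in range(n)]
    weights = {}
    for u, v, w in weighted_edges:
        if u == v:
            continue  # self-loops are skipped; the similarity indices ignore them
        i, j = index[u], index[v]
        A[i][j] = A[j][i] = 1  # undirected: the matrix is symmetric
        weights[frozenset((i, j))] = float(w)  # a repeated edge keeps its last weight
    return nodes, A, weights
```

Node labels must be mutually comparable so that they can be sorted; the weights are kept apart from A because, in this sketch, the similarity indices are computed on the unweighted structure and the weights serve only as regression targets.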
S2: the adjacency matrix of graph G is $A = (a_{ij})_{n \times n}$, $i, j \in \{1, 2, \ldots, n\}$, where $a_{ij} = 1$ if nodes i and j are connected by an edge and $a_{ij} = 0$ otherwise.
Using the adjacency matrix A, the following similarity indices are computed:
1) Common neighbors (CN):
$s_{xy}^{\mathrm{CN}} = |\Gamma(x) \cap \Gamma(y)|$,
where |Q| denotes the number of elements of set Q, Γ(x) is the set of neighbors of node x, and $s_{xy}^{\mathrm{CN}}$ is the CN index value between nodes x and y;
2) Salton index:
$s_{xy}^{\mathrm{Salton}} = \frac{|\Gamma(x) \cap \Gamma(y)|}{\sqrt{k_x k_y}}$,
where $k_x$ denotes the degree of node x;
3) Jaccard index:
4) Sørensen index:
5) Hub promoted index (HPI):
6) Hub depressed index (HDI):
7) LHN-I index:
8) Preferential attachment index (PA):
9) Adamic-Adar index (AA):
10) Resource allocation index (RA):
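The ten local indices can be sketched as below. The formulas are the standard definitions from the link-prediction literature and are assumed, not confirmed by the text above, to match the patent's own; the function names are hypothetical, A is taken to be a 0/1 adjacency matrix, and x and y must be distinct nodes with nonzero degree.

```python
from math import log, sqrt


def neighbors(A):
    # Gamma(x): the set of neighbours of each node
    return [{j for j, a in enumerate(row) if a} for row in A]


def local_indices(A, x, y):
    """Standard local similarity indices for the pair (x, y), x != y."""
    G = neighbors(A)
    kx, ky = len(G[x]), len(G[y])
    common = G[x] & G[y]
    cn = len(common)
    return {
        "CN": cn,
        "Salton": cn / sqrt(kx * ky),
        "Jaccard": cn / len(G[x] | G[y]),
        "Sorensen": 2 * cn / (kx + ky),
        "HPI": cn / min(kx, ky),
        "HDI": cn / max(kx, ky),
        "LHN-I": cn / (kx * ky),
        "PA": kx * ky,
        # a common neighbour of two distinct nodes has degree >= 2, so log(k_z) > 0
        "AA": sum(1 / log(len(G[z])) for z in common),
        "RA": sum(1 / len(G[z]) for z in common),
    }
```

For example, in the "diamond" graph with edges 0-1, 0-2, 1-2, 1-3 and 2-3, nodes 0 and 3 share both of their neighbours, so CN = 2, Salton = Jaccard = Sørensen = HPI = HDI = 1, LHN-I = 0.5 and PA = 4.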
11) Katz index:
$S^{\mathrm{Katz}} = (I - \beta A)^{-1} - I$,
where I is the identity matrix and the parameter β must be smaller than the reciprocal of the largest eigenvalue $\lambda_1$ of the adjacency matrix A, i.e. $\beta < 1/\lambda_1$, to ensure that the series converges;
12) LHN-II index:
where $\delta_{xy}$ is the Kronecker delta ($\delta_{xy} = 1$ if x = y and 0 otherwise), D is the degree matrix of the undirected network graph G, i.e. $D_{ij} = k_i \delta_{ij}$, $k_x$ denotes the degree of node x, φ is a tunable parameter in the range (0, 1), $\lambda_1$ is the largest eigenvalue of the adjacency matrix A, and M is the total number of edges in the network;
13) Average commute time (ACT):
where $L^{+}$ is the pseudo-inverse of the Laplacian matrix L = D - A of network G, and $l^{+}_{xy}$ denotes an element of $L^{+}$;
14) Random-walk-based cosine similarity ($\cos^{+}$):
15) Random walk with restart (RWR):
where the element $\pi_{xy}$ is the probability that a particle starting from node x eventually reaches node y, (1 - c) is the probability that the particle returns to its starting node, and P is the Markov transition matrix of the network, whose element $P_{xy}$ is the probability that a particle at node x moves to node y in the next step;
16) SimRank index (SimR):
where $s_{xx} = 1$ and $C \in [0, 1]$ is the decay factor applied as similarity propagates;
17) Matrix forest index (MFI):
$S^{\mathrm{MFI}} = (I + \alpha L)^{-1}$, $\alpha > 0$,
where L = D - A is the Laplacian matrix of network G and I is the identity matrix;
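A minimal sketch of the global indices that reduce to a matrix inverse or a fixed-point iteration (Katz, ACT, cos+, RWR, SimRank and MFI) follows, in plain Python so that it runs without third-party packages. LHN-II is left out because its formula is not recoverable from the text. The definitions are the standard link-prediction forms and are assumed to match the patent; all function names are hypothetical, the graph is assumed connected with no isolated nodes, and ACT and cos+ obtain $L^{+}$ through the identity $L^{+} = (L + J/n)^{-1} - J/n$ (J the all-ones matrix), which holds only for connected graphs. ACT drops the constant factor M, which a linear regression absorbs into its coefficient.

```python
from math import sqrt


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (stand-in for numpy.linalg.inv)."""
    n = len(m)
    aug = [[float(v) for v in row] + e for row, e in zip(m, identity(n))]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def laplacian(A):
    # L = D - A, with D the diagonal degree matrix
    n = len(A)
    return [[(sum(A[i]) if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]


def katz(A, beta):
    # S = (I - beta*A)^-1 - I; requires beta < 1/lambda_1
    n, I = len(A), identity(len(A))
    inv = inverse([[I[i][j] - beta * A[i][j] for j in range(n)] for i in range(n)])
    return [[inv[i][j] - I[i][j] for j in range(n)] for i in range(n)]


def mfi(A, alpha):
    # S = (I + alpha*L)^-1, alpha > 0
    n, I, L = len(A), identity(len(A)), laplacian(A)
    return inverse([[I[i][j] + alpha * L[i][j] for j in range(n)] for i in range(n)])


def laplacian_pinv(A):
    # Connected graphs only: L+ = (L + J/n)^-1 - J/n
    n, L = len(A), laplacian(A)
    inv = inverse([[L[i][j] + 1.0 / n for j in range(n)] for i in range(n)])
    return [[inv[i][j] - 1.0 / n for j in range(n)] for i in range(n)]


def act(Lp, x, y):
    # Reciprocal of l+_xx + l+_yy - 2 l+_xy (constant factor M dropped)
    return 1.0 / (Lp[x][x] + Lp[y][y] - 2.0 * Lp[x][y])


def cos_plus(Lp, x, y):
    return Lp[x][y] / sqrt(Lp[x][x] * Lp[y][y])


def rwr_scores(A, c):
    # P_ij = a_ij / k_i; q_x = (1 - c)(I - c P^T)^-1 e_x; s_xy = q_xy + q_yx
    n, k = len(A), [sum(row) for row in A]
    inv = inverse([[(1.0 if i == j else 0.0) - c * A[j][i] / k[j] for j in range(n)]
                   for i in range(n)])
    # q_xy = (1 - c) * inv[y][x], i.e. entry y of column x
    return [[(1 - c) * (inv[y][x] + inv[x][y]) for y in range(n)] for x in range(n)]


def simrank(A, C, iters=20):
    # s_xy = C * sum over a in Gamma(x), b in Gamma(y) of s_ab / (k_x k_y); s_xx = 1
    n = len(A)
    nbrs = [[j for j in range(n) if A[i][j]] for i in range(n)]
    S = identity(n)
    for _ in range(iters):
        new = identity(n)
        for x in range(n):
            for y in range(n):
                if x != y and nbrs[x] and nbrs[y]:
                    total = sum(S[a][b] for a in nbrs[x] for b in nbrs[y])
                    new[x][y] = C * total / (len(nbrs[x]) * len(nbrs[y]))
        S = new
    return S
```

On the path graph 0-1-2, for instance, the effective resistance between the two end nodes is 2, so ACT gives 0.5 for that pair. With numpy available, `numpy.linalg.inv` and `numpy.linalg.pinv` would replace the hand-written inverse and the pseudo-inverse identity.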
18) Local path index (LP):
$S^{\mathrm{LP}} = A^2 + \varepsilon A^3$,
where ε is a free parameter; when ε = 0, LP reduces to CN;
19) Local random walk index (LRW):
where $q_x$ is the initial resource of node x, $\vec{\pi}_x(0) = \vec{e}_x$ is an n × 1 vector whose x-th element is 1 and all other elements are 0, and $\vec{\pi}_x(t+1) = P^{\mathrm{T}} \vec{\pi}_x(t)$ for t ≥ 0;
20) Superposed local random walk index (SRW):
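The semi-local indices can be sketched in the same plain-Python style. The sketch assumes $P_{xy} = a_{xy}/k_x$, the degree-proportional initial resource $q_x = k_x/M$ (a common choice for LRW that the text above does not state), $s^{\mathrm{LRW}}_{xy}(t) = q_x \pi_{xy}(t) + q_y \pi_{yx}(t)$, and SRW at step t as the sum of the LRW scores over steps 1 to t; the function names are hypothetical.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def local_path(A, eps):
    # S_LP = A^2 + eps * A^3; eps = 0 reduces to common neighbours
    A2 = matmul(A, A)
    A3 = matmul(A2, A)
    n = len(A)
    return [[A2[i][j] + eps * A3[i][j] for j in range(n)] for i in range(n)]


def lrw_srw(A, steps):
    """Return (LRW scores at t = steps, SRW scores summed over t = 1..steps)."""
    n = len(A)
    k = [sum(row) for row in A]
    M = sum(k) / 2                      # number of edges
    q = [kx / M for kx in k]            # assumed initial resource q_x = k_x / M
    pi = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # pi_x(0) = e_x
    lrw = [[0.0] * n for _ in range(n)]
    srw = [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        # pi_x(t+1) = P^T pi_x(t), i.e. pi[x][y] <- sum_z pi[x][z] * a_zy / k_z
        pi = [[sum(pi[x][z] * A[z][y] / k[z] for z in range(n) if k[z])
               for y in range(n)] for x in range(n)]
        lrw = [[q[x] * pi[x][y] + q[y] * pi[y][x] for y in range(n)] for x in range(n)]
        srw = [[srw[x][y] + lrw[x][y] for y in range(n)] for x in range(n)]
    return lrw, srw
```

Summing over steps is what lets SRW credit both short and slightly longer paths: on the path graph 0-1-2, the two-step LRW score of the adjacent pair (0, 1) is 0, but its SRW score keeps the one-step contribution of 1.0.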
S3: following ten-fold cross-validation, divide the edge weights in the data set evenly into ten parts, using nine as the training set and the remaining one as the test set; perform multiple linear regression in the R language on the features computed in step S2 to obtain fitted values for the test set, and compare them with the original data to obtain the following evaluation metrics: the Pearson correlation coefficient and the root-mean-square error. The model of the invention is simple and yields good prediction results.
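Step S3 can be sketched in Python as follows. The patent performs the regression in R (where an ordinary least-squares fit would typically use `lm()`); this stand-in fits ordinary least squares with an intercept through the normal equations, assigns samples to ten folds at random, and averages the Pearson correlation and the RMSE over the folds. The feature matrix X is assumed to hold the similarity indices of each edge and y its weight; the fold assignment, the seed and all function names are assumptions.

```python
import random
from math import sqrt


def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (aug[i][n] - sum(aug[i][j] * z[j] for j in range(i + 1, n))) / aug[i][i]
    return z


def fit_ols(X, y):
    # Multiple linear regression: solve (X^T X) beta = X^T y with an intercept column
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(XtX, Xty)


def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


def rmse(a, b):
    return sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


def ten_fold_cv(X, y, folds=10, seed=0):
    """Return the mean Pearson correlation and mean RMSE over the test folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    parts = [idx[f::folds] for f in range(folds)]
    scores = []
    for f in range(folds):
        test = parts[f]
        train = [i for g in range(folds) if g != f for i in parts[g]]
        beta = fit_ols([X[i] for i in train], [y[i] for i in train])
        pred = predict(beta, [X[i] for i in test])
        truth = [y[i] for i in test]
        scores.append((pearson(pred, truth), rmse(pred, truth)))
    return sum(s[0] for s in scores) / folds, sum(s[1] for s in scores) / folds
```

On synthetic data that is exactly linear in the features, every fold recovers the coefficients, so the averaged Pearson coefficient is 1 and the RMSE is 0 up to rounding; on real edge weights the two metrics measure how much of the weight variation the similarity indices explain. The Pearson coefficient is undefined for a test fold whose weights are all equal.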
As described above, this method for predicting edge weights in undirected network graphs combines the similarity of network nodes with a multiple linear regression model; the final prediction results are good and meet the requirements of practical use. The present invention is to be considered illustrative and not restrictive. It will be understood by those skilled in the art that various changes, modifications and equivalents may be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (2)
1. An edge weight prediction method for undirected networks based on node similarity, characterized in that the method comprises the following steps:
S1: construct an undirected network graph G = (V, E) from an existing undirected network data set containing the network nodes and the weights of the edges between them;
S2: from the graph G = (V, E), using node similarity theory from link prediction, compute three types of features: local, global and semi-local similarity indices. The local similarity indices are common neighbors (CN), the Salton index, the Jaccard index, the Sørensen index, the hub promoted index (HPI), the hub depressed index (HDI), the LHN-I index, the preferential attachment index (PA), the Adamic-Adar index (AA) and the resource allocation index (RA); the global similarity indices are the Katz index, the LHN-II index, the average commute time (ACT), random-walk-based cosine similarity (cos+), random walk with restart (RWR), the SimRank index (SimR) and the matrix forest index (MFI); the semi-local similarity indices are the local path index (LP), the local random walk index (LRW) and the superposed local random walk index (SRW);
S3: following ten-fold cross-validation, divide the edge weights in the data set evenly into ten parts, using nine as the training set and the remaining one as the test set; perform multiple linear regression in the R language on the features computed in step S2, and finally compute the following evaluation metrics from the fitted results and the original data: the Pearson correlation coefficient and the root-mean-square error.
2. The edge weight prediction method for undirected networks based on node similarity as claimed in claim 1, wherein in step S2 the adjacency matrix of graph G is $A = (a_{ij})_{n \times n}$, $i, j \in \{1, 2, \ldots, n\}$, where $a_{ij} = 1$ if nodes i and j are connected by an edge and $a_{ij} = 0$ otherwise.
Using the adjacency matrix A, the following similarity indices are computed:
1) Common neighbors (CN):
$s_{xy}^{\mathrm{CN}} = |\Gamma(x) \cap \Gamma(y)|$,
where |Q| denotes the number of elements of set Q, Γ(x) is the set of neighbors of node x, and $s_{xy}^{\mathrm{CN}}$ is the CN index value between nodes x and y;
2) Salton index:
$s_{xy}^{\mathrm{Salton}} = \frac{|\Gamma(x) \cap \Gamma(y)|}{\sqrt{k_x k_y}}$,
where $k_x$ denotes the degree of node x;
3) Jaccard index:
4) Sørensen index:
5) Hub promoted index (HPI):
6) Hub depressed index (HDI):
7) LHN-I index:
8) Preferential attachment index (PA):
9) Adamic-Adar index (AA):
10) Resource allocation index (RA):
11) Katz index:
$S^{\mathrm{Katz}} = (I - \beta A)^{-1} - I$,
where I is the identity matrix and the parameter β must be smaller than the reciprocal of the largest eigenvalue $\lambda_1$ of the adjacency matrix A, i.e. $\beta < 1/\lambda_1$, to ensure that the series converges;
12) LHN-II index:
where $\delta_{xy}$ is the Kronecker delta ($\delta_{xy} = 1$ if x = y and 0 otherwise), D is the degree matrix of the undirected network graph G, i.e. $D_{ij} = k_i \delta_{ij}$, $k_x$ denotes the degree of node x, φ is a tunable parameter in the range (0, 1), $\lambda_1$ is the largest eigenvalue of the adjacency matrix A, and M is the total number of edges in the network;
13) Average commute time (ACT):
where $L^{+}$ is the pseudo-inverse of the Laplacian matrix L = D - A of network G, and $l^{+}_{xy}$ denotes an element of $L^{+}$;
14) Random-walk-based cosine similarity ($\cos^{+}$):
15) Random walk with restart (RWR):
where the element $\pi_{xy}$ is the probability that a particle starting from node x eventually reaches node y, (1 - c) is the probability that the particle returns to its starting node, and P is the Markov transition matrix of the network, whose element $P_{xy}$ is the probability that a particle at node x moves to node y in the next step;
16) SimRank index (SimR):
where $s_{xx} = 1$ and $C \in [0, 1]$ is the decay factor applied as similarity propagates;
17) Matrix forest index (MFI):
$S^{\mathrm{MFI}} = (I + \alpha L)^{-1}$, $\alpha > 0$,
where L = D - A is the Laplacian matrix of network G and I is the identity matrix;
18) Local path index (LP):
$S^{\mathrm{LP}} = A^2 + \varepsilon A^3$,
where ε is a free parameter; when ε = 0, LP reduces to CN;
19) Local random walk index (LRW):
where $q_x$ is the initial resource of node x, $\vec{\pi}_x(0) = \vec{e}_x$ is an n × 1 vector whose x-th element is 1 and all other elements are 0, and $\vec{\pi}_x(t+1) = P^{\mathrm{T}} \vec{\pi}_x(t)$ for t ≥ 0;
20) Superposed local random walk index (SRW):
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710136070.3A CN106960251A (en) | 2017-03-09 | 2017-03-09 | A kind of Undirected networks based on node similitude connect side right value Forecasting Methodology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710136070.3A CN106960251A (en) | 2017-03-09 | 2017-03-09 | A kind of Undirected networks based on node similitude connect side right value Forecasting Methodology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106960251A true CN106960251A (en) | 2017-07-18 |
Family
ID=59469992
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710136070.3A Pending CN106960251A (en) | 2017-03-09 | 2017-03-09 | A kind of Undirected networks based on node similitude connect side right value Forecasting Methodology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106960251A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108399491A (en) * | 2018-02-02 | 2018-08-14 | 浙江工业大学 | A kind of employee's diversity ranking method based on network |
CN108449311A (en) * | 2018-01-29 | 2018-08-24 | 浙江工业大学 | A kind of social networks hiding method based on attack node similitude |
CN108491511A (en) * | 2018-03-23 | 2018-09-04 | 腾讯科技(深圳)有限公司 | Data digging method and device, model training method based on diagram data and device |
CN108811028A (en) * | 2018-07-23 | 2018-11-13 | 南昌航空大学 | A kind of prediction technique, device and the readable storage medium storing program for executing of opportunistic network link |
CN109101629A (en) * | 2018-08-14 | 2018-12-28 | 合肥工业大学 | A kind of network representation method based on depth network structure and nodal community |
CN109726297A (en) * | 2018-12-28 | 2019-05-07 | 沈阳航空航天大学 | A kind of two subnetwork node prediction algorithms based on mutual exclusion strategy |
CN109829561A (en) * | 2018-11-15 | 2019-05-31 | 西南石油大学 | Accident forecast method based on smoothing processing Yu network model machine learning |
CN111310822A (en) * | 2020-02-12 | 2020-06-19 | 山西大学 | PU learning and random walk based link prediction method and device |
CN111865690A (en) * | 2020-07-21 | 2020-10-30 | 南昌航空大学 | Opportunistic network link prediction method based on network structure and time sequence |
-
2017
- 2017-03-09 CN CN201710136070.3A patent/CN106960251A/en active Pending
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108449311B (en) * | 2018-01-29 | 2020-08-04 | 浙江工业大学 | Social relationship hiding method based on attack node similarity |
CN108449311A (en) * | 2018-01-29 | 2018-08-24 | 浙江工业大学 | A kind of social networks hiding method based on attack node similitude |
CN108399491B (en) * | 2018-02-02 | 2021-10-29 | 浙江工业大学 | Employee diversity ordering method based on network graph |
CN108399491A (en) * | 2018-02-02 | 2018-08-14 | 浙江工业大学 | A kind of employee's diversity ranking method based on network |
CN108491511B (en) * | 2018-03-23 | 2022-03-18 | 腾讯科技(深圳)有限公司 | Data mining method and device based on graph data and model training method and device |
CN108491511A (en) * | 2018-03-23 | 2018-09-04 | 腾讯科技(深圳)有限公司 | Data digging method and device, model training method based on diagram data and device |
CN108811028B (en) * | 2018-07-23 | 2021-07-16 | 南昌航空大学 | Opportunity network link prediction method and device and readable storage medium |
CN108811028A (en) * | 2018-07-23 | 2018-11-13 | 南昌航空大学 | A kind of prediction technique, device and the readable storage medium storing program for executing of opportunistic network link |
CN109101629A (en) * | 2018-08-14 | 2018-12-28 | 合肥工业大学 | A kind of network representation method based on depth network structure and nodal community |
CN109829561A (en) * | 2018-11-15 | 2019-05-31 | 西南石油大学 | Accident forecast method based on smoothing processing Yu network model machine learning |
CN109829561B (en) * | 2018-11-15 | 2021-03-16 | 西南石油大学 | Accident prediction method based on smoothing processing and network model machine learning |
CN109726297A (en) * | 2018-12-28 | 2019-05-07 | 沈阳航空航天大学 | A kind of two subnetwork node prediction algorithms based on mutual exclusion strategy |
CN109726297B (en) * | 2018-12-28 | 2022-12-23 | 沈阳航空航天大学 | Bipartite network node prediction algorithm based on mutual exclusion strategy |
CN111310822A (en) * | 2020-02-12 | 2020-06-19 | 山西大学 | PU learning and random walk based link prediction method and device |
CN111865690A (en) * | 2020-07-21 | 2020-10-30 | 南昌航空大学 | Opportunistic network link prediction method based on network structure and time sequence |
CN111865690B (en) * | 2020-07-21 | 2022-06-03 | 南昌航空大学 | Opportunistic network link prediction method based on network structure and time sequence |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106960251A (en) | A kind of Undirected networks based on node similitude connect side right value Forecasting Methodology | |
Hao et al. | Probabilistic dual hesitant fuzzy set and its application in risk evaluation | |
Liu et al. | A FTA-based method for risk decision-making in emergency response | |
Kabak et al. | A fuzzy multi-criteria decision making approach to assess building energy performance | |
CN109523021B (en) | Dynamic network structure prediction method based on long-time and short-time memory network | |
CN110149237B (en) | Hadoop platform computing node load prediction method | |
CN106230773A (en) | Risk evaluating system based on fuzzy matrix analytic hierarchy process (AHP) | |
Yang et al. | Neural network and GA approaches for dwelling fire occurrence prediction | |
CN112580902B (en) | Object data processing method and device, computer equipment and storage medium | |
CN103577876A (en) | Credible and incredible user recognizing method based on feedforward neural network | |
Fan et al. | An improved approach to generate generalized basic probability assignment based on fuzzy sets in the open world and its application in multi-source information fusion | |
Li et al. | Analysis and modelling of flood risk assessment using information diffusion and artificial neural network | |
CN105228185A (en) | A kind of method for Fuzzy Redundancy node identities in identification communication network | |
Jat et al. | Applications of statistical techniques and artificial neural networks: A review | |
CN113761217A (en) | Artificial intelligence-based question set data processing method and device and computer equipment | |
Moradi et al. | Sensitivity analysis of ordered weighted averaging operator in earthquake vulnerability assessment | |
Roshanfar et al. | Predicting fatigue life of shear connectors in steel‐concrete composite bridges using artificial intelligence techniques | |
Li et al. | Research on financial risk crisis prediction of listed companies based on IWOA-BP neural network | |
CN102902875A (en) | Network-based method for evaluating reliability degree of failure-relevant system | |
Cissé et al. | Impact of neighborhood structure on epidemic spreading by means of cellular automata approach | |
Zhang et al. | Intrusion detection method based on improved growing hierarchical self-organizing map | |
Kim et al. | A study on influence of human personality to location selection | |
Bai et al. | Failure propagation of dependency networks with recovery mechanism | |
Mohammadian et al. | Intelligent decision making and analysis using fuzzy cognitive maps for disaster recovery planning | |
Runge et al. | Introduction to risk analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20170718 |