CN113612690A

CN113612690A - Directional network link prediction method

Info

Publication number: CN113612690A
Application number: CN202110893564.2A
Authority: CN
Inventors: 朱西方; 任越美; 李垒; 冀楠楠; 尹光兵; 顾光
Original assignee: Henan Polytechnic Institute
Current assignee: Henan Polytechnic Institute
Priority date: 2021-08-04
Filing date: 2021-08-04
Publication date: 2021-11-05

Abstract

The invention provides a directed network link prediction method, which comprises: determining a node set and a connecting edge set of a directed network; constructing a directed graph of the directed network according to the node set and the connecting edge set; determining the optional links in the directed network according to the directed graph; calculating the topological connectivity of each optional link; sorting the optional links in descending order of topological connectivity; and, according to the descending order, selecting the optional link with the largest topological connectivity value as the preferred link of the directed network. The beneficial effects of the invention are that, compared with the prior art, the prediction is based on the directed graph that has been built, and all optional network links are determined before the prediction is carried out. The topological connectivity of each optional link in the directed graph is then calculated, and the optional link with the larger topological connectivity value is determined to be the preferred link.

Description

Directional network link prediction method

Technical Field

The invention relates to the technical field of network links, in particular to a directed network link prediction method.

Background

Networks are ubiquitous in everyday life, for example the WWW, food-chain networks, academic collaboration networks, and social networks such as microblogs. Directed networks are a class of complex networks: the elements or individuals of a system are represented as nodes, and an edge between two nodes represents a relationship between them. Link prediction in a directed network uses the known nodes and network structure to estimate the likelihood that two nodes not yet connected by an edge will form a link. In a complex network this prediction reflects the dynamic evolution mechanism of the network; however, the closeness between two nodes cannot be judged quickly, which makes it difficult to control the directed interaction of the network.

Early research on directed networks mostly studied social networks, biochemical networks, ecological networks, engineering networks and the like, and proposed generation mechanisms and design principles for the underlying structure of directed networks.

Link formation in directed networks has received more attention and research than in undirected networks, because relationships such as following are asymmetric. Although the prior art proposes link prediction based on the relative degree of the directed network, that approach requires a large number of relative-degree calculations. In practice it can only be applied in fields where such computation is available; the calculations are intricate and complex, leave no room for human involvement, and only their displayed results can be judged. Moreover, the relative-degree index of the directed network must be determined in advance, and prediction cannot be performed when that index cannot be determined.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a directed network link prediction method that overcomes the defects of the prior art: it describes the directed network quantitatively through a directed graph of the network, determines the optional links, and finally selects the preferred link based on topological connectivity.

A method of directed network link prediction, comprising:

determining a node set and a connecting edge set of a directed network;

constructing a directed graph of a directed network according to the node set and the connecting edge set;

determining an optional link in a directed network according to the directed graph;

respectively calculating the topological connectivity of each optional link;

sorting the optional links in descending order according to the topological connectivity;

and, according to the descending order, selecting the optional link with the largest topological connectivity value as the preferred link of the directed network.

As an embodiment of the present invention, the determining a node set and a connecting edge set of a directed network includes:

acquiring link data of a directed network;

determining link nodes in the link data according to the link data;

carrying out node marking and direction marking on the link nodes, and determining node marking parameters and direction marking parameters;

determining a node set of the directed network according to the node marking parameters;

calculating the similarity of the nodes according to the direction marking parameters;

and performing node edge connection according to the similarity to generate directed links, and performing statistics on each directed link to form an edge connection set.
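The first and last of these steps can be sketched in Python as follows. This is a minimal illustration, not the patented procedure: it assumes the link data is already available as (source, target) pairs, it skips the marking and similarity steps, and the names `build_sets` and `links` are hypothetical.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # (source node, target node)


def build_sets(link_data: Iterable[Edge]) -> Tuple[Set[str], Set[Edge]]:
    """Derive the node set and the directed connecting-edge set from link records."""
    nodes: Set[str] = set()
    edges: Set[Edge] = set()
    for src, dst in link_data:
        nodes.update((src, dst))
        if src != dst:  # assumption: self-loops are not treated as links
            edges.add((src, dst))
    return nodes, edges


# Illustrative input only; the duplicate record collapses into one edge.
links = [("a", "b"), ("b", "c"), ("a", "b"), ("c", "a")]
nodes, edges = build_sets(links)
```

Using sets means repeated records do not inflate the edge count, and direction is preserved because ("a", "b") and ("b", "a") are distinct tuples.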

As an embodiment of the present invention, the calculating of the node similarity may use the following methods:

a Bifan-based frequency prediction method;

a common neighbor-based node similarity method;

common neighbor node similarity method based on DAA;

a proportional method based on resource allocation indicators;

a node in-out degree method based on preference connection indexes;

a third-order function and an algorithm based on local similarity indexes;

global attributes and algorithms based on Katz;

a particle calculation method based on node average commute time;

a probability calculation method based on random walk;

a local random walk method based on superposition effects.
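Two of the listed local indices can be illustrated with a short Python sketch. The directed definitions used here are an assumption, since the text does not give formulas: a common neighbor of the candidate link x -> y is taken to be any node z with x -> z and z -> y, and the resource allocation index divides by the out-degree of z. All function names are hypothetical.

```python
from collections import defaultdict


def neighbor_maps(edges):
    """Return (out-neighbors, in-neighbors) dictionaries for a directed edge set."""
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for u, v in edges:
        out_nb[u].add(v)
        in_nb[v].add(u)
    return out_nb, in_nb


def common_neighbors(out_nb, in_nb, x, y):
    """Count intermediate nodes z such that x -> z -> y."""
    return len(out_nb[x] & in_nb[y])


def resource_allocation(out_nb, in_nb, x, y):
    """Sum 1 / out-degree(z) over the intermediate nodes z of x -> z -> y."""
    return sum(1.0 / len(out_nb[z]) for z in out_nb[x] & in_nb[y])


# Illustrative graph: a reaches d through b and through c.
example_edges = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("c", "e")}
out_nb, in_nb = neighbor_maps(example_edges)
cn_score = common_neighbors(out_nb, in_nb, "a", "d")      # 2
ra_score = resource_allocation(out_nb, in_nb, "a", "d")   # 1/1 + 1/2 = 1.5
```

Every intermediate node z has at least one outgoing edge (the one to y), so the division in `resource_allocation` cannot hit zero.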

As an embodiment of the present invention, the constructing a directed graph of a directed network according to the node set and the connected edge set includes:

performing parameter definition on the node set and the connecting edge set; wherein,

the node set is defined as x, and the connecting edge set is defined as y;

determining a node sequence among different nodes in the node set according to the parameter definition;

determining a domination relation and an equivalent state between different nodes according to the node sequence;

determining the path directions of different nodes in the directed network according to the domination relation and the equivalence state;

and, according to the path directions, placing the different nodes on the same graph to generate the directed graph.
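As a rough illustration of the end result of these steps, the directed graph can be held as an adjacency list built from the node set x and the connecting edge set y. The sketch below assumes that representation; it does not model the domination relation or the equivalence state, and `build_digraph` is a hypothetical name.

```python
def build_digraph(x, y):
    """Build an adjacency list {node: [successors]} from node set x and edge set y."""
    adj = {node: [] for node in sorted(x)}
    for src, dst in sorted(y):
        if src not in adj or dst not in adj:
            raise ValueError(f"edge ({src!r}, {dst!r}) refers to a node outside x")
        adj[src].append(dst)  # the edge direction is the path direction src -> dst
    return adj


digraph = build_digraph({"a", "b", "c"}, {("a", "b"), ("a", "c"), ("c", "a")})
```

Sorting the nodes and edges only makes the output deterministic; it is not required by the method.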

As an embodiment of the present invention, the determining, according to the directed graph, an optional link in a directed network includes:

determining information contribution of any node and a connecting edge thereof according to the directed graph;

calculating the similarity index of any node according to the information contribution;

respectively determining neighbor nodes of each node according to the similarity indexes; wherein,

the neighbor nodes comprise out-neighbors, in-neighbors and reciprocal neighbors;

respectively setting the edge connecting weight of the neighbor nodes, and constructing a network weighted adjacency matrix according to the edge connecting weight;

constructing a linear programming model based on information contribution according to the network weighted adjacency matrix;

and determining the selectable links within a preset linear value according to the linear programming model.
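The neighbor classification and the weighted adjacency matrix in these steps can be sketched as follows. The weights are assumptions chosen for illustration (1.0 for a one-way edge, 2.0 for a reciprocal edge); the text does not state the actual weights, and the linear programming model is not covered here. The function names are hypothetical.

```python
def classify_neighbors(edges, node):
    """Split the neighbors of `node` into out-only, in-only and reciprocal sets."""
    succ = {v for u, v in edges if u == node}
    pred = {u for u, v in edges if v == node}
    return {"out": succ - pred, "in": pred - succ, "reciprocal": succ & pred}


def weighted_adjacency(nodes, edges, w_single=1.0, w_reciprocal=2.0):
    """Return (node order, matrix W) where W[i][j] is the weight of edge i -> j."""
    order = sorted(nodes)
    index = {n: i for i, n in enumerate(order)}
    W = [[0.0] * len(order) for _ in order]
    for u, v in edges:
        # assumed weighting: reciprocal edges count more than one-way edges
        W[index[u]][index[v]] = w_reciprocal if (v, u) in edges else w_single
    return order, W


nb_edges = {("a", "b"), ("b", "a"), ("a", "c"), ("d", "a")}
kinds = classify_neighbors(nb_edges, "a")
order, W = weighted_adjacency({"a", "b", "c", "d"}, nb_edges)
```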

As an embodiment of the present invention, the separately calculating the topology connectivity of each optional link includes:

determining the degree centrality index of the nodes according to the optional links, and calculating the out-degree and in-degree of each node:

wherein f_c(i) represents the out-degree of the i-th node; f_r(i) represents the in-degree of the i-th node; J_xy represents the in-degree from node x to connecting edge y; J_yx represents the out-degree from connecting edge y to node x; i belongs to n, and i is a positive integer;

and calculating the topological connectivity of each optional link according to the out-degree and in-degree of the nodes:

wherein L represents the topological connectivity.

As an embodiment of the present invention, the sorting, in descending order, of the selectable links according to the topology connectivity includes:

taking the topological connectivity of the optional link as a training sample, and building a random forest model based on the training sample;

screening out target training samples meeting preset conditions from a training sample set corresponding to each decision tree of the random forest model to form a target training sample set for constructing the decision tree;

obtaining the variable importance of each characteristic variable in each decision tree, and sorting all the characteristic variables in a descending order according to the variable importance;

and, according to the target training sample set and all the characteristic variables after descending sorting, starting from the root node of the decision tree and using the Gini coefficient as the splitting rule, sequentially determining the optimal characteristic variable and the optimal split value for each node in the decision tree, and arranging the results in descending order.
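The Gini-based splitting rule used at each decision-tree node can be illustrated in isolation. The sketch below is a generic, single-feature version written in plain Python under the assumption that samples carry a numeric feature value and a class label; it is not a full random forest, and the names `gini` and `best_split` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted Gini) of the best split `value <= threshold`."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best = (None, gini(labels))  # no split at all is the baseline
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated by a threshold
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2, score)
    return best


# Illustrative samples: low connectivity labelled 0, high connectivity labelled 1.
threshold, impurity = best_split([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
```

For these four samples the split between 0.2 and 0.8 separates the classes completely, so the weighted impurity drops from 0.5 to 0.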

As an embodiment of the present invention, the selecting, according to the descending order, the optional link with the largest topology connectivity value as the preferred link of the directed network includes:

step (1): according to the descending order, determining the original features of the nodes in the corresponding order, and removing from the original feature set the features irrelevant to the label content and the features whose frequency is too low, to obtain a candidate feature set;

step (2): clustering the features in the candidate feature set using a feature clustering algorithm to obtain corresponding feature groups;

step (3): introducing a hidden variable into each feature group to obtain a corresponding hidden model, and calculating the correlation between the hidden variable and the label;

step (4): sorting the feature groups in descending order of the correlation between their hidden variables and the label;

step (5): adding the sorted feature groups one by one to the selected feature subset; each time a group is added, connecting the label Y to the hidden variable of the added feature group and connecting the hidden variable to the features in that group, so as to obtain a Bayesian network containing hidden variables; performing parameter learning on the Bayesian network; and calculating the classification accuracy of the learned Bayesian network;

step (6): plotting the classification accuracy against the number of added feature groups, and obtaining the corresponding optimal link from the point at which the curve converges or reaches its highest accuracy.
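The final step, reading a result off the accuracy curve, can be sketched as below. This assumes the classification accuracies have already been measured after each feature group was added; the convergence tolerance `tol`, the function name and the sample accuracies are all hypothetical, and the Bayesian network itself is not implemented here.

```python
def pick_group_count(accuracies, tol=1e-3):
    """Return the smallest number of groups whose accuracy is within tol of the best.

    accuracies[k - 1] is the classification accuracy after adding k feature groups.
    """
    if not accuracies:
        raise ValueError("at least one accuracy value is required")
    best = max(accuracies)
    for k, acc in enumerate(accuracies, start=1):
        if best - acc <= tol:
            return k


# Made-up curve for illustration: gains flatten out after the third group.
curve = [0.71, 0.80, 0.86, 0.8605, 0.859]
chosen = pick_group_count(curve)
```

Choosing the earliest point within the tolerance favours the smaller feature subset when the curve has effectively converged, while `tol=0` falls back to the first point of highest accuracy.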

The beneficial effects of the invention are as follows: compared with the prior art, the prediction is based on the directed graph that has been built, and all optional network links are determined before the prediction is carried out. The topological connectivity of each optional link in the directed graph is then calculated, and the optional link with the larger topological connectivity value is determined to be the preferred link.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

Fig. 1 is a flowchart of a directed network link prediction method according to an embodiment of the present invention;

Fig. 2 is a flowchart of the method for generating the node set and the connecting edge set of the directed network according to an embodiment of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.

As shown in fig. 1, the present invention is a method for predicting a directed network link, including:

determining a node set and a connecting edge set of a directed network;

A directed network is made up of nodes that are connected by edges, and link prediction determines the likelihood that a link exists between two nodes; the invention therefore first determines the node set and the connecting edge set separately.

The directed network graph is a directed graph of connecting edges and can be used to determine the edges of a directed network in any environment or scenario.

The directed network graph is built from the node set and the connecting edge set established by the invention. It records, for each node, its correlation with the other nodes and the connecting edges that meet at it, and the likelihood of a connection is predicted from the correlation between connecting edges and nodes. The connecting-edge links of each node, and hence all optional links between any two nodes, are thereby determined. An optional link, once determined, is one that can be adopted in the directed network, but whether it is the optimal link still has to be judged; the invention therefore calculates the topological connectivity.

respectively calculating the topological connectivity of each optional link;

Topological connectivity is a notion from point-set topology used to judge whether two nodes are connected; once it is introduced, the quality of a link can be judged from the topological connectivity between its nodes.

Descending sorting simply means ordering from largest to smallest, so the link with the largest value is taken to be the best link.
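The sorting and selection described here amount to a few lines of Python. The sketch assumes each optional link already has a topological connectivity value; the scores shown are made up for illustration and `rank_links` is a hypothetical name.

```python
def rank_links(scores):
    """Sort links by connectivity, largest first; ties are broken by link name."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Illustrative values only; keys are (source, target) pairs.
scores = {("a", "d"): 4.0, ("b", "e"): 1.0, ("a", "e"): 2.0}
ranked = rank_links(scores)
preferred_link = ranked[0][0]  # the link with the largest connectivity value
```

The explicit tie-break keeps the result deterministic when two links share the same connectivity value.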

The principle of the technical scheme is as follows: in the prior art, link prediction is generally implemented by means such as vector calculation, line metering and line perturbation; these methods are outdated and have a number of technical defects. The invention therefore constructs a directed graph from the node set and the connecting edge set of the directed network, and determines the optimal link from its connectivity through topological calculation.

The beneficial effects of the above technical scheme are that: compared with the prior art, the prediction is based on the directed graph that has been built, and all optional network links are determined before the prediction is carried out. The topological connectivity of each optional link in the directed graph is then calculated, and the optional link with the larger topological connectivity value is determined to be the preferred link.

As an embodiment of the present invention, shown in Fig. 2, the determining a node set and a connecting edge set of a directed network includes:

acquiring link data of a directed network;

The link data of the directed network comprises the connecting-edge data of the directed network and the node data in the directed network; the data formed by combining connecting edges and nodes is the link data.

Determining link nodes in the link data according to the link data;

carrying out node marking and direction marking on the link nodes, and determining node marking parameters and direction marking parameters; because the network is directed, its nodes are all directional (also called directed nodes), so they can be marked effectively. In the process of direction marking, the nodes are determined in a conventional parameterized manner, which yields the node set.

Calculating the similarity of the nodes according to the direction marking parameters; because the nodes have been parameterized, the similarity between nodes can be calculated with a similarity calculation method, and the nodes are matched to connecting edges according to that similarity. The nodes joined by a connecting edge are the interacting endpoints of that edge, and each link is determined by its connecting edges.

a Bifan-based frequency prediction method;

a common neighbor-based node similarity method;

common neighbor node similarity method based on DAA;

a proportional method based on resource allocation indicators;

a node in-out degree method based on preference connection indexes;

a third-order function and an algorithm based on local similarity indexes;

global attributes and algorithms based on Katz;

a particle calculation method based on node average commute time;

a probability calculation method based on random walk;

a local random walk method based on superposition effects.

The principle of the technical scheme is as follows: similarity must be calculated for different types of directed networks, and different types call for different calculation methods; the invention therefore includes the various similarity calculation methods listed above.

The beneficial effects of the above technical scheme are that: the method can adapt to various directed networks of different types and is suitable for more network scenes.
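As one example of a global method from the list above, the Katz index scores a node pair by summing over all directed paths between them, with a path of length l weighted by beta to the power l. The sketch below is a generic truncated version in plain Python, not the specific algorithm of the invention; `katz_scores`, `beta` and `max_len` are illustrative names and values.

```python
def katz_scores(adj, beta=0.1, max_len=4):
    """Truncated Katz index: sum over l = 1..max_len of beta**l * A**l.

    adj is a square 0/1 adjacency matrix with adj[i][j] = 1 for an edge i -> j.
    """
    n = len(adj)
    power = [row[:] for row in adj]                      # A**1
    total = [[beta * v for v in row] for row in adj]     # beta * A
    factor = beta
    for _ in range(2, max_len + 1):
        # multiply the running power by A to count paths one step longer
        power = [[sum(power[i][k] * adj[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
        factor *= beta
        for i in range(n):
            for j in range(n):
                total[i][j] += factor * power[i][j]
    return total


# Illustrative path graph a -> b -> c (indices 0, 1, 2).
path_adj = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
katz = katz_scores(path_adj, beta=0.5, max_len=3)
```

Here the pair (a, c) has no direct edge but receives a score of 0.25 from the single two-step path, while (c, a) stays at 0 because direction is respected. The untruncated series only converges when beta is smaller than the reciprocal of the largest eigenvalue of the adjacency matrix.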

the node set is defined as x, and the connecting edge set is defined as y;

The principle of the technical scheme is as follows: the directed graph is constructed in parameterized form, so the correlation between the connecting edge set and the node set must be determined first. The invention therefore establishes a two-dimensional coordinate system and builds the network graph on it. From the parameter definitions and the node sequence in this coordinate system, the domination relation and the equivalence state between different nodes can be judged, which determines the path directions. Directed coordinates are then established on the node coordinate system, and the directed graph is generated from the directed coordinates, the node set and the connecting edge set.

The beneficial effects of the above technical scheme are that: the directed graph is built on a two-dimensional coordinate system, which is a plane coordinate system; building the directed graph this way is convenient and matches the actual structure of a network.

The principle of the technical scheme is as follows: when determining the optional links, the invention determines the information contribution of each node and its connecting edges. The information contribution is the share that the link formed by a connecting edge holds, in data transmission time and data volume, of the data transmission over all connecting edges. It also serves as a similarity index: if the connecting edges of two different nodes transmit the same data, the two nodes are related, that is, they are neighbor nodes. The connecting-edge weights are then calculated from the neighbor nodes, and the network weighted adjacency matrix is formed from these weights. The linear programming model is a data model built on the existing network from data transmission time and data transmission contribution, and all optional links can be determined through it.

The beneficial effects of the above technical scheme are that: the invention can judge whether nodes are neighbor nodes according to whether they transmit the same information, and can determine all optional links from the neighbor nodes.

The principle of this step is as follows: when the topological connectivity is calculated, the out-degree and in-degree of each node are calculated first. In a directed graph, the number of edges leaving a vertex is called its out-degree, and the number of edges entering a vertex is called its in-degree; the out-degree and in-degree of each node are calculated accordingly.

wherein L represents the topological connectivity.

The final topological connectivity is then calculated from the out-degree and in-degree obtained above and from their product, taken across the different nodes.
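The degree calculation can be sketched in Python as below, with `f_c` and `f_r` named after the out-degree and in-degree symbols used in the text. The formula for L is not reproduced in this text, so `connectivity_stand_in` is purely an assumed placeholder (out-degree of the source times in-degree of the target) included to show where the degrees would be used; it should not be read as the patented formula.

```python
from collections import Counter


def degrees(edges):
    """Return (f_c, f_r): out-degree and in-degree counters for a directed edge list."""
    f_c = Counter(u for u, _ in edges)  # edges leaving each node
    f_r = Counter(v for _, v in edges)  # edges entering each node
    return f_c, f_r


def connectivity_stand_in(f_c, f_r, link):
    """Assumed placeholder score: out-degree(source) * in-degree(target)."""
    src, dst = link
    return f_c[src] * f_r[dst]


deg_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")]
f_c, f_r = degrees(deg_edges)
score_db = connectivity_stand_in(f_c, f_r, ("d", "b"))  # candidate link d -> b
```

A `Counter` returns 0 for nodes that never appear as a source or a target, so isolated endpoints need no special handling.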

The principle of the technical scheme is as follows: a descending sort orders links purely by value, but transmission in a directed network is point-to-point and directed and must be accurate, so a highly ranked link is not necessarily suitable for the current node. The invention therefore uses a random forest model. The decision trees in the random forest judge whether a ranked link is suitable for data transmission at the current node; the suitability conditions are parameterized, all links are used for training, and the characteristic variables of the decision trees are obtained and sorted in descending order. Finally, the optimal characteristic variables are split using the Gini coefficient as the splitting rule, which yields the final descending order.

The beneficial effects of the above technical scheme are that: the descending sort is carried out through the decisions of a random forest model, which yields the final descending order of the optional links.

The principle of the technical method is as follows: the label content is the content of the data transmitted over the directed network. During descending sorting, the invention obtains the node features of each node and clusters them into feature groups. The hidden variable that is introduced is the amount of data transmitted by a node; this amount is not counted during transmission, so it can serve as a hidden variable for constructing the hidden model. The feature groups are sorted by the relation between the hidden variable and the label, the Bayesian network is then applied to the sorted groups, a curve is generated from the grouped results, and the optimal link is obtained from that curve.

Obtaining the optimal link is fairly involved because, in practice, the data transmitted differs from one directed network to another; determining the optimal link also means determining that the transmitted data and the link suit each other. For this reason the above method is adopted, and the classification performed by the Bayesian network is mainly a classification of the data transmitted over the directed network.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A method for predicting a directed network link, comprising:

determining a node set and a connecting edge set of a directed network;

respectively calculating the topological connectivity of each optional link;

2. The method of claim 1, wherein determining the node set and the connecting edge set of the directed network comprises:

acquiring link data of a directed network;

determining link nodes in the link data according to the link data;

3. The method of claim 1, wherein computing the node similarities comprises:

a Bifan-based frequency prediction method;

a common neighbor-based node similarity method;

common neighbor node similarity method based on DAA;

a proportional method based on resource allocation indicators;

a node in-out degree method based on preference connection indexes;

a third-order function and an algorithm based on local similarity indexes;

global attributes and algorithms based on Katz;

a particle calculation method based on node average commute time;

a probability calculation method based on random walk;

a local random walk method based on superposition effects.

4. The method of claim 1, wherein constructing the directed graph of the directed network according to the node set and the edge set comprises:

the node set is defined as x, and the connecting edge set is defined as y;

5. The method as claimed in claim 1, wherein said determining the optional links in the directed network according to the directed graph comprises:

6. The method of claim 1, wherein the calculating the topology connectivity of each optional link comprises:

wherein L represents the topological connectivity.

7. The method as claimed in claim 1, wherein said sorting the selectable links in descending order according to the topology connectivity comprises:

8. The method according to claim 1, wherein the selecting the optional link with the largest topology connectivity value as the preferred link of the directed network according to the descending order comprises: