CN116155755A

CN116155755A - Link symbol prediction method based on linear optimization closed sub-graph coding

Info

Publication number: CN116155755A
Application number: CN202310142125.7A
Authority: CN
Inventors: 谭少林; 方志宏; 方遒; 李哲
Original assignee: Hunan University
Current assignee: Hunan University
Priority date: 2023-02-21
Filing date: 2023-02-21
Publication date: 2023-05-23
Anticipated expiration: 2043-02-21
Also published as: CN116155755B

Abstract

The invention discloses a link symbol prediction method based on linear optimization closed sub-graph coding. The method comprises: obtaining a network to be predicted and preprocessing it to obtain a training set sample and a test set sample; taking one edge from the training set sample or the test set sample as the edge to be predicted, extracting the corresponding closed subgraph, applying coarse-grained weighting to the closed subgraph to obtain a pre-weighted adjacency matrix, applying fine-grained linear optimization weighting to obtain three likelihood matrices, and processing and splicing the three likelihood matrices to obtain the vector corresponding to the edge to be predicted; traversing all edges in the training set sample and the test set sample and repeating the above to obtain a plurality of vectors; and inputting the vectors into a trained fully-connected neural network binary classifier, which outputs a link symbol prediction score from which the symbol prediction result of the edge to be predicted is obtained. The method allows the nodes in the closed subgraph to be ordered directly and simply, and improves link symbol prediction performance.

Description

Link symbol prediction method based on linear optimization closed sub-graph coding

Technical Field

The invention relates to the technical field of data mining, in particular to a link symbol prediction method based on linear optimization closed sub-graph coding.

Background

With the rise of media, advances in the internet and the development of cloud computing have led to a large volume of social media interactions, such as subscribing, following, commenting and forwarding. Over the last few decades, the field of link prediction has matured because of its ability to analyze network structure efficiently, producing a large body of well-understood methods and algorithms for mining potential network interaction information. However, general link prediction focuses on positive interactions and ignores interactions that express obvious conflict, such as distrust and objection. Link symbol prediction, an important sub-field of link prediction, therefore takes the negative interaction information in social media links into account and has great commercial value. The purpose of link symbol prediction is to make friend recommendations in a social network, where potential information can be mined from the two opposite kinds of interaction.

Link symbol prediction is similar to the link prediction problem: the symbol (positive or negative sign) of a target link is determined from the link symbols already known in the network. Since both link prediction and link symbol prediction are binary classification problems on graph-structured data, some link prediction algorithms can be used directly for link symbol prediction. Most current link prediction methods use graph embedding, which maps each node to a representation vector for downstream tasks such as node classification, link prediction and community detection. The advantage of graph embedding is that the strong fitting capacity of deep neural networks can be exploited to achieve higher prediction accuracy. A number of graph embedding methods have recently been designed specifically for symbol networks, but two drawbacks are common. First, the edges of a symbol graph differ in type, for example in symbol and direction, so the neighborhood structure of a symbol graph is more complex and ordinary graph embedding methods cannot aggregate information from the different neighborhoods. Second, the evolution mechanisms of symbols and directions have so far been little studied, so most graph embedding methods rely on message-passing mechanisms designed from the assumptions of balance theory and status theory; they cannot automatically learn a suitable network evolution mechanism from the neighborhood, and their generalization capability is poor.

Disclosure of Invention

To fill the gap in subgraph coding methods for directed symbol graphs, the invention provides a link symbol prediction method based on linear optimization closed sub-graph coding. Linear optimization assumes that the likelihood that a link exists is a linear sum of the contributions of the common neighbors of its end nodes. Since classical linear optimization methods were designed for unsigned networks, two limitations must be overcome to apply them to signed directed graphs. On the one hand, classical linear optimization methods focus on the global graph structure and therefore cannot encode the hierarchical structure of the closed subgraph around the target nodes. For example, in determining the sign of a target link, first-order neighbors in the closed subgraph are typically more important than second-order neighbors, whereas linear optimization methods cannot distinguish between them. Furthermore, positive and negative links in the closed subgraph may contribute differently. To solve this problem, the invention provides a new weight allocation strategy that reassigns the weight of each edge according to its distance to the target edge and its sign. This strategy gives the closed subgraph a hierarchy. On the other hand, classical linear optimization methods only consider the contribution of neighbors. The invention generalizes this assumption to the contributions of the outgoing and incoming neighbors of the two target nodes in a directed symbol network. Based on this, the invention proposes an improved linear optimization method to derive the likelihood matrices of the closed subgraph. Once the encoded adjacency matrix has been linearly optimized, the contribution of each node in the closed subgraph to the two target nodes is available, and this contribution can be used as an importance score to order the nodes in the closed subgraph directly and simply.

The invention aims to provide a link symbol prediction method based on linear optimization closed sub-graph coding, which comprises the following steps:

S1, acquiring a network to be predicted, preprocessing the network to be predicted to obtain a directed symbol network that does not contain isolated nodes, and taking edges from the directed symbol network to form a training set sample and a test set sample respectively;

S2, taking one edge from the training set sample or the test set sample as the edge to be predicted, extracting a closed subgraph containing the edge to be predicted, applying coarse-grained weighting to each edge in the closed subgraph according to an edge pre-weighting strategy to obtain a pre-weighted adjacency matrix, and inputting the pre-weighted adjacency matrix into a linear optimization model for fine-grained linear optimization weighting to obtain three likelihood matrices;

S3, processing the three likelihood matrices to obtain three corresponding ordered likelihood matrices, and splicing the three ordered likelihood matrices to obtain the vector corresponding to the edge to be predicted;

S4, taking another edge from the training set sample or the test set sample as the edge to be predicted and repeating steps S2 to S3 until all edges in the training set sample or the test set sample have been taken, thereby obtaining a plurality of vectors;

S5, presetting a fully-connected neural network binary classifier and training it to obtain a trained fully-connected neural network binary classifier, inputting the plurality of vectors into the trained classifier, outputting a link symbol prediction score, and obtaining the symbol prediction result corresponding to the edge to be predicted according to the link symbol prediction score.

Preferably, S1 comprises:

S11, acquiring a network to be predicted, and removing the weights of the edges in the network to be predicted to obtain a directed symbol network that does not contain isolated nodes;

S12, deleting a preset proportion of the positive edges and of the negative edges from the directed symbol network that does not contain isolated nodes, thereby obtaining a training network;

S13, taking the positive edges in the training network as training set positive samples, taking the negative edges in the training network as training set negative samples, taking the positive edges deleted from the directed symbol network as test set positive samples, and taking the negative edges deleted from the directed symbol network as test set negative samples;

S14, forming the training set sample from the training set positive samples and the training set negative samples, and forming the test set sample from the test set positive samples and the test set negative samples.
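Steps S11 to S14 can be illustrated with a minimal Python sketch. The function names `to_sign` and `split_signed_edges`, the `(u, v, sign)` tuple layout, the use of the weight's sign as the edge symbol and the random choice of which edges to delete are assumptions made for illustration; the text only specifies that edge weights are removed and that a preset proportion of positive and of negative edges is deleted.

```python
import random

def to_sign(weight):
    """S11 (assumed reading): drop the edge weight and keep only its sign."""
    return 1 if weight > 0 else -1

def split_signed_edges(edges, test_ratio=0.2, seed=0):
    """Split signed directed edges into training and test samples.

    edges: list of (u, v, sign) tuples with sign in {+1, -1}.
    The same proportion of positive and of negative edges is deleted (S12);
    deleted edges form the test samples, the rest the training samples (S13, S14).
    """
    rng = random.Random(seed)
    train, test = [], []
    for sign in (1, -1):
        group = [e for e in edges if e[2] == sign]
        rng.shuffle(group)                      # random selection is an assumption
        n_test = int(round(test_ratio * len(group)))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

A practical implementation would also need to check that deleting edges does not leave isolated nodes in the training network; that check is omitted here.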

Preferably, in S2, one edge is taken as an edge to be predicted from any one of the training set sample or the test set sample, and the extracting a closed subgraph including the edge to be predicted specifically includes:

S21, presetting a node threshold, taking one edge from either the training set sample or the test set sample as the edge to be predicted $(v_i, v_j)$, and extracting the nodes $v_i$ and $v_j$ of the edge to be predicted together with their $h$-order ($h \geq 1$) neighbor nodes $\Gamma^h(v_i)$ and $\Gamma^h(v_j)$ to form a node set;

S22, calculating the number of nodes in the node set and comparing it with the preset node threshold; if the number of nodes is smaller than the preset node threshold, executing step S23; otherwise, if the number of nodes is greater than or equal to the preset node threshold, or no new neighbor node can be added, executing step S24;

S23, adding the $(h+1)$-order neighbor nodes $\Gamma^{h+1}(v_i)$ and $\Gamma^{h+1}(v_j)$ of the nodes $v_i$ and $v_j$ forming the edge to be predicted to the node set, and returning to step S22;

S24, finishing the addition of neighbor nodes to obtain the closed subgraph corresponding to the edge to be predicted $(v_i, v_j)$.
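The loop formed by steps S21 to S24 can be sketched as a breadth-first expansion around the two target nodes. The function name `extract_closed_subgraph` and the `(u, v, sign)` edge tuples are hypothetical, and the sketch starts from $h = 1$; neighbors are taken over both outgoing and incoming edges.

```python
def extract_closed_subgraph(edges, target, K):
    """Return (nodes, induced_edges) of the closed subgraph around `target`.

    edges:  list of (u, v, sign) tuples of the directed symbol network.
    target: the edge to be predicted, as a pair (vi, vj).
    K:      preset node threshold.
    """
    # Neighbor lookup over both edge directions (out- and in-neighbors).
    nbrs = {}
    for u, v, _ in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)

    vi, vj = target
    nodes = {vi, vj}
    frontier = {vi, vj}
    while True:
        # Collect the next-order neighbors that are not yet in the node set.
        new = set()
        for n in frontier:
            new |= nbrs.get(n, set())
        new -= nodes
        if not new:            # no new neighbor node can be added (S24)
            break
        nodes |= new           # S21 for the first order, S23 afterwards
        frontier = new
        if len(nodes) >= K:    # threshold reached (S22 -> S24)
            break

    induced = [(u, v, s) for u, v, s in edges if u in nodes and v in nodes]
    return nodes, induced
```

Because a whole neighbor order is added at once, the resulting node set can exceed K; the later ordering step trims it back to K nodes.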

Preferably, the edges in the closed subgraph comprise positive edges and negative edges, and in S2, coarse-grained weighting is performed on each edge in the closed subgraph according to an edge pre-weighting strategy to obtain a pre-weighting adjacency matrix, wherein the edge pre-weighting strategy specifically comprises:

if the edges in the closed subgraph are positive edges, the corresponding edge pre-weighting strategy is:

if the edges in the closed subgraph are negative edges, the corresponding edge pre-weighting strategy is:

where $W_{xy}$ is the weight of edge $(v_x, v_y)$ in the closed subgraph, $d((v_i, v_j), (v_x, v_y))$ is the distance from the edge to be predicted $(v_i, v_j)$ to the edge $(v_x, v_y)$ in the closed subgraph, and $\beta$ is the negative edge weight factor.

Preferably, the linear optimization model in S2 is specifically:

where $Z^{(1)}$ is the first contribution matrix, $Z^{(2)}$ is the second contribution matrix, $Z^{(3)}$ is the third contribution matrix, $W$ is the pre-weighted adjacency matrix, $\alpha_1, \alpha_2, \alpha_3$ are free parameters greater than zero, and $\lVert \cdot \rVert$ denotes a norm.

Preferably, three likelihood matrices are obtained in S2, where the three likelihood matrices are specifically:

$$S^{(1)} = \alpha_1 W (\alpha_1 W^{T} W + I)^{-1} W^{T} W$$

$$S^{(2)} = \alpha_2 W^{T} (\alpha_2 W W^{T} + I)^{-1} W W$$

$$S^{(3)} = \alpha_3 W W (\alpha_3 W^{T} W + I)^{-1} W^{T}$$

where $S^{(1)}$ is the first likelihood matrix, $S^{(2)}$ is the second likelihood matrix, $S^{(3)}$ is the third likelihood matrix, $W$ is the pre-weighted adjacency matrix, $\alpha_1, \alpha_2, \alpha_3$ are free parameters greater than zero, and $I$ is the identity matrix.
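The three closed-form expressions can be evaluated directly with NumPy. The sketch below is illustrative only: the function name `likelihood_matrices` and the default values of the free parameters are assumptions, and `numpy.linalg.solve` is used in place of an explicit matrix inverse, which gives the same result.

```python
import numpy as np

def likelihood_matrices(W, alphas=(1.0, 1.0, 1.0)):
    """Evaluate S1, S2, S3 from the pre-weighted adjacency matrix W.

    W:      square NumPy array (pre-weighted adjacency matrix).
    alphas: the three free parameters, each greater than zero
            (the default values here are placeholders, not from the text).
    """
    a1, a2, a3 = alphas
    I = np.eye(W.shape[0])
    WtW = W.T @ W
    WWt = W @ W.T
    # S1 = a1 * W (a1 W^T W + I)^{-1} W^T W
    S1 = a1 * W @ np.linalg.solve(a1 * WtW + I, WtW)
    # S2 = a2 * W^T (a2 W W^T + I)^{-1} W W
    S2 = a2 * W.T @ np.linalg.solve(a2 * WWt + I, W @ W)
    # S3 = a3 * W W (a3 W^T W + I)^{-1} W^T
    S3 = a3 * (W @ W) @ np.linalg.solve(a3 * WtW + I, W.T)
    return S1, S2, S3
```

The matrices $\alpha W^{T} W + I$ and $\alpha W W^{T} + I$ are positive definite for any $\alpha > 0$, so the linear systems always have a unique solution.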

Preferably, in S3, the three likelihood matrices are processed, so as to obtain three ordered likelihood matrices, which specifically includes:

S31, selecting one likelihood matrix from the three likelihood matrices, calculating an importance score for each non-central node in the closed subgraph based on that likelihood matrix, and sorting the non-central nodes in descending order of importance score;

S32, calculating the number of nodes in the closed subgraph and comparing it with the preset node threshold; if the number of nodes is smaller than the preset node threshold, that is, $|V(v_i, v_j)| < K$, executing step S33; otherwise, if the number of nodes is greater than or equal to the preset node threshold, that is, $|V(v_i, v_j)| \geq K$, executing step S34;

S33, padding the likelihood matrix with zeros to a $K$-dimensional square matrix, thereby obtaining the ordered likelihood matrix;

S34, deleting the last $|V(v_i, v_j)| - K$ nodes in the ordering from the closed subgraph and deleting the likelihood values corresponding to the deleted nodes from the likelihood matrix, thereby obtaining the ordered likelihood matrix;

S35, taking another likelihood matrix from the three likelihood matrices and processing it through steps S31 to S34, until all three likelihood matrices have been processed, thereby obtaining the three ordered likelihood matrices.

Preferably, in S31, an importance score is calculated for a non-central node in the closed sub-graph, and a specific calculation formula of the importance score is:

$$\mathrm{score}_x^{(k)} = \left| S^{(k)}_{1x} \right| + \left| S^{(k)}_{x1} \right| + \left| S^{(k)}_{2x} \right| + \left| S^{(k)}_{x2} \right|$$

where $\mathrm{score}_x^{(k)}$ is the importance score, based on the $k$-th likelihood matrix ($k = 1, 2, 3$), of the node encoded as $x$ ($x > 2$) in the closed subgraph, $|\cdot|$ denotes the absolute value, $S^{(k)}_{1x}$ is the likelihood value in row 1, column $x$ of the $k$-th likelihood matrix, $S^{(k)}_{x1}$ is the likelihood value in row $x$, column 1, $S^{(k)}_{2x}$ is the likelihood value in row 2, column $x$, and $S^{(k)}_{x2}$ is the likelihood value in row $x$, column 2 of the $k$-th likelihood matrix.
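A minimal NumPy sketch of steps S31 to S34 for one likelihood matrix follows. The names `importance_scores` and `order_and_resize` are hypothetical, indices are 0-based (so the central nodes encoded 1 and 2 sit at indices 0 and 1), ties are broken by original position, and K is assumed to be at least 2.

```python
import numpy as np

def importance_scores(S):
    """Score of node x: |S[0,x]| + |S[x,0]| + |S[1,x]| + |S[x,1]|.

    Entries at indices 0 and 1 (the central nodes) are computed too,
    but only the entries at index 2 and above are used for sorting.
    """
    A = np.abs(S)
    return A[0, :] + A[:, 0] + A[1, :] + A[:, 1]

def order_and_resize(S, K):
    """Reorder non-central nodes by descending score, then pad or cut to K x K."""
    n = S.shape[0]
    scores = importance_scores(S)
    # Central nodes stay first; the rest are sorted by descending score (S31).
    rest = sorted(range(2, n), key=lambda x: -scores[x])
    order = [0, 1] + rest
    S_sorted = S[np.ix_(order, order)]   # permute rows and columns together
    if n < K:
        padded = np.zeros((K, K))        # S33: zero-pad to K x K
        padded[:n, :n] = S_sorted
        return padded
    return S_sorted[:K, :K]              # S34: drop the lowest-ranked nodes
```

Running `order_and_resize` on each of the three likelihood matrices gives the three ordered likelihood matrices of step S35, each with its own node order.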

Preferably, in S3, the splicing is performed on the three ordered likelihood matrices to obtain a vector corresponding to the edge to be predicted, which specifically includes:

S36, deleting the diagonal elements of each of the three ordered likelihood matrices and flattening the remaining elements of each ordered likelihood matrix to obtain three vectors, one per ordered likelihood matrix;

S37, splicing the three vectors corresponding to the three ordered likelihood matrices to obtain the vector corresponding to the edge to be predicted.

Preferably, in S12, positive and negative edges of a preset proportion are deleted from the directed symbol network not including the isolated node, and the preset proportion is 20%.

In the link symbol prediction method based on linear optimization closed sub-graph coding described above, the edges of the subgraph are weighted in two stages, first by the edge pre-weighting strategy and then by the linear optimization model, to obtain the likelihood matrices. The nodes are then ordered according to the likelihood values to obtain the ordered likelihood matrices, which contain rich topological information about the subgraph. The ordered likelihood matrices replace the pre-weighted adjacency matrix as the subgraph encoding, and they strengthen both the hierarchical structure of the closed subgraph and the importance of its negative edges.

Drawings

FIG. 1 is a flow chart of a link symbol prediction method based on linear optimization closed sub-graph coding according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a process for obtaining a corresponding vector by a closed sub-graph according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of extracting a closed sub-graph of an edge to be predicted according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of extracting a closed sub-graph of an edge to be predicted according to another embodiment of the present invention;

Detailed Description

In order to make the technical scheme of the present invention better understood by those skilled in the art, the present invention will be further described in detail with reference to the accompanying drawings.

A link symbol prediction method based on linear optimization closed sub-graph coding specifically comprises the following steps:

Specifically, referring to fig. 1 and fig. 2, fig. 1 is a flowchart of a link symbol prediction method based on linear optimization closed sub-graph coding according to an embodiment of the present invention, and fig. 2 is a schematic process diagram of obtaining a corresponding vector through a closed sub-graph according to an embodiment of the present invention.

First, a network to be predicted is acquired and preprocessed to obtain a directed symbol network that does not contain isolated nodes, and edges are taken from the directed symbol network to form a training set sample and a test set sample. Next, one edge is taken from the training set sample or the test set sample as the edge to be predicted, a closed subgraph containing the edge to be predicted is extracted, each edge in the closed subgraph is given a coarse-grained weight according to an edge pre-weighting strategy to obtain a pre-weighted adjacency matrix, and the pre-weighted adjacency matrix is input into a linear optimization model for fine-grained linear optimization weighting to obtain three likelihood matrices. The three likelihood matrices are then processed and spliced to obtain the vector corresponding to the edge to be predicted. Another edge is taken from the training set sample or the test set sample as the edge to be predicted and steps S2 to S3 are repeated until all edges in the training set sample or the test set sample have been taken, giving a plurality of vectors. Finally, a fully-connected neural network binary classifier is preset and trained, the plurality of vectors are input into the trained classifier, and the classifier outputs a link symbol prediction score (a value between 0 and 1) from which the symbol prediction result corresponding to the edge to be predicted is determined. For example, if the link symbol prediction score is greater than or equal to 0.5, the edge to be predicted is a positive edge; otherwise, if the score is less than 0.5, the edge to be predicted is a negative edge.
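The final decision rule can be sketched as the forward pass of a small fully-connected network followed by the 0.5 threshold. Everything about the network here is an assumption for illustration: the single hidden layer, the ReLU activation, the sigmoid output and the name `predict_sign` are not specified in the text, which only states that the classifier is fully connected and outputs a score between 0 and 1.

```python
import numpy as np

def predict_sign(x, W1, b1, W2, b2, threshold=0.5):
    """Forward pass of a hypothetical one-hidden-layer binary classifier.

    x:  feature vector of the edge to be predicted, shape (d,).
    W1: hidden-layer weights, shape (d, hidden); b1: shape (hidden,).
    W2: output weights, shape (hidden,); b2: scalar bias.
    Returns (score, sign) with sign = +1 for a positive edge, -1 for a negative edge.
    """
    h = np.maximum(0.0, x @ W1 + b1)          # hidden layer with ReLU (assumed)
    z = float(h @ W2 + b2)                    # single output logit
    score = 1.0 / (1.0 + np.exp(-z))          # sigmoid maps the logit into (0, 1)
    return score, (1 if score >= threshold else -1)
```

The trained weights would come from fitting the classifier on the training set vectors; training is not shown here.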

In one embodiment, S1 comprises:

Specifically, after the edge weights in the network to be predicted are removed, a directed symbol network that does not contain isolated nodes is obtained. A preset proportion of edges is deleted from the directed symbol network; the positive edges and negative edges that are not deleted are used as the training set positive samples and training set negative samples respectively, and the deleted positive edges and negative edges are used as the test set positive samples and test set negative samples respectively. The preset proportion of deleted edges is 20%, but may also be 10%, 30% and so on; the lower the preset proportion, the fewer edges are deleted and the higher the prediction accuracy.

In one embodiment, in S2, one edge to be predicted is taken from any one of the training set sample or the test set sample, and a closed sub-graph of the edge to be predicted is extracted, which specifically includes:

Taking the training set sample as an example, the directed symbol network corresponding to the training set sample is expressed as $G = (V, E^{+}, E^{-})$, where $V$ is the set of nodes in the directed symbol network, that is, the set of nodes of all edges in the training set sample, $E^{+}$ is the set of positive edges in the directed symbol network, and $E^{-}$ is the set of negative edges in the directed symbol network. Hence $E = E^{+} \cup E^{-}$ is the set of all edges in the directed symbol network, and $E^{+} \cap E^{-} = \emptyset$, that is, each edge belongs to either the training set positive samples or the training set negative samples.

For any node $v_i \in V$ in the training set sample, the set of first-order neighbor nodes is defined as:

$$\Gamma(v_i) = \Gamma_{out}(v_i) \cup \Gamma_{in}(v_i)$$

where

$$\Gamma_{out}(v_i) = \{ v_j \mid (v_i, v_j) \in E \}, \qquad \Gamma_{in}(v_i) = \{ v_j \mid (v_j, v_i) \in E \}$$

Here $\Gamma(v_i)$ is the set of first-order neighbor nodes of node $v_i$, $V$ is the set of nodes of all edges in the training set, $\Gamma_{out}(v_i)$ is the set of first-order neighbor nodes reached by edges from node $v_i$ to node $v_j$, $\Gamma_{in}(v_i)$ is the set of first-order neighbor nodes given by edges from node $v_j$ to node $v_i$, and $E$ is the set of all edges in the training set.

For any node $v_i \in V$ in the training set sample, the set of second-order neighbor nodes is defined as:

$$\Gamma^{2}(v_i) = \{ v_j \mid v_k \in \Gamma(v_i),\ v_j \in \Gamma(v_k) \}$$

where $\Gamma^{2}(v_i)$ is the set of second-order neighbor nodes of node $v_i$, $v_k \in \Gamma(v_i)$ means that $v_k$ is any first-order neighbor node of node $v_i$, and $v_j \in \Gamma(v_k)$ means that $v_j$ is a neighbor node of node $v_k$. Thus $v_j$ is a neighbor of a neighbor of $v_i$, that is, $v_j$ and $v_i$ can be joined by a path of two edges.

Likewise, the set of $h$-order neighbor nodes of node $v_i \in V$ can be obtained as $\Gamma^{h}(v_i)$, which is not described in detail here.

The closed subgraph is extracted as follows. A node threshold $K$ is preset, and the nodes $v_i$ and $v_j$ of the edge to be predicted $(v_i, v_j)$, together with their 1-order neighbor nodes $\Gamma(v_i)$ and $\Gamma(v_j)$, are added to the node set $V(v_i, v_j)$ of the closed subgraph $G(v_i, v_j)$. The number of nodes $|V(v_i, v_j)|$ in the node set is then calculated. If the number of nodes is smaller than the preset node threshold, that is, $|V(v_i, v_j)| < K$, the 2-order neighbor nodes $\Gamma^{2}(v_i)$ and $\Gamma^{2}(v_j)$ are added to the node set $V(v_i, v_j)$ of the closed subgraph $G(v_i, v_j)$. Nodes continue to be added in this way until the number of nodes is greater than or equal to the preset node threshold, that is, $|V(v_i, v_j)| \geq K$, or until no new neighbor nodes can be added, at which point the closed subgraph of the edge to be predicted is obtained.

Specifically, referring to fig. 3 and fig. 4, fig. 3 is a schematic diagram of extracting a closed sub-graph of an edge to be predicted according to an embodiment of the present invention, and fig. 4 is a schematic diagram of extracting a closed sub-graph of an edge to be predicted according to another embodiment of the present invention.

In fig. 3, the preset node threshold $K$ is 6. After the 1-order neighbor nodes in the first-order neighborhood of the edge to be predicted $(v_i, v_j)$ are extracted, the number of nodes is 7, which is greater than the preset node threshold, so the addition of new neighbor nodes ends and the closed subgraph corresponding to the edge to be predicted is obtained.

In fig. 4, the preset node threshold $K$ is 6. After the 1-order neighbor nodes in the first-order neighborhood of the edge to be predicted $(v_i, v_j)$ are extracted, the number of nodes is 5, which is smaller than the preset node threshold, so the 2-order neighbor nodes in the second-order neighborhood must also be extracted. The number of nodes is then 8, which is greater than the preset node threshold, so the addition of new neighbor nodes ends and the closed subgraph corresponding to the edge to be predicted is obtained.

In one embodiment, the edges in the closed subgraph include positive edges and negative edges, and in S2, coarse-grained weighting is performed on each edge in the closed subgraph according to an edge pre-weighting policy, so as to obtain a pre-weighted adjacency matrix, where the edge pre-weighting policy specifically includes:

Specifically, the edge pre-weighting strategy assigns an appropriate weight to each edge in the closed subgraph $G(v_i, v_j)$ in order to distinguish whether that edge contributes positively or negatively to the edge to be predicted. The adjacency matrix is denoted $A$: $A_{ij} = 1$ indicates that there is a directed positive edge from node $v_i$ to node $v_j$, $A_{ij} = -1$ indicates that there is a directed negative edge from node $v_i$ to node $v_j$, and $A_{ij} = 0$ indicates that there is no directed edge from node $v_i$ to node $v_j$. After the closed subgraph $G(v_i, v_j)$ is extracted, it is represented as an adjacency matrix $A$ containing only the three values 1, 0 and -1, so the link symbol prediction problem is converted into the problem of classifying the adjacency matrix $A$.

From the adjacency matrix $A$, the weight of each edge in the closed subgraph $G(v_i, v_j)$ of the edge to be predicted $(v_i, v_j)$ is calculated to obtain the pre-weighted adjacency matrix $W$, as follows:

Let $(v_x, v_y)$ be any edge in the closed subgraph $G(v_i, v_j)$ of the edge to be predicted $(v_i, v_j)$. If $(v_x, v_y)$ is a positive edge, its weight is calculated as:

If $(v_x, v_y)$ is a negative edge, its weight is calculated as:

where $d((v_i, v_j), (v_x, v_y))$ is the distance from the edge $(v_i, v_j)$ to the edge $(v_x, v_y)$ in the closed subgraph. The distance between two nodes joined by an edge is set to 1, and $d((v_i, v_j), (v_x, v_y))$ is the minimum of the four distances $d(v_i, v_x)$, $d(v_i, v_y)$, $d(v_j, v_x)$ and $d(v_j, v_y)$; $\beta$ is the negative edge weight factor, $n^{+}$ is the number of positive edges in the directed symbol network, and $n^{-}$ is the number of negative edges in the directed symbol network.
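The edge-to-edge distance defined above can be computed with two breadth-first searches, one from each target node. This sketch covers only the distance, not the weight formulas. The names `node_distances` and `edge_distance` are hypothetical, and two points are assumptions: node distances are hop counts that ignore edge direction, and a node is at distance 0 from itself, so an edge touching $v_i$ or $v_j$ gets distance 0.

```python
from collections import deque

def node_distances(adj, source):
    """Hop distance from `source` to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def edge_distance(sub_edges, target, edge):
    """d((vi, vj), (vx, vy)): minimum of the four node-to-node distances.

    sub_edges: (u, v, sign) tuples of the closed subgraph.
    Direction is ignored when measuring distance (an assumption).
    """
    adj = {}
    for u, v, _ in sub_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    vi, vj = target
    vx, vy = edge
    di = node_distances(adj, vi)
    dj = node_distances(adj, vj)
    inf = float("inf")
    return min(di.get(vx, inf), di.get(vy, inf), dj.get(vx, inf), dj.get(vy, inf))
```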

In one embodiment, the linear optimization model in S2 is specifically:


In one embodiment, three likelihood matrices are obtained in S2, where the three likelihood matrices are specifically:

$$S^{(1)} = \alpha_1 W (\alpha_1 W^{T} W + I)^{-1} W^{T} W$$

$$S^{(2)} = \alpha_2 W^{T} (\alpha_2 W W^{T} + I)^{-1} W W$$

$$S^{(3)} = \alpha_3 W W (\alpha_3 W^{T} W + I)^{-1} W^{T}$$

Specifically, a consistent node ordering method is used to encode the nodes in the closed subgraph to facilitate subsequent operations, as follows:

1) For the closed subgraph $G(v_i, v_j)$ of the edge to be predicted $(v_i, v_j)$, the nodes $v_i$ and $v_j$ that form the edge to be predicted are taken as the central nodes and encoded as 1 and 2 respectively;

2) All other nodes in the closed subgraph are randomly encoded as 3 to $|V(v_i, v_j)|$.

After encoding in this way, the closed subgraph $G(v_i, v_j)$ of the edge to be predicted $(v_i, v_j)$ corresponds to a unique adjacency matrix $A$, and the pre-weighted adjacency matrix $W$ corresponding to the closed subgraph $G(v_i, v_j)$ is obtained through the edge pre-weighting strategy.
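The node encoding and the resulting signed adjacency matrix can be sketched as follows. The function name `signed_adjacency` is hypothetical, and the non-central nodes are given a sorted order instead of a random one purely to make the example reproducible; the later importance-score ordering replaces this initial order in any case.

```python
def signed_adjacency(nodes, sub_edges, target):
    """Encode the nodes and build the signed adjacency matrix A.

    nodes:     node set of the closed subgraph.
    sub_edges: (u, v, sign) tuples with sign in {+1, -1}.
    target:    the edge to be predicted (vi, vj); vi and vj get codes 1 and 2,
               which are indices 0 and 1 in the returned matrix.
    Returns (order, A) where order[k] is the node with code k + 1 and
    A[r][c] is +1, -1 or 0 for the directed edge order[r] -> order[c].
    """
    vi, vj = target
    # Sorted order is an illustrative stand-in for the random encoding.
    others = sorted(n for n in nodes if n not in (vi, vj))
    order = [vi, vj] + others
    index = {n: k for k, n in enumerate(order)}
    size = len(order)
    A = [[0] * size for _ in range(size)]
    for u, v, sign in sub_edges:
        if u in index and v in index:
            A[index[u]][index[v]] = sign
    return order, A
```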

From the pre-weighted adjacency matrix $W$, the likelihood matrices of the closed subgraph $G(v_i, v_j)$ are derived by the following formulas:

$$S^{(1)} = W Z^{(1)}$$

$$S^{(2)} = W Z^{(2)}$$

$$S^{(3)} = W Z^{(3)}$$

where $W$ is the pre-weighted adjacency matrix, $S^{(1)}$ is the first likelihood matrix, $S^{(2)}$ is the second likelihood matrix, $S^{(3)}$ is the third likelihood matrix, $Z^{(1)}$ is the first contribution matrix, $Z^{(2)}$ is the second contribution matrix and $Z^{(3)}$ is the third contribution matrix. The values of $Z^{(1)}$, $Z^{(2)}$ and $Z^{(3)}$ are given by the following linear optimization model, whose solutions are the corresponding contribution matrices:

where $\alpha_1$, $\alpha_2$, $\alpha_3$ are free parameters greater than zero and $\lVert \cdot \rVert$ denotes a norm. Taking the squared Frobenius norm as the matrix norm, the solutions $Z^{(1)}$, $Z^{(2)}$ and $Z^{(3)}$ of the optimization model can be obtained.

Substituting the contribution matrices $Z^{(1)}$, $Z^{(2)}$ and $Z^{(3)}$ into the corresponding likelihood matrix formulas gives the closed-form expressions of the likelihood matrices:

$$S^{(1)} = \alpha_1 W (\alpha_1 W^{T} W + I)^{-1} W^{T} W$$

$$S^{(2)} = \alpha_2 W^{T} (\alpha_2 W W^{T} + I)^{-1} W W$$

$$S^{(3)} = \alpha_3 W W (\alpha_3 W^{T} W + I)^{-1} W^{T}$$
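As a check on the first closed form, the following short derivation shows one objective whose minimizer reproduces it. The objective itself is an assumption: it is the regularized least-squares problem familiar from classical linear optimization link prediction, written with the squared Frobenius norm, and it is not taken from the text above. Only the first likelihood matrix is treated here.

```latex
\begin{aligned}
Z^{(1)} &= \arg\min_{Z}\ \alpha_1 \lVert W - W Z \rVert_F^2 + \lVert Z \rVert_F^2
  && \text{(assumed objective)} \\
0 &= -2\alpha_1 W^{T} (W - W Z) + 2 Z
  && \text{(set the gradient with respect to } Z \text{ to zero)} \\
(\alpha_1 W^{T} W + I)\, Z &= \alpha_1 W^{T} W \\
Z^{(1)} &= \alpha_1 (\alpha_1 W^{T} W + I)^{-1} W^{T} W \\
S^{(1)} &= W Z^{(1)} = \alpha_1 W (\alpha_1 W^{T} W + I)^{-1} W^{T} W
\end{aligned}
```

Because $\alpha_1 W^{T} W + I$ is positive definite for $\alpha_1 > 0$, the inverse exists and the minimizer is unique.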

For closed subgraphs of different sizes, $S^{(1)}$, $S^{(2)}$ and $S^{(3)}$ have different dimensions. In addition, even for the same closed subgraph, the elements of the likelihood matrices are not fixed, because the nodes in the closed subgraph are labeled randomly. A consistent node ordering method is therefore applied to the nodes in each closed subgraph, so that the three likelihood matrices finally obtained are convenient for training the neural network.

In one embodiment, the processing in S3 is performed on three likelihood matrices, so as to obtain three ordered likelihood matrices, which specifically includes:

In one embodiment, the importance score is calculated for the non-central node in the closed sub-graph in S31, where the specific calculation formula of the importance score is:


Referring to fig. 2, fig. 2 is a schematic diagram illustrating a process of obtaining a corresponding vector by a closed sub-graph according to an embodiment of the present invention.

Taking any one of the obtained three likelihood matrixes as an example, processing the likelihood matrixes to correspondingly obtain ordered likelihood matrixes, wherein the specific process is as follows:

1) Calculating importance scores of non-central nodes (nodes with code numbers larger than 2) in the closed subgraph, and sorting the nodes in the likelihood matrix in a descending order according to the importance scores, wherein the sum of likelihood values of the non-central nodes to the central nodes (namely nodes with code numbers of 1 and 2) is used as the importance score, so that the importance of the nodes to be predicted for symbol prediction of the edges to be predicted is measured:

in the formula, I represents an absolute value,

coding as the x (x) in the closed sub-picture>2) The importance scores of the individual nodes based on the kth likelihood matrix, k=1, 2,3,/-for each node>

For the likelihood values of the 1 st row and the x column in the kth likelihood matrix, i.e. the likelihood values of the edge formed from the node coded 1 to the node coded x in the closed subgraph>

For the likelihood values of the kth likelihood matrix, i.e. the likelihood values of the edges formed from the node coded x to the node coded 1 in the closed sub-graph, row x and column 1>

For the likelihood value of the 2 nd row and the x column in the kth likelihood matrix, i.e. the likelihood value of the edge formed from the node coded 2 to the node coded x in the closed subgraph>

The likelihood values for the x-th row and the 2-th column in the kth likelihood matrix, that is, the likelihood values of the edges formed from the node encoded as x to the node encoded as 2 in the closed sub-graph.

2) After calculating the importance scores, the order of the nodes in the closed subgraph may be arranged in descending order of the importance scores (where the order of the nodes labeled 1 and 2 is 1 st and 2 nd, respectively, the remaining nodes are ordered in descending order of the importance scores, and rank to 3 rd to |V (V) _i ,v _j ) I bit).

3) Calculating the number of nodes $|V(v_i, v_j)|$ in the closed subgraph and comparing it with the preset node threshold K. If the number of nodes is smaller than the preset node threshold, the corresponding likelihood matrix is padded with 0 into a K-dimensional square matrix, which is taken as the ordered likelihood matrix; otherwise, if the number of nodes is greater than or equal to the preset node threshold, the last $|V(v_i, v_j)| - K$ nodes in the ordering are deleted, the likelihood values corresponding to the deleted nodes are deleted from the likelihood matrix to obtain a K-dimensional square matrix, and that matrix is taken as the ordered likelihood matrix.
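Steps 1) to 3) can be illustrated with a short NumPy sketch. It is only an illustration under stated assumptions, not the reference implementation: the function name `order_likelihood_matrix` is hypothetical, the central nodes are assumed to occupy indices 0 and 1 (codes 1 and 2 in the text), and ties between equal importance scores are broken by keeping the original node order, which the text does not specify.

```python
import numpy as np

def order_likelihood_matrix(S, K):
    """Reorder one likelihood matrix by importance score, then pad or
    truncate it to a K x K matrix (hypothetical helper, K >= 2 assumed)."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    others = np.arange(2, n)  # non-central nodes (codes 3, 4, ... in the text)
    # Importance score: sum of absolute likelihood values between node x
    # and the two central nodes, in both directions.
    scores = (np.abs(S[0, others]) + np.abs(S[others, 0])
              + np.abs(S[1, others]) + np.abs(S[others, 1]))
    # Descending order; the stable sort keeps tied nodes in their original
    # order (tie-breaking is an assumption of this sketch).
    ranked = others[np.argsort(-scores, kind="stable")]
    order = np.concatenate(([0, 1], ranked)).astype(int)
    S_sorted = S[np.ix_(order, order)]  # permute rows and columns together
    if n < K:
        padded = np.zeros((K, K))
        padded[:n, :n] = S_sorted       # fill the remainder with 0
        return padded
    return S_sorted[:K, :K]             # drop the last n - K nodes
```

Calling `order_likelihood_matrix(S, 5)` on each of the three likelihood matrices of one closed subgraph would give three 5×5 ordered likelihood matrices.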

In one embodiment, in S3, the splicing of the three ordered likelihood matrices to obtain the vector corresponding to the edge to be predicted specifically includes:

Specifically, assuming that the three ordered likelihood matrices are 5×5 square matrices, the diagonal elements of each likelihood matrix are deleted and the remaining elements are flattened into a 20-dimensional vector, giving three 20-dimensional vectors. The three 20-dimensional vectors are spliced into a 60-dimensional vector, which is taken as the vector corresponding to the edge to be predicted. The 60-dimensional vector is input into the trained fully-connected neural network two-classifier, which processes it and outputs a link symbol prediction score, and the symbol prediction result for the edge to be predicted is obtained from the link symbol prediction score.
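The splicing step can be sketched as follows. This is a minimal illustration, not the authors' code: the name `encode_edge` is hypothetical, and the row-major order in which the off-diagonal elements are flattened is an assumption, since the text only states that the remaining elements are flattened. In general each K×K matrix contributes K(K−1) values, so K = 5 gives 3 × 20 = 60 dimensions.

```python
import numpy as np

def encode_edge(ordered_matrices):
    """Drop the diagonal of each ordered likelihood matrix, flatten the
    rest, and concatenate the results (hypothetical helper)."""
    parts = []
    for M in ordered_matrices:
        M = np.asarray(M, dtype=float)
        K = M.shape[0]
        off_diag = ~np.eye(K, dtype=bool)  # True everywhere except the diagonal
        # Boolean indexing returns the selected values in row-major order
        # (the flattening order is an assumption of this sketch).
        parts.append(M[off_diag])
    return np.concatenate(parts)
```

The resulting vector is what would be passed to the classifier for the edge to be predicted.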

Further, experimental verification is performed on the prediction effect of the link symbol prediction method based on the linear optimization closed sub-graph coding. Referring to table 1, different network-based link prediction methods are listed in table 1.

Table 1 Link prediction method

In Table 1, five real-world networks, namely Bitcoin-Alpha, Bitcoin-OTC, WikiRfA, Slashdot and Epinions, were employed as the experimental data sets. Bitcoin-Alpha and Bitcoin-OTC are two user networks that characterize trust relationships; WikiRfA is a voting network of Wikipedia users, each of whom can support (positive edge) or oppose (negative edge) the promotion of other users by voting; Slashdot is a technology-related news website that introduced the Slashdot Zoo feature, allowing users to mark each other as friends or foes; Epinions is a general consumer review website that builds an online social network describing trust relationships according to whether consumers trust or distrust the reviews of others.

To compare the performance of the method of this application with that of existing methods, the method is compared with two common types of link symbol prediction methods: feature engineering and symbol graph embedding. In this application, the node threshold of the closed subgraph is set to K = 5, and the free parameters are set to α₁ = 0.005, α₂ = 0.005 and α₃ = 0.005. A fully connected neural network consisting of 3 hidden layers (32, 16 neurons, respectively) and a softmax output layer is used. All hidden layers use the ReLU activation function, the parameters are updated with the Adam optimizer, the batch size is 512, the learning rate is 0.001, and the number of epochs is set to 100.

In the link symbol prediction method based on linear optimization closed sub-graph coding, the edges in the closed subgraph of the edge to be predicted are weighted in two stages, first by the edge pre-weighting strategy and then by the linear optimization model, to obtain the likelihood matrices; the nodes are then ordered according to the likelihood values, so that the ordered likelihood matrices contain rich topological information about the subgraph. The ordered likelihood matrices replace the pre-weighted adjacency matrix as the subgraph encoder, and they emphasize the hierarchical structure of the closed subgraph and the importance of its negative edges. Experimental results show that the SELO model provided by the application is superior to the existing feature engineering methods and graph embedding methods in all 6 real networks and 4 evaluation indexes.

The link symbol prediction method based on linear optimization closed sub-graph coding provided by the invention is described in detail above. The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to facilitate an understanding of the core concepts of the invention. It should be noted that it will be apparent to those skilled in the art that various modifications and adaptations of the invention can be made without departing from the principles of the invention and these modifications and adaptations are intended to be within the scope of the invention as defined in the following claims.

Claims

1. A link symbol prediction method based on linear optimization closed sub-graph coding, the method comprising:

S2, taking one edge from the training set sample or the test set sample as an edge to be predicted, extracting a closed subgraph containing the edge to be predicted, carrying out coarse-grained weighting on each edge in the closed subgraph according to an edge pre-weighting strategy to obtain a pre-weighted adjacency matrix, and inputting the pre-weighted adjacency matrix into a linear optimization model to carry out fine-grained linear optimization weighting to obtain three likelihood matrices;

S3, processing the three likelihood matrices to correspondingly obtain three ordered likelihood matrices, and splicing the three ordered likelihood matrices to obtain the vector corresponding to the edge to be predicted;

2. The link symbol prediction method based on linear optimization closed sub-graph coding according to claim 1, wherein S1 comprises:

S12, deleting positive edges and negative edges with preset proportions from the directed symbol network which does not contain isolated nodes, thereby obtaining a training network;

S13, taking a positive edge in the training network as a training set positive sample, taking a negative edge in the training network as a training set negative sample, taking a positive edge deleted in the directed symbol network which does not contain the isolated node as a testing set positive sample, and taking a negative edge deleted in the directed symbol network which does not contain the isolated node as a testing set negative sample;

3. The method for predicting the link symbol based on the linear optimization closed sub-graph coding according to claim 2, wherein in S2, one edge is taken as an edge to be predicted from any one of the training set sample and the test set sample, and the extracting the closed sub-graph including the edge to be predicted specifically includes:

S21, presetting a node threshold, taking one edge from the training set sample or the test set sample as the edge to be predicted $(v_i, v_j)$, and extracting the nodes $v_i$, $v_j$ forming the edge $(v_i, v_j)$ together with their h-order (h ≥ 1) neighbor nodes $\Gamma^{h}(v_i)$, $\Gamma^{h}(v_j)$ to form a node set;

S22, calculating the node number contained in the node set, comparing the node number with the preset node threshold, if the node number is smaller than the preset node threshold, executing step S23, otherwise, if the node number is larger than or equal to the preset node threshold, or no new neighbor node can be added, executing step S24;

S23, adding the (h+1)-order neighbor nodes $\Gamma^{h+1}(v_i)$, $\Gamma^{h+1}(v_j)$ of the nodes $v_i$, $v_j$ forming the edge $(v_i, v_j)$ to the node set, and executing step S22;

S24, finishing adding neighbor nodes to obtain the closed subgraph corresponding to the edge $(v_i, v_j)$.

4. The link symbol prediction method based on linear optimization closed sub-graph coding according to claim 3, wherein the edges in the closed sub-graph include positive edges and negative edges, and the step S2 of performing coarse-grained weighting on each edge in the closed sub-graph according to an edge pre-weighting strategy to obtain a pre-weighted adjacency matrix, where the edge pre-weighting strategy specifically includes:

5. The link symbol prediction method based on linear optimization closed sub-graph coding according to claim 4, wherein the linear optimization model in S2 specifically comprises:

wherein $Z^{(1)}$ is the first contribution matrix, $Z^{(2)}$ is the second contribution matrix, $Z^{(3)}$ is the third contribution matrix, $W$ is the pre-weighted adjacency matrix, $\alpha_1$, $\alpha_2$, $\alpha_3$ are free parameters greater than zero, and $\|\cdot\|$ is the norm symbol.

6. The link symbol prediction method based on linear optimization closed sub-graph coding according to claim 5, wherein three likelihood matrices are obtained in S2, and the three likelihood matrices are specifically:

$S^{(1)} = \alpha_1 W (\alpha_1 W^{T} W + I)^{-1} W^{T} W$

$S^{(2)} = \alpha_2 W^{T} (\alpha_2 W W^{T} + I)^{-1} W W$

$S^{(3)} = \alpha_3 W W (\alpha_3 W^{T} W + I)^{-1} W^{T}$

7. The link symbol prediction method based on linear optimization closed sub-graph coding according to claim 6, wherein in S3, three likelihood matrices are processed to obtain three ordered likelihood matrices, respectively, which specifically includes:

S31, taking one likelihood matrix from any one of the three likelihood matrices, calculating importance scores for non-center nodes in the closed subgraph based on the likelihood matrix, and sorting the non-center nodes in descending order according to the importance scores;

S32, calculating the number of nodes in the closed subgraph and comparing it with the preset node threshold; if the number of nodes is smaller than the preset node threshold, i.e. $|V(v_i, v_j)| < K$, step S33 is executed; otherwise, if the number of nodes is greater than or equal to the preset node threshold, i.e. $|V(v_i, v_j)| \geq K$, step S34 is executed;

S34, deleting the last $|V(v_i, v_j)| - K$ nodes in the ordering from the closed subgraph, and deleting the likelihood values corresponding to the deleted nodes from the likelihood matrix, thereby obtaining the ordered likelihood matrix.

S35, another likelihood matrix is taken from the three likelihood matrices until the three likelihood matrices are all extracted, and the three ordered likelihood matrices are correspondingly obtained through processing in steps S31 to S34.

8. The link symbol prediction method based on linear optimization closed sub-graph coding according to claim 7, wherein in S31, an importance score is calculated for a non-central node in the closed sub-graph, and a specific calculation formula of the importance score is:

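$$\mathrm{score}^{(k)}_{x} = \left|S^{(k)}_{1,x}\right| + \left|S^{(k)}_{x,1}\right| + \left|S^{(k)}_{2,x}\right| + \left|S^{(k)}_{x,2}\right|$$

wherein $|\cdot|$ denotes the absolute value, $\mathrm{score}^{(k)}_{x}$ is the importance score of the node coded x (x > 2) in the closed subgraph based on the k-th likelihood matrix, k = 1, 2, 3, and $S^{(k)}_{1,x}$, $S^{(k)}_{x,1}$, $S^{(k)}_{2,x}$, $S^{(k)}_{x,2}$ are the likelihood values in row 1 column x, row x column 1, row 2 column x, and row x column 2 of the k-th likelihood matrix, respectively.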

9. The link symbol prediction method based on linear optimization closed sub-graph coding according to claim 8, wherein in S3, the three ordered likelihood matrices are spliced to obtain the vector corresponding to the edge to be predicted, which specifically includes:

10. The link symbol prediction method based on linear optimization closed sub-graph coding according to claim 2, wherein in S12 the positive edges and negative edges of a preset proportion are deleted from the directed symbol network which does not contain isolated nodes, and the preset proportion is 20%.