CN112765415A - Link prediction method based on relational content joint embedding convolution neural network - Google Patents
- Publication number: CN112765415A (application No. CN202110085651.5A)
- Authority
- CN
- China
- Legal status: Pending
Classifications
- G06F16/9024 — Indexing; data structures therefor; Graphs; Linked lists
- G06F16/90335 — Query processing
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/048 — Activation functions
- G06N3/08 — Learning methods
Abstract
The invention discloses a link prediction method based on a convolutional neural network with joint embedding of relation and content, which mainly addresses the low precision and slow running speed of existing link prediction methods. The implementation scheme is as follows: 1) calculate the structural initial vector of each node pair; 2) calculate the content embedding vector of each node pair; 3) select nodes to generate a training sample set and a test sample set, obtain the structural initial vectors and content embedding vectors of the node pairs in both sets, and obtain the true label values of the training samples; 4) construct a convolutional neural network and train it with the training sample set; 5) input the test samples into the trained convolutional neural network to obtain the prediction results. By jointly considering the relational features and the content features of the nodes, the invention improves the precision and efficiency of link prediction and can be used for link prediction in private communication connections, community formation, and radar network relay evaluation.
Description
Technical Field
The invention belongs to the technical field of computers and further relates to a link prediction method based on a convolutional neural network, which can be used for social network recommendation, collaboration network recommendation, private communication connections, community formation, and radar network relay evaluation.
Background
Link prediction is an information processing technique that identifies missing links and predicts new links in social network analysis, and it is one of the important bridges between complex networks and information science. Progress in link prediction depends on progress in analyzing the factors behind network link formation. The formation of links in a network is related to many factors concerning nodes and edges: the behavior of a node may influence the nodes around it, and associated nodes often behave similarly; the existence and nature of an edge is also affected by structural factors such as its local neighborhood, the topology of the network, and the properties and labels of surrounding edges. As link-formation factor analysis has improved, the mainstream research methodology of link prediction has gradually shifted from heuristic mathematical and statistical methods to machine learning. The current trend is to embed the connection information and content information of the network into a Euclidean space to learn node and edge representations, where each representation encodes the features of the corresponding type; whether a link exists is then predicted by neural network training, preserving the similarity between originally associated nodes and edges and achieving high prediction precision.
Research on link prediction not only promotes the development of network science and information science theory, but also has great practical value, for example in friend recommendation, citation recommendation, project recommendation, radar network optimization, cellular network formation, traffic path planning, identification of collaborators in a collaboration network, and identification of criminals in a criminal network.
The patent document "A link prediction method based on network structure and text information" (application No. CN202010113634.3, publication No. CN111368074A), filed by Xidian University, discloses a link prediction method based on network structure and text information. The implementation steps are: first, obtain structure embedding vectors of nodes by random walks over the network structure; second, construct a convolutional neural network to process the text information of the nodes and obtain their text embedding vectors; third, jointly embed the structure embedding vector and the text embedding vector of each node; fourth, generate a training set and a test set; fifth, construct a neural network for binary classification learning; sixth, train the neural network; seventh, predict the result. Because this method performs binary classification with the constructed neural network at the prediction stage, it has difficulty predicting low-dimensional dense edge representations and edge weights from a high-dimensional sparse network structure and is unsuited to large-scale networks, so its prediction precision is limited.
The patent document "A link prediction method in a dynamic social network" (application No. 201911285769.1, publication No. CN111090781A), filed by Ningbo University, discloses a link prediction method in a dynamic social network. The implementation steps are: first, map the nodes of the network at time t into a low-dimensional embedding space and write the low-dimensional representation vector of each node; second, compute the local features and second-order similarity of the nodes at time t together with the loss functions that maintain the smoothness of the network evolution, and obtain the optimal low-dimensional representation vector of each node by minimizing the total loss; third, obtain the low-dimensional representation vectors of all nodes in the test set with the optimal vectors, and input the vectors of each node pair into a logistic regression classifier in sequence for training, obtaining a trained classifier; fourth, input the low-dimensional representation vector of each node pair of the network at time T into the trained classifier to obtain the network information at time T+1. Because this method uses the minimum relative entropy between the content similarity distribution and the edge weight distribution representing the relationship as their distance, and the number of node neighbors grows exponentially, the computation is heavy and training is slow.
Moreover, because the method computes with global information, the relational attributes and content attributes can hardly participate fully in the computation of the prediction weights, so its prediction precision is low.
Disclosure of Invention
The invention aims to provide a link prediction method based on the relation content combined embedded convolutional neural network aiming at the defects of the prior art so as to improve the precision and generalization capability of a link prediction model and accelerate the training speed of the link prediction model.
The scheme for realizing the purpose of the invention is as follows: calculating the overall closeness of the node pairs to obtain node pair structure initial vectors, calculating preference characteristics between the nodes and contents to obtain content embedding vectors, and constructing a prediction model of the convolutional neural network based on the relation content attributes to obtain prediction tag values of link prediction.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
1. a link prediction method based on a relation content joint embedded convolutional neural network is characterized by comprising the following steps:
(1) acquiring a first-order neighbor node set and a second-order neighbor node set of each node in a link network, sampling the two sets, and calculating a structural initial vector a_i of each node pair according to the sampling result;
(2) calculating a content embedding vector for each node according to the content information of each node in the link network, pairing all nodes in the link network pairwise, and calculating a content embedding vector b_i for each node pair;
(3) Selecting 2437 node pairs randomly from all node pairs of the link network to form a training sample set, obtaining structure vectors, content embedding vectors and real label values of the node pairs of the training sample, selecting 429 node pairs randomly from all the remaining node pairs in the link network except the training sample set to form a test sample set, and obtaining structure vectors and content embedding vectors of the node pairs of the test sample;
(4) constructing a convolutional neural network formed by cascading, in sequence, 1 sampling layer, 20 convolutional layers, 16 pooling layers, 2 hidden layers and 1 fully connected layer; selecting the Hadamard product as the calculation function of the hidden layer, the ReLU function as the activation function of the hidden layer, and the training error l_i of the training samples as the loss function of the network;
(5) training a convolutional neural network:
setting an initial weight vector m_1, an initial learning rate η_1 and a maximum number of iterations Q_MAX, dividing the training sample set into i batches, inputting them into the convolutional neural network, and training until the loss function of the network converges or the maximum number of iteration rounds is reached, obtaining a trained convolutional neural network;
(6) inputting the structure vector of the node pair of each sample in the test sample set and the content embedding vector of the node pair into the trained neural network to obtain the predicted label value W of all the node pairs in the test sample set;
(7) setting a detection threshold value H, comparing the prediction tag value W obtained in the step (6) with the detection threshold value H to obtain a final link prediction result:
if W is larger than H, the link is considered to exist;
if W is less than or equal to H, the link is considered not to exist.
Compared with the prior art, the invention has the following advantages:
firstly, when calculating the structural initial vector of a node pair, an effective representation of the nodes of the relational topology in the structure vector space is obtained; the representation of each node encodes the node features of the corresponding type and preserves the similarity between originally associated nodes, so that convolution operations are accelerated when the model is trained and run with a convolutional neural network, and the possibility of a link forming between two currently unlinked nodes in the link network is predicted accurately;
secondly, when calculating the node content embedding vector, the latent associations represented by the content information vectors are learned, which improves embedding efficiency, avoids the high computational complexity of nearest-neighbor search when the vector dimension is too high, and allows continuous parameter optimization, making the method suitable for large-scale networks;
thirdly, when constructing the convolutional neural network, the calculation function and activation function of the hidden layer reduce the number of parameters and the amount of computation while retaining node features; the network captures both the content information and the relational structure of the nodes, improving the accuracy of link prediction;
fourthly, because the convolutional neural network learns the joint embedding vector and a network loss function suited to the model is designed, the inaccurate characterization of embedding vectors in the prior art is overcome and a more accurate link prediction result is obtained.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a schematic diagram of a convolutional neural network model structure in the present invention;
fig. 3 is a simulation of the results of link prediction over 2 data sets using the present invention and 5 prior art techniques.
Detailed Description
Embodiments of the present invention are further described below with reference to the accompanying drawings.
Referring to fig. 1, the implementation steps of this example are as follows:
step 1, calculating a structural initial vector a of a node pairi。
1.1) acquiring a first-order neighbor node set and a second-order neighbor node set of each node in a link network;
this embodiment adopts, but is not limited to, random walk search to obtain the first-order and second-order neighbor node sets, i.e., a better neighbor node is selected from the neighbors of the current candidate solution for search transfer;
1.2) sampling a first-order neighbor node set and a second-order neighbor node set;
this embodiment adopts, but is not limited to, ancestral sampling of the first-order and second-order neighbor node sets, i.e., the first-order neighbor nodes are sampled first, and a second-order neighbor node is sampled only after all of its first-order neighbor nodes have been sampled;
1.3) calculating the structural initial vector a_i of the node pair according to the sampling result;
1.3.1) selecting a new target node i from the link network;
in the embodiment, but not limited to, a branch and bound rule is adopted to select a new target node, and the branch and bound rule refers to selecting the new target node in a breadth-first mode;
1.3.2) randomly selecting an auxiliary node j from the first-order neighbor node set of the new target node i, forming a node pair from the new target node i and the auxiliary node j, and calculating the connection closeness R_ij of the node pair:
where U_i denotes the first-order neighbor node set of the new target node i, |U_i| the dimension of that set, U_j the first-order neighbor node set of the auxiliary node j, and |U_j| the dimension of that set;
1.3.3) calculating the range closeness S_ij of the node pair:
where U_g denotes the intersection of the first-order neighbor node sets of the new target node i and the auxiliary node j, and |U_g| the dimension of that intersection;
1.3.4) calculating the overall closeness J_ij of the node pair:
where ρ is a mapping vector that maps high-dimensional dense data to low-dimensional dense data and is applied through the cascade operation of a multilayer perceptron; α denotes the responsibility weight of the connection closeness of the node pair (taken as, but not limited to, 0.7 in this example), and β denotes the responsibility weight of the range closeness of the node pair (taken as, but not limited to, 0.3 in this example);
1.3.5) judging whether 32 auxiliary nodes j have been selected:
if yes, the overall closeness values J_ij of the 32 node pairs compose the structural initial vector:
ε_i = [J_ij]_32,
where [·] is the composition operator; then perform 1.3.6);
otherwise, return to 1.3.2);
1.3.6) judging whether all new target nodes have been selected:
if yes, all structural initial vectors ε_i are spliced into the structural initial vector of node pairs:
a_i = ∪ ε_i,
where ∪ is the splicing operator;
otherwise, return to 1.3.1).
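Step 1 above can be sketched as follows. The closeness formulas appear only as images in the source, so the exact forms of R_ij, S_ij and J_ij are assumptions here: a normalized common-neighbour overlap for R_ij, a Jaccard-style ratio for S_ij, and the weighted sum α·R_ij + β·S_ij for J_ij; the function and variable names are illustrative only.

```python
import random

def connection_closeness(U_i, U_j):
    """Hypothetical connection closeness R_ij: overlap of the two
    first-order neighbour sets, normalised by their dimensions.
    The patent's exact formula is an image in the source."""
    if not U_i or not U_j:
        return 0.0
    return len(U_i & U_j) / (len(U_i) * len(U_j)) ** 0.5

def structural_initial_vector(i, neighbors, alpha=0.7, beta=0.3, k=32):
    """Sketch of 1.3.2)-1.3.5): pick k auxiliary nodes j from U_i and
    collect the overall closeness J_ij of each node pair (i, j) into a
    length-k vector epsilon_i."""
    U_i = neighbors[i]
    aux = random.sample(sorted(U_i), min(k, len(U_i)))
    vec = []
    for j in aux:
        U_j = neighbors[j]
        R = connection_closeness(U_i, U_j)
        S = len(U_i & U_j) / max(len(U_i | U_j), 1)  # assumed range closeness S_ij
        vec.append(alpha * R + beta * S)             # assumed combination for J_ij
    return vec
```

Splicing the vectors of all target nodes (1.3.6) then yields the structural initial vector a_i of the node pairs.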
Step 2, calculating the content embedding vector b_i of each node pair.
2.1) calculating the content embedding vector of each node according to the content information of each node in the link network;
2.1.1) obtaining, from the content information of the new target node i selected in 1.3.1), the feature vector x_i of node i, the feature vector weight ω_i of node i, and the content library feature vector y_i of node i;
This embodiment adopts, but is not limited to, a crawler to obtain x_i, ω_i and y_i, i.e., the content information on the data list of the new target node i is automatically captured from the data set according to the rules of the label sequence;
2.1.2) calculating the content embedding vector of the new target node i:
where ζ denotes the nonlinear activation function Leaky ReLU, and ψ denotes the deviation of the content embedding vector (taken as, but not limited to, 0.5 in this example);
2.1.3) obtaining, from the content information of the auxiliary node j, the feature vector x_j of node j, the feature vector weight ω_j of node j, and the content library feature vector y_j of node j, and calculating the content embedding vector of the auxiliary node j:
This embodiment adopts, but is not limited to, a crawler to obtain x_j, ω_j and y_j, i.e., the content information on the data list of the auxiliary node j is automatically captured from the data set according to certain rules;
2.2) pairing all nodes in the link network pairwise;
this embodiment adopts, but is not limited to, the Hungarian algorithm to pair all nodes pairwise, i.e., a combinatorial optimization algorithm that solves the node assignment problem in polynomial time;
2.3) calculating the content embedding vector b_i of each node pair;
2.3.1) calculating the content embedding vector π_i:
where ξ denotes a nonlinear activation function; τ denotes the content embedding responsibility weight of the new target node i (taken as, but not limited to, 0.7 in this example); the content embedding responsibility weight of the auxiliary node j is taken as, but not limited to, 0.3; a concatenation operation joins the two vectors;
2.3.2) judging whether all new target nodes have been selected: if yes, all content embedding vectors π_i are spliced into the content embedding vector of node pairs: b_i = ∪ π_i; otherwise, return to 2.1.1).
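A minimal sketch of the content embedding computation in step 2, under stated assumptions: the combination formulas in 2.1.2) and 2.3.1) are images in the source, so an element-wise weighted sum passed through Leaky ReLU and a responsibility-weighted concatenation are assumed; all names are hypothetical.

```python
def leaky_relu(x, slope=0.01):
    """Leaky ReLU, the activation zeta named in 2.1.2)."""
    return x if x > 0 else slope * x

def node_content_embedding(x, w, y, psi=0.5):
    """Sketch of 2.1.2): combine the node feature vector x (weight w)
    with the content-library vector y and deviation psi.  The exact
    combination is an assumption (element-wise weighted sum)."""
    return [leaky_relu(w * xv + yv + psi) for xv, yv in zip(x, y)]

def pair_content_embedding(e_i, e_j, tau=0.7):
    """Sketch of 2.3.1): weight the two node embeddings by their content
    responsibility weights (tau and 1 - tau) and concatenate them."""
    return [tau * v for v in e_i] + [(1 - tau) * v for v in e_j]
```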
Step 3, generating a training sample set and a test sample set.
3.1) obtaining a training sample set and a testing sample set from a real link prediction data set;
in this embodiment, the Subreddit data set and the Mooc data set are used as real link prediction data sets; from each data set, 2437 node pairs are selected as the training sample set and 429 node pairs as the test sample set.
3.2) randomly selecting 2437 node pairs from all node pairs of the link network to form a training sample set, and acquiring a structure initial vector of the node pairs of the training sample, a content embedding vector of the node pairs and a real label value of the node pairs;
this embodiment adopts, but is not limited to, a backtracking rule to select nodes for the training sample set, i.e., nodes are selected depth-first and searched forward according to a preference condition; if a previously selected node is found not to meet the requirement, the search steps back one node and reselects;
3.3) randomly selecting 429 node pairs from all the node pairs left by the link network except the training sample set to form a test sample set, and acquiring a structural initial vector and a content embedded vector of the node pairs of the test sample set;
this embodiment adopts, but is not limited to, a backtracking rule to select nodes for the test sample set, i.e., nodes are selected depth-first and searched forward according to a preference condition; if a previously selected node is found not to meet the requirement, the search steps back one node and reselects;
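The sample-set generation in step 3 reduces to drawing disjoint random subsets of node pairs. A simple sketch, using plain random sampling rather than the backtracking rule for brevity:

```python
import random

def split_node_pairs(pairs, n_train=2437, n_test=429, seed=0):
    """Sketch of step 3: randomly draw disjoint training and test sets
    of node pairs, matching the sizes used in the embodiment (2437
    training pairs, 429 test pairs)."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    return train, test
```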
and 4, constructing a convolutional neural network.
4.1) constructing a convolutional neural network formed by cascading, in sequence, 1 sampling layer, 20 convolutional layers, 16 pooling layers, 2 hidden layers and 1 fully connected layer;
4.2) setting parameters of the convolutional neural network;
referring to fig. 2, the functions and parameter settings of the layers of the convolutional neural network are as follows:
The sampling layer: samples the structural initial vector a_i of the node pair from step 1 using bilinear interpolation to obtain a sampling vector c_i;
The convolutional layer: convolves the sampling vector c_i to obtain a convolution vector d_i; the convolutional layer uses convolution kernels of size 3 with stride 1, 32 kernels, the ReLU activation function, and a dropout of 0.2;
this embodiment adopts, but is not limited to, a window convolution method, i.e., local features c_ij of the sampling vector sequence are extracted from c_i and a window convolution with window length 1 is applied to obtain the convolution vector d_i:
d_i = c_ij · γ_i, where γ_i denotes the convolution matrix;
The pooling layer: performs a mean pooling operation on the convolution vector d_i to obtain a structure embedding vector e_i;
this example uses, but is not limited to, an ordinal pooling method for the mean pooling operation, i.e., the activation values in each pooling domain of d_i are pooled in order of their values to obtain the structure embedding vector e_i:
where t denotes the ordinal threshold for selecting the activation values that participate in pooling, θ_l denotes a pooling domain in the l-th feature map, and v denotes the index of an activation value within this pooling domain, whose ordinal rank determines whether it participates;
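One plausible reading of the ordinal pooling rule can be sketched as follows: sort the activations in a pooling domain and average only those whose ordinal rank is within the threshold t. The exact rule is not fully recoverable from the text, so this interpretation is an assumption:

```python
def ordinal_mean_pool(window, t=2):
    """Assumed ordinal mean pooling for one pooling domain: rank the
    activation values, keep the t largest (the ordinal threshold), and
    return their mean."""
    top = sorted(window, reverse=True)[:t]
    return sum(top) / len(top)
```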
The hidden layer: applies the Hadamard product to the structure embedding vector e_i and the content embedding vector b_i of the node pair from step 2 to obtain a joint embedding vector n_i:
where χ denotes the ReLU activation function of the hidden layer and the Hadamard product is the calculation function of the hidden layer;
The fully connected layer: calculates the predicted label value f_i of the node pairs in batch i:
f_i = δ(n_i m_i),
where δ denotes the mapping function and m_i denotes the weight vector of batch i.
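The hidden and fully connected layers above can be sketched for a single node pair as follows; the mapping function δ is assumed to be a sigmoid (the patent does not name it), and all other names are illustrative:

```python
import math

def relu(x):
    return max(0.0, x)

def joint_embedding(e, b):
    """Hidden layer: element-wise (Hadamard) product of the structure
    embedding e and the content embedding b, passed through ReLU."""
    return [relu(ev * bv) for ev, bv in zip(e, b)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predicted_label(n, m):
    """Fully connected layer f_i = delta(n_i m_i): inner product of the
    joint embedding n with the weight vector m, mapped to (0, 1).
    A sigmoid is assumed for the mapping function delta."""
    return sigmoid(sum(nv * mv for nv, mv in zip(n, m)))
```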
And 5, training the convolutional neural network.
5.1) setting the initial weight vector m_1;
first, the initial weight value m_λ of each layer is set to a random number drawn from a normal distribution with standard deviation 0.1 and mean 0; the weights of all layers compose the initial weight vector m_1 = [m_λ], where [·] is the composition operator;
5.2) setting the initial learning rate η_1 and the maximum number of iterations Q_MAX;
5.3) dividing the training sample set into i batches, inputting them into the convolutional neural network, and training;
5.3.1) dividing the training sample set into i batches and inputting them into the convolutional neural network designed in step 4 to obtain the predicted label values f_i of the node pairs of the i batches of the training sample set;
5.3.2) calculating the predicted label probabilities of the i batches of training samples from the predicted label values f_i of the node pairs;
5.3.3) calculating the true label probability P of the training samples from the true label values f_T of the node pairs obtained in step 3;
5.3.4) calculating the training error l_i of the i-th batch of training samples with the loss function:
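The loss formula of 5.3.4) is rendered as an image in the source; since the network compares predicted label probabilities against binary true labels, a standard binary cross-entropy is assumed in this sketch of the batch training error l_i:

```python
import math

def batch_loss(pred_probs, true_labels, eps=1e-12):
    """Assumed training error l_i for one batch: binary cross-entropy
    between the predicted label probabilities and the true labels.
    eps clips probabilities away from 0 and 1 for numerical safety."""
    total = 0.0
    for p, y in zip(pred_probs, true_labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(pred_probs)
```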
5.3.5) calculating the gradient values of the convolution kernel parameters in the convolutional neural network from the training error l_i of the i batches of training samples and the learning rate η_i of the current batch, updating the convolution kernel parameters with the obtained gradient values, and completing one round of training;
this example adopts, but is not limited to, gradient descent to update the convolution kernel parameters, i.e., the parameters are updated along the gradient direction to approach the optimal solution at which the neural network converges;
5.3.6) judging whether the training error l_i of the training samples is no longer dropping: if yes, stop training the network to obtain a trained convolutional neural network;
otherwise, execute 5.3.7).
5.3.7) judging whether the number of training rounds Q has reached the maximum number of training rounds Q_MAX:
if yes, stop training the network to obtain a trained convolutional neural network;
otherwise, increment the training round number Q by 1 and the batch index i by 1, and return to 5.3.1).
Step 6, predicting on the test sample set.
Input the structure vector of the node pair of each sample in the test sample set and the content embedding vector of the node pair into the trained neural network to obtain the predicted label values W of all node pairs in the test sample set.
Step 7, obtaining the link prediction result.
7.1) setting a detection threshold value H;
this example takes but is not limited to 0.5;
7.2) Compare the predicted label value W obtained in step 6 with the detection threshold H to obtain the final link prediction result:
if W > H, the link is considered to exist;
if W ≤ H, the link is considered not to exist.
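The thresholding in steps 7.1)–7.2) can be sketched in a few lines; the function name and list-based interface are illustrative, not from the patent.

```python
# Sketch of step 7: compare each predicted label value W with the detection
# threshold H to decide whether a link exists (W > H) or not (W <= H).
def predict_links(w_values, h=0.5):
    return [w > h for w in w_values]
```

With the example threshold H = 0.5, a predicted value of exactly 0.5 falls in the "W ≤ H" branch and is classified as no link.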
The effect of the present invention will be further described with reference to simulation experiments.
1. Simulation experiment conditions are as follows:
The running environment of the simulation experiments of the present invention is as follows: the processor is an Intel(R) Core(TM) i3-9100 CPU @ 3.60GHz, the memory is 8.00GB, the hard disk is 929GB, the operating system is Windows 10, the programming environment is Python 3.8, and the programming software is PyCharm Community Edition 2020.2.3 x64.
The datasets used for the simulation are the Subreddit dataset and the Mooc dataset. The Subreddit dataset contains a hyperlink network representing directed connections between two subreddits. The Mooc dataset collects user operation logs from a MOOC website. Both datasets comprise node information, node relationship information, content information of the nodes, and true label value vectors of the node pairs.
The training sample set consists of 2437 node pairs from the dataset, and the test sample set consists of 429 node pairs from the dataset.
The following 5 existing methods are used for comparison:
1. The common neighbor method, which performs link prediction based on the number of neighbor nodes shared by two nodes, with all neighbors weighted equally.
2. The Jaccard coefficient method, which performs link prediction according to the number of neighbor nodes shared by two nodes, normalized by their degree similarity.
3. The weighted neighbor method, which performs link prediction according to the degree preference of the neighbor nodes shared by the two nodes.
4. The preferential attachment method, which measures whether a link exists between two nodes according to the product of their degrees.
5. A method that measures whether a link exists between two nodes according to the adjacency feature graph of their shared neighbor nodes.
2. Simulation content and result analysis:
simulation experiment 1: the link prediction accuracy of the present invention was compared with the existing 5 methods described above.
First, using the present invention and the above 5 methods respectively, calculate the node-pair structure vectors and node-pair content vectors from the node information, node relationship information and node content information of each node in the Subreddit and Mooc datasets, and perform link prediction to obtain the predicted label values;
Second, compare the predicted label value of each method with the detection threshold to determine whether each link is predicted successfully, and from the results count the number Q of links successfully predicted for a node, the total number M of links available for prediction, and the number N of links connectable to the node; calculate the precision P with the precision formula and the recall R with the recall formula, then calculate the F1 score with the F1-score formula, where λ is 0.5; a higher F1 score indicates higher link prediction accuracy;
finally, the F1 scores for each method were compared, and the results are shown in FIG. 3, where the horizontal axis represents the different methods and the vertical axis represents the F1 score.
As can be seen from FIG. 3, the bar corresponding to the present invention is higher than the bars corresponding to the 5 existing methods, i.e., the F1 score of the present invention is the highest among the 6 methods, indicating that the link prediction accuracy of the present invention is higher than that of the 5 existing methods.
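The precision/recall/F1 evaluation of simulation experiment 1 can be sketched as follows. The patent's formula images are not reproduced in the text, so the definitions P = Q/M, R = Q/N and the λ-weighted score F = P·R / (λP + (1−λ)R) are assumptions; with λ = 0.5 this score reduces to the familiar F1 = 2PR/(P+R).

```python
# Hedged sketch of the evaluation in simulation experiment 1. Assumed:
# precision P = Q / M, recall R = Q / N, and the weighted score
# F = P*R / (lam*P + (1 - lam)*R), which equals 2PR/(P+R) when lam = 0.5.
def f_score(q, m, n, lam=0.5):
    p = q / m            # precision: successful predictions over all predictions
    r = q / n            # recall: successful predictions over connectable links
    return p * r / (lam * p + (1 - lam) * r)
```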
Simulation experiment 2: the link prediction running time of the present invention is compared with that of the 5 existing methods described above.
The running times of the present invention and the 5 existing methods on the task of simulation experiment 1 were measured, and the running times of the 6 link prediction methods were compared; the results are shown in Table 1.
Table 1. Running time of each link prediction method
The running time of a link prediction method indicates its prediction speed: the shorter the running time, the faster the link prediction.
As can be seen from Table 1, the running times of the 5 existing methods are long while that of the present invention is short, indicating that the link prediction speed of the present invention is higher than that of the 5 existing methods.
Claims (7)
1. A link prediction method based on a relational content joint embedding convolutional neural network, characterized by comprising the following steps:
(1) acquiring a first-order neighbor node set and a second-order neighbor node set of each node in a link network, sampling the two sets, and calculating a structure initial vector a_i of the node pair according to the sampling results;
(2) calculating a node content embedding vector according to the content information of each node in the link network, pairing all nodes in the link network pairwise, and calculating a content embedding vector b_i of each node pair;
(3) randomly selecting 2437 node pairs from all node pairs of the link network to form a training sample set and obtaining the structure initial vector, content embedding vector and true label value of each training-sample node pair; randomly selecting 429 node pairs from the remaining node pairs of the link network outside the training sample set to form a test sample set and obtaining the structure initial vector and content embedding vector of each test-sample node pair;
(4) constructing a convolutional neural network formed by cascading, in order, 1 sampling layer, 20 convolution layers, 16 pooling layers, 2 hidden layers and 1 fully connected layer; selecting the Hadamard product as the computation function of the hidden layers, the ReLU function as the activation function of the hidden layers, and the training error l_i of the training samples as the loss function of the network;
(5) training the convolutional neural network:
setting an initial weight vector m_1, an initial learning rate η_1 and a maximum number of iteration rounds Q_MAX, dividing the training sample set into i batches and inputting them into the convolutional neural network, and training the convolutional neural network until the loss function of the network converges or the maximum number of iteration rounds is reached, to obtain a trained convolutional neural network;
(6) inputting the structure vector and the content embedding vector of the node pair of each sample in the test sample set into the trained neural network to obtain the predicted label values W of all node pairs in the test sample set;
(7) setting a detection threshold H and comparing the predicted label value W obtained in step (6) with the detection threshold H to obtain the final link prediction result:
if W > H, the link is considered to exist;
if W ≤ H, the link is considered not to exist.
2. The method of claim 1, wherein the structure initial vector of the node pair in (1) is calculated according to the sampling results as follows:
(1a) selecting a new target node i from the link network;
(1b) randomly selecting an auxiliary node j from the first-order neighbor node set of the new target node i, forming a node pair from the new target node i and the auxiliary node j, and calculating the connection closeness R_ij of the node pair,
wherein U_i denotes the first-order neighbor node set of the new target node i, |U_i| denotes the dimension of the first-order neighbor node set of the new target node i, U_j denotes the first-order neighbor node set of the auxiliary node j, and |U_j| denotes the dimension of the first-order neighbor node set of the auxiliary node j;
(1c) calculating the range closeness S_ij of the node pair,
wherein U_g denotes the intersection of the first-order neighbor node set of the new target node i and the first-order neighbor node set of the auxiliary node j, and |U_g| denotes the dimension of that intersection;
(1d) calculating the overall closeness J_ij of the node pair,
wherein ρ is a mapping vector that maps high-dimensional dense data to low-dimensional dense data, α denotes the responsibility weight of the connection closeness of the node pair, the cascade operation of the multilayer perceptron combines the two closeness terms, and β denotes the responsibility weight of the range closeness of the node pair;
(1e) judging whether 32 auxiliary nodes j have been selected: if so, composing the overall closeness values J_ij of the 32 node pairs into the structure initial vector:
ε_i = [J_ij]_32,
wherein [ ] is the composition operator, then executing (1f);
otherwise, returning to (1b);
(1f) judging whether all new target nodes have been selected: if so, splicing all the obtained vectors ε_i into the structure initial vector of the node pairs:
a_i = ∪ε_i,
wherein ∪ is the splicing operator;
otherwise, returning to (1a).
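A runnable sketch of the sampling loop in claim 2 follows. The closeness formulas R_ij, S_ij and J_ij appear only as images in the original, so a Jaccard-style neighbor overlap stands in for the overall closeness J_ij; the 32-sample default mirrors step (1e), and all names are illustrative.

```python
import random

# Sketch of claim 2's sampling procedure. A Jaccard-style overlap stands in
# for the overall closeness J_ij, whose exact formula is not reproduced here.
def structure_vector(target, neighbors, k=32, rng=None):
    """neighbors: dict mapping each node to its first-order neighbor set."""
    rng = rng or random.Random(0)
    u_i = neighbors[target]                  # first-order neighbors of node i
    vec = []
    for _ in range(k):                       # (1e): repeat for k auxiliary nodes
        j = rng.choice(sorted(u_i))          # (1b): pick an auxiliary node j
        u_j = neighbors[j]
        inter = u_i & u_j                    # U_g in step (1c)
        j_ij = len(inter) / len(u_i | u_j)   # stand-in for overall closeness J_ij
        vec.append(j_ij)
    return vec                               # epsilon_i = [J_ij]_k
```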
3. The method of claim 1, wherein the node content embedding vector in (2) is calculated according to the content information of each node in the link network as follows:
(2a) obtaining, from the content information of the new target node i, the feature vector x_i of node i, the feature-vector weight ω_i of node i, and the content-library feature vector y_i of node i;
(2b) calculating the content embedding vector of the new target node i,
wherein ζ denotes the nonlinear activation function Leaky ReLU, and ψ denotes the bias of the content embedding vector;
(2c) obtaining, from the content information of the auxiliary node j, the feature vector x_j of node j, the feature-vector weight ω_j of node j, and the content-library feature vector y_j of node j, and calculating the content embedding vector of the auxiliary node j.
4. The method of claim 1, wherein the content embedding vector b_i of the node pair in (2) is calculated as follows:
(2d) calculating the content embedding vector π_i,
wherein ξ denotes the nonlinear activation function, τ denotes the content-embedding responsibility weight of the new target node i, a cascade operation is performed between the two vectors, and the content-embedding responsibility weight of the auxiliary node j is applied to its embedding;
(2e) judging whether all new target nodes have been selected: if so, splicing all content embedding vectors π_i into the content embedding vector of the node pairs:
b_i = ∪π_i,
wherein ∪ is the splicing operator; otherwise, returning to (2a).
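A sketch of the node-pair content embedding in step (2d). The exact formula is an image in the original, so the following assumes a responsibility-weighted cascade (concatenation) of the two node embeddings passed through a nonlinear activation, with tanh standing in for ξ and all names illustrative.

```python
import math

# Hedged sketch of step (2d): combine the content embeddings z_i and z_j of
# target node i and auxiliary node j into the pair embedding pi_i. Assumed:
# tau-weighted concatenation, tanh as a stand-in for the activation xi.
def pair_content_embedding(z_i, z_j, tau=0.6):
    cascaded = [tau * v for v in z_i] + [(1 - tau) * v for v in z_j]
    return [math.tanh(v) for v in cascaded]   # xi applied element-wise
```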
5. The method of claim 1, wherein the convolutional neural network is constructed in (4) with the functions and parameters of each layer set as follows:
the sampling layer: performing a sampling operation on the node-pair structure initial vector a_i from (1) by bilinear interpolation to obtain the sampling vector c_i;
the convolution layer: performing a convolution operation on the sampling vector c_i to obtain the convolution vector d_i, wherein the convolution kernels of the convolution layer have size 3, stride 1, and number 32, the activation function of the convolution layer is ReLU, and dropout is 0.2;
the pooling layer: performing a mean pooling operation on the convolution vector d_i to obtain the structure embedding vector e_i;
the hidden layer: performing the Hadamard product operation on the structure embedding vector e_i and the content embedding vector b_i of the node pair from (2) to obtain the joint embedding vector n_i,
wherein χ denotes the ReLU activation function of the hidden layer, and the Hadamard product is the computation function of the hidden layer;
the fully connected layer: calculating the predicted label values f_i of the i-th batch of node pairs:
f_i = δ(n_i m_i),
wherein δ denotes the mapping function and m_i denotes the weight vector of the i-th batch.
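The hidden and fully connected layers of claim 5 can be sketched as below: a Hadamard product of the structure embedding e_i and content embedding b_i, the ReLU activation χ, then a weighted sum through m_i. The mapping function δ is not specified in the text, so a sigmoid is assumed here.

```python
import math

# Sketch of the hidden + fully connected layers of claim 5: Hadamard product,
# ReLU (chi), weighted sum with m_i, then an assumed sigmoid as delta.
def forward(e_i, b_i, m_i):
    n_i = [max(0.0, e * b) for e, b in zip(e_i, b_i)]   # chi(e_i ∘ b_i)
    score = sum(n * m for n, m in zip(n_i, m_i))        # n_i m_i
    return 1.0 / (1.0 + math.exp(-score))               # delta: assumed sigmoid
```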
6. The method of claim 1, wherein the initial weight vector m_1 in (5) is set as follows: first setting the initial weight value m_λ of each layer to normally distributed random numbers with a mean of 0 and a standard deviation of 0.1, then composing the weights of all layers into the initial weight vector:
m_1 = [m_λ],
wherein [ ] is the composition operator.
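The initialization of claim 6 can be sketched as follows; only the N(0, 0.1) draw is from the claim, while the layer sizes and seed are illustrative.

```python
import random

# Sketch of claim 6: each layer's initial weights m_lambda are drawn from a
# normal distribution with mean 0 and standard deviation 0.1, then composed
# into the initial weight vector m_1. Layer sizes and seed are illustrative.
def init_weights(layer_sizes, seed=0):
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(n)] for n in layer_sizes]
```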
7. The method of claim 1, wherein the convolutional neural network in (5) is trained as follows:
(5a) dividing the training sample set into i batches and inputting them into the convolutional neural network designed in (4) to obtain the predicted label values f_i of the node pairs of the i-th batch of the training sample set;
(5b) calculating the predicted label probability of the i-th batch of training samples according to the predicted label values f_i of the node pairs of the i-th batch of training samples;
(5c) calculating the true label probability P of the training samples according to the true label values f_T of the node pairs of the training samples obtained in (3);
(5d) calculating the training error l_i of the loss function over the i-th batch of training samples;
(5e) calculating, according to the training error l_i of the i-th batch of training samples and the learning rate η_i of the current batch, the gradient values of the convolution kernel parameters of the convolutional neural network over the training sample set, updating the convolution kernel parameters with the obtained gradient values, and completing one round of training;
(5f) judging whether the training error l_i of the training samples has stopped decreasing:
if so, stopping training the network to obtain a trained convolutional neural network;
otherwise, executing (5g);
(5g) judging whether the number of training rounds Q has reached the maximum number of training rounds Q_MAX:
if so, stopping training the network to obtain a trained convolutional neural network;
otherwise, incrementing the training round number Q by 1 and the batch index i by 1, and returning to (5a).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110085651.5A CN112765415A (en) | 2021-01-22 | 2021-01-22 | Link prediction method based on relational content joint embedding convolution neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112765415A true CN112765415A (en) | 2021-05-07 |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113269239A (en) * | 2021-05-13 | 2021-08-17 | 河南大学 | Relation network node classification method based on multi-channel convolutional neural network |
CN113269239B (en) * | 2021-05-13 | 2024-04-19 | 河南大学 | Relation network node classification method based on multichannel convolutional neural network |
CN113378990A (en) * | 2021-07-07 | 2021-09-10 | 西安电子科技大学 | Traffic data anomaly detection method based on deep learning |
CN113378990B (en) * | 2021-07-07 | 2023-05-05 | 西安电子科技大学 | Flow data anomaly detection method based on deep learning |
CN113676491A (en) * | 2021-09-17 | 2021-11-19 | 西北工业大学 | Network topology confusion method based on common neighbor number and graph convolution neural network |
CN113807600A (en) * | 2021-09-26 | 2021-12-17 | 河南工业职业技术学院 | Link prediction method in dynamic social network |
CN113807600B (en) * | 2021-09-26 | 2023-07-25 | 河南工业职业技术学院 | Link prediction method in dynamic social network |
CN116413587A (en) * | 2023-06-06 | 2023-07-11 | 中科鉴芯(北京)科技有限责任公司 | Method and device for selecting rollback path |
CN116413587B (en) * | 2023-06-06 | 2023-10-27 | 中科鉴芯(北京)科技有限责任公司 | Method and device for selecting rollback path |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | |
Application publication date: 20210507 |