WO2023000574A1 - Model training method, apparatus and device, and readable storage medium


Info

Publication number: WO2023000574A1
Authority: WIPO (PCT)
Prior art keywords: matrix, graph, loss value, convolutional neural, chebyshev
Prior art date: 2021-07-21
Application number: PCT/CN2021/134051
Other languages: French (fr), Chinese (zh)
Inventors: 胡克坤, 董刚, 赵雅倩, 李仁刚
Original Assignee: 浪潮(北京)电子信息产业有限公司
Application filed by 浪潮(北京)电子信息产业有限公司


Classifications

    • G06N 3/045 Combinations of networks
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning


Abstract

A model training method, apparatus and device, and a readable storage medium. The method designs two Chebyshev graph convolutional neural networks: one performs supervised training on the basis of a vertex feature matrix, an adjacency matrix and a label matrix, while the other performs unsupervised training on the basis of the vertex feature matrix, a positive pointwise mutual information matrix and the output of the first network during training. When a target loss value determined from the loss values of the two Chebyshev graph convolutional neural networks meets a preset convergence condition, the two networks are combined into a dual vertex classification model, so that a vertex classification model with better performance is obtained through training. The method exploits the respective advantages of supervised and unsupervised training, thereby improving the performance of the vertex classification model.

Description

A Model Training Method, Apparatus, Device and Readable Storage Medium
This application claims priority to the Chinese patent application No. 202110825194.9, filed with the China Patent Office on July 21, 2021 and entitled "A model training method, apparatus, device and readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a model training method, apparatus, device and readable storage medium.
Background
With the rapid development of information technologies such as cloud computing, the Internet of Things, mobile communications and smart terminals, new applications represented by social networks, communities and blogs have come into wide use. These applications continuously generate large amounts of data that are conveniently modeled and analyzed as graphs: the vertices of the graph represent individuals or groups, and the connecting edges represent the relations between them. Vertices usually carry label information describing the modeled object's age, gender, location, hobbies, religious beliefs and many other possible features, which reflect the individual's behavioral preferences from various aspects. Ideally, every social network user would carry all the labels related to his or her own characteristics, but in reality this is not the case: to protect their privacy, more and more social network users are cautious when sharing personal information, so social media can collect only part of a user's information. How to infer the labels of the remaining users from the label information of known users is therefore particularly important and urgent. This is the vertex classification problem.
To address the difficulty that traditional machine learning methods have in handling graph data, a wave of research on graph neural networks has gradually arisen in academia and industry. A graph neural network is, simply put, a deep learning architecture for graph-structured data. It combines end-to-end learning with inductive reasoning and is expected to resolve a series of bottlenecks, such as causal reasoning and interpretability, that traditional deep learning architectures cannot handle.
According to their implementation principles, graph convolutional neural networks fall into two types: those based on spatial methods and those based on spectral methods. The former rely on an explicit information propagation mechanism over the graph and lack interpretability; the latter take the graph's Laplacian matrix as their tool, rest on a solid theoretical foundation, and form the mainstream direction of graph convolutional neural network research. However, current spectral graph convolutional neural networks do not perform well when applied to graph vertex classification tasks; that is, existing vertex classification models based on graph convolutional neural networks perform poorly.
Therefore, how to improve the performance of vertex classification models is a problem that those skilled in the art need to solve.
Summary of the Invention
In view of this, the purpose of the present application is to provide a model training method, apparatus, device and readable storage medium so as to improve the performance of vertex classification models. The specific scheme is as follows:
In a first aspect, the present application provides a model training method, including:
obtaining a vertex feature matrix, an adjacency matrix and a label matrix constructed based on a graph data set;
performing random walks and sampling based on the adjacency matrix to obtain a positive pointwise mutual information matrix;
inputting the vertex feature matrix and the adjacency matrix into a first Chebyshev graph convolutional neural network to output a first training result;
inputting the vertex feature matrix and the positive pointwise mutual information matrix into a second Chebyshev graph convolutional neural network to output a second training result;
calculating a first loss value between the first training result and the label matrix;
calculating a second loss value between the second training result and the first training result;
determining a target loss value based on the first loss value and the second loss value; and
if the target loss value meets a preset convergence condition, combining the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network into a dual vertex classification model.
Preferably, performing the random walks and sampling based on the adjacency matrix to obtain the positive pointwise mutual information matrix includes:
performing, based on the adjacency matrix, a random walk of a preset length from each vertex in the graph data set to obtain a context path of each vertex;
randomly sampling all the context paths to determine the number of co-occurrences of any two vertices, and constructing a vertex co-occurrence matrix; and
based on the vertex co-occurrence matrix, calculating the vertex-context co-occurrence probabilities and the corresponding marginal probabilities, and determining each element of the positive pointwise mutual information matrix.
Preferably, calculating the first loss value between the first training result and the label matrix includes:
based on the cross-entropy principle, taking the degree of difference between the probability distributions of the first training result and the label matrix as the first loss value.
Preferably, calculating the second loss value between the second training result and the first training result includes:
calculating the differences between elements having the same coordinates in the second training result and the first training result, and taking the sum of squares of all the differences as the second loss value.
Preferably, determining the target loss value based on the first loss value and the second loss value includes:
inputting the first loss value and the second loss value into a loss function to output the target loss value;
where the loss function is $ls = ls_S + \alpha\, ls_U$, in which $ls$ is the target loss value, $ls_S$ is the first loss value, $ls_U$ is the second loss value, and $\alpha$ is a constant that adjusts the proportion of the second loss value in the target loss value.
Preferably, if the target loss value does not meet the preset convergence condition, the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are updated according to the target loss value, and the updated first Chebyshev graph convolutional neural network and the updated second Chebyshev graph convolutional neural network are trained iteratively until the target loss value meets the preset convergence condition;
where updating the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network according to the target loss value includes:
after updating the network parameters of the first Chebyshev graph convolutional neural network according to the target loss value, sharing the updated network parameters with the second Chebyshev graph convolutional neural network;
or
after updating the network parameters of the second Chebyshev graph convolutional neural network according to the target loss value, sharing the updated network parameters with the first Chebyshev graph convolutional neural network;
or
after calculating new network parameters according to the target loss value, sharing the new network parameters with both the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network.
Preferably, the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network each include L graph convolutional layers, and the L graph convolutional layers are used to perform feature transformation and graph convolution operations on the input data;
where the feature transformation formula of the l-th (1 ≤ l ≤ L) graph convolutional layer is:

$$Q^{l} = H^{l}\,(\Theta^{l})^{\mathsf T}$$

and the graph convolution formula of the l-th (1 ≤ l ≤ L) graph convolutional layer is:

$$H^{l+1} = \sigma\Big(\sum_{k=0}^{K} \theta_k\, T_k(\tilde{L})\, Q^{l}\Big)$$

where $Q^{l}$ is the vertex feature matrix of the l-th graph convolutional layer after feature transformation; $H^{l}$ is the input data of the l-th graph convolutional layer, and $H^{l+1}$ is the output data of the l-th graph convolutional layer; $(\Theta^{l})^{\mathsf T}$ is the transpose of the feature transformation matrix to be learned by the l-th graph convolutional layer; $\sigma$ is a nonlinear activation function; $K \ll n$ is the order of the polynomial; $n$ is the number of vertices in the graph data set; $\theta_k$ are the coefficients of the polynomial; $T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)$, with $T_0 = 1$ and $T_1 = x$, are the Chebyshev polynomials; $L$ is the Laplacian matrix of the graph data set; and $\tilde{L}$ is the Laplacian matrix after linear transformation.
In a second aspect, the present application provides a model training apparatus, including:
an obtaining module, configured to obtain a vertex feature matrix, an adjacency matrix and a label matrix constructed based on a graph data set;
a sampling module, configured to perform random walks and sampling based on the adjacency matrix to obtain a positive pointwise mutual information matrix;
a first training module, configured to input the vertex feature matrix and the adjacency matrix into a first Chebyshev graph convolutional neural network to output a first training result;
a second training module, configured to input the vertex feature matrix and the positive pointwise mutual information matrix into a second Chebyshev graph convolutional neural network to output a second training result;
a first calculation module, configured to calculate a first loss value between the first training result and the label matrix;
a second calculation module, configured to calculate a second loss value between the second training result and the first training result;
a determining module, configured to determine a target loss value based on the first loss value and the second loss value; and
a combining module, configured to combine the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network into a dual vertex classification model if the target loss value meets a preset convergence condition.
In a third aspect, the present application provides a model training device, including:
a memory, configured to store a computer program; and
a processor, configured to execute the computer program to implement the model training method disclosed above.
In a fourth aspect, the present application provides a readable storage medium for storing a computer program, where the computer program, when executed by a processor, implements the model training method disclosed above.
As can be seen from the above scheme, the present application provides a model training method including: obtaining a vertex feature matrix, an adjacency matrix and a label matrix constructed based on a graph data set; performing random walks and sampling based on the adjacency matrix to obtain a positive pointwise mutual information matrix; inputting the vertex feature matrix and the adjacency matrix into a first Chebyshev graph convolutional neural network to output a first training result; inputting the vertex feature matrix and the positive pointwise mutual information matrix into a second Chebyshev graph convolutional neural network to output a second training result; calculating a first loss value between the first training result and the label matrix; calculating a second loss value between the second training result and the first training result; determining a target loss value based on the first loss value and the second loss value; and, if the target loss value meets a preset convergence condition, combining the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network into a dual vertex classification model.
It can be seen that the present application designs two Chebyshev graph convolutional neural networks: the first performs supervised training based on the vertex feature matrix, the adjacency matrix and the label matrix, while the second performs unsupervised training based on the vertex feature matrix, the positive pointwise mutual information matrix and the output of the first network during training. When the target loss value determined from the loss values of the two networks meets the preset convergence condition, the two Chebyshev graph convolutional neural networks are combined into a dual vertex classification model, so that a vertex classification model with better performance is obtained through training. This scheme gives full play to the respective advantages of supervised and unsupervised training and improves the performance of the vertex classification model.
Correspondingly, the model training apparatus, device and readable storage medium provided by the present application have the same technical effects.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a schematic structural diagram of a graph convolutional neural network disclosed in the present application;
Fig. 2 is a flow chart of a model training method disclosed in the present application;
Fig. 3 is a schematic diagram of the data flow of a dual Chebyshev graph convolutional neural network disclosed in the present application;
Fig. 4 is a schematic diagram of a dual Chebyshev graph convolutional neural network disclosed in the present application;
Fig. 5 is a flow chart of a model construction and training method disclosed in the present application;
Fig. 6 is a schematic diagram of a model training apparatus disclosed in the present application;
Fig. 7 is a schematic diagram of a model training device disclosed in the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
To facilitate understanding of the present application, graph neural networks and graph data sets are introduced first.
It should be noted that modeling and analyzing data and the relations between data with graphs has important academic and economic value. For example: (1) studying how infectious diseases, ideas and opinions spread through social networks over time; (2) studying how groups in social networks form communities around specific interests or affiliations, and how strongly those communities are connected; (3) social networks discovering people with similar interests according to the principle that "birds of a feather flock together" and suggesting or recommending new links or contacts to them; (4) question answering systems routing questions to the people with the most relevant experience, and advertising systems showing advertisements to the individuals most interested in and receptive to a particular topic.
It is therefore necessary to infer the labels of the remaining users from the label information of known users. This is the vertex classification problem, which can be formally described as follows: given a graph G = (V, E), where V is the vertex set, E is the set of connecting edges, and V_L is a subset of V whose vertices have assigned labels, the graph vertex classification problem is to infer the label of each vertex in the remaining set V \ V_L. Unlike traditional classification problems, it cannot be solved by directly applying the classification methods of traditional machine learning, such as support vector machines, k-nearest neighbors, decision trees and naive Bayes. This is because traditional classification methods usually assume that objects are independent, which makes their classification results imprecise here. In graph vertex classification, different objects, i.e. vertices, are not independent of each other; on the contrary, they have complex dependencies, and these relations must be fully exploited to improve the quality of classification.
A graph neural network usually consists of an input layer, one or more graph convolutional layers, and an output layer. According to their structural characteristics, graph neural networks can be divided into graph convolutional neural networks, graph recurrent neural networks, graph autoencoders, graph generative networks and spatio-temporal graph neural networks. Among them, graph convolutional neural networks have attracted the attention of many scholars owing to the great success of traditional convolutional neural networks in fields such as image processing and natural language understanding.
Referring to Fig. 1, which shows the structure of a typical graph convolutional neural network, the network consists of an input layer, two graph convolutional layers (Gconv layers) and an output layer. The input layer reads the n*d-dimensional vertex attribute matrix X; each graph convolutional layer extracts features from X and passes them, after a nonlinear activation function such as ReLU, to the next graph convolutional layer; finally, the output layer, i.e. the task layer, completes a specific task such as vertex classification or clustering. The figure shows a vertex classification task layer, which outputs the class label Y of each vertex.
However, spectral graph convolutional neural networks do not perform well when applied to graph vertex classification tasks, mainly because: (1) the eigendecomposition of the Laplacian matrix is computationally expensive, costing $O(n^3)$; and (2) the target loss function defined by adding a regularization term ($ls = ls_S + \alpha\, ls_{reg}$, where $ls_S$ and $ls_{reg}$ denote the supervised learning loss function and the regularization term defined from the graph topology, respectively) relies on the local consistency assumption that "adjacent vertices have similar labels". This assumption limits the capability of graph neural network models, because the connecting edges in the graph do not merely encode similarity between nodes; they can in fact carry additional information.
To this end, the present application provides a model training scheme that combines supervised and unsupervised learning, effectively improves classification accuracy, effectively reduces the computational complexity of the network, and improves classification efficiency.
Referring to Fig. 2, an embodiment of the present application discloses a model training method, including:
S201. Obtain a vertex feature matrix, an adjacency matrix and a label matrix constructed based on a graph data set.
Assume that the graph data set to be classified is G = (V, E), where V is the vertex set and E is the set of connecting edges. V is divided into a small set V_L of vertices with class labels and a larger set V_U of vertices without class labels, satisfying $V_L \cup V_U = V$ and $V_L \cap V_U = \emptyset$. Besides its label, each vertex v of G has d features, and the features of all vertices form the n*d-dimensional vertex feature matrix X. The adjacency matrix of G is denoted A, and its element $A_{ij}$ represents the weight of the edge connecting vertices i and j.
According to the set V_L of labeled vertices, an n*C-dimensional label matrix Y is constructed, where n = |V| is the number of vertices in the graph and C is the number of label classes of all vertices. The matrix element $Y_{ij}$ indicates whether the class label of vertex i is j (j = 1, 2, ..., C). When vertex i has class label j, the element in its j-th column is set to 1 and the elements in the other columns to 0, i.e. $Y_{ik} = 1$ when k = j and $Y_{ik} = 0$ when k ≠ j. When vertex i has no class label, every element in the corresponding row is set to 0.
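As an illustration, the following is a minimal sketch (Python/NumPy; the function name `build_label_matrix` and its argument convention are our own assumptions, not from the patent) of how such a label matrix could be constructed:

```python
import numpy as np

def build_label_matrix(n, C, labels):
    """Build the n x C label matrix Y.

    labels: dict mapping the index i of a labeled vertex to its class j
    (0-based). Rows of unlabeled vertices stay all-zero.
    """
    Y = np.zeros((n, C))
    for i, j in labels.items():
        Y[i, j] = 1.0  # one-hot row for labeled vertex i
    return Y

# Example: 5 vertices, 3 classes, vertices 0 and 3 labeled
Y = build_label_matrix(5, 3, {0: 2, 3: 1})
```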
For example, a graph data set can be built from the Pubmed data set. The Pubmed data set contains 19,717 scientific publications in 3 classes, with 44,338 citation links between publications. The publications and the links between them form a citation network, in which each publication is described by a Term Frequency-Inverse Document Frequency (TF-IDF) feature vector derived from a dictionary of 500 terms. The feature vectors of all documents form the feature matrix X. The goal is to classify each document: 20 instances of each class are randomly sampled as labeled data, 1000 instances are used as test data, and the rest are used as unlabeled data, from which the vertex label matrix Y is constructed. The adjacency matrix A is constructed from the citation relations between papers. The transition probability between any two vertices is calculated from A; a random walk of length u is carried out from each vertex v_j to obtain a path π_j; π_j is randomly sampled to calculate the frequency p_ij with which vertex v_i appears on path π_j, which yields the positive pointwise mutual information matrix P.
Of course, graph data sets can also be built from proteins, graphic images and the like, so as to classify proteins, graphic images, etc.
S202. Perform random walks and sampling based on the adjacency matrix to obtain a positive pointwise mutual information matrix.
From the adjacency matrix A, a positive pointwise mutual information matrix encoding the graph's global consistency information can be constructed using random walk and random sampling techniques. Specifically, the adjacency matrix plays two roles in the random walk process. First, it represents the graph topology: it indicates which vertices are connected, so that a walk can move from one vertex to an adjacent vertex. Second, it determines the random walk probabilities, as detailed in formula (1): a vertex may have several neighbors, and in one random walk step the walker picks one of its neighbors at random.
In a specific implementation, performing random walks and sampling based on the adjacency matrix to obtain the positive pointwise mutual information matrix includes: performing, based on the adjacency matrix, a random walk of a preset length from each vertex in the graph data set to obtain a context path of each vertex; randomly sampling all context paths to determine the number of co-occurrences of any two vertices, and constructing a vertex co-occurrence matrix; and, based on the vertex co-occurrence matrix, calculating the vertex-context co-occurrence probabilities and the corresponding marginal probabilities, and determining each element of the positive pointwise mutual information matrix.
Here, the "vertex-context co-occurrence probability" is the probability pr(v_i, ct_j) that a vertex v_i appears in a context ct_j, or in other words, the probability that ct_j contains vertex v_i. Once all vertex-context co-occurrence probabilities are obtained, they form a matrix, namely the vertex co-occurrence matrix. The marginal probability of vertex v_i equals the sum of the elements in the i-th row of this matrix divided by the sum of all its elements; the marginal probability of context ct_j equals the sum of the elements in the j-th column divided by the sum of all its elements.
The positive pointwise mutual information matrix, denoted P, encodes the graph's global consistency information and can be determined as follows:

Suppose the row vector $p_{i,:}$ is the embedded representation of vertex v_i, the column vector $p_{:,j}$ is the embedded representation of context ct_j, and $p_{ij}$ is the probability that vertex v_i appears in context ct_j. Then the positive pointwise mutual information matrix P can be obtained by random walks over the graph data set. Specifically, regard the context ct_j of vertex v_j as a path π_j of length u rooted at v_j; then $p_{ij}$ can be obtained by counting the frequency with which vertex v_i appears on path π_j. Without loss of generality, let the graph vertex at which a random walker sits at time τ be numbered x(τ), with x(τ) = v_i; then the probability $t_{ij}$ of walking to its neighbor v_j at time τ+1 is given by formula (1):

$$t_{ij} = pr\big(x(\tau+1) = v_j \mid x(\tau) = v_i\big) = A_{ij} \Big/ \sum\nolimits_j A_{ij} \tag{1}$$
Carrying out a random walk of u steps from each vertex of the graph data set according to formula (1) yields the path π characterizing that vertex's context. Randomly sampling π to count the co-occurrences of any two vertices yields the vertex-context co-occurrence matrix O (i.e. the vertex co-occurrence matrix). In this matrix, element $o_{ij}$ is the number of times vertex v_i appears in context ct_j, i.e. on the path π_j rooted at vertex v_j, and it is used to subsequently compute $p_{ij}$. The vertex-context co-occurrence probabilities and the corresponding marginal probabilities are then computed from O. Denoting the co-occurrence probability of vertex v_i and context ct_j, and the corresponding marginal probabilities, by pr(v_i, ct_j), pr(v_i) and pr(ct_j) respectively, formula (2) gives:

$$pr(v_i, ct_j) = \frac{o_{ij}}{\sum_{i,j} o_{ij}}, \qquad pr(v_i) = \frac{\sum_j o_{ij}}{\sum_{i,j} o_{ij}}, \qquad pr(ct_j) = \frac{\sum_i o_{ij}}{\sum_{i,j} o_{ij}} \tag{2}$$
Combining formula (2), the value of element $p_{ij}$ of the positive pointwise mutual information matrix P is calculated as:

$$p_{ij} = \max\Big(\log\frac{pr(v_i, ct_j)}{pr(v_i)\,pr(ct_j)},\; 0\Big)$$

The value of every element of the positive pointwise mutual information matrix P, and hence P itself, can be determined accordingly.
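For illustration, below is a minimal sketch of this construction (Python/NumPy; the function name `ppmi_matrix`, the walk parameters and the sampling scheme are our own assumptions, not prescribed by the patent):

```python
import numpy as np

def ppmi_matrix(A, walk_len=10, n_walks=100, rng=None):
    """Estimate the positive pointwise mutual information matrix P
    from random walks on a graph with adjacency matrix A.
    Assumes every vertex has at least one edge."""
    rng = rng or np.random.default_rng(0)
    n = A.shape[0]
    T = A / A.sum(axis=1, keepdims=True)  # t_ij = A_ij / sum_j A_ij, formula (1)
    O = np.zeros((n, n))                  # vertex-context co-occurrence counts
    for j in range(n):                    # context ct_j: walks rooted at v_j
        for _ in range(n_walks):
            v = j
            for _ in range(walk_len):
                v = rng.choice(n, p=T[v])
                O[v, j] += 1              # v_i appeared on path pi_j
    total = O.sum()
    pr_vc = O / total                             # pr(v_i, ct_j)
    pr_v = O.sum(axis=1, keepdims=True) / total   # marginal pr(v_i)
    pr_c = O.sum(axis=0, keepdims=True) / total   # marginal pr(ct_j)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(pr_vc / (pr_v * pr_c))
    return np.maximum(np.nan_to_num(pmi, neginf=0.0), 0.0)  # clip at zero
```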
S203. Input the vertex feature matrix and the adjacency matrix into the first Chebyshev graph convolutional neural network to output a first training result.
S204. Input the vertex feature matrix and the positive pointwise mutual information matrix into the second Chebyshev graph convolutional neural network to output a second training result.
In a specific implementation, the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are identical; each includes L graph convolutional layers, which perform feature transformation and graph convolution operations on the input data.

The feature transformation formula of the l-th (1 ≤ l ≤ L) graph convolutional layer is:

$$Q^{l} = H^{l}\,(\Theta^{l})^{\mathsf T}$$

The graph convolution formula of the l-th (1 ≤ l ≤ L) graph convolutional layer is:

$$H^{l+1} = \sigma\Big(\sum_{k=0}^{K} \theta_k\, T_k(\tilde{L})\, Q^{l}\Big)$$

where $Q^{l}$ is the vertex feature matrix of the l-th graph convolutional layer after feature transformation; $H^{l}$ is the input data of the l-th graph convolutional layer, and $H^{l+1}$ is its output data; $(\Theta^{l})^{\mathsf T}$ is the transpose of the feature transformation matrix to be learned by the l-th graph convolutional layer; $\sigma$ is a nonlinear activation function; $K \ll n$ is the order of the polynomial; $n$ is the number of vertices in the graph data set; $\theta_k$ are the coefficients of the polynomial; $T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)$, with $T_0 = 1$ and $T_1 = x$, are the Chebyshev polynomials; $L$ is the Laplacian matrix of the graph data set; and $\tilde{L} = 2L/\lambda_{\max} - I_n$ is the linearly transformed Laplacian matrix, where $\lambda_{\max}$ is the largest eigenvalue of $L$ and $I_n$ is the n*n identity matrix.
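A minimal sketch of one such layer (PyTorch-style Python; the class name `ChebyGraphConv` and the parameter shapes are our own assumptions; it follows the patent's two-stage decomposition of feature transformation followed by graph convolution):

```python
import torch
import torch.nn as nn

class ChebyGraphConv(nn.Module):
    """One Chebyshev graph convolutional layer: feature transformation
    Q = H @ Theta^T followed by H' = sigma(sum_k theta_k * T_k(L~) @ Q)."""

    def __init__(self, in_dim, out_dim, K):
        super().__init__()  # assumes K >= 1
        self.theta_feat = nn.Linear(in_dim, out_dim, bias=False)  # Theta^l
        self.theta_poly = nn.Parameter(torch.ones(K + 1))         # theta_k

    def forward(self, H, L_tilde):
        Q = self.theta_feat(H)        # feature transformation stage
        Tk_prev, Tk = Q, L_tilde @ Q  # T_0(L~) Q = Q, T_1(L~) Q = L~ Q
        out = self.theta_poly[0] * Tk_prev + self.theta_poly[1] * Tk
        for k in range(2, len(self.theta_poly)):
            # Chebyshev recurrence: T_k = 2 L~ T_{k-1} - T_{k-2}
            Tk_prev, Tk = Tk, 2 * (L_tilde @ Tk) - Tk_prev
            out = out + self.theta_poly[k] * Tk
        return torch.relu(out)        # sigma: nonlinear activation
```

Because $T_k(\tilde{L})\,Q$ is accumulated through the recurrence, the layer never needs the eigendecomposition of the Laplacian.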
S205. Calculate a first loss value between the first training result and the label matrix.

In a specific implementation, calculating the first loss value between the first training result and the label matrix includes: based on the cross-entropy principle, taking the degree of difference between the probability distributions of the first training result and the label matrix as the first loss value (i.e. the supervised loss).
S206. Calculate a second loss value between the second training result and the first training result.

In a specific implementation, calculating the second loss value between the second training result and the first training result includes: calculating the differences between elements having the same coordinates in the second training result and the first training result, and taking the sum of squares of all the differences as the second loss value (i.e. the unsupervised loss).
S207. Determine a target loss value based on the first loss value and the second loss value.

In a specific implementation, determining the target loss value based on the first loss value and the second loss value includes: inputting the first loss value and the second loss value into a loss function to output the target loss value, where the loss function is $ls = ls_S + \alpha\, ls_U$, in which $ls$ is the target loss value, $ls_S$ is the first loss value, $ls_U$ is the second loss value, and $\alpha$ is a constant that adjusts the proportion of the second loss value in the target loss value.
S208. If the target loss value meets a preset convergence condition, combine the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network into a dual vertex classification model.

In a specific implementation, if the target loss value does not meet the preset convergence condition, the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are updated according to the target loss value, and the updated first Chebyshev graph convolutional neural network and the updated second Chebyshev graph convolutional neural network are trained iteratively until the target loss value meets the preset convergence condition.
Updating the network parameters of the two networks according to the target loss value includes: updating the network parameters of the first Chebyshev graph convolutional neural network according to the target loss value and then sharing the updated parameters with the second Chebyshev graph convolutional neural network; or updating the network parameters of the second Chebyshev graph convolutional neural network according to the target loss value and then sharing the updated parameters with the first Chebyshev graph convolutional neural network; or calculating new network parameters according to the target loss value and sharing the new parameters with both the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network.
It can be seen that this embodiment designs two Chebyshev graph convolutional neural networks: the first performs supervised training based on the vertex feature matrix, the adjacency matrix and the label matrix, while the second performs unsupervised training based on the vertex feature matrix, the positive pointwise mutual information matrix and the output of the first network during training. When the target loss value determined from the loss values of the two networks meets the preset convergence condition, the two Chebyshev graph convolutional neural networks are combined into a dual vertex classification model, so that a vertex classification model with better performance is obtained through training. This scheme gives full play to the respective advantages of supervised and unsupervised training and improves the performance of the vertex classification model.
Based on the above embodiments, it should be noted that the dual vertex classification model may also be called a Dual Chebyshev Graph Convolutional Neural Network (DCGCN). To train the dual Chebyshev graph convolutional neural network, the network structure, loss function, initialization strategy, network parameter update method and so on must first be determined.
1. Network structure.

The dual Chebyshev graph convolutional neural network consists of two identical Chebyshev graph convolutional neural networks (ChebyNets) with shared parameters; each ChebyNet consists of an input layer, L graph convolutional layers and an output layer.
Referring to Fig. 3, denote the two ChebyNets by ChebyNet_A and ChebyNet_P. ChebyNet_A takes as input the adjacency matrix A, which encodes the graph's local consistency information, and the vertex feature matrix X, and outputs a vertex class label prediction matrix Z_A; ChebyNet_P takes as input the positive pointwise mutual information matrix P, which encodes the graph's global consistency information, and the vertex feature matrix X, and outputs a vertex class label prediction matrix Z_P.
ChebyNet_A performs supervised learning from the partially labeled graph vertices, giving high prediction accuracy; ChebyNet_P, under the guidance of the former (using its prediction result Z_A), performs unsupervised learning from the unlabeled graph vertices to further improve prediction accuracy and obtain a better vertex classification model. After ChebyNet_A and ChebyNet_P are trained, Z_A and Z_P agree or differ negligibly, so either Z_A or Z_P can serve as the output of the dual Chebyshev graph convolutional neural network.
Fig. 4 illustrates the structure of the dual Chebyshev graph convolutional neural network. The convolutional layers in Fig. 4 are the graph convolutional layers described below.
The input layer is mainly responsible for reading the graph data to be classified, including the vertex feature matrix X, the adjacency matrix A representing the graph topology, and the positive pointwise mutual information matrix P encoding the graph's global consistency information.
Definition of the l-th (1 ≤ l ≤ L) graph convolutional layer: to reduce the number of network parameters, the graph convolution operation of the l-th hidden layer is decomposed into two successive stages, feature transformation and graph convolution.
The feature transformation formula is:

$$Q^{l} = H^{l}\,(\Theta^{l})^{\mathsf T}$$

The graph convolution formula is:

$$H^{l+1} = \sigma\Big(\sum_{k=0}^{K} \theta_k\, T_k(\tilde{L})\, Q^{l}\Big)$$

where $Q^{l}$ is the vertex feature matrix of the l-th graph convolutional layer after feature transformation; $H^{l}$ is the input data of the l-th graph convolutional layer, and $H^{l+1}$ is its output data; $(\Theta^{l})^{\mathsf T}$ is the transpose of the feature transformation matrix to be learned by the l-th graph convolutional layer; $\sigma$ is a nonlinear activation function; $K \ll n$ is the order of the polynomial; $n$ is the number of vertices in the graph data set; $\theta_k$ are the coefficients of the polynomial; $T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)$, with $T_0 = 1$ and $T_1 = x$, are the Chebyshev polynomials; $L$ is the Laplacian matrix of the graph data set; and $\tilde{L} = 2L/\lambda_{\max} - I_n$ is the linearly transformed Laplacian matrix, where $\lambda_{\max}$ is the largest eigenvalue of $L$ and $I_n$ is the n*n identity matrix. $H^{1}$ is the vertex feature matrix X.
It should be noted that the graph convolution formula above is obtained by simplifying the spectral graph convolution

$$H^{l+1} = \sigma\big(U\, F^{l}\, U^{-1}\, Q^{l}\big)$$

The simplification proceeds as follows. Here U is the matrix composed of the eigenvectors obtained by the eigendecomposition of the Laplacian matrix $L$ of the graph G; $U^{-1}$ is the inverse of U; $\Lambda$ is the diagonal matrix of eigenvalues, whose diagonal elements are $\lambda_1, \lambda_2, \ldots, \lambda_n$; and $F^{l}$ is the graph convolution kernel matrix of the l-th graph convolutional layer, defined as the polynomial

$$F^{l} = \sum_{k=0}^{K} \theta_k\, \Lambda^{k}$$
It should be noted that K is the order of the polynomial and limits information to propagating at most K steps from each vertex, so only K+1 parameters are needed, which greatly reduces the complexity of the model training process. Since computing the convolution kernel matrix by the formula above involves the eigendecomposition of the graph Laplacian matrix, which is computationally expensive, this embodiment builds on it by designing an approximate computation scheme based on Chebyshev polynomials, approximating $F^{l}$ as:

$$F^{l} \approx \sum_{k=0}^{K} \theta_k\, T_k(\tilde{\Lambda})$$
where $T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)$, with $T_0 = 1$ and $T_1 = x$, are the Chebyshev polynomials, which can be evaluated by cyclic recursion, and $\tilde{\Lambda} = 2\Lambda/\lambda_{\max} - I_n$ is a diagonal matrix that maps the diagonal matrix of eigenvalues into [-1, 1].
Figure PCTCN2021134051-appb-000031
代入
Figure PCTCN2021134051-appb-000032
即可得到
Figure PCTCN2021134051-appb-000033
Will
Figure PCTCN2021134051-appb-000031
substitute
Figure PCTCN2021134051-appb-000032
available
Figure PCTCN2021134051-appb-000033
其中,
Figure PCTCN2021134051-appb-000034
输出层定义为
Figure PCTCN2021134051-appb-000035
Z是一个n*C维的矩阵,其每个列向量Z j表示所有顶点属于类别j的概率,即它的第k(1≤k≤n)个元素表示顶点k属于类别j(j=1,2,…,C)的概率。
in,
Figure PCTCN2021134051-appb-000034
The output layer is defined as
Figure PCTCN2021134051-appb-000035
Z is an n*C-dimensional matrix, and each column vector Z j represents the probability that all vertices belong to category j, that is, its kth element (1≤k≤n) indicates that vertex k belongs to category j (j=1 ,2,…,C) probability.
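The substitution relies on the identity $U\,T_k(\tilde{\Lambda})\,U^{-1} = T_k(\tilde{L})$, which is what lets the filter be applied without ever eigendecomposing the Laplacian. Below is a small numerical check of this identity (Python/NumPy; our own illustrative construction, not from the patent):

```python
import numpy as np

def cheb(k, M):
    """Chebyshev polynomial T_k evaluated on a square matrix M."""
    if k == 0:
        return np.eye(M.shape[0])
    Tp, T = np.eye(M.shape[0]), M
    for _ in range(k - 1):
        Tp, T = T, 2 * M @ T - Tp  # T_k = 2 M T_{k-1} - T_{k-2}
    return T

rng = np.random.default_rng(0)
S = rng.standard_normal((5, 5))
L = S @ S.T                     # random PSD stand-in for a Laplacian
lam, U = np.linalg.eigh(L)      # L = U diag(lam) U^T, so U^{-1} = U^T
L_t = 2 * L / lam.max() - np.eye(5)        # rescaled Laplacian L~
Lam_t = np.diag(2 * lam / lam.max() - 1)   # rescaled eigenvalues
lhs = U @ cheb(3, Lam_t) @ U.T  # spectral side: U T_3(Lam~) U^{-1}
rhs = cheb(3, L_t)              # vertex-domain side: T_3(L~)
assert np.allclose(lhs, rhs)
```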
2. Loss function.

The loss function of the dual Chebyshev graph convolutional neural network consists of two parts: the supervised learning loss $ls_S$ on labeled vertices and the unsupervised learning loss $ls_U$ on unlabeled vertices.
ChebyNet_A takes the adjacency matrix A and the vertex feature matrix X as input, performs supervised learning, and compares its vertex label prediction result Z_A with the known vertex label matrix Y to compute the supervised learning loss. ChebyNet_P takes the positive pointwise mutual information matrix and the vertex feature matrix X as input, performs unsupervised learning, and compares its prediction result Z_P with ChebyNet_A's prediction result Z_A to compute the unsupervised learning loss. Accordingly, the loss function of the dual Chebyshev graph convolutional neural network can be expressed as:

$$ls = ls_S + \alpha\, ls_U, \qquad ls_S = -\sum_{v_i \in V_L}\sum_{j=1}^{C} Y_{ij}\,\log Z^{A}_{ij}, \qquad ls_U = \sum_{i=1}^{n}\sum_{j=1}^{C}\big(Z^{P}_{ij} - Z^{A}_{ij}\big)^{2}$$

where $\alpha$ is a constant that adjusts the proportion of the unsupervised learning loss in the overall loss function.
The supervised learning loss function, based on the cross-entropy principle, measures the degree of difference between the actual label probability distribution and the predicted label probability distribution of the vertices; the unsupervised learning loss function computes the sum of squared differences between elements of Z_P and Z_A having the same coordinates.
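A minimal sketch of this combined loss (PyTorch-style Python; the function name `dual_loss` and the boolean-mask convention for labeled vertices are our own assumptions):

```python
import torch

def dual_loss(Z_A, Z_P, Y, labeled_mask, alpha=1.0):
    """ls = ls_S + alpha * ls_U for the dual ChebyNet.

    Z_A, Z_P: n x C class-probability matrices from ChebyNet_A / ChebyNet_P.
    Y: n x C one-hot label matrix; labeled_mask: boolean mask of V_L.
    """
    eps = 1e-12  # numerical guard for log(0)
    # supervised cross-entropy over the labeled vertices only
    ls_S = -(Y[labeled_mask] * torch.log(Z_A[labeled_mask] + eps)).sum()
    # unsupervised loss: squared differences of same-coordinate elements
    ls_U = ((Z_P - Z_A) ** 2).sum()
    return ls_S + alpha * ls_U
```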
3. Initialization strategy.
The initialization strategy for the network parameters may be random initialization from a normal distribution, Xavier initialization, He initialization, or the like. The network parameters include the feature transformation matrices $\Theta_l$ and the convolution kernels $F_l$.
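For instance, a minimal sketch assuming PyTorch, where `theta` stands for one layer's feature transformation matrix and the dimensions are illustrative:

```python
import torch
import torch.nn as nn

in_dim, out_dim = 128, 64                        # illustrative layer sizes
theta = nn.Parameter(torch.empty(out_dim, in_dim))
nn.init.xavier_uniform_(theta)                   # Xavier; alternatives: nn.init.normal_
                                                 # (normal distribution) or nn.init.kaiming_uniform_ (He)
```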
4. Network parameter update method.
The network parameters can be corrected and updated according to stochastic gradient descent (SGD), momentum gradient descent (MGD), Nesterov momentum, AdaGrad, RMSprop, Adam (adaptive moment estimation), batch gradient descent (BGD), or the like, so as to optimize the loss function value.
After the network structure, loss function, initialization strategy, and network parameter update method are determined, the dual Chebyshev graph convolutional neural network can be trained as shown in Figure 5. Specifically: for the graph dataset G, construct the vertex feature matrix X, the positive pointwise mutual information matrix P encoding the graph's global consistency information, the adjacency matrix A encoding the graph's local consistency information, and the vertex label matrix Y; input X and A into ChebyNet_A and input P and X into ChebyNet_P, and update the network parameters according to the loss function above to train ChebyNet_A and ChebyNet_P. When the loss function value reaches a specified small value or the number of iterations reaches a specified maximum, training ends and the dual Chebyshev graph convolutional neural network is obtained. At that point, for any unlabeled vertex i∈V_U, the category j to which it should belong can be read from the vertex label matrix Y.
During training, according to the definition of the graph convolutional layer and the feature matrix input to that layer, the output feature matrix of each layer is computed; according to the definition of the output layer, the probability $Z_j$ (1≤j≤C) that each vertex belongs to each category j is predicted, and the loss function value is computed from the loss function defined above; for each unlabeled vertex $v_i\in V_U$, the category with the highest probability is taken as that vertex's latest category, and the vertex label matrix Y is updated accordingly.
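Putting these pieces together, a hedged end-to-end sketch of this training loop, assuming PyTorch, the `cheb_conv` and `dual_loss` helpers sketched above, and hypothetical wrapper models `cheby_net_A` and `cheby_net_P`; X, A, P, Y, `labeled_mask`, `alpha`, `max_iters`, and `tol` are assumed to come from the construction step described above:

```python
import torch

params = list(cheby_net_A.parameters()) + list(cheby_net_P.parameters())
optimizer = torch.optim.Adam(params, lr=1e-2)   # any of the listed optimizers would do

for _ in range(max_iters):
    Z_A = cheby_net_A(X, A)                     # supervised branch: features + adjacency
    Z_P = cheby_net_P(X, P)                     # unsupervised branch: features + PPMI matrix
    loss = dual_loss(Z_A, Z_P, Y, labeled_mask, alpha)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # For unlabeled vertices, take the most probable class and refresh the label matrix Y
    with torch.no_grad():
        pred = Z_A.argmax(dim=1)
        Y[~labeled_mask] = torch.nn.functional.one_hot(
            pred[~labeled_mask], num_classes=Y.shape[1]).float()

    if loss.item() < tol:                       # loss reached the specified small value
        break
```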
In this scheme, the dual Chebyshev graph convolutional neural network consists of two Chebyshev graph convolutional neural networks with the same structure and shared parameters, which perform supervised and unsupervised learning respectively; this improves the network's convergence rate and prediction accuracy. At the same time, the graph convolutional layer is defined via the graph Fourier transform, splitting the graph convolution operation into a feature transformation stage and a graph convolution stage, which reduces the number of network parameters. Based on spectral graph theory, the graph convolution kernel is defined as a polynomial kernel, ensuring the locality of the graph convolution computation; and to reduce computational complexity, the graph convolution is approximated with Chebyshev polynomials.
It can be seen that this embodiment provides a training method for a dual Chebyshev graph convolutional neural network that can solve the vertex classification problem. First, the collected dataset is modeled as a graph to obtain its adjacency matrix and vertex feature matrix. Based on the adjacency matrix, a random walk of a specific length is carried out from each vertex, and the resulting walk sequences are sampled to obtain the positive pointwise mutual information matrix, which characterizes each vertex's context information. The convolution operation is then defined according to spectral graph theory, graph convolutional layers for feature extraction and an output layer for the vertex classification task are constructed, and the Chebyshev graph convolutional neural network is built and trained. When training ends, classification predictions for the unlabeled vertices in the graph are available.
Compared with a classification system containing only a single graph convolutional neural network, this method, by adopting a dual graph convolutional neural network design strategy, can learn more graph topology information, including each vertex's local consistency and global consistency information, greatly improving the model's learning ability. Moreover, by exploiting both the graph topology and the vertices' attribute features, and combining supervised and unsupervised learning, it effectively improves classification accuracy. By approximating the graph convolution with Chebyshev polynomials, it avoids the computationally expensive matrix eigendecomposition, which effectively reduces the network's computational complexity and improves its classification efficiency.
A model training apparatus provided by an embodiment of the present application is introduced below; the model training apparatus described below and the model training method described above may be cross-referenced.
Referring to Figure 6, an embodiment of the present application discloses a model training apparatus, including:
an obtaining module 601, configured to obtain a vertex feature matrix, an adjacency matrix, and a label matrix constructed based on a graph dataset;
a sampling module 602, configured to perform random walks and sampling based on the adjacency matrix to obtain a positive pointwise mutual information matrix;
a first training module 603, configured to input the vertex feature matrix and the adjacency matrix into a first Chebyshev graph convolutional neural network to output a first training result;
a second training module 604, configured to input the vertex feature matrix and the positive pointwise mutual information matrix into a second Chebyshev graph convolutional neural network to output a second training result;
a first calculation module 605, configured to calculate a first loss value between the first training result and the label matrix;
a second calculation module 606, configured to calculate a second loss value between the second training result and the first training result;
a determination module 607, configured to determine a target loss value based on the first loss value and the second loss value; and
a combination module 608, configured to combine the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network into a dual vertex classification model if the target loss value meets a preset convergence condition.
In a specific implementation, the sampling module is specifically configured to:
perform, based on the adjacency matrix, a random walk of a preset length from each vertex in the graph dataset to obtain a context path of each vertex;
randomly sample all context paths to determine the co-occurrence count of any two vertices, and construct a vertex co-occurrence count matrix; and
calculate, based on the vertex co-occurrence count matrix, the vertex–context co-occurrence probabilities and the corresponding marginal probabilities, and determine each element of the positive pointwise mutual information matrix.
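A hedged sketch of this procedure, assuming NumPy and a dense binary adjacency matrix; the walk length, number of walks per vertex, window size, and function name are illustrative assumptions:

```python
import numpy as np

def ppmi_matrix(A, walk_len=10, walks_per_vertex=20, window=2, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    counts = np.zeros((n, n))
    for v in range(n):                          # random walks from every vertex
        for _ in range(walks_per_vertex):
            path, cur = [v], v
            for _ in range(walk_len - 1):
                nbrs = np.flatnonzero(A[cur])
                if nbrs.size == 0:
                    break
                cur = rng.choice(nbrs)
                path.append(cur)
            for i, u in enumerate(path):        # co-occurrences within a context window
                for w in path[max(0, i - window): i + window + 1]:
                    counts[u, w] += 1           # (the window includes the vertex itself)
    total = counts.sum()
    p_uw = counts / total                                   # joint vertex-context probability
    p_u = counts.sum(axis=1, keepdims=True) / total         # marginal probabilities
    p_w = counts.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.where(p_uw > 0, np.log(p_uw / (p_u * p_w)), 0.0)
    return np.maximum(pmi, 0.0)                             # keep only positive PMI
```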
In a specific implementation, the first calculation module is specifically configured to:
take, based on the cross-entropy principle, the degree of difference between the probability distributions of the first training result and the label matrix as the first loss value.
In a specific implementation, the second calculation module is specifically configured to:
calculate the differences between elements with the same coordinates in the second training result and the first training result, and take the sum of squares of all the differences as the second loss value.
In a specific implementation, the determination module is specifically configured to:
input the first loss value and the second loss value into a loss function to output the target loss value;
where the loss function is $ls = ls_S + \alpha\, ls_U$, in which ls is the target loss value, $ls_S$ is the first loss value, $ls_U$ is the second loss value, and α is a constant that adjusts the proportion of the second loss value in the target loss value.
In a specific implementation, if the target loss value does not meet the preset convergence condition, the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are updated according to the target loss value, and the updated first Chebyshev graph convolutional neural network and the updated second Chebyshev graph convolutional neural network are trained iteratively until the target loss value meets the preset convergence condition;
where updating the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network according to the target loss value includes:
after updating the network parameters of the first Chebyshev graph convolutional neural network according to the target loss value, sharing the updated network parameters with the second Chebyshev graph convolutional neural network;
or
after updating the network parameters of the second Chebyshev graph convolutional neural network according to the target loss value, sharing the updated network parameters with the first Chebyshev graph convolutional neural network;
or
after new network parameters are computed according to the target loss value, sharing the new network parameters with the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network.
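One possible reading of the parameter-sharing step, as a minimal sketch assuming two PyTorch modules of identical structure (the function name is hypothetical):

```python
def share_parameters(src_net, dst_net):
    # Copy the just-updated parameters of one Chebyshev network into its dual
    dst_net.load_state_dict(src_net.state_dict())
```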
In a specific implementation, the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network each include L graph convolutional layers, and the L graph convolutional layers are used to perform feature transformation and graph convolution operations on the input data;
where the feature transformation formula of the l-th (1≤l≤L) graph convolutional layer is:

$Q_l = H_l\,\Theta_l^\top$

and the graph convolution operation formula of the l-th (1≤l≤L) graph convolutional layer is:

$H_{l+1} = \sigma\!\left(\sum_{k=0}^{K}\theta_k\,T_k(\tilde L)\,Q_l\right)$

where $Q_l$ is the vertex feature matrix of the l-th graph convolutional layer after feature transformation; $H_l$ is the input data of the l-th graph convolutional layer and $H_{l+1}$ is its output data; $\Theta_l^\top$ is the transpose of the feature transformation matrix to be learned by the l-th graph convolutional layer; σ is a nonlinear activation function; K≪n is the order of the polynomial; n is the number of vertices in the graph dataset; $\theta_k$ are the polynomial coefficients; $T_k(x)=2xT_{k-1}(x)-T_{k-2}(x)$, with $T_0(x)=1$ and $T_1(x)=x$, are the Chebyshev polynomials; L is the Laplacian matrix of the graph dataset; and $\tilde L$ is the Laplacian matrix after linear transformation.
For more specific working processes of the modules and units in this embodiment, reference may be made to the corresponding content disclosed in the foregoing embodiments, which will not be repeated here.
It can be seen that this embodiment provides a model training apparatus that can give full play to the respective advantages of supervised and unsupervised training, improving the performance of the vertex classification model.
A model training device provided by an embodiment of the present application is introduced below; the model training device described below and the model training method and apparatus described above may be cross-referenced.
Referring to Figure 7, an embodiment of the present application discloses a model training device, including:
a memory 701, configured to store a computer program; and
a processor 702, configured to execute the computer program to implement the method disclosed in any of the foregoing embodiments.
A readable storage medium provided by an embodiment of the present application is introduced below; the readable storage medium described below and the model training method, apparatus, and device described above may be cross-referenced.
A readable storage medium is configured to store a computer program, where the computer program, when executed by a processor, implements the model training method disclosed in the foregoing embodiments. For the specific steps of the method, reference may be made to the corresponding content disclosed in the foregoing embodiments, which will not be repeated here.
Terms such as "first", "second", "third", and "fourth" in this application, if any, are used to distinguish similar objects and need not describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described here. Furthermore, the terms "comprising" and "having", and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, or device comprising a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to the process, method, or device.
It should be noted that descriptions involving "first", "second", and the like in this application are for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments can be combined with one another, but only on the basis that they can be realized by a person of ordinary skill in the art; when a combination of technical solutions is contradictory or cannot be realized, such a combination should be considered not to exist and not to fall within the scope of protection claimed by this application.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in software modules executed by a processor, or in a combination of the two. Software modules may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of readable storage medium known in the art.
Specific examples are used herein to explain the principles and implementations of this application. The description of the above embodiments is only intended to help in understanding the method of this application and its core idea; meanwhile, for a person of ordinary skill in the art, there will be changes in the specific implementation and scope of application in accordance with the idea of this application. In summary, the contents of this specification should not be construed as limiting this application.

Claims (10)

  1. A model training method, characterized in that it comprises:
    obtaining a vertex feature matrix, an adjacency matrix, and a label matrix constructed based on a graph dataset;
    performing random walks and sampling based on the adjacency matrix to obtain a positive pointwise mutual information matrix;
    inputting the vertex feature matrix and the adjacency matrix into a first Chebyshev graph convolutional neural network to output a first training result;
    inputting the vertex feature matrix and the positive pointwise mutual information matrix into a second Chebyshev graph convolutional neural network to output a second training result;
    calculating a first loss value between the first training result and the label matrix;
    calculating a second loss value between the second training result and the first training result;
    determining a target loss value based on the first loss value and the second loss value; and
    if the target loss value meets a preset convergence condition, combining the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network into a dual vertex classification model.
  2. The model training method according to claim 1, characterized in that performing random walks and sampling based on the adjacency matrix to obtain the positive pointwise mutual information matrix comprises:
    performing, based on the adjacency matrix, a random walk of a preset length from each vertex in the graph dataset to obtain a context path of each vertex;
    randomly sampling all context paths to determine the co-occurrence count of any two vertices, and constructing a vertex co-occurrence count matrix; and
    calculating, based on the vertex co-occurrence count matrix, the vertex–context co-occurrence probabilities and the corresponding marginal probabilities, and determining each element of the positive pointwise mutual information matrix.
  3. The model training method according to claim 1, characterized in that calculating the first loss value between the first training result and the label matrix comprises:
    taking, based on the cross-entropy principle, the degree of difference between the probability distributions of the first training result and the label matrix as the first loss value.
  4. The model training method according to claim 1, characterized in that calculating the second loss value between the second training result and the first training result comprises:
    calculating the differences between elements with the same coordinates in the second training result and the first training result, and taking the sum of squares of all the differences as the second loss value.
  5. The model training method according to claim 1, characterized in that determining the target loss value based on the first loss value and the second loss value comprises:
    inputting the first loss value and the second loss value into a loss function to output the target loss value;
    where the loss function is $ls = ls_S + \alpha\, ls_U$, in which ls is the target loss value, $ls_S$ is the first loss value, $ls_U$ is the second loss value, and α is a constant that adjusts the proportion of the second loss value in the target loss value.
  6. The model training method according to any one of claims 1 to 5, characterized in that:
    if the target loss value does not meet the preset convergence condition, the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network are updated according to the target loss value, and the updated first Chebyshev graph convolutional neural network and the updated second Chebyshev graph convolutional neural network are trained iteratively until the target loss value meets the preset convergence condition;
    where updating the network parameters of the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network according to the target loss value comprises:
    after updating the network parameters of the first Chebyshev graph convolutional neural network according to the target loss value, sharing the updated network parameters with the second Chebyshev graph convolutional neural network;
    or
    after updating the network parameters of the second Chebyshev graph convolutional neural network according to the target loss value, sharing the updated network parameters with the first Chebyshev graph convolutional neural network;
    or
    after new network parameters are computed according to the target loss value, sharing the new network parameters with the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network.
  7. The model training method according to any one of claims 1 to 5, characterized in that the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network each comprise L graph convolutional layers, the L graph convolutional layers being used to perform feature transformation and graph convolution operations on input data;
    where the feature transformation formula of the l-th (1≤l≤L) graph convolutional layer is:

    $Q_l = H_l\,\Theta_l^\top$

    and the graph convolution operation formula of the l-th (1≤l≤L) graph convolutional layer is:

    $H_{l+1} = \sigma\!\left(\sum_{k=0}^{K}\theta_k\,T_k(\tilde L)\,Q_l\right)$

    where $Q_l$ is the vertex feature matrix of the l-th graph convolutional layer after feature transformation; $H_l$ is the input data of the l-th graph convolutional layer and $H_{l+1}$ is its output data; $\Theta_l^\top$ is the transpose of the feature transformation matrix to be learned by the l-th graph convolutional layer; σ is a nonlinear activation function; K≪n is the order of the polynomial; n is the number of vertices in the graph dataset; $\theta_k$ are the polynomial coefficients; $T_k(x)=2xT_{k-1}(x)-T_{k-2}(x)$, with $T_0(x)=1$ and $T_1(x)=x$, are the Chebyshev polynomials; L is the Laplacian matrix of the graph dataset; and $\tilde L$ is the Laplacian matrix after linear transformation.
  8. A model training apparatus, characterized in that it comprises:
    an obtaining module, configured to obtain a vertex feature matrix, an adjacency matrix, and a label matrix constructed based on a graph dataset;
    a sampling module, configured to perform random walks and sampling based on the adjacency matrix to obtain a positive pointwise mutual information matrix;
    a first training module, configured to input the vertex feature matrix and the adjacency matrix into a first Chebyshev graph convolutional neural network to output a first training result;
    a second training module, configured to input the vertex feature matrix and the positive pointwise mutual information matrix into a second Chebyshev graph convolutional neural network to output a second training result;
    a first calculation module, configured to calculate a first loss value between the first training result and the label matrix;
    a second calculation module, configured to calculate a second loss value between the second training result and the first training result;
    a determination module, configured to determine a target loss value based on the first loss value and the second loss value; and
    a combination module, configured to combine the first Chebyshev graph convolutional neural network and the second Chebyshev graph convolutional neural network into a dual vertex classification model if the target loss value meets a preset convergence condition.
  9. A model training device, characterized in that it comprises:
    a memory, configured to store a computer program; and
    a processor, configured to execute the computer program to implement the model training method according to any one of claims 1 to 7.
  10. A readable storage medium, characterized in that it is configured to store a computer program, where the computer program, when executed by a processor, implements the model training method according to any one of claims 1 to 7.