WO2022252458A1 - Classification model training method and apparatus, device, and medium - Google Patents

Classification model training method and apparatus, device, and medium Download PDF

Info

Publication number
WO2022252458A1
Authority
WO
WIPO (PCT)
Prior art keywords
graph
training
vertex
classification model
matrix
Prior art date
Application number
PCT/CN2021/121905
Other languages
French (fr)
Chinese (zh)
Inventor
胡克坤
董刚
赵雅倩
刘海威
徐哲
Original Assignee
苏州浪潮智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2022252458A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Definitions

  • the present application relates to the technical field of classifiers, and in particular to a classification model training method, device, equipment and medium.
  • a graph neural network usually consists of an input layer, one or more hidden layers, and an output layer.
  • Fig. 1 is a graph neural network structure diagram from the prior art; it shows a typical graph convolutional neural network consisting of an input layer (Input layer), two graph convolution layers (Gconv layers), and an output layer (Output layer).
  • the input layer reads the n*d-dimensional vertex feature matrix
  • the graph convolution layer performs feature extraction on the vertex feature matrix, and the result is passed to the next graph convolution layer after a nonlinear activation function such as ReLU.
  • finally, the output layer, i.e. the task layer, completes specific tasks such as vertex classification or clustering.
  • Figure 1 shows a vertex classification task layer that outputs the category label of each vertex. At present, how to improve the classification accuracy is a problem that needs to be solved.
  • the purpose of the present application is to provide a classification model training method, device, equipment and medium, which can improve the classification accuracy of the classification model.
  • the specific plan is as follows:
  • the present application discloses a classification model training method, including:
  • the vertex label matrix includes label information for each vertex of the graph data set
  • the vertex feature matrix, the adjacency matrix and the vertex label matrix are input to the Teacher graph wavelet neural network in the classification model to carry out supervised training, and determine the corresponding supervised training loss in the training process;
  • the vertex feature matrix and the adjacency matrix are input to the Student graph wavelet neural network in the classification model to carry out unsupervised training, and determine the corresponding unsupervised training loss in the training process;
  • the current classification model is output to obtain the trained classification model.
  • determining the corresponding supervised training loss in the training process includes:
  • the corresponding supervised training loss is determined based on the first vertex label prediction result of the Teacher graph wavelet neural network and the vertex label matrix;
  • the corresponding unsupervised training loss is determined in the training process, including:
  • a corresponding unsupervised training loss is determined based on the second vertex label prediction result of the Student graph wavelet neural network and the first vertex label prediction result.
  • the current vertex label matrix is output to obtain the category prediction result of each vertex without a category label.
  • the method also includes:
  • the Teacher graph wavelet neural network and the Student graph wavelet neural network perform graph convolution operations based on the graph wavelet transform base and the graph wavelet inverse transform base.
  • the method also includes:
  • the calculation formula is a formula defined based on spectral theory.
  • both the Teacher graph wavelet neural network and the Student graph wavelet neural network include an input layer, several graph convolution layers, and an output layer;
  • the graph convolution layer is used to sequentially perform feature transformation and graph convolution operation processing on the input data of the layer during the training process.
  • the method also includes:
  • the convolution kernel of the graph convolution layer obtained through the training of the Teacher graph wavelet neural network is used to determine the convolution kernel of the corresponding graph convolution layer in the Student graph wavelet neural network based on the attention mechanism.
  • a classification model training device including:
  • a training data construction module configured to construct a vertex feature matrix, an adjacency matrix, and a vertex label matrix based on a graph data set; wherein, the vertex label matrix includes label information for each vertex of the graph data set;
  • the classification model training module is used to input the vertex feature matrix, the adjacency matrix, and the vertex label matrix into the Teacher graph wavelet neural network in the classification model for supervised training, and to determine the corresponding supervised training loss during training; to input the vertex feature matrix and the adjacency matrix into the Student graph wavelet neural network in the classification model for unsupervised training, and to determine the corresponding unsupervised training loss during training; to determine the target training loss based on the supervised training loss and the unsupervised training loss; and, when the target training loss converges, to output the current classification model to obtain the trained classification model.
  • an electronic device comprising:
  • a processor configured to execute the computer program, so as to realize the aforementioned classification model training method.
  • the present application discloses a computer-readable storage medium for storing a computer program, and when the computer program is executed by a processor, the aforementioned classification model training method is implemented.
  • the present application first constructs a vertex feature matrix, an adjacency matrix, and a vertex label matrix based on a graph data set, where the vertex label matrix includes the label information of each vertex of the graph data set; then inputs the vertex feature matrix, the adjacency matrix, and the vertex label matrix into the Teacher graph wavelet neural network in the classification model for supervised training, determining the corresponding supervised training loss during training; inputs the vertex feature matrix and the adjacency matrix into the Student graph wavelet neural network in the classification model for unsupervised training, determining the corresponding unsupervised training loss during training; determines the target training loss based on the supervised training loss and the unsupervised training loss; and, when the target training loss converges, outputs the current classification model to obtain the trained classification model.
  • in this way, the vertex feature matrix and adjacency matrix of the graph data set are input into the graph neural network for training, making use of both the graph topology and the vertex features.
  • supervised training and unsupervised training are combined so that the respective advantages of each come into full play, which can improve the classification accuracy of the classification model.
  • Fig. 1 is a graph neural network structure diagram from the prior art
  • Fig. 2 is a flow chart of a classification model training method disclosed in the present application
  • Fig. 3 is a flow chart of a specific classification model training method disclosed in the present application
  • Fig. 4 is a structure diagram of a classification model disclosed in the present application
  • Fig. 5 is a structure diagram of a specific classification model disclosed in the present application
  • Fig. 6 is a flow chart of a specific classification model training method disclosed in the present application
  • Fig. 7 is a schematic structural diagram of a classification model training device disclosed in the present application
  • Fig. 8 is a structure diagram of an electronic device disclosed in the present application
  • the embodiment of the present application discloses a classification model training method, including:
  • Step S11 Construct a vertex feature matrix, an adjacency matrix, and a vertex label matrix based on the graph dataset; wherein, the vertex label matrix includes label information for each vertex of the graph dataset;
  • the label information indicates a corresponding category label or no category label.
  • V represents a vertex set
  • E represents the set of connected edges.
  • each vertex v of G has d features, and the features of all vertices constitute an n*d-dimensional vertex feature matrix, which is denoted as X.
  • the adjacency matrix of G is denoted as A, and the element A_ij represents the weight of the edge between vertices i and j.
  • n = |V| represents the number of vertices in the graph, C represents the number of label categories, and the matrix element Y_ij indicates whether the category label of vertex i is j (j = 1, 2, …, C): when vertex i already has a category label, the corresponding j-th column element of its row is set to 1 and the other column elements to 0, that is, Y_ij = 1 if the category label of vertex i is j, and Y_ij = 0 otherwise; when vertex i has no category label, every column element of its row is set to 0.
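  • as an illustration, the three matrices can be constructed as in the following sketch (Python/NumPy; the function and variable names are illustrative, not from the application):

```python
import numpy as np

def build_training_matrices(n, d, C, edges, features, labels):
    # n: number of vertices, d: features per vertex, C: number of categories.
    # `edges` is an iterable of (i, j, weight) tuples; `labels` maps each
    # labeled vertex index to its category in {0, ..., C-1}; unlabeled
    # vertices are simply absent from it.
    X = np.asarray(features, dtype=np.float32).reshape(n, d)  # vertex feature matrix
    A = np.zeros((n, n), dtype=np.float32)                    # weighted adjacency matrix
    for i, j, w in edges:
        A[i, j] = A[j, i] = w                                 # A_ij: weight of edge (i, j)
    Y = np.zeros((n, C), dtype=np.float32)                    # vertex label matrix
    for i, c in labels.items():
        Y[i, c] = 1.0          # one-hot row for labeled vertices; all-zero row otherwise
    return X, A, Y
```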
  • Step S12 Input the vertex feature matrix, the adjacency matrix and the vertex label matrix into the Teacher graph wavelet neural network in the classification model for supervised training, and determine the corresponding supervised training loss during the training process.
  • Step S13 Input the vertex feature matrix and the adjacency matrix into the Student graph wavelet neural network in the classification model for unsupervised training, and determine the corresponding unsupervised training loss during the training process.
  • the corresponding supervised training loss is determined based on the first vertex label prediction result of the Teacher graph wavelet neural network and the vertex label matrix; the corresponding unsupervised training loss is determined based on the second vertex label prediction result of the Student graph wavelet neural network and the first vertex label prediction result.
  • the first vertex label prediction result is compared with the vertex label matrix to calculate a supervised training loss
  • the second vertex label prediction result is compared with the first vertex label prediction result to calculate an unsupervised learning loss
  • Step S14 Determine a target training loss based on the supervised training loss and the unsupervised training loss.
  • the target training loss is calculated as ls = ls_T + α · ls_S, where:
  • ls_T represents the supervised training loss
  • ls_S represents the unsupervised training loss
  • α is a constant used to adjust the proportion of the unsupervised training loss in the target loss
  • Z_T represents the first vertex label prediction result
  • Z_S represents the second vertex label prediction result.
  • the output layer of the Teacher graph wavelet neural network and of the Student graph wavelet neural network can be defined as Z = softmax(ψ_r F_L ψ_r⁻¹ Q_L), where ψ_r is the graph wavelet transform basis, ψ_r⁻¹ is the graph wavelet inverse transform basis, F_L represents the convolution kernel matrix of the L-th graph convolution layer, and Q_L represents the vertex feature transformation result of the L-th layer; the Teacher graph wavelet neural network and the Student graph wavelet neural network each include L graph convolution layers.
  • the supervised training loss function calculates, based on the cross-entropy principle, the degree of difference between the actual and predicted label probability distributions of the vertices; the unsupervised training loss function calculates the sum of squared differences between the same-coordinate elements of Z_T and Z_S.
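  • a minimal sketch consistent with the description above (cross entropy for the supervised part, a sum of squared differences for the unsupervised part, weighted by α); the function name and the use of all-zero rows of Y to mark unlabeled vertices are illustrative:

```python
import torch

def target_training_loss(Z_T, Z_S, Y, alpha):
    # Z_T, Z_S: n*C matrices of predicted label probabilities (Teacher, Student).
    # Y: n*C vertex label matrix; all-zero rows mark unlabeled vertices.
    labeled = Y.sum(dim=1) > 0
    # supervised loss ls_T: cross entropy between the actual and predicted
    # label probability distributions of the labeled vertices
    ls_T = -(Y[labeled] * torch.log(Z_T[labeled] + 1e-12)).sum(dim=1).mean()
    # unsupervised loss ls_S: sum of squared differences between the
    # same-coordinate elements of Z_T and Z_S
    ls_S = ((Z_T - Z_S) ** 2).sum()
    return ls_T + alpha * ls_S
```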
  • when training of the whole network ends, the output results Z_T and Z_S of the two networks are consistent, or their difference is negligible.
  • the output Z_T of the Teacher graph wavelet neural network can be used as the output of the entire network model.
  • during training, the vertex label matrix is updated using the first vertex label prediction result; specifically, for each vertex without a category label, i.e. for v_i ∈ V_U, the category with the highest probability in the first vertex label prediction result is taken as the latest category of that vertex, and the vertex label matrix is updated accordingly.
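  • a sketch of this update (PyTorch; names are illustrative):

```python
import torch

def update_label_matrix(Y, Z_T, unlabeled):
    # For each vertex in V_U (boolean mask `unlabeled`), take the category
    # with the highest probability in the Teacher prediction Z_T as the
    # vertex's latest category and refresh its one-hot row in Y.
    Y = Y.clone()
    best = Z_T.argmax(dim=1)
    Y[unlabeled] = 0.0
    Y[unlabeled, best[unlabeled]] = 1.0
    return Y
```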
  • Step S15 When the target training loss converges, output the current classification model to obtain the post-training classification model.
  • the current vertex label matrix is output to obtain the category prediction result of each vertex without a category label.
  • when the target training loss reaches a preset threshold or the number of iterations reaches a specified maximum, the target training loss is considered converged and the training ends.
  • the preset threshold is usually a small value; at this point, for a vertex without a category label, the category to which it belongs is obtained from the current vertex label matrix.
  • this application integrates the prediction of unlabeled vertices into the training process: during the training process, the vertex label matrix is updated according to each training result, and the category label of any unlabeled vertex can be obtained after the training is completed.
  • the network parameters of each layer of the graph wavelet neural network may first be initialized according to a specific strategy, such as random initialization with a normal distribution, Xavier initialization, or He initialization.
  • a specific strategy such as SGD (stochastic gradient descent), MGD (momentum gradient descent), Nesterov momentum, AdaGrad (adaptive gradient), RMSProp (root mean square propagation), Adam (adaptive moment estimation), or BGD (batch gradient descent) may then be used to correct and update the network parameters of each layer of the graph wavelet neural network, so as to optimize the value of the loss function.
  • to sum up, a vertex feature matrix, an adjacency matrix, and a vertex label matrix are constructed based on the graph data set, where the vertex label matrix includes the label information of each vertex of the graph data set; the vertex feature matrix, the adjacency matrix, and the vertex label matrix are then input into the Teacher graph wavelet neural network in the classification model for supervised training, and the corresponding supervised training loss is determined during training; the vertex feature matrix and the adjacency matrix are input into the Student graph wavelet neural network in the classification model for unsupervised training, and the corresponding unsupervised training loss is determined during training; the target training loss is determined based on the supervised training loss and the unsupervised training loss; when the target training loss converges, the current classification model is output to obtain the trained classification model.
  • in this way, the vertex feature matrix and adjacency matrix of the graph data set are input into the graph neural network for training, making use of both the graph topology and the vertex features.
  • supervised training and unsupervised training are combined so that the respective advantages of each come into full play, which can improve the classification accuracy of the classification model.
  • the embodiment of the present application discloses a specific classification model training method, including:
  • Step S21 Obtain the calculation formula of the graph wavelet transform basis.
  • the calculation formula is a formula defined based on spectral theory.
  • Step S22 Calculate the graph wavelet transform basis and the graph wavelet inverse transform basis of the graph data set by using Chebyshev polynomials.
  • ψ_r = U H_r U^T, where H_r = diag(h(rλ_1), h(rλ_2), …, h(rλ_n)) is the scaling matrix with scaling scale r, and U and λ_1, …, λ_n are the eigenvector matrix and eigenvalues obtained by eigendecomposing the Laplacian matrix of graph G; the graph wavelet inverse transform basis ψ_r⁻¹ can be obtained by replacing h(rλ_i) in ψ_r with h(-rλ_i).
  • the Teacher graph wavelet neural network and the Student graph wavelet neural network perform graph convolution operations based on the graph wavelet transform base and the graph wavelet inverse transform base.
  • in the prior art, the graph Fourier transform is inefficient in graph convolution operations because the eigenvector matrix of the Laplacian matrix is dense; this embodiment instead performs graph convolution operations based on the graph wavelet transform basis and the graph wavelet inverse transform basis.
  • the graph wavelet transform base and graph wavelet inverse transform base are sparse, so the computational efficiency of graph convolution operations can be improved.
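  • as an illustration, the following sketch (Python/NumPy) approximates the wavelet basis with a truncated Chebyshev expansion, assuming the common heat-kernel scaling function h(x) = e^(-rx); the order K, λ_max = 2, and all names are illustrative assumptions:

```python
import numpy as np

def chebyshev_wavelet_basis(L, r, K=3, lam_max=2.0):
    # Approximate psi_r, whose spectral definition applies the scaling
    # function h to the Laplacian eigenvalues, by a degree-K Chebyshev
    # expansion of h(x) = exp(-r*x); no eigendecomposition is formed.
    # Passing -r instead of r yields the inverse-transform basis.
    n = L.shape[0]
    Ls = (2.0 / lam_max) * L - np.eye(n)        # rescale spectrum to [-1, 1]
    # Chebyshev coefficients of h over [0, lam_max] at the Chebyshev nodes
    x = np.cos(np.pi * (np.arange(K + 1) + 0.5) / (K + 1))
    fx = np.exp(-r * 0.5 * lam_max * (x + 1.0))
    c = [2.0 / (K + 1) * np.sum(fx * np.cos(k * np.arccos(x))) for k in range(K + 1)]
    T_prev, T_curr = np.eye(n), Ls              # T_0 and T_1 of the recurrence
    psi = 0.5 * c[0] * T_prev + c[1] * T_curr
    for k in range(2, K + 1):
        T_prev, T_curr = T_curr, 2.0 * Ls @ T_curr - T_prev
        psi += c[k] * T_curr
    return psi
```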
  • Step S23 Construct a vertex feature matrix, an adjacency matrix, and a vertex label matrix based on the graph data set; wherein, the vertex label matrix includes label information of each vertex of the graph data set, and the label information represents the corresponding category label or none category label.
  • Step S24 Input the vertex feature matrix, the adjacency matrix and the vertex label matrix to the Teacher graph wavelet neural network in the classification model for supervised training, and determine the corresponding supervised training loss during the training process.
  • Step S25 Input the vertex feature matrix and the adjacency matrix into the Student graph wavelet neural network in the classification model for unsupervised training, and determine the corresponding unsupervised training loss during the training process.
  • both the Teacher graph wavelet neural network and the Student graph wavelet neural network include an input layer, several graph convolution layers, and an output layer; during training, the input data of each graph convolution layer is sequentially processed by feature transformation and then by the graph convolution operation.
  • it may include 1 input layer, L (L ≥ 1) graph convolution layers, and an output layer.
  • the graph convolution layer first performs feature transformation on its input data and then the graph convolution operation; splitting the graph convolution layer into the two stages of feature transformation and graph convolution reduces the number of network parameters, thereby reducing the amount of model computation and improving model training efficiency.
  • X represents the vertex feature matrix
  • m represents the ordinal number of the graph convolution layer
  • F is the graph convolution kernel matrix
  • h is the activation function.
  • without this separation, the number of parameters is n*p*q, where n represents the number of vertices in the graph, p represents the vertex feature dimension of the layer's input, and q represents the vertex feature dimension of the layer's output.
  • once the feature transformation is separated from the graph convolution operation, the number of parameters of each graph convolution layer becomes n + p*q.
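  • a minimal PyTorch sketch of such a two-stage layer follows; the ReLU activation, the initialization, and all names are assumptions, and ψ_r, ψ_r⁻¹ are assumed precomputed:

```python
import torch
import torch.nn as nn

class GraphWaveletConv(nn.Module):
    # One graph convolution layer split into the two stages described above:
    # (1) feature transformation Q = X @ W with a p*q weight matrix, then
    # (2) graph convolution H = h(psi @ diag(f) @ psi_inv @ Q),
    # where f is the layer's n-element diagonal convolution kernel, giving
    # n + p*q parameters instead of n*p*q.
    def __init__(self, n, p, q, psi, psi_inv):
        super().__init__()
        self.W = nn.Parameter(torch.randn(p, q) * 0.01)  # feature transformation
        self.f = nn.Parameter(torch.ones(n))             # diagonal convolution kernel
        self.register_buffer("psi", psi)                 # graph wavelet transform basis
        self.register_buffer("psi_inv", psi_inv)         # inverse transform basis

    def forward(self, X):
        Q = X @ self.W                                   # stage 1: feature transformation
        H = self.psi @ (self.f.unsqueeze(1) * (self.psi_inv @ Q))
        return torch.relu(H)                             # stage 2: graph convolution
```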
  • in the training process, the convolution kernels of the graph convolution layers obtained by training the Teacher graph wavelet neural network are used, based on the attention mechanism, to determine the convolution kernels of the corresponding graph convolution layers in the Student graph wavelet neural network.
  • the classification model may include a Teacher graph wavelet neural network, a Student graph wavelet neural network, and an attention network connecting each pair of the Teacher graph wavelet neural network and the Student graph wavelet neural network.
  • let F_l be the graph convolution kernel matrix of layer l, which is a diagonal matrix; from the perspective of signal processing, the elements (f_1, f_2, …, f_n) on its diagonal can be regarded as a filter acting on the n vertex signal components. Denote the layer-l convolution kernel matrices of the Teacher graph wavelet neural network and the Student graph wavelet neural network as T_l and S_l respectively; the Teacher network's convolution kernel t_l and the Student network's convolution kernel s_l are obtained from their diagonals, and both are n-dimensional column vectors.
  • attention transfer can then be performed based on the attention mechanism: each layer of the Teacher graph wavelet neural network transfers its learned convolution kernel to the corresponding layer of the Student graph wavelet neural network, that is, the Student graph wavelet neural network learns from the Teacher graph wavelet neural network, improving the performance of the whole network.
  • here e_l(i) represents the i-th component of e_l, e′_l(i) represents the i-th component of e′_l, and s′_l represents the layer-l convolution kernel learned by the Student graph wavelet neural network from the Teacher graph wavelet neural network.
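  • the following PyTorch sketch is one hypothetical realization of this transfer: an attention parameter vector a_l scores each kernel component (e_l), normalizes the scores (e′_l), and blends the two kernels into s′_l; the exact scoring and blending used by the application may differ:

```python
import torch
import torch.nn as nn

class KernelAttentionTransfer(nn.Module):
    # Hypothetical per-layer attention transfer: the parameter vector a_l
    # scores each of the n kernel components from the Teacher kernel t_l and
    # the Student kernel s_l (scores e_l), the scores are normalized (e'_l),
    # and the Student adopts the blended kernel s'_l.
    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.randn(2) * 0.1)      # attention parameters a_l

    def forward(self, t, s):
        e = torch.stack([t, s], dim=1) @ self.a          # e_l(i): score of component i
        w = torch.sigmoid(e)                             # e'_l(i): normalized score
        return w * t + (1.0 - w) * s                     # s'_l: kernel learned from the Teacher
```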
  • Step S26 Determine a target training loss based on the supervised training loss and the unsupervised training loss.
  • Step S27 When the target training loss converges, output the current classification model to obtain the trained classification model.
  • FIG. 4 is a structure diagram of a classification model disclosed in the embodiment of the present application.
  • Fig. 4 shows the Teacher graph wavelet neural network GWN_T and the Student graph wavelet neural network GWN_S.
  • FIG. 5 is a specific classification model structure diagram disclosed in the embodiment of the present application.
  • the classification model consists of a Teacher graph wavelet neural network GWN_T, a Student graph wavelet neural network GWN_S, and an attention network connecting each pair of graph convolution layers of the two networks.
  • GWN_T performs supervised learning based on the labeled graph vertices, and its prediction accuracy is high; GWN_S uses the unlabeled graph vertices to perform unsupervised learning under the guidance of GWN_T (using its prediction results), so as to obtain a better vertex classification model.
  • the attention network is used by GWN_T to transfer the "knowledge" learned by each layer, that is, the convolution kernel, to the corresponding layer of GWN_S; in other words, GWN_S learns from GWN_T.
  • both GWN_T and GWN_S contain 1 input layer, L graph convolution layers, and 1 output layer.
  • the input layer is mainly used to read the graph data to be classified, including the adjacency matrix A, which represents the topology of the graph, and the vertex feature matrix X.
  • in the graph convolution layer, the graph convolution operation is decomposed into two stages: feature transformation and graph convolution.
  • the output layer is used to output prediction results.
  • the network parameters of each layer include the feature transformation matrices (of both the Teacher graph wavelet neural network and the Student graph wavelet neural network), the convolution kernels (the convolution kernel t_l and the convolution kernel s_l, which are then used to update the convolution kernel matrices F_l), and the attention network parameters a_l.
  • the aforementioned network parameters are initialized, and during the training process, the aforementioned network parameters are updated.
  • the embodiment of the present application discloses a flow chart of a specific classification model training method.
  • for a given graph data set G, its adjacency matrix A, vertex feature matrix X, and vertex label matrix Y are constructed.
  • these are fed into the network for forward propagation, and the prediction results of all vertices belonging to each category are calculated.
  • the loss of the supervised learning part and the loss of the unsupervised learning part are calculated to obtain the total network loss.
  • according to the loss function value, the network parameters of each layer are updated following a certain strategy, until the network error reaches a specified minimum or the number of iterations reaches the specified maximum, at which point the training ends.
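  • putting the earlier sketches together, a minimal training loop consistent with this flow might look as follows; the assumption that `model(X)` returns the pair (Z_T, Z_S) and the choice of Adam (one of the optimizers listed above) are illustrative:

```python
import torch

def train(model, X, Y, alpha, lr=0.01, max_iters=200, min_loss=1e-4):
    # Reuses target_training_loss and update_label_matrix sketched earlier;
    # `model(X)` is assumed to return the Teacher and Student predictions.
    unlabeled = Y.sum(dim=1) == 0                 # V_U: vertices without labels
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(max_iters):
        Z_T, Z_S = model(X)                       # forward propagation
        loss = target_training_loss(Z_T, Z_S, Y, alpha)
        opt.zero_grad()
        loss.backward()
        opt.step()                                # update per-layer parameters
        Y = update_label_matrix(Y, Z_T.detach(), unlabeled)
        if loss.item() < min_loss:                # error reached the specified minimum
            break
    return model, Y
```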
  • the method based on the embodiment of the present application utilizes a collection of scientific papers to train a classification model and predict category labels of unlabeled scientific papers.
  • the network parameters are initialized.
  • each network calculates the output feature matrix of each layer according to the definition of the graph convolution layer, combined with the layer's input feature matrix; according to the definition of the output layer, it calculates the prediction results Z_T or Z_S of all vertices belonging to each category, and computes the supervised learning loss function value and the unsupervised learning loss function value according to the network loss functions defined above, thereby obtaining the loss function value of the entire network; for each unlabeled vertex, the category with the highest probability is taken as the vertex's latest category, and the vertex label matrix Y is updated.
  • this application is not limited to the scientific citation classification problems listed in the examples; it can also be applied to any data classification problem that is conveniently modeled and represented by graphs, such as proteins or graphic images; to studying how infectious diseases, ideas, and opinions spread and diffuse through social networks over time; to studying how groups in social networks form communities around specific interests or affiliations, and the strength of community connections; to discovering people with similar interests, on the principle that "birds of a feather flock together", and suggesting or recommending new links or connections to them; to question-answering systems that direct questions to those with the most relevant experience; and to advertising systems that show ads to the individuals most interested in and willing to receive advertisements on a particular topic, etc.
  • a classification model training device including:
  • a training data construction module 11, configured to construct a vertex feature matrix, an adjacency matrix, and a vertex label matrix based on a graph data set, wherein the vertex label matrix includes the label information of each vertex of the graph data set;
  • a classification model training module 12, configured to input the vertex feature matrix, the adjacency matrix, and the vertex label matrix into the Teacher graph wavelet neural network in the classification model for supervised training and determine the corresponding supervised training loss during training; to input the vertex feature matrix and the adjacency matrix into the Student graph wavelet neural network in the classification model for unsupervised training and determine the corresponding unsupervised training loss during training; to determine the target training loss based on the supervised training loss and the unsupervised training loss; and, when the target training loss converges, to output the current classification model to obtain the trained classification model.
  • it can be seen that a vertex feature matrix, an adjacency matrix, and a vertex label matrix are constructed based on the graph data set, where the vertex label matrix includes the label information of each vertex of the graph data set; the vertex feature matrix, the adjacency matrix, and the vertex label matrix are then input into the Teacher graph wavelet neural network in the classification model for supervised training, and the corresponding supervised training loss is determined during training; the vertex feature matrix and the adjacency matrix are input into the Student graph wavelet neural network in the classification model for unsupervised training, and the corresponding unsupervised training loss is determined during training; the target training loss is determined based on the supervised training loss and the unsupervised training loss; when the target training loss converges, the current classification model is output to obtain the trained classification model.
  • in this way, the vertex feature matrix and adjacency matrix of the graph data set are input into the graph neural network for training, making use of both the graph topology and the vertex features.
  • supervised training and unsupervised training are combined so that the respective advantages of each come into full play, which can improve the classification accuracy of the classification model.
  • the classification model training module 12 is specifically configured, during training, to determine the corresponding supervised training loss based on the first vertex label prediction result of the Teacher graph wavelet neural network and the vertex label matrix, and to determine the corresponding unsupervised training loss based on the second vertex label prediction result of the Student graph wavelet neural network and the first vertex label prediction result.
  • the classification model training module 12 is further configured to update the vertex label matrix using the first vertex label prediction result during training, and, when the target training loss converges, to output the current vertex label matrix to obtain the category prediction result of each vertex without a category label.
  • the device further includes a graph wavelet transform basis calculation module, configured to calculate the graph wavelet transform basis and the graph wavelet inverse transform basis of the graph data set using Chebyshev polynomials; correspondingly, the Teacher graph wavelet neural network and the Student graph wavelet neural network perform graph convolution operations during training based on the graph wavelet transform basis and the graph wavelet inverse transform basis.
  • the device further includes a graph wavelet transform basis formula acquisition module, configured to acquire the calculation formula of the graph wavelet transform basis, wherein the calculation formula is a formula defined based on spectral theory.
  • both the Teacher graph wavelet neural network and the Student graph wavelet neural network include an input layer, several graph convolution layers, and an output layer;
  • the graph convolution layer is used to sequentially perform feature transformation and graph convolution operation processing on the input data of the layer during the training process.
  • the classification model training module 12 is further configured, during training, to determine the convolution kernels of the corresponding graph convolution layers in the Student graph wavelet neural network, based on the attention mechanism, from the convolution kernels of the graph convolution layers trained by the Teacher graph wavelet neural network.
  • the embodiment of the present application discloses an electronic device 20, including a processor 21 and a memory 22, wherein the memory 22 is used to store a computer program, and the processor 21 is used to execute the computer program to implement the classification model training method disclosed in the foregoing embodiments.
  • the memory 22, as a resource storage carrier, may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like, and the storage may be temporary or permanent.
  • the electronic device 20 further includes a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26; the power supply 23 is used to provide the working voltage for each hardware device of the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, following any communication protocol applicable to the technical solution of the present application, which is not specifically limited here; the input/output interface 25 is used to obtain external input data or to output data externally, and its specific interface type can be selected according to application needs and is not specifically limited here.
  • the embodiment of the present application also discloses a computer-readable storage medium for storing a computer program, wherein, when the computer program is executed by a processor, the classification model training method disclosed in the foregoing embodiments is implemented.
  • each embodiment in this specification is described in a progressive manner, each embodiment focuses on the difference from other embodiments, and the same or similar parts of each embodiment can be referred to each other.
  • the description is relatively simple, and for the related information, please refer to the description of the method part.
  • RAM (random access memory)
  • ROM (read-only memory)
  • EPROM (electrically programmable ROM)
  • EEPROM (electrically erasable programmable ROM)
  • registers, hard disk, removable disk, CD-ROM, or any other known storage medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A classification model training method and apparatus, a device, and a medium. The method comprises: constructing a vertex feature matrix, an adjacency matrix, and a vertex label matrix on the basis of a graph data set, the vertex label matrix comprising the label information of each vertex of the graph data set; inputting the vertex feature matrix, the adjacency matrix, and the vertex label matrix into a Teacher graph wavelet neural network in a classification model for supervised training, and determining a corresponding supervised training loss in the training process; inputting the vertex feature matrix and the adjacency matrix into a Student graph wavelet neural network in the classification model for unsupervised training, and determining a corresponding unsupervised training loss in the training process; determining a target training loss on the basis of the supervised training loss and the unsupervised training loss; and, when the target training loss converges, outputting a current classification model to obtain a trained classification model. In this way, the classification accuracy of the classification model can be improved.

Description

A classification model training method, apparatus, device, and medium
This application claims the priority of the Chinese patent application No. 202110613729.6, filed with the China Patent Office on June 2, 2021 and entitled "A classification model training method, apparatus, device, and medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of classifiers, and in particular to a classification model training method, apparatus, device, and medium.
Background Art
With the rapid development of information technologies such as cloud computing, the Internet of Things, mobile communications, and smart terminals, new applications represented by social networks, communities, and blogs have come into wide use. These applications continuously generate large amounts of data that are conveniently modeled and analyzed as graphs, in which vertices represent individuals or groups and edges represent the connections between them. Vertices usually carry label information describing the modeled object's age, gender, location, hobbies, religious beliefs, and many other possible characteristics. These characteristics reflect an individual's behavioral preferences from various angles; ideally, every social network user would carry all the labels relevant to his or her own characteristics. In reality this is not the case: to protect their personal privacy, more and more social network users are cautious when sharing personal information, so social media can collect only part of a user's information. How to infer the labels of the remaining users from the label information of known users is therefore particularly important and urgent. This is the vertex classification problem.
At present, solving the vertex classification problem with graph neural networks has become a research hotspot. A graph neural network usually consists of an input layer, one or more hidden layers, and an output layer. For example, Fig. 1 is a graph neural network structure diagram from the prior art; it shows a typical graph convolutional neural network composed of an input layer (Input layer), two graph convolution layers (Gconv layers), and an output layer (Output layer). The input layer reads the n*d-dimensional vertex feature matrix; each graph convolution layer performs feature extraction on the vertex feature matrix, and the result is passed to the next graph convolution layer after a nonlinear activation function such as ReLU; finally, the output layer, i.e. the task layer, completes a specific task such as vertex classification or clustering. Fig. 1 shows a vertex classification task layer that outputs the category label of each vertex. At present, how to improve the classification accuracy is a problem that needs to be solved.
Summary of the Invention
In view of this, the purpose of the present application is to provide a classification model training method, apparatus, device, and medium that can improve the classification accuracy of a classification model. The specific scheme is as follows:
In a first aspect, the present application discloses a classification model training method, including:
constructing a vertex feature matrix, an adjacency matrix, and a vertex label matrix based on a graph data set, wherein the vertex label matrix includes the label information of each vertex of the graph data set;
inputting the vertex feature matrix, the adjacency matrix, and the vertex label matrix into a Teacher graph wavelet neural network in the classification model for supervised training, and determining the corresponding supervised training loss during training;
inputting the vertex feature matrix and the adjacency matrix into a Student graph wavelet neural network in the classification model for unsupervised training, and determining the corresponding unsupervised training loss during training;
determining a target training loss based on the supervised training loss and the unsupervised training loss;
when the target training loss converges, outputting the current classification model to obtain the trained classification model.
Optionally, determining the corresponding supervised training loss during training includes:
during training, determining the corresponding supervised training loss based on the first vertex label prediction result of the Teacher graph wavelet neural network and the vertex label matrix;
correspondingly, determining the corresponding unsupervised training loss during training includes:
during training, determining the corresponding unsupervised training loss based on the second vertex label prediction result of the Student graph wavelet neural network and the first vertex label prediction result.
Optionally, the method further includes:
during training, updating the vertex label matrix using the first vertex label prediction result;
when the target training loss converges, outputting the current vertex label matrix to obtain the category prediction result of each vertex without a category label.
Optionally, the method further includes:
calculating the graph wavelet transform basis and the graph wavelet inverse transform basis of the graph data set using Chebyshev polynomials;
correspondingly, the Teacher graph wavelet neural network and the Student graph wavelet neural network performing graph convolution operations during training based on the graph wavelet transform basis and the graph wavelet inverse transform basis.
Optionally, the method further includes:
obtaining the calculation formula of the graph wavelet transform basis,
wherein the calculation formula is a formula defined based on spectral theory.
Optionally, the Teacher graph wavelet neural network and the Student graph wavelet neural network each include an input layer, several graph convolution layers, and an output layer,
wherein the graph convolution layer is used to sequentially perform feature transformation and the graph convolution operation on its input data during training.
Optionally, the method further includes:
during training, determining the convolution kernels of the corresponding graph convolution layers in the Student graph wavelet neural network, based on the attention mechanism, from the convolution kernels of the graph convolution layers obtained by training the Teacher graph wavelet neural network.
In a second aspect, the present application discloses a classification model training apparatus, including:
a training data construction module, configured to construct a vertex feature matrix, an adjacency matrix, and a vertex label matrix based on a graph data set, wherein the vertex label matrix includes the label information of each vertex of the graph data set;
a classification model training module, configured to input the vertex feature matrix, the adjacency matrix, and the vertex label matrix into a Teacher graph wavelet neural network in the classification model for supervised training and determine the corresponding supervised training loss during training; to input the vertex feature matrix and the adjacency matrix into a Student graph wavelet neural network in the classification model for unsupervised training and determine the corresponding unsupervised training loss during training; to determine a target training loss based on the supervised training loss and the unsupervised training loss; and, when the target training loss converges, to output the current classification model to obtain the trained classification model.
In a third aspect, the present application discloses an electronic device, including:
a memory for storing a computer program;
a processor for executing the computer program to implement the aforementioned classification model training method.
In a fourth aspect, the present application discloses a computer-readable storage medium for storing a computer program which, when executed by a processor, implements the aforementioned classification model training method.
It can be seen that the present application first constructs a vertex feature matrix, an adjacency matrix, and a vertex label matrix based on a graph data set, the vertex label matrix including the label information of each vertex of the graph data set; then inputs the vertex feature matrix, the adjacency matrix, and the vertex label matrix into the Teacher graph wavelet neural network in the classification model for supervised training, determining the corresponding supervised training loss during training; inputs the vertex feature matrix and the adjacency matrix into the Student graph wavelet neural network in the classification model for unsupervised training, determining the corresponding unsupervised training loss during training; determines the target training loss based on the supervised training loss and the unsupervised training loss; and, when the target training loss converges, outputs the current classification model to obtain the trained classification model. In this way, the vertex feature matrix and adjacency matrix of the graph data set are input into the graph neural network for training, making use of both the graph topology and the vertex features; supervised training and unsupervised training are combined so that the respective advantages of each come into full play, which can improve the classification accuracy of the classification model.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are merely embodiments of the present application; for a person of ordinary skill in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a graph neural network structure diagram from the prior art;
Fig. 2 is a flow chart of a classification model training method disclosed in the present application;
Fig. 3 is a flow chart of a specific classification model training method disclosed in the present application;
Fig. 4 is a structure diagram of a classification model disclosed in the present application;
Fig. 5 is a structure diagram of a specific classification model disclosed in the present application;
Fig. 6 is a flow chart of a specific classification model training method disclosed in the present application;
Fig. 7 is a schematic structural diagram of a classification model training apparatus disclosed in the present application;
Fig. 8 is a structure diagram of an electronic device disclosed in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of this application.
Referring to Fig. 2, an embodiment of the present application discloses a classification model training method, including:
Step S11: Construct a vertex feature matrix, an adjacency matrix, and a vertex label matrix based on the graph data set, wherein the vertex label matrix includes the label information of each vertex of the graph data set.
Here, the label information indicates either a corresponding category label or the absence of one.
In a specific implementation, assume the graph data set is G = (V, E), where V denotes the vertex set and E denotes the set of edges. V is divided into a small set V_L of vertices with category labels and a large set V_U of vertices without category labels, satisfying V_L ∪ V_U = V and V_L ∩ V_U = ∅. Besides its label, each vertex v of G has d features, and the features of all vertices form the n*d-dimensional vertex feature matrix, denoted X. The adjacency matrix of G is denoted A, and the element A_ij represents the weight of the edge between vertices i and j. From the labeled vertex set V_L, an n*C-dimensional vertex label matrix Y is constructed, where n = |V| is the number of vertices in the graph, C is the number of label categories, and the matrix element Y_ij indicates whether the category label of vertex i is j (j = 1, 2, …, C): when vertex i already has a category label, the corresponding j-th column element of its row is set to 1 and the other column elements to 0, that is,
Y_ij = 1 if the category label of vertex i is j, and Y_ij = 0 otherwise.
When vertex i has no category label, every column element of its row is set to 0.
Step S12: Input the vertex feature matrix, the adjacency matrix, and the vertex label matrix into the Teacher graph wavelet neural network in the classification model for supervised training, and determine the corresponding supervised training loss during training.
Step S13: Input the vertex feature matrix and the adjacency matrix into the Student graph wavelet neural network in the classification model for unsupervised training, and determine the corresponding unsupervised training loss during training.
In a specific implementation, during training, the corresponding supervised training loss is determined based on the first vertex label prediction result of the Teacher graph wavelet neural network and the vertex label matrix, and the corresponding unsupervised training loss is determined based on the second vertex label prediction result of the Student graph wavelet neural network and the first vertex label prediction result.
Specifically, the first vertex label prediction result is compared with the vertex label matrix to calculate the supervised training loss, and the second vertex label prediction result is compared with the first vertex label prediction result to calculate the unsupervised learning loss.
步骤S14:基于所述有监督训练损失以及所述无监督训练损失确定目标训练损失。Step S14: Determine a target training loss based on the supervised training loss and the unsupervised training loss.
In a specific implementation, the target training loss is computed as

$$ls = ls_T + \alpha \cdot ls_S,$$

where $ls_T$ is the supervised training loss, $ls_S$ is the unsupervised training loss, and $\alpha$ is a constant used to adjust the proportion of the unsupervised training loss in the target loss. $Z_T$ denotes the first vertex label prediction result and $Z_S$ denotes the second vertex label prediction result.

Both $Z_T$ and $Z_S$ are n*C-dimensional matrices, and each column vector $z_j$ of $Z_T$ or $Z_S$ gives the probability that every vertex belongs to category j; that is, its i-th element (1 ≤ i ≤ n) is the probability that vertex i belongs to category j (j = 1, 2, …, C).
It should be pointed out that, in the embodiment of the present application, the output layers of the Teacher graph wavelet neural network and the Student graph wavelet neural network can be defined as

$$Z = \mathrm{softmax}\left(\psi_r F^L \psi_r^{-1} Q^L\right),$$

where $\psi_r$ is the graph wavelet transform basis, $\psi_r^{-1}$ is the graph wavelet inverse transform basis, $F^L$ is the convolution kernel matrix of the L-th graph convolution layer, and $Q^L$ is the vertex feature transformation result of the L-th layer; the Teacher graph wavelet neural network and the Student graph wavelet neural network both contain L graph convolution layers.
Moreover, the supervised training loss function is based on the cross-entropy principle and measures the degree of difference between the actual label probability distribution of the vertices and the predicted label probability distribution, while the unsupervised training loss function computes the sum of squared differences between same-coordinate elements of $Z_T$ and $Z_S$.

In this way, when training of the whole network ends, the outputs $Z_T$ and $Z_S$ of the two networks agree, or their difference is negligible; the output $Z_T$ of the Teacher graph wavelet neural network can then be taken as the output of the whole network model.
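The two loss terms and their combination can be sketched in Python as follows; the function names, the value of α, the averaging over labeled vertices, and the detaching of $Z_T$ in the unsupervised term (so that the Student learns from the Teacher rather than the reverse) are assumptions of this sketch.

```python
import torch

def supervised_loss(Z_T, Y, labeled_mask):
    # cross-entropy between the actual label distribution (rows of Y)
    # and the predicted distribution Z_T, over labeled vertices only
    log_pred = torch.log(Z_T[labeled_mask] + 1e-12)
    return -(Y[labeled_mask] * log_pred).sum(dim=1).mean()

def unsupervised_loss(Z_T, Z_S):
    # sum of squared differences between same-coordinate elements
    return ((Z_T.detach() - Z_S) ** 2).sum()

def target_loss(Z_T, Z_S, Y, labeled_mask, alpha=0.5):
    # ls = ls_T + alpha * ls_S
    return supervised_loss(Z_T, Y, labeled_mask) + alpha * unsupervised_loss(Z_T, Z_S)
```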
In this embodiment, the vertex label matrix is updated with the first vertex label prediction result during training. Specifically, for each vertex without a category label, that is, for each $v_i \in V_U$, the category with the highest probability in the first vertex label prediction result is taken as the latest category of that vertex, and the vertex label matrix is updated accordingly.
Step S15: When the target training loss converges, output the current classification model to obtain the trained classification model.

Furthermore, when the target training loss converges, the current vertex label matrix is output, yielding the category prediction result of every vertex without a category label.

In a specific implementation, the target training loss is considered converged, and training ends, when it reaches a preset threshold or the number of iterations reaches a specified maximum; the preset threshold is usually a small value. At that point, the category to which each vertex without a category label should belong is obtained from the current vertex label matrix.

That is, the present application folds the prediction of unlabeled vertices into the training process: during training, the vertex label matrix is updated according to each round's training result, and once training ends, the category label of any unlabeled vertex is immediately available.
In a specific implementation, the network parameters of each layer of the graph wavelet neural networks can first be initialized according to a specific strategy, such as normal-distribution random initialization, Xavier initialization, or He initialization. During training, the network parameters of each layer can be corrected and updated according to a specific strategy, such as SGD (Stochastic Gradient Descent), MGD (Momentum Gradient Descent), Nesterov Momentum, AdaGrad (Adaptive Gradient Algorithm), RMSprop (Root Mean Square Propagation), Adam (Adaptive Moment Estimation), or BGD (Batch Gradient Descent), so as to optimize the loss function value.
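A minimal sketch of these initialization and update strategies using PyTorch built-ins; the standard deviation chosen for the normal-distribution case and the learning rates are illustrative assumptions.

```python
import torch

def init_layer_param(param, strategy="xavier"):
    # initialize one layer's parameter tensor under the chosen strategy
    if strategy == "normal":
        torch.nn.init.normal_(param, mean=0.0, std=0.01)
    elif strategy == "xavier":
        torch.nn.init.xavier_uniform_(param)
    elif strategy == "he":
        torch.nn.init.kaiming_uniform_(param, nonlinearity="relu")
    return param

# any of the listed update strategies can then be plugged in, for example:
# torch.optim.SGD(params, lr=0.01)                        # SGD / BGD
# torch.optim.SGD(params, lr=0.01, momentum=0.9)          # MGD
# torch.optim.SGD(params, lr=0.01, momentum=0.9, nesterov=True)
# torch.optim.Adagrad(params), torch.optim.RMSprop(params), torch.optim.Adam(params)
```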
It can be seen that the embodiment of the present application first constructs a vertex feature matrix, an adjacency matrix, and a vertex label matrix based on the graph data set, where the vertex label matrix contains the label information of each vertex of the graph data set; then inputs the vertex feature matrix, the adjacency matrix, and the vertex label matrix into the Teacher graph wavelet neural network of the classification model for supervised training, determining the corresponding supervised training loss during training; inputs the vertex feature matrix and the adjacency matrix into the Student graph wavelet neural network of the classification model for unsupervised training, determining the corresponding unsupervised training loss during training; determines a target training loss based on the supervised training loss and the unsupervised training loss; and, when the target training loss converges, outputs the current classification model to obtain the trained classification model. In this way, the vertex feature matrix and adjacency matrix of the graph data set are fed into graph neural networks for training, exploiting both the graph topology and the vertex features; and because training combines supervised and unsupervised training, the respective strengths of both are brought into full play, which improves the classification accuracy of the classification model.
Referring to Fig. 3, the embodiment of the present application discloses a specific classification model training method, including:

Step S21: Obtain the calculation formula of the graph wavelet transform basis.

Here, the calculation formula is a formula defined based on spectral theory.

It should be pointed out that the graph convolution operation defined through the Fourier transform has poor locality in the vertex domain; defining the basis of the graph wavelet transform with spectral theory guarantees the locality of the graph convolution computation.

Step S22: Compute the graph wavelet transform basis and the graph wavelet inverse transform basis of the graph data set using Chebyshev polynomials.
In a specific implementation, the graph wavelet transform basis is computed as

$$\psi_r = U H_r U^T,$$

where $\psi_r$ is the graph wavelet transform basis extracted from the graph data set G, and U is the matrix of eigenvectors obtained by the eigendecomposition of the Laplacian matrix $L = D - A$ of G; D is a diagonal matrix whose n main-diagonal elements are the degrees of the n vertices, with all other elements zero. $H_r = \mathrm{diag}(h(r\lambda_1), h(r\lambda_2), \ldots, h(r\lambda_n))$ is the scaling matrix with scaling scale r, where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues obtained by the eigendecomposition of the Laplacian matrix of G. The graph wavelet inverse transform basis $\psi_r^{-1}$ can be obtained by replacing $h(r\lambda_i)$ in $\psi_r$ with $h(-r\lambda_i)$. Since the eigendecomposition of a matrix is computationally expensive, this cost is avoided by approximating the graph wavelet transform basis and the graph wavelet inverse transform basis with the Chebyshev polynomials

$$T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x), \qquad T_0 = 1,\; T_1 = x.$$
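The following dense NumPy sketch shows one way to realize this approximation; the heat kernel $h(x) = e^{-x}$, the spectrum bound $\lambda_{max} \approx 2$ (appropriate for a normalized Laplacian), and the truncation order K are assumptions of the example, and a practical implementation would use sparse matrices.

```python
import numpy as np

def chebyshev_wavelet_basis(L, r, K=10, lam_max=2.0, sign=+1):
    """Approximate psi_r = U diag(h(r*lam_i)) U^T without an explicit
    eigendecomposition, via T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x),
    T_0 = 1, T_1 = x. sign=-1 uses h(-r*lam), i.e. the inverse basis.
    """
    n = L.shape[0]
    # rescale the Laplacian so its spectrum lies in [-1, 1]
    L_tilde = (2.0 / lam_max) * L - np.eye(n)

    # Chebyshev coefficients of g(x) = h(sign * r * lam(x)) on [-1, 1],
    # computed by Gauss-Chebyshev quadrature
    M = 200
    theta = (np.arange(M) + 0.5) * np.pi / M
    x = np.cos(theta)                       # quadrature nodes in [-1, 1]
    lam = (x + 1.0) * lam_max / 2.0         # map back to [0, lam_max]
    g = np.exp(-sign * r * lam)             # heat-kernel filter (assumption)
    coeffs = [2.0 / M * np.sum(g * np.cos(k * theta)) for k in range(K)]
    coeffs[0] /= 2.0

    # accumulate sum_k c_k T_k(L_tilde) with the three-term recurrence
    T_prev, T_cur = np.eye(n), L_tilde
    psi = coeffs[0] * T_prev + coeffs[1] * T_cur
    for k in range(2, K):
        T_next = 2.0 * L_tilde @ T_cur - T_prev
        psi += coeffs[k] * T_next
        T_prev, T_cur = T_cur, T_next
    return psi
```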
Correspondingly, during training the Teacher graph wavelet neural network and the Student graph wavelet neural network perform graph convolution operations based on the graph wavelet transform basis and the graph wavelet inverse transform basis.

It should be pointed out that, in the prior art, the graph Fourier transform used in the graph convolution operation is inefficient because the eigenvector matrix of the Laplacian is dense. This embodiment instead performs the graph convolution operation with the graph wavelet transform basis and the graph wavelet inverse transform basis, which are sparse, so the computational efficiency of the graph convolution operation can be improved.
Step S23: Construct a vertex feature matrix, an adjacency matrix, and a vertex label matrix based on the graph data set, where the vertex label matrix contains the label information of each vertex of the graph data set, and the label information indicates either the corresponding category label or the absence of a category label.

Step S24: Input the vertex feature matrix, the adjacency matrix, and the vertex label matrix into the Teacher graph wavelet neural network of the classification model for supervised training, and determine the corresponding supervised training loss during training.

Step S25: Input the vertex feature matrix and the adjacency matrix into the Student graph wavelet neural network of the classification model for unsupervised training, and determine the corresponding unsupervised training loss during training.

In a specific implementation, the Teacher graph wavelet neural network and the Student graph wavelet neural network each comprise an input layer, several graph convolution layers, and an output layer; specifically, one input layer, L (L ≥ 1) graph convolution layers, and one output layer. Each graph convolution layer is used to apply, in sequence, a feature transformation and a graph convolution operation to that layer's input data during training.

That is, in the embodiment of the present application, a graph convolution layer first applies a feature transformation to its input data and then the graph convolution operation. Splitting the graph convolution layer into these two successive processing stages, feature transformation and graph convolution, reduces the number of network parameters, thereby lowering the computational load of the model and improving training efficiency.
In the l-th (1 ≤ l ≤ L) graph convolution layer:

Feature transformation: $Q^l = H^l \Theta^l$;

Graph convolution: $H^{l+1} = h\left(\psi_r F^l \psi_r^{-1} Q^l\right)$;

where $H^l$ and $H^{l+1}$ are respectively the input and output data of the l-th graph convolution layer, with $H^1 = X$; $\Theta^l$ is the feature transformation matrix of the l-th layer to be trained; $Q^l$ is the feature transformation result of the l-th layer; and the superscript T denotes the matrix transpose operation.
It should be pointed out that prior-art definitions of the graph convolution layer usually do not separate the feature transformation from the convolution operation. Combined with the graph wavelet transform basis of the embodiment of the present application, if the graph convolution layer is not split into the two successive stages of feature transformation and graph convolution, the layer is defined by the following formula:

$$X^{m+1}_{[:,j]} = h\left(\sum_{i=1}^{p} \psi_r F^m_{i,j} \psi_r^{-1} X^m_{[:,i]}\right), \quad j = 1, \ldots, q,$$

where X is the vertex feature matrix, $X^m_{[:,i]}$ denotes the i-th column of the m-th layer's input, m is the ordinal number of the graph convolution layer, F is the graph convolution kernel matrix, and h is the activation function. A graph convolution layer defined this way contains n*p*q parameters, where n is the number of vertices in the graph, p is the vertex feature dimension of the layer's input, and q is the vertex feature dimension of the layer's output. By peeling the feature transformation off the graph convolution operation, the embodiment of the present application reduces the number of parameters of each graph convolution layer to n + p*q.
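A sketch of one separated layer in PyTorch; initializing the diagonal kernel to ones (an identity filter) and treating the pre-computed bases as fixed inputs are assumptions of the example.

```python
import torch
import torch.nn as nn

class GraphWaveletLayer(nn.Module):
    """One graph convolution layer split into two stages:
    feature transformation  Q = H @ Theta                 (p*q parameters)
    graph convolution       H' = h(psi @ F @ psi_inv @ Q) (n parameters in F)
    so the layer holds n + p*q parameters instead of n*p*q.
    """
    def __init__(self, n, p, q, psi, psi_inv, act=torch.relu):
        super().__init__()
        self.theta = nn.Parameter(torch.empty(p, q))
        nn.init.xavier_uniform_(self.theta)
        self.f = nn.Parameter(torch.ones(n))    # diagonal of the kernel matrix F
        self.psi, self.psi_inv = psi, psi_inv   # fixed, pre-computed bases
        self.act = act

    def forward(self, H):
        Q = H @ self.theta                      # stage 1: feature transformation
        # stage 2: graph convolution; F is diagonal, so F @ Z == f[:, None] * Z
        return self.act(self.psi @ (self.f.unsqueeze(1) * (self.psi_inv @ Q)))
```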
In addition, in a specific implementation, during training the convolution kernel of each graph convolution layer of the Student graph wavelet neural network is determined, based on an attention mechanism, from the convolution kernel of the corresponding graph convolution layer obtained by training the Teacher graph wavelet neural network.

Specifically, the classification model may comprise the Teacher graph wavelet neural network, the Student graph wavelet neural network, and an attention network connecting each pair of graph convolution layers of the Teacher and Student graph wavelet neural networks.

It should be noted that $F^l$, the graph convolution kernel matrix of the l-th layer, is a diagonal matrix. From a signal-processing viewpoint, the elements $(f_1, f_2, \ldots, f_n)$ on the diagonal of $F^l$ can be regarded as the frequencies of the graph, expressing the importance of the eigenvector corresponding to each frequency. Denote the convolution kernel matrices of the l-th layers of the Teacher and Student graph wavelet neural networks by $T^l$ and $S^l$ respectively; they are obtained by diagonalizing the convolution kernel $t_l$ of that layer of the Teacher network and the convolution kernel $s_l$ of that layer of the Student network, both of which are n-dimensional column vectors.

In this embodiment, attention transfer can be performed based on the attention mechanism: each layer of the Teacher graph wavelet neural network transfers the convolution kernel it has learned to the corresponding layer of the Student graph wavelet neural network; in other words, the Student network learns from the Teacher network, which boosts the performance of the whole network. Concretely, a single-layer feed-forward neural network can be designed whose input layer reads the l-th-layer convolution kernels $t_l$ and $s_l$ of the Teacher and Student graph wavelet neural networks, and whose hidden layer implements the attention function $a_l: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, so as to obtain the attention weight between the two vectors: $e_l = a_l(t_l, s_l)$.
Further, the attention weights $e_l$ are normalized with the softmax function to obtain the normalized attention weights $e'_l$:

$$e'_l(i) = \frac{\exp\left(e_l(i)\right)}{\sum_{k=1}^{n} \exp\left(e_l(k)\right)},$$

where $e_l(i)$ is the i-th component of $e_l$ and $e'_l(i)$ is the i-th component of $e'_l$. Then:

$$s'_l(i) = e'_l(i) \times t_l(i), \quad i \in [1, n],$$

where $s'_l(i)$ denotes the l-th-layer convolution kernel that the Student graph wavelet neural network learns from the Teacher graph wavelet neural network.
It should be pointed out that adding the attention mechanism helps the Student graph wavelet neural network quickly exploit the knowledge mastered by the Teacher graph wavelet neural network, which speeds up training.
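A sketch of the attention transfer for one layer; the patent describes $a_l$ only abstractly, so realizing it as a single nn.Linear producing an n-dimensional score vector is an assumption of this example.

```python
import torch
import torch.nn as nn

class AttentionTransfer(nn.Module):
    """Single-layer feed-forward attention net for layer l: reads the
    Teacher kernel t_l and Student kernel s_l (n-dim vectors), produces
    weights e_l, softmax-normalizes them to e'_l, and forms the kernel
    s'_l(i) = e'_l(i) * t_l(i) used by the Student's l-th layer.
    """
    def __init__(self, n):
        super().__init__()
        self.a = nn.Linear(2 * n, n)  # attention function a_l (an assumption)

    def forward(self, t_l, s_l):
        e_l = self.a(torch.cat([t_l, s_l]))  # attention weights e_l
        e_norm = torch.softmax(e_l, dim=0)   # normalized weights e'_l
        return e_norm * t_l                  # transferred kernel s'_l

# usage: s_prime = AttentionTransfer(n)(t_l, s_l) becomes the kernel that
# is diagonalized into the Student layer's kernel matrix S^l
```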
Step S26: Determine a target training loss based on the supervised training loss and the unsupervised training loss.

Step S27: When the target training loss converges, output the current classification model to obtain the trained classification model.
For example, see Fig. 4, a structure diagram of a classification model disclosed in an embodiment of the present application, showing the Teacher graph wavelet neural network GWN_T and the Student graph wavelet neural network GWN_S. Further, Fig. 5 shows a specific classification model structure diagram disclosed in an embodiment of the present application. The classification model consists of the Teacher graph wavelet neural network GWN_T, the Student graph wavelet neural network GWN_S, and an attention network connecting each pair of graph convolution layers of the two networks. GWN_T performs supervised learning on the labeled graph vertices and achieves high prediction accuracy; GWN_S, under the guidance of GWN_T (using its prediction results), performs unsupervised learning on the unlabeled graph vertices, with the aim of raising prediction accuracy and obtaining a better vertex classification model. The attention network lets GWN_T transfer the "knowledge" learned by each layer, namely the convolution kernels, to the corresponding layer of GWN_S; in other words, GWN_S learns from GWN_T. GWN_T and GWN_S each contain one input layer, L graph convolution layers, and one output layer. The input layer mainly reads the graph data to be classified, including the adjacency matrix A representing the graph topology and the vertex feature matrix X. In each graph convolution layer, the graph convolution operation is decomposed into the two successive stages of feature transformation and graph convolution. The output layer outputs the prediction results.
Moreover, in the whole classification model, the network parameters of each layer comprise the feature transformation matrix $\Theta^l$ (namely $\Theta^l_T$ of the Teacher graph wavelet neural network and $\Theta^l_S$ of the Student graph wavelet neural network), the convolution kernels $t_l$ and $s_l$ (which in turn are used to update the convolution kernel matrix $F^l$), and the attention network parameters $a_l$. These network parameters are set in the initialization phase and updated during training.
For example, see Fig. 6, a flow chart of a specific classification model training method disclosed in an embodiment of the present application. For a given graph data set G, its adjacency matrix A, vertex feature matrix X, and vertex label matrix Y are taken as inputs and fed into the network for forward propagation; the prediction results of all vertices for every category are computed; and, while the prediction result matrix is updated, the loss of the supervised-learning part and the loss of the unsupervised-learning part are computed, yielding the total network loss function value. The network parameters of each layer are then updated according to a given strategy, until the network error reaches a specified small value or the number of iterations reaches the specified maximum, at which point training ends.
For example, the method of the embodiment of the present application can train a classification model on a corpus of scientific papers and predict the category labels of unlabeled scientific papers.

(1) Download the citation network data set Citeseer, which contains 3312 scientific papers divided into six categories and 4732 citation relationships between papers. Using the bag-of-words model, construct a feature vector x for each paper; the feature vectors of all documents form the feature matrix X. Construct the adjacency matrix A from the citation relationships between papers. The goal is to classify every document: 20 instances per category are randomly sampled as labeled data, 1000 instances serve as test data, and the rest are used as unlabeled data; the vertex label matrix Y is constructed accordingly.
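A minimal sketch of assembling X, A, and Y from such a corpus; the function name and the assumed input formats are illustrative choices, not the actual format of the Citeseer distribution.

```python
import numpy as np

def build_inputs(papers, citations, vocab, labels, C):
    """Construct X, A and Y for a citation graph.

    papers:    list of token lists, one per paper (n papers)
    citations: list of (i, j) index pairs, one per citation edge
    vocab:     dict token -> feature column (d columns, bag-of-words)
    labels:    dict paper index -> class index, labeled subset only
    """
    n, d = len(papers), len(vocab)
    X = np.zeros((n, d), dtype=np.float32)
    for i, tokens in enumerate(papers):
        for t in tokens:
            if t in vocab:
                X[i, vocab[t]] += 1.0        # bag-of-words counts
    A = np.zeros((n, n), dtype=np.float32)
    for i, j in citations:
        A[i, j] = A[j, i] = 1.0              # undirected citation edge
    Y = np.zeros((n, C), dtype=np.float32)
    for i, j in labels.items():
        Y[i, j] = 1.0                        # one-hot rows for labeled papers
    return X, A, Y
```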
(2) Define the network structure: define the graph convolution layers, the output layer, and the network loss function based on the foregoing disclosure.

(3) Use Chebyshev polynomials to approximate the graph wavelet transform basis and the graph wavelet inverse transform basis.

(4) Initialize the network parameters according to the regularized initialization method.

(5) Take A, X, and Y as network inputs and feed them in for forward propagation: the Teacher graph wavelet neural network GWN_T takes A, X, and Y as inputs, while the Student graph wavelet neural network GWN_S takes A and X as inputs. Each network computes the output feature matrix of every layer from that layer's input feature matrix according to the definition of the graph convolution layer; following the definition of the output layer, it computes the prediction results Z_T or Z_S of all vertices for every category, and from the network loss function defined above computes the supervised learning loss value and the unsupervised learning loss value, obtaining the loss function value of the whole network. For each unlabeled vertex, the category with the highest probability is taken as that vertex's latest category, and the vertex label matrix Y is updated.

(6) Following the optimization method, compute the gradient of the loss function with respect to the network parameters and back-propagate it so as to optimize the network parameters, until the network prediction error reaches a specified small value or the number of iterations reaches the specified maximum, at which point training ends. At that time, the category of each vertex without a category label can be obtained from the vertex label matrix Y.
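Steps (5) and (6) can be sketched as the following loop; it assumes a `model` returning (Z_T, Z_S) and the `target_loss` sketch given earlier, and the threshold and iteration limit are illustrative values.

```python
import torch

def train(model, A, X, Y, labeled_mask, alpha=0.5, lr=0.01,
          max_iter=200, tol=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(max_iter):
        Z_T, Z_S = model(A, X)                      # forward propagation
        loss = target_loss(Z_T, Z_S, Y, labeled_mask, alpha)
        opt.zero_grad()
        loss.backward()                             # back-propagate gradients
        opt.step()
        # fold the Teacher's predictions for unlabeled vertices back into Y
        with torch.no_grad():
            pred = torch.argmax(Z_T, dim=1)
            Y[~labeled_mask] = torch.eye(Y.shape[1])[pred[~labeled_mask]]
        if loss.item() < tol:                       # convergence threshold
            break
    return torch.argmax(Z_T, dim=1)                 # final category per vertex
```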
Of course, the present application is not limited to the scientific citation classification problem listed in the embodiment; it can be applied to the classification of any data conveniently modeled and represented as a graph, such as proteins and graphic images. It can likewise be used to study how infectious diseases or ideas spread and diffuse through social networks over time; to study how groups in a social network form communities around particular interests or affiliations, and the strength of community connections; to let social networks, following the principle that like attracts like, discover people with similar interests and suggest or recommend new links or contacts to them; to let question-answering systems route questions to the people with the most relevant experience; and to let advertising systems show advertisements to the individuals most interested in and receptive to advertisements on a particular topic.
Referring to Fig. 7, an embodiment of the present application discloses a classification model training apparatus, comprising:

a training data construction module 11 for constructing a vertex feature matrix, an adjacency matrix, and a vertex label matrix based on a graph data set, where the vertex label matrix contains the label information of each vertex of the graph data set;

a classification model training module 12 for inputting the vertex feature matrix, the adjacency matrix, and the vertex label matrix into the Teacher graph wavelet neural network of the classification model for supervised training, determining the corresponding supervised training loss during training; inputting the vertex feature matrix and the adjacency matrix into the Student graph wavelet neural network of the classification model for unsupervised training, determining the corresponding unsupervised training loss during training; determining a target training loss based on the supervised training loss and the unsupervised training loss; and, when the target training loss converges, outputting the current classification model to obtain the trained classification model.
It can be seen that the embodiment of the present application first constructs a vertex feature matrix, an adjacency matrix, and a vertex label matrix based on the graph data set, where the vertex label matrix contains the label information of each vertex of the graph data set; then inputs the vertex feature matrix, the adjacency matrix, and the vertex label matrix into the Teacher graph wavelet neural network of the classification model for supervised training, determining the corresponding supervised training loss during training; inputs the vertex feature matrix and the adjacency matrix into the Student graph wavelet neural network of the classification model for unsupervised training, determining the corresponding unsupervised training loss during training; determines a target training loss based on the supervised training loss and the unsupervised training loss; and, when the target training loss converges, outputs the current classification model to obtain the trained classification model. In this way, the vertex feature matrix and adjacency matrix of the graph data set are fed into graph neural networks for training, exploiting both the graph topology and the vertex features; and because training combines supervised and unsupervised training, the respective strengths of both are brought into full play, which improves the classification accuracy of the classification model.
The classification model training module 12 is specifically configured to determine, during training, the corresponding supervised training loss based on the first vertex label prediction result of the Teacher graph wavelet neural network and the vertex label matrix, and the corresponding unsupervised training loss based on the second vertex label prediction result of the Student graph wavelet neural network and the first vertex label prediction result.

The classification model training module 12 is further configured to update the vertex label matrix with the first vertex label prediction result during training and, when the target training loss converges, to output the current vertex label matrix, obtaining the category prediction result of every vertex without a category label.

The apparatus further comprises a graph wavelet transform basis computation module for computing the graph wavelet transform basis and the graph wavelet inverse transform basis of the graph data set using Chebyshev polynomials; correspondingly, during training the Teacher graph wavelet neural network and the Student graph wavelet neural network perform graph convolution operations based on the graph wavelet transform basis and the graph wavelet inverse transform basis.

The apparatus further comprises a graph wavelet transform basis formula acquisition module for obtaining the calculation formula of the graph wavelet transform basis, where the calculation formula is a formula defined based on spectral theory.

In a specific implementation, the Teacher graph wavelet neural network and the Student graph wavelet neural network each comprise an input layer, several graph convolution layers, and an output layer,

where each graph convolution layer is used to apply, in sequence, a feature transformation and a graph convolution operation to that layer's input data during training.

The classification model training module 12 is further configured, during training, to determine, based on the attention mechanism, the convolution kernel of each corresponding graph convolution layer of the Student graph wavelet neural network from the convolution kernel of the graph convolution layer obtained by training the Teacher graph wavelet neural network.
Referring to Fig. 8, an embodiment of the present application discloses an electronic device 20, comprising a processor 21 and a memory 22, where the memory 22 is used to store a computer program and the processor 21 is used to execute the computer program so as to carry out the classification model training method disclosed in the foregoing embodiments.

For the specific procedure of the above classification model training method, reference may be made to the corresponding content disclosed in the foregoing embodiments, which is not repeated here.

The memory 22, as the carrier of resource storage, may be a read-only memory, a random access memory, a magnetic disk, an optical disc, or the like, and the storage may be transient or permanent.

In addition, the electronic device 20 further comprises a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The power supply 23 provides the operating voltage for the hardware devices on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and external devices, following any communication protocol applicable to the technical solution of the present application, which is not specifically limited here; the input/output interface 25 is used to obtain input data from, or output data to, the outside, and its specific interface type may be selected according to the needs of the specific application, which is likewise not specifically limited here.
Further, an embodiment of the present application also discloses a computer-readable storage medium for storing a computer program, where the computer program, when executed by a processor, implements the classification model training method disclosed in the foregoing embodiments.

For the specific procedure of the above classification model training method, reference may be made to the corresponding content disclosed in the foregoing embodiments, which is not repeated here.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the identical or similar parts of the embodiments may be consulted against one another. Since the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method.

The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

The classification model training method, apparatus, device, and medium provided by the present application have been introduced in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the descriptions of the above embodiments are only intended to help understand the method of the present application and its core idea. Meanwhile, those of ordinary skill in the art may, following the idea of the present application, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

  1. A classification model training method, characterized by comprising:
    constructing a vertex feature matrix, an adjacency matrix, and a vertex label matrix based on a graph data set, wherein the vertex label matrix comprises label information of each vertex of the graph data set;
    inputting the vertex feature matrix, the adjacency matrix, and the vertex label matrix into a Teacher graph wavelet neural network of a classification model for supervised training, and determining a corresponding supervised training loss during training;
    inputting the vertex feature matrix and the adjacency matrix into a Student graph wavelet neural network of the classification model for unsupervised training, and determining a corresponding unsupervised training loss during training;
    determining a target training loss based on the supervised training loss and the unsupervised training loss;
    when the target training loss converges, outputting the current classification model to obtain a trained classification model;
    wherein the Teacher graph wavelet neural network and the Student graph wavelet neural network each comprise an input layer, several graph convolution layers, and an output layer;
    and the method further comprises: during training, determining, based on an attention mechanism, the convolution kernel of the corresponding graph convolution layer of the Student graph wavelet neural network from the convolution kernel of the graph convolution layer obtained by training the Teacher graph wavelet neural network.
  2. The classification model training method according to claim 1, characterized in that determining the corresponding supervised training loss during training comprises:
    during training, determining the corresponding supervised training loss based on a first vertex label prediction result of the Teacher graph wavelet neural network and the vertex label matrix;
    and, correspondingly, determining the corresponding unsupervised training loss during training comprises:
    during training, determining the corresponding unsupervised training loss based on a second vertex label prediction result of the Student graph wavelet neural network and the first vertex label prediction result.
  3. The classification model training method according to claim 2, characterized by further comprising:
    during training, updating the vertex label matrix with the first vertex label prediction result;
    when the target training loss converges, outputting the current vertex label matrix to obtain a category prediction result for each vertex without a category label.
  4. The classification model training method according to claim 1, characterized by further comprising:
    computing a graph wavelet transform basis and a graph wavelet inverse transform basis of the graph data set using Chebyshev polynomials;
    correspondingly, the Teacher graph wavelet neural network and the Student graph wavelet neural network performing graph convolution operations based on the graph wavelet transform basis and the graph wavelet inverse transform basis during training.
  5. The classification model training method according to claim 4, characterized by further comprising:
    obtaining the calculation formula of the graph wavelet transform basis;
    wherein the calculation formula is a formula defined based on spectral theory.
  6. The classification model training method according to any one of claims 1 to 5, characterized in that the graph convolution layer is used to apply, in sequence, a feature transformation and a graph convolution operation to that layer's input data during training.
  7. A classification model training apparatus, characterized by comprising:
    a training data construction module for constructing a vertex feature matrix, an adjacency matrix, and a vertex label matrix based on a graph data set, wherein the vertex label matrix comprises label information of each vertex of the graph data set;
    a classification model training module for inputting the vertex feature matrix, the adjacency matrix, and the vertex label matrix into a Teacher graph wavelet neural network of a classification model for supervised training, determining a corresponding supervised training loss during training; inputting the vertex feature matrix and the adjacency matrix into a Student graph wavelet neural network of the classification model for unsupervised training, determining a corresponding unsupervised training loss during training; determining a target training loss based on the supervised training loss and the unsupervised training loss; and, when the target training loss converges, outputting the current classification model to obtain a trained classification model;
    wherein the Teacher graph wavelet neural network and the Student graph wavelet neural network each comprise an input layer, several graph convolution layers, and an output layer;
    and the classification model training module is further configured to determine, during training and based on an attention mechanism, the convolution kernel of the corresponding graph convolution layer of the Student graph wavelet neural network from the convolution kernel of the graph convolution layer obtained by training the Teacher graph wavelet neural network.
  8. The classification model training apparatus according to claim 7, characterized in that the classification model training module is specifically configured to determine, during training, the corresponding supervised training loss based on a first vertex label prediction result of the Teacher graph wavelet neural network and the vertex label matrix, and to determine the corresponding unsupervised training loss based on a second vertex label prediction result of the Student graph wavelet neural network and the first vertex label prediction result.
  9. An electronic device, characterized by comprising:
    a memory for storing a computer program;
    a processor for executing the computer program to implement the classification model training method according to any one of claims 1 to 6.
  10. A computer-readable storage medium, characterized by being used to store a computer program which, when executed by a processor, implements the classification model training method according to any one of claims 1 to 6.
PCT/CN2021/121905 2021-06-02 2021-09-29 Classification model training method and apparatus, device, and medium WO2022252458A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110613729.6 2021-06-02
CN202110613729.6A CN113255798A (en) 2021-06-02 2021-06-02 Classification model training method, device, equipment and medium

Publications (1)

Publication Number Publication Date
WO2022252458A1

Family

ID=77186018

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/121905 WO2022252458A1 (en) 2021-06-02 2021-09-29 Classification model training method and apparatus, device, and medium

Country Status (2)

Country Link
CN (1) CN113255798A (en)
WO (1) WO2022252458A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364372A (en) * 2020-10-27 2021-02-12 重庆大学 Privacy protection method with supervision matrix completion
CN113255798A (en) * 2021-06-02 2021-08-13 苏州浪潮智能科技有限公司 Classification model training method, device, equipment and medium
CN114048816B (en) * 2021-11-16 2024-04-30 中国人民解放军国防科技大学 Method, device, equipment and storage medium for sampling data of graph neural network
CN114943324B (en) * 2022-05-26 2023-10-13 中国科学院深圳先进技术研究院 Neural network training method, human motion recognition method and device, and storage medium
CN115240037A (en) * 2022-09-23 2022-10-25 卡奥斯工业智能研究院(青岛)有限公司 Model training method, image processing method, device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170083829A1 (en) * 2015-09-18 2017-03-23 Samsung Electronics Co., Ltd. Model training method and apparatus, and data recognizing method
CN111552803A (en) * 2020-04-08 2020-08-18 西安工程大学 Text classification method based on graph wavelet network model
CN111639755A (en) * 2020-06-07 2020-09-08 电子科技大学中山学院 Network model training method and device, electronic equipment and storage medium
CN112464057A (en) * 2020-11-18 2021-03-09 苏州浪潮智能科技有限公司 Network data classification method, device, equipment and readable storage medium
CN113255798A (en) * 2021-06-02 2021-08-13 苏州浪潮智能科技有限公司 Classification model training method, device, equipment and medium


Also Published As

Publication number Publication date
CN113255798A (en) 2021-08-13

Similar Documents

Publication Publication Date Title
WO2023000574A1 (en) Model training method, apparatus and device, and readable storage medium
WO2022252458A1 (en) Classification model training method and apparatus, device, and medium
US10248664B1 (en) Zero-shot sketch-based image retrieval techniques using neural networks for sketch-image recognition and retrieval
US11544573B2 (en) Projection neural networks
US9990558B2 (en) Generating image features based on robust feature-learning
Corchado et al. Ibr retrieval method based on topology preserving mappings
Hammer et al. Learning vector quantization for (dis-) similarities
US20160140425A1 (en) Method and apparatus for image classification with joint feature adaptation and classifier learning
CN110796190A (en) Exponential modeling with deep learning features
Duan et al. Separate or joint? Estimation of multiple labels from crowdsourced annotations
CN110377587B (en) Migration data determination method, device, equipment and medium based on machine learning
CN111667022A (en) User data processing method and device, computer equipment and storage medium
WO2022105108A1 (en) Network data classification method, apparatus, and device, and readable storage medium
CN116261731A (en) Relation learning method and system based on multi-hop attention-seeking neural network
Jia et al. A semi-supervised online sequential extreme learning machine method
US20220253722A1 (en) Recommendation system with adaptive thresholds for neighborhood selection
CN112288086A (en) Neural network training method and device and computer equipment
CN116010684A (en) Article recommendation method, device and storage medium
US11816562B2 (en) Digital experience enhancement using an ensemble deep learning model
Muhammadi et al. A unified statistical framework for crowd labeling
CN113392317A (en) Label configuration method, device, equipment and storage medium
Zhang et al. Learning from few samples with memory network
CN112131261A (en) Community query method and device based on community network and computer equipment
US11455512B1 (en) Representing graph edges using neural networks
CN114117048A (en) Text classification method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21943811

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21943811

Country of ref document: EP

Kind code of ref document: A1