CN113255798A - Classification model training method, device, equipment and medium - Google Patents

Classification model training method, device, equipment and medium

Info

Publication number
CN113255798A
Authority
CN
China
Prior art keywords
training
vertex
graph
classification model
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110613729.6A
Other languages
Chinese (zh)
Inventor
胡克坤
董刚
赵雅倩
刘海威
徐哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202110613729.6A priority Critical patent/CN113255798A/en
Publication of CN113255798A publication Critical patent/CN113255798A/en
Priority to PCT/CN2021/121905 priority patent/WO2022252458A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a classification model training method, apparatus, device and medium. The method comprises the following steps: constructing a vertex feature matrix, an adjacency matrix and a vertex label matrix based on a graph data set, wherein the vertex label matrix comprises the label information of each vertex of the graph data set; inputting the vertex feature matrix, the adjacency matrix and the vertex label matrix into a Teacher graph wavelet neural network in the classification model for supervised training, and determining the corresponding supervised training loss in the training process; inputting the vertex feature matrix and the adjacency matrix into a Student graph wavelet neural network in the classification model for unsupervised training, and determining the corresponding unsupervised training loss in the training process; determining a target training loss based on the supervised training loss and the unsupervised training loss; and when the target training loss converges, outputting the current classification model to obtain the trained classification model. In this way, the classification accuracy of the classification model can be improved.

Description

Classification model training method, device, equipment and medium
Technical Field
The present application relates to the field of classifier technologies, and in particular, to a classification model training method, apparatus, device, and medium.
Background
With the rapid development of information technologies such as cloud computing, the internet of things, mobile communication and intelligent terminals, novel applications represented by social networks, communities and blogs are widely used. These applications constantly generate large amounts of data that lend themselves to graph-based modeling and analysis, in which the vertices represent individuals or groups and the connecting edges represent the connections between them. The vertices are typically tagged with information representing the age, gender, location, hobbies, religious beliefs and many other possible characteristics of the modeled object. These features reflect individual behavioral preferences from various aspects, and ideally each social network user would be tagged with all of the labels associated with their own features. In reality, however, this is not the case: to protect personal privacy, more and more social network users are increasingly cautious when sharing personal information, so social network media can only collect part of a user's information. It is therefore important and urgent to predict the labels of the remaining users from the label information of the known users. This problem is the vertex classification problem.
Currently, solving the vertex classification problem with graph neural networks has become a research hotspot. A graph neural network is typically composed of an input layer, one or more hidden layers, and an output layer. For example, referring to fig. 1, fig. 1 is a diagram of a graph neural network structure in the prior art, showing a typical graph convolutional neural network composed of an input layer (Input layer), two graph convolution layers (Gconv layers), and an output layer (Output layer). The input layer reads an n×d vertex feature matrix; each graph convolution layer extracts features from the vertex feature matrix, which is passed to the next graph convolution layer after a nonlinear activation function such as ReLU; finally, the output layer, namely the task layer, completes a specific task such as vertex classification or clustering. FIG. 1 shows a vertex classification task layer, which outputs the class label of each vertex. Currently, how to improve the classification accuracy is a problem to be solved.
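For orientation only, a bare-bones two-layer graph convolutional network of the kind described above can be sketched as follows (this is a generic illustration, not the network proposed in this application; the class name, the normalized adjacency matrix A_norm and the layer sizes are assumptions):

```python
import torch
import torch.nn as nn

class SimpleGCN(nn.Module):
    """Input layer reads an n x d feature matrix; two graph convolution layers
    with ReLU; the output layer produces a class probability per vertex."""
    def __init__(self, d, hidden, C):
        super().__init__()
        self.W1 = nn.Linear(d, hidden, bias=False)
        self.W2 = nn.Linear(hidden, C, bias=False)

    def forward(self, X, A_norm):                 # A_norm: normalized adjacency matrix
        H = torch.relu(A_norm @ self.W1(X))       # first graph convolution layer
        H = A_norm @ self.W2(H)                   # second graph convolution layer
        return H.softmax(dim=1)                   # vertex classification task layer
```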
Disclosure of Invention
In view of this, an object of the present application is to provide a method, an apparatus, a device and a medium for training a classification model, which can improve the classification accuracy of the classification model. The specific scheme is as follows:
in a first aspect, the present application discloses a classification model training method, including:
constructing a vertex characteristic matrix, an adjacency matrix and a vertex label matrix based on the graph data set; wherein the vertex label matrix comprises label information for each vertex of the graph dataset;
inputting the vertex characteristic matrix, the adjacency matrix and the vertex label matrix into a Teacher graph wavelet neural network in a classification model for supervised training, and determining corresponding supervised training loss in the training process;
inputting the vertex characteristic matrix and the adjacency matrix into a Student graph wavelet neural network in a classification model for unsupervised training, and determining corresponding unsupervised training loss in the training process;
determining a target training loss based on the supervised training loss and the unsupervised training loss;
and when the target training loss is converged, outputting the current classification model to obtain a trained classification model.
Optionally, the determining a corresponding supervised training loss in the training process includes:
in the training process, determining corresponding supervised training loss based on a first vertex label prediction result of a Teacher graph wavelet neural network and the vertex label matrix;
correspondingly, the determining the corresponding unsupervised training loss in the training process includes:
and in the training process, determining corresponding unsupervised training loss based on a second vertex label prediction result and the first vertex label prediction result of the Student graph wavelet neural network.
Optionally, the method further includes:
in the training process, updating the vertex label matrix by using the first vertex label prediction result;
and when the target training loss is converged, outputting the current vertex label matrix to obtain the class prediction result of each vertex without class labels.
Optionally, the method further includes:
calculating a graph wavelet transform basis and a graph wavelet inverse transform basis for the graph data set using Chebyshev polynomials;
correspondingly, the Teacher graph wavelet neural network and the Student graph wavelet neural network perform the graph convolution operation based on the graph wavelet transform basis and the graph wavelet inverse transform basis in the training process.
Optionally, the method further includes:
obtaining a calculation formula of the graph wavelet transform basis;
wherein the calculation formula is a formula defined based on spectrum theory.
Optionally, the Teacher graph wavelet neural network and the Student graph wavelet neural network both include an input layer, a plurality of graph convolution layers, and an output layer;
the graph convolution layer is used for sequentially performing feature transformation and graph convolution operation processing on the input data of the layer in the training process.
Optionally, the method further includes:
in the training process, determining the convolution kernel of the corresponding graph convolution layer in the Student graph wavelet neural network by utilizing the convolution kernel of the graph convolution layer obtained by training the Teacher graph wavelet neural network based on an attention mechanism.
In a second aspect, the present application discloses a classification model training apparatus, comprising:
the training data construction module is used for constructing a vertex characteristic matrix, an adjacency matrix and a vertex label matrix based on the graph data set; wherein the vertex label matrix comprises label information for each vertex of the graph dataset;
the classification model training module is used for inputting the vertex characteristic matrix, the adjacency matrix and the vertex label matrix into a Teacher graph wavelet neural network in a classification model for supervised training and determining corresponding supervised training loss in the training process; inputting the vertex characteristic matrix and the adjacency matrix into a Student graph wavelet neural network in a classification model for unsupervised training, and determining corresponding unsupervised training loss in the training process; determining a target training loss based on the supervised training loss and the unsupervised training loss; and when the target training loss is converged, outputting the current classification model to obtain a trained classification model.
In a third aspect, the present application discloses an electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the aforementioned classification model training method.
In a fourth aspect, the present application discloses a computer-readable storage medium for storing a computer program which, when executed by a processor, implements the aforementioned classification model training method.
Therefore, in the present application, a vertex feature matrix, an adjacency matrix and a vertex label matrix are constructed based on the graph data set, the vertex label matrix comprising the label information of each vertex of the graph data set; the vertex feature matrix, the adjacency matrix and the vertex label matrix are then input into the Teacher graph wavelet neural network in the classification model for supervised training, and the corresponding supervised training loss is determined in the training process; the vertex feature matrix and the adjacency matrix are input into the Student graph wavelet neural network in the classification model for unsupervised training, and the corresponding unsupervised training loss is determined in the training process; a target training loss is determined based on the supervised training loss and the unsupervised training loss; and when the target training loss converges, the current classification model is output to obtain the trained classification model. In this way, the vertex feature matrix and the adjacency matrix of the graph data set are both input into graph neural networks for training, so that the graph topology and the vertex features are jointly utilized; moreover, supervised training and unsupervised training are combined during training, fully exploiting the advantages of both, so the classification accuracy of the classification model can be improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a diagram of a prior art neural network architecture;
FIG. 2 is a flow chart of a classification model training method disclosed herein;
FIG. 3 is a flow chart of a particular classification model training method disclosed herein;
FIG. 4 is a diagram of a classification model architecture disclosed herein;
FIG. 5 is a diagram of a particular classification model architecture disclosed herein;
FIG. 6 is a flow chart of a particular classification model training method disclosed herein;
FIG. 7 is a schematic structural diagram of a classification model training apparatus according to the present disclosure;
fig. 8 is a block diagram of an electronic device disclosed in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 2, an embodiment of the present application discloses a classification model training method, including:
step S11: constructing a vertex characteristic matrix, an adjacency matrix and a vertex label matrix based on the graph data set; wherein the vertex label matrix comprises label information for each vertex of the graph dataset;
The label information indicates either the corresponding class label of a vertex or that the vertex has no class label.
In a specific embodiment, the graph data set is assumed to be G = (V, E), where V denotes the vertex set and E denotes the set of connecting edges. V is divided into a small set of vertices with class labels, V_L, and a much larger set of vertices without class labels, V_U, such that V_L ∪ V_U = V and V_L ∩ V_U = ∅. In addition to its label (if any), each vertex v of G has d features, and the features of all vertices form an n×d vertex feature matrix, denoted X. The adjacency matrix of G is denoted A, whose element A_ij represents the weight of the connecting edge between vertices i and j. From the existing labeled vertex set V_L, an n×C vertex label matrix Y is constructed, where n denotes the number of all vertices in the graph and C denotes the number of label classes of all vertices. The matrix element Y_ij indicates whether the class label of vertex i is j: when vertex i has a class label, the element in the corresponding column j of that row is set to 1 and the remaining column elements are set to 0, namely:

Y_ij = 1 if vertex i is labeled with class j, and Y_ij = 0 otherwise.

When vertex i has no class label, every column element of the corresponding row is set to 0.
Step S12: and inputting the vertex characteristic matrix, the adjacency matrix and the vertex label matrix into a Teacher graph wavelet neural network in a classification model for supervised training, and determining corresponding supervised training loss in the training process.
Step S13: and inputting the vertex characteristic matrix and the adjacency matrix into a Student graph wavelet neural network in a classification model for unsupervised training, and determining corresponding unsupervised training loss in the training process.
In a specific implementation, during training the supervised training loss is determined based on the first vertex label prediction result of the Teacher graph wavelet neural network and the vertex label matrix, and the unsupervised training loss is determined based on the second vertex label prediction result of the Student graph wavelet neural network and the first vertex label prediction result.
Specifically, the first vertex label prediction result is compared with the vertex label matrix to calculate the supervised training loss, and the second vertex label prediction result is compared with the first vertex label prediction result to calculate the unsupervised training loss.
Step S14: determining a target training loss based on the supervised training loss and the unsupervised training loss.
In a specific embodiment, the target training loss is calculated as follows:

L = L_s + λ·L_u

where L_s denotes the supervised training loss, L_u denotes the unsupervised training loss, and λ is a constant used to adjust the proportion of the unsupervised training loss in the target loss. Z_T denotes the first vertex label prediction result and Z_S denotes the second vertex label prediction result.

Both Z_T and Z_S are n×C matrices. Each column vector Z_·j of Z_T or Z_S indicates the probability that every vertex belongs to class j; that is, each element Z_ij represents the probability that vertex i belongs to class j.
It should be noted that, in the embodiments of the present application, the output layers of the Teacher graph wavelet neural network and the Student graph wavelet neural network may be defined as

Z = softmax(ψ_r F^(L) ψ_r^(-1) Y^(L))

where ψ_r is the graph wavelet transform basis, ψ_r^(-1) is the graph wavelet inverse transform basis, F^(L) denotes the graph convolution kernel matrix of the L-th graph convolution layer, Y^(L) denotes the vertex feature transformation result of the L-th layer, and both the Teacher graph wavelet neural network and the Student graph wavelet neural network contain L graph convolution layers.

Moreover, the supervised training loss function calculates, based on the cross-entropy principle, the degree of difference between the actual label probability distribution and the predicted label probability distribution of the vertices; the unsupervised training loss function calculates the sum of the squares of the differences between the elements of Z_T and Z_S at the same coordinates.
Thus, when the training of the whole network ends, the outputs Z_T and Z_S of the two networks agree, or their difference is negligible. The output Z_T of the Teacher graph wavelet neural network is taken as the output of the entire network model.
In the training process, the vertex label matrix is updated using the first vertex label prediction result. Specifically, for each vertex without a class label, that is, each vertex in V_U, the class with the highest probability in the first vertex label prediction result is taken as the latest class of that vertex, and the vertex label matrix is updated accordingly.
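A small sketch of this label-matrix update (function and variable names are assumptions):

```python
import numpy as np

def update_label_matrix(Y, Z_T, unlabeled_ids):
    """For each unlabeled vertex, take the most probable class in the Teacher's
    prediction Z_T as its latest class and refresh the corresponding row of Y."""
    for v in unlabeled_ids:
        c = int(np.argmax(Z_T[v]))   # class with the highest predicted probability
        Y[v] = 0.0
        Y[v, c] = 1.0
    return Y
```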
Step S15: and when the target training loss is converged, outputting the current classification model to obtain a trained classification model.
And when the target training loss is converged, outputting the current vertex label matrix to obtain the class prediction result of each vertex without class labels.
In a specific embodiment, the target training loss is considered to have converged when it reaches a preset threshold (usually a small value) or the number of iterations reaches the specified maximum, and training ends. At this point, for a vertex without a class label, the class to which it belongs is obtained from the current vertex label matrix.
That is, the present application fuses the prediction of unlabeled vertices into the training process: the vertex label matrix is updated according to the result of each training iteration, and the class label of any unlabeled vertex can be obtained once training ends.
In a specific embodiment, the network parameters of each layer of the graph wavelet neural network may be initialized according to a specific strategy, such as normal-distribution random initialization, Xavier initialization or He initialization. In the training process, the network parameters of each layer may be modified and updated according to a specific strategy, such as SGD (Stochastic Gradient Descent), MGD (Momentum Gradient Descent), Nesterov momentum, AdaGrad (Adaptive Gradient algorithm), RMSprop (Root Mean Square Propagation), Adam (Adaptive Moment Estimation) or BGD (Batch Gradient Descent), so as to optimize the loss function value.
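For illustration only, Xavier initialization of one layer's parameters and an Adam-based update in PyTorch could look like this (the shapes and the near-identity kernel initialization are assumptions):

```python
import torch

def init_layer_params(n, p, q):
    """Xavier-initialize one graph convolution layer: a p x q feature-transformation
    matrix W and an n-dimensional diagonal convolution kernel f."""
    W = torch.empty(p, q)
    torch.nn.init.xavier_uniform_(W)
    f = torch.ones(n)                          # kernel started near the identity
    return torch.nn.Parameter(W), torch.nn.Parameter(f)

# params would gather the parameters of every layer of both networks, e.g.:
# optimizer = torch.optim.Adam(params, lr=1e-2)
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```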
Therefore, in the present application, a vertex feature matrix, an adjacency matrix and a vertex label matrix are constructed based on the graph data set, the vertex label matrix comprising the label information of each vertex of the graph data set; the vertex feature matrix, the adjacency matrix and the vertex label matrix are then input into the Teacher graph wavelet neural network in the classification model for supervised training, and the corresponding supervised training loss is determined in the training process; the vertex feature matrix and the adjacency matrix are input into the Student graph wavelet neural network in the classification model for unsupervised training, and the corresponding unsupervised training loss is determined in the training process; a target training loss is determined based on the supervised training loss and the unsupervised training loss; and when the target training loss converges, the current classification model is output to obtain the trained classification model. In this way, the vertex feature matrix and the adjacency matrix of the graph data set are both input into graph neural networks for training, so that the graph topology and the vertex features are jointly utilized; moreover, supervised training and unsupervised training are combined during training, fully exploiting the advantages of both, so the classification accuracy of the classification model can be improved.
Referring to fig. 3, the embodiment of the present application discloses a specific classification model training method, including:
step S21: and obtaining a calculation formula of the wavelet transform base of the image.
Wherein the calculation formula is a formula defined based on spectrum theory.
It should be noted that the graph convolution operation defined via the Fourier transform has poor locality in the vertex domain; defining the graph wavelet transform basis by means of spectrum theory ensures the locality of the graph convolution computation.
Step S22: A graph wavelet transform basis and a graph wavelet inverse transform basis for the graph data set are calculated using Chebyshev polynomials.
In a specific embodiment, the calculation formula of the graph wavelet transform basis is

ψ_r = U G_r U^T

where ψ_r denotes the graph wavelet transform basis extracted from the graph data set G, and U denotes the matrix of eigenvectors obtained by eigendecomposition of the Laplacian matrix L = D − A of the graph data set G; D is a diagonal matrix whose n main-diagonal elements are the degrees of the n vertices and whose remaining elements are zero. r is the scale, G_r = diag(g(rλ_1), …, g(rλ_n)) is the scaling matrix, and λ_1, …, λ_n are the eigenvalues obtained by eigendecomposition of the Laplacian matrix of the graph G. The graph wavelet inverse transform basis ψ_r^(-1) can be obtained by replacing g(rλ_i) in G_r with g(−rλ_i). Because the eigendecomposition of the matrix is computationally expensive, Chebyshev polynomials are used to approximate the computation of the graph wavelet transform basis and the graph wavelet inverse transform basis, thereby avoiding that cost.
Correspondingly, the Teacher graph wavelet neural network and the Student graph wavelet neural network perform the graph convolution operation based on the graph wavelet transform basis and the graph wavelet inverse transform basis in the training process.
It should be noted that, in the prior art, the graph Fourier transform is inefficient during the graph convolution operation because the eigenvector matrix of the Laplacian matrix is dense; in the present embodiment, the graph convolution operation is performed based on the graph wavelet transform basis and the graph wavelet inverse transform basis, which are sparse, so that the efficiency of the graph convolution operation can be improved.
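The following sketch shows the exact, eigendecomposition-based construction of the two bases with a heat-kernel scaling function g(rλ) = exp(−rλ); the choice of heat kernel and all names are assumptions for illustration, and in practice a Chebyshev expansion would replace the eigendecomposition to avoid its cost:

```python
import numpy as np

def graph_wavelet_bases(A, r=1.0):
    """Compute the graph wavelet transform basis psi and its inverse for a graph
    with adjacency matrix A, using the Laplacian L = D - A and a heat-kernel
    scaling function g(r*lambda) = exp(-r*lambda)."""
    D = np.diag(A.sum(axis=1))          # diagonal degree matrix
    L = D - A                           # graph Laplacian
    lam, U = np.linalg.eigh(L)          # eigendecomposition (costly for large graphs)
    G_r = np.diag(np.exp(-r * lam))     # scaling matrix g(r*lambda_i)
    G_r_inv = np.diag(np.exp(r * lam))  # replace g(r*lambda_i) by g(-r*lambda_i)
    psi = U @ G_r @ U.T                 # graph wavelet transform basis
    psi_inv = U @ G_r_inv @ U.T         # graph wavelet inverse transform basis
    return psi, psi_inv
```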
Step S23: constructing a vertex characteristic matrix, an adjacency matrix and a vertex label matrix based on the graph data set; wherein the vertex label matrix comprises label information for each vertex of the graph dataset, the label information representing a corresponding class label or no class label.
Step S24: and inputting the vertex characteristic matrix, the adjacency matrix and the vertex label matrix into a Teacher graph wavelet neural network in a classification model for supervised training, and determining corresponding supervised training loss in the training process.
Step S25: and inputting the vertex characteristic matrix and the adjacency matrix into a Student graph wavelet neural network in a classification model for unsupervised training, and determining corresponding unsupervised training loss in the training process.
In a specific implementation, the Teacher graph wavelet neural network and the Student graph wavelet neural network each comprise an input layer, a plurality of graph convolution layers, and an output layer; the graph convolution layer is used for sequentially performing feature transformation and graph convolution operation processing on the input data of the layer in the training process. In particular, each network may include 1 input layer, L graph convolution layers, and an output layer.
That is, in the embodiment of the present application, the graph convolution layer first performs feature transformation on the input data of the layer and then performs the graph convolution operation. Dividing the graph convolution layer into these two processing stages, feature transformation and graph convolution, reduces the number of network parameters, thereby reducing the amount of model computation and improving model training efficiency.
In the l-th graph convolution layer:

feature transformation: Y^(l) = X^(l) W^(l)

graph convolution: X^(l+1) = h(ψ_r F^(l) ψ_r^(-1) Y^(l))

where X^(l) and X^(l+1) are respectively the input and output data of the l-th graph convolution (hidden) layer, with X^(1) = X; W^(l) is the feature transformation matrix to be trained in the l-th layer; Y^(l) is the feature transformation result of the l-th layer; F^(l) is the graph convolution kernel matrix of the l-th layer; h is a nonlinear activation function; and the superscript T denotes the matrix transpose operation.
It should be noted that, in the prior art, the definition of the graph convolution layer usually does not separate the feature transformation from the convolution operation. If the graph convolution layer were not divided into the two processing stages of feature transformation and graph convolution, then, in combination with the graph wavelet transform basis of the embodiment of the present application, the graph convolution layer would be defined by the following formula:

X^(m+1) = h(ψ_r F^(m) ψ_r^(-1) X^(m) W^(m))

where X denotes the vertex feature matrix, m denotes the ordinal number of the graph convolution layer, F is the graph convolution kernel matrix, and h is an activation function. A graph convolution layer defined in this way contains n × p × q parameters, where n denotes the number of vertices in the graph, p denotes the vertex feature dimension of the layer's input, and q denotes the vertex feature dimension of the layer's output. In the embodiment of the present application, the feature transformation is stripped from the graph convolution operation, and the number of parameters of each graph convolution layer becomes n + p × q.
In addition, in a specific embodiment, in the training process, the convolution kernel of the corresponding graph convolution layer in the Student graph wavelet neural network is determined by using the convolution kernel of the graph convolution layer obtained by training the Teacher graph wavelet neural network based on an attention mechanism.
Specifically, the classification model may include a Teacher graph wavelet neural network, a Student graph wavelet neural network, and an attention network connecting each pair of graph convolution layers of the Teacher graph wavelet neural network and the Student graph wavelet neural network.
In this specification, F^(l) denotes the graph convolution kernel matrix of the l-th layer, which is a diagonal matrix. From the point of view of signal processing, the diagonal elements of F^(l) can be regarded as graph frequencies, each representing the importance of the feature vector corresponding to that frequency. Denote the l-th layer convolution kernel matrices of the Teacher graph wavelet neural network and the Student graph wavelet neural network by T^l and S^l, respectively; they are obtained by diagonalizing the l-th layer convolution kernel t^l of the Teacher graph wavelet neural network and the l-th layer convolution kernel s^l of the Student graph wavelet neural network, both of which are n-dimensional column vectors.

In this embodiment, attention transfer may be performed based on an attention mechanism: the Teacher graph wavelet neural network transfers the convolution kernel learned in each layer to the corresponding layer of the Student graph wavelet neural network, that is, the Student graph wavelet neural network learns from the Teacher graph wavelet neural network, improving the performance of the whole network. Specifically, a single-layer feedforward neural network can be designed: its input layer is responsible for reading the l-th layer convolution kernels t^l and s^l of the Teacher graph wavelet neural network and the Student graph wavelet neural network; its hidden layer implements an attention function a(t^l, s^l) in order to obtain the attention weight e^l between the two vectors. The attention weight e^l is then normalized by the softmax function, giving the normalized attention weight α^l:

α_i^l = exp(e_i^l) / Σ_j exp(e_j^l)

where e_i^l denotes the i-th component of e^l and α_i^l denotes the i-th component of α^l. The l-th layer convolution kernel that the Student graph wavelet neural network learns from the Teacher graph wavelet neural network is then obtained by combining t^l and s^l according to the normalized attention weight α^l.
It should be noted that the addition of the attention mechanism promotes the Student graph wavelet neural network to quickly utilize the knowledge mastered by the Teacher graph wavelet neural network, and the training speed is increased.
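One way this attention transfer could look in code (a sketch only; the per-component attention function and the convex combination of the two kernels are assumptions, since the exact combination formula is not reproduced here):

```python
import torch
import torch.nn as nn

class KernelAttention(nn.Module):
    """Single-layer feedforward attention between the Teacher's and Student's
    l-th layer convolution kernels t_l and s_l (both n-dimensional vectors)."""
    def __init__(self):
        super().__init__()
        self.a = nn.Linear(2, 1)                     # attention function acting per component

    def forward(self, t_l, s_l):
        pair = torch.stack([t_l, s_l], dim=1)        # (n, 2)
        e_l = self.a(pair).squeeze(1)                # unnormalized attention weights (n,)
        alpha_l = torch.softmax(e_l, dim=0)          # normalized attention weights
        # combine Teacher and Student kernels component-wise (assumed combination rule)
        return alpha_l * t_l + (1.0 - alpha_l) * s_l
```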
Step S26: determining a target training loss based on the supervised training loss and the unsupervised training loss.
Step S27: and when the target training loss is converged, outputting the current classification model to obtain a trained classification model.
For example, referring to fig. 4, fig. 4 is a diagram of a classification model structure disclosed in the embodiment of the present application, showing the Teacher graph wavelet neural network GWN_T and the Student graph wavelet neural network GWN_S. Further, referring to fig. 5, fig. 5 is a specific classification model structure diagram disclosed in the embodiment of the present application. The classification model is composed of the Teacher graph wavelet neural network GWN_T, the Student graph wavelet neural network GWN_S, and an attention network connecting each pair of graph convolution layers of the two networks. GWN_T performs supervised learning on the labeled graph vertices and achieves high prediction accuracy; GWN_S, under the guidance of GWN_T (using its prediction results), performs unsupervised learning on the unlabeled graph vertices, thereby improving the prediction accuracy and yielding a better vertex classification model. The attention network transfers the "knowledge" learned by each layer of GWN_T, namely its convolution kernel, to the corresponding layer of GWN_S, that is, GWN_S learns from GWN_T. GWN_T and GWN_S each comprise 1 input layer, L graph convolution layers and 1 output layer. The input layer is mainly used for reading the data of the graph to be classified, including the adjacency matrix A representing the graph topology and the vertex feature matrix X. In the graph convolution layer, the graph convolution operation is decomposed into the two stages of feature transformation and graph convolution. The output layer is used for outputting the prediction result.
In addition, in the whole classification model, the network parameters of each layer include the feature transformation matrices W^(l) (those of the Teacher graph wavelet neural network and those of the Student graph wavelet neural network), the convolution kernels (the convolution kernel t^l of the Teacher graph wavelet neural network and the convolution kernel s^l of the Student graph wavelet neural network, from which the convolution kernel matrices F^(l) are further updated), and the attention network parameters. In the initialization stage, the network parameters are initialized, and in the training process, the network parameters are updated.
For example, referring to fig. 6, the embodiment of the present application discloses a flow chart of a specific classification model training method. For a given graph data set G, its adjacency matrix A, vertex feature matrix X and vertex label matrix Y are taken as input and fed into the network for forward propagation; the prediction results of all vertices belonging to each category are calculated and the prediction result matrix is updated; the loss of the supervised learning part and the loss of the unsupervised learning part are calculated to obtain the total network loss function value; the network parameters of each layer are updated according to a certain strategy; and training ends when the network error reaches a specified small value or the number of iterations reaches the specified maximum.
For example, a method according to an embodiment of the present application trains a classification model using a scientific paper set and predicts class labels of unlabeled scientific papers.
(1) Download the citation network data set Citeseer, which comprises 3312 scientific papers divided into six categories and 4732 citation relations among the papers. A bag-of-words model is used to construct the feature vector x of each paper, and the feature vectors of all papers form the feature matrix X. The adjacency matrix A is constructed according to the citation relations among the papers. The goal is to classify each paper; 20 instances per category are randomly drawn as labeled data, 1000 instances are used as test data, and the rest are used as unlabeled data. The vertex label matrix Y is then constructed.
(2) Defining a network structure: graph convolution layers, output layers, and network loss functions are defined based on the foregoing disclosure.
(3) The graph wavelet transform basis and the graph wavelet inverse transform basis are calculated using a Chebyshev polynomial approximation.
(4) And initializing the network parameters according to a regularization initialization method.
(5) With A, X and Y as network input, the data are fed into the network for forward propagation. The Teacher graph wavelet neural network GWN_T takes A, X and Y as input, and the Student graph wavelet neural network GWN_S takes A and X as input. Each network, according to the definition of the graph convolution layer and in combination with the input feature matrix of each layer, calculates the output feature matrix of each layer; according to the definition of the output layer, the prediction results of all vertices belonging to each category, Z_T or Z_S, are calculated; the supervised learning loss function value and the unsupervised learning loss function value are calculated according to the defined network loss function, and the loss function value of the whole network is then obtained. For each unlabeled vertex, the class with the highest probability is taken as the latest class of the vertex, and the vertex label matrix Y is updated.
(6) According to the optimization method, the gradient of the loss function with respect to the network parameters is calculated and propagated backwards so as to optimize the network parameters; training ends when the network prediction error reaches a specified small value or the number of iterations reaches the specified maximum. At this point, for vertices without class labels, the category to which each belongs can be obtained from the vertex label matrix Y.
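An end-to-end training loop following steps (1)-(6) might look roughly like this (model_T and model_S are assumed to be callables returning class-probability matrices, target_training_loss is the loss sketch given earlier, and all hyperparameters are illustrative):

```python
import torch

def train(model_T, model_S, attn_params, X, Y, labeled_mask, unlabeled_ids,
          psi, psi_inv, lam=0.5, lr=1e-2, max_iters=200, tol=1e-4):
    params = list(model_T.parameters()) + list(model_S.parameters()) + list(attn_params)
    opt = torch.optim.Adam(params, lr=lr)
    for it in range(max_iters):
        Z_T = model_T(X, psi, psi_inv)            # Teacher: supervised branch
        Z_S = model_S(X, psi, psi_inv)            # Student: unsupervised branch
        loss = target_training_loss(Z_T, Z_S, Y, labeled_mask, lam)
        opt.zero_grad()
        loss.backward()                           # back-propagate and update parameters
        opt.step()
        with torch.no_grad():                     # fuse predictions into training:
            for v in unlabeled_ids:               # refresh label matrix rows from Z_T
                Y[v] = 0.0
                Y[v, Z_T[v].argmax()] = 1.0
        if loss.item() < tol:                     # convergence check
            break
    return Z_T.detach()                           # Teacher output is the model output
```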
Of course, the present application is not limited to the scientific citation classification problem given in the example; it can also be applied to any classification problem on data that can be modeled as a graph, such as proteins or graphic images. Examples include studying how infectious diseases and opinions spread over time in a social network; how groups in a social network form communities around specific interests or membership relations, and how strong those community ties are; finding people with similar interests ("birds of a feather flock together") and suggesting or recommending new links or connections to them; directing questions in a question-answering system to the most experienced person; and displaying advertisements on a particular topic to the individuals most interested in and willing to accept them.
Referring to fig. 7, an embodiment of the present application discloses a classification model training apparatus, including:
the training data construction module 11 is used for constructing a vertex feature matrix, an adjacency matrix and a vertex label matrix based on the graph data set; wherein the vertex label matrix comprises label information for each vertex of the graph dataset;
the classification model training module 12 is configured to input the vertex feature matrix, the adjacency matrix, and the vertex label matrix into a Teacher graph wavelet neural network in a classification model for supervised training, and determine a corresponding supervised training loss in a training process; inputting the vertex characteristic matrix and the adjacency matrix into a Student graph wavelet neural network in a classification model for unsupervised training, and determining corresponding unsupervised training loss in the training process; determining a target training loss based on the supervised training loss and the unsupervised training loss; and when the target training loss is converged, outputting the current classification model to obtain a trained classification model.
Therefore, in the present application, a vertex feature matrix, an adjacency matrix and a vertex label matrix are constructed based on the graph data set, the vertex label matrix comprising the label information of each vertex of the graph data set; the vertex feature matrix, the adjacency matrix and the vertex label matrix are then input into the Teacher graph wavelet neural network in the classification model for supervised training, and the corresponding supervised training loss is determined in the training process; the vertex feature matrix and the adjacency matrix are input into the Student graph wavelet neural network in the classification model for unsupervised training, and the corresponding unsupervised training loss is determined in the training process; a target training loss is determined based on the supervised training loss and the unsupervised training loss; and when the target training loss converges, the current classification model is output to obtain the trained classification model. In this way, the vertex feature matrix and the adjacency matrix of the graph data set are both input into graph neural networks for training, so that the graph topology and the vertex features are jointly utilized; moreover, supervised training and unsupervised training are combined during training, fully exploiting the advantages of both, so the classification accuracy of the classification model can be improved.
The classification model training module 12 is specifically configured to determine, in a training process, a corresponding supervised training loss based on a first vertex label prediction result of a Teacher graph wavelet neural network and the vertex label matrix; and determining corresponding unsupervised training loss based on the second vertex label prediction result and the first vertex label prediction result of the Student graph wavelet neural network.
The classification model training module 12 is further configured to: in the training process, updating the vertex label matrix by using the first vertex label prediction result; and when the target training loss is converged, outputting the current vertex label matrix to obtain the class prediction result of each vertex without class labels.
The apparatus also includes a graph wavelet transform basis calculation module for calculating a graph wavelet transform basis and a graph wavelet inverse transform basis of the graph data set using Chebyshev polynomials; correspondingly, the Teacher graph wavelet neural network and the Student graph wavelet neural network perform the graph convolution operation based on the graph wavelet transform basis and the graph wavelet inverse transform basis in the training process.
The device also comprises a graph wavelet transform basis formula acquisition module for acquiring a calculation formula of the graph wavelet transform basis; wherein the calculation formula is a formula defined based on spectrum theory.
In a specific implementation, the Teacher graph wavelet neural network and the Student graph wavelet neural network each comprise an input layer, a plurality of graph convolution layers, and an output layer;
the graph convolution layer is used for sequentially performing feature transformation and graph convolution operation processing on the input data of the layer in the training process.
The classification model training module 12 is further configured to determine, based on an attention mechanism, a convolution kernel of a corresponding graph convolution layer in the Student graph wavelet neural network by using the convolution kernel of the graph convolution layer obtained by training the Teacher graph wavelet neural network in a training process.
Referring to fig. 8, an embodiment of the present application discloses an electronic device 20, which includes a processor 21 and a memory 22; wherein the memory 22 is used for storing a computer program, and the processor 21 is configured to execute the computer program to implement the classification model training method disclosed in the foregoing embodiments.
For the specific process of the above classification model training method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.
The memory 22 is used as a carrier for resource storage, and may be a read-only memory, a random access memory, a magnetic disk or an optical disk, and the storage mode may be a transient storage mode or a permanent storage mode.
In addition, the electronic device 20 further includes a power supply 23, a communication interface 24, an input-output interface 25, and a communication bus 26; the power supply 23 is configured to provide an operating voltage for each hardware device on the server 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and a communication protocol followed by the communication interface is any communication protocol applicable to the technical solution of the present application, and is not specifically limited herein; the input/output interface 25 is configured to obtain external input data or output data to the outside, and a specific interface type thereof may be selected according to a specific application requirement, which is not specifically limited herein.
Further, a computer-readable storage medium for storing a computer program is disclosed in an embodiment of the present application, where the computer program is executed by a processor to implement the classification model training method disclosed in the foregoing embodiment.
For the specific process of the above classification model training method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above detailed description is given to a classification model training method, apparatus, device and medium provided by the present application, and a specific example is applied in the present application to explain the principle and the implementation of the present application, and the description of the above embodiment is only used to help understanding the method and the core idea of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A classification model training method is characterized by comprising the following steps:
constructing a vertex characteristic matrix, an adjacency matrix and a vertex label matrix based on the graph data set; wherein the vertex label matrix comprises label information for each vertex of the graph dataset;
inputting the vertex characteristic matrix, the adjacency matrix and the vertex label matrix into a Teacher graph wavelet neural network in a classification model for supervised training, and determining corresponding supervised training loss in the training process;
inputting the vertex characteristic matrix and the adjacency matrix into a Student graph wavelet neural network in a classification model for unsupervised training, and determining corresponding unsupervised training loss in the training process;
determining a target training loss based on the supervised training loss and the unsupervised training loss;
when the target training loss is converged, outputting a current classification model to obtain a trained classification model;
the device comprises a Teacher diagram wavelet neural network, a Student diagram wavelet neural network and a terminal, wherein the Teacher diagram wavelet neural network and the Student diagram wavelet neural network respectively comprise an input layer, a plurality of diagram convolution layers and an output layer;
and, the method further comprises: in the training process, determining the convolution kernel of the corresponding graph convolution layer in the Student graph wavelet neural network by utilizing the convolution kernel of the graph convolution layer obtained by training the Teacher graph wavelet neural network based on an attention mechanism.
2. The method of classification model training according to claim 1, wherein the determining a corresponding supervised training loss during the training process comprises:
in the training process, determining corresponding supervised training loss based on a first vertex label prediction result of a Teacher graph wavelet neural network and the vertex label matrix;
correspondingly, the determining the corresponding unsupervised training loss in the training process includes:
and in the training process, determining corresponding unsupervised training loss based on a second vertex label prediction result and the first vertex label prediction result of the Student graph wavelet neural network.
3. The classification model training method according to claim 2, further comprising:
in the training process, updating the vertex label matrix by using the first vertex label prediction result;
and when the target training loss is converged, outputting the current vertex label matrix to obtain the class prediction result of each vertex without class labels.
4. The classification model training method according to claim 1, further comprising:
calculating a graph wavelet transform basis and a graph wavelet inverse transform basis for the graph data set using Chebyshev polynomials;
correspondingly, the Teacher graph wavelet neural network and the Student graph wavelet neural network perform the graph convolution operation based on the graph wavelet transform basis and the graph wavelet inverse transform basis in the training process.
5. The classification model training method according to claim 4, further comprising:
obtaining a calculation formula of the graph wavelet transform basis;
wherein the calculation formula is a formula defined based on spectrum theory.
6. The method for training classification models according to any one of claims 1 to 5, wherein the graph convolution layer is used for sequentially performing feature transformation and graph convolution operation processing on input data of the layer in a training process.
7. A classification model training apparatus, comprising:
a training data construction module, configured to construct a vertex feature matrix, an adjacency matrix and a vertex label matrix based on a graph data set, wherein the vertex label matrix comprises label information for each vertex of the graph data set;
a classification model training module, configured to input the vertex feature matrix, the adjacency matrix and the vertex label matrix into a Teacher graph wavelet neural network in a classification model for supervised training and determine the corresponding supervised training loss during training; input the vertex feature matrix and the adjacency matrix into a Student graph wavelet neural network in the classification model for unsupervised training and determine the corresponding unsupervised training loss during training; determine a target training loss based on the supervised training loss and the unsupervised training loss; and, when the target training loss converges, output the current classification model to obtain a trained classification model;
wherein the Teacher graph wavelet neural network and the Student graph wavelet neural network each comprise an input layer, a plurality of graph convolution layers and an output layer;
and the classification model training module is further configured to: during training, determine the convolution kernel of each graph convolution layer in the Student graph wavelet neural network from the convolution kernel of the corresponding graph convolution layer obtained by training the Teacher graph wavelet neural network, based on an attention mechanism.
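One possible, deliberately simple reading of the attention mechanism named in claims 1 and 7: the Student's convolution kernel for a layer is an attention-weighted combination of the Teacher's trained kernels. The dot-product-softmax scoring below is an assumption; the claims do not specify the scoring function.

```python
# Hedged sketch: derive a Student layer kernel from the Teacher's trained kernels
# via attention weights (assumed dot-product scoring).
import torch
import torch.nn.functional as F

def student_kernel_from_teacher(student_theta, teacher_thetas):
    """student_theta: (n,) current Student kernel for one layer, used as the query.
    teacher_thetas: (L, n) trained Teacher kernels, one row per graph convolution layer."""
    scores = teacher_thetas @ student_theta     # (L,) dot-product attention scores
    weights = F.softmax(scores, dim=0)          # attention distribution over Teacher layers
    return weights @ teacher_thetas             # (n,) attention-weighted Teacher kernel
```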
8. The classification model training device according to claim 7, wherein the classification model training module is specifically configured to: during training, determine the corresponding supervised training loss based on a first vertex label prediction result of the Teacher graph wavelet neural network and the vertex label matrix; and determine the corresponding unsupervised training loss based on a second vertex label prediction result of the Student graph wavelet neural network and the first vertex label prediction result.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the classification model training method according to any one of claims 1 to 6.
10. A computer-readable storage medium for storing a computer program which, when executed by a processor, implements the classification model training method according to any one of claims 1 to 6.
CN202110613729.6A 2021-06-02 2021-06-02 Classification model training method, device, equipment and medium Pending CN113255798A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110613729.6A CN113255798A (en) 2021-06-02 2021-06-02 Classification model training method, device, equipment and medium
PCT/CN2021/121905 WO2022252458A1 (en) 2021-06-02 2021-09-29 Classification model training method and apparatus, device, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110613729.6A CN113255798A (en) 2021-06-02 2021-06-02 Classification model training method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN113255798A true CN113255798A (en) 2021-08-13

Family

ID=77186018

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110613729.6A Pending CN113255798A (en) 2021-06-02 2021-06-02 Classification model training method, device, equipment and medium

Country Status (2)

Country Link
CN (1) CN113255798A (en)
WO (1) WO2022252458A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364372A (en) * 2020-10-27 2021-02-12 重庆大学 Privacy protection method with supervision matrix completion
CN114048816A (en) * 2021-11-16 2022-02-15 中国人民解放军国防科技大学 Method, device and equipment for sampling graph neural network data and storage medium
CN114943324A (en) * 2022-05-26 2022-08-26 中国科学院深圳先进技术研究院 Neural network training method, human motion recognition method and device, and storage medium
CN115240037A (en) * 2022-09-23 2022-10-25 卡奥斯工业智能研究院(青岛)有限公司 Model training method, image processing method, device and storage medium
WO2022252458A1 (en) * 2021-06-02 2022-12-08 苏州浪潮智能科技有限公司 Classification model training method and apparatus, device, and medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102492318B1 (en) * 2015-09-18 2023-01-26 삼성전자주식회사 Model training method and apparatus, and data recognizing method
CN111552803B (en) * 2020-04-08 2023-03-24 西安工程大学 Text classification method based on graph wavelet network model
CN111639755B (en) * 2020-06-07 2023-04-25 电子科技大学中山学院 Network model training method and device, electronic equipment and storage medium
CN112464057A (en) * 2020-11-18 2021-03-09 苏州浪潮智能科技有限公司 Network data classification method, device, equipment and readable storage medium
CN113255798A (en) * 2021-06-02 2021-08-13 苏州浪潮智能科技有限公司 Classification model training method, device, equipment and medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364372A (en) * 2020-10-27 2021-02-12 重庆大学 Privacy protection method with supervision matrix completion
WO2022252458A1 (en) * 2021-06-02 2022-12-08 苏州浪潮智能科技有限公司 Classification model training method and apparatus, device, and medium
CN114048816A (en) * 2021-11-16 2022-02-15 中国人民解放军国防科技大学 Method, device and equipment for sampling graph neural network data and storage medium
CN114048816B (en) * 2021-11-16 2024-04-30 中国人民解放军国防科技大学 Method, device, equipment and storage medium for sampling data of graph neural network
CN114943324A (en) * 2022-05-26 2022-08-26 中国科学院深圳先进技术研究院 Neural network training method, human motion recognition method and device, and storage medium
CN114943324B (en) * 2022-05-26 2023-10-13 中国科学院深圳先进技术研究院 Neural network training method, human motion recognition method and device, and storage medium
CN115240037A (en) * 2022-09-23 2022-10-25 卡奥斯工业智能研究院(青岛)有限公司 Model training method, image processing method, device and storage medium

Also Published As

Publication number Publication date
WO2022252458A1 (en) 2022-12-08

Similar Documents

Publication Publication Date Title
WO2023000574A1 (en) Model training method, apparatus and device, and readable storage medium
CN110084296B (en) Graph representation learning framework based on specific semantics and multi-label classification method thereof
Ozdemir et al. Feature Engineering Made Easy: Identify unique features from your dataset in order to build powerful machine learning systems
US11227190B1 (en) Graph neural network training methods and systems
CN113255798A (en) Classification model training method, device, equipment and medium
US20180336472A1 (en) Projection neural networks
CN112116599B (en) Sputum smear tubercle bacillus semantic segmentation method and system based on weak supervised learning
CN114048331A (en) Knowledge graph recommendation method and system based on improved KGAT model
CN110781407A (en) User label generation method and device and computer readable storage medium
CN116010684A (en) Article recommendation method, device and storage medium
US20220138502A1 (en) Graph neural network training methods and systems
CN112085615A (en) Method and device for training graph neural network
CN113392317A (en) Label configuration method, device, equipment and storage medium
Liu et al. Research of animals image semantic segmentation based on deep learning
US20240177006A1 (en) Data processing method and apparatus, program product, computer device, and medium
CN116664719A (en) Image redrawing model training method, image redrawing method and device
CN111695046A (en) User portrait inference method and device based on spatio-temporal mobile data representation learning
Wang et al. Feature subspace transfer for collaborative filtering
CN112131261A (en) Community query method and device based on community network and computer equipment
Hu Deep learning for ranking response surfaces with applications to optimal stopping problems
US11455512B1 (en) Representing graph edges using neural networks
CN112836007B (en) Relational element learning method based on contextualized attention network
CN112055038A (en) Method for generating click rate estimation model and method for predicting click probability
CN111309923B (en) Object vector determination method, model training method, device, equipment and storage medium
CN116992151A (en) Online course recommendation method based on double-tower graph convolution neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210813)