CN115115862A

CN115115862A - High-order relation knowledge distillation method and system based on heterogeneous graph neural network

Info

Publication number: CN115115862A
Application number: CN202210553500.2A
Authority: CN
Inventors: 刘静; 郝沁汾; 叶笑春; 范东睿
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2022-05-20
Filing date: 2022-05-20
Publication date: 2022-09-27

Abstract

The invention provides a high-order relation knowledge distillation method and system based on a heterogeneous graph neural network. Specifically, the method comprises the steps of coding single node semantics of a pre-trained heterogeneous teacher model by performing node-level knowledge distillation; and modeling semantic relationships among different types of nodes of the pre-trained heterogeneous teacher model by performing relationship-level knowledge distillation. By integrating node-level knowledge distillation and system-level knowledge distillation, the high-order relation knowledge distillation method becomes a practical and universal training method, is suitable for any heterogeneous graph neural network, not only improves the performance and generalization capability of a heterogeneous student model, but also ensures the node-level and relation-level knowledge extraction of the heterogeneous graph neural network.

Description

High-order relation knowledge distillation method and system based on heterogeneous graph neural network

Technical Field

The invention relates to the field of graph data mining, in particular to the field of heterogeneous graph data mining, and more particularly relates to a high-order relation knowledge distillation method and system based on a heterogeneous graph neural network.

Background

The ubiquitous presence of heteromorphic images in academic and industrial fields, a great number of heteromorphic image neural networks (HGNN) have been proposed recently, and learning of node representations in heteromorphic images is a hot spot of current research. Heterogeneous graph modeling has the advantage of integrating more information than homogeneous graphs. However, how to embed rich structural and semantic information in an anomaly graph into a low-dimensional node representation is a serious challenge.

In recent years, in order to solve the problem of heterogeneity of nodes and edges in a heterogeneous graph, researchers have proposed many HGNN-based methods, mainly classified into a meta-path-based method and an edge-relationship-based method. In order to capture the heterogeneity of edges, the edge relation-based method directly utilizes a specific relation matrix to process edge relations of various node types in different measurement spaces, such as heterogeneous graph neural network models of RGCN, HGT, HGConv and the like. However, the edge relation-based method can capture only local structural information of the heteromorphic image. In order to be able to encode rich semantic information in an anomaly map, a meta-path based approach is proposed. The meta-path is an effective semantic mining tool, and can capture more complex and richer high-order semantic information among nodes in a heterogeneous graph. Among them, HAN is a pioneering work based on the meta-path method.

While existing HGNNs have achieved good performance, their representation capabilities are limited by (1) imprecise data labeling. In general, the HGNN training method belongs to semi-supervised learning, and therefore, the performance of the HGNN training method is highly dependent on a large amount of high-quality label data. However, fuzzy data labeling will become a bottleneck for HGNN modeling; (2) the semantic relationship between different types of nodes is difficult to model. Although meta-paths are used for higher-order semantic modeling in metamorphic graphs, meta-path selection in different domains is still challenging because it requires sufficient domain knowledge.

In recent years, Knowledge Distillation (KD) techniques in deep learning have shown some advantages in improving the performance of models. Currently, there are some efforts to combine the knowledge distillation method with the neural network of the graph for application. They are designed for homogeneous neural networks where each node or edge in the processed data is of the same type.

Disclosure of Invention

The invention aims to overcome two defects of inaccuracy in data annotation and difficulty in semantic relationship modeling faced by HGNN in the prior art, and provides a high-order relationship knowledge distillation method based on a heterogeneous graph neural network, which comprises the following steps:

step S1, respectively obtaining a heterogeneous graph neural network model of knowledge to be distilled as a teacher model, obtaining a heterogeneous graph neural network model of knowledge to be received as a student model, and obtaining model prediction values of output layers of the teacher model and the student model and the heterogeneous node embedded representation of a middle graph convolution layer;

step S2, extracting first-order node-level soft label knowledge of the teacher model through node-level knowledge distillation based on the model predicted values of the teacher model and the student model;

step S3, based on the teacher model and the student model, the intermediate graph convolution layer embedding representation, and extracting the second-order relation level heterogeneous semantic knowledge of the teacher model through relation level knowledge distillation;

and step S4, integrating the first-order node-level soft label knowledge and the second-order relation-level heterogeneous semantic knowledge to obtain high-order relation knowledge, training the student model based on the high-order relation knowledge, and using the trained student model for a specified task.

The distillation method based on the higher-order relation knowledge of the neural network of the heterogeneous map, wherein the step S1 comprises the following steps:

acquiring a heterogeneous data set D which comprises n training set samples, wherein the characteristic dimension of each sample is D dimension; constructing a teacher model T and a student model S with the same configuration, wherein each model comprises 5 layers: an input layer, a first layer convolution layer, a second layer convolution layer, an MLP linear transformation layer and a Softmax output layer; the neural network parameters of the teacher and the student are respectively W _t And W _s The activation function RELU used by the convolution layer is f (x) max (x, 0);

the intermediate graph convolution layer heterogeneous node embedded representation of the teacher model and the student model comprises:

the input sample is characterized by h ⁰ The expression of convolutional layer is h, then h _t ＝RELU(W _t *h ⁰ )，h _s ＝RELU(W _s *h ⁰ ) (ii) a The output expression of the MLP linear transformation layer is z, and the output expressions of the linear transformation layers of the teacher and student models are z respectively _t And z _s ；

The model predicted values of the teacher model and the student model include: the expression of the Softmax output layer is p, then p _t ＝Softmax(z _t )，p _s ＝Softmax(z _s )。

The distillation method based on the higher-order relation knowledge of the neural network of the heterogeneous map, wherein the step S2 comprises the following steps:

predicting value p by adopting teacher and student models _t ，p _s Transferring the soft label knowledge in the teacher model to the student model by using a node level knowledge distillation method to obtain a first-order node level distillation loss L _NKD As the first-order node-level soft label knowledge:

L _NKD ＝(1-α)L _CE +αL _KD

wherein

Basic cross entropy loss and distillation loss, respectively, alpha is a hyperparameter of equilibrium cross entropy loss and distillation loss, and D (-) is a KL metric function; in addition, the

Is the sfotmax probability output scaled by the temperature coefficient τ.

The distillation method based on the higher-order relation knowledge of the neural network of the heterogeneous map, wherein the step S3 comprises the following steps:

the expression h is embedded by a convolution layer between a teacher and a student _t ，h _s Transferring high-order semantic relation knowledge in the teacher model to the student model by using a relation-level knowledge distillation method;

the correlation matrix MetaCorr of the teacher and student network models is:

wherein

k is the total number of types of the heterogeneous nodes corresponding to the corresponding heterogeneous data set D, and i and j represent nodes of different types;

is a Gaussian kernel function;

the intermediate layer embedding is nonlinearly transformed, and then a shared attention vector q is applied to obtain the attention value of the student model

Wherein W _s Is a weight matrix of the teacher model, b _s Is a deviation vector;

to the attention valueCarrying out normalization processing to obtain a final attention coefficient through a softmax function

Obtaining the second order relation knowledge distillation loss L _RKD As second order relationship level heterogeneous semantic knowledge;

where D is the mean square error.

The distillation method based on the higher-order relation knowledge of the neural network of the heterogeneous map, wherein the step S4 comprises the following steps:

integration of L _NKD And L _RKD Obtaining the final total loss L of the distillation scheme of the high-order relation knowledge as the high-order relation knowledge so as to train the student model end to end;

L＝L _NKD +βL _RKD

wherein beta is L _NKD And L _RKD Is determined.

The method comprises the steps that a training set sample comprises a movie name, a director, actors and movie categories, and the specified task comprises the step of inputting the movie name to be classified and/or the director and/or the actors into the student model to obtain the movie category to which the students belong.

The invention also provides a high-order relation knowledge distillation system based on the neural network of the heterogeneous map, which comprises the following components:

the model acquisition module is used for respectively acquiring a heterogeneous graph neural network model of knowledge to be distilled as a teacher model, acquiring a heterogeneous graph neural network model of knowledge to be received as a student model, and acquiring model prediction values of output layers of the teacher model and the student model and the embedded expression of heterogeneous nodes of a middle graph convolution layer;

the first knowledge extraction module is used for extracting first-order node-level soft label knowledge of the teacher model through node-level knowledge distillation according to the model prediction values of the teacher model and the student models;

the second knowledge extraction module is used for extracting second-order relation-level heterogeneous semantic knowledge of the teacher model through relation-level knowledge distillation based on the built-in expression of the intermediate graph convolution layer of the teacher model and the student model;

the training module is used for integrating the first-order node-level soft label knowledge and the second-order relation-level heterogeneous semantic knowledge to obtain high-order relation knowledge, training the student model based on the high-order relation knowledge, and using the trained student model for a specified task;

the model acquisition module is configured to:

The model predicted values of the teacher model and the student model include: the expression of the Softmax output layer is p, then p _t ＝Softmax(z _t )，p _s ＝Softmax(z _s )；

The first knowledge extraction module to:

by teachingTeacher and student model prediction value p _t ，p _s Transferring the soft label knowledge in the teacher model to the student model by using a node level knowledge distillation method to obtain a first-order node level distillation loss L _NKD As the first-order node-level soft label knowledge:

L _NKD ＝(1-α)L _CE +αL _KD

wherein

Is the basic cross entropy loss and distillation loss, respectively, alpha is the hyperparameter of the equilibrium cross entropy loss and distillation loss, and D (-) is a KL metric function; in addition, the

Is sfotmax probability input convex with temperature coefficient tau scaling

The second knowledge extraction module is configured to:

the correlation matrix MetaCorr of the teacher and student network models is:

wherein

k is the total number of the types of the heterogeneous nodes corresponding to the corresponding heterogeneous data set D, and i, j represent nodes of different types;

is a Gaussian kernel function;

the attention value is normalized, and the final attention coefficient is obtained through a softmax function

wherein D is the mean square error;

the training module is configured to:

L＝L _NKD +βL _RKD

wherein beta is L _NKD And L _RKD Is determined.

The higher-order knowledge distillation system based on the heterogeneous graph neural network is characterized in that the training set samples comprise movie names, directors, actors and movie categories, and the specified task comprises inputting the movie names and/or the directors and/or the actors to be classified into the student model to obtain the movie categories to which the student model belongs.

The invention also provides a storage medium for storing a program for executing the any one of the higher-order relation knowledge distillation methods based on the heterogeneous graph neural network.

The invention also provides a client used for the distillation system based on the high-order relation knowledge of the neural network of the heterogeneous map.

The embodiment of the invention provides a high-order relation knowledge distillation method, which is used for applying knowledge distillation to a heterogeneous graph neural network for the first time and fills the blank of extracting knowledge from a heterogeneous graph model. The scheme combines first-order node-level knowledge distillation and second-order relation-level knowledge distillation, and can be flexibly applied to any HGNN model. Through the scheme, the student model can fully utilize and extract the soft tag knowledge and the high-order heterogeneous relation knowledge hidden in the HGNN. Therefore, the generalization capability of the student model is improved, and the performance is remarkably superior to that of the corresponding teacher model.

Drawings

FIG. 1 is a schematic flow chart of a method for performing distillation of high-order knowledge based on a neural network of an isomerous diagram according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a high-order knowledge distillation method based on a neural network of a heterogeneous diagram according to an embodiment of the present invention;

fig. 3 is a schematic structural diagram of a method for performing high-order relationship knowledge distillation based on an isomorphic neural network according to an embodiment of the present invention.

Detailed Description

The invention provides a high-order relation knowledge distillation method and system based on a heterogeneous graph neural network. The specific technical solution is as follows, the description is given by taking classic heterogeneous data sets such as IMDB (including three types of heterogeneous nodes of movie, director and actor), ACM (including three types of heterogeneous nodes of paper, author and field), and DBLP (including four types of heterogeneous nodes of paper, conference, author and keyword) as examples:

according to the first aspect of the invention, aiming at the problem of inaccurate labeling of data labels, a first-order node-level knowledge distillation (NKD) method is introduced, soft labels of target nodes (such as movies in movie data) are transmitted to students, and general supervision information is provided for downstream tasks (such as node classification). The method comprises the following steps:

step S1: respectively constructing abnormal graph neural network models of teachers and students, and obtaining predicted values of output layer models of the teachers and the students and embedded expression of middle graph convolution layer heterogeneous nodes;

step S2: adopting the model prediction values of the teacher and student networks obtained in the step 1, and transferring the first-order node-level soft label knowledge of the pre-trained teacher model to the student models by using node-level knowledge distillation;

step S3: adopting the intermediate graph convolution layer embedded expression of the teacher and student network obtained in the step 1, and transferring the second-order relation-level high-order heterogeneous semantic knowledge of the pre-trained teacher model to the student model by using relation-level knowledge distillation (RKD);

step S4: and (3) integrating the node-level knowledge and the relation-level knowledge in the steps 2 and 3 to obtain a final high-level relation knowledge distillation scheme, further training the student model, and finally obtaining a trained student model through minimizing loss until the student network converges, so that the student model can be used for different downstream tasks. Downstream tasks of the ACM data set on the fields to which the papers belong comprise tasks of classification, clustering, visualization and the like; the IMDB performs downstream tasks including classification, clustering, visualization and other tasks on the film; DBLP performs downstream tasks such as classification, clustering, visualization, etc. on the author's research neighborhood.

In an embodiment of the present invention, the step S1 further includes: inputting heterogeneous data sets and constructing neural network models of teacher and student heterogeneous image, wherein the specific data sets and the models are set to

Preparing a heterogeneous data set D (such as IMDB, ACM, DBLP and other classical heterogeneous data), wherein n training set samples are provided, and the characteristic dimension of each sample is D dimension; constructing reference teacher and student models T and S with the same configuration, wherein the models T and S comprise 5 layers: input layer, first layer convolution layer, second layer convolution layer, MLP linear transformationA layer and a Softmax output layer; the parameters of the neural network of the teacher and the student are respectively recorded as W _t And W _s The activation function used by the convolutional layer is RELU, in the form of f (x) max (x, 0).

In one embodiment of the present invention, step S1 further includes: calculating the predicted value of the output layer model of the teacher and the student and the embedded expression of the intermediate graph convolution layer heterogeneous nodes, and specifically calculating the method

The input sample characteristic is recorded as h ⁰ The expression of convolutional layer is h, then h _t ＝RELU(W _t *h ⁰ )，h _s ＝RELU(W _s *h ⁰ ) (ii) a The output expression of the MLP linear transformation layer is recorded as z, and the probability output of the teacher model and the probability output of the student model are respectively z _t And z _s Noting that the expression of the Softmax output layer is p, then p _t ＝Softmax(z _t )，p _s ＝Softmax(z _s )。

In one embodiment of the present invention, the step S2 includes: predicting value p by adopting teacher and student models _t ，p _s Transferring the soft label knowledge in the teacher model to the student model by using a node level knowledge distillation method to obtain a first-order node level distillation loss L _NKD The loss function is

L _NKD ＝(1-α)L _CE +αL _KD

Wherein

Basic cross entropy loss and distillation loss, respectively, i represents a node, α is a hyper-parameter of equilibrium cross entropy loss and distillation loss, and D (-) is a KL metric function; in addition, the

The sfotmax probability output is scaled by the temperature coefficient tau, the probability distribution on the class is smoother when the hyper-parameter tau is larger, and more smooth information is learned by a student model.

The step S3 includes: the expression h is embedded by a convolution layer between a teacher and a student _t ，h _s Distillation method using relationship level knowledgeHigher-order semantic relationship knowledge in the teacher model is transferred to the student model.

In an embodiment of the present invention, the step S3 further includes: in order that students can fully extract high-order semantic information hidden in HGNN from teachers, a MetaCorr correlation matrix is designed, relation-level knowledge between different types of nodes is coded from a pre-trained teacher model, and MetaCorr of the teacher and student network model is calculated as

Wherein

k is the total number of the types of the heterogeneous nodes corresponding to the corresponding heterogeneous data sets, and i, j represent nodes of different types;

is a Gaussian kernel function for measuring the similarity between two node-embedded representations, the larger of which

Indicating that the distance between the two node representations is relatively large. The reason for using the gaussian RBF kernel is that gaussian RBFs are more flexible and powerful in capturing complex non-linear relationships between nodes. To avoid dimension disaster, pair

A second order Taylor extension is selected.

Meanwhile, a type-related attention layer is introduced behind the convolutional layer, and the importance of different node types is automatically learned. Firstly, the middle layer embedding is subjected to nonlinear transformation, and then a shared attention vector q is applied to obtain the attention value of the student model

Wherein W _s Is a weight matrix of the teacher model, b _s Is a deviation vector. Then, the attention value is normalized, and a final attention coefficient is obtained through a softmax function

Obviously, higher alpha indicates more critical nodes, and alpha can be dynamically adjusted in the model training process. Finally, the distillation loss L of the second order relation knowledge is obtained _RKD The loss function is

Where D is the mean square error MSE penalty.

In one embodiment of the present invention, the step S4 includes: integration of node-level knowledge distillation loss L _NKD And relationship-level knowledge distillation loss L _RKD Obtaining the final total loss L of the distillation scheme with the higher-order relation knowledge, wherein the loss function is

L＝L _NKD +βL _RKD

Wherein beta is a hyperparameter balancing first order node-level knowledge distillation and second order relationship-level knowledge distillation.

According to the total loss L, end-to-end training can be carried out on the student model, the loss L is minimized until the student network converges, and finally a trained student model is obtained, so that the student model can be used for different downstream tasks.

According to a second aspect of the present invention, there is provided a computer readable storage medium having stored therein one or more computer programs which, when executed, are for implementing the higher order knowledge of relationship distillation method based on a neural network of an isomerous graph of the present invention.

According to a third aspect of the invention there is provided a computing system comprising: a storage device, and one or more processors; wherein the storage device is configured to store one or more computer programs that, when executed by the processor, are configured to implement the higher order knowledge of relationship distillation method based on a neural network of a heterogeneous map according to the present invention.

In order to make the aforementioned features and effects of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.

It can be seen from the background that the performance of the existing HGNN model is limited to: (1) data annotation is not accurate; (2) high-order relational semantic modeling is inadequate. Inspired by the successful application of knowledge distillation technology in deep learning, for example, certain advantages are shown in the performance of the improved model, and some work attempts are made to combine the knowledge distillation method and the graph neural network for application. However, these methods are designed for homogeneous neural networks, and each node or edge in the processed data is of the same type.

Aiming at the two problems faced by HGNN, the inventor conducts research and designs a higher-order relation knowledge distillation method facing to a heterogeneous graph neural network so as to improve the performance of a student heterogeneous graph neural network model. In summary, the method of the present invention is shown in fig. 1, and step S1: based on the constructed abnormal graph neural network models of the teacher and the students, respectively obtaining output layer model prediction values of the teacher and the students and the middle graph convolution layer heterogeneous node embedded representation; step S2: then, based on the obtained model prediction values of the teacher and student networks, transferring the first-order node-level soft label knowledge of the pre-trained teacher model to the student models by adopting node-level knowledge distillation; step S3: then, based on the intermediate graph convolution layer embedded representation of teachers and students, transferring second-order relation level high-order heterogeneous semantic knowledge of a pre-trained teacher model to a student model by adopting relation level knowledge distillation; step S4: and finally, integrating the previous node-level knowledge and the relation-level knowledge to obtain final high-level relation knowledge, training the student model, and obtaining a trained student model by minimizing student loss, so that the student model can be used for different downstream tasks.

The invention is described in detail below with reference to the accompanying drawings, fig. 2 shows a high-order relation knowledge distillation method based on a heterogeneous graph neural network provided by the invention, fig. 3 shows a high-order relation knowledge distillation system based on a heterogeneous graph neural network formed by a teacher model and a student model in an embodiment of the invention, and the method comprises the following 4 steps:

step S1: and respectively constructing abnormal graph neural network models of the teacher and the students, and obtaining the predicted values of the output layer models of the teacher and the students and the embedded representation of the abnormal nodes of the convolution layer of the intermediate graph.

According to one embodiment of the invention, heterogeneous data D is input and T and S are constructed: preparing a heterogeneous data set D, wherein n training set samples are provided, and the characteristic dimension of each sample is D dimension; constructing reference teacher and student models T, S (see figure 3) with the same configuration, wherein the model comprises 5 layers: an input layer, a first layer convolution layer, a second layer convolution layer, an MLP linear transformation layer and a Softmax output layer; the parameters of the neural network of the teacher and the student are respectively marked as W _t And W _s The activation function used by the convolutional layer is RELU, in the form of f (x) max (x, 0).

Calculating the predicted value p of the output layer model of T and S and the heterogeneous node embedding representation h of the convolution layer of the intermediate graph, and specifically calculating as follows: the input sample characteristic is recorded as h ⁰ The expression of convolutional layer is h, then h _t ＝RELU(W _t *h ⁰ )，h _s ＝RELU(W _s *h ⁰ ) (ii) a Recording the output expression of the MLP linear transformation layer as z, the probability output of the teacher model and the probability output of the student model are respectively z _t And z _s Noting that the expression of the Softmax output layer is p, then p _t ＝Softmax(z _t )，p _s ＝Softmax(z _s )。

Step S2: using p obtained in step 1 _t ，p _s Transferring the knowledge of the first-order node-level soft label of the pre-training T to S by using node-level knowledge distillation to obtain the distillation loss L of the first-order node-level _NKD The loss function is

L _NKD ＝(1-α)L _CE +αL _KD

Wherein

The sfotmax probability output is scaled by a temperature coefficient tau, and the probability distribution on the class is smoother when the tau is larger, so that the student model is promoted to learn more smooth information.

Step S3: and (2) embedding the convolution layer of the T and S network intermediate graph obtained in the step (1) into a representation h, and transferring the second-order relation-level high-order heterogeneous semantic knowledge of the pre-trained T into the S model by using relation-level knowledge distillation.

Wherein, in order that students can fully extract high-order semantic information hidden in the heteromorphic neural network from teachers, a MetaCorr correlation matrix is designed, the pre-trained T encodes the relation-level knowledge between different types of nodes, and the MetaCorr of the T and S network model is calculated as

Wherein

k is the total number of types of the heterogeneous nodes corresponding to the corresponding heterogeneous data set, i, j represents notNodes of the same type;

A second order Taylor extension is selected.

Wherein W _s Is the weight matrix of the T model, b _s Is a deviation vector. Then, the attention value is normalized, and a final attention coefficient is obtained through a softmax function

Where D is the mean square error.

Step S4: integrating node-level knowledge L of step 2 and step 3 _NKD And relationship level knowledge L _RKD Obtaining the final total loss L of the distillation scheme with the higher-order relation knowledge, wherein the loss function is

L＝L _NKD +βL _RKD

According to the total loss L, the S model can be trained end to end, and a trained student model is obtained by minimizing the loss L until the S converges, so that the student model can be used for different downstream tasks.

To illustrate the effectiveness of the above-described scheme of embodiments of the present invention, experiments are described in conjunction with the following detailed description, the experiments being developed over several classical heterogeneous graph datasets:

data set

The experiment involved 3 reference data sets, including 2 citation network (ACM and DBLP) and 1 movie network (IMDB) data sets, described in relation to table 1 below:

table 1 3 heterogeneous graph data sets adopted in this scheme

The meaning of the meta path column is the meta path type of the corresponding data set, and is represented by the node type passing through the meta path.

Two, reference model

To verify the effectiveness of the distillation scheme of the invention, this experiment will be tested on classical heterogeneous models, RGCN, HAN, HGT and HGConv heterogeneous patterned neural network models, respectively.

Third, experimental results

The method applies a designed high-order relational knowledge distillation algorithm to RGCN, HAN, HGT and HGConv heterogeneous graph neural network models, and carries out node classification on three data sets of ACM, IMDB and DBLP, wherein the classification index is Micro-F1. The specific experimental effects are shown in table 2:

table 2 classification effect of the present solution on heterogeneous data sets based on various heterogeneous graph neural networks

From table 2, it can be found that by using the high-order relationship knowledge distillation scheme related by the invention, the performances of the neural networks of the heterogeneous graphs are all improved remarkably and consistently, and the improvement amplitude is 0.5% -9.6%.

The following are system examples corresponding to the above method examples, and this embodiment can be implemented in cooperation with the above embodiments. The related technical details mentioned in the above embodiments are still valid in this embodiment, and are not described herein again in order to reduce repetition. Accordingly, the related-art details mentioned in the present embodiment can also be applied to the above-described embodiments.

the first knowledge extraction module is used for extracting first-order node-level soft label knowledge of the teacher model through node-level knowledge distillation according to the model predicted values of the teacher model and the student model;

the model acquisition module is configured to:

acquiring a heterogeneous data set D which comprises n training set samples, wherein the characteristic dimension of each sample is D dimension; constructing a teacher model T and a student model S with the same configuration, wherein each model comprises 5 layers: an input layer, a first layer convolution layer, a second layer convolution layer, an MLP linear transformation layer and a Softmax output layer; the neural network parameters of the teacher and the student are respectively W _t And W _s The activation function RELU used for the convolution layer is f (x) max (x, 0);

The first knowledge extraction module is configured to:

L _NKD ＝(1-α)L _CE +αL _KD

wherein

Is sfotmax probability output with temperature coefficient τ scaling;

the second knowledge extraction module is to:

the correlation matrix MetaCorr of the teacher and student network models is:

wherein

is a Gaussian kernel function;

wherein D is the mean square error;

the training module is configured to:

L＝L _NKD +βL _RKD

wherein beta is L _NKD And L _RKD Is determined.

The system comprises a training set sample, a student model and a heterogeneous graph neural network, wherein the training set sample comprises a movie name, a director, an actor and a movie category, and the specified task comprises inputting the movie name to be classified and/or the director and/or the actor into the student model to obtain the movie category to which the student model belongs.

Claims

1. A higher-order relation knowledge distillation method based on a heterogeneous graph neural network is characterized by comprising the following steps:

step S2, extracting first-order node-level soft label knowledge of the teacher model through node-level knowledge distillation based on the model prediction values of the teacher model and the student models;

2. The method of claim 1, wherein the step S1 comprises:

the input sample is characterized by h ⁰ Expression of convolutional layer ash, then h _t ＝RELU(W _t *h ⁰ )，h _s ＝RELU(W _s *h ⁰ ) (ii) a The output expression of the MLP linear transformation layer is z, and the output expressions of the linear transformation layers of the teacher and student models are z respectively _t And z _s ；

3. The method of claim 2, wherein the step S2 comprises:

L _NKD ＝(1-α)L _CE +αL _KD

wherein

Is the sfotmax probability output scaled by the temperature coefficient τ.

4. The method of claim 3, wherein the step S3 comprises:

the correlation matrix MetaCorr of the teacher and student network models is:

wherein

is a Gaussian kernel function;

the attention value is normalized, and a final attention coefficient is obtained through a softmax function

Get the second order relation level knowledgeDistillation loss L _RKD As second order relationship level heterogeneous semantic knowledge;

where D is the mean square error.

5. The method of claim 4, wherein the step S4 comprises:

L＝L _NKD +βL _RKD

wherein beta is L _NKD And L _RKD Is determined.

6. The method according to any one of claims 2 to 4, wherein the training set samples comprise movie names, directors, actors and movie categories, and the assignment task comprises inputting the movie names and/or directors and/or actors to be classified into the student model to obtain the movie category to which the student model belongs.

7. A higher order knowledge of relationships distillation system based on a neural network of heterogeneous maps, comprising:

the model acquisition module is configured to:

The first knowledge extraction module to:

L _NKD ＝(1-α)L _CE +αL _KD

wherein

Is sfotmax probability output with scaling of the temperature coefficient τ;

the second knowledge extraction module is to:

the correlation matrix MetaCorr of the teacher and student network models is:

wherein

is a Gaussian kernel function;

wherein D is the mean square error;

the training module is used for:

L＝L _NKD +βL _RKD

wherein beta is L _NKD And L _RKD Is determined.

8. The system of claim 7, wherein the training set samples comprise movie names, directors, actors, and movie categories, and the assignment task comprises inputting the movie names and/or directors and/or actors to be classified into the student model to obtain the movie category to which the student model belongs.

9. A storage medium storing a program for executing the higher order relation knowledge distillation method based on the neural network of the heterogeneous map according to any one of claims 1 to 7.

10. A client for the higher order knowledge of relationship distillation system based on a neural network of a heterogeneous map according to claim 8 or 9.