CN115481682A - Graph classification training method based on supervised contrastive learning and structure inference
Graph classification training method based on supervised contrastive learning and structure inference
- Publication number
- CN115481682A (application CN202211106324.4A)
- Authority
- CN
- China
- Prior art keywords
- graph
- node
- learning
- classification
- embedding
- Prior art date
- 2022-09-11
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06N3/04 — Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology
- G06N3/088 — Computing arrangements based on biological models; Neural networks; Learning methods; Non-supervised learning, e.g. competitive learning
Abstract
The invention discloses a graph classification training method based on supervised contrastive learning and structure inference. First, structure inference mines the topological information hidden in the original graph data to produce an enhanced graph. Positive and negative example pairs are then constructed by random sampling according to the labels, and a hierarchical graph neural network learns the positive and negative examples separately to extract whole-graph embeddings. Finally, the learning process of the hierarchical graph neural network is guided jointly by an ordinary classification loss and a supervised contrastive loss, which improves the classification performance of the embeddings. The invention fuses structure inference with label information, and the structure-inference-based data augmentation requires no prior knowledge, which widens the applicable range of the model and accelerates its learning. Self-supervised contrastive learning on graph data is extended to contrastive learning under label supervision, strengthening the contrastive learning capability on graph data. The method improves graph classification performance and generalizes well to broad graph classification data.
Description
Technical Field
The invention relates to topological information extraction and contrastive learning for graph data structures, and designs a graph classification training model based on supervised contrastive learning and structure inference to address the difficulty of classifying graph-structured data with traditional methods.
Background
Graph data structures (graphs) are a general term for a class of relational data structures commonly used to characterize real-world systems that involve complex interrelationships, such as social networks, protein networks, and traffic networks. Specifically, each graph consists of nodes and edges: each node corresponds to a concrete or abstract entity, the features on a node abstract the concrete attributes of that entity, and each edge represents a relationship between a pair of entities; together, the nodes and edges form the topological structure of the graph. The graph classification task summarizes the topological structure and features of each graph and assigns different graphs to different categories. Graph classification has been applied in many fields, such as chemical molecular property prediction, brain disease classification, and point cloud classification.
The more popular graph classification methods can be roughly divided into two categories: graph kernels (GK) and graph neural networks (GNNs). Common graph kernels include the random-walk kernel, the shortest-path kernel, and the Weisfeiler-Lehman kernel. Graph kernel methods require a manually defined kernel function to summarize the topological information of a graph; they lack generality and learnability on broader data sets, which limits their use. Graph neural networks adapt popular artificial neural network algorithms to the topological nature of graph data and offer excellent scalability and learnability. Because a graph neural network can be made deeper, the hidden topological information of a graph can be obtained by stacking layers. However, increasing the number of layers over-smooths the node features and reduces their discriminability. To address this problem, a subgraph neural network and a graph neural network can jointly form a hierarchical feature extraction model. This approach, however, still lacks a design targeted at the graph classification problem and cannot fully exploit the capacity of graph neural networks.
In the general classification field, contrastive learning can be used to strengthen the classification ability of a model, but related attempts are still lacking in graph classification, especially for supervised contrastive learning. Meanwhile, an enhanced graph structure beneficial to graph classification can be constructed through structure inference according to the quality of the graph. Structure inference is a technique that infers the structure of a graph from sampled sequences; it expresses the deep topological information of the graph intuitively and improves the quality of the graph data.
Disclosure of Invention
Aiming at the problems that label information is lacking and the data augmentation means relies on prior knowledge in current graph classification methods that adopt contrastive learning, the invention provides a graph classification model based on supervised contrastive learning and structure inference. First, the model fully mines the topological information of the graph data itself through structure inference and uses it as an enhanced version of the original graph data. Positive and negative example pairs are then constructed by random sampling according to the labels, and a hierarchical graph neural network learns the positive and negative examples separately to extract whole-graph embeddings. Finally, the learning process of the hierarchical graph neural network is guided jointly by an ordinary classification loss and a supervised contrastive loss, which improves the classification performance of the embeddings.
The main idea of the invention is as follows: each edge of the graph is scored according to its role in the information flow over simulated time sequences, and the nodes and edges that act as important information hubs are mined, so that the hidden deep topological features of the graph are extracted. Combining these topological features with the original graph expresses directly, as changes in the topology, the information that would otherwise have to be extracted by many GNN layers, improving the information extraction ability of GNNs at a given depth. For training the whole model, supervised contrastive learning is combined with conventional classification learning. Introducing supervised contrastive learning into the graph classification task promotes the distinctiveness of the embeddings, while introducing label information into self-supervised contrastive learning helps the contrastive loss clarify the learning direction and increases the separation between embeddings of different classes, thereby improving the overall classification performance of the model.
A graph classification method based on supervised contrastive learning and structure inference comprises the following steps:
Step one, data acquisition: the basic data required for graph classification, namely an adjacency matrix A and node features X, are obtained from data sets such as MUTAG, PTC, PROTEINS, and IMDBBINARY.
Step two, generating simulated time sequences: based on the basic data acquired in step one, a corresponding set of simulated time sequences C is generated for the adjacency matrix A of each graph.
Step three, structure inference: the adjacency probability of each edge is computed a posteriori from the time-sequence set C of each graph, and a threshold ξ is applied to the adjacency probabilities to generate an enhanced graph A'.
Step four, constructing positive and negative example samples: the acquired A' and X are divided into positive and negative example samples G_p and G_n according to the graph labels.
Step five, generating subgraph embeddings: for the positive and negative examples G_p and G_n, a breadth-first search from each node generates the corresponding subgraph partition, and a subgraph neural network applied to the corresponding subgraphs generates the subgraph embeddings.
Step six, generating the graph embedding: the subgraph embeddings are scored and sampled to construct a one-dimensional embedding that represents the current graph data, and this embedding is used to generate the corresponding graph classification label P.
Step seven, supervised contrastive learning: a loss is computed from the graph embeddings of the positive and negative samples and the corresponding supervision labels through the contrastive learning function, an ordinary classification loss is computed from the graph classification label P and the true label, and the two are combined to form the final classification loss.
Step eight, the model is updated iteratively according to the classification loss; after convergence, the model outputting the final graph classification label P' is adopted as the final graph classification model.
Compared with the prior art, the invention has the following obvious advantages and beneficial effects:
(1) Structure inference and label information are fused, which strengthens the capability of contrastive learning in the graph classification task.
(2) The structure-inference-based data augmentation requires no prior knowledge, which widens the applicable range of the model and accelerates its learning.
(3) Self-supervised contrastive learning on graph data is extended to contrastive learning under label supervision, strengthening the contrastive learning capability on graph data.
(4) Experimental results on PTC show that the method improves graph classification performance and generalizes well to broad graph classification data.
Drawings
FIG. 1: flow chart of the model involved in the method.
FIG. 2: ablation experiment results.
FIG. 3: comparison of model accuracy during training.
Detailed Description
The following explains a specific embodiment and the detailed steps of the present invention. The overall flow of the implementation is shown in FIG. 1 and specifically comprises:
(Step one) data acquisition: to verify the effectiveness of the proposed model, experiments are carried out on the MUTAG, PTC, IMDBBINARY, and PROTEINS data sets to evaluate classification performance. MUTAG and PTC are compound-molecule data sets in which each graph represents a compound molecule. IMDBBINARY is derived from the Internet Movie Database: each node represents an actor and an edge indicates that two actors appear in the same movie. PROTEINS is a protein-molecule data set whose topology represents the spatial shape of the corresponding protein molecule. Each graph in these data sets can be represented as G = (V, E), where V is the node set and E is the edge set. An attributed graph can further be expressed as G = (X, A) with X ∈ R^{n×d} and A ∈ R^{n×n}, where X is the set of node attributes, A is the adjacency matrix representing the graph topology, n is the number of nodes, and d is the maximum node feature dimension.
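For illustration only, one minimal way to hold such a sample in memory is sketched below; the class name, field layout, and toy values are assumptions for exposition and are not part of the invention.

```python
# Minimal sketch: one attributed graph G = (X, A) with X in R^{n x d},
# A in R^{n x n}, and a graph-level class label.
import numpy as np

class GraphSample:
    def __init__(self, A: np.ndarray, X: np.ndarray, label: int):
        assert A.shape[0] == A.shape[1] == X.shape[0]
        self.A = A          # adjacency matrix, n x n
        self.X = X          # node features, n x d
        self.label = label  # graph-level class label

# toy example: a 4-node graph with 3-dimensional node features
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
X = np.random.rand(4, 3)
g = GraphSample(A, X, label=1)
```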
(Step two) generating simulated time sequences: a simulated infection procedure that incorporates node degree is used to generate simulated sequences for each graph in the data set. First a node in the graph is selected; then an infection probability is generated for each neighbor according to the difference between the node's degree and the neighbor's degree. The corresponding infection time and infected nodes are computed from the infection probability and recorded. Repeating this process within a limited time window yields one time-infection sequence for the node. For each graph, 1000 time-infection sequences are generated for structure inference.
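The sketch below illustrates one possible reading of this simulation; the sigmoid of the degree difference and the exponential waiting time are assumptions, since the description only states that the neighbor infection probabilities depend on the degree difference.

```python
# Hedged sketch of a degree-aware infection simulation (step two).
import numpy as np

def simulate_infection(A: np.ndarray, T: float = 10.0, rng=None) -> np.ndarray:
    rng = rng if rng is not None else np.random.default_rng()
    n = A.shape[0]
    deg = A.sum(axis=1)
    times = np.full(n, np.inf)          # infection time of each node (inf = never infected)
    seed = int(rng.integers(n))         # randomly selected starting node
    times[seed] = 0.0
    frontier = [seed]
    while frontier:
        v = frontier.pop(0)
        for u in np.flatnonzero(A[v]):
            if np.isfinite(times[u]):
                continue
            # assumed form: infection probability from the degree difference
            p = 1.0 / (1.0 + np.exp(-(deg[v] - deg[u])))
            if rng.random() < p:
                t = times[v] + rng.exponential(1.0)   # assumed waiting-time model
                if t <= T:
                    times[u] = t
                    frontier.append(u)
    return times                         # one simulated time-infection sequence c

# C = [simulate_infection(A) for _ in range(1000)]  # 1000 sequences per graph
```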
(Step three) structure inference: the adjacency probability of each edge is computed a posteriori from the time sequences of each graph, and a threshold ξ is applied to produce the enhanced graph A'. For each graph, the time-sequence set is C = (c_1, ..., c_q) ∈ R^{q×n}, where q is the number of time sequences, n is the number of nodes in each time sequence, and c_1, ..., c_q are the individual sequences. Computing the corresponding adjacency matrix M from the set C amounts to maximizing the probability function φ(C; M). Within the time window [0, T], the transmission function f(t_j | t_i; M_{i,j}) of the infection process from node v_i to node v_j is given by equation (1), where t_i is the infection time of node v_i, t_j is the infection time of node v_j, t_i, t_j ≤ T, e is the natural base, and M_{i,j} is the (i, j)-th entry of the adjacency matrix M. The transition probability function φ(·) from node v_i to node v_j is then defined by equation (2), in which t_k ranges over all times between t_i and t_j. Summing the probabilities over all nodes v_i that satisfy t_i < t_j yields, as equation (3), the probability distribution of the adjacency matrix M that satisfies the current temporal infection sequence at time t_j. Multiplying the corresponding probability distributions over all times t of a time sequence yields, as equation (4), the posterior probability distribution of the adjacency matrix M at any time t conditioned on the time sequence c. According to this probability distribution and the conditional-independence assumption, the probability density of the adjacency matrix M over the window [0, T] for the corresponding time sequence c is given by equation (5). Since equation (5) only expresses the probability of a single time sequence c, for the set C we have f(C; M) = ∏_{c∈C} f(c; M), and M is finally obtained by solving
max_{M≥0} ∑_{c∈C} log f(c; M)    (6)
After noise is filtered from the obtained M using the threshold ξ, the filtered M is fused with the original adjacency matrix to generate the enhanced graph A', which serves as the enhanced adjacency matrix of the data.
(Step four) constructing positive and negative example samples: for each sample of each label, the sample itself and the corresponding label serve as the positive example, and samples carrying other labels serve as negative examples. For each sample in the data set, the samples are divided according to the label into those belonging to the current label and those not belonging to it. A sample is randomly selected from the set not belonging to the current label as the negative example of the current sample, forming a positive-negative example pair together with the original sample.
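In code, the pairing reduces to a sampling routine like the sketch below (names and data layout are illustrative):

```python
# Sketch: pair every sample with a randomly drawn sample of a different label.
import random

def build_pos_neg_pairs(samples, labels, seed=0):
    rng = random.Random(seed)
    pairs = []
    for i, y in enumerate(labels):
        neg_candidates = [j for j, yj in enumerate(labels) if yj != y]
        j = rng.choice(neg_candidates)
        pairs.append((samples[i], samples[j]))   # (positive example, negative example)
    return pairs
```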
(Step five) generating subgraph embeddings: for each node, a breadth-first search generates the corresponding subgraph partition, and a subgraph neural network applied to the corresponding subgraph generates the subgraph embedding. For each node v_i on the graph, a breadth-first search (BFS) with an upper search bound β generates the corresponding subgraph g_i and node set S_BFS(v_i). Applying the subgraph neural network on this set generates the initialized node features according to equations (7) and (8), where x is the initial feature of a node, h^(0) is the initialization feature of node v_i for the graph neural network, AGGREGATE is the aggregation function that aggregates neighbor features, and COMBINE is the function that combines the neighbor features with the current node feature. Through equations (7) and (8) we can compute the initial feature H^(0) of the whole graph.
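A rough sketch of this step follows, assuming sum aggregation for AGGREGATE, addition for COMBINE, and a single linear-plus-tanh layer standing in for the subgraph neural network; these concrete choices are assumptions, since equations (7) and (8) are not reproduced in this text.

```python
# Sketch: beta-bounded BFS subgraph extraction and one AGGREGATE/COMBINE round.
from collections import deque
import numpy as np

def bfs_subgraph(A: np.ndarray, v: int, beta: int) -> list:
    seen, queue = {v}, deque([v])
    while queue and len(seen) < beta:
        u = queue.popleft()
        for w in np.flatnonzero(A[u]):
            if w not in seen and len(seen) < beta:
                seen.add(int(w))
                queue.append(int(w))
    return sorted(seen)                    # node set S_BFS(v_i)

def init_node_feature(A: np.ndarray, X: np.ndarray, v: int, beta: int, W: np.ndarray):
    nodes = bfs_subgraph(A, v, beta)
    neighbors = [u for u in nodes if u != v]
    agg = X[neighbors].sum(axis=0) if neighbors else np.zeros(X.shape[1])  # AGGREGATE
    combined = X[v] + agg                  # COMBINE
    return np.tanh(combined @ W)           # assumed one-layer stand-in for the subgraph GNN
```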
(Step six) generating the graph embedding: the subgraph embeddings are scored and sampled to construct a one-dimensional embedding that represents the current graph data. On the basis of the subgraph embedding H^(0), several neighborhood aggregation layers are stacked to form a deep graph neural network. In the feature update function of each layer, l ≥ 1 denotes the index of the current aggregation layer, the neighborhood node set of node v_i supplies the features being aggregated, γ is the corresponding weighting parameter, and MLP is a multi-layer perceptron for learning the aggregation. After several layers of learning, the node feature representation of the whole graph is obtained, where d_n is the node feature dimension at layer L. To fuse the feature representations of the nodes of a graph into a single embedding representing the entire graph, a learnable TOP_k function first selects the k most important nodes, where idx denotes the indices of the selected nodes and a learnable parameter vector provides the node scores. The final graph embedding r is then computed by applying the readout formula to the selected nodes, where W is a weight matrix, d_s is the dimension of the super-node, r is the embedding of the whole graph, and P is the learned graph-label prediction.
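The readout could look like the sketch below; the scoring vector, the sigmoid gating of the selected nodes, and the flatten-then-project fusion are assumptions in the spirit of common learnable top-k graph pooling, not quotations of the omitted equations.

```python
# Sketch of a TOP_k readout producing the graph embedding r (step six).
import torch

def topk_readout(H: torch.Tensor, p: torch.Tensor, W: torch.Tensor, k: int) -> torch.Tensor:
    # H: (n, d_n) node features after L aggregation layers
    # p: (d_n,) assumed learnable scoring vector; W: (k * d_n, d_s) fusion weights
    scores = (H @ p) / p.norm()                      # score each node
    idx = torch.topk(scores, k).indices              # indices of the k selected nodes
    gated = H[idx] * torch.sigmoid(scores[idx]).unsqueeze(-1)
    r = gated.reshape(-1) @ W                        # graph embedding r in R^{d_s}
    return r

# A linear classifier on r would then produce the predicted graph label P.
```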
(Step seven) supervised contrastive learning: the graph embeddings of the positive and negative samples, together with the corresponding supervision labels, are used to compute a loss through the contrastive learning function, which is combined with the ordinary classification loss to form the final classification loss. Cross-entropy is taken as the ordinary classification loss, where Y is the true label information of each graph. The contrastive learning loss after introducing the label information is given by equation (14), where Ω = {1, ..., 2m} denotes the indices of all samples in the current batch (including positive and negative examples), Φ(·) denotes the indices of all positive examples in the current batch, each positive example i has a corresponding negative example p, τ is the temperature hyperparameter, and Γ(i) is the set of all samples except sample i. Because the positive and negative example pairs are constructed in advance according to the labels, the label information is implicit in equation (14), which improves the guidance quality of contrastive learning in the graph classification task. The final loss combines the two terms, where λ is the hyperparameter controlling the weight of the supervised contrastive learning component.
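A hedged sketch of this combined objective is given below: standard cross-entropy plus a supervised contrastive term over the 2m graph embeddings in the batch, weighted by λ. It follows the general supervised-contrastive formulation; the exact normalization of equation (14) may differ.

```python
# Sketch: supervised contrastive loss plus cross-entropy, weighted by lambda.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(z: torch.Tensor, y: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    z = F.normalize(z, dim=1)                              # (2m, d) graph embeddings
    sim = z @ z.t() / tau                                  # pairwise similarities
    self_mask = torch.eye(len(y), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))        # exclude i itself (Gamma(i))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(self_mask, 0.0)        # diagonal never contributes
    same = ((y.unsqueeze(0) == y.unsqueeze(1)) & ~self_mask).float()
    pos_counts = same.sum(dim=1).clamp(min=1.0)
    return -((log_prob * same).sum(dim=1) / pos_counts).mean()

def total_loss(logits, z, y, tau=0.1, lam=0.5):
    return F.cross_entropy(logits, y) + lam * supervised_contrastive_loss(z, y, tau)
```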
(Step eight) the model is updated iteratively according to the classification loss; after convergence, the model outputting the final graph classification label P' is adopted as the final graph classification model.
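A minimal training-loop sketch for this step is given below, reusing the total_loss sketch above and assuming a model that returns both class logits and graph embeddings for a batch of positive/negative graphs; all names are illustrative.

```python
# Sketch: iterative update of the model with the combined classification loss (step eight).
import torch

def train(model, loader, epochs=100, lr=1e-3, tau=0.1, lam=0.5):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch, labels in loader:
            logits, z = model(batch)                      # class logits and graph embeddings
            loss = total_loss(logits, z, labels, tau=tau, lam=lam)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model   # after convergence, the model's predictions give the final labels P'
```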
To fully verify the superiority of the method, we compare it with many existing graph classification methods on the MUTAG, PTC, PROTEINS, and IMDBBINARY data sets. These methods can be broadly divided into two categories: graph neural network methods and graph contrastive learning methods. The graph neural network methods include GraphSAGE, GIN, DAGCN, PPGN, CapsGNN, and SGN; the graph contrastive learning methods include GraphCL, InfoGraph, M-GCL, GXN, sGIN, and SUGAR. Using classification accuracy (acc) as the comparison metric, the results are shown in Table 1 as mean (%) ± standard deviation:
TABLE 1 Comparison of methods on the MUTAG, PTC, PROTEINS, and IMDBBINARY data sets
Method | MUTAG | PTC | PROTEINS | IMDBBINARY |
---|---|---|---|---|
GraphSAGE | 79.8±13.9 | - | 65.9±2.7 | 72.4±3.6 |
GIN | 89.4±5.6 | 64.6±7.0 | 76.2±2.8 | 75.1±5.1 |
DAGCN | 87.2±2.05 | 62.9±9.6 | 76.3±4.3 | - |
PPGN | 90.6±8.7 | 66.2±6.5 | 77.2±4.7 | 73.0±5.8 |
CapsGNN | 86.7±6.9 | - | 72.0±1.1 | 72.2±0.9 |
SGN | 89.5±7.4 | 64.1±3.7 | 76.3±4.1 | 76.5±5.7 |
GraphCL | 86.8±1.3 | - | 74.4±0.5 | 71.1±0.4 |
InfoGraph | 89.0±1.1 | 61.7±1.4 | - | 73.0±0.9 |
M-GCL | 89.7±1.1 | 62.5±1.7 | - | 74.2±0.7 |
GXN | 86.1±8.3 | 63.5±5.8 | 79.9±4.1 | 78.6±2.3 |
sGIN | 94.1±2.7 | 73.6±4.3 | 79.0±3.2 | 77.9±4.3 |
SUGAR | 96.7±4.1 | 77.5±2.8 | 81.3±0.9 | 73.0±3.5 |
SupCosine(ours) | 98.3±2.5 | 87.8±10.4 | 80.0±3.6 | 83.0±3.2 |
From the table above, our model achieves a higher mean accuracy than the other methods and essentially reaches the highest performance level on the four data sets. On MUTAG, PTC, and IMDBBINARY, the method improves on the second-strongest method by 1.0%, 10%, and 4.4% respectively, and it is also on par with the best level on PROTEINS. The method improves markedly over the other graph contrastive learning approaches, which demonstrates the effectiveness of our strategy.
To further illustrate the effectiveness of the individual modules of the proposed method, we conduct corresponding ablation experiments on the four data sets. The baseline (Base) is a GIN model without the structure inference part and the supervised contrastive learning part; on this basis, the structure inference module (+StruInf) and the supervised contrastive learning module (SupGCon) are added separately, and the final model (SupCosine) fuses all modules. The results are shown in FIG. 2. In most cases, the structure inference module alone outperforms the model that adopts only the supervised contrastive learning module, and compared with Base both modules bring a clear improvement in graph classification, which proves their effectiveness. Meanwhile, the SupCosine model that combines the two modules achieves a very significant performance improvement, showing that the structure inference and supervised contrastive learning modules integrate well with the graph neural network and complement each other, enabling the graph neural network to classify graphs well.
To further illustrate the effectiveness of the method, FIG. 3 shows the training accuracy curves of GIN and of our method. Our method attains relatively high accuracy from the beginning and remains considerably stable during training. In contrast, GIN fluctuates significantly during training, and its training accuracy approaches 100% after about 100 epochs, indicating a very pronounced overfitting phenomenon; its performance on the test set is accordingly inferior to that of our method. The relatively stable training process of our method is mainly due to the structure inference module, which strengthens the hidden links in the graph and highlights the characteristic properties of different graphs. The supervised contrastive learning module helps the model learn the distinguishability between different kinds of graphs, avoids overfitting during training, and improves the generalizability of the model.
The above experiments show that, compared with other graph classification methods based on graph neural networks, the proposed model SupCosine achieves superior performance, adapts well to the broad graph classification problem, and has good prospects for popularization and application.
Claims (8)
1. A graph classification training method based on supervised contrastive learning and structure inference, characterized by comprising the following steps:
step one, data acquisition: acquiring the basic data required for graph classification, namely an adjacency matrix A and node features X, from data sets such as MUTAG, PTC, PROTEINS, and IMDBBINARY;
step two, generating simulated time sequences: generating a corresponding set of simulated time sequences C for the adjacency matrix A of each graph based on the acquired basic data;
step three, structure inference: computing the adjacency probability of each edge a posteriori from the time-sequence set C of each graph, and applying a threshold ξ to the adjacency probabilities to generate an enhanced graph A';
step four, constructing positive and negative example samples: dividing the acquired A' and X into positive and negative example samples G_p and G_n according to the graph labels;
step five, generating subgraph embeddings: for the positive and negative examples G_p and G_n, generating the corresponding subgraph partition by a breadth-first search from each node, and generating subgraph embeddings by applying a subgraph neural network on the corresponding subgraphs;
step six, generating the graph embedding: scoring and sampling the subgraph embeddings to construct a one-dimensional embedding that represents the current graph data, and using this embedding to generate the corresponding graph classification label P;
step seven, supervised contrastive learning: computing a loss from the graph embeddings of the positive and negative samples and the corresponding supervision labels through the contrastive learning function, computing an ordinary classification loss from the graph classification label P and the true label, and combining the two to form the final classification loss;
step eight, updating the model iteratively according to the classification loss, and after convergence adopting the model outputting the final graph classification label P' as the final graph classification model.
2. The graph classification training method based on supervised contrastive learning and structure inference according to claim 1, characterized in that, in step one, the MUTAG and PTC data sets are compound-molecule data sets in which each graph represents a compound molecule; IMDBBINARY is derived from the Internet Movie Database, where each node represents an actor and an edge indicates that two actors appear in the same movie; PROTEINS is a protein-molecule data set whose topology represents the spatial shape of the corresponding protein molecule; each graph in the data set is G = (V, E), where V denotes the node set and E denotes the edge set; an attributed graph is further expressed as G = (X, A) with X ∈ R^{n×d} and A ∈ R^{n×n}, where X is the set of node attributes, A is the adjacency matrix representing the graph topology, n is the number of nodes, and d is the maximum node feature dimension.
3. The graph classification training method based on supervised contrastive learning and structure inference according to claim 1, characterized in that, in step two, a simulated infection procedure incorporating node degree is adopted to generate simulated sequences for each graph in the data set; first a node in the graph is selected, and then an infection probability is generated for each neighbor according to the difference between the degree of the node and the degree of the neighboring node; the corresponding infection time and infected nodes are computed according to the infection probability and marked accordingly; repeating this process within a limited time yields a time-infection sequence of the node; for each graph, 1000 time-infection sequences are generated for structure inference.
4. The graph classification training method based on supervised contrastive learning and structure inference according to claim 1, characterized in that, in step three, the adjacency probability of each edge is computed a posteriori from the time sequences of each graph, and a threshold ξ is applied to produce the enhanced graph A'; for each graph the time-sequence set is C = (c_1, ..., c_q) ∈ R^{q×n}, where q is the number of time sequences, n is the number of nodes in each time sequence, and c_1, ..., c_q are the individual sequences; computing the corresponding adjacency matrix M from the set C amounts to maximizing the probability function φ(C; M); within the time window [0, T], the transmission function f(t_j | t_i; M_{i,j}) of the infection process from node v_i to node v_j is given by equation (1), where t_i is the infection time of node v_i, t_j is the infection time of node v_j, t_i, t_j ≤ T, e is the natural base, and M_{i,j} is the (i, j)-th entry of the adjacency matrix M; the transition probability function φ(·) from node v_i to node v_j is then defined by equation (2), in which t_k ranges over all times between t_i and t_j; adding the probabilities over all nodes v_i that satisfy t_i < t_j yields, as equation (3), the probability distribution of the adjacency matrix M that satisfies the current temporal infection sequence at time t_j; multiplying the corresponding probability distributions over all times t of a time sequence yields, as equation (4), the posterior probability distribution of the adjacency matrix M at any time t conditioned on the time sequence c; according to this probability distribution and the conditional-independence assumption, the probability density of the adjacency matrix M over [0, T] for the corresponding time sequence c is given by equation (5); since equation (5) only expresses the probability of a single time sequence c, for the set C we have f(C; M) = ∏_{c∈C} f(c; M), and M is finally obtained by solving
max_{M≥0} ∑_{c∈C} log f(c; M)    (6)
noise is filtered from the obtained M using the threshold ξ, and the filtered M is fused with the original adjacency matrix A to generate the enhanced graph A' as the enhanced adjacency matrix of the data.
5. The graph classification training method based on supervised contrastive learning and structure inference according to claim 1, characterized in that, in step four, for each sample of each label, the sample itself and the corresponding label serve as the positive example, and samples carrying other labels serve as negative examples; for each sample in the data set, the samples are divided according to the label into those belonging to the current label and those not belonging to it; a sample is randomly selected from the set not belonging to the current label as the negative example of the current sample, forming a positive-negative example pair together with the original sample.
6. The graph classification training method based on supervised contrastive learning and structure inference according to claim 1, characterized in that, in step five, the subgraph embeddings are generated as follows: for each node, a breadth-first search generates the corresponding subgraph partition, and a subgraph neural network applied to the corresponding subgraph generates the subgraph embedding; for each node v_i on the graph, a breadth-first search (BFS) with an upper search bound β generates the corresponding subgraph g_i and node set S_BFS(v_i); applying the subgraph neural network on this set generates the initialized node features according to equations (7) and (8), where x is the initial feature of a node, h^(0) is the initialization feature of node v_i for the graph neural network, AGGREGATE is the aggregation function that aggregates neighbor features, and COMBINE is the function that combines the neighbor features with the current node feature; the initial feature H^(0) of the whole graph is computed through equations (7) and (8).
7. The graph classification training method based on supervised contrastive learning and structure inference according to claim 1, characterized in that, in step six, the subgraph embeddings are scored and sampled to construct a one-dimensional embedding that represents the current graph data; on the basis of the subgraph embedding H^(0), several neighborhood aggregation layers are stacked to form a deep graph neural network; in the feature update function of each layer, l ≥ 1 denotes the index of the current aggregation layer, the neighborhood node set of node v_i supplies the features being aggregated, γ is the corresponding weighting parameter, and MLP is a multi-layer perceptron for learning the aggregation; after several layers of learning, the node feature representation of the whole graph is obtained, where d_n is the node feature dimension at layer L; to fuse the feature representations of the nodes of a graph into a single embedding representing the entire graph, a learnable TOP_k function first selects the k most important nodes, where idx denotes the indices of the selected nodes and a learnable parameter vector provides the node scores; the final graph embedding r is computed by applying the readout formula to the selected nodes.
8. The graph classification training method based on supervised contrastive learning and structure inference according to claim 1, characterized in that, in step seven, the graph embeddings of the positive and negative samples and the corresponding supervision labels are used to compute a loss through the contrastive learning function, which is combined with the ordinary classification loss to form the final classification loss; cross-entropy is taken as the ordinary classification loss, where Y is the true label information of each graph; the contrastive learning loss after introducing the label information is expressed as equation (14), where Ω = {1, ..., 2m} denotes the indices of all samples in the current batch, Φ(·) denotes the indices of all positive examples in the current batch, each positive example i has a corresponding negative example p, τ is the temperature hyperparameter, and Γ(i) is the set of all samples except sample i; because the positive and negative example pairs are constructed in advance according to the labels, the label information is implicit in equation (14), which improves the guidance quality of contrastive learning in the graph classification task; the final loss is formed by combining the two terms, where λ is the hyperparameter controlling the weight of the supervised contrastive learning component.
Priority application: CN202211106324.4A, filed 2022-09-11. Publication: CN115481682A, published 2022-12-16. Legal status: Pending.