CN116484911A

CN116484911A - Distribution robust optimization method and system suitable for out-of-distribution generalization of graph neural networks

Info

Publication number: CN116484911A
Application number: CN202310178341.7A
Authority: CN
Inventors: 邢晓芬; 许轶珂; 郭锴凌; 徐向民
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2023-02-28
Filing date: 2023-02-28
Publication date: 2023-07-25

Abstract

The invention provides a distribution robust optimization method and system suitable for out-of-distribution generalization of graph neural networks. The method comprises the following steps: selecting an arbitrary graph convolutional neural network model, aggregating the neighbor features of a target individual in the graph data, and updating the features of the target individual; obtaining the model's final prediction for each individual and, using a loss function related to the task objective, computing the loss between the prediction and the individual's label to obtain a loss value for each individual; from the computed loss values, obtaining a weight for each individual through a distribution robust optimization algorithm based on KL divergence, the resulting weight distribution being the worst-case distribution for the current model; smoothing the individual weights so that training individuals that are close to each other or have similar local structures also have similar weights; computing the overall loss of all training individuals from the adjusted weights and the corresponding loss values; and optimizing the graph convolutional neural network model with the overall training loss.

Description

Distribution robust optimization method and system suitable for out-of-distribution generalization of graph neural networks

Technical Field

The invention relates to the field of graph neural networks, and in particular to a distribution robust optimization method and system suitable for out-of-distribution generalization of graph neural networks.

Background

Graph-structured data is attracting growing attention, and a variety of graph algorithms have emerged in response. Among them, graph neural networks have been studied extensively and applied in many fields. Because a graph neural network is a deep learning model, its trustworthiness matters, especially in fields such as finance and biological research. An important branch of trustworthy AI is out-of-distribution generalization: the data a model sees at prediction time may follow a different distribution from the data used in training, so test performance can fall far below training performance, and the model then lacks even basic trustworthiness.

Several existing works specifically address the out-of-distribution generalization problem of graph convolutional neural networks.

EERM (Explore-to-Extrapolate Risk Minimization) first uses generators to partition all nodes into different environments and then computes the average node loss in each environment. The classification model is trained to minimize the mean and variance of the losses across environments, while the generators are trained to maximize the loss variance, so that the classifier performs well on data from different environments and generalizes better. However, the method is computationally expensive, which makes it hard to apply to large datasets and demanding in computing resources.

Shift-Robust GNNs (SRGNN) improves generalization by reducing the distribution difference between training samples and test samples. The method first samples a set of unbiased samples from the test set, feeds them through the graph convolutional neural network during training to obtain their feature representations, and then computes the difference between their feature distribution and that of the training samples. A commonly used measure is the Maximum Mean Discrepancy (MMD); this discrepancy serves as a regularization term and is optimized together with the cross-entropy loss. The method requires test-phase data during training, which is hard to satisfy in practice.

The Graph Out-of-Distribution Benchmark (GOOD) provides a test platform for graph convolutional neural networks under out-of-distribution generalization and constructs several graph datasets with distribution shift, such as GOOD-Cora and GOOD-Arxiv. It directly combines graph convolutional neural networks with methods designed for distribution shift in Euclidean data, such as invariant risk minimization (IRM) and Variance-Risk Extrapolation (V-REx). These methods share a common problem: during training they require not only the class label of each sample but also its environment label. Environment labels are hard to obtain in reality, so the labeling cost is too high for practical use.

For distribution shift in Euclidean data there is another approach, distribution robust optimization based on KL divergence, which improves generalization by making the model focus on worse data distributions throughout training. It has two advantages. First, no environment labels are needed during training, so the labeling cost is low. Second, its computational complexity is low: training speed is essentially the same as that of traditional Empirical Risk Minimization (ERM), and the demand on computing resources is modest. Distribution robust optimization based on KL divergence has made great progress on the out-of-distribution generalization of deep learning models, but when applied to graph neural networks it does not take graph structure information into account. How to improve distribution robust optimization so that it is better suited to graph data has therefore become an urgent problem.

Summary of the Invention

The purpose of the invention is to overcome the shortcomings of KL-divergence-based distribution robust optimization on graph neural networks by proposing a weight smoothing method that adapts it to graph convolutional neural networks. The invention further improves the performance of graph neural networks on out-of-distribution generalization, and it is highly flexible and plug-and-play.

The invention is realized through at least one of the following technical solutions.

A distribution robust optimization method suitable for out-of-distribution generalization of graph neural networks comprises the following steps:

Step 1. Input the training individuals into a graph convolutional neural network model, which aggregates the neighbor features of each target individual, applies a feature transformation, and updates the target individual's own features, finally producing the prediction output.

Step 2. Compute the loss value of each training individual.

Step 3. Compute the loss weight of each training individual using a distribution robust optimization algorithm based on KL divergence.

Step 4. Smooth the individual loss weights.

Step 5. Train with the weighted loss.

Step 6. Use the trained model for data prediction.

Further, the input of the graph convolutional neural network comprises feature matrices and graph structure information. The feature matrices comprise a node feature matrix and an edge feature matrix; the graph structure information is stored in an adjacency matrix that represents the connections between nodes.

The graph convolutional neural network consists mainly of three parts: feature transformation, information aggregation, and feature update.

Further, the graph convolutional neural network is a Graph Convolutional Network (GCN) model, whose standard layer-wise rule is:

$$H^{l+1} = \sigma\left(D^{-\frac{1}{2}} A D^{-\frac{1}{2}} H^{l} W^{l}\right)$$

where $W^{l}$ is the feature transformation matrix of layer $l$; $H^{l}$ is the feature representation of the individuals at layer $l$, with $H^{0}$ the original features; $D$ is the degree matrix of the graph, giving the number of neighbors of each node; $A$ is the adjacency matrix of the graph; $D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$ is the normalized adjacency matrix; and $\sigma$ is the activation function. Feeding the feature matrix of the nodes and the adjacency matrix, which encodes the edges between nodes, into the graph convolutional neural network yields the classification result for each node.

Further, the current graph data contains $N$ individuals, the model classifies each individual into one of $M$ risk categories, and the model parameters are optimized with the cross-entropy loss computed from the labels:

$$L = \frac{1}{N}\sum_{i=1}^{N} L_i = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic}\log p_{ic}$$

where $N$ is the number of training individuals, $L_i$ is the loss of individual $i$, $p_{ic}$ is the probability the model predicts for individual $i$ on category $c$, and $y_{ic}$ is the corresponding label. The formula gives the final total loss $L$.

Further, from the loss values obtained, the distribution robust optimization algorithm based on KL divergence computes the loss weights of all current individuals; this loss weight distribution is the worst-case distribution the algorithm can find.

Further, the optimization objective is:

$$\min_{\theta\in\Theta}\ \sup_{p\in\mathcal{P}}\ \mathbb{E}_{(X,y)\sim p}\left[L(f(X,\theta),y)\right],\qquad \mathcal{P}=\{\,p : D(p,\hat{p})\le\rho\,\}$$

where $\theta$ denotes the parameters of the graph model and $\Theta$ the set of possible parameters; $p$ is a possible feature distribution of future individual data; and $\mathcal{P}$ is the uncertainty set of all candidate distributions, whose extent is determined by $D(p,\hat{p})\le\rho$. $D(P,Q)$ is a distance function between distributions $P$ and $Q$. Taking KL divergence as an example, $D(P,Q)$ can be written as $D(P,Q)=\int \log\left(\frac{dP}{dQ}\right)dP$. $f(X,\theta)$ is the model's prediction for an individual, and $L(f(X,\theta),y)$ is that individual's loss value, where $L$ is the cross-entropy loss function. $(X,y)\sim p$ means that the sample $(X,y)$ is drawn at random from data with distribution $p$. $X$ is the individual's information, comprising its feature information and its subgraph structure information; the subgraph structure information contains the associations between the current individual and other individuals. $\mathbb{E}_{(X,y)\sim p}[\cdot]$ is the expectation of the individual loss $L$ under distribution $p$. $\hat{p}$ is the data distribution of the training set; the distance threshold $\rho$ confines the worst-case distribution to a ball of radius $\rho$ centered at $\hat{p}$.

Further, the distance function is the Wasserstein distance or the KL divergence.
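As a concrete illustration of the KL-divergence constraint, the following minimal Python sketch assumes a discrete setting in which the training distribution is uniform over the N training individuals and a candidate distribution is a weight vector over the same individuals. The function names, the example weights, and the values of ρ are illustrative assumptions and do not come from the patent.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Assumes every q_i is strictly positive.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:  # terms with p_i = 0 contribute 0 by convention
            total += p_i * math.log(p_i / q_i)
    return total


def in_uncertainty_set(weights, rho):
    """Check whether a reweighting of the training individuals stays within
    KL distance rho of the uniform empirical distribution (assumed here)."""
    n = len(weights)
    uniform = [1.0 / n] * n
    return kl_divergence(weights, uniform) <= rho


# Hypothetical example with four training individuals.
uniform_weights = [0.25, 0.25, 0.25, 0.25]
skewed_weights = [0.7, 0.1, 0.1, 0.1]

print(kl_divergence(skewed_weights, uniform_weights))
print(in_uncertainty_set(skewed_weights, rho=0.1))
print(in_uncertainty_set(skewed_weights, rho=0.5))
```

A larger ρ admits weight vectors that concentrate more mass on a few individuals, which is how the threshold controls how pessimistic the worst-case distribution may be.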

Further, the individual loss weights obtained in step 3 are smoothed so that individuals that are adjacent or have similar local neighborhood structures receive closer weight values, thereby introducing graph structure information.

Further, the smoothing operation is interpolation smoothing. In the smoothing operation, $W_s^{n}$ denotes the weights after $n$ rounds of smoothing, $W_s^{0}$ denotes the initial weights computed in step 3, a smoothing matrix is applied to the weights, and $\lambda$ denotes the importance of the initial weights. Through $\lambda$, the smoothed weights retain part of the weight information obtained in step 3.

A system for implementing the distribution robust optimization method suitable for out-of-distribution generalization of graph neural networks comprises:

Operation module: feeds the graph data into the graph convolutional neural network to obtain a predicted psychological risk level for each individual, and computes the corresponding loss values from the labels.

Weight calculation module: computes the loss weight of each individual using the distribution robust optimization algorithm based on KL divergence.

Weight smoothing module: smooths the loss weights so that individuals that are adjacent or have similar local neighborhood structures receive closer weight values.

Training module: computes the overall weighted loss from the smoothed weights and each individual's loss, and trains the graph convolutional neural network.

Execution module: when psychological questionnaire results for students in a given region are obtained, constructs them into graph data and feeds the graph into the trained model to predict the risk of mental illness.

Compared with the prior art, the beneficial effects of the invention are as follows. On data with distribution shift, the invention outperforms the out-of-distribution generalization methods in common use. Moreover, when traditional distribution robust optimization methods are combined with the proposed weight smoothing method, model performance improves significantly, which also demonstrates the effectiveness of the invention.

Brief Description of the Drawings

Fig. 1 is a graph of similar psychological relationships between individuals according to an embodiment of the invention;

Fig. 2 is a training block diagram according to an embodiment of the invention;

Fig. 3 is a flowchart of the distribution robust optimization method suitable for out-of-distribution generalization of graph neural networks according to an embodiment of the invention.

Detailed Description

To help those skilled in the art better understand the solution of the invention, the invention is described in further detail below with reference to the drawings and specific embodiments. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

This embodiment is a distribution-robust-optimized mental illness detection method suitable for out-of-distribution generalization of graph neural networks. All students of a given university are the detection subjects: each student is an individual, and the psychological questionnaire the student completed provides that individual's features. To mine more feature information, a student similarity graph is constructed from the questionnaire results, with an edge between any two students whose psychological conditions are similar.

In the usual setting, a graph model performs individual risk assessment: an individual's feature information and the corresponding relationship subgraph are fed into the graph model to predict mental illness, or the entire graph is fed in at once. The graph model is then trained with a plain cross-entropy loss. When a classifier trained on one school's student data is deployed to other regions or schools, the results are often poor, because students' psychological states and the feature distribution of the questionnaires differ between regions and schools. These distribution differences are mainly caused by environmental factors. For example, coursework and study pressure differ between schools, with academic pressure high at some and lower at others. Living-standard pressure also differs between regions: students in developed regions enjoy a higher standard of living and less life pressure, while students in less developed regions face life pressure in addition to academic pressure, which also changes their psychological state. All of these factors give students' mental health a different distribution in different regions and schools. The data distribution the model faces at inference time therefore differs substantially from the training distribution, and simply using the cross-entropy loss does not give the model good generalization ability.

Distribution robust optimization based on KL divergence is a commonly used algorithm for improving generalization. It assigns higher weights during training to individuals with larger losses, increasing their share of the total loss so that the model pays more attention to them. Introducing this method directly into the training of a graph model is not reasonable, however. In the student similarity graph, an individual is related to other individuals as well as to itself, so the weight of its training loss should also be similar to the weights of its neighbors. After computing each individual's weight with the distribution robust optimization algorithm, the invention therefore adds a weight smoothing operation, so that the smoothed weights reflect not only the size of each individual's loss but also the connections between individuals.

The distribution robust optimization method suitable for out-of-distribution generalization of graph neural networks, shown in Fig. 2, provides an optimization method that alleviates the low accuracy and lack of trustworthiness of graph neural networks on out-of-distribution data. It comprises the following steps:

Step 1. In mental illness detection, the student similarity graph is the input, and the graph neural network produces a psychological risk prediction for each student in the graph. The network aggregates each student's neighbor information, applies a feature transformation, and updates the target student's own features, thereby taking full account of the graph's structural information. Each node in the student similarity graph represents one student and carries feature information mined from the psychological questionnaire that student completed. Each edge represents a similarity relationship between two nodes: for example, the similarity between two students' features is computed, and an edge is formed between them when it exceeds a threshold.
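The graph construction described in step 1 can be sketched as follows. The patent does not name a similarity measure or a threshold value, so this sketch assumes cosine similarity and a threshold of 0.95; the questionnaire scores and function names are likewise hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors (an assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_similarity_graph(features, threshold):
    """Return a symmetric 0/1 adjacency matrix with an edge between every
    pair of individuals whose similarity exceeds the threshold."""
    n = len(features)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if cosine_similarity(features[i], features[j]) > threshold:
                adjacency[i][j] = 1
                adjacency[j][i] = 1
    return adjacency


# Hypothetical questionnaire scores for three students.
questionnaire = [
    [4.0, 1.0, 3.0],
    [4.0, 1.0, 2.5],
    [0.5, 5.0, 0.5],
]
adjacency = build_similarity_graph(questionnaire, threshold=0.95)
for row in adjacency:
    print(row)
```

In this example only the first two students answer similarly enough to be linked; the third remains an isolated node.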

The input of the graph convolutional neural network model comprises feature matrices and graph structure information. The feature matrices comprise a node feature matrix and an edge feature matrix; the graph structure information is stored mainly in an adjacency matrix that represents the connections between nodes. The model works in three parts: feature transformation, information aggregation, and feature update. Its core idea is that each target node updates its own feature information by aggregating the features of its neighbors.

Taking GCN as a preferred embodiment, the computation follows the standard GCN layer rule:

$$H^{l+1} = \sigma\left(D^{-\frac{1}{2}} A D^{-\frac{1}{2}} H^{l} W^{l}\right)$$

where $W^{l}$ is the feature transformation matrix of layer $l$; $H^{l}$ is the feature matrix of the individuals at layer $l$; $D$ is the degree matrix of the graph, giving the number of neighbors of each node; $A$ is the adjacency matrix of the graph; $D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$ is the normalized adjacency matrix; and $\sigma$ is the activation function.

Taking the mental illness detection of this embodiment as an example, suppose there are $N$ individuals in total and each individual has $C$ features. The prediction process of the graph convolutional neural network can be divided into the following three steps:

Feature transformation: $H^{l}W^{l}$. When $l=0$, $H^{0}$ holds the original features of the individuals and has dimension $(N, C)$; the features can be information mined from the psychological questionnaires the students completed. A linear layer applies a linear transformation to the features to obtain the updated feature matrix.

Information aggregation: $A$ is the adjacency matrix of the graph, of dimension $(N, N)$. An element equal to 1 indicates a high similarity between the two corresponding individuals, and 0 indicates no such similarity. $D$ is the degree matrix, of dimension $(N, 1)$; each value gives the number of individuals in the whole graph with which the corresponding individual has high similarity. Take individual a as an example: when individuals b, c, d, and e are very similar to a, an edge exists between a and each of them. In this step, individual a aggregates the feature information of b, c, d, and e, and their weighted sum becomes a's new feature. The resulting feature of a therefore contains neighbor information in addition to the original feature information, which increases its information content.

Feature update: finally, an activation function updates the features further.
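The three steps above can be sketched in plain Python for a single layer. This is a minimal illustration under stated assumptions, not the patented implementation: it uses symmetric normalization, adds self-loops so that a node keeps its own features during aggregation, and uses ReLU as the activation. The helper names and the toy graph are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalize_adjacency(adjacency, add_self_loops=True):
    """Symmetric normalization D^(-1/2) A D^(-1/2).

    Self-loops are an assumption of this sketch."""
    n = len(adjacency)
    a = [[float(adjacency[i][j]) + (1.0 if add_self_loops and i == j else 0.0)
          for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degree]
    return [[inv_sqrt[i] * a[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]


def gcn_layer(a_norm, h, w):
    """One GCN layer: transform, aggregate, then update with ReLU."""
    transformed = matmul(h, w)                # feature transformation: H W
    aggregated = matmul(a_norm, transformed)  # information aggregation
    return [[max(0.0, x) for x in row] for row in aggregated]  # feature update


# Toy graph: three individuals connected in a chain 0 - 1 - 2.
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # H^0 with N = 3, C = 2
weight = [[1.0], [1.0]]                          # W^0 mapping C = 2 to 1

a_norm = normalize_adjacency(adjacency)
hidden = gcn_layer(a_norm, features, weight)
print(hidden)
```

Stacking several such layers lets each node aggregate information from multi-hop neighbors.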

Prediction begins by using the graph convolutional neural network model to predict the class of each individual in the training set. Through the model, each individual in the student similarity graph, that is, each node, aggregates the feature information of its multi-hop neighbors and thereby obtains the final prediction.

Step 2. Compute the loss of each individual.

Step 1 yields the prediction output of the graph convolutional neural network model for each individual in the training set. The current graph data contains $N$ individuals, and the model classifies each individual's mental illness risk category. An individual's label indicates its risk of psychological problems, and there are $M$ risk categories in total. The model parameters are optimized with the cross-entropy loss computed from the labels:

$$L = \frac{1}{N}\sum_{i=1}^{N} L_i = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic}\log p_{ic}$$

where $N$ is the number of training individuals, $L_i$ is the loss of individual $i$, $p_{ic}$ is the probability the model predicts for individual $i$ on category $c$, and $y_{ic}$ is the corresponding label. The formula gives the final total loss $L$.
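The per-individual cross-entropy loss of step 2 can be sketched as below, assuming one-hot labels and predicted probabilities that already sum to one per individual. The function name, the clipping constant `eps`, and the example values are illustrative assumptions.

```python
import math


def per_individual_cross_entropy(probabilities, labels, eps=1e-12):
    """Return one cross-entropy loss per individual.

    probabilities[i][c] is the predicted probability of class c for
    individual i, and labels[i][c] is the one-hot label. Probabilities are
    clipped at eps to avoid log(0)."""
    losses = []
    for p_row, y_row in zip(probabilities, labels):
        losses.append(-sum(y * math.log(max(p, eps))
                           for p, y in zip(p_row, y_row)))
    return losses


# Hypothetical predictions for N = 3 individuals and M = 2 risk categories.
probabilities = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
labels = [[1, 0], [1, 0], [0, 1]]

losses = per_individual_cross_entropy(probabilities, labels)
total_loss = sum(losses) / len(losses)  # unweighted average used by plain ERM
print(losses)
print(total_loss)
```

Keeping the losses per individual, rather than only their mean, is what allows step 3 to assign each individual its own weight.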

Step 3. Compute the loss weight of each individual using the distribution robust optimization algorithm based on KL divergence.

In a typical machine learning classification task, the classifier is trained on limited training data, so the true data distribution can only be estimated from that limited data. Because the amount of data is limited, the observed training distribution differs from the actual distribution, and the model cannot learn the true distribution; its performance is therefore affected by the training distribution. How to reduce this effect, or further optimize the learned classifier, has become a research topic.

Distributionally Robust Optimization (DRO) was proposed to mitigate the performance degradation caused by distribution shift. Centered on the training data distribution, DRO searches nearby for the data distribution that is worst for the current model, that is, the one that maximizes the overall loss, and then optimizes the model to perform better under that worst-case distribution, which improves generalization and robustness. Its optimization objective is:

$$\min_{\theta\in\Theta}\ \sup_{p\in\mathcal{P}}\ \mathbb{E}_{(X,y)\sim p}\left[L(f(X,\theta),y)\right],\qquad \mathcal{P}=\{\,p : D(p,\hat{p})\le\rho\,\}$$

where $\theta$ denotes the parameters of the graph model and $\Theta$ the set of possible parameters; $p$ is a possible feature distribution of future individual data; and $\mathcal{P}$ is the uncertainty set of all candidate distributions, whose extent is determined by $D(p,\hat{p})\le\rho$. $D(P,Q)$ is a distance function between distributions $P$ and $Q$. Taking KL divergence as an example, $D(P,Q)$ can be written as $D(P,Q)=\int \log\left(\frac{dP}{dQ}\right)dP$. $f(X,\theta)$ is the model's prediction for an individual, and $L(f(X,\theta),y)$ is that individual's loss value, where $L$ is the cross-entropy loss function. $(X,y)\sim p$ means that the sample $(X,y)$ is drawn at random from data with distribution $p$. $X$ is the individual's information, comprising its feature information and its subgraph structure information; the subgraph structure information contains the similarity relationships between the current individual and other individuals. $\mathbb{E}_{(X,y)\sim p}[\cdot]$ is the expectation of the individual loss $L$ under distribution $p$. $\hat{p}$ is the data distribution of the training set; the distance threshold $\rho$ confines the worst-case distribution to a ball of radius $\rho$ centered at $\hat{p}$.

As another preferred embodiment, the distance function may be, for example, the Wasserstein distance or the KL divergence.

Through the min-max adversarial training framework above, an optimization algorithm computes the loss weight coefficient of each individual under the current losses.
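For the KL-divergence case, a well-known property of the inner maximization is that the worst-case distribution is an exponential tilting of the training distribution, so each individual's weight is proportional to exp(L_i / τ), where τ > 0 is the dual variable of the KL constraint. The sketch below assumes τ is supplied directly as a hyperparameter rather than solved for from ρ; the patent does not state how this quantity is chosen, and the function name and example losses are hypothetical.

```python
import math


def kl_dro_weights(losses, temperature):
    """Worst-case weights for KL-constrained DRO over N training individuals.

    Each weight is proportional to exp(loss / temperature). The maximum loss
    is subtracted before exponentiating for numerical stability, which does
    not change the normalized result."""
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    max_loss = max(losses)
    exps = [math.exp((loss - max_loss) / temperature) for loss in losses]
    normalizer = sum(exps)
    return [e / normalizer for e in exps]


# Hypothetical per-individual losses from step 2.
example_losses = [0.105361, 1.609438, 0.916291]

sharp = kl_dro_weights(example_losses, temperature=1.0)
flat = kl_dro_weights(example_losses, temperature=1e6)
print(sharp)  # individuals with larger loss receive larger weight
print(flat)   # a very large temperature approaches uniform (ERM) weights
```

A small τ corresponds to a large uncertainty radius and concentrates the weight on the hardest individuals, while a large τ recovers nearly uniform weights.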

Step 4. Smooth the individual loss weights obtained in step 3. Smoothing the loss weight of each individual gives individuals that are adjacent, or that have similar local neighbor structures, closer weight values, which further introduces graph structure information.

Step 3 yields a loss weight for each individual, but at that point the weight reflects only the individual's own loss value and ignores the structural information of the relationship graph. Because graph data exhibits structural homophily (assortativity), training nodes that are close to each other on the graph should also have similar weights. The present invention therefore proposes a weight smoothing module: after the weight of each individual is computed in step 3, a smoothing operation is applied to the weight values so that nearby training nodes on the graph have similar weights, thereby taking into account the structural information specific to graph data. The smoothing operation is as follows:

With this improvement, the distribution robust optimization algorithm becomes better suited to graph data and incorporates more information, which improves the adaptability of the method on graph data.
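The claims describe this operation as interpolation smoothing, with a coefficient λ that controls how much of the initial weight from step 3 is retained. A minimal sketch of one way this could work, assuming the update W^n = (1 − λ)·S·W^(n−1) + λ·W^0, where S is a row-normalized adjacency matrix with self-loops. The exact update rule, the choice of S, the final renormalization and the name `smooth_weights` are all assumptions, not taken from the patent.

```python
def smooth_weights(w0, adj, lam=0.5, n_steps=2):
    """Interpolation smoothing of per-node weights over a graph (sketch).

    w0      : initial weights from the DRO step, one per node
    adj     : adjacency matrix as a list of lists (0/1 entries)
    lam     : importance of the initial weights (lambda in the text)
    n_steps : number of smoothing rounds
    """
    n = len(w0)
    # Assumed smoothing matrix: adjacency with self-loops, each row sums to 1
    s = []
    for i in range(n):
        row = [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        total = sum(row)
        s.append([v / total for v in row])

    w = list(w0)
    for _ in range(n_steps):
        # Average each node's weight with its neighbours' weights
        propagated = [sum(s[i][j] * w[j] for j in range(n)) for i in range(n)]
        # Interpolate with the initial weights so part of w0 is retained
        w = [(1.0 - lam) * propagated[i] + lam * w0[i] for i in range(n)]

    # Assumed extra step: renormalise so the weights still sum to one
    z = sum(w)
    return [v / z for v in w]

# Path graph 0-1-2-3 with one heavily weighted node (made-up numbers)
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
w0 = [0.7, 0.1, 0.1, 0.1]
w_smooth = smooth_weights(w0, adj, lam=0.5, n_steps=2)
```

After smoothing, the neighbor of the heavily weighted node receives a larger share, so adjacent nodes end up with closer weights while the original ordering is largely preserved.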

Step 5. Train with the weighted loss. The smoothed loss weights and the corresponding loss values are used to compute the weighted sum of the losses of all individuals, which serves as the final total loss. A gradient descent optimization algorithm then minimizes this total prediction loss to update and optimize the parameters of the graph convolutional neural network model.
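The weighted training step can be illustrated on a toy problem. The sketch below replaces the graph convolutional network with a one-parameter logistic model purely for illustration, and holds the weights fixed across iterations, whereas the method described here recomputes and smooths them from the current losses. All names and numbers are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def per_sample_losses(theta, xs, ys):
    """Binary cross-entropy of a toy model p = sigmoid(theta * x)."""
    eps = 1e-12  # guards against log(0)
    out = []
    for x, y in zip(xs, ys):
        p = sigmoid(theta * x)
        out.append(-(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)))
    return out

def total_loss(theta, xs, ys, weights):
    """Weighted sum of per-individual losses: sum_i w_i * L_i."""
    return sum(w * l for w, l in zip(weights, per_sample_losses(theta, xs, ys)))

def weighted_step(theta, xs, ys, weights, lr=0.5):
    """One gradient-descent step on the weighted loss, weights held constant."""
    grad = sum(w * (sigmoid(theta * x) - y) * x
               for w, x, y in zip(weights, xs, ys))
    return theta - lr * grad

xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
weights = [0.1, 0.4, 0.4, 0.1]  # stand-in for smoothed weights, sums to 1

theta = 0.0
loss_before = total_loss(theta, xs, ys, weights)
for _ in range(20):
    theta = weighted_step(theta, xs, ys, weights)
loss_after = total_loss(theta, xs, ys, weights)
```

Treating the weights as constants during the gradient step means the model parameters are updated against the current worst-case distribution, rather than differentiating through the weight computation.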

Step 6. Use the trained model to predict each individual's mental illness risk category.

Operation module: inputs the graph data of similar psychological relationships among students into the graph convolutional neural network, obtains a predicted psychological risk level for each individual, and uses the labels to compute the corresponding loss values.

Execution module: when the results of a psychological questionnaire survey of students in a given region are obtained, constructs them into graph data and inputs the graph data into the trained graph convolutional neural network to predict the risk of mental illness.

The preferred embodiments of the invention disclosed above are intended only to help explain the invention. They do not describe every detail exhaustively, nor do they limit the invention to the specific embodiments described. Many modifications and variations are clearly possible in light of this specification. These embodiments were chosen and described in detail to better explain the principles and practical application of the invention, so that those skilled in the art can understand and use it well. The invention is limited only by the claims, along with their full scope and equivalents.

Claims

1. A distribution robust optimization method suitable for out-of-distribution generalization of graph neural networks, characterized by comprising the following steps:

step 1, inputting a training individual into a graph convolution neural network model, completing aggregation and feature transformation of neighbor features of a target individual, updating the features of the target individual, and finally obtaining prediction output;

step 2, calculating a loss value of each training individual;

step 3, calculating the loss weight of each training individual by using a distribution robust optimization algorithm based on KL divergence;

step 4, smoothing the loss weight of the individual;

step 5, training by using the weighted loss;

step 6, performing data prediction by using the trained model.

2. The distribution robust optimization method suitable for out-of-distribution generalization of graph neural networks according to claim 1, wherein the input of the graph convolutional neural network comprises a feature matrix and graph structure information, wherein the feature matrix comprises a node feature matrix and an edge feature matrix, and the graph structure information is stored as an adjacency matrix and represents the connection relations between nodes;

the graph convolutional neural network is mainly composed of three parts: feature transformation, information aggregation, and feature updating.

3. The distribution robust optimization method suitable for out-of-distribution generalization of graph neural networks according to claim 2, characterized in that the graph convolutional neural network is a Graph Convolutional Network (GCN) model:

$$H^{l+1} = \sigma\left(D^{-1/2} A D^{-1/2} H^{l} W^{l}\right)$$

wherein $W^{l}$ represents the feature transformation matrix of the $l$-th layer; $H^{l}$ represents the features of the individuals at the $l$-th layer, and $H^{0}$ is the original features of the individuals; $D$ represents the degree matrix of the nodes in the graph, giving the number of neighbors of each node; $A$ represents the adjacency matrix of the graph, and $D^{-1/2} A D^{-1/2}$ is the normalized adjacency matrix; $\sigma$ represents an activation function; the feature matrix of the nodes in the graph and the adjacency matrix representing the edge relations between the nodes are input into the graph convolutional neural network, and the classification result of each node is finally obtained.

4. The distribution robust optimization method suitable for out-of-distribution generalization of graph neural networks according to claim 1, wherein the current graph data has N individuals, the model classifies each individual into one of M risk categories, and the model parameters are optimized by using the cross-entropy loss calculated from the labels, the calculation formula being as follows:

$$L_i = -\sum_{c=1}^{M} y_{ic} \log p_{ic}, \qquad L = \frac{1}{N} \sum_{i=1}^{N} L_i$$

wherein N is the number of training individuals, $L_i$ is the loss corresponding to individual i, $p_{ic}$ represents the model's predicted probability that individual i belongs to category c, and $y_{ic}$ is the corresponding label, from which the final total loss L is derived.

5. The distribution robust optimization method suitable for out-of-distribution generalization of graph neural networks according to claim 1, wherein a distribution robust optimization algorithm based on KL divergence is used to calculate the loss weights of all current individuals from the obtained loss values, and the resulting loss weight distribution is the worst-case distribution found by the algorithm at that point.

6. The distribution robust optimization method suitable for out-of-distribution generalization of graph neural networks according to claim 1, wherein the optimization objective is as follows:

$$\min_{\theta \in \Theta} \; \max_{p \in \mathcal{P}} \; \mathbb{E}_{(X, y) \sim p}\left[ L\left(f(X, \theta), y\right) \right]$$

where θ is the parameters of the graph model, Θ represents the set of possible parameters of the model, p represents a possible feature distribution of future individual data, and $\mathcal{P}$ represents the uncertainty set of all possible distributions, whose extent is determined by the distance threshold ρ; D(P, Q) represents a distance function between the distributions P and Q, and, taking KL divergence as an example, when D(P, Q) represents the KL divergence between P and Q, D(P, Q) is expressed as $D(P, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$; f(X, θ) represents the model's prediction for each individual, and L(f(X, θ), y) represents the loss value corresponding to each individual, wherein L uses a cross-entropy loss function; (X, y) ~ p means that the sample (X, y) is randomly sampled from data of distribution p; X represents individual information and comprises individual feature information and subgraph structure information, the subgraph structure information comprising association information between the current individual and other individuals; $\mathbb{E}_{(X,y)\sim p}[L(f(X,\theta),y)]$ is the expectation of the individual loss L under distribution p; and the worst-case distribution is limited to a ball of radius ρ centered on the data distribution of the training set.

7. The distribution robust optimization method suitable for out-of-distribution generalization of graph neural networks according to claim 6, wherein the distance function is a Wasserstein distance or a KL divergence.

8. The distribution robust optimization method suitable for out-of-distribution generalization of graph neural networks according to claim 1, wherein the individual loss weights obtained in step 3 are smoothed, so that individuals that are adjacent or have similar local neighbor structures have more similar weight values, thereby introducing graph structure information.

9. The distribution robust optimization method suitable for out-of-distribution generalization of graph neural networks according to claim 8, wherein the smoothing operation is interpolation smoothing, the smoothing operation being as follows,

wherein $W_s^{n}$ represents the weights after n rounds of smoothing, $W_s^{0}$ is the initial weights calculated in step 3, the operation applies a smoothing matrix, and λ represents the importance of the initial weights; λ is used so that the smoothed weights retain part of the weight information obtained in step 3.

10. A system for implementing the distribution robust optimization method suitable for out-of-distribution generalization of graph neural networks as recited in claim 1, comprising:

an operation module: inputting the graph data into a graph convolutional neural network, obtaining a psychological risk level prediction for each individual, and calculating the corresponding loss value by using the labels;

a weight calculation module: calculating the loss weight of each individual by using a distribution robust optimization algorithm based on KL divergence;

a weight smoothing module: smoothing the obtained loss weights so that individuals that are adjacent or have similar local neighbor structures have more similar weight values;

a training module: calculating the total weighted loss by using the smoothed weights and the loss of each individual, and training the graph convolutional neural network;

and an execution module: when the results of a psychological questionnaire survey of students in a given region are obtained, constructing them into graph data and inputting the graph data into the trained model to predict the risk of mental illness.