CN116484911A - Distribution robust optimization method and system suitable for graph neural network distribution generalization - Google Patents
- Publication number
- CN116484911A (Application CN202310178341.7A)
- Authority
- CN
- China
- Prior art keywords
- distribution
- individual
- graph
- loss
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention provides a distribution robust optimization method and system suitable for graph neural network distribution generalization. The method comprises the following steps: selecting an arbitrary graph convolutional neural network model, aggregating the neighbor features of each target individual in the graph data, and updating the target individual's features; obtaining the model's final prediction for each individual and computing a task-specific loss function against the individual's label, yielding a loss value for each individual; from the computed loss values, obtaining a weight for each individual through a KL-divergence-based distribution robust optimization algorithm, this weight distribution being the current worst-case distribution over individuals; smoothing the individual weights so that training individuals that are close on the graph or have similar local structures receive similar weights; computing the overall loss over all training individuals from the adjusted weights and the corresponding loss values; and optimizing the graph convolutional neural network model with this overall training loss.
Description
Technical Field
The invention relates to the field of graph neural networks, and in particular to a distribution robust optimization method and system suitable for graph neural network distribution generalization.
Background
Graph-structured data is attracting increasing attention, and many graph-related algorithms have emerged; among them, graph neural networks are widely studied and applied in many fields. As a deep learning model, the reliability of a graph neural network is very important, especially when it is applied in fields such as finance and biological research. An important branch of trustworthy AI is the out-of-distribution generalization problem: because the data distribution a model faces at prediction time may differ from the distribution it was trained on, its performance at test time may fall far below its training performance, depriving the model of the most basic credibility.
At present, some work specifically addresses the out-of-distribution generalization problem of graph convolutional neural networks.
EERM (Explore-to-Extrapolate Risk Minimization) first divides all nodes into different environments with a generator, then computes the average node loss in each environment; the classification model is trained to minimize the mean and variance of the losses over all environments, while the generator is trained to maximize the variance. The classifier therefore performs well on data from different environments, improving the model's generalization ability. However, this method requires a large amount of computation, making it difficult to apply to large-scale datasets and demanding substantial computational resources. Shift-Robust GNNs (SRGNN) proposes to improve generalization by reducing the distribution gap between training and test samples: it first samples a portion of unbiased samples from the test set, feeds them through the graph convolutional neural network during training to obtain feature representations, and then measures the feature-distribution gap to the training samples. A common measure is the Maximum Mean Discrepancy (MMD), which is used as a regularization term alongside the cross-entropy loss to optimize the model. This method requires access to test-phase data during training, which is hard to satisfy in practical scenarios. Graph Out-of-Distribution Benchmark (GOOD) provides a test platform for the out-of-distribution generalization problem of graph convolutional neural networks, constructing multiple graph datasets with distribution shift, such as GOOD-Cora and GOOD-Arxiv. It directly combines the graph convolutional neural network with methods that partially solve the distribution-shift problem on Euclidean data, such as Invariant Risk Minimization (IRM) and Variance-Risk Extrapolation (V-REx).
However, these methods share a problem: during training, besides the classification label of each sample, the environment label of each sample must also be known, and environment labels are hard to obtain in reality, so the label-acquisition cost is too high in practical applications. For distribution shift on Euclidean data there is also a KL-divergence-based distribution robust optimization algorithm, which improves generalization by keeping the model focused on the worst data distribution throughout training. This approach has two advantages: first, environment labels are not needed during training, so the labeling cost is low; second, the computational complexity is low — the training speed is essentially the same as traditional Empirical Risk Minimization (ERM), with modest demands on computational resources. KL-divergence-based distribution robust optimization has made great progress on out-of-distribution generalization for deep learning models, but on graph neural networks the method does not consider graph structure information. How to improve the distribution robust optimization method so that it better fits graph-data scenarios has therefore become an urgent problem.
Disclosure of Invention
The invention aims to overcome the shortcoming that KL-divergence-based distribution robust optimization ignores graph structure when applied to graph neural networks, and provides an improved weight smoothing method so that the optimization becomes applicable to graph convolutional neural networks. The invention can further improve the performance of graph neural networks on the out-of-distribution generalization problem, and at the same time is highly flexible and plug-and-play.
The invention is realized at least by one of the following technical schemes.
A distribution robust optimization method suitable for graph neural network distribution generalization comprises the following steps:
step 1, inputting training individuals into a graph convolutional neural network model, completing the aggregation and feature transformation of each target individual's neighbor features, updating the target individual's features, and finally obtaining the prediction output;
step 2, calculating a loss value of each training individual;
step 3, calculating the loss weight of each training individual using a KL-divergence-based distribution robust optimization algorithm;
step 4, smoothing the individual loss weights;
step 5, training with the weighted loss;
and 6, carrying out data prediction by using the trained model.
Further, the inputs of the graph convolutional neural network include a feature matrix and graph structure information; the feature matrix comprises a node feature matrix and an edge feature matrix, and the graph structure information is stored in an adjacency matrix representing the connection relations between nodes;
the graph convolutional neural network is mainly composed of three parts: feature transformation, information aggregation, and feature updating.
Further, the graph convolutional neural network is a Graph Convolutional Network (GCN) model:

$$H^{l+1} = \sigma\!\left(\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}} H^{l} W^{l}\right)$$

where $W^{l}$ denotes the feature transformation matrix of the $l$-th layer; $H^{l}$ is the feature representation of the individuals at layer $l$, and $H^{0}$ is the original individual features; $D$ is the degree matrix of the nodes in the graph, giving the number of neighbors of each node; $A$ is the adjacency matrix of the graph, which is then normalized; and $\sigma$ is an activation function. By feeding the node feature matrix of every node in the graph and the adjacency matrix encoding the edge relations into the graph convolutional neural network, the classification result for each node is finally obtained.
Further, the current graph data has N individuals; the model classifies each individual into one of M risk categories in total, and the model parameters are optimized with the cross-entropy loss computed from the labels:

$$L = \frac{1}{N}\sum_{i=1}^{N} L_i, \qquad L_i = -\sum_{c=1}^{M} y_{ic}\log p_{ic}$$

where N is the number of training individuals, $L_i$ is the loss of individual $i$, $p_{ic}$ is the model's predicted probability of individual $i$ belonging to category $c$, and $y_{ic}$ is the corresponding label, from which the final total loss $L$ is obtained.
Further, according to the obtained loss value, a distribution robust optimization algorithm based on KL divergence is used for calculating the loss weight of all current individuals, wherein the loss weight distribution is the worst distribution found by the current algorithm.
Further, the optimization objective is as follows:

$$\min_{\theta\in\Theta}\ \max_{P:\,D(P,\,P_{tr})\le\rho}\ \mathbb{E}_{(X,y)\sim P}\big[L(f(X,\theta),y)\big]$$

where $\theta$ is a parameter of the graph model and $\Theta$ is the set of possible model parameters; $P$ is a possible feature distribution of future individual data, drawn from an uncertainty set of all possible distributions whose range is determined by the radius $\rho$; and $D(P,Q)$ is a distance function between the distributions $P$ and $Q$. Taking the KL divergence as an example, $D(P,Q)$ can be written as $D_{KL}(P\|Q)=\mathbb{E}_{P}\big[\log\tfrac{dP}{dQ}\big]$. $f(X,\theta)$ is the model's prediction for each individual, and $L(f(X,\theta),y)$ is the corresponding loss value, where $L$ uses the cross-entropy loss function. $(X,y)\sim P$ means that the sample $(X,y)$ is randomly drawn from the data of distribution $P$. $X$ is the individual's information, including its feature information and sub-graph structure information, the latter covering the associations between the current individual and other individuals. The expectation computes the individual loss $L$ under distribution $P$. $P_{tr}$ denotes the data distribution of the training set, and the worst-case distribution is restricted to the ball of radius $\rho$ centered at $P_{tr}$.
Further, the distance function is a Wasserstein distance or KL divergence.
Further, the individual loss weights obtained in step 3 are smoothed so that individuals that are adjacent or have similar local neighbor structures receive similar weight values, thereby introducing the graph structure information.
Further, the smoothing operation is interpolation smoothing:

$$W_s^{n} = \lambda\,W_s^{0} + (1-\lambda)\,\hat{S}\,W_s^{n-1}$$

where $W_s^{n}$ is the weight after $n$ smoothing steps, $W_s^{0}$ is the initial weight calculated in step 3, $\hat{S}$ is the smoothing matrix, and $\lambda$ expresses the importance of the initial weight; using $\lambda$ lets the smoothed weights retain part of the weight information obtained in step 3.
A system for realizing the above distribution robust optimization method suitable for graph neural network distribution generalization comprises:
The operation module: inputs the graph data into the graph convolutional neural network, obtains the psychological risk level prediction for each individual, and computes the corresponding loss value using the labels.
The weight calculation module: calculates the loss weight of each individual using the KL-divergence-based distribution robust optimization algorithm.
The weight smoothing module: smooths the obtained loss weights so that individuals that are adjacent or have similar local neighbor structures receive similar weight values.
The training module: trains the graph convolutional neural network by computing the overall weighted loss from the smoothed weights and each individual's loss.
The execution module: when the psychological questionnaire results of students in a region are obtained, constructs them into graph data and inputs the graph data into the trained model for psychological disease risk prediction.
Compared with the prior art, the invention has the following beneficial effects: its performance on data with distribution shift exceeds that of the currently common distribution generalization methods; moreover, combining the traditional distribution robust optimization method with the proposed weight smoothing method markedly improves model performance, which also demonstrates the effectiveness of the invention.
Drawings
FIG. 1 is a diagram of individual similar psychological relationships according to an embodiment of the present invention;
FIG. 2 is a training block diagram of an embodiment of the present invention;
fig. 3 is a flowchart of the distribution robust optimization method suitable for out-of-distribution generalization of graph neural networks according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the following description will be given in detail with reference to the accompanying drawings and detailed description. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The distribution robust optimized psychological disease detection method suitable for graph neural network distribution generalization takes all the students of a certain university as detection objects; each student is an individual, and the psychological questionnaire completed by the student serves as that student's features. To mine more feature information, a student similar-psychology relation graph is constructed from the questionnaire results, so that an edge exists between two students with similar psychological conditions. In general, to perform psychological risk assessment on an individual with a graph model, the individual's feature information and the corresponding relation sub-graph are input into the graph model to predict the individual's psychological diseases, or the entire graph data is input at once. The graph model is trained with only a simple cross-entropy loss. When a classifier is trained on the student data of one school, its performance when generalized to other regions or schools is often poor, because students' psychological states and questionnaire feature distributions differ across regions and schools. The distribution differences are mainly caused by environmental factors: school curricula and academic pressure differ — students at some schools face higher academic pressure, at others lower; living pressure also differs by region — students in developed regions enjoy a higher standard of living and lower living pressure, while for students in relatively backward regions, pressure in daily life, in addition to academic pressure, can also change their psychological state. These factors all lead to different distributions of students' mental health across regions and schools.
This also means that the data distribution the model faces at inference time can differ greatly from the distribution at training time, so simply using the cross-entropy loss does not give the model good generalization ability. The KL-divergence-based distribution robust optimization algorithm is a common algorithm for improving model generalization; its main idea is to give individuals with larger losses higher weights during training, increasing their share of the total loss so that the model pays more attention to them. However, naively introducing this method into the training of a graph model is unreasonable: in the student similar-psychology relation graph, an individual is related to other individuals, so the training-loss weight of an individual should be similar to that of its adjacent individuals. The invention therefore designs a weight smoothing operation after the weight of each individual is calculated by the distribution robust optimization algorithm, so that the smoothed weights take into account both the individual losses and the connection relations among individuals.
The distribution robust optimization method suitable for graph neural network distribution generalization shown in fig. 2 provides an optimization method to alleviate problems such as the low accuracy and unreliability of graph neural networks when facing out-of-distribution data, and comprises the following steps:
in the step 1, in the detection of the psychological diseases, a similar psychological relation diagram of students is taken as input, psychological risk prediction output of each student in the relation diagram is obtained by using a graph neural network, aggregation of neighbor information of each student is mainly completed, characteristics of the target students are transformed, and characteristics of the target students are updated, so that structural information of the relation diagram is fully considered. Each node in the student similar psychological relation graph represents a student, and each node can have characteristic information, wherein the characteristics are information mined according to psychological questionnaires made by the student; meanwhile, each edge in the graph represents a similarity relationship between nodes, for example, according to the similarity between the characteristics of the students, when the similarity is higher than a certain threshold, an edge is formed between the characteristics of the students.
The input of the graph convolutional neural network model mainly comprises a feature matrix and graph structure information; the feature matrix comprises a node feature matrix and an edge feature matrix, and the graph structure information is stored in an adjacency matrix representing the connection relations between nodes. The working process of the graph convolutional neural network model mainly comprises three parts — feature transformation, information aggregation, and feature updating — with each node updating its own feature information by aggregating the features of its neighbors.
GCN is a preferred embodiment, with the calculation formula:

$$H^{l+1} = \sigma\!\left(\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}} H^{l} W^{l}\right)$$

where $W^{l}$ is the feature transformation matrix of the $l$-th layer; $H^{l}$ is the feature matrix of the individuals at layer $l$; $D$ is the degree matrix of the nodes in the graph, giving the number of neighbors of each node; $A$ is the adjacency matrix of the graph, which is then normalized; and $\sigma$ is the activation function.
Taking the psychological disease detection of this example, assume there are currently N individuals in total, each with C features. The prediction process of the graph convolutional neural network can be divided into the following three steps:
feature transformation: (H) l )W l When l=0, H 0 Representing the original characteristics of the individual, the dimension of the matrix is (N, C), and the characteristics can be information mined according to psychological questionnaires made by students. And carrying out linear transformation on the features through a linear layer to obtain an updated feature matrix.
Information aggregation: $\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}} H^{l} W^{l}$. $A$ is the adjacency matrix of the graph, of dimension (N, N); an element of 1 indicates a high similarity relation between the two corresponding individuals, and 0 indicates no such relation. $D$ is the degree matrix, of dimension (N, 1), each value recording how many individuals in the whole relation graph are highly similar to the given one. Taking individual a as an example: if individuals b, c, d, and e are very similar to a, an edge exists between each of them and a. Through this step, individual a aggregates the feature information of b, c, d, and e; after a weighted sum, the result becomes a's new features, which now include both a's original feature information and its neighbor information, increasing the information content of the features.
Feature update: $H^{l+1} = \sigma(\cdot)$. Finally, the features are further updated by an activation function.
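The three steps above — feature transformation, information aggregation, and feature update — can be sketched as one GCN layer in NumPy. This is an illustrative sketch of the standard GCN propagation rule, not the patented implementation; the self-loop convention and `tanh` activation are assumptions.

```python
import numpy as np

def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray,
              activation=np.tanh) -> np.ndarray:
    """One GCN step: transform features (H @ W), aggregate them with the
    symmetrically normalized adjacency (self-loops added, as is standard
    for GCN), then apply a nonlinearity."""
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d = A_hat.sum(axis=1)                    # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt # D^{-1/2} A D^{-1/2}
    return activation(A_norm @ (H @ W))      # aggregate, then update
```

Stacking several such layers lets each node aggregate the features of its multi-hop neighbors, as described below.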
First, category prediction is carried out on the training-set individuals using the graph convolutional neural network model. Through the model, each individual in the student similar-psychology relation graph — that is, each node in the graph — aggregates the feature information of its multi-hop neighbor nodes, yielding the final prediction result.
And 2, calculating the loss of each individual.
Step 1 gives the prediction output of the graph convolutional neural network model for each training-set individual. The current graph data has N individuals; the model classifies each individual's psychological disease risk category, the individual's label indicating that the individual is at risk of psychological problems, with M risk categories in total. The model parameters are optimized with the cross-entropy loss computed from the labels:

$$L = \frac{1}{N}\sum_{i=1}^{N} L_i, \qquad L_i = -\sum_{c=1}^{M} y_{ic}\log p_{ic}$$

where N is the number of training individuals, $L_i$ is the loss of individual $i$, $p_{ic}$ is the model's predicted probability of individual $i$ belonging to category $c$, and $y_{ic}$ is the corresponding label, from which the final total loss $L$ is obtained.
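The per-individual losses $L_i$ needed by the later weighting steps can be computed directly from this formula; a minimal sketch (the `eps` clamp for numerical safety is an implementation assumption):

```python
import numpy as np

def individual_losses(probs: np.ndarray, labels: np.ndarray):
    """Cross-entropy per individual, L_i = -sum_c y_ic * log p_ic,
    plus the mean total loss L over the N training individuals.
    probs and labels are both (N, M) arrays; labels are one-hot."""
    eps = 1e-12                                     # avoid log(0)
    L_i = -np.sum(labels * np.log(probs + eps), axis=1)
    return L_i, L_i.mean()
```

The vector `L_i` (not just the scalar mean) is what the KL-divergence-based weighting in step 3 consumes.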
And 3, calculating the loss weight of each individual by using a distribution robust optimization algorithm based on KL divergence.
For a general machine learning classification task, a classification model is trained on limited training data, so only limited data are available to estimate the true data distribution. Owing to this limit on data volume, there is some gap between the observable training-set distribution and the actual distribution; the model therefore cannot learn the actual data distribution, and its performance is affected by the training-set distribution. How to reduce this influence, or further optimize the learned classifier, has become a research topic.
Distributionally Robust Optimization (DRO) was proposed to mitigate the model performance degradation caused by distribution shift. It centers on the training-set data distribution and searches nearby for the worst data distribution for the current model — the one maximizing the overall loss — and then optimizes the model to perform well under that worst distribution, improving its generalization ability and robustness. The optimization objective is:

$$\min_{\theta\in\Theta}\ \max_{P:\,D(P,\,P_{tr})\le\rho}\ \mathbb{E}_{(X,y)\sim P}\big[L(f(X,\theta),y)\big]$$

where $\theta$ is a parameter of the graph model and $\Theta$ is the set of possible model parameters; $P$ is a possible feature distribution of future individual data, drawn from an uncertainty set of all possible distributions whose range is determined by the radius $\rho$; and $D(P,Q)$ is a distance function between the distributions $P$ and $Q$. Taking the KL divergence as an example, $D(P,Q)$ can be written as $D_{KL}(P\|Q)=\mathbb{E}_{P}\big[\log\tfrac{dP}{dQ}\big]$. $f(X,\theta)$ is the model's prediction for each individual, and $L(f(X,\theta),y)$ is the corresponding loss value, where $L$ uses the cross-entropy loss function. $(X,y)\sim P$ means that the sample $(X,y)$ is randomly drawn from the data of distribution $P$. $X$ is the individual's information, including its feature information and sub-graph structure information, the latter covering the similarity relations between the current individual and other individuals. The expectation computes the individual loss $L$ under distribution $P$. $P_{tr}$ denotes the data distribution of the training set, and the worst-case distribution is restricted to the ball of radius $\rho$ centered at $P_{tr}$.
As another preferred embodiment, the distance function is, for example, the Wasserstein distance or the KL divergence.
Through this min-max adversarial training framework, the loss weight coefficient of each individual under the current loss can be calculated with an optimization algorithm.
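One common way to realize the inner maximization of the KL-constrained objective in closed form is an exponential tilting of the empirical distribution: for a fixed dual temperature, the worst-case weights are a softmax over the losses. The patent does not specify its solver, so the `temperature` hyperparameter (standing in for the dual variable of the radius $\rho$) is an assumption of this sketch:

```python
import numpy as np

def kl_dro_weights(losses: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Worst-case weight for each individual under a KL-divergence ball:
    w_i proportional to exp(L_i / temperature). Individuals with larger
    losses get larger weights, so the model focuses on the hardest ones."""
    z = losses / temperature
    z = z - z.max()            # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()         # normalize to a probability distribution
```

As `temperature` grows, the weights approach the uniform ERM weighting; as it shrinks, almost all weight concentrates on the highest-loss individuals.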
Step 4: smooth the individual loss weights obtained in step 3, so that individuals that are adjacent or have similar local neighbor structures receive similar weight values, thereby further introducing the graph structure information.
The corresponding loss weight of each individual can be obtained through the step 3, however, the weight only considers the respective loss value of each individual at the moment, and the structural information of the relation graph is not considered. Because the map data has homozygosity in structure, weights of training nodes with similar distances on the map should be similar, the invention provides a weight smoothing module, and after the weights of each individual are calculated by utilizing the step 3, a smoothing operation is carried out on the weight values, so that the weights of the training nodes with similar distances on the map are similar, and the specific structural information of the map data is considered. The smoothing operation is as follows,
W_s^n = λ W_s^0 + (1 − λ) S W_s^{n−1},

wherein W_s^n denotes the weights after n smoothing iterations, W_s^0 denotes the initial weights calculated in step 3, and S denotes the smoothing matrix; λ denotes the importance of the initial weights, and using λ enables the smoothed weights to retain part of the weight information obtained in step 3.
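The interpolation above can be sketched as follows (NumPy; using the row-normalized adjacency matrix as the smoothing matrix S and λ = 0.5 are illustrative assumptions, since the patent leaves the exact choice of S to the embodiment):

```python
import numpy as np

def smooth_weights(w0, adj, lam=0.5, n_iters=10):
    """Iterate W_s^n = lam * W_s^0 + (1 - lam) * S @ W_s^(n-1),
    with S the row-normalized adjacency matrix acting as the smoothing matrix."""
    deg = adj.sum(axis=1, keepdims=True)
    S = adj / np.maximum(deg, 1.0)  # row-normalize; avoids division by zero for isolated nodes
    w = w0.copy()
    for _ in range(n_iters):
        w = lam * w0 + (1.0 - lam) * S @ w
    return w

# Path graph 0-1-2-3: after smoothing, nearby individuals have more similar weights
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
w0 = np.array([1.0, 0.0, 0.0, 0.0])
w = smooth_weights(w0, adj)
```

With λ = 1 the weights stay exactly at the initial values from step 3; smaller λ diffuses more weight along the graph edges.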
Through this improvement, the distribution robust optimization algorithm becomes better suited to graph data and integrates more information, improving the adaptability of the method on graph data.
Step 5, training with the weighted loss: using the smoothed loss weights and the corresponding loss values, the weighted sum of the losses of all individuals is calculated as the final total loss. After the above steps are completed, a gradient descent optimization algorithm is used to minimize the total prediction loss and to update and optimize the parameters of the graph convolutional neural network model.
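The total loss of step 5 is simply the weighted sum of the per-individual cross-entropy losses; a minimal sketch (the predicted probabilities, labels, and weight vectors below are hypothetical):

```python
import numpy as np

def weighted_total_loss(probs, labels, weights):
    """Total loss = sum_i w_i * CE_i, with CE_i = -log p_{i, y_i} the
    cross-entropy of individual i on its true class."""
    ce = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return float(np.sum(weights * ce))

probs = np.array([[0.8, 0.1, 0.1],   # predictions for 3 individuals over 3 risk categories
                  [0.2, 0.7, 0.1],
                  [0.3, 0.3, 0.4]])
labels = np.array([0, 1, 2])
uniform = np.full(3, 1 / 3)          # uniform weights reduce to the mean cross-entropy
dro = np.array([0.2, 0.2, 0.6])      # smoothed DRO-style weights emphasize the hardest individual
```

With the DRO weights the total loss is larger than the uniform average, since the hardest individual (the one with the smallest true-class probability) is up-weighted; gradient descent then focuses the parameter update on it.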
Step 6, predicting the psychological disease risk category of an individual using the trained model.
A system for implementing the above distribution robust optimization method for graph neural network out-of-distribution generalization comprises:
The operation module: inputs the student psychological-similarity relation graph data into the graph convolutional neural network, obtains a psychological risk level prediction for each individual, and calculates the corresponding loss value using the labels.
The weight calculation module: calculates the loss weight of each individual using a KL-divergence-based distribution robust optimization algorithm.
The weight smoothing module: smooths the obtained loss weights so that individuals that are adjacent or have similar local neighbor structures receive more similar weight values.
The training module: calculates the total weighted loss using the smoothed weights and the loss of each individual, and trains the graph convolutional neural network.
The execution module: when the psychological questionnaire survey results of students in a certain area are obtained, constructs them into graph data and inputs the graph data into the trained graph convolutional neural network to perform psychological disease risk prediction.
The preferred embodiments of the invention disclosed above are intended only to assist in explaining the invention. The preferred embodiments neither describe all details exhaustively nor limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand and utilize the invention. The invention is limited only by the claims and their full scope and equivalents.
Claims (10)
1. A distribution robust optimization method suitable for graph neural network out-of-distribution generalization, characterized by comprising the following steps:
step 1, inputting a training individual into a graph convolution neural network model, completing aggregation and feature transformation of neighbor features of a target individual, updating the features of the target individual, and finally obtaining prediction output;
step 2, calculating a loss value of each training individual;
step 3, calculating the loss weight of each training individual by using a distribution robust optimization algorithm based on KL divergence;
step 4, smoothing the loss weight of the individual;
step 5, training by using the weighted loss;
and 6, carrying out data prediction by using the trained model.
2. The distribution robust optimization method for graph neural network out-of-distribution generalization according to claim 1, wherein the input of the graph convolutional neural network comprises a feature matrix and graph structure information, wherein the feature matrix comprises a node feature matrix and an edge feature matrix, and the graph structure information is stored in an adjacency matrix representing the connection relations between nodes;
the graph convolutional neural network is mainly composed of three parts: feature transformation, information aggregation, and feature updating.
3. The distribution robust optimization method for graph neural network out-of-distribution generalization according to claim 2, characterized in that the graph convolutional neural network is a Graph Convolutional Network (GCN) model:
H^{l+1} = σ( D^{-1/2} A D^{-1/2} H^l W^l ),

wherein W^l denotes the feature transformation matrix of the l-th layer; H^l denotes the individual features at layer l, and H^0 is the original individual features; D denotes the degree matrix of the nodes in the graph, recording the number of neighbors of each node; A denotes the adjacency matrix of the graph, and D^{-1/2} A D^{-1/2} is the normalized adjacency matrix; σ denotes an activation function. The feature matrix of each node in the graph and the adjacency matrix representing the edge relations between nodes are input into the graph convolutional neural network, which finally yields the classification result for each node.
4. The distribution robust optimization method for graph neural network out-of-distribution generalization according to claim 1, wherein the current graph data contains N individuals, the model classifies each individual into one of M risk categories, and the model parameters are optimized using the cross-entropy loss computed from the labels, with the calculation formula:
L = (1/N) Σ_{i=1}^{N} L_i,   L_i = −Σ_{c=1}^{M} y_{ic} log(p_{ic}),

wherein N is the number of training individuals, L_i is the loss corresponding to individual i, p_{ic} is the model's predicted probability of individual i on category c, and y_{ic} is the corresponding label; the final total loss L is obtained therefrom.
5. The distribution robust optimization method for graph neural network out-of-distribution generalization according to claim 1, wherein a KL-divergence-based distribution robust optimization algorithm is used to calculate, from the obtained loss values, the loss weights of all current individuals, the resulting weight distribution being the worst-case distribution found by the current algorithm.
6. The distribution robust optimization method for graph neural network out-of-distribution generalization according to claim 1, wherein the optimization objective is as follows:

min_{θ∈Θ} max_{P∈𝒫} E_{(X,y)~P}[ L(f(X, θ), y) ],   𝒫 = { P : D(P, P_tr) ≤ ρ },

where θ is a parameter of the graph model and Θ is the set of possible model parameters; P represents a feature distribution of possible future individual data, and 𝒫 represents the uncertainty set of all possible distributions, whose range is determined by the radius ρ; D(P, Q) denotes a distance function between the distributions P and Q, and taking KL divergence as an example, D(P, Q) = Σ_x P(x) log(P(x)/Q(x)); f(X, θ) denotes the model's prediction for each individual, and L(f(X, θ), y) denotes the loss value corresponding to each individual, where L is the cross-entropy loss function; (X, y) ~ P denotes that the sample (X, y) is randomly drawn from the distribution P; X denotes the information of an individual, comprising its feature information and sub-graph structure information, the latter comprising the association information between the current individual and other individuals; E_{(X,y)~P}[L] denotes the expectation of the individual loss L under the distribution P; P_tr denotes the data distribution of the training set, and the worst-case distribution is restricted to the ball of radius ρ centered at P_tr.
7. The distribution robust optimization method for graph neural network out-of-distribution generalization according to claim 6, wherein the distance function is the Wasserstein distance or the KL divergence.
8. The distribution robust optimization method for graph neural network out-of-distribution generalization according to claim 1, wherein the individual loss weights obtained in step 3 are smoothed so that individuals that are adjacent or have similar local neighbor structures receive more similar weight values, thereby introducing graph structure information.
9. The distribution robust optimization method for graph neural network out-of-distribution generalization according to claim 8, wherein the smoothing operation is interpolation smoothing, performed as follows:

W_s^n = λ W_s^0 + (1 − λ) S W_s^{n−1},

wherein W_s^n denotes the weights after n smoothing iterations, W_s^0 denotes the initial weights calculated in step 3, and S denotes the smoothing matrix; λ denotes the importance of the initial weights, and using λ enables the smoothed weights to retain part of the weight information obtained in step 3.
10. A system for implementing the distribution robust optimization method for graph neural network out-of-distribution generalization according to claim 1, comprising:
the operation module: inputting the graph data into the graph convolutional neural network, obtaining a psychological risk level prediction for each individual, and calculating the corresponding loss value using the labels;
the weight calculation module: calculating the loss weight of each individual using a KL-divergence-based distribution robust optimization algorithm;
the weight smoothing module: smoothing the obtained loss weights so that individuals that are adjacent or have similar local neighbor structures receive more similar weight values;
the training module: calculating the total weighted loss using the smoothed weights and the loss of each individual, and training the graph convolutional neural network;
the execution module: when the psychological questionnaire survey results of students in a certain area are obtained, constructing them into graph data and inputting the graph data into the trained model to perform psychological disease risk prediction.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310178341.7A CN116484911A (en) | 2023-02-28 | 2023-02-28 | Distribution robust optimization method and system suitable for graph neural network distribution generalization |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116484911A true CN116484911A (en) | 2023-07-25 |
Family
ID=87216685
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||