CN112651436A - Optimization method and device based on uncertain weight graph convolution neural network - Google Patents

Optimization method and device based on uncertain weight graph convolution neural network

Info

Publication number
CN112651436A
CN112651436A (application CN202011546124.1A)
Authority
CN
China
Prior art keywords
neural network
weight
convolution neural
graph convolution
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011546124.1A
Other languages
Chinese (zh)
Inventor
孙月
闫潇宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Anruan Technology Co Ltd
Original Assignee
Shenzhen Anruan Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Anruan Technology Co Ltd filed Critical Shenzhen Anruan Technology Co Ltd
Priority to CN202011546124.1A priority Critical patent/CN112651436A/en
Publication of CN112651436A publication Critical patent/CN112651436A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the application belongs to the technical field of deep learning and relates to an optimization method for a graph convolution neural network with uncertain weights, comprising the following steps: acquiring and preprocessing graph structure data; constructing a graph convolution neural network, and acquiring the posterior distribution of its weight parameters through KL divergence loss based on the prior distribution of the weight parameters; and updating the graph convolution neural network according to the posterior distribution of the weight parameters, and training the updated graph convolution neural network with the graph structure data. Starting from the constructed prior distribution of the weight parameters, the KL divergence loss is used to learn the posterior distribution of the weight parameters, and the posterior distribution is then used to update the graph convolution neural network. Uncertainty is thereby introduced into the weights of the graph convolution neural network, which improves the accuracy of the model in classifying graph structure data.

Description

Optimization method and device based on uncertain weight graph convolution neural network
Technical Field
The invention relates to the technical field of deep learning, in particular to an optimization method and device based on a graph convolution neural network with uncertain weights, a node classification system based on the graph convolution neural network with uncertain weights, computer equipment and a computer-readable storage medium.
Background
Over the past decade, deep learning has enjoyed tremendous success on conventional data in Euclidean space, such as speech, images and natural language. However, non-Euclidean data structures are ubiquitous in the real world and can represent relationships between objects, as in social networks, e-commerce networks, biological structure networks and transportation networks. How to process such graph structure data with deep learning methods has therefore attracted much attention in the past few years.
While prior methods have successfully applied the idea of the convolution operation to non-Euclidean graph data and achieved reasonably good performance, they treat the given graph as an exact representation of the true relationships between nodes. In many applications, however, the graph itself is derived from complex data or from incorrect modeling assumptions. In such complex graphs, spurious edges, or edges between nodes that have no strong relationship, can harm the learning of the model. To address the uncertainty of graph structure data, a Bayesian framework has been proposed in which the observed graph is treated as a random sample from a family of parametric random graph models. Inspired by this work, Pal et al. proposed a nonparametric generative model of graphs based on node replication and an alternative generative model of graphs, respectively. However, these methods focus on how to efficiently generate new graphs and still learn the graph structure representation with a graph convolutional neural network (GCN) model. Moreover, they fail to correctly assess the uncertainty in the training data, so the model makes overconfident decisions about node classes or predictions, which reduces node classification accuracy.
Disclosure of Invention
The embodiment of the application aims to provide an optimization method for a graph convolution neural network with uncertain weights, so that the graph convolution neural network can correctly evaluate the uncertainty of graph structure data and the accuracy of model node classification is improved.
In order to solve the above technical problem, an embodiment of the present application provides an optimization method based on a graph convolution neural network with uncertain weights, which adopts the following technical solutions:
acquiring and preprocessing graph structure data;
constructing a graph convolution neural network, and acquiring posterior distribution of weight parameters of the graph convolution neural network through KL divergence loss based on prior distribution of the weight parameters of the graph convolution neural network;
and updating the graph convolution neural network according to posterior distribution of the weight parameters, and training the updated graph convolution neural network by using the graph structure data.
Further, the step of obtaining the posterior distribution of the weight parameters of the graph convolution neural network through KL divergence loss based on the prior distribution of the weight parameters of the graph convolution neural network includes:
obtaining prior distribution of weight parameters of the graph convolution neural network, wherein the weight parameters comprise expectation of weight and variance of the weight;
initializing the posterior distribution of the weight parameters of the graph convolution neural network and resampling to obtain the initial value of the resampled posterior distribution of the weight parameters;
and updating the posterior distribution of the weight parameters of the graph convolution neural network by using KL divergence loss according to the prior distribution of the weight parameters and the initial value of the weight parameter posterior distribution after resampling.
Further, the step of updating the graph convolution neural network according to the posterior distribution of the weight parameter includes:
performing forward propagation of the graph convolution neural network with weights sampled from the posterior distribution of the weight parameters, and calculating the cross entropy loss;
calculating a total loss based on the cross entropy loss and the KL divergence loss;
performing back propagation by using variational Bayesian inference according to the total loss, and calculating the gradient of posterior distribution of the weight parameter;
and optimizing the posterior distribution of the weight parameters by using the gradient, and updating the weight parameters of the graph convolution neural network by the optimized posterior distribution of the weight parameters.
Further, the optimizing the posterior distribution of the weight parameter using the gradient includes:
and carrying out random gradient descent optimization on the posterior distribution of the weight parameters according to the gradient.
Further, the KL divergence loss updates the posterior distribution of the weight parameters by reducing the distance between the prior distribution and the posterior distribution of the weight parameters.
Further, the preprocessing of the graph structure data includes a normalization process.
Further, in order to solve the above technical problem, an embodiment of the present application further provides an optimization apparatus based on a graph convolution neural network with uncertain weights, including:
the acquisition module is used for acquiring and preprocessing the graph structure data;
the construction module is used for constructing the graph convolution neural network, acquiring posterior distribution of the weight parameters of the graph convolution neural network through KL divergence loss based on prior distribution of the weight parameters of the graph convolution neural network;
and the updating module is used for updating the graph convolution neural network according to the posterior distribution of the weight parameters and training the updated graph convolution neural network by using the graph structure data to obtain the optimized graph convolution neural network.
Further, in order to solve the above technical problem, an embodiment of the present application further provides a node classification system based on a graph convolution neural network with uncertain weights. The system includes an obtaining unit configured to obtain and preprocess graph structure data, and a graph convolution neural network model with uncertain weights configured to extract node features from the graph structure data and perform node classification, where the model is optimized and trained according to the above optimization method based on a graph convolution neural network with uncertain weights and thus has the characteristic of uncertain weights.
Further, in order to solve the above technical problem, an embodiment of the present application further provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the optimization method based on the weight uncertainty graph convolution neural network when executing the computer program.
Further, in order to solve the above technical problem, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the optimization method based on the graph convolution neural network with uncertain weights are implemented.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects. A method for optimizing a graph convolution neural network with uncertain weights is provided, the method comprising: acquiring and preprocessing graph structure data; constructing a graph convolution neural network, and acquiring the posterior distribution of its weight parameters through KL divergence loss based on the prior distribution of the weight parameters; and updating the graph convolution neural network according to the posterior distribution of the weight parameters, and training the updated network with the graph structure data. A KL divergence between the prior distribution of the weight parameters of the constructed graph convolution neural network and their posterior distribution is computed, and the network is updated and trained with the posterior distribution of the weight parameters based on Bayesian inference. Uncertainty is thereby introduced into the network weights, the uncertainty in the training data can be evaluated correctly, overconfident predictions of node categories are avoided, and the accuracy of the graph convolution neural network model in classifying graph structure data nodes is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart of an optimization method for a graph convolution neural network with uncertain weights provided according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an optimization apparatus for a convolutional neural network based on a graph of weight uncertainty provided according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram illustrating a node classification system based on a weight uncertainty graph convolution neural network according to an embodiment of the present application;
fig. 4 is a block diagram of a basic structure of a computer device provided according to an embodiment of the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to limit the application. The terms "including" and "having," and any variations thereof, in the description and claims of this application and in the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects, not for describing a particular order. The terms "connected" and "coupled," when used in this application, include both direct and indirect connections (couplings), unless otherwise indicated.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make those skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, fig. 1 is a flowchart illustrating the optimization method based on a graph convolution neural network with uncertain weights according to the present application; the method includes the steps of:
101. acquiring and preprocessing graph structure data;
102. constructing a graph convolution neural network, and acquiring the posterior distribution of the weight parameters of the graph convolution neural network through KL divergence loss based on the prior distribution of the weight parameters of the graph convolution neural network;
103. and updating the graph convolution neural network according to posterior distribution of the weight parameters, and training the updated graph convolution neural network by using the graph structure data to obtain an optimized graph convolution neural network.
The graph convolution neural network described in the present application is first set forth. In real life there are many irregular data structures, typically graph (topology) structures, such as social networks, chemical molecular structures and knowledge graphs; even natural language is internally a complex tree structure, which is also a kind of graph; likewise, in target recognition on a picture, only certain key points on the two-dimensional image are of interest, and these key points form a graph structure. Graph structures are generally quite irregular and can be regarded as infinite-dimensional data, so they have no translational invariance. The neighborhood structure of each node may be unique, and data with this structure defeats the traditional convolutional neural network (CNN) and recurrent neural network (RNN). A graph convolutional neural network (GCN), like a CNN, is a feature extractor, except that the object it processes is graph data. The GCN provides a carefully designed method for extracting features from graph data, so that these features can be used for node classification, graph classification and link prediction on graph data, and an embedded representation of the graph can be obtained along the way.
Specifically, assume a batch of graph data with N nodes, each node having its own features. The node features form an N × D dimensional matrix X, and the relationships between the nodes form an N × N dimensional matrix A, called the adjacency matrix; X and A are the inputs of the graph convolution neural network model. The GCN is a neural network layer whose layer-to-layer propagation rule is as follows:
Z = f(X, A) = softmax(Â ReLU(Â X W^(0)) W^(1))

where Ã = A + I_N denotes the adjacency matrix of the undirected graph with added self-connections, and I_N is the identity matrix. The normalization Â = D̃^(−1/2) Ã D̃^(−1/2) is the preprocessing applied to the acquired graph structure data: because Ã is an unnormalized matrix, multiplying it directly with the feature matrix would change the original distribution of the features and cause unpredictable problems, so Ã must be normalized first. Here D̃, with D̃_ii = Σ_j Ã_ij, is the degree matrix of the nodes, and multiplying Ã by it on both sides yields the symmetric, normalized matrix Â = D̃^(−1/2) Ã D̃^(−1/2). W^(0) ∈ R^(C×H) is the input-to-hidden weight parameter matrix of a hidden layer with H feature maps, and W^(1) ∈ R^(H×F) is the hidden-to-output weight parameter matrix. The softmax activation function is defined as softmax(x_i) = exp(x_i) / Σ_i exp(x_i).
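To illustrate this propagation rule, the following is a minimal sketch in PyTorch of the symmetric normalization and the two-layer forward pass; all names and sizes (normalize_adjacency, gcn_forward, the toy dimensions) are illustrative assumptions, not code from the patent.

    import torch
    import torch.nn.functional as F

    def normalize_adjacency(A: torch.Tensor) -> torch.Tensor:
        """Compute A_hat = D_tilde^(-1/2) (A + I) D_tilde^(-1/2)."""
        A_tilde = A + torch.eye(A.size(0))        # add self-connections
        deg = A_tilde.sum(dim=1)                  # D_tilde_ii = sum_j A_tilde_ij
        D_inv_sqrt = torch.diag(deg.pow(-0.5))
        return D_inv_sqrt @ A_tilde @ D_inv_sqrt  # symmetric normalization

    def gcn_forward(X, A_hat, W0, W1):
        """Z = softmax(A_hat ReLU(A_hat X W0) W1)."""
        H = F.relu(A_hat @ X @ W0)                # hidden layer, H feature maps
        return F.softmax(A_hat @ H @ W1, dim=1)   # row-wise class distribution

    # toy usage
    N, C, H, K = 5, 8, 16, 3                      # nodes, in-feats, hidden, classes
    X = torch.randn(N, C)
    A = (torch.rand(N, N) > 0.5).float()
    A = ((A + A.T) > 0).float()                   # make the graph undirected
    A_hat = normalize_adjacency(A)
    W0, W1 = torch.randn(C, H), torch.randn(H, K)
    Z = gcn_forward(X, A_hat, W0, W1)             # (N, K) per-node probabilities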
In a conventional neural network, the input and output weight parameters are fixed values. In the embodiment of the invention, however, a back-propagation algorithm based on KL divergence loss calculation and variational Bayesian inference is provided, so that every weight in the neural network is represented by a probability distribution over possible values rather than by a single fixed value. The algorithm aims to introduce uncertainty into the network weights using variational Bayesian learning. Specifically, for given training data D, Bayesian inference for the neural network computes the posterior distribution of the weights, p(w | D). This distribution yields predictions for unseen data x̂ by taking expectations:

p(ŷ | x̂) = E_p(w|D)[p(ŷ | x̂, w)]

That is, taking the expectation under the posterior distribution of the weights is equivalent to using an ensemble of an infinite number of neural networks, which is intractable for neural networks of any practical size. Therefore, the parameters θ of a distribution on the weights, q(w | θ), are learned, and the Kullback-Leibler (KL) divergence is used to make the distance between this distribution and the posterior distribution of the weights small.
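Written out, this is the standard variational objective (a reconstruction following Blundell et al.'s Bayes by Backprop, which the procedure below matches; the decomposition holds up to an additive constant log p(D) that does not depend on θ):

θ* = argmin_θ KL[q(w | θ) ‖ p(w | D)]
   = argmin_θ KL[q(w | θ) ‖ p(w)] − E_q(w|θ)[log p(D | w)]

The first term is the KL divergence loss to the prior (loss_kl below), and the second term is the expected negative log-likelihood, estimated below by the cross entropy loss (loss_ce) on weights sampled from q.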
Firstly, a graph convolution neural network is constructed according to the layer-to-layer propagation rule above, and the posterior distribution of its weight parameters is obtained based on the prior distribution of the weight parameters, as follows.
An a priori distribution of the weight parameters q(w | θ) of the graph convolution neural network is obtained, the weight parameters including the expectation μ of the weights and the variance s of the weights; the prior distribution of the weight parameters may consist of prior values set manually in advance, such as μ = 0.5 and s = 0.5.
The posterior distribution of the weight parameters of the graph convolution neural network is initialized and resampled to obtain the initial value of the resampled posterior distribution of the weight parameters. Specifically, the posterior distribution of the weight parameters may be initialized with a Gaussian distribution: a randomly initialized Gaussian distribution simulates the initial value of the true posterior distribution of the weight parameters, and the initial value is resampled to obtain the resampled posterior distribution of the weight parameters, that is:

W ← μ + exp(s) · ε

where ε is standard Gaussian noise.
The posterior distribution of the weight parameters of the graph convolution neural network is then calculated and updated with the KL divergence loss, based on the resampled initial value of the weight parameter posterior distribution and the preset prior distribution of the weight parameters. That is, the KL divergence loss computes the posterior distribution of the weight parameters by reducing its distance to the prior distribution of the weight parameters, pulling the posterior distribution closer to the given prior distribution and thereby regularizing it.
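A minimal sketch of such a resampled weight in PyTorch follows. It assumes a Gaussian variational posterior N(μ, exp(s)²) per weight, the example Gaussian prior values above (μ = 0.5, with 0.5 taken as the prior standard deviation), and a closed-form Gaussian-to-Gaussian KL; the class name and initial values are illustrative, not the patent's reference implementation.

    import torch

    class BayesianWeight(torch.nn.Module):
        """A weight matrix with uncertain values: W <- mu + exp(s) * eps."""
        def __init__(self, shape, prior_mu=0.5, prior_sigma=0.5):
            super().__init__()
            self.mu = torch.nn.Parameter(torch.empty(shape).normal_(0.0, 0.1))
            self.s = torch.nn.Parameter(torch.full(shape, -3.0))  # log std dev
            self.prior_mu, self.prior_sigma = prior_mu, prior_sigma

        def sample(self):
            eps = torch.randn_like(self.mu)            # eps ~ N(0, I)
            return self.mu + torch.exp(self.s) * eps   # reparameterized sample

        def kl_loss(self):
            # closed-form KL[N(mu, sigma^2) || N(prior_mu, prior_sigma^2)],
            # summed over all entries of the weight matrix
            sigma = torch.exp(self.s)
            var_ratio = (sigma / self.prior_sigma) ** 2
            mean_term = ((self.mu - self.prior_mu) / self.prior_sigma) ** 2
            return 0.5 * (var_ratio + mean_term - 1.0 - var_ratio.log()).sum()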
Forward propagation of the graph convolution neural network is then performed with weights sampled from the posterior distribution of the weight parameters, and the cross entropy loss is computed, i.e., the loss between the true values and the predicted values:

loss_ce = −Σ_i y_i · log(ŷ_i)

Further, a total loss is calculated from the cross entropy loss and the KL divergence loss; back propagation with variational Bayesian inference is performed according to the total loss, and the gradients of the posterior distribution of the weight parameters are calculated, that is:

L(μ, s) ← loss_ce + loss_kl
Δμ ← ∂L/∂μ,  Δs ← ∂L/∂s

Then the posterior distribution of the weight parameters is optimized with these gradients, the weight parameters of the graph convolution neural network are updated with the optimized posterior distribution, and the network with the updated weight parameters is trained with the preprocessed graph structure data to obtain the optimized graph convolution neural network. The optimization uses the stochastic gradient descent (SGD) algorithm:

μ ← μ − α · Δμ,  s ← s − α · Δs

where α denotes the learning rate.
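To tie the steps together, the following is a sketch of one training iteration under the assumptions above, reusing the illustrative BayesianWeight and normalize_adjacency sketches (torch.nn.functional.cross_entropy applies softmax internally, so the forward pass here produces logits); it is an assumed arrangement of the described procedure, not the patent's reference code.

    import torch
    import torch.nn.functional as F

    def train_step(X, A_hat, labels, w0, w1, optimizer, kl_weight=1.0):
        """One SGD step on (mu, s) of both Bayesian weight matrices."""
        optimizer.zero_grad()
        W0, W1 = w0.sample(), w1.sample()              # W <- mu + exp(s) * eps
        logits = A_hat @ F.relu(A_hat @ X @ W0) @ W1   # GCN forward propagation
        loss_ce = F.cross_entropy(logits, labels)      # truth vs. prediction
        loss_kl = w0.kl_loss() + w1.kl_loss()          # pull posterior to prior
        loss = loss_ce + kl_weight * loss_kl           # L(mu, s) = loss_ce + loss_kl
        loss.backward()                                # gradients w.r.t. mu and s
        optimizer.step()                               # mu <- mu - alpha * grad, etc.
        return float(loss)

    # usage sketch:
    # w0, w1 = BayesianWeight((C, H)), BayesianWeight((H, K))
    # opt = torch.optim.SGD(list(w0.parameters()) + list(w1.parameters()), lr=0.01)
    # for epoch in range(200):
    #     train_step(X, A_hat, y, w0, w1, opt)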
Therefore, in the embodiment of the invention, the KL divergence loss is calculated from the prior distribution of the weight parameters of the constructed graph convolution neural network in order to learn the posterior distribution of the weight parameters, and the network is updated and trained with the posterior distribution of the weight parameters based on variational Bayesian inference. Uncertainty is thereby introduced into the weights of the graph convolution neural network, the uncertainty in the training data can be evaluated correctly, overconfident predictions of node categories are avoided, and the accuracy of the graph convolution neural network model in classifying graph structure data nodes is improved.
Further, in order to solve the above technical problem, an embodiment of the present application further provides an optimization apparatus 200 based on a graph convolution neural network with uncertain weights; as shown in fig. 2, the apparatus 200 includes:
an obtaining module 201, configured to obtain graph structure data and perform preprocessing;
the construction module 202 is configured to construct a graph convolution neural network, and obtain posterior distribution of weight parameters of the graph convolution neural network through KL divergence loss based on prior distribution of the weight parameters of the graph convolution neural network;
and the updating module 203, configured to update the graph convolution neural network according to the posterior distribution of the weight parameters, and to train the updated graph convolution neural network with the graph structure data to obtain an optimized graph convolution neural network.
Further, in order to solve the above technical problem, an embodiment of the present invention further provides a node classification system based on a graph convolution neural network with uncertain weights. As shown in fig. 3, the system 300 comprises an acquisition unit 301 for acquiring and preprocessing graph structure data, and a graph convolution neural network model 302 with uncertain weights for extracting node features from the graph structure data and classifying nodes. The graph convolution neural network model 302 is optimized and trained according to the optimization method based on the graph convolution neural network with uncertain weights and thus has the characteristic of uncertain weights, so that it can correctly evaluate the uncertainty in the training data and avoid overconfident predictions of node categories, thereby improving the accuracy of the model in classifying graph structure data nodes.
To solve the foregoing technical problem, an embodiment of the present application further provides a computer device, and specifically refer to fig. 4, where fig. 4 is a block diagram of a basic structure of the computer device according to the embodiment.
The computer device 4 comprises a memory 41, a processor 42 and a network interface 43, which are communicatively connected to each other via a system bus. It is noted that only a computer device 4 having components 41-43 is shown, but it should be understood that not all of the shown components are required, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium, including flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Programmable Read Only Memory (PROM), magnetic memory, magnetic disks, optical disks, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card provided on the computer device 4. Of course, the memory 41 may also include both the internal storage unit and the external storage device of the computer device 4. In this embodiment, the memory 41 is generally used for storing the operating system installed on the computer device 4 and various types of application software, such as the program code of the optimization method based on the graph convolution neural network with uncertain weights. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute the program code stored in the memory 41 or to process data, for example to execute the program code of the optimization method based on the graph convolution neural network with uncertain weights.
The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing communication connection between the computer device 4 and other electronic devices.
The present application provides yet another embodiment: a computer-readable storage medium storing a program for optimizing a graph convolution neural network with uncertain weights, where the program is executable by at least one processor to cause the at least one processor to perform the steps of the optimization method based on the graph convolution neural network with uncertain weights as described above.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly also by hardware, but in many cases the former is the better implementation. Based on such understanding, the technical solution of the present application may be substantially or partially embodied in the form of a software product, where the computer software product is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the method for optimizing a graph convolution neural network with uncertain weights according to the embodiments of the present application.
It should be understood that, although the respective subsystems in the structural diagram of the drawings are sequentially shown as indicated by arrows, the subsystems are not necessarily sequentially executed in the order indicated by the arrows. The execution of these subsystems is not strictly sequential, and may be performed in other sequences unless explicitly stated otherwise herein. Moreover, at least a portion of the subsystems in the schematic block diagrams of the figures may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of execution is not necessarily sequential, but may be alternated or performed with other steps or at least a portion of the sub-steps or stages of other steps.
It is to be understood that the above-described embodiments are merely illustrative of some, but not restrictive, of the broad invention, and that the appended drawings illustrate preferred embodiments of the invention and do not limit the scope of the invention. This application is capable of embodiments in many different forms and is provided for the purpose of enabling a thorough understanding of the disclosure of the application. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to one skilled in the art that the present application may be practiced without modification or with equivalents of some of the features described in the foregoing embodiments. All equivalent structures made by using the contents of the specification and the drawings of the present application are directly or indirectly applied to other related technical fields and are within the protection scope of the present application.

Claims (10)

1. A method for optimizing a graph convolution neural network with uncertain weights, characterized by comprising the following steps:
acquiring and preprocessing graph structure data;
constructing a graph convolution neural network, and acquiring posterior distribution of weight parameters of the graph convolution neural network through KL divergence loss based on prior distribution of the weight parameters of the graph convolution neural network;
and updating the graph convolution neural network according to posterior distribution of the weight parameters, and training the updated graph convolution neural network by using the graph structure data to obtain an optimized graph convolution neural network.
2. The method for optimizing a graph convolution neural network with uncertain weights of claim 1, wherein the step of obtaining the posterior distribution of the weight parameters of the graph convolution neural network through KL divergence loss based on the prior distribution of the weight parameters of the graph convolution neural network comprises:
obtaining prior distribution of weight parameters of the graph convolution neural network, wherein the weight parameters comprise expectation of weight and variance of the weight;
initializing the posterior distribution of the weight parameters of the graph convolution neural network and resampling to obtain the initial value of the resampled posterior distribution of the weight parameters;
and updating the posterior distribution of the weight parameters of the graph convolution neural network by using KL divergence loss according to the prior distribution of the weight parameters and the initial value of the weight parameter posterior distribution after resampling.
3. The method for optimizing a graph convolution neural network with uncertain weights of claim 2, wherein the step of updating the graph convolution neural network according to the posterior distribution of the weight parameters comprises:
performing forward propagation of the graph convolution neural network with weights sampled from the posterior distribution of the weight parameters, and calculating the cross entropy loss;
calculating a total loss based on the cross entropy loss and the KL divergence loss;
performing back propagation by using variational Bayesian inference according to the total loss, and calculating the gradient of posterior distribution of the weight parameter;
and optimizing the posterior distribution of the weight parameters by using the gradient, and updating the weight parameters of the graph convolution neural network by the optimized posterior distribution of the weight parameters.
4. The method of optimizing a weight uncertainty based graph convolution neural network of claim 3, wherein the optimizing the posterior distribution of weight parameters using the gradient comprises:
and carrying out random gradient descent optimization on the posterior distribution of the weight parameters according to the gradient.
5. The method for optimizing a graph convolution neural network with uncertain weights of claim 4, wherein the KL divergence loss updates the posterior distribution of the weight parameters by reducing the distance between the prior distribution and the posterior distribution of the weight parameters.
6. The method for optimizing a weight uncertainty-based graph convolution neural network of any one of claims 1 to 5, wherein the preprocessing of the graph structure data includes a normalization process.
7. An optimization apparatus based on a weight uncertainty graph convolution neural network, comprising:
the acquisition module is used for acquiring and preprocessing the graph structure data;
the construction module is used for constructing the graph convolution neural network, acquiring posterior distribution of the weight parameters of the graph convolution neural network through KL divergence loss based on prior distribution of the weight parameters of the graph convolution neural network;
and the updating module is used for updating the graph convolution neural network according to the posterior distribution of the weight parameters and training the updated graph convolution neural network by using the graph structure data to obtain the optimized graph convolution neural network.
8. A node classification system based on a graph convolution neural network with uncertain weights, characterized by comprising an acquisition unit for acquiring and preprocessing graph structure data, and a graph convolution neural network model with uncertain weights for extracting node features from the graph structure data and classifying nodes, wherein the graph convolution neural network model is optimized and trained according to the method for optimizing a graph convolution neural network with uncertain weights of any one of claims 1 to 6 and has the characteristic of uncertain weights.
9. A computer device, comprising a memory and a processor, wherein the memory stores a computer program and the processor, when executing the computer program, implements the steps of the method for optimizing a graph convolution neural network with uncertain weights of any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon which, when executed by a processor, implements the steps of the method for optimizing a graph convolution neural network with uncertain weights of any one of claims 1 to 6.
CN202011546124.1A 2020-12-23 2020-12-23 Optimization method and device based on uncertain weight graph convolution neural network Pending CN112651436A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011546124.1A CN112651436A (en) 2020-12-23 2020-12-23 Optimization method and device based on uncertain weight graph convolution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011546124.1A CN112651436A (en) 2020-12-23 2020-12-23 Optimization method and device based on uncertain weight graph convolution neural network

Publications (1)

Publication Number Publication Date
CN112651436A true CN112651436A (en) 2021-04-13

Family

ID=75360149

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011546124.1A Pending CN112651436A (en) 2020-12-23 2020-12-23 Optimization method and device based on uncertain weight graph convolution neural network

Country Status (1)

Country Link
CN (1) CN112651436A (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110462638A (en) * 2017-03-23 2019-11-15 渊慧科技有限公司 Training neural network is sharpened using posteriority
CN111274903A (en) * 2020-01-15 2020-06-12 合肥工业大学 Cervical cell image classification method based on graph convolution neural network
CN111783551A (en) * 2020-06-04 2020-10-16 中国人民解放军军事科学院国防科技创新研究院 Confrontation sample defense method based on Bayes convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Martin Krasser, "Variational inference in Bayesian neural networks", http://krasserm.github.io/2019/03/14/bayesian-neural-networks/ *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095304A (en) * 2021-06-08 2021-07-09 成都考拉悠然科技有限公司 Method for weakening influence of resampling on pedestrian re-identification
CN113379156A (en) * 2021-06-30 2021-09-10 南方科技大学 Speed prediction method, device, equipment and storage medium
CN113422371A (en) * 2021-07-22 2021-09-21 天津大学 Distributed power supply local voltage control method based on graph convolution neural network
CN113422371B (en) * 2021-07-22 2022-05-20 天津大学 Distributed power supply local voltage control method based on graph convolution neural network
CN113762625A (en) * 2021-09-09 2021-12-07 国网山东省电力公司经济技术研究院 Power distribution network state evaluation method and system based on graph convolution network
CN114896898A (en) * 2022-07-14 2022-08-12 深圳市森辉智能自控技术有限公司 Energy consumption optimization method and system for air compressor cluster system

Similar Documents

Publication Publication Date Title
CN112651436A (en) Optimization method and device based on uncertain weight graph convolution neural network
CN111414353B (en) Intelligent missing data filling method and device and computer readable storage medium
CN112488183B (en) Model optimization method, device, computer equipment and storage medium
WO2022141869A1 (en) Model training method and apparatus, model calling method and apparatus, computer device, and storage medium
CN112101172A (en) Weight grafting-based model fusion face recognition method and related equipment
CN112418292B (en) Image quality evaluation method, device, computer equipment and storage medium
CN113628059B (en) Associated user identification method and device based on multi-layer diagram attention network
CN111831675A (en) Storage model training method and device, computer equipment and storage medium
CN112395979A (en) Image-based health state identification method, device, equipment and storage medium
CN112528029A (en) Text classification model processing method and device, computer equipment and storage medium
CN114781611A (en) Natural language processing method, language model training method and related equipment
CN115099326A (en) Behavior prediction method, behavior prediction device, behavior prediction equipment and storage medium based on artificial intelligence
CN115510186A (en) Instant question and answer method, device, equipment and storage medium based on intention recognition
CN113869398B (en) Unbalanced text classification method, device, equipment and storage medium
CN114238656A (en) Reinforced learning-based affair atlas completion method and related equipment thereof
CN116827685B (en) Dynamic defense strategy method of micro-service system based on deep reinforcement learning
CN113761375A (en) Message recommendation method, device, equipment and storage medium based on neural network
WO2023207823A1 (en) Method for obtaining feature information of category description, image processing method, and device
CN116777646A (en) Artificial intelligence-based risk identification method, apparatus, device and storage medium
CN115599918B (en) Graph enhancement-based mutual learning text classification method and system
US11609936B2 (en) Graph data processing method, device, and computer program product
CN115984025A (en) Influence propagation estimation method and system based on deep learning graph network model
CN114757131A (en) Optimization method of proxy model suitable for CFD uncertainty quantification and related equipment
CN115099875A (en) Data classification method based on decision tree model and related equipment
CN115273110A (en) Text recognition model deployment method, device, equipment and storage medium based on TensorRT

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210413)