CN110766044B

CN110766044B - Neural network training method based on Gaussian process prior guidance

Info

Publication number: CN110766044B
Application number: CN201910858834.9A
Authority: CN
Inventors: 崔家宝; 朱文武; 励雪巍; 李玺
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2019-09-11
Filing date: 2019-09-11
Publication date: 2021-10-26
Anticipated expiration: 2039-09-11
Also published as: CN110766044A

Abstract

The invention discloses a neural network training method based on Gaussian process prior guidance, which is used for improving the training process of a neural network to obtain better training effect. The method specifically comprises the following steps: s1, acquiring a data set for neural network training, selecting a representative set for modeling prior knowledge, and defining an algorithm target; s2, carrying out a training process of iterative learning on the neural network model in batches in one period, and sequentially executing steps S21-S24 in each iterative batch; s3, after the training process of the current period is finished, verifying the neural network model by using a verification set to obtain the error rate of the verification set of the current model; and S4, continuously repeating the steps S2 and S3 to carry out a multi-period training process on the neural network model until the model converges. The neural network training method based on the Gaussian process prior guidance can effectively improve the training effectiveness in tasks, improves the network learning ability and learning quality, and has good application value.

Description

Neural network training method based on Gaussian process prior guidance

Technical Field

The invention belongs to the field of computer vision, and particularly relates to a neural network training method based on Gaussian process prior guidance.

Background

Image classification is a task of distinguishing between different classes of pictures in a data set. At present, the mainstream solution on the task of image classification is to train a convolutional neural network to solve the problem, and the training method generally adopts a random gradient descent method. In recent years, as the progress rate of network architecture is gradually reduced, improvement of training strategies is increasingly important. To this end, the present invention recognizes that it is desirable to provide as sophisticated and efficient supervised information as possible to train a given model better in supervised learning such as image classification. The data set provides a label, but the inherent label only represents the classification result of the picture and does not show the relationship between the picture and other categories. On the basis of utilizing the inherent labels of the data set, the invention introduces the soft labels representing the probability distribution of the image classification result by random process modeling, and combines the soft labels with the inherent labels of the data set for use, thereby improving the effectiveness of the training method.

Disclosure of Invention

In order to solve the problems, the invention provides a neural network training method based on Gaussian process prior guidance. The method is based on deep learning and random processes, the Gaussian process in the random process is used for modeling the correlation between images, the model is used for giving a 'soft label' to each training sample, the soft label and the inherent label of a data set are used for guiding the training process, and therefore the trained model is more accurate and robust.

In order to achieve the purpose, the technical scheme of the invention is as follows:

a neural network training method based on Gaussian process prior guidance comprises the following steps:

s1, acquiring a data set for neural network training, selecting a representative set for modeling prior knowledge, and defining an algorithm target;

s2, carrying out a training process of batch iterative learning in one period (epoch) on the neural network model, and sequentially executing steps S21-S24 in each iterative batch (batch):

s21, before the current iteration batch starts, carrying out combined modeling on samples in the representative set and training samples of the batch to obtain related prior knowledge;

s22, starting the current iteration batch learning process, and calculating the soft label of the batch of training samples according to the representative set and the batch of training samples; after the forward propagation process of the batch of training samples is carried out, the loss functions of the network output and the inherent labels of the batch of training samples are calculated

And loss functions of inherent labels and soft labels of the training samples of the batch

S23, calculating loss functions of network output and soft labels of training samples in the batch

S24, making a total loss function

And to

Is counter-propagating, wherein

In part and

some of which are used to optimize all parameters of the neural network,

part of the convolutional layer parameters are only used for optimizing the neural network;

s3, after the training process of the current period is finished, verifying the neural network model by using a verification set to obtain the error rate of the verification set of the current model;

and S4, continuously repeating the steps S2 and S3 to carry out a multi-period training process on the neural network model until the model converges.

Based on the scheme, the steps can be realized in the following modes:

the representative set in step S1 is a set containing a plurality of images of different types, and the method for constructing the representative set includes:

first, the number of categories for the entire data set is evaluated:

when the category number of the data set is less than 50 classes, taking 50 pictures from each class of images, and then taking the pictures taken from all the classes as a representative set;

when the category number of the data set is more than or equal to 50 categories, taking 100 pictures from each category of images, and then taking the pictures taken from all the categories as a representative set;

the algorithm targets are defined as: function of total loss

And (4) minimizing.

In step S21, the specific steps of performing joint modeling on the samples in the representative set and the training samples of the batch and obtaining the related prior knowledge include:

s211, carrying out feature extraction on samples in the representative set and training samples of the batch by using convolutional layer parameters of the initial neural network model in each training process to obtain feature vectors of all samples;

s212, jointly modeling all samples in the representative set and samples to be predicted into a Gaussian process:

wherein the content of the first and second substances,

a representative set is represented that represents a set of representations,

to represent the set of feature vectors for all picture samples in a set,

a set of feature vectors representing all samples in the set;

is the label of the sample to be predicted, h_bIs the feature vector of the sample to be predicted; k (·, ·) represents a covariance matrix, and an RBF kernel function is used for calculation, wherein the calculation general formula of the RBF kernel function is as follows:

wherein r is²(a, b) represents a second order Euclidean distance between a and b, and l is a characteristic length; -

Calculating to obtain prior knowledge through the modeling of the Gaussian process

The specific implementation method of step S22 is:

s221, calculating according to the batch of training samples

K(h_b,h_b)、

Performing a gaussian process regression algorithm using the gaussian process constructed in step S21, and

and (3) predicting:

wherein g is_mAnd g_vRespectively, the predicted mean and variance;

s222, carrying out forward propagation on the network, and calculating the current output h (x) of the networkⁱ) And using h (x)ⁱ) And g_m、g_vComputing loss function network output h (x)ⁱ) And the inherent label y of the training sample of the batchⁱLoss function of

And the inherent label y of the training sample of the batchⁱAnd a soft label g_m(xⁱ) Loss function of

Wherein

And

both represent cross entropy calculations, and the calculation formula for the parameters α and γ is:

u is the error rate of the validation set during the previous training period, and the initial value during the first training period is

C is the category number of the data set;

for the last iteration batch in the training process of this period

Absolute value of, in the first iteration batch

All initial values of (1).

In step S23, the network outputs h (x)ⁱ) And the soft label g of the training sample of the batch_m(xⁱ) Loss function of

The calculation formula of (2) is as follows:

wherein:

the relative entropy calculation is expressed, and the calculation formula of the parameter beta is as follows;

u is the error rate of the verification set in the previous training process, and the initial value is

C is the number of categories;

for the last iteration batch in the training process of this period

Absolute value of, in the first iteration batch

All initial values of (1).

In step S24, the three term loss functions obtained in steps S22 and S23 are added to form a total loss function

To pair

Optimizing to achieve the algorithm goal, wherein

In part and

some of which are used to optimize all parameters of the neural network,

part of which is used only to optimize convolutional layer parameters of the neural network.

In step S3, after all iterations of the training process in this period are completed, the pictures of the verification set are sequentially passed through the network by using the current network, and then the error rate of the prediction result given by the current network is calculated.

Compared with the prior art, the invention has the following beneficial effects:

firstly, the neural network training method based on the prior guidance of the Gaussian process provides a solution to the problem that only a small part of samples can be sampled simultaneously for training and global information cannot be considered in the random gradient descent method commonly used in the current deep learning, and the performance of the trained network can be improved by effectively solving the problem.

Secondly, the representative set sampling method of the invention can play a role in data sets with different sizes based on different characteristics of different data sets.

Finally, compared with the traditional one-item loss function, the three-item loss function provided by the invention can enable the model to consider information brought by different labels and global information contained in the 'soft label', so that the training of the model is more perfect.

The neural network training method based on the Gaussian process prior guidance can effectively improve the training effectiveness in tasks, improves the network learning ability and learning quality, and has good application value.

Drawings

FIG. 1 is a schematic flow diagram of the present invention;

FIG. 2 is a comparison of training set error rates of the ResNet20 network on a CIFAR-100 dataset in an example;

FIG. 3 is a comparison of validation set error rates of the ResNet20 network on a CIFAR-100 dataset in an example.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

On the contrary, the invention is intended to cover alternatives, modifications, equivalents and alternatives which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of the present invention, certain specific details are set forth in order to provide a better understanding of the present invention. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details.

As shown in fig. 1, a neural network training method based on a gaussian process prior guidance is characterized by comprising the following steps:

s1, acquiring a data set for neural network training, and defining a neural network model structure to be trained. A representative set for modeling prior knowledge is selected from the data set. The representative set in the step is a set comprising a plurality of images of different types, and the construction method of the representative set comprises the following steps:

first, the number of categories for the entire data set is evaluated:

the algorithm targets are defined as: function of total loss

And (4) minimizing.

S2, carrying out a training process of iterative learning on the neural network model in batches in one period, and sequentially executing steps S21-S24 in each iterative batch:

and S21, before the current iteration batch starts, carrying out combined modeling on the samples in the representative set and the training samples of the batch and obtaining related prior knowledge. In this embodiment, the specific implementation step in step S21 includes:

s212, supposing that the representative set is

All picture sample sets in the representative set are

The set of intrinsic labels representing all picture samples in the set is

X hereⁱIs a picture, yⁱIs a label of the picture. Defining:

to represent all samples in the set

The function f () is the structure of the neural network model except the fully connected layer, i.e. all convolutional layer parameters of the neural network model, and is used to extract the feature vector of the sample。

Is the label of the sample to be predicted, h_bIs the feature vector of the sample to be predicted. The function h () represents a fully connected layer.

Jointly modeling all samples in the representative set and the samples to be predicted into a Gaussian process:

wherein, K (·,) represents a covariance matrix, which is calculated using an RBF kernel, the calculation formula of the RBF kernel is:

wherein r is²(a, b) represents the second order Euclidean distance between a and b, and l is the characteristic length. It should be noted that, when one of the two terms a and b is a matrix and the other is a vector, we need to first extend the vector to the same dimension as the matrix and then calculate the euclidean distance.

S22, starting a current iteration batch learning process, and firstly calculating soft labels of training samples of a batch according to a representative set and the training samples of the batch under the guidance of prior knowledge; after the forward propagation process of the batch of training samples is carried out, the loss functions of the network output and the inherent labels of the batch of training samples are calculated

And inherent label and soft label of the training sample of the batchLoss function of label

In this embodiment, the specific implementation method of step S22 is as follows:

s221, calculating according to the batch of training samples

K(h_b,h_b)、

Performing a Gaussian process regression algorithm using the Gaussian process constructed in step S21, assuming that

So as to

The predictions of (1) are:

wherein g is_mAnd g_vRespectively, the predicted mean and variance;

s222, carrying out forward propagation on the network similarly to the traditional deep learning process, and calculating the current output h (x) of the networkⁱ) And using h (x)ⁱ) And g_m、g_vComputing loss function network output h (x)ⁱ) And the inherent label y of the training sample of the batchⁱLoss function of

It should be noted that h (x)ⁱ) Represents the batch of training samples xⁱIs output from the network, and g_m(xⁱ) Represents the batch of training samples xⁱIs taken as the feature vector of_bInputting the average value g predicted after the formula in S221_m。

In addition, the first and second substrates are,

and

u is the error rate of the validation set in the previous training process, and since the error rate of the validation set in the previous training process does not exist in the first training process, the initial value of u in the first training process is directly adopted as

C is the number of categories of the data set.

For the last iteration batch in the training process of this period

Since there is no previous iteration batch for the first iteration batch, the absolute value of (c) in the first iteration batch

All initial values of (1).

In this embodiment, in step S23, the network outputs h (x)ⁱ) And the soft label g of the training sample of the batch_m(xⁱ) Loss function of

The calculation formula of (2) is as follows:

wherein:

u is the error rate of the validation set in the previous training process, and the initial value in the same first training process is

C is the number of categories;

for the last iteration batch in the training process of this period

Absolute value of (a), in the same first iteration batch

All initial values of (1).

S24, making a total loss function

And to

Is counter-propagating, wherein

In part and

some of which are used to optimize all parameters of the neural network,

part of which is used only to optimize convolutional layer parameters of the neural network. In this step, the three loss functions obtained in steps S22 and S23 are added to form a total loss function

To pair

And optimizing to achieve the algorithm target.

And S3, after the training process of the current period is finished, verifying the neural network model by using the verification set to obtain the error rate of the verification set of the current model.

In the specific calculation of the embodiment, after all iterations (iteration) of the current epoch training process are completed, the pictures of the verification set are sequentially passed through the network by using the current network, and then the error rate of the prediction result given by the current network is calculated. Whether the model converges may be determined based on whether the validation set error rate is below a threshold. If the model converges, the training of the neural network is ended, and if the model does not converge, the step S4 is executed continuously.

The above-described method is applied to specific examples so that those skilled in the art can better understand the effects of the present invention.

Examples

The implementation method of this embodiment is as described above, and specific steps are not elaborated, and the effect is shown only for case data. The invention is implemented on three data sets with truth labels based on a ResNet network, and the three data sets are respectively as follows:

cifar10 dataset

Cifar100 dataset

Tiny-ImageNet dataset

This example performed a set of experiments on each selected data set, comparing the general SGD optimization method with the method of the present invention.

The results of the experiments of this example are shown in Table 1 for comparison of accuracy. The data in the figure shows the average performance of 5 tests of the invention on the relevant data set, GPGL in the table is the neural network training method (Gaussian Process Guided Learning) based on the prior guidance of the Gaussian Process

TABLE 1 comparison of accuracy of experimental results

In addition, fig. 2 shows the comparison of training set error rates on a CIFAR-100 data set between the general SGD optimization method and the method of the present invention; fig. 3 is a comparison of the common SGD optimization method and the verification set error rate of the method of the present invention on a CIFAR-100 data set, which can visually indicate that the method of the present invention can improve the performance of the trained network compared to the conventional method.

Through the technical scheme, the neural network training method based on the Gaussian process prior guidance is implemented. The method can model the relationship between different types of pictures on various real image data, thereby helping the convolutional neural network training to be better.

The above description is only exemplary of the present invention and should not be construed as limiting the present invention, and any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.