CN110766044B - Neural network training method based on Gaussian process prior guidance - Google Patents

Neural network training method based on Gaussian process prior guidance

Info

Publication number
CN110766044B
Authority
CN
China
Prior art keywords
training
batch
neural network
samples
period
Prior art date
Legal status
Active
Application number
CN201910858834.9A
Other languages
Chinese (zh)
Other versions
CN110766044A (en
Inventor
崔家宝
朱文武
励雪巍
李玺
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority claimed from CN201910858834.9A
Publication of CN110766044A
Application granted
Publication of CN110766044B

Classifications

    • G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N3/045 — Computing arrangements based on biological models; neural networks; combinations of networks
    • G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods


Abstract

The invention discloses a neural network training method based on Gaussian process prior guidance, which improves the training process of a neural network to obtain a better training effect. The method comprises the following steps: S1, acquire a data set for neural network training, select a representative set for modeling prior knowledge, and define the algorithm target; S2, train the neural network model through batch-iterative learning within one period, executing steps S21 to S24 in each iteration batch; S3, after the training process of the current period finishes, evaluate the neural network model on a validation set to obtain the validation-set error rate of the current model; S4, repeat steps S2 and S3 over multiple periods until the model converges. The neural network training method based on Gaussian process prior guidance effectively improves training effectiveness, enhances the network's learning ability and learning quality, and has good application value.

Description

Neural network training method based on Gaussian process prior guidance
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a neural network training method based on Gaussian process prior guidance.
Background
Image classification is the task of distinguishing pictures of different classes in a data set. At present, the mainstream solution is to train a convolutional neural network, typically with stochastic gradient descent. In recent years, as progress on network architectures has slowed, improvements to the training strategy have become increasingly important. In supervised learning tasks such as image classification, it is therefore desirable to provide supervision that is as complete and effective as possible for training a given model. The data set provides labels, but an inherent label only encodes the classification result of a picture and says nothing about its relationship to the other categories. On top of the inherent labels of the data set, the invention introduces, via stochastic-process modeling, "soft labels" that represent a probability distribution over classification results, and uses them together with the inherent labels, thereby improving the effectiveness of the training method.
Disclosure of Invention
To solve the above problems, the invention provides a neural network training method based on Gaussian process prior guidance. The method combines deep learning with stochastic processes: a Gaussian process is used to model the correlation between images, the model assigns a "soft label" to each training sample, and the soft labels together with the inherent labels of the data set guide the training process, so that the trained model is more accurate and robust.
To achieve this purpose, the technical scheme of the invention is as follows:

A neural network training method based on Gaussian process prior guidance comprises the following steps:

S1. Acquire a data set for neural network training, select a representative set for modeling prior knowledge, and define the algorithm target;

S2. Train the neural network model through batch-iterative learning within one period (epoch), executing steps S21 to S24 in each iteration batch (batch):

S21. Before the current iteration batch starts, jointly model the samples in the representative set and the training samples of the batch to obtain the related prior knowledge;

S22. Start the learning process of the current iteration batch, and compute the soft labels of the batch of training samples from the representative set and the batch; after forward propagation of the batch, compute the loss $\mathcal{L}_1$ between the network output and the inherent labels of the batch of training samples, and the loss $\mathcal{L}_2$ between the inherent labels and the soft labels of the batch;

S23. Compute the loss $\mathcal{L}_3$ between the network output and the soft labels of the batch of training samples;

S24. Form the total loss $\mathcal{L} = \mathcal{L}_1 + \mathcal{L}_2 + \mathcal{L}_3$ and backpropagate $\mathcal{L}$, where the $\mathcal{L}_1$ and $\mathcal{L}_3$ terms are used to optimize all parameters of the neural network, and the $\mathcal{L}_2$ term is used to optimize only the convolutional-layer parameters of the neural network;

S3. After the training process of the current period finishes, evaluate the neural network model on a validation set to obtain the validation-set error rate of the current model;

S4. Repeat steps S2 and S3 to train the neural network model over multiple periods until the model converges.
Based on the above scheme, the steps may be implemented as follows.
The representative set in step S1 is a set containing images of multiple different classes, and it is constructed as follows:

First, evaluate the number of categories of the entire data set:

When the data set has fewer than 50 categories, take 50 pictures from each class of images, and use the pictures taken from all classes as the representative set;

When the data set has 50 or more categories, take 100 pictures from each class of images, and use the pictures taken from all classes as the representative set.

The algorithm target is defined as: minimize the total loss function $\mathcal{L}$.
In step S21, the specific steps of jointly modeling the samples in the representative set with the training samples of the batch and obtaining the related prior knowledge are:

S211. In each training pass, extract features of the samples in the representative set and of the training samples of the batch with the convolutional-layer parameters of the current neural network model, obtaining the feature vectors of all samples;

S212. Jointly model all samples in the representative set and the samples to be predicted as a Gaussian process:

$$\begin{bmatrix} Y_R \\ y_b \end{bmatrix} \sim \mathcal{N}\!\left(\mathbf{0},\; \begin{bmatrix} K(H_R, H_R) & K(H_R, h_b) \\ K(h_b, H_R) & K(h_b, h_b) \end{bmatrix}\right)$$

where $R$ denotes the representative set, $H_R$ denotes the set of feature vectors of all picture samples in the representative set, $Y_R$ denotes the set of inherent labels of all samples in the representative set, $y_b$ is the label of the sample to be predicted, and $h_b$ is the feature vector of the sample to be predicted; $K(\cdot,\cdot)$ denotes a covariance matrix computed with the RBF kernel, whose general formula is:

$$K(a, b) = \exp\!\left(-\frac{r^2(a, b)}{2l^2}\right)$$

where $r^2(a, b)$ is the squared Euclidean distance between $a$ and $b$, and $l$ is the characteristic length.

Through this Gaussian-process modeling, the prior knowledge $K(H_R, H_R)$ and $K(H_R, H_R)^{-1}$ is obtained.
The specific implementation of step S22 is:

S221. Compute $K(H_R, h_b)$, $K(h_b, h_b)$ and $K(h_b, H_R)$ from the batch of training samples, run the Gaussian-process regression algorithm with the Gaussian process constructed in step S21, and predict $y_b$:

$$g_m = K(h_b, H_R)\,K(H_R, H_R)^{-1}\,Y_R$$

$$g_v = K(h_b, h_b) - K(h_b, H_R)\,K(H_R, H_R)^{-1}\,K(H_R, h_b)$$

where $g_m$ and $g_v$ are the predicted mean and variance, respectively;

S222. Forward-propagate the network, compute the current network output $h(x_i)$, and use $h(x_i)$, $g_m$ and $g_v$ to compute the loss $\mathcal{L}_1$ between the network output $h(x_i)$ and the inherent label $y_i$ of the batch of training samples, and the loss $\mathcal{L}_2$ between the inherent label $y_i$ and the soft label $g_m(x_i)$:

$$\mathcal{L}_1 = \alpha \cdot \mathrm{CE}(h(x_i), y_i)$$

$$\mathcal{L}_2 = \gamma \cdot \mathrm{CE}(g_m(x_i), y_i)$$

where $\mathrm{CE}(\cdot,\cdot)$ denotes the cross-entropy calculation; the parameters $\alpha$ and $\gamma$ are computed from $u$ and $|\bar{\mathcal{L}}|$ [formulas given in the source only as images], where $u$ is the validation-set error rate of the previous training period, with an initial value in the first training period determined by $C$, the number of categories of the data set, and $|\bar{\mathcal{L}}|$ is the absolute value of the loss of the previous iteration batch within the current period, with initial value 1 in the first iteration batch.
In step S23, the network outputs h (x)i) And the soft label g of the training sample of the batchm(xi) Loss function of
Figure BDA00021990570500000416
The calculation formula of (2) is as follows:
Figure BDA00021990570500000417
wherein:
Figure BDA00021990570500000418
the relative entropy calculation is expressed, and the calculation formula of the parameter beta is as follows;
Figure BDA00021990570500000419
u is the error rate of the verification set in the previous training process, and the initial value is
Figure BDA00021990570500000420
C is the number of categories;
Figure BDA0002199057050000051
for the last iteration batch in the training process of this period
Figure BDA0002199057050000052
Absolute value of, in the first iteration batch
Figure BDA0002199057050000053
All initial values of (1).
In step S24, the three loss terms obtained in steps S22 and S23 are added to form the total loss $\mathcal{L} = \mathcal{L}_1 + \mathcal{L}_2 + \mathcal{L}_3$, and $\mathcal{L}$ is optimized to achieve the algorithm target, where the $\mathcal{L}_1$ and $\mathcal{L}_3$ terms are used to optimize all parameters of the neural network and the $\mathcal{L}_2$ term is used to optimize only the convolutional-layer parameters of the neural network.
In step S3, after all iteration batches of the current period have finished, the pictures of the validation set are passed through the current network in turn, and the error rate of the predictions given by the current network is computed.
Compared with the prior art, the invention has the following beneficial effects:

First, the neural network training method based on Gaussian process prior guidance addresses a limitation of the stochastic gradient descent commonly used in deep learning: only a small subset of samples is drawn at a time, so global information cannot be taken into account. Effectively addressing this limitation improves the performance of the trained network.

Second, the representative-set sampling method of the invention adapts to the characteristics of data sets of different sizes.

Finally, compared with a conventional single-term loss function, the proposed three-term loss function lets the model consider both the information carried by the different labels and the global information contained in the "soft labels", making the training of the model more complete.

In summary, the neural network training method based on Gaussian process prior guidance effectively improves training effectiveness, enhances the network's learning ability and learning quality, and has good application value.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a comparison of training set error rates of the ResNet20 network on a CIFAR-100 dataset in an example;
FIG. 3 is a comparison of validation set error rates of the ResNet20 network on a CIFAR-100 dataset in an example.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
On the contrary, the invention is intended to cover alternatives, modifications and equivalents that fall within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of the present invention, certain specific details are set forth in order to provide a better understanding of the present invention. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details.
As shown in FIG. 1, the neural network training method based on Gaussian process prior guidance comprises the following steps:

S1. Acquire a data set for neural network training and define the structure of the neural network model to be trained. Select from the data set a representative set for modeling prior knowledge. The representative set is a set containing images of multiple different classes and is constructed as follows:

First, evaluate the number of categories of the entire data set:

When the data set has fewer than 50 categories, take 50 pictures from each class of images, and use the pictures taken from all classes as the representative set;

When the data set has 50 or more categories, take 100 pictures from each class of images, and use the pictures taken from all classes as the representative set.

The algorithm target is defined as: minimize the total loss function $\mathcal{L}$.
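As a concrete illustration of the class-count rule above, the sampling step can be sketched in a few lines of numpy. The function name and the use of uniform random sampling are assumptions for illustration; the source only fixes the per-class counts:

```python
import numpy as np

def build_representative_set(labels, num_classes, seed=0):
    """Pick the representative set per step S1: 50 pictures per class when the
    data set has fewer than 50 classes, 100 per class otherwise.
    Returns indices into the data set; random choice is an assumption."""
    per_class = 50 if num_classes < 50 else 100
    rng = np.random.default_rng(seed)
    chosen = []
    for c in range(num_classes):
        pool = np.flatnonzero(np.asarray(labels) == c)
        take = min(per_class, pool.size)  # guard against very small classes
        chosen.append(rng.choice(pool, size=take, replace=False))
    return np.concatenate(chosen)

# CIFAR-10-like labels: 10 classes -> 50 pictures per class
labels = np.repeat(np.arange(10), 500)
rep = build_representative_set(labels, num_classes=10)
print(rep.size)  # 500
```

For a data set with 50 or more classes, the same call yields 100 indices per class instead.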
S2. Train the neural network model through batch-iterative learning within one period, executing steps S21 to S24 in each iteration batch:

S21. Before the current iteration batch starts, jointly model the samples in the representative set and the training samples of the batch to obtain the related prior knowledge. In this embodiment, step S21 comprises:

S211. In each training pass, extract features of the samples in the representative set and of the training samples of the batch with the convolutional-layer parameters of the current neural network model, obtaining the feature vectors of all samples;

S212. Suppose the representative set is $R = \{(x_i, y_i)\}$, the set of all picture samples in the representative set is $X_R = \{x_i\}$, and the set of inherent labels of all picture samples in the representative set is $Y_R = \{y_i\}$, where $x_i$ is a picture and $y_i$ is its label. Define $H_R = f(X_R)$ as the set of feature vectors of all samples in the representative set, where the function $f(\cdot)$ is the structure of the neural network model excluding the fully connected layer, i.e. all convolutional-layer parameters of the model, used to extract the feature vector of a sample. $y_b$ is the label of the sample to be predicted and $h_b$ is its feature vector; the function $h(\cdot)$ denotes the fully connected layer.

Jointly model all samples in the representative set and the samples to be predicted as a Gaussian process:

$$\begin{bmatrix} Y_R \\ y_b \end{bmatrix} \sim \mathcal{N}\!\left(\mathbf{0},\; \begin{bmatrix} K(H_R, H_R) & K(H_R, h_b) \\ K(h_b, H_R) & K(h_b, h_b) \end{bmatrix}\right)$$

where $K(\cdot,\cdot)$ denotes a covariance matrix computed with the RBF kernel:

$$K(a, b) = \exp\!\left(-\frac{r^2(a, b)}{2l^2}\right)$$

where $r^2(a, b)$ is the squared Euclidean distance between $a$ and $b$, and $l$ is the characteristic length. Note that when one of $a$ and $b$ is a matrix and the other a vector, the vector must first be broadcast to the same dimension as the matrix before the Euclidean distance is computed.

Through this Gaussian-process modeling, the prior knowledge $K(H_R, H_R)$ and $K(H_R, H_R)^{-1}$ is obtained.
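The covariance computation of step S212 can be sketched as follows in numpy. The exact exponent scaling ($2l^2$) is an assumption, since the source gives the RBF formula only as an image; the vector-to-matrix broadcasting follows the note in the text:

```python
import numpy as np

def rbf_kernel(A, B, length=1.0):
    """Pairwise RBF kernel between row-wise feature sets A (n,d) and B (m,d).
    A 1-D input (a single feature vector) is promoted to a 1-row matrix,
    i.e. broadcast against the other argument before distances are taken."""
    A, B = np.atleast_2d(A), np.atleast_2d(B)
    # squared Euclidean distance r^2(a, b) for every pair of rows
    r2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    r2 = np.maximum(r2, 0.0)  # clip tiny negatives from round-off
    return np.exp(-r2 / (2.0 * length**2))  # exponent scaling is an assumption

rng = np.random.default_rng(0)
H_R = rng.normal(size=(6, 4))  # representative-set feature vectors
h_b = rng.normal(size=4)       # one batch sample's feature vector
K_RR = rbf_kernel(H_R, H_R)    # prior knowledge K(H_R, H_R)
K_Rb = rbf_kernel(H_R, h_b)    # cross-covariance K(H_R, h_b)
print(K_RR.shape, K_Rb.shape)  # (6, 6) (6, 1)
```

The resulting blocks are exactly the entries of the joint covariance matrix above.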
S22. Start the learning process of the current iteration batch. First, under the guidance of the prior knowledge, compute the soft labels of the batch of training samples from the representative set and the batch; after forward propagation of the batch, compute the loss $\mathcal{L}_1$ between the network output and the inherent labels of the batch of training samples and the loss $\mathcal{L}_2$ between the inherent labels and the soft labels. In this embodiment, step S22 is implemented as follows:

S221. Compute $K(H_R, h_b)$, $K(h_b, h_b)$ and $K(h_b, H_R)$ from the batch of training samples, then run the Gaussian-process regression algorithm with the Gaussian process constructed in step S21, conditioning on $Y_R$, so that the prediction of $y_b$ is:

$$g_m = K(h_b, H_R)\,K(H_R, H_R)^{-1}\,Y_R$$

$$g_v = K(h_b, h_b) - K(h_b, H_R)\,K(H_R, H_R)^{-1}\,K(H_R, h_b)$$

where $g_m$ and $g_v$ are the predicted mean and variance, respectively;

S222. As in a conventional deep-learning pipeline, forward-propagate the network and compute the current network output $h(x_i)$; then use $h(x_i)$, $g_m$ and $g_v$ to compute the loss $\mathcal{L}_1$ between the network output $h(x_i)$ and the inherent label $y_i$ of the batch of training samples, and the loss $\mathcal{L}_2$ between the inherent label $y_i$ and the soft label $g_m(x_i)$:

$$\mathcal{L}_1 = \alpha \cdot \mathrm{CE}(h(x_i), y_i)$$

$$\mathcal{L}_2 = \gamma \cdot \mathrm{CE}(g_m(x_i), y_i)$$

Note that $h(x_i)$ denotes the network output for the batch training sample $x_i$, while $g_m(x_i)$ denotes the mean $g_m$ predicted by the formula in S221 with the feature vector of $x_i$ taken as $h_b$.
In addition, $\mathrm{CE}(\cdot,\cdot)$ in both formulas denotes the cross-entropy calculation, and the parameters $\alpha$ and $\gamma$ are computed from $u$ and $|\bar{\mathcal{L}}|$ [formulas given in the source only as images]. Here $u$ is the validation-set error rate of the previous training period; since no such error rate exists during the first training period, the initial value of $u$ in the first period is a constant determined by $C$, the number of categories of the data set. $|\bar{\mathcal{L}}|$ is the absolute value of the loss of the previous iteration batch within the current period; since the first iteration batch has no predecessor, its initial value in the first iteration batch is 1.
S23. Compute the loss $\mathcal{L}_3$ between the network output and the soft labels of the batch of training samples. In this embodiment, the loss between the network output $h(x_i)$ and the soft label $g_m(x_i)$ is computed as:

$$\mathcal{L}_3 = \beta \cdot \mathrm{KL}\big(h(x_i),\, g_m(x_i)\big)$$

where $\mathrm{KL}(\cdot,\cdot)$ denotes the relative-entropy calculation and the parameter $\beta$ is computed in the same manner as $\alpha$ and $\gamma$ [formula given in the source only as an image]; $u$ is the validation-set error rate of the previous training period, with its initial value in the first period likewise determined by $C$, the number of categories; $|\bar{\mathcal{L}}|$ is the absolute value of the loss of the previous iteration batch within the current period, likewise with initial value 1 in the first iteration batch.
S24. Form the total loss $\mathcal{L} = \mathcal{L}_1 + \mathcal{L}_2 + \mathcal{L}_3$ and backpropagate $\mathcal{L}$, where the $\mathcal{L}_1$ and $\mathcal{L}_3$ terms are used to optimize all parameters of the neural network and the $\mathcal{L}_2$ term is used to optimize only the convolutional-layer parameters of the neural network. In this step, the three loss terms obtained in steps S22 and S23 are added to form the total loss $\mathcal{L}$, and $\mathcal{L}$ is optimized to achieve the algorithm target.
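The three-term loss of steps S22 to S24 — weighted cross entropy between network output and inherent label, weighted cross entropy between soft label and inherent label, and weighted relative entropy between network output and soft label — can be evaluated forward-only as below. The softmax normalisation of the soft labels, the KL direction, and the purely multiplicative use of the weights are assumptions (the source gives the weight formulas only as images); the selective gradient routing is noted in comments rather than implemented:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(p, y_onehot):
    # mean cross entropy between distribution p and one-hot labels
    return float(-np.mean(np.sum(y_onehot * np.log(p + 1e-12), axis=-1)))

def relative_entropy(p, q):
    # KL(p || q), averaged over the batch; the direction is an assumption
    return float(np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)))

def total_loss(logits, y_onehot, g_m, alpha, gamma, beta):
    """L = L1 + L2 + L3 as in S24.  During backprop the method routes
    L1 and L3 to all network parameters and L2 to the convolutional
    layers only; this sketch just evaluates the scalar values."""
    h = softmax(logits)     # network output h(x_i)
    q = softmax(g_m)        # soft label g_m(x_i), normalised (assumption)
    L1 = alpha * cross_entropy(h, y_onehot)   # output vs inherent label
    L2 = gamma * cross_entropy(q, y_onehot)   # soft label vs inherent label
    L3 = beta * relative_entropy(h, q)        # output vs soft label
    return L1 + L2 + L3

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))
y = np.eye(3)[[0, 1, 2, 0]]
g_m = rng.normal(size=(4, 3))
print(total_loss(logits, y, g_m, alpha=1.0, gamma=1.0, beta=1.0) > 0.0)  # True
```

In a real framework the routing would be implemented by detaching or masking gradients so that the second term does not update the fully connected layer.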
S3. After the training process of the current period finishes, evaluate the neural network model on the validation set to obtain the validation-set error rate of the current model.

In the concrete calculation of this embodiment, after all iterations of the current epoch have finished, the pictures of the validation set are passed through the current network in turn, and the error rate of the predictions given by the current network is computed. Whether the model has converged can be judged by whether the validation-set error rate falls below a threshold. If the model has converged, the training of the neural network ends; otherwise, continue with step S4.
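The validation-set error rate of step S3 is the fraction of validation pictures whose top-scoring class disagrees with the ground-truth label; a minimal sketch:

```python
import numpy as np

def validation_error_rate(logits, labels):
    """Fraction of validation samples whose argmax prediction is wrong."""
    predictions = np.argmax(logits, axis=1)
    return float(np.mean(predictions != labels))

# four validation samples, one misclassified
logits = np.array([[2.0, 0.1],
                   [0.2, 1.5],
                   [3.0, 0.0],   # true class 1, predicted 0 -> error
                   [0.1, 0.9]])
labels = np.array([0, 1, 1, 1])
print(validation_error_rate(logits, labels))  # 0.25
```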
S4. Repeat steps S2 and S3 to train the neural network model over multiple periods until the model converges.
The above-described method is applied to specific examples so that those skilled in the art can better understand the effects of the present invention.
Examples
The implementation of this embodiment is as described above; the specific steps are not repeated here, and only the effect on case data is shown. The invention was implemented with a ResNet network on three data sets with ground-truth labels:
CIFAR-10 dataset
CIFAR-100 dataset
Tiny-ImageNet dataset
This example performed a set of experiments on each selected data set, comparing the common SGD optimization method with the method of the present invention.

The accuracy comparison of the experimental results is shown in Table 1. The data show the average performance over 5 runs of the invention on each data set; GPGL in the table denotes the proposed neural network training method based on Gaussian process prior guidance (Gaussian Process Guided Learning).

TABLE 1. Comparison of accuracy of experimental results

[Table 1 is given in the source only as an image.]

In addition, FIG. 2 compares the training-set error rates of the common SGD optimization method and the method of the present invention on the CIFAR-100 data set, and FIG. 3 compares their validation-set error rates on the same data set, visually showing that the method of the present invention improves the performance of the trained network compared with the conventional method.
Through the above technical scheme, the embodiment of the invention implements the neural network training method based on Gaussian process prior guidance. The method can model the relationships between pictures of different classes on a variety of real image data, thereby helping the convolutional neural network to train better.
The above description is only exemplary of the present invention and should not be construed as limiting the present invention, and any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (1)

1. A neural network training method based on Gaussian process prior guidance, characterized by comprising the following steps:

S1. acquiring a data set for neural network training, selecting a representative set for modeling prior knowledge, and defining the algorithm target;

S2. training the neural network model through batch-iterative learning within one period, executing steps S21 to S24 in each iteration batch:

S21. before the current iteration batch starts, jointly modeling the samples in the representative set and the training samples of the batch to obtain the related prior knowledge;

S22. starting the learning process of the current iteration batch, and computing the soft labels of the batch of training samples from the representative set and the batch; after forward propagation of the batch, computing the loss $\mathcal{L}_1$ between the network output and the inherent labels of the batch of training samples and the loss $\mathcal{L}_2$ between the inherent labels and the soft labels of the batch;

S23. computing the loss $\mathcal{L}_3$ between the network output and the soft labels of the batch of training samples;

S24. forming the total loss $\mathcal{L} = \mathcal{L}_1 + \mathcal{L}_2 + \mathcal{L}_3$ and backpropagating $\mathcal{L}$, wherein the $\mathcal{L}_1$ and $\mathcal{L}_3$ terms are used to optimize all parameters of the neural network and the $\mathcal{L}_2$ term is used to optimize only the convolutional-layer parameters of the neural network;

S3. after the training process of the current period finishes, evaluating the neural network model on a validation set to obtain the validation-set error rate of the current model;

S4. repeating steps S2 and S3 to train the neural network model over multiple periods until the model converges;

wherein the representative set in step S1 is a set containing images of multiple different classes, constructed as follows:

first, the number of categories of the entire data set is evaluated:

when the data set has fewer than 50 categories, 50 pictures are taken from each class of images, and the pictures taken from all classes form the representative set;

when the data set has 50 or more categories, 100 pictures are taken from each class of images, and the pictures taken from all classes form the representative set;

the algorithm target is defined as: minimizing the total loss function $\mathcal{L}$;
in step S21, the specific steps of jointly modeling the samples in the representative set and the training samples of the batch and obtaining the related prior knowledge are:

S211, in each training process, extracting features from the samples in the representative set and the training samples of the batch using the convolutional-layer parameters of the current neural network model, obtaining the feature vectors of all samples;

S212, jointly modeling all samples in the representative set and the samples to be predicted as a Gaussian process:

$$\begin{bmatrix} Y_R \\ y_b \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0},\; \begin{bmatrix} K(H_R, H_R) & K(H_R, h_b) \\ K(h_b, H_R) & K(h_b, h_b) \end{bmatrix}\right)$$

wherein $\mathcal{R}$ denotes the representative set, $H_R$ denotes the set of feature vectors of all picture samples in $\mathcal{R}$, and $Y_R$ denotes the set of labels of all samples in $\mathcal{R}$; $y_b$ is the label of the sample to be predicted and $h_b$ is the feature vector of the sample to be predicted; $K(\cdot,\cdot)$ represents a covariance matrix, computed with the RBF kernel function, whose general formula is:

$$k(a, b) = \exp\left(-\frac{r^2(a, b)}{2l^2}\right)$$

wherein $r^2(a, b)$ represents the second-order (squared) Euclidean distance between $a$ and $b$, and $l$ is the characteristic length;

through this Gaussian-process modeling, the prior knowledge $K(H_R, H_R)$ and $Y_R$ is obtained;
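The RBF covariance computation described above can be sketched in a few lines of NumPy. The helper name `rbf_kernel` is illustrative, and the characteristic length `l` is a free hyperparameter not fixed by the text:

```python
import numpy as np

def rbf_kernel(A, B, length=1.0):
    """Covariance matrix K(A, B) with the RBF kernel
    k(a, b) = exp(-r^2(a, b) / (2 * l^2))."""
    # Squared Euclidean distance between every row of A and every row of B.
    r2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-r2 / (2.0 * length ** 2))

H_R = np.array([[0.0, 0.0], [1.0, 0.0]])  # feature vectors of the representative set
K_RR = rbf_kernel(H_R, H_R)               # prior covariance K(H_R, H_R)
print(K_RR[0, 0])                         # → 1.0, since k(a, a) = exp(0)
```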
The specific implementation method of step S22 is:

S221, computing from the batch of training samples the quantities $K(h_b, H_R)$, $K(h_b, h_b)$ and $K(H_R, h_b)$, then performing the Gaussian-process regression algorithm with the Gaussian process constructed in step S21 to predict $y_b$:

$$g_m = K(h_b, H_R)\,K(H_R, H_R)^{-1}\,Y_R$$

$$g_v = K(h_b, h_b) - K(h_b, H_R)\,K(H_R, H_R)^{-1}\,K(H_R, h_b)$$

wherein $g_m$ and $g_v$ are the predicted mean and variance, respectively;
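The prediction step above follows the standard Gaussian-process regression posterior. A small NumPy sketch, with an illustrative helper name `gp_predict` and a jitter term added for numerical stability (the jitter is an implementation convenience, not part of the described method):

```python
import numpy as np

def rbf_kernel(A, B, length=1.0):
    r2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-r2 / (2.0 * length ** 2))

def gp_predict(H_R, Y_R, h_b, length=1.0, jitter=1e-8):
    """Predict mean g_m and variance g_v for batch features h_b
    from the representative set (H_R, Y_R)."""
    K_RR = rbf_kernel(H_R, H_R, length) + jitter * np.eye(len(H_R))
    K_bR = rbf_kernel(h_b, H_R, length)   # K(h_b, H_R)
    K_bb = rbf_kernel(h_b, h_b, length)   # K(h_b, h_b)
    K_inv = np.linalg.inv(K_RR)
    g_m = K_bR @ K_inv @ Y_R              # posterior mean
    g_v = K_bb - K_bR @ K_inv @ K_bR.T    # posterior (co)variance
    return g_m, g_v

H_R = np.array([[0.0], [1.0]])
Y_R = np.array([[1.0, 0.0], [0.0, 1.0]])  # one-hot labels of the representative set
# A batch sample identical to a representative sample recovers that sample's label.
g_m, g_v = gp_predict(H_R, Y_R, np.array([[0.0]]))
print(np.round(g_m, 3))
```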
S222, propagating the batch forward through the network to compute the current network output $h(x_i)$, and using $h(x_i)$, $g_m$ and $g_v$ to compute two loss terms: the loss $\mathcal{L}_{h,y}$ between the network output $h(x_i)$ and the inherent label $y_i$ of the batch of training samples, and the loss $\mathcal{L}_{y,g}$ between the inherent label $y_i$ and the soft label $g_m(x_i)$:

$$\mathcal{L}_{h,y} = \mathcal{H}\big(h(x_i),\, y_i\big)$$

$$\mathcal{L}_{y,g} = \alpha\,\gamma\,\mathcal{H}\big(g_m(x_i),\, y_i\big)$$

wherein $\mathcal{H}(\cdot,\cdot)$ represents the cross-entropy calculation; the parameters $\alpha$ and $\gamma$ are computed from $u$, the error rate on the validation set during the previous training period, whose initial value during the first training period is $\frac{C-1}{C}$, where $C$ is the number of categories of the data set, and from the absolute value of the corresponding loss on the last iteration batch of the current training period, whose initial values in the first iteration batch are all 1;
in step S23, the loss $\mathcal{L}_{h,g}$ between the network output $h(x_i)$ and the soft label $g_m(x_i)$ of the batch of training samples is calculated as:

$$\mathcal{L}_{h,g} = \beta\, D_{\mathrm{KL}}\big(g_m(x_i)\,\big\|\,h(x_i)\big)$$

wherein $D_{\mathrm{KL}}(\cdot\,\|\,\cdot)$ represents the relative-entropy (KL-divergence) calculation; the parameter $\beta$ is computed from $u$, the error rate on the validation set during the previous training period, whose initial value is $\frac{C-1}{C}$, where $C$ is the number of categories, and from the absolute value of the loss on the last iteration batch of the current training period, whose initial values in the first iteration batch are all 1;
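The relative-entropy term can be sketched in NumPy as below. As with the cross-entropy terms, the weight `beta` is a fixed placeholder here; its actual formula, given in the patent's figures, depends on the validation error rate and the previous batch's loss.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D_KL(p || q), averaged over the batch."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float((p * np.log(p / q)).sum(axis=-1).mean())

g_m = np.array([[0.8, 0.2], [0.3, 0.7]])  # GP soft labels
h = np.array([[0.8, 0.2], [0.3, 0.7]])    # network output (identical here)

beta = 0.5                                 # placeholder weight (see text)
loss_hg = beta * kl_divergence(g_m, h)     # L_{h,g}
print(round(loss_hg, 6))                   # → 0.0 for identical distributions
```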
in step S24, the three loss terms obtained in steps S22 and S23 are summed to form the total loss function

$$\mathcal{L} = \mathcal{L}_{h,y} + \mathcal{L}_{y,g} + \mathcal{L}_{h,g},$$

and $\mathcal{L}$ is optimized to achieve the algorithm target, wherein the $\mathcal{L}_{h,y}$ and $\mathcal{L}_{h,g}$ parts are used to optimize all parameters of the neural network, while the $\mathcal{L}_{y,g}$ part is used to optimize only the convolutional-layer parameters of the neural network;
in step S3, after all iterations of the training process in this period are completed, the pictures of the verification set are passed through the current network in sequence, and the error rate of the prediction results given by the current network is then calculated.
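The validation error rate described in step S3 amounts to the fraction of misclassified validation pictures. A minimal sketch, assuming the network's per-picture outputs are collected into a logits array and predictions are taken as the arg-max class:

```python
import numpy as np

def validation_error_rate(logits, labels):
    """Fraction of validation pictures whose arg-max prediction
    differs from the true label."""
    predictions = logits.argmax(axis=-1)
    return float((predictions != labels).mean())

logits = np.array([[2.0, 0.1], [0.3, 1.5], [0.9, 0.8], [0.2, 2.2]])
labels = np.array([0, 1, 1, 1])  # the third picture is misclassified
u = validation_error_rate(logits, labels)
print(u)  # → 0.25
```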
CN201910858834.9A 2019-09-11 2019-09-11 Neural network training method based on Gaussian process prior guidance Active CN110766044B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910858834.9A CN110766044B (en) 2019-09-11 2019-09-11 Neural network training method based on Gaussian process prior guidance

Publications (2)

Publication Number Publication Date
CN110766044A CN110766044A (en) 2020-02-07
CN110766044B true CN110766044B (en) 2021-10-26



Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108806A (en) * 2017-12-14 2018-06-01 西北工业大学 Convolutional neural networks initial method based on the extraction of pre-training model filter
WO2018184222A1 (en) * 2017-04-07 2018-10-11 Intel Corporation Methods and systems using improved training and learning for deep neural networks
CN110020718A (en) * 2019-03-14 2019-07-16 上海交通大学 The layer-by-layer neural networks pruning method and system inferred based on variation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8666148B2 (en) * 2010-06-03 2014-03-04 Adobe Systems Incorporated Image adjustment



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant