CN110738242B - Bayes structure learning method and device of deep neural network - Google Patents

Bayes structure learning method and device of deep neural network

Info

Publication number
CN110738242B
CN110738242B (application CN201910912494.3A)
Authority
CN
China
Prior art keywords
network
deep neural
neural network
training
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910912494.3A
Other languages
Chinese (zh)
Other versions
CN110738242A (en)
Inventor
朱军
邓志杰
张钹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201910912494.3A priority Critical patent/CN110738242B/en
Publication of CN110738242A publication Critical patent/CN110738242A/en
Application granted granted Critical
Publication of CN110738242B publication Critical patent/CN110738242B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/29 - Graphical models, e.g. Bayesian networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Abstract

An embodiment of the invention provides a Bayesian structure learning method and device for a deep neural network. The method constructs a deep neural network comprising a plurality of learning units with the same internal structure, where each learning unit comprises a plurality of hidden layers, each pair of hidden layers is connected by a plurality of computing units, the network structure is defined as the relative weights of these computing units, and the network structure is modeled with a parameterized variational distribution. A training subset is extracted and the network structure is sampled by a re-parameterization process; the evidence lower bound is then calculated. If the change of the evidence lower bound exceeds a loss threshold, the network structure and network weights are optimized and a new training iteration begins. By constructing a deep neural network comprising a plurality of learning units with the same internal structure and training, on a training set, the relative weights of the computing units between the hidden layers of the learning unit, an optimized network structure is obtained, so that both the prediction performance and the prediction uncertainty of the deep neural network are comprehensively improved.

Description

Bayes structure learning method and device of deep neural network
Technical Field
The invention relates to the technical field of data processing, in particular to a Bayesian structure learning method and device for a deep neural network.
Background
Bayesian deep learning aims to provide accurate and reliable uncertainty estimates for flexible and efficient deep neural networks. Traditionally, Bayesian networks introduce uncertainty over the network weights, which helps keep the model from over-fitting and also provides the model with useful prediction uncertainty. However, introducing uncertainty over the network weights has its own problems. First, the manually specified prior distribution over the weights is often unreliable, which easily leads to problems such as over-pruning, so that the fitting capability of the model is greatly limited. Second, placing a flexible variational distribution over the weights easily makes inference difficult because of the complex dependencies inside that distribution. Recently, particle-based variational inference techniques have also been used to optimize Bayesian networks, but they suffer from particle collapse and degeneration.
Therefore, current Bayesian networks cannot provide accurate and reliable prediction performance in practical applications.
Disclosure of Invention
In view of the problems of the existing methods, embodiments of the invention provide a Bayesian structure learning method and device for a deep neural network.
In a first aspect, an embodiment of the present invention provides a bayesian structure learning method for a deep neural network, including:
constructing a deep neural network, wherein the deep neural network comprises at least one learning unit with the same internal structure, the learning unit comprises a preset number of hidden layers, a plurality of computing units are arranged between every two hidden layers, a network structure is defined as the relative weight of each computing unit, and the network structure is modeled by adopting parameterized variational distribution;
randomly extracting a training subset from a preset training set, and sampling a network structure of the learning unit by adopting a re-parameterization process;
according to the sampled network structure, calculating an evidence lower bound ELBO of the deep neural network;
if the change of the evidence lower bound exceeds a preset loss threshold value, optimizing the network structure and the network weight according to a preset optimization method, and randomly extracting a training subset from the training set again to continue training the network structure of the learning unit;
and if the change of the evidence lower bound does not exceed a preset loss threshold, judging that the training is finished.
Further, sampling the network structure of the learning unit by a re-parameterization process specifically comprises:
sampling the network structure of the learning unit by a re-parameterization process according to a preset adaptability coefficient.
Further, obtaining an evidence lower bound of the deep neural network according to the sampled network structure; the method specifically comprises the following steps:
according to the sampled network structure, calculating output results corresponding to the labeled samples in the training subset, and calculating the error of the deep neural network and the difference value of the logarithmic density in the network variation distribution and the preset prior distribution;
and carrying out weighted summation on the error of the deep neural network and the logarithmic density difference value to obtain the evidence lower bound of the deep neural network.
Further, constructing the deep neural network comprising at least one learning unit with the same internal structure specifically comprises:
constructing a deep neural network, wherein the deep neural network comprises at least one learning unit with the same internal structure, and a preset down-sampling layer and/or an up-sampling layer are/is inserted between predetermined learning units; wherein the downsampling layer comprises: the device comprises a batch regularization layer, a linear rectification layer, a convolution layer and a pooling layer, wherein the up-sampling layer is constructed by a deconvolution layer.
Further, an input layer of the deep neural network is a convolution layer for preprocessing, and the output layer is a linear fully-connected layer.
Further, the variational distribution is a Concrete distribution.
In a second aspect, an embodiment of the present invention provides a bayesian structure learning apparatus for a deep neural network, including:
the construction unit is used for constructing a deep neural network, wherein the deep neural network comprises at least one learning unit with the same internal structure, the learning unit comprises a preset number of hidden layers, a plurality of computing units are arranged between every two hidden layers, the network structure is defined as the relative weight of each computing unit, and a parameterized variational distribution is adopted to model the network structure;
the training unit is used for randomly extracting a training subset from a preset training set and sampling the network structure of the learning unit by adopting a re-parameterization process;
the error calculation unit is used for obtaining an evidence lower bound ELBO of the deep neural network according to the sampled network structure;
the judging unit is used for optimizing the network structure and the network weight according to a preset optimization method if the change of the evidence lower bound exceeds a preset loss threshold value, and randomly extracting a training subset from the training set again to continue training the network structure of the learning unit; and if the change of the evidence lower bound does not exceed a preset loss threshold, judging that the training is finished.
Further, the training unit is specifically configured to randomly extract a training subset from a preset training set, and sample the network structure of the learning unit by using a reparameterization process according to a preset adaptive coefficient.
In a third aspect, an embodiment of the present invention further provides an electronic device, including:
a processor, a memory, a communication interface, and a communication bus; wherein,
the processor, the memory and the communication interface complete mutual communication through the communication bus;
the communication interface is used for information transmission between communication devices of the electronic equipment;
the memory stores computer program instructions executable by the processor, the processor invoking the program instructions to perform a method comprising:
constructing a deep neural network, wherein the deep neural network comprises at least one learning unit with the same internal structure, the learning unit comprises a preset number of hidden layers, a plurality of computing units are arranged between every two hidden layers, a network structure is defined as the relative weight of each computing unit, and the network structure is modeled by adopting parameterized variational distribution;
randomly extracting a training subset from a preset training set, and sampling a network structure of the learning unit by adopting a re-parameterization process;
according to the sampled network structure, calculating an evidence lower bound ELBO of the deep neural network;
if the change of the evidence lower bound exceeds a preset loss threshold value, optimizing the distribution of the network structure and the network weights according to a preset optimization method, and randomly extracting a training subset from the training set again to continue training the network structure of the learning unit;
and if the change of the evidence lower bound does not exceed a preset loss threshold, judging that the training is finished.
In a fourth aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the following method:
constructing a deep neural network, wherein the deep neural network comprises at least one learning unit with the same internal structure, the learning unit comprises a preset number of hidden layers, a plurality of computing units are arranged between every two hidden layers, a network structure is defined as the relative weight of each computing unit, and the network structure is modeled by adopting parameterized variational distribution;
randomly extracting a training subset from a preset training set, and sampling a network structure of the learning unit by adopting a re-parameterization process;
according to the sampled network structure, calculating an evidence lower bound ELBO of the deep neural network;
if the change of the evidence lower bound exceeds a preset loss threshold value, optimizing the distribution of the network structure and the network weights according to a preset optimization method, and randomly extracting a training subset from the training set again to continue training the network structure of the learning unit;
and if the change of the evidence lower bound does not exceed a preset loss threshold, judging that the training is finished.
According to the Bayes structure learning method and device for the deep neural network, provided by the embodiment of the invention, the optimized network structure of the learning unit is obtained by constructing the deep neural network comprising a plurality of learning units with the same internal structure and training the relative weight of each computing unit among all hidden layers in the learning unit through a training set, so that the prediction performance and the prediction uncertainty of the deep neural network are comprehensively improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flowchart of a Bayesian structure learning method for a deep neural network according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a Bayesian structure learning device of a deep neural network according to an embodiment of the present invention;
fig. 3 illustrates a physical structure diagram of an electronic device.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a bayesian structure learning method of a deep neural network according to an embodiment of the present invention, as shown in fig. 1, the method includes:
and step S01, constructing a deep neural network, wherein the deep neural network comprises at least one learning unit with the same internal structure, the learning unit comprises a preset number of hidden layers, a plurality of calculation units are arranged between every two hidden layers, the network structure is defined as the relative weight of each calculation unit, and the parameterized variation distribution is adopted to model the network structure.
A deep neural network is constructed according to actual needs and comprises: an input layer, an output layer, and at least one repeatedly stacked learning unit located between the input layer and the output layer, the learning units being connected in series in sequence and having the same network structure.
The learning unit comprises a preset number K of hidden layers, and between any two hidden layers there are a plurality of computing units, such as fully-connected units, convolution units, pooling units, and the like. The network structure of the learning unit is α = {α^(i,j) | 1 ≤ i < j ≤ K}, where α^(i,j) contains the relative weights, one per computing unit, between the i-th and the j-th hidden layer. The contribution of the feature representation of the i-th hidden layer to the feature representation of the j-th hidden layer is computed as the weighted sum, according to α^(i,j), of the outputs of the computing units between them.
The relative weights corresponding to the computing units follow a categorical distribution; for convenience of optimization-based training, they can be modeled with a learnable, continuous, parameterized variational distribution, thereby constructing the network structure.
Further, the variational distribution is a Concrete distribution.
The variational distribution can be chosen according to actual needs; the embodiment of the present invention gives only one example, the Concrete distribution, which is used in the following examples for simplicity.
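As an illustrative sketch only (not part of the claimed method), the following PyTorch-style code shows how the relative weights of the computing units on each hidden-layer pair could be held as learnable Concrete (Gumbel-softmax) parameters; the class name, tensor shapes and the choice of library are assumptions for illustration.

```python
import torch
import torch.nn.functional as F


class EdgeStructure(torch.nn.Module):
    """Variational parameters theta of the relative weights alpha(i,j) of the
    candidate computing units on one edge (i, j) of the learning unit."""

    def __init__(self, num_units: int, tau: float = 1.0):
        super().__init__()
        # theta parameterizes a Concrete (Gumbel-softmax) distribution over
        # the num_units candidate computing units of this edge.
        self.theta = torch.nn.Parameter(torch.zeros(num_units))
        self.tau = tau

    def sample(self) -> torch.Tensor:
        # Reparameterized draw: alpha = softmax((theta + eps) / tau),
        # with eps ~ Gumbel(0, 1) independently per dimension.
        eps = torch.distributions.Gumbel(0.0, 1.0).sample(self.theta.shape)
        return F.softmax((self.theta + eps) / self.tau, dim=-1)


# One structure variable per pair of hidden layers (i, j), 1 <= i < j <= K,
# i.e. K(K-1)/2 variables in total for a learning unit with K hidden layers.
K, num_units = 4, 3
edges = torch.nn.ModuleDict({
    f"{i}-{j}": EdgeStructure(num_units)
    for i in range(1, K) for j in range(i + 1, K + 1)
})
alpha = {name: edge.sample() for name, edge in edges.items()}
```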
Step S02: randomly extract a training subset from a preset training set, and sample the network structure of the learning unit by a re-parameterization process.
Step S03: obtain the evidence lower bound ELBO of the deep neural network according to the sampled network structure.
Step S04: if the change of the evidence lower bound exceeds a preset loss threshold, optimize the distribution of the network structure and the network weights according to a preset optimization method, and randomly extract a training subset from the training set again to continue training the network structure of the learning unit.
Step S05: if the change of the evidence lower bound does not exceed the preset loss threshold, judge that the training is finished.
A training set containing N samples is preset, where each sample (x_n, y_n) comprises input data x_n and a corresponding pre-labeled output result y_n.
The deep neural network is trained on this training set; when training is finished, an optimized network structure of the learning unit is obtained, which contains the optimized relative weights corresponding to each computing unit between every two hidden layers. The specific training process is as follows:
1. Randomly select a subset B = {(x_n, y_n)} of size k from the training set. Randomly sample the network structure α of the learning unit from the variational distribution using the re-parameterization technique. Denoting the network weights and bias parameters of the deep neural network by w, the deep neural network yields the prediction probability
p(y_n | x_n, w, α) = f(x_n; w, α),
where f is the function computed by the deep neural network that takes x_n as input, has w as its parameters, and uses α as the network structure of the learning unit.
2. Obtain the current evidence lower bound (ELBO) of the deep neural network according to the sampled network structure.
3. Compare the current evidence lower bound with the evidence lower bound obtained after the previous iteration to obtain the change of the evidence lower bound. If the change of the evidence lower bound obtained after this iteration exceeds the preset loss threshold, it is judged that training is not finished, and the current network structure and network weights are further optimized according to a preset optimization method. A subset is then re-selected from the training set to continue the training process of steps 1-3. Training is judged finished once the change of the evidence lower bound is less than or equal to the loss threshold.
As can be seen from the training process, the training process in the embodiment of the present invention only trains the connection relationship between hidden layers inside the learning unit, so that for a hidden layer structure of K layers, the network structure α of the learning unit has K (K-1)/2 variables in total, and each variable represents the relative weight of the computing unit between a pair of hidden layers.
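A minimal sketch of this training loop is given below, assuming a PyTorch-style setup; `model`, `structure` and `compute_elbo` are illustrative placeholders for the deep neural network, the mapping of edge names to structure variables, and the ELBO computation described in the following paragraphs, and the choice of Adam as the preset optimization method is an assumption rather than something prescribed by the embodiment.

```python
import torch


def train_structure(model, structure, compute_elbo, train_set,
                    k: int, loss_threshold: float, max_iters: int = 10000):
    """Outer loop of steps 1-3: sample a subset and a structure, evaluate the
    ELBO, and stop once its change falls below the loss threshold."""
    optimizer = torch.optim.Adam(
        list(model.parameters()) + list(structure.parameters()))
    prev_elbo = None
    for _ in range(max_iters):
        # 1. randomly select a subset of size k from the training set
        idx = torch.randperm(len(train_set))[:k]
        batch = [train_set[int(i)] for i in idx]
        # 1. (cont.) sample the network structure alpha by reparameterization
        alpha = {name: edge.sample() for name, edge in structure.items()}
        # 2. current evidence lower bound under the sampled structure
        elbo = compute_elbo(model, alpha, batch, n_total=len(train_set))
        # 3. compare with the previous ELBO; stop if the change is small enough,
        #    otherwise optimize the structure parameters and network weights
        if prev_elbo is not None and abs(elbo.item() - prev_elbo) <= loss_threshold:
            break
        prev_elbo = elbo.item()
        optimizer.zero_grad()
        (-elbo).backward()          # gradient ascent on the ELBO
        optimizer.step()
```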
According to the embodiment of the invention, the deep neural network comprising a plurality of learning units with the same internal structure is constructed, and the relative weights of the computing units among all hidden layers in the learning units are trained through the training set to obtain the optimized network structure of the learning units, so that the prediction performance and the prediction uncertainty of the deep neural network are comprehensively improved.
Based on the above embodiment, further, in step S02, a re-parameterization process is used to sample the network structure of the learning unit; the method specifically comprises the following steps:
and sampling the network structure of the learning unit by adopting a re-parameterization process according to a preset adaptability coefficient.
In order to prevent the training from failing to converge, a preset adaptability coefficient β = {β^(i,j)} is added to the re-parameterization process when the learning unit is constructed, so as to adjust the variance of the samples. The resulting re-parameterization process is
α = g(θ, β, ε) = softmax((θ + β · ε) / τ),
where ε = {ε^(i,j)} is a set of dimension-wise independent Gumbel variables and τ is a positive real number representing the temperature.
The specific process is as follows:
a. Randomly sample a group of mutually independent variables ε from the Gumbel distribution;
b. multiply the variables obtained in (a) by the adaptability coefficient β to obtain scaled variables;
c. add the variables obtained in (b) to the parameter θ of the Concrete distribution, and then divide by the temperature coefficient τ;
d. feed the result obtained in (c) into a softmax transformation to obtain the sampled network structure α = g(θ, β, ε), as illustrated in the sketch below.
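The steps a-d can be written as follows; this is an illustrative sketch assuming PyTorch, and the function name and the example values of β and τ are not prescribed by the method.

```python
import torch
import torch.nn.functional as F


def sample_structure(theta: torch.Tensor, beta: torch.Tensor, tau: float) -> torch.Tensor:
    """alpha = g(theta, beta, eps) = softmax((theta + beta * eps) / tau)."""
    # a. randomly sample dimension-wise independent Gumbel(0, 1) variables
    eps = torch.distributions.Gumbel(0.0, 1.0).sample(theta.shape)
    # b. multiply the variables by the adaptability coefficient beta
    scaled = beta * eps
    # c. add the Concrete parameter theta, then divide by the temperature tau
    logits = (theta + scaled) / tau
    # d. softmax transformation yields the sampled relative weights alpha
    return F.softmax(logits, dim=-1)


# Example for one edge with three candidate computing units; a beta below 1
# shrinks the sampling variance and helps the training converge.
alpha = sample_structure(theta=torch.zeros(3), beta=torch.tensor(0.5), tau=1.0)
```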
In the embodiment of the invention, an adaptability coefficient is introduced into the re-parameterization process, which prevents the training from failing to converge and comprehensively improves the prediction performance and the prediction uncertainty of the deep neural network.
Based on the foregoing embodiment, further, the step S03 specifically includes:
step S031, according to the sampled network structure, calculating an output result corresponding to each labeled sample in the training subset, and calculating an error of the deep neural network and a log density difference between the network variation distribution and a preset prior distribution.
And S032, performing weighted summation on the error of the deep neural network and the logarithmic density difference value to obtain an evidence lower bound of the deep neural network.
After the sampled network structure of the learning unit is obtained from the extracted training subset by the re-parameterization process, the prediction results of the samples in the training subset under the current deep neural network are computed according to the sampled network structure, and the error of these predictions is calculated. Many error measures can be used; cross entropy is taken here as an example:
CE = − Σ_{(x_n, y_n) ∈ B} log p(y_n | x_n, w, α),
where the sum runs over the k samples of the subset B.
Meanwhile, based on the currently sampled network structure α, the difference between the log density of α under the variational distribution and under the preset prior distribution is calculated; both distributions are adaptive Concrete distributions, so their log densities are available in closed form. The resulting difference is denoted
KL = log q(α; θ) − log p(α).
The cross entropy and the log-density difference are combined by a weighted sum to obtain the overall loss of the deep neural network,
L = CE + λ · KL,
where the weight λ = k / N distributes the KL divergence corresponding to the whole training set onto the subset. This loss is the variational-inference objective (the negative of the minibatch ELBO estimate); a preset optimization method, such as gradient descent, is used to solve the optimization problem min_{θ,w} L, i.e. to maximize the ELBO, in order to further optimize the network structure.
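A minimal sketch of this weighted combination is shown below, under the assumptions that the log densities log q(α) and log p(α) of the adaptive Concrete distributions are supplied as inputs and that the cross entropy is summed over the subset; the function and argument names are illustrative, not the patent's API.

```python
import torch.nn.functional as F


def minibatch_loss(logits, targets, log_q_alpha, log_p_alpha, k: int, n_total: int):
    """Weighted sum of the cross entropy on the subset and the log-density
    difference KL, with the full-training-set KL spread onto the subset."""
    ce = F.cross_entropy(logits, targets, reduction="sum")  # error over the k samples
    kl = log_q_alpha - log_p_alpha                          # log q(alpha) - log p(alpha)
    lam = k / n_total                                       # weight corresponding to the subset
    loss = ce + lam * kl
    return loss                     # minimizing this maximizes the ELBO estimate
```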
In each training iteration, the cross entropy and the log-density difference are calculated and combined by weighting to obtain the evidence lower bound of the deep neural network, and the network structure is optimized according to the change of the evidence lower bound, so that the prediction performance and the prediction uncertainty of the deep neural network are comprehensively improved.
Based on the foregoing embodiment, further, the step S01 specifically includes:
constructing a deep neural network, wherein the deep neural network comprises at least one learning unit with the same internal structure, and a preset down-sampling layer and/or an up-sampling layer are/is inserted between predetermined learning units; wherein the downsampling layer comprises: the device comprises a batch regularization layer, a linear rectification layer, a convolution layer and a pooling layer, wherein the up-sampling layer is constructed by a deconvolution layer.
The deep neural network constructed by the embodiment of the invention comprises a plurality of learning units connected in series, and a preset down-sampling layer and/or up-sampling layer is inserted between certain predetermined learning units according to the needs of the practical application. For picture classification, only down-sampling layers need to be inserted; for semantic segmentation, up-sampling layers also need to be inserted. The down-sampling layer is generally composed of batch regularization - linear rectification - convolution - pooling, and the up-sampling layer is generally constructed from a deconvolution.
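For illustration only, a down-sampling layer of this form and an up-sampling layer built from a deconvolution might be written as follows in PyTorch; the kernel sizes and the use of average pooling are assumptions, not values specified by the embodiment.

```python
import torch.nn as nn


def downsample_layer(in_ch: int, out_ch: int) -> nn.Sequential:
    # batch regularization - linear rectification - convolution - pooling
    return nn.Sequential(
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
        nn.AvgPool2d(kernel_size=2, stride=2),
    )


def upsample_layer(in_ch: int, out_ch: int) -> nn.ConvTranspose2d:
    # constructed from a deconvolution (transposed convolution)
    return nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
```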
Further, an input layer of the deep neural network is a convolution layer for preprocessing, and the output layer is a linear fully-connected layer.
The convolution computing unit adopts the operation sequence of batch regularization-linear rectification-convolution-batch regularization;
the method integrates the simultaneous, mutually independent and same-class calculation units into one group operation, improves the calculation efficiency, and mainly arranges a plurality of convolutions into one group convolution to improve the calculation efficiency.
According to the embodiment of the invention, a down-sampling layer and/or an up-sampling layer is inserted between the predetermined learning units, a preprocessing convolution layer is used as the input layer, and a linear fully-connected layer is placed at the end of the network, so that the performance of the deep neural network is improved.
The deep neural network constructed by the above embodiment can be applied to various scenes, such as:
Picture classification:
1) In model training, the cross entropy between the network prediction and the label is used as the training error;
2) at test time, for a group of test samples, 100 network structures are randomly sampled from the learned network-structure distribution, and based on these structures the model gives 100 groups of prediction probabilities;
3) the prediction probabilities are averaged to obtain the final prediction probability;
4) the category with the highest prediction probability in 3) is taken as the classification of the picture (see the sketch after this list).
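A sketch of this test-time procedure is shown below; `model` and `structure` are illustrative placeholders, and the interface `model(x, alpha)`, taking the input batch and a sampled structure, is an assumption rather than the patent's API.

```python
import torch


@torch.no_grad()
def classify(model, structure, x, num_samples: int = 100) -> torch.Tensor:
    """Steps 1)-4) at test time: average the prediction probabilities over
    num_samples structures drawn from the learned structure distribution and
    return the class with the highest averaged probability."""
    probs = torch.stack([
        torch.softmax(model(x, {n: e.sample() for n, e in structure.items()}), dim=-1)
        for _ in range(num_samples)
    ]).mean(dim=0)                  # averaged prediction probability
    return probs.argmax(dim=-1)     # category with the highest probability
```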
Semantic segmentation:
1) In model training, the sum over all pixels of the cross entropy between the predictions and the labels is used as the training error;
2) at test time, for a group of test samples, 100 network structures are randomly sampled from the learned network-structure distribution, and based on these structures the model gives 100 groups of pixel-level prediction probabilities;
3) the prediction probabilities are averaged to obtain pixel-level prediction probabilities;
4) the category with the highest probability in 3) on each pixel is taken as the segmentation result of that pixel.
Adversarial-sample detection:
1) For a group of adversarial samples, 30 network structures are randomly sampled from the learned network-structure distribution;
2) based on these structures, the model gives 30 groups of prediction probabilities;
3) the prediction probabilities are averaged to obtain the final prediction probability of the model;
4) the entropy of the prediction probability obtained in 3) is calculated as the detection index;
5) if the entropy obtained in 4) is significantly larger than the entropy of predictions on normal samples, an adversarial sample is detected (see the sketch after this list).
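A sketch of the entropy-based detection index used in this and the following procedure is given below, with the same placeholder interface assumptions as the previous sketch.

```python
import torch


@torch.no_grad()
def predictive_entropy(model, structure, x, num_samples: int = 30) -> torch.Tensor:
    """Detection index of steps 1)-4): entropy of the structure-averaged
    prediction probability; markedly higher entropy than on normal samples
    flags an adversarial input (or, in the next procedure, domain shift)."""
    probs = torch.stack([
        torch.softmax(model(x, {n: e.sample() for n, e in structure.items()}), dim=-1)
        for _ in range(num_samples)
    ]).mean(dim=0)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
```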
Domain-shift detection:
1) For a group of samples drawn from a domain different from that of the training data, 100 network structures are randomly sampled from the learned structure distribution;
2) based on these structures, the model gives 100 groups of prediction probabilities;
3) the prediction probabilities are averaged to obtain the final prediction probability of the model;
4) the entropy of the prediction probability obtained in 3) is calculated as the detection index;
5) if the entropy obtained in 4) is significantly larger than the entropy of predictions on normal samples, domain shift is detected.
The deep Bayesian structure network was tested on the natural image classification datasets CIFAR-10 and CIFAR-100; the trained models achieve classification error rates of 4.98% and 22.50% respectively, substantially better than the state-of-the-art deep neural networks ResNet and DenseNet. Applied to the CamVid semantic segmentation task under the same training conditions as the strong FC-DenseNet method, the invention obtains a mean IoU 2.3 points higher than FC-DenseNet, reaching a level comparable to world-leading methods. In addition, the trained model can detect adversarial samples and domain shift through its prediction uncertainty, and in tests it exhibits prediction uncertainty clearly superior to that of traditional Bayesian neural networks. In summary, the invention introduces uncertainty into the structure of a deep network using the structure learning space provided by neural architecture search, performs Bayesian modeling, and learns with stochastic variational inference, thereby alleviating the difficulties of designing priors and inferring posteriors that arise in weight-uncertain Bayesian deep networks, comprehensively improving the prediction performance and prediction uncertainty of the network model, and significantly improving the performance of Bayesian neural networks on tasks such as image classification, semantic segmentation, adversarial-sample detection, and domain-shift detection.
Fig. 2 is a schematic structural diagram of a bayesian structure learning apparatus of a deep neural network according to an embodiment of the present invention, as shown in fig. 2, the apparatus includes: a construction unit 10, a training unit 11, an error calculation unit 12 and a judgment unit 13, wherein,
the construction unit 10 is configured to construct a deep neural network, where the deep neural network includes at least one learning unit with the same internal structure, the learning unit includes a preset number of hidden layers, each two hidden layers includes a plurality of calculation units, a network structure is defined as a relative weight of each calculation unit, and a parameterized variational distribution is used to model the network structure; the training unit 11 is configured to randomly extract a training subset from a preset training set, and sample a network structure of the learning unit by using a parameterization process; the error calculation unit 12 is configured to obtain an evidence lower bound ELBO of the deep neural network according to the sampled network structure; the judging unit 13 is configured to optimize the network structure and the network weight according to a preset optimization method if the change of the lower evidence bound exceeds a preset loss threshold, and randomly extract a training subset from the training set again to continue training the network structure of the learning unit; and if the change of the evidence lower bound does not exceed a preset loss threshold, judging that the training is finished.
The building unit 10 builds a deep neural network according to actual needs, and the deep neural network comprises: the device comprises an input layer, an output layer and at least one repeatedly piled learning unit positioned between the input layer and the output layer, wherein the learning units are sequentially connected in series and have the same network structure.
The learning unit comprises a preset number K of hidden layers, and between any two hidden layers there are a plurality of computing units, such as fully-connected units, convolution units, pooling units, and the like. The network structure of the learning unit is α = {α^(i,j) | 1 ≤ i < j ≤ K}, where α^(i,j) contains the relative weights, one per computing unit, between the i-th and the j-th hidden layer. The contribution of the feature representation of the i-th hidden layer to the feature representation of the j-th hidden layer is computed as the weighted sum, according to α^(i,j), of the outputs of the computing units between them.
The relative weights corresponding to the computing units follow a categorical distribution and can be modeled with a learnable, continuous, parameterized variational distribution for optimization-based training.
Further, the variational distribution is a Concrete distribution.
The variational distribution can be chosen according to actual needs; the embodiment of the present invention gives only one example, the Concrete distribution, which is used in the following examples for simplicity.
The training unit 11 presets a training set containing N samples, where each sample (x_n, y_n) comprises input data x_n and a corresponding pre-labeled output result y_n.
Training the deep neural network constructed by the construction unit 10 according to the training set, and finally obtaining an optimized network structure of the learning unit after finishing the training, wherein the optimized network structure comprises optimized relative weights corresponding to each calculation unit between every two hidden layers. The specific training process is as follows:
1. The training unit 11 randomly selects a subset B = {(x_n, y_n)} of size k from the training set and randomly samples the network structure α of the learning unit from the variational distribution using the re-parameterization technique. Denoting the network weights and bias parameters of the deep neural network by w, the deep neural network yields the prediction probability
p(y_n | x_n, w, α) = f(x_n; w, α),
where f is the function computed by the deep neural network that takes x_n as input, has w as its parameters, and uses α as the network structure of the learning unit.
2. The error calculation unit 12 obtains the current Lower Evidence Bound (ELBO) of the deep neural network according to the sampled network structure.
3. The judging unit 13 compares the current lower evidence bound with the lower evidence bound obtained after the last training is completed, so as to obtain the change of the lower evidence bound. And if the change of the evidence lower bound obtained after the training exceeds a preset loss threshold value, judging that the training is not finished, and further optimizing the sampled network structure and the sampled network parameters according to a preset optimization method. The subset is then re-selected from the training set to continue the training process of 1-3. And judging that the training is finished until the change of the lower evidence bound is less than or equal to the loss threshold.
As can be seen from the training process, the training process in the embodiment of the present invention only trains the connection relationship between hidden layers inside the learning unit, so that for a hidden layer structure of K layers, the network structure α of the learning unit has K (K-1)/2 variables in total, and each variable represents the relative weight of the computing unit between a pair of hidden layers.
The apparatus provided in the embodiment of the present invention is configured to execute the method, and the functions of the apparatus refer to the method embodiment specifically, and detailed method flows thereof are not described herein again.
According to the embodiment of the invention, the deep neural network comprising a plurality of learning units with the same internal structure is constructed, and the relative weights of the computing units among all hidden layers in the learning units are trained through the training set to obtain the optimized network structure of the learning units, so that the prediction performance and the prediction uncertainty of the deep neural network are comprehensively improved.
Based on the foregoing embodiment, further, the training unit is specifically configured to randomly extract a training subset from a preset training set, and sample the network structure of the learning unit by using a reparameterization process according to a preset adaptive coefficient.
In order to prevent the training from failing to converge, the construction unit adds a preset adaptability coefficient β = {β^(i,j)} to the re-parameterization process when the learning unit is constructed, so as to adjust the variance of the samples. The re-parameterization process used by the training unit is therefore
α = g(θ, β, ε) = softmax((θ + β · ε) / τ),
where ε = {ε^(i,j)} is a set of dimension-wise independent Gumbel variables and τ is a positive real number representing the temperature.
The specific process is as follows:
a. Randomly sample a group of mutually independent variables ε from the Gumbel distribution;
b. multiply the variables obtained in (a) by the adaptability coefficient β to obtain scaled variables;
c. add the variables obtained in (b) to the parameter θ of the Concrete distribution, and then divide by the temperature coefficient τ;
d. feed the result obtained in (c) into a softmax transformation to obtain the sampled network structure α = g(θ, β, ε).
The apparatus provided in the embodiment of the present invention is configured to execute the method, and the functions of the apparatus refer to the method embodiment specifically, and detailed method flows thereof are not described herein again.
In the embodiment of the invention, an adaptability coefficient is introduced into the re-parameterization process, which prevents the training from failing to converge and comprehensively improves the prediction performance and the prediction uncertainty of the deep neural network.
Fig. 3 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 3: a processor (processor)301, a communication Interface (Communications Interface)303, a memory (memory)302 and a communication bus 304, wherein the processor 301, the communication Interface 303 and the memory 302 complete communication with each other through the communication bus 304. The processor 301 may call logic instructions in the memory 302 to perform the above-described method.
Further, embodiments of the present invention disclose a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions, which, when executed by a computer, enable the computer to perform the methods provided by the above-mentioned method embodiments.
Further, the present invention provides a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the methods provided by the above method embodiments.
Those of ordinary skill in the art will understand that: furthermore, the logic instructions in the memory 302 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (7)

1. A Bayesian structure learning method for a deep neural network is characterized by comprising the following steps:
constructing a deep neural network, wherein the deep neural network comprises at least one learning unit with the same internal structure, the learning unit comprises a preset number of hidden layers, a plurality of computing units are arranged between every two hidden layers, a network structure is defined as the relative weight of each computing unit, and the network structure is modeled by adopting parameterized variational distribution;
randomly extracting a training subset from a preset training set, and sampling a network structure of the learning unit by adopting a re-parameterization process;
according to the sampled network structure, calculating an evidence lower bound ELBO of the deep neural network;
if the change of the evidence lower bound exceeds a preset loss threshold value, optimizing the network structure and the network weight according to a preset optimization method, and randomly extracting a training subset from the training set again to continue training the network structure of the learning unit;
if the change of the evidence lower bound does not exceed a preset loss threshold, judging that the training is finished;
when the deep neural network is used for picture classification, then:
training by using the cross entropy of the network prediction result and the label as an error in model training;
during testing, for a group of test samples, randomly sampling 100 network structures from the network structure distribution obtained by learning, and based on the structures, giving 100 groups of prediction probabilities by a model;
averaging the prediction probabilities to obtain the final prediction probability;
taking the category with the maximum prediction probability as the classification of the pictures;
when the deep neural network is used for semantic segmentation, then:
training by using the sum of the prediction results of all pixels and the cross entropy of the labels as an error in model training;
during testing, for a group of test samples, randomly sampling 100 network structures from the network structure distribution obtained by learning, and based on the structures, giving 100 groups of pixel-level prediction probabilities by a model;
averaging the prediction probabilities to obtain pixel-level prediction probabilities;
taking the class with the maximum pixel-level prediction probability on each pixel as the segmentation result of the pixel;
when the deep neural network is used to detect adversarial samples, then:
for a group of adversarial samples, randomly sampling 30 network structures from the network structure distribution obtained by learning;
based on these structures, the model gives 30 sets of prediction probabilities;
averaging the prediction probabilities to obtain the final prediction probability of the model;
calculating the entropy of the final prediction probability of the model as a detection index;
if the obtained entropy is significantly larger than the entropy corresponding to predictions on normal samples, an adversarial sample is detected;
when the deep neural network is used to detect domain shift, then:
for a group of samples drawn from a domain different from that of the training data, randomly sampling 100 network structures from the structure distribution obtained by learning;
based on these structures, the model gives 100 sets of prediction probabilities;
averaging the prediction probabilities to obtain the final prediction probability of the model;
calculating the entropy of the obtained prediction probability as a detection index;
if the obtained entropy is significantly larger than the entropy corresponding to predictions on normal samples, domain shift is detected;
sampling the network structure of the learning unit by the re-parameterization process specifically comprises:
sampling the network structure of the learning unit by the re-parameterization process according to a preset adaptability coefficient;
when the learning unit is constructed, a preset adaptability coefficient β = {β^(i,j)} is added to the re-parameterization process to adjust the variance of the samples; the resulting re-parameterization process is
α = g(θ, β, ε) = softmax((θ + β · ε) / τ),
where ε = {ε^(i,j)} is a set of dimension-wise independent Gumbel variables and τ is a positive real number representing the temperature;
the specific process is as follows:
a. randomly sampling a group of mutually independent variables ε from the Gumbel distribution;
b. multiplying the variables obtained in (a) by the adaptability coefficient β to obtain scaled variables;
c. adding the variables obtained in (b) to the parameter θ of the Concrete distribution, and then dividing by the temperature coefficient τ;
d. feeding the result obtained in (c) into a softmax transformation to obtain the sampled network structure α = g(θ, β, ε);
the calculating of the evidence lower bound ELBO of the deep neural network according to the sampled network structure specifically includes:
according to the sampled network structure, calculating output results corresponding to the labeled samples in the training subset, and calculating the error of the deep neural network and the difference value of the logarithmic density in the network variation distribution and the preset prior distribution;
and carrying out weighted summation on the error of the deep neural network and the logarithmic density difference value to obtain the evidence lower bound of the deep neural network.
2. The Bayesian structure learning method for the deep neural network as recited in claim 1, wherein the deep neural network is constructed and comprises at least one learning unit with the same internal structure; the method specifically comprises the following steps:
constructing a deep neural network, wherein the deep neural network comprises at least one learning unit with the same internal structure, and a preset down-sampling layer and/or an up-sampling layer are/is inserted between predetermined learning units; wherein the downsampling layer comprises: the device comprises a batch regularization layer, a linear rectification layer, a convolution layer and a pooling layer, wherein the up-sampling layer is constructed by a deconvolution layer.
3. The Bayesian structure learning method for the deep neural network as recited in claim 2, wherein an input layer of the deep neural network is a convolutional layer for preprocessing, and an output layer is a linear fully-connected layer.
4. The Bayesian structure learning method for a deep neural network as recited in claim 3, wherein the variational distribution is a Concrete distribution.
5. A Bayesian structure learning device for a deep neural network, comprising:
a construction unit, used for constructing a deep neural network, wherein the deep neural network comprises at least one learning unit with the same internal structure, the learning unit comprises a preset number of hidden layers, a plurality of computing units are arranged between every two hidden layers, the network structure is defined as the relative weight of each computing unit, and a parameterized variational distribution is adopted to model the network structure;
the training unit is used for randomly extracting a training subset from a preset training set and sampling the network structure of the learning unit by adopting a re-parameterization process;
the error calculation unit is used for obtaining an evidence lower bound ELBO of the deep neural network according to the sampled network structure;
the judging unit is used for optimizing the distribution of the network structure and the network weights according to a preset optimization method if the change of the evidence lower bound exceeds a preset loss threshold value, and randomly extracting a training subset from the training set again to continue training the network structure of the learning unit; if the change of the evidence lower bound does not exceed a preset loss threshold, judging that the training is finished;
when the deep neural network is used for picture classification, then:
training by using the cross entropy of the network prediction result and the label as an error in model training;
during testing, for a group of test samples, randomly sampling 100 network structures from the network structure distribution obtained by learning, and based on the structures, giving 100 groups of prediction probabilities by a model;
averaging the prediction probabilities to obtain the final prediction probability;
taking the category with the maximum prediction probability as the classification of the pictures;
when the deep neural network is used for semantic segmentation, then:
training by using the sum of the prediction results of all pixels and the cross entropy of the labels as an error in model training;
during testing, for a group of test samples, randomly sampling 100 network structures from the network structure distribution obtained by learning, and based on the structures, giving 100 groups of pixel-level prediction probabilities by a model;
averaging the prediction probabilities to obtain pixel-level prediction probabilities;
taking the class with the maximum pixel-level prediction probability on each pixel as the segmentation result of the pixel;
when the deep neural network is used to detect adversarial samples, then:
for a group of adversarial samples, randomly sampling 30 network structures from the network structure distribution obtained by learning;
based on these structures, the model gives 30 sets of prediction probabilities;
averaging the prediction probabilities to obtain the final prediction probability of the model;
calculating the entropy of the final prediction probability of the model as a detection index;
if the obtained entropy is significantly larger than the entropy corresponding to predictions on normal samples, an adversarial sample is detected;
when the deep neural network is used to detect domain shift, then:
for a group of samples drawn from a domain different from that of the training data, randomly sampling 100 network structures from the structure distribution obtained by learning;
based on these structures, the model gives 100 sets of prediction probabilities;
averaging the prediction probabilities to obtain the final prediction probability of the model;
calculating the entropy of the obtained prediction probability as a detection index;
if the obtained entropy is significantly larger than the entropy corresponding to predictions on normal samples, domain shift is detected;
sampling the network structure of the learning unit by the re-parameterization process specifically comprises:
sampling the network structure of the learning unit by the re-parameterization process according to a preset adaptability coefficient;
when the learning unit is constructed, a preset adaptability coefficient β = {β^(i,j)} is added to the re-parameterization process to adjust the variance of the samples; the resulting re-parameterization process is
α = g(θ, β, ε) = softmax((θ + β · ε) / τ),
where ε = {ε^(i,j)} is a set of dimension-wise independent Gumbel variables and τ is a positive real number representing the temperature;
the specific process is as follows:
a. randomly sampling a group of mutually independent variables ε from the Gumbel distribution;
b. multiplying the variables obtained in (a) by the adaptability coefficient β to obtain scaled variables;
c. adding the variables obtained in (b) to the parameter θ of the Concrete distribution, and then dividing by the temperature coefficient τ;
d. feeding the result obtained in (c) into a softmax transformation to obtain the sampled network structure α = g(θ, β, ε);
the calculating of the evidence lower bound ELBO of the deep neural network according to the sampled network structure specifically includes:
according to the sampled network structure, calculating output results corresponding to the labeled samples in the training subset, and calculating the error of the deep neural network and the difference value of the logarithmic density in the network variation distribution and the preset prior distribution;
and carrying out weighted summation on the error of the deep neural network and the logarithmic density difference value to obtain the evidence lower bound of the deep neural network.
6. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the Bayesian structure learning method of the deep neural network of any one of claims 1 to 4.
7. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, carries out the steps of the Bayesian structure learning method of a deep neural network according to any one of claims 1 to 4.
CN201910912494.3A 2019-09-25 2019-09-25 Bayes structure learning method and device of deep neural network Active CN110738242B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910912494.3A CN110738242B (en) 2019-09-25 2019-09-25 Bayes structure learning method and device of deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910912494.3A CN110738242B (en) 2019-09-25 2019-09-25 Bayes structure learning method and device of deep neural network

Publications (2)

Publication Number Publication Date
CN110738242A CN110738242A (en) 2020-01-31
CN110738242B true CN110738242B (en) 2021-08-10

Family

ID=69269608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910912494.3A Active CN110738242B (en) 2019-09-25 2019-09-25 Bayes structure learning method and device of deep neural network

Country Status (1)

Country Link
CN (1) CN110738242B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325757B (en) * 2020-02-18 2022-12-23 西北工业大学 Point cloud identification and segmentation method based on Bayesian neural network
CN111860495B (en) * 2020-06-19 2022-05-17 上海交通大学 Hierarchical network structure searching method and device and readable storage medium
CN111814966A (en) * 2020-08-24 2020-10-23 国网浙江省电力有限公司 Neural network architecture searching method, neural network application method, device and storage medium
CN112637879A (en) * 2020-12-18 2021-04-09 中国科学院深圳先进技术研究院 Method for deciding fault intervention time of telecommunication core network
CN114445692B (en) * 2021-12-31 2022-11-15 北京瑞莱智慧科技有限公司 Image recognition model construction method and device, computer equipment and storage medium
CN114961985A (en) * 2022-05-11 2022-08-30 西安交通大学 Intelligent prediction method and system for performance of hydrogen fuel aviation rotor engine
CN116030063B (en) * 2023-03-30 2023-07-04 同心智医科技(北京)有限公司 Classification diagnosis system, method, electronic device and medium for MRI image

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030165262A1 (en) * 2002-02-21 2003-09-04 The University Of Chicago Detection of calcifications within a medical image
US10014076B1 (en) * 2015-02-06 2018-07-03 Brain Trust Innovations I, Llc Baggage system, RFID chip, server and method for capturing baggage data
CN107292324A (en) * 2016-03-31 2017-10-24 日本电气株式会社 Method and apparatus for training mixed model
US11042811B2 (en) * 2016-10-05 2021-06-22 D-Wave Systems Inc. Discrete variational auto-encoder systems and methods for machine learning using adiabatic quantum computers
US11531852B2 (en) * 2016-11-28 2022-12-20 D-Wave Systems Inc. Machine learning systems and methods for training with noisy labels
WO2018200054A1 (en) * 2017-04-28 2018-11-01 Pearson Education, Inc. Method and system for bayesian network-based standard or skill mastery determination
US20190019097A1 (en) * 2017-04-28 2019-01-17 Pearson Education, Inc. Method and system for bayesian network-based standard or skill mastery determination using a collection of interim assessments
CN107491417B (en) * 2017-07-06 2021-06-22 复旦大学 Document generation method based on specific division under topic model
US11797838B2 (en) * 2018-03-13 2023-10-24 Pinterest, Inc. Efficient convolutional network for recommender systems
CN108763167A (en) * 2018-05-07 2018-11-06 西北工业大学 A kind of adaptive filter method of variation Bayes
CN109299464B (en) * 2018-10-12 2023-07-28 天津大学 Topic embedding and document representing method based on network links and document content
CN109894495B (en) * 2019-01-11 2020-12-22 广东工业大学 Extruder anomaly detection method and system based on energy consumption data and Bayesian network
CN109902801B (en) * 2019-01-22 2020-11-17 华中科技大学 Flood collective forecasting method based on variational reasoning Bayesian neural network
CN109858630A (en) * 2019-02-01 2019-06-07 清华大学 Method and apparatus for intensified learning
CN109840833B (en) * 2019-02-13 2020-11-10 苏州大学 Bayesian collaborative filtering recommendation method

Also Published As

Publication number Publication date
CN110738242A (en) 2020-01-31

Similar Documents

Publication Publication Date Title
CN110738242B (en) Bayes structure learning method and device of deep neural network
CN109711426B (en) Pathological image classification device and method based on GAN and transfer learning
CN107526785B (en) Text classification method and device
CN108985317B (en) Image classification method based on separable convolution and attention mechanism
CN109754078A (en) Method for optimization neural network
CN109948692B (en) Computer-generated picture detection method based on multi-color space convolutional neural network and random forest
CN113780292B (en) Semantic segmentation network model uncertainty quantification method based on evidence reasoning
CN109635763B (en) Crowd density estimation method
CN111985310A (en) Training method of deep convolutional neural network for face recognition
CN113628059A (en) Associated user identification method and device based on multilayer graph attention network
CN112183742A (en) Neural network hybrid quantization method based on progressive quantization and Hessian information
CN114490065A (en) Load prediction method, device and equipment
CN113283524A (en) Anti-attack based deep neural network approximate model analysis method
CN111832650A (en) Image classification method based on generation of confrontation network local aggregation coding semi-supervision
CN117061322A (en) Internet of things flow pool management method and system
CN115496144A (en) Power distribution network operation scene determining method and device, computer equipment and storage medium
CN115131558A (en) Semantic segmentation method under less-sample environment
CN117037258B (en) Face image detection method and device, storage medium and electronic equipment
CN112541530B (en) Data preprocessing method and device for clustering model
CN109934835B (en) Contour detection method based on deep strengthening network adjacent connection
CN116543259A (en) Deep classification network noise label modeling and correcting method, system and storage medium
CN111079930A (en) Method and device for determining quality parameters of data set and electronic equipment
CN112529637B (en) Service demand dynamic prediction method and system based on context awareness
CN115358952A (en) Image enhancement method, system, equipment and storage medium based on meta-learning
CN115829029A (en) Channel attention-based self-distillation implementation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant