CN110738242B - Bayes structure learning method and device of deep neural network - Google Patents

Bayes structure learning method and device of deep neural network

Info

Publication number
CN110738242B
CN110738242B (application CN201910912494.3A)
Authority
CN
China
Prior art keywords
network
deep neural
neural network
training
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910912494.3A
Other languages
Chinese (zh)
Other versions
CN110738242A (en)
Inventor
朱军
邓志杰
张钹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201910912494.3A priority Critical patent/CN110738242B/en
Publication of CN110738242A publication Critical patent/CN110738242A/en
Application granted granted Critical
Publication of CN110738242B publication Critical patent/CN110738242B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/29 - Graphical models, e.g. Bayesian networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Abstract

An embodiment of the invention provides a Bayesian structure learning method and device for a deep neural network. The method constructs a deep neural network comprising a plurality of learning units with the same internal structure, where each learning unit comprises a plurality of hidden layers, each pair of hidden layers is connected by a plurality of computing units, the network structure is defined as the relative weights of these computing units, and the network structure is modeled with a parameterized variational distribution. A training subset is extracted and the network structure is sampled by a re-parameterization process; the evidence lower bound is then calculated. If the change of the evidence lower bound exceeds a loss threshold, the network structure and network weights are optimized and a new training iteration begins. By constructing a deep neural network comprising a plurality of learning units with the same internal structure and training, on a training set, the relative weights of the computing units between the hidden layers of the learning unit, an optimized network structure is obtained, so that both the prediction performance and the prediction uncertainty of the deep neural network are comprehensively improved.

Description

Bayes structure learning method and device of deep neural network
Technical Field
The invention relates to the technical field of data processing, in particular to a Bayesian structure learning method and device for a deep neural network.
Background
Bayesian deep learning aims to provide accurate and reliable uncertainty estimates for flexible and efficient deep neural networks. Traditionally, Bayesian networks introduce uncertainty over the network weights, which helps keep the model from over-fitting and also provides the model with useful prediction uncertainty. However, introducing uncertainty over the network weights has its own problems. First, the manually specified prior distribution over the weights is often unreliable, which easily leads to problems such as over-pruning, so that the fitting capability of the model is greatly limited. Second, placing a flexible variational distribution over the weights easily makes inference difficult because of the complex dependencies inside that distribution. Recently, particle-based variational inference techniques have also been used to optimize Bayesian networks, but they suffer from particle collapse and degeneration.
Therefore, current Bayesian networks cannot provide accurate and reliable prediction performance in practical applications.
Disclosure of Invention
In view of the problems of the existing methods, embodiments of the invention provide a Bayesian structure learning method and device for a deep neural network.
In a first aspect, an embodiment of the present invention provides a bayesian structure learning method for a deep neural network, including:
constructing a deep neural network, wherein the deep neural network comprises at least one learning unit with the same internal structure, the learning unit comprises a preset number of hidden layers, a plurality of computing units are arranged between every two hidden layers, a network structure is defined as the relative weight of each computing unit, and the network structure is modeled by adopting parameterized variational distribution;
randomly extracting a training subset from a preset training set, and sampling a network structure of the learning unit by adopting a re-parameterization process;
according to the sampled network structure, calculating an evidence lower bound ELBO of the deep neural network;
if the change of the evidence lower bound exceeds a preset loss threshold value, optimizing the network structure and the network weight according to a preset optimization method, and randomly extracting a training subset from the training set again to continue training the network structure of the learning unit;
and if the change of the evidence lower bound does not exceed a preset loss threshold, judging that the training is finished.
Further, sampling the network structure of the learning unit by a re-parameterization process specifically comprises:
sampling the network structure of the learning unit by a re-parameterization process according to a preset adaptability coefficient.
Further, obtaining an evidence lower bound of the deep neural network according to the sampled network structure; the method specifically comprises the following steps:
according to the sampled network structure, calculating output results corresponding to the labeled samples in the training subset, and calculating the error of the deep neural network and the difference value of the logarithmic density in the network variation distribution and the preset prior distribution;
and carrying out weighted summation on the error of the deep neural network and the logarithmic density difference value to obtain the evidence lower bound of the deep neural network.
Further, constructing the deep neural network comprising at least one learning unit with the same internal structure specifically comprises:
constructing a deep neural network, wherein the deep neural network comprises at least one learning unit with the same internal structure, and a preset down-sampling layer and/or an up-sampling layer are/is inserted between predetermined learning units; wherein the downsampling layer comprises: the device comprises a batch regularization layer, a linear rectification layer, a convolution layer and a pooling layer, wherein the up-sampling layer is constructed by a deconvolution layer.
Further, an input layer of the deep neural network is a convolution layer for preprocessing, and the output layer is a linear fully-connected layer.
Further, the variational distribution is a Concrete distribution.
In a second aspect, an embodiment of the present invention provides a bayesian structure learning apparatus for a deep neural network, including:
the construction unit is used for constructing a deep neural network, wherein the deep neural network comprises at least one learning unit with the same internal structure, the learning unit comprises a preset number of hidden layers, a plurality of computing units are arranged between every two hidden layers, the network structure is defined as the relative weight of each computing unit, and a parameterized variational distribution is adopted to model the network structure;
the training unit is used for randomly extracting a training subset from a preset training set and sampling the network structure of the learning unit by adopting a re-parameterization process;
the error calculation unit is used for obtaining an evidence lower bound ELBO of the deep neural network according to the sampled network structure;
the judging unit is used for optimizing the network structure and the network weight according to a preset optimization method if the change of the evidence lower bound exceeds a preset loss threshold value, and randomly extracting a training subset from the training set again to continue training the network structure of the learning unit; and if the change of the evidence lower bound does not exceed a preset loss threshold, judging that the training is finished.
Further, the training unit is specifically configured to randomly extract a training subset from a preset training set, and sample the network structure of the learning unit by using a reparameterization process according to a preset adaptive coefficient.
In a third aspect, an embodiment of the present invention further provides an electronic device, including:
a processor, a memory, a communication interface, and a communication bus; wherein,
the processor, the memory and the communication interface complete mutual communication through the communication bus;
the communication interface is used for information transmission between communication devices of the electronic equipment;
the memory stores computer program instructions executable by the processor, the processor invoking the program instructions to perform a method comprising:
constructing a deep neural network, wherein the deep neural network comprises at least one learning unit with the same internal structure, the learning unit comprises a preset number of hidden layers, a plurality of computing units are arranged between every two hidden layers, a network structure is defined as the relative weight of each computing unit, and the network structure is modeled by adopting parameterized variational distribution;
randomly extracting a training subset from a preset training set, and sampling a network structure of the learning unit by adopting a re-parameterization process;
according to the sampled network structure, calculating an evidence lower bound ELBO of the deep neural network;
if the change of the evidence lower bound exceeds a preset loss threshold value, optimizing the distribution of the network structure and the network weights according to a preset optimization method, and randomly extracting a training subset from the training set again to continue training the network structure of the learning unit;
and if the change of the evidence lower bound does not exceed a preset loss threshold, judging that the training is finished.
In a fourth aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the following method:
constructing a deep neural network, wherein the deep neural network comprises at least one learning unit with the same internal structure, the learning unit comprises a preset number of hidden layers, a plurality of computing units are arranged between every two hidden layers, a network structure is defined as the relative weight of each computing unit, and the network structure is modeled by adopting parameterized variational distribution;
randomly extracting a training subset from a preset training set, and sampling a network structure of the learning unit by adopting a re-parameterization process;
according to the sampled network structure, calculating an evidence lower bound ELBO of the deep neural network;
if the change of the evidence lower bound exceeds a preset loss threshold value, optimizing the distribution of the network structure and the network weights according to a preset optimization method, and randomly extracting a training subset from the training set again to continue training the network structure of the learning unit;
and if the change of the evidence lower bound does not exceed a preset loss threshold, judging that the training is finished.
According to the Bayes structure learning method and device for the deep neural network, provided by the embodiment of the invention, the optimized network structure of the learning unit is obtained by constructing the deep neural network comprising a plurality of learning units with the same internal structure and training the relative weight of each computing unit among all hidden layers in the learning unit through a training set, so that the prediction performance and the prediction uncertainty of the deep neural network are comprehensively improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flowchart of a Bayesian structure learning method for a deep neural network according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a Bayesian structure learning device of a deep neural network according to an embodiment of the present invention;
fig. 3 illustrates a physical structure diagram of an electronic device.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a bayesian structure learning method of a deep neural network according to an embodiment of the present invention, as shown in fig. 1, the method includes:
and step S01, constructing a deep neural network, wherein the deep neural network comprises at least one learning unit with the same internal structure, the learning unit comprises a preset number of hidden layers, a plurality of calculation units are arranged between every two hidden layers, the network structure is defined as the relative weight of each calculation unit, and the parameterized variation distribution is adopted to model the network structure.
A deep neural network is constructed according to actual needs and comprises: an input layer, an output layer, and at least one repeatedly stacked learning unit located between the input layer and the output layer, the learning units being connected in series in sequence and having the same network structure.
The learning unit comprises a preset number K of hidden layers, and between any two hidden layers there are a plurality of computing units, such as fully-connected units, convolution units, pooling units, and the like. The network structure of the learning unit is α = {α^(i,j) | 1 ≤ i < j ≤ K}, where α^(i,j) contains the relative weights, one per computing unit, between the i-th and the j-th hidden layer. The contribution of the feature representation of the i-th hidden layer to the feature representation of the j-th hidden layer is computed as the weighted sum, according to α^(i,j), of the outputs of the computing units between them.
The relative weights corresponding to the computing units follow a categorical distribution; for convenience of optimization-based training, they can be modeled with a learnable, continuous, parameterized variational distribution, thereby constructing the network structure.
Further, the variational distribution is a Concrete distribution.
The variational distribution can be chosen according to actual needs; the embodiment of the present invention gives only one example, the Concrete distribution, which is used in the following examples for simplicity.
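As an illustrative sketch only (not part of the claimed method), the following PyTorch-style code shows how the relative weights of the computing units on each hidden-layer pair could be held as learnable Concrete (Gumbel-softmax) parameters; the class name, tensor shapes and the choice of library are assumptions for illustration.

```python
import torch
import torch.nn.functional as F


class EdgeStructure(torch.nn.Module):
    """Variational parameters theta of the relative weights alpha(i,j) of the
    candidate computing units on one edge (i, j) of the learning unit."""

    def __init__(self, num_units: int, tau: float = 1.0):
        super().__init__()
        # theta parameterizes a Concrete (Gumbel-softmax) distribution over
        # the num_units candidate computing units of this edge.
        self.theta = torch.nn.Parameter(torch.zeros(num_units))
        self.tau = tau

    def sample(self) -> torch.Tensor:
        # Reparameterized draw: alpha = softmax((theta + eps) / tau),
        # with eps ~ Gumbel(0, 1) independently per dimension.
        eps = torch.distributions.Gumbel(0.0, 1.0).sample(self.theta.shape)
        return F.softmax((self.theta + eps) / self.tau, dim=-1)


# One structure variable per pair of hidden layers (i, j), 1 <= i < j <= K,
# i.e. K(K-1)/2 variables in total for a learning unit with K hidden layers.
K, num_units = 4, 3
edges = torch.nn.ModuleDict({
    f"{i}-{j}": EdgeStructure(num_units)
    for i in range(1, K) for j in range(i + 1, K + 1)
})
alpha = {name: edge.sample() for name, edge in edges.items()}
```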
Step S02: randomly extract a training subset from a preset training set, and sample the network structure of the learning unit by a re-parameterization process.
Step S03: obtain the evidence lower bound ELBO of the deep neural network according to the sampled network structure.
Step S04: if the change of the evidence lower bound exceeds a preset loss threshold, optimize the distribution of the network structure and the network weights according to a preset optimization method, and randomly extract a training subset from the training set again to continue training the network structure of the learning unit.
Step S05: if the change of the evidence lower bound does not exceed the preset loss threshold, judge that the training is finished.
A training set containing N samples is preset, where each sample (x_n, y_n) comprises input data x_n and a corresponding pre-labeled output result y_n.
The deep neural network is trained on this training set; when training is finished, an optimized network structure of the learning unit is obtained, which contains the optimized relative weights corresponding to each computing unit between every two hidden layers. The specific training process is as follows:
1. Randomly select a subset B = {(x_n, y_n)} of size k from the training set. Randomly sample the network structure α of the learning unit from the variational distribution using the re-parameterization technique. Denoting the network weights and bias parameters of the deep neural network by w, the deep neural network yields the prediction probability
p(y_n | x_n, w, α) = f(x_n; w, α),
where f is the function computed by the deep neural network that takes x_n as input, has w as its parameters, and uses α as the network structure of the learning unit.
2. Obtain the current evidence lower bound (ELBO) of the deep neural network according to the sampled network structure.
3. Compare the current evidence lower bound with the evidence lower bound obtained after the previous iteration to obtain the change of the evidence lower bound. If the change of the evidence lower bound obtained after this iteration exceeds the preset loss threshold, it is judged that training is not finished, and the current network structure and network weights are further optimized according to a preset optimization method. A subset is then re-selected from the training set to continue the training process of steps 1-3. Training is judged finished once the change of the evidence lower bound is less than or equal to the loss threshold.
As can be seen from the training process, the training process in the embodiment of the present invention only trains the connection relationship between hidden layers inside the learning unit, so that for a hidden layer structure of K layers, the network structure α of the learning unit has K (K-1)/2 variables in total, and each variable represents the relative weight of the computing unit between a pair of hidden layers.
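A minimal sketch of this training loop is given below, assuming a PyTorch-style setup; `model`, `structure` and `compute_elbo` are illustrative placeholders for the deep neural network, the mapping of edge names to structure variables, and the ELBO computation described in the following paragraphs, and the choice of Adam as the preset optimization method is an assumption rather than something prescribed by the embodiment.

```python
import torch


def train_structure(model, structure, compute_elbo, train_set,
                    k: int, loss_threshold: float, max_iters: int = 10000):
    """Outer loop of steps 1-3: sample a subset and a structure, evaluate the
    ELBO, and stop once its change falls below the loss threshold."""
    optimizer = torch.optim.Adam(
        list(model.parameters()) + list(structure.parameters()))
    prev_elbo = None
    for _ in range(max_iters):
        # 1. randomly select a subset of size k from the training set
        idx = torch.randperm(len(train_set))[:k]
        batch = [train_set[int(i)] for i in idx]
        # 1. (cont.) sample the network structure alpha by reparameterization
        alpha = {name: edge.sample() for name, edge in structure.items()}
        # 2. current evidence lower bound under the sampled structure
        elbo = compute_elbo(model, alpha, batch, n_total=len(train_set))
        # 3. compare with the previous ELBO; stop if the change is small enough,
        #    otherwise optimize the structure parameters and network weights
        if prev_elbo is not None and abs(elbo.item() - prev_elbo) <= loss_threshold:
            break
        prev_elbo = elbo.item()
        optimizer.zero_grad()
        (-elbo).backward()          # gradient ascent on the ELBO
        optimizer.step()
```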
According to the embodiment of the invention, the deep neural network comprising a plurality of learning units with the same internal structure is constructed, and the relative weights of the computing units among all hidden layers in the learning units are trained through the training set to obtain the optimized network structure of the learning units, so that the prediction performance and the prediction uncertainty of the deep neural network are comprehensively improved.
Based on the above embodiment, further, in step S02, a re-parameterization process is used to sample the network structure of the learning unit; the method specifically comprises the following steps:
and sampling the network structure of the learning unit by adopting a re-parameterization process according to a preset adaptability coefficient.
In order to prevent the training from failing to converge, a preset adaptability coefficient β = {β^(i,j)} is added to the re-parameterization process when the learning unit is constructed, so as to adjust the variance of the samples. The resulting re-parameterization process is
α = g(θ, β, ε) = softmax((θ + β · ε) / τ),
where ε = {ε^(i,j)} is a set of dimension-wise independent Gumbel variables and τ is a positive real number representing the temperature.
The specific process is as follows:
a. Randomly sample a group of mutually independent variables ε from the Gumbel distribution;
b. multiply the variables obtained in (a) by the adaptability coefficient β to obtain scaled variables;
c. add the variables obtained in (b) to the parameter θ of the Concrete distribution, and then divide by the temperature coefficient τ;
d. feed the result obtained in (c) into a softmax transformation to obtain the sampled network structure α = g(θ, β, ε), as illustrated in the sketch below.
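The steps a-d can be written as follows; this is an illustrative sketch assuming PyTorch, and the function name and the example values of β and τ are not prescribed by the method.

```python
import torch
import torch.nn.functional as F


def sample_structure(theta: torch.Tensor, beta: torch.Tensor, tau: float) -> torch.Tensor:
    """alpha = g(theta, beta, eps) = softmax((theta + beta * eps) / tau)."""
    # a. randomly sample dimension-wise independent Gumbel(0, 1) variables
    eps = torch.distributions.Gumbel(0.0, 1.0).sample(theta.shape)
    # b. multiply the variables by the adaptability coefficient beta
    scaled = beta * eps
    # c. add the Concrete parameter theta, then divide by the temperature tau
    logits = (theta + scaled) / tau
    # d. softmax transformation yields the sampled relative weights alpha
    return F.softmax(logits, dim=-1)


# Example for one edge with three candidate computing units; a beta below 1
# shrinks the sampling variance and helps the training converge.
alpha = sample_structure(theta=torch.zeros(3), beta=torch.tensor(0.5), tau=1.0)
```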
In the embodiment of the invention, an adaptability coefficient is introduced into the re-parameterization process, which prevents the training from failing to converge and comprehensively improves the prediction performance and the prediction uncertainty of the deep neural network.
Based on the foregoing embodiment, further, the step S03 specifically includes:
step S031, according to the sampled network structure, calculating an output result corresponding to each labeled sample in the training subset, and calculating an error of the deep neural network and a log density difference between the network variation distribution and a preset prior distribution.
And S032, performing weighted summation on the error of the deep neural network and the logarithmic density difference value to obtain an evidence lower bound of the deep neural network.
After the sampled network structure of the learning unit is obtained from the extracted training subset by the re-parameterization process, the prediction results of the samples in the training subset under the current deep neural network are computed according to the sampled network structure, and the error of these predictions is calculated. Many error measures can be used; cross entropy is taken here as an example:
CE = − Σ_{(x_n, y_n) ∈ B} log p(y_n | x_n, w, α),
where the sum runs over the k samples of the subset B.
Meanwhile, based on the currently sampled network structure α, the difference between the log density of α under the variational distribution and under the preset prior distribution is calculated; both distributions are adaptive Concrete distributions, so their log densities are available in closed form. The resulting difference is denoted
KL = log q(α; θ) − log p(α).
The cross entropy and the log-density difference are combined by a weighted sum to obtain the overall loss of the deep neural network,
L = CE + λ · KL,
where the weight λ = k / N distributes the KL divergence corresponding to the whole training set onto the subset. This loss is the variational-inference objective (the negative of the minibatch ELBO estimate); a preset optimization method, such as gradient descent, is used to solve the optimization problem min_{θ,w} L, i.e. to maximize the ELBO, in order to further optimize the network structure.
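A minimal sketch of this weighted combination is shown below, under the assumptions that the log densities log q(α) and log p(α) of the adaptive Concrete distributions are supplied as inputs and that the cross entropy is summed over the subset; the function and argument names are illustrative, not the patent's API.

```python
import torch.nn.functional as F


def minibatch_loss(logits, targets, log_q_alpha, log_p_alpha, k: int, n_total: int):
    """Weighted sum of the cross entropy on the subset and the log-density
    difference KL, with the full-training-set KL spread onto the subset."""
    ce = F.cross_entropy(logits, targets, reduction="sum")  # error over the k samples
    kl = log_q_alpha - log_p_alpha                          # log q(alpha) - log p(alpha)
    lam = k / n_total                                       # weight corresponding to the subset
    loss = ce + lam * kl
    return loss                     # minimizing this maximizes the ELBO estimate
```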
In each training iteration, the cross entropy and the log-density difference are calculated and combined by weighting to obtain the evidence lower bound of the deep neural network, and the network structure is optimized according to the change of the evidence lower bound, so that the prediction performance and the prediction uncertainty of the deep neural network are comprehensively improved.
Based on the foregoing embodiment, further, the step S01 specifically includes:
constructing a deep neural network, wherein the deep neural network comprises at least one learning unit with the same internal structure, and a preset down-sampling layer and/or an up-sampling layer are/is inserted between predetermined learning units; wherein the downsampling layer comprises: the device comprises a batch regularization layer, a linear rectification layer, a convolution layer and a pooling layer, wherein the up-sampling layer is constructed by a deconvolution layer.
The deep neural network constructed by the embodiment of the invention comprises a plurality of learning units connected in series, and a preset down-sampling layer and/or up-sampling layer is inserted between certain predetermined learning units according to the needs of the practical application. For picture classification, only down-sampling layers need to be inserted; for semantic segmentation, up-sampling layers also need to be inserted. The down-sampling layer is generally composed of batch regularization - linear rectification - convolution - pooling, and the up-sampling layer is generally constructed from a deconvolution.
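For illustration only, a down-sampling layer of this form and an up-sampling layer built from a deconvolution might be written as follows in PyTorch; the kernel sizes and the use of average pooling are assumptions, not values specified by the embodiment.

```python
import torch.nn as nn


def downsample_layer(in_ch: int, out_ch: int) -> nn.Sequential:
    # batch regularization - linear rectification - convolution - pooling
    return nn.Sequential(
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
        nn.AvgPool2d(kernel_size=2, stride=2),
    )


def upsample_layer(in_ch: int, out_ch: int) -> nn.ConvTranspose2d:
    # constructed from a deconvolution (transposed convolution)
    return nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
```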
Further, an input layer of the deep neural network is a convolution layer for preprocessing, and the output layer is a linear fully-connected layer.
The convolution computing unit adopts the operation sequence of batch regularization-linear rectification-convolution-batch regularization;
the method integrates the simultaneous, mutually independent and same-class calculation units into one group operation, improves the calculation efficiency, and mainly arranges a plurality of convolutions into one group convolution to improve the calculation efficiency.
According to the embodiment of the invention, a down-sampling layer and/or an up-sampling layer is inserted between the predetermined learning units, a preprocessing convolution layer is used as the input layer, and a linear fully-connected layer is placed at the end of the network, so that the performance of the deep neural network is improved.
The deep neural network constructed by the above embodiment can be applied to various scenes, such as:
Picture classification:
1) In model training, the cross entropy between the network prediction and the label is used as the training error;
2) at test time, for a group of test samples, 100 network structures are randomly sampled from the learned network-structure distribution, and based on these structures the model gives 100 groups of prediction probabilities;
3) the prediction probabilities are averaged to obtain the final prediction probability;
4) the category with the highest prediction probability in 3) is taken as the classification of the picture (see the sketch after this list).
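A sketch of this test-time procedure is shown below; `model` and `structure` are illustrative placeholders, and the interface `model(x, alpha)`, taking the input batch and a sampled structure, is an assumption rather than the patent's API.

```python
import torch


@torch.no_grad()
def classify(model, structure, x, num_samples: int = 100) -> torch.Tensor:
    """Steps 1)-4) at test time: average the prediction probabilities over
    num_samples structures drawn from the learned structure distribution and
    return the class with the highest averaged probability."""
    probs = torch.stack([
        torch.softmax(model(x, {n: e.sample() for n, e in structure.items()}), dim=-1)
        for _ in range(num_samples)
    ]).mean(dim=0)                  # averaged prediction probability
    return probs.argmax(dim=-1)     # category with the highest probability
```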
Semantic segmentation:
1) In model training, the sum over all pixels of the cross entropy between the predictions and the labels is used as the training error;
2) at test time, for a group of test samples, 100 network structures are randomly sampled from the learned network-structure distribution, and based on these structures the model gives 100 groups of pixel-level prediction probabilities;
3) the prediction probabilities are averaged to obtain pixel-level prediction probabilities;
4) the category with the highest probability in 3) on each pixel is taken as the segmentation result of that pixel.
Adversarial-sample detection:
1) For a group of adversarial samples, 30 network structures are randomly sampled from the learned network-structure distribution;
2) based on these structures, the model gives 30 groups of prediction probabilities;
3) the prediction probabilities are averaged to obtain the final prediction probability of the model;
4) the entropy of the prediction probability obtained in 3) is calculated as the detection index;
5) if the entropy obtained in 4) is significantly larger than the entropy of predictions on normal samples, an adversarial sample is detected (see the sketch after this list).
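A sketch of the entropy-based detection index used in this and the following procedure is given below, with the same placeholder interface assumptions as the previous sketch.

```python
import torch


@torch.no_grad()
def predictive_entropy(model, structure, x, num_samples: int = 30) -> torch.Tensor:
    """Detection index of steps 1)-4): entropy of the structure-averaged
    prediction probability; markedly higher entropy than on normal samples
    flags an adversarial input (or, in the next procedure, domain shift)."""
    probs = torch.stack([
        torch.softmax(model(x, {n: e.sample() for n, e in structure.items()}), dim=-1)
        for _ in range(num_samples)
    ]).mean(dim=0)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
```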
Domain-shift detection:
1) For a group of samples drawn from a domain different from that of the training data, 100 network structures are randomly sampled from the learned structure distribution;
2) based on these structures, the model gives 100 groups of prediction probabilities;
3) the prediction probabilities are averaged to obtain the final prediction probability of the model;
4) the entropy of the prediction probability obtained in 3) is calculated as the detection index;
5) if the entropy obtained in 4) is significantly larger than the entropy of predictions on normal samples, domain shift is detected.
The deep Bayesian structure network was tested on the natural image classification datasets CIFAR-10 and CIFAR-100; the trained models achieve classification error rates of 4.98% and 22.50% respectively, substantially better than the state-of-the-art deep neural networks ResNet and DenseNet. Applied to the CamVid semantic segmentation task under the same training conditions as the strong FC-DenseNet method, the invention obtains a mean IoU 2.3 points higher than FC-DenseNet, reaching a level comparable to world-leading methods. In addition, the trained model can detect adversarial samples and domain shift through its prediction uncertainty, and in tests it exhibits prediction uncertainty clearly superior to that of traditional Bayesian neural networks. In summary, the invention introduces uncertainty into the structure of a deep network using the structure learning space provided by neural architecture search, performs Bayesian modeling, and learns with stochastic variational inference, thereby alleviating the difficulties of designing priors and inferring posteriors that arise in weight-uncertain Bayesian deep networks, comprehensively improving the prediction performance and prediction uncertainty of the network model, and significantly improving the performance of Bayesian neural networks on tasks such as image classification, semantic segmentation, adversarial-sample detection, and domain-shift detection.
Fig. 2 is a schematic structural diagram of a bayesian structure learning apparatus of a deep neural network according to an embodiment of the present invention, as shown in fig. 2, the apparatus includes: a construction unit 10, a training unit 11, an error calculation unit 12 and a judgment unit 13, wherein,
the construction unit 10 is configured to construct a deep neural network, where the deep neural network includes at least one learning unit with the same internal structure, the learning unit includes a preset number of hidden layers, each two hidden layers includes a plurality of calculation units, a network structure is defined as a relative weight of each calculation unit, and a parameterized variational distribution is used to model the network structure; the training unit 11 is configured to randomly extract a training subset from a preset training set, and sample a network structure of the learning unit by using a parameterization process; the error calculation unit 12 is configured to obtain an evidence lower bound ELBO of the deep neural network according to the sampled network structure; the judging unit 13 is configured to optimize the network structure and the network weight according to a preset optimization method if the change of the lower evidence bound exceeds a preset loss threshold, and randomly extract a training subset from the training set again to continue training the network structure of the learning unit; and if the change of the evidence lower bound does not exceed a preset loss threshold, judging that the training is finished.
The building unit 10 builds a deep neural network according to actual needs, and the deep neural network comprises: the device comprises an input layer, an output layer and at least one repeatedly piled learning unit positioned between the input layer and the output layer, wherein the learning units are sequentially connected in series and have the same network structure.
The learning unit comprises a preset number K of hidden layers, and between any two hidden layers there are a plurality of computing units, such as fully-connected units, convolution units, pooling units, and the like. The network structure of the learning unit is α = {α^(i,j) | 1 ≤ i < j ≤ K}, where α^(i,j) contains the relative weights, one per computing unit, between the i-th and the j-th hidden layer. The contribution of the feature representation of the i-th hidden layer to the feature representation of the j-th hidden layer is computed as the weighted sum, according to α^(i,j), of the outputs of the computing units between them.
The relative weights corresponding to the computing units follow a categorical distribution and can be modeled with a learnable, continuous, parameterized variational distribution for optimization-based training.
Further, the variational distribution is a Concrete distribution.
The variational distribution can be chosen according to actual needs; the embodiment of the present invention gives only one example, the Concrete distribution, which is used in the following examples for simplicity.
The training unit 11 presets a training set containing N samples, where each sample (x_n, y_n) comprises input data x_n and a corresponding pre-labeled output result y_n.
Training the deep neural network constructed by the construction unit 10 according to the training set, and finally obtaining an optimized network structure of the learning unit after finishing the training, wherein the optimized network structure comprises optimized relative weights corresponding to each calculation unit between every two hidden layers. The specific training process is as follows:
1. The training unit 11 randomly selects a subset B = {(x_n, y_n)} of size k from the training set and randomly samples the network structure α of the learning unit from the variational distribution using the re-parameterization technique. Denoting the network weights and bias parameters of the deep neural network by w, the deep neural network yields the prediction probability
p(y_n | x_n, w, α) = f(x_n; w, α),
where f is the function computed by the deep neural network that takes x_n as input, has w as its parameters, and uses α as the network structure of the learning unit.
2. The error calculation unit 12 obtains the current Lower Evidence Bound (ELBO) of the deep neural network according to the sampled network structure.
3. The judging unit 13 compares the current lower evidence bound with the lower evidence bound obtained after the last training is completed, so as to obtain the change of the lower evidence bound. And if the change of the evidence lower bound obtained after the training exceeds a preset loss threshold value, judging that the training is not finished, and further optimizing the sampled network structure and the sampled network parameters according to a preset optimization method. The subset is then re-selected from the training set to continue the training process of 1-3. And judging that the training is finished until the change of the lower evidence bound is less than or equal to the loss threshold.
As can be seen from the training process, the training process in the embodiment of the present invention only trains the connection relationship between hidden layers inside the learning unit, so that for a hidden layer structure of K layers, the network structure α of the learning unit has K (K-1)/2 variables in total, and each variable represents the relative weight of the computing unit between a pair of hidden layers.
The apparatus provided in the embodiment of the present invention is configured to execute the method, and the functions of the apparatus refer to the method embodiment specifically, and detailed method flows thereof are not described herein again.
According to the embodiment of the invention, the deep neural network comprising a plurality of learning units with the same internal structure is constructed, and the relative weights of the computing units among all hidden layers in the learning units are trained through the training set to obtain the optimized network structure of the learning units, so that the prediction performance and the prediction uncertainty of the deep neural network are comprehensively improved.
Based on the foregoing embodiment, further, the training unit is specifically configured to randomly extract a training subset from a preset training set, and sample the network structure of the learning unit by using a reparameterization process according to a preset adaptive coefficient.
In order to prevent the training from failing to converge, the construction unit adds a preset adaptability coefficient β = {β^(i,j)} to the re-parameterization process when the learning unit is constructed, so as to adjust the variance of the samples. The re-parameterization process used by the training unit is therefore
α = g(θ, β, ε) = softmax((θ + β · ε) / τ),
where ε = {ε^(i,j)} is a set of dimension-wise independent Gumbel variables and τ is a positive real number representing the temperature.
The specific process is as follows:
a. Randomly sample a group of mutually independent variables ε from the Gumbel distribution;
b. multiply the variables obtained in (a) by the adaptability coefficient β to obtain scaled variables;
c. add the variables obtained in (b) to the parameter θ of the Concrete distribution, and then divide by the temperature coefficient τ;
d. feed the result obtained in (c) into a softmax transformation to obtain the sampled network structure α = g(θ, β, ε).
The apparatus provided in the embodiment of the present invention is configured to execute the method, and the functions of the apparatus refer to the method embodiment specifically, and detailed method flows thereof are not described herein again.
In the embodiment of the invention, an adaptability coefficient is introduced into the re-parameterization process, which prevents the training from failing to converge and comprehensively improves the prediction performance and the prediction uncertainty of the deep neural network.
Fig. 3 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 3: a processor (processor)301, a communication Interface (Communications Interface)303, a memory (memory)302 and a communication bus 304, wherein the processor 301, the communication Interface 303 and the memory 302 complete communication with each other through the communication bus 304. The processor 301 may call logic instructions in the memory 302 to perform the above-described method.
Further, embodiments of the present invention disclose a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions, which, when executed by a computer, enable the computer to perform the methods provided by the above-mentioned method embodiments.
Further, the present invention provides a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the methods provided by the above method embodiments.
Those of ordinary skill in the art will understand that: furthermore, the logic instructions in the memory 302 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (7)

1. A Bayesian structure learning method for a deep neural network is characterized by comprising the following steps:
constructing a deep neural network, wherein the deep neural network comprises at least one learning unit with the same internal structure, the learning unit comprises a preset number of hidden layers, a plurality of computing units are arranged between every two hidden layers, a network structure is defined as the relative weight of each computing unit, and the network structure is modeled by adopting parameterized variational distribution;
randomly extracting a training subset from a preset training set, and sampling a network structure of the learning unit by adopting a re-parameterization process;
according to the sampled network structure, calculating an evidence lower bound ELBO of the deep neural network;
if the change of the evidence lower bound exceeds a preset loss threshold value, optimizing the network structure and the network weight according to a preset optimization method, and randomly extracting a training subset from the training set again to continue training the network structure of the learning unit;
if the change of the evidence lower bound does not exceed a preset loss threshold, judging that the training is finished;
when the deep neural network is used for picture classification, then:
training by using the cross entropy of the network prediction result and the label as an error in model training;
during testing, for a group of test samples, randomly sampling 100 network structures from the network structure distribution obtained by learning, and based on the structures, giving 100 groups of prediction probabilities by a model;
averaging the prediction probabilities to obtain the final prediction probability;
taking the category with the maximum prediction probability as the classification of the pictures;
when the deep neural network is used for semantic segmentation, then:
training by using the sum of the prediction results of all pixels and the cross entropy of the labels as an error in model training;
during testing, for a group of test samples, randomly sampling 100 network structures from the network structure distribution obtained by learning, and based on the structures, giving 100 groups of pixel-level prediction probabilities by a model;
averaging the prediction probabilities to obtain pixel-level prediction probabilities;
taking the class with the maximum pixel-level prediction probability on each pixel as the segmentation result of the pixel;
when the deep neural network is used to detect adversarial samples, then:
for a group of adversarial samples, randomly sampling 30 network structures from the network structure distribution obtained by learning;
based on these structures, the model gives 30 sets of prediction probabilities;
averaging the prediction probabilities to obtain the final prediction probability of the model;
calculating the entropy of the final prediction probability of the model as a detection index;
if the obtained entropy is significantly larger than the entropy corresponding to predictions on normal samples, an adversarial sample is detected;
when the deep neural network is used to detect domain shift, then:
for a group of samples drawn from a domain different from that of the training data, randomly sampling 100 network structures from the structure distribution obtained by learning;
based on these structures, the model gives 100 sets of prediction probabilities;
averaging the prediction probabilities to obtain the final prediction probability of the model;
calculating the entropy of the obtained prediction probability as a detection index;
if the obtained entropy is significantly larger than the entropy corresponding to predictions on normal samples, domain shift is detected;
sampling the network structure of the learning unit by the re-parameterization process specifically comprises:
sampling the network structure of the learning unit by the re-parameterization process according to a preset adaptability coefficient;
when the learning unit is constructed, a preset adaptability coefficient β = {β^(i,j)} is added to the re-parameterization process to adjust the variance of the samples; the resulting re-parameterization process is
α = g(θ, β, ε) = softmax((θ + β · ε) / τ),
where ε = {ε^(i,j)} is a set of dimension-wise independent Gumbel variables and τ is a positive real number representing the temperature;
the specific process is as follows:
a. randomly sampling a group of mutually independent variables ε from the Gumbel distribution;
b. multiplying the variables obtained in (a) by the adaptability coefficient β to obtain scaled variables;
c. adding the variables obtained in (b) to the parameter θ of the Concrete distribution, and then dividing by the temperature coefficient τ;
d. feeding the result obtained in (c) into a softmax transformation to obtain the sampled network structure α = g(θ, β, ε);
the calculating of the evidence lower bound ELBO of the deep neural network according to the sampled network structure specifically includes:
according to the sampled network structure, calculating output results corresponding to the labeled samples in the training subset, and calculating the error of the deep neural network and the difference value of the logarithmic density in the network variation distribution and the preset prior distribution;
and carrying out weighted summation on the error of the deep neural network and the logarithmic density difference value to obtain the evidence lower bound of the deep neural network.
2. The Bayesian structure learning method for the deep neural network as recited in claim 1, wherein the deep neural network is constructed and comprises at least one learning unit with the same internal structure; the method specifically comprises the following steps:
constructing a deep neural network, wherein the deep neural network comprises at least one learning unit with the same internal structure, and a preset down-sampling layer and/or an up-sampling layer are/is inserted between predetermined learning units; wherein the downsampling layer comprises: the device comprises a batch regularization layer, a linear rectification layer, a convolution layer and a pooling layer, wherein the up-sampling layer is constructed by a deconvolution layer.
3. The Bayesian structure learning method for the deep neural network as recited in claim 2, wherein an input layer of the deep neural network is a convolutional layer for preprocessing, and an output layer is a linear fully-connected layer.
4. The Bayesian structure learning method for a deep neural network as recited in claim 3, wherein the variational distribution is a Concrete distribution.
5. A Bayesian structure learning device for a deep neural network, comprising:
a construction unit, used for constructing a deep neural network, wherein the deep neural network comprises at least one learning unit with the same internal structure, the learning unit comprises a preset number of hidden layers, a plurality of computing units are arranged between every two hidden layers, the network structure is defined as the relative weight of each computing unit, and a parameterized variational distribution is adopted to model the network structure;
the training unit is used for randomly extracting a training subset from a preset training set and sampling the network structure of the learning unit by adopting a re-parameterization process;
the error calculation unit is used for obtaining an evidence lower bound ELBO of the deep neural network according to the sampled network structure;
the judging unit is used for optimizing the distribution of the network structure and the network weights according to a preset optimization method if the change of the evidence lower bound exceeds a preset loss threshold value, and randomly extracting a training subset from the training set again to continue training the network structure of the learning unit; if the change of the evidence lower bound does not exceed a preset loss threshold, judging that the training is finished;
when the deep neural network is used for picture classification, then:
training by using the cross entropy of the network prediction result and the label as an error in model training;
during testing, for a group of test samples, randomly sampling 100 network structures from the network structure distribution obtained by learning, and based on the structures, giving 100 groups of prediction probabilities by a model;
averaging the prediction probabilities to obtain the final prediction probability;
taking the category with the maximum prediction probability as the classification of the pictures;
when the deep neural network is used for semantic segmentation, then:
training by using the sum of the prediction results of all pixels and the cross entropy of the labels as an error in model training;
during testing, for a group of test samples, randomly sampling 100 network structures from the network structure distribution obtained by learning, and based on the structures, giving 100 groups of pixel-level prediction probabilities by a model;
averaging the prediction probabilities to obtain pixel-level prediction probabilities;
taking the class with the maximum pixel-level prediction probability on each pixel as the segmentation result of the pixel;
when the deep neural network is used to detect adversarial samples, then:
for a group of adversarial samples, randomly sampling 30 network structures from the network structure distribution obtained by learning;
based on these structures, the model gives 30 sets of prediction probabilities;
averaging the prediction probabilities to obtain the final prediction probability of the model;
calculating the entropy of the final prediction probability of the model as a detection index;
if the obtained entropy is significantly larger than the entropy corresponding to predictions on normal samples, an adversarial sample is detected;
when the deep neural network is used to detect domain shift, then:
for a group of samples drawn from a domain different from that of the training data, randomly sampling 100 network structures from the structure distribution obtained by learning;
based on these structures, the model gives 100 sets of prediction probabilities;
averaging the prediction probabilities to obtain the final prediction probability of the model;
calculating the entropy of the obtained prediction probability as a detection index;
if the obtained entropy is significantly larger than the entropy corresponding to predictions on normal samples, domain shift is detected;
sampling the network structure of the learning unit by the re-parameterization process specifically comprises:
sampling the network structure of the learning unit by the re-parameterization process according to a preset adaptability coefficient;
when the learning unit is constructed, a preset adaptability coefficient β = {β^(i,j)} is added to the re-parameterization process to adjust the variance of the samples; the resulting re-parameterization process is
α = g(θ, β, ε) = softmax((θ + β · ε) / τ),
where ε = {ε^(i,j)} is a set of dimension-wise independent Gumbel variables and τ is a positive real number representing the temperature;
the specific process is as follows:
a. randomly sampling a group of mutually independent variables ε from the Gumbel distribution;
b. multiplying the variables obtained in (a) by the adaptability coefficient β to obtain scaled variables;
c. adding the variables obtained in (b) to the parameter θ of the Concrete distribution, and then dividing by the temperature coefficient τ;
d. feeding the result obtained in (c) into a softmax transformation to obtain the sampled network structure α = g(θ, β, ε);
the calculating of the evidence lower bound ELBO of the deep neural network according to the sampled network structure specifically includes:
according to the sampled network structure, calculating output results corresponding to the labeled samples in the training subset, and calculating the error of the deep neural network and the difference value of the logarithmic density in the network variation distribution and the preset prior distribution;
and carrying out weighted summation on the error of the deep neural network and the logarithmic density difference value to obtain the evidence lower bound of the deep neural network.
6. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the Bayesian structure learning method of the deep neural network of any one of claims 1 to 4.
7. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, carries out the steps of the Bayesian structure learning method of a deep neural network according to any one of claims 1 to 4.
CN201910912494.3A 2019-09-25 2019-09-25 Bayes structure learning method and device of deep neural network Active CN110738242B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910912494.3A CN110738242B (en) 2019-09-25 2019-09-25 Bayes structure learning method and device of deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910912494.3A CN110738242B (en) 2019-09-25 2019-09-25 Bayes structure learning method and device of deep neural network

Publications (2)

Publication Number Publication Date
CN110738242A CN110738242A (en) 2020-01-31
CN110738242B true CN110738242B (en) 2021-08-10

Family

ID=69269608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910912494.3A Active CN110738242B (en) 2019-09-25 2019-09-25 Bayes structure learning method and device of deep neural network

Country Status (1)

Country Link
CN (1) CN110738242B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325757B (en) * 2020-02-18 2022-12-23 西北工业大学 Point cloud identification and segmentation method based on Bayesian neural network
CN111860495B (en) * 2020-06-19 2022-05-17 上海交通大学 Hierarchical network structure searching method and device and readable storage medium
CN111814966A (en) * 2020-08-24 2020-10-23 国网浙江省电力有限公司 Neural network architecture searching method, neural network application method, device and storage medium
CN112637879A (en) * 2020-12-18 2021-04-09 中国科学院深圳先进技术研究院 Method for deciding fault intervention time of telecommunication core network
CN114445692B (en) * 2021-12-31 2022-11-15 北京瑞莱智慧科技有限公司 Image recognition model construction method and device, computer equipment and storage medium
CN114961985A (en) * 2022-05-11 2022-08-30 西安交通大学 Intelligent prediction method and system for performance of hydrogen fuel aviation rotor engine
CN116030063B (en) * 2023-03-30 2023-07-04 同心智医科技(北京)有限公司 Classification diagnosis system, method, electronic device and medium for MRI image

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030165262A1 (en) * 2002-02-21 2003-09-04 The University Of Chicago Detection of calcifications within a medical image
US10014076B1 (en) * 2015-02-06 2018-07-03 Brain Trust Innovations I, Llc Baggage system, RFID chip, server and method for capturing baggage data
CN107292324A (en) * 2016-03-31 2017-10-24 日本电气株式会社 Method and apparatus for training mixed model
US11042811B2 (en) * 2016-10-05 2021-06-22 D-Wave Systems Inc. Discrete variational auto-encoder systems and methods for machine learning using adiabatic quantum computers
US11531852B2 (en) * 2016-11-28 2022-12-20 D-Wave Systems Inc. Machine learning systems and methods for training with noisy labels
WO2018200054A1 (en) * 2017-04-28 2018-11-01 Pearson Education, Inc. Method and system for bayesian network-based standard or skill mastery determination
US20190019097A1 (en) * 2017-04-28 2019-01-17 Pearson Education, Inc. Method and system for bayesian network-based standard or skill mastery determination using a collection of interim assessments
CN107491417B (en) * 2017-07-06 2021-06-22 复旦大学 Document generation method based on specific division under topic model
US11797838B2 (en) * 2018-03-13 2023-10-24 Pinterest, Inc. Efficient convolutional network for recommender systems
CN108763167A (en) * 2018-05-07 2018-11-06 西北工业大学 A kind of adaptive filter method of variation Bayes
CN109299464B (en) * 2018-10-12 2023-07-28 天津大学 Topic embedding and document representing method based on network links and document content
CN109894495B (en) * 2019-01-11 2020-12-22 广东工业大学 Extruder anomaly detection method and system based on energy consumption data and Bayesian network
CN109902801B (en) * 2019-01-22 2020-11-17 华中科技大学 Flood collective forecasting method based on variational reasoning Bayesian neural network
CN109858630A (en) * 2019-02-01 2019-06-07 清华大学 Method and apparatus for intensified learning
CN109840833B (en) * 2019-02-13 2020-11-10 苏州大学 Bayesian collaborative filtering recommendation method

Also Published As

Publication number Publication date
CN110738242A (en) 2020-01-31

Similar Documents

Publication Publication Date Title
CN110738242B (en) Bayes structure learning method and device of deep neural network
CN109711426B (en) Pathological image classification device and method based on GAN and transfer learning
CN107526785B (en) Text classification method and device
CN108985317B (en) Image classification method based on separable convolution and attention mechanism
CN109754078A (en) Method for optimization neural network
CN109948692B (en) Computer-generated picture detection method based on multi-color space convolutional neural network and random forest
CN113780292B (en) Semantic segmentation network model uncertainty quantification method based on evidence reasoning
CN109635763B (en) Crowd density estimation method
CN111985310A (en) Training method of deep convolutional neural network for face recognition
CN113628059A (en) Associated user identification method and device based on multilayer graph attention network
CN112183742A (en) Neural network hybrid quantization method based on progressive quantization and Hessian information
CN114490065A (en) Load prediction method, device and equipment
CN113283524A (en) Anti-attack based deep neural network approximate model analysis method
CN111832650A (en) Image classification method based on generation of confrontation network local aggregation coding semi-supervision
CN117061322A (en) Internet of things flow pool management method and system
CN115496144A (en) Power distribution network operation scene determining method and device, computer equipment and storage medium
CN115131558A (en) Semantic segmentation method under less-sample environment
CN117037258B (en) Face image detection method and device, storage medium and electronic equipment
CN112541530B (en) Data preprocessing method and device for clustering model
CN109934835B (en) Contour detection method based on deep strengthening network adjacent connection
CN116543259A (en) Deep classification network noise label modeling and correcting method, system and storage medium
CN111079930A (en) Method and device for determining quality parameters of data set and electronic equipment
CN112529637B (en) Service demand dynamic prediction method and system based on context awareness
CN115358952A (en) Image enhancement method, system, equipment and storage medium based on meta-learning
CN115829029A (en) Channel attention-based self-distillation implementation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant