CN116012598A - Image recognition method, device, equipment and medium based on Bayesian linearity - Google Patents
- Publication number: CN116012598A
- Application number: CN202111214197.5A
- Authority: CN (China)
- Prior art keywords: features, image, bayesian, model, distribution
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Image Analysis (AREA)
Abstract
The embodiments of the invention disclose an image recognition method, an image recognition device, electronic equipment and a storage medium based on Bayesian linearity. The method comprises: obtaining an image to be recognized; inputting the image to be recognized into a pre-trained convolutional neural network model for target recognition, wherein the convolutional neural network model comprises a channel attention module based on Bayesian linearity, the channel attention module average-pools the input initial multidimensional features to obtain pooled features, applies a Bayesian linear transformation to the pooled features to obtain linear features, expands the linear features to the same dimensions as the initial multidimensional features, and multiplies them with the initial multidimensional features to obtain the output multidimensional features of the channel attention module, and the convolutional neural network model performs image recognition on the image to be recognized based on the output multidimensional features; and outputting the recognition result of the convolutional neural network model for the image to be recognized. This addresses the problems of existing models, namely their large number of parameters and the difficulty of specifying a good prior distribution, and allows the uncertainty of the image to be predicted effectively.
Description
Technical Field
The embodiment of the invention relates to the technical field of image processing, in particular to an image recognition method, device, electronic equipment and storage medium based on Bayesian linearity.
Background
A CNN (Convolutional Neural Network) is trained by backpropagation on observed data (training samples) to obtain optimal point estimates of the model parameters, so the model outputs deterministic results. A CNN trained with point estimates of its parameters fits the observed data well but predicts unobserved data (test samples) less well, i.e. it overfits the existing training samples. Existing regularization methods such as early stopping, weight decay, L1/L2 regularization and dropout can alleviate overfitting to some extent, but the model itself still cannot measure uncertainty. For classification tasks, the softmax function maximizes the output probability score of a given class by compressing the scores of the other classes, and this score is not the model's confidence in the class it outputs.
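The point about softmax scores can be illustrated with a minimal NumPy sketch. The logits below are made-up values, not taken from this description; the sketch only shows that rescaling the logits leaves the predicted class unchanged while pushing its softmax score toward 1, so the score by itself does not behave like a calibrated confidence.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of class scores."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical class scores for three classes (illustrative values only).
logits = np.array([2.0, 1.0, 0.5])

p_base = softmax(logits)          # scores from the original logits
p_scaled = softmax(5.0 * logits)  # same ranking, logits multiplied by 5

# The predicted class is identical, but the top score is much closer to 1.
print(p_base.argmax(), p_base.max())
print(p_scaled.argmax(), p_scaled.max())
```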
To improve the generalization ability of the CNN model and allow the model to measure uncertainty, prior research introduced the Bayesian method to obtain the BCNN (Bayesian Convolutional Neural Network), which replaces the optimal point estimate of the model parameters with an estimate of their distribution. A BCNN first places a prior distribution on the parameters, then performs approximate gradient estimation by variational inference, and learns a posterior distribution of the parameters fitted to the observed data (training samples). The posterior distribution learned from the observed data is then used for inference on unobserved data (test samples).
The inventors found that when a BCNN is used to build a network model for image recognition, the model does support uncertainty estimation, but it still suffers from a large number of parameters, the difficulty of specifying a good prior distribution, and the need for approximate gradient estimation.
Disclosure of Invention
The invention provides an image recognition method, device, electronic equipment and storage medium based on Bayesian linearity, which address the technical problems that the existing Bayesian convolutional neural network model has many parameters, that a good prior distribution is difficult to specify, and that gradients must be estimated approximately.
In a first aspect, an embodiment of the present invention provides an image recognition method based on Bayesian linearity, including:
acquiring an image to be identified;
inputting the image to be identified into a pre-trained convolutional neural network model for target identification, wherein the convolutional neural network model comprises a channel attention module based on Bayesian linearity, the channel attention module is used for average-pooling the input initial multidimensional features to obtain pooled features, applying a Bayesian linear transformation to the pooled features to obtain linear features, expanding the linear features to the same dimensions as the initial multidimensional features, and multiplying them by the initial multidimensional features to obtain the output multidimensional features of the channel attention module, and the convolutional neural network model performs image identification on the image to be identified based on the output multidimensional features;
and outputting a recognition result of the convolutional neural network model on the image to be recognized.
In a second aspect, an embodiment of the present invention further provides an image recognition apparatus based on Bayesian linearity, including:
an image acquisition unit, used for acquiring an image to be identified;
an image recognition unit, used for inputting the image to be identified into a pre-trained convolutional neural network model for target recognition, wherein the convolutional neural network model comprises a channel attention module based on Bayesian linearity, the channel attention module is used for average-pooling the input initial multidimensional features to obtain pooled features, applying a Bayesian linear transformation to the pooled features to obtain linear features, expanding the linear features to the same dimensions as the initial multidimensional features, and multiplying them by the initial multidimensional features to obtain the output multidimensional features of the channel attention module, and the convolutional neural network model performs image recognition on the image to be identified based on the output multidimensional features;
and a result output unit, used for outputting the recognition result of the convolutional neural network model for the image to be identified.
In a third aspect, an embodiment of the present invention further provides an electronic device, including:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the electronic device to implement the image recognition method based on Bayesian linearity as described in the first aspect.
In a fourth aspect, embodiments of the present invention further provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image recognition method based on Bayesian linearity according to the first aspect.
The image recognition method, device, electronic equipment and storage medium based on Bayesian linearity acquire an image to be identified; input the image to be identified into a pre-trained convolutional neural network model for target identification, wherein the convolutional neural network model comprises a channel attention module based on Bayesian linearity, the channel attention module is used for average-pooling the input initial multidimensional features to obtain pooled features, applying a Bayesian linear transformation to the pooled features to obtain linear features, expanding the linear features to the same dimensions as the initial multidimensional features, and multiplying them by the initial multidimensional features to obtain the output multidimensional features of the channel attention module, and the convolutional neural network model performs image identification on the image to be identified based on the output multidimensional features; and output the recognition result of the convolutional neural network model for the image to be identified. Bayesian linearity is added to the channel attention module of the model, so that a channel attention mechanism based on Bayesian linearity is built into the model and the uncertainty of locally important information is captured. This addresses the problems that the existing Bayesian convolutional neural network model has many parameters, that a good prior distribution is difficult to specify, and that gradients must be estimated approximately, and it allows the uncertainty of the whole image to be predicted effectively.
Drawings
FIG. 1 is a flow chart of a Bayesian linear-based image recognition method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a channel attention module according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an image recognition device based on Bayesian linearity according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are for purposes of illustration and not of limitation. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
In the field of image recognition, uncertainty is an indicator of how confident an image recognition model is in its own prediction. In a Bayesian model there are two main types of uncertainty: occasional (aleatoric) uncertainty and cognitive (epistemic) uncertainty. Occasional uncertainty is inherent in the observed data, for example sensor noise or motion noise spread uniformly over the data set, and cannot be eliminated by enlarging the data set. Cognitive uncertainty arises because the model has not learned from enough samples; it is widespread because the observed data (training samples) cannot fully cover the features of unobserved data (test samples), and it can be reduced by enlarging the data set.
For example, suppose an image is taken from the roadside, roughly along the direction in which the road extends; the image may contain content elements such as vehicles, intersections, sidewalks, traffic lights, pedestrians, trees and houses. When these content elements are segmented by an image recognition model, segmentation errors may occur. Occasional uncertainty can lead to poor segmentation of vehicles, for example because the vehicle labels are noisy owing to different shooting distances and viewing angles. Cognitive uncertainty can lead to poor segmentation of sidewalks, for example because the training set contains few sidewalks, or because the image recognition model fitted to the training set is itself unsuitable for the segmentation task.
Occasional uncertainty is inherent and objective for an image recognition model, whereas cognitive uncertainty is subjective and can be eliminated. In general, there are two ways to reduce the uncertainty of an image recognition model: one is to enlarge the data set, for example by adding images of various sidewalks for recognition in the road scene above; the other is to design a model that fits the training set better, i.e. to change the model. In practice, however many samples are added, the observable data is always limited and cannot be exhaustive, and data acquisition is costly; as for changing the model, this only yields a model that fits the observed data (training samples) better and does not support estimating model uncertainty.
From the perspective of data distribution, the model uncertainty contained in cognitive uncertainty arises because training fits a distribution that reflects the features of the training set, while the features of the test samples to be inferred may not follow that distribution. In this case, predictions made with the learned model parameters carry large uncertainty. In the prior art, the Bayesian method is introduced so that, when the image recognition model is trained, the optimal point estimate of the parameters is replaced by a distribution estimate, which enables uncertainty estimation at inference time. However, distributions are estimated for the parameters of the entire image recognition model, which brings problems such as a large number of parameters and approximate gradient estimation. For this reason, an image recognition method based on Bayesian linearity is provided, in which a distribution is learned only for the parameters of the attention module in the image recognition model, so as to measure the uncertainty of locally important information and to address the problems of existing schemes: many parameters, the difficulty of specifying a good prior distribution, and approximate gradient estimation.
It should be noted that, for the sake of brevity, this specification is not exhaustive of all of the alternative embodiments, and after reading this specification, one skilled in the art will appreciate that any combination of features may constitute an alternative embodiment as long as the features do not contradict each other.
The following describes each embodiment in detail.
FIG. 1 is a flowchart of an image recognition method based on Bayesian linearity according to an embodiment of the present invention. The method is applied to an electronic device and, as shown in the figure, includes:
Step S110: acquire an image to be identified.
The scheme is used for image recognition and can recognize images either in real time or on demand, according to the user's recognition requirement. For real-time recognition, an image captured by an image acquisition device is used as the image to be identified, and the subsequent recognition process starts immediately after capture. Although the recognition result may lag slightly behind the capture time, this is still regarded as real-time recognition; an example is recognizing images captured by a surveillance camera in the security field. For on-demand recognition, the image to be identified is supplied by the user and is usually an image the user has prepared in advance; the subsequent recognition process is carried out after the image is acquired and the recognition result is output accordingly, for example when the user wants to search for images or to search for information using an image.
Step S120: input the image to be identified into a pre-trained convolutional neural network model for target identification, wherein the convolutional neural network model comprises a channel attention module based on Bayesian linearity, the channel attention module is used for average-pooling the input initial multidimensional features to obtain pooled features, applying a Bayesian linear transformation to the pooled features to obtain linear features, expanding the linear features to the same dimensions as the initial multidimensional features, and multiplying them by the initial multidimensional features to obtain the output multidimensional features of the channel attention module, and the convolutional neural network model performs image identification on the image to be identified based on the output multidimensional features.
Bayesian linearity is added to the channel attention module in the pre-trained convolutional neural network model. Overall, the channel attention mechanism based on Bayesian linearity effectively supports capturing the uncertainty of locally important information, so that the target can still be recognized accurately in the image to be identified even when the training data set does not contain an exhaustive set of non-target samples. As shown in FIG. 2, the input is a feature of dimension c×h×w; average pooling yields a feature of dimension c×1, which is passed through Bayesian linearity and then a Sigmoid function, expanded to the same size as the input, multiplied with the input, and output. The channel attention module in this scheme can be configured according to the selected network structure. Configuring a channel attention module in a network is common in the prior art, so this scheme does not describe the other layer modules or how the channel attention module cooperates with them.
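The data flow of FIG. 2 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the c×c weight shape, the absence of a bias term, the function names, and the sampling of weights as mean plus softplus-scaled Gaussian noise (the parameterization described later in this description) are all choices made for the sketch.

```python
import numpy as np

def softplus(rho):
    """sigma = log(1 + exp(rho)), keeps the standard deviation positive."""
    return np.log1p(np.exp(rho))

def bayesian_channel_attention(x, mu, rho, rng):
    """Forward pass of a channel attention block with a sampled linear layer.

    x   : (c, h, w) input feature map
    mu  : (c, c) mean of the weight distribution (assumed shape)
    rho : (c, c) pre-softplus standard deviation of the weight distribution
    rng : numpy.random.Generator used to sample the weights
    """
    pooled = x.mean(axis=(1, 2))                  # average pooling: (c, h, w) -> (c,)
    noise = rng.standard_normal(mu.shape)         # standard Gaussian sample
    weight = mu + softplus(rho) * noise           # one sample of the layer weights
    linear = weight @ pooled                      # Bayesian linear step: (c,)
    gate = 1.0 / (1.0 + np.exp(-linear))          # Sigmoid, one value per channel
    return x * gate[:, None, None]                # expand to (c, h, w) and multiply

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c, h, w = 8, 4, 4
    x = rng.standard_normal((c, h, w))
    mu = 0.1 * rng.standard_normal((c, c))
    rho = np.full((c, c), -3.0)
    out = bayesian_channel_attention(x, mu, rho, rng)
    print(out.shape)
```

Because the weights are resampled on every call, two forward passes on the same input generally give slightly different outputs, which is what allows uncertainty to be estimated from repeated passes.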
Cognitive uncertainty (model uncertainty) arises essentially because the observed data (training samples) is always limited and the features learned by the image recognition model are insufficient, which makes the model's predictions on unobserved data (test samples) uncertain. For example, consider an image recognition model trained on an existing network to identify "hot dogs" in an image. If the model, trained in the existing way, has never been trained on "non-hot dog" images, it may predict tomato-paste-coated legs or tomato-paste-coated bananas as hot dogs, mistaking them for real hot dogs coated with tomato paste. Because "non-hot dog" images cannot be enumerated exhaustively, model uncertainty has to be addressed in the model itself rather than simply by augmenting the data set. The training samples determine the generalization ability of the model, and a model based on the Bayesian method has an advantage on small data sets: a prior distribution is placed on each weight and bias parameter of the model, and the posterior distribution is approximated from limited samples during training, so that the posterior reflects the characteristics of the overall samples more faithfully. The BCNN model learns the distribution of the parameters from a limited training sample so that the parameter distribution of the model approximates the distribution of the overall samples, which enables uncertainty prediction on unobserved data. On the other hand, defining a good prior distribution depends heavily on domain knowledge, and learning parameter distributions for the whole model is relatively costly in efficiency.
To address this heavy dependence on domain knowledge and the efficiency cost, this scheme learns distributions only for the attention module in the model, so that the distribution of locally important information in the limited training samples approximates the distribution of the overall samples.
The attention mechanism directs computing resources toward the most informative part of the input signal. In a CNN, channel attention selectively enhances informative features and suppresses useless ones by capturing the dependencies among channels. Among existing channel attention modules, some capture the global dependencies among channels through an MLP (multi-layer perceptron) but reduce the dimension; others capture local dependencies among channels through convolution, without dimension reduction and with fewer parameters. In this scheme, Bayesian linearity is used to capture the local channel relationships, and the channel attention module based on Bayesian linearity can be embedded into any existing CNN backbone network to learn the probability distribution of locally important information.
Compared with conventional convolution, the channel attention module based on Bayesian linearity learns a distribution over the parameters from the training samples in order to approximate the distribution of the overall samples, moving from the point estimates of model parameters learned in conventional convolution to learning distributions over the parameters. One Bayesian linearity can be regarded as equivalent to infinitely many conventional linearities drawn from the same distribution, where the distribution is determined by the training samples and approximates the distribution of the overall samples.
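The ensemble view can be sketched by drawing many conventional linear layers from one weight distribution and looking at the spread of their outputs. The sketch below assumes a diagonal Gaussian weight distribution with mean `mu` and standard deviation `log(1 + exp(rho))`; the concrete numbers and the function name are illustrative only.

```python
import numpy as np

def softplus(rho):
    return np.log1p(np.exp(rho))

def sample_linear_outputs(x, mu, rho, rng, n_samples=5000):
    """Apply n_samples linear layers, all drawn from N(mu, softplus(rho)^2), to x.

    x : (d_in,) input vector; mu, rho : (d_out, d_in). Returns (n_samples, d_out).
    """
    sigma = softplus(rho)
    noise = rng.standard_normal((n_samples,) + mu.shape)
    weights = mu + sigma * noise       # a finite ensemble of ordinary linear layers
    return weights @ x                 # one output vector per ensemble member

rng = np.random.default_rng(0)
mu = np.array([[0.2, -0.1, 0.4, 0.0],
               [-0.3, 0.5, 0.1, 0.2]])
rho = np.full(mu.shape, -2.0)
x = np.array([1.0, -2.0, 0.5, 3.0])

outputs = sample_linear_outputs(x, mu, rho, rng)
mean_output = outputs.mean(axis=0)     # approaches mu @ x as n_samples grows
std_output = outputs.std(axis=0)       # spread across members: an uncertainty measure
```

The mean over the ensemble recovers the output of the mean weights, while the standard deviation across members gives a per-output uncertainty that a single point-estimate layer cannot provide.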
In existing models based on the Bayesian method, neural network optimization strategies are divided into MLE (Maximum Likelihood Estimation) and MAP (Maximum A Posteriori estimation), the latter of which adds a prior distribution over the parameters. Bayesian linearity is regarded as a probabilistic model, i.e. $p(y \mid x, \omega)$, where $y$ is the output given the input $x$ and the parameter $\omega$. Model training learns the parameter $\omega$ from the observed data (training samples) $D = (x, y)$ so that $\omega$ can be used to predict unobserved data. Given the observed data $D$, Bayes' formula expresses the posterior over $\omega$ in terms of the likelihood function and the prior, namely

$$p(\omega \mid x, y) = \frac{p(y \mid x, \omega)\, p(\omega)}{p(y \mid x)}$$

where $p(y \mid x, \omega)$ is the likelihood function and $p(\omega)$ is the prior distribution of the model parameter $\omega$. The marginal probability in the denominator can be regarded as a normalization constant and can be dropped.
Under Bayes' rule, the purpose of model training is to obtain the posterior distribution. The goal of the MLE method is to maximize the likelihood function, i.e.

$$\omega^{\mathrm{MLE}} = \arg\max_{\omega} \log p(y \mid x, \omega)$$

whereas the goal of the MAP method is to maximize the posterior distribution, namely:

$$\omega^{\mathrm{MAP}} = \arg\max_{\omega} \log p(\omega \mid x, y) = \arg\max_{\omega} \left[ \log p(y \mid x, \omega) + \log p(\omega) \right]$$

The first term corresponds to the maximum likelihood function, and the second term is a regularization term on the parameters. With a Gaussian prior, the second term is equivalent to L2 regularization; with a Laplacian prior, it is equivalent to L1 regularization. When $\log p(y \mid x, \omega)$ is differentiable with respect to the parameter $\omega$, the parameter can be updated by gradient descent (backpropagation). The MAP method is described below taking the mean square error loss as an example.
Let the likelihood function be $p(y \mid x, \omega) = \mathcal{N}(y \mid f(x, \omega), \beta^{-1})$, a Gaussian distribution with mean $f(x, \omega)$ and variance $\beta^{-1}$. The likelihood function can be understood as linear regression with constant variance, i.e. $y = f(x, \omega) + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \beta^{-1})$, where $\varepsilon$ is the error between the model prediction and the true result and follows a Gaussian distribution with mean 0 and variance $\beta^{-1}$. In general, the convolution operation is $f(x, \omega) = x \cdot \omega$, i.e. a matrix multiplication of the input and the parameter weights. Let the prior of the parameter $\omega$ be Gaussian, i.e. $p(\omega) = \mathcal{N}(0, \alpha^{-1})$. Then:

$$\omega^{\mathrm{MAP}} = \arg\min_{\omega} \left[ \frac{1}{2} \sum_{i} \left( y_i - f(x_i, \omega) \right)^2 + \frac{\alpha}{2\beta}\, \omega^{\top} \omega \right]$$

In this loss function, the first term is classical linear regression and the second term is L2 regularization, where $\alpha / \beta$ is the regularization coefficient. MLE and MAP are both point estimates of the parameters, whereas Bayesian inference computes the Bayesian posterior distribution $p(\omega \mid x, y)$ of the parameters on the observed data and predicts the label $\hat{y}$ of unobserved data $\hat{x}$ using an expectation, i.e.

$$p(\hat{y} \mid \hat{x}) = \mathbb{E}_{p(\omega \mid x, y)}\left[ p(\hat{y} \mid \hat{x}, \omega) \right]$$

Predicting unobserved data with the expectation of the likelihood function is equivalent to integrating over infinitely many identically distributed maximum likelihood models, and the need to integrate over all of them makes this expectation intractable for the model.
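The equivalence between MAP estimation with a Gaussian prior and L2-regularized least squares can be checked numerically. The sketch below assumes the linear case $f(x, \omega) = x \cdot \omega$ with a design matrix `X`; the function names and the synthetic data in the usage note are illustrative and not part of this description.

```python
import numpy as np

def map_estimate(X, y, alpha, beta):
    """Closed-form minimizer of 0.5 * ||y - X w||^2 + (alpha / (2 * beta)) * w^T w."""
    lam = alpha / beta                       # regularization coefficient alpha / beta
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def map_loss_grad(X, y, w, alpha, beta):
    """Gradient of the MAP loss above with respect to w."""
    return -X.T @ (y - X @ w) + (alpha / beta) * w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(50)
    w_map = map_estimate(X, y, alpha=2.0, beta=4.0)
    # The gradient vanishes at the MAP estimate (up to floating-point error).
    print(np.abs(map_loss_grad(X, y, w_map, 2.0, 4.0)).max())
```

Setting `alpha` to 0 removes the prior term, and the estimate then coincides with the ordinary least-squares (MLE) solution.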
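In this scheme, the objective function of the channel attention module minimizes the KL divergence between the distribution of the model parameters and the true Bayesian posterior distribution.
The training objective of the convolutional neural network model is to learn a distribution over the model parameters $\omega$ whose own parameters are $\theta$, written $q(\omega \mid \theta)$ (the same distribution is also written $p(\omega \mid \theta)$ in this description). The KL divergence measures the difference between two distributions, so the objective function minimizes the KL divergence between the distribution $q(\omega \mid \theta)$ of the model parameter $\omega$ and the true Bayesian posterior distribution $p(\omega \mid x, y)$. Specifically, the objective function is expressed as:

$$\theta^{*} = \arg\min_{\theta} \; \mathrm{KL}\left[ q(\omega \mid \theta) \,\|\, p(\omega) \right] - \mathbb{E}_{q(\omega \mid \theta)}\left[ \log p(y \mid x, \omega) \right]$$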
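where $q(\omega \mid \theta)$ represents the distribution of the model parameter $\omega$ with respect to the parameter $\theta$, $p(\omega)$ represents the prior distribution of the model parameter $\omega$, $p(y \mid x, \omega)$ represents the probabilistic model corresponding to Bayesian linearity, and $y$ represents the output given the input $x$ and the model parameter $\omega$. The objective function is derived as follows:

$$\begin{aligned}
\theta^{*} &= \arg\min_{\theta} \; \mathrm{KL}\left[ q(\omega \mid \theta) \,\|\, p(\omega \mid x, y) \right] \\
&= \arg\min_{\theta} \int q(\omega \mid \theta) \log \frac{q(\omega \mid \theta)}{p(\omega)\, p(y \mid x, \omega)} \, d\omega \\
&= \arg\min_{\theta} \; \mathrm{KL}\left[ q(\omega \mid \theta) \,\|\, p(\omega) \right] - \mathbb{E}_{q(\omega \mid \theta)}\left[ \log p(y \mid x, \omega) \right]
\end{aligned}$$

The purpose of this cost function is to learn the distribution parameter $\theta$ such that $q(\omega \mid \theta)$ approximates the true Bayesian posterior distribution $p(\omega \mid x, y)$. The first term is the KL divergence between the distribution to be learned and the prior $p(\omega)$ of the model parameter $\omega$; its cost depends on the prior. The second term is the negative expectation of the log-likelihood; its cost depends on the data.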
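The prior-dependent first term can be made concrete under two assumptions that are labeled here because this part of the text does not fix them: the distribution to be learned is a diagonal Gaussian with mean `mu` and standard deviation `sigma`, and the prior is the zero-mean Gaussian with precision `alpha` used in the MAP example. In that case the KL term has a closed form, and a Monte Carlo average of log-density differences over sampled parameters approaches it. Function names are illustrative.

```python
import numpy as np

def kl_diag_gauss_to_prior(mu, sigma, alpha):
    """Closed-form KL[N(mu, sigma^2) || N(0, 1/alpha)], summed over dimensions."""
    return np.sum(0.5 * (alpha * (sigma**2 + mu**2) - 1.0 - np.log(alpha * sigma**2)))

def kl_monte_carlo(mu, sigma, alpha, rng, n_samples=200_000):
    """Monte Carlo estimate of the same KL: mean of log q(w) - log p(w), w ~ q."""
    noise = rng.standard_normal((n_samples,) + mu.shape)
    omega = mu + sigma * noise                                   # samples from q
    log_q = -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * noise**2
    log_p = 0.5 * np.log(alpha / (2.0 * np.pi)) - 0.5 * alpha * omega**2
    return np.mean(np.sum(log_q - log_p, axis=-1))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = np.array([0.5, -1.0])
    sigma = np.array([0.3, 0.8])
    print(kl_diag_gauss_to_prior(mu, sigma, alpha=1.0))
    print(kl_monte_carlo(mu, sigma, alpha=1.0, rng=rng))
```

The sampled estimate is what remains usable when no closed form exists, which is the situation the Monte Carlo approximation of the objective is meant to handle.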
To further reduce the computational cost of minimizing the cost function, a variational approximation is employed. Under certain conditions, the derivative of an expectation can be expressed as the expectation of a derivative, and based on the unbiased Monte Carlo gradient, the objective function can be expressed as:

$$l(\theta) \approx \log p(\omega \mid \theta) - \log p(\omega) - \log p(y \mid x, \omega)$$

where $p(\omega \mid \theta)$ represents the distribution of the model parameter $\omega$ with respect to the parameter $\theta$, $p(\omega)$ represents the prior distribution of the model parameter $\omega$, $p(y \mid x, \omega)$ represents the probabilistic model corresponding to Bayesian linearity, $y$ represents the output given the input $x$ and the model parameter $\omega$, and $\omega$ is a sample drawn from $p(\omega \mid \theta)$.
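The distribution of the model parameter $\omega$ with respect to the parameter $\theta$ is a diagonal Gaussian distribution, the model parameter $\omega$ is sampled through a standard Gaussian, and the parameter $\theta = (\mu, \rho)$, wherein:

$$\omega = \mu + \log(1 + \exp(\rho)) \circ \xi, \quad \xi \sim \mathcal{N}(0, I)$$

$$\mu \leftarrow \mu - \lambda \frac{\partial l(\theta)}{\partial \mu}, \quad \rho \leftarrow \rho - \lambda \frac{\partial l(\theta)}{\partial \rho}$$

where $\lambda$ is the learning rate, $\xi$ is standard Gaussian noise, $\circ$ denotes element-wise multiplication, and the parameter $\theta$ comprises the mean $\mu$ and the standard deviation $\sigma = \log(1 + \exp(\rho))$.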
This representation of the objective function based on unbiased Monte Carlo gradients is an approximation of the cost function. Assuming that the distribution to be learned, $p(\omega \mid \theta)$, is a diagonal Gaussian distribution, the model parameters $\omega$ can be sampled through a standard Gaussian. The parameters of the diagonal Gaussian comprise the mean $\mu$ and the standard deviation $\sigma = \log(1 + \exp(\rho))$, so the variational posterior parameters to be learned are $\theta = (\mu, \rho)$. The sampling of the model parameter $\omega$ thus becomes:

$$\omega = \mu + \log(1 + \exp(\rho)) \circ \xi, \quad \xi \sim \mathcal{N}(0, I)$$

In this way, $\mu$ and $\rho$ are updated after each backward pass, so that the distribution $p(\omega \mid \theta)$ continuously approaches the true Bayesian posterior distribution as it is trained on the observed data. The parameters $\mu$ and $\rho$ of the distribution to be learned are computed as:

$$\mu \leftarrow \mu - \lambda \frac{\partial l(\theta)}{\partial \mu}, \quad \rho \leftarrow \rho - \lambda \frac{\partial l(\theta)}{\partial \rho}$$

where $\lambda$ is the learning rate, the cost function $l(\theta)$ is a function of the model parameter $\omega$, and $\omega$ in turn is a function of the parameters $\mu$ and $\rho$ of the distribution it follows. With this variational approximation, the update fits into ordinary network backpropagation. The difference is that a conventional convolutional layer updates the parameter $\omega$, whereas Bayesian linearity updates the distribution parameters $\theta = (\mu, \rho)$.
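The update of $\mu$ and $\rho$ can be sketched for a toy linear regression with hand-written gradients. This is a sketch under stated assumptions rather than the training procedure of the embodiment: it assumes the Gaussian likelihood with precision `beta` and the Gaussian prior with precision `alpha` from the MAP example, one noise sample per step, and hypothetical function names. The gradients follow from substituting the sampled $\omega$ into $l(\theta)$ and applying the chain rule through $\omega = \mu + \sigma \circ \xi$; setting `alpha=0` drops the prior's contribution to the gradient.

```python
import numpy as np

def softplus(rho):
    return np.log1p(np.exp(rho))

def bayes_step(mu, rho, X, y, rng, lr=1e-3, alpha=1.0, beta=1.0):
    """One gradient step on l = log q(w|theta) - log p(w) - log p(y|x,w)."""
    noise = rng.standard_normal(mu.shape)          # standard Gaussian sample
    sigma = softplus(rho)
    omega = mu + sigma * noise                     # sampled model parameter
    # d/d omega of [-log p(w) - log p(y|x,w)] for Gaussian prior and likelihood.
    g_omega = alpha * omega - beta * X.T @ (y - X @ omega)
    # Through the sampling equation, log q(w|theta) equals -sum(log sigma) plus
    # terms that do not depend on theta, so it only contributes -1/sigma.
    grad_mu = g_omega
    grad_rho = (g_omega * noise - 1.0 / sigma) / (1.0 + np.exp(-rho))
    return mu - lr * grad_mu, rho - lr * grad_rho

def fit(X, y, steps=3000, seed=0):
    """Run repeated steps; returns the learned mean and standard deviation."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(X.shape[1])
    rho = np.full(X.shape[1], -3.0)
    for _ in range(steps):
        mu, rho = bayes_step(mu, rho, X, y, rng)
    return mu, softplus(rho)

if __name__ == "__main__":
    data_rng = np.random.default_rng(42)
    X = data_rng.standard_normal((200, 2))
    y = X @ np.array([1.5, -0.7]) + data_rng.standard_normal(200)
    mu, sigma = fit(X, y)
    print(mu, sigma)
```

On this synthetic data the learned mean should settle near the generating weights, and the learned standard deviation indicates how tightly the data constrain each weight.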
When the gradient of the cost function $l(\theta)$ is taken with respect to $\theta = (\mu, \rho)$, the $\log p(\omega)$ term is given by the prior and is independent of the distribution parameters, so its gradient is 0. Model training is therefore not constrained by the given prior distribution; that is, the distribution to be learned, $p(\omega \mid \theta)$, is data-driven and independent of the prior. Overall, in this scheme, training the model on limited observed data drives the parameter distribution toward the true distribution of the overall samples.
Step S130: outputting the recognition result of the convolutional neural network model for the image to be recognized.
In this scheme, target recognition is performed on the image to be recognized by the pre-trained convolutional neural network model, and the recognition result is then output. Specifically, the region in which a recognized target is located may be marked in the image in different ways, or the image may be segmented based on the recognized target. The specific manner of presentation can be implemented with existing techniques and is not described again here.
Overall, cognitive (epistemic) uncertainty arises because the observed data is always limited: the limited samples learned by the model do not fully cover the features of the overall sample, so predictions on unobserved data are uncertain. One way to address uncertainty prediction is to introduce Bayesian parameter estimation, but existing Bayesian models are costly when learning parameter distributions because the number of parameters expands. In this scheme, Bayesian linearity is applied to the channel attention module of the model, which guides the model to attend to locally discriminative information. Uncertainty is thereby captured for the features of locally important information, so that the uncertainty of the whole image can be effectively predicted; at the same time, the dependencies of the global channels can be captured under the condition of dimension uncertainty.
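One common way to read out such uncertainty from a model with sampled weights is to repeat the stochastic forward pass and examine the spread of the outputs. The text above does not specify how the uncertainty is computed, so the following is only an illustrative sketch; `predictive_stats` and `toy_forward` are hypothetical names, and the one-weight layer is a stand-in for a real network.

```python
import random
import statistics


def predictive_stats(stochastic_forward, x, n_samples, rng):
    """Run several stochastic forward passes; return (mean, variance) of the scores."""
    scores = [stochastic_forward(x, rng) for _ in range(n_samples)]
    return statistics.fmean(scores), statistics.pvariance(scores)


def toy_forward(x, rng, mu=1.0, sigma=0.5):
    """Hypothetical one-weight Bayesian layer: score = (mu + sigma * eps) * x."""
    return (mu + sigma * rng.gauss(0.0, 1.0)) * x
```

A call such as `predictive_stats(toy_forward, 2.0, 500, random.Random(0))` returns the average score together with its variance across samples; a larger variance indicates that the sampled weights disagree more about the input.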
The method comprises: acquiring an image to be recognized; inputting the image to be recognized into a pre-trained convolutional neural network model for target recognition; and outputting the recognition result of the convolutional neural network model for the image to be recognized. The convolutional neural network model comprises a channel attention module based on Bayesian linearity. The channel attention module average-pools the input initial multidimensional features to obtain pooled features, applies Bayesian linearity to the pooled features to obtain linear features, expands the linear features to the same dimensions as the initial multidimensional features, and multiplies them by the initial multidimensional features to obtain the output multidimensional features of the channel attention module; the convolutional neural network model then performs image recognition on the image to be recognized based on the output multidimensional features. Because Bayesian linearity is added to the channel attention module, a channel attention mechanism based on Bayesian linearity is constructed in the model and uncertainty is captured for locally important information. This addresses the problems of existing Bayesian convolutional neural network models, namely the large number of parameters, the difficulty of specifying a good prior distribution, and the need for approximate gradient estimation, and allows the uncertainty of the whole image to be effectively predicted.
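The four operations of the channel attention module (average pooling, Bayesian linearity, expansion, multiplication) can be sketched as a forward pass over a C x H x W feature map stored as nested lists. This is a minimal illustration under stated assumptions: the function names, the square C x C weight shape, and the absence of a bias term are all hypothetical, and no activation is applied to the linear features because the passage above does not mention one.

```python
import math
import random


def softplus(rho):
    """Standard deviation sigma = log(1 + exp(rho))."""
    return math.log1p(math.exp(rho))


def average_pool(features):
    """Global average pooling: one mean per channel of a C x H x W nested list."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in features]


def bayesian_linear(pooled, mu, rho, rng):
    """Linear layer whose weights are sampled as mu + softplus(rho) * eps."""
    out = []
    for mu_row, rho_row in zip(mu, rho):
        weights = [m + softplus(r) * rng.gauss(0.0, 1.0) for m, r in zip(mu_row, rho_row)]
        out.append(sum(w * p for w, p in zip(weights, pooled)))
    return out


def channel_attention(features, mu, rho, rng):
    """Pool, apply the Bayesian linear layer, then rescale each channel."""
    pooled = average_pool(features)
    linear = bayesian_linear(pooled, mu, rho, rng)
    # Expanding the per-channel value to H x W and multiplying element-wise
    # is equivalent to scaling every element of channel c by linear[c].
    return [[[linear[c] * v for v in row] for row in ch] for c, ch in enumerate(features)]
```

Because the weights are resampled on every call, two forward passes over the same features generally give different outputs unless softplus(ρ) is close to zero.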
Fig. 3 is a schematic structural diagram of an image recognition device based on Bayesian linearity according to an embodiment of the present invention. Referring to fig. 3, the image recognition device based on Bayesian linearity includes an image acquisition unit 210, an image recognition unit 220, and a result output unit 230.
The image acquisition unit 210 is configured to acquire an image to be recognized. The image recognition unit 220 is configured to input the image to be recognized into a pre-trained convolutional neural network model for target recognition, where the convolutional neural network model includes a channel attention module based on Bayesian linearity; the channel attention module is configured to average-pool the input initial multidimensional features to obtain pooled features, apply Bayesian linearity to the pooled features to obtain linear features, expand the linear features to the same dimensions as the initial multidimensional features, and multiply them by the initial multidimensional features to obtain the output multidimensional features of the channel attention module; and the convolutional neural network model performs image recognition on the image to be recognized based on the output multidimensional features. The result output unit 230 is configured to output the recognition result of the convolutional neural network model for the image to be recognized.
On the basis of the above embodiment, the objective function of the channel attention module is to minimize the KL divergence between the distribution of the model parameters and the true Bayesian posterior distribution.
On the basis of the above embodiment, the objective function is expressed as:
θ* = argmin_θ { KL[q(ω|θ) ‖ p(ω)] - E_q(ω|θ)[log p(y|x,ω)] }
where q(ω|θ) represents the distribution of the model parameter ω with respect to the parameter θ, p(ω) represents the prior distribution of the model parameter ω, p(y|x,ω) represents the probability model corresponding to the Bayesian linear layer, and y represents the output given the input x and the minimized model parameter ω.
On the basis of the above embodiment, the objective function is expressed as:
l(θ)≈log p(ω|θ)-log p(ω)-log p(y|x,ω)
where p(ω|θ) represents the distribution of the minimized model parameter ω with respect to the parameter θ, p(ω) represents the prior distribution of the model parameter ω, p(y|x,ω) represents the probability model corresponding to the Bayesian linear layer, and y represents the output given the input x and the minimized model parameter ω.
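The step from the KL divergence objective to this one-sample approximation can be written out as follows. This is a sketch under the usual variational-inference assumptions (a prior p(ω) that does not depend on the input x, and a single Monte Carlo sample ω drawn from the distribution being learned, written q(ω|θ) here); it is not taken verbatim from the source.

```latex
\begin{aligned}
\mathrm{KL}\big[q(\omega\mid\theta)\,\|\,p(\omega\mid x,y)\big]
 &= \int q(\omega\mid\theta)\,\log\frac{q(\omega\mid\theta)}{p(\omega\mid x,y)}\,d\omega \\
 &= \int q(\omega\mid\theta)\,\big[\log q(\omega\mid\theta)-\log p(\omega)-\log p(y\mid x,\omega)\big]\,d\omega
    + \log p(y\mid x) \\
 &\approx \log q(\omega\mid\theta)-\log p(\omega)-\log p(y\mid x,\omega) + \text{const},
 \qquad \omega\sim q(\omega\mid\theta).
\end{aligned}
```

The second line uses Bayes' rule, p(ω|x,y) = p(y|x,ω) p(ω) / p(y|x). The term log p(y|x) does not depend on θ, so dropping it leaves the minimizer unchanged and yields the three-term cost l(θ).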
On the basis of the above embodiment, the distribution of the minimized model parameter ω with respect to the parameter θ is a diagonal Gaussian distribution, the model parameter ω is sampled by way of a standard Gaussian, and the parameter θ = (μ, ρ), wherein:
ω = μ + log(1+exp(ρ)) ∘ ε, with ε drawn from a standard Gaussian N(0, I)
μ ← μ - λ ∂l(θ)/∂μ, ρ ← ρ - λ ∂l(θ)/∂ρ
where λ is the learning rate, ∘ denotes element-wise multiplication, and the parameter θ comprises the mean μ and the standard deviation σ = log(1+exp(ρ)).
The image recognition device based on Bayesian linearity provided by the embodiment of the present invention is contained in the electronic device, can be used to execute any of the image recognition methods based on Bayesian linearity provided by the above embodiments, and has the corresponding functions and beneficial effects.
It should be noted that, in the embodiment of the image recognition device based on Bayesian linearity, the units and modules included are divided only according to functional logic; the division is not limited to the above, so long as the corresponding functions can be implemented. In addition, the specific names of the functional units are only for distinguishing them from one another and are not intended to limit the protection scope of the present invention.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 4, the electronic device includes a processor 310, a memory 320, an input device 330, an output device 340, and a communication device 350; the number of processors 310 in the electronic device may be one or more, one processor 310 being taken as an example in fig. 4; the processor 310, the memory 320, the input device 330, the output device 340, and the communication device 350 in the electronic device may be connected by a bus or other means, which is illustrated in fig. 4 as a bus connection.
The memory 320 is a computer readable storage medium, and may be used to store a software program, a computer executable program, and modules, such as program instructions/modules corresponding to the bayesian linearity-based image recognition method in the embodiment of the present invention (for example, the image acquisition unit 210, the image recognition unit 220, and the result output unit 230 in the bayesian linearity-based image recognition apparatus). The processor 310 executes various functional applications of the electronic device and data processing by running software programs, instructions and modules stored in the memory 320, i.e., implements the bayesian linear based image recognition method described above.
The input device 330 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. The output device 340 may include a display device such as a display screen.
The electronic equipment comprises the image recognition device based on Bayesian linearity, can be used for executing any image recognition method based on Bayesian linearity, and has corresponding functions and beneficial effects.
The embodiments of the present invention also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform the relevant operations in the bayesian linear based image recognition method provided in any embodiment of the present application, and have corresponding functions and beneficial effects.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product.
Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. 
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include computer-readable media in the form of volatile memory, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.
Claims (10)
1. An image recognition method based on Bayesian linearity, characterized by comprising the following steps:
acquiring an image to be recognized;
inputting the image to be recognized into a pre-trained convolutional neural network model for target recognition, wherein the convolutional neural network model comprises a channel attention module based on Bayesian linearity, the channel attention module is used for average-pooling the input initial multidimensional features to obtain pooled features, applying Bayesian linearity to the pooled features to obtain linear features, expanding the linear features to the same dimensions as the initial multidimensional features, and multiplying the linear features by the initial multidimensional features to obtain the output multidimensional features of the channel attention module, and the convolutional neural network model performs image recognition on the image to be recognized based on the output multidimensional features;
and outputting the recognition result of the convolutional neural network model for the image to be recognized.
2. The method of claim 1, wherein the objective function of the channel attention module is to minimize the KL divergence between the distribution of the model parameters and the true Bayesian posterior distribution.
3. The method of claim 1, wherein the objective function is expressed as:
θ* = argmin_θ { KL[q(ω|θ) ‖ p(ω)] - E_q(ω|θ)[log p(y|x,ω)] }
where q(ω|θ) represents the distribution of the model parameter ω with respect to the parameter θ, p(ω) represents the prior distribution of the model parameter ω, p(y|x,ω) represents the probability model corresponding to the Bayesian linear layer, and y represents the output given the input x and the minimized model parameter ω.
4. The method of claim 2, wherein the objective function is expressed as:
l(θ)≈log p(ω|θ)-log p(ω)-log p(y|x,ω)
where p(ω|θ) represents the distribution of the minimized model parameter ω with respect to the parameter θ, p(ω) represents the prior distribution of the model parameter ω, p(y|x,ω) represents the probability model corresponding to the Bayesian linear layer, and y represents the output given the input x and the minimized model parameter ω.
5. The method of claim 4, wherein the distribution of the minimized model parameter ω with respect to the parameter θ is a diagonal Gaussian distribution, the model parameter ω is sampled by way of a standard Gaussian, and the parameter θ = (μ, ρ), wherein:
ω = μ + log(1+exp(ρ)) ∘ ε, with ε drawn from a standard Gaussian N(0, I)
μ ← μ - λ ∂l(θ)/∂μ, ρ ← ρ - λ ∂l(θ)/∂ρ
where λ is the learning rate, ∘ denotes element-wise multiplication, and the parameter θ comprises the mean μ and the standard deviation σ = log(1+exp(ρ)).
6. An image recognition apparatus based on Bayesian linearity, comprising:
an image acquisition unit, used for acquiring an image to be recognized;
an image recognition unit, used for inputting the image to be recognized into a pre-trained convolutional neural network model for target recognition, wherein the convolutional neural network model comprises a channel attention module based on Bayesian linearity, the channel attention module is used for average-pooling the input initial multidimensional features to obtain pooled features, applying Bayesian linearity to the pooled features to obtain linear features, expanding the linear features to the same dimensions as the initial multidimensional features, and multiplying the linear features by the initial multidimensional features to obtain the output multidimensional features of the channel attention module, and the convolutional neural network model performs image recognition on the image to be recognized based on the output multidimensional features;
and a result output unit, used for outputting the recognition result of the convolutional neural network model for the image to be recognized.
7. The apparatus of claim 6, wherein the objective function of the channel attention module is to minimize the KL divergence between the distribution of the model parameters and the true Bayesian posterior distribution.
8. The apparatus of claim 6, wherein the objective function is expressed as:
θ* = argmin_θ { KL[q(ω|θ) ‖ p(ω)] - E_q(ω|θ)[log p(y|x,ω)] }
where q(ω|θ) represents the distribution of the model parameter ω with respect to the parameter θ, p(ω) represents the prior distribution of the model parameter ω, p(y|x,ω) represents the probability model corresponding to the Bayesian linear layer, and y represents the output given the input x and the minimized model parameter ω.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the electronic device to implement the image recognition method based on Bayesian linearity of any of claims 1-5.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the image recognition method based on Bayesian linearity according to any of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111214197.5A CN116012598A (en) | 2021-10-19 | 2021-10-19 | Image recognition method, device, equipment and medium based on Bayesian linearity |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116012598A (en) | 2023-04-25 |
Family
ID=86023469
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111214197.5A Pending CN116012598A (en) | 2021-10-19 | 2021-10-19 | Image recognition method, device, equipment and medium based on Bayesian linearity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116012598A (en) |
2021-10-19: CN application CN202111214197.5A filed, published as CN116012598A; status: Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117689966A (en) * | 2024-02-04 | 2024-03-12 | 中国科学院深圳先进技术研究院 | Quantum Bayesian neural network-based magnetic resonance image classification method |
CN117689966B (en) * | 2024-02-04 | 2024-05-24 | 中国科学院深圳先进技术研究院 | Quantum Bayesian neural network-based magnetic resonance image classification method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||