CN111652246B - Image self-adaptive sparsization representation method and device based on deep learning - Google Patents
Info
- Publication number
- CN111652246B (application CN202010385699.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- feature
- value
- deep learning
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/513—Sparse representations
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
An image adaptive sparsification representation method and device based on deep learning are disclosed. The method comprises the following steps: A1, selecting an arbitrary deep convolutional neural network model M, adding a deep-learning module based on a semi-hard attention mechanism to the convolutional layer at each stage of convolution, and constructing a new deep convolutional neural network model M'; A2, setting a linearly increasing semi-hard attention sparsification value range to obtain a sparse image representation; and A3, setting a loss function suited to the task and training the whole deep convolutional neural network model M' with back-propagation. Without introducing extra time or space complexity, the method stably improves the recognition accuracy of deep convolutional models on computer vision tasks such as image recognition and object detection.
Description
Technical Field
The invention relates to the fields of computer vision and deep learning, and in particular to an image adaptive sparsification representation method and device based on deep learning.
Background
Computer vision captures natural scenes with a camera, or generates images with a computer, and uses electronic equipment to recognize, locate, and monitor targets in those images. It can be regarded as the application of machine learning to the visual domain and is an important component of artificial intelligence. The main research content of computer vision can be summarized as follows: pictures or videos are acquired, preprocessed, and analyzed to obtain the information we need, which is usually called features. In short, cameras and electronic devices are used to capture the intrinsic information of pictures or videos.
Computer vision is a comprehensive discipline that touches a wide range of fields. At the current stage of research, computer vision attempts to build what is commonly called an artificial intelligence (AI) system. In recent years, the theory and techniques around computer vision have mainly focused on extracting high-dimensional features from images or videos as a representation of the image or video content.
Traditional feature extraction mainly relies on hand-crafted feature designs, such as the classic SIFT (Scale-Invariant Feature Transform) feature. SIFT consists of four basic steps: (1) scale-space extremum detection: extreme points are searched at every position of the multi-scale representation of the scaled image, and potential interest points that are invariant to scale and rotation are identified with a difference-of-Gaussian function; (2) keypoint localization: at each position and scale, a fine model is fitted to judge whether the candidate extreme points detected in the first step are stable, and the stable ones are kept as keypoints for subsequent computation; (3) orientation assignment: gradient directions are determined from the local image information and one or more orientations are assigned to each keypoint, so that all subsequent operations on the image data are transformations relative to the orientation, scale, and position of the keypoints, which ultimately yields features robust to rotation and spatial changes; (4) keypoint description: local image gradients are measured at different scales within the neighbourhood of each keypoint. The set of all such gradient descriptors is the SIFT feature of the image, which is robust to large local shape deformations and illumination changes.
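For concreteness, the following is a minimal usage sketch (not part of the patent) of this classic pipeline using OpenCV's SIFT implementation; the input file name is a placeholder.

```python
# Minimal SIFT usage sketch (illustrative only; "example.jpg" is a placeholder path).
import cv2

image = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()                                       # steps (1)-(3): scale-space extrema,
keypoints, descriptors = sift.detectAndCompute(image, None)    # keypoint filtering, orientation

# Each keypoint stores location, scale and dominant orientation; each 128-dimensional
# descriptor row is the local-gradient histogram of step (4).
print(len(keypoints), descriptors.shape)
```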
Compared with traditional feature extraction methods such as SIFT, modern deep-learning feature extraction from images is much simpler in design and comprises only three parts, as sketched below: (1) convolutional layers, which convolve local image information to obtain features with local receptive fields; (2) non-linear layers, which enhance the representational capacity of the convolutional outputs; and (3) fully connected layers, which transform the global image information to obtain features with a global receptive field. The learned features are similar in spirit to traditional features, being essentially a representation of the position- and rotation-invariant content of the image, but each network layer is trained on a specific data set with back-propagation and shows a more robust representation capability when large amounts of data are available.
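A minimal PyTorch sketch of this three-part structure (layer sizes and names are illustrative assumptions, not taken from the patent):

```python
import torch
import torch.nn as nn

class TinyExtractor(nn.Module):
    """Convolution -> non-linearity -> fully connected layer, as described above."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # (1) local receptive field
        self.act = nn.ReLU()                                     # (2) non-linear layer
        self.fc = nn.Linear(16, num_classes)                     # (3) global transformation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.act(self.conv(x))       # local features
        x = x.mean(dim=(2, 3))           # pool to a per-channel global descriptor
        return self.fc(x)                # global receptive field / class scores

out = TinyExtractor()(torch.randn(2, 3, 32, 32))
print(out.shape)  # torch.Size([2, 10])
```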
Disclosure of Invention
The main object of the invention is to provide an image adaptive sparsification representation method and device based on deep learning, so that the recognition accuracy of deep convolutional models on computer vision tasks such as image recognition and object detection is improved without introducing extra time or space complexity.
In order to achieve the purpose, the invention adopts the following technical scheme:
an image self-adaptive sparsification characterization method based on deep learning comprises the following steps:
a1, selecting an arbitrary deep convolutional neural network model M, adding a deep learning method based on a semi-hard attention mechanism at each stage of convolutional operation, adding the semi-hard attention mechanism into a convolutional layer, and constructing a new deep convolutional neural network model M';
a2, setting a linear increasing semihard attention sparse value domain for obtaining sparse image representation;
a3, setting a loss function suitable for a task, and training a whole deep convolutional neural network model M' by utilizing back propagation;
the semi-hard attention mechanism is that the neural network learns the weight value of the image feature by using the statistical information of the image feature, and when the weight value is smaller than a set value range k, the image feature corresponding to the weight value smaller than k is reset to zero.
Further, the method comprises the following steps:
in the step A1, multiple convolution operations are used to gradually extract image local information, and then the image features with the local information are convolved, so as to extract global information.
In the step A1, the convolution operation is as shown in formula (1):

F_{i+1,j} = Conv(F_{i,*})   (1)

wherein Conv represents the convolution operation, F_{i+1,j} represents the j-th feature output by the (i+1)-th convolutional layer, and F_{i,*} represents all the features of the i-th layer.

An attention mechanism is then introduced as formula (2), in which the mean of each image feature is used to determine the importance of that feature:

v_{i+1,j} = avgpool(F_{i+1,j})   (2)

wherein F_{i+1,j} is a two-dimensional feature of height h and width w, and avgpool averages this two-dimensional feature, i.e. v_{i+1,j} = (1/(hw)) Σ_{a=1}^{h} Σ_{b=1}^{w} F_{i+1,j}(a, b).

The means are then mapped to weights in [0,1] by a linear transformation and a non-linear activation function:

v'_{i+1,*} = δ(W v_{i+1,*})   (3)

wherein v'_{i+1,*} is the attention value, W is the learnable weight of the linear transformation, and δ represents the sigmoid activation function.
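The following is a minimal PyTorch sketch of formulas (1)-(3) (the module and variable names are illustrative assumptions): the convolution output is average-pooled per channel, passed through the learnable linear map W and a sigmoid, yielding one weight in [0,1] per feature map.

```python
import torch
import torch.nn as nn

class AttentionWeights(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.W = nn.Linear(channels, channels, bias=False)  # learnable linear transformation W

    def forward(self, F_out: torch.Tensor) -> torch.Tensor:
        # F_out: convolution output F_{i+1,*} of shape (batch, channels, h, w)
        v = F_out.mean(dim=(2, 3))            # formula (2): per-channel mean (avgpool)
        return torch.sigmoid(self.W(v))       # formula (3): weights v' in [0, 1]

weights = AttentionWeights(64)(torch.randn(8, 64, 56, 56))
print(weights.shape)  # torch.Size([8, 64])
```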
Preferably, a semi-hard attention mechanism is added every third convolution layer for feature sparsification.
In the step A2, a dynamic value-range function is set:

y = min(f(x), k)   (4)

wherein f(x) is a linear function and x represents the number of iterations in training, increasing from 0 to the maximum number of iterations; once the value of f(x) exceeds k, the dynamic value range is fixed at k and no longer changes. As the network iterates, the weight value of each image feature first learns a locally optimal solution, the value range below which weights are zeroed then increases gradually, and the network finally converges to an overall optimal solution.

At each iteration, the attention values v'_{i+1,j} that are smaller than the dynamic value range y in formula (4) are set to 0 and then applied to the convolution features of the current layer, resulting in self-sparsified features:

F'_{i+1,*} = v'_{i+1,*} * F_{i+1,*}   (5)
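A minimal sketch of formulas (4) and (5) (the slope and cap are the example values f(x) = 3e-5·x and k = 0.3 given in the detailed description; the function names are illustrative):

```python
import torch

def dynamic_threshold(iteration: int, slope: float = 3e-5, k: float = 0.3) -> float:
    """Formula (4): y = min(f(x), k) with a linear f(x)."""
    return min(slope * iteration, k)

def semi_hard_attention(features: torch.Tensor, weights: torch.Tensor, iteration: int) -> torch.Tensor:
    """Formula (5): zero the attention values below the threshold and re-weight the features."""
    y = dynamic_threshold(iteration)
    weights = torch.where(weights < y, torch.zeros_like(weights), weights)
    return weights[:, :, None, None] * features   # F' = v' * F

sparse = semi_hard_attention(torch.randn(8, 64, 56, 56), torch.rand(8, 64), iteration=5000)
```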
in the step A3, the cross entropy function is used to process the image classification related task, and the mean square error loss function is used to process the target detection related task.
In the step A3, for the ImageNet image classification task, two fully connected layers and a softmax layer are appended to the one-dimensional global feature to output the predicted value of each class, and a cross-entropy loss is used to train the whole network with back-propagation:

H(p, q) = −Σ_{i=1}^{n} p_i · log(q_i)

where n represents the number of classes, p represents the correct answer given by the label, and q represents the predicted values output by the trained model.
An image adaptive sparsification representation device based on deep learning comprises a computer-readable storage medium and a processor, the computer-readable storage medium storing an executable program which, when executed by the processor, implements the above image adaptive sparsification representation method based on deep learning.
A computer-readable storage medium stores an executable program which, when executed by a processor, implements the deep-learning-based image adaptive sparsification representation method.
The invention has the following beneficial effects:
the invention provides an image self-adaptive sparsity characterization method and device based on deep learning, which can be well fused in the current mainstream deep convolution models (such as ResNet) and stably improve the recognition accuracy of the deep convolution models on computer vision tasks such as image recognition, target detection and the like under the condition of not introducing extra time complexity and space complexity. The invention simultaneously proves the effectiveness of the ImageNet data set and the COCO data set which are widely used.
With this method, after adaptive sparsification is added to any deep learning model, the generalization and robustness of the model are clearly enhanced, i.e. the representational capability of the image features is strengthened, while the time and space complexity of the model remain unchanged.
Detailed Description
The embodiments of the present invention will be described in detail below. It should be emphasized that the following description is merely exemplary in nature and is not intended to limit the scope of the invention or its application.
The embodiment of the invention provides an image self-adaptive sparsification characterization method based on deep learning, which comprises the following steps:
A1, selecting an arbitrary deep convolutional neural network model M and adding a module based on a semi-hard attention mechanism at each stage of convolution; the new model is denoted M'.
Convolution is the standard way of extracting image features in deep learning: repeated convolution operations extract local image information step by step, from shallow to deep, and further convolution of the features carrying this local information then extracts global information. Over the whole process, convolution can extract both shallow local features of the image (such as texture information) and high-level global features (such as semantic information). The attention mechanism means that the neural network uses statistical information of the image features at each stage (such as their mean and variance) to learn a weight for each feature, with weight values in [0,1] indicating the importance of the feature. The semi-hard attention mechanism of the invention sets a value range k; when a weight value is smaller than k, the image feature corresponding to that weight is zeroed, i.e. a sparsification operation is performed. The purpose is to keep the most important features, while zeroing unimportant features prevents this unimportant information from influencing the back-propagation of the neural network, so that the trained network generalizes better and is more robust.
A2, setting a linearly increasing semi-hard attention sparsification value range to obtain a more robust sparse image representation.
Specifically, a final value range k is set, so that by the end of training all image features with weights smaller than k are explicitly zeroed. In addition, a value-range function can be set:

y = min(f(x), k)

Here, x represents the number of iterations in training and is incremented from 0 to the maximum number of iterations; once the value of f(x) exceeds k, the dynamic value range is fixed at k and no longer changes. In our scheme f(x) is a linear function, for example f(x) = 3e-5·x with k = 0.3. Zeroing the image features whose weights are smaller than k makes the corresponding parameters non-differentiable, so those features could not be trained sufficiently. Therefore, during the first few thousand iterations the value of f(x) stays close to 0 and the whole neural network is approximately fully differentiable; the network first runs these iterations so that the weight of each image feature learns a locally optimal solution, then the value range below which weights are zeroed increases gradually, and the network finally converges to an overall optimal solution.
A3, setting a loss function suitable for the specific task, and training the whole deep convolutional neural network model M' with back-propagation.
The method is suitable for general computer vision tasks, and the whole network model is trained with back-propagation. A cross-entropy loss can be used for image classification tasks, and a mean-squared-error loss for tasks such as target detection. This step and the two steps above are trained jointly, and the whole process does not increase the time or space complexity of the neural network. Image classification and object detection are taken as example tasks in this embodiment, but the applicable range of the method is not limited to these tasks.
The embodiment of the invention provides an adaptive sparse representation method for the application of deep learning in computer vision, in particular for extracting image information. The method can simply be merged into any model based on a deep convolutional neural network. By adding a semi-hard attention mechanism to each convolution block, attention is paid to effective features while ineffective features are zeroed. For example, for an image a, if the features corresponding to some subset of parameters p are zeroed, those features have low weight and can be discarded; the back-propagation triggered by image a during training then does not affect the parameters p of the model, so other training images can make full use of the feature representation carried by p. The benefit is that invalid features are sparsified away and large-scale image data is characterized more effectively. Notably, the method adds no extra time or space complexity in either the training or the testing stage, while genuinely enhancing the generalization, robustness, and image representation capability of the model.
In some particularly preferred embodiments, the method may be operated as follows.
Step A1: the core operation of this step is the addition of a semi-hard attention mechanism to the convolutional layer, where we first describe the convolutional layer:
F i+1,j =Conv(F i,* ) (1)
here, conv stands for convolution operation, F i+i,j Represents the jth feature, F, of the i +1 th layer convolution output i,* Representing all the characteristics of the ith layer. The convolution operation of a conventional convolutional network is performed by equation (1).
We next introduce a mechanism of attention where we use the mean of each feature to determine the importance of the feature itself:
v i+1,j =avgpool(F i+1,j ) (2)
here, F i+1,j Is a two-dimensional feature with a length h and a width w, avgpool for F i+1,j This two-dimensional feature is averaged and can be written as
Then, the means are mapped to weights between [0,1] by a linear transformation and a nonlinear activation function:
v' i+1,* =δ(Wv i+1,* ) (3)
here, W is a learnable weight of the linear transformation, δ represents a sigmoid activation function, and until the weight [0,1] of the feature itself is obtained by adding a self-attention mechanism to the feature, we will describe how to sparsify the feature at step A2. Since the order of the operands of the attention mechanism is smaller than the amount of operations of the current layer convolution operation, our method does not enter additional temporal and spatial complexity. In addition, in our model, we add a semi-hard attention mechanism every other two convolution layers for feature sparsification, which is beneficial to further reduce the computational complexity.
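As an illustration of where such a module could sit, the sketch below attaches it after the third convolution of a ResNet-style bottleneck block, reusing the AttentionWeights and semi_hard_attention sketches above; the host architecture is an assumption, since the patent only states that the module follows every third convolution layer.

```python
import torch.nn as nn

class SparseBottleneck(nn.Module):
    def __init__(self, channels: int, mid_channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, mid_channels, kernel_size=1)
        self.conv2 = nn.Conv2d(mid_channels, mid_channels, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(mid_channels, channels, kernel_size=1)   # third convolution
        self.relu = nn.ReLU(inplace=True)
        self.attn = AttentionWeights(channels)          # semi-hard attention after the 3rd conv

    def forward(self, x, iteration: int):
        out = self.relu(self.conv1(x))
        out = self.relu(self.conv2(out))
        out = self.conv3(out)
        out = semi_hard_attention(out, self.attn(out), iteration)  # sparsified features F'
        return self.relu(out + x)                       # residual connection of the host block
```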
Step A2: the attention mechanism is generally referred to as a soft attention (soft attention) mechanism without specific description, and represents that the weight may be any value between [0,1] and is everywhere conductive. The hard attention (hard attention) mechanism means that the weight can only take one of two values of 0,1, and most of the hard attention mechanism is not conductive. In the method, a half-hard attention (half-hard attention) mechanism is adopted, namely a value range k is set, all weight values smaller than k are forced to be 0, all values larger than or equal to k keep the original values, and the method combines a hard attention mechanism and a soft attention mechanism, so that the half-hard attention mechanism is conductive within a value range larger than or equal to k and is non-conductive within a value range smaller than k, and the effect is just required.
Furthermore, to make the weight of each image feature meaningful, we train for several rounds at initialization with a very small value range (close to 0). The goal is to ensure that the features zeroed out by the semi-hard attention mechanism really are relatively unimportant, rather than victims of random initialization. The specific dynamic value range is set as follows:

y = min(f(x), k)   (4)

Here, x represents the number of iterations in training (one iteration processes one batch of images) and is incremented from 0 to the maximum number of iterations; once the value of f(x) exceeds k, the dynamic value range is fixed at k and no longer changes. In our scheme f(x) is a linear function, for example f(x) = 3e-5·x with k = 0.3, so the threshold reaches its cap after 10,000 iterations. It is worth noting that we do not set k to 0.5: most features in a neural network are useful, and discarding too many (half or more) feature channels would cause a large drop in performance, i.e. the opposite of the intended effect.
At each iteration, the attention values v'_{i+1,j} of formula (3) that are smaller than the dynamic value range y in formula (4) are set to 0 and then applied to the convolution features of the current layer, resulting in self-sparsified features:

F'_{i+1,*} = v'_{i+1,*} * F_{i+1,*}   (5)
step A3: in the first two steps, a high-dimensional vector representing image features is usually output, i.e. the last two-dimensional features of the convolutional neural network are input into an avgpool layer (see formula 2), and one-dimensional global features are obtained. Taking the image classification ImageNet task as an example, two full-connection layers and one softmax layer are connected behind one-dimensional global features to output various predicted values, and a Cross Entropy Loss function (Cross Engine Loss) is used for back propagation training of the whole network:
where n represents the number of classes (e.g., imageNet n = 1000), p represents the correct answer given by the label, and q represents the predicted value of the model output we have trained.
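A small numerical check of the cross-entropy loss above (the three-class label and prediction are made-up values): with a one-hot label p and a softmax output q, the loss reduces to the negative log of the probability assigned to the true class.

```python
import math

p = [0.0, 1.0, 0.0]   # label: the correct class is class 1
q = [0.2, 0.7, 0.1]   # softmax output of the model
loss = -sum(pi * math.log(qi) for pi, qi in zip(p, q))
print(round(loss, 4))  # 0.3567, i.e. -log(0.7)
```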
Compared with the baseline deep convolutional model ResNet50, the performance of a model trained with this method on ImageNet image classification is shown in Table 1 below, and its performance on COCO object detection is shown in Table 2.

TABLE 1

Model on ImageNet image classification | Accuracy
---|---
ResNet50 | 76.2%
ResNet50 + feature sparsification (this method) | 77.3%

TABLE 2

Model on COCO object detection | mAP
---|---
FCOS (ResNet50) | 38.7%
FCOS (ResNet50) + feature sparsification (this method) | 39.3%
Steps A1-A3 allow end-to-end training: as the number of training iterations increases, the value range in step A2 grows, the features output by the whole model become sparse, and the sparsified redundant parameters provide better feature expression for other inputs; a joint training loop is sketched below.
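A joint-training sketch of steps A1-A3 (the toy model, random data, and optimizer settings are illustrative assumptions, not the patent's configuration): the iteration counter drives the growing threshold of step A2 while ordinary back-propagation in step A3 updates the whole network M'.

```python
import torch
import torch.nn as nn

def dynamic_threshold(x: int, slope: float = 3e-5, k: float = 0.3) -> float:
    return min(slope * x, k)                       # step A2, formula (4)

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in for M'
criterion = nn.CrossEntropyLoss()                  # step A3 loss for classification
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for iteration in range(100):                       # one batch per iteration
    images = torch.randn(16, 3, 32, 32)
    labels = torch.randint(0, 10, (16,))
    threshold = dynamic_threshold(iteration)       # would be handed to the attention modules of M'
    loss = criterion(model(images), labels)
    optimizer.zero_grad()
    loss.backward()                                # back-propagation trains the whole network
    optimizer.step()
```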
The background of the present invention may contain background information related to the problem or environment of the present invention and does not necessarily describe the prior art. Accordingly, the inclusion in the background section is not an admission of prior art by the applicant.
The foregoing is a further detailed description of the invention in connection with specific/preferred embodiments, and it is not intended to limit the invention to the embodiments described. It will be apparent to those skilled in the art that numerous alterations and modifications can be made to the described embodiments without departing from the inventive concept, and such alterations and modifications are to be considered as within the scope of the invention. In this description, references to the terms "one embodiment", "some embodiments", "preferred embodiments", "an example", "a specific example", "some examples" and the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Such schematic representations do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples, and the various embodiments or examples and their features may be combined by those skilled in the art without contradiction. Although embodiments of the present invention and their advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the scope of the claims.
Claims (8)
1. An image self-adaptive sparsification representation method based on deep learning, characterized by comprising the following steps:
A1, selecting an arbitrary deep convolutional neural network model M, adding a deep-learning module based on a semi-hard attention mechanism at each stage of convolution, i.e. into the convolutional layer, and constructing a new deep convolutional neural network model M';
A2, setting a linearly increasing semi-hard attention sparsification value range to obtain a sparse image representation;
A3, setting a loss function suited to the task, and training the whole deep convolutional neural network model M' with back-propagation;
wherein the semi-hard attention mechanism means that the neural network learns a weight value for each image feature from statistical information of the image features, and when a weight value is smaller than a set value range k, the image feature corresponding to that weight value is set to zero;
in the step A2, a dynamic value-range function is set:

y = min(f(x), k)   (4)

wherein f(x) is a linear function and x represents the number of iterations in training, increasing from 0 to the maximum number of iterations; once the value of f(x) exceeds k, the dynamic value range y is fixed at k and no longer changes; as the network iterates, the weight value of each image feature first learns a locally optimal solution, the value range below which weights are zeroed then increases gradually, and the network finally converges to an overall optimal solution;
at each iteration, the attention values v'_{i+1,j} that are smaller than the dynamic value range in formula (4) are set to 0 and then applied to the convolution features of the current layer, resulting in self-sparsified features:

F'_{i+1,*} = v'_{i+1,*} * F_{i+1,*}   (5)

wherein F_{i+1,*} denotes the (i+1)-th layer feature maps, v'_{i+1,*} is the corresponding semi-hard attention value, and F'_{i+1,*} is the sparsified feature map obtained by multiplying the two.
2. The method according to claim 1, wherein in step A1, the local information of the image is gradually extracted by using a plurality of convolution operations, and then the image features with the local information are convolved, so as to extract the global information.
3. The method of claim 1, wherein in step A1, the convolution operation is as in formula (1):

F_{i+1,j} = Conv(F_{i,*})   (1)

wherein Conv stands for the convolution operation, F_{i+1,j} represents the j-th feature output by the (i+1)-th convolutional layer, and F_{i,*} represents all the features of the i-th layer;
an attention mechanism is introduced as formula (2), using the mean v_{i+1,j} of each image feature to determine the importance of that feature:

v_{i+1,j} = avgpool(F_{i+1,j})   (2)

wherein F_{i+1,j} is a two-dimensional feature of height h and width w, and avgpool averages this two-dimensional feature, i.e. v_{i+1,j} = (1/(hw)) Σ_{a=1}^{h} Σ_{b=1}^{w} F_{i+1,j}(a, b);
the means are then mapped to weights in [0,1] by a linear transformation and a non-linear activation function:

v'_{i+1,*} = δ(W v_{i+1,*})   (3)

wherein v'_{i+1,*} is the attention value, W is the learnable weight of the linear transformation, and δ represents the sigmoid activation function.
4. A method as claimed in any one of claims 1 to 3, wherein a semi-hard attention mechanism is applied to every third convolution layer for feature sparsification.
5. The method according to any one of claims 1 to 3, wherein in step A3, the image classification related task is processed by using a cross entropy function, and the target detection related task is processed by using a mean square error loss function.
6. The method according to any one of claims 1 to 3, wherein in the step A3, for the ImageNet image classification task, two fully connected layers and a softmax layer are appended to the one-dimensional global feature to output the predicted value of each class, and the whole network is trained by back-propagation using a cross-entropy loss function:

H(p, q) = −Σ_{i=1}^{n} p_i · log(q_i)

where n represents the number of classes, p represents the correct answer given by the label, and q represents the predicted values output by the trained model.
7. An image adaptive sparsification representation device based on deep learning, comprising a computer-readable storage medium and a processor, wherein an executable program is stored in the computer-readable storage medium and, when executed by the processor, implements the image adaptive sparsification representation method based on deep learning according to any one of claims 1 to 6.
8. A computer-readable storage medium storing an executable program, wherein the executable program, when executed by a processor, implements the image adaptive sparsification representation method based on deep learning according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010385699.3A CN111652246B (en) | 2020-05-09 | 2020-05-09 | Image self-adaptive sparsization representation method and device based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010385699.3A CN111652246B (en) | 2020-05-09 | 2020-05-09 | Image self-adaptive sparsization representation method and device based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111652246A CN111652246A (en) | 2020-09-11 |
CN111652246B true CN111652246B (en) | 2023-04-18 |
Family
ID=72342551
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010385699.3A Active CN111652246B (en) | 2020-05-09 | 2020-05-09 | Image self-adaptive sparsization representation method and device based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111652246B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114563763B (en) * | 2022-01-21 | 2022-10-21 | 青海师范大学 | Underwater sensor network node distance measurement positioning method based on return-to-zero neurodynamics |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107871136A (en) * | 2017-03-22 | 2018-04-03 | 中山大学 | The image-recognizing method of convolutional neural networks based on openness random pool |
CN110827312A (en) * | 2019-11-12 | 2020-02-21 | 北京深境智能科技有限公司 | Learning method based on cooperative visual attention neural network |
CN111046962A (en) * | 2019-12-16 | 2020-04-21 | 中国人民解放军战略支援部队信息工程大学 | Sparse attention-based feature visualization method and system for convolutional neural network model |
Also Published As
Publication number | Publication date |
---|---|
CN111652246A (en) | 2020-09-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |