CN111652246B - Image self-adaptive sparsification representation method and device based on deep learning - Google Patents

Image self-adaptive sparsification representation method and device based on deep learning

Info

Publication number
CN111652246B
CN111652246B CN202010385699.3A CN202010385699A CN 111652246 B
Authority
CN
China
Prior art keywords
image
feature
value
deep learning
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010385699.3A
Other languages
Chinese (zh)
Other versions
CN111652246A (en)
Inventor
袁春
施诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Shenzhen International Graduate School of Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen International Graduate School of Tsinghua University filed Critical Shenzhen International Graduate School of Tsinghua University
Priority to CN202010385699.3A priority Critical patent/CN111652246B/en
Publication of CN111652246A publication Critical patent/CN111652246A/en
Application granted granted Critical
Publication of CN111652246B publication Critical patent/CN111652246B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/513Sparse representations
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

An image self-adaptive sparsification representation method and device based on deep learning are disclosed. The method comprises the following steps: A1, selecting an arbitrary deep convolutional neural network model M, adding a deep learning method based on a semi-hard attention mechanism at each stage of the convolution operation, i.e., adding the semi-hard attention mechanism into the convolutional layers, and constructing a new deep convolutional neural network model M'; A2, setting a linearly increasing semi-hard attention sparse value domain for obtaining sparse image representations; and A3, setting a loss function suitable for the task, and training the whole deep convolutional neural network model M' by using back propagation. With this method, without introducing extra time or space complexity, the recognition accuracy of the deep convolution model on computer vision tasks such as image recognition and object detection is stably improved.

Description

Image self-adaptive sparsification representation method and device based on deep learning
Technical Field
The invention relates to the field of computer vision technology and deep learning, in particular to an image self-adaptive sparsification characterization method and device based on deep learning.
Background
Computer vision refers to capturing natural scenes with a camera or generating images with a computer, and performing content recognition, localization, and monitoring of targets in the images through electronic equipment. The task can be regarded as an application of machine learning in the visual field and is an important component of the field of artificial intelligence. The main research content of computer vision can be summarized as follows: the required information is obtained by acquiring pictures or videos and then preprocessing and analyzing them; this information is often called features. In short, cameras and electronic devices are used to capture the intrinsic information of pictures or videos.
Computer vision is a comprehensive discipline that involves a wide range of fields. From research at the current stage, computer vision attempts to establish an Artificial Intelligence (AI) system. In recent years, the theory and techniques around computer vision have mainly focused on extracting high-dimensional features from images or videos, i.e., features that serve as an expression of the image or video information.
Traditional feature extraction methods mainly depend on manually designed feature extractors, such as the classic SIFT (Scale-Invariant Feature Transform) feature. SIFT feature extraction mainly comprises the following four basic steps: (1) Scale-space extremum detection: extreme points are searched at each position of the multi-scale features after image scaling, and potential interest points that are invariant to scale and rotation are identified through a difference-of-Gaussians function. (2) Keypoint localization: at each position over multiple scales, a fine model is fitted to judge whether the candidate extreme points detected in the first step are stable, and the stable ones are selected as keypoints for subsequent calculation. (3) Orientation assignment: the gradient direction is determined from local image information, and one or more orientations are assigned to each keypoint; all subsequent operations on the image data essentially transform the orientation, scale, and position of the keypoints, so that the final features are robust, with rotation and spatial invariance. (4) Keypoint description: local gradients of the image are measured at different scales within the local neighborhood of each keypoint. The set of all gradients of an image, i.e., the SIFT features of the image, is ultimately robust to large local shape deformations and illumination variations.
Compared with traditional feature extraction methods such as SIFT, modern deep learning feature extraction based on images is much simpler in design and only comprises three parts: (1) convolution layers, which convolve the local information of the image features to acquire image information with local receptive fields; (2) nonlinear layers, which enhance the representation capability of the features output by the convolution layers; (3) fully connected layers, which transform the global information of the image features to obtain image information with a global receptive field. The features learned by modern deep learning feature extraction are similar to traditional features and are essentially a representation of the position invariance and rotation invariance of the image content, but each neural network layer is trained on a specific data set using back propagation, which yields a more robust representation of image information under large-data conditions.
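For concreteness, the following is a minimal sketch (not part of the patent text) of the three components just described, written in PyTorch; the layer widths and the class name TinyFeatureExtractor are illustrative assumptions only.

import torch
import torch.nn as nn

class TinyFeatureExtractor(nn.Module):
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        # (1) convolution layers: local receptive fields over the image
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),               # (2) nonlinear layer
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),             # collapse the spatial dimensions
        )
        # (3) fully connected layer: global receptive field
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.conv(x).flatten(1)
        return self.fc(f)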
Disclosure of Invention
The invention mainly aims to provide an image self-adaptive sparsification representation method and device based on deep learning, so as to improve the recognition accuracy of a deep convolution model on computer vision tasks such as image recognition and object detection without introducing extra time or space complexity.
In order to achieve the purpose, the invention adopts the following technical scheme:
an image self-adaptive sparsification characterization method based on deep learning comprises the following steps:
a1, selecting an arbitrary deep convolutional neural network model M, adding a deep learning method based on a semi-hard attention mechanism at each stage of convolutional operation, adding the semi-hard attention mechanism into a convolutional layer, and constructing a new deep convolutional neural network model M';
a2, setting a linearly increasing semi-hard attention sparse value domain for obtaining sparse image representation;
a3, setting a loss function suitable for a task, and training a whole deep convolutional neural network model M' by utilizing back propagation;
the semi-hard attention mechanism is that the neural network learns the weight value of the image feature by using the statistical information of the image feature, and when the weight value is smaller than a set value range k, the image feature corresponding to the weight value smaller than k is reset to zero.
Further, the method comprises the following steps:
in the step A1, multiple convolution operations are used to gradually extract image local information, and then the image features with the local information are convolved, so as to extract global information.
In the step A1, the convolution operation is as shown in formula (1):
F_{i+1,j} = Conv(F_{i,*})   (1)
wherein Conv represents the convolution operation, F_{i+1,j} represents the j-th feature of the (i+1)-th layer of the convolution output, and F_{i,*} represents all the features of the i-th layer;
an attention mechanism is introduced as equation (2), and the mean of each image feature is used to determine the importance of the feature:
v_{i+1,j} = avgpool(F_{i+1,j})   (2)
wherein F_{i+1,j} is a two-dimensional feature of height h and width w, and avgpool averages F_{i+1,j} over its spatial extent, i.e.,
v_{i+1,j} = (1 / (h · w)) Σ_{a=1}^{h} Σ_{b=1}^{w} F_{i+1,j}(a, b)
the mean is then mapped to weights between [0,1] by a linear transformation and a nonlinear activation function:
v'_{i+1,*} = δ(W v_{i+1,*})   (3)
wherein v'_{i+1,*} is the attention value, W is the learnable weight of the linear transformation, and δ represents the sigmoid activation function.
Preferably, a semi-hard attention mechanism is added every third convolution layer for feature sparsification.
In the step A2, a dynamic range function is set:
y=min(f(x),k) (4)
wherein f(x) is a linear function and x represents the number of iterations in training, increasing from 0 to the maximum number of iterations; once the value of f(x) exceeds k, the dynamic value domain is fixed at k and no longer changes. As the network iterates, the weight value of each image feature first learns a locally optimal solution, then the value domain below which weights are zeroed is gradually increased, and finally the network converges to an overall optimal solution;
at each iteration, the attention values v'_{i+1,j} that are smaller than the dynamic value domain y in equation (4) are set to 0 and then applied to the convolution features of the current layer, resulting in self-sparsified features:
F'_{i+1,*} = v'_{i+1,*} * F_{i+1,*}   (5).
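To make equations (1)-(5) concrete, a minimal sketch of the semi-hard attention block in PyTorch is given below; it is an illustrative assumption of one possible implementation rather than the patent's reference code, and the class name SemiHardAttention is invented for the example.

import torch
import torch.nn as nn

class SemiHardAttention(nn.Module):
    """Channel-wise attention whose weights below the value domain k are zeroed (eqs. (2)-(5))."""
    def __init__(self, channels: int):
        super().__init__()
        self.fc = nn.Linear(channels, channels)      # learnable weight W in eq. (3)

    def forward(self, feat: torch.Tensor, k: float) -> torch.Tensor:
        # eq. (2): per-channel mean over the h x w spatial extent
        v = feat.mean(dim=(2, 3))                    # shape (N, C)
        # eq. (3): linear transformation + sigmoid -> weights in [0, 1]
        v = torch.sigmoid(self.fc(v))
        # semi-hard step: weights smaller than the (dynamic) value domain k are set to 0
        v = torch.where(v < k, torch.zeros_like(v), v)
        # eq. (5): rescale the convolution features channel by channel
        return feat * v.unsqueeze(-1).unsqueeze(-1)

Because the weight is computed from a channel-wise mean and a single linear layer, its cost is negligible compared with the convolution itself, consistent with the statement that no extra complexity is introduced.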
in the step A3, the cross entropy function is used to process the image classification related task, and the mean square error loss function is used to process the target detection related task.
In the step A3, for the ImageNet image classification task, two fully connected layers and a softmax layer are connected behind the one-dimensional global feature to output the predicted value of each class, and a cross entropy loss function (Cross Entropy Loss) is used for back propagation training of the whole network:
H(p, q) = -Σ_{i=1}^{n} p(x_i) log q(x_i)
where n represents the number of classes, p represents the correct answer given by the label, and q represents the predicted value of the trained model output.
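A minimal sketch of the classification head and the cross entropy loss described above follows (PyTorch, illustrative only); the hidden width of 512 is an assumption, and nn.CrossEntropyLoss applies the softmax and the cross entropy internally.

import torch
import torch.nn as nn

feat_dim, num_classes = 2048, 1000            # e.g. ResNet50 global features, ImageNet classes
head = nn.Sequential(
    nn.Linear(feat_dim, 512),                 # first fully connected layer (width 512 is an assumption)
    nn.ReLU(inplace=True),
    nn.Linear(512, num_classes),              # second fully connected layer
)
criterion = nn.CrossEntropyLoss()             # log-softmax + cross entropy of the formula above

features = torch.randn(8, feat_dim)           # dummy batch of one-dimensional global features
labels = torch.randint(0, num_classes, (8,))
loss = criterion(head(features), labels)
loss.backward()                               # back propagation through the whole network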
An image self-adaptive sparsification characterization device based on deep learning comprises a computer-readable storage medium and a processor, wherein the computer-readable storage medium stores an executable program which, when executed by the processor, implements the deep learning-based image self-adaptive sparsification characterization method.
A computer-readable storage medium, storing an executable program, which when executed by a processor, implements the deep learning-based image adaptive sparsification characterization method.
The invention has the following beneficial effects:
the invention provides an image self-adaptive sparsity characterization method and device based on deep learning, which can be well fused in the current mainstream deep convolution models (such as ResNet) and stably improve the recognition accuracy of the deep convolution models on computer vision tasks such as image recognition, target detection and the like under the condition of not introducing extra time complexity and space complexity. The invention simultaneously proves the effectiveness of the ImageNet data set and the COCO data set which are widely used.
With this method, after self-adaptive sparsification is added to any deep learning model, the generalization and robustness of the model are obviously enhanced, i.e., the representation capability of the image features is enhanced, while the time complexity and space complexity of the model are kept unchanged.
Detailed Description
The embodiments of the present invention will be described in detail below. It should be emphasized that the following description is merely exemplary in nature and is not intended to limit the scope of the invention or its application.
The embodiment of the invention provides an image self-adaptive sparsification characterization method based on deep learning, which comprises the following steps:
a1, selecting an arbitrary deep convolution neural network model M, and adding a deep learning method based on a semi-hard attention mechanism in each stage of convolution operation, wherein the new model is M'.
Convolution is a common operation for extracting image features in deep learning; multiple convolution operations are used to extract image local information step by step from shallow to deep, and then the image features carrying the local information are convolved again to extract global information. In the whole process, convolution can extract both shallow local features (such as texture information) and high-level global features (such as semantic information) of the image. The attention mechanism means that the neural network uses statistical information (such as mean, variance, etc.) of the image features of each stage to learn a feature weight for each feature, where the weight values lie in [0,1] and indicate the importance of the features. The semi-hard attention mechanism in the invention means that a value domain k is set, and when a weight value is smaller than k, the image features corresponding to that weight value are zeroed, i.e., a sparsification operation is carried out. The purpose of this is to preserve the most important features; zeroing the unimportant features prevents this unimportant information from influencing the back propagation of the neural network, so that the trained network is more generalized and robust.
And A2, setting a linearly increasing semi-hard attention sparse value domain for obtaining a more robust sparse image representation.
Specifically, a final value domain k is set; that is, by the time the neural network is trained to the end, all image features with weights less than k are explicitly zeroed. In addition, a value domain function can be set:
y=min(f(x),k)
here, x represents the number of iterations in training, and is incremented from 0 to the maximum number of iterations until the value of f (x) is greater than k, and the dynamic range is fixed to k and does not change. In our scheme, f (x) is set as a linear function, such as f (x) =3e-5 x, k =0.3. The image features with weight values smaller than k are zeroed, so that parameters corresponding to the features are not derivable, and the image features cannot be trained sufficiently. Therefore, in the first thousands of iterations, the value of f (x) is close to 0, the whole neural network is approximately completely derivable, the network is firstly subjected to thousands of iterations, the weight value of each image feature is learned to a local optimal solution, then the value range of the weight value is gradually increased to zero, and finally the network is converged to a total optimal solution.
And A3, setting a loss function suitable for a specific task, and training the whole deep convolutional neural network model M' by utilizing back propagation.
The method is suitable for general computer vision tasks, and the whole network model is trained based on back propagation. A cross entropy function can be used for image classification related tasks, and a mean square error loss function can be used for related tasks such as object detection. This step and the two preceding steps can be trained jointly, and the whole process does not increase the time complexity or space complexity of the neural network. The two basic computer vision tasks of image classification and object detection are taken as examples in this embodiment, but the applicable range of the method is not limited to the tasks exemplified.
The embodiment of the invention provides a self-adaptive sparse representation method for the application of deep learning in the field of computer vision, in particular for extracting image information. The method can be simply merged into any model based on a deep convolutional neural network. In the method, a semi-hard attention mechanism is added to each convolution block, so that attention is paid to effective features while ineffective features are zeroed. For example, for image a, if the features corresponding to a certain part of parameter p are zeroed, i.e., these features have low weight, they can be discarded. Then, during training, the backward pass contributed by image a does not affect the parameter p of the model, so other training images can fully utilize the feature representation brought by the parameter p. The benefit is that invalid features are sparsified and large-data images are characterized more effectively. It is worth noting that the method does not bring extra time or space complexity in the training and testing stages, and genuinely achieves the effect of enhancing the generalization and robustness of the model and the representation capability of the images.
In some particularly preferred embodiments, the method may be operated as follows.
Step A1: the core operation of this step is the addition of a semi-hard attention mechanism to the convolutional layer, where we first describe the convolutional layer:
F_{i+1,j} = Conv(F_{i,*})   (1)
Here, Conv stands for the convolution operation, F_{i+1,j} represents the j-th feature of the (i+1)-th layer convolution output, and F_{i,*} represents all the features of the i-th layer. The convolution operation of a conventional convolutional network is performed by equation (1).
We next introduce an attention mechanism, in which we use the mean of each feature to determine the importance of the feature itself:
v_{i+1,j} = avgpool(F_{i+1,j})   (2)
Here, F_{i+1,j} is a two-dimensional feature of height h and width w, and avgpool averages F_{i+1,j} over its spatial extent, which can be written as
v_{i+1,j} = (1 / (h · w)) Σ_{a=1}^{h} Σ_{b=1}^{w} F_{i+1,j}(a, b)
Then, the means are mapped to weights between [0,1] by a linear transformation and a nonlinear activation function:
v'_{i+1,*} = δ(W v_{i+1,*})   (3)
Here, W is a learnable weight of the linear transformation and δ represents the sigmoid activation function. At this point, a self-attention mechanism has been added to the features to obtain, for each feature, a weight in [0,1]; how the features are sparsified is described in step A2. Since the number of operations of the attention mechanism is of a smaller order than that of the convolution operation of the current layer, our method does not incur additional time or space complexity. In addition, in our model we add a semi-hard attention mechanism every third convolution layer for feature sparsification, which further reduces the computational cost.
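As an illustration of how such a block could be interleaved with the convolution layers (one attention block per three convolution layers), a sketch follows; it assumes PyTorch and reuses the illustrative SemiHardAttention module sketched earlier, passed in as attention_cls.

import torch
import torch.nn as nn

class SparsifiedStage(nn.Module):
    """A convolution stage with one semi-hard attention block after every third convolution layer."""
    def __init__(self, channels: int, num_convs: int, attention_cls):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True))
            for _ in range(num_convs)
        )
        # one attention block after every third convolution layer
        self.attn = nn.ModuleDict({
            str(i): attention_cls(channels) for i in range(num_convs) if (i + 1) % 3 == 0
        })

    def forward(self, x: torch.Tensor, k: float) -> torch.Tensor:
        for i, conv in enumerate(self.convs):
            x = conv(x)
            if str(i) in self.attn:
                x = self.attn[str(i)](x, k)   # sparsify with the current value domain k
        return x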
Step A2: Unless otherwise specified, an attention mechanism usually refers to a soft attention mechanism, in which the weight can take any value in [0,1] and is differentiable everywhere. A hard attention mechanism means that the weight can only take one of the two values 0 or 1, and it is mostly non-differentiable. The method adopts a semi-hard attention mechanism: a value domain k is set, all weight values smaller than k are forced to 0, and all values greater than or equal to k keep their original values. This combines the hard and soft attention mechanisms, so that the semi-hard attention mechanism is differentiable over the range greater than or equal to k and non-differentiable over the range smaller than k, which is exactly the required effect.
Furthermore, to make the weight of each image feature meaningful, we train several rounds at initialization with a very small value domain (approximately 0). The goal is to ensure that the features zeroed by the semi-hard attention mechanism are indeed relatively unimportant features and not the result of random initialization. The specific dynamic value domain is set as follows:
y=min(f(x),k) (4)
Here, x represents the number of iterations in training (one iteration processes one batch of images) and is incremented from 0 to the maximum number of iterations; once the value of f(x) exceeds k, the dynamic value domain is fixed at k and no longer changes. In our scheme, f(x) is set as a linear function, e.g., f(x) = 3e-5 · x with k = 0.3. It is worth noting that we do not set k to 0.5: since most features in a neural network are valid, discarding too many (half or more) feature channels results in a large drop in performance, which is the opposite of the intended effect.
At each iteration, the attention values v'_{i+1,j} of equation (3) that are smaller than the dynamic value domain y in equation (4) are set to 0 and then applied to the convolution features of the current layer, resulting in self-sparsified features:
F'_{i+1,*} = v'_{i+1,*} * F_{i+1,*}   (5)
step A3: in the first two steps, a high-dimensional vector representing image features is usually output, i.e. the last two-dimensional features of the convolutional neural network are input into an avgpool layer (see formula 2), and one-dimensional global features are obtained. Taking the image classification ImageNet task as an example, two full-connection layers and one softmax layer are connected behind one-dimensional global features to output various predicted values, and a Cross Entropy Loss function (Cross Engine Loss) is used for back propagation training of the whole network:
H(p, q) = -Σ_{i=1}^{n} p(x_i) log q(x_i)
where n represents the number of classes (e.g., n = 1000 for ImageNet), p represents the correct answer given by the label, and q represents the predicted value output by the trained model.
Compared with the deep convolution model ResNet50, the performance of a model trained with the method on ImageNet image classification is shown in Table 1 below, and its performance on COCO object detection is shown in Table 2 below.
TABLE 1
Model                                                     Accuracy on ImageNet image classification
ResNet50                                                  76.2%
ResNet50 + feature sparsification (the present method)    77.3%
TABLE 2
Model                                                           mAP on COCO object detection
FCOS (ResNet50)                                                 38.7%
FCOS (ResNet50) + feature sparsification (the present method)   39.3%
End-to-end training can be realized through steps A1-A3. As the number of training iterations increases, the value domain in step A2 becomes higher and higher, the features output by the whole model become sparser, and the spared redundant parameters provide better feature expression for other inputs.
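For completeness, a sketch of the end-to-end training loop for steps A1-A3 is shown below; it is an illustrative assumption (PyTorch) in which `model` is any network whose forward pass accepts the current value domain k, for example one built from the illustrative modules sketched above.

import torch

def train(model, loader, epochs: int, lr: float = 0.1):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss()
    step = 0
    for _ in range(epochs):
        for images, labels in loader:
            k = min(3e-5 * step, 0.3)            # linearly increasing value domain y = min(f(x), k) (step A2)
            logits = model(images, k)            # semi-hard attention applied inside the model (step A1)
            loss = criterion(logits, labels)     # task loss (step A3)
            optimizer.zero_grad()
            loss.backward()                      # back propagation training
            optimizer.step()
            step += 1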
The background of the present invention may contain background information related to the problem or environment of the present invention and does not necessarily describe the prior art. Accordingly, the inclusion in the background section is not an admission of prior art by the applicant.
The foregoing is a further detailed description of the invention in connection with specific/preferred embodiments and it is not intended to limit the invention to the specific embodiments described. It will be apparent to those skilled in the art that numerous alterations and modifications can be made to the described embodiments without departing from the inventive concepts herein, and such alterations and modifications are to be considered as within the scope of the invention. In the description herein, references to the description of the term "one embodiment," "some embodiments," "preferred embodiments," "an example," "a specific example," or "some examples" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Various embodiments or examples and features of various embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction. Although embodiments of the present invention and their advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the scope of the claims.

Claims (8)

1. An image self-adaptive sparsification characterization method based on deep learning is characterized by comprising the following steps:
a1, selecting an arbitrary deep convolutional neural network model M, adding a deep learning method based on a semi-hard attention mechanism at each stage of convolutional operation, adding the semi-hard attention mechanism into a convolutional layer, and constructing a new deep convolutional neural network model M';
a2, setting a linearly increasing semi-hard attention sparse value domain for obtaining sparse image representation;
a3, setting a loss function suitable for a task, and training a whole deep convolutional neural network model M' by utilizing back propagation;
the semi-hard attention mechanism is that a neural network learns the weight value of the image features by utilizing statistical information of the image features, and when the weight value is smaller than a set value range k, the image features corresponding to the weight value smaller than k are returned to zero;
in the step A2, a dynamic range function is set:
y=min(f(x),k) (4)
wherein f(x) is a linear function and x represents the number of iterations in training, increasing from 0 to the maximum number of iterations; once the value of f(x) exceeds k, the value of the dynamic value domain y is fixed at k and no longer changes; as the network iterates, the weight value of each image feature first learns a locally optimal solution, then the value domain below which weights are zeroed is gradually increased, and finally the network converges to an overall optimal solution;
at each iteration, the attention values v'_{i+1,j} that are smaller than the dynamic value domain in equation (4) are set to 0 and then applied to the convolution features of the current layer, resulting in self-sparsified features:
F'_{i+1,*} = v'_{i+1,*} * F_{i+1,*}   (5);
wherein F_{i+1,*} denotes the (i+1)-th feature map, v'_{i+1,*} is the semi-hard attention value corresponding to the (i+1)-th feature map, and F'_{i+1,*} is the sparsified feature map obtained by multiplying the two.
2. The method according to claim 1, wherein in step A1, the local information of the image is gradually extracted by using a plurality of convolution operations, and then the image features with the local information are convolved, so as to extract the global information.
3. The method of claim 1, wherein in step A1, the convolution operation is as in formula (1):
F_{i+1,j} = Conv(F_{i,*})   (1)
wherein Conv stands for the convolution operation, F_{i+1,j} represents the j-th feature of the (i+1)-th layer of the convolution output, and F_{i,*} represents all the features of the i-th layer;
an attention mechanism is introduced as equation (2), using the mean v_{i+1,j} of each image feature to determine the importance of the feature:
v_{i+1,j} = avgpool(F_{i+1,j})   (2)
wherein F_{i+1,j} is a two-dimensional feature of height h and width w, and avgpool averages F_{i+1,j} over its spatial extent, i.e.,
v_{i+1,j} = (1 / (h · w)) Σ_{a=1}^{h} Σ_{b=1}^{w} F_{i+1,j}(a, b)
the mean is then mapped to weights between [0,1] by a linear transformation and a nonlinear activation function:
v'_{i+1,*} = δ(W v_{i+1,*})   (3)
wherein v'_{i+1,*} is the attention value, W is a learnable weight of the linear transformation, and δ represents the sigmoid activation function.
4. A method as claimed in any one of claims 1 to 3, wherein a semi-hard attention mechanism is applied to every third convolution layer for feature sparsification.
5. The method according to any one of claims 1 to 3, wherein in step A3, the image classification related task is processed by using a cross entropy function, and the target detection related task is processed by using a mean square error loss function.
6. The method according to any one of claims 1 to 3, wherein in the step A3, for the ImageNet image classification task, two fully connected layers and a softmax layer are connected behind the one-dimensional global feature to output the predicted value of each class, and the whole network is trained by back propagation using a cross entropy loss function (Cross Entropy Loss):
H(p, q) = -Σ_{i=1}^{n} p(x_i) log q(x_i)
where n represents the number of classes, p represents the correct answer given by the label, and q represents the predicted value of the trained model output.
7. An image self-adaptive sparsification characterization device based on deep learning, comprising a computer-readable storage medium and a processor, wherein an executable program is stored in the computer-readable storage medium and, when executed by the processor, implements the image self-adaptive sparsification characterization method based on deep learning according to any one of claims 1 to 6.
8. A computer-readable storage medium storing an executable program, wherein the executable program, when executed by a processor, implements the adaptive image sparsification characterization method based on deep learning according to any one of claims 1 to 6.
CN202010385699.3A 2020-05-09 2020-05-09 Image self-adaptive sparsification representation method and device based on deep learning Active CN111652246B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010385699.3A CN111652246B (en) 2020-05-09 2020-05-09 Image self-adaptive sparsification representation method and device based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010385699.3A CN111652246B (en) 2020-05-09 2020-05-09 Image self-adaptive sparsification representation method and device based on deep learning

Publications (2)

Publication Number Publication Date
CN111652246A CN111652246A (en) 2020-09-11
CN111652246B true CN111652246B (en) 2023-04-18

Family

ID=72342551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010385699.3A Active CN111652246B (en) 2020-05-09 2020-05-09 Image self-adaptive sparsification representation method and device based on deep learning

Country Status (1)

Country Link
CN (1) CN111652246B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114563763B (en) * 2022-01-21 2022-10-21 青海师范大学 Underwater sensor network node distance measurement positioning method based on return-to-zero neurodynamics

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871136A (en) * 2017-03-22 2018-04-03 中山大学 The image-recognizing method of convolutional neural networks based on openness random pool
CN110827312A (en) * 2019-11-12 2020-02-21 北京深境智能科技有限公司 Learning method based on cooperative visual attention neural network
CN111046962A (en) * 2019-12-16 2020-04-21 中国人民解放军战略支援部队信息工程大学 Sparse attention-based feature visualization method and system for convolutional neural network model

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871136A (en) * 2017-03-22 2018-04-03 中山大学 The image-recognizing method of convolutional neural networks based on openness random pool
CN110827312A (en) * 2019-11-12 2020-02-21 北京深境智能科技有限公司 Learning method based on cooperative visual attention neural network
CN111046962A (en) * 2019-12-16 2020-04-21 中国人民解放军战略支援部队信息工程大学 Sparse attention-based feature visualization method and system for convolutional neural network model

Also Published As

Publication number Publication date
CN111652246A (en) 2020-09-11

Similar Documents

Publication Publication Date Title
CN110135366B (en) Shielded pedestrian re-identification method based on multi-scale generation countermeasure network
Cao et al. DenseUNet: densely connected UNet for electron microscopy image segmentation
Xie et al. POLSAR image classification via Wishart-AE model or Wishart-CAE model
Jiang et al. Hyperspectral image classification with spatial consistence using fully convolutional spatial propagation network
Chen et al. Learning linear regression via single-convolutional layer for visual object tracking
CN107862680B (en) Target tracking optimization method based on correlation filter
Taghanaki et al. Robust representation learning via perceptual similarity metrics
Etezadifar et al. A new sample consensus based on sparse coding for improved matching of SIFT features on remote sensing images
CN112967210B (en) Unmanned aerial vehicle image denoising method based on full convolution twin network
JP6107531B2 (en) Feature extraction program and information processing apparatus
Alom et al. Object recognition using cellular simultaneous recurrent networks and convolutional neural network
CN113642602B (en) Multi-label image classification method based on global and local label relation
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning
CN111179270A (en) Image co-segmentation method and device based on attention mechanism
Liu et al. Noise robust face hallucination based on smooth correntropy representation
CN112329771A (en) Building material sample identification method based on deep learning
CN116258877A (en) Land utilization scene similarity change detection method, device, medium and equipment
CN110135435B (en) Saliency detection method and device based on breadth learning system
CN111652246B (en) 2020-09-11 Image self-adaptive sparsification representation method and device based on deep learning
CN112668662B (en) Outdoor mountain forest environment target detection method based on improved YOLOv3 network
CN111985487B (en) Remote sensing image target extraction method, electronic equipment and storage medium
CN117853596A (en) Unmanned aerial vehicle remote sensing mapping method and system
CN117611838A (en) Multi-label image classification method based on self-adaptive hypergraph convolutional network
Xu et al. SAR target recognition based on variational autoencoder
Caglayan et al. 3D convolutional object recognition using volumetric representations of depth data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant