CN111652246A - Image adaptive sparsification representation method and device based on deep learning - Google Patents

Image adaptive sparsification representation method and device based on deep learning

Info

Publication number
CN111652246A
Authority
CN
China
Prior art keywords
image
feature
convolution
deep learning
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010385699.3A
Other languages
Chinese (zh)
Other versions
CN111652246B (en)
Inventor
袁春
施诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Shenzhen International Graduate School of Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen International Graduate School of Tsinghua University filed Critical Shenzhen International Graduate School of Tsinghua University
Priority to CN202010385699.3A priority Critical patent/CN111652246B/en
Publication of CN111652246A publication Critical patent/CN111652246A/en
Application granted granted Critical
Publication of CN111652246B publication Critical patent/CN111652246B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/513 Sparse representations
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

An image adaptive sparsification representation method and device based on deep learning are disclosed. The method comprises the following steps: A1, selecting an arbitrary deep convolutional neural network model M, adding a deep-learning method based on a semi-hard attention mechanism at each stage of the convolution operations by inserting the semi-hard attention mechanism into the convolutional layers, and constructing a new deep convolutional neural network model M'; A2, setting a linearly increasing semi-hard attention sparsification value range for obtaining sparse image representations; A3, setting a loss function suited to the task and training the whole deep convolutional neural network model M' with back-propagation. Without introducing extra time or space complexity, the method steadily improves the recognition accuracy of deep convolution models on computer vision tasks such as image recognition and object detection.

Description

Image adaptive sparsification representation method and device based on deep learning
Technical Field
The invention relates to the field of computer vision and deep learning, and in particular to an image adaptive sparsification characterization method and device based on deep learning.
Background
Computer vision takes natural scenes captured by a camera, or images generated by a computer, and uses electronic devices to recognize their content and to locate and monitor the objects they contain. The task can be regarded as an application of machine learning to the visual domain and is an important component of the field of artificial intelligence. The main research content of computer vision can be summarized as follows: pictures or videos are acquired, preprocessed and analyzed to obtain the information we need, which is usually called features. In short, cameras and electronic devices are used to capture the intrinsic information of pictures or videos.
Computer vision is a comprehensive discipline that touches a wide range of fields. At the current stage of research, computer vision attempts to establish what we commonly call an AI (Artificial Intelligence) system. In recent years, the theory and techniques around computer vision have mainly focused on extracting high-dimensional features from images or videos as an expression of the image or video information.
Traditional feature extraction methods mainly rely on manually designed feature extractors, such as the classic SIFT (Scale-Invariant Feature Transform) feature. SIFT consists of four basic steps: (1) scale-space extrema detection: searching for extreme points at every position of the multi-scale features obtained after image scaling, and identifying potential interest points invariant to scale and rotation via a difference-of-Gaussian function; (2) keypoint localization: at each position and scale, judging by fitting a fine model whether the candidate extrema detected in the first step are stable, and keeping the stable ones as keypoints for subsequent computation; (3) orientation assignment: determining the gradient direction from local image information and assigning one or more orientations to each keypoint; all subsequent operations on the image data are essentially transformations of the keypoints' orientation, scale and position, so that the final features are robust and invariant to rotation and spatial changes; (4) keypoint description: measuring the local gradients of the image at different scales within the local neighborhood of each keypoint. The set of all gradients of an image, i.e. its SIFT features, is ultimately robust to large local shape deformations and illumination variations.
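As an aside, SIFT features of this kind can be extracted with off-the-shelf tools; the following minimal sketch assumes OpenCV's implementation (cv2.SIFT_create, present in modern OpenCV builds) and uses a placeholder image path:

import cv2

# Hedged sketch: extract SIFT keypoints and their 128-dimensional descriptors.
# "example.jpg" is a placeholder path, not a file referenced by this patent.
img = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# Each keypoint carries the location, scale and orientation of steps (1)-(3);
# each descriptor row is the local-gradient histogram of step (4).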
Compared with traditional feature extraction methods such as SIFT, modern image-based deep learning feature extraction is much simpler in design and comprises only three parts: (1) convolution layers, which convolve local information of the image features to obtain image information with local receptive fields; (2) nonlinear layers, which enhance the representational capability of the convolution layers' output features; (3) fully connected layers, which transform the global information of the image features to obtain image information with a global receptive field. The features learned by modern deep learning extractors are similar to traditional features in that they are essentially representations of the position and rotation invariance of the image content; however, each neural network layer is trained with back-propagation on a specific data set, showing a more robust image representation capability in the large-data regime. A minimal sketch of this three-part structure is given below.
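The sketch is our illustration only; the patent does not prescribe a framework, and PyTorch is assumed here:

import torch
import torch.nn as nn

# Toy deep feature extractor with the three parts named above:
# (1) a convolution layer for local receptive fields, (2) a nonlinear
# layer, and (3) a fully connected layer for the global receptive field.
extractor = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # (1) convolution layer
    nn.ReLU(),                                   # (2) nonlinear layer
    nn.AdaptiveAvgPool2d(1),                     # pool spatial dims to 1x1
    nn.Flatten(),
    nn.Linear(16, 10),                           # (3) fully connected layer
)
features = extractor(torch.randn(1, 3, 32, 32))  # output shape: (1, 10)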
Disclosure of Invention
The invention mainly aims to provide an image adaptive sparsification representation method and device based on deep learning that improve the recognition accuracy of deep convolution models on computer vision tasks such as image recognition and object detection, without introducing extra time or space complexity.
In order to achieve the purpose, the invention adopts the following technical scheme:
An image adaptive sparsification characterization method based on deep learning comprises the following steps:
A1, selecting an arbitrary deep convolutional neural network model M, adding a deep-learning method based on a semi-hard attention mechanism at each stage of the convolution operations by inserting the semi-hard attention mechanism into the convolutional layers, and constructing a new deep convolutional neural network model M';
A2, setting a linearly increasing semi-hard attention sparsification value range for obtaining sparse image representations;
A3, setting a loss function suited to the task, and training the whole deep convolutional neural network model M' using back-propagation;
wherein in the semi-hard attention mechanism the neural network learns a weight value for each image feature from the feature's statistical information, and when a weight value is smaller than a set threshold k, the image features corresponding to that weight are reset to zero.
Further:
In step A1, local information of the image is extracted step by step using multiple convolution operations, and the image features carrying the local information are then convolved so as to extract global information.
In step A1, the convolution operation is as in equation (1):

F_{i+1,j} = Conv(F_{i,*})    (1)

where Conv denotes the convolution operation, F_{i+1,j} is the j-th feature of the (i+1)-th layer of the convolution output, and F_{i,*} denotes all features of the i-th layer;

an attention mechanism is introduced as equation (2), and the mean of each image feature is used to measure the importance of the feature:

v_{i+1,j} = avgpool(F_{i+1,j})    (2)

where F_{i+1,j} is a two-dimensional feature of height h and width w, and avgpool averages this two-dimensional feature, i.e.,

avgpool(F_{i+1,j}) = (1/(h*w)) * sum_{m=1..h} sum_{n=1..w} F_{i+1,j}(m,n)

The mean is then mapped to a weight between [0,1] by a linear transformation and a nonlinear activation function:

v'_{i+1,*} = sigma(W * v_{i+1,*})    (3)

where v'_{i+1,*} is the attention value, W is the learnable weight of the linear transformation, and sigma denotes the sigmoid activation function.
Preferably, a semi-hard attention mechanism is added after every third convolution layer for feature sparsification.
In step A2, a dynamic value-range function is set:

y = min(f(x), k)    (4)

where f(x) is a linear function and x denotes the iteration count during training, increasing from 0 to the maximum number of iterations; once the value of f(x) exceeds k, the dynamic range is fixed at k and no longer changes. As the network iterates, the weight of each image feature first learns a locally optimal solution; the range within which weights are zeroed is then gradually enlarged, and the network finally converges to a better overall solution.

At each iteration, the attention values v'_{i+1,j} smaller than the dynamic range y in equation (4) are set to 0 and then applied to the convolution features of the current layer, yielding self-sparsified features:

F'_{i+1,*} = v'_{i+1,*} * F_{i+1,*}    (5)
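As an illustration of equations (2), (3) and (5) together with the semi-hard zeroing, the following is a hedged sketch (our PyTorch rendering, not code from the patent; the dynamic threshold y of equation (4) is supplied per iteration):

import torch
import torch.nn as nn

class SemiHardAttention(nn.Module):
    # Learns per-channel weights from channel means (eqs. (2)-(3)) and
    # zeroes the channels whose weight falls below the dynamic threshold y,
    # then rescales the convolution features (eq. (5)).
    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Linear(channels, channels)  # learnable W of eq. (3)

    def forward(self, F, y):
        v = F.mean(dim=(2, 3))                    # eq. (2): global average pooling
        v = torch.sigmoid(self.fc(v))             # eq. (3): weights in [0, 1]
        v = v * (v >= y).float()                  # semi-hard step: zero weights below y
        return F * v.unsqueeze(-1).unsqueeze(-1)  # eq. (5): self-sparsified features

Because zeroed channels are multiplied by exactly 0, they contribute no gradient during back-propagation, which matches the semi-hard behavior described above.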
in the step a3, the cross entropy function is used to process the image classification related task, and the mean square error loss function is used to process the target detection related task.
In step A3, for the ImageNet image classification task, two fully connected layers and a softmax layer are attached after the one-dimensional global feature to output the per-class predictions, and the cross-entropy loss (Cross Entropy Loss) is used for back-propagation training of the whole network:

H(p, q) = - sum_{i=1..n} p(x_i) * log q(x_i)

where n denotes the number of classes, p the correct answer given by the label, and q the predicted value output by the trained model.
An image adaptive sparsification characterization device based on deep learning comprises a computer-readable storage medium and a processor; the computer-readable storage medium stores an executable program which, when executed by the processor, implements the image adaptive sparsification characterization method based on deep learning described above.
A computer-readable storage medium stores an executable program which, when executed by a processor, implements the image adaptive sparsification characterization method based on deep learning described above.
The invention has the following beneficial effects:
the invention provides an image self-adaptive sparsity characterization method and device based on deep learning, which can be well fused in the current mainstream deep convolution models (such as ResNet) and stably improve the recognition accuracy of the deep convolution models on computer vision tasks such as image recognition, target detection and the like under the condition of not introducing extra time complexity and space complexity. The invention simultaneously proves the effectiveness of the ImageNet data set and the COCO data set which are widely used.
By using the method, after the self-adaptive sparsization is added to any deep learning model, the generalization and robustness of the model are obviously enhanced, namely the representation capability of the image characteristics is enhanced, and meanwhile, the time complexity and the space complexity of the model are kept unchanged.
Detailed Description
The embodiments of the present invention will be described in detail below. It should be emphasized that the following description is merely exemplary in nature and is not intended to limit the scope of the invention or its application.
An embodiment of the invention provides an image adaptive sparsification characterization method based on deep learning, comprising the following steps:
Step A1: selecting an arbitrary deep convolutional neural network model M, and adding a deep-learning method based on a semi-hard attention mechanism at each stage of the convolution operations; the new model is M'.
Convolution is a common operation for extracting image features in deep learning. Multiple convolution operations extract local image information step by step, from shallow to deep, and then convolve the image features carrying this local information so as to extract global information. Throughout this process, convolution can extract both shallow local features of the image (such as texture information) and high-level global features (such as semantic information). An attention mechanism means that the neural network uses statistical information of the image features at each stage (such as mean and variance) to learn a weight for each feature, where the weight value lies in [0,1] and indicates the importance of the feature. The semi-hard attention mechanism of this invention sets a threshold k: when a weight value is smaller than k, the image features corresponding to that weight are zeroed, i.e. a sparsification operation is performed. The purpose is to preserve the most important features; zeroing the unimportant ones prevents this unimportant information from influencing the back-propagation of the neural network, so that the trained network is more generalizable and robust.
Step A2: setting a linearly increasing semi-hard attention sparsification value range for obtaining a more robust sparse image representation.
Specifically, a final threshold k is set; when the neural network is trained to the end, all image features with weights less than k are explicitly zeroed. In addition, a value-range function can be set:
y=min(f(x),k)
Here, x denotes the number of iterations during training, increasing from 0 to the maximum number of iterations; once the value of f(x) exceeds k, the dynamic range is fixed at k and no longer changes. In our scheme, f(x) is set as a linear function, e.g. f(x) = 3e-5 * x with k = 0.3. Zeroing the image features whose weights are below k makes the corresponding parameters non-differentiable there, so those features cannot be trained sufficiently. Therefore, in the first few thousand iterations, where f(x) is close to 0, the whole neural network is approximately fully differentiable; after several thousand iterations the weight of each image feature has learned a locally optimal solution, and the zeroing range is then gradually increased, finally causing the network to converge to a better overall solution.
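A direct transcription of this schedule (using the example constants 3e-5 and 0.3 quoted above):

def dynamic_threshold(iteration, slope=3e-5, k=0.3):
    # Eq. (4): y = min(f(x), k) with the linear ramp f(x) = slope * x.
    # Close to 0 in early iterations, fixed at k once f(x) exceeds it.
    return min(slope * iteration, k)

# For example: dynamic_threshold(0) == 0.0, dynamic_threshold(5000) == 0.15,
# and dynamic_threshold(20000) == 0.3 (clamped at k from iteration 10000 on).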
Step A3, setting a loss function suitable for a specific task, and training the whole deep convolution neural network model M' by using back propagation.
The method is suitable for general computer vision tasks, and the whole network model is trained with back-propagation. A cross-entropy function can be used for image classification tasks, and a mean-squared-error loss function for tasks such as object detection. This step and the two steps above can be trained jointly, and the whole process does not increase the time or space complexity of the neural network. The two basic computer vision tasks of image classification and object detection are taken as examples in this embodiment, but the applicable range of the method is not limited to these tasks.
The embodiment of the invention provides an adaptive sparse representation method for the application of deep learning in computer vision, in particular for extracting image information. The method can be simply integrated into any model based on a deep convolutional neural network. A semi-hard attention mechanism is added to each convolution block, so that effective features receive attention while ineffective features are zeroed. For example, for an image a, the features corresponding to some parameters p are zeroed, meaning that these features have low weight and can therefore be discarded. During training, the back-propagation induced by image a then does not affect the parameters p of the model, so other training images can fully exploit the feature representation provided by p. The benefit is that invalid features are sparsified away and large-data images are characterized more effectively. Notably, the method brings no extra time or space complexity in the training and testing stages, genuinely achieving enhanced model generalization, robustness and image representation capability. The zero-gradient behavior can be checked directly, as in the small example below.
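The snippet is a small illustrative experiment of ours (PyTorch assumed), not code from the patent:

import torch

v = torch.tensor([0.1, 0.5], requires_grad=True)  # two attention weights
y = 0.3                                           # current threshold
out = (v * (v >= y).float()).sum()                # semi-hard masking
out.backward()
print(v.grad)  # tensor([0., 1.]): the zeroed weight receives no gradient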
In some particularly preferred embodiments, the method may be operated as follows.
Step A1: the core operation of this step is adding a semi-hard attention mechanism to the convolutional layers. We first describe the convolutional layer:

F_{i+1,j} = Conv(F_{i,*})    (1)

Here, Conv denotes the convolution operation, F_{i+1,j} is the j-th feature of the (i+1)-th layer of the convolution output, and F_{i,*} denotes all features of the i-th layer. The convolution operation of a conventional convolutional network is performed by equation (1).
We next introduce the attention mechanism, in which the mean of each feature is used to determine the importance of the feature itself:

v_{i+1,j} = avgpool(F_{i+1,j})    (2)

Here, F_{i+1,j} is a two-dimensional feature of height h and width w, and avgpool averages this two-dimensional feature, which can be written as

avgpool(F_{i+1,j}) = (1/(h*w)) * sum_{m=1..h} sum_{n=1..w} F_{i+1,j}(m,n)

These means are then mapped to weights between [0,1] by a linear transformation and a nonlinear activation function:

v'_{i+1,*} = sigma(W * v_{i+1,*})    (3)

Here, W is the learnable weight of the linear transformation and sigma denotes the sigmoid activation function. Up to this point we have obtained, from the self-attention mechanism over the features, the weights in [0,1] of the features themselves; how the features are sparsified is described in step A2. Since the amount of computation of the attention mechanism is an order of magnitude smaller than that of the current layer's convolution, our method does not incur additional time or space complexity. In addition, in our model we add a semi-hard attention mechanism after every third convolution layer for feature sparsification, which further reduces the computational cost. One possible arrangement is sketched after this paragraph.
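The following hedged sketch (our illustration, not the patent's exact architecture) reuses the SemiHardAttention module from the earlier sketch and attaches it after every third convolution layer:

import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU())

class SparseStage(nn.Module):
    # Three convolution layers followed by one semi-hard attention module,
    # i.e. the sparsification step is applied after every third convolution.
    def __init__(self, channels=32):
        super().__init__()
        self.convs = nn.Sequential(
            conv_block(3, channels), conv_block(channels, channels),
            conv_block(channels, channels))
        self.attn = SemiHardAttention(channels)  # from the earlier sketch

    def forward(self, x, y):                     # y: current dynamic threshold
        return self.attn(self.convs(x), y)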
Step A2: unless otherwise specified, "attention mechanism" generally refers to a soft attention mechanism, in which the weight may take any value in [0,1] and is differentiable everywhere. A hard attention mechanism means the weight can only take one of the two values 0 and 1, and is mostly non-differentiable. This method adopts a semi-hard attention mechanism: a threshold k is set, all weight values smaller than k are forced to 0, and all values greater than or equal to k keep their original values. This combines the hard and soft attention mechanisms, so the semi-hard attention mechanism is differentiable on the range greater than or equal to k and non-differentiable on the range smaller than k, which is exactly the behavior we require.
Furthermore, to make the weight of each image feature meaningful at initialization, we train for several rounds with a very small threshold (approximately 0), in order to ensure that the features zeroed by the semi-hard attention mechanism are indeed relatively unimportant features rather than the result of random initialization. The specific dynamic range is set as follows:

y = min(f(x), k)    (4)

Here, x denotes the number of iterations during training (one iteration processes one batch of images), increasing from 0 to the maximum number of iterations; once the value of f(x) exceeds k, the dynamic range is fixed at k and no longer changes. In our scheme, f(x) is set as a linear function, e.g. f(x) = 3e-5 * x with k = 0.3. Notably, we do not set k to 0.5: since most features in a neural network are valid, discarding too many (half or more) feature channels causes a large drop in performance, the opposite of the desired effect.
At each iteration, the attention values v'_{i+1,j} of equation (3) that are smaller than the dynamic range y in equation (4) are set to 0 and then applied to the convolution features of the current layer, yielding self-sparsified features:

F'_{i+1,*} = v'_{i+1,*} * F_{i+1,*}    (5)
step A3: in the first two steps, a high-dimensional vector representing image features is usually output, i.e. the last two-dimensional features of the convolutional neural network are input into an avgpool layer (see formula 2), and one-dimensional global features are obtained. Taking the image classification ImageNet task as an example, two full-connection layers and one softmax layer are connected behind one-dimensional global features to output various predicted values, and a Cross Entropy Loss function (Cross Engine Loss) is used for back propagation training of the whole network:
Figure BDA0002483720370000071
where n represents the number of classes (e.g., ImageNet where n is 1000), p represents the correct answer given by the label, and q represents the predicted value of the model output we have trained.
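Putting the pieces together, this is a hedged sketch only; model, loader and dynamic_threshold refer to the earlier sketches or are placeholders, and the hyperparameters are illustrative:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()  # applies log-softmax internally, so the model outputs raw logits
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # 'model' is a placeholder network

for step, (images, labels) in enumerate(loader):  # 'loader' is a placeholder data loader
    y = dynamic_threshold(step)                   # eq. (4): linearly increasing threshold
    logits = model(images, y)                     # forward pass with semi-hard sparsification
    loss = criterion(logits, labels)              # cross-entropy H(p, q) from above
    optimizer.zero_grad()
    loss.backward()                               # back-propagation training
    optimizer.step()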
Compared with the deep convolution model ResNet50, the performance of a model trained with this method on ImageNet image classification is shown in Table 1 below, and on COCO object detection in Table 2 below.
TABLE 1
Model | Accuracy on ImageNet image classification
ResNet50 | 76.2%
ResNet50 + feature sparsification (this method) | 77.3%

TABLE 2
Model | mAP on COCO object detection
FCOS (ResNet50) | 38.7%
FCOS (ResNet50) + feature sparsification (this method) | 39.3%
End-to-end training can be realized through steps A1-A3. As the number of training iterations increases, the value range in step A2 grows higher and higher, the features output by the whole model become sparser, and the sparsified redundant parameters provide better feature expression for other inputs.
The background of the present invention may contain background information related to the problem or environment of the present invention and does not necessarily describe the prior art. Accordingly, the inclusion in the background section is not an admission of prior art by the applicant.
The foregoing is a more detailed description of the invention in connection with specific/preferred embodiments and is not intended to limit the practice of the invention to those descriptions. It will be apparent to those skilled in the art that various substitutions and modifications can be made to the described embodiments without departing from the spirit of the invention, and these substitutions and modifications should be considered to fall within the scope of the invention. In the description herein, references to the terms "one embodiment," "some embodiments," "preferred embodiments," "an example," "a specific example," or "some examples" mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such schematic expressions do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples, and the various embodiments or examples and their features described in this specification can be combined by those skilled in the art without contradiction. Although embodiments of the present invention and their advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the scope of the claims.

Claims (9)

1. An image adaptive sparsification characterization method based on deep learning, characterized by comprising the following steps:
A1: selecting an arbitrary deep convolutional neural network model M, adding a deep-learning method based on a semi-hard attention mechanism at each stage of the convolution operations by inserting the semi-hard attention mechanism into the convolutional layers, and constructing a new deep convolutional neural network model M';
A2: setting a linearly increasing semi-hard attention sparsification value range for obtaining sparse image representations;
A3: setting a loss function suited to the task, and training the whole deep convolutional neural network model M' using back-propagation;
wherein in the semi-hard attention mechanism the neural network learns a weight value for each image feature from the feature's statistical information, and when a weight value is smaller than a set threshold k, the image features corresponding to that weight are reset to zero.
2. The method as claimed in claim 1, wherein in step A1, local information of the image is extracted step by step using multiple convolution operations, and the image features carrying the local information are convolved to extract global information.
3. The method of claim 1, wherein in step A1 the convolution operation is as in equation (1):

F_{i+1,j} = Conv(F_{i,*})    (1)

wherein Conv denotes the convolution operation, F_{i+1,j} is the j-th feature of the (i+1)-th layer of the convolution output, and F_{i,*} denotes all features of the i-th layer;

an attention mechanism is introduced as equation (2), and the mean of each image feature is used to measure the importance of the feature:

v_{i+1,j} = avgpool(F_{i+1,j})    (2)

wherein F_{i+1,j} is a two-dimensional feature of height h and width w, and avgpool averages this two-dimensional feature, i.e.,

avgpool(F_{i+1,j}) = (1/(h*w)) * sum_{m=1..h} sum_{n=1..w} F_{i+1,j}(m,n)

the mean is then mapped to a weight between [0,1] by a linear transformation and a nonlinear activation function:

v'_{i+1,*} = sigma(W * v_{i+1,*})    (3)

wherein v'_{i+1,*} is the attention value, W is the learnable weight of the linear transformation, and sigma denotes the sigmoid activation function.
4. The method as claimed in any one of claims 1 to 3, wherein a semi-hard attention mechanism is added after every third convolution layer for feature sparsification.
5. The method according to any one of claims 1 to 4, wherein in step A2 a dynamic value-range function is set:

y = min(f(x), k)    (4)

wherein f(x) is a linear function and x denotes the iteration count during training, increasing from 0 to the maximum number of iterations; once the value of f(x) exceeds k, the dynamic range is fixed at k and no longer changes; as the network iterates, the weight of each image feature first learns a locally optimal solution, the range within which weights are zeroed is then gradually enlarged, and the network finally converges to a better overall solution;

at each iteration, the attention values v'_{i+1,j} smaller than the dynamic range y in equation (4) are set to 0 and then applied to the convolution features of the current layer, yielding self-sparsified features:

F'_{i+1,*} = v'_{i+1,*} * F_{i+1,*}    (5).
6. the method according to any one of claims 1 to 5, wherein in step A3, the image classification related task is processed by using a cross entropy function, and the target detection related task is processed by using a mean square error loss function.
7. The method according to any one of claims 1 to 5, wherein in step A3, for the ImageNet image classification task, two fully connected layers and a softmax layer are attached after the one-dimensional global feature to output the per-class predictions, and the whole network is trained by back-propagation using the cross-entropy loss (Cross Entropy Loss):

H(p, q) = - sum_{i=1..n} p(x_i) * log q(x_i)

wherein n denotes the number of classes, p the correct answer given by the label, and q the predicted value output by the trained model.
8. An image adaptive sparsification characterization device based on deep learning, comprising a computer-readable storage medium and a processor, wherein an executable program is stored in the computer-readable storage medium and, when executed by the processor, implements the image adaptive sparsification characterization method based on deep learning according to any one of claims 1 to 7.
9. A computer-readable storage medium storing an executable program, wherein the executable program, when executed by a processor, implements the image adaptive sparsification characterization method based on deep learning according to any one of claims 1 to 7.
CN202010385699.3A 2020-05-09 2020-05-09 Image adaptive sparsification representation method and device based on deep learning Active CN111652246B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010385699.3A CN111652246B (en) 2020-05-09 2020-05-09 Image adaptive sparsification representation method and device based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010385699.3A CN111652246B (en) 2020-05-09 2020-05-09 Image adaptive sparsification representation method and device based on deep learning

Publications (2)

Publication Number Publication Date
CN111652246A true CN111652246A (en) 2020-09-11
CN111652246B CN111652246B (en) 2023-04-18

Family

ID=72342551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010385699.3A Active CN111652246B (en) 2020-05-09 2020-05-09 Image adaptive sparsification representation method and device based on deep learning

Country Status (1)

Country Link
CN (1) CN111652246B (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871136A (en) * 2017-03-22 2018-04-03 中山大学 The image-recognizing method of convolutional neural networks based on openness random pool
CN110827312A (en) * 2019-11-12 2020-02-21 北京深境智能科技有限公司 Learning method based on cooperative visual attention neural network
CN111046962A (en) * 2019-12-16 2020-04-21 中国人民解放军战略支援部队信息工程大学 Sparse attention-based feature visualization method and system for convolutional neural network model


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114563763A (en) * 2022-01-21 2022-05-31 青海师范大学 Underwater sensor network node distance measurement positioning method based on return-to-zero neurodynamics
US11658752B1 (en) 2022-01-21 2023-05-23 Qinghai Normal University Node positioning method for underwater wireless sensor network (UWSN) based on zeroing neural dynamics (ZND)

Also Published As

Publication number Publication date
CN111652246B (en) 2023-04-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant