CN111275076B - Image saliency detection method based on feature selection and feature fusion - Google Patents
Image saliency detection method based on feature selection and feature fusion
- Publication number: CN111275076B
- Application number: CN202010030505.8A
- Authority
- CN
- China
- Prior art keywords
- feature
- conv
- features
- pyramid set
- channel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F18/211 — Pattern recognition; selection of the most significant subset of features
- G06F18/253 — Pattern recognition; fusion techniques of extracted features
- G06N3/045 — Neural networks; combinations of networks
- G06N3/048 — Neural networks; activation functions
- G06N3/08 — Neural networks; learning methods
Abstract
The invention discloses an image saliency detection method based on feature selection and feature fusion, comprising the following steps: extracting features from the input image and adding them to a feature pyramid set; performing feature selection on the feature pyramid set to obtain a new feature pyramid set; performing bottom-up feature fusion on the features in the new feature pyramid set to obtain a mixed feature pyramid set; and training a saliency prediction network model with the features in the mixed feature pyramid set, then performing saliency detection on the image to be detected with the trained model. The invention uses an attention model to perform feature selection on the image, enhancing the features related to the image target and making them more effective, and uses a bottom-up feature fusion structure to effectively fuse the low-level detail features with the high-level semantic features, greatly improving the representational capability of the features; its detection accuracy is higher than that of common saliency network models.
Description
Technical Field
The invention belongs to the field of image saliency detection, and particularly relates to an image saliency detection method based on feature selection and feature fusion.
Background
The salient content of an image is the object or region that draws a viewer's attention; the result of saliency detection on an image or video is usually the objects it contains. In neuroscience, saliency is described as an attention mechanism that focuses perception on a significant part of the observed scene, and saliency detection automatically produces a representation of the objects in an image. Saliency detection can also improve the efficiency of algorithms such as object detection and image segmentation.
The most effective saliency detection methods at present are based on fully convolutional neural networks. A fully convolutional network stacks many convolutional and pooling layers, gradually enlarging the receptive field and producing high-level semantic information that is crucial for saliency detection; the pooling layers, however, shrink the feature maps and degrade the boundaries of salient objects. Some networks preserve object boundaries with hand-crafted features: they extract hand-crafted features to compute a saliency value per superpixel and use them to divide the image into regions. When generating the saliency map, hand-crafted features and the high-level features of a convolutional neural network are complementary, but these methods extract the two kinds of features separately, and features extracted separately are difficult to fuse effectively. Furthermore, the hand-crafted feature extraction process is very time-consuming.
Beyond hand-crafted features, some studies have found that the features of different network layers are also complementary, and integrate multi-scale features for saliency detection. More specifically, deep features often carry global, context-aware information suited to correctly locating salient regions, while shallow features carry the spatial structural detail suited to locating boundaries. These methods fuse features of different scales but do not account for their different contributions to saliency, which limits detection performance. To overcome this, the prior art proposes introducing an attention model and a gate function into the saliency detection network, but this approach ignores the distinct characteristics of high-level and low-level features, which can hinder the extraction of effective features and thus reduce detection accuracy.
Disclosure of Invention
The invention aims to provide an image saliency detection method based on feature selection and feature fusion that better characterizes images and predicts saliency.
The technical solution realizing the purpose of the invention is as follows: an image saliency detection method based on feature selection and feature fusion, comprising the following steps:
Step 1, extracting features from an input image and adding all the features to a feature pyramid set;
Step 2, performing feature selection on the feature pyramid set to obtain a new feature pyramid set;
Step 3, performing bottom-up feature fusion on the features in the new feature pyramid set to obtain a mixed feature pyramid set;
Step 4, training a saliency prediction network model with the features in the mixed feature pyramid set, and performing saliency detection on the image to be detected with the trained saliency prediction network model.
Further, the feature extraction of step 1 is performed on the input image with the convolutional neural network ResNeXt, as follows:
Suppose the five convolutional blocks of ResNeXt are conv_1, conv_2, conv_3, conv_4 and conv_5.
Step 1-1, the image is passed through the five convolutional blocks in sequence in a forward pass, with the iteration formula:
f_{i+1} = conv_j(f_i, W_j), j ∈ [1,5], i ∈ [-1,3], j = i + 2
where f_{-1} (the case i = -1) is the image to be detected, f_{i+1} for i = -1,0,1,2,3 denotes the output of conv_1, conv_2, conv_3, conv_4, conv_5 respectively, and W_j are the parameters of the convolutional block conv_j;
Step 1-2, the feature map output by each convolutional block is added to the output set, forming the feature pyramid set {f_0, f_1, f_2, f_3, f_4}.
Further, the feature selection of step 2 on the feature pyramid set uses spatial attention and channel attention mechanisms, as follows:
Step 2-1, perform feature selection on the bottom-level feature map f_0 in the feature pyramid set using spatial attention, obtaining a new bottom-level feature map f̂^l;
Step 2-2, perform feature selection on the middle-level feature map f_2 in the feature pyramid set using channel attention, obtaining a new middle-level feature map f̂^m.
Further, step 2-1 uses spatial attention to perform feature selection on the bottom-level feature map f_0 in the feature pyramid set, obtaining a new bottom-level feature map f̂^l, specifically as follows:
Denote the bottom-level feature map f_0 as f^l ∈ R^{w×h×c}, where w, h and c are the width, height and number of channels of the feature map; construct a spatial attention module containing two sub-convolution blocks, denoted conv_11 and conv_22;
Step 2-1-1, f^l is fed into the sub-convolution blocks conv_11 and conv_22 in sequence, which output the feature maps C_1 and C_2 respectively:
C_1 = conv_11(f^l, W_11)
C_2 = conv_22(f^l, W_22)
where W_11 and W_22 are the parameters of the sub-convolution blocks conv_11 and conv_22 respectively;
Step 2-1-2, the outputs C_1 and C_2 of the sub-convolution blocks are added element-wise and the sum is mapped to [0,1] with a sigmoid function, giving the spatial attention weight SA, with the formula:
SA = σ(C_1 + C_2)
where σ denotes the sigmoid function;
Step 2-1-3, the spatial attention weight SA is used to perform feature selection on the bottom-level feature map f_0, giving the new bottom-level feature map f̂^l, with the formula:
f̂^l = SA ⊗ f^l
where ⊗ denotes element-wise multiplication.
Further, the sub-convolution blocks conv_11 and conv_22 each comprise two convolutional layers: one layer has 32 convolution kernels of size 3x3, and the other has 64 convolution kernels of size 3x3.
Further, step 2-2 uses channel attention to perform feature selection on the middle-level feature map f_2 in the feature pyramid set, obtaining a new middle-level feature map f̂^m, specifically as follows:
Denote the middle-level feature map f_2 as f^m;
Step 2-2-1, expand f^m into a set:
f^m = {f^m_1, f^m_2, ..., f^m_C}
where f^m_i is the i-th channel slice feature of f^m, i = 1,2,...,C, and C is the number of channels of the feature map f^m;
Step 2-2-2, perform global pooling on each channel slice feature f^m_i to obtain the channel-level vector v^m ∈ R^C;
Step 2-2-3, learn a channel-level attention vector from v^m with two consecutive fully connected layers and a nonlinear activation layer, and map it to [0,1] with a sigmoid function, giving the channel attention weight CA, with the formula:
CA = F(v^m, W) = σ(fc_2(δ(fc_1(v^m, W_1)), W_2))
where W_1 and W_2 are the parameters of the fully connected layers fc_1 and fc_2 respectively, δ is a nonlinear activation function, and σ is the sigmoid function;
Step 2-2-4, the channel attention weight CA is used to re-weight the channels of the middle-level feature map f_2, giving the new middle-level feature map f̂^m, with the formula:
f̂^m_i = CA_i · f^m_i, i = 1,2,...,C
where CA_i is the i-th component of CA.
Further, step 3 performs bottom-up feature fusion on the features in the new feature pyramid set to obtain the mixed feature pyramid set, specifically as follows:
Step 3-1, take the new bottom-level feature map f̂^l as the initial fusion feature; upsample the next higher-level feature map f_i and concatenate the upsampled feature map with f̂^l, or in later iterations with the previous mixed feature, along the channel dimension, obtaining the cascade feature f_cat:
f_cat = [up(f_i), g_j]
where up(f_i) denotes upsampling of the feature f_i, [·] denotes the channel concatenation operation, and g_j denotes f̂^l when j = -1 and the mixed feature f̂_j learned through the three convolutional layers when j = 0,1,2;
Step 3-2, pass the cascade feature f_cat through three convolutional layers to learn the feature fusion, obtaining the mixed feature f̂_{j+1}, with the formula:
f̂_{j+1} = conv(f_cat)
Step 3-3, repeat step 3-1 and step 3-2 in a bottom-up manner, fusing the features f_1, f̂^m, f_3, f_4 of the new feature pyramid set layer by layer to obtain the mixed feature pyramid set.
Further, the saliency prediction network model of step 4 comprises three convolutional layers; a batch normalization layer and an activation layer follow each of the first two convolutional layers, and the last convolutional layer outputs a single-channel saliency map with the same resolution as the original input image.
Further, in step 4, the saliency prediction network model is trained with the features in the mixed feature pyramid set as follows:
Step 4-1, perform saliency prediction on the features in the mixed feature pyramid set in sequence with the saliency prediction network model;
Step 4-2, compute the loss over all prediction results to obtain a gradient, and iteratively update the parameters of the saliency prediction network model with this gradient via the backpropagation algorithm;
Repeat steps 4-1 to 4-2 until the number of iterations exceeds a preset threshold, completing the training of the saliency prediction network model.
Compared with the prior art, the invention has the following notable advantages: 1) an attention model performs feature selection on the image, enhancing the features related to the image target and making them more effective; 2) a bottom-up feature fusion structure effectively fuses the low-level detail features with the high-level semantic features, greatly improving the representational capability of the features, so that the detection accuracy is higher than that of common saliency network models.
The present invention is described in further detail below with reference to the attached drawing figures.
Drawings
Fig. 1 is a flowchart of the image saliency detection method based on feature selection and feature fusion according to the present invention.
FIG. 2 is a diagram illustrating feature selection performed on a feature map by a spatial attention module according to the present invention.
FIG. 3 is a diagram illustrating feature selection performed on a feature map by a channel attention module according to the present invention.
FIG. 4 is a schematic diagram of bottom-up feature fusion for a feature pyramid in accordance with the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, in conjunction with fig. 1, the present invention provides an image saliency detection method based on feature selection and feature fusion, comprising the following steps:
Step 1, extracting features from an input image and adding all the features to a feature pyramid set;
Step 2, performing feature selection on the feature pyramid set to obtain a new feature pyramid set;
Step 3, performing bottom-up feature fusion on the features in the new feature pyramid set to obtain a mixed feature pyramid set;
Step 4, training a saliency prediction network model with the features in the mixed feature pyramid set, and performing saliency detection on the image to be detected with the trained saliency prediction network model.
Further, in one embodiment, the feature extraction of step 1 is performed on the input image with the convolutional neural network ResNeXt, as follows:
Suppose the five convolutional blocks of ResNeXt are conv_1, conv_2, conv_3, conv_4 and conv_5; the higher feature layers carry rich semantic information, while the lower feature layers carry rich low-level information such as texture.
Step 1-1, the image is passed through the five convolutional blocks in sequence in a forward pass, with the iteration formula:
f_{i+1} = conv_j(f_i, W_j), j ∈ [1,5], i ∈ [-1,3], j = i + 2
where f_{-1} (the case i = -1) is the image to be detected, f_{i+1} for i = -1,0,1,2,3 denotes the output of conv_1, conv_2, conv_3, conv_4, conv_5 respectively, and W_j are the parameters of the convolutional block conv_j;
Step 1-2, the feature map output by each convolutional block is added to the output set, forming the feature pyramid set {f_0, f_1, f_2, f_3, f_4}.
As a specific example, conv_1 is a single convolutional layer with kernel size 7x7, and conv_2, conv_3, conv_4, conv_5 comprise 3, 4, 6 and 3 blocks respectively, where a block is the structure commonly used in the ResNet series, namely three convolutional layers stacked in series with kernel sizes 1x1, 3x3 and 1x1 respectively.
Illustratively, as a specific example, assume an input image I ∈ R^{3×300×300}, i.e. an RGB three-channel picture whose width and height are both 300 pixels. The feature pyramid set obtained through step 1 is {f_0, f_1, f_2, f_3, f_4}, where the superscript of each feature map indicates its serial number and the subscripts indicate its number of channels, width and height.
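The shape bookkeeping of steps 1-1 and 1-2 can be sketched as follows. This is a minimal numpy model in which each convolutional block is a stand-in that only reproduces the downsampling and channel change; the channel widths and strides below follow a typical ResNeXt-50 backbone and are assumptions, not values fixed by the patent.

```python
import numpy as np

def conv_block(x, out_channels, stride):
    """Stand-in for one ResNeXt convolutional block: models only the
    shape transformation (stride-s downsampling, channel change),
    not the actual convolution arithmetic."""
    c, h, w = x.shape
    return np.zeros((out_channels, h // stride, w // stride), dtype=x.dtype)

def build_feature_pyramid(image):
    """Feed the image through the five blocks conv_1..conv_5 in sequence
    and collect every output, as in steps 1-1 and 1-2."""
    # Assumed widths/strides of a ResNeXt-50-like backbone.
    specs = [(64, 2), (256, 2), (512, 2), (1024, 2), (2048, 2)]
    pyramid, f = [], image
    for out_c, s in specs:
        f = conv_block(f, out_c, s)   # f_{i+1} = conv_j(f_i, W_j)
        pyramid.append(f)
    return pyramid                    # {f_0, f_1, f_2, f_3, f_4}

image = np.zeros((3, 300, 300), dtype=np.float32)  # I in R^{3x300x300}
pyramid = build_feature_pyramid(image)
```

With these assumed strides, f_2 has 512 channels, consistent with the 512-channel expansion used in the channel-attention example later in the description.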
Further, in one embodiment, the feature selection of step 2 on the feature pyramid set uses spatial attention and channel attention mechanisms, as follows:
Step 2-1, perform feature selection on the bottom-level feature map f_0 in the feature pyramid set using spatial attention, obtaining a new bottom-level feature map f̂^l;
Step 2-2, perform feature selection on the middle-level feature map f_2 in the feature pyramid set using channel attention, obtaining a new middle-level feature map f̂^m.
Further, in one embodiment, in conjunction with FIG. 2, step 2-1 uses spatial attention to perform feature selection on the bottom-level feature map f_0 in the feature pyramid set, obtaining a new bottom-level feature map f̂^l, specifically as follows:
Denote the bottom-level feature map f_0 as f^l ∈ R^{w×h×c}, where w, h and c are the width, height and number of channels of the feature map; construct a spatial attention module containing two sub-convolution blocks, denoted conv_11 and conv_22;
Step 2-1-1, f^l is fed into the sub-convolution blocks conv_11 and conv_22 in sequence, which output the feature maps C_1 and C_2 respectively:
C_1 = conv_11(f^l, W_11)
C_2 = conv_22(f^l, W_22)
where W_11 and W_22 are the parameters of the sub-convolution blocks conv_11 and conv_22 respectively;
As a specific example, for the example above, f^l is put into conv_11 and conv_22 in sequence, which output the feature maps C_1 and C_2 respectively.
Step 2-1-2, the outputs C_1 and C_2 of the sub-convolution blocks are added element-wise and the sum is mapped to [0,1] with a sigmoid function, giving the spatial attention weight SA, with the formula:
SA = σ(C_1 + C_2)
where σ denotes the sigmoid function;
Step 2-1-3, the spatial attention weight SA is used to perform feature selection on the bottom-level feature map f_0, giving the new bottom-level feature map f̂^l, with the formula:
f̂^l = SA ⊗ f^l
where ⊗ denotes element-wise multiplication.
further, in one embodiment, the sub-volume block conv 11 、conv 22 Each including two convolutional layers, one of which has 32 convolutional kernels and 3x3 convolutional kernels, and the other has 64 convolutional kernels and 3x3 convolutional kernels.
Further, in one embodiment, in conjunction with FIG. 3, step 2-2 uses channel attention to perform feature selection on the middle-level feature map f_2 in the feature pyramid set, obtaining a new middle-level feature map f̂^m, specifically as follows:
Denote the middle-level feature map f_2 as f^m;
Step 2-2-1, expand f^m into a set:
f^m = {f^m_1, f^m_2, ..., f^m_C}
where f^m_i is the i-th channel slice feature of f^m, i = 1,2,...,C, and C is the number of channels of the feature map f^m;
As a specific example, for the example above, f^m is expanded into the set f^m = {f^m_1, f^m_2, ..., f^m_512}.
Step 2-2-2, perform global pooling on each channel slice feature f^m_i to obtain the channel-level vector v^m ∈ R^C;
Step 2-2-3, learn a channel-level attention vector from v^m with two consecutive fully connected layers and a nonlinear activation layer, and map it to [0,1] with a sigmoid function, giving the channel attention weight CA, with the formula:
CA = F(v^m, W) = σ(fc_2(δ(fc_1(v^m, W_1)), W_2))
where W_1 and W_2 are the parameters of the fully connected layers fc_1 and fc_2 respectively, δ is a nonlinear activation function, and σ is the sigmoid function;
Step 2-2-4, the channel attention weight CA is used to re-weight the channels of the middle-level feature map f_2, giving the new middle-level feature map f̂^m, with the formula:
f̂^m_i = CA_i · f^m_i, i = 1,2,...,C
where CA_i is the i-th component of CA.
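The channel attention of steps 2-2-1 to 2-2-4 can be sketched in numpy as follows. The patent specifies only "a nonlinear activation" between the two fully connected layers, so the ReLU here is an assumption, as are the squeeze dimension (32) and the toy shapes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(f_m, W1, W2):
    """Steps 2-2-1 to 2-2-4: global average pooling per channel slice,
    fc_1 -> nonlinearity -> fc_2 -> sigmoid, then per-channel re-weighting."""
    C = f_m.shape[0]
    v_m = f_m.reshape(C, -1).mean(axis=1)   # global pooling -> v^m in R^C
    hidden = np.maximum(0.0, W1 @ v_m)      # delta(fc_1(v^m, W_1)), ReLU assumed
    CA = sigmoid(W2 @ hidden)               # CA = sigma(fc_2(...)) in (0,1)^C
    return CA[:, None, None] * f_m          # CA_i * f^m_i for each channel slice

rng = np.random.default_rng(1)
f_m = rng.standard_normal((512, 8, 8))      # toy middle-level feature map
W1 = rng.standard_normal((32, 512)) * 0.05  # fc_1: squeeze 512 -> 32 (assumed)
W2 = rng.standard_normal((512, 32)) * 0.05  # fc_2: restore 32 -> 512

f_hat = channel_attention(f_m, W1, W2)
```

Each output channel is the corresponding input channel scaled by a single weight in (0,1), which is exactly the per-channel re-weighting of step 2-2-4.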
further, in one embodiment, with reference to fig. 4, in step 3, feature fusion is performed on features in the new feature pyramid set in a bottom-up manner, so as to obtain a fused feature pyramid set, which specifically includes:
step 3-1, removing new bottom layer characteristic diagramSampling some other characteristic graph as new bottom characteristic graphThen the up-sampled feature map andor mixed feature cascade to obtain cascade feature f cat The formula used is:
in the formula (f) i ×) represents the pair feature f i Up-sampling, [ c ]]Indicating channel cascade operation, j = -1,to representj =0,1,2,representing cascade characteristics f cat Hybrid features learned through the three convolutional layers;
step 3-2, cascading characteristic f cat Through three layers of convolution layers, learning of feature fusion is carried out to obtain mixed featuresThe formula used is:
step 3-3, repeating step 3-1 and step 3-2 in a bottom-up mode, and enabling the features f in the new feature pyramid set to be new 1 ,f 2 ,f 3 ,f 4 I.e. f 1 ,f 3 ,f 4 Fusing layer by layer to obtain a mixed characteristic pyramid set
Illustratively, in one embodiment, the convolution kernel size of the three convolutional layers in step 3-2 is 3x3,1x1 in this order.
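The upsample-concatenate-reduce flow of steps 3-1 to 3-3 can be sketched in numpy as below. The three learned fusion convolution layers are replaced by a fixed channel-averaging stand-in, and nearest-neighbour upsampling is an assumption (the patent does not specify the interpolation); the pyramid shapes are toy values.

```python
import numpy as np

def upsample2x(f):
    """Nearest-neighbour upsampling by pixel repetition (assumed)."""
    return f.repeat(2, axis=1).repeat(2, axis=2)

def fuse(mixed, higher, out_channels):
    """One round of steps 3-1 and 3-2: upsample the higher-level feature to
    the running mixed feature's resolution, concatenate along the channel
    axis, then reduce back to out_channels (stand-in for the three conv
    layers that learn the fusion)."""
    up = higher
    while up.shape[1] < mixed.shape[1]:
        up = upsample2x(up)
    up = up[:, :mixed.shape[1], :mixed.shape[2]]   # crop to match exactly
    f_cat = np.concatenate([up, mixed], axis=0)    # f_cat = [up(f_i), g_j]
    # Stand-in fusion: average channel groups down to out_channels.
    groups = np.array_split(f_cat, out_channels, axis=0)
    return np.stack([g.mean(axis=0) for g in groups])

rng = np.random.default_rng(2)
g = rng.standard_normal((64, 32, 32))              # initial fusion feature (new bottom map)
pyramid_rest = [rng.standard_normal((c, s, s))
                for c, s in [(256, 16), (512, 8), (1024, 4), (2048, 2)]]

mixed_set = []
for f_i in pyramid_rest:                           # f_1, new f_2, f_3, f_4
    g = fuse(g, f_i, out_channels=64)              # steps 3-1 and 3-2
    mixed_set.append(g)                            # mixed feature pyramid set
```

Every mixed feature keeps the bottom-level resolution, so high-level semantics are folded into the detail-rich scale layer by layer, as step 3-3 describes.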
Further, in one embodiment, the saliency prediction network model of step 4 comprises three convolutional layers; a batch normalization layer and an activation layer follow each of the first two convolutional layers, and the last convolutional layer outputs a single-channel saliency map with the same resolution as the original input image.
In one embodiment, the sizes of convolution kernels of three convolutional layers included in the significance prediction network model are sequentially 3x3,1x1.
Further, in one embodiment, in step 4 the saliency prediction network model is trained with the features in the mixed feature pyramid set as follows:
Step 4-1, perform saliency prediction on the features in the mixed feature pyramid set in sequence with the saliency prediction network model;
Step 4-2, compute the loss over all prediction results to obtain a gradient, and iteratively update the parameters of the saliency prediction network model with this gradient via the backpropagation algorithm;
Repeat steps 4-1 to 4-2 until the number of iterations exceeds a preset threshold, completing the training of the saliency prediction network model.
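The training loop of steps 4-1 and 4-2 can be sketched in numpy as follows. The three-layer prediction network is replaced by a single per-pixel weighted channel sum followed by a sigmoid, and the loss is binary cross-entropy; both are illustrative assumptions, since the patent fixes neither the loss nor the exact layer parameters. Only the loop structure (predict on every mixed feature, accumulate the gradient, update by backpropagation until an iteration threshold) mirrors the description.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_saliency(feature, w):
    """Stand-in saliency prediction network: a 1x1-conv-like per-pixel
    weighted channel sum plus sigmoid, instead of three conv layers."""
    logits = np.einsum('c,chw->hw', w, feature)
    return sigmoid(logits)

def train(mixed_pyramid, target, w, lr=0.1, max_iter=200):
    """Steps 4-1 and 4-2: predict a saliency map from each mixed feature,
    accumulate the cross-entropy gradient over all predictions, and update
    the parameters until the iteration count exceeds the threshold."""
    for _ in range(max_iter):
        grad = np.zeros_like(w)
        for f in mixed_pyramid:
            pred = predict_saliency(f, w)
            # d(BCE)/d(logits) = pred - target, chained through the weighted sum.
            grad += np.einsum('hw,chw->c', pred - target, f) / target.size
        w -= lr * grad / len(mixed_pyramid)
    return w

rng = np.random.default_rng(3)
features = [rng.standard_normal((8, 16, 16)) for _ in range(4)]   # toy mixed pyramid
target = (rng.random((16, 16)) > 0.5).astype(np.float64)          # toy ground-truth map
w = train(features, target, np.zeros(8))
```

The loss is convex in the shared weights here, so gradient descent with this small step size decreases the total loss over the pyramid monotonically.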
The invention uses an attention model to perform feature selection on the image, enhancing the features related to the image target and making them more effective, and uses a bottom-up feature fusion structure to effectively fuse the low-level detail features with the high-level semantic features, greatly improving the representational capability of the features; its detection accuracy is higher than that of common saliency network models.
Claims (6)
1. An image saliency detection method based on feature selection and feature fusion, characterized by comprising the following steps:
Step 1, extracting features from an input image and adding all the features to a feature pyramid set; the feature extraction is performed on the input image with the convolutional neural network ResNeXt, as follows:
Suppose the five convolutional blocks of ResNeXt are conv_1, conv_2, conv_3, conv_4 and conv_5;
Step 1-1, the image is passed through the five convolutional blocks in sequence in a forward pass, with the iteration formula:
f_{i+1} = conv_j(f_i, W_j), j ∈ [1,5], i ∈ [-1,3], j = i + 2
where f_{-1} (the case i = -1) is the image to be detected, f_{i+1} for i = -1,0,1,2,3 denotes the output of conv_1, conv_2, conv_3, conv_4, conv_5 respectively, and W_j are the parameters of the convolutional block conv_j;
Step 1-2, the feature map output by each convolutional block is added to the output set, forming the feature pyramid set {f_0, f_1, f_2, f_3, f_4};
Step 2, selecting the characteristics of the characteristic pyramid set to obtain a new characteristic pyramid set; and selecting the features of the feature pyramid set, specifically selecting the features by adopting a space attention and channel attention mechanism, wherein the specific process comprises the following steps:
step 2-1, utilizing spatial attention to the bottom layer feature map f in the feature pyramid set 0 Selecting the characteristics to obtain a new bottom characteristic diagram
Step 2-2, utilizing channel attention to the middle layer feature pattern f in the feature pyramid set 2 Selecting characteristics to obtain a new middle layer characteristic diagram
Step 3, performing feature fusion on the features in the new feature pyramid set in a bottom-up manner to obtain a mixed feature pyramid set; the feature fusion is performed on the features in the new feature pyramid set in a bottom-up manner to obtain a fused feature pyramid set, and the method specifically includes:
step 3-1, removing new bottom layer characteristic diagramSampling some other characteristic graph as new bottom characteristic graphThen upsampled feature map andor mixed feature cascade to obtain cascade feature f cat The formula used is:
in the formula (f) i ×) represents the pair feature f i Up-sampling, [ c ]]Indicating channel cascade operation, j = -1,representj =0,1,2,representing cascade characteristics f cat The hybrid features learned through the three convolutional layers;
step 3-2, the cascade characteristic f cat Through three layers of convolution layers, learning of feature fusion is carried out to obtain mixed featuresThe formula used is:
step 3-3, repeating step 3-1 and step 3-2 in a bottom-up mode, and enabling the features f in the new feature pyramid set to be in the same shape 1 ,f 2 ,f 3 ,f 4 I.e. f 1 ,f 3 ,f 4 Fusing layer by layer to obtain a pyramid set with mixed features
Step 4, training a significance prediction network model by using the features in the mixed feature pyramid set, and performing significance detection on an image to be detected by using the trained significance prediction network model; the significance prediction network model comprises three convolutional layers, wherein a batch regularization layer and an activation layer are added behind the first two convolutional layers, and the last convolutional layer outputs a significance map which is a single channel and has the same resolution as the original input image.
2. The image saliency detection method based on feature selection and feature fusion according to claim 1, characterized in that step 2-1 uses spatial attention to perform feature selection on the bottom-level feature map f_0 in the feature pyramid set, obtaining a new bottom-level feature map f̂^l, specifically as follows:
Denote the bottom-level feature map f_0 as f^l ∈ R^{w×h×c}, where w, h and c are the width, height and number of channels of the feature map; construct a spatial attention module containing two sub-convolution blocks, denoted conv_11 and conv_22;
Step 2-1-1, f^l is fed into the sub-convolution blocks conv_11 and conv_22 in sequence, which output the feature maps C_1 and C_2 respectively:
C_1 = conv_11(f^l, W_11)
C_2 = conv_22(f^l, W_22)
where W_11 and W_22 are the parameters of the sub-convolution blocks conv_11 and conv_22 respectively;
Step 2-1-2, the outputs C_1 and C_2 of the sub-convolution blocks are added element-wise and the sum is mapped to [0,1] with a sigmoid function, giving the spatial attention weight SA, with the formula:
SA = σ(C_1 + C_2)
where σ denotes the sigmoid function;
Step 2-1-3, the spatial attention weight SA is used to perform feature selection on the bottom-level feature map f_0, giving the new bottom-level feature map f̂^l, with the formula:
f̂^l = SA ⊗ f^l
where ⊗ denotes element-wise multiplication.
3. The image saliency detection method based on feature selection and feature fusion according to claim 2, characterized in that the sub-convolution blocks conv_11 and conv_22 each comprise two convolutional layers, where one layer has 32 convolution kernels of size 3x3 and the other layer has 64 convolution kernels of size 3x3.
4. The image saliency detection method based on feature selection and feature fusion as claimed in claim 1, wherein in step 2-2 channel attention is used to perform feature selection on the mid-level feature map f^m in the feature pyramid set, obtaining a new mid-level feature map f̂^m; the specific process comprises:
Step 2-2-1, expand f^m into a set:
f^m = {f_1^m, f_2^m, ..., f_C^m}
where f_i^m is the i-th channel slice feature of f^m, and C is the number of channels of the feature map f^m;
Step 2-2-2, perform global pooling on each channel slice feature f_i^m to obtain a channel-level vector v^m;
Step 2-2-3, learn the channel-level vector with two consecutive fully connected layers and a non-linear activation layer to obtain a channel-level attention vector, then map it into [0,1] with a sigmoid function to obtain the channel attention weight CA; the formula is:
CA = F(v^m, W) = σ(fc_2(δ(fc_1(v^m, W_1)), W_2))
where W_1 and W_2 are the parameters of the fully connected layers fc_1 and fc_2 respectively, δ is a non-linear activation function, and σ is the sigmoid function;
Step 2-2-4, redistribute the channel weights of the mid-level feature map f^m with the channel attention weight CA, obtaining the new mid-level feature map f̂^m; the formula used is:
f̂^m = CA ⊙ f^m
where ⊙ denotes channel-wise multiplication.
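A minimal numpy sketch of steps 2-2-2 to 2-2-4 (a squeeze-and-excitation-style gate), with toy fully connected weights standing in for the learned fc_1 and fc_2 parameters; all names are assumptions, not the patent's code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(f, w1, w2):
    """Steps 2-2-2 to 2-2-4: global pool -> fc_1 -> delta -> fc_2 -> sigmoid -> reweight.

    f  : (h, w, c) mid-level feature map
    w1 : (c, r) weights of the first fully connected layer fc_1 (toy values)
    w2 : (r, c) weights of the second fully connected layer fc_2 (toy values)
    """
    v = f.mean(axis=(0, 1))           # step 2-2-2: channel-level vector v^m, shape (c,)
    hidden = np.maximum(v @ w1, 0.0)  # fc_1 followed by ReLU (the non-linearity delta)
    ca = sigmoid(hidden @ w2)         # step 2-2-3: channel attention weights CA in [0, 1]
    return f * ca                     # step 2-2-4: redistribute channel weights

c, r = 4, 2
rng = np.random.default_rng(0)
f = rng.random((8, 8, c))
w1 = rng.standard_normal((c, r))
w2 = rng.standard_normal((r, c))
f_hat = channel_attention(f, w1, w2)
```

The difference from the spatial module in claim 2 is the axis of the gate: here a single scalar in (0,1) scales each whole channel, rather than a per-pixel weight scaling each spatial location.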
5. The image saliency detection method based on feature selection and feature fusion as claimed in claim 1, wherein the convolution kernel sizes of the three convolutional layers in step 3-2 are 3x3 and 1x1 in sequence.
6. The image saliency detection method based on feature selection and feature fusion as claimed in claim 1, wherein in step 4 the features in the mixed feature pyramid set are used to train the saliency prediction network model; the specific process comprises:
Step 4-1, sequentially perform saliency prediction on the features in the mixed feature pyramid set with the saliency prediction network model;
Step 4-2, perform loss calculation on all prediction results to obtain a gradient, and iteratively update the parameters of the saliency prediction network model with the gradient through the back-propagation algorithm;
Repeat steps 4-1 to 4-2 until the number of iterations exceeds a preset threshold, completing the training of the saliency prediction network model.
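The train-until-threshold loop of claim 6 can be sketched with a scalar "model" standing in for the saliency prediction network; the predict/loss functions and all values below are hypothetical placeholders, not the patent's network:

```python
# Minimal sketch of steps 4-1 / 4-2: predict over all pyramid features,
# compute a loss gradient, update the parameter, repeat to a threshold.

def predict(param, features):
    # step 4-1 stand-in: run the "model" over the mixed feature pyramid set
    return [param * f for f in features]

def loss_and_grad(param, features, targets):
    # step 4-2 stand-in: squared-error loss over all predictions, with its gradient
    preds = predict(param, features)
    loss = sum((p - t) ** 2 for p, t in zip(preds, targets))
    grad = sum(2 * (p - t) * f for p, t, f in zip(preds, targets, features))
    return loss, grad

features = [1.0, 2.0, 3.0]
targets = [2.0, 4.0, 6.0]        # consistent with a true parameter of 2.0
param, lr, max_iters = 0.0, 0.01, 500

for it in range(max_iters):      # stop once the iteration count reaches the threshold
    loss, grad = loss_and_grad(param, features, targets)
    param -= lr * grad           # iterative update via the gradient (back-propagation stand-in)
```

In the actual method the single parameter would be the network's weight tensors and the update would come from back-propagation through the prediction network, but the loop structure is the same.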
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010030505.8A CN111275076B (en) | 2020-01-13 | 2020-01-13 | Image significance detection method based on feature selection and feature fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111275076A CN111275076A (en) | 2020-06-12 |
CN111275076B true CN111275076B (en) | 2022-10-21 |
Family
ID=70997061
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010030505.8A Active CN111275076B (en) | 2020-01-13 | 2020-01-13 | Image significance detection method based on feature selection and feature fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111275076B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111931793B (en) * | 2020-08-17 | 2024-04-12 | 湖南城市学院 | Method and system for extracting saliency target |
CN112927209B (en) * | 2021-03-05 | 2022-02-11 | 重庆邮电大学 | CNN-based significance detection system and method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109165660A (en) * | 2018-06-20 | 2019-01-08 | 扬州大学 | A kind of obvious object detection method based on convolutional neural networks |
CN110619638A (en) * | 2019-08-22 | 2019-12-27 | 浙江科技学院 | Multi-mode fusion significance detection method based on convolution block attention module |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI709107B (en) * | 2018-05-21 | 2020-11-01 | 國立清華大學 | Image feature extraction method and saliency prediction method including the same |
- 2020-01-13: CN application CN202010030505.8A, granted as patent CN111275076B (status: Active)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108509978B (en) | Multi-class target detection method and model based on CNN multi-level feature fusion | |
Qian et al. | Learning and transferring representations for image steganalysis using convolutional neural network | |
CN111325165B (en) | Urban remote sensing image scene classification method considering spatial relationship information | |
CN105095862B (en) | A kind of human motion recognition method based on depth convolution condition random field | |
CN105678284B (en) | A kind of fixed bit human body behavior analysis method | |
CN108062756A (en) | Image, semantic dividing method based on the full convolutional network of depth and condition random field | |
CN113344806A (en) | Image defogging method and system based on global feature fusion attention network | |
CN104462494B (en) | A kind of remote sensing image retrieval method and system based on unsupervised feature learning | |
CN110287777B (en) | Golden monkey body segmentation algorithm in natural scene | |
CN109753959B (en) | Road traffic sign detection method based on self-adaptive multi-scale feature fusion | |
CN113269224B (en) | Scene image classification method, system and storage medium | |
CN111275076B (en) | Image significance detection method based on feature selection and feature fusion | |
JP6107531B2 (en) | Feature extraction program and information processing apparatus | |
CN113449612B (en) | Three-dimensional target point cloud identification method based on sub-flow sparse convolution | |
Yoo et al. | Fast training of convolutional neural network classifiers through extreme learning machines | |
CN112991364A (en) | Road scene semantic segmentation method based on convolution neural network cross-modal fusion | |
CN112132145A (en) | Image classification method and system based on model extended convolutional neural network | |
CN115482518A (en) | Extensible multitask visual perception method for traffic scene | |
CN113393457A (en) | Anchor-frame-free target detection method combining residual dense block and position attention | |
CN113627487B (en) | Super-resolution reconstruction method based on deep attention mechanism | |
CN115294356A (en) | Target detection method based on wide area receptive field space attention | |
CN115410087A (en) | Transmission line foreign matter detection method based on improved YOLOv4 | |
CN111340189A (en) | Space pyramid graph convolution network implementation method | |
CN117218457B (en) | Self-supervision industrial anomaly detection method based on double-layer two-dimensional normalized flow | |
CN114723733B (en) | Class activation mapping method and device based on axiom explanation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||