CN109635882B - Salient object detection method based on multi-scale convolution feature extraction and fusion - Google Patents

Salient object detection method based on multi-scale convolution feature extraction and fusion

Info

Publication number
CN109635882B
Authority
CN
China
Prior art keywords
feature
convolution
scale
characteristic
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910062293.9A
Other languages
Chinese (zh)
Other versions
CN109635882A (en)
Inventor
牛玉贞
龙观潮
郭文忠
苏超然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN201910062293.9A priority Critical patent/CN109635882B/en
Publication of CN109635882A publication Critical patent/CN109635882A/en
Application granted granted Critical
Publication of CN109635882B publication Critical patent/CN109635882B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions

Abstract

The invention relates to a salient object detection method based on multi-scale convolution feature extraction and fusion, which comprises the steps of first performing data enhancement, processing each color image together with its corresponding manually annotated map to increase the data volume of the training data set; extracting multi-scale features and performing channel compression to optimize the computational efficiency of the network; then fusing the multi-scale features to obtain a predicted saliency map; learning the optimal parameters of the model by minimizing the cross entropy loss; and finally predicting the salient objects in an image by using the trained network. The invention can significantly improve the detection accuracy of salient objects.

Description

Salient object detection method based on multi-scale convolution feature extraction and fusion
Technical Field
The invention relates to the field of image processing and computer vision, in particular to a salient object detection method based on multi-scale convolution feature extraction and fusion.
Background
How to fuse convolution features of various scales in a full convolution network is an open problem in the field of salient object detection. Starting from this problem, most existing salient object detection methods based on fully convolutional neural networks add network branches so that convolution features of different scales can be fused through these branches, thereby generating features that are more useful for the salient object detection task. Salient object detection algorithms proposed after 2015 mostly focus on applying fully convolutional neural networks (FCNN) to improve the computational efficiency of the network and the accuracy of salient object detection.
These works can be divided into two types. The first is innovation in the structure of the full convolution network itself: Li et al. obtain features of different scales from a pre-trained VGG-16 network, apply a convolution calculation to the features of each scale to obtain new feature results, restore the features to a uniform size through an up-sampling operation, and finally obtain the saliency detection result through a convolution operation; a superpixel-scale branch is also fused to refine the final salient object detection result at the spatial scale. The salient object detection network proposed by Wang et al. is a fully convolutional neural network in encoder-decoder form, to which a recurrent structure is added to iteratively refine the salient object detection result. Cheng et al. add a short connection structure to the full convolution network; because each output branch in the short connection structure fuses high-level semantic information with low-level features such as texture and shape, the performance of the algorithm is significantly improved while the model remains simple and efficient.
However, most methods fuse the convolution features of different scales produced by a feature network pre-trained on a classification task, and the scales of these features are generally limited and fixed.
Disclosure of Invention
In view of this, the present invention provides a method for detecting a salient object based on multi-scale convolution feature extraction and fusion, which can significantly improve the detection accuracy of the salient object.
The invention is realized by adopting the following scheme: a salient object detection method based on multi-scale convolution feature extraction and fusion specifically comprises the following steps:
step S1: data enhancement is carried out, meanwhile, the color image and the corresponding artificial labeling graph are processed, and the data volume of the training data set is increased;
step S2: extracting multi-scale features and performing channel compression to optimize the computing efficiency of the network;
step S3: fusing multi-scale features to obtain a predicted saliency map Pred_i;
Step S4: learning the optimal parameters of the model by solving the minimum cross entropy loss; and finally, predicting the salient objects in the image by using the trained model network.
Further, step S1 specifically includes the following steps:
step S11: scaling each color image in the data set together with its corresponding manual annotation map, so that the computing device can handle the computational load of the neural network;
step S12: applying a random cropping operation jointly to each color image in the data set and its corresponding manual annotation map, so as to increase the diversity of the data;
step S13: generating mirror images by horizontally flipping the images, so as to enlarge the data volume of the original data set.
Further, step S2 specifically includes the following steps:
step S21: the inherent network structure of U-Net is improved, wherein the encoder structure of the U-Net network takes an image classification convolution network as its feature network and generates convolution features of 5 different scales by continuously stacking and combining convolution layers and pooling layers; a pooling layer lies between the convolution feature En_i and the convolution feature En_{i+1} to gradually reduce the size of the feature map, and the step size of the pooling layer is set to 2, so that En_{i+1} is reduced by half relative to En_i in both the width and height spatial dimensions; in order to keep enough spatial information in the convolution features, the step size of the pooling layer between the last two convolution features is set to 1, so that the last two convolution features keep a consistent size in both the width and height spatial dimensions;
step S22: designing a multi-scale feature extraction module to act on the convolution feature of each scale generated by the improved U-Net network in the step S21 to obtain multi-scale content features;
step S23: a channel compression module is added to act on the multi-scale content characteristics to optimize the computing efficiency of the network.
Further, step S22 specifically includes the following steps:
step S221: designing three convolution layers that take the convolution feature En_i as input; these three convolutions are all implemented by depthwise separable hole (dilated) convolution operations, in which the expansion coefficients of the hole convolutions are 3, 6 and 9 respectively; the feature results obtained by these three operations keep the same feature size as the convolution feature En_i, all being (c, h, w);
step S222: splicing the three feature results together along the channel dimension by applying a concatenation operation, obtaining a feature result with feature size (3c, h, w);
step S223: applying a convolution operation with a kernel size of (1,1) to compress the channels of the feature result obtained in step S222 to be consistent with the convolution feature En_i, obtaining the multi-scale content feature with feature size (c, h, w).
Further, step S3 specifically includes the following steps:
step S31: designing a multi-scale feature fusion module; assume the input multi-scale content feature Feat_i has a feature size of (c, h, w); in the multi-scale feature fusion module, a depthwise separable convolution operation with kernel sizes (1, k) and (k,1) and a depthwise separable convolution operation with kernel sizes (k,1) and (1, k) are applied respectively to obtain feature fusion results with the same size as the input feature Feat_i;
step S32: the decoder structure of the U-Net network corresponds to the 5 feature results of different scales of the encoder feature network; for the convolution feature Dec_i of each scale generated by the decoder structure of the U-Net network, the multi-scale feature fusion module is applied to fuse the multi-scale content feature Feat_i and the convolution feature Dec_{i+1}, where the input convolution feature Dec_{i+1} is assumed to have a feature size of (c, h/2, w/2); first, an up-sampling operation is applied to the convolution feature Dec_{i+1} to enlarge it by a factor of two in the spatial dimensions, so that the convolution feature Dec_{i+1} has the same spatial size as the multi-scale content feature Feat_i, with feature size (c, h, w); then the multi-scale content feature Feat_i and the convolution feature Dec_{i+1} are spliced to obtain a spliced feature with feature size (2c, h, w), and a convolution operation followed by a ReLU activation function and a BN layer yields a feature result with feature size (c, h, w); next, the multi-scale feature fusion module is applied to the obtained feature result to obtain a feature fusion result, the feature result and the feature fusion result are spliced and convolved, and a ReLU activation function and a BN layer yield the feature result Dec_i with feature size (c, h, w); finally, a convolution operation with a kernel size of (1,1) is applied to reduce the number of channels of the feature result Dec_i by half so that it can be fused with Dec_{i-1}, and a ReLU activation function and a BN layer yield the feature result Dec_i with feature size (0.5c, h, w); its channels are compressed to 1 by a convolution operation, and the predicted saliency map Pred_i is obtained through a Sigmoid function.
Further, step S31 specifically includes the following steps:
step S311: successively applying to the input multi-scale content feature Feat_i a depthwise separable convolution operation with kernels (1, k) and then (k,1), and successively applying to the input feature Feat_i a depthwise separable convolution operation with kernels (k,1) and then (1, k); after each of the two successive operations, a BN layer is added, and two feature results are obtained respectively;
step S312: summing the two feature results according to the channel dimension to obtain a feature result with the same size as the input feature Feat_i;
step S313: applying a convolution operation with a kernel size of (1,1) to model the features across the channels of this feature result and obtain a feature fusion result with the same size as the input feature Feat_i.
Further, in step S4, the cross entropy Loss is calculated by the following formula:
Loss_i = - Σ_j [ G_j · log(Pred_{i,j}) + (1 - G_j) · log(1 - Pred_{i,j}) ]
where G_j denotes the value of the manual annotation map at pixel j and Pred_{i,j} denotes the value of the predicted saliency map Pred_i at pixel j.
compared with the prior art, the invention has the following beneficial effects: the invention provides a multi-scale feature extraction module and a multi-scale fusion module, wherein the module is directly embedded into a U-Net network architecture of a typical encoder-decoder structure in network design, and meanwhile, the redundancy of information on a feature channel on the decoder structure is also considered, and a channel compression module is applied to ensure that the model calculation efficiency is higher. The invention can obviously improve the detection precision of the obvious object.
Drawings
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention.
Fig. 2 is a structure diagram of a salient object detection network according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a multi-scale feature extraction module according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a channel compression module according to an embodiment of the invention.
Fig. 5 is a schematic diagram of a multi-scale feature fusion module according to an embodiment of the present invention.
Fig. 6 is a schematic network structure diagram of a multi-scale feature fusion process according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the present embodiment provides a method for detecting a salient object based on multi-scale convolution feature extraction and fusion, which specifically includes the following steps:
step S1: data enhancement is carried out, meanwhile, the color image and the corresponding artificial labeling graph are processed, and the data volume of the training data set is increased;
step S2: extracting multi-scale features and performing channel compression to optimize the computing efficiency of the network;
step S3: fusing multi-scale features to obtain a predicted saliency map Pred_i;
Step S4: learning the optimal parameters of the model by solving the minimum cross entropy loss; and finally, predicting the salient objects in the image by using the trained model network.
In this embodiment, step S1 performs data enhancement, processing each color image together with its corresponding manual annotation map so as to increase the data volume of the training data set. The mainstream international data sets used for training a salient object detection network generally contain color images and corresponding manual annotation maps, where the color image is as shown in fig. 2 (a) and the manual annotation map, similar to a saliency map (e.g., fig. 2 (b)), is a binary image in which the salient object region of the image is marked manually. Because constructing such a data set requires a large amount of manual effort, while training a deep neural network requires a sufficient amount of data, data enhancement must be performed on top of the original data set. Therefore, step S1 specifically includes the following steps:
step S11: scaling each color image in the data set together with its corresponding manual annotation map, so that the computing device can handle the computational load of the neural network;
step S12: applying a random cropping operation jointly to each color image in the data set and its corresponding manual annotation map, so as to increase the diversity of the data;
step S13: generating mirror images by horizontally flipping the images, so as to enlarge the data volume of the original data set, meet the larger data requirements of training a deep convolutional neural network (CNN), and enhance the generalization capability of the model.
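By way of illustration only, the following is a minimal Python sketch of how the three augmentation operations of steps S11 to S13 might be applied jointly to a color image and its annotation map; the PIL-based implementation, the 256x256 scaling size, the 224x224 crop size and the 0.5 flip probability are assumptions chosen for the example and are not specified by the patent.

```python
import random
from PIL import Image

def augment_pair(image_path, label_path, scale_size=(256, 256), crop_size=(224, 224)):
    """Scale, randomly crop, and horizontally flip a color image and its annotation map together."""
    image = Image.open(image_path).convert("RGB")
    label = Image.open(label_path).convert("L")  # binary manual annotation map

    # Step S11: scale both images to a size the computing device can handle.
    image = image.resize(scale_size, Image.BILINEAR)
    label = label.resize(scale_size, Image.NEAREST)

    # Step S12: apply the same random crop to both images to increase data diversity.
    left = random.randint(0, scale_size[0] - crop_size[0])
    top = random.randint(0, scale_size[1] - crop_size[1])
    box = (left, top, left + crop_size[0], top + crop_size[1])
    image, label = image.crop(box), label.crop(box)

    # Step S13: horizontal flip to generate a mirrored sample.
    if random.random() < 0.5:
        image = image.transpose(Image.FLIP_LEFT_RIGHT)
        label = label.transpose(Image.FLIP_LEFT_RIGHT)

    return image, label
```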
In this embodiment, step S2 specifically includes the following steps:
step S21: the inherent network structure of U-Net is improved, wherein the encoder structure of the U-Net network takes an image classification convolution network as its feature network (such as a VGG or ResNet structure) and generates convolution features of 5 different scales by continuously stacking and combining convolution layers and pooling layers, such as the five feature results En_1, En_2, En_3, En_4 and En_5 in FIG. 2. Among these five convolution features, a pooling layer lies between the convolution feature En_i and the convolution feature En_{i+1} to gradually reduce the size of the feature map; the pooling layer is set to a step size of 2, so that En_{i+1} is reduced by half relative to En_i in both the width and height spatial dimensions, which also attenuates the spatial information of the convolution features; in order to preserve enough spatial information in the convolution features, the step size of the pooling layer between the last two convolution features (En_4 and En_5) is set to 1, so that the last two convolution features (En_4 and En_5) keep a consistent size in both the width and height spatial dimensions;
step S22: designing a multi-scale feature extraction module to act on the convolution feature of each scale generated by the improved U-Net network in the step S21 to obtain multi-scale content features; the multi-scale feature extraction module is shown in fig. 3, and here, the feature size of the convolution feature is assumed to be (c, h, w);
step S23: a channel compression module is added to act on the multi-scale content features to optimize the computational efficiency of the network. The channel compression module is shown in FIG. 4, where "SE Module" is the module proposed by Hu et al. in the SENet (Squeeze-and-Excitation Networks) paper. The SE module takes the multi-scale content feature Feat_i as input and strengthens the generalization capability of the features by modeling the correlation between features on each channel and applying a weighting operation. The channel compression module then applies a convolution operation with a kernel size of (1,1) to compress the number of channels of the feature result to half of the original number, followed by a ReLU (Rectified Linear Unit) activation function and a BN (Batch Normalization) layer, obtaining the channel-compressed multi-scale content feature Feat_i.
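A minimal PyTorch sketch of such a channel compression module is given below for illustration; the SE reduction ratio of 16 and the class names are assumptions made for the example rather than values specified in the patent.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: model the correlation between channels and reweight them."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights  # channel-wise weighting

class ChannelCompression(nn.Module):
    """SE module followed by a (1,1) convolution that halves the channel count, then ReLU and BN."""
    def __init__(self, channels):
        super().__init__()
        self.se = SEBlock(channels)
        self.compress = nn.Conv2d(channels, channels // 2, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)
        self.bn = nn.BatchNorm2d(channels // 2)

    def forward(self, feat_i):
        return self.bn(self.relu(self.compress(self.se(feat_i))))
```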
In this embodiment, step S22 specifically includes the following steps:
step S221: three convolution layers are designed that take the convolution feature En_i as input; these three convolutions are all implemented by depthwise separable hole (dilated) convolution operations, in which the expansion coefficients of the hole convolutions are 3, 6 and 9 respectively. Setting different expansion coefficients allows the convolution operations to capture content regions of different sizes in the image, i.e. to generate feature results for multi-scale content regions. The feature results obtained by these three operations keep the same feature size as the convolution feature En_i, all being (c, h, w);
step S222: splicing the three feature results together along the channel dimension by applying a concatenation operation (concat) to obtain a feature result with a feature size of (3c, h, w);
step S223: applying a convolution operation with a kernel size of (1,1) to compress the channels of the feature result obtained in step S222 to be consistent with the convolution feature En_i, obtaining the multi-scale content feature with feature size (c, h, w), such as Feat_i in FIG. 4.
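For illustration, a possible PyTorch sketch of this multi-scale feature extraction module follows; the 3x3 kernel of the depthwise dilated convolutions and the class names are assumptions, since the patent specifies only the dilation (expansion) coefficients 3, 6 and 9 and the channel sizes.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableDilatedConv(nn.Module):
    """Depthwise dilated (hole) convolution followed by a pointwise (1,1) convolution."""
    def __init__(self, channels, dilation, kernel_size=3):
        super().__init__()
        padding = dilation * (kernel_size - 1) // 2  # keeps the spatial size unchanged
        self.depthwise = nn.Conv2d(channels, channels, kernel_size, padding=padding,
                                   dilation=dilation, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class MultiScaleFeatureExtraction(nn.Module):
    """Three parallel depthwise separable dilated convolutions (dilations 3, 6, 9), whose outputs
    are concatenated along the channel dimension and compressed back to c channels by a (1,1) conv."""
    def __init__(self, channels):
        super().__init__()
        self.branches = nn.ModuleList(
            [DepthwiseSeparableDilatedConv(channels, d) for d in (3, 6, 9)])
        self.compress = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, en_i):                            # En_i: (c, h, w)
        feats = [branch(en_i) for branch in self.branches]
        return self.compress(torch.cat(feats, dim=1))   # (3c, h, w) -> Feat_i: (c, h, w)
```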
In this embodiment, step S3 specifically includes the following steps:
step S31: in order to fuse features of different sizes, the present embodiment designs a multi-scale feature fusion module, as shown in fig. 5. Assume the input multi-scale content feature Feat_i has a feature size of (c, h, w). In the multi-scale feature fusion module, a depthwise separable convolution operation with kernel sizes (1, k) and (k,1) and a depthwise separable convolution operation with kernel sizes (k,1) and (1, k) are applied respectively to obtain feature fusion results with the same size as the input feature Feat_i; this is equivalent to a (k, k) convolution operation but requires fewer computational resources, while still being able to combine content region features of different scales in the spatial dimension;
step S32: the decoder structure of the U-Net network corresponds to the 5 feature results of different scales of the encoder feature network. For the convolution feature Dec_i of each scale generated by the decoder structure of the U-Net network, the multi-scale feature fusion module is applied to fuse the multi-scale content feature Feat_i and the convolution feature Dec_{i+1}; here the input convolution feature Dec_{i+1} is assumed to have a feature size of (c, h/2, w/2). First, an up-sampling operation is applied to the convolution feature Dec_{i+1} to enlarge it by a factor of two in the spatial dimensions, so that Dec_{i+1} has the same spatial size as the multi-scale content feature Feat_i, i.e. a feature size of (c, h, w). Then the multi-scale content feature Feat_i and the convolution feature Dec_{i+1} are spliced to obtain a spliced feature with feature size (2c, h, w), and a convolution operation followed by a ReLU activation function and a BN layer yields a feature result with feature size (c, h, w). Next, the multi-scale feature fusion module is applied to this feature result to obtain a feature fusion result; the feature result and the feature fusion result are then spliced and convolved, and a ReLU activation function and a BN layer give the feature result Dec_i with feature size (c, h, w). Finally, a convolution operation with a kernel size of (1,1) is applied to reduce the number of channels of the feature result Dec_i by half so that it can be fused with Dec_{i-1}, and a ReLU activation function and a BN layer give the feature result Dec_i with feature size (0.5c, h, w); its channels are further compressed to 1 by a convolution operation, and the predicted saliency map Pred_i is obtained through a Sigmoid function. Notably, because Dec_4 and Dec_5 have the same number of channels, the number of channels is not compressed there.
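Purely as an illustration, the sketch below shows one way a single decoder stage of this kind could be written in PyTorch. It assumes 3x3 kernels for the convolutions whose kernel size the patent does not state, bilinear up-sampling, and a `fusion_module` standing for the multi-scale feature fusion module of step S31 (a sketch of that module follows step S313 below); none of these choices are fixed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderStage(nn.Module):
    """Upsample Dec_{i+1}, fuse it with Feat_i, and produce the channel-halved Dec_i
    for the next stage together with the predicted saliency map Pred_i."""
    def __init__(self, channels, fusion_module):
        super().__init__()
        self.fuse1 = nn.Sequential(nn.Conv2d(2 * channels, channels, 3, padding=1),
                                   nn.ReLU(inplace=True), nn.BatchNorm2d(channels))
        self.fusion = fusion_module  # multi-scale feature fusion module (step S31)
        self.fuse2 = nn.Sequential(nn.Conv2d(2 * channels, channels, 3, padding=1),
                                   nn.ReLU(inplace=True), nn.BatchNorm2d(channels))
        self.halve = nn.Sequential(nn.Conv2d(channels, channels // 2, kernel_size=1),
                                   nn.ReLU(inplace=True), nn.BatchNorm2d(channels // 2))
        self.to_saliency = nn.Conv2d(channels // 2, 1, kernel_size=1)  # compress channels to 1

    def forward(self, feat_i, dec_next):
        dec_up = F.interpolate(dec_next, scale_factor=2, mode="bilinear", align_corners=False)
        x = self.fuse1(torch.cat([feat_i, dec_up], dim=1))     # (2c, h, w) -> (c, h, w)
        fused = self.fusion(x)                                 # multi-scale feature fusion result
        dec_i = self.fuse2(torch.cat([x, fused], dim=1))       # (2c, h, w) -> (c, h, w)
        dec_i_half = self.halve(dec_i)                         # (c, h, w) -> (0.5c, h, w)
        pred_i = torch.sigmoid(self.to_saliency(dec_i_half))   # per-pixel saliency in [0, 1]
        return dec_i_half, pred_i
```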
In this embodiment, step S31 specifically includes the following steps:
step S311: a depthwise separable convolution operation with kernels (1, k) and then (k,1) is applied successively to the input multi-scale content feature Feat_i, and a depthwise separable convolution operation with kernels (k,1) and then (1, k) is applied successively to the input feature Feat_i; after each of the two successive operations, a BN layer is added, and two feature results are obtained respectively;
step S312: the two feature results are summed according to the channel dimension to obtain a feature result with the same size as the input feature Feat_i;
step S313: a convolution operation with a kernel size of (1,1) is applied to model the features across the channels of this feature result and obtain a feature fusion result with the same size as the input feature Feat_i.
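The following PyTorch sketch illustrates one possible reading of this fusion module; the kernel length k = 7 and the purely depthwise (grouped) implementation of the factorized (1, k)/(k, 1) convolutions are assumptions for the example, not values fixed by the patent.

```python
import torch
import torch.nn as nn

class MultiScaleFeatureFusion(nn.Module):
    """Two parallel factorized depthwise convolutions, (1,k)->(k,1) and (k,1)->(1,k), each followed
    by BN; their outputs are summed and then mixed across channels by a (1,1) convolution."""
    def __init__(self, channels, k=7):
        super().__init__()
        p = k // 2  # padding that keeps the spatial size unchanged for odd k
        self.branch_a = nn.Sequential(
            nn.Conv2d(channels, channels, (1, k), padding=(0, p), groups=channels),
            nn.Conv2d(channels, channels, (k, 1), padding=(p, 0), groups=channels),
            nn.BatchNorm2d(channels))
        self.branch_b = nn.Sequential(
            nn.Conv2d(channels, channels, (k, 1), padding=(p, 0), groups=channels),
            nn.Conv2d(channels, channels, (1, k), padding=(0, p), groups=channels),
            nn.BatchNorm2d(channels))
        self.mix = nn.Conv2d(channels, channels, kernel_size=1)  # model features across channels

    def forward(self, feat_i):
        return self.mix(self.branch_a(feat_i) + self.branch_b(feat_i))
```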
In the present embodiment, in step S4, an Adam (Adaptive moment estimation) algorithm is used to optimize the loss function in the training phase. As shown in FIG. 2, the feature result Dec_i of each scale in step S3 corresponds to a loss Loss_i, and each Loss_i is the cross entropy loss calculated between the predicted saliency map Pred_i of FIG. 6 and the manual annotation map.
Wherein, the calculation of the network cross entropy Loss adopts the following formula:
Loss_i = - Σ_j [ G_j · log(Pred_{i,j}) + (1 - G_j) · log(1 - Pred_{i,j}) ]
where G_j denotes the value of the manual annotation map at pixel j and Pred_{i,j} denotes the value of the predicted saliency map Pred_i at pixel j.
and optimizing by an Adam algorithm to obtain the optimal parameters of the network, and finally predicting the salient objects in the color image by using the network.
When the algorithm extracts additional features of related scales on the basis of the original-scale features and then fuses them, the fused features have stronger generalization capability. Following this idea of extracting and fusing multi-scale convolution features, this embodiment provides a multi-scale feature extraction module and a multi-scale feature fusion module. These modules are directly embedded into a U-Net architecture with a typical encoder-decoder structure; at the same time, the redundancy of information on the feature channels of the decoder structure is taken into account, and a channel compression module is applied to make the model more efficient to compute. In summary, this embodiment provides a salient object detection method based on multi-scale convolution feature extraction and fusion, and the designed network structure based on multi-scale feature extraction and fusion can significantly improve the detection accuracy of salient objects.
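As a purely illustrative sketch of the training described in step S4 above: the loop below assumes the network returns one predicted saliency map Pred_i per decoder scale, bilinearly resizes each map to the annotation size before computing the binary cross entropy (the patent does not state how resolutions are matched), and uses assumed hyper-parameters for the epoch count and learning rate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train(model, loader, epochs=30, lr=1e-4, device="cuda"):
    """Sum the cross entropy losses of all predicted saliency maps and optimize with Adam."""
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    bce = nn.BCELoss()  # predictions already pass through a Sigmoid

    for _ in range(epochs):
        for image, label in loader:              # color image and binary annotation map in [0, 1]
            image, label = image.to(device), label.to(device)
            preds = model(image)                 # list of Pred_i, one per decoder scale
            loss = sum(bce(F.interpolate(p, size=label.shape[-2:], mode="bilinear",
                                         align_corners=False), label)
                       for p in preds)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```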
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is directed to preferred embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow. However, any simple modification, equivalent change and modification of the above embodiments according to the technical essence of the present invention are within the protection scope of the technical solution of the present invention.

Claims (5)

1. A salient object detection method based on multi-scale convolution feature extraction and fusion, characterized in that the method comprises the following steps:
step S1: performing data enhancement, and simultaneously processing the color image and the corresponding artificial labeling image to increase the data volume of the training data set;
step S2: extracting multi-scale features, and performing channel compression to optimize the computing efficiency of the network;
step S3: fusing multi-scale features to obtain a predicted saliency map Pred_i;
Step S4: learning the optimal parameters of the model by solving the minimum cross entropy loss; finally, a trained model network is used for predicting the salient objects in the image;
step S2 specifically includes the following steps:
step S21: the inherent network structure of U-Net is improved, wherein the encoder structure of the U-Net network takes an image classification convolution network as a feature network and generates convolution features of 5 different scales by continuously stacking and combining convolution layers and pooling layers; a pooling layer exists between the convolution feature En_i and the convolution feature En_{i+1}, and the step size of this pooling layer is set to 2; the step size of the pooling layer between the last two convolution features is set to 1;
step S22: designing a multi-scale feature extraction module to act on the convolution feature of each scale generated by the improved U-Net network in the step S21 to obtain multi-scale content features;
step S23: adding a channel compression module to act on the multi-scale content features;
step S3 specifically includes the following steps:
step S31: designing a multi-scale feature fusion module; assume the input multi-scale content feature Feat_i has a feature size of (c, h, w); in the multi-scale feature fusion module, a depthwise separable convolution operation with kernel sizes (1, k) and (k,1) and a depthwise separable convolution operation with kernel sizes (k,1) and (1, k) are applied respectively to obtain feature fusion results with the same size as the input feature Feat_i;
step S32: the decoder structure of the U-Net network corresponds to the 5 feature results of different scales of the encoder feature network; for the convolution feature Dec_i of each scale generated by the decoder structure of the U-Net network, the multi-scale feature fusion module is applied to fuse the multi-scale content feature Feat_i and the convolution feature Dec_{i+1}, where the input convolution feature Dec_{i+1} is assumed to have a feature size of (c, h/2, w/2); first, an up-sampling operation is applied to the convolution feature Dec_{i+1} to enlarge it by a factor of two in the spatial dimensions, so that the convolution feature Dec_{i+1} has the same spatial size as the multi-scale content feature Feat_i, with feature size (c, h, w); then the multi-scale content feature Feat_i and the convolution feature Dec_{i+1} are spliced to obtain a spliced feature with feature size (2c, h, w), and a convolution operation followed by a ReLU activation function and a BN layer yields a feature result with feature size (c, h, w); next, the multi-scale feature fusion module is applied to the obtained feature result to obtain a feature fusion result, the feature result and the feature fusion result are spliced and convolved, and a ReLU activation function and a BN layer yield the feature result Dec_i with feature size (c, h, w); finally, a convolution operation with a kernel size of (1,1) is applied to reduce the number of channels of the feature result Dec_i by half so that it can be fused with Dec_{i-1}, and a ReLU activation function and a BN layer yield the feature result Dec_i with feature size (0.5c, h, w); its channels are compressed to 1 by a convolution operation, and the predicted saliency map Pred_i is obtained through a Sigmoid function.
2. The method for detecting the salient objects based on the multi-scale convolution feature extraction and fusion as claimed in claim 1, wherein: step S1 specifically includes the following steps:
step S11: scaling each color image in the data set together with its corresponding manual annotation map;
step S12: applying a random cropping operation jointly to each color image in the data set and its corresponding manual annotation map;
step S13: generating a mirror image by horizontally flipping the image.
3. The method for detecting the salient objects based on the multi-scale convolution feature extraction and fusion as claimed in claim 1, wherein: step S22 specifically includes the following steps:
step S221: designing three convolution layers that take the convolution feature En_i as input; these three convolutions are all implemented by depthwise separable hole (dilated) convolution operations, where the expansion coefficients of the hole convolutions are 3, 6 and 9 respectively; the feature results obtained by these three operations keep the same feature size as the convolution feature En_i, all being (c, h, w);
step S222: splicing the three feature results together along the channel dimension by applying a concatenation operation, obtaining a feature result with feature size (3c, h, w);
step S223: applying a convolution operation with a kernel size of (1,1) to compress the channels of the feature result obtained in step S222 to be consistent with the convolution feature En_i, obtaining the multi-scale content feature with feature size (c, h, w).
4. The method for detecting the salient objects based on the multi-scale convolution feature extraction and fusion as claimed in claim 1, wherein: step S31 specifically includes the following steps:
step S311: successively applying to the input multi-scale content feature Feat_i a depthwise separable convolution operation with kernels (1, k) and then (k,1), and successively applying to the input feature Feat_i a depthwise separable convolution operation with kernels (k,1) and then (1, k); after each of the two successive operations, a BN layer is added, and two feature results are obtained respectively;
step S312: summing the two feature results according to the channel dimension to obtain a feature result with the same size as the input feature Feat_i;
step S313: applying a convolution operation with a kernel size of (1,1) to model the features across the channels of this feature result and obtain a feature fusion result with the same size as the input feature Feat_i.
5. The method for detecting the salient objects based on the multi-scale convolution feature extraction and fusion as claimed in claim 1, wherein: in step S4, the cross entropy Loss is calculated by the following equation:
Loss_i = - Σ_j [ G_j · log(Pred_{i,j}) + (1 - G_j) · log(1 - Pred_{i,j}) ]
where G_j denotes the value of the manual annotation map at pixel j and Pred_{i,j} denotes the value of the predicted saliency map Pred_i at pixel j.
CN201910062293.9A 2019-01-23 2019-01-23 Salient object detection method based on multi-scale convolution feature extraction and fusion Active CN109635882B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910062293.9A CN109635882B (en) 2019-01-23 2019-01-23 Salient object detection method based on multi-scale convolution feature extraction and fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910062293.9A CN109635882B (en) 2019-01-23 2019-01-23 Salient object detection method based on multi-scale convolution feature extraction and fusion

Publications (2)

Publication Number Publication Date
CN109635882A CN109635882A (en) 2019-04-16
CN109635882B true CN109635882B (en) 2022-05-13

Family

ID=66063115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910062293.9A Active CN109635882B (en) 2019-01-23 2019-01-23 Salient object detection method based on multi-scale convolution feature extraction and fusion

Country Status (1)

Country Link
CN (1) CN109635882B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084309B (en) * 2019-04-30 2022-06-21 北京市商汤科技开发有限公司 Feature map amplification method, feature map amplification device, feature map amplification equipment and computer readable storage medium
CN110298397A (en) * 2019-06-25 2019-10-01 东北大学 The multi-tag classification method of heating metal image based on compression convolutional neural networks
CN110322528B (en) * 2019-06-26 2021-05-14 浙江大学 Nuclear magnetic resonance brain image vessel reconstruction method based on 3T and 7T
CN110490892A (en) * 2019-07-03 2019-11-22 中山大学 A kind of Thyroid ultrasound image tubercle automatic positioning recognition methods based on USFaster R-CNN
CN110348390B (en) * 2019-07-12 2023-05-16 创新奇智(重庆)科技有限公司 Training method, computer readable medium and system for flame detection model
CN110378976B (en) * 2019-07-18 2020-11-13 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN110660046B (en) * 2019-08-30 2022-09-30 太原科技大学 Industrial product defect image classification method based on lightweight deep neural network
CN111080588A (en) * 2019-12-04 2020-04-28 南京航空航天大学 Multi-scale neural network-based rapid fetal MR image brain extraction method
CN111028246A (en) * 2019-12-09 2020-04-17 北京推想科技有限公司 Medical image segmentation method and device, storage medium and electronic equipment
CN111080599A (en) * 2019-12-12 2020-04-28 哈尔滨市科佳通用机电股份有限公司 Fault identification method for hook lifting rod of railway wagon
CN111191649A (en) * 2019-12-31 2020-05-22 上海眼控科技股份有限公司 Method and equipment for identifying bent multi-line text image
CN111814536B (en) * 2020-05-21 2023-11-28 闽江学院 Culture monitoring method and device
CN111860233B (en) * 2020-07-06 2021-05-18 中国科学院空天信息创新研究院 SAR image complex building extraction method and system based on attention network selection
CN112258431B (en) * 2020-09-27 2021-07-20 成都东方天呈智能科技有限公司 Image classification model based on mixed depth separable expansion convolution and classification method thereof
CN112446292B (en) * 2020-10-28 2023-04-28 山东大学 2D image salient object detection method and system
CN112115951B (en) * 2020-11-19 2021-03-09 之江实验室 RGB-D image semantic segmentation method based on spatial relationship
CN112861795A (en) * 2021-03-12 2021-05-28 云知声智能科技股份有限公司 Method and device for detecting salient target of remote sensing image based on multi-scale feature fusion


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10033918B2 (en) * 2016-03-29 2018-07-24 Sony Corporation Method and system for image processing to detect salient objects in image

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171701A (en) * 2018-01-15 2018-06-15 复旦大学 Conspicuousness detection method based on U networks and confrontation study
CN109165660A (en) * 2018-06-20 2019-01-08 扬州大学 A kind of obvious object detection method based on convolutional neural networks
CN109191426A (en) * 2018-07-24 2019-01-11 江南大学 A kind of flat image conspicuousness detection method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Depth-Aware Salient Object Detection and Segmentation via Multiscale Discriminative Saliency Fusion and Bootstrap Learning;Hangke Song et al.;《IEEE Transactions on Image Processing》;20170602;第26卷(第9期);第4204-4216页 *
Salient Object Segmentation Based on Superpixel and Background Connectivity Prior;Yuzhen Niu et al.;《IEEE Access》;20181001;第6卷;第56170-56183页 *
Semantic object detection and segmentation based on objectness sampling; 李金东; China Excellent Master's Theses Full-text Database (Master's), Information Science and Technology; 20190115; full text *

Also Published As

Publication number Publication date
CN109635882A (en) 2019-04-16

Similar Documents

Publication Publication Date Title
CN109635882B (en) Salient object detection method based on multi-scale convolution feature extraction and fusion
CN110728219B (en) 3D face generation method based on multi-column multi-scale graph convolution neural network
CN110059728B (en) RGB-D image visual saliency detection method based on attention model
CN111696110B (en) Scene segmentation method and system
CN112348870B (en) Significance target detection method based on residual error fusion
Zanardelli et al. Image forgery detection: a survey of recent deep-learning approaches
CA3137297C (en) Adaptive convolutions in neural networks
CN111223057B (en) Incremental focused image-to-image conversion method based on generation of countermeasure network
CN107871103B (en) Face authentication method and device
CN112927209B (en) CNN-based significance detection system and method
CN113095254B (en) Method and system for positioning key points of human body part
CN115345866B (en) Building extraction method in remote sensing image, electronic equipment and storage medium
CN112785637A (en) Light field depth estimation method based on dynamic fusion network
CN110852199A (en) Foreground extraction method based on double-frame coding and decoding model
CN114926734A (en) Solid waste detection device and method based on feature aggregation and attention fusion
CN114742985A (en) Hyperspectral feature extraction method and device and storage medium
CN112990356A (en) Video instance segmentation system and method
CN113313140A (en) Three-dimensional model classification and retrieval method and device based on deep attention
CN115937121A (en) Non-reference image quality evaluation method and system based on multi-dimensional feature fusion
CN115471718A (en) Construction and detection method of lightweight significance target detection model based on multi-scale learning
CN113919479B (en) Method for extracting data features and related device
CN113536977B (en) 360-degree panoramic image-oriented saliency target detection method
Sun et al. Adversarial training for dual-stage image denoising enhanced with feature matching
Joshi et al. Meta-Learning, Fast Adaptation, and Latent Representation for Head Pose Estimation
CN113298814A (en) Indoor scene image processing method based on progressive guidance fusion complementary network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant