CN109635882A - Salient object detection method based on multi-scale convolution feature extraction and fusion - Google Patents
- Publication number
- CN109635882A (application CN201910062293.9A)
- Authority
- CN
- China
- Prior art keywords
- feature
- convolution
- multi-scale
- network
- fusion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
Abstract
The invention relates to a salient object detection method based on multi-scale convolutional feature extraction and fusion. The method first performs data augmentation, processing each color image together with its corresponding manually annotated map to increase the volume of the training dataset; it then extracts multi-scale features and applies channel compression to improve the computational efficiency of the network; next, it fuses the multi-scale features to obtain a predicted saliency map; finally, the optimal model parameters are learned by minimizing a cross-entropy loss, and the trained network is used to predict the salient objects in an image. The invention significantly improves the accuracy of salient object detection.
Description
Technical field
The present invention relates to the fields of image processing and computer vision, and in particular to a salient object detection method based on multi-scale convolutional feature extraction and fusion.
Background technique
How to fuse convolutional features of multiple scales within a fully convolutional network is an open problem in the field of salient object detection. Motivated by this problem, most existing salient object detection methods based on fully convolutional neural networks add network branches through which features of different scales can be fused, so as to generate features more useful for the detection task. Most salient object detection algorithms proposed after 2015 are devoted to using fully convolutional neural networks (FCNNs) to improve both the computational efficiency of the network and the accuracy of detection.
These works can be divided into two classes. The first innovates on the structure of the fully convolutional network. Li et al. obtain features of different scales from a pretrained VGG-16 network, transform each scale's features by further convolutions, upsample the results to a unified size, and produce the saliency detection result through a final convolution; they also fuse a superpixel-scale branch that refines the final detection result at the spatial level. The salient object network proposed by Wang et al. is a fully convolutional network in encoder-decoder form, with an added recurrent neural network structure that iteratively refines the detection result. Cheng et al. add a short connection structure to the fully convolutional network; because each output branch in this structure fuses high-level semantic information with low-level features such as texture and shape, the performance of the algorithm improves significantly while the model remains simple and efficient.
However, most of these methods fuse convolutional features of different scales taken from networks pretrained on classification tasks, and the scales of those features are typically limited and fixed.
Summary of the invention
In view of this, the purpose of the present invention is to propose a salient object detection method based on multi-scale convolutional feature extraction and fusion that significantly improves salient object detection accuracy.
The present invention is realized by the following scheme: a salient object detection method based on multi-scale convolutional feature extraction and fusion, comprising the following steps:
Step S1: perform data augmentation, processing each color image together with its corresponding manually annotated map, to increase the volume of the training dataset;
Step S2: extract multi-scale features and apply channel compression to improve the computational efficiency of the network;
Step S3: fuse the multi-scale features to obtain the predicted saliency map Pred_i;
Step S4: learn the optimal model parameters by minimizing the cross-entropy loss; finally, use the trained network to predict the salient objects in an image.
Further, step S1 specifically includes the following steps:
Step S11: scale each color image in the dataset together with its corresponding annotation map, so that the computing hardware can handle the computational load of the neural network;
Step S12: apply the same random crop to each color image and its corresponding annotation map, to increase the diversity of the data;
Step S13: generate mirror images by horizontal flipping, to enlarge the original dataset.
Further, step S2 specifically includes the following steps:
Step S21: improve the standard U-Net structure. The encoder of the U-Net is an image-classification convolutional network used as the feature network; by repeatedly stacking convolutional and pooling layers it produces convolutional features at 5 different scales. Between features En_i and En_{i+1} there is a pooling layer with stride 2 that progressively reduces the feature-map size, so En_{i+1} is half the size of En_i in both the width and height dimensions. To keep enough spatial information in the convolutional features, the stride of the pooling layer between the last two features is set to 1, so that the last two convolutional features have the same width and height;
Step S22: design a multi-scale feature extraction module and apply it to the convolutional features of each scale produced by the improved U-Net of step S21, obtaining multi-scale content features;
Step S23: add a channel compression module acting on the multi-scale content features to improve the computational efficiency of the network.
Further, step S22 specifically includes the following steps:
Step S221: design three convolutional layers that take the convolutional feature En_i as input; all three perform depthwise separable dilated (atrous) convolutions, with dilation rates of 3, 6, and 9 respectively. The three resulting feature maps have the same size as En_i, namely (c, h, w);
Step S222: concatenate the three feature maps along the channel dimension to obtain a feature of size (3c, h, w);
Step S223: use a convolution with kernel size (1, 1) to compress the channels of the result of step S222 back to those of En_i, obtaining the multi-scale content feature of size (c, h, w).
Further, step S3 specifically includes the following steps:
Step S31: design a multi-scale feature fusion module. Let the input multi-scale content feature Feat_i have size (c, h, w). The module applies depthwise separable convolutions with kernel sizes (1, k) then (k, 1), and in parallel depthwise separable convolutions with kernel sizes (k, 1) then (1, k), obtaining a feature fusion result of the same size as the input feature Feat_i;
Step S32: the decoder of the U-Net and the feature network of the encoder each produce feature maps at 5 different scales. Each scale's decoder feature Dec_i is produced by using the multi-scale feature fusion module to fuse the multi-scale content feature Feat_i with the convolutional feature Dec_{i+1}. Suppose the input feature Dec_{i+1} has size (c, h/2, w/2). First, Dec_{i+1} is upsampled by a factor of two in the spatial dimensions, so that it has the same spatial size as Feat_i, namely (c, h, w). Feat_i and Dec_{i+1} are then concatenated into a feature of size (2c, h, w), and a convolution followed by a ReLU activation and a BN layer yields a feature of size (c, h, w). Next, the multi-scale feature fusion module is applied to this feature to obtain a fusion result; that result is concatenated with the feature again, and a convolution followed by ReLU and BN yields the feature Dec_i of size (c, h, w). Finally, a convolution with kernel size (1, 1) followed by ReLU and BN compresses the channel count of Dec_i by half, to (0.5c, h, w), so that it can be fused with Dec_{i-1}; a further convolution compresses the channels to 1, and a Sigmoid function produces the predicted saliency map Pred_i.
Further, step S31 specifically includes the following steps:
Step S311: apply to the input multi-scale content feature Feat_i a depthwise separable convolution with kernel size (1, k) followed by one with kernel size (k, 1), and in parallel a depthwise separable convolution with kernel size (k, 1) followed by one with kernel size (1, k); after each of the two sequences, a BN layer is added, yielding two feature maps;
Step S312: sum the two feature maps element-wise to obtain a feature of the same size as the input feature Feat_i;
Step S313: use a convolution with kernel size (1, 1) to model the features across channels, obtaining a feature fusion result of the same size as the input feature Feat_i.
Further, in step S4, the cross-entropy loss Loss is computed with the following formula (a standard per-pixel form, where G denotes the manual annotation map, Pred_i the predicted saliency map of scale i, and j ranges over pixels):
Loss = Σ_i Loss_i, with Loss_i = −Σ_j [ G_j log(Pred_{i,j}) + (1 − G_j) log(1 − Pred_{i,j}) ]
Compared with the prior art, the invention has the following beneficial effects: it proposes a multi-scale feature extraction module and a multi-scale fusion module, and embeds them directly into the U-Net architecture of the typical encoder-decoder structure; it also takes into account the redundancy of information across feature channels in the decoder, applying a channel compression module to make the model computationally efficient. The method significantly improves salient object detection accuracy.
Brief description of the drawings
Fig. 1 is a flow diagram of the method of the embodiment of the present invention.
Fig. 2 shows the salient object detection network structure of the embodiment.
Fig. 3 illustrates the multi-scale feature extraction module of the embodiment.
Fig. 4 illustrates the channel compression module of the embodiment.
Fig. 5 illustrates the multi-scale feature fusion module of the embodiment.
Fig. 6 shows the network structure of the multi-scale feature fusion process of the embodiment.
Specific embodiment
The present invention will be further described with reference to the accompanying drawings and embodiments.
It should be noted that the following description is exemplary and intended to provide further explanation of the application. Unless otherwise indicated, all technical and scientific terms used herein have the meanings commonly understood by a person of ordinary skill in the technical field to which the application belongs.
It should also be noted that the terms used herein serve only to describe specific embodiments and are not intended to limit the exemplary implementations of the application. As used herein, unless the context clearly indicates otherwise, the singular forms are intended to include the plural forms as well; furthermore, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
As shown in Fig. 1, the present embodiment provides a salient object detection method based on multi-scale convolutional feature extraction and fusion, comprising the following steps:
Step S1: perform data augmentation, processing each color image together with its corresponding manually annotated map, to increase the volume of the training dataset;
Step S2: extract multi-scale features and apply channel compression to improve the computational efficiency of the network;
Step S3: fuse the multi-scale features to obtain the predicted saliency map Pred_i;
Step S4: learn the optimal model parameters by minimizing the cross-entropy loss; finally, use the trained network to predict the salient objects in an image.
In the present embodiment, step S1 performs data augmentation, processing the color images and their corresponding manual annotation maps together to increase the volume of the training dataset. The mainstream international datasets used to train salient object detection networks generally contain color images and corresponding manual annotations: a color image is as shown in Fig. 2(a), and the annotation map, similar to a saliency map (Fig. 2(b)), is a binary image of the manually marked salient-object region. Because constructing such a dataset requires considerable manual effort, and training a deep neural network requires sufficient data, data augmentation operations are carried out on top of the original dataset. Step S1 therefore specifically includes the following steps:
Step S11: scale each color image in the dataset together with its corresponding annotation map, so that the computing hardware can handle the computational load of the neural network;
Step S12: apply the same random crop to each color image and its corresponding annotation map, to increase the diversity of the data;
Step S13: generate mirror images by horizontal flipping, to enlarge the original dataset so that it meets the large data volume required to train a deep convolutional neural network (CNN) and to strengthen the generalization ability of the model.
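As a rough illustration, the three augmentation operations of steps S11–S13 can be sketched with NumPy. This is only a sketch: the resize is a simple nearest-neighbour subsampling, and the sizes 256 and 224 are illustrative choices, not values fixed by the patent.

```python
import numpy as np

def augment_pair(image, mask, out_size=256, crop_size=224, rng=None):
    """Jointly augment a color image and its binary annotation mask:
    resize (nearest-neighbour), random crop, random horizontal flip.
    The same geometric transform is applied to both arrays."""
    rng = rng if rng is not None else np.random.default_rng(0)
    h, w = image.shape[:2]
    # Step S11: scale both so the hardware can handle the network's load.
    ys = np.arange(out_size) * h // out_size
    xs = np.arange(out_size) * w // out_size
    image, mask = image[ys][:, xs], mask[ys][:, xs]
    # Step S12: random crop with identical offsets for image and mask.
    top = rng.integers(0, out_size - crop_size + 1)
    left = rng.integers(0, out_size - crop_size + 1)
    image = image[top:top + crop_size, left:left + crop_size]
    mask = mask[top:top + crop_size, left:left + crop_size]
    # Step S13: horizontal flip produces a mirror pair.
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]
    return image, mask

img = np.zeros((480, 640, 3), dtype=np.uint8)
gt = np.zeros((480, 640), dtype=np.uint8)
a_img, a_gt = augment_pair(img, gt)
print(a_img.shape, a_gt.shape)  # (224, 224, 3) (224, 224)
```

The key point is that every random decision (crop offsets, flip) is drawn once and applied to both the image and its annotation, so the pair stays aligned.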
In the present embodiment, step S2 specifically includes the following steps:
Step S21: improve the standard U-Net structure. The encoder of the U-Net is an image-classification convolutional network used as the feature network (e.g. a VGG or ResNet structure); by repeatedly stacking convolutional and pooling layers it produces convolutional features at 5 different scales, namely En_1, En_2, En_3, En_4 and En_5 in Fig. 2. Among these five features, between En_i and En_{i+1} there is a pooling layer with stride 2 that progressively reduces the feature-map size, so En_{i+1} is half the size of En_i in both width and height; this also attenuates the spatial information carried by the convolutional features. To keep enough spatial information in the convolutional features, the stride of the pooling layer between the last two features (En_4 and En_5) is set to 1, so that En_4 and En_5 have the same width and height;
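The resulting scale pyramid can be illustrated with a short calculation; the 224×224 input size is an assumption for illustration, not a value fixed by the patent.

```python
# Spatial sizes of the five encoder features En1..En5 for a 224x224 input:
# stride-2 pooling between consecutive scales, except between the last pair,
# where the stride is 1 so En4 and En5 keep the same resolution.
size = 224
strides = [2, 2, 2, 1]  # pools between En1-En2, En2-En3, En3-En4, En4-En5
sizes = [size]
for s in strides:
    size //= s
    sizes.append(size)
print(sizes)  # [224, 112, 56, 28, 28]
```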
Step S22: design a multi-scale feature extraction module and apply it to the convolutional features of each scale produced by the improved U-Net of step S21, obtaining multi-scale content features. The multi-scale feature extraction module is shown in Fig. 3; here the convolutional feature is assumed to have size (c, h, w);
Step S23: add a channel compression module acting on the multi-scale content features to improve the computational efficiency of the network. The channel compression module is shown in Fig. 4, where "SE Module" is the module proposed by Hu et al. in the SENet (Squeeze-and-Excitation Networks) paper. The SE module takes the multi-scale content feature Feat_i as input, models the correlations between the features of the individual channels, and applies a weighting operation to strengthen the generalization ability of the features. The channel compression module then uses a convolution with kernel size (1, 1) to compress the channel count of the result to half of the original, and a ReLU (Rectified Linear Unit) function and a BN (Batch Normalization) layer yield the channel-compressed multi-scale content feature Feat_i.
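A rough NumPy sketch of this channel compression module, under stated assumptions: the SE weights `w1`, `w2` and the 1×1-convolution weights are random placeholders rather than trained parameters, and the BN/ReLU stages are omitted; only the squeeze-excite-compress order and the shape flow follow the description above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_then_compress(feat, w1, w2, w_conv):
    """SE-style reweighting (global average pool -> two small dense layers
    -> sigmoid gate on each channel) followed by a 1x1 convolution that
    halves the channel count, as in the channel compression module."""
    c, h, w = feat.shape
    squeeze = feat.mean(axis=(1, 2))                    # (c,) global average pool
    excite = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0))  # (c,) channel gate
    reweighted = feat * excite[:, None, None]           # scale each channel
    # A 1x1 convolution is a per-pixel linear map over channels.
    return np.tensordot(w_conv, reweighted, axes=([1], [0]))

rng = np.random.default_rng(0)
c, h, w = 64, 28, 28
feat = rng.standard_normal((c, h, w))
w1 = rng.standard_normal((c // 16, c))      # squeeze: c -> c/16
w2 = rng.standard_normal((c, c // 16))      # excite: c/16 -> c
w_conv = rng.standard_normal((c // 2, c))   # 1x1 conv: c -> c/2
out = se_then_compress(feat, w1, w2, w_conv)
print(out.shape)  # (32, 28, 28)
```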
In the present embodiment, step S22 specifically includes the following steps:
Step S221: design three convolutional layers that take the convolutional feature En_i as input; all three perform depthwise separable dilated (atrous) convolutions, with dilation rates of 3, 6, and 9 respectively. Setting different dilation rates lets the convolutions capture content regions of different sizes in the image, i.e. produce feature maps of multi-scale content regions. The three resulting feature maps have the same size as En_i, namely (c, h, w);
Step S222: concatenate (concat) the three feature maps along the channel dimension to obtain a feature of size (3c, h, w);
Step S223: use a convolution with kernel size (1, 1) to compress the channels of the result of step S222 back to those of En_i, obtaining the multi-scale content feature of size (c, h, w), i.e. Feat_i in Fig. 4.
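The shape behaviour of steps S221–S223 can be sketched in NumPy. This is a simplified illustration: only the depthwise spatial part of the depthwise separable dilated convolution is implemented (the pointwise stage is omitted), all weights are random placeholders, and the small sizes are illustrative.

```python
import numpy as np

def depthwise_dilated_conv(feat, kernel, dilation):
    """Depthwise 3x3 dilated convolution with 'same' zero padding, so the
    output keeps the (c, h, w) size of the input, as step S221 requires."""
    c, h, w = feat.shape
    pad = dilation  # for a 3x3 kernel, same-padding equals the dilation rate
    padded = np.zeros((c, h + 2 * pad, w + 2 * pad))
    padded[:, pad:-pad, pad:-pad] = feat
    out = np.zeros_like(feat)
    for ky in range(3):
        for kx in range(3):
            dy, dx = ky * dilation, kx * dilation
            out += kernel[:, ky, kx, None, None] * padded[:, dy:dy + h, dx:dx + w]
    return out

rng = np.random.default_rng(0)
c, h, w = 8, 28, 28
en = rng.standard_normal((c, h, w))
kernels = rng.standard_normal((3, c, 3, 3))       # one depthwise kernel per branch
branches = [depthwise_dilated_conv(en, k, d)      # dilation rates 3, 6, 9
            for k, d in zip(kernels, (3, 6, 9))]
stacked = np.concatenate(branches, axis=0)        # (3c, h, w), step S222
w_1x1 = rng.standard_normal((c, 3 * c))
feat_i = np.tensordot(w_1x1, stacked, axes=([1], [0]))  # (c, h, w), step S223
print(stacked.shape, feat_i.shape)  # (24, 28, 28) (8, 28, 28)
```

Each branch preserves the spatial size while looking at a different receptive field, which is exactly what allows the channel-wise concatenation in step S222.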
In the present embodiment, step S3 specifically includes the following steps:
Step S31: to fuse features of different sizes, the present embodiment designs a multi-scale feature fusion module, shown in Fig. 5. Let the input multi-scale content feature Feat_i have size (c, h, w). The module applies depthwise separable convolutions with kernel sizes (1, k) then (k, 1), and in parallel depthwise separable convolutions with kernel sizes (k, 1) then (1, k), obtaining a feature fusion result of the same size as the input feature Feat_i. This is equivalent to a (k, k) convolution but requires fewer computational resources, while stitching together content-region features of different scales along the spatial dimensions;
Step S32: the decoder of the U-Net and the feature network of the encoder each produce feature maps at 5 different scales. Each scale's decoder feature Dec_i is produced by using the multi-scale feature fusion module to fuse the multi-scale content feature Feat_i with the convolutional feature Dec_{i+1}. Suppose the input feature Dec_{i+1} has size (c, h/2, w/2). First, Dec_{i+1} is upsampled by a factor of two in the spatial dimensions, so that it has the same spatial size as Feat_i, namely (c, h, w). Feat_i and Dec_{i+1} are then concatenated into a feature of size (2c, h, w), and a convolution followed by a ReLU activation and a BN layer yields a feature of size (c, h, w). Next, the multi-scale feature fusion module is applied to this feature to obtain a fusion result; that result is concatenated with the feature again, and a convolution followed by ReLU and BN yields the feature Dec_i of size (c, h, w). Finally, a convolution with kernel size (1, 1) followed by ReLU and BN compresses the channel count of Dec_i by half, to (0.5c, h, w), so that it can be fused with Dec_{i-1}; a further convolution compresses the channels to 1, and a Sigmoid function produces the predicted saliency map Pred_i. Note that since Dec_4 and Dec_5 have the same number of channels, the channel count is not compressed there.
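The shape flow of one decoder fusion step can be sketched as follows. BN, ReLU, and the second pass through the fusion module are omitted, and all weights are random placeholders, so this only mirrors the tensor sizes described above.

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling on the two spatial dimensions."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

rng = np.random.default_rng(0)
c, h, w = 16, 28, 28
feat_i = rng.standard_normal((c, h, w))              # multi-scale content feature
dec_next = rng.standard_normal((c, h // 2, w // 2))  # Dec_{i+1}, coarser scale

up = upsample2x(dec_next)                            # (c, h, w)
concat = np.concatenate([feat_i, up], axis=0)        # (2c, h, w)
w_a = rng.standard_normal((c, 2 * c))
dec_i = np.tensordot(w_a, concat, axes=([1], [0]))   # conv back to (c, h, w)
w_half = rng.standard_normal((c // 2, c))
dec_i_half = np.tensordot(w_half, dec_i, axes=([1], [0]))  # (c/2, h, w) for Dec_{i-1}
w_pred = rng.standard_normal((1, c // 2))
logit = np.tensordot(w_pred, dec_i_half, axes=([1], [0]))  # 1-channel map
pred = 1.0 / (1.0 + np.exp(-logit))                  # sigmoid -> saliency map Pred_i
print(up.shape, concat.shape, dec_i_half.shape, pred.shape)
```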
In the present embodiment, step S31 specifically includes the following steps:
Step S311: apply to the input multi-scale content feature Feat_i a depthwise separable convolution with kernel size (1, k) followed by one with kernel size (k, 1), and in parallel a depthwise separable convolution with kernel size (k, 1) followed by one with kernel size (1, k); after each of the two sequences, a BN layer is added, yielding two feature maps;
Step S312: sum the two feature maps element-wise to obtain a feature of the same size as the input feature Feat_i;
Step S313: use a convolution with kernel size (1, 1) to model the features across channels, obtaining a feature fusion result of the same size as the input feature Feat_i.
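A one-line calculation shows why the factorized (1, k) + (k, 1) pair is cheaper than a full (k, k) kernel; k = 7 is an illustrative choice, not a value fixed by the patent.

```python
# Per-channel weight count: the factorized pair needs 2k weights instead of
# k*k, which is why the fusion module covers the same receptive field at a
# fraction of the cost of a full (k, k) depthwise kernel.
k = 7
full = k * k          # 49 weights per channel for a (k, k) kernel
factorized = k + k    # 7 + 7 = 14 for a (1, k) followed by a (k, 1)
print(full, factorized)  # 49 14
```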
In the present embodiment, in step S4, the Adam (Adaptive moment estimation) algorithm is used to optimize the loss function in the training stage. As shown in Fig. 2, each scale's feature Dec_i in step S3 corresponds to a loss Loss_i, where each Loss_i is the cross-entropy loss computed between the predicted saliency map Pred_i of Fig. 6 and the manual annotation map.
The network's cross-entropy loss Loss is computed with the following formula (a standard per-pixel form, where G denotes the manual annotation map and j ranges over pixels):
Loss = Σ_i Loss_i, with Loss_i = −Σ_j [ G_j log(Pred_{i,j}) + (1 − G_j) log(1 − Pred_{i,j}) ]
The optimal parameters of the network are obtained by Adam optimization, and the trained network is finally used to predict the salient objects in color images.
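The per-scale loss can be sketched as a pixel-averaged binary cross-entropy; this is a common form consistent with the description above, not necessarily the patent's exact normalization.

```python
import numpy as np

def cross_entropy_loss(pred, gt, eps=1e-7):
    """Pixel-wise binary cross-entropy between a predicted saliency map and
    its binary annotation, averaged over all pixels (assumed normalization)."""
    pred = np.clip(pred, eps, 1.0 - eps)  # guard against log(0)
    return float(-np.mean(gt * np.log(pred) + (1.0 - gt) * np.log(1.0 - pred)))

gt = np.array([[1.0, 0.0], [1.0, 0.0]])        # annotation: left column salient
good = np.array([[0.9, 0.1], [0.9, 0.1]])      # confident, correct prediction
bad = np.array([[0.1, 0.9], [0.1, 0.9]])       # confident, wrong prediction
print(cross_entropy_loss(good, gt) < cross_entropy_loss(bad, gt))  # True
```

A correct confident prediction yields a small loss, a wrong confident one a large loss, which is the gradient signal Adam uses to learn the optimal parameters.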
When the algorithm extracts additional scale-dependent features on top of the original-scale features and fuses them, the fused features gain stronger generalization ability. Following this idea of extracting and fusing convolutional features of multiple scales, the present embodiment proposes a multi-scale feature extraction module and a multi-scale fusion module, and embeds them directly into the U-Net architecture of the typical encoder-decoder structure; it also takes into account the redundancy of information across feature channels in the decoder, applying a channel compression module to make the model computationally efficient. In summary, the present embodiment proposes a salient object detection method based on multi-scale convolutional feature extraction and fusion; the network structure designed by the algorithm significantly improves salient object detection accuracy.
Those skilled in the art will appreciate that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical memory) containing computer-usable program code.
The application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be realized by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor create means for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that realize the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above is only a preferred embodiment of the present invention and does not limit the invention to other forms. Any person skilled in the art may, using the technical content disclosed above, change or modify it into an equivalent embodiment. However, any simple modification, equivalent variation, or adaptation made to the above embodiments according to the technical essence of the invention, without departing from the technical solution of the invention, still falls within the protection scope of the technical solution of the present invention.
Claims (7)
1. a kind of obvious object detection method based on multiple dimensioned convolution feature extraction and fusion, it is characterised in that: including following
Step:
Step S1: carrying out data enhancing, while handling color image and corresponding artificial mark figure, increases training number
According to the data volume of collection;
Step S2: extracting Analysis On Multi-scale Features, and row of channels of going forward side by side is compressed to optimize the computational efficiency of network;
Step S3: merging multiple dimensioned feature, the notable figure Pred predictedi;
Step S4: intersect entropy loss by solving to minimize, the optimized parameter of model is arrived in study;Finally utilize trained model
Network carrys out the obvious object in forecast image.
2. a kind of obvious object detection method based on multiple dimensioned convolution feature extraction and fusion according to claim 1,
It is characterized by: step S1 specifically includes the following steps:
Step S11: each color image artificial mark figure corresponding with its concentrated to data zooms in and out together, makes to calculate
Equipment can undertake the calculation amount of neural network;
Step S12: each color image artificial mark figure corresponding with its concentrated to data carries out random cropping behaviour together
Make, to increase the diversity of data;
Step S13: it is overturn by image level and generates mirror image, to expand the data volume of legacy data collection.
3. The salient object detection method based on multi-scale convolutional feature extraction and fusion according to claim 1, characterized in that step S2 specifically comprises the following steps:
Step S21: improve the intrinsic network structure of U-Net. The encoder of the U-Net uses an image-classification convolutional network as its feature network and, by successively stacking convolutional layers and pooling layers, produces convolutional features at 5 different scales. Between convolutional features En_i and En_{i+1} there is a pooling layer that reduces the feature-map size stage by stage; the stride of this pooling layer is set to 2, so that relative to En_i, the feature En_{i+1} is halved in both the width and height spatial dimensions. To keep enough information in the spatial dimensions of the convolutional features, the stride of the pooling layer between the last two convolutional features is set to 1, so that the last two convolutional features have the same size in the width and height dimensions;
Step S22: design a multi-scale feature extraction module and apply it to the convolutional feature of each scale produced by the improved U-Net of step S21, obtaining multi-scale content features;
Step S23: add a channel compression module applied to the multi-scale content features, to optimize the computational efficiency of the network.
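The pooling arrangement in step S21 can be checked with a few lines of arithmetic: under this description, four pooling layers separate the five encoder features En_1…En_5, with strides (2, 2, 2, 1), so the last two features keep the same spatial size. A sketch (the helper name `encoder_sizes` is illustrative):

```python
def encoder_sizes(h, w, pool_strides=(2, 2, 2, 1)):
    """Spatial sizes of the 5 encoder features En_1..En_5.
    A pooling layer sits between consecutive features; the last one
    uses stride 1, so En_4 and En_5 share a spatial size."""
    sizes = [(h, w)]
    for s in pool_strides:
        h, w = h // s, w // s
        sizes.append((h, w))
    return sizes

# For a 256x256 input: En_5 matches En_4 because the last stride is 1.
print(encoder_sizes(256, 256))
```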
4. The salient object detection method based on multi-scale convolutional feature extraction and fusion according to claim 3, characterized in that step S22 specifically comprises the following steps:
Step S221: design three convolutional layers that take the convolutional feature En_i as input; all three perform depthwise separable dilated convolution, with dilation rates of 3, 6 and 9 respectively. The feature results of these three operations have the same size as the convolutional feature En_i, namely (c, h, w);
Step S222: concatenate the three feature results along the channel dimension to obtain a feature result of size (3c, h, w);
Step S223: use a convolution operation with kernel size (1, 1) to compress the channels of the feature result obtained in step S222 back to those of En_i, obtaining a multi-scale content feature of size (c, h, w).
5. The salient object detection method based on multi-scale convolutional feature extraction and fusion according to claim 1, characterized in that step S3 specifically comprises the following steps:
Step S31: design a multi-scale feature fusion module; let the input multi-scale content feature Feat_i have size (c, h, w). In this module, depthwise separable convolutions with kernel sizes (1, k) then (k, 1), and depthwise separable convolutions with kernel sizes (k, 1) then (1, k), are applied to obtain a feature fusion result of the same size as the input feature Feat_i;
Step S32: the decoder of the U-Net and the feature network of the encoder both correspond to feature results at 5 different scales. For the convolutional feature Dec_i produced at each scale of the U-Net decoder, the multi-scale feature fusion module fuses the multi-scale content feature Feat_i with the convolutional feature Dec_{i+1}; here, assume the input Dec_{i+1} has size (c, h/2, w/2). First, Dec_{i+1} is enlarged twofold in the spatial dimensions by an upsampling operation, so that Dec_{i+1} and Feat_i have the same spatial size, i.e. (c, h, w). Then Feat_i and Dec_{i+1} are spliced by a concatenation operation into a feature of size (2c, h, w); a convolution operation followed by a ReLU activation function and a BN layer yields a feature result of size (c, h, w). Next, the multi-scale feature fusion module is applied to this feature result to obtain a feature fusion result; the feature result and the fusion result are concatenated again, and a convolution operation followed by ReLU and a BN layer gives the feature result Dec_i of size (c, h, w). Finally, a convolution operation with kernel size (1, 1) compresses the number of channels of Dec_i by half so that it can be fused with Dec_{i-1}; after ReLU and a BN layer this yields a feature result Dec_i of size (0.5c, h, w), whose channels are further compressed to 1 by a convolution operation; the predicted saliency map Pred_i is then obtained through a Sigmoid function.
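The first operation of step S32 enlarges Dec_{i+1} twofold in the spatial dimensions so that it matches Feat_i. The claim does not fix the interpolation method; a nearest-neighbour version, one common choice, can be sketched as:

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x spatial upsampling of a 2D feature map:
    every value is repeated along both width and height."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # double the width
        out.append(wide)
        out.append(list(wide))                      # double the height
    return out

dec = [[1, 2], [3, 4]]        # Dec_{i+1}: spatial size (h/2, w/2)
up = upsample2x(dec)          # now (h, w), ready to concatenate with Feat_i
```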
6. The salient object detection method based on multi-scale convolutional feature extraction and fusion according to claim 5, characterized in that step S31 specifically comprises the following steps:
Step S311: apply to the input multi-scale content feature Feat_i, in sequence, depthwise separable convolutions with kernel sizes (1, k) and (k, 1); at the same time, apply to Feat_i, in sequence, depthwise separable convolutions with kernel sizes (k, 1) and (1, k); a BN layer is added after each of these two sequential operations, yielding two feature results;
Step S312: add the two feature results element-wise along the channel dimension, obtaining a feature result of the same size as Feat_i;
Step S313: use a convolution operation with kernel size (1, 1) to model the features across the channels of this result, obtaining a feature fusion result of the same size as the input feature Feat_i.
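The point of the (1, k)/(k, 1) factorization in steps S311–S312 is cost: a factorized pair uses 2k weights per channel where a full k × k depthwise kernel uses k². A quick arithmetic check (k = 7 and c = 64 are illustrative values; the claim does not fix k):

```python
def factorized_cost(k, c):
    """Weights per spatial position: a full k x k depthwise kernel over c
    channels versus the (1,k)-then-(k,1) factorized pair."""
    full = k * k * c
    separable = 2 * k * c   # one (1,k) pass plus one (k,1) pass
    return full, separable

full, sep = factorized_cost(k=7, c=64)
# 7*7*64 = 3136 weights versus 2*7*64 = 896 for the factorized pair
```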
7. The salient object detection method based on multi-scale convolutional feature extraction and fusion according to claim 1, characterized in that the cross-entropy loss Loss in step S4 is calculated with the following formula:
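The formula itself is not reproduced in this excerpt. For saliency maps produced by a Sigmoid, the loss referred to is most likely the standard pixel-wise binary cross-entropy; the sketch below implements that standard form as an assumption, not as a reproduction of the patent's exact equation:

```python
import math

def bce_loss(pred, gt, eps=1e-7):
    """Mean pixel-wise binary cross-entropy between a predicted saliency
    map (Sigmoid outputs, flattened) and a binary annotation map."""
    total = 0.0
    for p, g in zip(pred, gt):
        p = min(max(p, eps), 1.0 - eps)   # clamp for numerical stability
        total += -(g * math.log(p) + (1.0 - g) * math.log(1.0 - p))
    return total / len(pred)

loss = bce_loss([0.9, 0.2, 0.8], [1.0, 0.0, 1.0])
```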
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910062293.9A CN109635882B (en) | 2019-01-23 | 2019-01-23 | Salient object detection method based on multi-scale convolution feature extraction and fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109635882A true CN109635882A (en) | 2019-04-16 |
CN109635882B CN109635882B (en) | 2022-05-13 |
Family
ID=66063115
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910062293.9A Active CN109635882B (en) | 2019-01-23 | 2019-01-23 | Salient object detection method based on multi-scale convolution feature extraction and fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109635882B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170289434A1 (en) * | 2016-03-29 | 2017-10-05 | Sony Corporation | Method and system for image processing to detect salient objects in image |
CN108171701A (en) * | 2018-01-15 | 2018-06-15 | 复旦大学 | Saliency detection method based on U-Net and adversarial learning |
CN109165660A (en) * | 2018-06-20 | 2019-01-08 | 扬州大学 | Salient object detection method based on convolutional neural networks |
CN109191426A (en) * | 2018-07-24 | 2019-01-11 | 江南大学 | Planar image saliency detection method |
Non-Patent Citations (3)
Title |
---|
HANGKE SONG ET AL.: "Depth-Aware Salient Object Detection and Segmentation via Multiscale Discriminative Saliency Fusion and Bootstrap Learning", IEEE Transactions on Image Processing * |
YUZHEN NIU ET AL.: "Salient Object Segmentation Based on Superpixel and Background Connectivity Prior", IEEE Access * |
LI JINDONG: "Semantic Object Detection and Segmentation Based on Objectness Sampling", China Master's Theses Full-text Database (Master), Information Science and Technology * |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110084309A (en) * | 2019-04-30 | 2019-08-02 | 北京市商汤科技开发有限公司 | Feature map magnification method, device, equipment and computer-readable storage medium |
US11049217B2 (en) | 2019-04-30 | 2021-06-29 | Beijing Sensetime Technology Development Co., Ltd. | Magnifying feature map |
CN110084309B (en) * | 2019-04-30 | 2022-06-21 | 北京市商汤科技开发有限公司 | Feature map magnification method, device, equipment and computer-readable storage medium |
CN110298397A (en) * | 2019-06-25 | 2019-10-01 | 东北大学 | Multi-label classification method for heated metal images based on compressed convolutional neural networks |
CN110322528A (en) * | 2019-06-26 | 2019-10-11 | 浙江大学 | Blood vessel reconstruction method for 3T and 7T magnetic resonance brain images |
CN110490892A (en) * | 2019-07-03 | 2019-11-22 | 中山大学 | Automatic nodule localization and recognition method for thyroid ultrasound images based on USFaster R-CNN |
CN110348390A (en) * | 2019-07-12 | 2019-10-18 | 创新奇智(重庆)科技有限公司 | Training method, computer-readable medium and system for a fire detection model |
CN110378976A (en) * | 2019-07-18 | 2019-10-25 | 北京市商汤科技开发有限公司 | Image processing method and device, electronic equipment and storage medium |
WO2021008022A1 (en) * | 2019-07-18 | 2021-01-21 | 北京市商汤科技开发有限公司 | Image processing method and apparatus, electronic device and storage medium |
CN110660046B (en) * | 2019-08-30 | 2022-09-30 | 太原科技大学 | Industrial product defect image classification method based on lightweight deep neural network |
CN110660046A (en) * | 2019-08-30 | 2020-01-07 | 太原科技大学 | Industrial product defect image classification method based on lightweight deep neural network |
CN111080588A (en) * | 2019-12-04 | 2020-04-28 | 南京航空航天大学 | Multi-scale neural network-based rapid fetal MR image brain extraction method |
CN111028246A (en) * | 2019-12-09 | 2020-04-17 | 北京推想科技有限公司 | Medical image segmentation method and device, storage medium and electronic equipment |
CN111080599A (en) * | 2019-12-12 | 2020-04-28 | 哈尔滨市科佳通用机电股份有限公司 | Fault identification method for hook lifting rod of railway wagon |
CN111191649A (en) * | 2019-12-31 | 2020-05-22 | 上海眼控科技股份有限公司 | Method and equipment for identifying bent multi-line text image |
CN111814536A (en) * | 2020-05-21 | 2020-10-23 | 闽江学院 | Breeding monitoring method and device |
CN111814536B (en) * | 2020-05-21 | 2023-11-28 | 闽江学院 | Breeding monitoring method and device |
CN111860233A (en) * | 2020-07-06 | 2020-10-30 | 中国科学院空天信息创新研究院 | SAR image complex building extraction method and system based on attention network selection |
CN111860233B (en) * | 2020-07-06 | 2021-05-18 | 中国科学院空天信息创新研究院 | SAR image complex building extraction method and system based on attention network selection |
CN112258431B (en) * | 2020-09-27 | 2021-07-20 | 成都东方天呈智能科技有限公司 | Image classification model based on mixed depth separable expansion convolution and classification method thereof |
CN112258431A (en) * | 2020-09-27 | 2021-01-22 | 成都东方天呈智能科技有限公司 | Image classification model based on mixed depth separable expansion convolution and classification method thereof |
CN112446292A (en) * | 2020-10-28 | 2021-03-05 | 山东大学 | 2D image salient object detection method and system |
CN112446292B (en) * | 2020-10-28 | 2023-04-28 | 山东大学 | 2D image salient object detection method and system |
CN112115951A (en) * | 2020-11-19 | 2020-12-22 | 之江实验室 | RGB-D image semantic segmentation method based on spatial relationship |
CN112861795A (en) * | 2021-03-12 | 2021-05-28 | 云知声智能科技股份有限公司 | Method and device for detecting salient target of remote sensing image based on multi-scale feature fusion |
Also Published As
Publication number | Publication date |
---|---|
CN109635882B (en) | 2022-05-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109635882A (en) | Salient object detection method based on multi-scale convolution feature extraction and fusion | |
KR102302725B1 (en) | Room Layout Estimation Methods and Techniques | |
Garcia-Garcia et al. | A review on deep learning techniques applied to semantic segmentation | |
CN104809187B (en) | Indoor scene semantic annotation method based on RGB-D data | |
CN110210551A (en) | Visual target tracking method based on adaptive subject sensitivity | |
CN109753885B (en) | Target detection method and device and pedestrian detection method and system | |
CN108961327A (en) | Monocular depth estimation method, device, equipment and storage medium | |
CN110111366A (en) | End-to-end optical flow estimation method based on multi-level losses | |
CN103262119B (en) | Method and system for image segmentation | |
CN108564097A (en) | Multi-scale object detection method based on deep convolutional neural networks | |
CN110073359A (en) | Efficient data layouts for convolutional neural networks | |
CN108734120A (en) | Image annotation method, apparatus, device and computer-readable storage medium | |
CN113874883A (en) | Hand pose estimation | |
CN109816769A (en) | Scene map generation method, device and equipment based on depth camera | |
WO2021218786A1 (en) | Data processing system, object detection method and apparatus thereof | |
CN106372648A (en) | Plankton image classification method based on a multi-feature-fusion convolutional neural network | |
CN109214366A (en) | Local target re-identification method, apparatus and system | |
CN113158862B (en) | Lightweight real-time face detection method based on multi-task learning | |
CN113160062B (en) | Infrared image target detection method, device, equipment and storage medium | |
CN108121931A (en) | Two-dimensional code data processing method, device and mobile terminal | |
CN111488827A (en) | Crowd counting method and system based on multi-scale feature information | |
CN108596919A (en) | Automatic image segmentation method based on depth map | |
CN110349167A (en) | Image instance segmentation method and device | |
CN110991444A (en) | License plate recognition method and device for complex scenes | |
CN114918918B (en) | Domain-adaptive robotic pushing and grasping method for unordered targets | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||