CN109635882A - Salient object detection method based on multi-scale convolution feature extraction and fusion - Google Patents

Salient object detection method based on multi-scale convolution feature extraction and fusion

Info

Publication number
CN109635882A
CN109635882A (application CN201910062293.9A)
Authority
CN
China
Prior art keywords
feature
convolution
multi-scale
network
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910062293.9A
Other languages
Chinese (zh)
Other versions
CN109635882B (en)
Inventor
牛玉贞
龙观潮
郭文忠
苏超然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN201910062293.9A
Publication of CN109635882A
Application granted
Publication of CN109635882B
Status: Active
Anticipated expiration

Classifications

    • G06F18/253 Fusion techniques of extracted features (Pattern recognition — Analysing — Fusion techniques)
    • G06F18/214 Generating training patterns; bootstrap methods, e.g. bagging or boosting (Pattern recognition — Design or setup of recognition systems)
    • G06N3/045 Combinations of networks (Neural networks — Architecture, e.g. interconnection topology)
    • G06N3/048 Activation functions (Neural networks — Architecture)

Abstract

The invention relates to a salient object detection method based on multi-scale convolution feature extraction and fusion, which comprises the steps of first performing data augmentation, processing each color image and its corresponding manual annotation map together to increase the data volume of the training dataset; extracting multi-scale features and performing channel compression to optimize the computational efficiency of the network; then fusing the multi-scale features to obtain predicted saliency maps; learning the optimal parameters of the model by minimizing the cross-entropy loss; and finally predicting the salient objects in an image using the trained model network. The invention can significantly improve the detection accuracy of salient objects.

Description

Salient object detection method based on multi-scale convolution feature extraction and fusion
Technical field
The present invention relates to the fields of image processing and computer vision, and in particular to a salient object detection method based on multi-scale convolution feature extraction and fusion.
Background art
How to fuse convolution features of multiple scales in a fully convolutional network is an open problem in the field of salient object detection. Starting from this problem, most existing salient object detection methods based on fully convolutional neural networks add network branches so that convolution features of different scales can be fused via the branches, generating features more useful for the salient object detection task. Most salient object detection algorithms proposed after 2015 are devoted to using fully convolutional neural networks (FCNN) to improve the computational efficiency of the network and the accuracy of salient object detection.
These works can be divided into two classes. The first is innovation in the structure of the fully convolutional network. After obtaining features of different scales from a pre-trained VGG-16 network, Li et al. pass the feature of each scale through convolution to obtain a new feature result, restore these feature results to a unified size by up-sampling, and finally obtain the saliency detection result by a convolution operation; in addition, a branch at the superpixel scale is fused to refine the final salient object detection result at the spatial scale. The salient object network proposed by Wang et al. is a fully convolutional neural network in encoder-decoder form, with a recurrent neural network structure added to iteratively refine the salient object detection result. Cheng et al. add a short connection structure to the fully convolutional network; since each output branch in the short connection structure fuses high-level semantic information with low-level features such as texture and shape, the performance of the algorithm is significantly improved while the model remains simple and efficient.
However, most methods use feature networks pre-trained on classification tasks to fuse the corresponding convolution features of different scales in the network, and the scales of these features are typically limited and fixed.
Summary of the invention
In view of this, the purpose of the present invention is to propose a salient object detection method based on multi-scale convolution feature extraction and fusion, which can significantly improve salient object detection accuracy.
The present invention is realized by the following scheme: a salient object detection method based on multi-scale convolution feature extraction and fusion, specifically including the following steps:
Step S1: perform data augmentation, processing the color images and the corresponding manual annotation maps together to increase the data volume of the training dataset;
Step S2: extract multi-scale features, and perform channel compression to optimize the computational efficiency of the network;
Step S3: fuse the multi-scale features to obtain the predicted saliency map Pred_i;
Step S4: learn the optimal parameters of the model by minimizing the cross-entropy loss; finally, use the trained model network to predict the salient objects in an image.
Further, step S1 specifically includes the following steps:
Step S11: scale each color image in the dataset together with its corresponding manual annotation map, so that the computing device can bear the computational load of the neural network;
Step S12: randomly crop each color image in the dataset together with its corresponding manual annotation map, to increase the diversity of the data;
Step S13: generate mirror images by horizontal flipping, to expand the data volume of the original dataset.
Further, step S2 specifically includes the following steps:
Step S21: improve the intrinsic network structure of U-Net. The encoder structure of the U-Net network uses an image classification convolutional network as the feature network and generates convolution features of 5 different scales by continuously stacking convolutional layers and pooling layers. Between convolution feature En_i and convolution feature En_{i+1} there is a pooling layer that reduces the size of the feature map step by step; the stride of this pooling layer is set to 2, so that the feature result En_{i+1} is halved in the width and height spatial dimensions compared with En_i. In order to keep the convolution features with enough information in the spatial dimensions, the stride of the pooling layer between the last two convolution features is set to 1, so that the last two convolution features have the same size in the width and height spatial dimensions;
Step S22: design a multi-scale feature extraction module and apply it to the convolution feature of each scale generated by the U-Net network improved in step S21, to obtain multi-scale content features;
Step S23: add a channel compression module applied to the multi-scale content features to optimize the computational efficiency of the network.
Further, step S22 specifically includes the following steps:
Step S221: design three convolutional layers that take the convolution feature En_i as input; all three convolutions perform depthwise separable dilated convolution operations, where the dilation rates of the dilated convolutions are 3, 6 and 9 respectively. The feature results obtained by these three operations have the same feature size as the convolution feature En_i, namely (c, h, w);
Step S222: concatenate the three feature results along the channel dimension to obtain a feature result of size (3c, h, w);
Step S223: use a convolution operation with a kernel size of (1, 1) to compress the channels of the feature result obtained in step S222 to be consistent with the convolution feature En_i, obtaining a multi-scale content feature of size (c, h, w).
Further, step S3 specifically includes the following steps:
Step S31: design a multi-scale feature fusion module. Let the input multi-scale content feature Feat_i have a feature size of (c, h, w). In the multi-scale feature fusion module, a depthwise separable convolution with kernel sizes (1, k) and (k, 1) and a depthwise separable convolution with kernel sizes (k, 1) and (1, k) are applied respectively, obtaining a feature fusion result of the same size as the input feature Feat_i;
Step S32: the decoder structure of the U-Net network, like the feature network of the encoder, corresponds to feature results of 5 different scales. The convolution feature Dec_i of each scale generated by the decoder structure of the U-Net network is obtained by using the multi-scale feature fusion module to fuse the multi-scale content feature Feat_i and the convolution feature Dec_{i+1}. Suppose here that the input convolution feature Dec_{i+1} has a feature size of (c, h/2, w/2). First, the convolution feature Dec_{i+1} is magnified twofold in the spatial dimensions by an up-sampling operation, so that Dec_{i+1} and the multi-scale content feature Feat_i have the same size in the spatial dimensions, namely (c, h, w). Then the multi-scale content feature Feat_i and the convolution feature Dec_{i+1} are concatenated to obtain a spliced feature of size (2c, h, w); a convolution operation is applied, and a feature result of size (c, h, w) is obtained through a ReLU activation function and a BN layer. Next, the multi-scale feature fusion module is applied to the obtained feature result to obtain a feature fusion result; this feature result and the feature fusion result are concatenated again, and a convolution operation followed by a ReLU activation function and a BN layer yields the feature result Dec_i of size (c, h, w). Finally, a convolution operation with a kernel size of (1, 1) compresses the channel number of the feature result Dec_i by half so that it can be fused with Dec_{i-1}; after a ReLU activation function and a BN layer, a feature result Dec_i of size (0.5c, h, w) is obtained. Its channels are then compressed to 1 by a convolution operation, and the predicted saliency map Pred_i is obtained through a Sigmoid function.
Further, step S31 specifically includes the following steps:
Step S311: apply to the input multi-scale content feature Feat_i a depthwise separable convolution with kernel size (1, k) followed by one with kernel size (k, 1), and in parallel apply to the input feature Feat_i a depthwise separable convolution with kernel size (k, 1) followed by one with kernel size (1, k); a BN layer is added after each of these two sequential operations, yielding two feature results;
Step S312: sum the two feature results along the channel dimension to obtain a feature result of the same size as the input feature Feat_i;
Step S313: use a convolution operation with a kernel size of (1, 1) to model the features across the channels of the feature result, obtaining a feature fusion result of the same size as the input feature Feat_i.
Further, in step S4, the cross-entropy loss Loss is calculated by the following formula: Loss = Σ_i Loss_i, where Loss_i = −Σ_(x,y) [ G(x,y)·log Pred_i(x,y) + (1 − G(x,y))·log(1 − Pred_i(x,y)) ], G denotes the manual annotation map, and the sum runs over all pixel positions (x, y).
Compared with the prior art, the present invention has the following beneficial effects: the present invention proposes a multi-scale feature extraction module and a multi-scale fusion module, and in the network design embeds the modules directly into the U-Net architecture of the typical encoder-decoder structure; at the same time, considering the redundancy of information in the feature channels of the decoder structure, a channel compression module is applied to make the model more computationally efficient. The method can significantly improve salient object detection accuracy.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the method of the embodiment of the present invention.
Fig. 2 shows the salient object detection network structure of the embodiment of the present invention.
Fig. 3 is a schematic diagram of the multi-scale feature extraction module of the embodiment of the present invention.
Fig. 4 is a schematic diagram of the channel compression module of the embodiment of the present invention.
Fig. 5 is a schematic diagram of the multi-scale feature fusion module of the embodiment of the present invention.
Fig. 6 is a schematic diagram of the network structure of the multi-scale feature fusion process of the embodiment of the present invention.
Detailed description of embodiments
The present invention will be further described with reference to the accompanying drawings and embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless otherwise indicated, all technical and scientific terms used herein have the same meaning as commonly understood by a person of ordinary skill in the technical field to which the present application belongs.
It should be noted that the terms used herein are merely for describing specific embodiments and are not intended to limit the exemplary embodiments according to the present application. As used herein, unless the context clearly indicates otherwise, the singular forms are also intended to include the plural forms; in addition, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of the stated features, steps, operations, devices, components and/or combinations thereof.
As shown in Fig. 1, the present embodiment provides a salient object detection method based on multi-scale convolution feature extraction and fusion, specifically including the following steps:
Step S1: perform data augmentation, processing the color images and the corresponding manual annotation maps together to increase the data volume of the training dataset;
Step S2: extract multi-scale features, and perform channel compression to optimize the computational efficiency of the network;
Step S3: fuse the multi-scale features to obtain the predicted saliency map Pred_i;
Step S4: learn the optimal parameters of the model by minimizing the cross-entropy loss; finally, use the trained model network to predict the salient objects in an image.
In the present embodiment, step S1 performs data augmentation, processing the color images and the corresponding manual annotation maps together to increase the data volume of the training dataset. The mainstream international datasets for training salient object detection networks generally consist of color images and corresponding manually labeled maps, where a color image is shown in Fig. 2(a) and a manually labeled map, similar to a saliency map (Fig. 2(b)), is a binary image of the manually labeled salient object region of the image. Since constructing a dataset requires considerable manpower, and training a deep neural network requires sufficient data, data augmentation operations need to be performed on the basis of the data volume of the original dataset. Therefore step S1 specifically includes the following steps:
Step S11: scale each color image in the dataset together with its corresponding manual annotation map, so that the computing device can bear the computational load of the neural network;
Step S12: randomly crop each color image in the dataset together with its corresponding manual annotation map, to increase the diversity of the data;
Step S13: generate mirror images by horizontal flipping, to expand the data volume of the original dataset, so as to meet the large data volume required for training a deep convolutional neural network (CNN) and enhance the generalization ability of the model.
In the present embodiment, step S2 specifically includes the following steps:
Step S21: improve the intrinsic network structure of U-Net. The encoder structure of the U-Net network uses an image classification convolutional network as the feature network (such as a VGG or ResNet network structure), and generates convolution features of 5 different scales by continuously stacking convolutional layers and pooling layers, corresponding to the five feature results En_1, En_2, En_3, En_4 and En_5 in Fig. 2. Among these five convolution features, between convolution feature En_i and convolution feature En_{i+1} there is a pooling layer that reduces the size of the feature map step by step; the stride of this pooling layer is set to 2, so that the feature result En_{i+1} is halved in the width and height spatial dimensions compared with En_i, which also causes attenuation of the convolution feature information in the spatial dimensions. In order to keep the convolution features with enough information in the spatial dimensions, the stride of the pooling layer between the last two convolution features (En_4 and En_5) is set to 1, so that the last two convolution features (En_4 and En_5) have the same size in the width and height spatial dimensions (see the encoder sketch after step S23 below);
Step S22: design a multi-scale feature extraction module and apply it to the convolution feature of each scale generated by the improved U-Net network of step S21, to obtain multi-scale content features; the multi-scale feature extraction module is shown in Fig. 3, where the feature size of the convolution feature is assumed to be (c, h, w);
Step S23: add a channel compression module applied to the multi-scale content features to optimize the computational efficiency of the network. The channel compression module is shown in Fig. 4, where the "SE Module" is the module proposed by Hu et al. in the SENet (Squeeze-and-Excitation Networks) paper. The SE module takes the multi-scale content feature Feat_i as input, models the correlations between the features on each channel, and performs weighting operations to make the features more generalizable. The channel compression module then compresses the channel number of the feature result to half of the original using a convolution operation with a kernel size of (1, 1), and obtains the channel-compressed multi-scale content feature Feat_i through a ReLU (Rectified Linear Unit) function and a BN (Batch Normalization) layer.
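As a concrete illustration of the improved encoder of step S21, the following PyTorch sketch stacks five convolutional stages with stride-2 pooling between the first four and stride-1 pooling before the last, so that En_4 and En_5 share the same spatial size; the VGG-style channel widths (64–512) are assumptions for illustration, not values fixed by this embodiment.

```python
import torch
import torch.nn as nn

def conv_stage(in_c, out_c, n=2):
    """n stacked 3x3 conv + BN + ReLU layers forming one encoder stage."""
    layers = []
    for i in range(n):
        layers += [nn.Conv2d(in_c if i == 0 else out_c, out_c, 3, padding=1),
                   nn.BatchNorm2d(out_c), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class Encoder(nn.Module):
    """Feature network of step S21 producing En_1..En_5.

    The pooling layers between En_1..En_4 use stride 2 (halving width and
    height); the pooling layer between En_4 and En_5 uses stride 1, so
    En_4 and En_5 keep the same spatial size. Channel widths are assumed.
    """
    def __init__(self):
        super().__init__()
        self.stages = nn.ModuleList([
            conv_stage(3, 64), conv_stage(64, 128), conv_stage(128, 256),
            conv_stage(256, 512), conv_stage(512, 512)])
        self.pools = nn.ModuleList([
            nn.MaxPool2d(2, stride=2), nn.MaxPool2d(2, stride=2),
            nn.MaxPool2d(2, stride=2),
            nn.MaxPool2d(3, stride=1, padding=1)])  # stride 1: En_5 == En_4 size

    def forward(self, x):
        feats = []
        for i, stage in enumerate(self.stages):
            x = stage(x)
            feats.append(x)                 # En_{i+1}
            if i < len(self.pools):
                x = self.pools[i](x)        # pool between En_{i+1} and En_{i+2}
        return feats                        # [En_1, ..., En_5]
```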
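The channel compression module of step S23 (Fig. 4) might be sketched as follows; the SE reduction ratio of 16 is an assumption borrowed from the SENet paper, and the class names are hypothetical.

```python
import torch
import torch.nn as nn

class SEModule(nn.Module):
    """Squeeze-and-Excitation channel weighting (Hu et al., SENet)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))      # squeeze: global average pooling
        return x * w.view(b, c, 1, 1)        # excite: per-channel reweighting

class ChannelCompression(nn.Module):
    """Step S23: SE weighting, then a (1, 1) conv halving the channels,
    followed by ReLU and BN in the order stated in the text."""
    def __init__(self, channels):
        super().__init__()
        self.se = SEModule(channels)
        self.compress = nn.Sequential(
            nn.Conv2d(channels, channels // 2, kernel_size=1),
            nn.ReLU(inplace=True), nn.BatchNorm2d(channels // 2))

    def forward(self, feat):
        return self.compress(self.se(feat))
```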
In the present embodiment, step S22 specifically includes the following steps:
Step S221: design three convolutional layers that take the convolution feature En_i as input; all three convolutions perform depthwise separable dilated convolution operations, where the dilation rates of the dilated convolutions are 3, 6 and 9 respectively. Setting different dilation rates for the dilated convolutions enables the convolution operations to capture content-region features of different sizes in the image, that is, to generate feature results for multi-scale content regions. The feature results obtained by these three operations have the same feature size as the convolution feature En_i, namely (c, h, w);
Step S222: concatenate (concat) the three feature results along the channel dimension to obtain a feature result of size (3c, h, w);
Step S223: use a convolution operation with a kernel size of (1, 1) to compress the channels of the feature result obtained in step S222 to be consistent with the convolution feature En_i, obtaining the multi-scale content feature of size (c, h, w), namely Feat_i in Fig. 4.
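Steps S221–S223 can be sketched as below; the embodiment fixes the separable form and the dilation rates 3, 6 and 9, while the 3×3 depthwise kernel size is an assumption.

```python
import torch
import torch.nn as nn

class SeparableDilatedConv(nn.Module):
    """Depthwise 3x3 dilated conv followed by a 1x1 pointwise conv."""
    def __init__(self, channels, dilation):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=dilation,
                                   dilation=dilation, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class MultiScaleExtraction(nn.Module):
    """Steps S221-S223: three separable dilated convolutions with dilation
    rates 3, 6 and 9 on En_i, concatenation along the channel dimension
    (c -> 3c), then a (1, 1) conv back to c channels, giving Feat_i."""
    def __init__(self, channels):
        super().__init__()
        self.branches = nn.ModuleList(
            [SeparableDilatedConv(channels, d) for d in (3, 6, 9)])
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, en):
        outs = [branch(en) for branch in self.branches]  # each (c, h, w)
        return self.fuse(torch.cat(outs, dim=1))         # (3c,h,w) -> (c,h,w)
```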
In the present embodiment, step S3 specifically includes the following steps:
Step S31: in order to fuse features of different sizes, the present embodiment designs a multi-scale feature fusion module, as shown in Fig. 5. Let the input multi-scale content feature Feat_i have a feature size of (c, h, w). In the multi-scale feature fusion module, a depthwise separable convolution with kernel sizes (1, k) and (k, 1) and a depthwise separable convolution with kernel sizes (k, 1) and (1, k) are applied respectively, obtaining a feature fusion result of the same size as the input feature Feat_i. This is equivalent to a (k, k) convolution operation but requires fewer computational resources, while splicing content-region features of different scales in the spatial dimensions;
Step S32: the decoder structure of the U-Net network, like the feature network of the encoder, corresponds to feature results of 5 different scales. The convolution feature Dec_i of each scale generated by the decoder structure of the U-Net network is obtained by using the multi-scale feature fusion module to fuse the multi-scale content feature Feat_i and the convolution feature Dec_{i+1}. Suppose here that the input convolution feature Dec_{i+1} has a feature size of (c, h/2, w/2). First, the convolution feature Dec_{i+1} is magnified twofold in the spatial dimensions by an up-sampling operation, so that Dec_{i+1} and the multi-scale content feature Feat_i have the same size in the spatial dimensions, namely (c, h, w). Then the multi-scale content feature Feat_i and the convolution feature Dec_{i+1} are concatenated to obtain a spliced feature of size (2c, h, w); a convolution operation is applied, and a feature result of size (c, h, w) is obtained through a ReLU activation function and a BN layer. Next, the multi-scale feature fusion module is applied to the obtained feature result to obtain a feature fusion result; this feature result and the feature fusion result are concatenated again, and a convolution operation followed by a ReLU activation function and a BN layer yields the feature result Dec_i of size (c, h, w). Finally, a convolution operation with a kernel size of (1, 1) compresses the channel number of the feature result Dec_i by half so that it can be fused with Dec_{i-1}; after a ReLU activation function and a BN layer, a feature result Dec_i of size (0.5c, h, w) is obtained. Its channels are then compressed to 1 by a convolution operation, and the predicted saliency map Pred_i is obtained through a Sigmoid function. It is worth noting that since Dec_4 and Dec_5 have the same number of channels, the channel number is not compressed there.
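One decoder stage of step S32 (Fig. 6) might look as follows under the size assumptions stated above; the 3×3 kernels of the two intermediate convolutions are assumptions (the text specifies only "a convolution operation"), and the `fusion` argument stands for the step S31 module, sketched after the step list below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderStep(nn.Module):
    """One decoder stage of step S32 (Fig. 6).

    `fusion` is a step S31 multi-scale feature fusion module.
    `compress_out` is False for the Dec_5/Dec_4 stages, which share the
    same channel count and skip the halving compression.
    """
    def __init__(self, channels, fusion, compress_out=True):
        super().__init__()
        self.fusion = fusion
        self.conv1 = nn.Sequential(  # conv -> ReLU -> BN, (2c,h,w) -> (c,h,w)
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True), nn.BatchNorm2d(channels))
        self.conv2 = nn.Sequential(  # conv -> ReLU -> BN, (2c,h,w) -> (c,h,w)
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True), nn.BatchNorm2d(channels))
        self.compress = nn.Sequential(  # (1,1) conv halving channels for Dec_{i-1}
            nn.Conv2d(channels, channels // 2, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(channels // 2)) if compress_out else None
        out_c = channels // 2 if compress_out else channels
        self.predict = nn.Conv2d(out_c, 1, kernel_size=1)

    def forward(self, feat_i, dec_next):
        # Up-sample Dec_{i+1} twofold so it matches Feat_i spatially.
        dec_next = F.interpolate(dec_next, size=feat_i.shape[2:],
                                 mode='bilinear', align_corners=False)
        x = self.conv1(torch.cat([feat_i, dec_next], dim=1))
        fused = self.fusion(x)                              # multi-scale fusion
        dec_i = self.conv2(torch.cat([x, fused], dim=1))    # feature result Dec_i
        out = self.compress(dec_i) if self.compress else dec_i
        pred_i = torch.sigmoid(self.predict(out))           # saliency map Pred_i
        return out, pred_i
```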
In the present embodiment, step S31 specifically includes the following steps:
Step S311: apply to the input multi-scale content feature Feat_i a depthwise separable convolution with kernel size (1, k) followed by one with kernel size (k, 1), and in parallel apply to the input feature Feat_i a depthwise separable convolution with kernel size (k, 1) followed by one with kernel size (1, k); a BN layer is added after each of these two sequential operations, yielding two feature results;
Step S312: sum the two feature results along the channel dimension to obtain a feature result of the same size as the input feature Feat_i;
Step S313: use a convolution operation with a kernel size of (1, 1) to model the features across the channels of the feature result, obtaining a feature fusion result of the same size as the input feature Feat_i.
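The two factorized branches of steps S311–S313 can be realized as below; k is a hyper-parameter of the module, and k = 7 is an illustrative assumption.

```python
import torch
import torch.nn as nn

def factorized_branch(channels, k, horizontal_first):
    """Depthwise (1,k)->(k,1) (or (k,1)->(1,k)) convolution pair with BN."""
    kh, kv = (1, k), (k, 1)
    first, second = (kh, kv) if horizontal_first else (kv, kh)
    pad = lambda ks: (ks[0] // 2, ks[1] // 2)  # keeps spatial size for odd k
    return nn.Sequential(
        nn.Conv2d(channels, channels, first, padding=pad(first), groups=channels),
        nn.Conv2d(channels, channels, second, padding=pad(second), groups=channels),
        nn.BatchNorm2d(channels))

class MultiScaleFusion(nn.Module):
    """Steps S311-S313 (Fig. 5): two factorized depthwise branches,
    summed along the channel dimension and mixed by a (1, 1) conv."""
    def __init__(self, channels, k=7):
        super().__init__()
        self.branch_a = factorized_branch(channels, k, horizontal_first=True)
        self.branch_b = factorized_branch(channels, k, horizontal_first=False)
        self.mix = nn.Conv2d(channels, channels, kernel_size=1)  # step S313

    def forward(self, feat):
        return self.mix(self.branch_a(feat) + self.branch_b(feat))  # step S312
```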
In the present embodiment, in step S4, the Adam (Adaptive Moment Estimation) algorithm is used to optimize the loss function in the training stage. As shown in Fig. 2, each scale's feature result Dec_i in step S3 corresponds to the calculation of a Loss_i, where each Loss_i is a cross-entropy loss calculated between the predicted saliency map Pred_i in Fig. 6 and the manual annotation map.
The cross-entropy loss Loss of the network is calculated by the following formula: Loss = Σ_i Loss_i, where Loss_i = −Σ_(x,y) [ G(x,y)·log Pred_i(x,y) + (1 − G(x,y))·log(1 − Pred_i(x,y)) ], G denotes the manual annotation map, and the sum runs over all pixel positions (x, y).
The optimal parameters of the network are obtained through Adam optimization, and finally the network is used to predict the salient objects in a color image.
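A training step for step S4 might be sketched as follows; the learning rate, the use of nn.BCELoss, and the up-sampling of each Pred_i to the annotation map's size are assumptions consistent with, but not mandated by, the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_step(model, optimizer, image, gt):
    """One step of step S4: sum the per-scale losses Loss_i between each
    Pred_i and the annotation map gt, then update with Adam.

    `model` is assumed to return a list of sigmoid-activated saliency
    maps [Pred_1, ..., Pred_5]; each is up-sampled to gt's size here.
    """
    bce = nn.BCELoss()
    preds = model(image)
    loss = sum(bce(F.interpolate(p, size=gt.shape[2:], mode='bilinear',
                                 align_corners=False), gt)
               for p in preds)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage sketch (model and data loader are assumed to exist):
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# for image, gt in loader:
#     train_step(model, optimizer, image, gt)
```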
When an algorithm extracts features of more related scales on the basis of the original scale features and then fuses them, the fused features can have stronger generalization ability. Following this idea of extracting and fusing convolution features of multiple scales, the present embodiment proposes a multi-scale feature extraction module and a multi-scale fusion module. In the network design, the modules are embedded directly into the U-Net architecture of the typical encoder-decoder structure; at the same time, considering the redundancy of information in the feature channels of the decoder structure, a channel compression module is applied to make the model more computationally efficient. In summary, the present embodiment proposes a salient object detection method based on multi-scale convolution feature extraction and fusion, whose network structure based on multi-scale feature extraction and fusion can significantly improve salient object detection accuracy.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical memory, etc.) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device generate a device for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of guiding a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured article including an instruction device, the instruction device realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above is only a preferred embodiment of the present invention and does not limit the present invention in other forms. Any person skilled in the art may use the technical content disclosed above to make changes or modifications into equivalent embodiments of equivalent variation. However, any simple modification, equivalent variation, and adaptation made to the above embodiments according to the technical essence of the present invention, without departing from the technical solution of the present invention, still falls within the protection scope of the technical solution of the present invention.

Claims (7)

1. A salient object detection method based on multi-scale convolution feature extraction and fusion, characterized by including the following steps:
Step S1: perform data augmentation, processing the color images and the corresponding manual annotation maps together to increase the data volume of the training dataset;
Step S2: extract multi-scale features, and perform channel compression to optimize the computational efficiency of the network;
Step S3: fuse the multi-scale features to obtain the predicted saliency map Pred_i;
Step S4: learn the optimal parameters of the model by minimizing the cross-entropy loss; finally, use the trained model network to predict the salient objects in an image.
2. The salient object detection method based on multi-scale convolution feature extraction and fusion according to claim 1, characterized in that step S1 specifically includes the following steps:
Step S11: scale each color image in the dataset together with its corresponding manual annotation map, so that the computing device can bear the computational load of the neural network;
Step S12: randomly crop each color image in the dataset together with its corresponding manual annotation map, to increase the diversity of the data;
Step S13: generate mirror images by horizontal flipping, to expand the data volume of the original dataset.
3. The salient object detection method based on multi-scale convolution feature extraction and fusion according to claim 1, characterized in that step S2 specifically includes the following steps:
Step S21: improve the intrinsic network structure of U-Net, wherein the encoder structure of the U-Net network uses an image classification convolutional network as the feature network and generates convolution features of 5 different scales by continuously stacking convolutional layers and pooling layers; between convolution feature En_i and convolution feature En_{i+1} there is a pooling layer that reduces the size of the feature map step by step, the stride of this pooling layer being set to 2, so that the feature result En_{i+1} is halved in the width and height spatial dimensions compared with En_i; in order to keep the convolution features with enough information in the spatial dimensions, the stride of the pooling layer between the last two convolution features is set to 1, so that the last two convolution features have the same size in the width and height spatial dimensions;
Step S22: design a multi-scale feature extraction module and apply it to the convolution feature of each scale generated by the improved U-Net network, to obtain multi-scale content features;
Step S23: add a channel compression module applied to the multi-scale content features to optimize the computational efficiency of the network.
4. The salient object detection method based on multi-scale convolution feature extraction and fusion according to claim 3, characterized in that step S22 specifically includes the following steps:
Step S221: design three convolutional layers that take the convolution feature En_i as input, all three convolutions performing depthwise separable dilated convolution operations, where the dilation rates of the dilated convolutions are 3, 6 and 9 respectively; the feature results obtained by these three operations have the same feature size as the convolution feature En_i, namely (c, h, w);
Step S222: concatenate the three feature results along the channel dimension to obtain a feature result of size (3c, h, w);
Step S223: use a convolution operation with a kernel size of (1, 1) to compress the channels of the feature result obtained in step S222 to be consistent with the convolution feature En_i, obtaining a multi-scale content feature of size (c, h, w).
5. The salient object detection method based on multi-scale convolution feature extraction and fusion according to claim 1, characterized in that step S3 specifically includes the following steps:
Step S31: design a multi-scale feature fusion module; let the input multi-scale content feature Feat_i have a feature size of (c, h, w); in the multi-scale feature fusion module, a depthwise separable convolution with kernel sizes (1, k) and (k, 1) and a depthwise separable convolution with kernel sizes (k, 1) and (1, k) are applied respectively, obtaining a feature fusion result of the same size as the input feature Feat_i;
Step S32: the decoder structure of the U-Net network, like the feature network of the encoder, corresponds to feature results of 5 different scales; the convolution feature Dec_i of each scale generated by the decoder structure of the U-Net network is obtained by using the multi-scale feature fusion module to fuse the multi-scale content feature Feat_i and the convolution feature Dec_{i+1}; suppose here that the input convolution feature Dec_{i+1} has a feature size of (c, h/2, w/2); first, the convolution feature Dec_{i+1} is magnified twofold in the spatial dimensions by an up-sampling operation, so that Dec_{i+1} and the multi-scale content feature Feat_i have the same size in the spatial dimensions, namely (c, h, w); then the multi-scale content feature Feat_i and the convolution feature Dec_{i+1} are concatenated to obtain a spliced feature of size (2c, h, w), a convolution operation is applied, and a feature result of size (c, h, w) is obtained through a ReLU activation function and a BN layer; next, the multi-scale feature fusion module is applied to the obtained feature result to obtain a feature fusion result, this feature result and the feature fusion result are concatenated again, and a convolution operation followed by a ReLU activation function and a BN layer yields the feature result Dec_i of size (c, h, w); finally, a convolution operation with a kernel size of (1, 1) compresses the channel number of the feature result Dec_i by half so that it can be fused with Dec_{i-1}, a feature result Dec_i of size (0.5c, h, w) is obtained through a ReLU activation function and a BN layer, its channels are compressed to 1 by a convolution operation, and the predicted saliency map Pred_i is obtained through a Sigmoid function.
6. The salient object detection method based on multi-scale convolution feature extraction and fusion according to claim 5, characterized in that step S31 specifically includes the following steps:
Step S311: apply to the input multi-scale content feature Feat_i a depthwise separable convolution with kernel size (1, k) followed by one with kernel size (k, 1), and in parallel apply to the input feature Feat_i a depthwise separable convolution with kernel size (k, 1) followed by one with kernel size (1, k); a BN layer is added after each of these two sequential operations, yielding two feature results;
Step S312: sum the two feature results along the channel dimension to obtain a feature result of the same size as the input feature Feat_i;
Step S313: use a convolution operation with a kernel size of (1, 1) to model the features across the channels of the feature result, obtaining a feature fusion result of the same size as the input feature Feat_i.
7. The salient object detection method based on multi-scale convolution feature extraction and fusion according to claim 1, characterized in that in step S4 the cross-entropy loss Loss is calculated by the following formula: Loss = Σ_i Loss_i, where Loss_i = −Σ_(x,y) [ G(x,y)·log Pred_i(x,y) + (1 − G(x,y))·log(1 − Pred_i(x,y)) ], G denotes the manual annotation map, and the sum runs over all pixel positions (x, y).
CN201910062293.9A 2019-01-23 2019-01-23 Salient object detection method based on multi-scale convolution feature extraction and fusion Active CN109635882B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910062293.9A CN109635882B (en) 2019-01-23 2019-01-23 Salient object detection method based on multi-scale convolution feature extraction and fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910062293.9A CN109635882B (en) 2019-01-23 2019-01-23 Salient object detection method based on multi-scale convolution feature extraction and fusion

Publications (2)

Publication Number Publication Date
CN109635882A true CN109635882A (en) 2019-04-16
CN109635882B CN109635882B (en) 2022-05-13

Family

ID=66063115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910062293.9A Active CN109635882B (en) 2019-01-23 2019-01-23 Salient object detection method based on multi-scale convolution feature extraction and fusion

Country Status (1)

Country Link
CN (1) CN109635882B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170289434A1 (en) * 2016-03-29 2017-10-05 Sony Corporation Method and system for image processing to detect salient objects in image
CN108171701A (en) * 2018-01-15 2018-06-15 复旦大学 Conspicuousness detection method based on U networks and confrontation study
CN109165660A (en) * 2018-06-20 2019-01-08 扬州大学 A kind of obvious object detection method based on convolutional neural networks
CN109191426A (en) * 2018-07-24 2019-01-11 江南大学 A kind of flat image conspicuousness detection method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HANGKE SONG ET AL.: "Depth-Aware Salient Object Detection and Segmentation via Multiscale Discriminative Saliency Fusion and Bootstrap Learning", IEEE Transactions on Image Processing *
YUZHEN NIU ET AL.: "Salient Object Segmentation Based on Superpixel and Background Connectivity Prior", IEEE Access *
李金东 (LI JINDONG): "Semantic Object Detection and Segmentation Based on Objectness Sampling", China Excellent Master's Theses Full-text Database (Master), Information Science and Technology *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084309A (en) * 2019-04-30 2019-08-02 北京市商汤科技开发有限公司 Characteristic pattern amplification method, device and equipment and computer readable storage medium
US11049217B2 (en) 2019-04-30 2021-06-29 Beijing Sensetime Technology Development Co., Ltd. Magnifying feature map
CN110084309B (en) * 2019-04-30 2022-06-21 北京市商汤科技开发有限公司 Feature map amplification method, feature map amplification device, feature map amplification equipment and computer readable storage medium
CN110298397A (en) * 2019-06-25 2019-10-01 东北大学 The multi-tag classification method of heating metal image based on compression convolutional neural networks
CN110322528A (en) * 2019-06-26 2019-10-11 浙江大学 Nuclear magnetic resonance brain image reconstructing blood vessel method based on 3T, 7T
CN110490892A (en) * 2019-07-03 2019-11-22 中山大学 A kind of Thyroid ultrasound image tubercle automatic positioning recognition methods based on USFaster R-CNN
CN110348390A (en) * 2019-07-12 2019-10-18 创新奇智(重庆)科技有限公司 A kind of training method, computer-readable medium and the system of fire defector model
CN110378976A (en) * 2019-07-18 2019-10-25 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
WO2021008022A1 (en) * 2019-07-18 2021-01-21 北京市商汤科技开发有限公司 Image processing method and apparatus, electronic device and storage medium
CN110660046B (en) * 2019-08-30 2022-09-30 太原科技大学 Industrial product defect image classification method based on lightweight deep neural network
CN110660046A (en) * 2019-08-30 2020-01-07 太原科技大学 Industrial product defect image classification method based on lightweight deep neural network
CN111080588A (en) * 2019-12-04 2020-04-28 南京航空航天大学 Multi-scale neural network-based rapid fetal MR image brain extraction method
CN111028246A (en) * 2019-12-09 2020-04-17 北京推想科技有限公司 Medical image segmentation method and device, storage medium and electronic equipment
CN111080599A (en) * 2019-12-12 2020-04-28 哈尔滨市科佳通用机电股份有限公司 Fault identification method for hook lifting rod of railway wagon
CN111191649A (en) * 2019-12-31 2020-05-22 上海眼控科技股份有限公司 Method and equipment for identifying bent multi-line text image
CN111814536A (en) * 2020-05-21 2020-10-23 闽江学院 Breeding monitoring method and device
CN111814536B (en) * 2020-05-21 2023-11-28 闽江学院 Culture monitoring method and device
CN111860233A (en) * 2020-07-06 2020-10-30 中国科学院空天信息创新研究院 SAR image complex building extraction method and system based on attention network selection
CN111860233B (en) * 2020-07-06 2021-05-18 中国科学院空天信息创新研究院 SAR image complex building extraction method and system based on attention network selection
CN112258431B (en) * 2020-09-27 2021-07-20 成都东方天呈智能科技有限公司 Image classification model based on mixed depth separable expansion convolution and classification method thereof
CN112258431A (en) * 2020-09-27 2021-01-22 成都东方天呈智能科技有限公司 Image classification model based on mixed depth separable expansion convolution and classification method thereof
CN112446292A (en) * 2020-10-28 2021-03-05 山东大学 2D image salient target detection method and system
CN112446292B (en) * 2020-10-28 2023-04-28 山东大学 2D image salient object detection method and system
CN112115951A (en) * 2020-11-19 2020-12-22 之江实验室 RGB-D image semantic segmentation method based on spatial relationship
CN112861795A (en) * 2021-03-12 2021-05-28 云知声智能科技股份有限公司 Method and device for detecting salient target of remote sensing image based on multi-scale feature fusion

Also Published As

Publication number Publication date
CN109635882B (en) 2022-05-13

Similar Documents

Publication Publication Date Title
CN109635882A (en) Salient object detection method based on multi-scale convolution feature extraction and fusion
KR102302725B1 (en) Room Layout Estimation Methods and Techniques
Garcia-Garcia et al. A review on deep learning techniques applied to semantic segmentation
CN104809187B (en) A kind of indoor scene semanteme marking method based on RGB D data
CN110210551A (en) A kind of visual target tracking method based on adaptive main body sensitivity
CN109753885B (en) Target detection method and device and pedestrian detection method and system
CN108961327A (en) A kind of monocular depth estimation method and its device, equipment and storage medium
CN110111366A (en) A kind of end-to-end light stream estimation method based on multistage loss amount
CN103262119B (en) For the method and system that image is split
CN108564097A (en) A kind of multiscale target detection method based on depth convolutional neural networks
CN110073359A (en) Valid data for convolutional neural networks are laid out
CN108734120A (en) Mark method, apparatus, equipment and the computer readable storage medium of image
CN113874883A (en) Hand pose estimation
CN109816769A (en) Scene based on depth camera ground drawing generating method, device and equipment
WO2021218786A1 (en) Data processing system, object detection method and apparatus thereof
CN106372648A (en) Multi-feature-fusion-convolutional-neural-network-based plankton image classification method
CN109214366A (en) Localized target recognition methods, apparatus and system again
CN113158862B (en) Multitasking-based lightweight real-time face detection method
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN108121931A (en) two-dimensional code data processing method, device and mobile terminal
CN111488827A (en) Crowd counting method and system based on multi-scale feature information
CN108596919A (en) A kind of Automatic image segmentation method based on depth map
CN110349167A (en) A kind of image instance dividing method and device
CN110991444A (en) Complex scene-oriented license plate recognition method and device
CN114918918B (en) Domain-containing self-adaptive robot disordered target pushing and grabbing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant