CN109635882A - Salient object detection method based on multi-scale convolution feature extraction and fusion - Google Patents
- Publication number
- CN109635882A (application CN201910062293.9A)
- Authority
- CN
- China
- Prior art keywords
- feature
- convolution
- multi-scale
- network
- fusion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
Abstract
The invention relates to a salient object detection method based on multi-scale convolutional feature extraction and fusion. The method first performs data augmentation, processing each color image together with its corresponding manually annotated map to increase the volume of the training dataset; it then extracts multi-scale features and applies channel compression to improve the computational efficiency of the network; next, it fuses the multi-scale features to obtain a predicted saliency map; finally, the optimal model parameters are learned by minimizing a cross-entropy loss, and the trained network is used to predict the salient objects in an image. The invention significantly improves the accuracy of salient object detection.
Description
Technical field
The present invention relates to the fields of image processing and computer vision, and in particular to a salient object detection method based on multi-scale convolutional feature extraction and fusion.
Background technique
How to fuse convolutional features of multiple scales within a fully convolutional network is an open problem in the field of salient object detection. Motivated by this problem, most existing salient object detection methods based on fully convolutional neural networks add network branches through which features of different scales can be fused, so as to generate features more useful for the detection task. Most salient object detection algorithms proposed after 2015 are devoted to using fully convolutional neural networks (FCNNs) to improve both the computational efficiency of the network and the accuracy of detection.
These works can be divided into two classes. The first innovates on the structure of the fully convolutional network. Li et al. obtain features of different scales from a pretrained VGG-16 network, transform each scale's features by further convolutions, upsample the results to a unified size, and produce the saliency detection result through a final convolution; they also fuse a superpixel-scale branch that refines the final detection result at the spatial level. The salient object network proposed by Wang et al. is a fully convolutional network in encoder-decoder form, with an added recurrent neural network structure that iteratively refines the detection result. Cheng et al. add a short connection structure to the fully convolutional network; because each output branch in this structure fuses high-level semantic information with low-level features such as texture and shape, the performance of the algorithm improves significantly while the model remains simple and efficient.
However, most of these methods fuse convolutional features of different scales taken from networks pretrained on classification tasks, and the scales of those features are typically limited and fixed.
Summary of the invention
In view of this, the purpose of the present invention is to propose a salient object detection method based on multi-scale convolutional feature extraction and fusion that significantly improves salient object detection accuracy.
The present invention is realized by the following scheme: a salient object detection method based on multi-scale convolutional feature extraction and fusion, comprising the following steps:
Step S1: perform data augmentation, processing each color image together with its corresponding manually annotated map, to increase the volume of the training dataset;
Step S2: extract multi-scale features and apply channel compression to improve the computational efficiency of the network;
Step S3: fuse the multi-scale features to obtain the predicted saliency map Pred_i;
Step S4: learn the optimal model parameters by minimizing the cross-entropy loss; finally, use the trained network to predict the salient objects in an image.
Further, step S1 specifically includes the following steps:
Step S11: scale each color image in the dataset together with its corresponding annotation map, so that the computing hardware can handle the computational load of the neural network;
Step S12: apply the same random crop to each color image and its corresponding annotation map, to increase the diversity of the data;
Step S13: generate mirror images by horizontal flipping, to enlarge the original dataset.
Further, step S2 specifically includes the following steps:
Step S21: improve the standard U-Net structure. The encoder of the U-Net is an image-classification convolutional network used as the feature network; by repeatedly stacking convolutional and pooling layers it produces convolutional features at 5 different scales. Between features En_i and En_{i+1} there is a pooling layer with stride 2 that progressively reduces the feature-map size, so En_{i+1} is half the size of En_i in both the width and height dimensions. To keep enough spatial information in the convolutional features, the stride of the pooling layer between the last two features is set to 1, so that the last two convolutional features have the same width and height;
Step S22: design a multi-scale feature extraction module and apply it to the convolutional features of each scale produced by the improved U-Net of step S21, obtaining multi-scale content features;
Step S23: add a channel compression module acting on the multi-scale content features to improve the computational efficiency of the network.
Further, step S22 specifically includes the following steps:
Step S221: design three convolutional layers that take the convolutional feature En_i as input; all three perform depthwise separable dilated (atrous) convolutions, with dilation rates of 3, 6, and 9 respectively. The three resulting feature maps have the same size as En_i, namely (c, h, w);
Step S222: concatenate the three feature maps along the channel dimension to obtain a feature of size (3c, h, w);
Step S223: use a convolution with kernel size (1, 1) to compress the channels of the result of step S222 back to those of En_i, obtaining the multi-scale content feature of size (c, h, w).
Further, step S3 specifically includes the following steps:
Step S31: design a multi-scale feature fusion module. Let the input multi-scale content feature Feat_i have size (c, h, w). The module applies depthwise separable convolutions with kernel sizes (1, k) then (k, 1), and in parallel depthwise separable convolutions with kernel sizes (k, 1) then (1, k), obtaining a feature fusion result of the same size as the input feature Feat_i;
Step S32: the decoder of the U-Net and the feature network of the encoder each produce feature maps at 5 different scales. Each scale's decoder feature Dec_i is produced by using the multi-scale feature fusion module to fuse the multi-scale content feature Feat_i with the convolutional feature Dec_{i+1}. Suppose the input feature Dec_{i+1} has size (c, h/2, w/2). First, Dec_{i+1} is upsampled by a factor of two in the spatial dimensions, so that it has the same spatial size as Feat_i, namely (c, h, w). Feat_i and Dec_{i+1} are then concatenated into a feature of size (2c, h, w), and a convolution followed by a ReLU activation and a BN layer yields a feature of size (c, h, w). Next, the multi-scale feature fusion module is applied to this feature to obtain a fusion result; that result is concatenated with the feature again, and a convolution followed by ReLU and BN yields the feature Dec_i of size (c, h, w). Finally, a convolution with kernel size (1, 1) followed by ReLU and BN compresses the channel count of Dec_i by half, to (0.5c, h, w), so that it can be fused with Dec_{i-1}; a further convolution compresses the channels to 1, and a Sigmoid function produces the predicted saliency map Pred_i.
Further, step S31 specifically includes the following steps:
Step S311: apply to the input multi-scale content feature Feat_i a depthwise separable convolution with kernel size (1, k) followed by one with kernel size (k, 1), and in parallel a depthwise separable convolution with kernel size (k, 1) followed by one with kernel size (1, k); after each of the two sequences, a BN layer is added, yielding two feature maps;
Step S312: sum the two feature maps element-wise to obtain a feature of the same size as the input feature Feat_i;
Step S313: use a convolution with kernel size (1, 1) to model the features across channels, obtaining a feature fusion result of the same size as the input feature Feat_i.
Further, in step S4, the cross-entropy loss Loss is computed with the following formula (a standard per-pixel form, where G denotes the manual annotation map, Pred_i the predicted saliency map of scale i, and j ranges over pixels):
Loss = Σ_i Loss_i, with Loss_i = −Σ_j [ G_j log(Pred_{i,j}) + (1 − G_j) log(1 − Pred_{i,j}) ]
Compared with the prior art, the invention has the following beneficial effects: it proposes a multi-scale feature extraction module and a multi-scale fusion module, and embeds them directly into the U-Net architecture of the typical encoder-decoder structure; it also takes into account the redundancy of information across feature channels in the decoder, applying a channel compression module to make the model computationally efficient. The method significantly improves salient object detection accuracy.
Brief description of the drawings
Fig. 1 is a flow diagram of the method of the embodiment of the present invention.
Fig. 2 shows the salient object detection network structure of the embodiment.
Fig. 3 illustrates the multi-scale feature extraction module of the embodiment.
Fig. 4 illustrates the channel compression module of the embodiment.
Fig. 5 illustrates the multi-scale feature fusion module of the embodiment.
Fig. 6 shows the network structure of the multi-scale feature fusion process of the embodiment.
Specific embodiment
The present invention will be further described with reference to the accompanying drawings and embodiments.
It should be noted that the following description is exemplary and intended to provide further explanation of the application. Unless otherwise indicated, all technical and scientific terms used herein have the meanings commonly understood by a person of ordinary skill in the technical field to which the application belongs.
It should also be noted that the terms used herein serve only to describe specific embodiments and are not intended to limit the exemplary implementations of the application. As used herein, unless the context clearly indicates otherwise, the singular forms are intended to include the plural forms as well; furthermore, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
As shown in Fig. 1, the present embodiment provides a salient object detection method based on multi-scale convolutional feature extraction and fusion, comprising the following steps:
Step S1: perform data augmentation, processing each color image together with its corresponding manually annotated map, to increase the volume of the training dataset;
Step S2: extract multi-scale features and apply channel compression to improve the computational efficiency of the network;
Step S3: fuse the multi-scale features to obtain the predicted saliency map Pred_i;
Step S4: learn the optimal model parameters by minimizing the cross-entropy loss; finally, use the trained network to predict the salient objects in an image.
In the present embodiment, step S1 performs data augmentation, processing the color images and their corresponding manual annotation maps together to increase the volume of the training dataset. The mainstream international datasets used to train salient object detection networks generally contain color images and corresponding manual annotations: a color image is as shown in Fig. 2(a), and the annotation map, similar to a saliency map (Fig. 2(b)), is a binary image of the manually marked salient-object region. Because constructing such a dataset requires considerable manual effort, and training a deep neural network requires sufficient data, data augmentation operations are carried out on top of the original dataset. Step S1 therefore specifically includes the following steps:
Step S11: scale each color image in the dataset together with its corresponding annotation map, so that the computing hardware can handle the computational load of the neural network;
Step S12: apply the same random crop to each color image and its corresponding annotation map, to increase the diversity of the data;
Step S13: generate mirror images by horizontal flipping, to enlarge the original dataset so that it meets the large data volume required to train a deep convolutional neural network (CNN) and to strengthen the generalization ability of the model.
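As a rough illustration, the three augmentation operations of steps S11–S13 can be sketched with NumPy. This is only a sketch: the resize is a simple nearest-neighbour subsampling, and the sizes 256 and 224 are illustrative choices, not values fixed by the patent.

```python
import numpy as np

def augment_pair(image, mask, out_size=256, crop_size=224, rng=None):
    """Jointly augment a color image and its binary annotation mask:
    resize (nearest-neighbour), random crop, random horizontal flip.
    The same geometric transform is applied to both arrays."""
    rng = rng if rng is not None else np.random.default_rng(0)
    h, w = image.shape[:2]
    # Step S11: scale both so the hardware can handle the network's load.
    ys = np.arange(out_size) * h // out_size
    xs = np.arange(out_size) * w // out_size
    image, mask = image[ys][:, xs], mask[ys][:, xs]
    # Step S12: random crop with identical offsets for image and mask.
    top = rng.integers(0, out_size - crop_size + 1)
    left = rng.integers(0, out_size - crop_size + 1)
    image = image[top:top + crop_size, left:left + crop_size]
    mask = mask[top:top + crop_size, left:left + crop_size]
    # Step S13: horizontal flip produces a mirror pair.
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]
    return image, mask

img = np.zeros((480, 640, 3), dtype=np.uint8)
gt = np.zeros((480, 640), dtype=np.uint8)
a_img, a_gt = augment_pair(img, gt)
print(a_img.shape, a_gt.shape)  # (224, 224, 3) (224, 224)
```

The key point is that every random decision (crop offsets, flip) is drawn once and applied to both the image and its annotation, so the pair stays aligned.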
In the present embodiment, step S2 specifically includes the following steps:
Step S21: improve the standard U-Net structure. The encoder of the U-Net is an image-classification convolutional network used as the feature network (e.g. a VGG or ResNet structure); by repeatedly stacking convolutional and pooling layers it produces convolutional features at 5 different scales, namely En_1, En_2, En_3, En_4 and En_5 in Fig. 2. Among these five features, between En_i and En_{i+1} there is a pooling layer with stride 2 that progressively reduces the feature-map size, so En_{i+1} is half the size of En_i in both width and height; this also attenuates the spatial information carried by the convolutional features. To keep enough spatial information in the convolutional features, the stride of the pooling layer between the last two features (En_4 and En_5) is set to 1, so that En_4 and En_5 have the same width and height;
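The resulting scale pyramid can be illustrated with a short calculation; the 224×224 input size is an assumption for illustration, not a value fixed by the patent.

```python
# Spatial sizes of the five encoder features En1..En5 for a 224x224 input:
# stride-2 pooling between consecutive scales, except between the last pair,
# where the stride is 1 so En4 and En5 keep the same resolution.
size = 224
strides = [2, 2, 2, 1]  # pools between En1-En2, En2-En3, En3-En4, En4-En5
sizes = [size]
for s in strides:
    size //= s
    sizes.append(size)
print(sizes)  # [224, 112, 56, 28, 28]
```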
Step S22: design a multi-scale feature extraction module and apply it to the convolutional features of each scale produced by the improved U-Net of step S21, obtaining multi-scale content features. The multi-scale feature extraction module is shown in Fig. 3; here the convolutional feature is assumed to have size (c, h, w);
Step S23: add a channel compression module acting on the multi-scale content features to improve the computational efficiency of the network. The channel compression module is shown in Fig. 4, where "SE Module" is the module proposed by Hu et al. in the SENet (Squeeze-and-Excitation Networks) paper. The SE module takes the multi-scale content feature Feat_i as input, models the correlations between the features of the individual channels, and applies a weighting operation to strengthen the generalization ability of the features. The channel compression module then uses a convolution with kernel size (1, 1) to compress the channel count of the result to half of the original, and a ReLU (Rectified Linear Unit) function and a BN (Batch Normalization) layer yield the channel-compressed multi-scale content feature Feat_i.
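A rough NumPy sketch of this channel compression module, under stated assumptions: the SE weights `w1`, `w2` and the 1×1-convolution weights are random placeholders rather than trained parameters, and the BN/ReLU stages are omitted; only the squeeze-excite-compress order and the shape flow follow the description above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_then_compress(feat, w1, w2, w_conv):
    """SE-style reweighting (global average pool -> two small dense layers
    -> sigmoid gate on each channel) followed by a 1x1 convolution that
    halves the channel count, as in the channel compression module."""
    c, h, w = feat.shape
    squeeze = feat.mean(axis=(1, 2))                    # (c,) global average pool
    excite = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0))  # (c,) channel gate
    reweighted = feat * excite[:, None, None]           # scale each channel
    # A 1x1 convolution is a per-pixel linear map over channels.
    return np.tensordot(w_conv, reweighted, axes=([1], [0]))

rng = np.random.default_rng(0)
c, h, w = 64, 28, 28
feat = rng.standard_normal((c, h, w))
w1 = rng.standard_normal((c // 16, c))      # squeeze: c -> c/16
w2 = rng.standard_normal((c, c // 16))      # excite: c/16 -> c
w_conv = rng.standard_normal((c // 2, c))   # 1x1 conv: c -> c/2
out = se_then_compress(feat, w1, w2, w_conv)
print(out.shape)  # (32, 28, 28)
```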
In the present embodiment, step S22 specifically includes the following steps:
Step S221: design three convolutional layers that take the convolutional feature En_i as input; all three perform depthwise separable dilated (atrous) convolutions, with dilation rates of 3, 6, and 9 respectively. Setting different dilation rates lets the convolutions capture content regions of different sizes in the image, i.e. produce feature maps of multi-scale content regions. The three resulting feature maps have the same size as En_i, namely (c, h, w);
Step S222: concatenate (concat) the three feature maps along the channel dimension to obtain a feature of size (3c, h, w);
Step S223: use a convolution with kernel size (1, 1) to compress the channels of the result of step S222 back to those of En_i, obtaining the multi-scale content feature of size (c, h, w), i.e. Feat_i in Fig. 4.
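The shape behaviour of steps S221–S223 can be sketched in NumPy. This is a simplified illustration: only the depthwise spatial part of the depthwise separable dilated convolution is implemented (the pointwise stage is omitted), all weights are random placeholders, and the small sizes are illustrative.

```python
import numpy as np

def depthwise_dilated_conv(feat, kernel, dilation):
    """Depthwise 3x3 dilated convolution with 'same' zero padding, so the
    output keeps the (c, h, w) size of the input, as step S221 requires."""
    c, h, w = feat.shape
    pad = dilation  # for a 3x3 kernel, same-padding equals the dilation rate
    padded = np.zeros((c, h + 2 * pad, w + 2 * pad))
    padded[:, pad:-pad, pad:-pad] = feat
    out = np.zeros_like(feat)
    for ky in range(3):
        for kx in range(3):
            dy, dx = ky * dilation, kx * dilation
            out += kernel[:, ky, kx, None, None] * padded[:, dy:dy + h, dx:dx + w]
    return out

rng = np.random.default_rng(0)
c, h, w = 8, 28, 28
en = rng.standard_normal((c, h, w))
kernels = rng.standard_normal((3, c, 3, 3))       # one depthwise kernel per branch
branches = [depthwise_dilated_conv(en, k, d)      # dilation rates 3, 6, 9
            for k, d in zip(kernels, (3, 6, 9))]
stacked = np.concatenate(branches, axis=0)        # (3c, h, w), step S222
w_1x1 = rng.standard_normal((c, 3 * c))
feat_i = np.tensordot(w_1x1, stacked, axes=([1], [0]))  # (c, h, w), step S223
print(stacked.shape, feat_i.shape)  # (24, 28, 28) (8, 28, 28)
```

Each branch preserves the spatial size while looking at a different receptive field, which is exactly what allows the channel-wise concatenation in step S222.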
In the present embodiment, step S3 specifically includes the following steps:
Step S31: to fuse features of different sizes, the present embodiment designs a multi-scale feature fusion module, shown in Fig. 5. Let the input multi-scale content feature Feat_i have size (c, h, w). The module applies depthwise separable convolutions with kernel sizes (1, k) then (k, 1), and in parallel depthwise separable convolutions with kernel sizes (k, 1) then (1, k), obtaining a feature fusion result of the same size as the input feature Feat_i. This is equivalent to a (k, k) convolution but requires fewer computational resources, while stitching together content-region features of different scales along the spatial dimensions;
Step S32: the decoder of the U-Net and the feature network of the encoder each produce feature maps at 5 different scales. Each scale's decoder feature Dec_i is produced by using the multi-scale feature fusion module to fuse the multi-scale content feature Feat_i with the convolutional feature Dec_{i+1}. Suppose the input feature Dec_{i+1} has size (c, h/2, w/2). First, Dec_{i+1} is upsampled by a factor of two in the spatial dimensions, so that it has the same spatial size as Feat_i, namely (c, h, w). Feat_i and Dec_{i+1} are then concatenated into a feature of size (2c, h, w), and a convolution followed by a ReLU activation and a BN layer yields a feature of size (c, h, w). Next, the multi-scale feature fusion module is applied to this feature to obtain a fusion result; that result is concatenated with the feature again, and a convolution followed by ReLU and BN yields the feature Dec_i of size (c, h, w). Finally, a convolution with kernel size (1, 1) followed by ReLU and BN compresses the channel count of Dec_i by half, to (0.5c, h, w), so that it can be fused with Dec_{i-1}; a further convolution compresses the channels to 1, and a Sigmoid function produces the predicted saliency map Pred_i. Note that since Dec_4 and Dec_5 have the same number of channels, the channel count is not compressed there.
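The shape flow of one decoder fusion step can be sketched as follows. BN, ReLU, and the second pass through the fusion module are omitted, and all weights are random placeholders, so this only mirrors the tensor sizes described above.

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling on the two spatial dimensions."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

rng = np.random.default_rng(0)
c, h, w = 16, 28, 28
feat_i = rng.standard_normal((c, h, w))              # multi-scale content feature
dec_next = rng.standard_normal((c, h // 2, w // 2))  # Dec_{i+1}, coarser scale

up = upsample2x(dec_next)                            # (c, h, w)
concat = np.concatenate([feat_i, up], axis=0)        # (2c, h, w)
w_a = rng.standard_normal((c, 2 * c))
dec_i = np.tensordot(w_a, concat, axes=([1], [0]))   # conv back to (c, h, w)
w_half = rng.standard_normal((c // 2, c))
dec_i_half = np.tensordot(w_half, dec_i, axes=([1], [0]))  # (c/2, h, w) for Dec_{i-1}
w_pred = rng.standard_normal((1, c // 2))
logit = np.tensordot(w_pred, dec_i_half, axes=([1], [0]))  # 1-channel map
pred = 1.0 / (1.0 + np.exp(-logit))                  # sigmoid -> saliency map Pred_i
print(up.shape, concat.shape, dec_i_half.shape, pred.shape)
```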
In the present embodiment, step S31 specifically includes the following steps:
Step S311: apply to the input multi-scale content feature Feat_i a depthwise separable convolution with kernel size (1, k) followed by one with kernel size (k, 1), and in parallel a depthwise separable convolution with kernel size (k, 1) followed by one with kernel size (1, k); after each of the two sequences, a BN layer is added, yielding two feature maps;
Step S312: sum the two feature maps element-wise to obtain a feature of the same size as the input feature Feat_i;
Step S313: use a convolution with kernel size (1, 1) to model the features across channels, obtaining a feature fusion result of the same size as the input feature Feat_i.
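A one-line calculation shows why the factorized (1, k) + (k, 1) pair is cheaper than a full (k, k) kernel; k = 7 is an illustrative choice, not a value fixed by the patent.

```python
# Per-channel weight count: the factorized pair needs 2k weights instead of
# k*k, which is why the fusion module covers the same receptive field at a
# fraction of the cost of a full (k, k) depthwise kernel.
k = 7
full = k * k          # 49 weights per channel for a (k, k) kernel
factorized = k + k    # 7 + 7 = 14 for a (1, k) followed by a (k, 1)
print(full, factorized)  # 49 14
```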
In the present embodiment, in step S4, the Adam (Adaptive moment estimation) algorithm is used to optimize the loss function in the training stage. As shown in Fig. 2, each scale's feature Dec_i in step S3 corresponds to a loss Loss_i, where each Loss_i is the cross-entropy loss computed between the predicted saliency map Pred_i of Fig. 6 and the manual annotation map.
The network's cross-entropy loss Loss is computed with the following formula (a standard per-pixel form, where G denotes the manual annotation map and j ranges over pixels):
Loss = Σ_i Loss_i, with Loss_i = −Σ_j [ G_j log(Pred_{i,j}) + (1 − G_j) log(1 − Pred_{i,j}) ]
The optimal parameters of the network are obtained by Adam optimization, and the trained network is finally used to predict the salient objects in color images.
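The per-scale loss can be sketched as a pixel-averaged binary cross-entropy; this is a common form consistent with the description above, not necessarily the patent's exact normalization.

```python
import numpy as np

def cross_entropy_loss(pred, gt, eps=1e-7):
    """Pixel-wise binary cross-entropy between a predicted saliency map and
    its binary annotation, averaged over all pixels (assumed normalization)."""
    pred = np.clip(pred, eps, 1.0 - eps)  # guard against log(0)
    return float(-np.mean(gt * np.log(pred) + (1.0 - gt) * np.log(1.0 - pred)))

gt = np.array([[1.0, 0.0], [1.0, 0.0]])        # annotation: left column salient
good = np.array([[0.9, 0.1], [0.9, 0.1]])      # confident, correct prediction
bad = np.array([[0.1, 0.9], [0.1, 0.9]])       # confident, wrong prediction
print(cross_entropy_loss(good, gt) < cross_entropy_loss(bad, gt))  # True
```

A correct confident prediction yields a small loss, a wrong confident one a large loss, which is the gradient signal Adam uses to learn the optimal parameters.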
When the algorithm extracts additional scale-dependent features on top of the original-scale features and fuses them, the fused features gain stronger generalization ability. Following this idea of extracting and fusing convolutional features of multiple scales, the present embodiment proposes a multi-scale feature extraction module and a multi-scale fusion module, and embeds them directly into the U-Net architecture of the typical encoder-decoder structure; it also takes into account the redundancy of information across feature channels in the decoder, applying a channel compression module to make the model computationally efficient. In summary, the present embodiment proposes a salient object detection method based on multi-scale convolutional feature extraction and fusion; the network structure designed by the algorithm significantly improves salient object detection accuracy.
Those skilled in the art will appreciate that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical memory) containing computer-usable program code.
The application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be realized by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor create means for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that realize the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above is only a preferred embodiment of the present invention and does not limit the invention to other forms. Any person skilled in the art may, using the technical content disclosed above, change or modify it into an equivalent embodiment. However, any simple modification, equivalent variation, or adaptation made to the above embodiments according to the technical essence of the invention, without departing from the technical solution of the invention, still falls within the protection scope of the technical solution of the present invention.
Claims (7)
1. a kind of obvious object detection method based on multiple dimensioned convolution feature extraction and fusion, it is characterised in that: including following
Step:
Step S1: carrying out data enhancing, while handling color image and corresponding artificial mark figure, increases training number
According to the data volume of collection;
Step S2: extracting Analysis On Multi-scale Features, and row of channels of going forward side by side is compressed to optimize the computational efficiency of network;
Step S3: merging multiple dimensioned feature, the notable figure Pred predictedi;
Step S4: intersect entropy loss by solving to minimize, the optimized parameter of model is arrived in study;Finally utilize trained model
Network carrys out the obvious object in forecast image.
2. a kind of obvious object detection method based on multiple dimensioned convolution feature extraction and fusion according to claim 1,
It is characterized by: step S1 specifically includes the following steps:
Step S11: each color image artificial mark figure corresponding with its concentrated to data zooms in and out together, makes to calculate
Equipment can undertake the calculation amount of neural network;
Step S12: each color image artificial mark figure corresponding with its concentrated to data carries out random cropping behaviour together
Make, to increase the diversity of data;
Step S13: it is overturn by image level and generates mirror image, to expand the data volume of legacy data collection.
3. The salient object detection method based on multi-scale convolutional feature extraction and fusion according to claim 1, characterized in that step S2 specifically comprises the following steps:
Step S21: improve the intrinsic network structure of U-Net. The encoder of the U-Net uses an image-classification convolutional network as its feature network and, by successively stacking convolutional layers and pooling layers, produces convolutional features at 5 different scales. Between convolutional features En_i and En_{i+1} there is a pooling layer that reduces the feature-map size stage by stage; the stride of this pooling layer is set to 2, so that relative to En_i, the feature En_{i+1} is halved in both the width and height spatial dimensions. To keep enough information in the spatial dimensions of the convolutional features, the stride of the pooling layer between the last two convolutional features is set to 1, so that the last two convolutional features have the same size in the width and height dimensions;
Step S22: design a multi-scale feature extraction module and apply it to the convolutional feature of each scale produced by the improved U-Net of step S21, obtaining multi-scale content features;
Step S23: add a channel compression module applied to the multi-scale content features, to optimize the computational efficiency of the network.
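The pooling arrangement in step S21 can be checked with a few lines of arithmetic: under this description, four pooling layers separate the five encoder features En_1…En_5, with strides (2, 2, 2, 1), so the last two features keep the same spatial size. A sketch (the helper name `encoder_sizes` is illustrative):

```python
def encoder_sizes(h, w, pool_strides=(2, 2, 2, 1)):
    """Spatial sizes of the 5 encoder features En_1..En_5.
    A pooling layer sits between consecutive features; the last one
    uses stride 1, so En_4 and En_5 share a spatial size."""
    sizes = [(h, w)]
    for s in pool_strides:
        h, w = h // s, w // s
        sizes.append((h, w))
    return sizes

# For a 256x256 input: En_5 matches En_4 because the last stride is 1.
print(encoder_sizes(256, 256))
```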
4. The salient object detection method based on multi-scale convolutional feature extraction and fusion according to claim 3, characterized in that step S22 specifically comprises the following steps:
Step S221: design three convolutional layers that take the convolutional feature En_i as input; all three perform depthwise separable dilated convolution, with dilation rates of 3, 6 and 9 respectively. The feature results of these three operations have the same size as the convolutional feature En_i, namely (c, h, w);
Step S222: concatenate the three feature results along the channel dimension to obtain a feature result of size (3c, h, w);
Step S223: use a convolution operation with kernel size (1, 1) to compress the channels of the feature result obtained in step S222 back to those of En_i, obtaining a multi-scale content feature of size (c, h, w).
5. The salient object detection method based on multi-scale convolutional feature extraction and fusion according to claim 1, characterized in that step S3 specifically comprises the following steps:
Step S31: design a multi-scale feature fusion module; let the input multi-scale content feature Feat_i have size (c, h, w). In this module, depthwise separable convolutions with kernel sizes (1, k) then (k, 1), and depthwise separable convolutions with kernel sizes (k, 1) then (1, k), are applied to obtain a feature fusion result of the same size as the input feature Feat_i;
Step S32: the decoder of the U-Net and the feature network of the encoder both correspond to feature results at 5 different scales. For the convolutional feature Dec_i produced at each scale of the U-Net decoder, the multi-scale feature fusion module fuses the multi-scale content feature Feat_i with the convolutional feature Dec_{i+1}; here, assume the input Dec_{i+1} has size (c, h/2, w/2). First, Dec_{i+1} is enlarged twofold in the spatial dimensions by an upsampling operation, so that Dec_{i+1} and Feat_i have the same spatial size, i.e. (c, h, w). Then Feat_i and Dec_{i+1} are spliced by a concatenation operation into a feature of size (2c, h, w); a convolution operation followed by a ReLU activation function and a BN layer yields a feature result of size (c, h, w). Next, the multi-scale feature fusion module is applied to this feature result to obtain a feature fusion result; the feature result and the fusion result are concatenated again, and a convolution operation followed by ReLU and a BN layer gives the feature result Dec_i of size (c, h, w). Finally, a convolution operation with kernel size (1, 1) compresses the number of channels of Dec_i by half so that it can be fused with Dec_{i-1}; after ReLU and a BN layer this yields a feature result Dec_i of size (0.5c, h, w), whose channels are further compressed to 1 by a convolution operation; the predicted saliency map Pred_i is then obtained through a Sigmoid function.
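The first operation of step S32 enlarges Dec_{i+1} twofold in the spatial dimensions so that it matches Feat_i. The claim does not fix the interpolation method; a nearest-neighbour version, one common choice, can be sketched as:

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x spatial upsampling of a 2D feature map:
    every value is repeated along both width and height."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # double the width
        out.append(wide)
        out.append(list(wide))                      # double the height
    return out

dec = [[1, 2], [3, 4]]        # Dec_{i+1}: spatial size (h/2, w/2)
up = upsample2x(dec)          # now (h, w), ready to concatenate with Feat_i
```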
6. The salient object detection method based on multi-scale convolutional feature extraction and fusion according to claim 5, characterized in that step S31 specifically comprises the following steps:
Step S311: apply to the input multi-scale content feature Feat_i, in sequence, depthwise separable convolutions with kernel sizes (1, k) and (k, 1); at the same time, apply to Feat_i, in sequence, depthwise separable convolutions with kernel sizes (k, 1) and (1, k); a BN layer is added after each of these two sequential operations, yielding two feature results;
Step S312: add the two feature results element-wise along the channel dimension, obtaining a feature result of the same size as Feat_i;
Step S313: use a convolution operation with kernel size (1, 1) to model the features across the channels of this result, obtaining a feature fusion result of the same size as the input feature Feat_i.
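The point of the (1, k)/(k, 1) factorization in steps S311–S312 is cost: a factorized pair uses 2k weights per channel where a full k × k depthwise kernel uses k². A quick arithmetic check (k = 7 and c = 64 are illustrative values; the claim does not fix k):

```python
def factorized_cost(k, c):
    """Weights per spatial position: a full k x k depthwise kernel over c
    channels versus the (1,k)-then-(k,1) factorized pair."""
    full = k * k * c
    separable = 2 * k * c   # one (1,k) pass plus one (k,1) pass
    return full, separable

full, sep = factorized_cost(k=7, c=64)
# 7*7*64 = 3136 weights versus 2*7*64 = 896 for the factorized pair
```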
7. The salient object detection method based on multi-scale convolutional feature extraction and fusion according to claim 1, characterized in that the cross-entropy loss Loss in step S4 is calculated with the following formula:
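The formula itself is not reproduced in this excerpt. For saliency maps produced by a Sigmoid, the loss referred to is most likely the standard pixel-wise binary cross-entropy; the sketch below implements that standard form as an assumption, not as a reproduction of the patent's exact equation:

```python
import math

def bce_loss(pred, gt, eps=1e-7):
    """Mean pixel-wise binary cross-entropy between a predicted saliency
    map (Sigmoid outputs, flattened) and a binary annotation map."""
    total = 0.0
    for p, g in zip(pred, gt):
        p = min(max(p, eps), 1.0 - eps)   # clamp for numerical stability
        total += -(g * math.log(p) + (1.0 - g) * math.log(1.0 - p))
    return total / len(pred)

loss = bce_loss([0.9, 0.2, 0.8], [1.0, 0.0, 1.0])
```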
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910062293.9A CN109635882B (en) | 2019-01-23 | 2019-01-23 | Salient object detection method based on multi-scale convolution feature extraction and fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109635882A true CN109635882A (en) | 2019-04-16 |
CN109635882B CN109635882B (en) | 2022-05-13 |
Family
ID=66063115
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910062293.9A Active CN109635882B (en) | 2019-01-23 | 2019-01-23 | Salient object detection method based on multi-scale convolution feature extraction and fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109635882B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170289434A1 (en) * | 2016-03-29 | 2017-10-05 | Sony Corporation | Method and system for image processing to detect salient objects in image |
CN108171701A (en) * | 2018-01-15 | 2018-06-15 | 复旦大学 | Saliency detection method based on U-Net and adversarial learning |
CN109165660A (en) * | 2018-06-20 | 2019-01-08 | 扬州大学 | Salient object detection method based on convolutional neural networks |
CN109191426A (en) * | 2018-07-24 | 2019-01-11 | 江南大学 | Planar image saliency detection method |
Non-Patent Citations (3)
Title |
---|
HANGKE SONG ET AL.: "Depth-Aware Salient Object Detection and Segmentation via Multiscale Discriminative Saliency Fusion and Bootstrap Learning", IEEE Transactions on Image Processing * |
YUZHEN NIU ET AL.: "Salient Object Segmentation Based on Superpixel and Background Connectivity Prior", IEEE Access * |
LI JINDONG: "Semantic Object Detection and Segmentation Based on Objectness Sampling", China Master's Theses Full-text Database (Master), Information Science and Technology * |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110084309A (en) * | 2019-04-30 | 2019-08-02 | 北京市商汤科技开发有限公司 | Feature map magnification method, device, equipment and computer-readable storage medium |
US11049217B2 (en) | 2019-04-30 | 2021-06-29 | Beijing Sensetime Technology Development Co., Ltd. | Magnifying feature map |
CN110084309B (en) * | 2019-04-30 | 2022-06-21 | 北京市商汤科技开发有限公司 | Feature map magnification method, device, equipment and computer-readable storage medium |
CN110298397A (en) * | 2019-06-25 | 2019-10-01 | 东北大学 | Multi-label classification method for heated metal images based on compressed convolutional neural networks |
CN110322528A (en) * | 2019-06-26 | 2019-10-11 | 浙江大学 | Blood vessel reconstruction method for 3T and 7T magnetic resonance brain images |
CN110490892A (en) * | 2019-07-03 | 2019-11-22 | 中山大学 | Automatic nodule localization and recognition method for thyroid ultrasound images based on USFaster R-CNN |
CN110348390A (en) * | 2019-07-12 | 2019-10-18 | 创新奇智(重庆)科技有限公司 | Training method, computer-readable medium and system for a fire detection model |
CN110378976A (en) * | 2019-07-18 | 2019-10-25 | 北京市商汤科技开发有限公司 | Image processing method and device, electronic equipment and storage medium |
WO2021008022A1 (en) * | 2019-07-18 | 2021-01-21 | 北京市商汤科技开发有限公司 | Image processing method and apparatus, electronic device and storage medium |
CN110660046B (en) * | 2019-08-30 | 2022-09-30 | 太原科技大学 | Industrial product defect image classification method based on lightweight deep neural network |
CN110660046A (en) * | 2019-08-30 | 2020-01-07 | 太原科技大学 | Industrial product defect image classification method based on lightweight deep neural network |
CN111080588A (en) * | 2019-12-04 | 2020-04-28 | 南京航空航天大学 | Multi-scale neural network-based rapid fetal MR image brain extraction method |
CN111028246A (en) * | 2019-12-09 | 2020-04-17 | 北京推想科技有限公司 | Medical image segmentation method and device, storage medium and electronic equipment |
CN111080599A (en) * | 2019-12-12 | 2020-04-28 | 哈尔滨市科佳通用机电股份有限公司 | Fault identification method for hook lifting rod of railway wagon |
CN111191649A (en) * | 2019-12-31 | 2020-05-22 | 上海眼控科技股份有限公司 | Method and equipment for identifying bent multi-line text image |
CN111814536A (en) * | 2020-05-21 | 2020-10-23 | 闽江学院 | Breeding monitoring method and device |
CN111814536B (en) * | 2020-05-21 | 2023-11-28 | 闽江学院 | Breeding monitoring method and device |
CN111860233A (en) * | 2020-07-06 | 2020-10-30 | 中国科学院空天信息创新研究院 | SAR image complex building extraction method and system based on attention network selection |
CN111860233B (en) * | 2020-07-06 | 2021-05-18 | 中国科学院空天信息创新研究院 | SAR image complex building extraction method and system based on attention network selection |
CN112258431B (en) * | 2020-09-27 | 2021-07-20 | 成都东方天呈智能科技有限公司 | Image classification model based on mixed depth separable expansion convolution and classification method thereof |
CN112258431A (en) * | 2020-09-27 | 2021-01-22 | 成都东方天呈智能科技有限公司 | Image classification model based on mixed depth separable expansion convolution and classification method thereof |
CN112446292A (en) * | 2020-10-28 | 2021-03-05 | 山东大学 | 2D image salient object detection method and system |
CN112446292B (en) * | 2020-10-28 | 2023-04-28 | 山东大学 | 2D image salient object detection method and system |
CN112115951A (en) * | 2020-11-19 | 2020-12-22 | 之江实验室 | RGB-D image semantic segmentation method based on spatial relationship |
CN112861795A (en) * | 2021-03-12 | 2021-05-28 | 云知声智能科技股份有限公司 | Method and device for detecting salient target of remote sensing image based on multi-scale feature fusion |
Also Published As
Publication number | Publication date |
---|---|
CN109635882B (en) | 2022-05-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109635882A (en) | Salient object detection method based on multi-scale convolution feature extraction and fusion | |
KR102302725B1 (en) | Room Layout Estimation Methods and Techniques | |
Garcia-Garcia et al. | A review on deep learning techniques applied to semantic segmentation | |
CN104809187B (en) | Indoor scene semantic annotation method based on RGB-D data | |
CN110210551A (en) | Visual target tracking method based on adaptive subject sensitivity | |
CN109753885B (en) | Target detection method and device and pedestrian detection method and system | |
CN108961327A (en) | Monocular depth estimation method, device, equipment and storage medium | |
CN110111366A (en) | End-to-end optical flow estimation method based on multi-level losses | |
CN103262119B (en) | Method and system for image segmentation | |
CN108564097A (en) | Multi-scale object detection method based on deep convolutional neural networks | |
CN110073359A (en) | Efficient data layouts for convolutional neural networks | |
CN108734120A (en) | Image annotation method, apparatus, device and computer-readable storage medium | |
CN113874883A (en) | Hand pose estimation | |
CN109816769A (en) | Scene map generation method, device and equipment based on depth camera | |
WO2021218786A1 (en) | Data processing system, object detection method and apparatus thereof | |
CN106372648A (en) | Plankton image classification method based on a multi-feature-fusion convolutional neural network | |
CN109214366A (en) | Local target re-identification method, apparatus and system | |
CN113158862B (en) | Lightweight real-time face detection method based on multi-task learning | |
CN113160062B (en) | Infrared image target detection method, device, equipment and storage medium | |
CN108121931A (en) | Two-dimensional code data processing method, device and mobile terminal | |
CN111488827A (en) | Crowd counting method and system based on multi-scale feature information | |
CN108596919A (en) | Automatic image segmentation method based on depth map | |
CN110349167A (en) | Image instance segmentation method and device | |
CN110991444A (en) | License plate recognition method and device for complex scenes | |
CN114918918B (en) | Domain-adaptive robotic pushing and grasping method for unordered targets | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||