CN109101975A

CN109101975A - Image semantic segmentation method based on fully convolutional neural networks

Info

Publication number: CN109101975A
Application number: CN201810947884.XA
Authority: CN
Inventors: 程建; 苏炎洲; 林莉; 高银星; 李恩泽
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2018-08-20
Filing date: 2018-08-20
Publication date: 2018-12-28
Anticipated expiration: 2038-08-20
Also published as: CN109101975B

Abstract

The present invention discloses an image semantic segmentation method based on fully convolutional neural networks, relating to the fields of image semantic segmentation and deep learning. The method includes the following steps: selecting a training dataset; constructing and training a classification model that maps images to class labels, and using it as the front-end network of the semantic segmentation model; down-sampling the feature map output by each block of the front-end network to a unified size through a detail-preserving pooling layer, concatenating the four output feature maps, recalibrating them with a feature recalibration module, and passing the resulting feature map to the back-end network; the back-end network is mainly responsible for image up-sampling: after up-sampling, the feature map passes through a variable-weight global pooling layer, the cross entropy is computed against the semantic annotation images of the training dataset, and the error is back-propagated. The present invention solves the problem of low image segmentation accuracy in the prior art.

Description

Image semantic segmentation method based on fully convolutional neural networks

Technical field

The present invention relates to the fields of image semantic segmentation and deep learning, and more particularly to an image semantic segmentation method based on fully convolutional neural networks.

Background art

Semantic segmentation is an important problem in computer vision. Image semantic segmentation assigns a label (class) to every pixel, and can therefore be regarded as a dense classification problem.

In recent years, most state-of-the-art image semantic segmentation methods have been based on fully convolutional neural networks. A typical semantic segmentation network has an encoder-decoder structure. The encoder is an image down-sampling process responsible for extracting coarse semantic features of the image. It is followed by a decoder, an image up-sampling process responsible for up-sampling the down-sampled image features back to the original dimensions of the input image.

Pooling is a key component of down-sampling in convolutional neural networks: it reduces the number of parameters, enhances invariance to certain distortions, and enlarges the receptive field. However, pooling is itself a lossy process, so it causes a loss of image semantic information during the down-sampling stage of semantic segmentation, which lowers the precision of the segmentation result.

In deep convolutional neural networks, strided convolutions are often used in place of pooling layers to achieve down-sampling. A strided convolution considers only one node at a fixed position in each local neighborhood, regardless of the importance of the activation. From the viewpoint of image down-sampling, this also distorts the features. Fully convolutional neural networks are the state-of-the-art image semantic segmentation algorithms in a large number of applications, and innovations in network structure have mainly focused on improving spatial encoding or network connections to promote gradient flow.

Summary of the invention

The object of the present invention is to design an image semantic segmentation method based on fully convolutional neural networks, so as to solve the problem of low image segmentation accuracy in the prior art.

The technical solution of the present invention is as follows:

An image semantic segmentation method based on fully convolutional neural networks includes the following steps:

Step 1: select a training dataset.

Step 2: construct and train a classification model that maps images to class labels, and use it as the front-end network of the semantic segmentation model.

The front-end network of the semantic segmentation model includes Conv1, Conv2_x, Conv3_x and Conv4_x, each of which includes multiple convolutional layers. A detail-preserving pooling layer one is connected after each of Conv1, Conv2_x, Conv3_x and Conv4_x.

Step 3: based on the trained front-end network, construct the back-end network of the semantic segmentation model.

The back-end network includes detail-preserving pooling layers two, a feature recalibration module, a convolutional layer, Conv5_x, Conv6_x, Conv7_x, a convolutional layer, a variable-weight global pooling layer and up-sampling layers. The outputs of Conv1, Conv2_x and Conv3_x each pass through one of three detail-preserving pooling layers two, are concatenated with the output of Conv4_x, and are jointly input to the feature recalibration module. An up-sampling layer is connected before each of Conv5_x, Conv6_x and Conv7_x. Conv5_x, Conv6_x and Conv7_x each include a convolutional layer, a batch normalization layer and a rectified linear unit, and are concatenated through skip connections with the output feature maps of Conv3_x, Conv2_x and Conv1, respectively.

Step 4: train the whole image semantic segmentation model.

Step 5: input a new image, perform one forward pass through the trained deep neural network model, and output the predicted semantic segmentation result end to end.

Specifically, the front-end network of the semantic segmentation model includes 33 residual structures, each containing one 1 × 1 convolution, one 3 × 3 convolution, one 1 × 1 convolution and one shortcut connection.

Specifically, the detail-preserving pooling layer one after Conv1 down-samples the output feature map of Conv1 by a factor of 8, the one after Conv2_x down-samples the output feature map of Conv2_x by a factor of 4, and the one after Conv3_x down-samples the output feature map of Conv3_x by a factor of 2.

Specifically, the detailed process of detail-preserving pooling layer one and detail-preserving pooling layer two is as follows:

The output at each position P is calculated from the input feature map I:

Here, [P] denotes the value at output position P after the input feature map has passed through the detail-preserving pooling layer.

The spatial weight ω_α,β[p, q] of an input node in the neighborhood Ω_p is

where α is the bias exponent and β is the reward exponent. ρ_β(·) is the inverse bilateral filtering function, used to compute the weights of the input points within the neighborhood Ω_p. β reduces the dynamic range of the reward function, and β → 0 reduces to simple neighborhood averaging.

The linear down-scaling factor is given specifically by:

where F is a learnable, unnormalized 2D filter over the neighborhood, and the size of this neighborhood is 3 × 3.
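The formulas referred to in this passage are not reproduced in the text. One reconstruction that is consistent with the definitions above is sketched below, on the assumption that the layer follows the published detail-preserving pooling formulation. The symbols D_{α,β}, \tilde{I}_F, \tilde{Ω}_p and the small constant ε are assumptions introduced for illustration, as is the choice of the symmetric form of ρ_β.

```latex
% Pooled output at position p: weighted average over the neighborhood \Omega_p
D_{\alpha,\beta}(I)[p] =
  \frac{1}{\sum_{q' \in \Omega_p} \omega_{\alpha,\beta}[p,q']}
  \sum_{q \in \Omega_p} \omega_{\alpha,\beta}[p,q]\, I[q]

% Weight: bias \alpha plus a reward for deviating from the down-scaled value
\omega_{\alpha,\beta}[p,q] = \alpha + \rho_\beta\bigl(I[q] - \tilde{I}_F[p]\bigr),
\qquad
\rho_\beta(x) = \bigl(\sqrt{x^2 + \epsilon^2}\bigr)^{\beta}

% Linear down-scaling with a learnable, unnormalized 3 x 3 filter F
\tilde{I}_F[p] = \sum_{q \in \tilde{\Omega}_p} F[q]\, I[q]
```

Under this form, β → 0 gives ρ_β(x) → 1, so every weight equals α + 1 and the output is the plain neighborhood average, which matches the limiting behavior stated above.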

Specifically, the feature recalibration module is a network module that combines spatial feature recalibration with channel feature recalibration.

Specifically, the process of training the whole image semantic segmentation model is:

Step 4.1: preprocess the images in the training dataset, cropping each image to a fixed size.

Step 4.2: initialize the whole image semantic segmentation model.

Step 4.3: augment the training data by flipping, scaling and rotation.

Step 4.4: use the sum of the per-pixel cross-entropy losses as the loss function, then back-propagate the error with the stochastic gradient descent algorithm and update the model parameters, obtaining the trained semantic segmentation model.
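The loss in step 4.4 can be illustrated with a minimal sketch. The nested-list data layout, the softmax over raw class scores, and the helper names `pixelwise_cross_entropy_sum` and `sgd_step` are assumptions made for illustration; the text only states that the per-pixel cross-entropy losses are summed and that stochastic gradient descent updates the parameters.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pixelwise_cross_entropy_sum(logits, labels):
    """Sum of per-pixel cross-entropy losses.

    logits: H x W x K nested lists of raw class scores (assumed layout).
    labels: H x W nested lists of integer class indices.
    """
    loss = 0.0
    for row_scores, row_labels in zip(logits, labels):
        for scores, label in zip(row_scores, row_labels):
            probs = softmax(scores)
            loss += -math.log(probs[label])
    return loss

def sgd_step(params, grads, lr):
    """One plain gradient-descent update: p <- p - lr * g."""
    return [p - lr * g for p, g in zip(params, grads)]

# Toy 1 x 2 image with 3 classes.
logits = [[[2.0, 0.5, 0.1], [0.0, 0.0, 0.0]]]
labels = [[0, 2]]
loss = pixelwise_cross_entropy_sum(logits, labels)
updated = sgd_step([1.0, 2.0], [0.5, -0.5], lr=0.1)
```

The second pixel has uniform scores, so its contribution is log 3 regardless of the label; the first pixel contributes less because its highest score matches the label.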

With the above scheme, the beneficial effects of the present invention are as follows:

(1) The image semantic segmentation model of the invention introduces detail-preserving pooling layers, which retain more image detail during down-sampling. Detail-preserving pooling is an adaptive pooling method that can amplify spatial variation and preserve important structural details. More importantly, its parameters can be learned jointly with the rest of the network.

(2) The invention introduces a feature recalibration module that recalibrates the features. Spatial feature recalibration recalibrates the importance of all pixels at the same spatial position and assigns them corresponding weights, improving the accuracy of semantic segmentation. Channel feature recalibration assigns high weights to important channels, highlighting their importance. In short, the feature recalibration module effectively addresses the low accuracy of image semantic segmentation and the loss of detail during pooling, and ultimately yields better semantic segmentation results.

(3) Variable-weight global pooling: the traditional global average pooling operation applies the same operation, i.e. a 1 × 1 convolution, to the same position of all feature channels, and so cannot highlight the correct class of each pixel in semantic segmentation. The invention adds a weight vector to the 1 × 1 convolution in global average pooling and initializes it from a standard Gaussian distribution. During error back-propagation the pixel weights are updated continually, which allows better pixel-wise classification and also accelerates convergence.

Brief description of the drawings

Fig. 1 is a flow chart of the present invention;

Fig. 2 is a structure diagram of the image semantic segmentation model of the present invention;

Fig. 3 is a structure diagram of the residual structure of the present invention;

Fig. 4 is a structure diagram of the feature recalibration module of the present invention;

Fig. 5 is a structure diagram of the channel feature recalibration module of the present invention;

Fig. 6 is a structure diagram of the spatial feature recalibration module of the present invention.

Detailed description of the embodiments

To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.

To solve the problem of low image segmentation accuracy in the prior art, the present invention proposes an image semantic segmentation method based on fully convolutional neural networks that can be widely applied to general two-dimensional image semantic segmentation.

As shown in Fig. 1, the image semantic segmentation method based on fully convolutional neural networks of the present invention includes the following steps:

Step 1: select a training dataset. In this embodiment, the 21 scene classes of the VOC 2012 dataset (one of which is background) are taken as the basis, and images from the COCO dataset that contain objects of the above 20 classes are collected and added to the dataset, yielding the final training and test datasets.

Step 2: construct and train a classification model that maps images to class labels, and use it as the front-end network of the semantic segmentation model.

As shown in Fig. 2, the front-end network of the semantic segmentation model includes Conv1, Conv2_x, Conv3_x and Conv4_x, each of which includes multiple convolutional layers. A detail-preserving pooling layer one is connected after each of Conv1, Conv2_x, Conv3_x and Conv4_x. That is, a detail-preserving pooling layer is added after each of the blocks Conv2_x, Conv3_x and Conv4_x; this is an adaptive pooling method that can amplify spatial variation and preserve important structural details.

As shown in Fig. 3, the front-end network of the semantic segmentation model includes 33 residual structures, each containing one 1 × 1 convolution, one 3 × 3 convolution, one 1 × 1 convolution and one shortcut connection.

For ease of description, the feature map output by Conv1 (size 112 × 112), the feature map output by Conv2_x (size 56 × 56), the feature map output by Conv3_x (size 28 × 28) and the feature map output by Conv4_x (size 14 × 14) are denoted feature map Res_1, feature map Res_2, feature map Res_3 and feature map Res_4, respectively.

The detail-preserving pooling layer one after Conv1 down-samples feature map Res_1 by a factor of 8, the one after Conv2_x down-samples feature map Res_2 by a factor of 4, and the one after Conv3_x down-samples feature map Res_3 by a factor of 2.
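The arithmetic behind these factors can be checked with a short sketch: dividing each stated feature-map size by its down-sampling factor should give one common size, so that the maps can be concatenated. The factor of 1 for Res_4 is an assumption, since the text gives no factor for it, and `unified_sizes` is a hypothetical helper name.

```python
# Feature-map side lengths stated in the embodiment.
feature_sizes = {"Res_1": 112, "Res_2": 56, "Res_3": 28, "Res_4": 14}
# Down-sampling factors from the text; the factor of 1 for Res_4 is an assumption.
downsample_factors = {"Res_1": 8, "Res_2": 4, "Res_3": 2, "Res_4": 1}

def unified_sizes(sizes, factors):
    """Return the side length of each map after integer down-sampling."""
    out = {}
    for name, size in sizes.items():
        factor = factors[name]
        if size % factor != 0:
            raise ValueError(f"{name}: {size} is not divisible by {factor}")
        out[name] = size // factor
    return out

pooled = unified_sizes(feature_sizes, downsample_factors)
print(pooled)
```

All four entries come out at the same side length, the size of Res_4, which is what allows the concatenation before the feature recalibration module.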

As shown in Fig. 2, the back-end network includes detail-preserving pooling layers two, a feature recalibration module, a convolutional layer, Conv5_x, Conv6_x, Conv7_x, a convolutional layer, a variable-weight global pooling layer and up-sampling layers. Feature map Res_1, feature map Res_2 and feature map Res_3 each pass through one of three detail-preserving pooling layers two, are concatenated with the output of Conv4_x, and are jointly input to the feature recalibration module. An up-sampling layer is connected before each of Conv5_x, Conv6_x and Conv7_x. Conv5_x, Conv6_x and Conv7_x each include a convolutional layer, a batch normalization layer and a rectified linear unit, and are concatenated through skip connections with the output feature maps of Conv3_x, Conv2_x and Conv1, respectively.

In steps 2 and 3, the detailed process of detail-preserving pooling layer one and detail-preserving pooling layer two is as follows:

Here, [P] denotes the value at output position P after the input feature map has passed through the detail-preserving pooling layer; the spatial weight ω_α,β[p, q] of an input node in the neighborhood Ω_p is

where α is the bias exponent and β is the reward exponent. ρ_β(·) is the inverse bilateral filtering function, used to compute the weights of the input points within the neighborhood Ω_p. β reduces the dynamic range of the reward function, and β → 0 reduces to simple neighborhood averaging.

The linear down-scaling factor is given specifically by:
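The formulas for this layer are not reproduced in the text. As a rough single-channel illustration of the description, the sketch below assumes that each weight is α plus a reward (sqrt(x² + ε²))^β of the deviation x between an input value and a down-scaled value obtained with a 3 × 3 filter F. The 2 × 2 pooling window, the stride of 2, the border replication, the constant `eps`, and centering the 3 × 3 filter on the top-left pixel of each window are all simplifying assumptions.

```python
import math

def detail_preserving_pool(feat, F, alpha=1.0, beta=1.0, stride=2, eps=1e-3):
    """Toy single-channel detail-preserving pooling (assumed formulation).

    feat: 2D list (H x W); F: 3 x 3 learnable, unnormalized filter.
    """
    H, W = len(feat), len(feat[0])

    def pixel(r, c):
        # Clamp indices so border pixels are replicated.
        return feat[min(max(r, 0), H - 1)][min(max(c, 0), W - 1)]

    out = []
    for r0 in range(0, H - stride + 1, stride):
        row = []
        for c0 in range(0, W - stride + 1, stride):
            # Linear down-scaling: apply F on a 3 x 3 neighborhood.
            i_tilde = sum(F[dr + 1][dc + 1] * pixel(r0 + dr, c0 + dc)
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1))
            num = den = 0.0
            for r in range(r0, r0 + stride):
                for c in range(c0, c0 + stride):
                    # Reward grows with deviation from the down-scaled value.
                    reward = math.sqrt((feat[r][c] - i_tilde) ** 2 + eps ** 2) ** beta
                    w = alpha + reward
                    num += w * feat[r][c]
                    den += w
            row.append(num / den)
        out.append(row)
    return out

box = [[1.0 / 9.0] * 3 for _ in range(3)]
image = [[0.0, 0.0, 1.0, 1.0],
         [0.0, 8.0, 1.0, 1.0],
         [2.0, 2.0, 3.0, 3.0],
         [2.0, 2.0, 3.0, 3.0]]
pooled = detail_preserving_pool(image, box, alpha=1.0, beta=1.0)
averaged = detail_preserving_pool(image, box, alpha=1.0, beta=0.0)
```

With `beta=0.0` every reward equals 1, so the result is the plain window average. With `beta=1.0` the isolated value 8 in the top-left window receives a larger weight, so that output is pulled toward the detail instead of being averaged away.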

Specifically, the feature recalibration module (shown in Fig. 4) is a network module that combines spatial feature recalibration with channel feature recalibration.

The two are described separately below.

As shown in Fig. 5, the process in the spatial feature recalibration module is:

(1) The original feature map M^c is passed through a convolution with kernel size 1 × 1 and c channels (the weights of the channels are not shared and are learned), yielding a feature map M′.

(2) M′ is then passed through a sigmoid layer, which recalibrates the importance of each spatial position M′(i, j), i ∈ {1, 2, ..., H}, j ∈ {1, 2, ..., W}, of M^c and assigns each spatial position a weight p(i, j). The resulting p(i, j) is multiplied element-wise with the original feature map M^c.

Finally, the feature map obtained from M^c by spatial feature recalibration is:

Spatial feature recalibration recalibrates the importance of all pixels at the same spatial position and assigns them corresponding weights, improving the accuracy of semantic segmentation.
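A minimal sketch of these two steps follows, assuming that the 1 × 1 convolution combines the c input channels into a single H × W map with one learned weight per channel. That single-output-channel reading, the bias term `b`, and the function name `spatial_recalibration` are assumptions made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def spatial_recalibration(M, w, b=0.0):
    """M: c x H x W nested lists; w: one 1 x 1 convolution weight per channel."""
    c, H, W = len(M), len(M[0]), len(M[0][0])
    # Step (1): the 1 x 1 convolution collapses the c channels into one map M'.
    M_prime = [[sum(w[k] * M[k][i][j] for k in range(c)) + b for j in range(W)]
               for i in range(H)]
    # Step (2): sigmoid gives a weight p(i, j) in (0, 1) for every position.
    p = [[sigmoid(v) for v in row] for row in M_prime]
    # Scale every channel of M at position (i, j) by p(i, j).
    out = [[[p[i][j] * M[k][i][j] for j in range(W)] for i in range(H)]
           for k in range(c)]
    return out, p

# Two channels, one row, two columns.
M = [[[1.0, -1.0]], [[1.0, -1.0]]]
out, p = spatial_recalibration(M, w=[1.0, 1.0])
```

In this toy input the first position has the larger pre-activation, so it receives the larger weight and its values are attenuated less than those at the second position.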

As shown in Fig. 6, the process in the channel feature recalibration module is:

(1) The original feature map M^c is passed through a global average pooling layer, yielding a feature map M′. M′ is then fully connected with the original feature map M^c to integrate the feature maps.

(2) The integrated feature map is corrected using a rectified linear unit.

(3) Finally, the corrected feature map is passed through a convolution with kernel size H × W and c channels, yielding a feature vector z.

(4) The feature vector z is passed through a sigmoid layer, which limits its activation range to [0, 1] and yields a channel weight vector. The feature map obtained from M^c by channel feature recalibration is:

Through channel feature recalibration, important channels are assigned high weights, highlighting their importance.
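The channel branch can be approximated by the simplified sketch below. It is not a literal rendering of steps (1) to (4): it assumes a squeeze-and-excitation style pipeline in which the pooled descriptor passes through a fully connected layer, a rectified linear unit, and a second linear map that stands in for the H × W convolution producing z. The matrices `W1` and `W2` and the function name `channel_recalibration` are hypothetical.

```python
import math

def channel_recalibration(M, W1, W2):
    """M: c x H x W nested lists; W1, W2: c x c weight matrices (hypothetical)."""
    c, H, W = len(M), len(M[0]), len(M[0][0])
    # Global average pooling: one descriptor per channel.
    pooled = [sum(sum(row) for row in M[k]) / (H * W) for k in range(c)]
    # Fully connected layer followed by the rectified linear unit.
    hidden = [max(0.0, sum(W1[k][m] * pooled[m] for m in range(c)))
              for k in range(c)]
    # Linear map producing the feature vector z (stand-in for the H x W convolution).
    z = [sum(W2[k][m] * hidden[m] for m in range(c)) for k in range(c)]
    # Sigmoid limits each channel weight to (0, 1).
    weights = [1.0 / (1.0 + math.exp(-v)) for v in z]
    # Scale each channel by its weight.
    out = [[[weights[k] * v for v in row] for row in M[k]] for k in range(c)]
    return out, weights

identity = [[1.0, 0.0], [0.0, 1.0]]
M = [[[1.0, 3.0]], [[0.0, 0.0]]]
out, weights = channel_recalibration(M, identity, identity)
```

With identity matrices the channel with the larger average activation ends up with the larger weight, which is the intended effect of assigning high weights to important channels.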

Step 4: train the whole image semantic segmentation model. The training process is as follows.

Step 4.1: preprocess the images in the training dataset, cropping each image to a fixed size of 513 × 513.

Step 4.2: initialize the whole image semantic segmentation model, i.e. use the parameter values of the pre-trained image semantic segmentation model as initial values.

Step 4.3: augment the training data by flipping, scaling and rotation. Specifically, images are flipped at random, scaled at random to between 0.5 and 2 times the original size, and rotated at random by between -10 and 10 degrees.
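The parameter ranges in step 4.3 can be sampled as in the sketch below. The flip probability of 0.5 and the uniform distributions are assumptions, since the text gives only the ranges; `sample_augmentation` is a hypothetical helper, and applying the transform to image and label data is left out.

```python
import random

def sample_augmentation(rng):
    """Draw one set of augmentation parameters using the ranges in step 4.3."""
    return {
        "flip": rng.random() < 0.5,             # flip probability 0.5 is an assumption
        "scale": rng.uniform(0.5, 2.0),         # 0.5x to 2x of the original image
        "angle_deg": rng.uniform(-10.0, 10.0),  # rotation between -10 and 10 degrees
    }

rng = random.Random(0)
params = [sample_augmentation(rng) for _ in range(1000)]
```

In a segmentation setting the same sampled parameters would need to be applied to both the image and its annotation so that the pixel labels stay aligned.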

Step 4.4: use the sum of the per-pixel cross-entropy losses as the loss function, then back-propagate the error with the stochastic gradient descent algorithm under a polynomial learning-rate policy and update the model parameters, obtaining the trained semantic segmentation model. Under the polynomial policy, the learning rate lr is set as:

where baselr is the initial learning rate, set to 0.001 here, and power is set to 0.9.
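The learning-rate formula itself is not reproduced in the text. A common form of the polynomial policy, assumed here, is lr = baselr × (1 − iter / max_iter)^power. The names `iteration` and `max_iter` are assumptions; `baselr` and `power` take the values stated above.

```python
def poly_learning_rate(iteration, max_iter, baselr=0.001, power=0.9):
    """Polynomial decay: baselr * (1 - iteration / max_iter) ** power."""
    if not 0 <= iteration <= max_iter:
        raise ValueError("iteration must lie in [0, max_iter]")
    return baselr * (1.0 - iteration / max_iter) ** power

# Learning rate at the start, middle and end of a 100-iteration run.
schedule = [poly_learning_rate(i, 100) for i in (0, 50, 100)]
```

Under this form the rate starts at baselr and decays monotonically to zero at the final iteration.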

The principle and process of the present invention are as follows. The image semantic segmentation model of the invention uses feature map Res_1 output by Conv1, feature map Res_2 output by Conv2_x, feature map Res_3 output by Conv3_x and feature map Res_4 output by Conv4_x, which are the first, second, third and fourth layers of the front-end network (i.e. the feature extraction network), respectively. Feature map Res_1 is down-sampled by a factor of 8 through detail-preserving pooling layer one, Res_2 is down-sampled by a factor of 4 through detail-preserving pooling layer one, and Res_3 is down-sampled by a factor of 2 through detail-preserving pooling layer one. The results are concatenated with Res_4 and input to the feature recalibration module. Spatial feature recalibration recalibrates the importance of all pixels at the same spatial position and assigns them corresponding weights, improving the accuracy of semantic segmentation. Channel feature recalibration assigns high weights to important channels, highlighting their importance.
The feature map output by the feature recalibration module then passes through one 1 × 1 convolutional layer, giving feature map De_1. De_1 is up-sampled to 28 × 28, passes through two 3 × 3 convolutions, batch normalization layers and rectified linear units, and is finally concatenated with feature map Res_3, giving feature map De_2. De_2 is up-sampled to 56 × 56, passes through two 3 × 3 convolutions, batch normalization layers and rectified linear units, and is concatenated with feature map Res_2, giving feature map De_3. De_3 is up-sampled to 112 × 112 and passes through two 3 × 3 convolutions, batch normalization layers and rectified linear units, giving feature map De_4. Finally, De_4 passes through one variable-weight global pooling layer and is up-sampled to the original image size; the cross entropy is computed against the semantic segmentation annotation, and error back-propagation is used to obtain the semantic segmentation network model. Variable-weight global pooling: the traditional global average pooling operation applies the same operation, i.e. a 1 × 1 convolution, to the same position of all feature channels, and so cannot highlight the correct class of each pixel in semantic segmentation. A weight vector is therefore added to the 1 × 1 convolution in global average pooling and initialized from a standard Gaussian distribution. During training, back-propagation assigns high weights to pixels belonging to the target class, which allows better pixel-wise classification and also accelerates convergence. The present invention achieves an mIoU of 76.33% on the VOC2012 semantic segmentation dataset.
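The spatial sizes along the back-end network can be traced with the short sketch below. It assumes that De_1 has the 14 × 14 size of Res_4, that every up-sampling step doubles the side length (consistent with the 28, 56 and 112 stated above), and that the final output matches the 513 × 513 crop used in training. `trace_decoder` is a hypothetical helper.

```python
def trace_decoder(start_size=14, skip_sizes=(28, 56, 112), final_size=513):
    """Return (stage name, spatial side length) pairs for the back-end network."""
    stages = [("De_1", start_size)]
    size = start_size
    for idx, target in enumerate(skip_sizes, start=2):
        if target != 2 * size:
            raise ValueError("each up-sampling step is assumed to double the size")
        size = target
        stages.append((f"De_{idx}", size))
    # Final up-sampling back to the (assumed) input crop size.
    stages.append(("output", final_size))
    return stages

stages = trace_decoder()
```

The trace makes the pairing explicit: De_2 and De_3 are formed at the sizes of Res_3 and Res_2 respectively, which is what makes the skip concatenations possible.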

All technical variations made according to the technical solution of the present invention fall within the scope of protection of the present invention.

Claims

1. An image semantic segmentation method based on fully convolutional neural networks, characterized by comprising the following steps:

Step 1: select a training dataset;

Step 2: construct and train the front-end network of the semantic segmentation model, which maps images to class labels;

The front-end network of the semantic segmentation model includes Conv1, Conv2_x, Conv3_x and Conv4_x, each of which includes multiple convolutional layers, and a detail-preserving pooling layer is connected after each of Conv1, Conv2_x, Conv3_x and Conv4_x;

Step 3: based on the trained front-end network, construct the back-end network of the semantic segmentation model;

The back-end network includes detail-preserving pooling layers two, a feature recalibration module, a convolutional layer, Conv5_x, Conv6_x, Conv7_x, a convolutional layer, a variable-weight global pooling layer and up-sampling layers; the outputs of Conv1, Conv2_x and Conv3_x each pass through one of three detail-preserving pooling layers two, are concatenated with the output of Conv4_x, and are jointly input to the feature recalibration module; an up-sampling layer is connected before each of Conv5_x, Conv6_x and Conv7_x; Conv5_x, Conv6_x and Conv7_x each include a convolutional layer, a batch normalization layer and a rectified linear unit, and are concatenated through skip connections with the output feature maps of Conv3_x, Conv2_x and Conv1, respectively;

Step 4: train the whole image semantic segmentation model;

Step 5: input a new image, perform one forward pass through the trained deep neural network model, and output the predicted semantic segmentation result end to end.

2. The image semantic segmentation method based on fully convolutional neural networks according to claim 1, characterized in that the front-end network of the semantic segmentation model includes 33 residual structures, each containing one 1 × 1 convolution, one 3 × 3 convolution, one 1 × 1 convolution and one shortcut connection.

3. The image semantic segmentation method based on fully convolutional neural networks according to claim 1, characterized in that the detail-preserving pooling layer one after Conv1 down-samples the output feature map of Conv1 by a factor of 8, the one after Conv2_x down-samples the output feature map of Conv2_x by a factor of 4, and the one after Conv3_x down-samples the output feature map of Conv3_x by a factor of 2.

4. The image semantic segmentation method based on fully convolutional neural networks according to claim 1, characterized in that

the detailed process of detail-preserving pooling layer one and detail-preserving pooling layer two is as follows:

where the output term denotes the value at output position P after the input feature map has passed through the detail-preserving pooling layer;

The spatial weight ω_α,β[p, q] of an input node in the neighborhood Ω_p is

where α is the bias exponent and β is the reward exponent; ρ_β(·) is the inverse bilateral filtering function, used to compute the weights of the input points within the neighborhood Ω_p; β reduces the dynamic range of the reward function, and β → 0 reduces to simple neighborhood averaging.

The linear down-scaling factor is given specifically by:

where F is a learnable, unnormalized 2D filter over the neighborhood, and the size of this neighborhood is 3 × 3.

5. The image semantic segmentation method based on fully convolutional neural networks according to claim 1, characterized in that the feature recalibration module is a network module that combines spatial feature recalibration with channel feature recalibration.

6. The image semantic segmentation method based on fully convolutional neural networks according to claim 1, characterized in that the process of training the whole image semantic segmentation model is:

Step 4.1: preprocess the images in the training dataset, cropping each image to a fixed size;

Step 4.2: initialize the whole image semantic segmentation model;

Step 4.3: augment the training data by flipping, scaling and rotation;

Step 4.4: use the sum of the per-pixel cross-entropy losses as the loss function, then back-propagate the error with the stochastic gradient descent algorithm and update the model parameters, obtaining the trained semantic segmentation model.