Disclosure of Invention
To address the defects of the prior art and the problems of existing semantic segmentation models, the invention provides an edge-aware image semantic segmentation method based on adaptive feature fusion, which is used for segmenting an image so as to achieve a better image segmentation effect.
In order to achieve the above technical effect, the invention provides an edge-aware image semantic segmentation method based on adaptive feature fusion, which comprises the following steps:
step one: making a data set by collecting N images and performing pixel-level class labeling on each image, wherein each sample in the data set comprises one image and the pixel-level labeling result of that image;
step two: building an edge-aware image semantic segmentation model based on adaptive feature fusion, wherein the model is a dual-branch network structure that takes a ResNet network as a backbone and comprises an edge branch and a semantic branch;
step three: dividing a data set into a training set and a testing set, training a model by using the training set, and verifying the trained model by using the testing set;
step four: using the trained semantic segmentation model for image segmentation to obtain the segmentation result of the image.
The second step comprises the following steps:
step 1: using a ResNet network model as the downsampling stage of the semantic branch, deriving an edge branch from the downsampling stage of the semantic branch, and performing a multi-scale cross fusion operation on the edge features in the edge branch;
step 2: upsampling the output features of the semantic branch downsampling stage, and fusing deep features and shallow features in the upsampling stage to obtain semantic features containing rich spatial information;
and step 3: fusing the edge features output by the edge branch with the semantic features output by the semantic branch to obtain a fused feature F′.
The step 1 comprises the following steps:
step 1.1: taking the shallow features output by the semantic branch downsampling stage as the input features of the edge branch, and performing convolution processing;
step 1.2: processing the edge features with atrous (dilated) convolution layers with dilation rates of 7, 5 and 3, respectively, to obtain 3 features F1, F2 and F3;
step 1.3: performing convolution on feature F1 to obtain a new feature F1′;
step 1.4: fusing features F1′ and F2 by concatenation to obtain a new feature F2′;
step 1.5: fusing features F2′ and F3 by concatenation to obtain a new feature F3′;
step 1.6: fusing features F1′, F2′ and F3′ by concatenation to obtain a new feature F′;
step 1.7: successively performing convolution and upsampling on the new feature F′ to obtain the final edge feature.
The step 2 comprises the following steps:
step 2.1: obtaining 4 features M1, M2, M3 and M4 from the downsampling stage of the semantic branch;
step 2.2: performing convolution on feature M4 and multiplying the result with feature M3 to obtain a new feature M3′;
step 2.3: fusing features M3′ and M4 by concatenation, and performing two-layer convolution to obtain an output feature M3″;
step 2.4: performing convolution on the output feature M3″ and multiplying the result with feature M2 to obtain a new feature M2′;
step 2.5: fusing features M2′ and M3″ by concatenation, and performing two-layer convolution to obtain an output feature M2″;
step 2.6: performing convolution on the output feature M2″ and multiplying the result with feature M1 to obtain a new feature M1′;
step 2.7: fusing features M1′ and M2″ by concatenation, and performing two-layer convolution to obtain the semantic features output by the semantic branch.
The step 3 comprises the following steps:
step 3.1: fusing the edge features and the semantic features by concatenation, and performing convolution to obtain a new feature W′;
step 3.2: performing global pooling, convolution and Sigmoid activation on feature W′ in sequence to obtain a new feature W1′;
step 3.3: multiplying features W′ and W1′ to obtain a new feature W2′;
step 3.4: fusing features W′ and W2′ by addition to obtain the fused feature F′.
The invention has the beneficial effects that:
The invention provides an edge-aware image semantic segmentation method based on adaptive feature fusion, a new semantic segmentation method built on a residual network (ResNet). The model constructed by the method is a dual-branch network comprising an edge branch and a semantic branch, wherein the edge branch is derived from the shallow part of the semantic branch and the semantic branch adopts an encoder-decoder structure. In the edge branch, the added multi-scale cross fusion operation acquires multi-scale image features by stacking atrous convolutions with different dilation rates, while the cross fusion among the sub-branches further improves the robustness of the multi-scale features. In the semantic branch, deep features and shallow features are fused based on a spatial attention mechanism, so that the rich spatial information contained in the shallow features is retained while the large amount of noise they contain is filtered out. Finally, the semantic branch features and the edge branch features are fused to further refine the segmentation result.
Detailed Description
The invention is further described below with reference to the figures and specific embodiments.
As shown in fig. 1, an edge-aware image semantic segmentation method based on adaptive feature fusion includes:
step one: making a data set by collecting N images and performing pixel-level class labeling on each image, wherein each sample in the data set comprises one image and the pixel-level labeling result of that image;
the data set adopted in this embodiment is the ISPRS Vaihingen data set, which includes six categories: impervious surfaces, buildings, low vegetation, trees, cars and background; the data set contains 33 images in total, with an average size of 2494 × 2046 pixels and a spatial resolution of 9 cm.
Data preprocessing: to further improve the segmentation accuracy of the model, methods such as random flipping, random cropping and random scaling are applied to the training data set.
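The augmentation described above can be sketched as follows in PyTorch; the scale set, the flip probability and the padding ignore value of 255 are assumptions not fixed by the text, and the 512 × 512 crop size follows the sub-block size used during training.

```python
import random
import torch
import torch.nn.functional as F

def augment(img, mask, crop=512, scales=(0.75, 1.0, 1.25)):
    """Random flip / scale / crop for one training sample (sketch).
    img: (C, H, W) float tensor; mask: (H, W) long tensor of class indices."""
    if random.random() < 0.5:                      # random horizontal flip
        img, mask = img.flip(-1), mask.flip(-1)
    s = random.choice(scales)                      # random scaling
    img = F.interpolate(img[None], scale_factor=s, mode='bilinear',
                        align_corners=False)[0]
    mask = F.interpolate(mask[None, None].float(), scale_factor=s,
                         mode='nearest')[0, 0].long()
    # pad if the scaled image is smaller than the crop window
    ph, pw = max(0, crop - img.shape[-2]), max(0, crop - img.shape[-1])
    img = F.pad(img, (0, pw, 0, ph))
    mask = F.pad(mask, (0, pw, 0, ph), value=255)  # 255 = ignore index (assumption)
    # random crop to crop x crop
    top = random.randint(0, img.shape[-2] - crop)
    left = random.randint(0, img.shape[-1] - crop)
    return (img[:, top:top + crop, left:left + crop],
            mask[top:top + crop, left:left + crop])
```

The mask is resized with nearest-neighbor interpolation so that class indices are never blended.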
A schematic diagram of the constructed semantic segmentation model is shown in fig. 2. The overall network structure comprises a semantic branch and an edge branch. The semantic branch is mainly used for extracting semantic features, and a spatial attention fusion module (SA) is inserted into the semantic branch to fuse shallow and deep semantic features; the edge branch is mainly used for extracting edge features, and a multi-scale cross fusion module (MSB) is inserted into the edge branch to fully mine the edge features; finally, a channel adaptation module (CA) is introduced to fuse the semantic features and the edge features. In fig. 2, MSB: multi-scale cross fusion module; SA: spatial attention fusion module; CA: channel adaptation module; Mul: multiplication; Add: addition; UpSample: upsampling; Concat: concatenation; Conv: convolution; BN: batch normalization; ReLU: activation; Output: output features.
Step two: constructing an edge-aware image semantic segmentation model based on adaptive feature fusion, wherein the model is a dual-branch network structure that takes a ResNet network as a backbone and comprises an edge branch and a semantic branch; this step comprises the following steps:
step 1: using a ResNet network model as the downsampling stage of the semantic branch, deriving an edge branch from the downsampling stage of the semantic branch, and performing a multi-scale cross fusion operation on the edge features in the edge branch to extract more detailed edge features. As shown in fig. 3, the edge features are convolved by 3 atrous convolution layers with different dilation rates to obtain 3 features of different scales, and a cross fusion operation (instead of simple concatenation or addition) is then performed on the 3 features to extract edge information more fully; this step comprises:
step 1.1: taking the shallow features output by the semantic branch downsampling stage as the input features of the edge branch, and performing convolution processing;
step 1.2: processing the edge features with atrous (dilated) convolution layers with dilation rates of 7, 5 and 3, respectively, to obtain 3 features F1, F2 and F3;
step 1.3: performing convolution on feature F1 to obtain a new feature F1′;
step 1.4: fusing features F1′ and F2 by concatenation to obtain a new feature F2′;
step 1.5: fusing features F2′ and F3 by concatenation to obtain a new feature F3′;
step 1.6: fusing features F1′, F2′ and F3′ by concatenation to obtain a new feature F′;
step 1.7: successively performing convolution and upsampling on the new feature F′ to obtain the final edge feature.
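Steps 1.1 to 1.7 can be sketched as the following PyTorch module. The channel widths, the convolution after each concatenation (to restore the channel count) and the 2× upsampling factor are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSB(nn.Module):
    """Multi-scale cross fusion module (sketch of steps 1.1-1.7)."""
    def __init__(self, in_ch, ch=64):
        super().__init__()
        # step 1.1: convolution on the shallow input feature
        self.stem = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, padding=1),
            nn.BatchNorm2d(ch), nn.ReLU(inplace=True))
        # step 1.2: atrous convolutions with dilation rates 7, 5, 3 -> F1, F2, F3
        self.d7 = nn.Conv2d(ch, ch, 3, padding=7, dilation=7)
        self.d5 = nn.Conv2d(ch, ch, 3, padding=5, dilation=5)
        self.d3 = nn.Conv2d(ch, ch, 3, padding=3, dilation=3)
        self.c1 = nn.Conv2d(ch, ch, 3, padding=1)       # step 1.3: F1 -> F1'
        self.c2 = nn.Conv2d(2 * ch, ch, 3, padding=1)   # step 1.4: cat(F1', F2) -> F2'
        self.c3 = nn.Conv2d(2 * ch, ch, 3, padding=1)   # step 1.5: cat(F2', F3) -> F3'
        self.out = nn.Conv2d(3 * ch, ch, 3, padding=1)  # steps 1.6-1.7
    def forward(self, x):
        x = self.stem(x)
        f1, f2, f3 = self.d7(x), self.d5(x), self.d3(x)
        f1p = self.c1(f1)
        f2p = self.c2(torch.cat([f1p, f2], dim=1))      # cross fusion
        f3p = self.c3(torch.cat([f2p, f3], dim=1))
        fused = self.out(torch.cat([f1p, f2p, f3p], dim=1))  # step 1.6: F'
        return F.interpolate(fused, scale_factor=2,           # step 1.7: upsample
                             mode='bilinear', align_corners=False)
```

Each dilated branch feeds its neighbor through concatenation, which is the cross fusion that distinguishes the module from a plain ASPP-style parallel stack.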
Step 2: the ResNet network serves as the downsampling stage of the semantic branch; the output features of this downsampling stage are upsampled, and deep features and shallow features are fused in the upsampling stage to obtain semantic features containing rich spatial information, as shown in fig. 4; this step comprises:
step 2.1: obtaining 4 features M1, M2, M3 and M4 from the downsampling stage of the semantic branch;
step 2.2: performing convolution on feature M4 and multiplying the result with feature M3 to obtain a new feature M3′;
step 2.3: fusing features M3′ and M4 by concatenation, and performing two-layer convolution to obtain an output feature M3″;
step 2.4: performing convolution on the output feature M3″ and multiplying the result with feature M2 to obtain a new feature M2′;
step 2.5: fusing features M2′ and M3″ by concatenation, and performing two-layer convolution to obtain an output feature M2″;
step 2.6: performing convolution on the output feature M2″ and multiplying the result with feature M1 to obtain a new feature M1′;
step 2.7: fusing features M1′ and M2″ by concatenation, and performing two-layer convolution to obtain the semantic features output by the semantic branch.
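One fusion step of the decoder above (e.g. steps 2.2 and 2.3) can be sketched as a single PyTorch module, applied three times along the chain M4→M3, M3″→M2, M2″→M1. The upsampling of the deeper feature before multiplication, the sigmoid gate on the attention map and the channel counts are assumptions added for shape compatibility, not details stated in the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SAFuse(nn.Module):
    """One spatial-attention fusion step of the decoder (sketch)."""
    def __init__(self, deep_ch, shallow_ch, out_ch):
        super().__init__()
        self.attn = nn.Conv2d(deep_ch, shallow_ch, 1)   # convolution on the deeper feature
        self.fuse = nn.Sequential(                      # two-layer convolution after concat
            nn.Conv2d(shallow_ch + deep_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True))
    def forward(self, deep, shallow):
        # upsample the deeper feature to the shallow resolution (assumed for shape match)
        deep = F.interpolate(deep, size=shallow.shape[-2:],
                             mode='bilinear', align_corners=False)
        # e.g. step 2.2: M3' = M3 * conv(M4); sigmoid bounds the attention weights
        gated = shallow * torch.sigmoid(self.attn(deep))
        # e.g. step 2.3: concat(M3', M4) -> two convolutions -> M3''
        return self.fuse(torch.cat([gated, deep], dim=1))
```

Chaining three such modules reproduces steps 2.2 through 2.7: each deeper output gates the next shallower feature before the two are concatenated and convolved.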
And step 3: fusing the edge features output by the edge branch with the semantic features output by the semantic branch to obtain a fused feature F′ and thus a finer segmentation result, as shown in fig. 5; this step comprises:
step 3.1: fusing the edge features and the semantic features by concatenation, and performing convolution to obtain a new feature W′;
step 3.2: performing global pooling, convolution and Sigmoid activation on feature W′ in sequence to obtain a new feature W1′;
step 3.3: multiplying features W′ and W1′ to obtain a new feature W2′;
step 3.4: fusing features W′ and W2′ by addition to obtain the fused feature F′; the features output in this step are a set of pixel data, and this pixel data is visualized and converted into an image to obtain the segmented image;
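Steps 3.1 to 3.4 amount to a squeeze-and-excitation-style channel reweighting of the concatenated features, with a residual addition. The following is a minimal sketch; the channel sizes and the 3×3 mixing convolution are assumptions.

```python
import torch
import torch.nn as nn

class CA(nn.Module):
    """Channel adaptation fusion of semantic and edge features (sketch)."""
    def __init__(self, sem_ch, edge_ch, out_ch):
        super().__init__()
        self.mix = nn.Conv2d(sem_ch + edge_ch, out_ch, 3, padding=1)  # step 3.1 -> W'
        self.gate = nn.Sequential(                                    # step 3.2 -> W1'
            nn.AdaptiveAvgPool2d(1),   # global pooling
            nn.Conv2d(out_ch, out_ch, 1),
            nn.Sigmoid())
    def forward(self, sem, edge):
        w = self.mix(torch.cat([sem, edge], dim=1))  # step 3.1: concat + conv
        w2 = w * self.gate(w)                        # step 3.3: W2' = W' * W1'
        return w + w2                                # step 3.4: F' = W' + W2'
```

The additive shortcut in step 3.4 keeps the ungated feature W′ as a residual path, so the channel gate can only emphasize channels rather than suppress the signal entirely.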
step three: dividing the data set into a training set and a test set, wherein the training set is used for training the model and the test set is used for evaluating its final performance. The trained model is verified with the test set: images from the test set are input into the trained semantic segmentation model, the obtained semantic segmentation results are compared with the test set labels, and the segmentation accuracy is calculated; the segmentation results of the input images are visualized to explicitly show the segmentation effect;
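The accuracy calculation in step three can be sketched via a confusion matrix; the specific metrics shown here (overall accuracy and per-class IoU, the usual choices on ISPRS Vaihingen) are assumptions, since the text does not name the metric.

```python
import torch

def confusion_matrix(pred, target, num_classes=6):
    """Accumulate a confusion matrix from predicted and ground-truth label maps."""
    k = (target >= 0) & (target < num_classes)   # drop ignored / invalid pixels
    idx = num_classes * target[k] + pred[k]      # row = truth, column = prediction
    return torch.bincount(idx, minlength=num_classes ** 2).reshape(
        num_classes, num_classes)

def scores(cm):
    """Overall pixel accuracy and per-class IoU from a confusion matrix."""
    overall_acc = (cm.diag().sum() / cm.sum()).item()
    iou = cm.diag() / (cm.sum(0) + cm.sum(1) - cm.diag())
    return overall_acc, iou
```

The matrix can be summed over all test tiles before computing scores, so the metrics are pixel-weighted over the whole test set rather than averaged per image.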
step four: using the trained semantic segmentation model for image segmentation to obtain the segmentation result of the image.
To verify the effect of the invention, the method is compared with FCN, PSPNet, Dilated FCN, RotEqNet and DeepLab v3+, which perform well on this task; the experimental data is the ISPRS Vaihingen data set widely used in semantic segmentation. Table 1 gives the results of the comparative experiments.
Considering that the resolution of the images in the ISPRS Vaihingen data set is too large for an image to be input into the model at once, the images are cut into 512 × 512 sub-blocks with a sliding window during training. In addition, the model is built with the PyTorch framework and trained on a GeForce RTX 2080Ti. An Adam optimizer is adopted in the training process with an initial learning rate of 0.001; to improve the stability of the model in the training stage and to help it escape local minima, a cosine annealing learning rate decay strategy is adopted, with the number of epochs in the three annealing cycles being 10, 20 and 40, respectively; finally, the model with the highest accuracy on the validation set is selected as the final result.
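The optimizer and schedule described above map directly onto PyTorch's warm-restart cosine scheduler: annealing cycles of 10, 20 and 40 epochs correspond to an initial period T_0 = 10 doubled after each restart (T_mult = 2). A minimal sketch, with a stand-in module in place of the actual network:

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

net = torch.nn.Conv2d(3, 6, 1)           # stand-in for the segmentation network
opt = Adam(net.parameters(), lr=1e-3)    # initial learning rate 0.001
# annealing cycles of 10, 20, 40 epochs <=> T_0=10, T_mult=2
sched = CosineAnnealingWarmRestarts(opt, T_0=10, T_mult=2)
for epoch in range(70):                  # 10 + 20 + 40 epochs in total
    # ... run one training epoch here ...
    sched.step()                         # advance the cosine schedule per epoch
```

Each restart returns the learning rate to its initial value, which is what lets training jump out of local minima between annealing cycles.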
Table 1: comparative experiment results
As can be seen from Table 1, the proposed edge-aware image semantic segmentation method based on adaptive feature fusion, verified on the ISPRS Vaihingen data set, achieves higher segmentation accuracy than the other networks, which demonstrates the effectiveness of the method.