Disclosure of Invention
To address the defects of the prior art and the problems of existing semantic segmentation models, the invention provides an edge-aware image semantic segmentation method based on adaptive feature fusion, which is used for segmenting an image so as to achieve a better image segmentation effect.
In order to achieve the above technical effect, the invention provides an edge-aware image semantic segmentation method based on adaptive feature fusion, which comprises the following steps:
step one: making a data set by collecting N images and performing pixel-level class labeling on each image, wherein each sample in the data set comprises one image and the pixel-level labeling result of that image;
step two: building an edge-aware image semantic segmentation model based on adaptive feature fusion, wherein the model is a dual-branch network structure that takes a ResNet network as a backbone and comprises an edge branch and a semantic branch;
step three: dividing a data set into a training set and a testing set, training a model by using the training set, and verifying the trained model by using the testing set;
step four: using the trained semantic segmentation model for image segmentation to obtain the segmentation result of the image.
The second step comprises the following steps:
step 1: using a ResNet network model as the downsampling stage of the semantic branch, deriving an edge branch from the downsampling stage of the semantic branch, and performing a multi-scale cross fusion operation on the edge features in the edge branch;
step 2: upsampling the output features of the semantic branch downsampling stage, and fusing deep features and shallow features in the upsampling stage to obtain semantic features containing rich spatial information;
and step 3: fusing the edge features output by the edge branch with the semantic features output by the semantic branch to obtain a fused feature F′.
The step 1 comprises the following steps:
step 1.1: taking the shallow features output by the semantic branch downsampling stage as the input features of the edge branch, and performing convolution processing;
step 1.2: processing the edge features with atrous (dilated) convolution layers with dilation rates of 7, 5 and 3, respectively, to obtain 3 features F1, F2 and F3;
step 1.3: performing convolution on feature F1 to obtain a new feature F1′;
step 1.4: fusing features F1′ and F2 by concatenation to obtain a new feature F2′;
step 1.5: fusing features F2′ and F3 by concatenation to obtain a new feature F3′;
step 1.6: fusing features F1′, F2′ and F3′ by concatenation to obtain a new feature F′;
step 1.7: successively performing convolution and upsampling on the new feature F′ to obtain the final edge feature.
The step 2 comprises the following steps:
step 2.1: obtaining 4 features M1, M2, M3 and M4 from the downsampling stage of the semantic branch;
step 2.2: performing convolution on feature M4 and multiplying the result with feature M3 to obtain a new feature M3′;
step 2.3: fusing features M3′ and M4 by concatenation, and performing two-layer convolution to obtain an output feature M3″;
step 2.4: performing convolution on the output feature M3″ and multiplying the result with feature M2 to obtain a new feature M2′;
step 2.5: fusing features M2′ and M3″ by concatenation, and performing two-layer convolution to obtain an output feature M2″;
step 2.6: performing convolution on the output feature M2″ and multiplying the result with feature M1 to obtain a new feature M1′;
step 2.7: fusing features M1′ and M2″ by concatenation, and performing two-layer convolution to obtain the semantic features output by the semantic branch.
The step 3 comprises the following steps:
step 3.1: fusing the edge features and the semantic features by concatenation, and performing convolution to obtain a new feature W′;
step 3.2: performing global pooling, convolution and Sigmoid activation on feature W′ in sequence to obtain a new feature W1′;
step 3.3: multiplying features W′ and W1′ to obtain a new feature W2′;
step 3.4: fusing features W′ and W2′ by addition to obtain the fused feature F′.
The invention has the beneficial effects that:
The invention provides an edge-aware image semantic segmentation method based on adaptive feature fusion, a new semantic segmentation method built on a residual network (ResNet). The model constructed by the method is a dual-branch network comprising an edge branch and a semantic branch, wherein the edge branch is derived from the shallow part of the semantic branch and the semantic branch adopts an encoder-decoder structure. In the edge branch, the added multi-scale cross fusion operation acquires multi-scale image features by stacking atrous convolutions with different dilation rates, while the cross fusion among the sub-branches further improves the robustness of the multi-scale features. In the semantic branch, deep features and shallow features are fused based on a spatial attention mechanism, so that the rich spatial information contained in the shallow features is retained while the large amount of noise they contain is filtered out. Finally, the semantic branch features and the edge branch features are fused to further refine the segmentation result.
Detailed Description
The invention is further described below with reference to the figures and specific embodiments.
As shown in fig. 1, an edge-aware image semantic segmentation method based on adaptive feature fusion includes:
step one: making a data set by collecting N images and performing pixel-level class labeling on each image, wherein each sample in the data set comprises one image and the pixel-level labeling result of that image;
the data set adopted in this embodiment is the ISPRS Vaihingen data set, which includes six categories: impervious surfaces, buildings, low vegetation, trees, cars and background; the data set contains 33 images in total, with an average size of 2494 × 2046 pixels and a spatial resolution of 9 cm.
Data preprocessing: to further improve the segmentation accuracy of the model, methods such as random flipping, random cropping and random scaling are applied to the training data set.
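The augmentation described above can be sketched as follows in PyTorch; the scale set, the flip probability and the padding ignore value of 255 are assumptions not fixed by the text, and the 512 × 512 crop size follows the sub-block size used during training.

```python
import random
import torch
import torch.nn.functional as F

def augment(img, mask, crop=512, scales=(0.75, 1.0, 1.25)):
    """Random flip / scale / crop for one training sample (sketch).
    img: (C, H, W) float tensor; mask: (H, W) long tensor of class indices."""
    if random.random() < 0.5:                      # random horizontal flip
        img, mask = img.flip(-1), mask.flip(-1)
    s = random.choice(scales)                      # random scaling
    img = F.interpolate(img[None], scale_factor=s, mode='bilinear',
                        align_corners=False)[0]
    mask = F.interpolate(mask[None, None].float(), scale_factor=s,
                         mode='nearest')[0, 0].long()
    # pad if the scaled image is smaller than the crop window
    ph, pw = max(0, crop - img.shape[-2]), max(0, crop - img.shape[-1])
    img = F.pad(img, (0, pw, 0, ph))
    mask = F.pad(mask, (0, pw, 0, ph), value=255)  # 255 = ignore index (assumption)
    # random crop to crop x crop
    top = random.randint(0, img.shape[-2] - crop)
    left = random.randint(0, img.shape[-1] - crop)
    return (img[:, top:top + crop, left:left + crop],
            mask[top:top + crop, left:left + crop])
```

The mask is resized with nearest-neighbor interpolation so that class indices are never blended.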
A schematic diagram of the constructed semantic segmentation model is shown in fig. 2. The overall network structure comprises a semantic branch and an edge branch. The semantic branch is mainly used for extracting semantic features, and a spatial attention fusion module (SA) is inserted into the semantic branch to fuse shallow and deep semantic features; the edge branch is mainly used for extracting edge features, and a multi-scale cross fusion module (MSB) is inserted into the edge branch to fully mine the edge features; finally, a channel adaptation module (CA) is introduced to fuse the semantic features and the edge features. In fig. 2, MSB: multi-scale cross fusion module; SA: spatial attention fusion module; CA: channel adaptation module; Mul: multiplication; Add: addition; UpSample: upsampling; Concat: concatenation; Conv: convolution; BN: batch normalization; ReLU: activation; Output: output features.
Step two: constructing an edge-aware image semantic segmentation model based on adaptive feature fusion, wherein the model is a dual-branch network structure that takes a ResNet network as a backbone and comprises an edge branch and a semantic branch; this step comprises the following steps:
step 1: using a ResNet network model as the downsampling stage of the semantic branch, deriving an edge branch from the downsampling stage of the semantic branch, and performing a multi-scale cross fusion operation on the edge features in the edge branch to extract more detailed edge features. As shown in fig. 3, the edge features are convolved by 3 atrous convolution layers with different dilation rates to obtain 3 features of different scales, and a cross fusion operation (instead of simple concatenation or addition) is then performed on the 3 features to extract edge information more fully; this step comprises:
step 1.1: taking the shallow features output by the semantic branch downsampling stage as the input features of the edge branch, and performing convolution processing;
step 1.2: processing the edge features with atrous (dilated) convolution layers with dilation rates of 7, 5 and 3, respectively, to obtain 3 features F1, F2 and F3;
step 1.3: performing convolution on feature F1 to obtain a new feature F1′;
step 1.4: fusing features F1′ and F2 by concatenation to obtain a new feature F2′;
step 1.5: fusing features F2′ and F3 by concatenation to obtain a new feature F3′;
step 1.6: fusing features F1′, F2′ and F3′ by concatenation to obtain a new feature F′;
step 1.7: successively performing convolution and upsampling on the new feature F′ to obtain the final edge feature.
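Steps 1.1 to 1.7 can be sketched as the following PyTorch module. The channel widths, the convolution after each concatenation (to restore the channel count) and the 2× upsampling factor are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSB(nn.Module):
    """Multi-scale cross fusion module (sketch of steps 1.1-1.7)."""
    def __init__(self, in_ch, ch=64):
        super().__init__()
        # step 1.1: convolution on the shallow input feature
        self.stem = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, padding=1),
            nn.BatchNorm2d(ch), nn.ReLU(inplace=True))
        # step 1.2: atrous convolutions with dilation rates 7, 5, 3 -> F1, F2, F3
        self.d7 = nn.Conv2d(ch, ch, 3, padding=7, dilation=7)
        self.d5 = nn.Conv2d(ch, ch, 3, padding=5, dilation=5)
        self.d3 = nn.Conv2d(ch, ch, 3, padding=3, dilation=3)
        self.c1 = nn.Conv2d(ch, ch, 3, padding=1)       # step 1.3: F1 -> F1'
        self.c2 = nn.Conv2d(2 * ch, ch, 3, padding=1)   # step 1.4: cat(F1', F2) -> F2'
        self.c3 = nn.Conv2d(2 * ch, ch, 3, padding=1)   # step 1.5: cat(F2', F3) -> F3'
        self.out = nn.Conv2d(3 * ch, ch, 3, padding=1)  # steps 1.6-1.7
    def forward(self, x):
        x = self.stem(x)
        f1, f2, f3 = self.d7(x), self.d5(x), self.d3(x)
        f1p = self.c1(f1)
        f2p = self.c2(torch.cat([f1p, f2], dim=1))      # cross fusion
        f3p = self.c3(torch.cat([f2p, f3], dim=1))
        fused = self.out(torch.cat([f1p, f2p, f3p], dim=1))  # step 1.6: F'
        return F.interpolate(fused, scale_factor=2,           # step 1.7: upsample
                             mode='bilinear', align_corners=False)
```

Each dilated branch feeds its neighbor through concatenation, which is the cross fusion that distinguishes the module from a plain ASPP-style parallel stack.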
Step 2: the ResNet network serves as the downsampling stage of the semantic branch; the output features of this downsampling stage are upsampled, and deep features and shallow features are fused in the upsampling stage to obtain semantic features containing rich spatial information, as shown in fig. 4; this step comprises:
step 2.1: obtaining 4 features M1, M2, M3 and M4 from the downsampling stage of the semantic branch;
step 2.2: performing convolution on feature M4 and multiplying the result with feature M3 to obtain a new feature M3′;
step 2.3: fusing features M3′ and M4 by concatenation, and performing two-layer convolution to obtain an output feature M3″;
step 2.4: performing convolution on the output feature M3″ and multiplying the result with feature M2 to obtain a new feature M2′;
step 2.5: fusing features M2′ and M3″ by concatenation, and performing two-layer convolution to obtain an output feature M2″;
step 2.6: performing convolution on the output feature M2″ and multiplying the result with feature M1 to obtain a new feature M1′;
step 2.7: fusing features M1′ and M2″ by concatenation, and performing two-layer convolution to obtain the semantic features output by the semantic branch.
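One fusion step of the decoder above (e.g. steps 2.2 and 2.3) can be sketched as a single PyTorch module, applied three times along the chain M4→M3, M3″→M2, M2″→M1. The upsampling of the deeper feature before multiplication, the sigmoid gate on the attention map and the channel counts are assumptions added for shape compatibility, not details stated in the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SAFuse(nn.Module):
    """One spatial-attention fusion step of the decoder (sketch)."""
    def __init__(self, deep_ch, shallow_ch, out_ch):
        super().__init__()
        self.attn = nn.Conv2d(deep_ch, shallow_ch, 1)   # convolution on the deeper feature
        self.fuse = nn.Sequential(                      # two-layer convolution after concat
            nn.Conv2d(shallow_ch + deep_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True))
    def forward(self, deep, shallow):
        # upsample the deeper feature to the shallow resolution (assumed for shape match)
        deep = F.interpolate(deep, size=shallow.shape[-2:],
                             mode='bilinear', align_corners=False)
        # e.g. step 2.2: M3' = M3 * conv(M4); sigmoid bounds the attention weights
        gated = shallow * torch.sigmoid(self.attn(deep))
        # e.g. step 2.3: concat(M3', M4) -> two convolutions -> M3''
        return self.fuse(torch.cat([gated, deep], dim=1))
```

Chaining three such modules reproduces steps 2.2 through 2.7: each deeper output gates the next shallower feature before the two are concatenated and convolved.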
And step 3: fusing the edge features output by the edge branch with the semantic features output by the semantic branch to obtain a fused feature F′ and thus a finer segmentation result, as shown in fig. 5; this step comprises:
step 3.1: fusing the edge features and the semantic features by concatenation, and performing convolution to obtain a new feature W′;
step 3.2: performing global pooling, convolution and Sigmoid activation on feature W′ in sequence to obtain a new feature W1′;
step 3.3: multiplying features W′ and W1′ to obtain a new feature W2′;
step 3.4: fusing features W′ and W2′ by addition to obtain the fused feature F′; the features output in this step are a set of pixel data, and this pixel data is visualized and converted into an image to obtain the segmented image;
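Steps 3.1 to 3.4 amount to a squeeze-and-excitation-style channel reweighting of the concatenated features, with a residual addition. The following is a minimal sketch; the channel sizes and the 3×3 mixing convolution are assumptions.

```python
import torch
import torch.nn as nn

class CA(nn.Module):
    """Channel adaptation fusion of semantic and edge features (sketch)."""
    def __init__(self, sem_ch, edge_ch, out_ch):
        super().__init__()
        self.mix = nn.Conv2d(sem_ch + edge_ch, out_ch, 3, padding=1)  # step 3.1 -> W'
        self.gate = nn.Sequential(                                    # step 3.2 -> W1'
            nn.AdaptiveAvgPool2d(1),   # global pooling
            nn.Conv2d(out_ch, out_ch, 1),
            nn.Sigmoid())
    def forward(self, sem, edge):
        w = self.mix(torch.cat([sem, edge], dim=1))  # step 3.1: concat + conv
        w2 = w * self.gate(w)                        # step 3.3: W2' = W' * W1'
        return w + w2                                # step 3.4: F' = W' + W2'
```

The additive shortcut in step 3.4 keeps the ungated feature W′ as a residual path, so the channel gate can only emphasize channels rather than suppress the signal entirely.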
step three: dividing the data set into a training set and a test set, wherein the training set is used for training the model and the test set is used for evaluating its final performance. The trained model is verified with the test set: images from the test set are input into the trained semantic segmentation model, the obtained semantic segmentation results are compared with the test set labels, and the segmentation accuracy is calculated; the segmentation results of the input images are visualized to explicitly show the segmentation effect;
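The accuracy calculation in step three can be sketched via a confusion matrix; the specific metrics shown here (overall accuracy and per-class IoU, the usual choices on ISPRS Vaihingen) are assumptions, since the text does not name the metric.

```python
import torch

def confusion_matrix(pred, target, num_classes=6):
    """Accumulate a confusion matrix from predicted and ground-truth label maps."""
    k = (target >= 0) & (target < num_classes)   # drop ignored / invalid pixels
    idx = num_classes * target[k] + pred[k]      # row = truth, column = prediction
    return torch.bincount(idx, minlength=num_classes ** 2).reshape(
        num_classes, num_classes)

def scores(cm):
    """Overall pixel accuracy and per-class IoU from a confusion matrix."""
    overall_acc = (cm.diag().sum() / cm.sum()).item()
    iou = cm.diag() / (cm.sum(0) + cm.sum(1) - cm.diag())
    return overall_acc, iou
```

The matrix can be summed over all test tiles before computing scores, so the metrics are pixel-weighted over the whole test set rather than averaged per image.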
step four: using the trained semantic segmentation model for image segmentation to obtain the segmentation result of the image.
To verify the effect of the invention, the method is compared with FCN, PSPNet, Dilated FCN, RotEqNet and DeepLab v3+, which perform well on this task; the experimental data is the ISPRS Vaihingen data set widely used in semantic segmentation. Table 1 gives the results of the comparative experiments.
Considering that the resolution of the images in the ISPRS Vaihingen data set is too large for an image to be input into the model at once, the images are cut into 512 × 512 sub-blocks with a sliding window during training. In addition, the model is built with the PyTorch framework and trained on a GeForce RTX 2080Ti. An Adam optimizer is adopted in the training process with an initial learning rate of 0.001; to improve the stability of the model in the training stage and to help it escape local minima, a cosine annealing learning rate decay strategy is adopted, with the number of epochs in the three annealing cycles being 10, 20 and 40, respectively; finally, the model with the highest accuracy on the validation set is selected as the final result.
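The optimizer and schedule described above map directly onto PyTorch's warm-restart cosine scheduler: annealing cycles of 10, 20 and 40 epochs correspond to an initial period T_0 = 10 doubled after each restart (T_mult = 2). A minimal sketch, with a stand-in module in place of the actual network:

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

net = torch.nn.Conv2d(3, 6, 1)           # stand-in for the segmentation network
opt = Adam(net.parameters(), lr=1e-3)    # initial learning rate 0.001
# annealing cycles of 10, 20, 40 epochs <=> T_0=10, T_mult=2
sched = CosineAnnealingWarmRestarts(opt, T_0=10, T_mult=2)
for epoch in range(70):                  # 10 + 20 + 40 epochs in total
    # ... run one training epoch here ...
    sched.step()                         # advance the cosine schedule per epoch
```

Each restart returns the learning rate to its initial value, which is what lets training jump out of local minima between annealing cycles.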
Table 1: comparative experiment results
As can be seen from Table 1, the proposed edge-aware image semantic segmentation method based on adaptive feature fusion, verified on the ISPRS Vaihingen data set, achieves higher segmentation accuracy than the other networks, which demonstrates the effectiveness of the method.