CN117115442A - Semantic segmentation method based on visible light-infrared photoelectric reconnaissance image fusion - Google Patents
Semantic segmentation method based on visible light-infrared photoelectric reconnaissance image fusion
- Publication number
- CN117115442A (application number CN202311037515.4A)
- Authority
- CN
- China
- Prior art keywords
- convolution
- reconnaissance image
- infrared photoelectric
- feature
- semantic segmentation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/42—Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Computing Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Biodiversity & Conservation Biology (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a semantic segmentation method based on visible light-infrared photoelectric reconnaissance image fusion, comprising the following steps: S1, acquiring an infrared photoelectric pod reconnaissance image and a visible-light photoelectric pod reconnaissance image; S2, extracting features from the infrared and visible-light photoelectric pod reconnaissance images respectively with a ConvNeXt feature extractor; S3, fusing the extracted features through a differential feature fusion module; S4, performing up-sampling operations stage by stage through a decoder to obtain a semantic segmentation map. The invention effectively improves segmentation accuracy and reduces the false detections and missed detections caused by interference factors.
Description
Technical Field
The invention relates to the field of remote sensing image processing, in particular to a semantic segmentation method based on visible light-infrared photoelectric reconnaissance image fusion.
Background
With the development of deep learning, many effective semantic segmentation networks have emerged in recent years. However, mainstream semantic segmentation networks mainly rely on visible-light photoelectric reconnaissance images. When illumination conditions are poor, the quality of the visible-light image degrades and segmentation performance drops accordingly; for example, most algorithms fail to segment objects correctly in near-complete darkness. There is therefore a need for a semantic segmentation method based on visible light-infrared photoelectric reconnaissance image fusion that reduces the false detections and missed detections caused by interference factors and improves segmentation accuracy.
Disclosure of Invention
The invention aims to provide a semantic segmentation method based on visible light-infrared photoelectric reconnaissance image fusion that effectively improves segmentation accuracy and reduces the false detections and missed detections caused by interference factors.
The technical scheme of the invention is as follows: a semantic segmentation method based on visible light-infrared photoelectric reconnaissance image fusion comprises the following steps:
S1, acquiring an infrared photoelectric pod reconnaissance image and a visible-light photoelectric pod reconnaissance image;
S2, extracting features from the infrared and visible-light photoelectric pod reconnaissance images respectively with a ConvNeXt feature extractor;
S3, fusing the extracted features through a differential feature fusion module;
S4, performing up-sampling operations stage by stage through a decoder to obtain a semantic segmentation map.
In the semantic segmentation method based on visible light-infrared photoelectric reconnaissance image fusion, the ConvNeXt feature extractor processes the image with four ConvNeXt convolution blocks in sequence, extracting features at different scales;
the ConvNeXt convolution block adopts an inverted bottleneck structure, with its depthwise separable convolution moved up to the first layer;
the ConvNeXt convolution block works as follows: first, the image is convolved with a 7×7 convolution kernel; then layer normalization and two 1×1 convolution operations complete the feature extraction, with a GELU activation function applied between the two 1×1 convolutions.
In the semantic segmentation method based on visible light-infrared photoelectric reconnaissance image fusion, the fusion calculation formula of the differential feature fusion module is as follows:
OUT = M(|X−Y|) ⊗ X + (1 − M(|X−Y|)) ⊗ Y,
wherein OUT is the fused output feature, M is the feature fusion weight, ⊗ denotes element-wise multiplication, X is the feature vector extracted from the infrared photoelectric pod reconnaissance image, and Y is the feature vector extracted from the visible-light photoelectric pod reconnaissance image.
In the semantic segmentation method based on visible light-infrared photoelectric reconnaissance image fusion, the calculation process of the feature fusion weight M is as follows:
calculating the channel attention of the local feature, wherein the calculation formula is as follows:
L(X)=B(PWConv2(δ(B(PWConv1(X))))),
wherein PWConv represents point convolution, namely 1 multiplied by 1 convolution, B represents batch normalization layers, delta represents an activation function GELU, and X is a feature vector obtained by extracting features of an infrared photoelectric pod reconnaissance image;
the channel attention of the global feature is calculated, and the calculation formula is as follows:
L(X)=B(PWConv2(δ(B(PWConv1(G(X)))))),
wherein G represents a global average pooling operation; PWConv represents a point convolution, i.e., a 1×1 convolution; b represents a batch normalization layer; delta represents the activation function GELU; y is a feature vector obtained by extracting features of a photoelectric pod reconnaissance image of visible light;
and finally, adding the two weights element by element to obtain a final feature fusion weight M.
In the semantic segmentation method based on visible light-infrared photoelectric reconnaissance image fusion, the decoder adopts a simple residual block followed by an up-sampling function.
Compared with the prior art, the invention exploits the advantages of the infrared photoelectric pod reconnaissance image: a ConvNeXt feature extractor extracts features from the infrared and visible-light photoelectric pod reconnaissance images respectively, and a differential feature fusion module fuses the extracted features, so that the visible-light and infrared images are fused effectively and a more robust, more accurate semantic segmentation is obtained.
In particular:
The ConvNeXt convolution block adopted by the invention convolves the image with a 7×7 convolution kernel, then applies layer normalization and two 1×1 convolution operations to complete feature extraction, with a GELU activation function between the two 1×1 convolutions. Noise is thus well suppressed, reducing false detections and missed detections caused by interference factors such as illumination and vegetation. The differential feature fusion module selectively retains differential pixels, so the model obtains richer global and local information.
In conclusion, the method has the characteristics of effectively improving the segmentation accuracy and reducing the false detection and missing detection phenomena caused by interference factors.
Experiments show that, on the visible light-infrared photoelectric reconnaissance image data set, segmentation accuracy improves by 2.1% and accuracy by 2.3%.
Drawings
FIG. 1 is a general block diagram of the present invention;
FIG. 2 is a schematic diagram of a ConvNeXt convolution block structure;
fig. 3 is a schematic structural diagram of the differential feature fusion module.
Detailed Description
The invention is further illustrated by the following figures and examples, which are not intended to be limiting.
Examples. The semantic segmentation method based on visible light-infrared photoelectric reconnaissance image fusion, constructed as shown in FIGS. 1-3, comprises the following steps:
S1, acquiring an infrared photoelectric pod reconnaissance image and a visible-light photoelectric pod reconnaissance image;
S2, extracting features from the infrared and visible-light photoelectric pod reconnaissance images respectively with a ConvNeXt feature extractor;
S3, fusing the extracted features through a differential feature fusion module;
S4, performing up-sampling operations stage by stage through a decoder to obtain a semantic segmentation map.
The ConvNeXt feature extractor processes the image with four ConvNeXt convolution blocks in sequence, extracting features at different scales;
the ConvNeXt convolution block adopts an inverted bottleneck structure, with its depthwise separable convolution moved up to the first layer;
the ConvNeXt convolution block works as follows: first, the image is convolved with a 7×7 convolution kernel; then layer normalization and two 1×1 convolution operations complete the feature extraction, with a GELU activation function applied between the two 1×1 convolutions.
The fusion calculation formula of the differential feature fusion module is as follows:
OUT = M(|X−Y|) ⊗ X + (1 − M(|X−Y|)) ⊗ Y,
wherein OUT is the fused output feature, M is the feature fusion weight, ⊗ denotes element-wise multiplication, X is the feature vector extracted from the infrared photoelectric pod reconnaissance image, and Y is the feature vector extracted from the visible-light photoelectric pod reconnaissance image.
The calculation process of the feature fusion weight M is as follows:
calculating the channel attention of the local feature, wherein the calculation formula is as follows:
L(X)=B(PWConv2(δ(B(PWConv1(X))))),
wherein PWConv represents point convolution, namely 1 multiplied by 1 convolution, B represents batch normalization layers, delta represents an activation function GELU, and X is a feature vector obtained by extracting features of an infrared photoelectric pod reconnaissance image;
the channel attention of the global feature is calculated, and the calculation formula is as follows:
L(X)=B(PWConv2(δ(B(PWConv1(G(X)))))),
wherein G represents a global average pooling operation; PWConv represents a point convolution, i.e., a 1×1 convolution; b represents a batch normalization layer; delta represents the activation function GELU; y is a feature vector obtained by extracting features of a photoelectric pod reconnaissance image of visible light;
and finally, adding the two weights element by element to obtain a final feature fusion weight M.
The decoder uses a simple residual block, a structure proposed in the ResNet network, followed by an up-sampling function (up-sampling by bilinear interpolation).
The semantic segmentation network proposed by the invention is shown in FIG. 1, with ConvNeXt as the backbone of the encoder. The ConvNeXt feature extractor extracts features at different scales through four ConvNeXt convolution blocks; the features extracted from the infrared and visible-light photoelectric pod reconnaissance images then undergo differential feature fusion; finally, the decoder performs up-sampling operations stage by stage to produce the segmentation map. The decoder uses a simple residual block plus an up-sampling function.
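As an illustrative sketch only (not the patented implementation), the decoder's bilinear up-sampling step can be expressed in NumPy as follows; the ×2 default scale factor and the coordinate-mapping convention used here are assumptions:

```python
import numpy as np

def bilinear_upsample(x, scale=2):
    """Up-sample a (C, H, W) feature map by bilinear interpolation.

    Sketch of the decoder's up-sampling step. The patent only states that
    bilinear interpolation is used; the x2 default scale and the
    coordinate-mapping convention below are assumptions.
    """
    c, h, w = x.shape
    new_h, new_w = h * scale, w * scale
    out = np.empty((c, new_h, new_w), dtype=float)
    for i in range(new_h):
        for j in range(new_w):
            # Map each output pixel back to fractional input coordinates.
            src_y = min(i / scale, h - 1)
            src_x = min(j / scale, w - 1)
            y0, x0 = int(src_y), int(src_x)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            dy, dx = src_y - y0, src_x - x0
            # Weighted average of the four surrounding input pixels.
            out[:, i, j] = ((1 - dy) * (1 - dx) * x[:, y0, x0]
                            + (1 - dy) * dx * x[:, y0, x1]
                            + dy * (1 - dx) * x[:, y1, x0]
                            + dy * dx * x[:, y1, x1])
    return out
```

Each decoder stage would apply a residual block and then this function, doubling the spatial resolution stage by stage until the input size is recovered.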
The ConvNeXt convolution block is shown in FIG. 2, where D7×7 denotes a depthwise convolution with a 7×7 kernel and C is the number of channels. The ConvNeXt connection adopts an inverted bottleneck structure, with the depthwise separable convolution moved up to the first layer. The typical convolution kernel size is 3×3; the invention uses a 7×7 kernel instead, so the adopted network is also referred to as a large-kernel network. Layer normalization is applied after the depthwise separable convolution, and a GELU activation function is used between the two 1×1 convolutions.
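For illustration, a single ConvNeXt block as described (7×7 depthwise convolution, layer normalization, 1×1 expansion, GELU, 1×1 reduction, residual connection) can be sketched in plain NumPy. The 4× channel-expansion ratio and the exact weight shapes follow the publicly known ConvNeXt design and are assumptions here, not claim limitations:

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

def convnext_block(x, dw_kernel, w1, w2, eps=1e-6):
    """One ConvNeXt block on a (C, H, W) feature map.

    dw_kernel: (C, 7, 7) depthwise filters (first layer, as stated);
    w1: (4C, C) and w2: (C, 4C) are the two 1x1 convolutions of the
    inverted bottleneck. The 4x expansion ratio is an assumption taken
    from the public ConvNeXt design.
    """
    c, h, w = x.shape
    k = dw_kernel.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    dw = np.empty_like(x)
    for ch in range(c):          # depthwise: each channel filtered alone
        for i in range(h):
            for j in range(w):
                dw[ch, i, j] = np.sum(xp[ch, i:i + k, j:j + k] * dw_kernel[ch])
    # Layer normalization over channels at each spatial position.
    mu = dw.mean(axis=0, keepdims=True)
    var = dw.var(axis=0, keepdims=True)
    ln = (dw - mu) / np.sqrt(var + eps)
    # The two 1x1 convolutions are per-pixel linear maps over channels.
    flat = ln.reshape(c, -1)                   # (C, H*W)
    hidden = gelu(w1 @ flat)                   # expand to 4C, then GELU
    out = (w2 @ hidden).reshape(c, h, w)       # reduce back to C
    return x + out                             # residual connection
```

Writing the 1×1 convolutions as matrix products over the channel axis makes the inverted-bottleneck structure explicit: the channel count grows to 4C between the two point convolutions and returns to C at the output.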
Feature fusion refers to combining features from different levels or branches, typically through simple linear operations. The differential feature fusion structure proposed by the invention is shown in FIG. 3; it selectively retains differential pixels, making the network easier to train. The specific process is as follows:
1) First, take the element-wise absolute difference of the initial features X and Y to obtain the initial feature difference;
2) Then obtain the feature fusion weight through the multi-scale channel attention unit.
the specific calculation formula can be expressed as:
wherein OUT is the fused output feature; m is a feature fusion weight; m (|X-Y|) and 1-M (|X-Y|) are composed of real numbers between 0 and 1, so that characteristic points between X and Y in the training process of the network are selected and removed, and finally the characteristic which is most contributed to the segmentation task is selected, so that the model can obtain richer global and local information.
The feature fusion weight is obtained effectively by introducing a multi-scale channel attention unit. The unit consists of two separate branches: one extracts the attention weight of the global features through global average pooling; the other extracts the attention weight of the local features directly with point convolutions.
The channel attention calculating method of the local features comprises the following steps:
L(X)=B(PWConv2(δ(B(PWConv1(X)))))
wherein PWConv represents point convolution, i.e., 1×1 convolution: PWConv1 reduces the number of input feature channels to 1/r of the original, where r is the channel reduction ratio, and the second 1×1 convolution restores the channel count to that of the original input; B represents a batch normalization layer; δ represents the activation function, which, following the ConvNeXt design, is changed to GELU in both attention branches.
The channel attention of the global features differs from the local channel attention only in that the input features first undergo a global average pooling operation. The two computed weights are added element by element to obtain the final feature fusion weight M.
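The two attention branches and their element-wise sum can be sketched as follows. The per-sample channel statistics used as a stand-in for batch normalization, and the sigmoid squashing that maps the summed weights into (0, 1), are illustrative assumptions:

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention_weights(z, w1_l, w2_l, w1_g, w2_g, eps=1e-6):
    """z: (C, H, W) difference features |X - Y|.

    Local branch: per-pixel 1x1 convolutions (matrix products over the
    channel axis), no pooling. Global branch: global average pooling
    first. w1_*: (C//r, C) channel-reduction weights; w2_*: (C, C//r).
    """
    c, h, w = z.shape

    def bn(t):
        # Stand-in for batch normalization: with a single sample, the
        # statistics are taken over the channel axis (an assumption).
        mu = t.mean(axis=0, keepdims=True)
        sd = t.std(axis=0, keepdims=True)
        return (t - mu) / (sd + eps)

    flat = z.reshape(c, -1)                      # (C, H*W)
    local = bn(w2_l @ gelu(bn(w1_l @ flat)))     # L(Z): no pooling
    g = flat.mean(axis=1, keepdims=True)         # G(Z): global average pool
    glob = bn(w2_g @ gelu(bn(w1_g @ g)))         # broadcast over all pixels
    m = sigmoid(local + glob)                    # fusion weight in (0, 1)
    return m.reshape(c, h, w)
```

The global branch produces one weight per channel that is broadcast over all pixels, while the local branch varies per pixel; summing them before the squashing gives M both global and local sensitivity.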
Claims (5)
1. A semantic segmentation method based on visible light-infrared photoelectric reconnaissance image fusion, characterized by comprising the following steps:
S1, acquiring an infrared photoelectric pod reconnaissance image and a visible-light photoelectric pod reconnaissance image;
S2, extracting features from the infrared and visible-light photoelectric pod reconnaissance images respectively with a ConvNeXt feature extractor;
S3, fusing the extracted features through a differential feature fusion module;
S4, performing up-sampling operations stage by stage through a decoder to obtain a semantic segmentation map.
2. The semantic segmentation method based on visible light-infrared photoelectric reconnaissance image fusion according to claim 1, characterized in that: the ConvNeXt feature extractor processes the image with four ConvNeXt convolution blocks in sequence, extracting features at different scales;
the ConvNeXt convolution block adopts an inverted bottleneck structure, with its depthwise separable convolution moved up to the first layer;
the ConvNeXt convolution block works as follows: first, the image is convolved with a 7×7 convolution kernel; then layer normalization and two 1×1 convolution operations complete the feature extraction, with a GELU activation function applied between the two 1×1 convolutions.
3. The semantic segmentation method based on visible light-infrared photoelectric reconnaissance image fusion according to claim 2, wherein the fusion calculation formula of the differential feature fusion module is as follows:
OUT = M(|X−Y|) ⊗ X + (1 − M(|X−Y|)) ⊗ Y,
wherein OUT is the fused output feature, M is the feature fusion weight, ⊗ denotes element-wise multiplication, X is the feature vector extracted from the infrared photoelectric pod reconnaissance image, and Y is the feature vector extracted from the visible-light photoelectric pod reconnaissance image.
4. The semantic segmentation method based on visible light-infrared photoelectric reconnaissance image fusion according to claim 3, wherein the calculation process of the feature fusion weight M is as follows:
calculating the channel attention of the local feature, wherein the calculation formula is as follows:
L(X)=B(PWConv2(δ(B(PWConv1(X))))),
wherein PWConv represents point convolution, namely 1 multiplied by 1 convolution, B represents batch normalization layers, delta represents an activation function GELU, and X is a feature vector obtained by extracting features of an infrared photoelectric pod reconnaissance image;
the channel attention of the global feature is calculated, and the calculation formula is as follows:
L(X)=B(PWConv2(δ(B(PWConv1(G(X)))))),
wherein G represents a global average pooling operation; PWConv represents a point convolution, i.e., a 1×1 convolution; b represents a batch normalization layer; delta represents the activation function GELU; y is a feature vector obtained by extracting features of a photoelectric pod reconnaissance image of visible light;
and finally, adding the two weights element by element to obtain a final feature fusion weight M.
5. The semantic segmentation method based on visible light-infrared photoelectric reconnaissance image fusion according to claim 1, wherein the decoder uses a residual block plus an up-sampling function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311037515.4A CN117115442A (en) | 2023-08-17 | 2023-08-17 | Semantic segmentation method based on visible light-infrared photoelectric reconnaissance image fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311037515.4A CN117115442A (en) | 2023-08-17 | 2023-08-17 | Semantic segmentation method based on visible light-infrared photoelectric reconnaissance image fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117115442A true CN117115442A (en) | 2023-11-24 |
Family
ID=88801398
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311037515.4A Pending CN117115442A (en) | 2023-08-17 | 2023-08-17 | Semantic segmentation method based on visible light-infrared photoelectric reconnaissance image fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117115442A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112184604A (en) * | 2020-09-15 | 2021-01-05 | 杭州电子科技大学 | Color image enhancement method based on image fusion |
CN113033630A (en) * | 2021-03-09 | 2021-06-25 | 太原科技大学 | Infrared and visible light image deep learning fusion method based on double non-local attention models |
CN114612359A (en) * | 2022-03-09 | 2022-06-10 | 南京理工大学 | Visible light and infrared image fusion method based on feature extraction |
CN114897883A (en) * | 2022-06-11 | 2022-08-12 | 长春理工大学 | Infrared and visible light image fusion method based on ResNet50 and double-pyramid |
CN115063329A (en) * | 2022-06-10 | 2022-09-16 | 中国人民解放军国防科技大学 | Visible light and infrared image fusion enhancement method and system under low-illumination environment |
- CN115601723A (en) * | 2022-10-24 | 2023-01-13 | Chengdu University of Information Technology (CN) | Night thermal infrared image semantic segmentation enhancement method based on improved ResNet
CN115620010A (en) * | 2022-09-20 | 2023-01-17 | 长春理工大学 | Semantic segmentation method for RGB-T bimodal feature fusion |
CN116580195A (en) * | 2023-04-26 | 2023-08-11 | 齐鲁工业大学(山东省科学院) | Remote sensing image semantic segmentation method and system based on ConvNeXt convolution |
-
2023
- 2023-08-17 CN CN202311037515.4A patent/CN117115442A/en active Pending
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112184604A (en) * | 2020-09-15 | 2021-01-05 | 杭州电子科技大学 | Color image enhancement method based on image fusion |
CN113033630A (en) * | 2021-03-09 | 2021-06-25 | 太原科技大学 | Infrared and visible light image deep learning fusion method based on double non-local attention models |
CN114612359A (en) * | 2022-03-09 | 2022-06-10 | 南京理工大学 | Visible light and infrared image fusion method based on feature extraction |
CN115063329A (en) * | 2022-06-10 | 2022-09-16 | 中国人民解放军国防科技大学 | Visible light and infrared image fusion enhancement method and system under low-illumination environment |
CN114897883A (en) * | 2022-06-11 | 2022-08-12 | 长春理工大学 | Infrared and visible light image fusion method based on ResNet50 and double-pyramid |
CN115620010A (en) * | 2022-09-20 | 2023-01-17 | 长春理工大学 | Semantic segmentation method for RGB-T bimodal feature fusion |
- CN115601723A (en) * | 2022-10-24 | 2023-01-13 | Chengdu University of Information Technology (CN) | Night thermal infrared image semantic segmentation enhancement method based on improved ResNet
CN116580195A (en) * | 2023-04-26 | 2023-08-11 | 齐鲁工业大学(山东省科学院) | Remote sensing image semantic segmentation method and system based on ConvNeXt convolution |
Non-Patent Citations (5)
Title |
---|
LEI WANG,ET AL.: "Infrared and Visible Image Fusion via Attention-Based Adaptive Feature Fusion", ENTROPY, pages 3 * |
YIMIAN DAI,ET AL.: "Attentional Feature Fusion", 2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) * |
- YU HUA, ET AL.: "A Residual ConvNeXt-based Network for Visible and Infrared Image Fusion", 2023 4TH INTERNATIONAL CONFERENCE ON ELECTRONIC COMMUNICATION AND ARTIFICIAL INTELLIGENCE (ICECAI) *
ZHUANG LIU,ET AL.: "A ConvNet for the 2020s", ARXIV * |
- WANG ZHAO, ET AL.: "Multi-modal feature differential attention fusion pedestrian detection based on YOLO", COMPUTER SYSTEMS & APPLICATIONS, pages 2 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||