CN116310693A - Camouflage target detection method based on edge feature fusion and high-order space interaction - Google Patents
- Publication number
- CN116310693A (application number CN202310356445.2A)
- Authority
- CN
- China
- Prior art keywords
- edge
- module
- convolution
- feature map
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention relates to a camouflage target detection method based on edge feature fusion and high-order space interaction, comprising the following steps: step A, performing data preprocessing, including data pairing and data enhancement, to obtain a training data set; step B, designing a camouflage target detection network based on edge feature fusion and high-order space interaction, the network comprising an edge perception module, an edge enhancement module, an edge feature fusion module, a high-order space interaction module and a context aggregation module; step C, designing a loss function to guide the parameter optimization of the network designed in step B; step D, training the camouflage target detection network of step B with the training data set obtained in step A until it converges to a Nash equilibrium, obtaining a trained camouflage target detection model based on edge feature fusion and high-order space interaction; and step E, inputting the image to be detected into the trained model and outputting a mask image of the camouflage target.
Description
Technical Field
The invention relates to the technical fields of image and video processing and computer vision, in particular to a camouflage target detection method based on edge feature fusion and high-order space interaction.
Background
With advances in technology, digital image processing has been widely applied in many aspects of social life as well as in scientific research. Camouflage target detection is an emerging digital image processing task that aims to accurately and efficiently detect a camouflage target embedded in its surroundings, segmenting an image into camouflage target and background so as to find the camouflage target. Camouflage is widespread in nature: organisms use their own structure and physiological characteristics to blend into the surrounding environment and evade predators. Camouflage target detection can help discover camouflaged organisms in nature and help scientists better study them. Its applications are broad: beyond academic value, it also supports search and detection of camouflaged targets in the military field, assessment of lesions in the medical field, and detection of locust invasions in agricultural remote sensing.
Early camouflage target detection methods distinguished camouflage targets from the background using hand-crafted low-level features such as color, texture, geometric gradients, frequency-domain cues and motion. However, most camouflage targets are very similar in color to the background, and color-based methods only handle cases where the object differs from the background in color. Texture-based methods work well when the colors are very close to the background, but perform poorly when the texture of the camouflage target resembles the background. Motion-based methods rely on motion information, locating a camouflage target from the changes in background color and texture produced by its movement; they are, however, strongly affected by interference, and illumination changes or background motion can cause missed and false detections. In short, camouflage target detection methods based on hand-designed features can achieve some success but often fail in complex scenes.
In recent years, as deep learning has been applied throughout the fields of computer vision, many camouflage target detection models based on convolutional neural networks have appeared. These models use strong feature extraction and autonomous learning capabilities to model camouflage target information, improving detection accuracy while enhancing generalization, with markedly better results than traditional camouflage detection methods. The mainstream approach feeds an image into a backbone network, extracts image features, and then predicts the camouflage target mask from those features. Such methods make full use of the semantic information of convolutional neural networks and enlarge the receptive field. However, because a camouflage target is highly similar to the background in color and texture, a convolutional model struggles to learn features that separate foreground from background. Other methods therefore introduce additional cues, such as edge information, to help the convolutional network better distinguish the camouflage target from the background; such extra information can effectively improve detection accuracy. The invention designs a camouflage target detection method based on edge feature fusion and high-order space interaction: image features are first extracted by a backbone network; an edge perception module is then designed to generate an edge mask and edge features; an edge enhancement module and an edge feature fusion module are designed; a high-order space interaction module and a context aggregation module are constructed; and the designed network finally generates the camouflage target mask.
Disclosure of Invention
In view of the above, the present invention aims to provide a camouflage target detection method based on edge feature fusion and higher-order spatial interaction, which is beneficial to significantly improving the performance of camouflage target detection by fusing edge features and performing higher-order spatial interaction.
In order to achieve the above purpose, the invention adopts the following technical scheme: a camouflage target detection method based on edge feature fusion and high-order space interaction comprises the following steps:
step A, data preprocessing, including data pairing and data enhancement processing, is carried out, and a training data set is obtained;
step B, designing a camouflage target detection network based on edge feature fusion and high-order space interaction, wherein the camouflage target detection network consists of an edge perception module, an edge enhancement module, an edge feature fusion module, a high-order space interaction module and a context aggregation module;
step C, designing a loss function, and guiding the parameter optimization of the network designed in step B;

step D, training the camouflage target detection network based on edge feature fusion and high-order space interaction of step B with the training data set obtained in step A until it converges to a Nash equilibrium, obtaining a trained camouflage target detection model based on edge feature fusion and high-order space interaction;

and step E, inputting the image to be detected into the trained camouflage target detection model based on edge feature fusion and high-order space interaction, and outputting a mask image of the camouflage target.
In a preferred embodiment, the step a is implemented as follows:
a1, forming an image triplet by each original image, a label image corresponding to the original image and an edge label image;
step A2, randomly turning left and right, randomly cutting and randomly rotating each group of image triples; performing color enhancement on the original image, and adjusting the brightness, contrast, saturation and definition of the original image by setting random values as parameters; adding random black points or white points as random noise to the label image corresponding to the original image;
and A3, scaling each image in the data set into images with the same size of H multiplied by W.
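By way of illustration, the data pairing and joint augmentation of steps A1-A3 can be sketched as follows. This is a minimal Python sketch assuming PIL and torchvision as the image toolkit; the function name, the augmentation ranges and the training resolution H = W = 416 are illustrative assumptions, not values fixed by the invention.

```python
import random

from PIL import ImageEnhance
import torchvision.transforms.functional as TF

H, W = 416, 416  # assumed training resolution (step A3 scales all images to H x W)

def augment_triplet(image, label, edge):
    """Jointly augment one (original, label, edge-label) image triplet (steps A1-A2)."""
    # Random left-right flip, applied identically to all three images.
    if random.random() < 0.5:
        image, label, edge = TF.hflip(image), TF.hflip(label), TF.hflip(edge)
    # Random rotation with a shared angle (random cropping would likewise share
    # its parameters, e.g. via transforms.RandomCrop.get_params).
    angle = random.uniform(-15, 15)
    image, label, edge = (TF.rotate(im, angle) for im in (image, label, edge))
    # Color enhancement on the original image only: brightness, contrast,
    # saturation and sharpness, each with a random factor.
    for enhancer in (ImageEnhance.Brightness, ImageEnhance.Contrast,
                     ImageEnhance.Color, ImageEnhance.Sharpness):
        image = enhancer(image).enhance(random.uniform(0.8, 1.2))
    # Random black or white points added as noise to the label image (assumed mode "L").
    label = label.copy()
    pixels = label.load()
    for _ in range(int(0.0005 * label.size[0] * label.size[1])):
        x, y = random.randrange(label.size[0]), random.randrange(label.size[1])
        pixels[x, y] = random.choice((0, 255))
    # Step A3: scale every image in the triplet to the same H x W size.
    return tuple(TF.resize(im, (H, W)) for im in (image, label, edge))
```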
In a preferred embodiment, the step B is implemented as follows:
step B1, constructing an image feature extraction network, and extracting image features by using the constructed network;
step B2, designing an edge perception module, and generating an edge mask and edge characteristics by using the designed module;
step B3, designing an edge enhancement module and an edge feature fusion module, enhancing the feature representation with camouflage target edge structure semantics by using the edge enhancement module, and generating features of fusion edge information by using the edge feature fusion module;
Step B4, constructing a high-order space interaction module and a context aggregation module, using the high-order space interaction module to inhibit the attention to the background and promote the attention to the foreground, and using the context aggregation module to mine context semantics to enhance object detection;
and B5, designing a camouflage target detection network based on edge feature fusion and high-order space interaction, wherein the camouflage target detection network comprises an edge perception module, an edge feature fusion module, an edge enhancement module, a high-order space interaction module and a context aggregation module, and generating a final camouflage target mask by using the designed network.
In a preferred embodiment, the step B1 is implemented as follows:
Step B1, take Res2Net-50 as the backbone network and extract features from an input original image I of size H×W×3. Specifically, the feature maps output by the original image I at the first, second, third and fourth stages are denoted F1, F2, F3 and F4 respectively, where feature map F1 has size (H/4)×(W/4)×C, feature map F2 has size (H/8)×(W/8)×2C, feature map F3 has size (H/16)×(W/16)×4C, feature map F4 has size (H/32)×(W/32)×8C, and C=256.
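A minimal sketch of this multi-scale extraction is given below. It assumes the timm implementation of Res2Net-50 ("res2net50_26w_4s") as a stand-in backbone; with features_only=True, the four residual stages expose exactly the strides (4, 8, 16, 32) and channel widths (C, 2C, 4C, 8C with C = 256) described above.

```python
import timm
import torch

# features_only exposes the intermediate feature maps of the backbone stages;
# out_indices=(1, 2, 3, 4) selects the four residual stages (strides 4..32).
backbone = timm.create_model("res2net50_26w_4s", pretrained=True,
                             features_only=True, out_indices=(1, 2, 3, 4))

x = torch.randn(1, 3, 416, 416)   # one H x W x 3 input image (H = W = 416 assumed)
f1, f2, f3, f4 = backbone(x)
print([tuple(f.shape) for f in (f1, f2, f3, f4)])
# [(1, 256, 104, 104), (1, 512, 52, 52), (1, 1024, 26, 26), (1, 2048, 13, 13)]
```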
In a preferred embodiment, the step B2 is implemented as follows:
Step B21, design an edge perception module; the inputs of the edge perception module are the first-stage feature map F1 and the fourth-stage feature map F4 extracted in step B1, and the edge perception module outputs an edge feature map Fe and an edge mask Me;

Step B22, design the feature fusion block in the edge perception module; the inputs of the edge perception module are the feature maps F1 and F4 extracted in step B1; the input feature map F1 passes sequentially through a 1×1 convolution, BN layer and ReLU activation function to reduce the channel number, giving feature map F'1; the input feature map F4 passes sequentially through a 1×1 convolution, BN layer and ReLU activation function to reduce the channel number, giving feature map F'4; the width and height of F'4 are adjusted by bilinear interpolation to the same width and height as F'1, giving feature map F''4; F'1 and F''4 are concatenated along the channel dimension and passed through a channel attention module to obtain the edge feature map Fe; the specific formulas are as follows:

F'1 = ReLU(BN(Conv1(F1)))
F'4 = ReLU(BN(Conv1(F4)))
F''4 = Up(F'4)
Fe = SE(Concat(F'1, F''4))

wherein Conv1(·) is a convolution layer with a convolution kernel size of 1×1, BN(·) is a batch normalization operation, ReLU(·) is the ReLU activation function, Up(·) is bilinear interpolation upsampling, Concat(·,·) is a concatenation operation along the channel dimension, and SE(·) is the channel attention module;
Step B23, design the convolution block in the edge perception module; the edge feature map Fe obtained in step B22 passes sequentially through a 3×3 convolution, BN layer, ReLU activation function, a second 3×3 convolution, BN layer, ReLU activation function, and a 1×1 convolution, finally generating the edge mask Me; the specific formula is as follows:

Me = Conv1(ReLU(BN(Conv3(ReLU(BN(Conv3(Fe)))))))

where Conv3(·) is a convolution layer with a convolution kernel size of 3×3, BN(·) is the batch normalization operation, ReLU(·) is the activation function, and Conv1(·) is a convolution with a convolution kernel size of 1×1.
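Steps B21-B23 can be condensed into the following sketch. It is a minimal PyTorch rendering of the edge perception module, assuming a standard squeeze-and-excitation layer for the channel attention SE and an internal channel width of 64; both are assumptions, since the text does not fix these hyper-parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SE(nn.Module):
    """Standard squeeze-and-excitation channel attention (assumed form of SE)."""
    def __init__(self, ch, r=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // r, ch, 1), nn.Sigmoid())
    def forward(self, x):
        return x * self.fc(x)

class EdgePerceptionModule(nn.Module):
    def __init__(self, c1=256, c4=2048, mid=64):
        super().__init__()
        # 1x1 conv + BN + ReLU branches reducing the channels of F1 and F4.
        self.reduce1 = nn.Sequential(nn.Conv2d(c1, mid, 1),
                                     nn.BatchNorm2d(mid), nn.ReLU(inplace=True))
        self.reduce4 = nn.Sequential(nn.Conv2d(c4, mid, 1),
                                     nn.BatchNorm2d(mid), nn.ReLU(inplace=True))
        self.se = SE(2 * mid)
        # Convolution block: (3x3 conv, BN, ReLU) x 2, then 1x1 conv -> 1-channel mask.
        self.head = nn.Sequential(
            nn.Conv2d(2 * mid, mid, 3, padding=1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, 1, 1))
    def forward(self, f1, f4):
        f1r = self.reduce1(f1)                                         # F'1
        f4r = self.reduce4(f4)                                         # F'4
        f4r = F.interpolate(f4r, size=f1r.shape[2:],
                            mode="bilinear", align_corners=False)      # F''4
        fe = self.se(torch.cat([f1r, f4r], dim=1))                     # Fe
        me = self.head(fe)                                             # edge mask Me
        return fe, me
```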
In a preferred embodiment, the step B3 is implemented as follows:
Step B31, design the edge enhancement module, starting with the edge guiding operation in the edge enhancement module; the inputs are the edge mask Me obtained in step B2 and a feature map Fi obtained in step B1; the input edge mask Me is downsampled by bilinear interpolation to the same width and height as the feature map Fi, giving mask M'e; the mask M'e is multiplied element-wise with the feature map Fi, the result is added to Fi, and the sum passes sequentially through a 3×3 convolution, BN layer and ReLU activation function to obtain the edge-guided feature map Fguide; the specific formulas are as follows:

M'e = Down(Me)
Fguide = ReLU(BN(Conv3((M'e ⊗ Fi) ⊕ Fi)))

where Down(·) is a bilinear interpolation downsampling operation, ⊗ is element-wise multiplication, ⊕ is element-wise addition, Conv3(·) is a convolution layer with a convolution kernel size of 3×3, BN(·) is a batch normalization operation, and ReLU(·) is an activation function;
Step B32, construct a CBAM attention sub-module in the edge enhancement module, consisting of serial channel attention SE and spatial attention SA; the input feature map is the feature map Fguide obtained in step B31, and the output is the edge-enhanced feature Fee; the specific formula is as follows:

Fee = SA(SE(Fguide))

wherein SE(·) is the channel attention module and SA(·) is the spatial attention module;
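A corresponding sketch of the edge enhancement module of steps B31-B32 follows, reusing the SE block and imports from the previous sketch. The 7×7 kernel of the spatial attention branch follows the common CBAM setting, and applying a sigmoid to the mask logits before gating is an assumption; both are illustrative rather than prescribed.

```python
class SpatialAttention(nn.Module):
    """CBAM-style spatial attention over the channel-wise mean and max maps."""
    def __init__(self, k=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2)
    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class EdgeEnhancementModule(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                  nn.BatchNorm2d(ch), nn.ReLU(inplace=True))
        self.se, self.sa = SE(ch), SpatialAttention()
    def forward(self, fi, me):
        # Edge guiding: resize the mask, gate the features, keep a residual path.
        m = F.interpolate(me, size=fi.shape[2:], mode="bilinear", align_corners=False)
        f_guide = self.conv(torch.sigmoid(m) * fi + fi)   # Fguide (sigmoid is assumed)
        return self.sa(self.se(f_guide))                  # Fee = SA(SE(Fguide))
```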
Step B33, design the edge feature fusion module; its inputs are the first-stage feature map F1 extracted in step B1 and the edge feature map Fe and edge mask Me obtained in step B2; the edge mask Me is multiplied element-wise with the feature map F1 and the result is added to F1, giving feature map FM; the edge feature map Fe passes sequentially through a 3×3 convolution, BN layer and ReLU activation function, giving the channel-reduced feature map F'e; FM and F'e are concatenated along the channel dimension, passed sequentially through a 3×3 convolution, Swish activation function, SE module and 3×3 convolution, and added to the feature map F'e, giving feature map F''e; the feature map F''e passes through an SE module, is concatenated with F''e along the channel dimension, and then undergoes a 3×3 convolution, giving feature map F'''e; finally the feature map F'''e is added to the feature map F1 to obtain the feature map of the fused edge information Fefm; the specific formulas are as follows:

FM = (Me ⊗ F1) ⊕ F1
F'e = ReLU(BN(Conv3(Fe)))
F''e = Conv3(SE(Swish(Conv3(Concat(FM, F'e))))) ⊕ F'e
F'''e = Conv3(Concat(SE(F''e), F''e))
Fefm = F'''e ⊕ F1

where ⊗ is element-wise multiplication, ⊕ is element-wise addition, Conv3(·) is a convolution layer with a convolution kernel size of 3×3, BN(·) is a batch normalization operation, ReLU(·) is an activation function, Swish(·) is the Swish activation function, SE(·) is the channel attention module, and Concat(·,·) is a concatenation operation along the channel dimension.
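Under the same assumptions (the reconstructed formulas above, SE as a standard squeeze-and-excitation block, and Swish realized as SiLU), the edge feature fusion module of step B33 can be sketched as:

```python
class EdgeFeatureFusionModule(nn.Module):
    def __init__(self, c1=256, ce=128):
        super().__init__()
        # 3x3 conv + BN + ReLU reducing the edge feature channels to those of F1.
        self.reduce_e = nn.Sequential(nn.Conv2d(ce, c1, 3, padding=1),
                                      nn.BatchNorm2d(c1), nn.ReLU(inplace=True))
        # Concat -> 3x3 conv -> Swish -> SE -> 3x3 conv, with a residual to F'e.
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * c1, c1, 3, padding=1), nn.SiLU(),   # Swish == SiLU
            SE(c1),
            nn.Conv2d(c1, c1, 3, padding=1))
        self.se = SE(c1)
        self.out = nn.Conv2d(2 * c1, c1, 3, padding=1)
    def forward(self, f1, fe, me):
        fm = torch.sigmoid(me) * f1 + f1                        # FM (sigmoid assumed)
        fe1 = self.reduce_e(fe)                                 # F'e
        fe2 = self.fuse(torch.cat([fm, fe1], dim=1)) + fe1      # F''e
        fe3 = self.out(torch.cat([self.se(fe2), fe2], dim=1))   # F'''e
        return fe3 + f1                                         # Fefm
```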
In a preferred embodiment, the step B4 is implemented as follows:
Step B41, first construct the gating convolution module in the high-order space interaction module, and denote the feature map input to the module by Fα. The input feature map Fα is layer-normalized (LN1), giving the normalized feature map F'α; a 1×1 convolution then expands its channels to twice the original number, giving feature map Fβ. Fβ is split along the channel dimension into two feature maps p0 and q; q is input to a depth-separable convolution, giving feature map Q, which is split into n feature maps q0, q1, …, q(n-1), where n is the order. The feature map p0 is multiplied with the feature map q0 and a 1×1 convolution expands the channels to twice the original number, giving the first space interaction feature map p1; p1 is multiplied with q1 and a 1×1 convolution again doubles the channels, giving the second space interaction feature map p2; iterating in this way, p(n-1) is finally multiplied with q(n-1) and passed through a 1×1 convolution layer whose input and output channel numbers are equal, giving the n-th space interaction feature map pn. Finally the input feature map Fα is added to pn to obtain the intermediate output feature map Fmid. The specific formulas are as follows:

F'α = LN1(Fα)
Fβ = Conv1(F'α)
[p0, q] = Split(Fβ)
Q = DWConv(q)
[q0, q1, …, q(n-1)] = Split(Q)
p(k+1) = Conv1(pk ⊗ qk), k = 0, 1, …, n-1

Fmid = Fα ⊕ pn

where Split(·) is a split along the channel dimension, DWConv(·) is a depth-separable convolution, Conv1(·) is a convolution layer with a convolution kernel size of 1×1 (doubling the channels for k < n-1 and keeping them unchanged for k = n-1), ⊗ is element-wise multiplication, and ⊕ is element-wise addition;
Step B42, construct the feed-forward module in the high-order space interaction module; its input is the feature map Fmid obtained in step B41. Fmid is layer-normalized, denoted LN2, then input to a two-layer fully-connected block, denoted MLP; the output of the fully-connected layers is added to the feature map Fmid to obtain the high-order space interaction feature Fhsi. The specific formula is as follows:

Fhsi = MLP(LN2(Fmid)) ⊕ Fmid
Step B43, construct the channel reduction module in the high-order space interaction module; its input is Fhsi obtained in step B42, which passes sequentially through a 1×1 convolution, BN layer and ReLU activation function to obtain the channel-reduced high-order space interaction feature map F'hsi. The specific formula is as follows:

F'hsi = ReLU(BN(Conv1(Fhsi)))

where Conv1(·) is a convolution layer with a convolution kernel size of 1×1, BN(·) is the batch normalization operation, and ReLU(·) is the activation function;
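Steps B41-B43 together form one high-order space interaction block. The sketch below follows the reconstructed formulas; the 7×7 depthwise kernel, the doubling channel schedule starting at C/2^(n-1) (as in gnConv), the GELU activation and the 4× MLP expansion are assumptions where the text leaves the hyper-parameters open.

```python
class GatedConv(nn.Module):
    """Gating convolution of step B41 (gnConv-style, interaction order n)."""
    def __init__(self, ch, n=3):
        super().__init__()
        self.n = n
        self.dims = [ch // 2 ** i for i in range(n - 1, -1, -1)]  # e.g. [C/4, C/2, C]
        self.norm = nn.LayerNorm(ch)                              # LN1 over channels
        self.proj_in = nn.Conv2d(ch, 2 * ch, 1)                   # double the channels
        qc = 2 * ch - self.dims[0]
        self.dwconv = nn.Conv2d(qc, qc, 7, padding=3, groups=qc)  # depthwise conv
        # 1x1 convs doubling the channels after each interaction; the last keeps them.
        self.pconvs = nn.ModuleList(
            [nn.Conv2d(self.dims[k], self.dims[k + 1], 1) for k in range(n - 1)]
            + [nn.Conv2d(self.dims[-1], self.dims[-1], 1)])
    def forward(self, x):
        y = self.norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)  # F'a
        y = self.proj_in(y)                                       # Fb
        p, q = torch.split(y, [self.dims[0], y.shape[1] - self.dims[0]], dim=1)
        qs = torch.split(self.dwconv(q), self.dims, dim=1)        # q0 .. q(n-1)
        for k in range(self.n):
            p = self.pconvs[k](p * qs[k])                         # p(k+1)
        return x + p                                              # Fmid

class HSIM(nn.Module):
    """Gating convolution + feed-forward + channel reduction (steps B41-B43)."""
    def __init__(self, ch, out_ch=64, n=3):
        super().__init__()
        self.gconv = GatedConv(ch, n)
        self.norm2 = nn.LayerNorm(ch)                             # LN2
        # Two-layer MLP realized with 1x1 convolutions (hidden width assumed 4x).
        self.mlp = nn.Sequential(nn.Conv2d(ch, 4 * ch, 1), nn.GELU(),
                                 nn.Conv2d(4 * ch, ch, 1))
        self.reduce = nn.Sequential(nn.Conv2d(ch, out_ch, 1),
                                    nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
    def forward(self, x):
        f_mid = self.gconv(x)
        y = self.norm2(f_mid.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        f_hsi = f_mid + self.mlp(y)                               # Fhsi
        return self.reduce(f_hsi)                                 # F'hsi
```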
Step B44, first construct the convolution block in the context aggregation module, whose inputs are two feature maps of different scales, Flow and Fhigh. Fhigh is upsampled by bilinear interpolation so that its width and height match those of Flow; it is then concatenated with Flow along the channel dimension and passed sequentially through a 1×1 convolution, BN layer and ReLU activation function, giving feature map Fcat. Fcat is split equally along the channel dimension into four feature maps Fc1, Fc2, Fc3 and Fc4. Fc1 and Fc2 are added and passed sequentially through a 3×3 convolution, BN layer and ReLU activation function, giving F'c1. F'c1, Fc2 and Fc3 are added and passed through a 3×3 convolution with dilation rate 2, BN layer and ReLU activation function, giving F'c2. F'c2, Fc3 and Fc4 are added and passed through a 3×3 convolution with dilation rate 3, BN layer and ReLU activation function, giving F'c3. F'c3 and Fc4 are added and passed through a 3×3 convolution with dilation rate 4, BN layer and ReLU activation function, giving F'c4. F'c1, F'c2, F'c3 and F'c4 are concatenated along the channel dimension and passed sequentially through a 1×1 convolution, BN layer and ReLU activation function, giving F'cat. Finally Fcat and F'cat are added and passed sequentially through a 3×3 convolution, BN layer and ReLU activation function to obtain the context feature map Fctx. The specific formulas are as follows:

Fcat = ReLU(BN(Conv1(Concat(Flow, Up(Fhigh)))))
[Fc1, Fc2, Fc3, Fc4] = Split(Fcat)
F'c1 = ReLU(BN(Conv3(Fc1 ⊕ Fc2)))
F'c2 = ReLU(BN(Conv3_d=2(F'c1 ⊕ Fc2 ⊕ Fc3)))
F'c3 = ReLU(BN(Conv3_d=3(F'c2 ⊕ Fc3 ⊕ Fc4)))
F'c4 = ReLU(BN(Conv3_d=4(F'c3 ⊕ Fc4)))
F'cat = ReLU(BN(Conv1(Concat(F'c1, F'c2, F'c3, F'c4))))
Fctx = ReLU(BN(Conv3(Fcat ⊕ F'cat)))

where Up(·) is a bilinear interpolation upsampling operation, Concat(·,·) is a concatenation operation along the channel dimension, ⊕ is element-wise addition, Conv3(·) is a convolution layer with a convolution kernel size of 3×3, Conv3_d=i(·) is a 3×3 convolution with dilation rate i, Conv1(·) is a convolution layer with a convolution kernel size of 1×1, BN(·) is a batch normalization operation, ReLU(·) is an activation function, and Split(·) is an equal split operation along the channel dimension.
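A sketch of the context aggregation module of step B44 under the reconstructed formulas (the channel width ch is assumed to be divisible by 4):

```python
class ContextAggregationModule(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        def cbr(cin, cout, k=3, d=1):
            # Conv + BN + ReLU with dilation d and "same" padding.
            return nn.Sequential(
                nn.Conv2d(cin, cout, k, padding=d * (k // 2), dilation=d),
                nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
        self.pre = cbr(2 * ch, ch, k=1)        # 1x1 fuse after concatenation
        c = ch // 4
        self.b1 = cbr(c, c, d=1)               # 3x3
        self.b2 = cbr(c, c, d=2)               # 3x3, dilation 2
        self.b3 = cbr(c, c, d=3)               # 3x3, dilation 3
        self.b4 = cbr(c, c, d=4)               # 3x3, dilation 4
        self.post = cbr(ch, ch, k=1)
        self.out = cbr(ch, ch, d=1)
    def forward(self, f_low, f_high):
        f_high = F.interpolate(f_high, size=f_low.shape[2:],
                               mode="bilinear", align_corners=False)
        f_cat = self.pre(torch.cat([f_low, f_high], dim=1))        # Fcat
        c1, c2, c3, c4 = torch.chunk(f_cat, 4, dim=1)
        y1 = self.b1(c1 + c2)                                      # F'c1
        y2 = self.b2(y1 + c2 + c3)                                 # F'c2
        y3 = self.b3(y2 + c3 + c4)                                 # F'c3
        y4 = self.b4(y3 + c4)                                      # F'c4
        f_cat2 = self.post(torch.cat([y1, y2, y3, y4], dim=1))     # F'cat
        return self.out(f_cat + f_cat2)                            # Fctx
```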
In a preferred embodiment, the step B5 is implemented as follows:
Step B5, design the camouflage target detection network based on edge feature fusion and high-order space interaction, comprising the edge perception module, edge feature fusion module, edge enhancement module, high-order space interaction module and context aggregation module. An original image is input, and the backbone network of step B1 produces four feature maps of different scales, F1, F2, F3 and F4. F1 and F4 are input to the edge perception module of step B2, giving the edge feature map Fe and the edge mask Me. Three edge enhancement modules of step B3 are then constructed, denoted EEM1, EEM2 and EEM3: the inputs of EEM1 are the fourth-stage feature map F4 extracted in step B1 and the edge mask Me obtained in step B2, and its output is the edge-enhanced feature Fee1; the inputs of EEM2 are the third-stage feature map F3 and the edge mask Me, and its output is Fee2; the inputs of EEM3 are the second-stage feature map F2 and the edge mask Me, and its output is Fee3. The edge feature fusion module of step B3 is then constructed; its inputs are the first-stage feature map F1 extracted in step B1 and the edge feature map Fe and edge mask Me obtained in step B2, and its output is the fused-edge feature map Fefm. Next, four high-order space interaction modules of step B4 are constructed, denoted HSIM1, HSIM2, HSIM3 and HSIM4; their inputs are the feature maps Fee1, Fee2, Fee3 and Fefm obtained in step B3, respectively, and their outputs are F'hsi1, F'hsi2, F'hsi3 and F'hsi4. Immediately after, three context aggregation modules of step B4 are constructed, denoted CAM1, CAM2 and CAM3: the inputs of CAM1 are the feature maps F'hsi1 and F'hsi2, and its output is the context feature map Fctx1; the inputs of CAM2 are the output Fctx1 of CAM1 and the feature map F'hsi3, and its output is the context feature map Fctx2; the inputs of CAM3 are the output Fctx2 of CAM2 and the feature map F'hsi4, and its output is the context feature map Fctx3. The edge mask Me is upsampled by bilinear interpolation with a factor of 4 to obtain the final edge mask Medge. The context feature map Fctx1 is compressed to 1 channel by a 1×1 convolution and upsampled by bilinear interpolation with a factor of 16 to obtain the first-stage camouflage target mask M1; Fctx2 is compressed to 1 channel by a 1×1 convolution and upsampled with a factor of 8 to obtain the second-stage camouflage target mask M2; Fctx3 is compressed to 1 channel by a 1×1 convolution and upsampled with a factor of 4 to obtain the final camouflage target mask M3. The specific formulas are as follows:

Medge = Up_scale=4(Me)
M1 = Up_scale=16(Conv1(Fctx1))
M2 = Up_scale=8(Conv1(Fctx2))
M3 = Up_scale=4(Conv1(Fctx3))

where Up_scale=4(·) is bilinear interpolation upsampling by a factor of 4, Up_scale=8(·) is bilinear interpolation upsampling by a factor of 8, Up_scale=16(·) is bilinear interpolation upsampling by a factor of 16, and Conv1(·) is a convolution layer with a convolution kernel size of 1×1 and an output channel number of 1.
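Assembling the pieces of step B5, and assuming all module classes from the previous sketches together with a common 64-channel width (both hypothetical), the network wiring reduces roughly to:

```python
class CamouflageDetector(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.backbone = timm.create_model("res2net50_26w_4s", pretrained=True,
                                          features_only=True, out_indices=(1, 2, 3, 4))
        self.epm = EdgePerceptionModule()
        self.eem = nn.ModuleList([EdgeEnhancementModule(c) for c in (2048, 1024, 512)])
        self.effm = EdgeFeatureFusionModule()
        self.hsim = nn.ModuleList([HSIM(c, ch) for c in (2048, 1024, 512, 256)])
        self.cam = nn.ModuleList([ContextAggregationModule(ch) for _ in range(3)])
        self.heads = nn.ModuleList([nn.Conv2d(ch, 1, 1) for _ in range(3)])
    def forward(self, x):
        f1, f2, f3, f4 = self.backbone(x)
        fe, me = self.epm(f1, f4)                       # edge features and mask
        fee = [self.eem[i](f, me) for i, f in enumerate((f4, f3, f2))]
        fefm = self.effm(f1, fe, me)                    # fused-edge features
        h = [self.hsim[i](f) for i, f in enumerate(fee + [fefm])]
        ctx1 = self.cam[0](h[1], h[0])                  # CAM1
        ctx2 = self.cam[1](h[2], ctx1)                  # CAM2
        ctx3 = self.cam[2](h[3], ctx2)                  # CAM3
        up = lambda t, s: F.interpolate(t, scale_factor=s,
                                        mode="bilinear", align_corners=False)
        return (up(me, 4),                              # Medge
                up(self.heads[0](ctx1), 16),            # M1
                up(self.heads[1](ctx2), 8),             # M2
                up(self.heads[2](ctx3), 4))             # M3
```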
In a preferred embodiment, the step C is implemented as follows:
Step C, design a loss function as a constraint to optimize the camouflage target detection network based on edge feature fusion and high-order space interaction. The specific formula is as follows:

Ltotal = Σ_{i=1..3} [ LwBCE(Mi, Gcamo) + LwIoU(Mi, Gcamo) ] + λ · LDice(Medge, Gedge)

where Gcamo represents the label image corresponding to the original image I, Gedge represents the edge label image corresponding to the original image I, Ltotal denotes the total loss function, LwBCE denotes the weighted binary cross-entropy loss, LwIoU denotes the weighted intersection-over-union loss, LDice denotes the Dice coefficient loss, and λ denotes the weight of the edge loss term.
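A sketch of this loss under the reconstruction above is shown below; the boundary-aware pixel weighting used for the weighted BCE and IoU terms, and the weight λ = 3, are common choices assumed here rather than values stated in the text. It reuses the torch imports of the previous sketches.

```python
def weighted_bce_iou(pred, gt):
    """Weighted BCE + weighted IoU with boundary-aware pixel weights (assumed scheme)."""
    # Pixels near the object boundary receive larger weights.
    weit = 1 + 5 * torch.abs(F.avg_pool2d(gt, 31, stride=1, padding=15) - gt)
    bce = F.binary_cross_entropy_with_logits(pred, gt, reduction="none")
    wbce = (weit * bce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))
    p = torch.sigmoid(pred)
    inter = ((p * gt) * weit).sum(dim=(2, 3))
    union = ((p + gt) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()

def dice_loss(pred, gt, eps=1.0):
    p = torch.sigmoid(pred)
    inter = (p * gt).sum(dim=(2, 3))
    return (1 - (2 * inter + eps) / (p.sum(dim=(2, 3)) + gt.sum(dim=(2, 3)) + eps)).mean()

def total_loss(m_edge, masks, g_camo, g_edge, lam=3.0):
    """Ltotal = sum_i [wBCE + wIoU](Mi, Gcamo) + lam * Dice(Medge, Gedge)."""
    return sum(weighted_bce_iou(m, g_camo) for m in masks) + lam * dice_loss(m_edge, g_edge)
```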
In a preferred embodiment, the step D is implemented as follows:
Step D1, randomly divide the training data set obtained in step A into several batches, each batch containing N pairs of images;

Step D2, input an original image I; after it passes through the camouflage target detection network based on edge feature fusion and high-order space interaction of step B, the edge mask Medge and the camouflage target masks M1, M2 and M3 are obtained, and the loss Ltotal is calculated with the formula in step C;

Step D3, according to the loss, compute the gradients of the parameters in the network by back-propagation, and update the network parameters with the Adam optimization method;

Step D4, repeat steps D1 to D3 batch by batch until the objective loss function value of the network converges to a Nash equilibrium, and save the network parameters to obtain the camouflage target detection model based on edge feature fusion and high-order space interaction; for a tested camouflage target image, the highest-resolution of the three camouflage target masks predicted by the model, M3, is taken as the final camouflage target mask.
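The training procedure of steps D1-D4 then reduces to a standard supervised loop; the batch size, learning rate, epoch count, dataset interface and checkpoint name below are all assumptions.

```python
from torch.utils.data import DataLoader

def train(model, dataset, epochs=100, batch_size=16, lr=1e-4, device="cuda"):
    model.to(device).train()
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)  # step D1
    opt = torch.optim.Adam(model.parameters(), lr=lr)                  # step D3 (Adam)
    for epoch in range(epochs):                                        # step D4
        for image, g_camo, g_edge in loader:   # assumed (I, Gcamo, Gedge) tensor triplets
            image, g_camo, g_edge = (t.to(device) for t in (image, g_camo, g_edge))
            m_edge, m1, m2, m3 = model(image)                          # step D2
            loss = total_loss(m_edge, (m1, m2, m3), g_camo, g_edge)
            opt.zero_grad()
            loss.backward()
            opt.step()
    torch.save(model.state_dict(), "cod_efhsi.pth")  # hypothetical checkpoint name
    # At test time, the highest-resolution mask M3 is the final prediction.
```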
Compared with the prior art, the invention has the following beneficial effects: building on good edge information, the invention fuses the edge information with the main features and performs high-order space interaction on the fused features, so that the relationship between the camouflage target and the background in an image can be better learned. In the proposed camouflage target detection method based on edge feature fusion and high-order space interaction, the edge perception module generates edge features and an edge mask, the edge enhancement module and the edge feature fusion module fuse the edge information, the high-order space interaction module performs high-order space interaction on the fused features, and the context aggregation module finally aggregates features of different levels, so that a high-quality camouflage target mask is output.
Drawings
FIG. 1 is a flow chart of an implementation of the method in a preferred embodiment of the invention.
FIG. 2 is a block diagram of a camouflage object detection network based on edge feature fusion and higher order spatial interaction in a preferred embodiment of the invention.
Fig. 3 is a block diagram of an edge-aware module in a preferred embodiment of the present invention.
Fig. 4 is a block diagram of an edge enhancement module in a preferred embodiment of the present invention.
Fig. 5 is a block diagram of an edge feature fusion module in a preferred embodiment of the invention.
Fig. 6 is a block diagram of a high-order spatial interaction module in a preferred embodiment of the present invention.
FIG. 7 is a block diagram of a context aggregation module in accordance with a preferred embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for describing particular embodiments only and is not intended to limit the example embodiments according to the present application; as used herein, the singular is intended to include the plural unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices and components, and/or combinations thereof.
The invention provides a camouflage target detection method based on edge feature fusion and high-order space interaction, which is shown in fig. 1-7 and comprises the following steps:
step A, data preprocessing, including data pairing and data enhancement processing, is carried out, and a training data set is obtained;
step B, designing a camouflage target detection network based on edge feature fusion and high-order space interaction, wherein the network comprises an edge perception module, an edge enhancement module, an edge feature fusion module, a high-order space interaction module and a context aggregation module;
c, designing a loss function, and guiding parameter optimization of the network designed in the step B;
step D, training the camouflage target detection network based on the edge feature fusion and the high-order space interaction in the step B by using the training data set obtained in the step A, converging to Nash balance, and obtaining a trained camouflage target detection model based on the edge feature fusion and the high-order space interaction;
and E, inputting the image to be detected into a trained camouflage target detection model based on edge feature fusion and high-order space interaction, and outputting a mask image of the camouflage target.
Further, the step a includes the steps of:
and A1, forming an image triplet by each original image, the corresponding label image and the corresponding edge label image.
Step A2, randomly turning left and right, randomly cutting and randomly rotating each group of image triples; performing color enhancement on the original image, and adjusting the brightness, contrast, saturation and definition of the original image by setting random values as parameters; and adding random black points or white points as random noise to the label image corresponding to the original image.
And A3, scaling each image in the data set into images with the same size of H multiplied by W.
Further, the step B includes the steps of:
and B1, constructing an image feature extraction network, and extracting image features by using the constructed network.
And B2, designing an edge perception module, and generating an edge mask and edge characteristics by using the designed module.
And B3, designing an edge enhancement module and an edge feature fusion module, enhancing the feature representation with camouflage target edge structure semantics by using the edge enhancement module, and generating features of fusion edge information by using the edge feature fusion module.
And B4, constructing a high-order space interaction module and a context aggregation module, inhibiting the attention to the background by using the high-order space interaction module, promoting the attention to the foreground, and mining context semantics by using the context aggregation module to enhance object detection.
And B5, designing a camouflage target detection network based on edge feature fusion and high-order space interaction, wherein the camouflage target detection network comprises an edge perception module, an edge feature fusion module, an edge enhancement module, a high-order space interaction module and a context aggregation module, and generating a final camouflage target mask by using the designed network.
Further, step B1 includes the steps of:
Step B1, take Res2Net-50 as the backbone network and extract features from an input original image I of size H×W×3. Specifically, the feature maps output by the original image I at the first, second, third and fourth stages are denoted F1, F2, F3 and F4 respectively, where feature map F1 has size (H/4)×(W/4)×C, feature map F2 has size (H/8)×(W/8)×2C, feature map F3 has size (H/16)×(W/16)×4C, and feature map F4 has size (H/32)×(W/32)×8C, with C=256.
Further, as shown in fig. 3, step B2 includes the steps of:

Step B21, design an edge perception module; the inputs of the edge perception module are the first-stage feature map F1 and the fourth-stage feature map F4 extracted in step B1, and the outputs of the module are an edge feature map Fe and an edge mask Me.

Step B22, design the feature fusion block in the edge perception module. The inputs of the block are the feature maps F1 and F4 extracted in step B1. The input feature map F1 passes sequentially through a 1×1 convolution, BN layer and ReLU activation function to reduce the channel number, giving feature map F'1; the input feature map F4 passes sequentially through a 1×1 convolution, BN layer and ReLU activation function to reduce the channel number, giving feature map F'4. The width and height of F'4 are adjusted by bilinear interpolation to the same width and height as F'1, giving feature map F''4. F'1 and F''4 are concatenated along the channel dimension and passed through a channel attention module to obtain the edge feature map Fe. The specific formulas are as follows:

F'1 = ReLU(BN(Conv1(F1)))
F'4 = ReLU(BN(Conv1(F4)))
F''4 = Up(F'4)
Fe = SE(Concat(F'1, F''4))

where Conv1(·) is a convolution layer with a convolution kernel size of 1×1, BN(·) is a batch normalization operation, ReLU(·) is the ReLU activation function, Up(·) is bilinear interpolation upsampling, Concat(·,·) is a concatenation operation along the channel dimension, and SE(·) is the channel attention module.

Step B23, design the convolution block in the edge perception module. The edge feature map Fe obtained in step B22 passes sequentially through a 3×3 convolution, BN layer, ReLU activation function, a second 3×3 convolution, BN layer, ReLU activation function, and a 1×1 convolution, finally generating the edge mask Me. The specific formula is as follows:

Me = Conv1(ReLU(BN(Conv3(ReLU(BN(Conv3(Fe)))))))

where Conv3(·) is a convolution layer with a convolution kernel size of 3×3, BN(·) is the batch normalization operation, ReLU(·) is the activation function, and Conv1(·) is a convolution with a convolution kernel size of 1×1.
Further, as shown in fig. 4 and 5, step B3 includes the steps of:

Step B31, design the edge enhancement module, starting with the edge guiding operation in the edge enhancement module. The inputs are the edge mask Me obtained in step B2 and a feature map Fi obtained in step B1. The input edge mask Me is downsampled by bilinear interpolation to the same width and height as the feature map Fi, giving mask M'e. The mask M'e is multiplied element-wise with the feature map Fi, the result is added to Fi, and the sum passes sequentially through a 3×3 convolution, BN layer and ReLU activation function to obtain the edge-guided feature map Fguide. The specific formulas are as follows:

M'e = Down(Me)
Fguide = ReLU(BN(Conv3((M'e ⊗ Fi) ⊕ Fi)))

where Down(·) is a bilinear interpolation downsampling operation, ⊗ is element-wise multiplication, ⊕ is element-wise addition, Conv3(·) is a convolution layer with a convolution kernel size of 3×3, BN(·) is a batch normalization operation, and ReLU(·) is an activation function.

Step B32, construct a CBAM attention sub-module in the edge enhancement module, consisting of serial channel attention SE and spatial attention SA. The input is the feature map Fguide obtained in step B31, and the output is the edge-enhanced feature Fee. The specific formula is as follows:

Fee = SA(SE(Fguide))

where SE(·) is the channel attention module and SA(·) is the spatial attention module.

Step B33, design the edge feature fusion module; its inputs are the first-stage feature map F1 extracted in step B1 and the edge feature map Fe and edge mask Me obtained in step B2. The edge mask Me is multiplied element-wise with the feature map F1 and the result is added to F1, giving feature map FM. The edge feature map Fe passes sequentially through a 3×3 convolution, BN layer and ReLU activation function, giving the channel-reduced feature map F'e. FM and F'e are concatenated along the channel dimension, passed sequentially through a 3×3 convolution, Swish activation function, SE module and 3×3 convolution, and added to the feature map F'e, giving feature map F''e. The feature map F''e passes through an SE module, is concatenated with F''e along the channel dimension, and then undergoes a 3×3 convolution, giving feature map F'''e. Finally the feature map F'''e is added to the feature map F1 to obtain the feature map of the fused edge information Fefm. The specific formulas are as follows:

FM = (Me ⊗ F1) ⊕ F1
F'e = ReLU(BN(Conv3(Fe)))
F''e = Conv3(SE(Swish(Conv3(Concat(FM, F'e))))) ⊕ F'e
F'''e = Conv3(Concat(SE(F''e), F''e))
Fefm = F'''e ⊕ F1

where ⊗ is element-wise multiplication, ⊕ is element-wise addition, Conv3(·) is a convolution layer with a convolution kernel size of 3×3, BN(·) is a batch normalization operation, ReLU(·) is an activation function, Swish(·) is the Swish activation function, SE(·) is the channel attention module, and Concat(·,·) is a concatenation operation along the channel dimension.
Further, as shown in fig. 6 and 7, step B4 includes the steps of:

Step B41, first construct the gating convolution module in the high-order space interaction module, and denote the feature map input to the module by Fα. The input feature map Fα is layer-normalized (LN1), giving the normalized feature map F'α; a 1×1 convolution then expands its channels to twice the original number, giving feature map Fβ. Fβ is split along the channel dimension into two feature maps p0 and q; q is input to a depth-separable convolution, giving feature map Q, which is split into n feature maps q0, q1, …, q(n-1), where n is the order. The feature map p0 is multiplied with the feature map q0 and a 1×1 convolution expands the channels to twice the original number, giving the first space interaction feature map p1; p1 is multiplied with q1 and a 1×1 convolution again doubles the channels, giving the second space interaction feature map p2; iterating in this way, p(n-1) is finally multiplied with q(n-1) and passed through a 1×1 convolution layer whose input and output channel numbers are equal, giving the n-th space interaction feature map pn. Finally the input feature map Fα is added to pn to obtain the intermediate output feature map Fmid. The specific formulas are as follows:

F'α = LN1(Fα)
Fβ = Conv1(F'α)
[p0, q] = Split(Fβ)
Q = DWConv(q)
[q0, q1, …, q(n-1)] = Split(Q)
p(k+1) = Conv1(pk ⊗ qk), k = 0, 1, …, n-1

Fmid = Fα ⊕ pn

where Split(·) is a split along the channel dimension, DWConv(·) is a depth-separable convolution, Conv1(·) is a convolution layer with a convolution kernel size of 1×1 (doubling the channels for k < n-1 and keeping them unchanged for k = n-1), ⊗ is element-wise multiplication, and ⊕ is element-wise addition.

Step B42, construct the feed-forward module in the high-order space interaction module; its input is the feature map Fmid obtained in step B41. Fmid is layer-normalized (denoted LN2) and then input to a two-layer fully-connected block (denoted MLP); the output of the fully-connected layers is added to the feature map Fmid to obtain the high-order space interaction feature Fhsi. The specific formula is as follows:

Fhsi = MLP(LN2(Fmid)) ⊕ Fmid

Step B43, construct the channel reduction module in the high-order space interaction module; its input is Fhsi obtained in step B42, which passes sequentially through a 1×1 convolution, BN layer and ReLU activation function to obtain the channel-reduced high-order space interaction feature map F'hsi. The specific formula is as follows:

F'hsi = ReLU(BN(Conv1(Fhsi)))

where Conv1(·) is a convolution layer with a convolution kernel size of 1×1, BN(·) is the batch normalization operation, and ReLU(·) is the activation function.

Step B44, first construct the convolution block in the context aggregation module, whose inputs are two feature maps of different scales, Flow and Fhigh. Fhigh is upsampled by bilinear interpolation so that its width and height match those of Flow; it is then concatenated with Flow along the channel dimension and passed sequentially through a 1×1 convolution, BN layer and ReLU activation function, giving feature map Fcat. Fcat is split equally along the channel dimension into four feature maps Fc1, Fc2, Fc3 and Fc4. Fc1 and Fc2 are added and passed sequentially through a 3×3 convolution, BN layer and ReLU activation function, giving F'c1. F'c1, Fc2 and Fc3 are added and passed through a 3×3 convolution with dilation rate 2, BN layer and ReLU activation function, giving F'c2. F'c2, Fc3 and Fc4 are added and passed through a 3×3 convolution with dilation rate 3, BN layer and ReLU activation function, giving F'c3. F'c3 and Fc4 are added and passed through a 3×3 convolution with dilation rate 4, BN layer and ReLU activation function, giving F'c4. F'c1, F'c2, F'c3 and F'c4 are concatenated along the channel dimension and passed sequentially through a 1×1 convolution, BN layer and ReLU activation function, giving F'cat. Finally Fcat and F'cat are added and passed sequentially through a 3×3 convolution, BN layer and ReLU activation function to obtain the context feature map Fctx. The specific formulas are as follows:

Fcat = ReLU(BN(Conv1(Concat(Flow, Up(Fhigh)))))
[Fc1, Fc2, Fc3, Fc4] = Split(Fcat)
F'c1 = ReLU(BN(Conv3(Fc1 ⊕ Fc2)))
F'c2 = ReLU(BN(Conv3_d=2(F'c1 ⊕ Fc2 ⊕ Fc3)))
F'c3 = ReLU(BN(Conv3_d=3(F'c2 ⊕ Fc3 ⊕ Fc4)))
F'c4 = ReLU(BN(Conv3_d=4(F'c3 ⊕ Fc4)))
F'cat = ReLU(BN(Conv1(Concat(F'c1, F'c2, F'c3, F'c4))))
Fctx = ReLU(BN(Conv3(Fcat ⊕ F'cat)))

where Up(·) is a bilinear interpolation upsampling operation, Concat(·,·) is a concatenation operation along the channel dimension, ⊕ is element-wise addition, Conv3(·) is a convolution layer with a convolution kernel size of 3×3, Conv3_d=i(·) is a 3×3 convolution with dilation rate i, Conv1(·) is a convolution layer with a convolution kernel size of 1×1, BN(·) is a batch normalization operation, ReLU(·) is an activation function, and Split(·) is an equal split operation along the channel dimension.
Further, as shown in fig. 2, step B5 includes the steps of:

Step B5, design the camouflage target detection network based on edge feature fusion and high-order space interaction, comprising the edge perception module, edge feature fusion module, edge enhancement module, high-order space interaction module and context aggregation module. An original image is input, and the backbone network of step B1 produces four feature maps of different scales, F1, F2, F3 and F4. F1 and F4 are input to the edge perception module of step B2, giving the edge feature map Fe and the edge mask Me. Three edge enhancement modules of step B3 are then constructed, denoted EEM1, EEM2 and EEM3: the inputs of EEM1 are the fourth-stage feature map F4 extracted in step B1 and the edge mask Me obtained in step B2, and its output is the edge-enhanced feature Fee1; the inputs of EEM2 are the third-stage feature map F3 and the edge mask Me, and its output is Fee2; the inputs of EEM3 are the second-stage feature map F2 and the edge mask Me, and its output is Fee3. The edge feature fusion module of step B3 is then constructed; its inputs are the first-stage feature map F1 extracted in step B1 and the edge feature map Fe and edge mask Me obtained in step B2, and its output is the fused-edge feature map Fefm. Next, four high-order space interaction modules of step B4 are constructed, denoted HSIM1, HSIM2, HSIM3 and HSIM4; their inputs are the feature maps Fee1, Fee2, Fee3 and Fefm obtained in step B3, respectively, and their outputs are F'hsi1, F'hsi2, F'hsi3 and F'hsi4. Immediately after, three context aggregation modules of step B4 are constructed, denoted CAM1, CAM2 and CAM3: the inputs of CAM1 are the feature maps F'hsi1 and F'hsi2, and its output is the context feature map Fctx1; the inputs of CAM2 are the output Fctx1 of CAM1 and the feature map F'hsi3, and its output is the context feature map Fctx2; the inputs of CAM3 are the output Fctx2 of CAM2 and the feature map F'hsi4, and its output is the context feature map Fctx3. The edge mask Me is upsampled by bilinear interpolation with a factor of 4 to obtain the final edge mask Medge. The context feature map Fctx1 is compressed to 1 channel by a 1×1 convolution and upsampled by bilinear interpolation with a factor of 16 to obtain the first-stage camouflage target mask M1; Fctx2 is compressed to 1 channel by a 1×1 convolution and upsampled with a factor of 8 to obtain the second-stage camouflage target mask M2; Fctx3 is compressed to 1 channel by a 1×1 convolution and upsampled with a factor of 4 to obtain the final camouflage target mask M3. The specific formulas are as follows:

Medge = Up_scale=4(Me)
M1 = Up_scale=16(Conv1(Fctx1))
M2 = Up_scale=8(Conv1(Fctx2))
M3 = Up_scale=4(Conv1(Fctx3))

where Up_scale=4(·) is bilinear interpolation upsampling by a factor of 4, Up_scale=8(·) is bilinear interpolation upsampling by a factor of 8, Up_scale=16(·) is bilinear interpolation upsampling by a factor of 16, and Conv1(·) is a convolution layer with a convolution kernel size of 1×1 and an output channel number of 1.
Further, step C comprises the steps of:

Step C, design a loss function as a constraint to optimize the camouflage target detection network based on edge feature fusion and high-order space interaction. The specific formula is as follows:

Ltotal = Σ_{i=1..3} [ LwBCE(Mi, Gcamo) + LwIoU(Mi, Gcamo) ] + λ · LDice(Medge, Gedge)

where Gcamo represents the label image corresponding to the original image I, Gedge represents the edge label image corresponding to the original image I, Ltotal denotes the total loss function, LwBCE denotes the weighted binary cross-entropy loss, LwIoU denotes the weighted intersection-over-union loss, LDice denotes the Dice coefficient loss, and λ denotes the weight of the edge loss term.
Further, the step D is implemented as follows:

Step D1, randomly divide the training data set obtained in step A into several batches, each batch containing N pairs of images.

Step D2, input an original image I; after it passes through the camouflage target detection network based on edge feature fusion and high-order space interaction of step B, the edge mask Medge and the camouflage target masks M1, M2 and M3 are obtained, and the loss Ltotal is calculated with the formula in step C.

Step D3, according to the loss, compute the gradients of the parameters in the network by back-propagation, and update the network parameters with the Adam optimization method.

Step D4, repeat steps D1 to D3 batch by batch until the objective loss function value of the network converges to a Nash equilibrium, and save the network parameters to obtain the camouflage target detection model based on edge feature fusion and high-order space interaction. For a tested camouflage target image, the highest-resolution of the three camouflage target masks predicted by the model, M3, is taken as the final camouflage target mask.
The above is a preferred embodiment of the present invention, and all changes made according to the technical solution of the present invention belong to the protection scope of the present invention when the generated functional effects do not exceed the scope of the technical solution of the present invention.
Claims (10)
1. The camouflage target detection method based on edge feature fusion and high-order space interaction is characterized by comprising the following steps of:
step A, data preprocessing, including data pairing and data enhancement processing, is carried out, and a training data set is obtained;
step B, designing a camouflage target detection network based on edge feature fusion and high-order space interaction, wherein the camouflage target detection network consists of an edge perception module, an edge enhancement module, an edge feature fusion module, a high-order space interaction module and a context aggregation module;
step C, designing a loss function, and guiding the parameter optimization of the network designed in step B;

step D, training the camouflage target detection network based on edge feature fusion and high-order space interaction of step B with the training data set obtained in step A until it converges to a Nash equilibrium, obtaining a trained camouflage target detection model based on edge feature fusion and high-order space interaction;

and step E, inputting the image to be detected into the trained camouflage target detection model based on edge feature fusion and high-order space interaction, and outputting a mask image of the camouflage target.
2. The method for detecting the camouflage target based on the edge feature fusion and the higher-order spatial interaction according to claim 1, wherein the specific implementation steps of the step A are as follows:
a1, forming an image triplet by each original image, a label image corresponding to the original image and an edge label image;
step A2, randomly turning left and right, randomly cutting and randomly rotating each group of image triples; performing color enhancement on the original image, and adjusting the brightness, contrast, saturation and definition of the original image by setting random values as parameters; adding random black points or white points as random noise to the label image corresponding to the original image;
And A3, scaling each image in the data set into images with the same size of H multiplied by W.
3. The method for detecting the camouflage target based on the edge feature fusion and the higher-order spatial interaction according to claim 1, wherein the specific implementation steps of the step B are as follows:
step B1, constructing an image feature extraction network, and extracting image features by using the constructed network;
step B2, designing an edge perception module, and generating an edge mask and edge characteristics by using the designed module;
step B3, designing an edge enhancement module and an edge feature fusion module, enhancing the feature representation with camouflage target edge structure semantics by using the edge enhancement module, and generating features of fusion edge information by using the edge feature fusion module;
step B4, constructing a high-order space interaction module and a context aggregation module, using the high-order space interaction module to inhibit the attention to the background and promote the attention to the foreground, and using the context aggregation module to mine context semantics to enhance object detection;
and B5, designing a camouflage target detection network based on edge feature fusion and high-order space interaction, wherein the camouflage target detection network comprises an edge perception module, an edge feature fusion module, an edge enhancement module, a high-order space interaction module and a context aggregation module, and generating a final camouflage target mask by using the designed network.
4. The method for detecting a camouflage target based on edge feature fusion and higher-order spatial interaction according to claim 3, wherein the step B1 is specifically implemented as follows:
step B1, taking Res2Net-50 as the backbone network and extracting features from an input original image I of size H×W×3; specifically, the feature maps output by the original image I at the first, second, third and fourth stages are denoted F1, F2, F3 and F4 respectively, where feature map F1 has size (H/4)×(W/4)×C, feature map F2 has size (H/8)×(W/8)×2C, feature map F3 has size (H/16)×(W/16)×4C, feature map F4 has size (H/32)×(W/32)×8C, and C=256.
5. The method for detecting a camouflage target based on edge feature fusion and higher-order spatial interaction according to claim 3, wherein the step B2 is specifically implemented as follows:
step B21, designing an edge perception module, wherein the inputs of the edge perception module are the first-stage feature map F1 and the fourth-stage feature map F4 extracted in step B1, and the edge perception module outputs an edge feature map Fe and an edge mask Me;

step B22, designing a feature fusion block in the edge perception module; the inputs of the edge perception module are the feature maps F1 and F4 extracted in step B1; the input feature map F1 passes sequentially through a 1×1 convolution, BN layer and ReLU activation function to reduce the channel number, giving feature map F'1; the input feature map F4 passes sequentially through a 1×1 convolution, BN layer and ReLU activation function to reduce the channel number, giving feature map F'4; the width and height of F'4 are adjusted by bilinear interpolation to the same width and height as F'1, giving feature map F''4; F'1 and F''4 are concatenated along the channel dimension and passed through a channel attention module to obtain the edge feature map Fe; the specific formulas are as follows:

F'1 = ReLU(BN(Conv1(F1)))
F'4 = ReLU(BN(Conv1(F4)))
F''4 = Up(F'4)
Fe = SE(Concat(F'1, F''4))

wherein Conv1(·) is a convolution layer with a convolution kernel size of 1×1, BN(·) is a batch normalization operation, ReLU(·) is the ReLU activation function, Up(·) is bilinear interpolation upsampling, Concat(·,·) is a concatenation operation along the channel dimension, and SE(·) is the channel attention module;
step B23, designing a convolution block in the edge perception module; the edge feature map F_e obtained in step B22 passes sequentially through a 3×3 convolution, BN layer and ReLU activation function, a second 3×3 convolution, BN layer and ReLU activation function, and finally a 1×1 convolution to generate the edge mask M_e; the specific formula is as follows:

M_e = Conv1(ReLU(BN(Conv3(ReLU(BN(Conv3(F_e)))))))

wherein Conv3(·) is a convolution layer with kernel size 3×3, BN(·) is a batch normalization operation, ReLU(·) is the activation function, and Conv1(·) is a convolution with kernel size 1×1.
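A compact PyTorch sketch of steps B22 and B23. The reduced channel width (64 here) is a hypothetical choice, and SE(·) is realized as a standard squeeze-and-excitation block, which is one common reading of "channel attention module":

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SE(nn.Module):
    """Squeeze-and-excitation channel attention (one common choice for SE(.))."""
    def __init__(self, c, r=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(c, c // r), nn.ReLU(inplace=True),
            nn.Linear(c // r, c), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))           # squeeze: global average pool
        return x * w.unsqueeze(-1).unsqueeze(-1)  # excite: per-channel reweight

def cbr(cin, cout, k):
    """Conv-BN-ReLU with 'same' padding."""
    return nn.Sequential(nn.Conv2d(cin, cout, k, padding=k // 2),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class EdgePerception(nn.Module):
    """Steps B22-B23: fuse F1 and F4 into edge features Fe and edge mask Me."""
    def __init__(self, c1=256, c4=2048, mid=64):  # mid is a hypothetical width
        super().__init__()
        self.reduce1 = cbr(c1, mid, 1)            # F1 -> F1'
        self.reduce4 = cbr(c4, mid, 1)            # F4 -> F4'
        self.se = SE(2 * mid)
        self.mask_head = nn.Sequential(           # two Conv3-BN-ReLU, then 1x1
            cbr(2 * mid, mid, 3), cbr(mid, mid, 3), nn.Conv2d(mid, 1, 1))

    def forward(self, f1, f4):
        f1p = self.reduce1(f1)
        f4p = F.interpolate(self.reduce4(f4), size=f1p.shape[2:],
                            mode="bilinear", align_corners=False)  # Up(F4')
        fe = self.se(torch.cat([f1p, f4p], dim=1))  # edge features Fe
        return fe, self.mask_head(fe)               # Fe, Me (mask logits)
```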
6. The method for detecting a camouflage target based on edge feature fusion and higher-order spatial interaction according to claim 3, wherein the step B3 is specifically implemented as follows:
Step B31, designing an edge enhancement module, namely first designing the edge guiding operation in the edge enhancement module; the inputs are the edge mask M_e obtained in step B2 and a feature map F_i obtained in step B1; the input edge mask M_e is downsampled by bilinear interpolation to the width and height of the feature map F_i, giving a mask M′_e; the mask M′_e is multiplied with the feature map F_i, the result is added to F_i, and the sum passes sequentially through a 3×3 convolution, BN layer and ReLU activation function to obtain the edge-guided feature map F_guide; the specific formulas are as follows:

M′_e = Down(M_e)
F_guide = ReLU(BN(Conv3((M′_e ⊗ F_i) ⊕ F_i)))

wherein Down(·) is a bilinear interpolation downsampling operation, ⊗ is element-wise multiplication, ⊕ is element-wise addition, Conv3(·) is a convolution layer with kernel size 3×3, BN(·) is a batch normalization operation, and ReLU(·) is the activation function;
step B32, constructing a CBAM attention sub-module in the edge enhancement module, the sub-module consisting of channel attention SE and spatial attention SA in series; its input is the feature map F_guide obtained in step B31, and it outputs the edge-enhanced features F_ee; the specific formula is as follows:

F_ee = SA(SE(F_guide))

wherein SE(·) is a channel attention module and SA(·) is a spatial attention module;
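A sketch of steps B31 and B32, reusing SE and cbr from the step-B22 sketch above; the CBAM-style spatial attention realization and the sigmoid applied to the mask logits are assumptions, since the claim only states multiplication and addition:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
# SE and cbr as defined in the step-B22 sketch above.

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention over channel-pooled descriptors."""
    def __init__(self, k=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2)

    def forward(self, x):
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.max(dim=1, keepdim=True)[0]], dim=1)
        return x * torch.sigmoid(self.conv(s))

class EdgeEnhance(nn.Module):
    """Steps B31-B32: edge guiding followed by serial SE and SA attention."""
    def __init__(self, c):
        super().__init__()
        self.fuse = cbr(c, c, 3)  # Conv3-BN-ReLU after guiding
        self.se = SE(c)
        self.sa = SpatialAttention()

    def forward(self, fi, me):
        # Resize mask logits to Fi; the sigmoid squash is an assumption.
        m = F.interpolate(me, size=fi.shape[2:], mode="bilinear",
                          align_corners=False)
        f_guide = self.fuse(torch.sigmoid(m) * fi + fi)  # gate + residual
        return self.sa(self.se(f_guide))                 # F_ee
```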
step B33, designing an edge feature fusion module, whose inputs are the first-stage feature map F_1 extracted in step B1 and the edge feature map F_e and edge mask M_e obtained in step B2; the edge mask M_e is multiplied with the feature map F_1 and the result is added to F_1, giving a feature map F_M; the edge feature map F_e passes sequentially through a 3×3 convolution, BN layer and ReLU activation function, giving a channel-reduced feature map F′_e; F_M and F′_e are concatenated along the channel dimension, passed sequentially through a 3×3 convolution, a Swish activation function, an SE module and a 3×3 convolution, and added to the feature map F′_e, giving a feature map F″_e; the feature map F′_e is passed through an SE module, concatenated with F″_e along the channel dimension, and then passed through a 3×3 convolution to obtain a feature map F‴_e; finally, the feature map F‴_e is added to the feature map F_1 to obtain the feature map of the final fused edge information F_fuse; the specific formulas are as follows:

F_M = (M_e ⊗ F_1) ⊕ F_1
F′_e = ReLU(BN(Conv3(F_e)))
F″_e = Conv3(SE(Swish(Conv3(Concat(F_M, F′_e))))) ⊕ F′_e
F‴_e = Conv3(Concat(SE(F′_e), F″_e))
F_fuse = F‴_e ⊕ F_1

wherein ⊗ is element-wise multiplication, ⊕ is element-wise addition, Conv3(·) is a convolution layer with kernel size 3×3, BN(·) is a batch normalization operation, ReLU(·) is the activation function, Swish(·) is the Swish activation function, SE(·) is a channel attention module, and Concat(·,·) is a concatenation operation along the channel dimension.
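A sketch of the step-B33 fusion path as reconstructed above, again reusing SE and cbr from the step-B22 sketch; the channel widths (chosen so the residual additions line up) and the sigmoid on the mask logits are assumptions:

```python
import torch
import torch.nn as nn
# SE and cbr as defined in the step-B22 sketch above.

class EdgeFeatureFusion(nn.Module):
    """Step B33: inject edge features Fe and mask Me into F1 (widths assumed)."""
    def __init__(self, c1=256, ce=128):
        super().__init__()
        self.reduce = cbr(ce, c1, 3)                     # Fe -> Fe'
        self.mix = nn.Sequential(
            nn.Conv2d(2 * c1, c1, 3, padding=1), nn.SiLU(),  # Swish == SiLU
            SE(c1), nn.Conv2d(c1, c1, 3, padding=1))
        self.se2 = SE(c1)
        self.out = nn.Conv2d(2 * c1, c1, 3, padding=1)

    def forward(self, f1, fe, me):
        fm = torch.sigmoid(me) * f1 + f1                 # F_M (sigmoid assumed)
        fep = self.reduce(fe)                            # Fe'
        fee = self.mix(torch.cat([fm, fep], dim=1)) + fep        # Fe''
        feo = self.out(torch.cat([self.se2(fep), fee], dim=1))   # Fe'''
        return feo + f1                                  # F_fuse
```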
7. The method for detecting a camouflage target based on edge feature fusion and higher-order spatial interaction according to claim 3, wherein the step B4 is specifically implemented as follows:
Step B41, first constructing a gated convolution module in the high-order spatial interaction module; denote the feature map input to the module as F_α; the input feature map F_α is layer-normalized (LN_1), giving a normalized feature map F′_α; F′_α then passes through a 1×1 convolution that doubles the channel number, giving a feature map F″_α; F″_α is split along the channel dimension into two feature maps p_0 and q; q is input to a depth-separable convolution, giving a feature map Q; Q is split into n feature maps q_0, q_1, …, q_{n-1}, where n is the order; the feature map p_0 is multiplied with the feature map q_0 and the channel number is doubled by a 1×1 convolution, giving the first spatial interaction feature map p_1; the feature map p_1 is multiplied with the feature map q_1 and the channel number is doubled by a 1×1 convolution, giving the second spatial interaction feature map p_2; iterating in this way until the feature maps p_{n-1} and q_{n-1} are multiplied and passed through a 1×1 convolution layer whose input and output channel numbers are equal, giving the n-th order spatial interaction feature map p_n; finally, the input feature map F_α and p_n are added, giving the intermediate output feature map F_mid; the specific formulas are as follows:

F′_α = LN_1(F_α)
[p_0, q] = Split(Conv1(F′_α))
Q = DWConv(q)
[q_0, q_1, …, q_{n-1}] = Split(Q)
p_{k+1} = Conv1(p_k ⊗ q_k), k = 0, 1, …, n-1
F_mid = F_α ⊕ p_n

wherein Split(·) is a split along the channel dimension, DWConv(·) is a depth-separable convolution, Conv1(·) is a convolution layer with kernel size 1×1, ⊗ is element-wise multiplication, and ⊕ is element-wise addition;
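The iteration above matches recursive gated convolution (gnConv) as popularized by HorNet. A runnable PyTorch sketch follows; the 7×7 depthwise kernel and the doubling channel schedule [dim/2^(n-1), …, dim] are assumptions (the claim fixes neither), and dim must be divisible by 2^(n-1). The residual addition of F_α is applied by the surrounding block (see the HSIM sketch after step B43).

```python
import torch
import torch.nn as nn

class GatedConv(nn.Module):
    """Step B41 in the spirit of gnConv (HorNet): n-order gated interactions."""
    def __init__(self, dim, order=3):
        super().__init__()
        self.order = order
        self.dims = [dim // 2 ** i for i in range(order)][::-1]  # p_k widths
        self.proj_in = nn.Conv2d(dim, 2 * dim, 1)        # double the channels
        self.dwconv = nn.Conv2d(sum(self.dims), sum(self.dims), 7,
                                padding=3, groups=sum(self.dims))  # DWConv
        self.pws = nn.ModuleList(
            nn.Conv2d(self.dims[i], self.dims[i + 1], 1)
            for i in range(order - 1))                   # 1x1, doubling width
        self.proj_out = nn.Conv2d(dim, dim, 1)           # same-width 1x1

    def forward(self, x):
        y = self.proj_in(x)                              # 2*dim channels
        p, q = torch.split(y, (self.dims[0], sum(self.dims)), dim=1)
        qs = torch.split(self.dwconv(q), self.dims, dim=1)  # q_0 .. q_{n-1}
        for i in range(self.order):                      # p_{k+1}=Conv1(p_k*q_k)
            p = p * qs[i]
            p = self.pws[i](p) if i < self.order - 1 else self.proj_out(p)
        return p                                         # p_n

# e.g. GatedConv(64, order=3)(torch.randn(1, 64, 44, 44)) keeps the shape.
```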
step B42, constructing a feedforward module in the high-order spatial interaction module, whose input is the feature map F_mid obtained in step B41; F_mid is layer-normalized (denoted LN_2) and then input to a two-layer fully-connected block, denoted MLP; the output of the MLP is added to the feature map F_mid, giving the high-order spatial interaction features F_hsi; the specific formula is as follows:

F_hsi = MLP(LN_2(F_mid)) ⊕ F_mid
step B43, constructing a channel-reduction module in the high-order spatial interaction module, whose input is the F_hsi obtained in step B42; F_hsi passes sequentially through a 1×1 convolution, BN layer and ReLU activation function, giving the channel-reduced high-order spatial interaction feature map F′_hsi; the specific formula is as follows:

F′_hsi = ReLU(BN(Conv1(F_hsi)))

wherein Conv1(·) is a convolution layer with kernel size 1×1, BN(·) is a batch normalization operation, and ReLU(·) is the activation function;
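Putting steps B41 to B43 together, a block-level sketch reusing GatedConv from the step-B41 sketch above; the GroupNorm(1, ·) stand-in for layer normalization on NCHW maps, the GELU activation, the MLP expansion ratio and the reduced width out_dim are all assumptions:

```python
import torch
import torch.nn as nn
# GatedConv as defined in the step-B41 sketch above.

class HSIM(nn.Module):
    """Steps B41-B43: LN -> gated conv -> residual, LN -> MLP -> residual,
    then a 1x1 Conv-BN-ReLU channel reduction."""
    def __init__(self, dim, out_dim=64, order=3, mlp_ratio=4):
        super().__init__()
        self.ln1 = nn.GroupNorm(1, dim)  # channel-wise LayerNorm stand-in
        self.gc = GatedConv(dim, order)
        self.ln2 = nn.GroupNorm(1, dim)
        self.mlp = nn.Sequential(        # two fully-connected layers as 1x1s
            nn.Conv2d(dim, mlp_ratio * dim, 1), nn.GELU(),
            nn.Conv2d(mlp_ratio * dim, dim, 1))
        self.reduce = nn.Sequential(nn.Conv2d(dim, out_dim, 1),
                                    nn.BatchNorm2d(out_dim),
                                    nn.ReLU(inplace=True))

    def forward(self, x):
        x = x + self.gc(self.ln1(x))     # F_mid  (step B41)
        x = x + self.mlp(self.ln2(x))    # F_hsi  (step B42)
        return self.reduce(x)            # F_hsi' (step B43)
```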
step B44, first constructing a convolution block in the context aggregation module; denote the two input feature maps of different scales as F_low and F_high; first, the feature map F_high is upsampled by bilinear interpolation so that its width and height match those of F_low, it is concatenated with F_low along the channel dimension, and the result passes sequentially through a 1×1 convolution, BN layer and ReLU activation function, giving a feature map F_cat; F_cat is then split equally along the channel dimension into four feature maps F_cat^1, F_cat^2, F_cat^3 and F_cat^4; F_cat^1 and F_cat^2 are added and passed sequentially through a 3×3 convolution, BN layer and ReLU activation function, giving a feature map F_d1; F_d1, F_cat^2 and F_cat^3 are added and passed sequentially through a 3×3 convolution with dilation rate 2, BN layer and ReLU activation function, giving a feature map F_d2; F_d2, F_cat^3 and F_cat^4 are added and passed sequentially through a 3×3 convolution with dilation rate 3, BN layer and ReLU activation function, giving a feature map F_d3; F_d3 and F_cat^4 are added and passed through a 3×3 convolution with dilation rate 4, BN layer and ReLU activation function, giving a feature map F_d4; then F_d1, F_d2, F_d3 and F_d4 are concatenated along the channel dimension and passed sequentially through a 1×1 convolution, BN layer and ReLU activation function, giving a feature map F′_cat; finally, F_cat and F′_cat are added and passed sequentially through a 3×3 convolution, BN layer and ReLU activation function, giving the context feature map F_ctx; the specific formulas are as follows:

F_cat = ReLU(BN(Conv1(Concat(F_low, Up(F_high)))))
[F_cat^1, F_cat^2, F_cat^3, F_cat^4] = Split(F_cat)
F_d1 = ReLU(BN(Conv3(F_cat^1 ⊕ F_cat^2)))
F_d2 = ReLU(BN(Conv3_{d=2}(F_d1 ⊕ F_cat^2 ⊕ F_cat^3)))
F_d3 = ReLU(BN(Conv3_{d=3}(F_d2 ⊕ F_cat^3 ⊕ F_cat^4)))
F_d4 = ReLU(BN(Conv3_{d=4}(F_d3 ⊕ F_cat^4)))
F′_cat = ReLU(BN(Conv1(Concat(F_d1, F_d2, F_d3, F_d4))))
F_ctx = ReLU(BN(Conv3(F_cat ⊕ F′_cat)))

wherein Up(·) is a bilinear interpolation upsampling operation, Concat(·,·) is a concatenation operation along the channel dimension, ⊕ is element-wise addition, Conv3(·) is a convolution layer with kernel size 3×3, Conv3_{d=i}(·) is a 3×3 convolution with dilation rate i, Conv1(·) is a convolution layer with kernel size 1×1, BN(·) is a batch normalization operation, ReLU(·) is the activation function, and Split(·) is an equal split operation along the channel dimension.
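A sketch of step B44; the membership of the three-way sums follows the reconstruction above and should be read as an assumption, and the working width c (which must be divisible by 4) is hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextAggregation(nn.Module):
    """Step B44: cross-scale fusion with cascaded dilated 3x3 branches."""
    def __init__(self, c=64):  # c must be divisible by 4
        super().__init__()
        def cbr(cin, cout, k, d=1):
            return nn.Sequential(
                nn.Conv2d(cin, cout, k, padding=d * (k // 2), dilation=d),
                nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
        self.entry = cbr(2 * c, c, 1)             # 1x1 after concat
        self.b1 = cbr(c // 4, c // 4, 3, d=1)     # dilation 1
        self.b2 = cbr(c // 4, c // 4, 3, d=2)     # dilation 2
        self.b3 = cbr(c // 4, c // 4, 3, d=3)     # dilation 3
        self.b4 = cbr(c // 4, c // 4, 3, d=4)     # dilation 4
        self.merge = cbr(c, c, 1)
        self.out = cbr(c, c, 3)

    def forward(self, f_low, f_high):
        up = F.interpolate(f_high, size=f_low.shape[2:], mode="bilinear",
                           align_corners=False)
        fcat = self.entry(torch.cat([f_low, up], dim=1))   # F_cat
        x1, x2, x3, x4 = torch.chunk(fcat, 4, dim=1)       # equal split
        d1 = self.b1(x1 + x2)
        d2 = self.b2(d1 + x2 + x3)
        d3 = self.b3(d2 + x3 + x4)
        d4 = self.b4(d3 + x4)
        fcat2 = self.merge(torch.cat([d1, d2, d3, d4], dim=1))  # F'_cat
        return self.out(fcat + fcat2)                      # F_ctx
```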
8. The method for detecting a camouflage target based on edge feature fusion and higher-order spatial interaction according to claim 3, wherein the step B5 is specifically implemented as follows:
step B5, designing a camouflage target detection network based on edge feature fusion and high-order spatial interaction, comprising an edge perception module, an edge feature fusion module, edge enhancement modules, high-order spatial interaction modules and context aggregation modules; an original image is input and the backbone network of step B1 produces four feature maps of different scales F_1, F_2, F_3 and F_4; F_1 and F_4 are input to the edge perception module of step B2, giving the edge feature map F_e and the edge mask M_e; three edge enhancement modules of step B3 are then constructed, denoted EEM_1, EEM_2 and EEM_3, wherein the input of EEM_1 is the fourth-stage feature map F_4 extracted in step B1 and the edge mask M_e obtained in step B2, and its output is the edge-enhanced features F_ee^1; the input of EEM_2 is the third-stage feature map F_3 extracted in step B1 and the edge mask M_e obtained in step B2, and its output is the edge-enhanced features F_ee^2; the input of EEM_3 is the second-stage feature map F_2 extracted in step B1 and the edge mask M_e obtained in step B2, and its output is the edge-enhanced features F_ee^3; the edge feature fusion module of step B3 is then constructed, whose inputs are the first-stage feature map F_1 extracted in step B1 and the edge feature map F_e and edge mask M_e obtained in step B2, and whose output is the feature map of fused edge information F_fuse; next, four high-order spatial interaction modules of step B4 are constructed, denoted HSIM_1, HSIM_2, HSIM_3 and HSIM_4, whose inputs are the feature maps F_ee^1, F_ee^2, F_ee^3 and F_fuse obtained in step B3 and whose outputs are F′_hsi^1, F′_hsi^2, F′_hsi^3 and F′_hsi^4 respectively; then three context aggregation modules of step B4 are constructed, denoted CAM_1, CAM_2 and CAM_3, wherein the inputs of CAM_1 are the feature maps F′_hsi^1 and F′_hsi^2 and its output is the context feature map F_ctx^1; the inputs of CAM_2 are the output F_ctx^1 of CAM_1 and the feature map F′_hsi^3, and its output is the context feature map F_ctx^2; the inputs of CAM_3 are the output F_ctx^2 of CAM_2 and the feature map F′_hsi^4, and its output is the context feature map F_ctx^3; the edge mask M_e is upsampled by bilinear interpolation with factor 4 to obtain the final edge mask M_edge; the context feature map F_ctx^1 is compressed to 1 channel by a 1×1 convolution and then upsampled by bilinear interpolation with factor 16 to obtain the first-stage camouflage target mask M_camo^1; the context feature map F_ctx^2 is compressed to 1 channel by a 1×1 convolution and then upsampled by bilinear interpolation with factor 8 to obtain the second-stage camouflage target mask M_camo^2; the context feature map F_ctx^3 is compressed to 1 channel by a 1×1 convolution and then upsampled by bilinear interpolation with factor 4 to obtain the final camouflage target mask M_camo^3; the specific formulas are as follows:

M_edge = Up_{scale=4}(M_e)
M_camo^1 = Up_{scale=16}(Conv1(F_ctx^1))
M_camo^2 = Up_{scale=8}(Conv1(F_ctx^2))
M_camo^3 = Up_{scale=4}(Conv1(F_ctx^3))

wherein Up_{scale=4}(·) is bilinear interpolation upsampling with factor 4, Up_{scale=8}(·) is bilinear interpolation upsampling with factor 8, Up_{scale=16}(·) is bilinear interpolation upsampling with factor 16, and Conv1(·) is a convolution layer with kernel size 1×1 and output channel number 1.
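A small sketch of the prediction heads described above (1×1 compression to one channel plus fixed-factor bilinear upsampling); the channel width c is whatever the context aggregation modules emit:

```python
import torch.nn as nn
import torch.nn.functional as F

class MaskHead(nn.Module):
    """1x1 conv to a single channel, then bilinear upsampling to full size."""
    def __init__(self, c, scale):
        super().__init__()
        self.conv = nn.Conv2d(c, 1, 1)
        self.scale = scale

    def forward(self, x):
        return F.interpolate(self.conv(x), scale_factor=self.scale,
                             mode="bilinear", align_corners=False)

# head1 = MaskHead(c, 16)  # on F_ctx^1 -> M_camo^1
# head2 = MaskHead(c, 8)   # on F_ctx^2 -> M_camo^2
# head3 = MaskHead(c, 4)   # on F_ctx^3 -> M_camo^3 (final mask)
# M_edge is plain upsampling of M_e by factor 4, with no extra convolution.
```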
9. The method for detecting the camouflage target based on the edge feature fusion and the higher-order spatial interaction according to claim 1, wherein the specific implementation step of the step C is as follows:
step C, designing a loss function as a constraint to optimize the camouflage target detection network based on edge feature fusion and high-order spatial interaction; the specific formula is as follows:

L_total = Σ_{i=1}^{3} (L_wBCE(M_camo^i, G_camo) + L_wIoU(M_camo^i, G_camo)) + λ·L_Dice(M_edge, G_edge)

wherein G_camo represents the label image corresponding to the original image I, G_edge represents the edge label image corresponding to the original image I, L_total is the total loss function, L_wBCE is the weighted binary cross-entropy loss, L_wIoU is the weighted intersection-over-union (IoU) loss, L_Dice is the Dice coefficient loss, and λ is the weight of the edge loss.
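A sketch of one plausible realization of this loss. The F3Net-style boundary weighting inside the weighted BCE/IoU terms and the value of λ are assumptions; the claim fixes neither:

```python
import torch
import torch.nn.functional as F

def weighted_bce_iou(logits, gt):
    """Weighted BCE + weighted IoU (F3Net-style weighting, an assumption)."""
    # Pixels whose local average differs from their label (i.e. near
    # structural boundaries) receive up to 6x weight.
    w = 1 + 5 * torch.abs(F.avg_pool2d(gt, 31, stride=1, padding=15) - gt)
    bce = F.binary_cross_entropy_with_logits(logits, gt, reduction="none")
    wbce = (w * bce).sum(dim=(2, 3)) / w.sum(dim=(2, 3))
    p = torch.sigmoid(logits)
    inter = ((p * gt) * w).sum(dim=(2, 3))
    union = ((p + gt) * w).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()

def dice_loss(logits, gt, eps=1.0):
    p = torch.sigmoid(logits)
    inter = (p * gt).sum(dim=(2, 3))
    return (1 - (2 * inter + eps) /
            (p.sum(dim=(2, 3)) + gt.sum(dim=(2, 3)) + eps)).mean()

def total_loss(camo_logits, g_camo, edge_logits, g_edge, lam=3.0):
    """camo_logits: the three stage predictions; lam is hypothetical."""
    l = sum(weighted_bce_iou(m, g_camo) for m in camo_logits)
    return l + lam * dice_loss(edge_logits, g_edge)
```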
10. The method for detecting the camouflage target based on the edge feature fusion and the higher-order spatial interaction according to claim 1, wherein the specific implementation step of the step D is as follows:
step D1, randomly dividing the training data set obtained in the step A into a plurality of batches, wherein each batch comprises N pairs of images;
Step D2, inputting an original image I; after the image passes through the camouflage target detection network based on edge feature fusion and high-order spatial interaction of step B, the network outputs the edge mask M_edge and the camouflage target masks M_camo^1, M_camo^2 and M_camo^3, and the loss L_total is calculated using the formula in step C;
Step D3, calculating the gradient of the parameters in the network by using a back propagation method according to the loss, and updating the network parameters by using an Adam optimization method;
step D4, repeatedly executing steps D1 to D3 batch by batch until the objective loss function value of the network converges, and saving the network parameters to obtain the camouflage target detection model based on edge feature fusion and high-order spatial interaction; for a test camouflage target image, the highest-resolution of the three camouflage target masks predicted by the model, M_camo^3, is taken as the final camouflage target mask.
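Putting steps D1 to D4 together, a minimal training-loop sketch; the loader yielding (image, mask label, edge label) triples, the epoch count, learning rate, checkpoint name and the model's output ordering are all hypothetical, and total_loss is the sketch from step C:

```python
import torch

def train(model, loader, epochs=40, lr=1e-4):
    """Minimal realization of steps D1-D4 (hyperparameters hypothetical)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # Adam, per step D3
    for epoch in range(epochs):
        for img, g_camo, g_edge in loader:        # step D1: one batch
            m_edge, m1, m2, m3 = model(img)       # step D2: forward pass
            loss = total_loss([m1, m2, m3], g_camo, m_edge, g_edge)
            opt.zero_grad()
            loss.backward()                       # step D3: backpropagation
            opt.step()                            # Adam parameter update
    torch.save(model.state_dict(), "codnet.pth")  # step D4: save parameters
```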
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310356445.2A CN116310693A (en) | 2023-04-06 | 2023-04-06 | Camouflage target detection method based on edge feature fusion and high-order space interaction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116310693A true CN116310693A (en) | 2023-06-23 |
Family
ID=86824077
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310356445.2A Pending CN116310693A (en) | 2023-04-06 | 2023-04-06 | Camouflage target detection method based on edge feature fusion and high-order space interaction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116310693A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116563313A (en) * | 2023-07-11 | 2023-08-08 | 安徽大学 | Remote sensing image soybean planting region segmentation method based on gating and attention fusion |
CN116563313B (en) * | 2023-07-11 | 2023-09-19 | 安徽大学 | Remote sensing image soybean planting region segmentation method based on gating and attention fusion |
CN117095180A (en) * | 2023-09-01 | 2023-11-21 | 武汉互创联合科技有限公司 | Embryo development stage prediction and quality assessment method based on stage identification |
CN117095180B (en) * | 2023-09-01 | 2024-04-19 | 武汉互创联合科技有限公司 | Embryo development stage prediction and quality assessment method based on stage identification |
CN117593517A (en) * | 2024-01-19 | 2024-02-23 | 南京信息工程大学 | Camouflage target detection method based on complementary perception cross-view fusion network |
CN117593517B (en) * | 2024-01-19 | 2024-04-16 | 南京信息工程大学 | Camouflage target detection method based on complementary perception cross-view fusion network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Shao et al. | Feature learning for image classification via multiobjective genetic programming | |
CN116310693A (en) | Camouflage target detection method based on edge feature fusion and high-order space interaction | |
Alshdaifat et al. | Improved deep learning framework for fish segmentation in underwater videos | |
CN111242841B (en) | Image background style migration method based on semantic segmentation and deep learning | |
CN112614077B (en) | Unsupervised low-illumination image enhancement method based on generation countermeasure network | |
CN112598643B (en) | Depth fake image detection and model training method, device, equipment and medium | |
CN113221639A (en) | Micro-expression recognition method for representative AU (AU) region extraction based on multitask learning | |
CN113870124B (en) | Weak supervision-based double-network mutual excitation learning shadow removing method | |
Xu et al. | Instance segmentation of biological images using graph convolutional network | |
Su et al. | Multi‐scale cross‐path concatenation residual network for Poisson denoising | |
CN111062329A (en) | Unsupervised pedestrian re-identification method based on augmented network | |
CN112052877A (en) | Image fine-grained classification method based on cascade enhanced network | |
Qu et al. | Visual cross-image fusion using deep neural networks for image edge detection | |
Zheng et al. | Differential-evolution-based generative adversarial networks for edge detection | |
Xu et al. | AutoSegNet: An automated neural network for image segmentation | |
Zhang et al. | MultiResolution attention extractor for small object detection | |
CN116402851A (en) | Infrared dim target tracking method under complex background | |
Li et al. | Findnet: Can you find me? boundary-and-texture enhancement network for camouflaged object detection | |
Zhu et al. | A novel simple visual tracking algorithm based on hashing and deep learning | |
CN112801092B (en) | Method for detecting character elements in natural scene image | |
Dai et al. | DFN-PSAN: Multi-level deep information feature fusion extraction network for interpretable plant disease classification | |
CN109284765A (en) | The scene image classification method of convolutional neural networks based on negative value feature | |
CN116740362B (en) | Attention-based lightweight asymmetric scene semantic segmentation method and system | |
Yan et al. | Joint image-to-image translation with denoising using enhanced generative adversarial networks | |
Zhang et al. | Object tracking in siamese network with attention mechanism and Mish function |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |