CN116310693A - Camouflage target detection method based on edge feature fusion and high-order space interaction - Google Patents
- Publication number
- CN116310693A (application number CN202310356445.2A)
- Authority
- CN
- China
- Prior art keywords
- edge
- module
- convolution
- feature map
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention relates to a camouflage target detection method based on edge feature fusion and high-order space interaction, comprising the following steps: step A, performing data preprocessing, including data pairing and data enhancement, to obtain a training data set; step B, designing a camouflage target detection network based on edge feature fusion and high-order space interaction, the network comprising an edge perception module, an edge enhancement module, an edge feature fusion module, a high-order space interaction module and a context aggregation module; step C, designing a loss function to guide the parameter optimization of the network designed in step B; step D, training the camouflage target detection network of step B with the training data set obtained in step A until it converges to a Nash equilibrium, obtaining a trained camouflage target detection model based on edge feature fusion and high-order space interaction; and step E, inputting the image to be detected into the trained model and outputting a mask image of the camouflage target.
Description
Technical Field
The invention relates to the technical fields of image and video processing and computer vision, in particular to a camouflage target detection method based on edge feature fusion and high-order space interaction.
Background
With advances in technology, digital image processing has been widely applied in many aspects of social life as well as in scientific research. Camouflage target detection is an emerging digital image processing task that aims to accurately and efficiently detect a camouflage target embedded in its surroundings, segmenting an image into camouflage target and background so as to find the camouflage target. Camouflage is widespread in nature: organisms use their own structure and physiological characteristics to blend into the surrounding environment and evade predators. Camouflage target detection can help discover camouflaged organisms in nature and help scientists better study them. Its applications are broad: beyond academic value, it also supports search and detection of camouflaged targets in the military field, assessment of lesions in the medical field, and detection of locust invasions in agricultural remote sensing.
Early camouflage target detection methods distinguished camouflage targets from the background using hand-crafted low-level features such as color, texture, geometric gradients, frequency-domain cues and motion. However, most camouflage targets are very similar in color to the background, and color-based methods only handle cases where the object differs from the background in color. Texture-based methods work well when the colors are very close to the background, but perform poorly when the texture of the camouflage target resembles the background. Motion-based methods rely on motion information, locating a camouflage target from the changes in background color and texture produced by its movement; they are, however, strongly affected by interference, and illumination changes or background motion can cause missed and false detections. In short, camouflage target detection methods based on hand-designed features can achieve some success but often fail in complex scenes.
In recent years, as deep learning has been applied throughout the fields of computer vision, many camouflage target detection models based on convolutional neural networks have appeared. These models use strong feature extraction and autonomous learning capabilities to model camouflage target information, improving detection accuracy while enhancing generalization, with markedly better results than traditional camouflage detection methods. The mainstream approach feeds an image into a backbone network, extracts image features, and then predicts the camouflage target mask from those features. Such methods make full use of the semantic information of convolutional neural networks and enlarge the receptive field. However, because a camouflage target is highly similar to the background in color and texture, a convolutional model struggles to learn features that separate foreground from background. Other methods therefore introduce additional cues, such as edge information, to help the convolutional network better distinguish the camouflage target from the background; such extra information can effectively improve detection accuracy. The invention designs a camouflage target detection method based on edge feature fusion and high-order space interaction: image features are first extracted by a backbone network; an edge perception module is then designed to generate an edge mask and edge features; an edge enhancement module and an edge feature fusion module are designed; a high-order space interaction module and a context aggregation module are constructed; and the designed network finally generates the camouflage target mask.
Disclosure of Invention
In view of the above, the present invention aims to provide a camouflage target detection method based on edge feature fusion and higher-order spatial interaction, which is beneficial to significantly improving the performance of camouflage target detection by fusing edge features and performing higher-order spatial interaction.
In order to achieve the above purpose, the invention adopts the following technical scheme: a camouflage target detection method based on edge feature fusion and high-order space interaction comprises the following steps:
step A, data preprocessing, including data pairing and data enhancement processing, is carried out, and a training data set is obtained;
step B, designing a camouflage target detection network based on edge feature fusion and high-order space interaction, wherein the camouflage target detection network consists of an edge perception module, an edge enhancement module, an edge feature fusion module, a high-order space interaction module and a context aggregation module;
step C, designing a loss function, and guiding the parameter optimization of the network designed in step B;

step D, training the camouflage target detection network based on edge feature fusion and high-order space interaction of step B with the training data set obtained in step A until it converges to a Nash equilibrium, obtaining a trained camouflage target detection model based on edge feature fusion and high-order space interaction;

and step E, inputting the image to be detected into the trained camouflage target detection model based on edge feature fusion and high-order space interaction, and outputting a mask image of the camouflage target.
In a preferred embodiment, the step a is implemented as follows:
a1, forming an image triplet by each original image, a label image corresponding to the original image and an edge label image;
step A2, randomly turning left and right, randomly cutting and randomly rotating each group of image triples; performing color enhancement on the original image, and adjusting the brightness, contrast, saturation and definition of the original image by setting random values as parameters; adding random black points or white points as random noise to the label image corresponding to the original image;
and A3, scaling each image in the data set into images with the same size of H multiplied by W.
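By way of illustration, the data pairing and joint augmentation of steps A1-A3 can be sketched as follows. This is a minimal Python sketch assuming PIL and torchvision as the image toolkit; the function name, the augmentation ranges and the training resolution H = W = 416 are illustrative assumptions, not values fixed by the invention.

```python
import random

from PIL import ImageEnhance
import torchvision.transforms.functional as TF

H, W = 416, 416  # assumed training resolution (step A3 scales all images to H x W)

def augment_triplet(image, label, edge):
    """Jointly augment one (original, label, edge-label) image triplet (steps A1-A2)."""
    # Random left-right flip, applied identically to all three images.
    if random.random() < 0.5:
        image, label, edge = TF.hflip(image), TF.hflip(label), TF.hflip(edge)
    # Random rotation with a shared angle (random cropping would likewise share
    # its parameters, e.g. via transforms.RandomCrop.get_params).
    angle = random.uniform(-15, 15)
    image, label, edge = (TF.rotate(im, angle) for im in (image, label, edge))
    # Color enhancement on the original image only: brightness, contrast,
    # saturation and sharpness, each with a random factor.
    for enhancer in (ImageEnhance.Brightness, ImageEnhance.Contrast,
                     ImageEnhance.Color, ImageEnhance.Sharpness):
        image = enhancer(image).enhance(random.uniform(0.8, 1.2))
    # Random black or white points added as noise to the label image (assumed mode "L").
    label = label.copy()
    pixels = label.load()
    for _ in range(int(0.0005 * label.size[0] * label.size[1])):
        x, y = random.randrange(label.size[0]), random.randrange(label.size[1])
        pixels[x, y] = random.choice((0, 255))
    # Step A3: scale every image in the triplet to the same H x W size.
    return tuple(TF.resize(im, (H, W)) for im in (image, label, edge))
```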
In a preferred embodiment, the step B is implemented as follows:
step B1, constructing an image feature extraction network, and extracting image features by using the constructed network;
step B2, designing an edge perception module, and generating an edge mask and edge characteristics by using the designed module;
step B3, designing an edge enhancement module and an edge feature fusion module, enhancing the feature representation with camouflage target edge structure semantics by using the edge enhancement module, and generating features of fusion edge information by using the edge feature fusion module;
Step B4, constructing a high-order space interaction module and a context aggregation module, using the high-order space interaction module to inhibit the attention to the background and promote the attention to the foreground, and using the context aggregation module to mine context semantics to enhance object detection;
and B5, designing a camouflage target detection network based on edge feature fusion and high-order space interaction, wherein the camouflage target detection network comprises an edge perception module, an edge feature fusion module, an edge enhancement module, a high-order space interaction module and a context aggregation module, and generating a final camouflage target mask by using the designed network.
In a preferred embodiment, the step B1 is implemented as follows:
Step B1, take Res2Net-50 as the backbone network and extract features from an input original image I of size H×W×3. Specifically, the feature maps output by the original image I at the first, second, third and fourth stages are denoted F1, F2, F3 and F4 respectively, where feature map F1 has size (H/4)×(W/4)×C, feature map F2 has size (H/8)×(W/8)×2C, feature map F3 has size (H/16)×(W/16)×4C, feature map F4 has size (H/32)×(W/32)×8C, and C=256.
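A minimal sketch of this multi-scale extraction is given below. It assumes the timm implementation of Res2Net-50 ("res2net50_26w_4s") as a stand-in backbone; with features_only=True, the four residual stages expose exactly the strides (4, 8, 16, 32) and channel widths (C, 2C, 4C, 8C with C = 256) described above.

```python
import timm
import torch

# features_only exposes the intermediate feature maps of the backbone stages;
# out_indices=(1, 2, 3, 4) selects the four residual stages (strides 4..32).
backbone = timm.create_model("res2net50_26w_4s", pretrained=True,
                             features_only=True, out_indices=(1, 2, 3, 4))

x = torch.randn(1, 3, 416, 416)   # one H x W x 3 input image (H = W = 416 assumed)
f1, f2, f3, f4 = backbone(x)
print([tuple(f.shape) for f in (f1, f2, f3, f4)])
# [(1, 256, 104, 104), (1, 512, 52, 52), (1, 1024, 26, 26), (1, 2048, 13, 13)]
```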
In a preferred embodiment, the step B2 is implemented as follows:
Step B21, design an edge perception module; the inputs of the edge perception module are the first-stage feature map F1 and the fourth-stage feature map F4 extracted in step B1, and the edge perception module outputs an edge feature map Fe and an edge mask Me;

Step B22, design the feature fusion block in the edge perception module; the inputs of the edge perception module are the feature maps F1 and F4 extracted in step B1; the input feature map F1 passes sequentially through a 1×1 convolution, BN layer and ReLU activation function to reduce the channel number, giving feature map F'1; the input feature map F4 passes sequentially through a 1×1 convolution, BN layer and ReLU activation function to reduce the channel number, giving feature map F'4; the width and height of F'4 are adjusted by bilinear interpolation to the same width and height as F'1, giving feature map F''4; F'1 and F''4 are concatenated along the channel dimension and passed through a channel attention module to obtain the edge feature map Fe; the specific formulas are as follows:

F'1 = ReLU(BN(Conv1(F1)))
F'4 = ReLU(BN(Conv1(F4)))
F''4 = Up(F'4)
Fe = SE(Concat(F'1, F''4))

wherein Conv1(·) is a convolution layer with a convolution kernel size of 1×1, BN(·) is a batch normalization operation, ReLU(·) is the ReLU activation function, Up(·) is bilinear interpolation upsampling, Concat(·,·) is a concatenation operation along the channel dimension, and SE(·) is the channel attention module;
Step B23, design the convolution block in the edge perception module; the edge feature map Fe obtained in step B22 passes sequentially through a 3×3 convolution, BN layer, ReLU activation function, a second 3×3 convolution, BN layer, ReLU activation function, and a 1×1 convolution, finally generating the edge mask Me; the specific formula is as follows:

Me = Conv1(ReLU(BN(Conv3(ReLU(BN(Conv3(Fe)))))))

where Conv3(·) is a convolution layer with a convolution kernel size of 3×3, BN(·) is the batch normalization operation, ReLU(·) is the activation function, and Conv1(·) is a convolution with a convolution kernel size of 1×1.
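Steps B21-B23 can be condensed into the following sketch. It is a minimal PyTorch rendering of the edge perception module, assuming a standard squeeze-and-excitation layer for the channel attention SE and an internal channel width of 64; both are assumptions, since the text does not fix these hyper-parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SE(nn.Module):
    """Standard squeeze-and-excitation channel attention (assumed form of SE)."""
    def __init__(self, ch, r=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // r, ch, 1), nn.Sigmoid())
    def forward(self, x):
        return x * self.fc(x)

class EdgePerceptionModule(nn.Module):
    def __init__(self, c1=256, c4=2048, mid=64):
        super().__init__()
        # 1x1 conv + BN + ReLU branches reducing the channels of F1 and F4.
        self.reduce1 = nn.Sequential(nn.Conv2d(c1, mid, 1),
                                     nn.BatchNorm2d(mid), nn.ReLU(inplace=True))
        self.reduce4 = nn.Sequential(nn.Conv2d(c4, mid, 1),
                                     nn.BatchNorm2d(mid), nn.ReLU(inplace=True))
        self.se = SE(2 * mid)
        # Convolution block: (3x3 conv, BN, ReLU) x 2, then 1x1 conv -> 1-channel mask.
        self.head = nn.Sequential(
            nn.Conv2d(2 * mid, mid, 3, padding=1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, 1, 1))
    def forward(self, f1, f4):
        f1r = self.reduce1(f1)                                         # F'1
        f4r = self.reduce4(f4)                                         # F'4
        f4r = F.interpolate(f4r, size=f1r.shape[2:],
                            mode="bilinear", align_corners=False)      # F''4
        fe = self.se(torch.cat([f1r, f4r], dim=1))                     # Fe
        me = self.head(fe)                                             # edge mask Me
        return fe, me
```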
In a preferred embodiment, the step B3 is implemented as follows:
Step B31, design the edge enhancement module, starting with the edge guiding operation in the edge enhancement module; the inputs are the edge mask Me obtained in step B2 and a feature map Fi obtained in step B1; the input edge mask Me is downsampled by bilinear interpolation to the same width and height as the feature map Fi, giving mask M'e; the mask M'e is multiplied element-wise with the feature map Fi, the result is added to Fi, and the sum passes sequentially through a 3×3 convolution, BN layer and ReLU activation function to obtain the edge-guided feature map Fguide; the specific formulas are as follows:

M'e = Down(Me)
Fguide = ReLU(BN(Conv3((M'e ⊗ Fi) ⊕ Fi)))

where Down(·) is a bilinear interpolation downsampling operation, ⊗ is element-wise multiplication, ⊕ is element-wise addition, Conv3(·) is a convolution layer with a convolution kernel size of 3×3, BN(·) is a batch normalization operation, and ReLU(·) is an activation function;
Step B32, construct a CBAM attention sub-module in the edge enhancement module, consisting of serial channel attention SE and spatial attention SA; the input feature map is the feature map Fguide obtained in step B31, and the output is the edge-enhanced feature Fee; the specific formula is as follows:

Fee = SA(SE(Fguide))

wherein SE(·) is the channel attention module and SA(·) is the spatial attention module;
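A corresponding sketch of the edge enhancement module of steps B31-B32 follows, reusing the SE block and imports from the previous sketch. The 7×7 kernel of the spatial attention branch follows the common CBAM setting, and applying a sigmoid to the mask logits before gating is an assumption; both are illustrative rather than prescribed.

```python
class SpatialAttention(nn.Module):
    """CBAM-style spatial attention over the channel-wise mean and max maps."""
    def __init__(self, k=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2)
    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class EdgeEnhancementModule(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                  nn.BatchNorm2d(ch), nn.ReLU(inplace=True))
        self.se, self.sa = SE(ch), SpatialAttention()
    def forward(self, fi, me):
        # Edge guiding: resize the mask, gate the features, keep a residual path.
        m = F.interpolate(me, size=fi.shape[2:], mode="bilinear", align_corners=False)
        f_guide = self.conv(torch.sigmoid(m) * fi + fi)   # Fguide (sigmoid is assumed)
        return self.sa(self.se(f_guide))                  # Fee = SA(SE(Fguide))
```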
Step B33, design the edge feature fusion module; its inputs are the first-stage feature map F1 extracted in step B1 and the edge feature map Fe and edge mask Me obtained in step B2; the edge mask Me is multiplied element-wise with the feature map F1 and the result is added to F1, giving feature map FM; the edge feature map Fe passes sequentially through a 3×3 convolution, BN layer and ReLU activation function, giving the channel-reduced feature map F'e; FM and F'e are concatenated along the channel dimension, passed sequentially through a 3×3 convolution, Swish activation function, SE module and 3×3 convolution, and added to the feature map F'e, giving feature map F''e; the feature map F''e passes through an SE module, is concatenated with F''e along the channel dimension, and then undergoes a 3×3 convolution, giving feature map F'''e; finally the feature map F'''e is added to the feature map F1 to obtain the feature map of the fused edge information Fefm; the specific formulas are as follows:

FM = (Me ⊗ F1) ⊕ F1
F'e = ReLU(BN(Conv3(Fe)))
F''e = Conv3(SE(Swish(Conv3(Concat(FM, F'e))))) ⊕ F'e
F'''e = Conv3(Concat(SE(F''e), F''e))
Fefm = F'''e ⊕ F1

where ⊗ is element-wise multiplication, ⊕ is element-wise addition, Conv3(·) is a convolution layer with a convolution kernel size of 3×3, BN(·) is a batch normalization operation, ReLU(·) is an activation function, Swish(·) is the Swish activation function, SE(·) is the channel attention module, and Concat(·,·) is a concatenation operation along the channel dimension.
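Under the same assumptions (the reconstructed formulas above, SE as a standard squeeze-and-excitation block, and Swish realized as SiLU), the edge feature fusion module of step B33 can be sketched as:

```python
class EdgeFeatureFusionModule(nn.Module):
    def __init__(self, c1=256, ce=128):
        super().__init__()
        # 3x3 conv + BN + ReLU reducing the edge feature channels to those of F1.
        self.reduce_e = nn.Sequential(nn.Conv2d(ce, c1, 3, padding=1),
                                      nn.BatchNorm2d(c1), nn.ReLU(inplace=True))
        # Concat -> 3x3 conv -> Swish -> SE -> 3x3 conv, with a residual to F'e.
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * c1, c1, 3, padding=1), nn.SiLU(),   # Swish == SiLU
            SE(c1),
            nn.Conv2d(c1, c1, 3, padding=1))
        self.se = SE(c1)
        self.out = nn.Conv2d(2 * c1, c1, 3, padding=1)
    def forward(self, f1, fe, me):
        fm = torch.sigmoid(me) * f1 + f1                        # FM (sigmoid assumed)
        fe1 = self.reduce_e(fe)                                 # F'e
        fe2 = self.fuse(torch.cat([fm, fe1], dim=1)) + fe1      # F''e
        fe3 = self.out(torch.cat([self.se(fe2), fe2], dim=1))   # F'''e
        return fe3 + f1                                         # Fefm
```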
In a preferred embodiment, the step B4 is implemented as follows:
Step B41, first construct the gating convolution module in the high-order space interaction module, and denote the feature map input to the module by Fα. The input feature map Fα is layer-normalized (LN1), giving the normalized feature map F'α; a 1×1 convolution then expands its channels to twice the original number, giving feature map Fβ. Fβ is split along the channel dimension into two feature maps p0 and q; q is input to a depth-separable convolution, giving feature map Q, which is split into n feature maps q0, q1, …, q(n-1), where n is the order. The feature map p0 is multiplied with the feature map q0 and a 1×1 convolution expands the channels to twice the original number, giving the first space interaction feature map p1; p1 is multiplied with q1 and a 1×1 convolution again doubles the channels, giving the second space interaction feature map p2; iterating in this way, p(n-1) is finally multiplied with q(n-1) and passed through a 1×1 convolution layer whose input and output channel numbers are equal, giving the n-th space interaction feature map pn. Finally the input feature map Fα is added to pn to obtain the intermediate output feature map Fmid. The specific formulas are as follows:

F'α = LN1(Fα)
Fβ = Conv1(F'α)
[p0, q] = Split(Fβ)
Q = DWConv(q)
[q0, q1, …, q(n-1)] = Split(Q)
p(k+1) = Conv1(pk ⊗ qk), k = 0, 1, …, n-1

Fmid = Fα ⊕ pn

where Split(·) is a split along the channel dimension, DWConv(·) is a depth-separable convolution, Conv1(·) is a convolution layer with a convolution kernel size of 1×1 (doubling the channels for k < n-1 and keeping them unchanged for k = n-1), ⊗ is element-wise multiplication, and ⊕ is element-wise addition;
Step B42, construct the feed-forward module in the high-order space interaction module; its input is the feature map Fmid obtained in step B41. Fmid is layer-normalized, denoted LN2, then input to a two-layer fully-connected block, denoted MLP; the output of the fully-connected layers is added to the feature map Fmid to obtain the high-order space interaction feature Fhsi. The specific formula is as follows:

Fhsi = MLP(LN2(Fmid)) ⊕ Fmid
Step B43, construct the channel reduction module in the high-order space interaction module; its input is Fhsi obtained in step B42, which passes sequentially through a 1×1 convolution, BN layer and ReLU activation function to obtain the channel-reduced high-order space interaction feature map F'hsi. The specific formula is as follows:

F'hsi = ReLU(BN(Conv1(Fhsi)))

where Conv1(·) is a convolution layer with a convolution kernel size of 1×1, BN(·) is the batch normalization operation, and ReLU(·) is the activation function;
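Steps B41-B43 together form one high-order space interaction block. The sketch below follows the reconstructed formulas; the 7×7 depthwise kernel, the doubling channel schedule starting at C/2^(n-1) (as in gnConv), the GELU activation and the 4× MLP expansion are assumptions where the text leaves the hyper-parameters open.

```python
class GatedConv(nn.Module):
    """Gating convolution of step B41 (gnConv-style, interaction order n)."""
    def __init__(self, ch, n=3):
        super().__init__()
        self.n = n
        self.dims = [ch // 2 ** i for i in range(n - 1, -1, -1)]  # e.g. [C/4, C/2, C]
        self.norm = nn.LayerNorm(ch)                              # LN1 over channels
        self.proj_in = nn.Conv2d(ch, 2 * ch, 1)                   # double the channels
        qc = 2 * ch - self.dims[0]
        self.dwconv = nn.Conv2d(qc, qc, 7, padding=3, groups=qc)  # depthwise conv
        # 1x1 convs doubling the channels after each interaction; the last keeps them.
        self.pconvs = nn.ModuleList(
            [nn.Conv2d(self.dims[k], self.dims[k + 1], 1) for k in range(n - 1)]
            + [nn.Conv2d(self.dims[-1], self.dims[-1], 1)])
    def forward(self, x):
        y = self.norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)  # F'a
        y = self.proj_in(y)                                       # Fb
        p, q = torch.split(y, [self.dims[0], y.shape[1] - self.dims[0]], dim=1)
        qs = torch.split(self.dwconv(q), self.dims, dim=1)        # q0 .. q(n-1)
        for k in range(self.n):
            p = self.pconvs[k](p * qs[k])                         # p(k+1)
        return x + p                                              # Fmid

class HSIM(nn.Module):
    """Gating convolution + feed-forward + channel reduction (steps B41-B43)."""
    def __init__(self, ch, out_ch=64, n=3):
        super().__init__()
        self.gconv = GatedConv(ch, n)
        self.norm2 = nn.LayerNorm(ch)                             # LN2
        # Two-layer MLP realized with 1x1 convolutions (hidden width assumed 4x).
        self.mlp = nn.Sequential(nn.Conv2d(ch, 4 * ch, 1), nn.GELU(),
                                 nn.Conv2d(4 * ch, ch, 1))
        self.reduce = nn.Sequential(nn.Conv2d(ch, out_ch, 1),
                                    nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
    def forward(self, x):
        f_mid = self.gconv(x)
        y = self.norm2(f_mid.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        f_hsi = f_mid + self.mlp(y)                               # Fhsi
        return self.reduce(f_hsi)                                 # F'hsi
```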
Step B44, first construct the convolution block in the context aggregation module, whose inputs are two feature maps of different scales, Flow and Fhigh. Fhigh is upsampled by bilinear interpolation so that its width and height match those of Flow; it is then concatenated with Flow along the channel dimension and passed sequentially through a 1×1 convolution, BN layer and ReLU activation function, giving feature map Fcat. Fcat is split equally along the channel dimension into four feature maps Fc1, Fc2, Fc3 and Fc4. Fc1 and Fc2 are added and passed sequentially through a 3×3 convolution, BN layer and ReLU activation function, giving F'c1. F'c1, Fc2 and Fc3 are added and passed through a 3×3 convolution with dilation rate 2, BN layer and ReLU activation function, giving F'c2. F'c2, Fc3 and Fc4 are added and passed through a 3×3 convolution with dilation rate 3, BN layer and ReLU activation function, giving F'c3. F'c3 and Fc4 are added and passed through a 3×3 convolution with dilation rate 4, BN layer and ReLU activation function, giving F'c4. F'c1, F'c2, F'c3 and F'c4 are concatenated along the channel dimension and passed sequentially through a 1×1 convolution, BN layer and ReLU activation function, giving F'cat. Finally Fcat and F'cat are added and passed sequentially through a 3×3 convolution, BN layer and ReLU activation function to obtain the context feature map Fctx. The specific formulas are as follows:

Fcat = ReLU(BN(Conv1(Concat(Flow, Up(Fhigh)))))
[Fc1, Fc2, Fc3, Fc4] = Split(Fcat)
F'c1 = ReLU(BN(Conv3(Fc1 ⊕ Fc2)))
F'c2 = ReLU(BN(Conv3_d=2(F'c1 ⊕ Fc2 ⊕ Fc3)))
F'c3 = ReLU(BN(Conv3_d=3(F'c2 ⊕ Fc3 ⊕ Fc4)))
F'c4 = ReLU(BN(Conv3_d=4(F'c3 ⊕ Fc4)))
F'cat = ReLU(BN(Conv1(Concat(F'c1, F'c2, F'c3, F'c4))))
Fctx = ReLU(BN(Conv3(Fcat ⊕ F'cat)))

where Up(·) is a bilinear interpolation upsampling operation, Concat(·,·) is a concatenation operation along the channel dimension, ⊕ is element-wise addition, Conv3(·) is a convolution layer with a convolution kernel size of 3×3, Conv3_d=i(·) is a 3×3 convolution with dilation rate i, Conv1(·) is a convolution layer with a convolution kernel size of 1×1, BN(·) is a batch normalization operation, ReLU(·) is an activation function, and Split(·) is an equal split operation along the channel dimension.
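A sketch of the context aggregation module of step B44 under the reconstructed formulas (the channel width ch is assumed to be divisible by 4):

```python
class ContextAggregationModule(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        def cbr(cin, cout, k=3, d=1):
            # Conv + BN + ReLU with dilation d and "same" padding.
            return nn.Sequential(
                nn.Conv2d(cin, cout, k, padding=d * (k // 2), dilation=d),
                nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
        self.pre = cbr(2 * ch, ch, k=1)        # 1x1 fuse after concatenation
        c = ch // 4
        self.b1 = cbr(c, c, d=1)               # 3x3
        self.b2 = cbr(c, c, d=2)               # 3x3, dilation 2
        self.b3 = cbr(c, c, d=3)               # 3x3, dilation 3
        self.b4 = cbr(c, c, d=4)               # 3x3, dilation 4
        self.post = cbr(ch, ch, k=1)
        self.out = cbr(ch, ch, d=1)
    def forward(self, f_low, f_high):
        f_high = F.interpolate(f_high, size=f_low.shape[2:],
                               mode="bilinear", align_corners=False)
        f_cat = self.pre(torch.cat([f_low, f_high], dim=1))        # Fcat
        c1, c2, c3, c4 = torch.chunk(f_cat, 4, dim=1)
        y1 = self.b1(c1 + c2)                                      # F'c1
        y2 = self.b2(y1 + c2 + c3)                                 # F'c2
        y3 = self.b3(y2 + c3 + c4)                                 # F'c3
        y4 = self.b4(y3 + c4)                                      # F'c4
        f_cat2 = self.post(torch.cat([y1, y2, y3, y4], dim=1))     # F'cat
        return self.out(f_cat + f_cat2)                            # Fctx
```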
In a preferred embodiment, the step B5 is implemented as follows:
Step B5, design the camouflage target detection network based on edge feature fusion and high-order space interaction, comprising the edge perception module, edge feature fusion module, edge enhancement module, high-order space interaction module and context aggregation module. An original image is input, and the backbone network of step B1 produces four feature maps of different scales, F1, F2, F3 and F4. F1 and F4 are input to the edge perception module of step B2, giving the edge feature map Fe and the edge mask Me. Three edge enhancement modules of step B3 are then constructed, denoted EEM1, EEM2 and EEM3: the inputs of EEM1 are the fourth-stage feature map F4 extracted in step B1 and the edge mask Me obtained in step B2, and its output is the edge-enhanced feature Fee1; the inputs of EEM2 are the third-stage feature map F3 and the edge mask Me, and its output is Fee2; the inputs of EEM3 are the second-stage feature map F2 and the edge mask Me, and its output is Fee3. The edge feature fusion module of step B3 is then constructed; its inputs are the first-stage feature map F1 extracted in step B1 and the edge feature map Fe and edge mask Me obtained in step B2, and its output is the fused-edge feature map Fefm. Next, four high-order space interaction modules of step B4 are constructed, denoted HSIM1, HSIM2, HSIM3 and HSIM4; their inputs are the feature maps Fee1, Fee2, Fee3 and Fefm obtained in step B3, respectively, and their outputs are F'hsi1, F'hsi2, F'hsi3 and F'hsi4. Immediately after, three context aggregation modules of step B4 are constructed, denoted CAM1, CAM2 and CAM3: the inputs of CAM1 are the feature maps F'hsi1 and F'hsi2, and its output is the context feature map Fctx1; the inputs of CAM2 are the output Fctx1 of CAM1 and the feature map F'hsi3, and its output is the context feature map Fctx2; the inputs of CAM3 are the output Fctx2 of CAM2 and the feature map F'hsi4, and its output is the context feature map Fctx3. The edge mask Me is upsampled by bilinear interpolation with a factor of 4 to obtain the final edge mask Medge. The context feature map Fctx1 is compressed to 1 channel by a 1×1 convolution and upsampled by bilinear interpolation with a factor of 16 to obtain the first-stage camouflage target mask M1; Fctx2 is compressed to 1 channel by a 1×1 convolution and upsampled with a factor of 8 to obtain the second-stage camouflage target mask M2; Fctx3 is compressed to 1 channel by a 1×1 convolution and upsampled with a factor of 4 to obtain the final camouflage target mask M3. The specific formulas are as follows:

Medge = Up_scale=4(Me)
M1 = Up_scale=16(Conv1(Fctx1))
M2 = Up_scale=8(Conv1(Fctx2))
M3 = Up_scale=4(Conv1(Fctx3))

where Up_scale=4(·) is bilinear interpolation upsampling by a factor of 4, Up_scale=8(·) is bilinear interpolation upsampling by a factor of 8, Up_scale=16(·) is bilinear interpolation upsampling by a factor of 16, and Conv1(·) is a convolution layer with a convolution kernel size of 1×1 and an output channel number of 1.
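Assembling the pieces of step B5, and assuming all module classes from the previous sketches together with a common 64-channel width (both hypothetical), the network wiring reduces roughly to:

```python
class CamouflageDetector(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.backbone = timm.create_model("res2net50_26w_4s", pretrained=True,
                                          features_only=True, out_indices=(1, 2, 3, 4))
        self.epm = EdgePerceptionModule()
        self.eem = nn.ModuleList([EdgeEnhancementModule(c) for c in (2048, 1024, 512)])
        self.effm = EdgeFeatureFusionModule()
        self.hsim = nn.ModuleList([HSIM(c, ch) for c in (2048, 1024, 512, 256)])
        self.cam = nn.ModuleList([ContextAggregationModule(ch) for _ in range(3)])
        self.heads = nn.ModuleList([nn.Conv2d(ch, 1, 1) for _ in range(3)])
    def forward(self, x):
        f1, f2, f3, f4 = self.backbone(x)
        fe, me = self.epm(f1, f4)                       # edge features and mask
        fee = [self.eem[i](f, me) for i, f in enumerate((f4, f3, f2))]
        fefm = self.effm(f1, fe, me)                    # fused-edge features
        h = [self.hsim[i](f) for i, f in enumerate(fee + [fefm])]
        ctx1 = self.cam[0](h[1], h[0])                  # CAM1
        ctx2 = self.cam[1](h[2], ctx1)                  # CAM2
        ctx3 = self.cam[2](h[3], ctx2)                  # CAM3
        up = lambda t, s: F.interpolate(t, scale_factor=s,
                                        mode="bilinear", align_corners=False)
        return (up(me, 4),                              # Medge
                up(self.heads[0](ctx1), 16),            # M1
                up(self.heads[1](ctx2), 8),             # M2
                up(self.heads[2](ctx3), 4))             # M3
```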
In a preferred embodiment, the step C is implemented as follows:
Step C, design a loss function as a constraint to optimize the camouflage target detection network based on edge feature fusion and high-order space interaction. The specific formula is as follows:

Ltotal = Σ_{i=1..3} [ LwBCE(Mi, Gcamo) + LwIoU(Mi, Gcamo) ] + λ · LDice(Medge, Gedge)

where Gcamo represents the label image corresponding to the original image I, Gedge represents the edge label image corresponding to the original image I, Ltotal denotes the total loss function, LwBCE denotes the weighted binary cross-entropy loss, LwIoU denotes the weighted intersection-over-union loss, LDice denotes the Dice coefficient loss, and λ denotes the weight of the edge loss term.
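A sketch of this loss under the reconstruction above is shown below; the boundary-aware pixel weighting used for the weighted BCE and IoU terms, and the weight λ = 3, are common choices assumed here rather than values stated in the text. It reuses the torch imports of the previous sketches.

```python
def weighted_bce_iou(pred, gt):
    """Weighted BCE + weighted IoU with boundary-aware pixel weights (assumed scheme)."""
    # Pixels near the object boundary receive larger weights.
    weit = 1 + 5 * torch.abs(F.avg_pool2d(gt, 31, stride=1, padding=15) - gt)
    bce = F.binary_cross_entropy_with_logits(pred, gt, reduction="none")
    wbce = (weit * bce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))
    p = torch.sigmoid(pred)
    inter = ((p * gt) * weit).sum(dim=(2, 3))
    union = ((p + gt) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()

def dice_loss(pred, gt, eps=1.0):
    p = torch.sigmoid(pred)
    inter = (p * gt).sum(dim=(2, 3))
    return (1 - (2 * inter + eps) / (p.sum(dim=(2, 3)) + gt.sum(dim=(2, 3)) + eps)).mean()

def total_loss(m_edge, masks, g_camo, g_edge, lam=3.0):
    """Ltotal = sum_i [wBCE + wIoU](Mi, Gcamo) + lam * Dice(Medge, Gedge)."""
    return sum(weighted_bce_iou(m, g_camo) for m in masks) + lam * dice_loss(m_edge, g_edge)
```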
In a preferred embodiment, the step D is implemented as follows:
Step D1, randomly divide the training data set obtained in step A into several batches, each batch containing N pairs of images;

Step D2, input an original image I; after it passes through the camouflage target detection network based on edge feature fusion and high-order space interaction of step B, the edge mask Medge and the camouflage target masks M1, M2 and M3 are obtained, and the loss Ltotal is calculated with the formula in step C;

Step D3, according to the loss, compute the gradients of the parameters in the network by back-propagation, and update the network parameters with the Adam optimization method;

Step D4, repeat steps D1 to D3 batch by batch until the objective loss function value of the network converges to a Nash equilibrium, and save the network parameters to obtain the camouflage target detection model based on edge feature fusion and high-order space interaction; for a tested camouflage target image, the highest-resolution of the three camouflage target masks predicted by the model, M3, is taken as the final camouflage target mask.
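The training procedure of steps D1-D4 then reduces to a standard supervised loop; the batch size, learning rate, epoch count, dataset interface and checkpoint name below are all assumptions.

```python
from torch.utils.data import DataLoader

def train(model, dataset, epochs=100, batch_size=16, lr=1e-4, device="cuda"):
    model.to(device).train()
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)  # step D1
    opt = torch.optim.Adam(model.parameters(), lr=lr)                  # step D3 (Adam)
    for epoch in range(epochs):                                        # step D4
        for image, g_camo, g_edge in loader:   # assumed (I, Gcamo, Gedge) tensor triplets
            image, g_camo, g_edge = (t.to(device) for t in (image, g_camo, g_edge))
            m_edge, m1, m2, m3 = model(image)                          # step D2
            loss = total_loss(m_edge, (m1, m2, m3), g_camo, g_edge)
            opt.zero_grad()
            loss.backward()
            opt.step()
    torch.save(model.state_dict(), "cod_efhsi.pth")  # hypothetical checkpoint name
    # At test time, the highest-resolution mask M3 is the final prediction.
```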
Compared with the prior art, the invention has the following beneficial effects: building on good edge information, the invention fuses the edge information with the main features and performs high-order space interaction on the fused features, so that the relationship between the camouflage target and the background in an image can be better learned. In the proposed camouflage target detection method based on edge feature fusion and high-order space interaction, the edge perception module generates edge features and an edge mask, the edge enhancement module and the edge feature fusion module fuse the edge information, the high-order space interaction module performs high-order space interaction on the fused features, and the context aggregation module finally aggregates features of different levels, so that a high-quality camouflage target mask is output.
Drawings
FIG. 1 is a flow chart of an implementation of the method in a preferred embodiment of the invention.
FIG. 2 is a block diagram of a camouflage object detection network based on edge feature fusion and higher order spatial interaction in a preferred embodiment of the invention.
Fig. 3 is a block diagram of an edge-aware module in a preferred embodiment of the present invention.
Fig. 4 is a block diagram of an edge enhancement module in a preferred embodiment of the present invention.
Fig. 5 is a block diagram of an edge feature fusion module in a preferred embodiment of the invention.
Fig. 6 is a block diagram of a high-order spatial interaction module in a preferred embodiment of the present invention.
FIG. 7 is a block diagram of a context aggregation module in accordance with a preferred embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for describing particular embodiments only and is not intended to limit the example embodiments according to the present application; as used herein, the singular is intended to include the plural unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices and components, and/or combinations thereof.
The invention provides a camouflage target detection method based on edge feature fusion and high-order space interaction, which is shown in fig. 1-7 and comprises the following steps:
step A, data preprocessing, including data pairing and data enhancement processing, is carried out, and a training data set is obtained;
step B, designing a camouflage target detection network based on edge feature fusion and high-order space interaction, wherein the network comprises an edge perception module, an edge enhancement module, an edge feature fusion module, a high-order space interaction module and a context aggregation module;
c, designing a loss function, and guiding parameter optimization of the network designed in the step B;
step D, training the camouflage target detection network based on the edge feature fusion and the high-order space interaction in the step B by using the training data set obtained in the step A, converging to Nash balance, and obtaining a trained camouflage target detection model based on the edge feature fusion and the high-order space interaction;
and E, inputting the image to be detected into a trained camouflage target detection model based on edge feature fusion and high-order space interaction, and outputting a mask image of the camouflage target.
Further, the step a includes the steps of:
and A1, forming an image triplet by each original image, the corresponding label image and the corresponding edge label image.
Step A2, randomly turning left and right, randomly cutting and randomly rotating each group of image triples; performing color enhancement on the original image, and adjusting the brightness, contrast, saturation and definition of the original image by setting random values as parameters; and adding random black points or white points as random noise to the label image corresponding to the original image.
And A3, scaling each image in the data set into images with the same size of H multiplied by W.
Further, the step B includes the steps of:
and B1, constructing an image feature extraction network, and extracting image features by using the constructed network.
And B2, designing an edge perception module, and generating an edge mask and edge characteristics by using the designed module.
And B3, designing an edge enhancement module and an edge feature fusion module, enhancing the feature representation with camouflage target edge structure semantics by using the edge enhancement module, and generating features of fusion edge information by using the edge feature fusion module.
And B4, constructing a high-order space interaction module and a context aggregation module, inhibiting the attention to the background by using the high-order space interaction module, promoting the attention to the foreground, and mining context semantics by using the context aggregation module to enhance object detection.
And B5, designing a camouflage target detection network based on edge feature fusion and high-order space interaction, wherein the camouflage target detection network comprises an edge perception module, an edge feature fusion module, an edge enhancement module, a high-order space interaction module and a context aggregation module, and generating a final camouflage target mask by using the designed network.
Further, step B1 includes the steps of:
Step B1, take Res2Net-50 as the backbone network and extract features from an input original image I of size H×W×3. Specifically, the feature maps output by the original image I at the first, second, third and fourth stages are denoted F1, F2, F3 and F4 respectively, where feature map F1 has size (H/4)×(W/4)×C, feature map F2 has size (H/8)×(W/8)×2C, feature map F3 has size (H/16)×(W/16)×4C, and feature map F4 has size (H/32)×(W/32)×8C, with C=256.
Further, as shown in fig. 3, step B2 includes the steps of:

Step B21, design an edge perception module; the inputs of the edge perception module are the first-stage feature map F1 and the fourth-stage feature map F4 extracted in step B1, and the outputs of the module are an edge feature map Fe and an edge mask Me.

Step B22, design the feature fusion block in the edge perception module. The inputs of the block are the feature maps F1 and F4 extracted in step B1. The input feature map F1 passes sequentially through a 1×1 convolution, BN layer and ReLU activation function to reduce the channel number, giving feature map F'1; the input feature map F4 passes sequentially through a 1×1 convolution, BN layer and ReLU activation function to reduce the channel number, giving feature map F'4. The width and height of F'4 are adjusted by bilinear interpolation to the same width and height as F'1, giving feature map F''4. F'1 and F''4 are concatenated along the channel dimension and passed through a channel attention module to obtain the edge feature map Fe. The specific formulas are as follows:

F'1 = ReLU(BN(Conv1(F1)))
F'4 = ReLU(BN(Conv1(F4)))
F''4 = Up(F'4)
Fe = SE(Concat(F'1, F''4))

where Conv1(·) is a convolution layer with a convolution kernel size of 1×1, BN(·) is a batch normalization operation, ReLU(·) is the ReLU activation function, Up(·) is bilinear interpolation upsampling, Concat(·,·) is a concatenation operation along the channel dimension, and SE(·) is the channel attention module.

Step B23, design the convolution block in the edge perception module. The edge feature map Fe obtained in step B22 passes sequentially through a 3×3 convolution, BN layer, ReLU activation function, a second 3×3 convolution, BN layer, ReLU activation function, and a 1×1 convolution, finally generating the edge mask Me. The specific formula is as follows:

Me = Conv1(ReLU(BN(Conv3(ReLU(BN(Conv3(Fe)))))))

where Conv3(·) is a convolution layer with a convolution kernel size of 3×3, BN(·) is the batch normalization operation, ReLU(·) is the activation function, and Conv1(·) is a convolution with a convolution kernel size of 1×1.
Further, as shown in fig. 4 and 5, step B3 includes the steps of:

Step B31, design the edge enhancement module, starting with the edge guiding operation in the edge enhancement module. The inputs are the edge mask Me obtained in step B2 and a feature map Fi obtained in step B1. The input edge mask Me is downsampled by bilinear interpolation to the same width and height as the feature map Fi, giving mask M'e. The mask M'e is multiplied element-wise with the feature map Fi, the result is added to Fi, and the sum passes sequentially through a 3×3 convolution, BN layer and ReLU activation function to obtain the edge-guided feature map Fguide. The specific formulas are as follows:

M'e = Down(Me)
Fguide = ReLU(BN(Conv3((M'e ⊗ Fi) ⊕ Fi)))

where Down(·) is a bilinear interpolation downsampling operation, ⊗ is element-wise multiplication, ⊕ is element-wise addition, Conv3(·) is a convolution layer with a convolution kernel size of 3×3, BN(·) is a batch normalization operation, and ReLU(·) is an activation function.

Step B32, construct a CBAM attention sub-module in the edge enhancement module, consisting of serial channel attention SE and spatial attention SA. The input is the feature map Fguide obtained in step B31, and the output is the edge-enhanced feature Fee. The specific formula is as follows:

Fee = SA(SE(Fguide))

where SE(·) is the channel attention module and SA(·) is the spatial attention module.

Step B33, design the edge feature fusion module; its inputs are the first-stage feature map F1 extracted in step B1 and the edge feature map Fe and edge mask Me obtained in step B2. The edge mask Me is multiplied element-wise with the feature map F1 and the result is added to F1, giving feature map FM. The edge feature map Fe passes sequentially through a 3×3 convolution, BN layer and ReLU activation function, giving the channel-reduced feature map F'e. FM and F'e are concatenated along the channel dimension, passed sequentially through a 3×3 convolution, Swish activation function, SE module and 3×3 convolution, and added to the feature map F'e, giving feature map F''e. The feature map F''e passes through an SE module, is concatenated with F''e along the channel dimension, and then undergoes a 3×3 convolution, giving feature map F'''e. Finally the feature map F'''e is added to the feature map F1 to obtain the feature map of the fused edge information Fefm. The specific formulas are as follows:

FM = (Me ⊗ F1) ⊕ F1
F'e = ReLU(BN(Conv3(Fe)))
F''e = Conv3(SE(Swish(Conv3(Concat(FM, F'e))))) ⊕ F'e
F'''e = Conv3(Concat(SE(F''e), F''e))
Fefm = F'''e ⊕ F1

where ⊗ is element-wise multiplication, ⊕ is element-wise addition, Conv3(·) is a convolution layer with a convolution kernel size of 3×3, BN(·) is a batch normalization operation, ReLU(·) is an activation function, Swish(·) is the Swish activation function, SE(·) is the channel attention module, and Concat(·,·) is a concatenation operation along the channel dimension.
Further, as shown in fig. 6 and 7, step B4 includes the steps of:

Step B41, first construct the gating convolution module in the high-order space interaction module, and denote the feature map input to the module by Fα. The input feature map Fα is layer-normalized (LN1), giving the normalized feature map F'α; a 1×1 convolution then expands its channels to twice the original number, giving feature map Fβ. Fβ is split along the channel dimension into two feature maps p0 and q; q is input to a depth-separable convolution, giving feature map Q, which is split into n feature maps q0, q1, …, q(n-1), where n is the order. The feature map p0 is multiplied with the feature map q0 and a 1×1 convolution expands the channels to twice the original number, giving the first space interaction feature map p1; p1 is multiplied with q1 and a 1×1 convolution again doubles the channels, giving the second space interaction feature map p2; iterating in this way, p(n-1) is finally multiplied with q(n-1) and passed through a 1×1 convolution layer whose input and output channel numbers are equal, giving the n-th space interaction feature map pn. Finally the input feature map Fα is added to pn to obtain the intermediate output feature map Fmid. The specific formulas are as follows:

F'α = LN1(Fα)
Fβ = Conv1(F'α)
[p0, q] = Split(Fβ)
Q = DWConv(q)
[q0, q1, …, q(n-1)] = Split(Q)
p(k+1) = Conv1(pk ⊗ qk), k = 0, 1, …, n-1

Fmid = Fα ⊕ pn

where Split(·) is a split along the channel dimension, DWConv(·) is a depth-separable convolution, Conv1(·) is a convolution layer with a convolution kernel size of 1×1 (doubling the channels for k < n-1 and keeping them unchanged for k = n-1), ⊗ is element-wise multiplication, and ⊕ is element-wise addition.

Step B42, construct the feed-forward module in the high-order space interaction module; its input is the feature map Fmid obtained in step B41. Fmid is layer-normalized (denoted LN2) and then input to a two-layer fully-connected block (denoted MLP); the output of the fully-connected layers is added to the feature map Fmid to obtain the high-order space interaction feature Fhsi. The specific formula is as follows:

Fhsi = MLP(LN2(Fmid)) ⊕ Fmid

Step B43, construct the channel reduction module in the high-order space interaction module; its input is Fhsi obtained in step B42, which passes sequentially through a 1×1 convolution, BN layer and ReLU activation function to obtain the channel-reduced high-order space interaction feature map F'hsi. The specific formula is as follows:

F'hsi = ReLU(BN(Conv1(Fhsi)))

where Conv1(·) is a convolution layer with a convolution kernel size of 1×1, BN(·) is the batch normalization operation, and ReLU(·) is the activation function.

Step B44, first construct the convolution block in the context aggregation module, whose inputs are two feature maps of different scales, Flow and Fhigh. Fhigh is upsampled by bilinear interpolation so that its width and height match those of Flow; it is then concatenated with Flow along the channel dimension and passed sequentially through a 1×1 convolution, BN layer and ReLU activation function, giving feature map Fcat. Fcat is split equally along the channel dimension into four feature maps Fc1, Fc2, Fc3 and Fc4. Fc1 and Fc2 are added and passed sequentially through a 3×3 convolution, BN layer and ReLU activation function, giving F'c1. F'c1, Fc2 and Fc3 are added and passed through a 3×3 convolution with dilation rate 2, BN layer and ReLU activation function, giving F'c2. F'c2, Fc3 and Fc4 are added and passed through a 3×3 convolution with dilation rate 3, BN layer and ReLU activation function, giving F'c3. F'c3 and Fc4 are added and passed through a 3×3 convolution with dilation rate 4, BN layer and ReLU activation function, giving F'c4. F'c1, F'c2, F'c3 and F'c4 are concatenated along the channel dimension and passed sequentially through a 1×1 convolution, BN layer and ReLU activation function, giving F'cat. Finally Fcat and F'cat are added and passed sequentially through a 3×3 convolution, BN layer and ReLU activation function to obtain the context feature map Fctx. The specific formulas are as follows:

Fcat = ReLU(BN(Conv1(Concat(Flow, Up(Fhigh)))))
[Fc1, Fc2, Fc3, Fc4] = Split(Fcat)
F'c1 = ReLU(BN(Conv3(Fc1 ⊕ Fc2)))
F'c2 = ReLU(BN(Conv3_d=2(F'c1 ⊕ Fc2 ⊕ Fc3)))
F'c3 = ReLU(BN(Conv3_d=3(F'c2 ⊕ Fc3 ⊕ Fc4)))
F'c4 = ReLU(BN(Conv3_d=4(F'c3 ⊕ Fc4)))
F'cat = ReLU(BN(Conv1(Concat(F'c1, F'c2, F'c3, F'c4))))
Fctx = ReLU(BN(Conv3(Fcat ⊕ F'cat)))

where Up(·) is a bilinear interpolation upsampling operation, Concat(·,·) is a concatenation operation along the channel dimension, ⊕ is element-wise addition, Conv3(·) is a convolution layer with a convolution kernel size of 3×3, Conv3_d=i(·) is a 3×3 convolution with dilation rate i, Conv1(·) is a convolution layer with a convolution kernel size of 1×1, BN(·) is a batch normalization operation, ReLU(·) is an activation function, and Split(·) is an equal split operation along the channel dimension.
Further, as shown in fig. 2, step B5 includes the steps of:

Step B5, design the camouflage target detection network based on edge feature fusion and high-order space interaction, comprising the edge perception module, edge feature fusion module, edge enhancement module, high-order space interaction module and context aggregation module. An original image is input, and the backbone network of step B1 produces four feature maps of different scales, F1, F2, F3 and F4. F1 and F4 are input to the edge perception module of step B2, giving the edge feature map Fe and the edge mask Me. Three edge enhancement modules of step B3 are then constructed, denoted EEM1, EEM2 and EEM3: the inputs of EEM1 are the fourth-stage feature map F4 extracted in step B1 and the edge mask Me obtained in step B2, and its output is the edge-enhanced feature Fee1; the inputs of EEM2 are the third-stage feature map F3 and the edge mask Me, and its output is Fee2; the inputs of EEM3 are the second-stage feature map F2 and the edge mask Me, and its output is Fee3. The edge feature fusion module of step B3 is then constructed; its inputs are the first-stage feature map F1 extracted in step B1 and the edge feature map Fe and edge mask Me obtained in step B2, and its output is the fused-edge feature map Fefm. Next, four high-order space interaction modules of step B4 are constructed, denoted HSIM1, HSIM2, HSIM3 and HSIM4; their inputs are the feature maps Fee1, Fee2, Fee3 and Fefm obtained in step B3, respectively, and their outputs are F'hsi1, F'hsi2, F'hsi3 and F'hsi4. Immediately after, three context aggregation modules of step B4 are constructed, denoted CAM1, CAM2 and CAM3: the inputs of CAM1 are the feature maps F'hsi1 and F'hsi2, and its output is the context feature map Fctx1; the inputs of CAM2 are the output Fctx1 of CAM1 and the feature map F'hsi3, and its output is the context feature map Fctx2; the inputs of CAM3 are the output Fctx2 of CAM2 and the feature map F'hsi4, and its output is the context feature map Fctx3. The edge mask Me is upsampled by bilinear interpolation with a factor of 4 to obtain the final edge mask Medge. The context feature map Fctx1 is compressed to 1 channel by a 1×1 convolution and upsampled by bilinear interpolation with a factor of 16 to obtain the first-stage camouflage target mask M1; Fctx2 is compressed to 1 channel by a 1×1 convolution and upsampled with a factor of 8 to obtain the second-stage camouflage target mask M2; Fctx3 is compressed to 1 channel by a 1×1 convolution and upsampled with a factor of 4 to obtain the final camouflage target mask M3. The specific formulas are as follows:

Medge = Up_scale=4(Me)
M1 = Up_scale=16(Conv1(Fctx1))
M2 = Up_scale=8(Conv1(Fctx2))
M3 = Up_scale=4(Conv1(Fctx3))

where Up_scale=4(·) is bilinear interpolation upsampling by a factor of 4, Up_scale=8(·) is bilinear interpolation upsampling by a factor of 8, Up_scale=16(·) is bilinear interpolation upsampling by a factor of 16, and Conv1(·) is a convolution layer with a convolution kernel size of 1×1 and an output channel number of 1.
Further, step C comprises the steps of:

Step C, design a loss function as a constraint to optimize the camouflage target detection network based on edge feature fusion and high-order space interaction. The specific formula is as follows:

Ltotal = Σ_{i=1..3} [ LwBCE(Mi, Gcamo) + LwIoU(Mi, Gcamo) ] + λ · LDice(Medge, Gedge)

where Gcamo represents the label image corresponding to the original image I, Gedge represents the edge label image corresponding to the original image I, Ltotal denotes the total loss function, LwBCE denotes the weighted binary cross-entropy loss, LwIoU denotes the weighted intersection-over-union loss, LDice denotes the Dice coefficient loss, and λ denotes the weight of the edge loss term.
Further, the step D is implemented as follows:

Step D1, randomly divide the training data set obtained in step A into several batches, each batch containing N pairs of images.

Step D2, input an original image I; after it passes through the camouflage target detection network based on edge feature fusion and high-order space interaction of step B, the edge mask Medge and the camouflage target masks M1, M2 and M3 are obtained, and the loss Ltotal is calculated with the formula in step C.

Step D3, according to the loss, compute the gradients of the parameters in the network by back-propagation, and update the network parameters with the Adam optimization method.

Step D4, repeat steps D1 to D3 batch by batch until the objective loss function value of the network converges to a Nash equilibrium, and save the network parameters to obtain the camouflage target detection model based on edge feature fusion and high-order space interaction. For a tested camouflage target image, the highest-resolution of the three camouflage target masks predicted by the model, M3, is taken as the final camouflage target mask.
The above is a preferred embodiment of the present invention, and all changes made according to the technical solution of the present invention belong to the protection scope of the present invention when the generated functional effects do not exceed the scope of the technical solution of the present invention.
Claims (10)
1. The camouflage target detection method based on edge feature fusion and high-order space interaction is characterized by comprising the following steps of:
step A, data preprocessing, including data pairing and data enhancement processing, is carried out, and a training data set is obtained;
step B, designing a camouflage target detection network based on edge feature fusion and high-order space interaction, wherein the camouflage target detection network consists of an edge perception module, an edge enhancement module, an edge feature fusion module, a high-order space interaction module and a context aggregation module;
step C, designing a loss function, and guiding the parameter optimization of the network designed in step B;

step D, training the camouflage target detection network based on edge feature fusion and high-order space interaction of step B with the training data set obtained in step A until it converges to a Nash equilibrium, obtaining a trained camouflage target detection model based on edge feature fusion and high-order space interaction;

and step E, inputting the image to be detected into the trained camouflage target detection model based on edge feature fusion and high-order space interaction, and outputting a mask image of the camouflage target.
2. The method for detecting the camouflage target based on the edge feature fusion and the higher-order spatial interaction according to claim 1, wherein the specific implementation steps of the step A are as follows:
a1, forming an image triplet by each original image, a label image corresponding to the original image and an edge label image;
step A2, randomly turning left and right, randomly cutting and randomly rotating each group of image triples; performing color enhancement on the original image, and adjusting the brightness, contrast, saturation and definition of the original image by setting random values as parameters; adding random black points or white points as random noise to the label image corresponding to the original image;
And A3, scaling each image in the data set into images with the same size of H multiplied by W.
3. The method for detecting the camouflage target based on the edge feature fusion and the higher-order spatial interaction according to claim 1, wherein the specific implementation steps of the step B are as follows:
step B1, constructing an image feature extraction network, and extracting image features by using the constructed network;
step B2, designing an edge perception module, and generating an edge mask and edge characteristics by using the designed module;
step B3, designing an edge enhancement module and an edge feature fusion module, enhancing the feature representation with camouflage target edge structure semantics by using the edge enhancement module, and generating features of fusion edge information by using the edge feature fusion module;
step B4, constructing a high-order space interaction module and a context aggregation module, using the high-order space interaction module to inhibit the attention to the background and promote the attention to the foreground, and using the context aggregation module to mine context semantics to enhance object detection;
and B5, designing a camouflage target detection network based on edge feature fusion and high-order space interaction, wherein the camouflage target detection network comprises an edge perception module, an edge feature fusion module, an edge enhancement module, a high-order space interaction module and a context aggregation module, and generating a final camouflage target mask by using the designed network.
4. The method for detecting a camouflage target based on edge feature fusion and higher-order spatial interaction according to claim 3, wherein the step B1 is specifically implemented as follows:
step B1, taking Res2Net-50 as the backbone network and extracting features from an input original image I of size H×W×3; specifically, the feature maps output by the original image I at the first, second, third and fourth stages are denoted F1, F2, F3 and F4 respectively, where feature map F1 has size (H/4)×(W/4)×C, feature map F2 has size (H/8)×(W/8)×2C, feature map F3 has size (H/16)×(W/16)×4C, feature map F4 has size (H/32)×(W/32)×8C, and C=256.
5. The method for detecting a camouflage target based on edge feature fusion and higher-order spatial interaction according to claim 3, wherein the step B2 is specifically implemented as follows:
step B21, designing an edge perception module, wherein the inputs of the edge perception module are the first-stage feature map F1 and the fourth-stage feature map F4 extracted in step B1, and the edge perception module outputs an edge feature map Fe and an edge mask Me;

step B22, designing a feature fusion block in the edge perception module; the inputs of the edge perception module are the feature maps F1 and F4 extracted in step B1; the input feature map F1 passes sequentially through a 1×1 convolution, BN layer and ReLU activation function to reduce the channel number, giving feature map F'1; the input feature map F4 passes sequentially through a 1×1 convolution, BN layer and ReLU activation function to reduce the channel number, giving feature map F'4; the width and height of F'4 are adjusted by bilinear interpolation to the same width and height as F'1, giving feature map F''4; F'1 and F''4 are concatenated along the channel dimension and passed through a channel attention module to obtain the edge feature map Fe; the specific formulas are as follows:

F'1 = ReLU(BN(Conv1(F1)))
F'4 = ReLU(BN(Conv1(F4)))
F''4 = Up(F'4)
Fe = SE(Concat(F'1, F''4))

wherein Conv1(·) is a convolution layer with a convolution kernel size of 1×1, BN(·) is a batch normalization operation, ReLU(·) is the ReLU activation function, Up(·) is bilinear interpolation upsampling, Concat(·,·) is a concatenation operation along the channel dimension, and SE(·) is the channel attention module;
step B23, designing a convolution block in the edge perception module; the edge feature map F_e obtained in step B22 passes sequentially through a 3×3 convolution, BN layer and ReLU activation function, a second 3×3 convolution, BN layer and ReLU activation function, and finally a 1×1 convolution to generate the edge mask M_e; the specific formula is as follows:

M_e = Conv1(ReLU(BN(Conv3(ReLU(BN(Conv3(F_e)))))))

wherein Conv3(·) is a convolution layer with kernel size 3×3, BN(·) is a batch normalization operation, ReLU(·) is the activation function, and Conv1(·) is a convolution with kernel size 1×1.
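A compact PyTorch sketch of steps B22 and B23. The reduced channel width (64 here) is a hypothetical choice, and SE(·) is realized as a standard squeeze-and-excitation block, which is one common reading of "channel attention module":

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SE(nn.Module):
    """Squeeze-and-excitation channel attention (one common choice for SE(.))."""
    def __init__(self, c, r=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(c, c // r), nn.ReLU(inplace=True),
            nn.Linear(c // r, c), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))           # squeeze: global average pool
        return x * w.unsqueeze(-1).unsqueeze(-1)  # excite: per-channel reweight

def cbr(cin, cout, k):
    """Conv-BN-ReLU with 'same' padding."""
    return nn.Sequential(nn.Conv2d(cin, cout, k, padding=k // 2),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class EdgePerception(nn.Module):
    """Steps B22-B23: fuse F1 and F4 into edge features Fe and edge mask Me."""
    def __init__(self, c1=256, c4=2048, mid=64):  # mid is a hypothetical width
        super().__init__()
        self.reduce1 = cbr(c1, mid, 1)            # F1 -> F1'
        self.reduce4 = cbr(c4, mid, 1)            # F4 -> F4'
        self.se = SE(2 * mid)
        self.mask_head = nn.Sequential(           # two Conv3-BN-ReLU, then 1x1
            cbr(2 * mid, mid, 3), cbr(mid, mid, 3), nn.Conv2d(mid, 1, 1))

    def forward(self, f1, f4):
        f1p = self.reduce1(f1)
        f4p = F.interpolate(self.reduce4(f4), size=f1p.shape[2:],
                            mode="bilinear", align_corners=False)  # Up(F4')
        fe = self.se(torch.cat([f1p, f4p], dim=1))  # edge features Fe
        return fe, self.mask_head(fe)               # Fe, Me (mask logits)
```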
6. The method for detecting a camouflage target based on edge feature fusion and higher-order spatial interaction according to claim 3, wherein the step B3 is specifically implemented as follows:
Step B31, designing an edge enhancement module, namely first designing the edge guiding operation in the edge enhancement module; the inputs are the edge mask M_e obtained in step B2 and a feature map F_i obtained in step B1; the input edge mask M_e is downsampled by bilinear interpolation to the width and height of the feature map F_i, giving a mask M′_e; the mask M′_e is multiplied with the feature map F_i, the result is added to F_i, and the sum passes sequentially through a 3×3 convolution, BN layer and ReLU activation function to obtain the edge-guided feature map F_guide; the specific formulas are as follows:

M′_e = Down(M_e)
F_guide = ReLU(BN(Conv3((M′_e ⊗ F_i) ⊕ F_i)))

wherein Down(·) is a bilinear interpolation downsampling operation, ⊗ is element-wise multiplication, ⊕ is element-wise addition, Conv3(·) is a convolution layer with kernel size 3×3, BN(·) is a batch normalization operation, and ReLU(·) is the activation function;
step B32, constructing a CBAM attention sub-module in the edge enhancement module, the sub-module consisting of channel attention SE and spatial attention SA in series; its input is the feature map F_guide obtained in step B31, and it outputs the edge-enhanced features F_ee; the specific formula is as follows:

F_ee = SA(SE(F_guide))

wherein SE(·) is a channel attention module and SA(·) is a spatial attention module;
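A sketch of steps B31 and B32, reusing SE and cbr from the step-B22 sketch above; the CBAM-style spatial attention realization and the sigmoid applied to the mask logits are assumptions, since the claim only states multiplication and addition:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
# SE and cbr as defined in the step-B22 sketch above.

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention over channel-pooled descriptors."""
    def __init__(self, k=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2)

    def forward(self, x):
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.max(dim=1, keepdim=True)[0]], dim=1)
        return x * torch.sigmoid(self.conv(s))

class EdgeEnhance(nn.Module):
    """Steps B31-B32: edge guiding followed by serial SE and SA attention."""
    def __init__(self, c):
        super().__init__()
        self.fuse = cbr(c, c, 3)  # Conv3-BN-ReLU after guiding
        self.se = SE(c)
        self.sa = SpatialAttention()

    def forward(self, fi, me):
        # Resize mask logits to Fi; the sigmoid squash is an assumption.
        m = F.interpolate(me, size=fi.shape[2:], mode="bilinear",
                          align_corners=False)
        f_guide = self.fuse(torch.sigmoid(m) * fi + fi)  # gate + residual
        return self.sa(self.se(f_guide))                 # F_ee
```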
step B33, designing an edge feature fusion module, whose inputs are the first-stage feature map F_1 extracted in step B1 and the edge feature map F_e and edge mask M_e obtained in step B2; the edge mask M_e is multiplied with the feature map F_1 and the result is added to F_1, giving a feature map F_M; the edge feature map F_e passes sequentially through a 3×3 convolution, BN layer and ReLU activation function, giving a channel-reduced feature map F′_e; F_M and F′_e are concatenated along the channel dimension, passed sequentially through a 3×3 convolution, a Swish activation function, an SE module and a 3×3 convolution, and added to the feature map F′_e, giving a feature map F″_e; the feature map F′_e is passed through an SE module, concatenated with F″_e along the channel dimension, and then passed through a 3×3 convolution to obtain a feature map F‴_e; finally, the feature map F‴_e is added to the feature map F_1 to obtain the feature map of the final fused edge information F_fuse; the specific formulas are as follows:

F_M = (M_e ⊗ F_1) ⊕ F_1
F′_e = ReLU(BN(Conv3(F_e)))
F″_e = Conv3(SE(Swish(Conv3(Concat(F_M, F′_e))))) ⊕ F′_e
F‴_e = Conv3(Concat(SE(F′_e), F″_e))
F_fuse = F‴_e ⊕ F_1

wherein ⊗ is element-wise multiplication, ⊕ is element-wise addition, Conv3(·) is a convolution layer with kernel size 3×3, BN(·) is a batch normalization operation, ReLU(·) is the activation function, Swish(·) is the Swish activation function, SE(·) is a channel attention module, and Concat(·,·) is a concatenation operation along the channel dimension.
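A sketch of the step-B33 fusion path as reconstructed above, again reusing SE and cbr from the step-B22 sketch; the channel widths (chosen so the residual additions line up) and the sigmoid on the mask logits are assumptions:

```python
import torch
import torch.nn as nn
# SE and cbr as defined in the step-B22 sketch above.

class EdgeFeatureFusion(nn.Module):
    """Step B33: inject edge features Fe and mask Me into F1 (widths assumed)."""
    def __init__(self, c1=256, ce=128):
        super().__init__()
        self.reduce = cbr(ce, c1, 3)                     # Fe -> Fe'
        self.mix = nn.Sequential(
            nn.Conv2d(2 * c1, c1, 3, padding=1), nn.SiLU(),  # Swish == SiLU
            SE(c1), nn.Conv2d(c1, c1, 3, padding=1))
        self.se2 = SE(c1)
        self.out = nn.Conv2d(2 * c1, c1, 3, padding=1)

    def forward(self, f1, fe, me):
        fm = torch.sigmoid(me) * f1 + f1                 # F_M (sigmoid assumed)
        fep = self.reduce(fe)                            # Fe'
        fee = self.mix(torch.cat([fm, fep], dim=1)) + fep        # Fe''
        feo = self.out(torch.cat([self.se2(fep), fee], dim=1))   # Fe'''
        return feo + f1                                  # F_fuse
```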
7. The method for detecting a camouflage target based on edge feature fusion and higher-order spatial interaction according to claim 3, wherein the step B4 is specifically implemented as follows:
Step B41, first constructing a gated convolution module in the high-order spatial interaction module; denote the feature map input to the module as F_α; the input feature map F_α is layer-normalized (LN_1), giving a normalized feature map F′_α; F′_α then passes through a 1×1 convolution that doubles the channel number, giving a feature map F″_α; F″_α is split along the channel dimension into two feature maps p_0 and q; q is input to a depth-separable convolution, giving a feature map Q; Q is split into n feature maps q_0, q_1, …, q_{n-1}, where n is the order; the feature map p_0 is multiplied with the feature map q_0 and the channel number is doubled by a 1×1 convolution, giving the first spatial interaction feature map p_1; the feature map p_1 is multiplied with the feature map q_1 and the channel number is doubled by a 1×1 convolution, giving the second spatial interaction feature map p_2; iterating in this way until the feature maps p_{n-1} and q_{n-1} are multiplied and passed through a 1×1 convolution layer whose input and output channel numbers are equal, giving the n-th order spatial interaction feature map p_n; finally, the input feature map F_α and p_n are added, giving the intermediate output feature map F_mid; the specific formulas are as follows:

F′_α = LN_1(F_α)
[p_0, q] = Split(Conv1(F′_α))
Q = DWConv(q)
[q_0, q_1, …, q_{n-1}] = Split(Q)
p_{k+1} = Conv1(p_k ⊗ q_k), k = 0, 1, …, n-1
F_mid = F_α ⊕ p_n

wherein Split(·) is a split along the channel dimension, DWConv(·) is a depth-separable convolution, Conv1(·) is a convolution layer with kernel size 1×1, ⊗ is element-wise multiplication, and ⊕ is element-wise addition;
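The iteration above matches recursive gated convolution (gnConv) as popularized by HorNet. A runnable PyTorch sketch follows; the 7×7 depthwise kernel and the doubling channel schedule [dim/2^(n-1), …, dim] are assumptions (the claim fixes neither), and dim must be divisible by 2^(n-1). The residual addition of F_α is applied by the surrounding block (see the HSIM sketch after step B43).

```python
import torch
import torch.nn as nn

class GatedConv(nn.Module):
    """Step B41 in the spirit of gnConv (HorNet): n-order gated interactions."""
    def __init__(self, dim, order=3):
        super().__init__()
        self.order = order
        self.dims = [dim // 2 ** i for i in range(order)][::-1]  # p_k widths
        self.proj_in = nn.Conv2d(dim, 2 * dim, 1)        # double the channels
        self.dwconv = nn.Conv2d(sum(self.dims), sum(self.dims), 7,
                                padding=3, groups=sum(self.dims))  # DWConv
        self.pws = nn.ModuleList(
            nn.Conv2d(self.dims[i], self.dims[i + 1], 1)
            for i in range(order - 1))                   # 1x1, doubling width
        self.proj_out = nn.Conv2d(dim, dim, 1)           # same-width 1x1

    def forward(self, x):
        y = self.proj_in(x)                              # 2*dim channels
        p, q = torch.split(y, (self.dims[0], sum(self.dims)), dim=1)
        qs = torch.split(self.dwconv(q), self.dims, dim=1)  # q_0 .. q_{n-1}
        for i in range(self.order):                      # p_{k+1}=Conv1(p_k*q_k)
            p = p * qs[i]
            p = self.pws[i](p) if i < self.order - 1 else self.proj_out(p)
        return p                                         # p_n

# e.g. GatedConv(64, order=3)(torch.randn(1, 64, 44, 44)) keeps the shape.
```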
step B42, constructing a feedforward module in the high-order spatial interaction module, whose input is the feature map F_mid obtained in step B41; F_mid is layer-normalized (denoted LN_2) and then input to a two-layer fully-connected block, denoted MLP; the output of the MLP is added to the feature map F_mid, giving the high-order spatial interaction features F_hsi; the specific formula is as follows:

F_hsi = MLP(LN_2(F_mid)) ⊕ F_mid
step B43, constructing a channel-reduction module in the high-order spatial interaction module, whose input is the F_hsi obtained in step B42; F_hsi passes sequentially through a 1×1 convolution, BN layer and ReLU activation function, giving the channel-reduced high-order spatial interaction feature map F′_hsi; the specific formula is as follows:

F′_hsi = ReLU(BN(Conv1(F_hsi)))

wherein Conv1(·) is a convolution layer with kernel size 1×1, BN(·) is a batch normalization operation, and ReLU(·) is the activation function;
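Putting steps B41 to B43 together, a block-level sketch reusing GatedConv from the step-B41 sketch above; the GroupNorm(1, ·) stand-in for layer normalization on NCHW maps, the GELU activation, the MLP expansion ratio and the reduced width out_dim are all assumptions:

```python
import torch
import torch.nn as nn
# GatedConv as defined in the step-B41 sketch above.

class HSIM(nn.Module):
    """Steps B41-B43: LN -> gated conv -> residual, LN -> MLP -> residual,
    then a 1x1 Conv-BN-ReLU channel reduction."""
    def __init__(self, dim, out_dim=64, order=3, mlp_ratio=4):
        super().__init__()
        self.ln1 = nn.GroupNorm(1, dim)  # channel-wise LayerNorm stand-in
        self.gc = GatedConv(dim, order)
        self.ln2 = nn.GroupNorm(1, dim)
        self.mlp = nn.Sequential(        # two fully-connected layers as 1x1s
            nn.Conv2d(dim, mlp_ratio * dim, 1), nn.GELU(),
            nn.Conv2d(mlp_ratio * dim, dim, 1))
        self.reduce = nn.Sequential(nn.Conv2d(dim, out_dim, 1),
                                    nn.BatchNorm2d(out_dim),
                                    nn.ReLU(inplace=True))

    def forward(self, x):
        x = x + self.gc(self.ln1(x))     # F_mid  (step B41)
        x = x + self.mlp(self.ln2(x))    # F_hsi  (step B42)
        return self.reduce(x)            # F_hsi' (step B43)
```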
step B44, first constructing a convolution block in the context aggregation module; denote the two input feature maps of different scales as F_low and F_high; first, the feature map F_high is upsampled by bilinear interpolation so that its width and height match those of F_low, it is concatenated with F_low along the channel dimension, and the result passes sequentially through a 1×1 convolution, BN layer and ReLU activation function, giving a feature map F_cat; F_cat is then split equally along the channel dimension into four feature maps F_cat^1, F_cat^2, F_cat^3 and F_cat^4; F_cat^1 and F_cat^2 are added and passed sequentially through a 3×3 convolution, BN layer and ReLU activation function, giving a feature map F_d1; F_d1, F_cat^2 and F_cat^3 are added and passed sequentially through a 3×3 convolution with dilation rate 2, BN layer and ReLU activation function, giving a feature map F_d2; F_d2, F_cat^3 and F_cat^4 are added and passed sequentially through a 3×3 convolution with dilation rate 3, BN layer and ReLU activation function, giving a feature map F_d3; F_d3 and F_cat^4 are added and passed through a 3×3 convolution with dilation rate 4, BN layer and ReLU activation function, giving a feature map F_d4; then F_d1, F_d2, F_d3 and F_d4 are concatenated along the channel dimension and passed sequentially through a 1×1 convolution, BN layer and ReLU activation function, giving a feature map F′_cat; finally, F_cat and F′_cat are added and passed sequentially through a 3×3 convolution, BN layer and ReLU activation function, giving the context feature map F_ctx; the specific formulas are as follows:

F_cat = ReLU(BN(Conv1(Concat(F_low, Up(F_high)))))
[F_cat^1, F_cat^2, F_cat^3, F_cat^4] = Split(F_cat)
F_d1 = ReLU(BN(Conv3(F_cat^1 ⊕ F_cat^2)))
F_d2 = ReLU(BN(Conv3_{d=2}(F_d1 ⊕ F_cat^2 ⊕ F_cat^3)))
F_d3 = ReLU(BN(Conv3_{d=3}(F_d2 ⊕ F_cat^3 ⊕ F_cat^4)))
F_d4 = ReLU(BN(Conv3_{d=4}(F_d3 ⊕ F_cat^4)))
F′_cat = ReLU(BN(Conv1(Concat(F_d1, F_d2, F_d3, F_d4))))
F_ctx = ReLU(BN(Conv3(F_cat ⊕ F′_cat)))

wherein Up(·) is a bilinear interpolation upsampling operation, Concat(·,·) is a concatenation operation along the channel dimension, ⊕ is element-wise addition, Conv3(·) is a convolution layer with kernel size 3×3, Conv3_{d=i}(·) is a 3×3 convolution with dilation rate i, Conv1(·) is a convolution layer with kernel size 1×1, BN(·) is a batch normalization operation, ReLU(·) is the activation function, and Split(·) is an equal split operation along the channel dimension.
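A sketch of step B44; the membership of the three-way sums follows the reconstruction above and should be read as an assumption, and the working width c (which must be divisible by 4) is hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextAggregation(nn.Module):
    """Step B44: cross-scale fusion with cascaded dilated 3x3 branches."""
    def __init__(self, c=64):  # c must be divisible by 4
        super().__init__()
        def cbr(cin, cout, k, d=1):
            return nn.Sequential(
                nn.Conv2d(cin, cout, k, padding=d * (k // 2), dilation=d),
                nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
        self.entry = cbr(2 * c, c, 1)             # 1x1 after concat
        self.b1 = cbr(c // 4, c // 4, 3, d=1)     # dilation 1
        self.b2 = cbr(c // 4, c // 4, 3, d=2)     # dilation 2
        self.b3 = cbr(c // 4, c // 4, 3, d=3)     # dilation 3
        self.b4 = cbr(c // 4, c // 4, 3, d=4)     # dilation 4
        self.merge = cbr(c, c, 1)
        self.out = cbr(c, c, 3)

    def forward(self, f_low, f_high):
        up = F.interpolate(f_high, size=f_low.shape[2:], mode="bilinear",
                           align_corners=False)
        fcat = self.entry(torch.cat([f_low, up], dim=1))   # F_cat
        x1, x2, x3, x4 = torch.chunk(fcat, 4, dim=1)       # equal split
        d1 = self.b1(x1 + x2)
        d2 = self.b2(d1 + x2 + x3)
        d3 = self.b3(d2 + x3 + x4)
        d4 = self.b4(d3 + x4)
        fcat2 = self.merge(torch.cat([d1, d2, d3, d4], dim=1))  # F'_cat
        return self.out(fcat + fcat2)                      # F_ctx
```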
8. The method for detecting a camouflage target based on edge feature fusion and higher-order spatial interaction according to claim 3, wherein the step B5 is specifically implemented as follows:
step B5, designing a camouflage target detection network based on edge feature fusion and high-order spatial interaction, comprising an edge perception module, an edge feature fusion module, edge enhancement modules, high-order spatial interaction modules and context aggregation modules; an original image is input and the backbone network of step B1 produces four feature maps of different scales F_1, F_2, F_3 and F_4; F_1 and F_4 are input to the edge perception module of step B2, giving the edge feature map F_e and the edge mask M_e; three edge enhancement modules of step B3 are then constructed, denoted EEM_1, EEM_2 and EEM_3, wherein the input of EEM_1 is the fourth-stage feature map F_4 extracted in step B1 and the edge mask M_e obtained in step B2, and its output is the edge-enhanced features F_ee^1; the input of EEM_2 is the third-stage feature map F_3 extracted in step B1 and the edge mask M_e obtained in step B2, and its output is the edge-enhanced features F_ee^2; the input of EEM_3 is the second-stage feature map F_2 extracted in step B1 and the edge mask M_e obtained in step B2, and its output is the edge-enhanced features F_ee^3; the edge feature fusion module of step B3 is then constructed, whose inputs are the first-stage feature map F_1 extracted in step B1 and the edge feature map F_e and edge mask M_e obtained in step B2, and whose output is the feature map of fused edge information F_fuse; next, four high-order spatial interaction modules of step B4 are constructed, denoted HSIM_1, HSIM_2, HSIM_3 and HSIM_4, whose inputs are the feature maps F_ee^1, F_ee^2, F_ee^3 and F_fuse obtained in step B3 and whose outputs are F′_hsi^1, F′_hsi^2, F′_hsi^3 and F′_hsi^4 respectively; then three context aggregation modules of step B4 are constructed, denoted CAM_1, CAM_2 and CAM_3, wherein the inputs of CAM_1 are the feature maps F′_hsi^1 and F′_hsi^2 and its output is the context feature map F_ctx^1; the inputs of CAM_2 are the output F_ctx^1 of CAM_1 and the feature map F′_hsi^3, and its output is the context feature map F_ctx^2; the inputs of CAM_3 are the output F_ctx^2 of CAM_2 and the feature map F′_hsi^4, and its output is the context feature map F_ctx^3; the edge mask M_e is upsampled by bilinear interpolation with factor 4 to obtain the final edge mask M_edge; the context feature map F_ctx^1 is compressed to 1 channel by a 1×1 convolution and then upsampled by bilinear interpolation with factor 16 to obtain the first-stage camouflage target mask M_camo^1; the context feature map F_ctx^2 is compressed to 1 channel by a 1×1 convolution and then upsampled by bilinear interpolation with factor 8 to obtain the second-stage camouflage target mask M_camo^2; the context feature map F_ctx^3 is compressed to 1 channel by a 1×1 convolution and then upsampled by bilinear interpolation with factor 4 to obtain the final camouflage target mask M_camo^3; the specific formulas are as follows:

M_edge = Up_{scale=4}(M_e)
M_camo^1 = Up_{scale=16}(Conv1(F_ctx^1))
M_camo^2 = Up_{scale=8}(Conv1(F_ctx^2))
M_camo^3 = Up_{scale=4}(Conv1(F_ctx^3))

wherein Up_{scale=4}(·) is bilinear interpolation upsampling with factor 4, Up_{scale=8}(·) is bilinear interpolation upsampling with factor 8, Up_{scale=16}(·) is bilinear interpolation upsampling with factor 16, and Conv1(·) is a convolution layer with kernel size 1×1 and output channel number 1.
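A small sketch of the prediction heads described above (1×1 compression to one channel plus fixed-factor bilinear upsampling); the channel width c is whatever the context aggregation modules emit:

```python
import torch.nn as nn
import torch.nn.functional as F

class MaskHead(nn.Module):
    """1x1 conv to a single channel, then bilinear upsampling to full size."""
    def __init__(self, c, scale):
        super().__init__()
        self.conv = nn.Conv2d(c, 1, 1)
        self.scale = scale

    def forward(self, x):
        return F.interpolate(self.conv(x), scale_factor=self.scale,
                             mode="bilinear", align_corners=False)

# head1 = MaskHead(c, 16)  # on F_ctx^1 -> M_camo^1
# head2 = MaskHead(c, 8)   # on F_ctx^2 -> M_camo^2
# head3 = MaskHead(c, 4)   # on F_ctx^3 -> M_camo^3 (final mask)
# M_edge is plain upsampling of M_e by factor 4, with no extra convolution.
```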
9. The method for detecting the camouflage target based on the edge feature fusion and the higher-order spatial interaction according to claim 1, wherein the specific implementation step of the step C is as follows:
step C, designing a loss function as a constraint to optimize the camouflage target detection network based on edge feature fusion and high-order spatial interaction; the specific formula is as follows:

L_total = Σ_{i=1}^{3} (L_wBCE(M_camo^i, G_camo) + L_wIoU(M_camo^i, G_camo)) + λ·L_Dice(M_edge, G_edge)

wherein G_camo represents the label image corresponding to the original image I, G_edge represents the edge label image corresponding to the original image I, L_total is the total loss function, L_wBCE is the weighted binary cross-entropy loss, L_wIoU is the weighted intersection-over-union (IoU) loss, L_Dice is the Dice coefficient loss, and λ is the weight of the edge loss.
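A sketch of one plausible realization of this loss. The F3Net-style boundary weighting inside the weighted BCE/IoU terms and the value of λ are assumptions; the claim fixes neither:

```python
import torch
import torch.nn.functional as F

def weighted_bce_iou(logits, gt):
    """Weighted BCE + weighted IoU (F3Net-style weighting, an assumption)."""
    # Pixels whose local average differs from their label (i.e. near
    # structural boundaries) receive up to 6x weight.
    w = 1 + 5 * torch.abs(F.avg_pool2d(gt, 31, stride=1, padding=15) - gt)
    bce = F.binary_cross_entropy_with_logits(logits, gt, reduction="none")
    wbce = (w * bce).sum(dim=(2, 3)) / w.sum(dim=(2, 3))
    p = torch.sigmoid(logits)
    inter = ((p * gt) * w).sum(dim=(2, 3))
    union = ((p + gt) * w).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()

def dice_loss(logits, gt, eps=1.0):
    p = torch.sigmoid(logits)
    inter = (p * gt).sum(dim=(2, 3))
    return (1 - (2 * inter + eps) /
            (p.sum(dim=(2, 3)) + gt.sum(dim=(2, 3)) + eps)).mean()

def total_loss(camo_logits, g_camo, edge_logits, g_edge, lam=3.0):
    """camo_logits: the three stage predictions; lam is hypothetical."""
    l = sum(weighted_bce_iou(m, g_camo) for m in camo_logits)
    return l + lam * dice_loss(edge_logits, g_edge)
```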
10. The method for detecting the camouflage target based on the edge feature fusion and the higher-order spatial interaction according to claim 1, wherein the specific implementation step of the step D is as follows:
step D1, randomly dividing the training data set obtained in the step A into a plurality of batches, wherein each batch comprises N pairs of images;
Step D2, inputting an original image I; after the image passes through the camouflage target detection network based on edge feature fusion and high-order spatial interaction of step B, the network outputs the edge mask M_edge and the camouflage target masks M_camo^1, M_camo^2 and M_camo^3, and the loss L_total is calculated using the formula in step C;
Step D3, calculating the gradient of the parameters in the network by using a back propagation method according to the loss, and updating the network parameters by using an Adam optimization method;
step D4, repeatedly executing steps D1 to D3 batch by batch until the objective loss function value of the network converges, and saving the network parameters to obtain the camouflage target detection model based on edge feature fusion and high-order spatial interaction; for a test camouflage target image, the highest-resolution of the three camouflage target masks predicted by the model, M_camo^3, is taken as the final camouflage target mask.
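Putting steps D1 to D4 together, a minimal training-loop sketch; the loader yielding (image, mask label, edge label) triples, the epoch count, learning rate, checkpoint name and the model's output ordering are all hypothetical, and total_loss is the sketch from step C:

```python
import torch

def train(model, loader, epochs=40, lr=1e-4):
    """Minimal realization of steps D1-D4 (hyperparameters hypothetical)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # Adam, per step D3
    for epoch in range(epochs):
        for img, g_camo, g_edge in loader:        # step D1: one batch
            m_edge, m1, m2, m3 = model(img)       # step D2: forward pass
            loss = total_loss([m1, m2, m3], g_camo, m_edge, g_edge)
            opt.zero_grad()
            loss.backward()                       # step D3: backpropagation
            opt.step()                            # Adam parameter update
    torch.save(model.state_dict(), "codnet.pth")  # step D4: save parameters
```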
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310356445.2A CN116310693A (en) | 2023-04-06 | 2023-04-06 | Camouflage target detection method based on edge feature fusion and high-order space interaction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116310693A true CN116310693A (en) | 2023-06-23 |
Family
ID=86824077
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310356445.2A Pending CN116310693A (en) | 2023-04-06 | 2023-04-06 | Camouflage target detection method based on edge feature fusion and high-order space interaction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116310693A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116563313A (en) * | 2023-07-11 | 2023-08-08 | 安徽大学 | Remote sensing image soybean planting region segmentation method based on gating and attention fusion |
CN116563313B (en) * | 2023-07-11 | 2023-09-19 | 安徽大学 | Remote sensing image soybean planting region segmentation method based on gating and attention fusion |
CN117095180A (en) * | 2023-09-01 | 2023-11-21 | 武汉互创联合科技有限公司 | Embryo development stage prediction and quality assessment method based on stage identification |
CN117095180B (en) * | 2023-09-01 | 2024-04-19 | 武汉互创联合科技有限公司 | Embryo development stage prediction and quality assessment method based on stage identification |
CN117593517A (en) * | 2024-01-19 | 2024-02-23 | 南京信息工程大学 | Camouflage target detection method based on complementary perception cross-view fusion network |
CN117593517B (en) * | 2024-01-19 | 2024-04-16 | 南京信息工程大学 | Camouflage target detection method based on complementary perception cross-view fusion network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Shao et al. | Feature learning for image classification via multiobjective genetic programming | |
CN116310693A (en) | Camouflage target detection method based on edge feature fusion and high-order space interaction | |
Alshdaifat et al. | Improved deep learning framework for fish segmentation in underwater videos | |
CN111242841B (en) | Image background style migration method based on semantic segmentation and deep learning | |
CN112614077B (en) | Unsupervised low-illumination image enhancement method based on generation countermeasure network | |
CN112598643B (en) | Depth fake image detection and model training method, device, equipment and medium | |
CN113221639A (en) | Micro-expression recognition method for representative AU (AU) region extraction based on multitask learning | |
CN113870124B (en) | Weak supervision-based double-network mutual excitation learning shadow removing method | |
Xu et al. | Instance segmentation of biological images using graph convolutional network | |
Su et al. | Multi‐scale cross‐path concatenation residual network for Poisson denoising | |
CN111062329A (en) | Unsupervised pedestrian re-identification method based on augmented network | |
CN112052877A (en) | Image fine-grained classification method based on cascade enhanced network | |
Qu et al. | Visual cross-image fusion using deep neural networks for image edge detection | |
Zheng et al. | Differential-evolution-based generative adversarial networks for edge detection | |
Xu et al. | AutoSegNet: An automated neural network for image segmentation | |
Zhang et al. | MultiResolution attention extractor for small object detection | |
CN116402851A (en) | Infrared dim target tracking method under complex background | |
Li et al. | Findnet: Can you find me? boundary-and-texture enhancement network for camouflaged object detection | |
Zhu et al. | A novel simple visual tracking algorithm based on hashing and deep learning | |
CN112801092B (en) | Method for detecting character elements in natural scene image | |
Dai et al. | DFN-PSAN: Multi-level deep information feature fusion extraction network for interpretable plant disease classification | |
CN109284765A (en) | The scene image classification method of convolutional neural networks based on negative value feature | |
CN116740362B (en) | Attention-based lightweight asymmetric scene semantic segmentation method and system | |
Yan et al. | Joint image-to-image translation with denoising using enhanced generative adversarial networks | |
Zhang et al. | Object tracking in siamese network with attention mechanism and Mish function |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |