CN114581752A - Camouflage target detection method based on context sensing and boundary refining - Google Patents

Camouflage target detection method based on context sensing and boundary refining Download PDF

Info

Publication number
CN114581752A
Authority
CN
China
Prior art keywords
module
target
disguised
features
global
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210495815.6A
Other languages
Chinese (zh)
Other versions
CN114581752B (en)
Inventor
史彩娟
任弼娟
陈厚儒
段昌钰
闫晓东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China University of Science and Technology
Original Assignee
North China University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China University of Science and Technology filed Critical North China University of Science and Technology
Priority to CN202210495815.6A priority Critical patent/CN114581752B/en
Publication of CN114581752A publication Critical patent/CN114581752A/en
Application granted granted Critical
Publication of CN114581752B publication Critical patent/CN114581752B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a camouflaged target detection method based on context awareness and boundary refinement. The detection model comprises a backbone network, a GCIE module, an AINF module and a BR module. The backbone network extracts multi-scale features from an image to be detected that contains a camouflaged target. The GCIE module enhances the third-, fourth- and fifth-layer features among the multi-scale features extracted by the backbone network so as to fully perceive global context information. The AINF module adopts a hierarchical structure and AFF components to fuse the features of adjacent layers, perceiving global and local information simultaneously to obtain the regional features and a coarse prediction map of the camouflaged target. The BR module refines the boundary using the spatial information in the low-level features and suppresses non-camouflage factors, so that camouflaged targets with rich boundaries can be detected better. Because the invention attends to context information and boundary information simultaneously, it can perceive camouflaged targets comprehensively, which helps improve detection performance and broadens the applicable scenarios of the invention.

Description

Camouflage target detection method based on context sensing and boundary refining
Technical Field
The invention relates to camouflaged target detection in images, belongs to the technical field of data identification, and particularly relates to a camouflaged target detection method based on context awareness and boundary refinement.
Background
In pattern recognition, camouflaged targets typically embed themselves into the surrounding environment, exploiting structural and physiological features or artificial techniques to achieve self-protection. Because a camouflaged target is highly similar to its background in visual features such as color and texture, ordinary object detection algorithms and salient object detection algorithms cannot detect it, and existing camouflaged target detection algorithms have the following shortcomings:
1. Global context information is perceived insufficiently, which limits detection performance, particularly for large camouflaged targets and occluded camouflaged targets. Existing methods rely on receptive field blocks as their only means of acquiring context information, but the receptive fields in existing camouflaged target detection methods can neither cover an entire high-resolution image nor adequately perceive interactions between different positions in the image, so global context information cannot be perceived fully and comprehensively.
2. Global and local context information are not perceived sufficiently at the same time, which limits detection performance, particularly for multiple small camouflaged targets. Existing camouflaged target detection methods mainly locate the target by perceiving rich global context information; little work fully perceives local context information at the same time.
3. Sufficient refinement of boundary information is neglected, which limits detection performance, particularly for camouflaged targets with rich boundaries.
In summary, the capability of existing camouflaged target detection algorithms remains to be improved.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art and provide a camouflaged target detection method based on context awareness and boundary refinement that improves camouflaged target detection capability.
The object of the invention is achieved by the following technical scheme:
A camouflaged target detection method based on context awareness and boundary refinement comprises: inputting an image to be detected that contains a camouflaged target into a constructed and trained camouflaged target detection model, and detecting the camouflaged target; the camouflaged target detection model comprises a backbone network, a GCIE module, an AINF module and a BR module.
the multi-scale features of the image to be detected containing the disguised target extracted by the backbone network comprise five layers of features
Figure 612306DEST_PATH_IMAGE001
Third-layer, fourth-layer and fifth-layer features in multi-scale features extracted by GCIE module for backbone network
Figure 904747DEST_PATH_IMAGE002
Enhancing and fully sensing global context information, and outputting enhanced features to an AINF module;
the AINF module adopts a hierarchical structure and an AFF component to fuse the characteristics of adjacent layers, and simultaneously senses global and local information to obtain a rough prediction graph of the regional characteristics and the camouflage target;
second-tier features of regional and backbone extraction
Figure 413044DEST_PATH_IMAGE003
And inputting the boundary into a BR module, refining the boundary by utilizing the spatial information in the low-layer characteristics and inhibiting non-disguising factors, refining the boundary of the disguised target and obtaining a fine prediction map of the disguised target.
Further, constructing and training the camouflaged target detection model comprises:
S11, dividing a pre-collected image dataset containing camouflaged targets into a training set and a test set;
S12, constructing the camouflaged target detection model;
S13, training the constructed camouflaged target detection model with the training set;
S14, testing the trained camouflaged target detection model with the test set.
Further, the GCIE module comprises a GC sub-module and a PMMC sub-module, and is used for enlarging the receptive field to fully perceive global context information and to enhance the third-, fourth- and fifth-layer features $\{f_i\}_{i=3}^{5}$ of the backbone network. The GC sub-module first obtains global context features from the backbone features, then obtains transformed features from the global context features, and finally adds the transformed features to the backbone features to obtain the enhanced features $f_i^{gc}$, i.e. the output of the GC sub-module. The PMMC sub-module first reduces the number of channels of the enhanced features $f_i^{gc}$, then inputs them into three parallel mixed convolution branches, splices and convolves the results of the three branches together with the channel-reduced features, and finally obtains the globally enhanced features $f_i^{g}$ through a skip connection and a ReLU operation, i.e. the GCIE module output.
Further, the AINF module first adopts the feature fusion component AFF to fuse the globally enhanced features $(f_3^{g}, f_4^{g})$ and $(f_4^{g}, f_5^{g})$ respectively, obtaining $f_{34}$ and $f_{45}$, and then uses the AFF component again to fuse $f_{34}$ and $f_{45}$, obtaining $f_{345}$. $f_{34}$ and $f_{45}$ are spliced and convolved, then spliced with $f_{345}$; the spliced features are convolved to obtain the regional features $f_r$ of the camouflaged target and the coarse prediction map $P_c$, i.e. the AINF module output. The coarse prediction map $P_c$ of the camouflaged target is used to construct a loss against the detection label, and the regional features $f_r$ are input to the BR module.
Further, the BR module refines the camouflaged target boundary using the spatial information in the second-layer feature $f_2$ of the backbone network. First, the second-layer feature $f_2$ is denoised to obtain $f_2'$, and $f_2'$ is added to the regional features $f_r$ to obtain the fused features $f_b$. Then $f_b$ is input to the MSCA component and the SA component in turn to compute attention coefficients, which are added back to $f_b$ through a skip connection to obtain the weighted features $f_w$. Next, $f_w$ is multiplied with $f_2'$ to enhance the camouflaged target boundary information contained in the denoised second-layer feature $f_2'$, and the multiplied features are added to the regional features $f_r$ to obtain the fine features $f_e$. Finally, $f_e$ is convolved to obtain the final fine prediction map $P_f$ of the camouflaged target, and $P_f$ is used to construct a loss against the detection label.
Further, the loss function used to train the constructed camouflaged target detection model on the training set adopts the pixel position-aware loss $L_{ppa}$, and the total loss function $L$ of the camouflaged target detection model is:

$$L = L_c + L_f$$
$$L_c = L_{ppa}\big(P_c^{\uparrow 8}, G\big), \qquad L_f = L_{ppa}\big(P_f^{\uparrow 8}, G\big)$$
$$L_{ppa} = L_{wbce} + L_{wiou}$$

where $L_c$ and $L_f$ respectively denote the supervision of the camouflaged target after the AINF module and the BR module, $L_{wbce}$ and $L_{wiou}$ respectively denote the weighted binary cross-entropy loss and the weighted intersection-over-union loss, $P_c^{\uparrow 8}$ and $P_f^{\uparrow 8}$ denote the camouflaged target prediction maps obtained by 8× upsampling the coarse prediction map $P_c$ predicted by the AINF module and the fine prediction map $P_f$ predicted by the BR module respectively, and $G$ denotes the binary label map of the camouflaged target.
Furthermore, the MSCA component comprises two branches: one branch acquires global information using a global average pooling layer and two point-wise convolution layers, while the other acquires local information using only the two point-wise convolution layers; finally, the global and local information are fused by addition and passed through a sigmoid activation function to obtain the multi-scale channel attention coefficients.
Further, for the channel-refined features processed by the MSCA component, the SA component applies global max pooling and global average pooling along the channel axis to obtain the Max feature and the Avg feature, splices them to generate a channel feature descriptor, and then generates the spatial attention coefficients using a 3×3 convolution and a sigmoid activation function.
Compared with the prior art, the invention has the following advantages:
the invention improves the detection performance of the disguised target by utilizing the deep learning technology. According to the invention, global context information is fully sensed through feature enhancement, and the detection performance of a large disguised target and a shielded disguised target is improved; global and local context information is sensed simultaneously through feature fusion, and the detection performance of a plurality of small disguised targets is improved; the detection performance of the disguised target with rich boundaries is improved by refining the boundaries of the disguised target by utilizing the spatial information in the bottom layer characteristics, the detection capability of the disguised target is improved by the characteristics, and the use scene of the method is expanded; the invention is a detection model obtained by training on a large-scale data set, and has better robustness and universality.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is an analysis diagram of the present invention showing how context awareness and boundary refinement improve camouflaged target detection accuracy;
FIG. 2 is a block diagram of a disguised object detection model of the present invention;
FIG. 3 is a block diagram of the GCIE module of the present invention;
FIG. 4 is a block diagram of an AINF module of the present invention;
FIG. 5 is a block diagram of the AFF component and the MSCA component of the present invention;
FIG. 6 is a block diagram of a BR module of the present invention;
FIG. 7 is a structural diagram of the SA component of the present invention.
Detailed Description
The invention is further illustrated by the following figures and examples.
Referring to FIGS. 2-6, a camouflaged target detection method based on context awareness and boundary refinement comprises: inputting an image to be detected that contains a camouflaged target into a constructed and trained camouflaged target detection model, and detecting the camouflaged target. The camouflaged target detection model comprises a backbone network, a GCIE module, an AINF module and a BR module. The multi-scale features that the backbone network extracts from the image to be detected comprise five layers of features $\{f_i\}_{i=1}^{5}$. The GCIE module enhances the third-, fourth- and fifth-layer features $\{f_i\}_{i=3}^{5}$ among the multi-scale features extracted by the backbone network to fully perceive global context information, and outputs the enhanced features to the AINF module. The AINF module adopts a hierarchical structure and AFF components to fuse the features of adjacent layers, perceiving global and local information simultaneously to obtain the regional features and a coarse prediction map of the camouflaged target. The regional features and the second-layer feature $f_2$ extracted by the backbone network are input to the BR module, which refines the boundary using the spatial information in the low-level feature and suppresses non-camouflage factors, refining the camouflaged target boundary, obtaining a fine prediction map of the camouflaged target, and completing detection. A sketch of how the four parts could be wired together follows.
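The following is a minimal PyTorch sketch of this overall flow. The class name `CamouflagedDetector`, the argument names, and the module interfaces are illustrative assumptions rather than the patent's reference implementation; the internals of each module are sketched later in this section.

```python
# Hypothetical wiring of the detection model; module internals are
# sketched further below. Interfaces are illustrative assumptions.
import torch
import torch.nn as nn

class CamouflagedDetector(nn.Module):
    def __init__(self, backbone, gcie3, gcie4, gcie5, ainf, br):
        super().__init__()
        self.backbone = backbone          # yields five feature levels f1..f5
        self.gcie3, self.gcie4, self.gcie5 = gcie3, gcie4, gcie5
        self.ainf = ainf                  # fuses f3g..f5g -> (f_r, P_c)
        self.br = br                      # refines with f2 -> P_f

    def forward(self, x):
        f1, f2, f3, f4, f5 = self.backbone(x)
        # GCIE: enhance the three deepest levels with global context
        f3g, f4g, f5g = self.gcie3(f3), self.gcie4(f4), self.gcie5(f5)
        # AINF: hierarchical AFF fusion -> regional features + coarse map
        f_r, p_coarse = self.ainf(f3g, f4g, f5g)
        # BR: boundary refinement with the low-level feature f2
        p_fine = self.br(f2, f_r)
        return p_coarse, p_fine           # both supervised during training
```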
In this embodiment, constructing and training the camouflaged target detection model comprises:
S11, dividing a pre-collected image dataset containing camouflaged targets into a training set and a test set;
S12, constructing the camouflaged target detection model;
S13, training the constructed camouflaged target detection model with the training set. In this embodiment, the loss function used for training adopts the pixel position-aware loss $L_{ppa}$, and the total loss function $L$ of the camouflaged target detection model is:

$$L = L_c + L_f$$
$$L_c = L_{ppa}\big(P_c^{\uparrow 8}, G\big), \qquad L_f = L_{ppa}\big(P_f^{\uparrow 8}, G\big)$$
$$L_{ppa} = L_{wbce} + L_{wiou}$$

where $L_c$ and $L_f$ respectively denote the supervision of the camouflaged target after the AINF module and the BR module, $L_{wbce}$ and $L_{wiou}$ respectively denote the weighted binary cross-entropy loss and the weighted intersection-over-union loss, $P_c^{\uparrow 8}$ and $P_f^{\uparrow 8}$ denote the camouflaged target prediction maps obtained by 8× upsampling the coarse prediction map $P_c$ predicted by the AINF module and the fine prediction map $P_f$ predicted by the BR module respectively, and $G$ denotes the binary label map of the camouflaged target;
S14, testing the trained camouflaged target detection model with the test set. A sketch of the loss computation appears after this list.
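Below is a minimal sketch of the pixel position-aware loss. The weighting scheme shown (a 31×31 local-average window and a boundary weight amplitude of 5) follows the formulation in which this loss is commonly implemented and is an assumption; the patent does not state these hyperparameters.

```python
# Pixel position-aware loss: weighted BCE + weighted IoU on logits.
# Window size 31 and weight factor 5 are assumed defaults.
import torch
import torch.nn.functional as F

def ppa_loss(pred_logits, mask):
    # Pixels whose mask value disagrees with the local mean (i.e. pixels
    # near the boundary) receive larger weights.
    weit = 1 + 5 * torch.abs(
        F.avg_pool2d(mask, kernel_size=31, stride=1, padding=15) - mask)

    wbce = F.binary_cross_entropy_with_logits(
        pred_logits, mask, reduction='none')
    wbce = (weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))

    pred = torch.sigmoid(pred_logits)
    inter = (pred * mask * weit).sum(dim=(2, 3))
    union = ((pred + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()

def total_loss(p_coarse, p_fine, gt):
    # Both prediction maps are upsampled 8x to the label resolution.
    def up(p):
        return F.interpolate(p, scale_factor=8, mode='bilinear',
                             align_corners=False)
    return ppa_loss(up(p_coarse), gt) + ppa_loss(up(p_fine), gt)
```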
In this embodiment, the multi-scale features extracted by the backbone network from the image to be detected containing the camouflaged target comprise five layers of features $\{f_i\}_{i=1}^{5}$. A backbone network trained on the ImageNet dataset generally has detection and segmentation capability; the most common networks such as VGG and ResNet may be adopted, and the backbone is not specifically limited here. The sketch below shows one possible five-level extraction.
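The following is a minimal sketch assuming a torchvision ResNet-50 with the five levels taken at the conventional stage boundaries; the class name `ResNetBackbone` and the level assignment are illustrative choices.

```python
# Five-level feature extraction from a ResNet-50 backbone (assumed choice;
# requires a torchvision version that accepts the `weights` argument).
import torch
import torch.nn as nn
from torchvision.models import resnet50

class ResNetBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        net = resnet50(weights='IMAGENET1K_V1')  # ImageNet-pretrained
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu)  # f1, 1/2
        self.pool = net.maxpool
        self.layer1 = net.layer1   # f2, 1/4  (low-level, used by BR)
        self.layer2 = net.layer2   # f3, 1/8
        self.layer3 = net.layer3   # f4, 1/16
        self.layer4 = net.layer4   # f5, 1/32

    def forward(self, x):
        f1 = self.stem(x)
        f2 = self.layer1(self.pool(f1))
        f3 = self.layer2(f2)
        f4 = self.layer3(f3)
        f5 = self.layer4(f4)
        return f1, f2, f3, f4, f5
```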
In this embodiment, the structure of the GCIE module is shown in FIG. 3. The GCIE module comprises a GC sub-module and a PMMC sub-module, enhances the third-, fourth- and fifth-layer features $\{f_i\}_{i=3}^{5}$ among the multi-scale features extracted by the backbone network to fully perceive global context information, and outputs the enhanced features to the AINF module. The GC sub-module first applies a 1×1 convolution and a softmax to the backbone features and multiplies the result with the backbone features to obtain the global context features; the global context features then pass through a 1×1 convolution, layer normalization, a ReLU operation and another 1×1 convolution to obtain the transformed features; finally, the transformed features are added to the backbone features to obtain the enhanced features $f_i^{gc}$, i.e. the output of the GC sub-module. The PMMC sub-module first reduces the number of channels of the enhanced features $f_i^{gc}$ using a 1×1 convolution and then inputs them into three parallel mixed convolution branches with dilation rates $rate = 3, 5, 7$; the results of the three branches are spliced with the channel-reduced features and passed through a 3×3 convolution; finally, a skip connection and a ReLU operation yield the globally enhanced features $f_i^{g}$, i.e. the GCIE module output. The third-, fourth- and fifth-layer features $\{f_i\}_{i=3}^{5}$ extracted by the backbone network serve as the input of the GCIE module; because the GCIE module uses a GC sub-module that captures long-range dependencies and a PMMC sub-module that simulates the receptive-field mechanism of human vision, the model can comprehensively perceive global context information and at the same time gains stronger robustness. A sketch of the two sub-modules follows.
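What follows is a minimal sketch of the two sub-modules under stated assumptions: the bottleneck ratio `r` of the GC transform, the channel counts, and the interpretation of each "mixed convolution" branch as a plain 3×3 convolution followed by a dilated 3×3 convolution are all illustrative choices the patent does not specify.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCBlock(nn.Module):
    """GC sub-module: softmax attention pooling + transform + residual."""
    def __init__(self, c, r=4):
        super().__init__()
        self.attn = nn.Conv2d(c, 1, kernel_size=1)
        self.transform = nn.Sequential(
            nn.Conv2d(c, c // r, kernel_size=1),
            nn.LayerNorm([c // r, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(c // r, c, kernel_size=1))

    def forward(self, x):
        b, c, h, w = x.shape
        # 1x1 conv + softmax over all positions -> attention weights
        w_attn = self.attn(x).view(b, 1, h * w).softmax(dim=-1)    # B,1,HW
        ctx = torch.bmm(x.view(b, c, h * w), w_attn.transpose(1, 2))
        ctx = ctx.view(b, c, 1, 1)                                 # B,C,1,1
        return x + self.transform(ctx)       # enhanced features f_gc

class PMMC(nn.Module):
    """PMMC sub-module: parallel mixed branches with dilations 3, 5, 7."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.reduce = nn.Conv2d(c_in, c_out, kernel_size=1)
        self.branches = nn.ModuleList([
            nn.Sequential(                   # assumed form of a mixed branch
                nn.Conv2d(c_out, c_out, 3, padding=1),
                nn.Conv2d(c_out, c_out, 3, padding=r, dilation=r))
            for r in (3, 5, 7)])
        self.fuse = nn.Conv2d(4 * c_out, c_out, kernel_size=3, padding=1)

    def forward(self, x):
        x = self.reduce(x)
        y = torch.cat([x] + [b(x) for b in self.branches], dim=1)
        return F.relu(self.fuse(y) + x)      # skip connection + ReLU -> f_g
```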
In this embodiment, the structure of the AINF module is shown in FIG. 4. The AINF module adopts a hierarchical structure and AFF components to fuse the features of adjacent layers, perceiving global and local information simultaneously to obtain the regional features and a coarse prediction map of the camouflaged target. It first adopts the feature fusion component AFF to fuse the globally enhanced features $(f_3^{g}, f_4^{g})$ and $(f_4^{g}, f_5^{g})$ respectively, obtaining $f_{34}$ and $f_{45}$, and then uses the AFF component again to fuse $f_{34}$ and $f_{45}$, obtaining $f_{345}$. $f_{34}$ and $f_{45}$ are spliced and passed through a 3×3 convolution, then spliced with $f_{345}$; the spliced features pass through a 3×3 convolution and a 1×1 convolution to obtain the regional features $f_r$ of the camouflaged target and the coarse prediction map $P_c$, i.e. the AINF module output. The coarse prediction map $P_c$ of the camouflaged target is used to construct a loss against the detection label, and the regional features $f_r$ are input to the BR module.
The AFF component fuses features according to:

$$f_{aff} = \delta\big(B_{3\times3}\big(M(f_l + f_h) \otimes f_l + (1 - M(f_l + f_h)) \otimes f_h\big)\big)$$

where $f_{aff}$ denotes the fused features obtained using the AFF component, $\delta$ denotes the ReLU function, $B_{3\times3}$ denotes a 3×3 convolution followed by batch normalization, $M(\cdot)$ denotes the multi-scale channel attention coefficients obtained using the MSCA component, $f_l$ and $f_h$ respectively denote the input low-level and high-level features, and $\otimes$ denotes element-wise multiplication.
The MSCA component comprises two branches: one branch acquires global information using a global average pooling layer and two point-wise convolution layers, while the other acquires local information using only the two point-wise convolution layers; finally, the global and local information are fused by addition and passed through a sigmoid activation function to obtain the multi-scale channel attention coefficients. A sketch of the MSCA and AFF components follows.
In this embodiment, the structure of the BR module is shown in FIG. 6. The regional features $f_r$ and the second-layer feature $f_2$ extracted by the backbone network are input to the BR module, which refines the boundary using the spatial information in the low-level feature and suppresses non-camouflage factors, refining the camouflaged target boundary and obtaining a fine prediction map of the camouflaged target. First, the second-layer feature $f_2$ of the backbone network is denoised to obtain $f_2'$, and $f_2'$ is added to the regional features $f_r$ to obtain the fused features $f_b$. Then $f_b$ is input to the MSCA component and the SA component in turn to compute attention coefficients, which are added back to $f_b$ through a skip connection to obtain the weighted features $f_w$. Next, $f_w$ is multiplied with $f_2'$ to enhance the camouflaged target boundary information contained in the denoised second-layer feature $f_2'$, and the multiplied features are added to the regional features $f_r$ to obtain the fine features $f_e$. Finally, $f_e$ passes through a 3×3 convolution and a 1×1 convolution to obtain the final fine prediction map $P_f$ of the camouflaged target, and $P_f$ is used to construct a loss against the detection label.
For the channel-refined features processed by the MSCA component, the SA component applies global max pooling and global average pooling along the channel axis to obtain the Max feature and the Avg feature, splices them to generate a channel feature descriptor, and then generates the spatial attention coefficients using a 3×3 convolution and a sigmoid activation function.
The BR module supplements and refines the regional features with the abundant spatial information contained in the low-level features, while the multi-scale channel attention and spatial attention it adopts effectively suppress interference from non-camouflage factors. A sketch of the SA component follows.
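Below is a minimal sketch of the SA component as described; note the 3×3 convolution, where CBAM-style spatial attention often uses 7×7.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """SA component: channel-axis max/avg pooling -> 3x3 conv -> sigmoid."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=3, padding=1)

    def forward(self, x):
        max_feat, _ = x.max(dim=1, keepdim=True)   # max pool along channels
        avg_feat = x.mean(dim=1, keepdim=True)     # avg pool along channels
        desc = torch.cat([max_feat, avg_feat], dim=1)  # channel descriptor
        return torch.sigmoid(self.conv(desc))      # spatial coefficients
```

In the BR flow, these coefficients (together with the MSCA coefficients) weight the fused features $f_b$, and the skip connection then adds $f_b$ back to form the weighted features $f_w$.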
FIG. 1(a) is an input image containing a camouflaged target, FIG. 1(b) is the initial prediction map output by the backbone network, FIG. 1(c) is the coarse prediction map obtained using context awareness, FIG. 1(d) is the fine prediction map obtained after boundary refinement, and FIG. 1(e) is the binary label map of the input image. In the image, the target (a sea dragon) is similar in shape and texture to the seaweed in the background, and its color is similar to the background color, so it can be regarded as a camouflaged target. As can be seen from the figure, the initial prediction map (FIG. 1(b)) obtained after the backbone network detects the input image is incomplete and has unclear edges, so the camouflaged target detection capability needs improvement; the accuracy of the position and contour of the camouflaged target in the coarse prediction map (FIG. 1(c)) obtained using context awareness is greatly improved, but the edge details are still not fine enough; and the boundary of the camouflaged target in the fine prediction map (FIG. 1(d)) obtained after boundary refinement is further clarified. The idea of context awareness and boundary refinement can not only accurately highlight the position and main contour of the camouflaged target, but also obtain a fine target boundary. Therefore, the camouflaged target detection method based on context awareness and boundary refinement provided by the invention improves the overall detection capability and accuracy for camouflaged targets, so that it can be applied to more practical data detection and recognition scenarios involving camouflage (such as the military, medical, biological, agricultural and traffic fields) and improve the working efficiency of the relevant personnel. The method mainly applies deep learning, adding a context awareness module and a boundary refinement module to the neural network and combining context information (global and local context information) with boundary information to effectively separate the camouflaged target from an environment with a complex background.
The above-described embodiments are preferred embodiments of the present invention, but the present invention is not limited thereto; any other modification or equivalent substitution that does not depart from the technical spirit of the present invention is included within the scope of the present invention.

Claims (8)

1. A camouflaged target detection method based on context awareness and boundary refinement, comprising: inputting an image to be detected that contains a camouflaged target into a constructed and trained camouflaged target detection model, and detecting the camouflaged target; characterized in that the camouflaged target detection model comprises a backbone network, a GCIE module, an AINF module and a BR module;
the multi-scale features that the backbone network extracts from the image to be detected containing the camouflaged target comprise five layers of features $\{f_i\}_{i=1}^{5}$;
the GCIE module enhances the third-, fourth- and fifth-layer features $\{f_i\}_{i=3}^{5}$ among the multi-scale features extracted by the backbone network to fully perceive global context information, and outputs the enhanced features to the AINF module;
the AINF module adopts a hierarchical structure and AFF components to fuse the features of adjacent layers, perceiving global and local information simultaneously to obtain the regional features and a coarse prediction map of the camouflaged target;
the regional features and the second-layer feature $f_2$ extracted by the backbone network are input to the BR module, which refines the boundary using the spatial information in the low-level feature and suppresses non-camouflage factors, refining the camouflaged target boundary and obtaining a fine prediction map of the camouflaged target.
2. The camouflaged target detection method based on context awareness and boundary refinement according to claim 1, wherein constructing and training the camouflaged target detection model comprises:
S11, dividing a pre-collected image dataset containing camouflaged targets into a training set and a test set;
S12, constructing the camouflaged target detection model;
S13, training the constructed camouflaged target detection model with the training set;
S14, testing the trained camouflaged target detection model with the test set.
3. The camouflaged target detection method based on context awareness and boundary refinement according to claim 1, wherein the GCIE module comprises a GC sub-module and a PMMC sub-module, and is used for enlarging the receptive field to fully perceive global context information and to enhance the third-, fourth- and fifth-layer features $\{f_i\}_{i=3}^{5}$ of the backbone network; the GC sub-module first obtains global context features from the backbone features, then obtains transformed features from the global context features, and finally adds the transformed features to the backbone features to obtain the enhanced features $f_i^{gc}$, i.e. the output of the GC sub-module; the PMMC sub-module first reduces the number of channels of the enhanced features $f_i^{gc}$, then inputs them into three parallel mixed convolution branches, splices and convolves the results of the three branches together with the channel-reduced features, and finally obtains the globally enhanced features $f_i^{g}$ through a skip connection and a ReLU operation, i.e. the GCIE module output.
4. The camouflaged target detection method based on context awareness and boundary refinement according to claim 1, wherein the AINF module first adopts the feature fusion component AFF to fuse the globally enhanced features $(f_3^{g}, f_4^{g})$ and $(f_4^{g}, f_5^{g})$ respectively, obtaining $f_{34}$ and $f_{45}$, and then uses the AFF component again to fuse $f_{34}$ and $f_{45}$, obtaining $f_{345}$; $f_{34}$ and $f_{45}$ are spliced and convolved, then spliced with $f_{345}$, and the spliced features are convolved to obtain the regional features $f_r$ of the camouflaged target and the coarse prediction map $P_c$, i.e. the AINF module output; the coarse prediction map $P_c$ of the camouflaged target is used to construct a loss against the detection label, and the regional features $f_r$ are input to the BR module.
5. The camouflaged target detection method based on context awareness and boundary refinement according to claim 1, wherein the BR module refines the camouflaged target boundary using the spatial information in the second-layer feature $f_2$ of the backbone network; first, the second-layer feature $f_2$ is denoised to obtain $f_2'$, and $f_2'$ is added to the regional features $f_r$ to obtain the fused features $f_b$; then $f_b$ is input to the MSCA component and the SA component in turn to compute attention coefficients, which are added back to $f_b$ through a skip connection to obtain the weighted features $f_w$; next, $f_w$ is multiplied with $f_2'$ to enhance the camouflaged target boundary information contained in the denoised second-layer feature $f_2'$, and the multiplied features are added to the regional features $f_r$ to obtain the fine features $f_e$; finally, $f_e$ is convolved to obtain the final fine prediction map $P_f$ of the camouflaged target, and $P_f$ is used to construct a loss against the detection label.
6. The camouflaged target detection method based on context awareness and boundary refinement according to claim 2, wherein the loss function used to train the constructed camouflaged target detection model on the training set adopts the pixel position-aware loss $L_{ppa}$, and the total loss function $L$ of the camouflaged target detection model is:

$$L = L_c + L_f$$
$$L_c = L_{ppa}\big(P_c^{\uparrow 8}, G\big), \qquad L_f = L_{ppa}\big(P_f^{\uparrow 8}, G\big)$$
$$L_{ppa} = L_{wbce} + L_{wiou}$$

where $L_c$ and $L_f$ respectively denote the supervision of the camouflaged target after the AINF module and the BR module, $L_{wbce}$ and $L_{wiou}$ respectively denote the weighted binary cross-entropy loss and the weighted intersection-over-union loss, $P_c^{\uparrow 8}$ and $P_f^{\uparrow 8}$ denote the camouflaged target prediction maps obtained by 8× upsampling the coarse prediction map $P_c$ predicted by the AINF module and the fine prediction map $P_f$ predicted by the BR module respectively, and $G$ denotes the binary label map of the camouflaged target.
7. The camouflaged target detection method based on context awareness and boundary refinement according to claim 5, wherein the MSCA component comprises two branches: one branch acquires global information using a global average pooling layer and two point-wise convolution layers, while the other acquires local information using only the two point-wise convolution layers; finally, the global and local information are fused by addition and passed through a sigmoid activation function to obtain the multi-scale channel attention coefficients.
8. The camouflaged target detection method based on context awareness and boundary refinement according to claim 5, wherein, for the channel-refined features processed by the MSCA component, the SA component applies global max pooling and global average pooling along the channel axis to obtain the Max feature and the Avg feature, splices them to generate a channel feature descriptor, and then generates the spatial attention coefficients using a 3×3 convolution and a sigmoid activation function.
CN202210495815.6A 2022-05-09 2022-05-09 Camouflage target detection method based on context awareness and boundary refinement Active CN114581752B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210495815.6A CN114581752B (en) 2022-05-09 2022-05-09 Camouflage target detection method based on context awareness and boundary refinement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210495815.6A CN114581752B (en) 2022-05-09 2022-05-09 Camouflage target detection method based on context awareness and boundary refinement

Publications (2)

Publication Number Publication Date
CN114581752A (en) 2022-06-03
CN114581752B CN114581752B (en) 2022-07-15

Family

ID=81769028

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210495815.6A Active CN114581752B (en) 2022-05-09 2022-05-09 Camouflage target detection method based on context awareness and boundary refinement

Country Status (1)

Country Link
CN (1) CN114581752B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778867A (en) * 2016-12-15 2017-05-31 北京旷视科技有限公司 Object detection method and device, neural network training method and device
CN112733744A (en) * 2021-01-14 2021-04-30 北京航空航天大学 Camouflage object detection model based on edge cooperative supervision and multi-level constraint
CN112750140A (en) * 2021-01-21 2021-05-04 大连理工大学 Disguised target image segmentation method based on information mining
CN113139450A (en) * 2021-04-16 2021-07-20 广州大学 Camouflage target detection method based on edge detection
CN113468996A (en) * 2021-06-22 2021-10-01 广州大学 Camouflage object detection method based on edge refinement
CN114220013A (en) * 2021-12-17 2022-03-22 扬州大学 Camouflaged object detection method based on boundary alternating guidance
CN114549567A (en) * 2022-02-23 2022-05-27 大连理工大学 Disguised target image segmentation method based on omnibearing sensing

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XIUQI XU et al.: "Boundary guidance network for camouflage object detection", Image and Vision Computing *
YUJIA SUN et al.: "Context-aware Cross-level Fusion Network for Camouflaged Object Detection", arXiv *
HE Linyan et al.: "Research Progress on Camouflaged Object Detection and Segmentation", Software Guide (软件导刊) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115063373A (en) * 2022-06-24 2022-09-16 山东省人工智能研究院 Social network image tampering positioning method based on multi-scale feature intelligent perception
CN115346094A (en) * 2022-08-25 2022-11-15 杭州电子科技大学 Camouflage target detection method based on main body area guidance
CN115346094B (en) * 2022-08-25 2023-08-22 杭州电子科技大学 Camouflage target detection method based on main body region guidance
CN115376094A (en) * 2022-10-27 2022-11-22 山东聚祥机械股份有限公司 Unmanned sweeper road surface identification method and system based on scale perception neural network
CN116894943A (en) * 2023-07-20 2023-10-17 深圳大学 Double-constraint camouflage target detection method and system
CN116703950A (en) * 2023-08-07 2023-09-05 中南大学 Camouflage target image segmentation method and system based on multi-level feature fusion
CN116703950B (en) * 2023-08-07 2023-10-20 中南大学 Camouflage target image segmentation method and system based on multi-level feature fusion
CN117237645A (en) * 2023-11-15 2023-12-15 中国农业科学院农业资源与农业区划研究所 Training method, device and equipment of semantic segmentation model based on boundary enhancement
CN117237645B (en) * 2023-11-15 2024-02-06 中国农业科学院农业资源与农业区划研究所 Training method, device and equipment of semantic segmentation model based on boundary enhancement

Also Published As

Publication number Publication date
CN114581752B (en) 2022-07-15

Similar Documents

Publication Publication Date Title
CN114581752B (en) Camouflage target detection method based on context awareness and boundary refinement
CN110956094B (en) RGB-D multi-mode fusion personnel detection method based on asymmetric double-flow network
Mou et al. A relation-augmented fully convolutional network for semantic segmentation in aerial scenes
Abdollahi et al. Building footprint extraction from high resolution aerial images using generative adversarial network (GAN) architecture
Peng et al. Detecting heads using feature refine net and cascaded multi-scale architecture
CN109460764B (en) Satellite video ship monitoring method combining brightness characteristics and improved interframe difference method
CN112288008B (en) Mosaic multispectral image disguised target detection method based on deep learning
CN104700381A (en) Infrared and visible light image fusion method based on salient objects
CN113222819B (en) Remote sensing image super-resolution reconstruction method based on deep convolution neural network
CN113408594A (en) Remote sensing scene classification method based on attention network scale feature fusion
CN111582232A (en) SLAM method based on pixel-level semantic information
Zhang et al. Self-attention guidance and multi-scale feature fusion based uav image object detection
Zhan et al. Vegetation land use/land cover extraction from high-resolution satellite images based on adaptive context inference
CN112446357A (en) SAR automatic target recognition method based on capsule network
Sun et al. IRDCLNet: Instance segmentation of ship images based on interference reduction and dynamic contour learning in foggy scenes
CN112149526A (en) Lane line detection method and system based on long-distance information fusion
CN115861756A (en) Earth background small target identification method based on cascade combination network
CN112926667B (en) Method and device for detecting saliency target of depth fusion edge and high-level feature
CN114463624A (en) Method and device for detecting illegal buildings applied to city management supervision
CN112598032A (en) Multi-task defense model construction method for anti-attack of infrared image
CN111008555B (en) Unmanned aerial vehicle image small and weak target enhancement extraction method
Marcu et al. Object contra context: Dual local-global semantic segmentation in aerial images
CN110852172B (en) Method for expanding crowd counting data set based on Cycle Gan picture collage and enhancement
CN113139450A (en) Camouflage target detection method based on edge detection
CN112633158A (en) Power transmission line corridor vehicle identification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant