CN115620049A - Method for detecting camouflaged targets based on polarized image cues and application thereof

Method for detecting camouflaged targets based on polarized image cues and application thereof

Info

Publication number: CN115620049A
Application number: CN202211210090.8A
Authority: CN (China)
Prior art keywords: image, level, layer, polarization, feature map
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 王昕, 丁甲甲, 张钊, 张勇, 周育民, 高隽
Current Assignee: Hefei University of Technology
Original Assignee: Hefei University of Technology
Application filed by Hefei University of Technology

Classifications

    • G06V 10/764: Image or video recognition or understanding using pattern recognition or machine learning; using classification, e.g. of video objects
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V 10/774: Processing image or video features in feature spaces; generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/806: Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level; of extracted features
    • G06V 10/82: Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 2201/07: Indexing scheme relating to image or video recognition or understanding; target detection

Abstract

The invention discloses a camouflaged target detection method based on polarized image cues and an application thereof. The method comprises the following steps: 1. acquiring a polarized image data set with pixel-level labels; 2. computing an intensity image I, a degree-of-linear-polarization image DoLP and an angle-of-linear-polarization image AoLP from the four original polarization-angle images; 3. performing data enhancement on the intensity, DoLP and AoLP images; 4. constructing a deep convolutional network based on polarized image cues, taking the intensity, DoLP and AoLP images as input, and training the network to obtain a camouflaged target detection model; 5. performing camouflaged target detection on the intensity, DoLP and AoLP images to be detected with the trained model. The invention detects camouflaged targets from polarized image cues and thereby effectively improves the accuracy of camouflaged target detection in complex and changeable environments.

Description

Method for detecting camouflaged targets based on polarized image cues and application thereof
Technical Field
The invention belongs to the field of computer vision and image processing and analysis, and particularly relates to a method for detecting camouflaged targets based on polarized image cues.
Background
The term "camouflage" originally described the ability of animals such as insects to hide themselves in their surroundings in order to avoid pursuit by natural enemies; camouflage reduces the risk of being discovered and increases the chances of survival. For example, a chameleon can change its appearance according to the colors and patterns of its surroundings. Humans have adopted this mechanism and applied it widely: soldiers and military equipment, for instance, enhance camouflage or concealment through dress and coloring. Many civilian applications must also handle scenes in which the target is highly similar to its environment, such as polyp segmentation in medical images and pest identification in agriculture.
In recent years, camouflaged target detection has received increasing attention from researchers, and many well-performing camouflaged target detection models have been proposed. Existing frameworks fall broadly into two categories: traditional camouflaged target detection relying on visual features (color, texture, motion, gradient, etc.) and camouflaged target detection based on deep learning. Traditional models depend heavily on hand-designed visual features; although they have made some progress in fixed scenes, their generalization ability is limited. In other words, once the environment changes significantly, the visual features must be redesigned, the data relabeled and the model redesigned, so traditional camouflaged target detection methods are unsuitable for scenes with complicated backgrounds. Deep-learning-based methods train a camouflaged target detection model on a certain amount of training data and evaluate the trained model on test data. Relying on the strong learning ability of deep neural networks and integrating diverse features, learning-based methods greatly improve detection accuracy compared with traditional methods that rely on visual features.
However, these learning-based approaches still have deficiencies: 1. most deep-learning-based methods take a single RGB image as input for training and testing, learn little additional visual perception knowledge, and struggle to detect camouflaged targets or regions in complex scenes such as highlight, dim light, partial occlusion, low contrast or cluttered background; 2. other deep-learning-based camouflaged target detection methods use RGB-D as input, feeding an RGB image together with a depth map; the additional depth information has been shown to improve camouflaged target detection, but when the depth map quality is poor the detection result deteriorates.
Disclosure of Invention
The invention provides a method for detecting camouflaged targets based on polarized image cues, and an application thereof, to overcome the above deficiencies of the prior art, so that camouflaged targets can be detected effectively from polarized images in complex scenes and the accuracy and efficiency of camouflaged target detection can be improved.
In order to achieve this purpose, the invention adopts the following technical scheme:
The invention discloses a method for detecting camouflaged targets based on polarized image cues, which is characterized by comprising the following steps:
Step 1, acquiring a polarized image data set with pixel-level labels;
Step 1.1, using the division-of-focal-plane polarimeter in a polarization camera, acquiring for the n-th shooting scene a group of original polarization images I_θ^n with polarization directions θ of 0°, 45°, 90° and 135°, thereby obtaining N groups of original polarization images for N scenes, where I_θ^n represents the original polarization image of the n-th scene in polarization direction θ, n ∈ [1, N];
Step 1.2, labeling the N groups of original polarization images to obtain pixel-level label images, in which each pixel takes a value in {0, 1}; the N groups of original polarization images and the corresponding pixel-level label images form the training data set D_tr;
Step 2, computing an intensity image, a degree-of-linear-polarization image and an angle-of-linear-polarization image from a group of original polarization images of the n-th scene;
Step 2.1, computing the Stokes vector S^n = (S_0^n, S_1^n, S_2^n)^T of the original polarization images I_θ^n of the n-th scene using formula (1):

S_0^n = (I_0^n + I_45^n + I_90^n + I_135^n) / 2,   S_1^n = I_0^n - I_90^n,   S_2^n = I_45^n - I_135^n    (1)

In formula (1), S_0^n represents the total light intensity of the objects in the n-th scene, i.e. the intensity image I_n; S_1^n and S_2^n represent the linearly polarized components along the horizontal/vertical and ±45° directions of the n-th scene, respectively;
Step 2.2, computing and imaging the Stokes vector using formula (2) to obtain the degree-of-linear-polarization image DoLP_n of the n-th scene:

DoLP_n = sqrt((S_1^n)^2 + (S_2^n)^2) / S_0^n    (2)

Step 2.3, computing and imaging the Stokes vector using formula (3) to obtain the angle-of-linear-polarization image AoLP_n of the n-th scene:

AoLP_n = (1/2) arctan(S_2^n / S_1^n)    (3)

Step 3, performing data enhancement on the intensity image I_n, the degree-of-linear-polarization image DoLP_n and the angle-of-linear-polarization image AoLP_n of the n-th scene to obtain the enhanced intensity image I_n', degree-of-linear-polarization image DoLP_n' and angle-of-linear-polarization image AoLP_n' of the n-th scene, which together form the n-th polarized image set;
Taking the pixel-level label image of the n-th scene as the real camouflage map of the n-th scene, denoted G_n, and applying the same data enhancement to G_n to obtain the enhanced real camouflage map G_n';
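As a concrete illustration of the computations in step 2, the following NumPy sketch derives the Stokes components, DoLP and AoLP from the four original polarization images; the function and variable names are illustrative and not part of the patent.

```python
import numpy as np

def polarization_cues(i0, i45, i90, i135, eps=1e-8):
    """Derive intensity, DoLP and AoLP from the four polarization-angle images,
    following formulas (1)-(3). Inputs are float arrays of identical shape."""
    i0, i45, i90, i135 = (np.asarray(x, dtype=np.float64) for x in (i0, i45, i90, i135))

    # Formula (1): Stokes components of linear polarization.
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity I_n
    s1 = i0 - i90                        # horizontal/vertical component
    s2 = i45 - i135                      # +/-45 degree component

    # Formula (2): degree of linear polarization, clipped to [0, 1].
    dolp = np.clip(np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + eps), 0.0, 1.0)

    # Formula (3): angle of linear polarization, in (-pi/2, pi/2].
    aolp = 0.5 * np.arctan2(s2, s1)
    return s0, dolp, aolp
```

The DoLP and AoLP maps can then be rescaled and replicated to three channels to form the grayscale images described in the embodiment below.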
Step 4, constructing a camouflage target detection model based on the polarized image clues, comprising the following steps: the device comprises an encoder, a channel dimension reduction module, a fusion module, an up-sampling module and an output module;
Step 4.1, the encoder consists of three Res2Net50 backbone networks with the same structure; each Res2Net backbone consists of H down-sampling blocks, denoted DSampleBlock_1, ..., DSampleBlock_h, ..., DSampleBlock_H, where DSampleBlock_h denotes the h-th level down-sampling block, h = 1, 2, ..., H; the h-th level down-sampling block DSampleBlock_h is formed by connecting the h-th level X-layer two-dimensional convolution layers Dconv2d_h in series;
The x-th two-dimensional convolution layer Dconv2d_{h,x} of the h-th level consists, in order, of a convolution layer, a batch normalization layer and a ReLU activation layer, where the kernel size of the convolution layer in Dconv2d_{h,x} is k_x × k_x, x = 1, 2, ..., X;
When h = 1, the n-th polarized image set is fed into the encoder, and the three inputs pass through the h-th level down-sampling blocks DSampleBlock_h, i.e. the h-th level X-layer convolution layers Dconv2d_h, of the three Res2Net backbones, which respectively output the h-th level n-th intensity image feature map F_{I,h}^n, the h-th level n-th degree-of-polarization image feature map F_{D,h}^n and the h-th level n-th angle-of-polarization image feature map F_{A,h}^n; x_h, y_h, c_h denote the height, width and number of channels of the feature maps output by the h-th level down-sampling block DSampleBlock_h;
When h = 2, 3, ..., H, the (h-1)-th level n-th intensity image feature map F_{I,h-1}^n, the (h-1)-th level n-th degree-of-polarization image feature map F_{D,h-1}^n and the (h-1)-th level n-th angle-of-polarization image feature map F_{A,h-1}^n are fed into the h-th level down-sampling blocks DSampleBlock_h of the corresponding Res2Net backbones, which produce the corresponding intensity image feature map F_{I,h}^n, degree-of-polarization image feature map F_{D,h}^n and angle-of-polarization image feature map F_{A,h}^n; thus the H-th level two-dimensional convolution layers Dconv2d_H of the three Res2Net backbones finally output the intensity image feature map F_{I,H}^n, the degree-of-polarization image feature map F_{D,H}^n and the angle-of-polarization image feature map F_{A,H}^n; x_{h-1}, y_{h-1}, c_{h-1} denote the height, width and number of channels of the feature map output by the (h-1)-th level down-sampling block DSampleBlock_{h-1}, and x_H, y_H, c_H those of the H-th level down-sampling block DSampleBlock_H;
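A structural sketch of the three-stream encoder of step 4.1 is given below, using the Res2Net-50 implementation from the timm library as a stand-in for the backbone; the class name and the choice of pretrained weights are assumptions.

```python
import timm
import torch.nn as nn

class TriStreamEncoder(nn.Module):
    """Three structurally identical Res2Net50 backbones, one per input modality
    (intensity I, DoLP, AoLP); weights are not shared between the streams."""

    def __init__(self):
        super().__init__()
        def backbone():
            # features_only=True returns the intermediate feature maps of the
            # four residual stages (levels h = 1..4 in the text).
            return timm.create_model('res2net50_26w_4s', pretrained=True,
                                     features_only=True, out_indices=(1, 2, 3, 4))
        self.enc_i, self.enc_d, self.enc_a = backbone(), backbone(), backbone()

    def forward(self, img_i, img_d, img_a):
        # Each call yields a list of per-level feature maps [F_1, F_2, F_3, F_4].
        return self.enc_i(img_i), self.enc_d(img_d), self.enc_a(img_a)
```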
Step 4.2, the channel dimension-reduction module is formed by connecting H levels of two-dimensional convolution layers in series, each level consisting, in order, of a two-dimensional convolution layer with kernel size k'_h × k'_h, a batch normalization layer and a ReLU activation function;
The intensity image feature map F_{I,h}^n, the degree-of-polarization image feature map F_{D,h}^n and the angle-of-polarization image feature map F_{A,h}^n are processed by the channel dimension-reduction module, which outputs the reduced intensity image feature map R_{I,h}^n, degree-of-polarization image feature map R_{D,h}^n and angle-of-polarization image feature map R_{A,h}^n, h = 1, 2, ..., H, where c_N denotes the number of channels of the feature maps after the channel dimension-reduction module;
Step 4.3, the fusion module consists of H fusion blocks, denoted Fusion_1, ..., Fusion_h, ..., Fusion_H, where Fusion_h denotes the h-th level fusion block;
The 1st level fusion block Fusion_1 comprises the 1st level T-layer two-dimensional convolution layers Dconv2d'_1 and the 1st level O-layer convolution layers conv_1; each of the other H-1 fusion blocks comprises T two-dimensional convolution layers, an up-sampling layer and O convolution layers;
The t-th two-dimensional convolution layer Dconv2d'_{h,t} of the h-th level T-layer two-dimensional convolution layers Dconv2d'_h consists, in order, of a convolution layer, a batch normalization layer and a ReLU activation layer, where the kernel size of the t-th convolution layer is k''_t × k''_t; the o-th convolution layer conv_{h,o} of the h-th level O-layer convolution layers conv_h has kernel size k'''_o × k'''_o;
When h = 1, the reduced intensity image feature map R_{I,h}^n, degree-of-polarization image feature map R_{D,h}^n and angle-of-polarization image feature map R_{A,h}^n are jointly fed into the fusion module; after being processed by the h-th level T-layer two-dimensional convolution layers Dconv2d'_h of the h-th level fusion block Fusion_h, they yield the h-th level intensity image feature map F'_{I,h}^n, degree-of-polarization image feature map F'_{D,h}^n and angle-of-polarization image feature map F'_{A,h}^n, which are concatenated along the channel dimension to obtain the h-th level concatenated feature map C_h^n;
The h-th level concatenated feature map C_h^n then passes through the h-th level T-layer two-dimensional convolution layers Dconv2d'_h to give the processed concatenated feature map C'_h^n, and then through the h-th level O-layer convolution layers conv_h to give the dimension-reduced concatenated feature map W_h^n, where c_o is the number of channels output by the convolution layers;
By a splitting operation the dimension-reduced concatenated feature map W_h^n is split into three weight feature maps w_{I,h}^n, w_{D,h}^n and w_{A,h}^n; the intensity image feature map F'_{I,h}^n is multiplied by w_{I,h}^n, the degree-of-polarization image feature map F'_{D,h}^n by w_{D,h}^n and the angle-of-polarization image feature map F'_{A,h}^n by w_{A,h}^n; the three products are added, and the h-th level coarse feature map P_h^n is finally output;
When H =2,3.... H, the H-level intensity image feature map is used
, together with the h-th level degree-of-polarization image feature map and the h-th level angle-of-polarization image feature map, as the input to the h-th level up-sampling layer Usample_h; after their spatial resolution is enlarged by a factor of a, the h-th level up-sampled intensity feature map U_{I,h}^n, up-sampled degree-of-polarization feature map U_{D,h}^n and up-sampled angle-of-polarization feature map U_{A,h}^n are output; x_h, y_h denote the height and width of the feature maps output by the up-sampling layer;
U_{I,h}^n, U_{D,h}^n and U_{A,h}^n are each added to the corresponding intensity, degree-of-polarization and angle-of-polarization feature maps; the sums are fed into the h-th level T-layer two-dimensional convolution layers Dconv2d'_h, which output the h-th level intensity image feature map F'_{I,h}^n, degree-of-polarization image feature map F'_{D,h}^n and angle-of-polarization image feature map F'_{A,h}^n; these are concatenated along the channel dimension to obtain the h-th level concatenated feature map C_h^n;
The h-th level concatenated feature map C_h^n then passes through the h-th level T-layer two-dimensional convolution layers Dconv2d'_h to give the processed concatenated feature map C'_h^n, and then through the h-th level O-layer convolution layers conv_h to give the dimension-reduced concatenated feature map W_h^n, where c_o is the number of channels output by the convolution layers;
By a splitting operation the dimension-reduced concatenated feature map W_h^n is split into three weight feature maps w_{I,h}^n, w_{D,h}^n and w_{A,h}^n; the intensity image feature map F'_{I,h}^n is multiplied by w_{I,h}^n, the degree-of-polarization image feature map F'_{D,h}^n by w_{D,h}^n and the angle-of-polarization image feature map F'_{A,h}^n by w_{A,h}^n; the three products are added, and the h-th level coarse feature map P_h^n is finally output;
Thus the H-th level fusion block outputs the H-th level coarse feature map P_H^n;
Step 4.4, the up-sampling module is composed of H-2 aggregation convolution blocks and an up-sampling block, and each aggregation convolution block comprises: the system comprises an aggregation convolution layer, a batch normalization layer and a Relu activation function layer, wherein the convolution kernel size of the aggregation convolution layer is k x k;
mapping the H-th coarse feature
 map P_H^n into the up-sampling block for a-fold up-sampling, then concatenating it along the channel dimension with the (H-1)-th level coarse feature map P_{H-1}^n output by the (H-1)-th level fusion block Fusion_{H-1} to give the (H-1)-th level concatenated coarse feature map, the (H-1)-th level semi-coarse feature map M_{H-1}^n is obtained after processing by the H-2 aggregation convolution blocks;
The (H-1)-th level semi-coarse feature map M_{H-1}^n is fed into the up-sampling block for a-fold up-sampling and concatenated along the channel dimension with the coarse feature map P_{H-2}^n output by the (H-2)-th level fusion block Fusion_{H-2}, giving the (H-2)-th level concatenated coarse feature map; after processing by the H-2 aggregation convolution blocks, the (H-2)-th level semi-coarse feature map M_{H-2}^n is obtained;
Proceeding in this way, the 1st level semi-coarse feature map M_1^n is obtained after processing by the H-2 aggregation convolution blocks;
The output module consists of a single convolution layer, a two-dimensional convolution with kernel size b × b;
The 1st level semi-coarse feature map M_1^n of the n-th scene is processed by the output module to output the n-th camouflage prediction map pre_n;
Step 5, training the camouflaged target detection model based on polarized image cues;
Based on the data-enhanced intensity images, degree-of-linear-polarization images, angle-of-linear-polarization images and corresponding real camouflage maps of the N scenes, the camouflaged target detection model based on polarized image cues is trained by gradient descent; the weighted binary cross-entropy loss and the weighted IoU loss are jointly used as the loss function to compute the loss between the camouflage prediction map and the real camouflage map, and the model parameters are updated until the loss function converges, yielding the optimal camouflaged target detection model based on polarized image cues, which is then used to perform camouflaged target detection on any intensity image and polarization images to be predicted.
An electronic device of the invention comprises a memory and a processor, wherein the memory is used to store a program that supports the processor in executing the camouflaged target detection method, and the processor is configured to execute the program stored in the memory.
A computer-readable storage medium of the invention has a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the camouflaged target detection method.
Compared with the prior art, the invention has the following beneficial effects:
1. By constructing a neural network based on polarized image cues and supervising the deep network with label data, the invention obtains a robust camouflaged target detection model. It overcomes the problems of statistical models designed around cues such as color, depth and background priors, which ignore much feature information and yield low detection accuracy, and thus detects camouflaged targets more accurately.
2. The deep neural network based on polarized image cues constructed by the invention introduces polarization information. Because different objects reflect polarized light differently, a polarization difference exists across the image, and this characteristic is used to separate the target from the background. Meanwhile, the intensity image is complementary to the degree-of-polarization and angle-of-polarization images, which avoids the problem in RGB-D camouflaged target detection that a low-quality depth map degrades the detection result.
3. In the deep neural network based on polarized image cues of the invention, the intensity image I, the degree-of-linear-polarization image DoLP and the angle-of-linear-polarization image AoLP are fed into the network in parallel; the fusion module fuses the intensity, degree-of-polarization and angle-of-polarization features obtained by the encoder, enhancing the representation of the intensity image. This overcomes the weakness of single-image camouflaged target detection networks, which learn little additional visual perception knowledge, and thereby effectively improves the robustness of camouflaged target detection in low-contrast or complex scenes.
4. The up-sampling module of the invention aggregates the fused intensity, degree-of-polarization and angle-of-polarization features at each encoder level, producing a more refined prediction with smoother detection edges and improving the accuracy of camouflaged target detection.
Drawings
FIG. 1 is a flow chart of the method of the present invention for detecting camouflaged targets using polarized image cues;
FIG. 2 is a schematic diagram of the deep neural network structure based on polarized image cues according to the present invention;
FIG. 3 shows the detection results of the method of the present invention and of other camouflaged target detection methods on the polarization data set.
Detailed Description
In this embodiment, as shown in FIG. 1, a method for detecting camouflaged targets based on polarized image cues aims to solve the problems that existing networks learn little additional visual perception knowledge and perform poorly in low-contrast or cluttered-background scenes. By constructing a deep neural network based on polarized image cues, a multi-channel-input camouflaged target detection model is obtained that can effectively detect camouflaged targets in complex scenes, improving the accuracy and precision of camouflaged target detection in low-contrast or complex and changeable environments. Specifically, the method includes the following steps:
step 1, obtaining a polarized image data set with pixel level marks;
step 1.1, acquiring a group of original polarization images with polarization directions theta of 0 degrees, 45 degrees, 90 degrees and 135 degrees respectively in an nth shooting scene by using a focus-dividing plane polarimeter in a polarization camera
 I_θ^n, thereby obtaining N groups of original polarization images for the N scenes, where I_θ^n represents the original polarization image of the n-th scene in polarization direction θ, n ∈ [1, N];
In this embodiment, a division-of-focal-plane (DoFP) polarimeter in a LUCID Triton polarization camera is used to capture the polarization camouflaged target detection data set, which contains N = 639 scenes in total, each image being 352 pixels wide and 352 pixels high.
Step 1.2, labeling the N groups of original polarization images to obtain pixel-level label images, in which each pixel takes a value in {0, 1}; the N groups of original polarization images and the corresponding pixel-level label images form the polarized image data set D, which is divided into a training data set D_tr and a test data set D_te;
In this embodiment, labeling is performed with the Labelme software: each pixel of a polarized image is assigned a category label V ∈ {0, 1}, rendered as black or white, where black marks background pixels and white marks target pixels. The polarization camouflaged target detection data set is divided into a training set of 511 scenes and a test set of 128 scenes.
Step 2, calculating a group of original polarization images of the nth scene to obtain an intensity image, a linear polarization degree image and a linear polarization angle image;
step 2.1, calculating an original polarization image in the polarization direction theta in the nth scene by using the formula (1)
 I_θ^n, obtaining the Stokes vector S^n = (S_0^n, S_1^n, S_2^n)^T of the n-th scene:

S_0^n = (I_0^n + I_45^n + I_90^n + I_135^n) / 2,   S_1^n = I_0^n - I_90^n,   S_2^n = I_45^n - I_135^n    (1)

In formula (1), S_0^n represents the total light intensity of the objects in the n-th scene, i.e. the intensity image I_n; S_1^n and S_2^n represent the linearly polarized components along the horizontal/vertical and ±45° directions of the n-th scene, respectively;
In this embodiment, the differences among the four original polarization-angle images of the same scene determine the Stokes vector, which is an important polarization parameter; the resulting intensity image is a color image with 3 channels, and its height and width are consistent with those of the original polarization-angle images.
Step 2.2, computing and imaging the Stokes vector using formula (2) to obtain the degree-of-linear-polarization image DoLP_n of the n-th scene:

DoLP_n = sqrt((S_1^n)^2 + (S_2^n)^2) / S_0^n    (2)

In this embodiment, the degree-of-polarization image DoLP obtained from the Stokes vector is a grayscale image with 3 channels, and its height and width are consistent with those of the original polarization-angle images.
Step 2.3, computing and imaging the Stokes vector using formula (3) to obtain the angle-of-linear-polarization image AoLP_n of the n-th scene:

AoLP_n = (1/2) arctan(S_2^n / S_1^n)    (3)

In this embodiment, the angle-of-polarization image AoLP obtained from the Stokes vector is a grayscale image with 3 channels, and its height and width are consistent with those of the original polarization-angle images.
Step 3, intensity image I of nth scene n Linear polarization degree image DoLP n And linear polarization angle image AoLP n Carrying out data enhancement processing to obtain the enhanced intensity image of the nth scene
 I_n', the enhanced degree-of-linear-polarization image DoLP_n' and the enhanced angle-of-linear-polarization image AoLP_n', which together form the n-th polarized image set;
Taking the pixel-level label image of the n-th scene as the real camouflage map of the n-th scene, denoted G_n, the same data enhancement is applied to G_n to obtain the enhanced real camouflage map G_n';
In this embodiment, the intensity, DoLP and AoLP images and the corresponding real camouflage maps in the polarization camouflaged target detection data set are augmented by rotation and mirror flipping, expanding the intensity, DoLP and AoLP images and the corresponding real camouflage maps of each scene to 4 times their original number.
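A minimal sketch of this 4x augmentation is shown below; the particular set of variants (original, 180-degree rotation, horizontal mirror, rotation plus mirror) is an assumption, since the text only states rotation and mirror flipping with a four-fold expansion.

```python
import numpy as np

def augment_scene(intensity, dolp, aolp, gt):
    """Expand one scene (I, DoLP, AoLP and its real camouflage map G) to 4 variants
    by rotation and mirror flipping; the same geometric transform is applied to
    every modality and to the label."""
    def variants(img):
        img = np.asarray(img)
        rot = np.rot90(img, 2, axes=(0, 1))        # 180-degree rotation
        return [img, rot, img[:, ::-1], rot[:, ::-1]]

    return list(zip(variants(intensity), variants(dolp), variants(aolp), variants(gt)))
```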
Step 4, constructing a detection model of the disguised target based on the polarized image clues, comprising the following steps: the device comprises an encoder, a channel dimension reduction module, a fusion module, an up-sampling module and an output module;
step 4.1, constructing an encoder, wherein the encoder is composed of three Res2Net50 backbone networks with the same structure, each Res2Net backbone network is composed of H down-sampling blocks, and the H down-sampling blocks are respectively marked as DSMampleBlock 1 ,……,DSampleBlock h ,......,DSampleBlock H (ii) a Wherein, the DSampleBlock h Representing a block of down-sampling of the h-th levelH =1,2, ·.. H; and the h-th level down-sampling block DSampleBlock h From the h-th X layer two-dimensional convolution layer Dconv2d h Are connected in series;
in this embodiment, H =4,resnet backbone network corresponds to Res1, res2, res3, res4 in fig. 2, where Res1, res2, res3, res4 correspond to 4 downsampling blocks dsampletlock respectively 1 ,DSampleBlock 2 ,DSampleBlock 3 ,DSampleBlock 4
The x-th two-dimensional convolution layer Dconv2d of the h-th order h,x Sequentially comprises the following steps: a convolution layer, a batch normalization and a Relu activation function layer, wherein the x-th two-dimensional convolution layer Dconv2d h,x The convolution kernel size of the convolution layer in (1) is k x ;x=1,2,......,X;
In the present embodiment, X =3,k x Equal to 1,3,1, respectively; when h =1, DSMampleBlock 1 Contains 9 Dconv2d 1 When h =2, DSMampleBlock 2 Contains 12 Dconv2d 2 When h =3, DSMampleBlock 3 Contains 18 Dconv2d 3 When h =4, DSMampleBlock 4 Contains 9 Dconv2d 4
When h =1, inputting the nth polarized image set into the encoder, and respectively corresponding to h-level down-sampling blocks DSampleBlock passing through three Res2Net backbone networks h H-th order X-layer convolution layer Dconv2d in (1) h And respectively outputting the nth intensity image characteristic map of the h level
 F_{I,h}^n, the h-th level n-th degree-of-polarization image feature map F_{D,h}^n and the h-th level n-th angle-of-polarization image feature map F_{A,h}^n, respectively; x_h, y_h, c_h denote the height, width and number of channels of the feature maps output by the h-th level down-sampling block DSampleBlock_h; in this example, x_1 = 88, y_1 = 88, c_1 = 256.
When h = 2, 3, ..., H, the (h-1)-th level n-th intensity image feature map F_{I,h-1}^n, degree-of-polarization image feature map F_{D,h-1}^n and angle-of-polarization image feature map F_{A,h-1}^n are fed into the h-th level down-sampling blocks DSampleBlock_h of the corresponding Res2Net backbones, which produce the corresponding intensity image feature map F_{I,h}^n, degree-of-polarization image feature map F_{D,h}^n and angle-of-polarization image feature map F_{A,h}^n; thus the H-th level two-dimensional convolution layers Dconv2d_H of the three Res2Net backbones finally output the intensity image feature map F_{I,H}^n, the degree-of-polarization image feature map F_{D,H}^n and the angle-of-polarization image feature map F_{A,H}^n; x_{h-1}, y_{h-1}, c_{h-1} denote the height, width and number of channels of the feature map output by the (h-1)-th level down-sampling block DSampleBlock_{h-1}, and x_H, y_H, c_H those of the H-th level down-sampling block DSampleBlock_H;
In this example, the output feature maps of the 2nd level two-dimensional convolution layers, F_{I,2}^n, F_{D,2}^n and F_{A,2}^n, have height, width and channel number 44, 44 and 512; those of the 3rd level, F_{I,3}^n, F_{D,3}^n and F_{A,3}^n, have 22, 22 and 352; and those of the 4th level, F_{I,4}^n, F_{D,4}^n and F_{A,4}^n, have 11, 11 and 2048.
Step 4.2, the channel dimension reduction module is formed by sequentially connecting H layers of two-dimensional convolution layers in series, and each layer of two-dimensional convolution layer sequentially comprises: one convolution kernel is k' h ×k′ h The two-dimensional convolution layer, a batch normalization and a Relu activation function; in this embodiment, k' h 1,3, respectively, the channel dimension reduction module corresponds to the CJ module in FIG. 2.
Intensity image feature map
 F_{I,h}^n, the degree-of-polarization image feature map F_{D,h}^n and the angle-of-polarization image feature map F_{A,h}^n are processed by the channel dimension-reduction module, which outputs the reduced intensity image feature map R_{I,h}^n, degree-of-polarization image feature map R_{D,h}^n and angle-of-polarization image feature map R_{A,h}^n, h = 1, 2, ..., H, where c_N denotes the number of channels of the feature maps after the channel dimension-reduction module; in the present embodiment, c_N = 32 for h = 1, 2, 3, 4.
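One level of the channel dimension-reduction (CJ) module can be sketched as below; reading the two kernel sizes k'_h = 1 and 3 as two stacked convolutions is an interpretation of the text, and the helper name is illustrative.

```python
import torch.nn as nn

def cj_block(in_channels, out_channels=32):
    """One level of the channel dimension-reduction module (CJ in FIG. 2):
    a 1x1 convolution followed by a 3x3 convolution, each with BN + ReLU,
    mapping an encoder feature map to c_N = 32 channels."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    )

# One CJ block per encoder level and per modality, e.g. for the intensity stream
# of a standard Res2Net-50:
# cj_i = nn.ModuleList([cj_block(c) for c in (256, 512, 1024, 2048)])
```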
Step 4.3, constructing the Fusion module is composed of H Fusion modules and respectively marked as Fusion 1 ,...,Fusion h ,...,Fusion H Wherein, fusion h Representing a h-th level fusion block; in the present embodiment, fusion h Corresponding to the Fuse module in fig. 2.
Wherein, the Fusion block Fusion of the level 1 1 Includes a 1 st-stage T-layer two-dimensional convolution layer Dconv2d' 1 And O layer convolution layer conv of level 1 1 The other H-1 level fusion blocks comprise T two-dimensional convolution layers, an upper sampling layer and O layers;
h-th-stage T-layer two-dimensional convolution layer Dconv2d' h T-th two-dimensional buildup layer of (1) Dconv2d' h,t Sequentially comprises the following steps: a convolution layer, a batch normalization and a Relu activation function layer, wherein the convolution kernel size of the t-th convolution layer is k ″) t ×k″ t (ii) a H-th O-layer convolution layer conv h Second layer convolution layer conv h,o Has a convolution kernel size of k' o ×k″′ o (ii) a In the present embodiment, T =2,k ″ t 1,3 respectively; o =1,k' o =1。
When h =1, the intensity image feature map is set
 R_{I,h}^n, together with the degree-of-polarization image feature map R_{D,h}^n and the angle-of-polarization image feature map R_{A,h}^n, as the joint input to the fusion module; after being processed by the h-th level T-layer two-dimensional convolution layers Dconv2d'_h of the h-th level fusion block Fusion_h, they yield the h-th level intensity image feature map F'_{I,h}^n, degree-of-polarization image feature map F'_{D,h}^n and angle-of-polarization image feature map F'_{A,h}^n, which are concatenated along the channel dimension to obtain the h-th level concatenated feature map C_h^n;
The h-th level concatenated feature map C_h^n then passes through the h-th level T-layer two-dimensional convolution layers Dconv2d'_h to give the processed concatenated feature map C'_h^n, and then through the h-th level O-layer convolution layers conv_h to give the dimension-reduced concatenated feature map W_h^n, where c_o is the number of channels output by the convolution layers; in this embodiment, c_o = 3.
By a splitting operation the dimension-reduced concatenated feature map W_h^n is split into the weight feature maps w_{I,h}^n, w_{D,h}^n and w_{A,h}^n; the intensity image feature map F'_{I,h}^n is multiplied by w_{I,h}^n, the degree-of-polarization image feature map F'_{D,h}^n by w_{D,h}^n and the angle-of-polarization image feature map F'_{A,h}^n by w_{A,h}^n; the three products are added, and the 1st level coarse feature map P_1^n is finally output; in this embodiment w_{I,h}^n, w_{D,h}^n and w_{A,h}^n each have 1 channel, and the height, width and channel number of P_1^n are 88, 88 and 32, respectively.
When H =2,3.... H, the H-level intensity image feature map is used
, together with the h-th level degree-of-polarization image feature map and the h-th level angle-of-polarization image feature map, as the input to the h-th level up-sampling layer Usample_h; after their spatial resolution is enlarged by a factor of a, the h-th level up-sampled intensity feature map U_{I,h}^n, up-sampled degree-of-polarization feature map U_{D,h}^n and up-sampled angle-of-polarization feature map U_{A,h}^n are output; x_h, y_h denote the height and width of the feature maps output by the up-sampling layer; in this embodiment, a = 2 and the up-sampling layer at each level uses a transposed convolution (ConvTranspose) to up-sample the signal; x_2, y_2 are 44, 44; x_3, y_3 are 22, 22; x_4, y_4 are 11, 11.
U_{I,h}^n, U_{D,h}^n and U_{A,h}^n are each added to the corresponding intensity, degree-of-polarization and angle-of-polarization feature maps; the sums are fed into the h-th level T-layer two-dimensional convolution layers Dconv2d'_h, which output the h-th level intensity image feature map F'_{I,h}^n, degree-of-polarization image feature map F'_{D,h}^n and angle-of-polarization image feature map F'_{A,h}^n; these are concatenated along the channel dimension to obtain the h-th level concatenated feature map C_h^n;
The h-th level concatenated feature map C_h^n then passes through the h-th level T-layer two-dimensional convolution layers Dconv2d'_h to give the processed concatenated feature map C'_h^n, and then through the h-th level O-layer convolution layers conv_h to give the dimension-reduced concatenated feature map W_h^n, where c_o is the number of channels output by the convolution layers;
By a splitting operation the dimension-reduced concatenated feature map W_h^n is split into the weight feature maps w_{I,h}^n, w_{D,h}^n and w_{A,h}^n; the intensity image feature map F'_{I,h}^n is multiplied by w_{I,h}^n, the degree-of-polarization image feature map F'_{D,h}^n by w_{D,h}^n and the angle-of-polarization image feature map F'_{A,h}^n by w_{A,h}^n; the three products are added, and the h-th level coarse feature map P_h^n is finally output, so that the H-th level fusion block outputs the H-th level coarse feature map P_H^n;
In this embodiment, c_o = 3; the height, width and channel number of P_2^n, P_3^n and P_4^n are 44, 44, 32; 22, 22, 32; and 11, 11, 32, respectively.
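The weighting scheme of the level-1 fusion block (Fuse in FIG. 2) can be sketched as follows: the three 32-channel maps are refined, concatenated, compressed to a 3-channel map, split into per-modality weights and summed. The element-wise additions with the up-sampled features of the deeper blocks are omitted because their operands are not fully specified above; class and variable names are illustrative.

```python
import torch
import torch.nn as nn

def conv_bn_relu(cin, cout, k):
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=k, padding=k // 2, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class FusionBlock(nn.Module):
    """Level-1 fusion block: per-modality refinement (T = 2 layers, 1x1 then 3x3),
    concatenation, prediction of c_o = 3 weight channels, and weighted sum."""

    def __init__(self, channels=32):
        super().__init__()
        self.refine = nn.ModuleList(
            [nn.Sequential(conv_bn_relu(channels, channels, 1),
                           conv_bn_relu(channels, channels, 3)) for _ in range(3)])
        self.mix = nn.Sequential(conv_bn_relu(3 * channels, 3 * channels, 1),
                                 conv_bn_relu(3 * channels, 3 * channels, 3))
        self.to_weights = nn.Conv2d(3 * channels, 3, kernel_size=1)   # O = 1, k''' = 1

    def forward(self, f_i, f_d, f_a):
        f_i, f_d, f_a = (r(f) for r, f in zip(self.refine, (f_i, f_d, f_a)))
        cat = torch.cat([f_i, f_d, f_a], dim=1)        # concatenated map C_h
        w = self.to_weights(self.mix(cat))             # dimension-reduced map W_h
        w_i, w_d, w_a = torch.split(w, 1, dim=1)       # three weight maps
        return f_i * w_i + f_d * w_d + f_a * w_a       # coarse feature map P_h
```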
Step 4.4, the up-sampling module is composed of H-2 aggregation convolution blocks and an up-sampling block, and each aggregation convolution block comprises: the system comprises an aggregation convolution layer, a batch normalization layer and a Relu activation function layer, wherein the size of a convolution kernel of the aggregation convolution layer is k multiplied by k; in this embodiment, k =3, the convolution step is 1, and the padding is 0.
Drawing the H-th coarse feature
 map P_H^n into the up-sampling block for a-fold up-sampling, then concatenating it along the channel dimension with the (H-1)-th level coarse feature map P_{H-1}^n output by the (H-1)-th level fusion block Fusion_{H-1} to give the (H-1)-th level concatenated coarse feature map, the (H-1)-th level semi-coarse feature map M_{H-1}^n is obtained after processing by the H-2 aggregation convolution blocks;
The (H-1)-th level semi-coarse feature map M_{H-1}^n is fed into the up-sampling block for a-fold up-sampling and concatenated along the channel dimension with the coarse feature map P_{H-2}^n output by the (H-2)-th level fusion block Fusion_{H-2}, giving the (H-2)-th level concatenated coarse feature map; after processing by the H-2 aggregation convolution blocks, the (H-2)-th level semi-coarse feature map M_{H-2}^n is obtained;
Proceeding in this way, the 1st level semi-coarse feature map M_1^n is obtained after processing by the H-2 aggregation convolution blocks; in this embodiment, the n-th semi-coarse feature maps have 32 channels.
The output module consists of a single convolution layer, a two-dimensional convolution with kernel size b × b; in this embodiment, the kernel size of the two-dimensional convolution is 1 × 1, with 32 input channels and 1 output channel.
The n-th 1st level semi-coarse feature map M_1^n is processed by the output module to output the n-th camouflage prediction map pre_n.
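The up-sampling (decoder) module and output module of step 4.4 can be sketched as below: the deepest coarse map is repeatedly up-sampled by a factor of 2, concatenated with the next shallower coarse map, processed by a 3x3 aggregation convolution, and finally mapped to a 1-channel prediction by a 1x1 convolution. Bilinear up-sampling and padding 1, which keep the spatial sizes aligned, are simplifying assumptions; the embodiment states padding 0 for the aggregation convolutions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Decoder(nn.Module):
    """Aggregates the coarse feature maps P_1..P_4 (shallow to deep) and produces
    the camouflage prediction map pre_n."""

    def __init__(self, channels=32, levels=4):
        super().__init__()
        # One aggregation block (3x3 conv + BN + ReLU) per merge step.
        self.agg = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            ) for _ in range(levels - 1)])
        # Output module: 1x1 convolution, 32 -> 1 channel.
        self.head = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, coarse):                  # coarse = [P_1, P_2, P_3, P_4]
        x = coarse[-1]                          # deepest coarse map P_H
        for p, agg in zip(reversed(coarse[:-1]), self.agg):
            x = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
            x = agg(torch.cat([x, p], dim=1))   # semi-coarse map M_h
        return self.head(x)                     # logit prediction map pre_n
```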
Step 5, training a camouflage target detection model based on a polarized image clue;
and 5.1, training a disguised target detection model based on a polarized image clue by using a gradient descent method based on the intensity images, the linear polarization degree images, the linear polarization angle images and the corresponding real disguised images of the N scenes after data enhancement, and using weighted binary cross entropy loss and weighted IoU loss as loss functions together for calculating the loss between the disguised prediction image and the real disguised image so as to update model parameters until the loss functions are converged, thereby obtaining the disguised target detection model of the optimal polarized image clue, and performing disguised target detection on any intensity image to be predicted and any polarization degree image.
In this embodiment, the intensity, DoLP and AoLP images of the 2044 scenes obtained after data enhancement of the polarization data set, together with the corresponding real camouflage maps, are used for training; the outputs of the channel dimension-reduction module, the fusion module, the up-sampling module and the output module are each compared with the real camouflage map by computing the weighted binary cross-entropy and weighted IoU losses, giving 5 training losses; the 5 training losses are added to obtain a total loss, and the total loss, combined with a gradient-descent algorithm, guides the network training, yielding the camouflaged target detection feature model that uses polarized image cues.
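The weighted binary cross-entropy and weighted IoU losses are not spelled out above; the sketch below shows one common PyTorch formulation of such a combined loss, in which pixels near object boundaries receive larger weights. The boundary-aware weighting is an assumption; the patent only names the two loss terms.

```python
import torch
import torch.nn.functional as F

def weighted_bce_iou_loss(pred, mask):
    """Weighted BCE + weighted IoU between a logit map `pred` and a binary mask.
    The total training loss is the sum of this quantity over the 5 supervised outputs."""
    # Pixels whose 31x31 neighbourhood disagrees with the mask (edges) get weight > 1.
    weit = 1 + 5 * torch.abs(
        F.avg_pool2d(mask, kernel_size=31, stride=1, padding=15) - mask)

    # Weighted binary cross-entropy.
    wbce = F.binary_cross_entropy_with_logits(pred, mask, reduction='none')
    wbce = (weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))

    # Weighted IoU.
    prob = torch.sigmoid(pred)
    inter = ((prob * mask) * weit).sum(dim=(2, 3))
    union = ((prob + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)

    return (wbce + wiou).mean()
```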
In this embodiment, an electronic device comprises a memory and a processor, wherein the memory is used to store a program that supports the processor in executing the camouflaged target detection method, and the processor is configured to execute the program stored in the memory.
In this embodiment, a computer-readable storage medium has a computer program stored thereon, and the computer program, when executed by a processor, performs the steps of the camouflaged target detection method.
TABLE 1
Methods S-measure E-measure F-measure MAE
BASNet 0.830 0.868 0.722 0.020
SINet-V1 0.789 0.811 0.646 0.042
LSR 0.863 0.910 0.793 0.014
PFNet 0.849 0.910 0.769 0.017
C2FNet 0.860 0.913 0.774 0.018
SINet-V2 0.865 0.925 0.784 0.015
Ours 0.879 0.927 0.805 0.014
Table 1 compares the proposed camouflaged target detection method based on polarized image cues with other current camouflaged target detection methods on the test set of the polarization camouflaged target detection data set, using S-measure, E-measure, F-measure and MAE as evaluation indices. S-measure measures the region-oriented and object-oriented structural similarity between the predicted camouflage map and the real camouflage map; the closer its value is to 1, the better the camouflaged target detection. E-measure combines local pixel values with the image-level mean, jointly capturing image-level statistics and local pixel-matching information; the closer its value is to 1, the better the detection. F-measure is the weighted harmonic mean of precision and recall; the closer its value is to 1, the better the detection. MAE is the mean absolute error, measuring the difference between predicted and true values; the closer its value is to 0, the better the detection. The quantitative analysis in Table 1 shows that the method of the present invention achieves the best result on all four evaluation indices.
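For reference, the two simplest indices can be computed as below; MAE follows its standard definition, and the F-measure shown uses the beta^2 = 0.3 weighting that is conventional in this field but not stated in the table.

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a prediction map in [0, 1] and a binary mask."""
    return float(np.mean(np.abs(pred - gt)))

def f_measure(pred, gt, beta2=0.3, threshold=0.5):
    """Weighted harmonic mean of precision and recall at a fixed threshold."""
    binary = pred >= threshold
    positive = gt > 0.5
    tp = np.logical_and(binary, positive).sum()
    precision = tp / (binary.sum() + 1e-8)
    recall = tp / (positive.sum() + 1e-8)
    return float((1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8))
```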
FIG. 3 shows the detection results of the proposed camouflaged target detection method based on polarized image cues and of other current camouflaged target detection methods. Here, Ours denotes the proposed camouflaged target detection method based on polarized image cues; BASNet stacks two U-shaped networks in sequence, generates a saliency map in a predict-refine manner and further proposes a hybrid training loss to supervise training; SINet-V1, inspired by the behavior of hunters, designs a search module and an identification module to identify camouflaged objects; LSR introduces a multi-task learning framework into camouflaged target prediction with auxiliary tasks such as classification, providing a model that simultaneously localizes, segments and ranks camouflaged targets, whose Rank module orders targets by detection difficulty; PFNet learns from contextual features, proposes a new distraction-mining strategy and develops a framework for accurate camouflaged target detection that localizes potential targets by exploring long-range correlations and refines the segmentation by distraction discovery and removal; C2FNet designs a dual-branch global context module (DGCM) on top of contextual features to mine rich context information and further introduces an attention-induced cross-layer fusion module (ACFM) to aggregate multi-level features; SINet-V2 uses a detection mechanism similar to SINet-V1, also divided into searching and identification, and adopts reverse guidance to obtain the final camouflage prediction map in order to better fuse inter-layer information and prevent information loss or feature redundancy.

Claims (3)

1. A camouflaged target detection method based on polarized image clues is characterized by comprising the following steps:
step 1, obtaining a polarized image data set with pixel level marks;
step 1.1, acquiring a group of original polarization images with polarization directions theta of 0 degrees, 45 degrees, 90 degrees and 135 degrees respectively in an nth shooting scene by using a focus-dividing plane polarimeter in a polarization camera
Figure FDA0003874845170000011
Thereby obtaining N groups of original polarization images under N scenes; wherein the content of the first and second substances,
Figure FDA0003874845170000012
representing the original polarization image in the n-th scene in the polarization direction theta, n ∈ [1, N];
Step 1.2, respectively labeling the N groups of original polarization images to obtain pixel-level labeled images; and the value range of each pixel point in the pixel-level labeling image is between (0, 1); forming a training data set D by N groups of original polarization images and corresponding pixel level labeled images tr
Step 2, from the group of original polarization images of the $n$-th scene, compute an intensity image, a degree of linear polarization image, and an angle of linear polarization image;
Step 2.1, compute the Stokes vector $S_n = [S_0^n, S_1^n, S_2^n]^{T}$ of the original polarization images $I_n^{\theta}$ of the $n$-th scene using Eq. (1):

$$S_0^n = \tfrac{1}{2}\left(I_n^{0°} + I_n^{45°} + I_n^{90°} + I_n^{135°}\right),\qquad S_1^n = I_n^{0°} - I_n^{90°},\qquad S_2^n = I_n^{45°} - I_n^{135°} \tag{1}$$

In Eq. (1), $S_0^n$ represents the total light intensity of the objects in the $n$-th scene, i.e., the intensity image $I_n$; $S_1^n$ and $S_2^n$ respectively describe the linearly polarized light components along the vertical/horizontal and diagonal directions in the $n$-th scene;
Step 2.2, from the Stokes vector $S_n$, compute and image the degree of linear polarization image $DoLP_n$ of the $n$-th scene using Eq. (2):

$$DoLP_n = \frac{\sqrt{(S_1^n)^2 + (S_2^n)^2}}{S_0^n} \tag{2}$$

Step 2.3, from the Stokes vector $S_n$, compute and image the angle of linear polarization image $AoLP_n$ of the $n$-th scene using Eq. (3):

$$AoLP_n = \tfrac{1}{2}\arctan\!\left(\frac{S_2^n}{S_1^n}\right) \tag{3}$$
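A minimal NumPy sketch of Step 2 is shown below. The function name and the small epsilon guard are assumptions; the Stokes, DoLP and AoLP formulas follow the standard relations given in Eqs. (1)-(3).

```python
import numpy as np

def stokes_from_polarization(i0, i45, i90, i135, eps=1e-8):
    """Compute intensity, DoLP and AoLP images from four polarization
    images captured at 0/45/90/135 degrees (float arrays of equal shape)."""
    # Stokes components (Eq. 1): S0 is the total intensity,
    # S1/S2 are the linear polarization differences.
    s0 = 0.5 * (i0 + i45 + i90 + i135)
    s1 = i0 - i90
    s2 = i45 - i135

    # Degree of linear polarization (Eq. 2), clipped to [0, 1].
    dolp = np.clip(np.sqrt(s1**2 + s2**2) / (s0 + eps), 0.0, 1.0)

    # Angle of linear polarization (Eq. 3); may be rescaled to [0, 1]
    # for visualization or network input.
    aolp = 0.5 * np.arctan2(s2, s1 + eps)

    return s0, dolp, aolp
```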
Step 3, apply data enhancement to the intensity image $I_n$, the degree of linear polarization image $DoLP_n$, and the angle of linear polarization image $AoLP_n$ of the $n$-th scene, obtaining the enhanced intensity image $\tilde{I}_n$, enhanced degree of linear polarization image $\widetilde{DoLP}_n$, and enhanced angle of linear polarization image $\widetilde{AoLP}_n$, which together form the $n$-th polarized image set;
take the pixel-level annotation image of the $n$-th scene as the real camouflage image of the $n$-th scene, denoted $G_n$; apply data enhancement to the real camouflage image $G_n$ of the $n$-th scene to obtain the enhanced real camouflage image $\tilde{G}_n$;
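The claim does not specify which enhancement operations are used. The sketch below assumes, for illustration only, a random horizontal flip applied consistently to all three modalities and the ground-truth mask, which is the essential constraint when augmenting multi-modal pairs.

```python
import random
import numpy as np

def enhance_sample(intensity, dolp, aolp, gt, flip_p=0.5):
    """Apply the same augmentation to every modality and the ground truth.
    The concrete operation (horizontal flip) is an assumption; the claim
    only states that data enhancement is performed."""
    if random.random() < flip_p:
        intensity = np.flip(intensity, axis=1).copy()
        dolp = np.flip(dolp, axis=1).copy()
        aolp = np.flip(aolp, axis=1).copy()
        gt = np.flip(gt, axis=1).copy()
    return intensity, dolp, aolp, gt
```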
Step 4, construct the camouflaged target detection model based on polarized image cues, which comprises: an encoder, a channel dimension reduction module, a fusion module, an up-sampling module, and an output module;
Step 4.1, the encoder is composed of three Res2Net50 backbone networks with the same structure; each Res2Net backbone network is composed of $H$ down-sampling blocks, denoted $DSampleBlock_1, \ldots, DSampleBlock_h, \ldots, DSampleBlock_H$, where $DSampleBlock_h$ represents the $h$-th level down-sampling block, $h = 1, 2, \ldots, H$; the $h$-th level down-sampling block $DSampleBlock_h$ is formed by connecting the $h$-th level $X$-layer two-dimensional convolution layers $Dconv2d_h$ in series;
the $x$-th two-dimensional convolution layer $Dconv2d_{h,x}$ of the $h$-th level consists, in order, of: a convolution layer, a batch normalization layer, and a ReLU activation layer, where the convolution kernel size of the convolution layer in $Dconv2d_{h,x}$ is $k_x \times k_x$, $x = 1, 2, \ldots, X$;
when $h = 1$, the $n$-th polarized image set is input into the encoder, and its three images respectively pass through the $h$-th level down-sampling blocks $DSampleBlock_h$, i.e., the $h$-th level $X$-layer convolution layers $Dconv2d_h$, of the three Res2Net backbone networks, which respectively output the $h$-th level $n$-th intensity image feature map $F_{I,n}^{h}$, the $h$-th level $n$-th degree of polarization image feature map $F_{D,n}^{h}$, and the $h$-th level $n$-th polarization angle feature map $F_{A,n}^{h}$; $x_h, y_h, c_h$ respectively denote the height, width and number of channels of the feature map output by the $h$-th level down-sampling block $DSampleBlock_h$;
when $h = 2, 3, \ldots, H$, the $(h{-}1)$-th level $n$-th intensity image feature map $F_{I,n}^{h-1}$, the $(h{-}1)$-th level $n$-th degree of polarization image feature map $F_{D,n}^{h-1}$, and the $(h{-}1)$-th level $n$-th polarization angle image feature map $F_{A,n}^{h-1}$ are correspondingly input into the $h$-th level down-sampling blocks $DSampleBlock_h$ of the three Res2Net backbone networks, which output the corresponding intensity image feature map $F_{I,n}^{h}$, degree of polarization image feature map $F_{D,n}^{h}$, and polarization angle image feature map $F_{A,n}^{h}$; thus the $H$-th level two-dimensional convolution layers $Dconv2d_H$ of the three Res2Net backbone networks correspondingly output the final intensity image feature map $F_{I,n}^{H}$, degree of polarization image feature map $F_{D,n}^{H}$, and polarization angle image feature map $F_{A,n}^{H}$; $x_{h-1}, y_{h-1}, c_{h-1}$ respectively denote the height, width and number of channels of the feature map output by the $(h{-}1)$-th level down-sampling block $DSampleBlock_{h-1}$; $x_H, y_H, c_H$ respectively denote the height, width and number of channels of the feature map output by the $H$-th level down-sampling block $DSampleBlock_H$;
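A minimal sketch of the three-backbone encoder is given below. It assumes timm's 'res2net50_26w_4s' variant with features_only=True as a stand-in for the Res2Net50 backbones, that the three backbones do not share weights, and that each modality is fed as a 3-channel tensor; none of these details are stated in the claim.

```python
import timm
import torch.nn as nn

class TripleRes2NetEncoder(nn.Module):
    """Three structurally identical Res2Net50 backbones, one per modality
    (intensity, DoLP, AoLP). Each returns multi-level feature maps, one per
    down-sampling stage."""

    def __init__(self):
        super().__init__()
        # Exact Res2Net50 variant is an assumption (timm 'res2net50_26w_4s').
        make = lambda: timm.create_model('res2net50_26w_4s',
                                         pretrained=True, features_only=True)
        self.intensity_net = make()
        self.dolp_net = make()
        self.aolp_net = make()

    def forward(self, intensity, dolp, aolp):
        # Inputs are expected as 3-channel tensors (single-channel maps can
        # simply be replicated). Each call returns a list of feature maps.
        return (self.intensity_net(intensity),
                self.dolp_net(dolp),
                self.aolp_net(aolp))
```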
Step 4.2, the channel dimension reduction module is formed by connecting $H$ two-dimensional convolution layers in series, each of which consists, in order, of: a two-dimensional convolution layer with kernel size $k_h' \times k_h'$, a batch normalization layer, and a ReLU activation function;
the intensity image feature map $F_{I,n}^{h}$, degree of polarization image feature map $F_{D,n}^{h}$, and polarization angle image feature map $F_{A,n}^{h}$ are processed by the channel dimension reduction module, which outputs the intensity image feature map $\hat{F}_{I,n}^{h}$, degree of polarization image feature map $\hat{F}_{D,n}^{h}$, and polarization angle image feature map $\hat{F}_{A,n}^{h}$, $h = 1, 2, \ldots, H$, where $c_N$ denotes the number of channels of the feature maps after the channel dimension reduction module;
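A per-level channel reduction block is a straightforward Conv-BN-ReLU unit. In the sketch below the kernel size and the target channel count $c_N$ (1×1 and 64) are assumptions, since the claim leaves $k_h'$ and $c_N$ unspecified.

```python
import torch.nn as nn

class ChannelReduce(nn.Module):
    """Channel dimension reduction for one level: Conv2d -> BatchNorm -> ReLU."""

    def __init__(self, in_channels, out_channels=64, kernel_size=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size,
                      padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```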
Step 4.3, the Fusion module is composed of $H$ fusion blocks, denoted $Fusion_1, \ldots, Fusion_h, \ldots, Fusion_H$, where $Fusion_h$ represents the $h$-th level fusion block;
the 1st-level fusion block $Fusion_1$ includes the 1st-level $T$-layer two-dimensional convolution layers $Dconv2d_1'$ and the 1st-level $O$-layer convolution layers $conv_1$; each of the remaining $H{-}1$ fusion blocks includes $T$ two-dimensional convolution layers, an up-sampling layer, and $O$ convolution layers;
the $t$-th two-dimensional convolution layer $Dconv2d_{h,t}'$ of the $h$-th level $T$-layer two-dimensional convolution layers $Dconv2d_h'$ consists, in order, of: a convolution layer, a batch normalization layer, and a ReLU activation layer, where the convolution kernel size of the $t$-th convolution layer is $k_t'' \times k_t''$; the $o$-th convolution layer $conv_{h,o}$ of the $h$-th level $O$-layer convolution layers $conv_h$ has a convolution kernel of size $k_o''' \times k_o'''$;
when $h = 1$, the intensity image feature map $\hat{F}_{I,n}^{h}$, degree of polarization image feature map $\hat{F}_{D,n}^{h}$, and polarization angle image feature map $\hat{F}_{A,n}^{h}$ are jointly input into the Fusion module and processed by the $h$-th level $T$-layer two-dimensional convolution layers $Dconv2d_h'$ of the $h$-th level fusion block $Fusion_h$, which respectively output the $h$-th level intensity image feature map $P_{I,n}^{h}$, the $h$-th level degree of polarization image feature map $P_{D,n}^{h}$, and the $h$-th level polarization angle image feature map $P_{A,n}^{h}$; after channel concatenation, the $h$-th level concatenated feature map $C_n^{h}$ is obtained;
the $h$-th level concatenated feature map $C_n^{h}$ is then processed by the $h$-th level $T$-layer two-dimensional convolution layers $Dconv2d_h'$ to output the processed concatenated feature map $\hat{C}_n^{h}$, and then passes through the $h$-th level $O$-layer convolution layers $conv_h$ to output the dimension-reduced concatenated feature map $W_n^{h}$, where $c_o$ is the number of channels output after the convolution layers;
by a splitting operation, the dimension-reduced concatenated feature map $W_n^{h}$ is split into three weight feature maps $W_{I,n}^{h}$, $W_{D,n}^{h}$ and $W_{A,n}^{h}$; the intensity image feature map $P_{I,n}^{h}$ is multiplied by $W_{I,n}^{h}$, the degree of polarization image feature map $P_{D,n}^{h}$ is multiplied by $W_{D,n}^{h}$, and the polarization angle image feature map $P_{A,n}^{h}$ is multiplied by $W_{A,n}^{h}$; the three products are added, and the $h$-th level coarse feature map $R_n^{h}$ is finally output;
when $h = 2, 3, \ldots, H$, the $h$-th level intensity image feature map $\hat{F}_{I,n}^{h}$, degree of polarization image feature map $\hat{F}_{D,n}^{h}$, and polarization angle image feature map $\hat{F}_{A,n}^{h}$ are input into the $h$-th level up-sampling layer $Usample_h$, which changes their spatial resolution to $a$ times that of its input and outputs the $h$-th level up-sampled intensity feature map $\hat{F}_{I,n}^{h\uparrow}$, up-sampled degree of polarization feature map $\hat{F}_{D,n}^{h\uparrow}$, and up-sampled polarization angle feature map $\hat{F}_{A,n}^{h\uparrow}$; $x_h, y_h$ respectively denote the height and width of the feature map output by the up-sampling layer;
the up-sampled intensity feature map $\hat{F}_{I,n}^{h\uparrow}$, the up-sampled degree of polarization feature map $\hat{F}_{D,n}^{h\uparrow}$, and the up-sampled polarization angle feature map $\hat{F}_{A,n}^{h\uparrow}$ are each added to the corresponding feature map of the same modality; the three sums are input into the $h$-th level $T$-layer two-dimensional convolution layers $Dconv2d_h'$, which correspondingly output the $h$-th level intensity image feature map $P_{I,n}^{h}$, the $h$-th level degree of polarization image feature map $P_{D,n}^{h}$, and the $h$-th level polarization angle feature map $P_{A,n}^{h}$; after channel concatenation, the $h$-th level concatenated feature map $C_n^{h}$ is obtained;
the $h$-th level concatenated feature map $C_n^{h}$ is then processed by the $h$-th level $T$-layer two-dimensional convolution layers $Dconv2d_h'$ to output the processed concatenated feature map $\hat{C}_n^{h}$, and then passes through the $h$-th level $O$-layer convolution layers $conv_h$ to output the dimension-reduced concatenated feature map $W_n^{h}$, where $c_o$ is the number of channels output after the convolution layers;
by a splitting operation, the dimension-reduced concatenated feature map $W_n^{h}$ is split into three weight feature maps $W_{I,n}^{h}$, $W_{D,n}^{h}$ and $W_{A,n}^{h}$; the intensity image feature map $P_{I,n}^{h}$ is multiplied by $W_{I,n}^{h}$, the degree of polarization image feature map $P_{D,n}^{h}$ is multiplied by $W_{D,n}^{h}$, and the polarization angle image feature map $P_{A,n}^{h}$ is multiplied by $W_{A,n}^{h}$; the three products are added, and the $h$-th level coarse feature map $R_n^{h}$ is finally output;
in this way, the $H$-th level fusion block outputs the $H$-th level coarse feature map $R_n^{H}$;
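The sketch below illustrates the weighted-fusion idea of a first-level fusion block: per-modality convolutions, channel concatenation, a conv stack, a reduction to three weight maps, and a weighted sum of the three modality features. The layer counts ($T = 2$, $O = 1$), the kernel sizes, and the channel width are assumptions, not the claimed values.

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, k=3):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class FusionBlock(nn.Module):
    """Fuses intensity, DoLP and AoLP feature maps of one level into a
    single coarse feature map via learned per-modality weight maps."""

    def __init__(self, c=64):
        super().__init__()
        self.pre = nn.ModuleList([conv_bn_relu(c, c) for _ in range(3)])
        self.mix = nn.Sequential(conv_bn_relu(3 * c, 3 * c),
                                 conv_bn_relu(3 * c, 3 * c))
        self.reduce = nn.Conv2d(3 * c, 3 * c, 1)  # produces the weight maps

    def forward(self, f_i, f_d, f_a):
        p_i, p_d, p_a = self.pre[0](f_i), self.pre[1](f_d), self.pre[2](f_a)
        cat = torch.cat([p_i, p_d, p_a], dim=1)     # channel concatenation
        w = self.reduce(self.mix(cat))              # dimension-reduced map
        w_i, w_d, w_a = torch.chunk(w, 3, dim=1)    # split into three weights
        return p_i * w_i + p_d * w_d + p_a * w_a    # level coarse feature map
```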
Step 4.4, the up-sampling module is composed of H-2 aggregation convolution blocks and an up-sampling block, and each aggregation convolution block comprises: an aggregate convolution layer, a batch normalization and a Relu activation function layer, wherein the convolution kernel size of the aggregate convolution layer is k x k;
mapping the H-th coarse feature
Figure FDA0003874845170000051
Inputting the sampling data into the up-sampling block for a-time up-sampling, and fusing the sampling data with an H-1 level Fusion block H-1 Output H-1 level coarse feature map
Figure FDA0003874845170000052
Performing channel splicing and obtaining a rough characteristic diagram after H-1 level splicing
Figure FDA0003874845170000053
The H-1 level semi-rough characteristic diagram is obtained after the output treatment of H-2 aggregation volume blocks
Figure FDA0003874845170000054
The H-1 level semi-rough feature map
Figure FDA0003874845170000055
Inputting the sampling data into the upsampling block for a-time upsampling, and performing H-2 stage Fusion H-2 Output roughness profile
Figure FDA00038748451700000510
Performing channel splicing and obtaining a rough characteristic diagram after H-2 level splicing
Figure FDA0003874845170000056
The H-2 level semi-rough characteristic diagram is obtained after the output processing of H-2 aggregation volume blocks
Figure FDA0003874845170000057
So as to obtain the 1 st-stage semi-rough characteristic diagram after the output processing of the H-2 aggregation volume blocks
Figure FDA0003874845170000058
The output module is composed of a layer of convolution layer, and the convolution kernel size of the output module is b multiplied by b two-dimensional convolution;
level 1 semi-coarse feature map of the nth scene
Figure FDA0003874845170000059
The nth camouflage prediction image pre is output after being processed by the output module n
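A minimal sketch of such a progressive decoder is shown below: the current map is up-sampled, concatenated with the next shallower coarse feature map, aggregated with a Conv-BN-ReLU block, and the final map is projected to a single-channel camouflage prediction. The channel counts, the up-sampling factor $a = 2$, the use of one aggregation block per level, and the 3×3/1×1 kernels are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpsampleDecoder(nn.Module):
    """Progressive upsample-concatenate-aggregate decoder over the coarse
    feature maps produced by the fusion blocks (shallowest first)."""

    def __init__(self, c=64, num_levels=4):
        super().__init__()
        self.agg = nn.ModuleList([
            nn.Sequential(nn.Conv2d(2 * c, c, 3, padding=1, bias=False),
                          nn.BatchNorm2d(c), nn.ReLU(inplace=True))
            for _ in range(num_levels - 1)])
        self.head = nn.Conv2d(c, 1, 1)   # output module: b x b convolution

    def forward(self, coarse_maps):
        # coarse_maps[h-1] is the h-th level coarse map; start from the deepest.
        x = coarse_maps[-1]
        for agg, skip in zip(self.agg, reversed(coarse_maps[:-1])):
            x = F.interpolate(x, scale_factor=2, mode='bilinear',
                              align_corners=False)      # a-times up-sampling
            x = agg(torch.cat([x, skip], dim=1))        # concat + aggregation
        return self.head(x)                             # camouflage prediction
```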
Step 5, training a camouflage target detection model based on a polarized image clue;
based on the data-enhanced intensity images, degree of linear polarization images, angle of linear polarization images, and corresponding real camouflage images of the N scenes, train the camouflaged target detection model based on polarized image cues with gradient descent; the weighted binary cross-entropy loss and the weighted IoU loss are jointly used as the loss function to compute the loss between the camouflage prediction image and the real camouflage image and to update the model parameters until the loss function converges, thereby obtaining the optimal camouflaged target detection model based on polarized image cues, which is used to perform camouflaged target detection on any intensity image, degree of polarization image, and angle of polarization image to be predicted.
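The claim does not define the pixel-weighting scheme of the two losses. The sketch below uses the common boundary-aware weighting popularized by F3Net-style camouflage/salient object detectors as an assumption; only the combination of a weighted BCE and a weighted IoU term is taken from the claim.

```python
import torch
import torch.nn.functional as F

def structure_loss(pred, mask):
    """Weighted BCE + weighted IoU between predicted logits and the ground
    truth mask. The boundary-aware weighting is an assumed formulation."""
    # Pixels whose local neighborhood disagrees with the mask (boundaries)
    # receive larger weights.
    weit = 1 + 5 * torch.abs(
        F.avg_pool2d(mask, kernel_size=31, stride=1, padding=15) - mask)

    wbce = F.binary_cross_entropy_with_logits(pred, mask, reduction='none')
    wbce = (weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))

    prob = torch.sigmoid(pred)
    inter = ((prob * mask) * weit).sum(dim=(2, 3))
    union = ((prob + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)

    return (wbce + wiou).mean()
```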
2. An electronic device comprising a memory and a processor, wherein the memory is used for storing a program that enables the processor to execute the camouflaged target detection method of claim 1, and the processor is configured to execute the program stored in the memory.
3. A computer-readable storage medium, having a computer program stored thereon, wherein the computer program, when being executed by a processor, is adapted to perform the steps of the disguised object detection method as claimed in claim 1.
CN202211210090.8A 2022-09-30 2022-09-30 Method for detecting disguised target based on polarized image clues and application thereof Pending CN115620049A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211210090.8A CN115620049A (en) 2022-09-30 2022-09-30 Method for detecting disguised target based on polarized image clues and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211210090.8A CN115620049A (en) 2022-09-30 2022-09-30 Method for detecting disguised target based on polarized image clues and application thereof

Publications (1)

Publication Number Publication Date
CN115620049A true CN115620049A (en) 2023-01-17

Family

ID=84861039

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211210090.8A Pending CN115620049A (en) 2022-09-30 2022-09-30 Method for detecting disguised target based on polarized image clues and application thereof

Country Status (1)

Country Link
CN (1) CN115620049A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116935189B (en) * 2023-09-15 2023-12-05 北京理工导航控制科技股份有限公司 Camouflage target detection method and device based on neural network and storage medium

Similar Documents

Publication Publication Date Title
Li et al. Underwater image enhancement via medium transmission-guided multi-color space embedding
CN113065558B (en) Lightweight small target detection method combined with attention mechanism
Snell et al. Learning to generate images with perceptual similarity metrics
CN106897673B (en) Retinex algorithm and convolutional neural network-based pedestrian re-identification method
CN109376611A (en) A kind of saliency detection method based on 3D convolutional neural networks
CN111563418A (en) Asymmetric multi-mode fusion significance detection method based on attention mechanism
CN109034184B (en) Grading ring detection and identification method based on deep learning
CN109215053B (en) Method for detecting moving vehicle with pause state in aerial video shot by unmanned aerial vehicle
CN113449727A (en) Camouflage target detection and identification method based on deep neural network
CN113591968A (en) Infrared weak and small target detection method based on asymmetric attention feature fusion
CN108154133B (en) Face portrait-photo recognition method based on asymmetric joint learning
CN107766864B (en) Method and device for extracting features and method and device for object recognition
CN109977834B (en) Method and device for segmenting human hand and interactive object from depth image
CN114881871A (en) Attention-fused single image rain removing method
CN114549567A (en) Disguised target image segmentation method based on omnibearing sensing
CN114429457A (en) Intelligent fan blade defect detection method based on bimodal fusion
CN113505634A (en) Double-flow decoding cross-task interaction network optical remote sensing image salient target detection method
CN107045630B (en) RGBD-based pedestrian detection and identity recognition method and system
CN115620049A (en) Method for detecting disguised target based on polarized image clues and application thereof
CN109376719B (en) Camera light response non-uniformity fingerprint extraction and comparison method based on combined feature representation
Song et al. Multistage curvature-guided network for progressive single image reflection removal
CN112508863B (en) Target detection method based on RGB image and MSR image double channels
Babu et al. An efficient image dahazing using Googlenet based convolution neural networks
CN111815529B (en) Low-quality image classification enhancement method based on model fusion and data enhancement
CN111539434B (en) Infrared weak and small target detection method based on similarity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination