CN115861608A - Disguised target segmentation method and system based on light intensity and polarization clues - Google Patents

Disguised target segmentation method and system based on light intensity and polarization clues

Info

Publication number
CN115861608A
CN115861608A (application CN202211327795.8A)
Authority
CN
China
Prior art keywords: polarization, fusion, polarization information, graph, information graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211327795.8A
Other languages
Chinese (zh)
Inventor
曹铁勇
付炳阳
郑云飞
王烨奎
方正
赵斐
申海霞
王杨
陈雷
韩彤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Army Engineering University of PLA
Original Assignee
Army Engineering University of PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Army Engineering University of PLA
Priority to CN202211327795.8A
Publication of CN115861608A
Legal status: Pending

Abstract

The invention discloses a camouflaged target segmentation method and system based on light intensity and polarization clues. The method comprises: acquiring a picture to be detected and preprocessing it to obtain its polarization information graphs, namely a light intensity graph, a polarization degree graph and a polarization angle graph; and inputting these polarization information graphs into a pre-trained camouflaged target segmentation network model, which sequentially applies multi-layer feature extraction, compression, multi-layer feature fusion, multi-branch search and tri-modal fusion to each graph to produce a mask image of the camouflaged target. By introducing polarization information comprising light intensity and polarization clues and using a two-stage fusion scheme of multi-layer fusion and multi-branch search, the method addresses camouflaged target segmentation with light intensity and polarization as clues and helps a polarization vision system accurately find camouflaged targets that are highly similar to the surrounding scene.

Description

Camouflaged target segmentation method and system based on light intensity and polarization clues
Technical Field
The invention relates to a camouflaged target segmentation method and system based on light intensity and polarization clues, and belongs to the technical field of scene segmentation in computer vision.
Background
In recent years, with the rapid development of intelligent manufacturing, imaging systems such as polarization, hyperspectral and infrared systems have gradually been applied in many fields and play a vital role. Polarization imaging detects polarization characteristics closely related to the physical properties of a target through the polarization states of the light the target reflects and radiates. The technique can improve the contrast between target and background and enrich the detail information of an image. Polarized images are therefore being applied to camouflaged target detection tasks in more and more fields, for example underwater camouflaged target detection, flaw detection and military camouflaged target recognition.
Early conventional methods designed hand-crafted feature extraction operators suited to polarization images to detect targets. With the development of deep learning, many researchers have combined polarization images with deep learning. However, current research still has clear shortcomings. First, most existing methods fuse polarization features and light intensity features by direct addition, which neither exploits the advantages of each modality nor fully fuses their effective information. Second, in complex environments such as forests and grassland, polarization information increases the contrast between the camouflaged target and the background, but it also amplifies the detail of other objects in the environment, so a large amount of interference appears in the polarization image.
Disclosure of Invention
The invention aims to provide a camouflaged target segmentation method and system based on light intensity and polarization clues, addressing the difficulty that camouflaged target segmentation models have in efficiently fusing the complementary strengths of light intensity and polarization information in complex environments.
To achieve this purpose, the invention adopts the following technical scheme:
In a first aspect, the present invention provides a camouflaged target segmentation method based on light intensity and polarization clues, comprising:
the method comprises the steps of obtaining a picture to be detected, preprocessing the picture to be detected, and obtaining a polarization information graph of the picture to be detected, wherein the polarization information graph comprises a light intensity graph, a polarization degree graph and a polarization angle graph;
inputting the polarization information graph of the picture to be detected into a pre-trained camouflaged target segmentation network model, and performing multi-layer feature extraction processing on each polarization information graph to obtain multi-layer modal features of each polarization information graph; compressing each modal feature of each polarization information graph to obtain multi-layer compression features of each polarization information graph; performing multi-layer feature fusion processing on the multi-layer compression features of each polarization information graph to obtain the cross-fusion features of each polarization information graph; performing multi-branch search processing on the cross-fusion features of each polarization information graph to obtain the residual fusion features of each polarization information graph; and performing tri-modal fusion processing on the residual fusion features of the polarization information graphs to obtain a mask image of the camouflaged target.
Further, acquiring the picture to be detected and preprocessing the picture to be detected to obtain the polarization information graph of the picture to be detected specifically comprises:
selecting a scene to be detected, and acquiring the light intensities of the picture to be detected in the four polarization directions of 0°, 45°, 90° and 135° with a professional grayscale polarization camera;
calculating the Stokes parameters from the light intensities in the four polarization directions to obtain the polarization information graph of the picture to be detected.
Further, the multi-layer feature extraction comprises performing feature extraction on each polarization information graph with a Res2Net-50 feature extraction network to obtain the multi-layer modal features of each polarization information graph.
Further, performing multi-layer feature fusion processing on the multi-layer compression features of each polarization information graph to obtain the cross-fusion features of each polarization information graph specifically comprises:
obtaining a common spatial attention map of each polarization information graph from the compression features of adjacent layers, computed as:
F_SA = SA(Concat((F_1 + U(F_2)), (F_1 × U(F_2)))),
where F_SA is the common spatial attention map; F_1 and F_2 are the compression features of two adjacent layers; U(·) denotes an upsampling operation; Concat(·) denotes the concatenation operation; and SA(·) denotes a spatial attention operation;
performing spatial alignment with the common spatial attention map as the weight of the combined features, applying a channel attention operation to each of the resulting spatially aligned features, and finally obtaining the adjacent-layer cross-fusion feature of each polarization information graph through a concatenation operation, computed as:
F = Concat(CA(F_SA × (F_1 + U(F_2))), CA(F_SA × (F_1 × U(F_2)))),
where F is the adjacent-layer cross-fusion feature and CA(·) denotes a channel attention operation;
continuing to fuse the compression features of adjacent layers yields the final cross-fusion feature of each polarization information graph, denoted here as F̂ (the symbol in the original survives only as an equation image).
Further, performing multi-branch search processing on the cross-fusion feature of each polarization information graph to obtain the residual fusion feature of each polarization information graph specifically comprises:
subjecting the cross-fusion feature of the light intensity graph to three branch operations to obtain a high-resolution feature map F_u, an original-resolution feature map F_c and a low-resolution feature map F_d respectively, wherein the three branch operations comprise an upsampling branch, a convolution branch and a pooling branch;
applying multi-scale channel attention processing and element-wise multiplication fusion to F_u, F_c and F_d in turn, then adding the three branch outputs to obtain the convolution-group feature F_ucd;
fusing the convolution-group feature F_ucd with the cross-fusion feature F̂_I of the light intensity graph by residual fusion to obtain the residual fusion feature F_I of the light intensity graph:
[equation image in the original; in it, F_I is the residual fusion feature of the light intensity graph, F̂_I is its cross-fusion feature, U(·) denotes an upsampling operation, D(·) denotes a downsampling operation, Conv_3(·) denotes a convolution with kernel size 3 × 3, and M(·) denotes a multi-scale channel attention operation;]
obtaining the residual fusion feature F_P of the polarization degree graph and the residual fusion feature F_A of the polarization angle graph from their respective cross-fusion features by the same multi-branch search processing.
Further, performing tri-modal fusion processing on the residual fusion features of the polarization information graphs to obtain the mask image of the camouflaged target specifically comprises:
combining and fusing the residual fusion features of the polarization information graphs to obtain the mask image of the camouflaged target:
[equation image in the original; in it, Ψ(·) denotes the convolution processing of a convolution group comprising a Conv layer, a BN layer and a ReLU layer; Concat(·) denotes the concatenation operation; Conv_3(·) denotes a convolutional layer with kernel size 3 × 3; M_S(·) denotes a multi-scale spatial attention operation; Pred denotes the final mask image; and A_i, A_p and A_a are the result features of combining the residual fusion features F_I, F_P and F_A of the light intensity, polarization degree and polarization angle graphs respectively.]
Further, the training method of the camouflaged target segmentation network model comprises the following steps:
acquiring a training set, wherein the training set comprises polarization graphs collected by professional polarization cameras;
processing the polarization graphs in the training set according to the camouflaged target segmentation method, and optimizing the parameters of the camouflaged target segmentation network model according to the loss value until the loss stabilizes, then stopping training;
the loss function is computed as:
L = λ_1 L_wbce + λ_2 L_wDL,
where L is the loss function; L_wbce is the cross-entropy loss; L_wDL is the Dice loss; λ_1 is the weight of the cross-entropy loss; and λ_2 is the weight of the Dice loss.
In a second aspect, the present invention provides a camouflaged target segmentation system based on light intensity and polarization clues, comprising:
a preprocessing module, used for preprocessing a picture to be detected to obtain a polarization information graph of the picture to be detected, wherein the polarization information graph comprises a light intensity graph, a polarization degree graph and a polarization angle graph;
a multi-layer feature extraction module, used for performing multi-layer feature extraction processing on each polarization information graph in turn to obtain multi-layer modal features of each polarization information graph;
a multi-layer feature compression module, used for performing multi-layer feature compression processing on the multi-layer modal features of each polarization information graph to obtain multi-layer compression features of each polarization information graph;
a multi-layer feature fusion module, used for performing multi-layer feature fusion processing on the multi-layer compression features of each polarization information graph to obtain adjacent-layer cross-fusion features of each polarization information graph;
a multi-branch search module, used for performing multi-branch search processing on the adjacent-layer cross-fusion features of each polarization information graph to obtain residual fusion features of each polarization information graph;
a tri-modal fusion module, used for performing tri-modal fusion processing on the residual fusion features of the polarization information graphs to obtain a mask image of the camouflaged target.
In a third aspect, a computer device includes a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any of the above.
In a fourth aspect, a computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods described above.
Compared with the prior art, the invention has the following beneficial effects:
according to the method, the problem of disguised target segmentation with light intensity and spectral polarization as clues is solved by introducing polarization information comprising light intensity and polarization clues into images collected by a polarization camera of a special scene and using a multi-layer fusion and multi-branch search two-stage fusion method, the disguised target which is highly similar to the current scene is helped to be accurately found out by a polarization vision system, meanwhile, multi-level features of all modes are effectively fused, the multi-level features are acted by a multi-layer fusion and multi-branch search mode, and finally, the disguised target is accurately segmented by adopting a three-mode combination fusion mode.
Drawings
FIG. 1 is a flowchart of the camouflaged target segmentation method according to Embodiment 1 of the present invention;
FIG. 2 is a processing flowchart of the camouflaged target segmentation network model according to Embodiment 1 of the present invention;
FIG. 3 is a structural diagram of the feature compression coding network block according to Embodiment 1 of the present invention;
FIG. 4 is a structural diagram of the multi-layer feature fusion network block according to Embodiment 1 of the present invention;
FIG. 5 is a structural diagram of the multi-branch search network block according to Embodiment 1 of the present invention;
FIG. 6 is a structural diagram of the tri-modal fusion network block according to Embodiment 1 of the present invention;
FIG. 7 shows test outputs of the camouflaged target segmentation network model according to Embodiment 1 of the present invention;
FIG. 8 shows comparison outputs of prior-art deep neural network segmentation models according to Embodiment 1 of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Embodiment 1:
the embodiment provides a disguised object segmentation method based on light intensity and polarization clues, which specifically comprises the following steps:
acquiring a picture to be detected, and preprocessing the picture to be detected to obtain a polarization information graph of the picture to be detected, wherein the polarization information graph comprises a light intensity graph, a polarization degree graph and a polarization angle graph;
inputting the polarization information graph of the picture to be detected into a pre-trained camouflaged target segmentation network model, and performing multi-layer feature extraction processing on each polarization information graph to obtain multi-layer modal features of each polarization information graph; performing multi-layer feature compression processing on the multi-layer modal features of each polarization information graph to obtain multi-layer compression features of each polarization information graph; performing multi-layer feature fusion processing on the multi-layer compression features of each polarization information graph to obtain the cross-fusion features of each polarization information graph; performing multi-branch search processing on the cross-fusion features of each polarization information graph to obtain the residual fusion features of each polarization information graph; and performing tri-modal fusion processing on the residual fusion features of the polarization information graphs to obtain a mask image of the camouflaged target.
The method mainly comprises three stages: a data acquisition and preprocessing stage, a model construction and training stage, and an application stage. Each stage is described in detail below.
1. Data acquisition and preprocessing stage
In this embodiment, a professional grayscale polarization camera is used as the receiving source to record the polarization information in a scene and obtain the pictures to be detected or the pictures for training. During preprocessing, a mask with four polarization directions covers the sensing surface of the camera so as to record the light intensities I_0°, I_45°, I_90° and I_135° in the four polarization directions of 0°, 45°, 90° and 135°. The Stokes parameters are then calculated from the four intensities, and from them the light intensity, degree of polarization and angle of polarization of the image are obtained, giving the polarization information graphs (a light intensity graph, a polarization degree graph and a polarization angle graph) used as the picture to be detected or for training the network model.
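As a concrete illustration of this preprocessing, the sketch below computes the three polarization information graphs with NumPy; the application publishes no code, so the function name is illustrative, and the Stokes formulas are the standard polarimetry definitions.

```python
import numpy as np

def polarization_maps(i0, i45, i90, i135, eps=1e-8):
    """Compute the light intensity, degree-of-polarization and
    angle-of-polarization maps from four polarizer-orientation images."""
    # Stokes parameters from the four measured intensities
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                        # 0/90 degree difference
    s2 = i45 - i135                      # 45/135 degree difference
    intensity = s0
    dolp = np.sqrt(s1**2 + s2**2) / (s0 + eps)  # degree of linear polarization
    aolp = 0.5 * np.arctan2(s2, s1)             # angle of polarization (radians)
    return intensity, dolp, aolp
```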
2. Model construction phase and training phase
In this embodiment, as shown in Fig. 2, the camouflaged target segmentation network model mainly comprises a multi-layer feature extraction network block, a feature compression coding network block, a multi-layer feature fusion network block, a multi-branch search network block and a tri-modal fusion network block connected in sequence.
The multi-layer feature extraction network block is implemented with the existing Res2Net-50 feature extraction network. The three polarization information graphs containing the camouflaged target, namely the light intensity graph, the polarization degree graph and the polarization angle graph, are input, and feature extraction is performed on each to obtain its multi-layer modal features E. In Fig. 2 the modal features of each polarization information graph shrink in size from bottom to top, giving first-level, second-level, third-level, fourth-level and fifth-level modal features in that order.
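A minimal sketch of this extraction stage, assuming the pretrained Res2Net-50 from the timm library (the application does not name a particular implementation; the model name and input shape here are illustrative):

```python
import timm
import torch

# One Res2Net-50 backbone per modality; features_only returns the
# intermediate feature maps of every stage (five levels).
backbone = timm.create_model("res2net50_26w_4s", pretrained=True,
                             features_only=True)

x_intensity = torch.randn(1, 3, 352, 352)  # intensity graph tiled to 3 channels
multi_layer_feats = backbone(x_intensity)  # list of 5 modal feature maps E
for e in multi_layer_feats:
    print(e.shape)
```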
The feature compression coding network block applies multi-layer feature compression (CFE) to the multi-layer modal features of each polarization information graph using multi-scale convolution and dilated (hole) convolution, producing multi-layer compression features with 64 channels each. The network structure is shown in Fig. 3. The modal features E of each layer first pass through four 1 × 1 convolutional layers (unless otherwise stated, the 1 × 1 convolutional layers described herein have a 1 × 1 kernel and stride 1). The first 1 × 1 branch is left unchanged; the other three branches are followed by a 3 × 3, a 5 × 5 and a 7 × 7 convolutional layer respectively (likewise with stride 1), each in turn followed by a dilated convolutional layer of the corresponding kernel size. The outputs of the four branches are concatenated along the channel dimension, processed by a 1 × 1 convolutional layer, added to the modal features that passed through the first 1 × 1 convolutional layer, and finally activated by a ReLU function to obtain the compression features corresponding one-to-one to the modal features of each polarization information graph.
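The following PyTorch sketch is one plausible reading of this description, not the original code; in particular the dilation rate of 2 is an assumption, since the text does not state the rates used:

```python
import torch
import torch.nn as nn

class CFE(nn.Module):
    """Feature compression block: four parallel branches with growing kernel
    sizes plus dilated convolutions, compressed to 64 channels."""
    def __init__(self, in_ch, out_ch=64):
        super().__init__()
        self.reduce = nn.ModuleList(nn.Conv2d(in_ch, out_ch, 1) for _ in range(4))
        self.multi = nn.ModuleList()
        for k in (3, 5, 7):
            self.multi.append(nn.Sequential(
                nn.Conv2d(out_ch, out_ch, k, padding=k // 2),
                # dilated (hole) convolution with the matching kernel size
                nn.Conv2d(out_ch, out_ch, k, padding=k - 1, dilation=2),
            ))
        self.fuse = nn.Conv2d(4 * out_ch, out_ch, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, e):
        b0 = self.reduce[0](e)                        # plain 1x1 branch
        branches = [b0] + [m(self.reduce[i + 1](e))
                           for i, m in enumerate(self.multi)]
        out = self.fuse(torch.cat(branches, dim=1))   # channel concat + 1x1 conv
        return self.act(out + b0)                     # residual add, then ReLU
```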
The multi-layer feature fusion network block applies multi-layer feature fusion (CLFM) to the multi-layer compression features of each polarization information graph to obtain its cross-fusion features; by introducing an attention mechanism it alleviates the scale differences between features of different layers and effectively fuses the compact features of all levels. The network structure is shown in Fig. 4 and comprises a first convolution group and a second convolution group. The first convolution group comprises a 3 × 3 convolutional layer, a global max pooling layer along the channel direction (CGMP layer), a 3 × 3 convolutional layer and a Sigmoid function layer connected in sequence; the second convolution group comprises a global max pooling layer (GMP layer), a 1 × 1 convolutional layer and a Sigmoid function layer connected in sequence. Taking the first-level compression feature F_1, which has the largest height and width, and the second-level compression feature F_2 of the layer above it as an example: F_2 is first upsampled by a factor of 2 (UP × 2) so that its height and width match F_1, and is then added to and multiplied with F_1 element-wise, giving an addition compression feature and a multiplication compression feature respectively. The two are concatenated along the channel dimension and fed into the first convolution group for the spatial attention operation, yielding the common spatial attention map F_SA. F_SA is multiplied with the addition and multiplication compression features respectively, each product undergoes the channel attention operation, and finally the two results are concatenated along the channel dimension again and convolved (herein, unless otherwise stated, a convolutional layer denotes a 3 × 3 convolution) to obtain the cross-fusion feature F of the two adjacent layers. Fusing upward layer by layer in this way yields the final cross-fusion feature F̂ of each polarization information graph.
In the multi-layer feature fusion network block, the common spatial attention map F_SA is computed as:
F_SA = SA(Concat((F_1 + U(F_2)), (F_1 × U(F_2)))),
where F_SA is the common spatial attention map; U(·) denotes an upsampling operation; Concat(·) denotes the concatenation operation; and SA(·) denotes the spatial attention operation, generally defined as:
SA(x) = Sigmoid(conv_3(CGMP(conv_1(x)))),
where conv_1(·) denotes a convolutional layer with kernel size 1 × 1; CGMP(·) denotes global max pooling along the channel direction; conv_3(·) denotes a 3 × 3 convolutional layer; and Sigmoid(·) denotes the Sigmoid activation function.
In the multi-layer feature fusion network block, the adjacent-layer cross-fusion feature F is computed as:
F = Concat(CA(F_SA × (F_1 + U(F_2))), CA(F_SA × (F_1 × U(F_2)))),
where F is the adjacent-layer cross-fusion feature and CA(·) denotes the channel attention operation, generally defined as:
CA(x) = Sigmoid(conv_1(GMP(x))),
where GMP(·) denotes the global max pooling operation; conv_1(·) denotes a convolutional layer with kernel size 1 × 1; and Sigmoid(·) denotes the Sigmoid activation function.
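Combining the two attention definitions with the cross-fusion formula, this fusion step might be sketched in PyTorch as follows; applying CA(·) as a channel reweighting of its input is our reading of the formula, and anything not named in the text is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    """SA(x) = Sigmoid(conv_3(CGMP(conv_1(x))))."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 1)
        self.conv3 = nn.Conv2d(1, 1, 3, padding=1)

    def forward(self, x):
        x = self.conv1(x)
        x = x.max(dim=1, keepdim=True).values  # CGMP: max along the channel axis
        return torch.sigmoid(self.conv3(x))

class ChannelAttention(nn.Module):
    """CA(x) = Sigmoid(conv_1(GMP(x)))."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        return torch.sigmoid(self.conv1(F.adaptive_max_pool2d(x, 1)))

def cross_fuse(f1, f2, sa, ca_add, ca_mul):
    """Adjacent-layer cross fusion; sa must be SpatialAttention(2 * C) and
    ca_add / ca_mul ChannelAttention(C), where C is the channel count of f1."""
    f2 = F.interpolate(f2, size=f1.shape[-2:], mode="bilinear",
                       align_corners=False)     # U(F_2)
    added, mult = f1 + f2, f1 * f2
    f_sa = sa(torch.cat((added, mult), dim=1))  # common spatial attention map
    a, m = f_sa * added, f_sa * mult            # spatial alignment
    return torch.cat((ca_add(a) * a, ca_mul(m) * m), dim=1)
```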
The multi-branch search network block applies multi-branch search processing (MSM) to the cross-fusion feature of each polarization information graph to obtain its residual fusion feature; this mines richer global information and improves camouflaged target detection performance. The network structure is shown in Fig. 5. The cross-fusion feature F̂ of each polarization information graph goes through three branch operations: the first branch performs upsampling, the second branch performs convolution with a convolutional layer, and the third branch performs pooling with a pooling layer, yielding a high-resolution feature map F_u, an original-resolution feature map F_c and a low-resolution feature map F_d respectively. F_u, F_c and F_d are each processed by the multi-scale channel attention (MS-CA) mechanism and multiplied element-wise with the corresponding unprocessed F_u, F_c and F_d. The first branch is then downsampled and the third branch upsampled, the second branch being left unchanged, and the results of the three branches are added to obtain the convolution-group feature F_ucd. F_ucd is convolved by a convolutional layer, residually fused with the cross-fusion feature F̂ of each polarization information graph, and finally convolved again to obtain the residual fusion features, namely the residual fusion feature F_I of the light intensity graph, F_P of the polarization degree graph and F_A of the polarization angle graph. Taking the residual fusion feature F_I of the light intensity graph as an example:
[equation image in the original; in it, F_I is the residual fusion feature of the light intensity graph, F̂_I is the cross-fusion feature of the light intensity graph, U(·) denotes an upsampling operation, D(·) denotes a downsampling operation, Conv_3(·) denotes a convolution with kernel size 3 × 3, and M(·) denotes the multi-scale channel attention operation.]
As shown in Fig. 5, the multi-scale channel attention (MS-CA) operation is generally defined as:
F_b = F_a × Sigmoid(L(F_a) + g(F_a)),
where F_a is the input feature, L(·) denotes the channel attention operation on local features and g(·) denotes the channel attention operation on global features. Their definitions survive only as equation images in the original; in them, PWConv_1(·) denotes a 1 × 1 point convolution that reduces the number of channels of the input feature to a fraction of the original count; B(·) denotes a BatchNorm layer; δ(·) denotes the ReLU activation function; PWConv_2(·) denotes a 1 × 1 point convolution that restores the number of channels to that of the original input; and GAP(·) denotes the global average pooling operation.
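This structure matches the familiar multi-scale channel attention module (MS-CAM) of attentional feature fusion, and the PyTorch sketch below follows that reading; the channel reduction ratio r is an assumption, since the original value sits inside an equation image:

```python
import torch
import torch.nn as nn

class MSCA(nn.Module):
    """Multi-scale channel attention: F_b = F_a * Sigmoid(L(F_a) + g(F_a))."""
    def __init__(self, ch, r=4):
        super().__init__()
        def bottleneck():
            return nn.Sequential(
                nn.Conv2d(ch, ch // r, 1),  # PWConv_1: reduce channels to 1/r
                nn.BatchNorm2d(ch // r),
                nn.ReLU(inplace=True),
                nn.Conv2d(ch // r, ch, 1),  # PWConv_2: restore the channel count
                nn.BatchNorm2d(ch),
            )
        self.local = bottleneck()           # L(.): per-position channel context
        self.glob = bottleneck()            # g(.): global channel context
        self.gap = nn.AdaptiveAvgPool2d(1)  # GAP

    def forward(self, x):
        return x * torch.sigmoid(self.local(x) + self.glob(self.gap(x)))
```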
Finally, the residual fusion feature F_I of the light intensity graph, the residual fusion feature F_P of the polarization degree graph and the residual fusion feature F_A of the polarization angle graph are obtained.
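Reusing the MSCA module from the previous sketch, the whole multi-branch search block might look as follows; the factor-2 sampling rates and the use of max pooling are assumptions, as the text only names the branch types:

```python
import torch.nn as nn
import torch.nn.functional as F

class MSM(nn.Module):
    """Multi-branch search: up / conv / pool branches gated by MS-CA,
    merged at the original resolution and residually fused."""
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)
        # MSCA as defined in the previous sketch (it already multiplies
        # its input by the attention weights)
        self.att_u, self.att_c, self.att_d = MSCA(ch), MSCA(ch), MSCA(ch)
        self.out1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.out2 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, f):  # f: cross-fusion feature of one modality
        fu = F.interpolate(f, scale_factor=2, mode="bilinear",
                           align_corners=False)   # high-resolution branch
        fc = self.conv(f)                          # original-resolution branch
        fd = F.max_pool2d(f, 2)                    # low-resolution branch
        fu, fc, fd = self.att_u(fu), self.att_c(fc), self.att_d(fd)
        fu = F.max_pool2d(fu, 2)                   # back down to H x W
        fd = F.interpolate(fd, size=fc.shape[-2:], mode="bilinear",
                           align_corners=False)    # back up to H x W
        f_ucd = fu + fc + fd                       # convolution-group feature
        return self.out2(f + self.out1(f_ucd))     # residual fusion
```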
The tri-modal fusion network block applies tri-modal fusion processing (TFM) to the residual fusion features of the polarization information graphs to obtain the mask image of the camouflaged target. It can selectively aggregate the specific information of each modality to explore key semantic clues across the modalities, enhancing the feature representation and yielding an accurate prediction map. The network structure is shown in Fig. 6. The residual fusion features F_I, F_P and F_A of the light intensity, polarization degree and polarization angle graphs are each processed by a third convolution group, giving F_I′, F_P′ and F_A′ respectively; the third convolution group comprises a convolutional layer, a BN layer and a ReLU activation layer connected in sequence. The results are then combined by a fourth convolution group to give the corresponding A_i, A_p and A_a. Finally F_I′, F_P′ and F_A′ are fused with their corresponding A_i, A_p and A_a to obtain the final mask image:
[equation image in the original; in it, Ψ(·) denotes the convolution operation of the third convolution group; Concat(·) denotes the concatenation operation; Conv_3(·) denotes a convolutional layer with kernel size 3 × 3; M_S(·) denotes a multi-scale spatial attention (MS-SA) operation; and Pred denotes the final mask image.]
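Since the exact fusion formula survives only as an equation image, the sketch below is one plausible reading of the description above: each modality passes through a Conv-BN-ReLU group, a per-modality attention map is derived from the combined features, and the reweighted features are concatenated and reduced to a one-channel mask:

```python
import torch
import torch.nn as nn

class TFM(nn.Module):
    """Tri-modal fusion of the intensity, polarization-degree and
    polarization-angle residual fusion features into one mask."""
    def __init__(self, ch):
        super().__init__()
        def conv_group():  # third convolution group: Conv-BN-ReLU
            return nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                 nn.BatchNorm2d(ch), nn.ReLU(inplace=True))
        self.psi = nn.ModuleList(conv_group() for _ in range(3))
        # fourth convolution group: derive a per-modality attention map A_x
        self.att = nn.ModuleList(nn.Sequential(nn.Conv2d(3 * ch, ch, 1),
                                               nn.Sigmoid()) for _ in range(3))
        self.head = nn.Conv2d(3 * ch, 1, 3, padding=1)  # Conv_3 to mask logits

    def forward(self, f_i, f_p, f_a):
        fs = [g(f) for g, f in zip(self.psi, (f_i, f_p, f_a))]  # F_I', F_P', F_A'
        joint = torch.cat(fs, dim=1)
        outs = [a(joint) * f for a, f in zip(self.att, fs)]     # A_x applied to F_x'
        return self.head(torch.cat(outs, dim=1))                # Pred (mask logits)
```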
The above is the specific structure of the camouflaged target segmentation network model in this embodiment. After construction the network model is trained, implemented with the PyTorch framework. Training and testing both used a 6-core computer equipped with an Intel(R) Xeon(R) E5-2609 v3 1.9 GHz CPU and an NVIDIA GeForce RTX 3090 Ti GPU (24 GB memory). A momentum SGD optimizer is used with weight decay 5e-4, initial learning rate 1e-3 and momentum 0.9. In addition, the batch size is set to 4, the learning rate is adjusted by a poly strategy with factor 0.9, and the network is trained for 50 rounds. The specific training procedure is as follows:
acquiring a training set, wherein the training set comprises polarization graphs collected by professional polarization cameras;
processing the polarization graphs in the training set according to the camouflaged target segmentation method, and optimizing the parameters of the camouflaged target segmentation network model according to the loss value until the loss stabilizes, then stopping training.
The training set is the I-P camouflaged target segmentation polarization dataset (4390 images in total); the scenes in it are randomly split into 2930 training images and 1470 test images. Images of various sizes are uniformly scaled to 352 × 352 during training, and the segmentation output is resized back to the original size of the input image. The parameters of the feature extraction network are initialized from the pretrained Res2Net-50 network, and the other parameters are initialized randomly.
The loss function adopted in this embodiment combines a cross-entropy loss with a Dice loss:
L = λ_1 L_wbce + λ_2 L_wDL,
where L is the loss function of this embodiment; L_wbce is the cross-entropy loss; L_wDL is the Dice loss; and λ_1 and λ_2 are the fixed weights of the cross-entropy loss and the Dice loss respectively.
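A sketch of this combined loss in PyTorch; the unweighted BCE and Dice terms and the default weights are simplifications, since the text gives neither the weighting scheme behind the "w" subscripts nor the values of λ_1 and λ_2:

```python
import torch
import torch.nn.functional as F

def segmentation_loss(pred, mask, lam1=1.0, lam2=1.0, eps=1.0):
    """L = lam1 * L_bce + lam2 * L_dice; pred holds logits of shape
    (N, 1, H, W) and mask the binary ground truth of the same shape."""
    bce = F.binary_cross_entropy_with_logits(pred, mask)
    p = torch.sigmoid(pred)
    inter = (p * mask).sum(dim=(2, 3))
    dice = 1 - (2 * inter + eps) / (p.sum(dim=(2, 3)) + mask.sum(dim=(2, 3)) + eps)
    return lam1 * bce + lam2 * dice.mean()
```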
The training images are preprocessed as in the segmentation method and input into the camouflaged target segmentation network model to obtain prediction maps, and the parameters of the model are optimized according to the loss value until it stabilizes, at which point training stops. In this embodiment, to improve the training effect, the prediction maps produced during training are all supervised by the corresponding manually annotated mask maps in the training set.
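The optimizer settings stated above translate directly into PyTorch; in the sketch below, model, train_loader and segmentation_loss are placeholders, and applying the poly decay per epoch rather than per iteration is an assumption:

```python
import torch

# SGD with momentum 0.9, weight decay 5e-4, initial learning rate 1e-3,
# poly decay with factor 0.9 over 50 rounds, batch size 4 (via train_loader).
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda epoch: (1 - epoch / 50) ** 0.9)

for epoch in range(50):
    for (intensity, dolp, aolp), masks in train_loader:  # three modalities
        optimizer.zero_grad()
        loss = segmentation_loss(model(intensity, dolp, aolp), masks)
        loss.backward()
        optimizer.step()
    scheduler.step()
```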
3. Application phase
In this embodiment, 1470 test images were used for testing. To verify the effectiveness of the invention, the results were compared with other recent methods in the field; the other deep segmentation networks (EAFNet, PGSNet, RD3D, MIDD and SwinNet) were retrained on the same dataset using their publicly released code. In the prediction process, the collected grayscale polarization image serves as the input; preprocessing yields the three modalities of light intensity, polarization degree and polarization angle, which enter the network of the invention to finally produce an accurate segmentation map of the camouflaged target, as shown in Fig. 7. To compare the invention with other deep neural networks in the field, the same data source was predicted through the different networks. Fig. 8 shows that the segmentation results of the invention are more accurate and clearer than those of the other networks, which exhibit false detections and missed detections.
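For completeness, end-to-end use of a trained model might look like the following; load_polarizer_images and model are hypothetical placeholders, polarization_maps is the preprocessing sketch from earlier, and the 352 × 352 resize follows the training setup:

```python
import torch
import torch.nn.functional as F

i0, i45, i90, i135 = load_polarizer_images("scene_0001")  # hypothetical loader
intensity, dolp, aolp = polarization_maps(i0, i45, i90, i135)

model.eval()
with torch.no_grad():
    inputs = []
    for m in (intensity, dolp, aolp):
        t = torch.from_numpy(m).float()[None, None].repeat(1, 3, 1, 1)
        inputs.append(F.interpolate(t, size=(352, 352), mode="bilinear",
                                    align_corners=False))
    mask = torch.sigmoid(model(*inputs))  # camouflaged-target mask in [0, 1]
```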
In addition, the method is suitable for segmenting camouflaged targets in a variety of scenes and has wide application value in fields such as manufacturing (surface defect detection), agriculture (e.g. locust detection) and computer vision (e.g. search-and-rescue tasks). By means of polarization imaging, the invention detects polarization characteristics closely related to the physical properties of the target through the polarization states of the light it reflects and radiates; this improves target-background contrast and enriches image detail, and for special materials the recognizability is much higher than with an RGB camera. The method also produces more accurate and complete segmentation maps, with clear boundaries and coherent details, for natural camouflaged targets of various sizes (small and large alike), demonstrating its effectiveness and practicality in different scenarios.
Embodiment 2:
a system for segmentation of camouflaged objects based on light intensity and polarization cues, comprising:
a preprocessing module: the image processing device is used for preprocessing an image to be detected to obtain a polarization information graph of the image to be detected, wherein the polarization information graph comprises a light intensity graph, a polarization degree graph and a polarization angle graph;
a multilayer feature extraction module: the polarization information graph acquisition module is used for sequentially carrying out multilayer characteristic extraction processing on each polarization information graph to obtain multilayer modal characteristics of each polarization information graph;
a multi-layer feature compression module: the multi-layer modal characteristic compression processing device is used for performing multi-layer characteristic compression processing on the multi-layer modal characteristic of each polarization information graph to obtain the multi-layer compression characteristic of each polarization information graph;
a multi-layer feature fusion module: the device is used for carrying out multilayer feature fusion processing on the multilayer compression features of each polarization information graph to obtain adjacent layer cross fusion features of each polarization information graph;
a multi-branch search module: the multi-branch search processing is carried out on the cross fusion characteristics of adjacent layers of each polarization information graph to obtain the residual fusion characteristics of each polarization information graph;
a tri-modal fusion module: the three-mode fusion processing is carried out on the residual fusion characteristics of the polarization information image to obtain a mask image of the camouflage target.
The specific network structure of each module is as described for the camouflaged target segmentation network model in Embodiment 1 and is not repeated here.
Embodiment 3:
the embodiment of the invention also provides computer equipment, which comprises a processor and a storage medium;
the storage medium is to store instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method of:
acquiring a picture to be detected, and preprocessing the picture to be detected to obtain a polarization information graph of the picture to be detected, wherein the polarization information graph comprises a light intensity graph, a polarization degree graph and a polarization angle graph;
inputting the polarization information graph of the picture to be detected into a pre-trained camouflaged target segmentation network model, and performing multi-layer feature extraction processing on each polarization information graph to obtain multi-layer modal features of each polarization information graph; performing multi-layer feature compression processing on the multi-layer modal features of each polarization information graph to obtain multi-layer compression features of each polarization information graph; performing multi-layer feature fusion processing on the multi-layer compression features of each polarization information graph to obtain adjacent-layer cross-fusion features of each polarization information graph; performing multi-branch search processing on the adjacent-layer cross-fusion features of each polarization information graph to obtain residual fusion features of each polarization information graph; and performing tri-modal fusion processing on the residual fusion features of the polarization information graphs to obtain a mask image of the camouflaged target.
Embodiment 4:
an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the following method steps:
acquiring a picture to be detected, and preprocessing the picture to be detected to obtain a polarization information graph of the picture to be detected, wherein the polarization information graph comprises a light intensity graph, a polarization degree graph and a polarization angle graph;
inputting the polarization information graph of the picture to be detected into a pre-trained camouflaged target segmentation network model, and performing multi-layer feature extraction processing on each polarization information graph to obtain multi-layer modal features of each polarization information graph; performing multi-layer feature compression processing on the multi-layer modal features of each polarization information graph to obtain multi-layer compression features of each polarization information graph; performing multi-layer feature fusion processing on the multi-layer compression features of each polarization information graph to obtain adjacent-layer cross-fusion features of each polarization information graph; performing multi-branch search processing on the adjacent-layer cross-fusion features of each polarization information graph to obtain residual fusion features of each polarization information graph; and performing tri-modal fusion processing on the residual fusion features of the polarization information graphs to obtain a mask image of the camouflaged target.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (10)

1. A camouflaged target segmentation method based on light intensity and polarization clues, characterized by comprising the following steps:
acquiring a picture to be detected, and preprocessing the picture to be detected to obtain a polarization information graph of the picture to be detected, wherein the polarization information graph comprises a light intensity graph, a polarization degree graph and a polarization angle graph;
inputting the polarization information graph of the picture to be detected into a pre-trained camouflaged target segmentation network model, and performing multi-layer feature extraction processing on each polarization information graph to obtain multi-layer modal features of each polarization information graph; compressing each modal feature of each polarization information graph to obtain multi-layer compression features of each polarization information graph; performing multi-layer feature fusion processing on the multi-layer compression features of each polarization information graph to obtain the cross-fusion features of each polarization information graph; performing multi-branch search processing on the cross-fusion features of each polarization information graph to obtain the residual fusion features of each polarization information graph; and performing tri-modal fusion processing on the residual fusion features of the polarization information graphs to obtain a mask image of the camouflaged target.
2. The camouflaged target segmentation method based on light intensity and polarization clues as claimed in claim 1, wherein acquiring the picture to be detected and preprocessing the picture to be detected to obtain the polarization information graph of the picture to be detected specifically comprises:
selecting a scene to be detected, and acquiring the light intensities of the picture to be detected in the four polarization directions of 0°, 45°, 90° and 135° with a professional grayscale polarization camera;
calculating the Stokes parameters from the light intensities in the four polarization directions to obtain the polarization information graph of the picture to be detected.
3. The camouflaged target segmentation method based on light intensity and polarization clues as claimed in claim 1, wherein the multi-layer feature extraction comprises performing feature extraction on each polarization information graph with a Res2Net-50 feature extraction network to obtain the multi-layer modal features of each polarization information graph.
4. The camouflaged target segmentation method based on light intensity and polarization clues as claimed in claim 3, wherein performing multi-layer feature fusion processing on the multi-layer compression features of each polarization information graph to obtain the cross-fusion features of each polarization information graph specifically comprises:
obtaining a common spatial attention map of each polarization information graph from the compression features of adjacent layers, computed as:
F_SA = SA(Concat((F_1 + U(F_2)), (F_1 × U(F_2)))),
where F_SA is the common spatial attention map; F_1 and F_2 are the compression features of two adjacent layers; U(·) denotes an upsampling operation; Concat(·) denotes the concatenation operation; and SA(·) denotes a spatial attention operation;
performing spatial alignment with the common spatial attention map as the weight of the combined features, applying a channel attention operation to each of the resulting spatially aligned features, and finally obtaining the adjacent-layer cross-fusion feature of each polarization information graph through a concatenation operation, computed as:
F = Concat(CA(F_SA × (F_1 + U(F_2))), CA(F_SA × (F_1 × U(F_2)))),
where F is the adjacent-layer cross-fusion feature and CA(·) denotes a channel attention operation;
continuing to fuse the compression features of adjacent layers yields the final cross-fusion feature F̂ of each polarization information graph.
5. The camouflaged target segmentation method based on light intensity and polarization clues as claimed in claim 4, wherein performing multi-branch search processing on the cross-fusion feature of each polarization information graph to obtain the residual fusion feature of each polarization information graph specifically comprises:
subjecting the cross-fusion feature of the light intensity graph to three branch operations to obtain a high-resolution feature map F_u, an original-resolution feature map F_c and a low-resolution feature map F_d respectively, wherein the three branch operations comprise an upsampling branch, a convolution branch and a pooling branch;
applying multi-scale channel attention processing and element-wise multiplication fusion to F_u, F_c and F_d in turn, then adding the three branch outputs to obtain the convolution-group feature F_ucd;
fusing the convolution-group feature F_ucd with the cross-fusion feature F̂_I of the light intensity graph by residual fusion to obtain the residual fusion feature F_I of the light intensity graph:
[equation image in the original; in it, F_I is the residual fusion feature of the light intensity graph, F̂_I is its cross-fusion feature, U(·) denotes an upsampling operation, D(·) denotes a downsampling operation, Conv_3(·) denotes a convolution with kernel size 3 × 3, and M(·) denotes a multi-scale channel attention operation;]
obtaining the residual fusion feature F_P of the polarization degree graph and the residual fusion feature F_A of the polarization angle graph from their respective cross-fusion features by the same multi-branch search processing.
6. The camouflaged target segmentation method based on light intensity and polarization clues as claimed in claim 5, wherein performing tri-modal fusion processing on the residual fusion features of the polarization information graphs to obtain the mask image of the camouflaged target specifically comprises:
combining and fusing the residual fusion features of the polarization information graphs to obtain the mask image of the camouflaged target:
[equation image in the original; in it, Ψ(·) denotes the convolution processing of a convolution group comprising a Conv layer, a BN layer and a ReLU layer; Concat(·) denotes the concatenation operation; Conv_3(·) denotes a convolutional layer with kernel size 3 × 3; M_S(·) denotes a multi-scale spatial attention operation; Pred denotes the final mask image; and A_i, A_p and A_a are the result features of combining the residual fusion features F_I, F_P and F_A of the light intensity, polarization degree and polarization angle graphs respectively.]
7. The camouflaged target segmentation method based on light intensity and polarization clues according to any one of claims 1 to 6, wherein the training method of the camouflaged target segmentation network model comprises:
acquiring a training set, wherein the training set comprises polarization graphs collected by professional polarization cameras;
processing the polarization graphs in the training set according to the camouflaged target segmentation method, and optimizing the parameters of the camouflaged target segmentation network model according to the loss value until the loss stabilizes, then stopping training;
the loss function being computed as:
L = λ_1 L_wbce + λ_2 L_wDL,
where L is the loss function; L_wbce is the cross-entropy loss; L_wDL is the Dice loss; λ_1 is the weight of the cross-entropy loss; and λ_2 is the weight of the Dice loss.
8. A camouflaged target segmentation system based on light intensity and polarization clues, characterized by comprising:
a preprocessing module, used for preprocessing a picture to be detected to obtain a polarization information graph of the picture to be detected, wherein the polarization information graph comprises a light intensity graph, a polarization degree graph and a polarization angle graph;
a multi-layer feature extraction module, used for performing multi-layer feature extraction processing on each polarization information graph in turn to obtain multi-layer modal features of each polarization information graph;
a multi-layer feature compression module, used for performing multi-layer feature compression processing on the multi-layer modal features of each polarization information graph to obtain multi-layer compression features of each polarization information graph;
a multi-layer feature fusion module, used for performing multi-layer feature fusion processing on the multi-layer compression features of each polarization information graph to obtain adjacent-layer cross-fusion features of each polarization information graph;
a multi-branch search module, used for performing multi-branch search processing on the adjacent-layer cross-fusion features of each polarization information graph to obtain residual fusion features of each polarization information graph;
a tri-modal fusion module, used for performing tri-modal fusion processing on the residual fusion features of the polarization information graphs to obtain a mask image of the camouflaged target.
9. A computer device comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202211327795.8A 2022-10-26 2022-10-26 Disguised target segmentation method and system based on light intensity and polarization clues Pending CN115861608A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211327795.8A CN115861608A (en) 2022-10-26 2022-10-26 Disguised target segmentation method and system based on light intensity and polarization clues


Publications (1)

Publication Number Publication Date
CN115861608A true CN115861608A (en) 2023-03-28

Family

ID=85661971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211327795.8A Pending CN115861608A (en) 2022-10-26 2022-10-26 Disguised target segmentation method and system based on light intensity and polarization clues

Country Status (1)

Country Link
CN (1) CN115861608A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116433532A (en) * 2023-05-06 2023-07-14 合肥工业大学 Infrared polarized image fusion denoising method based on attention-guided filtering
CN116433532B (en) * 2023-05-06 2023-09-26 合肥工业大学 Infrared polarized image fusion denoising method based on attention-guided filtering
CN116503704A (en) * 2023-06-27 2023-07-28 长春理工大学 Target polarization detection system under strong background and detection method thereof
CN116503704B (en) * 2023-06-27 2023-09-05 长春理工大学 Target polarization detection system under strong background and detection method thereof
CN116630310A (en) * 2023-07-21 2023-08-22 锋睿领创(珠海)科技有限公司 Quartz glass detection method, device, equipment and medium
CN116630310B (en) * 2023-07-21 2023-11-17 锋睿领创(珠海)科技有限公司 Quartz glass detection method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN113065558B (en) Lightweight small target detection method combined with attention mechanism
CN111259850B (en) Pedestrian re-identification method integrating random batch mask and multi-scale representation learning
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN115861608A (en) Disguised target segmentation method and system based on light intensity and polarization clues
CN113705478B (en) Mangrove single wood target detection method based on improved YOLOv5
CN111222466B (en) Remote sensing image landslide automatic detection method based on three-dimensional space-channel attention mechanism
CN108090447A (en) Hyperspectral image classification method and device under double branch's deep structures
CN111178206A (en) Building embedded part detection method and system based on improved YOLO
CN111257341A (en) Underwater building crack detection method based on multi-scale features and stacked full convolution network
CN112633459A (en) Method for training neural network, data processing method and related device
CN115690479A (en) Remote sensing image classification method and system based on convolution Transformer
CN113326735B (en) YOLOv 5-based multi-mode small target detection method
CN113469074B (en) Remote sensing image change detection method and system based on twin attention fusion network
CN115171165A (en) Pedestrian re-identification method and device with global features and step-type local features fused
CN115705637A (en) Improved YOLOv5 model-based spinning cake defect detection method
CN115619743A (en) Construction method and application of OLED novel display device surface defect detection model
CN114463759A (en) Lightweight character detection method and device based on anchor-frame-free algorithm
CN111612717A (en) Water surface image reflection removing and repairing method and device based on countermeasure network
CN111833282B (en) Image fusion method based on improved DDcGAN model
Gao et al. Blnn: multiscale feature fusion-based bilinear fine-grained convolutional neural network for image classification of wood knot defects
CN116071676A (en) Infrared small target detection method based on attention-directed pyramid fusion
CN117036806A (en) Object identification method based on dual multiplexing residual error network
CN116402761A (en) Photovoltaic panel crack detection method based on double-channel multi-scale attention mechanism
CN114662605A (en) Flame detection method based on improved YOLOv5 model
CN114140524A (en) Closed loop detection system and method for multi-scale feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination