CN114926629B - Infrared ship target saliency detection method based on lightweight convolutional neural network - Google Patents

Infrared ship target saliency detection method based on lightweight convolutional neural network

Info

Publication number
CN114926629B
CN114926629B CN202210346815.XA CN202210346815A CN114926629B
Authority
CN
China
Prior art keywords
convolution
siwd
branch
network
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210346815.XA
Other languages
Chinese (zh)
Other versions
CN114926629A (en)
Inventor
刘兆英
贺俊然
张婷
张学思
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202210346815.XA priority Critical patent/CN114926629B/en
Publication of CN114926629A publication Critical patent/CN114926629A/en
Application granted granted Critical
Publication of CN114926629B publication Critical patent/CN114926629B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an infrared ship target saliency detection method based on a lightweight convolutional neural network. It designs a lightweight module, Simple Inception with Dilated (SIWD), which expands the receptive field through dilated convolution while reducing the number of parameters; it simplifies an existing classical network to further reduce the parameter count; and, during up-sampling, it combines two different up-sampling operations with the SIWD module to compensate for the shortcomings of a single up-sampling method. The invention improves detection results while significantly reducing the number of parameters. In addition, to address the lack of infrared ship saliency detection datasets, the invention constructs a dataset containing 3069 infrared ship target images. The method is highly operable and extensible, and is suitable for saliency detection of infrared ship targets against a sea-surface background.

Description

Infrared ship target saliency detection method based on lightweight convolutional neural network
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an infrared ship target saliency detection method based on a convolutional neural network.
Background
Infrared images have the advantages of good concealment, strong penetration, insensitivity to illumination intensity, and night-time operation, so infrared imaging technology is widely used in both civil and military applications. However, owing to limitations of the imaging technology and the environment, infrared images generally have low contrast, a low signal-to-noise ratio, little texture information, and uneven gray-level distribution, and are further affected by sea clutter, islands, seaweed, and the like, which makes their analysis and processing very challenging. In marine infrared ship images, ship targets usually show obvious visual saliency because of heat sources such as the engine and the funnel, so salient object detection has become an important pre-processing step in infrared image analysis.
Traditional salient object detection methods are mainly based on image processing: features are selected manually and designed from prior knowledge, so these methods adapt poorly to different scenes and generally perform unsatisfactorily against complex backgrounds. With the rapid development of deep learning, convolutional-neural-network-based methods have been widely applied to salient object detection. Deep-learning methods learn from large annotated datasets and extract deep features automatically, overcoming the limitations of hand-crafted features, with good recognition in complex scenes and strong generalization. However, most current deep-learning salient object detection models are designed for visible-light images, and the models are complex with large parameter counts. Because public infrared target datasets are lacking, deep-learning methods have so far seen little use in infrared ship saliency detection; on the other hand, application requirements place high demands on the accuracy and speed of the algorithm. Research on fast, lightweight infrared ship salient object detection methods therefore has important significance and application value.
Most existing saliency detection models are based on the classical VGG16 network, usually as a five-layer model, so as to extract features that are sufficient in number, have large enough receptive fields, and have strong representational power. However, sea-surface infrared ship images lack texture, color, and similar information, so such a complex feature-extraction network is not actually needed. The invention therefore improves the model in the following two respects and designs a lightweight, fast salient object detection network. First, to reduce the parameter count, the fifth layer of the VGG16 network, which has the most parameters, is pruned, turning the backbone into a four-layer model and reducing the model's parameter count. Second, to compensate for the loss from pruning the fifth layer, the invention designs a new lightweight module, a simple Inception module with dilated convolution (Simple Inception with Dilated, SIWD), which replaces the traditional convolution in the backbone network, and combines two different up-sampling operations with SIWD to compensate for the shortcomings of a single up-sampling method. The invention offers high detection accuracy, real-time performance, operability, and extensibility, and is suitable for infrared ship saliency detection against a sea-surface background.
Disclosure of Invention
The technical problem the invention solves is to provide an infrared ship saliency detection method based on a lightweight convolutional neural network for sea-surface infrared ship saliency detection, meeting the real-time and effectiveness requirements of saliency detection through a lightweight model design. To this end, the invention adopts the following technical scheme. To address the lack of datasets for deep-learning-based infrared ship saliency algorithms, an infrared ship saliency detection dataset is constructed: images are extracted from infrared ship videos, and their foreground and background are annotated with the labelme software. An infrared ship saliency detection network is then designed using the constructed dataset: the backbone is changed from the classical five-layer VGG16 model to a four-layer model, and the designed lightweight module and up-sampling module are applied to the backbone, meeting the real-time and effectiveness requirements. The model is trained on the constructed dataset and saved.
The infrared ship saliency detection method based on a lightweight convolutional neural network comprises the following steps:
Step 1: change the classical five-layer VGG16 model into a four-layer model to serve as the backbone network.
Step 2: design a lightweight module SIWD, which combines a four-branch Inception structure with dilated convolution, increasing the receptive field while reducing the network size.
Step 3: based on the SIWD module and combining up-sampling algorithms, design a two-branch up-sampling module, TBU (Two-Branch Upsampling).
Step 4: apply the SIWD and TBU modules to the backbone network proposed in step 1.
Step 5: save the model from step 4 for model testing.
Drawings
FIG. 1(a) is a frame image extracted from an infrared ship video.
FIG. 1(b) is a schematic view of the ship annotation label corresponding to FIG. 1(a).
FIG. 2 is a flow chart of the infrared ship saliency detection method of the invention.
FIG. 3 is a schematic diagram of the SIWD module of the invention.
FIG. 4(a) is a frame image from the test dataset.
FIG. 4(b) is the corresponding test result image of the invention.
FIG. 5(a) is a frame image from the test dataset.
FIG. 5(b) is the corresponding test result image of the invention.
Detailed Description
The invention provides an infrared ship saliency detection method based on a lightweight convolutional neural network, explained and illustrated below with reference to the drawings:
Data processing: each frame of the infrared video is extracted programmatically (FIG. 1(a)); the frames have 3 channels, pixel values ∈ [0, 256), and size 256×256. 3068 infrared ship target images are selected, their contours are carefully annotated with the labelme software, and foreground/background label images are generated (FIG. 1(b)), with the label file names identical to the frame image names.
The embodiment flow of the invention is as follows:
Step 1: Because infrared images lack color information and texture information and have blurred edges, the network differs from the usual five-layer VGG16 structure: the fifth layer, which has the most parameters, is deleted. The backbone network of the invention consists of four SIWD modules, i.e., a four-layer structure, denoted S_i, i = 1, 2, 3, 4; the output of each SIWD module is X_i, i = 1, 2, 3, 4. The up-sampling modules are four TBUs, denoted T_i, i = 1, 2, 3, 4, with corresponding outputs U_i, i = 1, 2, 3, 4 (the equation defining U_i, rendered as an image in the original, is not reproduced here).
The final prediction result is O = Sigmoid(Conv(U_4)), where Cat(f_1, ..., f_n) denotes the concat operation, i.e., concatenating the features f_1, ..., f_n along the channel dimension, and Conv denotes a convolution operation. The flow chart of the network is shown in FIG. 2.
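As a minimal illustration of the prediction head O = Sigmoid(Conv(U_4)) and the Cat operation above, the following PyTorch sketch applies a 1×1 convolution and a sigmoid to a stand-in decoder output; the 64-channel width and the 1×1 kernel are assumptions for illustration, not values taken from the patent:

```python
import torch
import torch.nn as nn

# Hypothetical channel width for the decoder output U4 (not given in the text).
head = nn.Conv2d(64, 1, kernel_size=1)

u4 = torch.randn(1, 64, 256, 256)   # stand-in for the decoder output U4
o = torch.sigmoid(head(u4))         # O = Sigmoid(Conv(U4)): saliency map in (0, 1)

# Cat(f1, f2): channel-wise concatenation, as used throughout the network.
f1, f2 = torch.randn(1, 3, 8, 8), torch.randn(1, 5, 8, 8)
cat = torch.cat([f1, f2], dim=1)    # 3 + 5 = 8 channels
```

The sigmoid keeps every output pixel strictly inside (0, 1), which is what allows the map to be read directly as per-pixel saliency.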
Step 2: Design the lightweight module SIWD. To reduce the number of parameters, the invention replaces the 3×3 convolution kernel with kernels of sizes 1×3 and 3×1, and uses dilated convolution to enlarge the receptive field. The SIWD module consists mainly of four branches. The first branch is a convolution layer with kernel size 1×1 and dilation rate 1. The second branch comprises two convolution layers with kernel sizes 3×1 and 1×3, respectively, each with dilation rate 1; the input features pass through the two convolution layers separately and are then added point by point to form the output of the second branch. The third branch likewise comprises 3×1 and 1×3 convolution layers, with dilation rate 3; their outputs are added point by point to form the output of the third branch. The fourth branch comprises 3×1 and 1×3 convolution layers with dilation rate 5; their outputs are added point by point to form the output of the fourth branch. The input I_S is fed into the convolution layers of the four branches, the resulting outputs are concatenated and then fused through a point-wise convolution layer to obtain O_L, and finally a shortcut mechanism concatenates the input I_S with O_L to give the final SIWD output O_S. The specific structure is shown in FIG. 3, and the process can be expressed as:
B_1 = Conv_{1×1}^{1}(I_S), B_2 = A_add(Conv_{3×1}^{1}(I_S), Conv_{1×3}^{1}(I_S)) (1)
B_3 = A_add(Conv_{3×1}^{3}(I_S), Conv_{1×3}^{3}(I_S)), B_4 = A_add(Conv_{3×1}^{5}(I_S), Conv_{1×3}^{5}(I_S)) (2)
O_L = Conv_{1×1}^{1}(Cat(B_1, B_2, B_3, B_4)), O_S = Cat(I_S, O_L) (3)
where Conv_{x}^{y} denotes a convolution operation with kernel size x and dilation rate y, A_add(f_1, ..., f_n) denotes point-by-point addition of the features f_1, ..., f_n, and B_i is the output feature of the i-th branch.
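A hedged PyTorch sketch of the SIWD block described above: four parallel branches with factorized 3×1/1×3 kernels at dilation rates 1, 3, and 5, point-wise fusion, and a concatenation shortcut. The channel widths `in_ch` and `mid_ch` are illustrative assumptions, since the patent does not specify them:

```python
import torch
import torch.nn as nn

class SIWD(nn.Module):
    """Sketch of the SIWD block. Padding is chosen so every branch keeps
    the spatial size H x W; the shortcut concatenates input and fused output."""
    def __init__(self, in_ch, mid_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, mid_ch, kernel_size=1)  # branch 1: 1x1, dilation 1
        def pair(d):
            # 3x1 and 1x3 convolutions with dilation d; padding keeps H x W
            return nn.ModuleList([
                nn.Conv2d(in_ch, mid_ch, (3, 1), padding=(d, 0), dilation=(d, 1)),
                nn.Conv2d(in_ch, mid_ch, (1, 3), padding=(0, d), dilation=(1, d)),
            ])
        self.b2, self.b3, self.b4 = pair(1), pair(3), pair(5)
        self.fuse = nn.Conv2d(4 * mid_ch, mid_ch, kernel_size=1)  # point-wise fusion

    def forward(self, x):                      # x plays the role of I_S
        B1 = self.b1(x)
        B2 = self.b2[0](x) + self.b2[1](x)     # A_add: point-by-point addition
        B3 = self.b3[0](x) + self.b3[1](x)
        B4 = self.b4[0](x) + self.b4[1](x)
        o_l = self.fuse(torch.cat([B1, B2, B3, B4], dim=1))  # O_L
        return torch.cat([x, o_l], dim=1)      # O_S = Cat(I_S, O_L)
```

Note the output has `in_ch + mid_ch` channels because of the concatenation shortcut; a following layer would need to account for that.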
Step 3: To compensate for the shortcomings of a single up-sampling method, the up-sampling module TBU of the invention comprises two branches: the first consists of a SIWD module followed by an Upsampling operation, and the second of a SIWD module followed by a PixelShuffle operation. The input data I_T first passes through the two branches separately, and the two branch outputs are added to obtain the final up-sampled output O_TBU. The process can be expressed as
O_TBU = A_add(U_up(SIWD(I_T)), P_ps(SIWD(I_T))) (4)
where U_up(x) denotes the Upsampling operation on x and P_ps(x) denotes the PixelShuffle operation on x.
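Equation (4) might be sketched in PyTorch as follows. For brevity the two SIWD blocks are stood in for by plain 3×3 convolutions, and the channel handling (PixelShuffle needs 4× the output channels at scale factor 2) is an assumption:

```python
import torch
import torch.nn as nn

class TBU(nn.Module):
    """Sketch of the two-branch up-sampling module: one branch up-samples
    with bilinear interpolation, the other with PixelShuffle; outputs are
    added point by point, doubling the spatial resolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # stand-ins for the SIWD blocks of each branch
        self.siwd_up = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.siwd_ps = nn.Conv2d(in_ch, 4 * out_ch, 3, padding=1)
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.ps = nn.PixelShuffle(2)   # (B, 4C, H, W) -> (B, C, 2H, 2W)

    def forward(self, x):              # x plays the role of I_T
        return self.up(self.siwd_up(x)) + self.ps(self.siwd_ps(x))
```

Interpolation tends to give smooth results while PixelShuffle learns its own upscaling filters, which is one plausible reading of why the patent combines the two.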
Step 4: Combine the TBU and SIWD modules with the backbone network from step 1 to form the final network; feed the training data into the network in sequence; choose hyper-parameters such as the number of iterations and the learning rate; use the sum of the cross-entropy loss and the structural-similarity loss as the loss function; and back-propagate through the network according to the network output to train it.
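The training objective, the sum of a cross-entropy term and a structural-similarity term, might be sketched as below. The global single-window SSIM formulation here is a simplified stand-in, since the patent does not give the exact form of its similarity-structure loss:

```python
import torch
import torch.nn.functional as F

def ssim_loss(pred, target, C1=0.01 ** 2, C2=0.03 ** 2):
    """1 - SSIM computed over the whole map (a simplified, assumed form)."""
    mu_p, mu_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(), target.var()
    cov = ((pred - mu_p) * (target - mu_t)).mean()
    ssim = ((2 * mu_p * mu_t + C1) * (2 * cov + C2)) / \
           ((mu_p ** 2 + mu_t ** 2 + C1) * (var_p + var_t + C2))
    return 1.0 - ssim

def total_loss(pred, target):
    # sum of cross-entropy and structural-similarity terms, as in the text
    return F.binary_cross_entropy(pred, target) + ssim_loss(pred, target)
```

A production implementation would typically use a windowed SSIM (e.g. an 11×11 Gaussian window) rather than this global variant.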
Step 5: Save the model trained in step 4 for testing. Input the test infrared ship images into the model to obtain predictions, and evaluate model performance by computing the mean absolute error (MAE) and the F_β value between the predictions and the ground truth. Infrared ship images are shown in FIG. 4(a) and FIG. 5(a), and the predicted images in FIG. 4(b) and FIG. 5(b). Table 1 compares the quantitative results of this model on the test set with those of other models.
TABLE 1
Method      MAE     F_β     Parameters
FT          0.9481  0.0178  -
DSS         0.2109  53.20   62.24M
NLDF        0.0046  74.42   25M
Light_NLDF  0.0055  73.29   20.55M
BAS         0.0049  75.73   87M
MLU         0.0047  76.16   24.04M
Ours+       0.0040  78.33   3.69M
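The two metrics reported in Table 1 can be computed as in the sketch below. β² = 0.3 is the weighting conventional in saliency detection; the fixed 0.5 binarization threshold is an assumption, as many papers instead use an adaptive or swept threshold:

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a saliency map and the ground truth (both in [0, 1])."""
    return float(np.mean(np.abs(pred - gt)))

def f_beta(pred, gt, beta2=0.3, thresh=0.5):
    """F-beta score of the binarized saliency map against the binary ground truth."""
    b = pred >= thresh                        # binarize the prediction
    tp = np.logical_and(b, gt > 0.5).sum()    # true-positive pixels
    precision = tp / max(b.sum(), 1)
    recall = tp / max((gt > 0.5).sum(), 1)
    denom = beta2 * precision + recall
    return (1 + beta2) * precision * recall / denom if denom > 0 else 0.0
```

For a perfect prediction MAE is 0 and F_β is 1, which matches the direction of the comparison in Table 1 (lower MAE, higher F_β is better).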
The above embodiments only describe the present invention and do not limit the technical solutions it describes. Accordingly, technical solutions and improvements that do not depart from the spirit and scope of the invention all fall within the scope of its claims.

Claims (1)

1. An infrared ship target saliency detection method based on a lightweight convolutional neural network, characterized by comprising the following steps:
step 1: process the ship target dataset; extract representative images from the video and annotate the contour of the target ship with the labelme software as the label;
step 2: delete the fifth layer, which has the most parameters, of the five-layer VGG16 structure; the backbone network consists of four SIWD modules, i.e., structural modules in a four-layer structure, denoted S_i, i = 1, 2, 3, 4, each with output X_i, i = 1, 2, 3, 4; the up-sampling structural modules are four TBUs, denoted T_i, i = 1, 2, 3, 4, with corresponding outputs U_i, i = 1, 2, 3, 4 (the equation defining U_i, rendered as an image in the original, is not reproduced here);
the final prediction result is O = Sigmoid(Conv(U_4)), where Cat(f_1, ..., f_n) denotes the concat operation, i.e., concatenating the features f_1, ..., f_n along the channel dimension, and Conv is a convolution operation;
step 3: design the lightweight module SIWD, using convolution kernels of sizes 1×3 and 3×1 instead of a 3×3 convolution kernel and using dilated convolution to increase the receptive field; the lightweight module SIWD comprises four branches, wherein the first branch is a convolution layer with kernel size 1×1 and dilation rate 1; the second branch comprises two convolution layers with kernel sizes 3×1 and 1×3, respectively, and dilation rate 1, the input features passing through the two convolution layers separately and then being added point by point to form the output of the second branch; the third branch comprises 3×1 and 1×3 convolution layers with dilation rate 3, the input features passing through the two convolution layers separately and then being added point by point to form the output of the third branch; the fourth branch comprises 3×1 and 1×3 convolution layers with dilation rate 5, the input features passing through the two convolution layers separately and then being added point by point to form the output of the fourth branch; the input I_S is fed into the convolution layers of the four branches, the resulting outputs are concatenated and then fused through a point-wise convolution layer to obtain O_L, and finally a shortcut mechanism concatenates the input I_S with O_L to give the final SIWD output O_S; the process is expressed as:
B_1 = Conv_{1×1}^{1}(I_S), B_2 = A_add(Conv_{3×1}^{1}(I_S), Conv_{1×3}^{1}(I_S)) (1)
B_3 = A_add(Conv_{3×1}^{3}(I_S), Conv_{1×3}^{3}(I_S)), B_4 = A_add(Conv_{3×1}^{5}(I_S), Conv_{1×3}^{5}(I_S)) (2)
O_L = Conv_{1×1}^{1}(Cat(B_1, B_2, B_3, B_4)), O_S = Cat(I_S, O_L) (3)
where Conv_{x}^{y} denotes a convolution operation with kernel size x and dilation rate y, A_add(f_1, ..., f_n) denotes point-by-point addition of the features f_1, ..., f_n, and B_i is the output feature of the i-th branch;
step 4: the up-sampling module TBU comprises two branches, the first consisting of a SIWD module and an Upsampling operation and the second of a SIWD module and a PixelShuffle operation; the input data I_T first passes through the two branches separately, and the two branch outputs are added to obtain the final up-sampled output O_TBU; the process is expressed as
O_TBU = A_add(U_up(SIWD(I_T)), P_ps(SIWD(I_T))) (4)
where U_up(x) denotes the Upsampling operation on x and P_ps(x) denotes the PixelShuffle operation on x;
step 5: combine the TBU and SIWD modules with the backbone network obtained in step 2 to form the final network; feed the training data into the network in sequence; select the number-of-iterations and learning-rate hyper-parameters; use the sum of the cross-entropy loss and the structural-similarity loss as the loss function; and back-propagate through the network according to the network output to train it;
step 6: save the model trained in step 5 for testing.
CN202210346815.XA 2022-03-31 2022-03-31 Infrared ship target saliency detection method based on lightweight convolutional neural network Active CN114926629B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210346815.XA CN114926629B (en) 2022-03-31 2022-03-31 Infrared ship target saliency detection method based on lightweight convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210346815.XA CN114926629B (en) 2022-03-31 2022-03-31 Infrared ship target saliency detection method based on lightweight convolutional neural network

Publications (2)

Publication Number Publication Date
CN114926629A CN114926629A (en) 2022-08-19
CN114926629B true CN114926629B (en) 2024-03-22

Family

ID=82805669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210346815.XA Active CN114926629B (en) 2022-03-31 2022-03-31 Infrared ship target saliency detection method based on lightweight convolutional neural network

Country Status (1)

Country Link
CN (1) CN114926629B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112700476A (en) * 2021-01-08 2021-04-23 北京工业大学 Infrared ship video tracking method based on convolutional neural network
WO2021244079A1 (en) * 2020-06-02 2021-12-09 苏州科技大学 Method for detecting image target in smart home environment
CN114241308A (en) * 2021-12-17 2022-03-25 杭州电子科技大学 Lightweight remote sensing image significance detection method based on compression module

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021244079A1 (en) * 2020-06-02 2021-12-09 苏州科技大学 Method for detecting image target in smart home environment
CN112700476A (en) * 2021-01-08 2021-04-23 北京工业大学 Infrared ship video tracking method based on convolutional neural network
CN114241308A (en) * 2021-12-17 2022-03-25 杭州电子科技大学 Lightweight remote sensing image significance detection method based on compression module

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ship detection in remote sensing images combining saliency features and convolutional neural networks; Yu Donghang, Zhang Baoming, Guo Haitao, Zhao Chuan, Xu Junfeng; Journal of Image and Graphics; 2018-12-16 (No. 12); full text *

Also Published As

Publication number Publication date
CN114926629A (en) 2022-08-19

Similar Documents

Publication Publication Date Title
Golts et al. Unsupervised single image dehazing using dark channel prior loss
Chu et al. Sea-land segmentation with Res-UNet and fully connected CRF
CN110084234B (en) Sonar image target identification method based on example segmentation
CN109886066B (en) Rapid target detection method based on multi-scale and multi-layer feature fusion
CN107729819B (en) Face labeling method based on sparse fully-convolutional neural network
CN114565860B (en) Multi-dimensional reinforcement learning synthetic aperture radar image target detection method
CN114022408A (en) Remote sensing image cloud detection method based on multi-scale convolution neural network
CN111368935B (en) SAR time-sensitive target sample amplification method based on generation countermeasure network
CN111986125A (en) Method for multi-target task instance segmentation
CN112215100B (en) Target detection method for degraded image under unbalanced training sample
CN111127360A (en) Gray level image transfer learning method based on automatic encoder
CN116109947A (en) Unmanned aerial vehicle image target detection method based on large-kernel equivalent convolution attention mechanism
CN112419333A (en) Remote sensing image self-adaptive feature selection segmentation method and system
CN115526803A (en) Non-uniform illumination image enhancement method, system, storage medium and device
CN116958827A (en) Deep learning-based abandoned land area extraction method
CN110647977B (en) Method for optimizing Tiny-YOLO network for detecting ship target on satellite
CN115223032A (en) Aquatic organism identification and matching method based on image processing and neural network fusion
Ko et al. Learning lightweight low-light enhancement network using pseudo well-exposed images
Jiang et al. An Improved Semantic Segmentation Method for Remote Sensing Images Based on Neural Network.
CN114022392A (en) Serial attention-enhancing UNet + + defogging network for defogging single image
CN113077438A (en) Cell nucleus region extraction method and imaging method for multi-cell nucleus color image
Qiu et al. Underwater sea cucumbers detection based on pruned SSD
CN114926629B (en) Infrared ship target saliency detection method based on lightweight convolutional neural network
Luo et al. A fast denoising fusion network using internal and external priors
CN116452900A (en) Target detection method based on lightweight neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant