CN114283323A - Marine target recognition system based on image deep learning - Google Patents
Marine target recognition system based on image deep learning
- Publication number
- CN114283323A (application CN202111624780.3A)
- Authority
- CN
- China
- Prior art keywords
- target
- marine
- yolo
- recognition model
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention relates to a marine target recognition system based on image deep learning, and belongs to the technical field of artificial-intelligence image recognition. The system aims to recognize marine targets quickly and accurately, meet the needs of situation assessment and threat assessment, and provide an important basis for command decisions. Compared with traditional marine target recognition approaches, the technical scheme greatly improves accuracy, timeliness and the degree of intelligence.
Description
Technical Field
The invention belongs to the technical field related to artificial intelligence image recognition, and particularly relates to a marine target recognition system based on image deep learning.
Background
The target identification technology is one of the important problems in national defense information technology research, and has important theoretical and application values in the fields of information collection and monitoring and weapon guidance.
With the development of deep learning theory, deep neural networks have been widely applied to target detection. However, current target-detection networks suffer from poor real-time performance and degraded detection performance in scenarios with strict real-time requirements.
Disclosure of Invention
Technical problem to be solved
The technical problem to be solved by the invention is as follows: a marine target identification system is designed to identify marine targets more quickly and meet the requirements of situation assessment and threat assessment.
(II) technical scheme
In order to solve the technical problem, the invention provides a marine target recognition system based on image deep learning, which comprises:
the image preprocessing module is used for acquiring an offshore target image, respectively forming a sample set and a prediction set, preprocessing the sample set and constructing a training label;
the model training module is used for inputting the training labels of the marine target images into an improved YOLO v3 recognition model and training to obtain a marine target recognition model;
the target detection module is used for inputting the prediction set of the marine target image into the marine target recognition model obtained by the model training module, performing target detection to obtain attribute information of the marine target image as a final target detection result; the attribute information includes a category.
Preferably, the image preprocessing module performs image preprocessing on the sample set, and the method for constructing the training label specifically includes:
s1.1, collecting a marine target image, and cutting a sample set of the marine target image into a fixed size;
s1.2, carrying out target-frame labeling on the marine ships in the sample set, wherein the information parameters of a target frame comprise x_center, y_center, w and h, corresponding respectively to the abscissa of the center point, the ordinate of the center point, the width of the target frame and the height of the target frame; the class of each target is given at the same time, and the labeled information is stored in an xml file;
s1.3, constructing a training label: dividing a picture into an S × S grid by using the YOLO v3 model, wherein each grid cell is responsible for predicting a target; the parameters of each predicted target comprise x_center, y_center, w, h, confidence and prob, and each target frame is provided with n_anchor prior frames, so that the shape of the input training label is [batch_size, S, S, n_anchor × (5 + class)], wherein batch_size is the batch size and class is the number of target classes; the xml file is traversed, the index of the grid cell is calculated from the center coordinates of the target frame to serve as the index of the target, and the grid index is then filled into the xml file.
Preferably, the improved YOLO v3 recognition model is generated by analyzing the initial YOLO v3 model on marine target images; in the backbone of the YOLO v3 network, DarkNet53 is used as the feature-extraction network, with a residual network added; and a prior frame is introduced, obtained by clustering, to serve as a regression reference.
Preferably, the convolutional neural network of the improved YOLO v3 recognition model can perform convolution operations of different sizes on the input marine target images of the training set to form feature maps of the marine target images at different scales; the convolutional neural network learns features of the marine target image at different scales, and realizes multi-scale detection of marine targets.
Preferably, the collected marine target images are input into the improved YOLO v3 recognition model, which predicts three 3D tensors of different sizes, corresponding to three different scales.
Preferably, dividing the marine target image to be detected into an S × S grid, each grid cell predicting B rectangular frames and the confidences of those frames; wherein S represents the number of divided grid cells and B represents the number of frames each grid cell is responsible for; and selecting the marine target prior bounding box with the maximum confidence score, and predicting the position of the marine target image to be detected through a logistic regression function.
Preferably, the prediction output of the improved YOLO v3 recognition model comprises the grid-cell coordinates of the marine target and the width and height of the prior bounding box; the improved YOLO v3 recognition model predicts the score of each bounding box using logistic regression.
Preferably, in the improved YOLO v3 recognition model, a loss function is constructed as the criterion for measuring the error between predicted values and true values.
Preferably, for the coordinates of the marine target, the loss function adopts a sum-of-squared-errors loss, and the confidence and the class adopt a binary cross-entropy loss; prediction is performed at 3 different scales, with 3 candidate target frames predicted at each scale.
preferably, when the improved YOLO v3 recognition model is trained, the batch _ size is selected to be 8 sheets, and the optimizer selects an Adam optimizer.
(III) advantageous effects
The marine target recognition system based on image deep learning is provided to recognize marine targets quickly and accurately, to meet the needs of situation assessment and threat assessment, and to provide an important basis for command decisions. Compared with traditional marine target recognition approaches, the technical scheme greatly improves accuracy, timeliness and the degree of intelligence.
Drawings
FIG. 1 is a diagram of the main network structure of the offshore object recognition model of the present invention;
FIG. 2 is a graph of the improved YOLO v3 recognition model loss function in accordance with the present invention;
FIG. 3 is a Scale record chart of the model training process in the present invention;
FIG. 4 is a prediction input diagram for model validation in accordance with the present invention;
FIG. 5 is a diagram of the prediction results of model validation in the present invention.
Detailed Description
In order to make the objects, contents, and advantages of the present invention clearer, the following detailed description of the embodiments of the present invention will be made in conjunction with the accompanying drawings and examples.
In order to quickly and accurately identify the offshore target, meet the needs of situation assessment and threat assessment and provide important basis for command decision, the offshore target identification system based on image deep learning provided by the invention, with reference to fig. 1, comprises:
the image preprocessing module is used for acquiring an offshore target image, respectively forming a sample set and a prediction set, preprocessing the sample set and constructing a training label; the image preprocessing is carried out on the sample set, and the construction of the training label specifically comprises the following steps:
s1.1, collecting a marine target image, and cutting a sample set of the marine target image into a fixed size;
s1.2, carrying out target-frame labeling on the marine ships in the sample set, wherein the information parameters of a target frame comprise x_center, y_center, w and h, corresponding respectively to the abscissa of the center point, the ordinate of the center point, the width of the target frame and the height of the target frame; the class of each target is given at the same time, and the labeled information is stored in an xml file so that it can be conveniently read by subsequent programs;
s1.3, constructing a training label: dividing a picture into an S × S grid by using the YOLO v3 model, wherein each grid cell is responsible for predicting a target; the parameters of each predicted target comprise x_center, y_center, w, h, confidence and prob, and each target frame is provided with n_anchor prior frames, so that the shape of the input training label is [batch_size, S, S, n_anchor × (5 + class)], wherein batch_size is the batch size and class is the number of target classes; the xml file is traversed, the index of the grid cell is calculated from the center coordinates of the target frame to serve as the index of the target, and the grid index is then filled into the xml file;
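The label-construction step s1.3 can be sketched as follows. This is an illustrative reconstruction rather than the patent's code; the function name make_label, the class count n_class and the image size are assumptions:

```python
import numpy as np

def make_label(boxes, S=13, n_anchor=3, n_class=5, img_size=416.0):
    """boxes: list of (x_center, y_center, w, h, class_id) in pixels.
    Returns one sample's label of shape [S, S, n_anchor, 5 + n_class]."""
    label = np.zeros((S, S, n_anchor, 5 + n_class), dtype=np.float32)
    for x_c, y_c, w, h, cls in boxes:
        # Grid-cell index is computed from the target-frame centre coordinates.
        col = int(x_c / img_size * S)
        row = int(y_c / img_size * S)
        label[row, col, :, 0:4] = [x_c, y_c, w, h]  # box parameters
        label[row, col, :, 4] = 1.0                 # confidence
        label[row, col, :, 5 + cls] = 1.0           # one-hot class probability
    return label

# One box centred at (208, 104) of class 2 lands in grid cell (row 3, col 6).
label = make_label([(208.0, 104.0, 60.0, 30.0, 2)])
```

Stacking such labels over a batch yields the [batch_size, S, S, n_anchor × (5 + class)] shape described above.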
the model training module is used for inputting the training labels of the marine target images into an improved YOLO v3 recognition model and training to obtain a marine target recognition model;
the improved YOLO v3 network is generated by analyzing the initial YOLO v3 model based on a marine target image, and in a backbone network based on a YOLOv3 network, DarkNet53 is used as a feature extraction network, wherein a residual network is added, so that the expression of a deep network is improved; introducing a prior frame, and obtaining the prior frame in a clustering (1-IOU) mode to be used as a regression reference; the improved YOLO v3 network is obtained by training according to a sample set and attribute information corresponding to the samples.
The convolutional neural network of the improved YOLO v3 model can perform convolution operations of different sizes on the marine target images of the input training set to form feature maps of the marine target images at different scales;
the convolutional neural network learns features of the marine target image at different scales, and realizes multi-scale detection of marine targets;
inputting the collected marine target image into the improved YOLO v3 recognition model, YOLO v3 predicts three 3D tensors of different sizes, corresponding to three different scales;
dividing the marine target image to be detected into an S × S grid, each grid cell predicting B rectangular frames and the confidences of those frames; wherein S represents the number of divided grid cells and B represents the number of frames each grid cell is responsible for;
selecting a marine target prior bounding box with the maximum confidence score value, and predicting the position of a marine target image to be detected through a logistic regression function;
pr(object) × IOU(b, object) = σ(to),
bx = σ(tx) + cx,
by = σ(ty) + cy,
bw = pw × e^tw,
bh = ph × e^th,
the prediction output of the model is (tx, ty, tw, th); cx and cy denote the coordinates of the marine target's grid cell, and pw and ph denote the width and height of the prior bounding box; bx, by, bw and bh are the center coordinates, width and height of the predicted bounding box. The YOLO v3 model predicts the score of each bounding box by logistic regression: if a predicted bounding box overlaps the ground-truth box more than every other bounding box does, its objectness value is 1; if it is not the best match but its overlap exceeds a preset threshold, the prediction is ignored. The YOLO v3 model assigns one bounding box to each real object; a bounding box not matched to a real object generates no class-prediction or coordinate loss, but only an objectness-prediction loss.
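The decoding equations above can be illustrated with a minimal sketch; the function names are assumptions for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Apply bx = sigma(tx) + cx, by = sigma(ty) + cy, bw = pw*e^tw, bh = ph*e^th."""
    bx = sigmoid(tx) + cx   # centre x: sigmoid offset within the grid cell
    by = sigmoid(ty) + cy   # centre y
    bw = pw * math.exp(tw)  # width scaled from the prior box
    bh = ph * math.exp(th)  # height scaled from the prior box
    return bx, by, bw, bh

# Zero offsets: the box sits at the centre of cell (6, 3) with the prior's size.
bx, by, bw, bh = decode_box(0.0, 0.0, 0.0, 0.0, cx=6, cy=3, pw=30.0, ph=60.0)
```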
In the improved YOLO v3 recognition model, the loss function serves as the criterion for measuring the error between predicted and true values, and plays a key role in how fast the network learns and in the detection quality of the final model. The loss function shown in FIG. 2 is constructed to reduce the difference between predicted and labeled target frames when the YOLO v3 model identifies marine targets, to reduce the squared loss on the center coordinates, width and height of the sample frames, and to increase the accuracy of sample identification;
as shown in fig. 2, for the coordinates of the marine target, the loss function adopts a sum-of-squared-errors loss; the confidence and the classification adopt a binary cross-entropy loss, where λcoord is the penalty coefficient of the coordinate prediction, set to 5; λnoobj is the penalty coefficient of the confidence when no moving target is contained, set to 0.5; K × K denotes the number of grid cells into which the input picture is divided; M denotes the number of target frames predicted by each grid cell. The method predicts at 3 different scales and predicts 3 candidate target frames at each scale. Here xi, yi, wi and hi denote the predicted center coordinates, width and height of the moving target, and x̂i, ŷi, ŵi and ĥi denote the real center coordinates, width and height; 1ij^obj and 1ij^noobj indicate whether or not the jth candidate target frame in the ith grid cell is responsible for detecting the object; Ci and Ĉi denote the predicted and real confidences that the ith grid cell contains a moving target; pi(c) and p̂i(c) denote the predicted and real probability values that the target in the ith grid cell belongs to class c.
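A rough sketch of a loss with this structure (sum-of-squared-errors on coordinates weighted by λcoord = 5, binary cross-entropy on confidence with λnoobj = 0.5 on no-object boxes, and binary cross-entropy on class probabilities) follows; it illustrates the shape of the loss only and is not the patent's implementation:

```python
import numpy as np

def bce(p, t, eps=1e-7):
    """Element-wise binary cross-entropy between predictions p and targets t."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))

def yolo_loss(pred, true, obj, lambda_coord=5.0, lambda_noobj=0.5):
    """pred/true: arrays [N, 4 coords + 1 conf + n_class probs];
    obj: [N] mask, 1 where a box is responsible for an object."""
    # Sum-of-squared-errors on coordinates, only for responsible boxes.
    coord = (obj * ((pred[:, :4] - true[:, :4]) ** 2).sum(axis=1)).sum()
    # BCE on confidence; no-object boxes are down-weighted by lambda_noobj.
    conf = bce(pred[:, 4], true[:, 4])
    conf = (obj * conf + lambda_noobj * (1.0 - obj) * conf).sum()
    # BCE on class probabilities, only for responsible boxes.
    cls = (obj * bce(pred[:, 5:], true[:, 5:]).sum(axis=1)).sum()
    return lambda_coord * coord + conf + cls

pred = np.array([[0.5, 0.5, 0.2, 0.2, 0.9, 0.8, 0.1],
                 [0.1, 0.1, 0.1, 0.1, 0.2, 0.5, 0.5]])
true = np.array([[0.6, 0.5, 0.2, 0.2, 1.0, 1.0, 0.0],
                 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
obj = np.array([1.0, 0.0])
loss = yolo_loss(pred, true, obj)
```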
In the improved YOLO v3 recognition model, Darknet53 is adopted as the feature-extraction network, with shortcut connections added; residual blocks are used extensively in the deeper layers, which on the one hand alleviates the gradient vanishing caused by network depth, and on the other hand allows the number of layers to be increased without degrading model accuracy, so that the expressive power of the deep network is improved.
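The effect of a shortcut connection can be illustrated minimally as follows; residual_block and its toy transformation are assumptions standing in for DarkNet53's actual convolutions:

```python
import numpy as np

def residual_block(x, weight):
    """Shortcut connection: output is x + F(x), so the identity path lets
    gradients bypass F and eases the vanishing-gradient effect of deep stacks."""
    f = np.maximum(0.0, x @ weight)  # toy F: linear transform + ReLU
    return x + f                     # identity shortcut added to F(x)

x = np.ones((1, 4))
w = np.zeros((4, 4))                 # F(x) = 0, so the block passes x through
y = residual_block(x, w)
```

When F contributes nothing, the block reduces to the identity, which is why stacking such blocks cannot make the network worse than a shallower one.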
The improved YOLO v3 recognition model performs multi-scale detection, with the network structure outputting 3 prediction results; in order to adapt to targets of different sizes and improve the detection rate of small targets, a prior frame is introduced, obtained by clustering with a 1 − IoU distance, and used as the regression reference.
In the improved YOLO v3 recognition model, a fully convolutional network is used, so the network contains no fully connected layers; this reduces the number of parameters, improves inference efficiency, and allows pictures of different sizes to be input.
During training of the improved YOLO v3 recognition model, the batch_size is set to 8, the learning rate starts at 1e-4 with exponential decay, and the Adam optimizer is selected, which reduces oscillation of the loss to a certain extent. Fig. 3 shows the scale record of the training process.
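The exponential learning-rate decay mentioned above might look like the following sketch; the decay rate and step interval are assumptions, as only the 1e-4 initial rate is stated in the text:

```python
def exp_decay_lr(step, initial_lr=1e-4, decay_rate=0.9, decay_steps=1000):
    """Exponential decay: the rate shrinks by decay_rate every decay_steps steps."""
    return initial_lr * decay_rate ** (step / decay_steps)

lr0 = exp_decay_lr(0)     # 1e-4 at the start of training
lr1 = exp_decay_lr(1000)  # decayed by one factor of 0.9
```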
The target detection module is used for inputting the prediction set of the marine target image into the marine target recognition model obtained by the model training module, performing target detection to obtain attribute information of the marine target image as a final target detection result; the attribute information includes at least a category.
The prediction input is shown in fig. 4, and the prediction result is shown in fig. 5.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.
Claims (10)
1. An image deep learning-based marine target recognition system, comprising:
the image preprocessing module is used for acquiring an offshore target image, respectively forming a sample set and a prediction set, preprocessing the sample set and constructing a training label;
the model training module is used for inputting the training labels of the marine target images into an improved YOLO v3 recognition model and training to obtain a marine target recognition model;
the target detection module is used for inputting the prediction set of the marine target image into the marine target recognition model obtained by the model training module, performing target detection to obtain attribute information of the marine target image as a final target detection result; the attribute information includes a category.
2. The system of claim 1, wherein the image pre-processing module performs image pre-processing on the sample set and constructs the training labels in a manner that:
s1.1, collecting a marine target image, and cutting a sample set of the marine target image into a fixed size;
s1.2, carrying out target-frame labeling on the marine ships in the sample set, wherein the information parameters of a target frame comprise x_center, y_center, w and h, corresponding respectively to the abscissa of the center point, the ordinate of the center point, the width of the target frame and the height of the target frame; the class of each target is given at the same time, and the labeled information is stored in an xml file;
s1.3, constructing a training label: dividing a picture into an S × S grid by using the YOLO v3 model, wherein each grid cell is responsible for predicting a target; the parameters of each predicted target comprise x_center, y_center, w and h, and each target frame is provided with n_anchor prior frames, so that the shape of the input training label is [batch_size, S, S, n_anchor × (5 + class)], wherein batch_size is the batch size and class is the number of target classes; the xml file is traversed, the index of the grid cell is calculated from the center coordinates of the target frame to serve as the index of the target, and the grid index is then filled into the xml file.
3. The system of claim 1, wherein the improved YOLO v3 recognition model is generated by parsing the initial YOLO v3 model based on marine target images, using DarkNet53 as a feature extraction network in a YOLO v3 network-based backbone network, with a residual network added; and introducing a prior frame, and obtaining the prior frame in a clustering mode to be used as a regression reference.
4. The system of claim 1, wherein the convolutional neural network of the improved YOLO v3 recognition model can perform convolution operations of different sizes on the input marine target images of the training set to form feature maps of the marine target images at different scales; the convolutional neural network learns features of the marine target image at different scales, and realizes multi-scale detection of marine targets.
5. The system of claim 1, wherein the collected marine target images are input into the improved YOLO v3 recognition model, which predicts three 3D tensors of different sizes, corresponding to three different scales.
6. The system of claim 1, wherein the marine target image to be detected is divided into an S × S grid, each grid cell predicting B rectangular boxes and the confidences of those boxes; wherein S represents the number of divided grid cells and B represents the number of boxes each grid cell is responsible for; and the marine target prior bounding box with the maximum confidence score is selected, and the position of the marine target image to be detected is predicted through a logistic regression function.
7. The system of claim 1, wherein the prediction output of the improved YOLO v3 recognition model comprises the grid-cell coordinates of the marine target and the width and height of the prior bounding box; the improved YOLO v3 recognition model predicts the score of each bounding box using logistic regression.
8. The system of claim 1, wherein in the improved YOLO v3 recognition model, a loss function is constructed as the criterion for measuring the error between predicted values and true values.
9. The system of claim 8, wherein for the coordinates of marine targets the loss function adopts a sum-of-squared-errors loss, the confidence and the class adopt a binary cross-entropy loss, predictions are made at 3 different scales, and 3 candidate target boxes are predicted at each scale.
10. The system of claim 2, wherein when the improved YOLO v3 recognition model is trained, a batch_size of 8 images is selected and the Adam optimizer is used.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111624780.3A CN114283323A (en) | 2021-12-28 | 2021-12-28 | Marine target recognition system based on image deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114283323A true CN114283323A (en) | 2022-04-05 |
Family
ID=80877466
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111624780.3A Pending CN114283323A (en) | 2021-12-28 | 2021-12-28 | Marine target recognition system based on image deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114283323A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110070142A (en) * | 2019-04-29 | 2019-07-30 | 上海大学 | A kind of marine vessel object detection method based on YOLO neural network |
CN111353445A (en) * | 2020-03-05 | 2020-06-30 | 三构科技(武汉)有限公司 | Patient assistance intelligent auditing system based on deep learning |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117557788A (en) * | 2024-01-12 | 2024-02-13 | 国研软件股份有限公司 | Marine target detection method and system based on motion prediction |
CN117557788B (en) * | 2024-01-12 | 2024-03-26 | 国研软件股份有限公司 | Marine target detection method and system based on motion prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||