CN112052817A - Improved YOLOv3 model side-scan sonar sunken ship target automatic identification method based on transfer learning - Google Patents

Improved YOLOv3 model side-scan sonar sunken ship target automatic identification method based on transfer learning

Info

Publication number
CN112052817A
CN112052817A
Authority
CN
China
Prior art keywords
model
prediction
sunken ship
image
scan sonar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010967912.1A
Other languages
Chinese (zh)
Other versions
CN112052817B (en)
Inventor
金绍华
汤寓麟
边刚
张永厚
王美娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PLA Dalian Naval Academy
Original Assignee
PLA Dalian Naval Academy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PLA Dalian Naval Academy filed Critical PLA Dalian Naval Academy
Priority to CN202010967912.1A priority Critical patent/CN112052817B/en
Publication of CN112052817A publication Critical patent/CN112052817A/en
Application granted granted Critical
Publication of CN112052817B publication Critical patent/CN112052817B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/20 Scenes; Scene-specific elements in augmented reality scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/30 Assessment of water resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

An improved YOLOv3 model side-scan sonar sunken ship target automatic identification method based on transfer learning, belonging to the technical fields of side-scan sonar image target identification and deep learning. The invention provides an improved YOLOv3 model side-scan sonar image sunken ship target identification method based on transfer learning, which removes the need for the manual interpretation and manual feature extraction of existing side-scan sonar image processing, and overcomes the poor small-target performance, high missed-alarm rate and low identification speed of the Faster R-CNN model. The identification and positioning precision of the sunken ship target is further improved, the model achieves a better convergence effect, and the overall performance of the model is finally improved to the point of real-time detection.

Description

Improved YOLOv3 model side-scan sonar sunken ship target automatic identification method based on transfer learning
Technical Field
The invention belongs to the technical fields of side-scan sonar image target identification and deep learning, and relates to an improved YOLOv3 model method for automatic identification of sunken ship targets in side-scan sonar images based on transfer learning; it is an improved recognition algorithm in the technical field of deep learning target identification, applied to sunken ship target identification in side-scan sonar images.
Background
Accurately, quickly and efficiently searching for sunken ships is an important component of maritime search and rescue and obstruction surveys. Side-scan sonar can detect seabed targets and plays a key role in emergency search and rescue. Side-scan sonar surveys generally adopt a towed measurement mode; constrained by vessel manoeuvring at sea and by the length of the tow cable, the towfish usually reaches a depth of only a few tens of metres, so surveys in deep-sea areas suffer from low resolution, indistinct target image features and poor sonar image quality. At present, side-scan sonar images are mainly interpreted manually, which places high demands on image resolution. An autonomous underwater vehicle (AUV) carrying side-scan sonar can dive close to the deep seabed and perform high-precision, high-resolution detection of sunken ships, making up for the shortcomings of ship-borne towed measurement; however, owing to the limitations of underwater acoustic communication, the scanned sunken ship data cannot be transmitted in real time, so search efficiency is low and the golden window for rescue is missed. Meanwhile, traditional manual interpretation suffers from low efficiency, long processing times, heavy resource consumption, strong subjective uncertainty and excessive dependence on experience. To overcome these problems and weaken the influence of subjective human factors, scholars at home and abroad have conducted extensive research, mainly identifying side-scan sonar image targets with basic image-processing algorithms, image-processing algorithms based on pulse-coupled neural networks (PCNN), and morphological image-processing algorithms, via median filtering, binarization, noise suppression, gain negative-feedback control, edge feature extraction, image enhancement, image segmentation and the like. Although some typical targets can be identified with manual intervention, the heavy manual involvement leaves these methods with difficult feature design, complex processing pipelines, low detection precision and reliability, and weak generalization ability.
In recent years, convolutional neural networks (CNN) have been widely used in fields such as target localization and detection, image classification and recognition, face verification, traffic sign recognition, and speech recognition. The inventors previously proposed identifying seabed sunken ship targets in side-scan sonar images with a Faster R-CNN model; its identification precision is high, but the region proposal network (RPN) takes too long to generate proposal boxes, so the model is slow and cannot meet the real-time requirement of sunken ship maritime search and rescue. Meanwhile, sunken ship targets generally occupy a small proportion of a side-scan sonar image and are small-scale targets; because the Faster R-CNN model performs regression prediction on the deep feature maps of the convolutional network, it loses part of the position information while acquiring rich semantic information, so its small-target recognition is poor and its missed-alarm rate is high.
Meanwhile, although convolutional neural networks are widely applied in many fields, their performance only shows when the network structure is relatively complex and the training samples are sufficient; a convolutional neural network usually has millions of parameters, so a large number of labelled samples are needed for training. Side-scan sonar sunken ship image data are scarce, so during training the model easily overfits, falls into a local optimum, and generalizes poorly.
Disclosure of Invention
The invention aims to provide an improved YOLOv3 model side-scan sonar image sunken ship target identification method based on transfer learning that removes the need for the manual interpretation and manual feature extraction of existing side-scan sonar image processing, and overcomes the poor small-target performance, high missed-alarm rate and low identification speed of the Faster R-CNN model. The identification and positioning precision of the sunken ship target is further improved, the model achieves a better convergence effect, and the overall performance of the model is finally improved to the point of real-time detection.
The technical scheme adopted by the invention for solving the technical problems is as follows:
an improved YOLOv3 model side-scan sonar sunken ship target automatic identification method based on transfer learning comprises the following steps:
Step 1: preprocess the side-scan sonar sunken ship image data set.
Step 1 comprises: (1) standardizing the pixels of every image in the data set, forcibly resizing images of inconsistent sizes to a common pixel resolution; (2) normalizing, converting to float32 format as floating-point numbers in the range 0-1; (3) cropping each image at several proportions in centre-crop mode and then enlarging back to the original image size; and (4) expanding the data set with data augmentation.
Step 2: re-cluster the prior boxes using a K-means clustering algorithm. The sunken ship targets in the side-scan sonar images of the data set are flat, elongated shapes, so the prior boxes of YOLOv3 are obtained with a K-means algorithm that takes intersection-over-union as its distance measure:

$$ d(b, o) = 1 - IOU(b, o) $$

$$ IOU(b, o) = \frac{area(b_{pt} \cap b_{gt})}{area(b_{pt} \cup b_{gt})} $$

In the formula: d(b, o) is the distance between prior box b and cluster centre o; IOU(b, o) is the intersection-over-union between the prior box b and the cluster box o; b_pt is the prior box; b_gt is the ground-truth box.
Multiple rounds of clustering yield prior boxes that conform relatively better to the shape characteristics of the sunken ship target.
Step 3: carry out multi-scale feature training with shallow feature fusion based on the YOLOv3 model. The shallow features learned at 4x and 2x downsampling of the YOLOv3 model are fused with the three scale features of the conventional YOLOv3 model (32x, 16x and 8x downsampling), so that the grey-level contour and texture information of the sunken ship learned in the shallow layers of YOLOv3 is fused with the deep semantic abstract features, giving the image representation richer information.
Step 4: add binary cross-entropy to compute the loss value. The adaptive learning rate Adam algorithm, which combines the Momentum and RMSProp algorithms, is adopted; it jointly considers the first moment estimation (the mean of the gradient) and the second moment estimation (the uncentred variance of the gradient) to compute the update step. The loss function of the model is:

$$
\begin{aligned}
Loss = {} & \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\Big[\big(\sigma(t_x)-\sigma(\hat{t}_x)\big)^2 + \big(\sigma(t_y)-\sigma(\hat{t}_y)\big)^2\Big] \\
& + \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\Big[\big(t_w-\hat{t}_w\big)^2 + \big(t_h-\hat{t}_h\big)^2\Big] \\
& + \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\, L\big(C_{ij},\hat{C}_{ij}\big) + \sum_{i=0}^{S^2}\sum_{j=0}^{B} G_{ij}\, L\big(C_{ij},\hat{C}_{ij}\big) \\
& + \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj} \sum_{c \in \mathrm{classes}} L\big(p_{ij}(c),\hat{p}_{ij}(c)\big)
\end{aligned}
$$

where x, y, w, h are the centre coordinates and width and height of the prediction box; S^2 is the number of grid cells the feature map is divided into; and B is the number of prediction boxes per grid cell. I_ij^obj = 1 when the j-th prediction box in the i-th grid cell is responsible for predicting an object, and I_ij^obj = 0 otherwise. When the j-th prediction box in the i-th grid cell is not responsible for predicting an object but its IOU with the ground-truth box exceeds a set threshold (here IOU = 0.5), G_ij = 0; otherwise G_ij = 1. t_x, t_y are the predicted bounding box centre offsets and \hat{t}_x, \hat{t}_y the ground-truth centre offsets; t_w, t_h are the predicted bounding box width-height scaling ratios and \hat{t}_w, \hat{t}_h the ground-truth scaling ratios; σ is the Sigmoid function, which compresses the computed value into [0, 1], keeping the target centre inside the predicted grid cell and preventing excessive drift. The first term is the centre-coordinate error between each responsible prediction box and the ground-truth box, and the second term is the corresponding width-height error. C is the predicted confidence, p is the class probability, and L is the binary cross-entropy function; the third and fourth terms are the confidence errors and the fifth term the classification error of the prediction boxes. The centre coordinates and width and height of the prediction box adopt a mean-square error whose errors are computed through the Sigmoid function σ; this is computationally heavy, slow to update parameters and slow to converge, and during back-propagation the gradient update amplitude is small, so vanishing gradients occur easily. The confidence and class errors are therefore computed with the binary cross-entropy function to achieve a better convergence effect:

$$ L(\hat{y}, y) = -\big[\hat{y}\ln y + (1-\hat{y})\ln(1-y)\big] $$

where y stands for the predicted confidence C or class probability p, and \hat{y} for its ground-truth value.
Step 5: perform model training with a transfer learning strategy.
On the basis of the model pretrained on the COCO data set, the weight parameters of the convolutional layers before the multi-scale feature fusion are frozen, and part of the convolutional layers, the fully connected layer and the Sigmoid output layer are initialized and retrained on the target data set; a specific flow chart is shown in FIG. 5.
Step 6: test the test set data with the trained model.
The invention has the beneficial effects that:
1. The YOLOv3 algorithm adopted by the invention greatly improves recognition speed through end-to-end training and detection, meeting the real-time requirement of sunken ship target identification.
2. The invention trains the model by transfer learning, sharing already-learned model parameters with the new model; this accelerates and optimizes the learning efficiency of the model, effectively relieves the limitation of the small data set, prevents overfitting and improves model performance.
3. The invention adopts multi-scale training with shallow feature fusion, which enriches the image information the algorithm learns while preserving detection efficiency, raises the degree of non-linearity, increases generalization ability, improves the network's identification and positioning precision for small-scale targets, and effectively reduces the missed-alarm rate on small-scale targets.
4. The invention re-clusters the prior box parameters and sizes with a K-means clustering algorithm to generate prior boxes better suited to the characteristics of the sunken ship data set, so that the predicted and true values obtain a better intersection-over-union (IOU) and target positioning precision improves.
5. The invention adds a binary cross-entropy function to compute the loss value, making the model parameters more robust, effectively preventing overfitting, and accelerating model convergence to achieve a better convergence effect.
Drawings
FIG. 1 is a diagram of a conventional YOLOv3 model architecture;
FIG. 2 is a diagram of Darknet-53 structure;
FIG. 3 shows examples of prior box coverage on a sunken ship target after re-clustering: (a) coverage of the original prior boxes, (b) coverage of the prior boxes after K-means re-clustering;
FIG. 4 is a schematic diagram of multi-scale feature training for shallow feature fusion;
FIG. 5 is a flow chart of transfer learning;
FIG. 6 is two YOLOv3 model loss values;
FIG. 7 shows the P-R plots of the three models: (a), (b) and (c) are the P-R curves of the Faster R-CNN, newly learned YOLOv3 and transfer learning YOLOv3 models, respectively;
FIG. 8 compares the detection results of the three models on some targets: (a-1), (b-1), (c-1) are the Faster R-CNN detection results on three different images; (a-2), (b-2), (c-2) are the newly learned YOLOv3 detection results on the same images; and (a-3), (b-3), (c-3) are the transfer learning YOLOv3 detection results.
Detailed Description
The following experiments of the present invention are described in detail with reference to the accompanying drawings:
Training and testing in this experiment were both implemented in Python under the TensorFlow framework. The experimental environment: the operating system is Linux Ubuntu 18.04; the CPU is an Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50 GHz; the GPU is an NVIDIA TITAN RTX with 24 GB memory.
The method is based on the YOLOv3 model, whose overall structure is shown in FIG. 1. The YOLOv3 model uses the Darknet-53 network (FIG. 2) to extract image features. The network mainly consists of 53 1x1 and 3x3 convolutional layers (Conv) placed ahead of the residual (Res) layers; each convolutional layer is followed by a BN layer and a LeakyReLU layer, and together they form the DBL unit, the basic component of the YOLOv3 network structure. The YOLOv3 model adds skip-connection layers and up-sampling layers on top of the Darknet-53 network, giving 75 convolutional layers in total.
Step 1: the original experimental data consist of 1000 images, comprising pictures provided by a marine surveying department and a side-scan sonar manufacturer together with web screenshots; the targets in the images were annotated with the open-source software LabelImg.
Step 2: preprocess the side-scan sonar sunken ship image data set.
Step 2 comprises: (1) standardizing the pixels of the whole data set, forcibly resizing images of inconsistent sizes to 416 x 416 pixels; (2) normalizing, converting to float32 format and dividing by 255 to obtain floating-point numbers in the range 0-1; (3) cropping 50%, 60%, 70%, 80% and 90% of each image in centre-crop mode and then enlarging back to the original image size; and (4) expanding the data set with data augmentation, including flipping, rotation, colour jitter, translation, contrast transformation and noise disturbance, as sketched below.
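To make steps (1)-(4) concrete, here is a minimal preprocessing and augmentation sketch in TensorFlow (the framework used in this experiment); the function names and augmentation parameters are illustrative assumptions, not the patent's code.

```python
import tensorflow as tf

def preprocess(image, crop_fraction=None, size=416):
    """Resize to 416 x 416, convert to float32 in [0, 1], and optionally
    centre-crop a fraction (0.5-0.9 in the experiment) before enlarging
    back to the working size. Hypothetical helper for illustration."""
    image = tf.image.resize(image, (size, size))             # unify pixel size
    image = tf.cast(image, tf.float32) / 255.0               # normalise to 0-1
    if crop_fraction is not None:
        image = tf.image.central_crop(image, crop_fraction)  # centre crop
        image = tf.image.resize(image, (size, size))         # enlarge back
    return image

def augment(image):
    """Data augmentation: flips, contrast/brightness jitter, noise disturbance.
    (Rotation/translation, and the matching box-label transforms, omitted.)"""
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_contrast(image, 0.8, 1.2)
    image = tf.image.random_brightness(image, max_delta=0.1)
    noise = tf.random.normal(tf.shape(image), stddev=0.01)   # noise disturbance
    return tf.clip_by_value(image + noise, 0.0, 1.0)
```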
After preprocessing, the data set contains 5000 images; 4000 images are randomly drawn as the training set using balanced sampling, and the remaining 1000 serve as the test set.
Step 3: re-cluster the prior boxes using a K-means clustering algorithm. The sunken ship targets in the side-scan sonar images of the data set are flat, elongated shapes, so continuing to use the prior boxes of the COCO data set is not conducive to identifying sunken ship targets. The traditional K-means algorithm adopts Euclidean distance as its similarity measure, but in a detection algorithm the purpose of a well-chosen preset anchor box is to let the predicted and true values obtain a better intersection-over-union (IOU). The prior boxes of YOLOv3 are therefore re-clustered with a K-means algorithm that uses IOU as the distance measure:

$$ d(b, o) = 1 - IOU(b, o) $$

$$ IOU(b, o) = \frac{area(b_{pt} \cap b_{gt})}{area(b_{pt} \cup b_{gt})} $$

In the formula: d(b, o) is the distance between prior box b and cluster centre o; IOU(b, o) is the intersection-over-union between the prior box b and the cluster box o; b_pt is the prior box; b_gt is the ground-truth box.
Averaging five clustering runs gave the prior boxes ((75, 55), (85, 30), (116, 76)); ((46, 24), (52, 17), (57, 25)); ((22, 13), (31, 12), (34, 41)). As shown in FIG. 3, the original prior boxes cannot adapt well to the seabed sunken ship targets of side-scan sonar images, while the re-clustered prior boxes conform relatively better to the shape characteristics of the sunken ship target. A sketch of the IOU-distance clustering follows.
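A minimal sketch of the clustering step, assuming `boxes` is an (N, 2) NumPy array of annotated box (width, height) pairs and that boxes are compared as if sharing a common centre; all names are illustrative.

```python
import numpy as np

def iou_wh(boxes, centers):
    """IOU between (w, h) boxes and cluster centres, centres aligned."""
    inter = (np.minimum(boxes[:, None, 0], centers[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centers[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] \
          + (centers[:, 0] * centers[:, 1])[None, :] - inter
    return inter / union

def kmeans_priors(boxes, k=9, iters=100, seed=0):
    """K-means with d(b, o) = 1 - IOU(b, o) as the distance measure."""
    rng = np.random.default_rng(seed)
    centers = boxes[rng.choice(len(boxes), k, replace=False)].astype(float)
    for _ in range(iters):
        assign = (1.0 - iou_wh(boxes, centers)).argmin(axis=1)  # nearest centre
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):   # converged
            break
        centers = new
    return centers[np.argsort(centers.prod(axis=1))]  # sort priors by area
```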
Step 4: carry out multi-scale feature training with shallow feature fusion. The invention performs multi-scale training with shallow feature fusion: specifically, as shown in FIG. 4, the features learned at 4x and 2x downsampling are fused with the traditional three scale features, so that the learned shallow information such as sunken ship contour and texture is fused with the deep semantic abstract features, increasing the proportion of features such as contour texture and grey-level change and giving the image representation richer information. Multi-scale fusion training of the shallow features guarantees the learning of both shallow and deep features while preserving detection efficiency, raises the degree of non-linearity, improves generalization ability, and improves the network's identification and positioning precision for small-scale targets. A minimal fusion sketch follows.
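One plausible reading of the fusion, sketched below with Keras layers, pools the 2x- and 4x-resolution backbone maps down to the 8x detection grid and concatenates them with that branch; the layer choices and channel counts are assumptions for illustration, not the patent's exact topology.

```python
from tensorflow.keras import layers

def fuse_shallow_features(c2, c4, p8, filters=256):
    """Fuse shallow contour/texture features (2x and 4x downsampling) into the
    8x detection branch of YOLOv3. Shapes: c2 (H/2, W/2, .), c4 (H/4, W/4, .),
    p8 (H/8, W/8, .). Illustrative sketch only."""
    s2 = layers.MaxPooling2D(pool_size=4)(c2)   # 2x grid -> 8x grid
    s4 = layers.MaxPooling2D(pool_size=2)(c4)   # 4x grid -> 8x grid
    x = layers.Concatenate()([p8, s4, s2])      # shallow cues + deep semantics
    x = layers.Conv2D(filters, 1, padding="same")(x)  # 1x1 conv mixes channels
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU(0.1)(x)             # DBL-style conv-BN-LeakyReLU
```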
Step 5: add binary cross-entropy to compute the loss value. The side-scan sonar sunken ship data set has few samples and its gradients carry heavy noise, so a suitable initial learning rate is hard to choose during model training: if the learning rate is too small, convergence is very slow; if it is too large, the loss value oscillates continually around, or even diverges from, the minimum; and a single learning rate cannot suit the learning of every parameter. To let the model learn image features in as much detail as possible and obtain optimal parameter values, the adaptive learning rate Adam algorithm, combining the Momentum and RMSProp algorithms, is adopted; it jointly considers the first moment estimation (the mean of the gradient) and the second moment estimation (the uncentred variance of the gradient) to compute the update step. Because parameter updates are not affected by the scale of the gradient, noisy samples are handled better, the learning rate adjusts automatically, robustness to the parameters is higher, the model converges better and overfitting is effectively prevented. The loss function of the YOLOv3 model is shown below.
$$
\begin{aligned}
Loss = {} & \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\Big[\big(\sigma(t_x)-\sigma(\hat{t}_x)\big)^2 + \big(\sigma(t_y)-\sigma(\hat{t}_y)\big)^2\Big] \\
& + \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\Big[\big(t_w-\hat{t}_w\big)^2 + \big(t_h-\hat{t}_h\big)^2\Big] \\
& + \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\, L\big(C_{ij},\hat{C}_{ij}\big) + \sum_{i=0}^{S^2}\sum_{j=0}^{B} G_{ij}\, L\big(C_{ij},\hat{C}_{ij}\big) \\
& + \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj} \sum_{c \in \mathrm{classes}} L\big(p_{ij}(c),\hat{p}_{ij}(c)\big)
\end{aligned}
$$

When the j-th prediction box in the i-th grid cell is responsible for predicting an object, I_ij^obj = 1; otherwise I_ij^obj = 0. When a prediction box is not responsible for predicting an object but its IOU with the ground-truth box exceeds the set threshold (here IOU = 0.5), G_ij = 0; otherwise G_ij = 1. x, y, w and h are the centre coordinates and width and height of the prediction box, S^2 is the number of grid cells the feature map is divided into, B is the number of prediction boxes per grid cell, C is the predicted confidence, and p is the class probability. The centre coordinates and width and height of the prediction box both adopt a mean-square error whose errors are computed through the Sigmoid function σ. Because the partial derivatives of the mean-square error with respect to the parameters are multiplied by the Sigmoid derivative σ', and σ' approaches 0 when its argument is very large or very small, the gradient update amplitude is small, parameters update slowly and convergence takes long. The confidence and class errors of the invention are therefore computed with the binary cross-entropy function to achieve a better convergence effect:

$$ L(\hat{y}, y) = -\big[\hat{y}\ln y + (1-\hat{y})\ln(1-y)\big] $$

where y stands for the predicted confidence C or class probability p, and \hat{y} for its ground-truth value; a sketch of this computation follows.
Step 6: perform model training with a transfer learning strategy. Retraining a complex convolutional neural network from scratch needs massive data, computing and time resources. Since related tasks share a certain correlation, knowledge obtained in a previous task can be applied to a new task after slight transformation, or even without any change; when a small amount of data makes common, effective knowledge hard to obtain in the new task, transfer learning shares the already-learned model parameters with the new model, accelerating and optimizing the model's learning efficiency, reducing repeated labour and the dependence on target-task training data, and improving model performance.
As the depth of the convolutional layers increases, a convolutional neural network learns increasingly abstract, target-specific features. Shallow features such as texture, contour and colour are universal shallow features learned by the shallow convolutional layers, so their transferability is high, whereas the convolutional layers after the multi-scale feature fusion are deep layers whose learned and extracted image features are abstract and transfer poorly. The invention therefore freezes the weight parameters of the convolutional layers before the multi-scale feature fusion, starting from the model pretrained on the COCO data set, and initializes and retrains the 59th, 67th and 75th convolutional layers, the fully connected layer and the Sigmoid output layer on the target data set; the specific flow chart is shown in FIG. 5, and a layer-freezing sketch is given below.
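A minimal layer-freezing sketch in Keras, assuming a hypothetical pretrained checkpoint file, a `pre_fusion` name prefix for the layers to freeze, and a `yolo_loss` wrapper around the loss of step 5; every name here is an assumption, not the patent's code.

```python
import tensorflow as tf

# Hypothetical COCO-pretrained checkpoint of the improved YOLOv3 model.
model = tf.keras.models.load_model("yolov3_coco_pretrained.h5", compile=False)

for layer in model.layers:
    # Freeze every layer before the multi-scale feature fusion; retrain the
    # rest (e.g. the 59th/67th/75th conv layers and the output head).
    layer.trainable = not layer.name.startswith("pre_fusion")

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss=yolo_loss)  # assumed (y_true, y_pred) wrapper of step 5's loss

# Mini-batch training as described in step 7 (batch size 64, 1000 epochs);
# x_train / y_train are the preprocessed arrays from steps 1-2.
model.fit(x_train, y_train, batch_size=64, epochs=1000)
```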
Step 7: a mini-batch gradient descent method is adopted: all pictures are fed to the model in 88 batches of 64 pictures each (the batch size), and training runs for 1000 epochs. The loss values of the two YOLOv3 models are shown in FIG. 6; both decrease continuously as training proceeds and finally level off. The newly learned YOLOv3 model stabilizes after 600 training steps, with the loss value finally holding at about 5.5. The transfer learning YOLOv3 model shares part of the shallow feature extraction parameters of the model trained on the COCO data set, so its initial loss value is low and falls quickly; but because the side-scan sonar sunken ship data set differs to some extent from the COCO data set, and the abstract features rely on a large number of widely differing parameters learned on COCO, the loss value fluctuates strongly during the first 350 training steps. Because the model obtains the position information of the target well, and cross-entropy is used for the error calculation, the loss value then stabilizes; it converges after 750 training steps and settles at about 4.3, lower than the loss value of the newly learned YOLOv3 model, which proves that the transfer learning based YOLOv3 model has better generalization ability.
Step 8: test the test set data with the trained model.
The evaluation criteria used in this experiment are the average precision (AP), an index reflecting the performance of the whole model that equals the area under the P-R (Precision-Recall) curve, and the harmonic mean F1. Precision (also called the precision rate) indicates how many of the detected targets are accurate and measures the correctness of the results; Recall (also called the recall rate) indicates how many of the true targets were detected and measures the completeness of the results.
$$ P = \frac{TP}{TP + FP} $$

$$ R = \frac{TP}{TP + FN} $$

Classified samples fall into four types according to the classification result: correctly classified positive samples (TP, true positives), incorrectly classified positive samples (FP, false positives), correctly classified negative samples (TN, true negatives) and incorrectly classified negative samples (FN, false negatives). TP + FP is the total number of samples classified as positive, and TP + FN is the total number of actual positive samples. The AP is accordingly defined by

$$ AP = \int_0^1 P(R)\,dR $$

A metrics sketch follows.
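These metrics can be computed as in the sketch below, where `scores` are detection confidences, `is_tp` is a boolean array flagging each detection as a true positive, and `n_gt` is the number of ground-truth targets; the names are illustrative.

```python
import numpy as np

def precision_recall_ap(scores, is_tp, n_gt):
    """P = TP/(TP+FP), R = TP/(TP+FN); AP is the area under the P-R curve."""
    order = np.argsort(-scores)        # rank detections by confidence
    tp = np.cumsum(is_tp[order])       # cumulative true positives
    fp = np.cumsum(~is_tp[order])      # cumulative false positives
    precision = tp / (tp + fp)
    recall = tp / n_gt                 # TP + FN = all real targets
    ap = np.trapz(precision, recall)   # integrate P over R
    return precision[-1], recall[-1], ap

def f1_score(p, r):
    """Harmonic mean of precision and recall: F1 = 2PR / (P + R)."""
    return 2 * p * r / (p + r)
```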
The P-R curves of the Faster R-CNN model, the newly learned YOLOv3 model and the transfer learning YOLOv3 model are shown in FIG. 7; the larger the area between a curve and the coordinate axes, the better the model. Comparing the P-R curves and AP values of the three models, the AP values are 87.72%, 89.18% and 89.49% respectively: the AP of the YOLOv3 models is clearly higher than that of the Faster R-CNN model, and the average precision of the transfer learning based YOLOv3 model is 0.31% higher than that of the newly learned YOLOv3 model. The precision of the Faster R-CNN model reaches 88% at a recall of 85%, but drops sharply as recall rises further. The YOLOv3 models decline more slowly, keeping high precision while keeping high recall: the newly learned YOLOv3 holds 89% precision at 90% recall, while the transfer learning based YOLOv3 model, whose P-R curve falls even more gently and encloses a larger area with the coordinate axes, reaches 91% precision at 90% recall. The transfer learning based YOLOv3 model therefore identifies side-scan sonar sunken ship targets better.
F1 is the harmonic mean of precision and recall and characterizes the overall performance of the algorithm:

$$ F1 = \frac{2PR}{P + R} $$

Here both the confidence threshold and the IOU threshold are set to 0.5. The test results of the three models are shown in Table 1.

Table 1 Comparison of the test results of the three models
As can be seen from Table 1, the identification precision of the Faster R-CNN model is 2.37% higher than that of the newly learned YOLOv3 model, but its recall is 6% lower, which shows that Faster R-CNN performs worse than YOLOv3 on small-target detection; its F1 value is also 2.33% lower, so on overall performance the newly learned YOLOv3 model is superior to the Faster R-CNN model. The precision, recall, AP value and F1 of the transfer learning based YOLOv3 model are all higher than those of the other two models: its AP value is 1.77% and 0.31% higher than those of the Faster R-CNN and newly learned YOLOv3 models respectively, and its F1 value is 3.96% and 1.63% higher. Detection speed is also an important index of a model's overall performance; as the table shows, Faster R-CNN takes 2.8 s to detect one picture while the YOLOv3 model takes 0.17 s, only about 3/50 of the Faster R-CNN time, greatly improving detection efficiency. The overall performance of the transfer learning YOLOv3 model is thus clearly superior to the Faster R-CNN model and improved to a certain extent over the newly learned YOLOv3 model.
FIG. 8 compares the detection results of the three models on sunken ship targets in several side-scan sonar images; from left to right the columns show the Faster R-CNN, newly learned YOLOv3 and transfer learning YOLOv3 results. In (a-1) to (a-3) of FIG. 8 the sunken ship targets differ in size, giving a diversity of scales. In terms of recognition, the Faster R-CNN model identifies large-scale sunken ship targets well, but its recognition of small-scale sunken ship targets is poor and its missed-alarm rate high. The newly learned YOLOv3 model improves considerably on Faster R-CNN in small-target recognition; the red boxes mark missed targets. The transfer learning YOLOv3 model further improves small-scale target recognition over learning from scratch, although when sunken ship targets are closely spaced its positioning precision drops somewhat and two sunken ship targets are falsely detected as one; in general, however, the YOLOv3 models identify and distinguish small-scale targets better and greatly reduce the missed-alarm rate. As (b-1) to (b-3) and (c-1) to (c-3) of FIG. 8 show, the detection boxes of the transfer learning based YOLOv3 model have a higher intersection-over-union with the ground-truth boxes and more accurate positioning: the IOUs in (b-1) to (b-3) are 69.92%, 75.93% and 86.09%, and in (c-1) to (c-3) 77.03%, 69.32% and 91.15%. Meanwhile, the confidences of the three models on the sunken ship targets of (b-1) to (b-3) are 98.88%, 98.97% and 99.07%, and on (c-1) to (c-3) 96.51%, 94.45% and 99.42%, showing that the transfer learning based YOLOv3 model has higher recognition accuracy and positioning precision, and that its overall performance is better than the other two models.

Claims (1)

1. An improved YOLOv3 model side-scan sonar sunken ship target automatic identification method based on transfer learning is characterized by comprising the following steps:
step 1: preprocessing the image data set of the side-scan sonar sunken ship;
step 1 comprises: (1) standardizing the pixels of every image in the data set, forcibly resizing images of inconsistent sizes to a common pixel resolution; (2) normalizing, converting to float32 format as floating-point numbers in the range 0-1; (3) cropping each image at several proportions in centre-crop mode and then enlarging back to the original image size; and (4) expanding the data set with data augmentation;
step 2: re-clustering the prior boxes with a K-means clustering algorithm; the sunken ship targets in the side-scan sonar images of the data set are flat, elongated shapes, so the prior boxes of YOLOv3 are obtained with a K-means algorithm taking intersection-over-union as the distance measure, the distance formula being

$$ d(b, o) = 1 - IOU(b, o) $$

$$ IOU(b, o) = \frac{area(b_{pt} \cap b_{gt})}{area(b_{pt} \cup b_{gt})} $$

in the formula: d(b, o) is the distance between prior box b and cluster centre o; IOU(b, o) is the intersection-over-union between the prior box b and the cluster box o; b_pt is the prior box; b_gt is the ground-truth box;
obtaining, through multiple rounds of clustering, prior boxes that conform relatively better to the shape characteristics of the sunken ship target;
step 3: carrying out multi-scale feature training with shallow feature fusion based on the YOLOv3 model; the shallow features learned at 4x and 2x downsampling of the YOLOv3 model are fused with the three scale features of the conventional YOLOv3 model, so that the grey-level contour and texture information of the sunken ship learned in the shallow layers of YOLOv3 is fused with the deep semantic abstract features, giving the image representation richer information;
step 4: adding binary cross-entropy to compute the loss value; the adaptive learning rate Adam algorithm combining the Momentum and RMSProp algorithms is adopted, jointly considering the first moment estimation and the second moment estimation of the gradient to compute the update step; the loss function of the model is

$$
\begin{aligned}
Loss = {} & \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\Big[\big(\sigma(t_x)-\sigma(\hat{t}_x)\big)^2 + \big(\sigma(t_y)-\sigma(\hat{t}_y)\big)^2\Big] \\
& + \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\Big[\big(t_w-\hat{t}_w\big)^2 + \big(t_h-\hat{t}_h\big)^2\Big] \\
& + \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\, L\big(C_{ij},\hat{C}_{ij}\big) + \sum_{i=0}^{S^2}\sum_{j=0}^{B} G_{ij}\, L\big(C_{ij},\hat{C}_{ij}\big) \\
& + \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj} \sum_{c \in \mathrm{classes}} L\big(p_{ij}(c),\hat{p}_{ij}(c)\big)
\end{aligned}
$$

where x, y, w, h are the centre coordinates and width and height of the prediction box; S^2 is the number of grid cells the feature map is divided into, and B is the number of prediction boxes per grid cell; I_ij^obj = 1 when the j-th prediction box in the i-th grid cell is responsible for predicting an object, and I_ij^obj = 0 otherwise; when the j-th prediction box in the i-th grid cell is not responsible for predicting an object but its IOU with the ground-truth box exceeds a set threshold, G_ij = 0, otherwise G_ij = 1; t_x, t_y are the predicted bounding box centre offsets and \hat{t}_x, \hat{t}_y the ground-truth centre offsets; t_w, t_h are the predicted bounding box width-height scaling ratios and \hat{t}_w, \hat{t}_h the ground-truth scaling ratios; σ is the Sigmoid function, which compresses the computed value into [0, 1], ensuring the target centre lies in the predicted grid cell and preventing excessive drift; the first term of the loss is the centre-coordinate error between each responsible prediction box and the ground-truth box, and the second term is the corresponding width-height error; C is the predicted confidence, p is the class probability, and L is the binary cross-entropy function; the third and fourth terms are the confidence errors and the fifth term the classification error of the prediction boxes; the centre coordinates and width and height of the prediction box adopt a mean-square error whose errors are computed through the Sigmoid function σ, which is computationally heavy, slow to update parameters and slow to converge, and during back-propagation the gradient update amplitude is small, so vanishing gradients occur easily; the confidence and class errors are therefore computed with the binary cross-entropy function to achieve a better convergence effect:

$$ L(\hat{y}, y) = -\big[\hat{y}\ln y + (1-\hat{y})\ln(1-y)\big] $$

where y stands for the predicted confidence C or class probability p, and \hat{y} for its ground-truth value;
step 5: performing model training with a transfer learning strategy;
on the basis of the model pretrained on the COCO data set, the weight parameters of the convolutional layers before the multi-scale feature fusion are frozen, and part of the convolutional layers, the fully connected layer and the Sigmoid output layer are initialized and retrained on the target data set, the specific flow being shown in FIG. 5;
step 6: and testing the data of the test set by using the trained model.
CN202010967912.1A 2020-09-15 2020-09-15 Improved YOLOv3 model side-scan sonar sunken ship target automatic identification method based on transfer learning Active CN112052817B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010967912.1A CN112052817B (en) 2020-09-15 2020-09-15 Improved YOLOv3 model side-scan sonar sunken ship target automatic identification method based on transfer learning

Publications (2)

Publication Number Publication Date
CN112052817A true CN112052817A (en) 2020-12-08
CN112052817B CN112052817B (en) 2023-09-05

Family

ID=73603994

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010967912.1A Active CN112052817B (en) 2020-09-15 2020-09-15 Improved YOLOv3 model side-scan sonar sunken ship target automatic identification method based on transfer learning

Country Status (1)

Country Link
CN (1) CN112052817B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447034A (en) * 2018-11-14 2019-03-08 北京信息科技大学 Traffic mark detection method in automatic Pilot based on YOLOv3 network
CN110147807A (en) * 2019-01-04 2019-08-20 上海海事大学 A kind of ship intelligent recognition tracking
CN110991516A (en) * 2019-11-28 2020-04-10 哈尔滨工程大学 Side-scan sonar image target classification method based on style migration
CN111222574A (en) * 2020-01-07 2020-06-02 西北工业大学 Ship and civil ship target detection and classification method based on multi-model decision-level fusion
CN111460894A (en) * 2020-03-03 2020-07-28 温州大学 Intelligent car logo detection method based on convolutional neural network

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112598040A (en) * 2020-12-16 2021-04-02 浙江方圆检测集团股份有限公司 Switch consistency real-time detection method based on deep learning
CN112613504A (en) * 2020-12-17 2021-04-06 上海大学 Sonar underwater target detection method
CN113065446A (en) * 2021-03-29 2021-07-02 青岛东坤蔚华数智能源科技有限公司 Depth inspection method for automatically identifying ship corrosion area
CN113077017A (en) * 2021-05-24 2021-07-06 河南大学 Synthetic aperture image classification method based on impulse neural network
CN113343964A (en) * 2021-08-09 2021-09-03 湖南汇视威智能科技有限公司 Balanced underwater acoustic image target detection method
CN114758237A (en) * 2022-04-19 2022-07-15 哈尔滨工程大学 Construction method, detection method and construction device of automatic water delivery tunnel defect identification model, computer and storage medium
CN114677568A (en) * 2022-05-30 2022-06-28 山东极视角科技有限公司 Linear target detection method, module and system based on neural network

Also Published As

Publication number Publication date
CN112052817B (en) 2023-09-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant