CN112052817A - Improved YOLOv3 model side-scan sonar sunken ship target automatic identification method based on transfer learning - Google Patents

Improved YOLOv3 model side-scan sonar sunken ship target automatic identification method based on transfer learning

Info

Publication number
CN112052817A
CN112052817A
Authority
CN
China
Prior art keywords
model
prediction
sunken ship
image
scan sonar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010967912.1A
Other languages
Chinese (zh)
Other versions
CN112052817B (en)
Inventor
金绍华
汤寓麟
边刚
张永厚
王美娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PLA Dalian Naval Academy
Original Assignee
PLA Dalian Naval Academy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PLA Dalian Naval Academy filed Critical PLA Dalian Naval Academy
Priority to CN202010967912.1A priority Critical patent/CN112052817B/en
Publication of CN112052817A publication Critical patent/CN112052817A/en
Application granted granted Critical
Publication of CN112052817B publication Critical patent/CN112052817B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/20 Scenes; Scene-specific elements in augmented reality scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/30 Assessment of water resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

An improved YOLOv3 model side-scan sonar sunken ship target automatic identification method based on transfer learning, belonging to the technical fields of side-scan sonar image target identification and deep learning. The invention provides an improved YOLOv3 model side-scan sonar image sunken ship target identification method based on transfer learning, which removes the need for the manual interpretation and manual feature extraction of existing side-scan sonar image processing, and overcomes the poor small-target performance, high missed-alarm rate and low identification speed of the Faster R-CNN model. The identification and positioning precision of the sunken ship target is further improved, the model achieves a better convergence effect, and the overall performance of the model is finally improved to the point of real-time detection.

Description

Improved YOLOv3 model side-scan sonar sunken ship target automatic identification method based on transfer learning
Technical Field
The invention belongs to the technical fields of side-scan sonar image target identification and deep learning, and relates to an improved YOLOv3 model method for automatic identification of sunken ship targets in side-scan sonar images based on transfer learning; it is an improved recognition algorithm in the technical field of deep learning target identification, applied to sunken ship target identification in side-scan sonar images.
Background
Accurately, quickly and efficiently searching for sunken ships is an important component of maritime search and rescue and obstruction surveys. Side-scan sonar can detect seabed targets and plays a key role in emergency search and rescue. Side-scan sonar surveys generally adopt a towed measurement mode; constrained by vessel manoeuvring at sea and by the length of the tow cable, the towfish usually reaches a depth of only a few tens of metres, so surveys in deep-sea areas suffer from low resolution, indistinct target image features and poor sonar image quality. At present, side-scan sonar images are mainly interpreted manually, which places high demands on image resolution. An autonomous underwater vehicle (AUV) carrying side-scan sonar can dive close to the deep seabed and perform high-precision, high-resolution detection of sunken ships, making up for the shortcomings of ship-borne towed measurement; however, owing to the limitations of underwater acoustic communication, the scanned sunken ship data cannot be transmitted in real time, so search efficiency is low and the golden window for rescue is missed. Meanwhile, traditional manual interpretation suffers from low efficiency, long processing times, heavy resource consumption, strong subjective uncertainty and excessive dependence on experience. To overcome these problems and weaken the influence of subjective human factors, scholars at home and abroad have conducted extensive research, mainly identifying side-scan sonar image targets with basic image-processing algorithms, image-processing algorithms based on pulse-coupled neural networks (PCNN), and morphological image-processing algorithms, via median filtering, binarization, noise suppression, gain negative-feedback control, edge feature extraction, image enhancement, image segmentation and the like. Although some typical targets can be identified with manual intervention, the heavy manual involvement leaves these methods with difficult feature design, complex processing pipelines, low detection precision and reliability, and weak generalization ability.
In recent years, convolutional neural networks (CNN) have been widely used in fields such as target localization and detection, image classification and recognition, face verification, traffic sign recognition, and speech recognition. The inventors previously proposed identifying seabed sunken ship targets in side-scan sonar images with a Faster R-CNN model; its identification precision is high, but the region proposal network (RPN) takes too long to generate proposal boxes, so the model is slow and cannot meet the real-time requirement of sunken ship maritime search and rescue. Meanwhile, sunken ship targets generally occupy a small proportion of a side-scan sonar image and are small-scale targets; because the Faster R-CNN model performs regression prediction on the deep feature maps of the convolutional network, it loses part of the position information while acquiring rich semantic information, so its small-target recognition is poor and its missed-alarm rate is high.
Meanwhile, although convolutional neural networks are widely applied in many fields, their performance only shows when the network structure is relatively complex and the training samples are sufficient; a convolutional neural network usually has millions of parameters, so a large number of labelled samples are needed for training. Side-scan sonar sunken ship image data are scarce, so during training the model easily overfits, falls into a local optimum, and generalizes poorly.
Disclosure of Invention
The invention aims to provide an improved YOLOv3 model side-scan sonar image sunken ship target identification method based on transfer learning that removes the need for the manual interpretation and manual feature extraction of existing side-scan sonar image processing, and overcomes the poor small-target performance, high missed-alarm rate and low identification speed of the Faster R-CNN model. The identification and positioning precision of the sunken ship target is further improved, the model achieves a better convergence effect, and the overall performance of the model is finally improved to the point of real-time detection.
The technical scheme adopted by the invention for solving the technical problems is as follows:
an improved YOLOv3 model side-scan sonar sunken ship target automatic identification method based on transfer learning comprises the following steps:
Step 1: preprocess the side-scan sonar sunken ship image data set.
Step 1 comprises: (1) standardizing the pixels of every image in the data set, forcibly resizing images of inconsistent sizes to a common pixel resolution; (2) normalizing, converting to float32 format as floating-point numbers in the range 0-1; (3) cropping each image at several proportions in centre-crop mode and then enlarging back to the original image size; and (4) expanding the data set with data augmentation.
Step 2: re-cluster the prior boxes using a K-means clustering algorithm. The sunken ship targets in the side-scan sonar images of the data set are flat, elongated shapes, so the prior boxes of YOLOv3 are obtained with a K-means algorithm that takes intersection-over-union as its distance measure:

$$ d(b, o) = 1 - IOU(b, o) $$

$$ IOU(b, o) = \frac{area(b_{pt} \cap b_{gt})}{area(b_{pt} \cup b_{gt})} $$

In the formula: d(b, o) is the distance between prior box b and cluster centre o; IOU(b, o) is the intersection-over-union between the prior box b and the cluster box o; b_pt is the prior box; b_gt is the ground-truth box.
Multiple rounds of clustering yield prior boxes that conform relatively better to the shape characteristics of the sunken ship target.
Step 3: carry out multi-scale feature training with shallow feature fusion based on the YOLOv3 model. The shallow features learned at 4x and 2x downsampling of the YOLOv3 model are fused with the three scale features of the conventional YOLOv3 model (32x, 16x and 8x downsampling), so that the grey-level contour and texture information of the sunken ship learned in the shallow layers of YOLOv3 is fused with the deep semantic abstract features, giving the image representation richer information.
Step 4: add binary cross-entropy to compute the loss value. The adaptive learning rate Adam algorithm, which combines the Momentum and RMSProp algorithms, is adopted; it jointly considers the first moment estimation (the mean of the gradient) and the second moment estimation (the uncentred variance of the gradient) to compute the update step. The loss function of the model is:

$$
\begin{aligned}
Loss = {} & \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\Big[\big(\sigma(t_x)-\sigma(\hat{t}_x)\big)^2 + \big(\sigma(t_y)-\sigma(\hat{t}_y)\big)^2\Big] \\
& + \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\Big[\big(t_w-\hat{t}_w\big)^2 + \big(t_h-\hat{t}_h\big)^2\Big] \\
& + \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\, L\big(C_{ij},\hat{C}_{ij}\big) + \sum_{i=0}^{S^2}\sum_{j=0}^{B} G_{ij}\, L\big(C_{ij},\hat{C}_{ij}\big) \\
& + \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj} \sum_{c \in \mathrm{classes}} L\big(p_{ij}(c),\hat{p}_{ij}(c)\big)
\end{aligned}
$$

where x, y, w, h are the centre coordinates and width and height of the prediction box; S^2 is the number of grid cells the feature map is divided into; and B is the number of prediction boxes per grid cell. I_ij^obj = 1 when the j-th prediction box in the i-th grid cell is responsible for predicting an object, and I_ij^obj = 0 otherwise. When the j-th prediction box in the i-th grid cell is not responsible for predicting an object but its IOU with the ground-truth box exceeds a set threshold (here IOU = 0.5), G_ij = 0; otherwise G_ij = 1. t_x, t_y are the predicted bounding box centre offsets and \hat{t}_x, \hat{t}_y the ground-truth centre offsets; t_w, t_h are the predicted bounding box width-height scaling ratios and \hat{t}_w, \hat{t}_h the ground-truth scaling ratios; σ is the Sigmoid function, which compresses the computed value into [0, 1], keeping the target centre inside the predicted grid cell and preventing excessive drift. The first term is the centre-coordinate error between each responsible prediction box and the ground-truth box, and the second term is the corresponding width-height error. C is the predicted confidence, p is the class probability, and L is the binary cross-entropy function; the third and fourth terms are the confidence errors and the fifth term the classification error of the prediction boxes. The centre coordinates and width and height of the prediction box adopt a mean-square error whose errors are computed through the Sigmoid function σ; this is computationally heavy, slow to update parameters and slow to converge, and during back-propagation the gradient update amplitude is small, so vanishing gradients occur easily. The confidence and class errors are therefore computed with the binary cross-entropy function to achieve a better convergence effect:

$$ L(\hat{y}, y) = -\big[\hat{y}\ln y + (1-\hat{y})\ln(1-y)\big] $$

where y stands for the predicted confidence C or class probability p, and \hat{y} for its ground-truth value.
Step 5: perform model training with a transfer learning strategy.
On the basis of the model pretrained on the COCO data set, the weight parameters of the convolutional layers before the multi-scale feature fusion are frozen, and part of the convolutional layers, the fully connected layer and the Sigmoid output layer are initialized and retrained on the target data set; a specific flow chart is shown in FIG. 5.
Step 6: test the test set data with the trained model.
The invention has the beneficial effects that:
1. The YOLOv3 algorithm adopted by the invention greatly improves recognition speed through end-to-end training and detection, meeting the real-time requirement of sunken ship target identification.
2. The invention trains the model by transfer learning, sharing already-learned model parameters with the new model; this accelerates and optimizes the learning efficiency of the model, effectively relieves the limitation of the small data set, prevents overfitting and improves model performance.
3. The invention adopts multi-scale training with shallow feature fusion, which enriches the image information the algorithm learns while preserving detection efficiency, raises the degree of non-linearity, increases generalization ability, improves the network's identification and positioning precision for small-scale targets, and effectively reduces the missed-alarm rate on small-scale targets.
4. The invention re-clusters the prior box parameters and sizes with a K-means clustering algorithm to generate prior boxes better suited to the characteristics of the sunken ship data set, so that the predicted and true values obtain a better intersection-over-union (IOU) and target positioning precision improves.
5. The invention adds a binary cross-entropy function to compute the loss value, making the model parameters more robust, effectively preventing overfitting, and accelerating model convergence to achieve a better convergence effect.
Drawings
FIG. 1 is a diagram of a conventional YOLOv3 model architecture;
FIG. 2 is a diagram of Darknet-53 structure;
FIG. 3 shows examples of prior box coverage on a sunken ship target after re-clustering: (a) coverage of the original prior boxes, (b) coverage of the prior boxes after K-means re-clustering;
FIG. 4 is a schematic diagram of multi-scale feature training for shallow feature fusion;
FIG. 5 is a flow chart of transfer learning;
FIG. 6 is two YOLOv3 model loss values;
FIG. 7 shows the P-R plots of the three models: (a), (b) and (c) are the P-R curves of the Faster R-CNN, newly learned YOLOv3 and transfer learning YOLOv3 models, respectively;
FIG. 8 compares the detection results of the three models on some targets: (a-1), (b-1), (c-1) are the Faster R-CNN detection results on three different images; (a-2), (b-2), (c-2) are the newly learned YOLOv3 detection results on the same images; and (a-3), (b-3), (c-3) are the transfer learning YOLOv3 detection results.
Detailed Description
The following experiments of the present invention are described in detail with reference to the accompanying drawings:
Training and testing in this experiment were both implemented in Python under the TensorFlow framework. The experimental environment: the operating system is Linux Ubuntu 18.04; the CPU is an Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50 GHz; the GPU is an NVIDIA TITAN RTX with 24 GB memory.
The method is based on the YOLOv3 model, whose overall structure is shown in FIG. 1. The YOLOv3 model uses the Darknet-53 network (FIG. 2) to extract image features. The network mainly consists of 53 1x1 and 3x3 convolutional layers (Conv) placed ahead of the residual (Res) layers; each convolutional layer is followed by a BN layer and a LeakyReLU layer, and together they form the DBL unit, the basic component of the YOLOv3 network structure. The YOLOv3 model adds skip-connection layers and up-sampling layers on top of the Darknet-53 network, giving 75 convolutional layers in total.
Step 1: the original experimental data consist of 1000 images, comprising pictures provided by a marine surveying department and a side-scan sonar manufacturer together with web screenshots; the targets in the images were annotated with the open-source software LabelImg.
Step 2: preprocess the side-scan sonar sunken ship image data set.
Step 2 comprises: (1) standardizing the pixels of the whole data set, forcibly resizing images of inconsistent sizes to 416 x 416 pixels; (2) normalizing, converting to float32 format and dividing by 255 to obtain floating-point numbers in the range 0-1; (3) cropping 50%, 60%, 70%, 80% and 90% of each image in centre-crop mode and then enlarging back to the original image size; and (4) expanding the data set with data augmentation, including flipping, rotation, colour jitter, translation, contrast transformation and noise disturbance, as sketched below.
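To make steps (1)-(4) concrete, here is a minimal preprocessing and augmentation sketch in TensorFlow (the framework used in this experiment); the function names and augmentation parameters are illustrative assumptions, not the patent's code.

```python
import tensorflow as tf

def preprocess(image, crop_fraction=None, size=416):
    """Resize to 416 x 416, convert to float32 in [0, 1], and optionally
    centre-crop a fraction (0.5-0.9 in the experiment) before enlarging
    back to the working size. Hypothetical helper for illustration."""
    image = tf.image.resize(image, (size, size))             # unify pixel size
    image = tf.cast(image, tf.float32) / 255.0               # normalise to 0-1
    if crop_fraction is not None:
        image = tf.image.central_crop(image, crop_fraction)  # centre crop
        image = tf.image.resize(image, (size, size))         # enlarge back
    return image

def augment(image):
    """Data augmentation: flips, contrast/brightness jitter, noise disturbance.
    (Rotation/translation, and the matching box-label transforms, omitted.)"""
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_contrast(image, 0.8, 1.2)
    image = tf.image.random_brightness(image, max_delta=0.1)
    noise = tf.random.normal(tf.shape(image), stddev=0.01)   # noise disturbance
    return tf.clip_by_value(image + noise, 0.0, 1.0)
```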
After preprocessing, the data set contains 5000 images; 4000 images are randomly drawn as the training set using balanced sampling, and the remaining 1000 serve as the test set.
Step 3: re-cluster the prior boxes using a K-means clustering algorithm. The sunken ship targets in the side-scan sonar images of the data set are flat, elongated shapes, so continuing to use the prior boxes of the COCO data set is not conducive to identifying sunken ship targets. The traditional K-means algorithm adopts Euclidean distance as its similarity measure, but in a detection algorithm the purpose of a well-chosen preset anchor box is to let the predicted and true values obtain a better intersection-over-union (IOU). The prior boxes of YOLOv3 are therefore re-clustered with a K-means algorithm that uses IOU as the distance measure:

$$ d(b, o) = 1 - IOU(b, o) $$

$$ IOU(b, o) = \frac{area(b_{pt} \cap b_{gt})}{area(b_{pt} \cup b_{gt})} $$

In the formula: d(b, o) is the distance between prior box b and cluster centre o; IOU(b, o) is the intersection-over-union between the prior box b and the cluster box o; b_pt is the prior box; b_gt is the ground-truth box.
Averaging five clustering runs gave the prior boxes ((75, 55), (85, 30), (116, 76)); ((46, 24), (52, 17), (57, 25)); ((22, 13), (31, 12), (34, 41)). As shown in FIG. 3, the original prior boxes cannot adapt well to the seabed sunken ship targets of side-scan sonar images, while the re-clustered prior boxes conform relatively better to the shape characteristics of the sunken ship target. A sketch of the IOU-distance clustering follows.
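A minimal sketch of the clustering step, assuming `boxes` is an (N, 2) NumPy array of annotated box (width, height) pairs and that boxes are compared as if sharing a common centre; all names are illustrative.

```python
import numpy as np

def iou_wh(boxes, centers):
    """IOU between (w, h) boxes and cluster centres, centres aligned."""
    inter = (np.minimum(boxes[:, None, 0], centers[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centers[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] \
          + (centers[:, 0] * centers[:, 1])[None, :] - inter
    return inter / union

def kmeans_priors(boxes, k=9, iters=100, seed=0):
    """K-means with d(b, o) = 1 - IOU(b, o) as the distance measure."""
    rng = np.random.default_rng(seed)
    centers = boxes[rng.choice(len(boxes), k, replace=False)].astype(float)
    for _ in range(iters):
        assign = (1.0 - iou_wh(boxes, centers)).argmin(axis=1)  # nearest centre
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):   # converged
            break
        centers = new
    return centers[np.argsort(centers.prod(axis=1))]  # sort priors by area
```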
Step 4: carry out multi-scale feature training with shallow feature fusion. The invention performs multi-scale training with shallow feature fusion: specifically, as shown in FIG. 4, the features learned at 4x and 2x downsampling are fused with the traditional three scale features, so that the learned shallow information such as sunken ship contour and texture is fused with the deep semantic abstract features, increasing the proportion of features such as contour texture and grey-level change and giving the image representation richer information. Multi-scale fusion training of the shallow features guarantees the learning of both shallow and deep features while preserving detection efficiency, raises the degree of non-linearity, improves generalization ability, and improves the network's identification and positioning precision for small-scale targets. A minimal fusion sketch follows.
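One plausible reading of the fusion, sketched below with Keras layers, pools the 2x- and 4x-resolution backbone maps down to the 8x detection grid and concatenates them with that branch; the layer choices and channel counts are assumptions for illustration, not the patent's exact topology.

```python
from tensorflow.keras import layers

def fuse_shallow_features(c2, c4, p8, filters=256):
    """Fuse shallow contour/texture features (2x and 4x downsampling) into the
    8x detection branch of YOLOv3. Shapes: c2 (H/2, W/2, .), c4 (H/4, W/4, .),
    p8 (H/8, W/8, .). Illustrative sketch only."""
    s2 = layers.MaxPooling2D(pool_size=4)(c2)   # 2x grid -> 8x grid
    s4 = layers.MaxPooling2D(pool_size=2)(c4)   # 4x grid -> 8x grid
    x = layers.Concatenate()([p8, s4, s2])      # shallow cues + deep semantics
    x = layers.Conv2D(filters, 1, padding="same")(x)  # 1x1 conv mixes channels
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU(0.1)(x)             # DBL-style conv-BN-LeakyReLU
```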
Step 5: add binary cross-entropy to compute the loss value. The side-scan sonar sunken ship data set has few samples and its gradients carry heavy noise, so a suitable initial learning rate is hard to choose during model training: if the learning rate is too small, convergence is very slow; if it is too large, the loss value oscillates continually around, or even diverges from, the minimum; and a single learning rate cannot suit the learning of every parameter. To let the model learn image features in as much detail as possible and obtain optimal parameter values, the adaptive learning rate Adam algorithm, combining the Momentum and RMSProp algorithms, is adopted; it jointly considers the first moment estimation (the mean of the gradient) and the second moment estimation (the uncentred variance of the gradient) to compute the update step. Because parameter updates are not affected by the scale of the gradient, noisy samples are handled better, the learning rate adjusts automatically, robustness to the parameters is higher, the model converges better and overfitting is effectively prevented. The loss function of the YOLOv3 model is shown below.
$$
\begin{aligned}
Loss = {} & \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\Big[\big(\sigma(t_x)-\sigma(\hat{t}_x)\big)^2 + \big(\sigma(t_y)-\sigma(\hat{t}_y)\big)^2\Big] \\
& + \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\Big[\big(t_w-\hat{t}_w\big)^2 + \big(t_h-\hat{t}_h\big)^2\Big] \\
& + \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\, L\big(C_{ij},\hat{C}_{ij}\big) + \sum_{i=0}^{S^2}\sum_{j=0}^{B} G_{ij}\, L\big(C_{ij},\hat{C}_{ij}\big) \\
& + \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj} \sum_{c \in \mathrm{classes}} L\big(p_{ij}(c),\hat{p}_{ij}(c)\big)
\end{aligned}
$$

When the j-th prediction box in the i-th grid cell is responsible for predicting an object, I_ij^obj = 1; otherwise I_ij^obj = 0. When a prediction box is not responsible for predicting an object but its IOU with the ground-truth box exceeds the set threshold (here IOU = 0.5), G_ij = 0; otherwise G_ij = 1. x, y, w and h are the centre coordinates and width and height of the prediction box, S^2 is the number of grid cells the feature map is divided into, B is the number of prediction boxes per grid cell, C is the predicted confidence, and p is the class probability. The centre coordinates and width and height of the prediction box both adopt a mean-square error whose errors are computed through the Sigmoid function σ. Because the partial derivatives of the mean-square error with respect to the parameters are multiplied by the Sigmoid derivative σ', and σ' approaches 0 when its argument is very large or very small, the gradient update amplitude is small, parameters update slowly and convergence takes long. The confidence and class errors of the invention are therefore computed with the binary cross-entropy function to achieve a better convergence effect:

$$ L(\hat{y}, y) = -\big[\hat{y}\ln y + (1-\hat{y})\ln(1-y)\big] $$

where y stands for the predicted confidence C or class probability p, and \hat{y} for its ground-truth value; a sketch of this computation follows.
Step 6: perform model training with a transfer learning strategy. Retraining a complex convolutional neural network from scratch needs massive data, computing and time resources. Since related tasks share a certain correlation, knowledge obtained in a previous task can be applied to a new task after slight transformation, or even without any change; when a small amount of data makes common, effective knowledge hard to obtain in the new task, transfer learning shares the already-learned model parameters with the new model, accelerating and optimizing the model's learning efficiency, reducing repeated labour and the dependence on target-task training data, and improving model performance.
As the depth of the convolutional layers increases, a convolutional neural network learns increasingly abstract, target-specific features. Shallow features such as texture, contour and colour are universal shallow features learned by the shallow convolutional layers, so their transferability is high, whereas the convolutional layers after the multi-scale feature fusion are deep layers whose learned and extracted image features are abstract and transfer poorly. The invention therefore freezes the weight parameters of the convolutional layers before the multi-scale feature fusion, starting from the model pretrained on the COCO data set, and initializes and retrains the 59th, 67th and 75th convolutional layers, the fully connected layer and the Sigmoid output layer on the target data set; the specific flow chart is shown in FIG. 5, and a layer-freezing sketch is given below.
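A minimal layer-freezing sketch in Keras, assuming a hypothetical pretrained checkpoint file, a `pre_fusion` name prefix for the layers to freeze, and a `yolo_loss` wrapper around the loss of step 5; every name here is an assumption, not the patent's code.

```python
import tensorflow as tf

# Hypothetical COCO-pretrained checkpoint of the improved YOLOv3 model.
model = tf.keras.models.load_model("yolov3_coco_pretrained.h5", compile=False)

for layer in model.layers:
    # Freeze every layer before the multi-scale feature fusion; retrain the
    # rest (e.g. the 59th/67th/75th conv layers and the output head).
    layer.trainable = not layer.name.startswith("pre_fusion")

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss=yolo_loss)  # assumed (y_true, y_pred) wrapper of step 5's loss

# Mini-batch training as described in step 7 (batch size 64, 1000 epochs);
# x_train / y_train are the preprocessed arrays from steps 1-2.
model.fit(x_train, y_train, batch_size=64, epochs=1000)
```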
Step 7: a mini-batch gradient descent method is adopted: all pictures are fed to the model in 88 batches of 64 pictures each (the batch size), and training runs for 1000 epochs. The loss values of the two YOLOv3 models are shown in FIG. 6; both decrease continuously as training proceeds and finally level off. The newly learned YOLOv3 model stabilizes after 600 training steps, with the loss value finally holding at about 5.5. The transfer learning YOLOv3 model shares part of the shallow feature extraction parameters of the model trained on the COCO data set, so its initial loss value is low and falls quickly; but because the side-scan sonar sunken ship data set differs to some extent from the COCO data set, and the abstract features rely on a large number of widely differing parameters learned on COCO, the loss value fluctuates strongly during the first 350 training steps. Because the model obtains the position information of the target well, and cross-entropy is used for the error calculation, the loss value then stabilizes; it converges after 750 training steps and settles at about 4.3, lower than the loss value of the newly learned YOLOv3 model, which proves that the transfer learning based YOLOv3 model has better generalization ability.
Step 8: test the test set data with the trained model.
The evaluation criteria used in this experiment are the average precision (AP), an index reflecting the performance of the whole model that equals the area under the P-R (Precision-Recall) curve, and the harmonic mean F1. Precision (also called the precision rate) indicates how many of the detected targets are accurate and measures the correctness of the results; Recall (also called the recall rate) indicates how many of the true targets were detected and measures the completeness of the results.
$$ P = \frac{TP}{TP + FP} $$

$$ R = \frac{TP}{TP + FN} $$

Classified samples fall into four types according to the classification result: correctly classified positive samples (TP, true positives), incorrectly classified positive samples (FP, false positives), correctly classified negative samples (TN, true negatives) and incorrectly classified negative samples (FN, false negatives). TP + FP is the total number of samples classified as positive, and TP + FN is the total number of actual positive samples. The AP is accordingly defined by

$$ AP = \int_0^1 P(R)\,dR $$

A metrics sketch follows.
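These metrics can be computed as in the sketch below, where `scores` are detection confidences, `is_tp` is a boolean array flagging each detection as a true positive, and `n_gt` is the number of ground-truth targets; the names are illustrative.

```python
import numpy as np

def precision_recall_ap(scores, is_tp, n_gt):
    """P = TP/(TP+FP), R = TP/(TP+FN); AP is the area under the P-R curve."""
    order = np.argsort(-scores)        # rank detections by confidence
    tp = np.cumsum(is_tp[order])       # cumulative true positives
    fp = np.cumsum(~is_tp[order])      # cumulative false positives
    precision = tp / (tp + fp)
    recall = tp / n_gt                 # TP + FN = all real targets
    ap = np.trapz(precision, recall)   # integrate P over R
    return precision[-1], recall[-1], ap

def f1_score(p, r):
    """Harmonic mean of precision and recall: F1 = 2PR / (P + R)."""
    return 2 * p * r / (p + r)
```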
The P-R curves of the Faster R-CNN model, the newly learned YOLOv3 model and the transfer learning YOLOv3 model are shown in FIG. 7; the larger the area between a curve and the coordinate axes, the better the model. Comparing the P-R curves and AP values of the three models, the AP values are 87.72%, 89.18% and 89.49% respectively: the AP of the YOLOv3 models is clearly higher than that of the Faster R-CNN model, and the average precision of the transfer learning based YOLOv3 model is 0.31% higher than that of the newly learned YOLOv3 model. The precision of the Faster R-CNN model reaches 88% at a recall of 85%, but drops sharply as recall rises further. The YOLOv3 models decline more slowly, keeping high precision while keeping high recall: the newly learned YOLOv3 holds 89% precision at 90% recall, while the transfer learning based YOLOv3 model, whose P-R curve falls even more gently and encloses a larger area with the coordinate axes, reaches 91% precision at 90% recall. The transfer learning based YOLOv3 model therefore identifies side-scan sonar sunken ship targets better.
F1 is the harmonic mean of precision and recall and characterizes the overall performance of the algorithm:

$$ F1 = \frac{2PR}{P + R} $$

Here both the confidence threshold and the IOU threshold are set to 0.5. The test results of the three models are shown in Table 1.

Table 1 Comparison of the test results of the three models
As can be seen from Table 1, the identification precision of the Faster R-CNN model is 2.37% higher than that of the newly learned YOLOv3 model, but its recall is 6% lower, which shows that Faster R-CNN performs worse than YOLOv3 on small-target detection; its F1 value is also 2.33% lower, so on overall performance the newly learned YOLOv3 model is superior to the Faster R-CNN model. The precision, recall, AP value and F1 of the transfer learning based YOLOv3 model are all higher than those of the other two models: its AP value is 1.77% and 0.31% higher than those of the Faster R-CNN and newly learned YOLOv3 models respectively, and its F1 value is 3.96% and 1.63% higher. Detection speed is also an important index of a model's overall performance; as the table shows, Faster R-CNN takes 2.8 s to detect one picture while the YOLOv3 model takes 0.17 s, only about 3/50 of the Faster R-CNN time, greatly improving detection efficiency. The overall performance of the transfer learning YOLOv3 model is thus clearly superior to the Faster R-CNN model and improved to a certain extent over the newly learned YOLOv3 model.
FIG. 8 compares the detection results of the three models on sunken ship targets in several side-scan sonar images; from left to right the columns show the Faster R-CNN, newly learned YOLOv3 and transfer learning YOLOv3 results. In (a-1) to (a-3) of FIG. 8 the sunken ship targets differ in size, giving a diversity of scales. In terms of recognition, the Faster R-CNN model identifies large-scale sunken ship targets well, but its recognition of small-scale sunken ship targets is poor and its missed-alarm rate high. The newly learned YOLOv3 model improves considerably on Faster R-CNN in small-target recognition; the red boxes mark missed targets. The transfer learning YOLOv3 model further improves small-scale target recognition over learning from scratch, although when sunken ship targets are closely spaced its positioning precision drops somewhat and two sunken ship targets are falsely detected as one; in general, however, the YOLOv3 models identify and distinguish small-scale targets better and greatly reduce the missed-alarm rate. As (b-1) to (b-3) and (c-1) to (c-3) of FIG. 8 show, the detection boxes of the transfer learning based YOLOv3 model have a higher intersection-over-union with the ground-truth boxes and more accurate positioning: the IOUs in (b-1) to (b-3) are 69.92%, 75.93% and 86.09%, and in (c-1) to (c-3) 77.03%, 69.32% and 91.15%. Meanwhile, the confidences of the three models on the sunken ship targets of (b-1) to (b-3) are 98.88%, 98.97% and 99.07%, and on (c-1) to (c-3) 96.51%, 94.45% and 99.42%, showing that the transfer learning based YOLOv3 model has higher recognition accuracy and positioning precision, and that its overall performance is better than the other two models.

Claims (1)

1. An improved YOLOv3 model side-scan sonar sunken ship target automatic identification method based on transfer learning is characterized by comprising the following steps:
step 1: preprocessing the image data set of the side-scan sonar sunken ship;
step 1 comprises: (1) standardizing the pixels of every image in the data set, forcibly resizing images of inconsistent sizes to a common pixel resolution; (2) normalizing, converting to float32 format as floating-point numbers in the range 0-1; (3) cropping each image at several proportions in centre-crop mode and then enlarging back to the original image size; and (4) expanding the data set with data augmentation;
step 2: re-clustering the prior boxes with a K-means clustering algorithm; the sunken ship targets in the side-scan sonar images of the data set are flat, elongated shapes, so the prior boxes of YOLOv3 are obtained with a K-means algorithm taking intersection-over-union as the distance measure, the distance formula being

$$ d(b, o) = 1 - IOU(b, o) $$

$$ IOU(b, o) = \frac{area(b_{pt} \cap b_{gt})}{area(b_{pt} \cup b_{gt})} $$

in the formula: d(b, o) is the distance between prior box b and cluster centre o; IOU(b, o) is the intersection-over-union between the prior box b and the cluster box o; b_pt is the prior box; b_gt is the ground-truth box;
obtaining, through multiple rounds of clustering, prior boxes that conform relatively better to the shape characteristics of the sunken ship target;
step 3: carrying out multi-scale feature training with shallow feature fusion based on the YOLOv3 model; the shallow features learned at 4x and 2x downsampling of the YOLOv3 model are fused with the three scale features of the conventional YOLOv3 model, so that the grey-level contour and texture information of the sunken ship learned in the shallow layers of YOLOv3 is fused with the deep semantic abstract features, giving the image representation richer information;
step 4: adding binary cross-entropy to compute the loss value; the adaptive learning rate Adam algorithm combining the Momentum and RMSProp algorithms is adopted, jointly considering the first moment estimation and the second moment estimation of the gradient to compute the update step; the loss function of the model is

$$
\begin{aligned}
Loss = {} & \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\Big[\big(\sigma(t_x)-\sigma(\hat{t}_x)\big)^2 + \big(\sigma(t_y)-\sigma(\hat{t}_y)\big)^2\Big] \\
& + \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\Big[\big(t_w-\hat{t}_w\big)^2 + \big(t_h-\hat{t}_h\big)^2\Big] \\
& + \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\, L\big(C_{ij},\hat{C}_{ij}\big) + \sum_{i=0}^{S^2}\sum_{j=0}^{B} G_{ij}\, L\big(C_{ij},\hat{C}_{ij}\big) \\
& + \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj} \sum_{c \in \mathrm{classes}} L\big(p_{ij}(c),\hat{p}_{ij}(c)\big)
\end{aligned}
$$

where x, y, w, h are the centre coordinates and width and height of the prediction box; S^2 is the number of grid cells the feature map is divided into, and B is the number of prediction boxes per grid cell; I_ij^obj = 1 when the j-th prediction box in the i-th grid cell is responsible for predicting an object, and I_ij^obj = 0 otherwise; when the j-th prediction box in the i-th grid cell is not responsible for predicting an object but its IOU with the ground-truth box exceeds a set threshold, G_ij = 0, otherwise G_ij = 1; t_x, t_y are the predicted bounding box centre offsets and \hat{t}_x, \hat{t}_y the ground-truth centre offsets; t_w, t_h are the predicted bounding box width-height scaling ratios and \hat{t}_w, \hat{t}_h the ground-truth scaling ratios; σ is the Sigmoid function, which compresses the computed value into [0, 1], ensuring the target centre lies in the predicted grid cell and preventing excessive drift; the first term of the loss is the centre-coordinate error between each responsible prediction box and the ground-truth box, and the second term is the corresponding width-height error; C is the predicted confidence, p is the class probability, and L is the binary cross-entropy function; the third and fourth terms are the confidence errors and the fifth term the classification error of the prediction boxes; the centre coordinates and width and height of the prediction box adopt a mean-square error whose errors are computed through the Sigmoid function σ, which is computationally heavy, slow to update parameters and slow to converge, and during back-propagation the gradient update amplitude is small, so vanishing gradients occur easily; the confidence and class errors are therefore computed with the binary cross-entropy function to achieve a better convergence effect:

$$ L(\hat{y}, y) = -\big[\hat{y}\ln y + (1-\hat{y})\ln(1-y)\big] $$

where y stands for the predicted confidence C or class probability p, and \hat{y} for its ground-truth value;
step 5: performing model training with a transfer learning strategy;
on the basis of the model pretrained on the COCO data set, the weight parameters of the convolutional layers before the multi-scale feature fusion are frozen, and part of the convolutional layers, the fully connected layer and the Sigmoid output layer are initialized and retrained on the target data set, the specific flow being shown in FIG. 5;
step 6: and testing the data of the test set by using the trained model.
CN202010967912.1A 2020-09-15 2020-09-15 Improved YOLOv3 model side-scan sonar sunken ship target automatic identification method based on transfer learning Active CN112052817B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010967912.1A CN112052817B (en) 2020-09-15 2020-09-15 Improved YOLOv3 model side-scan sonar sunken ship target automatic identification method based on transfer learning

Publications (2)

Publication Number Publication Date
CN112052817A true CN112052817A (en) 2020-12-08
CN112052817B CN112052817B (en) 2023-09-05

Family

ID=73603994

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010967912.1A Active CN112052817B (en) 2020-09-15 2020-09-15 Improved YOLOv3 model side-scan sonar sunken ship target automatic identification method based on transfer learning

Country Status (1)

Country Link
CN (1) CN112052817B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447034A (en) * 2018-11-14 2019-03-08 北京信息科技大学 Traffic mark detection method in automatic Pilot based on YOLOv3 network
CN110147807A (en) * 2019-01-04 2019-08-20 上海海事大学 A kind of ship intelligent recognition tracking
CN110991516A (en) * 2019-11-28 2020-04-10 哈尔滨工程大学 Side-scan sonar image target classification method based on style migration
CN111222574A (en) * 2020-01-07 2020-06-02 西北工业大学 Ship and civil ship target detection and classification method based on multi-model decision-level fusion
CN111460894A (en) * 2020-03-03 2020-07-28 温州大学 Intelligent car logo detection method based on convolutional neural network

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112598040A (en) * 2020-12-16 2021-04-02 浙江方圆检测集团股份有限公司 Switch consistency real-time detection method based on deep learning
CN112613504A (en) * 2020-12-17 2021-04-06 上海大学 Sonar underwater target detection method
CN113065446A (en) * 2021-03-29 2021-07-02 青岛东坤蔚华数智能源科技有限公司 Depth inspection method for automatically identifying ship corrosion area
CN113077017A (en) * 2021-05-24 2021-07-06 河南大学 Synthetic aperture image classification method based on impulse neural network
CN113343964A (en) * 2021-08-09 2021-09-03 湖南汇视威智能科技有限公司 Balanced underwater acoustic image target detection method
CN114758237A (en) * 2022-04-19 2022-07-15 哈尔滨工程大学 Construction method, detection method and construction device of automatic water delivery tunnel defect identification model, computer and storage medium
CN114677568A (en) * 2022-05-30 2022-06-28 山东极视角科技有限公司 Linear target detection method, module and system based on neural network

Also Published As

Publication number Publication date
CN112052817B (en) 2023-09-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant