CN112613504A - Sonar underwater target detection method

Sonar underwater target detection method

Info

Publication number
CN112613504A
CN112613504A
Authority
CN
China
Prior art keywords
prediction
target
image
boundary
sonar
Prior art date
Legal status
Pending
Application number
CN202011492415.7A
Other languages
Chinese (zh)
Inventor
曾丹
陆恬昳
徐霁轩
蔡周吟
Current Assignee
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology
Priority to CN202011492415.7A
Publication of CN112613504A


Classifications

    • G - PHYSICS
        • G06 - COMPUTING; CALCULATING OR COUNTING
            • G06F - ELECTRIC DIGITAL DATA PROCESSING
                • G06F 18/00 - Pattern recognition
                    • G06F 18/20 - Analysing
                        • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                            • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
                        • G06F 18/23 - Clustering techniques
                            • G06F 18/232 - Non-hierarchical techniques
                                • G06F 18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
                                    • G06F 18/23213 - Non-hierarchical techniques with a fixed number of clusters, e.g. K-means clustering
            • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N 3/00 - Computing arrangements based on biological models
                    • G06N 3/02 - Neural networks
                        • G06N 3/04 - Architecture, e.g. interconnection topology
                            • G06N 3/045 - Combinations of networks
            • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
                • G06V 10/00 - Arrangements for image or video recognition or understanding
                    • G06V 10/20 - Image preprocessing
                        • G06V 10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
                    • G06V 10/40 - Extraction of image or video features
                        • G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
                • G06V 2201/00 - Indexing scheme relating to image or video recognition or understanding
                    • G06V 2201/07 - Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a sonar underwater target detection method comprising the following specific steps: preprocessing an underwater target image to obtain data to be detected; extracting features from the data to be detected; selecting prior boxes and extracting feature maps; processing and computing the prior boxes on the feature maps to obtain bounding box values; obtaining a detection result from the bounding box values; and correcting the detection result. Compared with traditional methods, the proposed method reduces processing time, improves detection accuracy and speed, effectively identifies specific targets, can process sonar images over consecutive frames, accurately displays the position and category of each target in the current forward-looking sonar image, and achieves real-time detection.

Description

Sonar underwater target detection method
Technical Field
The invention belongs to the interdisciplinary field of artificial intelligence and underwater electronic information, and particularly relates to a forward-looking sonar image target detection and recognition method based on the YOLOv3 deep learning network.
Background
Because the underwater environment is harsh and dangerous, the marine industry faces an urgent need for intelligent industrial upgrading as it develops. To meet this need, optical techniques, acoustic techniques, and AI algorithms are gradually being incorporated into the marine industry. Underwater target detection technology has developed rapidly, for example in the detection of underwater landforms and of torpedoes. Acoustic images offer long detection range and high practicality in underwater target detection, so sonar-based underwater target detection has broad application prospects and room for development.
Sonar equipment mainly comprises forward-looking sonar, side-scan sonar, and synthetic aperture sonar; sonar signals are continuously emitted and received as the carrier vehicle advances, thereby realizing detection. Forward-looking sonar is divided into single-beam and multi-beam types. Single-beam forward-looking sonar forms one beam and uses the natural directivity of the array for orientation; a single transmit-receive cycle can only observe the space covered by one beam, so detecting a large fixed area requires rotating the beam to cover the whole area. Multi-beam forward-looking sonar can emit several beams simultaneously to form a fan-shaped detection area and can perform swath measurement. Images gathered by sonar equipment suffer from strong interference and heavy noise, so detecting small underwater targets poses many problems. At present, sonar performance is limited not only by the sonar's technical parameters but also, to a large extent, by environmental factors, which cause refraction, diffusion, absorption, and noise of the sound waves.
Deep learning models have become a hot spot in machine vision thanks to their strong representation capability, the accumulation of data, and the growth of computing power; target detection and recognition are important problems in machine vision. An image can be understood at several levels: classification assigns the image to a certain class; detection gives a content description of the whole picture, identifying the category of each target and determining its position; and segmentation separates the targets from the background and extracts target contours. In recent years, methods for detecting underwater targets have mainly included template matching-based methods, mathematical statistics-based methods, and shallow neural network methods.
Existing sonar underwater target detection technology suffers from several shortcomings: sonar data sets are few and insufficient for network training; sonar images are unclear, heavily disturbed, and target positions are not fixed, while background interference can make the direct features of different targets look similar or make the same target look very different across images; and the recognition rates of sonar image targets vary.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a sonar underwater target detection method that can use any data set, achieves high detection accuracy, and identifies targets accurately. The method specifically comprises the following steps:
Step one: acquiring a pre-training underwater target image and preprocessing it to obtain a training data set;
Step two: constructing a feature extraction network and training it on the training data set;
Step three: acquiring an underwater target image and preprocessing it to obtain a data set to be detected; inputting the data set to be detected into the feature extraction network, which divides it into grids and extracts features of the underwater image;
Step four: based on the features of the underwater target image, performing k-means clustering on the data set to be detected, then selecting prediction bounding boxes, sorting the clusters and the prediction bounding box scales, and evenly assigning the clusters to each scale according to the sorting result to obtain feature maps of different scales;
Step five: constructing a bounding box prediction network and performing convolutional prediction on the feature maps through the bounding box prediction network to obtain predicted bounding box values;
Step six: post-processing the predicted bounding box values to obtain the detection result of the corresponding data;
Step seven: detecting adjacent frames of the underwater target image data separately to obtain the detection results of the adjacent frames, and correcting the image prediction result according to the detection results of the adjacent frames.
Preferably, in step two the feature extraction network is constructed based on DarkNet53.
Preferably, the pre-training underwater target image in step one is a rectangular image cropped from a forward-looking sonar sector image.
Preferably, the preprocessing in step one comprises the following specific steps: adjusting the shape and size of the cropped rectangular image, and then filling the background of the resized image.
Preferably, when the data to be detected in step seven consist of consecutive frames, the prediction result is optimized through the correlation between the previous and next frames.
Preferably, step five specifically comprises: constructing a bounding box prediction network based on an RPN (Region Proposal Network), the bounding box prediction network obtaining the predicted bounding box values using direct position prediction.
Preferably, step six specifically comprises:
constraining the offsets of the predicted bounding box relative to its lower-left grid cell to the range 0-1 with a Sigmoid function, so that the center of the predicted bounding box stays within the grid cell containing the current center, yielding the processed predicted bounding boxes;
based on the processed predicted bounding boxes, dividing them by category through the non-maximum suppression algorithm, sorting them in descending order of score, computing the intersection-over-union of each bounding box with the others, and screening the bounding boxes against a set intersection-over-union threshold to obtain the target bounding boxes; based on the target bounding boxes, computing the target confidence of each predicted bounding box with logistic regression and the target bounding box parameters with squared error, yielding the detection result;
the detection result consists of a target confidence and target bounding box parameters. A minimal sketch of the non-maximum suppression step is given below.
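The following is a minimal sketch of the per-class non-maximum suppression screening described above, assuming NumPy, corner-format boxes [x1, y1, x2, y2], and an IoU threshold of 0.45; none of these specifics are fixed by the disclosure.

```python
import numpy as np

def iou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """IoU of one box [x1, y1, x2, y2] against an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes: np.ndarray, scores: np.ndarray, thresh: float = 0.45) -> list:
    """Keep the highest-scoring box, drop overlaps above thresh, repeat."""
    order = np.argsort(scores)[::-1]       # descending order of score
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        order = rest[iou(boxes[best], boxes[rest]) <= thresh]
    return keep
```

In the method described above this routine would be run once per target category, so that boxes of different categories never suppress each other.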
Preferably, the number of convolution kernels of the step-five bounding box prediction network is (4+1+c)×k, wherein for each predicted bounding box: 4×k parameters represent the offsets of the target bounding box, k parameters represent the probability that the target bounding box contains a target, and c×k parameters represent the probabilities of the c target categories predicted for the k preset bounding boxes.
Preferably, the feature map in step four has two dimensions, a plane dimension and a depth dimension;
the plane dimension is m×m, with m×m grid cells correspondingly used for prediction;
the depth dimension is (4+1+c)×k.
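As a worked example with assumed values (the disclosure does not fix c or k): for c = 4 target categories and k = 3 preset bounding boxes per grid cell, each cell predicts (4+1+4)×3 = 27 values, so an m×m feature map has depth 27 and carries m×m×27 prediction values in total.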
The invention has the following beneficial effects:
1. Starting from data set processing for forward-looking sonar data, the areas containing the objects to be detected are framed in a series of forward-looking sonar images and a corresponding data set is produced, with different preprocessing applied to areas of different object sizes, so that the forward-looking sonar image data become suitable for deep learning network input and learning. The area to be detected may be of any size: after the preprocessing stage applies the corresponding processing, the area is input to the detection network, which still runs completely and achieves the same technical effect;
2. The invention uses a deep learning method: the processed forward-looking sonar image data set is fed into a network for training and feature extraction, and a deep neural network realizes target detection with high detection rate, precision, and recall;
3. A forward-looking sonar image is input into a YOLOv3 network model, in which feature extraction, target detection, and target classification are completed, finally giving an accurate target confidence result, target category, and target position relative to the image. Meanwhile, a judgment module can adjust the current sonar image detection result according to a set strategy using the sonar data of the related preceding and following frames, likewise achieving this effect;
4. The invention adopts multi-scale prediction, so a single network can simultaneously predict targets of different sizes in sonar images without retraining the model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a forward-looking sonar image target detection and identification method based on YOLOv 3;
FIG. 2 is a block diagram of a neural network part of a forward-looking sonar image object detection and recognition method based on YOLOv 3;
fig. 3 shows the neural network training curves of the forward-looking sonar image target detection and recognition method based on YOLOv3, where figs. 3(a)-(h) respectively correspond to the generalized intersection-over-union (GIoU) curve, the Obj value curve, the precision curve, the recall curve, the validation-set GIoU curve, the validation-set Obj value curve, the average precision curve, and the F1 value curve;
fig. 4 shows examples of image data that are put into the network for training after preprocessing, where figs. 4(a)-(d) are examples of the first type of target image, fig. 4(e) of the second type, fig. 4(f) of the third type, and fig. 4(g) of the fourth type; the four target types are a mine, a pipeline, a base array, and a submerged buoy, respectively;
fig. 5 shows examples of detection results for different types of targets, where figs. 5(a)-(d) respectively correspond to the output results for images of the first, second, third, and fourth target types.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
To solve the above problems of sonar underwater target detection in the prior art, the invention provides the following scheme:
referring to fig. 1, the invention provides a sonar underwater target detection method, which comprises the following steps:
Step one: acquiring pre-training underwater target images from the region-to-be-detected portions of forward-looking sonar data, and preprocessing them to obtain a training data set. The pre-training underwater target image is a rectangular image cropped from a forward-looking sonar sector image. The preprocessing comprises the following specific steps: adjusting the shape and size of the cropped rectangular image, and then filling the background of the resized image.
Step two: and constructing a feature extraction network based on DarkNet53, and training the feature extraction network based on a training data set.
As shown in fig. 4, images are acquired and preprocessed to obtain the forward-looking sonar image data, either by directly cropping the sector image and resizing the crop, or by resizing an area matching the actual target size in the sector image and then filling a background around the image containing the target. Meanwhile, before the cropped images are padded to a fixed size, they are rotated, flipped, and otherwise transformed to expand the training data set. The data set containing the four types of targets is then fed into the network for training, yielding a neural network model that detects the four corresponding target types. A minimal sketch of this preprocessing is given below.
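The following preprocessing sketch assumes OpenCV and NumPy, a hypothetical 416×416 network input size, and a grey background fill value; the disclosure fixes none of these specifics.

```python
import cv2
import numpy as np

INPUT_SIZE = 416   # assumed YOLOv3-style input resolution
BACKGROUND = 114   # assumed grey fill value for the padded background

def preprocess(region: np.ndarray) -> np.ndarray:
    """Resize a rectangular region cropped from the sonar sector image,
    then fill the background so the result has a fixed square size."""
    if region.ndim == 2:                             # sonar crops may be grayscale
        region = cv2.cvtColor(region, cv2.COLOR_GRAY2BGR)
    h, w = region.shape[:2]
    scale = INPUT_SIZE / max(h, w)                   # keep the aspect ratio
    resized = cv2.resize(region, (int(w * scale), int(h * scale)))
    canvas = np.full((INPUT_SIZE, INPUT_SIZE, 3), BACKGROUND, dtype=np.uint8)
    top = (INPUT_SIZE - resized.shape[0]) // 2       # center the target
    left = (INPUT_SIZE - resized.shape[1]) // 2
    canvas[top:top + resized.shape[0], left:left + resized.shape[1]] = resized
    return canvas

def augment(region: np.ndarray) -> list:
    """Expand the training set by rotating and flipping the cropped region
    before it is padded to the fixed size, as described above."""
    variants = [region,
                cv2.rotate(region, cv2.ROTATE_90_CLOCKWISE),
                cv2.rotate(region, cv2.ROTATE_180),
                cv2.flip(region, 1)]                 # horizontal flip
    return [preprocess(v) for v in variants]
```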
Step three: acquiring an underwater target image and preprocessing it to obtain a data set to be detected; inputting the data set to be detected into the feature extraction network, which divides it into grids and extracts the underwater image features.
Step four: performing k-means clustering on the data set to be detected based on the features of the underwater target image, then selecting prediction bounding boxes, sorting the clusters and the prediction bounding box scales, and evenly assigning the clusters to each scale according to the sorting result to obtain feature maps of different scales; a clustering sketch is given after this paragraph.
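The following is a minimal sketch of k-means clustering of labelled box sizes into prior boxes. The 1 − IoU distance and k = 9 (three boxes for each of three scales) are common YOLOv3 choices assumed here, not values fixed by the disclosure.

```python
import numpy as np

def iou_wh(boxes: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    """IoU between (w, h) pairs, comparing boxes by size only."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
             np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = ((boxes[:, 0] * boxes[:, 1])[:, None] +
             (anchors[:, 0] * anchors[:, 1])[None, :] - inter)
    return inter / union

def kmeans_anchors(boxes: np.ndarray, k: int = 9, iters: int = 100) -> np.ndarray:
    """Cluster (w, h) pairs into k prior boxes, then sort the clusters by
    area so they can be assigned evenly across the prediction scales."""
    boxes = boxes.astype(float)
    rng = np.random.default_rng(0)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)  # max IoU = min 1-IoU
        for j in range(k):
            if np.any(assign == j):                         # skip empty clusters
                anchors[j] = boxes[assign == j].mean(axis=0)
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]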
The feature extraction network and the network structure over the clustered feature maps together form the target detection network, whose structure is shown in fig. 2. The feature extraction network extracts the features; target detection on the data to be detected is carried out by dividing it into cells; Leaky ReLU is used as the activation function; training is end-to-end; and batch normalization serves as a method of regularization, convergence acceleration, and overfitting avoidance while the image features are extracted.
The CBL is the smallest component of the target detection network and denotes three operations applied to the input: convolution (Conv), batch normalization (BN), and activation with an activation function, the activation function used in the invention being the Leaky ReLU function. The res module is a residual block designed after the residual network, in which the input can propagate forward faster through data lines that cross layers. The target detection network flow is as follows: the image is input to a base network comprising CBL and res components, which outputs values at three different parts of the network, i.e., feature maps at three scales. The three feature maps of different scales feed three branches: the first branch is passed directly through a CBL component to the output; the second branch combines the value output by the CBL component of the first branch with a feature map extracted from the base network and passes the result through a CBL component to obtain its output; similarly, the third branch tensor-concatenates the result output by the first group of CBLs in the second branch with a result output by the base network and continues through CBL processing to obtain its result. A minimal sketch of the CBL and res building blocks follows.
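The following sketch of the CBL and res components assumes PyTorch as the framework (the disclosure does not name one); the channel counts and the LeakyReLU negative slope are illustrative.

```python
import torch
import torch.nn as nn

class CBL(nn.Module):
    """Conv -> BatchNorm -> LeakyReLU, the smallest network component."""
    def __init__(self, c_in: int, c_out: int, k: int = 3, stride: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, stride, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.LeakyReLU(0.1)          # assumed negative slope

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.conv(x)))

class Res(nn.Module):
    """Residual block: a data line across layers adds the input back in,
    letting the input propagate forward faster."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(CBL(channels, channels // 2, k=1),
                                  CBL(channels // 2, channels, k=3))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)               # cross-layer skip connection
```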
Step five: constructing a bounding box prediction network based on an RPN (Region Proposal Network). The number of convolution kernels of the bounding box prediction network is (4+1+c)×k, where for each predicted bounding box: 4×k parameters represent the offsets of the target bounding box, k parameters represent the probability that the target bounding box contains a target, and c×k parameters represent the probabilities of the c target categories predicted for the k preset bounding boxes. The feature map has two dimensions, a plane dimension and a depth dimension; the plane dimension is m×m, with m×m grid cells correspondingly used for prediction; the depth dimension is (4+1+c)×k.
The bounding box prediction network uses direct position prediction to obtain the predicted bounding box values.
Step six: post-processing the predicted bounding box values to obtain the detection result of the corresponding data.
According to the predicted bounding box values (the offsets tx, ty of the predicted bounding box relative to its lower-left grid cell in the x and y directions, and the scale factors tw, th of the predicted bounding box), the offsets tx and ty are compressed to between 0 and 1 by a sigmoid function, constraining the center of the predicted bounding box to the grid cell containing the current center and yielding the processed predicted bounding box;
Each grid cell contains 3 processed prediction boxes. Which one serves as the target bounding box depends on the intersection-over-union (IoU) of each box with the ground truth: the target box with the largest IoU is matched to the ground truth; the coordinate error, confidence error, and classification error are computed for the matched target box, while only the confidence error is computed for the other, unmatched target boxes. The bounding box prediction network reduces the loss and obtains an accurate model through the small offsets and scale factors of the target bounding box. For the matched target bounding boxes, logistic regression fine-tunes each box, i.e., translation and scaling, finally predicting a target confidence for each bounding box; the bounding box parameters are computed with squared error, finally yielding the target bounding box parameters, which comprise the coordinates bx, by of the upper-left corner of the target bounding box, its width and height bw, bh, and the target confidence.
The target bounding box parameters are obtained by the following formulas:
bx = σ(tx) + cx
by = σ(ty) + cy
bw = pw · e^(tw)
bh = ph · e^(th)
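The following is a minimal sketch of these decoding formulas, assuming NumPy; here cx, cy denote the coordinates of the grid cell containing the box center, and pw, ph the width and height of the preset (prior) bounding box.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Apply bx = sigmoid(tx) + cx, by = sigmoid(ty) + cy,
    bw = pw * e^tw, bh = ph * e^th; the sigmoid keeps the predicted
    center inside the grid cell that contains the current center."""
    bx = sigmoid(tx) + cx
    by = sigmoid(ty) + cy
    bw = pw * np.exp(tw)
    bh = ph * np.exp(th)
    return bx, by, bw, bh
```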
The curves of the loss values during training are shown in figs. 3(a)-(h), which also include the changes in the precision and recall metrics computed on the model's target detection results.
Step seven: the image data of adjacent frames of the underwater target image data are input in turn to the target detection network and the bounding box prediction network to obtain the detection results of the adjacent frames; the final detection result is obtained by weighting the detection results of the adjacent frames, a judgment is then made against a set threshold, and the image prediction result is corrected. When the data to be detected consist of consecutive frames, the confidence of the current frame's detection result is corrected through the correlation between the previous and next frames according to the corresponding strategy of the judgment module; a sketch of such a strategy is given below.
Fig. 5 shows the detection output images finally obtained for the four target types under the technical solution of the present invention; the final result simultaneously gives, in the accompanying text record, the category, position, and confidence of each target.
The foregoing illustrates and describes the principles, general features, and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above; the embodiments and the description illustrate only the principles of the invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, all of which fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (9)

1. A sonar underwater target detection method, characterized by comprising the following steps:
step one: acquiring a pre-training underwater target image and preprocessing it to obtain a training data set;
step two: constructing a feature extraction network and training it on the training data set;
step three: acquiring an underwater target image and preprocessing it to obtain a data set to be detected; inputting the data set to be detected into the feature extraction network, which divides it into grids and extracts features of the underwater image;
step four: based on the features of the underwater target image, performing k-means clustering on the data set to be detected, then selecting prediction bounding boxes, sorting the clusters and the prediction bounding box scales, and evenly assigning the clusters to each scale according to the sorting result to obtain feature maps of different scales;
step five: constructing a bounding box prediction network and performing convolutional prediction on the feature maps through the bounding box prediction network to obtain predicted bounding box values;
step six: post-processing the predicted bounding box values to obtain the detection result of the corresponding data;
step seven: detecting adjacent frames of the underwater target image data separately to obtain the detection results of the adjacent frames, and correcting the image prediction result according to the detection results of the adjacent frames.
2. The sonar underwater target detection method according to claim 1, characterized in that:
and step two, constructing a feature extraction network based on DarkNet 53.
3. The sonar underwater target detection method according to claim 1, characterized in that:
the pre-training underwater target image in the first step is a rectangular image intercepted from a forward-looking sonar sector image.
4. The sonar underwater target detection method according to claim 3, characterized in that:
the preprocessing in step one comprises the following specific steps: adjusting the shape and size of the cropped rectangular image, and then filling the background of the resized image.
5. The sonar underwater target detection method according to claim 1, characterized in that:
and seventhly, when the data to be detected is data to be detected of continuous frames, optimizing a prediction result through the relevance of the previous frame and the next frame.
6. The sonar underwater target detection method according to claim 1, characterized in that:
the fifth step comprises the following specific steps: constructing a boundary box prediction network based on an RPN (resilient packet network), wherein the boundary box prediction network obtains a value of a prediction boundary box by using direct position prediction;
the values of the prediction bounding box include an offset of a lower left corner mesh of the prediction bounding box and a prediction bounding box scaling value.
7. The sonar underwater target detection method according to claim 1, characterized in that:
the sixth step comprises the specific steps of,
controlling the offset to be 0-1 by using Sigmoid limitation according to the offset of the grid at the lower left corner of the prediction boundary box, and constraining the center of the prediction boundary box in the grid where the current center is located to obtain a processed prediction boundary box;
based on the processed prediction boundary frames, dividing the processed prediction boundary frames according to categories through a non-maximum value suppression algorithm, arranging the processed prediction boundary frames in a descending order according to scores, calculating the intersection ratio of one boundary frame and other boundary frames, and screening the boundary frames through setting a threshold value of the intersection ratio to obtain a target boundary frame; based on the target bounding box, calculating the target confidence coefficient of each predicted bounding box by using logistic regression, and calculating the parameters of the target bounding box by using square errors to obtain a detection result;
and the detection result is a target confidence coefficient and a target boundary box parameter.
8. The sonar underwater target detection method according to claim 6, characterized in that:
the number of convolution kernels of the step five bounding box prediction network is (4+1+ c) × k, wherein each prediction bounding box comprises: and 4 x k parameters representing the offset of the target bounding box, k parameters representing the target probability contained in the target bounding box, and c x k parameters representing the probability of predicting c target categories corresponding to k preset bounding boxes.
9. The sonar underwater target detection method according to claim 8, characterized in that:
the feature map in step four has two dimensions, a plane dimension and a depth dimension;
the plane dimension is m×m, with m×m grid cells correspondingly used for prediction;
the depth dimension is (4+1+c)×k.
Application CN202011492415.7A, priority date 2020-12-17, filing date 2020-12-17: Sonar underwater target detection method, pending, published as CN112613504A.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011492415.7A CN112613504A (en) 2020-12-17 2020-12-17 Sonar underwater target detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011492415.7A CN112613504A (en) 2020-12-17 2020-12-17 Sonar underwater target detection method

Publications (1)

Publication Number Publication Date
CN112613504A true CN112613504A (en) 2021-04-06

Family

ID=75239914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011492415.7A Pending CN112613504A (en) 2020-12-17 2020-12-17 Sonar underwater target detection method

Country Status (1)

Country Link
CN (1) CN112613504A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113379710A (en) * 2021-06-18 2021-09-10 上海大学 Underwater target sonar accurate measurement system and method
CN113702978A (en) * 2021-08-04 2021-11-26 中国科学院声学研究所 Submarine pipeline detection positioning method and system based on forward-looking sonar
CN113807324A (en) * 2021-11-02 2021-12-17 中国人民解放军32021部队 Sonar image recognition method and device, electronic equipment and storage medium
CN114693982A (en) * 2022-04-18 2022-07-01 北京石油化工学院 Underwater sonar target detection system and method
CN117434524A (en) * 2023-10-12 2024-01-23 中国科学院声学研究所 Method for identifying attribute of echo data of small object of interest in synthetic aperture sonar image

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101968884A (en) * 2009-07-28 2011-02-09 索尼株式会社 Method and device for detecting target in video image
CN108038837A (en) * 2017-12-08 2018-05-15 苏州科达科技股份有限公司 Object detection method and system in video
CN110287877A (en) * 2019-06-25 2019-09-27 腾讯科技(深圳)有限公司 The processing method and processing device of video object
CN110837870A (en) * 2019-11-12 2020-02-25 东南大学 Sonar image target identification method based on active learning
US20200293891A1 (en) * 2019-04-24 2020-09-17 Jiangnan University Real-time target detection method deployed on platform with limited computing resources
CN112052817A (en) * 2020-09-15 2020-12-08 中国人民解放军海军大连舰艇学院 Improved YOLOv3 model side-scan sonar sunken ship target automatic identification method based on transfer learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101968884A (en) * 2009-07-28 2011-02-09 索尼株式会社 Method and device for detecting target in video image
CN108038837A (en) * 2017-12-08 2018-05-15 苏州科达科技股份有限公司 Object detection method and system in video
US20200293891A1 (en) * 2019-04-24 2020-09-17 Jiangnan University Real-time target detection method deployed on platform with limited computing resources
CN110287877A (en) * 2019-06-25 2019-09-27 腾讯科技(深圳)有限公司 The processing method and processing device of video object
CN110837870A (en) * 2019-11-12 2020-02-25 东南大学 Sonar image target identification method based on active learning
CN112052817A (en) * 2020-09-15 2020-12-08 中国人民解放军海军大连舰艇学院 Improved YOLOv3 model side-scan sonar sunken ship target automatic identification method based on transfer learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
HONGBO YANG et al.: "Research on underwater object recognition based on YOLOv3", Microsystem Technologies *
JOSEPH REDMON et al.: "YOLOv3: An Incremental Improvement", Computer Vision and Pattern Recognition *
刘韦伯: "Research on Underwater Target Image Recognition Methods Based on Deep Learning", China Masters' Theses Full-text Database, Information Science and Technology *
曾文冠 et al.: "Sonar Image Target Detection and Recognition Based on Convolutional Neural Networks", Proceedings of the 17th Symposium on Ship Underwater Noise *
王晓: "Color Imaging Sonar Target Detection Based on Convolutional Neural Networks", China Masters' Theses Full-text Database, Engineering Science and Technology II *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113379710A (en) * 2021-06-18 2021-09-10 上海大学 Underwater target sonar accurate measurement system and method
CN113379710B (en) * 2021-06-18 2024-02-02 上海大学 Underwater target sonar accurate measurement system and method
CN113702978A (en) * 2021-08-04 2021-11-26 中国科学院声学研究所 Submarine pipeline detection positioning method and system based on forward-looking sonar
CN113702978B (en) * 2021-08-04 2023-07-18 中国科学院声学研究所 Submarine pipeline detection positioning method and system based on forward-looking sonar
CN113807324A (en) * 2021-11-02 2021-12-17 中国人民解放军32021部队 Sonar image recognition method and device, electronic equipment and storage medium
CN114693982A (en) * 2022-04-18 2022-07-01 北京石油化工学院 Underwater sonar target detection system and method
CN114693982B (en) * 2022-04-18 2024-09-10 北京石油化工学院 Underwater sonar target detection system and method
CN117434524A (en) * 2023-10-12 2024-01-23 中国科学院声学研究所 Method for identifying attribute of echo data of small object of interest in synthetic aperture sonar image
CN117434524B (en) * 2023-10-12 2024-07-09 中国科学院声学研究所 Method for identifying attribute of echo data of small object of interest in synthetic aperture sonar image

Similar Documents

Publication Publication Date Title
CN110472627B (en) End-to-end SAR image recognition method, device and storage medium
CN111091105B (en) Remote sensing image target detection method based on new frame regression loss function
CN112613504A (en) Sonar underwater target detection method
CN111832655B (en) Multi-scale three-dimensional target detection method based on characteristic pyramid network
CN107609525B (en) Remote sensing image target detection method for constructing convolutional neural network based on pruning strategy
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN109492596B (en) Pedestrian detection method and system based on K-means clustering and regional recommendation network
CN112861919A (en) Underwater sonar image target detection method based on improved YOLOv3-tiny
CN113223027A (en) Immature persimmon segmentation method and system based on PolarMask
CN112200846A (en) Forest stand factor extraction method fusing unmanned aerial vehicle image and ground radar point cloud
CN115100741B (en) Point cloud pedestrian distance risk detection method, system, equipment and medium
CN116468995A (en) Sonar image classification method combining SLIC super-pixel and graph annotation meaning network
CN115937659A (en) Mask-RCNN-based multi-target detection method in indoor complex environment
CN115731545A (en) Cable tunnel inspection method and device based on fusion perception
CN115965862A (en) SAR ship target detection method based on mask network fusion image characteristics
CN115240058A (en) Side-scan sonar target detection method combining accurate image segmentation and target shadow information
CN116580322A (en) Unmanned aerial vehicle infrared small target detection method under ground background
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping
CN116468950A (en) Three-dimensional target detection method for neighborhood search radius of class guide center point
CN116206186A (en) SAR image target detection and recognition method and related device
CN115810144A (en) Underwater suspended sonar target identification method based on area pre-detection
CN115861944A (en) Traffic target detection system based on laser radar
CN115223080A (en) Target segmentation method based on non-local feature aggregation neural network
CN115496998A (en) Remote sensing image wharf target detection method
CN112926383A (en) Automatic target identification system based on underwater laser image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2021-04-06