CN111310622A

CN111310622A - Fish swarm target identification method for intelligent operation of underwater robot

Info

Publication number: CN111310622A
Application number: CN202010080320.8A
Authority: CN
Inventors: 刘明雍; 石廷超; 牛云; 黄宇轩; 杨扬; 汪培新
Original assignee: Northwestern Polytechnical University
Current assignee: Northwestern Polytechnical University
Priority date: 2020-02-05
Filing date: 2020-02-05
Publication date: 2020-06-19

Abstract

The invention provides a fish swarm target identification method facing intelligent operation of an underwater robot, which comprises the steps of collecting underwater images, marking the underwater targets in the images one by one to form attribute files, and forming a data set by the images and the attribute files; establishing a fish school target detection and recognition deep convolutional neural network, and increasing three prediction scales output by the network into four; training the constructed deep convolutional neural network by using a data set, clustering by using a K-means algorithm to obtain the width and height of an Anchor Box as training parameters, and inputting the constructed deep convolutional neural network; and finally, recognizing and classifying the underwater target by using the trained deep convolutional neural network, and processing the recognition and classification result by using an improved non-maximum suppression algorithm model. The method can improve the identification precision of the underwater targets which are densely distributed and have a large amount of shielding in a complex underwater environment with high turbidity and dark light, and has stronger robustness and generalization capability.

Description

Fish swarm target identification method for intelligent operation of underwater robot

Technical Field

The invention relates to a fish swarm target identification method for intelligent operation of an underwater robot, and belongs to the field of target identification.

Background

Deep sea exploration and operation technology is one of the important fields of ocean technology research. Underwater robots include cabled underwater Robots (ROVs) and untethered underwater robots (AUVs), which are currently the most advanced deep sea exploration and working equipment. To realize deep sea detection and operation, an underwater robot must be capable of quickly sensing the submarine environment and accurately identifying an interested target. Therefore, the real-time identification of the underwater interested target by using the AUVs provided with the underwater optical camera has important research value and application prospect.

Most of traditional target identification methods are based on local feature identification, certain feature points are extracted for statistical calculation, then a feature matching method is used for target identification, and commonly used SIFT features, SURF features and the like are included. However, the method based on feature matching has poor robustness when the target background is complex, which causes low target identification precision, high time complexity and poor generalization capability.

Due to the rapid development of deep learning, the convolutional neural network is widely applied to the field of computer vision, and the Yolov3 algorithm is widely applied to the field of target recognition as an algorithm which gives consideration to detection speed and detection precision. However, the underwater environment is relatively complex, and the problems of high water turbidity, dark light and the like exist; secondly, underwater objects are often more densely distributed with a large amount of occlusions. If the YOLOv3 algorithm is directly applied to underwater target recognition, the false detection rate and the missing detection rate are relatively high.

Therefore, it is necessary to provide an underwater object recognition method with high detection accuracy.

Disclosure of Invention

The invention provides an underwater robot intelligent operation-oriented fish swarm target identification method based on an improved YOLOv3 algorithm, aiming at the problems of low detection precision and high omission factor of the traditional YOLOv3 algorithm in a complex underwater environment. Carrying out dimension clustering on the artificially marked underwater target frames by adopting a K-means algorithm to obtain the proper number of candidate frames and width-height dimensions; three scales are expanded into four scales in the YOLOv3 network, so that the detection effect on underwater targets with different scales is improved; meanwhile, in order to solve the problem of missed detection of the densely distributed and shielded underwater targets, the method optimizes a non-maximum suppression (NMS) algorithm. Through the improvement, the detection precision of the underwater target is improved, and the requirement of intelligent operation of the underwater robot is met.

The technical scheme of the invention is as follows:

the fish swarm target identification method facing the intelligent operation of the underwater robot is characterized in that: the method comprises the following steps:

step 1: acquiring images under different underwater environments by using an optical camera;

step 2: dividing the underwater image acquired in the step 1 into images with uniform width and height dimensions;

and step 3: randomly selecting the images processed in the step (2), wherein the images comprise images containing underwater targets and images not containing the underwater targets, marking the underwater targets in the images one by one, and generating attribute files containing marking information correspondingly for each image after marking; the image and the corresponding attribute file jointly form a data set;

and 4, step 4: building a fish swarm target detection and recognition deep convolutional neural network:

the deep convolutional neural network comprises a plurality of CBR3, CBR1, Res and Up sample units; wherein the CBR3 is composed of a filling layer, a 3x3 convolutional layer, a batch normalization layer and an activation function layer; the CBR1 is composed of a filling layer, a 1x1 convolution layer, a batch normalization layer and an activation function layer; the residual error unit Res consists of a plurality of 1x1 convolutional layers and 3x3 convolutional layers; upsample refers to upsampling;

the network input firstly passes through two CBR3 units, then passes through 5 groups of residual error units, then passes through one CBR1 unit and one CBR3 unit, at this time, the network output firstly passes through one CBR1 unit to obtain a first characteristic scale output, and secondly passes through one CBR1 unit, and is fused with the network output of the 4 th group of residual error units after Up-sampling Up sample, and then passes through one CBR1 unit and one CBR3 unit, at this time, the network output secondly passes through one CBR1 unit to obtain a second characteristic scale output, secondly passes through one CBR1 unit, and is fused with the network output of the 3 rd group of residual error units after Up-sampling Up sample, and then passes through one CBR1 unit and one CBR3 unit, at this time, the network output secondly passes through one CBR1 unit to obtain a third characteristic scale output, secondly passes through one CBR1 unit, and is fused with the network output of the 2 nd group of residual error units after Up-sampling Up sample, obtaining a fourth characteristic scale output through a CBR1 unit, a CBR3 unit and a CBR1 unit;

and 5: training the deep convolutional neural network constructed in the step 4 by using the data set obtained in the step 3;

step 6: and (5) identifying and classifying the underwater target by using the trained deep convolutional neural network in the step 5.

Further, in step 5, clustering is performed by adopting a K-means algorithm to obtain the width and height of the Anchor Box as training parameters, and the obtained width and height are input into the deep convolution neural network built in step 4, wherein the specific process is as follows:

step 5.1: the data set manufactured in the step 3 comprises an attribute file corresponding to each image, and the attribute file comprises the position information of the labeling frame in the image:

(x_j,y_j,w_j,h_j),j∈{1,2,...,N}

wherein (x)_j,y_j) Is the center point of the label box, (w)_j,h_j) The width and the height of the marking frame, and N is the number of the marking frames;

step 5.2: randomly giving k cluster center points (W)_i,H_i) I ∈ {1, 2.., k }, where (W ∈ ·, k }, where_i,H_i) Is the width and height dimensions of the Anchor Box;

step 5.3: calculating the distance between each marking frame and each clustering center point:

d(box,centroid)＝1-IOU(box,centroid)

when calculating, the central point of each marking frame is coincided with the clustering center to obtain

d＝1-IOU[(x_j,y_j,w_j,h_j),(x_j,y_j,W_i,H_i)],j∈{1,2,...,N},i∈{1,2,...,k}

Assigning the labeling box to the clustering center with the minimum distance d; IOU represents the ratio of intersection to union;

step 5.4: after all the label boxes are distributed, forming a plurality of cluster clusters, recalculating a cluster central point for each cluster in a manner of solving w_jAnd h_jA median of (d);

step 5.5: and (5) repeating the steps 5.3 and 5.4 until the change amount of the clustering center is smaller than a set threshold value or the iteration times is larger than the set highest iteration times, finishing clustering, and obtaining clustering results of the width and height sizes of the Anchor Box.

Further, the improved non-maximum suppression algorithm model identification and classification result is adopted in the step 6 for processing, and the specific process is as follows:

step 6.1: establishing a storage prediction frame h_iIs initialized to N prediction boxes of the model output, each prediction box has a prediction confidence score of S_iWherein i is more than or equal to 1 and less than or equal to N; establishing a set M for storing an optimal prediction frame, and initializing the set M into an empty set;

step 6.2: traversing all the prediction frames in the set H, sequencing the prediction confidence scores of all the prediction frames, and selecting the prediction frame m corresponding to the highest score;

step 6.3: calculating the intersection ratio IOU of the prediction boxes in the set H and the prediction box m respectively if a certain prediction box H_iWith prediction block mThe intersection-parallel ratio is higher than a set threshold value N_tIf the predicted frame is overlapped with m to a higher degree, the confidence score of the predicted frame is reduced to S'_iPutting the prediction frame M into the set M of the optimal prediction frame; wherein the rule for confidence score reduction is:

wherein h is_iFor one prediction box in the set H, N_tIs a set threshold value; if the confidence score S 'of a certain prediction frame in the set H'_iIf the number is less than or equal to 0, removing the data in the set H;

step 6.4: and repeating the step 6.2 and the step 6.3 until the set H is an empty set, and taking the prediction box in the set M as an output result.

In a further preferred scheme, the image is segmented into 416 × 416 images with width and height dimensions in step 2; in the step 4, the first characteristic scale is 13x13, is 32 times of down sampling relative to the input image, has a larger characteristic image receptive field, and is suitable for detecting a target with a large size in the image; the second characteristic scale is 26x26, is 16 times down-sampling relative to the input image, has a characteristic map receptive field of a mesoscale, and is suitable for detecting a target with a larger scale in the image; the third characteristic scale is 52x52, is 8 times down-sampling relative to the input image, has a smaller characteristic map receptive field, and is suitable for detecting a medium-scale target in the image; the fourth feature scale is 104x104, is down-sampled 4 times relative to the input image, has the smallest feature map receptive field, and is suitable for detecting very small targets in the image.

In a further preferred scheme, four different scenes, namely high water quality definition, high water quality turbidity, more underwater light and less underwater light, are selected in the step 1 for image acquisition, and the acquired images comprise underwater images with densely distributed targets and a large amount of shelters.

In a further preferred scheme, in step 3, the image processed in step 2 is subjected to operations of enlarging, reducing, rotating, adding noise or super-pixel, so as to obtain more images and expand the image set.

In a further preferred scheme, labeling underwater targets in the image one by using labeling image labeling software in step 3, and generating an attribute file of an XML suffix corresponding to each image after labeling.

Advantageous effects

The invention provides a fish swarm target identification method for intelligent operation of an underwater robot, which can improve the identification precision of underwater targets which are densely distributed and have a large amount of shielding in a complex underwater environment with high turbidity and dark light, and has stronger robustness and generalization capability.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a flow chart of underwater target identification;

FIG. 2 is a network structure of the modified YOLOv3 algorithm;

FIG. 3 is a comparison graph of underwater target recognition and classification effects.

Detailed Description

The following detailed description of embodiments of the invention is intended to be illustrative, and not to be construed as limiting the invention.

The invention provides a fish swarm target identification method facing intelligent operation of an underwater robot, the flow of the method is shown in figure 1, and the method can improve the identification precision of underwater targets which are densely distributed and have a large amount of shielding in a complex underwater environment with high turbidity and dark light.

Step 1: the method comprises the steps of utilizing an optical camera to collect images in different underwater environments, selecting four different scenes which are high in water quality definition, high in water quality turbidity, more in underwater light and less in underwater light, and collecting underwater images which are densely distributed and have a large amount of shelters.

Step 2: and (3) uniformly dividing the width and height dimensions of the underwater image acquired in the step (1) into 416 multiplied by 416.

And step 3: and randomly selecting images containing underwater targets and images not containing underwater targets to make a data set, and expanding and labeling the data set so as to increase training samples, avoid the problem of overfitting in the training process and improve the robustness and generalization capability of the training model.

Step 3.1: obtaining underwater images including different types of underwater images through an underwater optical camera;

step 3.2: amplifying, reducing and rotating the underwater image, increasing noise, performing super-pixel operation and expanding a data set;

step 3.3: labeling underwater targets in the images one by using labeling image labeling software, and correspondingly generating an XML suffix attribute file for each image after the labeling;

step 3.4: and (4) storing the XML attribute file generated after the marking in the step (3.3) and the original image, and making into a data set.

And 4, step 4: constructing a fish swarm target detection and identification deep convolutional neural network structure, wherein the constructed deep convolutional neural network structure is specifically described as follows:

the deep convolutional neural network built by the invention mainly comprises a series of CBR3, CBR1, Res and Up sample, wherein the CBR3 consists of a filling layer (Padding), a 3x3 convolutional layer (3x3conv), a batch normalization layer (BN) and an activation function layer (ELU); the CBR1 is composed of a filling layer (Padding), a 1x1 convolution layer (1x1 conv), a batch normalization layer (BN) and an activation function layer (ELU); the residual unit Res consists of a series of 1x1 convolutional layers (1x1 conv) and 3x3 convolutional layers (3x3 conv); up sample refers to upsampling; in order to improve the detection precision of the underwater target, three prediction scales output by the network are increased to four, and a multi-scale detection characteristic pyramid network is constructed.

The network firstly passes through two CBR3 units, then passes through 5 groups of residual error units, then passes through one CBR1 unit and one CBR3 unit, at this time, the network output firstly passes through one CBR1 unit to obtain a first characteristic scale of 13x13, and secondly passes through one CBR1 unit, the network output after Up-sampling Up sample is fused with the network output of the 4 th group of residual error units, then passes through one CBR1 unit and one CBR3 unit, at this time, the network output secondly passes through one CBR1 unit to obtain a second characteristic scale of 26x26, and thirdly passes through one CBR1 unit, the network output after Up-sampling Up sample is fused with the network output of the 3 rd group of residual error units, then passes through one CBR1 unit and one CBR3 unit, at this time, the network output secondly passes through one CBR1 unit to obtain a third characteristic scale of 52x52, and thirdly passes through one CBR1 unit, the network output after Up-sampling Up sample is fused with the network output of the 2 nd group, and obtaining a fourth characteristic dimension 104x104 through one CBR1 unit, one CBR3 unit and one CBR1 unit.

The first characteristic scale is 13x13, the first characteristic scale is 32 times of down sampling relative to the input image, the characteristic map receptive field is large, and the first characteristic scale is suitable for detecting a target with a large size in the image; the second characteristic scale is 26x26, which is 16 times down-sampling relative to the input image, has a mesoscale receptive field, and is suitable for detecting the object with larger scale in the image; the third characteristic scale is 52x52, which is 8 times down-sampling relative to the input image, has a smaller receptive field and is suitable for detecting the medium-scale target in the image; the fourth feature scale is 104x104, which is a 4-fold down-sampling with respect to the input image, has the smallest receptive field, and is suitable for detecting very small objects in the image.

And 5: and (4) training the detection and recognition network built in the step (4) on a data set by using a high-performance computing platform. In the training process, the width and height dimensions of the Anchor Box are required to be used as training parameters and input into the deep convolutional neural network built in the step 4, the number and the width and height dimensions of the Anchor Box are clustered by adopting a K-means algorithm, the clustering aims to enable the Anchor Box and a real mark frame (ground route) to have a larger IOU value, the detection precision of the trained model is improved, and the method specifically comprises the following steps:

step 5.1: the created training data set includes an XML file corresponding to each image, and the XML file includes position information of the annotation box in the image, that is:

(x_j,y_j,w_j,h_j),j∈{1,2,...,N}

d(box,centroid)＝1-IOU(box,centroid)

the central point of each marking frame coincides with the clustering center during calculation, i.e. the central point of each marking frame coincides with the clustering center

d＝1-IOU[(x_j,y_j,w_j,h_j),(x_j,y_j,W_i,H_i)],j∈{1,2,...,N},i∈{1,2,...,k}

Step 6: recognizing and classifying underwater targets by using the model trained in the step 5, inputting an image into the model trained in the step 5, and then outputting the target in the image to be surrounded by N different prediction frames, wherein each prediction frame has a prediction confidence score S_iWherein i is more than or equal to 1 and less than or equal to N. The desired detection and recognition result is that the same target in the same image is surrounded by only one prediction box, and the non-maximum suppression algorithm (NMS) can be used to solve the problem, but the problem is solvedAnd the missed detection rate of the conventional non-maximum value inhibition algorithm is higher for the condition that the targets are dense. To solve the above problem, the present invention improves a non-maximum suppression algorithm. The method comprises the following specific steps:

step 6.1: establishing a storage prediction frame h_iA set H of (a), initialized to N prediction boxes; establishing a set M for storing an optimal prediction frame, and initializing the set M into an empty set;

step 6.3: calculating the intersection ratio (IOU) of the prediction boxes in the set H and the prediction box m respectively if a certain prediction box H_iThe intersection ratio with the prediction frame m is higher than a set threshold value N_tIf the predicted frame is overlapped with m to a higher degree, the confidence score of the predicted frame is reduced to S'_iPutting the prediction frame M into the set M of the optimal prediction frame; wherein the rule for confidence score reduction is:

In the embodiment, 3000 images containing 4 different underwater targets are selected, the targets are unevenly distributed in the images, and the YOLOv3 algorithm is improved in order to improve the detection accuracy of the underwater dense targets. The trained model is subjected to underwater target recognition and classification test by performing 30000 times of iterative training on the improved YOLOv3 algorithm, and the test result pair is shown in fig. 3, wherein fig. 3(1), fig. 3(3), fig. 3(5) are the recognition and classification results of the YOLOv3 algorithm on the underwater target, and fig. 3(2), fig. 3(4), fig. 3(6) are the recognition and classification results of the improved YOLOv3 algorithm on the underwater target. By comparison, the improved YOLOv3 algorithm improves the detection precision of the underwater dense target, and the effectiveness of the invention is verified.

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art without departing from the principle and spirit of the present invention.

Claims

1. A fish swarm target identification method for intelligent operation of an underwater robot is characterized in that: the method comprises the following steps:

the deep convolutional neural network comprises a plurality of CBR3, CBR1, Res and Up sample units; wherein the CBR3 is composed of a filling layer, a 3x3 convolutional layer, a batch normalization layer and an activation function layer; the CBR1 is composed of a filling layer, a 1x1 convolution layer, a batch normalization layer and an activation function layer; the residual error unit Res consists of a plurality of 1x1 convolutional layers and 3x3 convolutional layers; up sample refers to upsampling;

2. The underwater robot intelligent operation-oriented fish swarm target identification method according to claim 1, characterized in that: in the step 5, clustering is carried out by adopting a K-means algorithm to obtain the width and height of the Anchor Box as training parameters, and the width and height of the Anchor Box are input into the deep convolutional neural network built in the step 4, wherein the specific process is as follows:

(x_j,y_j,w_j,h_j),j∈{1,2,...,N}

d(box,centroid)＝1-IOU(box,centroid)

d＝1-IOU[(x_j,y_j,w_j,h_j),(x_j,y_j,W_i,H_i)],j∈{1,2,...,N},i∈{1,2,...,k}

3. The underwater robot intelligent operation-oriented fish swarm target identification method according to claim 1 or 2, characterized in that: and step 6, adopting an improved non-maximum suppression algorithm model to identify and classify results for processing, wherein the specific process comprises the following steps:

step 6.3: calculating the intersection ratio IOU of the prediction boxes in the set H and the prediction box m respectively if a certain prediction box H_iThe intersection ratio with the prediction frame m is higher than a set threshold value N_tThen the prediction frame is considered to overlap with mThe confidence score of the prediction frame is reduced to S'_iPutting the prediction frame M into the set M of the optimal prediction frame; wherein the rule for confidence score reduction is:

4. The underwater robot intelligent operation-oriented fish swarm target identification method according to claim 3, characterized in that: step 2, dividing the image into 416 x 416 images with width and height dimensions; in the step 4, the first characteristic scale is 13x13, is 32 times of down sampling relative to the input image, has a larger characteristic image receptive field, and is suitable for detecting a target with a large size in the image; the second characteristic scale is 26x26, is 16 times down-sampling relative to the input image, has a characteristic map receptive field of a mesoscale, and is suitable for detecting a target with a larger scale in the image; the third characteristic scale is 52x52, is 8 times down-sampling relative to the input image, has a smaller characteristic map receptive field, and is suitable for detecting a medium-scale target in the image; the fourth feature scale is 104x104, is down-sampled 4 times relative to the input image, has the smallest feature map receptive field, and is suitable for detecting very small targets in the image.

5. The underwater robot intelligent operation-oriented fish swarm target identification method according to claim 3, characterized in that: in the step 1, four different scenes, namely high water quality definition, high water quality turbidity, more underwater light and less underwater light, are selected for image acquisition, and the acquired images comprise underwater images with densely distributed targets and a large amount of shelters.

6. The underwater robot intelligent operation-oriented fish swarm target identification method according to claim 3, characterized in that: and 3, carrying out amplification, reduction, rotation, noise increase or super-pixel operation on the image processed in the step 2 to obtain more images and expand the image set.

7. The underwater robot intelligent operation-oriented fish swarm target identification method according to claim 3, characterized in that: and 3, marking underwater targets in the images one by using labeling image marking software, and correspondingly generating an XML suffix attribute file for each marked image.