CN110826575A - Underwater target identification method based on machine learning - Google Patents

Underwater target identification method based on machine learning

Info

Publication number
CN110826575A
CN110826575A
Authority
CN
China
Prior art keywords
target
scale
underwater
underwater target
box
Prior art date
Legal status
Pending
Application number
CN201910950105.6A
Other languages
Chinese (zh)
Inventor
魏延辉
姜瑶瑶
蒋志龙
贺佳林
李强强
马博也
牛家乐
刘东东
Current Assignee
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date
Filing date
Publication date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CN201910950105.6A
Publication of CN110826575A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464 Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

An underwater target recognition method based on machine learning, belonging to the technical field of underwater machine vision detection and processing. The core of the underwater target recognition algorithm is the SSD object detection algorithm, which uses a feedforward convolutional network structure: convolution boxes of different sizes are applied between different layers to obtain feature maps of different scales, regression is performed on these feature maps, and the final result is obtained through a non-maximum suppression (NMS) algorithm. The SSD algorithm uses multiple scales and anchor points to overcome the low precision of region proposals; its multi-scale feature vectors work well on both small and large targets, substantially improve overall recognition accuracy, and yield more precise location information. Through non-maximum suppression the method not only detects the target but also greatly improves the recognition accuracy of underwater targets, provides effective visual information for underwater robots observing and manipulating underwater targets, and improves the intelligent recognition capability for underwater targets.

Description

An underwater target recognition method based on machine learning

Technical Field

The invention belongs to the technical field of underwater machine vision detection, and in particular relates to an underwater target recognition method based on machine learning.

Background Art

The ocean covers most of the Earth's surface and holds vast resources and mysteries. In modern times, with advances in marine equipment research and development, humanity has begun to further understand and develop marine resources, and countries and regions around the world have spared no effort in developing marine equipment and exploiting underwater resources. China has a coastline of nearly 20,000 kilometers, and its coastal economic waters are rich in marine resources, offering good conditions and high demand for ocean development. Computer vision equipment, as one kind of underwater sensing equipment, is increasingly deployed in ocean exploration. A complete vision system spans many disciplines and technologies, such as optics, computer science, and control theory, and is already widely carried on underwater detection, operation, and manned equipment; among these, vision-based underwater tracking and identification technology has very important research value.

Equipping an underwater robot with vision equipment allows it to better perceive the underwater environment, and a monocular installation saves space on the underwater vehicle, providing effective conditions for precise and maneuverable underwater operation. The research in the present invention is intended to provide target and spatial information for underwater equipment operation, underwater robot obstacle avoidance, path planning, and so on; underwater monocular vision technology therefore has real research and practical application value.

The main shortcomings of current underwater target recognition algorithms are as follows:

First, current underwater visual detection is not accurate enough: traditional recognition algorithms do not localize targets precisely, which strongly affects target detection. During localization, the positioning accuracy exhibits oscillatory convergence throughout the whole process; this oscillation, together with the blur of underwater imaging, degrades the recognition accuracy for the target.

Current machine learning based detection algorithms fall into two main types: two-stage methods such as R-CNN, and one-stage methods, to which SSD belongs. Compared with Yolo, the SSD algorithm is much better in both accuracy and speed. SSD uses convolutional layers to detect directly, rather than detecting after fully connected layers as Yolo does. Direct detection with convolutions is only one of the differences between SSD and Yolo; there are two other important changes. First, SSD extracts feature maps of different scales for detection: large-scale feature maps (earlier in the network) can be used to detect small objects, while small-scale feature maps (later in the network) are used to detect large objects. Second, SSD uses prior boxes (default boxes, called anchors in Faster R-CNN) of different scales and aspect ratios. The Yolo algorithm struggles to detect small targets and localizes inaccurately, but these important improvements allow SSD to overcome those shortcomings to a certain extent.

A neural network, that is, a network with an input layer, an output layer, and multiple hidden layers, belongs to a subfield of machine learning. Its main principle is that training data enter the input layer and are processed by multiple hidden layers; after extensive training, certain features of the input are extracted, and the resulting model is used for classification or prediction in practice. Compared with such traditional CNN approaches, conventional methods for underwater target recognition have several problems, such as slow processing speed and low recognition rate.

The invention introduces SSD, a single-shot detector for multiple classes, which is faster than previous state-of-the-art single-shot detectors (e.g., Yolo) and has accuracy similar to recognition networks such as Faster R-CNN while being faster. The core of SSD is to predict class scores and box offsets for a set of default bounding boxes using small convolutional filters applied to feature maps. These design features lead to simple end-to-end training and high accuracy, even on low-resolution input images, which suits the unclear, low-resolution imagery of underwater environments and further improves the speed/accuracy trade-off.

SUMMARY OF THE INVENTION

The purpose of the present invention is to provide an underwater target recognition method based on machine learning that can detect underwater target information more accurately.

The purpose of the present invention is achieved as follows:

An underwater target recognition method based on machine learning comprises the following steps:

Step 1: preprocess the underwater target image and perform retinex restoration on it to obtain the input image;
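The patent names retinex restoration without giving implementation details; the following is a minimal single-scale retinex (SSR) sketch in Python with OpenCV, where the Gaussian kernel sigma and the file name are assumed parameters:

```python
import cv2
import numpy as np

def single_scale_retinex(image, sigma=80):
    """Minimal single-scale retinex: log(I) - log(Gaussian-blurred I)."""
    img = image.astype(np.float64) + 1.0              # avoid log(0)
    illumination = cv2.GaussianBlur(img, (0, 0), sigma)
    reflectance = np.log(img) - np.log(illumination)
    # Stretch the result back to a displayable 8-bit range
    reflectance = cv2.normalize(reflectance, None, 0, 255, cv2.NORM_MINMAX)
    return reflectance.astype(np.uint8)

restored = single_scale_retinex(cv2.imread("underwater_target.jpg"))
```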

Step 2: convolve between different layers using convolution boxes of different sizes to obtain feature maps of different scales;

Step 3: perform regression on the feature maps obtained in step 2, normalize each feature map, and then obtain default boxes of different scales with offsets through detectors and classifiers of different sizes;

Step 4: obtain the final result through the NMS (non-maximum suppression) algorithm.

In step 2, the region results computed by the RPN in the region selection algorithm are used and divided according to different scales to obtain feature maps of different sizes; an RPN convolution kernel is then moved over each region to obtain the confidence value of that region, and by continuously moving default boxes of different scales a confidence matrix is obtained.

The width and height of the default box in step 3 are:

$$w_k^a = s_k\sqrt{a_r}, \qquad h_k^a = s_k/\sqrt{a_r}$$

where $w_k^a$ and $h_k^a$ are the width and height of the default box, respectively, the aspect ratios are $a_r \in \{1, 2, 1/2, 3, 1/3\}$, and

$$s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \qquad k \in [1, m]$$

gives the scale of each layer, where m is the number of feature maps; the default scales are obtained by predicting over m feature maps of different scales, with $S_{min}$ the smallest scale and $S_{max}$ the scale of the largest feature map.

The center position of the selection box in step 3 is:

$$\left(\frac{i + 0.5}{|f_k|}, \frac{j + 0.5}{|f_k|}\right), \qquad i, j \in [0, |f_k|)$$

where x and y denote the center coordinates along the horizontal and vertical axes of the box, respectively, and $|f_k|$ denotes the scale of the k-th feature map.
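A minimal sketch of the center computation for one feature map, where the only input is the feature-map side length $|f_k|$:

```python
def box_centers(f_k):
    """Centers ((i + 0.5)/|f_k|, (j + 0.5)/|f_k|) for an f_k x f_k feature map."""
    return [((i + 0.5) / f_k, (j + 0.5) / f_k)
            for i in range(f_k) for j in range(f_k)]

centers = box_centers(5)   # e.g. the 5x5 feature map: 25 centers, all in [0, 1]
```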

The NMS method in step 4 comprises:

Step 4.1: arrange the elements of the matrix by conf value;

Step 4.2: compute IoU for all overlapping regions in descending order of the results from step 4.1, set a threshold Th, compare each IoU with Th in order, and classify and assign boxes according to their values;

Step 4.3: return to step 4.2, starting from the box with the second-largest value in the column;

Step 4.4: repeat step 4.3 until all default boxes in this column have been processed;

Step 4.5: complete the traversal of the matrix, i.e., NMS has been performed for all classes;

Step 4.6: perform a further screening of the remainder, selecting among all remaining classes according to confidence.
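The patent describes NMS only at the level of steps 4.1-4.6; the following Python sketch implements greedy per-class NMS under the common (x1, y1, x2, y2) box convention, which is an assumption here:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(np.asarray(box)) + area(boxes) - inter)

def nms(boxes, conf, th=0.5):
    """Greedy per-class NMS: sort by conf, keep the best, drop overlaps above Th."""
    order = np.argsort(conf)[::-1]                  # step 4.1: sort by confidence
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])    # step 4.2: IoU against the rest
        order = rest[overlaps <= th]                # steps 4.3-4.4: continue down the column
    return keep                                     # steps 4.5-4.6: surviving boxes
```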

The loss function of the whole model in step 4 is

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$

where x is used to judge whether a designed feature capture box has a corresponding target: $x_{ij}^p$ indicates whether the i-th box matches the j-th target bounding box of a class-p object, 1 if matched and 0 otherwise; $\sum_i x_{ij}^p \ge 1$ indicates that at least one box matches the j-th target bounding box; N denotes the number of matches; $L_{conf}$ measures recognition performance; $L_{loc}$ measures bounding-box prediction performance; $\hat{g}_j^m$ denotes the deviation between the true target bounding box of the j-th target and the feature capture box, with $m \in \{cx, cy, w, h\}$, where (cx, cy) are the coordinates of the box center and (w, h) the width and height of the box.

The beneficial effects of the present invention are:

(1) The present invention uses multiple scales and anchor points to solve the low precision of region proposals; the multi-scale feature vectors greatly improve performance on both small and large targets, substantially help overall recognition accuracy, and yield more precise location information than previous proposal-based methods;

(2) The algorithm was run for 20,000 iterations and the error curve was computed from the loss function; the initial error is defined as 500, i.e., the convergence range is (0, 500), and the error finally converges to around 20, an error rate of roughly less than one percent, showing that the algorithm greatly improves accuracy.

Description of the Drawings

Figure 1 shows the multi-scale detection process;

Figure 2 shows the region selection method;

Figure 3 is the flow chart of the NMS algorithm;

Figure 4 shows the overlap ratio (IoU) calculation;

Figure 5 shows single-fish and multi-fish detection from multiple viewing angles;

Figure 6 shows the error variation.

Detailed Description

A detailed embodiment and the effects of the present invention are described below through the following examples, in conjunction with the summary of the invention.

In view of the deficiencies of the prior art, the present invention aims to provide an underwater target detection algorithm with high reliability and good real-time performance that can detect underwater target information more accurately. The algorithm can meet the needs of underwater observation and operation and provide accurate recognition of underwater targets for underwater robots. The invention can greatly improve the recognition accuracy of underwater targets, provide effective visual information for underwater robots observing and manipulating underwater targets, and improve the intelligent recognition capability for underwater targets.

Embodiment 1: As shown in Figure 1, the invention implements a machine learning based underwater target recognition algorithm to meet the needs of underwater observation and operation. The core of the algorithm is the SSD object detection algorithm, a very high-precision detection method with a feedforward convolutional network structure. First, the underwater target image is preprocessed and restored with retinex to obtain the input image; then convolution boxes of different sizes are applied between different layers to obtain feature maps of different scales; regression is performed on the resulting feature maps, and the final result is obtained by NMS suppression. SSD detection uses multiple scales and anchor points to solve the low precision of region proposals; the multi-scale feature vectors greatly improve performance on both small and large targets, substantially help overall recognition accuracy, and obtain more precise location information than previous proposal-based methods. The subsequent convolutional network uses convolution templates of different scales for feature fusion, producing default boxes of different scales with offsets, and finally the detection and classification result is obtained by adding a non-maximum suppression algorithm. The specific steps are: use the standard VGG-16 network, perform feature detection with convolution templates of different sizes to obtain different feature maps, normalize each feature map, pass them through detectors and classifiers of different sizes to obtain default boxes of different scales with offsets, and finally detect the target with the non-maximum suppression algorithm.

The SSD multi-scale implementation process is shown in Figure 1. The front-end network used for feature extraction is the standard convolutional neural network VGG-16; the subsequent convolutional layers use convolution templates of different scales for feature fusion, producing default boxes of different scales with offsets, and the final detection and classification result is obtained by adding a non-maximum suppression algorithm.

Embodiment 2: As shown in Figure 2, in the SSD convolutional network the RPN (Region Proposal Network) can be used to compute region results, which are divided according to different scales into feature maps of sizes 38×38×512, 19×19×1024, 10×10×512, 5×5×256, 3×3×256, 1×1×256, and so on. Figure 2 uses the 5×5×256 map as an example to illustrate the RPN process: k default boxes are generated with different scales and ratios, and with k = 6 this yields 5×5×256 convolution boxes together with 4 default boxes with offsets, each with a corresponding confidence class.
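The 8732×21 matrix used in Embodiment 3 below (8732 default boxes over 21 classes) is consistent with these feature map sizes under the standard SSD300 allocation of 4 or 6 boxes per location; the per-layer box counts are an assumption here, since the patent does not list them:

```python
# (feature map side, boxes per location) for the six detection layers
layers = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]
total = sum(side * side * k for side, k in layers)
print(total)   # 5776 + 2166 + 600 + 150 + 36 + 4 = 8732 default boxes
```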

The default scales here are obtained by predicting over m feature maps of different scales, where the smallest scale is set to $S_{min} = 0.2$ and the largest feature map scale is set to $S_{max} = 0.95$; the scales of all layers are then obtained by the corresponding recursion, computed as:

$$s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \qquad k \in [1, m] \tag{1}$$

The aspect ratios are taken as $a_r \in \{1, 2, 1/2, 3, 1/3\}$, and from $a_r$ the width and height of the corresponding default box can be obtained:

$$w_k^a = s_k\sqrt{a_r}, \qquad h_k^a = s_k/\sqrt{a_r} \tag{2}$$

where $w_k^a$ and $h_k^a$ are the width and height of the default box, respectively. In addition, when the ratio $a_r = 1$, an extra scale $s_k' = \sqrt{s_k s_{k+1}}$ is used, so that 6 default boxes of different scales are obtained in total.

At the same time, the center position of the selection box can be obtained by the following formula (3):

$$\left(\frac{i + 0.5}{|f_k|}, \frac{j + 0.5}{|f_k|}\right), \qquad i, j \in [0, |f_k|) \tag{3}$$

where $|f_k|$ denotes the scale of the k-th feature map, and x and y denote the center coordinates along the horizontal and vertical axes of the box, respectively.

NMS suppression: a convolutional neural network cannot localize the true position very accurately, so non-maximum suppression (NMS) is used to mitigate this.

Embodiment 3: As shown in Figures 3 and 4:

Step 1: arrange the elements of the 8732×21 matrix by Conf value.

Step 2: compute IoU for all overlapping regions in descending order of the results from step 1, set a threshold Th, compare each IoU with Th in order, and classify and assign boxes according to their values.

Step 3: return to step 2, starting from the box with the second-largest value in the column.

Step 4: repeat step 3 until all default boxes in this column have been processed.

Step 5: complete the traversal of the 8732×21 matrix, i.e., NMS has been performed for all classes.

Step 6: perform a further screening of the remainder, selecting among all remaining classes according to confidence.

Features are extracted according to the feature capture boxes designed at each feature map coordinate, and these features are used to predict the target class and bounding box. A 3*3 convolution kernel is used to extract the features in each feature capture box; the convolution box used for each feature map is 3*3*6*(class+4), where 6 is the number of capture boxes at each feature map coordinate and 4 is the offset between the predicted target boundary and the target bounding box. If a feature map has size m*n with 6 boxes at each coordinate, the final output is m*n*6*(class+4), as sketched below:
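A short sketch of this output layout; the function name and the 21-class example are illustrative assumptions:

```python
def head_output_size(m, n, num_classes, boxes_per_cell=6):
    """m * n * boxes * (class scores + 4 box offsets) predictions per layer."""
    return m * n * boxes_per_cell * (num_classes + 4)

print(head_output_size(5, 5, 21))   # 5*5*6*25 = 3750 predictions for the 5x5 map
```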

Loss function calculation: the loss function of the whole model is as follows:

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right) \tag{5}$$

where x is used to judge whether a designed feature capture box has a corresponding target: $x_{ij}^p$ indicates whether the i-th box matches the j-th target bounding box of a class-p object, 1 if matched and 0 otherwise; $\sum_i x_{ij}^p \ge 1$ indicates that at least one box matches the j-th target bounding box; N denotes the number of matches. The first part of formula (5) measures recognition performance; it is essentially a multi-class softmax loss, with $\hat{c}_i^p = \exp(c_i^p)/\sum_p \exp(c_i^p)$. The second part of formula (5) measures bounding-box prediction performance, using the following loss function:

$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^k\, \mathrm{smooth}_{L1}\left(l_i^m - \hat{g}_j^m\right) \tag{6}$$

where $\hat{g}_j^m$ denotes the deviation between the true target bounding box of the j-th target and the feature capture box, with $m \in \{cx, cy, w, h\}$, where (cx, cy) are the coordinates of the box center and (w, h) the width and height of the box. The position information of the final box can be expressed by the following formula (7):

$$\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \quad \hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}, \quad \hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \quad \hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}} \tag{7}$$

At the same time, the comprehensive confidence loss function over multiple regions can be expressed as:

$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^p \log(\hat{c}_i^p) - \sum_{i \in Neg} \log(\hat{c}_i^0), \qquad \hat{c}_i^p = \frac{\exp(c_i^p)}{\sum_p \exp(c_i^p)} \tag{8}$$
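Under the standard SSD formulation assumed in the reconstruction above, a minimal sketch of the combined loss for already-matched boxes might look like the following; the matched-index bookkeeping and hard negative mining are omitted:

```python
import numpy as np

def smooth_l1(x):
    """smooth_L1(x) = 0.5 * x**2 if |x| < 1, else |x| - 0.5."""
    ax = np.abs(x)
    return np.where(ax < 1, 0.5 * x ** 2, ax - 0.5)

def ssd_loss(loc_pred, loc_target, cls_logits, cls_labels, alpha=1.0):
    """L = (L_conf + alpha * L_loc) / N over N matched default boxes.

    loc_pred, loc_target: (N, 4) predicted and encoded target offsets (cx, cy, w, h).
    cls_logits: (N, num_classes) class scores; cls_labels: (N,) matched class indices.
    """
    n = loc_pred.shape[0]
    l_loc = smooth_l1(loc_pred - loc_target).sum()                 # formula (6)
    # Multi-class softmax cross-entropy, formula (8)
    shifted = cls_logits - cls_logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    l_conf = -log_probs[np.arange(n), cls_labels].sum()
    return (l_conf + alpha * l_loc) / n                            # formula (5)
```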

Embodiment 5: As shown in Figure 5, for multi-angle recognition in the underwater environment and recognition of individual underwater fish, 200 photos of underwater fish and crabs in different forms were collected from the Internet and from experiment pool trials. The network was trained and evaluated in terms of the loss function, recognition accuracy, and localization accuracy (IoU). The target Conf threshold was set above 0.5, recursive detection with exclusion was performed within the central 1/2 region of the image, and detection and recognition were carried out for complex real situations such as densely distributed fish and crossing targets. The method can fairly precisely recognize target fish in different poses and positions; it accurately handles near and far views, multiple angles, and complex side-by-side distributions of several fish; it achieves multi-target recognition and accurately recognizes partially occluded fish. After 20,000 iterations, the error curve computed from the loss function is shown in Figure 6: the initial error is defined as 500, i.e., the convergence range is (0, 500), and the error finally converges to around 20, an error rate of roughly less than 1%. The localization accuracy plot shows oscillatory convergence throughout training, which is caused by underwater imaging blur and ghosting. The results demonstrate that the method can well overcome the misrecognition caused by the low recognition accuracy of blurry underwater imaging.

Claims (6)

1. An underwater target identification method based on machine learning is characterized by comprising the following steps:
step 1, preprocessing an underwater target image and performing retinex restoration on the image to obtain an input image;
step 2, carrying out convolution with convolution boxes of different sizes and scales between different layers to obtain feature maps of different scales;
step 3, performing regression according to the feature maps obtained in step 2, normalizing each feature map, and obtaining default boxes of different scales with offsets through detectors and classifiers of different sizes;
and step 4, obtaining a final result through an NMS (non-maximum suppression) algorithm.
2. The machine learning-based underwater target recognition method according to claim 1, characterized in that: in step 2, region results obtained by RPN calculation in a region selection algorithm are adopted, feature maps of different sizes are obtained by dividing according to different scales, an RPN convolution kernel is then used to move over the regions of the feature map and obtain the confidence value of each region, and a confidence matrix is obtained by continuously moving default boxes of different scales.
3. The machine learning-based underwater target recognition method according to claim 1, characterized in that: the width and height of the default box in step 3 are:

$$w_k^a = s_k\sqrt{a_r}, \qquad h_k^a = s_k/\sqrt{a_r}$$

where $w_k^a$ and $h_k^a$ are the width and height of the default box, respectively, the aspect ratios are $a_r \in \{1, 2, 1/2, 3, 1/3\}$, and

$$s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \qquad k \in [1, m]$$

is the scale of each layer, where m is the number of feature maps; the default scales are realized by predicting over m feature maps of different scales, with $S_{min}$ the smallest scale and $S_{max}$ the scale of the largest feature map.
4. The machine learning-based underwater target recognition method according to claim 1, characterized in that: the center position of the selection box in step 3 is:

$$\left(\frac{i + 0.5}{|f_k|}, \frac{j + 0.5}{|f_k|}\right), \qquad i, j \in [0, |f_k|)$$

where x and y denote the center coordinates along the horizontal and vertical axes of the box, respectively, and $|f_k|$ denotes the scale of the k-th feature map.
5. The machine learning-based underwater target recognition method according to claim 1, characterized in that: the NMS method in step 4 comprises:
step 4.1: arranging the elements of the matrix by conf value;
step 4.2: computing IoU for all overlapping regions in descending order of the results calculated in step 4.1, setting a threshold Th, comparing each IoU with Th in order, and classifying and assigning boxes according to their values;
step 4.3: returning to step 4.2, starting from the box with the second-largest value in the column;
step 4.4: repeating step 4.3 until all default boxes in the column have been processed;
step 4.5: completing the traversal of the matrix, i.e., performing NMS for all classes;
step 4.6: performing a further screening of the remainder, and selecting among all remaining classes according to confidence.
6. The machine learning-based underwater target recognition method according to claim 1, characterized in that: the loss function of the whole model in step 4 is

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$

where x is used to determine whether a designed feature capture box has a corresponding target: $x_{ij}^p$ indicates whether the i-th box matches the j-th target bounding box of a class-p object, 1 if matched and 0 otherwise; $\sum_i x_{ij}^p \ge 1$ indicates that at least one box matches the j-th target bounding box; N denotes the number of matches; $L_{conf}$ measures recognition performance; $L_{loc}$ measures bounding-box prediction performance; $\hat{g}_j^m$ denotes the deviation between the true target bounding box of the j-th target and the feature capture box, with $m \in \{cx, cy, w, h\}$, where (cx, cy) are the coordinates of the box center and (w, h) the width and height of the box.
CN201910950105.6A 2019-12-13 2019-12-13 Underwater target identification method based on machine learning Pending CN110826575A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910950105.6A CN110826575A (en) 2019-12-13 2019-12-13 Underwater target identification method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910950105.6A CN110826575A (en) 2019-12-13 2019-12-13 Underwater target identification method based on machine learning

Publications (1)

Publication Number Publication Date
CN110826575A true CN110826575A (en) 2020-02-21

Family

ID=69548686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910950105.6A Pending CN110826575A (en) 2019-12-13 2019-12-13 Underwater target identification method based on machine learning

Country Status (1)

Country Link
CN (1) CN110826575A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666822A (en) * 2020-05-13 2020-09-15 飒铂智能科技有限责任公司 Low-altitude unmanned aerial vehicle target detection method and system based on deep learning
CN112906458A (en) * 2021-01-08 2021-06-04 浙江大学 Swarm intelligent optimized underwater laser multi-target end-to-end automatic identification system
CN113313116A (en) * 2021-06-20 2021-08-27 西北工业大学 Vision-based accurate detection and positioning method for underwater artificial target
CN114882346A (en) * 2021-01-22 2022-08-09 中国科学院沈阳自动化研究所 Underwater robot target autonomous identification method based on vision

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423760A (en) * 2017-07-21 2017-12-01 西安电子科技大学 Based on pre-segmentation and the deep learning object detection method returned
CN110458160A (en) * 2019-07-09 2019-11-15 北京理工大学 An Algorithm for Recognition of Unmanned Boat Surface Targets Based on Deep Compression Neural Network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423760A (en) * 2017-07-21 2017-12-01 西安电子科技大学 Based on pre-segmentation and the deep learning object detection method returned
CN110458160A (en) * 2019-07-09 2019-11-15 北京理工大学 An Algorithm for Recognition of Unmanned Boat Surface Targets Based on Deep Compression Neural Network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
WEI LIU et al.: "SSD: Single Shot MultiBox Detector", Computer Vision - ECCV 2016 *
YTUSDC: "SSD算法详解" (Detailed Explanation of the SSD Algorithm), HTTPS://BLOG.CSDN.NET/YTUSDC/ARTICLE/DETAILS/86577939 *
徐岩 (Xu Yan) et al.: "基于卷积神经网络的水下图像增强方法" (Underwater Image Enhancement Method Based on Convolutional Neural Network), Journal of Jilin University (Engineering and Technology Edition) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666822A (en) * 2020-05-13 2020-09-15 飒铂智能科技有限责任公司 Low-altitude unmanned aerial vehicle target detection method and system based on deep learning
CN112906458A (en) * 2021-01-08 2021-06-04 浙江大学 Swarm intelligent optimized underwater laser multi-target end-to-end automatic identification system
CN112906458B (en) * 2021-01-08 2022-07-05 浙江大学 Group intelligent optimized underwater laser multi-target end-to-end automatic identification system
CN114882346A (en) * 2021-01-22 2022-08-09 中国科学院沈阳自动化研究所 Underwater robot target autonomous identification method based on vision
CN114882346B (en) * 2021-01-22 2024-07-09 中国科学院沈阳自动化研究所 Underwater robot target autonomous identification method based on vision
CN113313116A (en) * 2021-06-20 2021-08-27 西北工业大学 Vision-based accurate detection and positioning method for underwater artificial target

Similar Documents

Publication Publication Date Title
CN110059558B (en) A Real-time Detection Method of Orchard Obstacles Based on Improved SSD Network
CN109800689B (en) Target tracking method based on space-time feature fusion learning
CN108154118B (en) A kind of target detection system and method based on adaptive combined filter and multistage detection
CN110826575A (en) Underwater target identification method based on machine learning
CN110610165A (en) A Ship Behavior Analysis Method Based on YOLO Model
Jia et al. A fast and efficient green apple object detection model based on Foveabox
CN112232240A (en) Road sprinkled object detection and identification method based on optimized intersection-to-parallel ratio function
CN110310305B (en) A target tracking method and device based on BSSD detection and Kalman filtering
CN107977660A (en) Region of interest area detecting method based on background priori and foreground node
CN106815576B (en) Target tracking method based on continuous space-time confidence map and semi-supervised extreme learning machine
CN109087337B (en) Long-time target tracking method and system based on hierarchical convolution characteristics
CN104637052B (en) The method for tracing object detected based on goal directed conspicuousness
CN110298271A (en) Seawater method for detecting area based on critical point detection network and space constraint mixed model
Yue et al. SCFNet: Semantic correction and focus network for remote sensing image object detection
CN111275733A (en) Method for realizing rapid tracking processing of multiple ships based on deep learning target detection technology
CN107798329A (en) Adaptive particle filter method for tracking target based on CNN
CN113781521A (en) A bionic robotic fish detection and tracking method based on improved YOLO-DeepSort
Xie et al. A research of object detection on UAVs aerial images
CN116665097A (en) Self-adaptive target tracking method combining context awareness
Chai et al. 3D gesture recognition method based on faster R-CNN network
CN110111358A (en) A kind of method for tracking target based on multilayer temporal filtering
Wang MRCNNAM: Mask Region Convolutional Neural Network Model Based On Attention Mechanism And Gabor Feature For Pedestrian Detection
Tan et al. Multi-scale attention adaptive network for object detection in remote sensing images
CN113658223A (en) A method and system for multi-pedestrian detection and tracking based on deep learning
CN114332754A (en) Cascade R-CNN pedestrian detection method based on multi-metric detector

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200221)