CN116912670A - Deep sea fish identification method based on improved YOLO model - Google Patents

Deep sea fish identification method based on improved YOLO model

Info

Publication number
CN116912670A
Authority
CN
China
Prior art keywords
model
training
module
deep sea
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211441477.4A
Other languages
Chinese (zh)
Inventor
刘长红
温嘉文
吴博淳
刘金辉
李天注
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University
Original Assignee
Guangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou University filed Critical Guangzhou University
Priority to CN202211441477.4A priority Critical patent/CN116912670A/en
Publication of CN116912670A publication Critical patent/CN116912670A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/05 Underwater scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20212 Image combination
    • G06T 2207/20224 Image subtraction
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A 40/00 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A 40/80 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in fisheries management
    • Y02A 40/81 Aquaculture, e.g. of fish

Abstract

The invention relates to the field of deep learning target detection and discloses a deep sea fish identification method based on an improved YOLO model. The method comprises the steps of: using a generative adversarial network (GAN) for sample expansion to obtain expanded samples; training an improved YOLO-v7 model composed of an HN recursive gated convolution module, a CBS convolution module, a downsampling MP module, an SPPCSPC module for enlarging the receptive field, a RepConv module and an ASFF pyramid feature fusion module; dividing the expanded samples into test pictures and training pictures, splitting the training pictures at a 7:3 ratio into a training set and a validation set, labeling the class name of each deep sea fish, and performing data enhancement with Cutout and random affine transformation; inputting the enhanced data into the improved YOLO-v7 model for training, judging the performance of the model according to the loss function, and updating the model training parameters over 100 training iterations in total; and, after training, obtaining the model file with the optimal weights and using the model to detect deep sea fish pictures and videos.

Description

Deep sea fish identification method based on improved YOLO model
Technical Field
The invention relates to the field of deep learning target detection, in particular to a deep sea fish identification method based on an improved YOLO model.
Background
The invention described in reference [1] is an underwater video fish identification method based on a neural network. It trains a neural network model comprising an input layer, a first convolution layer, a second convolution layer, a third convolution layer, a max pooling layer, a fully connected layer and an output layer connected in sequence. The first convolution layer applies a separate convolution to each channel of the input layer to extract different features before fusing the feature maps; the second convolution layer uses multiple convolutions to extract different receptive-field scales for targets of different sizes, then performs feature map fusion and batch normalization. Each channel of the color image in the underwater video data, together with its grayscale image, serves as the model input; the model outputs a number of target bounding boxes with confidence scores, and targets are screened by confidence. The method meets the requirements of real-time video fish identification while reducing the image quality demanded of the camera.
The invention described in reference [2] belongs to the technical field of image target detection and discloses a fabric flaw detection method based on an improved YOLOv4 algorithm. It introduces the lightweight Coordinate Attention (CA) module into the backbone network, capturing cross-channel information as well as direction-aware and position-aware information so that the network focuses on targets of interest; deformable convolution (Deformable Convolutional Network, DCN) is added to strengthen the network's adaptability to flaws of variable shape and improve detection accuracy. For the feature fusion part, adaptively spatial feature fusion (ASFF) is applied on top of the original path aggregation network, so that the features extracted at each level are fused with different weights before prediction; in addition, the cross stage partial network structure (CSP) replaces part of the convolutions in the feature fusion part, greatly improving fabric flaw detection accuracy while maintaining speed.
The invention described in reference [3] discloses a fish identification method and device based on a convolutional neural network, comprising: (1) collecting original fish images and performing saliency analysis to locate and segment the fish targets, obtaining a foreground image that is linearly fused with the original image to produce a high-contrast fish image used as a training sample, thereby constructing the training set; (2) pre-training ResNet on ImageNet and taking the ResNet with fixed parameters as a feature extraction unit, whose output is connected in sequence to an average pooling layer and a Softmax classifier to form the fish identification network; (3) optimizing the network parameters of the fish identification network with the training set to obtain the fish identification model; (4) identifying the fish image to be recognized with the model and outputting the result. This method and device can identify fish accurately.
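For illustration only, the pipeline in steps (2)-(3) could be sketched in PyTorch as follows; the ResNet variant, the frozen layers and the classifier head are assumptions, not details taken from reference [3] (torchvision >= 0.13 is assumed for the weights enum):

import torch.nn as nn
from torchvision import models

def build_fish_classifier(num_classes: int) -> nn.Sequential:
    # ImageNet-pretrained ResNet-50 as a fixed feature extraction unit.
    resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    features = nn.Sequential(*list(resnet.children())[:-2])  # drop avgpool + fc
    for p in features.parameters():
        p.requires_grad = False          # keep the pretrained parameters fixed
    # Average pooling layer followed by a Softmax classifier, as in step (2).
    return nn.Sequential(
        features,
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(2048, num_classes), nn.Softmax(dim=1))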
Reference [4] addresses the problem that fish in natural environments vary widely in shape and are easily affected by changing light and backgrounds, which lowers the recognition accuracy and classification performance of conventional fish recognition algorithms based on color texture or feature point extraction.
[1] Underwater video fish identification method based on neural network
https://d.wanfangdata.com.cn/patent/ChJQYXRlbnROZXdTMjAyMjAzMjMSEENOMjAyMDExMzE5MzYxLjQaCGk4eGo1M3Rx
[2] Fabric flaw detection method based on YOLO v4 improved algorithm
https://d.wanfangdata.com.cn/patent/ChJQYXRlbnROZXdTMjAyMjAzMjMSEENOMjAyMTEwNTA1MzI2LlgaCHFqeWY1NXE2
[3] Fish identification method and device based on convolutional neural network
https://d.wanfangdata.com.cn/patent/ChJQYXRlbnROZXdTMjAyMjAzMjMSEENOMjAxOTEwOTEyMjg3LjgaCDhpcnZlN3Zp
[4] Fish identification algorithm based on improved AlexNet
https://d.wanfangdata.com.cn/periodical/ChlQZXJpb2RpY2FsQ0hJTmV3UzIwMjIxMDEzEg1kemtqMjAyMTA0MDAzGgh3NndnODJ1Zw%3D%3D
Object detection is one of the most important topics in computer vision. Most computer vision problems involve detecting visual object categories such as pedestrians, cars, buses and faces. The field is not limited to academia; it is also applied in video surveillance, healthcare, vehicle sensing and autonomous driving. Since AlexNet in 2012, target detection algorithms have developed rapidly in the field of deep learning, mainly along two lines: One-Stage and Two-Stage. One-Stage methods extract features directly through a convolutional neural network and predict the classification and localization of targets. Two-Stage methods first perform region generation, i.e. produce candidate regions (Region Proposals), and then predict the classification and localization of targets through a convolutional neural network.
the YOLO model is a target recognition model based on One-stage thought, and has good target detection performance. However, in the deep sea fish identification problem, the YOLOv7 trunk feature extraction network is a CNN network, the CNN has translational invariance and locality, the capability of global modeling long-distance modeling is lacking, the original YOLO model feature fusion network is PANet, and although the original YOLO model feature fusion network can better fuse features of targets with different scales compared with the FPN, so that the effect is improved, but there is room for improvement, and a more advanced feature fusion network exists, meanwhile, the existing data set of the deep sea fish is less, and under the condition of small sample input, how to improve the whole network precision is still to be studied.
The prior art improves on traditional neural networks, but research in recent years has continuously optimized neural network models and proposed further improved network structures; the latest target detection neural network models are both more accurate and faster. Meanwhile, with few learning samples, the prior art lacks an effective data enhancement method to expand the samples and achieve a better model. A deep sea fish identification method based on an improved YOLO model is therefore proposed.
Disclosure of Invention
(I) Technical problems to be solved
Aiming at the shortcomings of the prior art, the invention provides a deep sea fish identification method based on an improved YOLO model, which solves the problems described above.
(II) technical scheme
In order to achieve the above purpose, the present invention provides the following technical solutions: the deep sea fish identification method based on the improved YOLO model comprises the following steps:
the first step: collecting deep sea fish pictures;
the second step: using a generative adversarial network (GAN) for sample expansion to obtain expanded samples;
the third step: training an improved YOLO-v7 model composed of an HN recursive gated convolution module, a CBS convolution module, a downsampling MP module, an SPPCSPC module for enlarging the receptive field, a RepConv module and an ASFF pyramid feature fusion module;
the fourth step: dividing the expanded samples into test pictures and training pictures, splitting the training pictures at a 7:3 ratio into a training set and a validation set, labeling the class name of each deep sea fish, and performing data enhancement with Cutout and random affine transformation;
the fifth step: inputting the enhanced data into the improved YOLO-v7 model for training, judging the performance of the model according to the loss function, and updating the model training parameters over 100 training iterations in total;
the sixth step: after training, obtaining the model file with the optimal weights and using the model to detect deep sea fish pictures and videos.
Preferably, the generative adversarial network used in the second step consists of a discrimination model and a generation model. The input deep sea fish image samples are divided at a 6:4 ratio into a training set and a test set; the classified pictures are input into the generative adversarial network for training to obtain a generation model and a discrimination model; the pictures in the training set are then input into the trained generation model to obtain a number of simulated samples; finally, the generated simulated samples are combined with the original samples as a new data set.
Preferably, the HN recursive gated convolution module and the ASFF pyramid feature fusion module are improvements on the original network: the HN recursive gated convolution module is added to the backbone network and the detection head network, and the original feature fusion network of the YOLO-v7 network is replaced with the ASFF adaptive feature fusion network. The core of the HN recursive gated convolution module is recursive gated convolution, which performs high-order spatial interaction using gated convolutions and a recursive design.
Preferably, the principle of the high-order spatial interaction is as follows:
S1: first, a pair of projection features p_0 and q_0 is obtained from the input x using a linear projection function \phi_{in}:
[p_0, q_0] = \phi_{in}(x)
S2: the gated convolution is then performed recursively:
p_{k+1} = f_k(q_k) \odot g_k(p_k) / \alpha, \quad k = 0, 1, \ldots, n-1
where the output is scaled by 1/\alpha to stabilize training, \{f_k\} is a set of depth-wise convolution layers, and \{g_k\} is used to match the dimensions in different orders;
the output p_n of the last recursive step is fed to the projection layer \phi_{out} to obtain the final recursive gated convolution result. To reduce the computational overhead caused by high-order interactions, the channel dimension in each order is set to
C_k = C / 2^{n-k-1}, \quad 0 \le k \le n-1
S3: the original feature fusion network of the YOLO-v7 network is replaced with the ASFF adaptive feature fusion network: for the features of a given level, the features of the other levels are first resized to the same resolution and simply integrated, and training then finds the optimal fusion; at each spatial position, features of different levels are fused adaptively;
S4: the spatial weight of each scale's features in the fusion is adjusted adaptively through learning; the feature fusion formula is as follows:
y_{ij}^{l} = \alpha_{ij}^{l} \cdot x_{ij}^{1\to l} + \beta_{ij}^{l} \cdot x_{ij}^{2\to l} + \gamma_{ij}^{l} \cdot x_{ij}^{3\to l}
S5: the value of \alpha_{ij}^{l} is obtained with the softmax algorithm (likewise \beta_{ij}^{l} and \gamma_{ij}^{l}, so that the three weights sum to 1):
\alpha_{ij}^{l} = e^{\lambda_{\alpha,ij}^{l}} / (e^{\lambda_{\alpha,ij}^{l}} + e^{\lambda_{\beta,ij}^{l}} + e^{\lambda_{\gamma,ij}^{l}})
S6: the ASFF pyramid feature fusion module is added to the YOLO-v7 model.
Preferably, the Cutout data enhancement randomly masks out a region of the picture during training, and the random affine transformation comprises random rotation, translation, scaling and shear operations on the image.
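By way of a hedged sketch (the patent does not give an implementation), the two augmentations could look as follows in Python with PyTorch/torchvision; the mask size and the affine parameter ranges are assumed values:

import random
import torch
from torchvision import transforms

def cutout(img: torch.Tensor, size: int = 32) -> torch.Tensor:
    # Randomly mask out one square region of a CHW image tensor.
    _, h, w = img.shape
    cy, cx = random.randrange(h), random.randrange(w)
    y1, y2 = max(0, cy - size // 2), min(h, cy + size // 2)
    x1, x2 = max(0, cx - size // 2), min(w, cx + size // 2)
    out = img.clone()
    out[:, y1:y2, x1:x2] = 0.0
    return out

# Random rotation, translation, scaling and shear in a single affine transform,
# followed by Cutout; all parameter ranges here are illustrative assumptions.
augment = transforms.Compose([
    transforms.RandomAffine(degrees=10, translate=(0.1, 0.1),
                            scale=(0.9, 1.1), shear=5),
    transforms.Lambda(cutout),
])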
Preferably, the loss function of YOLO-v7 mainly consists of the classification loss, the obj confidence loss and the localization loss, calculated as follows:
Loss = \lambda_1 L_{cls} + \lambda_2 L_{obj} + \lambda_3 L_{loc}
The localization loss function is CIOU, calculated as:
L_{CIOU} = 1 - IoU + \rho^2(b, b^{gt}) / c^2 + \alpha v
where \rho(b, b^{gt}) is the Euclidean distance between the center points of the predicted box and the ground-truth box, and c is the diagonal length of the smallest enclosing region that contains both boxes; \alpha and v are calculated as:
v = (4/\pi^2) (\arctan(w^{gt}/h^{gt}) - \arctan(w/h))^2, \quad \alpha = v / ((1 - IoU) + v)
The final Loss value is:
Loss = \lambda_1 L_{cls} + \lambda_2 L_{obj} + \lambda_3 L_{CIOU}
(III) Beneficial effects
Compared with the prior art, the deep sea fish identification method based on the improved YOLO model provides the following beneficial effects:
1. Through expansion of the deep sea fish data samples and improvement of the YOLO-v7 network, the method achieves high accuracy and high speed even when the input sample data set is small. Facing a small data set, a generative adversarial network (GAN) is used to generate more sample data. The original YOLO-v7 model is improved by introducing the HN recursive gated convolution module and the ASFF adaptive feature fusion network; experiments verify that both improvements have a positive effect on model performance.
2. The method provides accurate and efficient target detection; no equivalent scheme has yet been found on the market, so it fills the technical gap of efficient target detection for small-sample deep sea fish and has good application prospects.
Drawings
FIG. 1 is a schematic overall flow diagram;
FIG. 2 is a schematic diagram of the generative adversarial network process;
FIG. 3 is a schematic flow diagram of an expanded portion of a sample dataset;
FIG. 4 is a schematic diagram of an ASFF pyramid feature fusion network;
FIG. 5 is a schematic diagram of the model test effect.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1-5, the deep sea fish identification method based on the improved YOLO model comprises the following steps:
the first step: the invention relates to expansion of a sample data set, which expands a data sample by utilizing a generated antagonistic neural network (GAN), wherein the generated antagonistic neural network (GAN) is divided into a judging model and a generating model, the generating model is used for capturing the distribution of a real sample and generating a new simulation sample according to the distribution, and the judging model is a classifier for judging whether the input is the real sample or the simulation sample. The generation model and the countermeasure model enable the discrimination model to correctly discriminate the source of the training sample through continuous countermeasure training, and simultaneously enable the simulation sample generated by the generation model to be more similar to the real sample, thereby achieving the purpose of expanding sample data, and the input deep sea fish image sample is processed according to the following steps: 4, dividing the ratio into a training set and a testing set, inputting the classified pictures into a generated countermeasure network for training to obtain a generated model and a judging model, inputting the pictures in the training set into the generated model obtained after the previous training to obtain a plurality of simulation samples, and finally combining the generated simulation samples with the original samples to serve as a new data set for the next training of the improved YOLO-v7 model;
and a second step of: the improved YOLO v7 model is trained, and the improved YOLO-v7 model is composed of an HN recursion gating convolution module, a CBS convolution module, a downsampling MP module, a receptive field increasing module SPPCSPC module, a RepConv module and an ASFF pyramid feature fusion module. The HN recursion gating convolution module and the ASFF pyramid feature fusion module are improved modules based on the original network. And adding an HN recursion gating convolution module into an original backbone network (backbone) and a detection head network (head), and improving an original feature fusion network (PANet) of the YOLO-v7 network into an ASFF self-adaptive feature fusion network. The main principle of the HN recursive gated convolution module is recursive gate convolution (g n Conv). The characteristics of input self-adaption, long-range and high-order space interaction of the transducer neural network model can also be effectively realized through a convolution-based framework. Recursive gate convolution (gn Conv), high order spatial interactions are performed with the gate convolution and the recursive design. This approach has a high degree of flexibility and customizable, is compatible with various variants of convolution, and extends the second order interactions in self-attention to arbitrary orders without introducing significant additional computation. The principle for higher order spatial interactions is as follows:
first, a set of projection features p0 and p0 are obtained using a linear projection function
The convolution is then performed in the following recursive manner:
wherein the output is scaled by 1/α to stabilize training, { f k The } is a set of deep convolutional layers, { g k -for matching dimensions in different orders;
the last recursive step q n Is fed to the projection layer phi out And obtaining a final recursive gate convolution result. In order to reduce the computational overhead caused by higher-order interactions, the channel dimensions in each order are set to:
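A minimal PyTorch sketch of such a recursive gated convolution block follows; it mirrors the public HorNet design rather than any code in the patent, and the default order n = 3, the 7x7 depth-wise kernel and the scaling value alpha are assumptions:

import torch
import torch.nn as nn

class GnConv(nn.Module):
    def __init__(self, dim: int, order: int = 3, alpha: float = 1.0):
        super().__init__()
        self.order = order
        self.alpha = alpha
        # C_k = C / 2^(n-k-1): smallest channels first, doubling at each order.
        self.dims = [dim // 2 ** (order - k - 1) for k in range(order)]
        self.proj_in = nn.Conv2d(dim, 2 * dim, 1)                 # phi_in
        self.dwconv = nn.Conv2d(sum(self.dims), sum(self.dims), 7,
                                padding=3, groups=sum(self.dims))  # {f_k}
        self.g = nn.ModuleList(                                    # {g_k}, k >= 1
            nn.Conv2d(self.dims[k], self.dims[k + 1], 1)
            for k in range(order - 1))
        self.proj_out = nn.Conv2d(dim, dim, 1)                     # phi_out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p0, q = torch.split(self.proj_in(x), (self.dims[0], sum(self.dims)), 1)
        qs = torch.split(self.dwconv(q) / self.alpha, self.dims, 1)
        p = p0 * qs[0]                       # first-order gated interaction
        for k in range(self.order - 1):      # higher-order recursive steps
            p = self.g[k](p) * qs[k + 1]
        return self.proj_out(p)

# Usage: GnConv(64)(torch.randn(1, 64, 32, 32)) returns a (1, 64, 32, 32)
# tensor; dim must be divisible by 2**(order - 1).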
the original feature fusion network (PANet) of the YOLO-v7 network is improved to an ASFF adaptive feature fusion network. ASFF enables the network to learn directly how to spatially filter features at other levels, leaving only useful information to combine. For a certain level of features, the other levels of features are first tuned to the same resolution and simply integrated, and then trained to find the best fusion approach. At each spatial location, features of different levels are adaptively fused together, for example: if a certain position carries contradictory information, the features will be filtered out, and if the features of a certain position have more distinguishing clues, the features will be enhanced;
the spatial weights of the scale features at the time of fusion are adaptively adjusted through learning.
The feature fusion formula is as follows:
y_{ij}^{l} = \alpha_{ij}^{l} \cdot x_{ij}^{1\to l} + \beta_{ij}^{l} \cdot x_{ij}^{2\to l} + \gamma_{ij}^{l} \cdot x_{ij}^{3\to l}
The value of \alpha_{ij}^{l} is obtained with the softmax algorithm (likewise \beta_{ij}^{l} and \gamma_{ij}^{l}, the three weights summing to 1):
\alpha_{ij}^{l} = e^{\lambda_{\alpha,ij}^{l}} / (e^{\lambda_{\alpha,ij}^{l}} + e^{\lambda_{\beta,ij}^{l}} + e^{\lambda_{\gamma,ij}^{l}})
Adding the ASFF pyramid feature fusion module to the detection head of YOLO-v7 completes the improvement of the whole YOLO-v7 model;
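The following is a hedged sketch of one such adaptive fusion step for three levels already resized to a common resolution; the single 1x1 weight predictor operating on the concatenated maps is an assumption (the original ASFF first compresses each level separately before predicting weights):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ASFF(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Predict one weight map (lambda_alpha/beta/gamma) per input level.
        self.weight = nn.Conv2d(3 * channels, 3, kernel_size=1)

    def forward(self, x1, x2, x3):
        # x1, x2, x3: features of three levels, resized to the same H x W x C.
        w = F.softmax(self.weight(torch.cat([x1, x2, x3], dim=1)), dim=1)
        a, b, g = w[:, 0:1], w[:, 1:2], w[:, 2:3]   # alpha + beta + gamma = 1
        return a * x1 + b * x2 + g * x3

# Usage sketch: resize the level-2/3 maps to the level-1 resolution first,
# e.g. with F.interpolate(x2, size=x1.shape[-2:], mode="nearest").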
the third step: training the model. The expanded samples obtained in the first step are divided into test pictures and training pictures. The test pictures are reserved for testing the final effect of the model; the training pictures are split at a 7:3 ratio into a training set and a validation set, and the class name of each deep sea fish is labeled with the LabelImg annotation software. The GT boxes are clustered according to the labels; each GT box label is the tuple (c, x1, y1, x2, y2), where c is the category of the object contained in the GT box, x1 and y1 are the x and y coordinates of the top-left vertex, and x2 and y2 are the x and y coordinates of the bottom-right vertex. The data enhancement part uses Cutout and random affine transformation: Cutout randomly masks out a region of the picture during training, which improves the robustness of the model, and the random affine transformation comprises random rotation, translation, scaling and shear operations on the image. The enhanced data are input into the improved YOLO-v7 model for training, the performance of the model is judged according to the loss function, and the model training parameters are updated over 100 training iterations. The loss function of YOLO-v7 mainly comprises three parts, the classification loss (Class loss), the obj confidence loss (Object loss) and the localization loss (Location loss), calculated as follows:
Loss = \lambda_1 L_{cls} + \lambda_2 L_{obj} + \lambda_3 L_{loc}
The localization loss function is CIOU (Complete IoU), calculated as:
L_{CIOU} = 1 - IoU + \rho^2(b, b^{gt}) / c^2 + \alpha v
where \rho(b, b^{gt}) is the Euclidean distance between the center points of the predicted box and the ground-truth box, and c is the diagonal length of the smallest enclosing region that contains both boxes; \alpha and v are calculated as:
v = (4/\pi^2) (\arctan(w^{gt}/h^{gt}) - \arctan(w/h))^2, \quad \alpha = v / ((1 - IoU) + v)
The final Loss value is:
Loss = \lambda_1 L_{cls} + \lambda_2 L_{obj} + \lambda_3 L_{CIOU}
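As a direct, illustrative transcription of the formulas above (not the authors' code), the CIOU loss can be computed for batches of boxes in (x1, y1, x2, y2) format:

import math
import torch

def ciou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7):
    # Intersection over union of predicted and ground-truth boxes.
    ix1, iy1 = torch.max(pred[:, 0], target[:, 0]), torch.max(pred[:, 1], target[:, 1])
    ix2, iy2 = torch.min(pred[:, 2], target[:, 2]), torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # rho^2: squared distance between the two box centers.
    rho2 = ((pred[:, 0] + pred[:, 2] - target[:, 0] - target[:, 2]) ** 2 +
            (pred[:, 1] + pred[:, 3] - target[:, 1] - target[:, 3]) ** 2) / 4
    # c^2: squared diagonal of the smallest enclosing box.
    cx1, cy1 = torch.min(pred[:, 0], target[:, 0]), torch.min(pred[:, 1], target[:, 1])
    cx2, cy2 = torch.max(pred[:, 2], target[:, 2]), torch.max(pred[:, 3], target[:, 3])
    c2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2 + eps
    # Aspect-ratio consistency term v and trade-off coefficient alpha
    # (alpha is commonly detached from the gradient).
    v = (4 / math.pi ** 2) * (
        torch.atan((target[:, 2] - target[:, 0]) / (target[:, 3] - target[:, 1] + eps)) -
        torch.atan((pred[:, 2] - pred[:, 0]) / (pred[:, 3] - pred[:, 1] + eps))) ** 2
    alpha = (v / (1 - iou + v + eps)).detach()
    return 1 - iou + rho2 / c2 + alpha * v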
and obtaining a model file with the optimal model after training, and detecting the pictures and videos of the deep sea fishes by using the model.
The final part is testing the model: the test pictures set aside in the third step are used to test the accuracy of the model. Several deep sea fish sample images are input, and the final detection effect is shown in FIG. 5.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. The deep sea fish identification method based on the improved YOLO model is characterized by comprising the following steps of:
the first step: collecting deep sea fish pictures;
and a second step of: sample expansion is carried out on the production countermeasure neural network, and an expansion sample is obtained;
and a third step of: training an improved YOLO v7 model, wherein the improved YOLO-v7 model is composed of an HN recursion gating convolution module, a CBS convolution module, a downsampling MP module, a receptive field increasing module SPPCSPC module, a RepConv module and an ASFF pyramid feature fusion module;
fourth step: dividing the extended sample into an input test picture and an input training picture, and dividing the input training picture into 7:3, dividing the ratio into a test set and a verification set, marking class names of each deep sea fish, and carrying out data enhancement by adopting Cutout data enhancement and Random affine transformation;
fifth step: the improved YOLO-V7 model is input into the data after data enhancement for training, the performance of the model is judged according to the loss function, model training parameters are updated, and 100 training iterations are performed in total;
sixth step: and obtaining a model file with the optimal model after training, and detecting the pictures and videos of the deep sea fishes by using the model.
2. The method for identifying deep sea fish based on the improved YOLO model according to claim 1, wherein: the generative adversarial network in the second step consists of a discrimination model and a generation model; the input deep sea fish image samples are divided at a 6:4 ratio into a training set and a test set, the classified pictures are input into the generative adversarial network for training to obtain a generation model and a discrimination model, the pictures in the training set are input into the trained generation model to obtain a number of simulated samples, and finally the generated simulated samples are combined with the original samples as a new data set.
3. The method for identifying deep sea fish based on the improved YOLO model according to claim 1, wherein: the HN recursive gated convolution module and the ASFF pyramid feature fusion module are improvements on the original network; the HN recursive gated convolution module is added to the backbone network and the detection head network, the original feature fusion network of the YOLO-v7 network is replaced with the ASFF adaptive feature fusion network, the core of the HN recursive gated convolution module is recursive gated convolution, and high-order spatial interaction is performed with gated convolutions and a recursive design.
4. The method for identifying deep sea fish based on the improved YOLO model according to claim 1, wherein the principle of the high-order spatial interaction is as follows:
S1: first, a pair of projection features p_0 and q_0 is obtained from the input x using a linear projection function \phi_{in}:
[p_0, q_0] = \phi_{in}(x)
S2: the gated convolution is performed recursively:
p_{k+1} = f_k(q_k) \odot g_k(p_k) / \alpha, \quad k = 0, 1, \ldots, n-1
where the output is scaled by 1/\alpha to stabilize training, \{f_k\} is a set of depth-wise convolution layers, and \{g_k\} is used to match the dimensions in different orders;
the output p_n of the last recursive step is fed to the projection layer \phi_{out} to obtain the final recursive gated convolution result; to reduce the computational overhead caused by high-order interactions, the channel dimension in each order is set to
C_k = C / 2^{n-k-1}, \quad 0 \le k \le n-1
S3: the original feature fusion network of the YOLO-v7 network is replaced with the ASFF adaptive feature fusion network; for the features of a given level, the features of the other levels are resized to the same resolution and simply integrated, and training then finds the optimal fusion; at each spatial position, features of different levels are fused adaptively;
S4: the spatial weight of each scale's features in the fusion is adjusted adaptively through learning; the feature fusion formula is as follows:
y_{ij}^{l} = \alpha_{ij}^{l} \cdot x_{ij}^{1\to l} + \beta_{ij}^{l} \cdot x_{ij}^{2\to l} + \gamma_{ij}^{l} \cdot x_{ij}^{3\to l}
S5: the value of \alpha_{ij}^{l} is obtained with the softmax algorithm:
\alpha_{ij}^{l} = e^{\lambda_{\alpha,ij}^{l}} / (e^{\lambda_{\alpha,ij}^{l}} + e^{\lambda_{\beta,ij}^{l}} + e^{\lambda_{\gamma,ij}^{l}})
S6: the ASFF pyramid feature fusion module is added to the YOLO-v7 model.
5. The method for identifying deep sea fish based on the improved YOLO model according to claim 1, wherein: the Cutout data enhancement randomly masks out a region of the picture during training, and the random affine transformation includes random rotation, translation, scaling and shear operations on the image.
6. The method for identifying deep sea fish based on the improved YOLO model according to claim 1, wherein: the loss function of YOLO-v7 consists of the classification loss, the obj confidence loss and the localization loss, calculated as follows:
Loss = \lambda_1 L_{cls} + \lambda_2 L_{obj} + \lambda_3 L_{loc}
The localization loss function is CIOU, calculated as:
L_{CIOU} = 1 - IoU + \rho^2(b, b^{gt}) / c^2 + \alpha v
where \rho(b, b^{gt}) is the Euclidean distance between the center points of the predicted box and the ground-truth box, and c is the diagonal length of the smallest enclosing region that contains both boxes; \alpha and v are calculated as:
v = (4/\pi^2) (\arctan(w^{gt}/h^{gt}) - \arctan(w/h))^2, \quad \alpha = v / ((1 - IoU) + v)
The final Loss value is:
Loss = \lambda_1 L_{cls} + \lambda_2 L_{obj} + \lambda_3 L_{CIOU}
CN202211441477.4A 2022-11-17 2022-11-17 Deep sea fish identification method based on improved YOLO model Pending CN116912670A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211441477.4A CN116912670A (en) 2022-11-17 2022-11-17 Deep sea fish identification method based on improved YOLO model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211441477.4A CN116912670A (en) 2022-11-17 2022-11-17 Deep sea fish identification method based on improved YOLO model

Publications (1)

Publication Number Publication Date
CN116912670A true CN116912670A (en) 2023-10-20

Family

ID=88353633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211441477.4A Pending CN116912670A (en) 2022-11-17 2022-11-17 Deep sea fish identification method based on improved YOLO model

Country Status (1)

Country Link
CN (1) CN116912670A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117541586A (en) * 2024-01-10 2024-02-09 长春理工大学 Thyroid nodule detection method based on deformable YOLO


Similar Documents

Publication Publication Date Title
CN111259786B (en) Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
Li et al. Traffic light recognition for complex scene with fusion detections
CN108416266B (en) Method for rapidly identifying video behaviors by extracting moving object through optical flow
CN110929593B (en) Real-time significance pedestrian detection method based on detail discrimination
Geng et al. Using deep learning in infrared images to enable human gesture recognition for autonomous vehicles
CN111768388A (en) Product surface defect detection method and system based on positive sample reference
CN113920107A (en) Insulator damage detection method based on improved yolov5 algorithm
CN114638784A (en) Method and device for detecting surface defects of copper pipe based on FE-YOLO
CN111275010A (en) Pedestrian re-identification method based on computer vision
CN109902576B (en) Training method and application of head and shoulder image classifier
CN113408584A (en) RGB-D multi-modal feature fusion 3D target detection method
CN112329771A (en) Building material sample identification method based on deep learning
Avola et al. Real-time deep learning method for automated detection and localization of structural defects in manufactured products
CN114283326A (en) Underwater target re-identification method combining local perception and high-order feature reconstruction
Li et al. Ferrite beads surface defect detection based on spatial attention under weakly supervised learning
Yang et al. An improved algorithm for the detection of fastening targets based on machine vision
CN116912670A (en) Deep sea fish identification method based on improved YOLO model
Chen et al. Multi-scale attention networks for pavement defect detection
Xiang et al. Crowd density estimation method using deep learning for passenger flow detection system in exhibition center
CN110910497B (en) Method and system for realizing augmented reality map
CN110889418A (en) Gas contour identification method
CN111582057A (en) Face verification method based on local receptive field
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion
CN112199984B (en) Target rapid detection method for large-scale remote sensing image
CN114927236A (en) Detection method and system for multiple target images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination