CN115661628A - Fish detection method based on improved YOLOv5S model - Google Patents
- Publication number: CN115661628A (application CN202211339303.7A)
- Authority: CN (China)
- Prior art keywords: model, yolov5s, improved, fish, prediction
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A40/00—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
- Y02A40/80—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in fisheries management
- Y02A40/81—Aquaculture, e.g. of fish
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses a fish detection method based on an improved YOLOv5s model, comprising the following steps: collecting a fish image dataset; performing data enhancement on the dataset; and improving the YOLOv5s neural network by (1) adding an SE attention module to the C3 module, (2) replacing the basic convolution operation in the CSP module with depthwise separable convolution, and (3) changing the bounding-box loss function of the original YOLOv5s from the Complete IoU loss (CIoU) to the Efficient IoU loss (EIoU). After labeling, the dataset is sent to the improved YOLOv5s neural network for training and then tested. The invention realizes the detection and identification of fish, helps protect marine fish resources and promote their healthy and sustainable development, and provides a reference scheme for the monitoring and identification of other marine organisms.
Description
Technical Field
The invention relates to the field of deep learning target detection, in particular to a fish detection method based on an improved YOLOv5S model.
Background
The ocean is the largest ecosystem on Earth and contains extremely abundant biological resources. China's sea area exceeds 3 million square kilometers, and fishery resources are among the most important of its marine resources, yet they have shown a continuing decline in recent years. This is mainly because fishing intensity has kept increasing and fishing has been disorderly, which severely damages an already fragile marine ecological environment. It is therefore necessary to monitor the species and quantity of fish.
Traditional monitoring relies mainly on an observer stationed on the vessel, or on manual analysis of fishing videos recorded by onboard cameras. Both approaches depend heavily on people, whose working state is easily affected by external factors: after long repetitive work, a person's physical condition and mood can fluctuate greatly. Manual supervision also suffers from high cost and low efficiency.
Disclosure of Invention
The invention aims to provide a fish detection method based on an improved YOLOv5S model, which overcomes many defects of traditional manual fish species identification by automatically detecting fish and identifying the species to which they belong;
in order to achieve the above purpose, the technical scheme adopted by the invention comprises the following steps:
s1, obtaining fish image data, labeling the data, and dividing the labeled fish image data into a training set, a verification set and a test set;
and S2, constructing a YOLOv5S model, which specifically comprises a Backbone, a Neck and a Prediction part. The Backbone is the core module of the YOLOv5S model and consists of Focus, C3, SPP and other modules; the Neck consists of an FPN + PAN structure; the Prediction part computes the loss function of the detection model. The improvement to the YOLOv5S model involves adding an SE attention module to the C3 module, replacing the basic convolution operation in the CSP module with depthwise separable convolution, and changing the bounding-box loss function of the original YOLOv5s from the Complete IoU loss (CIoU) to the Efficient IoU loss (EIoU), yielding the improved YOLOv5S model;
s3, sending the training set obtained in the step S1 into the improved YOLOv5S model obtained in the step S2 for training; then, verification is carried out through the verification set obtained in the step S1, and parameter tuning is carried out on the improved YOLOv5S model according to the verification result; finally, carrying out model test on the improved YOLOv5S model through the test set obtained in the step S1, storing the model after the test is passed and taking the model as a detection model;
s4, inputting the fish image data to be detected into the detection model, which then performs detection on the fish image to be detected;
further, in the step S1 each fish picture is labeled using the LabelImg tool;
further, the division ratio of the training set, the verification set and the test set in the step S1 is 7:2:1;
further, the SE attention module in step S2 performs global average pooling on the input feature map to reduce the feature map to 1 × 1, then establishes connection between channels using two full connection layers and a nonlinear activation function, finally obtains normalized weight through a Sigmoid activation function, and then weights the original feature map channel by channel through multiplication to complete recalibration of the original feature;
further, in the step S2, the EIoU loss is:

$$L_{EIoU} = L_{IoU} + L_{dis} + L_{asp} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \frac{\rho^2(w, w^{gt})}{C_w^2} + \frac{\rho^2(h, h^{gt})}{C_h^2}$$

wherein $L_{IoU}$ represents the loss of the intersection over union IoU of the overlapping portions of the two rectangular boxes, $L_{dis}$ represents the distance loss, and $L_{asp}$ represents the side-length loss; $\rho^2(b, b^{gt})$ is the squared Euclidean distance between the center points of the prediction box and the real box, where $b$ is the center point of the prediction box, $b^{gt}$ is the center point of the real box (the superscript $gt$ denotes the ground truth), and $c$ is the diagonal distance of the minimum closure area that can contain both the prediction box and the real box; $\rho^2(w, w^{gt})$ is the squared difference between the width $w$ of the prediction box and the width $w^{gt}$ of the real box, and $C_w$ is the width of the minimum closure area; $\rho^2(h, h^{gt})$ is the squared difference between the height $h$ of the prediction box and the height $h^{gt}$ of the real box, and $C_h$ is the height of the minimum closure area;
further, in the step S3, the data in the training set is subjected to rotation, translation, scaling, random illumination and Mosaic8 enhancement operations, and the processed data is then sent to the improved YOLOv5S model for training;
compared with the prior art, the invention has the following beneficial effects: the fish detection model built on the single-stage improved YOLOv5S detector offers high recognition speed and can process large numbers of fish pictures. The added SE attention module lets the model focus on discriminative features of the fish, improving detection precision. Depthwise separable convolution further reduces the number of parameters and thus increases detection speed. The EIoU box-regression loss splits the aspect-ratio penalty term of the original CIoU loss into separate losses for the width and height of the target box and the anchor box; this new width-height loss directly minimizes the difference between their widths and heights, so convergence is faster;
drawings
Fig. 1 is a schematic flow chart of the fish detection method based on the improved YOLOv5S model according to the present invention.
Fig. 2 is a schematic diagram of channel-by-channel convolution in the improved YOLOv5S model according to the present invention.
FIG. 3 is a schematic diagram of point-by-point convolution in the improved YOLOv5S model according to the present invention.
FIG. 4 is a schematic flow chart of Mosaic data enhancement in the improved YOLOv5S model according to the present invention.
Detailed Description
Example (b):
fig. 1 is a flowchart of a fish detection method based on an improved YOLOV5S model according to an embodiment of the present invention. The specific steps are as follows:
s1, collecting fish image data, performing labeling processing and normalization processing on the fish image data, and dividing the fish image data into a training set, a verification set and a test set;
and S11, labeling is performed with the LabelImg tool, which generates an XML label file whose name corresponds to the picture name; the labels are then normalized;
further, the normalization formula is:

$$x = \frac{x_{min} + x_{max}}{2 \cdot width}, \quad y = \frac{y_{min} + y_{max}}{2 \cdot height}, \quad w = \frac{x_{max} - x_{min}}{width}, \quad h = \frac{y_{max} - y_{min}}{height}$$

wherein $(width, height)$ is the original size of the picture, $(x_{min}, y_{min})$ and $(x_{max}, y_{max})$ are the upper-left and lower-right corners of the original sample bounding box, and $(x, y)$ and $(w, h)$ are, respectively, the center coordinates and the width and height of the target after normalization;
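As a sketch, the normalization in S11 can be implemented as a small helper; the function name and argument order are illustrative, not taken from the patent:

```python
def normalize_bbox(width, height, x_min, y_min, x_max, y_max):
    """Convert a corner-format box to YOLO's normalized center format.

    (width, height) is the image size; the return values are the
    normalized center coordinates (x, y) and box size (w, h) in [0, 1].
    """
    x = (x_min + x_max) / (2 * width)
    y = (y_min + y_max) / (2 * height)
    w = (x_max - x_min) / width
    h = (y_max - y_min) / height
    return x, y, w, h

# Example: a 100x50 box with upper-left corner (200, 150) in a 640x480 image
print(normalize_bbox(640, 480, 200, 150, 300, 200))
```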
and S12, dividing the processed data into a training set, a verification set and a test set in the ratio 7:2:1;
s2, constructing an improved YOLOv5S network model;
s21 adds an SE attention module to the C3 module of the original YOLOv5S network. The SE attention module performs global average pooling on the input feature map to reduce the feature map to 1 × 1, then establishes connection between channels by using two full-connection layers and a nonlinear activation function, finally obtains normalized weight through a Sigmoid activation function, and then weights the original feature map channel by channel through multiplication to complete recalibration of the original feature;
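The SE recalibration described in S21 can be sketched in PyTorch as follows; the reduction ratio r=16 is a common default and an assumption here, not a value given in the patent:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention.

    Squeeze: global average pooling reduces each channel map to 1x1.
    Excitation: two fully connected layers with a nonlinearity, followed
    by a Sigmoid that yields normalized per-channel weights, which are
    multiplied back onto the input channel by channel.
    """
    def __init__(self, channels, r=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze to 1x1
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
            nn.Sigmoid(),                            # normalized weights in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # channel-wise recalibration

x = torch.randn(2, 64, 32, 32)
out = SEBlock(64)(x)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```

In YOLOv5s this block would be inserted inside the C3 module; the exact insertion point is not specified by the patent.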
s22 replaces the basic convolution operation in the CSP module of the original YOLOv5S network with depthwise separable convolution. Depthwise separable convolution consists of two steps: channel-by-channel (depthwise) convolution and point-by-point (pointwise) convolution. First, channel-by-channel convolution is performed, as shown in fig. 2: the number of convolution kernels equals the number of channels of the input feature map and, unlike ordinary convolution, each kernel operates on exactly one channel. Next, point-by-point convolution is performed, shown schematically in fig. 3: it resembles ordinary convolution with a kernel of size 1 × 1 × M, where M is the number of channels of the feature map output by the previous step; this operation weights and combines the feature maps produced by the channel-by-channel convolution along the channel direction to generate new feature maps;
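A minimal PyTorch sketch of this two-step convolution; the kernel size and channel counts are illustrative. The parameter count at the end shows why the replacement reduces model size:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: channel-by-channel + point-by-point.

    The depthwise step uses groups=in_ch so each kernel sees exactly one
    input channel (fig. 2); the pointwise 1x1 step then mixes the channels
    along the channel direction (fig. 3).
    """
    def __init__(self, in_ch, out_ch, k=3, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, stride,
                                   padding=k // 2, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Parameter comparison against an ordinary 3x3 convolution
ordinary = nn.Conv2d(64, 128, 3, padding=1, bias=False)
separable = DepthwiseSeparableConv(64, 128)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(ordinary), count(separable))  # 73728 8768
```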
s23, changing the bounding-box loss function of the original YOLOv5S from the Complete IoU loss (CIoU) to the Efficient IoU loss (EIoU). The expression of EIoU is:

$$L_{EIoU} = L_{IoU} + L_{dis} + L_{asp} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \frac{\rho^2(w, w^{gt})}{C_w^2} + \frac{\rho^2(h, h^{gt})}{C_h^2}$$

wherein $L_{IoU}$ represents the loss of the intersection over union IoU of the overlapping portions of the two rectangular boxes, $L_{dis}$ represents the distance loss, and $L_{asp}$ represents the side-length loss; $\rho^2(b, b^{gt})$ is the squared Euclidean distance between the center points of the prediction box and the real box, where $b$ is the center point of the prediction box, $b^{gt}$ is the center point of the real box (the superscript $gt$ denotes the ground truth), and $c$ is the diagonal distance of the minimum closure area that can contain both the prediction box and the real box; $\rho^2(w, w^{gt})$ is the squared difference between the width $w$ of the prediction box and the width $w^{gt}$ of the real box, and $C_w$ is the width of the minimum closure area; $\rho^2(h, h^{gt})$ is the squared difference between the height $h$ of the prediction box and the height $h^{gt}$ of the real box, and $C_h$ is the height of the minimum closure area;
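A hedged PyTorch sketch of the EIoU loss described above for corner-format boxes (x1, y1, x2, y2); this is an illustrative implementation, not the patent's exact code:

```python
import torch

def eiou_loss(pred, target, eps=1e-7):
    """EIoU: IoU loss + center-distance loss + separate width/height loss."""
    # Intersection and union for the IoU term
    x1 = torch.max(pred[..., 0], target[..., 0])
    y1 = torch.max(pred[..., 1], target[..., 1])
    x2 = torch.min(pred[..., 2], target[..., 2])
    y2 = torch.min(pred[..., 3], target[..., 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Minimum closure area enclosing both boxes
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps          # squared diagonal of the closure area

    # Center-distance term
    pcx, pcy = (pred[..., 0] + pred[..., 2]) / 2, (pred[..., 1] + pred[..., 3]) / 2
    tcx, tcy = (target[..., 0] + target[..., 2]) / 2, (target[..., 1] + target[..., 3]) / 2
    l_dis = ((pcx - tcx) ** 2 + (pcy - tcy) ** 2) / c2

    # Side-length terms: width and height penalized separately
    pw, ph = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    tw, th = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    l_asp = (pw - tw) ** 2 / (cw ** 2 + eps) + (ph - th) ** 2 / (ch ** 2 + eps)

    return 1 - iou + l_dis + l_asp

# Identical boxes give (almost) zero loss; a shifted box gives a positive loss
a = torch.tensor([0., 0., 10., 10.])
print(round(eiou_loss(a, a).item(), 6))  # 0.0
print(eiou_loss(a, a + 2.0).item() > 0)  # True
```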
s3, training and testing the built YOLOv5S neural network model;
firstly, carrying out image data enhancement processing on a training set;
s31, the data enhancement processing comprises the steps of carrying out random gamma transformation, random perspective transformation, random brightness-contrast transformation and noise-motion blur addition on the image;
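A minimal NumPy sketch of some of the photometric augmentations named in S31; all parameter ranges are assumptions for illustration, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_gamma(img, low=0.7, high=1.5):
    """Random gamma transform on a float image in [0, 1]."""
    return np.clip(img ** rng.uniform(low, high), 0.0, 1.0)

def random_brightness_contrast(img, b=0.2, c=0.2):
    """Random brightness/contrast: scale around 0.5, then shift."""
    alpha = 1.0 + rng.uniform(-c, c)   # contrast factor
    beta = rng.uniform(-b, b)          # brightness shift
    return np.clip((img - 0.5) * alpha + 0.5 + beta, 0.0, 1.0)

def add_gaussian_noise(img, sigma=0.05):
    """Additive Gaussian noise, clipped back into [0, 1]."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

img = rng.random((64, 64, 3))
aug = add_gaussian_noise(random_brightness_contrast(random_gamma(img)))
print(aug.shape)  # (64, 64, 3)
```

Random perspective transform and motion blur would typically come from an image-processing library and are omitted from this sketch.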
s32, Mosaic data enhancement is applied after the above processing. The flow is shown in fig. 4: Mosaic enhancement splices every four images into one by random scaling, random cropping and random arrangement. This effectively enriches the dataset and increases the number of targets; splicing four pictures together also effectively increases the batch size, so the variance and mean statistics computed during batch normalization are more reliable;
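The four-image splicing can be sketched as follows; the box-label remapping that real Mosaic augmentation also performs is omitted, and the crop-to-quadrant strategy is a simplifying assumption:

```python
import numpy as np

def mosaic4(images, size=640):
    """Stitch four equally-sized images into one mosaic around a random center."""
    rng = np.random.default_rng()
    # random mosaic center, kept away from the borders
    cx = int(rng.uniform(0.3, 0.7) * size)
    cy = int(rng.uniform(0.3, 0.7) * size)
    canvas = np.zeros((size, size, 3), dtype=images[0].dtype)
    regions = [  # quadrants: top-left, top-right, bottom-left, bottom-right
        (slice(0, cy), slice(0, cx)),
        (slice(0, cy), slice(cx, size)),
        (slice(cy, size), slice(0, cx)),
        (slice(cy, size), slice(cx, size)),
    ]
    for img, (ys, xs) in zip(images, regions):
        h = ys.stop - ys.start
        w = xs.stop - xs.start
        canvas[ys, xs] = img[:h, :w]  # crop each source image to its quadrant
    return canvas

imgs = [np.full((640, 640, 3), i, dtype=np.uint8) for i in range(4)]
m = mosaic4(imgs)
print(m.shape)  # (640, 640, 3)
```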
and then training the improved YOLOv5S neural network by using the training set and the verification set and testing the trained model by using the test set, wherein the method specifically comprises the following steps:
s33, prepare two folders: a label folder and an image dataset folder. The image dataset folder contains three subfolders corresponding to the divided training set, verification set and test set; the label folder stores the XML label files;
s34, training by using a PyTorch neural network framework;
s35, setting model hyper-parameters including BatchSize, iteration times, learning rate adjustment strategy and selection of an optimizer;
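An illustrative PyTorch setup for the hyper-parameters in S35; every concrete value here (batch size, learning rate, momentum, weight decay, scheduler choice) is an assumption, since the patent does not state them:

```python
import torch
import torch.nn as nn

# Assumed hyper-parameters for illustration only
batch_size = 16
epochs = 100
base_lr = 0.01

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.SiLU())  # stand-in for YOLOv5s
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                            momentum=0.937, weight_decay=5e-4)
# cosine-annealing learning-rate schedule over the training run
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for _ in range(3):   # a few dummy steps to show the schedule decaying
    optimizer.step()
    scheduler.step()
print(optimizer.param_groups[0]["lr"] < base_lr)  # True
```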
and S36, training the model to convergence with the training set, using the verification set for validation and parameter tuning during training, and storing the resulting weight file;
s37, testing the trained model by using the test set data in the step S33;
s4, inputting the fish image data to be detected into the tested detection model, which then performs detection on the fish image to be detected;
the previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (6)
1. A fish detection method based on an improved YOLOv5S model is characterized in that: the method comprises the following steps:
s1, obtaining fish image data, labeling the data, and dividing the labeled fish image data into a training set, a verification set and a test set;
s2, constructing a YOLOv5S model, which specifically comprises a Backbone, a Neck and a Prediction part; the Backbone is the core module of the YOLOv5S model and consists of Focus, C3, SPP and other modules; the Neck consists of an FPN + PAN structure; the Prediction part computes the loss function of the detection model; the improvement to the YOLOv5S model comprises adding an SE attention module to the C3 module, replacing the basic convolution operation in the CSP module with depthwise separable convolution, and changing the bounding-box loss function of the original YOLOv5s from the Complete IoU loss (CIoU) to the Efficient IoU loss (EIoU), obtaining the improved YOLOv5S model;
s3, performing data enhancement processing on the training set obtained in the step S1, and then sending the training set to the improved YOLOv5S model obtained in the step S2 for training; then, verification is carried out through the verification set obtained in the step S1, and parameter tuning is carried out on the improved YOLOv5S model according to the verification result; finally, performing model test on the improved YOLOv5S model through the test set obtained in the step S1, and storing the model after the test is passed and using the model as a detection model;
and S4, inputting the fish image data to be detected into the detection model, which then performs detection on the fish image to be detected.
2. The fish detection method based on the improved YOLOv5S model as claimed in claim 1, wherein in step S1 the LabelImg tool is used to label each fish picture.
3. The fish detection method based on the improved YOLOv5S model of claim 1, wherein the training set, the verification set and the test set are divided in the ratio 7:2:1.
4. the fish detection method based on the improved YOLOv5S model as claimed in claim 1, wherein in step S2, the SE attention module performs global average pooling on the input feature map to reduce the feature map to 1 × 1, then establishes inter-channel connections using two fully connected layers and a nonlinear activation function, finally obtains normalized weights through a Sigmoid activation function, and then weights the original feature map channel by channel through multiplication to complete the recalibration of the original feature.
5. The fish detection method based on the improved YOLOv5S model of claim 1, wherein the EIoU loss function in step S2 is:

$$L_{EIoU} = L_{IoU} + L_{dis} + L_{asp} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \frac{\rho^2(w, w^{gt})}{C_w^2} + \frac{\rho^2(h, h^{gt})}{C_h^2}$$

wherein $L_{IoU}$ represents the loss of the intersection over union IoU of the overlapping portions of the two rectangular boxes, $L_{dis}$ represents the distance loss, and $L_{asp}$ represents the side-length loss; $\rho^2(b, b^{gt})$ is the squared Euclidean distance between the center points of the prediction box and the real box, where $b$ is the center point of the prediction box, $b^{gt}$ is the center point of the real box (the superscript $gt$ denotes the ground truth), and $c$ is the diagonal distance of the minimum closure area that can contain both the prediction box and the real box; $\rho^2(w, w^{gt})$ is the squared difference between the width $w$ of the prediction box and the width $w^{gt}$ of the real box, and $C_w$ is the width of the minimum closure area; $\rho^2(h, h^{gt})$ is the squared difference between the height $h$ of the prediction box and the height $h^{gt}$ of the real box, and $C_h$ is the height of the minimum closure area.
6. The fish detection method based on the improved YOLOv5S model of claim 1, wherein in step S3, the data in the training set is subjected to rotation, translation, scaling, random illumination and Mosaic8 enhancement operations; and then the processed data is sent to an improved YOLOv5S model for training.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211339303.7A CN115661628A (en) | 2022-10-28 | 2022-10-28 | Fish detection method based on improved YOLOv5S model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115661628A true CN115661628A (en) | 2023-01-31 |
Family
ID=84992765
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211339303.7A Pending CN115661628A (en) | 2022-10-28 | 2022-10-28 | Fish detection method based on improved YOLOv5S model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115661628A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115588117A (en) * | 2022-10-17 | 2023-01-10 | 华南农业大学 | Citrus psylla detection method and system based on YOLOv5s-BC |
CN116245732A (en) * | 2023-03-13 | 2023-06-09 | 江南大学 | Yolov 5-based small-target reflective garment identification and detection method |
CN116704487A (en) * | 2023-06-12 | 2023-09-05 | 三峡大学 | License plate detection and recognition method based on Yolov5s network and CRNN |
CN117036985A (en) * | 2023-10-09 | 2023-11-10 | 武汉工程大学 | Small target detection method and device for video satellite image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||