CN116912674A - Target detection method and system based on improved YOLOv5s network model under complex water environment - Google Patents

Target detection method and system based on improved YOLOv5s network model under complex water environment

Info

Publication number
CN116912674A
CN116912674A
Authority
CN
China
Prior art keywords: improved, YOLOv5s, network model, module, feature
Prior art date
Legal status: Pending
Application number
CN202310951353.9A
Other languages
Chinese (zh)
Inventor
管志光
侯成龙
Current Assignee
Shandong Jiaotong University
Original Assignee
Shandong Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shandong Jiaotong University filed Critical Shandong Jiaotong University
Priority to CN202310951353.9A priority Critical patent/CN116912674A/en
Publication of CN116912674A publication Critical patent/CN116912674A/en
Pending legal-status Critical Current

Classifications

    • G06V 20/05 — Scenes; scene-specific elements: underwater scenes
    • G06N 3/045 — Neural network architectures: combinations of networks
    • G06N 3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N 3/048 — Activation functions
    • G06N 3/08 — Learning methods
    • G06V 10/40 — Extraction of image or video features
    • G06V 10/806 — Fusion of extracted features
    • G06V 10/82 — Recognition or understanding using neural networks
    • G06V 2201/07 — Target detection
    • Y02A 10/40 — Controlling or monitoring, e.g. of flood or hurricane; forecasting, e.g. risk assessment or mapping


Abstract

The invention discloses a target detection method and system for complex underwater environments based on an improved YOLOv5s network model. S1: acquire images of underwater seafood, annotate and partition the acquired images, and establish a seafood data set; S2: improve the backbone network and detection-head portion of the YOLOv5s network model to establish an improved YOLOv5s network model; S3: input the seafood data set into the improved YOLOv5s network model for training; S4: after training, input the images of the underwater seafood to be detected into the trained improved YOLOv5s network model for detection, thereby obtaining detection results for the underwater seafood to be detected. The network model of the invention pays more attention to seafood targets, reduces the influence of useless features, can be applied to seafood detection and recognition in complex underwater environments, and achieves high recognition accuracy.

Description

Target detection method and system based on improved YOLOv5s network model under complex water environment
Technical Field
The invention relates to the technical field of ocean information, in particular to a target detection method and system for complex underwater environments based on an improved YOLOv5s network model.
Background
At present, most target detection frameworks are built on convolutional-neural-network theory and detect targets in clear, on-land environments. The idea of such a network is to extract features from the input image and learn the extracted features further within the network, generating a weight file that enables detection of the recognized objects in new input images.
However, the underwater environment is complex and affected by factors such as illumination, so underwater images are of poor imaging quality. When a convolutional-neural-network-based target detection framework is applied to the underwater aquaculture industry, the detection process therefore suffers from low detection accuracy and inaccurate regression.
Disclosure of Invention
To solve the above problems in the prior art, the present invention provides a target detection method and system for complex underwater environments based on an improved YOLOv5s network model.
The technical scheme adopted for solving the technical problems is as follows:
The invention provides a target detection method for complex underwater environments based on an improved YOLOv5s network model, which comprises the following steps:
S1: acquiring images of underwater seafood, annotating and partitioning the acquired images, and establishing a seafood data set;
S2: improving the backbone network and detection-head portion of the YOLOv5s network model to establish an improved YOLOv5s network model;
S3: inputting the seafood data set into the improved YOLOv5s network model for training;
S4: after training, inputting the images of the underwater seafood to be detected into the trained improved YOLOv5s network model for detection, thereby obtaining detection results for the underwater seafood to be detected.
Preferably, in S1, the data set is divided into a training set and a test set and converted into a format readable by the deep learning framework.
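As a minimal sketch of this step (not part of the patent), the random split and the per-box label conversion could look as follows in Python. The helper names `split_dataset` and `to_yolo_txt` are hypothetical, and the txt line layout is the commonly used YOLO annotation format assumed here:

```python
import random

def split_dataset(image_ids, train_ratio=0.7, seed=42):
    """Randomly split image IDs into training and test lists (e.g. 7:3)."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]

def to_yolo_txt(class_id, box, img_w, img_h):
    """Convert a pixel-space box (xmin, ymin, xmax, ymax) into one YOLO txt
    line: 'class x_center y_center width height', normalised to [0, 1]."""
    xmin, ymin, xmax, ymax = box
    xc = (xmin + xmax) / 2 / img_w
    yc = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

For example, a 320×320 box at the top-left corner of a 640×640 image yields the line `0 0.250000 0.250000 0.500000 0.500000`.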
Preferably, in S3, the training-set images are input into the improved YOLOv5s network model for feature extraction: features are extracted in the backbone network, fused in the improved neck network, and finally output at the detection heads.
Preferably, in S4, after training of the improved YOLOv5s network model is completed, a trained weight file is generated; the test-set samples are input into the improved YOLOv5s network model, the weight file is loaded for prediction, and the recognition and detection results are finally output.
Preferably, the improved YOLOv5s network model comprises a backbone portion, a neck portion and an output portion.
Preferably, in the backbone portion, a Hor_Block attention module is integrated after each feature-extracting C3 layer of the backbone network to strengthen the feature-extraction capability of the YOLOv5s backbone on underwater images. The Hor_Block attention module applies, to the input tensor: feature-dimension transformation, layer normalization, a linear-layer transformation, nonlinear mapping by an activation function, a second linear-layer transformation, feature-dimension transformation back, and a DropPath layer, and then outputs the tensor. A feature map is output after each C3 module, denoted S1, S2, S3 and S4. The module specifically comprises the following steps:
S11: the input tensor X is normalized and enters a DropPath layer, where the tensor undergoes a recursive gated convolution operation, is multiplied by a scaling parameter, and finally has features randomly dropped;
S12: the tensor enters a feature-dimension transformation layer, where its dimension order is transformed from (N, C, H, W) to (N, H, W, C) and the tensor is normalized;
S13: the tensor is processed by one linear layer, an activation-function layer and another linear layer; if the learnable parameter T is not null, the tensor is multiplied by T;
S14: the dimension order of the tensor is changed back to (N, C, H, W), the result is added to the output tensor of the first stage as a residual, and the final tensor is output through a DropPath layer that randomly drops features.
Preferably, in the neck portion, a feature-pyramid-network structure is adopted between the backbone portion and the detection heads; the feature pyramid network processes feature information of different scales in the image so that targets of different sizes can be detected effectively.
In the model, a CBS module, an up-sampling module, a Concat module and a C3 module form the FPN structure, which adjusts the number of feature channels, changes the feature size, and finally fuses feature maps containing information at different scales.
The CBS module changes the number of channels of the feature map using 1×1 convolution kernels.
The up-sampling module combines low-resolution, high-semantic features with high-resolution, low-semantic features; deeper-level feature maps therefore need to be up-sampled.
The Concat module and the C3 module fuse the up-sampled feature map with the corresponding shallow feature map, improving target-detection performance; together, these two modules let the model handle targets of different scales effectively and give it stronger feature-representation capability.
The improvement of the FPN structure further comprises:
S21: adding 160×160 small-target detection after the 80×80 detection head, and extracting the feature map S1 from the C3 module of layer 2;
S22: adding a convolution module, an up-sampling module and a C3 module after layer 21 of the network, splicing in the feature map S1 led out from layer 2, and finally producing the output.
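The up-sample-then-Concat fusion described above (including the one feeding the new 160×160 branch) can be illustrated with a minimal NumPy sketch. This is an assumption-laden stand-in: nearest-neighbour repetition replaces the model's up-sampling module, and the function names are hypothetical:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x up-sampling of an (N, C, H, W) feature map,
    applied to the deeper map before fusion with a shallower one."""
    return x.repeat(2, axis=2).repeat(2, axis=3)

def concat_fuse(deep, shallow):
    """Up-sample the deeper map and concatenate along the channel axis
    (the Concat step that precedes the C3 module)."""
    up = upsample2x(deep)
    assert up.shape[2:] == shallow.shape[2:], "spatial sizes must match"
    return np.concatenate([up, shallow], axis=1)
```

After fusion, the channel count is the sum of the two inputs' channels, which is why a C3 (or CBS) module typically follows to re-mix and reduce channels.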
Preferably, in the output portion, the original three detection heads (80×80, 40×40 and 20×20) are changed into four detection heads (160×160, 80×80, 40×40 and 20×20).
Preferably, the loss function for training the improved YOLOv5s network model adopts CIoU Loss. The loss consists of three parts — confidence loss, class loss and position loss — which respectively measure the accuracy of the confidence, the correctness of the class judgment and the accuracy of the detection-box regression during model training, as shown in formulas (1)-(4):
L = L_box + L_cls + L_obj  (1)
where L_box denotes the position (bounding-box) loss, L_cls the class loss, and L_obj the confidence loss; L_box is implemented via the CIoU loss, and CIoU is calculated as:
CIoU = IoU − ρ²(b, b_gt)/c² − αv  (2)
v = (4/π²) · (arctan(w_gt/h_gt) − arctan(w/h))²  (3)
α = v / ((1 − IoU) + v)  (4)
where ρ²(b, b_gt) denotes the squared Euclidean distance between the centers of the ground-truth box and the predicted box, c the diagonal length of the smallest rectangle enclosing both boxes, v the discrepancy between the aspect ratios of the ground-truth and predicted boxes, α the weight coefficient, w and h the width and height of the predicted box, and w_gt and h_gt the width and height of the ground-truth box.
A target detection system for complex underwater environments based on the improved YOLOv5s network model comprises:
the acquisition module, used for acquiring images of underwater seafood;
the feature-extraction module, used for inputting the training-set images into the improved YOLOv5s network model for feature extraction;
the Hor_Block attention module, used for applying feature-dimension transformation, layer normalization, linear transformation, activation-function nonlinear mapping, a second linear transformation, feature-dimension transformation back and a DropPath layer to the input data, and outputting the tensor;
the improved YOLOv5s network model, used for extracting features from the data set collected by the acquisition module via the feature-extraction module, performing feature fusion in the improved neck network after feature extraction in the backbone network, and finally producing output at the detection heads;
and for inputting the test-set samples into the improved YOLOv5s network model, loading the weight file for prediction, and outputting the recognition and detection results of the test set.
Compared with the prior art, the invention has the following beneficial effects:
1. After the Hor_Block module is added, the mAP of the improved YOLOv5s is 1.5% higher than that of the original YOLOv5s, showing that adding the Hor_Block module to the backbone network makes the network pay more attention to seafood targets and reduces the influence of useless features, so the model can be applied to seafood detection and recognition in complex underwater environments.
2. After small-target detection is added, the accuracy of the model improves by 1.1%. Experimental results show that the improved YOLOv5s has 4.33M more parameters than the original YOLOv5s, while its final detection accuracy improves by 1.9% over the original; the invention can effectively improve seafood detection accuracy and meets the experimental requirements on detection accuracy.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic diagram of the Hor_Block attention Module in the present invention;
FIG. 2 is a schematic diagram of the improved YOLOv5s network structure according to the present invention;
FIG. 3 is a graph of initial model effects in accordance with the present invention;
FIG. 4 is a graph of the effect of the improved model of the present invention;
FIG. 5 is a graph of the effect of the initial model of the present invention;
FIG. 6 is a graph showing the effect of the model after improvement in the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
Example 1
As shown in FIGS. 1-6, this embodiment provides a target detection method for complex underwater environments based on an improved YOLOv5s network model, which comprises the following steps:
S1: acquiring images of underwater seafood, annotating and partitioning the acquired images, and establishing a seafood data set;
S2: improving the backbone network and detection-head portion of the YOLOv5s network model to establish an improved YOLOv5s network model;
S3: inputting the seafood data set into the improved YOLOv5s network model for training;
S4: after training, inputting the images of the underwater seafood to be detected into the trained improved YOLOv5s network model for detection, thereby obtaining detection results for the underwater seafood to be detected.
In S1, the underwater seafood includes common species such as sea cucumber, sea urchin, starfish and scallop; the data set is divided into a training set and a test set and converted into the txt format readable by the deep learning framework.
In S3, the training-set images are input into the improved YOLOv5s network model for feature extraction: features are extracted in the backbone network, fused in the improved neck network, and finally output at the detection heads.
In S4, after training of the improved YOLOv5s network model is completed, a trained weight file is generated; the test-set samples are input into the improved YOLOv5s network model, the weight file is loaded for prediction, and the recognition and detection results are finally output.
The improved YOLOv5s network model includes a backbone portion, a neck portion and an output portion.
Backbone portion: a Hor_Block attention module is integrated after each feature-extracting C3 layer of the backbone network; the structure of the Hor_Block attention module is shown in FIG. 1. The module applies feature-dimension transformation, layer normalization, linear transformation, activation-function nonlinear mapping, a second linear transformation, feature-dimension transformation back and a DropPath layer to the input data before outputting the tensor; a feature map is output after each C3 module, denoted S1, S2, S3 and S4.
The Hor_Block attention module strengthens the feature-extraction capability of the YOLOv5s backbone network on underwater images, whose features are often not obvious, so as to improve the accuracy of seafood detection. It specifically comprises the following steps:
S11: the input tensor X is normalized and enters a DropPath layer, where the tensor undergoes a recursive gated convolution operation, is multiplied by a scaling parameter, and finally has features randomly dropped;
S12: the tensor enters a feature-dimension transformation layer, where its dimension order is transformed from (N, C, H, W) to (N, H, W, C) and the tensor is normalized;
S13: the tensor is processed by one linear layer, an activation-function layer and another linear layer; if the learnable parameter T is not null, the tensor is multiplied by T;
S14: finally, the dimension order of the tensor is changed back to (N, C, H, W), the result is added to the output tensor of the first stage as a residual, and the final tensor is output through a DropPath layer that randomly drops features.
Neck portion: a feature pyramid network (Feature Pyramid Network, FPN) structure is adopted between the backbone portion and the detection heads. The feature pyramid network is a network structure designed to solve the multi-scale target-detection problem; it is mainly used to process feature information of different scales in the image so that targets of different sizes can be detected effectively.
In the model, a CBS module, an up-sampling module, a Concat module and a C3 module form the FPN structure, whose main functions are to adjust the number of feature channels, change the feature size, and finally fuse feature maps containing information at different scales.
The CBS module changes the number of channels of the feature map using 1×1 convolution kernels.
The up-sampling module combines low-resolution, high-semantic features with high-resolution, low-semantic features, so deeper-level feature maps need to be up-sampled.
The Concat module and the C3 module fuse the up-sampled feature map with the corresponding shallow feature map, improving target-detection performance; together, these two modules let the model handle targets of different scales effectively and give it stronger feature-representation capability.
The improvement of the FPN structure further comprises:
S21: adding 160×160 small-target detection after the 80×80 detection head, and extracting the feature map S1 from the C3 module of layer 2;
S22: adding a convolution module, an up-sampling module and a C3 module after layer 21 of the network, splicing in the feature map S1 led out from layer 2, and finally producing the output.
Output portion: the original three detection heads (80×80, 40×40 and 20×20) are changed into four detection heads (160×160, 80×80, 40×40 and 20×20).
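The four head sizes follow directly from the input resolution and the stride of each output level; a quick illustrative check, assuming the standard YOLOv5 strides 8/16/32 plus the new stride-4 level (the strides are not stated in the text):

```python
# Grid size of each detection head = input size / stride.
# The added stride-4 head produces the new 160x160 grid for small targets.
img_size = 640
strides = (4, 8, 16, 32)
grids = [img_size // s for s in strides]
print(grids)  # [160, 80, 40, 20]
```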
The loss function for training the improved YOLOv5s network model adopts CIoU Loss. The loss consists of three parts — confidence loss, class loss and position loss — which respectively measure the accuracy of the confidence, the correctness of the class judgment and the accuracy of the detection-box regression during model training, as shown in formulas (1)-(4):
L = L_box + L_cls + L_obj  (1)
where L_box denotes the position (bounding-box) loss, L_cls the class loss, and L_obj the confidence loss; L_box is implemented via the CIoU loss, and CIoU is calculated as:
CIoU = IoU − ρ²(b, b_gt)/c² − αv  (2)
v = (4/π²) · (arctan(w_gt/h_gt) − arctan(w/h))²  (3)
α = v / ((1 − IoU) + v)  (4)
where ρ²(b, b_gt) denotes the squared Euclidean distance between the centers of the ground-truth box and the predicted box, c the diagonal length of the smallest rectangle enclosing both boxes, v the discrepancy between the aspect ratios of the ground-truth and predicted boxes, α the weight coefficient, w and h the width and height of the predicted box, and w_gt and h_gt the width and height of the ground-truth box.
By jointly considering the confidence, class and position losses, the CIoU Loss provides a comprehensive training objective that helps the model perform better on the target-detection task.
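The CIoU computation referenced by formulas (1)-(4) can be written out as a small, self-contained sketch. The function names are illustrative; this follows the standard CIoU definition, with the box loss taken as 1 − CIoU:

```python
import math

def ciou(box_p, box_g, eps=1e-9):
    """CIoU between a predicted and a ground-truth box, each (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    # IoU
    ix1, iy1 = max(px1, gx1), max(py1, gy1)
    ix2, iy2 = min(px2, gx2), min(py2, gy2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (px2 - px1) * (py2 - py1)
    area_g = (gx2 - gx1) * (gy2 - gy1)
    iou = inter / (area_p + area_g - inter + eps)
    # squared centre distance rho^2(b, b_gt)
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 + \
           ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    # squared diagonal c^2 of the smallest enclosing rectangle
    c2 = (max(px2, gx2) - min(px1, gx1)) ** 2 + \
         (max(py2, gy2) - min(py1, gy1)) ** 2 + eps
    # aspect-ratio term v and weight coefficient alpha
    w, h = px2 - px1, py2 - py1
    wg, hg = gx2 - gx1, gy2 - gy1
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(w / h)) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return iou - rho2 / c2 - alpha * v

def ciou_loss(box_p, box_g):
    """Box regression loss: 1 - CIoU."""
    return 1.0 - ciou(box_p, box_g)
```

For identical boxes the loss is (near) zero, while for disjoint boxes CIoU goes negative, so the loss still provides a useful gradient where plain IoU would be flat at zero.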
In the method for target detection in complex underwater environments based on the improved YOLOv5s network model, 6575 underwater images were used during network training, randomly divided into a training set and a validation set at a ratio of 7:3; after the division, the label information, class proportions and size distributions were counted again to ensure that the training and validation sets have similar distributions.
The system environment is Windows 10; training is performed on the GPU, with NVIDIA's CUDA 11.1 configured together with the cuDNN neural-network acceleration library. The overall training-environment configuration is shown in Table 1. During training, the batch_size is 48, the optimizer is the Adam optimizer, the initial learning rate is 0.001, the weight decay is 0.0005, and epochs is 200. Input images are resized to the default size of 640×640 pixels.
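The training settings stated above can be collected into a configuration dictionary. The dictionary and key names (`lr0`, `weight_decay`, ...) are illustrative, loosely following YOLOv5's hyperparameter naming; only the values come from the document:

```python
# Training hyperparameters as stated in the text; names are illustrative.
train_cfg = {
    "img_size": 640,         # inputs resized to 640x640 pixels
    "batch_size": 48,
    "optimizer": "Adam",
    "lr0": 0.001,            # initial learning rate
    "weight_decay": 0.0005,
    "epochs": 200,
}
print(train_cfg)
```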
Table 1: Training environment configuration
After the Hor_Block module is added, the mAP of the improved YOLOv5s is 1.5% higher than that of the original YOLOv5s; adding the Hor_Block module to the backbone network therefore makes the network pay more attention to seafood targets and reduces the influence of useless features.
After small-target detection is added, the accuracy of the model improves by 1.1%, so the model recognizes small targets more accurately during detection.
Experimental results show that, compared with the original YOLOv5s, the improved YOLOv5s has 4.33M more parameters and is 22.7 ms slower. The final detection accuracy of the model improves by 1.9% over the original.
Therefore, although the detection speed decreases, the model meets the experimental requirements on detection accuracy. Comparisons of the initial and improved models are shown in FIGS. 3-6.
Example two
The purpose of this embodiment is to provide a target detection system for complex underwater environments based on an improved YOLOv5s network model, comprising:
the acquisition module is used for acquiring images of the underwater marine products;
the feature-extraction module, used for inputting the training-set images into the improved YOLOv5s network model for feature extraction;
the Hor_Block attention module, used for applying feature-dimension transformation, layer normalization, linear transformation, activation-function nonlinear mapping, a second linear transformation, feature-dimension transformation back and a DropPath layer to the input data, and outputting the tensor;
the improved YOLOv5s network model, used for extracting features from the data set collected by the acquisition module via the feature-extraction module, performing feature fusion in the improved neck network after feature extraction in the backbone network, and finally producing output at the detection heads;
and for inputting the test-set samples into the improved YOLOv5s network model, loading the weight file for prediction, and outputting the recognition and detection results of the test set.
After the Hor_Block module is added, the mAP of the improved YOLOv5s is 1.5% higher than that of the original YOLOv5s, showing that adding the Hor_Block module to the backbone network makes the network pay more attention to seafood targets and reduces the influence of useless features, so the model can be applied to seafood detection and recognition in complex underwater environments.
After small-target detection is added, the accuracy of the model improves by 1.1%. Experimental results show that the improved YOLOv5s has 4.33M more parameters than the original YOLOv5s, while its final detection accuracy improves by 1.9% over the original; the system can effectively improve seafood detection accuracy and meets the experimental requirements on detection accuracy.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.

Claims (10)

1. A target detection method for complex underwater environments based on an improved YOLOv5s network model, characterized by comprising the following steps:
S1: acquiring images of underwater seafood, annotating and partitioning the acquired images, and establishing a seafood data set;
S2: improving the backbone network and detection-head portion of the YOLOv5s network model to establish an improved YOLOv5s network model;
S3: inputting the seafood data set into the improved YOLOv5s network model for training;
S4: after training, inputting the images of the underwater seafood to be detected into the trained improved YOLOv5s network model for detection, thereby obtaining detection results for the underwater seafood to be detected.
2. The method for target detection in complex underwater environments based on the improved YOLOv5s network model according to claim 1, wherein in S1 the data set is divided into a training set and a test set and converted into a format readable by the deep learning framework.
3. The method for target detection in complex underwater environments based on the improved YOLOv5s network model according to claim 2, wherein in S3 the training-set images are input into the improved YOLOv5s network model for feature extraction; features are extracted in the backbone network, fused in the improved neck network, and finally output at the detection heads.
4. The method for target detection in complex underwater environments based on the improved YOLOv5s network model according to claim 2, wherein in S4, after training of the improved YOLOv5s network model is completed, a trained weight file is generated; the test-set samples are input into the improved YOLOv5s network model, the weight file is loaded for prediction, and the recognition and detection results are finally output.
5. The target detection method based on the improved YOLOv5s network model in a complex water environment according to claim 1, wherein the improved YOLOv5s network model comprises a backbone portion, a neck portion and an output portion.
6. The target detection method based on the improved YOLOv5s network model in a complex water environment according to claim 5, wherein, for the backbone portion: a hor_block attention module is integrated after each C3 layer of the backbone network used for feature extraction, the hor_block attention module being used to enhance the feature extraction capability of the YOLOv5s backbone on underwater images; the hor_block attention module applies to the input tensor, in order: feature dimension transformation, horizontal layer normalization, linear-layer transformation, activation-function nonlinear mapping, a second linear-layer transformation, feature dimension transformation back, and a DropPath layer, before outputting the tensor; a feature map is output after each C3 module, denoted S1, S2, S3 and S4 respectively; the processing specifically comprises the following steps:
s11: the input tensor X is normalized and then enters a DropPath layer, where a recursive gated convolution is applied to the tensor, the result is multiplied by a scaling parameter, and features are finally dropped at random;
s12: the tensor enters a feature dimension transformation layer, which permutes its dimension order from (N, C, H, W) to (N, H, W, C), and the tensor is then normalized;
s13: the tensor is processed by one linear layer, an activation function layer and a second linear layer; if the learnable parameter T is not null, the tensor is multiplied by T;
s14: the dimension order of the tensor is permuted back to (N, C, H, W), the result is added to the output of the first layer, and a DropPath layer randomly drops features before the final tensor is output.
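Steps S12–S14 (the channels-last normalization and MLP branch of the hor_block) can be illustrated with a minimal NumPy sketch. The function names, weight shapes, and the tanh GELU approximation are illustrative assumptions, and DropPath is treated as identity (its inference-time behaviour):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the last (channel) axis, i.e. channels-last layer norm.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def hor_block_mlp(x, w1, w2, gamma=None):
    """Sketch of the channels-last MLP branch of a hor_block-style module.
    x: (N, C, H, W); w1: (C, 4C); w2: (4C, C); gamma: optional per-channel
    scale corresponding to the learnable parameter T (assumed shapes)."""
    shortcut = x
    y = np.transpose(x, (0, 2, 3, 1))   # S12: (N, C, H, W) -> (N, H, W, C)
    y = layer_norm(y)                   # S12: normalization after the permute
    y = y @ w1                          # S13: first linear layer
    # S13: activation function (tanh approximation of GELU)
    y = 0.5 * y * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (y + 0.044715 * y**3)))
    y = y @ w2                          # S13: second linear layer
    if gamma is not None:
        y = y * gamma                   # S13: multiply by learnable parameter T
    y = np.transpose(y, (0, 3, 1, 2))   # S14: permute back to (N, C, H, W)
    return shortcut + y                 # S14: residual add; DropPath = identity here
```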
7. The target detection method based on the improved YOLOv5s network model in a complex water environment according to claim 6, wherein, for the neck portion: a feature pyramid network (FPN) structure is adopted between the backbone portion and the detection heads, the feature pyramid network being used to process feature information at different scales in the image so that targets of different sizes can be detected effectively;
in the model, a CBS module, an upsampling module, a Concat module and a C3 module form the FPN structure, which adjusts the number of channels of the features and changes their spatial size, finally fusing feature maps that contain feature information at different scales;
the CBS module changes the number of channels of the features using multiple 1×1 convolution kernels;
the upsampling module combines low-resolution high-semantic features with high-resolution low-semantic features, the deeper feature maps requiring upsampling;
the Concat module and the C3 module fuse the upsampled feature map with the corresponding shallow feature map, improving target detection performance; together, these two modules enable the model to handle targets at different scales effectively and give it stronger feature representation capability;
the improvement of the FPN structure further comprises:
s21: adding a 160×160 small-target detection head after the 80×80 detection head, the feature map S1 being extracted from the C3 module at layer 2;
s22: adding a convolution module, an upsampling module and a C3 module after layer 21 of the network, splicing their output with the feature map S1 drawn from layer 2, and finally outputting the result.
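The Concat step behind the added 160×160 branch can be sketched with NumPy: nearest-neighbour 2× upsampling of the deeper map followed by channel-wise concatenation with the shallow map S1. Function names and channel counts are illustrative assumptions, not the patent's exact layer configuration:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (N, C, H, W) feature map."""
    return x.repeat(2, axis=2).repeat(2, axis=3)

def fuse_for_small_targets(deep, shallow):
    """Sketch of the added small-target branch: upsample the deeper feature
    map to the shallow map's resolution, then concatenate along the channel
    axis (the Concat module's role)."""
    up = upsample2x(deep)
    assert up.shape[2:] == shallow.shape[2:], "spatial sizes must match before Concat"
    return np.concatenate([up, shallow], axis=1)
```

In the network, the fused map would then pass through a C3 module before reaching the 160×160 detection head.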
8. The target detection method based on the improved YOLOv5s network model in a complex water environment according to claim 7, wherein, for the output portion: the original three detection heads of sizes 80×80, 40×40 and 20×20 are replaced by four detection heads of sizes 160×160, 80×80, 40×40 and 20×20.
9. The target detection method based on the improved YOLOv5s network model in a complex water environment according to claim 1, wherein the loss function used for training the improved YOLOv5s network model adopts CIoU Loss; the loss function consists of three parts — confidence loss, category loss and position loss — which respectively measure the accuracy of the confidence, the correctness of the category judgment and the accuracy of the detection-frame regression during model training, as shown in formulas (1)-(4):
L = L_box + L_cls + L_obj (1)
wherein L_box represents the position loss, L_cls represents the category loss, and L_obj represents the confidence loss; the position loss L_box is implemented through the CIoU loss function, calculated as follows:
L_CIoU = 1 - IoU + ρ²(b, b_gt)/c² + αv (2)
v = (4/π²) · (arctan(w_gt/h_gt) - arctan(w/h))² (3)
α = v / ((1 - IoU) + v) (4)
wherein ρ²(b, b_gt) represents the squared Euclidean distance between the centres of the real frame and the predicted frame, c represents the diagonal length of the minimum enclosing rectangle of the real frame and the predicted frame, v measures the consistency of the aspect ratios of the real frame and the predicted frame, α represents the weight coefficient, w and h represent the width and height of the predicted frame, and w_gt and h_gt represent the width and height of the real frame.
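The CIoU computation defined by the symbols above can be sketched in plain Python; `ciou_loss` is a hypothetical helper name and the small epsilon terms are numerical-stability assumptions:

```python
import math

def ciou_loss(pred, gt):
    """CIoU loss between one predicted and one ground-truth box,
    each given as (x1, y1, x2, y2); follows the symbols of formulas (1)-(4)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # IoU of the two boxes
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    area_p = (px2 - px1) * (py2 - py1)
    area_g = (gx2 - gx1) * (gy2 - gy1)
    iou = inter / (area_p + area_g - inter + 1e-9)
    # rho^2: squared centre distance; c^2: squared diagonal of the enclosing box
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + 1e-9
    # v: aspect-ratio consistency term; alpha: trade-off weight
    v = (4 / math.pi ** 2) * (math.atan((gx2 - gx1) / (gy2 - gy1))
                              - math.atan((px2 - px1) / (py2 - py1))) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return 1 - iou + rho2 / c2 + alpha * v
```

For identical boxes the loss is near zero; for disjoint boxes the centre-distance term keeps the loss above 1, which gives the regression a gradient even when IoU is zero.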
10. A target detection system based on an improved YOLOv5s network model in a complex water environment, used for implementing the target detection method based on the improved YOLOv5s network model in a complex water environment, characterized by comprising:
an acquisition module for acquiring images of underwater seafood;
a feature extraction module for inputting training set images into the improved YOLOv5s network model for feature extraction;
a hor_block attention module for applying to input data, in order: feature dimension transformation, horizontal layer normalization, linear-layer transformation, activation-function nonlinear mapping, a second linear-layer transformation, feature dimension transformation back, and a DropPath layer, before outputting the tensor;
the improved YOLOv5s network model, which extracts features from the data set acquired by the acquisition module via the feature extraction module, performs feature fusion in the improved neck network after the backbone network extracts features, and finally outputs at the detection heads;
wherein test set samples are input into the improved YOLOv5s network model, which loads a weight file for prediction and outputs the recognition and detection results for the test set.
CN202310951353.9A 2023-07-31 2023-07-31 Target detection method and system based on improved YOLOv5s network model under complex water environment Pending CN116912674A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310951353.9A CN116912674A (en) 2023-07-31 2023-07-31 Target detection method and system based on improved YOLOv5s network model under complex water environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310951353.9A CN116912674A (en) 2023-07-31 2023-07-31 Target detection method and system based on improved YOLOv5s network model under complex water environment

Publications (1)

Publication Number Publication Date
CN116912674A true CN116912674A (en) 2023-10-20

Family

ID=88353024

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310951353.9A Pending CN116912674A (en) 2023-07-31 2023-07-31 Target detection method and system based on improved YOLOv5s network model under complex water environment

Country Status (1)

Country Link
CN (1) CN116912674A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117854116A (en) * 2024-03-08 2024-04-09 中国海洋大学 Sea cucumber in-situ length measurement method based on Bezier curve
CN117854116B (en) * 2024-03-08 2024-05-17 中国海洋大学 Sea cucumber in-situ length measurement method based on Bezier curve
CN117876848A (en) * 2024-03-13 2024-04-12 成都理工大学 Complex environment falling stone detection method based on improved yolov5
CN117876848B (en) * 2024-03-13 2024-05-07 成都理工大学 Complex environment falling stone detection method based on improvement yolov5

Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN108427920B (en) Edge-sea defense target detection method based on deep learning
CN110084292B (en) Target detection method based on DenseNet and multi-scale feature fusion
CN116912674A (en) Target detection method and system based on improved YOLOv5s network model under complex water environment
CN111738344B (en) Rapid target detection method based on multi-scale fusion
Rahaman et al. An efficient multilevel thresholding based satellite image segmentation approach using a new adaptive cuckoo search algorithm
CN114972976B (en) Night target detection and training method and device based on frequency domain self-attention mechanism
CN110827312A (en) Learning method based on cooperative visual attention neural network
CN113591592B (en) Overwater target identification method and device, terminal equipment and storage medium
CN114973222B (en) Scene text recognition method based on explicit supervision attention mechanism
CN111179270A (en) Image co-segmentation method and device based on attention mechanism
CN113191222A (en) Underwater fish target detection method and device
CN116977844A (en) Lightweight underwater target real-time detection method
CN113077438B (en) Cell nucleus region extraction method and imaging method for multi-cell nucleus color image
CN110766708B (en) Image comparison method based on contour similarity
CN112270404A (en) Detection structure and method for bulge defect of fastener product based on ResNet64 network
CN116543295A (en) Lightweight underwater target detection method and system based on degradation image enhancement
Raj et al. A novel Ship detection method from SAR image with reduced false alarm
CN113435389B (en) Chlorella and golden algae classification and identification method based on image feature deep learning
Mao et al. Power transmission line image segmentation method based on binocular vision and feature pyramid network
CN112417961B (en) Sea surface target detection method based on scene prior knowledge
CN114964628A (en) Shuffle self-attention light-weight infrared detection method and system for ammonia gas leakage
CN114463764A (en) Table line detection method and device, computer equipment and storage medium
CN113076819A (en) Fruit identification method and device under homochromatic background and fruit picking robot
CN112465821A (en) Multi-scale pest image detection method based on boundary key point perception

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination