CN116912674A - Target detection method and system based on improved YOLOv5s network model under complex water environment - Google Patents

Target detection method and system based on improved YOLOv5s network model under complex water environment

Info

Publication number
CN116912674A
CN116912674A
Authority
CN
China
Prior art keywords: improved, YOLOv5s, network model, module, feature
Prior art date
Legal status: Pending
Application number
CN202310951353.9A
Other languages
Chinese (zh)
Inventor
管志光
侯成龙
Current Assignee
Shandong Jiaotong University
Original Assignee
Shandong Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shandong Jiaotong University filed Critical Shandong Jiaotong University
Priority to CN202310951353.9A priority Critical patent/CN116912674A/en
Publication of CN116912674A publication Critical patent/CN116912674A/en
Pending legal-status Critical Current

Classifications

    • G06V 20/05 — Scenes; scene-specific elements: underwater scenes
    • G06N 3/045 — Neural network architectures: combinations of networks
    • G06N 3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N 3/048 — Activation functions
    • G06N 3/08 — Learning methods
    • G06V 10/40 — Extraction of image or video features
    • G06V 10/806 — Fusion of extracted features
    • G06V 10/82 — Recognition or understanding using neural networks
    • G06V 2201/07 — Target detection
    • Y02A 10/40 — Controlling or monitoring, e.g. of flood or hurricane; forecasting, e.g. risk assessment or mapping


Abstract

The invention discloses a target detection method and system for complex underwater environments based on an improved YOLOv5s network model. S1: acquire images of underwater seafood, annotate and partition the acquired images, and establish a seafood data set; S2: improve the backbone network and detection-head portion of the YOLOv5s network model to establish an improved YOLOv5s network model; S3: input the seafood data set into the improved YOLOv5s network model for training; S4: after training, input the images of the underwater seafood to be detected into the trained improved YOLOv5s network model for detection, thereby obtaining detection results for the underwater seafood to be detected. The network model of the invention pays more attention to seafood targets, reduces the influence of useless features, can be applied to seafood detection and recognition in complex underwater environments, and achieves high recognition accuracy.

Description

Target detection method and system based on improved YOLOv5s network model under complex water environment
Technical Field
The invention relates to the technical field of ocean information, in particular to a target detection method and system for complex underwater environments based on an improved YOLOv5s network model.
Background
At present, most target detection frameworks are built on convolutional-neural-network theory and detect targets in clear, on-land environments. The idea of such a network is to extract features from the input image and learn the extracted features further within the network, generating a weight file that enables detection of the recognized objects in new input images.
However, the underwater environment is complex and affected by factors such as illumination, so underwater images are of poor imaging quality. When a convolutional-neural-network-based target detection framework is applied to the underwater aquaculture industry, the detection process therefore suffers from low detection accuracy and inaccurate regression.
Disclosure of Invention
To solve the above problems in the prior art, the present invention provides a target detection method and system for complex underwater environments based on an improved YOLOv5s network model.
The technical scheme adopted for solving the technical problems is as follows:
The invention provides a target detection method for complex underwater environments based on an improved YOLOv5s network model, which comprises the following steps:
S1: acquiring images of underwater seafood, annotating and partitioning the acquired images, and establishing a seafood data set;
S2: improving the backbone network and detection-head portion of the YOLOv5s network model to establish an improved YOLOv5s network model;
S3: inputting the seafood data set into the improved YOLOv5s network model for training;
S4: after training, inputting the images of the underwater seafood to be detected into the trained improved YOLOv5s network model for detection, thereby obtaining detection results for the underwater seafood to be detected.
Preferably, in S1, the data set is divided into a training set and a test set and converted into a format readable by the deep learning framework.
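As a minimal sketch of this step (not part of the patent), the random split and the per-box label conversion could look as follows in Python. The helper names `split_dataset` and `to_yolo_txt` are hypothetical, and the txt line layout is the commonly used YOLO annotation format assumed here:

```python
import random

def split_dataset(image_ids, train_ratio=0.7, seed=42):
    """Randomly split image IDs into training and test lists (e.g. 7:3)."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]

def to_yolo_txt(class_id, box, img_w, img_h):
    """Convert a pixel-space box (xmin, ymin, xmax, ymax) into one YOLO txt
    line: 'class x_center y_center width height', normalised to [0, 1]."""
    xmin, ymin, xmax, ymax = box
    xc = (xmin + xmax) / 2 / img_w
    yc = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

For example, a 320×320 box at the top-left corner of a 640×640 image yields the line `0 0.250000 0.250000 0.500000 0.500000`.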
Preferably, in S3, the training-set images are input into the improved YOLOv5s network model for feature extraction: features are extracted in the backbone network, fused in the improved neck network, and finally output at the detection heads.
Preferably, in S4, after training of the improved YOLOv5s network model is completed, a trained weight file is generated; the test-set samples are input into the improved YOLOv5s network model, the weight file is loaded for prediction, and the recognition and detection results are finally output.
Preferably, the improved YOLOv5s network model comprises a backbone portion, a neck portion and an output portion.
Preferably, in the backbone portion, a Hor_Block attention module is integrated after each feature-extracting C3 layer of the backbone network to strengthen the feature-extraction capability of the YOLOv5s backbone on underwater images. The Hor_Block attention module applies, to the input tensor: feature-dimension transformation, layer normalization, a linear-layer transformation, nonlinear mapping by an activation function, a second linear-layer transformation, feature-dimension transformation back, and a DropPath layer, and then outputs the tensor. A feature map is output after each C3 module, denoted S1, S2, S3 and S4. The module specifically comprises the following steps:
S11: the input tensor X is normalized and enters a DropPath layer, where the tensor undergoes a recursive gated convolution operation, is multiplied by a scaling parameter, and finally has features randomly dropped;
S12: the tensor enters a feature-dimension transformation layer, where its dimension order is transformed from (N, C, H, W) to (N, H, W, C) and the tensor is normalized;
S13: the tensor is processed by one linear layer, an activation-function layer and another linear layer; if the learnable parameter T is not null, the tensor is multiplied by T;
S14: the dimension order of the tensor is changed back to (N, C, H, W), the result is added to the output tensor of the first stage as a residual, and the final tensor is output through a DropPath layer that randomly drops features.
Preferably, in the neck portion, a feature-pyramid-network structure is adopted between the backbone portion and the detection heads; the feature pyramid network processes feature information of different scales in the image so that targets of different sizes can be detected effectively.
In the model, a CBS module, an up-sampling module, a Concat module and a C3 module form the FPN structure, which adjusts the number of feature channels, changes the feature size, and finally fuses feature maps containing information at different scales.
The CBS module changes the number of channels of the feature map using 1×1 convolution kernels.
The up-sampling module combines low-resolution, high-semantic features with high-resolution, low-semantic features; deeper-level feature maps therefore need to be up-sampled.
The Concat module and the C3 module fuse the up-sampled feature map with the corresponding shallow feature map, improving target-detection performance; together, these two modules let the model handle targets of different scales effectively and give it stronger feature-representation capability.
The improvement of the FPN structure further comprises:
S21: adding 160×160 small-target detection after the 80×80 detection head, and extracting the feature map S1 from the C3 module of layer 2;
S22: adding a convolution module, an up-sampling module and a C3 module after layer 21 of the network, splicing in the feature map S1 led out from layer 2, and finally producing the output.
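The up-sample-then-Concat fusion described above (including the one feeding the new 160×160 branch) can be illustrated with a minimal NumPy sketch. This is an assumption-laden stand-in: nearest-neighbour repetition replaces the model's up-sampling module, and the function names are hypothetical:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x up-sampling of an (N, C, H, W) feature map,
    applied to the deeper map before fusion with a shallower one."""
    return x.repeat(2, axis=2).repeat(2, axis=3)

def concat_fuse(deep, shallow):
    """Up-sample the deeper map and concatenate along the channel axis
    (the Concat step that precedes the C3 module)."""
    up = upsample2x(deep)
    assert up.shape[2:] == shallow.shape[2:], "spatial sizes must match"
    return np.concatenate([up, shallow], axis=1)
```

After fusion, the channel count is the sum of the two inputs' channels, which is why a C3 (or CBS) module typically follows to re-mix and reduce channels.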
Preferably, in the output portion, the original three detection heads (80×80, 40×40 and 20×20) are changed into four detection heads (160×160, 80×80, 40×40 and 20×20).
Preferably, the loss function for training the improved YOLOv5s network model adopts CIoU Loss. The loss consists of three parts — confidence loss, class loss and position loss — which respectively measure the accuracy of the confidence, the correctness of the class judgment and the accuracy of the detection-box regression during model training, as shown in formulas (1)-(4):
L = L_box + L_cls + L_obj  (1)
where L_box denotes the position (bounding-box) loss, L_cls the class loss, and L_obj the confidence loss; L_box is implemented via the CIoU loss, and CIoU is calculated as:
CIoU = IoU − ρ²(b, b_gt)/c² − αv  (2)
v = (4/π²) · (arctan(w_gt/h_gt) − arctan(w/h))²  (3)
α = v / ((1 − IoU) + v)  (4)
where ρ²(b, b_gt) denotes the squared Euclidean distance between the centers of the ground-truth box and the predicted box, c the diagonal length of the smallest rectangle enclosing both boxes, v the discrepancy between the aspect ratios of the ground-truth and predicted boxes, α the weight coefficient, w and h the width and height of the predicted box, and w_gt and h_gt the width and height of the ground-truth box.
A target detection system for complex underwater environments based on the improved YOLOv5s network model comprises:
the acquisition module, used for acquiring images of underwater seafood;
the feature-extraction module, used for inputting the training-set images into the improved YOLOv5s network model for feature extraction;
the Hor_Block attention module, used for applying feature-dimension transformation, layer normalization, linear transformation, activation-function nonlinear mapping, a second linear transformation, feature-dimension transformation back and a DropPath layer to the input data, and outputting the tensor;
the improved YOLOv5s network model, used for extracting features from the data set collected by the acquisition module via the feature-extraction module, performing feature fusion in the improved neck network after feature extraction in the backbone network, and finally producing output at the detection heads;
and for inputting the test-set samples into the improved YOLOv5s network model, loading the weight file for prediction, and outputting the recognition and detection results of the test set.
Compared with the prior art, the invention has the following beneficial effects:
1. After the Hor_Block module is added, the mAP of the improved YOLOv5s is 1.5% higher than that of the original YOLOv5s, showing that adding the Hor_Block module to the backbone network makes the network pay more attention to seafood targets and reduces the influence of useless features, so the model can be applied to seafood detection and recognition in complex underwater environments.
2. After small-target detection is added, the accuracy of the model improves by 1.1%. Experimental results show that the improved YOLOv5s has 4.33M more parameters than the original YOLOv5s, while its final detection accuracy improves by 1.9% over the original; the invention can effectively improve seafood detection accuracy and meets the experimental requirements on detection accuracy.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic diagram of the Hor_Block attention Module in the present invention;
FIG. 2 is a schematic diagram of the improved YOLOv5s network structure according to the present invention;
FIG. 3 is a graph of initial model effects in accordance with the present invention;
FIG. 4 is a graph of the effect of the improved model of the present invention;
FIG. 5 is a graph of the effect of the initial model of the present invention;
FIG. 6 is a graph showing the effect of the model after improvement in the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
Example 1
As shown in FIGS. 1-6, this embodiment provides a target detection method for complex underwater environments based on an improved YOLOv5s network model, which comprises the following steps:
S1: acquiring images of underwater seafood, annotating and partitioning the acquired images, and establishing a seafood data set;
S2: improving the backbone network and detection-head portion of the YOLOv5s network model to establish an improved YOLOv5s network model;
S3: inputting the seafood data set into the improved YOLOv5s network model for training;
S4: after training, inputting the images of the underwater seafood to be detected into the trained improved YOLOv5s network model for detection, thereby obtaining detection results for the underwater seafood to be detected.
In S1, the underwater seafood includes common species such as sea cucumber, sea urchin, starfish and scallop; the data set is divided into a training set and a test set and converted into the txt format readable by the deep learning framework.
In S3, the training-set images are input into the improved YOLOv5s network model for feature extraction: features are extracted in the backbone network, fused in the improved neck network, and finally output at the detection heads.
In S4, after training of the improved YOLOv5s network model is completed, a trained weight file is generated; the test-set samples are input into the improved YOLOv5s network model, the weight file is loaded for prediction, and the recognition and detection results are finally output.
The improved YOLOv5s network model includes a backbone portion, a neck portion and an output portion.
Backbone portion: a Hor_Block attention module is integrated after each feature-extracting C3 layer of the backbone network; the structure of the Hor_Block attention module is shown in FIG. 1. The module applies feature-dimension transformation, layer normalization, linear transformation, activation-function nonlinear mapping, a second linear transformation, feature-dimension transformation back and a DropPath layer to the input data before outputting the tensor; a feature map is output after each C3 module, denoted S1, S2, S3 and S4.
The Hor_Block attention module strengthens the feature-extraction capability of the YOLOv5s backbone network on underwater images, whose features are often not obvious, so as to improve the accuracy of seafood detection. It specifically comprises the following steps:
S11: the input tensor X is normalized and enters a DropPath layer, where the tensor undergoes a recursive gated convolution operation, is multiplied by a scaling parameter, and finally has features randomly dropped;
S12: the tensor enters a feature-dimension transformation layer, where its dimension order is transformed from (N, C, H, W) to (N, H, W, C) and the tensor is normalized;
S13: the tensor is processed by one linear layer, an activation-function layer and another linear layer; if the learnable parameter T is not null, the tensor is multiplied by T;
S14: finally, the dimension order of the tensor is changed back to (N, C, H, W), the result is added to the output tensor of the first stage as a residual, and the final tensor is output through a DropPath layer that randomly drops features.
Neck portion: a feature pyramid network (Feature Pyramid Network, FPN) structure is adopted between the backbone portion and the detection heads. The feature pyramid network is a network structure designed to solve the multi-scale target-detection problem; it is mainly used to process feature information of different scales in the image so that targets of different sizes can be detected effectively.
In the model, a CBS module, an up-sampling module, a Concat module and a C3 module form the FPN structure, whose main functions are to adjust the number of feature channels, change the feature size, and finally fuse feature maps containing information at different scales.
The CBS module changes the number of channels of the feature map using 1×1 convolution kernels.
The up-sampling module combines low-resolution, high-semantic features with high-resolution, low-semantic features, so deeper-level feature maps need to be up-sampled.
The Concat module and the C3 module fuse the up-sampled feature map with the corresponding shallow feature map, improving target-detection performance; together, these two modules let the model handle targets of different scales effectively and give it stronger feature-representation capability.
The improvement of the FPN structure further comprises:
S21: adding 160×160 small-target detection after the 80×80 detection head, and extracting the feature map S1 from the C3 module of layer 2;
S22: adding a convolution module, an up-sampling module and a C3 module after layer 21 of the network, splicing in the feature map S1 led out from layer 2, and finally producing the output.
Output portion: the original three detection heads (80×80, 40×40 and 20×20) are changed into four detection heads (160×160, 80×80, 40×40 and 20×20).
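The four head sizes follow directly from the input resolution and the stride of each output level; a quick illustrative check, assuming the standard YOLOv5 strides 8/16/32 plus the new stride-4 level (the strides are not stated in the text):

```python
# Grid size of each detection head = input size / stride.
# The added stride-4 head produces the new 160x160 grid for small targets.
img_size = 640
strides = (4, 8, 16, 32)
grids = [img_size // s for s in strides]
print(grids)  # [160, 80, 40, 20]
```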
The loss function for training the improved YOLOv5s network model adopts CIoU Loss. The loss consists of three parts — confidence loss, class loss and position loss — which respectively measure the accuracy of the confidence, the correctness of the class judgment and the accuracy of the detection-box regression during model training, as shown in formulas (1)-(4):
L = L_box + L_cls + L_obj  (1)
where L_box denotes the position (bounding-box) loss, L_cls the class loss, and L_obj the confidence loss; L_box is implemented via the CIoU loss, and CIoU is calculated as:
CIoU = IoU − ρ²(b, b_gt)/c² − αv  (2)
v = (4/π²) · (arctan(w_gt/h_gt) − arctan(w/h))²  (3)
α = v / ((1 − IoU) + v)  (4)
where ρ²(b, b_gt) denotes the squared Euclidean distance between the centers of the ground-truth box and the predicted box, c the diagonal length of the smallest rectangle enclosing both boxes, v the discrepancy between the aspect ratios of the ground-truth and predicted boxes, α the weight coefficient, w and h the width and height of the predicted box, and w_gt and h_gt the width and height of the ground-truth box.
By jointly considering the confidence, class and position losses, the CIoU Loss provides a comprehensive training objective that helps the model perform better on the target-detection task.
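The CIoU computation referenced by formulas (1)-(4) can be written out as a small, self-contained sketch. The function names are illustrative; this follows the standard CIoU definition, with the box loss taken as 1 − CIoU:

```python
import math

def ciou(box_p, box_g, eps=1e-9):
    """CIoU between a predicted and a ground-truth box, each (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    # IoU
    ix1, iy1 = max(px1, gx1), max(py1, gy1)
    ix2, iy2 = min(px2, gx2), min(py2, gy2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (px2 - px1) * (py2 - py1)
    area_g = (gx2 - gx1) * (gy2 - gy1)
    iou = inter / (area_p + area_g - inter + eps)
    # squared centre distance rho^2(b, b_gt)
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 + \
           ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    # squared diagonal c^2 of the smallest enclosing rectangle
    c2 = (max(px2, gx2) - min(px1, gx1)) ** 2 + \
         (max(py2, gy2) - min(py1, gy1)) ** 2 + eps
    # aspect-ratio term v and weight coefficient alpha
    w, h = px2 - px1, py2 - py1
    wg, hg = gx2 - gx1, gy2 - gy1
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(w / h)) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return iou - rho2 / c2 - alpha * v

def ciou_loss(box_p, box_g):
    """Box regression loss: 1 - CIoU."""
    return 1.0 - ciou(box_p, box_g)
```

For identical boxes the loss is (near) zero, while for disjoint boxes CIoU goes negative, so the loss still provides a useful gradient where plain IoU would be flat at zero.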
In the method for target detection in complex underwater environments based on the improved YOLOv5s network model, 6575 underwater images were used during network training, randomly divided into a training set and a validation set at a ratio of 7:3; after the division, the label information, class proportions and size distributions were counted again to ensure that the training and validation sets have similar distributions.
The system environment is Windows 10; training is performed on the GPU, with NVIDIA's CUDA 11.1 configured together with the cuDNN neural-network acceleration library. The overall training-environment configuration is shown in Table 1. During training, the batch_size is 48, the optimizer is the Adam optimizer, the initial learning rate is 0.001, the weight decay is 0.0005, and epochs is 200. Input images are resized to the default size of 640×640 pixels.
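The training settings stated above can be collected into a configuration dictionary. The dictionary and key names (`lr0`, `weight_decay`, ...) are illustrative, loosely following YOLOv5's hyperparameter naming; only the values come from the document:

```python
# Training hyperparameters as stated in the text; names are illustrative.
train_cfg = {
    "img_size": 640,         # inputs resized to 640x640 pixels
    "batch_size": 48,
    "optimizer": "Adam",
    "lr0": 0.001,            # initial learning rate
    "weight_decay": 0.0005,
    "epochs": 200,
}
print(train_cfg)
```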
Table 1: Training environment configuration
After the Hor_Block module is added, the mAP of the improved YOLOv5s is 1.5% higher than that of the original YOLOv5s; adding the Hor_Block module to the backbone network therefore makes the network pay more attention to seafood targets and reduces the influence of useless features.
After small-target detection is added, the accuracy of the model improves by 1.1%, so the model recognizes small targets more accurately during detection.
Experimental results show that, compared with the original YOLOv5s, the improved YOLOv5s has 4.33M more parameters and is 22.7 ms slower. The final detection accuracy of the model improves by 1.9% over the original.
Therefore, although the detection speed decreases, the model meets the experimental requirements on detection accuracy. Comparisons of the initial and improved models are shown in FIGS. 3-6.
Example two
The purpose of this embodiment is to provide a target detection system for complex underwater environments based on an improved YOLOv5s network model, comprising:
the acquisition module is used for acquiring images of the underwater marine products;
the feature-extraction module, used for inputting the training-set images into the improved YOLOv5s network model for feature extraction;
the Hor_Block attention module, used for applying feature-dimension transformation, layer normalization, linear transformation, activation-function nonlinear mapping, a second linear transformation, feature-dimension transformation back and a DropPath layer to the input data, and outputting the tensor;
the improved YOLOv5s network model, used for extracting features from the data set collected by the acquisition module via the feature-extraction module, performing feature fusion in the improved neck network after feature extraction in the backbone network, and finally producing output at the detection heads;
and for inputting the test-set samples into the improved YOLOv5s network model, loading the weight file for prediction, and outputting the recognition and detection results of the test set.
After the Hor_Block module is added, the mAP of the improved YOLOv5s is 1.5% higher than that of the original YOLOv5s, showing that adding the Hor_Block module to the backbone network makes the network pay more attention to seafood targets and reduces the influence of useless features, so the model can be applied to seafood detection and recognition in complex underwater environments.
After small-target detection is added, the accuracy of the model improves by 1.1%. Experimental results show that the improved YOLOv5s has 4.33M more parameters than the original YOLOv5s, while its final detection accuracy improves by 1.9% over the original; the system can effectively improve seafood detection accuracy and meets the experimental requirements on detection accuracy.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.

Claims (10)

1. A target detection method for complex underwater environments based on an improved YOLOv5s network model, characterized by comprising the following steps:
S1: acquiring images of underwater seafood, annotating and partitioning the acquired images, and establishing a seafood data set;
S2: improving the backbone network and detection-head portion of the YOLOv5s network model to establish an improved YOLOv5s network model;
S3: inputting the seafood data set into the improved YOLOv5s network model for training;
S4: after training, inputting the images of the underwater seafood to be detected into the trained improved YOLOv5s network model for detection, thereby obtaining detection results for the underwater seafood to be detected.
2. The method for target detection in complex underwater environments based on the improved YOLOv5s network model according to claim 1, wherein in S1 the data set is divided into a training set and a test set and converted into a format readable by the deep learning framework.
3. The method for target detection in complex underwater environments based on the improved YOLOv5s network model according to claim 2, wherein in S3 the training-set images are input into the improved YOLOv5s network model for feature extraction; features are extracted in the backbone network, fused in the improved neck network, and finally output at the detection heads.
4. The method for target detection in complex underwater environments based on the improved YOLOv5s network model according to claim 2, wherein in S4, after training of the improved YOLOv5s network model is completed, a trained weight file is generated; the test-set samples are input into the improved YOLOv5s network model, the weight file is loaded for prediction, and the recognition and detection results are finally output.
5. The target detection method based on the improved YOLOv5s network model in a complex water environment according to claim 1, wherein the improved YOLOv5s network model comprises a backbone portion, a neck portion and an output portion.
6. The target detection method based on the improved YOLOv5s network model in a complex water environment according to claim 5, wherein, for the backbone portion: a hor_block attention module is integrated after each C3 layer of the backbone network used for feature extraction, the hor_block attention module being used to enhance the feature extraction capability of the YOLOv5s backbone on underwater images; the hor_block attention module applies to the input tensor, in order: feature dimension transformation, horizontal layer normalization, linear-layer transformation, activation-function nonlinear mapping, a second linear-layer transformation, feature dimension transformation back, and a DropPath layer, before outputting the tensor; a feature map is output after each C3 module, denoted S1, S2, S3 and S4 respectively; the processing specifically comprises the following steps:
s11: the input tensor X is normalized and then enters a DropPath layer, where a recursive gated convolution is applied to the tensor, the result is multiplied by a scaling parameter, and features are finally dropped at random;
s12: the tensor enters a feature dimension transformation layer, which permutes its dimension order from (N, C, H, W) to (N, H, W, C), and the tensor is then normalized;
s13: the tensor is processed by one linear layer, an activation function layer and a second linear layer; if the learnable parameter T is not null, the tensor is multiplied by T;
s14: the dimension order of the tensor is permuted back to (N, C, H, W), the result is added to the output of the first layer, and a DropPath layer randomly drops features before the final tensor is output.
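Steps S12–S14 (the channels-last normalization and MLP branch of the hor_block) can be illustrated with a minimal NumPy sketch. The function names, weight shapes, and the tanh GELU approximation are illustrative assumptions, and DropPath is treated as identity (its inference-time behaviour):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the last (channel) axis, i.e. channels-last layer norm.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def hor_block_mlp(x, w1, w2, gamma=None):
    """Sketch of the channels-last MLP branch of a hor_block-style module.
    x: (N, C, H, W); w1: (C, 4C); w2: (4C, C); gamma: optional per-channel
    scale corresponding to the learnable parameter T (assumed shapes)."""
    shortcut = x
    y = np.transpose(x, (0, 2, 3, 1))   # S12: (N, C, H, W) -> (N, H, W, C)
    y = layer_norm(y)                   # S12: normalization after the permute
    y = y @ w1                          # S13: first linear layer
    # S13: activation function (tanh approximation of GELU)
    y = 0.5 * y * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (y + 0.044715 * y**3)))
    y = y @ w2                          # S13: second linear layer
    if gamma is not None:
        y = y * gamma                   # S13: multiply by learnable parameter T
    y = np.transpose(y, (0, 3, 1, 2))   # S14: permute back to (N, C, H, W)
    return shortcut + y                 # S14: residual add; DropPath = identity here
```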
7. The target detection method based on the improved YOLOv5s network model in a complex water environment according to claim 6, wherein, for the neck portion: a feature pyramid network (FPN) structure is adopted between the backbone portion and the detection heads, the feature pyramid network being used to process feature information at different scales in the image so that targets of different sizes can be detected effectively;
in the model, a CBS module, an upsampling module, a Concat module and a C3 module form the FPN structure, which adjusts the number of channels of the features and changes their spatial size, finally fusing feature maps that contain feature information at different scales;
the CBS module changes the number of channels of the features using multiple 1×1 convolution kernels;
the upsampling module combines low-resolution high-semantic features with high-resolution low-semantic features, the deeper feature maps requiring upsampling;
the Concat module and the C3 module fuse the upsampled feature map with the corresponding shallow feature map, improving target detection performance; together, these two modules enable the model to handle targets at different scales effectively and give it stronger feature representation capability;
the improvement of the FPN structure further comprises:
s21: adding a 160×160 small-target detection head after the 80×80 detection head, the feature map S1 being extracted from the C3 module at layer 2;
s22: adding a convolution module, an upsampling module and a C3 module after layer 21 of the network, splicing their output with the feature map S1 drawn from layer 2, and finally outputting the result.
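The Concat step behind the added 160×160 branch can be sketched with NumPy: nearest-neighbour 2× upsampling of the deeper map followed by channel-wise concatenation with the shallow map S1. Function names and channel counts are illustrative assumptions, not the patent's exact layer configuration:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (N, C, H, W) feature map."""
    return x.repeat(2, axis=2).repeat(2, axis=3)

def fuse_for_small_targets(deep, shallow):
    """Sketch of the added small-target branch: upsample the deeper feature
    map to the shallow map's resolution, then concatenate along the channel
    axis (the Concat module's role)."""
    up = upsample2x(deep)
    assert up.shape[2:] == shallow.shape[2:], "spatial sizes must match before Concat"
    return np.concatenate([up, shallow], axis=1)
```

In the network, the fused map would then pass through a C3 module before reaching the 160×160 detection head.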
8. The target detection method based on the improved YOLOv5s network model in a complex water environment according to claim 7, wherein, for the output portion: the original three detection heads of sizes 80×80, 40×40 and 20×20 are replaced by four detection heads of sizes 160×160, 80×80, 40×40 and 20×20.
9. The target detection method based on the improved YOLOv5s network model in a complex water environment according to claim 1, wherein the loss function used for training the improved YOLOv5s network model adopts CIoU Loss; the loss function consists of three parts — confidence loss, category loss and position loss — which respectively measure the accuracy of the confidence, the correctness of the category judgment and the accuracy of the detection-frame regression during model training, as shown in formulas (1)-(4):
L = L_box + L_cls + L_obj (1)
wherein L_box represents the position loss, L_cls represents the category loss, and L_obj represents the confidence loss; the position loss L_box is implemented through the CIoU loss function, calculated as follows:
L_CIoU = 1 - IoU + ρ²(b, b_gt)/c² + αv (2)
v = (4/π²) · (arctan(w_gt/h_gt) - arctan(w/h))² (3)
α = v / ((1 - IoU) + v) (4)
wherein ρ²(b, b_gt) represents the squared Euclidean distance between the centres of the real frame and the predicted frame, c represents the diagonal length of the minimum enclosing rectangle of the real frame and the predicted frame, v measures the consistency of the aspect ratios of the real frame and the predicted frame, α represents the weight coefficient, w and h represent the width and height of the predicted frame, and w_gt and h_gt represent the width and height of the real frame.
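The CIoU computation defined by the symbols above can be sketched in plain Python; `ciou_loss` is a hypothetical helper name and the small epsilon terms are numerical-stability assumptions:

```python
import math

def ciou_loss(pred, gt):
    """CIoU loss between one predicted and one ground-truth box,
    each given as (x1, y1, x2, y2); follows the symbols of formulas (1)-(4)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # IoU of the two boxes
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    area_p = (px2 - px1) * (py2 - py1)
    area_g = (gx2 - gx1) * (gy2 - gy1)
    iou = inter / (area_p + area_g - inter + 1e-9)
    # rho^2: squared centre distance; c^2: squared diagonal of the enclosing box
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + 1e-9
    # v: aspect-ratio consistency term; alpha: trade-off weight
    v = (4 / math.pi ** 2) * (math.atan((gx2 - gx1) / (gy2 - gy1))
                              - math.atan((px2 - px1) / (py2 - py1))) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return 1 - iou + rho2 / c2 + alpha * v
```

For identical boxes the loss is near zero; for disjoint boxes the centre-distance term keeps the loss above 1, which gives the regression a gradient even when IoU is zero.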
10. A target detection system based on an improved YOLOv5s network model in a complex water environment, used for implementing the target detection method based on the improved YOLOv5s network model in a complex water environment, characterized by comprising:
an acquisition module for acquiring images of underwater seafood;
a feature extraction module for inputting training set images into the improved YOLOv5s network model for feature extraction;
a hor_block attention module for applying to input data, in order: feature dimension transformation, horizontal layer normalization, linear-layer transformation, activation-function nonlinear mapping, a second linear-layer transformation, feature dimension transformation back, and a DropPath layer, before outputting the tensor;
the improved YOLOv5s network model, which extracts features from the data set acquired by the acquisition module via the feature extraction module, performs feature fusion in the improved neck network after the backbone network extracts features, and finally outputs at the detection heads;
wherein test set samples are input into the improved YOLOv5s network model, which loads a weight file for prediction and outputs the recognition and detection results for the test set.
CN202310951353.9A 2023-07-31 2023-07-31 Target detection method and system based on improved YOLOv5s network model under complex water environment Pending CN116912674A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310951353.9A CN116912674A (en) 2023-07-31 2023-07-31 Target detection method and system based on improved YOLOv5s network model under complex water environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310951353.9A CN116912674A (en) 2023-07-31 2023-07-31 Target detection method and system based on improved YOLOv5s network model under complex water environment

Publications (1)

Publication Number Publication Date
CN116912674A true CN116912674A (en) 2023-10-20

Family

ID=88353024

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310951353.9A Pending CN116912674A (en) 2023-07-31 2023-07-31 Target detection method and system based on improved YOLOv5s network model under complex water environment

Country Status (1)

Country Link
CN (1) CN116912674A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117854116A (en) * 2024-03-08 2024-04-09 中国海洋大学 Sea cucumber in-situ length measurement method based on Bezier curve
CN117854116B (en) * 2024-03-08 2024-05-17 中国海洋大学 Sea cucumber in-situ length measurement method based on Bezier curve
CN117876848A (en) * 2024-03-13 2024-04-12 成都理工大学 Complex environment falling stone detection method based on improved yolov5
CN117876848B (en) * 2024-03-13 2024-05-07 成都理工大学 Complex environment falling stone detection method based on improvement yolov5

Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN108427920B (en) Edge-sea defense target detection method based on deep learning
CN110084292B (en) Target detection method based on DenseNet and multi-scale feature fusion
CN116912674A (en) Target detection method and system based on improved YOLOv5s network model under complex water environment
CN111738344B (en) Rapid target detection method based on multi-scale fusion
Rahaman et al. An efficient multilevel thresholding based satellite image segmentation approach using a new adaptive cuckoo search algorithm
CN114972976B (en) Night target detection and training method and device based on frequency domain self-attention mechanism
CN110827312A (en) Learning method based on cooperative visual attention neural network
CN113591592B (en) Overwater target identification method and device, terminal equipment and storage medium
CN114973222B (en) Scene text recognition method based on explicit supervision attention mechanism
CN111179270A (en) Image co-segmentation method and device based on attention mechanism
CN113191222A (en) Underwater fish target detection method and device
CN116977844A (en) Lightweight underwater target real-time detection method
CN113077438B (en) Cell nucleus region extraction method and imaging method for multi-cell nucleus color image
CN110766708B (en) Image comparison method based on contour similarity
CN112270404A (en) Detection structure and method for bulge defect of fastener product based on ResNet64 network
CN116543295A (en) Lightweight underwater target detection method and system based on degradation image enhancement
Raj et al. A novel Ship detection method from SAR image with reduced false alarm
CN113435389B (en) Chlorella and golden algae classification and identification method based on image feature deep learning
Mao et al. Power transmission line image segmentation method based on binocular vision and feature pyramid network
CN112417961B (en) Sea surface target detection method based on scene prior knowledge
CN114964628A (en) Shuffle self-attention light-weight infrared detection method and system for ammonia gas leakage
CN114463764A (en) Table line detection method and device, computer equipment and storage medium
CN113076819A (en) Fruit identification method and device under homochromatic background and fruit picking robot
CN112465821A (en) Multi-scale pest image detection method based on boundary key point perception

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination