CN116912674A - Target detection method and system based on improved YOLOv5s network model under complex water environment - Google Patents
Info
- Publication number
- CN116912674A (application number CN202310951353.9A)
- Authority
- CN
- China
- Prior art keywords
- improved
- yolov5s
- network model
- module
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/05—Underwater scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A10/00—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
- Y02A10/40—Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping
Abstract
The invention discloses a target detection method and system based on an improved YOLOv5s network model in a complex underwater environment. S1: acquire images of underwater seafood, annotate and split the acquired images, and build a seafood data set; S2: improve the backbone network and detection-head part of the YOLOv5s network model to establish the improved YOLOv5s network model; S3: input the seafood data set into the improved YOLOv5s network model for training; S4: after training, input images of the underwater seafood to be detected into the trained improved YOLOv5s network model for detection, thereby obtaining the detection result of the underwater seafood to be detected. The network model of the invention pays more attention to seafood targets, reduces the influence of useless features, can be applied to seafood detection and recognition in complex underwater environments, and achieves high recognition accuracy.
Description
Technical Field
The invention relates to the technical field of ocean information, in particular to a target detection method and system based on an improved YOLOv5s network model under complex water environment.
Background
At present, most target detection frameworks are built on convolutional neural networks and are designed for objects in clear, on-land environments. The idea of such a network is to extract features from the input image and learn from those extracted features, producing a weight file that enables detection of the learned objects in new input images.
However, when such methods are applied underwater, image quality is poor because the underwater environment is complex and affected by factors such as illumination. As a result, applying a convolutional-neural-network-based detection framework to the underwater aquaculture industry suffers from low detection precision and inaccurate box regression.
Disclosure of Invention
In order to solve the problems in the prior art, a target detection method and system based on an improved YOLOv5s network model in a complex underwater environment are provided.
The technical scheme adopted to solve the technical problem is as follows:
The invention provides a target detection method based on an improved YOLOv5s network model in a complex underwater environment, comprising the following steps:
S1: acquire images of underwater seafood, annotate and split the acquired images, and build a seafood data set;
S2: improve the backbone network and detection-head part of the YOLOv5s network model to establish the improved YOLOv5s network model;
S3: input the seafood data set into the improved YOLOv5s network model for training;
S4: after training, input images of the underwater seafood to be detected into the trained improved YOLOv5s network model for detection, thereby obtaining the detection result of the underwater seafood to be detected.
Preferably, in S1, the data set is divided into a training set and a test set and converted into a format readable by the deep learning framework.
Preferably, in S3, the training-set images are input into the improved YOLOv5s network model for feature extraction; after features are extracted in the backbone network, feature fusion is performed in the improved neck network, and the result is finally output at the detection heads.
Preferably, in S4, after training of the improved YOLOv5s network model is completed, a trained weight file is generated; the test-set samples are input into the improved YOLOv5s network model, the weight file is loaded for prediction, and the recognition and detection results are finally output.
Preferably, the improved YOLOv5s network model comprises a backbone part, a neck part and an output part.
Preferably, for the backbone part: a Hor_Block attention module is integrated after each feature-extraction C3 layer of the backbone network to enhance the ability of the YOLOv5s backbone to extract features from underwater images. The Hor_Block attention module applies, in order, feature dimension permutation, layer normalization, a linear transformation, a nonlinear activation mapping, a second linear transformation, an inverse dimension permutation and a DropPath layer to the input tensor, then outputs the tensor. A feature map is output after each C3 module, denoted S1, S2, S3 and S4 respectively. The procedure is as follows:
S11: the input tensor X is normalized and then enters a DropPath branch, in which the tensor undergoes a recursive gated convolution, is multiplied by a scaling parameter, and finally has features randomly dropped;
S12: the tensor enters a feature dimension permutation layer, its dimension order is changed from (N, C, H, W) to (N, H, W, C), and it is layer-normalized;
S13: the tensor is processed by one linear layer, an activation function layer and another linear layer; if the learnable parameter T is not null, the tensor is multiplied by T;
S14: the dimension order is changed back to (N, C, H, W), the result is added to the branch output via a residual connection, and a DropPath layer randomly drops features before the final tensor is output.
Preferably, for the neck part: a feature pyramid network (FPN) structure is adopted between the backbone and the detection heads; the feature pyramid network processes feature information of different scales in the image so that targets of different sizes can be detected effectively;
in the model, CBS modules, up-sampling modules, Concat modules and C3 modules form the FPN structure, which adjusts the channel number of the features and changes the feature size, finally fusing feature maps containing feature information at different scales;
the CBS module changes the channel number of a feature using several 1×1 convolution kernels;
the up-sampling module combines low-resolution, high-semantic features with high-resolution, low-semantic features; the deeper feature maps therefore need to be up-sampled;
the Concat module and the C3 module fuse the up-sampled feature map with the corresponding shallow feature map, improving target detection performance; together, these modules let the model handle targets of different scales effectively and give it stronger feature representation capability;
the improvement of the FPN structure further comprises:
S21: a 160×160 small-target detection head is added after the 80×80 detection head, and the feature map S1 is extracted from the C3 module of layer 2;
S22: after layer 21 of the network, the feature map S1 drawn from layer 2 is spliced in, followed by a convolution module, an up-sampling module and a C3 module, and the result is finally output.
Preferably, for the output part: the original three detection heads of 80×80, 40×40 and 20×20 are changed to four detection heads of 160×160, 80×80, 40×40 and 20×20.
Preferably, the loss function for training the improved YOLOv5s network model adopts CIoU Loss. The loss consists of three parts, confidence loss, class loss and position loss, which respectively measure the accuracy of the confidence, the correctness of the class judgment and the accuracy of the detection-box regression during training, as shown in formulas (1)-(4):
L = L_{box} + L_{cls} + L_{obj}   (1)
where L_{box} denotes the position (bounding-box regression) loss, L_{cls} the class loss, and L_{obj} the confidence loss; the position loss is realized through the CIoU Loss function, calculated as follows:
L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v   (2)
v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2   (3)
\alpha = \frac{v}{(1 - IoU) + v}   (4)
where \rho^2(b, b^{gt}) is the squared Euclidean distance between the centers of the real box and the predicted box, c is the diagonal length of the smallest rectangle enclosing both boxes, v measures the consistency of the aspect ratios of the real and predicted boxes, \alpha is a weight coefficient, w and h are the width and height of the predicted box, and w^{gt} and h^{gt} are the width and height of the real box.
The target detection system in a complex underwater environment based on the improved YOLOv5s network model comprises:
an acquisition module, for acquiring images of underwater seafood;
a feature extraction module, for inputting the training-set images into the improved YOLOv5s network model for feature extraction;
a Hor_Block attention module, for applying feature dimension permutation, layer normalization, a linear transformation, a nonlinear activation mapping, a second linear transformation, an inverse dimension permutation and a DropPath layer to the input data and outputting the tensor;
the improved YOLOv5s network model, which extracts features from the data set acquired by the acquisition module via the feature extraction module, performs feature fusion in the improved neck network after backbone feature extraction, and finally outputs at the detection heads;
wherein the test-set samples are input into the improved YOLOv5s network model, the weight file is loaded for prediction, and the recognition and detection results of the test set are output.
Compared with the prior art, the invention has the following beneficial effects:
1. After the Hor_Block module is added, the mAP of the improved YOLOv5s is 1.5% higher than that of the original YOLOv5s, showing that adding the Hor_Block module to the backbone network makes the network pay more attention to seafood targets and reduces the influence of useless features, so the model can be applied to seafood detection and recognition in complex underwater environments.
2. After small-target detection is added, the accuracy of the model improves by a further 1.1%. Experimental results show that the improved YOLOv5s has 4.33M more parameters than the original YOLOv5s, while the final detection accuracy improves by 1.9% over the original. The invention therefore effectively improves seafood detection accuracy and meets the experimental requirements on detection precision.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic diagram of the Hor_Block attention Module in the present invention;
FIG. 2 is a schematic diagram of the improved YOLOv5s network structure according to the present invention;
FIG. 3 is a graph of initial model effects in accordance with the present invention;
FIG. 4 is a graph of the effect of the improved model of the present invention;
FIG. 5 is a graph of the effect of the initial model of the present invention;
FIG. 6 is a graph showing the effect of the model after improvement in the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
Example 1
As shown in FIGS. 1-6, this embodiment provides a target detection method in a complex underwater environment based on an improved YOLOv5s network model, comprising the following steps:
S1: acquire images of underwater seafood, annotate and split the acquired images, and build a seafood data set;
S2: improve the backbone network and detection-head part of the YOLOv5s network model to establish the improved YOLOv5s network model;
S3: input the seafood data set into the improved YOLOv5s network model for training;
S4: after training, input images of the underwater seafood to be detected into the trained improved YOLOv5s network model for detection, thereby obtaining the detection result of the underwater seafood to be detected.
In S1, the underwater seafood includes common species such as sea cucumbers, sea urchins, starfish and scallops; the data set is divided into a training set and a test set and converted into the txt format readable by the deep learning framework.
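The annotation and split in S1 can be sketched as follows. This is a minimal, hypothetical example, not the patent's code: it assumes YOLO-style txt labels (`<class_id> <cx> <cy> <w> <h>` with coordinates normalized to [0, 1]); the class names and the split ratio are illustrative.

```python
import random

# Hypothetical class list for the four seafood species named in the text.
CLASSES = ["holothurian", "echinus", "starfish", "scallop"]

def to_yolo_txt_line(cls_name, box, img_w, img_h):
    """Convert a pixel-space box (xmin, ymin, xmax, ymax) to a YOLO txt line:
    '<class_id> <cx> <cy> <w> <h>' with coordinates normalized to [0, 1]."""
    xmin, ymin, xmax, ymax = box
    cx = (xmin + xmax) / 2 / img_w
    cy = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{CLASSES.index(cls_name)} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

def split_dataset(image_names, train_ratio=0.7, seed=0):
    """Randomly split image names into training and test subsets."""
    names = sorted(image_names)
    random.Random(seed).shuffle(names)
    cut = int(len(names) * train_ratio)
    return names[:cut], names[cut:]
```

One txt file per image then holds one such line per annotated object.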
In S3, the training-set images are input into the improved YOLOv5s network model for feature extraction; after features are extracted in the backbone network, feature fusion is performed in the improved neck network, and the result is finally output at the detection heads.
In S4, after training of the improved YOLOv5s network model is completed, a trained weight file is generated; the test-set samples are input into the improved YOLOv5s network model, the weight file is loaded for prediction, and the recognition and detection results are finally output.
The improved YOLOv5s network model includes a backbone part, a neck part and an output part.
Backbone part: a Hor_Block attention module is integrated after each feature-extraction C3 layer of the backbone network. The structure of the Hor_Block attention module is shown in FIG. 1; the module applies, in order, feature dimension permutation, layer normalization, a linear transformation, a nonlinear activation mapping, a second linear transformation, an inverse dimension permutation and a DropPath layer to the input data, then outputs the tensor. A feature map is output after each C3 module, denoted S1, S2, S3 and S4.
For underwater data sets with indistinct image features, the Hor_Block attention module strengthens the ability of the YOLOv5s backbone to extract features from underwater images, improving the accuracy of seafood detection. The procedure is as follows:
S11: the input tensor X is normalized and then enters a DropPath branch, in which the tensor undergoes a recursive gated convolution, is multiplied by a scaling parameter, and finally has features randomly dropped;
S12: the tensor enters a feature dimension permutation layer, its dimension order is changed from (N, C, H, W) to (N, H, W, C), and it is layer-normalized;
S13: the tensor is processed by one linear layer, an activation function layer and another linear layer; if the learnable parameter T is not null, the tensor is multiplied by T;
S14: finally, the dimension order is changed back to (N, C, H, W), the result is added to the branch output via a residual connection, and a DropPath layer randomly drops features before the final tensor is output.
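Steps S11-S14 can be sketched at shape level as follows. This is a simplified illustration, not the patent's implementation: the recursive gated convolution of S11 is stood in for by a plain channel scaling, DropPath is treated as identity (its inference-time behavior), and the weights `w1`, `w2` and the learnable scale `gamma` are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalize over the last (channel) axis, as in (N, H, W, C) layout."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

def hor_block_forward(x, w1, w2, gamma=None, scale=1.0):
    """Simplified Hor_Block forward pass for an input x of shape (N, C, H, W)."""
    # S11: norm -> (gated-convolution stand-in) -> scaling -> residual add
    y = x + scale * x                    # placeholder for the gnConv branch
    # S12: permute (N, C, H, W) -> (N, H, W, C), then layer-normalize
    z = np.transpose(y, (0, 2, 3, 1))
    z = layer_norm(z)
    # S13: linear -> activation -> linear; optional learnable scale gamma (T)
    z = gelu(z @ w1) @ w2
    if gamma is not None:
        z = z * gamma
    # S14: permute back to (N, C, H, W), residual add; DropPath = identity here
    z = np.transpose(z, (0, 3, 1, 2))
    return y + z
```

With zero MLP weights the block reduces to its residual path, which makes the shape bookkeeping easy to check.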
Neck part: a feature pyramid network (Feature Pyramid Network, FPN) structure is adopted between the backbone and the detection heads. The FPN is a network structure designed for multi-scale target detection; it processes feature information of different scales in the image so that targets of different sizes can be detected effectively.
In the model, CBS modules, up-sampling modules, Concat modules and C3 modules form the FPN structure, whose main functions are to adjust the channel number of the features and change the feature size, finally fusing feature maps that contain feature information at different scales.
The CBS module changes the channel number of a feature using several 1×1 convolution kernels.
The up-sampling module combines low-resolution, high-semantic features with high-resolution, low-semantic features; the deeper feature maps therefore need to be up-sampled.
The Concat module and the C3 module fuse the up-sampled feature map with the corresponding shallow feature map, improving target detection performance. Together, these modules let the model handle targets of different scales effectively and give it stronger feature representation capability.
The improvement of the FPN structure further comprises:
S21: a 160×160 small-target detection head is added after the 80×80 detection head, and the feature map S1 is extracted from the C3 module of layer 2;
S22: after layer 21 of the network, the feature map S1 drawn from layer 2 is spliced in, followed by a convolution module, an up-sampling module and a C3 module, and the result is finally output.
Output part: the original three detection heads of 80×80, 40×40 and 20×20 are changed to four detection heads of 160×160, 80×80, 40×40 and 20×20.
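The four head sizes follow directly from the network strides: for a 640×640 input, strides 4, 8, 16 and 32 give 160×160, 80×80, 40×40 and 20×20 grids, the stride-4 grid being the added small-target head. A small sketch (three anchors per grid cell is the usual YOLOv5 default, assumed here):

```python
def head_grid_sizes(input_size=640, strides=(4, 8, 16, 32)):
    """Grid sizes of the detection heads for a square input.

    Stride 4 yields the added 160x160 small-target head; strides 8, 16
    and 32 give the original 80x80, 40x40 and 20x20 heads."""
    return [input_size // s for s in strides]

def total_predictions(input_size=640, strides=(4, 8, 16, 32), anchors_per_cell=3):
    """Total number of candidate boxes produced across all heads."""
    return sum(anchors_per_cell * (input_size // s) ** 2 for s in strides)
```

The stride-4 head alone contributes 160² cells, three quarters of all candidate boxes, which is why it helps small, densely packed targets.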
The loss function for training the improved YOLOv5s network model adopts CIoU Loss. The loss consists of three parts, confidence loss, class loss and position loss, which respectively measure the accuracy of the confidence, the correctness of the class judgment and the accuracy of the detection-box regression during training, as shown in formulas (1)-(4):
L = L_{box} + L_{cls} + L_{obj}   (1)
where L_{box} denotes the position (bounding-box regression) loss, L_{cls} the class loss, and L_{obj} the confidence loss; the position loss is realized through the CIoU Loss function, calculated as follows:
L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v   (2)
v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2   (3)
\alpha = \frac{v}{(1 - IoU) + v}   (4)
where \rho^2(b, b^{gt}) is the squared Euclidean distance between the centers of the real box and the predicted box, c is the diagonal length of the smallest rectangle enclosing both boxes, v measures the consistency of the aspect ratios of the real and predicted boxes, \alpha is a weight coefficient, w and h are the width and height of the predicted box, and w^{gt} and h^{gt} are the width and height of the real box.
By jointly considering the confidence, class and position losses, CIoU Loss provides a comprehensive training objective that promotes better model performance in the target detection task.
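Formulas (2)-(4) can be implemented directly. The sketch below is an illustrative single-box version, not the patent's code; it assumes corner-format boxes (xmin, ymin, xmax, ymax):

```python
import math

def ciou_loss(box_p, box_g, eps=1e-9):
    """CIoU loss: 1 - IoU + rho^2/c^2 + alpha*v, per formulas (2)-(4)."""
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1
    # IoU of the two boxes
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)
    # rho^2: squared Euclidean distance between box centers
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4
    # c^2: squared diagonal of the smallest enclosing rectangle
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw * cw + ch * ch + eps
    # v: aspect-ratio consistency term, alpha: its weight coefficient
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```

For identical boxes the loss is zero; for disjoint boxes IoU is zero but the center-distance term still provides a gradient, which is the point of CIoU over plain IoU loss.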
In the network training process of the target detection method based on the improved YOLOv5s network model, 6575 underwater images were randomly divided into a training set and a verification set at a ratio of 7:3. After the division, the label information, class proportions and size distributions were re-counted to ensure that the training and verification sets have similar distributions.
The system environment is Windows 10, training is performed on a GPU, and CUDA 11.1 released by NVIDIA is configured together with the cuDNN neural network acceleration library. The overall configuration of the training environment is shown in Table 1. During training, the batch_size is 48, the optimizer is Adam, the initial learning rate is 0.001, the weight decay is 0.0005, and the number of epochs is 200. Input images are resized to the default size of 640×640 pixels.
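The reported hyperparameters can be collected into a configuration sketch; any setting not stated in the text is left to the framework defaults and is not shown here:

```python
# Training hyperparameters as reported in the text (illustrative layout only).
TRAIN_CONFIG = {
    "batch_size": 48,
    "optimizer": "Adam",
    "lr0": 0.001,                    # initial learning rate
    "weight_decay": 0.0005,
    "epochs": 200,
    "img_size": 640,                 # inputs resized to 640 x 640 pixels
    "train_val_split": (0.7, 0.3),   # 7:3 random split of the 6575 images
}
```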
Table 1 training environment configuration table
After the Hor_Block module is added, the mAP of the improved YOLOv5s is 1.5% higher than that of the original YOLOv5s, showing that adding the Hor_Block module to the backbone network makes the network pay more attention to seafood targets and reduces the influence of useless features.
After small-target detection is added, the accuracy of the model improves by a further 1.1%, so the model recognizes small targets more accurately during detection.
Experimental results show that the improved YOLOv5s has 4.33M more parameters than the original YOLOv5s and is 22.7 ms slower per detection. The final detection accuracy of the model improves by 1.9% over the original.
Therefore, although the detection speed decreases, the model meets the experimental requirements on detection accuracy. Comparisons between the initial and improved models are shown in FIGS. 3-6.
Example two
The purpose of this embodiment is to provide a target detection system in a complex underwater environment based on an improved YOLOv5s network model, comprising:
an acquisition module, for acquiring images of underwater seafood;
a feature extraction module, for inputting the training-set images into the improved YOLOv5s network model for feature extraction;
a Hor_Block attention module, for applying feature dimension permutation, layer normalization, a linear transformation, a nonlinear activation mapping, a second linear transformation, an inverse dimension permutation and a DropPath layer to the input data and outputting the tensor;
the improved YOLOv5s network model, which extracts features from the data set acquired by the acquisition module via the feature extraction module, performs feature fusion in the improved neck network after backbone feature extraction, and finally outputs at the detection heads;
wherein the test-set samples are input into the improved YOLOv5s network model, the weight file is loaded for prediction, and the recognition and detection results of the test set are output.
After the Hor_Block module is added, the mAP of the improved YOLOv5s is 1.5% higher than that of the original YOLOv5s, showing that adding the Hor_Block module to the backbone network makes the network pay more attention to seafood targets and reduces the influence of useless features, so the model can be applied to seafood detection and recognition in complex underwater environments.
After small-target detection is added, the accuracy of the model improves by a further 1.1%. Experimental results show that the improved YOLOv5s has 4.33M more parameters than the original YOLOv5s, while the final detection accuracy improves by 1.9% over the original. The system therefore effectively improves seafood detection accuracy and meets the experimental requirements on detection precision.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.
Claims (10)
1. A target detection method in a complex water environment based on an improved YOLOv5s network model, characterized by comprising the following steps:
S1: acquiring images of underwater seafood, labeling and dividing the acquired images, and establishing a seafood data set;
S2: improving the backbone network and the detection head part of the YOLOv5s network model, and establishing an improved YOLOv5s network model;
S3: inputting the seafood data set into the improved YOLOv5s network model for training;
S4: after training is completed, inputting images of the underwater seafood to be detected into the trained improved YOLOv5s network model for detection, thereby obtaining the detection results for the underwater seafood to be detected.
2. The target detection method in a complex water environment based on the improved YOLOv5s network model according to claim 1, wherein in S1, the data set is divided into a training set and a test set, and converted into a format readable by the deep learning framework.
3. The target detection method in a complex water environment based on the improved YOLOv5s network model according to claim 2, wherein in S3, the training set images are input into the improved YOLOv5s network model for feature extraction: features are extracted by the backbone network, then fused in the improved neck network, and finally output at the detection heads.
4. The target detection method in a complex water environment based on the improved YOLOv5s network model according to claim 2, wherein in S4, after training of the improved YOLOv5s network model is completed, a trained weight file is generated; the test set samples are input into the improved YOLOv5s network model, which loads the weight file for prediction and finally outputs the recognition and detection results.
5. The target detection method in a complex water environment based on the improved YOLOv5s network model according to claim 1, wherein the improved YOLOv5s network model comprises a backbone portion, a neck portion and an output portion.
6. The target detection method in a complex water environment based on the improved YOLOv5s network model according to claim 5, wherein in the backbone portion: a hor_block attention module is integrated after each C3 layer of the backbone network used for feature extraction, the hor_block attention module being used for enhancing the feature extraction capability of the YOLOv5s backbone on underwater images; the hor_block attention module applies, to the input tensor, feature dimension transformation, layer normalization, linear-layer transformation, activation-function nonlinear mapping, a second linear-layer transformation, a reverse feature dimension transformation and a DropPath layer, and then outputs the tensor; a feature map is output after each C3 module, denoted S1, S2, S3 and S4 respectively, and the processing specifically comprises the following steps:
S11: the input tensor X is normalized and then enters the DropPath layer, in which the tensor undergoes a recursive gated convolution operation, is multiplied by a scaling parameter, and finally has features randomly dropped;
S12: the tensor enters the feature dimension transformation layer, where its dimension order is transformed from (N, C, H, W) to (N, H, W, C), and a normalization operation is applied to the tensor;
S13: the tensor is processed by one linear layer, an activation function layer and a second linear layer, and is multiplied by the learnable parameter T if T is not null;
S14: the dimension order of the tensor is changed back to (N, C, H, W), the result is added to the output of the first stage, and the final tensor is output after the DropPath layer randomly drops features.
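Steps S11-S14 above can be sketched at the tensor level. The following is a minimal NumPy illustration, not the patent's implementation: the recursive gated convolution of S11 is omitted, DropPath is treated as an identity (its inference-time behaviour), ReLU stands in for the unspecified activation, and all weight shapes are assumptions:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the last (channel) axis, as in S12.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def hor_block_sketch(x, w1, w2, t=None):
    """x: (N, C, H, W) input tensor; w1, w2: linear-layer weights (assumed shapes)."""
    # S12: transform dimension order (N, C, H, W) -> (N, H, W, C), then normalize.
    y = np.transpose(x, (0, 2, 3, 1))
    y = layer_norm(y)
    # S13: linear layer -> activation (nonlinear mapping) -> second linear layer.
    y = y @ w1
    y = np.maximum(y, 0.0)  # ReLU used as a stand-in activation
    y = y @ w2
    if t is not None:       # multiply by the learnable parameter T if present
        y = y * t
    # S14: transform back to (N, C, H, W) and add to the output of the first stage.
    y = np.transpose(y, (0, 3, 1, 2))
    return x + y            # DropPath acts as an identity at inference time

# Shape check on a dummy tensor.
c, hidden = 8, 32
x = np.random.randn(2, c, 4, 4)
out = hor_block_sketch(x, np.random.randn(c, hidden), np.random.randn(hidden, c))
print(out.shape)  # (2, 8, 4, 4)
```

Because the block ends with a residual addition, it preserves the input shape, which is what allows it to be inserted after every C3 layer without altering downstream feature-map sizes.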
7. The target detection method in a complex water environment based on the improved YOLOv5s network model according to claim 6, wherein in the neck portion: a feature pyramid network structure is adopted between the backbone portion and the detection heads, the feature pyramid network being used for processing feature information of different scales in the image so as to effectively detect targets of different sizes;
in the model, a CBS module, an up-sampling module, a Concat module and a C3 module form the FPN network structure, which adjusts the number of feature channels and changes the feature size, and finally fuses feature maps containing feature information of different scales;
the CBS module changes the number of channels of the features by using a number of 1×1 convolution kernels;
the up-sampling module combines low-resolution, high-semantic features with high-resolution, low-semantic features; deeper-level feature maps need to be up-sampled;
the Concat module and the C3 module fuse the up-sampled feature map with the corresponding shallow feature map, thereby improving target detection performance; these two modules enable the model to handle targets of different scales effectively and give it stronger feature representation capability;
the improvement of the FPN network structure further comprises:
S21: adding a 160×160 small-target detection head after the 80×80 detection head, and extracting the feature map S1 from the C3 module of layer 2;
S22: adding, after layer 21 of the network, a convolution module, an up-sampling module and a C3 module that splice in the feature map S1 led out from layer 2, and finally producing the output.
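The up-sampling and Concat path described in this claim can be illustrated at the shape level. The sketch below is a toy built on assumptions (the channel counts are invented, and the CBS 1×1 convolution is modelled as a plain channel slice purely to keep the shapes consistent); it only demonstrates how a deep feature map is up-sampled and concatenated with the shallow map S1:

```python
import numpy as np

# Deep feature map (high semantics, low resolution) and shallow map S1 from layer 2.
# All channel counts and spatial sizes here are assumptions for illustration.
deep = np.random.randn(1, 128, 80, 80)
shallow_s1 = np.random.randn(1, 64, 160, 160)

# Up-sampling module: nearest-neighbour 2x up-sampling (repeat along H and W).
up = deep.repeat(2, axis=2).repeat(2, axis=3)  # -> (1, 128, 160, 160)

# The CBS module's 1x1 convolution would reduce channels here;
# modelled as a channel slice only so the shapes line up.
up = up[:, :64]                                # -> (1, 64, 160, 160)

# Concat module: concatenate along the channel axis; a C3 module would then fuse.
fused = np.concatenate([up, shallow_s1], axis=1)
print(fused.shape)  # (1, 128, 160, 160)
```

The fused map carries both the semantic content of the deep layer and the spatial detail of S1, which is what makes the extra 160×160 head useful for small targets.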
8. The target detection method in a complex water environment based on the improved YOLOv5s network model according to claim 7, wherein in the output portion: the original three detection heads of 80×80, 40×40 and 20×20 are changed into four detection heads of 160×160, 80×80, 40×40 and 20×20.
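Assuming a 640×640 network input (a typical YOLOv5 setting, not stated in the claim), the four detection-head grid sizes correspond to strides 4, 8, 16 and 32:

```python
input_size = 640          # assumed network input resolution
strides = [4, 8, 16, 32]  # stride 4 comes from the added 160x160 small-target head
grids = [input_size // s for s in strides]
print(grids)  # [160, 80, 40, 20]
```

The stride-4 head is what gives the network its finer grid for small seafood targets; the other three grids match the original YOLOv5s heads.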
9. The target detection method in a complex water environment based on the improved YOLOv5s network model according to claim 1, wherein the loss function used for training the improved YOLOv5s network model adopts CIoU Loss; the loss function consists of three parts, confidence loss, category loss and position loss, which respectively measure the confidence accuracy, the category judgment accuracy and the detection frame regression accuracy during model training, as shown in formulas (1)-(4):
L = L_box + L_cls + L_obj    (1)
wherein L_box indicates the confidence loss, L_cls represents the class loss, and L_obj represents the position loss, where L_obj is implemented through the CIoU Loss function; the CIoU calculation formulas are as follows:
L_CIoU = 1 − IoU + ρ²(b, b_gt)/c² + αv    (2)
α = v / ((1 − IoU) + v)    (3)
v = (4/π²)·(arctan(w_gt/h_gt) − arctan(w/h))²    (4)
wherein ρ²(b, b_gt) represents the squared Euclidean distance between the center points of the real frame and the predicted frame, c represents the diagonal length of the minimal enclosing rectangle of the real frame and the predicted frame, v measures the difference in aspect ratio between the real frame and the predicted frame, α represents the weight coefficient, w and h represent the width and height of the predicted frame, and w_gt and h_gt represent the width and height of the real frame.
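The CIoU loss referenced in formulas (1)-(4) can be checked numerically. The sketch below assumes boxes in (x1, y1, x2, y2) corner form (a convention not fixed by the claim) and adds a small epsilon to avoid division by zero:

```python
import math

def ciou_loss(box, box_gt):
    """CIoU loss between a predicted box and a real box, both (x1, y1, x2, y2)."""
    # Intersection over union.
    ix1, iy1 = max(box[0], box_gt[0]), max(box[1], box_gt[1])
    ix2, iy2 = min(box[2], box_gt[2]), min(box[3], box_gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    w, h = box[2] - box[0], box[3] - box[1]
    w_gt, h_gt = box_gt[2] - box_gt[0], box_gt[3] - box_gt[1]
    iou = inter / (w * h + w_gt * h_gt - inter)
    # rho^2: squared Euclidean distance between the two center points.
    rho2 = ((box[0] + box[2]) / 2 - (box_gt[0] + box_gt[2]) / 2) ** 2 \
         + ((box[1] + box[3]) / 2 - (box_gt[1] + box_gt[3]) / 2) ** 2
    # c^2: squared diagonal of the minimal enclosing rectangle of both boxes.
    cw = max(box[2], box_gt[2]) - min(box[0], box_gt[0])
    ch = max(box[3], box_gt[3]) - min(box[1], box_gt[1])
    c2 = cw ** 2 + ch ** 2
    # v: aspect-ratio consistency term; alpha: weight coefficient.
    v = (4 / math.pi ** 2) * (math.atan(w_gt / h_gt) - math.atan(w / h)) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)
    # L_CIoU = 1 - IoU + rho^2 / c^2 + alpha * v
    return 1 - iou + rho2 / c2 + alpha * v

# A box compared with itself should give zero loss.
print(ciou_loss((0, 0, 10, 10), (0, 0, 10, 10)))  # 0.0
```

When the two boxes coincide, IoU is 1 and both penalty terms vanish, so the loss is exactly 0; any center offset or aspect-ratio mismatch strictly increases it.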
10. A target detection system in a complex water environment based on an improved YOLOv5s network model, used for implementing the above target detection method in a complex water environment based on the improved YOLOv5s network model, characterized by comprising:
an acquisition module, used for acquiring images of underwater seafood;
a feature extraction module, used for inputting the training set images into the improved YOLOv5s network model for feature extraction;
a hor_block attention module, used for applying, to the input data, feature dimension transformation, layer normalization, linear-layer transformation, activation-function nonlinear mapping, a second linear-layer transformation, a reverse feature dimension transformation and a DropPath layer, and then outputting the resulting tensor;
the improved YOLOv5s network model, used for performing feature extraction, through the feature extraction module, on the data set acquired by the acquisition module: features are extracted by the backbone network, fused in the improved neck network, and finally output at the detection heads;
and the test set samples are input into the improved YOLOv5s network model, which loads the trained weight file for prediction and outputs the recognition and detection results on the test set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310951353.9A CN116912674A (en) | 2023-07-31 | 2023-07-31 | Target detection method and system based on improved YOLOv5s network model under complex water environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116912674A true CN116912674A (en) | 2023-10-20 |
Family
ID=88353024
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310951353.9A Pending CN116912674A (en) | 2023-07-31 | 2023-07-31 | Target detection method and system based on improved YOLOv5s network model under complex water environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116912674A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117854116A (en) * | 2024-03-08 | 2024-04-09 | 中国海洋大学 | Sea cucumber in-situ length measurement method based on Bezier curve |
CN117854116B (en) * | 2024-03-08 | 2024-05-17 | 中国海洋大学 | Sea cucumber in-situ length measurement method based on Bezier curve |
CN117876848A (en) * | 2024-03-13 | 2024-04-12 | 成都理工大学 | Complex environment falling stone detection method based on improved yolov5 |
CN117876848B (en) * | 2024-03-13 | 2024-05-07 | 成都理工大学 | Complex environment falling stone detection method based on improvement yolov5 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109584248B (en) | Infrared target instance segmentation method based on feature fusion and dense connection network | |
CN108427920B (en) | Edge-sea defense target detection method based on deep learning | |
CN111738344B (en) | Rapid target detection method based on multi-scale fusion | |
CN116912674A (en) | Target detection method and system based on improved YOLOv5s network model under complex water environment | |
CN114972976B (en) | Night target detection and training method and device based on frequency domain self-attention mechanism | |
CN114973222B (en) | Scene text recognition method based on explicit supervision attention mechanism | |
CN110827312A (en) | Learning method based on cooperative visual attention neural network | |
CN113591592B (en) | Overwater target identification method and device, terminal equipment and storage medium | |
CN111179270A (en) | Image co-segmentation method and device based on attention mechanism | |
CN116977844A (en) | Lightweight underwater target real-time detection method | |
CN113077438B (en) | Cell nucleus region extraction method and imaging method for multi-cell nucleus color image | |
CN117078608B (en) | Double-mask guide-based high-reflection leather surface defect detection method | |
CN112465821A (en) | Multi-scale pest image detection method based on boundary key point perception | |
CN112270404A (en) | Detection structure and method for bulge defect of fastener product based on ResNet64 network | |
CN112507770A (en) | Rice disease and insect pest identification method and system | |
CN116543295A (en) | Lightweight underwater target detection method and system based on degradation image enhancement | |
CN115578364A (en) | Weak target detection method and system based on mixed attention and harmonic factor | |
Raj et al. | A novel Ship detection method from SAR image with reduced false alarm | |
CN113435389B (en) | Chlorella and golden algae classification and identification method based on image feature deep learning | |
CN112417961B (en) | Sea surface target detection method based on scene prior knowledge | |
CN108765365A (en) | A kind of rotor winding image qualification detection method | |
CN114964628A (en) | Shuffle self-attention light-weight infrared detection method and system for ammonia gas leakage | |
CN114463764A (en) | Table line detection method and device, computer equipment and storage medium | |
CN113076819A (en) | Fruit identification method and device under homochromatic background and fruit picking robot | |
CN115965681A (en) | Method and device for obtaining seaweed bed area based on convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||