CN117058232A - Position detection method for fish target individuals in cultured fish shoal by improving YOLOv8 model - Google Patents
Position detection method for fish target individuals in cultured fish shoal by improving YOLOv8 model
- Publication number
- CN117058232A CN117058232A CN202310933832.8A CN202310933832A CN117058232A CN 117058232 A CN117058232 A CN 117058232A CN 202310933832 A CN202310933832 A CN 202310933832A CN 117058232 A CN117058232 A CN 117058232A
- Authority
- CN
- China
- Prior art keywords
- target
- fish
- yolov8
- model
- pixels
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A40/00—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
- Y02A40/80—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in fisheries management
- Y02A40/81—Aquaculture, e.g. of fish
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Multimedia (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The application provides a position detection method for fish target individuals in a cultured fish shoal based on an improved YOLOv8 model. Images of the culture water area are collected as images to be detected and preprocessed; the YOLOv8 model is improved; the preprocessed images are input into the improved YOLOv8 model to obtain a feature map; targets are located through the predicted center-point coordinates and bounding-frame scales to obtain candidate prediction frames on the original image, and the score of each candidate prediction frame is calculated; the higher-scoring candidate prediction frames are selected as candidate frames according to their scores, and duplicate candidate frames are removed with a non-maximum suppression algorithm to obtain prediction frames; the position, size and category of each fish target individual are then detected from the prediction frames to obtain the final target detection result, which includes the position information and the category information of the individual fish targets. The application improves the accuracy of position detection of fish target individuals in the cultured fish shoal.
Description
Technical Field
The application belongs to the technical field of intelligent recognition, and particularly relates to a position detection method for fish target individuals in a cultured fish shoal based on an improved YOLOv8 model.
Background Art
Precision aquaculture is a new trend in aquaculture, and target detection technology is its foundation. In a real aquaculture environment, however, the blurring and occlusion of underwater fish shoals strongly interfere with fish target detection, and traditional image processing and machine vision methods often struggle with complex underwater conditions, so detection accuracy is low. In recent years, the development of deep learning has provided new solutions for target detection, among which the YOLO algorithm stands out for its stable operation and accurate detection, but its accuracy is still insufficient when underwater fish shoals are blurred or occluded. Aiming at the feature loss caused by fish-shoal blurring, Chen et al. realized multi-scale feature extraction in YOLOv7 using dilated convolutions with different sampling rates, but the effective features extracted are still limited. Li Haiqing et al. incorporated prior knowledge into YOLOv5 to further enhance the feature extraction capability of the model, but the final detection effect depends too heavily on the quality of that prior knowledge. For cases where underwater noise prevents clean prior knowledge from being acquired, Yu et al. replaced the nearest-neighbor interpolation upsampling in YOLOv7 with the CARAFE upsampling method, reducing the influence of noise on the model's feature extraction capability, but the detection effect of this method degrades for high-density farmed fish shoals. Aiming at missed detections caused by occlusion in high-density fish shoals, Li Haiqing et al. added a deformable convolution module to YOLOv5 so that sampling points focus more on foreground targets, but without effective learning guidance the deformable convolution module may attend to unnecessary features, reducing feature extraction capability. For model feature extraction, attention mechanisms have unique advantages and can improve a model's perception of key information. Zhao Meng et al. fused the SKNet attention mechanism with YOLOv5 to form a feature extraction network that attends to pixel-level information, but the dimension-reduction operation adopted by SKNet negatively affects the prediction of channel attention. The efficient channel attention mechanism ECA proposed by Wang et al. avoids dimension reduction during feature mapping, but does not attend to the spatial positions in the feature map, so its spatial feature extraction capability is insufficient. Woo et al. proposed CBAM, which adds a spatial attention mechanism on top of channel encoding and strengthens the model's spatial extraction capability, but still does not address the performance degradation caused by dimension reduction in the channel attention. Wei Saixue et al. proposed ECBAM, a dual attention mechanism without channel dimension reduction that optimizes the dimension-reduction operation in CBAM, but ECBAM compresses spatial information entirely into the channels by pooling and therefore cannot capture information interaction between remote spatial positions.
In view of the above problems, it is highly necessary to research and design a new, improved position detection method for fish target individuals in a cultured fish shoal based on the YOLOv8 model, so as to solve the problems of the existing position detection methods for fish target individuals.
Disclosure of Invention
The application provides a method for detecting the positions of fish target individuals in a cultured fish shoal by improving the YOLOv8 model, which aims to solve the problem that existing position detection methods for fish target individuals cannot effectively handle blurred fish shoals, turbid water and occlusion.
The application provides a method for detecting the position of a fish target individual in a cultured fish shoal by improving a YOLOv8 model, which comprises the following steps:
s1, collecting an image in a culture water area as an image to be detected, and preprocessing the image to be detected;
s2, adding an attention mechanism module at a C2f structure in a neck network of the YOLOv8 model, wherein the attention mechanism module comprises a high-efficiency channel attention module and a coordination space attention module, adding 1 large-size detection head at the bottom layer of a feature pyramid network of the YOLOv8 model, and carrying out feature fusion with the C2f structure at the bottommost layer of the neck network to obtain an improved YOLOv8 model;
s3, inputting the preprocessed image in the step S1 into the improved YOLOv8 model obtained in the step S2 to obtain a feature map, wherein the feature map is specifically as follows: inputting the preprocessed image in the step S1 into a high-efficiency channel attention module, and learning the correlation among different characteristic channels by using the channel attention to extract the characteristics more useful for the task; inputting the preprocessed image in the step S1 into a coordination space attention module, and extracting the position information characteristics of the target; fusing the more useful features for the task with the position information features of the target to obtain a feature map;
s4, positioning the target according to the feature map obtained in the step S3 through the center point coordinates of the predicted target and the scale of the boundary frame, obtaining a candidate predicted frame on the original image, and calculating the score of the candidate predicted frame;
s5, sorting all the candidate frames according to the scores of the candidate prediction frames obtained in the step S4, selecting a part with the highest score in the selected prediction frames of the overlapped candidates as a candidate frame, and removing the repeated candidate frames according to a non-maximum suppression algorithm to obtain a prediction frame;
s6, obtaining the position information of the individual fish targets according to the prediction frame obtained in the step S5.
According to some embodiments of the application, in the method for detecting the position of a fish target individual in a cultured fish shoal with an improved YOLOv8 model, the preprocessing in step S1 includes resizing the image and normalizing the pixel values.
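A minimal sketch of this preprocessing, assuming OpenCV and normalization of pixel values to [0, 1]; the 640 pixels×640 pixels target size follows the input-end specification below.

```python
import cv2
import numpy as np

def preprocess(image_path: str, size: int = 640) -> np.ndarray:
    """Resize an image to size x size and normalize pixel values to [0, 1]."""
    img = cv2.imread(image_path)            # BGR uint8 image from the culture area
    img = cv2.resize(img, (size, size))     # unify the resolution
    return img.astype(np.float32) / 255.0   # normalize pixel values
```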
According to some embodiments of the present application, in the step S2, the improved YOLOv8 model includes an input end, a backbone network and a head network, and the input end unifies the image into a resolution of 640 pixels×640 pixels; the backbone network comprises a convolutional neural network for extracting image features, and acquires richer gradient flows through the C2f structure and learns more features; the head network comprises a neck and a detection head, the neck connecting the backbone network and the detection head and being used for further extracting features and adjusting the resolution of the feature map, thereby improving the prediction capability of the detection head; the detection head comprises a classifier and a regressor, wherein the classifier is used for predicting the category of the target, and the regressor is used for predicting the position and the size of the target.
According to some embodiments of the present application, in the step S2, the size of the large-size detection head added at the bottom layer of the feature pyramid network of the YOLOv8 model is 160 pixels×160 pixels, the added large-size detection head performs feature fusion with the C2f structure at the bottom layer of the backbone network, and the detection sizes of the improved detection head on the feature map include 160 pixels×160 pixels, 80 pixels×80 pixels, 40 pixels×40 pixels and 20 pixels×20 pixels.
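The four detection sizes correspond to downsampling strides of 4, 8, 16 and 32 on a 640 pixels×640 pixels input, with the added 160 pixels×160 pixels head sitting at the stride-4 (P2) level of the pyramid; the snippet below checks this correspondence (the stride interpretation is inferred from the stated sizes, not stated in the application).

```python
# For a 640x640 input, each detection scale is the input downsampled by a stride;
# the added 160x160 head corresponds to stride 4 (the P2 level of the pyramid).
input_size = 640
for stride in (4, 8, 16, 32):
    side = input_size // stride
    print(f"stride {stride:2d} -> {side} x {side} feature map")
# stride  4 -> 160 x 160   (added large-size head)
# stride  8 -> 80 x 80
# stride 16 -> 40 x 40
# stride 32 -> 20 x 20
```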
According to some embodiments of the application, in the step S3, the efficient channel attention module uses the ECA attention mechanism to adaptively select the one-dimensional convolution kernel, and the feature M_c(F) extracted by the efficient channel attention module that is more useful for the task is calculated as shown in formula (1):

M_c(F) = σ(C_1(G(F)))   (1)

wherein G represents global average pooling, C_1 represents a one-dimensional convolution, σ represents the sigmoid function, F represents the input preprocessed image, and F ∈ R^{C×H×W}.
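A PyTorch sketch of formula (1); the adaptive rule for choosing the one-dimensional kernel size (with hyperparameters γ=2 and b=1) follows the standard ECA design and, like the module and parameter names, is an assumption rather than a detail given in the application.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention (formula (1)): global average pooling G,
    a one-dimensional convolution C1 with adaptively chosen kernel size,
    and a sigmoid, producing per-channel weights M_c(F)."""
    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1                       # adaptive odd kernel size
        self.pool = nn.AdaptiveAvgPool2d(1)             # G
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)  # C1
        self.sigmoid = nn.Sigmoid()                     # sigma

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> M_c(F): (B, C, 1, 1)
        y = self.pool(x)                                # (B, C, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(-1, -2))  # 1-D conv across channels
        return self.sigmoid(y.transpose(-1, -2).unsqueeze(-1))
```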
According to some embodiments of the application, in the step S3, the coordination space attention module uses the CA attention mechanism to decompose spatial attention into two one-dimensional feature encoding processes that aggregate features along the two spatial directions, capturing long-range dependencies along one spatial direction while preserving accurate position information along the other. Given an input x, each channel is first encoded along the horizontal and vertical directions using pooling kernels of sizes (H, 1) and (1, W), respectively. The encoding z_c^h(h) of the c-th channel at height h is calculated as shown in formula (2), and the encoding z_c^w(w) of the c-th channel at width w as shown in formula (3):

z_c^h(h) = (1/W) Σ_{0 ≤ i < W} x_c(h, i)   (2)

z_c^w(w) = (1/H) Σ_{0 ≤ j < H} x_c(j, w)   (3)
After the decomposition operation, an intermediate feature map f encoding the spatial information is obtained, as shown in formula (4):

f = δ(C_2([z^h, z^w]))   (4)

wherein δ is a nonlinear activation function, C_2 represents a two-dimensional convolution, [·,·] denotes the concatenation operation, z^h denotes the encoding of the c-th channel at height h, and z^w denotes the encoding of the c-th channel at width w.
After f is split, the attention weights g^h and g^w are obtained through convolution transformations, and the target position information feature M_s(F) extracted by the coordination space attention module is calculated as shown in formula (5):

M_s(F) = g^h ⊗ g^w   (5)
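A PyTorch sketch of formulas (2)–(5), assuming a standard coordinate attention layout; the choice of ReLU for the nonlinear activation δ and the channel-reduction ratio in the shared encoding are assumptions, as are the names.

```python
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    """Coordination space attention (formulas (2)-(5)): pool along H and W
    separately, encode the concatenation jointly, split it back, and produce
    the directional attention weights g^h and g^w whose product is M_s(F)."""
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # (H, 1) pooling, formula (2)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # (1, W) pooling, formula (3)
        self.conv1 = nn.Conv2d(channels, mid, 1)        # C2, shared encoding
        self.act = nn.ReLU(inplace=True)                # delta
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        z_h = self.pool_h(x)                              # (B, C, H, 1)
        z_w = self.pool_w(x).permute(0, 1, 3, 2)          # (B, C, W, 1)
        f = self.act(self.conv1(torch.cat([z_h, z_w], dim=2)))     # formula (4)
        f_h, f_w = torch.split(f, [h, w], dim=2)          # split f
        g_h = torch.sigmoid(self.conv_h(f_h))                      # (B, C, H, 1)
        g_w = torch.sigmoid(self.conv_w(f_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return g_h * g_w                                  # M_s(F), formula (5)
```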
according to some embodiments of the application, a method for detecting the position of a fish target individual in a cultured fish farm by improving the YOLOv8 model is provided, wherein in the step S3, the feature map M (F) is calculated as shown in the formula (6):
wherein F is E R C×H×W ,Representing element-wise multiplication.
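A sketch of the fusion in formula (6) as reconstructed above, reusing the ECA and CoordAttention sketches from the preceding blocks; the (B, C, 1, 1) channel weights and the (B, C, H, W) spatial weights broadcast over the input feature map under element-wise multiplication.

```python
import torch.nn as nn

class ECAM(nn.Module):
    """Fusion of formula (6): M(F) = M_c(F) * M_s(F) * F (element-wise),
    broadcasting channel weights and spatial weights over F.
    Depends on the ECA and CoordAttention sketches defined above."""
    def __init__(self, channels: int):
        super().__init__()
        self.channel_att = ECA(channels)             # M_c: (B, C, 1, 1)
        self.spatial_att = CoordAttention(channels)  # M_s: (B, C, H, W)

    def forward(self, x):
        return self.channel_att(x) * self.spatial_att(x) * x
```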
According to the method for detecting the position of the fish target individual in the cultured fish shoal with the improved YOLOv8 model, in the step S4, the target confidence is obtained by computing the intersection-over-union (IoU) between each candidate prediction frame and the real target frame in the image to be detected; if the IoU between them is greater than a threshold, the target confidence is defined as 1, otherwise as 0. The raw predictions output by the network are converted into a probability distribution with a softmax function to obtain class prediction probabilities, and the target confidence is multiplied by the class prediction probability to obtain the score of the candidate prediction frame.
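A minimal sketch of this scoring rule, assuming each candidate prediction frame has already been matched to a real target frame and its IoU computed; tensor shapes and names are illustrative.

```python
import torch

def candidate_scores(class_logits: torch.Tensor, ious: torch.Tensor,
                     iou_thresh: float = 0.5) -> torch.Tensor:
    """Score = target confidence x class prediction probability.

    class_logits: (N, num_classes) raw network outputs for N candidate frames
    ious: (N,) IoU of each candidate frame with its matched real target frame"""
    confidence = (ious > iou_thresh).float()           # 1 if IoU > threshold, else 0
    class_probs = torch.softmax(class_logits, dim=-1)  # raw outputs -> distribution
    return confidence * class_probs.max(dim=-1).values
```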
According to the method provided by the application for detecting the positions of fish target individuals in a cultured fish shoal with an improved YOLOv8 model, adding the large-size detection head at the bottom layer of the feature pyramid network of the YOLOv8 model allows the detail information of underwater fish individuals to be captured better and richer semantic information to be obtained; the attention mechanism module reduces the interference of blurred backgrounds, focuses on the key features of fish individuals, enhances the recognition of blurred fish shoals, and strengthens the direction-aware and position-sensitive spatial features. This effectively remedies the shortcomings of existing attention mechanisms in fish shoal detection, adapts better to the blurring and occlusion of underwater fish shoals, and improves the accuracy of position detection of fish target individuals in the cultured fish shoal.
Drawings
FIG. 1 is a flow chart of the method for detecting the position of a fish target individual in a cultured fish shoal with an improved YOLOv8 model according to the present application.
Detailed Description
Embodiments of the present application are described in further detail below with reference to the accompanying drawings and examples. The following examples are illustrative of the application but are not intended to limit the scope of the application.
Example 1
A method for detecting the position of a fish target individual in a cultured fish shoal by improving a YOLOv8 model comprises the following steps:
s1, collecting an image in a culture water area as an image to be detected, and preprocessing the image to be detected;
preprocessing includes adjusting the size of the image and normalizing the pixel values;
s2, adding an attention mechanism module at a C2f structure in a neck network of the YOLOv8 model, wherein the attention mechanism module comprises a high-efficiency channel attention module and a coordination space attention module, adding 1 large-size detection head at the bottom layer of a feature pyramid network of the YOLOv8 model, and carrying out feature fusion with the C2f structure at the bottommost layer of the neck network to obtain an improved YOLOv8 model;
the improved YOLOv8 model comprises an input end, a backbone network and a head network, wherein the input end unifies the image into a resolution of 640 pixels by 640 pixels; the backbone network comprises a convolutional neural network for extracting image features, acquires richer gradient flows through a C2f structure and learns more features; the head network comprises a neck and a detection head, and the neck is connected with the backbone network and the detection head and is used for further extracting features and adjusting the resolution of the feature map, so that the prediction capability of the detection head is improved; the detection head comprises a classifier and a regressive device, wherein the classifier is used for predicting the category of the target, and the regressive device is used for predicting the position and the size of the target;
the attention mechanism module is used for helping the model to focus on a specific region of interest in the image, and discarding useless and interference information, so that the feature expression capacity is improved;
The size of the large-size detection head added at the bottom layer of the feature pyramid network of the YOLOv8 model is 160 pixels×160 pixels; the added large-size detection head performs feature fusion with the C2f structure at the bottom layer of the backbone network, and the detection sizes of the improved detection head on the feature map include 160 pixels×160 pixels, 80 pixels×80 pixels, 40 pixels×40 pixels and 20 pixels×20 pixels. The newly added 160 pixels×160 pixels detection size provides higher resolution, captures more detailed information, and detects blurred underwater fish target individuals more effectively;
s3, inputting the preprocessed image in the step S1 into an improved YOLOv8 model, wherein the method specifically comprises the following steps: inputting the preprocessed image in the step S1 into a high-efficiency channel attention module, and learning the correlation among different characteristic channels by using the channel attention to extract the characteristics more useful for the task; inputting the preprocessed image in the step S1 into a coordination space attention module, and extracting the position information characteristics of the target; fusing the more useful features of the task with the position information features of the target to obtain a feature map;
the efficient channel attention module uses the ECA attention mechanism to adaptively select one-dimensional convolution kernels, and features M extracted by the efficient channel attention module and more useful for tasks c (F) As shown in formula (1):
M c (F)=σ(C 1 (G(F))) (1)
wherein G represents global average pooling, C 1 Representing a one-dimensional convolution, σ representing a sigmoid function, F representing the input preprocessed image, and F ε R C×H×W ;
The efficient channel attention module realizes efficient local cross-channel interaction and reduces the complexity of the YOLOv8 model;
the coordination spatial attention module uses CA attention mechanism to decompose the spatial attention into two one-dimensional feature coding processes, respectively aggregate features along two spatial directions, capture remote dependence in one spatial direction while maintaining accurate position information in the other spatial direction, given input x, firstly codes each channel in horizontal and vertical directions respectively using (H, 1) and (1, W) pooling cores of sizes, and the c-th channel codes at HThe calculation of (c) is shown in formula (2), the c-th channel encodes +.>The calculation of (2) is shown in the formula (3):
After the decomposition operation, an intermediate feature map f encoding the spatial information is obtained, as shown in formula (4):

f = δ(C_2([z^h, z^w]))   (4)

wherein δ is a nonlinear activation function, C_2 represents a two-dimensional convolution, [·,·] denotes the concatenation operation, z^h denotes the encoding of the c-th channel at height h, and z^w denotes the encoding of the c-th channel at width w;
after splitting f, obtaining the attention weight g through convolution transformation h And g w Coordinating target position information features M extracted by a spatial attention module s As shown in formula (5):
the calculation of the feature map M (F) is as shown in formula (6):
wherein F is E R C×H×w ,Representing element-wise multiplication.
S4, positioning the target through the center point coordinates of the predicted target and the scale of the boundary frame according to the feature map obtained in the step S3, obtaining a candidate predicted frame on the original image, and calculating the score of the candidate predicted frame;
The target confidence is obtained by computing the intersection-over-union (IoU) between the candidate prediction frame and the real target frame in the image to be detected; if the IoU between them is greater than a threshold, the target confidence is defined as 1, otherwise as 0; the raw predictions output by the network are converted into a probability distribution with a softmax function to obtain class prediction probabilities, and the target confidence is multiplied by the class prediction probability to obtain the score of the candidate prediction frame;
s5, sorting all the candidate frames according to the scores of the candidate prediction frames obtained in the step S4, selecting a part with the highest score in the overlapped candidate prediction frames as a candidate frame, and removing the repeated candidate frames according to a non-maximum suppression algorithm to obtain a prediction frame;
s6, obtaining the position information of the individual fish targets according to the prediction frame obtained in the step S5.
Example 2
The data used in the test were collected from a Takifugu rubripes culture workshop at a farm. To increase the differences among dataset samples, some highly similar data were removed, yielding 600 images with a resolution of 1280 pixels×720 pixels.
The swimming direction of the fish shoal in the culture environment is not fixed, while the camera shoots from a single angle, which does not match the actual situation. Therefore, to increase the diversity of fish-shoal samples, horizontal flipping, vertical flipping, and combined horizontal-vertical flipping were performed on the images in the dataset. After data augmentation the number of images doubled, giving 1200 images in total at a resolution of 1280 pixels×720 pixels. The images were split 7:2:1 into a training set, a validation set and a test set of 840, 240 and 120 images, respectively.
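A sketch of the flip augmentation and the 7:2:1 split, assuming one randomly chosen flip variant per image (which doubles the set from 600 to 1200 images, matching the counts above); directory layout and file naming are placeholders.

```python
import random
from pathlib import Path
import cv2

def augment_fish_images(image_dir: str, out_dir: str, seed: int = 0) -> None:
    """Double the dataset: copy each image and add one flipped variant, chosen
    among horizontal (1), vertical (0) and combined horizontal-vertical (-1) flips."""
    random.seed(seed)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(image_dir).glob("*.jpg")):
        img = cv2.imread(str(path))
        code = random.choice([1, 0, -1])    # cv2.flip codes
        cv2.imwrite(str(out / path.name), img)
        cv2.imwrite(str(out / f"{path.stem}_flip.jpg"), cv2.flip(img, code))

def split_7_2_1(files: list, seed: int = 0):
    """Shuffle and split file paths 7:2:1 into train/val/test
    (1200 images -> 840 / 240 / 120)."""
    random.seed(seed)
    files = sorted(files)
    random.shuffle(files)
    n_train, n_val = int(0.7 * len(files)), int(0.2 * len(files))
    return files[:n_train], files[n_train:n_train + n_val], files[n_train + n_val:]
```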
In fish target detection, labeling of the dataset is a key step. Traditional manual labeling introduces subjective differences in the results because annotators understand and judge targets differently, so the data labeling in this embodiment consists of two parts: semi-automatic labeling and manual fine-tuning. First, a YOLOv8 model is trained on the 1008 fish-shoal detection images labeled in the experiments of Li Haiqing et al., and the 1200 unlabeled images required for this experiment are fed into the trained YOLOv8 model for labeling, with the results stored as txt files. In testing, the trained YOLOv8 model accurately labels about 70% of the fish individuals in the images, but false and missed detections occur for blurred and occluded targets, so the remaining roughly 30% of individuals that cannot be labeled accurately are fine-tuned manually with the LabelImg labeling tool. The dataset labeled in this mixed manner has better universality and higher labeling quality.
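A sketch of the semi-automatic labeling pass, assuming the ultralytics package is used for inference; the weights filename and image directory are placeholders. save_txt=True writes YOLO-format txt label files, which are then refined manually in LabelImg for the hard cases.

```python
# Semi-automatic labeling pass (paths and weights are placeholders):
# save_txt=True writes one YOLO-format .txt label file per image.
from ultralytics import YOLO

labeler = YOLO("yolov8_trained_on_1008_labeled_images.pt")  # hypothetical weights
labeler.predict(source="unlabeled_images/", save_txt=True, conf=0.25)
```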
Test environment: PyTorch 1.10.0, CUDA 11.3, operating system Ubuntu 20.04, Python 3.8, development platform Jupyter, GPU RTX 3080 (10 GB) × 1, CPU Intel(R) Xeon(R) Platinum 8255C, base frequency 2.50 GHz.
Parameter setting: batch_size was set to 8, training ran for 300 epochs, and the initial learning rate was 0.0001.
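A sketch of this training configuration using the ultralytics API, assuming the YOLOv8x base discussed later in this example; "fish.yaml" is a placeholder dataset configuration describing the train/validation/test splits.

```python
# Training configuration sketch using the ultralytics API.
from ultralytics import YOLO

model = YOLO("yolov8x.yaml")  # YOLOv8x base framework, as discussed in Example 2
model.train(
    data="fish.yaml",  # placeholder dataset configuration
    epochs=300,        # 300 rounds
    batch=8,           # batch_size
    lr0=0.0001,        # initial learning rate
    imgsz=640,
)
```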
To verify the effectiveness of the algorithm, a confusion matrix was used to evaluate the method under study. The confusion matrix includes true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). Precision (P), recall (R) and mean average precision (mAP) are obtained from the confusion matrix, with the calculation formulas shown in formulas (7)–(10):
P = TP / (TP + FP)   (7)

R = TP / (TP + FN)   (8)

AP = ∫_0^1 P(R) dR   (9)

mAP = (1/N) Σ_{i=1}^{N} AP_i   (10)

where TP is the number of correctly predicted fish samples, FP is the number of samples incorrectly predicted as fish, FN is the number of fish samples that were not detected, N is the number of categories, and mAP@0.5 denotes the mean average precision with the IoU threshold set to 0.5.
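A minimal sketch of formulas (7)–(10); the trapezoidal integration of the precision-recall curve is one common choice and not necessarily the exact protocol used in these tests.

```python
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)                    # formula (7)

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)                    # formula (8)

def average_precision(recalls, precisions) -> float:
    """Area under the precision-recall curve (formula (9)), integrated with
    the trapezoidal rule over recall sorted in increasing order."""
    pts = sorted(zip(recalls, precisions))
    return sum((r1 - r0) * (p0 + p1) / 2.0
               for (r0, p0), (r1, p1) in zip(pts, pts[1:]))

def mean_average_precision(aps) -> float:
    return sum(aps) / len(aps)               # formula (10): mean over classes
```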
In order to verify the effectiveness of the improved parts of the model, an ablation test was designed with the following protocol: the large-size detection head and the attention mechanism module were each added separately to the YOLOv8 model, the model with the large-size detection head being named YOLOv8+Head and the model with the attention mechanism module being named YOLOv8+ECAM, and both were compared with the method of this embodiment (ECAM-YOLOv8), in which the two modules are added simultaneously. The test results are shown in Table 1.
Table 1 ablation test results
As can be seen from Table 1, both the large-size detection head and the attention mechanism module improve the detection performance of the YOLOv8 model, with the attention mechanism module bringing the larger improvement, which indicates that adding the attention mechanism module has a greater influence on the performance of the YOLOv8 model and helps the model focus on the most relevant parts of the input data, improving the detection effect. ECAM-YOLOv8 performs best: compared with the YOLOv8 model, precision improves by 2.30%, recall by 1.70%, and mean average precision by 1.60%. This is because the large-size detection head provides more context information, which is important for understanding the semantics and position of blurred and occluded individuals, and with the attention mechanism module the improved YOLOv8 model can more easily distinguish background from target, enhancing its processing capability in complex underwater environments.
In the case of turbid water and low visibility, introducing the attention mechanism module helps the model selectively attend to regions of interest when processing images, reducing the time and computation spent on background information. To study the influence of the position at which the attention mechanism module is added on the detection effect, a position comparison test was designed. The attention mechanism module was placed at 3 different locations: after the SPPF of the backbone network (Backbone), at the middle C2f structure of the neck (Neck), and at the bottom C2f structure of the neck (Neck). The test results are shown in Table 2.
TABLE 2 Performance comparison of the attention mechanism module at different addition locations
As shown in Table 2, adding the attention mechanism module to the neck (Neck) yields higher precision and recall than adding it to the backbone network (Backbone), because the backbone mainly extracts low-level image features and cannot capture the global information of fish individuals well. In contrast, the neck (Neck) carries higher-level semantic features, allowing better global modeling of the features. In addition, the attention mechanism module brings a more significant improvement when added after the C2f of the neck, because the attention is then applied during the information fusion process following the C2f, where key information in the features can be selected and amplified.
In order to verify the effect of the method of this embodiment on underwater fish shoal detection, advanced methods in the field were selected for model comparison experiments: KAYOLO, the underwater fish shoal target detection model of Li Haiqing et al. that incorporates prior knowledge into YOLOv5; DCM-ATM-YOLOv5, the underwater fish shoal target detection model of Li Haiqing et al. with an added deformable convolution module; SK-YOLOv5, the model of Zhao Meng et al. that fuses the SKNet attention mechanism with YOLOv5; and ESB-YOLO, the model of Wei Saixue et al. that fuses the channel non-dimension-reduction dual attention mechanism ECBAM with YOLOv5. The test results are shown in Table 3.
TABLE 3 Performance comparison of different models with ECAM-YOLOv8
As shown in Table 3, the ECAM-YOLOv8 of this embodiment gives the best results. Although the compared models alleviate, to a certain extent, the poor detection caused by blurring and occlusion, each has its own shortcomings. KAYOLO uses prior knowledge to enhance the features of the object to be detected, but its detection results depend too heavily on the quality and quantity of that prior knowledge; when the prior knowledge is noisy or scarce, the detection effect suffers considerably. The deformable convolution module adopted by DCM-ATM-YOLOv5 may, for lack of effective learning guidance, focus on unnecessary features, reducing feature extraction capability. SK-YOLOv5, which adopts the SKNet attention mechanism, can ignore local detail information after global-context-based feature fusion, making blurred fish-shoal individuals harder to detect. ESB-YOLO does not consider spatial position interaction in its spatial information encoding, so fish individuals cannot be localized accurately. By adding the large-size detection head and the ECAM attention mechanism, the ECAM-YOLOv8 of this embodiment not only extracts the detail information of blurred individuals but also enhances the position perception of the model, improving the accuracy of underwater fish shoal target detection.
In order to verify the robustness of the algorithm, 2420 images were selected from the fish dataset of the China Agricultural Artificial Intelligence Innovation and Entrepreneurship Competition, split 7:2:1, and a comparison test was performed against KAYOLO, DCM-ATM-YOLOv5, SK-YOLOv5 and ESB-YOLO. The test results are shown in Table 4.
TABLE 4 Performance comparison on the public dataset
As shown in Table 4, the precision, recall and mean average precision of the improved YOLOv8 (ECAM-YOLOv8) of this embodiment are all higher than those of the other models, indicating that the proposed ECAM, together with the improved Head module, has stronger robustness.
Among advanced target detection models, the YOLOv8 model achieves higher detection precision and faster inference by introducing a stronger network structure and optimization algorithms, and it adopts more effective data augmentation strategies, improving the generalization capability of the model across scenes and conditions. The YOLOv8 series includes 5 models, from small to large: YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l and YOLOv8x. Compared with the first 4 models, YOLOv8x adopts a deeper network structure and introduces more convolution layers and residual connections, so it can learn more features and achieves the best detection effect. This study therefore uses the YOLOv8x model as the basic model framework. To better extract the features of blurred underwater targets and maximize detection accuracy, a large-size detection head is added to the YOLOv8 model to extract more detail information from blurred targets. Meanwhile, after feature fusion, the attention mechanism module attends to the position information of the target fish shoal, enabling accurate detection on video streams of underwater farmed fish shoals.
The results of the ablation and comparison tests show that the improved YOLOv8 (ECAM-YOLOv8) of this embodiment performs best in the underwater blurred fish shoal detection task: compared with the YOLOv8 model, the precision, recall and mean average precision of the method of this embodiment improve by 2.30%, 1.70% and 1.60%, respectively. Compared with the current advanced underwater fish shoal target detection models KAYOLO, DCM-ATM-YOLOv5, SK-YOLOv5 and ESB-YOLO, the improved YOLOv8 (ECAM-YOLOv8) also shows stronger performance and more accurate detection results, with mean average precision improved by 0.70%, 1.00%, 2.40% and 2.00%, respectively. In the comparison experiments on the public dataset, the precision, recall and mean average precision of the improved YOLOv8 (ECAM-YOLOv8) are all higher than those of the other models, indicating that the proposed algorithm is robust. The method provided by the application is therefore suitable for complex underwater scenes and provides a new method for detecting farmed fish shoals.
The embodiments of the application have been presented for purposes of illustration and description, and are not intended to be exhaustive or limited to the application in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to best explain the principles of the application and the practical application, and to enable others of ordinary skill in the art to understand the application for various embodiments with various modifications as are suited to the particular use contemplated.
Claims (8)
1. A method for detecting the position of a fish target individual in a cultured fish shoal by improving a YOLOv8 model, characterized by comprising the following steps:
s1, collecting an image in a culture water area as an image to be detected, and preprocessing the image to be detected;
s2, adding an attention mechanism module at a C2f structure in a neck network of the YOLOv8 model, wherein the attention mechanism module comprises a high-efficiency channel attention module and a coordination space attention module, adding 1 large-size detection head at the bottom layer of a feature pyramid network of the YOLOv8 model, and carrying out feature fusion with the C2f structure at the bottommost layer of the neck network to obtain an improved YOLOv8 model;
s3, inputting the preprocessed image in the step S1 into the improved YOLOv8 model obtained in the step S2 to obtain a feature map, wherein the feature map is specifically as follows: inputting the preprocessed image in the step S1 into a high-efficiency channel attention module, and learning the correlation among different characteristic channels by using the channel attention to extract the characteristics more useful for the task; inputting the preprocessed image in the step S1 into a coordination space attention module, and extracting the position information characteristics of the target; fusing the more useful features for the task with the position information features of the target to obtain a feature map;
s4, positioning the target according to the feature map obtained in the step S3 through the center point coordinates of the predicted target and the scale of the boundary frame, obtaining a candidate predicted frame on the original image, and calculating the score of the candidate predicted frame;
s5, sorting all the candidate frames according to the scores of the candidate prediction frames obtained in the step S4, selecting a part with the highest score in the selected prediction frames of the overlapped candidates as a candidate frame, and removing the repeated candidate frames according to a non-maximum suppression algorithm to obtain a prediction frame;
s6, obtaining the position information of the individual fish targets according to the prediction frame obtained in the step S5.
2. The method for detecting the position of a fish target individual in a cultured fish shoal with an improved YOLOv8 model according to claim 1, wherein in the step S1, the preprocessing includes resizing the image and normalizing the pixel values.
3. The method for detecting the position of a fish target individual in a cultured fish shoal with an improved YOLOv8 model according to claim 1, wherein in the step S2, the improved YOLOv8 model includes an input end, a backbone network and a head network, and the input end unifies the image into a resolution of 640 pixels×640 pixels; the backbone network comprises a convolutional neural network for extracting image features, and acquires richer gradient flows through the C2f structure and learns more features; the head network comprises a neck and a detection head, the neck connecting the backbone network and the detection head and being used for further extracting features and adjusting the resolution of the feature map, thereby improving the prediction capability of the detection head; the detection head comprises a classifier and a regressor, wherein the classifier is used for predicting the category of the target, and the regressor is used for predicting the position and the size of the target.
4. The method for detecting the position of a fish target individual in a cultured fish shoal with an improved YOLOv8 model according to claim 1, wherein in the step S2, the size of the large-size detection head added at the bottom layer of the feature pyramid network of the YOLOv8 model is 160 pixels×160 pixels, the added large-size detection head performs feature fusion with the C2f structure at the bottom layer of the backbone network, and the detection sizes of the improved detection head on the feature map include 160 pixels×160 pixels, 80 pixels×80 pixels, 40 pixels×40 pixels and 20 pixels×20 pixels.
5. The method for detecting the position of a fish target individual in a cultured fish shoal with an improved YOLOv8 model according to claim 1, wherein in the step S3, the efficient channel attention module uses the ECA attention mechanism to adaptively select the one-dimensional convolution kernel, and the feature M_c(F) extracted by the efficient channel attention module that is more useful for the task is calculated as shown in formula (1):

M_c(F) = σ(C_1(G(F)))   (1)

wherein G represents global average pooling, C_1 represents a one-dimensional convolution, σ represents the sigmoid function, F represents the input preprocessed image, and F ∈ R^{C×H×W}.
6. The method for detecting the position of a fish target individual in a cultured fish shoal with an improved YOLOv8 model according to claim 5, wherein in the step S3, the coordination space attention module uses the CA attention mechanism to decompose spatial attention into two one-dimensional feature encoding processes that aggregate features along the two spatial directions, capturing long-range dependencies along one spatial direction while preserving accurate position information along the other; given an input x, each channel is first encoded along the horizontal and vertical directions using pooling kernels of sizes (H, 1) and (1, W), respectively, the encoding z_c^h(h) of the c-th channel at height h being calculated as shown in formula (2) and the encoding z_c^w(w) of the c-th channel at width w as shown in formula (3):

z_c^h(h) = (1/W) Σ_{0 ≤ i < W} x_c(h, i)   (2)

z_c^w(w) = (1/H) Σ_{0 ≤ j < H} x_c(j, w)   (3)
after the decomposition operation, an intermediate feature map f encoding the spatial information is obtained, as shown in formula (4):

f = δ(C_2([z^h, z^w]))   (4)

wherein δ is a nonlinear activation function, C_2 represents a two-dimensional convolution, [·,·] denotes the concatenation operation, z^h denotes the encoding of the c-th channel at height h, and z^w denotes the encoding of the c-th channel at width w;
after f is split, the attention weights g^h and g^w are obtained through convolution transformations, and the target position information feature M_s(F) extracted by the coordination space attention module is calculated as shown in formula (5):

M_s(F) = g^h ⊗ g^w   (5)
7. The method for detecting the position of a fish target individual in a cultured fish shoal with an improved YOLOv8 model according to claim 6, wherein in the step S3, the feature map M(F) is calculated as shown in formula (6):

M(F) = M_c(F) ⊗ M_s(F) ⊗ F   (6)

wherein F ∈ R^{C×H×W}, and ⊗ represents element-wise multiplication.
8. The method for detecting the position of a fish target individual in a cultured fish shoal with an improved YOLOv8 model according to claim 7, wherein in the step S4, the target confidence is obtained by computing the intersection-over-union (IoU) between each candidate prediction frame and the real target frame in the image to be detected; if the IoU between them is greater than a threshold, the target confidence is defined as 1, otherwise as 0; and the raw predictions output by the network are converted into a probability distribution with a softmax function to obtain class prediction probabilities, and the target confidence is multiplied by the class prediction probability to obtain the score of the candidate prediction frame.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310933832.8A CN117058232A (en) | 2023-07-27 | 2023-07-27 | Position detection method for fish target individuals in cultured fish shoal by improving YOLOv8 model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310933832.8A CN117058232A (en) | 2023-07-27 | 2023-07-27 | Position detection method for fish target individuals in cultured fish shoal by improving YOLOv8 model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117058232A true CN117058232A (en) | 2023-11-14 |
Family
ID=88656429
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310933832.8A Pending CN117058232A (en) | 2023-07-27 | 2023-07-27 | Position detection method for fish target individuals in cultured fish shoal by improving YOLOv8 model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117058232A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117765421A (en) * | 2024-02-22 | 2024-03-26 | 交通运输部天津水运工程科学研究所 | coastline garbage identification method and system based on deep learning |
CN118279935A (en) * | 2024-06-03 | 2024-07-02 | 广东海洋大学 | High-body Seriola detection method based on YOLOv network structure |
CN118397074A (en) * | 2024-05-29 | 2024-07-26 | 中国海洋大学三亚海洋研究院 | Fish target length detection method based on binocular vision |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115690675A (en) * | 2022-10-12 | 2023-02-03 | 大连海洋大学 | ESB-YOLO model cultured fish shoal detection method based on channel non-dimensionality reduction attention mechanism and improved YOLOv5 |
CN116189012A (en) * | 2022-11-22 | 2023-05-30 | 重庆邮电大学 | Unmanned aerial vehicle ground small target detection method based on improved YOLOX |
-
2023
- 2023-07-27 CN CN202310933832.8A patent/CN117058232A/en active Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115690675A (en) * | 2022-10-12 | 2023-02-03 | 大连海洋大学 | ESB-YOLO model cultured fish shoal detection method based on channel non-dimensionality reduction attention mechanism and improved YOLOv5 |
CN116189012A (en) * | 2022-11-22 | 2023-05-30 | 重庆邮电大学 | Unmanned aerial vehicle ground small target detection method based on improved YOLOX |
Non-Patent Citations (2)
Title |
---|
- Du Baoxia, "Apple detection method based on improved YOLOv8", Wireless Internet Technology, pages 119-122 *
- Jiang Yajun, "Paper cup defect detection method based on improved YOLOv5s model", Packaging Engineering, vol. 44, no. 11, pages 249-258 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117765421A (en) * | 2024-02-22 | 2024-03-26 | 交通运输部天津水运工程科学研究所 | coastline garbage identification method and system based on deep learning |
CN117765421B (en) * | 2024-02-22 | 2024-04-26 | 交通运输部天津水运工程科学研究所 | Coastline garbage identification method and system based on deep learning |
CN118397074A (en) * | 2024-05-29 | 2024-07-26 | 中国海洋大学三亚海洋研究院 | Fish target length detection method based on binocular vision |
CN118279935A (en) * | 2024-06-03 | 2024-07-02 | 广东海洋大学 | High-body Seriola detection method based on YOLOv network structure |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113065558B (en) | Lightweight small target detection method combined with attention mechanism | |
CN112150493B (en) | Semantic guidance-based screen area detection method in natural scene | |
CN110929593B (en) | Real-time significance pedestrian detection method based on detail discrimination | |
Meng et al. | Single-image dehazing based on two-stream convolutional neural network | |
CN117058232A (en) | Position detection method for fish target individuals in cultured fish shoal by improving YOLOv8 model | |
CN112598713A (en) | Offshore submarine fish detection and tracking statistical method based on deep learning | |
CN114724022B (en) | Method, system and medium for detecting farmed fish shoal by fusing SKNet and YOLOv5 | |
CN114626445B (en) | Dam termite video identification method based on optical flow network and Gaussian background modeling | |
CN118379288B (en) | Embryo prokaryotic target counting method based on fuzzy rejection and multi-focus image fusion | |
CN116912694A (en) | Water surface target detection algorithm based on improved YOLO V5 | |
CN117132914A (en) | Method and system for identifying large model of universal power equipment | |
CN116452966A (en) | Target detection method, device and equipment for underwater image and storage medium | |
CN114926826A (en) | Scene text detection system | |
CN112070181B (en) | Image stream-based cooperative detection method and device and storage medium | |
Niu et al. | Underwater Waste Recognition and Localization Based on Improved YOLOv5. | |
CN116883832A (en) | Surface water boundary refined extraction method | |
Duan et al. | Boosting fish counting in sonar images with global attention and point supervision | |
Zhou et al. | A lightweight object detection framework for underwater imagery with joint image restoration and color transformation | |
Rout et al. | Underwater visual surveillance: A comprehensive survey | |
Hu et al. | Data augmentation vision transformer for fine-grained image classification | |
CN111160255A (en) | Fishing behavior identification method and system based on three-dimensional convolutional network | |
CN118262228B (en) | Fish object segmentation method in underwater robust video based on self-adaptive selection optical flow | |
CN118195926B (en) | Registration-free multi-focus image fusion method based on spatial position offset sensing | |
CN115909225B (en) | OL-YoloV ship detection method based on online learning | |
CN117392505B (en) | Image target detection method and system based on DETR (detail description of the invention) improved algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||