CN117058232A - Position detection method for fish target individuals in cultured fish shoal by improving YOLOv8 model - Google Patents
Position detection method for fish target individuals in cultured fish shoal by improving YOLOv8 model
- Publication number
- CN117058232A CN117058232A CN202310933832.8A CN202310933832A CN117058232A CN 117058232 A CN117058232 A CN 117058232A CN 202310933832 A CN202310933832 A CN 202310933832A CN 117058232 A CN117058232 A CN 117058232A
- Authority
- CN
- China
- Prior art keywords
- target
- fish
- yolov8
- model
- pixels
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A40/00—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
- Y02A40/80—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in fisheries management
- Y02A40/81—Aquaculture, e.g. of fish
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Multimedia (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The application provides a position detection method for fish target individuals in a cultured fish shoal based on an improved YOLOv8 model. Images of the culture water area are collected as images to be detected and preprocessed; the YOLOv8 model is improved; the preprocessed images are input into the improved YOLOv8 model to obtain a feature map; targets are located through the predicted center-point coordinates and bounding-frame scales to obtain candidate prediction frames on the original image, and the score of each candidate prediction frame is calculated; the higher-scoring candidate prediction frames are selected as candidate frames according to their scores, and duplicate candidate frames are removed with a non-maximum suppression algorithm to obtain prediction frames; the position, size and category of each fish target individual are then detected from the prediction frames to obtain the final target detection result, which includes the position information and the category information of the individual fish targets. The application improves the accuracy of position detection of fish target individuals in the cultured fish shoal.
Description
Technical Field
The application belongs to the technical field of intelligent recognition, and particularly relates to a position detection method for fish target individuals in a cultured fish shoal based on an improved YOLOv8 model.
Background Art
Precision aquaculture is a new trend in aquaculture, and target detection technology is its foundation. In a real aquaculture environment, however, the blurring and occlusion of underwater fish shoals strongly interfere with fish target detection, and traditional image processing and machine vision methods often struggle with complex underwater conditions, so detection accuracy is low. In recent years, the development of deep learning has provided new solutions for target detection, among which the YOLO algorithm stands out for its stable operation and accurate detection, but its accuracy is still insufficient when underwater fish shoals are blurred or occluded. Aiming at the feature loss caused by fish-shoal blurring, Chen et al. realized multi-scale feature extraction in YOLOv7 using dilated convolutions with different sampling rates, but the effective features extracted are still limited. Li Haiqing et al. incorporated prior knowledge into YOLOv5 to further enhance the feature extraction capability of the model, but the final detection effect depends too heavily on the quality of that prior knowledge. For cases where underwater noise prevents clean prior knowledge from being acquired, Yu et al. replaced the nearest-neighbor interpolation upsampling in YOLOv7 with the CARAFE upsampling method, reducing the influence of noise on the model's feature extraction capability, but the detection effect of this method degrades for high-density farmed fish shoals. Aiming at missed detections caused by occlusion in high-density fish shoals, Li Haiqing et al. added a deformable convolution module to YOLOv5 so that sampling points focus more on foreground targets, but without effective learning guidance the deformable convolution module may attend to unnecessary features, reducing feature extraction capability. For model feature extraction, attention mechanisms have unique advantages and can improve a model's perception of key information. Zhao Meng et al. fused the SKNet attention mechanism with YOLOv5 to form a feature extraction network that attends to pixel-level information, but the dimension-reduction operation adopted by SKNet negatively affects the prediction of channel attention. The efficient channel attention mechanism ECA proposed by Wang et al. avoids dimension reduction during feature mapping, but does not attend to the spatial positions in the feature map, so its spatial feature extraction capability is insufficient. Woo et al. proposed CBAM, which adds a spatial attention mechanism on top of channel encoding and strengthens the model's spatial extraction capability, but still does not address the performance degradation caused by dimension reduction in the channel attention. Wei Saixue et al. proposed ECBAM, a dual attention mechanism without channel dimension reduction that optimizes the dimension-reduction operation in CBAM, but ECBAM compresses spatial information entirely into the channels by pooling and therefore cannot capture information interaction between remote spatial positions.
In view of the above problems, it is highly necessary to research and design a new, improved position detection method for fish target individuals in a cultured fish shoal based on the YOLOv8 model, so as to solve the problems of the existing position detection methods for fish target individuals.
Disclosure of Invention
The application provides a method for detecting the positions of fish target individuals in a cultured fish shoal by improving the YOLOv8 model, which aims to solve the problem that existing position detection methods for fish target individuals cannot effectively handle blurred fish shoals, turbid water and occlusion.
The application provides a method for detecting the position of a fish target individual in a cultured fish shoal by improving a YOLOv8 model, which comprises the following steps:
s1, collecting an image in a culture water area as an image to be detected, and preprocessing the image to be detected;
s2, adding an attention mechanism module at a C2f structure in a neck network of the YOLOv8 model, wherein the attention mechanism module comprises a high-efficiency channel attention module and a coordination space attention module, adding 1 large-size detection head at the bottom layer of a feature pyramid network of the YOLOv8 model, and carrying out feature fusion with the C2f structure at the bottommost layer of the neck network to obtain an improved YOLOv8 model;
s3, inputting the preprocessed image in the step S1 into the improved YOLOv8 model obtained in the step S2 to obtain a feature map, wherein the feature map is specifically as follows: inputting the preprocessed image in the step S1 into a high-efficiency channel attention module, and learning the correlation among different characteristic channels by using the channel attention to extract the characteristics more useful for the task; inputting the preprocessed image in the step S1 into a coordination space attention module, and extracting the position information characteristics of the target; fusing the more useful features for the task with the position information features of the target to obtain a feature map;
s4, positioning the target according to the feature map obtained in the step S3 through the center point coordinates of the predicted target and the scale of the boundary frame, obtaining a candidate predicted frame on the original image, and calculating the score of the candidate predicted frame;
s5, sorting all the candidate frames according to the scores of the candidate prediction frames obtained in the step S4, selecting a part with the highest score in the selected prediction frames of the overlapped candidates as a candidate frame, and removing the repeated candidate frames according to a non-maximum suppression algorithm to obtain a prediction frame;
s6, obtaining the position information of the individual fish targets according to the prediction frame obtained in the step S5.
According to some embodiments of the application, in the method for detecting the position of a fish target individual in a cultured fish shoal with an improved YOLOv8 model, the preprocessing in step S1 includes resizing the image and normalizing the pixel values.
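A minimal sketch of this preprocessing, assuming OpenCV and normalization of pixel values to [0, 1]; the 640 pixels×640 pixels target size follows the input-end specification below.

```python
import cv2
import numpy as np

def preprocess(image_path: str, size: int = 640) -> np.ndarray:
    """Resize an image to size x size and normalize pixel values to [0, 1]."""
    img = cv2.imread(image_path)            # BGR uint8 image from the culture area
    img = cv2.resize(img, (size, size))     # unify the resolution
    return img.astype(np.float32) / 255.0   # normalize pixel values
```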
According to some embodiments of the present application, in the step S2, the improved YOLOv8 model includes an input end, a backbone network and a head network, and the input end unifies the image into a resolution of 640 pixels×640 pixels; the backbone network comprises a convolutional neural network for extracting image features, and acquires richer gradient flows through the C2f structure and learns more features; the head network comprises a neck and a detection head, the neck connecting the backbone network and the detection head and being used for further extracting features and adjusting the resolution of the feature map, thereby improving the prediction capability of the detection head; the detection head comprises a classifier and a regressor, wherein the classifier is used for predicting the category of the target, and the regressor is used for predicting the position and the size of the target.
According to some embodiments of the present application, in the step S2, the size of the large-size detection head added at the bottom layer of the feature pyramid network of the YOLOv8 model is 160 pixels×160 pixels, the added large-size detection head performs feature fusion with the C2f structure at the bottom layer of the backbone network, and the detection sizes of the improved detection head on the feature map include 160 pixels×160 pixels, 80 pixels×80 pixels, 40 pixels×40 pixels and 20 pixels×20 pixels.
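The four detection sizes correspond to downsampling strides of 4, 8, 16 and 32 on a 640 pixels×640 pixels input, with the added 160 pixels×160 pixels head sitting at the stride-4 (P2) level of the pyramid; the snippet below checks this correspondence (the stride interpretation is inferred from the stated sizes, not stated in the application).

```python
# For a 640x640 input, each detection scale is the input downsampled by a stride;
# the added 160x160 head corresponds to stride 4 (the P2 level of the pyramid).
input_size = 640
for stride in (4, 8, 16, 32):
    side = input_size // stride
    print(f"stride {stride:2d} -> {side} x {side} feature map")
# stride  4 -> 160 x 160   (added large-size head)
# stride  8 -> 80 x 80
# stride 16 -> 40 x 40
# stride 32 -> 20 x 20
```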
According to some embodiments of the application, in the step S3, the efficient channel attention module uses the ECA attention mechanism to adaptively select the one-dimensional convolution kernel, and the feature M_c(F) extracted by the efficient channel attention module that is more useful for the task is calculated as shown in formula (1):

M_c(F) = σ(C_1(G(F)))   (1)

wherein G represents global average pooling, C_1 represents a one-dimensional convolution, σ represents the sigmoid function, F represents the input preprocessed image, and F ∈ R^{C×H×W}.
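A PyTorch sketch of formula (1); the adaptive rule for choosing the one-dimensional kernel size (with hyperparameters γ=2 and b=1) follows the standard ECA design and, like the module and parameter names, is an assumption rather than a detail given in the application.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention (formula (1)): global average pooling G,
    a one-dimensional convolution C1 with adaptively chosen kernel size,
    and a sigmoid, producing per-channel weights M_c(F)."""
    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1                       # adaptive odd kernel size
        self.pool = nn.AdaptiveAvgPool2d(1)             # G
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)  # C1
        self.sigmoid = nn.Sigmoid()                     # sigma

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> M_c(F): (B, C, 1, 1)
        y = self.pool(x)                                # (B, C, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(-1, -2))  # 1-D conv across channels
        return self.sigmoid(y.transpose(-1, -2).unsqueeze(-1))
```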
According to some embodiments of the application, in the step S3, the coordination space attention module uses the CA attention mechanism to decompose spatial attention into two one-dimensional feature encoding processes that aggregate features along the two spatial directions, capturing long-range dependencies along one spatial direction while preserving accurate position information along the other. Given an input x, each channel is first encoded along the horizontal and vertical directions using pooling kernels of sizes (H, 1) and (1, W), respectively. The encoding z_c^h(h) of the c-th channel at height h is calculated as shown in formula (2), and the encoding z_c^w(w) of the c-th channel at width w as shown in formula (3):

z_c^h(h) = (1/W) Σ_{0 ≤ i < W} x_c(h, i)   (2)

z_c^w(w) = (1/H) Σ_{0 ≤ j < H} x_c(j, w)   (3)
After the decomposition operation, an intermediate feature map f encoding the spatial information is obtained, as shown in formula (4):

f = δ(C_2([z^h, z^w]))   (4)

wherein δ is a nonlinear activation function, C_2 represents a two-dimensional convolution, [·,·] denotes the concatenation operation, z^h denotes the encoding of the c-th channel at height h, and z^w denotes the encoding of the c-th channel at width w.
After f is split, the attention weights g^h and g^w are obtained through convolution transformations, and the target position information feature M_s(F) extracted by the coordination space attention module is calculated as shown in formula (5):

M_s(F) = g^h ⊗ g^w   (5)
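A PyTorch sketch of formulas (2)–(5), assuming a standard coordinate attention layout; the choice of ReLU for the nonlinear activation δ and the channel-reduction ratio in the shared encoding are assumptions, as are the names.

```python
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    """Coordination space attention (formulas (2)-(5)): pool along H and W
    separately, encode the concatenation jointly, split it back, and produce
    the directional attention weights g^h and g^w whose product is M_s(F)."""
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # (H, 1) pooling, formula (2)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # (1, W) pooling, formula (3)
        self.conv1 = nn.Conv2d(channels, mid, 1)        # C2, shared encoding
        self.act = nn.ReLU(inplace=True)                # delta
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        z_h = self.pool_h(x)                              # (B, C, H, 1)
        z_w = self.pool_w(x).permute(0, 1, 3, 2)          # (B, C, W, 1)
        f = self.act(self.conv1(torch.cat([z_h, z_w], dim=2)))     # formula (4)
        f_h, f_w = torch.split(f, [h, w], dim=2)          # split f
        g_h = torch.sigmoid(self.conv_h(f_h))                      # (B, C, H, 1)
        g_w = torch.sigmoid(self.conv_w(f_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return g_h * g_w                                  # M_s(F), formula (5)
```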
according to some embodiments of the application, a method for detecting the position of a fish target individual in a cultured fish farm by improving the YOLOv8 model is provided, wherein in the step S3, the feature map M (F) is calculated as shown in the formula (6):
wherein F is E R C×H×W ,Representing element-wise multiplication.
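A sketch of the fusion in formula (6) as reconstructed above, reusing the ECA and CoordAttention sketches from the preceding blocks; the (B, C, 1, 1) channel weights and the (B, C, H, W) spatial weights broadcast over the input feature map under element-wise multiplication.

```python
import torch.nn as nn

class ECAM(nn.Module):
    """Fusion of formula (6): M(F) = M_c(F) * M_s(F) * F (element-wise),
    broadcasting channel weights and spatial weights over F.
    Depends on the ECA and CoordAttention sketches defined above."""
    def __init__(self, channels: int):
        super().__init__()
        self.channel_att = ECA(channels)             # M_c: (B, C, 1, 1)
        self.spatial_att = CoordAttention(channels)  # M_s: (B, C, H, W)

    def forward(self, x):
        return self.channel_att(x) * self.spatial_att(x) * x
```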
According to the method for detecting the position of the fish target individual in the cultured fish shoal with the improved YOLOv8 model, in the step S4, the target confidence is obtained by computing the intersection-over-union (IoU) between each candidate prediction frame and the real target frame in the image to be detected; if the IoU between them is greater than a threshold, the target confidence is defined as 1, otherwise as 0. The raw predictions output by the network are converted into a probability distribution with a softmax function to obtain class prediction probabilities, and the target confidence is multiplied by the class prediction probability to obtain the score of the candidate prediction frame.
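A minimal sketch of this scoring rule, assuming each candidate prediction frame has already been matched to a real target frame and its IoU computed; tensor shapes and names are illustrative.

```python
import torch

def candidate_scores(class_logits: torch.Tensor, ious: torch.Tensor,
                     iou_thresh: float = 0.5) -> torch.Tensor:
    """Score = target confidence x class prediction probability.

    class_logits: (N, num_classes) raw network outputs for N candidate frames
    ious: (N,) IoU of each candidate frame with its matched real target frame"""
    confidence = (ious > iou_thresh).float()           # 1 if IoU > threshold, else 0
    class_probs = torch.softmax(class_logits, dim=-1)  # raw outputs -> distribution
    return confidence * class_probs.max(dim=-1).values
```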
According to the method provided by the application for detecting the positions of fish target individuals in a cultured fish shoal with an improved YOLOv8 model, adding the large-size detection head at the bottom layer of the feature pyramid network of the YOLOv8 model allows the detail information of underwater fish individuals to be captured better and richer semantic information to be obtained; the attention mechanism module reduces the interference of blurred backgrounds, focuses on the key features of fish individuals, enhances the recognition of blurred fish shoals, and strengthens the direction-aware and position-sensitive spatial features. This effectively remedies the shortcomings of existing attention mechanisms in fish shoal detection, adapts better to the blurring and occlusion of underwater fish shoals, and improves the accuracy of position detection of fish target individuals in the cultured fish shoal.
Drawings
FIG. 1 is a flow chart of the method for detecting the position of a fish target individual in a cultured fish shoal with an improved YOLOv8 model according to the present application.
Detailed Description
Embodiments of the present application are described in further detail below with reference to the accompanying drawings and examples. The following examples are illustrative of the application but are not intended to limit the scope of the application.
Example 1
A method for detecting the position of a fish target individual in a cultured fish shoal by improving a YOLOv8 model comprises the following steps:
s1, collecting an image in a culture water area as an image to be detected, and preprocessing the image to be detected;
preprocessing includes adjusting the size of the image and normalizing the pixel values;
s2, adding an attention mechanism module at a C2f structure in a neck network of the YOLOv8 model, wherein the attention mechanism module comprises a high-efficiency channel attention module and a coordination space attention module, adding 1 large-size detection head at the bottom layer of a feature pyramid network of the YOLOv8 model, and carrying out feature fusion with the C2f structure at the bottommost layer of the neck network to obtain an improved YOLOv8 model;
the improved YOLOv8 model comprises an input end, a backbone network and a head network, wherein the input end unifies the image into a resolution of 640 pixels by 640 pixels; the backbone network comprises a convolutional neural network for extracting image features, acquires richer gradient flows through a C2f structure and learns more features; the head network comprises a neck and a detection head, and the neck is connected with the backbone network and the detection head and is used for further extracting features and adjusting the resolution of the feature map, so that the prediction capability of the detection head is improved; the detection head comprises a classifier and a regressive device, wherein the classifier is used for predicting the category of the target, and the regressive device is used for predicting the position and the size of the target;
the attention mechanism module is used for helping the model to focus on a specific region of interest in the image, and discarding useless and interference information, so that the feature expression capacity is improved;
The size of the large-size detection head added at the bottom layer of the feature pyramid network of the YOLOv8 model is 160 pixels×160 pixels; the added large-size detection head performs feature fusion with the C2f structure at the bottom layer of the backbone network, and the detection sizes of the improved detection head on the feature map include 160 pixels×160 pixels, 80 pixels×80 pixels, 40 pixels×40 pixels and 20 pixels×20 pixels. The newly added 160 pixels×160 pixels detection size provides higher resolution, captures more detailed information, and detects blurred underwater fish target individuals more effectively;
s3, inputting the preprocessed image in the step S1 into an improved YOLOv8 model, wherein the method specifically comprises the following steps: inputting the preprocessed image in the step S1 into a high-efficiency channel attention module, and learning the correlation among different characteristic channels by using the channel attention to extract the characteristics more useful for the task; inputting the preprocessed image in the step S1 into a coordination space attention module, and extracting the position information characteristics of the target; fusing the more useful features of the task with the position information features of the target to obtain a feature map;
the efficient channel attention module uses the ECA attention mechanism to adaptively select one-dimensional convolution kernels, and features M extracted by the efficient channel attention module and more useful for tasks c (F) As shown in formula (1):
M c (F)=σ(C 1 (G(F))) (1)
wherein G represents global average pooling, C 1 Representing a one-dimensional convolution, σ representing a sigmoid function, F representing the input preprocessed image, and F ε R C×H×W ;
The efficient channel attention module realizes efficient local cross-channel interaction and reduces the complexity of the YOLOv8 model;
the coordination spatial attention module uses CA attention mechanism to decompose the spatial attention into two one-dimensional feature coding processes, respectively aggregate features along two spatial directions, capture remote dependence in one spatial direction while maintaining accurate position information in the other spatial direction, given input x, firstly codes each channel in horizontal and vertical directions respectively using (H, 1) and (1, W) pooling cores of sizes, and the c-th channel codes at HThe calculation of (c) is shown in formula (2), the c-th channel encodes +.>The calculation of (2) is shown in the formula (3):
After the decomposition operation, an intermediate feature map f encoding the spatial information is obtained, as shown in formula (4):

f = δ(C_2([z^h, z^w]))   (4)

wherein δ is a nonlinear activation function, C_2 represents a two-dimensional convolution, [·,·] denotes the concatenation operation, z^h denotes the encoding of the c-th channel at height h, and z^w denotes the encoding of the c-th channel at width w;
after splitting f, obtaining the attention weight g through convolution transformation h And g w Coordinating target position information features M extracted by a spatial attention module s As shown in formula (5):
the calculation of the feature map M (F) is as shown in formula (6):
wherein F is E R C×H×w ,Representing element-wise multiplication.
S4, positioning the target through the center point coordinates of the predicted target and the scale of the boundary frame according to the feature map obtained in the step S3, obtaining a candidate predicted frame on the original image, and calculating the score of the candidate predicted frame;
The target confidence is obtained by computing the intersection-over-union (IoU) between the candidate prediction frame and the real target frame in the image to be detected; if the IoU between them is greater than a threshold, the target confidence is defined as 1, otherwise as 0; the raw predictions output by the network are converted into a probability distribution with a softmax function to obtain class prediction probabilities, and the target confidence is multiplied by the class prediction probability to obtain the score of the candidate prediction frame;
s5, sorting all the candidate frames according to the scores of the candidate prediction frames obtained in the step S4, selecting a part with the highest score in the overlapped candidate prediction frames as a candidate frame, and removing the repeated candidate frames according to a non-maximum suppression algorithm to obtain a prediction frame;
s6, obtaining the position information of the individual fish targets according to the prediction frame obtained in the step S5.
Example 2
The data used in the test were collected from a Takifugu rubripes culture workshop at a farm. To increase the differences among dataset samples, some highly similar data were removed, yielding 600 images with a resolution of 1280 pixels×720 pixels.
The swimming direction of the fish shoal in the culture environment is not fixed, while the camera shoots from a single angle, which does not match the actual situation. Therefore, to increase the diversity of fish-shoal samples, horizontal flipping, vertical flipping, and combined horizontal-vertical flipping were performed on the images in the dataset. After data augmentation the number of images doubled, giving 1200 images in total at a resolution of 1280 pixels×720 pixels. The images were split 7:2:1 into a training set, a validation set and a test set of 840, 240 and 120 images, respectively.
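A sketch of the flip augmentation and the 7:2:1 split, assuming one randomly chosen flip variant per image (which doubles the set from 600 to 1200 images, matching the counts above); directory layout and file naming are placeholders.

```python
import random
from pathlib import Path
import cv2

def augment_fish_images(image_dir: str, out_dir: str, seed: int = 0) -> None:
    """Double the dataset: copy each image and add one flipped variant, chosen
    among horizontal (1), vertical (0) and combined horizontal-vertical (-1) flips."""
    random.seed(seed)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(image_dir).glob("*.jpg")):
        img = cv2.imread(str(path))
        code = random.choice([1, 0, -1])    # cv2.flip codes
        cv2.imwrite(str(out / path.name), img)
        cv2.imwrite(str(out / f"{path.stem}_flip.jpg"), cv2.flip(img, code))

def split_7_2_1(files: list, seed: int = 0):
    """Shuffle and split file paths 7:2:1 into train/val/test
    (1200 images -> 840 / 240 / 120)."""
    random.seed(seed)
    files = sorted(files)
    random.shuffle(files)
    n_train, n_val = int(0.7 * len(files)), int(0.2 * len(files))
    return files[:n_train], files[n_train:n_train + n_val], files[n_train + n_val:]
```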
In fish target detection, labeling of the dataset is a key step. Traditional manual labeling introduces subjective differences in the results because annotators understand and judge targets differently, so the data labeling in this embodiment consists of two parts: semi-automatic labeling and manual fine-tuning. First, a YOLOv8 model is trained on the 1008 fish-shoal detection images labeled in the experiments of Li Haiqing et al., and the 1200 unlabeled images required for this experiment are fed into the trained YOLOv8 model for labeling, with the results stored as txt files. In testing, the trained YOLOv8 model accurately labels about 70% of the fish individuals in the images, but false and missed detections occur for blurred and occluded targets, so the remaining roughly 30% of individuals that cannot be labeled accurately are fine-tuned manually with the LabelImg labeling tool. The dataset labeled in this mixed manner has better universality and higher labeling quality.
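A sketch of the semi-automatic labeling pass, assuming the ultralytics package is used for inference; the weights filename and image directory are placeholders. save_txt=True writes YOLO-format txt label files, which are then refined manually in LabelImg for the hard cases.

```python
# Semi-automatic labeling pass (paths and weights are placeholders):
# save_txt=True writes one YOLO-format .txt label file per image.
from ultralytics import YOLO

labeler = YOLO("yolov8_trained_on_1008_labeled_images.pt")  # hypothetical weights
labeler.predict(source="unlabeled_images/", save_txt=True, conf=0.25)
```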
Test environment: PyTorch 1.10.0, CUDA 11.3, operating system Ubuntu 20.04, Python 3.8, development platform Jupyter, GPU RTX 3080 (10 GB) × 1, CPU Intel(R) Xeon(R) Platinum 8255C, base frequency 2.50 GHz.
Parameter setting: batch_size was set to 8, training ran for 300 epochs, and the initial learning rate was 0.0001.
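A sketch of this training configuration using the ultralytics API, assuming the YOLOv8x base discussed later in this example; "fish.yaml" is a placeholder dataset configuration describing the train/validation/test splits.

```python
# Training configuration sketch using the ultralytics API.
from ultralytics import YOLO

model = YOLO("yolov8x.yaml")  # YOLOv8x base framework, as discussed in Example 2
model.train(
    data="fish.yaml",  # placeholder dataset configuration
    epochs=300,        # 300 rounds
    batch=8,           # batch_size
    lr0=0.0001,        # initial learning rate
    imgsz=640,
)
```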
To verify the effectiveness of the algorithm, a confusion matrix was used to evaluate the method under study. The confusion matrix includes true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). Precision (P), recall (R) and mean average precision (mAP) are obtained from the confusion matrix, with the calculation formulas shown in formulas (7)–(10):
P = TP / (TP + FP)   (7)

R = TP / (TP + FN)   (8)

AP = ∫_0^1 P(R) dR   (9)

mAP = (1/N) Σ_{i=1}^{N} AP_i   (10)

where TP is the number of correctly predicted fish samples, FP is the number of samples incorrectly predicted as fish, FN is the number of fish samples that were not detected, N is the number of categories, and mAP@0.5 denotes the mean average precision with the IoU threshold set to 0.5.
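A minimal sketch of formulas (7)–(10); the trapezoidal integration of the precision-recall curve is one common choice and not necessarily the exact protocol used in these tests.

```python
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)                    # formula (7)

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)                    # formula (8)

def average_precision(recalls, precisions) -> float:
    """Area under the precision-recall curve (formula (9)), integrated with
    the trapezoidal rule over recall sorted in increasing order."""
    pts = sorted(zip(recalls, precisions))
    return sum((r1 - r0) * (p0 + p1) / 2.0
               for (r0, p0), (r1, p1) in zip(pts, pts[1:]))

def mean_average_precision(aps) -> float:
    return sum(aps) / len(aps)               # formula (10): mean over classes
```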
In order to verify the effectiveness of the improved parts of the model, an ablation test was designed with the following protocol: the large-size detection head and the attention mechanism module were each added separately to the YOLOv8 model, the model with the large-size detection head being named YOLOv8+Head and the model with the attention mechanism module being named YOLOv8+ECAM, and both were compared with the method of this embodiment (ECAM-YOLOv8), in which the two modules are added simultaneously. The test results are shown in Table 1.
Table 1 ablation test results
As can be seen from Table 1, both the large-size detection head and the attention mechanism module improve the detection performance of the YOLOv8 model, with the attention mechanism module bringing the larger improvement, which indicates that adding the attention mechanism module has a greater influence on the performance of the YOLOv8 model and helps the model focus on the most relevant parts of the input data, improving the detection effect. ECAM-YOLOv8 performs best: compared with the YOLOv8 model, precision improves by 2.30%, recall by 1.70%, and mean average precision by 1.60%. This is because the large-size detection head provides more context information, which is important for understanding the semantics and position of blurred and occluded individuals, and with the attention mechanism module the improved YOLOv8 model can more easily distinguish background from target, enhancing its processing capability in complex underwater environments.
In the case of turbid water and low visibility, introducing the attention mechanism module helps the model selectively attend to regions of interest when processing images, reducing the time and computation spent on background information. To study the influence of the position at which the attention mechanism module is added on the detection effect, a position comparison test was designed. The attention mechanism module was placed at 3 different locations: after the SPPF of the backbone network (Backbone), at the middle C2f structure of the neck (Neck), and at the bottom C2f structure of the neck (Neck). The test results are shown in Table 2.
TABLE 2 Performance comparison of the attention mechanism module at different addition locations
As shown in Table 2, adding the attention mechanism module to the neck (Neck) yields higher precision and recall than adding it to the backbone network (Backbone), because the backbone mainly extracts low-level image features and cannot capture the global information of fish individuals well. In contrast, the neck (Neck) carries higher-level semantic features, allowing better global modeling of the features. In addition, the attention mechanism module brings a more significant improvement when added after the C2f of the neck, because the attention is then applied during the information fusion process following the C2f, where key information in the features can be selected and amplified.
In order to verify the effect of the method of this embodiment on underwater fish shoal detection, advanced methods in the field were selected for model comparison experiments: KAYOLO, the underwater fish shoal target detection model of Li Haiqing et al. that incorporates prior knowledge into YOLOv5; DCM-ATM-YOLOv5, the underwater fish shoal target detection model of Li Haiqing et al. with an added deformable convolution module; SK-YOLOv5, the model of Zhao Meng et al. that fuses the SKNet attention mechanism with YOLOv5; and ESB-YOLO, the model of Wei Saixue et al. that fuses the channel non-dimension-reduction dual attention mechanism ECBAM with YOLOv5. The test results are shown in Table 3.
TABLE 3 Performance comparison of different models with ECAM-YOLOv8
As shown in Table 3, the ECAM-YOLOv8 of this embodiment gives the best results. Although the compared models alleviate, to a certain extent, the poor detection caused by blurring and occlusion, each has its own shortcomings. KAYOLO uses prior knowledge to enhance the features of the object to be detected, but its detection results depend too heavily on the quality and quantity of that prior knowledge; when the prior knowledge is noisy or scarce, the detection effect suffers considerably. The deformable convolution module adopted by DCM-ATM-YOLOv5 may, for lack of effective learning guidance, focus on unnecessary features, reducing feature extraction capability. SK-YOLOv5, which adopts the SKNet attention mechanism, can ignore local detail information after global-context-based feature fusion, making blurred fish-shoal individuals harder to detect. ESB-YOLO does not consider spatial position interaction in its spatial information encoding, so fish individuals cannot be localized accurately. By adding the large-size detection head and the ECAM attention mechanism, the ECAM-YOLOv8 of this embodiment not only extracts the detail information of blurred individuals but also enhances the position perception of the model, improving the accuracy of underwater fish shoal target detection.
In order to verify the robustness of the algorithm, 2420 images were selected from the fish dataset of the China Agricultural Artificial Intelligence Innovation and Entrepreneurship Competition, split 7:2:1, and a comparison test was performed against KAYOLO, DCM-ATM-YOLOv5, SK-YOLOv5 and ESB-YOLO. The test results are shown in Table 4.
TABLE 4 Performance comparison on the public dataset
As shown in Table 4, the precision, recall and mean average precision of the improved YOLOv8 (ECAM-YOLOv8) of this embodiment are all higher than those of the other models, indicating that the proposed ECAM, together with the improved Head module, has stronger robustness.
Among advanced target detection models, the YOLOv8 model achieves higher detection precision and faster inference by introducing a stronger network structure and optimization algorithms, and it adopts more effective data augmentation strategies, improving the generalization capability of the model across scenes and conditions. The YOLOv8 series includes 5 models, from small to large: YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l and YOLOv8x. Compared with the first 4 models, YOLOv8x adopts a deeper network structure and introduces more convolution layers and residual connections, so it can learn more features and achieves the best detection effect. This study therefore uses the YOLOv8x model as the basic model framework. To better extract the features of blurred underwater targets and maximize detection accuracy, a large-size detection head is added to the YOLOv8 model to extract more detail information from blurred targets. Meanwhile, after feature fusion, the attention mechanism module attends to the position information of the target fish shoal, enabling accurate detection on video streams of underwater farmed fish shoals.
The results of the ablation and comparison tests show that the improved YOLOv8 (ECAM-YOLOv8) of this embodiment performs best in the underwater blurred fish shoal detection task: compared with the YOLOv8 model, the precision, recall and mean average precision of the method of this embodiment improve by 2.30%, 1.70% and 1.60%, respectively. Compared with the current advanced underwater fish shoal target detection models KAYOLO, DCM-ATM-YOLOv5, SK-YOLOv5 and ESB-YOLO, the improved YOLOv8 (ECAM-YOLOv8) also shows stronger performance and more accurate detection results, with mean average precision improved by 0.70%, 1.00%, 2.40% and 2.00%, respectively. In the comparison experiments on the public dataset, the precision, recall and mean average precision of the improved YOLOv8 (ECAM-YOLOv8) are all higher than those of the other models, indicating that the proposed algorithm is robust. The method provided by the application is therefore suitable for complex underwater scenes and provides a new method for detecting farmed fish shoals.
The embodiments of the application have been presented for purposes of illustration and description, and are not intended to be exhaustive or limited to the application in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to best explain the principles of the application and the practical application, and to enable others of ordinary skill in the art to understand the application for various embodiments with various modifications as are suited to the particular use contemplated.
Claims (8)
1. A method for detecting the position of a fish target individual in a cultured fish shoal by improving a YOLOv8 model, characterized by comprising the following steps:
s1, collecting an image in a culture water area as an image to be detected, and preprocessing the image to be detected;
s2, adding an attention mechanism module at a C2f structure in a neck network of the YOLOv8 model, wherein the attention mechanism module comprises a high-efficiency channel attention module and a coordination space attention module, adding 1 large-size detection head at the bottom layer of a feature pyramid network of the YOLOv8 model, and carrying out feature fusion with the C2f structure at the bottommost layer of the neck network to obtain an improved YOLOv8 model;
s3, inputting the preprocessed image in the step S1 into the improved YOLOv8 model obtained in the step S2 to obtain a feature map, wherein the feature map is specifically as follows: inputting the preprocessed image in the step S1 into a high-efficiency channel attention module, and learning the correlation among different characteristic channels by using the channel attention to extract the characteristics more useful for the task; inputting the preprocessed image in the step S1 into a coordination space attention module, and extracting the position information characteristics of the target; fusing the more useful features for the task with the position information features of the target to obtain a feature map;
s4, positioning the target according to the feature map obtained in the step S3 through the center point coordinates of the predicted target and the scale of the boundary frame, obtaining a candidate predicted frame on the original image, and calculating the score of the candidate predicted frame;
s5, sorting all the candidate frames according to the scores of the candidate prediction frames obtained in the step S4, selecting a part with the highest score in the selected prediction frames of the overlapped candidates as a candidate frame, and removing the repeated candidate frames according to a non-maximum suppression algorithm to obtain a prediction frame;
s6, obtaining the position information of the individual fish targets according to the prediction frame obtained in the step S5.
2. The method for detecting the position of a fish target individual in a cultured fish shoal with an improved YOLOv8 model according to claim 1, wherein in the step S1, the preprocessing includes resizing the image and normalizing the pixel values.
3. The method for detecting the position of a fish target individual in a cultured fish shoal with an improved YOLOv8 model according to claim 1, wherein in the step S2, the improved YOLOv8 model includes an input end, a backbone network and a head network, and the input end unifies the image into a resolution of 640 pixels×640 pixels; the backbone network comprises a convolutional neural network for extracting image features, and acquires richer gradient flows through the C2f structure and learns more features; the head network comprises a neck and a detection head, the neck connecting the backbone network and the detection head and being used for further extracting features and adjusting the resolution of the feature map, thereby improving the prediction capability of the detection head; the detection head comprises a classifier and a regressor, wherein the classifier is used for predicting the category of the target, and the regressor is used for predicting the position and the size of the target.
4. The method for detecting the position of a fish target individual in a cultured fish shoal with an improved YOLOv8 model according to claim 1, wherein in the step S2, the size of the large-size detection head added at the bottom layer of the feature pyramid network of the YOLOv8 model is 160 pixels×160 pixels, the added large-size detection head performs feature fusion with the C2f structure at the bottom layer of the backbone network, and the detection sizes of the improved detection head on the feature map include 160 pixels×160 pixels, 80 pixels×80 pixels, 40 pixels×40 pixels and 20 pixels×20 pixels.
5. The method for detecting the position of a fish target individual in a cultured fish shoal with an improved YOLOv8 model according to claim 1, wherein in the step S3, the efficient channel attention module uses the ECA attention mechanism to adaptively select the one-dimensional convolution kernel, and the feature M_c(F) extracted by the efficient channel attention module that is more useful for the task is calculated as shown in formula (1):

M_c(F) = σ(C_1(G(F)))   (1)

wherein G represents global average pooling, C_1 represents a one-dimensional convolution, σ represents the sigmoid function, F represents the input preprocessed image, and F ∈ R^{C×H×W}.
6. The method for detecting the position of a fish target individual in a cultured fish shoal with an improved YOLOv8 model according to claim 5, wherein in the step S3, the coordination space attention module uses the CA attention mechanism to decompose spatial attention into two one-dimensional feature encoding processes that aggregate features along the two spatial directions, capturing long-range dependencies along one spatial direction while preserving accurate position information along the other; given an input x, each channel is first encoded along the horizontal and vertical directions using pooling kernels of sizes (H, 1) and (1, W), respectively, the encoding z_c^h(h) of the c-th channel at height h being calculated as shown in formula (2) and the encoding z_c^w(w) of the c-th channel at width w as shown in formula (3):

z_c^h(h) = (1/W) Σ_{0 ≤ i < W} x_c(h, i)   (2)

z_c^w(w) = (1/H) Σ_{0 ≤ j < H} x_c(j, w)   (3)
after the decomposition operation, an intermediate feature map f encoding the spatial information is obtained, as shown in formula (4):

f = δ(C_2([z^h, z^w]))   (4)

wherein δ is a nonlinear activation function, C_2 represents a two-dimensional convolution, [·,·] denotes the concatenation operation, z^h denotes the encoding of the c-th channel at height h, and z^w denotes the encoding of the c-th channel at width w;
after f is split, the attention weights g^h and g^w are obtained through convolution transformations, and the target position information feature M_s(F) extracted by the coordination space attention module is calculated as shown in formula (5):

M_s(F) = g^h ⊗ g^w   (5)
7. The method for detecting the position of a fish target individual in a cultured fish shoal with an improved YOLOv8 model according to claim 6, wherein in the step S3, the feature map M(F) is calculated as shown in formula (6):

M(F) = M_c(F) ⊗ M_s(F) ⊗ F   (6)

wherein F ∈ R^{C×H×W}, and ⊗ represents element-wise multiplication.
8. The method for detecting the position of a fish target individual in a cultured fish shoal with an improved YOLOv8 model according to claim 7, wherein in the step S4, the target confidence is obtained by computing the intersection-over-union (IoU) between each candidate prediction frame and the real target frame in the image to be detected; if the IoU between them is greater than a threshold, the target confidence is defined as 1, otherwise as 0; and the raw predictions output by the network are converted into a probability distribution with a softmax function to obtain class prediction probabilities, and the target confidence is multiplied by the class prediction probability to obtain the score of the candidate prediction frame.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310933832.8A CN117058232A (en) | 2023-07-27 | 2023-07-27 | Position detection method for fish target individuals in cultured fish shoal by improving YOLOv8 model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310933832.8A CN117058232A (en) | 2023-07-27 | 2023-07-27 | Position detection method for fish target individuals in cultured fish shoal by improving YOLOv8 model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117058232A true CN117058232A (en) | 2023-11-14 |
Family
ID=88656429
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310933832.8A Pending CN117058232A (en) | 2023-07-27 | 2023-07-27 | Position detection method for fish target individuals in cultured fish shoal by improving YOLOv8 model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117058232A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117765421A (en) * | 2024-02-22 | 2024-03-26 | 交通运输部天津水运工程科学研究所 | coastline garbage identification method and system based on deep learning |
CN118279935A (en) * | 2024-06-03 | 2024-07-02 | 广东海洋大学 | High-body Seriola detection method based on YOLOv network structure |
CN118397074A (en) * | 2024-05-29 | 2024-07-26 | 中国海洋大学三亚海洋研究院 | Fish target length detection method based on binocular vision |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115690675A (en) * | 2022-10-12 | 2023-02-03 | 大连海洋大学 | ESB-YOLO model cultured fish shoal detection method based on channel non-dimensionality reduction attention mechanism and improved YOLOv5 |
CN116189012A (en) * | 2022-11-22 | 2023-05-30 | 重庆邮电大学 | Unmanned aerial vehicle ground small target detection method based on improved YOLOX |
-
2023
- 2023-07-27 CN CN202310933832.8A patent/CN117058232A/en active Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115690675A (en) * | 2022-10-12 | 2023-02-03 | 大连海洋大学 | ESB-YOLO model cultured fish shoal detection method based on channel non-dimensionality reduction attention mechanism and improved YOLOv5 |
CN116189012A (en) * | 2022-11-22 | 2023-05-30 | 重庆邮电大学 | Unmanned aerial vehicle ground small target detection method based on improved YOLOX |
Non-Patent Citations (2)
Title |
---|
- Du Baoxia, "Apple detection method based on improved YOLOv8", Wireless Internet Technology, pages 119-122 *
- Jiang Yajun, "Paper cup defect detection method based on improved YOLOv5s model", Packaging Engineering, vol. 44, no. 11, pages 249-258 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117765421A (en) * | 2024-02-22 | 2024-03-26 | 交通运输部天津水运工程科学研究所 | coastline garbage identification method and system based on deep learning |
CN117765421B (en) * | 2024-02-22 | 2024-04-26 | 交通运输部天津水运工程科学研究所 | Coastline garbage identification method and system based on deep learning |
CN118397074A (en) * | 2024-05-29 | 2024-07-26 | 中国海洋大学三亚海洋研究院 | Fish target length detection method based on binocular vision |
CN118279935A (en) * | 2024-06-03 | 2024-07-02 | 广东海洋大学 | High-body Seriola detection method based on YOLOv network structure |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113065558B (en) | Lightweight small target detection method combined with attention mechanism | |
CN112150493B (en) | Semantic guidance-based screen area detection method in natural scene | |
CN110929593B (en) | Real-time significance pedestrian detection method based on detail discrimination | |
Meng et al. | Single-image dehazing based on two-stream convolutional neural network | |
CN117058232A (en) | Position detection method for fish target individuals in cultured fish shoal by improving YOLOv8 model | |
CN112598713A (en) | Offshore submarine fish detection and tracking statistical method based on deep learning | |
CN114724022B (en) | Method, system and medium for detecting farmed fish shoal by fusing SKNet and YOLOv5 | |
CN114626445B (en) | Dam termite video identification method based on optical flow network and Gaussian background modeling | |
CN118379288B (en) | Embryo prokaryotic target counting method based on fuzzy rejection and multi-focus image fusion | |
CN116912694A (en) | Water surface target detection algorithm based on improved YOLO V5 | |
CN117132914A (en) | Method and system for identifying large model of universal power equipment | |
CN116452966A (en) | Target detection method, device and equipment for underwater image and storage medium | |
CN114926826A (en) | Scene text detection system | |
CN112070181B (en) | Image stream-based cooperative detection method and device and storage medium | |
Niu et al. | Underwater Waste Recognition and Localization Based on Improved YOLOv5. | |
CN116883832A (en) | Surface water boundary refined extraction method | |
Duan et al. | Boosting fish counting in sonar images with global attention and point supervision | |
Zhou et al. | A lightweight object detection framework for underwater imagery with joint image restoration and color transformation | |
Rout et al. | Underwater visual surveillance: A comprehensive survey | |
Hu et al. | Data augmentation vision transformer for fine-grained image classification | |
CN111160255A (en) | Fishing behavior identification method and system based on three-dimensional convolutional network | |
CN118262228B (en) | Fish object segmentation method in underwater robust video based on self-adaptive selection optical flow | |
CN118195926B (en) | Registration-free multi-focus image fusion method based on spatial position offset sensing | |
CN115909225B (en) | OL-YoloV ship detection method based on online learning | |
CN117392505B (en) | Image target detection method and system based on DETR (detail description of the invention) improved algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||