CN116206195A - Offshore culture object detection method, system, storage medium and computer equipment - Google Patents

Info

Publication number
CN116206195A
CN116206195A (application CN202310216314.4A)
Authority
CN
China
Prior art keywords: network, stage, output, feature, sampling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310216314.4A
Other languages
Chinese (zh)
Inventor
赵越
刘艳
徐嘉璐
李庆武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU filed Critical Hohai University HHU
Priority to CN202310216314.4A priority Critical patent/CN116206195A/en
Publication of CN116206195A publication Critical patent/CN116206195A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; scene-specific elements
    • G06V 20/05: Underwater scenes
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V 10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806: Fusion of extracted features
    • G06V 10/82: Arrangements using neural networks
    • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07: Target detection
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A 40/00: Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A 40/80: Adaptation technologies in fisheries management
    • Y02A 40/81: Aquaculture, e.g. of fish

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method, a system, a storage medium and computer equipment for detecting offshore culture targets. A YOLOv5 network is adopted to detect the offshore culture target, with a Transformer network replacing the last C3 network in the backbone module of the YOLOv5 network; the global self-attention mechanism of the Transformer network improves the interpretability of the backbone module and yields more accurate target positioning. A weighted bidirectional feature pyramid network replaces the combination of a feature pyramid network and a path aggregation network in the neck module of the YOLOv5 network, so that feature fusion can be carried out on underwater culture target images more efficiently; cross-scale connections are optimized by suppressing features that contribute little to the fusion process, the fusion of more important features within the same hierarchy is strengthened, false detections are reduced, and the target detection effect is greatly enhanced.

Description

Offshore culture object detection method, system, storage medium and computer equipment
Technical Field
The invention relates to a method, a system, a storage medium and computer equipment for detecting an offshore culture object, and belongs to the technical field of deep learning and computer vision.
Background
In recent years, continued population growth, rising living standards and the over-exploitation of land resources have increased demand for ocean resources, and the marine farming industry has developed rapidly. In the past, the harvesting of cultured products relied mostly on manual capture, which, in a complex underwater environment, is time-consuming, inefficient and suffers from other drawbacks.
With the development of deep network models, automatic fishing using underwater robots has become a reliable choice, and the prerequisite for automatic fishing is detecting the offshore culture target. In practical application, owing to the complex underwater environment, the captured underwater culture target images suffer from stacking, occlusion, background interference and similar problems, and the conventional YOLOv5-based offshore culture target detection method performs poorly.
Disclosure of Invention
The invention provides a method, a system, a storage medium and computer equipment for detecting an offshore culture object, which solve the problems disclosed in the background art.
In order to solve the technical problems, the invention adopts the following technical scheme:
an offshore culture target detection method comprising:
collecting an offshore culture object image;
inputting the offshore culture object image into a pre-trained target detection model to obtain an offshore culture target detection result; the target detection model is a YOLOv5 network, a Transformer network is used to replace the last C3 network in the backbone module of the YOLOv5 network, and a weighted bidirectional feature pyramid network is used to replace the combination of a feature pyramid network and a path aggregation network in the neck module of the YOLOv5 network.
In the weighted bidirectional feature pyramid network, a third-stage up-sampling branch network is additionally arranged between a second-stage up-sampling branch network and a first-stage down-sampling branch network, and the up-sampling multiple of the third-stage up-sampling branch network is larger than that of the first-stage up-sampling branch network and the second-stage up-sampling branch network;
and a small target detection layer is additionally arranged in the prediction module of the YOLOv5 network, the input end of the added small target detection layer being connected to the C3 network of the third-stage up-sampling branch network, and the feature map output by the added small target detection layer being larger than the feature maps output by the other detection layers in the prediction module.
The first-stage up-sampling branch network, the second-stage up-sampling branch network and the third-stage up-sampling branch network are identical in structure and comprise a convolution layer, an up-sampling layer, a normalized feature fusion layer, a C3 network and an attention module which are sequentially connected from input to output;
the first-stage downsampling branch network and the second-stage downsampling branch network have the same structure and comprise a convolution layer, a normalized feature fusion layer, a C3 network and an attention module which are sequentially connected from input to output;
the third-stage downsampling branch network comprises a convolution layer, a normalized feature fusion layer and a C3 network which are sequentially connected from input to output.
The feature map output by the Transformer network serves as the input feature map of the first-stage up-sampling branch network; the input feature map is convolved and up-sampled once in sequence and then normalized and fused with the feature map output by the third C3 network of the backbone network;
the feature map output by the first-stage up-sampling branch network serves as the input feature map of the second-stage up-sampling branch network; the input feature map is convolved and up-sampled once in sequence and then normalized and fused with the feature map output by the second C3 network of the backbone network;
the feature map output by the second-stage up-sampling branch network serves as the input feature map of the third-stage up-sampling branch network; the input feature map is convolved and up-sampled once in sequence and then normalized and fused with the feature map output by the first C3 network of the backbone network;
the feature map output by the third-stage up-sampling branch network serves as the input feature map of the first-stage down-sampling branch network; after convolution, the input feature map is normalized and fused with the feature map output by the convolution layer of the third-stage up-sampling branch network and the feature map output by the second C3 network of the backbone network;
the feature map output by the first-stage down-sampling branch network serves as the input feature map of the second-stage down-sampling branch network; after convolution, the input feature map is normalized and fused with the feature map output by the convolution layer of the second-stage up-sampling branch network and the feature map output by the third C3 network of the backbone network;
the feature map output by the second-stage down-sampling branch network serves as the input feature map of the third-stage down-sampling branch network; after convolution, the input feature map is normalized and fused with the feature map output by the convolution layer of the first-stage up-sampling branch network.
An offshore culture target detection system comprising:
the acquisition module acquires an offshore culture object image;
the detection module inputs the offshore culture target image into a pre-trained target detection model to obtain an offshore culture target detection result; the target detection model is a YOLOv5 network, a Transformer network is used to replace the last C3 network in the backbone module of the YOLOv5 network, and a weighted bidirectional feature pyramid network is used to replace the combination of a feature pyramid network and a path aggregation network in the neck module of the YOLOv5 network.
In the weighted bidirectional feature pyramid network, a third-stage up-sampling branch network is additionally arranged between a second-stage up-sampling branch network and a first-stage down-sampling branch network, and the up-sampling multiple of the third-stage up-sampling branch network is larger than that of the first-stage up-sampling branch network and the second-stage up-sampling branch network;
and a small target detection layer is additionally arranged in the prediction module of the YOLOv5 network, the input end of the added small target detection layer being connected to the C3 network of the third-stage up-sampling branch network, and the feature map output by the added small target detection layer being larger than the feature maps output by the other detection layers in the prediction module.
The first-stage up-sampling branch network, the second-stage up-sampling branch network and the third-stage up-sampling branch network are identical in structure and comprise a convolution layer, an up-sampling layer, a normalized feature fusion layer, a C3 network and an attention module which are sequentially connected from input to output;
the first-stage downsampling branch network and the second-stage downsampling branch network have the same structure and comprise a convolution layer, a normalized feature fusion layer, a C3 network and an attention module which are sequentially connected from input to output;
the third-stage downsampling branch network comprises a convolution layer, a normalized feature fusion layer and a C3 network which are sequentially connected from input to output.
The feature map output by the Transformer network serves as the input feature map of the first-stage up-sampling branch network; the input feature map is convolved and up-sampled once in sequence and then normalized and fused with the feature map output by the third C3 network of the backbone network;
the feature map output by the first-stage up-sampling branch network serves as the input feature map of the second-stage up-sampling branch network; the input feature map is convolved and up-sampled once in sequence and then normalized and fused with the feature map output by the second C3 network of the backbone network;
the feature map output by the second-stage up-sampling branch network serves as the input feature map of the third-stage up-sampling branch network; the input feature map is convolved and up-sampled once in sequence and then normalized and fused with the feature map output by the first C3 network of the backbone network;
the feature map output by the third-stage up-sampling branch network serves as the input feature map of the first-stage down-sampling branch network; after convolution, the input feature map is normalized and fused with the feature map output by the convolution layer of the third-stage up-sampling branch network and the feature map output by the second C3 network of the backbone network;
the feature map output by the first-stage down-sampling branch network serves as the input feature map of the second-stage down-sampling branch network; after convolution, the input feature map is normalized and fused with the feature map output by the convolution layer of the second-stage up-sampling branch network and the feature map output by the third C3 network of the backbone network;
the feature map output by the second-stage down-sampling branch network serves as the input feature map of the third-stage down-sampling branch network; after convolution, the input feature map is normalized and fused with the feature map output by the convolution layer of the first-stage up-sampling branch network.
A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform an offshore culture target detection method.
A computer device comprising one or more processors, and one or more memories, one or more programs stored in the one or more memories and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing an offshore culture target detection method.
The invention has the beneficial effects that: the YOLOv5 network is adopted to detect offshore culture targets; a Transformer network replaces the last C3 network in the backbone module of the YOLOv5 network, and its global self-attention mechanism improves the interpretability of the backbone module and displays the areas the network focuses on more clearly, so that more accurate target positioning is obtained; a weighted bidirectional feature pyramid network replaces the combination of the feature pyramid network and the path aggregation network in the neck module of the YOLOv5 network, so that feature fusion can be performed on underwater culture target images more efficiently; cross-scale connections are optimized by suppressing features that contribute little to the fusion process, the fusion of more important features within the same hierarchy is strengthened, false detections are reduced, and the target detection effect is greatly enhanced.
Drawings
FIG. 1 is a flow chart of a method of offshore culture target detection;
fig. 2 is a schematic structural diagram of the object detection model.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
As shown in fig. 1, the method for detecting the object of the offshore culture comprises the following steps:
and step 1, acquiring an offshore culture object image.
Step 2, inputting the offshore culture object image into a pre-trained target detection model to obtain an offshore culture target detection result; the target detection model is a YOLOv5 network, a Transformer network is used to replace the last C3 network in the backbone module of the YOLOv5 network, and a weighted bidirectional feature pyramid network is used to replace the combination of a feature pyramid network and a path aggregation network in the neck module of the YOLOv5 network.
The method is implemented in an underwater robot. The offshore culture target is detected using a YOLOv5 network in which the last C3 network of the backbone module is replaced by a Transformer network; the global self-attention mechanism of the Transformer network improves the interpretability of the backbone module and displays the key areas the network attends to more clearly, so that more accurate target positioning is obtained. In the neck module of the YOLOv5 network, the combination of a feature pyramid network and a path aggregation network is replaced by a weighted bidirectional feature pyramid network, so that feature fusion can be performed on underwater culture target images more efficiently; cross-scale connections are optimized by suppressing features that contribute little to the fusion process, the fusion of important features in the same hierarchy is strengthened, false detections are reduced, and the target detection effect is greatly enhanced.
As an embodiment of the present invention, the YOLOv5 network is a modified version of the original YOLOv5 network; specifically, version 5.0 of the existing YOLOv5 network can be modified (see fig. 2). Like the original YOLOv5 network, the YOLOv5 network of the present invention also includes a backbone module, a neck module and a prediction module. The backbone module down-samples an input image of scale 640×640 several times and extracts its features, producing several feature maps of different scales; the neck module performs several rounds of up-sampling and feature fusion on the multi-scale feature maps produced by the backbone module and outputs several detection feature maps of different scales; the prediction module predicts the offshore underwater culture targets based on the detection feature maps output by the neck module, obtaining the predicted position information and classification result of each target, where the classification categories include sea urchin, sea cucumber, scallop and starfish.
In order to adapt to the complex underwater environment and enhance the recognition effect, the invention improves all three modules of the original YOLOv5 network, specifically as follows:
the backbone module of the original YOLOv5 network comprises a Focus module, a convolution layer, a C3 network (a first C3 network), a convolution layer, a C3 network (a second C3 network), a convolution layer, a C3 network (a third C3 network), a convolution layer, an SPP module and a C3 network (a last C3 network) which are connected in sequence; according to the invention, the last C3 network is replaced by using the transducer network, the global self-attention mechanism of the transducer network is utilized, the interpretability of the backbone module is improved, and the focused area of the network is displayed more clearly, so that more accurate target positioning is obtained.
The neck module of the original YOLOv5 network is a combination of a feature pyramid network and a path aggregation network, in which the features of the same hierarchy may be calculated multiple times, resulting in an increase in calculation amount, and in which the feature fusion process is performed layer by layer, which may result in loss of low-level feature information. The invention uses the weighted bidirectional feature pyramid network to replace the combination, can more efficiently perform feature fusion on the underwater culture object image, optimizes the cross-scale connection by reducing the features which do not greatly contribute to the feature fusion process, strengthens the fusion of more important features in the same level, and reduces the false detection phenomenon.
As an embodiment of the present invention, the weighted bidirectional feature pyramid network of the present invention is also an improved network, i.e. an improvement on the original weighted bidirectional feature pyramid network. The original network includes two up-sampling branch networks and three down-sampling branch networks, specifically a first-stage up-sampling branch network, a second-stage up-sampling branch network, a first-stage down-sampling branch network, a second-stage down-sampling branch network and a third-stage down-sampling branch network connected in sequence from input to output. Each up-sampling branch network comprises a convolution layer, an up-sampling layer, a normalized feature fusion layer (namely the fast normalized feature fusion layer in the figure) and a C3 network connected in sequence, and each down-sampling branch network comprises a convolution layer, a normalized feature fusion layer and a C3 network connected in sequence. A traditional down-sampling branch network can fuse features of different levels, so that the model can use information from different levels simultaneously for target detection; however, the traditional network structure does not attend to the relations between different positions on the feature map and cannot effectively integrate global information, so feature map information is lost and redundant.
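As an illustration of the normalized feature fusion layer mentioned above, the following is a minimal NumPy sketch of BiFPN-style fast normalized feature fusion; the function name and toy inputs are illustrative, not taken from the patent:

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-resolution feature maps with learnable non-negative weights:
    O = sum_i (w_i / (eps + sum_j w_j)) * I_i  (BiFPN fast normalized fusion)."""
    w = np.maximum(np.asarray(weights, dtype=np.float64), 0.0)  # ReLU keeps weights >= 0
    norm = w / (eps + w.sum())                                  # normalize without softmax
    return sum(n * f for n, f in zip(norm, features))

# two dummy 4x4 single-channel feature maps with equal fusion weights
a = np.ones((4, 4))
b = np.full((4, 4), 3.0)
fused = fast_normalized_fusion([a, b], [1.0, 1.0])  # every entry is close to 2.0
```

In the actual network the weights are learned per input branch, so the fusion layer can down-weight inputs that contribute little, which is the cross-scale optimization the text describes.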
According to the invention, a third-stage up-sampling branch network is added between the second-stage up-sampling branch network and the first-stage down-sampling branch network, and the up-sampling multiple of the third-stage up-sampling branch network is larger than those of the first-stage and second-stage up-sampling branch networks; the up-sampling multiples of the first-stage and second-stage up-sampling branch networks are 2 and 4 respectively, and the up-sampling multiple of the added third-stage up-sampling branch network is 8. Adding the 8-times up-sampling branch network raises the resolution of the feature map, allowing the model to capture the detailed information of targets more finely and thereby improving its detection of small and dense targets.
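The effect of the three up-sampling multiples can be checked with a small sketch (assuming the standard 640×640 input and a 32-times down-sampled deepest backbone map, i.e. 20×20; nearest-neighbour up-sampling stands in for the network's up-sampling layers):

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour up-sampling of an (H, W) map by an integer factor."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

deep = np.zeros((20, 20))       # deepest feature map for a 640x640 input (32x stride)
p1 = upsample_nearest(deep, 2)  # first-stage branch scale:  40x40
p2 = upsample_nearest(deep, 4)  # second-stage branch scale: 80x80
p3 = upsample_nearest(deep, 8)  # added third-stage scale: 160x160,
                                # matching the added small target detection layer
```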
In order to strengthen the fusion of the target characteristics of the offshore culture in the characteristic fusion process, the invention adds attention modules in the first-stage up-sampling branch network, the second-stage up-sampling branch network, the third-stage up-sampling branch network, the first-stage down-sampling branch network and the second-stage down-sampling branch network.
The structure of each branch network may therefore be:
the first-stage up-sampling branch network, the second-stage up-sampling branch network and the third-stage up-sampling branch network are identical in structure and comprise a convolution layer, an up-sampling layer, a normalized feature fusion layer, a C3 network and an attention module which are sequentially connected from input to output.
The feature map output by the Transformer network serves as the input feature map of the first-stage up-sampling branch network; the input feature map is convolved and up-sampled once in sequence and then normalized and fused with the feature map output by the third C3 network of the backbone network. The feature map output by the first-stage up-sampling branch network serves as the input feature map of the second-stage up-sampling branch network; the input feature map is convolved and up-sampled once in sequence and then normalized and fused with the feature map output by the second C3 network of the backbone network. The feature map output by the second-stage up-sampling branch network serves as the input feature map of the third-stage up-sampling branch network; the input feature map is convolved and up-sampled once in sequence and then normalized and fused with the feature map output by the first C3 network of the backbone network.
The first-stage downsampling branch network and the second-stage downsampling branch network have the same structure and comprise a convolution layer, a normalized feature fusion layer, a C3 network and an attention module which are sequentially connected from input to output; the third-stage downsampling branch network comprises a convolution layer, a normalized feature fusion layer and a C3 network which are sequentially connected from input to output.
The feature map output by the third-stage up-sampling branch network serves as the input feature map of the first-stage down-sampling branch network; after convolution, the input feature map is normalized and fused with the feature map output by the convolution layer of the third-stage up-sampling branch network and the feature map output by the second C3 network of the backbone network. The feature map output by the first-stage down-sampling branch network serves as the input feature map of the second-stage down-sampling branch network; after convolution, the input feature map is normalized and fused with the feature map output by the convolution layer of the second-stage up-sampling branch network and the feature map output by the third C3 network of the backbone network. The feature map output by the second-stage down-sampling branch network serves as the input feature map of the third-stage down-sampling branch network; after convolution, the input feature map is normalized and fused with the feature map output by the convolution layer of the first-stage up-sampling branch network.
As one embodiment of the invention, the attention module adopts the CA (Coordinate Attention) mechanism to strengthen the fusion of offshore culture target features during neck-module feature fusion. The CA attention mechanism is a lightweight module that can be embedded directly into the network; it embeds the position information of the target into the channel attention, giving the module strong generalization ability. It benefits the detection of offshore culture targets and achieves a systematic improvement with little computational burden.
The CA attention mechanism decomposes channel attention into two 1D feature-encoding processes along the horizontal and vertical directions, aggregating features along the two spatial directions respectively. The outputs can be expressed as:
z_c^h(h) = (1/W) · Σ_{0≤i<W} x_c(h, i)

z_c^w(w) = (1/H) · Σ_{0≤j<H} x_c(j, w)

where H is the height of the whole feature map, W is the width of the whole feature map, c is the channel index of the feature map, h and w respectively index the row of height h and the column of width w, z_c^h(h) is the output of the c-th channel at height h, z_c^w(w) is the output of the c-th channel at width w, and x_c(h, i) and x_c(j, w) are the values of the c-th channel of the input feature map at spatial coordinates (h, i) and (j, w) respectively.
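The two directional pooling operations above can be sketched in NumPy as follows (a hypothetical helper operating on a (C, H, W) array, not code from the patent):

```python
import numpy as np

def ca_directional_pooling(x):
    """Coordinate Attention 1D encoding: average-pool a (C, H, W) map
    separately along the width (giving z^h) and along the height (giving z^w).

    z_h[c, h] = (1/W) * sum_i x[c, h, i]
    z_w[c, w] = (1/H) * sum_j x[c, j, w]
    """
    z_h = x.mean(axis=2)  # shape (C, H): aggregation along the width direction
    z_w = x.mean(axis=1)  # shape (C, W): aggregation along the height direction
    return z_h, z_w

x = np.arange(2 * 4 * 3, dtype=np.float64).reshape(2, 4, 3)  # toy (C=2, H=4, W=3) map
z_h, z_w = ca_directional_pooling(x)
```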
The feature maps aggregated over the global receptive field in the two directions (width and height) are concatenated, and a 1×1 convolution module compresses the channels to C/r, where C is the number of channels of the feature map and r is a hyperparameter controlling the channel compression ratio. After batch normalization and a nonlinear activation, an intermediate feature map f of shape 1×(W+H)×(C/r) is obtained; since an attention weight is generated for each channel and the weighting covers every position along both the width and height directions, the spatial dimension of f is W+H. The formula for f can be expressed as:

f = δ(F_1([z^h, z^w]))

where δ is the ReLU nonlinear activation function, F_1 is the 1×1 convolution, z^h is the feature map obtained by aggregation along the height direction, and z^w is the feature map obtained by aggregation along the width direction.
The characteristic diagram F is convolved with convolution kernel of 1×1 according to the original height and width to obtain the characteristic diagram F with the same channel number as the original one h And F w Feature map F h The value of each position in (a) represents the attention weight of the position in the vertical direction, i.e. the importance of the feature of the position to the feature fusion in the vertical direction, and F w The value of each position in the list represents the attention weight of the position in the horizontal direction, i.e. the importance of the feature of the position to the feature fusion in the horizontal direction.
Feature map F h And F w After the Sigmoid activation function, the attention weight of the feature map in the height direction and the attention weight of the feature map in the width direction are obtained respectively, and the attention weight can be expressed as follows:
g^h = σ(F^h(f^h))
g^w = σ(F^w(f^w))
wherein g^h is the attention weight of the feature map in the height direction, g^w is the attention weight of the feature map in the width direction, σ is the Sigmoid activation function, and f^h and f^w are the tensors obtained by decomposing the feature map f along the height direction and the width direction, respectively.
The feature map carrying attention weights in both the width and height directions is finally obtained through multiplicative weighting of the original feature map, which can be expressed by the formula:
y_c(i, j) = x_c(i, j) × g^h_c(i) × g^w_c(j)

wherein y_c(i, j) is the output feature map carrying attention weights in both the width and height directions, x_c(i, j) is the value of the c-th channel of the input feature map at position (i, j), g^h_c(i) is the attention weight in the height direction, whose value, ranging from 0 to 1, represents the degree of contribution of the value at position (i, j) on the c-th channel to the other positions in the height direction, and g^w_c(j) is the attention weight in the width direction, whose value, ranging from 0 to 1, represents the degree of contribution of the value at position (i, j) on the c-th channel to the other positions in the width direction.
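The directional pooling and the final weighting y_c(i, j) = x_c(i, j) · g^h_c(i) · g^w_c(j) described above can be sketched in NumPy as follows. This is a simplified illustration: the 1×1 convolutions, batch normalization and ReLU between pooling and weighting are omitted, and the pre-activation weight tensors are taken as given.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def directional_pool(x):
    """Directional average pooling producing z^h and z^w.
    x : feature map of shape (C, H, W)."""
    zh = x.mean(axis=2)   # pool along width  -> shape (C, H)
    zw = x.mean(axis=1)   # pool along height -> shape (C, W)
    return zh, zw

def coordinate_attention(x, wh, ww):
    """Apply the final weighting y_c(i,j) = x_c(i,j) * g^h_c(i) * g^w_c(j).

    x  : input feature map, shape (C, H, W)
    wh : pre-activation height weights F^h(f^h), shape (C, H)
    ww : pre-activation width weights  F^w(f^w), shape (C, W)
    """
    gh = sigmoid(wh)      # g^h, each value in (0, 1)
    gw = sigmoid(ww)      # g^w, each value in (0, 1)
    # broadcasting (C,H,1) * (C,1,W) weights every position (i, j)
    return x * gh[:, :, None] * gw[:, None, :]
```

With zero pre-activation weights both Sigmoids give 0.5, so every position of the output is the input scaled by 0.25, which makes the broadcasting easy to verify.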
To enhance the detection of small targets and solve the problem of missed detection of small underwater culture targets, a small-target detection layer is added to the prediction module of the YOLOv5 network. The input end of the added small-target detection layer is connected to the C3 network of the third-stage up-sampling branch network, and the feature map output by the added layer is larger than the feature maps output by the other detection layers in the prediction module.
In fig. 2, the output feature map of the added small-target detection layer has size 160×160. To match this layer, a group of anchor boxes (anchors) of small target size must also be added, and a K-means adaptive algorithm is used to obtain anchor boxes that fit the small-target size characteristics of offshore culture. Since the added 160×160 feature layer divides the image into a finer grid, anchors of the corresponding small scale are added, bringing the total to 12 groups of anchors for the 4 detection scales.
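The K-means anchor clustering step can be sketched as below. This is a hypothetical simplification that clusters the (width, height) pairs of labelled boxes with plain Euclidean k-means; YOLO-style pipelines often use a 1 − IoU distance instead, and the patent does not specify its exact variant.

```python
import numpy as np

def kmeans_anchors(wh, k=12, iters=50, seed=0):
    """Cluster ground-truth box sizes into k anchor boxes.

    wh : array of shape (N, 2) holding (width, height) of labelled boxes.
    Returns the k anchors sorted by area (small anchors first).
    """
    wh = np.asarray(wh, dtype=float)
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        # assign each box to its nearest center (Euclidean distance)
        d = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of the boxes assigned to it
        for j in range(k):
            if np.any(labels == j):
                centers[j] = wh[labels == j].mean(axis=0)
    return centers[np.argsort(centers.prod(axis=1))]
```

Running it with k=12 on the training-set boxes would yield the 12 anchor groups; the first (smallest) group would serve the added 160×160 detection layer.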
After the target detection model is built, it must be trained on a data set. The URPC2019 and URPC2020 data sets can be used, divided in a 6:2:2 ratio: 60% of the offshore culture detection data is randomly selected as the training set, 20% as the validation set and 20% as the test set. Four target categories are defined for the culture objects, namely sea urchin, sea cucumber, scallop and starfish, represented by the labels echinus, holothurian, scallop and starfish respectively.
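The 6:2:2 division can be sketched as a simple random split; this is illustrative only, with `samples` standing in for the list of image/label pairs.

```python
import random

def split_dataset(samples, ratios=(0.6, 0.2, 0.2), seed=0):
    """Randomly split samples into train/val/test sets in the given ratio."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    items = list(samples)
    random.Random(seed).shuffle(items)      # reproducible shuffle
    n_train = int(len(items) * ratios[0])
    n_val = int(len(items) * ratios[1])
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```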
The training, validation and test sets also need to be preprocessed. The Mosaic algorithm can be used for data enhancement of the offshore culture data: it splices 4 pictures together through random scaling, random cropping and random arrangement, enriching the backgrounds and small targets of the detected objects. Because the data of four pictures is processed at once when batch normalization statistics are computed, a good effect can be achieved without a large mini-batch size, and the varied target samples give the trained model stronger generalization ability.
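The Mosaic splicing of 4 pictures can be sketched as below. This is a minimal illustration that tiles four images around a random centre point; the random scaling and the remapping of bounding boxes performed by the full algorithm are omitted.

```python
import numpy as np

def mosaic4(images, out_size=640, seed=0):
    """Splice 4 images into one canvas around a random centre point.

    images : list of 4 arrays of shape (H, W, 3) with H, W >= out_size
    """
    rng = np.random.default_rng(seed)
    s = out_size
    canvas = np.full((s, s, 3), 114, dtype=np.uint8)  # grey fill
    # pick a random centre in the middle half of the canvas
    cx = int(rng.integers(s // 4, 3 * s // 4))
    cy = int(rng.integers(s // 4, 3 * s // 4))
    regions = [(0, cx, 0, cy), (cx, s, 0, cy),   # top-left, top-right
               (0, cx, cy, s), (cx, s, cy, s)]   # bottom-left, bottom-right
    for img, (x0, x1, y0, y1) in zip(images, regions):
        h, w = y1 - y0, x1 - x0
        # crop each image to its region size (random crop omitted)
        canvas[y0:y1, x0:x1] = img[:h, :w]
    return canvas
```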
The specific training process may be as follows:
a1 Setting training parameters, namely setting the size of a training Batch to be Batch-size=32, setting the Momentum momentum=0.937, setting the learning rate to be Ir=0.01, and setting the training iteration times epoch=100;
a2 The preprocessed training set and the preprocessed verification set are sent into the target detection model, in the process, namely when the target detection model reads in the training set and the preprocessed verification set, whether the images contain targets or not is determined according to the read-in tag data, and the images which do not contain the culture targets in the training set and the preprocessed verification set are automatically ignored, so that the training of the model is prevented from being interfered;
a3 The detection accuracy can reflect the detection accuracy of the model, and the loss plays an important role in the training process, and can reflect the deviation between the true value and the predicted value. The smaller the loss, the closer the predicted value is to the true value, and the better the performance of the model is; therefore, according to the average precision change and loss change trend of the cross verification of the training set and the verification set, the learning rate and the iteration times are adjusted until the precision change and the loss change gradually tend to a stable state, and the final learning rate and the iteration times are determined;
a4 And (3) according to the learning rate and the iteration times determined in the A3), training the target detection model is completed, and the target detection model with good convergence is obtained.
The specific test evaluation procedure may be as follows:
b1 Inputting the preprocessed test set into a trained target detection model, and testing and evaluating the performance of the target detection model;
b2 Judging whether the average detection precision and detection speed of the evaluation result of the target detection model meet the actual application requirements, if the target detection model meets the actual application requirements, executing the step 6.4, otherwise, executing the step B3); specifically, according to the final experimental result of the model, the average detection precision reaches 85.18%, and the detection speed is 113.6FPS (number of frames of picture detected per second); the requirements of high detection precision and real-time detection in practical application are met;
b3 Correcting the width and depth of the target detection model, and jumping to A3) retraining.
The target detection model that meets the practical application requirements in step B2) is then used to detect offshore culture objects in images or video: collected offshore culture object images are input, and the detection result of the offshore culture objects is output.
Based on the same technical scheme, the invention also discloses a software system corresponding to the method. The system can be installed in an underwater robot, and a specific offshore culture object detection system can comprise:
and the acquisition module acquires an offshore culture object image.
The detection module inputs the offshore culture object image into a pre-trained target detection model to obtain an offshore culture object detection result; the target detection model is a YOLOv5 network, a Transformer network is used to replace the last C3 network in a backbone module of the YOLOv5 network, and a weighted bidirectional feature pyramid network is used to replace the combination of a feature pyramid network and a path aggregation network in a neck module of the YOLOv5 network.
In the weighted bidirectional feature pyramid network, a third-stage upsampling branch network is additionally arranged between the second-stage upsampling branch network and the first-stage downsampling branch network, and the upsampling multiple of the third-stage upsampling branch network is larger than that of the first-stage upsampling branch network and the second-stage upsampling branch network.
In order to adapt to the third-stage up-sampling branch network, a small target detection layer is additionally arranged in a prediction module of the YOLOv5 network, the input end of the additionally arranged small target detection layer is connected with a C3 network of the third-stage up-sampling branch network, and the characteristic diagram output by the additionally arranged small target detection layer is larger than the characteristic diagrams output by other small target detection layers in the prediction module.
The first-stage up-sampling branch network, the second-stage up-sampling branch network and the third-stage up-sampling branch network are identical in structure and comprise a convolution layer, an up-sampling layer, a normalized feature fusion layer, a C3 network and an attention module which are sequentially connected from input to output. The first-stage downsampling branch network and the second-stage downsampling branch network have the same structure and comprise a convolution layer, a normalized feature fusion layer, a C3 network and an attention module which are sequentially connected from input to output; the third-stage downsampling branch network comprises a convolution layer, a normalized feature fusion layer and a C3 network which are sequentially connected from input to output.
The feature map output by the Transformer network is used as the feature map input to the first-stage up-sampling branch network; after convolution and one upsampling in sequence, the input feature map is normalized and fused with the feature map output by the third C3 network of the backbone network. The feature map output by the first-stage up-sampling branch network is used as the feature map input to the second-stage up-sampling branch network; after convolution and one upsampling in sequence, the input feature map is normalized and fused with the feature map output by the second C3 network of the backbone network. The feature map output by the second-stage up-sampling branch network is used as the feature map input to the third-stage up-sampling branch network; after convolution and one upsampling in sequence, the input feature map is normalized and fused with the feature map output by the first C3 network of the backbone network. The feature map output by the third-stage up-sampling branch network is used as the feature map input to the first-stage down-sampling branch network; after convolution, the input feature map is normalized and fused with the feature map output by the convolution layer of the third-stage up-sampling branch network and the feature map output by the second C3 network of the backbone network. The feature map output by the first-stage down-sampling branch network is used as the feature map input to the second-stage down-sampling branch network; after convolution, the input feature map is normalized and fused with the feature map output by the convolution layer of the second-stage up-sampling branch network and the feature map output by the third C3 network of the backbone network. The feature map output by the second-stage down-sampling branch network is used as the feature map input to the third-stage down-sampling branch network; after convolution, the input feature map is normalized and fused with the feature map output by the convolution layer of the first-stage up-sampling branch network.
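The normalized feature fusion performed at each node of the weighted bidirectional feature pyramid can be sketched as below, following the fast normalized fusion rule out = Σᵢ wᵢ·Fᵢ / (Σⱼ wⱼ + ε) commonly used in BiFPN. The exact fusion formula is an assumption here, as the text only names the normalized feature fusion layer.

```python
import numpy as np

def normalized_fusion(features, weights, eps=1e-4):
    """Fast normalized feature fusion across pyramid inputs.

    features : list of feature maps with identical shape
    weights  : one learnable scalar per input (trained elsewhere)
    """
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # ReLU keeps w_i >= 0
    w = w / (w.sum() + eps)                                # normalize to ~1
    return sum(wi * f for wi, f in zip(w, features))
```

Because the weights are normalized, the fused map stays on the same scale as its inputs regardless of how many branches feed a node, which is what lets the same layer fuse two inputs in the up-sampling branches and three in the down-sampling branches.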
Based on the same technical solution, the present invention also discloses a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform an offshore culture target detection method.
Based on the same technical solution, the invention also discloses a computer device comprising one or more processors, and one or more memories, one or more programs stored in the one or more memories and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the offshore culture object detection method.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is illustrative of the present invention and is not to be construed as limiting it; all modifications, equivalents and improvements that fall within the spirit and principle of the present invention are intended to be included within the scope of the invention as defined by the appended claims.

Claims (10)

1. An offshore culture target detection method, comprising:
collecting an offshore culture object image;
inputting the offshore culture object image into a pre-trained target detection model to obtain an offshore culture object detection result; the target detection model is a YOLOv5 network, a Transformer network is used for replacing the last C3 network in a backbone module of the YOLOv5 network, and a weighted bidirectional feature pyramid network is used for replacing a combination of a feature pyramid network and a path aggregation network in a neck module of the YOLOv5 network.
2. The offshore culture target detection method of claim 1, wherein a third level up-sampling branch network is added between the second level up-sampling branch network and the first level down-sampling branch network in the weighted bidirectional feature pyramid network, and the up-sampling multiple of the third level up-sampling branch network is larger than the up-sampling multiple of the first level up-sampling branch network and the second level up-sampling branch network;
and a small target detection layer is additionally arranged in the prediction module of the YOLOv5 network, the input end of the additionally arranged small target detection layer is connected with the C3 network of the third-stage up-sampling branch network, and the characteristic diagram output by the additionally arranged small target detection layer is larger than the characteristic diagrams output by other small target detection layers in the prediction module.
3. The offshore culture target detection method of claim 2, wherein the first-stage upsampling branch network, the second-stage upsampling branch network, and the third-stage upsampling branch network are identical in structure and each comprise a convolution layer, an upsampling layer, a normalized feature fusion layer, a C3 network, and an attention module which are sequentially connected from input to output;
the first-stage downsampling branch network and the second-stage downsampling branch network have the same structure and comprise a convolution layer, a normalized feature fusion layer, a C3 network and an attention module which are sequentially connected from input to output;
the third-stage downsampling branch network comprises a convolution layer, a normalized feature fusion layer and a C3 network which are sequentially connected from input to output.
4. The offshore culture target detection method according to claim 3, wherein the feature map output by the Transformer network is used as a feature map input by the first-stage up-sampling branch network, and the input feature map is normalized and fused with a feature map output by a third C3 network of the main network after convolution and one-time up-sampling in sequence;
the feature image output by the first-stage up-sampling branch network is used as the feature image input by the second-stage up-sampling branch network, and the input feature image is subjected to normalization fusion with the feature image output by the second C3 network of the main network after convolution and one-time up-sampling in sequence;
the feature image output by the second-stage up-sampling branch network is used as the feature image input by the third-stage up-sampling branch network, and the input feature image is subjected to normalization fusion with the feature image output by the first C3 network of the main network after convolution and one-time up-sampling in sequence;
the characteristic diagram output by the third-stage up-sampling branch network is used as the characteristic diagram input by the first-stage down-sampling branch network, and after convolution, the input characteristic diagram is normalized and fused with the characteristic diagram output by the convolution layer of the third-stage up-sampling branch network and the characteristic diagram output by the second C3 network of the main network;
the feature map output by the first-stage downsampling branch network is used as the feature map input by the second-stage downsampling branch network, and after convolution, the input feature map is normalized and fused with the feature map output by the convolution layer of the second-stage upsampling branch network and the feature map output by the third C3 network of the main network;
the feature map output by the second-stage downsampling branch network is used as the feature map input by the third-stage downsampling branch network, and the input feature map is subjected to convolution and then normalized fusion with the feature map output by the convolution layer of the first-stage upsampling branch network.
5. An offshore culture target detection system, comprising:
the acquisition module acquires an offshore culture object image;
the detection module inputs the offshore culture object image into a pre-trained target detection model to obtain an offshore culture object detection result; the target detection model is a YOLOv5 network, a Transformer network is used for replacing the last C3 network in a backbone module of the YOLOv5 network, and a weighted bidirectional feature pyramid network is used for replacing a combination of a feature pyramid network and a path aggregation network in a neck module of the YOLOv5 network.
6. The offshore culture target detection system of claim 5, wherein a third level up-sampling branch network is added between the second level up-sampling branch network and the first level down-sampling branch network in the weighted bi-directional feature pyramid network, and an up-sampling multiple of the third level up-sampling branch network is greater than up-sampling multiple of the first level up-sampling branch network and the second level up-sampling branch network;
and a small target detection layer is additionally arranged in the prediction module of the YOLOv5 network, the input end of the additionally arranged small target detection layer is connected with the C3 network of the third-stage up-sampling branch network, and the characteristic diagram output by the additionally arranged small target detection layer is larger than the characteristic diagrams output by other small target detection layers in the prediction module.
7. The offshore culture target detection system of claim 6, wherein the first stage upsampling branch network, the second stage upsampling branch network, and the third stage upsampling branch network are structurally identical and each comprise a convolution layer, an upsampling layer, a normalized feature fusion layer, a C3 network, and an attention module connected in sequence from input to output;
the first-stage downsampling branch network and the second-stage downsampling branch network have the same structure and comprise a convolution layer, a normalized feature fusion layer, a C3 network and an attention module which are sequentially connected from input to output;
the third-stage downsampling branch network comprises a convolution layer, a normalized feature fusion layer and a C3 network which are sequentially connected from input to output.
8. The offshore culture target detection system of claim 7, wherein the feature map output by the Transformer network is used as a feature map input by the first-stage up-sampling branch network, and the input feature map is normalized and fused with a feature map output by a third C3 network of the main network after convolution and one-time up-sampling in sequence;
the feature image output by the first-stage up-sampling branch network is used as the feature image input by the second-stage up-sampling branch network, and the input feature image is subjected to normalization fusion with the feature image output by the second C3 network of the main network after convolution and one-time up-sampling in sequence;
the feature image output by the second-stage up-sampling branch network is used as the feature image input by the third-stage up-sampling branch network, and the input feature image is subjected to normalization fusion with the feature image output by the first C3 network of the main network after convolution and one-time up-sampling in sequence;
the characteristic diagram output by the third-stage up-sampling branch network is used as the characteristic diagram input by the first-stage down-sampling branch network, and after convolution, the input characteristic diagram is normalized and fused with the characteristic diagram output by the convolution layer of the third-stage up-sampling branch network and the characteristic diagram output by the second C3 network of the main network;
the feature map output by the first-stage downsampling branch network is used as the feature map input by the second-stage downsampling branch network, and after convolution, the input feature map is normalized and fused with the feature map output by the convolution layer of the second-stage upsampling branch network and the feature map output by the third C3 network of the main network;
the feature map output by the second-stage downsampling branch network is used as the feature map input by the third-stage downsampling branch network, and the input feature map is subjected to convolution and then normalized fusion with the feature map output by the convolution layer of the first-stage upsampling branch network.
9. A computer readable storage medium, characterized in that the computer readable storage medium stores one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-4.
10. A computer device, comprising:
one or more processors, and one or more memories in which one or more programs are stored and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods of claims 1-4.
CN202310216314.4A 2023-03-08 2023-03-08 Offshore culture object detection method, system, storage medium and computer equipment Pending CN116206195A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310216314.4A CN116206195A (en) 2023-03-08 2023-03-08 Offshore culture object detection method, system, storage medium and computer equipment


Publications (1)

Publication Number Publication Date
CN116206195A true CN116206195A (en) 2023-06-02

Family

ID=86507590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310216314.4A Pending CN116206195A (en) 2023-03-08 2023-03-08 Offshore culture object detection method, system, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN116206195A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118155059A (en) * 2024-05-08 2024-06-07 南方海洋科学与工程广东省实验室(珠海) Underwater target detection method and system based on deep learning embedding


Similar Documents

Publication Publication Date Title
CN111950453B (en) Random shape text recognition method based on selective attention mechanism
CN114565655B (en) Depth estimation method and device based on pyramid segmentation attention
CN113361645B (en) Target detection model construction method and system based on meta learning and knowledge memory
CN112149591A (en) SSD-AEFF automatic bridge detection method and system for SAR image
CN111652864A (en) Casting defect image generation method for generating countermeasure network based on conditional expression
CN115546622A (en) Fish shoal detection method and system, electronic device and storage medium
CN115223009A (en) Small target detection method and device based on improved YOLOv5
CN116206195A (en) Offshore culture object detection method, system, storage medium and computer equipment
CN116468995A (en) Sonar image classification method combining SLIC super-pixel and graph annotation meaning network
CN114596584A (en) Intelligent detection and identification method for marine organisms
CN116912675B (en) Underwater target detection method and system based on feature migration
Shankar et al. Comparing YOLOV3, YOLOV5 & YOLOV7 Architectures for Underwater Marine Creatures Detection
CN113723371A (en) Unmanned ship cleaning route planning method and device, computer equipment and storage medium
CN113610178A (en) Inland ship target detection method and device based on video monitoring image
CN112966815A (en) Target detection method, system and equipment based on impulse neural network
CN116452965A (en) Underwater target detection and recognition method based on acousto-optic fusion
CN115439738A (en) Underwater target detection method based on self-supervision cooperative reconstruction
CN116311093A (en) Anchor-frame-free sea surface ship target detection method and system based on key points
Ge et al. Real-time object detection algorithm for Underwater Robots
CN116091784A (en) Target tracking method, device and storage medium
CN114663683A (en) Underwater target detection method based on spatial feature self-supervision
Li et al. Research on ROI algorithm of ship image based on improved YOLO
CN113673478A (en) Port large-scale equipment detection and identification method based on depth panoramic stitching
Tarekegn et al. Underwater Object Detection using Image Enhancement and Deep Learning Models
Niu et al. Underwater Waste Recognition and Localization Based on Improved YOLOv5.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination