CN113076871B - Fish shoal automatic detection method based on target shielding compensation - Google Patents


Info

Publication number
CN113076871B
Authority
CN
China
Prior art keywords
feature
fish
feature map
image
characteristic diagram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110354428.6A
Other languages
Chinese (zh)
Other versions
CN113076871A (en)
Inventor
丁泉龙
杨伟健
曹燕
王一歌
韦岗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology (SCUT)
Priority to CN202110354428.6A
Publication of CN113076871A
Application granted
Publication of CN113076871B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A - TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00 - Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/80 - Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in fisheries management
    • Y02A40/81 - Aquaculture, e.g. of fish

Abstract

The invention discloses an automatic fish school detection method based on target occlusion compensation, which comprises the following steps: fish school images are collected with a camera carried on a multi-rotor unmanned aerial vehicle, then labeled and expanded; feature extraction is performed with a dual-branch feature extraction network that applies multistage, shallow-to-deep extraction to the input fish school image, obtaining five feature maps; feature fusion is performed with an improved semantic embedding branch, which fuses the semantic information of each deep feature map into the shallow feature map one level above it, and the detail information of the four-fold down-sampled feature map is fused into the eight-fold down-sampled feature map; fish targets are predicted from three feature maps to obtain candidate boxes, the repeated candidate boxes are processed with an improved DIoU_NMS non-maximum suppression algorithm, and the fish school detection result is output. The invention improves the recall rate of fish school detection when fish aggregate and occlude one another, and thereby improves the average precision of fish school detection.

Description

Fish shoal automatic detection method based on target shielding compensation
Technical Field
The invention relates to the technical field of image target detection, and in particular to an automatic fish school detection method based on target occlusion compensation.
Background
Modern fish farming depends on systematic management, and fish school detection is of great practical significance for the industrialization of aquaculture: it can determine whether fish are present and how large they are, and thus help evaluate whether stocking and feeding are appropriate.
Fish school detection can use a sonar imaging method or an optical imaging method. The sonar imaging method exploits the ultrasonic principle: sonar images of the fish school are collected with a sonar system, and fish targets are then detected from the sonar images; in an actual underwater scene, however, the sonar imaging method is easily disturbed by other objects. With the development and improvement of underwater photography, optical imaging methods are now available. With an optical imaging method, an optical image of the fish school is acquired first, and the fish are then detected and marked by a target detection method. Target detection is a branch of image processing whose task is to find all objects of a specified category in a picture and mark their exact positions in the image with rectangular boxes. Manually marking fish schools is expensive and inefficient, so, to promote the automation and informatization of the fish farming industry, it is important to study automatic fish school detection methods aimed at the actual underwater environment of a farm.
With the continuous development of computer technology, using deep learning to automatically detect fish in underwater optical images can reduce the time spent searching for and marking fish, saving the time workers spend on this task and improving working efficiency.
The YOLOv4 target detection algorithm is a deep learning algorithm that balances detection speed and detection accuracy and is widely applied in the field of image target detection. The YOLOv4 algorithm first feeds a data set into the YOLOv4 network for training and saves the trained network model weight file; a test image is then input with the saved weight file, prediction boxes in which targets may exist are generated in the test image, and a confidence score is given for the target in each prediction box. The algorithm performs well in both detection speed and detection accuracy, is suitable for automatic fish school detection, and can produce a detection result quickly once a fish school image has been captured.
However, when fish school image data are actually captured underwater, the underwater scene is complex and the collected fish school images contain mutual occlusion caused by fish aggregation. If the YOLOv4 algorithm is used directly to detect fish targets, occluded targets are detected poorly, missed detections occur, and the recall rate of fish targets is relatively low. An underwater fish detection method with a high recall rate is therefore desirable.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art and provides an automatic fish school detection method based on target occlusion compensation.
The purpose of the invention can be achieved by adopting the following technical scheme:
a fish school automatic detection method based on target shielding compensation comprises the following steps:
s1, collecting a fish school image in a pond environment through a multi-rotor unmanned spacecraft carrying a camera, and marking and data expansion are carried out on the collected fish school image;
the underwater fish shoal image can be acquired by flying the multi-rotor unmanned airship to the sky of an interested water area and landing the unmanned airship to the water surface, and then acquiring optical image data of the cultured fish shoal by using a camera carried on the unmanned airship.
S2, inputting the fish image into a double-branch feature extraction network to perform multistage feature extraction from shallow to deep, wherein the double-branch feature extraction network is a light-weight original information feature extraction network parallel to CSPDarknet53 on the basis of a main feature extraction network CSPDarknet53 of a YOLOv4 algorithm, and is called as the double-branch feature extraction network; after multi-stage feature extraction is carried out by a double-branch feature extraction network, five feature maps are obtained, and the five feature maps are respectively a two-time down-sampling feature map F A1 Fourfold down-sampling feature map F A2 Eight-time down-sampling feature map F A3 Sixteen-fold down-sampling feature map F A4 Thirty-two times downsampling feature map F A5 The resolution is 1/2, 1/4, and the like of the input fish image,1/8、1/16、1/32;
S3, using an improved Semantic Embedding Branch (MSEB) to obtain the characteristic diagram F obtained in the step S2 A5 Fusing semantic information of to the feature map F A4 In (1), obtaining a characteristic diagram F AM4 Feature map F AM4 The resolution of (1/16) of the input fish shoal image; the characteristic diagram F obtained in the step S2 A4 Fusing semantic information of to the feature map F A3 In (1), obtaining a characteristic diagram F AM3 Feature map F AM3 The resolution of (2) is 1/8 of the input fish school image;
s4, carrying out convolution downsampling on the feature map F obtained in the step S2 in a quadruple downsampling mode A2 The detail information of (2) is fused with the eight-fold down-sampling feature map F obtained in the step S3 AM3 In (1), obtaining a characteristic diagram F AMC3 Feature map F AMC3 The resolution of (1/8) of the input fish shoal image;
s5, obtaining the characteristic diagram F obtained in the step S2 A5 And S3, obtaining a characteristic diagram F AM4 And the characteristic diagram F obtained in step S4 AMC3 After feature fusion is carried out on the feature pyramid structure of the YOLOv4 algorithm, three feature graphs are obtained, wherein the three feature graphs are F B3 、F B4 And F B5 Then using the feature map F B3 、F B4 And F B5 Predicting the fish target after convolution processing to obtain repeated candidate frames and corresponding prediction confidence scores;
and S6, processing the repeated candidate frames by adopting a non-maximum suppression algorithm of the improved DIoU _ NMS to obtain a prediction frame result containing the prediction confidence score, and drawing the prediction frame result on a corresponding picture to serve as a fish shoal detection result.
Further, in step S1, the fish targets in each collected fish school image are labeled one by one with the labelImg image annotation software; each labeled image generates an xml label file containing the annotation information, and the collected fish school images and their corresponding label files constitute the original data set; the original data set is then expanded by data augmentation, including vertical flipping, horizontal flipping, brightness change, addition of random Gaussian white noise, filtering and affine transformation, to form the final data set and improve the robustness of the network model to environmental changes.
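The following is a minimal sketch of this data-expansion step; the patent names only the transform types, so the library (OpenCV/NumPy) and the concrete parameters (brightness offset, noise level, rotation angle) are assumptions, and the remapping of the xml bounding-box labels for the geometric transforms is omitted.

```python
import cv2
import numpy as np

def augment(image: np.ndarray) -> list:
    """Return augmented copies of one fish school image (box labels must be remapped separately)."""
    out = []
    out.append(cv2.flip(image, 0))                                   # vertical flip
    out.append(cv2.flip(image, 1))                                   # horizontal flip
    out.append(cv2.convertScaleAbs(image, alpha=1.0, beta=30))       # brightness change (assumed offset)
    noise = np.random.normal(0, 10, image.shape).astype(np.float32)  # random Gaussian white noise
    out.append(np.clip(image.astype(np.float32) + noise, 0, 255).astype(np.uint8))
    out.append(cv2.GaussianBlur(image, (3, 3), 0))                   # filtering
    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), 5, 1.0)              # small affine transform (assumed angle)
    out.append(cv2.warpAffine(image, m, (w, h)))
    return out
```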
Further, in step S2, the fish school image is input into the dual-branch feature extraction network for multistage, shallow-to-deep feature extraction, so as to extract and retain the original features of the input image more fully and to compensate for the lack of fish features when the fish school is occluded. The specific process of feature extraction by the dual-branch feature extraction network is as follows:
the trunk feature extraction network CSPDarknet53 comprises a CBM unit and five cross-phase local network CSPx units; the CBM unit consists of a Convolution layer conversion with the step length of 1 and a Convolution kernel of 3*3, a Batch Normalization layer Batch Normalization and a Mish activation function layer; the CSPx unit is formed by fusing a plurality of CBM units and x Res unit residual error units (Consatenate), each Res unit residual error unit is composed of a CBM unit with a convolution kernel of 1*1, a CBM unit with a convolution kernel of 3*3 and a residual error structure, the two feature maps are spliced on a channel through the Consatenate fusion operation, and the dimension of the feature map obtained after splicing is expanded; the number of channels of the convolutional layer conversion of the five CSPx units is 64, 128, 256, 512 and 1024 in sequence, and each CSPx unit is subjected to twice downsampling; the characteristic graphs obtained by five CSPx units are respectively F C1 、F C2 、F C3 、F C4 、F C5 The resolution is 1/2, 1/4, 1/8, 1/16 and 1/32 of the input fish image respectively;
the lightweight original information feature extraction network comprises five CM units, wherein each CM unit consists of a convolutional layer Convolume with the step length of 2 and a convolutional kernel of 3*3 and a maximum pooling layer MaxPool with the pooling step length of 1 and a pooling kernel of 3*3, each convolutional layer with the step length of 2 is subjected to twice downsampling, and the number of convolutional layer channels of each CM unit is the same as that of corresponding cross-stage local network CSPx units in a main feature extraction network CSPDarknet 53; the characteristic diagram obtained by five CM units is F L1 、F L2 、F L3 、F L4 、F L5 The resolution is 1/2, 1/4, 1/8, 1/16 and 1/32 of the input fish image respectively;
then, in the multistage feature extraction process from shallow to deep, a feature graph F extracted by a lightweight original information feature extraction network is used Li Feature map F extracted from corresponding CSPDarknet53 network Ci Performing Add fusion operation, i =1,2,3,4,5, add fusion operation adding corresponding pixel values of the two feature maps to obtain a final extracted feature map F A1 、F A2 、F A3 、F A4 、F A5 The resolution is 1/2, 1/4, 1/8, 1/16 and 1/32 of the input fish image respectively.
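A minimal PyTorch sketch of one CM unit and the Add fusion of the two branches is given below; the padding values and the random tensor standing in for the CSPx output F_C1 are assumptions made only to keep the example self-contained.

```python
import torch
import torch.nn as nn

class CMUnit(nn.Module):
    """Lightweight original-information unit: stride-2 3x3 convolution followed by a stride-1 3x3 max-pool."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pool(self.conv(x))          # one two-fold down-sampling per CM unit

def add_fuse(f_c: torch.Tensor, f_l: torch.Tensor) -> torch.Tensor:
    """Add fusion: element-wise sum of the backbone map F_Ci and the lightweight map F_Li."""
    return f_c + f_l

# Usage sketch at the first stage (64 channels, 416x416 input):
cm1 = CMUnit(3, 64)
f_l1 = cm1(torch.randn(1, 3, 416, 416))         # F_L1: 1x64x208x208
f_c1 = torch.randn(1, 64, 208, 208)             # stand-in for F_C1 from the first CSPx stage
f_a1 = add_fuse(f_c1, f_l1)                     # F_A1, at 1/2 of the input resolution
```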
Furthermore, in the multistage, shallow-to-deep feature extraction process, the shallow feature maps carry rich detail information but lack semantic information, so targets cannot be recognized and detected well from them; conversely, semantic information is extracted well in the deep layers, but much detail information is lost there, so position information cannot be predicted effectively. Therefore, in step S3, the improved semantic embedding branch is used to fuse the semantic information of a deep feature map into the shallow feature map one level above it, compensating for the lack of semantic information in the shallow feature map and thereby improving the recall rate of fish targets during detection. The fusion with the improved semantic embedding branch proceeds as follows:
first, the deep feature map F_A5 obtained in step S2 is passed through a convolution layer with a 1×1 kernel and a convolution layer with a 3×3 kernel, and the two resulting feature maps of different scales are combined by a Concatenate fusion operation; after a Sigmoid function, two-fold up-sampling is performed by nearest-neighbour interpolation, and the result is multiplied pixel-wise with the shallow feature map F_A4 obtained in step S2 to obtain the feature map F_AM4, whose resolution is 1/16 of the input fish school image; in this way the semantic information of the deep feature map F_A5 is fused into the shallow feature map F_A4, compensating for the lack of semantic information in the shallow feature map F_A4;
then, the semantic information of the deep feature map F_A4 obtained in step S2 is likewise fused, using the improved semantic embedding branch, into the shallow feature map F_A3 to obtain the feature map F_AM3, whose resolution is 1/8 of the input fish school image, compensating for the lack of semantic information in the shallow feature map F_A3;
the Sigmoid functional form used in the improved semantic embedding branch is as follows:
Figure BDA0003003154500000051
where i is the input and e is the natural constant.
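A minimal PyTorch sketch of the improved semantic embedding branch is given below; the output widths of the 1×1 and 3×3 convolutions are assumed to be chosen so that the concatenated map matches the channel count of the shallow feature map it modulates.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSEB(nn.Module):
    """Improved semantic embedding branch: deep semantics modulate the shallow map one level above."""
    def __init__(self, deep_ch: int, shallow_ch: int):
        super().__init__()
        half = shallow_ch // 2
        self.conv1 = nn.Conv2d(deep_ch, half, kernel_size=1)
        self.conv3 = nn.Conv2d(deep_ch, half, kernel_size=3, padding=1)

    def forward(self, deep: torch.Tensor, shallow: torch.Tensor) -> torch.Tensor:
        # Concatenate-fuse the two scales of the deep map and squash to (0, 1) with the Sigmoid
        w = torch.sigmoid(torch.cat([self.conv1(deep), self.conv3(deep)], dim=1))
        # Two-fold nearest-neighbour up-sampling, then pixel-wise multiplication with the shallow map
        w = F.interpolate(w, scale_factor=2, mode="nearest")
        return shallow * w

# Usage sketch: F_AM4 = MSEB(1024, 512)(F_A5, F_A4); F_AM3 = MSEB(512, 256)(F_A4, F_A3)
```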
Further, in step S4, the detail information of the four-fold down-sampled feature map is fused, by means of convolution down-sampling, into the eight-fold down-sampled feature map, so that the detail information of the four-fold down-sampled feature map is fully exploited and the localization of fish edge contours under occlusion is compensated. The fusion process is as follows:
first, the four-fold down-sampled feature map F_A2 obtained in step S2 is processed by a CBL unit, where the CBL unit consists of a convolution layer with stride 1 and a 3×3 kernel, a batch normalization layer and a LeakyReLU activation layer; two-fold down-sampling is then performed with a convolution layer with stride 2 and a 3×3 kernel, and the result is combined, by a Concatenate fusion operation, with the feature map F_AM3 obtained in step S3 after it has been processed by a CBL unit, giving the feature map F_AMC3, whose resolution is 1/8 of the input fish school image; in this way the detail information of the four-fold down-sampled feature map F_A2 is fully utilized.
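A minimal PyTorch sketch of this detail-fusion step is given below; the channel widths of F_A2 and F_AM3 are assumptions consistent with the backbone widths stated above.

```python
import torch
import torch.nn as nn

class CBL(nn.Module):
    """Convolution (3x3, stride 1) + batch normalization + LeakyReLU."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1, inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

class DetailFusion(nn.Module):
    """Bring the 1/4-scale detail map F_A2 to 1/8 scale and concatenate it with F_AM3."""
    def __init__(self, ch_a2: int = 128, ch_am3: int = 256):
        super().__init__()
        self.cbl_a2 = CBL(ch_a2, ch_a2)
        self.down = nn.Conv2d(ch_a2, ch_a2, kernel_size=3, stride=2, padding=1)   # two-fold down-sampling
        self.cbl_am3 = CBL(ch_am3, ch_am3)

    def forward(self, f_a2: torch.Tensor, f_am3: torch.Tensor) -> torch.Tensor:
        detail = self.down(self.cbl_a2(f_a2))
        return torch.cat([detail, self.cbl_am3(f_am3)], dim=1)   # Concatenate fusion -> F_AMC3
```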
Further, the process of step S5 is as follows:
first, the feature map F_A5 obtained in step S2, the feature map F_AM4 obtained in step S3 and the feature map F_AMC3 obtained in step S4 are passed through the feature pyramid structure of the YOLOv4 algorithm for feature fusion, giving three feature maps F_B3, F_B4 and F_B5. The feature pyramid structure of the YOLOv4 algorithm comprises a spatial pyramid pooling layer (SPP) and a path aggregation network (PANet): in the SPP structure, the feature map F_A5 is processed by three CBL units and then by four max-pooling layers with pooling kernels of 1×1, 5×5, 9×9 and 13×13, whose outputs are combined by a Concatenate fusion operation; the PANet structure repeatedly fuses the features along bottom-up and top-down paths. Then, the three feature maps F_B3, F_B4 and F_B5 are each processed by a CBL unit and a convolution layer with a 1×1 kernel to obtain three prediction feature maps of different sizes, Prediction1, Prediction2 and Prediction3, whose resolutions are 1/8, 1/16 and 1/32 of the input fish school image respectively; fish targets are then predicted with the three prediction feature maps, giving repeated candidate boxes and the corresponding prediction confidence scores.
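The SPP part of the feature pyramid can be sketched as follows; the pooling kernel sizes are taken from the description, while the paddings are assumptions made so that the four pooled maps keep the same spatial size and can be concatenated, and the PANet paths follow the standard YOLOv4 layout and are not repeated here.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial pyramid pooling: parallel stride-1 max-pools of different kernel sizes, Concatenate-fused."""
    def __init__(self, pool_sizes=(1, 5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in pool_sizes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Stride-1 pooling keeps the spatial size, so the four branches can be concatenated on channels
        return torch.cat([pool(x) for pool in self.pools], dim=1)
```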
Further, in step S6, the improved DIoU_NMS non-maximum suppression algorithm is used to process the repeated candidate boxes, compensating for the missed detection of occluded targets and further improving the recall rate of occluded fish. The specific processing procedure is as follows:
S601, traverse all candidate boxes in the image, judge the prediction confidence score of each candidate box in turn, keep the candidate boxes whose scores are greater than the confidence threshold together with their scores, and delete the candidate boxes whose scores are below the confidence threshold;
S602, select the candidate box M with the highest prediction confidence score among the remaining candidate boxes, traverse the other candidate boxes B_i in turn and compute their distance intersection-over-union Distance-IoU (abbreviated DIoU) with the candidate box M; if the DIoU between a candidate box B_i and the candidate box M is not lower than a given threshold ε, the two boxes are considered to overlap heavily, and directly deleting B_i, as the original DIoU_NMS algorithm does, easily causes missed detections when fish aggregate and occlude one another; the improved DIoU_NMS algorithm therefore reduces the prediction confidence score of the candidate box B_i instead, and then moves the candidate box M into the final prediction box set G; the prediction confidence score is reduced according to the following criterion:
S'_i = S_i, if DIoU(M, B_i) < ε
S'_i = S_i · (1 − DIoU(M, B_i)), if DIoU(M, B_i) ≥ ε
where M is the candidate box with the highest current prediction confidence score, B_i is another candidate box being traversed, ρ(M, B_i) is the distance between the centre points of M and B_i, c is the diagonal length of the smallest enclosing rectangle containing M and B_i, DIoU(M, B_i) is the distance intersection-over-union of M and B_i, ε is the given DIoU threshold, S_i is the prediction confidence score of the candidate box B_i, and S'_i is the reduced prediction confidence score of the candidate box B_i;
and S603, repeat step S602 until all candidate boxes have been processed, and draw the final prediction box set G on the corresponding image as the output, giving the fish school detection result.
Further, in step S602, DIoU adds a penalty term to the intersection-over-union IoU; the penalty term accounts for the distance between the centre points of the two candidate boxes, and DIoU(M, B_i) is calculated as follows:
DIoU(M, B_i) = IoU(M, B_i) − ρ²(M, B_i) / c²
where M is the candidate box with the highest current prediction confidence score, B_i is another candidate box being traversed, ρ(M, B_i) is the distance between the centre points of M and B_i, c is the diagonal length of the smallest enclosing rectangle containing M and B_i, and IoU(M, B_i) is the ratio of the intersection to the union of M and B_i.
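A minimal NumPy sketch of steps S601 to S603 is given below; the box format [x1, y1, x2, y2], the default thresholds and the final score floor keep_thr are assumptions, and the score decay follows the piecewise rule given above.

```python
import numpy as np

def diou(m: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """Distance-IoU between one box m and an array of boxes, all in [x1, y1, x2, y2] format."""
    x1 = np.maximum(m[0], boxes[:, 0]); y1 = np.maximum(m[1], boxes[:, 1])
    x2 = np.minimum(m[2], boxes[:, 2]); y2 = np.minimum(m[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_m = (m[2] - m[0]) * (m[3] - m[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    iou = inter / (area_m + area_b - inter + 1e-9)
    centre_m = (m[:2] + m[2:]) / 2.0                      # centre of M
    centre_b = (boxes[:, :2] + boxes[:, 2:]) / 2.0        # centres of the B_i
    rho2 = ((centre_m - centre_b) ** 2).sum(axis=1)       # squared centre distance rho^2
    ex1 = np.minimum(m[0], boxes[:, 0]); ey1 = np.minimum(m[1], boxes[:, 1])
    ex2 = np.maximum(m[2], boxes[:, 2]); ey2 = np.maximum(m[3], boxes[:, 3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9       # squared diagonal c^2 of the enclosing box
    return iou - rho2 / c2

def improved_diou_nms(boxes, scores, conf_thr=0.25, eps=0.5, keep_thr=0.05):
    """Steps S601-S603: confidence filtering, then score decay instead of deletion for high-DIoU boxes."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    keep = scores > conf_thr                               # S601
    boxes, scores = boxes[keep], scores[keep]
    final_boxes, final_scores = [], []
    while scores.size > 0:
        i = int(np.argmax(scores))                         # S602: candidate M with the highest score
        final_boxes.append(boxes[i]); final_scores.append(scores[i])
        m = boxes[i]
        boxes = np.delete(boxes, i, axis=0); scores = np.delete(scores, i)
        if scores.size == 0:
            break
        d = diou(m, boxes)
        scores = np.where(d >= eps, scores * (1.0 - d), scores)   # reduce, do not delete
        live = scores > keep_thr                           # assumed floor: drop fully decayed boxes
        boxes, scores = boxes[live], scores[live]
    return np.array(final_boxes), np.array(final_scores)   # S603: the final prediction box set G
```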
Compared with the prior art, the invention has the following advantages and effects:
(1) In the image feature extraction process, the dual-branch feature extraction network is used to extract features of the input fish school image, compensating for the lack of fish features under occlusion so that the original features of the fish are extracted more fully.
(2) The invention adopts the improved semantic embedding branch MSEB to fuse the semantic information of a deep feature map into the feature map one level above it, making up for the lack of semantic information in the upper shallow feature map and thereby improving the recall rate of fish targets.
(3) The detail information of the four-fold down-sampled feature map is fused into the eight-fold down-sampled feature map, so that the edge contour information of the fish is fully captured and the fish edge contours can be located more accurately when the fish school is occluded.
(4) The improved DIoU_NMS non-maximum suppression algorithm is adopted to process the repeated candidate boxes, compensating for the missed detection of occluded targets; deletion of repeated candidate boxes is balanced against missed detection of true boxes, further improving the recall rate of occluded fish.
Drawings
FIG. 1 is a flow chart of a fish shoal automatic detection method based on target occlusion compensation disclosed by the invention;
fig. 2 is a network structure diagram of a fish school automatic detection method based on target occlusion compensation in an embodiment of the present invention, where Concat represents Concatenate fusion operation;
fig. 3 is a block diagram of an improved semantic embedding branch MSEB in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Examples
This embodiment provides an automatic fish school detection method based on target occlusion compensation, following the flow chart of fig. 1 and the network structure of fig. 2, to realize automatic detection of underwater fish school targets. The specific flow is as follows:
s1, flying the unmanned airship to the sky of a water area of interest by using a multi-rotor wing and landing the unmanned airship to the water surface, then shooting image data of cultured fish schools by using a camera carried on the unmanned airship, enabling a camera to face the front, setting the interval time of shooting the images to be 5 seconds, enabling the original resolution of the shot images to be 1920 x 1080, and then marking and expanding the collected fish school images to obtain a data set for training;
s2, inputting the fish image into a double-branch feature extraction network for multistage feature extraction from shallow to deep, wherein the double-branch feature extraction network is specifically characterized in that a lightweight original information feature extraction network parallel to a trunk feature extraction network CSPDarknet53 is added on the basis of the trunk feature extraction network CSPDarknet53 of a YOLOv4 algorithm, so that the double-branch feature extraction network is called; after multi-stage feature extraction is carried out by a double-branch feature extraction network, five feature maps are obtained, and the five feature maps are respectively a two-time down-sampling feature map F A1 Fourfold downsampling feature map F A2 Eight-fold down-sampling feature map F A3 Sixteen-fold down-sampling feature map F A4 Thirty-two times downsampling feature map F A5 The resolution is 1/2, 1/4, 1/8, 1/16 and 1/32 of the input fish image respectively;
s3, using an improved Semantic Embedding Branch (MSEB) to obtain the characteristic diagram F obtained in the step S2 A5 Fusing semantic information to feature maps F A4 In (1), obtaining a characteristic diagram F AM4 The resolution of the input fish school image is 1/16 of the input fish school image; the characteristic diagram F obtained in the step S2 A4 Fusing semantic information of to the feature map F A3 In (1), obtaining a characteristic diagram F AM3 The resolution is 1/8 of the input fish school image;
s4, carrying out convolution downsampling on the feature map F obtained in the step S2 in a quadruple downsampling mode A2 The detail information of (2) is fused with the eight-fold down-sampling feature map F obtained in the step S3 AM3 In (1), a characteristic diagram F is obtained AMC3 The resolution is 1/8 of the input fish school image;
s5, obtaining the characteristic diagram F obtained in the step S2 A5 And step S3, obtaining a characteristic diagram F AM4 And the characteristic diagram F obtained in step S4 AMC3 After feature fusion is carried out on the feature pyramid structure of the YOLOv4 algorithm, three feature graphs are obtained, wherein the three feature graphs are F B3 、F B4 And F B5 Then makeUsing characteristic diagrams F B3 、F B4 And F B5 Predicting the fish target after convolution processing to obtain repeated candidate frames and corresponding prediction confidence scores;
and S6, processing the repeated candidate frames by adopting a non-maximum suppression algorithm of the improved DIoU _ NMS to obtain a prediction frame result containing the prediction confidence score, and drawing the prediction frame result on a corresponding picture to serve as a fish shoal detection result.
In this embodiment, step S1 uses the labelImg annotation software to manually label the fish bodies in the collected fish school images one by one with rectangular boxes, obtaining the corresponding xml label files that record the coordinates and category of each target in the images; the collected fish school images and their corresponding label files are then expanded by data augmentation, including vertical flipping, horizontal flipping, brightness change, addition of random Gaussian white noise, filtering and affine transformation, to form the final data set and improve the robustness of the network model to environmental changes.
In this embodiment, in step S2 the fish school image is input into the dual-branch feature extraction network for multistage, shallow-to-deep feature extraction. The specific structure of the dual-branch feature extraction network is given at 208 in fig. 2: a lightweight original-information feature extraction network is added in parallel with CSPDarknet53 on the basis of the backbone feature extraction network CSPDarknet53 of the YOLOv4 algorithm. The structure of the dual-branch feature extraction network is described as follows:
the trunk feature extraction network CSPDarknet53 comprises a CBM unit and five cross-phase local network CSPx units; the CBM unit is composed of a Convolution layer convention, a Batch Normalization layer Batch Normalization and a mesh activation function layer, wherein the step length is 1, the Convolution kernel is 3*3, and a structure of one CBM unit is given by 201 in fig. 2; the CSPx unit is formed by fusing a plurality of CBM units and x Res unit residual error units, wherein 204 in FIG. 2 shows the structure of a CSPx unit; a Res unit residual error unit in the CSPx unit is composed of a CBM unit with a convolution kernel of 1*1, a CBM unit with a convolution kernel of 3*3, and a residual error structure, and 203 in fig. 2 gives a structure of a Res unit residual error unit; the Concatenate fusion operation is performed on twoThe feature graphs are spliced on the channels, and the dimensions can be expanded; the number of convolution layer channels of the five CSPx units is 64, 128, 256, 512 and 1024 in sequence, and each CSPx unit is subjected to twice down sampling; the characteristic diagram obtained by five CSPx units is F C1 、F C2 、F C3 、F C4 、F C5 The resolution is 1/2, 1/4, 1/8, 1/16 and 1/32 of the input fish image respectively;
the lightweight original information feature extraction network comprises five CM units, wherein each CM unit consists of a Convolution layer constraint with the step size of 2 and the Convolution kernel of 3*3 and a maximum pooling layer MaxPool with the pooling step size of 1 and the pooling kernel of 3*3, and the structure of one CM unit is given by 205 in FIG. 2; the convolution layer with the step length of 2 can be subjected to one-time double down sampling, and the number of convolution layer channels of each CM unit is the same as that of corresponding cross-phase local network CSPx units in the main feature extraction network CSPDarknet 53; the characteristic diagram obtained by five CM units is F L1 、F L2 、F L3 、F L4 、F L5 The resolution is 1/2, 1/4, 1/8, 1/16 and 1/32 of the input fish image respectively;
then, in the multi-stage feature extraction process from shallow to deep, a feature graph F extracted by a light-weight original information feature extraction network is obtained Li (i =1,2,3,4,5) and feature map F extracted by corresponding CSPDarknet53 network Ci (i =1,2,3,4,5) performing Add fusion operation, which is to Add corresponding pixel values of the two feature maps to obtain a final extracted feature map F A1 、F A2 、F A3 、F A4 、F A5 The resolution is 1/2, 1/4, 1/8, 1/16 and 1/32 of the input fish image respectively.
In this embodiment, in step S3 the improved semantic embedding branch MSEB is used to fuse the semantic information of a deep feature map into the shallow feature map one level above it; fig. 3 shows the specific structure of the improved semantic embedding branch MSEB. The fusion with MSEB proceeds as follows: first, the deep feature map F_A5 obtained in step S2 is passed through a convolution layer with a 1×1 kernel and a convolution layer with a 3×3 kernel, and the two resulting feature maps of different scales are combined by a Concatenate fusion operation; after a Sigmoid function, two-fold up-sampling is performed by nearest-neighbour interpolation, and the result is multiplied pixel-wise with the shallow feature map F_A4 obtained in step S2 to obtain the feature map F_AM4, whose resolution is 1/16 of the input fish school image; in this way the semantic information of the deep feature map F_A5 is fused into the shallow feature map F_A4, compensating for the lack of semantic information in the shallow feature map F_A4;
then, the semantic information of the deep feature map F_A4 obtained in step S2 is likewise fused, using MSEB, into the shallow feature map F_A3 to obtain the feature map F_AM3, whose resolution is 1/8 of the input fish school image, compensating for the lack of semantic information in the shallow feature map F_A3;
the form of Sigmoid function used in the improved semantic embedding branch MSEB is as follows:
Sigmoid(i) = 1 / (1 + e^(-i))
where i is the input and e is the natural constant.
In this embodiment, the implementation process of step S4 is as follows:
First, the four-fold down-sampled feature map F_A2 obtained in step S2 is processed by a CBL unit, where the CBL unit consists of a convolution layer with stride 1 and a 3×3 kernel, a batch normalization layer and a LeakyReLU activation layer; 202 in fig. 2 shows the structure of a CBL unit. Two-fold down-sampling is then performed with a convolution layer with stride 2 and a 3×3 kernel, and the result is combined, by a Concatenate fusion operation, with the feature map F_AM3 obtained in step S3 after it has been processed by a CBL unit, giving the feature map F_AMC3, whose resolution is 1/8 of the input fish school image; in this way the detail information of the four-fold down-sampled feature map F_A2 is fully exploited, the edge contour information of the fish is fully obtained, and the localization of fish edge contours under occlusion is compensated.
In this embodiment, the implementation process of step S5 is as follows:
First, the feature map F_A5 obtained in step S2, the feature map F_AM4 obtained in step S3 and the feature map F_AMC3 obtained in step S4 are passed through the feature pyramid structure of the YOLOv4 algorithm for feature fusion, giving three feature maps F_B3, F_B4 and F_B5. The feature pyramid structure of the YOLOv4 algorithm comprises a spatial pyramid pooling layer (SPP) and a path aggregation network (PANet): in the SPP structure, the feature map F_A5 is processed by three CBL units and then by four max-pooling layers with pooling kernels of 1×1, 5×5, 9×9 and 13×13, whose outputs are combined by a Concatenate fusion operation, as shown at 206 in fig. 2; the PANet structure repeatedly fuses the features along bottom-up and top-down paths, as shown at 207 in fig. 2. Then, the three feature maps F_B3, F_B4 and F_B5 are each processed by a CBL unit and a convolution layer with a 1×1 kernel to obtain three prediction feature maps of different sizes, Prediction1, Prediction2 and Prediction3, whose resolutions are 1/8, 1/16 and 1/32 of the input fish school image respectively; fish targets are then predicted with the three prediction feature maps, giving repeated candidate boxes and the corresponding prediction confidence scores.
In this embodiment, the implementation process of step S6 is as follows:
S601, traverse all candidate boxes in the image, judge the prediction confidence score of each candidate box in turn, keep the candidate boxes whose scores are greater than the confidence threshold together with their scores, and delete the candidate boxes whose scores are below the confidence threshold;
S602, select the candidate box M with the highest prediction confidence score among the remaining candidate boxes, traverse the other candidate boxes B_i in turn and compute their distance intersection-over-union Distance-IoU (abbreviated DIoU) with the candidate box M; if the DIoU between a candidate box B_i and the candidate box M is not lower than the given threshold ε, the two boxes are considered to overlap heavily, and instead of directly deleting the candidate box B_i, its prediction confidence score is reduced; the candidate box M is then moved into the final prediction box set G; the prediction confidence score is reduced according to the following criterion:
S'_i = S_i, if DIoU(M, B_i) < ε
S'_i = S_i · (1 − DIoU(M, B_i)), if DIoU(M, B_i) ≥ ε
where M is the candidate box with the highest current prediction confidence score, B_i is another candidate box being traversed, ρ(M, B_i) is the distance between the centre points of M and B_i, c is the diagonal length of the smallest enclosing rectangle containing M and B_i, DIoU(M, B_i) is the distance intersection-over-union of M and B_i, ε is the given DIoU threshold, S_i is the prediction confidence score of the candidate box B_i, and S'_i is the reduced prediction confidence score of the candidate box B_i;
and S603, repeat step S602 until all candidate boxes have been processed, and draw the final prediction box set G on the corresponding image as the output, giving the fish school detection result.
In step S602, DIoU adds a penalty term to the intersection-over-union IoU; the penalty term accounts for the distance between the centre points of the two candidate boxes, and the specific calculation method is as follows:
DIoU(M, B_i) = IoU(M, B_i) − ρ²(M, B_i) / c²
where M is the candidate box with the highest current prediction confidence score, B_i is another candidate box being traversed, ρ(M, B_i) is the distance between the centre points of M and B_i, c is the diagonal length of the smallest enclosing rectangle containing M and B_i, and IoU(M, B_i) is the ratio of the intersection to the union of M and B_i.
In this embodiment, the prediction boxes need to be adjusted continuously during training so that they approach the real boxes of the targets to be detected; therefore, before training, 9 prior boxes of different sizes are obtained on the fish school image data set with a K-means clustering algorithm, so that the prior boxes fit the collected fish school image data set, and 3 prior boxes are assigned to each of the three prediction feature maps Prediction1, Prediction2 and Prediction3. The K-means clustering algorithm measures how close two boxes are using the intersection-over-union IoU, and the distance between two boxes is computed as:
distance(box, center) = 1 − IoU(box, center)    formula (4)
where box denotes the candidate box to be computed, center denotes the candidate box at the cluster centre, and IoU(box, center) is the intersection-over-union of the candidate box to be computed and the candidate box at the cluster centre.
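A minimal NumPy sketch of this IoU-based K-means clustering is given below; the boxes are assumed to have been reduced to (width, height) pairs extracted from the xml label files, and the iteration count and seed are assumptions.

```python
import numpy as np

def iou_wh(boxes: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """IoU between every (w, h) box and every (w, h) cluster centre, both anchored at the origin."""
    inter = np.minimum(boxes[:, None, 0], centers[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], centers[None, :, 1])
    union = boxes[:, 0:1] * boxes[:, 1:2] + centers[None, :, 0] * centers[None, :, 1] - inter
    return inter / (union + 1e-9)

def kmeans_anchors(boxes: np.ndarray, k: int = 9, iters: int = 100, seed: int = 0) -> np.ndarray:
    """Cluster labeled box sizes into k prior boxes with distance = 1 - IoU (formula (4))."""
    rng = np.random.default_rng(seed)
    centers = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        dist = 1.0 - iou_wh(boxes, centers)
        assign = dist.argmin(axis=1)
        new_centers = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers[np.argsort(centers[:, 0] * centers[:, 1])]   # sorted: 3 priors per prediction scale
```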
In this embodiment, during training the initial learning rate is set to 0.0002, the number of training epochs is set to 45, 8 images are randomly selected for each training step, an Adam optimizer is used to accelerate convergence of the network model, and, to reduce GPU memory overhead, the resolution of each training image is adjusted to 416 × 416.
In this embodiment, the loss function loss consists of three parts: the regression box prediction error L_loc, the confidence error L_conf and the classification error L_cls. The specific calculation formula is as follows:

loss = L_loc + L_conf + L_cls, with
L_loc = Σ_i Σ_j I_ij^obj · [1 − IoU(P, T) + ρ²(P_ctr, T_ctr)/d² + α·v]
L_conf = − Σ_i Σ_j I_ij^obj · [Ĉ_i^j·log(C_i^j) + (1 − Ĉ_i^j)·log(1 − C_i^j)] − λ_noobj · Σ_i Σ_j I_ij^noobj · [Ĉ_i^j·log(C_i^j) + (1 − Ĉ_i^j)·log(1 − C_i^j)]
L_cls = − Σ_i I_i^obj · Σ_{c∈classes} [p̂_i(c)·log(p_i(c)) + (1 − p̂_i(c))·log(1 − p_i(c))]    formula (5)

v = (4/π²) · (arctan(w_gt/h_gt) − arctan(w/h))²    formula (6)

The value of v in the above formula (5) is computed by formula (6), and α = v / ((1 − IoU(P, T)) + v); IoU(P, T) is the intersection-over-union of the prediction box and the real box, ρ(P_ctr, T_ctr) is the distance between the centre points of the prediction box and the real box, d is the diagonal length of the smallest enclosing rectangle containing the prediction box and the real box, w_gt and h_gt are respectively the width and height of the real box, w and h are respectively the width and height of the prediction box, the image is divided into S × S grid cells, M is the number of prior boxes (anchors) generated by each grid cell, and the sums over i and j run over the S × S grid cells and the M prior boxes of each cell; I_ij^obj indicates that the prediction box contains an object to be detected, I_ij^noobj indicates that the prediction box does not contain an object to be detected, C_i^j is the prediction confidence of the corresponding prior box, Ĉ_i^j is the actual confidence, λ_noobj is a set weight coefficient, c is the category to which the object to be detected belongs, p̂_i(c) is the actual probability that the target in the corresponding grid cell belongs to category c, and p_i(c) is the predicted probability that the target in the corresponding grid cell belongs to category c.
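A minimal PyTorch sketch of the CIoU regression term of formulas (5) and (6) for one prediction/real box pair is given below; the [x1, y1, x2, y2] box format and the small epsilon constants are assumptions.

```python
import math
import torch

def ciou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """CIoU regression term: 1 - IoU(P, T) + rho^2(P_ctr, T_ctr)/d^2 + alpha * v."""
    # intersection-over-union IoU(P, T)
    x1 = torch.max(pred[0], target[0]); y1 = torch.max(pred[1], target[1])
    x2 = torch.min(pred[2], target[2]); y2 = torch.min(pred[3], target[3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    iou = inter / (area_p + area_t - inter + 1e-9)
    # squared centre distance rho^2(P_ctr, T_ctr) and squared enclosing diagonal d^2
    pc = (pred[:2] + pred[2:]) / 2; tc = (target[:2] + target[2:]) / 2
    rho2 = ((pc - tc) ** 2).sum()
    ex1 = torch.min(pred[0], target[0]); ey1 = torch.min(pred[1], target[1])
    ex2 = torch.max(pred[2], target[2]); ey2 = torch.max(pred[3], target[3])
    d2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9
    # aspect-ratio consistency term v of formula (6) and its trade-off weight alpha
    w_p, h_p = pred[2] - pred[0], pred[3] - pred[1]
    w_t, h_t = target[2] - target[0], target[3] - target[1]
    v = (4 / math.pi ** 2) * (torch.atan(w_t / h_t) - torch.atan(w_p / h_p)) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return 1 - iou + rho2 / d2 + alpha * v
```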
In this embodiment, after the relevant parameters are set, the fish school data set is trained; after training, the loss curve can be obtained, with the loss function decreasing quickly at first and finally converging, and the trained fish school target detection model weight file is saved. Then, with the saved model weight file, a test fish school image file is input, fish target detection is performed on the fish school image, prediction boxes in which targets may exist are generated in the image, a prediction confidence score is given for each prediction box, and the image with the prediction boxes and their prediction confidence scores is output.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to them; any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention should be regarded as an equivalent replacement and falls within the protection scope of the present invention.

Claims (7)

1. An automatic fish school detection method based on target occlusion compensation, characterized by comprising the following steps:
S1, collecting fish school images in a pond environment with a camera carried on a multi-rotor unmanned aerial vehicle, and labeling and expanding the collected fish school images;
S2, inputting the fish school image into a dual-branch feature extraction network for multistage, shallow-to-deep feature extraction, wherein the dual-branch feature extraction network adds a lightweight original-information feature extraction network in parallel with CSPDarknet53 on the basis of the backbone feature extraction network CSPDarknet53 of the YOLOv4 algorithm; after multistage feature extraction by the dual-branch network, five feature maps are obtained: a two-fold down-sampled feature map F_A1, a four-fold down-sampled feature map F_A2, an eight-fold down-sampled feature map F_A3, a sixteen-fold down-sampled feature map F_A4 and a thirty-two-fold down-sampled feature map F_A5, whose resolutions are 1/2, 1/4, 1/8, 1/16 and 1/32 of the input fish school image respectively;
the backbone feature extraction network CSPDarknet53 comprises a CBM unit and five cross-stage partial network CSPx units; the CBM unit consists of a convolution layer with stride 1 and a 3×3 kernel, a batch normalization layer and a Mish activation layer; a CSPx unit is formed by Concatenate fusion of several CBM units and x Res unit residual units, each Res unit consisting of a CBM unit with a 1×1 kernel, a CBM unit with a 3×3 kernel and a residual connection; the Concatenate fusion operation splices two feature maps along the channel dimension, so the channel dimension of the spliced feature map is expanded; the numbers of convolution channels of the five CSPx units are 64, 128, 256, 512 and 1024 in turn, and each CSPx unit performs one two-fold down-sampling; the feature maps obtained from the five CSPx units are F_C1, F_C2, F_C3, F_C4 and F_C5, whose resolutions are 1/2, 1/4, 1/8, 1/16 and 1/32 of the input fish school image respectively;
the lightweight original-information feature extraction network comprises five CM units; each CM unit consists of a convolution layer with stride 2 and a 3×3 kernel and a max-pooling layer MaxPool with pooling stride 1 and a 3×3 pooling kernel; each stride-2 convolution performs one two-fold down-sampling, and the number of convolution channels of each CM unit is the same as that of the corresponding CSPx unit in the backbone network CSPDarknet53; the feature maps obtained from the five CM units are F_L1, F_L2, F_L3, F_L4 and F_L5, whose resolutions are 1/2, 1/4, 1/8, 1/16 and 1/32 of the input fish school image respectively;
then, in the multistage, shallow-to-deep feature extraction process, the feature map F_Li extracted by the lightweight original-information network and the feature map F_Ci extracted by the corresponding CSPDarknet53 stage are combined by an Add fusion operation, i = 1, 2, 3, 4, 5; the Add fusion operation adds the corresponding pixel values of the two feature maps, giving the final extracted feature maps F_A1, F_A2, F_A3, F_A4 and F_A5, whose resolutions are 1/2, 1/4, 1/8, 1/16 and 1/32 of the input fish school image respectively;
S3, using an improved semantic embedding branch to fuse the semantic information of the feature map F_A5 obtained in step S2 into the feature map F_A4, obtaining a feature map F_AM4 whose resolution is 1/16 of the input fish school image, and to fuse the semantic information of the feature map F_A4 obtained in step S2 into the feature map F_A3, obtaining a feature map F_AM3 whose resolution is 1/8 of the input fish school image;
S4, fusing, by means of convolution down-sampling, the detail information of the four-fold down-sampled feature map F_A2 obtained in step S2 into the eight-fold down-sampled feature map F_AM3 obtained in step S3, obtaining a feature map F_AMC3 whose resolution is 1/8 of the input fish school image;
S5, passing the feature map F_A5 obtained in step S2, the feature map F_AM4 obtained in step S3 and the feature map F_AMC3 obtained in step S4 through the feature pyramid structure of the YOLOv4 algorithm for feature fusion, obtaining three feature maps F_B3, F_B4 and F_B5, and then, after convolution processing, using the feature maps F_B3, F_B4 and F_B5 to predict fish targets, obtaining repeated candidate boxes and the corresponding prediction confidence scores;
and S6, processing the repeated candidate boxes with an improved DIoU_NMS non-maximum suppression algorithm to obtain prediction boxes with their prediction confidence scores, and drawing the prediction boxes on the corresponding image as the fish school detection result.
2. The method according to claim 1, wherein in step S1 the fish targets in each collected fish school image are labeled one by one with the labelImg image annotation software, each labeled image generating an xml label file containing the annotation information, and the collected fish school images and their corresponding label files constitute an original data set; the original data set is then expanded by data augmentation, including vertical flipping, horizontal flipping, brightness change, addition of random Gaussian white noise, filtering and affine transformation, to form the final data set.
3. The method for automatic fish school detection based on target occlusion compensation as claimed in claim 1, wherein the fusion process using the improved semantic embedding branch in step S3 is as follows:
first, the deep feature map F_A5 obtained in step S2 is passed through a convolution layer with a 1×1 kernel and a convolution layer with a 3×3 kernel, and the two resulting feature maps of different scales are combined by a Concatenate fusion operation; after a Sigmoid function, two-fold up-sampling is performed by nearest-neighbour interpolation, and the result is multiplied pixel-wise with the shallow feature map F_A4 obtained in step S2 to obtain the feature map F_AM4, whose resolution is 1/16 of the input fish school image, so that the semantic information of the deep feature map F_A5 is fused into the shallow feature map F_A4;
then, the semantic information of the deep feature map F_A4 obtained in step S2 is likewise fused, using the improved semantic embedding branch, into the shallow feature map F_A3 to obtain the feature map F_AM3, whose resolution is 1/8 of the input fish school image;
the Sigmoid functional form used in the improved semantic embedding branch is as follows:
Figure FDA0003754725310000041
where i is the input and e is the natural constant.
4. The method for automatic fish school detection based on target occlusion compensation as claimed in claim 1, wherein the process of step S4 is as follows:
first, the four-fold down-sampled feature map F_A2 obtained in step S2 is processed by a CBL unit, where the CBL unit consists of a convolution layer with stride 1 and a 3×3 kernel, a batch normalization layer and a LeakyReLU activation layer; two-fold down-sampling is then performed with a convolution layer with stride 2 and a 3×3 kernel, and the result is combined, by a Concatenate fusion operation, with the feature map F_AM3 obtained in step S3 after it has been processed by a CBL unit, giving the feature map F_AMC3, whose resolution is 1/8 of the input fish school image.
5. The method for automatic fish school detection based on target occlusion compensation as claimed in claim 4, wherein the process of step S5 is as follows:
first, the feature map F_A5 obtained in step S2, the feature map F_AM4 obtained in step S3 and the feature map F_AMC3 obtained in step S4 are passed through the feature pyramid structure of the YOLOv4 algorithm for feature fusion, giving three feature maps F_B3, F_B4 and F_B5, wherein the feature pyramid structure of the YOLOv4 algorithm comprises a spatial pyramid pooling layer and a path aggregation network: in the spatial pyramid pooling structure, the feature map F_A5 is processed by three CBL units and then by four max-pooling layers with pooling kernels of 1×1, 5×5, 9×9 and 13×13, whose outputs are combined by a Concatenate fusion operation, and the path aggregation network structure repeatedly fuses the features along bottom-up and top-down paths; then, the three feature maps F_B3, F_B4 and F_B5 are each processed by a CBL unit and a convolution layer with a 1×1 kernel to obtain three prediction feature maps of different sizes, Prediction1, Prediction2 and Prediction3, whose resolutions are 1/8, 1/16 and 1/32 of the input fish school image respectively; fish targets are then predicted with the three prediction feature maps, giving repeated candidate boxes and the corresponding prediction confidence scores.
6. The method for automatic fish school detection based on target occlusion compensation as claimed in claim 1, wherein the process of step S6 is as follows:
S601, traverse all candidate boxes in the image, judge the prediction confidence score of each candidate box in turn, keep the candidate boxes whose scores are greater than the confidence threshold together with their scores, and delete the candidate boxes whose scores are below the confidence threshold;
S602, select the candidate box M with the highest prediction confidence score among the remaining candidate boxes, traverse the other candidate boxes B_i in turn and compute their distance intersection-over-union Distance-IoU (abbreviated DIoU) with the candidate box M; if the DIoU between a candidate box B_i and the candidate box M is not lower than the given threshold ε, the two boxes are considered to overlap heavily, and instead of directly deleting the candidate box B_i, the prediction confidence score of the candidate box B_i is reduced, after which the candidate box M is moved into the final prediction box set G; the prediction confidence score is reduced according to the following criterion:
S'_i = S_i, if DIoU(M, B_i) < ε
S'_i = S_i · (1 − DIoU(M, B_i)), if DIoU(M, B_i) ≥ ε
where M is the candidate box with the highest current prediction confidence score, B_i is another candidate box being traversed, ρ(M, B_i) is the distance between the centre points of M and B_i, c is the diagonal length of the smallest enclosing rectangle containing M and B_i, DIoU(M, B_i) is the distance intersection-over-union of M and B_i, ε is the given DIoU threshold, S_i is the prediction confidence score of the candidate box B_i, and S'_i is the reduced prediction confidence score of the candidate box B_i;
and S603, repeat step S602 until all candidate boxes have been processed, and draw the final prediction box set G on the corresponding image as the output, giving the fish school detection result.
7. The method as claimed in claim 6, wherein the DIoU in step S602 adds a penalty term to the intersection-over-union IoU, the penalty term accounting for the distance between the centre points of the two candidate boxes, and DIoU(M, B_i) is calculated as follows:
$$\mathrm{DIoU}(M, B_i) = \mathrm{IoU}(M, B_i) - \frac{\rho^{2}(M, B_i)}{c^{2}}$$
wherein M is the candidate box with the highest current prediction confidence score, B_i is another candidate box to be traversed, ρ(M, B_i) is the distance between the center points of M and B_i, c is the diagonal length of the minimum bounding rectangle containing M and B_i, and IoU(M, B_i) is the ratio of the intersection to the union of M and B_i.
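As a quick numeric check of this formula, using two hypothetical boxes chosen only for illustration, M = (0, 0, 4, 4) and B_i = (2, 0, 6, 4) in (x_min, y_min, x_max, y_max) form:

```latex
% M = (0,0,4,4), B_i = (2,0,6,4): intersection area 8, areas 16 and 16,
% centres (2,2) and (4,2), enclosing box (0,0,6,4)
\mathrm{IoU}(M,B_i) = \frac{8}{16+16-8} = \frac{1}{3}, \qquad
\rho^{2}(M,B_i) = (4-2)^{2} + (2-2)^{2} = 4, \qquad
c^{2} = 6^{2} + 4^{2} = 52
\;\Rightarrow\;
\mathrm{DIoU}(M,B_i) = \frac{1}{3} - \frac{4}{52} \approx 0.256
```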
CN202110354428.6A 2021-04-01 2021-04-01 Fish shoal automatic detection method based on target shielding compensation Active CN113076871B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110354428.6A CN113076871B (en) 2021-04-01 2021-04-01 Fish shoal automatic detection method based on target shielding compensation

Publications (2)

Publication Number Publication Date
CN113076871A CN113076871A (en) 2021-07-06
CN113076871B (en) 2022-10-21

Family

ID=76614401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110354428.6A Active CN113076871B (en) 2021-04-01 2021-04-01 Fish shoal automatic detection method based on target shielding compensation

Country Status (1)

Country Link
CN (1) CN113076871B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113435396B (en) * 2021-07-13 2022-05-20 大连海洋大学 Underwater fish school detection method based on image self-adaptive noise resistance
CN113887608B (en) * 2021-09-28 2023-03-24 北京三快在线科技有限公司 Model training method, image detection method and device
CN113610070A (en) * 2021-10-11 2021-11-05 中国地质环境监测院(自然资源部地质灾害技术指导中心) Landslide disaster identification method based on multi-source data fusion
CN114387510A (en) * 2021-12-22 2022-04-22 广东工业大学 Bird identification method and device for power transmission line and storage medium
CN114419364A (en) * 2021-12-24 2022-04-29 华南农业大学 Intelligent fish sorting method and system based on deep feature fusion
CN114419568A (en) * 2022-01-18 2022-04-29 东北大学 Multi-view pedestrian detection method based on feature fusion
CN114898105B (en) * 2022-03-04 2024-04-19 武汉理工大学 Infrared target detection method under complex scene
CN114782759B (en) 2022-06-22 2022-09-13 鲁东大学 Method for detecting densely-occluded fish based on YOLOv5 network
CN114863263B (en) 2022-07-07 2022-09-13 鲁东大学 Snakehead fish detection method for blocking in class based on cross-scale hierarchical feature fusion

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112084866B (en) * 2020-08-07 2022-11-04 浙江工业大学 Target detection method based on improved YOLO v4 algorithm

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101566691A (en) * 2009-05-11 2009-10-28 华南理工大学 Method and system for tracking and positioning underwater target
CN111310622A (en) * 2020-02-05 2020-06-19 西北工业大学 Fish swarm target identification method for intelligent operation of underwater robot
CN111652118A (en) * 2020-05-29 2020-09-11 大连海事大学 Marine product autonomous grabbing guiding method based on underwater target neighbor distribution
CN111738139A (en) * 2020-06-19 2020-10-02 中国水产科学研究院渔业机械仪器研究所 Cultured fish monitoring method and system based on image recognition
CN112001339A (en) * 2020-08-27 2020-11-27 杭州电子科技大学 Pedestrian social distance real-time monitoring method based on YOLO v4
CN112308040A (en) * 2020-11-26 2021-02-02 山东捷讯通信技术有限公司 River sewage outlet detection method and system based on high-definition images
CN112465803A (en) * 2020-12-11 2021-03-09 桂林慧谷人工智能产业技术研究院 Underwater sea cucumber detection method combining image enhancement

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Multi-class fish stock statistics technology based on object classification and tracking algorithm; Tao Liu; Ecological Informatics 63 (2021) 101240; 2021-02-06; full text *
Real-time detection of underwater fish targets based on improved YOLO and transfer learning; Li Qingzhong et al.; Pattern Recognition and Artificial Intelligence; 2019-03-15 (No. 03); full text *
Research on fish school detection methods based on deep learning; Shen Junyu; China Master's Theses Full-text Database, Information Science and Technology; 2020-01-15; full text *

Similar Documents

Publication Publication Date Title
CN113076871B (en) Fish shoal automatic detection method based on target shielding compensation
CN110059558B (en) Orchard obstacle real-time detection method based on improved SSD network
CN111640125B (en) Aerial photography graph building detection and segmentation method and device based on Mask R-CNN
CN111222396B (en) All-weather multispectral pedestrian detection method
CN111179217A (en) Attention mechanism-based remote sensing image multi-scale target detection method
CN108537824B (en) Feature map enhanced network structure optimization method based on alternating deconvolution and convolution
CN110163213B (en) Remote sensing image segmentation method based on disparity map and multi-scale depth network model
CN111401293B (en) Gesture recognition method based on Head lightweight Mask scanning R-CNN
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN108022244B (en) Hypergraph optimization method for significant target detection based on foreground and background seeds
CN111091101B (en) High-precision pedestrian detection method, system and device based on one-step method
CN110310305B (en) Target tracking method and device based on BSSD detection and Kalman filtering
CN113177560A (en) Universal lightweight deep learning vehicle detection method
CN111126278A (en) Target detection model optimization and acceleration method for few-category scene
CN113850136A (en) Yolov5 and BCNN-based vehicle orientation identification method and system
CN113516664A (en) Visual SLAM method based on semantic segmentation dynamic points
CN116486288A (en) Aerial target counting and detecting method based on lightweight density estimation network
CN113537085A (en) Ship target detection method based on two-time transfer learning and data augmentation
CN110659601A (en) Depth full convolution network remote sensing image dense vehicle detection method based on central point
CN114565675A (en) Method for removing dynamic feature points at front end of visual SLAM
CN113887649B (en) Target detection method based on fusion of deep layer features and shallow layer features
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN114943888A (en) Sea surface small target detection method based on multi-scale information fusion, electronic equipment and computer readable medium
CN113160117A (en) Three-dimensional point cloud target detection method under automatic driving scene
CN116363532A (en) Unmanned aerial vehicle image traffic target detection method based on attention mechanism and re-parameterization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant