CN112598713A - Offshore submarine fish detection and tracking statistical method based on deep learning - Google Patents

Info

Publication number
CN112598713A
CN112598713A (application CN202110232509.9A)
Authority
CN
China
Prior art keywords
fish
tracking
detection
box
branch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110232509.9A
Other languages
Chinese (zh)
Inventor
李培良
刘韬
顾艳镇
刘浩杨
李琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202110232509.9A priority Critical patent/CN112598713A/en
Publication of CN112598713A publication Critical patent/CN112598713A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformation in the plane of the image
    • G06T 3/40 Scaling the whole image or part thereof
    • G06T 3/4007 Interpolation-based scaling, e.g. bilinear interpolation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/40 Image enhancement or restoration by the use of histogram techniques
    • G06T 5/90
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20112 Image segmentation details
    • G06T 2207/20132 Image cropping

Abstract

The invention discloses a method for detecting, tracking and counting near-shore seabed fish based on deep learning. An input underwater real-time video is processed by the basic neural network YOLOv5 to extract features, which are fed into a detection branch and a tracking branch; the detection branch outputs the position and type of the fish, the tracking branch outputs the position and type of the fish together with the ID number of each tracked fish, and the output of the tracking branch is corrected by the detection branch to give the final output, from which the position, category and number of the fish in each picture are obtained. The invention matches the results between frames using particle filtering and the KM algorithm, thereby matching the ID numbers of the fish throughout the video.

Description

Offshore submarine fish detection and tracking statistical method based on deep learning
Technical Field
The invention relates to the field of seabed exploration and detection, in particular to a near-shore seabed fish detection, tracking and statistics method based on deep learning.
Background
The ocean has very abundant biological resources; thus, coastal countries are actively developing marine ranches, particularly fishery-aquaculture marine ranches. The Food and Agriculture Organization of the United Nations records that the global edible fish yield of marine ranches in 2016 was 28.7 million tonnes (67.4 billion US dollars), accounting for 49.5 percent of total aquaculture production that year. Currently, offshore fishing is over-developed and the aquaculture industry is approaching saturation; thus, marine ranch operations are considered an important approach to addressing the decline in fishery resources. However, marine ranch operations also face problems (e.g., over-fishing and ecosystem imbalance). By enhancing the monitoring of underwater biological resources, fishing time and intensity can be controlled according to changes in those resources, thereby mitigating these problems. For marine ranches, real-time monitoring of the number of organisms can form the basis of a protection strategy for scientific fishery management and sustainable fish production. In addition, fish resource statistics help researchers understand the abundance of species, and can be analyzed in combination with local sea states to determine the conditions suitable for the survival of each species. Therefore, the technology has important practical significance.
In the last decade, several tracking and detection methods have been introduced in the field of fishery management. Among detection algorithms, the traditional approach is to extract fine features of underwater targets by fusing multi-sensor and multi-feature information. For example, Ishibashi et al. use optical sensors to acquire specific images of underwater targets, and Saini and Biswas detect targets by detecting edges with adaptive thresholds. At present, the mainstream method is to capture the object with an underwater camera and extract features with a deep learning algorithm. Deep learning algorithms such as Fast-RCNN and ResNet have been applied to underwater biometric identification, for example sea cucumber identification (Xia et al., 2018) and fish detection (CN 202010003815.0). The main limitation of such detection algorithms is that they cannot identify whether the fish in two frames are the same animal; therefore, a tracking model is required. Among tracking algorithms, traditional filtering methods such as particle filtering, optical flow and object segmentation are the main approaches, and they have mainly been tested under controlled conditions, such as limited laboratory environments. For example, Chuang tracks fish using object segmentation and block-based stereo matching; this method divides the fish into several parts for matching and ignores the overall characteristics of the fish. Sun proposes a consistent fish tracking strategy for underwater surveillance systems with multiple static cameras and overlapping fields of view, adopting the speeded-up robust features (SURF) technique and a centroid-coordinate homography mapping to capture the fish; however, this method cannot identify the species of fish. Romero-Ferrero proposed an automated method to track all individuals in small or large populations of unmarked animals; their algorithm has high accuracy for groups of fewer than 100 individuals, but must be run in an ideal laboratory environment. Meng-Che proposes a fish segmentation and tracking algorithm that overcomes low contrast and ensures accurate segmentation of fish shape boundaries by applying histogram back-projection to a double-local-threshold image; however, with this method, sudden movements of the fish may cause tracking failures, and the algorithm is too complex to achieve real-time tracking.
In recent years, several methods for tracking fish abundance and automatically counting fish populations using machine vision have been proposed. For example, Song et al. (2020) propose an automatic fish counting method based on a hybrid neural network model to achieve real-time, accurate, objective and lossless fish population counting in ocean salmon farming. The method adopts a multi-column convolutional neural network as the front end to capture feature information from different receptive fields, while the back end adopts a wider and deeper dilated convolutional neural network to reduce the loss of spatial structure information during network transmission; finally, a hybrid neural network model is constructed. However, the main limitation of this method is that fish are regarded as particles and cannot be classified by type. Marini et al. (2018) developed a content-based image analysis method based on genetic programming. However, crowded scenes limit recognition efficiency when a large number of fish gather in front of the camera; when these aggregations are particularly dense, individuals often overlap, which increases the false negative rate.
Disclosure of Invention
Most of the above machine-vision-based fish detection and tracking methods do not address the problematic scenes and practical difficulties of the harsh environment of a real marine ranch. In particular, multi-class multi-target real-time tracking and the recognition difficulties caused by high underwater turbidity remain unsolved. When the prior art adopts traditional image processing, the high algorithmic complexity leads to high time complexity, so it is difficult to meet real-time requirements while maintaining high accuracy; in the latest techniques adopting deep learning, target detection and target tracking are not integrated, so they must run step by step, incurring additional time and storage overhead for intermediate results, while the complex underwater environment causes a high misrecognition rate.
In order to solve the defects of the prior art, the invention provides the following technical scheme:
a near-shore submarine fish detection and tracking statistical method based on deep learning comprises the following steps:
step 1, obtaining fish image information, preprocessing the fish image information, and sending the preprocessed fish image information into a trained FDT neural network structure, from which three feature maps of different scales are extracted after passing through a basic neural network, wherein the basic neural network is the target detection network YOLOv5, and the target detection network adopts a parallel double-branch structure based on deep learning for detecting and tracking fish in real time in a real marine ranch environment;
step 2, inputting the three feature maps of different scales into a detection branch and a tracking branch, wherein the features are modeled and nonlinearly transformed in the detection branch and the tracking branch; the detection branch outputs the position and category of the fish in the image; the tracking branch outputs the position and category of the fish in the image together with the serial number ID of each tracked fish; the output result of the tracking branch is corrected by the output result of the detection branch and taken as the final output result, in which the position, category and serial number ID of the fish in each picture are recorded;
step 3, obtaining the final output result, and carrying out online tracking on the final output result;
and 4, according to the online tracking result, matching the results between frames by using particle filtering and the KM algorithm so as to match the serial numbers of the fish, and associating the identified and tracked fish with the statistics of the multi-class fish schools.
Further, in step 1, obtaining fish image information, and preprocessing the fish image information, specifically including: acquiring an acquired underwater real-time video, cutting a fish image data set from the underwater real-time video, performing contrast processing on images in the fish image data set, and cutting the images subjected to the contrast processing to a specified size from an original size, wherein the original size is 1920 x 1080.
Further, the cutting from the original size to the designated size specifically comprises: calculating a scaling factor by taking the longest edge of the image as the reference edge, scaling the whole image to 608 × 342 by bilinear interpolation, and then zero-padding the upper and lower edges of the image to obtain a cropped image with the specified size of 608 × 608.
Further, performing contrast processing on the images in the fish image dataset specifically comprises: the histogram of the RGB channels is color compensated, and the compensated image is then processed with contrast limited adaptive histogram equalization (CLAHE).
Further, performing color compensation on the histogram of the RGB channels specifically comprises:
dividing the image into the three RGB channels and calculating the average value of each channel, $b_{avg}$, $g_{avg}$ and $r_{avg}$, respectively, with the minimum of the three averages taken as the correction parameter value for the color shift, $value = \min\{b_{avg}, g_{avg}, r_{avg}\}$;
the average value of each channel is calculated by the following formula:

$$x_{avg} = \frac{1}{n\,m}\sum_{i=0}^{n-1}\sum_{j=0}^{m-1} x(i, j), \qquad x \in \{b, g, r\}$$

wherein the index i ranges from 0 to n-1 and the index j ranges from 0 to m-1 over the image pixels;
if the corrected result is less than 0, the channel value at position (i, j) is defined as 0; otherwise, the pixel value is corrected using the channel average and the correction parameter value.
Further, performing online tracking on the final output result specifically comprises: extracting posterior boxes by a non-maximum suppression method based on the heat map scores; determining the positions of key points whose heat map scores are greater than a threshold, and calculating the corresponding posterior boxes according to the estimated offsets and the predicted box sizes; and implementing box linking using an online tracking algorithm.
Further, implementing box linking using the online tracking algorithm specifically comprises: initializing the set of tracking trajectories based on the detection boxes in the first frame, and setting a threshold; particle filtering is used to predict the location of each tracking trajectory in the current frame, and in subsequent frames the Re-ID features and the IoU measurements are used to link detections to the set of tracking trajectories when the distance between these boxes is greater than the threshold.
Further, the training process of the FDT neural network structure is as follows: the training sample tensor of the original reconstructed image and the training tensor of the target ground-truth image are input into the FDT neural network structure, and the FDT neural network structure is trained cyclically until the loss function output by the network falls below a set threshold.
Further, the calculation formula of the loss function is as follows:

$$loss = L_{box} + L_{cls} + L_{id}$$

wherein $L_{box}$ represents the posterior box loss, $L_{cls}$ represents the category loss, and $L_{id}$ represents the Re-ID loss.

Further, the posterior box loss $L_{box}$ is calculated as follows:

$$L_{box} = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v$$

wherein $b^{gt}$ is the center-point coordinate of the ground-truth posterior box, $b$ is the center-point coordinate of the predicted bounding box, $IoU$ is the intersection over union of the ground-truth posterior box area and the predicted prior box area, $\rho$ is the Euclidean distance between the two center points, $c$ represents the diagonal distance of the minimum closure area containing the ground-truth posterior box and the predicted prior box, and $\alpha$ and $v$ are two influencing factors; $IoU$, $\alpha$ and $v$ are calculated as follows:

$$IoU = \frac{A \cap A^{gt}}{A \cup A^{gt}}, \qquad v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}, \qquad \alpha = \frac{v}{(1 - IoU) + v}$$

wherein $w^{gt}$ and $h^{gt}$ are the width and height of the ground-truth posterior box, $w$ and $h$ are the width and height of the predicted prior box, $A^{gt}$ represents the area of the ground-truth posterior box, and $A$ represents the area of the predicted prior box;

the category loss $L_{cls}$ uses the cross entropy, calculated as follows:

$$L_{cls} = -\sum_{i=0}^{S^{2}}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \sum_{c \in classes} \hat{p}_{i}(c)\,\log p_{i}(c)$$

when the j-th prior box of the i-th grid cell is responsible for a certain real object, the posterior box generated by that prior box contributes to the classification loss; when the object is real, $\mathbb{1}_{ij}^{obj} = 1$, otherwise $\mathbb{1}_{ij}^{obj} = 0$;

considering the Re-ID loss, the Re-ID embedding is treated as a classification task in which all object instances of the same identity are treated as one class; an identity feature vector is extracted at position (i, j), and a mapping to a class distribution vector $p = \{p(k),\, k \in [1, K]\}$ is learned; with the class label of the ground-truth posterior box expressed as $L^{i}(k)$, the softmax loss $L_{id}$ is calculated as follows:

$$L_{id} = -\sum_{i=1}^{N}\sum_{k=1}^{K} L^{i}(k)\,\log p(k)$$

where K is the number of classes and N is the number of ground-truth posterior boxes.
The software for detecting, tracking and counting underwater seabed fish operates as follows: collecting the real-time video; extracting features from the video through the basic neural network (YOLOv5); inputting the features into the detection branch and the tracking branch respectively, where each branch models the features and applies nonlinear transformations and the detection branch outputs the position and type of the fish in the picture, wherein: 1) the tracking branch outputs the position and type of the fish and the ID (number) of each tracked fish; 2) the output of the tracking branch is corrected by the detection branch as the final output; and the online tracking part, i.e. particle filtering and the KM algorithm, matches the results between frames so as to match the numbers of the fish in the video.
The invention provides an end-to-end neural network framework that outputs results directly while the video is being input, and provides an image enhancement algorithm at the input end, which significantly improves image quality and thus recognition accuracy.
For the FDT algorithm, when a section of underwater real-time video is input, features in the video, such as fish texture, shape and size, are extracted after it passes through the basic neural network (YOLOv5); the features are then input into the detection branch and the tracking branch respectively, each branch models the features and applies nonlinear transformations, and the detection branch outputs the position and type of the fish in the picture. The tracking branch outputs the position and type of the fish and the ID (number) of each tracked fish. Finally, the output of the tracking branch is corrected by the detection branch as the final output, giving the position, category and number of the fish in each picture. The online tracking part, i.e. particle filtering and the KM algorithm, then matches the results between frames so as to match the numbers of the fish in the video.
In the process, detection and multi-target tracking algorithms are fused into a framework, so that tracking statistics of multi-class fish schools can be realized, an end-to-end unified neural network architecture is adopted, online processing can be realized, and a statistical result is output while a video is input.
Drawings
In order to more clearly illustrate the embodiments of the present application or technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings can be obtained by those skilled in the art according to the drawings.
Fig. 1 is a comparison between an original image and the corresponding image after preprocessing, provided by an embodiment of the present invention;
FIG. 2 is a diagram of an FDT algorithm architecture provided by an embodiment of the present invention;
FIG. 3 is a process diagram of the training and testing phases provided by an embodiment of the present invention;
fig. 4 is a schematic diagram of the OceanEye software main interface according to the embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a near-shore submarine fish detection and tracking statistical method based on deep learning, which is applied to fish population statistics.
The fish population statistical technique mainly comprises two parts: accurately identifying the underwater fish, and matching the fish identified in each frame to form tracking trajectories.
The first part is the object recognition task. In the field of computer vision, many target recognition algorithms have reached the accuracy of human-eye recognition, but they focus on object recognition on land. Recognition accuracy in complex underwater environments is low: light is refracted and reflected during underwater transmission, illumination in turbid water is uneven, attenuation rates differ across wavelengths, and color shift occurs in the underwater environment. For these reasons, images photographed underwater suffer quality degradation such as low contrast, color distortion and blurred texture, making it difficult to distinguish marine animals.
The second part is the multi-target tracking task: matching the objects identified in consecutive frames and determining the ID of each object in the video. Current multi-target tracking algorithms mainly focus on single-class multi-target tracking; little work addresses multi-class multi-target tracking, the mainstream approach being a two-stage identify-then-track algorithm that cannot meet real-time requirements, while existing end-to-end algorithms that identify and track simultaneously have lower precision. Furthermore, many fish are similar in size and appearance, so it is difficult to distinguish them by texture and size, and many experiments are required to create a usable model; moreover, fish swim irregularly in all directions, so fish deformation and occlusion occur frequently.
The invention provides a near-shore submarine fish detection, tracking and statistics method based on deep learning, which comprises the following steps:
step 1, obtaining fish image information, preprocessing the fish image information, and sending the preprocessed fish image information into a trained FDT neural network structure, from which three feature maps of different scales are extracted after passing through a basic neural network, wherein the basic neural network is the target detection network YOLOv5, and the target detection network adopts a parallel double-branch structure based on deep learning for detecting and tracking fish in real time in a real marine ranch environment;
in this step, the picture processed in step 1 is input into an FDT neural network structure, and features in the picture, such as fish texture, fish shape, fish size, etc., are extracted after passing through a basic neural network YOLOv 5; the FDT neural network structure is shown in fig. 2.
In step 1, obtaining fish image information, and preprocessing the fish image information, specifically including:
acquiring an acquired underwater real-time video, cutting a fish image data set from the underwater real-time video, performing contrast processing on images in the fish image data set, and cutting the images subjected to the contrast processing to a specified size from an original size, wherein the original size is 1920 x 1080;
the cutting from the original size to the specified size specifically comprises:
calculating a scaling factor by taking the longest edge of the image as a reference edge, scaling the whole image to 608 × 342 through bilinear interpolation, then performing zero filling on the upper edge and the lower edge of the image, and finally obtaining a cut image with the specified size of 608 × 608;
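By way of illustration, this scaling-and-padding step can be sketched in Python with OpenCV; the function name, the rounding details and the landscape-input assumption are illustrative, not part of the claimed method:

```python
import cv2

def letterbox_resize(image, target=608):
    """Scale by the longest edge, then zero-pad the top and bottom to a square.

    Assumes a landscape input such as 1920 x 1080, as in the patent text."""
    h, w = image.shape[:2]
    scale = target / max(h, w)                          # scaling factor from the longest edge
    new_w, new_h = round(w * scale), round(h * scale)   # 1920 x 1080 -> 608 x 342
    resized = cv2.resize(image, (new_w, new_h),
                         interpolation=cv2.INTER_LINEAR)  # bilinear interpolation
    pad = target - new_h                                # rows of zeros to reach 608 x 608
    top, bottom = pad // 2, pad - pad // 2
    return cv2.copyMakeBorder(resized, top, bottom, 0, 0,
                              cv2.BORDER_CONSTANT, value=(0, 0, 0))
```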
performing contrast processing on the images in the fish image dataset specifically comprises: the histogram of the RGB channels is color compensated, and the compensated image is then processed with contrast limited adaptive histogram equalization (CLAHE).
The contrast processing effectively addresses underwater color distortion, low contrast and blurred details; the contrast limited adaptive histogram equalization (CLAHE) algorithm then improves image contrast, resolving the color shift and contrast problems and improving recognition accuracy. The images before and after enhancement are shown in fig. 1.
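As an illustrative sketch of the CLAHE step: the patent does not fix the CLAHE parameters or the working color space, so applying OpenCV's CLAHE to the L channel of the LAB space, a common choice, is assumed here:

```python
import cv2

def apply_clahe(image_bgr, clip_limit=2.0, grid=(8, 8)):
    """Contrast limited adaptive histogram equalization on the luminance channel."""
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=grid)
    merged = cv2.merge((clahe.apply(l), a, b))
    return cv2.cvtColor(merged, cv2.COLOR_LAB2BGR)
```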
Performing color compensation on the histogram of the RGB channels specifically comprises:
dividing the image into the three RGB channels and calculating the average value of each channel, $b_{avg}$, $g_{avg}$ and $r_{avg}$, respectively, with the minimum of the three averages taken as the correction parameter value for the color shift, $value = \min\{b_{avg}, g_{avg}, r_{avg}\}$;
the average value of each channel is calculated by the following formula:

$$x_{avg} = \frac{1}{n\,m}\sum_{i=0}^{n-1}\sum_{j=0}^{m-1} x(i, j), \qquad x \in \{b, g, r\}$$

wherein the index i ranges from 0 to n-1 and the index j ranges from 0 to m-1 over the image pixels;
if the corrected result is less than 0, the channel value at position (i, j) is defined as 0; otherwise, the pixel value is corrected using the channel average and the correction parameter value.
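A minimal NumPy sketch of this color compensation follows. The text does not spell out the exact correction formula, so shifting each channel by the gap between its mean and the minimum mean is an assumed concrete form:

```python
import numpy as np

def color_compensate(image_bgr):
    """Shift each channel by (channel mean - minimum channel mean), clamping at 0.

    The per-channel shift is an assumption; the patent states only that the
    pixel value is corrected using the channel average and the correction
    parameter value, with negative results defined as 0."""
    img = image_bgr.astype(np.float32)
    means = img.reshape(-1, 3).mean(axis=0)   # b_avg, g_avg, r_avg
    value = means.min()                       # correction parameter value
    corrected = img - (means - value)         # per-channel color-shift correction
    corrected[corrected < 0] = 0              # negative results defined as 0
    return corrected.astype(np.uint8)
```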
Step 2, inputting the three feature maps of different scales into a detection branch and a tracking branch, wherein the features are modeled and nonlinearly transformed in the detection branch and the tracking branch; the detection branch outputs the position and category of the fish in the image; the tracking branch outputs the position and category of the fish in the image together with the serial number ID of each tracked fish; the output result of the tracking branch is corrected by the output result of the detection branch and taken as the final output result, in which the position, category and serial number ID of the fish in each picture are recorded;
The detection branch and the tracking branch share the same basic feature extraction network to reduce the amount of computation. The first output of the detection branch has size 13 × 13 × 45: the first value 13 is the height of the feature map, the second value 13 is its width, and the third value 45 comprises, for each channel, the x and y coordinates of the predicted center point, the width and height of the posterior box, the confidence, and the probabilities of the 9 fish categories. Similarly, the second output has size 26 × 26 × 45 and the third 52 × 52 × 45. The tracking branch uses a one-shot MOT method with four outputs: the center point, the posterior box size, the center offset, and the Re-ID feature. The center point gives the x and y coordinates of the detected object. The posterior box output gives the height and width of the target box. The center offset output locates the target more accurately, reducing the quantization error introduced by the feature-map stride. The Re-ID feature is used to identify whether the same object exists in different frames. The tracking branch output is then corrected by the detection branch output: the center points of targets are matched between the two branch outputs, and if the center point of a tracked target is the same as a center point from the detection branch, the posterior box output by the tracking branch for that target is replaced by the detection branch's box. If a center point from the detection branch is not contained in the tracking list, the detected target is added to the tracking list. Finally, online tracking is performed on the center point, posterior box, fish category and Re-ID of each tracked target.
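The correction of the tracking branch by the detection branch can be sketched as follows; the dictionary layout, field names and the same_point predicate are illustrative assumptions, not structures defined by the patent:

```python
def correct_tracking_with_detection(track_objs, det_objs, same_point):
    """Replace a tracked target's posterior box with the detection branch's box
    when their center points match; append unmatched detections as new targets."""
    for det in det_objs:
        matched = False
        for trk in track_objs:
            if same_point(trk['center'], det['center']):
                trk['box'] = det['box']    # detection box is the more accurate one
                trk['cls'] = det['cls']    # class comes from the detection branch
                matched = True
                break
        if not matched:
            track_objs.append({**det, 'id': None})  # new target enters the tracking list
    return track_objs
```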
The advantages of combining the detection and tracking branches are summarized below. Multiple object classes cannot be tracked simultaneously using the tracking branch alone, but the class of a tracked object can be identified from the additional class output of the detection branch, realizing multi-class synchronous tracking. In addition, the posterior box output of the detection branch is more accurate than that of the tracking branch, which improves tracking precision and online tracking time. Finally, in complex underwater situations, such as when two fish overlap, the detection branch can extract more detailed features to track the correct target.
Step 3, obtaining the final output result, and carrying out online tracking on the final output result;
Performing online tracking on the final output result specifically comprises: extracting posterior boxes by a non-maximum suppression method based on the heat map scores; determining the positions of key points whose heat map scores are greater than a threshold, and calculating the corresponding posterior boxes according to the estimated offsets and the predicted box sizes; and implementing box linking using an online tracking algorithm.
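A common realization of heat-map-based non-maximum suppression is the max-pooling trick, shown here as a PyTorch sketch; the 3 × 3 window, the tensor layout (1, 1, H, W) and the threshold value are assumptions rather than values fixed by the patent:

```python
import torch
import torch.nn.functional as F

def heatmap_peaks(heatmap, threshold=0.3):
    """Keep only local maxima of the heat map, then threshold the scores."""
    pooled = F.max_pool2d(heatmap, 3, stride=1, padding=1)
    keep = (pooled == heatmap).float()        # 1 where the pixel is a local maximum
    peaks = heatmap * keep                    # suppress non-maximum responses
    ys, xs = torch.where(peaks[0, 0] > threshold)
    return xs, ys, peaks[0, 0][ys, xs]        # key-point positions and scores
```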
Implementing box linking using the online tracking algorithm specifically comprises: initializing the set of tracking trajectories based on the detection boxes in the first frame, and setting a threshold; particle filtering is used to predict the location of each tracking trajectory in the current frame, and in subsequent frames the Re-ID features and the IoU measurements are used to link detections to the set of tracking trajectories when the distance between these boxes is greater than the threshold.
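The particle filter used to predict each trajectory's location admits many concrete forms; the following sketch assumes a constant-velocity state model with Gaussian noise and 100 particles, none of which are specified by the patent:

```python
import numpy as np

class ParticleFilterTrack:
    """Minimal particle filter over one track's box center (x, y, vx, vy)."""

    def __init__(self, cx, cy, n=100, noise=5.0):
        self.particles = np.tile([cx, cy, 0.0, 0.0], (n, 1)).astype(np.float64)
        self.noise = noise

    def predict(self):
        self.particles[:, :2] += self.particles[:, 2:]   # move by current velocity
        self.particles += np.random.normal(0.0, self.noise, self.particles.shape)
        return self.particles[:, :2].mean(axis=0)        # predicted center

    def update(self, cx, cy):
        d = np.linalg.norm(self.particles[:, :2] - [cx, cy], axis=1)
        w = np.exp(-0.5 * (d / self.noise) ** 2) + 1e-12  # Gaussian weights
        w /= w.sum()
        idx = np.random.choice(len(w), size=len(w), p=w)  # resample by weight
        self.particles = self.particles[idx]
```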
In this step, the two parallel branches obtain the posterior box and the Re-ID, as shown in FIG. 3.
And 4, according to the online tracking result, matching the results between frames by using particle filtering and the KM algorithm so as to match the serial numbers of the fish, and associating the identified and tracked fish with the statistics of the multi-class fish schools.
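The KM (Kuhn-Munkres) matching between frames can be sketched with SciPy's Hungarian solver; the composition of the cost matrix from Re-ID feature distance and (1 - IoU), and the gating value, are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def km_match(cost, max_cost=0.7):
    """Assign detections to tracks by minimum-cost matching, then gate by cost.

    cost: tracks x detections matrix, e.g. a weighted mix of Re-ID distance
    and (1 - IoU)."""
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]
    unmatched_tracks = sorted(set(range(cost.shape[0])) - {r for r, _ in matches})
    unmatched_dets = sorted(set(range(cost.shape[1])) - {c for _, c in matches})
    return matches, unmatched_tracks, unmatched_dets
```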
In the process, detection and multi-target tracking algorithms are fused into a framework, so that tracking statistics of multi-class fish schools can be realized, an end-to-end unified neural network architecture is adopted, online processing can be realized, and a statistical result is output while a video is input.
The training process of the FDT neural network structure is as follows: the training sample tensor of the original reconstructed image and the training tensor of the target ground-truth image are input into the FDT neural network structure, and the FDT neural network structure is trained cyclically until the loss function output by the network falls below a set threshold.
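A schematic training loop for this stopping criterion, assuming PyTorch-style model, optimizer and data loader objects (the threshold value is illustrative):

```python
def train_fdt(model, loader, optimizer, loss_fn, stop_loss=0.05):
    """Cyclic training until the average loss falls below a set threshold."""
    while True:
        epoch_loss = 0.0
        for images, targets in loader:      # reconstructed-image / ground-truth pairs
            optimizer.zero_grad()
            loss = loss_fn(model(images), targets)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        if epoch_loss / len(loader) < stop_loss:
            break
```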
The loss function $loss$ is composed of three parts, namely the posterior box loss $L_{box}$ (the loss function of the predicted true position), the category loss $L_{cls}$ (the loss function of the predicted species class) and the Re-ID loss $L_{id}$ (the loss function of the predicted biological number), calculated as follows:

$$loss = L_{box} + L_{cls} + L_{id}$$

The posterior box loss $L_{box}$ is calculated as follows:

$$L_{box} = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v$$

In this formula the overlapping area, the distance between the center points and the aspect ratio are considered simultaneously, so the CIoU obtains better convergence speed and precision on the BBox regression problem.

Here $b^{gt}$ is the center-point coordinate of the ground-truth posterior box, $b$ is the center-point coordinate of the predicted bounding box, $IoU$ is the intersection over union of the ground-truth posterior box area and the predicted prior box area, $\rho$ is the Euclidean distance between the two center points, $c$ represents the diagonal distance of the minimum closure area that can contain the ground-truth posterior box and the predicted prior box, and $\alpha$ and $v$ are two influencing factors; $IoU$, $\alpha$ and $v$ are calculated as follows:

$$IoU = \frac{A \cap A^{gt}}{A \cup A^{gt}}, \qquad v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}, \qquad \alpha = \frac{v}{(1 - IoU) + v}$$

wherein $w^{gt}$ and $h^{gt}$ are the width and height of the ground-truth posterior box, $w$ and $h$ are the width and height of the predicted prior box, $A^{gt}$ refers to the area of the ground-truth posterior box, and $A$ refers to the area of the predicted prior box.

The category loss $L_{cls}$ uses the cross entropy, calculated as follows:

$$L_{cls} = -\sum_{i=0}^{S^{2}}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \sum_{c \in classes} \hat{p}_{i}(c)\,\log p_{i}(c)$$

When the j-th prior box of the i-th grid cell is responsible for a certain real object, the posterior box generated by that prior box contributes to the classification loss; when the object is real, $\mathbb{1}_{ij}^{obj} = 1$, otherwise $\mathbb{1}_{ij}^{obj} = 0$.

Considering the Re-ID loss, the Re-ID embedding is treated as a classification task in which all object instances of the same identity are treated as one class; an identity feature vector is extracted at position (i, j), and a mapping to a class distribution vector $p = \{p(k),\, k \in [1, K]\}$ is learned; with the class label of the ground-truth posterior box expressed as $L^{i}(k)$, the softmax loss $L_{id}$ is calculated as follows:

$$L_{id} = -\sum_{i=1}^{N}\sum_{k=1}^{K} L^{i}(k)\,\log p(k)$$

where K is the number of classes and N is the number of ground-truth posterior boxes.
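The posterior box (CIoU) loss defined above may be sketched in PyTorch as follows; the (cx, cy, w, h) tensor layout for boxes of shape (N, 4) is an assumption:

```python
import math
import torch

def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss: 1 - IoU + rho^2 / c^2 + alpha * v, averaged over N boxes."""
    px, py, pw, ph = pred.unbind(-1)
    gx, gy, gw, gh = gt.unbind(-1)
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    g_x1, g_y1, g_x2, g_y2 = gx - gw / 2, gy - gh / 2, gx + gw / 2, gy + gh / 2
    iw = (torch.min(p_x2, g_x2) - torch.max(p_x1, g_x1)).clamp(0)   # intersection width
    ih = (torch.min(p_y2, g_y2) - torch.max(p_y1, g_y1)).clamp(0)   # intersection height
    inter = iw * ih
    union = pw * ph + gw * gh - inter + eps
    iou = inter / union
    rho2 = (px - gx) ** 2 + (py - gy) ** 2              # squared center distance
    cw = torch.max(p_x2, g_x2) - torch.min(p_x1, g_x1)  # enclosing box width
    ch = torch.max(p_y2, g_y2) - torch.min(p_y1, g_y1)  # enclosing box height
    c2 = cw ** 2 + ch ** 2 + eps                        # squared closure diagonal
    v = (4 / math.pi ** 2) * (torch.atan(gw / (gh + eps))
                              - torch.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)                   # influence factor
    return (1 - iou + rho2 / c2 + alpha * v).mean()
```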
In a more preferred embodiment, the invention also provides software for detecting and tracking underwater seabed fish based on the above algorithm. The software interface is shown in fig. 4: the input video is loaded by way of cloud storage, and the processed video and the corresponding statistical results of each category are displayed in real time. When a section of underwater real-time video is input, features in the video, such as fish texture, shape and size, are extracted after it passes through the basic neural network (YOLOv5); the features are then input into the detection branch and the tracking branch respectively, each branch models the features and applies nonlinear transformations, and the detection branch outputs the position and type of the fish in the picture. The tracking branch outputs the position and type of the fish and the ID (number) of each tracked fish. Finally, the output of the tracking branch is corrected by the detection branch as the final output, giving the position, category and number of the fish in each picture. The online tracking part, i.e. particle filtering and the KM algorithm, then matches the results between frames so as to match the numbers of the fish in the video. In this process, the detection and multi-target tracking algorithms are fused into one framework, so tracking statistics of multi-class fish schools can be realized; an end-to-end unified neural network architecture is adopted, online processing can be realized, and the statistical results are output while the video is input.
Furthermore, the software detection and tracking process comprises: collecting the real-time video; extracting features from the video through the basic neural network (YOLOv5); inputting the features into the detection branch and the tracking branch respectively, where each branch models the features and applies nonlinear transformations and the detection branch outputs the position and type of the fish in the picture, wherein: 1) the tracking branch outputs the position and type of the fish and the ID (number) of each tracked fish; 2) the output of the tracking branch is corrected by the detection branch as the final output; and the online tracking part, i.e. particle filtering and the KM algorithm, matches the results between frames so as to match the numbers of the fish in the video.
While certain exemplary embodiments of the present invention have been described above by way of illustration only, it will be apparent to those of ordinary skill in the art that the described embodiments may be modified in various different ways without departing from the spirit and scope of the invention. Accordingly, the drawings and description are illustrative in nature and should not be construed as limiting the scope of the invention.

Claims (10)

1. A near-shore submarine fish detection and tracking statistical method based on deep learning is characterized by comprising the following steps:
step 1, obtaining fish image information, preprocessing the fish image information, and sending the preprocessed fish image information into a trained FDT neural network structure, from which three feature maps of different scales are extracted after passing through a basic neural network, wherein the basic neural network is the target detection network YOLOv5, and the target detection network adopts a parallel double-branch structure based on deep learning for detecting and tracking fish in real time in a real marine ranch environment;
step 2, inputting the three feature maps of different scales into a detection branch and a tracking branch, wherein the features are modeled and nonlinearly transformed in the detection branch and the tracking branch; the detection branch outputs the position and category of the fish in the image; the tracking branch outputs the position and category of the fish in the image together with the serial number ID of each tracked fish; the output result of the tracking branch is corrected by the output result of the detection branch and taken as the final output result, in which the position, category and serial number ID of the fish in each picture are recorded;
step 3, obtaining the final output result, and carrying out online tracking on the final output result;
and 4, according to the online tracking result, matching the results between frames by using particle filtering and the KM algorithm so as to match the serial numbers of the fish, and associating the identified and tracked fish with the statistics of the multi-class fish schools.
2. The method for detecting, tracking and counting fish on the near shore seabed based on deep learning of claim 1, wherein in step 1, fish image information is obtained, and the fish image information is preprocessed, specifically comprising: acquiring an acquired underwater real-time video, cutting a fish image data set from the underwater real-time video, performing contrast processing on images in the fish image data set, and cutting the images subjected to the contrast processing to a specified size from an original size, wherein the original size is 1920 x 1080.
3. The deep learning-based offshore fish detection and tracking statistical method according to claim 2, wherein the cutting from an original size to a specified size specifically comprises: calculating a scaling factor by taking the longest edge of the image as the reference edge, scaling the whole image to 608 × 342 by bilinear interpolation, and then zero-padding the upper and lower edges of the image to obtain a cropped image with the specified size of 608 × 608.
4. The method for offshore submarine fish detection and tracking statistics based on deep learning according to claim 2, wherein the contrast processing of the images in the fish image dataset specifically comprises: the histogram of the RGB channels is color compensated, and the compensated image is then processed with contrast limited adaptive histogram equalization (CLAHE).
5. The method for offshore fish detection and tracking statistics based on deep learning as claimed in claim 4, wherein the color compensation of the histogram of the RGB channels specifically comprises:
dividing the image into the three RGB channels and calculating the average value of each channel, $b_{avg}$, $g_{avg}$ and $r_{avg}$, respectively, with the minimum of the three averages taken as the correction parameter value for the color shift, $value = \min\{b_{avg}, g_{avg}, r_{avg}\}$;
the average value of each channel is calculated by the following formula:

$$x_{avg} = \frac{1}{n\,m}\sum_{i=0}^{n-1}\sum_{j=0}^{m-1} x(i, j), \qquad x \in \{b, g, r\}$$

wherein the index i ranges from 0 to n-1 and the index j ranges from 0 to m-1 over the image pixels;
if the corrected result is less than 0, the channel value at position (i, j) is defined as 0; otherwise, the pixel value is corrected using the channel average and the correction parameter value.
6. The method for offshore submarine fish detection and tracking statistics based on deep learning of claim 1, wherein the online tracking of the final output result specifically comprises: extracting posterior boxes by a non-maximum suppression method based on the heat map scores; determining the positions of key points whose heat map scores are greater than a threshold, and calculating the corresponding posterior boxes according to the estimated offsets and the predicted box sizes; and implementing box linking using an online tracking algorithm.
7. The method for offshore fish detection and tracking statistics based on deep learning as claimed in claim 6, wherein implementing box linking using the online tracking algorithm specifically comprises: initializing the set of tracking trajectories based on the detection boxes in the first frame, and setting a threshold; particle filtering is used to predict the location of each tracking trajectory in the current frame, and in subsequent frames the Re-ID features and the IoU measurements are used to link detections to the set of tracking trajectories when the distance between these boxes is greater than the threshold.
8. The deep learning-based offshore fish detection and tracking statistical method according to claim 1, wherein the FDT neural network structure is trained as follows: the training sample tensor of the original reconstructed image and the training tensor of the target ground-truth image are input into the FDT neural network structure, and the FDT neural network structure is trained cyclically until the loss function output by the network falls below a set threshold.
9. The deep learning-based offshore fish detection and tracking statistical method according to claim 8, wherein the loss function is calculated by the following formula:

$$loss = L_{box} + L_{cls} + L_{id}$$

wherein $L_{box}$ represents the posterior box loss, $L_{cls}$ represents the category loss, and $L_{id}$ represents the Re-ID loss.
10. The offshore seafloor fish detection and tracking statistical method based on deep learning of claim 9, wherein
the posterior box loss $L_{box}$ is calculated as follows:

$$L_{box} = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v$$

wherein $b^{gt}$ is the center-point coordinate of the ground-truth posterior box, $b$ is the center-point coordinate of the predicted bounding box, $IoU$ is the intersection over union of the ground-truth posterior box area and the predicted prior box area, $\rho$ is the Euclidean distance between the two center points, $c$ represents the diagonal distance of the minimum closure area containing the ground-truth posterior box and the predicted prior box, and $\alpha$ and $v$ are two influencing factors; $IoU$, $\alpha$ and $v$ are calculated as follows:

$$IoU = \frac{A \cap A^{gt}}{A \cup A^{gt}}, \qquad v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}, \qquad \alpha = \frac{v}{(1 - IoU) + v}$$

wherein $w^{gt}$ and $h^{gt}$ are the width and height of the ground-truth posterior box, $w$ and $h$ are the width and height of the predicted prior box, $A^{gt}$ represents the area of the ground-truth posterior box, and $A$ represents the area of the predicted prior box;

the category loss $L_{cls}$ uses the cross entropy, calculated as follows:

$$L_{cls} = -\sum_{i=0}^{S^{2}}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \sum_{c \in classes} \hat{p}_{i}(c)\,\log p_{i}(c)$$

when the j-th prior box of the i-th grid cell is responsible for a certain real object, the posterior box generated by that prior box contributes to the classification loss; when the object is real, $\mathbb{1}_{ij}^{obj} = 1$, otherwise $\mathbb{1}_{ij}^{obj} = 0$;

considering the Re-ID loss, the Re-ID embedding is treated as a classification task in which all object instances of the same identity are treated as one class; an identity feature vector is extracted at position (i, j), and a mapping to a class distribution vector $p = \{p(k),\, k \in [1, K]\}$ is learned; with the class label of the ground-truth posterior box expressed as $L^{i}(k)$, the softmax loss $L_{id}$ is calculated as follows:

$$L_{id} = -\sum_{i=1}^{N}\sum_{k=1}^{K} L^{i}(k)\,\log p(k)$$

where K is the number of classes and N is the number of ground-truth posterior boxes.
CN202110232509.9A 2021-03-03 2021-03-03 Offshore submarine fish detection and tracking statistical method based on deep learning Pending CN112598713A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110232509.9A CN112598713A (en) 2021-03-03 2021-03-03 Offshore submarine fish detection and tracking statistical method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110232509.9A CN112598713A (en) 2021-03-03 2021-03-03 Offshore submarine fish detection and tracking statistical method based on deep learning

Publications (1)

Publication Number Publication Date
CN112598713A true CN112598713A (en) 2021-04-02

Family

ID=75210140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110232509.9A Pending CN112598713A (en) 2021-03-03 2021-03-03 Offshore submarine fish detection and tracking statistical method based on deep learning

Country Status (1)

Country Link
CN (1) CN112598713A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113112726A (en) * 2021-05-11 2021-07-13 创新奇智(广州)科技有限公司 Intrusion detection method, device, equipment, system and readable storage medium
CN113326850A (en) * 2021-08-03 2021-08-31 中国科学院烟台海岸带研究所 Example segmentation-based video analysis method for group behavior of Charybdis japonica
CN113379746A (en) * 2021-08-16 2021-09-10 深圳荣耀智能机器有限公司 Image detection method, device, system, computing equipment and readable storage medium
CN113569971A (en) * 2021-08-02 2021-10-29 浙江索思科技有限公司 Image recognition-based catch target classification detection method and system
CN113780127A (en) * 2021-08-30 2021-12-10 武汉理工大学 Ship positioning and monitoring system and method
CN114037737A (en) * 2021-11-16 2022-02-11 浙江大学 Neural network-based offshore submarine fish detection and tracking statistical method
CN114049477A (en) * 2021-11-16 2022-02-15 中国水利水电科学研究院 Fish passing fishway system and dynamic identification and tracking method for fish quantity and fish type
CN114463675A (en) * 2022-01-11 2022-05-10 北京市农林科学院信息技术研究中心 Underwater fish group activity intensity identification method and device
CN115063378A (en) * 2022-06-27 2022-09-16 中国平安财产保险股份有限公司 Intelligent counting method, device, equipment and storage medium
CN115953725A (en) * 2023-03-14 2023-04-11 浙江大学 Fish egg automatic counting system based on deep learning and counting method thereof
TWI801911B (en) * 2021-06-18 2023-05-11 國立臺灣海洋大學 Aquatic organism identification method and system
CN116721132A (en) * 2023-06-20 2023-09-08 中国农业大学 Multi-target tracking method, system and equipment for industrially cultivated fishes
CN117292305A (en) * 2023-11-24 2023-12-26 中国科学院水生生物研究所 Method, system, electronic equipment and medium for determining fetal movement times of fish fertilized eggs

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805064A (en) * 2018-05-31 2018-11-13 中国农业大学 A kind of fish detection and localization and recognition methods and system based on deep learning

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805064A (en) * 2018-05-31 2018-11-13 中国农业大学 A kind of fish detection and localization and recognition methods and system based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHASHA LIU等: ""Embedded Online Fish Detection and Tracking System via YOLOv3 and Parallel Correlation Filter"", 《OCEANS 2018 MTS/IEEE CHARLESTON》 *
TAO LIU等: ""Multi-class fish stock statistics technology based on object classification and tracking algorithm"", 《ECOLOGICAL INFORMATICS》 *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113112726A (en) * 2021-05-11 2021-07-13 创新奇智(广州)科技有限公司 Intrusion detection method, device, equipment, system and readable storage medium
TWI801911B (en) * 2021-06-18 2023-05-11 國立臺灣海洋大學 Aquatic organism identification method and system
CN113569971B (en) * 2021-08-02 2022-03-25 浙江索思科技有限公司 Image recognition-based catch target classification detection method and system
CN113569971A (en) * 2021-08-02 2021-10-29 浙江索思科技有限公司 Image recognition-based catch target classification detection method and system
CN113326850A (en) * 2021-08-03 2021-08-31 中国科学院烟台海岸带研究所 Example segmentation-based video analysis method for group behavior of Charybdis japonica
CN113326850B (en) * 2021-08-03 2021-10-26 中国科学院烟台海岸带研究所 Example segmentation-based video analysis method for group behavior of Charybdis japonica
CN113379746B (en) * 2021-08-16 2021-11-02 深圳荣耀智能机器有限公司 Image detection method, device, system, computing equipment and readable storage medium
CN113379746A (en) * 2021-08-16 2021-09-10 深圳荣耀智能机器有限公司 Image detection method, device, system, computing equipment and readable storage medium
CN113780127A (en) * 2021-08-30 2021-12-10 武汉理工大学 Ship positioning and monitoring system and method
CN114049477B (en) * 2021-11-16 2023-04-07 中国水利水电科学研究院 Fish passing fishway system and dynamic identification and tracking method for fish quantity and fish type
CN114049477A (en) * 2021-11-16 2022-02-15 中国水利水电科学研究院 Fish passing fishway system and dynamic identification and tracking method for fish quantity and fish type
CN114037737A (en) * 2021-11-16 2022-02-11 浙江大学 Neural network-based offshore submarine fish detection and tracking statistical method
CN114037737B (en) * 2021-11-16 2022-08-09 浙江大学 Neural network-based offshore submarine fish detection and tracking statistical method
CN114463675A (en) * 2022-01-11 2022-05-10 北京市农林科学院信息技术研究中心 Underwater fish group activity intensity identification method and device
CN115063378A (en) * 2022-06-27 2022-09-16 中国平安财产保险股份有限公司 Intelligent counting method, device, equipment and storage medium
CN115063378B (en) * 2022-06-27 2023-12-05 中国平安财产保险股份有限公司 Intelligent point counting method, device, equipment and storage medium
CN115953725A (en) * 2023-03-14 2023-04-11 浙江大学 Fish egg automatic counting system based on deep learning and counting method thereof
CN116721132A (en) * 2023-06-20 2023-09-08 中国农业大学 Multi-target tracking method, system and equipment for industrially cultivated fishes
CN116721132B (en) * 2023-06-20 2023-11-24 中国农业大学 Multi-target tracking method, system and equipment for industrially cultivated fishes
CN117292305A (en) * 2023-11-24 2023-12-26 中国科学院水生生物研究所 Method, system, electronic equipment and medium for determining fetal movement times of fish fertilized eggs
CN117292305B (en) * 2023-11-24 2024-02-20 中国科学院水生生物研究所 Method, system, electronic equipment and medium for determining fetal movement times of fish fertilized eggs

Similar Documents

Publication Publication Date Title
CN112598713A (en) Offshore submarine fish detection and tracking statistical method based on deep learning
Yang et al. Computer vision models in intelligent aquaculture with emphasis on fish detection and behavior analysis: a review
Jia et al. Detection and segmentation of overlapped fruits based on optimized mask R-CNN application in apple harvesting robot
CN109766830B (en) Ship target identification system and method based on artificial intelligence image processing
CN111178197B (en) Mass R-CNN and Soft-NMS fusion based group-fed adherent pig example segmentation method
CN111046880A (en) Infrared target image segmentation method and system, electronic device and storage medium
Umamaheswari et al. Weed detection in farm crops using parallel image processing
CN113592896B (en) Fish feeding method, system, equipment and storage medium based on image processing
CN109685045A (en) A kind of Moving Targets Based on Video Streams tracking and system
CN110415208A (en) A kind of adaptive targets detection method and its device, equipment, storage medium
CN110853070A (en) Underwater sea cucumber image segmentation method based on significance and Grabcut
CN110827312A (en) Learning method based on cooperative visual attention neural network
CN114724022A (en) Culture fish school detection method, system and medium fusing SKNet and YOLOv5
Liu et al. A high-density fish school segmentation framework for biomass statistics in a deep-sea cage
Xia et al. In situ sea cucumber detection based on deep learning approach
CN115731282A (en) Underwater fish weight estimation method and system based on deep learning and electronic equipment
Hou et al. Detection and localization of citrus fruit based on improved You Only Look Once v5s and binocular vision in the orchard
Wang et al. Using an improved YOLOv4 deep learning network for accurate detection of whitefly and thrips on sticky trap images
Yu et al. U-YOLOv7: a network for underwater organism detection
Li et al. Fast recognition of pig faces based on improved Yolov3
Xu et al. Detection of bluefin tuna by cascade classifier and deep learning for monitoring fish resources
Hu et al. Automatic detection of pecan fruits based on Faster RCNN with FPN in orchard
CN114037737B (en) Neural network-based offshore submarine fish detection and tracking statistical method
Siripattanadilok et al. Recognition of partially occluded soft-shell mud crabs using Faster R-CNN and Grad-CAM
CN112308002B (en) Submarine organism identification and detection method based on single-stage deep learning network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210402