CN112598713A - Offshore submarine fish detection and tracking statistical method based on deep learning - Google Patents

Info

Publication number
CN112598713A
CN112598713A (application CN202110232509.9A)
Authority
CN
China
Prior art keywords
fish
tracking
detection
box
branch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110232509.9A
Other languages
Chinese (zh)
Inventor
李培良
刘韬
顾艳镇
刘浩杨
李琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202110232509.9A priority Critical patent/CN112598713A/en
Publication of CN112598713A publication Critical patent/CN112598713A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformation in the plane of the image
    • G06T 3/40 Scaling the whole image or part thereof
    • G06T 3/4007 Interpolation-based scaling, e.g. bilinear interpolation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/40 Image enhancement or restoration by the use of histogram techniques
    • G06T 5/90
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20112 Image segmentation details
    • G06T 2207/20132 Image cropping

Abstract

The invention discloses a method for detecting, tracking and counting near-shore seabed fish based on deep learning. An input underwater real-time video is processed by the basic neural network YOLOv5 to extract features, which are fed into a detection branch and a tracking branch; the detection branch outputs the position and type of the fish, the tracking branch outputs the position and type of the fish together with the ID number of each tracked fish, and the output of the tracking branch is corrected by the detection branch to give the final output, from which the position, category and number of the fish in each picture are obtained. The invention matches the results between frames using particle filtering and the KM algorithm, thereby matching the ID numbers of the fish throughout the video.

Description

Offshore submarine fish detection and tracking statistical method based on deep learning
Technical Field
The invention relates to the field of seabed exploration and detection, in particular to a near-shore seabed fish detection, tracking and statistics method based on deep learning.
Background
The ocean has very abundant biological resources; thus, coastal countries are actively developing marine ranches, particularly fishery-aquaculture marine ranches. The Food and Agriculture Organization of the United Nations records that the global edible fish yield of marine ranches in 2016 was 28.7 million tonnes (67.4 billion US dollars), accounting for 49.5 percent of total aquaculture production that year. Currently, offshore fishing is over-developed and the aquaculture industry is approaching saturation; thus, marine ranch operations are considered an important approach to addressing the decline in fishery resources. However, marine ranch operations also face problems (e.g., over-fishing and ecosystem imbalance). By enhancing the monitoring of underwater biological resources, fishing time and intensity can be controlled according to changes in those resources, thereby mitigating these problems. For marine ranches, real-time monitoring of the number of organisms can form the basis of a protection strategy for scientific fishery management and sustainable fish production. In addition, fish resource statistics help researchers understand the abundance of species, and can be analyzed in combination with local sea states to determine the conditions suitable for the survival of each species. Therefore, the technology has important practical significance.
In the last decade, several tracking and detection methods have been introduced in the field of fishery management. Among detection algorithms, the traditional approach is to extract fine features of underwater targets by fusing multi-sensor and multi-feature information. For example, Ishibashi et al. use optical sensors to acquire specific images of underwater targets, and Saini and Biswas detect targets by detecting edges with adaptive thresholds. At present, the mainstream method is to capture the object with an underwater camera and extract features with a deep learning algorithm. Deep learning algorithms such as Fast-RCNN and ResNet have been applied to underwater biometric identification, for example sea cucumber identification (Xia et al., 2018) and fish detection (CN 202010003815.0). The main limitation of such detection algorithms is that they cannot identify whether the fish in two frames are the same animal; therefore, a tracking model is required. Among tracking algorithms, traditional filtering methods such as particle filtering, optical flow and object segmentation are the main approaches, and they have mainly been tested under controlled conditions, such as limited laboratory environments. For example, Chuang tracks fish using object segmentation and block-based stereo matching; this method divides the fish into several parts for matching and ignores the overall characteristics of the fish. Sun proposes a consistent fish tracking strategy for underwater surveillance systems with multiple static cameras and overlapping fields of view, adopting the speeded-up robust features (SURF) technique and a centroid-coordinate homography mapping to capture the fish; however, this method cannot identify the species of fish. Romero-Ferrero proposed an automated method to track all individuals in small or large populations of unmarked animals; their algorithm has high accuracy for groups of fewer than 100 individuals, but must be run in an ideal laboratory environment. Meng-Che proposes a fish segmentation and tracking algorithm that overcomes low contrast and ensures accurate segmentation of fish shape boundaries by applying histogram back-projection to a double-local-threshold image; however, with this method, sudden movements of the fish may cause tracking failures, and the algorithm is too complex to achieve real-time tracking.
In recent years, several methods for tracking fish abundance and automatically counting fish populations using machine vision have been proposed. For example, Song et al. (2020) propose an automatic fish counting method based on a hybrid neural network model to achieve real-time, accurate, objective and lossless fish population counting in ocean salmon farming. The method adopts a multi-column convolutional neural network as the front end to capture feature information from different receptive fields, while the back end adopts a wider and deeper dilated convolutional neural network to reduce the loss of spatial structure information during network transmission; finally, a hybrid neural network model is constructed. However, the main limitation of this method is that fish are regarded as particles and cannot be classified by type. Marini et al. (2018) developed a content-based image analysis method based on genetic programming. However, crowded scenes limit recognition efficiency when a large number of fish gather in front of the camera; when these aggregations are particularly dense, individuals often overlap, which increases the false negative rate.
Disclosure of Invention
Most of the above machine-vision-based fish detection and tracking methods do not address the problematic scenes and practical difficulties of the harsh environment of a real marine ranch. In particular, multi-class multi-target real-time tracking and the recognition difficulties caused by high underwater turbidity remain unsolved. When the prior art adopts traditional image processing, the high algorithmic complexity leads to high time complexity, so it is difficult to meet real-time requirements while maintaining high accuracy; in the latest techniques adopting deep learning, target detection and target tracking are not integrated, so they must run step by step, incurring additional time and storage overhead for intermediate results, while the complex underwater environment causes a high misrecognition rate.
In order to solve the defects of the prior art, the invention provides the following technical scheme:
a near-shore submarine fish detection and tracking statistical method based on deep learning comprises the following steps:
step 1, obtaining fish image information, preprocessing the fish image information, and sending the preprocessed fish image information into a trained FDT neural network structure, from which three feature maps of different scales are extracted after passing through a basic neural network, wherein the basic neural network is the target detection network YOLOv5, and the target detection network adopts a parallel double-branch structure based on deep learning for detecting and tracking fish in real time in a real marine ranch environment;
step 2, inputting the three feature maps of different scales into a detection branch and a tracking branch, wherein the features are modeled and nonlinearly transformed in the detection branch and the tracking branch; the detection branch outputs the position and category of the fish in the image; the tracking branch outputs the position and category of the fish in the image together with the serial number ID of each tracked fish; the output result of the tracking branch is corrected by the output result of the detection branch and taken as the final output result, in which the position, category and serial number ID of the fish in each picture are recorded;
step 3, obtaining the final output result, and carrying out online tracking on the final output result;
and 4, according to the online tracking result, matching the results between frames by using particle filtering and the KM algorithm so as to match the serial numbers of the fish, and associating the identified and tracked fish with the statistics of the multi-class fish schools.
Further, in step 1, obtaining fish image information, and preprocessing the fish image information, specifically including: acquiring an acquired underwater real-time video, cutting a fish image data set from the underwater real-time video, performing contrast processing on images in the fish image data set, and cutting the images subjected to the contrast processing to a specified size from an original size, wherein the original size is 1920 x 1080.
Further, the cutting from the original size to the designated size specifically comprises: calculating a scaling factor by taking the longest edge of the image as the reference edge, scaling the whole image to 608 × 342 by bilinear interpolation, and then zero-padding the upper and lower edges of the image to obtain a cropped image with the specified size of 608 × 608.
Further, performing contrast processing on the images in the fish image dataset specifically comprises: the histogram of the RGB channels is color compensated, and the compensated image is then processed with contrast limited adaptive histogram equalization (CLAHE).
Further, performing color compensation on the histogram of the RGB channels specifically comprises:
dividing the image into the three RGB channels and calculating the average value of each channel, $b_{avg}$, $g_{avg}$ and $r_{avg}$, respectively, with the minimum of the three averages taken as the correction parameter value for the color shift, $value = \min\{b_{avg}, g_{avg}, r_{avg}\}$;
the average value of each channel is calculated by the following formula:

$$x_{avg} = \frac{1}{n\,m}\sum_{i=0}^{n-1}\sum_{j=0}^{m-1} x(i, j), \qquad x \in \{b, g, r\}$$

wherein the index i ranges from 0 to n-1 and the index j ranges from 0 to m-1 over the image pixels;
if the corrected result is less than 0, the channel value at position (i, j) is defined as 0; otherwise, the pixel value is corrected using the channel average and the correction parameter value.
Further, performing online tracking on the final output result specifically comprises: extracting posterior boxes by a non-maximum suppression method based on the heat map scores; determining the positions of key points whose heat map scores are greater than a threshold, and calculating the corresponding posterior boxes according to the estimated offsets and the predicted box sizes; and implementing box linking using an online tracking algorithm.
Further, implementing box linking using the online tracking algorithm specifically comprises: initializing the set of tracking trajectories based on the detection boxes in the first frame, and setting a threshold; particle filtering is used to predict the location of each tracking trajectory in the current frame, and in subsequent frames the Re-ID features and the IoU measurements are used to link detections to the set of tracking trajectories when the distance between these boxes is greater than the threshold.
Further, the training process of the FDT neural network structure is as follows: the training sample tensor of the original reconstructed image and the training tensor of the target ground-truth image are input into the FDT neural network structure, and the FDT neural network structure is trained cyclically until the loss function output by the network falls below a set threshold.
Further, the calculation formula of the loss function is as follows:

$$loss = L_{box} + L_{cls} + L_{id}$$

wherein $L_{box}$ represents the posterior box loss, $L_{cls}$ represents the category loss, and $L_{id}$ represents the Re-ID loss.

Further, the posterior box loss $L_{box}$ is calculated as follows:

$$L_{box} = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v$$

wherein $b^{gt}$ is the center-point coordinate of the ground-truth posterior box, $b$ is the center-point coordinate of the predicted bounding box, $IoU$ is the intersection over union of the ground-truth posterior box area and the predicted prior box area, $\rho$ is the Euclidean distance between the two center points, $c$ represents the diagonal distance of the minimum closure area containing the ground-truth posterior box and the predicted prior box, and $\alpha$ and $v$ are two influencing factors; $IoU$, $\alpha$ and $v$ are calculated as follows:

$$IoU = \frac{A \cap A^{gt}}{A \cup A^{gt}}, \qquad v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}, \qquad \alpha = \frac{v}{(1 - IoU) + v}$$

wherein $w^{gt}$ and $h^{gt}$ are the width and height of the ground-truth posterior box, $w$ and $h$ are the width and height of the predicted prior box, $A^{gt}$ represents the area of the ground-truth posterior box, and $A$ represents the area of the predicted prior box;

the category loss $L_{cls}$ uses the cross entropy, calculated as follows:

$$L_{cls} = -\sum_{i=0}^{S^{2}}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \sum_{c \in classes} \hat{p}_{i}(c)\,\log p_{i}(c)$$

when the j-th prior box of the i-th grid cell is responsible for a certain real object, the posterior box generated by that prior box contributes to the classification loss; when the object is real, $\mathbb{1}_{ij}^{obj} = 1$, otherwise $\mathbb{1}_{ij}^{obj} = 0$;

considering the Re-ID loss, the Re-ID embedding is treated as a classification task in which all object instances of the same identity are treated as one class; an identity feature vector is extracted at position (i, j), and a mapping to a class distribution vector $p = \{p(k),\, k \in [1, K]\}$ is learned; with the class label of the ground-truth posterior box expressed as $L^{i}(k)$, the softmax loss $L_{id}$ is calculated as follows:

$$L_{id} = -\sum_{i=1}^{N}\sum_{k=1}^{K} L^{i}(k)\,\log p(k)$$

where K is the number of classes and N is the number of ground-truth posterior boxes.
The software for detecting, tracking and counting underwater seabed fish operates as follows: collecting the real-time video; extracting features from the video through the basic neural network (YOLOv5); inputting the features into the detection branch and the tracking branch respectively, where each branch models the features and applies nonlinear transformations and the detection branch outputs the position and type of the fish in the picture, wherein: 1) the tracking branch outputs the position and type of the fish and the ID (number) of each tracked fish; 2) the output of the tracking branch is corrected by the detection branch as the final output; and the online tracking part, i.e. particle filtering and the KM algorithm, matches the results between frames so as to match the numbers of the fish in the video.
The invention provides an end-to-end neural network framework that outputs results directly while the video is being input, and provides an image enhancement algorithm at the input end, which significantly improves image quality and thus recognition accuracy.
For the FDT algorithm, when a section of underwater real-time video is input, features in the video, such as fish texture, shape and size, are extracted after it passes through the basic neural network (YOLOv5); the features are then input into the detection branch and the tracking branch respectively, each branch models the features and applies nonlinear transformations, and the detection branch outputs the position and type of the fish in the picture. The tracking branch outputs the position and type of the fish and the ID (number) of each tracked fish. Finally, the output of the tracking branch is corrected by the detection branch as the final output, giving the position, category and number of the fish in each picture. The online tracking part, i.e. particle filtering and the KM algorithm, then matches the results between frames so as to match the numbers of the fish in the video.
In the process, detection and multi-target tracking algorithms are fused into a framework, so that tracking statistics of multi-class fish schools can be realized, an end-to-end unified neural network architecture is adopted, online processing can be realized, and a statistical result is output while a video is input.
Drawings
In order to more clearly illustrate the embodiments of the present application or technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings can be obtained by those skilled in the art according to the drawings.
Fig. 1 is a comparison between an original image and the corresponding image after preprocessing, provided by an embodiment of the present invention;
FIG. 2 is a diagram of an FDT algorithm architecture provided by an embodiment of the present invention;
FIG. 3 is a process diagram of the training and testing phases provided by an embodiment of the present invention;
fig. 4 is a schematic diagram of the OceanEye software main interface according to the embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a near-shore submarine fish detection and tracking statistical method based on deep learning, which is applied to fish population statistics.
The fish population statistical technique mainly comprises two parts: accurately identifying the underwater fish, and matching the fish identified in each frame to form tracking trajectories.
The first part is the object recognition task. In the field of computer vision, many target recognition algorithms have reached the accuracy of human-eye recognition, but they focus on object recognition on land. Recognition accuracy in complex underwater environments is low: light is refracted and reflected during underwater transmission, illumination in turbid water is uneven, attenuation rates differ across wavelengths, and color shift occurs in the underwater environment. For these reasons, images photographed underwater suffer quality degradation such as low contrast, color distortion and blurred texture, making it difficult to distinguish marine animals.
The second part is the multi-target tracking task: matching the objects identified in consecutive frames and determining the ID of each object in the video. Current multi-target tracking algorithms mainly focus on single-class multi-target tracking; little work addresses multi-class multi-target tracking, the mainstream approach being a two-stage identify-then-track algorithm that cannot meet real-time requirements, while existing end-to-end algorithms that identify and track simultaneously have lower precision. Furthermore, many fish are similar in size and appearance, so it is difficult to distinguish them by texture and size, and many experiments are required to create a usable model; moreover, fish swim irregularly in all directions, so fish deformation and occlusion occur frequently.
The invention provides a near-shore submarine fish detection, tracking and statistics method based on deep learning, which comprises the following steps:
step 1, obtaining fish image information, preprocessing the fish image information, and sending the preprocessed fish image information into a trained FDT neural network structure, from which three feature maps of different scales are extracted after passing through a basic neural network, wherein the basic neural network is the target detection network YOLOv5, and the target detection network adopts a parallel double-branch structure based on deep learning for detecting and tracking fish in real time in a real marine ranch environment;
in this step, the picture processed in step 1 is input into an FDT neural network structure, and features in the picture, such as fish texture, fish shape, fish size, etc., are extracted after passing through a basic neural network YOLOv 5; the FDT neural network structure is shown in fig. 2.
In step 1, obtaining fish image information, and preprocessing the fish image information, specifically including:
acquiring an acquired underwater real-time video, cutting a fish image data set from the underwater real-time video, performing contrast processing on images in the fish image data set, and cutting the images subjected to the contrast processing to a specified size from an original size, wherein the original size is 1920 x 1080;
the cutting from the original size to the specified size specifically comprises:
calculating a scaling factor by taking the longest edge of the image as a reference edge, scaling the whole image to 608 × 342 through bilinear interpolation, then performing zero filling on the upper edge and the lower edge of the image, and finally obtaining a cut image with the specified size of 608 × 608;
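By way of illustration, this scaling-and-padding step can be sketched in Python with OpenCV; the function name, the rounding details and the landscape-input assumption are illustrative, not part of the claimed method:

```python
import cv2

def letterbox_resize(image, target=608):
    """Scale by the longest edge, then zero-pad the top and bottom to a square.

    Assumes a landscape input such as 1920 x 1080, as in the patent text."""
    h, w = image.shape[:2]
    scale = target / max(h, w)                          # scaling factor from the longest edge
    new_w, new_h = round(w * scale), round(h * scale)   # 1920 x 1080 -> 608 x 342
    resized = cv2.resize(image, (new_w, new_h),
                         interpolation=cv2.INTER_LINEAR)  # bilinear interpolation
    pad = target - new_h                                # rows of zeros to reach 608 x 608
    top, bottom = pad // 2, pad - pad // 2
    return cv2.copyMakeBorder(resized, top, bottom, 0, 0,
                              cv2.BORDER_CONSTANT, value=(0, 0, 0))
```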
performing contrast processing on the images in the fish image dataset specifically comprises: the histogram of the RGB channels is color compensated, and the compensated image is then processed with contrast limited adaptive histogram equalization (CLAHE).
The contrast processing effectively addresses underwater color distortion, low contrast and blurred details; the contrast limited adaptive histogram equalization (CLAHE) algorithm then improves image contrast, resolving the color shift and contrast problems and improving recognition accuracy. The images before and after enhancement are shown in fig. 1.
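As an illustrative sketch of the CLAHE step: the patent does not fix the CLAHE parameters or the working color space, so applying OpenCV's CLAHE to the L channel of the LAB space, a common choice, is assumed here:

```python
import cv2

def apply_clahe(image_bgr, clip_limit=2.0, grid=(8, 8)):
    """Contrast limited adaptive histogram equalization on the luminance channel."""
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=grid)
    merged = cv2.merge((clahe.apply(l), a, b))
    return cv2.cvtColor(merged, cv2.COLOR_LAB2BGR)
```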
Performing color compensation on the histogram of the RGB channels specifically comprises:
dividing the image into the three RGB channels and calculating the average value of each channel, $b_{avg}$, $g_{avg}$ and $r_{avg}$, respectively, with the minimum of the three averages taken as the correction parameter value for the color shift, $value = \min\{b_{avg}, g_{avg}, r_{avg}\}$;
the average value of each channel is calculated by the following formula:

$$x_{avg} = \frac{1}{n\,m}\sum_{i=0}^{n-1}\sum_{j=0}^{m-1} x(i, j), \qquad x \in \{b, g, r\}$$

wherein the index i ranges from 0 to n-1 and the index j ranges from 0 to m-1 over the image pixels;
if the corrected result is less than 0, the channel value at position (i, j) is defined as 0; otherwise, the pixel value is corrected using the channel average and the correction parameter value.
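A minimal NumPy sketch of this color compensation follows. The text does not spell out the exact correction formula, so shifting each channel by the gap between its mean and the minimum mean is an assumed concrete form:

```python
import numpy as np

def color_compensate(image_bgr):
    """Shift each channel by (channel mean - minimum channel mean), clamping at 0.

    The per-channel shift is an assumption; the patent states only that the
    pixel value is corrected using the channel average and the correction
    parameter value, with negative results defined as 0."""
    img = image_bgr.astype(np.float32)
    means = img.reshape(-1, 3).mean(axis=0)   # b_avg, g_avg, r_avg
    value = means.min()                       # correction parameter value
    corrected = img - (means - value)         # per-channel color-shift correction
    corrected[corrected < 0] = 0              # negative results defined as 0
    return corrected.astype(np.uint8)
```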
Step 2, inputting the three feature maps of different scales into a detection branch and a tracking branch, wherein the features are modeled and nonlinearly transformed in the detection branch and the tracking branch; the detection branch outputs the position and category of the fish in the image; the tracking branch outputs the position and category of the fish in the image together with the serial number ID of each tracked fish; the output result of the tracking branch is corrected by the output result of the detection branch and taken as the final output result, in which the position, category and serial number ID of the fish in each picture are recorded;
The detection branch and the tracking branch share the same basic feature extraction network to reduce the amount of computation. The first output of the detection branch has size 13 × 13 × 45: the first value 13 is the height of the feature map, the second value 13 is its width, and the third value 45 comprises, for each channel, the x and y coordinates of the predicted center point, the width and height of the posterior box, the confidence, and the probabilities of the 9 fish categories. Similarly, the second output has size 26 × 26 × 45 and the third 52 × 52 × 45. The tracking branch uses a one-shot MOT method with four outputs: the center point, the posterior box size, the center offset, and the Re-ID feature. The center point gives the x and y coordinates of the detected object. The posterior box output gives the height and width of the target box. The center offset output locates the target more accurately, reducing the quantization error introduced by the feature-map stride. The Re-ID feature is used to identify whether the same object exists in different frames. The tracking branch output is then corrected by the detection branch output: the center points of targets are matched between the two branch outputs, and if the center point of a tracked target is the same as a center point from the detection branch, the posterior box output by the tracking branch for that target is replaced by the detection branch's box. If a center point from the detection branch is not contained in the tracking list, the detected target is added to the tracking list. Finally, online tracking is performed on the center point, posterior box, fish category and Re-ID of each tracked target.
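The correction of the tracking branch by the detection branch can be sketched as follows; the dictionary layout, field names and the same_point predicate are illustrative assumptions, not structures defined by the patent:

```python
def correct_tracking_with_detection(track_objs, det_objs, same_point):
    """Replace a tracked target's posterior box with the detection branch's box
    when their center points match; append unmatched detections as new targets."""
    for det in det_objs:
        matched = False
        for trk in track_objs:
            if same_point(trk['center'], det['center']):
                trk['box'] = det['box']    # detection box is the more accurate one
                trk['cls'] = det['cls']    # class comes from the detection branch
                matched = True
                break
        if not matched:
            track_objs.append({**det, 'id': None})  # new target enters the tracking list
    return track_objs
```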
The advantages of combining the detection and tracking branches are summarized below. Multiple object classes cannot be tracked simultaneously using the tracking branch alone, but the class of a tracked object can be identified from the additional class output of the detection branch, realizing multi-class synchronous tracking. In addition, the posterior box output of the detection branch is more accurate than that of the tracking branch, which improves tracking precision and online tracking time. Finally, in complex underwater situations, such as when two fish overlap, the detection branch can extract more detailed features to track the correct target.
Step 3, obtaining the final output result, and carrying out online tracking on the final output result;
Performing online tracking on the final output result specifically comprises: extracting posterior boxes by a non-maximum suppression method based on the heat map scores; determining the positions of key points whose heat map scores are greater than a threshold, and calculating the corresponding posterior boxes according to the estimated offsets and the predicted box sizes; and implementing box linking using an online tracking algorithm.
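A common realization of heat-map-based non-maximum suppression is the max-pooling trick, shown here as a PyTorch sketch; the 3 × 3 window, the tensor layout (1, 1, H, W) and the threshold value are assumptions rather than values fixed by the patent:

```python
import torch
import torch.nn.functional as F

def heatmap_peaks(heatmap, threshold=0.3):
    """Keep only local maxima of the heat map, then threshold the scores."""
    pooled = F.max_pool2d(heatmap, 3, stride=1, padding=1)
    keep = (pooled == heatmap).float()        # 1 where the pixel is a local maximum
    peaks = heatmap * keep                    # suppress non-maximum responses
    ys, xs = torch.where(peaks[0, 0] > threshold)
    return xs, ys, peaks[0, 0][ys, xs]        # key-point positions and scores
```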
Implementing box linking using the online tracking algorithm specifically comprises: initializing the set of tracking trajectories based on the detection boxes in the first frame, and setting a threshold; particle filtering is used to predict the location of each tracking trajectory in the current frame, and in subsequent frames the Re-ID features and the IoU measurements are used to link detections to the set of tracking trajectories when the distance between these boxes is greater than the threshold.
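The particle filter used to predict each trajectory's location admits many concrete forms; the following sketch assumes a constant-velocity state model with Gaussian noise and 100 particles, none of which are specified by the patent:

```python
import numpy as np

class ParticleFilterTrack:
    """Minimal particle filter over one track's box center (x, y, vx, vy)."""

    def __init__(self, cx, cy, n=100, noise=5.0):
        self.particles = np.tile([cx, cy, 0.0, 0.0], (n, 1)).astype(np.float64)
        self.noise = noise

    def predict(self):
        self.particles[:, :2] += self.particles[:, 2:]   # move by current velocity
        self.particles += np.random.normal(0.0, self.noise, self.particles.shape)
        return self.particles[:, :2].mean(axis=0)        # predicted center

    def update(self, cx, cy):
        d = np.linalg.norm(self.particles[:, :2] - [cx, cy], axis=1)
        w = np.exp(-0.5 * (d / self.noise) ** 2) + 1e-12  # Gaussian weights
        w /= w.sum()
        idx = np.random.choice(len(w), size=len(w), p=w)  # resample by weight
        self.particles = self.particles[idx]
```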
In this step, the two parallel branches obtain the posterior box and the Re-ID, as shown in FIG. 3.
And 4, according to the online tracking result, matching the results between frames by using particle filtering and the KM algorithm so as to match the serial numbers of the fish, and associating the identified and tracked fish with the statistics of the multi-class fish schools.
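The KM (Kuhn-Munkres) matching between frames can be sketched with SciPy's Hungarian solver; the composition of the cost matrix from Re-ID feature distance and (1 - IoU), and the gating value, are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def km_match(cost, max_cost=0.7):
    """Assign detections to tracks by minimum-cost matching, then gate by cost.

    cost: tracks x detections matrix, e.g. a weighted mix of Re-ID distance
    and (1 - IoU)."""
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]
    unmatched_tracks = sorted(set(range(cost.shape[0])) - {r for r, _ in matches})
    unmatched_dets = sorted(set(range(cost.shape[1])) - {c for _, c in matches})
    return matches, unmatched_tracks, unmatched_dets
```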
In the process, detection and multi-target tracking algorithms are fused into a framework, so that tracking statistics of multi-class fish schools can be realized, an end-to-end unified neural network architecture is adopted, online processing can be realized, and a statistical result is output while a video is input.
The training process of the FDT neural network structure is as follows: the training sample tensor of the original reconstructed image and the training tensor of the target ground-truth image are input into the FDT neural network structure, and the FDT neural network structure is trained cyclically until the loss function output by the network falls below a set threshold.
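A schematic training loop for this stopping criterion, assuming PyTorch-style model, optimizer and data loader objects (the threshold value is illustrative):

```python
def train_fdt(model, loader, optimizer, loss_fn, stop_loss=0.05):
    """Cyclic training until the average loss falls below a set threshold."""
    while True:
        epoch_loss = 0.0
        for images, targets in loader:      # reconstructed-image / ground-truth pairs
            optimizer.zero_grad()
            loss = loss_fn(model(images), targets)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        if epoch_loss / len(loader) < stop_loss:
            break
```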
The loss function $loss$ is composed of three parts, namely the posterior box loss $L_{box}$ (the loss function of the predicted true position), the category loss $L_{cls}$ (the loss function of the predicted species class) and the Re-ID loss $L_{id}$ (the loss function of the predicted biological number), calculated as follows:

$$loss = L_{box} + L_{cls} + L_{id}$$

The posterior box loss $L_{box}$ is calculated as follows:

$$L_{box} = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v$$

In this formula the overlapping area, the distance between the center points and the aspect ratio are considered simultaneously, so the CIoU obtains better convergence speed and precision on the BBox regression problem.

Here $b^{gt}$ is the center-point coordinate of the ground-truth posterior box, $b$ is the center-point coordinate of the predicted bounding box, $IoU$ is the intersection over union of the ground-truth posterior box area and the predicted prior box area, $\rho$ is the Euclidean distance between the two center points, $c$ represents the diagonal distance of the minimum closure area that can contain the ground-truth posterior box and the predicted prior box, and $\alpha$ and $v$ are two influencing factors; $IoU$, $\alpha$ and $v$ are calculated as follows:

$$IoU = \frac{A \cap A^{gt}}{A \cup A^{gt}}, \qquad v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}, \qquad \alpha = \frac{v}{(1 - IoU) + v}$$

wherein $w^{gt}$ and $h^{gt}$ are the width and height of the ground-truth posterior box, $w$ and $h$ are the width and height of the predicted prior box, $A^{gt}$ refers to the area of the ground-truth posterior box, and $A$ refers to the area of the predicted prior box.

The category loss $L_{cls}$ uses the cross entropy, calculated as follows:

$$L_{cls} = -\sum_{i=0}^{S^{2}}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \sum_{c \in classes} \hat{p}_{i}(c)\,\log p_{i}(c)$$

When the j-th prior box of the i-th grid cell is responsible for a certain real object, the posterior box generated by that prior box contributes to the classification loss; when the object is real, $\mathbb{1}_{ij}^{obj} = 1$, otherwise $\mathbb{1}_{ij}^{obj} = 0$.

Considering the Re-ID loss, the Re-ID embedding is treated as a classification task in which all object instances of the same identity are treated as one class; an identity feature vector is extracted at position (i, j), and a mapping to a class distribution vector $p = \{p(k),\, k \in [1, K]\}$ is learned; with the class label of the ground-truth posterior box expressed as $L^{i}(k)$, the softmax loss $L_{id}$ is calculated as follows:

$$L_{id} = -\sum_{i=1}^{N}\sum_{k=1}^{K} L^{i}(k)\,\log p(k)$$

where K is the number of classes and N is the number of ground-truth posterior boxes.
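The posterior box (CIoU) loss defined above may be sketched in PyTorch as follows; the (cx, cy, w, h) tensor layout for boxes of shape (N, 4) is an assumption:

```python
import math
import torch

def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss: 1 - IoU + rho^2 / c^2 + alpha * v, averaged over N boxes."""
    px, py, pw, ph = pred.unbind(-1)
    gx, gy, gw, gh = gt.unbind(-1)
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    g_x1, g_y1, g_x2, g_y2 = gx - gw / 2, gy - gh / 2, gx + gw / 2, gy + gh / 2
    iw = (torch.min(p_x2, g_x2) - torch.max(p_x1, g_x1)).clamp(0)   # intersection width
    ih = (torch.min(p_y2, g_y2) - torch.max(p_y1, g_y1)).clamp(0)   # intersection height
    inter = iw * ih
    union = pw * ph + gw * gh - inter + eps
    iou = inter / union
    rho2 = (px - gx) ** 2 + (py - gy) ** 2              # squared center distance
    cw = torch.max(p_x2, g_x2) - torch.min(p_x1, g_x1)  # enclosing box width
    ch = torch.max(p_y2, g_y2) - torch.min(p_y1, g_y1)  # enclosing box height
    c2 = cw ** 2 + ch ** 2 + eps                        # squared closure diagonal
    v = (4 / math.pi ** 2) * (torch.atan(gw / (gh + eps))
                              - torch.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)                   # influence factor
    return (1 - iou + rho2 / c2 + alpha * v).mean()
```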
In a more preferred embodiment, the invention also provides software for detecting and tracking underwater seabed fish based on the above algorithm. The software interface is shown in fig. 4: the input video is loaded by way of cloud storage, and the processed video and the corresponding statistical results of each category are displayed in real time. When a section of underwater real-time video is input, features in the video, such as fish texture, shape and size, are extracted after it passes through the basic neural network (YOLOv5); the features are then input into the detection branch and the tracking branch respectively, each branch models the features and applies nonlinear transformations, and the detection branch outputs the position and type of the fish in the picture. The tracking branch outputs the position and type of the fish and the ID (number) of each tracked fish. Finally, the output of the tracking branch is corrected by the detection branch as the final output, giving the position, category and number of the fish in each picture. The online tracking part, i.e. particle filtering and the KM algorithm, then matches the results between frames so as to match the numbers of the fish in the video. In this process, the detection and multi-target tracking algorithms are fused into one framework, so tracking statistics of multi-class fish schools can be realized; an end-to-end unified neural network architecture is adopted, online processing can be realized, and the statistical results are output while the video is input.
Furthermore, the software detection and tracking process comprises: collecting the real-time video; extracting features from the video through the basic neural network (YOLOv5); inputting the features into the detection branch and the tracking branch respectively, where each branch models the features and applies nonlinear transformations and the detection branch outputs the position and type of the fish in the picture, wherein: 1) the tracking branch outputs the position and type of the fish and the ID (number) of each tracked fish; 2) the output of the tracking branch is corrected by the detection branch as the final output; and the online tracking part, i.e. particle filtering and the KM algorithm, matches the results between frames so as to match the numbers of the fish in the video.
While certain exemplary embodiments of the present invention have been described above by way of illustration only, it will be apparent to those of ordinary skill in the art that the described embodiments may be modified in various different ways without departing from the spirit and scope of the invention. Accordingly, the drawings and description are illustrative in nature and should not be construed as limiting the scope of the invention.

Claims (10)

1. A near-shore submarine fish detection and tracking statistical method based on deep learning is characterized by comprising the following steps:
step 1, obtaining fish image information, preprocessing the fish image information, and sending the preprocessed fish image information into a trained FDT neural network structure, from which three feature maps of different scales are extracted after passing through a basic neural network, wherein the basic neural network is the target detection network YOLOv5, and the target detection network adopts a parallel double-branch structure based on deep learning for detecting and tracking fish in real time in a real marine ranch environment;
step 2, inputting the three feature maps of different scales into a detection branch and a tracking branch, wherein the features are modeled and nonlinearly transformed in the detection branch and the tracking branch; the detection branch outputs the position and category of the fish in the image; the tracking branch outputs the position and category of the fish in the image together with the serial number ID of each tracked fish; the output result of the tracking branch is corrected by the output result of the detection branch and taken as the final output result, in which the position, category and serial number ID of the fish in each picture are recorded;
step 3, obtaining the final output result, and carrying out online tracking on the final output result;
and 4, according to the online tracking result, matching the results between frames by using particle filtering and the KM algorithm so as to match the serial numbers of the fish, and associating the identified and tracked fish with the statistics of the multi-class fish schools.
2. The method for detecting, tracking and counting fish on the near shore seabed based on deep learning of claim 1, wherein in step 1, fish image information is obtained, and the fish image information is preprocessed, specifically comprising: acquiring an acquired underwater real-time video, cutting a fish image data set from the underwater real-time video, performing contrast processing on images in the fish image data set, and cutting the images subjected to the contrast processing to a specified size from an original size, wherein the original size is 1920 x 1080.
3. The deep learning-based offshore fish detection and tracking statistical method according to claim 2, wherein the cutting from an original size to a specified size specifically comprises: calculating a scaling factor by taking the longest edge of the image as the reference edge, scaling the whole image to 608 × 342 by bilinear interpolation, and then zero-padding the upper and lower edges of the image to obtain a cropped image with the specified size of 608 × 608.
4. The method for offshore submarine fish detection and tracking statistics based on deep learning according to claim 2, wherein the contrast processing of the images in the fish image dataset specifically comprises: the histogram of the RGB channels is color compensated, and the compensated image is then processed with contrast limited adaptive histogram equalization (CLAHE).
5. The method for offshore fish detection and tracking statistics based on deep learning as claimed in claim 4, wherein the color compensation of the histogram of the RGB channels specifically comprises:
dividing the image into the three RGB channels and calculating the average value of each channel, $b_{avg}$, $g_{avg}$ and $r_{avg}$, respectively, with the minimum of the three averages taken as the correction parameter value for the color shift, $value = \min\{b_{avg}, g_{avg}, r_{avg}\}$;
the average value of each channel is calculated by the following formula:

$$x_{avg} = \frac{1}{n\,m}\sum_{i=0}^{n-1}\sum_{j=0}^{m-1} x(i, j), \qquad x \in \{b, g, r\}$$

wherein the index i ranges from 0 to n-1 and the index j ranges from 0 to m-1 over the image pixels;
if the corrected result is less than 0, the channel value at position (i, j) is defined as 0; otherwise, the pixel value is corrected using the channel average and the correction parameter value.
6. The method for offshore submarine fish detection and tracking statistics based on deep learning of claim 1, wherein the online tracking of the final output result specifically comprises: extracting posterior boxes by a non-maximum suppression method based on the heat map scores; determining the positions of key points whose heat map scores are greater than a threshold, and calculating the corresponding posterior boxes according to the estimated offsets and the predicted box sizes; and implementing box linking using an online tracking algorithm.
7. The method for offshore fish detection and tracking statistics based on deep learning as claimed in claim 6, wherein implementing box linking using the online tracking algorithm specifically comprises: initializing the set of tracking trajectories based on the detection boxes in the first frame, and setting a threshold; particle filtering is used to predict the location of each tracking trajectory in the current frame, and in subsequent frames the Re-ID features and the IoU measurements are used to link detections to the set of tracking trajectories when the distance between these boxes is greater than the threshold.
8. The deep learning-based offshore fish detection and tracking statistical method according to claim 1, wherein the FDT neural network structure is trained as follows: the training sample tensor of the original reconstructed image and the training tensor of the target ground-truth image are input into the FDT neural network structure, and the FDT neural network structure is trained cyclically until the loss function output by the network falls below a set threshold.
9. The deep learning-based offshore fish detection and tracking statistical method according to claim 8, wherein the loss function is calculated by the following formula:

$$loss = L_{box} + L_{cls} + L_{id}$$

wherein $L_{box}$ represents the posterior box loss, $L_{cls}$ represents the category loss, and $L_{id}$ represents the Re-ID loss.
10. The offshore seafloor fish detection and tracking statistical method based on deep learning of claim 9, wherein
the posterior box loss $L_{box}$ is calculated as follows:

$$L_{box} = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v$$

wherein $b^{gt}$ is the center-point coordinate of the ground-truth posterior box, $b$ is the center-point coordinate of the predicted bounding box, $IoU$ is the intersection over union of the ground-truth posterior box area and the predicted prior box area, $\rho$ is the Euclidean distance between the two center points, $c$ represents the diagonal distance of the minimum closure area containing the ground-truth posterior box and the predicted prior box, and $\alpha$ and $v$ are two influencing factors; $IoU$, $\alpha$ and $v$ are calculated as follows:

$$IoU = \frac{A \cap A^{gt}}{A \cup A^{gt}}, \qquad v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}, \qquad \alpha = \frac{v}{(1 - IoU) + v}$$

wherein $w^{gt}$ and $h^{gt}$ are the width and height of the ground-truth posterior box, $w$ and $h$ are the width and height of the predicted prior box, $A^{gt}$ represents the area of the ground-truth posterior box, and $A$ represents the area of the predicted prior box;

the category loss $L_{cls}$ uses the cross entropy, calculated as follows:

$$L_{cls} = -\sum_{i=0}^{S^{2}}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \sum_{c \in classes} \hat{p}_{i}(c)\,\log p_{i}(c)$$

when the j-th prior box of the i-th grid cell is responsible for a certain real object, the posterior box generated by that prior box contributes to the classification loss; when the object is real, $\mathbb{1}_{ij}^{obj} = 1$, otherwise $\mathbb{1}_{ij}^{obj} = 0$;

considering the Re-ID loss, the Re-ID embedding is treated as a classification task in which all object instances of the same identity are treated as one class; an identity feature vector is extracted at position (i, j), and a mapping to a class distribution vector $p = \{p(k),\, k \in [1, K]\}$ is learned; with the class label of the ground-truth posterior box expressed as $L^{i}(k)$, the softmax loss $L_{id}$ is calculated as follows:

$$L_{id} = -\sum_{i=1}^{N}\sum_{k=1}^{K} L^{i}(k)\,\log p(k)$$

where K is the number of classes and N is the number of ground-truth posterior boxes.
CN202110232509.9A 2021-03-03 2021-03-03 Offshore submarine fish detection and tracking statistical method based on deep learning Pending CN112598713A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110232509.9A CN112598713A (en) 2021-03-03 2021-03-03 Offshore submarine fish detection and tracking statistical method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110232509.9A CN112598713A (en) 2021-03-03 2021-03-03 Offshore submarine fish detection and tracking statistical method based on deep learning

Publications (1)

Publication Number Publication Date
CN112598713A true CN112598713A (en) 2021-04-02

Family

ID=75210140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110232509.9A Pending CN112598713A (en) 2021-03-03 2021-03-03 Offshore submarine fish detection and tracking statistical method based on deep learning

Country Status (1)

Country Link
CN (1) CN112598713A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113112726A (en) * 2021-05-11 2021-07-13 创新奇智(广州)科技有限公司 Intrusion detection method, device, equipment, system and readable storage medium
CN113326850A (en) * 2021-08-03 2021-08-31 中国科学院烟台海岸带研究所 Example segmentation-based video analysis method for group behavior of Charybdis japonica
CN113379746A (en) * 2021-08-16 2021-09-10 深圳荣耀智能机器有限公司 Image detection method, device, system, computing equipment and readable storage medium
CN113569971A (en) * 2021-08-02 2021-10-29 浙江索思科技有限公司 Image recognition-based catch target classification detection method and system
CN113780127A (en) * 2021-08-30 2021-12-10 武汉理工大学 Ship positioning and monitoring system and method
CN114037737A (en) * 2021-11-16 2022-02-11 浙江大学 Neural network-based offshore submarine fish detection and tracking statistical method
CN114049477A (en) * 2021-11-16 2022-02-15 中国水利水电科学研究院 Fish passing fishway system and dynamic identification and tracking method for fish quantity and fish type
CN114463675A (en) * 2022-01-11 2022-05-10 北京市农林科学院信息技术研究中心 Underwater fish group activity intensity identification method and device
CN115063378A (en) * 2022-06-27 2022-09-16 中国平安财产保险股份有限公司 Intelligent counting method, device, equipment and storage medium
CN115953725A (en) * 2023-03-14 2023-04-11 浙江大学 Fish egg automatic counting system based on deep learning and counting method thereof
TWI801911B (en) * 2021-06-18 2023-05-11 國立臺灣海洋大學 Aquatic organism identification method and system
CN116721132A (en) * 2023-06-20 2023-09-08 中国农业大学 Multi-target tracking method, system and equipment for industrially cultivated fishes
CN117292305A (en) * 2023-11-24 2023-12-26 中国科学院水生生物研究所 Method, system, electronic equipment and medium for determining fetal movement times of fish fertilized eggs

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805064A (en) * 2018-05-31 2018-11-13 中国农业大学 A kind of fish detection and localization and recognition methods and system based on deep learning

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805064A (en) * 2018-05-31 2018-11-13 中国农业大学 A kind of fish detection and localization and recognition methods and system based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHASHA LIU等: ""Embedded Online Fish Detection and Tracking System via YOLOv3 and Parallel Correlation Filter"", 《OCEANS 2018 MTS/IEEE CHARLESTON》 *
TAO LIU等: ""Multi-class fish stock statistics technology based on object classification and tracking algorithm"", 《ECOLOGICAL INFORMATICS》 *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113112726A (en) * 2021-05-11 2021-07-13 创新奇智(广州)科技有限公司 Intrusion detection method, device, equipment, system and readable storage medium
TWI801911B (en) * 2021-06-18 2023-05-11 國立臺灣海洋大學 Aquatic organism identification method and system
CN113569971B (en) * 2021-08-02 2022-03-25 浙江索思科技有限公司 Image recognition-based catch target classification detection method and system
CN113569971A (en) * 2021-08-02 2021-10-29 浙江索思科技有限公司 Image recognition-based catch target classification detection method and system
CN113326850A (en) * 2021-08-03 2021-08-31 中国科学院烟台海岸带研究所 Example segmentation-based video analysis method for group behavior of Charybdis japonica
CN113326850B (en) * 2021-08-03 2021-10-26 中国科学院烟台海岸带研究所 Example segmentation-based video analysis method for group behavior of Charybdis japonica
CN113379746B (en) * 2021-08-16 2021-11-02 深圳荣耀智能机器有限公司 Image detection method, device, system, computing equipment and readable storage medium
CN113379746A (en) * 2021-08-16 2021-09-10 深圳荣耀智能机器有限公司 Image detection method, device, system, computing equipment and readable storage medium
CN113780127A (en) * 2021-08-30 2021-12-10 武汉理工大学 Ship positioning and monitoring system and method
CN114049477B (en) * 2021-11-16 2023-04-07 中国水利水电科学研究院 Fish passing fishway system and dynamic identification and tracking method for fish quantity and fish type
CN114049477A (en) * 2021-11-16 2022-02-15 中国水利水电科学研究院 Fish passing fishway system and dynamic identification and tracking method for fish quantity and fish type
CN114037737A (en) * 2021-11-16 2022-02-11 浙江大学 Neural network-based offshore submarine fish detection and tracking statistical method
CN114037737B (en) * 2021-11-16 2022-08-09 浙江大学 Neural network-based offshore submarine fish detection and tracking statistical method
CN114463675A (en) * 2022-01-11 2022-05-10 北京市农林科学院信息技术研究中心 Underwater fish group activity intensity identification method and device
CN115063378A (en) * 2022-06-27 2022-09-16 中国平安财产保险股份有限公司 Intelligent counting method, device, equipment and storage medium
CN115063378B (en) * 2022-06-27 2023-12-05 中国平安财产保险股份有限公司 Intelligent point counting method, device, equipment and storage medium
CN115953725A (en) * 2023-03-14 2023-04-11 浙江大学 Fish egg automatic counting system based on deep learning and counting method thereof
CN116721132A (en) * 2023-06-20 2023-09-08 中国农业大学 Multi-target tracking method, system and equipment for industrially cultivated fishes
CN116721132B (en) * 2023-06-20 2023-11-24 中国农业大学 Multi-target tracking method, system and equipment for industrially cultivated fishes
CN117292305A (en) * 2023-11-24 2023-12-26 中国科学院水生生物研究所 Method, system, electronic equipment and medium for determining fetal movement times of fish fertilized eggs
CN117292305B (en) * 2023-11-24 2024-02-20 中国科学院水生生物研究所 Method, system, electronic equipment and medium for determining fetal movement times of fish fertilized eggs

Similar Documents

Publication Publication Date Title
CN112598713A (en) Offshore submarine fish detection and tracking statistical method based on deep learning
Yang et al. Computer vision models in intelligent aquaculture with emphasis on fish detection and behavior analysis: a review
Jia et al. Detection and segmentation of overlapped fruits based on optimized mask R-CNN application in apple harvesting robot
CN109766830B (en) Ship target identification system and method based on artificial intelligence image processing
CN111178197B (en) Mass R-CNN and Soft-NMS fusion based group-fed adherent pig example segmentation method
CN111046880A (en) Infrared target image segmentation method and system, electronic device and storage medium
Umamaheswari et al. Weed detection in farm crops using parallel image processing
CN113592896B (en) Fish feeding method, system, equipment and storage medium based on image processing
CN109685045A (en) A kind of Moving Targets Based on Video Streams tracking and system
CN110415208A (en) A kind of adaptive targets detection method and its device, equipment, storage medium
CN110853070A (en) Underwater sea cucumber image segmentation method based on significance and Grabcut
CN110827312A (en) Learning method based on cooperative visual attention neural network
CN114724022A (en) Culture fish school detection method, system and medium fusing SKNet and YOLOv5
Liu et al. A high-density fish school segmentation framework for biomass statistics in a deep-sea cage
Xia et al. In situ sea cucumber detection based on deep learning approach
CN115731282A (en) Underwater fish weight estimation method and system based on deep learning and electronic equipment
Hou et al. Detection and localization of citrus fruit based on improved You Only Look Once v5s and binocular vision in the orchard
Wang et al. Using an improved YOLOv4 deep learning network for accurate detection of whitefly and thrips on sticky trap images
Yu et al. U-YOLOv7: a network for underwater organism detection
Li et al. Fast recognition of pig faces based on improved Yolov3
Xu et al. Detection of bluefin tuna by cascade classifier and deep learning for monitoring fish resources
Hu et al. Automatic detection of pecan fruits based on Faster RCNN with FPN in orchard
CN114037737B (en) Neural network-based offshore submarine fish detection and tracking statistical method
Siripattanadilok et al. Recognition of partially occluded soft-shell mud crabs using Faster R-CNN and Grad-CAM
CN112308002B (en) Submarine organism identification and detection method based on single-stage deep learning network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210402