CN115496780A - Fish identification counting, speed measurement early warning and access quantity statistical system - Google Patents

Fish identification counting, speed measurement early warning and access quantity statistical system

Info

Publication number
CN115496780A
CN115496780A
Authority
CN
China
Prior art keywords
fish
algorithm
fishes
picture
coordinates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211148365.XA
Other languages
Chinese (zh)
Inventor
徐敬
廖文栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202211148365.XA priority Critical patent/CN115496780A/en
Publication of CN115496780A publication Critical patent/CN115496780A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30232Surveillance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30242Counting objects in image
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/80Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in fisheries management
    • Y02A40/81Aquaculture, e.g. of fish

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a fish identification counting, speed measurement early warning and in-out quantity counting system. A user can build a custom training data set suited to their own situation; by combining the trained weight file with the Deepsort-based multi-target tracking algorithm, the counting algorithm based on biconvex trajectory measurement, and the visual-gradient-based speed measurement algorithm used in the invention, the system realizes multi-target tracking of fish, counting the number of each fish species in the picture, calculating the number of fish entering and leaving the monitored picture, measuring the swimming speed of each fish, and related functions. The system can thereby achieve intelligent management and control of whichever ornamental fish the user needs to manage. The method also has good application extensibility: by adding further algorithms according to user requirements, many secondary developments are possible.

Description

Fish identification counting, speed measurement early warning and access quantity statistical system
Technical Field
The invention relates to the field of artificial intelligence, in particular to a fish identification counting, speed measurement early warning and in-out quantity counting system.
Background
In recent years, with the rise in the economic level of the Chinese people, artistic appreciation and leisure activities have gradually become popular, and keeping and raising ornamental fish is one such popular pastime: the beautiful sight of marine ornamental fish can be seen in many aquariums, shopping malls, hotels, tourist attractions, entertainment and exhibition venues, and even private homes. The scale of marine ornamental fish culture and exhibition has also grown in recent years, bringing a series of problems with it. Because marine ornamental fish command high prices and must satisfy the public's aesthetic expectations and bring viewers pleasure, they usually require fine-grained management and scientific rearing. Manual management, however, is time-consuming, labor-intensive, and only moderately effective, so artificial intelligence technology is urgently needed to alleviate the problem.
At present, artificial intelligence technology has only been applied to identifying fish species; no further research has been done to monitor additional fish indicators or to explore intelligent aquaculture in greater depth.
Disclosure of Invention
The invention provides a fish identification counting, speed measurement early warning and in-out quantity counting system to address the problems described in the background art. The system can dynamically track fish, count the number of each fish species in the picture, calculate the number of fish entering and leaving the monitored picture, and measure and give early warning on the swimming speed of each fish.
1. The technical scheme of the invention comprises the following steps:
s1: the method comprises the steps of collecting videos or pictures of 5 ornamental fishes as a data set, labeling the fish coordinates in the pictures by using a genie labeling assistant, and then generating a labeling file. The labeled data set is divided into a training set and a verification set by using a Python program script.
S2: and putting the data set into a network structure of YOLOv5 for training, and storing the trained weight file after the training is finished.
S3: and calling the weight file to perform target detection on the video frame, and detecting a fish target frame in the picture. And processing the fish by a Deepsort-based multi-target tracking algorithm and a visual gradient-based speed measurement algorithm to obtain the sequence number and the speed of the fish target frame.
S4: and then obtaining the number of the entering and leaving fishes through a counting algorithm based on biconvex trajectory measurement.
S5: and outputting a system detection result.
2. In some alternative embodiments, step S1 comprises:
s11: and recording the video of the ornamental fish, taking the photo of the ornamental fish and searching the video or the photo of the ornamental fish on the internet by using a camera, and taking the video and the photo as a training data set. The video is converted into pictures for training, so the following operations are performed. Simultaneously pressing a Windows key and an R key on a keyboard, inputting cmd, clicking an open command line window, and inputting pip install ffmpeg to install the ffmpeg (a toolkit in a command line). After ffmpeg is installed, ffmpeg is used in the command line to convert the video into a picture data set according to a certain sampling rate, and instructions such as: mpeg-i test. Mp4-r5-f image2 \ output \ frame _%05d. Jpg, where-i is followed by a video file suffixed mp 4; wherein-r is followed by a sampling rate, i.e. several pictures are split in one second of video; where-f is followed by the type of picture saved, and finally the output picture save folder and naming format, where% 05d indicates a sequence number of 5 digits, and the picture naming is continued from frame _00000.jpg, frame _00001.jpg, frame _00002.jpg \8230, 8230, all the time.
S12: a genie marking assistant (software for making a label on a data set) is used for marking the data set, several kinds of ornamental fishes are defined, and 5 kinds of ornamental fishes are used in the development of the system: clown Fish (Clowfish), spotted puffer Fish (Yellow Box Fish), horse Fish (Schooling Banner Fish), emperor Angel Fish (Emperor Angelfish), yellow Tang Fish (Yellow Tang Fish); then, a box is drawn with a mouse for all the ornamental fishes appearing in each picture, the ornamental fishes are framed, and an ornamental fish species, such as one of the aforementioned 5 species, is selected for each box. After all the annotations are finished, setting an export path and an export format, wherein the export format is selected as passacal-voc, and 5228 pictures generate 5228 annotation files with the suffixes of xml format.
S13: because the training uses a markup file suffixed with txt, then a markup file conversion is performed. The Python program script is used for batch conversion. Import the relevant library file using import os and import xml. Etree. Elementtree, convert the pixel coordinates in xml file to txt normalized coordinates, and the fish species are represented by 5 numbers 0,1,2,3, 4. The conversion equations are shown in equation 1, equation 2, equation 3, and equation 4.
x = (x_min + x_max) / (2 · width)  (formula 1)
y = (y_min + y_max) / (2 · height)  (formula 2)
w = (x_max − x_min) / width  (formula 3)
h = (y_max − y_min) / height  (formula 4)
where (x, y) — normalized coordinates of the center of the box
w, h — normalized width and height of the box
(x_min, y_min) — coordinates of the upper-left corner of the box
(x_max, y_max) — coordinates of the lower-right corner of the box
width, height — width and height of the entire picture
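The conversion of S13 (formulas 1 to 4) can be sketched as follows; the element names assume the standard Pascal VOC annotation layout, and the class list is taken from S12, but the patent's actual script is not given.

```python
import xml.etree.ElementTree as ET

# Class index 0..4, in the order listed in step S12 (assumed label spellings).
CLASSES = ["Clownfish", "Yellow Box Fish", "Schooling Banner Fish",
           "Emperor Angelfish", "Yellow Tang Fish"]

def voc_box_to_yolo(x_min, y_min, x_max, y_max, width, height):
    # Formulas 1-4: normalized center coordinates and box size.
    x = (x_min + x_max) / (2 * width)
    y = (y_min + y_max) / (2 * height)
    w = (x_max - x_min) / width
    h = (y_max - y_min) / height
    return x, y, w, h

def convert_annotation(xml_text):
    """Turn one Pascal VOC annotation into YOLO txt lines (assumed layout)."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = float(size.find("width").text)
    height = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        cls = CLASSES.index(obj.find("name").text)
        b = obj.find("bndbox")
        x, y, w, h = voc_box_to_yolo(float(b.find("xmin").text),
                                     float(b.find("ymin").text),
                                     float(b.find("xmax").text),
                                     float(b.find("ymax").text),
                                     width, height)
        lines.append(f"{cls} {x:.6f} {y:.6f} {w:.6f} {h:.6f}")
    return lines

sample = """<annotation><size><width>1920</width><height>1280</height></size>
<object><name>Clownfish</name><bndbox><xmin>0</xmin><ymin>0</ymin>
<xmax>960</xmax><ymax>640</ymax></bndbox></object></annotation>"""
print(convert_annotation(sample))  # ['0 0.250000 0.250000 0.500000 0.500000']
```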
3. In some alternative embodiments, step S2 comprises:
s21: in the Input part of the network fabric. (1) Firstly conducting Mosaic data enhancement, and carrying out zooming, distribution and cutting on 4 input images by the algorithm to randomly splice the images. The method enables input images to be diversified, the trained model is more beneficial to detecting small objects, and meanwhile, originally processed 4 pictures are changed into 1 picture, so that the calculated amount is reduced. (2) Then, an adaptive picture scaling algorithm is carried out, and the algorithm is used during testing and model inference, so that the inference speed of the algorithm is greatly improved. (3) And then, performing a self-adaptive anchor frame calculation algorithm, putting a program corresponding to the tested initial anchor frame value into the original code, and obtaining the optimal anchor frame value each time the data set is trained.
S22: in the BackBone part of the network structure, in the Focus structure, a picture is sliced, then a Concat operation is carried out, and finally the picture becomes an easily processed characteristic diagram after a convolution operation. This reduces floating point operations and speeds up computation. For example, the 796 × 796 × 3 picture is sliced, spliced and convolved to become a 398 × 398 × 12 feature map. Then, the feature map is processed by a CSPNet structure, the operation of the structure mainly splits the feature map into two parts, one part is processed by convolution operation, and the other part is spliced with the result of the convolution operation of the previous part. This structure can reduce the amount of calculation, but the improvement in precision is small.
S23: in the Neck part of the network structure, an FPN-PAN structure is used, multi-dimensional feature extraction is carried out on the structure, and the receptive field is greatly increased.
S24: in the Head part of the network structure, a CIOU Loss function is used as a regression Loss function of the bounding box, the Loss function considers more comprehensive information, and the actual effect is better. In the post-processing stage of target detection, non-maximum suppression operation is performed for screening of a plurality of target frames. From FIG. 6, equation 5 for calculating CIOU Loss is derived.
Figure BDA0003855731070000025
Distance _ C in the formula-the diagonal length of the minimum bounding rectangle
Distance _ 2-length of line connecting center points of two bounding boxes
IOU-cross-over ratio
v-measure aspect ratio uniformity parameter
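A minimal sketch of formula 5 follows, assuming it matches the standard CIoU definition (with the aspect-ratio term weighted by α = v / ((1 − IOU) + v)); the box format and helper names are illustrative.

```python
import math

def ciou_loss(box_a, box_b):
    """CIOU Loss for two (x_min, y_min, x_max, y_max) boxes, following the
    standard CIoU definition that formula 5 appears to describe."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection over union.
    ix = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # Distance_2 squared: distance between the two box centers.
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
       + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Distance_C squared: diagonal of the minimum enclosing rectangle.
    dc = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
       + (max(ay2, by2) - min(ay1, by1)) ** 2
    # v measures aspect-ratio consistency; alpha = v / ((1 - IOU) + v).
    v = (4 / math.pi ** 2) * (math.atan((ax2 - ax1) / (ay2 - ay1))
                              - math.atan((bx2 - bx1) / (by2 - by1))) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)
    return 1 - iou + d2 / dc + alpha * v

print(round(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)), 6))  # identical boxes -> 0.0
```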
4. In some alternative embodiments, step S3 comprises:
s31: the method mainly comprises the following steps of a Deepsort-based multi-target tracking algorithm: (1) Firstly, a trained fish classification weight file is used for detecting a fish target in a video frame to obtain characteristic information, wherein the characteristic information comprises position information, category information, confidence information and the like of the fish. (2) The matching target is carried out by calculating the matching degree of the two frames of image information before and after, and then the serial number is distributed to each tracked target, so that the dynamic tracking of various fishes is realized. The Deepsort algorithm adds cascade matching and new track confirmation, and solves the problem that the object in the Sort algorithm is disconnected due to shielding. The tracks are divided into acknowledged tracks and unacknowledged tracks. The newly generated track must be continuously matched with the detector for a plurality of times, and the unconfirmed track can be converted into the confirmed track; the trace of the confirmation state is continuously disconnected with the detector for a plurality of times, and then the trace of the confirmation state is deleted.
S32: the core idea of the speed measurement algorithm based on the visual gradient is as follows. Pixel coordinate recording (1920 × 1280 resolution) is performed first: and (4) recording all the serial numbers of the fishes and the center XY coordinates detected and sequenced by the Deepsort algorithm in a list. When a plurality of video frames are spaced and the same ID number is tracked, the pixel point speed is calculated, and the calculation formulas are formula 6 and formula 7.
(x-x -1 ) 2 +(y-y -1 ) 2 =Distance 2 (formula 6)
Figure BDA0003855731070000031
Where x, y-position coordinates after movement
x -1 ,y -1 Position coordinates before movement
Distance-Distance of moving position
t-time of movement
Figure BDA0003855731070000032
-pixel point velocity
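Formulas 6 and 7 amount to the following computation for one fish ID tracked across a frame interval; the coordinates and time value are illustrative.

```python
import math

def pixel_speed(prev, curr, t):
    """Formulas 6 and 7: Euclidean pixel distance between two tracked
    center coordinates of the same fish ID, divided by the elapsed time t."""
    distance = math.hypot(curr[0] - prev[0], curr[1] - prev[1])
    return distance / t

# A fish center moves 3 px right and 4 px down over 0.5 s: 5 px / 0.5 s.
print(pixel_speed((100, 200), (103, 204), 0.5))  # 10.0
```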
The speed of movement in a video frame, i.e. the number of pixels moved per unit time, is called the pixel speed. The pixel point speed still differs from the actual speed, so the physical model and lens tilt angle of the camera must be analyzed. In the same time and over the same swimming distance, a fish near the camera passes through more pixel points, while a fish far from the camera passes through fewer. Therefore the former must be scaled down and the latter scaled up by a proportionality coefficient, but this parameter is difficult to find by hand. The best solution is to measure several groups of actual speeds and pixel point speeds and solve the equation by machine learning, as shown in formula 8.
v = λ · v_pixel, λ = Σ_{i=0}^{n} s_i · y^i  (formula 8)
where λ — proportionality coefficient between the actual speed and the pixel speed
s_i — parameters solved by machine learning from several groups of actual speeds and pixel point speeds
y — vertical-axis pixel coordinate
n — an integer, chosen according to the actual situation
v — actual speed
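Reading formula 8 as a polynomial scale factor λ(y) = Σ s_i · y^i fitted by least squares (the polynomial form and the degree are assumptions about the garbled original), the calibration could be sketched as:

```python
import numpy as np

def fit_scale_parameters(y_coords, pixel_speeds, actual_speeds, n=2):
    """Least-squares fit of the parameters s_i in formula 8, read here as a
    polynomial scale factor lambda(y) = sum_i s_i * y**i such that
    v = lambda(y) * v_pixel. The polynomial form and degree n are assumptions."""
    # Each row of the design matrix is v_pixel * [1, y, y^2, ..., y^n].
    A = np.vander(np.asarray(y_coords), n + 1, increasing=True)
    A = A * np.asarray(pixel_speeds)[:, None]
    s, *_ = np.linalg.lstsq(A, np.asarray(actual_speeds), rcond=None)
    return s

def actual_speed(s, y, v_pixel):
    # Formula 8: scale the pixel speed by the fitted polynomial in y.
    return v_pixel * sum(si * y ** i for i, si in enumerate(s))

# Synthetic check: data generated with lambda(y) = 0.5 + 0.001*y is recovered.
ys = np.array([100.0, 400.0, 700.0, 1000.0, 1200.0])
vp = np.array([10.0, 8.0, 12.0, 6.0, 9.0])
va = vp * (0.5 + 0.001 * ys)
s = fit_scale_parameters(ys, vp, va, n=1)
print(np.round(s, 4))
```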
5. In some alternative embodiments, step S4 comprises:
s41: the core idea of the counting algorithm based on the biconvex trajectory estimation is as follows. (1) drawing two convex detection lines on the video: green line, yellow line. (2) And continuously comparing the fish position coordinates with the coordinates of the two lines. And counting when the coordinates of the monitoring points are crossed with the two convex detection lines. (3) The fish position coordinate firstly passes through a green line and then a yellow line, the fish is judged to enter a monitoring area, and the number of the fishes is added by 1; the fish position coordinate passes through a yellow line and then a green line, the fish is judged to leave the monitoring area, and the number of the fishes is reduced by 1.
6. In general, compared with the prior art, the above technical solution of the invention achieves the following beneficial effects:
(1) Using a Deepsort-based multi-target tracking algorithm, a counting algorithm based on biconvex trajectory measurement and a visual-gradient-based speed measurement algorithm, the invention realizes multi-target tracking of fish, counting the number of each fish species in the picture, calculating the number of fish entering and leaving the monitored picture, measuring the swimming speed of each fish, and related functions.
(2) The invention can collect data sets for training tailored to the ornamental fish that the user's intelligent management and control system needs to manage, achieving intelligent management and control.
(3) The invention has good application extensibility: by adding further algorithms according to user requirements, many secondary developments are possible, such as fish speed early warning, detection of fish jumping out of the tank, and fish-theft alarms.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention.
FIG. 2 is a graph of the effect of weight file detection of the present invention.
Fig. 3 is the Mosaic data enhancement of the present invention.
Fig. 4 is the adaptive picture scaling of the present invention.
Fig. 5 is a FPN-PAN structure of the present invention.
FIG. 6 is a CIOU Loss schematic diagram of the present invention.
Fig. 7 is a non-maxima suppression operation of the present invention.
FIG. 8 is a workflow of the Deepsort algorithm of the present invention.
Fig. 9 is a schematic diagram of a biconvex trajectory estimation of the invention.
FIG. 10 shows the results of the system test of the present invention.
Detailed Description
The technical scheme of the invention is further explained by combining the attached drawings.
A fish identification counting, speed measurement early warning and access quantity statistical system comprises the following steps:
s1: the method comprises the steps of collecting 5 ornamental fish videos or pictures as a data set, labeling the fish coordinates in the pictures by using a genie labeling assistant, and then generating a labeling file. The labeled data set is divided into a training set and a verification set by using a Python program script.
In this embodiment, step S1 may be implemented by:
s11: and recording the video of the ornamental fish, taking the photo of the ornamental fish and searching the video or the photo of the ornamental fish on the internet by using a camera, and taking the video and the photo as a training data set. The video is converted into pictures for training, so the following operations are carried out. And simultaneously pressing a Windows key and an R key on a keyboard, inputting a cmd, clicking to open a command line window, and inputting pip install ffmpeg to install the ffmpeg (a tool kit in a command line). After ffmpeg is installed, ffmpeg is used in the command line to convert the video into a picture data set according to a certain sampling rate, and instructions such as: mpeg-i test. Mp4-r5-f image2 \ output \ frame _%05d.jpg, where-i is followed by a video file with mp4 suffix; wherein-r is followed by a sampling rate, i.e. several pictures are split in one second of video; where-f is followed by the type of saved pictures and finally the output picture deposit folder and naming format, where% 05d indicates a sequence number of 5 digits, picture naming is continued from frame _00000.jpg, frame _00001.jpg, frame _00002.jpg \8230, 8230, and so forth.
S12: using genius marking assistant (a piece of software for making label on the data set) to mark the data set, firstly defining several kinds of ornamental fishes, and using 5 kinds of ornamental fishes in the system development: clown Fish (Clowfish), spotted-spotted globefish (Yellow Box Fish), horse-Fish (Schooling Banner Fish), imperial Emperor Angelfish (Emperor Angelfish), yellow Tang Fish (Yellow Tang Fish); then, a box is drawn with a mouse for all the ornamental fishes appearing in each picture, the ornamental fishes are framed, and an ornamental fish species, such as one of the aforementioned 5 species, is selected for each box. After all the annotations are finished, setting an export path and an export format, wherein the export format is selected as passacal-voc, and 5228 pictures generate 5228 annotation files with the suffixes of xml format.
S13: because the training uses a markup file suffixed with txt, then a markup file conversion is performed. The Python program script is used for batch conversion. Import the relevant library file using import os and import xml. Etree. Elementtree, convert the pixel coordinates in xml file to txt normalized coordinates, and the fish species are represented by 5 numbers 0,1,2,3, 4. The conversion equations are shown in equation 1, equation 2, equation 3, and equation 4.
x = (x_min + x_max) / (2 · width)  (formula 1)
y = (y_min + y_max) / (2 · height)  (formula 2)
w = (x_max − x_min) / width  (formula 3)
h = (y_max − y_min) / height  (formula 4)
where (x, y) — normalized coordinates of the center of the box
w, h — normalized width and height of the box
(x_min, y_min) — coordinates of the upper-left corner of the box
(x_max, y_max) — coordinates of the lower-right corner of the box
width, height — width and height of the entire picture
S2: putting the data set into a network structure of YOL0v5 for training, and after the training is finished, saving the trained weight file, wherein the detection effect of the weight file is shown in FIG. 2.
In this embodiment, step S2 can be implemented by:
s21: in the Input part of the network fabric. (1) Firstly, conducting Mosaic data enhancement, and the algorithm randomly splices 4 input images through zooming, distribution and cutting, as shown in fig. 3. The method enables input images to be more diversified, the trained model is more beneficial to detecting small objects, and meanwhile, originally, 4 images are processed into 1 image, so that the calculated amount is reduced. (2) Then, an adaptive picture scaling algorithm is performed, and the algorithm is used during testing and model inference, which greatly improves the inference speed of the algorithm, as shown in fig. 4. (3) And then, performing a self-adaptive anchor frame calculation algorithm, putting a program corresponding to the tested initial anchor frame value into the original code, and obtaining the optimal anchor frame value each time the data set is trained.
S22: in the BackBone part of the network structure, in the Focus structure, a picture is sliced, then Concat operation is carried out, and finally the picture becomes an easily processed characteristic diagram after convolution operation. This reduces floating point operations and speeds up computation. For example, the 796 × 796 × 3 picture is sliced, spliced and convolved to become a 398 × 398 × 12 feature map. And then, through a CSPNet structure, the operation of the structure mainly splits the characteristic diagram into two parts, one part is subjected to convolution operation, and the other part is spliced with the result of the convolution operation of the previous part. This structure can reduce the amount of calculation, but the precision improvement is small.
S23: in the hack part of the network structure, an FPN-PAN structure is used, and the structure carries out multi-dimensional feature extraction, so that the receptive field is greatly increased, as shown in FIG. 5.
S24: in the Head part of the network structure, a CIOU Loss function is used as a regression Loss function of the bounding box, the Loss function considers more comprehensive information, and the actual effect is better. In the post-processing stage of target detection, a non-maximum suppression operation is performed for the screening of a large number of target frames, as shown in fig. 7. From FIG. 6, equation 5 for calculating CIOU Loss is derived.
Figure BDA0003855731070000053
Distance _ C in the formula-the diagonal length of the minimum bounding rectangle
Distance _ 2-length of line connecting center points of two bounding boxes
IOU-cross-over ratio
v-measure aspect ratio uniformity parameter
S3: and calling the weight file to perform target detection on the video frame, and detecting a fish target frame in the picture. And processing the fish target frames by a Deepsort-based multi-target tracking algorithm and a visual gradient-based speed measurement algorithm to obtain the serial numbers and the speeds of the fish target frames.
In this embodiment, step S3 can be implemented by:
s31: the method mainly comprises the following steps of a multi-target tracking algorithm based on Deepsort: (1) Firstly, a trained fish classification weight file is used for detecting a fish target in a video frame to obtain characteristic information, wherein the characteristic information comprises position information, category information, confidence information and the like of the fish. (2) The matching target is carried out by calculating the matching degree of the two frames of image information before and after, and then the serial number is distributed to each tracked target, so that the dynamic tracking of various fishes is realized. The Deepsort algorithm adds cascade matching and new track confirmation, and solves the problem that the object in the Sort algorithm is disconnected due to shielding. The trajectories are divided into confirmed state trajectories and unconfirmed state trajectories. The newly generated track must be continuously matched with the detector for a plurality of times, and the unconfirmed track can be converted into the confirmed track; the confirmation state track is continuous and the detector is disconnected for a plurality of times, and then the confirmation state track is deleted. The workflow of the Deepsort algorithm is shown in FIG. 8.
S32: the core idea of the speed measurement algorithm based on the visual gradient is as follows. Pixel coordinate recording (1920 × 1280 resolution) is performed first: and (4) recording all the fish ID serial numbers and center XY coordinates detected and sequenced by the Deepsort algorithm into a list. When a plurality of video frames are spaced and the same ID number is tracked, the pixel point speed is calculated, and the calculation formulas are formula 6 and formula 7.
(x-x -1 ) 2 +(y-y -1 ) 2 =Distance 2 (formula 6)
Figure BDA0003855731070000061
Where x, y-position coordinates after movement
x -1 ,y -1 -position coordinates before movement
Distance-Distance of moving position
t-time of movement
Figure BDA0003855731070000062
-pixel point velocity
The speed of movement in a video frame, i.e. the number of pixels moved per unit time, is called the pixel speed. The pixel point speed still differs from the actual speed, so the physical model and lens tilt angle of the camera must be analyzed. In the same time and over the same swimming distance, a fish near the camera passes through more pixel points, while a fish far from the camera passes through fewer. Therefore the former must be scaled down and the latter scaled up by a proportionality coefficient, but this parameter is difficult to find by hand. The best solution is to measure several groups of actual speeds and pixel point speeds and solve the equation by machine learning, as shown in formula 8.
v = λ · v_pixel, λ = Σ_{i=0}^{n} s_i · y^i  (formula 8)
where λ — proportionality coefficient between the actual speed and the pixel speed
s_i — parameters solved by machine learning from several groups of actual speeds and pixel point speeds
y — vertical-axis pixel coordinate
n — an integer, chosen according to the actual situation
v — actual speed
S4: and then obtaining the number of the entering and leaving fishes through a counting algorithm based on biconvex trajectory measurement.
In this embodiment, step S4 can be implemented by:
s41: the core idea of the counting algorithm based on the biconvex trajectory estimation is as follows. (1) drawing two convex detection lines on a video: green line, yellow line, as shown in fig. 9. (2) And continuously comparing the fish position coordinates with the coordinates of the two lines. And counting when the coordinates of the monitoring points are crossed with the two convex detection lines. (3) The fish position coordinate firstly passes through a green line and then a yellow line, the fish is judged to enter a monitoring area, and the number of the fishes is added by 1; the fish position coordinate passes through a yellow line and then a green line, the fish is judged to leave the monitoring area, and the number of the fishes is reduced by 1.
After the above steps, the system detection result is output, as shown in fig. 10.
It should be noted that, according to the implementation requirement of the method, each step described in the present application can be divided into more steps, or two or more steps or partial operations of the steps can be combined into a new step, so as to achieve the purpose of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (5)

1. A fish identification counting, speed-measurement early-warning and entry/exit quantity statistics system, characterized in that it comprises:
S1: collecting videos or pictures of 5 species of ornamental fish as a data set, labeling the fish coordinates in the pictures with the Genie labeling assistant to generate annotation files, and then dividing the labeled data set into a training set and a validation set using a Python script.
S2: feeding the data set into the YOLOv5 network structure for training, and saving the trained weight file when training finishes.
S3: calling the weight file to perform target detection on video frames and detect the fish target boxes in the picture, then processing them with a DeepSORT-based multi-target tracking algorithm and a visual-gradient-based speed measurement algorithm to obtain the serial number and speed of each fish target box.
S4: then obtaining the numbers of entering and leaving fish through a counting algorithm based on biconvex detection lines.
S5: outputting the system detection result.
2. The method according to claim 1, wherein step S1 comprises:
S11: Using a camera, record videos and take photos of ornamental fish, and also search the internet for ornamental fish videos and photos, to serve as the training data set. Since training uses pictures, the videos are converted to pictures as follows. Press the Windows and R keys simultaneously, type cmd, click to open a command-line window, and enter pip install ffmpeg to install ffmpeg (a command-line toolkit). After ffmpeg is installed, use it on the command line to convert a video into a picture data set at a chosen sampling rate, for example: ffmpeg -i test.mp4 -r 5 -f image2 .\output\frame_%05d.jpg, where -i is followed by the video file with the mp4 suffix; -r is followed by the sampling rate, i.e. how many pictures one second of video is split into; -f is followed by the type of saved pictures; and finally come the output folder and naming format, where %05d denotes a 5-digit sequence number, so pictures are named consecutively frame_00000.jpg, frame_00001.jpg, frame_00002.jpg, and so on.
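The extraction command and the %05d naming convention can be illustrated in Python (ffmpeg_cmd and frame_name are hypothetical helper names; actually running the command assumes ffmpeg is on the PATH):

```python
import shlex

def ffmpeg_cmd(video, out_dir, fps=5):
    """Build the frame-extraction command line described above as an
    argument list suitable for subprocess.run."""
    return shlex.split(f"ffmpeg -i {video} -r {fps} -f image2 {out_dir}/frame_%05d.jpg")

def frame_name(i):
    """The %05d pattern zero-pads the frame index to five digits."""
    return f"frame_{i:05d}.jpg"
```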
S12: The Genie labeling assistant (software for labeling a data set) is used to annotate the data set. The ornamental fish classes are defined first; 5 species were used in developing this system: Clownfish, Yellow Box Fish, Schooling Banner Fish, Emperor Angelfish, and Yellow Tang Fish. Then, for every ornamental fish appearing in each picture, a box is drawn with the mouse to frame the fish, and one of the aforementioned 5 species is selected for each box. After all annotations are complete, the export path and export format are set; the export format selected is pascal-voc, and the 5228 pictures generate 5228 annotation files with the xml suffix.
S13: Because training uses annotation files with the txt suffix, the annotation files are then converted in batch with a Python script. The script uses import os and import xml.etree.ElementTree to load the relevant libraries, converts the pixel coordinates in the xml files to normalized txt coordinates, and represents the 5 fish species by the numbers 0, 1, 2, 3 and 4. The conversion equations are Equation 1, Equation 2, Equation 3 and Equation 4.
x = (x_min + x_max) / (2 · width) (Equation 1)
y = (y_min + y_max) / (2 · height) (Equation 2)
w = (x_max − x_min) / width (Equation 3)
h = (y_max − y_min) / height (Equation 4)
where (x, y) - normalized coordinates of the box center
w, h - normalized box width and height
(x_min, y_min) - coordinates of the upper-left corner of the box
(x_max, y_max) - coordinates of the lower-right corner of the box
width, height - width and height of the entire picture.
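Equations 1-4 amount to the following conversion (a sketch; the function name is illustrative):

```python
def voc_to_yolo(x_min, y_min, x_max, y_max, width, height):
    """Convert pascal-voc pixel corner coordinates to YOLO-style
    normalized (x_center, y_center, w, h) per Equations 1-4."""
    x = (x_min + x_max) / 2.0 / width    # Equation 1
    y = (y_min + y_max) / 2.0 / height   # Equation 2
    w = (x_max - x_min) / width          # Equation 3
    h = (y_max - y_min) / height         # Equation 4
    return x, y, w, h
```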
3. The method according to claim 2, wherein step S2 comprises:
S21: In the Input part of the network structure: (1) Mosaic data enhancement is performed first; the algorithm randomly splices 4 input images by scaling, arrangement and cropping. This makes the input images more diverse, makes the trained model better at detecting small objects, and reduces computation because the original 4 images are processed as 1. (2) Adaptive picture scaling is then performed; this algorithm is used during testing and model inference and greatly improves inference speed. (3) Finally, adaptive anchor-box calculation is performed: the program for the tested initial anchor-box values is embedded in the original code, so the optimal anchor-box values are obtained each time the data set is trained.
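A crude NumPy sketch of the Mosaic idea in step (1) (illustrative only; YOLOv5's real implementation also remaps the label boxes and combines further augmentations, and the sizes here are arbitrary):

```python
import numpy as np

def mosaic4(imgs, out_size=640, rng=None):
    """Tile 4 images into one out_size x out_size canvas around a
    random split point, using a crude nearest-neighbour resize."""
    if rng is None:
        rng = np.random.default_rng()
    assert len(imgs) == 4
    # random split point, kept away from the borders
    cx = int(rng.integers(out_size // 4, 3 * out_size // 4))
    cy = int(rng.integers(out_size // 4, 3 * out_size // 4))
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    regions = [(0, cy, 0, cx), (0, cy, cx, out_size),
               (cy, out_size, 0, cx), (cy, out_size, cx, out_size)]
    for img, (y0, y1, x0, x1) in zip(imgs, regions):
        # nearest-neighbour resample of img to the quadrant size
        ys = np.linspace(0, img.shape[0] - 1, y1 - y0).astype(int)
        xs = np.linspace(0, img.shape[1] - 1, x1 - x0).astype(int)
        canvas[y0:y1, x0:x1] = img[ys][:, xs]
    return canvas
```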
S22: In the Backbone part of the network structure, the Focus structure slices the picture, performs a Concat operation, and finally produces an easily processed feature map after a convolution; this reduces floating-point operations and speeds up computation. For example, a 796 × 796 × 3 picture is sliced, spliced and convolved into a 398 × 398 × 12 feature map. The feature map is then processed by the CSPNet structure, which splits it into two parts: one part goes through a convolution operation and the other is spliced with the result of that convolution. This structure reduces the amount of calculation, although the precision improvement is small.
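The Focus slicing operation can be sketched with NumPy: every second pixel is taken at four phase offsets and the four sub-images are concatenated along the channel axis, halving the spatial size and quadrupling the channels:

```python
import numpy as np

def focus_slice(img):
    """Focus slicing: (H, W, C) -> (H/2, W/2, 4C) for even H and W."""
    return np.concatenate([img[0::2, 0::2], img[1::2, 0::2],
                           img[0::2, 1::2], img[1::2, 1::2]], axis=-1)
```

This reproduces the 796 × 796 × 3 to 398 × 398 × 12 example above; the subsequent convolution is applied on top of the sliced tensor.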
S23: In the Neck part of the network structure, an FPN-PAN structure is used to perform multi-scale feature extraction, which greatly increases the receptive field.
S24: In the Head part of the network structure, the CIOU Loss function is used as the bounding-box regression loss; this loss considers more comprehensive information and performs better in practice. In the post-processing stage of target detection, non-maximum suppression is applied to screen the multiple target boxes. From FIG. 6, Equation 5 for calculating the CIOU Loss is derived.
CIOU_Loss = 1 − IOU + Distance_2² / Distance_C² + α·v, with α = v / ((1 − IOU) + v) (Equation 5)
where Distance_C - diagonal length of the minimum circumscribed rectangle
Distance_2 - length of the line connecting the center points of the two bounding boxes
IOU - intersection-over-union ratio
v - aspect-ratio consistency parameter.
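A sketch of Equation 5 in Python, using the standard CIOU definitions for the distance and aspect-ratio terms; the weight α = v / ((1 − IOU) + v) follows the common CIOU formulation and is an assumption where the patent's figure is not reproduced:

```python
import math

def ciou_loss(box1, box2):
    """CIOU loss for two boxes given as (x_min, y_min, x_max, y_max)."""
    x1a, y1a, x2a, y2a = box1
    x1b, y1b, x2b, y2b = box2
    # IOU: intersection over union
    iw = max(0.0, min(x2a, x2b) - max(x1a, x1b))
    ih = max(0.0, min(y2a, y2b) - max(y1a, y1b))
    inter = iw * ih
    union = (x2a - x1a) * (y2a - y1a) + (x2b - x1b) * (y2b - y1b) - inter
    iou = inter / union
    # Distance_2 squared: distance between the two box centers
    d2 = ((x1a + x2a) / 2 - (x1b + x2b) / 2) ** 2 \
       + ((y1a + y2a) / 2 - (y1b + y2b) / 2) ** 2
    # Distance_C squared: diagonal of the minimum circumscribed rectangle
    c2 = (max(x2a, x2b) - min(x1a, x1b)) ** 2 \
       + (max(y2a, y2b) - min(y1a, y1b)) ** 2
    # v: aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (math.atan((x2a - x1a) / (y2a - y1a))
                              - math.atan((x2b - x1b) / (y2b - y1b))) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)  # epsilon avoids 0/0 for identical boxes
    return 1 - iou + d2 / c2 + alpha * v
```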
4. The method according to claim 3, wherein step S3 comprises:
S31: The DeepSORT-based multi-target tracking algorithm proceeds as follows. (1) The trained fish-classification weight file is first used to detect fish targets in a video frame and obtain feature information, including each fish's position, category and confidence. (2) Targets are matched by computing the matching degree between the information of two consecutive frames, and a serial number is then assigned to each tracked target, realizing dynamic tracking of multiple fish species. The DeepSORT algorithm adds cascade matching and new-track confirmation, solving the track-loss problem that occlusion causes in the SORT algorithm. Tracks are divided into confirmed and unconfirmed tracks. A newly generated track must be matched by the detector multiple consecutive times before the unconfirmed track is converted into a confirmed track; a confirmed track that fails to match the detector multiple consecutive times is deleted.
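The confirmation logic in S31 can be sketched as a small state machine (the threshold names N_INIT and MAX_AGE follow common DeepSORT implementations and are assumptions here, not values stated in the patent):

```python
N_INIT = 3    # consecutive detector matches needed to confirm a track
MAX_AGE = 30  # consecutive misses after which a confirmed track is deleted

class Track:
    """Tentative tracks are promoted after repeated matches and deleted
    quickly when unmatched; confirmed tracks survive brief occlusion."""

    def __init__(self, track_id):
        self.track_id = track_id
        self.state = "tentative"
        self.hits = 1
        self.misses = 0

    def mark_matched(self):
        self.hits += 1
        self.misses = 0
        if self.state == "tentative" and self.hits >= N_INIT:
            self.state = "confirmed"

    def mark_missed(self):
        self.misses += 1
        if self.state == "tentative" or self.misses > MAX_AGE:
            self.state = "deleted"
```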
S32: The core idea of the visual-gradient-based speed measurement algorithm is as follows. Pixel coordinates are first recorded (at 1920 × 1280 resolution): all fish ID serial numbers and center XY coordinates detected and ordered by the DeepSORT algorithm are written into a list. When several video frames have elapsed and the same ID number is still tracked, the pixel speed is calculated using Equation 6 and Equation 7.
(x − x₋₁)² + (y − y₋₁)² = Distance² (Equation 6)
v̂ = Distance / t (Equation 7)
where x, y - position coordinates after movement
x₋₁, y₋₁ - position coordinates before movement
Distance - distance moved
t - elapsed time
v̂ - pixel point speed
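Equations 6 and 7 amount to a Euclidean distance over the tracked centers divided by the elapsed time (a minimal sketch; the helper name is illustrative):

```python
import math

def pixel_speed(pt_prev, pt_curr, t):
    """Pixel point speed: distance moved between two tracked center
    coordinates (Equation 6) divided by the elapsed time t (Equation 7)."""
    distance = math.hypot(pt_curr[0] - pt_prev[0], pt_curr[1] - pt_prev[1])
    return distance / t
```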
The speed of movement in the video frame, i.e. the number of pixels traversed per unit time, is called the pixel speed. The pixel speed still differs from the actual speed; converting between them requires analyzing the physical model of the shooting camera and the tilt angle of its lens. In the same time and over the same swimming distance, a fish near the camera traverses more pixels, while a fish far from the camera traverses fewer pixels. The former therefore needs to be scaled down and the latter scaled up by a scaling factor, but such a parameter is difficult to determine manually. The best solution is to measure multiple sets of actual speeds and pixel speeds and solve for the parameters by machine learning, as shown in Equation 8.
v = λ·v̂, with λ = s_0 + s_1·y + … + s_n·y^n (Equation 8)
where λ - proportionality coefficient between the actual speed and the pixel speed
s_i - parameters to be solved by machine learning from the multiple sets of actual speeds and pixel speeds
y - vertical-axis pixel coordinate
n - an integer chosen according to actual conditions
v - actual speed
v̂ - pixel speed.
5. The method according to claim 4, wherein step S4 comprises:
S41: The core idea of the counting algorithm based on biconvex detection lines is as follows. (1) Two convex detection lines are drawn on the video: a green line and a yellow line. (2) The fish position coordinates are continuously compared with the coordinates of the two lines, and counting is triggered when a monitored coordinate crosses either detection line. (3) If the fish position coordinate crosses the green line first and then the yellow line, the fish is judged to have entered the monitored area and the fish count is increased by 1; if it crosses the yellow line first and then the green line, the fish is judged to have left the monitored area and the fish count is decreased by 1.
CN202211148365.XA 2022-09-21 2022-09-21 Fish identification counting, speed measurement early warning and access quantity statistical system Pending CN115496780A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211148365.XA CN115496780A (en) 2022-09-21 2022-09-21 Fish identification counting, speed measurement early warning and access quantity statistical system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211148365.XA CN115496780A (en) 2022-09-21 2022-09-21 Fish identification counting, speed measurement early warning and access quantity statistical system

Publications (1)

Publication Number Publication Date
CN115496780A true CN115496780A (en) 2022-12-20

Family

ID=84471375

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211148365.XA Pending CN115496780A (en) 2022-09-21 2022-09-21 Fish identification counting, speed measurement early warning and access quantity statistical system

Country Status (1)

Country Link
CN (1) CN115496780A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116363494A (en) * 2023-05-31 2023-06-30 睿克环境科技(中国)有限公司 Fish quantity monitoring and migration tracking method and system


Similar Documents

Publication Publication Date Title
Jia et al. Detection and segmentation of overlapped fruits based on optimized mask R-CNN application in apple harvesting robot
Wu et al. Using channel pruning-based YOLO v4 deep learning algorithm for the real-time and accurate detection of apple flowers in natural environments
Gai et al. A detection algorithm for cherry fruits based on the improved YOLO-v4 model
CN106127204B (en) A kind of multi-direction meter reading Region detection algorithms of full convolutional neural networks
CN109948425B (en) Pedestrian searching method and device for structure-aware self-attention and online instance aggregation matching
CN106096577B (en) A kind of target tracking method in camera distribution map
Shao et al. Deeply learned attributes for crowded scene understanding
CN109815364A (en) A kind of massive video feature extraction, storage and search method and system
CN109033998A (en) Remote sensing image atural object mask method based on attention mechanism convolutional neural networks
CN103324677B (en) Hierarchical fast image global positioning system (GPS) position estimation method
CN112115906A (en) Open dish identification method based on deep learning target detection and metric learning
Liu et al. Super-pixel cloud detection using hierarchical fusion CNN
CN106537390A (en) Identifying presentation styles of educational videos
Rong et al. Pest identification and counting of yellow plate in field based on improved mask r-cnn
JP6787831B2 (en) Target detection device, detection model generation device, program and method that can be learned by search results
CN110827312A (en) Learning method based on cooperative visual attention neural network
CN108229432A (en) Face calibration method and device
TW202240471A (en) Methods, apparatuses, devices, and storage media for detecting target
CN115496780A (en) Fish identification counting, speed measurement early warning and access quantity statistical system
CN113362279A (en) Intelligent concentration detection method of immunochromatographic test paper
Yu et al. TasselLFANet: a novel lightweight multi-branch feature aggregation neural network for high-throughput image-based maize tassels detection and counting
CN115330833A (en) Fruit yield estimation method with improved multi-target tracking
CN112925470B (en) Touch control method and system of interactive electronic whiteboard and readable medium
Shuai et al. An improved YOLOv5-based method for multi-species tea shoot detection and picking point location in complex backgrounds
KR101313285B1 (en) Method and Device for Authoring Information File of Hyper Video and Computer-readable Recording Medium for the same

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination