CN111476828A - Multi-view animal group tracking method and device - Google Patents


Info

Publication number
CN111476828A
Authority
CN
China
Prior art keywords
animal
tracking
viewpoints
viewpoint
rectangular frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010231337.9A
Other languages
Chinese (zh)
Other versions
CN111476828B (en)
Inventor
刘烨斌 (Liu Yebin)
王松涛 (Wang Songtao)
安亮 (An Liang)
张宇翔 (Zhang Yuxiang)
戴琼海 (Dai Qionghai)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202010231337.9A
Publication of CN111476828A
Application granted
Publication of CN111476828B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/20: Analysis of motion
    • G06T7/292: Multi-camera tracking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/20: Analysis of motion
    • G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-view animal group tracking method and device, wherein the method comprises the following steps: acquiring the internal parameters and external parameters of the cameras; solving the essential matrix according to the internal and external parameters and calculating the epipolar constraints of different viewpoints; removing single-viewpoint detection error regions according to the epipolar constraints of the different viewpoints, and detecting animal skeleton features to estimate the animal motion direction; and distinguishing the identities (IDs) of different moving animals according to animal posture features to obtain a tracking result. The tracking method effectively improves the accuracy and reliability of tracking, is strongly robust, and is simple to implement.

Description

Multi-view animal group tracking method and device
Technical Field
The invention relates to the technical field of visual target tracking, in particular to a multi-view animal group tracking method and device.
Background
During the motion and interaction of animal groups, severe and frequent mutual occlusion occurs. Because the observation angle is limited, a single-viewpoint tracking method loses animal regions and cannot maintain continuous animal motion trajectories.
In the related art, tracking methods based on detection boxes do not consider the posture information of the tracked target, so the predicted motion information is not accurate enough and the search can only be widened by enlarging the retrieval region. Once animals with similar appearances move close to one another, their identities (IDs) cannot be distinguished from appearance features alone, so the accuracy and reliability of tracking cannot be guaranteed. A solution is therefore needed.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, the invention aims to provide a multi-view animal population tracking method which can effectively improve the accuracy and reliability of tracking.
Another object of the present invention is to provide a multi-view animal population tracking device.
In order to achieve the above object, an embodiment of one aspect of the present invention provides a multi-view animal population tracking method, including the following steps: acquiring the internal parameters and external parameters of the cameras; solving the essential matrix according to the internal and external parameters and calculating the epipolar constraints of different viewpoints; removing single-viewpoint detection error regions according to the epipolar constraints of the different viewpoints, and detecting animal skeleton features to estimate the animal motion direction; and distinguishing the identity IDs of different moving animals according to the animal posture features to obtain a tracking result.
The multi-view animal population tracking method of the embodiment of the invention removes single-viewpoint detection error regions based on multi-view epipolar constraints, estimates the animal motion direction from animal skeleton features, and distinguishes the identity IDs of different moving animals based on animal posture features, thereby effectively improving tracking accuracy and reliability while remaining simple to implement.
In addition, the multi-view animal population tracking method according to the above embodiment of the present invention may further have the following additional technical features:
optionally, in an embodiment of the present invention, the essential matrix is computed as:
E ~ [T]×R,
where [R T] denotes the external parameters;
and the epipolar constraints of different viewpoints are computed through the fundamental matrix:
F ~ K^-T E K′^-1,
where E denotes the essential matrix and K, K′ denote the internal parameters of the two viewpoints.
Further, in an embodiment of the present invention, removing the single-viewpoint detection error region according to the epipolar constraints of different viewpoints includes: performing single-viewpoint animal region detection through a Faster RCNN network to obtain animal rectangular-frame candidate regions under the corresponding viewpoint; projecting the center points of the rectangular-frame candidate regions detected by the other viewpoints into the corresponding viewpoint according to the epipolar constraints of the different viewpoints, and matching through minimum Euclidean distance; and removing rectangular-frame regions whose distance exceeds a preset threshold.
Further, in an embodiment of the present invention, the detecting the animal skeleton feature to estimate the animal movement direction includes: detecting key points under corresponding viewpoints in a current rectangular frame candidate region through an HRNet network to obtain 2D key points of the animal; and connecting according to the 2D key points to obtain the skeleton characteristics, and determining the movement direction of the animal according to the relative angles of the head and the tail of the skeleton characteristics.
Further, in an embodiment of the present invention, the distinguishing the different moving animal identity IDs according to the animal posture characteristics includes: extracting attitude characteristics under corresponding viewpoints through an ST-GCN network; combining the attitude characteristics and apparent characteristics output by a Faster RCNN network of the current rectangular frame region in a cascade mode to form multi-dimensional characteristics for distinguishing animal Identities (IDs); and predicting the animal state by adopting a Kalman filter based on the Mahalanobis distance according to the time continuity characteristic of the video sequence, and keeping the consistency of the tracking identity ID.
In order to achieve the above object, another embodiment of the present invention provides a multi-view animal population tracking device, including: an acquisition module for acquiring the internal parameters and external parameters of the camera; a calculation module for solving the essential matrix according to the internal and external parameters and calculating the epipolar constraints of different viewpoints; a processing module for removing single-viewpoint detection error regions according to the epipolar constraints of different viewpoints and detecting animal skeleton features to estimate the animal motion direction; and a tracking module for distinguishing the identity IDs of different moving animals according to the animal posture features to obtain a tracking result.
The multi-view animal population tracking device of the embodiment of the invention removes single-viewpoint detection error regions based on multi-view epipolar constraints, estimates the animal motion direction from animal skeleton features, and distinguishes the identity IDs of different moving animals based on animal posture features, thereby effectively improving tracking accuracy and reliability while remaining simple to implement.
In addition, the multi-view animal population tracking device according to the above embodiment of the present invention may further have the following additional technical features:
optionally, in an embodiment of the present invention, the essential matrix is computed as:
E ~ [T]×R,
where [R T] denotes the external parameters;
and the epipolar constraints of different viewpoints are computed through the fundamental matrix:
F ~ K^-T E K′^-1,
where E denotes the essential matrix and K, K′ denote the internal parameters of the two viewpoints.
Further, in one embodiment of the present invention, the processing module includes: the first detection unit is used for carrying out single-viewpoint animal region detection through a Faster RCNN network to obtain an animal rectangular frame candidate region under a corresponding viewpoint; the matching unit is used for projecting the central points of the candidate regions of the rectangular frames detected by other viewpoints to the corresponding viewpoints according to the epipolar constraints of different viewpoints and matching the central points through the minimum Euclidean distance; and the removing unit is used for removing the rectangular frame area with the distance exceeding a preset threshold value.
Further, in an embodiment of the present invention, the processing module further includes: the second detection unit is used for detecting key points under corresponding viewpoints in the current rectangular frame candidate area through the HRNet network to obtain 2D key points of the animal; and the estimation unit is used for connecting according to the 2D key points to obtain the skeleton characteristics and determining the animal motion direction according to the relative angle of the head and the tail of the skeleton characteristics.
Further, in one embodiment of the present invention, the tracking module comprises: the extraction unit is used for extracting the attitude characteristics under the corresponding viewpoint through an ST-GCN network; the combination unit is used for combining the attitude characteristics and apparent characteristics output by a Faster RCNN network of the current rectangular frame region in a cascade mode to form multidimensional characteristics for animal identity ID distinguishing; and the tracking unit is used for predicting the animal state by adopting a Kalman filter based on the Mahalanobis distance according to the time continuous characteristic of the video sequence and keeping the consistency of the tracking identity ID.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a multi-view animal population tracking method according to an embodiment of the invention;
FIG. 2 is a schematic view of a deployment location of a multi-view camera system according to one embodiment of the invention;
FIG. 3 is a schematic diagram of multi-camera calibration according to one embodiment of the present invention;
FIG. 4 is a schematic diagram of epipolar lines for a selected viewpoint according to one embodiment of the invention;
FIG. 5 is a schematic diagram of multi-viewpoint animal target detection based on epipolar constraints according to one embodiment of the present invention;
FIG. 6 is a schematic diagram of removing duplicate-detection target rectangular regions based on epipolar constraints and the detection results of other viewpoints, according to an embodiment of the present invention;
FIG. 7 is a schematic illustration of a key point annotation for an animal according to one embodiment of the invention;
FIG. 8 is a schematic view of predicting a direction of movement based on an animal skeleton according to one embodiment of the invention;
fig. 9 is a block schematic diagram of a multi-view animal population tracking device according to one embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The following describes a multi-view animal population tracking method and apparatus proposed according to an embodiment of the present invention with reference to the accompanying drawings, and first, the multi-view animal population tracking method proposed according to an embodiment of the present invention will be described with reference to the accompanying drawings.
Fig. 1 is a flow chart of a multi-view animal population tracking method according to an embodiment of the invention.
As shown in fig. 1, the multi-view animal population tracking method includes the following steps:
in step S101, camera internal parameters and external parameters are acquired.
As shown in fig. 2, a multi-view animal population tracking system is constructed; for example, four viewpoints (front, rear, left, and right) are arranged overhead. Annotation points can be manually selected for multi-camera calibration, as shown in fig. 3, to obtain the camera internal parameters K and external parameters [R T].
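The relationship between the internal parameters K and external parameters [R T] obtained from calibration can be sketched as the standard pinhole projection (a minimal illustration, not code from the patent; the numeric camera values below are assumptions):

```python
import numpy as np

def project(K, R, T, X):
    """Pinhole projection: pixel x ~ K (R X + T), with internal parameters K
    and external parameters [R T] (world-to-camera)."""
    x = K @ (R @ np.asarray(X, dtype=float) + np.asarray(T, dtype=float))
    return x[:2] / x[2]  # dehomogenise to pixel coordinates

# Assumed example: 800 px focal length, principal point (320, 240),
# camera looking at a world point 2 m in front of it.
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
print(project(K, np.eye(3), [0.0, 0.0, 2.0], [0.0, 0.0, 0.0]))
```

A point on the optical axis projects to the principal point, which is a quick sanity check after calibration.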
In step S102, the essential matrix is solved according to the internal parameters and external parameters, and the epipolar constraints of different viewpoints are calculated.
Optionally, in an embodiment of the present invention, the essential matrix is computed as:
E ~ [T]×R,
where [R T] denotes the external parameters;
and the epipolar constraints of different viewpoints are computed through the fundamental matrix:
F ~ K^-T E K′^-1,
where E denotes the essential matrix and K, K′ denote the internal parameters of the two viewpoints.
Specifically, the Fundamental Matrix F is solved from the internal and external camera parameters and used to compute the epipolar constraints of different viewpoints, as shown in formula 1:
F ~ K^-T E K′^-1 (1),
where E denotes the Essential Matrix, defined as shown in formula 2:
E ~ [T]×R (2).
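Formulas 1 and 2 can be sketched in a few lines of NumPy (an illustrative reconstruction, not code from the patent; the camera parameters in any usage are assumptions):

```python
import numpy as np

def skew(t):
    """Cross-product (skew-symmetric) matrix [t]x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def essential_matrix(R, T):
    """Formula 2: E ~ [T]x R, from the relative pose between two viewpoints
    (second-camera coordinates X2 = R @ X1 + T)."""
    return skew(np.asarray(T, dtype=float)) @ R

def fundamental_matrix(K2, K1, R, T):
    """Formula 1: F ~ K^-T E K'^-1, with K2 the intrinsics of the second view and
    K1 those of the first. Pixel points then satisfy x2^T F x1 = 0."""
    return np.linalg.inv(K2).T @ essential_matrix(R, T) @ np.linalg.inv(K1)
```

For any 3D point visible in both views, the residual x2ᵀF x1 vanishes; this is the epipolar constraint used below to project detection-box centers between viewpoints.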
in step S103, a single-viewpoint detection error region is removed according to different-viewpoint epipolar constraints, and animal skeleton features are detected to estimate an animal motion direction.
First, because a single viewpoint is limited, single-viewpoint detection suffers from duplicate detection regions when the animal population gathers; the embodiment of the invention therefore uses multiple viewpoints to resolve single-viewpoint detection errors.
Further, in an embodiment of the present invention, removing a single-viewpoint detection error region according to the epipolar constraints of different viewpoints includes: performing single-viewpoint animal region detection through a Faster RCNN network to obtain animal rectangular-frame candidate regions under the corresponding viewpoint; projecting the center points of the rectangular-frame candidate regions detected by the other viewpoints into the corresponding viewpoint according to the epipolar constraints of the different viewpoints, and matching through minimum Euclidean distance; and removing rectangular-frame regions whose distance exceeds a preset threshold.
Specifically, single-viewpoint animal region detection is performed with the Faster RCNN network to obtain the animal rectangular-frame candidate regions under that viewpoint. Based on the epipolar constraint relationships between viewpoints, as shown in fig. 4, the center points of the rectangular-frame candidate regions detected in the other viewpoints are projected into the current viewpoint and matched by minimum Euclidean distance, as shown in fig. 5. A candidate rectangular region falsely detected in the current viewpoint lies farther, in Euclidean distance, from the projected center points of the regions detected in the other viewpoints, so a threshold can be set to remove falsely detected rectangular regions from the current viewpoint, as shown in fig. 6.
Projecting the center points of the animal rectangular frames detected in the other viewpoints into the current viewpoint yields the epipolar lines of the different viewpoints in that view. First, the intersection points of the epipolar lines from different viewpoints are computed in the current view; an intersection is considered a candidate match only when it falls inside a rectangular-frame region, and is ignored otherwise. The Euclidean distance between each epipolar-line intersection inside a rectangular frame and the center point of that frame is then computed, and the nearest intersection is selected, giving a one-to-one correspondence between the current frame and the rectangular frames of the other viewpoints, as shown in fig. 5.
When false detection occurs as the animals gather in viewpoint 2, an extra rectangular frame is detected, and nearest-Euclidean-distance matching repeatedly matches it to the same frames in viewpoints 1 and 4, as shown in fig. 6. When different rectangular frames of the current viewpoint all correspond to the same rectangular frame of another viewpoint, the repeated match with the larger Euclidean distance can be removed, and the corresponding detected rectangular frame of the current viewpoint is removed with it.
Fig. 6 shows the removal of duplicate-detection rectangular regions based on epipolar constraints and the detection results of the other viewpoints. The circles with the larger radius mark the centers of the current viewpoint's rectangular frames, and the circles with the smaller radius mark the epipolar-line intersections inside the frames. Circles and rectangular frames of the same color correspond to each other. For example, one colored rectangular frame in the left image is a false detection caused by overlapping animals; removing it using the other viewpoints avoids detection errors due to mutual occlusion.
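The projection-and-matching procedure described above can be sketched as follows (a simplified illustration, not the patent's implementation; the example boxes and the pixel threshold are assumptions):

```python
import numpy as np

def epipolar_line(F, point):
    """Epipolar line l = F @ x (homogeneous) in the current view,
    for a box center x from another view."""
    return F @ np.array([point[0], point[1], 1.0])

def line_intersection(l1, l2):
    """Intersection of two homogeneous lines is their cross product, dehomogenised."""
    p = np.cross(l1, l2)
    return p[:2] / p[2]

def filter_boxes(boxes, intersections, threshold):
    """Keep a detected box (x0, y0, x1, y1) only if some epipolar-line intersection
    lies inside it within `threshold` pixels of its center; otherwise treat the
    box as a false detection and drop it."""
    kept = []
    for (x0, y0, x1, y1) in boxes:
        cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        dists = [np.hypot(px - cx, py - cy)
                 for (px, py) in intersections
                 if x0 <= px <= x1 and y0 <= py <= y1]  # only intersections inside the box
        if dists and min(dists) <= threshold:
            kept.append((x0, y0, x1, y1))
    return kept
```

With an assumed intersection at (5, 5) and a 10-pixel threshold, a box around it survives while a far-away box is removed, mirroring fig. 6.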
Secondly, an animal is a non-rigid body and its direction of motion is closely related to its posture, so Kalman-filter motion estimation based only on the animal's rectangular candidate region is not accurate enough; the embodiment of the invention therefore predicts motion based on the 2D skeleton.
Further, in one embodiment of the present invention, detecting animal skeletal features to estimate animal motion direction comprises: detecting key points under corresponding viewpoints in a current rectangular frame candidate region through an HRNet network to obtain 2D key points of the animal; and connecting according to the 2D key points to obtain the skeleton characteristics, and determining the animal movement direction according to the relative angles of the head and the tail of the skeleton characteristics.
Specifically, key-point detection under the current viewpoint is performed in the current rectangular-frame candidate region with the HRNet network to obtain the 2D key points of the animal; the key points are connected to obtain the skeleton, and the animal's direction of motion is determined from the relative angle between the head and the tail of the skeleton.
For example, with a piglet as a typical animal, 15 joint points are defined, as shown in fig. 7, which are nose, left ear, left shoulder, left front paw, left hip, left knee, left back paw, tail, right ear, right shoulder, right front paw, right hip, right knee, right back paw, body center point, in that order.
As shown in fig. 8, when a sufficiently complete set of key points is obtained for a single piglet, the skeleton can be obtained by connecting the joint points according to a predefined skeleton topology. The direction angle of the animal's body skeleton is then calculated: the body center point is taken as the origin, and the included angle is computed from three points, namely the nose point, the body center point, and the tail point. When the included angle is 180° ± 10°, the piglet's direction of motion is the head direction; for other values the direction of motion is uncertain.
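The head-tail angle test can be sketched as follows (an illustration with assumed key-point coordinates; the 10-degree tolerance follows the text):

```python
import numpy as np

def body_angle(nose, center, tail):
    """Included angle (degrees) at the body-center point between the nose and tail keypoints."""
    v1 = np.asarray(nose, dtype=float) - np.asarray(center, dtype=float)
    v2 = np.asarray(tail, dtype=float) - np.asarray(center, dtype=float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def motion_direction(nose, center, tail, tol=10.0):
    """Head-direction unit vector when the body is straight (angle within 180 +/- tol degrees),
    otherwise None: the direction of motion is uncertain."""
    if abs(body_angle(nose, center, tail) - 180.0) <= tol:
        v = np.asarray(nose, dtype=float) - np.asarray(center, dtype=float)
        return v / np.linalg.norm(v)
    return None
```

A straight nose-center-tail line yields the head direction; a sharply bent body (e.g. a 90° angle) yields no direction, matching the "uncertain" case in the text.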
In step S104, the identity IDs of different moving animals are distinguished according to the animal posture features, and a tracking result is obtained.
That is, animals have similar appearances, so identity discrimination based on appearance features alone fails, especially when the animal group moves together; the embodiment of the invention therefore distinguishes the identity IDs of different moving animals based on animal posture features.
Further, in one embodiment of the present invention, distinguishing different moving animal identity IDs according to animal pose characteristics comprises: extracting attitude characteristics under corresponding viewpoints through an ST-GCN network; combining the attitude characteristics and apparent characteristics output by a Faster RCNN network of the current rectangular frame region in a cascade mode to form multi-dimensional characteristics for distinguishing the animal identities ID; and predicting the animal state by adopting a Kalman filter based on the Mahalanobis distance according to the time continuity characteristic of the video sequence, and keeping the consistency of the tracking identity ID.
Specifically, 128-dimensional posture features under the current viewpoint are extracted with the ST-GCN network and concatenated with the 128-dimensional appearance features output by the Faster RCNN network for the current rectangular-frame region, forming 256-dimensional features for distinguishing animal identity IDs. Considering the temporal continuity of a video sequence, a Kalman filter with Mahalanobis-distance gating is used to predict the animal state and keep the tracking identity ID consistent. The Mahalanobis distance is computed as shown in formula 3:
d(i, j) = (d_j - y_i)^T S_i^-1 (d_j - y_i) (3),
where d(i, j) denotes the Mahalanobis distance between the i-th track's predicted measurement distribution (y_i, S_i) and the j-th detection rectangle d_j. Registering the piglets' detection regions in the time domain by Mahalanobis distance keeps each piglet's tracked ID consistent, which resolves the limited view angle, inaccurate non-rigid motion prediction, and identity confusion that traditional single-viewpoint video tracking suffers during animal group tracking.
According to the multi-view animal population tracking method of the embodiment of the invention, erroneous detection results under a single viewpoint are removed from the detection results of the different viewpoints based on epipolar constraints; since an animal is a non-rigid body whose motion is related to its current posture, accurate motion prediction is performed based on the skeleton; and to keep the animal tracking identity ID consistent despite the discrimination difficulty caused by similar animal appearances, animal posture features are added and Mahalanobis-distance time-domain registration is performed on top of Kalman filtering. The method is strongly robust, effectively improves the accuracy and reliability of tracking, and is simple to implement.
Next, a multi-viewpoint animal population tracking apparatus proposed according to an embodiment of the present invention is described with reference to the accompanying drawings.
Fig. 9 is a schematic structural diagram of a multi-view animal population tracking device according to an embodiment of the present invention.
As shown in fig. 9, the multi-viewpoint animal population tracking apparatus 10 includes: an acquisition module 100, a calculation module 200, a processing module 300 and a tracking module 400.
Specifically, the acquiring module 100 is configured to acquire internal parameters and external parameters of the camera.
And the calculation module 200 is used for solving the essential matrix according to the internal and external parameters and calculating the epipolar constraints of different viewpoints.
And the processing module 300 is configured to remove a single-viewpoint detection error region according to the epipolar constraints of different viewpoints and detect animal skeleton features to estimate the animal motion direction.
And the tracking module 400 is used for distinguishing different moving animal identity IDs according to the animal posture characteristics to obtain a tracking result.
Optionally, in an embodiment of the present invention, the essential matrix is computed as:
E ~ [T]×R,
where [R T] denotes the external parameters;
and the epipolar constraints of different viewpoints are computed through the fundamental matrix:
F ~ K^-T E K′^-1,
where E denotes the essential matrix and K, K′ denote the internal parameters of the two viewpoints.
Further, in one embodiment of the present invention, the processing module 300 includes: the device comprises a first detection unit, a matching unit and a removal unit.
The first detection unit is used for performing single-viewpoint animal region detection through a Faster RCNN network to obtain animal rectangular-frame candidate regions under the corresponding viewpoint.
And the matching unit is used for projecting the central points of the rectangular frame candidate areas detected by other viewpoints to the corresponding viewpoints according to the epipolar constraints of different viewpoints, and matching through the minimum Euclidean distance.
And the removing unit is used for removing the rectangular frame area with the distance exceeding a preset threshold value.
Further, in an embodiment of the present invention, the processing module 300 further includes: a second detection unit and an estimation unit.
And the second detection unit is used for detecting the key points under the corresponding view points in the current rectangular frame candidate area through the HRNet network to obtain the 2D key points of the animal.
And the estimation unit is used for connecting according to the 2D key points to obtain the skeleton characteristics and determining the animal motion direction according to the relative angle of the head and the tail of the skeleton characteristics.
Further, in one embodiment of the present invention, the tracking module 400 includes: an extraction unit, a combination unit and a tracking unit.
The extraction unit is used for extracting the attitude characteristics under the corresponding viewpoint through an ST-GCN network.
And the combination unit is used for combining the attitude characteristics and the apparent characteristics output by the Faster RCNN network of the current rectangular frame region in a cascading manner to form multi-dimensional characteristics for animal identity ID distinguishing.
And the tracking unit is used for predicting the animal state by adopting a Kalman filter based on the Mahalanobis distance according to the time continuous characteristic of the video sequence and keeping the consistency of the tracking identity ID.
It should be noted that the explanation of the embodiment of the multi-view animal population tracking method is also applicable to the multi-view animal population tracking device of the embodiment, and is not repeated here.
According to the multi-view animal population tracking device of the embodiment of the invention, erroneous detection results under a single viewpoint are removed from the detection results of the different viewpoints based on epipolar constraints; since an animal is a non-rigid body whose motion is related to its current posture, accurate motion prediction is performed based on the skeleton; and to keep the animal tracking identity ID consistent despite the discrimination difficulty caused by similar animal appearances, animal posture features are added and Mahalanobis-distance time-domain registration is performed on top of Kalman filtering. The device is strongly robust, effectively improves the accuracy and reliability of tracking, and is simple to implement.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "N" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or N executable instructions for implementing the steps of a custom logic function or process. Alternate implementations are included within the scope of the preferred embodiments of the present invention, in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art of implementing the embodiments of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or N wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk or an optical disk, etc.

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A multi-view animal population tracking method is characterized by comprising the following steps:
acquiring internal parameters and external parameters of a camera;
solving a basic matrix according to the internal parameters and the external parameters and calculating epipolar constraints of different viewpoints;
removing a single-viewpoint detection error region according to the epipolar constraints of the different viewpoints, and detecting animal skeleton characteristics to estimate the animal motion direction; and
distinguishing different moving animal identity IDs according to animal posture characteristics to obtain a tracking result.
2. The method of claim 1, wherein the basic matrix is calculated by:
E ~ [T]× R,
wherein [R T] represents the external parameters;
the calculation formula of the epipolar constraints of different viewpoints is:
F ~ K^(-T) E K′^(-1),
wherein E represents the basic (essential) matrix, F represents the fundamental matrix encoding the epipolar constraint, and K and K′ represent the internal parameters of the two viewpoints.
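The computation in claim 2 can be sketched numerically. The following is a minimal illustration under standard pinhole assumptions, not the patented implementation; the helper names `skew`, `essential_matrix`, and `fundamental_matrix` are hypothetical:

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def essential_matrix(R, T):
    """E ~ [T]x R, built from the relative extrinsics [R T]."""
    return skew(T) @ R

def fundamental_matrix(E, K, K_prime):
    """F ~ K^(-T) E K'^(-1); x2^T F x1 = 0 for corresponding pixels."""
    return np.linalg.inv(K).T @ E @ np.linalg.inv(K_prime)
```

For a 3D point visible in both views, the epipolar residual x2ᵀ F x1 vanishes up to numerical error; this is the consistency condition that claim 3 uses to reject single-view false detections.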
3. The method of claim 1, wherein removing single-view detection error regions according to the different view epipolar constraints comprises:
carrying out single-viewpoint animal region detection through a Faster RCNN network to obtain an animal rectangular frame candidate region under the corresponding viewpoint;
projecting the central points of the rectangular frame candidate areas detected by other viewpoints to the corresponding viewpoints according to the epipolar constraints of the different viewpoints, and matching through the minimum Euclidean distance;
and removing the rectangular frame area with the distance exceeding a preset threshold value.
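Projecting a point through the fundamental matrix yields an epipolar line in the other view, so one plausible reading of claim 3 is to keep only boxes whose center lies within a threshold of the nearest epipolar line induced by the other viewpoint's detections. This is a sketch under that assumption, not the patented implementation; the function names are hypothetical:

```python
import numpy as np

def epipolar_line(F, x):
    """Homogeneous epipolar line l = F @ x induced in the reference view
    by point x = (u, v) from the other view."""
    return F @ np.append(np.asarray(x, dtype=float), 1.0)

def point_line_distance(p, l):
    """Perpendicular distance from 2D point p to homogeneous line l."""
    return abs(l[0] * p[0] + l[1] * p[1] + l[2]) / np.hypot(l[0], l[1])

def filter_detections(centers_ref, centers_other, F, thresh):
    """Return indices of reference-view box centers whose minimum distance
    to any epipolar line from the other view is within thresh (pixels)."""
    keep = []
    for i, c in enumerate(centers_ref):
        dists = [point_line_distance(c, epipolar_line(F, o)) for o in centers_other]
        if min(dists) <= thresh:
            keep.append(i)
    return keep
```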
4. The method of claim 1, wherein the detecting animal skeletal features to estimate animal motion direction comprises:
detecting key points under corresponding viewpoints in a current rectangular frame candidate region through an HRNet network to obtain 2D key points of the animal;
and connecting according to the 2D key points to obtain the skeleton characteristics, and determining the movement direction of the animal according to the relative angles of the head and the tail of the skeleton characteristics.
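Claim 4 derives the motion direction from the relative positions of the head and tail of the 2D skeleton. A minimal sketch, assuming hypothetical keypoint indices for head and tail (the actual index layout of the HRNet output is not specified in the patent):

```python
import numpy as np

# Hypothetical indices into the keypoint list; the real layout depends on
# the annotation scheme used to train the HRNet network.
HEAD, TAIL = 0, 1

def motion_direction(keypoints):
    """Heading angle (radians) of the tail-to-head vector of the skeleton,
    taken as an estimate of the animal's direction of motion."""
    head = np.asarray(keypoints[HEAD], dtype=float)
    tail = np.asarray(keypoints[TAIL], dtype=float)
    v = head - tail
    return np.arctan2(v[1], v[0])
```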
5. The method of claim 1, wherein distinguishing between different moving animal identity IDs based on animal pose characteristics comprises:
extracting attitude characteristics under corresponding viewpoints through an ST-GCN network;
combining the attitude characteristics and apparent characteristics output by a Faster RCNN network of the current rectangular frame region in a cascade mode to form multi-dimensional characteristics for distinguishing animal Identities (IDs);
and predicting the animal state by adopting a Kalman filter based on the Mahalanobis distance according to the time continuity characteristic of the video sequence, and keeping the consistency of the tracking identity ID.
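The temporal association in claim 5 can be sketched with a constant-velocity Kalman filter whose predictions are gated by the Mahalanobis distance. The state layout, noise values, and class name below are illustrative assumptions, not the patented implementation:

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal Kalman filter over state (x, y, vx, vy) with dt = 1;
    a sketch of the temporal-association step for one tracked animal."""

    def __init__(self, x0, P0, q=1.0, r=1.0):
        self.x = np.asarray(x0, dtype=float)   # state estimate
        self.P = np.asarray(P0, dtype=float)   # state covariance
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = 1.0      # constant-velocity transition
        self.H = np.eye(2, 4)                  # observe position only
        self.Q = q * np.eye(4)                 # process noise
        self.R = r * np.eye(2)                 # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.H @ self.x                 # predicted position

    def mahalanobis(self, z):
        """Mahalanobis distance between measurement z and the prediction,
        under the innovation covariance S = H P H^T + R."""
        y = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        return float(np.sqrt(y @ np.linalg.solve(S, y)))

    def update(self, z):
        y = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

At each frame, detections close to a track's prediction in Mahalanobis distance keep that track's identity ID; the covariance-weighted distance makes the gate tighter along well-estimated directions than a plain Euclidean threshold would.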
6. A multi-viewpoint animal population tracking device, comprising:
the acquisition module is used for acquiring internal parameters and external parameters of the camera;
the calculation module is used for solving a basic matrix according to the internal parameters and the external parameters and calculating epipolar constraints of different viewpoints;
the processing module is used for removing a single-viewpoint detection error region according to the epipolar constraints of different viewpoints and detecting animal skeleton characteristics so as to estimate the animal motion direction; and
the tracking module is used for distinguishing different moving animal identity IDs according to the animal posture characteristics to obtain a tracking result.
7. The apparatus of claim 6, wherein the calculation formula of the basic matrix is:
E ~ [T]× R,
wherein [R T] represents the external parameters;
the calculation formula of the epipolar constraints of different viewpoints is:
F ~ K^(-T) E K′^(-1),
wherein E represents the basic (essential) matrix, F represents the fundamental matrix encoding the epipolar constraint, and K and K′ represent the internal parameters of the two viewpoints.
8. The apparatus of claim 6, wherein the processing module comprises:
the first detection unit is used for carrying out single-viewpoint animal region detection through a Faster RCNN network to obtain an animal rectangular frame candidate region under a corresponding viewpoint;
the matching unit is used for projecting the central points of the candidate regions of the rectangular frames detected by other viewpoints to the corresponding viewpoints according to the epipolar constraints of different viewpoints and matching the central points through the minimum Euclidean distance;
and the removing unit is used for removing the rectangular frame area with the distance exceeding a preset threshold value.
9. The apparatus of claim 8, wherein the processing module further comprises:
the second detection unit is used for detecting key points under corresponding viewpoints in the current rectangular frame candidate area through the HRNet network to obtain 2D key points of the animal;
and the estimation unit is used for connecting according to the 2D key points to obtain the skeleton characteristics and determining the animal motion direction according to the relative angle of the head and the tail of the skeleton characteristics.
10. The apparatus of claim 6, wherein the tracking module comprises:
the extraction unit is used for extracting the attitude characteristics under the corresponding viewpoint through an ST-GCN network;
the combination unit is used for combining the attitude characteristics and apparent characteristics output by a Faster RCNN network of the current rectangular frame region in a cascade mode to form multidimensional characteristics for animal identity ID distinguishing;
and the tracking unit is used for predicting the animal state by adopting a Kalman filter based on the Mahalanobis distance according to the time continuous characteristic of the video sequence and keeping the consistency of the tracking identity ID.
CN202010231337.9A 2020-03-27 2020-03-27 Multi-view animal group tracking method and device Active CN111476828B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010231337.9A CN111476828B (en) 2020-03-27 2020-03-27 Multi-view animal group tracking method and device

Publications (2)

Publication Number Publication Date
CN111476828A true CN111476828A (en) 2020-07-31
CN111476828B CN111476828B (en) 2023-01-10

Family

ID=71749267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010231337.9A Active CN111476828B (en) 2020-03-27 2020-03-27 Multi-view animal group tracking method and device

Country Status (1)

Country Link
CN (1) CN111476828B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113377356A (en) * 2021-06-11 2021-09-10 四川大学 Method, device, equipment and medium for generating user interface prototype code

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100002908A1 (en) * 2006-07-10 2010-01-07 Kyoto University Pedestrian Tracking Method and Pedestrian Tracking Device
CN107818574A (en) * 2017-09-21 2018-03-20 楚雄师范学院 Shoal of fish three-dimensional tracking based on skeleton analysis
CN108399627A (en) * 2018-03-23 2018-08-14 云南大学 Video interframe target method for estimating, device and realization device
CN109377513A (en) * 2018-09-20 2019-02-22 浙江大学 A kind of global credible estimation method of 3 D human body posture for two views
CN110490901A (en) * 2019-07-15 2019-11-22 武汉大学 The pedestrian detection tracking of anti-attitudes vibration


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHI-MING QIAN等: "Feature point based 3D tracking of multiple fish from multi-view images", 《PLOS ONE》 *


Also Published As

Publication number Publication date
CN111476828B (en) 2023-01-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant