CN110569719B - Animal head posture estimation method and system - Google Patents


Info

Publication number
CN110569719B
CN110569719B
Authority
CN
China
Prior art keywords
detection frame
animal
head
picture
coordinates
Prior art date
Legal status
Active
Application number
CN201910698158.3A
Other languages
Chinese (zh)
Other versions
CN110569719A
Inventor
黄章进
汪方军
贺翔翔
邹露
Current Assignee
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date
Filing date
Publication date
Application filed by University of Science and Technology of China USTC
Priority to CN201910698158.3A
Publication of CN110569719A
Application granted
Publication of CN110569719B
Legal status: Active
Anticipated expiration

Classifications

    • G06N 3/045 — Combinations of networks (G Physics; G06 Computing, Calculating or Counting; G06N Computing arrangements based on specific computational models; G06N 3/00 based on biological models; 3/02 Neural networks; 3/04 Architecture, e.g. interconnection topology)
    • G06V 10/25 — Determination of region of interest [ROI] or a volume of interest [VOI] (G06V Image or Video Recognition or Understanding; 10/00 Arrangements for image or video recognition or understanding; 10/20 Image preprocessing)
    • G06V 40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands (40/00 Recognition of biometric, human-related or animal-related patterns in image or video data)
    • G06V 40/20 — Movements or behaviour, e.g. gesture recognition


Abstract

A method and a system for estimating the head posture of an animal are provided. The method comprises the following steps: performing target detection on an animal picture and marking a first detection frame around the animal head; marking a second detection frame of the animal head according to the first detection frame and recording its position coordinates, wherein the second detection frame contains the entire head of the animal; extracting features of the animal head within the region defined by the second detection frame to obtain the position coordinates of a plurality of key points of the head; and obtaining the posture of the animal head from the position coordinates of the second detection frame and of the plurality of key points. By performing posture estimation on the head region rather than the whole picture, the method and the system effectively reduce the amount of computation and the redundancy of input information, and improve the accuracy and speed of zebra fish head posture estimation.

Description

Animal head posture estimation method and system
Technical Field
The invention relates to the field of image processing, in particular to an animal head posture estimation method and system.
Background
Controlling and recording whole-brain neuronal activity during an animal's free behavior is of great significance for analyzing the correspondence between the animal's behavior and its brain activity. Zebra fish, for example, whose whole genome shares 87% similarity with the human genome, are often used to explore the relationship between neuronal populations and behavior. In order to realize optogenetic experiments under the free behavior of animals (zebra fish), the head images of the animals need to undergo posture estimation so that the head posture can be quantified.
At present, methods for quantifying the zebra fish head mainly include template-matching-based methods, HOG-feature-based methods, and deep-learning-based methods.
The main idea of template matching is to design region templates to locate specific regions: the left-eye, right-eye and center template images required for matching are first set, a reasonable threshold is then chosen to binarize the image, and finally the target is tracked within an ROI according to the set templates to achieve quantification. Such algorithms place high demands on template design, lack generality, are particularly sensitive to environment and noise, and have poor robustness, flexibility and stability.
HOG-feature-based methods design a head feature descriptor from the head's shape and imaging characteristics and combine it with a global matching algorithm. Unlike other template-based matching algorithms, candidate points are first extracted by a fast algorithm and then filtered, so the zebra fish head can be tracked quickly. However, such hand-crafted features only capture low-level statistics such as edges and cannot express high-level semantic information.
Among deep-learning-based methods, DeepLabCut quantifies animal behavior by performing pose estimation on user-defined animal body parts. DeepLabCut adopts ResNet as its backbone to extract high-level semantic features, deconvolves these features to obtain heat maps and offsets, and predicts key-point positions by combining the two. However, it takes the entire image as input, which introduces computational and information redundancy throughout the process and creates bottlenecks in both speed and accuracy.
Disclosure of Invention
Technical problem to be solved
In view of the above problems in the prior art, the present invention provides a method and a system for estimating the head posture of an animal, intended to at least partially solve the above technical problems.
(II) technical scheme
The invention provides an animal head posture estimation method, which comprises the following steps: S1, performing target detection on an animal picture and marking a first detection frame of the animal head on the picture; S2, marking a second detection frame of the animal head according to the first detection frame and recording its position coordinates, wherein the second detection frame contains the entire head of the animal; S3, extracting features of the animal head within the region defined by the second detection frame to obtain the position coordinates of a plurality of key points of the head; and S4, obtaining the posture of the animal head from the position coordinates of the second detection frame and of the plurality of key points.
Optionally, marking the first detection frame of the animal head on the animal picture comprises: recording the coordinates of the two diagonal corner points of the first detection frame, its length and width, and the confidence that it contains the animal head.
Optionally, marking the second detection frame of the animal head according to the first detection frame comprises: mapping the animal picture marked with the first detection frame back to the original animal picture, extracting the region corresponding to the first detection frame, and inputting that region into a regression-optimization neural network to obtain the offset required for the first detection frame to contain the entire head of the animal; calculating the coordinates of the two diagonal corner points, the length, and the width of the second detection frame from the offset and the coordinates of the two diagonal corner points, the length, and the width of the first detection frame; and marking the second detection frame of the animal head on the animal picture according to the coordinates of its two diagonal corner points, its length, and its width.
Optionally, the animal picture is input into a Micro-YOLO neural network for target detection processing, wherein the Micro-YOLO neural network at least includes a convolutional layer.
Optionally, within the range defined by the second detection frame, the performing feature extraction on the animal head comprises: and mapping the animal picture marked with the second detection frame back to the original animal picture, extracting a region corresponding to the second detection frame, inputting the region corresponding to the second detection frame into an hourglass neural network, and extracting the characteristics of the head of the animal, wherein the hourglass neural network at least comprises a convolution layer and a pooling layer.
Optionally, the formulas for calculating the coordinates of the two diagonal corner points, the length, and the width of the second detection frame are as follows:
xc = (x1 + x2)/2, yc = (y1 + y2)/2
bw = x2 − x1, bh = y2 − y1
xc′ = xc + tx·bw, yc′ = yc + ty·bh
bw′ = bw·e^(tw), bh′ = bh·e^(th)
wherein (x1, y1) and (x2, y2) are the two diagonal corner coordinates of the first detection frame, (xc, yc) are the center coordinates of the first detection frame, bw is the length of the first detection frame, bh is the width of the first detection frame, (xc′, yc′) are the center coordinates of the second detection frame, bw′ is the length of the second detection frame, bh′ is the width of the second detection frame, and (tx, ty, tw, th) is the offset.
Optionally, inputting the region corresponding to the second detection frame into the hourglass neural network and performing feature extraction on the animal head comprises: downsampling the region corresponding to the second detection frame to obtain a first feature map of the animal head; upsampling the first feature map to obtain a second feature map; fusing the second feature map through skip connections to obtain heat maps corresponding to the animal head; and taking the position coordinates of the maximum activation value in each heat map to obtain the position coordinates of the plurality of key points corresponding to the animal head.
Optionally, the upsampling employs a bilinear interpolation method.
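As a concrete illustration of the bilinear-interpolation option above, the following is a minimal NumPy sketch (align-corners-style sampling assumed; the helper name `bilinear_upsample` is hypothetical, not from the patent):

```python
import numpy as np

def bilinear_upsample(fm, out_h, out_w):
    """Bilinearly resize a 2-D feature map to (out_h, out_w),
    sampling so that the four corners of the input are preserved."""
    in_h, in_w = fm.shape
    ys = np.linspace(0, in_h - 1, out_h)   # source row positions
    xs = np.linspace(0, in_w - 1, out_w)   # source column positions
    out = np.empty((out_h, out_w))
    for i, y in enumerate(ys):
        y0 = int(np.floor(y)); y1 = min(y0 + 1, in_h - 1); wy = y - y0
        for j, x in enumerate(xs):
            x0 = int(np.floor(x)); x1 = min(x0 + 1, in_w - 1); wx = x - x0
            top = (1 - wx) * fm[y0, x0] + wx * fm[y0, x1]
            bot = (1 - wx) * fm[y1, x0] + wx * fm[y1, x1]
            out[i, j] = (1 - wy) * top + wy * bot
    return out
```

In the Tiny Hourglass this step would be applied channel-wise when bringing the 7 × 7 features back toward the 56 × 56 input size.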
In another aspect, the present invention provides a system for estimating the head pose of an animal, comprising: the first processing module is used for carrying out target detection processing on the animal picture and marking a first detection frame of the head of the animal on the animal picture; the second processing module is used for marking a second detection frame of the head of the animal according to the first detection frame and recording the position coordinates of the second detection frame, wherein the second detection frame comprises the whole head of the animal; the characteristic extraction module is used for extracting the characteristics of the animal head within the range defined by the second detection frame to obtain a plurality of key point position coordinates corresponding to the animal head; and the calculation module is used for obtaining the posture of the animal head according to the position coordinates of the second detection frame and the position coordinates of the plurality of key points.
Optionally, the first processing module performs target detection on the animal picture using a Micro-YOLO neural network; the second processing module marks the second detection frame of the animal head according to the first detection frame using a regression-optimization neural network; and the feature extraction module performs feature extraction on the animal head using an hourglass neural network.
(III) advantageous effects
The invention provides an animal head posture estimation method and system, which first detect the zebra fish head region and then perform posture estimation on that head region rather than the whole picture, effectively reducing the amount of computation and the redundancy of input information, and improving the accuracy and speed of zebra fish head posture estimation.
Drawings
Fig. 1 schematically shows a flowchart of an animal head pose estimation method provided by an embodiment of the present invention.
Fig. 2 schematically shows the detection frames marked on the zebra fish head according to an embodiment of the present invention.
Fig. 3 schematically shows the Fast RegressNet network architecture provided by an embodiment of the present invention.
Fig. 4 schematically shows the Tiny Hourglass network architecture provided by an embodiment of the present invention.
Fig. 5 schematically shows a block diagram of an animal head pose estimation system provided by an embodiment of the invention.
Fig. 6 schematically shows a result diagram of posture estimation and quantization of the zebra fish head by using the method and system for estimating the animal head posture of the embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments and the accompanying drawings.
Referring to fig. 1, fig. 1 schematically shows a flowchart of a method for estimating a head pose of an animal according to an embodiment of the present invention, which is described in detail below by taking zebra fish as an example. As shown in fig. 1, the method includes:
and S1, carrying out target detection processing on the animal picture, and marking a first detection frame of the animal head on the animal picture.
The purpose of operation S1 is to quickly obtain a rough zebra fish head region, which is not especially precise and may not fully contain the head. A one-stage detection neural network is used to perform target detection on the whole zebra fish picture. In this embodiment, a Micro-YOLO neural network is adopted, which comprises at least a convolutional layer; the invention does not limit the specific type of detection network.
Specifically, the zebra fish picture is input into the Micro-YOLO neural network and, after forward propagation, the network output yields the first detection frame (x1, y1, x2, y2) of the zebra fish head, where (x1, y1) are the coordinates of the upper-left corner of the first detection frame and (x2, y2) the coordinates of the lower-right corner.
The specific network architecture of Micro-YOLO is shown in Table 1. The input picture is 224 × 224 × 1 and is divided into a 7 × 7 grid, with each grid cell predicting 2 first detection frames. Each first detection frame comprises 5 predicted values: the center-point coordinates (equivalently, the two diagonal corner coordinates), the length and width of the frame, and the confidence that it contains the animal head; in addition, each cell predicts one class confidence. The input image therefore passes through 6 convolutional layers and 3 fully connected layers, finally producing a vector of dimension 7 × 7 × (2 × 5 + 1) = 539, from which the location of the first detection frame of the head can be obtained. The detection result of Micro-YOLO is shown in fig. 2: the frame labeled 1 represents the first detection frame; compared with the ground-truth frame that completely surrounds the zebra fish head (labeled 3), the difference is large and the head is not completely contained.
TABLE 1 (Micro-YOLO network architecture; reproduced in the original only as an image)
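The decoding of the 539-dimensional output vector described above can be sketched as follows. This is a hypothetical illustration: the per-cell value ordering (center, size, confidence per box, plus one class score) follows common YOLO conventions and is an assumption, not the patent's exact encoding, and the name `decode_best_box` is invented here.

```python
import numpy as np

S, B = 7, 2  # grid size and boxes per cell, as described in the text

def decode_best_box(output_vec):
    """Pick the highest-scoring head box from a flat 7*7*(2*5+1) = 539 vector.
    Assumed cell layout: (cx, cy, w, h, conf) per box, then one class score."""
    assert output_vec.size == S * S * (B * 5 + 1)
    grid = output_vec.reshape(S, S, B * 5 + 1)
    best, best_score = None, -1.0
    for gy in range(S):
        for gx in range(S):
            cls = grid[gy, gx, B * 5]            # per-cell class confidence
            for b in range(B):
                cx, cy, w, h, conf = grid[gy, gx, b * 5:b * 5 + 5]
                score = conf * cls
                if score > best_score:
                    best_score = score
                    # cell-relative center -> image-normalized coordinates
                    best = ((gx + cx) / S, (gy + cy) / S, w, h)
    return best, best_score
```

The normalized center and size returned here would then be converted to the corner coordinates (x1, y1, x2, y2) used in the rest of the pipeline.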
And S2, marking a second detection frame of the animal head according to the first detection frame, and recording the position coordinates of the second detection frame, wherein the second detection frame comprises the whole head of the animal.
The purpose of operation S2 is to perform regression optimization on the zebra fish head region (the first detection frame) obtained in the first stage, producing regressed position coordinates and refining it into a more accurate head region (the second detection frame). The method is to map the region marked by the first detection frame back to the original picture and input it into a regression-optimization neural network for refinement.
Specifically, this embodiment adopts Fast RegressNet to optimize the first detection frame (x1, y1, x2, y2). The general idea is to map the detection frame obtained in operation S1 back to the original image, extract the corresponding area, and scale it to a 56 × 56 × 1 image block R1. R1 is then input into Fast RegressNet to obtain a more accurate second detection frame (x1′, y1′, x2′, y2′) that contains the entire zebra fish head region.
The Fast RegressNet network architecture is shown in fig. 3, with an input size of 56 × 56 × 1. After 4 convolutional layers and 2 fully connected layers, a 6-dimensional vector (tx, ty, tw, th, c1, c2) is obtained. (tx, ty, tw, th) is the offset required to regress the first detection frame (so that it contains the entire head of the animal); c1 is the probability that the picture is background, and c2 the probability that it is a zebra fish. The coordinates of the two diagonal corner points, the length, and the width of the second detection frame are calculated from the offset and the corresponding quantities of the first detection frame, and the second detection frame of the animal head is then marked on the animal picture accordingly. The optimized center coordinates (xc′, yc′) and length and width (bw′, bh′) of the second detection frame are calculated by the following formulas:
xc = (x1 + x2)/2, yc = (y1 + y2)/2
bw = x2 − x1, bh = y2 − y1
xc′ = xc + tx·bw, yc′ = yc + ty·bh
bw′ = bw·e^(tw), bh′ = bh·e^(th)
wherein (x1, y1) and (x2, y2) are the two diagonal corner coordinates of the first detection frame, (xc, yc) are its center coordinates, bw its length and bh its width; (xc′, yc′) are the center coordinates of the second detection frame, bw′ its length and bh′ its width; and (tx, ty, tw, th) is the offset. Likewise, the upper-left corner coordinates (x1′, y1′) and lower-right corner coordinates (x2′, y2′) of the second detection frame are determined from its center coordinates (xc′, yc′) and its length and width (bw′, bh′).
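This refinement step can be sketched numerically. A minimal sketch, assuming the standard exponential update bw′ = bw·e^(tw), bh′ = bh·e^(th) for the frame's length and width (that formula is rendered only as an image in the text, so the exponential form is an assumption); the helper name `refine_box` is hypothetical:

```python
import math

def refine_box(x1, y1, x2, y2, tx, ty, tw, th):
    """Apply the regression offsets (tx, ty, tw, th) to the first detection
    frame and return the second frame's corner coordinates (x1', y1', x2', y2')."""
    xc, yc = (x1 + x2) / 2, (y1 + y2) / 2   # center of the first frame
    bw, bh = x2 - x1, y2 - y1               # length and width of the first frame
    xc2 = xc + tx * bw                      # shifted center
    yc2 = yc + ty * bh
    bw2 = bw * math.exp(tw)                 # assumed exponential size update
    bh2 = bh * math.exp(th)
    return xc2 - bw2 / 2, yc2 - bh2 / 2, xc2 + bw2 / 2, yc2 + bh2 / 2
```

With zero offsets the second frame equals the first, which is the expected fixed point of the parameterization.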
The regression-optimization result of Fast RegressNet is shown in fig. 2: the frame labeled 2 represents the second detection frame. It can be seen that the difference between the second detection frame and the ground-truth green rectangular frame that completely surrounds the zebra fish head (labeled 3) is small, and the head is completely enclosed.
And S3, extracting the features of the animal head within the range defined by the second detection frame to obtain the position coordinates of a plurality of key points corresponding to the animal head.
The purpose of operation S3 is to obtain the key-point positions of the animal head for pose estimation by extracting features from the local head region contained in the second detection frame.
Specifically, this embodiment uses an hourglass neural network (Tiny Hourglass) to extract features from the zebra fish head region obtained in operation S2; the network structure is shown in fig. 4. The general idea is to map the second detection frame obtained in operation S2 back to the original image, extract the corresponding area, and scale it to a 56 × 56 × 1 image block R2. R2 is then input into the Tiny Hourglass to obtain the position coordinates (P0, P1, …, Pn) of a plurality of key points. The number of key points is determined by actual requirements and is not limited by the invention; in this embodiment, the positions of 21 key points on the zebra fish head are obtained.
To speed up inference, the input resolution is reduced to 56 × 56. The Tiny Hourglass processes features with convolutional and max-pooling layers, downsampling the input to 7 × 7 to obtain a first feature map, and then upsamples the first feature map back to the original input size through a fully symmetric structure to obtain a second feature map, using bilinear interpolation for the upsampling. To better extract features, the upsampled second feature maps are fused through skip connections, finally yielding 21 heat maps representing the 21 key-point positions. All convolution operations in the Tiny Hourglass use kernels of size 3 × 3 × 64. The position coordinates of the maximum activation value in each of the 21 heat maps are taken to obtain the 21 key-point position coordinates.
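The final argmax step over the heat maps can be sketched as follows — a minimal illustration in which the helper name and the (K, H, W) array layout are assumptions, not part of the patent:

```python
import numpy as np

def keypoints_from_heatmaps(heatmaps):
    """heatmaps: array of shape (K, H, W), one heat map per key point.
    Returns the (x, y) position of the maximum activation in each map."""
    coords = []
    for hm in heatmaps:
        idx = np.argmax(hm)                      # flat index of the peak
        y, x = np.unravel_index(idx, hm.shape)   # convert to row, column
        coords.append((int(x), int(y)))
    return coords
```

For 21 key points the input would be a (21, 56, 56) stack of heat maps, yielding 21 coordinate pairs in the crop's coordinate system.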
And S4, obtaining the posture of the animal head according to the position coordinates of the second detection frame and the position coordinates of the plurality of key points.
Finally, the position coordinates of each key point are combined with the second detection frame (x1′, y1′, x2′, y2′) to obtain the position coordinates of the zebra fish head key points in the original image, and thus the posture of the zebra fish head.
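Mapping crop-local key points back into the original picture, as described above, amounts to scaling by the second frame's size relative to the 56 × 56 crop and translating by its upper-left corner. A minimal sketch with a hypothetical helper name:

```python
def crop_to_image(kx, ky, box, crop_size=56):
    """Map a key point (kx, ky) predicted inside the crop_size x crop_size head
    crop back into the original picture, given the second detection frame
    box = (x1', y1', x2', y2') in original-image coordinates."""
    x1, y1, x2, y2 = box
    sx = (x2 - x1) / crop_size   # horizontal scale of the crop
    sy = (y2 - y1) / crop_size   # vertical scale of the crop
    return x1 + kx * sx, y1 + ky * sy
```

Applying this to all 21 key points gives the head pose in original-image coordinates.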
The animal head posture estimation method provided by this embodiment performs posture estimation on the zebra fish head rather than the whole picture, effectively reducing the amount of computation and the redundancy of input information, and improving the accuracy and speed of zebra fish head posture estimation.
Referring to fig. 5, fig. 5 schematically shows a block diagram of an animal head pose estimation system provided by an embodiment of the present invention, including:
the first processing module 510 is configured to perform target detection processing on the animal picture, and mark a first detection frame of the head of the animal on the animal picture;
the second processing module 520 is configured to mark a second detection frame of the head of the animal according to the first detection frame, and record a position coordinate of the second detection frame, where the second detection frame includes the entire head of the animal;
the feature extraction module 530 is configured to perform feature extraction on the animal head within a range defined by the second detection frame to obtain a plurality of key point position coordinates corresponding to the animal head;
and the calculating module 540 is configured to obtain the posture of the head of the animal according to the position coordinates of the second detection frame and the position coordinates of the plurality of key points.
The first processing module 510 performs target detection processing on the animal picture by using a Micro-YOLO neural network; the second processing module 520 marks a second detection frame of the animal head according to the first detection frame by adopting a regression optimization neural network; the feature extraction module 530 performs feature extraction on the head of the animal by using the hourglass neural network.
Please refer to the first embodiment of the present invention for details of the present embodiment.
The network model built by the animal head posture estimation system provided by this embodiment is smaller and has strong feature-extraction capability, improving the accuracy of zebra fish head posture estimation.
Referring to fig. 6, fig. 6 schematically shows the results of posture estimation and quantification of the zebra fish head using the animal head posture estimation method and system of the embodiment of the invention. Picture a shows the original zebra fish picture, where ROI (region of interest) denotes the zebra fish head region; picture b shows the definition of zebra fish head pose estimation, comprising 21 key points in total; picture c shows the result of ellipse fitting and straight-line fitting to the key points; and picture d shows the quantified result for the zebra fish head, including the gaze direction of both eyes and the direction of the head midline.
The posture of the zebra fish was estimated using both the animal head posture estimation system of the invention and the state-of-the-art user-defined pose estimation method DeepLabCut; the comparison is shown in Table 2.
TABLE 2 (comparison with DeepLabCut; reproduced in the original only as an image)
As can be seen from Table 2, the method and system provided by the invention outperform DeepLabCut in both the speed and the accuracy of pose estimation.
In conclusion, the invention first detects the zebra fish head region and then performs pose estimation on that head region rather than the whole picture. The network model built is small and has strong feature-extraction capability, effectively reducing the amount of computation and input redundancy, improving the accuracy and speed of zebra fish head pose estimation, and making the method suitable for quantifying the zebra fish head.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. A method for estimating the head pose of an animal, comprising:
S1, performing target detection on the animal picture, marking a first detection frame of the animal head on the animal picture, and recording the coordinates of the two diagonal corner points of the first detection frame, its length and width, and the confidence that it contains the animal head;
S2, marking a second detection frame of the animal head according to the first detection frame and recording the position coordinates of the second detection frame, wherein the animal picture marked with the first detection frame is mapped back to the original animal picture, the region corresponding to the first detection frame is extracted, scaled down, and input into a regression-optimization neural network to obtain the offset required for the first detection frame to contain the entire head of the animal; the coordinates of the two diagonal corner points, the length, and the width of the second detection frame are calculated from the offset and the coordinates of the two diagonal corner points, the length, and the width of the first detection frame; and the second detection frame of the animal head is marked on the animal picture according to the coordinates of its two diagonal corner points, its length, and its width; the second detection frame contains the entire head of the animal;
s3, extracting the features of the animal head within the range defined by the second detection frame to obtain a plurality of key point position coordinates corresponding to the animal head;
s4, obtaining the posture of the animal head according to the position coordinates of the second detection frame and the position coordinates of the plurality of key points;
the formula for calculating the coordinates, the length and the width of the two pairs of corner points of the second detection frame is as follows:
xc = (x1 + x2)/2, yc = (y1 + y2)/2
bw = x2 − x1, bh = y2 − y1
xc′ = xc + tx·bw, yc′ = yc + ty·bh
bw′ = bw·e^(tw), bh′ = bh·e^(th)
wherein (x1, y1) and (x2, y2) are the two diagonal corner coordinates of the first detection frame, (xc, yc) are the center coordinates of the first detection frame, bw is the length of the first detection frame, bh is the width of the first detection frame, (xc′, yc′) are the center coordinates of the second detection frame, bw′ is the length of the second detection frame, bh′ is the width of the second detection frame, and (tx, ty, tw, th) is the offset.
2. The method of claim 1, wherein the animal picture is input into a Micro-YOLO neural network for target detection processing, wherein the Micro-YOLO neural network comprises at least a convolutional layer.
3. The method for estimating the head pose of the animal according to claim 1, wherein said extracting the features of the head of the animal within the range defined by the second detection frame comprises:
and mapping the animal picture marked with the second detection frame back to the original animal picture, extracting a region corresponding to the second detection frame, inputting the region corresponding to the second detection frame into an hourglass neural network, and extracting the characteristics of the head of the animal, wherein the hourglass neural network at least comprises a convolution layer and a pooling layer.
4. The animal head pose estimation method according to claim 3, wherein inputting the region corresponding to the second detection frame into the hourglass neural network and extracting the features of the animal head comprises:
downsampling the region corresponding to the second detection frame to obtain a first feature map corresponding to the animal head;
upsampling the first feature map to obtain a second feature map;
fusing the second feature map via skip connections to obtain heat maps corresponding to the animal head;
and taking the position coordinates of the maximum activation value in each heat map to obtain the position coordinates of the plurality of key points corresponding to the animal head.
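The heat-map decoding step of claim 4, which takes the position of the maximum activation in each heat map as a keypoint, can be written directly in numpy. The function name and the (K, H, W) channel layout are illustrative, not from the patent.

```python
import numpy as np

def heatmaps_to_keypoints(heatmaps):
    """Given per-keypoint heat maps of shape (K, H, W), return the
    (x, y) position of the maximum activation in each channel."""
    k, h, w = heatmaps.shape
    # argmax over the flattened spatial dimensions, one index per channel.
    flat_idx = heatmaps.reshape(k, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (h, w))
    # Stack as (x, y) pairs: one row per keypoint.
    return np.stack([xs, ys], axis=1)
```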
5. The animal head pose estimation method of claim 4, wherein the upsampling employs bilinear interpolation.
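The bilinear interpolation of claim 5 reads a value at a fractional coordinate as a weighted blend of the four surrounding pixels; upsampling repeats this for every output position. A minimal numpy-only sample function (illustrative, not the patent's implementation):

```python
import numpy as np

def bilinear(fmap, x, y):
    """Sample a 2-D feature map at fractional coordinates (x, y)
    using bilinear interpolation, clamping at the border."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, fmap.shape[1] - 1)
    y1 = min(y0 + 1, fmap.shape[0] - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally on the top and bottom rows, then vertically.
    top = fmap[y0, x0] * (1 - fx) + fmap[y0, x1] * fx
    bot = fmap[y1, x0] * (1 - fx) + fmap[y1, x1] * fx
    return top * (1 - fy) + bot * fy
```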
6. An animal head pose estimation system, comprising:
the first processing module is used for carrying out target detection processing on the animal picture and marking a first detection frame of the head of the animal on the animal picture;
the second processing module is used for marking a second detection frame of the animal head according to the first detection frame and recording the position coordinates of the second detection frame, wherein the second detection frame comprises the whole head of the animal, and the formulas for calculating the coordinates, the length, and the width of the two pairs of corner points of the second detection frame are as follows:
[Three formulas rendered as images in the original publication: DEST_PATH_IMAGE002A, DEST_PATH_IMAGE004A, DEST_PATH_IMAGE006A]
wherein (x1, y1) and (x2, y2) are the coordinates of the two diagonal corners of the first detection frame, (xc, yc) is the center coordinate of the first detection frame, bw is the length of the first detection frame, bh is the width of the first detection frame, (xc′, yc′) is the center coordinate of the second detection frame, bw′ is the length of the second detection frame, bh′ is the width of the second detection frame, and [DEST_PATH_IMAGE008A, rendered as an image in the original] is the offset;
the feature extraction module is used for extracting features of the animal head within the range defined by the second detection frame to obtain a plurality of key point position coordinates corresponding to the animal head;
and the calculation module is used for obtaining the posture of the animal head according to the position coordinates of the second detection frame and the position coordinates of the plurality of key points.
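The patent does not spell out (in this excerpt) how the calculation module turns the box and keypoint coordinates into a pose; a common approach is to align the detected keypoints against a canonical keypoint template. This numpy-only sketch recovers just the in-plane rotation (roll) via the Kabsch/Procrustes method; the template, the function name, and the restriction to 2-D are illustrative assumptions, and a full 3-D pose would typically use a PnP solver against a 3-D head model instead.

```python
import numpy as np

def head_roll_from_keypoints(kps, template):
    """Estimate the in-plane head rotation (roll, in degrees) by aligning
    detected 2-D keypoints (N, 2) to a canonical template (N, 2) with
    the Kabsch/Procrustes method."""
    p = kps - kps.mean(axis=0)             # centered detections
    q = template - template.mean(axis=0)   # centered template
    # SVD of the cross-covariance gives the optimal rotation template -> detections.
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(u @ vt))
    r = u @ np.diag([1.0, d]) @ vt         # force a proper rotation (det = +1)
    return np.degrees(np.arctan2(r[1, 0], r[0, 0]))
```

Rotating the template by a known angle and translating it should recover that angle, since the centering step removes the translation.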
7. The system for estimating the head pose of the animal according to claim 6, wherein the first processing module performs a target detection process on the animal picture by using a Micro-YOLO neural network;
the second processing module marks a second detection frame of the animal head according to the first detection frame by adopting a regression optimization neural network;
the feature extraction module adopts an hourglass neural network to extract the features of the animal head.
CN201910698158.3A 2019-07-30 2019-07-30 Animal head posture estimation method and system Active CN110569719B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910698158.3A CN110569719B (en) 2019-07-30 2019-07-30 Animal head posture estimation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910698158.3A CN110569719B (en) 2019-07-30 2019-07-30 Animal head posture estimation method and system

Publications (2)

Publication Number Publication Date
CN110569719A CN110569719A (en) 2019-12-13
CN110569719B true CN110569719B (en) 2022-05-17

Family

ID=68773235

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910698158.3A Active CN110569719B (en) 2019-07-30 2019-07-30 Animal head posture estimation method and system

Country Status (1)

Country Link
CN (1) CN110569719B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569719B (en) * 2019-07-30 2022-05-17 中国科学技术大学 Animal head posture estimation method and system
CN111931764B (en) * 2020-06-30 2024-04-16 华为云计算技术有限公司 Target detection method, target detection frame and related equipment
CN112101259A (en) * 2020-09-21 2020-12-18 中国农业大学 Single pig body posture recognition system and method based on stacked hourglass network
CN112016527B (en) * 2020-10-19 2022-02-01 成都大熊猫繁育研究基地 Panda behavior recognition method, system, terminal and medium based on deep learning
CN112464744A (en) * 2020-11-09 2021-03-09 湖北省农业科学院农产品加工与核农技术研究所 Fish posture identification method
CN116883458B (en) * 2023-09-06 2024-01-09 中国科学技术大学 Transformer-based multi-target tracking system fusing motion characteristics with observation as center

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909005A (en) * 2017-10-26 2018-04-13 西安电子科技大学 Personage's gesture recognition method under monitoring scene based on deep learning
CN108447090A (en) * 2016-12-09 2018-08-24 株式会社理光 The method, apparatus and electronic equipment of object gesture estimation
CN108875492A (en) * 2017-10-11 2018-11-23 北京旷视科技有限公司 Face datection and crucial independent positioning method, device, system and storage medium
CN108985148A (en) * 2018-05-31 2018-12-11 成都通甲优博科技有限责任公司 A kind of hand critical point detection method and device
CN109064514A (en) * 2018-07-03 2018-12-21 北京航空航天大学 A kind of six-freedom degree pose algorithm for estimating returned based on subpoint coordinate
CN109190537A (en) * 2018-08-23 2019-01-11 浙江工商大学 A kind of more personage's Attitude estimation methods based on mask perceived depth intensified learning
CN109376571A (en) * 2018-08-03 2019-02-22 西安电子科技大学 Estimation method of human posture based on deformation convolution
CN109685013A (en) * 2018-12-25 2019-04-26 上海智臻智能网络科技股份有限公司 The detection method and device of header key point in human body attitude identification
CN110569719A (en) * 2019-07-30 2019-12-13 中国科学技术大学 animal head posture estimation method and system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Cascade R-CNN: Delving into High Quality Object Detection"; Zhaowei Cai et al.; arXiv; 2017-12-03; pp. 1-9 *

Also Published As

Publication number Publication date
CN110569719A (en) 2019-12-13

Similar Documents

Publication Publication Date Title
CN110569719B (en) Animal head posture estimation method and system
CN109934115B (en) Face recognition model construction method, face recognition method and electronic equipment
CN114424250A (en) Structural modeling
CN110879982B (en) Crowd counting system and method
CN111627050B (en) Training method and device for target tracking model
CN110555481A (en) Portrait style identification method and device and computer readable storage medium
CN112862824A (en) Novel coronavirus pneumonia focus detection method, system, device and storage medium
CN110909618B (en) Method and device for identifying identity of pet
CN111626295B (en) Training method and device for license plate detection model
CN113705583B (en) Target detection and identification method based on convolutional neural network model
CN110738650B (en) Infectious disease infection identification method, terminal device and storage medium
CN112926696A (en) Interpretable local migration mutual learning method based on attention diagram
CN115457492A (en) Target detection method and device, computer equipment and storage medium
CN111626379A (en) X-ray image detection method for pneumonia
CN113420648B (en) Target detection method and system with rotation adaptability
CN112750155B (en) Panoramic depth estimation method based on convolutional neural network
CN118196885A (en) Mouse gesture estimation method and system based on deep learning
CN114972492A (en) Position and pose determination method and device based on aerial view and computer storage medium
Oh et al. Local selective vision transformer for depth estimation using a compound eye camera
CN111709269A (en) Human hand segmentation method and device based on two-dimensional joint information in depth image
CN116523957A (en) Multi-target tracking method, system, electronic equipment and storage medium
CN116310899A (en) YOLOv 5-based improved target detection method and device and training method
CN114724190A (en) Mood recognition method based on pet posture
CN115544190A (en) Semantic map updating method and device, computer equipment and storage medium
KR20180012638A (en) Method and apparatus for detecting object in vision recognition with aggregate channel features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant