CN113065506B

CN113065506B - Human body posture recognition method and system

Info

Publication number: CN113065506B
Application number: CN202110411237.9A
Authority: CN
Inventors: 江沛远; 周余; 于耀; 都思丹
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2021-04-16
Filing date: 2021-04-16
Publication date: 2023-12-26
Anticipated expiration: 2041-04-16
Also published as: CN113065506A

Abstract

The invention discloses a human body posture recognition method, which comprises the following steps: acquiring a plurality of human body images of a current frame from different viewing angles; building a three-dimensional human body model from the plurality of human body images by adopting a convolutional neural network algorithm; clustering adjacent pixels on each human body image respectively to obtain a plurality of pixel blocks; establishing a two-dimensional pixel block model of each pixel block; optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model; and determining the human body posture of the current frame according to the optimized three-dimensional human body model. The invention builds the three-dimensional human body model simply and rapidly with a convolutional neural network algorithm, then optimizes the model against the human body posture in the images, and performs posture recognition with the optimized model, so that the speed of human body posture estimation is improved while its accuracy is ensured.

Description

Human body posture recognition method and system

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to a method and a system for recognizing human body postures.

Background

3D human body pose estimation refers to estimating the pose of a human target from an image, a video or a point cloud, and is a fundamental task in 3D research on the human body. It is an important precondition for 3D human body reconstruction and can also serve as an important source of motion for driving human body motion. Currently, there are two main ways of obtaining the human body posture. 1. A neural network is trained on a specific data set to estimate the human body posture in the corresponding scene. This method requires a large amount of manually labeled data to train the neural network, and its accuracy is low. 2. A human body model is established and fitted to the human body in the picture. This method relies on creating the human body model, which is relatively complex and slow. Therefore, how to improve the speed of human body posture estimation while ensuring its accuracy is a technical problem to be solved urgently.

Disclosure of Invention

The invention aims to provide a human body posture recognition method and system, so as to improve the speed of human body posture estimation while ensuring the accuracy of human body posture estimation.

In order to achieve the above object, the present invention provides the following solutions:

the invention provides a human body posture recognition method, which comprises the following steps:

acquiring a plurality of human body images under different visual angles of a current frame;

according to a plurality of human body images, a three-dimensional human body model is established by adopting a convolutional neural network algorithm;

clustering adjacent pixels on each human body image respectively to obtain a plurality of pixel blocks;

establishing a two-dimensional pixel block model of each pixel block;

optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model;

and determining the human body posture of the current frame according to the optimized three-dimensional human body model.

Optionally, the optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model further includes:

predicting the position coordinates of each human body part in the current frame according to the position coordinates of each human body part in each of a preset number of frames before the current frame;

and pre-adjusting the three-dimensional human body model according to the predicted position coordinates of each human body part in the current frame to obtain a pre-adjusted three-dimensional human body model.

Optionally, the building a three-dimensional human body model according to the plurality of human body images by adopting a convolutional neural network algorithm specifically includes:

obtaining K key points from each human body image by adopting an Hourglass network structure;

for k = 1, 2, …, K, calculating the back projection ray of the kth key point in the human body image under each viewing angle based on the parameters of the camera, to obtain a plurality of back projection rays corresponding to each key point;

determining the coordinates of a common point with the shortest total distance from the plurality of back projection rays corresponding to each key point, and taking the coordinates of the common point as the coordinates of the key point in three-dimensional space, to obtain the coordinates of each key point in three-dimensional space;

and carrying out linear interpolation operation according to the coordinates of each key point in the three-dimensional space to obtain a three-dimensional human body model containing the three-dimensional coordinates of each human body part of the human body.
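As a minimal sketch of this interpolation step, assume that body-part centres are placed at evenly spaced positions along the segment joining two connected key points. The skeleton connectivity, the key point names and the number of parts per segment below are illustrative assumptions and are not taken from the patent.

```python
from typing import Dict, List, Tuple

Point3 = Tuple[float, float, float]


def interpolate_parts(keypoints: Dict[str, Point3],
                      bones: List[Tuple[str, str]],
                      parts_per_bone: int) -> List[Point3]:
    """Place `parts_per_bone` body-part centres evenly along each bone."""
    parts: List[Point3] = []
    for start, end in bones:
        a, b = keypoints[start], keypoints[end]
        for n in range(parts_per_bone):
            # t covers the interior of the segment: 1/(n+1), ..., n/(n+1)
            t = (n + 1) / (parts_per_bone + 1)
            parts.append(tuple(a[d] + t * (b[d] - a[d]) for d in range(3)))
    return parts


# Hypothetical example: two key points joined by one bone.
keypoints = {"shoulder": (0.0, 0.0, 0.0), "elbow": (0.0, -0.3, 0.0)}
parts = interpolate_parts(keypoints, [("shoulder", "elbow")], parts_per_bone=2)
```

With two parts per bone, the centres fall at one third and two thirds of the way from the first key point to the second.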

Optionally, the three-dimensional human body model is:

wherein $A(x)$ is the three-dimensional human body model, $A_j(x)$ is the model of the jth human body part, $\mu_j$ represents the coordinates of the jth human body part, $\sigma_j$ represents the radius of the jth human body part, and $x$ represents the position of any point on the human body.
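Read together with the later description of the body model as a set of three-dimensional Gaussian functions, these definitions are consistent with an isotropic sum-of-Gaussians form. The following is only an assumed reconstruction under that reading, not the patent's own equation; in particular, the absence of a normalization factor is an assumption.

```latex
A(x) = \sum_{j} A_j(x), \qquad
A_j(x) = \exp\!\left( -\frac{\lVert x - \mu_j \rVert^{2}}{2\sigma_j^{2}} \right)
```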

Optionally, the two-dimensional pixel block model is:

wherein $B_i(x)$ represents the two-dimensional pixel block model of the ith pixel block, $c_i$ represents the color of the center of the ith pixel block, $\mu_i$ represents the coordinates of the center of the ith pixel block, $\delta_i$ represents the half length of the ith pixel block, and $x_i$ represents the projection of the position $x$ of any point on the human body onto the ith pixel block.

Optionally, the optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model specifically includes:

calculating the sum of the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value;

judging whether the objective function value is larger than a preset threshold value or not to obtain a judging result;

if the judgment result is no, optimizing the model of the jth human body part in the three-dimensional human body model by adopting a gradient descent method, and returning to the step of calculating the sum of the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value;

if the judgment result is yes, increasing the value of j by 1 and returning to the step of calculating the sum of the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value, so as to optimize the model of the next human body part of the three-dimensional human body model, until the model of each human body part in the three-dimensional human body model has been optimized.

Optionally, the calculating the sum of the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model, as the objective function value, specifically includes:

calculating the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model by adopting the following formula:

wherein $E_{ij}$ represents the similarity between the model of the jth human body part of the three-dimensional human body model and the ith pixel block model; $d(c_i, c_j)$ represents the similarity between the color $c_j$ of the model of the jth human body part and the color $c_i$ of the ith pixel block; $B_i(x)$ represents the two-dimensional pixel block model of the ith pixel block; $A_j(x)$ represents the model of the jth human body part of the three-dimensional human body model; $x$ represents the position of any point on the human body; $\mu_i$ represents the coordinates of the center of the ith pixel block; $\delta_i$ represents the half length of the ith pixel block; the formula further uses the projection coordinates of the model of the jth human body part on the ith pixel block and the projection length of the radius of the jth human body part on the ith pixel block;

$w$ is the color penalty value, $\varepsilon$ is the color threshold, $\mu_{jx}$, $\mu_{jy}$ and $\mu_{jz}$ are respectively the x-axis, y-axis and z-axis coordinates of the jth human body part, $\sigma_j$ represents the radius of the jth human body part, and $f_{il}$ represents the parameters of the camera that obtained the ith pixel block;

and calculating, by using a summation formula, the sum of the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model.

A human body posture recognition system, the recognition system comprising:

the image acquisition module is used for acquiring a plurality of human body images under different visual angles of the current frame;

the three-dimensional human body model building module is used for building a three-dimensional human body model by adopting a convolutional neural network algorithm according to a plurality of human body images;

the pixel clustering module is used for clustering adjacent pixels on each human body image respectively to obtain a plurality of pixel blocks;

the two-dimensional pixel block model building module is used for building a two-dimensional pixel block model of each pixel block;

the three-dimensional human body model optimizing module is used for optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model;

and the human body posture recognition module is used for determining the human body posture of the current frame according to the optimized three-dimensional human body model.

Optionally, the recognition system further comprises:

the position coordinate prediction module is used for predicting the position coordinate of each human body part in the current frame according to the position coordinate of each human body part in each frame in the preset frame before the current frame;

the three-dimensional human body model pre-adjustment module is used for pre-adjusting the three-dimensional human body model according to the position coordinates of each human body part in the current frame to obtain a pre-adjusted three-dimensional human body model.

Optionally, the three-dimensional human body model building module specifically includes:

the key point acquisition sub-module is used for acquiring K key points from each human body image by adopting an Hourglass network structure;

the back projection operation sub-module is used for calculating, for k = 1, 2, …, K, the back projection ray of the kth key point in the human body image under each viewing angle based on the parameters of the camera, and obtaining a plurality of back projection rays corresponding to each key point;

the key point coordinate determining submodule is used for determining the coordinate of a common point with the shortest total distance from a plurality of back projection rays corresponding to each key point, and the coordinate is used as the coordinate of the key point in the three-dimensional space to obtain the coordinate of each key point in the three-dimensional space;

and the three-dimensional human body model building sub-module is used for carrying out linear interpolation operation according to the coordinates of each key point in the three-dimensional space to obtain a three-dimensional human body model containing the three-dimensional coordinates of each human body part of the human body.

According to the specific embodiment provided by the invention, the invention discloses the following technical effects:

The invention predicts the result of the current frame by utilizing the continuity of human body actions and the result of the previous frames, reduces the iteration times in the optimization process and further improves the speed of human body posture estimation.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of the human body posture recognition method provided by the invention;

FIG. 2 is a schematic diagram of the human body posture recognition method provided by the invention;

FIG. 3 is a diagram of the arrangement of cameras for acquiring a plurality of human body images at different viewing angles provided by the invention;

FIG. 4 shows the three-dimensional human body model provided by the invention;

FIG. 5 shows a human body model composed of a plurality of two-dimensional pixel block models provided by the invention;

FIG. 6 shows the optimized three-dimensional human body model provided by the invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.

The invention relates to a human body posture estimation method that requires neither manually labeled data nor sensor data. Color images of a target person are collected from different angles in a fixed scene by a plurality of color cameras; coarse human body key point coordinates are obtained with a pre-trained convolutional neural network, and three-dimensional human body coordinates are obtained by back projection. A general three-dimensional human body model is generated from the three-dimensional human body coordinates, the similarity between the three-dimensional human body model and the human body in the images is calculated, and the human body model is optimized under the constraint that the multiple viewing angles be consistent in the three-dimensional world, yielding the final human body posture. The calculation is performed with CUDA and the parameters of the human body model are constrained, so the calculation completes in constant time; 25 frames of images are processed per second, so video can be processed in real time. The invention needs only color images as input, requires no manual operation or additional sensor equipment, and can be widely applied in the field of human body posture acquisition.

As shown in FIG. 1 and FIG. 2, the present invention provides a human body posture recognition method, which includes the steps of:

Step 101, acquiring a plurality of human body images under different visual angles of a current frame.

A plurality of color cameras located at different positions within a room are used to obtain continuous, synchronized sequences of color images of the human body from multiple viewing angles. All color cameras are controlled by a sync box.

The color cameras are distributed around a circle to obtain human body information from different angles; a typical eight-camera array is shown in FIG. 3. The sync box emits a square wave signal, and all cameras capture an image when they receive it, so that human body information at the same moment is collected. The clothing of the subject should be as rich in color as possible, so that the human body can be distinguished from background objects and different parts of the human body can be identified more easily. The background should be as simple as possible, which makes the resulting human body posture more robust.

Step 102, building a three-dimensional human body model by adopting a convolutional neural network algorithm according to the plurality of human body images.

For the first frame, the joint point coordinates output by the neural network are back-projected to obtain three-dimensional human body coordinates, from which a coarse posture conforming to the human body in the images is generated. The body model should describe, as far as possible, the build (heavy or thin), the height, the characteristic colors and similar information of the human body. In a body model formed by a set of three-dimensional Gaussian functions, this information is described by the mean and variance of each Gaussian function. To avoid the overfitting caused by a high number of degrees of freedom, L is used to describe the length of the human torso and R is used to describe its width, and the mean and variance of each Gaussian function are then calculated from them; this greatly reduces the degrees of freedom and gives better generalization. The invention adjusts this model to generate an actor-specific body model that represents, in general terms, the shape and color of each Gaussian.

For the 8 photographs input from different viewing angles, this step uses the Hourglass neural network to extract 16 human key points $(x_i, y_i)$ from each photograph. For the 8 views of the same key point, 8 back projection rays are calculated based on the camera parameters. Using the least squares method, a common point $(x_i, y_i, z_i)$ with the shortest total distance to the 8 rays is determined; this common point gives the coordinates of the key point in three-dimensional space. By linear interpolation of the three-dimensional coordinates $(x_i, y_i, z_i)$, a three-dimensional human body model having the three-dimensional coordinates of 63 human body parts is obtained.

The method comprises the following specific steps:

(1) Obtaining K key points from each human body image by adopting an Hourglass network structure.

(2) For k = 1, 2, …, K, calculating the back projection ray of the kth key point in the human body image under each viewing angle based on the parameters of the camera, to obtain a plurality of back projection rays corresponding to each key point.

(3) Determining the coordinates of a common point with the shortest total distance from the plurality of back projection rays corresponding to each key point, and taking the coordinates of the common point as the coordinates of the key point in three-dimensional space, to obtain the coordinates of each key point in three-dimensional space.

(4) Performing a linear interpolation operation according to the coordinates of each key point in three-dimensional space to obtain a three-dimensional human body model containing the three-dimensional coordinates of each human body part of the human body, as shown in FIG. 4.
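The common point with the shortest total distance to the rays can be found in closed form. The sketch below is one possible least squares formulation, not necessarily the one used by the invention: it assumes each back projection ray is already given as a camera centre and a direction (deriving these from the camera parameters is not shown), treats each ray as an infinite line, minimizes the sum of squared point-to-line distances, and solves the resulting 3x3 normal equations. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("rays are (nearly) parallel; no unique closest point")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def closest_point_to_rays(origins: Sequence[Vec3],
                          directions: Sequence[Vec3]) -> List[float]:
    """Point minimizing the sum of squared distances to all lines o + t*d."""
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d in zip(origins, directions):
        norm = math.sqrt(sum(c * c for c in d))
        u = [c / norm for c in d]
        for r in range(3):
            for c in range(3):
                # Entry (r, c) of the projector (I - u u^T) orthogonal to the ray.
                proj = (1.0 if r == c else 0.0) - u[r] * u[c]
                a[r][c] += proj
                b[r] += proj * o[c]
    return solve3(a, b)


# Three illustrative rays that all pass through (0, 0, 1).
point = closest_point_to_rays(
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 0.0)],
    [(-1.0, 0.0, 1.0), (0.0, -1.0, 1.0), (0.0, 0.0, 2.0)],
)
```

With noisy key point detections the rays no longer meet exactly, and the same solve returns the point that is closest to all of them in the least squares sense.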

Step 103, clustering adjacent pixels on each human body image respectively to obtain a plurality of pixel blocks.

For the picture of each viewing angle, the human body region of interest is first obtained by projecting the optimized three-dimensional human body model of the previous frame. Extracting only the human body region removes the irrelevant background. Neighboring pixels of similar color are clustered to generate a color block (pixel block); the result is shown in FIG. 5. Each color block is approximated by a two-dimensional Gaussian function. During clustering, the invention uses a threshold to determine which pixels are clustered together. Representing a picture by a picture model made up of a set of two-dimensional Gaussian functions saves a large amount of computation compared with similarity matching directly on the pixels of the picture, and greatly improves the overall speed.
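The text does not specify the clustering algorithm. One simple possibility, shown here purely as an assumption, is a quadtree split: a square region is kept as one pixel block when its colors stay within a threshold of the mean color, and is otherwise divided into four smaller squares. This yields square blocks with a center, a half length and a color, matching the quantities used by the two-dimensional pixel block model. The image side is assumed to be a power of two.

```python
from typing import List, Tuple

Color = Tuple[float, float, float]
# (center_x, center_y, half_length, mean_color)
Block = Tuple[float, float, float, Color]


def quadtree_blocks(img: List[List[Color]], x0: int, y0: int, size: int,
                    threshold: float) -> List[Block]:
    """Recursively split a square region until its colors are nearly uniform."""
    pixels = [img[y][x] for y in range(y0, y0 + size) for x in range(x0, x0 + size)]
    mean = tuple(sum(p[ch] for p in pixels) / len(pixels) for ch in range(3))
    # Largest per-channel deviation from the mean color inside the square.
    spread = max(abs(p[ch] - mean[ch]) for p in pixels for ch in range(3))
    if size == 1 or spread <= threshold:
        return [(x0 + size / 2, y0 + size / 2, size / 2, mean)]
    half = size // 2
    blocks: List[Block] = []
    for dy in (0, half):
        for dx in (0, half):
            blocks += quadtree_blocks(img, x0 + dx, y0 + dy, half, threshold)
    return blocks


# Illustrative 4x4 image: left half red, right half blue.
RED, BLUE = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
img = [[RED, RED, BLUE, BLUE] for _ in range(4)]
blocks = quadtree_blocks(img, 0, 0, 4, threshold=0.1)
```

In this example the full image is not uniform, so it is split once into four uniform 2x2 blocks, each with half length 1.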

Step 104, establishing a two-dimensional pixel block model of each pixel block.

The two-dimensional pixel block model for each pixel block is:

wherein $B_i(x)$ represents the two-dimensional pixel block model of the ith pixel block, $c_i$ represents the color of the center of the ith pixel block, $\mu_i$ represents the coordinates of the center of the ith pixel block, $\delta_i$ represents the half length of the ith pixel block, and $x_i$ represents the projection of the position $x$ of any point on the human body onto the ith pixel block.

The entire image is divided into a plurality of pixel blocks, and thus, the entire image can be expressed as:

$$\mathrm{Im}(x) = \sum_i c_i \cdot B_i(x)$$
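To illustrate how such an image model can be evaluated, the sketch below assumes that each pixel block model is an unnormalized isotropic two-dimensional Gaussian centered on the block with the half length as its spread. That functional form is an assumption made for illustration; only the weighted sum over blocks is taken from the text.

```python
import math
from typing import List, Tuple

Color = Tuple[float, float, float]
# (mu_x, mu_y, delta, color): block center, half length and center color
Block = Tuple[float, float, float, Color]


def block_value(block: Block, x: float, y: float) -> float:
    """Assumed Gaussian form of B_i evaluated at image position (x, y)."""
    mu_x, mu_y, delta, _ = block
    return math.exp(-((x - mu_x) ** 2 + (y - mu_y) ** 2) / (2.0 * delta ** 2))


def image_model(blocks: List[Block], x: float, y: float) -> Color:
    """Im(x) = sum_i c_i * B_i(x), computed per color channel."""
    return tuple(sum(b[3][ch] * block_value(b, x, y) for b in blocks)
                 for ch in range(3))


# One illustrative red block centered at (1, 1) with half length 1.
demo_blocks = [(1.0, 1.0, 1.0, (1.0, 0.0, 0.0))]
center_color = image_model(demo_blocks, 1.0, 1.0)
offset_color = image_model(demo_blocks, 3.0, 1.0)
```

At the block center the model reproduces the block color, and the contribution decays with distance from the center.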

Step 105, optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model.

The human body model is projected to each viewing angle:

wherein $\mu_{jx}$, $\mu_{jy}$ and $\mu_{jz}$ are respectively the x-axis, y-axis and z-axis coordinates of the jth human body part, $\sigma_j$ represents the radius of the jth human body part, and $f_{il}$ represents the parameters of the camera that obtained the ith pixel block. By this formula, the three-dimensional human body model, which is a set of three-dimensional Gaussian functions, is projected to two dimensions, so that its similarity to the two-dimensional pixel block model, which is a set of two-dimensional Gaussian functions, can be calculated. The formula gives an approximation of the true projection, which is an ellipse, but the error introduced by the approximation is negligible. The similarity between the three-dimensional human body model and the image model can be expressed as:

wherein $d(c_i, c_j)$ is the closeness of the two colors:

$w$ is a penalty for colors that differ too much, and $\varepsilon$ is a threshold that determines whether two colors are close. In general, $\varepsilon = 0.1$ and $w = 0.05$ give good results. The RGB color space is strongly coupled to illumination intensity, and better results can be obtained with other color spaces, such as the Lab color space. The similarity $E_{ij}$ is differentiated, and moving along the gradient direction increases $E_{ij}$:

$$E_{ij}^{k+1} = E_{ij}^{k} + \rho_k s^{(k)}$$

wherein $s^{(k)}$ represents the gradient direction and $\rho_k$ represents the search step in the gradient direction. After a certain number of iterations, when $E_{ij}$ approaches a constant, the parameters of the human body model at that moment are recorded as the current human body posture. The step size is dynamic. An initial step size is determined from statistics over a plurality of videos. During optimization, when the derivatives of two consecutive iterations have the same sign, the current pose has not yet reached the optimal point, and the step size is multiplied by 1.1. When the derivatives of two consecutive iterations have different signs, the current pose has skipped past the optimal point, and the step size is multiplied by 0.5 (it need not be multiplied by -0.5, because the sign of the derivative itself has changed and with it the direction of optimization). Since human body motion is continuous, the result of the next frame can be predicted from the results of the previous frames:

$$\mathrm{pose}_{i+1} = t_1 \cdot \mathrm{pose}_i + t_2 \cdot \mathrm{pose}_{i-1} + t_3 \cdot \mathrm{pose}_{i-2}$$

where pose denotes the pose results of the different frames and $t$ denotes the weight given to each frame's result in the prediction. The pose of the current frame is predicted from the historical poses and replaces the previous frame's result as the initial pose of the current frame. This initial pose is closer to the picture model (the set of two-dimensional Gaussian functions) of the current frame, which reduces the number of optimization iterations and speeds up the whole process.
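The weighted prediction can be sketched as follows. The text does not give values for the weights; the values (3, -3, 1) used in the example are an illustrative assumption that corresponds to extrapolating each pose parameter with constant acceleration. The function name is hypothetical.

```python
from typing import List, Sequence


def predict_pose(history: Sequence[Sequence[float]],
                 weights: Sequence[float]) -> List[float]:
    """Predict the next pose as a weighted sum of the most recent poses.

    history[-1] is pose_i, history[-2] is pose_{i-1}, and so on;
    weights[0] applies to pose_i, weights[1] to pose_{i-1}, etc.
    """
    recent = list(history)[-len(weights):][::-1]  # newest pose first
    return [sum(w * pose[n] for w, pose in zip(weights, recent))
            for n in range(len(recent[0]))]


# One pose parameter following 0, 1, 4 (that is, n squared) over three frames.
predicted = predict_pose([[0.0], [1.0], [4.0]], (3.0, -3.0, 1.0))
```

For this quadratic sequence the assumed weights predict 9 for the next frame, which continues the sequence exactly; the predicted pose then serves as the starting point of the optimization for the current frame.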

Step 105, optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model, specifically includes: calculating the sum of the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value; judging whether the objective function value is larger than a preset threshold value to obtain a judgment result; if the judgment result is no, optimizing the model of the jth human body part in the three-dimensional human body model by adopting a gradient descent method, and returning to the step of calculating the sum of the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value; if the judgment result is yes, increasing the value of j by 1 and returning to the step of calculating the sum of the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value, so as to optimize the model of the next human body part of the three-dimensional human body model, until the model of each human body part in the three-dimensional human body model has been optimized.
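The control flow of this per-part loop can be sketched as follows. The objective (the similarity sum for part j) and the update step are passed in as hypothetical callbacks, and the `max_iters` safeguard is an addition that does not appear in the text. The toy objective in the example merely stands in for the similarity sum.

```python
from typing import Callable, List


def optimize_parts(num_parts: int,
                   objective: Callable[[int], float],
                   improve: Callable[[int], None],
                   threshold: float,
                   max_iters: int = 1000) -> List[int]:
    """Optimize body parts one by one until each objective exceeds the threshold.

    Returns the number of update steps used for each part.
    """
    iterations_used: List[int] = []
    for j in range(num_parts):
        n = 0
        # Judgment "no": objective not above threshold, so update and re-evaluate.
        while objective(j) <= threshold and n < max_iters:
            improve(j)
            n += 1
        # Judgment "yes" (or safeguard reached): move on to part j + 1.
        iterations_used.append(n)
    return iterations_used


# Toy stand-in: each part has one parameter that should reach a target value.
targets = [1.0, -2.0]
state = [0.0, 0.0]


def objective(j: int) -> float:
    return -(state[j] - targets[j]) ** 2


def improve(j: int) -> None:
    state[j] += 0.5 * (targets[j] - state[j])


iterations = optimize_parts(2, objective, improve, threshold=-1e-6)
```

Processing one part at a time keeps each inner loop small, since only the parameters of the current part change between evaluations of its objective.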

The calculating the sum of the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model, as the objective function value, specifically includes: calculating the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model by adopting the following formula:

wherein $d(c_i, c_j)$ represents the similarity between the color $c_j$ of the model of the jth human body part and the color $c_i$ of the ith pixel block; $B_i(x)$ represents the two-dimensional pixel block model of the ith pixel block; $A_j(x)$ represents the model of the jth human body part of the three-dimensional human body model; $x$ represents the position of any point on the human body; $\mu_i$ represents the coordinates of the center of the ith pixel block; $\delta_i$ represents the half length of the ith pixel block; the formula further uses the projection coordinates of the model of the jth human body part on the ith pixel block and the projection length of the radius of the jth human body part on the ith pixel block;

$w$ is the color penalty value, $\varepsilon$ is the color threshold, $\mu_{jx}$, $\mu_{jy}$ and $\mu_{jz}$ are respectively the x-axis, y-axis and z-axis coordinates of the jth human body part, $\sigma_j$ represents the radius of the jth human body part, and $f_{il}$ represents the parameters of the camera that obtained the ith pixel block.
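To make these quantities concrete, the sketch below combines three assumed ingredients: a pinhole projection of a body part with a single focal length (standing in for the camera parameters), a color closeness that falls linearly to zero at the threshold and equals the negative penalty beyond it, and the closed-form overlap integral of two isotropic two-dimensional Gaussians. None of these exact forms is confirmed by the text; only the default values $\varepsilon = 0.1$ and $w = 0.05$ come from the description.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]
Color = Tuple[float, float, float]


def project_part(mu: Tuple[float, float, float], sigma: float,
                 focal: float) -> Tuple[Vec2, float]:
    """Assumed pinhole projection of a part given in camera coordinates (z > 0)."""
    x, y, z = mu
    return (focal * x / z, focal * y / z), focal * sigma / z


def color_closeness(c_i: Color, c_j: Color,
                    eps: float = 0.1, w: float = 0.05) -> float:
    """Assumed closeness: 1 for equal colors, 0 at the threshold, -w beyond it."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(c_i, c_j)))
    return 1.0 - dist / eps if dist < eps else -w


def similarity(block_mu: Vec2, block_delta: float, block_color: Color,
               part_mu: Vec2, part_sigma: float, part_color: Color) -> float:
    """Color closeness times the overlap integral of two isotropic 2D Gaussians."""
    s = block_delta ** 2 + part_sigma ** 2
    d2 = (block_mu[0] - part_mu[0]) ** 2 + (block_mu[1] - part_mu[1]) ** 2
    overlap = (2.0 * math.pi * block_delta ** 2 * part_sigma ** 2 / s
               * math.exp(-d2 / (2.0 * s)))
    return color_closeness(block_color, part_color) * overlap


# Illustrative part 2 units in front of a camera with focal length 1000.
mu2d, sigma2d = project_part((0.0, 0.0, 2.0), 0.1, 1000.0)
same = similarity((0.0, 0.0), 1.0, (1.0, 0.0, 0.0), (0.0, 0.0), 1.0, (1.0, 0.0, 0.0))
```

Summing `similarity` over all pixel blocks for a fixed body part gives an objective of the kind described above, under these assumptions.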

A color similarity is calculated for each pixel block in FIG. 5. The parameters of the three-dimensional human body model are then optimized along the gradient direction so that the similarity increases. When the similarity is stable, the three-dimensional human body model is projected onto the picture; the result is shown in FIG. 6. The parameters of the three-dimensional human body model at that point constitute the current human body posture.
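The gradient-direction optimization with the dynamic step size described earlier (multiplication by 1.1 while the derivative keeps its sign, multiplication by 0.5 when the sign changes) can be sketched as follows. Keeping one step size per parameter and running a fixed number of iterations are assumptions, and the quadratic toy objective merely stands in for the similarity.

```python
from typing import Callable, List


def adaptive_gradient_ascent(grad: Callable[[List[float]], List[float]],
                             params: List[float],
                             initial_step: float,
                             iterations: int) -> List[float]:
    """Move along the gradient, adapting each step size to derivative sign changes."""
    params = list(params)
    steps = [initial_step] * len(params)
    prev = [0.0] * len(params)
    for _ in range(iterations):
        g = grad(params)
        for n in range(len(params)):
            if g[n] * prev[n] > 0:      # same sign: optimum not reached yet
                steps[n] *= 1.1
            elif g[n] * prev[n] < 0:    # sign changed: optimum was skipped
                steps[n] *= 0.5
            params[n] += steps[n] * g[n]
        prev = g
    return params


# Toy objective f(p) = -(p - 3)^2 with gradient -2 (p - 3); maximum at p = 3.
result = adaptive_gradient_ascent(lambda p: [-2.0 * (p[0] - 3.0)],
                                  [0.0], initial_step=0.1, iterations=200)
```

The step size is never negated: after an overshoot the gradient itself points back toward the optimum, so halving the step is sufficient.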

Step 106, determining the human body posture of the current frame according to the optimized three-dimensional human body model.

In order to reduce the number of iterations in the optimization process, step 105 of optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model further includes: predicting the position coordinates of each human body part in the current frame according to the position coordinates of each human body part in each of a preset number of frames before the current frame; and pre-adjusting the three-dimensional human body model according to the predicted position coordinates of each human body part in the current frame to obtain a pre-adjusted three-dimensional human body model.

The invention performs picture preprocessing on the CPU to obtain a mathematical representation of each picture. The picture representation and the human body model parameters are then transferred to CUDA (Compute Unified Device Architecture, a parallel computing framework) for computation. The same number of pixel blocks is used for every picture. Each CUDA kernel calculates the similarity between a pixel block and a part of the human body; because the parameters of the human body model are fixed, each calculation takes constant time.

The invention also provides a human body posture recognition system, which comprises:

The two-dimensional pixel block model building module is used for building a two-dimensional pixel block model of each pixel block.

The three-dimensional human body model building module specifically comprises: the key point acquisition sub-module, used for acquiring K key points from each human body image by adopting an Hourglass network structure; the back projection operation sub-module, used for calculating, for k = 1, 2, …, K, the back projection ray of the kth key point in the human body image under each viewing angle based on the parameters of the camera, and obtaining a plurality of back projection rays corresponding to each key point; the key point coordinate determining sub-module, used for determining the coordinates of a common point with the shortest total distance from the plurality of back projection rays corresponding to each key point, and taking them as the coordinates of the key point in three-dimensional space to obtain the coordinates of each key point in three-dimensional space; and the three-dimensional human body model building sub-module, used for carrying out a linear interpolation operation according to the coordinates of each key point in three-dimensional space to obtain a three-dimensional human body model containing the three-dimensional coordinates of each human body part of the human body.

The three-dimensional human body model optimizing module is used for optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model.

The recognition system further comprises: the position coordinate prediction module, used for predicting the position coordinates of each human body part in the current frame according to the position coordinates of each human body part in each of a preset number of frames before the current frame; and the three-dimensional human body model pre-adjustment module, used for pre-adjusting the three-dimensional human body model according to the position coordinates of each human body part in the current frame to obtain a pre-adjusted three-dimensional human body model.

The invention discloses a human body posture recognition method and system, wherein the recognition method comprises the following steps: acquiring a plurality of human body images of the current frame under different view angles; establishing a three-dimensional human body model from the plurality of human body images by adopting a convolutional neural network algorithm; clustering adjacent pixels on each human body image respectively to obtain a plurality of pixel blocks; establishing a two-dimensional pixel block model of each pixel block; optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model; and determining the human body posture of the current frame according to the optimized three-dimensional human body model. According to the invention, a three-dimensional human body model is built simply and rapidly by adopting a convolutional neural network algorithm, the three-dimensional human body model is then optimized by utilizing the human body posture in the images, and posture recognition is performed by utilizing the optimized three-dimensional human body model, so that the speed of human body posture estimation is improved while its accuracy is ensured.
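The pixel clustering step summarized above can be sketched as follows. This passage does not name the clustering algorithm, so the sketch assumes a quadtree split driven by color variation, restricted to square images whose side is a power of two. Each resulting block is described by a color, a center and a half length, mirroring the quantities of the two-dimensional pixel block model; the mean block color is used here as the block color, which is also an assumption. The function name and default thresholds are hypothetical.

```python
import numpy as np

def pixel_blocks(image, threshold=10.0, min_size=2):
    """Cluster adjacent pixels of a square (S, S, C) image into square blocks.

    Returns a list of (color, center_xy, half_length) tuples. A block is split
    into four quadrants while any color channel has a standard deviation above
    `threshold` and the block is larger than `min_size` pixels.
    """
    blocks = []

    def split(x0, y0, size):
        patch = image[y0:y0 + size, x0:x0 + size].reshape(-1, image.shape[2]).astype(float)
        if size <= min_size or patch.std(axis=0).max() <= threshold:
            half = size / 2.0
            blocks.append((patch.mean(axis=0), np.array([x0 + half, y0 + half]), half))
            return
        h = size // 2
        for dy in (0, h):
            for dx in (0, h):
                split(x0 + dx, y0 + dy, h)

    split(0, 0, image.shape[0])
    return blocks
```

Uniform regions collapse into a few large blocks while detailed regions are subdivided, which keeps the number of pixel blocks compared against the three-dimensional human body model far smaller than the number of pixels.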

In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.

The principles and embodiments of the present invention have been described herein with reference to specific examples, and the description of these examples is intended only to assist in understanding the method of the present invention and its core ideas; likewise, modifications made by those of ordinary skill in the art in light of the present teachings remain within the scope of the present invention. In view of the foregoing, this description should not be construed as limiting the invention.

Claims

1. A human body posture recognition method, characterized in that the recognition method comprises the steps of:

establishing a two-dimensional pixel block model of each pixel block;

determining the human body posture of the current frame according to the optimized three-dimensional human body model;

the three-dimensional human body model is as follows:

wherein A(x) is the three-dimensional human body model, A_j(x) is the three-dimensional human body model of the jth human body part, μ_j represents the coordinates of the jth human body part, σ_j represents the radius of the jth human body part, and x represents the position of any point of the human body;

the two-dimensional pixel block model is as follows:

wherein B_i(x) represents the two-dimensional pixel block model of the ith pixel block, c_i represents the color of the center of the ith pixel block, μ_i represents the coordinates of the center of the ith pixel block, δ_i represents half the length of the ith pixel block, and x_i represents the projection of the position x of any point of the human body onto the ith pixel block.

2. The method according to claim 1, wherein the optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model further comprises:

predicting the position coordinates of each human body part in the current frame according to the position coordinates of each human body part in each of the preset frames preceding the current frame;

3. The human body posture recognition method of claim 1, wherein the building a three-dimensional human body model according to a plurality of human body images by adopting a convolutional neural network algorithm specifically comprises:

4. The human body posture recognition method according to claim 1, wherein the optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model specifically comprises:

5. The human body posture recognition method according to claim 4, characterized in that said calculating a sum of the similarities between the model of the jth human body part of the three-dimensional human body model and each of said two-dimensional pixel block models as an objective function value specifically comprises:

wherein E_ij represents the similarity between the model of the jth human body part of the three-dimensional human body model and the ith pixel block model; d(c_i, c_j) represents the similarity between the color c_j of the model of the jth human body part and the color c_i of the ith pixel block; B_i(x) represents the two-dimensional pixel block model of the ith pixel block; A_j(x) represents the model of the jth human body part of the three-dimensional human body model; x represents the position of any point of the human body; μ_i represents the coordinates of the center of the ith pixel block; δ_i represents half the length of the ith pixel block; and the remaining two quantities are the projection coordinates of the model of the jth human body part on the ith pixel block and the projection length of the radius of the jth human body part on the ith pixel block;

wherein w is the color penalty value, ε is the color threshold, μ_jx, μ_jy and μ_jz are respectively the x-axis, y-axis and z-axis coordinates of the jth human body part, σ_j represents the radius of the jth human body part, and f_il represents the parameters of the camera that captured the ith pixel block;

6. A human body posture recognition system, the recognition system comprising:

the human body posture recognition module is used for determining the human body posture of the current frame according to the optimized three-dimensional human body model;

the three-dimensional human body model is as follows:

the two-dimensional pixel block model is as follows:

7. The human body posture recognition system of claim 6, wherein the recognition system further comprises:

8. The human body posture recognition system of claim 6, wherein the three-dimensional human body model building module specifically comprises: