CN113065506B - Human body posture recognition method and system - Google Patents

Human body posture recognition method and system

Info

Publication number
CN113065506B
Authority
CN
China
Prior art keywords
human body
dimensional
model
pixel block
body part
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110411237.9A
Other languages
Chinese (zh)
Other versions
CN113065506A (en
Inventor
江沛远
周余
于耀
都思丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Application filed by Nanjing University
Priority to CN202110411237.9A
Publication of CN113065506A
Application granted
Publication of CN113065506B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a human body posture recognition method comprising the following steps: acquiring a plurality of human body images of a current frame at different viewing angles; building a three-dimensional human body model from the plurality of human body images by means of a convolutional neural network algorithm; clustering adjacent pixels on each human body image to obtain a plurality of pixel blocks; establishing a two-dimensional pixel block model of each pixel block; optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model; and determining the human body posture of the current frame according to the optimized three-dimensional human body model. The convolutional neural network algorithm builds the three-dimensional human body model simply and quickly; the model is then optimized against the human body posture observed in the images, and posture recognition is performed with the optimized model, so that the speed of human body posture estimation is improved while its accuracy is maintained.

Description

Human body posture recognition method and system
Technical Field
The present invention relates to the field of image processing technologies, and in particular to a human body posture recognition method and system.
Background
3D human body pose estimation refers to estimating the pose of a human target from an image, a video or a point cloud, and is a fundamental task in 3D research on the human body. It is an important precondition for 3D human body reconstruction and can also serve as an important source of motion when driving human body motion. Currently, there are two main ways of obtaining the human body posture. 1. A neural network is trained on a specific data set to estimate the human body posture in the corresponding scene. This method requires a large amount of manually labeled data to train the neural network, and its accuracy is low. 2. A human body model is established and fitted to the human body in the picture. This method relies on building the human body model, which is relatively complex and slow. Therefore, how to improve the speed of human body posture estimation while ensuring its accuracy is a technical problem that urgently needs to be solved.
Disclosure of Invention
The invention aims to provide a human body posture recognition method and system, so as to improve the speed of human body posture estimation while ensuring the accuracy of human body posture estimation.
In order to achieve the above object, the present invention provides the following solutions:
the invention provides a human body posture recognition method, which comprises the following steps:
acquiring a plurality of human body images under different visual angles of a current frame;
according to a plurality of human body images, a three-dimensional human body model is established by adopting a convolutional neural network algorithm;
clustering adjacent pixels on each human body image respectively to obtain a plurality of pixel blocks;
establishing a two-dimensional pixel block model of each pixel block;
optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model;
and determining the human body posture of the current frame according to the optimized three-dimensional human body model.
Optionally, before the optimizing of the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model, the method further includes:
predicting the position coordinates of each human body part in the current frame according to the position coordinates of each human body part in each of a preset number of frames preceding the current frame;
and pre-adjusting the three-dimensional human body model according to the predicted position coordinates of each human body part in the current frame to obtain a pre-adjusted three-dimensional human body model.
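As a minimal sketch of this prediction step, assume each frame's result is stored as a flat list of part coordinates and that the prediction is a weighted sum of the three most recent frames, which is the form the description gives for pose prediction. The function name `predict_pose` and the default weights (2, -1, 0), which amount to constant-velocity extrapolation, are illustrative assumptions; the source does not state the weight values.

```python
from typing import List, Sequence


def predict_pose(history: Sequence[Sequence[float]],
                 weights: Sequence[float] = (2.0, -1.0, 0.0)) -> List[float]:
    """Predict the next pose as a weighted sum of the most recent poses.

    history[-1] is the latest frame (pose_i), history[-2] is pose_{i-1}, etc.
    weights[0] multiplies pose_i, weights[1] multiplies pose_{i-1}, and so on.
    The default weights are a hypothetical choice, not taken from the patent.
    """
    if len(history) < len(weights):
        raise ValueError("need at least as many past poses as weights")
    dim = len(history[-1])
    predicted = [0.0] * dim
    for offset, weight in enumerate(weights):
        pose = history[-1 - offset]
        for d in range(dim):
            predicted[d] += weight * pose[d]
    return predicted


# Three past frames of a toy two-coordinate pose moving at constant velocity.
past = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]]
print(predict_pose(past))
```

The predicted pose would then serve as the starting point of the optimization in place of the previous frame's result.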
Optionally, the building of a three-dimensional human body model from the plurality of human body images by means of a convolutional neural network algorithm specifically includes:
obtaining K key points from each human body image by means of an Hourglass network structure;
for each k = 1, 2, …, K, calculating, based on the camera parameters, the back-projection ray of the kth key point in the human body image at each viewing angle, to obtain a plurality of back-projection rays corresponding to each key point;
for each key point, determining the coordinates of the common point with the shortest total distance to the plurality of back-projection rays corresponding to that key point, and taking them as the coordinates of the key point in three-dimensional space, thereby obtaining the coordinates of every key point in three-dimensional space;
and performing a linear interpolation operation on the coordinates of the key points in three-dimensional space to obtain a three-dimensional human body model containing the three-dimensional coordinates of each human body part of the human body.
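The common-point step above can be illustrated with a small least-squares sketch. It assumes the back-projection rays are already available as an origin (camera center) and a direction per view, treats each ray as an infinite line, and minimizes the sum of squared perpendicular distances; the patent only names "least squares" and "shortest total distance", so these details, and the function names, are assumptions.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def closest_point_to_rays(origins: Sequence[Vec3],
                          directions: Sequence[Vec3]) -> Vec3:
    """Point minimizing the summed squared distance to all rays (as lines).

    For a unit direction u and origin o, the squared distance of p to the line
    is |(I - u u^T)(p - o)|^2, so p solves (sum M) p = sum (M o), M = I - u u^T.
    """
    a: List[List[float]] = [[0.0] * 3 for _ in range(3)]
    b: List[float] = [0.0] * 3
    for o, d in zip(origins, directions):
        norm = sum(c * c for c in d) ** 0.5
        u = [c / norm for c in d]
        for r in range(3):
            for c in range(3):
                m = (1.0 if r == c else 0.0) - u[r] * u[c]
                a[r][c] += m
                b[r] += m * o[c]
    det = _det3(a)
    if abs(det) < 1e-12:
        raise ValueError("rays are parallel; the common point is not unique")
    # Cramer's rule: replace one column of `a` by `b` at a time.
    solution = []
    for col in range(3):
        replaced = [[b[r] if c == col else a[r][c] for c in range(3)]
                    for r in range(3)]
        solution.append(_det3(replaced) / det)
    return (solution[0], solution[1], solution[2])


# Three hypothetical cameras whose rays all pass through the point (1, 2, 3).
target = (1.0, 2.0, 3.0)
cams = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
rays = [tuple(t - o for t, o in zip(target, cam)) for cam in cams]
print(closest_point_to_rays(cams, rays))
```

With noisy key point detections the rays no longer intersect exactly, and the same solve returns the point closest to all of them in the least-squares sense.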
Optionally, the three-dimensional human body model is:
wherein $A(x)$ is the three-dimensional human body model, $A_j(x)$ is the three-dimensional model of the jth human body part, $\mu_j$ represents the coordinates of the jth human body part, $\sigma_j$ represents the radius of the jth human body part, and $x$ represents the position of any point of the human body.
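The formula itself does not appear in this text. Since the description characterizes the body model as a set of three-dimensional Gaussian functions described by a mean and a variance, one form consistent with the symbol definitions above would be the following. This is an assumed reconstruction: the unnormalized isotropic Gaussian and the factor of 2 in the exponent are not confirmed by the source.

```latex
A(x) = \sum_{j} A_j(x),
\qquad
A_j(x) = \exp\!\left( -\frac{\lVert x - \mu_j \rVert^{2}}{2\,\sigma_j^{2}} \right)
```

Under this assumption each body part contributes a soft ball centered at $\mu_j$ whose extent is set by $\sigma_j$.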
Optionally, the two-dimensional pixel block model is:
wherein $B_i(x)$ represents the two-dimensional pixel block model of the ith pixel block, $c_i$ represents the color of the center of the ith pixel block, $\mu_i$ represents the coordinates of the center of the ith pixel block, $\delta_i$ represents the half length of the ith pixel block, and $x_i$ represents the projection of the position $x$ of any point of the human body onto the ith pixel block.
Optionally, the optimizing of the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model specifically includes:
calculating the sum of the similarities between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value;
judging whether the objective function value is larger than a preset threshold to obtain a judgment result;
if the judgment result is no, optimizing the model of the jth human body part in the three-dimensional human body model by a gradient descent method, and returning to the step of calculating the sum of the similarities between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value;
if the judgment result is yes, increasing the value of j by 1 and returning to the step of calculating the sum of the similarities between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value, so as to optimize the model of the next human body part of the three-dimensional human body model, until the model of every human body part in the three-dimensional human body model has been optimized.
Optionally, the calculating of the sum of the similarities between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as the objective function value specifically includes:
calculating the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model by the following formula:
wherein $E_{ij}$ represents the similarity between the model of the jth human body part of the three-dimensional human body model and the ith pixel block model; $d(c_i, c_j)$ represents the closeness of the color $c_j$ of the model of the jth human body part to the color $c_i$ of the ith pixel block; $B_i(x)$ represents the two-dimensional pixel block model of the ith pixel block; $A_j(x)$ represents the model of the jth human body part of the three-dimensional human body model; $x$ represents the position of any point of the human body; $\mu_i$ represents the coordinates of the center of the ith pixel block; $\delta_i$ represents the half length of the ith pixel block; and the two projected quantities in the formula are the projection coordinates of the model of the jth human body part on the ith pixel block and the projection length of the radius of the jth human body part on the ith pixel block;
$w$ is the color penalty value, $\varepsilon$ is the color threshold, $\mu_{jx}$, $\mu_{jy}$ and $\mu_{jz}$ are the x-axis, y-axis and z-axis coordinates of the jth human body part, $\sigma_j$ represents the radius of the jth human body part, and $f_{il}$ represents the parameters of the camera that captured the ith pixel block;
and calculating the sum of the similarities between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model by a summation formula.
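The formulas referred to above are not reproduced in this text, so the following is only a sketch of how such an objective could be computed under explicit assumptions: a pinhole camera with a single focal length and the part given in camera coordinates (standing in for the camera parameters $f_{il}$), unnormalized isotropic Gaussians for both the projected part and the pixel block, the closed-form overlap integral of two such 2D Gaussians as the spatial term, and a placeholder color closeness that returns 1 for colors closer than $\varepsilon$ and $-w$ otherwise. All function names and this particular form of $d(c_i, c_j)$ are hypothetical.

```python
import math
from typing import Sequence, Tuple

Color = Tuple[float, float, float]
Point2 = Tuple[float, float]
Block = Tuple[Color, Point2, float]  # (color c_i, center mu_i, half length delta_i)


def project_part(mu: Tuple[float, float, float], sigma: float,
                 focal: float) -> Tuple[Point2, float]:
    """Assumed pinhole projection of a part's center and radius."""
    x, y, z = mu
    if z <= 0:
        raise ValueError("part must lie in front of the camera")
    return (focal * x / z, focal * y / z), focal * sigma / z


def color_closeness(c_i: Color, c_j: Color,
                    eps: float = 0.1, w: float = 0.05) -> float:
    """Placeholder d(c_i, c_j): 1 if the colors are within eps, else -w."""
    return 1.0 if math.dist(c_i, c_j) < eps else -w


def gaussian_overlap(mu_a: Point2, s_a: float, mu_b: Point2, s_b: float) -> float:
    """Integral over the plane of exp(-|x-a|^2/(2 s_a^2)) * exp(-|x-b|^2/(2 s_b^2))."""
    var = s_a ** 2 + s_b ** 2
    d2 = (mu_a[0] - mu_b[0]) ** 2 + (mu_a[1] - mu_b[1]) ** 2
    return 2.0 * math.pi * s_a ** 2 * s_b ** 2 / var * math.exp(-d2 / (2.0 * var))


def part_objective(mu: Tuple[float, float, float], sigma: float, color: Color,
                   focal: float, blocks: Sequence[Block]) -> float:
    """Sum of the similarities between one body part and every pixel block."""
    center, radius = project_part(mu, sigma, focal)
    return sum(color_closeness(c_i, color)
               * gaussian_overlap(mu_i, delta_i, center, radius)
               for c_i, mu_i, delta_i in blocks)


red_block = [((1.0, 0.0, 0.0), (0.0, 0.0), 0.5)]
print(part_objective((0.0, 0.0, 2.0), 0.01, (1.0, 0.0, 0.0), 100.0, red_block))
```

The defaults for `eps` and `w` follow the values 0.1 and 0.05 quoted in the description; in a multi-camera setup each block would be paired with the parameters of the camera that captured it rather than one shared focal length.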
A human body posture recognition system, the recognition system comprising:
the image acquisition module is used for acquiring a plurality of human body images under different visual angles of the current frame;
the three-dimensional human body model building module is used for building a three-dimensional human body model by adopting a convolutional neural network algorithm according to a plurality of human body images;
the pixel clustering module is used for clustering adjacent pixels on each human body image respectively to obtain a plurality of pixel blocks;
the two-dimensional pixel block model building module is used for building a two-dimensional pixel block model of each pixel block;
the three-dimensional human body model optimizing module is used for optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model;
and the human body posture recognition module is used for determining the human body posture of the current frame according to the optimized three-dimensional human body model.
Optionally, the recognition system further comprises:
the position coordinate prediction module is used for predicting the position coordinate of each human body part in the current frame according to the position coordinate of each human body part in each frame in the preset frame before the current frame;
the three-dimensional human body model pre-adjustment module is used for pre-adjusting the three-dimensional human body model according to the position coordinates of each human body part in the current frame to obtain a pre-adjusted three-dimensional human body model.
Optionally, the three-dimensional human body model building module specifically includes:
the key point acquisition sub-module is used for obtaining K key points from each human body image by means of an Hourglass network structure;
the back-projection operation sub-module is used for calculating, for each k = 1, 2, …, K and based on the camera parameters, the back-projection ray of the kth key point in the human body image at each viewing angle, to obtain a plurality of back-projection rays corresponding to each key point;
the key point coordinate determining sub-module is used for determining, for each key point, the coordinates of the common point with the shortest total distance to the plurality of back-projection rays corresponding to that key point, and taking them as the coordinates of the key point in three-dimensional space, thereby obtaining the coordinates of every key point in three-dimensional space;
and the three-dimensional human body model building sub-module is used for performing a linear interpolation operation on the coordinates of the key points in three-dimensional space to obtain a three-dimensional human body model containing the three-dimensional coordinates of each human body part of the human body.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
The invention discloses a human body posture recognition method comprising the following steps: acquiring a plurality of human body images of a current frame at different viewing angles; building a three-dimensional human body model from the plurality of human body images by means of a convolutional neural network algorithm; clustering adjacent pixels on each human body image to obtain a plurality of pixel blocks; establishing a two-dimensional pixel block model of each pixel block; optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model; and determining the human body posture of the current frame according to the optimized three-dimensional human body model. The convolutional neural network algorithm builds the three-dimensional human body model simply and quickly; the model is then optimized against the human body posture observed in the images, and posture recognition is performed with the optimized model, so that the speed of human body posture estimation is improved while its accuracy is maintained.
The invention predicts the result of the current frame by utilizing the continuity of human body actions and the result of the previous frames, reduces the iteration times in the optimization process and further improves the speed of human body posture estimation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the human body posture recognition method provided by the invention;
FIG. 2 is a schematic diagram of the human body posture recognition method provided by the invention;
FIG. 3 is a diagram of the arrangement of cameras for acquiring multiple images of a human body at different viewing angles according to the invention;
FIG. 4 is the three-dimensional human body model provided by the invention;
FIG. 5 is a diagram of a human body model composed of a plurality of two-dimensional pixel block models provided by the invention;
FIG. 6 is a view of the optimized three-dimensional human body model provided by the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a human body posture recognition method and system, so as to improve the speed of human body posture estimation while ensuring the accuracy of human body posture estimation.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
The invention relates to a human body posture estimation method that requires neither manually labeled data nor sensor data. Color images of a target person are collected from different angles in a fixed scene by a plurality of color cameras; coarse human body key point coordinates are obtained with a pre-trained convolutional neural network, and three-dimensional human body coordinates are obtained from them by back projection. A general three-dimensional human body model is generated from the three-dimensional human body coordinates, the similarity between the three-dimensional human body model and the human body in the images is calculated, and the human body model is optimized with the consistency of the multiple viewing angles in the three-dimensional world as a constraint, yielding the final human body posture. The computation runs on CUDA and the parameters of the human body model are constrained, so the calculation completes in constant time; 25 frames are processed per second, so video can be processed in real time. The invention needs only color images as input, requires no manual operation or extra sensor equipment, and can be widely applied in the field of human body posture acquisition.
As shown in FIG. 1 and FIG. 2, the present invention provides a human body posture recognition method, which includes the following steps:
Step 101: acquiring a plurality of human body images of the current frame at different viewing angles.
A plurality of color cameras located at different positions within the room are used to obtain a sequence of continuous, synchronized color images of the human body from multiple viewing angles. All color cameras are controlled by a synchronization box.
The color cameras are distributed around the subject to capture the human body from different angles; a typical eight-camera array is shown in FIG. 3. The synchronization box sends out a square-wave signal, and all cameras shoot at the moment they receive it, so that the human body is captured at the same instant in every view. The subject's clothing should be as rich in color as possible, so that the human body can be distinguished from background objects and different parts of the human body can be identified more easily. The background should be as simple as possible, which makes the resulting human body posture more robust.
Step 102: building a three-dimensional human body model from the plurality of human body images by means of a convolutional neural network algorithm.
The joint point coordinates output by the neural network for the first frame are back-projected to obtain three-dimensional human body coordinates, from which a coarse posture matching the human body in the images is generated. The body model should describe the build, height, characteristic colors and other attributes of the human body as fully as possible. In a body model made up of a set of three-dimensional Gaussian functions, each Gaussian function is described by its mean and variance. To avoid the overfitting caused by a high number of degrees of freedom, L is used to describe the length of the human torso and R its width, and the mean and variance of each Gaussian function are then calculated from them, which greatly reduces the degrees of freedom and gives better generalization. The invention adjusts this model to generate an actor-specific body model, expressed mainly through the shape and color of each Gaussian.
For the 8 photographs input at different viewing angles, this step uses the Hourglass neural network to extract 16 human key points $(x_i, y_i)$ from each photograph. For the 8 views of the same key point, 8 back-projection rays are calculated based on the camera parameters. Using the least squares method, the common point $(x_i, y_i, z_i)$ with the shortest total distance to the 8 rays is found; this common point gives the coordinates of the key point in three-dimensional space. Linear interpolation of the three-dimensional coordinates $(x_i, y_i, z_i)$ yields a three-dimensional human body model containing the three-dimensional coordinates of 63 human body parts:
wherein $A(x)$ is the three-dimensional human body model, $A_j(x)$ is the three-dimensional model of the jth human body part, $\mu_j$ represents the coordinates of the jth human body part, $\sigma_j$ represents the radius of the jth human body part, and $x$ represents the position of any point of the human body.
The specific steps are as follows:
(1) K key points are obtained from each human body image by means of an Hourglass network structure.
(2) For each k = 1, 2, …, K, the back-projection ray of the kth key point in the human body image at each viewing angle is calculated based on the camera parameters, giving a plurality of back-projection rays corresponding to each key point.
(3) For each key point, the coordinates of the common point with the shortest total distance to the plurality of back-projection rays corresponding to that key point are determined and taken as the coordinates of the key point in three-dimensional space, giving the coordinates of every key point in three-dimensional space.
(4) A linear interpolation operation is performed on the coordinates of the key points in three-dimensional space to obtain a three-dimensional human body model containing the three-dimensional coordinates of each human body part of the human body, as shown in FIG. 4.
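The interpolation in the last step can be sketched as placing body parts at evenly spaced fractions along the segment between two connected key points. The bone list, the number of parts per bone and the function name are hypothetical; the source only states that 16 key points are interpolated into 63 body parts and does not give the skeleton layout.

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def interpolate_parts(keypoints: Dict[int, Vec3],
                      bones: Sequence[Tuple[int, int]],
                      parts_per_bone: int) -> List[Vec3]:
    """Place `parts_per_bone` part centers along each bone by linear interpolation.

    Centers sit at fractions (s + 0.5) / parts_per_bone of the way from the
    first key point of the bone to the second, so they are evenly spaced and
    never coincide with the joints themselves (an illustrative choice).
    """
    parts: List[Vec3] = []
    for start, end in bones:
        a, b = keypoints[start], keypoints[end]
        for s in range(parts_per_bone):
            t = (s + 0.5) / parts_per_bone
            parts.append((a[0] + t * (b[0] - a[0]),
                          a[1] + t * (b[1] - a[1]),
                          a[2] + t * (b[2] - a[2])))
    return parts


# Toy skeleton: a single vertical bone of length 4 split into two parts.
toy_keypoints = {0: (0.0, 0.0, 0.0), 1: (0.0, 0.0, 4.0)}
print(interpolate_parts(toy_keypoints, [(0, 1)], 2))
```

Each interpolated center would then be paired with a radius to form one Gaussian of the body model.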
Step 103: clustering adjacent pixels on each human body image to obtain a plurality of pixel blocks.
For the picture at each viewing angle, the human body region of interest is first obtained by projecting the optimized three-dimensional human body model of the previous frame. Extracting only the human body region removes the irrelevant background. Neighboring pixels of similar color are then clustered into color blocks (pixel blocks); the result is shown in FIG. 5. Each color block is approximated by a two-dimensional Gaussian function. During clustering, a specific threshold determines which pixels are clustered together. Representing each picture by an image model made up of a set of two-dimensional Gaussian functions, rather than matching similarity directly on the pixels of the picture, saves a large amount of computation and greatly improves the overall speed.
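The source does not specify the clustering algorithm beyond the use of a threshold. As one possible illustration, the sketch below uses simple region growing: 4-connected pixels are merged into a block while their color stays within a threshold of the block's seed color. The technique, the Euclidean color distance and all names are assumptions rather than the patent's method.

```python
from collections import deque
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]
Pixel = Tuple[int, int]  # (row, column)


def color_distance(a: Color, b: Color) -> float:
    """Euclidean distance between two colors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def cluster_pixels(image: Sequence[Sequence[Color]],
                   threshold: float) -> List[List[Pixel]]:
    """Group 4-connected pixels whose color is within `threshold` of the seed."""
    height, width = len(image), len(image[0])
    label = [[-1] * width for _ in range(height)]
    blocks: List[List[Pixel]] = []
    for row in range(height):
        for col in range(width):
            if label[row][col] != -1:
                continue  # already assigned to a block
            seed = image[row][col]
            index = len(blocks)
            label[row][col] = index
            members: List[Pixel] = []
            queue = deque([(row, col)])
            while queue:
                r, c = queue.popleft()
                members.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    inside = 0 <= nr < height and 0 <= nc < width
                    if (inside and label[nr][nc] == -1
                            and color_distance(image[nr][nc], seed) <= threshold):
                        label[nr][nc] = index
                        queue.append((nr, nc))
            blocks.append(members)
    return blocks


def block_center(members: Sequence[Pixel]) -> Tuple[float, float]:
    """Mean (row, column) of a block, usable as the Gaussian center."""
    n = len(members)
    return (sum(r for r, _ in members) / n, sum(c for _, c in members) / n)


red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
toy_image = [[red, red, blue, blue],
             [red, red, blue, blue]]
toy_blocks = cluster_pixels(toy_image, threshold=0.1)
print(len(toy_blocks), [block_center(b) for b in toy_blocks])
```

The block center and extent computed this way would supply the center and half length of the two-dimensional Gaussian that approximates each block.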
Step 104: establishing a two-dimensional pixel block model of each pixel block.
The two-dimensional pixel block model for each pixel block is:
wherein $B_i(x)$ represents the two-dimensional pixel block model of the ith pixel block, $c_i$ represents the color of the center of the ith pixel block, $\mu_i$ represents the coordinates of the center of the ith pixel block, $\delta_i$ represents the half length of the ith pixel block, and $x_i$ represents the projection of the position $x$ of any point of the human body onto the ith pixel block.
The entire image is divided into a plurality of pixel blocks, and thus the entire image can be expressed as:
$$\mathrm{Im}(x) = \sum_i c_i \, B_i(x)$$
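To make the image model concrete, the sketch below evaluates $\mathrm{Im}(x)$ at a pixel position as the color-weighted sum of block responses. The exact form of $B_i(x)$ is not reproduced in this text, so an unnormalized isotropic two-dimensional Gaussian with standard deviation $\delta_i$ is assumed here; the names are illustrative.

```python
import math
from typing import Sequence, Tuple

Color = Tuple[float, float, float]
Point2 = Tuple[float, float]
Block = Tuple[Color, Point2, float]  # (color c_i, center mu_i, half length delta_i)


def block_response(x: Point2, center: Point2, half_length: float) -> float:
    """Assumed B_i(x): Gaussian falloff around the block center."""
    d2 = (x[0] - center[0]) ** 2 + (x[1] - center[1]) ** 2
    return math.exp(-d2 / (2.0 * half_length ** 2))


def image_value(x: Point2, blocks: Sequence[Block]) -> Color:
    """Im(x) = sum_i c_i * B_i(x), evaluated per color channel."""
    value = [0.0, 0.0, 0.0]
    for color, center, half_length in blocks:
        weight = block_response(x, center, half_length)
        for channel in range(3):
            value[channel] += color[channel] * weight
    return (value[0], value[1], value[2])


toy_blocks = [((1.0, 0.5, 0.0), (10.0, 10.0), 2.0)]
print(image_value((10.0, 10.0), toy_blocks))  # at the block center
print(image_value((12.0, 10.0), toy_blocks))  # one half length away
```

Because each block is summarized by a color, a center and a half length, comparing the body model against a view only needs these few parameters instead of every pixel.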
Step 105: optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model.
The human body model is projected to each viewing angle:
wherein $\mu_{jx}$, $\mu_{jy}$ and $\mu_{jz}$ are the x-axis, y-axis and z-axis coordinates of the jth human body part, $\sigma_j$ represents the radius of the jth human body part, and $f_{il}$ represents the parameters of the camera that captured the ith pixel block. This formula projects the three-dimensional human body model, a set of three-dimensional Gaussian functions, into two dimensions, where its similarity to the two-dimensional pixel block model, a set of two-dimensional Gaussian functions, can be calculated. The formula computes an approximation of the true projection, which is elliptical, but the error introduced by the approximation is negligible. The similarity between the three-dimensional human body model and the image model can be expressed as:
wherein $d(c_i, c_j)$ is the closeness of two colors:
$w$ is a penalty for colors that differ too much, and $\varepsilon$ is a threshold for deciding whether two colors are close. In general, $\varepsilon = 0.1$ and $w = 0.05$ give good results. The RGB color space is closely tied to illumination intensity, so better results can be obtained with other color spaces, such as the Lab color space. The similarity $E_{ij}$ is differentiated, and moving along the gradient direction increases $E_{ij}$:
$$E_{ij}^{(k+1)} = E_{ij}^{(k)} + \rho_k \, s^{(k)}$$
wherein $s^{(k)}$ represents the gradient direction and $\rho_k$ represents the search step along the gradient direction. After a certain number of iterations, once $E_{ij}$ has settled to a near-constant value, the parameters of the human body model at that moment are recorded as the current human body posture. The step size is dynamic. An initial step size is determined from statistics gathered over a number of videos. During optimization, when the derivatives from two consecutive iterations have the same sign, the current pose has not yet reached the optimum, and the step size is multiplied by 1.1. When the derivatives from two consecutive iterations have different signs, the current pose has skipped past the optimum, and the step size is multiplied by 0.5 (it need not be changed to -0.5, because the sign of the derivative itself has changed and with it the direction of optimization). Because human body motion is continuous, the result of the next frame can be predicted from the results of the previous frames:
$\mathrm{pose}_{i+1} = t_1 \cdot \mathrm{pose}_{i} + t_2 \cdot \mathrm{pose}_{i-1} + t_3 \cdot \mathrm{pose}_{i-2}$
where pose is the pose result of each frame and t is the weight given to each frame's result in the prediction. The pose of the current frame is predicted from the historical poses and used as the initial pose of the current frame, in place of directly using the result of the previous frame. The initial pose is then closer to the picture model of the two-dimensional Gaussian function set of the current frame, which reduces the number of optimization iterations and speeds up the overall procedure.
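The weighted prediction above can be sketched in a few lines of Python. This is only an illustration: the function name `predict_pose` and the default weights (2, −1, 0), which amount to a constant-velocity extrapolation, are assumptions and are not values given in this description.

```python
from typing import List, Sequence


def predict_pose(history: Sequence[Sequence[float]],
                 weights: Sequence[float] = (2.0, -1.0, 0.0)) -> List[float]:
    """Predict pose_{i+1} = t1*pose_i + t2*pose_{i-1} + t3*pose_{i-2}.

    `history` holds pose parameter vectors ordered oldest to newest.
    `weights` are (t1, t2, t3); the defaults are hypothetical.
    """
    if len(history) < len(weights):
        raise ValueError("not enough previous frames for the prediction")
    # Newest frame first, so that weights[0] multiplies pose_i.
    recent = list(history)[::-1][:len(weights)]
    dim = len(recent[0])
    return [sum(t * pose[d] for t, pose in zip(weights, recent))
            for d in range(dim)]
```

The predicted vector would then serve as the starting point of the optimization for the current frame instead of the previous frame's result.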
Step 105, optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model, specifically includes: calculating the sum of the similarities between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value; judging whether the objective function value is larger than a preset threshold value to obtain a judging result; if the judging result is negative, optimizing the model of the jth human body part in the three-dimensional human body model by a gradient descent method, and returning to the step of calculating the sum of the similarities between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value; if the judging result is affirmative, increasing the value of j by 1 and returning to the step of calculating the sum of the similarities between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value, so as to optimize the model of the next human body part of the three-dimensional human body model, until the model of every human body part in the three-dimensional human body model has been optimized.
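The part-by-part loop of step 105, combined with the dynamic step size described earlier, can be sketched as follows. The sketch makes several simplifying assumptions that are not part of this description: each body part is reduced to a single scalar parameter, the objective and its derivative are supplied by the caller as the hypothetical callbacks `objective` and `gradient`, and the step moves uphill along the sign of the derivative because the similarity is being increased. The factors 1.1 and 0.5 follow the text.

```python
from typing import Callable, List


def optimize_parts(params: List[float],
                   objective: Callable[[int, float], float],
                   gradient: Callable[[int, float], float],
                   threshold: float,
                   step: float = 0.1,
                   max_iters: int = 200) -> List[float]:
    """Optimize one scalar parameter per body part, one part at a time."""
    result = list(params)
    for j in range(len(result)):
        rho = step        # search step for part j (initial value is hypothetical)
        prev_sign = 0     # sign of the derivative in the previous iteration
        for _ in range(max_iters):
            if objective(j, result[j]) > threshold:
                break     # objective exceeds the threshold: go on to part j + 1
            g = gradient(j, result[j])
            sign = (g > 0) - (g < 0)
            if sign == 0:
                break     # stationary point, nothing left to improve
            if prev_sign != 0:
                # Same sign: optimum not reached yet, enlarge the step.
                # Changed sign: optimum was overshot, shrink the step.
                rho *= 1.1 if sign == prev_sign else 0.5
            result[j] += rho * sign
            prev_sign = sign
    return result
```

A cap on the number of iterations (`max_iters`) is added here only so that the sketch always terminates; the description itself stops when the objective exceeds the preset threshold.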
The calculating the sum of the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model, as the objective function value, specifically includes: calculating the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model by adopting the following formula:
wherein $d(c_i, c_j)$ represents the similarity between the color $c_j$ of the model of the jth human body part and the color $c_i$ of the ith pixel block, $B_i(x)$ represents the two-dimensional pixel block model of the ith pixel block, $A_j(x)$ represents the model of the jth human body part of the three-dimensional human body model, x represents the position of any point of the human body, $\mu_i$ represents the coordinates of the center of the ith pixel block, and $\delta_i$ represents the half length of the ith pixel block; the two remaining quantities in the formula are the projection coordinates of the model of the jth human body part on the ith pixel block and the projection length of the radius of the jth human body part on the ith pixel block;
w is the penalty value of the color, ε is the color threshold, $\mu_{jx}$, $\mu_{jy}$, $\mu_{jz}$ are the x-axis, y-axis and z-axis coordinates of the jth human body part, respectively, $\sigma_j$ represents the radius of the jth human body part, and $f_{il}$ represents the parameters of the camera that obtained the ith pixel block.
And calculating the sum of the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model by using a summation formula.
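As a rough illustration of the projection and summation described above, the following Python sketch projects one body part into an image and sums its overlap with the pixel blocks. It rests on assumptions that are not stated in this description: a pinhole camera with a single focal length `focal`, body part coordinates already expressed in the camera frame, isotropic Gaussians of the form exp(−‖x − μ‖² / (2σ²)), and color closeness values supplied by the caller. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point2 = Tuple[float, float]


def project_part(mu: Sequence[float], sigma: float,
                 focal: float) -> Tuple[Point2, float]:
    """Pinhole projection: center and radius both scale by focal / depth."""
    x, y, z = mu
    if z <= 0:
        raise ValueError("body part must lie in front of the camera")
    scale = focal / z
    return (x * scale, y * scale), sigma * scale


def gaussian_overlap(mu_a: Point2, sigma_a: float,
                     mu_b: Point2, sigma_b: float) -> float:
    """Integral over the plane of the product of two isotropic 2-D Gaussians."""
    var = sigma_a ** 2 + sigma_b ** 2
    dist2 = (mu_a[0] - mu_b[0]) ** 2 + (mu_a[1] - mu_b[1]) ** 2
    return (2.0 * math.pi * sigma_a ** 2 * sigma_b ** 2 / var
            * math.exp(-dist2 / (2.0 * var)))


def objective_for_part(part_mu: Sequence[float], part_sigma: float,
                       focal: float,
                       blocks: Sequence[Tuple[Point2, float]],
                       color_closeness: Sequence[float]) -> float:
    """Sum over pixel blocks of color closeness times spatial overlap."""
    center, radius = project_part(part_mu, part_sigma, focal)
    return sum(d * gaussian_overlap(center, radius, block_mu, block_delta)
               for d, (block_mu, block_delta) in zip(color_closeness, blocks))
```

Each entry of `blocks` is a pair (center, half length) of one pixel block, and `color_closeness` holds the corresponding values of d for the part being evaluated.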
A color similarity is calculated for each pixel block in Fig. 5. The parameters of the three-dimensional human body model are then optimized along the gradient direction to increase the similarity. When the similarity is stable, the three-dimensional human body model is projected onto the picture, and the result is shown in Fig. 6. The current parameters of the three-dimensional human body model are the current human body posture.
Step 106, determining the human body posture of the current frame according to the optimized three-dimensional human body model.
In order to reduce the number of iterations in the optimization process, step 105, optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model, further includes: predicting the position coordinates of each human body part in the current frame according to the position coordinates of each human body part in each of the preset frames preceding the current frame; and pre-adjusting the three-dimensional human body model according to the position coordinates of each human body part in the current frame to obtain a pre-adjusted three-dimensional human body model.
The invention performs picture preprocessing on the CPU to obtain a mathematical representation of the picture. The picture and the human body model parameters are transferred to CUDA (Compute Unified Device Architecture, a parallel computing framework) for computation. The same number of pixel blocks is used for each picture. Each CUDA kernel calculates the similarity between a pixel block and a part of the human body, and the parameters of the human body model are fixed, so each calculation takes constant time.
The invention also provides a human body posture recognition system, which comprises:
the image acquisition module is used for acquiring a plurality of human body images under different visual angles of the current frame;
the three-dimensional human body model building module is used for building a three-dimensional human body model by adopting a convolutional neural network algorithm according to a plurality of human body images;
the pixel clustering module is used for clustering adjacent pixels on each human body image respectively to obtain a plurality of pixel blocks;
and the two-dimensional pixel block model building module is used for building a two-dimensional pixel block model of each pixel block.
The three-dimensional human body model building module specifically comprises: the key point acquisition sub-module, used for acquiring K key points from each human body image by adopting an hourglass network structure; the back projection operation sub-module, used for calculating, for k = 1, 2, …, K respectively, the back projection rays of the kth key point in the human body image at each view angle based on the parameters of the camera, to obtain a plurality of back projection rays corresponding to each key point; the key point coordinate determining sub-module, used for determining the coordinates of the common point with the shortest total distance to the plurality of back projection rays corresponding to each key point and taking them as the coordinates of that key point in three-dimensional space, to obtain the coordinates of each key point in three-dimensional space; and the three-dimensional human body model building sub-module, used for carrying out a linear interpolation operation according to the coordinates of each key point in three-dimensional space to obtain a three-dimensional human body model containing the three-dimensional coordinates of each human body part of the human body.
And the three-dimensional human body model optimizing module is used for optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model.
And the human body posture recognition module is used for determining the human body posture of the current frame according to the optimized three-dimensional human body model.
The recognition system further comprises: the position coordinate prediction module, used for predicting the position coordinates of each human body part in the current frame according to the position coordinates of each human body part in each of the preset frames preceding the current frame; and the three-dimensional human body model pre-adjustment module, used for pre-adjusting the three-dimensional human body model according to the position coordinates of each human body part in the current frame to obtain a pre-adjusted three-dimensional human body model.
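The common-point computation carried out by the key point coordinate determining sub-module can be illustrated with a small Python sketch. It is written under stated assumptions: each back projection ray is treated as an infinite line given by a camera center and a direction, and the point minimizing the sum of squared distances to the lines is used in place of the shortest total distance, because that variant has a closed-form solution. The function names are hypothetical.

```python
from typing import List, Sequence

Vec3 = Sequence[float]


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def closest_point_to_rays(origins: Sequence[Vec3],
                          directions: Sequence[Vec3]) -> List[float]:
    """Least-squares point nearest to a set of 3-D lines.

    Solves sum_n (I - u_n u_n^T) p = sum_n (I - u_n u_n^T) o_n,
    where o_n is a ray origin and u_n its unit direction.
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d in zip(origins, directions):
        norm = sum(c * c for c in d) ** 0.5
        u = [c / norm for c in d]
        for r in range(3):
            for c in range(3):
                m = (1.0 if r == c else 0.0) - u[r] * u[c]
                a[r][c] += m
                b[r] += m * o[c]
    det = _det3(a)
    if abs(det) < 1e-12:
        raise ValueError("rays are parallel, the point is not unique")
    point = []
    for k in range(3):
        # Cramer's rule: replace column k of the matrix with the right-hand side.
        ak = [[b[r] if c == k else a[r][c] for c in range(3)] for r in range(3)]
        point.append(_det3(ak) / det)
    return point
```

With at least two non-parallel rays per key point, the returned coordinates can be used as that key point's position in three-dimensional space.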
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention discloses a human body posture recognition method and system, wherein the recognition method comprises the following steps: acquiring a plurality of human body images under different visual angles of a current frame; according to a plurality of human body images, a three-dimensional human body model is established by adopting a convolutional neural network algorithm; clustering adjacent pixels on each human body image respectively to obtain a plurality of pixel blocks; establishing a two-dimensional pixel block model of each pixel block; optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model; and determining the human body posture of the current frame according to the optimized three-dimensional human body model. According to the invention, a three-dimensional human body model is built simply and rapidly by adopting a convolutional neural network algorithm, the three-dimensional human body model is then optimized by utilizing the human body posture in the image, and posture recognition is performed by utilizing the optimized three-dimensional human body model, so that the speed of human body posture estimation is improved while its accuracy is ensured.
The invention predicts the result of the current frame by utilizing the continuity of human body actions and the result of the previous frames, reduces the iteration times in the optimization process and further improves the speed of human body posture estimation.
In this specification, the embodiments are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be referred to one another.
The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to assist in understanding the method of the present invention and its core ideas; likewise, modifications made by those of ordinary skill in the art in light of the present teachings fall within the scope of the present invention. In view of the foregoing, this description should not be construed as limiting the invention.

Claims (8)

1. A human body posture recognition method, characterized in that the recognition method comprises the steps of:
acquiring a plurality of human body images under different visual angles of a current frame;
according to a plurality of human body images, a three-dimensional human body model is established by adopting a convolutional neural network algorithm;
clustering adjacent pixels on each human body image respectively to obtain a plurality of pixel blocks;
establishing a two-dimensional pixel block model of each pixel block;
optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model;
determining the human body posture of the current frame according to the optimized three-dimensional human body model;
the three-dimensional human body model is as follows:
wherein $A(x)$ is the three-dimensional human body model, $A_j(x)$ is the three-dimensional human body model of the jth human body part, $\mu_j$ represents the coordinates of the jth human body part, $\sigma_j$ represents the radius of the jth human body part, and x represents the position of any point of the human body;
the two-dimensional pixel block model is as follows:
wherein $B_i(x)$ represents the two-dimensional pixel block model of the ith pixel block, $c_i$ represents the color of the center of the ith pixel block, $\mu_i$ represents the coordinates of the center of the ith pixel block, $\delta_i$ represents the half length of the ith pixel block, and $x_i$ represents the projection of the position x of any point of the human body onto the ith pixel block.
2. The method according to claim 1, wherein the optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model further comprises:
predicting the position coordinates of each human body part in the current frame according to the position coordinates of each human body part in each of the preset frames preceding the current frame;
and pre-adjusting the three-dimensional human body model according to the position coordinates of each human body part in the current frame to obtain a pre-adjusted three-dimensional human body model.
3. The human body posture recognition method of claim 1, wherein the building a three-dimensional human body model according to a plurality of human body images by adopting a convolutional neural network algorithm specifically comprises:
obtaining K key points from each human body image by adopting an hourglass network structure;
for k = 1, 2, …, K respectively, calculating the back projection rays of the kth key point in the human body image at each view angle based on the parameters of the camera, to obtain a plurality of back projection rays corresponding to each key point;
determining the coordinates of the common point with the shortest total distance to the plurality of back projection rays corresponding to each key point, and taking them as the coordinates of that key point in three-dimensional space, to obtain the coordinates of each key point in three-dimensional space;
and carrying out linear interpolation operation according to the coordinates of each key point in the three-dimensional space to obtain a three-dimensional human body model containing the three-dimensional coordinates of each human body part of the human body.
4. The human body posture recognition method according to claim 1, wherein the optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model specifically comprises:
calculating the sum of the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value;
judging whether the objective function value is larger than a preset threshold value or not to obtain a judging result;
if the judging result is negative, optimizing the model of the jth human body part in the three-dimensional human body model by a gradient descent method, and returning to the step of calculating the sum of the similarities between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value;
if the judging result is affirmative, increasing the value of j by 1 and returning to the step of calculating the sum of the similarities between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value, so as to optimize the model of the next human body part of the three-dimensional human body model, until the model of every human body part in the three-dimensional human body model has been optimized.
5. The human body posture identifying method according to claim 4, characterized in that said calculating a sum of the similarity of the model of the jth human body part of the three-dimensional human body model and each of said two-dimensional pixel block models as an objective function value, specifically comprises:
calculating the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model by adopting the following formula:
wherein $E_{ij}$ represents the similarity between the model of the jth human body part of the three-dimensional human body model and the ith pixel block model, $d(c_i, c_j)$ represents the similarity between the color $c_j$ of the model of the jth human body part and the color $c_i$ of the ith pixel block, $B_i(x)$ represents the two-dimensional pixel block model of the ith pixel block, $A_j(x)$ represents the model of the jth human body part of the three-dimensional human body model, x represents the position of any point of the human body, $\mu_i$ represents the coordinates of the center of the ith pixel block, and $\delta_i$ represents the half length of the ith pixel block; the two remaining quantities in the formula are the projection coordinates of the model of the jth human body part on the ith pixel block and the projection length of the radius of the jth human body part on the ith pixel block;
wherein w is the penalty value of the color, ε is the color threshold, $\mu_{jx}$, $\mu_{jy}$, $\mu_{jz}$ are the x-axis, y-axis and z-axis coordinates of the jth human body part, respectively, $\sigma_j$ represents the radius of the jth human body part, and $f_{il}$ represents the parameters of the camera that obtained the ith pixel block;
and calculating the sum of the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model by using a summation formula.
6. A human body posture recognition system, the recognition system comprising:
the image acquisition module is used for acquiring a plurality of human body images under different visual angles of the current frame;
the three-dimensional human body model building module is used for building a three-dimensional human body model by adopting a convolutional neural network algorithm according to a plurality of human body images;
the pixel clustering module is used for clustering adjacent pixels on each human body image respectively to obtain a plurality of pixel blocks;
the two-dimensional pixel block model building module is used for building a two-dimensional pixel block model of each pixel block;
the three-dimensional human body model optimizing module is used for optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model;
the human body posture recognition module is used for determining the human body posture of the current frame according to the optimized three-dimensional human body model;
the three-dimensional human body model is as follows:
wherein $A(x)$ is the three-dimensional human body model, $A_j(x)$ is the three-dimensional human body model of the jth human body part, $\mu_j$ represents the coordinates of the jth human body part, $\sigma_j$ represents the radius of the jth human body part, and x represents the position of any point of the human body;
the two-dimensional pixel block model is as follows:
wherein $B_i(x)$ represents the two-dimensional pixel block model of the ith pixel block, $c_i$ represents the color of the center of the ith pixel block, $\mu_i$ represents the coordinates of the center of the ith pixel block, $\delta_i$ represents the half length of the ith pixel block, and $x_i$ represents the projection of the position x of any point of the human body onto the ith pixel block.
7. The human body posture recognition system of claim 6, wherein the recognition system further comprises:
the position coordinate prediction module is used for predicting the position coordinates of each human body part in the current frame according to the position coordinates of each human body part in each of the preset frames preceding the current frame;
the three-dimensional human body model pre-adjustment module is used for pre-adjusting the three-dimensional human body model according to the position coordinates of each human body part in the current frame to obtain a pre-adjusted three-dimensional human body model.
8. The human body posture recognition system of claim 6, wherein the three-dimensional human body model building module specifically comprises:
the key point acquisition sub-module is used for acquiring K key points from each human body image by adopting an hourglass network structure;
the back projection operation sub-module is used for calculating, for k = 1, 2, …, K respectively, the back projection rays of the kth key point in the human body image at each view angle based on the parameters of the camera, to obtain a plurality of back projection rays corresponding to each key point;
the key point coordinate determining sub-module is used for determining the coordinates of the common point with the shortest total distance to the plurality of back projection rays corresponding to each key point and taking them as the coordinates of that key point in three-dimensional space, to obtain the coordinates of each key point in three-dimensional space;
and the three-dimensional human body model building sub-module is used for carrying out linear interpolation operation according to the coordinates of each key point in the three-dimensional space to obtain a three-dimensional human body model containing the three-dimensional coordinates of each human body part of the human body.
CN202110411237.9A 2021-04-16 2021-04-16 Human body posture recognition method and system Active CN113065506B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110411237.9A CN113065506B (en) 2021-04-16 2021-04-16 Human body posture recognition method and system

Publications (2)

Publication Number Publication Date
CN113065506A CN113065506A (en) 2021-07-02
CN113065506B CN113065506B (en) 2023-12-26

Family

ID=76566830

Country Status (1)

Country Link
CN (1) CN113065506B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant