CN104715493B

CN104715493B - A method for pose estimation of a moving human body

Info

Publication number: CN104715493B
Application number: CN201510128533.2A
Authority: CN
Inventors: 孔德慧; 刘洪林; 王少帆; 尹宝才
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2015-03-23
Filing date: 2015-03-23
Publication date: 2018-01-19
Anticipated expiration: 2035-03-23
Also published as: CN104715493A

Abstract

The invention discloses a method for estimating the pose of a moving human body. The method accurately locates the skeleton points of the human body, effectively obtains more expressive features of the three-dimensional moving human body, and is simpler and more effective to construct. It comprises: (1) preprocessing the depth image data with median filtering, and labeling the body parts of the human body pixels with a Dijkstra algorithm based on geodesic distance; (2) applying a regional feature point extraction algorithm based on K-means clustering, with the number of clusters within each class set to 3, to extract 32 pose features that characterize different human poses; (3) in the training phase, obtaining skeleton point position annotations with the Poser Pro 2012 software, synthesizing the pose features of 300 frames of a virtual human annotated with standard skeleton points, and computing a linear regression model from the pose feature points and standard skeleton points of the training samples, which gives the mapping between pose features and standard skeleton points.

Description

A Method for Pose Estimation of a Moving Human Body

Technical field

The invention belongs to the technical field of computer vision and pattern recognition, and in particular relates to a method for pose estimation of a moving human body.

Background

Human pose estimation is an important research direction in computer vision. Over the past decade, automatically recognizing human poses in images and video sequences has been a research hotspot in the field. The main driver is the rapid development of electronic devices and the large application market they have created. Effectively processing and understanding human activity in data will have a profound impact on the development of society. The purpose of moving human behavior analysis is to describe, recognize, and understand human actions and the interactions between people and between people and their environment. It has broad applications in intelligent video surveillance, virtual reality, security, advanced human-computer interaction, and content-based image storage and retrieval.

Human pose can be represented in two or three dimensions. A two-dimensional human pose describes the distribution of the body joints in the two-dimensional image plane. Traditional pose estimation from two-dimensional images is strongly affected by environmental factors (clothing color, lighting) and by occlusion, and it lacks spatial position information for the pixels. Among current pose estimation methods for two-dimensional images, methods based on the graph structure model and its improvements are overwhelmingly dominant. The graph structure model uses a graphical model to represent the connections between parts. It divides the human body into several rigid parts (head, torso, a pair of upper arms, a pair of lower arms, a pair of thighs, a pair of calves, and so on), each located by a rectangular box, and connected parts are joined at joint points (see Figure 1a). The rectangular box for a body part can be represented as the vector

$$L = (x, y, r, s, w) \tag{1}$$

where (x, y) is the center of the rectangle, r is the angle by which the rectangle is rotated from the vertical, s is the length of the rectangle, and w is its width. The human tree model (see Figure 1b) can be represented as an undirected graph

$$G = (V, E) \tag{2}$$

where E is the set of all edges in the graph and each element of the vertex set V = {v_1, v_2, v_3, ..., v_n} corresponds to a rigid body part; if two body parts v_i and v_j are connected, there is an edge (v_i, v_j) ∈ E. Pose estimation based on graphical models requires designing feature representations, detecting parts, and handling complex topologies, so it is not a very efficient approach. For two-dimensional pose estimation, Alexander Toshev et al. proposed a DNN-based method in 2014 that formulates human pose estimation as a regression problem on joint points. Pose estimation based on deep convolutional neural networks has clear drawbacks: it works only on RGB images and does not use depth data, and the network structure it uses is very complex (see Figure 2), so training is inefficient.
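As a small illustration of formulas (1) and (2), the sketch below stores a part rectangle as the vector L = (x, y, r, s, w) and the body as an undirected tree. The class name `PartBox`, the part names, the edge list, and the sample values are illustrative assumptions and are not taken from the patent or its figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartBox:
    """Rectangle L = (x, y, r, s, w) of formula (1)."""
    x: float  # center x
    y: float  # center y
    r: float  # rotation from the vertical (unit assumed: radians)
    s: float  # length
    w: float  # width


# Vertex set V: one entry per rigid part (names are illustrative).
V = ["head", "torso",
     "l_upper_arm", "l_lower_arm", "r_upper_arm", "r_lower_arm",
     "l_thigh", "l_calf", "r_thigh", "r_calf"]

# Edge set E: unordered pairs of parts joined at a joint.
E = {frozenset(pair) for pair in [
    ("head", "torso"),
    ("torso", "l_upper_arm"), ("l_upper_arm", "l_lower_arm"),
    ("torso", "r_upper_arm"), ("r_upper_arm", "r_lower_arm"),
    ("torso", "l_thigh"), ("l_thigh", "l_calf"),
    ("torso", "r_thigh"), ("r_thigh", "r_calf"),
]}


def connected(v_i: str, v_j: str) -> bool:
    """True when (v_i, v_j) is an edge of the undirected graph G = (V, E)."""
    return frozenset((v_i, v_j)) in E


# A pose assigns one rectangle to each part; only the torso is shown here.
pose = {"torso": PartBox(x=320.0, y=240.0, r=0.0, s=120.0, w=60.0)}
```

Storing edges as `frozenset` pairs keeps the graph undirected, so `connected("torso", "head")` and `connected("head", "torso")` agree.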

At present, three-dimensional human pose estimation methods fall roughly into two categories. The first, represented by the method of Loren Arthur Schwarz et al., is based on geodesic distance and optical flow. Its drawback is that secondary skeleton points (elbows, knees, neck, shoulders, hips) are located by proportion, so results are not ideal for people of different body shapes; in addition, optical flow is computationally expensive, which makes it hard to use where real-time requirements are high. The second category is clustering-based, represented by the random-forest-based method proposed by Jamie Shotton. In this category every skeleton point is a regression over all pixels; the model is complex and needs a large number of training samples, and only extensive supervised training can determine the weight of each pixel for each skeleton point reasonably well.

Summary of the invention

The technical problem solved by the invention is to overcome the deficiencies of the prior art and provide a method for pose estimation of a moving human body that accurately locates the skeleton points of the human body, effectively obtains more expressive features of the three-dimensional moving human body, and is simpler and more effective to construct.

The technical solution of the invention is a method for pose estimation of a moving human body, comprising the following steps:

(1) preprocess the depth image data with median filtering, and label the body parts of the human body pixels with a Dijkstra algorithm based on geodesic distance;

(2) apply a regional feature point extraction algorithm based on K-means clustering, with the number of clusters within each class set to 3, and extract 32 pose features to characterize different human poses;

(3) in the training phase, obtain skeleton point position annotations with the Poser Pro 2012 software, synthesize the pose features of 300 frames of a virtual human annotated with standard skeleton points, and compute a linear regression model between pose features and skeleton points from the pose feature points and standard skeleton points of the training samples, so as to obtain the mapping between pose features and standard skeleton points.
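Step (1) starts with median filtering of the depth image. The following is a minimal sketch of that preprocessing step only, assuming a 3×3 window and untouched border pixels; the patent text does not state the window size or border handling, and the function name `median_filter_3x3` is hypothetical.

```python
from statistics import median


def median_filter_3x3(depth):
    """Return a filtered copy of a 2-D depth image given as a list of rows.

    Each interior pixel is replaced by the median of its 3x3 neighbourhood.
    Border pixels are copied unchanged (an assumption of this sketch).
    """
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            window = [depth[i + di][j + dj]
                      for di in (-1, 0, 1)
                      for dj in (-1, 0, 1)]
            out[i][j] = median(window)
    return out
```

A median filter suits depth sensors because isolated outliers (dropped or spiking depth readings) are discarded rather than averaged into their neighbours.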

Using the single-source Dijkstra algorithm based on geodesic distance, the invention locates the primary skeleton points of the human body (the limbs and head) very accurately. To locate the secondary skeleton points (neck, elbows, knees, hips) more conveniently and effectively, a clustering-based feature extraction scheme is proposed. By seeking an efficient mapping between features and skeleton points, this feature selection gives the method higher processing efficiency and lets it locate the secondary skeleton points accurately. Starting from clustering, the method effectively obtains more expressive features of the three-dimensional moving human body and is simpler and more effective to construct.

Description of drawings

Figure 1a shows the human body structure, and Figure 1b shows the graph structure model.

Figure 2 is a schematic diagram of the DNN model structure.

Detailed description

The method for pose estimation of a moving human body comprises the following steps:

Preferably, the preprocessing in step (1) includes smoothing, denoising, alignment, and background removal to obtain the human body pixels in the depth image. The body parts of the human body pixels are then labeled with the Dijkstra algorithm based on geodesic distance: using a graph-structured representation of the pixels, the single-source Dijkstra algorithm based on geodesic distance is run with the geometric center of the human body pixels as the source, and the extremity positions of the left and right hands, the left and right feet, and the head are marked. When the adjacency matrix of the single-source Dijkstra algorithm is initialized, an edge is constructed between two pixels if they satisfy formula (3):

$$(x_{ij}, x_{kt}) \in V_t \times V_t, \quad \|x_{ij} - x_{kt}\|_2 < \delta \,\wedge\, |i-k| \le 1 \,\wedge\, |j-t| \le 1 \tag{3}$$

where x_ij is the depth value of the pixel at position (i, j) in the image, V_t is the set of human body pixels, and δ is an empirical threshold for connected pixels. The five extremity points of the human body are found with the single-source Dijkstra algorithm. A nearest-neighbor classifier based on geodesic distance, with the extremity points and the geometric center as class centers, then partitions the body into regions. There are six pixel classes ω̄_i, i = 1, 2, ..., 6, representing the five extremity parts and the torso, obtained according to formulas (4) and (9):

$$g_i(x_{kt}) = \min_{k} \|x_{kt} - x_i^c\|_{geodesic}, \quad i = 1, \ldots, 6 \tag{4}$$

$$\bar{\omega}_j = \{\, x_{kt} : g_j(x_{kt}) = \min_{1 \le i \le 6} g_i(x_{kt}) \,\} \tag{9}$$

where the subscript i of x_i^c denotes the class ω̄_i, c denotes the geometric center pixel of that class, and

||x - y||_geodesic denotes the geodesic distance between pixels.
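The graph construction of formula (3) and the nearest-center labeling of formulas (4) and (9) can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: the edge weight (a 3-D Euclidean step that mixes pixel offsets and depth difference) is assumed because the text does not define it, the body mask and the class centers are taken as given, and all function names are hypothetical.

```python
import heapq
import math


def neighbours(depth, mask, i, j, delta):
    """Yield (k, t, cost) for 8-connected body pixels that satisfy formula (3)."""
    h, w = len(depth), len(depth[0])
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            k, t = i + di, j + dj
            if (di or dj) and 0 <= k < h and 0 <= t < w and mask[k][t]:
                dz = depth[i][j] - depth[k][t]
                if abs(dz) < delta:
                    # Assumed edge weight: Euclidean length of the 3-D step.
                    yield k, t, math.sqrt(di * di + dj * dj + dz * dz)


def geodesic_distances(depth, mask, source, delta):
    """Single-source Dijkstra over the pixel graph; returns {pixel: distance}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, (i, j) = heapq.heappop(heap)
        if d > dist[(i, j)]:
            continue  # stale heap entry
        for k, t, cost in neighbours(depth, mask, i, j, delta):
            nd = d + cost
            if nd < dist.get((k, t), math.inf):
                dist[(k, t)] = nd
                heapq.heappush(heap, (nd, (k, t)))
    return dist


def label_regions(depth, mask, centers, delta):
    """Assign each reachable pixel to the geodesically nearest class center."""
    fields = [geodesic_distances(depth, mask, c, delta) for c in centers]
    pixels = set().union(*fields)
    return {
        p: min(range(len(centers)), key=lambda i: fields[i].get(p, math.inf))
        for p in pixels
    }
```

With the body's geometric center as `source`, pixels with large values in the returned distance map are natural candidates for extremity points; how the five extremities are actually selected from that map is not specified in the text, so it is left out of the sketch.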

Preferably, the feature point extraction in step (2) is a K-means clustering algorithm applied per body region:

Pose feature extraction: using the K-means clustering algorithm based on body part regions, within each of the five body parts ω̄_j, j = 1, ..., 5, obtained above, K-means clustering with three clusters is applied to extract the cluster center positions as features.

Feature dimension: within the five extremity parts ω̄_j, j = 1, ..., 5, K-means clustering with three clusters yields 15 pose feature points. To obtain a global description of the pose, the geometric center of the human body pixels is added as a global feature, giving 16 pose feature points. Each feature point contributes its two-dimensional coordinates, so the feature dimension is 32.
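The count 5 × 3 + 1 = 16 feature points, and hence 32 coordinates, can be checked with a short sketch. It uses plain Lloyd's K-means on 2-D pixel coordinates; the random initialization, the fixed iteration count, and the sorting of cluster centers to get a repeatable feature order are assumptions of this sketch, since the text does not specify them. The function names are hypothetical.

```python
import random


def kmeans(points, k=3, iters=20, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2
                + (p[1] - centers[c][1]) ** 2,
            )
            groups[nearest].append(p)
        # Recompute each center; keep the old one if its cluster is empty.
        centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
            if g else centers[c]
            for c, g in enumerate(groups)
        ]
    return centers


def pose_feature(regions, body_center):
    """Build the 32-dimensional feature vector.

    regions: five lists of (row, col) pixels, one per extremity part.
    body_center: (row, col) geometric center of all body pixels.
    """
    feature = []
    for pixels in regions:
        for cx, cy in sorted(kmeans(pixels, k=3)):  # ordering is assumed
            feature += [cx, cy]
    feature += list(body_center)
    return feature
```

For five regions the result has 15 cluster centers plus the body center, that is 16 points and 32 values.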

Preferably, in step (3) a sparse linear projection matrix B is obtained according to formulas (5)-(7), such that the error of predicting Y from X is small:

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \cdots & \cdots & \cdots & \cdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{pmatrix} \tag{5}$$

$$Y = \begin{pmatrix} y_{11} & y_{12} & \cdots & y_{1t} \\ y_{21} & y_{22} & \cdots & y_{2t} \\ \cdots & \cdots & \cdots & \cdots \\ y_{n1} & y_{n2} & \cdots & y_{nt} \end{pmatrix} \tag{6}$$

where X = {x_1, ..., x_n} is the feature point sample set of the n samples obtained by clustering, and each sample x_i = {x_i1, x_i2, ..., x_im}, i ∈ {1, 2, 3, ..., n}, with m = 32 the number of extracted features.

Y = {y_1, ..., y_n} is the set of n skeleton point labels corresponding to the training samples, and each sample y_i = {y_i1, y_i2, ..., y_it}, i ∈ {1, 2, 3, ..., n}, where t is the number of standard skeleton point coordinates of the human pose.
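To make the idea of a sparse linear projection from X to Y concrete, here is a minimal sketch that assumes a generic L1-regularized least-squares objective, (1/2n)·||XB − Y||² + λ·||B||₁, solved by proximal gradient descent (ISTA). The patent's own objective, formula (7), is not reproduced in this text, so the objective, the parameters `lam`, `step`, and `iters`, and all function names are assumptions.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def soft_threshold(v, thr):
    """Proximal operator of the L1 norm: shrink v toward zero by thr."""
    if v > thr:
        return v - thr
    if v < -thr:
        return v + thr
    return 0.0


def fit_sparse_projection(X, Y, lam=0.01, step=0.1, iters=2000):
    """ISTA for min_B 1/(2n) * ||X B - Y||_F^2 + lam * ||B||_1.

    X is n x m, Y is n x t, and the returned B is m x t.
    step must stay below 1 / (largest eigenvalue of X^T X / n), so
    features on a pixel scale need normalizing or a much smaller step.
    """
    n, m, t = len(X), len(X[0]), len(Y[0])
    B = [[0.0] * t for _ in range(m)]
    Xt = transpose(X)
    for _ in range(iters):
        residual = [[p - y for p, y in zip(pr, yr)]
                    for pr, yr in zip(matmul(X, B), Y)]
        G = matmul(Xt, residual)  # n times the gradient of the smooth term
        B = [[soft_threshold(B[a][b] - step * G[a][b] / n, step * lam)
              for b in range(t)]
             for a in range(m)]
    return B
```

At prediction time a new feature row x would be mapped to skeleton coordinates by `matmul([x], B)`. The L1 penalty drives entries of B that contribute little to exactly zero, which is what makes the projection sparse.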

To ensure rotation invariance of the subspace distance metric, a rotation matrix R is added to the model, and the extraction of human skeleton points is formulated as a rotated sparse regression problem. Preferably, in step (3) the extraction of human skeleton points is formulated as a rotated sparse regression problem according to formula (8) and solved.

The invention was applied to depth image data captured by Kinect2 and to virtual data exported from Poser Pro 2012, with clear results. The experiments used 640×480 RGBD images captured indoors under fluorescent lighting.

High accuracy was achieved on the Poser Pro 2012 synthetic test set. On 100 frames of depth image data, the root mean square error (RMS) and the maximum error (Max) were computed, both in pixels: RMS = 10.1305, and the maximum joint Euclidean distance error is 37.3363, which indicates that the synthetic data is realistic and accurate. Compared with the random forest method proposed by Jamie Shotton et al., the invention improves on both the RMS and Max criteria.
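The two reported metrics can be computed as in the sketch below, which assumes that RMS is taken over the per-joint Euclidean distances of all joints in all test frames. The text does not spell out the exact averaging, so this definition and the function name `joint_errors` are assumptions.

```python
import math


def joint_errors(pred, truth):
    """Return (rms, max) of the per-joint Euclidean distance in pixels.

    pred, truth: lists of frames, each frame a list of (x, y) joint positions.
    """
    dists = [math.dist(p, q)
             for frame_p, frame_t in zip(pred, truth)
             for p, q in zip(frame_p, frame_t)]
    rms = math.sqrt(sum(d * d for d in dists) / len(dists))
    return rms, max(dists)
```

Reporting the maximum alongside the RMS shows the worst single joint, which an average alone would hide.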

On real depth image test data captured by Kinect2, the method also achieved good results.

The above are only preferred embodiments of the invention and do not limit the invention in any form. Any simple modification, equivalent change, or refinement made to the above embodiments according to the technical essence of the invention still falls within the scope of protection of the technical solution of the invention.

Claims

1. A method for pose estimation of a moving human body, characterized by comprising the following steps:

(1) preprocessing depth image data with median filtering, and labeling the body parts of the human body pixels with a Dijkstra algorithm based on geodesic distance;

(2) applying a regional feature point extraction algorithm based on K-means clustering, with the number of clusters within each class set to 3, and extracting 32-dimensional pose features to characterize different human poses;

(3) in the training phase, obtaining skeleton point position annotations with the Poser Pro 2012 software, synthesizing the pose features of 300 frames of a virtual human annotated with standard skeleton points, and computing a linear regression model between pose feature points and skeleton points from the pose feature points and standard skeleton points of the training samples, so as to obtain the mapping between pose features and standard skeleton points;

wherein the preprocessing in step (1) includes smoothing, denoising, alignment, and background removal to obtain the human body pixels in the depth image, and the body parts of the human body pixels are labeled with the Dijkstra algorithm based on geodesic distance: using a graph-structured representation of the pixels, the single-source Dijkstra algorithm based on geodesic distance is run with the geometric center of the human body pixels as the source, and the extremity positions of the left and right hands, the left and right feet, and the head are marked; when the adjacency matrix of the single-source Dijkstra algorithm is initialized, an edge is constructed between two pixels if they satisfy formula (3):

$$(x_{ij}, x_{kt}) \in V_t \times V_t, \quad \|x_{ij} - x_{kt}\|_2 < \delta \,\wedge\, |i-k| \le 1 \,\wedge\, |j-t| \le 1 \tag{3}$$

where x_ij is the depth value of the pixel at position (i, j) in the image, V_t is the set of human body pixels, and δ is an empirical threshold for connected pixels; the five extremity points of the human body are found with the single-source Dijkstra algorithm, and a nearest-neighbor classifier based on geodesic distance, with the extremity points and the geometric center as class centers, partitions the body into regions; there are six pixel classes ω̄_i, i = 1, 2, ..., 6, representing the five extremity parts and the torso, obtained according to formulas (4) and (9):

$$g_i(x_{kt}) = \min_{k} \|x_{kt} - x_i^c\|_{geodesic}, \quad i = 1, \ldots, 6 \tag{4}$$

$$\bar{\omega}_j = \{\, x_{kt} : g_j(x_{kt}) = \min_{1 \le i \le 6} g_i(x_{kt}) \,\} \tag{9}$$

where the subscript i of x_i^c denotes the class ω̄_i, c denotes the geometric center pixel of that class, and

||x - y||_geodesic denotes the geodesic distance between pixels.
2. the method for movement human Attitude estimation according to claim 1, it is characterised in that：Spy in the step (2) Sign point extraction is the K- means clustering algorithms based on human region：

Posture feature extracts：Using the K- means clustering algorithms based on human body region, in five obtained human bodies In j=1 ..., 5, cluster number is used to calculate cluster centre position to this five human bodies for three K- means clustering algorithms Put,

Characteristic dimension：At five acra point positionsIn j=1 ..., 5, cluster number is used to be calculated for three K- mean clusters Method, obtain 15 human body posture feature points；In order to obtain the global description of human body attitude, people is added in human body attitude characteristic point The geometric center point of volumetric pixel is 16 as global description's feature, human body attitude characteristic point, and characteristic point includes its two-dimensional coordinate value, Characteristic dimension is 32.
3. the method for movement human Attitude estimation according to claim 2, it is characterised in that：Basis in the step (3) Formula (5)-(7) obtain a sparse linear projection matrix B, to minimize the prediction for mapping to obtain by projection matrix B by X The error of characteristic point position and fact characteristic point position Y is object function

<mrow> <mi>X</mi> <mo>=</mo> <mfenced open = "(" close = ")"> <mtable> <mtr> <mtd> <msub> <mi>x</mi> <mn>11</mn> </msub> </mtd> <mtd> <msub> <mi>x</mi> <mn>12</mn> </msub> </mtd> <mtd> <mo>...</mo> </mtd> <mtd> <msub> <mi>x</mi> <mrow> <mn>1</mn> <mi>m</mi> </mrow> </msub> </mtd> </mtr> <mtr> <mtd> <msub> <mi>x</mi> <mn>21</mn> </msub> </mtd> <mtd> <msub> <mi>x</mi> <mn>22</mn> </msub> </mtd> <mtd> <mo>...</mo> </mtd> <mtd> <msub> <mi>x</mi> <mrow> <mn>2</mn> <mi>m</mi> </mrow> </msub> </mtd> </mtr> <mtr> <mtd> <mo>...</mo> </mtd> <mtd> <mo>...</mo> </mtd> <mtd> <mo>...</mo> </mtd> <mtd> <mo>...</mo> </mtd> </mtr> <mtr> <mtd> <msub> <mi>x</mi> <mrow> <mi>n</mi> <mn>1</mn> </mrow> </msub> </mtd> <mtd> <msub> <mi>x</mi> <mrow> <mi>n</mi> <mn>2</mn> </mrow> </msub> </mtd> <mtd> <mo>...</mo> </mtd> <mtd> <msub> <mi>x</mi> <mrow> <mi>n</mi> <mi>m</mi> </mrow> </msub> </mtd> </mtr> </mtable> </mfenced> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>5</mn> <mo>)</mo> </mrow> </mrow>

<mrow> <mi>Y</mi> <mo>=</mo> <mfenced open = "(" close = ")"> <mtable> <mtr> <mtd> <msub> <mi>y</mi> <mn>11</mn> </msub> </mtd> <mtd> <msub> <mi>y</mi> <mn>12</mn> </msub> </mtd> <mtd> <mo>...</mo> </mtd> <mtd> <msub> <mi>y</mi> <mrow> <mn>1</mn> <mi>t</mi> </mrow> </msub> </mtd> </mtr> <mtr> <mtd> <msub> <mi>y</mi> <mn>21</mn> </msub> </mtd> <mtd> <msub> <mi>y</mi> <mn>22</mn> </msub> </mtd> <mtd> <mo>...</mo> </mtd> <mtd> <msub> <mi>y</mi> <mrow> <mn>2</mn> <mi>t</mi> </mrow> </msub> </mtd> </mtr> <mtr> <mtd> <mo>...</mo> </mtd> <mtd> <mo>...</mo> </mtd> <mtd> <mo>...</mo> </mtd> <mtd> <mo>...</mo> </mtd> </mtr> <mtr> <mtd> <msub> <mi>y</mi> <mrow> <mi>n</mi> <mn>1</mn> </mrow> </msub> </mtd> <mtd> <msub> <mi>y</mi> <mrow> <mi>n</mi> <mn>2</mn> </mrow> </msub> </mtd> <mtd> <mo>...</mo> </mtd> <mtd> <msub> <mi>y</mi> <mrow> <mi>n</mi> <mi>t</mi> </mrow> </msub> </mtd> </mtr> </mtable> </mfenced> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>6</mn> <mo>)</mo> </mrow> </mrow>

Wherein X={ x₁..., x_nRepresent to cluster the characteristic point sample set of n obtained sample, each sample x_i={ x_i1, x_i2...x_im, wherein i ∈ { 1,2,3..., n }, m=32 be extraction Characteristic Number, Y={ y₁..., y_nRepresent training sample N sample of corresponding skeletal point label composition, each sample y_i={ y_i1, y_i2...y_it, wherein i ∈ 1,2,3..., N }, t is the number of the standard skeleton point coordinates of human body attitude.
4. the method for movement human Attitude estimation according to claim 3, it is characterised in that：Basis in the step (3) Formula (8) turns to the solution of rotation sparse regression problem by human skeleton point form is extracted,