CN115393754A

CN115393754A - A 3D Human Pose Recognition Method Based on Angle Vector Calculation and Neural Network Improvement

Info

Publication number: CN115393754A
Application number: CN202210739379.2A
Authority: CN
Inventors: 邹嘉祥; 刘喆; 常欣雨; 田琦
Original assignee: Northwest University
Current assignee: Northwest University
Priority date: 2022-06-28
Filing date: 2022-06-28
Publication date: 2022-11-25

Abstract

The invention discloses a three-dimensional human pose recognition method based on angle vector calculation and neural network improvement. First, frames are extracted from an input original video file, and a two-dimensional human pose is extracted from each video frame by the OpenPose neural network. Second, the two-dimensional human pose is converted into a three-dimensional human pose using an angle vector calculation method, which greatly improves matching speed; at the same time, a three-dimensional human pose consistent with ergonomics is calculated from the two-dimensional pose, which improves recognition accuracy. Finally, the skeleton model is mapped onto the original video frames, and a new human skeleton model file is produced that can be used for secondary processing. The process is simple and fast, simplifies the steps of opera video fusion, and makes the fusion of opera videos faster, clearer, and more harmonious.

Description

A 3D Human Pose Recognition Method Based on Angle Vector Calculation and Neural Network Improvement

Technical Field

The invention belongs to the technical field of computer graphics processing, and in particular relates to a three-dimensional human pose recognition method based on angle vector calculation and neural network improvement.

Background Art

In 3D human pose recognition, the digital generation of virtual characters relies mainly on deep learning techniques such as pose estimation. The basic idea is to represent the structure and shape of an object with a geometric model or structure, establish a correspondence between the model and the image by extracting object features, and then estimate the object's spatial pose by geometric or other methods. The model may be a simple geometric body such as a plane or a cylinder, a more general geometric structure, or a three-dimensional model obtained by laser scanning or other means.

Learning-based methods generally use global observation features, which gives the algorithm good robustness. However, the pose estimation accuracy of this class of methods depends heavily on how thoroughly the model has been trained. To obtain an accurate correspondence between two-dimensional observations and three-dimensional poses, sufficiently dense samples must be collected to learn the decision rules and regression functions. In general, the number of samples required grows exponentially with the dimension of the state space, so for a high-dimensional state space it is in practice impossible to obtain the dense sampling needed for accurate estimation. The lack of dense sampling, and the resulting difficulty in guaranteeing the accuracy and continuity of the estimate, is therefore a fundamental difficulty that learning-based pose estimation methods cannot overcome. Object recognition is a key technology: if the face and the position of the body are not recognized correctly, or if the background is not recognized correctly, the human body may appear distorted. Object recognition is a basic research topic in computer vision. Its task is to identify which objects are present in an image and to report the position and orientation of each object in the scene the image represents.

At present, most existing 3D pose recognition methods produce good results only in specific environments or with complex neural network models. They place high demands on the source material, involve cumbersome processing, and take a long time, which makes real-time application difficult.

Summary of the Invention

To solve the problems in the prior art, the object of the invention is to provide a three-dimensional human pose recognition method based on angle vector calculation and neural network improvement. The process is simple and fast and improves recognition accuracy, and it makes the fusion of opera videos faster, clearer, and more harmonious.

To achieve the above object, the invention adopts the following technical solution:

A three-dimensional human pose recognition method based on angle vector calculation and neural network improvement, comprising the following steps:

Step 1: Input the original video file and decompose it into video frames;

Step 2: Input the video frames from step 1 into the OpenPose algorithm for two-dimensional pose extraction;

Step 3: Normalize the two-dimensional pose extracted in step 2;

Step 4: Restore the two-dimensional human pose normalized in step 3 to a three-dimensional human pose using the angle vector calculation formula;

Step 5: Obtain the three-dimensional coordinates of all joints by performing the above step 3 on each pair of connected joints of the human body; finally, connect all joints into a tree-structured human skeleton feature map;

Step 6: Map the human skeleton feature map obtained in step 5 back onto the original video file;

Step 7: Regenerate the three-dimensional human pose video from the skeleton feature map processed in step 6.
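The data flow of steps 2 through 6 can be sketched as a per-frame loop. This is only an illustrative skeleton: the function name `recognise_poses` and its five callable parameters are hypothetical placeholders for the stages named above, not an interface defined by the invention or by OpenPose. Frame decoding (step 1) and video encoding (step 7) are assumed to happen outside this function.

```python
def recognise_poses(frames, extract_2d, normalise_2d, lift_to_3d, overlay):
    """Run the per-frame stages in order and collect the results.

    All four stage arguments are hypothetical callables supplied by the caller:
      extract_2d(frame)      -> 2D joints (step 2, e.g. a wrapper around OpenPose)
      normalise_2d(pose_2d)  -> front-facing 2D joints (step 3)
      lift_to_3d(pose_2d)    -> 3D joints (steps 4 and 5)
      overlay(frame, pose)   -> frame with the skeleton mapped onto it (step 6)
    """
    annotated = []
    for frame in frames:
        pose_2d = extract_2d(frame)
        frontal_2d = normalise_2d(pose_2d)
        pose_3d = lift_to_3d(frontal_2d)
        annotated.append(overlay(frame, pose_3d))
    return annotated
```

Passing the stages in as arguments keeps the sketch independent of any particular pose estimator or video library.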

Further, in step 2 the video frames are normalized, and each frame is set to a resolution of 1920×1080 pixels.

Further, in step 2 the two-dimensional pose is normalized. The normalization matrix R is obtained as follows: take π as the normal vector of the plane in which the head joint lies; after rotation, the plane has e_z as its normal vector. The rotation vector is A = (a_x, a_y, a_z) and the rotation angle is θ. Because π is perpendicular to the two-dimensional plane formed by the other joints, (π_x, π_y, π_z) is obtained from the principle that the dot product of a normal vector with any vector in the plane is 0. Let π′ and e_z′ be the unit normal vectors. The normalization is calculated by the following formula:

The resulting R is the normalization matrix. Multiplying R by the original two-dimensional coordinate matrix gives the two-dimensional human pose corrected to face the screen;
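The formulas for R are not reproduced in this text. As a minimal sketch, assuming R is the standard axis-angle (Rodrigues) rotation that carries the unit normal π′ onto e_z′, with axis A = π′ × e_z′ and cos θ = π′ · e_z′, the matrix could be built as follows. The function names `unit`, `rotation_to_z`, and `apply` are illustrative only.

```python
import math

def unit(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / n for c in v)

def rotation_to_z(normal):
    """Return a 3x3 matrix (list of rows) rotating unit(normal) onto e_z = (0, 0, 1).

    Assumption: Rodrigues' formula R = cos(t) I + sin(t) [k]x + (1 - cos(t)) k k^T,
    where k is the unit rotation axis and t the rotation angle.
    """
    px, py, pz = unit(normal)
    # Axis A = p x e_z; its length equals sin(theta).
    ax, ay, az = py, -px, 0.0
    s = math.hypot(ax, ay)
    c = pz  # cos(theta) = p . e_z
    if s < 1e-12:
        # Already aligned: identity. Opposite: half turn about the x axis.
        return [[1.0, 0.0, 0.0],
                [0.0, 1.0 if c > 0 else -1.0, 0.0],
                [0.0, 0.0, 1.0 if c > 0 else -1.0]]
    kx, ky, kz = ax / s, ay / s, az / s
    t = 1.0 - c
    return [
        [c + kx * kx * t,      kx * ky * t - kz * s, kx * kz * t + ky * s],
        [ky * kx * t + kz * s, c + ky * ky * t,      ky * kz * t - kx * s],
        [kz * kx * t - ky * s, kz * ky * t + kx * s, c + kz * kz * t],
    ]

def apply(R, v):
    """Multiply the 3x3 matrix R by the 3-vector v."""
    return tuple(sum(r * x for r, x in zip(row, v)) for row in R)
```

Applying `rotation_to_z(normal)` to every joint coordinate then turns a tilted pose toward the viewing direction, which is the effect the normalization step describes.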

Further, the angle vector calculation formula used in step 4 is:

Compared with the prior art, the invention has the following technical effects:

The invention discloses a three-dimensional human pose recognition method based on angle vector calculation and neural network improvement. First, frames are extracted from the input original video file, and a two-dimensional human pose is extracted from each video frame by the OpenPose neural network. Second, the two-dimensional human pose is converted into a three-dimensional human pose using the angle vector calculation method, which greatly improves matching speed; at the same time, a three-dimensional human pose consistent with ergonomics is calculated from the two-dimensional pose, which improves recognition accuracy. Finally, the skeleton model is mapped onto the original video frames, and a new human skeleton model file is produced that can be used for secondary processing. The process is simple and fast, simplifies the steps of opera video fusion, and makes the fusion of opera videos faster, clearer, and more harmonious.

The two-dimensional pose is normalized so that a tilted two-dimensional human pose becomes one that faces the camera directly, which allows the angles between joints to be obtained correctly during conversion to the three-dimensional human pose.

The invention studies three-dimensional human pose recognition in real scenes and addresses problems encountered when reconstructing the model: temporal jitter of the model, occlusion by objects in the environment, unclear images of the person, and detection speed. To address temporal jitter and unclear images of the person, a two-stage pose recognition model is proposed. It uses a temporal convolutional neural network and the dependencies within the frame sequence, combining information from preceding and following frames to effectively eliminate temporal jitter and image blur. Reconstructing the three-dimensional human pose by angle vector calculation also helps with occluded joints and implausible generated poses.

Brief Description of the Drawings

Fig. 1 is a flow chart of the invention;

Fig. 2 is a schematic diagram of the normalization of the invention;

Fig. 3 is a diagram of the angle vector calculation of the invention;

Fig. 4 is a schematic diagram of the pose recognition of the invention.

Detailed Description

The invention is explained in further detail below with reference to an embodiment, which is not intended to limit the invention.

As shown in Figure 1, the invention provides a three-dimensional human pose recognition method based on angle vector calculation and neural network improvement, which specifically includes the following steps:

Step 3: Normalize the two-dimensional pose extracted in step 2, as shown in Figure 2;

The normalization matrix R is obtained as follows: take π as the normal vector of the plane in which the head joint lies; after rotation, the plane has e_z as its normal vector. The rotation vector is A = (a_x, a_y, a_z) and the rotation angle is θ. Because π is perpendicular to the two-dimensional plane formed by the other joints, (π_x, π_y, π_z) can be obtained from the principle that the dot product of a normal vector with any vector in the plane is 0. Let π′ and e′_z be the unit normal vectors; formulas (1) and (2) then follow:

From this, the rotation matrix R is obtained as:

After the frontal coordinates of the image are obtained, it is assumed that the limbs do not tilt backward. The lengths of the segments between all connected joints are then measured. Next, the left shoulder segment (joints 1 and 2) and the left arm segment (joints 2 and 3) are used as an example of three-dimensional pose recognition, as shown in Figure 3.

In the two-dimensional plane, the angle θ between segment (1, 2) and segment (2, 3) is known; it is obtained by calculating the angle between the vectors L₁₂ and L₂₃, as shown in Figure 3(a).
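The angle between the two segment vectors can be computed from their dot and cross products. The sketch below assumes L₁₂ points from joint 1 to joint 2 and L₂₃ from joint 2 to joint 3; the text does not state the direction convention, and the coordinates used are made-up values for illustration.

```python
import math

def segment(a, b):
    """2D vector pointing from joint a to joint b."""
    return (b[0] - a[0], b[1] - a[1])

def angle_between(u, v):
    """Unsigned angle in radians between 2D vectors u and v, in [0, pi]."""
    dot = u[0] * v[0] + u[1] * v[1]
    cross = u[0] * v[1] - u[1] * v[0]
    # atan2 stays accurate for nearly parallel vectors, where acos loses precision.
    return math.atan2(abs(cross), dot)

# Hypothetical joint positions (not taken from the patent).
joint1, joint2, joint3 = (0.0, 0.0), (1.0, 0.0), (1.0, -1.0)
theta = angle_between(segment(joint1, joint2), segment(joint2, joint3))
```

With these sample points the upper arm hangs straight down from a horizontal shoulder line, so `theta` is a right angle.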

When joint 3 leans forward, L₁₂ remains unchanged and the two-dimensional output obtained is actually L′₂₃, where 3′ denotes the projection of joint 3 after leaning forward. The lengths of L′₂₃ and L₂₃ can be obtained from the image distances in the two frames. From these, the angle θ′ between L′₂₃ and L₂₃ can be calculated, as shown in Figure 3(b); θ′ is given by formula (6):
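Formula (6) is not reproduced in this text. One plausible reading, offered only as an assumption, is the foreshortening relation under orthographic projection: a segment of true length |L₂₃| that leans out of the image plane by θ′ projects to length |L′₂₃| = |L₂₃| cos θ′. The sketch below uses that relation to recover θ′ and a depth offset for joint 3. The function names and the convention that positive z points toward the camera are illustrative choices.

```python
import math

def lean_angle(full_len, projected_len):
    """Angle in radians between a segment and its projection (assumed relation)."""
    if full_len <= 0.0:
        raise ValueError("full_len must be positive")
    # Clamp so measurement noise cannot push the ratio outside acos's domain.
    ratio = min(1.0, max(0.0, projected_len / full_len))
    return math.acos(ratio)

def lift_joint(parent_xyz, child_xy, full_len):
    """Return (child_xyz, theta) for a child joint seen at child_xy in the image.

    full_len is the segment length measured in the front-facing frame.
    Assumption: the limb leans only toward the camera (no backward tilt),
    so the depth offset is taken as non-negative.
    """
    projected = math.hypot(child_xy[0] - parent_xyz[0], child_xy[1] - parent_xyz[1])
    theta = lean_angle(full_len, projected)
    dz = full_len * math.sin(theta)
    return (child_xy[0], child_xy[1], parent_xyz[2] + dz), theta
```

For example, a segment of length 5 whose projection has length 3 yields a depth offset of 4, since the three lengths form a right triangle.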

Step 5: Performing the above operations on each pair of connected joints yields the three-dimensional coordinates of all joints. Finally, all joints are connected into a tree-structured human skeleton feature map;
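A tree-structured skeleton can be represented as joints that each hold their three-dimensional position and a list of child joints. The sketch below is illustrative: the `Joint` class, the joint names, and the choice of the neck as root are assumptions, not details given in the text.

```python
from dataclasses import dataclass, field

@dataclass
class Joint:
    name: str
    xyz: tuple
    children: list = field(default_factory=list)

def build_skeleton(positions, bones, root):
    """Link joints into a tree.

    positions: dict mapping joint name -> (x, y, z)
    bones:     list of (parent_name, child_name) pairs
    root:      name of the joint to use as the tree root
    """
    nodes = {name: Joint(name, xyz) for name, xyz in positions.items()}
    for parent, child in bones:
        nodes[parent].children.append(nodes[child])
    return nodes[root]

def walk(joint, depth=0):
    """Yield (depth, joint) pairs in depth-first order."""
    yield depth, joint
    for child in joint.children:
        yield from walk(child, depth + 1)

# Hypothetical joints and coordinates for a left arm.
positions = {
    "neck": (0.0, 0.0, 0.0),
    "left_shoulder": (1.0, 0.0, 0.0),
    "left_elbow": (1.0, -1.0, 0.5),
    "left_wrist": (1.0, -2.0, 1.0),
}
bones = [("neck", "left_shoulder"), ("left_shoulder", "left_elbow"), ("left_elbow", "left_wrist")]
skeleton = build_skeleton(positions, bones, "neck")
```

Walking the tree from the root visits each joint after its parent, which is the order needed when a child's depth is computed relative to its parent.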

Step 6: Map the skeleton feature map obtained in step 5 back onto the original video file;

Step 7: Regenerate the three-dimensional human pose video from the skeleton feature map processed in step 6, as shown in Figure 4.

Claims

1. A three-dimensional human body posture recognition method based on angular vector calculation and neural network improvement is characterized by comprising the following steps:

step 1: inputting an original video file and decomposing the original video file into video frames;

step 2: inputting the video frames from step 1 into the OpenPose algorithm for two-dimensional pose extraction;

step 3: normalizing the two-dimensional pose extracted in step 2;

step 4: restoring the two-dimensional human pose normalized in step 3 into a three-dimensional human pose through an angle vector calculation formula;

step 5: acquiring the three-dimensional coordinate positions of all joints by performing step 3 on each pair of connected joints of the human body; finally, connecting all joints into a human skeleton feature map with a tree structure;

step 6: mapping the human skeleton feature map obtained in step 5 back onto the original video file;

step 7: regenerating the three-dimensional human pose video from the skeleton feature map processed in step 6.

2. The method for three-dimensional human body posture recognition based on angular vector calculation and neural network improvement of claim 1, characterized in that: in the step 2, the video frames are normalized, and the pixel value of each frame is set to 1920 × 1080.

3. The method for three-dimensional human body posture recognition based on angular vector calculation and neural network improvement of claim 1, characterized in that: the two-dimensional posture is normalized in the step 2, and the normalization matrix R is obtained as follows: π is taken as the normal vector of the plane in which the head joint point lies, and after rotation the plane has e_z as its normal vector; the rotation vector is A = (a_x, a_y, a_z) and the rotation angle is θ; because π is perpendicular to the two-dimensional plane formed by the other joint points, (π_x, π_y, π_z) is obtained by using the principle that the dot product of a normal vector and any vector in the plane is 0; let π′ and e_z′ be unit normal vectors. The calculation method of the normalization process is as follows:

the finally obtained R is a normalization matrix, and the normalization matrix R is multiplied by the original two-dimensional coordinate matrix to obtain the two-dimensional human body posture which is corrected to be over against the screen;

4. The method for three-dimensional human body posture recognition based on angular vector calculation and neural network improvement of claim 1, characterized in that: the angle vector calculation formula used in the step 4 is as follows: