WO2022241583A1 - 一种基于多目视频的家庭场景动作捕捉方法 - Google Patents
一种基于多目视频的家庭场景动作捕捉方法 (A family-scene motion capture method based on multi-view video)
- Publication number
- WO2022241583A1 WO2022241583A1 PCT/CN2021/093969 CN2021093969W WO2022241583A1 WO 2022241583 A1 WO2022241583 A1 WO 2022241583A1 CN 2021093969 W CN2021093969 W CN 2021093969W WO 2022241583 A1 WO2022241583 A1 WO 2022241583A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- motion
- key points
- key point
- key
- human
- Prior art date
Links
- 230000033001 locomotion Effects 0.000 title claims abstract description 73
- 238000000034 method Methods 0.000 title claims abstract description 31
- 238000001514 detection method Methods 0.000 claims abstract description 5
- 238000002372 labelling Methods 0.000 claims abstract description 5
- 210000000988 bone and bone Anatomy 0.000 claims description 51
- 230000003287 optical effect Effects 0.000 claims description 10
- 238000010606 normalization Methods 0.000 claims description 6
- 230000001186 cumulative effect Effects 0.000 claims description 3
- 230000001815 facial effect Effects 0.000 claims description 3
- 238000009499 grossing Methods 0.000 claims description 3
- 230000007774 longterm Effects 0.000 claims description 3
- 239000011159 matrix material Substances 0.000 claims description 3
- 239000000203 mixture Substances 0.000 claims description 3
- 230000003068 static effect Effects 0.000 claims description 3
- 238000010276 construction Methods 0.000 abstract 2
- 230000003993 interaction Effects 0.000 abstract 1
- 238000002360 preparation method Methods 0.000 abstract 1
- 230000002123 temporal effect Effects 0.000 abstract 1
- 238000005516 engineering process Methods 0.000 description 10
- 238000013527 convolutional neural network Methods 0.000 description 2
- 238000007654 immersion Methods 0.000 description 2
- 239000003550 marker Substances 0.000 description 2
- 230000001629 suppression Effects 0.000 description 2
- 230000032683 aging Effects 0.000 description 1
- 230000002708 enhancing effect Effects 0.000 description 1
Images
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
Definitions
- the invention belongs to the technical field of motion capture, and in particular relates to a multi-view video-based family scene motion capture method.
- the family scene motion capture technology based on multi-view video involved in the present invention can capture motion information of family members in real time and generate three-dimensional virtual character animations, thereby protecting user privacy, providing viewers with multiple viewing angles, and enhancing the sense of immersion.
- Human body motion capture technology is widely used in film and television, games, animation and other fields. This technology captures the action characteristics of the real human body, drives the virtual character model, and generates 3D animation.
- Optical human motion capture technology can be divided into marker-based human motion capture technology and video-based human motion capture technology.
- marker-based motion capture requires the human body to wear dedicated sensors or markers that reflect infrared light so that key point information can be collected. However, the equipment required by such methods is expensive and is not suitable for motion capture in daily life.
- the video-based human body motion capture technology does not need to wear equipment, and can calculate the spatial position of the key points of the human body based on the image sequence captured by multiple calibration cameras, and restore the human body posture.
- compared with monocular video, motion capture based on multi-view video is more robust to depth ambiguity and occlusion, and better matches the technical requirements of this patent.
- the SMPL model (Skinned Multi-Person Linear model) is a parametric model of the human body that contains a large number of human body priors.
- the SMPL model defines human body shape and pose through 10 shape parameters and 72 pose parameters.
- using the SMPL model, an objective function of the distance between the pose features extracted from the video and the features of the parametric human model can be established, and the motion capture problem can be transformed into an objective-function minimization problem.
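- As an illustrative sketch only (not part of the original disclosure), the 10 shape and 72 pose parameters can be exercised with the third-party smplx Python package; the package choice and the local model path are assumptions:
```python
# Illustrative sketch: assumes the third-party "smplx" package and a locally
# downloaded SMPL model file; neither is named in the patent text.
import torch
import smplx

model = smplx.create("models/", model_type="smpl", gender="neutral")  # hypothetical path

betas = torch.zeros(1, 10)         # 10 shape parameters (beta)
body_pose = torch.zeros(1, 69)     # 23 joints x 3 axis-angle values
global_orient = torch.zeros(1, 3)  # root orientation (remaining 3 of the 72 pose values)

output = model(betas=betas, body_pose=body_pose, global_orient=global_orient)
joints_3d = output.joints          # 3D joints used when fitting the model to 2D detections
vertices = output.vertices         # posed mesh vertices
```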
- the present invention provides a family scene motion capture method based on multi-view video, which aims to generate real-time animation of the family scene by using the motion capture technology, and has robustness in occlusion situations.
- the present invention comprises following main steps:
- Step 1 camera placement, place multiple calibration cameras in the home to be detected, and obtain multi-angle videos of the home in real time.
- Step 2 building and labeling the family scene model, creating a 3D virtual scene model based on the real family scene, and making necessary labels on the 3D virtual scene.
- This step includes:
- Step 2.1 perform 3D modeling of the family scene to be detected.
- step 2.2 mark functional areas such as common walking passages and sitting areas in the 3D scene. And in the fixed functional areas such as sofas, tables and chairs, the face orientation of the characters when performing routine actions is defined, which is used to assist the generation of common behavior animations of the characters.
- Step 2.3 Establish a family member action database, and pre-create family member models, guest standard appearance models, and common action animations, such as walking, standing, and sitting, based on the parametric human body model SMPL.
- Step 3 human body 2D key point detection, detects the human 2D key point coordinates and PAFs (Part Affinity Fields) in the multi-view video.
- This step includes:
- Step 3.1 input the current frame of each view into the OpenPose convolutional neural network to obtain the confidence map set S = (S_1,...,S_J) and the PAF set L = (L_1,...,L_C), where J represents the number of key points in a single human skeleton, C represents the number of bones in a single human skeleton, and L_c denotes the PAF of bones of class c, c ∈ {1,...,C}.
- Step 3.2 use the non-maximum suppression algorithm to find, in S_j, the set of heat-map peaks of all j-th key points; each peak corresponds to the heat map of the j-th key point of the m-th person in the scene, where M is the number of people in the scene and m ∈ {1,...,M}.
- Step 3.3 compute the coordinates of the maximum point of each such heat map; these are the 2D coordinates of the j-th key point of the m-th person in the scene.
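- As an illustrative sketch of steps 3.2-3.3 (not part of the original disclosure), peak extraction from one confidence map can be written with standard NumPy/SciPy routines; the threshold and window size below are assumed values:
```python
import numpy as np
from scipy.ndimage import maximum_filter

def heatmap_peaks(S_j, thresh=0.1, window=5):
    """Return (x, y, score) peaks of one key-point confidence map S_j.

    Non-maximum suppression keeps only local maxima above `thresh`; each
    surviving peak is taken as the 2D coordinate of the j-th key point of
    one person in the scene.
    """
    local_max = maximum_filter(S_j, size=window) == S_j
    peaks = np.argwhere(local_max & (S_j > thresh))            # (row, col) pairs
    return [(int(c), int(r), float(S_j[r, c])) for r, c in peaks]
```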
- Step 4 human skeleton assembly, assemble the detected multi-person 2D key points to form multiple groups of human 2D skeletons, and establish the connections between key points in different views, as well as between the key points of the current frame and of the previous frame.
- This step includes:
- Step 4.1 construct the initial key point association graph G: G = (V, E), V = D_j(c) ∪ D_{t-1}, E = E_P ∪ E_V ∪ E_T (1)
- V is the point set of graph G
- E is the edge set of graph G.
- D_{t-1} denotes the 3D skeleton key points obtained in frame t-1; if frame t-1 does not exist, this term is ignored.
- E_P: within the same view, every pair of key points of different classes in the human skeleton is connected by an edge.
- E_V: across different views, key points of the same class are connected pairwise by edges.
- E_T: in each view, every key point is connected to all key points of the same class in D_{t-1}; if frame t-1 does not exist, this term is ignored.
- Step 4.2 the goal is to solve the initial key point association graph G and obtain the real key point association graph G' that correctly represents the key point connections: G' = (V, E'), V = D_j(c) ∪ D_{t-1}, E' = E'_P ∪ E'_V ∪ E'_T (2), where key points within the same view are connected by the edges of the real human skeleton (E'_P), same-class key points of the same person are connected across views (E'_V), and each key point is connected to the same-class key points of the same person in D_{t-1} (E'_T). Steps 4.1-4.10 constitute the process of solving G'.
- Step 4.3 assign weights to the edges of graph G that connect key points within the same view (the E_P edges):
- L_c(x) represents the PAF value at point x.
- x(u) denotes an interpolation point sampled on the line segment between the two candidate key points; the edge weight is obtained by integrating the PAF along this segment.
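- As an illustrative sketch of this OpenPose-style part-affinity scoring (the sample count is an assumed value, and the exact formula of the patent is not reproduced in this excerpt):
```python
import numpy as np

def paf_edge_weight(p1, p2, L_c, n_samples=10):
    """Score a candidate E_P edge (p1, p2) of bone class c (step 4.3).

    Samples interpolation points x(u) on the segment between the two candidate
    key points and averages the projection of the PAF vector field L_c (H x W x 2)
    onto the segment direction.
    """
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    direction = p2 - p1
    norm = np.linalg.norm(direction)
    if norm < 1e-6:
        return 0.0
    direction /= norm
    score = 0.0
    for u in np.linspace(0.0, 1.0, n_samples):
        x, y = (p1 + u * (p2 - p1)).astype(int)
        score += float(np.dot(L_c[y, x], direction))   # L_c[y, x] is a 2-vector
    return score / n_samples
```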
- Step 4.4 assign weights to the cross-view edges of graph G (the E_V edges):
- K_c represents the intrinsic parameter matrix of camera c
- Z is a normalization coefficient that normalizes the weight to [0,1].
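- The patent's exact cross-view weight formula is not reproduced in this excerpt; as one plausible instantiation (an assumption, not the disclosed formula), the epipolar distance between the two detections can be computed from the calibrated geometry and mapped into [0,1] with a coefficient Z:
```python
import numpy as np

def skew(t):
    return np.array([[0, -t[2], t[1]], [t[2], 0, -t[0]], [-t[1], t[0], 0]])

def cross_view_weight(p_a, p_b, K_a, K_b, R_ab, t_ab, Z=100.0):
    """Weight an E_V edge between same-class key points seen in views a and b (step 4.4)."""
    F = np.linalg.inv(K_b).T @ skew(t_ab) @ R_ab @ np.linalg.inv(K_a)  # fundamental matrix
    x_a = np.array([p_a[0], p_a[1], 1.0])
    x_b = np.array([p_b[0], p_b[1], 1.0])
    line_b = F @ x_a                                     # epipolar line of p_a in view b
    d = abs(x_b @ line_b) / np.linalg.norm(line_b[:2])   # point-to-line distance (pixels)
    return max(0.0, 1.0 - d / Z)                         # normalized to [0, 1]
```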
- Step 4.5 assign weights to the temporal edges of graph G (the E_T edges).
- Step 4.6 compute the human bone bundles.
- A human bone bundle represents a subgraph of the real key point association graph G' consisting of the i-th-class and j-th-class key points of the m-th person.
- This step includes:
- Step 4.6.1 in the initial key point association graph G, consider the subgraph composed of all i-th-class key points and all j-th-class key points; among all candidate bone bundles generated from it, the bundle g_c that maximizes objective function (10) is taken as a real bone bundle
- q(z) = p(z)·z
- |V_c| represents the number of points in g_c
- w_p, w_m, w_t, w_v are weight coefficients.
- Step 4.6.2 remove the selected bone bundle from the subgraph and repeat step 4.6.1 until the subgraph is empty.
- step 4.7 traverse all the bones of the human body to obtain the set B of human bone bundles.
- step 4.8 arrange the human bone bundles B according to the scores of the formula (10) from large to small to form a queue Q.
- Step 4.9 Initially, the real key point association graph G' is empty.
- Step 4.10 take the bone bundle at the head of queue Q.
- All key points d contained in this bundle should be assigned the label of the same person. If two of its key points d_i and d_j have already been given different person labels in G', the bundle conflicts with G'.
- In that case the bundle is split into bone bundles of different persons according to the person labels in G', the new bundle scores are computed with formula (10), and the split bundles are added back to queue Q.
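- As an illustrative sketch of the queue-based assembly in steps 4.8-4.10 (the data structures, i.e. bundles as sets of key-point ids and labels as a dict, are assumptions for illustration):
```python
def assemble_skeletons(bundles, score):
    """Greedily assemble bone bundles into per-person skeletons.

    `score` stands in for formula (10); `labels` (key-point id -> person label)
    stands in for the solved graph G'. Conflicting bundles are split by their
    existing labels, re-scored and pushed back into the queue.
    """
    labels = {}
    next_person = 0
    queue = sorted(bundles, key=score, reverse=True)      # step 4.8: queue Q
    while queue:
        bundle = queue.pop(0)                             # step 4.10: head of Q
        seen = {labels[d] for d in bundle if d in labels}
        if len(seen) > 1:                                 # conflict with G'
            parts = {}
            for d in bundle:                              # split by existing person label
                parts.setdefault(labels.get(d, "new"), set()).add(d)
            queue.extend(parts.values())
            queue.sort(key=score, reverse=True)           # keep Q ordered by score
            continue
        if seen:
            person = seen.pop()
        else:
            person = next_person
            next_person += 1
        for d in bundle:
            labels[d] = person                            # merge bundle into G'
    return labels
```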
- Step 5 Reconstruct the existing actions in the action database. For recognizable common actions, directly call the preset action animations in the action database to save computing costs.
- This step includes:
- Step 5.1 using the collected image sequence and 2D skeleton information to identify the identity and action of the current person.
- Step 5.2 judging whether the current character action has been stored in the action database. If it has been stored, use steps 5.3 and 5.4 to generate character animation. If not stored, go to step 6.
- Step 5.3 based on triangulation, the three-dimensional coordinates of the root key point are computed from the image coordinates of the human root key point acquired by two calibrated cameras.
- step 5.4 align the root node of the character model in the initial frame of animation in the action database with the three-dimensional coordinates calculated in step 5.3, and determine the rotation direction of the root node with the help of the facial direction annotation in step 2.2. Subsequently, the animation in the action database is played.
- When handling walking-type actions, the method of this step can be used to compute the position of the root node at the end of the action, and the walking-passage annotation of step 2.2 determines the path of the motion.
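- As an illustrative sketch of step 5.3 (standard linear DLT triangulation; any equivalent routine could be used, and the 3x4 projection matrices come from the camera calibration):
```python
import numpy as np

def triangulate_root(p1, p2, P1, P2):
    """Triangulate the root key point from two calibrated views.

    p1, p2 are its pixel coordinates in the two views; P1, P2 are the
    corresponding 3x4 projection matrices.
    """
    A = np.stack([
        p1[0] * P1[2] - P1[0],
        p1[1] * P1[2] - P1[1],
        p2[0] * P2[2] - P2[0],
        p2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]          # 3D coordinates of the root key point
```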
- Step 5.5 if it is detected that the action of the character is switched, return to step 5.2.
- Step 6 real-time motion reconstruction. If the current motion is not stored in the motion database, use the 3D model to fit the 2D human skeleton to reconstruct the 3D motion of the character in real time.
- This step includes,
- step 6.1 according to the identification result of the person in step 5.1, the parameterized human body model of the corresponding family member is called out from the database. Fit the parametric human model to the motion of the 2D human skeleton assembled in step 3 by minimizing the objective function (11). If the current character identity is a family member, keep the initial shape parameter ⁇ of the model, and only optimize the pose parameter ⁇ . If the current character identity is a guest, the shape parameter ⁇ and the posture parameter ⁇ of the human body model are optimized at the same time in the first frame, and only the posture parameter ⁇ is optimized in subsequent frames.
- ⁇ J , ⁇ shape , ⁇ temp , ⁇ ⁇ are weight parameters.
- a. E_J is the joint-distance penalty term:
- ⁇ i,c represents the confidence score of the i-th key point of the person in the c-th viewing angle
- R ⁇ (J( ⁇ ) i ) represents the 3D coordinates of the i-th key point in the SMPL model
- J i,c represents the 2D coordinates of the i-th key point in the c-th viewing angle
- ⁇ ( ⁇ ) is the Geman-McClure penalty function.
- b. E_shape is the shape penalty term:
- l i,t represents the length of the i-th bone in the current frame t
- C represents the set of human bones.
- c. E_temp is the temporal smoothing term:
- ⁇ is the weight parameter
- ⁇ v j,t represents the trend of joint point j moving forward in frame t
- ⁇ v j,t R ⁇ (J( ⁇ ) j,t-1 -R ⁇ (J( ⁇ ) j,t-2 , ⁇ i ,t represent the pose parameters of the i-th bone in the t-th frame.
- ⁇ j (g j N( ⁇ ; ⁇ ⁇ ,j , ⁇ ⁇ ,j ) is the prior Gaussian mixture model about the pose parameter ⁇ established using the CMUMoCaP dataset.
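- As an illustrative sketch of evaluating the Gaussian-mixture pose prior (the mixture weights, means and covariances are assumed to have been fitted offline to the CMU MoCap poses; the fitting itself is not shown):
```python
import numpy as np

def gmm_pose_prior(theta, weights, means, covs):
    """Negative log-likelihood of pose theta under the mixture prior (term d of eq. 11)."""
    log_terms = []
    for g, mu, cov in zip(weights, means, covs):
        diff = theta - mu
        _, logdet = np.linalg.slogdet(cov)
        maha = diff @ np.linalg.solve(cov, diff)
        log_terms.append(np.log(g) - 0.5 * (maha + logdet + len(theta) * np.log(2 * np.pi)))
    return -np.logaddexp.reduce(np.array(log_terms))   # E_theta up to an additive constant
```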
- Step 7 judge and handle occlusion during real-time motion reconstruction, i.e., cases where human key points are occluded so that 2D key points cannot be detected or are detected incorrectly.
- This step includes:
- Step 7.1 if the 2D human skeleton formed in step 4 is incomplete in all views, or the confidence of some detected key points is below the preset threshold T in all views, it is considered that some key points of the body are occluded and lie in a blind area of the viewing angles.
- Step 7.2 for occlusions lasting a short run of consecutive frames, when performing the real-time reconstruction of step 6, increase the weight coefficient λ_temp of the occluded key points in formula (11), so that the current 3D key point estimate relies more strongly on the key points of the previous frame.
- Step 7.3 for occlusions lasting a long run of consecutive frames, especially long-term occlusion of specific key points, the treatment of step 7.2 tends to accumulate errors.
- In this situation the person is usually in a relatively static state; for example, the key points of the lower body are occluded while sitting at a table.
- Based on the image recognition result, the standard pose model closest to the current pose (standard sitting, standard standing, standard lying, etc.) and its pose parameters θ are retrieved from the action database.
- ω_j represents the axis-angle rotation of key point j in the skeletal joint chain relative to its parent key point.
- According to formula (11), pose regression is performed with the parameters θ of the standard pose model as the initial value; during regression only the parameters ω of key points with high confidence are optimized, while occluded key points keep their original parameters ω.
- Fig. 1 shows the family scene motion capture method based on multi-view video of the present invention
- Fig. 2 shows an example of an initial key point association graph G of an example of the present invention
- Fig. 3 shows the real key point association graph G' example of the example of the present invention
- Fig. 4 shows an example of a skeletal bundle definition of an example of the present invention
- Step 1 camera placement, place multiple calibration cameras in the home to be detected, and obtain multi-angle videos of the home in real time.
- Step 2 building and labeling the family scene model, creating a 3D virtual scene model based on the real family scene, and making necessary labels on the 3D virtual scene.
- This step includes:
- Step 2.1 perform 3D modeling of the family scene to be detected.
- step 2.2 mark functional areas such as common walking passages and sitting areas in the 3D scene. And in the fixed functional areas such as sofas, tables and chairs, the face orientation of the characters when performing routine actions is defined, which is used to assist the generation of common behavior animations of the characters.
- Step 2.3 Establish a family member action database, and pre-create family member models, guest standard appearance models, and common action animations, such as walking, standing, and sitting, based on the parametric human body model SMPL.
- Step 3 human body 2D key point detection, detects the human 2D key point coordinates and PAFs (Part Affinity Fields) in the multi-view video.
- This step includes:
- Step 3.1 input the current frame of each view into the OpenPose convolutional neural network to obtain the confidence map set S = (S_1,...,S_J) and the PAF set L = (L_1,...,L_C), where J represents the number of key points in a single human skeleton, C represents the number of bones in a single human skeleton, and L_c denotes the PAF of bones of class c, c ∈ {1,...,C}.
- Step 3.2 use the non-maximum suppression algorithm to find, in S_j, the set of heat-map peaks of all j-th key points; each peak corresponds to the heat map of the j-th key point of the m-th person in the scene, where M is the number of people in the scene and m ∈ {1,...,M}.
- Step 3.3 compute the coordinates of the maximum point of each such heat map; these are the 2D coordinates of the j-th key point of the m-th person in the scene.
- Step 4 human skeleton assembly, assemble the detected multi-person 2D key points to form multiple groups of human 2D skeletons, and establish the connections between key points in different views, as well as between the key points of the current frame and of the previous frame.
- This step includes:
- Step 4.1 construct the initial key point association graph G: G = (V, E), V = D_j(c) ∪ D_{t-1}, E = E_P ∪ E_V ∪ E_T (1)
- V is the point set of graph G
- E is the edge set of graph G.
- D_{t-1} denotes the 3D skeleton key points obtained in frame t-1; if frame t-1 does not exist, this term is ignored.
- E_P: within the same view, every pair of key points of different classes in the human skeleton is connected by an edge.
- E_V: across different views, key points of the same class are connected pairwise by edges.
- E_T: in each view, every key point is connected to all key points of the same class in D_{t-1}; if frame t-1 does not exist, this term is ignored.
- the initial key point association graph G is shown in Figure 2. For the sake of clarity, only two perspectives and two types of key points are shown in Figure 2.
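- As an illustrative sketch of the step-4.1 graph construction (the (view, class, index) node naming and the nested-list input layout are assumptions for illustration):
```python
from itertools import combinations

def build_graph(detections, previous_3d):
    """Build the node set V and the edge sets E_P, E_V, E_T of graph G.

    detections[c][j] lists the candidate key points of class j in view c for
    the current frame; previous_3d[j] lists the 3D key points of class j from
    frame t-1 (empty lists if there is no previous frame).
    """
    V, E_P, E_V, E_T = [], [], [], []
    for c, per_class in enumerate(detections):
        for j, cands in enumerate(per_class):
            V.extend((c, j, m) for m in range(len(cands)))
    # E_P: same view, different key-point classes
    for c, per_class in enumerate(detections):
        for j1, j2 in combinations(range(len(per_class)), 2):
            E_P += [((c, j1, m1), (c, j2, m2))
                    for m1 in range(len(per_class[j1]))
                    for m2 in range(len(per_class[j2]))]
    # E_V: different views, same key-point class
    for j in range(len(detections[0])):
        for c1, c2 in combinations(range(len(detections)), 2):
            E_V += [((c1, j, m1), (c2, j, m2))
                    for m1 in range(len(detections[c1][j]))
                    for m2 in range(len(detections[c2][j]))]
    # E_T: each key point connects to all same-class 3D points of frame t-1
    for c, per_class in enumerate(detections):
        for j, cands in enumerate(per_class):
            E_T += [((c, j, m), ("t-1", j, k))
                    for m in range(len(cands))
                    for k in range(len(previous_3d[j]))]
    return V, E_P, E_V, E_T
```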
- Step 4.2 the goal is to solve the initial key point association graph G and obtain the real key point association graph G' that correctly represents the key point connections: G' = (V, E'), V = D_j(c) ∪ D_{t-1}, E' = E'_P ∪ E'_V ∪ E'_T (2), where key points within the same view are connected by the edges of the real human skeleton (E'_P), same-class key points of the same person are connected across views (E'_V), and each key point is connected to the same-class key points of the same person in D_{t-1} (E'_T). Steps 4.1-4.10 constitute the process of solving G'.
- the real key point association graph G’ is shown in Figure 3. For clarity, only two perspectives and two types of key points are shown in Figure 3.
- Step 4.3 assign weights to the edges of graph G that connect key points within the same view (the E_P edges):
- L_c(x) represents the PAF value at point x.
- x(u) denotes an interpolation point sampled on the line segment between the two candidate key points; the edge weight is obtained by integrating the PAF along this segment.
- Step 4.4 assign weights to the cross-view edges of graph G (the E_V edges):
- K_c represents the intrinsic parameter matrix of camera c
- Z is a normalization coefficient that normalizes the weight to [0,1].
- Step 4.5 assign weights to the temporal edges of graph G (the E_T edges).
- Step 4.6 compute the human bone bundles.
- A human bone bundle represents a subgraph of the real key point association graph G' consisting of the i-th-class and j-th-class key points of the m-th person.
- a bone bundle is shown in Figure 4.
- This step includes:
- Step 4.6.1 in the initial key point association graph G, consider the subgraph composed of all i-th-class key points and all j-th-class key points; among all candidate bone bundles generated from it, the bundle g_c that maximizes objective function (10) is taken as a real bone bundle
- q(z) = p(z)·z
- |V_c| represents the number of points in g_c
- w_p, w_m, w_t, w_v are weight coefficients.
- Step 4.6.2 remove the selected bone bundle from the subgraph and repeat step 4.6.1 until the subgraph is empty.
- step 4.7 traverse all the bones of the human body to obtain the set B of human bone bundles.
- step 4.8 arrange the human bone bundles B according to the scores of the formula (10) from large to small to form a queue Q.
- Step 4.9 Initially, the real key point association graph G' is empty.
- Step 4.10 take the bone bundle at the head of queue Q.
- All key points d contained in this bundle should be assigned the label of the same person. If two of its key points d_i and d_j have already been given different person labels in G', the bundle conflicts with G'.
- In that case the bundle is split into bone bundles of different persons according to the person labels in G', the new bundle scores are computed with formula (10), and the split bundles are added back to queue Q.
- Step 5 Reconstruct the existing actions in the action database. For recognizable common actions, directly call the preset action animations in the action database to save computing costs.
- This step includes:
- Step 5.1 using the collected image sequence and 2D skeleton information to identify the identity and action of the current person.
- Step 5.2 judging whether the current character action has been stored in the action database. If it has been stored, use steps 5.3 and 5.4 to generate character animation. If not stored, go to step 6.
- Step 5.3 based on triangulation, the three-dimensional coordinates of the root key point are computed from the image coordinates of the human root key point acquired by two calibrated cameras.
- step 5.4 align the root node of the character model in the initial frame of animation in the action database with the three-dimensional coordinates calculated in step 5.3, and determine the rotation direction of the root node with the help of the facial direction annotation in step 2.2. Subsequently, the animation in the action database is played.
- When handling walking-type actions, the method of this step can be used to compute the position of the root node at the end of the action, and the walking-passage annotation of step 2.2 determines the path of the motion.
- Step 5.5 if it is detected that the action of the character is switched, return to step 5.2.
- Step 6 real-time motion reconstruction. If the current motion is not stored in the motion database, use the 3D model to fit the 2D human skeleton to reconstruct the 3D motion of the character in real time.
- This step includes,
- step 6.1 according to the identification result of the person in step 5.1, the parameterized human body model of the corresponding family member is called out from the database. Fit the parametric human model to the motion of the 2D human skeleton assembled in step 3 by minimizing the objective function (11). If the current character identity is a family member, the initial shape parameter ⁇ of the model is maintained, and only the posture parameter ⁇ is optimized. If the current character identity is a guest, the shape parameter ⁇ and the posture parameter ⁇ of the human body model are optimized at the same time in the first frame, and only the posture parameter ⁇ is optimized in subsequent frames.
- ⁇ J , ⁇ shape , ⁇ temp , ⁇ ⁇ are weight parameters.
- a. E_J is the joint-distance penalty term:
- η_{i,c} represents the confidence score of the i-th key point of the person in the c-th view
- R_θ(J(β))_i represents the 3D coordinates of the i-th key point of the SMPL model
- J_{i,c} represents the 2D coordinates of the i-th key point in the c-th view
- ρ(·) is the Geman-McClure penalty function.
- b. E_shape is the shape penalty term:
- l_{i,t} represents the length of the i-th bone in the current frame t
- C represents the set of human bones.
- c. E_temp is the temporal smoothing term:
- α is the weight parameter
- Δv_{j,t} represents the motion trend of joint j at frame t
- Δv_{j,t} = R_θ(J(β))_{j,t-1} - R_θ(J(β))_{j,t-2}; θ_{i,t} represents the pose parameters of the i-th bone in frame t.
- d. E_θ is the pose penalty term: Σ_j g_j N(θ; μ_{θ,j}, Σ_{θ,j}) is the prior Gaussian mixture model over the pose parameters θ built from the CMU MoCap dataset.
- Step 7 judge and handle occlusion during real-time motion reconstruction, i.e., cases where human key points are occluded so that 2D key points cannot be detected or are detected incorrectly.
- This step includes:
- Step 7.1 if the 2D human skeleton formed in step 4 is incomplete in all views, or the confidence of some detected key points is below the preset threshold T in all views, it is considered that some key points of the body are occluded and lie in a blind area of the viewing angles.
- Step 7.2 for occlusions lasting a short run of consecutive frames, when performing the real-time reconstruction of step 6, increase the weight coefficient λ_temp of the occluded key points in formula (11), so that the current 3D key point estimate relies more strongly on the key points of the previous frame.
- Step 7.3 for occlusions lasting a long run of consecutive frames, especially long-term occlusion of specific key points, the treatment of step 7.2 tends to accumulate errors.
- In this situation the person is usually in a relatively static state; for example, the key points of the lower body are occluded while sitting at a table.
- Based on the image recognition result, the standard pose model closest to the current pose (standard sitting, standard standing, standard lying, etc.) and its pose parameters θ are retrieved from the action database.
- ω_j represents the axis-angle rotation of key point j in the skeletal joint chain relative to its parent key point.
- According to formula (11), pose regression is performed with the parameters θ of the standard pose model as the initial value; during regression only the parameters ω of key points with high confidence are optimized, while occluded key points keep their original parameters ω.
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Computer Graphics (AREA)
- Geometry (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Processing Or Creating Images (AREA)
Abstract
The present invention provides a family-scene motion capture method based on multi-view video, which can perform multi-person motion capture in a home scene and helps users interact with a remote family through electronic devices. The method comprises camera placement, family scene model construction and labeling, human 2D key point detection, human skeleton assembly, reconstruction of actions already in the action database, real-time action reconstruction, and judgment and handling of occlusion. Camera placement is the preparation for acquiring multi-angle video of the home. Family scene model construction and labeling provide motion constraints and prior information for subsequent action reconstruction. During actual action reconstruction, the method uses human 2D key point detection to determine the two-dimensional coordinates of all human key points in a multi-person scene. Human skeleton assembly then connects the correct 2D key points into single-person 2D skeletons and establishes the links between multi-view 2D skeleton points and the 3D skeleton points of the previous frame, providing temporal and spatial information for 3D key point prediction. Reconstruction of actions already in the action database exploits the relatively simple behavior of people in home scenes, using predefined character action animations to reduce the number of real-time reconstructions. Real-time action reconstruction targets actions not present in the action database, fitting a 3D model to the 2D key points and finally presenting the current person's three-dimensional pose with the 3D model. Finally, the method also judges and corrects occlusion, reducing the reconstruction errors that occur when human key points are occluded and making the method more robust in home scenes. The invention can effectively adapt to multi-person motion capture in home scenes and, while preserving family privacy, provides users with a technical means of locally presenting a remote home scene.
Description
The present invention belongs to the technical field of motion capture, and in particular relates to a family-scene motion capture method based on multi-view video.
As the aging of China's population grows more serious, there are more and more empty-nest elderly people. Presenting the home scene of remote children locally by technical means can relieve the loneliness of elderly people living alone. However, although related technologies represented by home video surveillance are simple to implement, they easily leak family privacy, offer only a single viewing angle and lack a sense of immersion. The multi-view-video-based family-scene motion capture technology involved in the present invention can capture the motion information of family members in real time and generate three-dimensional virtual character animations, thereby protecting user privacy, providing viewers with multiple viewing angles and enhancing immersion.
Human motion capture technology is widely used in film and television, games, animation and other fields. The technology captures the motion characteristics of a real human body, drives a virtual character model and produces three-dimensional animation. Optical human motion capture can be divided into marker-based and video-based approaches. Marker-based capture requires the body to wear dedicated sensors or infrared-reflective markers so that key point information can be collected, but the equipment of such methods is expensive and unsuitable for motion capture in daily life. Video-based human motion capture requires no wearable equipment; the spatial positions of human key points can be computed from the image sequences captured by multiple calibrated cameras, recovering the human pose. Compared with motion capture based on monocular video, motion capture based on multi-view video is more robust to depth ambiguity and occlusion, and better fits the technical requirements of this patent.
The SMPL model (Skinned Multi-Person Linear model) is a parametric human body model containing a large amount of human body priors. The SMPL model defines human body shape and pose through 10 shape parameters and 72 pose parameters. Using the SMPL model, an objective function of the distance between pose features extracted from video and the features of the parametric human model can be established, turning the motion capture problem into an objective-function minimization problem.
Summary of the Invention
The present invention provides a family-scene motion capture method based on multi-view video, which aims to use motion capture technology to generate real-time animation of the home scene and to remain robust under occlusion. The present invention comprises the following main steps:
Step 1, camera placement: place multiple calibrated cameras in the home to be observed and acquire multi-angle video of the home in real time.
Step 2, family scene model construction and labeling: create a three-dimensional virtual scene model from the real home scene and add the necessary annotations to the three-dimensional virtual scene.
This step includes:
Step 2.1, build a 3D model of the home scene to be observed.
Step 2.2, annotate functional areas such as common walking passages and sitting areas in the 3D scene, and, in fixed functional areas such as sofas, tables and chairs, define the face orientation of a person performing routine actions there, which is used to assist the generation of common behavior animations.
Step 2.3, establish a family member action database: based on the parametric human body model SMPL, pre-create a model of each family member, a standard guest appearance model, and common action animations such as walking, standing and sitting.
Step 3, human 2D key point detection: detect the human 2D key point coordinates and PAFs (Part Affinity Fields) in the multi-view video.
This step includes:
Step 3.1, feed the current frame of each view into the OpenPose convolutional neural network to obtain the confidence map set S = (S_1, S_2, ..., S_J) and the PAF set L = (L_1, L_2, ..., L_C).
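As an illustrative sketch of how step 3.1 might be driven per view (the `infer_pose_maps` callable is a hypothetical stand-in for the OpenPose network; its interface is an assumption, not part of the patent text):
```python
def detect_all_views(frames, infer_pose_maps):
    """Run 2D inference on the current frame of every calibrated camera,
    collecting the confidence-map set S = (S_1,...,S_J) and the PAF set
    L = (L_1,...,L_C) per view for the later skeleton-assembly step."""
    S_per_view, L_per_view = [], []
    for frame in frames:                     # one frame per camera view
        S, L = infer_pose_maps(frame)        # S: J heat maps, L: C part-affinity fields
        S_per_view.append(S)
        L_per_view.append(L)
    return S_per_view, L_per_view
```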
Step 4, human skeleton assembly: assemble the detected multi-person 2D key points into several human 2D skeletons, and establish the links between key points in different views as well as between the key points of the current frame and of the previous frame.
This step includes:
Step 4.1, construct the initial key point association graph G:
G = (V, E), V = D_j(c) ∪ D_{t-1}, E = E_P ∪ E_V ∪ E_T  (1)
where V is the vertex set of graph G and E is its edge set. An element of D_j(c) denotes, in the current frame t, the m-th candidate point of the j-th key point class in view c, with j ∈ {1, 2, ..., J}, c ∈ {1, 2, ..., N}, and N the number of cameras. D_{t-1} denotes the 3D skeleton key points obtained in frame t-1; if frame t-1 does not exist, this term is ignored. In graph G, within the same view, every pair of key points of different classes of the human skeleton is connected by an edge, denoted E_P; across different views, key points of the same class are connected pairwise by edges, denoted E_V; and in every view, each key point is connected to all key points of the same class in D_{t-1}, denoted E_T (ignored if frame t-1 does not exist).
Step 4.2, the goal is to solve the initial key point association graph G and obtain the real key point association graph G' that correctly represents the key point connections:
G' = (V, E'), V = D_j(c) ∪ D_{t-1}, E' = E'_P ∪ E'_V ∪ E'_T  (2)
where, in G', key points within the same view are connected by the edges corresponding to the real human skeleton, denoted E'_P; across different views, key points of the same class belonging to the same person are connected by edges, denoted E'_V; and in each view, every key point is connected to the same-class key points of the same person in D_{t-1}, denoted E'_T. Steps 4.1-4.10 constitute the process of solving G'.
This step includes:
Step 4.6.1, in the initial key point association graph G, consider the subgraph composed of all i-th-class key points and all j-th-class key points. In a multi-person scene this subgraph contains several human bone bundles. Among all candidate bone bundles generated from it, the bundle g_c that maximizes objective function (10) is computed and taken as a real bone bundle,
where q(z) = p(z)·z, |V_c| denotes the number of points in g_c, and w_p, w_m, w_t, w_v are weight coefficients.
Step 4.7, following step 4.6, traverse all bones of the human body to obtain the set B of human bone bundles.
Step 4.8, sort the human bone bundles of B by their formula (10) scores in descending order to form a queue Q.
Step 5, reconstruction of actions already in the action database: for recognizable common actions, directly invoke the preset action animations in the action database to save computation.
This step includes:
Step 5.1, use the collected image sequences and 2D skeleton information to identify the identity and action of the current person.
Step 5.2, judge whether the current person's action is already stored in the action database. If it is, generate the character animation with steps 5.3 and 5.4; if not, go to step 6.
Step 5.3, based on triangulation, compute the three-dimensional coordinates of the root key point from its image coordinates acquired by two calibrated cameras.
Step 5.4, align the root node of the character model in the initial animation frame from the action database with the three-dimensional coordinates computed in step 5.3, and determine the rotation of the root node with the help of the face orientation annotation of step 2.2; then play the animation from the action database. When handling walking-type actions, the method of this step can be used to compute the position of the root node at the end of the action, and the walking-passage annotation of step 2.2 determines the path of the motion.
Step 5.5, if a switch of the person's action is detected, return to step 5.2.
Step 6, real-time action reconstruction: if the current action is not stored in the action database, fit a three-dimensional model to the 2D human skeleton and reconstruct the person's three-dimensional motion in real time.
This step includes:
Step 6.1, according to the identity recognition result of step 5.1, retrieve the parametric human model of the corresponding family member from the database. By minimizing objective function (11), fit the parametric human model to the motion of the 2D human skeleton assembled in step 3. If the current person is a family member, keep the model's initial shape parameters β and optimize only the pose parameters θ. If the current person is a guest, optimize both the shape parameters β and the pose parameters θ in the first frame, and only the pose parameters θ in subsequent frames.
E(β, θ) = λ_J·E_J + λ_shape·E_shape + λ_temp·E_temp + λ_θ·E_θ  (11)
where λ_J, λ_shape, λ_temp and λ_θ are weight parameters.
a. E_J is the joint-distance penalty term:
where, for a single person, η_{i,c} denotes the confidence score of this person's i-th-class key point in the c-th view, R_θ(J(β))_i denotes the 3D coordinates of the i-th-class key point of the SMPL model, whose projection onto the image plane of the c-th camera gives the corresponding 2D coordinates, J_{i,c} denotes the 2D coordinates of the i-th-class key point in the c-th view, and ρ(·) is the Geman-McClure penalty function.
b. E_shape is the shape penalty term:
c. E_temp is the temporal smoothing term:
where α is a weight parameter, Δv_{j,t} denotes the motion trend of joint j at frame t, Δv_{j,t} = R_θ(J(β))_{j,t-1} - R_θ(J(β))_{j,t-2}, and θ_{i,t} denotes the pose parameters of the i-th-class bone in frame t.
d. E_θ is the pose penalty term:
where Σ_j g_j·N(θ; μ_{θ,j}, Σ_{θ,j}) is the prior Gaussian mixture model over the pose parameters θ built from the CMU MoCap dataset.
Step 7, judge and handle occlusion during real-time motion reconstruction, i.e., the problem that human key points are occluded during real-time reconstruction so that 2D key points cannot be detected or are detected incorrectly.
This step includes:
Step 7.1, if the 2D human skeleton assembled in step 4 is incomplete in all views, or the confidence of some detected key points is below the preset threshold T in all views, it is considered that some key points of this body are occluded and lie in a blind area of the viewing angles.
Step 7.2, for occlusions lasting a short run of consecutive frames, when performing the real-time reconstruction of step 6, increase the weight coefficient λ_temp of the occluded key points in formula (11), strengthening the dependence of the current 3D key point estimate on the key points of the previous frame.
Step 7.3, for occlusions lasting a long run of consecutive frames, especially long-term occlusion of specific key points, the treatment of step 7.2 tends to accumulate errors. In this situation the person is usually in a relatively static state; for example, the key points of the lower body are occluded while sitting at a table. In this case, based on the image recognition result, the standard pose model closest to the current pose (standard sitting, standard standing, standard lying, etc.) and its pose parameters θ are retrieved from the action database,
where ω_j denotes the axis-angle rotation of key point j in the skeletal joint chain relative to its parent key point.
According to formula (11), pose regression is performed with the parameters θ of the standard pose model as the initial value; during regression only the parameters ω of key points with high confidence are optimized, and occluded key points keep their original parameters ω.
Fig. 1 shows the family-scene motion capture method based on multi-view video of the present invention;
Fig. 2 shows an example of the initial key point association graph G of an embodiment of the present invention;
Fig. 3 shows an example of the real key point association graph G' of an embodiment of the present invention;
Fig. 4 shows an example of the bone bundle definition of an embodiment of the present invention.
Preferred embodiments of the present invention are further described below with reference to the accompanying drawings and examples.
The flowchart shown in Fig. 1 gives the specific procedure of the whole implementation of the present invention:
Step 1, camera placement: place multiple calibrated cameras in the home to be observed and acquire multi-angle video of the home in real time.
Step 2, family scene model construction and labeling: create a three-dimensional virtual scene model from the real home scene and add the necessary annotations to the three-dimensional virtual scene.
This step includes:
Step 2.1, build a 3D model of the home scene to be observed.
Step 2.2, annotate functional areas such as common walking passages and sitting areas in the 3D scene, and, in fixed functional areas such as sofas, tables and chairs, define the face orientation of a person performing routine actions there, which is used to assist the generation of common behavior animations.
Step 2.3, establish a family member action database: based on the parametric human body model SMPL, pre-create a model of each family member, a standard guest appearance model, and common action animations such as walking, standing and sitting.
Step 3, human 2D key point detection: detect the human 2D key point coordinates and PAFs (Part Affinity Fields) in the multi-view video.
This step includes:
Step 3.1, feed the current frame of each view into the OpenPose convolutional neural network to obtain the confidence map set S = (S_1, S_2, ..., S_J) and the PAF set L = (L_1, L_2, ..., L_C).
Step 4, human skeleton assembly: assemble the detected multi-person 2D key points into several human 2D skeletons, and establish the links between key points in different views as well as between the key points of the current frame and of the previous frame.
This step includes:
Step 4.1, construct the initial key point association graph G:
G = (V, E), V = D_j(c) ∪ D_{t-1}, E = E_P ∪ E_V ∪ E_T  (1)
where V is the vertex set of graph G and E is its edge set. An element of D_j(c) denotes, in the current frame t, the m-th candidate point of the j-th key point class in view c, with j ∈ {1, 2, ..., J}, c ∈ {1, 2, ..., N}, and N the number of cameras. D_{t-1} denotes the 3D skeleton key points obtained in frame t-1; if frame t-1 does not exist, this term is ignored. In graph G, within the same view, every pair of key points of different classes of the human skeleton is connected by an edge, denoted E_P; across different views, key points of the same class are connected pairwise by edges, denoted E_V; and in every view, each key point is connected to all key points of the same class in D_{t-1}, denoted E_T (ignored if frame t-1 does not exist). The initial key point association graph G is shown in Fig. 2; for clarity, only two views and two key point classes are drawn in Fig. 2.
Step 4.2, the goal is to solve the initial key point association graph G and obtain the real key point association graph G' that correctly represents the key point connections:
G' = (V, E'), V = D_j(c) ∪ D_{t-1}, E' = E'_P ∪ E'_V ∪ E'_T  (2)
where, in G', key points within the same view are connected by the edges corresponding to the real human skeleton, denoted E'_P; across different views, key points of the same class belonging to the same person are connected by edges, denoted E'_V; and in each view, every key point is connected to the same-class key points of the same person in D_{t-1}, denoted E'_T. Steps 4.1-4.10 constitute the process of solving G'.
The real key point association graph G' is shown in Fig. 3; for clarity, only two views and two key point classes are drawn in Fig. 3.
This step includes:
Step 4.6.1, in the initial key point association graph G, consider the subgraph composed of all i-th-class key points and all j-th-class key points. In a multi-person scene this subgraph contains several human bone bundles. Among all candidate bone bundles generated from it, the bundle g_c that maximizes objective function (10) is computed and taken as a real bone bundle,
where q(z) = p(z)·z, |V_c| denotes the number of points in g_c, and w_p, w_m, w_t, w_v are weight coefficients.
Step 4.7, following step 4.6, traverse all bones of the human body to obtain the set B of human bone bundles.
Step 4.8, sort the human bone bundles of B by their formula (10) scores in descending order to form a queue Q.
Step 5, reconstruction of actions already in the action database: for recognizable common actions, directly invoke the preset action animations in the action database to save computation.
This step includes:
Step 5.1, use the collected image sequences and 2D skeleton information to identify the identity and action of the current person.
Step 5.2, judge whether the current person's action is already stored in the action database. If it is, generate the character animation with steps 5.3 and 5.4; if not, go to step 6.
Step 5.3, based on triangulation, compute the three-dimensional coordinates of the root key point from its image coordinates acquired by two calibrated cameras.
Step 5.4, align the root node of the character model in the initial animation frame from the action database with the three-dimensional coordinates computed in step 5.3, and determine the rotation of the root node with the help of the face orientation annotation of step 2.2; then play the animation from the action database. When handling walking-type actions, the method of this step can be used to compute the position of the root node at the end of the action, and the walking-passage annotation of step 2.2 determines the path of the motion.
Step 5.5, if a switch of the person's action is detected, return to step 5.2.
Step 6, real-time action reconstruction: if the current action is not stored in the action database, fit a three-dimensional model to the 2D human skeleton and reconstruct the person's three-dimensional motion in real time.
This step includes:
Step 6.1, according to the identity recognition result of step 5.1, retrieve the parametric human model of the corresponding family member from the database. By minimizing objective function (11), fit the parametric human model to the motion of the 2D human skeleton assembled in step 3. If the current person is a family member, keep the model's initial shape parameters β and optimize only the pose parameters θ. If the current person is a guest, optimize both the shape parameters β and the pose parameters θ in the first frame, and only the pose parameters θ in subsequent frames.
E(β, θ) = λ_J·E_J + λ_shape·E_shape + λ_temp·E_temp + λ_θ·E_θ  (11)
where λ_J, λ_shape, λ_temp and λ_θ are weight parameters.
a. E_J is the joint-distance penalty term:
where, for a single person, η_{i,c} denotes the confidence score of this person's i-th-class key point in the c-th view, R_θ(J(β))_i denotes the 3D coordinates of the i-th-class key point of the SMPL model, whose projection onto the image plane of the c-th camera gives the corresponding 2D coordinates, J_{i,c} denotes the 2D coordinates of the i-th-class key point in the c-th view, and ρ(·) is the Geman-McClure penalty function.
b. E_shape is the shape penalty term:
c. E_temp is the temporal smoothing term:
where α is a weight parameter, Δv_{j,t} denotes the motion trend of joint j at frame t, Δv_{j,t} = R_θ(J(β))_{j,t-1} - R_θ(J(β))_{j,t-2}, and θ_{i,t} denotes the pose parameters of the i-th-class bone in frame t.
d. E_θ is the pose penalty term:
where Σ_j g_j·N(θ; μ_{θ,j}, Σ_{θ,j}) is the prior Gaussian mixture model over the pose parameters θ built from the CMU MoCap dataset.
Step 7, judge and handle occlusion during real-time motion reconstruction, i.e., the problem that human key points are occluded during real-time reconstruction so that 2D key points cannot be detected or are detected incorrectly.
This step includes:
Step 7.1, if the 2D human skeleton assembled in step 4 is incomplete in all views, or the confidence of some detected key points is below the preset threshold T in all views, it is considered that some key points of this body are occluded and lie in a blind area of the viewing angles.
Step 7.2, for occlusions lasting a short run of consecutive frames, when performing the real-time reconstruction of step 6, increase the weight coefficient λ_temp of the occluded key points in formula (11), strengthening the dependence of the current 3D key point estimate on the key points of the previous frame.
Step 7.3, for occlusions lasting a long run of consecutive frames, especially long-term occlusion of specific key points, the treatment of step 7.2 tends to accumulate errors. In this situation the person is usually in a relatively static state; for example, the key points of the lower body are occluded while sitting at a table. In this case, based on the image recognition result, the standard pose model closest to the current pose (standard sitting, standard standing, standard lying, etc.) and its pose parameters θ are retrieved from the action database,
where ω_j denotes the axis-angle rotation of key point j in the skeletal joint chain relative to its parent key point.
According to formula (11), pose regression is performed with the parameters θ of the standard pose model as the initial value; during regression only the parameters ω of key points with high confidence are optimized, and occluded key points keep their original parameters ω.
Claims (6)
- A family-scene motion capture method based on multi-view video, characterized by comprising the following steps: Step 1, camera placement: place multiple calibrated cameras in the home to be observed and acquire multi-angle video of the home in real time. Step 2, family scene model construction and labeling: create a three-dimensional virtual scene model from the real home scene and add the necessary annotations to the three-dimensional virtual scene. Step 3, human 2D key point detection: detect the human 2D key point coordinates and PAFs (Part Affinity Fields) in the multi-view video. Step 4, human skeleton assembly: assemble the detected multi-person 2D key points into several human 2D skeletons, and establish the links between key points in different views as well as between the key points of the current frame and of the previous frame. Step 5, reconstruction of actions already in the action database: for recognizable common actions, directly invoke the preset action animations of the action database to save computation. Step 6, real-time action reconstruction: if the current action is not stored in the action database, fit a three-dimensional model to the 2D human skeleton and reconstruct the person's three-dimensional motion in real time. Step 7, judge and handle occlusion during real-time motion reconstruction, i.e., the problem that human key points are occluded during real-time reconstruction so that 2D key points cannot be detected or are detected incorrectly.
- The multi-view-video-based family-scene motion capture method according to claim 1, characterized in that in said step 2, family scene model construction and labeling, a three-dimensional scene model corresponding to the real home scene is constructed and the necessary annotations are added to it. Said step 2 further comprises: Step 2.1, build a 3D model of the home scene to be observed. Step 2.2, annotate functional areas such as common walking passages and sitting areas in the 3D scene, and, in fixed functional areas such as sofas, tables and chairs, define the face orientation of a person performing routine actions there, which is used to assist the generation of common behavior animations. Step 2.3, establish a family member action database: based on the parametric human body model SMPL, pre-create a model of each family member, a standard guest appearance model, and common action animations such as walking, standing and sitting.
- The multi-view-video-based family-scene motion capture method according to claim 1, characterized in that in said step 4, human skeleton assembly, the detected multi-person 2D key points are assembled into several human skeletons. Said step 4 further comprises: Step 4.1, construct the initial key point association graph G: G = (V, E), V = D_j(c) ∪ D_{t-1}, E = E_P ∪ E_V ∪ E_T (1), where V is the vertex set of graph G and E is its edge set; an element of D_j(c) denotes, in the current frame t, the m-th candidate point of the j-th key point class in view c, with j ∈ {1, 2, ..., J}, c ∈ {1, 2, ..., N}, and N the number of cameras; D_{t-1} denotes the 3D skeleton key points obtained in frame t-1 (ignored if frame t-1 does not exist); in graph G, within the same view, key points of different classes of the human skeleton are connected pairwise by edges, denoted E_P; across different views, key points of the same class are connected pairwise by edges, denoted E_V; in every view, each key point is connected to all key points of the same class in D_{t-1}, denoted E_T (ignored if frame t-1 does not exist); the initial key point association graph G is shown in Fig. 2, where for clarity only two views and two key point classes are drawn. Step 4.2, the goal is to solve the initial key point association graph G and obtain the real key point association graph G' that correctly represents the key point connections: G' = (V, E'), V = D_j(c) ∪ D_{t-1}, E' = E'_P ∪ E'_V ∪ E'_T (2), where, in G', key points within the same view are connected by the edges corresponding to the real human skeleton, denoted E'_P; across different views, same-class key points of the same person are connected by edges, denoted E'_V; in each view, every key point is connected to the same-class key points of the same person in D_{t-1}, denoted E'_T; steps 4.1-4.10 constitute the process of solving G'; the real key point association graph G' is shown in Fig. 3, where for clarity only two views and two key point classes are drawn. This step includes: Step 4.6.1, in the initial key point association graph G, consider the subgraph composed of all i-th-class key points and all j-th-class key points; in a multi-person scene this subgraph contains several human bone bundles; among all candidate bone bundles generated from it, the bundle g_c that maximizes objective function (10) is taken as a real bone bundle, where q(z) = p(z)·z, |V_c| denotes the number of points in g_c, and w_p, w_m, w_t, w_v are weight coefficients. Step 4.7, following step 4.6, traverse all bones of the human body to obtain the set B of human bone bundles. Step 4.8, sort the human bone bundles of B by their formula (10) scores in descending order to form a queue Q.
- The multi-view-video-based family-scene motion capture method according to claim 1, characterized in that in said step 5, reconstruction of actions already in the action database, recognizable common actions directly invoke the preset action animations of the action database to save computation. Said step 5 further comprises: Step 5.1, use the collected image sequences and 2D skeleton information to identify the identity and action of the current person. Step 5.2, judge whether the current person's action is already stored in the action database; if it is, generate the character animation with steps 5.3 and 5.4; if not, go to step 6. Step 5.3, based on triangulation, compute the three-dimensional coordinates of the root key point from its image coordinates acquired by two calibrated cameras. Step 5.4, align the root node of the character model in the initial animation frame from the action database with the three-dimensional coordinates computed in step 5.3, and determine the rotation of the root node with the help of the face orientation annotation of step 2.2; then play the animation from the action database; when handling walking-type actions, the method of this step can be used to compute the position of the root node at the end of the action, and the walking-passage annotation of step 2.2 determines the path of the motion. Step 5.5, if a switch of the person's action is detected, return to step 5.2.
- The multi-view-video-based family-scene motion capture method according to claim 1, characterized in that in said step 6, real-time action reconstruction, if the current action is not stored in the action database, a three-dimensional model is fitted to the 2D human skeleton to reconstruct the person's three-dimensional motion in real time. In said step 6, the objective function for fitting the parametric model to the 2D human skeleton is defined as: E(β, θ) = λ_J·E_J + λ_shape·E_shape + λ_temp·E_temp + λ_θ·E_θ (11), where λ_J, λ_shape, λ_temp, λ_θ are weight parameters. a. E_J is the joint-distance penalty term, where, for a single person, η_{i,c} denotes the confidence score of this person's i-th-class key point in the c-th view, R_θ(J(β))_i denotes the 3D coordinates of the i-th-class key point of the SMPL model, whose projection onto the image plane of the c-th camera gives the corresponding 2D coordinates, J_{i,c} denotes the 2D coordinates of the i-th-class key point in the c-th view, and ρ(·) is the Geman-McClure penalty function. b. E_shape is the shape penalty term. c. E_temp is the temporal smoothing term, where α is a weight parameter, Δv_{j,t} denotes the motion trend of joint j at frame t, Δv_{j,t} = R_θ(J(β))_{j,t-1} - R_θ(J(β))_{j,t-2}, and θ_{i,t} denotes the pose parameters of the i-th-class bone in frame t. d. E_θ is the pose penalty term, where Σ_j g_j·N(θ; μ_{θ,j}, Σ_{θ,j}) is the prior Gaussian mixture model over the pose parameters θ built from the CMU MoCap dataset.
- The multi-view-video-based family-scene motion capture method according to claim 1, characterized in that in said step 7, occlusion during real-time motion reconstruction is judged and handled, i.e., the problem that human key points are occluded during real-time reconstruction so that 2D key points cannot be detected or are detected incorrectly. Said step 7 further comprises: Step 7.1, if the 2D human skeleton assembled in step 4 is incomplete in all views, or the confidence of some detected key points is below the preset threshold T in all views, it is considered that some key points of this body are occluded and lie in a blind area of the viewing angles. Step 7.2, for occlusions lasting a short run of consecutive frames, when performing the real-time reconstruction of step 6, increase the weight coefficient λ_temp of the occluded key points in formula (11), strengthening the dependence of the current 3D key point estimate on the key points of the previous frame. Step 7.3, for occlusions lasting a long run of consecutive frames, especially long-term occlusion of specific key points, the treatment of step 7.2 tends to accumulate errors; in this situation the person is usually in a relatively static state, for example, the key points of the lower body are occluded while sitting at a table; in this case, based on the image recognition result, the standard pose model closest to the current pose (standard sitting, standard standing, standard lying, etc.) and its pose parameters θ are retrieved from the action database, where ω_j denotes the axis-angle rotation of key point j in the skeletal joint chain relative to its parent key point; according to formula (11), pose regression is performed with the parameters θ of the standard pose model as the initial value; during regression only the parameters ω of key points with high confidence are optimized, and occluded key points keep their original parameters ω.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/093969 WO2022241583A1 (zh) | 2021-05-15 | 2021-05-15 | 一种基于多目视频的家庭场景动作捕捉方法 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/093969 WO2022241583A1 (zh) | 2021-05-15 | 2021-05-15 | 一种基于多目视频的家庭场景动作捕捉方法 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022241583A1 true WO2022241583A1 (zh) | 2022-11-24 |
Family
ID=84140927
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/093969 WO2022241583A1 (zh) | 2021-05-15 | 2021-05-15 | 一种基于多目视频的家庭场景动作捕捉方法 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2022241583A1 (zh) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115565253A (zh) * | 2022-12-08 | 2023-01-03 | 季华实验室 | 一种动态手势实时识别方法、装置、电子设备和存储介质 |
CN115984972A (zh) * | 2023-03-20 | 2023-04-18 | 乐歌人体工学科技股份有限公司 | 基于运动视频驱动的人体姿态识别方法 |
CN116403275A (zh) * | 2023-03-14 | 2023-07-07 | 南京航空航天大学 | 基于多目视觉检测封闭空间中人员行进姿态的方法及系统 |
CN116403288A (zh) * | 2023-04-28 | 2023-07-07 | 中南大学 | 运动姿态的识别方法、识别装置及电子设备 |
CN116880687A (zh) * | 2023-06-07 | 2023-10-13 | 黑龙江科技大学 | 一种基于单目多算法的悬浮触控方法 |
CN117541646A (zh) * | 2023-12-20 | 2024-02-09 | 暗物质(北京)智能科技有限公司 | 一种基于参数化模型的动作捕捉方法及系统 |
CN117911632A (zh) * | 2024-03-19 | 2024-04-19 | 电子科技大学 | 一种人体节点三维虚拟角色动作重构方法、设备及计算机可读存储介质 |
CN118015711A (zh) * | 2024-04-10 | 2024-05-10 | 华南农业大学 | 基于多角度下的表演动作识别方法、系统、设备及介质 |
CN118286603A (zh) * | 2024-04-17 | 2024-07-05 | 四川大学华西医院 | 一种基于计算机视觉的磁刺激系统及方法 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107845129A (zh) * | 2017-11-07 | 2018-03-27 | 深圳狗尾草智能科技有限公司 | 三维重构方法及装置、增强现实的方法及装置 |
CN110020611A (zh) * | 2019-03-17 | 2019-07-16 | 浙江大学 | 一种基于三维假设空间聚类的多人动作捕捉方法 |
CN110544302A (zh) * | 2019-09-06 | 2019-12-06 | 广东工业大学 | 基于多目视觉的人体动作重建系统、方法和动作训练系统 |
US20210012100A1 (en) * | 2019-07-10 | 2021-01-14 | Hrl Laboratories, Llc | Action classification using deep embedded clustering |
-
2021
- 2021-05-15 WO PCT/CN2021/093969 patent/WO2022241583A1/zh active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107845129A (zh) * | 2017-11-07 | 2018-03-27 | 深圳狗尾草智能科技有限公司 | 三维重构方法及装置、增强现实的方法及装置 |
CN110020611A (zh) * | 2019-03-17 | 2019-07-16 | 浙江大学 | 一种基于三维假设空间聚类的多人动作捕捉方法 |
US20210012100A1 (en) * | 2019-07-10 | 2021-01-14 | Hrl Laboratories, Llc | Action classification using deep embedded clustering |
CN110544302A (zh) * | 2019-09-06 | 2019-12-06 | 广东工业大学 | 基于多目视觉的人体动作重建系统、方法和动作训练系统 |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115565253B (zh) * | 2022-12-08 | 2023-04-18 | 季华实验室 | 一种动态手势实时识别方法、装置、电子设备和存储介质 |
CN115565253A (zh) * | 2022-12-08 | 2023-01-03 | 季华实验室 | 一种动态手势实时识别方法、装置、电子设备和存储介质 |
CN116403275B (zh) * | 2023-03-14 | 2024-05-24 | 南京航空航天大学 | 基于多目视觉检测封闭空间中人员行进姿态的方法及系统 |
CN116403275A (zh) * | 2023-03-14 | 2023-07-07 | 南京航空航天大学 | 基于多目视觉检测封闭空间中人员行进姿态的方法及系统 |
CN115984972A (zh) * | 2023-03-20 | 2023-04-18 | 乐歌人体工学科技股份有限公司 | 基于运动视频驱动的人体姿态识别方法 |
CN115984972B (zh) * | 2023-03-20 | 2023-08-11 | 乐歌人体工学科技股份有限公司 | 基于运动视频驱动的人体姿态识别方法 |
CN116403288A (zh) * | 2023-04-28 | 2023-07-07 | 中南大学 | 运动姿态的识别方法、识别装置及电子设备 |
CN116880687A (zh) * | 2023-06-07 | 2023-10-13 | 黑龙江科技大学 | 一种基于单目多算法的悬浮触控方法 |
CN116880687B (zh) * | 2023-06-07 | 2024-03-19 | 黑龙江科技大学 | 一种基于单目多算法的悬浮触控方法 |
CN117541646A (zh) * | 2023-12-20 | 2024-02-09 | 暗物质(北京)智能科技有限公司 | 一种基于参数化模型的动作捕捉方法及系统 |
CN117911632A (zh) * | 2024-03-19 | 2024-04-19 | 电子科技大学 | 一种人体节点三维虚拟角色动作重构方法、设备及计算机可读存储介质 |
CN117911632B (zh) * | 2024-03-19 | 2024-05-28 | 电子科技大学 | 一种人体节点三维虚拟角色动作重构方法、设备及计算机可读存储介质 |
CN118015711A (zh) * | 2024-04-10 | 2024-05-10 | 华南农业大学 | 基于多角度下的表演动作识别方法、系统、设备及介质 |
CN118286603A (zh) * | 2024-04-17 | 2024-07-05 | 四川大学华西医院 | 一种基于计算机视觉的磁刺激系统及方法 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022241583A1 (zh) | 一种基于多目视频的家庭场景动作捕捉方法 | |
Wang et al. | Deep 3D human pose estimation: A review | |
CN109242950B (zh) | 多人紧密交互场景下的多视角人体动态三维重建方法 | |
Cheung et al. | Shape-from-silhouette across time part ii: Applications to human modeling and markerless motion tracking | |
Wang et al. | EM enhancement of 3D head pose estimated by point at infinity | |
Ye et al. | Accurate 3d pose estimation from a single depth image | |
Tao et al. | Object tracking with bayesian estimation of dynamic layer representations | |
Kumano et al. | Pose-invariant facial expression recognition using variable-intensity templates | |
Rafi et al. | A semantic occlusion model for human pose estimation from a single depth image | |
KR20190129985A (ko) | 파트 기반 키 프레임들 및 선험적 모델을 사용한 견고한 메시 트래킹 및 융합 | |
KR20210079542A (ko) | 3d 골격 정보를 이용한 사용자 동작 인식 방법 및 시스템 | |
CN111582036B (zh) | 可穿戴设备下基于形状和姿态的跨视角人物识别方法 | |
Argyros et al. | Binocular hand tracking and reconstruction based on 2D shape matching | |
CN111832386A (zh) | 一种估计人体姿态的方法、装置及计算机可读介质 | |
Rius et al. | Action-specific motion prior for efficient Bayesian 3D human body tracking | |
Haker et al. | Self-organizing maps for pose estimation with a time-of-flight camera | |
Okada et al. | Virtual fashion show using real-time markerless motion capture | |
Lefevre et al. | Structure and appearance features for robust 3d facial actions tracking | |
Leow et al. | 3-D–2-D spatiotemporal registration for sports motion analysis | |
Muhlbauer et al. | A model-based algorithm to estimate body poses using stereo vision | |
Zúniga et al. | Fast and reliable object classification in video based on a 3D generic model | |
Joo | Sensing, Measuring, and Modeling Social Signals in Nonverbal Communication | |
Metaxas et al. | Dynamically adaptive tracking of gestures and facial expressions | |
Dornaika et al. | Detecting and tracking of 3d face pose for human-robot interaction | |
Kaimakis et al. | Gradient-based hand tracking using silhouette data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21940042 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 21940042 Country of ref document: EP Kind code of ref document: A1 |