CN117541646A - Motion capturing method and system based on parameterized model - Google Patents

Motion capturing method and system based on parameterized model

Info

Publication number
CN117541646A
CN117541646A
Authority
CN
China
Prior art keywords
human body
target person
person
coordinates
parameterized
Prior art date
Legal status
Pending
Application number
CN202311754272.6A
Other languages
Chinese (zh)
Inventor
陈靖涵
张鹏飞
苏江
Current Assignee
Dark Matter Beijing Intelligent Technology Co ltd
Original Assignee
Dark Matter Beijing Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Dark Matter Beijing Intelligent Technology Co ltd
Priority to CN202311754272.6A
Publication of CN117541646A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Graphics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a motion capture method and system based on a parameterized model. A human body detection module acquires RGB video, or RGBD video with matched depth information, and obtains bounding boxes for the target person and for the target person's two hands. From the region image inside the target person's bounding box, a foot contact detection module uses a binary classification model to decide whether each of the person's feet touches the ground. From the region image inside the target person's bounding box and the region images inside the two hand bounding boxes, a human body posture capture module captures and estimates the rotation value of each joint of the human body with a parameterized three-dimensional human body model. An absolute position estimation module obtains the target person's 3D coordinates in the camera coordinate system through an absolute position estimation algorithm. Finally, from the rotation values of the joints, the body's coordinates in the camera coordinate system and the binary classification results of whether the feet touch the ground, a data optimization module applies mean filtering and an inverse kinematics optimization algorithm to produce optimized rotation values that eliminate foot sliding and floating, together with optimized coordinates of the body in the camera coordinate system.

Description

Motion capturing method and system based on parameterized model
Technical Field
The invention relates to the technical field of computer vision and human motion capture, in particular to a motion capture method and system based on a parameterized model.
Background
Human body motion capture is an essential technology for digital humans and the metaverse, and relatively mature solutions already exist. Current motion capture techniques can capture fairly accurate motion without any worn equipment, using only a camera, which lowers the cost of motion capture compared with methods that require wearable devices.
However, most camera-based motion capture methods focus only on the limbs and ignore hand motion, and the captured results often suffer from foot floating and sliding, which degrades the visual quality.
Therefore, how to capture whole-body motion while eliminating foot sliding, so as to recover more realistic motion, is a problem that those skilled in the art need to solve.
Disclosure of Invention
In view of the above, the present invention provides a motion capture method and system based on a parameterized model to solve some of the technical problems mentioned in the background.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a motion capture method based on a parameterized model comprises the following steps:
s1, acquiring RGB video or RGBD video matched with depth information;
s2, positioning the positions of the target persons in the video picture, and simultaneously positioning the positions of the two hands of the target persons to obtain the target persons and the position boundary boxes of the two hands of the target persons;
s3, according to the regional image in the target person boundary box, a classification algorithm model is utilized to obtain a classification result of the person feet, and whether the person feet are in contact with the ground or not is judged;
capturing and estimating the rotation value of each joint point of the human body by utilizing the human body parameterized three-dimensional model according to the region image in the target person boundary frame and the region image of the target person double-hand boundary frame;
according to the regional image in the boundary frame of the target person, obtaining the 3D coordinates of the target person in a camera coordinate system through an absolute position estimation algorithm, and estimating the displacement information of the target person;
s4, according to the rotation value of each joint point of the human body, the coordinates of the human body in a camera coordinate system and the two classification results of whether the feet of the person are grounded or not, the rotation value for eliminating the sliding and floating of the feet and the coordinates of the human body in the camera coordinate system after optimization are obtained through an average value filtering process and an inverse kinematics optimization algorithm.
Preferably, the bounding boxes of the target person and the two hands in S2 are obtained with the mainstream object detection algorithm YOLO.
Preferably, in step S3 the binary classification model comprises a multi-layer perceptron (MLP) with an input layer, three hidden layers and an output layer; all five layers are fully connected, and the loss function of the classification model is the binary cross-entropy loss.
Preferably, the parameterized three-dimensional human body model in step S3 comprises an encoder, a spatial feature pyramid network and a regressor;
the encoder maps the input image to feature maps rich in semantic information, and the spatial feature pyramid network then extracts further features; finally, the regressor outputs the input parameters required by the parameterized three-dimensional human body model together with estimated camera parameters, the input parameters being the rotation values of the skeleton points. The 3D and 2D keypoint positions of the body are obtained through forward kinematics and the camera parameters and are used to compute the loss; the parameterized three-dimensional human body model is trained with a reconstruction loss function.
Preferably, the reconstruction loss function for training the parameterized three-dimensional human body model is specifically:
L_reg = λ_2d ||K - K_gt|| + λ_3d ||J - J_gt|| + λ_para ||Θ - Θ_gt||
where K denotes the 2D keypoint positions, J the 3D keypoint positions, Θ the input parameters of the parameterized three-dimensional human body model together with the camera parameters, λ the weights of the respective terms, and ||·|| the L2 norm.
Preferably, the absolute position estimation algorithm in step S3 comprises a backbone network and two regressors; the backbone network consists of several convolution layers and the regressors mainly of fully connected layers. The backbone network extracts features from the image, the features are fed to the two regressors to estimate, respectively, the camera parameters and the 3D coordinates relative to the root node, and the estimated camera parameters then convert the root-relative 3D coordinates into absolute 3D coordinates in the camera coordinate system.
Preferably, the loss function of the absolute position estimation algorithm is the L1 norm, specifically:
L = ||R - R_gt||_1
where R denotes the absolute 3D coordinates in the camera coordinate system.
Preferably, step S2 includes a single-person mode and a multi-person mode: in the single-person mode, if several people appear in the frame, only the bounding box occupying the largest share of the frame is output; in the multi-view mode, a matching algorithm matches the bounding boxes of the same person across the different views, locating that person in every view;
in step S3, in the multi-view mode, the classification results of the individual views are aggregated, and the result returned by the majority of views is taken as the classification result for the person's feet; the rotation values output for each view are fused into final rotation values by a multi-view fusion algorithm; and the mean of the 3D coordinates estimated from the views is taken as the target person's 3D coordinates in the camera coordinate system.
Preferably, step S4 specifically comprises:
S41, mean-filtering the rotation values of the joints of the human body and the body's coordinates in the camera coordinate system to remove jitter from the data;
S42, computing new foot keypoint positions by interpolation, according to the classification results of whether the left and right feet of the human body touch the ground;
S43, with the new foot keypoint positions as a constraint, optimizing the rotation values of the joints of the human body by iterative numerical optimization.
The motion capture system based on a parameterized model comprises a human body detection module, a foot contact detection module, a human body posture capture module, an absolute position estimation module and a data optimization module;
the human body detection module is used for collecting RGB video, or RGBD video with matched depth information, locating the position of the target person in the video frame together with the positions of the target person's two hands, and obtaining bounding boxes for the target person and for the target person's two hands;
the foot contact detection module is used for obtaining, from the region image inside the target person's bounding box, a binary classification result for the person's feet with a classification model, judging whether the feet are in contact with the ground;
the human body posture capture module is used for capturing and estimating the rotation value of each joint of the human body with the parameterized three-dimensional human body model, from the region image inside the target person's bounding box and the region images inside the two hand bounding boxes;
the absolute position estimation module is used for obtaining the target person's 3D coordinates in the camera coordinate system through an absolute position estimation algorithm, from the region image inside the target person's bounding box, and estimating the target person's displacement;
the data optimization module is used for obtaining, from the rotation values of the joints of the human body, the body's coordinates in the camera coordinate system and the binary classification results of whether the feet touch the ground, optimized rotation values that eliminate foot sliding and floating and optimized coordinates of the body in the camera coordinate system, through mean filtering and an inverse kinematics optimization algorithm.
Compared with the prior art, the motion capture method and system based on a parameterized model disclosed by the invention have the following advantages:
motion capture and the driving of digital virtual humans are achieved with a low-cost consumer RGB camera, and deployment is simple and fast;
an end-to-end motion capture system and data optimization method are provided; in actual operation, only RGB video needs to be input to obtain optimized motion capture data with a more realistic effect;
the optional multi-view scheme makes the motion capture results more accurate and stable;
detailed hand information is captured and, combined with the body information, can be applied in more practical scenarios;
the data optimization method further refines the motion capture results for a more lifelike and realistic driving effect; it is fast, flexible to configure and general, and can be applied to virtual digital human models with different skeleton structures after simple modification.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only embodiments of the present invention; for a person skilled in the art, other drawings can be derived from the provided drawings without inventive effort.
FIG. 1 is a schematic diagram of a motion capture method based on a parameterized model according to the present invention;
FIG. 2 is a schematic diagram of a human body posture estimation method based on a parameterized model provided by the invention;
FIG. 3 is a schematic diagram of a motion capture system based on a parameterized model according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort fall within the scope of protection of the present invention.
The embodiment of the invention discloses a motion capture method based on a parameterized model, as shown in FIG. 1, comprising the following steps:
s1, acquiring RGB video or RGBD video matched with depth information;
s2, positioning the positions of the target persons in the video picture, and simultaneously positioning the positions of the two hands of the target persons to obtain the target persons and the position boundary boxes of the two hands of the target persons;
s3, according to the regional image in the target person boundary box, a classification algorithm model is utilized to obtain a classification result of the person feet, and whether the person feet are in contact with the ground or not is judged;
capturing and estimating the rotation value of each joint point of the human body by utilizing the human body parameterized three-dimensional model according to the region image in the target person boundary frame and the region image of the target person double-hand boundary frame;
according to the regional image in the boundary frame of the target person, obtaining the 3D coordinates of the target person in a camera coordinate system through an absolute position estimation algorithm, and estimating the displacement information of the target person;
s4, according to the rotation value of each joint point of the human body, the coordinates of the human body in a camera coordinate system and the two classification results of whether the feet of the person are grounded or not, the rotation value for eliminating the sliding and floating of the feet and the coordinates of the human body in the camera coordinate system after optimization are obtained through an average value filtering process and an inverse kinematics optimization algorithm.
In order to further implement the above technical solution, the bounding boxes of the target person and the two hands in S2 are obtained with the mainstream object detection algorithm YOLO.
In this embodiment, the video captured by the camera is passed frame by frame through the YOLO detector, which outputs the position of the target person in each frame together with the positions of the two hands. The positions are represented as bounding boxes that must completely enclose the person and the hands; to guarantee this, the output bounding boxes are enlarged as a whole.
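As a concrete illustration, the per-frame detection loop could look like the following sketch. It assumes the ultralytics YOLO package and a hypothetical checkpoint person_hand.pt trained on person and hand classes; the enlargement factor 1.2 is likewise an assumed value, since the patent does not fix one.

```python
# Hedged sketch of the detection step; `person_hand.pt` and the 1.2
# enlargement factor are assumptions, not values from the patent.
import cv2
from ultralytics import YOLO

model = YOLO("person_hand.pt")  # hypothetical weights for person and hand classes

def enlarge(box, scale, w, h):
    """Grow a bounding box (x1, y1, x2, y2) about its center, clamped to
    the image, so that the person or hand is completely enclosed."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    hw, hh = (x2 - x1) * scale / 2, (y2 - y1) * scale / 2
    return max(0, cx - hw), max(0, cy - hh), min(w, cx + hw), min(h, cy + hh)

cap = cv2.VideoCapture("input.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    for det in model(frame)[0].boxes:  # one YOLO pass per frame
        x1, y1, x2, y2 = enlarge(det.xyxy[0].tolist(), 1.2, w, h)
        crop = frame[int(y1):int(y2), int(x1):int(x2)]
        # ... pass `crop` to the downstream classification and pose modules
cap.release()
```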
In order to further implement the above technical solution, in step S3 the binary classification model comprises a multi-layer perceptron (MLP) with an input layer, three hidden layers and an output layer; all five layers are fully connected, and the loss function of the classification model is the binary cross-entropy loss.
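A minimal PyTorch sketch of such a classifier is given below. The layer widths, the 512-dimensional input feature and the two-logit output (one per foot) are assumptions; the patent fixes only the layer count, the full connectivity and the binary cross-entropy loss.

```python
# Sketch of the foot-contact classifier; widths and the per-foot output
# layout are assumptions.
import torch
import torch.nn as nn

class FootContactMLP(nn.Module):
    def __init__(self, in_dim=512, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),   # input layer
            nn.Linear(hidden, hidden), nn.ReLU(),   # hidden layer 1
            nn.Linear(hidden, hidden), nn.ReLU(),   # hidden layer 2
            nn.Linear(hidden, hidden), nn.ReLU(),   # hidden layer 3
            nn.Linear(hidden, 2),                   # output layer: one logit per foot
        )

    def forward(self, x):
        return self.net(x)

model = FootContactMLP()
criterion = nn.BCEWithLogitsLoss()             # binary cross-entropy on the logits
feats = torch.randn(8, 512)                    # features from the person crop
labels = torch.randint(0, 2, (8, 2)).float()   # ground-truth contact per foot
loss = criterion(model(feats), labels)
loss.backward()
```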
In order to further implement the above technical solution, as shown in FIG. 2, the parameterized three-dimensional human body model in step S3 comprises an encoder, a spatial feature pyramid network and a regressor;
the encoder maps the input image to feature maps rich in semantic information, and the spatial feature pyramid network then extracts further features; finally, the regressor outputs the input parameters required by the parameterized three-dimensional human body model together with estimated camera parameters, the input parameters being the rotation values of the skeleton points. The 3D and 2D keypoint positions of the body are obtained through forward kinematics and the camera parameters and are used to compute the loss; the parameterized three-dimensional human body model is trained with a reconstruction loss function.
In order to further implement the above technical solution, the reconstruction loss function for training the parameterized three-dimensional human body model is specifically:
L_reg = λ_2d ||K - K_gt|| + λ_3d ||J - J_gt|| + λ_para ||Θ - Θ_gt||
where K denotes the 2D keypoint positions, J the 3D keypoint positions, Θ the input parameters of the parameterized three-dimensional human body model together with the camera parameters, λ the weights of the respective terms, and ||·|| the L2 norm.
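Transcribed directly into code, the loss could be computed as in this sketch; the λ values shown are placeholders, not weights given in the patent.

```python
# Sketch of L_reg = lam_2d*||K - K_gt|| + lam_3d*||J - J_gt|| + lam_para*||Theta - Theta_gt||;
# the weights are assumed placeholder values.
import torch

def reconstruction_loss(K, K_gt, J, J_gt, theta, theta_gt,
                        lam_2d=1.0, lam_3d=1.0, lam_para=0.1):
    """Weighted sum of L2-norm penalties on 2D keypoints, 3D keypoints,
    and the model input parameters together with the camera parameters."""
    return (lam_2d * torch.norm(K - K_gt)
            + lam_3d * torch.norm(J - J_gt)
            + lam_para * torch.norm(theta - theta_gt))
```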
In order to further implement the above technical solution, the absolute position estimation algorithm in step S3 comprises a backbone network and two regressors; the backbone network consists of several convolution layers and the regressors mainly of fully connected layers. The backbone network extracts features from the image, the features are fed to the two regressors to estimate, respectively, the camera parameters and the 3D coordinates relative to the root node, and the estimated camera parameters then convert the root-relative 3D coordinates into absolute 3D coordinates in the camera coordinate system.
In order to further implement the above technical solution, the loss function of the absolute position estimation algorithm is the L1 norm, specifically:
L = ||R - R_gt||_1
where R denotes the absolute 3D coordinates in the camera coordinate system.
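The two regression heads and the coordinate conversion can be illustrated as below. The backbone can be any convolutional feature extractor; the conversion rule shown, adding the estimated camera translation to the root-relative coordinates, is an assumption, since the patent does not state the exact formula.

```python
# Sketch of the two regressors and an (assumed) root-relative to absolute
# conversion; feat_dim and the camera parameterization are assumptions.
import torch
import torch.nn as nn

feat_dim = 256                        # dimensionality of the backbone features
cam_head = nn.Linear(feat_dim, 3)     # regressor 1: camera parameters
root_head = nn.Linear(feat_dim, 3)    # regressor 2: root-relative 3D coordinates

def absolute_coords(features):
    cam = cam_head(features)          # estimated camera parameters
    rel = root_head(features)         # 3D coordinates relative to the root node
    return rel + cam                  # assumed lifting to absolute camera-frame coords

# training loss L = ||R - R_gt||_1
l1_loss = nn.L1Loss()
```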
In order to further implement the above technical solution, step S2 includes a single-person mode and a multi-person mode: in the single-person mode, if several people appear in the frame, only the bounding box occupying the largest share of the frame is output; in the multi-view mode, a matching algorithm matches the bounding boxes of the same person across the different views, locating that person in every view;
in step S3, in the multi-view mode, the classification results of the individual views are aggregated, and the result returned by the majority of views is taken as the classification result for the person's feet; the rotation values output for each view are fused into final rotation values by a multi-view fusion algorithm; and the mean of the 3D coordinates estimated from the views is taken as the target person's 3D coordinates in the camera coordinate system.
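The three fusion rules can be illustrated as follows; the per-joint averaging of axis-angle rotations merely stands in for the unspecified multi-view fusion algorithm and is an assumption.

```python
# Sketch of the multi-view fusion rules; the rotation averaging is an
# assumed stand-in for the patent's unspecified fusion algorithm.
import numpy as np

def fuse_contacts(votes):
    """votes: (num_views, 2) boolean array, one column per foot; returns
    the decision of the majority of the views."""
    return votes.mean(axis=0) >= 0.5

def fuse_coordinates(coords):
    """coords: (num_views, 3); the mean of the per-view estimates is the
    target person's 3D coordinates in the camera coordinate system."""
    return coords.mean(axis=0)

def fuse_rotations(rots):
    """rots: (num_views, num_joints, 3) axis-angle values (assumed
    representation), averaged per joint across the views."""
    return rots.mean(axis=0)
```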
In order to further implement the above technical solution, step S4 specifically comprises the following steps (a condensed sketch is given after the list):
S41, mean-filtering the rotation values of the joints of the human body and the body's coordinates in the camera coordinate system to remove jitter from the data;
in practical applications the filtering handles zero-mean jitter well, but foot sliding and floating can still occur when the virtual digital human is actually driven, so an inverse-kinematics-based optimization is further applied:
S42, computing new foot keypoint positions by interpolation, according to the classification results of whether the left and right feet of the human body touch the ground;
S43, with the new foot keypoint positions as a constraint, optimizing the rotation values of the joints of the human body by iterative numerical optimization.
A motion capture system based on a parameterized model, as shown in FIG. 3, comprises a human body detection module, a foot contact detection module, a human body posture capture module, an absolute position estimation module and a data optimization module;
the human body detection module is used for collecting RGB video, or RGBD video with matched depth information, locating the position of the target person in the video frame together with the positions of the target person's two hands, and obtaining bounding boxes for the target person and for the target person's two hands;
the foot contact detection module is used for obtaining, from the region image inside the target person's bounding box, a binary classification result for the person's feet with a classification model, judging whether the feet are in contact with the ground;
the human body posture capture module is used for capturing and estimating the rotation value of each joint of the human body with the parameterized three-dimensional human body model, from the region image inside the target person's bounding box and the region images inside the two hand bounding boxes;
the absolute position estimation module is used for obtaining the target person's 3D coordinates in the camera coordinate system through an absolute position estimation algorithm, from the region image inside the target person's bounding box, and estimating the target person's displacement;
the data optimization module is used for obtaining, from the rotation values of the joints of the human body, the body's coordinates in the camera coordinate system and the binary classification results of whether the feet touch the ground, optimized rotation values that eliminate foot sliding and floating and optimized coordinates of the body in the camera coordinate system, through mean filtering and an inverse kinematics optimization algorithm.
A computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the motion capture method based on a parameterized model.
A processing terminal comprises a memory and a processor; the memory stores a computer program runnable on the processor, and the processor implements the motion capture method based on a parameterized model when executing the computer program.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments can be referred to one another. Since the disclosed system corresponds to the disclosed method, its description is relatively brief, and the relevant points can be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A motion capture method based on a parameterized model, characterized by comprising the following steps:
S1, acquiring RGB video, or RGBD video with matched depth information;
S2, locating the position of the target person in the video frame and, at the same time, the positions of the target person's two hands, to obtain bounding boxes for the target person and for the target person's two hands;
S3, from the region image inside the target person's bounding box, obtaining a binary classification result for the person's feet with a classification model, judging whether each foot is in contact with the ground;
from the region image inside the target person's bounding box and the region images inside the two hand bounding boxes, capturing and estimating the rotation value of each joint of the human body with the parameterized three-dimensional human body model;
from the region image inside the target person's bounding box, obtaining the target person's 3D coordinates in the camera coordinate system through an absolute position estimation algorithm, and estimating the target person's displacement;
S4, from the rotation values of the joints of the human body, the body's coordinates in the camera coordinate system and the binary classification results of whether the feet touch the ground, obtaining, through mean filtering and an inverse kinematics optimization algorithm, optimized rotation values that eliminate foot sliding and floating and optimized coordinates of the body in the camera coordinate system.
2. The motion capture method based on a parameterized model of claim 1, characterized in that the bounding boxes of the target person and the two hands in S2 are obtained with the mainstream object detection algorithm YOLO.
3. The motion capture method based on a parameterized model of claim 1, characterized in that in step S3 the binary classification model comprises a multi-layer perceptron (MLP) with an input layer, three hidden layers and an output layer, all five layers being fully connected, and that the loss function of the classification model is the binary cross-entropy loss.
4. The motion capture method based on a parameterized model of claim 1, characterized in that the parameterized three-dimensional human body model in step S3 comprises an encoder, a spatial feature pyramid network and a regressor;
the encoder maps the input image to feature maps rich in semantic information, and the spatial feature pyramid network then extracts further features; finally, the regressor outputs the input parameters required by the parameterized three-dimensional human body model together with estimated camera parameters, the input parameters being the rotation values of the skeleton points; the 3D and 2D keypoint positions of the body are obtained through forward kinematics and the camera parameters and are used to compute the loss, and the parameterized three-dimensional human body model is trained with a reconstruction loss function.
5. The motion capture method based on a parameterized model of claim 4, characterized in that the reconstruction loss function for training the parameterized three-dimensional human body model is specifically:
L_reg = λ_2d ||K - K_gt|| + λ_3d ||J - J_gt|| + λ_para ||Θ - Θ_gt||
where K denotes the 2D keypoint positions, J the 3D keypoint positions, Θ the input parameters of the parameterized three-dimensional human body model together with the camera parameters, λ the weights of the respective terms, and ||·|| the L2 norm.
6. The motion capture method based on a parameterized model of claim 1, characterized in that the absolute position estimation algorithm in step S3 comprises a backbone network and two regressors; the backbone network consists of several convolution layers and the regressors mainly of fully connected layers; the backbone network extracts features from the image, the features are fed to the two regressors to estimate, respectively, the camera parameters and the 3D coordinates relative to the root node, and the estimated camera parameters then convert the root-relative 3D coordinates into absolute 3D coordinates in the camera coordinate system.
7. The motion capture method based on a parameterized model of claim 6, characterized in that the loss function of the absolute position estimation algorithm is the L1 norm, specifically:
L = ||R - R_gt||_1
where R denotes the absolute 3D coordinates in the camera coordinate system.
8. The motion capture method based on a parameterized model of claim 1, characterized in that step S2 includes a single-person mode and a multi-person mode, wherein in the single-person mode, if several people appear in the frame, only the bounding box occupying the largest share of the frame is output, and in the multi-view mode a matching algorithm matches the bounding boxes of the same person across the different views, locating that person in every view;
and in that, in step S3, in the multi-view mode, the classification results of the individual views are aggregated and the result returned by the majority of views is taken as the classification result for the person's feet; the rotation values output for each view are fused into final rotation values by a multi-view fusion algorithm; and the mean of the 3D coordinates estimated from the views is taken as the target person's 3D coordinates in the camera coordinate system.
9. The motion capture method based on a parameterized model of claim 1, characterized in that step S4 specifically comprises:
S41, mean-filtering the rotation values of the joints of the human body and the body's coordinates in the camera coordinate system to remove jitter from the data;
S42, computing new foot keypoint positions by interpolation, according to the classification results of whether the left and right feet of the human body touch the ground;
S43, with the new foot keypoint positions as a constraint, optimizing the rotation values of the joints of the human body by iterative numerical optimization.
10. A motion capture system based on a parameterized model, applying the motion capture method based on a parameterized model of any one of claims 1-9 and characterized by comprising a human body detection module, a foot contact detection module, a human body posture capture module, an absolute position estimation module and a data optimization module;
the human body detection module is used for collecting RGB video, or RGBD video with matched depth information, locating the position of the target person in the video frame together with the positions of the target person's two hands, and obtaining bounding boxes for the target person and for the target person's two hands;
the foot contact detection module is used for obtaining, from the region image inside the target person's bounding box, a binary classification result for the person's feet with a classification model, judging whether the feet are in contact with the ground;
the human body posture capture module is used for capturing and estimating the rotation value of each joint of the human body with the parameterized three-dimensional human body model, from the region image inside the target person's bounding box and the region images inside the two hand bounding boxes;
the absolute position estimation module is used for obtaining the target person's 3D coordinates in the camera coordinate system through an absolute position estimation algorithm, from the region image inside the target person's bounding box, and estimating the target person's displacement;
the data optimization module is used for obtaining, from the rotation values of the joints of the human body, the body's coordinates in the camera coordinate system and the binary classification results of whether the feet touch the ground, optimized rotation values that eliminate foot sliding and floating and optimized coordinates of the body in the camera coordinate system, through mean filtering and an inverse kinematics optimization algorithm.
Application CN202311754272.6A, priority and filing date 2023-12-20, title: Motion capturing method and system based on parameterized model, status: pending, publication: CN117541646A (en)

Priority Applications (1)

Application number: CN202311754272.6A; priority date: 2023-12-20; filing date: 2023-12-20; title: Motion capturing method and system based on parameterized model

Publications (1)

Publication number: CN117541646A (en); publication date: 2024-02-09

Family

ID=89792079

Family Applications (1)

Application number: CN202311754272.6A; priority date: 2023-12-20; filing date: 2023-12-20; published as CN117541646A (pending)

Country Status (1)

CN: CN117541646A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190286892A1 (en) * 2018-03-13 2019-09-19 Adobe Inc. Interaction Detection Model for Identifying Human-Object Interactions in Image Content
US11182924B1 (en) * 2019-03-22 2021-11-23 Bertec Corporation System for estimating a three dimensional pose of one or more persons in a scene
CN113033369A (en) * 2021-03-18 2021-06-25 北京达佳互联信息技术有限公司 Motion capture method, motion capture device, electronic equipment and computer-readable storage medium
WO2022241583A1 (en) * 2021-05-15 2022-11-24 电子科技大学 Family scenario motion capture method based on multi-target video
CN114550292A (en) * 2022-02-21 2022-05-27 东南大学 High-physical-reality human body motion capture method based on neural motion control
CN114519758A (en) * 2022-02-28 2022-05-20 广州虎牙科技有限公司 Method and device for driving virtual image and server
CN116386141A (en) * 2023-03-30 2023-07-04 南京大学 Multi-stage human motion capturing method, device and medium based on monocular video
CN116934972A (en) * 2023-07-26 2023-10-24 石家庄铁道大学 Three-dimensional human body reconstruction method based on double-flow network
CN116721471A (en) * 2023-08-10 2023-09-08 中国科学院合肥物质科学研究院 Multi-person three-dimensional attitude estimation method based on multi-view angles
CN116958355A (en) * 2023-08-21 2023-10-27 北京字跳网络技术有限公司 Action animation generation method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XINLIANG WEI et al.: "The Application of Motion Capture and 3D Skeleton Modeling in Virtual Fighting", Next Generation Computer Animation Techniques, conference paper, 1 November 2017 (2017-11-01) *
罗飘; 刘晓平: "Robust footprint detection for Kinect motion data" (面向Kinect运动数据的鲁棒足迹检测), 中国图象图形学报 (Journal of Image and Graphics), no. 02, 16 February 2016 (2016-02-16) *

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination