CN117541646A - Motion capturing method and system based on parameterized model - Google Patents
- Publication number
- CN117541646A CN117541646A CN202311754272.6A CN202311754272A CN117541646A CN 117541646 A CN117541646 A CN 117541646A CN 202311754272 A CN202311754272 A CN 202311754272A CN 117541646 A CN117541646 A CN 117541646A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/70 — Determining position or orientation of objects or cameras (G06T7/00, Image analysis)
- G06N3/0464 — Convolutional networks [CNN, ConvNet] (G06N3/04, Neural network architecture)
- G06T17/00 — Three-dimensional [3D] modelling, e.g. data description of 3D objects
- G06V10/82 — Image or video recognition or understanding using neural networks (G06V10/70, using pattern recognition or machine learning)
- G06V40/107 — Static hand or arm (G06V40/10, Human or animal bodies; body parts)
Abstract
The invention discloses a motion capture method and system based on a parameterized model. A human body detection module acquires RGB video, or RGBD video with matched depth information, and obtains bounding boxes for a target person and for the target person's two hands. From the region images inside the target person's bounding box and the two-hand bounding boxes, a foot contact detection module obtains a binary classification result for the person's feet using a binary classification algorithm model. A human body posture capture module captures and estimates the rotation value of each joint point of the human body using a parameterized three-dimensional human body model. An absolute position estimation module obtains the 3D coordinates of the target person in the camera coordinate system through an absolute position estimation algorithm. Finally, according to the joint rotation values, the coordinates of the human body in the camera coordinate system, and the binary classification results of whether each foot is grounded, a data optimization module applies mean filtering and an inverse-kinematics optimization algorithm to obtain optimized rotation values that eliminate foot sliding and floating, together with optimized coordinates of the human body in the camera coordinate system.
Description
Technical Field
The invention relates to the technical field of computer vision and human motion capture, in particular to a motion capture method and system based on a parameterized model.
Background
At present, human motion capture is a technology in demand for digital humans and the metaverse, and relatively mature schemes already exist. Current motion capture technology can capture fairly accurate motion with only a camera and no worn equipment, which reduces the cost of motion capture compared with methods that require wearable devices.
However, most camera-based motion capture methods focus only on the limbs and ignore hand motions, and the capture results often suffer from foot floating and sliding, which degrades the visual quality.
Therefore, how to capture whole-body motion while eliminating foot sliding, so as to restore more realistic motion, is a problem that persons skilled in the art need to solve.
Disclosure of Invention
In view of the above, the present invention provides a motion capture method and system based on a parameterized model to solve some of the technical problems mentioned in the background art.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a motion capture method based on a parameterized model comprises the following steps:
s1, acquiring RGB video or RGBD video matched with depth information;
s2, positioning the positions of the target persons in the video picture, and simultaneously positioning the positions of the two hands of the target persons to obtain the target persons and the position boundary boxes of the two hands of the target persons;
s3, according to the regional image in the target person boundary box, a classification algorithm model is utilized to obtain a classification result of the person feet, and whether the person feet are in contact with the ground or not is judged;
capturing and estimating the rotation value of each joint point of the human body by utilizing the human body parameterized three-dimensional model according to the region image in the target person boundary frame and the region image of the target person double-hand boundary frame;
according to the regional image in the boundary frame of the target person, obtaining the 3D coordinates of the target person in a camera coordinate system through an absolute position estimation algorithm, and estimating the displacement information of the target person;
s4, according to the rotation value of each joint point of the human body, the coordinates of the human body in a camera coordinate system and the two classification results of whether the feet of the person are grounded or not, the rotation value for eliminating the sliding and floating of the feet and the coordinates of the human body in the camera coordinate system after optimization are obtained through an average value filtering process and an inverse kinematics optimization algorithm.
Preferably, the bounding boxes of the target person and the two hands in S2 are obtained by the mainstream target detection algorithm YOLO.
Preferably, in step S3, the binary classification algorithm model comprises a multi-layer perceptron (MLP) with an input layer, three hidden layers, and an output layer; all five layers are fully connected, and the loss function of the model is the binary cross-entropy loss.
Preferably, the parameterized three-dimensional human body model in step S3 comprises an encoder, a spatial feature pyramid network, and a regressor;
the input image is passed through the encoder, which outputs feature maps rich in semantic information; these are fed into the spatial feature pyramid network to further extract features; finally, the regressor outputs the input parameters required by the parameterized three-dimensional human body model together with estimated camera parameters. The input parameters are the rotation values of the skeletal joint points; the 3D and 2D keypoint positions of the human body are obtained through forward kinematics and the camera parameters and are used to compute the loss, and the parameterized three-dimensional human body model is trained with a reconstruction loss function.
Preferably, the reconstruction loss function for training the parameterized three-dimensional human body model is specifically:
L_reg = λ_2d·‖K − K_gt‖ + λ_3d·‖J − J_gt‖ + λ_para·‖Θ − Θ_gt‖
where K denotes the 2D keypoint positions, J denotes the 3D keypoint positions, Θ denotes the input parameters of the parameterized three-dimensional human body model together with the camera parameters, the subscript gt marks the ground truth, λ denotes the weights of the respective terms, and ‖·‖ denotes the L2 norm.
Preferably, the absolute position estimation algorithm in step S3 comprises a backbone network and two regressors, wherein the backbone network is composed of multiple convolution layers and the regressors are mainly composed of fully connected layers. The backbone network extracts features from the image, which are fed into the two regressors to estimate the camera parameters and the 3D coordinates relative to the root node, respectively; the estimated camera parameters are then used to convert the root-relative 3D coordinates into absolute 3D coordinates in the camera coordinate system.
Preferably, the loss function of the absolute position estimation algorithm is an L1 norm, specifically:
L = ‖R − R_gt‖₁
where R denotes the absolute 3D coordinates in the camera coordinate system and R_gt the corresponding ground truth.
Preferably, step S2 includes a single-person mode or a multi-view mode; in the single-person mode, if multiple persons appear in the frame, only the bounding box occupying the largest proportion of the frame is output; in the multi-view mode, a matching algorithm matches the bounding boxes of the same person across the different views, thereby locating the same person in each view;
in step S3, in the multi-view mode, the classification results of the multiple views are aggregated and the classification of the majority of views is taken as the classification result for the person's feet; the rotation values output for each view are fused into a final rotation value through a multi-view fusion algorithm; and the mean of the 3D coordinates estimated from the multiple views is taken as the 3D coordinates of the target person in the camera coordinate system.
Preferably, the specific content of step S4 includes:
S41, applying mean filtering to the rotation values of the joint points of the human body and to the coordinates of the human body in the camera coordinate system, to eliminate jitter in the data;
S42, calculating new foot keypoint positions by interpolation, according to the binary classification results of whether the left and right feet contact the ground;
S43, taking the new foot keypoint positions as constraints, optimizing the rotation values of the joint points of the human body through iterative numerical optimization.
The motion capture system based on the parameterized model comprises a human body detection module, a foot contact detection module, a human body posture capture module, an absolute position estimation module, and a data optimization module;
the human body detection module is used for acquiring RGB video, or RGBD video with matched depth information, locating the position of the target person in the video frame, locating the positions of the target person's two hands, and obtaining bounding boxes for the target person and for the two hands;
the foot contact detection module is used for obtaining a binary classification result for the person's feet using a binary classification algorithm model, according to the region image inside the target person's bounding box, and judging whether each foot is in contact with the ground;
the human body posture capture module is used for capturing and estimating the rotation value of each joint point of the human body using the parameterized three-dimensional human body model, according to the region image inside the target person's bounding box and the region images inside the two-hand bounding boxes;
the absolute position estimation module is used for obtaining the 3D coordinates of the target person in the camera coordinate system through an absolute position estimation algorithm, according to the region image inside the target person's bounding box, and estimating the displacement information of the target person;
the data optimization module is used for obtaining, through mean filtering and an inverse-kinematics optimization algorithm, optimized rotation values that eliminate foot sliding and floating together with the coordinates of the human body in the camera coordinate system, according to the rotation values of the joint points, the coordinates of the human body in the camera coordinate system, and the binary classification results of whether each foot is grounded.
Compared with the prior art, the motion capture method and system based on a parameterized model disclosed by the invention have the following advantages:
motion capture and the driving of digital virtual humans are realized with a low-cost consumer RGB camera, and deployment is simple and fast;
an end-to-end motion capture system and data optimization method are provided; in actual operation, optimized motion capture data can be obtained with only RGB video as input, achieving a more realistic effect;
the optional multi-view scheme makes the motion capture results more accurate and stable;
detailed hand information can be captured and combined with body information, enabling application to more practical scenarios;
the data optimization method further improves the motion capture results, achieving a more anthropomorphic and realistic driving effect; it is fast, flexible to configure, and general, and can be applied to virtual digital human models with different skeletal structures through simple modification.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a motion capture method based on a parameterized model according to the present invention;
FIG. 2 is a schematic diagram of a human body posture estimation method based on a parameterized model provided by the invention;
FIG. 3 is a schematic diagram of a motion capture system based on a parameterized model according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention discloses a motion capture method based on a parameterized model, as shown in fig. 1, comprising the following steps:
s1, acquiring RGB video or RGBD video matched with depth information;
s2, positioning the positions of the target persons in the video picture, and simultaneously positioning the positions of the two hands of the target persons to obtain the target persons and the position boundary boxes of the two hands of the target persons;
s3, according to the regional image in the target person boundary box, a classification algorithm model is utilized to obtain a classification result of the person feet, and whether the person feet are in contact with the ground or not is judged;
capturing and estimating the rotation value of each joint point of the human body by utilizing the human body parameterized three-dimensional model according to the region image in the target person boundary frame and the region image of the target person double-hand boundary frame;
according to the regional image in the boundary frame of the target person, obtaining the 3D coordinates of the target person in a camera coordinate system through an absolute position estimation algorithm, and estimating the displacement information of the target person;
s4, according to the rotation value of each joint point of the human body, the coordinates of the human body in a camera coordinate system and the two classification results of whether the feet of the person are grounded or not, the rotation value for eliminating the sliding and floating of the feet and the coordinates of the human body in the camera coordinate system after optimization are obtained through an average value filtering process and an inverse kinematics optimization algorithm.
In order to further implement the above technical solution, the bounding boxes of the target person and the two hands in S2 are obtained by the mainstream target detection algorithm YOLO.
In this embodiment, the video captured by the camera is passed frame by frame through the mainstream object detection algorithm YOLO, which outputs the position of the target person and of the person's two hands in each frame. The positions are represented as bounding boxes that must completely contain the person and the hands; to ensure this, the output bounding boxes are enlarged as a whole.
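The bounding-box enlargement described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the scale factor of 1.2 is an assumed margin, as the patent does not specify one.

```python
def expand_bbox(x1, y1, x2, y2, img_w, img_h, scale=1.2):
    """Enlarge a detector bounding box about its center so the person or
    hands are fully contained, clamping the result to the image bounds.
    `scale` is an illustrative margin, not a value from the patent."""
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2        # box center
    w, h = (x2 - x1) * scale, (y2 - y1) * scale  # enlarged width/height
    # Clamp the enlarged box to the image frame
    nx1, ny1 = max(0, cx - w / 2), max(0, cy - h / 2)
    nx2, ny2 = min(img_w, cx + w / 2), min(img_h, cy + h / 2)
    return nx1, ny1, nx2, ny2
```

Per-frame detections from the detector would each be passed through this helper before the region image is cropped.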
In order to further implement the above technical solution, in step S3 the binary classification algorithm model comprises a multi-layer perceptron (MLP) with an input layer, three hidden layers, and an output layer; all five layers are fully connected, and the loss function of the model is the binary cross-entropy loss.
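A minimal sketch of the five-layer fully connected foot-contact classifier and its binary cross-entropy loss is given below. The layer sizes, ReLU activations, and sigmoid output are assumptions for illustration; the patent specifies only the layer count and the loss.

```python
import math

def linear(x, W, b):
    # Fully connected layer: y = W x + b (W as list of rows)
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def mlp_contact_prob(features, layers):
    """Forward pass of the binary foot-contact MLP: the hidden layers use
    ReLU (assumed) and the scalar output is squashed with a sigmoid to give
    a ground-contact probability."""
    h = features
    for i, (W, b) in enumerate(layers):
        h = linear(h, W, b)
        if i < len(layers) - 1:  # no activation after the output layer
            h = relu(h)
    return sigmoid(h[0])

def bce_loss(p, label):
    # Binary cross-entropy loss used to train the classifier
    eps = 1e-7
    p = min(max(p, eps), 1 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))
```

In practice each foot would get its own contact probability, thresholded at 0.5 to give the binary classification result.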
In order to further implement the above technical solution, as shown in fig. 2, the parameterized three-dimensional human body model in step S3 comprises an encoder, a spatial feature pyramid network, and a regressor;
the input image is passed through the encoder, which outputs feature maps rich in semantic information; these are fed into the spatial feature pyramid network to further extract features; finally, the regressor outputs the input parameters required by the parameterized three-dimensional human body model together with estimated camera parameters. The input parameters are the rotation values of the skeletal joint points; the 3D and 2D keypoint positions of the human body are obtained through forward kinematics and the camera parameters and are used to compute the loss, and the parameterized three-dimensional human body model is trained with a reconstruction loss function.
In order to further implement the above technical solution, the reconstruction loss function for training the parameterized three-dimensional human body model is specifically:
L_reg = λ_2d·‖K − K_gt‖ + λ_3d·‖J − J_gt‖ + λ_para·‖Θ − Θ_gt‖
where K denotes the 2D keypoint positions, J denotes the 3D keypoint positions, Θ denotes the input parameters of the parameterized three-dimensional human body model together with the camera parameters, the subscript gt marks the ground truth, λ denotes the weights of the respective terms, and ‖·‖ denotes the L2 norm.
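The reconstruction loss above can be computed directly from flattened keypoint and parameter vectors; a sketch follows. The default λ weights of 1.0 are placeholders, since the patent leaves the weights unspecified.

```python
import math

def l2(a, b):
    # L2 norm of the difference between two flat vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def reconstruction_loss(K, K_gt, J, J_gt, theta, theta_gt,
                        lam_2d=1.0, lam_3d=1.0, lam_para=1.0):
    """L_reg = lam_2d*||K-K_gt|| + lam_3d*||J-J_gt|| + lam_para*||Theta-Theta_gt||
    K: flat 2D keypoints, J: flat 3D keypoints, theta: model + camera
    parameters. The lam_* defaults are illustrative, not from the patent."""
    return (lam_2d * l2(K, K_gt)
            + lam_3d * l2(J, J_gt)
            + lam_para * l2(theta, theta_gt))
```

During training, K and J would come from forward kinematics plus the estimated camera parameters, as described above.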
In order to further implement the above technical solution, the absolute position estimation algorithm in step S3 comprises a backbone network and two regressors, wherein the backbone network is composed of multiple convolution layers and the regressors are mainly composed of fully connected layers. The backbone network extracts features from the image, which are fed into the two regressors to estimate the camera parameters and the 3D coordinates relative to the root node, respectively; the estimated camera parameters are then used to convert the root-relative 3D coordinates into absolute 3D coordinates in the camera coordinate system.
In order to further implement the above technical solution, the loss function of the absolute position estimation algorithm is an L1 norm, specifically:
L = ‖R − R_gt‖₁
where R denotes the absolute 3D coordinates in the camera coordinate system and R_gt the corresponding ground truth.
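The L1 position loss is simply the sum of absolute coordinate errors; a one-line sketch for a single 3D position:

```python
def l1_position_loss(R, R_gt):
    """L = ||R - R_gt||_1 : L1 norm between the estimated and ground-truth
    absolute 3D coordinates (x, y, z) in the camera coordinate system."""
    return sum(abs(r - g) for r, g in zip(R, R_gt))
```

In a training loop this would be averaged over a batch of root-node positions.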
In order to further implement the above technical solution, step S2 includes a single-person mode or a multi-view mode; in the single-person mode, if multiple persons appear in the frame, only the bounding box occupying the largest proportion of the frame is output; in the multi-view mode, a matching algorithm matches the bounding boxes of the same person across the different views, thereby locating the same person in each view;
in step S3, in the multi-view mode, the classification results of the multiple views are aggregated and the classification of the majority of views is taken as the classification result for the person's feet; the rotation values output for each view are fused into a final rotation value through a multi-view fusion algorithm; and the mean of the 3D coordinates estimated from the multiple views is taken as the 3D coordinates of the target person in the camera coordinate system.
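The majority vote over foot-contact classifications and the averaging of per-view 3D positions can be sketched as below. The rotation-value fusion algorithm itself is not detailed in the patent, so only the two aggregation rules it does state are shown.

```python
def fuse_contact_votes(votes):
    """Majority vote over per-view foot-contact classifications
    (True = the foot is grounded in that view)."""
    return sum(votes) > len(votes) / 2

def fuse_positions(per_view_coords):
    """Mean of the per-view 3D coordinate estimates of the target person,
    giving the fused position in the camera coordinate system."""
    n = len(per_view_coords)
    return [sum(c[i] for c in per_view_coords) / n for i in range(3)]
```

With an even number of views, ties fall to "not grounded" here; the patent does not specify a tie-breaking rule.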
In order to further implement the above technical solution, the specific content of step S4 includes:
s41, eliminating jitter information in data through mean value filtering processing of rotation values of all joint points of a human body and coordinates of the human body in a camera coordinate system;
in practical application, the filtering process has a good effect on jitter with the mean value of 0, and the phenomenon of foot sliding and floating can also occur when a virtual digital person is actually driven, so that an optimization algorithm based on inverse kinematics is further carried out:
s42, calculating the position of a new foot key point according to the classification result of whether the left foot and the right foot of the human body contact the ground or not by an interpolation method;
s43, optimizing the rotation value of each joint point of the human body in an iterative numerical optimization mode by taking the position of the new foot key point as a constraint.
A motion capture system based on a parameterized model, as shown in fig. 3, comprises a human body detection module, a foot contact detection module, a human body posture capture module, an absolute position estimation module, and a data optimization module;
the human body detection module is used for acquiring RGB video, or RGBD video with matched depth information, locating the position of the target person in the video frame, locating the positions of the target person's two hands, and obtaining bounding boxes for the target person and for the two hands;
the foot contact detection module is used for obtaining a binary classification result for the person's feet using a binary classification algorithm model, according to the region image inside the target person's bounding box, and judging whether each foot is in contact with the ground;
the human body posture capture module is used for capturing and estimating the rotation value of each joint point of the human body using the parameterized three-dimensional human body model, according to the region image inside the target person's bounding box and the region images inside the two-hand bounding boxes;
the absolute position estimation module is used for obtaining the 3D coordinates of the target person in the camera coordinate system through an absolute position estimation algorithm, according to the region image inside the target person's bounding box, and estimating the displacement information of the target person;
the data optimization module is used for obtaining, through mean filtering and an inverse-kinematics optimization algorithm, optimized rotation values that eliminate foot sliding and floating together with the coordinates of the human body in the camera coordinate system, according to the rotation values of the joint points, the coordinates of the human body in the camera coordinate system, and the binary classification results of whether each foot is grounded.
A computer-readable storage medium has a computer program stored thereon which, when executed by a processor, implements the parameterized-model-based motion capture method.
A processing terminal comprises a memory and a processor, the memory storing a computer program executable on the processor, the processor implementing the parameterized-model-based motion capture method when executing the computer program.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A motion capture method based on a parameterized model, characterized by comprising the following steps:
S1, acquiring RGB video, or RGBD video with matched depth information;
S2, locating the position of the target person in the video frame and, at the same time, locating the positions of the target person's two hands, to obtain bounding boxes for the target person and for the target person's two hands;
S3, according to the region image inside the target person's bounding box, obtaining a binary classification result for the person's feet using a binary classification algorithm model, and judging whether each foot is in contact with the ground;
capturing and estimating the rotation value of each joint point of the human body using the parameterized three-dimensional human body model, according to the region image inside the target person's bounding box and the region images inside the two-hand bounding boxes;
according to the region image inside the target person's bounding box, obtaining the 3D coordinates of the target person in the camera coordinate system through an absolute position estimation algorithm, and estimating the displacement information of the target person;
S4, according to the rotation values of the joint points of the human body, the coordinates of the human body in the camera coordinate system, and the binary classification results of whether each foot is grounded, obtaining, through mean filtering and an inverse-kinematics optimization algorithm, optimized rotation values that eliminate foot sliding and floating together with optimized coordinates of the human body in the camera coordinate system.
2. The parameterized-model-based motion capture method of claim 1, wherein the bounding boxes of the target person and the person's two hands in S2 are obtained by the mainstream target detection algorithm YOLO.
3. The parameterized-model-based motion capture method of claim 1, wherein in step S3 the binary classification algorithm model comprises a multi-layer perceptron (MLP) with an input layer, three hidden layers, and an output layer, all five layers being fully connected, and the loss function of the model is the binary cross-entropy loss.
4. The method of claim 1, wherein the parameterized three-dimensional human body model in step S3 comprises an encoder, a spatial feature pyramid network, and a regressor;
the input image is passed through the encoder, which outputs feature maps rich in semantic information; these are fed into the spatial feature pyramid network to further extract features; finally, the regressor outputs the input parameters required by the parameterized three-dimensional human body model together with estimated camera parameters, the input parameters being the rotation values of the skeletal joint points; the 3D and 2D keypoint positions of the human body are obtained through forward kinematics and the camera parameters and are used to compute the loss, and the parameterized three-dimensional human body model is trained with a reconstruction loss function.
5. The parameterized-model-based motion capture method of claim 4, wherein the reconstruction loss function for training the parameterized three-dimensional human body model is specifically:
L_reg = λ_2d·‖K − K_gt‖ + λ_3d·‖J − J_gt‖ + λ_para·‖Θ − Θ_gt‖
where K denotes the 2D keypoint positions, J denotes the 3D keypoint positions, Θ denotes the input parameters of the parameterized three-dimensional human body model together with the camera parameters, the subscript gt marks the ground truth, λ denotes the weights of the respective terms, and ‖·‖ denotes the L2 norm.
6. The motion capture method based on a parameterized model of claim 1, wherein the absolute position estimation algorithm in step S3 comprises a backbone network and two regressors, wherein the backbone network is composed of a plurality of convolution layers and the regressors are mainly composed of fully connected layers; image features are extracted by the backbone network and fed to the two regressors, which respectively estimate the camera parameters and the 3D coordinates relative to the root node; the estimated camera parameters are then used to convert the root-relative 3D coordinates into absolute 3D coordinates in the camera coordinate system.
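The final conversion step of claim 6 can be sketched under a simplifying assumption: if the camera-parameter regressor yields an absolute root position in the camera frame, the root-relative joint coordinates become absolute by translation. The actual camera model in the patent may differ; this only illustrates the relative-to-absolute idea.

```python
def to_absolute(rel_coords, root_cam):
    """Convert root-relative 3D joint coordinates to absolute camera-frame
    coordinates by adding the estimated root position (a simplifying
    assumption; the patent's camera model may be more involved)."""
    rx, ry, rz = root_cam
    return [(x + rx, y + ry, z + rz) for x, y, z in rel_coords]

# root node itself maps to the estimated root position
abs_coords = to_absolute([(0, 0, 0), (1, 2, 3)], (10, 20, 30))
```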
7. The motion capture method based on the parameterized model of claim 6, wherein the loss function of the absolute position estimation algorithm is the L1 norm, specifically:

L = ||R − R_gt||_1

where R represents the absolute 3D coordinates in the camera coordinate system and R_gt represents the corresponding ground-truth coordinates.
8. The method of claim 1, wherein step S2 includes a single-person mode or a multi-view mode; in the single-person mode, if multiple persons appear in the frame, only the bounding box of the person occupying the largest proportion of the frame is output; in the multi-view mode, the bounding boxes of the same person in different views are matched by a matching algorithm, so as to locate the position of that person in each view;
in step S3, in the multi-view mode, the classification results of the multiple views are aggregated, and the classification result given by the majority of views is taken as the foot-contact classification result for the person's two feet; the rotation values output by each view are fused by a multi-view fusion algorithm into the final rotation values; and the mean of the 3D coordinates estimated from the multiple views is taken as the 3D coordinates of the target person in the camera coordinate system.
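The two aggregation rules this claim states explicitly, majority vote over per-view classifications and the mean of per-view 3D estimates, can be sketched directly (the rotation-value fusion algorithm itself is not specified by the claim, so it is omitted here):

```python
def fuse_contact(votes):
    """Majority vote over per-view foot-contact classifications
    (True = foot touches the ground in that view)."""
    return sum(votes) > len(votes) / 2

def fuse_coords(view_coords):
    """Mean of per-view 3D coordinate estimates, taken as the target
    person's 3D coordinates in the camera coordinate system."""
    n = len(view_coords)
    return tuple(sum(c[i] for c in view_coords) / n for i in range(3))

contact = fuse_contact([True, True, False])        # 2 of 3 views agree
coords = fuse_coords([(0, 0, 0), (2, 4, 6)])       # mean of two views
```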
9. The method for motion capture based on a parameterized model of claim 1, wherein the specific content of step S4 comprises:
S41, applying mean filtering to the rotation values of each joint point of the human body and to the coordinates of the human body in the camera coordinate system, thereby eliminating jitter in the data;
S42, calculating new foot key point positions by interpolation according to the classification result of whether the left foot and the right foot of the human body contact the ground;
S43, optimizing the rotation values of each joint point of the human body by iterative numerical optimization, taking the new foot key point positions as a constraint.
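Steps S41 and S42 can be illustrated with two small helpers. The sliding-window mean filter is a standard realization of S41; the foot-anchoring function is an assumed, simplified reading of S42 (hold the foot key point at its first contact position during a contact interval), not the patent's exact interpolation scheme. S43's iterative inverse-kinematics optimization is omitted as the claim does not specify its form.

```python
def mean_filter(seq, window=3):
    """Sliding-window mean filter to suppress jitter (step S41 idea)."""
    half = window // 2
    out = []
    for i in range(len(seq)):
        lo, hi = max(0, i - half), min(len(seq), i + half + 1)
        out.append(sum(seq[lo:hi]) / (hi - lo))
    return out

def anchor_foot(positions, contact):
    """During a detected ground-contact interval, hold the foot key point
    at its first contact position to remove sliding (a simplified
    assumption standing in for the interpolation of step S42)."""
    out, held = [], None
    for p, c in zip(positions, contact):
        if c:
            held = p if held is None else held
            out.append(held)
        else:
            held = None
            out.append(p)
    return out

smoothed = mean_filter([0, 3, 0])
anchored = anchor_foot([1, 2, 3, 4], [False, True, True, False])
```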
10. A motion capture system based on a parameterized model, characterized in that the system, based on the parameterized-model motion capture method of any one of claims 1-9, comprises a human body detection module, a foot contact detection module, a human body pose capture module, an absolute position estimation module and a data optimization module;
the human body detection module is used to collect RGB video or RGBD video with matched depth information, locate the positions of the target person and of the target person's hands in the video frame, and obtain bounding boxes for the target person and for the target person's hands;
the foot contact detection module is used to obtain, from the image region within the target person's bounding box and by means of the binary classification algorithm model, a classification result indicating whether each of the person's feet is in contact with the ground;
the human body pose capture module is used to estimate the rotation values of each joint point of the human body, using the parameterized three-dimensional human body model, from the image region within the target person's bounding box and the image regions within the hand bounding boxes;
the absolute position estimation module is used to obtain, from the image region within the target person's bounding box and by means of the absolute position estimation algorithm, the 3D coordinates of the target person in the camera coordinate system, and to estimate the displacement information of the target person;
the data optimization module is used to obtain, from the rotation values of each joint point of the human body, the coordinates of the human body in the camera coordinate system and the binary classification results of foot-ground contact, and by means of mean filtering and an inverse kinematics optimization algorithm, optimized rotation values free of foot sliding and floating together with the coordinates of the human body in the camera coordinate system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311754272.6A CN117541646A (en) | 2023-12-20 | 2023-12-20 | Motion capturing method and system based on parameterized model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117541646A true CN117541646A (en) | 2024-02-09 |
Family
ID=89792079
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190286892A1 (en) * | 2018-03-13 | 2019-09-19 | Adobe Inc. | Interaction Detection Model for Identifying Human-Object Interactions in Image Content |
CN113033369A (en) * | 2021-03-18 | 2021-06-25 | 北京达佳互联信息技术有限公司 | Motion capture method, motion capture device, electronic equipment and computer-readable storage medium |
US11182924B1 (en) * | 2019-03-22 | 2021-11-23 | Bertec Corporation | System for estimating a three dimensional pose of one or more persons in a scene |
CN114519758A (en) * | 2022-02-28 | 2022-05-20 | 广州虎牙科技有限公司 | Method and device for driving virtual image and server |
CN114550292A (en) * | 2022-02-21 | 2022-05-27 | 东南大学 | High-physical-reality human body motion capture method based on neural motion control |
WO2022241583A1 (en) * | 2021-05-15 | 2022-11-24 | 电子科技大学 | Family scenario motion capture method based on multi-target video |
CN116386141A (en) * | 2023-03-30 | 2023-07-04 | 南京大学 | Multi-stage human motion capturing method, device and medium based on monocular video |
CN116721471A (en) * | 2023-08-10 | 2023-09-08 | 中国科学院合肥物质科学研究院 | Multi-person three-dimensional attitude estimation method based on multi-view angles |
CN116934972A (en) * | 2023-07-26 | 2023-10-24 | 石家庄铁道大学 | Three-dimensional human body reconstruction method based on double-flow network |
CN116958355A (en) * | 2023-08-21 | 2023-10-27 | 北京字跳网络技术有限公司 | Action animation generation method and device, electronic equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
XINLIANG WEI et al.: "The Application of Motion Capture and 3D Skeleton Modeling in Virtual Fighting", NEXT GENERATION COMPUTER ANIMATION TECHNIQUES CONFERENCE PAPER, 1 November 2017 (2017-11-01) *
LUO Piao; LIU Xiaoping: "Robust Footprint Detection for Kinect Motion Data", Journal of Image and Graphics, no. 02, 16 February 2016 (2016-02-16) *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||