WO2022120843A1 - Three-dimensional human body reconstruction method and apparatus, computer device and storage medium - Google Patents

Three-dimensional human body reconstruction method and apparatus, computer device and storage medium

Info

Publication number
WO2022120843A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
parameters
facial
human body
image
Prior art date
Application number
PCT/CN2020/135932
Other languages
English (en)
French (fr)
Inventor
刘宝玉
王磊
马晓亮
林佩珍
程俊
Original Assignee
中国科学院深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院 (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences)
Priority to PCT/CN2020/135932
Publication of WO2022120843A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20Finite element generation, e.g. wire-frame surface description, tesselation

Definitions

  • the present application belongs to the technical field of computer vision, and in particular, relates to a three-dimensional human body reconstruction method, device, computer equipment and storage medium.
  • 3D human body reconstruction is an important technical means to understand people's gestures, communication clues and interactive meanings through images or videos.
  • the existing 3D human body reconstruction often ignores face reconstruction, and existing face reconstruction methods are not suitable for being added to the body model, so the existing three-dimensional human body reconstruction usually forms a three-dimensional human body model without facial expressions.
  • Embodiments of the present application provide a three-dimensional human body reconstruction method, apparatus, computer equipment, and storage medium, so as to solve the problem of ignoring face reconstruction based on a three-dimensional human body model.
  • a technical solution adopted in this application is to provide a three-dimensional human body reconstruction method, comprising:
  • a three-dimensional model of the human body of the target object is obtained through the SMPL-X model.
  • the method further includes:
  • the step of extracting the facial image in the human body image includes:
  • the face image is obtained by cutting the human body image by using the face frame.
  • before the step of obtaining the facial parameters from the facial image using the facial parameter prediction model, the method further includes:
  • the qualified key point position data and its corresponding face image are input into the first preset neural network model for training, and the facial parameter prediction model is obtained.
  • the method before the step of inputting the human body image into a gender classifier model to obtain gender parameters, the method further includes:
  • the step of obtaining the three-dimensional human body model of the target object by using the SMPL-X model according to the facial parameter, the body parameter and the gender parameter includes:
  • the three-dimensional model of the human body is constructed by integrating the three-dimensional face model and the three-dimensional body model.
  • the step of obtaining the three-dimensional human body model of the target object by using the SMPL-X model according to the facial parameter, the hand parameter, the body parameter and the gender parameter includes:
  • the three-dimensional model of the human body is constructed by integrating the three-dimensional model of the face, the three-dimensional model of the hand, and the three-dimensional model of the body.
  • the step of inputting the facial parameters into the facial model included in the SMPL-X model to obtain a 3D facial model includes:
  • the facial parameters include facial shape, facial expression, facial posture and facial image camera parameters.
  • the step of inputting the hand parameters into the hand model included in the SMPL-X model, and obtaining the three-dimensional model of the hand includes:
  • the hand parameters corresponding to the left hand and the right hand are respectively input into the hand model, and the 3D mesh vertex coordinates of the left hand and the right hand are obtained to obtain the hand 3D model;
  • the hand parameters include hand joint parameters, hand shape parameters and hand image camera parameters;
  • the step of inputting the body parameters into the body model included in the SMPL-X model to obtain a three-dimensional body model includes:
  • the body parameters are input into the body model, and the coordinates of the vertices of the body 3D mesh are obtained to obtain the body 3D model; the body parameters include body joint parameters, body shape parameters and body image camera parameters.
  • the step of inputting the human body image into a gender classifier model to obtain gender parameters includes:
  • the gender classifier model outputs the gender parameter and the gender probability that the target is of that gender; the gender parameter is a male parameter or a female parameter.
  • the present application also provides a three-dimensional human body reconstruction device, comprising:
  • the acquisition module is used to acquire the human body image of the target
  • an extraction module for extracting the face image and the body image in the human body image
  • a face prediction module used for obtaining facial parameters based on the facial image using a facial parameter prediction model
  • a body prediction module for obtaining body parameters based on the body image using a body parameter prediction model
  • a gender detection module for inputting the human body image into a gender classifier model to obtain gender parameters
  • the model processing module is used for obtaining the three-dimensional human body model of the target object through the SMPL-X model according to the facial parameters, the body parameters and the gender parameters.
  • the extraction module is further configured to extract the hand image in the human body image
  • the device further includes: a hand prediction module, configured to obtain hand parameters by using a hand parameter prediction model based on the hand image;
  • the model processing module is configured to obtain a three-dimensional human body model of the target object through the SMPL-X model according to the facial parameters, the hand parameters, the body parameters and the gender parameters.
  • the present application also provides a computer device, including a memory and a processor, the memory stores a computer program, and when the computer program is executed by the processor, the processor is caused to perform the following steps:
  • a three-dimensional model of the human body of the target object is obtained through the SMPL-X model.
  • the present application also provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by the processor, causes the processor to perform the following steps:
  • a three-dimensional model of the human body of the target object is obtained through the SMPL-X model.
  • the beneficial effect of the three-dimensional human body reconstruction method provided by the present application is that: the present application adopts the structure of the SMPL-X human body prediction model to integrate the facial parameters, hand parameters, body parameters and gender parameters, with each part of the human body image input into its corresponding parameter prediction model, so that a three-dimensional human body model with complete facial reconstruction is formed from a human body image.
  • compared with existing optimization-based methods, the present application is simple and less time-consuming.
  • FIG. 1 is a schematic flowchart of a three-dimensional human body reconstruction method provided by an embodiment of the present application
  • Fig. 2-(a), Fig. 2-(b), Fig. 2-(c), Fig. 2-(d) are schematic diagrams of examples of human body images in this application;
  • FIG. 3 is a schematic diagram of a formation process of a facial parameter prediction model provided by an embodiment of the present application.
  • Fig. 4 is the schematic diagram of facial key point
  • Fig. 5 is a kind of schematic diagram of SMPL-X human body prediction model structure
  • Fig. 6 is the prediction result comparison diagram of adopting SMPL-X human body prediction model, the human body prediction model without gender parameter and the SMPL-X human body prediction model with gender parameter;
  • FIG. 7 is a structural block diagram of a three-dimensional human body reconstruction apparatus provided by an embodiment of the present application.
  • FIG. 8 is an internal structure diagram of a computer device in an embodiment of the present application.
  • the term "if" may be contextually interpreted as "when", "once", "in response to determining" or "in response to detecting".
  • the phrases "if it is determined" or "if the [described condition or event] is detected" may be interpreted, depending on the context, to mean "once it is determined", "in response to the determination", "once the [described condition or event] is detected" or "in response to detection of the [described condition or event]".
  • references in this specification to "one embodiment” or “some embodiments” and the like mean that a particular feature, structure or characteristic described in connection with the embodiment is included in one or more embodiments of the present application.
  • appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," etc. in various places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments" unless specifically emphasized otherwise.
  • the terms "comprising," "including," "having" and their variants mean "including but not limited to" unless specifically emphasized otherwise.
  • the present application first acquires a human body image of the target object whose 3D human body is to be reconstructed, then extracts a face image and a body image from the acquired human body image, obtains facial parameters from the face image using a facial parameter prediction model and body parameters from the body image using a body parameter prediction model, and obtains gender parameters by inputting the human body image into a gender classifier model.
  • the SMPL-X model is used to obtain the three-dimensional human body model of the target object.
  • the three-dimensional human body model obtained in the present application fully considers facial reconstruction, which is more conducive to knowing the pose, communication clues and interactive meaning of the human body through the three-dimensional human body model of the present application.
  • a three-dimensional human body reconstruction method provided by an embodiment of the present application may include:
  • Step 1 Obtain a human body image of the target.
  • a human body image of the target is acquired, wherein the human body image may include a face, hands and a body.
  • the human body image can be an RGB image.
  • Fig. 2-(a), Fig. 2-(b), Fig. 2-(c), and Fig. 2-(d) respectively show examples of human body images in this application using RGB images.
  • Step 2 extract the face image and the body image in the human body image.
  • the face image can be obtained from the human body image using methods including but not limited to CenterFace (a practical anchor-free face detector for edge devices), MTCNN (multi-task cascaded convolutional networks for joint face detection and alignment), FaceBoxes (a high-accuracy CPU real-time face detector), and RetinaFace (single-stage dense face localization in the wild).
  • Step 21 Obtain face frame labels through the face dataset.
  • the face dataset uses the VGGFace2 dataset, which includes face images and the corresponding face frame labels and gender labels.
  • Step 22 Detect the face frame of the human body image according to the face frame label.
  • the size of the face frame may be 224 ⁇ 224.
  • Step 23 Cut the human body image using the face frame to obtain the face image.
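Steps 21-23 above amount to cropping the detected face frame out of the image and resizing it to 224 × 224. A minimal sketch, assuming an (x, y, w, h) box format and a simple nearest-neighbour resize (both assumptions, not stated in the text):

```python
import numpy as np

def crop_face(image: np.ndarray, box, out_size=224):
    """Crop a detected face frame (x, y, w, h) out of an H x W x 3 image.

    The 224 x 224 output follows the face-frame size mentioned in the
    text; the box format and nearest-neighbour resize are assumptions.
    """
    x, y, w, h = box
    h_img, w_img = image.shape[:2]
    # Clamp the frame to the image bounds before slicing.
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(w_img, x + w), min(h_img, y + h)
    face = image[y0:y1, x0:x1]
    # Nearest-neighbour resize to out_size x out_size via index maps.
    rows = (np.arange(out_size) * face.shape[0] / out_size).astype(int)
    cols = (np.arange(out_size) * face.shape[1] / out_size).astype(int)
    return face[rows][:, cols]
```

The cropped array can then be fed directly to the facial parameter prediction model.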
  • the body image can be extracted from the human body image using methods including but not limited to Lightweight OpenPose (CPU-based real-time two-dimensional multi-person pose estimation); see Daniil Osokin. Real-time 2D Multi-Person Pose Estimation on CPU: Lightweight OpenPose. In arXiv preprint arXiv:1811.12004, 2018. It is not limited here.
  • Step 3 Obtain the facial parameters from the facial image using the facial parameter prediction model.
  • the formation process of the facial parameter prediction model can be as follows:
  • Step 31 Acquire the facial key point position data of each image in the facial dataset.
  • the facial key point position data can be obtained through OpenPose (also known as the Human Pose Recognition Project, open-sourced by CMU, Carnegie Mellon University).
  • the facial dataset can use VGGFace2, a large-scale face recognition dataset containing 3.31 million images of 9131 identities, with an average of 362.6 images per identity. VGGFace2 has many person identities, each with a large number of images, covering a wide range of poses, ages and ethnicities.
  • the facial key point position data may include the x and y coordinates and the confidence of each key point, where the confidence takes a value less than 1.
  • the number of face key points can be selected as needed; for example, 68 face key points can be selected, or a number of face key points that better expresses facial expressions can be selected.
  • Step 32 Screen multiple sets of face key point position data according to preset reliability to obtain qualified key point position data.
  • the preset reliability can be set according to actual needs; among the multiple groups of face key point data, the key point data whose confidence is higher than the preset reliability is the qualified key point position data.
  • for example, the preset reliability can be 0.4, which is equivalent to selecting data with a confidence between 0.4 and 1 as the qualified key point position data.
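The screening rule above can be sketched as follows, assuming OpenPose-style (x, y, confidence) triples; treating a keypoint set as qualified only when every point passes the 0.4 bar is an assumption about the rule, since the text only says data above the preset reliability is qualified:

```python
def screen_keypoint_sets(keypoint_sets, min_confidence=0.4):
    """Screen groups of facial keypoints by a preset confidence level.

    Each element of `keypoint_sets` is a list of (x, y, confidence)
    triples for one face image; a set is kept only if all of its points
    reach `min_confidence` (0.4 is the example value from the text).
    The all-points rule is an assumption.
    """
    return [
        kps for kps in keypoint_sets
        if all(c >= min_confidence for _, _, c in kps)
    ]
```

The surviving sets, paired with their face images, are what gets fed into the first preset neural network model in the next step.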
  • Step 33 Input the qualified key point position data and its corresponding face image into the first preset neural network model for training, and obtain a facial parameter prediction model.
  • the facial parameter prediction model W_f includes an encoder and a decoder, wherein the encoder is the first preset neural network model and uses ResNet50, i.e., a ResNet50 network pre-trained for feature extraction, to extract a 2048-dimensional feature from a two-dimensional picture.
  • the decoder consists of a set of fully connected layers, which regress the head parameters from the features, including the face shape parameter β_f, the facial pose parameter θ_f, the facial expression parameter ψ_f and the face image camera parameters c_f.
  • ResNet is short for Residual Network, which is widely used in fields such as object classification and as part of the classical neural network backbone of computer vision tasks.
  • ResNet50 refers to a ResNet that contains 50 two-dimensional convolutional layers: it first performs a convolution operation on the input, then passes through 4 groups of residual blocks (Residual Block), and finally performs a fully connected operation to facilitate the classification task.
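The encoder-decoder split described above can be sketched numerically. Only the 2048-dimensional feature size is stated in the text, so the individual parameter dimensions used here (10-D shape, 3-D pose, 10-D expression, 3-D camera) and the single-linear-layer decoder are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter sizes; only FEAT_DIM = 2048 comes from the text.
PARAM_DIMS = {"beta_f": 10, "theta_f": 3, "psi_f": 10, "c_f": 3}
FEAT_DIM = 2048

# One fully connected layer per parameter group, standing in for the
# decoder's set of fully connected layers (no biases, for brevity).
weights = {k: rng.normal(0, 0.01, (FEAT_DIM, d)) for k, d in PARAM_DIMS.items()}

def decode_head_params(feature: np.ndarray) -> dict:
    """Regress head parameter groups from a ResNet50-style feature vector."""
    return {k: feature @ w for k, w in weights.items()}
```

In a real implementation the feature would come from a pre-trained ResNet50 backbone and the decoder weights would be learned jointly with it.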
  • the neural network model has loss terms in the training process, and the trained neural network model is obtained after training with these loss terms.
  • the loss term L_total in the training process of the first preset neural network model can be expressed as:
  • L_total = λ_proj·L_proj + λ_β·L_β + λ_ψ·L_ψ
  • λ_proj is the weight of the key point loss term L_proj
  • L_proj is the key point loss, measuring the deviation of the projected facial key points from K_2D
  • K_2D is the 2D labeled coordinates of the facial key points
  • λ_β is the weight of the face shape loss term L_β, the loss of facial shape
  • λ_ψ is the weight of the facial expression loss term L_ψ, the loss of facial expression.
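A minimal numeric sketch of this training loss, assuming squared-L2 forms for the three terms (the text names the terms and their weights but not their functional forms, and the weight values below are illustrative):

```python
import numpy as np

def total_loss(pred_kp2d, gt_kp2d, beta_f, psi_f,
               w_proj=1.0, w_beta=0.001, w_psi=0.001):
    """Weighted sum of reprojection, face-shape and expression terms.

    `pred_kp2d` are the projected facial keypoints, `gt_kp2d` the 2D
    labeled coordinates (K_2D).  Squared-L2 forms and the weight values
    are assumptions.
    """
    l_proj = np.sum((pred_kp2d - gt_kp2d) ** 2)  # keypoint loss
    l_beta = np.sum(beta_f ** 2)                  # face shape regulariser
    l_psi = np.sum(psi_f ** 2)                    # expression regulariser
    return w_proj * l_proj + w_beta * l_beta + w_psi * l_psi
```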
  • Step 4 Obtain body parameters from the body image using the body parameter prediction model.
  • the body parameter prediction model can include but is not limited to the one in Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop; see Nikos Kolotouros, Georgios Pavlakos, Michael J. Black, and Kostas Daniilidis. Learning to reconstruct 3d human pose and shape via model-fitting in the loop. In ICCV, 2019, for details.
  • Step 5 Input the human body image into the gender classifier model to obtain gender parameters.
  • step 5 may include: the gender classifier model outputs a gender parameter and the gender probability that the target is of that gender; the gender parameter is a male parameter or a female parameter. If the gender probability output by the gender classifier model is lower than a preset threshold, the gender parameter is directly determined as a neutral parameter.
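The neutral-parameter fallback described above can be sketched as follows (the 0.8 threshold is illustrative; the text only mentions a preset threshold):

```python
def resolve_gender(label: str, probability: float, threshold: float = 0.8):
    """Fall back to the neutral body model when the classifier is unsure.

    `label` is 'male' or 'female' with its predicted probability; below
    the preset threshold the gender parameter becomes 'neutral', as
    described in the text.  The 0.8 value is an assumption.
    """
    return label if probability >= threshold else "neutral"
```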
  • the gender classifier model can use the classifier trained by SMPLify-X in SMPL-X. Specifically, before the step of inputting the human body image into the gender classifier model to obtain the gender parameter, that is, before step 5, the method also includes the formation process of the gender classifier model:
  • Step 51 Perform gender labeling on each face image in the face data set and obtain the position data of key points of the face.
  • the face dataset uses several RGB images of humans containing faces, and the position data of the face key points can be obtained through OpenPose.
  • Step 52 Input the facial key point position data and the corresponding gender label into the second preset neural network model for training to obtain a gender classifier model.
  • the second preset neural network model may use Resnet18.
  • Step 6 Obtain a three-dimensional model of the human body of the target object through the SMPL-X model according to the facial parameters, body parameters and gender parameters.
  • step 2 can also be: extracting the face image, the hand image and the body image from the human body image; that is, in addition to extracting the face image and the body image, the hand image in the human body image can also be extracted.
  • the three-dimensional human body reconstruction method provided in this embodiment may further include: obtaining hand parameters from the hand image using a hand parameter prediction model.
  • step 6 may also be: according to the facial parameters, the hand parameters, the body parameters and the gender parameters, the 3D human body model of the target is obtained through the SMPL-X model. That is, the reconstructed 3D human body model in this embodiment may include a 3D hand model in addition to the 3D face model and the 3D body model.
  • the method of extracting hand images may include but is not limited to Understanding Human Hands in Contact at Internet Scale; see Shan, D., Geng, J., Shu, M., and Fouhey, D.F. Understanding Human Hands in Contact at Internet Scale. In arXiv preprint arXiv:2006.06669, 2020.
  • Hand parameter prediction models can be used including, but not limited to, FrankMocap (Fast Monocular 3D Hand and Body Motion Capture Based on Regression and Integration).
  • unlike the SMPL model, which only focuses on body movement, the SMPL-X human body prediction model in this application also includes hand movements, such as the bending and opening of fingers, as well as facial expressions.
  • Figure 5 shows each model required to obtain a three-dimensional model of the human body according to the human body image.
  • the function of the SMPL-X human body prediction model can be expressed as:
  • M(β, θ, ψ) = LBS(T_P(β, θ, ψ), J(β), θ, W)   (1)
  • β is a shape parameter that describes the shape of the human body model, consisting of a 10-dimensional linear shape parameter. The shape parameters are coefficients in a low-dimensional shape space, obtained from a training set of thousands of registered scans.
  • θ is an action parameter, which describes the rotation of the joints of the human body model, including the joint angle information of 21 body joints, 30 hand joints and 3 face joints, 54 joints in total.
  • ψ is the facial expression parameter.
  • T_P is the average template, which describes the deformation applied to the template model according to β, θ and ψ, where the template model is the SMPL-X model.
  • J is the three-dimensional joint position, and LBS denotes the linear blend skinning function with skinning weights W.
  • Step 61 Input the facial parameters into the facial model included in the SMPL-X model to obtain a three-dimensional facial model.
  • step 61 may specifically include the following steps:
  • the facial parameters include facial shape, facial expression, facial pose, and facial image camera parameters.
  • the facial parameter prediction model W f adopts the facial model in the SMPL-X model.
  • the facial parameter prediction model uses an end-to-end neural network structure to regress the head parameters, which can be expressed as:
  • (β_f, θ_f, ψ_f, c_f) = W_f(I_f), where I_f is the face image
  • β_f is the face shape
  • θ_f is the face pose, expressed by rotation angles
  • ψ_f is the facial expression
  • c_f is the face image camera parameter
  • the parameters obtained by the regressor are input into the face model, and the vertex coordinates of the face 3D mesh are obtained, that is, the 3D face model.
  • Step 62 Input the hand parameters into the hand model included in the SMPL-X model to obtain a three-dimensional model of the hand.
  • step 62 may specifically include the following steps:
  • the hand joint parameters, hand shape parameters and hand image camera parameters corresponding to the left hand and the right hand are respectively input into the hand model, and the 3D mesh vertex coordinates of the left hand and the right hand are obtained, namely the hand 3D model.
  • the hand parameters include hand joint parameters, hand shape parameters and hand image camera parameters.
  • in the hand parameter prediction model W_h, the two-dimensional images of the left hand and the right hand are first detected by the hand frame detector and used as the input of the hand parameter prediction model, which predicts the hand joint parameters and hand shape parameters. This can be expressed as:
  • (θ_h, β_h, c_h) = W_h(I_h)
  • I_h is the two-dimensional image of the hand
  • β_h is the hand shape parameter, and the value is the same as β in function (1).
  • θ_h is the 15 joint parameters of the hand, and the value is the same as θ in function (1).
  • c_h is the hand image camera parameter, which is a real number.
  • Step 63 Input the body parameters into the body model included in the SMPL-X model to obtain a three-dimensional body model.
  • step 63 may include:
  • Body parameters include body joint parameters, body shape parameters, and body image camera parameters.
  • a neural-network-based model is used to predict the 21 joint parameters and the shape parameters of the body, which can be expressed as:
  • (β_b, θ_b, c_b) = W_b(I), where W_b denotes the body parameter prediction model
  • the input I is a two-dimensional image of a person, that is, an RGB image containing a human body, and the outputs are β_b, θ_b and c_b.
  • β_b is the body shape parameter, a 10-dimensional vector, and the value is the same as β in function (1); θ_b is the rotation angles of the 21 body joints, and the value is the same as θ in function (1).
  • c_b is the camera parameter of the body, and the value is a real number.
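Once the three per-part regressors have run, their outputs can be merged into a single SMPL-X parameter set. A minimal sketch, assuming dictionary outputs keyed as below; the key names and the exact packing are hypothetical, but the joint counts (21 body + 30 hand + 3 face = 54 joints) follow the text:

```python
def assemble_smplx_inputs(face: dict, hand: dict, body: dict, gender: str) -> dict:
    """Merge per-part predictions into one SMPL-X parameter set.

    `face`, `hand` and `body` are the outputs of the regressors for the
    face, hands and body; `gender` selects the male, female or neutral
    model variant.  Key names are illustrative assumptions.
    """
    return {
        "gender": gender,                 # male / female / neutral variant
        "betas": body["beta_b"],          # 10-D body shape parameter
        "body_pose": body["theta_b"],     # rotations of the 21 body joints
        "hand_pose": hand["theta_h"],     # 15 joints per hand, both hands
        "expression": face["psi_f"],      # facial expression parameter
        "jaw_pose": face["theta_f"],      # face pose (rotation angles)
    }
```

Feeding this set to the SMPL-X function M(β, θ, ψ) then yields the mesh vertices of the full-body model.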
  • the SMPL-X human body prediction model is an extended model in which hands and a face are added to the SMPL human body prediction model.
  • this application adopts the structure of the SMPL-X human body prediction model to integrate facial parameters, hand parameters, body parameters and gender parameters, inputting each part of the target's human body image into the corresponding parameter prediction model, so that a 3D human body model with facial reconstruction is formed from the human body image. Compared with existing optimization-based human body prediction methods, it is simple and less time-consuming.
  • the images in the first column are human body images
  • the images in the second column are 3D human body images processed by the existing human body prediction model
  • the images in the third column are 3D human body images without gender but with facial expressions
  • the images in the fourth column are 3D human body images with gender and facial expressions. It can be seen from the images in FIG. 6 that the three-dimensional human body images with gender and facial expressions are more accurate.
  • an embodiment of the present application provides a three-dimensional human body reconstruction device, which may include an acquisition module 21 , an extraction module 22 , a face prediction module 23 , a body prediction module 25 , a gender detection module 26 and a model processing module 27 .
  • the acquisition module 21 can be used to acquire the human body image of the target.
  • the extraction module 22 may be used to extract face images and body images from the human body images.
  • the face prediction module 23 can be used to obtain face parameters by using a face parameter prediction model based on the face image.
  • the body prediction module 25 can be used to obtain body parameters by using a body parameter prediction model based on the body image.
  • the gender detection module 26 can be used to input the human body image into the gender classifier model to obtain gender parameters.
  • the model processing module 27 can be used to obtain a three-dimensional human body model of the target object through the SMPL-X model according to the facial parameters, body parameters and gender parameters.
  • the extraction module 22 may also be used to extract the hand image in the human body image.
  • the three-dimensional human body reconstruction apparatus may further include a hand prediction module 24, and the hand prediction module 24 may be configured to obtain hand parameters by using a hand parameter prediction model based on the hand image.
  • the model processing module 27 can be configured to obtain a three-dimensional human body model of the target object through the SMPL-X model according to the facial parameters, hand parameters, body parameters and gender parameters.
  • the face prediction module 23 may include an acquisition unit, a screening unit, and a processing unit.
  • the acquiring unit is used for acquiring the key point position data of the face of each image in the face dataset.
  • the screening unit is used for screening multiple sets of face key point position data according to preset reliability to obtain qualified key point position data.
  • the processing unit is used for inputting the qualified key point position data and its corresponding face images into the first preset neural network model for training to obtain the facial parameter prediction model.
  • the present application also provides a computer device, including a memory and a processor, where the memory stores a computer program, and when the computer program is executed by the processor, the processor is caused to execute the steps of the three-dimensional human body reconstruction method in any of the embodiments.
  • Figure 8 shows an internal structure diagram of a computer device in one embodiment.
  • the computer device includes a processor, memory, and a network interface connected by a system bus.
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium of the computer device stores an operating system, and also stores a computer program.
  • the processor can implement the three-dimensional human body reconstruction method.
  • a computer program may also be stored in the internal memory, and when the computer program is executed by the processor, the processor may execute the three-dimensional human body reconstruction method.
  • Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the method steps of the three-dimensional human body reconstruction in the foregoing embodiments can be implemented.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • all or part of the processes in the methods of the above embodiments can be implemented by a computer program to instruct the relevant hardware.
  • the computer program can be stored in a computer-readable storage medium, and when the computer program is executed by the processor, the steps of the above method embodiments may be implemented.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file or some intermediate form, and the like.
  • the computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the photographing device/terminal device, a recording medium, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), electrical carrier signals, telecommunication signals, and software distribution media, such as a USB flash drive, a removable hard disk, a magnetic disk or an optical disc.
  • according to legislation and patent practice in some jurisdictions, computer-readable media may not include electrical carrier signals and telecommunication signals.
  • the disclosed apparatus/network device and method may be implemented in other manners.
  • the apparatus/network device embodiments described above are only illustrative.
  • the division of the modules or units is only a logical function division; in actual implementation, there may be other division methods, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Applicable to the technical field of computer vision, and in particular to a three-dimensional human body reconstruction method and apparatus, a computer device, and a storage medium. The three-dimensional human body reconstruction method comprises: acquiring a human body image of a target object (1); extracting a facial image and a body image from the human body image (2); obtaining facial parameters from the facial image according to a facial parameter prediction model (3); obtaining body parameters based on the body image by using a body parameter prediction model (4); inputting the human body image into a gender classifier model to obtain a gender parameter (5); and obtaining a three-dimensional human body model of the target object through an SMPL-X model according to the facial parameters, the body parameters, and the gender parameter (6). Compared with existing human body prediction models, the method can solve the problem of facial reconstruction being neglected in three-dimensional human body models.

Description

Three-dimensional human body reconstruction method and apparatus, computer device, and storage medium

Technical Field

The present application belongs to the technical field of computer vision, and in particular relates to a three-dimensional human body reconstruction method and apparatus, a computer device, and a storage medium.

Background

Three-dimensional human body reconstruction is an important technical means of understanding people's postures, communication cues, and interaction meanings through images or videos. At present, in three-dimensional human body reconstruction based on human body images, because the face and hands occupy a relatively small proportion of the image and are easily blurred or occluded, existing three-dimensional human body reconstruction often neglects facial reconstruction, and existing facial reconstruction methods are not suitable for being added to a body model; therefore, existing three-dimensional human body reconstruction usually produces a three-dimensional human body model without facial expressions.

Technical Problem

Embodiments of the present application provide a three-dimensional human body reconstruction method and apparatus, a computer device, and a storage medium, to solve the problem that facial reconstruction is neglected in three-dimensional human body models.
Technical Solution

To achieve the above object, one technical solution adopted by the present application is to provide a three-dimensional human body reconstruction method, comprising:

acquiring a human body image of a target object;

extracting a facial image and a body image from the human body image;

obtaining facial parameters from the facial image according to a facial parameter prediction model;

obtaining body parameters based on the body image by using a body parameter prediction model;

inputting the human body image into a gender classifier model to obtain a gender parameter; and

obtaining a three-dimensional human body model of the target object through an SMPL-X model according to the facial parameters, the body parameters, and the gender parameter.

Optionally, the method further comprises:

extracting a hand image from the human body image; and

obtaining hand parameters based on the hand image by using a hand parameter prediction model;

wherein obtaining the three-dimensional human body model of the target object through the SMPL-X model according to the facial parameters, the body parameters, and the gender parameter comprises: obtaining the three-dimensional human body model of the target object through the SMPL-X model according to the facial parameters, the hand parameters, the body parameters, and the gender parameter.
Optionally, the step of extracting the facial image from the human body image comprises:

obtaining facial bounding-box labels from a face dataset;

detecting a facial bounding box in the human body image according to the facial bounding-box labels; and

cropping the human body image with the facial bounding box to obtain the facial image.

Optionally, before the step of obtaining the facial parameters from the facial image according to the facial parameter prediction model, the method further comprises:

acquiring facial keypoint position data of each image in a face dataset;

screening multiple sets of the facial keypoint position data according to a preset confidence to obtain qualified keypoint position data; and

inputting the qualified keypoint position data and the corresponding face images into a first preset neural network model for training to obtain the facial parameter prediction model.

Optionally, before the step of inputting the human body image into the gender classifier model to obtain the gender parameter, the method further comprises:

labeling each face image in a face dataset with a gender label and acquiring facial keypoint position data; and

inputting the facial keypoint position data and the corresponding gender labels into a second preset neural network model for training to obtain the gender classifier model.
Optionally, the step of obtaining the three-dimensional human body model of the target object through the SMPL-X model according to the facial parameters, the body parameters, and the gender parameter comprises:

inputting the facial parameters into a facial model included in the SMPL-X model to obtain a three-dimensional facial model;

inputting the body parameters into a body model included in the SMPL-X model to obtain a three-dimensional body model; and

integrating the three-dimensional facial model and the three-dimensional body model to construct the three-dimensional human body model.

Optionally, the step of obtaining the three-dimensional human body model of the target object through the SMPL-X model according to the facial parameters, the hand parameters, the body parameters, and the gender parameter comprises:

inputting the facial parameters into a facial model included in the SMPL-X model to obtain a three-dimensional facial model;

inputting the hand parameters into a hand model included in the SMPL-X model to obtain a three-dimensional hand model;

inputting the body parameters into a body model included in the SMPL-X model to obtain a three-dimensional body model; and

integrating the three-dimensional facial model, the three-dimensional hand model, and the three-dimensional body model to construct the three-dimensional human body model.

Optionally, the step of inputting the facial parameters into the facial model included in the SMPL-X model to obtain the three-dimensional facial model comprises:

inputting the facial parameters into the facial model, and acquiring three-dimensional facial mesh vertex coordinates to obtain the three-dimensional facial model, wherein the facial parameters include a facial shape, a facial expression, a facial pose, and facial-image camera parameters.

Optionally, the step of inputting the hand parameters into the hand model included in the SMPL-X model to obtain the three-dimensional hand model comprises:

inputting the hand parameters corresponding to the left hand and the right hand respectively into the hand model, and acquiring three-dimensional hand mesh vertex coordinates of the left hand and the right hand to obtain the three-dimensional hand model, wherein the hand parameters include hand joint parameters, hand shape parameters, and hand-image camera parameters.

Optionally, the step of inputting the body parameters into the body model included in the SMPL-X model to obtain the three-dimensional body model comprises:

inputting the body parameters into the body model, and acquiring three-dimensional body mesh vertex coordinates to obtain the three-dimensional body model, wherein the body parameters include body joint parameters, body shape parameters, and body-image camera parameters.

Optionally, the step of inputting the human body image into the gender classifier model to obtain the gender parameter comprises:

outputting, by the gender classifier model, the gender parameter and the gender probability that the target object has the gender indicated by the gender parameter, wherein the gender parameter is a male parameter or a female parameter.
The present application further provides a three-dimensional human body reconstruction apparatus, comprising:

an acquisition module configured to acquire a human body image of a target object;

an extraction module configured to extract a facial image and a body image from the human body image;

a facial prediction module configured to obtain facial parameters based on the facial image by using a facial parameter prediction model;

a body prediction module configured to obtain body parameters based on the body image by using a body parameter prediction model;

a gender detection module configured to input the human body image into a gender classifier model to obtain a gender parameter; and

a model processing module configured to obtain a three-dimensional human body model of the target object through an SMPL-X model according to the facial parameters, the body parameters, and the gender parameter.

Optionally, the extraction module is further configured to extract a hand image from the human body image;

the apparatus further comprises a hand prediction module configured to obtain hand parameters based on the hand image by using a hand parameter prediction model; and

the model processing module is configured to obtain the three-dimensional human body model of the target object through the SMPL-X model according to the facial parameters, the hand parameters, the body parameters, and the gender parameter.
The present application further provides a computer device comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the following steps:

acquiring a human body image of a target object;

extracting a facial image and a body image from the human body image;

obtaining facial parameters from the facial image according to a facial parameter prediction model;

obtaining body parameters based on the body image by using a body parameter prediction model;

inputting the human body image into a gender classifier model to obtain a gender parameter; and

obtaining a three-dimensional human body model of the target object through an SMPL-X model according to the facial parameters, the body parameters, and the gender parameter.

The present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:

acquiring a human body image of a target object;

extracting a facial image and a body image from the human body image;

obtaining facial parameters from the facial image according to a facial parameter prediction model;

obtaining body parameters based on the body image by using a body parameter prediction model;

inputting the human body image into a gender classifier model to obtain a gender parameter; and

obtaining a three-dimensional human body model of the target object through an SMPL-X model according to the facial parameters, the body parameters, and the gender parameter.
Beneficial Effects

The beneficial effect of the three-dimensional human body reconstruction method provided by the present application is as follows: the present application adopts the structure of the SMPL-X human body prediction model to perform a final integration of the facial parameters, the hand parameters, the body parameters, and the gender parameter, inputting each part of the human body image of the target object into its corresponding parameter prediction model, so that a three-dimensional human body model with complete facial reconstruction is formed from the human body image. Compared with existing human body prediction models that use optimization-based methods, the present application is simpler and less time-consuming.
Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings required in the description of the embodiments or the prior art. Obviously, the drawings described below are merely some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

Fig. 1 is a schematic flowchart of a three-dimensional human body reconstruction method provided by an embodiment of the present application;

Fig. 2-(a), Fig. 2-(b), Fig. 2-(c), and Fig. 2-(d) are schematic examples of the human body images used in the present application;

Fig. 3 is a schematic diagram of the formation process of a facial parameter prediction model provided by an embodiment of the present application;

Fig. 4 is a schematic diagram of facial keypoints;

Fig. 5 is a schematic diagram of the structure of an SMPL-X human body prediction model;

Fig. 6 compares the prediction results of the SMPL-X human body prediction model, a human body prediction model without the gender parameter, and an SMPL-X human body prediction model with the gender parameter;

Fig. 7 is a structural block diagram of a three-dimensional human body reconstruction apparatus provided by an embodiment of the present application; and

Fig. 8 is an internal structure diagram of a computer device in an embodiment of the present application.
Embodiments of the Invention

In the following description, specific details such as particular system structures and techniques are set forth for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so that unnecessary details do not obscure the description of the present application.

It should be understood that, when used in the specification and the appended claims of the present application, the term "comprise" indicates the presence of the described features, wholes, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.

It should also be understood that the term "and/or" used in the specification and the appended claims of the present application refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.

As used in the specification and the appended claims of the present application, the term "if" may be interpreted, depending on the context, as "when" or "once" or "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined" or "in response to determining" or "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]".

In addition, in the description of the specification and the appended claims of the present application, the terms "first", "second", "third", and the like are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.

References in the specification of the present application to "one embodiment" or "some embodiments" and the like mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments", and the like appearing in different places in this specification do not necessarily all refer to the same embodiment, but rather mean "one or more but not all embodiments", unless otherwise specifically emphasized. The terms "comprise", "include", "have", and their variants all mean "including but not limited to", unless otherwise specifically emphasized.

The present application first acquires a human body image of a target object to be subjected to three-dimensional human body reconstruction; then extracts a facial image and a body image from the acquired human body image, obtains facial parameters and body parameters from the facial image and the body image by using a facial parameter prediction model and a body parameter prediction model respectively, and obtains a gender parameter by using the human body image and a gender classifier model; finally, according to the obtained facial parameters, body parameters, and gender parameter, an SMPL-X model is used to obtain a three-dimensional human body model of the target object. The three-dimensional human body model obtained by the present application fully takes facial reconstruction into account, which makes it easier to learn a person's posture, communication cues, and interaction meanings from the three-dimensional human body model of the present application.
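The overall pipeline summarized above can be sketched as a minimal data-flow skeleton. Every callable below is a toy stand-in rather than a real model or library API, and the dictionary keys are illustrative names chosen for this sketch only; it shows only how part-specific predictors feed a fusing model.

```python
# Sketch of the reconstruction pipeline: extract part images, run each
# part's own predictor, and let an SMPL-X-style model fuse the results.
# All functions here are hypothetical stand-ins, not real APIs.

def reconstruct(image, extract, face_net, body_net, gender_net, smplx):
    face_img, body_img = extract(image)          # step 2: part extraction
    return smplx(face=face_net(face_img),        # step 3: facial parameters
                 body=body_net(body_img),        # step 4: body parameters
                 gender=gender_net(image))       # step 5: gender parameter

# toy stand-ins so the sketch runs end to end
def extract(img):
    return img + ':face', img + ':body'

def face_net(face_img):
    return {'expression': face_img}

def body_net(body_img):
    return {'pose': body_img}

def gender_net(img):
    return 'female'

def smplx(face, body, gender):
    return {'mesh': (face, body, gender)}        # step 6: fused model

result = reconstruct('img', extract, face_net, body_net, gender_net, smplx)
print(result['mesh'][2])  # female
```

A hand branch (steps in the optional embodiment) would be added the same way: one more extractor output and one more predictor feeding `smplx`.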
As shown in Fig. 1, a three-dimensional human body reconstruction method provided by an embodiment of the present application may comprise the following steps.

Step 1: acquire a human body image of the target object.

The human body image may include the face, both hands, and the body, and may be an RGB image. By way of example, Fig. 2-(a), Fig. 2-(b), Fig. 2-(c), and Fig. 2-(d) each show an example in which the human body image of the present application is an RGB image.

Step 2: extract the facial image and the body image from the human body image.

The facial image may be obtained from the human body image by methods including, but not limited to, CenterFace (a practical anchor-free face detector for edge devices), MTCNN (multi-task cascaded convolutional networks for joint face detection and alignment), FaceBoxes (a high-accuracy CPU real-time face detector), and RetinaFace (single-stage dense face localisation in the wild).

By way of example, the CenterFace method is used below to describe how the facial image is obtained from the human body image:

Step 21: obtain facial bounding-box labels from a face dataset. The face dataset is the VGGFace2 dataset, which includes face pictures, the corresponding facial bounding-box labels, and gender labels.

Step 22: detect the facial bounding box in the human body image according to the facial bounding-box labels. By way of example, the size of the facial bounding box may be 224×224.

Step 23: crop the human body image with the facial bounding box to obtain the facial image.
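The cropping in step 23 can be sketched as follows. This is an illustrative stand-in, not the CenterFace implementation: the image is a nested list of pixels, the box coordinates are assumed detector outputs, and resizing to 224×224 is omitted for brevity.

```python
# Hypothetical sketch of step 23: cut the face region out of the body
# image using a detected bounding box (x, y, w, h).

def crop_face(image, box):
    """Return the sub-image rows y..y+h, columns x..x+w of `image`."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

# toy 6x6 "image" whose pixel value encodes its (row, col) position
img = [[(r, c) for c in range(6)] for r in range(6)]
face = crop_face(img, (1, 2, 3, 2))
print(len(face), len(face[0]))  # 2 3
```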
The body image may be obtained from the human body image by methods including, but not limited to, Lightweight OpenPose (Daniil Osokin, Real-time 2D Multi-Person Pose Estimation on CPU: Lightweight OpenPose, arXiv preprint arXiv:1811.12004, 2018), which is not limited here.
Step 3: obtain the facial parameters from the facial image according to the facial parameter prediction model.

As shown in Fig. 3, the facial parameter prediction model may be formed as follows.

Step 31: acquire the facial keypoint position data of each image in the face dataset.

The facial keypoint position data of each image in the face dataset may be acquired with OpenPose. OpenPose, also known as the human pose recognition project, is an open-source library developed by Carnegie Mellon University (CMU) on the basis of convolutional neural networks and supervised learning within the Caffe framework; it can estimate poses such as human body movements, facial expressions, and finger motions.

The face dataset may be VGGFace2, a large-scale face recognition dataset containing 3.31 million images of 9131 identities, with an average of 362.6 images per identity. VGGFace2 features a large number of identities, each with many images, and covers a wide range of poses, ages, and ethnicities.

The facial keypoint position data may include the x and y coordinates and the confidence of each keypoint, the confidence being less than 1. As shown in Fig. 4, the facial keypoints may be selected as needed; for example, 68 keypoints may be selected, or a number of keypoints that represents facial expressions well may be chosen.

Step 32: screen the multiple sets of facial keypoint position data according to a preset confidence to obtain qualified keypoint position data.

The preset confidence may be set according to actual needs; among the multiple sets of facial keypoint data, those whose confidence is higher than the preset confidence are the qualified keypoint position data. For example, if the preset confidence is 0.4, data with confidence between 0.4 and 1 are selected as the qualified keypoint position data.
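The screening in step 32 can be sketched as follows; keypoints are (x, y, confidence) triples as produced by OpenPose-style detectors, and the function name and the 0.4 threshold are illustrative values taken from the example above, not a fixed part of the method.

```python
# Hypothetical sketch of step 32: keep only keypoint sets whose every
# landmark meets the preset confidence.

PRESET_CONFIDENCE = 0.4

def is_qualified(keypoints):
    """True if every (x, y, conf) triple meets the preset confidence."""
    return all(conf >= PRESET_CONFIDENCE for _, _, conf in keypoints)

samples = [
    [(10, 12, 0.9), (14, 12, 0.8)],   # all landmarks confident -> kept
    [(10, 12, 0.9), (14, 12, 0.1)],   # one weak landmark -> discarded
]
qualified = [kp for kp in samples if is_qualified(kp)]
print(len(qualified))  # 1
```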
Step 33: input the qualified keypoint position data and the corresponding face images into the first preset neural network model for training to obtain the facial parameter prediction model.
The facial parameter prediction model W_f comprises an encoder and a decoder. The encoder is the first preset neural network model and uses ResNet50; that is, the pretrained feature-extraction network ResNet50 extracts a 2048-dimensional feature from the two-dimensional image. The decoder consists of a set of fully connected layers that regress the head parameters from this feature, including the facial shape parameter β_f, the facial pose parameter θ_f, the facial expression parameter ψ_f, and the facial-image camera parameter c_f.

ResNet is short for Residual Network and is widely used in fields such as object classification and as part of classic backbone neural networks for computer vision tasks. ResNet50 is a ResNet containing 50 two-dimensional convolutional layers: it applies a convolution to the qualified keypoint position data, then passes through four residual blocks, and finally performs a fully connected operation for the classification task. A neural network model carries loss terms during training; after the loss terms are added, the trained neural network model is obtained through training. The loss L_total during training of the first preset neural network model can be expressed as:

L_total = λ_proj·L_proj + λ_βf·L_βf + λ_ψf·L_ψf

where λ_proj is the weight of the keypoint loss L_proj,

L_proj = ‖K̂_2D − K_2D‖,

K̂_2D denotes the two-dimensional coordinates projected from the predicted three-dimensional model, and K_2D denotes the labeled two-dimensional coordinates of the facial keypoints. λ_βf is the weight of the facial shape loss L_βf, and λ_ψf is the weight of the facial expression loss L_ψf.
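The weighted training loss above can be sketched numerically. The individual loss values would come from the network during training, and the λ weights are placeholder numbers of my choosing; the patent does not specify them.

```python
# Minimal sketch of the training loss:
# L_total = λ_proj·L_proj + λ_βf·L_βf + λ_ψf·L_ψf
# Weight values below are hypothetical placeholders.

def total_loss(l_proj, l_shape, l_expr,
               w_proj=1.0, w_shape=0.5, w_expr=0.5):
    """Weighted sum of keypoint, facial-shape, and expression losses."""
    return w_proj * l_proj + w_shape * l_shape + w_expr * l_expr

print(total_loss(2.0, 1.0, 0.6))  # 2.8
```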
Step 4: obtain the body parameters based on the body image by using the body parameter prediction model.

The body parameter prediction model may adopt, but is not limited to, "Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop"; for details, see Nikos Kolotouros, Georgios Pavlakos, Michael J. Black, and Kostas Daniilidis, Learning to reconstruct 3D human pose and shape via model-fitting in the loop, ICCV, 2019.

Step 5: input the human body image into the gender classifier model to obtain the gender parameter. By way of example, step 5 may comprise: the gender classifier model outputs the gender parameter and the gender probability that the target object has the gender indicated by the gender parameter, the gender parameter being a male parameter or a female parameter. If the gender probability output by the gender classifier model is lower than a preset threshold, the gender parameter is directly determined to be a neutral parameter. The gender classifier model may adopt a classifier trained by SMPLify-X in SMPL-X. Specifically, before step 5 of inputting the human body image into the gender classifier model to obtain the gender parameter, the method also includes the formation process of the gender classifier model:
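The fallback decision in step 5 can be sketched as follows; the function name and the threshold value are illustrative assumptions, since the patent states only that a preset threshold exists.

```python
# Sketch of the gender-parameter decision in step 5: if the classifier's
# probability falls below a preset threshold, use a neutral body model.
# The 0.7 threshold is a hypothetical value.

def select_gender(label, probability, threshold=0.7):
    """label is 'male' or 'female'; returns the gender parameter to use."""
    return label if probability >= threshold else 'neutral'

print(select_gender('female', 0.92))  # female
print(select_gender('male', 0.55))    # neutral
```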
Step 51: label each face image in the face dataset with a gender label and acquire the facial keypoint position data.

The face dataset here consists of several human body RGB images containing faces, and the facial keypoint position data may be acquired with OpenPose.

Step 52: input the facial keypoint position data and the corresponding gender labels into the second preset neural network model for training to obtain the gender classifier model.

The second preset neural network model may adopt ResNet18.

Step 6: obtain the three-dimensional human body model of the target object through the SMPL-X model according to the facial parameters, the body parameters, and the gender parameter.

In an embodiment, step 2 may also be: extracting the facial image, the hand image, and the body image from the human body image; that is, in addition to the facial image and the body image, the hand image may also be extracted from the human body image. The three-dimensional human body reconstruction method provided by this embodiment may include obtaining the hand parameters based on the hand image by using the hand parameter prediction model. Accordingly, on this basis, step 6 may also be: obtaining the three-dimensional human body model of the target object through the SMPL-X model according to the facial parameters, the hand parameters, the body parameters, and the gender parameter. That is, besides the three-dimensional facial model and the three-dimensional body model, the reconstructed three-dimensional human body model may also include a three-dimensional hand model; of course, the reconstructed three-dimensional human body model may include only the three-dimensional facial model and the three-dimensional body model.
The hand image may be extracted by methods including, but not limited to, "Understanding Human Hands in Contact at Internet Scale"; for details, see Shan, D., Geng, J., Shu, M., and Fouhey, D. F., Understanding Human Hands in Contact at Internet Scale, arXiv preprint arXiv:2006.06669, 2020.

The hand parameter prediction model may adopt, but is not limited to, FrankMocap (fast monocular 3D hand and body motion capture by regression and integration).

Unlike the SMPL model, which focuses only on body motion, the SMPL-X human body prediction model used in the present application also covers hand motion, such as the bending and spreading of fingers, as well as facial expressions. Fig. 5 shows the models required to obtain the three-dimensional human body model from the human body image.
The function of the SMPL-X human body prediction model can be expressed as:

M(β, θ, ψ) = LBS(T_P(β, θ, ψ), J(β), θ, W)   (1)

where LBS(·) denotes the linear-blend-skinning function with blend weights W, and the parameters are as follows.

β is the shape parameter, used to describe the shape of the human body model, and includes a 10-dimensional linear shape parameter. The shape parameters are coefficients of a low-dimensional shape space obtained from a training set of thousands of registered scans.

θ is the pose parameter, used to describe the rotations of the joints of the human body model. It covers the joint-angle information of 54 joints in total: 21 body joints, 30 hand joints, and 3 facial joints. The body pose space parameters can be trained on 1786 registrations in different poses, and the hand pose parameters can be learned from 1500 hand scans. The pose parameter θ can be decomposed into the body pose parameters θ_b, the hand pose parameters θ_h, and the facial pose parameters θ_f, i.e., θ = (θ_b, θ_h, θ_f).

ψ is the expression parameter, used to describe facial expressions, and is learned from 3800 high-resolution head scans.

T_P is the mean template, used to describe the deformation required of the template model according to changes in β, θ, and ψ, where the template model is the SMPL-X model.

J denotes the three-dimensional joint positions.
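A small accounting of the pose parameter θ described above can be sketched as follows. The joint counts come from the text; the 3-rotation-angles-per-joint encoding is the usual SMPL-X convention and is stated here as an assumption rather than taken from the patent.

```python
# Illustrative accounting of the SMPL-X pose parameter θ:
# 21 body + 30 hand (15 per hand) + 3 face joints = 54 joints,
# each assumed to carry a 3-D rotation (axis-angle).

BODY_JOINTS, HAND_JOINTS, FACE_JOINTS = 21, 30, 3

def pose_dimension():
    """Return (total joints, total rotation parameters)."""
    joints = BODY_JOINTS + HAND_JOINTS + FACE_JOINTS
    return joints, joints * 3

print(pose_dimension())  # (54, 162)
```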
The formation process of the model for each part of the human body is now described. By way of example:

Step 61: input the facial parameters into the facial model included in the SMPL-X model to obtain the three-dimensional facial model.

In an embodiment, step 61 may specifically comprise the following step:

Input the facial shape, facial pose, facial expression, and facial-image camera parameters into the facial model, and acquire the three-dimensional facial mesh vertex coordinates, i.e., the three-dimensional facial model. The facial parameters include the facial shape, facial expression, facial pose, and facial-image camera parameters.

The facial parameter prediction model W_f adopts the facial model in the SMPL-X model. The facial parameter prediction model uses an end-to-end neural network structure to regress the head parameters, which can be expressed as:

[β_f, θ_f, ψ_f, c_f] = W_f(I_f)   (2)

where I_f is the image containing the head region obtained by cropping with the CenterFace method, β_f is the facial shape, θ_f is the facial pose expressed as rotation angles, ψ_f is the facial expression, identical to ψ in function (1), and c_f is the facial-image camera parameter.

The parameters obtained by the regressor are input into the facial model to obtain the three-dimensional facial mesh vertex coordinates, i.e., the three-dimensional facial model.
Step 62: input the hand parameters into the hand model included in the SMPL-X model to obtain the three-dimensional hand model.

In an embodiment, step 62 may specifically comprise the following step:

Input the hand joint parameters, hand shape parameters, and hand-image camera parameters corresponding to the left hand and the right hand respectively into the hand model, and acquire the three-dimensional hand mesh vertex coordinates of the left and right hands, i.e., the three-dimensional hand model. The hand parameters include the hand joint parameters, hand shape parameters, and hand-image camera parameters.

In the hand parameter prediction model W_h, a hand bounding-box detector first detects the two-dimensional images of the left and right hands, which serve as the input of the hand parameter prediction model to predict the hand joint parameters and hand shape parameters, expressed as:

[β_h, θ_h, c_h] = W_h(I_h)   (3)

where I_h is the two-dimensional image of a hand; β_h is the hand shape parameter, taking the same values as β in function (1); θ_h comprises the 15 joint parameters of the hand, taking the same values as θ in function (1); and c_h is the hand-image camera parameter, taking real values.
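Equation (3) applied per hand can be sketched as follows: each hand's cropped image goes through the same regressor W_h, yielding per-hand shape, joint, and camera parameters. The regressor here is a hypothetical stand-in returning zero-valued parameters of the stated dimensions (10-D shape, 15 joints), not a real predictor.

```python
# Hypothetical sketch of equation (3): run each hand crop through the
# same regressor W_h, collecting (beta_h, theta_h, c_h) per hand.

def regress_hand(hand_image):
    """Stand-in for W_h; returns placeholder parameters."""
    return {'beta_h': [0.0] * 10,   # hand shape, same space as beta
            'theta_h': [0.0] * 15,  # 15 hand joint parameters
            'c_h': 1.0}             # real-valued camera parameter

hand_params = {side: regress_hand(img)
               for side, img in {'left': 'img_L', 'right': 'img_R'}.items()}
print(sorted(hand_params))  # ['left', 'right']
```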
Step 63: input the body parameters into the body model included in the SMPL-X model to obtain the three-dimensional body model.

In an embodiment, step 63 may comprise:

Input the body joint parameters, body shape parameters, and body-image camera parameters into the body model, and acquire the three-dimensional body mesh vertex coordinates, i.e., the three-dimensional body model. The body parameters include the body joint parameters, body shape parameters, and body-image camera parameters. In the body parameter prediction model W_b, the joints of the face and hands and the facial expression are ignored, and a neural-network-based model predicts the 21 joint parameters and the shape parameters of the body, expressed as:

[β_b, θ_b, c_b] = W_b(I)   (4)

where the input I is the two-dimensional image of the person, i.e., the RGB image containing the human body, and the outputs are β_b, θ_b, and c_b. β_b is the body shape parameter, a 10-dimensional vector taking the same values as β in function (1); θ_b comprises the 21 joint rotation angles of the body, taking the same values as θ in function (1); and c_b is the camera parameter of the body, taking real values.
The SMPL-X human body prediction model extends the SMPL human body prediction model with hand and facial models. The present application adopts the structure of the SMPL-X human body prediction model to integrate the facial parameters, hand parameters, body parameters, and gender parameter, inputting each part of the human body image of the target object into its corresponding parameter prediction model, so that a three-dimensional human body model with facial reconstruction is formed from the human body image. Compared with existing human body prediction models that use optimization-based methods, this is simple and less time-consuming.

As shown in Fig. 6, the first column shows the human body images, the second column shows the three-dimensional human body images processed by an existing human body prediction model, the third column shows three-dimensional human body images without gender but with facial expressions, and the fourth column shows three-dimensional human body images with both gender and facial expressions. As the images in Fig. 6 show, the three-dimensional human body images with both gender and facial expressions are more accurate.
With reference to Fig. 7, an embodiment of the present application provides a three-dimensional human body reconstruction apparatus, which may include an acquisition module 21, an extraction module 22, a facial prediction module 23, a body prediction module 25, a gender detection module 26, and a model processing module 27. The acquisition module 21 may be configured to acquire a human body image of a target object. The extraction module 22 may be configured to extract a facial image and a body image from the human body image. The facial prediction module 23 may be configured to obtain facial parameters based on the facial image by using a facial parameter prediction model. The body prediction module 25 may be configured to obtain body parameters based on the body image by using a body parameter prediction model. The gender detection module 26 may be configured to input the human body image into a gender classifier model to obtain a gender parameter. The model processing module 27 may be configured to obtain a three-dimensional human body model of the target object through an SMPL-X model according to the facial parameters, the body parameters, and the gender parameter.

In an embodiment, with reference to Fig. 7, the extraction module 22 may further be configured to extract a hand image from the human body image. The three-dimensional human body reconstruction apparatus may further include a hand prediction module 24, which may be configured to obtain hand parameters based on the hand image by using a hand parameter prediction model. The model processing module 27 may be configured to obtain the three-dimensional human body model of the target object through the SMPL-X model according to the facial parameters, the hand parameters, the body parameters, and the gender parameter.

In an embodiment, the facial prediction module 23 may include an acquisition unit, a screening unit, and a processing unit. The acquisition unit is configured to acquire the facial keypoint position data of each image in the face dataset. The screening unit is configured to screen the multiple sets of facial keypoint position data according to the preset confidence to obtain the qualified keypoint position data. The processing unit is configured to input the qualified keypoint position data into the first preset neural network model for training to obtain the facial parameter prediction model.

It should be noted that, since the information interaction and execution processes between the above apparatus/units are based on the same concept as the method embodiments of the present application, reference may be made to the method embodiment section for their specific functions and technical effects, which are not repeated here.

Those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the above functional units and modules is merely used as an example. In practical applications, the above functions may be assigned to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may physically exist alone, or two or more units may be integrated into one unit; the integrated units may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from one another and are not intended to limit the protection scope of the present application. For the specific working processes of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not detailed here.
The present application further provides a computer device comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the three-dimensional human body reconstruction method in any of the embodiments.

Fig. 8 shows the internal structure of a computer device in an embodiment. As shown in Fig. 8, the computer device includes a processor, a memory, and a network interface connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, enables the processor to implement the three-dimensional human body reconstruction method. The internal memory may also store a computer program which, when executed by the processor, enables the processor to perform the three-dimensional human body reconstruction method. Those skilled in the art will understand that the structure shown in Fig. 8 is merely a block diagram of a partial structure related to the solution of the present application and does not limit the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

Embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the three-dimensional human body reconstruction method in the above embodiments.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the present application implements all or part of the processes in the methods of the above embodiments by instructing the relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program can implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the photographing apparatus/terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example, a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc. In some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.

In the above embodiments, each embodiment is described with its own emphasis; for parts not detailed in one embodiment, reference may be made to the relevant descriptions of other embodiments.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.

In the embodiments provided by the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the apparatus/network device embodiments described above are merely illustrative; for instance, the division into modules or units is only a division by logical function, and in actual implementation there may be other division methods, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through interfaces, and the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or in other forms.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.

The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features therein; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be included in the protection scope of the present application.

Claims (15)

  1. A three-dimensional human body reconstruction method, characterized in that the method comprises:
    acquiring a human body image of a target object;
    extracting a facial image and a body image from the human body image;
    obtaining facial parameters from the facial image according to a facial parameter prediction model;
    obtaining body parameters based on the body image by using a body parameter prediction model;
    inputting the human body image into a gender classifier model to obtain a gender parameter; and
    obtaining a three-dimensional human body model of the target object through an SMPL-X model according to the facial parameters, the body parameters, and the gender parameter.
  2. The three-dimensional human body reconstruction method according to claim 1, characterized in that
    the method further comprises:
    extracting a hand image from the human body image; and
    obtaining hand parameters based on the hand image by using a hand parameter prediction model;
    wherein obtaining the three-dimensional human body model of the target object through the SMPL-X model according to the facial parameters, the body parameters, and the gender parameter comprises: obtaining the three-dimensional human body model of the target object through the SMPL-X model according to the facial parameters, the hand parameters, the body parameters, and the gender parameter.
  3. The three-dimensional human body reconstruction method according to claim 1, characterized in that the step of extracting the facial image from the human body image comprises:
    obtaining facial bounding-box labels from a face dataset;
    detecting a facial bounding box in the human body image according to the facial bounding-box labels; and
    cropping the human body image with the facial bounding box to obtain the facial image.
  4. The three-dimensional human body reconstruction method according to any one of claims 1 to 3, characterized in that, before the step of obtaining the facial parameters from the facial image according to the facial parameter prediction model, the method further comprises:
    acquiring facial keypoint position data of each image in a face dataset;
    screening multiple sets of the facial keypoint position data according to a preset confidence to obtain qualified keypoint position data; and
    inputting the qualified keypoint position data and the corresponding face images into a first preset neural network model for training to obtain the facial parameter prediction model.
  5. The three-dimensional human body reconstruction method according to any one of claims 1 to 3, characterized in that, before the step of inputting the human body image into the gender classifier model to obtain the gender parameter, the method further comprises:
    labeling each face image in a face dataset with a gender label and acquiring facial keypoint position data; and
    inputting the facial keypoint position data and the corresponding gender labels into a second preset neural network model for training to obtain the gender classifier model.
  6. The three-dimensional human body reconstruction method according to claim 1, characterized in that the step of obtaining the three-dimensional human body model of the target object through the SMPL-X model according to the facial parameters, the body parameters, and the gender parameter comprises:
    inputting the facial parameters into a facial model included in the SMPL-X model to obtain a three-dimensional facial model;
    inputting the body parameters into a body model included in the SMPL-X model to obtain a three-dimensional body model; and
    integrating the three-dimensional facial model and the three-dimensional body model to construct the three-dimensional human body model.
  7. The three-dimensional human body reconstruction method according to claim 2, characterized in that the step of obtaining the three-dimensional human body model of the target object through the SMPL-X model according to the facial parameters, the hand parameters, the body parameters, and the gender parameter comprises:
    inputting the facial parameters into a facial model included in the SMPL-X model to obtain a three-dimensional facial model;
    inputting the hand parameters into a hand model included in the SMPL-X model to obtain a three-dimensional hand model;
    inputting the body parameters into a body model included in the SMPL-X model to obtain a three-dimensional body model; and
    integrating the three-dimensional facial model, the three-dimensional hand model, and the three-dimensional body model to construct the three-dimensional human body model.
  8. The three-dimensional human body reconstruction method according to claim 6 or 7, characterized in that the step of inputting the facial parameters into the facial model included in the SMPL-X model to obtain the three-dimensional facial model comprises:
    inputting the facial parameters into the facial model, and acquiring three-dimensional facial mesh vertex coordinates to obtain the three-dimensional facial model, wherein the facial parameters include a facial shape, a facial expression, a facial pose, and facial-image camera parameters.
  9. The three-dimensional human body reconstruction method according to claim 7, characterized in that the step of inputting the hand parameters into the hand model included in the SMPL-X model to obtain the three-dimensional hand model comprises:
    inputting the hand parameters corresponding to the left hand and the right hand respectively into the hand model, and acquiring three-dimensional hand mesh vertex coordinates of the left hand and the right hand to obtain the three-dimensional hand model, wherein the hand parameters include hand joint parameters, hand shape parameters, and hand-image camera parameters.
  10. The three-dimensional human body reconstruction method according to claim 6 or 7, characterized in that the step of inputting the body parameters into the body model included in the SMPL-X model to obtain the three-dimensional body model comprises:
    inputting the body parameters into the body model, and acquiring three-dimensional body mesh vertex coordinates to obtain the three-dimensional body model, wherein the body parameters include body joint parameters, body shape parameters, and body-image camera parameters.
  11. The three-dimensional human body reconstruction method according to claim 1 or 3, characterized in that the step of inputting the human body image into the gender classifier model to obtain the gender parameter comprises:
    outputting, by the gender classifier model, the gender parameter and the gender probability that the target object has the gender indicated by the gender parameter, wherein the gender parameter is a male parameter or a female parameter.
  12. A three-dimensional human body reconstruction apparatus, characterized in that the apparatus comprises:
    an acquisition module configured to acquire a human body image of a target object;
    an extraction module configured to extract a facial image and a body image from the human body image;
    a facial prediction module configured to obtain facial parameters based on the facial image by using a facial parameter prediction model;
    a body prediction module configured to obtain body parameters based on the body image by using a body parameter prediction model;
    a gender detection module configured to input the human body image into a gender classifier model to obtain a gender parameter; and
    a model processing module configured to obtain a three-dimensional human body model of the target object through an SMPL-X model according to the facial parameters, the body parameters, and the gender parameter.
  13. The three-dimensional human body reconstruction apparatus according to claim 12, characterized in that
    the extraction module is further configured to extract a hand image from the human body image;
    the apparatus further comprises a hand prediction module configured to obtain hand parameters based on the hand image by using a hand parameter prediction model; and
    the model processing module is configured to obtain the three-dimensional human body model of the target object through the SMPL-X model according to the facial parameters, the hand parameters, the body parameters, and the gender parameter.
  14. A computer device, comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the three-dimensional human body reconstruction method according to any one of claims 1 to 11.
  15. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the three-dimensional human body reconstruction method according to any one of claims 1 to 11.
PCT/CN2020/135932 2020-12-11 2020-12-11 Three-dimensional human body reconstruction method and apparatus, computer device and storage medium WO2022120843A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/135932 WO2022120843A1 (zh) 2020-12-11 2020-12-11 Three-dimensional human body reconstruction method and apparatus, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/135932 WO2022120843A1 (zh) 2020-12-11 2020-12-11 Three-dimensional human body reconstruction method and apparatus, computer device and storage medium

Publications (1)

Publication Number Publication Date
WO2022120843A1 true WO2022120843A1 (zh) 2022-06-16

Family

ID=81974166

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/135932 WO2022120843A1 (zh) 2020-12-11 2020-12-11 Three-dimensional human body reconstruction method and apparatus, computer device and storage medium

Country Status (1)

Country Link
WO (1) WO2022120843A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115830642A (zh) * 2023-02-13 2023-03-21 粤港澳大湾区数字经济研究院（福田） 2D whole-body human keypoint annotation method and 3D human mesh annotation method
CN116714251A (zh) * 2023-05-16 2023-09-08 北京盈锋科技有限公司 Three-dimensional character printing system and method, electronic device, and storage medium
CN117392326A (zh) * 2023-11-09 2024-01-12 中国科学院自动化研究所 Three-dimensional human body reconstruction method based on a single image, and related device
WO2024103890A1 (zh) * 2022-11-18 2024-05-23 苏州元脑智能科技有限公司 Model construction method, reconstruction method, apparatus, electronic device, and non-volatile readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104167016A (zh) * 2014-06-16 2014-11-26 西安工业大学 Three-dimensional motion reconstruction method based on RGB color and depth images
US10621788B1 (en) * 2018-09-25 2020-04-14 Sony Corporation Reconstructing three-dimensional (3D) human body model based on depth points-to-3D human body model surface distance
CN111127641A (zh) * 2019-12-31 2020-05-08 中国人民解放军陆军工程大学 Three-dimensional human body parametric modeling method with highly realistic facial features
CN111784818A (zh) * 2020-06-01 2020-10-16 北京沃东天骏信息技术有限公司 Method and apparatus for generating a three-dimensional human body model, and computer-readable storage medium
CN111932678A (zh) * 2020-08-13 2020-11-13 北京未澜科技有限公司 Multi-view real-time reconstruction system for human motion, gestures, expressions, and textures

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104167016A (zh) * 2014-06-16 2014-11-26 西安工业大学 Three-dimensional motion reconstruction method based on RGB color and depth images
US10621788B1 (en) * 2018-09-25 2020-04-14 Sony Corporation Reconstructing three-dimensional (3D) human body model based on depth points-to-3D human body model surface distance
CN111127641A (zh) * 2019-12-31 2020-05-08 中国人民解放军陆军工程大学 Three-dimensional human body parametric modeling method with highly realistic facial features
CN111784818A (zh) * 2020-06-01 2020-10-16 北京沃东天骏信息技术有限公司 Method and apparatus for generating a three-dimensional human body model, and computer-readable storage medium
CN111932678A (zh) * 2020-08-13 2020-11-13 北京未澜科技有限公司 Multi-view real-time reconstruction system for human motion, gestures, expressions, and textures

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024103890A1 (zh) * 2022-11-18 2024-05-23 苏州元脑智能科技有限公司 Model construction method, reconstruction method, apparatus, electronic device, and non-volatile readable storage medium
CN115830642A (zh) * 2023-02-13 2023-03-21 粤港澳大湾区数字经济研究院（福田） 2D whole-body human keypoint annotation method and 3D human mesh annotation method
CN115830642B (zh) * 2023-02-13 2024-01-12 粤港澳大湾区数字经济研究院（福田） 2D whole-body human keypoint annotation method and 3D human mesh annotation method
CN116714251A (zh) * 2023-05-16 2023-09-08 北京盈锋科技有限公司 Three-dimensional character printing system and method, electronic device, and storage medium
CN116714251B (zh) * 2023-05-16 2024-05-31 北京盈锋科技有限公司 Three-dimensional character printing system and method, electronic device, and storage medium
CN117392326A (zh) * 2023-11-09 2024-01-12 中国科学院自动化研究所 Three-dimensional human body reconstruction method based on a single image, and related device

Similar Documents

Publication Publication Date Title
Jiang et al. Deep learning-based face super-resolution: A survey
Jam et al. A comprehensive review of past and present image inpainting methods
Fieraru et al. Three-dimensional reconstruction of human interactions
CN112530019B (zh) Three-dimensional human body reconstruction method and apparatus, computer device and storage medium
WO2022120843A1 (zh) Three-dimensional human body reconstruction method and apparatus, computer device and storage medium
Han et al. Space-time representation of people based on 3D skeletal data: A review
Sigal Human pose estimation
Sharp et al. Accurate, robust, and flexible real-time hand tracking
Liu et al. A cross-modal adaptive gated fusion generative adversarial network for RGB-D salient object detection
Laraba et al. 3D skeleton‐based action recognition by representing motion capture sequences as 2D‐RGB images
Li et al. Grayscale-thermal object tracking via multitask laplacian sparse representation
Ahmad et al. Human action recognition using shape and CLG-motion flow from multi-view image sequences
Berretti et al. Representation, analysis, and recognition of 3D humans: A survey
Wang et al. Hmor: Hierarchical multi-person ordinal relations for monocular multi-person 3d pose estimation
Guo et al. Multiview cauchy estimator feature embedding for depth and inertial sensor-based human action recognition
Chen et al. Learning a deep network with spherical part model for 3D hand pose estimation
Arivazhagan et al. Human action recognition from RGB-D data using complete local binary pattern
Gao et al. Collaborative sparse representation leaning model for RGBD action recognition
Ong et al. Viewpoint invariant exemplar-based 3D human tracking
Lee et al. 3-D human behavior understanding using generalized TS-LSTM networks
Biswas et al. A new perceptual hashing method for verification and identity classification of occluded faces
Zhou et al. A study on attention-based LSTM for abnormal behavior recognition with variable pooling
Zhang et al. Deep learning-based real-time 3D human pose estimation
Asadi-Aghbolaghi et al. Supervised spatio-temporal kernel descriptor for human action recognition from RGB-depth videos
Chang et al. 2d–3d pose consistency-based conditional random fields for 3d human pose estimation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20964779

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20964779

Country of ref document: EP

Kind code of ref document: A1