WO2024103890A1 - Model construction method and apparatus, reconstruction method and apparatus, electronic device, and non-volatile readable storage medium - Google Patents


Info

Publication number
WO2024103890A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
human body
target
view
clothed
Application number
PCT/CN2023/114799
Other languages
English (en)
Chinese (zh)
Inventor
孙红岩
Original Assignee
苏州元脑智能科技有限公司
Application filed by 苏州元脑智能科技有限公司 filed Critical 苏州元脑智能科技有限公司
Publication of WO2024103890A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06T7/00 Image analysis
    • G06T7/90 Determination of colour characteristics
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Definitions

  • the present application relates to the field of deep learning technology, and in particular to a model building method, a reconstruction method, a device, an electronic device, and a non-volatile readable storage medium.
  • the present application provides a model building method, a reconstruction method, an apparatus, an electronic device and a non-volatile readable storage medium to solve the technical defect of the prior art that 3D reconstruction in complex scenes cannot be performed.
  • the present application provides a method for constructing a three-dimensional model of a clothed human body, comprising:
  • the initial SMPL model is trained based on the preset human posture image training data to obtain a trained target SMPL model
  • the initial front-view prediction model and the initial rear-view prediction model are trained to obtain the trained target front-view prediction model and the target rear-view prediction model, wherein the target front-view prediction model is used to construct a target front-view clothed human body 3D prediction model corresponding to the target three-dimensional voxel array, and the target rear-view prediction model is used to construct a target rear-view clothed human body 3D prediction model corresponding to the target three-dimensional voxel array, and the target three-dimensional voxel array is obtained by processing the preset human posture image training data through the target SMPL model;
  • the initial in-vivo and out-vivo recognition model is trained based on the target front-view prediction model and the target rear-view prediction model to obtain a trained target in-vivo and out-vivo recognition model, wherein the target in-vivo and out-vivo recognition model is used to distinguish sampling points located inside or outside the body in the target front-view clothed human body 3D prediction model and the target rear-view clothed human body 3D prediction model;
  • a three-dimensional model of a clothed human body is constructed based on the target SMPL model, the target front-view prediction model, the target rear-view prediction model, the target in-vivo and out-of-vivo recognition model and the image three-dimensional visualization model.
  • the three-dimensional model of the clothed human body is used to reconstruct the 3D model of the clothed human body corresponding to the clothed human body posture image data to be reconstructed.
  • the preset human body posture image training data includes 3D human body posture image training data and 2D human body posture image training data;
  • the initial SMPL model is trained based on the preset human posture image training data to obtain the trained target SMPL model, including:
  • the initial SMPL model is trained in the first stage based on the 3D human posture image training data to obtain a primary SMPL model
  • the primary SMPL model is trained in the second stage based on the 2D human posture image training data to obtain the trained target SMPL model.
  • a method for constructing a three-dimensional model of a clothed human body further includes:
  • a primary SMPL model is trained in the second stage based on 2D human posture image training data to obtain a trained target SMPL model, including:
  • the 2D regression loss between the 2D human pose image prediction data and the 2D human pose image training data is calculated, and the primary SMPL model is iteratively updated based on the 2D regression loss until the second stage of training is completed to obtain the trained target SMPL model.
  • the initial SMPL model is trained in the first stage based on 3D human posture image training data to obtain a primary SMPL model, including:
  • the initial 3D human posture image prediction data reconstructed by the initial SMPL model is obtained;
  • the initial SMPL model is iteratively updated based on the 3D regression loss until the first stage of training is completed to obtain a trained primary SMPL model.
  • the 3D regression loss is calculated based on the SMPL posture parameters, the SMPL morphological parameters, the global rotation parameters, the camera parameters and the initial 3D human posture image prediction data, with the calculation formula L1 = Lθ + Lβ + Lc, wherein:
  • Lθ is the 3D regression loss corresponding to the SMPL posture parameters;
  • Lβ is the 3D regression loss corresponding to the SMPL morphology parameters;
  • Lc is the 3D regression loss corresponding to the camera parameters.
  • a method for constructing a three-dimensional model of a clothed human body also includes:
  • L1 includes: Lθ, Lβ, and Lc, each taking the regression form Σ_i (y_i − f(x_i))^2, wherein:
  • y_i is the expected value under the corresponding parameters;
  • f(x_i) is the predicted value under the corresponding parameters.
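Under the definitions above, the combined 3D regression loss can be sketched as follows (a minimal numpy illustration assuming a sum-of-squared-errors form for each component loss; the function names are illustrative, not from the application):

```python
import numpy as np

def l2_loss(expected, predicted):
    # Sum of squared differences between expected values y_i and
    # predicted values f(x_i), matching the notation above.
    expected = np.asarray(expected, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sum((expected - predicted) ** 2))

def loss_3d_regression(pose, pose_pred, shape, shape_pred, cam, cam_pred):
    l_theta = l2_loss(pose, pose_pred)   # SMPL posture parameters
    l_beta = l2_loss(shape, shape_pred)  # SMPL morphology parameters
    l_c = l2_loss(cam, cam_pred)         # camera parameters
    return l_theta + l_beta + l_c        # L1 = L_theta + L_beta + L_c
```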
  • an initial front-view prediction model and an initial rear-view prediction model are trained based on a trained target SMPL model to obtain a trained target front-view prediction model and a trained target rear-view prediction model, including:
  • a predicted front-view voxel array and a predicted rear-view voxel array are decomposed from the predicted three-dimensional voxel array, and an initial front-view prediction model is trained based on the predicted front-view voxel array, and an initial rear-view prediction model is trained based on the predicted rear-view voxel array to obtain a trained target front-view prediction model and a target rear-view prediction model.
  • the method further comprises:
  • the calculation formula of the loss function of the model during training is L = Σ_i (y_i − f(x_i))^2, wherein y_i is the expected value and f(x_i) is the predicted value.
  • an initial front-view prediction model is trained based on a predicted front-view voxel array, including:
  • the initial front-view prediction model is trained based on the front-view clothed human body prediction images.
  • an initial rear-view prediction model is trained based on a predicted rear-view voxel array, including:
  • the initial rear-view prediction model is trained based on the rear-view clothed human body prediction images.
  • an initial in-vivo and out-vivo recognition model is trained based on a target front-view prediction model and a target rear-view prediction model to obtain a trained target in-vivo and out-vivo recognition model, including:
  • a 3D prediction model of a clothed human body in front view is estimated based on the target front view prediction model, and a 3D prediction model of a clothed human body in rear view is estimated based on the target rear view prediction model;
  • sampling points located inside or outside the body are respectively taken from the front-view clothed human body 3D prediction model and the rear-view clothed human body 3D prediction model to construct a sampling point training set;
  • the initial in-vivo and out-of-vivo recognition model is trained based on the sampling point training set to obtain a trained target in-vivo and out-of-vivo recognition model.
  • the structural units of the initial front-view prediction model and the initial rear-view prediction model are ResNet subnetworks
  • the ResNet subnetwork includes Conv convolution layer, BatchNorm normalization layer and Relu activation function layer.
  • the target body inside and outside recognition model is composed of an input layer, a first fully connected layer of 13 neurons, a second fully connected layer of 521 neurons, a third fully connected layer of 256 neurons, a fourth fully connected layer of 128 neurons, a fifth fully connected layer of 1 neuron and an output layer.
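The fully connected stack described above can be sketched as a forward pass (a structural illustration only: the weights are random placeholders, the hidden-layer ReLU activations and the +1/-1 thresholding are assumptions, and the 521-neuron width is quoted as written, though it may be a typo for 512):

```python
import numpy as np

# Layer widths as listed in the text: 13 -> 521 -> 256 -> 128 -> 1.
WIDTHS = [13, 521, 256, 128, 1]

def recognize_inside_outside(x, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    h = np.asarray(x, dtype=float)
    for i in range(len(WIDTHS) - 1):
        # Random placeholder weights, not trained parameters.
        w = rng.standard_normal((WIDTHS[i], WIDTHS[i + 1])) * 0.01
        h = h @ w
        if i < len(WIDTHS) - 2:
            h = np.maximum(h, 0.0)  # hidden-layer ReLU (assumed)
    # Threshold the single output neuron to an inside/outside label.
    return np.where(h >= 0.0, 1.0, -1.0)
```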
  • the present application also provides a method for three-dimensional reconstruction of a clothed human body, comprising:
  • the three-dimensional model of a clothed human body is obtained based on any of the above three-dimensional model construction methods of a clothed human body.
  • the clothed human body three-dimensional model includes a target SMPL model, a target front-view prediction model, a target rear-view prediction model, a target in-vivo and out-of-vivo recognition model, and an image three-dimensional visualization model;
  • Inputting the to-be-reconstructed clothed human body posture image data into the clothed human body three-dimensional model, and obtaining the clothed human body 3D model output by the clothed human body three-dimensional model comprises:
  • Decomposing a target front-view voxel array and a target rear-view voxel array from a target three-dimensional voxel array; inputting the target front-view voxel array into the target front-view prediction model to obtain a target front-view clothed human body 3D model output by the target front-view prediction model, and inputting the target rear-view voxel array into the target rear-view prediction model to obtain a target rear-view clothed human body 3D model output by the target rear-view prediction model;
  • Inputting each front-view coordinate point, the color value of each front-view coordinate point, each rear-view coordinate point, the color value of each rear-view coordinate point and the SDF value of each 3D coordinate point into the target in-vivo and out-of-vivo recognition model, and obtaining the in-vivo and out-of-vivo recognition results of each 3D coordinate point output by the target in-vivo and out-of-vivo recognition model;
  • the in-vivo and out-vivo recognition results are input into the image three-dimensional visualization model to obtain a 3D model of a clothed human body output by the image three-dimensional visualization model.
  • the present application also provides a device for constructing a three-dimensional model of a clothed human body, comprising:
  • the first training unit is configured to train the initial SMPL model based on preset human posture image training data to obtain a trained target SMPL model;
  • the second training unit is configured to train the initial front-view prediction model and the initial rear-view prediction model based on the trained target SMPL model to obtain the trained target front-view prediction model and the target rear-view prediction model, wherein the target front-view prediction model is used to construct a target front-view clothed human body 3D prediction model corresponding to the target three-dimensional voxel array, and the target rear-view prediction model is used to construct a target rear-view clothed human body 3D prediction model corresponding to the target three-dimensional voxel array, and the target three-dimensional voxel array is obtained by processing the preset human posture image training data through the target SMPL model;
  • the third training unit is configured to train the initial in-vivo and out-of-vivo recognition model based on the target front-view prediction model and the target rear-view prediction model to obtain a trained target in-vivo and out-of-vivo recognition model, wherein the target in-vivo and out-of-vivo recognition model is used to distinguish sampling points located inside or outside the body in the target front-view clothed human body 3D prediction model and the target rear-view clothed human body 3D prediction model;
  • the construction unit is configured to construct a three-dimensional model of a clothed human body based on a target SMPL model, a target front-view prediction model, a target rear-view prediction model, a target in-vivo and out-of-vivo recognition model, and an image three-dimensional visualization model, wherein the three-dimensional model of the clothed human body is used to reconstruct a 3D model of a clothed human body corresponding to the clothed human body posture image data to be reconstructed.
  • the present application also provides a three-dimensional reconstruction device for a clothed human body, comprising:
  • a determination unit configured to determine the clothed human body posture image data to be reconstructed
  • a reconstruction unit is configured to input the to-be-reconstructed clothed human body posture image data into the clothed human body three-dimensional model to obtain a clothed human body 3D model output by the clothed human body three-dimensional model;
  • the three-dimensional model of a clothed human body is obtained based on any of the above three-dimensional model construction methods of a clothed human body.
  • the present application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program, it implements any one of the above-mentioned methods for constructing a three-dimensional model of a clothed human body or any one of the above-mentioned methods for reconstructing a three-dimensional clothed human body.
  • the present application also provides a non-volatile readable storage medium having a computer program stored thereon, which, when executed by a processor, implements any of the above-mentioned methods for constructing a three-dimensional model of a clothed human body or any of the above-mentioned methods for reconstructing a three-dimensional clothed human body.
  • the present application also provides a computer program product, including a computer program, which, when executed by a processor, implements any of the above-mentioned methods for constructing a three-dimensional model of a clothed human body or any of the above-mentioned methods for reconstructing a three-dimensional clothed human body.
  • the model construction method, reconstruction method, device, electronic device and non-volatile readable storage medium include training an initial SMPL model based on preset human posture image training data to obtain a trained target SMPL model, training an initial front view prediction model and an initial rear view prediction model based on the trained target SMPL model to obtain a trained target front view prediction model and a target rear view prediction model, training an initial in-vivo and out-of-vivo recognition model based on the target front view prediction model and the target rear view prediction model to obtain a trained target in-vivo and out-of-vivo recognition model, and finally constructing a three-dimensional model of a clothed human body based on the target SMPL model, the target front view prediction model, the target rear view prediction model, the target in-vivo and out-of-vivo recognition model and the image three-dimensional visualization model.
  • the three-dimensional model of a clothed human body constructed in this way includes dimensional feature recognition of multiple different levels of SMPL parameter dimension, front view dimension, rear view dimension and internal and external point dimension of the human body surface, and the three-dimensional model of a clothed human body constructed thereby can solve the interference of the relative overlapping and penetration phenomenon of the human body in complex scenes with multiple people, and thus can restore the model reconstruction of the clothed human body in complex scenes with multiple people.
  • FIG. 1 is a flow chart of a method for training a human body 3D reconstruction model provided by the present application;
  • FIG. 2 is a schematic diagram of a model framework based on a ResNet subnetwork as a structural unit provided by the present application;
  • FIG. 3 is a schematic diagram of the structure of a ResNet subnetwork provided by the present application;
  • FIG. 4 is a schematic flow chart of a method for three-dimensional reconstruction of a clothed human body provided by the present application;
  • FIG. 5 is a schematic diagram of the structure of a human body 3D reconstruction model training device provided by the present application;
  • FIG. 6 is a schematic structural diagram of a three-dimensional reconstruction device for a clothed human body provided by the present application;
  • FIG. 7 is a schematic diagram of the structure of an electronic device provided in the present application.
  • this embodiment provides a method for constructing a three-dimensional model of a clothed human body.
  • FIG. 1 is a flow chart of the method for constructing a three-dimensional model of a clothed human body provided in an embodiment of the present application.
  • the method mainly comprises the following steps:
  • Step 101 training an initial SMPL model based on preset human posture image training data to obtain a trained target SMPL model
  • the SMPL (Skinned Multi-Person Linear) model is a model that uses shape parameters and posture parameters to encode the human body.
  • the input parameters of the initial SMPL model are divided into posture parameters and body shape parameters, wherein the posture parameters include 23*3 joint points and 3 root joint points, and the body shape parameters include 10 parameters, including height, weight, head-to-body ratio, etc.
  • the output includes SMPL posture parameters, SMPL morphological parameters, global rotation parameters, and camera parameters.
  • a reconstructed three-dimensional human body network under the posture parameters and body shape parameters is obtained.
  • the model parameters of the initial SMPL model can be iteratively updated according to the predicted position and actual position of each sampling point after reconstruction until the model parameters converge, thereby obtaining a trained target SMPL model.
  • the preset human posture image training data can be obtained from the Human36M dataset.
  • the Human36M dataset is obtained, and then at least one of random scale transformation, random rotation, and random color transformation is used to process the images in the Human36M dataset to obtain processed images, and the images before and after processing together constitute the preset human posture image training data.
  • the number of samples selected for one training pass of the model is set to a batch size of 64, and training is performed using the Adam optimizer with an initial learning rate of 10^-4.
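The data preparation and training settings above can be sketched as follows (a hedged illustration: the transform names, function names and dictionary layout are placeholders rather than the application's actual pipeline; only the batch size of 64, the Adam optimizer and the 10^-4 initial learning rate come from the text):

```python
import random

# Placeholder names for the listed random transforms.
AUGMENTATIONS = ("random_scale", "random_rotation", "random_color")

def build_training_data(images, rng=None):
    rng = rng if rng is not None else random.Random(0)
    data = list(images)  # images before processing (Human36M)
    for img in images:
        # At least one random transform produces a processed copy;
        # the (name, image) pair stands in for the transformed image.
        data.append((rng.choice(AUGMENTATIONS), img))
    return data

TRAIN_CONFIG = {
    "batch_size": 64,       # samples selected for one training step
    "optimizer": "Adam",
    "learning_rate": 1e-4,  # initial learning rate 10^-4
}
```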
  • Step 102 training an initial front-view prediction model and an initial back-view prediction model based on the trained target SMPL model to obtain a trained target front-view prediction model and a trained target back-view prediction model;
  • the front view prediction model refers to a model that estimates the position and color of the coordinate points of the sampling points in the front view direction of the human body three-dimensional model
  • the rear view prediction model refers to a model that estimates the position and color of the coordinate points of the sampling points in the rear view direction of the human body three-dimensional model.
  • the target three-dimensional voxel array can be obtained through the target SMPL model.
  • the target front-view prediction model can be used to construct a target front-view clothed human body 3D prediction model corresponding to the target three-dimensional voxel array
  • the target rear-view prediction model can be used to construct a target rear-view clothed human body 3D prediction model corresponding to the target three-dimensional voxel array, so as to extract the features in the front-view dimension and the rear-view dimension in the preset human posture image training data.
  • the structural units of the initial front view prediction model and the initial rear view prediction model in the present embodiment are ResNet (Residual Network) sub-networks.
  • Figure 2 is a model framework based on ResNet sub-networks as structural units proposed in the present embodiment.
  • the front view prediction model is used as a representative for explanation.
  • when the feature data is input into the front-view prediction model, it is first processed once by the ResNet sub-network to obtain the first processing result.
  • the first processing result is processed again by the ResNet sub-network to obtain the second processing result.
  • the second processing result is processed by two consecutive ResNet sub-networks to obtain the third processing result.
  • the second processing result and the third processing result are fused to obtain the first fused result.
  • the first fused result is processed by two consecutive ResNet sub-networks to obtain the fourth processing result.
  • the fourth processing result and the first processing result continue to be fused to obtain the second fused result.
  • the second fused result is processed by two consecutive ResNet sub-networks to obtain the fifth processing result.
  • the fifth processing result is fused with the input feature data to obtain the final processing result that the model needs to output.
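The processing and fusion sequence above can be sketched as follows (a hedged outline in which `subnet` stands in for one ResNet sub-network pass; element-wise addition is assumed for the fusion step, which the text does not specify):

```python
def framework(x, subnet, fuse=lambda a, b: a + b):
    p1 = subnet(x)             # first processing result
    p2 = subnet(p1)            # second processing result
    p3 = subnet(subnet(p2))    # third: two consecutive sub-network passes
    f1 = fuse(p2, p3)          # first fused result
    p4 = subnet(subnet(f1))    # fourth processing result
    f2 = fuse(p4, p1)          # second fused result
    p5 = subnet(subnet(f2))    # fifth processing result
    return fuse(p5, x)         # final result fused with the input features
```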
  • the ResNet sub-network includes a Conv (Convolutional Layer) convolution layer, a BatchNorm (Batch Normalization) normalization layer and a Relu (Rectified Linear Unit) activation function layer.
  • the feature data first passes through a Conv convolution layer with a parameter value of 3*1*1 (i.e., 3 input channels, 1 output channel, and 1 convolution kernel), a BatchNorm normalization layer, and a Relu activation function layer to obtain the first result data. It then passes through a Conv convolution layer with a parameter value of 1*3*1 (i.e., the input channel is 1, the output channel is 3, and the convolution kernel is 1), a BatchNorm normalization layer, and a Relu activation function layer in sequence to obtain the second result data.
  • the second result data continues to pass through a Conv convolution layer with a parameter value of 1*1*3 (i.e., the input channel is 1, the output channel is 1, and the convolution kernel is 3), a BatchNorm normalization layer, and a Relu activation function layer in sequence to obtain the third result data.
  • finally, the three result data are combined to obtain the result data output by the ResNet subnetwork.
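The factorized convolution sequence above can be sketched as follows (a minimal numpy illustration that treats the 3*1*1, 1*3*1 and 1*1*3 layers as 'same'-padded 1D convolutions along each axis of a volume; BatchNorm is omitted, and the kernel weights and the summation-based combination are illustrative assumptions):

```python
import numpy as np

def conv1d_along_axis(x, kernel, axis):
    # 'same'-padded 1D convolution applied along one axis of a 3D volume,
    # standing in for the factorized Conv layers described above.
    pad = len(kernel) // 2
    x = np.moveaxis(np.asarray(x, dtype=float), axis, -1)
    xp = np.pad(x, [(0, 0)] * (x.ndim - 1) + [(pad, pad)])
    out = np.zeros_like(x)
    n = x.shape[-1]
    for i, k in enumerate(kernel):
        out += k * xp[..., i:i + n]
    return np.moveaxis(out, -1, axis)

def resnet_subunit(x, kernel=(0.25, 0.5, 0.25)):
    relu = lambda v: np.maximum(v, 0.0)
    r1 = relu(conv1d_along_axis(x, kernel, axis=0))   # first result data
    r2 = relu(conv1d_along_axis(r1, kernel, axis=1))  # second result data
    r3 = relu(conv1d_along_axis(r2, kernel, axis=2))  # third result data
    return r1 + r2 + r3  # the three result data combined (summed here)
```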
  • Step 103 training the initial in-vivo and out-vivo recognition model based on the target front-view prediction model and the target rear-view prediction model to obtain a trained target in-vivo and out-vivo recognition model;
  • the initial in-vivo and out-of-vivo recognition model refers to a model used to distinguish whether a sampling point is outside the human body surface or inside the human body surface, and its output result is +1 or -1.
  • when the result is +1, it indicates that the sampling point is outside the human body surface, and when the result is -1, it indicates that the sampling point is inside the human body surface, thereby enabling reconstruction of the three-dimensional model of the clothed human body.
  • the target in-vivo and out-of-vivo recognition model is used to distinguish the sampling points located inside or outside the body in the target front-view 3D prediction model of the clothed human body and the target rear-view 3D prediction model of the clothed human body, thereby eliminating, through the features in the dimension of points inside and outside the human body surface, the interference of overlapping and penetration between human bodies on the reconstruction of the clothed human body model.
  • the target in-vivo and out-of-vivo recognition model in this embodiment is composed of an input layer, a first fully connected layer of 13 neurons, a second fully connected layer of 521 neurons, a third fully connected layer of 256 neurons, a fourth fully connected layer of 128 neurons, a fifth fully connected layer of 1 neuron, and an output layer.
  • Step 104 constructing a three-dimensional model of a clothed human body based on the target SMPL model, the target front-view prediction model, the target rear-view prediction model, the target in-body and out-body recognition model, and the image three-dimensional visualization model.
  • since the target body inside and outside recognition model can only distinguish whether a sampling point is outside or inside the human body surface, in order to reconstruct a complete three-dimensional model of the clothed human body in this embodiment, once the positional relationship between each sampling point and the human body surface is known, it is processed by the image three-dimensional visualization model to construct a 3D (three-dimensional) model of the clothed human body corresponding to the clothed human body posture image data to be reconstructed.
  • the three-dimensional visualization model of the image in this embodiment is a marching cube algorithm, wherein the marching cube algorithm is a voxel-level reconstruction method, also known as an isosurface extraction algorithm.
  • the marching cube algorithm first divides the space into a large number of hexahedral grids. Since the above four models can be used to obtain the positional relationship between each sampling point and the human body surface, that is, the spatial field value of these points in space, the three-dimensional model of the clothed human body can be reconstructed according to the spatial field values of these points in space and the large number of divided hexahedral grids.
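The spatial field values consumed by the marching cube algorithm can be illustrated as follows (a hedged sketch using a signed distance to a sphere as a stand-in for the body-surface field; the isosurface extraction call shown in the comment assumes scikit-image is available):

```python
import numpy as np

# Build a spatial field on a voxel grid: signed distance to a sphere of
# radius 0.5, negative inside the surface and positive outside.
n = 16
axis = np.linspace(-1.0, 1.0, n)
gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
sdf = np.sqrt(gx**2 + gy**2 + gz**2) - 0.5

# With scikit-image available, the isosurface could then be extracted:
#   from skimage.measure import marching_cubes
#   verts, faces, normals, values = marching_cubes(sdf, level=0.0)
inside = sdf < 0  # voxels inside the surface analogue
```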
  • the target SMPL model in the constructed three-dimensional model of the clothed human body can reconstruct the posture characteristics and body shape characteristics of the clothed human body to be reconstructed in the image.
  • the target front-view prediction model and the target rear-view prediction model can reconstruct the sampling points of the clothed human body to be reconstructed and the color characteristics of each sampling point, thereby distinguishing the position information of each sampling point according to the color characteristics.
  • the target in-body and out-body recognition model can determine whether the sampling point is outside or inside the human body surface.
  • the interference of the relative overlapping and penetration phenomenon of the human body in a complex scene with multiple people can be solved, and then the judgment result is processed by the image three-dimensional visualization model to reconstruct the three-dimensional model of the clothed human body in the complex scene with multiple people.
  • the three-dimensional model of the clothed human body constructed by the method for constructing the three-dimensional model of the clothed human body proposed in this embodiment includes dimensional feature recognition at multiple different levels of SMPL parameter dimension, front view dimension, rear view dimension, and inner and outer point dimension of the human body surface.
  • the three-dimensional model of the clothed human body constructed can solve the interference of the relative overlapping and penetration phenomenon of the human body in complex scenes with multiple people, and can restore the model reconstruction of the clothed human body in complex scenes with multiple people.
  • the preset human body posture image training data includes 3D human body posture image training data and 2D (Two-Dimensional) human body posture image training data;
  • the initial SMPL model is trained based on the preset human posture image training data to obtain the trained target SMPL model, including:
  • the initial SMPL model is trained in the first stage based on the 3D human posture image training data to obtain a primary SMPL model
  • the primary SMPL model is trained in the second stage based on the 2D human posture image training data to obtain the trained target SMPL model.
  • 3D human body posture image training data and 2D human body posture image training data are used to optimize it.
  • 3D human pose image training data is obtained from the Human36M dataset
  • 2D human pose image training data is obtained from the MPII (Max Planck Institute for Informatics) dataset and the MS COCO (Microsoft Common Objects in Context) dataset.
  • the MS COCO dataset is a large and rich object detection, segmentation and captioning dataset;
  • the MPII dataset is a benchmark for human pose estimation. Therefore, in this embodiment, the primary SMPL model can be trained again using 2D human pose image training data extracted from the MPII dataset and the MS COCO dataset, compensating for the poor model convergence caused by the scarcity of 3D human pose image training data. Enriching the model training data in this way allows the trained target SMPL model to converge sufficiently, making the subsequent results more accurate.
  • the primary SMPL model is trained in the second stage based on the 2D human posture image training data to obtain a trained target SMPL model, including:
  • the 2D regression loss between the 2D human pose image prediction data and the 2D human pose image training data is calculated, and the primary SMPL model is iteratively updated based on the 2D regression loss until the second stage of training is completed to obtain the trained target SMPL model.
  • primary 3D human body posture image prediction data (i.e., joint 3D coordinates) is obtained through the primary SMPL model, where the prediction is an SMPL estimate computed under the SMPL posture parameters, SMPL morphological parameters and camera parameters of the current primary SMPL model.
  • the joint 3D coordinates obtained in the current training step are first mapped to joint 2D coordinates through the orthogonal (weak-perspective) projection formula, using the camera parameters and the global rotation parameter of the current training step, and the 2D regression loss is then calculated on the mapped joint 2D coordinates: J 2D = s·Π(R g ·J 3D ) + t, where Π denotes orthographic projection onto the image plane
  • J 2D is the joint 2D coordinate (i.e., the 2D human posture image prediction data)
  • s is the image plane scaling corresponding to the camera parameters
  • R g is the global rotation parameter
  • J 3D is the joint 3D coordinate (i.e., the 3D human posture image prediction data)
  • t is the image plane translation corresponding to the camera parameters.
  • the 3D posture information is projected to 2D coordinate points through a projection formula, so that the 2D coordinate data set can be applied to 3D reconstruction to optimize the SMPL model and pixel alignment operations, thereby more accurately restoring the model reconstruction of the clothed human body in complex scenes with multiple people.
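As a minimal sketch of the weak-perspective projection described above (the function name, array shapes, and the use of a rotation matrix for the global rotation parameter are illustrative assumptions, not the patent's implementation):

```python
import numpy as np

def project_joints(J3D, s, R_g, t):
    """Weak-perspective projection of 3D joints to the image plane.

    J3D : (N, 3) joint 3D coordinates
    s   : scalar image-plane scaling (camera parameter)
    R_g : (3, 3) global rotation matrix
    t   : (2,) image-plane translation (camera parameter)
    """
    rotated = J3D @ R_g.T          # apply the global rotation
    J2D = s * rotated[:, :2] + t   # orthographic drop of z, then scale and translate
    return J2D
```

With an identity rotation, scaling 2 and translation (0.5, 0.5), the joint (1, 2, 3) maps to (2.5, 4.5): the depth coordinate is discarded and only the image-plane coordinates are scaled and shifted.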
  • the initial SMPL model is trained in the first stage based on the 3D human posture image training data to obtain a primary SMPL model, including:
  • the initial 3D human posture image prediction data reconstructed by the initial SMPL model is obtained;
  • the initial SMPL model is iteratively updated based on the 3D regression loss until the first stage of training is completed to obtain a trained primary SMPL model.
  • the 3D human posture image training data is first convolved and pooled to form early image features, and then processed through four Conv convolutional layers in the ResNet-50 (Residual Network-50) network to extract image features, thereby obtaining a combined feature F′, where the combined feature F′ is a matrix F′ ⁇ R 2048 ⁇ 8 ⁇ 8 .
  • the combined feature F′ is then processed by a Conv convolution layer with 15×8 = 120 output channels to generate a 120×8×8 matrix, which is processed by the reshape module, the soft-argmax module and the grid-sample module to generate the 3D pose P^(3D).
  • the combined feature F′ is also processed by the grid-sample module to form a J c ×C matrix, which is concatenated with the posture coordinate confidence to form an augmented matrix; this matrix finally passes through the graph convolutional neural network and the four MLP networks to output the SMPL posture parameters, SMPL morphological parameters, global rotation parameter and camera parameters.
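The soft-argmax step mentioned above turns a discrete heatmap maximum into a differentiable coordinate estimate by taking the expectation of coordinates under a softmax. A minimal 1-D sketch (the function name and shapes are illustrative assumptions):

```python
import numpy as np

def soft_argmax_1d(logits, coords):
    """Differentiable argmax: expectation of coordinates under a softmax."""
    p = np.exp(logits - logits.max())  # stable softmax
    p /= p.sum()
    return float((p * coords).sum())
```

For a heatmap strongly peaked at index 2, the result is very close to 2.0, but unlike a hard argmax the output varies smoothly with the logits, so gradients can flow through it during training.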
  • the graph convolutional network calculation formula in this embodiment is: F i+1 = σ ReLU (σ BN (Ā·F i ·W i ))
  • Ā = D^(-1/2)·(A+I)·D^(-1/2) is the normalized adjacency matrix, whose value at (j, i) weights the contribution of joint j to joint i
  • A is the adjacency matrix established according to the skeleton hierarchy
  • D is the degree matrix of (A+I)
  • I is the identity matrix
  • ⁇ ReLU is the linear rectification function
  • ⁇ BN is the batch normalization function
  • Wi is the weight of the network.
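The graph-convolution step above can be sketched as follows (batch normalization is omitted for brevity, and the function signature is my own assumption):

```python
import numpy as np

def gcn_layer(F, A, W):
    """One graph-convolution step: normalized adjacency, linear map, ReLU.

    F : (J, C_in) node features, one row per skeleton joint
    A : (J, J) adjacency matrix built from the skeleton hierarchy
    W : (C_in, C_out) layer weights
    """
    J = A.shape[0]
    A_hat = A + np.eye(J)                     # add self-loops: A + I
    d = A_hat.sum(axis=1)                     # node degrees of (A + I)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # D^(-1/2)
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # normalized adjacency
    return np.maximum(A_norm @ F @ W, 0.0)    # ReLU activation
```

The normalization by D^(-1/2) on both sides keeps feature magnitudes stable regardless of how many neighbours each joint has in the skeleton graph.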
  • SMPL estimation can be performed to obtain the estimated initial 3D human posture image prediction data.
  • the calculation formula for calculating the 3D regression loss based on the SMPL posture parameters, SMPL morphology parameters, global rotation parameters, camera parameters and initial 3D human posture image prediction data is: L 3D = L θ + L β + L c , where each component loss takes the regression form L = Σ i ‖y i − f(x i )‖²
  • L ⁇ is the 3D regression loss corresponding to the SMPL posture parameters
  • L ⁇ is the 3D regression loss corresponding to the SMPL morphology parameters
  • L c is the 3D regression loss corresponding to the camera parameters.
  • yi is the expected value
  • f( xi ) is the predicted value
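A hedged sketch of the loss combination described above, assuming each component is a squared-error regression term between the expected value y i and the predicted value f(x i ) (the patent does not reproduce the formula, so the exact norm and weighting are assumptions):

```python
import numpy as np

def l2_loss(y, f_x):
    """Squared-error regression loss between expected and predicted values."""
    y, f_x = np.asarray(y, dtype=float), np.asarray(f_x, dtype=float)
    return float(((y - f_x) ** 2).sum())

def total_3d_loss(targets, preds):
    """Sum the per-parameter-group losses (pose, shape, camera, ...)."""
    return sum(l2_loss(targets[k], preds[k]) for k in targets)
```

For example, with pose targets [1, 2] predicted as [1, 1], shape target [0] predicted as [1], and a matching camera term, the total loss is 1 + 1 + 0 = 2.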
  • the initial front-view prediction model and the initial back-view prediction model are trained based on the trained target SMPL model to obtain the trained target front-view prediction model and the trained target back-view prediction model, including:
  • a predicted front-view voxel array and a predicted rear-view voxel array are decomposed from the predicted three-dimensional voxel array, and an initial front-view prediction model is trained based on the predicted front-view voxel array, and an initial rear-view prediction model is trained based on the predicted rear-view voxel array to obtain a trained target front-view prediction model and a target rear-view prediction model.
  • the target SMPL model outputs a predicted 3D model of a clothed human body (i.e., a three-dimensional human body mesh), and then the generated three-dimensional human body mesh is voxelized to generate a predicted front-view voxel array and a predicted rear-view voxel array, respectively.
  • the predicted front-view voxel array refers to a voxel array composed of sampling points in the front-view direction of the three-dimensional human body model
  • the predicted rear-view voxel array refers to a voxel array composed of sampling points in the rear-view direction of the three-dimensional human body model.
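A simple illustrative decomposition of a voxel grid into front-view and rear-view halves along the viewing axis (the patent's actual decomposition may differ, e.g. by extracting first-hit surface voxels per view; this sketch only shows the splitting idea):

```python
import numpy as np

def split_front_back(voxels):
    """Split a (D, H, W) occupancy grid into front- and rear-view halves
    along the depth (viewing) axis."""
    depth = voxels.shape[0]
    front = voxels[: depth // 2]   # sampling points facing the camera
    back = voxels[depth // 2:]     # sampling points away from the camera
    return front, back
```

Concatenating the two halves along the depth axis recovers the original grid, so no voxel information is lost by the decomposition.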
  • training data is extracted from the AGORA dataset and the THuman dataset for training, wherein the AGORA dataset is a 3D real human model dataset containing approximately 7,000 models, and thus the two models are trained using the data in this dataset to optimize the model training results.
  • training an initial front-view prediction model based on the predicted front-view voxel array includes:
  • the initial front-view prediction model is trained based on the front-view clothed human body prediction images.
  • a preset differential renderer can be used for training.
  • the network is trained by regressing the rendered image against the original image; after the network weights are obtained, the preset differential renderer is removed.
  • the preset differential renderer is a differentiable mesh renderer whose input is the 3D vertex coordinates and the vertex IDs contained in each triangle facet, and whose output is the triangle facet ID corresponding to each pixel of the rendered image together with the barycentric weights of the three vertices of that facet.
  • the renderer also provides the derivative of the pixel barycentric weights with respect to the vertex positions.
  • training an initial backsight prediction model based on a predicted backsight voxel array includes:
  • the initial rear-view prediction model is trained based on the rear-view clothed human body prediction images.
  • the preset differential renderer is a differentiable mesh renderer whose input is the 3D vertex coordinates and the vertex IDs contained in each triangle facet, and whose output is the triangle facet ID corresponding to each pixel of the rendered image together with the barycentric weights of the three vertices of that facet.
  • the renderer also provides the derivative of the pixel barycentric weights with respect to the vertex positions.
  • the initial in-vivo and out-of-vivo recognition model is trained based on the target front-view prediction model and the target rear-view prediction model to obtain a trained target in-vivo and out-of-vivo recognition model, including:
  • a 3D prediction model of a clothed human body in front view is estimated based on the target front view prediction model, and a 3D prediction model of a clothed human body in rear view is estimated based on the target rear view prediction model;
  • sampling points located inside or outside the body are respectively taken from the front-view clothed human body 3D prediction model and the rear-view clothed human body 3D prediction model to construct a sampling point training set;
  • the initial in-vivo and out-of-vivo recognition model is trained based on the sampling point training set to obtain a trained target in-vivo and out-of-vivo recognition model.
  • the front-view clothed human body 3D prediction model and the rear-view clothed human body 3D prediction model are both 3D human body models in a three-dimensional grid, wherein the 3D human body model in the three-dimensional grid includes not only the information of each coordinate point, but also the color value of each coordinate point, wherein the color value of the coordinate point corresponds to the color value of the clothes of the clothed human body.
  • sampling points can be randomly selected around the 3D human body model in each three-dimensional grid for training, wherein the selected sampling points have both coordinate information and color value information of the point, which can be used to train the initial in-vivo and out-of-vivo recognition model to distinguish whether the sampling points are outside or inside the human body surface.
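The sampling-set construction above can be sketched as follows. A sphere stands in for the clothed human body mesh so the inside/outside labelling logic is easy to verify; the function name, the proxy shape, and the sampling range are my own assumptions:

```python
import numpy as np

def build_sample_set(center, radius, n=1000, seed=0):
    """Label random points as inside (1) or outside (0) a spherical body proxy.

    Each training sample carries both coordinate information and a per-point
    color value, as described for the in-vivo/out-of-vivo recognition model.
    """
    rng = np.random.default_rng(seed)
    pts = center + rng.uniform(-2 * radius, 2 * radius, size=(n, 3))
    colors = rng.uniform(0.0, 1.0, size=(n, 3))  # per-point color values
    labels = (np.linalg.norm(pts - center, axis=1) <= radius).astype(int)
    return np.hstack([pts, colors]), labels
```

Each feature row is 6-dimensional (3 coordinates + 3 color channels), matching the description that sampling points carry both coordinate and color information.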
  • this embodiment further proposes a method for 3D reconstruction of a clothed human body.
  • FIG4 is one of the flow charts of the method for 3D reconstruction of a clothed human body provided by this application. As shown in FIG4 , the method comprises:
  • Step 401 determining the clothed human body posture image data to be reconstructed
  • Step 402 inputting the to-be-reconstructed clothed human body posture image data into the clothed human body three-dimensional model, and obtaining a clothed human body 3D model output by the clothed human body three-dimensional model;
  • the three-dimensional model of a clothed human body is obtained based on the method for constructing a three-dimensional model of a clothed human body in any of the above embodiments.
  • the three-dimensional model of the clothed human body obtained by the above-mentioned method for constructing the three-dimensional model of the clothed human body can be applied to the three-dimensional reconstruction of the clothed human body, and the posture image data of the clothed human body to be reconstructed is input into the trained three-dimensional model of the clothed human body to obtain the reconstruction result output by the three-dimensional model of the clothed human body.
  • the three-dimensional model of a clothed human body includes a target SMPL model, a target front-view prediction model, a target rear-view prediction model, a target body and exterior recognition model, and an image three-dimensional visualization model;
  • decomposing a target front-view voxel array and a target rear-view voxel array from the target three-dimensional voxel array; inputting the target front-view voxel array into the target front-view prediction model to obtain the target front-view clothed human body 3D model output by it, and inputting the target rear-view voxel array into the target rear-view prediction model to obtain the target rear-view clothed human body 3D model output by it;
  • inputting each front-view coordinate point, the color value of each front-view coordinate point, each rear-view coordinate point, the color value of each rear-view coordinate point and the SDF value of each 3D coordinate point into the target in-vivo and out-of-vivo recognition model, and obtaining the in-vivo and out-of-vivo recognition result of each 3D coordinate point output by the model;
  • the in-vivo and out-vivo recognition results are input into the image three-dimensional visualization model to obtain a 3D model of a clothed human body output by the image three-dimensional visualization model.
  • the SDF value refers to the signed distance field value, which represents the distance of each point from the surface: outside the surface the value is positive and grows with distance; inside the surface the value is negative and becomes more negative with distance.
  • the calculation method of the SDF value in this embodiment is consistent with that in the prior art, and will not be repeated here.
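The sign convention above is easy to see on a simple analytic shape. For a sphere, the signed distance is the distance to the center minus the radius (this is a standard textbook example, not the patent's SDF computation):

```python
import numpy as np

def sphere_sdf(p, center, radius):
    """Signed distance to a sphere: positive outside, negative inside, zero on the surface."""
    return float(np.linalg.norm(np.asarray(p, float) - np.asarray(center, float)) - radius)
```

A point two units from the center of a unit sphere has SDF +1 (outside), the center itself has SDF -1 (inside), and a point on the surface has SDF 0.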
  • the clothed human body 3D reconstruction method proposed in this embodiment obtains a reconstructed clothed human body 3D model by inputting the posture image data of the clothed human body to be reconstructed into the clothed human body 3D model. Since the clothed human body 3D model includes feature recognition at multiple different levels (the SMPL parameter dimension, the front-view dimension, the rear-view dimension and the inside/outside-point dimension of the human body surface), the model can restore clothed human body 3D model reconstruction in complex scenes with multiple people.
  • the following is a description of the device for constructing a three-dimensional model of a clothed human body provided in the present application.
  • the device for constructing a three-dimensional model of a clothed human body described below and the method for constructing a three-dimensional model of a clothed human body described above can be referenced to each other.
  • an embodiment of the present application provides a device for constructing a three-dimensional model of a clothed human body, the device comprising: a first training unit 510 , a second training unit 520 , a third training unit 530 and a construction unit 540 .
  • the first training unit 510 is configured to train the initial SMPL model based on the preset human posture image training data to obtain a trained target SMPL model;
  • the second training unit 520 is configured to train the initial front view prediction model and the initial rear view prediction model based on the trained target SMPL model to obtain a trained target front view prediction model and a target rear view prediction model, wherein the target front view prediction model is used to construct a target front view clothed human body 3D prediction model corresponding to the target three-dimensional voxel array, the target rear view prediction model is used to construct a target rear view clothed human body 3D prediction model corresponding to the target three-dimensional voxel array, and the target three-dimensional voxel array is obtained by processing the preset human body posture image training data with the target SMPL model;
  • the third training unit 530 is configured to train the initial in-vivo and out-of-vivo recognition model based on the target front-view prediction model and the target rear-view prediction model to obtain a trained target in-vivo and out-of-vivo recognition model, wherein the target in-vivo and out-of-vivo recognition model is used to distinguish the sampling points located inside or outside the body in the target front-view clothed human body 3D prediction model and the target rear-view clothed human body 3D prediction model;
  • the construction unit 540 is configured to construct a clothed human body 3D model based on the target SMPL model, the target front-view prediction model, the target rear-view prediction model, the target in-vivo and out-of-vivo recognition model and the image 3D visualization model, wherein the clothed human body 3D model is used to reconstruct the clothed human body 3D model corresponding to the clothed human body posture image data to be reconstructed.
  • the first training unit 510 is further configured to perform a first-stage training on the initial SMPL model based on the 3D human posture image training data to obtain a primary SMPL model; and perform a second-stage training on the primary SMPL model based on the 2D human posture image training data to obtain a trained target SMPL model.
  • the first training unit 510 is further configured to input the 2D human pose image training data into the primary SMPL model to obtain primary 3D human pose image prediction data output by the primary SMPL model; obtain camera parameters and global rotation parameters corresponding to the primary 3D human pose image prediction data, and map the primary 3D human pose image prediction data to 2D human pose image prediction data based on the camera parameters and the global rotation parameters; calculate the 2D regression loss between the 2D human pose image prediction data and the 2D human pose image training data, and iteratively update the primary SMPL model based on the 2D regression loss until the second stage of training is completed to obtain a trained target SMPL model.
  • the first training unit 510 is further configured to input the 3D human pose image training data into the initial SMPL model, obtain the SMPL pose parameters, SMPL morphological parameters, global rotation parameters and camera parameters output by the initial SMPL model; obtain the initial 3D human pose image prediction data reconstructed by the initial SMPL model based on the SMPL pose parameters, SMPL morphological parameters, global rotation parameters and camera parameters; calculate the 3D regression loss based on the SMPL pose parameters, SMPL morphological parameters, global rotation parameters, camera parameters and the initial 3D human pose image prediction data; iteratively update the initial SMPL model based on the 3D regression loss until the first stage of training is completed to obtain a trained primary SMPL model.
  • the calculation formula for calculating the 3D regression loss using SMPL posture parameters, SMPL morphology parameters, global rotation parameters, camera parameters, and initial 3D human posture image prediction data is:
  • L ⁇ is the 3D regression loss corresponding to the SMPL posture parameters
  • L ⁇ is the 3D regression loss corresponding to the SMPL morphology parameters
  • L c is the 3D regression loss corresponding to the camera parameters.
  • the second training unit 520 is further configured to obtain a predicted three-dimensional voxel array output by the trained target SMPL model; decompose a predicted front-view voxel array and a predicted rear-view voxel array from the predicted three-dimensional voxel array, and train an initial front-view prediction model based on the predicted front-view voxel array, and train an initial rear-view prediction model based on the predicted rear-view voxel array, so as to obtain a trained target front-view prediction model and a trained target rear-view prediction model.
  • the second training unit 520 is further configured to input the predicted front-view voxel array into the initial front-view prediction model to obtain the front-view clothed human body 3D prediction model output by the initial front-view prediction model; input the front-view clothed human body 3D prediction model into a preset differential renderer to obtain the front-view clothed human body prediction image rendered by the preset differential renderer; and train the initial front-view prediction model based on the front-view clothed human body prediction image.
  • the second training unit 520 is further configured to input the predicted rear-view voxel array into the initial rear-view prediction model to obtain a rear-view clothed human body 3D prediction model output by the initial rear-view prediction model; input the rear-view clothed human body 3D prediction model into a preset differential renderer to obtain a rear-view clothed human body prediction image rendered by the preset differential renderer; and train the initial rear-view prediction model based on the rear-view clothed human body prediction image.
  • the third training unit 530 is further configured to estimate a front-view clothed human body 3D prediction model based on the target front-view prediction model, and to estimate a rear-view clothed human body 3D prediction model based on the target rear-view prediction model; to take a number of sampling points located inside or outside the body from the front-view clothed human body 3D prediction model and the rear-view clothed human body 3D prediction model, respectively, to construct a sampling point training set; and to train the initial in-vivo and out-vivo recognition model based on the sampling point training set to obtain a trained target in-vivo and out-vivo recognition model.
  • the structural units of the initial front view prediction model and the initial rear view prediction model are ResNet sub-networks;
  • the ResNet sub-network includes a Conv convolution layer, a BatchNorm normalization layer, and a Relu activation function layer.
  • the target body inside-and-outside recognition model is composed of an input layer, a first fully connected layer of 13 neurons, a second fully connected layer of 512 neurons, a third fully connected layer of 256 neurons, a fourth fully connected layer of 128 neurons, a fifth fully connected layer of 1 neuron and an output layer.
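A hedged sketch of a forward pass through the fully connected stack described above, using the listed layer widths 13 → 512 → 256 → 128 → 1 (reading "521" in the original as a likely typo for 512; the activations, initialization, and sigmoid output are my own assumptions):

```python
import numpy as np

def mlp_forward(x, weights):
    """Forward pass: ReLU hidden layers, sigmoid on the final 1-neuron layer
    to yield an inside/outside probability per sampling point."""
    for W, b in weights[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = weights[-1]
    return 1.0 / (1.0 + np.exp(-(x @ W + b)))

# Layer widths as listed in the text above.
sizes = [13, 512, 256, 128, 1]
rng = np.random.default_rng(0)
weights = [(rng.standard_normal((a, b)) * 0.01, np.zeros(b))
           for a, b in zip(sizes[:-1], sizes[1:])]
```

The 13-dimensional input would correspond to the per-point features (coordinates, color values, SDF value, etc.) fed to the recognition model; the scalar sigmoid output is strictly between 0 and 1.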
  • the apparatus for constructing a three-dimensional model of a clothed human body trains an initial SMPL model based on preset human posture image training data to obtain a trained target SMPL model, trains an initial front-view prediction model and an initial rear-view prediction model based on the trained target SMPL model to obtain a trained target front-view prediction model and a target rear-view prediction model, trains an initial in-vivo and out-of-vivo recognition model based on the target front-view prediction model and the target rear-view prediction model to obtain a trained target in-vivo and out-of-vivo recognition model, and finally constructs a three-dimensional model of a clothed human body based on the target SMPL model, the target front-view prediction model, the target rear-view prediction model, the target in-vivo and out-of-vivo recognition model and the image three-dimensional visualization model.
  • a three-dimensional clothed human body model constructed in this way includes feature recognition at multiple different levels (the SMPL parameter dimension, the front-view dimension, the rear-view dimension and the inside/outside-point dimension of the human body surface), and can therefore resolve the interference caused by overlapping and interpenetrating human bodies in complex multi-person scenes, restoring the model reconstruction of the clothed human body in such scenes.
  • the following is a description of the three-dimensional reconstruction device for a clothed human body provided by the present application.
  • the three-dimensional reconstruction device for a clothed human body described below and the three-dimensional reconstruction method for a clothed human body described above can be referenced to each other.
  • the embodiment of the present application provides a three-dimensional reconstruction device for a clothed human body, the device comprising: a determination unit 610 and a reconstruction unit 620.
  • the determination unit 610 is configured to determine the image data of the clothed human body posture to be reconstructed; the reconstruction unit 620 is configured to input the image data of the clothed human body posture to be reconstructed into the three-dimensional model of the clothed human body to obtain the 3D model of the clothed human body output by the three-dimensional model of the clothed human body.
  • the three-dimensional model of the clothed human body includes a target SMPL model, a target front-view prediction model, a target rear-view prediction model, a target in-vivo and out-of-vivo recognition model and an image three-dimensional visualization model;
  • the reconstruction unit 620 is further configured to input the posture image data of the clothed human body to be reconstructed into the target SMPL model, obtain the target clothed human body 3D model output by the target SMPL model, and voxelize the target clothed human body 3D model to obtain a target three-dimensional voxel array; decompose the target front-view voxel array and the target rear-view voxel array from the target three-dimensional voxel array, input the target front-view voxel array into the target front-view prediction model, obtain the target front-view clothed human body 3D model output by the target front-view prediction model, and input the target rear-view voxel array into the target rear-view prediction model to obtain the target rear-view clothed human body 3D model output by it.
  • the three-dimensional reconstruction device for a clothed human body obtains a reconstructed 3D model of a clothed human body by inputting the posture image data of the clothed human body to be reconstructed into the three-dimensional model of the clothed human body. Since the three-dimensional model of the clothed human body includes feature recognition at multiple levels (the SMPL parameter dimension, the front-view dimension, the rear-view dimension and the inside/outside-point dimension of the human body surface), the model can be used to restore clothed human body 3D model reconstruction in complex scenes with multiple people.
  • FIG7 illustrates a schematic diagram of a physical structure of an electronic device.
  • the electronic device may include: a processor 701, a communication interface 702, a memory 703 and a communication bus 704, wherein the processor 701, the communication interface 702 and the memory 703 communicate with each other via the communication bus 704.
  • the processor 701 can call the logic instructions in the memory 703 to execute a human body three-dimensional reconstruction model training method, the method comprising: training an initial SMPL model based on preset human body posture image training data to obtain a trained target SMPL model; training an initial front view prediction model and an initial rear view prediction model based on the trained target SMPL model to obtain a trained target front view prediction model and a target rear view prediction model; training an initial in-vivo and out-of-vivo recognition model based on the target front view prediction model and the target rear view prediction model to obtain a trained target in-vivo and out-of-vivo recognition model; constructing a three-dimensional model of a clothed human body based on the target SMPL model, the target front view prediction model, the target rear view prediction model, the target in-vivo and out-of-vivo recognition model and the image three-dimensional visualization model.
  • the logic instructions in the above-mentioned memory 703 can be implemented in the form of software functional units and can be stored in a non-volatile readable storage medium when sold or used as an independent product.
  • the technical solution of the present application, or the part thereof that contributes to the prior art, can be embodied in the form of a software product; the computer software product is stored in a non-volatile readable storage medium and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method embodiments of the present application.
  • the aforementioned non-volatile readable storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disk.
  • the present application also provides a computer program product, which includes a computer program.
  • the computer program can be stored on a non-volatile readable storage medium.
  • the computer can execute the human body three-dimensional reconstruction model training method provided by the above methods, the method including: training an initial SMPL model based on preset human body posture image training data to obtain a trained target SMPL model; training an initial front view prediction model and an initial rear view prediction model based on the trained target SMPL model to obtain a trained target front view prediction model and a target rear view prediction model; training an initial in-vivo and out-of-vivo recognition model based on the target front view prediction model and the target rear view prediction model to obtain a trained target in-vivo and out-of-vivo recognition model; constructing a three-dimensional model of a clothed human body based on the target SMPL model, the target front view prediction model, the target rear view prediction model, the target in-vivo and out-of-vivo recognition model and the image three-dimensional visualization model.
  • the present application also provides a non-volatile readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the human body three-dimensional reconstruction model training method provided by the above-mentioned methods, the method comprising: training an initial SMPL model based on preset human body posture image training data to obtain a trained target SMPL model; training an initial front view prediction model and an initial rear view prediction model based on the trained target SMPL model to obtain a trained target front view prediction model and a target rear view prediction model; training an initial in-vivo and out-of-vivo recognition model based on the target front view prediction model and the target rear view prediction model to obtain a trained target in-vivo and out-of-vivo recognition model; constructing a three-dimensional model of a clothed human body based on the target SMPL model, the target front view prediction model, the target rear view prediction model, the target in-vivo and out-of-vivo recognition model and the image three-dimensional visualization model.
  • the device embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, i.e., they may be located in one place, or they may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of this embodiment. Those of ordinary skill in the art may understand and implement it without creative effort.
  • each implementation method can be implemented by means of software plus a necessary general hardware platform, or, of course, by hardware.
  • the above technical solution, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product; the computer software product can be stored in a non-volatile readable storage medium, such as ROM/RAM, a magnetic disk or an optical disk, and includes a number of instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the method of each embodiment or some part of an embodiment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • Architecture (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Geometry (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to the technical field of deep learning, and provides a model construction method and apparatus, a reconstruction method and apparatus, an electronic device and a non-volatile readable storage medium. The model construction method comprises: obtaining a trained target SMPL model; obtaining a target front-view prediction model and a target rear-view prediction model based on the target SMPL model; obtaining a target in-vivo and out-of-vivo recognition model based on the target front-view prediction model and the target rear-view prediction model; and finally constructing a three-dimensional clothed human body model based on the above models and an image three-dimensional visualization model. The three-dimensional clothed human body model constructed in this way includes feature recognition at several different levels: the SMPL parameter dimension, the front-view dimension, the rear-view dimension and the dimension of points inside and outside the surface of a human body. The constructed three-dimensional clothed human body model can therefore recover a model reconstruction of a clothed human body in a complex scene where multiple people are present.
PCT/CN2023/114799 2022-11-18 2023-08-24 Model construction method and apparatus, reconstruction method and apparatus, electronic device and non-volatile readable storage medium WO2024103890A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211443259.4 2022-11-18
CN202211443259.4A CN115496864B (zh) 2022-11-18 2022-11-18 Model construction method, reconstruction method, apparatus, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2024103890A1 true WO2024103890A1 (fr) 2024-05-23

Family

ID=85116198

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/114799 WO2024103890A1 (fr) 2022-11-18 2023-08-24 Model construction method and apparatus, reconstruction method and apparatus, electronic device and non-volatile readable storage medium

Country Status (2)

Country Link
CN (1) CN115496864B (fr)
WO (1) WO2024103890A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118229893A (zh) * 2024-05-24 2024-06-21 深圳魔视智能科技有限公司 Sparse point cloud three-dimensional reconstruction method and apparatus

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115496864B (zh) * 2022-11-18 2023-04-07 苏州浪潮智能科技有限公司 Model construction method, reconstruction method, apparatus, electronic device and storage medium
CN115797567B (zh) * 2022-12-27 2023-11-10 北京元起点信息科技有限公司 Method, apparatus, device and medium for establishing a three-dimensional garment driving model
CN118314463A (zh) * 2024-06-05 2024-07-09 中建三局城建有限公司 Machine-learning-based structural damage identification method and system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109859296A (zh) * 2019-02-01 2019-06-07 腾讯科技(深圳)有限公司 Training method for an SMPL parameter prediction model, server and storage medium
CN110599540A (zh) * 2019-08-05 2019-12-20 清华大学 Real-time three-dimensional human body shape and pose reconstruction method and apparatus with multi-view cameras
CN111739161A (zh) * 2020-07-23 2020-10-02 之江实验室 Three-dimensional human body reconstruction method and apparatus under occlusion, and electronic device
WO2022120843A1 (fr) * 2020-12-11 2022-06-16 中国科学院深圳先进技术研究院 Three-dimensional human body reconstruction method and apparatus, computer device and storage medium
CN114782634A (zh) * 2022-05-10 2022-07-22 中山大学 Monocular-image clothed human body reconstruction method and system based on surface implicit functions
WO2022156533A1 (fr) * 2021-01-21 2022-07-28 魔珐(上海)信息科技有限公司 Three-dimensional human body model reconstruction method and apparatus, electronic device and storage medium
CN115049764A (zh) * 2022-06-24 2022-09-13 苏州浪潮智能科技有限公司 Training method, apparatus, device and medium for an SMPL parameter prediction model
CN115496864A (zh) * 2022-11-18 2022-12-20 苏州浪潮智能科技有限公司 Model construction method, reconstruction method, apparatus, electronic device and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110428493B (zh) * 2019-07-12 2021-11-02 清华大学 Single-image three-dimensional human body reconstruction method and system based on mesh deformation
CN111968217B (zh) * 2020-05-18 2021-08-20 北京邮电大学 Image-based SMPL parameter prediction and human body model generation method
CN114067057A (zh) * 2021-11-22 2022-02-18 安徽大学 Attention-mechanism-based human body reconstruction method, model and apparatus
CN114581502A (zh) * 2022-03-10 2022-06-03 西安电子科技大学 Joint three-dimensional human body model reconstruction method based on monocular images, electronic device and storage medium



Also Published As

Publication number Publication date
CN115496864B (zh) 2023-04-07
CN115496864A (zh) 2022-12-20

Similar Documents

Publication Publication Date Title
WO2024103890A1 (fr) Model construction method and apparatus, reconstruction method and apparatus, electronic device and non-volatile readable storage medium
CN109636831B (zh) Method for estimating three-dimensional human body pose and hand information
US9679192B2 (en) 3-dimensional portrait reconstruction from a single photo
WO2022001236A1 (fr) Three-dimensional model generation method and apparatus, computer device and storage medium
Stoll et al. Fast articulated motion tracking using a sums of gaussians body model
JP4950787B2 (ja) Image processing apparatus and method
WO2022205760A1 (fr) Three-dimensional human body reconstruction method and apparatus, device and storage medium
WO2021253788A1 (fr) Three-dimensional human body model construction method and apparatus
CN116109798B (zh) Image data processing method, apparatus, device and medium
EP3980974A1 (fr) Single-image-based real-time body animation
WO2021063271A1 (fr) Human body model reconstruction method and system, and storage medium
EP3756163B1 (fr) Methods, devices and computer program products for gradient-based depth reconstructions with robust statistics
WO2020108304A1 (fr) Facial mesh model reconstruction method, device, apparatus and storage medium
CN113111861A (zh) Facial texture feature extraction, 3D face reconstruction method, device and storage medium
CN111950430B (zh) Color-texture-based multi-scale makeup style difference measurement and transfer method and system
CN113723317B (zh) 3D face reconstruction method, apparatus, electronic device and storage medium
CN113628327A (zh) Head three-dimensional reconstruction method and device
CN114863037A (zh) Single-smartphone-based human body three-dimensional modelling data acquisition and reconstruction method and system
CN110660076A (zh) Face swapping method
CN111680573B (zh) Face recognition method, apparatus, electronic device and storage medium
CN106558042B (zh) Method and apparatus for locating key points in an image
KR101812664B1 (ko) Multi-view object extraction method and apparatus using divisible outlines
CN114429518B (zh) Face model reconstruction method, apparatus, device and storage medium
JP2012208759A (ja) Method and program for refining three-dimensional shape models
KR20190069750A (ko) Augmented reality presentation method using 2D-to-3D conversion and the POSIT algorithm

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23890328

Country of ref document: EP

Kind code of ref document: A1