WO2024032464A1 - Method, apparatus and device for three-dimensional face reconstruction, medium and product - Google Patents

Method, apparatus and device for three-dimensional face reconstruction, medium and product

Info

Publication number
WO2024032464A1
WO2024032464A1 (PCT/CN2023/111005; CN2023111005W)
Authority
WO
WIPO (PCT)
Prior art keywords
face
image
dimensional
reconstruction
dimensional face
Prior art date
Application number
PCT/CN2023/111005
Other languages
English (en)
Chinese (zh)
Inventor
靳凯
Original Assignee
广州市百果园信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州市百果园信息技术有限公司
Publication of WO2024032464A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 - Arrangements using neural networks
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 - Detection; Localisation; Normalisation
    • G06V 40/174 - Facial expression recognition

Definitions

  • This application relates to the field of image processing technology, for example, to a three-dimensional face reconstruction method and its device, equipment, media, and products.
  • Three-dimensional face reconstruction based on 3DMM (3D Morphable Models) neural networks requires rich and accurate training data to achieve good reconstruction results, which means the training cost is high.
  • This application provides a three-dimensional face reconstruction method and its corresponding devices, equipment, non-volatile readable storage media, and computer program products.
  • a three-dimensional face reconstruction method including the following steps:
  • The parameter mapping layer of the three-dimensional face reconstruction network is used to map the face area image to the corresponding parameter coefficients in the parameterized three-dimensional face model, where the parameter coefficients include an identity coefficient corresponding to the facial identity and an expression coefficient corresponding to the facial expression.
  • a three-dimensional face reconstruction device including:
  • the image acquisition module is configured to acquire face image data and extract the face image therein;
  • a face detection module configured to perform key point detection on the face image and obtain a face region image of the area where the face key points are located;
  • The face modeling module is configured to use the bilinear modeling layer of the three-dimensional face reconstruction network pre-trained to a converged state to perform bilinear modeling of facial identity and facial expression on the face area image, and obtain a parameterized three-dimensional face model;
  • a parameter mapping module configured to use the parameter mapping layer of the three-dimensional face reconstruction network to map the face area image into corresponding parameter coefficients in the parameterized three-dimensional face model, where the parameter coefficients include the identity coefficient corresponding to the facial identity and the expression coefficient corresponding to the facial expression.
  • a three-dimensional face reconstruction device including a central processor and a memory.
  • The central processor is configured to call and run a computer program stored in the memory to execute the steps of the three-dimensional face reconstruction method described in the present application.
  • A non-volatile readable storage medium is provided, which stores, in the form of computer-readable instructions, a computer program implemented according to the three-dimensional face reconstruction method; when the computer program is called and run by a computer, it executes the steps involved in the method.
  • a computer program product including a computer program/instructions that, when executed by a processor, implement the steps of the method described in any embodiment of the present application.
  • Figure 1 is a schematic flow chart of an embodiment of the three-dimensional face reconstruction method of the present application.
  • Figure 2 is a schematic flowchart of an exemplary scenario application of the three-dimensional face reconstruction method of the present application;
  • Figure 3 is a schematic diagram of the expression migration results of the three-dimensional face model in the embodiment of the present application.
  • Figure 4 is a schematic flowchart of obtaining a face area image in an embodiment of the present application.
  • Figure 5 is a schematic diagram of the results of obtaining a three-dimensional face model in an embodiment of the present application.
  • Figure 6 is a schematic flowchart of parameter mapping for facial feature maps in an embodiment of the present application.
  • Figure 7 is a schematic flowchart of training a three-dimensional face reconstruction network in an embodiment of the present application.
  • Figure 8 is a schematic diagram of the training framework used in the three-dimensional face reconstruction network method in the embodiment of the present application.
  • Figure 9 is a schematic flow chart of the calculation of the reconstruction loss function in the embodiment of the present application.
  • Figure 10 is a functional block diagram of the three-dimensional face reconstruction device of the present application.
  • Figure 11 is a schematic structural diagram of a three-dimensional face reconstruction device used in this application.
  • The models cited or that may be cited in this application include traditional machine learning models or deep learning models. Unless expressly specified, they can be deployed on a remote server and called remotely from the client, or deployed and called directly on a client with sufficient device capability. In some embodiments, when a model is run on the client, its corresponding machine intelligence can be obtained through transfer learning, so as to reduce the requirements on client hardware running resources and avoid excessive occupation of client hardware running resources.
  • Step S1100 Obtain facial image data and extract facial images therein;
  • Face image data refers to image data with human face parts. This type of image data can be obtained through authorized live broadcast, on-demand and other legal channels. It can be video stream data or image data.
  • The image data of the real person needs to be collected in real time through a camera and then sent to the backend server for further processing, where a digital person image is generated to replace the real person depicted in the image data; finally, the image data carrying the digital person image is output to the display terminal device facing the audience.
  • the collected image data of real people can be used as the face image data.
  • Video data that has already been shot can be stored on the server, and the relevant technical personnel can capture the image data containing the target person from it;
  • the corresponding digital human image is then generated to replace the target person, and finally the corresponding image file is generated.
  • the image data with the target person can be used as the face image data.
  • Some advertising posters need to use digital human images to attract the public.
  • An image with a real person is first captured by a camera and then handed over to relevant technical personnel to generate a digital human image of the corresponding style to replace the real person in the image.
  • the image with a real person is the face image data.
  • the face image data may be a kind of video stream data or a kind of image data.
  • It is necessary to further extract the face images from the face image data; that is, when the face image data is video stream data, each frame is extracted as a face image, and when the face image data is a single image, that image itself is the face image.
  • the extracted face images need to be in a unified format, which can be YUV420 format, or RGB24 format, or YUV444 format, or other similar encoding formats.
  • Unifying the image data format keeps the interfaces for subsequent operations consistent, facilitating unified processing and rapid completion.
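  • The following is a minimal sketch of this preprocessing step, assuming OpenCV is used for decoding; the function name and the choice of RGB24 as the unified format are illustrative assumptions, not part of the original disclosure:

```python
import cv2

def extract_face_images(source, is_video=True):
    """Extract frames from video stream data (or load a single image) and convert
    everything to one unified RGB24 format for the subsequent processing steps."""
    images = []
    if is_video:
        cap = cv2.VideoCapture(source)
        while True:
            ok, frame = cap.read()  # OpenCV decodes frames as BGR24
            if not ok:
                break
            images.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))  # unified RGB24
        cap.release()
    else:
        images.append(cv2.cvtColor(cv2.imread(source), cv2.COLOR_BGR2RGB))
    return images
```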
  • Step S1200 Perform key point detection on the face image to obtain a face region image of the area where the face key points are located;
  • face detection and face key point detection are performed to detect and obtain the face area image and face key points in the face image.
  • a face detection model pre-trained to a converged state is used to perform face detection to obtain face target frame information.
  • The face target frame information includes the coordinate information of the upper-left point and the lower-right point of the face part.
  • The image at the corresponding area position is cropped from the face image to obtain the face area image, which eliminates the interference of redundant image information from non-face areas and allows subsequent processing to focus more on the face information.
  • a face key point detection model pre-trained to a converged state is used to perform face key point detection to obtain face key point information.
  • the face key points are key points pointing to the face part in the face area image, which can represent the location of the key areas of the face, such as eyebrows, eyes, nose, mouth, facial contour, etc.
  • After obtaining the face area image and face key points, a standard alignment operation also needs to be performed.
  • A preset standard three-dimensional face model can be projected onto a two-dimensional plane to obtain the standard face key point information on that plane; the detected face key points are then aligned and matched with the standard face key points to obtain standard transformation parameters, and the face area image is transformed into a face area image with standard size and angle according to these standard transformation parameters.
  • Step S1300 Use the bilinear modeling layer of the three-dimensional face reconstruction network pre-trained to a converged state to perform bilinear modeling of facial identity and facial expression on the face area image to obtain a parameterized three-dimensional face model;
  • the 3D face reconstruction network includes a two-layer structure.
  • The first layer is a bilinear modeling layer, which, based on a parameterized 3D face model, performs decoupled modeling of facial identity and facial expression for the face region image; the corresponding identity coefficient and expression coefficient still need to be determined;
  • the second layer is the parameter mapping layer, which is used to map the face area image to the corresponding parameter coefficients in the parameterized three-dimensional face model, where the parameter coefficients include an identity coefficient corresponding to the facial identity and an expression coefficient corresponding to the facial expression.
  • A parameterized face model is first determined as the three-dimensional face model to be optimized; in one embodiment, the parameterized face model can be the BFM (Basel Face Model).
  • The BFM model is based on the 3DMM (3D Morphable Models) statistical model. According to the principle of 3DMM, each face is a superposition of shape vectors and texture vectors.
  • In the bilinear model, the face mesh vertices are obtained by combining a core tensor with the identity and expression coefficients, which can be written as vertex = core_tensor × identity × expression, where vertex represents the face mesh vertices, identity represents the identity coefficient, expression represents the expression coefficient, and core_tensor represents the tensor representation of the three-dimensional face model mesh vertices.
  • The 3DMM based on the bilinear model uses coefficient multiplication to decouple the identity information and expression information of the face for modeling, and allows the identity coefficient and the expression coefficient to be applied separately, enabling applications such as expression migration.
  • In one embodiment, people with different identities and the same expression can be represented by a set of different identity coefficients and the same expression coefficient; in another embodiment, people with the same identity and different expressions can be represented by a set of the same identity coefficients and different expression coefficients.
  • The three-dimensional face model database can be set by the relevant technical personnel according to the actual application scenarios and actual business needs.
  • This application pre-constructs a three-dimensional face model database containing 79 identities and 46 types of expressions; that is, the vector dimension of the identity coefficient in the face model is 79.
  • the vector dimension of the expression coefficient is 46.
  • The number of identities in the three-dimensional face model database, the number of expression types, and the vector dimensions of the identity coefficient and the expression coefficient can all be adjusted according to the actual application scenario without affecting the applicability of the method.
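  • As an illustration only (not part of the original disclosure), the bilinear combination of a core tensor with the identity and expression coefficients described above can be sketched as follows, using the 79/46 dimensions given in this embodiment:

```python
import numpy as np

def bilinear_face_vertices(core_tensor, identity, expression):
    """core_tensor: [n_vertices * 3, 79, 46]; identity: [79]; expression: [46].
    Contracts the core tensor with the identity and expression coefficients
    (vertex = core_tensor x identity x expression) to obtain the face mesh vertices."""
    vertices = np.einsum("vie,i,e->v", core_tensor, identity, expression)
    return vertices.reshape(-1, 3)  # [n_vertices, 3] mesh vertex coordinates
```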
  • Step S1400 Use the parameter mapping layer of the three-dimensional face reconstruction network to map the face area image into corresponding parameter coefficients in the parameterized three-dimensional face model.
  • The parameter coefficients include the identity coefficient corresponding to the facial identity and the expression coefficient corresponding to the facial expression.
  • The parameter mapping layer of the three-dimensional face reconstruction network is the second layer structure of the three-dimensional face reconstruction network, which is used to map the face region image to the corresponding parameter coefficients in the parameterized three-dimensional face model.
  • The face area image contains all the information of the target face, such as the identity information that represents the facial identity and the expression information that represents the facial expression; therefore, it is feasible to construct a mapping relationship between the face area image and the identity coefficient and expression coefficient in the three-dimensional face model.
  • texture parameters, lighting parameters, posture parameters and transformation parameters can all be expressed in the face area image, and it is also feasible to construct corresponding mapping relationships based on these parameters.
  • A mapping relationship can be constructed between the face area image and the identity parameters, expression parameters, texture parameters, lighting parameters, posture parameters, and transformation parameters, so that the identity coefficient, expression coefficient, texture coefficient, lighting coefficient, posture coefficient, transformation coefficient, etc. can be obtained.
  • The encoder in the three-dimensional face reconstruction network is first used to perform feature extraction on the face area image to obtain the deep features of the face area image, called a face feature map; secondly, spatial mapping is performed on the face feature map to obtain all parameter coefficients, including the identity coefficient, expression coefficient, texture coefficient, lighting coefficient, posture coefficient, and transformation coefficient, where the identity coefficient and expression coefficient are the parameter coefficients corresponding to the identity parameters and expression parameters in the bilinear modeling layer.
  • The parameter coefficients corresponding to each face image can be stored independently for later use and combined arbitrarily to construct different three-dimensional face models, so as to obtain face images with different effects. For example, one identity coefficient can be combined with multiple expression coefficients to generate face images of the same person with different expressions, or one expression coefficient can be combined with multiple different identity coefficients to generate face images of different people with the same expression.
  • In one embodiment, after the parameter mapping layer of the three-dimensional face reconstruction network is used to map the face region image to the corresponding parameter coefficients in the parameterized three-dimensional face model, the method further includes:
  • Three-dimensional reconstruction is performed according to the parameter coefficients to obtain a three-dimensional face model of the face area image.
  • The identity coefficient and the expression coefficient among the parameter coefficients are used to construct the corresponding three-dimensional face model. Therefore, by performing the above process of the present application on a face area image to obtain the parameterized three-dimensional face model together with its identity coefficient and expression coefficient, a three-dimensional face model that effectively reflects the identity information and expression information of the face area image can be obtained.
  • V = core_tensor × α_id(F_g(x)) × α_exp(F_g(x)), where α_exp(F_g(x)) represents the expression coefficient output by the parameter mapping layer, and α_id(F_g(x)) represents the identity coefficient output by the parameter mapping layer in the three-dimensional face reconstruction network.
  • In summary, after obtaining the face area image of the area where the face key points are located in the face image, this application uses the bilinear modeling layer of the three-dimensional face reconstruction network pre-trained to a converged state to perform bilinear modeling of identity information and expression information on the face area image, obtaining a parameterized three-dimensional face model; the parameter mapping layer of the three-dimensional face reconstruction network is then used to map the face area image to the corresponding parameter coefficients in the parameterized three-dimensional face model, thereby completing the reconstruction of the three-dimensional face model.
  • The three-dimensional face reconstruction method uses a bilinear modeling layer to decouple the identity information and expression information in the face, thereby effectively separating the expression parameters and realizing expression migration, which can greatly promote the development of live broadcast, film and television, animation, and other related industries;
  • the three-dimensional face reconstruction network is suitable for training with a weakly supervised learning method based on single images, which can greatly reduce the acquisition cost and labeling cost of training data and is conducive to large-scale application.
  • the parameter mapping layer of the three-dimensional face reconstruction network is used to map the face area image to the corresponding parameter coefficients in the parameterized three-dimensional face model. Afterwards, the method further includes:
  • Step S1500 Obtain the target parameter coefficients required to constitute the parameterized three-dimensional face model, where the target parameter coefficients include pre-specified identity coefficients and pre-specified expression coefficients;
  • the parameterized three-dimensional face model is constructed in the bilinear modeling layer of the three-dimensional face reconstruction network, and its undetermined parameter coefficients are the identity coefficient and the expression coefficient.
  • the vector dimension of the identity coefficient is 79
  • the vector dimension of the expression coefficient is 46.
  • Step S1600 Migrate the target parameter coefficients to the three-dimensional face model of the corresponding digital person to obtain the three-dimensional face model of the digital person;
  • The previous steps completed the reconstruction of the three-dimensional face model of the face area image, but actual application scenarios tend to require applying its digital image.
  • a digital person is used to replace the face part in the face area image, in order to replace the "real person” with a "digital person” for activities such as live broadcast or communication and interaction.
  • the real-time emotional simulation of "digital people” has become an urgent problem to be solved.
  • One solution is to migrate the real expressions of the "real person” to the "digital person” so that it can simultaneously express the emotions of the "real person".
  • The bilinear modeling layer constructed by the present application can realize the decoupling of expression information; by migrating the expression coefficients in the three-dimensional face model of the "real person" into the three-dimensional face model of the "digital person", the expression migration from "real person" to "digital person" can be completed.
  • The number of identities and the number of expressions, that is, the vector dimensions of the identity coefficient and the expression coefficient, should be consistent between the two models.
  • The expression coefficient in the "digital person" three-dimensional face model can be directly replaced with the expression coefficient corresponding to the "real person", while the other parameters remain unchanged, thereby obtaining the three-dimensional face model of the digital human after expression transfer, as sketched below.
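  • A minimal sketch of this coefficient swap, reusing the bilinear combination sketched above (the function and variable names are illustrative assumptions):

```python
import numpy as np

def transfer_expression(core_tensor, digital_identity, real_expression):
    """core_tensor: [n_vertices * 3, 79, 46]; digital_identity: [79] identity coefficient
    of the digital person; real_expression: [46] expression coefficient estimated for the
    real person. Only the expression coefficient is swapped; the identity stays unchanged."""
    vertices = np.einsum("vie,i,e->v", core_tensor, digital_identity, real_expression)
    return vertices.reshape(-1, 3)  # digital human mesh wearing the real person's expression
```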
  • Step S1700 Render and project the three-dimensional face model of the digital human into a two-dimensional image space to obtain a digital human image.
  • After obtaining the three-dimensional face model of the "digital human", three-dimensional rendering and projection are performed based on the illumination coefficient, posture coefficient, and transformation coefficient obtained in step S1400, together with the texture coefficient of the "digital human" itself.
  • the image of the "digital human” is obtained, that is, the expression migration from the face area image to the "digital human” image is completed.
  • the face area image in the single-frame face image is obtained and replaced with the "digital human” image, so that the "digital human” can be broadcast simultaneously.
  • This type of application is one of the scenarios where the expression migration function of the method is applied, and it can also be used in other scenarios.
  • The method is aimed at decoupled modeling of identity information and expression information, which can bring huge benefits to industries such as live broadcast, film and television, and digital imaging; it has great application value, and its expression migration application does not affect the other face information.
  • Please refer to Figure 4 for the flow of performing key point detection on the face image.
  • Step S1210 Perform face key point detection on the face image to obtain the face area image and face key point information
  • the face detection model pre-trained to the convergence state is used to perform face detection on the face image, and the face rectangular frame information in the face image is obtained.
  • the face rectangular frame can calibrate the position and size of the face part in the face image, and the calibration result can be represented by a set with four coordinate elements, such as S roi .
  • the corresponding area image is selected from the face image according to the set, that is, the face area image is obtained.
  • the face area image completely contains the face part, and redundant parts of other non-face areas in the face image are removed.
  • S_roi = {x1, y1, x2, y2}, where x1 and y1 represent the pixel coordinates of the upper-left corner of the detected face part, and x2 and y2 represent the pixel coordinates of the lower-right corner of the face part; a sketch of the corresponding crop follows.
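  • A minimal sketch (illustrative function name) of cropping the face region image from S_roi:

```python
def crop_face_region(face_image, s_roi):
    """face_image: [H, W, 3] array; s_roi: (x1, y1, x2, y2) detected face rectangle."""
    x1, y1, x2, y2 = s_roi
    return face_image[y1:y2, x1:x2]  # face area image with non-face background removed
```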
  • the face detection model and face key point detection model are implemented by neural network models. In practical applications, relatively excellent face detection models and face key point detection models in related technologies can be used.
  • Step S1220 Align the face key points with standard face key points to obtain standard alignment parameters.
  • the standard face key points are corresponding face key points obtained by two-dimensional projection of a standard three-dimensional face model;
  • the face contours in the face area image have different angles and sizes, which can easily interfere with subsequent three-dimensional face parameter calibration work. Therefore, it is necessary to perform standard alignment on the face area images.
  • the face key points are also detected from the standard face image projected from the standard three-dimensional face model to the two-dimensional plane, thereby obtaining the standard face key points.
  • the standard three-dimensional face model can be preset by relevant technical personnel.
  • The face key points detected from the face area image are aligned with the standard face key points to obtain the corresponding standard transformation parameters.
  • the method used in the alignment operation can be any minimization method such as PnP, least squares method, etc. In one embodiment of the present application, the PnP method is used.
  • the standard transformation parameters include translation transformation parameters and scale transformation parameters.
  • Step S1230 Align the face area image according to the standard alignment parameter.
  • A standard transformation is performed on the face area image S_roi and the face key points L_n.
  • Its size is adjusted to a preset size, which is 224 × 224 × 3 in one embodiment of the present application; a sketch of this alignment step follows.
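  • A minimal sketch of the alignment, assuming OpenCV and a least-squares similarity transform between the detected and standard key points (the helper name and the use of estimateAffinePartial2D are illustrative assumptions; the PnP variant mentioned above would work analogously):

```python
import cv2
import numpy as np

def align_face(face_region, keypoints, standard_keypoints, out_size=224):
    """keypoints / standard_keypoints: [K, 2] corresponding 2D landmarks; the standard
    key points come from projecting the preset standard 3D face model onto the 2D plane."""
    M, _ = cv2.estimateAffinePartial2D(              # translation + scale + rotation
        keypoints.astype(np.float32),
        standard_keypoints.astype(np.float32),
        method=cv2.LMEDS)
    aligned = cv2.warpAffine(face_region, M, (out_size, out_size))
    keypoints_h = np.hstack([keypoints, np.ones((len(keypoints), 1))])
    aligned_keypoints = keypoints_h @ M.T            # key points in the aligned image
    return aligned, aligned_keypoints
```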
  • the Hough transformation can be used to obtain the posture information of the three-dimensional facial model corresponding to the facial region image.
  • the posture information of the three-dimensional face model includes pitch angle, roll angle and rotation angle.
  • In one embodiment, the parameter mapping layer of the three-dimensional face reconstruction network is used to map the face area image to the corresponding parameter coefficients in the parameterized three-dimensional face model, including:
  • Step S1410 Use the encoder in the three-dimensional face reconstruction network to perform feature extraction on the face area image to obtain a face feature map
  • The encoder pre-trained to convergence is used to perform feature extraction on the face area image obtained in step S1200 to obtain the face feature map.
  • the face feature map can reduce the interference of redundant information in non-face area images in the face image, thereby better extracting the semantic information of the face part.
  • the encoder is implemented by a neural network model.
  • the neural network model can use a variety of relatively excellent feature extraction models in related technologies, including: VGG16 model, VGG19 model, InceptionV3 model, Xception model, MobileNet model, AlexNet model, LeNet Model, ZF_Net model, ResNet18 model, ResNet34 model, ResNet_50 model, ResNet_101 model, ResNet_152 model, etc., are all mature feature extraction models.
  • the feature extraction model is a neural network model that has been trained to convergence. In one embodiment, it is trained to convergence on the ImageNet large-scale data set.
  • the output of the encoder is set to a feature map.
  • the encoder directly outputs the feature map of the last convolutional layer, which is called a face feature map.
  • The input size of the encoder is defined as N × C × H × W and the output size as N × C' × H' × W', where N represents the number of samples, C represents the number of channels, H × W represents the preset image size, C' represents the number of features, and H' × W' represents the feature map size.
  • Step S1420 Perform spatial mapping on the facial feature map to obtain parameter coefficients in the bilinear modeling layer
  • The above facial feature map is spatially mapped to obtain the parameter coefficients of the three-dimensional facial model as well as the related parameter coefficients for 3D rendering and 2D projection.
  • the space mapping includes semantic space mapping and parameter space mapping.
  • the semantic space mapping maps the face feature map into a face feature vector.
  • The face feature vector contains all the deep semantic information in the face image; it is a comprehensive representation of the face identity semantic information, expression semantic information, texture semantic information, illumination semantic information, posture semantic information, and transformation semantic information.
  • the parameter space mapping maps the face feature vector to the corresponding parameter subspace, thereby obtaining the coefficients of its corresponding parameters.
  • The parameter spaces include a face identity parameter space, an expression parameter space, a texture parameter space, an illumination parameter space, a posture parameter space, and a transformation parameter space.
  • the facial feature map is processed through the above-mentioned semantic space mapping and parameter space mapping to obtain identity coefficients, expression coefficients, texture coefficients, illumination coefficients, posture coefficients, and transformation coefficients.
  • identity coefficient and expression coefficient are used to reconstruct the three-dimensional face model of the face area image;
  • texture coefficient, lighting coefficient, posture coefficient, and transformation coefficient are used for three-dimensional rendering and two-dimensional projection.
  • The parameter mapping layer of the three-dimensional face reconstruction network first extracts the face feature map from the face area image, then maps it into the semantic space to extract its semantic feature vector, and then maps this vector into the different parameter spaces to obtain the coefficients in each corresponding parameter space; this makes full use of the identity information, expression information, texture information, lighting information, posture information, and transformation information in the face area image and, without introducing any other additional information, achieves integrated modeling of 3D face reconstruction and rendering projection.
  • the spatial mapping is performed on the facial feature map to obtain the parameter coefficients in the bilinear modeling layer, including:
  • Step S1421 Perform semantic space mapping on the facial feature map to obtain a facial feature vector
  • The facial feature map has size N × C' × H' × W', where N represents the number of samples, C' represents the number of features, and H' × W' represents the size of the feature map.
  • Semantic space mapping is performed on the facial feature map x.
  • F_g(x) contains rich feature information describing the face, including identity information, shape information, texture information, lighting information, posture information, and transformation information.
  • After the semantic space mapping, F_g(x) is a feature vector, namely the face feature vector, represented by x'[N, C'].
  • Step S1422 Perform parameter space mapping on the facial feature vector to obtain parameter coefficients in the bilinear modeling layer.
  • a corresponding number of parameter space mapping layers are designed to map the facial feature vectors into corresponding parameter subspaces for optimization, and obtain coefficients of corresponding parameters.
  • F_all(x) = {α_id(F_g(x)), α_exp(F_g(x)), α_texture(F_g(x)), α_light(F_g(x)), α_pose(F_g(x)), α_transition(F_g(x))}
  • α_id represents the learning of identity coefficients.
  • The same person should have similar coefficient representations, and different people should have different coefficient representations.
  • The parameter size can be described as [C′, 79]. α_exp represents the learning of expression coefficients.
  • People with the same expression should have similar coefficients, such as closing the eyes, opening the mouth, curling the lips, etc., while people with different expressions should have different coefficients; for example, closed eyes and open eyes should differ in the corresponding shape.
  • The parameter size can be described as [C′, 46]. α_texture represents the learning of texture coefficients, which are used to model real textures, and its parameters are described as [C′, 79].
  • α_light is used to estimate the current facial illumination, and its parameters are described as [C′, 27], representing the basis coefficients of 27 spherical harmonics.
  • α_pose is used to estimate the pose of the human face and contains three sub-parameters, yaw, pitch, and roll, corresponding to the rotation angle, pitch angle, and roll angle respectively.
  • α_transition is used to estimate the transformation of the three-dimensional face space, so it contains the transformation coefficients of the three axes x, y, and z.
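  • A minimal sketch of the parameter mapping layer described above, assuming a ResNet-50 encoder, global average pooling as the semantic space mapping, and one linear head per parameter subspace with the dimensions given in this embodiment (all module names are illustrative assumptions):

```python
import torch.nn as nn
from torchvision.models import resnet50

class ParameterMappingLayer(nn.Module):
    def __init__(self, feat_dim=2048):
        super().__init__()
        backbone = resnet50(weights="IMAGENET1K_V1")                    # ImageNet-pretrained encoder
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])   # keep the conv feature map
        self.semantic = nn.AdaptiveAvgPool2d(1)                         # semantic space mapping
        self.heads = nn.ModuleDict({                                    # parameter space mapping
            "identity":   nn.Linear(feat_dim, 79),   # identity coefficients
            "expression": nn.Linear(feat_dim, 46),   # expression coefficients
            "texture":    nn.Linear(feat_dim, 79),   # texture coefficients
            "lighting":   nn.Linear(feat_dim, 27),   # spherical-harmonics coefficients
            "pose":       nn.Linear(feat_dim, 3),    # yaw, pitch, roll
            "transition": nn.Linear(feat_dim, 3),    # x, y, z translation
        })

    def forward(self, face_region):                   # face_region: [N, 3, 224, 224]
        fmap = self.encoder(face_region)              # face feature map [N, C', H', W']
        vec = self.semantic(fmap).flatten(1)          # face feature vector x'[N, C']
        return {name: head(vec) for name, head in self.heads.items()}
```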
  • The decoupled modeling based on the bilinear modeling layer in the three-dimensional face reconstruction network can model the identity information and expression information separately, which is helpful for expression migration scenarios and drives the development of expression generation applications in related industries.
  • The spatial mapping in the parameter mapping layer is used to associate the face area image with the three-dimensional face model parameters and the rendering projection parameters, making full use of the feature information of the input face area image and providing a more convenient and effective way to acquire the parameter coefficients.
  • the input of the three-dimensional face reconstruction network of this application is a face region image, and its output is a three-dimensional face model.
  • a framework corresponding to the weakly supervised learning mechanism is constructed for the three-dimensional face reconstruction network, and the training of the three-dimensional face reconstruction network is completed.
  • Figure 8 shows a schematic diagram of the principle of the framework corresponding to the weakly supervised learning mechanism used to train the three-dimensional face reconstruction network of the present application.
  • The three-dimensional face reconstruction network is trained according to this framework; therefore, based on any of the above embodiments, please refer to Figure 7.
  • the training process of the three-dimensional face reconstruction network includes:
  • Step S2100 Obtain a single sample of the preprocessed face image data
  • the facial image data refers to image data with human face parts. This type of image data can be obtained through authorized live broadcast, on-demand and other legal channels. In one embodiment, it can be video stream data. Its video storage formats can be diverse, including MP4, avi, rmvb, x264, etc. In another embodiment, it may also be image data.
  • the image data content may include indoor, outdoor, news media, sports and entertainment and other scenes, including natural scenes.
  • the data storage format of the image data is inconsistent due to various data sources, including RGB24, YUV444, YUV420 and other formats.
  • the data storage formats are unified.
  • image data from different sources can be converted into a unified YUV420 format.
  • image data from different sources can also be converted into a unified RGB24 format, or YUV444 format, or others.
  • the above-mentioned preprocessing method is applied to the training and application of relevant technical methods in this application, unifying various data formats into one to improve the efficiency of technical applications without affecting its performance.
  • one face image with a face part is extracted as a single sample for subsequent processing.
  • Step S2200 Obtain the face area image, face key points and three-dimensional face model posture coefficients in the single sample;
  • the face area image, face key points and three-dimensional face model posture coefficients are extracted from the single sample in the same manner as in step S1200 above.
  • The face detection model pre-trained to a converged state is used to detect the single sample to obtain the face rectangular frame information and further obtain the face area image; then the face key point detection model pre-trained to a converged state is used to detect the face area image and obtain the face key point information; the face area image S_roi and the face key point information L_n are aligned according to the standard alignment parameters; finally, Hough transform calculation is applied to the face key points to obtain the three-dimensional face pose information Y_pose.
  • the face region image is used as the input of the three-dimensional face reconstruction network, and the face key points and three-dimensional face pose information are used to calculate the loss value.
  • Step S2300 Use the three-dimensional face reconstruction network to reconstruct and obtain a three-dimensional face model of the face area image, and obtain a face reconstruction image through rendering and projection into two dimensions;
  • the bilinear modeling layer of the three-dimensional face reconstruction network is used to perform decoupled modeling of identity information and expression information
  • The parameter mapping layer of the three-dimensional face reconstruction network is used to obtain the identity coefficient, expression coefficient, texture coefficient, lighting coefficient, posture coefficient, and transformation coefficient.
  • The identity coefficient and expression coefficient are used to reconstruct the three-dimensional face model of the face area image.
  • The three-dimensional rendering and two-dimensional projection of the three-dimensional face model include the following operations: estimating the surface texture of the face, assuming in advance that the face is a Lambertian surface, and using spherical harmonics to approximate the scene lighting, which can be combined with the face surface normals and the skin texture α_texture(F_g(x)) to calculate the radiance of each vertex, where Φ represents the spherical harmonic basis functions.
  • On this basis, the three-dimensional rendering of the three-dimensional face model can be completed; then the camera system transformation of the face is performed using the posture parameter α_pose(F_g(x)) and the transformation parameter α_transition(F_g(x)), combined with the camera perspective model, to apply translation and rotation to the three-dimensional face, which can then be projected onto a two-dimensional plane to obtain all the projection points L_x of the face vertices, which can be expressed as [N_v, 2], where 2 represents the x, y plane coordinate information.
  • the face projection has completed the relevant transformation from the world coordinate system to the pixel coordinate system, and it matches the relevant positions of the standard face key points. At this point, the projection of the three-dimensional face model into the two-dimensional plane is completed, and the reconstructed face image is obtained.
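  • A minimal sketch of the per-vertex shading step under these assumptions (Lambertian surface, 27 spherical-harmonic coefficients interpreted as 9 basis functions per RGB channel; normalization constants of the basis are omitted and the function name is illustrative):

```python
import numpy as np

def vertex_radiance(texture_rgb, normals, sh_coeffs):
    """texture_rgb: [Nv, 3] per-vertex skin texture; normals: [Nv, 3] unit surface normals;
    sh_coeffs: [27] lighting coefficients reshaped to 9 basis functions x 3 RGB channels."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    basis = np.stack([np.ones_like(x), x, y, z, x * y, x * z, y * z,
                      x ** 2 - y ** 2, 3.0 * z ** 2 - 1.0], axis=1)  # first 9 SH basis terms
    return texture_rgb * (basis @ sh_coeffs.reshape(9, 3))           # [Nv, 3] vertex radiance
```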
  • Step S2400 Calculate a reconstruction loss value based on the face area image and the face reconstruction image, and update the parameters of the three-dimensional face reconstruction network based on the reconstruction loss value;
  • The three-dimensional reconstruction loss function is a weighted sum of four sub-loss functions: the first sub-loss function is a perceptual loss function, used to minimize the error between the face area image and the face reconstruction image;
  • the second sub-loss function is the photometric loss function, which is used to enhance the shape and pixel-level alignment between the face area image and the face reconstruction image;
  • the third sub-loss function is the posture loss function, used to ensure higher accuracy of the posture;
  • the fourth sub-loss function is the reprojection loss function, used to optimize the accuracy of the projection point.
  • the weighted sum of the above sub-loss values is the reconstruction loss value of the three-dimensional face reconstruction network under the current iteration number, that is, the error L(x).
  • the relevant weights can be updated according to the back propagation mechanism of the neural network.
  • the updated weight part is mainly the weight of the space mapping in the parameter mapping layer in the three-dimensional face reconstruction network, that is, the semantic space mapping component and the parameter space mapping component.
  • the direction of the weight update is a direction that makes the error L(x) smaller.
  • Step S2500 Repeat the above operations until the preset termination condition is triggered to end the training, and obtain the three-dimensional face reconstruction network.
  • Training can be terminated when the preset termination condition is reached, indicating that the training has converged.
  • The preset termination condition can be set by the relevant technical personnel according to actual application scenario requirements. In one embodiment, it can be a constraint on the number of iterations, that is, training is terminated when the number of training iterations reaches a preset number. In another embodiment, it can be a loss value constraint, that is, training is terminated when the reconstruction loss value reaches a preset minimum during the iterative training process.
  • the weakly supervised learning mechanism based on a single face image can construct training data in large quantities at low cost, thereby effectively reducing the acquisition cost and labeling cost of training samples, which is beneficial to the rapid research and development of related technologies.
  • this method can decouple and obtain facial expression models for expression migration applications, such as film and television, animation, digital humans and other related fields, which has great practical application value and commercial value.
  • Calculating the reconstruction loss value based on the aligned face area image and the reconstructed face image includes:
  • Step S2410 Calculate a first loss value, which is used to minimize the error between the face area image and the face reconstruction image;
  • the first loss value is calculated based on the depth perception of the face area image and the face reconstruction image. That is, a neural network with mature perceptual capabilities is used to pre-extract the semantic features of the face area image and the face reconstruction image, and then calculate the correlation loss value based on the semantic features.
  • self-supervised modeling is first performed on the reconstructed face image.
  • A face recognition network pre-trained to a converged state is introduced to extract the top-level deep features of the reconstructed face image and the face region image.
  • the face recognition network can use mature neural network models in related technologies, and face recognition models such as VGGNet, FaceNet, and ArcFaceNet can be used for self-supervised training.
  • the ArcFaceNet network can be used, which has better effects.
  • the perceptual loss function can be expressed as:
  • the above similarity loss function is used to constrain the network model so that the reconstructed face is close to the real face, and the surface texture features and lighting parameters are optimized.
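  • The text does not spell out this formula; a minimal sketch using a commonly adopted cosine-distance form between the face recognition network's embeddings is given below as an assumption:

```python
import torch
import torch.nn.functional as F

def perceptual_loss(emb_real: torch.Tensor, emb_recon: torch.Tensor) -> torch.Tensor:
    """emb_real / emb_recon: [N, D] top-level deep features of the face area image and the
    face reconstruction image extracted by the pre-trained face recognition network."""
    return (1.0 - F.cosine_similarity(emb_real, emb_recon, dim=-1)).mean()
```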
  • Step S2420 Calculate a second loss value, the second loss value is used to enhance the shape and pixel level alignment between the face area image and the face reconstruction image;
  • the first loss value implicitly constrains the approximate relationship of the face feature layer.
  • A second loss value is added to strengthen the shape and pixel-level alignment between the face region image and the face reconstruction image, which can be expressed as:
  • This is a strong pixel-level constraint; therefore, in one embodiment, a smaller weight w_photo is given to it to prevent the network from falling into a local solution.
  • Step S2430 Calculate a third loss value.
  • the third loss value is used to ensure that the posture has higher accuracy
  • the first loss value implicitly constrains and optimizes the pose.
  • the third loss value is calculated.
  • α_pose(F_g(x)) ∈ R³ is the posture coefficient obtained in the forward inference of the three-dimensional face reconstruction network, including the roll angle, pitch angle, and rotation angle;
  • Y_pose ∈ R³ is the posture coefficient of the three-dimensional face model obtained in step S2200, which also includes the roll angle, pitch angle, and rotation angle.
  • Step S2440 Calculate a fourth loss value, which is used to optimize the accuracy of projection points in two-dimensional projection;
  • the fourth loss value can also be used for model constraints.
  • The face key point data extracted from the sample and the reprojected points obtained after three-dimensional face reconstruction, 3D rendering, and 2D projection are used to construct the reprojection error constraint.
  • The number of selected vertices is consistent with the number of detected two-dimensional facial key points.
  • Step S2450 Calculate a reconstruction loss value, which is a weighted fusion of the first loss value, the second loss value, the third loss value, and the fourth loss value.
  • Weighted fusion is performed based on the four sub-loss functions constructed in the above steps.
  • w_percep, w_pose, and w_proj represent the weights of the first loss value, the third loss value, and the fourth loss value respectively.
  • The weighted-fusion reconstruction loss value computed from the first loss value, the second loss value, the third loss value, and the fourth loss value can more comprehensively constrain the three-dimensional face reconstruction network so that all parameters obtained by the method are close to the real label values. At the same time, loss calculation and parameter updates based on single samples can accelerate convergence and save training costs.
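  • A minimal sketch of this weighted fusion (the weight values and the simple L1/L2 forms for the photometric, pose, and reprojection terms are illustrative assumptions; perceptual_loss refers to the sketch above):

```python
import torch.nn.functional as F

def reconstruction_loss(emb_real, emb_recon,      # face recognition embeddings (step S2410)
                        img_real, img_recon,      # face area image / face reconstruction image
                        pose_pred, pose_gt,       # predicted vs. Hough-estimated pose (step S2430)
                        kpts_proj, kpts_det,      # reprojected vs. detected key points (step S2440)
                        w_percep=1.0, w_photo=0.2, w_pose=1.0, w_proj=1.0):
    l_percep = perceptual_loss(emb_real, emb_recon)      # first loss value
    l_photo = (img_real - img_recon).abs().mean()        # second loss value (photometric)
    l_pose = F.mse_loss(pose_pred, pose_gt)              # third loss value (pose)
    l_proj = F.mse_loss(kpts_proj, kpts_det)             # fourth loss value (reprojection)
    return w_percep * l_percep + w_photo * l_photo + w_pose * l_pose + w_proj * l_proj
```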
  • A three-dimensional face reconstruction device is provided according to one aspect of the present application. In one embodiment, it includes an image acquisition module 1100, a face detection module 1200, a face modeling module 1300, and a parameter mapping module 1400.
  • the image acquisition module 1100 is configured to acquire face image data and extract face images therein;
  • the face detection module 1200 is configured to perform key point detection on the face image to obtain a face region image of the area where the key points of the face are located;
  • The face modeling module 1300 is configured to use the bilinear modeling layer of the three-dimensional face reconstruction network pre-trained to a converged state to perform bilinear modeling of facial identity and facial expression on the face region image to obtain a parameterized three-dimensional face model.
  • The parameter mapping module 1400 is configured to use the parameter mapping layer of the three-dimensional face reconstruction network to map the face region image to the corresponding parameter coefficients in the parameterized three-dimensional face model, the parameter coefficients including the identity coefficient corresponding to the facial identity and the expression coefficient corresponding to the facial expression.
  • In one embodiment, the parameter mapping module 1400 includes: a coefficient acquisition unit configured to obtain the target parameter coefficients required to constitute the parameterized three-dimensional face model, where the target parameter coefficients include the pre-specified identity coefficient and the pre-specified expression coefficient; an expression migration unit configured to migrate the target parameter coefficients to the three-dimensional face model of the corresponding digital person to obtain the three-dimensional face model of the digital person; and a rendering projection unit configured to render and project the three-dimensional face model of the digital person into the two-dimensional image space to obtain the digital human image.
  • In one embodiment, the face detection module 1200 includes: a face detection unit configured to perform face key point detection on the face image to obtain the face area image and face key point information; a standard alignment unit configured to align the face key points with the standard face key points to obtain the standard alignment parameters, where the standard face key points are the corresponding face key points obtained by two-dimensional projection of a standard three-dimensional face model; and a face alignment unit configured to align the face area image according to the standard alignment parameters.
  • In one embodiment, the parameter mapping module 1400 includes: a feature encoding unit configured to use the encoder in the three-dimensional face reconstruction network to perform feature extraction on the face area image to obtain a face feature map; and a spatial mapping unit configured to perform spatial mapping on the face feature map to obtain the parameter coefficients in the bilinear modeling layer.
  • In one embodiment, the spatial mapping unit includes: a semantic space mapping subunit configured to perform semantic space mapping on the face feature map to obtain a face feature vector; and a parameter space mapping subunit configured to perform parameter space mapping on the face feature vector to obtain the parameter coefficients in the bilinear modeling layer.
  • In one embodiment, the network training module includes: a sample acquisition unit configured to acquire a single sample of preprocessed face image data; a data acquisition unit configured to acquire the face region image, face key points, and three-dimensional face model posture coefficients in the single sample; a reconstruction image unit configured to use the three-dimensional face reconstruction network to reconstruct the three-dimensional face model of the face area image and then render and project it into two dimensions to obtain the face reconstruction image; a loss optimization unit configured to calculate a reconstruction loss value according to the face area image and the face reconstruction image and to update the parameters of the three-dimensional face reconstruction network according to the reconstruction loss value; and a training repetition unit configured to repeat the above operations until a preset termination condition is triggered to end the training, obtaining the three-dimensional face reconstruction network.
  • In one embodiment, the loss optimization unit includes: a first loss subunit configured to calculate a first loss value, the first loss value being used to minimize the error between the face region image and the face reconstruction image; a second loss subunit configured to calculate a second loss value, the second loss value being used to enhance the shape and pixel-level alignment between the face region image and the face reconstruction image; a third loss subunit configured to calculate a third loss value, the third loss value being used to ensure higher accuracy of the posture; a fourth loss subunit configured to calculate a fourth loss value, the fourth loss value being used to optimize the accuracy of the projection points in the two-dimensional projection; and a loss fusion subunit configured to calculate the reconstruction loss value, which is a weighted fusion of the first loss value, the second loss value, the third loss value, and the fourth loss value.
  • FIG. 11 is a schematic diagram of the internal structure of the three-dimensional face reconstruction device.
  • the three-dimensional face reconstruction device includes a processor, a computer-readable storage medium, a memory and a network interface connected through a system bus.
  • The non-volatile computer-readable storage medium of the three-dimensional face reconstruction device stores an operating system, a database, and computer-readable instructions.
  • The database can store information sequences, and when the computer-readable instructions are executed by the processor, the processor can be caused to implement a three-dimensional face reconstruction method.
  • the processor of the three-dimensional face reconstruction device is used to provide computing and control capabilities to support the operation of the entire three-dimensional face reconstruction device.
  • Computer-readable instructions may be stored in the memory of the three-dimensional face reconstruction device. When executed by the processor, the computer-readable instructions may cause the processor to execute the three-dimensional face reconstruction method of the present application.
  • the network interface of the three-dimensional face reconstruction device is used to connect and communicate with the terminal.
  • FIG. 11 is only a block diagram of part of the structure related to the solution of the present application.
  • A specific three-dimensional face reconstruction device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • the processor is used to execute the specific functions of each module in Figure 10, and the memory stores program codes and various types of data required to execute the above modules or sub-modules.
  • the network interface is used to realize data transmission between user terminals or servers.
  • the non-volatile readable storage medium in this embodiment stores the program codes and data required to execute all modules in the three-dimensional face reconstruction device of the present application.
  • The server can call these program codes and data to execute the functions of all the modules.
  • This application also provides a non-volatile readable storage medium storing computer-readable instructions.
  • When executed by one or more processors, the computer-readable instructions cause the one or more processors to execute the steps of the three-dimensional face reconstruction method of any embodiment of the present application.
  • the present application also provides a computer program product, which includes a computer program/instruction that implements the steps of the method described in any embodiment of the present application when executed by one or more processors.
  • the computer program can be stored in a non-volatile readable storage medium, and when the program is executed, it may include the processes of the above-mentioned method embodiments.
  • the aforementioned storage media can be computer-readable storage media such as magnetic disks, optical disks, read-only memory (Read-Only Memory, ROM), or random access memory (Random Access Memory, RAM), etc.
  • this application can achieve three-dimensional face reconstruction.
  • the three-dimensional face reconstruction method uses a bilinear modeling layer to decouple the identity information and the expression information in the face, thereby effectively separating the expression parameters; expression migration realized on this basis can greatly promote the application and development of related industries such as live streaming, film and television (see the sketch after this list for an illustration of the bilinear decoupling).
  • the training of the method is based on weakly supervised learning from a single image, which can greatly reduce the acquisition cost and the labeling cost of training data and is conducive to large-scale application.
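As a rough illustration of the bilinear decoupling of identity and expression mentioned above, the sketch below shows a FaceWarehouse-style bilinear layer in which a core tensor is contracted with separate identity and expression coefficient vectors; expression migration then amounts to keeping the identity coefficients fixed while swapping the expression coefficients. The tensor dimensions, class and variable names are illustrative assumptions, not the parameterisation used in this application.

```python
import numpy as np

class BilinearFaceModel:
    def __init__(self, core_tensor: np.ndarray):
        # core_tensor shape: (num_vertices * 3, num_identity, num_expression)
        self.core = core_tensor

    def reconstruct(self, id_coeff: np.ndarray, exp_coeff: np.ndarray) -> np.ndarray:
        """Contract the core tensor with identity and expression coefficients
        to obtain the 3D face vertices."""
        verts = np.tensordot(self.core, id_coeff, axes=([1], [0]))  # (V*3, num_expression)
        verts = np.tensordot(verts, exp_coeff, axes=([1], [0]))     # (V*3,)
        return verts.reshape(-1, 3)

# Usage: expression transfer keeps id_coeff fixed while swapping exp_coeff.
core = np.random.rand(5 * 3, 50, 47)                 # toy core tensor
model = BilinearFaceModel(core)
identity = np.random.rand(50)
neutral, smiling = np.random.rand(47), np.random.rand(47)
face_neutral = model.reconstruct(identity, neutral)
face_smiling = model.reconstruct(identity, smiling)  # same identity, new expression
```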

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computer Graphics (AREA)
  • Artificial Intelligence (AREA)
  • Geometry (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)
  • Image Processing (AREA)

Abstract

Disclosed are a method, apparatus and device for three-dimensional face reconstruction, as well as a medium and a product. The method comprises the steps of: acquiring face image data and extracting a face image from the face image data; performing key point detection on the face image so as to obtain a face region image in which the face key points are located; performing bilinear modeling of face identity and face expression on the face region image by means of a bilinear modeling layer of a three-dimensional face reconstruction network pre-trained to a convergence state, so as to obtain a parameterized three-dimensional face model; and mapping the face region image into corresponding parameter coefficients in the parameterized three-dimensional face model by means of a parameter mapping layer of the three-dimensional face reconstruction network.
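The following sketch traces the inference flow summarised in the abstract: key point detection, face region extraction, parameter mapping and bilinear reconstruction. Every function and variable here is a stand-in stub invented for illustration under that reading of the abstract; none of these names come from this application.

```python
import numpy as np

def detect_face_keypoints(image: np.ndarray) -> np.ndarray:
    """Stub key point detector: returns (68, 2) pixel coordinates."""
    h, w = image.shape[:2]
    xs = np.random.uniform(0.25 * w, 0.75 * w, 68)
    ys = np.random.uniform(0.25 * h, 0.75 * h, 68)
    return np.stack([xs, ys], axis=1)

def crop_face_region(image: np.ndarray, keypoints: np.ndarray) -> np.ndarray:
    """Crop the face region image bounded by the detected key points."""
    x0, y0 = keypoints.min(axis=0).astype(int)
    x1, y1 = keypoints.max(axis=0).astype(int)
    return image[y0:y1, x0:x1]

def map_parameters(face_region: np.ndarray):
    """Stub for the parameter mapping layer: face region -> coefficients."""
    return np.random.rand(50), np.random.rand(47)    # identity, expression

def bilinear_reconstruct(id_coeff, exp_coeff, core):
    """Contract a bilinear core tensor with identity/expression coefficients."""
    verts = np.tensordot(core, id_coeff, axes=([1], [0]))
    verts = np.tensordot(verts, exp_coeff, axes=([1], [0]))
    return verts.reshape(-1, 3)

image = np.random.rand(256, 256, 3)                  # stand-in face image
core = np.random.rand(5 * 3, 50, 47)                 # toy bilinear core tensor
kpts = detect_face_keypoints(image)                  # key point detection
face_region = crop_face_region(image, kpts)          # face region image
identity, expression = map_parameters(face_region)   # parameter mapping layer
mesh = bilinear_reconstruct(identity, expression, core)  # (V, 3) face vertices
```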
PCT/CN2023/111005 2022-08-12 2023-08-03 Procédé, appareil et dispositif de reconstruction tridimensionnelle d'un visage, support et produit WO2024032464A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210969989.1A CN115330947A (zh) 2022-08-12 2022-08-12 三维人脸重建方法及其装置、设备、介质、产品
CN202210969989.1 2022-08-12

Publications (1)

Publication Number Publication Date
WO2024032464A1 true WO2024032464A1 (fr) 2024-02-15

Family

ID=83923644

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/111005 WO2024032464A1 (fr) 2022-08-12 2023-08-03 Procédé, appareil et dispositif de reconstruction tridimensionnelle d'un visage, support et produit

Country Status (2)

Country Link
CN (1) CN115330947A (fr)
WO (1) WO2024032464A1 (fr)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115330947A (zh) * 2022-08-12 2022-11-11 百果园技术(新加坡)有限公司 三维人脸重建方法及其装置、设备、介质、产品
CN115690327A (zh) * 2022-11-16 2023-02-03 广州大学 一种空频解耦的弱监督三维人脸重建方法
CN116228763B (zh) * 2023-05-08 2023-07-21 成都睿瞳科技有限责任公司 用于眼镜打印的图像处理方法及系统
CN116993948B (zh) * 2023-09-26 2024-03-26 粤港澳大湾区数字经济研究院(福田) 一种人脸三维重建方法、系统及智能终端
CN117237547B (zh) * 2023-11-15 2024-03-01 腾讯科技(深圳)有限公司 图像重建方法、重建模型的处理方法和装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060001673A1 (en) * 2004-06-30 2006-01-05 Mitsubishi Electric Research Laboratories, Inc. Variable multilinear models for facial synthesis
CN103093490A (zh) * 2013-02-02 2013-05-08 浙江大学 基于单个视频摄像机的实时人脸动画方法
CN114241102A (zh) * 2021-11-11 2022-03-25 清华大学 基于参数化模型的人脸细节重建和编辑方法及装置
CN114742954A (zh) * 2022-04-27 2022-07-12 南京大学 一种构建大规模多样化人脸图片和模型数据对的方法
CN115330947A (zh) * 2022-08-12 2022-11-11 百果园技术(新加坡)有限公司 三维人脸重建方法及其装置、设备、介质、产品

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CAO, CHEN ET AL.: "FaceWarehouse: A 3D Facial Expression Database for Visual Computing", IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, vol. 20, no. 3, 31 March 2014 (2014-03-31), XP011543570, ISSN: 1077-2626, DOI: 10.1109/TVCG.2013.249 *

Also Published As

Publication number Publication date
CN115330947A (zh) 2022-11-11

Similar Documents

Publication Publication Date Title
WO2024032464A1 (fr) Procédé, appareil et dispositif de reconstruction tridimensionnelle d'un visage, support et produit
CN110458939B (zh) 基于视角生成的室内场景建模方法
US11538216B2 (en) Dynamically estimating light-source-specific parameters for digital images using a neural network
CN109285215B (zh) 一种人体三维模型重建方法、装置和存储介质
US11748934B2 (en) Three-dimensional expression base generation method and apparatus, speech interaction method and apparatus, and medium
Park et al. Transformation-grounded image generation network for novel 3d view synthesis
Rematas et al. Novel views of objects from a single image
US9792725B2 (en) Method for image and video virtual hairstyle modeling
JP2022524891A (ja) 画像処理方法及び装置、電子機器並びにコンピュータプログラム
WO2022001236A1 (fr) Procédé et appareil de génération de modèle tridimensionnel, ainsi que dispositif informatique et support de stockage
CN112085835B (zh) 三维卡通人脸生成方法、装置、电子设备及存储介质
CN110458924B (zh) 一种三维脸部模型建立方法、装置和电子设备
WO2021063271A1 (fr) Procédé et système de reconstruction de modèle de corps humain, et support de stockage associé
WO2024007478A1 (fr) Procédé et système de collecte et de reconstruction de données de modélisation de corps humain tridimensionnel basés sur un seul téléphone mobile
EP3855386B1 (fr) Procédé, appareil, dispositif et support de stockage pour transformation de coiffure et produit programme informatique
WO2023066120A1 (fr) Procédé et appareil de traitement d'image, dispositif électronique et support de stockage
CN111754622B (zh) 脸部三维图像生成方法及相关设备
US20200118333A1 (en) Automated costume augmentation using shape estimation
CN115496862A (zh) 基于spin模型的实时三维重建方法和系统
CN111402403B (zh) 高精度三维人脸重建方法
CN115775300A (zh) 人体模型的重建方法、人体重建模型的训练方法及装置
Patterson et al. Landmark-based re-topology of stereo-pair acquired face meshes
Peng et al. Geometrical consistency modeling on b-spline parameter domain for 3d face reconstruction from limited number of wild images
CN117557699B (zh) 动画数据生成方法、装置、计算机设备和存储介质
CN116704097B (zh) 基于人体姿态一致性和纹理映射的数字化人形象设计方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23851685

Country of ref document: EP

Kind code of ref document: A1