CN116363308A

CN116363308A - Human body three-dimensional reconstruction model training method, human body three-dimensional reconstruction method and equipment

Info

Publication number: CN116363308A
Application number: CN202310276181.XA
Authority: CN
Inventors: 郑喜民; 刘智方; 舒畅; 陈又新; 肖京
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2023-03-10
Filing date: 2023-03-10
Publication date: 2023-06-30

Abstract

The application relates to the technical field of deep learning, and in particular to a human body three-dimensional reconstruction model training method, a human body three-dimensional reconstruction method and equipment. The hyper-realistic human body reconstruction method, based on a neural human body representation network and an MLP network, combines the advantages of two approaches: the parameterized model provides rich prior information, so the reconstruction result contains bone information with biological and anatomical meaning, and the neural radiance field-surface deformation field method provides high reconstruction precision. Together these make automatic reconstruction of hyper-realistic, drivable digital humans possible. The method is highly automated and requires no manual intervention at any stage. The reconstruction result is compatible with mainstream commercial animation production software and pipelines, which greatly reduces the cost of producing hyper-realistic virtual digital humans and improves the efficiency of accurately constructing a dynamic human body model. The generated human body model can be used as a virtual customer service avatar when financial products are purchased or after-sales questions are answered, significantly reducing the cost to enterprises of producing virtual human models.

Description

Human body three-dimensional reconstruction model training method, human body three-dimensional reconstruction method and equipment

Technical Field

The application relates to the technical field of deep learning, in particular to a human body three-dimensional reconstruction model training method, a human body three-dimensional reconstruction method and related equipment.

Background

Technologies such as the metaverse and virtual reality/augmented reality have developed rapidly in recent years. Fields such as virtual streamers, digital avatars, and film and game production all show great development potential, and the market scale of the related industries is growing steadily. Three-dimensional digital humans are an important component of the virtual world and the main subject of virtual interaction, so closely related areas such as human body three-dimensional reconstruction and digital human driving have attracted wide attention from industry and academia and have become a shared research hotspot in computer vision, computer graphics and related fields.

Currently, commercial high-quality human body models are mainly acquired with methods that can capture a fine human body model and even information such as surface materials, but at high cost. With the development of deep learning, sparse-view reconstruction methods have advanced greatly, and both parameterized-model-based methods and implicit-function-field-based methods have developed rapidly. However, these methods still have problems: implicit function field methods tend to reconstruct static human bodies and are difficult to use directly in commercial animation pipelines, while parameterized models have limited representation capability and yield only a bare body model. The generated human body model can be used as a virtual customer service avatar when financial products are purchased or after-sales questions are answered, significantly reducing the cost to enterprises of producing virtual human models.

Disclosure of Invention

The main purpose of the application is to provide a human body three-dimensional reconstruction model training method, a human body three-dimensional reconstruction method and related equipment, and aims to solve the problem that a dynamic human body model cannot be accurately constructed in the prior art.

In order to achieve the above object, the present application proposes a training method for a three-dimensional reconstruction model of a human body, including:

acquiring a human body model video, wherein the human body model video comprises different human bodies and different human body postures, generating a plurality of data pairs of human body skeleton skin models by using a STAR model, and performing first training on a neural human body characterization network according to the data pairs to obtain first human body posture parameters;

performing second training on the neural human body representation network after the first training according to the human body model video to obtain output of the neural human body representation network, wherein partial weights of the neural human body representation network after the first training are frozen, and the partial weights are weights corresponding to human body posture parameters obtained by the first training of the neural human body representation network;

training an MLP network according to the output of the neural human body representation network and the human body model video;

after the neural human body characterization network and the MLP network are trained, obtaining a human body three-dimensional reconstruction model.

The application also provides a human body three-dimensional reconstruction method, which comprises the following steps:

acquiring a human RGB image;

according to the human RGB image, obtaining human posture parameters by an HMR method;

inputting the human body posture parameters into a human body three-dimensional reconstruction model to obtain a human body three-dimensional reconstruction map, wherein the human body three-dimensional reconstruction model is a model trained by any one of the methods.

The application also provides a human body three-dimensional reconstruction model training device, which comprises:

the first training module is used for acquiring a human body model video, wherein the human body model video comprises different human bodies and different human body postures, generating a plurality of data pairs of human body skeleton skin models by using a STAR model, and performing first training on a neural human body characterization network according to the data pairs to obtain first human body posture parameters;

the second training module is used for carrying out second training on the neural human body representation network after the first training according to the human body model video to obtain the output of the neural human body representation network, wherein partial weights of the neural human body representation network after the first training are frozen, and the partial weights are weights corresponding to human body posture parameters obtained by the first training of the neural human body representation network;

the third training module is used for training the MLP network according to the output of the neural human body representation network and the human body model video;

and the human body three-dimensional reconstruction model generation module is used for obtaining a human body three-dimensional reconstruction model after the neural human body characterization network and the MLP network are trained.

The application also provides a human body three-dimensional reconstruction device, which comprises:

the image acquisition module is used for acquiring human RGB images;

the HMR processing module is used for obtaining human body posture parameters through an HMR method according to the human body RGB image;

the human body three-dimensional reconstruction map generation module is used for inputting the human body posture parameters into a human body three-dimensional reconstruction model to obtain a human body three-dimensional reconstruction map, wherein the human body three-dimensional reconstruction model is a model trained by any one of the methods.

The present application also provides a computer device comprising a memory and a processor, said memory having stored therein a computer program, characterized in that the processor, when executing said computer program, implements the steps of any of the methods described above.

The present application also provides a computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor implements the steps of any of the methods described above.

In the human body three-dimensional reconstruction model training method, the human body three-dimensional reconstruction method and the related equipment of the present application, the hyper-realistic human body reconstruction method based on the neural human body characterization network and the MLP network combines the advantages of two approaches: the parameterized model provides rich prior information, so the reconstruction result contains bone information with biological and anatomical meaning, and the neural radiance field-surface deformation field method provides high reconstruction precision. Together these make automatic reconstruction of hyper-realistic, drivable digital humans possible. The method is highly automated and requires no manual intervention at any stage. The reconstruction result is compatible with mainstream commercial animation production software and pipelines, which greatly reduces the cost of producing hyper-realistic virtual digital humans and improves the efficiency of accurately constructing a dynamic human body model. The generated human body model can be used as a virtual customer service avatar when financial products are purchased or after-sales questions are answered, significantly reducing the cost to enterprises of producing virtual human models.

Drawings

FIG. 1 is a schematic diagram showing steps of a training method for a three-dimensional reconstruction model of a human body according to an embodiment of the present application;

FIG. 2 is a schematic diagram illustrating steps of a three-dimensional reconstruction method of a human body according to an embodiment of the present application;

FIG. 3 is a block diagram of the overall structure of a training device for a three-dimensional reconstruction model of a human body in an embodiment of the present application;

FIG. 4 is a block diagram of the overall structure of a three-dimensional reconstruction apparatus for human body according to an embodiment of the present application;

FIG. 5 is a block diagram schematically illustrating a structure of a computer device according to an embodiment of the present application.

The realization, functional characteristics and advantages of the present application will be further described with reference to the embodiments, referring to the attached drawings.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

Referring to fig. 1, in an embodiment of the present application, a training method for a three-dimensional reconstruction model of a human body is provided, including steps S1 to S4, specifically:

S1, acquiring human body model videos, wherein the human body model videos comprise different human bodies and different human body postures, generating a plurality of data pairs of human body skeleton skin models by using a STAR model, and performing first training on a neural human body characterization network according to the data pairs to obtain first human body posture parameters;

S2, performing second training on the neural human body representation network after the first training according to the human body model video to obtain output of the neural human body representation network, wherein partial weights of the neural human body representation network after the first training are frozen, and the partial weights are weights corresponding to human body posture parameters obtained by the first training of the neural human body representation network;

S3, training an MLP network according to the output of the neural human body representation network and the human body model video;

S4, after the neural human body characterization network and the MLP network are trained, obtaining a human body three-dimensional reconstruction model.

Specifically, for step S1, the human body model video includes a plurality of human bodies and a plurality of human body postures corresponding to those human bodies. Training uses the human body model video frame by frame, so that the model can learn the motion continuity of different human body postures. The STAR model (Sparse Trained Articulated Human Body Regressor) is an improved version of the SMPL model. The SMPL model (Skinned Multi-Person Linear Model) is a parameterized human body model proposed by the Max Planck Institute. This modeling approach can simulate the bulging and hollowing of a person's muscles during limb movement, so it avoids surface distortion of the human body during motion and accurately depicts the appearance of muscles as they stretch and contract. The STAR model has preset human skeleton data and skin data. The purpose of the SMPL model is to define the shape of the human body, such as fat, thin and tall, together with the posture of human body motion. The neural human body characterization network is trained for the first time on the human skeleton data and skin data preset in the STAR model, so that the reconstruction loss from the parameters to the human body directed distance field is minimized and the mapping from the parameters to the human skeleton is optimized.

Specifically, for step S2, after the neural human body characterization network completes the first training, the human body posture parameters are obtained and the weights corresponding to those human body posture parameters are frozen. The other parameters in the neural human body representation network are then further trained according to the human body model video.
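The weight freezing described for step S2 can be illustrated with a minimal sketch. It assumes a plain gradient-descent update; the function `sgd_step` and the parameter names `pose_branch.w` and `sdf_branch.w` are hypothetical and are not taken from the application.

```python
from typing import Dict, Set


def sgd_step(params: Dict[str, float],
             grads: Dict[str, float],
             frozen: Set[str],
             lr: float = 0.1) -> Dict[str, float]:
    """Return updated parameters, leaving frozen entries untouched."""
    updated = {}
    for name, value in params.items():
        if name in frozen:
            # Frozen weight: keep the value learned in the first training.
            updated[name] = value
        else:
            # Trainable weight: ordinary gradient-descent update.
            updated[name] = value - lr * grads.get(name, 0.0)
    return updated


# Hypothetical parameter names, for illustration only.
params = {"pose_branch.w": 0.5, "sdf_branch.w": -1.0}
grads = {"pose_branch.w": 2.0, "sdf_branch.w": 4.0}
frozen = {"pose_branch.w"}

new_params = sgd_step(params, grads, frozen)
```

Only the entries outside `frozen` change, which mirrors training "other parameters" while the first-stage weights stay fixed.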

Specifically, for steps S3 and S4, an MLP network is trained according to the output of the neural human body characterization network and the human body model video, which yields a new view synthesis loss, a directed distance field regularization loss and a surface deformation field regularization loss. New view synthesis includes synthesizing views of the human body model from different viewing angles. Regularization is used to prevent overfitting and thereby strengthen the generalization capability of the model: it keeps the model from performing well only on the training set and poorly on the test set, improving the test-set performance of a model trained on the training data. After the neural human body characterization network and the MLP network are trained, a human body three-dimensional reconstruction model is obtained.
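How the three loss terms named for step S3 might be combined can be sketched as follows. This is an illustration under stated assumptions, not the formulation of the application: the mean-squared-error form of the new view synthesis loss, the weighted-sum combination, and the names `view_synthesis_loss`, `total_loss`, `w_sdf` and `w_deform` are all hypothetical.

```python
from typing import Sequence, Tuple

Color = Tuple[float, float, float]


def view_synthesis_loss(rendered: Sequence[Color],
                        target: Sequence[Color]) -> float:
    """Mean squared color error between rendered and observed pixels (assumed form)."""
    if len(rendered) != len(target) or not rendered:
        raise ValueError("pixel lists must be non-empty and of equal length")
    total = 0.0
    for r, t in zip(rendered, target):
        total += sum((rc - tc) ** 2 for rc, tc in zip(r, t))
    return total / (3 * len(rendered))


def total_loss(l_view: float, l_sdf_reg: float, l_deform_reg: float,
               w_sdf: float = 0.1, w_deform: float = 0.1) -> float:
    """Weighted sum of the three terms; the weights are assumed hyperparameters."""
    return l_view + w_sdf * l_sdf_reg + w_deform * l_deform_reg


rendered = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
target = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]

l_view = view_synthesis_loss(rendered, target)
loss = total_loss(l_view, l_sdf_reg=2.0, l_deform_reg=4.0)
```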

In one embodiment, the training the neural human body representation network after the first training for the second time according to the human body model video to obtain the output of the neural human body representation network includes:

S201, obtaining a second human body posture parameter through an HMR method according to the human body model video;

S202, training the neural human body representation network after the first training for the second time according to the second human body posture parameters to obtain the output of the neural human body representation network.

Specifically, for steps S201 and S202, training uses the human body model video frame by frame, and each frame of the video is processed with the HMR (Human Mesh Recovery) method, so that the model can learn the motion continuity of different human body postures. The HMR method adopts an end-to-end adversarial learning network for human body posture and shape, and learns the mapping from human RGB images to 3D human bodies. It requires no paired 2D-to-3D supervision and no intermediate 2D keypoint detection, going directly from pixels to a mesh. The second human body posture parameters include posture parameters and body shape parameters. The second human body posture parameters are used to perform the second training on the neural human body representation network after the first training, giving the output of the neural human body representation network, which comprises a directed distance field and human bones.

In one embodiment, the step of generating a plurality of data pairs of a human skeleton skin model by using the STAR model, and performing first training on the neural human body characterization network according to the data pairs to obtain first human body posture parameters includes:

S301, acquiring a bone-key point real value and a human body projection real value;

S302, generating a plurality of data pairs of human skeleton skin models by using a STAR model, and obtaining skeleton-key point predicted values and human projection predicted values through a neural human body characterization network;

S303, calculating the loss of the bone-key point true value and the bone-key point predicted value to obtain a bone-key point loss;

S304, calculating the loss of the human projection true value and the human projection predicted value to obtain human projection loss;

S305, optimizing parameters of a neural human body representation network according to the skeleton-key point loss and the human body projection loss to obtain the first human body posture parameters.

Specifically, for steps S301, S302, S303, S304 and S305, a STAR model is used to generate a plurality of data pairs of human skeleton skin models, and the data pairs are processed by the neural human body characterization network to obtain a bone-key point predicted value and a human projection predicted value. The losses between the bone-key point predicted value and the bone-key point true value, and between the human projection predicted value and the human projection true value, are calculated separately. The neural human body representation network is then optimized according to the calculated loss function, which minimizes the reconstruction loss from the parameters to the human body directed distance field while optimizing the mapping from the parameters to the human bones, and the first human body posture parameters are obtained.
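The two first-stage losses can be sketched as below. The application does not give their exact form, so this example assumes a mean squared distance for the bone-key point loss and a mean squared error over projection (silhouette) values for the human projection loss; the function names and the unweighted sum are likewise assumptions.

```python
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]


def keypoint_loss(pred: Sequence[Point3], true: Sequence[Point3]) -> float:
    """Mean squared Euclidean distance between predicted and true key points."""
    if len(pred) != len(true) or not pred:
        raise ValueError("key point lists must be non-empty and of equal length")
    total = 0.0
    for p, t in zip(pred, true):
        total += sum((pc - tc) ** 2 for pc, tc in zip(p, t))
    return total / len(pred)


def projection_loss(pred_mask: Sequence[float],
                    true_mask: Sequence[float]) -> float:
    """Mean squared error between predicted and true projection values."""
    if len(pred_mask) != len(true_mask) or not pred_mask:
        raise ValueError("masks must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred_mask, true_mask)) / len(pred_mask)


pred_kp = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
true_kp = [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
pred_mask = [1.0, 0.0, 1.0, 1.0]
true_mask = [1.0, 0.0, 0.0, 1.0]

l_kp = keypoint_loss(pred_kp, true_kp)
l_proj = projection_loss(pred_mask, true_mask)
stage1_loss = l_kp + l_proj  # assumed unweighted sum
```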

In one embodiment, the step S3 of training the MLP network according to the output of the neural human body characterization network and the human body model video includes:

S401, performing new view synthesis loss processing, directed distance field regularization loss processing and surface deformation field regularization loss processing according to the output of the neural human body characterization network and the human body model video;

S402, optimizing the MLP network according to the new view synthesis loss, the directed distance field regularization loss and the surface deformation field regularization loss.

Specifically, for steps S401 and S402, new view synthesis includes synthesizing views of the human body model from different viewing angles. The purpose of regularization is to keep the parameters from becoming too numerous or too large, so that the model does not become overly complex. To this end an extra penalty term, the regularization term, is added to the objective function. Adding a penalty term can be seen as restricting certain parameters in the loss function. By the norm used, penalties are divided into the L0 norm penalty, the L1 norm penalty (parameter sparsity penalty) and the L2 norm penalty (weight decay penalty).

L0 norm penalty: to prevent overfitting, the weights w of the higher-order terms can be constrained to 0, which is equivalent to reducing a higher-order form to a lower-order one. The most intuitive way to achieve this is to limit the number of nonzero elements of w, but optimizing under such a constraint is an NP-hard problem and is very difficult to solve. L1 and L2 regularization are therefore commonly used in machine learning instead. L1 regularization is also known as Lasso, and L2 regularization is also known as Ridge. L1 norm: the sum of the absolute values of the elements of the weight vector w. L1 regularization can produce a sparse weight matrix, that is, a sparse model, which can be used for feature selection. L2 norm: the square root of the sum of the squares of the elements of the weight vector w. L2 regularization can prevent the model from overfitting, and to some extent L1 can also prevent overfitting.

The directed distance field regularization loss allows the space of the three-dimensional human body model to be represented more accurately. The surface deformation field regularization ensures temporal consistency. The MLP network includes a local implicit surface regression network and a color texture network.
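The norms described above can be computed directly, as in the following minimal sketch. The helper names and the coefficient `lam` are illustrative assumptions. Note that the Ridge penalty conventionally uses the squared L2 norm, which is what `regularized_loss` applies for `kind="l2"`.

```python
import math
from typing import Sequence


def l0_norm(w: Sequence[float]) -> int:
    """Number of nonzero elements of w."""
    return sum(1 for x in w if x != 0.0)


def l1_norm(w: Sequence[float]) -> float:
    """Sum of the absolute values of the elements of w."""
    return sum(abs(x) for x in w)


def l2_norm(w: Sequence[float]) -> float:
    """Square root of the sum of the squares of the elements of w."""
    return math.sqrt(sum(x * x for x in w))


def regularized_loss(data_loss: float, w: Sequence[float],
                     lam: float, kind: str = "l2") -> float:
    """Add an L1 (Lasso) or squared-L2 (Ridge) penalty to a data loss."""
    if kind == "l1":
        penalty = l1_norm(w)
    elif kind == "l2":
        penalty = l2_norm(w) ** 2
    else:
        raise ValueError("kind must be 'l1' or 'l2'")
    return data_loss + lam * penalty


w = [3.0, 0.0, -4.0]
lasso = regularized_loss(1.0, w, lam=0.1, kind="l1")
ridge = regularized_loss(1.0, w, lam=0.1, kind="l2")
```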

Referring to fig. 2, in an embodiment of the present application, a three-dimensional reconstruction method for a human body is provided, including steps A1-A3, specifically:

A1, acquiring human RGB images;

A2, obtaining human body posture parameters through an HMR method according to the human body RGB image;

A3, inputting the human body posture parameters into a human body three-dimensional reconstruction model to obtain a human body three-dimensional reconstruction map, wherein the human body three-dimensional reconstruction model is a model trained by any one of the methods.

Specifically, for steps A1, A2 and A3, the human RGB image is a color image containing a human body model. The data used in actual use are not the data pairs of the human skeleton skin model preset in the STAR model, so the human RGB image needs to be processed by the HMR method. The HMR method adopts an end-to-end adversarial learning network for human body posture and shape, and learns the mapping from human RGB images to 3D human bodies. It requires no paired 2D-to-3D supervision and no intermediate 2D keypoint detection, going directly from pixels to a mesh. The human body posture parameters include posture parameters and body shape parameters. The human body posture parameters are input into the trained human body three-dimensional reconstruction model to obtain a human body three-dimensional reconstruction map. This improves the efficiency of accurately constructing a dynamic human body model and greatly reduces the cost of producing hyper-realistic virtual digital humans. The generated human body model can be used as a virtual customer service avatar when financial products are purchased or after-sales questions are answered, significantly reducing the cost to enterprises of producing virtual human models.
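The data flow of steps A1 to A3 can be sketched as a small pipeline. Everything named here is hypothetical: `BodyParams`, `reconstruct_body`, and the stand-in functions `fake_hmr` and `fake_model` only show how the stages connect, and a real system would substitute an HMR implementation and the trained reconstruction model.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class BodyParams:
    """Human body posture parameters: posture parameters plus body shape parameters."""
    pose: List[float]
    shape: List[float]


def reconstruct_body(image: Any,
                     estimate_params: Callable[[Any], BodyParams],
                     reconstruction_model: Callable[[BodyParams], Any]) -> Any:
    """Run A2 then A3 on the image acquired in A1."""
    params = estimate_params(image)        # A2: HMR-style parameter regression
    return reconstruction_model(params)    # A3: trained reconstruction model


# Stand-ins for illustration only.
def fake_hmr(image: Any) -> BodyParams:
    return BodyParams(pose=[0.0] * 3, shape=[0.0] * 2)


def fake_model(params: BodyParams) -> dict:
    return {"num_pose": len(params.pose), "num_shape": len(params.shape)}


result = reconstruct_body(image=None,
                          estimate_params=fake_hmr,
                          reconstruction_model=fake_model)
```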

In one embodiment, the step of inputting the human body posture parameter into a human body three-dimensional reconstruction model to obtain a human body three-dimensional reconstruction map includes:

A301, inputting the human body posture parameters into the neural human body characterization network in the human body three-dimensional reconstruction model to obtain an initial directed distance field S₀ and human bones;

A302, setting any point p in the space, and obtaining the p-point directed distance and the p-point color texture through the MLP network according to the initial directed distance field S₀ and the human bones, to form the human body three-dimensional reconstruction map, wherein the space is the space where the human body model is located, and the directed distance is one of the directed distances in the initial directed distance field S₀.

Specifically, for steps A301 and A302, the human body posture parameters are input into the neural human body characterization network to obtain an initial directed distance field S₀ and human bones. A distance field is a data set that records, for each location, the distance to a specified position. A directed distance field (signed distance field, SDF) is a distance field that records a sign indicating direction in addition to the distance value. The directed distance is one of the directed distances in the initial directed distance field S₀. Any point p is set in a space, the space being the space where the human body model is located. The human bones obtained from the neural human body characterization network are used to determine the number K of human bones. According to the number K of human bones, a surface deformation field containing 2*K MLP networks is generated. An MLP (Multi-Layer Perceptron) is an artificial neural network with a feedforward structure that maps a set of input vectors to a set of output vectors. An MLP can be seen as a directed graph consisting of multiple layers of nodes, each layer fully connected to the next. Except for the input nodes, each node is a neuron (or processing unit) with a nonlinear activation function. A supervised learning method called the back propagation algorithm is often used to train an MLP. The MLP is a generalization of the perceptron and overcomes the perceptron's inability to recognize linearly inseparable data.

K of the MLP networks are local implicit surface regression networks: for any point p in the space where the human body model is located, the position of p in a bone coordinate system is calculated to obtain the corresponding directed distance. The bone coordinate system is defined as a right-handed coordinate system whose origin is at the bone center point, whose x axis has the same direction as the x axis of the human body model root node in the T-pose, and whose y axis points from the parent joint to the child joint. The other K MLP networks are color texture networks, used to determine the color of any point according to the RGB image. Through the local implicit surface regression networks of the MLP network, combined with the bone coordinate system, the directed distance of point p is obtained as S₁(p) = min_j(Local_SDF_j), where j is the bone index. This step allows the position of point p to be determined accurately. At the same time, a measure of whether point p is inside or outside the human body, sigmoid(SDF(p)), can be obtained and used for position determination and loss calculation when the subsequently generated human body model is applied to other models.
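The per-bone minimum S₁(p) = min_j(Local_SDF_j) and the sigmoid inside/outside measure can be sketched as follows. In the application each local directed distance comes from a trained MLP; here an analytic capsule stands in for that network purely for illustration. The capsule, the negative-inside sign convention, the bone dictionaries and all function names are assumptions, and the bone frames are taken as given rather than constructed.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Axes = Tuple[Vec3, Vec3, Vec3]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def to_bone_coords(p: Vec3, center: Vec3, axes: Axes) -> Vec3:
    """Express world point p in a bone frame given by its center and x/y/z axes."""
    d = (p[0] - center[0], p[1] - center[1], p[2] - center[2])
    return (dot(d, axes[0]), dot(d, axes[1]), dot(d, axes[2]))


def capsule_sdf(local_p: Vec3, half_length: float, radius: float) -> float:
    """Stand-in for a learned local SDF: a capsule along the local y axis.

    Assumed sign convention: negative inside, positive outside.
    """
    x, y, z = local_p
    y_nearest = max(-half_length, min(half_length, y))
    return math.sqrt(x * x + (y - y_nearest) ** 2 + z * z) - radius


def body_sdf(p: Vec3, bones: Sequence[dict]) -> float:
    """S1(p): minimum over bones j of the local directed distance."""
    return min(
        capsule_sdf(to_bone_coords(p, b["center"], b["axes"]),
                    b["half_length"], b["radius"])
        for b in bones
    )


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


IDENTITY: Axes = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
bones = [
    {"center": (0.0, 0.0, 0.0), "axes": IDENTITY, "half_length": 1.0, "radius": 0.5},
    {"center": (3.0, 0.0, 0.0), "axes": IDENTITY, "half_length": 1.0, "radius": 0.5},
]

d_inside = body_sdf((0.0, 0.0, 0.0), bones)    # on the first bone's axis
d_outside = body_sdf((1.5, 0.0, 0.0), bones)   # midway between the two bones
inside_measure = sigmoid(d_inside)
```

Under the assumed sign convention, a point inside the body has a negative distance and therefore a sigmoid value below 0.5.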

In one embodiment, the step of setting any point p in the space and obtaining the p-point color texture through the MLP network according to the initial directed distance field S₀ and the human bones, to form the human body three-dimensional reconstruction map, includes:

A401, obtaining the human body surface by using the Marching Cubes algorithm according to the human body posture parameters, the directed distance field S₀ and the human bones;

A402, obtaining, according to the human body surface, the color texture I(p) corresponding to point p on the RGB image, and obtaining the p-point color texture through the MLP network to form the human body three-dimensional reconstruction map.

Specifically, for steps A401 and A402, starting from the initial directed distance S₀(p) of point p, the Marching Cubes algorithm is used to obtain the human body surface according to the human body posture parameters, the directed distance field S₀ and the human bones. The color texture I(p) corresponding to point p on the RGB image is then obtained, and the p-point color texture is obtained by combining the bone coordinate system with the color texture networks of the MLP network.

Here j is the bone index corresponding to point p, i is a bone index, Local_SDF_j is the directed distance of point p, and Local_SDF_i is the directed distance for bone index i. This step allows the color of point p to be determined accurately. The Marching Cubes algorithm is a classic surface rendering algorithm, also known as isosurface extraction. In essence, it treats a series of two-dimensional slice data as a three-dimensional data field, extracts from it the material at a given threshold value, and connects it topologically into triangular patches.
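Two core steps of Marching Cubes can be sketched in a few lines: classifying the eight corners of a grid cell against the threshold, and interpolating where the surface crosses a cell edge. This is a partial illustration only. The complete algorithm also uses a 256-entry lookup table that maps each case index to triangles, which is omitted here, and the function names and corner ordering are assumptions.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def cube_index(corner_values: Sequence[float], iso: float = 0.0) -> int:
    """8-bit case index: bit k is set when corner k lies below the iso level."""
    if len(corner_values) != 8:
        raise ValueError("a cell has exactly 8 corners")
    index = 0
    for k, value in enumerate(corner_values):
        if value < iso:
            index |= 1 << k
    return index


def edge_crossing(p_a: Vec3, p_b: Vec3, v_a: float, v_b: float,
                  iso: float = 0.0) -> Vec3:
    """Linearly interpolate the point on edge (p_a, p_b) where the field equals iso."""
    if v_a == v_b:
        t = 0.5  # degenerate edge: fall back to the midpoint
    else:
        t = (iso - v_a) / (v_b - v_a)
    return (p_a[0] + t * (p_b[0] - p_a[0]),
            p_a[1] + t * (p_b[1] - p_a[1]),
            p_a[2] + t * (p_b[2] - p_a[2]))


# A cell whose corner 0 is inside the surface (negative directed distance).
values = [-1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
case = cube_index(values)

# Surface crossing on the edge from corner 0 to corner 1.
crossing = edge_crossing((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), values[0], values[1])
```

Cells with case index 0 or 255 lie entirely on one side of the surface and produce no triangles.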

Referring to FIG. 3, which is a structural block diagram of a human body three-dimensional reconstruction model training device according to an embodiment of the present application, the device includes:

a first training module 100, configured to obtain a human model video, where the human model video includes different human bodies and different human body postures, generate a plurality of data pairs of human skeleton skin models by using a STAR model, and perform first training on a neural human body characterization network according to the data pairs to obtain first human body posture parameters;

the second training module 200 is configured to perform a second training on the neural human body representation network after the first training according to the human body model video to obtain an output of the neural human body representation network, where a part of weights of the neural human body representation network after the first training are frozen, and the part of weights are weights corresponding to human body posture parameters obtained by the first training of the neural human body representation network;

a third training module 300 for training an MLP network based on the output of the neural human body characterization network and the human body model video;

the human body three-dimensional reconstruction model generation module 400 is used for obtaining a human body three-dimensional reconstruction model after the neural human body characterization network and the MLP network are trained.

Specifically, for the first training module 100, the human body model video includes a plurality of human bodies and a plurality of human body postures corresponding to those human bodies. Training uses the human body model video frame by frame, so that the model can learn the motion continuity of different human body postures. The STAR model is an improved version of the SMPL model, which is a parameterized human body model proposed by the Max Planck Institute. This modeling approach can simulate the bulging and hollowing of a person's muscles during limb movement, so it avoids surface distortion of the human body during motion and accurately depicts the appearance of muscles as they stretch and contract. The STAR model has preset human skeleton data and skin data. The purpose of the SMPL model is to define the shape of the human body, such as fat, thin and tall, together with the posture of human body motion. The neural human body characterization network is trained for the first time on the human skeleton data and skin data preset in the STAR model, so that the reconstruction loss from the parameters to the human body directed distance field is minimized and the mapping from the parameters to the human skeleton is optimized.

Specifically, for the second training module 200, after the neural human body characterization network completes the first training, the human body posture parameters are obtained and the weights corresponding to those human body posture parameters are frozen. The other parameters in the neural human body representation network are then further trained according to the human body model video.

Specifically, for the third training module 300 and the human body three-dimensional reconstruction model generation module 400, the MLP network is trained according to the output of the neural human body characterization network and the human body model video to obtain a new view synthesis loss, a directed distance field regularization loss and a surface deformation field regularization loss. New view synthesis includes the synthesis of human body model views from different perspectives. Regularization is used to prevent overfitting and to strengthen the generalization capability of the model, so that the model is not effective only on the training set while performing poorly on the test set; it improves the performance on the test set of a model trained with the training data. After the neural human body characterization network and the MLP network are trained, the human body three-dimensional reconstruction model is obtained.

In one embodiment, the human body three-dimensional reconstruction model training apparatus further includes:

the first training sub-module is used for obtaining second human body posture parameters through an HMR method according to the human body model video; and training the neural human body representation network after the first training for the second time according to the second human body posture parameters to obtain the output of the neural human body representation network.

Specifically, for the first training sub-module, the human body model video is used in the training process, training is performed on each frame of image in the human body model video, and each frame of image is processed by the HMR method, so that the model can learn the continuity of motion across different human body postures. The HMR method adopts an end-to-end adversarial learning network for human body posture and shape, and learns the mapping from human body RGB images to 3D human bodies. It requires neither paired 2D-to-3D supervision information nor intermediate 2D keypoint detection, and goes directly from pixels to a mesh. The second human body posture parameters include posture parameters and body shape parameters. The second human body posture parameters are used to train the neural human body characterization network a second time after the first training, to obtain the output of the neural human body characterization network, where the output includes a directed distance field and human bones.

the second training sub-module is used for acquiring a bone-key point true value and a human body projection true value; generating a plurality of data pairs of human bone skin models by using a STAR model, and obtaining a bone-key point predicted value and a human body projection predicted value through a neural human body characterization network; calculating the loss between the bone-key point true value and the bone-key point predicted value to obtain a bone-key point loss; calculating the loss between the human body projection true value and the human body projection predicted value to obtain a human body projection loss; and optimizing parameters of the neural human body characterization network according to the bone-key point loss and the human body projection loss to obtain the first human body posture parameters.

Specifically, for the second training sub-module, a STAR model is used to generate a plurality of data pairs of human bone skin models, and the data pairs are processed by the neural human body characterization network to obtain a bone-key point predicted value and a human body projection predicted value. The loss between the bone-key point predicted value and the bone-key point true value, and the loss between the human body projection predicted value and the human body projection true value, are calculated separately. The neural human body characterization network is optimized according to the calculated losses, so that the reconstruction loss from the parameters to the human body directed distance field is minimized while the mapping from the parameters to the human bones is optimized, and the first human body posture parameters are obtained.
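A minimal sketch of how the two losses might be computed and combined follows. The present application does not specify the loss functions, so the mean squared error for key points, the mean absolute error for the projection, the equal weights and all function and variable names are assumptions made for illustration only.

```python
def keypoint_loss(pred, true):
    """Mean squared distance between predicted and true 3D key points."""
    total = 0.0
    for p, t in zip(pred, true):
        total += sum((a - b) ** 2 for a, b in zip(p, t))
    return total / len(pred)


def projection_loss(pred_mask, true_mask):
    """Mean absolute difference between two flattened projection masks."""
    return sum(abs(a - b) for a, b in zip(pred_mask, true_mask)) / len(pred_mask)


def total_loss(pred_kp, true_kp, pred_mask, true_mask, w_kp=1.0, w_proj=1.0):
    """Weighted sum of the key point loss and the projection loss (weights assumed)."""
    return (w_kp * keypoint_loss(pred_kp, true_kp)
            + w_proj * projection_loss(pred_mask, true_mask))


# Toy values: one of two key points is off by 1 along y, one of four mask pixels differs.
pred_kp = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
true_kp = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
pred_mask = [1.0, 0.0, 1.0, 1.0]
true_mask = [1.0, 0.0, 0.0, 1.0]
loss = total_loss(pred_kp, true_kp, pred_mask, true_mask)
```

With these toy values the key point term is 0.5 and the projection term is 0.25; an optimizer would adjust the network parameters to reduce their sum.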

the third training sub-module is used for carrying out new view synthesis loss processing, directed distance field regularization loss processing and surface deformation field regularization loss processing according to the output of the neural human body representation network and the human body model video; and optimizing the MLP network according to the new view synthesis loss, the directed distance field regularization loss and the surface deformation field regularization loss.

Specifically, for the third training sub-module, new view synthesis includes the synthesis of human body model views from different perspectives. The purpose of regularization is to keep the parameters from becoming too numerous or too large, so that the model does not become overly complex. An extra penalty term, i.e. a regularization term, is therefore added to the objective function. Adding a penalty term can be seen as placing restrictions on certain parameters in the loss function. Penalties can be divided into the L0 norm penalty, the L1 norm penalty (parameter sparsity penalty) and the L2 norm penalty (weight decay penalty). L0 norm penalty: to prevent overfitting, the weights w of the higher-order terms can be limited to 0, which is equivalent to converting a higher-order form into a lower-order one. The most intuitive way to achieve this is to limit the number of nonzero w, but this condition leads to an NP-hard problem that is very difficult to solve. L1 and L2 regularization are therefore commonly used in machine learning. L1 regularization is also known as Lasso, and L2 regularization is also known as Ridge. L1 norm: the sum of the absolute values of the elements in the weight vector w; L1 regularization can produce a sparse weight matrix, i.e. a sparse model, which can be used for feature selection. L2 norm: the square root of the sum of the squares of the elements in the weight vector w; L2 regularization can prevent the model from overfitting, and to some extent L1 regularization can also prevent overfitting. Optimizing the directed distance field regularization loss allows the space of the three-dimensional human body model to be represented more accurately. Optimizing the surface deformation field regularization loss ensures temporal consistency. The MLP network includes a local implicit surface regression network and a color texture network.
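The L1 and L2 norms described above can be sketched as follows. The helper `regularized_loss`, the coefficients `lam_l1` and `lam_l2`, and the use of the squared L2 norm as the penalty (the usual weight decay form) are assumptions for illustration; the present application does not give the exact form of its regularization terms.

```python
import math


def l1_norm(w):
    """Sum of the absolute values of the elements of w."""
    return sum(abs(x) for x in w)


def l2_norm(w):
    """Square root of the sum of the squares of the elements of w."""
    return math.sqrt(sum(x * x for x in w))


def regularized_loss(data_loss, w, lam_l1=0.0, lam_l2=0.0):
    """Data loss plus an L1 penalty and a squared-L2 penalty (assumed form)."""
    return data_loss + lam_l1 * l1_norm(w) + lam_l2 * l2_norm(w) ** 2


w = [3.0, -4.0]
penalized = regularized_loss(1.0, w, lam_l1=0.1, lam_l2=0.01)
```

For `w = [3.0, -4.0]` the L1 norm is 7 and the L2 norm is 5, so larger weights raise the objective and the optimizer is pushed toward smaller or sparser weights.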

Referring to fig. 4, a block diagram of a three-dimensional reconstruction apparatus for human body according to an embodiment of the present application, the apparatus includes:

an image acquisition module 500 for acquiring RGB images of a human body;

the HMR processing module 600 is configured to obtain human body posture parameters according to the human body RGB image by using an HMR method;

the human body three-dimensional reconstruction map generating module 700 is configured to input the human body posture parameter into a human body three-dimensional reconstruction model to obtain a human body three-dimensional reconstruction map, where the human body three-dimensional reconstruction model is a model trained by any one of the methods described above.

Specifically, for the image acquisition module 500, the HMR processing module 600 and the human body three-dimensional reconstruction map generation module 700, the human body RGB image is a color image containing a human body model. The data source in actual use is not the data pairs of human bone skin models preset in the STAR model, so the human body RGB image needs to be processed by the HMR method. The HMR method adopts an end-to-end adversarial learning network for human body posture and shape, and learns the mapping from human body RGB images to 3D human bodies. It requires neither paired 2D-to-3D supervision information nor intermediate 2D keypoint detection, and goes directly from pixels to a mesh. The human body posture parameters include posture parameters and body shape parameters. The human body posture parameters are input into the trained human body three-dimensional reconstruction model to obtain the human body three-dimensional reconstruction map. This improves the efficiency of accurately constructing a dynamic human body model and greatly reduces the cost of producing super-realistic virtual digital humans. The generated human body model can be used as a virtual customer service avatar when financial products are purchased or after-sales questions are answered, which significantly reduces the cost to enterprises of producing virtual human models.

In one embodiment, the three-dimensional reconstruction device for human body further includes:

an arbitrary point determining module for inputting the human body posture parameters into the neural human body characterization network in the human body three-dimensional reconstruction model to obtain an initial directed distance field S₀ and human bones; and setting any point p in the space, and obtaining, according to the initial directed distance field S₀ and the human bones, the p-point directed distance and the p-point color texture through the MLP network to form the human body three-dimensional reconstruction map, wherein the space is the space where the human body model is located, and the directed distance is one of the directed distances in the initial directed distance field S₀.

Specifically, for the arbitrary point determining module, the human body posture parameters are input into the neural human body characterization network to obtain an initial directed distance field S₀ and human bones. A distance field is a data set that records the distance to a specified location. A directed distance field (signed distance field) is a distance field that, in addition to recording the distance value, indicates direction by a sign. The directed distance is one of the directed distances in the initial directed distance field S₀. Any point p is set in a space, where the space is the space in which the human body model is located. The human bones obtained from the neural human body characterization network are used to determine the number K of human bones. According to the number K of human bones, a surface deformation field containing 2*K MLP networks is generated. An MLP (Multi-Layer Perceptron) is a feedforward artificial neural network that maps a set of input vectors to a set of output vectors. The MLP can be seen as a directed graph consisting of multiple layers of nodes, each layer fully connected to the next. Except for the input nodes, each node is a neuron (or processing unit) with a nonlinear activation function. A supervised learning method called the back propagation algorithm is often used to train the MLP. The MLP is a generalization of the perceptron and overcomes the perceptron's inability to recognize linearly inseparable data. K of these MLP networks are local implicit surface regression networks; for any point p in the space where the human body model is located, the position of p in a bone coordinate system is calculated to obtain the corresponding directed distance.
The bone coordinate system is defined as a right-handed coordinate system whose origin is at the bone center point, whose x axis is consistent with the x axis direction of the root node of the human body model in T-pose, and whose y axis points from the parent joint to the child joint. The other K MLP networks are color texture networks and are used for determining the color of any point according to the RGB image. By combining the local implicit surface regression network of the MLP network with the bone coordinate system, the p-point directed distance S₁(p) = min_j(Local_SDF_j) is obtained, where j is the bone serial number. Through this step, the position of point p can be accurately determined. At the same time, a measure of whether point p lies inside or outside the human body, given by sigmoid(SDF(p)), can be obtained and used for position determination and loss calculation when the subsequently generated human body model is applied to other models.
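The bone coordinate system and the minimum over local directed distances can be sketched as below. In the present application each local directed distance comes from a trained MLP; here an analytic capsule along the local y axis stands in for that network, and the capsule, its radius, the two-bone toy skeleton, the negative-inside sign convention and all function names are assumptions for illustration only.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)


def bone_frame(parent, child, root_x=(1.0, 0.0, 0.0)):
    """Right-handed frame at the bone center with y from parent to child.

    Assumes the bone direction is not parallel to root_x.
    """
    center = tuple((p + c) / 2 for p, c in zip(parent, child))
    y = normalize(sub(child, parent))
    z = normalize(cross(root_x, y))
    x = cross(y, z)                      # x, y, z form a right-handed basis
    return center, (x, y, z)


def to_local(p, frame):
    """Coordinates of world point p in the given bone frame."""
    center, axes = frame
    d = sub(p, center)
    return tuple(dot(d, axis) for axis in axes)


def capsule_sdf(q, half_len, radius):
    """Stand-in for a learned local SDF: capsule along the local y axis."""
    y = max(-half_len, min(half_len, q[1]))
    return math.sqrt(q[0] ** 2 + (q[1] - y) ** 2 + q[2] ** 2) - radius


def body_sdf(p, bones, radius=0.1):
    """S1(p): minimum over bones j of the local directed distance."""
    values = []
    for parent, child in bones:
        half_len = math.dist(parent, child) / 2
        q = to_local(p, bone_frame(parent, child))
        values.append(capsule_sdf(q, half_len, radius))
    return min(values)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


# Toy skeleton: two bones stacked along the world y axis.
bones = [((0.0, 0.0, 0.0), (0.0, 1.0, 0.0)),
         ((0.0, 1.0, 0.0), (0.0, 2.0, 0.0))]
inside = body_sdf((0.0, 0.5, 0.0), bones)    # point on the first bone axis
outside = body_sdf((1.0, 0.5, 0.0), bones)   # point away from both bones
```

Under the sign convention assumed here, `sigmoid(inside)` falls below 0.5 and `sigmoid(outside)` above it, so the sigmoid value acts as a soft inside/outside indicator.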

a color determining module for obtaining the human body surface by using the Marching Cubes algorithm according to the human body posture parameters, the directed distance field S₀ and the human bones; and obtaining, according to the human body surface, the color texture corresponding to point p on the RGB image, and obtaining the p-point color texture through the MLP network to form the human body three-dimensional reconstruction map.

Specifically, for the color determining module, starting from the initial directed distance S₀(p) of point p, the Marching Cubes algorithm is used, based on the human body posture parameters, the directed distance field S₀ and the human bones, to obtain the human body surface. The color texture I(p) corresponding to point p on the RGB image is then obtained, and the p-point color texture is obtained by combining the bone coordinate system with the color texture network of the MLP network.

In the expression for the p-point color texture, j is the bone serial number corresponding to point p, i is a bone serial number, LocalSDF_j is the p-point directed distance, and LocalSDF_i is the directed distance for bone serial number i. Through this step, the color of point p can be accurately determined. The Marching Cubes algorithm is a classical surface rendering algorithm, also called "isosurface extraction". In essence, a series of two-dimensional slice data is regarded as a three-dimensional data field, from which the material at a given threshold value is extracted and connected topologically into triangular patches.
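Two building blocks of isosurface extraction can be sketched without any library: finding the grid cells whose corner values straddle the threshold, and linearly interpolating where the surface crosses a cell edge. This is not a complete Marching Cubes implementation (the triangle lookup table is omitted), and the unit sphere field, the grid size and the function names are assumptions for illustration only.

```python
import math
from itertools import product


def sphere_sdf(p, radius=1.0):
    """Signed distance to a sphere centered at the origin (negative inside)."""
    return math.sqrt(sum(c * c for c in p)) - radius


def interpolate_crossing(p0, p1, v0, v1, iso=0.0):
    """Point on edge p0-p1 where a linearly interpolated field equals iso."""
    t = (iso - v0) / (v1 - v0)
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))


def surface_cells(sdf, lo, hi, n, iso=0.0):
    """Indices of grid cells whose 8 corner values straddle the iso level."""
    step = (hi - lo) / n
    cells = []
    for i, j, k in product(range(n), repeat=3):
        corners = [
            sdf((lo + (i + di) * step,
                 lo + (j + dj) * step,
                 lo + (k + dk) * step))
            for di, dj, dk in product((0, 1), repeat=3)
        ]
        if min(corners) < iso < max(corners):
            cells.append((i, j, k))       # the surface passes through this cell
    return cells


cells = surface_cells(sphere_sdf, -2.0, 2.0, 8)
p0, p1 = (0.5, 0.0, 0.0), (1.5, 0.0, 0.0)
vertex = interpolate_crossing(p0, p1, sphere_sdf(p0), sphere_sdf(p1))
```

A full implementation would look up, for each straddling cell, which edges are crossed and connect the interpolated vertices into triangular patches.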

Referring to fig. 5, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in fig. 5. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs and a database. The internal memory provides an environment for running the operating system and the computer programs stored in the non-volatile storage medium. The database of the computer device is used for storing operation data of the human body three-dimensional reconstruction model training method, operation data of the human body three-dimensional reconstruction model, and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the human body three-dimensional reconstruction model training method or the human body three-dimensional reconstruction method according to any of the above embodiments.

Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of a portion of the architecture in connection with the present application and is not intended to limit the computer device to which the present application is applied.

An embodiment of the present application further provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the human body three-dimensional reconstruction model training method or the human body three-dimensional reconstruction method according to any of the above embodiments. It is understood that the computer readable storage medium in this embodiment may be a volatile readable storage medium or a nonvolatile readable storage medium.

Those skilled in the art will appreciate that all or part of the above described methods may be implemented by a computer program stored on a non-transitory computer readable storage medium, which, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, apparatus, article or method that comprises the element.

The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the claims, and all equivalent structures or equivalent processes using the descriptions and drawings of the present application, or direct or indirect application in other related technical fields are included in the scope of the claims of the present application.

Claims

1. A human body three-dimensional reconstruction model training method, which is characterized by comprising the following steps:

2. The method for training a three-dimensional reconstruction model of a human body according to claim 1, wherein the training the neural human body characterization network for the second time after the first training according to the human body model video to obtain the output of the neural human body characterization network comprises:

obtaining a second human body posture parameter through an HMR method according to the human body model video;

and training the neural human body representation network after the first training for the second time according to the second human body posture parameters to obtain the output of the neural human body representation network.

3. The method for training a three-dimensional reconstruction model of a human body according to claim 1, wherein the generating a plurality of data pairs of a human body bone skin model by using a STAR model, performing a first training on a neural human body characterization network according to the data pairs, and obtaining first human body posture parameters comprises:

acquiring a bone-key point true value and a human projection true value;

generating a plurality of data pairs of human bone skin models by using a STAR model, and obtaining a bone-key point predicted value and a human projection predicted value through a neural human body characterization network;

calculating the loss of the bone-key point true value and the bone-key point predicted value to obtain a bone-key point loss;

calculating the loss of the human projection true value and the human projection predicted value to obtain human projection loss;

and optimizing parameters of a neural human body representation network according to the bone-key point loss and the human body projection loss to obtain the first human body posture parameters.

4. The method of claim 1, wherein training the MLP network based on the output of the neural human body characterization network and the human body model video comprises:

performing new view synthesis loss processing, directed distance field regularization loss processing and surface deformation field regularization loss processing according to the output of the neural human body characterization network and the human body model video;

and optimizing the MLP network according to the new view synthesis loss, the directed distance field regularization loss and the surface deformation field regularization loss.

5. A method for three-dimensional reconstruction of a human body, comprising:

acquiring a human RGB image;

inputting the human body posture parameters into a human body three-dimensional reconstruction model to obtain a human body three-dimensional reconstruction map, wherein the human body three-dimensional reconstruction model is a model trained based on the method of any one of claims 1-4.

6. The method for three-dimensional reconstruction of a human body according to claim 5, wherein the inputting the human body posture parameter into the three-dimensional reconstruction model of a human body to obtain a three-dimensional reconstruction map of a human body comprises:

inputting the human body posture parameters into the neural human body characterization network in the human body three-dimensional reconstruction model to obtain an initial directed distance field S₀ and human bones;

setting any point p in the space, and obtaining, according to the initial directed distance field S₀ and the human bones, the p-point directed distance and the p-point color texture through the MLP network to form the human body three-dimensional reconstruction map, wherein the space is the space where the human body model is located, and the directed distance is one of the directed distances in the initial directed distance field S₀.

7. The method according to claim 6, wherein the setting any point p in the space, and obtaining, according to the initial directed distance field S₀ and the human bones, the p-point color texture through the MLP network to form the human body three-dimensional reconstruction map comprises:

obtaining the human body surface by using the Marching Cubes algorithm according to the human body posture parameters, the directed distance field S₀ and the human bones;

and obtaining, according to the human body surface, the color texture corresponding to point p on the RGB image, and obtaining the p-point color texture through the MLP network to form the human body three-dimensional reconstruction map.

8. A human three-dimensional reconstruction model training apparatus, the apparatus comprising:

9. A human three-dimensional reconstruction apparatus, the apparatus comprising:

the image acquisition module is used for acquiring human RGB images;

the human body three-dimensional reconstruction map generating module is used for inputting the human body posture parameters into a human body three-dimensional reconstruction model to obtain a human body three-dimensional reconstruction map, wherein the human body three-dimensional reconstruction model is a model trained based on the method of any one of claims 1-4.

10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.

11. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.