CN116863078A - Three-dimensional human body model reconstruction method, three-dimensional human body model reconstruction device, electronic equipment and readable medium - Google Patents


Info

Publication number
CN116863078A
CN116863078A · Application CN202310870355.5A
Authority
CN
China
Prior art keywords: human body, dimensional, image, normal, texture
Legal status: Pending
Application number: CN202310870355.5A
Other languages: Chinese (zh)
Inventors: 苏明兰, 张超颖, 刘巧俏
Current Assignee: China Telecom Technology Innovation Center; China Telecom Corp Ltd
Original Assignee: China Telecom Technology Innovation Center; China Telecom Corp Ltd
Application filed by China Telecom Technology Innovation Center and China Telecom Corp Ltd
Priority: CN202310870355.5A
Publication: CN116863078A


Classifications

    • G06T17/00 Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods
    • G06T15/04 Texture mapping
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • Y02T10/40 Engine management systems

Abstract

The present disclosure provides a three-dimensional human body model reconstruction method, apparatus, electronic device, and readable medium. The method includes: acquiring an original RGB two-dimensional image containing a human body image, together with the silhouette of the human body image; inputting the original RGB two-dimensional image and the silhouette into an SMPL prediction network to construct an SMPL human body parameter model; rendering front and back images of the human body on the basis of the silhouette to obtain a normal map and texture features of the unclothed human body; and reconstructing the three-dimensional human body model from the human body normal map and the texture features. The disclosed embodiments reduce the dependence on training data, improve the accuracy and reliability of normal map generation, and thereby improve the surface texture reconstruction quality of the three-dimensional human body model.

Description

Three-dimensional human body model reconstruction method, three-dimensional human body model reconstruction device, electronic equipment and readable medium
Technical Field
The disclosure relates to the field of information technology, and in particular relates to a three-dimensional human model reconstruction method, a three-dimensional human model reconstruction device, electronic equipment and a readable medium.
Background
Currently, most existing reconstruction algorithms perform three-dimensional human body reconstruction from a single image. In such methods, the observable region of the input image is limited, and information about occluded regions or the back of the person is missing, so the recovery accuracy of invisible areas in the reconstruction result is low.
In the related art, a normal map is first predicted from the original image, and the normal map and the original image are then used together as the input of a three-dimensional human body reconstruction network.
Although this three-dimensional human body reconstruction scheme can improve the reconstruction accuracy of invisible areas to a certain extent, at least the following technical problems remain:
(1) The normal map is predicted from the RGB image by a Pix2Pix network, which requires a large amount of training data to generalize across different postures, and the accuracy of the trained normal map prediction is limited;
(2) Feeding the normal map together with the original image into the three-dimensional human body reconstruction network may lead to poor detail accuracy of the reconstructed three-dimensional human body model.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of the present disclosure is to provide a three-dimensional human body model reconstruction method, apparatus, electronic device, and readable medium that overcome, at least to some extent, the poor reliability of three-dimensional human body model reconstruction caused by the limitations and drawbacks of the related art.
According to a first aspect of embodiments of the present disclosure, there is provided a three-dimensional human body model reconstruction method, including: acquiring an original RGB two-dimensional image containing a human body image, together with the silhouette of the human body image; inputting the original RGB two-dimensional image and the silhouette into an SMPL prediction network to construct an SMPL human body parameter model; rendering front and back images of the human body on the basis of the silhouette to obtain a normal map and texture features of the unclothed human body; and reconstructing a three-dimensional human body model from the human body normal map and the texture features.
In an exemplary embodiment of the present disclosure, further comprising:
determining voxel features, normal features and texture features of the human normal map;
fusing the voxel features, the normal features and the texture features to a reconstructed three-dimensional human model based on an attention mechanism.
In one exemplary embodiment of the present disclosure, determining voxel features, normal features, and texture features of the human normal map comprises:
voxelizing the human body normal map to determine the voxel features;
extracting a texture map and a normal map of the clothed human body from the original RGB two-dimensional image;
and performing feature extraction on the texture map and the clothed-human-body normal map through a feature extraction network to obtain the normal features and the texture features.
In one exemplary embodiment of the present disclosure, performing feature extraction on the texture map and the clothed-human-body normal map through a feature extraction network to obtain the normal features and the texture features includes:
projecting the human normal map to a two-dimensional plane based on a bilinear interpolation algorithm;
extracting the normal feature and the texture feature according to the projection result of the two-dimensional plane;
and determining the corresponding relation between any sampling point on the human body normal map and the normal feature and the corresponding relation between the sampling point and the texture feature.
In one exemplary embodiment of the present disclosure, fusing the voxel features, the normal features, and the texture features into the reconstructed three-dimensional human body model based on an attention mechanism includes:
normalizing the voxel features, the normal features and the texture features based on the attention mechanism, and assigning preset weight values;
decoding the human normal map into an SMPL mesh to determine vertex coordinates, β, and θ;
substituting the vertex coordinates, the beta, the theta, the preset weight value and the corresponding feature vector into a body fitting loss function, wherein the feature vector comprises the voxel feature, the normal feature and the texture feature;
and adjusting the weight value of the feature vector based on the magnitude relation between the calculation result of the body fitting loss function and the preset loss function result.
In an exemplary embodiment of the present disclosure, adjusting the weight value of the feature vector based on the magnitude relation between the calculation result of the body fitting loss function and the preset loss function result includes:
judging whether the calculation result of the body fitting loss function is larger than the preset loss function result or not;
and if the calculation result of the body fitting loss function is larger than the preset loss function result, adjusting the weight value until the calculation result is smaller than or equal to the preset loss function result.
In an exemplary embodiment of the present disclosure, the body fitting loss function takes the form:

L_B = \frac{1}{n_s}\sum_{j=1}^{n_s}\left|F(c(v_j)) - 0.5\right| + \lambda_R\left(\lVert \beta-\beta_{init} \rVert^2 + \eta\lVert \theta-\theta_{init} \rVert^2\right)

where v_j denotes the j-th vertex, n_s the total number of vertices, c(v_j) the feature vector corresponding to the vertex, F(c(v_j)) the occupancy value of the vertex, \lambda_R a constant, \beta_{init} and \theta_{init} the initial parameters, and \eta a preset coefficient.
According to a second aspect of embodiments of the present disclosure, there is provided a three-dimensional human model reconstruction apparatus, comprising:
an acquisition module configured to acquire an original RGB two-dimensional image including a human body image and a silhouette of the human body image;
a building module configured to input the original RGB two-dimensional image and the silhouette to an SMPL prediction network to build an SMPL human parameter model;
the rendering module is used for rendering the front image and the back image of the human body on the basis of the silhouette to obtain a normal image and texture characteristics of the human body without dressing;
and the reconstruction module is used for reconstructing a three-dimensional human body model according to the human body normal map and the texture features.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a memory; and a processor coupled to the memory, the processor configured to perform the method of any of the above based on instructions stored in the memory.
According to a fourth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a program which, when executed by a processor, implements a three-dimensional human model reconstruction method as set forth in any one of the above.
According to the embodiments of the present disclosure, an original RGB two-dimensional image containing a human body image and the silhouette of that image are acquired and input into an SMPL prediction network, which can perform prediction training on the human body image region and the background based on the silhouette to construct an SMPL human body parameter model. Front and back images of the human body are then rendered from the SMPL human body parameter model on the basis of the silhouette to obtain a normal map and texture features of the unclothed human body, and the three-dimensional human body model is finally reconstructed from the human body normal map and the texture features. This reduces the dependence on training data, improves the accuracy and reliability of normal map generation, and thereby improves the surface texture reconstruction quality of the three-dimensional human body model.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without creative effort.
FIG. 1 shows a schematic diagram of a three-dimensional human body model reconstruction scheme in the related art;
FIG. 2 is a flowchart of a three-dimensional human body model reconstruction method in an exemplary embodiment of the present disclosure;
FIG. 3 is a flowchart of another three-dimensional human body model reconstruction method in an exemplary embodiment of the present disclosure;
FIG. 4 is a flowchart of another three-dimensional human body model reconstruction method in an exemplary embodiment of the present disclosure;
FIG. 5 is a flowchart of another three-dimensional human body model reconstruction method in an exemplary embodiment of the present disclosure;
FIG. 6 is a flowchart of another three-dimensional human body model reconstruction method in an exemplary embodiment of the present disclosure;
FIG. 7 is a flowchart of another three-dimensional human body model reconstruction method in an exemplary embodiment of the present disclosure;
FIG. 8 is a schematic architecture diagram of a three-dimensional human body model reconstruction scheme in an exemplary embodiment of the present disclosure;
FIG. 9 is a schematic architecture diagram of another three-dimensional human body model reconstruction scheme in an exemplary embodiment of the present disclosure;
FIG. 10 is a schematic architecture diagram of another three-dimensional human body model reconstruction scheme in an exemplary embodiment of the present disclosure;
FIG. 11 is a block diagram of a three-dimensional human body model reconstruction apparatus in an exemplary embodiment of the present disclosure;
FIG. 12 is a block diagram of an electronic device in an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are only schematic illustrations of the present disclosure, in which the same reference numerals denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
Fig. 1 shows a schematic diagram of a reconstruction scheme of a three-dimensional human body model in the related art.
As shown in fig. 1, the reconstruction scheme of the three-dimensional human body model in the related art includes the following steps:
step S102, an image is input.
Step S104, the image is continuously input to the Pix2Pix network.
Step S106, the Pix2Pix network generates a normal map.
Step S108, three-dimensional human body parameters are determined through SMPL parameter prediction.
Step S110, outputting the SMPL parameters corresponding to the three-dimensional human body parameters.
And step S112, extracting three-dimensional features.
And step S114, carrying out two-dimensional feature extraction according to the normal map and the input image.
Step S116, obtaining sampling points of the three-dimensional human body model.
In step S118, the features of the sampling points are determined.
Step S120, an implicit function for reconstructing the three-dimensional manikin is acquired.
Step S122, predicting the sampling point position.
And step S124, reconstructing the three-dimensional human body model based on the prediction result.
In the process of obtaining the three-dimensional human body model in the above manner, the normal map is predicted from the RGB image by a Pix2Pix network, which requires a large amount of training data to generalize across different postures; the normal map prediction accuracy is therefore limited, and the texture detail accuracy of the three-dimensional human body model is poor.
In view of these shortcomings of the related art, the present disclosure proposes a new three-dimensional human body model reconstruction scheme. Exemplary embodiments of the present disclosure are described in detail below with reference to fig. 2 to 12.
Fig. 2 is a flowchart of a three-dimensional human body model reconstruction method in an exemplary embodiment of the present disclosure.
Referring to fig. 2, the three-dimensional human body model reconstruction method may include:
step S202, acquiring an original RGB two-dimensional image including a human body image and a silhouette of the human body image.
Step S204, inputting the original RGB two-dimensional image and the silhouette into an SMPL prediction network to construct an SMPL human parameter model.
Step S206, rendering front and back images of the human body on the basis of the silhouette to obtain a normal map and texture features of the unclothed human body.
And step S208, reconstructing a three-dimensional human body model according to the human body normal map and the texture features.
According to the embodiments of the present disclosure, an original RGB two-dimensional image containing a human body image and the silhouette of that image are acquired and input into an SMPL prediction network, which can perform prediction training on the human body image region and the background based on the silhouette to construct an SMPL human body parameter model. Front and back images of the human body are then rendered from the SMPL human body parameter model on the basis of the silhouette to obtain a normal map and texture features of the unclothed human body, and the three-dimensional human body model is finally reconstructed from the human body normal map and the texture features. This reduces the dependence on training data, improves the accuracy and reliability of normal map generation, and thereby improves the surface texture reconstruction quality of the three-dimensional human body model.
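The four steps S202 to S208 form a linear pipeline. The sketch below traces the data flow with toy stand-ins; every function name and placeholder value is illustrative only, not the patent's actual network or renderer.

```python
# Sketch of the pipeline in steps S202-S208. All names are hypothetical.

def acquire_inputs():
    # Step S202: original RGB image plus a binary silhouette of the person.
    rgb = [[(120, 80, 60)] * 4 for _ in range(4)]   # 4x4 toy RGB image
    silhouette = [[1, 1, 0, 0] for _ in range(4)]   # 1 = person, 0 = background
    return rgb, silhouette

def predict_smpl(rgb, silhouette):
    # Step S204: an SMPL prediction network would output shape (beta) and
    # pose (theta) parameters; constants stand in for the network here.
    beta = [0.0] * 10
    theta = [0.0] * 75
    return beta, theta

def render_front_back(beta, theta, silhouette):
    # Step S206: rendering of the unclothed body, masked by the shared
    # silhouette, yields normal maps and texture features (toy values).
    normal_map = [[(0.0, 0.0, 1.0) if m else None for m in row] for row in silhouette]
    texture = [[(0.5, 0.5, 0.5) if m else None for m in row] for row in silhouette]
    return normal_map, texture

def reconstruct(normal_map, texture):
    # Step S208: reconstruction from the normal map and texture features.
    return {"normal": normal_map, "texture": texture}

rgb, sil = acquire_inputs()
beta, theta = predict_smpl(rgb, sil)
nm, tex = render_front_back(beta, theta, sil)
model = reconstruct(nm, tex)
```

Each stand-in returns exactly the data the next step consumes, mirroring the dependency order of the four claims.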
The RGB two-dimensional image is a two-dimensional image of three primary colors of red, green and blue.
In addition, the SMPL model is a parameterized human body model that supports arbitrary human body modeling and animation driving, where beta denotes 10 parameters describing individual body shape such as height, girth (fat or thin), and head-to-body ratio, and theta denotes 75 parameters describing the overall motion pose of the human body and the relative angles of its 24 joints.
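As a quick sanity check of the parameter dimensions described above: 10 shape values for beta, and 75 pose values read here as a 3-value global orientation plus 3 rotation values for each of the 24 joints (the 3-values-per-rotation layout is an assumption of this sketch).

```python
# SMPL parameter dimensions as stated in the text above.
NUM_SHAPE = 10                             # beta: individual body shape
NUM_JOINTS = 24
GLOBAL_POSE = 3                            # root orientation (assumed axis-angle)
NUM_POSE = GLOBAL_POSE + NUM_JOINTS * 3    # theta: 3 + 24 * 3 = 75

beta = [0.0] * NUM_SHAPE
theta = [0.0] * NUM_POSE
```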
Next, each step of the three-dimensional human model reconstruction method will be described in detail.
In one exemplary embodiment of the present disclosure, as shown in fig. 3, the three-dimensional human body model reconstruction method further includes:
step S302, determining voxel features, normal features and texture features of the human normal map.
Step S304, fusing the voxel features, the normal features and the texture features into the reconstructed three-dimensional human body model based on an attention mechanism.
In one exemplary embodiment of the present disclosure, as shown in fig. 4, determining voxel features, normal features, and texture features of the human normal map includes:
step S402, voxelizing the human normal map to determine the voxel characteristic.
Step S404, extracting texture map and normal map of wearing human body for the original RGB two-dimensional image.
And step S406, carrying out feature extraction on the texture map and the normal map of the wearing human body through a feature extraction network so as to obtain the normal features and the texture features.
In one exemplary embodiment of the present disclosure, the voxel characteristic is a vector characteristic of each sampling point.
In one exemplary embodiment of the present disclosure, as shown in fig. 5, performing feature extraction on the texture map and the clothed-human-body normal map through a feature extraction network to obtain the normal features and the texture features includes:
and step S502, projecting the human normal map to a two-dimensional plane based on a bilinear interpolation algorithm.
In one exemplary embodiment of the present disclosure, the loss incurred when projecting the human body normal map onto the two-dimensional plane is compensated by the bilinear interpolation algorithm.
And step S504, extracting the normal feature and the texture feature according to the projection result of the two-dimensional plane.
Step S506, determining a correspondence between any sampling point on the human normal chart and the normal feature, and a correspondence between the sampling point and the texture feature.
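Steps S502 to S506 can be illustrated with a minimal bilinear sampler: a sampling point projected onto the two-dimensional plane generally lands between pixel centers, and its feature value is interpolated from the four neighboring pixels. This is a generic sketch of bilinear interpolation, not the patent's feature extraction network.

```python
def bilinear_sample(feature_map, x, y):
    """Sample a 2D scalar feature map at continuous coordinates (x, y)
    by bilinear interpolation over the four surrounding pixels."""
    h, w = len(feature_map), len(feature_map[0])
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)   # clamp at the border
    fx, fy = x - x0, y - y0                           # fractional offsets
    top = feature_map[y0][x0] * (1 - fx) + feature_map[y0][x1] * fx
    bot = feature_map[y1][x0] * (1 - fx) + feature_map[y1][x1] * fx
    return top * (1 - fy) + bot * fy
```

For a sampling point projected exactly between four pixels, the result is their average, which is the behavior the projection-loss compensation above relies on.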
In one exemplary embodiment of the present disclosure, as shown in fig. 6, fusing the voxel features, the normal features, and the texture features into the reconstructed three-dimensional human body model based on an attention mechanism includes:
step S602, performing normalization processing on the voxel feature, the normal feature and the texture feature based on the attention-based mechanism, and assigning a preset weight value.
In one exemplary embodiment of the present disclosure, the initial preset weight value may be set to 1/3.
In step S604, the human normal map is decoded into a SMPL mesh to determine vertex coordinates, β, and θ.
Step S606, substituting the vertex coordinates, the β, the θ, the preset weight values, and corresponding feature vectors into a body fitting loss function, where the feature vectors include the voxel feature, the normal feature, and the texture feature.
Step S608, adjusting the weight value of the feature vector based on the magnitude relation between the calculation result of the body fitting loss function and the preset loss function result.
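Step S602 (normalization plus preset weights) can be sketched as a simple weighted fusion. The L2 normalization and the multiplicative combination are assumptions of this illustration; the equal initial weight of 1/3 is the value given in this embodiment.

```python
def fuse_features(voxel_f, normal_f, texture_f, weights=(1/3, 1/3, 1/3)):
    """Weighted fusion of three per-point feature vectors: each feature
    is L2-normalized, then combined with attention weights that start at
    the equal value 1/3 and are later adjusted via the fitting loss."""
    def l2norm(v):
        s = sum(x * x for x in v) ** 0.5 or 1.0   # guard against zero vectors
        return [x / s for x in v]
    feats = [l2norm(voxel_f), l2norm(normal_f), l2norm(texture_f)]
    dim = len(voxel_f)
    return [sum(w * f[i] for w, f in zip(weights, feats)) for i in range(dim)]
```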
In an exemplary embodiment of the present disclosure, as shown in fig. 7, adjusting the weight value of the feature vector based on the magnitude relation between the calculation result of the body fitting loss function and the preset loss function result includes:
step S702 is executed to determine whether the calculation result of the body fitting loss function is greater than the preset loss function result.
Step S704, if it is determined that the calculation result of the body fitting loss function is greater than the preset loss function result, adjusting the weight value until the calculation result is less than or equal to the preset loss function result.
In an exemplary embodiment of the present disclosure, the body fitting loss function takes the form:

L_B = \frac{1}{n_s}\sum_{j=1}^{n_s}\left|F(c(v_j)) - 0.5\right| + \lambda_R\left(\lVert \beta-\beta_{init} \rVert^2 + \eta\lVert \theta-\theta_{init} \rVert^2\right)

where v_j denotes the j-th vertex, n_s the total number of vertices, c(v_j) the feature vector corresponding to the vertex, F(c(v_j)) the occupancy value of the vertex, \lambda_R a constant, \beta_{init} and \theta_{init} the initial parameters, and \eta a preset coefficient.
In an exemplary embodiment of the present disclosure, as shown in fig. 8, the progressive reconstruction framework 800 guides the generation of normal maps with three-dimensional prior knowledge of the human body, so that the predicted front and back normal maps conform to human body characteristics. Since the visible front area and the invisible back area of the human body share one silhouette image, the coarse unclothed normal map is further deformed and refined based on the input RGB image and the two-dimensional contour of the human body, which improves the generation quality of both the normal map and the texture map. The process is as follows:
1) Generating the corresponding SMPL parameters from the original RGB image with an existing SMPL parameter prediction network 900, and constructing the SMPL human body parameter model;
2) Performing front and back rendering of the SMPL human body parameter model with a silhouette-based differentiable renderer to obtain the normal map of the unclothed human body;
3) Feeding the unclothed-body normal map, the original RGB image, and the silhouette corresponding to the RGB image into a normal map prediction network to generate the front and back normal maps and the front and back texture maps of the clothed human body.
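The rendering step 2) writes, for each pixel covered by the body, the surface normal of the underlying mesh face into the normal map. As a minimal stand-in for the differentiable renderer (whose internals are not specified here), the per-face normal computation looks like:

```python
def face_normal(v0, v1, v2):
    """Unit normal of a triangle (v0, v1, v2): the value a renderer would
    write into the normal map for pixels covered by this face. A minimal
    stand-in, not the patent's silhouette-based differentiable renderer."""
    ax, ay, az = (v1[i] - v0[i] for i in range(3))   # edge v0 -> v1
    bx, by, bz = (v2[i] - v0[i] for i in range(3))   # edge v0 -> v2
    nx = ay * bz - az * by                           # cross product
    ny = az * bx - ax * bz
    nz = ax * by - ay * bx
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    return (nx / norm, ny / norm, nz / norm)
```

A triangle lying in the xy-plane with counter-clockwise winding yields the camera-facing normal (0, 0, 1), as expected for a front-view render.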
In an exemplary embodiment of the present disclosure, as shown in fig. 10, the implicit reconstruction prediction framework 1000 determines, from the features of a sampling point, whether that point lies on the three-dimensional target surface, and performs implicit prediction by fusing voxel features, normal features and texture features, thereby further refining the details of the surface texture reconstruction. The process is as follows:
1) Voxelizing the SMPL human body parameter model, extracting voxel features with a 3D convolutional network, and obtaining the voxel features of any sampling point P by multi-scale trilinear interpolation;
2) Feeding the original RGB image, used as the front texture map, and the normal map of the front of the clothed human body into feature extraction networks to obtain the corresponding texture features and normal features, and processing the back texture map and back normal map in the same way;
3) Projecting any sampling point P onto the image plane, and determining whether it lies on the front or the back of the human body;
4) If the point lies on the front of the human body, performing bilinear interpolation at the projected pixel coordinate pi(P) to extract the pixel texture features and normal features; if it lies on the back, extracting the two features from the back maps in the same way;
5) Normalizing the texture features, normal features and voxel features, assigning them the same initial attention weight to obtain the fused features, and performing implicit prediction to obtain an initial three-dimensional human body model;
6) Decoding the initial three-dimensional human body model into an SMPL mesh to obtain the corresponding vertex, beta and theta parameters;
7) Substituting the predicted vertices, beta, theta and their corresponding features into the body fitting loss function:

L_B = \frac{1}{n_s}\sum_{j=1}^{n_s}\left|F(c(v_j)) - 0.5\right| + \lambda_R\left(\lVert \beta-\beta_{init} \rVert^2 + \eta\lVert \theta-\theta_{init} \rVert^2\right)

where v_j denotes the j-th vertex, n_s the total number of vertices, c(v_j) the feature vector corresponding to the point, F(c(v_j)) the occupancy value of the point, \lambda_R a constant obtained experimentally, \beta_{init} and \theta_{init} the initial parameters generated by the SMPL parameter prediction network, and \eta a preset coefficient;
8) If L_B is greater than the threshold value, updating the attention weights and repeating steps 5), 6) and 7).
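Steps 5) to 8) can be sketched as a small fitting loop. The loss formula appears only as an image in the original, so the function below assumes a plausible form consistent with the variables described: vertices of the decoded SMPL mesh should receive occupancy 0.5 (i.e., lie on the implicit surface), with a regularizer pulling beta and theta toward the initial network prediction. The toy occupancies, the weight-update rule, and all constants are placeholders.

```python
def body_fitting_loss(occupancies, beta, theta, beta_init, theta_init,
                      lambda_r=1e-3, eta=0.1):
    """Assumed form of L_B: surface term (occupancy of each mesh vertex
    should be 0.5) plus regularization toward the initial SMPL parameters."""
    n_s = len(occupancies)
    surface_term = sum(abs(o - 0.5) for o in occupancies) / n_s
    reg = sum((b - bi) ** 2 for b, bi in zip(beta, beta_init))
    reg += eta * sum((t - ti) ** 2 for t, ti in zip(theta, theta_init))
    return surface_term + lambda_r * reg

# Steps 5)-8): re-fuse, re-predict, and update the attention weights until
# L_B drops to the threshold. Occupancies here are a toy function of the
# weights; a real run would re-run the implicit prediction each iteration.
weights = [1 / 3, 1 / 3, 1 / 3]
threshold = 0.05
loss = float("inf")
for _ in range(10):
    occ = [0.5 + 0.2 * weights[0]] * 4    # placeholder occupancy predictions
    loss = body_fitting_loss(occ, [0.0] * 10, [0.0] * 75, [0.0] * 10, [0.0] * 75)
    if loss <= threshold:
        break
    weights = [w * 0.9 for w in weights]  # placeholder weight-update rule
```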
Corresponding to the above method embodiments, the present disclosure also provides a three-dimensional human body model reconstruction apparatus, which may be used to perform the above method embodiments.
Fig. 11 is a block diagram of a three-dimensional human body model reconstruction apparatus in an exemplary embodiment of the present disclosure.
Referring to fig. 11, the three-dimensional human body model reconstruction apparatus 1100 may include:
an acquisition module 1102 is arranged to acquire an original RGB two-dimensional image comprising a human body image and a silhouette of the human body image.
A construction module 1104 is arranged to input the original RGB two-dimensional image and the silhouette into the SMPL prediction network to construct an SMPL human parameter model.
A rendering module 1106 configured to render front and back images of the human body on the basis of the silhouette to obtain a normal map and texture features of the unclothed human body.
A reconstruction module 1108 configured to reconstruct a three-dimensional human body model from the human body normal map and the texture features.
In one exemplary embodiment of the present disclosure, the three-dimensional human body model reconstruction apparatus 1100 is further configured to:
determining voxel features, normal features and texture features of the human normal map;
fusing the voxel features, the normal features and the texture features to a reconstructed three-dimensional human model based on an attention mechanism.
In one exemplary embodiment of the present disclosure, the three-dimensional human body model reconstruction apparatus 1100 is further configured to:
voxelizing the human body normal map to determine the voxel features;
extracting a texture map and a normal map of the clothed human body from the original RGB two-dimensional image;
and performing feature extraction on the texture map and the clothed human body normal map through a feature extraction network to obtain the normal features and the texture features.
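The first of the steps above, voxelization, can be sketched as scattering 3-D surface points into a binary occupancy grid. The grid resolution and the normalized coordinate bounds are illustrative assumptions, not values from the patent:

```python
import numpy as np

def voxelize(points, resolution=32, bounds=(-1.0, 1.0)):
    """Minimal voxelization sketch: map each 3-D point into a cell of a
    resolution^3 occupancy grid spanning the given bounds."""
    lo, hi = bounds
    pts = np.asarray(points, dtype=float)
    idx = ((pts - lo) / (hi - lo) * resolution).astype(int)
    idx = np.clip(idx, 0, resolution - 1)      # keep boundary points in range
    grid = np.zeros((resolution,) * 3, dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid
```

The resulting grid is what a 3-D feature extractor would consume to produce the voxel features.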
In one exemplary embodiment of the present disclosure, the three-dimensional human body model reconstruction apparatus 1100 is further configured to:
projecting the human body normal map onto a two-dimensional plane based on a bilinear interpolation algorithm;
extracting the normal feature and the texture feature from the projection result on the two-dimensional plane;
and determining the correspondence between any sampling point on the human body normal map and the normal feature, and the correspondence between the sampling point and the texture feature.
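Projecting a 3-D sampling point onto the image plane generally lands between pixel centers, which is where the bilinear interpolation mentioned above comes in. A self-contained sketch of such a lookup (a hypothetical helper, not the patent's implementation):

```python
import numpy as np

def bilinear_sample(feature_map, xy):
    """Bilinear interpolation sketch: sample an (H, W, C) feature map at
    continuous pixel coordinates (x, y) by blending the four nearest
    pixels; coordinates are assumed to lie inside the map."""
    h, w = feature_map.shape[:2]
    x, y = xy
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = (1 - dx) * feature_map[y0, x0] + dx * feature_map[y0, x1]
    bot = (1 - dx) * feature_map[y1, x0] + dx * feature_map[y1, x1]
    return (1 - dy) * top + dy * bot
```

Calling this on the normal-feature map and the texture-feature map at each projected sampling point yields the per-point correspondences described above.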
In one exemplary embodiment of the present disclosure, the three-dimensional human body model reconstruction apparatus 1100 is further configured to:
normalizing the voxel features, the normal features and the texture features based on the attention mechanism, and assigning preset weight values;
decoding the human normal map into an SMPL mesh to determine vertex coordinates, β, and θ;
substituting the vertex coordinates, β, θ, the preset weight values and the corresponding feature vectors into the body fitting loss function, wherein the feature vectors include the voxel features, the normal features and the texture features;
and adjusting the weight value of the feature vector based on the magnitude relation between the calculation result of the body fitting loss function and the preset loss function result.
In one exemplary embodiment of the present disclosure, the three-dimensional human body model reconstruction apparatus 1100 is further configured to:
determining whether the calculation result of the body fitting loss function is greater than the preset loss function result;
and if the calculation result of the body fitting loss function is larger than the preset loss function result, adjusting the weight value until the calculation result is smaller than or equal to the preset loss function result.
In an exemplary embodiment of the present disclosure, the expression of the body fit loss function includes:
wherein v_j represents the j-th vertex, n_s represents the total number of vertices, c(v_j) represents the feature vector corresponding to the vertex, F(c(v_j)) represents the occupancy value of the vertex, λ_R is a constant, β_init and θ_init are initial parameters, and η represents a preset coefficient.
Since the functions of the apparatus 1100 are described in detail in the corresponding method embodiments, the details are not repeated herein.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
Those skilled in the art will appreciate that the various aspects of the present application may be implemented as a system, a method, or a program product. Accordingly, aspects of the present application may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit," a "module," or a "system."
An electronic device 1200 according to this embodiment of the present application is described below with reference to fig. 12. The electronic device 1200 shown in fig. 12 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present application.
As shown in fig. 12, the electronic device 1200 is in the form of a general purpose computing device. Components of electronic device 1200 may include, but are not limited to: the at least one processing unit 1210, the at least one memory unit 1220, and a bus 1230 connecting the different system components (including the memory unit 1220 and the processing unit 1210).
Wherein the storage unit stores program code that is executable by the processing unit 1210 such that the processing unit 1210 performs steps according to various exemplary embodiments of the present application described in the above-described "exemplary methods" section of the present specification. For example, the processing unit 1210 may perform the methods as shown in the embodiments of the present disclosure.
The storage unit 1220 may include a readable medium in the form of a volatile storage unit, such as a Random Access Memory (RAM) 12201 and/or a cache memory 12202, and may further include a Read Only Memory (ROM) 12203.
Storage unit 1220 may also include a program/utility 12204 having a set (at least one) of program modules 12205, such program modules 12205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 1230 may represent one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures.
The electronic device 1200 may also communicate with one or more external devices 1240 (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a user to interact with the electronic device 1200, and/or any devices (e.g., routers, modems, etc.) that enable the electronic device 1200 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 1250. Also, the electronic device 1200 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the internet through the network adapter 1260. As shown, the network adapter 1260 communicates with other modules of the electronic device 1200 over bus 1230. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 1200, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium having stored thereon a program product capable of implementing the method described above in the present specification is also provided. In some possible embodiments, the various aspects of the application may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the application as described in the "exemplary methods" section of this specification, when said program product is run on the terminal device.
The program product for implementing the above-described method according to an embodiment of the present application may employ a portable compact disc read-only memory (CD-ROM) and include program code, and may be run on a terminal device such as a personal computer. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
Furthermore, the above-described drawings are only schematic illustrations of processes included in the method according to the exemplary embodiment of the present application, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any adaptations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. A three-dimensional human body model reconstruction method, comprising:
acquiring an original RGB two-dimensional image comprising a human body image and a silhouette of the human body image;
inputting the original RGB two-dimensional image and the silhouette into an SMPL prediction network to construct an SMPL human parameter model;
rendering a front image and a back image of the human body on the basis of the silhouette to obtain a normal map of the unclothed human body and texture features;
and reconstructing a three-dimensional human body model according to the human body normal map and the texture features.
2. The three-dimensional human body model reconstruction method according to claim 1, further comprising:
determining voxel features, normal features and texture features of the human normal map;
fusing the voxel features, the normal features and the texture features into the reconstructed three-dimensional human body model based on an attention mechanism.
3. The three-dimensional human body model reconstruction method according to claim 2, wherein determining the voxel features, the normal features, and the texture features of the human body normal map comprises:
voxelizing the human body normal map to determine the voxel features;
extracting a texture map and a normal map of the clothed human body from the original RGB two-dimensional image;
and performing feature extraction on the texture map and the clothed human body normal map through a feature extraction network to obtain the normal features and the texture features.
4. The three-dimensional human body model reconstruction method according to claim 3, wherein performing feature extraction on the texture map and the clothed human body normal map through a feature extraction network to obtain the normal features and the texture features comprises:
projecting the human body normal map onto a two-dimensional plane based on a bilinear interpolation algorithm;
extracting the normal feature and the texture feature from the projection result on the two-dimensional plane;
and determining the correspondence between any sampling point on the human body normal map and the normal feature, and the correspondence between the sampling point and the texture feature.
5. The three-dimensional human body model reconstruction method according to any one of claims 1-4, wherein fusing the voxel features, the normal features, and the texture features into the reconstructed three-dimensional human body model based on an attention mechanism comprises:
normalizing the voxel features, the normal features and the texture features based on the attention mechanism, and assigning preset weight values;
decoding the human normal map into an SMPL mesh to determine vertex coordinates, β, and θ;
substituting the vertex coordinates, β, θ, the preset weight values and the corresponding feature vectors into the body fitting loss function, wherein the feature vectors include the voxel features, the normal features and the texture features;
and adjusting the weight value of the feature vector based on the magnitude relation between the calculation result of the body fitting loss function and the preset loss function result.
6. The three-dimensional human body model reconstruction method according to claim 5, wherein adjusting the weight value of the feature vector based on the magnitude relation between the calculation result of the body fitting loss function and the preset loss function result comprises:
determining whether the calculation result of the body fitting loss function is greater than the preset loss function result;
and if the calculation result of the body fitting loss function is larger than the preset loss function result, adjusting the weight value until the calculation result is smaller than or equal to the preset loss function result.
7. The three-dimensional human body model reconstruction method according to claim 5, wherein the expression of the body fitting loss function comprises:
wherein v_j represents the j-th vertex, n_s represents the total number of vertices, c(v_j) represents the feature vector corresponding to the vertex, F(c(v_j)) represents the occupancy value of the vertex, λ_R is a constant, β_init and θ_init are initial parameters, and η represents a preset coefficient.
8. A three-dimensional human body model reconstruction apparatus, comprising:
an acquisition module configured to acquire an original RGB two-dimensional image including a human body image and a silhouette of the human body image;
a construction module configured to input the original RGB two-dimensional image and the silhouette into an SMPL prediction network to construct an SMPL human body parameter model;
a rendering module configured to render a front image and a back image of the human body on the basis of the silhouette to obtain a normal map of the unclothed human body and texture features;
and a reconstruction module configured to reconstruct a three-dimensional human body model from the human body normal map and the texture features.
9. An electronic device, comprising:
a memory; and
a processor coupled to the memory, the processor being configured to perform the three-dimensional human body model reconstruction method according to any one of claims 1-7 based on instructions stored in the memory.
10. A computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the three-dimensional human body model reconstruction method according to any one of claims 1-7.
CN202310870355.5A 2023-07-14 2023-07-14 Three-dimensional human body model reconstruction method, three-dimensional human body model reconstruction device, electronic equipment and readable medium Pending CN116863078A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310870355.5A CN116863078A (en) 2023-07-14 2023-07-14 Three-dimensional human body model reconstruction method, three-dimensional human body model reconstruction device, electronic equipment and readable medium

Publications (1)

Publication Number Publication Date
CN116863078A true CN116863078A (en) 2023-10-10

Family

ID=88230143

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310870355.5A Pending CN116863078A (en) 2023-07-14 2023-07-14 Three-dimensional human body model reconstruction method, three-dimensional human body model reconstruction device, electronic equipment and readable medium

Country Status (1)

Country Link
CN (1) CN116863078A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117522951A (en) * 2023-12-29 2024-02-06 深圳市朗诚科技股份有限公司 Fish monitoring method, device, equipment and storage medium
CN117522951B (en) * 2023-12-29 2024-04-09 深圳市朗诚科技股份有限公司 Fish monitoring method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109214980B (en) Three-dimensional attitude estimation method, three-dimensional attitude estimation device, three-dimensional attitude estimation equipment and computer storage medium
CN109191554B (en) Super-resolution image reconstruction method, device, terminal and storage medium
CN110378947B (en) 3D model reconstruction method and device and electronic equipment
CN112785674A (en) Texture map generation method, rendering method, device, equipment and storage medium
CN113313832B (en) Semantic generation method and device of three-dimensional model, storage medium and electronic equipment
US11741678B2 (en) Virtual object construction method, apparatus and storage medium
CN110390327A (en) Foreground extracting method, device, computer equipment and storage medium
CN116863078A (en) Three-dimensional human body model reconstruction method, three-dimensional human body model reconstruction device, electronic equipment and readable medium
CN115346018A (en) Three-dimensional model reconstruction method and device and electronic equipment
CN114863002A (en) Virtual image generation method and device, terminal equipment and computer readable medium
CN113140034A (en) Room layout-based panoramic new view generation method, device, equipment and medium
CN114708374A (en) Virtual image generation method and device, electronic equipment and storage medium
CN115346000A (en) Three-dimensional human body reconstruction method and device, computer readable medium and electronic equipment
CN115272565A (en) Head three-dimensional model reconstruction method and electronic equipment
CN114792355A (en) Virtual image generation method and device, electronic equipment and storage medium
CN113409340A (en) Semantic segmentation model training method, semantic segmentation device and electronic equipment
CN112085842B (en) Depth value determining method and device, electronic equipment and storage medium
CN115272575B (en) Image generation method and device, storage medium and electronic equipment
CN115439610B (en) Training method and training device for model, electronic equipment and readable storage medium
EP4086853A2 (en) Method and apparatus for generating object model, electronic device and storage medium
US11741671B2 (en) Three-dimensional scene recreation using depth fusion
CN115311403A (en) Deep learning network training method, virtual image generation method and device
CN114494574A (en) Deep learning monocular three-dimensional reconstruction method and system based on multi-loss function constraint
CN112861940A (en) Binocular disparity estimation method, model training method and related equipment
CN116012666B (en) Image generation, model training and information reconstruction methods and devices and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination