CN114913287B - Three-dimensional human body model reconstruction method and system - Google Patents
- Publication number
- CN114913287B (application CN202210362589.4A)
- Authority
- CN
- China
- Prior art keywords
- dimensional
- input image
- target user
- parameters
- parameter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
Abstract
The application provides a three-dimensional human body model reconstruction method and system, relating to the technical field of computer vision. The method comprises the following steps: acquiring an input image, wherein the input image comprises limbs of a target user; generating a feature map pyramid from the input image through a feature pyramid network; extracting features of the input image to obtain point cloud coordinates of different scales, and generating three-dimensional parameters of the target user according to the point cloud coordinates of different scales; processing regions of interest sampled from the feature map pyramid by one or more neural networks to identify a plurality of surface objects in the input image; and reconstructing a three-dimensional human body model of the target user based on the three-dimensional parameters of the target user and the parameters of the surface objects, wherein the three-dimensional human body model matches the limbs of the target user. The embodiment of the application can improve the accuracy of three-dimensional human body model reconstruction.
Description
Technical Field
The application relates to the technical field of computer vision, in particular to a three-dimensional human model reconstruction method and system.
Background
Detecting surface areas of the human body in an image and estimating the parameters of the three-dimensional (3D) surfaces provide important information for understanding the structure and context of the scene, and can also be applied to three-dimensional human body model reconstruction, for example in virtual reality scenes, human-body special effects, human detection, and the like.
However, the task of detecting a surface or estimating surface parameters is fundamentally challenging, as estimating them from a single 2D image is an ill-posed problem. First, 3D surface reconstruction requires rich priors. Second, surface structures often lack texture, so the task requires a global understanding of image semantics rather than local texture analysis. Recent approaches to this task make use of Convolutional Neural Networks (CNNs), formulating it as a surface segmentation task.
As for human body detection, the traditional three-dimensional human body model reconstruction approach matches and aligns the irregular point cloud of a depth map with a regular three-dimensional human body mesh model. However, the matching result depends heavily on the quality of the depth map; if the resolution of the depth map is low, the reconstructed three-dimensional human body model is inaccurate.
Disclosure of Invention
The embodiment of the application aims to provide a three-dimensional human body model reconstruction method and system. The specific technical scheme is as follows:
in a first aspect of an embodiment of the present application, a three-dimensional human model reconstruction method is provided, including:
acquiring an input image, wherein the input image comprises limbs of a target user;
generating a feature map pyramid according to the input image through a feature pyramid network;
extracting features of the input image to obtain point cloud coordinates with different scales, and generating three-dimensional parameters of the target user according to the point cloud coordinates with different scales;
processing the region of interest sampled from the feature map pyramid by one or more neural networks to identify a plurality of surface objects in the input image; wherein the one or more neural networks are configured to estimate parameters of each of the plurality of surface objects, the parameters including a bounding box parameter, a surface normal parameter, and a segmentation mask;
reconstructing a three-dimensional human body model of the target user based on the three-dimensional parameters of the target user and the parameters of the surface objects; wherein the three-dimensional human body model matches the limbs of the target user.
Optionally, the method further comprises:
the feature map pyramid includes a plurality of levels associated with a number of different scales.
Optionally, the bounding box parameter defines a bounding box indicating a position of the surface object, wherein the surface normal parameter indicates a relative orientation of the surface object.
Optionally, the method further comprises:
a surface offset parameter is calculated for each surface object of the plurality of surface objects based on the normal vector and the depth map estimated for that surface object, wherein a decoder network included in the one or more neural networks is configured to process a feature map of the feature map pyramid to estimate the depth map.
Optionally, the method further comprises:
the region of interest is processed to generate, for each of a plurality of anchor boxes associated with the region of interest, a four-element vector representing a location of a vertex of the bounding box and a size of the bounding box, in order to estimate the bounding box parameters.
Optionally, the method further comprises:
surface normal parameters are estimated by processing the region of interest to generate an anchor identifier and a three-element vector representing a residual vector corresponding to the anchor normal associated with the anchor identifier.
Optionally, the method further comprises:
determining camera parameters corresponding to the image based on the point cloud coordinates of different scales;
and projecting the three-dimensional human body model into a two-dimensional image according to camera parameters corresponding to the image.
In yet another aspect of an embodiment of the present application, there is provided a three-dimensional human body model reconstruction system including:
the input image acquisition module is used for acquiring an input image, wherein the input image comprises limbs of a target user;
the feature map pyramid generation module is used for generating a feature map pyramid according to the input image through a feature pyramid network;
the three-dimensional parameter generation module is used for extracting the characteristics of the input image to obtain point cloud coordinates with different scales, and generating three-dimensional parameters of the target user according to the point cloud coordinates with different scales;
a surface parameter generation module for processing the region of interest sampled from the feature map pyramid by one or more neural networks to identify a plurality of surface objects in the input image; wherein the one or more neural networks are configured to estimate parameters of each of the plurality of surface objects, the parameters including a bounding box parameter, a surface normal parameter, and a segmentation mask;
the reconstruction module is used for reconstructing a three-dimensional human body model of the target user based on the three-dimensional parameters of the target user and the parameters of the surface objects; the three-dimensional human body model matches the limbs of the target user.
In still another aspect of the embodiments of the present application, there is provided an electronic device, including: a processor for performing the method of any one of the above embodiments.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an application environment of a three-dimensional human body model reconstruction method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a three-dimensional human body model reconstruction method according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a three-dimensional human body model reconstruction system according to an embodiment of the present application;
fig. 4 is an internal structure diagram of a computer device provided by an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The three-dimensional human model reconstruction method provided by the application can be applied to an application environment shown in figure 1. The terminal 102 obtains an input image including the limb of the target user. The terminal sends the image to a server 104, and the server 104 generates a feature map pyramid according to the input image through a feature pyramid network; and extracting features of the input image to obtain point cloud coordinates with different scales, and generating three-dimensional parameters of the target user according to the point cloud coordinates with different scales. Server 104 also processes the region of interest sampled from the feature map pyramid through one or more neural networks to identify a plurality of surface objects in the input image; wherein the one or more neural networks are configured to estimate parameters of each of the plurality of surface objects, the parameters including a bounding box parameter, a surface normal parameter, and a segmentation mask.
The server 104 then returns the three-dimensional parameters of the target user and the parameters of the surface objects to the terminal 102. The terminal 102 reconstructs a three-dimensional human body model of the target user based on the three-dimensional parameters of the target user and the parameters of the surface objects; the three-dimensional human body model matches the limbs of the target user.
The server 104 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, basic cloud computing services such as big data and artificial intelligence platforms. The terminal 102 may be, but is not limited to, a smart phone, tablet, notebook, desktop, smart box, smart watch, etc. The terminal 102 and the server 104 may be directly or indirectly connected through wired or wireless communication, and the present application is not limited herein.
In some embodiments, the three-dimensional human body model reconstruction method comprises the following steps:
acquiring an input image, wherein the input image comprises limbs of a target user;
generating a feature map pyramid according to the input image through a feature pyramid network;
extracting features of the input image to obtain point cloud coordinates with different scales, and generating three-dimensional parameters of the target user according to the point cloud coordinates with different scales;
processing the region of interest sampled from the feature map pyramid by one or more neural networks to identify a plurality of surface objects in the input image; wherein the one or more neural networks are configured to estimate parameters of each of the plurality of surface objects, the parameters including a bounding box parameter, a surface normal parameter, and a segmentation mask;
reconstructing a three-dimensional human body model of the target user based on the three-dimensional parameters of the target user and the parameters of the surface objects; wherein the three-dimensional human body model matches the limbs of the target user.
As can be seen from the above, in the embodiment of the present application, a trained neural network model is used to perform graph convolution processing on the human body image, so that the three-dimensional human body parameters corresponding to the target human body in the image and the parameters of the human body surface objects are obtained rapidly and accurately. The three-dimensional human body model is then reconstructed accurately from the three-dimensional human body parameters and the human body surface parameters, which improves both the accuracy and the efficiency of three-dimensional human body model reconstruction.
Fig. 2 is a schematic flow chart of a three-dimensional human body model reconstruction method according to an embodiment of the present application, and as shown in fig. 2, the three-dimensional human body model reconstruction method includes the following steps:
step 201, an input image is acquired.
Wherein the input image may comprise limbs of the target user. The target user refers to a user for whom a three-dimensional human body model needs to be reconstructed; for example, the target user may be a person in the input image. The input image may be a single image frame (e.g., RGB, monochrome, etc.) having one or more channels. The input image may also be a color image, which has higher resolution and richer detail than a depth map, so that the three-dimensional human body model can be reconstructed more finely.
Optionally, the terminal may obtain the corresponding input image by directly shooting the target user, or may obtain the input image corresponding to the target user from a local or network or from a third device, and the method for obtaining the input image is not limited in the embodiments of the present application.
And 202, generating a feature map pyramid according to the input image through a feature pyramid network.
In some embodiments, a feature pyramid network is implemented to process input images and generate feature map pyramids. As used herein, a feature map pyramid refers to a plurality of feature maps having different scales relative to the scale of an input image. The feature map pyramid may include a plurality of levels, each level including one or more feature maps of a particular scale that increase in scale (e.g., resolution in pixel space) as one moves from the top to the bottom of the pyramid. In some embodiments, the feature pyramid network is based on a residual network that extracts image features. The feature map is then up-sampled and combined with intermediate feature maps from the residual network to generate a feature map pyramid.
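The top-down merge described above (upsample the coarser map, then combine it with the intermediate feature map from the residual backbone) can be illustrated with a minimal sketch. The nearest-neighbour upsampling and element-wise addition here are common choices, assumed for illustration; the patent does not fix the exact operations or names.

```python
def upsample2x(fmap):
    # nearest-neighbour 2x upsampling of a 2-D feature map (list of lists)
    out = []
    for row in fmap:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out

def merge_level(top_down, lateral):
    # one pyramid step: upsample the coarser (top-down) map and add the
    # lateral (intermediate backbone) feature map element-wise
    up = upsample2x(top_down)
    return [[a + b for a, b in zip(r_up, r_lat)]
            for r_up, r_lat in zip(up, lateral)]
```

Repeating `merge_level` from the coarsest level downward yields feature maps at successively finer scales, i.e., the levels of the feature map pyramid.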
And 203, extracting features of the input image to obtain point cloud coordinates with different scales, and generating three-dimensional parameters of the target user according to the point cloud coordinates with different scales.
Specifically, the terminal may perform feature extraction on the input image of the target user to obtain the features corresponding to the input image, yielding a feature map. The terminal then performs graph convolution processing on the feature map to obtain point cloud features of different scales, and applies a convolution with 3 output channels to the point cloud features of different scales to obtain point cloud coordinates of different scales, where the point cloud coordinates are three-dimensional coordinates.
Specifically, the terminal can perform downsampling and fully connected processing on the point cloud coordinates of different scales to obtain the three-dimensional parameters of the target user, and can reconstruct a three-dimensional model of the whole human body based on the Skinned Multi-Person Linear (SMPL) model parameters.
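A toy sketch of a linear body model in the spirit of SMPL may help fix ideas: each template vertex is displaced by a linear combination of per-vertex shape blend-shape directions weighted by the shape parameters (betas). All names, shapes, and the absence of pose blending are simplifications for illustration, not the actual SMPL formulation.

```python
def linear_body_model(template, shape_dirs, betas):
    """Toy SMPL-like shape model: vertices = template + shape_dirs . betas.

    template:   list of [x, y, z] vertices
    shape_dirs: per-vertex list of displacement directions, one per beta
    betas:      shape parameters (illustrative)
    """
    verts = []
    for v, dirs in zip(template, shape_dirs):
        # dirs[k][c]: displacement of coordinate c for shape parameter k
        disp = [sum(d[c] * b for d, b in zip(dirs, betas)) for c in range(3)]
        verts.append([v[c] + disp[c] for c in range(3)])
    return verts
```

With all betas at zero the template mesh is returned unchanged, which is the expected behaviour of a linear blend-shape model.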
Step 204, processing the region of interest sampled from the feature map pyramid by one or more neural networks to identify a plurality of surface objects in the input image.
Wherein the one or more neural networks are configured to estimate parameters of each of the plurality of surface objects, the parameters including a bounding box parameter, a surface normal parameter, and a segmentation mask.
A sliding window is applied to each feature map in the feature map pyramid to sample regions of interest. A region of interest refers to a region of the feature map corresponding to a particular subset of the input image. While the sliding window has a fixed size as applied to a particular feature map of a given scale in the feature map pyramid, the region of interest is associated with a variable-size region of the input image. For example, the sliding window may be defined as a 7×7 pixel region of a feature map that is downsampled relative to the input image. Depending on the relative scales of the feature map and the input image in pixel space, the 7×7 pixel region in the feature map may correspond to, for example, a 14×14, 28×28, or 56×56 pixel region (or larger) in the input image.
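The scale mapping above follows directly from the pyramid's downsampling factor. Assuming each pyramid level halves the resolution (an assumption consistent with the 14/28/56 example, though the patent does not mandate powers of two):

```python
def roi_extent_in_input(window_px, level):
    """Input-image extent (in pixels) covered by a fixed-size sliding
    window sampled at a pyramid level downsampled by 2**level relative
    to the input image (level 0 = input resolution)."""
    return window_px * (2 ** level)
```

A 7×7 window at levels 1, 2, and 3 covers 14×14, 28×28, and 56×56 input pixels respectively, matching the example in the text.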
Step 205, reconstructing a three-dimensional human model of the target user based on the three-dimensional parameters of the target user and the parameters of the surface object.
Wherein the three-dimensional human body model can be matched with the limbs of the target user. That is, the generated three-dimensional human body model is that of the target user.
In particular, the three-dimensional parameters may include three-dimensional pose parameters and three-dimensional body shape parameters. The three-dimensional posture parameter is the joint point coordinate of the target object, and the three-dimensional body type parameter is the characteristic point coordinate of the surface of the target object. After the terminal obtains the three-dimensional posture parameters and the three-dimensional body shape parameters corresponding to the target object in the image, a model is built in a three-dimensional space according to the three-dimensional coordinates corresponding to the three-dimensional posture parameters and the three-dimensional coordinates corresponding to the three-dimensional body shape parameters, and therefore a three-dimensional human body model is obtained.
Optionally, the embodiments of the present application may further include the following.
Optionally, the bounding box parameter defines a bounding box indicating a position of the surface object, wherein the surface normal parameter indicates a relative orientation of the surface object.
Optionally, the method of the embodiment of the present application further includes:
a surface offset parameter is calculated for each surface object of the plurality of surface objects based on the normal vector and the depth map estimated for that surface object, wherein a decoder network included in the one or more neural networks is configured to process a feature map of the feature map pyramid to estimate the depth map.
Optionally, the method of the embodiment of the present application further includes:
the region of interest is processed to generate, for each of a plurality of anchor boxes associated with the region of interest, a four-element vector representing a location of a vertex of the bounding box and a size of the bounding box, in order to estimate the bounding box parameters.
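A four-element regression against an anchor box is commonly parameterised R-CNN style; the patent does not specify the exact encoding, so the following decoder is an assumed illustration of one such scheme, not the claimed implementation:

```python
import math

def decode_box(anchor, delta):
    """Decode a four-element vector (dx, dy, dw, dh) against an anchor
    box (x, y, w, h): positional offsets are relative to the anchor
    size, and the size terms are log-scale factors. This particular
    parameterisation is assumed for illustration."""
    ax, ay, aw, ah = anchor
    dx, dy, dw, dh = delta
    return (ax + dx * aw,        # decoded vertex x
            ay + dy * ah,        # decoded vertex y
            aw * math.exp(dw),   # decoded width
            ah * math.exp(dh))   # decoded height
```

A zero regression vector reproduces the anchor box itself, which is the usual sanity check for such an encoding.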
Optionally, the method of the embodiment of the present application further includes:
surface normal parameters are estimated by processing the region of interest to generate an anchor identifier and a three-element vector representing a residual vector corresponding to the anchor normal associated with the anchor identifier.
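The anchor-plus-residual scheme above can be sketched as follows; the exact combination rule (add the residual to the selected anchor normal, then re-normalise) is an assumption for illustration:

```python
import math

def decode_normal(anchor_normals, anchor_id, residual):
    """Recover a unit surface normal from an anchor identifier and a
    three-element residual vector: add the residual to the anchor
    normal selected by the identifier, then re-normalise (assumed)."""
    base = anchor_normals[anchor_id]
    n = [b + r for b, r in zip(base, residual)]
    norm = math.sqrt(sum(c * c for c in n))
    return [c / norm for c in n]
```

With a zero residual the decoded normal is exactly the anchor normal; a non-zero residual tilts it while the re-normalisation keeps it unit-length.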
Optionally, the method of the embodiment of the present application further includes:
determining camera parameters corresponding to the image based on the point cloud coordinates of different scales;
and projecting the three-dimensional human body model into a two-dimensional image according to camera parameters corresponding to the image.
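Projecting the three-dimensional model into a two-dimensional image with the recovered camera parameters is a standard pinhole projection. A minimal sketch, assuming intrinsics reduce to a focal length `f` and principal point `(cx, cy)` (the patent does not restrict the camera model to this form):

```python
def project_points(points, f, cx, cy):
    """Pinhole projection of 3-D model vertices into the 2-D image
    plane: (x, y, z) -> (f*x/z + cx, f*y/z + cy)."""
    return [(f * x / z + cx, f * y / z + cy) for x, y, z in points]
```

For example, a vertex on the optical axis projects to the principal point, and points farther off-axis project proportionally farther from it.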
In addition to the surface parameters discussed above, another useful parameter for various applications is a surface offset value. As used herein, a surface offset value refers to the closest distance from the origin of the coordinate system to the surface object, where the origin corresponds to the position of the camera (or virtual camera) that captured the input image. The surface offset value may be calculated according to the following formula:

$$o = \frac{\sum_i m_i \, n^\top \left( z_i K^{-1} x_i \right)}{\sum_i m_i},$$

where $K$ is the 3×3 intrinsic camera matrix, $x_i$ is the $i$-th pixel in homogeneous coordinates, $m_i$ is an indicator that is 1 if the $i$-th pixel belongs to the surface object and 0 otherwise, $z_i$ is the predicted depth value of the pixel, and $n$ is the normal vector.

That is, for each pixel in the image overlapping the surface object, a distance is calculated as the dot product of the pixel's un-projected 3D position and the normal vector; the surface offset is then the average of these distances over the number of pixels overlapping the surface object.
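The surface offset computation described above can be sketched numerically. The helper below assumes a simple pinhole intrinsic matrix (focal length `f`, principal point `(cx, cy)`) so that $K^{-1}$ has a closed form; function names are illustrative.

```python
def unproject(u, v, z, f, cx, cy):
    # back-project pixel (u, v) with depth z through K^{-1} for a
    # pinhole camera with focal length f and principal point (cx, cy)
    return [z * (u - cx) / f, z * (v - cy) / f, z]

def surface_offset(pixels, mask, depths, normal, f, cx, cy):
    """Average of n . X_i over the pixels whose indicator m_i is 1,
    where X_i is the un-projected 3-D position of pixel i."""
    total, count = 0.0, 0
    for (u, v), m, z in zip(pixels, mask, depths):
        if m:
            X = unproject(u, v, z, f, cx, cy)
            total += sum(nc * xc for nc, xc in zip(normal, X))
            count += 1
    return total / count
```

As a sanity check, for a fronto-parallel plane at depth 2 with normal (0, 0, 1), every masked pixel contributes a dot product of 2, so the offset is 2.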
In one embodiment, un-projection, coordinate-system transformation, and projection may be implemented as follows. Given the set of intrinsic parameters $K$ of the camera, a pixel $(u_n, v_n)$ in the neighboring view can be un-projected according to the following equation:

$$X_n = K^{-1}\, h(u_n, v_n)\, \hat{D}_n(u_n, v_n),$$

where $\hat{D}_n$ is the ground-truth depth map of the neighboring view, and $h$ converts the pixel coordinates $(u_n, v_n)$ to a homogeneous representation. The un-projected point is then transformed into the current view of the warp model using rotation $R$ and translation $t$ according to the following equation:

$$X_c = R X_n + t.$$

Finally, the warped pixel coordinates $(u_w, v_w)$ are obtained by projection:

$$(u_w, v_w) = h^{-1}(K X_c),$$

where $h^{-1}$ converts the homogeneous coordinates back into a two-dimensional representation. Depth values are then sampled from the depth map of the current view at the warped pixel coordinates $(u_w, v_w)$, and a view-consistency loss is calculated between the sampled depth values and the ground-truth depth map of the neighboring view.
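The un-project, transform, and re-project chain described above can be sketched per pixel. As in the earlier sketch, a simple pinhole $K$ (focal length `f`, principal point `(cx, cy)`) is assumed so that $K^{-1}$ is closed-form; the rotation is given as a 3×3 nested list.

```python
def warp_pixel(u_n, v_n, depth, f, cx, cy, R, t):
    """Warp a neighbour-view pixel into the current view:
    un-project, apply X_c = R X_n + t, then re-project."""
    # un-project: X_n = K^{-1} h(u_n, v_n) * depth
    X_n = [depth * (u_n - cx) / f, depth * (v_n - cy) / f, depth]
    # rigid transform into the current view
    X_c = [sum(R[i][j] * X_n[j] for j in range(3)) + t[i]
           for i in range(3)]
    # project and de-homogenise: (u_w, v_w) = h^{-1}(K X_c)
    return (f * X_c[0] / X_c[2] + cx, f * X_c[1] / X_c[2] + cy)
```

With an identity rotation and zero translation the pixel warps onto itself, regardless of its depth, which is the expected degenerate case.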
In order to implement the above method embodiments, the embodiment of the present application further provides a three-dimensional human body model reconstruction system. Fig. 3 shows a schematic structural diagram of the three-dimensional human body model reconstruction system provided by the embodiment of the present application, where the system includes:
an input image acquisition module 301, configured to acquire an input image, where the input image includes a limb of a target user;
a feature map pyramid generation module 302, configured to generate a feature map pyramid from the input image through a feature pyramid network;
the three-dimensional parameter generating module 303 is configured to perform feature extraction on the input image to obtain point cloud coordinates of different scales, and generate three-dimensional parameters of the target user according to the point cloud coordinates of different scales;
a surface parameter generation module 304 for processing the region of interest sampled from the feature map pyramid through one or more neural networks to identify a plurality of surface objects in the input image; wherein the one or more neural networks are configured to estimate parameters of each of the plurality of surface objects, the parameters including a bounding box parameter, a surface normal parameter, and a segmentation mask;
a reconstruction module 305, configured to reconstruct a three-dimensional human body model of the target user based on the three-dimensional parameters of the target user and the parameters of the surface objects; the three-dimensional human body model matches the limbs of the target user.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working processes of the modules/units/sub-units/components in the above-described apparatus may refer to corresponding processes in the foregoing method embodiments, which are not described herein again.
In some embodiments, a computer device is provided, which may be a server, whose internal structure may be as shown in fig. 4. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and the computer programs in the non-volatile storage medium. The database of the computer device is used for storing training data of the reconstruction network and reconstruction data of the three-dimensional human body model. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a training method of the reconstruction network and the three-dimensional human body model reconstruction method.
In some embodiments, a computer device is provided, which may be a terminal, whose internal structure may be as shown in fig. 4. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless mode can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program, when executed by the processor, implements a training method of the reconstruction network and the three-dimensional human body model reconstruction method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device of the computer device may be a touch layer covering the display screen, keys, a trackball or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad or mouse.
It will be appreciated by persons skilled in the art that the architecture shown in fig. 4 is merely a block diagram of some of the architecture relevant to the present inventive arrangements and is not limiting as to the computer device to which the present inventive arrangements are applicable, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In some embodiments, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In some embodiments, a computer readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration, and not limitation, RAM can take various forms such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.
In summary, the three-dimensional human body model reconstruction method and system provided by the application can acquire an input image, wherein the input image comprises limbs of a target user; generate a feature map pyramid from the input image through a feature pyramid network; extract features of the input image to obtain point cloud coordinates of different scales, and generate three-dimensional parameters of the target user according to the point cloud coordinates of different scales; process regions of interest sampled from the feature map pyramid by one or more neural networks to identify a plurality of surface objects in the input image, wherein the one or more neural networks are configured to estimate parameters of each of the plurality of surface objects, the parameters including a bounding box parameter, a surface normal parameter, and a segmentation mask; and reconstruct a three-dimensional human body model of the target user based on the three-dimensional parameters of the target user and the parameters of the surface objects, the three-dimensional human body model matching the limbs of the target user.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some communication interfaces, devices or units, and may be in electrical, mechanical or other form.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments provided in the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It should be noted that like reference numerals and letters denote like items in the figures; thus, once an item is defined in one figure, it need not be defined or explained again in subsequent figures. Furthermore, the terms "first," "second," "third," etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above examples are only specific embodiments of the present application, intended to illustrate rather than limit its technical solution, and the scope of protection of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art will appreciate that anyone familiar with the art may still modify the technical solutions described in the foregoing embodiments, or readily conceive of changes, or make equivalent substitutions of some of the technical features, within the technical scope disclosed in the present application; such modifications, changes, or substitutions do not depart from the spirit and scope of the corresponding technical solutions and are intended to be encompassed within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A three-dimensional human body model reconstruction method, comprising:
acquiring an input image, wherein the input image comprises limbs of a target user;
generating a feature map pyramid according to the input image through a feature pyramid network;
extracting features of the input image to obtain point cloud coordinates with different scales, and generating three-dimensional parameters of the target user according to the point cloud coordinates with different scales;
processing the region of interest sampled from the feature map pyramid by one or more neural networks to identify a plurality of surface objects in the input image; wherein the one or more neural networks are configured to estimate parameters and a surface offset value for each of the plurality of surface objects, the parameters of the surface object including a bounding box parameter, a surface normal parameter, and a segmentation mask, the surface offset value being calculated according to:
d = ( Σᵢ mᵢ · nᵀ( zᵢ K⁻¹ xᵢ ) ) / Σᵢ mᵢ
wherein K is a 3×3 intrinsic camera matrix, xᵢ is the i-th pixel, mᵢ is an indicator that is 1 if the i-th pixel belongs to the surface object and 0 otherwise, zᵢ is the predicted depth value of the pixel, and n is the normal vector;
reconstructing a three-dimensional human body model of the target user based on the three-dimensional parameters of the target user, the surface offset value, and the parameters of the surface object; wherein the three-dimensional human body model is matched to the limbs of the target user.
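Under the reading that the surface offset is the mask-averaged dot product of the surface normal with the back-projected pixels (the formula itself is not legible in the source, so this interpretation is an assumption), the computation can be sketched as:

```python
# Hypothetical surface-offset computation: for each masked pixel, lift it to
# a 3D camera-space point P_i = z_i * K^{-1} x_i, dot it with the normal n,
# and average over the mask. Values and shapes below are illustrative.

def backproject(K, x, z):
    """Lift pixel x = (u, v) at depth z to camera space, assuming
    K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] (pinhole intrinsics)."""
    fx, cx = K[0][0], K[0][2]
    fy, cy = K[1][1], K[1][2]
    u, v = x
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)

def surface_offset(K, pixels, masks, depths, n):
    """d = sum_i m_i * (n . P_i) / sum_i m_i, with P_i = z_i K^{-1} x_i."""
    num, den = 0.0, 0
    for x, m, z in zip(pixels, masks, depths):
        if m:  # m_i = 1 only for pixels on the surface object
            P = backproject(K, x, z)
            num += n[0] * P[0] + n[1] * P[1] + n[2] * P[2]
            den += 1
    return num / den

K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
pixels = [(320, 240), (330, 240), (320, 250), (100, 100)]
masks = [1, 1, 1, 0]              # last pixel is off-object, so it is ignored
depths = [2.0, 2.0, 2.0, 9.0]
n = (0.0, 0.0, 1.0)               # fronto-parallel surface
print(surface_offset(K, pixels, masks, depths, n))  # → 2.0
```

For a fronto-parallel surface (n pointing along the camera axis), the offset reduces to the mean depth of the masked pixels, which is a useful sanity check on the formula.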
2. The method of claim 1, wherein the feature map pyramid comprises a plurality of levels associated with a number of different scales.
3. The method of claim 1, wherein the bounding box parameter defines a bounding box indicating a location of the surface object, wherein the surface normal parameter indicates a relative orientation of the surface object.
4. The method of claim 1, wherein the method further comprises:
the surface offset value is calculated for each of the plurality of surface objects based on the normal vector and the depth map estimated for that surface object, wherein a decoder network included in the one or more neural networks is configured to process a map of the feature map pyramid to estimate the depth map.
5. The method of claim 1, wherein the method further comprises:
the region of interest is processed to generate, for each of a plurality of anchor boxes associated with the region of interest, a four-element vector representing a location of a vertex of the bounding box and a size of the bounding box, so as to estimate the bounding box parameter.
6. The method as recited in claim 5, further comprising:
the surface normal parameter is estimated by processing the region of interest to generate an anchor identifier and a three-element vector representing a residual vector relative to the anchor normal associated with the anchor identifier.
7. The method of claim 1, wherein the method further comprises:
determining camera parameters corresponding to the image based on the point cloud coordinates of different scales;
and projecting the three-dimensional human body model into a two-dimensional image according to camera parameters corresponding to the image.
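Assuming a simple pinhole camera model (the claims do not fix the camera model, so this is an illustrative assumption), the projection step of claim 7 can be sketched as:

```python
# Hypothetical projection of reconstructed 3D vertices into the 2D image
# plane using pinhole intrinsics: u = fx*X/Z + cx, v = fy*Y/Z + cy.

def project(K, vertices):
    """Project camera-space points (X, Y, Z) to pixel coordinates (u, v)."""
    fx, cx = K[0][0], K[0][2]
    fy, cy = K[1][1], K[1][2]
    out = []
    for X, Y, Z in vertices:
        out.append((fx * X / Z + cx, fy * Y / Z + cy))
    return out

K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
verts = [(0.0, 0.0, 2.0), (0.5, -0.5, 2.0)]
print(project(K, verts))  # → [(320.0, 240.0), (445.0, 115.0)]
```

A point on the optical axis projects to the principal point (cx, cy), which gives a quick check that the estimated camera parameters are plausible.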
8. A three-dimensional human body model reconstruction system, the system comprising:
the input image acquisition module is used for acquiring an input image, wherein the input image comprises limbs of a target user;
the feature map pyramid generation module is used for generating a feature map pyramid according to the input image through a feature pyramid network;
the three-dimensional parameter generation module is used for extracting the characteristics of the input image to obtain point cloud coordinates with different scales, and generating three-dimensional parameters of the target user according to the point cloud coordinates with different scales;
a surface parameter generation module, for processing the region of interest sampled from the feature map pyramid by one or more neural networks to identify a plurality of surface objects in the input image; wherein the one or more neural networks are configured to estimate parameters and a surface offset value for each of the plurality of surface objects, the parameters of the surface object including a bounding box parameter, a surface normal parameter, and a segmentation mask, the surface offset value being calculated according to:
d = ( Σᵢ mᵢ · nᵀ( zᵢ K⁻¹ xᵢ ) ) / Σᵢ mᵢ
wherein K is a 3×3 intrinsic camera matrix, xᵢ is the i-th pixel, mᵢ is an indicator that is 1 if the i-th pixel belongs to the surface object and 0 otherwise, zᵢ is the predicted depth value of the pixel, and n is the normal vector;
a reconstruction module, configured to reconstruct a three-dimensional human body model of the target user based on the three-dimensional parameters of the target user, the surface offset value, and the parameters of the surface object; wherein the three-dimensional human body model is matched to the limbs of the target user.
9. An electronic device, the device comprising:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the three-dimensional human body model reconstruction method according to any one of claims 1-7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the three-dimensional human body model reconstruction method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210362589.4A CN114913287B (en) | 2022-04-07 | 2022-04-07 | Three-dimensional human body model reconstruction method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114913287A CN114913287A (en) | 2022-08-16 |
CN114913287B true CN114913287B (en) | 2023-08-22 |
Family
ID=82762654
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210362589.4A Active CN114913287B (en) | 2022-04-07 | 2022-04-07 | Three-dimensional human body model reconstruction method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114913287B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115115797B (en) * | 2022-08-25 | 2022-11-25 | 清华大学 | Large-scene sparse light field semantic driving intelligent reconstruction method, system and device |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102682477A (en) * | 2012-05-16 | 2012-09-19 | 南京邮电大学 | Regular scene three-dimensional information extracting method based on structure prior |
CN107169475A (en) * | 2017-06-19 | 2017-09-15 | 电子科技大学 | A kind of face three-dimensional point cloud optimized treatment method based on kinect cameras |
CN108961390A (en) * | 2018-06-08 | 2018-12-07 | 华中科技大学 | Real-time three-dimensional method for reconstructing based on depth map |
CN111027547A (en) * | 2019-12-06 | 2020-04-17 | 南京大学 | Automatic detection method for multi-scale polymorphic target in two-dimensional image |
CN111340864A (en) * | 2020-02-26 | 2020-06-26 | 浙江大华技术股份有限公司 | Monocular estimation-based three-dimensional scene fusion method and device |
CN111598998A (en) * | 2020-05-13 | 2020-08-28 | 腾讯科技(深圳)有限公司 | Three-dimensional virtual model reconstruction method and device, computer equipment and storage medium |
CN111598030A (en) * | 2020-05-21 | 2020-08-28 | 山东大学 | Method and system for detecting and segmenting vehicle in aerial image |
CN111968169A (en) * | 2020-08-19 | 2020-11-20 | 北京拙河科技有限公司 | Dynamic human body three-dimensional reconstruction method, device, equipment and medium |
CN112184899A (en) * | 2020-11-06 | 2021-01-05 | 中山大学 | Three-dimensional reconstruction method based on symbolic distance function |
CN112652016A (en) * | 2020-12-30 | 2021-04-13 | 北京百度网讯科技有限公司 | Point cloud prediction model generation method, pose estimation method and device |
CN112927354A (en) * | 2021-02-25 | 2021-06-08 | 电子科技大学 | Three-dimensional reconstruction method, system, storage medium and terminal based on example segmentation |
CN113129449A (en) * | 2021-04-16 | 2021-07-16 | 浙江孔辉汽车科技有限公司 | Vehicle pavement feature recognition and three-dimensional reconstruction method based on binocular vision |
CN113313810A (en) * | 2021-06-18 | 2021-08-27 | 广东工业大学 | 6D attitude parameter calculation method for transparent object |
CN113850900A (en) * | 2021-05-27 | 2021-12-28 | 北京大学 | Method and system for recovering depth map based on image and geometric clue in three-dimensional reconstruction |
CN113936090A (en) * | 2021-06-18 | 2022-01-14 | 中国科学院自动化研究所 | Three-dimensional human body reconstruction method and device, electronic equipment and storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10757393B2 (en) * | 2018-05-06 | 2020-08-25 | American Advanced Technology, Llc | Method and apparatus for measuring characteristics of a surface topology |
US11037051B2 (en) * | 2018-11-28 | 2021-06-15 | Nvidia Corporation | 3D plane detection and reconstruction using a monocular image |
CN115210532A (en) * | 2020-03-05 | 2022-10-18 | 奇跃公司 | System and method for depth estimation by learning triangulation and densification of sparse points for multi-view stereo |
2022-04-07: application CN202210362589.4A (CN) granted as CN114913287B, status Active
Non-Patent Citations (1)
Title |
---|
Three-dimensional human body model reconstruction based on binary images; Chen Jiayu et al.; 《毛纺科技》 (Wool Textile Technology); Vol. 48, No. 09; pp. 61-67 *
Also Published As
Publication number | Publication date |
---|---|
CN114913287A (en) | 2022-08-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111598998B (en) | Three-dimensional virtual model reconstruction method, three-dimensional virtual model reconstruction device, computer equipment and storage medium | |
CN110998659B (en) | Image processing system, image processing method, and program | |
Voynov et al. | Perceptual deep depth super-resolution | |
JP6368709B2 (en) | Method for generating 3D body data | |
CN109684969B (en) | Gaze position estimation method, computer device, and storage medium | |
Vitoria et al. | Semantic image inpainting through improved wasserstein generative adversarial networks | |
US11074752B2 (en) | Methods, devices and computer program products for gradient based depth reconstructions with robust statistics | |
Tsiminaki et al. | High resolution 3D shape texture from multiple videos | |
KR20220117324A (en) | Learning from various portraits | |
US20220222895A1 (en) | Method for human body model reconstruction and reconstruction system | |
CN114913287B (en) | Three-dimensional human body model reconstruction method and system | |
Cheng et al. | Quad‐fisheye Image Stitching for Monoscopic Panorama Reconstruction | |
WO2021151380A1 (en) | Method for rendering virtual object based on illumination estimation, method for training neural network, and related products | |
CN111192308B (en) | Image processing method and device, electronic equipment and computer storage medium | |
JP5795556B2 (en) | Shadow information deriving device, shadow information deriving method and program | |
CN109166176B (en) | Three-dimensional face image generation method and device | |
Yin et al. | 3D face recognition based on high-resolution 3D face modeling from frontal and profile views | |
Kim et al. | Feature detection and simplification of 3D face data with facial expressions | |
CN115409949A (en) | Model training method, visual angle image generation method, device, equipment and medium | |
Polasek et al. | Vision UFormer: Long-range monocular absolute depth estimation | |
CN115272575A (en) | Image generation method and device, storage medium and electronic equipment | |
CN112785494B (en) | Three-dimensional model construction method and device, electronic equipment and storage medium | |
US11120606B1 (en) | Systems and methods for image texture uniformization for multiview object capture | |
CN108921908B (en) | Surface light field acquisition method and device and electronic equipment | |
Villa-Uriol et al. | Automatic creation of three-dimensional avatars |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||