CN114913287B - Three-dimensional human body model reconstruction method and system - Google Patents


Info

Publication number: CN114913287B (application CN202210362589.4A)
Authority: CN (China)
Prior art keywords: dimensional, input image, target user, parameters, parameter
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202210362589.4A
Other languages: Chinese (zh)
Other versions: CN114913287A
Inventor
袁潮
(Name withheld by request)
肖占中
Current Assignee: Beijing Zhuohe Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Beijing Zhuohe Technology Co Ltd
Application filed by Beijing Zhuohe Technology Co Ltd
Priority to CN202210362589.4A
Publication of CN114913287A
Application granted
Publication of CN114913287B
Status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects

Abstract

The application provides a three-dimensional human body model reconstruction method and system in the technical field of computer vision. The method comprises the following steps: acquiring an input image, wherein the input image comprises limbs of a target user; generating a feature map pyramid from the input image through a feature pyramid network; extracting features of the input image to obtain point cloud coordinates at different scales, and generating three-dimensional parameters of the target user from those point cloud coordinates; processing regions of interest sampled from the feature map pyramid with one or more neural networks to identify a plurality of surface objects in the input image; and reconstructing a three-dimensional human body model of the target user based on the three-dimensional parameters of the target user and the parameters of the surface objects, where the three-dimensional human body model matches the limbs of the target user. The embodiment of the application can improve the accuracy of three-dimensional human body model reconstruction.

Description

Three-dimensional human body model reconstruction method and system
Technical Field
The application relates to the technical field of computer vision, in particular to a three-dimensional human model reconstruction method and system.
Background
Detecting surface areas of the human body in an image and estimating the parameters of three-dimensional (3D) surfaces provide important information for understanding the structure and context of a scene, and can also be applied to three-dimensional human body model reconstruction, for example in virtual reality scenes, human special effects, and human detection.
However, detecting a surface or estimating surface parameters is fundamentally challenging, because estimating 3D structure from a single 2D image is an ill-posed problem. First, 3D surface reconstruction requires rich priors. Second, surface structures often lack texture, so the task requires a global understanding of image semantics rather than local texture analysis. Recent approaches to this task use Convolutional Neural Networks (CNNs) and formulate it as a surface segmentation task.
For human body detection, the traditional three-dimensional human body model reconstruction approach matches and aligns the irregular point cloud of a depth map with a regular three-dimensional human body mesh model. However, the matching result depends heavily on the quality of the depth map: if the depth map's resolution is low, the reconstructed three-dimensional human body model is inaccurate.
Disclosure of Invention
The embodiment of the application aims to provide a three-dimensional human body model reconstruction method and system. The specific technical scheme is as follows:
in a first aspect of an embodiment of the present application, a three-dimensional human model reconstruction method is provided, including:
acquiring an input image, wherein the input image comprises limbs of a target user;
generating a feature map pyramid according to the input image through a feature pyramid network;
extracting features of the input image to obtain point cloud coordinates with different scales, and generating three-dimensional parameters of the target user according to the point cloud coordinates with different scales;
processing the region of interest sampled from the feature map pyramid by one or more neural networks to identify a plurality of surface objects in the input image; wherein the one or more neural networks are configured to estimate parameters of each of the plurality of surface objects, the parameters including a bounding box parameter, a surface normal parameter, and a segmentation mask;
reconstructing a three-dimensional human body model of the target user based on the three-dimensional parameters of the target user and the parameters of the surface objects; the three-dimensional human body model matches the limbs of the target user.
Optionally, the method further comprises:
the feature map pyramid includes a plurality of levels associated with a number of different scales.
Optionally, the bounding box parameter defines a bounding box indicating a position of the surface object, wherein the surface normal parameter indicates a relative orientation of the surface object.
Optionally, the method further comprises:
a surface offset parameter is calculated for each surface object of the plurality of surface objects based on the normal vector and the depth map estimated for each surface object, wherein a decoder network included in the one or more neural networks is configured to process a feature map of the feature map pyramid to estimate the depth map.
Optionally, the method further comprises:
the region of interest is processed to generate, for each of a plurality of anchor boxes associated with the region of interest, a four-element vector representing the location of a vertex of the bounding box and the size of the bounding box, in order to estimate the bounding box parameters.
Optionally, the method further comprises:
surface normal parameters are estimated by processing the region of interest to generate an anchor identifier and a three-element vector representing a residual vector corresponding to the anchor normal associated with the anchor identifier.
Optionally, the method further comprises:
determining camera parameters corresponding to the image based on the point cloud coordinates of different scales;
and projecting the three-dimensional human body model into a two-dimensional image according to camera parameters corresponding to the image.
In yet another aspect of an embodiment of the present application, there is provided a three-dimensional human body model reconstruction system including:
the input image acquisition module is used for acquiring an input image, wherein the input image comprises limbs of a target user;
the feature map pyramid generation module is used for generating a feature map pyramid according to the input image through a feature pyramid network;
the three-dimensional parameter generation module is used for extracting the characteristics of the input image to obtain point cloud coordinates with different scales, and generating three-dimensional parameters of the target user according to the point cloud coordinates with different scales;
a surface parameter generation module for processing the region of interest sampled from the feature map pyramid by one or more neural networks to identify a plurality of surface objects in the input image; wherein the one or more neural networks are configured to estimate parameters of each of the plurality of surface objects, the parameters including a bounding box parameter, a surface normal parameter, and a segmentation mask;
the reconstruction module is used for reconstructing a three-dimensional human body model of the target user based on the three-dimensional parameters of the target user and the parameters of the surface objects; the three-dimensional human body model matches the limbs of the target user.
In still another aspect of the embodiments of the present application, there is provided a three-dimensional human body model reconstruction apparatus, including: a processor for performing the method of any one of claims 1 to 7.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram of an application environment of the three-dimensional human body model reconstruction method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of the three-dimensional human body model reconstruction method according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of the three-dimensional human body model reconstruction system according to an embodiment of the present application;
fig. 4 is an internal structure diagram of a computer device provided by an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The three-dimensional human model reconstruction method provided by the application can be applied to an application environment shown in figure 1. The terminal 102 obtains an input image including the limb of the target user. The terminal sends the image to a server 104, and the server 104 generates a feature map pyramid according to the input image through a feature pyramid network; and extracting features of the input image to obtain point cloud coordinates with different scales, and generating three-dimensional parameters of the target user according to the point cloud coordinates with different scales. Server 104 also processes the region of interest sampled from the feature map pyramid through one or more neural networks to identify a plurality of surface objects in the input image; wherein the one or more neural networks are configured to estimate parameters of each of the plurality of surface objects, the parameters including a bounding box parameter, a surface normal parameter, and a segmentation mask.
The server 104 then returns the three-dimensional parameters of the target user, and the parameters of the surface object, to the terminal 102. The terminal 102 reconstructs a three-dimensional human body model of the target user based on the three-dimensional parameters of the target user and the parameters of the surface object; the three-dimensional manikin is matched to the limb of the target user.
The server 104 may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, and basic cloud computing services such as big data and artificial intelligence platforms. The terminal 102 may be, but is not limited to, a smart phone, tablet, notebook, desktop computer, smart speaker, or smart watch. The terminal 102 and the server 104 may be connected directly or indirectly through wired or wireless communication, which is not limited in the present application.
In some embodiments, the three-dimensional human body model reconstruction method is applicable to reconstructing a three-dimensional model of a human body, and comprises the following steps:
acquiring an input image, wherein the input image comprises limbs of a target user;
generating a feature map pyramid according to the input image through a feature pyramid network;
extracting features of the input image to obtain point cloud coordinates with different scales, and generating three-dimensional parameters of the target user according to the point cloud coordinates with different scales;
processing the region of interest sampled from the feature map pyramid by one or more neural networks to identify a plurality of surface objects in the input image; wherein the one or more neural networks are configured to estimate parameters of each of the plurality of surface objects, the parameters including a bounding box parameter, a surface normal parameter, and a segmentation mask;
reconstructing a three-dimensional human body model of the target user based on the three-dimensional parameters of the target user and the parameters of the surface objects; the three-dimensional human body model matches the limbs of the target user.
As can be seen from the above, in the embodiment of the present application, a trained neural network model performs graph convolution processing on the human body image to rapidly and accurately obtain the three-dimensional human body parameters of the target human body and the parameters of its surface objects, and the three-dimensional human body model is then reconstructed from these parameters, improving both the accuracy and the efficiency of three-dimensional human body model reconstruction.
Fig. 2 is a schematic flow chart of a three-dimensional human body model reconstruction method according to an embodiment of the present application, and as shown in fig. 2, the three-dimensional human body model reconstruction method includes the following steps:
step 201, an input image is acquired.
Wherein the input image may include the limbs of the target user. The target user is the user for whom a three-dimensional human body model is to be reconstructed; for example, the target user may be a person in the input image. The input image may be a single image frame (e.g., RGB, monochrome, etc.) having one or more channels. The input image may also be a color image, which has higher resolution and richer detail than a depth map and allows the three-dimensional human body model to be reconstructed more finely.
Optionally, the terminal may obtain the corresponding input image by directly shooting the target user, or may obtain the input image corresponding to the target user from a local or network or from a third device, and the method for obtaining the input image is not limited in the embodiments of the present application.
And 202, generating a feature map pyramid according to the input image through a feature pyramid network.
In some embodiments, a feature pyramid network is implemented to process input images and generate feature map pyramids. As used herein, a feature map pyramid refers to a plurality of feature maps having different scales relative to the scale of an input image. The feature map pyramid may include a plurality of levels, each level including one or more feature maps of a particular scale that increase in scale (e.g., resolution in pixel space) as one moves from the top to the bottom of the pyramid. In some embodiments, the feature pyramid network is based on a residual network that extracts image features. The feature map is then up-sampled and combined with intermediate feature maps from the residual network to generate a feature map pyramid.
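As a rough illustration of this top-down pathway (a sketch only: the channel count, level sizes, nearest-neighbor upsampling, and plain addition are assumptions, not the patent's implementation), the merge can be written in NumPy as:

```python
import numpy as np

def upsample2x(fmap):
    """Nearest-neighbor 2x upsampling of a (C, H, W) feature map."""
    return fmap.repeat(2, axis=1).repeat(2, axis=2)

def build_feature_pyramid(backbone_maps):
    """Top-down pathway: upsample the coarser map and add the lateral map.

    backbone_maps: list of (C, H, W) arrays, finest resolution first,
    each level half the spatial size of the previous one.
    Returns the merged pyramid, coarsest level first.
    """
    pyramid = [backbone_maps[-1]]            # start from the coarsest map
    for lateral in reversed(backbone_maps[:-1]):
        pyramid.append(upsample2x(pyramid[-1]) + lateral)
    return pyramid

# Three backbone levels at 32x32, 16x16, 8x8 with 4 channels each.
maps = [np.ones((4, 32, 32)), np.ones((4, 16, 16)), np.ones((4, 8, 8))]
pyr = build_feature_pyramid(maps)
print([p.shape for p in pyr])  # [(4, 8, 8), (4, 16, 16), (4, 32, 32)]
```

A real feature pyramid network would also apply 1x1 lateral convolutions before the addition; those are omitted here to keep the scale bookkeeping visible.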
And 203, extracting features of the input image to obtain point cloud coordinates with different scales, and generating three-dimensional parameters of the target user according to the point cloud coordinates with different scales.
Specifically, the terminal may perform feature extraction on an input image of the target user to obtain features corresponding to the input image, and obtain a feature map. And then, the terminal carries out graph convolution processing on the feature graph to obtain point cloud features with different scales. And the terminal carries out convolution processing with the channel number of 3 on the point cloud features with different scales to obtain point cloud coordinates with different scales, wherein the point cloud coordinates are three-dimensional coordinates.
Specifically, the terminal can perform downsampling and fully connected processing on the point cloud coordinates at different scales to obtain the three-dimensional parameters of the target user, and can reconstruct a three-dimensional model of the whole human body based on the Skinned Multi-Person Linear (SMPL) model parameters.
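A minimal NumPy sketch of this step (the point count, channel sizes, strided subsampling, and single fully connected layer are illustrative assumptions rather than the patent's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Per-point features at one scale: N = 256 points, C = 64 channels
# (both sizes are made up for illustration).
feats = rng.standard_normal((256, 64))

# The "convolution with channel number 3": a per-point linear map from
# C feature channels to 3 output channels, i.e. (x, y, z) coordinates.
W_xyz = rng.standard_normal((64, 3))
coords = feats @ W_xyz                  # (256, 3) point-cloud coordinates

# Downsampling (plain strided subsampling here) followed by a fully
# connected layer regressing a flat three-dimensional parameter vector,
# e.g. pose and body-shape coefficients.
sub = coords[::4].reshape(-1)           # 64 points kept, flattened to (192,)
W_fc = rng.standard_normal((192, 10))
params = sub @ W_fc                     # (10,) three-dimensional parameters
print(coords.shape, params.shape)       # (256, 3) (10,)
```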
Step 204, processing the region of interest sampled from the feature map pyramid by one or more neural networks to identify a plurality of surface objects in the input image.
Wherein the one or more neural networks are configured to estimate parameters of each of the plurality of surface objects, the parameters including a bounding box parameter, a surface normal parameter, and a segmentation mask.
A sliding window is applied to each feature map in the feature map pyramid to sample regions of interest. A region of interest is a region of a feature map corresponding to a particular subset of the input image. Although the sliding window has a fixed size when applied to a particular feature map of a given scale in the feature map pyramid, the region of interest corresponds to a variable-size region of the input image. For example, a sliding window may be defined as a 7×7 pixel region of a downsampled feature map in the feature map pyramid. Depending on the relative scales of the feature map and the input image in pixel space, that 7×7 pixel region may correspond to, for example, a 14×14, 28×28, or 56×56 pixel region (or larger) of the input image.
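The scale relationship between the fixed window and the input image reduces to simple arithmetic; a small sketch, assuming per-level strides of 2, 4, and 8:

```python
WINDOW = 7  # sliding-window size in feature-map pixels

def roi_extent_in_input(stride):
    """Side length, in input-image pixels, covered by a 7x7 window on a
    feature map downsampled by `stride` relative to the input image."""
    return WINDOW * stride

# Strides of 2, 4 and 8 reproduce the 14x14, 28x28 and 56x56 regions
# mentioned in the text (the stride values themselves are assumptions).
print([roi_extent_in_input(s) for s in (2, 4, 8)])  # [14, 28, 56]
```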
Step 205, reconstructing a three-dimensional human model of the target user based on the three-dimensional parameters of the target user and the parameters of the surface object.
Wherein the three-dimensional human body model matches the limbs of the target user; that is, the generated model is the three-dimensional human body model of the target user.
In particular, the three-dimensional parameters may include three-dimensional pose parameters and three-dimensional body shape parameters. The three-dimensional pose parameters are the joint coordinates of the target object, and the three-dimensional body shape parameters are the coordinates of feature points on the target object's surface. After the terminal obtains the three-dimensional pose parameters and the three-dimensional body shape parameters of the target object in the image, a model is built in three-dimensional space from the corresponding three-dimensional coordinates, thereby obtaining the three-dimensional human body model.
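To picture how body shape parameters could deform a template mesh, here is a hedged, SMPL-style sketch (the toy mesh, blend directions, and coefficient count are invented for illustration and are not the patent's model):

```python
import numpy as np

def reconstruct_vertices(template, shape_dirs, betas):
    """Offset each template vertex by a linear blend of shape directions.

    template: (V, 3) mean-body vertex positions.
    shape_dirs: (V, 3, B) per-vertex displacement per shape coefficient.
    betas: (B,) body-shape coefficients.
    """
    return template + shape_dirs @ betas

template = np.zeros((2, 3))        # toy 2-vertex "mesh"
shape_dirs = np.ones((2, 3, 2))    # B = 2 shape coefficients
betas = np.array([0.5, 1.5])
verts = reconstruct_vertices(template, shape_dirs, betas)
print(verts[0])  # [2. 2. 2.]
```

Pose parameters would additionally rotate vertices about the joints; that part is omitted here.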
Optionally, the embodiments of the present application may further include the following.
Optionally, the bounding box parameter defines a bounding box indicating a position of the surface object, wherein the surface normal parameter indicates a relative orientation of the surface object.
Optionally, the method of the embodiment of the present application further includes:
a surface offset parameter is calculated for each surface object of the plurality of surface objects based on the normal vector and the depth map estimated for each surface object, wherein a decoder network included in the one or more neural networks is configured to process a feature map of the feature map pyramid to estimate the depth map.
Optionally, the method of the embodiment of the present application further includes:
the region of interest is processed to generate, for each of a plurality of anchor boxes associated with the region of interest, a four-element vector representing the location of a vertex of the bounding box and the size of the bounding box, in order to estimate the bounding box parameters.
Optionally, the method of the embodiment of the present application further includes:
surface normal parameters are estimated by processing the region of interest to generate an anchor identifier and a three-element vector representing a residual vector corresponding to the anchor normal associated with the anchor identifier.
Optionally, the method of the embodiment of the present application further includes:
determining camera parameters corresponding to the image based on the point cloud coordinates of different scales;
and projecting the three-dimensional human body model into a two-dimensional image according to camera parameters corresponding to the image.
In addition to the surface parameters discussed above, another useful parameter for various applications is a surface offset value. As used herein, a surface offset value refers to the closest distance from the origin of the coordinate system to the surface object, where the origin corresponds to the position of the camera (or virtual camera) that captured the image 202. The surface offset value may be calculated according to the following formula:

d = ( Σ_i m_i · n^T (z_i K^-1 x_i) ) / ( Σ_i m_i )

where K is the 3×3 intrinsic camera matrix, x_i is the i-th pixel in homogeneous coordinates, m_i is an indicator that is 1 if the i-th pixel belongs to the surface object and 0 otherwise, z_i is the predicted depth value of the pixel, and n is the normal vector.
That is, for each pixel overlapping the surface object, a distance is computed as the dot product of the pixel's back-projected 3D position and the normal vector, and the surface offset is the average of these distances over the pixels overlapping the surface object.
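A hedged NumPy sketch of this surface offset computation, reading the offset as the masked average of n·(z_i K^-1 x_i); the identity intrinsics and toy pixel set are illustrative assumptions:

```python
import numpy as np

def surface_offset(K, pixels, depths, mask, normal):
    """Average plane offset: d = sum_i m_i * n.(z_i K^-1 x_i) / sum_i m_i.

    K: (3, 3) intrinsic camera matrix; pixels: (N, 2) pixel coordinates
    x_i; depths: (N,) predicted depths z_i; mask: (N,) indicator m_i;
    normal: (3,) surface normal n.
    """
    K_inv = np.linalg.inv(K)
    homo = np.hstack([pixels, np.ones((len(pixels), 1))])  # homogeneous x_i
    points = depths[:, None] * (K_inv @ homo.T).T          # back-projected X_i
    dists = points @ normal                                # n . X_i per pixel
    return (mask * dists).sum() / mask.sum()

# Fronto-parallel plane at depth 2 with identity intrinsics: offset is 2.
K = np.eye(3)
pixels = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]])
d = surface_offset(K, pixels, np.full(3, 2.0), np.ones(3),
                   np.array([0.0, 0.0, 1.0]))
print(d)  # 2.0
```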
In one embodiment, unprojection, coordinate system transformation, and projection may be implemented. Given the intrinsic parameters K of the camera, a pixel (u_n, v_n) can be unprojected in the neighbor view according to the following equation:

X_n = K^-1 h(u_n, v_n) D̂_n(u_n, v_n),

where D̂_n is the ground-truth depth map of the neighbor view, and h converts the pixel coordinates (u_n, v_n) to a homogeneous representation. The unprojected point is then transformed to the current view using rotation R and translation t according to the following equation:

X_c = R X_n + t

Finally, the warped pixel coordinates (u_w, v_w) are obtained:

(u_w, v_w) = h^-1 (K X_c),

where h^-1 converts the homogeneous coordinates back into a two-dimensional representation. Depth values are then sampled from the depth map of the current view at the warped pixel coordinates (u_w, v_w), and a loss is computed between the sampled depth values and the ground-truth depth map of the neighbor view.
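The unproject, transform, and reproject chain above can be sketched in NumPy (the identity intrinsics and the toy pose are assumptions made for the example):

```python
import numpy as np

def warp_pixel(K, R, t, uv_n, depth_n):
    """Unproject (u_n, v_n) with its neighbor-view depth, move the point
    into the current view with (R, t), and re-project it to (u_w, v_w)."""
    K_inv = np.linalg.inv(K)
    x_h = np.array([uv_n[0], uv_n[1], 1.0])   # h(u_n, v_n)
    X_n = depth_n * (K_inv @ x_h)             # X_n = K^-1 h(u,v) D_n(u,v)
    X_c = R @ X_n + t                         # rigid transform to current view
    p = K @ X_c
    return p[:2] / p[2]                       # h^-1: back to 2D coordinates

K = np.eye(3)
R = np.eye(3)
t = np.array([1.0, 0.0, 0.0])                 # shift the camera 1 unit in x
print(warp_pixel(K, R, t, (5.0, 7.0), 1.0))   # [6. 7.]
```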
In order to implement the above method embodiments, the embodiment of the present application further provides a three-dimensional human body model reconstruction system. Fig. 3 shows a schematic structural diagram of the three-dimensional human body model reconstruction system provided by the embodiment of the present application, where the system includes:
an input image acquisition module 301, configured to acquire an input image, where the input image includes a limb of a target user;
a feature map pyramid generation module 302, configured to generate a feature map pyramid from the input image through a feature pyramid network;
the three-dimensional parameter generating module 303 is configured to perform feature extraction on the input image to obtain point cloud coordinates of different scales, and generate three-dimensional parameters of the target user according to the point cloud coordinates of different scales;
a surface parameter generation module 304 for processing the region of interest sampled from the feature map pyramid through one or more neural networks to identify a plurality of surface objects in the input image; wherein the one or more neural networks are configured to estimate parameters of each of the plurality of surface objects, the parameters including a bounding box parameter, a surface normal parameter, and a segmentation mask;
a reconstruction module 305, configured to reconstruct a three-dimensional human body model of the target user based on the three-dimensional parameters of the target user and the parameters of the surface objects; the three-dimensional human body model matches the limbs of the target user.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working processes of the modules/units/sub-units/components in the above-described apparatus may refer to corresponding processes in the foregoing method embodiments, which are not described herein again.
In some embodiments, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 4. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The database of the computer device stores the training data of the reconstruction network and the reconstruction data of the three-dimensional human body model. The network interface of the computer device communicates with an external terminal through a network connection. The computer program, when executed by a processor, implements a training method for the reconstruction network and the three-dimensional human body model reconstruction method.
In some embodiments, a computer device is provided, which may be a terminal, the internal structure of which may be as shown in fig. 4. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer program in the non-volatile storage medium. The communication interface of the computer device performs wired or wireless communication with an external terminal; the wireless mode can be realized through WIFI, an operator network, NFC (near field communication), or other technologies. The computer program, when executed by a processor, implements a training method for the reconstruction network and the three-dimensional human body model reconstruction method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, keys, a trackball or touchpad arranged on the housing of the computer device, or an external keyboard, touchpad, or mouse.
It will be appreciated by persons skilled in the art that the architecture shown in fig. 4 is merely a block diagram of some of the architecture relevant to the present inventive arrangements and is not limiting as to the computer device to which the present inventive arrangements are applicable, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In some embodiments, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In some embodiments, a computer readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration, and not limitation, RAM can take various forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.
In summary, the three-dimensional human body model reconstruction method and system provided by the application can acquire an input image, wherein the input image comprises limbs of a target user; generate a feature map pyramid from the input image through a feature pyramid network; extract features of the input image to obtain point cloud coordinates at different scales, and generate three-dimensional parameters of the target user from those point cloud coordinates; process regions of interest sampled from the feature map pyramid with one or more neural networks to identify a plurality of surface objects in the input image, wherein the one or more neural networks are configured to estimate parameters of each of the plurality of surface objects, the parameters including a bounding box parameter, a surface normal parameter, and a segmentation mask; and reconstruct a three-dimensional human body model of the target user based on the three-dimensional parameters of the target user and the parameters of the surface objects, where the three-dimensional human body model matches the limbs of the target user.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division into units is merely a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments provided in the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It should be noted that like reference numerals and letters denote like items in the figures; once an item is defined in one figure, it requires no further definition or explanation in subsequent figures. Furthermore, the terms "first," "second," "third," etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above examples are only specific embodiments of the present application, intended to illustrate rather than limit its technical solutions, and the scope of protection of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing examples, any person skilled in the art may, within the technical scope of the present disclosure, modify or readily conceive of variations of the technical solutions described in the foregoing embodiments, or substitute equivalents for some of their technical features; such modifications, variations, or substitutions do not depart from the spirit and scope of the corresponding technical solutions and are intended to be encompassed within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for reconstructing a three-dimensional human body model, comprising:
acquiring an input image, wherein the input image comprises limbs of a target user;
generating a feature map pyramid according to the input image through a feature pyramid network;
extracting features of the input image to obtain point cloud coordinates with different scales, and generating three-dimensional parameters of the target user according to the point cloud coordinates with different scales;
processing the region of interest sampled from the feature map pyramid by one or more neural networks to identify a plurality of surface objects in the input image; wherein the one or more neural networks are configured to estimate parameters and a surface offset value for each of the plurality of surface objects, the parameters of the surface objects including a bounding box parameter, a surface normal parameter, and a segmentation mask, the surface offset value being calculated according to:
wherein K is a 3×3 intrinsic camera matrix, x_i is the i-th pixel, m_i is an indicator equal to 1 if the i-th pixel belongs to the surface object and 0 otherwise, z_i is the predicted depth value of the pixel, and n is the normal vector;
reconstructing a three-dimensional human body model of the target user based on the three-dimensional parameters of the target user, the surface offset value, and the parameters of the surface objects; the three-dimensional human body model is matched to the limbs of the target user.
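The surface offset value of claim 1 can be illustrated as follows. The formula itself is not reproduced in the text (it appears as an image in the original patent), so this sketch assumes the plane-offset form common in monocular plane reconstruction, d = Σ_i m_i · nᵀ(z_i K⁻¹ x_i) / Σ_i m_i, which is consistent with the variable glossary in the claim but is an assumption, not the patented formula:

```python
import numpy as np

def surface_offset(K, pixels, depths, mask, normal):
    """Assumed plane-offset computation for one surface object.

    K      : 3x3 intrinsic camera matrix
    pixels : (N, 3) homogeneous pixel coordinates x_i
    depths : (N,) predicted depth values z_i
    mask   : (N,) indicator m_i, 1 if the pixel belongs to the surface object
    normal : (3,) surface normal vector n
    Returns the masked average of n^T (z_i K^{-1} x_i) over the object's pixels.
    """
    K_inv = np.linalg.inv(K)
    rays = (K_inv @ pixels.T).T          # back-projected rays K^{-1} x_i
    points = depths[:, None] * rays      # 3D points z_i K^{-1} x_i
    m = mask.astype(np.float64)
    return float((m * (points @ normal)).sum() / m.sum())

# Identity intrinsics, two pixels at depth 2 on a fronto-parallel surface
# (n = [0, 0, 1]): every point lies at distance 2 along n, so the offset is 2.
d = surface_offset(np.eye(3),
                   np.array([[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]),
                   np.array([2.0, 2.0]),
                   np.array([1, 1]),
                   np.array([0.0, 0.0, 1.0]))
```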
2. The method of claim 1, wherein the feature map pyramid comprises a plurality of levels associated with a number of different scales.
3. The method of claim 1, wherein the bounding box parameter defines a bounding box indicating a location of the surface object, wherein the surface normal parameter indicates a relative orientation of the surface object.
4. The method of claim 1, wherein the method further comprises:
the surface offset value is calculated for each surface object of the plurality of surface objects based on the normal vector and the depth map estimated for each surface object, wherein a decoder network included in the one or more neural networks is configured to process a map of the feature map pyramid to estimate the depth map.
5. The method of claim 1, wherein the method further comprises:
the region of interest is processed to generate, for each of a plurality of anchor boxes associated with the region of interest, a four-element vector representing a location of a vertex of the bounding box and a size of the bounding box, so as to estimate the bounding box parameters.
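The anchor-based decoding of claim 5 might look like the following sketch. The claim does not specify how the four-element vector is parameterized, so a simple additive vertex offset plus absolute width and height is assumed here:

```python
def decode_bounding_box(anchor, delta):
    """Decode a four-element vector against an anchor box (assumed scheme).

    anchor : (ax, ay) top-left vertex of the anchor box
    delta  : (dx, dy, w, h) predicted per-anchor vector: offset of the
             bounding-box vertex relative to the anchor, plus box size
    Returns (x, y, w, h): the bounding box's vertex location and size.
    """
    ax, ay = anchor
    dx, dy, w, h = delta
    return (ax + dx, ay + dy, w, h)

# One anchor at (10, 20) with a predicted vector (1.5, -0.5, 8, 6).
box = decode_bounding_box((10.0, 20.0), (1.5, -0.5, 8.0, 6.0))
```

In a full detector, one such vector would be decoded per anchor box, and the best-scoring box kept per surface object.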
6. The method as recited in claim 5, further comprising:
surface normal parameters are estimated by processing the region of interest to generate an anchor identifier and a three-element vector representing a residual vector relative to the anchor normal associated with the anchor identifier.
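The normal estimation of claim 6 can be sketched as follows. The anchor-normal set is hypothetical (the claim does not enumerate one), and the residual is assumed to be added to the selected anchor normal and the result re-normalized:

```python
import numpy as np

# Hypothetical anchor normals, one per dominant scene orientation -- a common
# choice in plane detection; the patent does not list the actual set.
ANCHOR_NORMALS = {
    0: np.array([0.0, 0.0, 1.0]),   # facing the camera
    1: np.array([0.0, 1.0, 0.0]),   # horizontal surface (e.g. floor)
    2: np.array([1.0, 0.0, 0.0]),   # vertical side surface
}

def estimate_surface_normal(anchor_id, residual):
    """Anchor normal plus three-element residual, re-normalized to unit length."""
    n = ANCHOR_NORMALS[anchor_id] + np.asarray(residual, dtype=np.float64)
    return n / np.linalg.norm(n)

# Anchor 0 with a residual along the same axis still yields the unit normal.
n = estimate_surface_normal(0, [0.0, 0.0, 1.0])
```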
7. The method of claim 1, wherein the method further comprises:
determining camera parameters corresponding to the image based on the point cloud coordinates of different scales;
and projecting the three-dimensional human body model into a two-dimensional image according to camera parameters corresponding to the image.
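The projection of claim 7 corresponds to a standard pinhole-camera model. This sketch assumes the model vertices are already expressed in the camera frame (any rotation and translation recovered with the camera parameters would be applied first); the intrinsic values are illustrative:

```python
import numpy as np

def project_to_image(points_3d, K):
    """Project 3D body-model vertices into a 2D image (pinhole model).

    points_3d : (N, 3) vertices in the camera coordinate frame
    K         : 3x3 intrinsic camera matrix
    Returns (N, 2) pixel coordinates after the perspective divide.
    """
    proj = (K @ points_3d.T).T           # homogeneous image coordinates
    return proj[:, :2] / proj[:, 2:3]    # divide by depth

# Illustrative intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
uv = project_to_image(np.array([[0.0, 0.0, 2.0],
                                [0.5, 0.0, 2.0]]), K)
```

A point on the optical axis lands at the principal point, and off-axis points are displaced in proportion to focal length over depth.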
8. A three-dimensional human body model reconstruction system, the system comprising:
the input image acquisition module is used for acquiring an input image, wherein the input image comprises limbs of a target user;
the feature map pyramid generation module is used for generating a feature map pyramid according to the input image through a feature pyramid network;
the three-dimensional parameter generation module is used for extracting the characteristics of the input image to obtain point cloud coordinates with different scales, and generating three-dimensional parameters of the target user according to the point cloud coordinates with different scales;
a surface parameter generation module for processing the region of interest sampled from the feature map pyramid by one or more neural networks to identify a plurality of surface objects in the input image; wherein the one or more neural networks are configured to estimate parameters and a surface offset value for each of the plurality of surface objects, the parameters of the surface objects including a bounding box parameter, a surface normal parameter, and a segmentation mask, the surface offset value being calculated according to:
wherein K is a 3×3 intrinsic camera matrix, x_i is the i-th pixel, m_i is an indicator equal to 1 if the i-th pixel belongs to the surface object and 0 otherwise, z_i is the predicted depth value of the pixel, and n is the normal vector;
a reconstruction module, configured to reconstruct a three-dimensional human body model of the target user based on the three-dimensional parameters of the target user, the surface offset value, and the parameters of the surface objects; the three-dimensional human body model is matched to the limbs of the target user.
9. An electronic device, the device comprising:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the three-dimensional human body model reconstruction method as recited in any one of claims 1-7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the three-dimensional human body model reconstruction method as claimed in any one of claims 1-7.
CN202210362589.4A 2022-04-07 2022-04-07 Three-dimensional human body model reconstruction method and system Active CN114913287B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210362589.4A CN114913287B (en) 2022-04-07 2022-04-07 Three-dimensional human body model reconstruction method and system

Publications (2)

Publication Number Publication Date
CN114913287A CN114913287A (en) 2022-08-16
CN114913287B true CN114913287B (en) 2023-08-22

Family

ID=82762654

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210362589.4A Active CN114913287B (en) 2022-04-07 2022-04-07 Three-dimensional human body model reconstruction method and system

Country Status (1)

Country Link
CN (1) CN114913287B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115115797B (en) * 2022-08-25 2022-11-25 清华大学 Large-scene sparse light field semantic driving intelligent reconstruction method, system and device

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682477A (en) * 2012-05-16 2012-09-19 南京邮电大学 Regular scene three-dimensional information extracting method based on structure prior
CN107169475A (en) * 2017-06-19 2017-09-15 电子科技大学 A kind of face three-dimensional point cloud optimized treatment method based on kinect cameras
CN108961390A (en) * 2018-06-08 2018-12-07 华中科技大学 Real-time three-dimensional method for reconstructing based on depth map
CN111027547A (en) * 2019-12-06 2020-04-17 南京大学 Automatic detection method for multi-scale polymorphic target in two-dimensional image
CN111340864A (en) * 2020-02-26 2020-06-26 浙江大华技术股份有限公司 Monocular estimation-based three-dimensional scene fusion method and device
CN111598998A (en) * 2020-05-13 2020-08-28 腾讯科技(深圳)有限公司 Three-dimensional virtual model reconstruction method and device, computer equipment and storage medium
CN111598030A (en) * 2020-05-21 2020-08-28 山东大学 Method and system for detecting and segmenting vehicle in aerial image
CN111968169A (en) * 2020-08-19 2020-11-20 北京拙河科技有限公司 Dynamic human body three-dimensional reconstruction method, device, equipment and medium
CN112184899A (en) * 2020-11-06 2021-01-05 中山大学 Three-dimensional reconstruction method based on symbolic distance function
CN112652016A (en) * 2020-12-30 2021-04-13 北京百度网讯科技有限公司 Point cloud prediction model generation method, pose estimation method and device
CN112927354A (en) * 2021-02-25 2021-06-08 电子科技大学 Three-dimensional reconstruction method, system, storage medium and terminal based on example segmentation
CN113129449A (en) * 2021-04-16 2021-07-16 浙江孔辉汽车科技有限公司 Vehicle pavement feature recognition and three-dimensional reconstruction method based on binocular vision
CN113313810A (en) * 2021-06-18 2021-08-27 广东工业大学 6D attitude parameter calculation method for transparent object
CN113850900A (en) * 2021-05-27 2021-12-28 北京大学 Method and system for recovering depth map based on image and geometric clue in three-dimensional reconstruction
CN113936090A (en) * 2021-06-18 2022-01-14 中国科学院自动化研究所 Three-dimensional human body reconstruction method and device, electronic equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10757393B2 (en) * 2018-05-06 2020-08-25 American Advanced Technology, Llc Method and apparatus for measuring characteristics of a surface topology
US11037051B2 (en) * 2018-11-28 2021-06-15 Nvidia Corporation 3D plane detection and reconstruction using a monocular image
CN115210532A (en) * 2020-03-05 2022-10-18 奇跃公司 System and method for depth estimation by learning triangulation and densification of sparse points for multi-view stereo

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Three-dimensional human body model reconstruction based on binary images (基于二值图像的三维人体模型重建); Chen Jiayu (陈佳宇) et al.; 《毛纺科技》; Vol. 48, No. 09; pp. 61-67 *

Also Published As

Publication number Publication date
CN114913287A (en) 2022-08-16

Similar Documents

Publication Publication Date Title
CN111598998B (en) Three-dimensional virtual model reconstruction method, three-dimensional virtual model reconstruction device, computer equipment and storage medium
CN110998659B (en) Image processing system, image processing method, and program
Voynov et al. Perceptual deep depth super-resolution
JP6368709B2 (en) Method for generating 3D body data
CN109684969B (en) Gaze position estimation method, computer device, and storage medium
Vitoria et al. Semantic image inpainting through improved wasserstein generative adversarial networks
US11074752B2 (en) Methods, devices and computer program products for gradient based depth reconstructions with robust statistics
Tsiminaki et al. High resolution 3D shape texture from multiple videos
KR20220117324A (en) Learning from various portraits
US20220222895A1 (en) Method for human body model reconstruction and reconstruction system
CN114913287B (en) Three-dimensional human body model reconstruction method and system
Cheng et al. Quad‐fisheye Image Stitching for Monoscopic Panorama Reconstruction
WO2021151380A1 (en) Method for rendering virtual object based on illumination estimation, method for training neural network, and related products
CN111192308B (en) Image processing method and device, electronic equipment and computer storage medium
JP5795556B2 (en) Shadow information deriving device, shadow information deriving method and program
CN109166176B (en) Three-dimensional face image generation method and device
Yin et al. 3D face recognition based on high-resolution 3D face modeling from frontal and profile views
Kim et al. Feature detection and simplification of 3D face data with facial expressions
CN115409949A (en) Model training method, visual angle image generation method, device, equipment and medium
Polasek et al. Vision UFormer: Long-range monocular absolute depth estimation
CN115272575A (en) Image generation method and device, storage medium and electronic equipment
CN112785494B (en) Three-dimensional model construction method and device, electronic equipment and storage medium
US11120606B1 (en) Systems and methods for image texture uniformization for multiview object capture
CN108921908B (en) Surface light field acquisition method and device and electronic equipment
Villa-Uriol et al. Automatic creation of three-dimensional avatars

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant