WO2021120052A1 - 3d reconstruction from an insufficient number of images - Google Patents

3d reconstruction from an insufficient number of images

Info

Publication number
WO2021120052A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
images
user
image sequence
silhouettes
Prior art date
Application number
PCT/CN2019/126298
Other languages
French (fr)
Inventor
Sato Hiroyuki
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to PCT/CN2019/126298
Publication of WO2021120052A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/50 - Depth or shape recovery
    • G06T7/55 - Depth or shape recovery from multiple images
    • G06T7/564 - Depth or shape recovery from multiple images from contours

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A device (100) is provided. The device (100) includes: a camera (115) for capturing an image sequence of a subject, a three dimension (3D) reconstruction unit (123) for reconstructing a 3D model from the image sequence, and a model refinement unit (124) for refining the 3D model so as to be fitted to one or more images selected by a user from the image sequence. The device (100) closes holes on the reconstructed 3D model caused by an insufficient number of images.

Description

3D RECONSTRUCTION FROM AN INSUFFICIENT NUMBER OF IMAGES
TECHNICAL FIELD
The present invention relates to three dimension (3D) reconstruction from a plurality of two dimension (2D) images captured from a subject.
BACKGROUND
Color image-based 3D reconstruction is a well-studied field. SfM (Structure-from-Motion) (for example, refer to: Schönberger, Johannes L. and Jan-Michael Frahm, "Structure-from-motion revisited", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016) estimates camera motion and sparse 3D points, and then MVS (Multi-View Stereo) (for example, refer to: Schönberger, Johannes L. et al., "Pixelwise view selection for unstructured multi-view stereo", European Conference on Computer Vision, Springer, Cham, 2016) is applied to make a dense 3D model from them. Recent depth image-based 3D reconstruction such as KinectFusion (for example, refer to: Newcombe, Richard A. et al., "KinectFusion: Real-time dense surface mapping and tracking", ISMAR, 2011) can make a dense 3D model in real time. These methods are able to make a complete 3D model if a sufficient number of images is captured. However, in a casual scan by a non-professional user with a consumer device such as a smart phone, a large portion of the subject's surface is often not validly captured. There are several reasons: a limited camera field of view, a large or complex object shape, limited scanning space, limited user interface (UI) feedback, fast camera or user motion, lighting conditions, and material properties (e.g., a depth sensor based on IR (Infrared) emission cannot capture valid depth values on some black materials with low IR reflection, and MVS methods are not able to recover dense depth values on uniformly colored surfaces). As a result, an insufficient number of images is captured to reconstruct the subject. Such an insufficient number of images cannot yield a complete 3D model; it leaves large holes on the reconstructed 3D model of the subject.
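For orientation only (this pipeline is prior art, not part of the claimed invention), a hedged sketch of the sparse SfM stage using the pycolmap bindings is shown below; the function names extract_features, match_exhaustive, and incremental_mapping are pycolmap's, their exact signatures vary between versions, and all paths are hypothetical.

```python
# Hedged sketch of a standard SfM front end with pycolmap (an assumption: the
# embodiment does not prescribe any particular SfM/MVS implementation).
import pycolmap

image_dir = "images/"      # hypothetical folder with the captured image sequence
database = "colmap.db"     # feature/match database created by pycolmap
sparse_dir = "sparse/"     # output folder for the sparse reconstruction

pycolmap.extract_features(database_path=database, image_path=image_dir)
pycolmap.match_exhaustive(database_path=database)
maps = pycolmap.incremental_mapping(database_path=database,
                                    image_path=image_dir,
                                    output_path=sparse_dir)
# maps[0] contains camera intrinsics/extrinsics and sparse points; a dense MVS
# stage (e.g. patch-match stereo) would then be run on top of this output.
print(maps[0])
```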
SUMMARY
A device is provided to close holes on the reconstructed 3D model that are caused by an insufficient number of images.
According to a first aspect, a device is provided, where the device includes: a camera for capturing an image sequence of a subject, a three dimension (3D) reconstruction unit for reconstructing a 3D model from the image sequence, and a model refinement unit for refining the 3D model so as to be fitted to one or more images selected by a user from the image sequence.
In a first possible implementation manner of the first aspect, the 3D model is refined based on one or more silhouettes of the subject that are extracted from the one or more selected images.
With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, the device further includes: a user interface unit for showing one or more silhouettes of the subject that are extracted from the one or more selected images, and making the user check whether the one or more silhouettes are accurate or not.
With reference to the first possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, the 3D model is reconstructed as a set of points, and holes on the 3D model are closed by one or more parts of a set of tangent surfaces computed from the one or more silhouettes, wherein the one or more parts of the set of tangent surfaces are inside a 3D model reconstructed as a 3D mesh from the set of points.
According to a second aspect, a method performed by a device is provided, where the method includes: capturing an image sequence of a subject, reconstructing a three dimension (3D) model from the image sequence, and refining the 3D model so as to be fitted to one or more images selected by a user from the image sequence.
According to a third aspect, a computer readable storage medium storing a program thereon is provided, where when the program is executed by a processor, the program causes the processor to perform the method according to the second aspect.
BRIEF DESCRIPTION OF DRAWINGS
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments or the prior art. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.
Fig. 1 depicts an example of a usage scene of a 3D human model reconstruction  application according to a first embodiment of the present invention;
Fig. 2 depicts an example of a block diagram of a hardware configuration;
Fig. 3 depicts an example of a block diagram of a functional configuration;
Fig. 4 (a) depicts an example of an overall flowchart of model refinement;
Fig. 4 (b) depicts an example of a detailed flowchart of the model refinement;
Fig. 5 (a) depicts an example of a UI shown on the display 117;
Fig. 5 (b) depicts an example of a UI shown on the display 117;
Fig. 6 (a) depicts an example of a 3D model 300 with a hole;
Fig. 6 (b) depicts an example of a tangent surface 301;
Fig. 6 (c) depicts an example of a 3D model 300 and corresponding part of the tangent surface 302 that will be merged to fill the hole;
Fig. 6 (d) depicts an example of a refined 3D model 303.
DESCRIPTION OF EMBODIMENTS
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. The described embodiments are merely some but not all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
A first embodiment of the present invention is a 3D human model reconstruction application on a mobile device. Fig. 1 depicts an example of a usage scene of the 3D human model reconstruction application on a mobile device 100, for example, a smart phone. A user 102 holding and operating the mobile device 100 scans a static target person 101. In Fig. 1, only a hand of the user 102 is shown. The target person 101 in Fig. 1 is drawn as a simplified human shape for convenience, but it represents an actual human. The subject is not limited to a human; it may be anything from a small object to a large one, such as a stuffed toy or a car. The user 102 is supposed to move around the target person 101 while keeping a camera 115 (Fig. 2) on the mobile device 100 pointed toward the target person 101 and operating a user interface (UI) on a display 117 (Fig. 2).
The term "scan" means capturing images of a subject from various directions. Ideally, enough images are captured to cover almost all of the surface of the subject; however, the images captured by non-professional users are often not sufficient. In many cases, 3D depth information of, for example, the top of the head, the armpits, and the crotch cannot be obtained, and holes are left on the reconstructed 3D model of the subject. Some of the reasons for the missing images are that it is difficult to capture the top of the head of the static target person 101 without moving to a higher position, and that the armpits and crotch are usually occluded by other parts of the body. Existing techniques for closing the holes are as follows:
Existing Method 1: Screened Poisson surface reconstruction (for example, refer to: Kazhdan, Michael, and Hugues Hoppe, "Screened Poisson surface reconstruction", ACM Transactions on Graphics (ToG) 32.3 (2013): 29), which assumes that a continuous implicit surface underlies the observed points, is widely used to make a 3D mesh from a set of points. Method 1 fills holes implicitly at the same time as meshing. However, Screened Poisson surface reconstruction often fails to naturally close large holes around locally steep geometry, and produces inflated artifacts that are bigger/fatter than the actual surface. As a simplified example, if there is a large hole on the surface of a sphere, the surface around the hole is extended in the tangent direction; as a result, the hole is closed with a cone-like shape rather than a part of the sphere, and this extended surface expands beyond the original spherical surface. After texture mapping, such inflated artifacts become even more noticeable and unsightly, because conspicuous background color is mapped onto them.
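For reference, a minimal sketch of this kind of meshing using the Open3D library follows; the library choice, parameters, and file names are assumptions, since the cited method is generic. When the input point set has large unobserved regions, the closed output surface shows exactly the inflated artifacts described above.

```python
# Hedged sketch of Existing Method 1 (screened Poisson meshing) via Open3D.
import open3d as o3d

pcd = o3d.io.read_point_cloud("scan_points.ply")   # hypothetical reconstructed point set
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=30))
pcd.orient_normals_consistent_tangent_plane(30)    # Poisson needs oriented normals

# Poisson reconstruction returns a closed (watertight) surface, so any large hole
# in the input is bridged by inflated geometry rather than left open.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=9)
o3d.io.write_triangle_mesh("poisson_mesh.ply", mesh)
```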
Existing Method 2: Hole filling (for example, refer to: Liepa, Peter, "Filling holes in meshes", Proceedings of the 2003 Eurographics/ACM SIGGRAPH symposium on Geometry processing, Eurographics Association, 2003) is also widely used to fill holes on a 3D model. Method 2 detects hole boundaries on a 3D model, parameterizes them, and finally polygonizes them. However, Method 2 is not robust in practice and sometimes fails to fill holes, or fills them unnaturally, because it cannot handle complex hole boundaries on a noisy mesh.
Existing Method 3: Visual Hull (for example, refer to: United States Patent Application, Publication No. US2015/0178988A1, "Method and a system for generating a realistic 3d reconstruction model for an object or being"), which reconstructs a 3D model from a plurality of silhouette images, is another approach to 3D reconstruction. Visual Hull is usually performed under well-calibrated settings. For instance, the subject is placed in a special room where a sufficient number of cameras are rigidly fixed and the walls and floor are covered with a distinct color so that accurate silhouettes of the subject can be extracted. Under such a lab setting, Visual Hull can reconstruct an accurate 3D model.
Method 3 describes a system in such a special room. It basically relies on Visual Hull, but enhances the fidelity of the face by fusing a high-resolution mesh obtained from structured-light based triangulation. A special smoothing method is applied to the boundary of the face to alleviate visible geometric steps caused by combining two independent meshes.
There are two main problems when the approach of Method 3 is used in a casual setup. First, nothing is well calibrated: a moving camera trajectory estimated by SfM or SLAM is prone to drift. A heuristic or machine-learning-based method can be applied to extract a silhouette of a subject in front of an unknown background, but the boundary of the silhouette will be noisy. A drifted camera trajectory and noisy silhouettes degrade the quality of the Visual Hull output. The second problem is that it is unclear which parts should be taken from Visual Hull and which from other methods, and how to identify the boundary region to be smoothed. Therefore, using the approach of Method 3 in a casual setting fails to generate a visually good 3D model.
In the present invention, one or more silhouettes of a subject are used to close holes on the 3D model. The silhouettes are useful for closing holes that appear in unobservable regions such as the top of the head, the crotch, or the armpits.
Fig. 2 depicts an example of a block diagram of a hardware configuration of the first embodiment. The mobile device 100 includes a CPU (Central Processing Unit) 110, a RAM (Random Access Memory) 111, a ROM (Read Only Memory) 112, a bus 113, an Input/Output I/F (Interface) 114, a display 117, and a touch panel 118. The mobile device 100 also has a camera 115 and a storage device 116 that are connected to the bus 113 via the Input/Output I/F 114. The CPU 110 controls each element connected through the bus 113. The RAM 111 is used as the main memory of the CPU 110, among other purposes. The ROM 112 stores the OS (Operating System), programs, device drivers, and so on. The camera 115 connected via the Input/Output I/F 114 captures still images or videos. The storage device 116 connected via the Input/Output I/F 114 is a large-capacity storage, for example, a hard disk or a flash memory. The Input/Output I/F 114 converts data captured by the camera 115 into an image format and stores it in the storage device 116. The display 117 shows a user interface. The touch panel 118 embedded in the display 117 accepts touch operations by the user 102 and transfers them to the CPU 110.
Fig. 3 depicts an example of a block diagram of a functional configuration of the first embodiment. The mobile device 100 includes a user interface control unit 120, an image acquisition unit 121, a silhouette extraction unit 122, a 3D reconstruction unit 123, a model refinement unit 124, and a storage unit 125.
The user interface control unit 120 controls a user interface shown on the display 117 according to the states of the other units and touch operations by the user 102 to the touch panel 118. For example, the user interface control unit 120 is realized by the CPU 110, the RAM 111, programs in the ROM 112, the bus 113, the display 117, and the touch panel 118.
The image acquisition unit 121 obtains a sequence of still images or a video from the camera 115, and stores it in the RAM 111 or the storage device 116. For example, the image acquisition unit 121 is realized by the CPU 110, the RAM 111, programs in the ROM 112, the bus  113, the Input/Output I/F 114, and the camera 115.
The silhouette extraction unit 122 extracts a silhouette of the target person 101 from a still image or a frame of video captured by the image acquisition unit 121 and stored in the RAM 111 or the storage unit 125. The silhouette extraction unit 122 could be implemented in various ways, for example, background subtraction or CNN (Convolutional Neural Network) . For example, the silhouette extraction unit 122 is realized by the CPU 110, the RAM 111, programs in the ROM 112, and the bus 113.
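As one possible (assumed) realization of the silhouette extraction unit 122, the sketch below uses OpenCV's GrabCut seeded with a rough bounding box around the subject; the embodiment may equally use background subtraction or a CNN, as stated above.

```python
# Hedged sketch of silhouette extraction with OpenCV GrabCut (illustrative only).
import cv2
import numpy as np

def extract_silhouette(image_bgr, rect):
    """Return a binary mask (255 = subject, 0 = background) from a rough box."""
    mask = np.zeros(image_bgr.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, mask, rect, bgd_model, fgd_model, 5,
                cv2.GC_INIT_WITH_RECT)
    fg = (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)   # definite or probable foreground
    return np.where(fg, 255, 0).astype(np.uint8)

# Hypothetical usage:
# frame = cv2.imread("frame_0001.png")
# silhouette = extract_silhouette(frame, rect=(50, 20, 400, 900))
```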
The 3D reconstruction unit 123 reconstructs a 3D model of the target person 101 from the sequence of still images or the video captured by the image acquisition unit 121 and stored in the RAM 111 or the storage unit 125. The 3D reconstruction unit 123 also estimates extrinsic parameters that define 3D rigid transformation between each image used for the reconstruction and the 3D model. The 3D reconstruction unit 123 could be implemented in various ways, for example, SfM (Structure-from-Motion) and MVS (Multi-View Stereo) for color images or KinectFusion for depth images. For example, the 3D reconstruction unit 123 is realized by the CPU 110, the RAM 111, programs in the ROM 112, and the bus 113.
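For clarity, the sketch below writes out the pinhole camera model that these intrinsic and extrinsic parameters define, together with its inverse ("unprojection"), which step S200 relies on later; the helper names project and unproject are illustrative, not taken from the patent.

```python
# Plain-numpy sketch of the pinhole projection defined by K (intrinsics) and R, t
# (extrinsics): world -> camera -> pixels, and the inverse mapping given depths.
import numpy as np

def project(points_3d, K, R, t):
    """Project Nx3 world points to Nx2 pixel coordinates (no lens distortion)."""
    cam = points_3d @ R.T + t              # rigid transform: world -> camera frame
    uvw = cam @ K.T                        # apply intrinsics -> homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]        # perspective division

def unproject(pixels, depths, K, R, t):
    """Map Nx2 pixels with per-pixel depths back to Nx3 world points."""
    ones = np.ones((pixels.shape[0], 1))
    rays = np.hstack([pixels, ones]) @ np.linalg.inv(K).T   # camera-frame rays, z = 1
    cam = rays * depths[:, None]                            # scale rays to the given depths
    return (cam - t) @ R                                    # camera -> world (R orthonormal)
```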
The model refinement unit 124 refines the 3D model reconstructed by the 3D reconstruction unit 123 to make a refined 3D model with one or more silhouettes selected by the user 102. The details will be described later. For example, the model refinement unit 124 is realized by the CPU 110, the RAM 111, programs in the ROM 112, and the bus 113.
The storage unit 125 stores the captured images and the refined 3D model into the storage device 116 for further use. For example, the storage unit 125 is realized by the Input/Output I/F 114 and the storage device 116.
The CPU 110 controls the above-mentioned units in this embodiment.
Fig. 4 (a) depicts an example of an overall flowchart of model refinement according to the first embodiment. Fig. 4 (b) depicts an example of a detailed flowchart of the model refinement according to the first embodiment. Each step of Figs. 4 (a) and 4 (b) is executed by the CPU 110, and data are stored in the RAM 111 or the storage device 116 and loaded from them as needed.
At step S100, the CPU 110 obtains an image sequence via the image acquisition unit 121 with the camera 115 and stores it in the RAM 111. It is assumed in this embodiment that the images are color images. The sequence could also be stored in the storage device 116 by the storage unit 125. Fig. 1 shows how the mobile device 100 is operated in this step. The user 102 holding and operating the mobile device 100 scans the static target person 101 as completely as possible. The user 102 is supposed to move around the target person 101 while keeping the camera 115 on the back of the mobile device 100 pointed toward the target person 101.
At step S101, the CPU 110 processes the image sequence obtained at step S100 to generate a 3D model. The 3D reconstruction unit 123 reconstructs the 3D model and estimates extrinsic camera parameters (mentioned above) and, if necessary, intrinsic camera parameters (mentioned later). All of the outputs of step S101 are stored in the RAM 111 or the storage device 116.
A complete 3D model is rarely reconstructed at S101 because an insufficient number of images is often captured at S100. At step S102, the UI on the display 117 requests the user 102 to select, from the image sequence, one or more images to which the user 102 wishes to fit the 3D model. The message "Select frontal view" in Fig. 5 (a) is merely an example; the user 102 is requested to select "one or more images" to be used for fitting the 3D model to the silhouettes of the subject that are extracted from those "one or more images". The UI is controlled by the user interface control unit 120.
Fig. 5 (a) shows the UI of this step. On the display 117, thumbnails of the image sequence 200 are shown. The images captured by the camera 115 are used for the thumbnails in Figs. 5 (a) and 5 (b) (the face of the person in the thumbnails in Figs. 5 (a) and 5 (b) has been obscured for privacy protection because this patent application document will be made public). In Fig. 5 (a), the upper-left image is a photo of a person standing in a room captured from the front; the upper-right image, the middle-left image, the middle-right image, and the lower-right image are captured from the rear right, from the back, from the left, and from the front right, respectively; and the lower-left image is a photo of the lower body of the person captured from the front right. The user 102 is supposed to select one or more frames corresponding to the thumbnails by a touch operation. If there are too many images to show on the display 117 at one time, a next page button 201 is shown to change the thumbnails 200 so that the other images can be shown. After the user 102 selects one of the images, for example, the upper-left image, the UI changes and the selected image 202 is displayed as shown in Fig. 5 (b).
At step S103, the silhouette extraction unit 122 extracts the silhouette of the target person 101 from the frames selected at step S102.
At step S104, the user 102 checks whether the silhouette extraction result shown on the display 117 is acceptable or not in terms of silhouette accuracy. Fig. 5 (b) depicts an example of the UI at step S104. The selected image 202 and the corresponding extracted silhouette 203 are shown. The silhouette is shown in white and the background is shown in black. There may be cases where a wrong silhouette is extracted because of an algorithm error. The user 102 taps one of the response buttons 204, namely "OK" or "NG", to accept or reject the extracted silhouette. After the user 102 responds, the UI continues to show another selected image and the corresponding silhouette. Once all of the one or more selected images and corresponding silhouettes have been checked by the user 102, the process goes to the next step: if at least one silhouette does not have acceptable quality (the user 102 responded "NG" at least once), the process goes to step S105; if all silhouettes are acceptable (the user 102 responded "OK" for all of the one or more selected images and corresponding silhouettes), it goes to step S107.
The above-mentioned UI interaction may be eliminated by automatically selecting one or more images for silhouette refinement.
All or part of the process at step S104 could be performed in advance or integrated into earlier steps. For example, at step S100, while the user 102 is capturing the image sequence, the UI could show the corresponding silhouette and the response buttons for each captured image in real time. In this case, the user 102 could select one or more images and check the corresponding silhouettes during step S100.
At step S105, the UI asks whether the user 102 wishes to select other images from the existing image sequence (the image sequence obtained at step S100). If yes, the process goes back to step S102; if no, it goes to step S106.
At step S106, the user 102 captures another image sequence in the same way as at step S100. After capturing, camera parameters for each additional image are estimated in the same way as at step S101. Images captured at this step are merged into the existing image sequence. Then, the process goes back to step S102.
At step S107, the model refinement unit 124 refines the 3D model reconstructed at step S101 by using the silhouettes extracted at step S103 and confirmed by the user 102 at step S104. The details of the model refinement are shown in Fig. 4 (b) and will be explained later. The refined 3D model is stored in the RAM 111 or the storage device 116 for further use, for instance, in 3D model viewers or Augmented Reality applications.
To summarize the user operations other than capturing the image sequence: the user operating the device selects one or more images of the subject from the image sequence that the user is capturing and/or has already captured. The device according to the present invention then extracts one or more silhouettes of the subject from the selected images and asks the user whether the silhouettes are accurate. If they are, the device refines the reconstructed 3D model based on the silhouettes to close holes on it. If not, the user is requested to select other images from the image sequence or to capture additional images.
Next, the details of the model refinement by the model refinement unit 124 are described with reference to the flowchart in Fig. 4 (b). The model refinement unit 124 refines the 3D model reconstructed by the 3D reconstruction unit 123 to make a refined 3D model with the one or more silhouettes selected by the user 102. The 3D model often has large holes caused by the insufficient number of input images. Fig. 6 (a) depicts an example of a 3D model 300 of a human with a hole on the top of the head, caused by the difficulty of capturing that region in a casual scan. Fig. 6 (a) shows a simplified 3D model of a human only from the shoulders up, viewed from obliquely above. The model refinement unit 124 fills this hole by using the one or more silhouettes.
At step S200, the model refinement unit 124 computes a set of 3D curved tangent surfaces from each silhouette based on the principle of perspective projection and camera parameters, in the same spirit as Visual Hull. Various methods could be used at this step. For example, a signed distance function on a voxel grid set to cover the 3D model is updated by unprojecting the pixels inside a silhouette ("projection" means mapping 3D points to 2D points on an image, and "unprojection" means the reverse, namely, mapping 2D points on an image back to 3D points) to make an implicit tangent surface of the silhouette. Then marching cubes is applied to extract the tangent surface. The tangent surface should cover the holes if the 3D model and the silhouette are sufficiently accurate. Such a tangent surface 301 is shown in Fig. 6 (b), in which the white part corresponds to the silhouette, and the dark gray part corresponds to a possible surface calculated from the silhouette. This process is applied to all of the one or more silhouettes. To perform the unprojection in the same coordinate system as the 3D model, camera parameters are required; they are already obtained before this step S200 is performed. In the first embodiment, the camera 115 has a standard field of view and can be approximated by a pinhole camera model. The intrinsic camera parameters, which are the focal length and principal point, and the distortion coefficients could be calibrated independently before the application runs or estimated by the 3D reconstruction unit 123. The extrinsic camera parameters are estimated by the 3D reconstruction unit 123.
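A hedged sketch of step S200 is given below, assuming a fixed-resolution voxel grid, a simple binary inside/outside field in place of a full signed distance function, marching cubes from scikit-image, and the project helper from the earlier pinhole-camera sketch; the grid bounds and resolution are illustrative.

```python
# Hedged sketch of step S200: carve the silhouette's viewing cone into a volume
# covering the 3D model and extract the tangent surface with marching cubes.
import numpy as np
from skimage import measure

def tangent_surface_from_silhouette(silhouette, K, R, t, bounds, res=128):
    (xmin, ymin, zmin), (xmax, ymax, zmax) = bounds        # box around the 3D model
    xs = np.linspace(xmin, xmax, res)
    ys = np.linspace(ymin, ymax, res)
    zs = np.linspace(zmin, zmax, res)
    gx, gy, gz = np.meshgrid(xs, ys, zs, indexing="ij")
    voxels = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)

    uv = project(voxels, K, R, t)                          # project() from the earlier sketch
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, silhouette.shape[1] - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, silhouette.shape[0] - 1)
    inside = silhouette[v, u] > 0                          # voxel falls inside the silhouette

    # +1 inside the viewing cone, -1 outside: the zero level set is the tangent surface.
    field = np.where(inside, 1.0, -1.0).reshape(res, res, res)
    verts, faces, _, _ = measure.marching_cubes(field, level=0.0)

    # marching_cubes returns voxel-index coordinates; map them back to world units.
    scale = np.array([xmax - xmin, ymax - ymin, zmax - zmin]) / (res - 1)
    return verts * scale + np.array([xmin, ymin, zmin]), faces
```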
At step S201, the model refinement unit 124 calculates the parts of the tangent surfaces that lie over the holes in order to close them. In this step, the locations of the holes are identified. Various methods could be used for this step. For instance, Poisson surface reconstruction (refer to "Existing Method 1" mentioned earlier) is applied to the 3D model to make a closed surface with inflated artifacts over the holes. By projecting vertices or faces of the closed surface onto the silhouettes and checking whether the projected positions fall outside the silhouettes, the inflated-artifact parts of the closed surface are determined. Such outside parts lie over the holes. Then a nearest-neighbor search from the outside parts to the tangent surfaces is used to find the corresponding parts of the tangent surfaces that close the holes. Such a part of a tangent surface 302 is shown in Fig. 6 (c).
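The sketch below illustrates one assumed realization of this selection: Poisson-mesh vertices that project outside a confirmed silhouette are treated as inflated artifacts over holes, and a nearest-neighbor query (scipy's cKDTree) picks the tangent-surface vertices that should patch them; the distance threshold and the project helper are illustrative.

```python
# Hedged sketch of step S201: locate hole regions via the silhouettes and pick
# the matching part of the tangent surface with a nearest-neighbor search.
import numpy as np
from scipy.spatial import cKDTree

def find_hole_patch(poisson_verts, tangent_verts, silhouettes_with_cams, max_dist=0.02):
    """Return indices of tangent-surface vertices lying over the holes."""
    outside = np.zeros(len(poisson_verts), dtype=bool)
    for silhouette, (K, R, t) in silhouettes_with_cams:
        uv = project(poisson_verts, K, R, t)               # project() from the earlier sketch
        u = np.clip(np.round(uv[:, 0]).astype(int), 0, silhouette.shape[1] - 1)
        v = np.clip(np.round(uv[:, 1]).astype(int), 0, silhouette.shape[0] - 1)
        outside |= silhouette[v, u] == 0                    # inflated vertex: outside silhouette

    tree = cKDTree(tangent_verts)                           # tangent-surface vertices
    dists, idx = tree.query(poisson_verts[outside])
    return np.unique(idx[dists < max_dist])
```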
At step S202, the parts of the tangent surfaces calculated at step S201 are merged into the 3D model to make a refined 3D model. The merged surface of the refined 3D model 303 is shown in Fig. 6 (d).
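A minimal sketch of the merge is shown below; it simply appends the selected tangent-surface patch to the original mesh by concatenating the vertex arrays and re-indexing the patch faces, leaving the stitching of the seam between the two parts out of scope.

```python
# Hedged sketch of step S202: append the hole-closing patch to the original mesh.
import numpy as np

def merge_meshes(verts_a, faces_a, verts_b, faces_b):
    verts = np.vstack([verts_a, verts_b])
    faces = np.vstack([faces_a, faces_b + len(verts_a)])   # offset the patch's indices
    return verts, faces
```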
According to the embodiment of the present invention, holes on the 3D model are closed and aligned with the silhouettes of the subject. A closed surface is not only visually pleasing but also important for further applications of the 3D model, because many computer graphics and computer vision algorithms assume that the input surface is closed.
What is disclosed above is merely exemplary embodiments of the present invention, and certainly is not intended to limit the protection scope of the present invention. A person of ordinary skill in the art may understand that all or some of processes that implement the foregoing embodiments and equivalent modifications made in accordance with the claims of the present invention shall fall within the scope of the present invention.

Claims (6)

  1. A device, comprising:
    a camera for capturing an image sequence of a subject,
    a three dimension (3D) reconstruction unit for reconstructing a 3D model from the image sequence, and
    a model refinement unit for refining the 3D model so as to be fitted to one or more images selected by a user from the image sequence.
  2. The device according to claim 1, wherein the 3D model is refined based on one or more silhouettes of the subject that are extracted from the one or more selected images.
  3. The device according to claim 2, further comprising:
    a user interface unit for showing one or more silhouettes of the subject that are extracted from the one or more selected images, and making the user check whether the one or more silhouettes are accurate or not.
  4. The device according to claim 2, wherein the 3D model is reconstructed as a set of points, and holes on the 3D model are closed by one or more parts of a set of tangent surfaces computed from the one or more silhouettes, wherein the one or more parts of the set of tangent surfaces are inside a 3D model reconstructed as a 3D mesh from the set of points.
  5. A method performed by a device, comprising:
    capturing an image sequence of a subject,
    reconstructing a three dimension (3D) model from the image sequence, and
    refining the 3D model so as to be fitted to one or more images selected by a user from the image sequence.
  6. A computer readable storage medium storing a program thereon, wherein when the program is executed by a processor, the program causes the processor to perform the method according to claim 5.
PCT/CN2019/126298 2019-12-18 2019-12-18 3d reconstruction from an insufficient number of images WO2021120052A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/126298 WO2021120052A1 (en) 2019-12-18 2019-12-18 3d reconstruction from an insufficient number of images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/126298 WO2021120052A1 (en) 2019-12-18 2019-12-18 3d reconstruction from an insufficient number of images

Publications (1)

Publication Number Publication Date
WO2021120052A1 (en)

Family

ID=76476978

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/126298 WO2021120052A1 (en) 2019-12-18 2019-12-18 3d reconstruction from an insufficient number of images

Country Status (1)

Country Link
WO (1) WO2021120052A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1308902A2 (en) * 2001-11-05 2003-05-07 Canon Europa N.V. Three-dimensional computer modelling
WO2009006273A2 (en) * 2007-06-29 2009-01-08 3M Innovative Properties Company Synchronized views of video data and three-dimensional model data
US20140111507A1 (en) * 2012-10-23 2014-04-24 Electronics And Telecommunications Research Institute 3-dimensional shape reconstruction device using depth image and color image and the method
CN104282040A (en) * 2014-09-29 2015-01-14 北京航空航天大学 Finite element preprocessing method for reconstructing three-dimensional entity model
CN109242954A (en) * 2018-08-16 2019-01-18 叠境数字科技(上海)有限公司 Multi-view angle three-dimensional human body reconstruction method based on template deformation
CN109658449A (en) * 2018-12-03 2019-04-19 华中科技大学 A kind of indoor scene three-dimensional rebuilding method based on RGB-D image

Similar Documents

Publication Publication Date Title
US11210838B2 (en) Fusing, texturing, and rendering views of dynamic three-dimensional models
WO2020192706A1 (en) Object three-dimensional model reconstruction method and device
EP3323249B1 (en) Three dimensional content generating apparatus and three dimensional content generating method thereof
KR101560508B1 (en) Method and arrangement for 3-dimensional image model adaptation
US9886530B2 (en) Computing camera parameters
KR101613721B1 (en) Methodology for 3d scene reconstruction from 2d image sequences
JP6685827B2 (en) Image processing apparatus, image processing method and program
Shen et al. Virtual mirror rendering with stationary rgb-d cameras and stored 3-d background
EP3429195A1 (en) Method and system for image processing in video conferencing for gaze correction
JP2018530045A (en) Method for 3D reconstruction of objects from a series of images, computer-readable storage medium and apparatus configured to perform 3D reconstruction of objects from a series of images
EP2089852A1 (en) Methods and systems for color correction of 3d images
Slabaugh et al. Image-based photo hulls
WO2021078179A1 (en) Image display method and device
CN113628327A (en) Head three-dimensional reconstruction method and equipment
US20220277512A1 (en) Generation apparatus, generation method, system, and storage medium
CN113516755A (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN111742352A (en) 3D object modeling method and related device and computer program product
US9998724B2 (en) Image processing apparatus and method for processing three-dimensional data that describes a space that includes physical objects
WO2021120052A1 (en) 3d reconstruction from an insufficient number of images
Lim et al. 3-D reconstruction using the kinect sensor and its application to a visualization system
Ha et al. Normalfusion: Real-time acquisition of surface normals for high-resolution rgb-d scanning
EP3236422A1 (en) Method and device for determining a 3d model
Lee et al. Panoramic mesh model generation from multiple range data for indoor scene reconstruction
CN115272604A (en) Stereoscopic image acquisition method and device, electronic equipment and storage medium
Savakar et al. A relative 3D scan and construction for face using meshing algorithm

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19956820

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19956820

Country of ref document: EP

Kind code of ref document: A1