CN109074658B - Method for 3D multi-view reconstruction by feature tracking and model registration - Google Patents

Method for 3D multi-view reconstruction by feature tracking and model registration

Info

Publication number
CN109074658B
Authority
CN
China
Prior art keywords
images
stereoscopic images
image
multiview
multiview image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201780015282.4A
Other languages
Chinese (zh)
Other versions
CN109074658A (en)
Inventor
K-K. A. Huang
Ming-Chang Liu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Corp
Original Assignee
Sony Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Corp filed Critical Sony Corp
Publication of CN109074658A
Application granted
Publication of CN109074658B
Legal status: Active (current)
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/282 Image signal generators for generating image signals corresponding to three or more geometrical viewpoints, e.g. multi-view systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G06T7/579 Depth or shape recovery from multiple images from motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G06T7/593 Depth or shape recovery from multiple images from stereo images
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/111 Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/128 Adjusting depth or disparity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/204 Image signal generators using stereoscopic image cameras
    • H04N13/239 Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/204 Image signal generators using stereoscopic image cameras
    • H04N13/243 Image signal generators using stereoscopic image cameras using three or more 2D image sensors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G06T2207/10021 Stereoscopic video; Stereoscopic image sequence
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N2013/0074 Stereoscopic image analysis
    • H04N2013/0081 Depth or disparity estimation from stereoscopic image signals

Abstract

A 3D multi-view reconstruction method takes a series of 2D stereoscopic images from a narrow-field imager (e.g., a camera) and reconstructs a 3D representation of a wide-field object or scene. The 3D multiview reconstruction method tracks 2D image pixels across adjacent frames and constrains frame integration through 3D model building.

Description

Method for 3D multi-view reconstruction by feature tracking and model registration
Cross Reference to Related Applications
This application claims priority under 35 U.S.C. § 119(e) to U.S. provisional patent application Serial No. 62/305,696, entitled "METHOD FOR 3D MULTIVIEW RECONSTRUCTION BY FEATURE TRACKING AND MODEL REGISTRATION," filed March 9, 2016, the entire contents of which are incorporated herein by reference for all purposes.
Technical Field
The present invention relates to imaging analysis. More particularly, the invention relates to 3D imaging analysis.
Background
Typical human vision is limited by its field of view, which usually covers only a narrow portion of the entire scene. To extend the field of view for broader awareness of an environment or object, the human brain reconstructs the environment from several views. Similarly, in computer vision, a typical imager/camera captures only a limited portion of the object or scene of interest. By providing several shots from different views, a machine gains a more extensive 3D understanding of the world around it.
Disclosure of Invention
A 3D multi-view reconstruction method takes a series of 2D stereoscopic images from a narrow-field imager (e.g., a camera) and reconstructs a 3D representation of a wide-field object or scene. The 3D multiview reconstruction method tracks 2D image pixels across adjacent frames and constrains frame integration through 3D model building.
In one aspect, a method programmed in a non-transitory memory of a device comprises: acquiring one or more 2D stereoscopic images; reconstructing a 3D multiview image using the one or more 2D stereoscopic images; and displaying the 3D multiview image. Acquiring the one or more 2D stereoscopic images includes capturing the one or more 2D stereoscopic images with a plurality of cameras. Acquiring the one or more 2D stereoscopic images includes downloading the one or more 2D stereoscopic images from a server. The one or more 2D stereoscopic images have a narrow field of view. Reconstructing the 3D multiview image includes: modifying the one or more images, estimating disparity between the images, enabling 2D pixel tracking across adjacent frames, estimating a 3D pose by mapping the tracked 2D pixels to 3D coordinates, and constructing a 3D model based on the 3D pose. Reconstructing the 3D multiview image includes: selecting a frame of the one or more images to accelerate; performing a noise check to discard an image of the one or more images if the noise of the image is above a threshold; and selecting a region of interest in the one or more images to accelerate. The 3D multiview image includes a 3D surface. The 3D multiview image includes a 3D point cloud.
In another aspect, an apparatus includes: a non-transitory memory for storing an application, the application for: acquiring one or more 2D stereoscopic images, reconstructing a 3D multiview image using the one or more 2D stereoscopic images, and displaying the 3D multiview image; and a processing component coupled to the memory, the processing component configured to process the application. Acquiring the one or more 2D stereoscopic images includes capturing the one or more 2D stereoscopic images with a plurality of cameras. Acquiring the one or more 2D stereoscopic images includes downloading the one or more 2D stereoscopic images from a server. The one or more 2D stereoscopic images have a narrow field of view. Reconstructing the 3D multiview image includes: modifying the one or more images, estimating disparity between the images, enabling 2D pixel tracking across adjacent frames, estimating a 3D pose by mapping the tracked 2D pixels to 3D coordinates, and constructing a 3D model based on the 3D pose. Reconstructing the 3D multiview image includes: selecting a frame of the one or more images to accelerate, performing a noise check to discard an image of the one or more images if the noise of the image is above a threshold, and selecting a region of interest in the one or more images to accelerate. The 3D multiview image includes a 3D surface. The 3D multiview image includes a 3D point cloud.
In another aspect, a system includes: a plurality of cameras for acquiring one or more 2D stereoscopic images; and a computing device configured to: receive the one or more 2D stereoscopic images, reconstruct a 3D multiview image using the one or more 2D stereoscopic images, and display the 3D multiview image. Acquiring the one or more 2D stereoscopic images includes capturing the one or more 2D stereoscopic images with the plurality of cameras. Acquiring the one or more 2D stereoscopic images includes downloading the one or more 2D stereoscopic images from a server. The one or more 2D stereoscopic images have a narrow field of view. Reconstructing the 3D multiview image includes: modifying the one or more images, estimating disparity between the images, enabling 2D pixel tracking across adjacent frames, estimating a 3D pose by mapping the tracked 2D pixels to 3D coordinates, and constructing a 3D model based on the 3D pose. Reconstructing the 3D multiview image includes: selecting a frame of the one or more images to accelerate, performing a noise check to discard an image of the one or more images if the noise of the image is above a threshold, and selecting a region of interest in the one or more images to accelerate. The 3D multiview image includes a 3D surface. The 3D multiview image includes a 3D point cloud.
Drawings
Fig. 1 illustrates a flow diagram of a 3D multi-view reconstruction method according to some embodiments.
Fig. 2 illustrates a diagram for reconstructing a 3D multiview image using a 2D stereo image, according to some embodiments.
FIG. 3 illustrates exemplary results of multi-view reconstruction in accordance with certain embodiments.
Fig. 4 illustrates a block diagram of an exemplary computing device configured to implement a 3D multi-view reconstruction method in accordance with certain embodiments.
FIG. 5 illustrates a diagram of a network of devices, according to some embodiments.
Detailed Description
A 3D multi-view reconstruction method acquires a series of 2D stereoscopic images from a narrow-field imager (e.g., a camera) and reconstructs a 3D representation of a wide-field object or scene.
Fig. 1 illustrates a flow diagram of a 3D multi-view reconstruction method according to some embodiments. In step 100, 2D stereoscopic images are acquired. The 2D stereoscopic images may be acquired in any manner, such as capturing them with one or more cameras or downloading them from a device such as a server (e.g., in a cloud system or on the Internet). In certain embodiments, the 2D stereoscopic images have a narrow field of view (FOV). The 2D stereoscopic images are taken from multiple angles of a desired object or scene (e.g., stereoscopic images from various angles so as to capture each side of the object: top, bottom, left, right, front, back, and/or angles in between). In some embodiments, the 2D images are extracted from a video. In step 102, a 3D multiview image is reconstructed using the 2D stereoscopic images. In certain embodiments, the 3D representation has a wide FOV. In step 104, the 2D or 3D image is displayed and/or analyzed. In certain embodiments, fewer or additional steps are implemented. In certain embodiments, the order of the steps is modified.
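As a concrete illustration of step 100, the following is a minimal Python/OpenCV sketch of acquiring synchronized 2D stereoscopic frames; it is not part of the patent disclosure, and the capture sources (device indices 0 and 1) and the frame limit are placeholder assumptions.

```python
import cv2

def acquire_stereo_frames(left_src=0, right_src=1, max_frames=100):
    """Grab synchronized left/right frame pairs from two capture sources.

    The sources may be camera indices or video file paths; the defaults
    (0 and 1) are placeholders for whatever devices are actually attached.
    """
    cap_l, cap_r = cv2.VideoCapture(left_src), cv2.VideoCapture(right_src)
    frames = []
    while len(frames) < max_frames:
        ok_l, left = cap_l.read()
        ok_r, right = cap_r.read()
        if not (ok_l and ok_r):
            break  # a stream ended or a frame grab failed
        frames.append((left, right))
    cap_l.release()
    cap_r.release()
    return frames
```

Passing video file paths instead of device indices covers the embodiment in which the 2D images are extracted from a video.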
Fig. 2 illustrates a diagram for reconstructing a 3D multiview image using 2D stereoscopic images, according to some embodiments. 2D stereoscopic input frames (narrow FOV) are received. In step 200, frames are selected for acceleration. In certain embodiments, step 200 is optional. In step 202, the images (or the selected frames) are rectified, projecting the images onto a common image plane. In step 204, the disparity between the images is estimated. In step 206, the 3D representation is generated by reprojection. In some embodiments, steps 202-206 are part of a standard 3D stereoscopic reconstruction pipeline. In step 208, a noise check is implemented. The noise check can be implemented in any manner, such as determining the amount of noise in a frame, comparing that amount to a threshold, and discarding the frame if the amount of noise is above the threshold. In certain embodiments, step 208 is optional. In certain embodiments, after step 208, the process proceeds to step 214. In step 210, a region of interest (ROI) in the image is selected. For example, only a portion of the frame (e.g., the ROI) is selected for further processing. In certain embodiments, step 210 is optional. In step 212, 2D pixel tracking (e.g., optical flow) across adjacent frames is implemented. For example, by tracking the movement of pixels in 2D, the direction in which a 2D object is moving can be determined. In step 214, the 3D pose is estimated. In some embodiments, the tracked 2D points are mapped to 3D coordinates for 3D pose estimation. The 3D pose is estimated from the tracked 2D features by a random sample consensus (RANSAC) algorithm using a 3D rigid transformation (rotation, translation). RANSAC is modified by checking for errors in frame-to-model registration. In step 216, a 3D model is constructed based on the estimated 3D poses. The 3D model is constructed by sequentially integrating the input frames using all of the transformations; for example, the 3D poses are combined to generate the 3D model. Depending on the application, the desired 3D output is constructed by 3D surface rendering or by sampling. In certain embodiments, the 3D model is constructed differently. In step 218, a 3D surface is generated. In step 220, a 3D point cloud is generated. In certain embodiments, either or both of steps 218 and 220 are optional. In certain embodiments, fewer or additional steps are implemented. In certain embodiments, the order of the steps is modified.
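For illustration only, the sketch below approximates steps 204 through 214 with off-the-shelf components: semi-global block matching for disparity, reprojection through a calibration matrix Q, sparse Lucas-Kanade optical flow for 2D pixel tracking, and a RANSAC loop around a Kabsch (SVD) fit for the rigid 3D pose. The patent does not prescribe these particular algorithms, parameters, or function names; all of them (including Q, the thresholds, and the helper names) are assumptions.

```python
import cv2
import numpy as np

def disparity_and_points(rect_l, rect_r, Q):
    """Steps 204-206: dense disparity on rectified images, reprojected to 3D."""
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
    disp = sgbm.compute(cv2.cvtColor(rect_l, cv2.COLOR_BGR2GRAY),
                        cv2.cvtColor(rect_r, cv2.COLOR_BGR2GRAY)).astype(np.float32) / 16.0
    return disp, cv2.reprojectImageTo3D(disp, Q)  # HxWx3 points in the left-camera frame

def track_pixels(prev_gray, cur_gray, prev_pts):
    """Step 212: sparse Lucas-Kanade optical flow across adjacent frames."""
    cur_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, prev_pts, None)
    good = status.ravel() == 1
    return prev_pts[good].reshape(-1, 2), cur_pts[good].reshape(-1, 2)

def kabsch(A, B):
    """Least-squares rotation R and translation t mapping point set A onto B."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    U, _, Vt = np.linalg.svd((A - ca).T @ (B - cb))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, cb - R @ ca

def rigid_ransac(src, dst, iters=200, tol=0.02):
    """Step 214: RANSAC around a rigid (rotation + translation) fit of src -> dst."""
    best = (np.eye(3), np.zeros(3), np.zeros(len(src), dtype=bool))
    for _ in range(iters):
        idx = np.random.choice(len(src), 3, replace=False)
        R, t = kabsch(src[idx], dst[idx])
        inliers = np.linalg.norm(src @ R.T + t - dst, axis=1) < tol
        if inliers.sum() >= 3 and inliers.sum() > best[2].sum():
            R, t = kabsch(src[inliers], dst[inliers])  # refit on all inliers
            best = (R, t, inliers)
    return best
```

Step 216 can then be realized by chaining the per-frame rigid transforms into a cumulative pose and accumulating the reprojected points in a common coordinate frame, from which the 3D surface (step 218) or sampled point cloud (step 220) is produced; a driver loop along those lines is sketched later in this description.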
FIG. 3 illustrates exemplary results of multi-view reconstruction in accordance with certain embodiments. The 3D multi-view reconstruction method uses the 2D stereo image 300 to generate a 3D reconstructed model 302 and/or a sampled 3D point cloud 304 as described herein.
Fig. 4 illustrates a block diagram of an exemplary computing device configured to implement the 3D multi-view reconstruction method in accordance with certain embodiments. Computing device 400 can be used to acquire, store, compute, process, communicate, and/or display information such as images and videos. In general, a hardware architecture suitable for implementing the computing device 400 includes a network interface 402, memory 404, a processor 406, I/O device(s) 408, a bus 410, and a storage device 412. Any suitable processor with sufficient speed can be chosen. The memory 404 can be any conventional computer memory known in the art. The storage device 412 can include a hard drive, CDROM, CDRW, DVD, DVDRW, high definition disc/drive, ultra high definition drive, flash memory card, or any other storage device. The computing device 400 can include one or more network interfaces 402. Examples of network interfaces include a network card connected to an Ethernet or other type of LAN. The I/O device(s) 408 can include one or more of the following: keyboard, mouse, monitor, screen, printer, modem, touch screen, button interface, and other devices. A 3D multi-view reconstruction application 430 used to implement the 3D multi-view reconstruction method can be stored in the storage device 412 and memory 404 and processed as applications are typically processed. More or fewer components than shown in Fig. 4 can be included in the computing device 400. In certain embodiments, 3D multi-view reconstruction hardware 420 is included. Although the computing device 400 in Fig. 4 includes the application 430 and the hardware 420 for the 3D multi-view reconstruction method, the 3D multi-view reconstruction method can be implemented on a computing device in hardware, firmware, software, or any combination thereof. For example, in certain embodiments, the 3D multi-view reconstruction application 430 is programmed in memory and executed using a processor. In another example, in certain embodiments, the 3D multi-view reconstruction hardware 420 is programmed hardware logic including gates specifically designed to implement the 3D multi-view reconstruction method.
In certain embodiments, the 3D multiview reconstruction application 430 includes several applications and/or modules. In certain embodiments, the modules further include one or more sub-modules. In some embodiments, fewer or additional modules can be included.
Examples of suitable computing devices include personal computers, laptop computers, computer workstations, servers, mainframe computers, handheld computers, personal digital assistants, cellular/mobile phones, smart devices, game consoles, digital cameras, digital camcorders, camera phones, smart phones, portable music players, tablet computers, mobile devices, video players, video recorders/players (e.g., DVD recorders/players, high definition disc recorders/players, ultra high definition disc recorders/players), televisions, home entertainment systems, augmented reality devices, virtual reality devices, smart accessories (e.g., smart watches), or any other suitable computing device.
FIG. 5 illustrates a diagram of a network of devices, according to some embodiments. Multiple cameras 500 (e.g., stereo pairs) are utilized to acquire image/video content. The image/video content is sent to computing device 504 over network 502 (e.g., the internet, cellular network, or any other network). In some embodiments, the content is sent directly to the computing device without a network. The computing device 504 is configured to perform 3D multiview reconstruction as described herein. Computing device 504 may be any device, such as a server, a personal computer, a smartphone, or any device described herein or any combination of devices described herein. In some embodiments, the computing device 504 is one or more of the plurality of cameras 500. In other words, the camera 500 implements a 3D multiview reconstruction method.
To utilize the 3D multiview reconstruction methods described herein, a stereo pair of cameras is used to acquire images. The images are then processed using 3D multiview reconstruction. The process can be automated without human intervention.
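Under the assumption that helper functions like those sketched above exist, along with a rectification callable and the reprojection matrix Q from stereo calibration, a hypothetical automated driver might look like the following; it is a sketch of one possible arrangement, not the patented implementation.

```python
import cv2
import numpy as np

def reconstruct(frames, rectify, Q):
    """Hypothetical pipeline: rectify, reproject, track, register, and integrate frames."""
    cloud = []
    Rg, tg = np.eye(3), np.zeros(3)          # pose of the current frame in world coordinates
    prev_gray = prev_xyz = None
    for left, right in frames:
        rect_l, rect_r = rectify(left, right)                     # step 202: common image plane
        gray = cv2.cvtColor(rect_l, cv2.COLOR_BGR2GRAY)
        disp, xyz = disparity_and_points(rect_l, rect_r, Q)       # steps 204-206
        if prev_gray is not None:
            seeds = cv2.goodFeaturesToTrack(prev_gray, 500, 0.01, 7)
            p0, p1 = track_pixels(prev_gray, gray, seeds)         # step 212: 2D tracking
            rows0, cols0 = p0[:, 1].astype(int), p0[:, 0].astype(int)
            rows1 = np.clip(p1[:, 1].astype(int), 0, xyz.shape[0] - 1)
            cols1 = np.clip(p1[:, 0].astype(int), 0, xyz.shape[1] - 1)
            src, dst = xyz[rows1, cols1], prev_xyz[rows0, cols0]  # map 2D tracks to 3D
            ok = np.isfinite(src).all(axis=1) & np.isfinite(dst).all(axis=1)
            R, t, _ = rigid_ransac(src[ok], dst[ok])              # step 214: current -> previous
            Rg, tg = Rg @ R, Rg @ t + tg                          # chain into the world pose
        keep = np.isfinite(xyz).all(axis=2) & (disp > 0)
        cloud.append(xyz[keep] @ Rg.T + tg)                       # step 216: integrate the frame
        prev_gray, prev_xyz = gray, xyz
    return np.concatenate(cloud, axis=0)                          # sampled 3D point cloud (step 220)
```

Once the cameras are calibrated, a loop of this kind runs without human intervention, which matches the automated operation described above.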
In operation, the 3D multiview reconstruction method enables efficient conversion of 2D images with a narrow FOV into 3D content. Better results are achieved by tracking in 2D for speed and building the model in 3D for accuracy. The 3D multi-view reconstruction method can be used in 3D object and scene modeling for augmented reality, 3D printing, virtual reality, and localization and mapping.
Some embodiments of a method for 3D multi-view reconstruction by feature tracking and model registration
1. A method programmed in a non-transitory memory of a device, comprising:
acquiring one or more 2D stereoscopic images;
reconstructing a 3D multiview image using the one or more 2D stereoscopic images; and
displaying the 3D multiview image.
2. The method of clause 1, wherein acquiring the one or more 2D stereoscopic images comprises capturing the one or more 2D stereoscopic images with a plurality of cameras.
3. The method of clause 1, wherein obtaining the one or more 2D stereoscopic images comprises downloading the one or more 2D stereoscopic images from a server.
4. The method of clause 1, wherein the one or more 2D stereoscopic images have a narrow field of view.
5. The method of clause 1, wherein reconstructing the 3D multiview image comprises:
modifying the one or more images;
estimating disparity between the images;
enabling 2D pixel tracking across adjacent frames;
estimating a 3D pose by mapping the tracked 2D pixels to 3D coordinates; and
constructing a 3D model based on the 3D pose.
6. The method of clause 5, wherein reconstructing the 3D multiview image comprises:
selecting a frame of the one or more images to accelerate;
performing a noise check to discard an image of the one or more images if the noise of the image is above a threshold; and
selecting a region of interest in the one or more images to accelerate.
7. The method of clause 1, wherein the 3D multiview image comprises a 3D surface.
8. The method of clause 1, wherein the 3D multiview image comprises a 3D point cloud.
9. An apparatus, comprising:
a non-transitory memory for storing an application, the application to:
acquiring one or more 2D stereoscopic images;
reconstructing a 3D multiview image using the one or more 2D stereoscopic images; and
displaying the 3D multiview image; and
a processing component coupled to the memory, the processing component configured to process the application.
10. The apparatus of clause 9, wherein acquiring the one or more 2D stereoscopic images comprises capturing the one or more 2D stereoscopic images with a plurality of cameras.
11. The apparatus of clause 9, wherein obtaining the one or more 2D stereoscopic images comprises downloading the one or more 2D stereoscopic images from a server.
12. The apparatus of clause 9, wherein the one or more 2D stereoscopic images have a narrow field of view.
13. The apparatus of clause 9, wherein reconstructing the 3D multiview image comprises:
modifying the one or more images;
estimating disparity between the images;
enabling 2D pixel tracking across adjacent frames;
estimating a 3D pose by mapping the tracked 2D pixels to 3D coordinates; and
constructing a 3D model based on the 3D pose.
14. The apparatus of clause 13, wherein reconstructing the 3D multiview image comprises:
selecting a frame of the one or more images to accelerate;
performing a noise check to discard an image of the one or more images if the noise of the image is above a threshold; and
selecting a region of interest in the one or more images to accelerate.
15. The apparatus of clause 9, wherein the 3D multiview image comprises a 3D surface.
16. The apparatus of clause 9, wherein the 3D multiview image comprises a 3D point cloud.
17. A system, comprising:
a plurality of cameras for acquiring one or more 2D stereoscopic images; and
a computing device configured to:
receiving the one or more 2D stereoscopic images;
reconstructing a 3D multiview image using the one or more 2D stereoscopic images; and
displaying the 3D multiview image.
18. The system of clause 17, wherein acquiring the one or more 2D stereoscopic images comprises capturing the one or more 2D stereoscopic images with a plurality of cameras.
19. The system of clause 17, wherein obtaining the one or more 2D stereoscopic images comprises downloading the one or more 2D stereoscopic images from a server.
20. The system of clause 17, wherein the one or more 2D stereoscopic images have a narrow field of view.
21. The system of clause 17, wherein reconstructing the 3D multiview image comprises:
modifying the one or more images;
estimating disparity between the images;
enabling 2D pixel tracking across adjacent frames;
estimating a 3D pose by mapping the tracked 2D pixels to 3D coordinates; and
constructing a 3D model based on the 3D pose.
22. The system of clause 21, wherein reconstructing the 3D multiview image comprises:
selecting a frame of the one or more images to accelerate;
performing a noise check to discard an image of the one or more images if the noise of the image is above a threshold; and
selecting a region of interest in the one or more images to accelerate.
23. The system of clause 17, wherein the 3D multiview image comprises a 3D surface.
24. The system of clause 17, wherein the 3D multiview image comprises a 3D point cloud.
The present invention has been described in terms of specific embodiments incorporating details to facilitate the understanding of the principles of construction and operation of the invention. Such reference herein to specific embodiments and details thereof is not intended to limit the scope of the claims appended hereto. It will be apparent to those skilled in the art that various other modifications may be made in the embodiment chosen for illustration without departing from the spirit and scope of the invention as defined by the claims.

Claims (21)

1. A method programmed in a non-transitory memory of a device, comprising:
acquiring one or more 2D stereoscopic images;
reconstructing a 3D multiview image using the one or more 2D stereoscopic images, wherein reconstructing the 3D multiview image comprises:
modifying the one or more images;
estimating disparity between the images;
enabling 2D pixel tracking across adjacent frames;
estimating a 3D pose by mapping the tracked 2D pixels to 3D coordinates; and
building a 3D model based on the 3D pose; and
displaying the 3D multiview image.
2. The method of claim 1, wherein acquiring the one or more 2D stereoscopic images comprises capturing the one or more 2D stereoscopic images with a plurality of cameras.
3. The method of claim 1, wherein obtaining the one or more 2D stereoscopic images comprises downloading the one or more 2D stereoscopic images from a server.
4. The method of claim 1, wherein the one or more 2D stereoscopic images have a narrow field of view.
5. The method of claim 1, wherein reconstructing the 3D multiview image comprises:
selecting a frame of the one or more images;
performing a noise check to discard an image of the one or more images if the noise of the image is above a threshold; and
selecting a region of interest in the one or more images.
6. The method of claim 1, wherein the 3D multiview image comprises a 3D surface.
7. The method of claim 1, wherein the 3D multiview image comprises a 3D point cloud.
8. An apparatus for 3D multiview reconstruction, comprising:
a non-transitory memory for storing an application, the application to:
acquiring one or more 2D stereoscopic images;
reconstructing a 3D multiview image using the one or more 2D stereoscopic images, wherein reconstructing the 3D multiview image comprises:
modifying the one or more images;
estimating disparity between the images;
enabling 2D pixel tracking across adjacent frames;
estimating a 3D pose by mapping the tracked 2D pixels to 3D coordinates; and
building a 3D model based on the 3D pose; and
displaying the 3D multiview image; and
a processing component coupled to the memory, the processing component configured to process the application.
9. The apparatus of claim 8, wherein acquiring the one or more 2D stereoscopic images comprises capturing the one or more 2D stereoscopic images with a plurality of cameras.
10. The apparatus of claim 8, wherein obtaining the one or more 2D stereoscopic images comprises downloading the one or more 2D stereoscopic images from a server.
11. The apparatus of claim 8, wherein the one or more 2D stereoscopic images have a narrow field of view.
12. The apparatus of claim 8, wherein reconstructing the 3D multiview image comprises:
selecting a frame of the one or more images;
performing a noise check to discard an image of the one or more images if the noise of the image is above a threshold; and
selecting a region of interest in the one or more images.
13. The apparatus of claim 8, wherein the 3D multiview image comprises a 3D surface.
14. The apparatus of claim 8, wherein the 3D multiview image comprises a 3D point cloud.
15. A system for 3D multiview reconstruction, comprising:
a plurality of cameras for acquiring one or more 2D stereoscopic images; and
a computing device configured to:
receiving the one or more 2D stereoscopic images;
reconstructing a 3D multiview image using the one or more 2D stereoscopic images, wherein reconstructing the 3D multiview image comprises:
modifying the one or more images;
estimating disparity between the images;
enabling 2D pixel tracking across adjacent frames;
estimating a 3D pose by mapping the tracked 2D pixels to 3D coordinates; and
constructing a 3D model based on the 3D pose; and
displaying the 3D multiview image.
16. The system of claim 15, wherein acquiring the one or more 2D stereoscopic images comprises capturing the one or more 2D stereoscopic images with a plurality of cameras.
17. The system of claim 15, wherein obtaining the one or more 2D stereoscopic images comprises downloading the one or more 2D stereoscopic images from a server.
18. The system of claim 15, wherein the one or more 2D stereoscopic images have a narrow field of view.
19. The system of claim 15, wherein reconstructing the 3D multiview image comprises:
selecting a frame of the one or more images;
performing a noise check to discard an image of the one or more images if the noise of the image is above a threshold; and
selecting a region of interest in the one or more images.
20. The system of claim 15, wherein the 3D multiview image comprises a 3D surface.
21. The system of claim 15, wherein the 3D multiview image comprises a 3D point cloud.
CN201780015282.4A 2016-03-09 2017-03-03 Method for 3D multi-view reconstruction by feature tracking and model registration Active CN109074658B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201662305696P 2016-03-09 2016-03-09
US62/305,696 2016-03-09
US15/291,714 US10122996B2 (en) 2016-03-09 2016-10-12 Method for 3D multiview reconstruction by feature tracking and model registration
US15/291,714 2016-10-12
PCT/US2017/020760 WO2017155825A1 (en) 2016-03-09 2017-03-03 Method for 3d multiview reconstruction by feature tracking and model registration

Publications (2)

Publication Number Publication Date
CN109074658A CN109074658A (en) 2018-12-21
CN109074658B true CN109074658B (en) 2022-06-10

Family

ID=59787445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780015282.4A Active CN109074658B (en) 2016-03-09 2017-03-03 Method for 3D multi-view reconstruction by feature tracking and model registration

Country Status (4)

Country Link
US (1) US10122996B2 (en)
JP (1) JP6730695B2 (en)
CN (1) CN109074658B (en)
WO (1) WO2017155825A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109583377B * 2018-11-30 2022-12-27 Beijing Institute of Technology Control method and device for pipeline model reconstruction and upper computer
KR20220038996A (en) 2020-09-21 2022-03-29 삼성전자주식회사 Method and apparatus of embedding feature
CN113674407B * 2021-07-15 2024-02-13 China University of Geosciences (Wuhan) Three-dimensional terrain reconstruction method, device and storage medium based on binocular vision image

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6614429B1 (en) * 1999-05-05 2003-09-02 Microsoft Corporation System and method for determining structure and motion from two-dimensional images for multi-resolution object modeling
JP3515751B2 * 2000-11-29 2004-04-05 Tokyo University of Mercantile Marine Reconstruction method of three-dimensional submarine structure
US8320641B2 (en) * 2004-10-28 2012-11-27 DigitalOptics Corporation Europe Limited Method and apparatus for red-eye detection using preview or other reference images
US7856125B2 (en) * 2006-01-31 2010-12-21 University Of Southern California 3D face reconstruction from 2D images
KR101388133B1 (en) * 2007-02-16 2014-04-23 삼성전자주식회사 Method and apparatus for creating a 3D model from 2D photograph image
US20080310757A1 (en) 2007-06-15 2008-12-18 George Wolberg System and related methods for automatically aligning 2D images of a scene to a 3D model of the scene
US8218855B2 (en) * 2007-10-04 2012-07-10 Samsung Electronics Co., Ltd. Method and apparatus for receiving multiview camera parameters for stereoscopic image, and method and apparatus for transmitting multiview camera parameters for stereoscopic image
TW201005673A (en) * 2008-07-18 2010-02-01 Ind Tech Res Inst Example-based two-dimensional to three-dimensional image conversion method, computer readable medium therefor, and system
CN101692284B * 2009-07-24 2012-01-04 Xidian University Three-dimensional human body motion tracking method based on quantum immune clone algorithm
JP5713624B2 * 2009-11-12 2015-05-07 Canon Inc. 3D measurement method
US8625902B2 (en) * 2010-07-30 2014-01-07 Qualcomm Incorporated Object recognition using incremental feature extraction
US20120176380A1 (en) * 2011-01-11 2012-07-12 Sen Wang Forming 3d models using periodic illumination patterns
US20120206578A1 (en) * 2011-02-15 2012-08-16 Seung Jun Yang Apparatus and method for eye contact using composition of front view image
CN103430218A (en) * 2011-03-21 2013-12-04 英特尔公司 Method of augmented makeover with 3d face modeling and landmark alignment
US9153066B2 (en) * 2011-11-17 2015-10-06 Panasonic Intellectual Property Management Co. Ltd. Image processing device, imaging device, and image processing method
WO2013120115A2 (en) * 2012-02-06 2013-08-15 Legend3D, Inc. Motion picture project management system
US20130271565A1 (en) * 2012-04-16 2013-10-17 Qualcomm Incorporated View synthesis based on asymmetric texture and depth resolutions
EP2907083A4 (en) * 2012-10-10 2016-07-27 Broadcast 3Dtv Inc System for distributing auto-stereoscopic images
US9483703B2 (en) 2013-05-14 2016-11-01 University Of Southern California Online coupled camera pose estimation and dense reconstruction from video
CN104574331B * 2013-10-22 2019-03-08 ZTE Corporation Data processing method, device, computer storage medium and user terminal
KR102105189B1 * 2013-10-31 2020-05-29 Electronics and Telecommunications Research Institute Apparatus and Method for Selecting Multi-Camera Dynamically to Track Interested Object

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Multi-View Stereo Reconstruction of Dense Shape and Complex Appearance; Hailin Jin et al.; International Journal of Computer Vision; Dec. 31, 2005; Vol. 63, No. 3; pp. 175-189 *
Registration of Point Cloud Models and Texture Images in 3D Reconstruction; Sun Qian et al.; Computer Simulation; Nov. 30, 2011; Vol. 28, No. 11; pp. 218-221 *
3D Reconstruction of Natural Scenes Based on Multi-View Depth Sampling; Jiang Hanqing et al.; Journal of Computer-Aided Design & Computer Graphics; Oct. 31, 2015; Vol. 27, No. 10; pp. 1805-1815 *

Also Published As

Publication number Publication date
CN109074658A (en) 2018-12-21
WO2017155825A1 (en) 2017-09-14
US10122996B2 (en) 2018-11-06
JP6730695B2 (en) 2020-07-29
US20170264887A1 (en) 2017-09-14
JP2019512781A (en) 2019-05-16

Similar Documents

Publication Publication Date Title
CN110264509B (en) Method, apparatus, and storage medium for determining pose of image capturing device
US10540773B2 (en) System and method for infinite smoothing of image sequences
Kuster et al. FreeCam: A Hybrid Camera System for Interactive Free-Viewpoint Video.
US10789765B2 (en) Three-dimensional reconstruction method
US9699380B2 (en) Fusion of panoramic background images using color and depth data
Kadambi et al. 3d depth cameras in vision: Benefits and limitations of the hardware: With an emphasis on the first-and second-generation kinect models
KR20200049833A (en) Depth estimation methods and apparatus, electronic devices, programs and media
WO2016074639A1 (en) Methods and systems for multi-view high-speed motion capture
US8855406B2 (en) Egomotion using assorted features
EP3274986A2 (en) Virtual 3d methods, systems and software
JP6882868B2 (en) Image processing equipment, image processing method, system
CN113315878A (en) Single pass object scanning
CN111080776B (en) Human body action three-dimensional data acquisition and reproduction processing method and system
JP7285834B2 (en) Three-dimensional reconstruction method and three-dimensional reconstruction apparatus
CN109074658B (en) Method for 3D multi-view reconstruction by feature tracking and model registration
US11403781B2 (en) Methods and systems for intra-capture camera calibration
JP2007004578A (en) Method and device for acquiring three-dimensional shape and recording medium for program
KR20210087512A (en) Photo-Video Based Spatial-Temporal Volumetric Capture System
Vo et al. Spatiotemporal bundle adjustment for dynamic 3d human reconstruction in the wild
US20190073787A1 (en) Combining sparse two-dimensional (2d) and dense three-dimensional (3d) tracking
US10504235B2 (en) Method for generating three dimensional images
CN111260544B (en) Data processing method and device, electronic equipment and computer storage medium
Kaveti et al. Towards robust VSLAM in dynamic environments: a light field approach
KR20220149717A (en) Full skeletal 3D pose recovery from monocular camera
Shabanov et al. Self-supervised depth denoising using lower-and higher-quality RGB-d sensors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant