US20150124062A1 - Joint View Expansion And Filtering For Automultiscopic 3D Displays - Google Patents
- Publication number
- US20150124062A1 (application US14/531,548)
- Authority
- US
- United States
- Prior art keywords
- views
- band pass
- images
- spatial
- depth
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/30—Image reproducers
- H04N13/302—Image reproducers for viewing without the aid of special glasses, i.e. using autostereoscopic displays
-
- H04N13/0402—
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/111—Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/122—Improving the 3D impression of stereoscopic images by modifying image signal contents, e.g. by filtering or adding monoscopic depth cues
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/128—Adjusting depth or disparity
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/30—Image reproducers
- H04N13/349—Multi-view displays for displaying three or more geometrical viewpoints without viewer tracking
- H04N13/351—Multi-view displays for displaying three or more geometrical viewpoints without viewer tracking for displaying simultaneously
Definitions
- Multi-view autostereoscopic (or automultiscopic) displays offer a superior visual experience, since they provide both binocular and motion parallax without the use of special glasses.
- automultiscopic displays may be manufactured inexpensively, for non-limiting example, by adding a parallax barrier or a lenticular screen to a standard display.
- Existing approaches have at least three major problems that the present invention addresses in its solution for multi-view autostereoscopic TV.
- existing 3D content production pipelines provide two views, while multi-view stereoscopic displays preferably use images from many viewpoints.
- capturing TV-quality scenes with dense camera rigs may be impractical because of the size and cost of professional quality cameras.
- a solution to use view-interpolation to generate these additional views preferably uses an accurate depth and inpainting of missing scene regions.
- the quality of existing approaches is not good enough for TV broadcast and movies. Handling scenes that include defocus blur, motion blur, transparent materials, and specularities is especially challenging in existing approaches.
- multi-view autostereoscopic displays preferably use special filtering to remove interperspective aliasing, e.g., image content that is not supported by a given display.
- Zwicker M., Matusik, W., Durand, F., and Pfister, H., “Antialiasing for Automultiscopic 3D Displays,” in Proceedings of the 17 th Eurographics conference on Rendering Techniques , Eurographics Association, June 2006, pg. 73-82.
- a dense light field is preferably used.
- image disparities preferably are modified according to the display type, size, and viewer preference.
- This disparity retargeting step also preferably rerenders the scene with adjusted disparities.
- Applicants' proposed approach includes a method, system, and apparatus that addresses the foregoing limitations of the art.
- Applicants' proposed approach takes a stereoscopic stream as an input and produces a correctly filtered multi-view video for a given automultiscopic display, as shown in FIG. 1A .
- the proposed approach does not require changes to existing (current) stereoscopic production and content delivery pipelines. Additional processing may be performed by the client (e.g., at home).
- Some advantages of the proposed approach are that it is simple and it may be implemented in hardware.
- the proposed approach is implemented on a GPU (Graphics Processing Unit) using CUDA (Compute Unified Device Architecture), achieving near real-time performance.
- steerable pyramid decomposition and filtering that are successfully used for motion magnification in video sequences (see the following publications that at least further describe steerable pyramids, filtering, and pyramid decomposition, and are hereby incorporated by reference: Wadhwa, N., Rubinstein, M., Guttag, J., Durand, F., and Freeman, W. T., “Phase-Based Video Motion Processing,” ACM Trans. Graph. (Proc. SIGGRAPH), 32, 4, July 2013, pg. 80:1-80:10, U.S. patent application Ser. No. 13/607,173, filed on Sep. 7, 2012, now U.S. Patent Publication No. 2014/0072228, published on Mar. 13, 2014, and U.S. patent application Ser. No. 13/707,451, filed on Dec. 6, 2012, now U.S. Patent Publication No. 2014/0072229, published on Mar. 13, 2014).
- Applicants' proposed approach shows how similar concepts may be used for view interpolation and how the antialiasing filter and disparity remapping may be incorporated without requiring additional cost.
- results of Applicants' proposed approach are demonstrated on a variety of different scenes including defocus blur, motion blur, and complex appearance, and Applicants' proposed approach is compared to both the ground truth and depth-based rendering approaches.
- Applicants demonstrate the proposed approach on a real-time 3D video conferencing system that preferably uses two video cameras and provides a multi-view autostereoscopic experience.
- the contributions of the proposed approach include, but are not limited to, an efficient algorithm for joint view expansion, filtering and disparity remapping for multi-view autostereoscopic displays.
- Applicants also provide herein an evaluation of the proposed approach on a variety of different scenes, along with a comparison to both the ground truth and the state-of-the-art depth-based rendering techniques.
- the proposed approach includes a system and corresponding method that remedies the deficiencies of the existing approaches.
- the proposed approach is directed to a computer system and a corresponding method for rendering a three-dimensional (3D) video display.
- An embodiment includes a computer-implemented method that uses at least one processor and at least one associated memory.
- Embodiments may receive a video stream formed of a sequence of frames. Each frame may have image content corresponding to a plurality of views, and the views may be initial views.
- the proposed approach may apply one or more spatial band pass filters to the received image content resulting in filtered images. Each spatial band pass filter may have a respective spatial frequency band. From the filtered images, embodiments compute one or more output images that synthesize additional views with respect to the initial views.
- the output images may be computed from the filtered images of a given spatial band pass filter corresponding to different visual disparities for the respective spatial frequency band of that given band pass filter.
- the computing of output images may enable the option to include removing inter-view (inter-perspective) aliasing by filtering the output images according to local depth using phase shift instead of recovering depth information.
- Embodiments drive a display with the computed and optionally anti-aliased filtered output images, rendering a multi-view autostereoscopic 3D video display.
- the received video stream may be a 3D stereo video stream of images having two views (left and right) per frame.
- the step of applying one or more spatial band pass filters may include applying a one-dimensional (1D) filter.
- the step of applying spatial band pass filters may include applying a two-dimensional (2D) filter.
- the step of computing the output images may be performed in a manner that results in a stereo disparity expansion of views without need of a dense depth map reconstruction.
- the disparity range in the output images is user adjustable by any of: (i) adjusting a magnification factor in the given spatial band pass filter, and (ii) at least one of defining and translating a disparity mapping function to map a certain phase shift at the spatial frequency of the given spatial band pass filter to a new phase shift.
- the step of computing may include interpolating in-between views.
- the step of applying spatial band pass filters may capture correspondence between views using phase differences for multiple spatial frequencies and orientations separately.
- local depth may be represented as a plurality of values instead of as a single value.
- the step of driving the display may be in real-time relative to the step of receiving the video stream.
- Another embodiment of the computer-implemented method may include prealigning the initial views with each other before applying the spatial band pass filters.
- a further embodiment may include optional antialiasing for adding depth-of-field effect.
- the plurality of views may include a relatively low number of views.
- An embodiment of a computer-implemented system for rendering a three-dimensional (3D) video display may include a receiving module configured to receive a video stream formed of a sequence of frames. Each frame may have image content corresponding to a plurality of views, the views being initial views.
- the system may also include a computing module that is responsive to the receiving module and is configured to apply one or more spatial band pass filters to the received image content resulting in filtered images. Each spatial band pass filter may have a respective spatial frequency band.
- the computing module may be further configured to compute, from the filtered images, one or more output images that synthesize additional views with respect to the initial views.
- the output images may be computed from the filtered images of a given spatial band pass filter corresponding to different visual disparities for the respective spatial frequency band of that given band pass filter.
- the computing module may be further configured to enable optionally including removing inter-view (inter-perspective) aliasing by filtering the output images according to local depth using phase shift instead of recovering depth information.
- the system may also include a display module coupled to receive the output images from the computing module. The display module is configured to drive a display with the computed and optionally anti-aliased filtered output images, rendering a multi-view autostereoscopic 3D video display.
- the computer-implemented system may be a real-time 3D video conferencing system.
- the received video stream may be a 3D stereo video stream of images having two views (left and right) per frame.
- the computing module may be further configured to apply at least one one-dimensional (1D) filter corresponding to at least one of the one or more spatial band pass filters.
- the computing module may be further configured to apply at least one two-dimensional (2D) filter corresponding to at least one of the one or more spatial band pass filters.
- the computing module may be further configured to compute the output images in a manner that results in a stereo disparity expansion of views without need of a dense depth map reconstruction.
- the display module may be further configured to enable a user to adjust disparity range in the output images by any of: (i) adjusting a magnification factor in the given spatial band pass filter, and (ii) at least one of defining and translating a disparity mapping function to map a certain phase shift at the spatial frequency of the given spatial band pass filter to a new phase shift.
- the computing module may be further configured to interpolate in-between views.
- the computing module may be further configured to apply spatial band pass filters including capturing correspondence between views using phase differences for multiple spatial frequencies and orientations separately.
- the computing module may be further configured to compute local depth, including representing local depth as a plurality of values instead of as a single value.
- the display module may be further configured to drive the display and the computing module may be further configured to receive the video stream in real-time.
- the computing module may be configured to prealign the initial views with each other before the computing module is configured to apply the one or more spatial band pass filters.
- the optional antialiasing may be used for adding depth-of-field effect.
- the plurality of views may include a relatively low number of views.
- An alternative embodiment is directed to a non-transitory computer readable medium having stored thereon a sequence of instructions which, when loaded and executed by a processor coupled to an apparatus, causes the apparatus to: receive a video stream formed of a sequence of frames, each frame having image content corresponding to a plurality of views, the views being initial views; apply one or more spatial band pass filters to the received image content resulting in filtered images, each spatial band pass filter having a respective spatial frequency band; compute, from the filtered images, one or more output images that synthesize additional views with respect to the initial views, the output images computed from the filtered images of a given spatial band pass filter corresponding to different visual disparities for the respective spatial frequency band of that given band pass filter; enable optionally including removing inter-view (inter-perspective) aliasing by filtering the output images according to local depth using phase shift instead of recovering depth information; and drive a display with the computed and optionally anti-aliased filtered output images, rendering a multi-view autostereoscopic 3D video display.
- FIG. 1A illustrates the present invention method and system presented by Applicants that takes a stream of stereo images as an input, synthesizes additional views that are preferably used for an automultiscopic display, and performs filtering. (“Big Buck Bunny” © by Blender Foundation).
- FIG. 1B illustrates a non-limiting flow-chart of the present invention method and system of FIG. 1A.
- FIG. 2 is a schematic view of the present invention method and system that takes a 3D stereo stream as an input, and performs a view expansion together with an antialiasing filtering to obtain a correct input for an automultiscopic display. (“Sintel” © by Blender Foundation).
- FIG. 3 is a graph illustration of an embodiment of Applicants' view expansion.
- FIGS. 4A-4D show embodiments of an automultiscopic display that provide superior image artifact handling, as compared with the existing approaches of ground truth and depth-based rendering. (“Big Buck Bunny” © by Blender Foundation).
- FIG. 5 shows another embodiment of an automultiscopic display that provides superior image artifact handling, as compared with the existing approach of depth-based rendering. (“Sintel” © by Blender Foundation).
- FIG. 6 illustrates an embodiment of an automultiscopic display that provides superior reconstruction of reflective and transparent objects, as compared with the existing approach of depth-image-based rendering (DIBR).
- FIG. 7 is a colormap visualizing errors between depth-based rendering and ground truth (top), as well as errors between an embodiment of the present invention and ground truth (bottom), for the example embodiments from FIGS. 4A-4D. (“Big Buck Bunny” © by Blender Foundation).
- FIG. 8 illustrates that an embodiment of the present invention supports disparity manipulations. (“Sintel” © by Blender Foundation).
- FIG. 9 is an example embodiment that shows how very large magnification factors (increasing from left to right) may affect the final quality of results. (See “The Stanford Light Field Archive,” which is available from the Internet at lightfield.stanford.edu, June 2008).
- FIG. 10 is an example embodiment with four input images, in which the present invention creates views both in the horizontal direction and in the vertical direction. (See “The Stanford Light Field Archive,” which is available from the Internet at lightfield.stanford.edu, June 2008).
- FIG. 11 is a block diagram of an embodiment of the present invention.
- An automultiscopic display may reproduce multiple views corresponding to different viewing angles, thereby allowing for a glasses-free 3D and more immersive viewing experience for a user.
- the views are preferably provided to the display.
- One standard technique to acquire multiple images from different locations is to use a camera array.
- Such camera array systems may include calibrated and synchronized sensors, which may record a scene from different locations. The number of cameras may range from a dozen (see for example, the following publication: Matusik, W., and Pfister, H., “3D TV: A Scalable System for Real-Time Acquisition, Transmission, and Autostereoscopic Display of Dynamic Scenes,” ACM Trans. Graph., 23, 3, August 2004, pg.
- Automultiscopic screens preferably produce a light field, which may include a continuous four-dimensional (4D) function representing radiance with respect to a position and a viewing direction (see for example, the following publication: Levoy, M., and Hanrahan, P., “Light Field Rendering,” in Proceedings of the 23 rd Annual Conference on Computer Graphics and Interactive Techniques , ACM, August 1996, pg. 31-42). Due to the discrete nature of an acquisition (i.e., limited number of views), a recorded light field is preferably aliased.
- a plenoptic sampling theory analyzes the spectrum of a reconstructed light field (see for example, the following publications: Chai, J. X., Tong, X., Chan, S.
- aliasing may be due to undersampling of the light field and also because of the limited bandwidth of the display.
- One approach (see Zwicker, M., Matusik, W., Durand, F., and Pfister, H., “Antialiasing for Automultiscopic 3D Displays,” in Proceedings of the 17 th Eurographics conference on Rendering Techniques , Eurographics Association, June 2006, pg. 73-82, hereinafter “Zwicker”), takes both sources of aliasing (undersampling and limited bandwidth, respectively) into account and presents a combined antialiasing framework which filters input views coming from a camera array.
- In Zwicker, a large number of views is preferably used, which may make that solution impractical in a scenario when only 3D stereo content (two views) is available.
- a sequence of images preferably used for an automultiscopic display, preferably corresponds to a set of views captured from different locations. Such a sequence of views may be captured by a camera moving horizontally on a straight line. The problem of creating additional views may be considered as similar to a motion editing problem when the motion in the scene comes from the camera movement.
- a number of techniques may magnify invisible motions.
- motion is explicitly estimated and then magnified, and an image based technique is used to compute frames that correspond to a modified flow (see for example, the following publication that is hereby incorporated by reference: Liu, C., Torralba, A., Freeman, W. T., Durand, F., and Adelson, E. H., “Motion Magnification,” ACM Trans. Graph., 24, 3, July 2005, 519-526).
- a Eulerian approach may eliminate the need of flow computation. Instead of using flow computation, the Eulerian approach processes the video in space and time to amplify the temporal color changes (see Wu, H.
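- A phase-based method is described in Wadhwa, N., Rubinstein, M., Guttag, J., Durand, F., and Freeman, W. T., “Phase-Based Video Motion Processing,” ACM Trans. Graph. (Proc. SIGGRAPH), 32, 4, July 2013, pg. 80:1-80:10 (hereby incorporated by reference, hereinafter “Wadhwa”).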
- the method in Wadhwa does not require motion computation and may handle much bigger displacements than the Eulerian approach.
- the method and system of the present invention is inspired by the methods of Wadhwa and, as such, also does not require motion computation and may handle much bigger displacements than the Eulerian approach.
- correspondence is assumed to be encoded in the phase shift once the left and right views are decomposed into complex-valued steerable pyramids.
- the proposed approach takes as an input a standard 3D stereo video stream (e.g., left and right view), and creates additional views that may be used on an automultiscopic display.
- the proposed approach is inspired by a phase-based motion magnification technique. Therefore, to follow, a short overview is provided for this phase-based motion magnification method, and then an explanation is provided how the phase-based magnification method may be adapted to create additional views for an automultiscopic display.
- Phase-based motion magnification exploits the steerable pyramid decomposition, which decomposes images according to the spatial scale and orientation. See for example, the following publications that are hereby incorporated by reference: Simoncelli, E. P., Freeman, W. T., Adelson, E. H., and Heeger, D. J., “Shiftable Multiscale Transforms,” IEEE Transactions on Information Theory, 38, 2, March 1992, pg. 587-607; Simoncelli, E. P., and Freeman, W. T., “The Steerable Pyramid: A Flexible Architecture for Multi-Scale Derivative Computation,” in IEEE International Conference on Image Processing , vol. 3, October 1995, pg. 444-447. If the input signal is a sine wave, a small motion may be encoded in the phase shift between frames. Therefore, the motion may be magnified by modifying the temporal changes of the phase.
- a series of filters Ψ_{ω,θ} may be used. These filters may correspond to one filter, which may be scaled and rotated according to the scale ω and the orientation θ.
- the steerable pyramid may then be built by applying the filters to the discrete Fourier transform (DFT) Ĩ of each image I from the video sequence.
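By way of non-limiting illustration, the idea of applying a scaled and rotated filter to the DFT of an image can be sketched in Python with NumPy. This is a simplified stand-in and not the steerable pyramid filters of the cited publications: the log-Gaussian radial profile, the cosine-power angular lobe, and the names `oriented_bandpass`, `sub_band`, `center_freq`, and `order` are assumptions made only for this example.

```python
import numpy as np

def oriented_bandpass(shape, center_freq, theta, order=3):
    """Fourier-domain filter for one scale and orientation (illustrative only).

    center_freq is in cycles per pixel; theta is the orientation in radians.
    """
    height, width = shape
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0  # placeholder so that log2 is defined at DC
    # Assumed radial profile: Gaussian in log-frequency, about one octave wide.
    radial = np.exp(-0.5 * (np.log2(radius / center_freq) / 0.5) ** 2)
    radial[0, 0] = 0.0  # the band-pass filter passes no DC
    # One-sided angular lobe, so the resulting sub-band is complex valued.
    lobe = np.cos(np.arctan2(fy, fx) - theta)
    angular = np.where(lobe > 0.0, lobe, 0.0) ** order
    return radial * angular

def sub_band(image, center_freq, theta):
    """Multiply the DFT of the image by the filter and return the complex sub-band."""
    spectrum = np.fft.fft2(image)
    return np.fft.ifft2(spectrum * oriented_bandpass(image.shape, center_freq, theta))

# A grating that varies only along x, at 8 cycles per 64 pixels.
size = 64
xs = np.arange(size)
image = np.tile(np.cos(2 * np.pi * 8 * xs / size), (size, 1))
band_0 = sub_band(image, 0.125, 0.0)         # filter aligned with the grating
band_90 = sub_band(image, 0.125, np.pi / 2)  # filter perpendicular to it
```

In this example the aligned sub-band carries the grating as a complex exponential whose phase encodes position, while the perpendicular sub-band is essentially empty.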
- a one-dimensional (1D) case is considered, e.g., a 1D intensity profile f translating over time with a constant velocity, in order to provide a non-limiting example of how the phase-based motion magnification works.
- the displacement is given by a function δ(t)
- the image changes over time according to f(x+δ(t)).
- the function f(x+δ(t)) may be expressed in the Fourier domain as a sum of complex sinusoids:
  $$f(x+\delta(t)) = \sum_{\omega=-\infty}^{\infty} A_\omega e^{i\omega(x+\delta(t))} \qquad (1)$$
- ω is a single frequency and A_ω is the amplitude of the corresponding sinusoid. From this, a band corresponding to the frequency ω is given by:
  $$S_\omega(x,t) = A_\omega e^{i\omega(x+\delta(t))} \qquad (2)$$
- ω(x+δ(t)) is the phase of the sinusoid, and ω(x+δ(t)) may include the motion information, which may be directly amplified.
- changing individual phases may not lead to meaningful motion editing because the motion may be encoded in the relative changes of the phase over time.
- the phase may be filtered in the temporal direction to isolate desired phase changes, B_ω(x,t).
- the filtered phase may be multiplied by a magnification factor α, and the original phase in band S_{ω,θ} may be increased by the amplified signal αB_ω(x,t).
- the new modified sub-band with amplified motion is:
  $$\hat{S}_\omega(x,t) = S_\omega(x,t)\, e^{i\alpha B_\omega(x,t)} = A_\omega e^{i\omega(x+(1+\alpha)\delta(t))} \qquad (3)$$
- the above-mentioned method generalizes to the two-dimensional (2D) case, where the steerable pyramid decomposition uses filters with a finite spatial support, thereby enabling detecting and amplifying local motions. Additional details regarding the above-mentioned method may be found in the following publications, which are hereby incorporated by reference: Wadhwa, N., Rubinstein, M., Guttag, J., Durand, F., and Freeman, W. T., “Phase-Based Video Motion Processing,” ACM Trans. Graph . ( Proc. SIGGRAPH ), 32, 4, July 2013, pg. 80:1-80:10; and U.S. patent application Ser. No. 13/607,173, filed on Sep. 7, 2012, now U.S. Patent Publication No. 2014/0072228, published on Mar. 13, 2014.
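As a non-limiting sketch of the 1D case described above, the following Python (NumPy) example magnifies the displacement of a single sinusoid between two frames. It assumes only two frames, so the temporal filtering step is replaced by a plain phase difference, and it uses a hard DFT mask instead of a steerable pyramid filter; the names `complex_band` and `magnify_shift` are hypothetical.

```python
import numpy as np

def complex_band(signal, k_lo, k_hi):
    """Keep positive-frequency DFT bins k_lo..k_hi, giving a complex sub-band."""
    spectrum = np.fft.fft(signal)
    mask = np.zeros(signal.size)
    mask[k_lo:k_hi + 1] = 1.0
    return np.fft.ifft(spectrum * mask)

def magnify_shift(frame0, frame1, alpha, k_lo, k_hi):
    """Scale the displacement between frame0 and frame1 by (1 + alpha) in one band."""
    s0 = complex_band(frame0, k_lo, k_hi)
    s1 = complex_band(frame1, k_lo, k_hi)
    # Phase change between the two frames (assumed to stay within -pi..pi).
    phase_change = np.angle(s1 * np.conj(s0))
    # Add alpha times the phase change to the phase of the later band.
    s_magnified = s1 * np.exp(1j * alpha * phase_change)
    # Only positive frequencies were kept, so twice the real part restores the signal.
    return 2.0 * np.real(s_magnified)

n = 256
x = np.arange(n)
omega = 2 * np.pi * 5 / n   # 5 cycles across the signal
delta = 0.8                 # displacement in samples
alpha = 3.0                 # magnification factor
frame0 = np.cos(omega * x)
frame1 = np.cos(omega * (x + delta))
magnified = magnify_shift(frame0, frame1, alpha, k_lo=3, k_hi=8)
expected = np.cos(omega * (x + (1 + alpha) * delta))
```

For this single-frequency input, `magnified` agrees with `expected` to numerical precision, i.e., the displacement δ becomes (1+α)δ without any explicit motion estimate.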
- Applicants' proposed approach takes a stereoscopic stream as an input and produces a correctly filtered multi-view video for a given automultiscopic display (see the electronic color version of the following paper that uses the proposed approach, hereby incorporated by reference: Didyk, P., Sitthi-Amorn, P., Freeman, W. T., Durand, F., and Matusik, W., “Joint View Expansion and Filtering for Automultiscopic 3D Displays,” ACM Trans. Graph., 32, 6, November 2013, Article No. 221, hereinafter “Applicants' paper”).
- FIG. 1A illustrates the method and system 100 presented by Applicants that takes a stream of stereo images as an input 102 and synthesizes (and/or creates) additional (and/or output) views 104 that are preferably used for an automultiscopic display.
- the output views 104 are also filtered by the method and system 100 to remove inter-view aliasing.
- FIG. 1B illustrates a non-limiting flow-chart of the present invention method and system 100 of FIG. 1A .
- An embodiment includes a computer-implemented method that uses at least one processor and at least one associated memory.
- the embodiment 100 receives 112 a video stream formed of a sequence of frames. Each frame may have image content corresponding to a plurality of views, and the views may be initial views.
- the system/method 100 applies 114 one or more spatial band pass filters to the received image content resulting in filtered images. Each spatial band pass filter may have a respective spatial frequency band. From the filtered images, the system/method 100 computes 116 one or more output images that synthesize additional views with respect to the initial views.
- the output images may be computed from the filtered images of a given spatial band pass filter corresponding to different visual disparities for the respective spatial frequency band of that given band pass filter.
- the computing of output images may perform anti-aliasing as an option 118 . That is, system/method 100 allows at 118 optionally including removing inter-view (inter-perspective) aliasing by filtering the output images according to local depth using phase shift instead of recovering depth information.
- system/method 100 drives a display with the computed and optionally anti-aliased filtered output images, rendering a multi-view autostereoscopic 3D video display 120 .
- FIG. 2 depicts a schematic view of the proposed approach 100 that takes a 3D stereo stream as an input 202 , and performs a view expansion together with antialiasing filtering 208 to obtain a correct input for an automultiscopic display 210 with different views 212 .
- FIG. 2 illustrates two frames (left 204 and right 206 ).
- FIG. 3 illustrates various graphical embodiments of Applicants' method and system 100 of the present invention, including the view expansion process.
- a magnification factor α (see elements 1118a, 1118b, 1118c, and 1118d in FIG. 3) is preferably adjusted according to the position of the virtual camera 1120 for which the view is generated.
- the present invention method and system 100 may synthesize new views (e.g., create generated views, 1116 ) in an outward direction (as shown in cases 1102 , 1104 , 1106 ), but also interpolate in-between views (as shown in case 1106 ). New views ( 1116 ) may be reconstructed from one or more input images 204 , 206 corresponding to the closest location.
- a given left input image 204 may be used to reconstruct one or more images 1116 to the left of the given image 204 (see corresponding blue regions in FIG. 3 ).
- a given right input image 206 may be used to reconstruct one or more images 1116 to the right of the given image 206 in FIG. 3 (see corresponding locations in green regions in FIG. 3 ).
- FIG. 3 illustrates cases 1102 , 1104 , 1106 with a left frame 204 and a right frame 206 .
- the present invention method and system instead of analyzing the phase changes in the temporal domain, accounts for phase differences in corresponding bands between two input views 204 , 206 .
- a notion of time is not required, so the phase shift is denoted as δ (1112), instead of δ(t) (indicating a time variable), in the description to follow.
- the present invention method and system 100 may take two or more input views that are also one or more left stereo frames, L (204), and one or more right stereo frames, R (206), and perform the steerable pyramid decomposition on both left and right frames 204, 206, respectively. Then, the present invention method and system 100 may compute the phase difference for each complex coefficient. After modifying the phase differences according to the α value (see elements 1118a, 1118b, 1118c, and 1118d in FIG. 3) and collapsing the pyramids, two or more nearby views are created (see elements 1116).
- an advantage of the present invention method and system 100 is that it provides a stereo disparity expansion without a requirement of dense depth map reconstruction, thereby avoiding the significant artifacts which dense depth map reconstruction is prone to.
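The decompose, shift-phase, and collapse sequence may be illustrated, in a non-limiting way, on a single image row in Python (NumPy). The sketch assumes hard octave-wide DFT masks in place of a steerable pyramid, ignores the Nyquist bin, and treats each band's phase difference as smaller than π; `octave_bands`, `expand_view`, and `scene` are hypothetical names.

```python
import numpy as np

def octave_bands(n):
    """Yield boolean masks splitting positive DFT bins 1..n//2-1 into octaves."""
    lo = 1
    while lo < n // 2:
        hi = min(2 * lo, n // 2)
        mask = np.zeros(n, dtype=bool)
        mask[lo:hi] = True
        yield mask
        lo = hi

def expand_view(left, right, alpha):
    """Synthesize a view from `right` by scaling per-band phase differences by alpha."""
    spec_l, spec_r = np.fft.fft(left), np.fft.fft(right)
    # Low-pass residual: the mean of the right view is carried over unchanged.
    out = np.full(left.size, np.real(spec_r[0]) / left.size)
    for mask in octave_bands(left.size):
        band_l = np.fft.ifft(spec_l * mask)
        band_r = np.fft.ifft(spec_r * mask)
        # Phase difference between corresponding coefficients of the two views.
        delta = np.angle(band_r * np.conj(band_l))
        # Collapse: sum the modified bands (twice the real part per band).
        out += 2.0 * np.real(band_r * np.exp(1j * alpha * delta))
    return out

n = 256
x = np.arange(n)

def scene(pos):
    """Toy row with two frequency components that fall in different octaves."""
    return (1.0 + np.cos(2 * np.pi * 3 * pos / n)
            + 0.5 * np.cos(2 * np.pi * 12 * pos / n + 0.3))

d = 1.5                                   # disparity between the input views
left, right = scene(x), scene(x + d)
outward = expand_view(left, right, alpha=1.0)    # one baseline beyond the right view
between = expand_view(left, right, alpha=-0.5)   # in-between the two inputs
```

No depth map is computed at any point; each band is moved according to its own phase difference, which is why a negative α yields an interpolated view and a positive α an extrapolated one.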
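- A process of the present invention method and system 100, M, may be defined as follows:
  $$M(L, R, \alpha) = (L', R') \qquad (4)$$
  where L′ and R′ denote the views generated from the left view L and the right view R, respectively, for the magnification factor α.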
- magnification factors may be computed based on virtual camera positions 1120 that the images correspond to.
- the input images may coincide with locations −x_0 (1130a) and x_0 (1130b), corresponding to the left view, L (204), and the right view, R (206), respectively.
- the process of choosing correct magnification factors (α values) is shown in FIG. 3.
- the FIG. 3 examples 1102 , 1104 , 1106 illustrate view expansion, preferably in an outward direction.
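One plausible way to map a virtual camera position to a magnification factor can be sketched as follows. The formula is an assumption for illustration, not taken from the text: it supposes that the phase shift between the inputs corresponds to the full baseline 2·x0 and scales linearly with camera displacement, and that each new view is generated from the closest input. The function name `magnification_factor` is hypothetical.

```python
def magnification_factor(x, x0):
    """Return (source_view, alpha) for a virtual camera at horizontal position x.

    The inputs are assumed to sit at -x0 (left view, "L") and +x0 (right view, "R").
    """
    if x0 <= 0:
        raise ValueError("x0 must be positive")
    # Use the closest input view as the source.
    source = "R" if x >= 0 else "L"
    # Distance beyond the source camera, in units of the input baseline 2 * x0.
    alpha = (abs(x) - x0) / (2.0 * x0)
    return source, alpha

# Virtual cameras placed at multiples of x0 = 1.0.
positions = [-3.0, -1.0, 0.0, 1.0, 2.0, 3.0]
factors = [magnification_factor(p, 1.0) for p in positions]
```

Under these assumptions a camera at an input position gets α = 0, positions outside the input pair get positive α (outward expansion), and positions between the inputs get negative α (interpolation).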
- the present invention method and system 100 for new views generation may produce images without interperspective aliasing.
- the views are filtered according to the local depth.
- the process is similar to adding a depth-of-field effect.
- a naïve and costly way to filter a single view is to generate a number of neighboring views and average them using weights corresponding to the distance from the original view.
- a key advantage of the present invention method and system 100 is that it may perform the filtering directly on the steerable pyramid decomposition.
- the present invention method and system 100 may derive a closed form solution that may be performed at almost no additional cost computationally.
- the above defined function M may include two or more functions (for right and left views respectively): M R and M L .
- the functions M R and M L may return one of the views, e.g., R′ or L′ respectively.
- the process of antialiasing may be analogous (and/or the same) for both right hand and left hand views R′ and L′.
- the case of the right hand R′ view is described as follows.
- R′ is preferably averaged with its neighboring views according to the weights given by a low pass filter along the viewpoint dimension.
- the filter is given as a function .
- the anti-aliased view R̂′ may correspond to a fixed α value, and R̂′ may be computed as follows:
- the present invention method and system 100 may approximate the above integration before the reconstruction of the pyramid for each sub-band of R′ separately.
- the corresponding filtered sub-band may be computed as:
- ⁇ ⁇ ( x,y , ⁇ ) ⁇ ( ⁇ ) ⁇ ⁇ ⁇ ( x,y ) d ⁇ , (6)
- the final filtered sub-band may include two components.
- the first component, S ⁇ (x,y), may comprise a sub-band of the original view R.
- the second component may comprise the corresponding integral component, ⁇ ( ⁇ ) ⁇ e o ⁇ d ⁇ , which preferably depends on phase shift ⁇ .
- the dependence on the phase shift may be convenient because in many cases the final filtered sub-band may have a closed form solution, or it may be pre-computed and stored as a lookup table parameterized by the phase shift.
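- As an illustration of why a closed form can exist, consider the special case of a Gaussian low pass filter along the viewpoint dimension. The sketch below assumes (this is not taken from the source equations) that a sub-band coefficient of the view at magnification a equals the original coefficient times exp(i·a·δ), where δ is the phase shift, and that the filter is a normalized Gaussian with standard deviation `sigma` in units of α. Averaging the neighboring views then reduces to multiplying the coefficient by exp(−σ²δ²/2). The function names are hypothetical.

```python
import numpy as np

def attenuation_closed_form(delta, sigma):
    """Gaussian-filter attenuation of a coefficient with phase shift delta."""
    return np.exp(-0.5 * (sigma * delta) ** 2)

def attenuation_numeric(delta, sigma, alpha=0.0, samples=4001, span=8.0):
    """Naive reference: average many neighboring views with Gaussian weights."""
    a = np.linspace(alpha - span * sigma, alpha + span * sigma, samples)
    weights = np.exp(-0.5 * ((a - alpha) / sigma) ** 2)
    weights /= weights.sum()
    filtered = np.sum(weights * np.exp(1j * a * delta))
    # Divide out the phase of the centre view so only the attenuation remains.
    return filtered * np.exp(-1j * alpha * delta)

# Lookup table parameterized by the phase shift, as an alternative to the formula.
phase_grid = np.linspace(-np.pi, np.pi, 257)
lookup = attenuation_closed_form(phase_grid, sigma=0.7)
```

The naive average needs thousands of synthesized views per coefficient, whereas the closed form is a single multiplication, which matches the observation that the filtering adds almost no computational cost.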
- each sub-band of view R′ being:
- the above equations for the filtered sub-band S̃ preferably assume a good estimation of the phase shift.
- a phase-based approach (see for example, the following publication that is hereby incorporated by reference: Wadhwa, N., Rubinstein, M., Guttag, J., Durand, F., and Freeman, W. T., “Phase-Based Video Motion Processing,” ACM Trans. Graph. (Proc. SIGGRAPH), 32, 4, July 2013, pg. 80:1-80:10) may underestimate the phase shift, which may lead to insufficient filtering. Insufficient filtering may occur when the assumption that the correspondence between two views is encoded in the phase difference fails.
- the present invention method and system 100 overcomes the above-mentioned deficiency by correcting the phase shift in each sub-band separately, based on the phase shift in the corresponding sub-band for the lower frequency.
- the present invention method and system 100 processes the entire pyramid, starting from the lowest frequency level. Whenever the phase shift on the level below is greater than π/2 (90 degrees), the phase shift at the current level may be underestimated. In such a case, the present invention method and system 100 corrects the phase shift by setting its value to twice the phase shift on the lower level. Therefore, the present invention method and system 100 provides a correct phase shift estimation, preferably under the assumption that the correspondence between the input views behaves locally as a translation. Although the correct phase shift estimation may not be crucial for the motion magnification or nearby view synthesis, correct phase shift estimation may be important for the correct antialiasing filtering.
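- The coarse-to-fine correction described above can be sketched as follows. The sketch makes assumptions beyond the text: pyramid levels are one octave apart (so a pure translation doubles the phase shift from one level to the next), each level stores its phase shifts in an array of the same shape, the test is applied to the magnitude of the phase shift, and the already corrected value of the lower level is propagated upward. The function name `correct_phase_shifts` is hypothetical.

```python
import numpy as np

def correct_phase_shifts(levels):
    """Correct per-level phase shifts, coarsest (lowest frequency) level first."""
    corrected = [np.asarray(levels[0], dtype=float)]
    for current in levels[1:]:
        below = corrected[-1]
        current = np.asarray(current, dtype=float)
        # Where the lower level already exceeds pi/2, the measured value at
        # this level may have wrapped, so replace it with twice the lower one.
        fixed = np.where(np.abs(below) > np.pi / 2, 2.0 * below, current)
        corrected.append(fixed)
    return corrected

# Toy example: a 2-pixel translation seen at four octave-spaced frequencies.
true_shift = np.array([1.0, 2.0, 4.0, 8.0])         # radians per level
measured = np.angle(np.exp(1j * true_shift))         # wrapped to (-pi, pi]
recovered = correct_phase_shifts([np.array([m]) for m in measured])
```

In the toy example the two finest levels wrap around and are measured as roughly −2.28 and 1.72 radians; the correction restores 4.0 and 8.0, which is what the antialiasing filter needs in order to attenuate those bands sufficiently.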
- Various embodiments implementing the above approach are provided. In one embodiment, implementation details and standard running times are included. In an embodiment, detailed comparison is provided between the present invention method and system 100 and a state-of-the-art depth image-based rendering technique (DIBR). In an embodiment, a real-time 3D video conferencing system is presented, in order to showcase the advantages of robustness and efficiency of the inventive method. In an embodiment, the present invention method and system 100 is applied to depth remapping.
- the present invention method and system 100 is implemented on a GPU using the CUDA (Compute Unified Device Architecture) API (Application Programming Interface), and processes sequences using an NVIDIA GTX TITAN graphics card on an INTEL XEON machine.
- the corresponding steerable pyramid uses eight orientations, which provides a good trade-off between quality and performance.
- the time expended in building a pyramid and reconstructing one additional view is independent of the image content, and it is preferably 15 ms (milliseconds) and 12 ms for building and reconstructing respectively, assuming a content with 816×512 resolution.
- the present invention method and system 100 enables reconstruction of eight views for a standard automultiscopic display at a rate of 8.3 FPS (frames per second).
- An advantage of the present invention method and system 100 is that its memory requirement is relatively low.
- each pyramid preferably requires 137 MB (megabytes) of memory.
- 3×137 MB of memory is required (that is, 2×137 MB for two input views and 137 MB for the synthesized view).
- Existing real-time methods fail to directly compute properly filtered content for automultiscopic 3D displays based on a stereoscopic video stream.
- the following comparison is made between the present invention method and system 100 and a combination of depth-based rendering and antialiasing (e.g., a hypothetical competitive method).
- the hypothetical competitive method takes a stereoscopic video stream as an input, and reconstructs a depth map for each image pair. Then, the competitive method applies a real-time warping technique for synthesis of additional views.
- the competitive method averages 30 neighboring views according to Gaussian weights similar to those that are mentioned above.
- the above-mentioned depth-based rendering is compared with the present invention method and system 100 in at least three non-limiting example embodiments to follow.
- Two of the example embodiments are computer generated animations ( FIGS. 4A-4D and FIG. 5 ).
- the third example embodiment ( FIG. 6 ) is a photograph taken using a 3D camera (an LG OLYMPUS P725 camera).
- the third example is particularly challenging because the captured scene may include both reflections and transparent objects.
- a dense light field is computed (for non-limiting example, a hundred views).
- the dense light field enables the use of a ground truth method 412 , e.g., the antialiasing technique proposed by Zwicker, M., Matusik, W., Durand, F., and Pfister, H., “Antialiasing for Automultiscopic 3D Displays,” in Proceedings of the 17 th Eurographics conference on Rendering Techniques , Eurographics Association, June 2006, pg. 73-82.
- FIG. 4A shows a comparison of different content creation approaches for automultiscopic display.
- artifacts 408 may be removed by filtering
- existing image-based techniques combined with filtering such as ground truth 412 or depth-based rendering 422 may introduce significant artifacts (see red insets, 418 , 428 , respectively) when depth estimation or ground truth fails.
- These artifacts ( 408 , 418 , 428 ) may be corrected by the method and system 100 of the present invention as shown in the red inset 438 .
- the blue inset 424 shows how incorrect depth estimation results in jaggy depth discontinuities that are not present in the other methods illustrated in FIG. 4A (see blue insets 404 , 414 , 434 ).
- the present invention method and system 100 produces results (see blue inset 434 , green inset 436 , and red inset 438 ) similar to rendering with filtering 422 , but at improved costs that are similar to real-time image-based techniques. See also FIGS. 4B-4D that represent enlarged images of the elements of FIG. 4A , in order to further emphasize the above-mentioned improvements of the present invention.
- FIG. 5 shows a comparison between the method and system 100 of the present invention and depth-based rendering 422 for one of the synthesized views.
- the artifacts 502 , 504 are due to the poor depth estimation for depth-based rendering 422 .
- the blue inset 504 shows how incorrect depth estimation of depth-based rendering 422 results in jaggy depth discontinuities.
- the counterpart blue inset 508 shows that these discontinuities are corrected by the method and system 100 of the present invention.
- as shown in red inset 502 , depth estimation of the depth-based rendering technique 422 fails in reconstructing depth of the out-of-focus butterfly.
- the method and system 100 of the present invention more accurately reconstructs the butterfly. Therefore, as illustrated in FIG. 5 , the method and system 100 of the present invention produces more accurate (and/or correct) results compared with the depth-based rendering 422 .
- FIG. 6 shows the input images (top images 610 , 612 ) and views that are generated using a depth image-based technique 422 (middle images 620 , 622 ) and views that are generated using the method and system 100 of the present invention (bottom images 630 , 632 ).
- the depth estimation technique shown in images 620 , 622 fails to reconstruct 604 the original highly reflective and transparent objects 602 .
- the method and system 100 of the present invention properly reconstructs 606 the original highly reflective and transparent objects 602 .
- the method and system 100 of the present invention produces more graceful degradation of the image quality compared to the depth-based rendering (DIBR) method 422 .
- artifacts produced by the depth-based technique 422 are mostly due to poor depth estimation and not due to incorrect view-synthesis.
- Depth estimation is an ill-posed problem, and such existing DIBR methods 422 may not handle regions with non-obvious per-pixel depth values (e.g., transparencies, reflections, motion blur, defocus blur, and thin structures that have partial coverage) as shown in FIGS. 5-6 .
- Real-time depth estimation methods 422 also have problems with temporal coherence.
- the method and system 100 of the present invention improves results by avoiding producing visible and disturbing artifacts, even when coherence is not explicitly enforced.
- the improvements of the method and system 100 of the present invention are further illustrated in the video accompanying Applicants' paper mentioned above (see video that is hereby incorporated by reference, which is available on the Internet at people.csail.mit.edu, under the directory “pdidyk,” followed by the sub-directory “projects,” and the following sub-directory “MultiviewConversion,” as the file “Multiview Conversion.mp4,” and is also available on the Internet at www.youtube.com under the title “Joint View Expansion and Filtering for Automultiscopic 3D Displays,” hereinafter “Applicants' video” of Nov. 5, 2013).
- FIG. 7 is a colormap visualizing errors between depth-based rendering and ground truth (top) 702 , as well as visualizing errors between the method and system 100 of the present invention and ground truth (bottom) 712 , for the examples from FIG. 4A .
- the differences are computed using the Structural Similarity Metric (SSIM metric) (see for example, the following publication that is hereby incorporated by reference: Wang, Z., Bovik, A. C., Sheikh, H. R., and Simoncelli, E.
- the error 704 produced by the depth-based technique is localized mostly around depth discontinuities in the image 702 .
- the error 714 produced by the method and system 100 of the present invention is distributed more uniformly across the image 712 , and is therefore less disturbing.
- the error of the method and system 100 of the present invention may be significantly influenced by the different types of blur introduced by the compared methods.
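- For readers unfamiliar with the metric, the structural similarity index between two images can be sketched as below. This is a simplified illustration that uses global image statistics and the commonly used constants 0.01 and 0.03; a per-pixel error map such as the one in FIG. 7 would instead evaluate the same expression in local windows. The function name `global_ssim` is hypothetical.

```python
import numpy as np

def global_ssim(a, b, data_range=1.0):
    """Structural similarity of two images from global statistics."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    c1 = (0.01 * data_range) ** 2    # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2    # stabilizes the contrast/structure term
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator

rng = np.random.default_rng(0)
reference = rng.random((32, 32))
degraded = np.clip(reference + 0.2 * rng.standard_normal((32, 32)), 0.0, 1.0)
score_same = global_ssim(reference, reference)
score_degraded = global_ssim(reference, degraded)
```

Identical images score 1, and any difference lowers the score, so an error visualization can plot one minus the local score.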
- an embodiment of the method and system 100 of the present invention may apply filtering that provides a more uniform blur, as illustrated in green inset 436 of FIGS. 4A-4D .
- the method and system 100 of the present invention may filter images in both the horizontal direction and the vertical direction.
- the improved results produced by the method and system 100 of the present invention are a result of an over complete representation that it may use. While depth-based approaches estimate one depth value per pixel, which may lead to artifacts in complex cases where no such single value may exist, the method and system 100 of the present invention may capture the correspondence between views using phase differences for multiple spatial frequencies and orientations separately.
- the local depth is not required to be represented as one value, and instead the local depth may be represented as many values, which may also lead to improved performance, including cases where the depth is not well-defined.
- the method and system 100 of the present invention may expand a stereoscopic video stream to a multi-view stream, and display it on an 8-view automultiscopic screen.
- the method and system 100 of the present invention is shown to work well with these sequences, as illustrated at least in FIGS. 4A-4D and FIGS. 5-7 .
- Video sequences are shown in the above-mentioned Applicants' video cited within Applicants' paper.
- a light-weight, real-time 3D video conferencing system is built, based on the method and system 100 of the present invention, which may include a fast view expansion technique.
- An embodiment of the 3D video conferencing system is illustrated in Applicants' video.
- the 3D video conferencing system comprises at least eight cameras mounted on a linear rig and an automultiscopic display, although the system is not so limited and may comprise more or fewer cameras.
- the system may operate in at least the two following modes: (1) the system may use the eight cameras to acquire eight corresponding views, or (2) the system may use two of the cameras and compute the other six views using the method and system 100 of the present invention.
- the eight views may be streamed in real-time to the screen, providing an interactive feedback for the users. See Applicants' video for the comparison between views captured using cameras and those generated using the method and system 100 of the present invention. Note that the views rendered by the method and system 100 of the present invention are filtered to avoid aliasing, which is advantageous because it does not add additional cost to the processing. In contrast, in existing approaches, original views captured by eight cameras may include aliasing.
- the method and system 100 of the present invention may also be used for remapping disparities in stereoscopic images and videos. Such modifications are often desired and necessary in order to adjust disparity range in the scene to a given comfort range (see for example, the following publication that is hereby incorporated by reference: Lambooij, M., Ijsselsteijn, W., Fortuin, M., and Heynderickx, I., “Visual Discomfort and Visual Fatigue of Stereoscopic Displays: A Review,” Journal of Imaging Science and Technology, 53, May-June 2009, pg.
- NVIDIA 3D Vision may allow users to change depth range using a simple knob.
- disparity range in a given image may be changed by adjusting a corresponding α value in the above-mentioned view expansion of the method and system 100 of the present invention.
- the result of this adjustment is a global scaling of disparities.
- An example of such manipulations is presented in FIG. 8 .
- FIG. 8 illustrates that the method and system 100 of the present invention supports disparity manipulations.
- FIG. 8 shows stereo images in anaglyph (and/or anaglyph 3D) version (red channel for the left eye and cyan for the right one) 802 , 804 , 806 , 808 for the same scene with different depth ranges (depth increasing from left to right).
- 1D spatial band pass filters as well as 2D spatial band pass filters may be applied to the input stereoscopic images in the above described approach by Applicants.
- user adjustments may be more general (i.e., not limited to changing the magnification factor α).
- Applicants' approach is able to perform disparity mapping, including disparity mapping which is defined as a function that maps the input disparity to the output disparity.
- the method and system 100 of the present invention enables the user to adjust the function that maps certain phase shift at a given frequency level (given spatial band pass filter) to a new phase shift.
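- The idea of a user-defined mapping from one phase shift to a new phase shift can be sketched for a single complex sub-band as follows. This is an illustrative sketch under assumptions: the two bands come from the same spatial band pass filter applied to the left and right views, the output keeps the amplitude of the left band, and the names `remap_band` and `mapping` are hypothetical.

```python
import numpy as np

def remap_band(band_left, band_right, mapping):
    """Build an output band whose phase shift from the left view is mapping(delta)."""
    # Measured phase shift between the two views for every coefficient.
    delta = np.angle(band_right * np.conj(band_left))
    # Replace the measured shift with the user-chosen one.
    return band_left * np.exp(1j * mapping(delta))

x = np.arange(128)
omega, disparity = 0.3, 2.0                        # assumed band frequency and shift
band_l = np.exp(1j * omega * x)
band_r = np.exp(1j * omega * (x - disparity))

halved = remap_band(band_l, band_r, lambda p: 0.5 * p)               # global scaling
clamped = remap_band(band_l, band_r, lambda p: np.clip(p, -0.2, 0.2))  # comfort limit
```

A linear mapping reproduces the global disparity scaling obtained by changing the magnification factor, while a nonlinear mapping such as the clamp compresses only the largest disparities.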
- the phase-based approach may process video that exhibits small displacements (Wadhwa, N., Rubinstein, M., Guttag, J., Durand, F., and Freeman, W. T., “Phase-Based Video Motion Processing,” ACM Trans. Graph. (Proc. SIGGRAPH), 32, 4, July 2013, pg. 80:1-80:10, incorporated herein by reference).
- For larger displacements the locality assumption of the motion may not hold. Therefore, for larger displacements, only lower spatial frequencies may be correctly reconstructed.
- this deficiency is largely alleviated by the need for interperspective antialiasing.
- the view synthesis may not correctly reconstruct high frequencies for scene elements with large disparity
- these high frequencies are preferably removed anyway because they usually lie outside of the display bandwidth and may lead to aliasing artifacts.
- when magnification factors and/or the interaxial distance between input images are large, some artifacts may remain visible.
- the method and system 100 of the present invention may reduce the number of cameras significantly.
- FIG. 9 visualizes a case where the magnification factor α values may be drastically increased.
- FIG. 9 shows how large magnification factors (increasing from left to right) may affect the final quality of results (see images 910 , 912 , 914 , 920 , 922 , 924 , 930 , 932 , 934 , 940 , 942 , 944 ).
- the inter-view antialiasing is reduced to make the artifacts more visible.
- the input images come from “The Stanford Light Field Archive,” which is available from the Internet at lightfield.stanford.edu, June 2008.
- the method and system 100 of the present invention is novel at least because it combines view synthesis and antialiasing for automultiscopic display, in contrast to existing approaches.
- the method and system 100 of the present invention described herein does not require explicit depth estimation and alleviates this source of artifacts. Instead, the method and system 100 of the present invention leverages the link between parallax and the local phase of Gabor-like wavelets, in practice complex-valued steerable pyramids. In one embodiment, this enables the method and system 100 of the present invention to exploit the translation-shift theorem and extrapolate the phase difference measured in the two input views.
- the pyramid representation enables the method and system 100 of the present invention to integrate antialiasing directly and avoid expensive numerical prefiltering.
- the method and system 100 of the present invention derives a closed-form approximation to the prefiltering integral that results in a simple attenuation of coefficients based on the band and phase difference.
- the simplicity of the method and system 100 of the present invention is a key advantage because it enables an interactive implementation and provides robust performance even for difficult cases.
- the method and system 100 of the present invention also avoids artifacts at the focal plane, at least because the measured phase difference is zero. For displays that reproduce both horizontal and vertical parallax, the method and system 100 of the present invention may be extended to generate small light fields.
- additional views 1010 are created in the horizontal as well as the vertical direction, using (and surrounding) four input images 1012 .
- the top image array of elements 1010 corresponds to a small light field created from the four images 1012 marked in green.
- the small insets shown below 1020 , 1030 present magnified fragments of the reconstructed images from the image array elements 1010 , 1012 .
- a further enhancement in embodiments involves prealigning the input views/images. Prealignment improves the quality of the output images.
- the disparities between the input images are preferably small. Therefore, the method and system 100 of the present invention may prealign the input images using simple transformations (e.g., shift, shear, etc.) to minimize the disparities, perform the view expansion using the method and system 100 steps described above, and then, may apply a transformation which cancels out the transformation applied to the input images.
- Such prealignment may be guided by a low quality disparity map estimated from input images.
- two images are obtained with a disparities range of (50,60).
- the method and system 100 of the present invention may shift one of the images by 55 pixels, which may change the range of disparities to ( ⁇ 5,+5).
- the method and system 100 of the present invention may be applied to these shifted images, and compensate for the shift by shifting the output images accordingly.
- the shift may be replaced by a simple operation that is easy to revert (e.g., shear), and it may be guided by a poor quality disparity map.
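- The arithmetic of the prealignment example above can be written out as a short sketch. It assumes a purely horizontal integer shift chosen as the midpoint of the disparity range; the helper names are hypothetical, and `np.roll` wraps pixels around the image border, which a real implementation would replace with padding or cropping.

```python
import numpy as np

def prealign_shift(d_min, d_max):
    """Shift that centres a disparity range on zero, and the resulting range."""
    shift = 0.5 * (d_min + d_max)
    return shift, (d_min - shift, d_max - shift)

def shift_columns(image, pixels):
    """Horizontal shift by a whole number of pixels (wraps around the border)."""
    return np.roll(image, pixels, axis=1)

shift, new_range = prealign_shift(50, 60)

image = np.arange(12 * 80, dtype=float).reshape(12, 80)
aligned = shift_columns(image, int(shift))       # applied before view expansion
restored = shift_columns(aligned, -int(shift))   # cancels the prealignment afterwards
```

For the range (50, 60) the sketch yields a shift of 55 pixels and a residual range of (−5, +5), and applying the opposite shift recovers the original image exactly.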
- FIG. 11 is a high-level block diagram of an embodiment 300 of the present invention system and/or method 100 that generates a multi-view autostereoscopic display from a stereoscopic video input according to the principles of the present invention.
- the computer-based system 300 contains a bus 306 .
- the bus 306 is a connection between the various components of the system 300 .
- an input/output device interface 328 for connecting various input and output devices, such as a keypad, controller unit, keyboard (generally 324 ), mouse/pointing device 326 , display, speakers, touchscreen display (generally display device 318 ), etc. to the system 300 .
- the input/output device interface 328 provides an interface for allowing a user to select video display parameters and aspects using any method as is known in the art.
- a central processing unit (CPU) 302 is connected to the bus 306 and provides for the execution of computer instructions.
- Memory 310 provides volatile storage for data used for carrying out computer instructions.
- Storage or RAM 308 provides nonvolatile storage for software instructions such as an operating system.
- the system 300 also comprises a network interface 322 , for connecting to any variety of networks, including wide area networks (WANs), local area networks (LANs), wireless networks, mobile device networks, cable data networks and so on.
- a memory area 304 that is operably and/or communicatively coupled to the processor 302 and to a GPU 320 by a system bus 306 or similar supporting data communication line.
- the memory area 304 may include one, or more than one, forms of memory.
- the memory area 304 may include random access memory (RAM) 308 , which may include non-volatile RAM, magnetic RAM, ferroelectric RAM, and/or other forms of RAM.
- the memory area 304 may also include read-only memory (ROM) 310 and/or flash memory and/or electrically erasable programmable read-only memory (EEPROM).
- Any other suitable magnetic, optical and/or semiconductor memory, such as a hard disk drive (HDD) 312 by itself or in combination with other forms of memory, may be included in the memory area 304 .
- HDD 312 may be coupled to a disk controller 314 for use in transmitting and receiving messages to and from processor 302 .
- the memory area 304 may also be or may include a detachable or removable memory 316 such as a suitable cartridge disk, CD-ROM, DVD, or USB memory.
- the memory area 304 may in some embodiments effectively include cloud computing memory accessible through network interface 322 , and the like.
- the above examples are exemplary only, and thus, are not intended to limit in any way the definition and/or meaning of the term “memory area.”
- a CPU 302 sends a stream of 3D stereo video images to GPU 320 via a system bus 306 or other communications coupling.
- GPU 320 employs the above-described methods, algorithms and computer-based techniques as programmed in memory area 304 to generate correctly filtered, multi-view video images for automultiscopic display on display device 318 .
- the GPU 320 forms a picture of the screen image and stores it in a frame buffer. This picture is a large bitmap used to continually update and drive the screen image on display device 318 .
- the display device 318 may be, without limitation, a monitor, a television display, a plasma display, a liquid crystal display (LCD), a display based on light emitting diodes (LED), a display based on organic LEDs (OLEDs), a display based on polymer LEDs, a display based on surface-conduction electron emitters, a display including a projected and/or reflected image, or any other suitable electronic device or display mechanism.
- the display device 318 may include a touchscreen with an associated touchscreen controller. The above examples are exemplary only, and thus, are not intended to limit in any way the definition and/or meaning of the term “display device”.
- depth-image-based rendering may be referred to as a depth image-based technique, a depth-based technique, depth-based rendering, and/or depth rendering, and may include depth estimation.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
- Controls And Circuits For Display Device (AREA)
Abstract
Description
- This application claims the benefit of U.S. Provisional Application No. 61/899,595, filed on Nov. 4, 2013. The entire teachings of the above application are incorporated herein by reference.
- This invention was made with government support under Grant Nos. NSF-CGV-1111415 and NSF-CGV-1116296 awarded by the National Science Foundation. The government has certain rights in the invention.
- A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
- Stereoscopic three-dimensional (3D) content is becoming more popular as it reaches an increasing number of home users. While most of current television (TV) sets are 3D-enabled, and there are plenty of 3D movies and sports programming available, the adoption of stereoscopic 3D is hampered by the use of 3D glasses that are preferably used for a given user to view the content. Multi-view autostereoscopic (or automultiscopic) displays offer a superior visual experience, since they provide both binocular and motion parallax without the use of special glasses. Using an automultiscopic display, a viewer is not restricted to being in a particular position and many viewers may watch the display at the same time. Furthermore, automultiscopic displays may be manufactured inexpensively, for non-limiting example, by adding a parallax barrier or a lenticular screen to a standard display.
- Existing approaches have at least three major problems that the present invention addresses in its solution for multi-view autostereoscopic TV. First, existing 3D content production pipelines provide two views, while multi-view stereoscopic displays preferably use images from many viewpoints. In existing approaches, capturing TV-quality scenes with dense camera rigs may be impractical because of the size and cost of professional quality cameras. A solution to use view-interpolation to generate these additional views preferably uses an accurate depth and inpainting of missing scene regions. Despite progress in stereo depth reconstruction algorithms, the quality of existing approaches is not good enough for TV broadcast and movies. Handling scenes that include defocus blur, motion blur, transparent materials, and specularities is especially challenging in existing approaches.
- Second, multi-view autostereoscopic displays preferably use special filtering to remove interperspective aliasing, e.g., image content that is not supported by a given display. See for example, the following publication that is hereby incorporated by reference: Zwicker, M., Matusik, W., Durand, F., and Pfister, H., “Antialiasing for Automultiscopic 3D Displays,” in Proceedings of the 17th Eurographics conference on Rendering Techniques, Eurographics Association, June 2006, pg. 73-82. Without performing filtering, severe ghosting and flickering may be visible. However, in order to properly antialias a multi-view video, a dense light field is preferably used.
- Third, to assure viewing comfort, image disparities preferably are modified according to the display type, size, and viewer preference. This disparity retargeting step also preferably rerenders the scene with adjusted disparities.
- Applicants' proposed approach includes a method, system, and apparatus that addresses the foregoing limitations of the art. Applicants' proposed approach takes a stereoscopic stream as an input and produces a correctly filtered multi-view video for a given automultiscopic display, as shown in FIG. 1A. In at least one embodiment, the proposed approach does not require changes to existing (current) stereoscopic production and content delivery pipelines. Additional processing may be performed by the client (e.g., at home). Some advantages of the proposed approach are that it is simple and it may be implemented in hardware. In one embodiment, the proposed approach is implemented on a GPU (Graphics Processing Unit) using CUDA (Compute Unified Device Architecture), which achieves near real-time performance.
- Some key features of the proposed approach are a steerable pyramid decomposition and filtering that are successfully used for motion magnification in video sequences (see the following publications that at least further describe steerable pyramids, filtering, and pyramid decomposition, and are hereby incorporated by reference: Wadhwa, N., Rubinstein, M., Guttag, J., Durand, F., and Freeman, W. T., “Phase-Based Video Motion Processing,” ACM Trans. Graph. (Proc. SIGGRAPH), 32, 4, July 2013, pg. 80:1-80:9, U.S. patent application Ser. No. 13/607,173, filed on Sep. 7, 2012, now U.S. Patent Publication No. 2014/0072228, published on Mar. 13, 2014, and U.S. patent application Ser. No. 13/707,451, filed on Dec. 6, 2012, now U.S. Patent Publication No. 2014/0072229, published on Mar. 13, 2014).
- In at least one embodiment, Applicants' proposed approach shows how similar concepts may be used for view interpolation and how the antialiasing filter and disparity remapping may be incorporated without requiring additional cost. In the Figures that follow, at least in FIGS. 4A-4D and FIGS. 5-10, results of Applicants' proposed approach are demonstrated on a variety of different scenes including defocus blur, motion blur, and complex appearance, and Applicants' proposed approach is compared to both the ground truth and depth-based rendering approaches. In addition, Applicants demonstrate the proposed approach on a real-time 3D video conferencing system that preferably uses two video cameras and provides a multi-view autostereoscopic experience.
- The contributions of the proposed approach include, but are not limited to, an efficient algorithm for joint view expansion, filtering and disparity remapping for multi-view autostereoscopic displays. Applicants also provide herein an evaluation of the proposed approach on a variety of different scenes, along with a comparison to both the ground truth and the state-of-the-art depth-based rendering techniques.
- The proposed approach includes a system and corresponding method that remedies the deficiencies of the existing approaches. The proposed approach is directed to a computer system and a corresponding method for rendering a three-dimensional (3D) video display. An embodiment includes a computer-implemented method that uses at least one processor and at least one associated memory. Embodiments may receive a video stream formed of a sequence of frames. Each frame may have image content corresponding to a plurality of views, and the views may be initial views. The proposed approach may apply one or more spatial band pass filters to the received image content resulting in filtered images. Each spatial band pass filter may have a respective spatial frequency band. From the filtered images, embodiments compute one or more output images that synthesize additional views with respect to the initial views. The output images may be computed from the filtered images of a given spatial band pass filter corresponding to different visual disparities for the respective spatial frequency band of that given band pass filter. The computing of output images may enable the option to include removing inter-view (inter-perspective) aliasing by filtering the output images according to local depth using phase shift instead of recovering depth information. Embodiments drive a display with the computed and optionally anti-aliased filtered output images, rendering a multi-view autostereoscopic 3D video display.
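- The notion of several spatial band pass filters, each with its own spatial frequency band, can be illustrated on a single image row. The sketch below is a simplification and not the filter bank of the described embodiments: it uses ideal, real-valued, octave-wide masks in the Fourier domain plus a low pass residual, whereas a phase-based method would use complex-valued filters so that a local phase is available. The names `band_masks` and `decompose` are hypothetical.

```python
import numpy as np

def band_masks(n, num_bands):
    """Boolean frequency masks: one low pass residual plus octave-wide bands."""
    freqs = np.abs(np.fft.fftfreq(n))                   # cycles per sample, 0 to 0.5
    edges = 0.5 / 2.0 ** np.arange(num_bands, -1, -1)   # e.g. 0.0625, 0.125, 0.25, 0.5
    masks = [freqs <= edges[0]]
    for low, high in zip(edges[:-1], edges[1:]):
        masks.append((freqs > low) & (freqs <= high))
    return masks

def decompose(row, num_bands=3):
    """Split a 1D signal into band-limited components that sum back to it."""
    spectrum = np.fft.fft(row)
    return [np.real(np.fft.ifft(spectrum * mask))
            for mask in band_masks(row.size, num_bands)]

rng = np.random.default_rng(0)
row = rng.random(64)
bands = decompose(row, num_bands=3)
reconstructed = sum(bands)
```

Because the masks partition the spectrum, summing the filtered signals recovers the input, which is the property that lets modified sub-bands be collapsed back into an output image.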
- In another embodiment of the computer-implemented method, the received video stream may be a 3D stereo video stream of images having two views (left and right) per frame. In yet another embodiment, the step of applying one or more spatial band pass filters may include applying a one-dimensional (1D) filter. In a further embodiment, the step of applying spatial band pass filters may include applying a two-dimensional (2D) filter. In another embodiment, the step of computing the output images may be performed in a manner that results in a stereo disparity expansion of views without need of a dense depth map reconstruction.
- In another embodiment of the computer-implemented method, the disparity range in the output images is user adjustable by any of: (i) adjusting a magnification factor in the given spatial band pass filter, and (ii) at least one of defining and translating a disparity mapping function to map a certain phase shift at the spatial frequency of the given spatial band pass filter to a new phase shift. In yet another embodiment, the step of computing may include interpolating in-between views.
- In a further embodiment of the computer-implemented method, the step of applying spatial band pass filters may capture correspondence between views using phase differences for multiple spatial frequencies and orientations separately. In the step of computing, local depth may be represented as a plurality of values instead of as a single value. In another embodiment, the step of driving the display may be in real-time relative to the step of receiving the video stream.
- Another embodiment of the computer-implemented method may include prealigning the initial views with each other before applying the spatial band pass filters. A further embodiment may include optional antialiasing for adding depth-of-field effect. In another embodiment, the plurality of views may include a relatively low number of views.
- An embodiment of a computer-implemented system for rendering a three-dimensional (3D) video display may include a receiving module configured to receive a video stream formed of a sequence of frames. Each frame may have image content corresponding to a plurality of views, the views being initial views. The system may also include a computing module that is responsive to the receiving module and is configured to apply one or more spatial band pass filters to the received image content resulting in filtered images. Each spatial band pass filter may have a respective spatial frequency band. The computing module may be further configured to compute, from the filtered images, one or more output images that synthesize additional views with respect to the initial views. The output images may be computed from the filtered images of a given spatial band pass filter corresponding to different visual disparities for the respective spatial frequency band of that given band pass filter. The computing module may be further configured to enable optionally including removing inter-view (inter-perspective) aliasing by filtering the output images according to local depth using phase shift instead of recovering depth information. The system may also include a display module coupled to receive the output images from the computing module. The display module is configured to drive a display with the computed and optionally anti-aliased filtered output images, rendering a multi-view autostereoscopic 3D video display.
- In another embodiment, the computer-implemented system may be a real-time 3D video conferencing system. In yet another embodiment of the computer-implemented system, the received video stream may be a 3D stereo video stream of images having two views (left and right) per frame. In a further embodiment, the computing module may be further configured to apply at least one one-dimensional (1D) filter corresponding to at least one of the one or more spatial band pass filters. In another embodiment, the computing module may be further configured to apply at least one two-dimensional (2D) filter corresponding to at least one of the one or more spatial band pass filters. In yet another embodiment, the computing module may be further configured to compute the output images in a manner that results in a stereo disparity expansion of views without need of a dense depth map reconstruction.
- In yet another embodiment of the computer-implemented system, the display module may be further configured to enable a user to adjust disparity range in the output images by any of: (i) adjusting a magnification factor in the given spatial band pass filter, and (ii) at least one of defining and translating a disparity mapping function to map a certain phase shift at the spatial frequency of the given spatial band pass filter to a new phase shift. In another embodiment of the computer-implemented system, the computing module may be further configured to interpolate in-between views. In a further embodiment, the computing module may be further configured to apply spatial band pass filters including capturing correspondence between views using phase differences for multiple spatial frequencies and orientations separately. The computing module may be further configured to compute local depth, including representing local depth as a plurality of values instead of as a single value.
- In a further embodiment of the computer-implemented system, the display module may be further configured to drive the display and the computing module may be further configured to receive the video stream in real-time. In another embodiment, the computing module may be configured to prealign the initial views with each other before the computing module is configured to apply the one or more spatial band pass filters. In another embodiment, the optional antialiasing may be used for adding depth-of-field effect. In a further embodiment, the plurality of views may include a relatively low number of views.
- An alternative embodiment is directed to a non-transitory computer readable medium having stored thereon a sequence of instructions which, when loaded and executed by a processor coupled to an apparatus, causes the apparatus to: receive a video stream formed of a sequence of frames, each frame having image content corresponding to a plurality of views, the views being initial views; apply one or more spatial band pass filters to the received image content resulting in filtered images, each spatial band pass filter having a respective spatial frequency band; compute, from the filtered images, one or more output images that synthesize additional views with respect to the initial views, the output images computed from the filtered images of a given spatial band pass filter corresponding to different visual disparities for the respective spatial frequency band of that given band pass filter; enable optionally including removing inter-view (inter-perspective) aliasing by filtering the output images according to local depth using phase shift instead of recovering depth information; and drive a display with the computed and optionally anti-aliased filtered output images, rendering a multi-view autostereoscopic 3D video display.
- The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided by the Office upon request and payment of the necessary fee.
- The foregoing will be apparent from the following more particular description of example embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating embodiments of the present invention.
-
FIG. 1A illustrates the present invention method and system presented by Applicants that takes a stream of stereo images as an input, synthesizes additional views that are preferably used for an automultiscopic display, and performs filtering. (“Big Buck Bunny” © by Blender Foundation). -
FIG. 1B illustrates a non-limiting flow-chart of the present invention method and system of FIG. 1A. - 
FIG. 2 is a schematic view of the present invention method and system that takes a 3D stereo stream as an input, and performs a view expansion together with an antialiasing filtering to obtain a correct input for an automultiscopic display. (“Sintel” © by Blender Foundation). -
FIG. 3 is a graph illustration of an embodiment of Applicants' view expansion. -
FIGS. 4A-4D show embodiments of an automultiscopic display that provide superior image artifact handling, as compared with the existing approaches of ground truth and depth-based rendering. (“Big Buck Bunny” © by Blender Foundation). -
FIG. 5 shows another embodiment of an automultiscopic display that provides superior image artifact handling, as compared with the existing approach of depth-based rendering. (“Sintel” © by Blender Foundation). -
FIG. 6 illustrates an embodiment of an automultiscopic display that provides superior reconstruction of reflective and transparent objects, as compared with the existing approach of depth-image-based rendering (DIBR). -
FIG. 7 is a colormap visualizing errors between depth-based rendering and ground truth (top), as well as errors between an embodiment of the present invention and ground truth (bottom), for the example embodiments from FIGS. 4A-4D. (“Big Buck Bunny” © by Blender Foundation). - 
FIG. 8 illustrates that an embodiment of the present invention supports disparity manipulations. (“Sintel” © by Blender Foundation). -
FIG. 9 is an example embodiment that shows how very large magnification factors (increasing from left to right) may affect the final quality of results. (See “The Stanford Light Field Archive,” which is available from the Internet at lightfield.standford.edu, June 2008). -
FIG. 10 is an example embodiment with four input images, in which the present invention creates views both in the horizontal direction and in the vertical direction. (See “The Stanford Light Field Archive,” which is available from the Internet at lightfield.standford.edu, June 2008). -
FIG. 11 is a block diagram of an embodiment of the present invention. - A description of example embodiments of the invention follows.
- The teachings of all patents, published applications and references cited herein are incorporated by reference in their entirety.
- An automultiscopic display may reproduce multiple views corresponding to different viewing angles, thereby allowing for a glasses-free 3D and more immersive viewing experience for a user. In order to achieve multiple images from different locations, the views are preferably provided to the display. One standard technique to acquire multiple images from different locations is to use a camera array. Such camera array systems may include calibrated and synchronized sensors, which may record a scene from different locations. The number of cameras may range from a dozen (see for example, the following publication: Matusik, W., and Pfister, H., “3D TV: A Scalable System for Real-Time Acquisition, Transmission, and Autostereoscopic Display of Dynamic Scenes,” ACM Trans. Graph., 23, 3, August 2004, pg. 814-824) to over a hundred (also see for example, the following publication: Wilburn, B. S., Smulski, M., Lee, H. H. K., and Horowitz, M. A., “Light Field Video Camera,” in Electronic Imaging, International Society for Optics and Photonics, July 2002, pg. 29-36). However, such camera setups may be impractical (see for example, the following publication: Farre, M., Wang, O., Lang, M., Stefanoski, N., Hornung, A., and Smolic, A., “Automatic Content Creation for Multiview Autostereoscopic Displays Using Image Domain Warping,” in IEEE International Conference on Multimedia and Expo, July 2011, 6 pages) and too expensive for commercial use. Instead, it is possible to use image-based techniques to generate missing views. Most camera setup techniques preferably recover depth information first, and then use a view synthesis method for computing additional views (see for example, the following publication: Smolic, A., Muller, K., Dix, K., Merkle, P., Kauff, P., and Wiegand, T., “Intermediate View Interpolation Based on Multiview Video Plus Depth for
Advanced 3D Video Systems,” in IEEE International Conference on Image Processing, October 2008, pg. 2448-2451). Although there are a number of techniques that try to recover depth information from stereo views (see for example, the following publication: Brown, M. Z., Burschka, D., and Hager, G. D., “Advances in Computational Stereo,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 25, 8, August 2003, pg. 993-1008), recovering depth information from stereo views is an ill-posed problem. Most existing methods are prone to artifacts and temporal inconsistency. The quality of estimated depth maps may be improved in a post-processing step (see for example, the following publication: Richardt, C., Stoll, C., Dodgson, N. A., Seidel, H.-P., and Theobalt, C., “Coherent Spatiotemporal Filtering, Upsampling and Rendering of RGBZ Videos,” Computer Graphics Forum (Proc. Eurographics), 31, 2, May 2012, pg. 247-256). However, post-processing may be a time consuming process. Instead of recovering dense map correspondence, sparse depth maps may be recovered and a warping technique used to compute new views (see Farre, M., Wang, O., Lang, M., Stefanoski, N., Hornung, A., and Smolic, A., “Automatic Content Creation for Multiview Autostereoscopic Displays Using Image Domain Warping,” in IEEE International Conference on Multimedia and Expo, July 2011, 6 pages). Such recovery methods may produce good results but at an expense of computational time which prevents real-time solutions. - Significant developments in display designs exist (see for example, the following publication: Holliman, N. S., Dodgson, N. A., Favalora, G. E., and Pockett, L., “Three-Dimensional Displays: A Review and Applications Analysis,” IEEE Transactions on Broadcasting, 57, 2, June 2011, pg. 362-371). Commercial automultiscopic displays are often based on parallax barriers and/or lenticular sheets. 
Both parallax barriers and lenticular sheets, placed atop a high resolution panel, trade spatial resolution for angular resolution, and produce multiple images encoded as one image on the panel (see for example, the following publications: Lipton, L., and Feldman, M. H., “New autostereoscopic display technology: The SynthaGram,” in Electronic Imaging, International Society for Optics and Photonics, January 2002, pg. 229-235; and Schmidt, A., and Grasnick, A., “Multi-viewpoint Autostereoscopic Displays from 4D-Vision,” in Electronic Imaging, May 2002 pg. 212-221). Multi-view projector systems also exist (Matusik, W., and Pfister, H., “3D TV: A Scalable System for Real-Time Acquisition, Transmission, and Autostereoscopic Display of Dynamic Scenes,” ACM Trans. Graph., 23, 3, August 2004, pg. 814-824; and Balogh, T., “The HoloVizio System,” in Electronic Imaging, January 2006, pg. 60550U-1-60550U-12). An attempt of building a display which reproduces the entire light field includes a display with 256 views, proposed by Takaki, Y., and Nago, N., “Multi-projection of lenticular displays to construct a 256-view super multi-view display,” Optics Express, 18, 9, April 2010, pg. 8824-8835. Also compressive and multi-layer displays introduce more sophisticated hardware solutions (see for example, the following publications: Akeley, K., Watt, S. J., Girshick, A. R., and Banks, M. S., “A Stereo Display Prototype with Multiple Focal Distances,” ACM Trans. Graph., 23, 3, August 2004, pg. 804-813; and Wetzstein, G., Lanman, D., Hirsch, M., and Raskar, R., “Tensor Displays: Compressive Light Field Synthesis using Multilayer Displays with Directional Backlighting,” ACM Trans. Graph. (Proc. SIGGRAPH), 31, 4, July 2012, pg. 80:1-80:11.). The above-mentioned trends make the multi-view autostereoscopic display a promising solution.
- Automultiscopic screens preferably produce a light field, which may include a continuous four-dimensional (4D) function representing radiance with respect to a position and a viewing direction (see for example, the following publication: Levoy, M., and Hanrahan, P., “Light Field Rendering,” in Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, ACM, August 1996, pg. 31-42). Due to the discrete nature of an acquisition (i.e., limited number of views), a recorded light field is preferably aliased. A plenoptic sampling theory analyzes the spectrum of a reconstructed light field (see for example, the following publications: Chai, J. X., Tong, X., Chan, S. C., and Shum, H. Y., “Plenoptic Sampling,” in Proceedings of the 27th annual conference on Computer graphics and interactive techniques, ACM Press/Addison-Wesley Publishing Co., July 2000, pg. 307-318; and Isaksen, A., McMillan, L., and Gortler, S. J., “Dynamically Reparameterized Light Fields,” in Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, ACM Press/Addison-Wesley Publishing Co., July 2000, pg. 297-306).
- Based on the above-mentioned existing approaches, some techniques allow for antialiasing of the recorded light field (see Isaksen, A., McMillan, L., and Gortler, S. J., “Dynamically Reparameterized Light Fields,” in Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, ACM Press/Addison-Wesley Publishing Co., July 2000, pg. 297-306; and Stewart, J., Yu, J., Gortler, S. J., and McMillan, L., “A New Reconstruction Filter for Undersampled Light Fields,” in Proceedings of the 14th Eurographics workshop on Rendering, Eurographics Association, June 2003, pg. 150-156). In the context of an automultiscopic display, aliasing may be due to undersampling of the light field and also because of the limited bandwidth of the display. One approach (see Zwicker, M., Matusik, W., Durand, F., and Pfister, H., “Antialiasing for
Automultiscopic 3D Displays,” in Proceedings of the 17th Eurographics conference on Rendering Techniques, Eurographics Association, June 2006, pg. 73-82, hereinafter “Zwicker”), takes both sources of aliasing (undersampling and limited bandwidth, respectively) into account and presents a combined antialiasing framework which filters input views coming from a camera array. However, in the approach of Zwicker, a large number of views is preferably used, which may make the solution in Zwicker impractical in a scenario when 3D stereo content (two views) is available. - A sequence of images, preferably used for an automultiscopic display, preferably corresponds to a set of views captured from different locations. Such a sequence of views may be captured by a camera moving horizontally on a straight line. The problem of creating additional views may be considered as similar to a motion editing problem when the motion in the scene comes from the camera movement.
- A number of techniques may magnify invisible motions. For example, in the Lagrangian approach, motion is explicitly estimated and then magnified, and an image-based technique is used to compute frames that correspond to a modified flow (see for example, the following publication that is hereby incorporated by reference: Liu, C., Torralba, A., Freeman, W. T., Durand, F., and Adelson, E. H., “Motion Magnification,” ACM Trans. Graph., 24, 3, July 2005, 519-526). A Eulerian approach may eliminate the need for flow computation. Instead of using flow computation, the Eulerian approach processes the video in space and time to amplify the temporal color changes (see Wu, H. Y., Rubinstein, M., Shih, E., Guttag, J., Durand, F., and Freeman, W. T., “Eulerian Video Magnification for Revealing Subtle Changes in the World,” ACM Trans. Graph. (Proc. SIGGRAPH) 31, 4, July 2012, pg. 65:1-65:8, hereby incorporated by reference). A phase-based technique benefits from the observation that in many cases motion may be encoded in a complex-valued steerable pyramid decomposition as variation of the coefficients (see Wadhwa, N., Rubinstein, M., Guttag, J., Durand, F., and Freeman, W. T., “Phase-Based Video Motion Processing,” ACM Trans. Graph. (Proc. SIGGRAPH), 32, 4, July 2013, pg. 80:1-80:10, hereby incorporated by reference, hereinafter “Wadhwa”). Compared to previous techniques, the method in Wadhwa does not require motion computation and may handle much bigger displacements than the Eulerian approach. In at least one embodiment, the method and system of the present invention is inspired by the methods of Wadhwa, and, as such, also does not require motion computation and may handle much bigger displacements than the Eulerian approach. 
In one embodiment, instead of estimating correspondence (depth) between two stereo views (e.g., a left view and a right view), correspondence is assumed to be encoded in the phase shift once the left and right views are decomposed into complex-valued steerable pyramids.
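- As a minimal sketch of the idea that correspondence may be carried by a phase shift, the following Python example considers a single complex band coefficient of a sinusoid seen in a left and a right view. It is an illustration only, not the implementation of the present invention: the function names `band_coefficient` and `extrapolate_view`, and all numeric values, are hypothetical, and a real system would operate on every coefficient of a steerable pyramid. The sketch also assumes the phase difference stays within (-π, π], so no phase unwrapping is needed.

```python
import cmath

def band_coefficient(shift, omega, x, amplitude=1.0):
    """Complex band coefficient at position x for a sinusoid displaced by `shift`."""
    return amplitude * cmath.exp(1j * omega * (x + shift))

def extrapolate_view(coeff_left, coeff_right, alpha):
    """Push the right-view coefficient further by alpha times the L-to-R phase difference."""
    # Phase difference between corresponding coefficients, wrapped to (-pi, pi].
    phase_diff = cmath.phase(coeff_right * coeff_left.conjugate())
    return coeff_right * cmath.exp(1j * alpha * phase_diff)

omega = 0.5       # spatial frequency of the band in radians per pixel (hypothetical)
disparity = 1.2   # shift between the left and right views in pixels (hypothetical)
x = 3.0

left = band_coefficient(0.0, omega, x)
right = band_coefficient(disparity, omega, x)

# alpha = 1 moves the view one more baseline to the right, i.e. twice the input disparity.
new_right = extrapolate_view(left, right, alpha=1.0)
expected = band_coefficient(2.0 * disparity, omega, x)
```

No depth value is estimated at any point in this sketch; the disparity enters only through the phase difference of the two coefficients.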
- View Expansion
- In one embodiment, using view expansion, the proposed approach (method and system of the present invention) takes as an input a standard 3D stereo video stream (e.g., left and right view), and creates additional views that may be used on an automultiscopic display. The proposed approach is inspired by a phase-based motion magnification technique. Therefore, to follow, a short overview is provided for this phase-based motion magnification method, and then an explanation is provided how the phase-based magnification method may be adapted to create additional views for an automultiscopic display.
- Phase-Based Motion Magnification
- Phase-based motion magnification exploits the steerable pyramid decomposition, which decomposes images according to the spatial scale and orientation. See for example, the following publications that are hereby incorporated by reference: Simoncelli, E. P., Freeman, W. T., Adelson, E. H., and Heeger, D. J., “Shiftable Multiscale Transforms,” IEEE Transactions on Information Theory, 38, 2, March 1992, pg. 587-607; Simoncelli, E. P., and Freeman, W. T., “The Steerable Pyramid: A Flexible Architecture for Multi-Scale Derivative Computation,” in IEEE International Conference on Image Processing, vol. 3, October 1995, pg. 444-447. If the input signal is a sine wave, a small motion may be encoded in the phase shift between frames. Therefore, the motion may be magnified by modifying the temporal changes of the phase.
- In order to compute the steerable pyramid, a series of filters Ψω,Θ may be used. These filters may correspond to one filter, which may be scaled and rotated according to the scale ω and the orientation Θ. The steerable pyramid may then be built by applying the filters to the discrete Fourier transform (DFT) Ĩ of each image I from the video sequence. In this manner, a given frame may be decomposed into a number of frequency bands Sω,Θ, each of which has DFT S̃ω,Θ = Ĩ Ψω,Θ. One advantage of such a decomposition is that the response of each filter may be localized, which enables processing of phases locally.
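- The relation "DFT of a band equals the DFT of the image multiplied by a filter" can be sketched in one dimension as follows. This is a simplified illustration under stated assumptions: the filter here is an ideal one-sided mask over DFT bins (a stand-in, not the smooth scaled and rotated steerable pyramid filters), the helper names `dft`, `idft` and `band_pass` are hypothetical, and a naive O(N²) transform is used so that the example needs only the Python standard library.

```python
import cmath
import math

def dft(values):
    """Naive discrete Fourier transform of a sequence."""
    n = len(values)
    return [sum(values[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]

def band_pass(values, k_low, k_high):
    """Return the complex band whose DFT is DFT(values) times a one-sided mask."""
    spectrum = dft(values)
    # Ideal mask over positive-frequency bins only (assumption, for illustration).
    psi = [1.0 if k_low <= k <= k_high else 0.0 for k in range(len(values))]
    return idft([s * p for s, p in zip(spectrum, psi)])

n = 64        # number of samples in one image row (hypothetical)
k0 = 5        # frequency bin of the test sinusoid (hypothetical)
shift = 2.0   # displacement of the sinusoid in samples (hypothetical)
image_row = [math.cos(2.0 * math.pi * k0 * (m + shift) / n) for m in range(n)]

band = band_pass(image_row, 3, 7)
```

Because the mask keeps only positive frequencies, the resulting band is complex-valued: its magnitude gives the local amplitude and its phase encodes the displacement of the sinusoid.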
- A one-dimensional (1D) case is considered, e.g., a 1D intensity profile ƒ translating over time with a constant velocity, in order to provide a non-limiting example of how the phase-based motion magnification works. If the displacement is given by a function δ(t), the image changes over time according to ƒ(x+δ(t)). The function ƒ(x+δ(t)) may be expressed in the Fourier domain as a sum of complex sinusoids:
- $$f(x+\delta(t)) = \sum_{\omega=-\infty}^{\infty} A_\omega e^{i\omega(x+\delta(t))}, \tag{1}$$
- where ω is a single frequency and A is amplitude of the sinusoid. From this, a band corresponding to the frequency ω is given by:
-
$$S_\omega(x,t) = A_\omega e^{i\omega(x+\delta(t))}. \tag{2}$$
- The quantity ω(x+δ(t)) is the phase of the sinusoid, and it may include the motion information, which may be directly amplified. However, changing individual phases may not lead to meaningful motion editing because the motion may be encoded in the relative changes of the phase over time. To amplify motion, first, the phase may be filtered in the temporal direction to isolate desired phase changes, Bω(x,t). Next, the filtered phase may be multiplied by a magnification factor α, and the original phase in band Sω may be increased by the amplified signal αBω(x,t). Assuming that the filtering applied to the phase removes direct current (DC) components, the new modified sub-band with amplified motion is:
-
$$\hat{S}_\omega(x,t) = S_\omega(x,t)\, e^{i\alpha B_\omega(x,t)} = A_\omega e^{i\omega(x+(1+\alpha)\delta(t))}. \tag{3}$$
- The above-mentioned method generalizes to the two-dimensional (2D) case, where the steerable pyramid decomposition uses filters with a finite spatial support, thereby enabling detecting and amplifying local motions. Additional details regarding the above-mentioned method may be found in the following publications, which are hereby incorporated by reference: Wadhwa, N., Rubinstein, M., Guttag, J., Durand, F., and Freeman, W. T., “Phase-Based Video Motion Processing,” ACM Trans. Graph. (Proc. SIGGRAPH), 32, 4, July 2013, pg. 80:1-80:10; and U.S. patent application Ser. No. 13/607,173, filed on Sep. 7, 2012, now U.S. Patent Publication No. 2014/0072228, published on Mar. 13, 2014.
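- The 1D magnification step described above can be checked numerically for a single band. The sketch below is illustrative only: `magnify_band` is a hypothetical name, the temporal filter is assumed to be a plain mean subtraction (the simplest filter that removes the DC component), and the phases are assumed to stay within (-π, π] so that no unwrapping is required.

```python
import cmath
import math

def magnify_band(band_over_time, alpha):
    """Scale the displacement encoded in one complex band by (1 + alpha)."""
    phases = [cmath.phase(c) for c in band_over_time]
    mean_phase = sum(phases) / len(phases)
    # Temporal filtering (assumed here to be mean removal) isolates the phase changes B.
    filtered = [p - mean_phase for p in phases]
    # Increase the original phase by alpha * B, as in the modified sub-band equation.
    return [c * cmath.exp(1j * alpha * b) for c, b in zip(band_over_time, filtered)]

amplitude, omega, x, alpha = 2.0, 0.8, 1.5, 4.0   # hypothetical values
frames = 16
# Zero-mean displacement over one full period, so mean removal leaves omega * delta(t).
delta = [0.1 * math.sin(2.0 * math.pi * t / frames) for t in range(frames)]
band = [amplitude * cmath.exp(1j * omega * (x + d)) for d in delta]

magnified = magnify_band(band, alpha)
expected = [amplitude * cmath.exp(1j * omega * (x + (1.0 + alpha) * d)) for d in delta]
max_error = max(abs(m - e) for m, e in zip(magnified, expected))
```

The amplitude of each coefficient is untouched; only the phase is modified, which is why the displacement grows while the band energy stays the same.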
- Proposed Approach
- Applicants' proposed approach takes a stereoscopic stream as an input and produces a correctly filtered multi-view video for a given automultiscopic display (see the electronic color version of the following paper that uses the proposed approach, hereby incorporated by reference: Didyk, P., Sitthi-Amorn, P., Freeman, W. T., Durand, F., and Matusik, W., “Joint View Expansion and Filtering for
Automultiscopic 3D Displays,” ACM Trans. Graph., 32, 6, November 2013, Article No. 221, hereinafter “Applicants' paper”). -
FIG. 1A illustrates the method and system 100 presented by Applicants that takes a stream of stereo images as an input 102 and synthesizes (and/or creates) additional (and/or output) views 104 that are preferably used for an automultiscopic display. The output views 104 are also filtered by the method and system 100 to remove inter-view aliasing.
- FIG. 1B illustrates a non-limiting flow-chart of the present invention method and system 100 of FIG. 1A. An embodiment includes a computer-implemented method that uses at least one processor and at least one associated memory. The embodiment 100 receives 112 a video stream formed of a sequence of frames. Each frame may have image content corresponding to a plurality of views, and the views may be initial views. Next, the system/method 100 applies 114 one or more spatial band pass filters to the received image content, resulting in filtered images. Each spatial band pass filter may have a respective spatial frequency band. From the filtered images, the system/method 100 computes 116 one or more output images that synthesize additional views with respect to the initial views. The output images may be computed from the filtered images of a given spatial band pass filter corresponding to different visual disparities for the respective spatial frequency band of that given band pass filter. The computing of output images may perform anti-aliasing as an option 118. That is, system/method 100 allows at 118 optionally removing inter-view (inter-perspective) aliasing by filtering the output images according to local depth using phase shift instead of recovering depth information. Lastly, system/method 100 drives a display with the computed and optionally anti-aliased filtered output images, rendering a multi-view autostereoscopic 3D video display 120.
- FIG. 2 depicts a schematic view of the proposed approach 100 that takes a 3D stereo stream as an input 202, and performs a view expansion together with antialiasing filtering 208 to obtain a correct input for an automultiscopic display 210 with different views 212.
- As illustrated in FIG. 2, in order to expand 3D stereo content to a multi-view video stream, the following observation is made. Similarly to motion magnification, where the motion information may be mostly encoded in the phase change, the parallax between two neighboring views may be encoded in the phase difference. In one embodiment, FIG. 2 illustrates two frames (left 204 and right 206).
- FIG. 3 illustrates various graphical embodiments of Applicants' method and system 100 of the present invention, including the view expansion process. A magnification factor α (see FIG. 3) is preferably adjusted according to the position of the virtual camera 1120 for which the view is generated. The present invention method and system 100 may synthesize new views (e.g., create generated views, 1116) in an outward direction (as shown in cases 1102, 1104, 1106), but also interpolate in-between views (as shown in case 1106). New views (1116) may be reconstructed from one or more input images. In the illustrated cases, the left input image 204 may be used to reconstruct one or more images 1116 to the left of the given image 204 (see corresponding blue regions in FIG. 3). Also, the right input image 206 may be used to reconstruct one or more images 1116 to the right of the given image 206 in FIG. 3 (see corresponding locations in green regions in FIG. 3).
- Similarly to FIG. 2, which has a left frame 204 and a right frame 206, in an embodiment, FIG. 3 illustrates cases with a left frame 204 and a right frame 206. In FIG. 3, instead of analyzing the phase changes in the temporal domain, the present invention method and system accounts for phase differences in corresponding bands between two input views.
- As illustrated in the example cases of FIG. 3, in order to create the additional views 1116, the present invention method and system 100 may take two or more input views that are also one or more left stereo frames, L (204), and one or more right stereo frames, R (206), and perform the steerable pyramid decomposition on both left and right frames. The method and system 100 may then compute the phase difference for each complex coefficient. After modifying the phase differences according to the α value (see FIG. 3) and collapsing the pyramids, two or more nearby views are created (see elements 1116). In at least one embodiment, an advantage of the present invention method and system 100 is that it provides a stereo disparity expansion without a requirement of dense depth map reconstruction, thereby avoiding the significant artifacts to which dense depth map reconstruction is prone.
- A process of the present invention method and system 100 may be defined as follows:
- (L′,R′)=M(L,R,α), (4)
- where M is the view generation process, and L′ and R′ are the nearby views 1116 according to the magnification factor α (see FIG. 3). The magnification factors may be computed based on virtual camera positions 1120 that the images correspond to. The input images may coincide with locations −x0 (1130a) and x0 (1130b), corresponding to the left view, L (204), and the right view, R (206), respectively. The magnification factor for an arbitrary location x on a given x-axis 1120 preferably is set to α=(|x|−x0)/(2x0) (see FIG. 3). Because a new image is preferably reconstructed from the input view which is closest to the new location, location x (1130d) and location −x (1130c) preferably use the same α value (1118d, 1118c, respectively). The process of choosing correct magnification factors (α values) is shown in FIG. 3. The FIG. 3 examples 1102, 1104, 1106 illustrate view expansion, preferably in an outward direction.
- Antialiasing for Automultiscopic Display
- The present invention method and system 100 for new view generation may produce images without interperspective aliasing. To produce images without interperspective aliasing, the views are preferably filtered according to the local depth. The process is similar to adding a depth-of-field effect.
- A naïve and costly way to filter a single view is to generate a number of neighboring views and average them using weights corresponding to the distance from the original view. In contrast, a key advantage of the present invention method and system 100 is that it may perform the filtering directly on the steerable pyramid decomposition. The present invention method and system 100 may derive a closed form solution that may be performed at almost no additional computational cost.
- Filtering Equation
- In one embodiment, the above defined function M may include two or more functions (for right and left views respectively): MR and ML. The functions MR and ML may return one of the views, e.g., R′ or L′ respectively. The process of antialiasing may be analogous (and/or the same) for both right hand and left hand views R′ and L′. The case of the right hand R′ view is described as follows.
- In order to be filtered, R′ is preferably averaged with its neighboring views according to the weights given by a low pass filter along the viewpoint dimension. In one embodiment, the filter is given as a function. The anti-aliased view R̂′ may correspond to a fixed α value, and R̂′ may be computed as follows:
- In order to perform the filtering directly on the pyramid decomposition, the present invention method and
system 100 may approximate the above integration before the reconstruction of the pyramid for each sub-band of R′ separately. In one embodiment, considering one band Ŝω(x,y,α) of the decomposition of R′, the corresponding filtered sub-band may be computed as: - which may be further transformed:
-
- In one embodiment, the final filtered sub-band may include two components. The first component, Sω(x,y), may comprise a sub-band of the original view R. The second component may comprise the corresponding integral component, namely the integral over β of the filter evaluated at (β−α) multiplied by e^(iωβδ), which preferably depends on the phase shift δ. The dependence on δ may be convenient because in many cases the final filtered sub-band may have a closed-form solution, or it may be pre-computed and stored as a lookup table parameterized by the phase shift δ.
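The two-component structure described above can be sketched as a short derivation. The filter symbol F and the sub-band model S_ω(x,y,β) = S_ω(x,y)·e^{iωβδ} (a sub-band of R shifted in phase in proportion to the viewpoint offset β) are assumptions introduced here for illustration; they are chosen only to agree with the components named in the text.

```latex
\begin{align}
\hat{S}_\omega(x,y,\alpha)
  &= \int F(\beta-\alpha)\, S_\omega(x,y,\beta)\, d\beta \\
  &= \int F(\beta-\alpha)\, S_\omega(x,y)\, e^{i\omega\beta\delta}\, d\beta \\
  &= S_\omega(x,y) \int F(\beta-\alpha)\, e^{i\omega\beta\delta}\, d\beta .
\end{align}
```

Under these assumptions the integral depends on the image content only through the phase shift δ, which is why it may be tabulated once per sub-band frequency.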
-
-
- which may result in each sub-band of view R′ being:
-
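As a hedged illustration of a closed-form case, assume the low pass filter is a Gaussian with standard deviation σ (the comparison section refers to Gaussian weights; the exact filter used is an assumption here). The integral of G_σ(β−α)·e^{iωβδ} over β then equals e^{iωαδ}·exp(−σ²ω²δ²/2), i.e. a phase term times a real attenuation factor. The sketch below checks this identity numerically; the function names are hypothetical.

```python
import numpy as np

def gaussian_attenuation(omega, delta, sigma):
    """Closed-form attenuation exp(-(sigma*omega*delta)^2 / 2) for a Gaussian filter (assumed)."""
    return np.exp(-0.5 * (sigma * omega * delta) ** 2)

def filter_integral_numeric(omega, delta, sigma, alpha, n=20001, span=10.0):
    """Brute-force Riemann sum of G_sigma(beta - alpha) * exp(i*omega*beta*delta) over beta."""
    beta = np.linspace(alpha - span * sigma, alpha + span * sigma, n)
    step = beta[1] - beta[0]
    weights = np.exp(-0.5 * ((beta - alpha) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    integrand = weights * np.exp(1j * omega * beta * delta)
    return np.sum(integrand) * step

# Compare the numerical integral with the closed form for one arbitrary parameter set.
omega, delta, sigma, alpha = 2.0, 0.3, 0.5, 1.0
numeric = filter_integral_numeric(omega, delta, sigma, alpha)
closed = gaussian_attenuation(omega, delta, sigma) * np.exp(1j * omega * alpha * delta)
print(abs(numeric - closed))
```

With a zero phase shift the attenuation factor is exactly 1, so content with no measured phase difference is left unfiltered in this model.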
- In at least one embodiment, the above equations for S̃ω(x,y,α) preferably assume a good estimation of the phase shift δ. A phase-based approach (see for example, the following publication that is hereby incorporated by reference: Wadhwa, N., Rubinstein, M., Guttag, J., Durand, F., and Freeman, W. T., “Phase-Based Video Motion Processing,” ACM Trans. Graph. (Proc. SIGGRAPH), 32, 4, July 2013, pg. 80:1-80:10) may underestimate the phase shift δ, which may lead to insufficient filtering. Insufficient filtering may occur when the assumption that the correspondence between two views is encoded in the phase difference fails. The present invention method and
system 100 overcomes the above-mentioned deficiency by correcting the phase shift in each sub-band separately, based on the phase shift in the corresponding sub-band for the lower frequency. In one embodiment, before applying the factor responsible for the filtering, the present invention method and system 100 processes the entire pyramid, starting from the lowest frequency level. Whenever the phase shift on the level below is greater than π/2 (90 degrees), the phase shift at the current level may be underestimated. In such a case, the present invention method and system 100 corrects the phase shift by setting its value to twice the phase shift on the lower level. Therefore, the present invention method and system 100 provides a correct phase shift estimation, preferably under the assumption that the correspondence between the input views behaves locally as a translation. Although correct phase shift estimation may not be crucial for motion magnification or nearby view synthesis, it may be important for correct antialiasing filtering.
- Results
- Various embodiments implementing the above approach are provided. In one embodiment, implementation details and standard running times are included. In an embodiment, detailed comparison is provided between the present invention method and
system 100 and a state-of-the-art depth image-based rendering (DIBR) technique. In an embodiment, a real-time 3D video conferencing system is presented in order to showcase the robustness and efficiency of the inventive method. In an embodiment, the present invention method and system 100 is applied to depth remapping.
- Implementation Details
- In one embodiment, the present invention method and
system 100 is implemented on a GPU using the CUDA (Compute Unified Device Architecture) API (Application Programming Interface), and processes sequences using an NVIDIA GTX TITAN graphics card on an INTEL XEON machine. In one embodiment, the corresponding steerable pyramid uses eight orientations, which provides a good trade-off between quality and performance. In one embodiment, the time expended in building a pyramid and reconstructing one additional view is preferably independent of the image content, and is preferably 15 ms (milliseconds) for building and 12 ms for reconstructing, assuming content with 816×512 resolution. The present invention method and system 100 enables reconstruction of eight views for a standard automultiscopic display at a rate of 8.3 FPS (frames per second). An advantage of the present invention method and system 100 is that its memory requirement is relatively low. In one embodiment, each pyramid preferably requires 137 MB (megabytes) of memory. Hence, in one embodiment, to process an input stereo sequence, 3×137 MB of memory is required (that is, 2×137 MB for the two input views and 137 MB for the synthesized view).
- Comparison to Depth-Based Techniques
- Existing real-time methods fail to directly compute properly filtered content for automultiscopic 3D displays based on a stereoscopic video stream. In order to compare existing real-time methods against the present invention method and
system 100, the following comparison is made between the present invention method and system 100 and a combination of depth-based rendering and antialiasing (e.g., a hypothetical competitive method). The hypothetical competitive method takes a stereoscopic video stream as an input and reconstructs a depth map for each image pair. Then, the competitive method applies a real-time warping technique for synthesis of additional views. In order to obtain one antialiased view, the competitive method averages 30 neighboring views according to Gaussian weights similar to those that are mentioned above. For estimating depth, a recent technique is used (see Hosni, A., Rhemann, C., Bleyer, M., Rother, C., and Gelautz, M., “Fast Cost-Volume Filtering for Visual Correspondence and Beyond,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 2, February 2013, pg. 504-511, incorporated herein by reference). The view synthesis applied is similar to an existing approach (see Didyk, P., Ritschel, T., Eisemann, E., Myszkowski, K., and Seidel, H.-P., “Adaptive Image-space Stereo View Synthesis,” in Proc. VMV, November 2010, 8 pages, incorporated herein by reference). A combination of the two above-mentioned techniques provides a good trade-off between quality and performance.
- The above-mentioned depth-based rendering is compared with the present invention method and
system 100 in at least three non-limiting example embodiments to follow. Two of the example embodiments are computer generated animations (FIGS. 4A-4D and FIG. 5). The third example embodiment (FIG. 6) is a photograph taken using a 3D camera (an LG OLYMPUS P725 camera). The third example is particularly challenging because the captured scene may include both reflections and transparent objects.
- For the sequence from
FIGS. 4A-4D, a dense light field is computed (a hundred views, for non-limiting example). The dense light field enables the use of a ground truth method 412, e.g., the antialiasing technique proposed by Zwicker, M., Matusik, W., Durand, F., and Pfister, H., “Antialiasing for Automultiscopic 3D Displays,” in Proceedings of the 17th Eurographics conference on Rendering Techniques, Eurographics Association, June 2006, pg. 73-82. FIG. 4A shows a comparison of different content creation approaches for automultiscopic display. In FIG. 4A, the green insets of existing approaches may be compared with that of the method and system 100 of the present invention (see green inset 436), which may apply antialiasing. Ghosting artifacts may be removed when the content is filtered, which may include rendering hundreds of views (412, 422). Although some artifacts 408 may be removed by filtering, existing image-based techniques combined with filtering, such as ground truth 412 or depth-based rendering 422, may introduce significant artifacts (see red insets 418, 428, respectively) when depth estimation or ground truth fails. These artifacts (408, 418, 428) may be corrected by the method and system 100 of the present invention, as shown in the red inset 438. Also in FIG. 4A, the blue inset 424 shows how incorrect depth estimation results in jaggy depth discontinuities that are not present in the other methods illustrated in FIG. 4A (see the other blue insets).
- By comparison to existing techniques, the present invention method and
system 100 produces results (see blue inset 434, green inset 436, and red inset 438) similar to rendering with filtering 422, but at improved costs that are similar to real-time image-based techniques. See also FIGS. 4B-4D, which represent enlarged images of the elements of FIG. 4A, in order to further emphasize the above-mentioned improvements of the present invention.
-
FIG. 5 shows a comparison between the method and system 100 of the present invention and depth-based rendering 422 for one of the synthesized views. Please note the artifacts of the depth-based rendering 422. The blue inset 504 shows how incorrect depth estimation of depth-based rendering 422 results in jaggy depth discontinuities. By contrast, the counterpart blue inset 508 shows that these discontinuities are corrected by the method and system 100 of the present invention. Also illustrated in FIG. 5, in red inset 502, depth estimation of the depth-based rendering technique 422 fails in reconstructing the depth of the out-of-focus butterfly. By contrast, as illustrated in the counterpart red inset 506, the method and system 100 of the present invention more accurately reconstructs the butterfly. Therefore, as illustrated in FIG. 5, the method and system 100 of the present invention produces more accurate (and/or correct) results compared with the depth-based rendering 422.
- As illustrated in
FIG. 6, transparent and highly reflective objects may be challenging for depth estimation and view synthesis methods. FIG. 6 shows the input images (top images 610, 612), views that are generated using a depth image-based technique 422 (middle images 620, 622), and views that are generated using the method and system 100 of the present invention (bottom images 630, 632). As illustrated in FIG. 6, the depth estimation technique shown in images 620, 622 does not properly reconstruct the highly reflective and transparent objects 602. By contrast, the method and system 100 of the present invention properly reconstructs 606 the original highly reflective and transparent objects 602.
- As illustrated, at least in
FIGS. 4A-4D and FIGS. 5-6, the method and system 100 of the present invention produces more graceful degradation of the image quality compared to the depth-based rendering (DIBR) method 422. It is important to note that artifacts produced by the depth-based technique 422 are mostly due to poor depth estimation and not due to incorrect view synthesis. Depth estimation is an ill-posed problem, and such existing DIBR methods 422 may not handle regions with non-obvious per-pixel depth values (e.g., transparencies, reflections, motion blur, defocus blur, and thin structures that have partial coverage), as shown in FIGS. 5-6. Real-time depth estimation methods 422 also have problems with temporal coherence. By contrast with DIBR 422, in at least one embodiment, the method and system 100 of the present invention improves results by avoiding visible and disturbing artifacts, even when coherence is not explicitly enforced. The improvements of the method and system 100 of the present invention are further illustrated in the video accompanying Applicants' paper mentioned above (see video that is hereby incorporated by reference, which is available on the Internet at people.csail.mit.edu, under the directory “pdidyk,” followed by the sub-directory “projects,” and the following sub-directory “MultiviewConversion,” as the file “Multiview Conversion.mp4,” and is also available on the Internet at www.youtube.com under the title “Joint View Expansion and Filtering for Automultiscopic 3D Displays,” hereinafter “Applicants' video” of Nov. 5, 2013).
-
FIG. 7 is a colormap visualizing errors between depth-based rendering and ground truth (top) 702, as well as visualizing errors between the method and system 100 of the present invention and ground truth (bottom) 712, for the examples from FIG. 4A. The differences are illustrated as errors 704, 714. As illustrated in FIG. 7, the error 704 produced by the depth-based technique is localized mostly around depth discontinuities in the image 702. By contrast, the error 714 produced by the method and system 100 of the present invention is distributed more uniformly across the image 712, and is therefore less disturbing.
- In addition, in an embodiment, the error of the method and
system 100 of the present invention may be significantly influenced by the different types of blur introduced by the compared methods. Referring back to FIG. 4A, while the ground-truth (412) and the depth-based (422) techniques filter images in the horizontal direction, an embodiment of the method and system 100 of the present invention may apply filtering that provides a more uniform blur, as illustrated in green inset 436 of FIGS. 4A-4D. In at least one embodiment, the method and system 100 of the present invention may filter images in both the horizontal direction and the vertical direction.
- In one embodiment, the improved results produced by the method and
system 100 of the present invention are a result of the overcomplete representation that it may use. While depth-based approaches estimate one depth value per pixel, which may lead to artifacts in complex cases where no such single value exists, the method and system 100 of the present invention may capture the correspondence between views using phase differences for multiple spatial frequencies and orientations separately. In at least one embodiment, the local depth is not required to be represented as one value, and instead the local depth may be represented as many values, which may also lead to improved performance, including in cases where the depth is not well-defined.
-
Standard 3D Stereo Content - To demonstrate the robustness of the method and
system 100 of the present invention, it is successfully tested on various sequences. These sequences often include severe compression artifacts, vertical misalignment, and visible color differences between cameras. The method and system 100 of the present invention may expand a stereoscopic video stream to a multi-view stream and display it on an 8-view automultiscopic screen. The method and system 100 of the present invention is shown to work well with these sequences, as illustrated at least in FIGS. 4A-4D and FIGS. 5-7. Video sequences are shown in the above-mentioned Applicants' video cited within Applicants' paper.
- 3D Video Conferencing System
- In one embodiment, a light-weight, real-time 3D video conferencing system is built, based on the method and system 100 of the present invention, which may include a fast view expansion technique. An embodiment of the 3D video conferencing system is illustrated in Applicants' video. In one embodiment, the 3D video conferencing system comprises at least eight cameras mounted on a linear rig and an automultiscopic display, although the system is not so limited and may comprise more or fewer cameras. The system may operate in at least the two following modes: (1) the system may use the eight cameras to acquire eight corresponding views, or (2) the system may use two of the cameras and compute the other six views using the method and system 100 of the present invention. In both modes, the eight views may be streamed in real-time to the screen, providing interactive feedback for the users. See Applicants' video for the comparison between views captured using cameras and those generated using the method and system 100 of the present invention. Note that the views rendered by the method and system 100 of the present invention are filtered to avoid aliasing, which is advantageous because the filtering adds no additional processing cost. In contrast, in existing approaches, original views captured by eight cameras may include aliasing. Such aliasing may be removed using the method presented by Zwicker, M., Matusik, W., Durand, F., and Pfister, H., “Antialiasing for Automultiscopic 3D Displays,” in Proceedings of the 17th Eurographics conference on Rendering Techniques, Eurographics Association, June 2006, pg. 73-82, incorporated herein by reference, with the aid of depth image-based rendering. However, that approach may be prohibitively expensive for a real-time system.
- Disparity Manipulations
- The method and
system 100 of the present invention may also be used for remapping disparities in stereoscopic images and videos. Such modifications are often desired and necessary in order to adjust the disparity range in the scene to a given comfort range (see for example, the following publication that is hereby incorporated by reference: Lambooij, M., Ijsselsteijn, W., Fortuin, M., and Heynderickx, I., “Visual Discomfort and Visual Fatigue of Stereoscopic Displays: A Review,” Journal of Imaging Science and Technology, 53, May-June 2009, pg. 030201-14), to viewer preferences, or for an artistic purpose (see for example, the following publication that is hereby incorporated by reference: Lang, M., Hornung, A., Wang, O., Poulakos, S., Smolic, A., and Gross, M., “Nonlinear Disparity Mapping for Stereoscopic 3D,” ACM Trans. Graph., 29, 4, July 2010, pg. 75:1-75:10). For example, NVIDIA 3D Vision may allow users to change the depth range using a simple knob. Methods that directly target automultiscopic displays also exist (see for example, the following publication that is hereby incorporated by reference: Didyk, P., Ritschel, T., Eisemann, E., Myszkowski, K., Seidel, H.-P., and Matusik, W., “A Luminance-Contrast-Aware Disparity Model and Applications,” ACM Trans. Graph. (Proc. SIGGRAPH Asia), 31, 6, November 2012, pg. 184:1-184:10).
- Using the method and
system 100 of the present invention, the disparity range in a given image may be changed by adjusting the corresponding α value in the above-mentioned view expansion of the method and system 100 of the present invention. The result of this adjustment is a global scaling of disparities. An example of such manipulations is presented in FIG. 8.
-
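A minimal sketch of how an α value relates to a global disparity scale follows. The first function restates the formula α = (|x| − x0)/(2x0) from the view expansion discussion. The second function is an assumption made here for illustration: it supposes that disparity is proportional to the camera baseline, so that a pair of views at −x and x has disparity d·|x|/x0 = d·(1 + 2α). Both function names are hypothetical.

```python
def magnification_factor(x, x0):
    """alpha = (|x| - x0) / (2 * x0) for a virtual camera at position x,
    with the input views located at -x0 and x0."""
    if x0 <= 0:
        raise ValueError("x0 must be positive")
    return (abs(x) - x0) / (2.0 * x0)

def scaled_disparity(d, alpha):
    """Disparity of the output pair (L', R') when both views use the same alpha.
    Assumes disparity scales linearly with the baseline (illustrative model only)."""
    return d * (1.0 + 2.0 * alpha)

# Views at -3*x0 and 3*x0 share alpha = 1 and triple the input disparity in this model.
alpha = magnification_factor(3.0, 1.0)
print(alpha, scaled_disparity(10.0, alpha))
```

In this model α = 0 reproduces the input pair, positive α widens the depth range, and negative α (positions between the input cameras) compresses it.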
FIG. 8 illustrates that the method and system 100 of the present invention supports disparity manipulations. FIG. 8 shows stereo images in anaglyph (and/or anaglyph 3D) version (red channel for the left eye and cyan for the right one) 802, 804, 806, 808 for the same scene with different depth ranges (depth increasing from left to right).
- In embodiments, 1D spatial band pass filters as well as 2D spatial band pass filters may be applied to the input stereoscopic images in the above-described approach by Applicants. In the case of a 1D filter, user adjustments may be more general (i.e., not limited to changing the magnification factor α). Applicants' approach is able to perform disparity mapping, including disparity mapping that is defined as a function mapping the input disparity to the output disparity. The method and
system 100 of the present invention enables the user to adjust the function that maps a certain phase shift at a given frequency level (given spatial band pass filter) to a new phase shift.
- In one embodiment, the phase-based approach may process video that exhibits small displacements (Wadhwa, N., Rubinstein, M., Guttag, J., Durand, F., and Freeman, W. T., “Phase-Based Video Motion Processing,” ACM Trans. Graph. (Proc. SIGGRAPH), 32, 4, July 2013, pg. 80:1-80:10, incorporated herein by reference). For larger displacements, the locality assumption of the motion may not hold. Therefore, for larger displacements, only lower spatial frequencies may be correctly reconstructed. In the context of view synthesis for multi-view autostereoscopic displays, this deficiency is largely alleviated by the need for interperspective antialiasing. In an embodiment, in a case where the view synthesis may not correctly reconstruct high frequencies for scene elements with large disparity, these high frequencies are preferably removed anyway because they usually lie outside of the display bandwidth and may lead to aliasing artifacts. For cases where magnification factors and/or the interaxial distance between input images are large, some artifacts may remain visible. However, the method and
system 100 of the present invention may reduce the number of cameras significantly. FIG. 9 visualizes a case where the magnification factor α values may be drastically increased.
- In an
embodiment 100, FIG. 9 shows how large magnification factors (increasing from left to right) may affect the final quality of results (see the images of FIG. 9).
- The method and
system 100 of the present invention is novel at least because it combines view synthesis and antialiasing for automultiscopic display, in contrast to existing approaches. In at least one embodiment, the method and system 100 of the present invention described herein does not require explicit depth estimation and thereby eliminates this source of artifacts. Instead, the method and system 100 of the present invention leverages the link between parallax and the local phase of Gabor-like wavelets, in practice complex-valued steerable pyramids. In one embodiment, this enables the method and system 100 of the present invention to exploit the translation-shift theorem and extrapolate the phase difference measured in the two input views. In one embodiment, the pyramid representation enables the method and system 100 of the present invention to integrate antialiasing directly and avoid expensive numerical prefiltering. The method and system 100 of the present invention derives a closed-form approximation to the prefiltering integral that results in a simple attenuation of coefficients based on the band and phase difference. The simplicity of the method and system 100 of the present invention is a key advantage because it enables an interactive implementation and provides robust performance even for difficult cases. The method and system 100 of the present invention also avoids artifacts at the focal plane, at least because the measured phase difference there is zero. For displays that reproduce both horizontal and vertical parallax, the method and system 100 of the present invention may be extended to generate small light fields.
- In an
embodiment 100 shown in FIG. 10, additional views 1010 are created in the horizontal as well as the vertical direction, using (and surrounding) four input images 1012. In FIG. 10, the top image array of elements 1010 corresponds to a small light field created from the four images 1012 marked in green. The small insets 1020, 1030 shown below present magnified fragments of the reconstructed images from the image array.
- A further enhancement in embodiments involves prealigning the input views/images. Prealignment improves the quality of the output images. In the method and
system 100 of the present invention, the disparities between the input images are preferably small. Therefore, the method and system 100 of the present invention may prealign the input images using simple transformations (e.g., shift, shear, etc.) to minimize the disparities, perform the view expansion using the method and system 100 steps described above, and then apply a transformation that cancels out the transformation applied to the input images. Such prealignment may be guided by a low quality disparity map estimated from the input images.
- In a non-limiting example embodiment, two images (left and right) are obtained with a disparity range of (50,60). The method and
system 100 of the present invention may shift one of the images by 55 pixels, which may change the range of disparities to (−5,+5). The method and system 100 of the present invention may then be applied to these shifted images, and the shift may be compensated for by shifting the output images accordingly. As mentioned above, the shift may be replaced by another simple operation that is easy to revert (e.g., shear), and it may be guided by a poor quality disparity map.
-
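The prealignment example above (disparity range (50,60), shift of 55 pixels, residual range (−5,+5)) can be sketched as follows. All function names are hypothetical, the view expansion itself is passed in as a placeholder callable `expand_views`, the sign convention of the shift is an assumption, and `np.roll` is used only for brevity (it wraps pixels around the image border, which a real implementation would likely handle differently).

```python
import numpy as np

def prealign_shift(disparity_range):
    """Integer horizontal shift that centers the disparity range around zero."""
    lo, hi = disparity_range
    return int(round((lo + hi) / 2.0))

def residual_range(disparity_range, shift):
    """Disparity range that remains after applying the prealignment shift."""
    lo, hi = disparity_range
    return (lo - shift, hi - shift)

def shift_image(image, shift):
    """Shift an image horizontally by `shift` pixels (wrap-around, for brevity)."""
    return np.roll(image, shift, axis=1)

def expand_with_prealignment(left, right, disparity_range, expand_views):
    """Prealign, run the supplied view expansion, then undo the shift on the output."""
    shift = prealign_shift(disparity_range)
    right_aligned = shift_image(right, shift)      # sign convention is an assumption
    new_left, new_right = expand_views(left, right_aligned)
    return new_left, shift_image(new_right, -shift)

shift = prealign_shift((50, 60))
print(shift, residual_range((50, 60), shift))
```

Because the compensation is the exact inverse of the prealignment, an `expand_views` that returns its inputs unchanged leaves both images as they were.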
FIG. 11 is a high-level block diagram of an embodiment 300 of the present invention system and/or method 100 that generates a multi-view autostereoscopic display from a stereoscopic video input according to the principles of the present invention. The computer-based system 300 contains a bus 306. The bus 306 is a connection between the various components of the system 300. Connected to the bus 306 is an input/output device interface 328 for connecting various input and output devices, such as a keypad, controller unit, keyboard (generally 324), mouse/pointing device 326, display, speakers, touchscreen display (generally display device 318), etc. to the system 300. According to an embodiment of the invention, the input/output device interface 328 provides an interface for allowing a user to select video display parameters and aspects using any method as is known in the art.
- A central processing unit (CPU) 302 is connected to the
bus 306 and provides for the execution of computer instructions. Memory 310 provides volatile storage for data used for carrying out computer instructions. Storage or RAM 308 provides nonvolatile storage for software instructions such as an operating system. The system 300 also comprises a network interface 322, for connecting to any variety of networks, including wide area networks (WANs), local area networks (LANs), wireless networks, mobile device networks, cable data networks and so on.
- In particular, the steps of the processes described above and/or any additional processes that may be related to those described above may be stored as computer executable instructions in, for example, a
memory area 304 that is operably and/or communicatively coupled to the processor 302 and to a GPU 320 by a system bus 306 or similar supporting data communication line. A “memory area” as used herein refers generally to any means of storing program code and instructions executable by one or more processors to aid in joint view expansion, filtering and disparity remapping for multi-view autostereoscopic display (i.e., automatically generating a multi-view and filtered 3D video stream from a 3D stereoscopic video stream). The memory area 304 may include one, or more than one, forms of memory. For example, the memory area 304 may include random access memory (RAM) 308, which may include non-volatile RAM, magnetic RAM, ferroelectric RAM, and/or other forms of RAM. The memory area 304 may also include read-only memory (ROM) 310 and/or flash memory and/or electrically erasable programmable read-only memory (EEPROM). Any other suitable magnetic, optical and/or semiconductor memory, such as a hard disk drive (HDD) 312, by itself or in combination with other forms of memory, may be included in the memory area 304. HDD 312 may be coupled to a disk controller 314 for use in transmitting and receiving messages to and from processor 302. Moreover, the memory area 304 may also be or may include a detachable or removable memory 316 such as a suitable cartridge disk, CD-ROM, DVD, or USB memory. The memory area 304 may in some embodiments effectively include cloud computing memory accessible through network interface 322, and the like. The above examples are exemplary only, and thus are not intended to limit in any way the definition and/or meaning of the term “memory area.”
- In embodiments, a
CPU 302 sends a stream of 3D stereo video images to GPU 320 via a system bus 306 or other communications coupling. GPU 320 employs the above-described methods, algorithms and computer-based techniques as programmed in memory area 304 to generate correctly filtered, multi-view video images for automultiscopic display on display device 318. The GPU 320 forms a picture of the screen image and stores it in a frame buffer. This picture is a large bitmap used to continually update and drive the screen image on display device 318.
- The
display device 318 may be, without limitation, a monitor, a television display, a plasma display, a liquid crystal display (LCD), a display based on light emitting diodes (LED), a display based on organic LEDs (OLEDs), a display based on polymer LEDs, a display based on surface-conduction electron emitters, a display including a projected and/or reflected image, or any other suitable electronic device or display mechanism. Moreover, the display device 318 may include a touchscreen with an associated touchscreen controller. The above examples are exemplary only, and thus are not intended to limit in any way the definition and/or meaning of the term “display device”.
- While this invention has been particularly shown and described with references to example embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.
- For non-limiting example, depth-image-based rendering (DIBR) may be referred to as a depth image-based technique, a depth-based technique, depth-based rendering, and/or depth rendering, and may include depth estimation.
Claims (26)
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/531,548 US9756316B2 (en) | 2013-11-04 | 2014-11-03 | Joint view expansion and filtering for automultiscopic 3D displays |
US14/613,924 US9967538B2 (en) | 2013-11-04 | 2015-02-04 | Reducing view transitions artifacts in automultiscopic displays |
PCT/US2015/014434 WO2015120032A1 (en) | 2014-02-07 | 2015-02-04 | Reducing view transition artifacts in automultiscopic displays |
US15/950,706 US20180249145A1 (en) | 2013-11-04 | 2018-04-11 | Reducing View Transitions Artifacts In Automultiscopic Displays |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201361899595P | 2013-11-04 | 2013-11-04 | |
US14/531,548 US9756316B2 (en) | 2013-11-04 | 2014-11-03 | Joint view expansion and filtering for automultiscopic 3D displays |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/613,924 Continuation-In-Part US9967538B2 (en) | 2013-11-04 | 2015-02-04 | Reducing view transitions artifacts in automultiscopic displays |
Publications (2)
Publication Number | Publication Date |
---|---|
US20150124062A1 true US20150124062A1 (en) | 2015-05-07 |
US9756316B2 US9756316B2 (en) | 2017-09-05 |
Family
ID=53006747
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/531,548 Active 2035-10-03 US9756316B2 (en) | 2013-11-04 | 2014-11-03 | Joint view expansion and filtering for automultiscopic 3D displays |
Country Status (1)
Country | Link |
---|---|
US (1) | US9756316B2 (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150195430A1 (en) * | 2014-01-09 | 2015-07-09 | Massachusetts Institute Of Technology | Riesz Pyramids For Fast Phase-Based Video Magnification |
US20160014387A1 (en) * | 2014-07-10 | 2016-01-14 | Samsung Electronics Co., Ltd. | Multiple view image display apparatus and disparity estimation method thereof |
US20160234479A1 (en) * | 2015-02-09 | 2016-08-11 | Electronics And Telecommunications Research Institute | Device and method for multiview image calibration |
US20160267666A1 (en) * | 2015-03-09 | 2016-09-15 | Samsung Electronics Co., Ltd. | Image signal processor for generating depth map from phase detection pixels and device having the same |
US9805475B2 (en) | 2012-09-07 | 2017-10-31 | Massachusetts Institute Of Technology | Eulerian motion modulation |
US9811901B2 (en) | 2012-09-07 | 2017-11-07 | Massachusetts Institute Of Technology | Linear-based Eulerian motion modulation |
US9967538B2 (en) | 2013-11-04 | 2018-05-08 | Massachusetts Institute Of Technology | Reducing view transitions artifacts in automultiscopic displays |
US10086210B2 (en) | 2013-03-14 | 2018-10-02 | Zoll Medical Corporation | Shock determination based on prior shocks |
WO2018226725A1 (en) * | 2017-06-05 | 2018-12-13 | Massachusetts Institute Of Technology | 3dtv at home: eulerian-lagrangian stereo-to-multi-view conversion |
US10326974B2 (en) * | 2016-01-20 | 2019-06-18 | Shenzhen Skyworth-Rgb Electronic Co., Ltd. | Naked-eye 3D display method and system thereof |
US10354399B2 (en) * | 2017-05-25 | 2019-07-16 | Google Llc | Multi-view back-projection to a light-field |
CN110944222A (en) * | 2018-09-21 | 2020-03-31 | 上海交通大学 | Method and system for immersive media content as user moves |
CN111553850A (en) * | 2020-03-30 | 2020-08-18 | 深圳一清创新科技有限公司 | Three-dimensional information acquisition method and device based on binocular stereo vision |
Citations (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060290777A1 (en) * | 2005-06-27 | 2006-12-28 | Kyohei Iwamoto | Three-dimensional image display apparatus |
US20070229653A1 (en) * | 2006-04-04 | 2007-10-04 | Wojciech Matusik | Method and system for acquiring and displaying 3D light fields |
US20080252638A1 (en) * | 2005-05-13 | 2008-10-16 | Koninklijke Philips Electronics, N.V. | Cost Effective Rendering for 3D Displays |
US20080291268A1 (en) * | 2005-11-04 | 2008-11-27 | Koninklijke Philips Electronics, N.V. | Rendering of Image Data for Multi-View Display |
US20090262181A1 (en) * | 2008-04-17 | 2009-10-22 | Gal Rotem | Real-time video signal interweaving for autostereoscopic display |
US20100134599A1 (en) * | 2006-11-22 | 2010-06-03 | Ronny Billert | Arrangement and method for the recording and display of images of a scene and/or an object |
US7839549B2 (en) * | 2005-10-20 | 2010-11-23 | Zoran Mihajlovic | Three-dimensional autostereoscopic display and method for reducing crosstalk in three-dimensional displays and in other similar electro-optical devices |
US20100302235A1 (en) * | 2009-06-02 | 2010-12-02 | Horizon Semiconductors Ltd. | efficient composition of a stereoscopic image for a 3-D TV |
US20110032587A1 (en) * | 2009-03-20 | 2011-02-10 | Absolute Imaging LLC | System and Method for Autostereoscopic Imaging |
US20110102423A1 (en) * | 2009-11-04 | 2011-05-05 | Samsung Electronics Co., Ltd. | High density multi-view image display system and method with active sub-pixel rendering |
US20110304708A1 (en) * | 2010-06-10 | 2011-12-15 | Samsung Electronics Co., Ltd. | System and method of generating stereo-view and multi-view images for rendering perception of depth of stereoscopic image |
US20120013651A1 (en) * | 2009-01-22 | 2012-01-19 | David John Trayner | Autostereoscopic Display Device |
US20130002816A1 (en) * | 2010-12-29 | 2013-01-03 | Nokia Corporation | Depth Map Coding |
US8624964B2 (en) * | 2005-12-02 | 2014-01-07 | Koninklijke Philips N.V. | Depth dependent filtering of image signal |
US20140072228A1 (en) * | 2012-09-07 | 2014-03-13 | Massachusetts Institute Of Technology | Complex-Valued Eulerian Motion Modulation |
US20140072229A1 (en) * | 2012-09-07 | 2014-03-13 | Massachusetts Institute Of Technology | Complex-Valued Phase-Based Eulerian Motion Modulation |
US20140285623A1 (en) * | 2011-10-10 | 2014-09-25 | Koninklijke Philips N.V. | Depth map processing |
US20150042770A1 (en) * | 2012-01-06 | 2015-02-12 | Ultra D Copperatief U.A. | Display processor for 3d display |
US20150071360A1 (en) * | 2012-05-18 | 2015-03-12 | The Regents Of The University Of California | Independent thread video disparity estimation method and codec |
US9113043B1 (en) * | 2011-10-24 | 2015-08-18 | Disney Enterprises, Inc. | Multi-perspective stereoscopy from light fields |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US725567A (en) | 1902-09-25 | 1903-04-14 | Frederic E Ives | Parallax stereogram and process of making same. |
US6097394A (en) | 1997-04-28 | 2000-08-01 | Board Of Trustees, Leland Stanford, Jr. University | Method and system for light field rendering |
GB2372659A (en) | 2001-02-23 | 2002-08-28 | Sharp Kk | A method of rectifying a stereoscopic image |
JPWO2004084560A1 (en) | 2003-03-20 | 2006-06-29 | 富田 誠次郎 | 3D image display system |
US7916934B2 (en) | 2006-04-04 | 2011-03-29 | Mitsubishi Electric Research Laboratories, Inc. | Method and system for acquiring, encoding, decoding and displaying 3D light fields |
GB0716776D0 (en) | 2007-08-29 | 2007-10-10 | Setred As | Rendering improvement for 3D display |
US8711204B2 (en) | 2009-11-11 | 2014-04-29 | Disney Enterprises, Inc. | Stereoscopic editing for video production, post-production and display adaptation |
EP2643972A4 (en) | 2010-11-24 | 2016-03-30 | Stergen High Tech Ltd | Improved method and system for creating three-dimensional viewable video from a single video stream |
KR20140000317A (en) | 2010-12-29 | 2014-01-02 | 톰슨 라이센싱 | Method and apparatus for providing mono-vision in multi-view system |
US9041774B2 (en) | 2011-01-07 | 2015-05-26 | Sony Computer Entertainment America, LLC | Dynamic adjustment of predetermined three-dimensional video settings based on scene content |
US9769365B1 (en) | 2013-02-15 | 2017-09-19 | Red.Com, Inc. | Dense field imaging |
US9412172B2 (en) | 2013-05-06 | 2016-08-09 | Disney Enterprises, Inc. | Sparse light field representation |
US9967538B2 (en) | 2013-11-04 | 2018-05-08 | Massachusetts Institute Of Technology | Reducing view transitions artifacts in automultiscopic displays |
WO2015120032A1 (en) | 2014-02-07 | 2015-08-13 | Massachusetts Institute Of Technology | Reducing view transition artifacts in automultiscopic displays |
2014-11-03: US application US14/531,548 filed; patent US9756316B2 (en), status Active
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10007986B2 (en) | 2012-09-07 | 2018-06-26 | Massachusetts Institute Of Technology | Linear-based eulerian motion modulation |
US9805475B2 (en) | 2012-09-07 | 2017-10-31 | Massachusetts Institute Of Technology | Eulerian motion modulation |
US9811901B2 (en) | 2012-09-07 | 2017-11-07 | Massachusetts Institute Of Technology | Linear-based Eulerian motion modulation |
US10217218B2 (en) | 2012-09-07 | 2019-02-26 | Massachusetts Institute Of Technology | Linear-based Eulerian motion modulation |
US10086210B2 (en) | 2013-03-14 | 2018-10-02 | Zoll Medical Corporation | Shock determination based on prior shocks |
US9967538B2 (en) | 2013-11-04 | 2018-05-08 | Massachusetts Institute Of Technology | Reducing view transitions artifacts in automultiscopic displays |
US20150195430A1 (en) * | 2014-01-09 | 2015-07-09 | Massachusetts Institute Of Technology | Riesz Pyramids For Fast Phase-Based Video Magnification |
US9338331B2 (en) * | 2014-01-09 | 2016-05-10 | Massachusetts Institute Of Technology | Riesz pyramids for fast phase-based video magnification |
US10152803B2 (en) * | 2014-07-10 | 2018-12-11 | Samsung Electronics Co., Ltd. | Multiple view image display apparatus and disparity estimation method thereof |
US20160014387A1 (en) * | 2014-07-10 | 2016-01-14 | Samsung Electronics Co., Ltd. | Multiple view image display apparatus and disparity estimation method thereof |
US20160234479A1 (en) * | 2015-02-09 | 2016-08-11 | Electronics And Telecommunications Research Institute | Device and method for multiview image calibration |
US9906775B2 (en) * | 2015-02-09 | 2018-02-27 | Electronics And Telecommunications Research Institute | Device and method for multiview image calibration |
US20160267666A1 (en) * | 2015-03-09 | 2016-09-15 | Samsung Electronics Co., Ltd. | Image signal processor for generating depth map from phase detection pixels and device having the same |
US9824417B2 (en) * | 2015-03-09 | 2017-11-21 | Samsung Electronics Co., Ltd. | Image signal processor for generating depth map from phase detection pixels and device having the same |
US10326974B2 (en) * | 2016-01-20 | 2019-06-18 | Shenzhen Skyworth-Rgb Electronic Co., Ltd. | Naked-eye 3D display method and system thereof |
US10354399B2 (en) * | 2017-05-25 | 2019-07-16 | Google Llc | Multi-view back-projection to a light-field |
WO2018226725A1 (en) * | 2017-06-05 | 2018-12-13 | Massachusetts Institute Of Technology | 3dtv at home: eulerian-lagrangian stereo-to-multi-view conversion |
US10834372B2 (en) | 2017-06-05 | 2020-11-10 | Massachusetts Institute Of Technology | 3DTV at home: Eulerian-Lagrangian stereo-to-multi-view conversion |
US10972713B2 (en) | 2017-06-05 | 2021-04-06 | Massachusetts Institute Of Technology | 3DTV at home: Eulerian-Lagrangian stereo-to-multi-view conversion |
CN110944222A (en) * | 2018-09-21 | 2020-03-31 | 上海交通大学 | Method and system for immersive media content as user moves |
CN111553850A (en) * | 2020-03-30 | 2020-08-18 | 深圳一清创新科技有限公司 | Three-dimensional information acquisition method and device based on binocular stereo vision |
Also Published As
Publication number | Publication date |
---|---|
US9756316B2 (en) | 2017-09-05 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US9756316B2 (en) | Joint view expansion and filtering for automultiscopic 3D displays | |
Didyk et al. | Joint view expansion and filtering for automultiscopic 3D displays | |
Anderson et al. | Jump: virtual reality video | |
US20220014723A1 (en) | Enhancing performance capture with real-time neural rendering | |
JP5300258B2 (en) | Method and system for acquiring, encoding, decoding and displaying a three-dimensional light field | |
Zinger et al. | Free-viewpoint depth image based rendering | |
JP5887267B2 (en) | 3D image interpolation apparatus, 3D imaging apparatus, and 3D image interpolation method | |
US9113043B1 (en) | Multi-perspective stereoscopy from light fields | |
JP2008257686A (en) | Method and system for processing 3d scene light field | |
US8094148B2 (en) | Texture processing apparatus, method and program | |
Dąbała et al. | Efficient Multi‐image Correspondences for On‐line Light Field Video Processing | |
US20150379720A1 (en) | Methods for converting two-dimensional images into three-dimensional images | |
Kellnhofer et al. | 3DTV at home: eulerian-lagrangian stereo-to-multiview conversion | |
US20180249145A1 (en) | Reducing View Transitions Artifacts In Automultiscopic Displays | |
Berretty et al. | Real-time rendering for multiview autostereoscopic displays | |
Devernay et al. | Adapting stereoscopic movies to the viewing conditions using depth-preserving and artifact-free novel view synthesis | |
Adhikarla et al. | Real-time adaptive content retargeting for live multi-view capture and light field display | |
US11277633B2 (en) | Method and apparatus for compensating motion for a holographic video stream | |
EP2822279B1 (en) | Autostereo tapestry representation | |
WO2015120032A1 (en) | Reducing view transition artifacts in automultiscopic displays | |
Gurrieri et al. | Stereoscopic cameras for the real-time acquisition of panoramic 3D images and videos | |
US10972713B2 (en) | 3DTV at home: Eulerian-Lagrangian stereo-to-multi-view conversion | |
Tan et al. | A system for capturing, rendering and multiplexing images on multi-view autostereoscopic display | |
Jin et al. | Joint multilateral filtering for stereo image generation using depth camera | |
Colombari et al. | Continuous parallax adjustment for 3D-TV |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: MASSACHUSETTS INSTITUTE OF TECHNOLOGY, MASSACHUSET Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:DIDYK, PIOTR KRZYSZTOF;REEL/FRAME:035394/0754 Effective date: 20141118 |
| | AS | Assignment | Owner name: MASSACHUSETTS INSTITUTE OF TECHNOLOGY, MASSACHUSET Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SITTHI-AMORN, PITCHAYA;MATUSIK, WOJCIECH;DURAND, FREDERIC;AND OTHERS;SIGNING DATES FROM 20150416 TO 20150428;REEL/FRAME:035537/0367 |
| | AS | Assignment | Owner name: NATIONAL SCIENCE FOUNDATION, VIRGINIA Free format text: CONFIRMATORY LICENSE;ASSIGNOR:MASSACHUSETTS INSTITUTE OF TECHNOLOGY;REEL/FRAME:036806/0144 Effective date: 20151005 |
| | FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN) |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |