US10834372B2 - 3DTV at home: Eulerian-Lagrangian stereo-to-multi-view conversion - Google Patents

3DTV at home: Eulerian-Lagrangian stereo-to-multi-view conversion

Info

Publication number
US10834372B2
Authority
US
United States
Prior art keywords
disparity
wavelet
pyramid
wavelets
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US16/000,662
Other languages
English (en)
Other versions
US20180352208A1 (en
Inventor
Wojciech Matusik
Piotr K. Didyk
William T. Freeman
Petr Kellnhofer
Pitchaya Sitthi-Amorn
Frederic Durand
Szu-Po Wang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Massachusetts Institute of Technology
Original Assignee
Massachusetts Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Massachusetts Institute of Technology filed Critical Massachusetts Institute of Technology
Priority to US16/000,662 priority Critical patent/US10834372B2/en
Publication of US20180352208A1 publication Critical patent/US20180352208A1/en
Priority to US16/725,448 priority patent/US10972713B2/en
Assigned to MASSACHUSETTS INSTITUTE OF TECHNOLOGY reassignment MASSACHUSETTS INSTITUTE OF TECHNOLOGY ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: Kellnhofer, Petr, MATUSIK, WOJCIECH, DURAND, FREDERIC, FREEMAN, WILLIAM T., DIDYK, PIOTR K., WANG, SZU-PO
Assigned to MASSACHUSETTS INSTITUTE OF TECHNOLOGY reassignment MASSACHUSETTS INSTITUTE OF TECHNOLOGY ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SITTHI-AMORN, Pitchaya, Kellnhofer, Petr, FREEMAN, WILLIAM T., MATUSIK, WOJCIECH, DIDYK, PIOTR K., DURAND, FREDERIC, WANG, SZU-PO
Application granted granted Critical
Publication of US10834372B2 publication Critical patent/US10834372B2/en
Active legal-status Critical Current
Adjusted expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106Processing image signals
    • H04N13/111Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106Processing image signals
    • H04N13/128Adjusting depth or disparity
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/97Determining parameters from multiple pictures
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106Processing image signals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • G06T2207/10021Stereoscopic video; Stereoscopic image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20016Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20048Transform domain processing
    • G06T2207/20056Discrete and fast Fourier transform, [DFT, FFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20048Transform domain processing
    • G06T2207/20064Wavelet transform [DWT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20228Disparity calculation for image-based rendering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N2013/0074Stereoscopic image analysis
    • H04N2013/0081Depth or disparity estimation from stereoscopic image signals

Definitions

  • S3D stereoscopic 3D
  • Today many movie blockbusters are released in a stereo format.
  • However, the popularity of S3D in movie theaters has not translated into equivalent popularity at home.
  • Although most current television sets support S3D and content providers offer streaming stereoscopic content, the adoption of S3D at home remains very low.
  • the described embodiments of an Eulerian-Lagrangian stereo-to-multi-view conversion method and system may be used to provide a three-dimensional television (3DTV) experience in the homes of ordinary consumers.
  • the described embodiments expand existing stereoscopic content to a high-quality, multi-view format in real time.
  • real time refers to actions perceived to occur with little or no delay by an ordinary viewer.
  • the methods of the described embodiments may be implemented efficiently in hardware and naturally support disparity manipulations.
  • the standard depth image-based rendering methods of prior approaches are limited to small disparities.
  • the described embodiments overcome this limitation by combining a phase-based approach (i.e., an Eulerian approach) with standard depth image-based rendering (i.e., a Lagrangian approach).
  • Example embodiments described herein may decompose the stereoscopic input signal using a set of filters inspired by a steerable pyramid decomposition.
  • the basis functions of this transform may resemble Gabor-like wavelets. Accordingly, the example embodiments described herein may refer to the basis functions as wavelets, although decomposition based on other basis functions may also be used.
  • the disparity information may be estimated for each of the wavelets separately, using a combination of standard disparity estimation and phase-based measures.
  • the described embodiments may apply a wavelet re-projection, which moves wavelets according to their disparities. Such an approach may handle large disparities, while preserving all the advantages of the Eulerian approach.
  • the example embodiments described herein demonstrate that real-time performance may be provided both on a graphics processing unit (GPU), and a field-programmable gate array (FPGA).
  • GPU graphics processing unit
  • FPGA field-programmable gate array
  • the invention may be a method of converting stereo video content to multi-view video content that combines an Eulerian approach with a Lagrangian approach.
  • the method may comprise decomposing a stereoscopic input using a set of basis functions to produce a set of decomposed signals of one or more frequencies, estimating disparity information for each of the decomposed signals, and synthesizing novel views by re-projecting the decomposed signals, the re-projecting comprising moving the decomposed signals according to the disparity information.
  • the decomposed signals may be a sum of basis functions of the form
  • b̂_f(ω) = cos((π/2)·log_w(ω/f)) · Π((1/2)·log_w(ω/f)).
  • the method may further comprise generating a disparity map for each of a left view and a right view of the stereoscopic input, and establishing an initial disparity correspondence, between each of a corresponding set of left and right decomposed signals, based on the generated disparity maps;
  • the method may further comprise generating a disparity map for each of the left and right views of a received stereoscopic frame, and, for each corresponding pair of left and right scanlines of the received stereoscopic frame, decomposing the left and right scanlines into a left wavelet and a right wavelet, each of the wavelets being a sum of basis functions.
  • the method may further comprise establishing an initial disparity correspondence between the left wavelet and the right wavelet based on the generated disparity maps, refining the initial disparity between the left wavelet and the right wavelet using a phase difference between the corresponding wavelets and reconstructing at least one novel view based on the left and right wavelets.
  • the basis functions may be of the form
  • b̂_f(ω) = cos((π/2)·log_w(ω/f)) · Π((1/2)·log_w(ω/f)).
  • the method may further comprise determining a per-wavelet disparity as an average of disparities in a local neighborhood of the wavelet, the size of which is substantially equal to the spacing of associated wavelets.
  • For example, the per-wavelet disparity may be Σ D_r(y)/(s+1), summed over the neighborhood, where s is the wavelet spacing.
  • refining the initial disparity may further comprise transforming the phase difference into a disparity residual by multiplying the phase difference by f/(2π), where f is a frequency associated with the wavelet, and adding the disparity residual to the initial disparity determination to produce a per-wavelet refined disparity estimate.
  • the method may further comprise filtering the per-wavelet disparity estimate using a two-dimensional mean filter having a kernel size equal to double wavelet spacing of neighboring wavelets.
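  • The mean-filtering step above can be sketched as follows (a minimal illustration; the NumPy-based box filter and the clamped-edge handling are assumptions, not taken from the patent):

```python
import numpy as np

def mean_filter_2d(disparity, wavelet_spacing):
    """Smooth a per-wavelet disparity field with a 2D mean (box) filter
    whose kernel size is double the wavelet spacing, as described above.
    Clamped ('edge') padding at the borders is an assumption."""
    k = 2 * wavelet_spacing              # kernel size = double wavelet spacing
    pad = k // 2
    padded = np.pad(disparity, pad, mode="edge")
    h, w = disparity.shape
    out = np.empty((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out
```

A library box filter (e.g., a uniform filter) would serve equally well; the explicit loop is kept for clarity.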
  • reconstructing at least one novel view may comprise determining a new position of each wavelet to specify a displaced wavelet, wherein the new position is x+a ⁇ d, x being a location of the wavelet, d being a disparity of the wavelet and a being a constant.
  • Reconstructing at least one novel view may further comprise converting the displaced wavelet to be uniformly-spaced, using a non-uniform Fourier transform, and reconstructing a displaced one-dimensional scanline signal based on the displaced wavelet, using pyramid reconstruction.
  • the invention may be a system for converting stereo video content to multi-view video content.
  • the system may comprise a frame duplicator configured to receive a stereoscopic frame and produce a first stereoscopic frame copy and a second stereoscopic frame copy therefrom.
  • the system may further comprise a pyramid decomposition processor configured to receive the first stereoscopic frame copy and produce therefrom a first pyramid corresponding to a left view of the first stereoscopic frame copy and a second pyramid corresponding to a right view of the first stereoscopic frame copy.
  • the system may further comprise a disparity processor configured to receive the second stereoscopic frame copy, and decomposition information from the pyramid decomposition processor, and to produce disparity information therefrom.
  • the system may further comprise a re-projection processor configured to receive the first pyramid and the second pyramid from the pyramid decomposition processor, and disparity information from the disparity processor, and to produce pyramid re-projection information therefrom.
  • the system may further comprise a pyramid reconstruction processor configured to receive the re-projection information from the re-projection processor and to produce at least one novel view therefrom.
  • the pyramid decomposition processor may produce the first pyramid and the second pyramid as a sum of basis functions having the form
  • b̂_f(ω) = cos((π/2)·log_w(ω/f)) · Π((1/2)·log_w(ω/f)).
  • the disparity processor may (i) generate a disparity map for each of a left view and a right view of the stereoscopic frame and (ii) establish an initial disparity correspondence between a left pyramid and a right pyramid based on the generated disparity maps.
  • the disparity processor may further refine the initial disparity between the left and right pyramid functions using a phase difference between the corresponding pyramid functions.
  • the disparity processor may determine a per-pyramid-function disparity as an average of disparities in a local neighborhood of the pyramid function, the size of which is substantially equal to the spacing of associated pyramid functions.
  • For example, the disparity may be Σ D_r(y)/(s+1), summed over the neighborhood, where s is the spacing.
  • the disparity processor may be further configured to transform the phase difference into a disparity residual by multiplying the phase difference by f/(2π), where f is a frequency associated with the pyramid, and add the disparity residual to the initial disparity determination to produce a per-pyramid refined disparity estimate.
  • the pyramid reconstruction processor may be further configured to determine a new position of each pyramid to specify a displaced pyramid.
  • the new position may be x+a ⁇ d, x being a location of the pyramid, d being a disparity of the pyramid and a being a constant.
  • the pyramid reconstruction processor may be further configured to convert the displaced pyramid to be uniformly-spaced, using a non-uniform Fourier transform, and reconstruct a displaced one-dimensional scanline signal based on the displaced pyramid, using pyramid reconstruction.
  • the invention may be a non-transitory computer-readable medium with computer code instructions stored thereon; the computer code instructions, when executed by a processor, cause an apparatus to generate a disparity map for each of the left and right views of a received stereoscopic frame, and for each corresponding pair of left and right scanlines of the received stereoscopic frame, decompose the left and right scanlines into a left wavelet and a right wavelet, each of the wavelets being a sum of basis functions.
  • the computer code instructions when executed by a processor, may further cause the apparatus to establish an initial disparity correspondence between the left wavelet and the right wavelet based on the generated disparity maps, refine the initial disparity between the left wavelet and the right wavelet using a phase difference between the corresponding wavelets, and reconstruct at least one novel view based on the left and right wavelets.
  • FIGS. 1A and 1B illustrate limitations of the Lagrangian approach.
  • FIGS. 2A and 2B illustrate limitations of the Eulerian approach.
  • FIGS. 3A and 3B illustrate improvements with respect to the Lagrangian approach and the Eulerian approach shown in FIGS. 1A, 1B, 2A, and 2B .
  • FIGS. 4A and 4B illustrate the entire stereoscopic to three-dimensional conversion process according to the invention.
  • FIG. 5 illustrates visualizations of the filters of the described embodiments used to perform wavelet decomposition.
  • FIG. 6 illustrates resolving occlusion of wavelets according to the invention.
  • FIG. 7 illustrates antialiasing and disparity adjustment facilitated by the described embodiments.
  • FIG. 8 depicts each stage in an example embodiment hardware implementation 800 of the described embodiments.
  • FIG. 9 shows a diagram of an example internal structure of a processing system 900 that may be used to implement one or more of the embodiments herein.
  • FIGS. 10A, 10B and 10C further illustrate improvements of the described embodiments with respect to the Lagrangian approach and the Eulerian approach
  • the described embodiments provide an end-to-end solution for multi-view content creation that exploits complementary advantages of Lagrangian and Eulerian techniques and overcomes their limitations.
  • Lagrangian techniques recover depth information first, and then use re-projection to create novel views.
  • prior work developed systems for real-time stereo-to-multi-view conversion, and similar techniques are used in the context of view reprojection for virtual reality.
  • Although many sophisticated techniques for depth estimation have been proposed, it remains a challenging problem, especially in the case of real-time applications.
  • prior methods still suffer from low-quality depth maps when the performance of the system is of high importance.
  • FIGS. 1A and 1B illustrate some of the aforementioned limitations of the Lagrangian approach.
  • the inset 102 of FIG. 1A is shown in FIG. 1B .
  • Arrows in FIG. 1B illustrate fuzzy depth edges resulting from conversions using the Lagrangian approach.
  • Eulerian techniques estimate local changes using local phase information, as opposed to recovering depth or optical flow information explicitly. Advantages of phase-based processing are often attributed to the overcomplete representation. Instead of one per-pixel depth value, phase-based approaches consider localized, per-band information. This leads to better results in difficult cases where per-pixel information cannot be reliably estimated (e.g., depth-of-field, motion blur, specularities, etc.), and more accurate estimates due to the sub-pixel precision of these techniques. Another argument is that phase-based manipulations are semi-local and cannot have catastrophic failures like pixel warping does. As a result, such methods provide graceful quality degradation.
  • FIGS. 2A and 2B illustrate some of the aforementioned limitations of the Eulerian approach.
  • the inset 202 of FIG. 2A is shown as FIG. 2B .
  • Arrows in FIG. 2B illustrate ringing artifacts caused by excessive input disparities resulting from conversions using the Eulerian approach.
  • the described embodiments of the invention address the problem of limited disparity support by combining a phase-based technique with a Lagrangian approach, which pre-aligns views to reduce disparity so that the Eulerian approach can be applied.
  • One prior technique, known as view synthesis, addresses the problem of reconstructing a light field from a micro-baseline image pair.
  • the view-synthesis technique relies both on disparity and phase information.
  • the described embodiments use a concept of per-wavelet disparity, which provides a much richer representation. Another difference is that the described embodiments provide a real-time solution capable of performing the stereo-to-multi-view conversion on the fly.
  • the described embodiments are based on a steerable pyramid decomposition, but augmented with depth information. This enables handling large disparities, which was the main limitation of previous phase-based methods.
  • the described embodiments do not share disparity information between different frequency levels of the decomposition, as is the case with prior multi-scale approaches (also known as “coarse-to-fine propagation” techniques). Not sharing disparity information between frequency levels leads to a more flexible representation for cases where a single per-pixel disparity is not defined, such as multiple depth-separated image layers.
  • the described embodiments also reduce the conversion problem to a set of one dimensional (1D) problems, which significantly improves performance.
  • the described embodiments introduce a new view synthesis approach which re-projects wavelets.
  • the described embodiments employ a non-uniform Fourier transform.
  • the domain and technique are significantly different. All the above steps make the approach of the described embodiments suitable for hardware implementation, as described herein.
  • FIGS. 3A and 3B illustrate improvements resulting from the use of the described embodiments, with respect to the limitations shown in FIGS. 1A, 1B, 2A and 2B , of the Lagrangian and Eulerian approaches.
  • the inset 302 of FIG. 3A is shown as FIG. 3B .
  • Arrows in FIG. 3B illustrate improvements with respect to the Lagrangian and Eulerian approaches.
  • FIGS. 10A, 10B, and 10C further illustrate improvements of the described embodiments with respect to the Lagrangian and Eulerian approaches.
  • the described embodiments take a rectified stereoscopic image pair as an input, together with corresponding disparity maps.
  • the images are decomposed into wavelet representations, and disparity maps are used to determine per-wavelet disparity.
  • Although the described embodiments use wavelet representations for image decomposition, other functions may alternatively be used as a basis for the decomposition.
  • such functions may be specified through machine learning.
  • the basis functions are described by learnable parameters, which are predicted for each input stereoscopic pair by a network pre-trained to minimize the perceived visual difference between the view synthesized by the described method with the given basis functions and the ground truth in a dataset of multi-view images.
  • the disparity maps may be of relatively low quality.
  • the described embodiments are concerned with reproduction of horizontal parallax, and use low-resolution disparity maps (determined according to, e.g., Fast cost-volume filtering for visual correspondence and beyond, Asmaa Hosni, Christoph Rhemann, Michael Bleyer, Carsten Rother, and Margrit Gelautz, IEEE Trans. on Pattern Analysis and Machine Intelligence 35, 2 (2013), 504-511).
  • the described embodiments may also rectify the input views (determined according to, e.g., A compact algorithm for rectification of stereo pairs, Andrea Fusiello, Emanuele Trucco, and Alessandro Verri, Machine Vision and Applications 12, 1 (2000), 16-22).
  • the described embodiments next refine per-wavelet disparity by incorporating phase information.
  • the described embodiments implement an image-based rendering approach tailored to the decomposition.
  • the technique of the embodiments re-projects whole wavelets. It supports both view interpolation and extrapolation in a unified way. The two operations differ only in the direction in which wavelet locations are altered.
  • Disparity is an important cue to synthesize novel views.
  • disparity maps (D l and D r ) encode the correspondence between left and right views (L and R). More formally, if for a given position in the world space, its projections into the left and the right views are x l and x r , the disparity is defined as the distance between those locations in the screen space. A signed distance is considered to distinguish between locations in front of and behind the zero-disparity plane.
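  • The signed-disparity convention above can be illustrated with a tiny sketch (the function name, sample values, and which sign means “in front of” the zero-disparity plane are assumptions, not part of the patent):

```python
def signed_disparity(x_l, x_r):
    """Signed screen-space distance between the left and right projections
    of the same world-space point (the sign convention is an assumption)."""
    return x_l - x_r

# Two illustrative points on opposite sides of the zero-disparity plane:
d_uncrossed = signed_disparity(104.0, 100.0)  # +4.0
d_crossed = signed_disparity(100.0, 104.0)    # -4.0
```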
  • the described embodiments consider per-wavelet, instead of per-pixel, disparity. This allows the use of phase information to improve the quality of the estimates and overcome limitations of previous Lagrangian and Eulerian approaches.
  • To determine per-wavelet disparities the input images are first decomposed into wavelet representations. Then, for each wavelet, the initial disparity is determined from the input disparity maps. In the next step, this information is refined by additionally considering local phase information.
  • An example embodiment, depicted in FIGS. 4A and 4B , demonstrates the entire stereoscopic-to-multiview conversion process. The process is depicted graphically in FIG. 4A , and as a flow diagram in FIG. 4B .
  • the example embodiments described herein consider wavelets as basic elements of a picture and estimate disparity for each of them. As set forth herein, however, functions other than wavelets may alternatively be used.
  • a stereoscopic image pair I l and I r is received, and each image scanline 402 , 404 is considered independently.
  • the scanlines are decomposed into wavelets and the initial correspondence between wavelets is found, from the left and the right views, based on the input disparity maps D l and D r .
  • the position difference of the corresponding wavelets defines the initial disparity information.
  • the phase difference of the wavelets is computed and combined with the initial disparity estimation.
  • the disparity information is not a single disparity map. Instead, one disparity map is obtained for each pyramid level.
  • FIG. 4B depicts an example method of converting stereo video content to multi-view video content, according to the invention.
  • the process may comprise generating 420 a disparity map for each of the left and right views of a received stereoscopic frame. For each corresponding pair of left and right scanlines of the received stereoscopic frame, the process further comprises decomposing 422 the left and right scanlines into a left sum of wavelets or other basis functions, and a right sum of wavelets or other basis functions.
  • the process may further comprise establishing 424 an initial disparity correspondence between the left wavelets and the right wavelets based on the generated disparity maps, refining 426 the initial disparity between the left wavelet and the right wavelet using a phase difference between the corresponding wavelets, and reconstructing 428 at least one novel view based on the left and right wavelets.
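  • The steps 420-428 above can be sketched as a control-flow skeleton (all function names and the callable-based wiring are hypothetical stand-ins; the real system operates on wavelet decompositions and disparity maps):

```python
def convert_scanline_pair(left, right, d_left, d_right,
                          decompose, match, refine, reproject):
    """Control-flow skeleton of steps 420-428 for one scanline pair:
    decompose both scanlines (422), establish initial per-wavelet
    disparities from the input maps (424), refine them with phase
    differences (426), then reconstruct a novel view (428)."""
    wavelets_l = decompose(left)                               # step 422
    wavelets_r = decompose(right)
    initial = match(wavelets_l, wavelets_r, d_left, d_right)   # step 424
    refined = refine(wavelets_l, wavelets_r, initial)          # step 426
    return reproject(wavelets_l, wavelets_r, refined)          # step 428

# Trivial stand-ins, just to exercise the control flow:
view = convert_scanline_pair(
    [1, 2], [1, 2], [0, 0], [0, 0],
    decompose=lambda s: s,
    match=lambda wl, wr, dl, dr: dl,
    refine=lambda wl, wr, d: d,
    reproject=lambda wl, wr, d: wl,
)
```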
  • each pair of corresponding scanlines (1D signals I_r 402 and I_l 404 ) of the right and left views is considered separately and is represented as a sum of basis functions b_f with a frequency response defined as:
  • b̂_f(ω) = cos((π/2)·log_w(ω/f)) · Π((1/2)·log_w(ω/f))   (1)
  • f specifies the central frequency of the filter,
  • Π is a rectangular function centered around zero that extends from −0.5 to 0.5, and
  • w defines the width of the filters, i.e., the ratio of central frequencies of neighboring levels.
  • the set of central frequencies is ℱ = { 2ⁿ : n ∈ {4, …, ⌊log₂(length(I))⌋} }.
  • the filters in Equation (1) are 1D filters (which may be determined based on, e.g., The steerable pyramid: A flexible architecture for multi-scale derivative computation, Eero P. Simoncelli and William T. Freeman, in Image Processing, International Conference on, Vol. 3, IEEE Computer Society (1995), 3444-3444).
  • the filters of the example embodiments allow for computing local phase and amplitude, but lack information on orientation.
  • An additional low-pass filter, b̂₀(ω) = √(1 − Σ_{f∈ℱ} b̂_f²(ω))   (2), collects the residual low-frequency components.
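  • Equations (1) and (2) can be checked numerically with a small sketch (the octave ratio w = 2 and the sampled frequency range are assumptions for illustration). Between the lowest and highest central frequencies, adjacent band filters tile perfectly (cos² + sin² = 1), so the residual low-pass vanishes there and only collects what the band filters miss:

```python
import numpy as np

def rect(x):
    """Rectangular function centered at zero, extending from -0.5 to 0.5."""
    return (np.abs(x) <= 0.5).astype(float)

def b_hat(omega, f, w=2.0):
    """Frequency response of the band filter of Equation (1) with central
    frequency f and level ratio w."""
    u = np.log(omega / f) / np.log(w)       # log_w(omega / f)
    return np.cos(0.5 * np.pi * u) * rect(0.5 * u)

omega = np.linspace(0.05, 0.45, 500)        # positive frequencies only
freqs = [0.1, 0.2, 0.4]                     # central frequencies, ratio w = 2
power = sum(b_hat(omega, f) ** 2 for f in freqs)
b0 = np.sqrt(np.clip(1.0 - power, 0.0, None))   # residual low-pass, Eq. (2)
```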
  • the filters of the example embodiment, used to perform wavelet decomposition, are visualized in FIG. 5 .
  • the top graph shows the frequency response for several filters 502 , 504 , 506 , 508 , 510 .
  • the middle graph shows the real part in the spatial domain, and the bottom graph shows the imaginary part in the spatial domain.
  • the plots in the spatial domain are scaled for visualization purposes.
  • the decomposition can be easily inverted by summing up the wavelets for all frequencies in ℱ together with the additional residual component from Equation (2).
  • the additional factor of two compensates for the fact that the complex wavelets are obtained only from positive frequency components, and the factor of length(I)/|X_f| accounts for the wavelet spacing.
  • the sampling sets are X_f = { x : max(f − f/2, 1) ≤ x ≤ min(f + f/2, length(I)) } for f ∈ ℱ, and X_0 = X_f_min, where f_min is the lowest frequency in ℱ. These sets have overlapping regions such that each wavelet is sampled at least twice, which prevents aliasing. In the described embodiments, both decomposition and reconstruction are performed in the frequency domain.
  • each 1D scanline is transformed into the frequency domain, multiplied with the filters, and the result is transformed to the pixel domain.
  • the reconstruction is done similarly in the frequency domain, but this step requires a non-uniform Fourier transform.
  • the described embodiments establish a correspondence between I_r and I_l using the input disparity maps (D_r and D_l). More precisely, for each wavelet ψ_rfx from I_r, a corresponding wavelet ψ_lfx′ is sought from I_l. To this end, for each ψ_rfx a disparity value is determined from D_r. Because each wavelet spans a certain spatial extent, there is no direct correspondence between wavelets and disparity values. Therefore, the disparity of a wavelet is determined as an average of disparities in its local neighborhood, whose size is equal to the wavelet spacing. Formally, the disparity for wavelet ψ_rfx is defined as
  • d_rfx = Σ D_r(y)/(s+1), summed over the neighborhood of x, where s = ⌊|I_r|/|X_f|⌋ is the wavelet spacing.
  • the wavelet ψ_lfx′ is then found as the closest wavelet to the location x − d_rfx. This step is performed repeatedly for all wavelets from I_r.
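  • The nearest-wavelet lookup described above can be sketched as follows (the positions, disparities, and the binary-search strategy are illustrative assumptions):

```python
import numpy as np

def closest_wavelet(left_positions, x, d):
    """Index of the left-view wavelet nearest to the target location x - d.
    `left_positions` must be sorted in ascending order."""
    target = x - d
    i = np.searchsorted(left_positions, target)
    # Compare the two neighbours straddling the insertion point.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(left_positions)]
    return min(candidates, key=lambda j: abs(left_positions[j] - target))

left_x = np.array([0.0, 8.0, 16.0, 24.0])      # uniformly spaced left wavelets
idx = closest_wavelet(left_x, x=18.0, d=3.0)   # target 15.0 -> index 2
```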
  • alternatively, rather than selecting the closest existing wavelet, a wavelet may be re-evaluated at the exact target location. Such an embodiment, however, may significantly increase the computational cost.
  • the disparity between wavelet pairs determined in the previous step is often inaccurate, due to insufficient quality of the input disparity maps, or additional effects such as transparency or depth of field that cannot be captured using a per-pixel disparity value.
  • the phase difference can easily be transformed into a disparity residual by multiplying it by f/(2π), which is then added to the initial disparity of the wavelet as a correction. Consequently, the disparity information d_rfx of wavelet ψ_rfx is updated by adding φ·f/(2π). In this way, a continuous depth resolution may be obtained without an expensive number of discrete depth labels.
  • phase differences may be determined for each channel separately and combined using a weighted sum to obtain the disparity refinement. The weights are proportional to the wavelet amplitudes, to penalize the phase of weak signals, which can only be poorly estimated.
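  • The amplitude-weighted combination of per-channel phase differences can be sketched as follows (the sample values and the normalization of the weights are assumptions):

```python
import numpy as np

def combine_phase_differences(phase_diffs, amplitudes):
    """Weighted sum of per-channel phase differences, with weights
    proportional to the wavelet amplitudes so that weak (poorly
    estimated) channels contribute less."""
    w = np.asarray(amplitudes, dtype=float)
    w = w / w.sum()                      # normalize to a convex combination
    return float(np.dot(w, phase_diffs))

# Two strong channels agree; the weak third channel is noisy:
dphi = combine_phase_differences([0.30, 0.32, 1.50], [10.0, 9.0, 0.5])
```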
  • An accurate disparity estimation for each wavelet is determined as a result of the wavelet disparity refinement step.
  • the example embodiment provides a much richer representation, as it stores disparity information separately for different frequencies. Such additional information enables handling difficult cases when rendering novel views.
  • the example embodiment modifies the position of each wavelet.
  • the new position for each wavelet ⁇ , at location x and disparity d, is determined as x+a ⁇ d, where parameter a directly controls the new viewing position.
  • the displaced wavelets are converted back into uniform-spaced samples using a non-uniform Fourier transform.
  • a low-pass filter is used to downsample back into the original grid.
  • the 1D signal may be reconstructed using pyramid reconstruction. For the lowest-frequency wavelets, corresponding to filter b_0, a linear interpolation of the wavelet values on the uniform grid is used, to prevent low-passed wavelets from accumulating and creating color bands.
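  • The re-projection and resampling steps can be sketched as follows; the linear interpolation here is a simplified stand-in for the non-uniform Fourier transform of the described embodiments, and the positions and disparities are illustrative:

```python
import numpy as np

def reproject_positions(positions, disparities, a):
    """Displaced wavelet positions x + a*d for viewpoint parameter a."""
    return positions + a * disparities

def resample_uniform(displaced, values, grid):
    """Resample non-uniformly spaced wavelet values onto a uniform grid.
    Linear interpolation is a simplified stand-in for the non-uniform
    Fourier transform used by the described embodiments."""
    order = np.argsort(displaced)
    return np.interp(grid, displaced[order], np.asarray(values)[order])

x = np.array([0.0, 4.0, 8.0, 12.0])   # original wavelet positions
d = np.array([0.0, 2.0, 2.0, 0.0])    # per-wavelet disparities
new_x = reproject_positions(x, d, a=0.5)   # [0., 5., 9., 12.]
```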
  • This strategy leads to a simple view expansion. Note that, before reconstructing novel views, the pair of corresponding wavelets from the left and right views can be moved closer to each other by scaling the disparity between them by a factor s ≤ 1 and moving their positions accordingly.
  • Moving individual wavelets of the same frequency independently has shortcomings similar to moving image patches in the Lagrangian approach; in particular, there are two potential problems resulting from the non-uniform sampling. First, there can be missing information in undersampled regions. This does not cause significant problems, as there is remaining information in the lower frequency levels. Second, some of the wavelets may overlap, which leads to mixing of background and foreground signals. To avoid this, the example embodiment detects occluded wavelets and attenuates their amplitude. This approach is conceptually similar to resolving pixel occlusions using depth information in depth-image-based rendering (DIBR).
  • the example embodiment first finds the closest wavelets to the left (ψ_l) and to the right (ψ_r) that have smaller disparities (i.e., they are in front of ψ). It is sufficient to consider wavelets corresponding to the same frequency. The portion of the wavelet ψ that is occluded by ψ_l and ψ_r is then determined. An assumption is made that one wavelet completely occludes another if the distance between them is at most half of the original sampling distance.
  • d_l and d_r are the distances, as marked in FIG. 6 , and the original spacing between wavelets is assumed to be 1.
  • the occlusion has constant value 1 if the neighboring wavelet moves halfway toward ψ, and 0 if the distance between them is at least the original sampling distance.
  • s(x) = 1 if x ≥ 1; s(x) = 3x² − 2x³ if x ∈ (0, 1); s(x) = 0 if x ≤ 0
  • a_ψ = s(O_ψ)·A_ψ, i.e., the attenuated amplitude a_ψ is obtained by scaling the original amplitude A_ψ of wavelet ψ by the smoothstep s of its occlusion-dependent term O_ψ.
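The occlusion handling above can be sketched as follows. The smoothstep s(x) matches the piecewise definition given earlier; the mapping from inter-wavelet distance to an occlusion value (full at half the sampling distance, none at the full distance) and the use of the stronger of the two neighbors are assumptions consistent with the description, not the patent's exact formulation:

```python
def smoothstep(x):
    """s(x): 1 if x >= 1, 3x^2 - 2x^3 if 0 < x < 1, 0 if x <= 0."""
    if x >= 1.0:
        return 1.0
    if x <= 0.0:
        return 0.0
    return 3.0 * x * x - 2.0 * x * x * x

def occlusion(dist):
    """Occlusion by an in-front neighbor at distance `dist`, with the
    original wavelet spacing normalized to 1: full occlusion (1) once
    the neighbor has moved halfway in (dist <= 0.5), no occlusion (0)
    at dist >= 1, smooth in between."""
    return smoothstep(2.0 * (1.0 - dist))

def attenuate_amplitude(A, d_l, d_r):
    """Attenuate amplitude A of a wavelet given its nearest in-front
    neighbors at distances d_l and d_r (assumed combination: keep the
    stronger occlusion of the two sides)."""
    return (1.0 - max(occlusion(d_l), occlusion(d_r))) * A
```

A wavelet whose neighbors remain at the original spacing keeps its full amplitude; one whose neighbor has closed to half the spacing is silenced entirely.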
  • One technique for performing inter-view antialiasing is by attenuating local amplitude according to phase information.
  • the example embodiment implements a Gaussian filter for the antialiasing filtering. For a given wavelet at frequency level f with disparity d, the signal is filtered with
  • non-linear disparity mapping operators may be easily applied, which was not possible with Eulerian methods.
  • Such operators are usually defined as a disparity mapping function that maps disparity according to certain goals.
  • a disparity mapping function usually scales disparities in a non-linear way. To apply such a mapping during the synthesis, it is sufficient to replace the scaling factor s with the desired non-linear function. The rest of the described view synthesis technique remains unchanged.
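As an illustration, a hypothetical non-linear mapping that compresses large disparities toward the screen plane could take the place of the constant factor s; the function name and the tanh shape here are assumptions for the sketch, not the patent's specific operator:

```python
import math

def map_disparity(d, d_max=1.0):
    """Example non-linear disparity mapping: disparities are softly
    compressed so their magnitude never exceeds d_max, pushing distant
    content toward the zero-disparity (screen) plane while keeping
    small disparities nearly unchanged."""
    return d_max * math.tanh(d / d_max)

def reproject_with_mapping(x, d, a, mapping=map_disparity):
    """View-synthesis position for a wavelet, using a mapping function
    in place of the constant scaling factor s; the rest of the view
    synthesis technique is unchanged."""
    return x + a * mapping(d)
```

Small disparities pass through almost linearly, while a disparity of 100 with d_max = 2 is clamped to just under 2, pinning far content near screen depth.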
  • FIG. 7 (bottom) demonstrates one example of such manipulations.
  • FIG. 7 presents results of the additional processing described herein.
  • the top portion of FIG. 7 shows a synthesized view using the technique of the described embodiments (left) without inter-view antialiasing simulated as it would appear on an automultiscopic screen.
  • the top-right portion of FIG. 7 shows the same view with the antialiasing, and the inset shows a zoomed-in region. Note how aliasing in the form of ghosting is removed by the additional step.
  • the bottom portion of FIG. 7 shows an example of nonlinear disparity remapping. The depth for the foreground objects is compressed, resulting in this part of the scene being pushed close to the zero disparity plane (screen depth).
  • the described embodiments provide the performance necessary to convert 4K stereoscopic content in real time.
  • Two example embodiments are presented—a CUDA-based GPU implementation, and a hardware implementation using an FPGA with ARM processors.
  • CUDA is a parallel computing platform and application programming interface known in the art.
  • the example embodiment produces content for an 8-view 4K (3840 ⁇ 2160) automultiscopic display, where each of the output views has a resolution of 960 ⁇ 1080.
  • the example embodiment accepts a FullHD stereo video input and determines the initial disparity maps at a quarter of the input size. The rest of the pipeline operates at 960 × 1080.
  • the example embodiment implements the processing described herein on a GPU using CUDA.
  • the example embodiment runs on an Nvidia GeForce GTX Titan Z graphics card. For such a setup, the described embodiments can perform the conversion with the additional steps at 25-26 FPS for all sequences presented herein. The breakdown of the timing and the memory usage for the individual steps is presented in Table 1.
  • One advantage of the described embodiments is that most stages in the processing can be done in a scanline fashion. This eliminates the need for any external memory during the computation of these stages and thus makes the processing suitable for a hardware implementation such as a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
  • the described embodiments require only low-resolution disparity maps. Therefore, the described embodiments leverage the ARM processors inside the System-On-Chip (SoC) for this task. The ARM processor determines these disparity maps at the 240 ⁇ 180 resolution at 24 FPS.
  • FIG. 8 depicts each stage in an example embodiment hardware implementation 800 of the described embodiments.
  • a stereoscopic frame 802 is received by a frame copy block 804 , which produces two copies of the stereoscopic frame 802 .
  • a pyramid decomposition block 806 decomposes the frame into two pyramids: a first pyramid for the left view and a second pyramid for the right view. Both pyramids are sent to a wavelet reprojection block 808 , in which each wavelet in the pyramid is re-projected according to the disparity maps generated by the ARM processor 810 (which performs disparity estimation and refinement).
  • the re-projected wavelets are filtered similarly to the filtering described in Liu, Q. H., and Nguyen, N., "An accurate algorithm for nonuniform fast Fourier transforms (NUFFT's)," IEEE Microwave and Guided Wave Letters 8, 1 (1998), 18-20, and sent to the pyramid reconstruction block 812 .
  • This final stage 812 reconstructs views from the synthesized pyramids, and sends the resulting novel views (eight views in this example embodiment) to the output 814 .
  • the example embodiment was implemented on an FPGA SoC Xilinx ZC706 development board using Xilinx Vivado HLS 2015.4 software.
  • the FPGA SoC has two ARM processors running at up to 1 GHz and programmable logic with 350K logic cells and a total of 19 Mbit of internal RAM.
  • Table 2 shows the resource utilization of our implementation. Each stage is customized to the target, generating 8 views of 512 ⁇ 540 resolution at 24 FPS while running at 150 MHz.
  • the total memory utilization of our implementation is only 13 Mbit of the internal memory.
  • the example embodiment uses only about 50% of the hardware resources on the FPGA. The resolution may therefore be doubled to achieve FullHD resolution.
  • FIG. 9 is a diagram of an example internal structure of a processing system 900 that may be used to implement one or more of the embodiments herein.
  • Each processing system 900 contains a system bus 902 , where a bus is a set of hardware lines used for data transfer among the components of a computer or processing system.
  • the system bus 902 is essentially a shared conduit that connects different components of a processing system (e.g., processor, disk storage, memory, input/output ports, network ports, etc.) that enables the transfer of information between the components.
  • Attached to the system bus 902 is a user I/O device interface 904 for connecting various input and output devices (e.g., keyboard, mouse, displays, printers, speakers, etc.) to the processing system 900 .
  • a network interface 906 allows the computer to connect to various other devices attached to a network 908 .
  • Memory 910 provides volatile and non-volatile storage for information such as computer software instructions used to implement one or more of the embodiments of the present invention described herein, for data generated internally and for data received from sources external to the processing system 900 .
  • a central processor unit 912 is also attached to the system bus 902 and provides for the execution of computer instructions stored in memory 910 .
  • the system may also include support electronics/logic 914 , and a communications interface 916 .
  • the communications interface may comprise, for example, a port for receiving the stereoscopic frames 802 and outputting novel views 814 , as described in FIG. 8 .
  • the information stored in memory 910 may comprise a computer program product, such that the memory 910 may comprise a non-transitory computer-readable medium (e.g., a removable storage medium such as one or more DVD-ROM's, CD-ROM's, diskettes, tapes, etc.) that provides at least a portion of the software instructions for the invention system.
  • the computer program product can be installed by any suitable software installation procedure, as is well known in the art.
  • at least a portion of the software instructions may also be downloaded over a cable communication and/or wireless connection.
  • certain embodiments of the example embodiments described herein may be implemented as logic that performs one or more functions.
  • This logic may be hardware-based, software-based, or a combination of hardware-based and software-based.
  • Some or all of the logic may be stored on one or more tangible, non-transitory, computer-readable storage media and may include computer-executable instructions that may be executed by a controller or processor.
  • the computer-executable instructions may include instructions that implement one or more embodiments of the invention.
  • the tangible, non-transitory, computer-readable storage media may be volatile or non-volatile and may include, for example, flash memories, dynamic memories, removable disks, and non-removable disks.

US16/000,662 2017-06-05 2018-06-05 3DTV at home: Eulerian-Lagrangian stereo-to-multi-view conversion Active 2038-10-16 US10834372B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/000,662 US10834372B2 (en) 2017-06-05 2018-06-05 3DTV at home: Eulerian-Lagrangian stereo-to-multi-view conversion
US16/725,448 US10972713B2 (en) 2017-06-05 2019-12-23 3DTV at home: Eulerian-Lagrangian stereo-to-multi-view conversion

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762515193P 2017-06-05 2017-06-05
US16/000,662 US10834372B2 (en) 2017-06-05 2018-06-05 3DTV at home: Eulerian-Lagrangian stereo-to-multi-view conversion

Publications (2)

Publication Number Publication Date
US20180352208A1 US20180352208A1 (en) 2018-12-06
US10834372B2 true US10834372B2 (en) 2020-11-10
