WO2013164778A1 - Quality metric for processing 3d video - Google Patents

Quality metric for processing 3d video

Info

Publication number
WO2013164778A1
Authority
WO
WIPO (PCT)
Prior art keywords
view
image
parameter
display
values
Prior art date
Application number
PCT/IB2013/053461
Other languages
French (fr)
Inventor
Wilhelmus Hendrikus Alfonsus BRULS CHAMBERLIN
Bartolomeus Wilhelmus Damianus SONNEVELDT
Original Assignee
Koninklijke Philips N.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips N.V. filed Critical Koninklijke Philips N.V.
Priority to JP2015509552A priority Critical patent/JP6258923B2/en
Priority to EP13729086.2A priority patent/EP2845384A1/en
Priority to CN201380023230.3A priority patent/CN104272729A/en
Priority to US14/397,404 priority patent/US20150085073A1/en
Publication of WO2013164778A1 publication Critical patent/WO2013164778A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N 13/106 Processing image signals
    • H04N 13/111 Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N 13/106 Processing image signals
    • H04N 13/128 Adjusting depth or disparity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N 13/106 Processing image signals
    • H04N 13/144 Processing image signals for flicker reduction

Definitions

  • the invention relates to a 3D video device for processing a three dimensional [3D] video signal.
  • the 3D video signal comprises at least a first image to be displayed on a 3D display.
  • the 3D display requires multiple views for creating a 3D effect for a viewer.
  • the 3D video device comprises a receiver for receiving the 3D video signal.
  • the invention further relates to a method of processing a 3D video signal.
  • the invention relates to the field of generating and/or adapting views based on the 3D video signal for a respective 3D display.
  • the disparity/depth in the image may need to be mapped onto a disparity range of the target display device.
  • the document "A Perceptual Model for disparity, by p. Didyk et al, ACM Transactions on Graphics, Proc. of SIGGRAPH, year 2011, volume 30, number 4" provides a perceptual model for disparity and indicates that it can be used for adapting 3D image material for specific viewing conditions.
  • the paper describes that disparity contrasts are more perceptively noticeable and provides a disparity difference metric for retargeting.
  • the disparity difference metric is based on analyzing images based on the disparity differences to determine the amount of perceived perspective.
  • a process of adapting a 3D signal for different viewing conditions is called retargeting and global operators for retargeting are discussed, the effect of retargeting being determined based on the metric (e.g. in section 6, first two paragraphs, and section 6.2).
  • the known difference metric is rather complex and requires disparity data to be available for analysis.
  • the device as described in the opening paragraph comprises a processor for determining at least one processed view based on the 3D image data adapted by a parameter for targeting the multiple views to the 3D display, calculating a quality metric indicative of perceived 3D image quality, which quality metric is based on a combination of image values of the processed view and a further view, and determining a preferred value for the parameter based on performing said determining and calculating for multiple values of the parameter.
  • the method comprises receiving the 3D video signal, determining at least one processed view based on the 3D image data adapted by a parameter for targeting the multiple views to the 3D display, calculating a quality metric indicative of perceived 3D image quality, which quality metric is based on a combination of image values of the processed view and a further view, and determining a preferred value for the parameter based on performing said determining and calculating for multiple values of the parameter.
  • the measures have the effect that the device receives a 3D video signal and determines a parameter for adapting views for the respective display to enhance the quality of the 3D image as displayed by the respective 3D display for a viewer.
  • the process of adapting views for a particular display is called targeting the views for the 3D display.
  • the particular display may have a limited depth range for high quality 3D images.
  • a gain parameter may be determined for applying to the depth values used for generating or adapting the views for such display.
  • the respective display may have a preferred depth range, usually near the display screen that has a high sharpness, whereas 3D objects protruding towards the viewer tend to be less sharp.
  • An offset parameter may be applied to the views to control the amount of disparity, and subsequently the 3D objects may be shifted towards the high sharpness, preferred depth range.
  • the device is provided with an automatic system for adjusting said parameter for optimizing the 3D effect and perceived image quality of the respective 3D display.
  • the quality metric is calculated based on the combination of image values to determine the perceived 3D image quality and is used to measure the effect of multiple different values of the parameter on the 3D image quality.
  • the invention is also based on the following recognition.
  • the adjustment of the views for the respective 3D display may be performed manually by the viewer based on his judgment of the 3D image quality.
  • Automatic adjustment e.g. based on processing a depth or disparity map by gain and offset to map the depths into a preferred depth range of the respective 3D display, may result in images getting blurred for certain parts and/or a relatively small depth effect.
  • the inventors have seen that such mapping tends to be biased by relatively large objects having a relatively large disparity, but a relatively low contribution to perceived image quality, such as remote clouds.
  • the proposed quality metric is based on comparing image values of the combination of image values of the processed view that contains image data warped by disparity and image values of the further view, for example an image that is provided with the 3D video signal.
  • the image values of the combination represent both the image content and the disparity in the views as disparity is different in both views. Effectively objects that have high contrasts or structure do contribute substantially to the quality metric, whereas objects having few perceivable characteristics do hardly contribute in spite of large disparity.
  • when the image metric is used to optimize parameters impacting the on-screen disparity of rendered images, it is important to relate image information from different views. Moreover, in order to best relate these views, the image information compared is preferably taken from the corresponding x,y position in the image. More preferably this involves re-scaling the input and rendered image such that their image dimensions match, in which case the same x,y position can be matched.
  • the proposed metric does not require that disparity data or depth maps as such are provided or calculated to determine the metric. Instead, the metric is based on the image values of the processed image, which are modified by the parameter, and the further view.
  • the further view is a further processed view based on the 3D image data adapted by the parameter.
  • the further view represents a different viewing angle, and is processed by the same value of the parameter, e.g. offset.
  • the effect is that at least two processed views are compared and the quality metric represents the perceived quality due to the differences between the processed views.
  • the further view is a 2D view available in the 3D image data.
  • the effect is that the processed view is compared to an original 2D view that has a high quality and no artifacts due to view warping.
  • the further view is a further processed view based on the 3D image data adapted by the parameter and the processed view and the further processed view are interleaved to constitute the combination of image values.
  • the processed view may correspond to an interleaved 3D image to be displayed on an array of pixels of an auto stereoscopic 3D display by interleaving the multiple views.
  • the interleaved 3D image is constructed by assembling a combined matrix of pixels to be transferred to a display screen, which is provided with optics to accommodate different, adjacent views in different directions so that such different views are perceived by the respective left and right eyes of viewers.
  • the optics may be a lenticular array for constituting an autostereoscopic display (ASD) as disclosed in EP 0791847A1.
  • EP 0791847A1 by the same Applicant shows how image information associated with the different views may be interleaved for a lenticular ASD.
  • the respective subpixels of the display panel under the lenticular (or other light directing means) are assigned view numbers; i.e. they carry information associated with that particular view.
  • the lenticular (or other light directing means) overlaying the display panel subsequently directs the light emitted by the respective subpixels to the eyes of an observer, thereby providing the observer with pixels associated with a first view to the left eye and a second view to the right eye.
  • the observer will, provided that proper information is provided in the first and second view image, perceive a stereoscopic image.
  • pixels of different views are interleaved, preferably at the subpixel level when looking at the respective R, G and B values of a display panel.
  • the processed image is now similar to the interleaved image that has to be generated for the final 3D display.
  • the quality metric is calculated based on the interleaved image, e.g. by determining a sharpness of the interleaved image.
  • the processor is arranged for determining at least a first view and a second view based on the 3D image data adapted by the parameter, and interleaving the at least first and second view to determine the processed view.
  • the interleaved view is compared to the further view, e.g. a 2D image as provided in the 3D video signal.
  • the processor is arranged for determining the processed view based on a leftmost and/or a rightmost view, the multiple views forming a sequence of views extending from the leftmost view to the rightmost view.
  • the leftmost and/or rightmost view contain relatively high disparity with respect to the further view.
  • the processor is arranged for calculating the quality metric based on a Peak Signal-to-Noise Ratio calculation on the combination of image values, or based on a sharpness calculation on the combination of image values.
  • the Peak Signal-to-Noise Ratio is the ratio between the maximum possible power of a signal and the power of corrupting noise that affects the fidelity of its representation.
  • the PSNR now provides a measure of perceived quality of 3D image.
  • the parameter for targeting the 3D video comprises at least one of an offset; a gain; or a type of scaling.
  • the preferred value of such parameter is applied for targeting the views for the 3D display as a processing condition for adapting the warping of views.
  • the offset when applied to the views, effectively moves objects back or forth with respect to the plane of the display.
  • a preferred value for the offset moves important objects to a position near the 3D display plane.
  • the gain when applied to the views, effectively moves objects away or towards the plane of the 3D display.
  • a preferred value for the gain moves important objects with respect to the 3D display plane.
  • the type of scaling indicates how the values in the views are modified into actual values when warping the views, e.g. bi-linear scaling, bicubic scaling, or how to adapt the viewing cone.
  • the processor is arranged for calculating the quality metric based on a central area of the combination of image values by ignoring border zones.
  • the border zones may be disturbed, or incomplete due to the adapting by the parameter, and usually do not contain relevant high disparity values or protruding objects.
  • the metric when only based on the central area, is more reliable.
  • the processor is arranged for calculating the quality metric by applying a weighting on the combination of image values in dependence on corresponding depth values. Differences between the image values are further weighted by local depths, e.g. protruding objects that have more impact on perceived quality may be stressed to have more contribution to the quality metric.
  • the processor is arranged for determining a region of interest in the processed view, and for calculating the quality metric by applying a weighting on the combination of image values in the region of interest. In the region of interest differences between the image values are weighted for calculating the quality metric.
  • the processor may have a face detector for determining the region of interest.
  • the processor is arranged for calculating the quality metric for a period of time in dependence of a shot in the 3D video signal.
  • the preferred value of the parameter applies to a period of the 3D video signal that has a same 3D configuration, e.g. a specific camera and zoom configuration.
  • the configuration is substantially stable during a shot of a video program. Shot boundaries may be known or can be easily detected at the source side, and a preferred value for the parameter is advantageously determined for the time period corresponding to the shot.
  • the processor may be further arranged for updating the preferred value of the parameter in dependence of a change of the region of interest exceeding a predetermined threshold, such as a substantial change of the depth position of a face.
  • Figure 1 shows a system for processing 3D video data and displaying the 3D video data
  • Figure 2 shows a method of processing a 3D video signal
  • Figure 3 shows a distribution of disparity values
  • Figure 4 shows a 3D signal
  • Figure 5 shows interleaved views for various offset values
  • Figure 6 shows a quality metric calculated for different values of an offset parameter
  • Figure 7 shows a system to determine an offset based on a sharpness metric
  • Figure 8 shows example depth map histograms
  • Figure 9 shows scaling for adapting the view cone.
  • a 3D video signal may be formatted and transferred according to a so-called 3D video format. Some formats are based on using a 2D channel to also carry stereo information.
  • the image is represented by image values in a two-dimensional array of pixels. For example the left and right view can be interlaced or can be placed side by side or top-bottom (above and under each other) in a frame.
  • a depth map may be transferred, and possibly further 3D data like occlusion or transparency data.
  • a disparity map, in this text, is also considered to be a type of depth map.
  • the depth map has depth values also in a two-dimensional array corresponding to the image, although the depth map may have a resolution different from that of the "texture" input image(s) contained in the 3D signal.
  • the 3D video data may be compressed according to compression methods known as such, e.g. MPEG. Any 3D video system, such as internet or a Blu-ray Disc (BD), may benefit from the proposed enhancements.
  • the 3D display can be a relatively small unit (e.g. a mobile phone), a large stereo display (STD) requiring shutter glasses, any stereoscopic display (STD), an advanced STD taking into account a variable baseline, an active STD that targets the L and R views to the viewer's eyes based on head tracking, or an auto-stereoscopic multiview display (ASD), etc. Views need to be warped for said different types of displays, e.g. for ASDs and advanced STDs with variable baseline, based on the depth/disparity data in the 3D signal.
  • the disparity/depth in the image needs to be mapped onto a disparity range of the target display device, which is called targeting.
  • due to targeting, images may get blurred for certain parts and/or there may be a relatively small depth effect.
  • Figure 1 shows a system for processing 3D video data and displaying the 3D video data.
  • a 3D video signal 41 is provided to a 3D video device 50, which is coupled to a 3D display device 60 for transferring a 3D display signal 56.
  • the 3D video signal may for example be a 3D TV broadcast signal such as a standard stereo transmission using 1 ⁇ 2 HD frame compatible, multi view coded (MVC) or frame compatible full resolution (e.g. FCFR as proposed by Dolby).
  • Figure 1 further shows a record carrier 54 as a carrier of the 3D video signal.
  • the record carrier is disc-shaped and has a track and a central hole.
  • the track, constituted by a pattern of physically detectable marks, is arranged in accordance with a spiral or concentric pattern of turns constituting substantially parallel tracks on one or more information layers.
  • the record carrier may be optically readable, called an optical disc, e.g. a DVD or BD (Blu- ray Disc).
  • the information is embodied on the information layer by the optically detectable marks along the track, e.g. pits and lands.
  • the track structure also comprises position information, e.g. headers and addresses, for indicating the location of units of information, usually called information blocks.
  • the record carrier 54 carries information representing digitally encoded 3D image data like video, for example encoded according to the MPEG2 or MPEG4 encoding system, in a predefined recording format like the DVD or BD format.
  • the 3D video device 50 has a receiver for receiving the 3D video signal 41, which receiver has one or more signal interface units and an input unit 51 for parsing the incoming video signal.
  • the receiver may include an optical disc unit 58 coupled to the input unit for retrieving the 3D video information from an optical record carrier 54 like a DVD or Blu-ray disc.
  • the receiver may include a network interface unit 59 for coupling to a network 45, for example the internet or a broadcast network, such device being a set-top box or a mobile computing device like a mobile phone or tablet computer.
  • the 3D video signal may be retrieved from a remote website or media server.
  • the 3D video device may be a converter that converts an image input signal to an image output signal having view targeting information, e.g. a preferred value for a parameter for targeting as described below.
  • Such a converter may be used to convert input 3D video signals for a specific type of 3D display, for example standard 3D content to a video signal suitable for auto-stereoscopic displays of a particular type or vendor.
  • the 3D display requires multiple views for creating a 3D effect for a viewer.
  • the 3D video device may be a 3D enabled amplifier or receiver, a 3D optical disc player, or a satellite receiver or set top box, or any type of media player.
  • the 3D video device may be integrated in a multi-view ASD, such as a barrier or lenticular based ASD.
  • the 3D video device has a processor 52 coupled to the input unit 51 for processing the 3D information for generating a 3D display signal 56 to be transferred via an output interface unit 55 to the 3D display device, e.g. a display signal according to the HDMI standard, see "High Definition Multimedia Interface; Specification Version 1.4a of March 4, 2010", the 3D portion of which being available at
  • the 3D display device 60 is for displaying the 3D image data.
  • the device has an input interface unit 61 for receiving the 3D display signal 56 including the 3D video data and the view targeting information transferred from the 3D video device 50.
  • the device has a view processor 62 for providing multiple views of the 3D video data based on the 3D video information.
  • the views may be generated from the 3D image data using a 2D view at a known position and a depth map.
  • the process of generating such a view from the 2D view and the depth map is called warping of a view.
  • the views are further adapted based on the view targeting parameter as discussed below.
  • the processor 52 in the 3D video device may be arranged to perform said view processing.
  • Multiple views generated for the specified 3D display may be transferred with the 3D image signal towards said 3D display.
  • the 3D video device and the display may be combined into a single device.
  • the functions of the processor 52 and the video processor 62, and remaining functions of output unit 55 and input unit 61, may be performed by a single processor unit. The functions of the processor are described now.
  • the processor determines a processed view based on at least one of the multiple views adapted by a parameter for targeting the multiple views to the 3D display.
  • the parameter may for example be an offset, and/or a gain, applied to the views for targeting the views to the 3D display.
  • the processor determines a combination of image values of the processed view that contains image data warped by disparity and image values of a further view, for example an image that is provided with the 3D video signal.
  • a quality metric is calculated indicative of perceived 3D image quality.
  • the quality metric is based on the combination of image values.
  • the process of determining the processed view and calculating the quality metric is repeated for multiple values of the parameter, and a preferred value for the parameter is determined based on the respective metrics.
  • when the quality metric is calculated based on non-interleaved images, it is preferable to relate image information from the corresponding (x,y) position in the images.
  • if the rendered image is not at the same spatial resolution, preferably one or both images are scaled so as to simplify the calculation of the quality metric, in that then the same spatial (x,y) positions can be used.
  • alternatively, the quality metric calculation can be adapted so as to handle the original unscaled images, but to relate the proper image information, e.g. by calculating one or more intermediate values that allow comparison of the non-interleaved images.
  • the parameter may also be a type of scaling, which indicates how the values in the depth map are to be translated into actual values to be used when warping the views, e.g. bi-linear scaling, bicubic scaling, or a predetermined type of non-linear scaling. For different types of scaling the quality metric is calculated, and a preference is determined.
  • a further type of scaling refers to scaling the shape of the view cone, which is described below with reference to Figure 9.
  • the further view in the combination of image values may be a further processed view based on the 3D image data adapted by the parameter.
  • the further view represents a different viewing angle, and is processed by the same value of the parameter, e.g. offset.
  • the quality metric now represents the perceived quality due to the differences between the processed views.
  • the further view may be a 2D view available in the 3D image data. Now the processed view is compared to an original 2D view that has a high quality and no artifacts due to view warping.
  • the further view may be a further processed view based on the 3D image data adapted by the parameter and the processed view and the further processed view are interleaved to constitute the combination of image values.
  • a single interleaved image contains the image values of the combination.
  • the processed view may correspond to an interleaved 3D image to be displayed on an array of pixels of an auto stereoscopic 3D display by interleaving the multiple views.
  • the quality metric is calculated based on the interleaved image as such, e.g. by determining a sharpness of the interleaved image.
  • the processor may be arranged for determining at least a first view and a second view based on the 3D image data adapted by the parameter, and interleaving the at least first and second view to determine the processed view.
  • the interleaved view is compared to the further view, e.g. a 2D image as provided in the 3D video signal to calculate the quality metric, e.g. based on a PSNR calculation.
  • the processor may be arranged for determining the processed view based on a leftmost and/or a rightmost view from a sequence of views extending from the leftmost view to the rightmost view. Such an extreme view does have the highest disparity, and therefore the quality metric will be affected substantially.
  • Figure 2 shows a method of processing a 3D video signal.
  • the 3D video signal contains 3D image data to be displayed on a 3D display, which 3D display requires multiple views for creating a 3D effect for a viewer.
  • in stage RCV the method starts with receiving the 3D video signal.
  • in stage SETPAR 22 a value is set for a parameter for targeting the multiple views to the 3D display, e.g. an offset parameter.
  • a processed view is determined based on at least one of the multiple views adapted by the actual value of the parameter, as described above.
  • a quality metric is calculated indicative of perceived 3D image quality. The quality metric is based on the combination of image values of the processed view and the further view.
  • in stage LOOP 25 it is decided whether further values of the parameter need to be evaluated. If so, the process continues at stage SETPAR 22.
  • a preferred value for the parameter is determined based on the multiple corresponding quality metrics acquired by the loops of said determining and calculating for multiple values of the parameter. For example, the parameter value may be selected that has the best value for the quality metric, or an interpolation may be performed on the quality metric values found to estimate an optimum, e.g. a maximum.
  • the repeated calculation provides a solution in which a mapping is used to render an image and subsequently an error measure/metric is established based on the rendered image (or part thereof) so as to establish an improved mapping.
  • the error measure that is determined may be based on a processed view resulting from the interleaving of views.
  • alternatively, a processed view may be based on one or more views prior to interleaving, as described above.
  • the processing of 3D video may be used to convert content "off-line", e.g. during recording or using a short video delay.
  • the parameter may be determined for a period of a shot. Disparity at the start and end of a shot might be quite different. In spite of such differences the mapping within a shot needs to be continuous. Processing for periods may require shot-cut detection, off-line processing and/or buffering. Automatically detecting boundaries of a shot as such is known. Also the boundaries may already be marked or may be determined during a video editing process. For example an offset value that is determined for a close-up shot of a face, may be succeeded by a next offset value for a next shot of a remote landscape.
  • Figure 3 shows a distribution of disparity values.
  • the Figure shows a graph of disparity values from a 3D image.
  • the disparities vary from a low disparity value Disp_low to high disparity value Disp_high and may have statistical distribution as shown in the figure.
  • the example of distribution of disparities in the image content has a median or center of gravity at -10 pixels disparity.
  • Such disparity range must be mapped to a depth map to support an auto-stereoscopic display.
  • the disparities between Disp_low to Disp_high may be mapped linearly to depth 0 ..255.
  • Low and high values can also be the 5% or 95% points of the distribution.
  • the disparities may be determined for each shot using a shot detector. However linear mapping might lead to problems with asymmetric distributions.
  • an alternative mapping might be to map the center of gravity of the distribution (i.e. -10 pixels in the example) to a depth value corresponding to the ASD on-screen level (usually 128) and to map the disparity range linearly around this on-screen depth level.
  • such a mapping often does not match the visual perception when looking at the ASD.
  • An annoying blurring can be observed.
  • the blurring is content dependent.
  • An unattractive remedy to avoid the blurring is to reduce the overall depth range (low gain), however this leads to less perceived depth on the ASD. Manual control is also unattractive.
  • a depth map is provided, for example by converting stereo to 2D and depth. Then an initial mapping is performed, using a first reasonable disparity to depth mapping, such as mapping the center of the distribution to the depth value corresponding to ASD screen level. Then a number of views are generated from this depth and 2D signal and then interleaved to create a processed view.
  • the interleaved view may be coupled to the ASD display panel. The idea is to use the processed view as a 2D signal, and compare it with the original 2D signal. The process is repeated for a range of depth (or disparity) offset values.
  • the comparison as such can be done by a known method such as spectrum analysis, FFT, etc., but can also be a more simple method such as a SAD or PSNR calculation.
  • the area for processing may be limited to a central area of the image by avoiding the border data, for example a border of 30 pixels wide for the horizontal and vertical borders.
  • Figure 4 shows a 3D signal.
  • the 3D video signal comprises a 2D image and a corresponding depth map.
  • Figure 4a shows a 2D image
  • Figure 4b shows a corresponding depth map.
  • the views for rendering on the 3D display are generated based on the 2D image and the depth map. Subsequently the views are interleaved to create an interleaved view.
  • the interleaved view may be transferred to an LCD panel of an autostereoscopic display.
  • the interleaved views for different values of offset are now used as the processed views to calculate the quality metric based on PSNR for the respective offsets, as illustrated by Figures 5 and 6.
  • the images of Figure 5 were generated for a display panel having a 1920x1080 screen resolution wherein each pixel was composed of three RGB subpixels.
  • the rendered images represent images that were rendered using different depth offset parameters; i.e. the depth level in the range of 0-255 that corresponds to zero-disparity on the display.
  • the interleaved images were rendered for an ASD having a slanted lenticular applied.
  • the sub-pixels of all 1920x1080 image pixels of the respective interleaved image comprise view information associated with three different views.
  • Fig. 5a-5d correspond with four different depth offset values; an offset of 110, 120, 130 and 140 respectively.
  • the different offsets result in objects at different depths in the image being imaged more or less sharp as a result of the interleaving process and the different displacements (disparity) of image information in the rendered views.
  • the "crisp" zigzag pattern on the mug visible in Fig. 5a is blurred in Fig. 5b-d.
  • for Figures 5a-5d the quality metric, calculated as the PSNR with respect to the 2D picture, is 25.76 dB, 26.00 dB, 25.91 dB and 25.82 dB respectively.
  • Figure 6 shows a quality metric calculated for different values of an offset parameter.
  • the Figure shows the quality metric values based on the PSNR as a function of the offset parameter value. From the curve in the Figure it can be seen that an offset value of 120 results in the maximum value of the quality metric. Verification by a human viewer confirmed that 120 indeed is the optimum value for the offset for this image.
  • the method not only takes disparities into account, or just information from the 2D signal, but establishes a combined analysis. Due to the combined analysis, for example skies or clouds with little details but with large disparity values hardly contribute to the PSNR differences. This corresponds to perceived 3D image quality, since such objects at a somewhat blurred display position also hardly hamper the viewing experience.
  • the processed view may be a virtual interleaved view, i.e. different from the actual ASD interleaved view, by using an interleaving scheme with less views, or just one extreme view.
  • the processor may be equipped as follows.
  • the processor may have a unit for determining a region of interest in the processed view, and for calculating the quality metric by applying a weighting on differences of image values in the region of interest for displaying the region of interest in a preferred depth range of the 3D display.
  • the parameter is determined so as to enable displaying the region of interest in a preferred depth range of the 3D display.
  • the region of interest is constituted by elements or objects in the 3D video material that are assumed to catch the viewer's attention.
  • the region of interest data may indicate an area of the image that has a lot of details which will probably get the attention of the viewer.
  • the region of interest may be known or can be detected, or an indication may be available in the 3D video signal.
  • differences between the image values are weighted, e.g. objects that are intended to have more impact on perceived quality may be stressed to have more contribution to the quality metric.
  • the processor may have a face detector 53.
  • a detected face may be used to determine the region of interest.
  • a weighting may be applied for areas with faces to the corresponding image value differences, e.g. 5 times the normal weight on the squared differences for the PSNR calculation.
  • the weighting could be multiplied with the depth value or a value derived from the depth, e.g. a further weighting for faces at large depths (far out of screen), e.g. 10x, and weighting for faces at small depths (faces behind the screen) e.g. 4x.
  • the processor may be equipped for calculating the quality metric by applying a weighting on differences of image values in dependence on corresponding depth values.
  • a weight depending on the depth may be applied to image differences while calculating the metric, for example weighting at large depth 2x, and weighting at small depths 1x. This relates to the perceived quality, because blurring in the foreground is more annoying than blurring in the background.
  • a weight may be applied depending on the absolute difference of the depth and the depth value at screen level. For example a weighting at large depth differences of 2x, and weighting at small depth differences of 1x. This relates to the perceived quality, because the sensitivity of determining the optimal (maximum PSNR) offset level is increased.
  • the processor is equipped for calculating the quality metric based on processing along horizontal lines of the combination of image values. It is noted that disparity differences always occur in horizontal direction corresponding to the orientation of the eyes of viewers. Hence the quality metric may effectively be calculated in horizontal direction of the images. Such a one-dimensional calculation is less complex. Also the processor may be equipped for reducing the resolution of the combination of image values, for example by decimating the matrix of image values of the combination.
  • the processor may be equipped for applying a subsampling pattern or random subsampling to the combination of image values.
  • the subsampling pattern may be designed to take different pixels on adjacent lines, in order to avoid missing regular structures in the image content.
  • the random subsampling achieves that structured patterns do still contribute to the calculated quality metric.
  • a system to automatically determine the offset for a 3D display may be based on using a sharpness metric.
  • sharpness is an important parameter that influences the picture quality of 3D displays, especially auto-stereoscopic displays (ASD).
  • the sharpness metric may be applied to the combination of image values as described above.
  • the document "Local scale control for edge detection and blur estimation, by J. H. Elder and S. W. Zucker,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 7, pp. 699-716, July 1998 describes a method to calculate a blur-radius for the edges in an image.
  • the system may be applied to an image with an accompanying depth map.
  • the latter can e.g. be estimated from a stereo pair (left + right image), or transferred with the 3D video data.
  • the idea of the system is to weigh the histogram of the depth map using the sharpness metric. Then the depth values corresponding to sharp (in focus) areas of the image will have a higher weight than un-sharp areas. As such the mean of the resulting histogram will bias towards the in-focus depth plane.
  • the inverse of the blur-radius may be used.
  • Figure 7 shows a system to determine an offset based on a sharpness metric.
  • a 3D signal having image and depth data is provided at the input.
  • in a segmenting unit 61 a binary segmentation map S is calculated, e.g. using edge detection. S now indicates pixels in the image where the blur-radius can be calculated.
  • in a blur-radius calculator 62 the blur-radius BR(S) is calculated for the segmented input image.
  • in an inverter 63 (denoted by 1/X) the reciprocal value of the blur radius is used for determining the sharpness metric W(S).
  • in a histogram calculator 64 a weighted histogram of the segmented depth map is calculated.
  • the depth values depth(S) are multiplied (weighted) by the sharpness metric W(S).
  • a processor would thus be arranged for calculating a sharpness metric for locations in the input image, determining depths at those locations, weighting the depths with the corresponding sharpness metric and determining a mean value of the weighted depths (a sketch of this computation is given after this list).
  • the mean value may be shifted to a preferred sharpness value of the 3D display by applying a corresponding offset to the depths.
  • Figure 8 shows example depth map histograms. The histograms show depth values of an example picture.
  • the depth map values are between 0-255.
  • the upper graph 81 shows the original histogram of the depth map.
  • the lower graph 82 shows the weighted histogram using the sharpness metric.
  • Figure 9 shows scaling for adapting the view cone.
  • the view cone refers to the sequence of warped views for a multiview 3D display.
  • the type of scaling indicates the way the view cone is adapted compared to a regular cone in which each consecutive view has a same disparity difference with the preceding view.
  • Altering the cone shape means changing the relative disparity of neighboring views by an amount less than said same disparity difference.
  • Figure 9 top-left shows a regular cone shape.
  • the regular cone shape 91 is commonly used in traditional multiview renderers. The shape has an equal amount of stereo for most of the cone and a sharp transition towards the next repetition of the cone. A user positioned in this transition area will perceive a large amount of crosstalk and inverse stereo.
  • a saw tooth shaped curve indicates the regular cone shape 91 having a disparity linearly related to its position in the cone. The position of the views within the viewing cone is defined to be zero for the cone center, -1 for entirely left and +1 for entirely right.
  • altering the cone shape changes only the rendering of content on the display (i.e. view synthesis, interleaving) and does not require physical adjustments to the display.
  • the parameter for adapting the depths or the warping may be the type of scaling which is used for the 3D video material at the source side for altering the cone shape.
  • a set of possible scaling cone shapes for adapting the view cone may be predefined and each shape may be given an index, whereas the actual index value is selected based on the quality metric as calculated for the set of shapes.
  • the second curve shows three examples of adapted cone shapes.
  • the views on the second curve in each example have a reduced disparity difference with the neighboring views.
  • the viewing cone shape is adapted to reduce the visibility of artifacts by reducing the maximum rendering position.
  • the alternate cone shapes may have the same slope as the regular cone. Further away from the center, the cone shape is altered (in respect to the regular cone) to limit image warping.
  • Figure 9 top-right shows a cyclic cone shape.
  • the cyclic cone shape 92 is adapted to avoid the sharp transition by creating a bigger but less strong inverse stereo region.
  • Figure 9 bottom-left shows a limited cone.
  • the limited cone shape 93 is an example of a cone shape that limits the maximum rendering position to about 40% of the regular cone.
  • Figure 9 bottom-right shows a 2D-3D cone.
  • the 2D-3D cone shape 94 also limits the maximum rendering position, but re-uses the outside part of the cone to offer a mono (2D) viewing experience.
  • as a user moves through this cone, he/she experiences a cycle of stereo, inverse stereo, mono and again inverse stereo.
  • This cone shape allows a group of people of which only some members prefer stereo over mono to watch a 3D movie.
  • the invention provides a targeting method that aims to reduce the blur in the image resulting from the mapping.
  • the standard process of creating an image for display on a multi-view (lenticular/barrier) display is to generate multiple views and to interleave these views, typically on pixel or subpixel level, so that the different views are placed under the lenticular in a manner suitable for 3D display. It is proposed to use a processed view, e.g. the interleaved image, as a normal 2D image and compare it with a further view, e.g. the original 2D signal, for a range of values of a mapping parameter, such as offset, and calculate a quality metric.
  • the comparison can be based on any method, such as spectrum analysis, or SAD and PSNR measurements.
  • the analysis does not only take disparities into account but also takes into account the image content. That is, if an area of the image does not contribute to the stereoscopic effect due to the nature of the image content, then that particular area does not contribute substantially to the quality metric.
  • the current invention may be used for any type of 3D image data, either still picture or moving video.
  • 3D image data is assumed to be available as electronic, digitally encoded, data.
  • the current invention relates to such image data and manipulates the image data in the digital domain.
  • the invention may be implemented in hardware and/or software, or in programmable components.
  • a computer program product may implement the methods as described with reference to Figure 2.
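As a rough illustration of the sharpness-based offset determination of Figure 7 described in the list above, the following minimal Python sketch weights depth values at edge locations by a simple sharpness measure and derives an offset towards the screen-level depth. It is only a sketch: a horizontal gradient magnitude stands in for the inverse blur radius of Elder and Zucker, and the function names, the edge threshold and the screen level of 128 are illustrative assumptions, not part of the embodiments.

```python
import numpy as np

def sharpness_weighted_mean_depth(image, depth, edge_threshold=20.0):
    """Sketch of the Figure 7 idea: weight depth values at edge locations by
    a sharpness measure and take the weighted mean as the in-focus depth.

    A horizontal gradient magnitude is used as a crude stand-in for the
    inverse blur radius; the real system would use a blur estimator.
    """
    img = image.astype(np.float64)
    if img.ndim == 3:
        img = img.mean(axis=2)                       # luminance-like value
    grad = np.abs(np.diff(img, axis=1))              # shape (h, w-1)
    mask = grad > edge_threshold                     # segmentation map S
    weights = grad[mask]                             # sharpness metric W(S)
    depths = depth[:, :-1].astype(np.float64)[mask]  # depth values depth(S)
    if weights.size == 0:
        return float(depth.mean())
    return float(np.sum(weights * depths) / np.sum(weights))

def offset_towards_screen(mean_depth, screen_level=128.0):
    """Offset that shifts the sharp (in-focus) depth plane to screen level."""
    return screen_level - mean_depth
```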

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

A 3D video device (50) processes a video signal (41) that has at least a first image to be displayed on a 3D display. The 3D display (63) requires multiple views for creating a 3D effect for a viewer, such as an autostereoscopic display. The 3D video device has a processor (52) for determining a processed view based on the 3D image data adapted by a parameter for targeting the multiple views to the 3D display, and calculating a quality metric indicative of perceived 3D image quality. The quality metric is based on a combination of image values of the processed view and a further view. A preferred value for the parameter is determined based on repeatedly determining and calculating using different values. Advantageously, the quality metric predicts the perceived image quality based on a combination of image content and disparity.

Description

Quality metric for processing 3D video
FIELD OF THE INVENTION
The invention relates to a 3D video device for processing a three dimensional [3D] video signal. The 3D video signal comprises at least a first image to be displayed on a 3D display. The 3D display requires multiple views for creating a 3D effect for a viewer. The 3D video device comprises a receiver for receiving the 3D video signal.
The invention further relates to a method of processing a 3D video signal.
The invention relates to the field of generating and/or adapting views based on the 3D video signal for a respective 3D display. When content is not intended for playback on a specific autostereoscopic device, the disparity/depth in the image may need to be mapped onto a disparity range of the target display device.
BACKGROUND OF THE INVENTION
The document "A Perceptual Model for disparity, by p. Didyk et al, ACM Transactions on Graphics, Proc. of SIGGRAPH, year 2011, volume 30, number 4" provides a perceptual model for disparity and indicates that it can be used for adapting 3D image material for specific viewing conditions. The paper describes that disparity contrasts are more perceptively noticeable and provides a disparity difference metric for retargeting. The disparity difference metric is based on analyzing images based on the disparity differences to determine the amount of perceived perspective. A process of adapting a 3D signal for different viewing conditions is called retargeting and global operators for retargeting are discussed, the effect of retargeting being determined based on the metric (e.g. in section 6, first two paragraphs, and section 6.2).
SUMMARY OF THE INVENTION
The known difference metric is rather complex and requires disparity data to be available for analysis.
It is an object of the invention to provide a system for providing a parameter for targeting a 3D video signal to a respective 3D display based on a quality metric that is less complex while optimizing the perceived 3D image quality of a respective 3D display. For this purpose, according to a first aspect of the invention, the device as described in the opening paragraph comprises a processor for determining at least one processed view based on the 3D image data adapted by a parameter for targeting the multiple views to the 3D display, calculating a quality metric indicative of perceived 3D image quality, which quality metric is based on a combination of image values of the processed view and a further view, and determining a preferred value for the parameter based on performing said determining and calculating for multiple values of the parameter.
The method comprises receiving the 3D video signal, determining at least one processed view based on the 3D image data adapted by a parameter for targeting the multiple views to the 3D display, calculating a quality metric indicative of perceived 3D image quality, which quality metric is based on a combination of image values of the processed view and a further view, and determining a preferred value for the parameter based on performing said determining and calculating for multiple values of the parameter.
The measures have the effect that the device receives a 3D video signal and determines a parameter for adapting views for the respective display to enhance the quality of the 3D image as displayed by the respective 3D display for a viewer. The process of adapting views for a particular display is called targeting the views for the 3D display. For example, the particular display may have a limited depth range for high quality 3D images. For example a gain parameter may be determined for applying to the depth values used for generating or adapting the views for such display. In a further example the respective display may have a preferred depth range, usually near the display screen that has a high sharpness, whereas 3D objects protruding towards the viewer tend to be less sharp. An offset parameter may be applied to the views to control the amount of disparity, and subsequently the 3D objects may be shifted towards the high sharpness, preferred depth range. Effectively the device is provided with an automatic system for adjusting said parameter for optimizing the 3D effect and perceived image quality of the respective 3D display. In particular the quality metric is calculated based on the combination of image values to determine the perceived 3D image quality and is used to measure the effect of multiple different values of the parameter on the 3D image quality.
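As a rough illustration of the parameter sweep described above, the following Python sketch evaluates a set of candidate parameter values and returns the one with the best metric. The names render_processed_view and quality_metric are placeholders for the display-specific view synthesis and the chosen metric (e.g. PSNR); they are assumptions made for this sketch, not functions defined by the invention.

```python
import numpy as np

def select_preferred_value(candidates, render_processed_view, quality_metric,
                           image, depth):
    """Sweep candidate parameter values and keep the one that scores best.

    render_processed_view(image, depth, value) and
    quality_metric(processed, further_view) are placeholders for the view
    synthesis of the target display and for the chosen metric (e.g. PSNR).
    """
    scores = []
    for value in candidates:
        processed = render_processed_view(image, depth, value)
        scores.append(quality_metric(processed, image))  # further view = 2D input
    best = int(np.argmax(scores))
    return candidates[best], scores[best]
```

An interpolation over the returned scores could also be used to estimate an optimum between candidate values, as discussed above.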
The invention is also based on the following recognition. Traditionally the adjustment of the views for the respective 3D display may be performed manually by the viewer based on his judgment of the 3D image quality. Automatic adjustment, e.g. based on processing a depth or disparity map by gain and offset to map the depths into a preferred depth range of the respective 3D display, may result in images getting blurred for certain parts and/or a relatively small depth effect. The inventors have seen that such mapping tends to be biased by relatively large objects having a relatively large disparity, but a relatively low contribution to perceived image quality, such as remote clouds. The proposed quality metric is based on comparing image values of the combination of image values of the processed view that contains image data warped by disparity and image values of the further view, for example an image that is provided with the 3D video signal. The image values of the combination represent both the image content and the disparity in the views as disparity is different in both views. Effectively objects that have high contrasts or structure do contribute substantially to the quality metric, whereas objects having few perceivable characteristics do hardly contribute in spite of large disparity.
When the image metric is used to optimize parameters impacting the on-screen disparity of rendered images, it is important to relate image information from different views. Moreover, in order to best relate these views, the image information compared is preferably taken from the corresponding x,y position in the image. More preferably this involves re-scaling the input and rendered image such that their image dimensions match, in which case the same x,y position can be matched.
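A minimal sketch of the re-scaling step mentioned above, assuming a simple nearest-neighbour resize is acceptable for the comparison; real implementations may prefer a higher-quality filter.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize so both views share the same (x, y) grid."""
    h, w = img.shape[:2]
    ys = np.arange(out_h) * h // out_h
    xs = np.arange(out_w) * w // out_w
    return img[ys[:, None], xs]

def match_dimensions(rendered, reference):
    """Rescale the rendered view to the reference view's dimensions, if needed."""
    h, w = reference.shape[:2]
    if rendered.shape[:2] != (h, w):
        rendered = resize_nearest(rendered, h, w)
    return rendered, reference
```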
Advantageously, by using the combination of image values of the further view and the processed view for calculating the metric a measure has been found that corresponds to the perceived image quality. Moreover, the proposed metric does not require that disparity data or depth maps as such are provided or calculated to determine the metric. Instead, the metric is based on the image values of the processed image, which are modified by the parameter, and the further view.
Optionally, the further view is a further processed view based on the 3D image data adapted by the parameter. The further view represents a different viewing angle, and is processed by the same value of the parameter, e.g. offset. The effect is that at least two processed views are compared and the quality metric represents the perceived quality due to the differences between the processed views.
Optionally, the further view is a 2D view available in the 3D image data. The effect is that the processed view is compared to an original 2D view that has a high quality and no artifacts due to view warping.
Optionally, the further view is a further processed view based on the 3D image data adapted by the parameter and the processed view and the further processed view are interleaved to constitute the combination of image values. The processed view may correspond to an interleaved 3D image to be displayed on an array of pixels of an auto stereoscopic 3D display by interleaving the multiple views. The interleaved 3D image is constructed by assembling a combined matrix of pixels to be transferred to a display screen, which is provided with optics to accommodate different, adjacent views in different directions so that such different views are perceived by the respective left and right eyes of viewers. For example the optics may be a lenticular array for constituting an autostereoscopic display (ASD) as disclosed in EP 0791847A1.
EP 0791847A1 by the same Applicant shows how image information associated with the different views may be interleaved for a lenticular ASD. As can be seen in the figures of EP 0791847A1, the respective subpixels of the display panel under the lenticular (or other light directing means) are assigned view numbers; i.e. they carry information associated with that particular view. The lenticular (or other light directing means) overlaying the display panel subsequently directs the light emitted by the respective subpixels to the eyes of an observer, thereby providing the observer with pixels associated with a first view to the left eye and a second view to the right eye. As a result the observer will, provided that proper information is provided in the first and second view image, perceive a stereoscopic image.
As disclosed in EP 0791847A1 pixels of different views are interleaved, preferably at the subpixel level when looking at the respective R, G and B values of a display panel. Advantageously, the processed image is now similar to the interleaved image that has to be generated for the final 3D display. The quality metric is calculated based on the interleaved image, e.g. by determining a sharpness of the interleaved image.
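For illustration only, the sketch below interleaves several rendered views at the subpixel level into a single panel image. The cyclic view assignment used here is an assumption standing in for the actual slanted-lenticular view map of a given ASD panel.

```python
import numpy as np

def interleave_subpixels(views):
    """Toy subpixel interleaving of N rendered views into one panel image.

    Each (row, column, colour) subpixel is taken from view
    ((3 * column + colour + row) % N), a crude stand-in for a slanted
    lenticular view map; a real ASD uses the panel's actual assignment.
    """
    n = len(views)
    h, w, _ = views[0].shape
    stack = np.stack(views)                    # shape (n, h, w, 3)
    rows = np.arange(h)[:, None, None]         # (h, 1, 1)
    cols = np.arange(w)[None, :, None]         # (1, w, 1)
    chans = np.arange(3)[None, None, :]        # (1, 1, 3)
    view_idx = (3 * cols + chans + rows) % n   # which view feeds each subpixel
    return stack[view_idx, rows, cols, chans]  # (h, w, 3) interleaved image
```

On a real panel the view assignment depends on the lenticular pitch and slant, so the modulo pattern above should be read only as a placeholder.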
Optionally, the processor is arranged for determining at least a first view and a second view based on the 3D image data adapted by the parameter, and interleaving the at least first and second view to determine the processed view. The interleaved view is compared to the further view, e.g. a 2D image as provided in the 3D video signal.
Optionally, the processor is arranged for determining the processed view based on a leftmost and/or a rightmost view, the multiple views forming a sequence of views extending from the leftmost view to the rightmost view. Advantageously, the leftmost and/or rightmost view contain relatively high disparity with respect to the further view.
Optionally, the processor is arranged for calculating the quality metric based on a Peak Signal-to-Noise Ratio calculation on the combination of image values, or based on a sharpness calculation on the combination of image values. The Peak Signal-to-Noise Ratio (PSNR) is the ratio between the maximum possible power of a signal and the power of corrupting noise that affects the fidelity of its representation. The PSNR now provides a measure of perceived quality of 3D image.
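A minimal sketch of the two metric options mentioned above; the mean absolute horizontal gradient is only one possible stand-in for a sharpness calculation.

```python
import numpy as np

def psnr(a, b, peak=255.0):
    """Peak Signal-to-Noise Ratio between two equally sized images, in dB."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0.0 else 10.0 * np.log10(peak ** 2 / mse)

def sharpness(img):
    """Simple sharpness proxy: mean absolute horizontal gradient."""
    return float(np.mean(np.abs(np.diff(img.astype(np.float64), axis=1))))
```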
Optionally in the 3D device the parameter for targeting the 3D video comprises at least one of an offset; a gain; or a type of scaling. The preferred value of such parameter is applied for targeting the views for the 3D display as a processing condition for adapting the warping of views. The offset, when applied to the views, effectively moves objects back or forth with respect to the plane of the display. Advantageously a preferred value for the offset moves important objects to a position near the 3D display plane. The gain, when applied to the views, effectively moves objects away or towards the plane of the 3D display. Advantageously, a preferred value for the gain moves important objects with respect to the 3D display plane. The type of scaling indicates how the values in the views are modified into actual values when warping the views, e.g. bi-linear scaling, bicubic scaling, or how to adapt the viewing cone.
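As a sketch of applying gain and offset before warping, the following assumes an 8-bit depth map and an on-screen level of 128; both are illustrative choices, not fixed by the invention.

```python
import numpy as np

def target_depth(depth, gain=1.0, offset=0.0, screen_level=128.0):
    """Apply gain and offset to an 8-bit depth map around the on-screen level."""
    d = depth.astype(np.float64)
    d = screen_level + gain * (d - screen_level) + offset
    return np.clip(d, 0, 255).astype(np.uint8)
```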
Optionally, the processor is arranged for calculating the quality metric based on a central area of the combination of image values by ignoring border zones. The border zones may be disturbed, or incomplete due to the adapting by the parameter, and usually do not contain relevant high disparity values or protruding objects. Advantageously the metric, when only based on the central area, is more reliable.
Optionally, the processor is arranged for calculating the quality metric by applying a weighting on the combination of image values in dependence on corresponding depth values. Differences between the image values are further weighted by local depths, e.g. protruding objects that have more impact on perceived quality may be stressed to have more contribution to the quality metric.
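A possible sketch of such depth-dependent weighting is shown below; the 2x/1x weights and the depth convention (0..255 with 128 at screen level) are assumptions for the sketch, chosen to match the examples given further on.

```python
import numpy as np

def depth_weighted_metric(processed, reference, depth, screen_level=128):
    """Mean squared difference where each pixel's contribution is weighted
    by its depth: pixels in front of the screen plane count twice as much,
    reflecting that foreground blur is more annoying. The 2x/1x weights and
    the depth convention (0..255, 128 = screen level) are assumptions."""
    diff2 = (processed.astype(np.float64) - reference.astype(np.float64)) ** 2
    if diff2.ndim == 3:
        diff2 = diff2.mean(axis=2)  # combine colour channels per pixel
    weight = np.where(depth > screen_level, 2.0, 1.0)
    return float(np.average(diff2, weights=weight))
```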
Optionally, the processor is arranged for determining a region of interest in the processed view, and for calculating the quality metric by applying a weighting on the combination of image values in the region of interest. In the region of interest differences between the image values are weighted for calculating the quality metric. The processor may have a face detector for determining the region of interest.
Optionally, the processor is arranged for calculating the quality metric for a period of time in dependence of a shot in the 3D video signal. Effectively the preferred value of the parameter applies to a period of the 3D video signal that has a same 3D configuration, e.g. a specific camera and zoom configuration. Usually the configuration is substantially stable during a shot of a video program. Shot boundaries may be known or can be easily detected at the source side, and a preferred value for the parameter is advantageously determined for the time period corresponding to the shot.
Optionally, the processor may be further arranged for updating the preferred value of the parameter in dependence of a change of the region of interest exceeding a predetermined threshold, such as a substantial change of the depth position of a face.
Further preferred embodiments of devices and methods according to the invention are given in the appended claims, disclosure of which is incorporated herein by reference.
BRIEF DESCRIPTION OF THE DRAWINGS
These and other aspects of the invention will be apparent from and elucidated further with reference to the embodiments described by way of example in the following description and with reference to the accompanying drawings, in which
Figure 1 shows a system for processing 3D video data and displaying the 3D video data,
Figure 2 shows a method of processing a 3D video signal,
Figure 3 shows a distribution of disparity values,
Figure 4 shows a 3D signal,
Figure 5 shows interleaved views for various offset values,
Figure 6 shows a quality metric calculated for different values of an offset parameter,
Figure 7 shows a system to determine an offset based on a sharpness metric,
Figure 8 shows example depth map histograms, and
Figure 9 shows scaling for adapting the view cone.
The figures are purely diagrammatic and not drawn to scale. In the Figures, elements which correspond to elements already described may have the same reference numerals.
DETAILED DESCRIPTION OF EMBODIMENTS
There are many different ways in which a 3D video signal may be formatted and transferred, according to a so-called 3D video format. Some formats are based on using a 2D channel to also carry stereo information. In the 3D video signal the image is represented by image values in a two-dimensional array of pixels. For example the left and right view can be interlaced, or can be placed side by side or top-bottom (above and under each other) in a frame. Also a depth map may be transferred, and possibly further 3D data like occlusion or transparency data. A disparity map is, in this text, also considered to be a type of depth map. The depth map has depth values, also in a two-dimensional array corresponding to the image, although the depth map may have a resolution different from that of the "texture" input image(s) contained in the 3D signal. The 3D video data may be compressed according to compression methods known as such, e.g. MPEG. Any 3D video system, such as internet distribution or a Blu-ray Disc (BD), may benefit from the proposed enhancements.
The 3D display can be a relatively small unit (e.g. in a mobile phone), a stereoscopic display (STD), e.g. a large display requiring shutter glasses, an advanced STD taking into account a variable baseline, an active STD that targets the L and R views to the viewer's eyes based on head tracking, or an auto-stereoscopic multiview display (ASD), etc. Views need to be warped for these different types of displays, e.g. for ASDs and advanced STDs with variable baseline, based on the depth/disparity data in the 3D signal. When content is used that is not intended for playback on an autostereoscopic device, the disparity/depth in the image needs to be mapped onto a disparity range of the target display device, which is called targeting. However, due to targeting, parts of the image may get blurred and/or the depth effect may be relatively small.
Figure 1 shows a system for processing 3D video data and displaying the 3D video data. A 3D video signal 41 is provided to a 3D video device 50, which is coupled to a 3D display device 60 for transferring a 3D display signal 56. The 3D video signal may for example be a 3D TV broadcast signal such as a standard stereo transmission using ½ HD frame compatible, multi view coded (MVC) or frame compatible full resolution (e.g. FCFR as proposed by Dolby). Building upon a frame-compatible base layer, Dolby developed an enhancement layer to recreate the full resolution 3D images.
Figure 1 further shows a record carrier 54 as a carrier of the 3D video signal. The record carrier is disc-shaped and has a track and a central hole. The track, constituted by a pattern of physically detectable marks, is arranged in accordance with a spiral or concentric pattern of turns constituting substantially parallel tracks on one or more information layers. The record carrier may be optically readable, called an optical disc, e.g. a DVD or BD (Blu-ray Disc). The information is embodied on the information layer by the optically detectable marks along the track, e.g. pits and lands. The track structure also comprises position information, e.g. headers and addresses, for indicating the location of units of information, usually called information blocks. The record carrier 54 carries information representing digitally encoded 3D image data like video, for example encoded according to the MPEG2 or MPEG4 encoding system, in a predefined recording format like the DVD or BD format. The 3D video device 50 has a receiver for receiving the 3D video signal 41, which receiver has one or more signal interface units and an input unit 51 for parsing the incoming video signal. For example, the receiver may include an optical disc unit 58 coupled to the input unit for retrieving the 3D video information from an optical record carrier 54 like a DVD or Blu-ray disc. Alternatively (or additionally), the receiver may include a network interface unit 59 for coupling to a network 45, for example the internet or a broadcast network, such a device being for example a set-top box or a mobile computing device like a mobile phone or tablet computer. The 3D video signal may be retrieved from a remote website or media server. The 3D video device may be a converter that converts an image input signal to an image output signal having view targeting information, e.g. a preferred value for a parameter for targeting as described below. Such a converter may be used to convert input 3D video signals for a specific type of 3D display, for example standard 3D content to a video signal suitable for auto-stereoscopic displays of a particular type or vendor. The 3D display requires multiple views for creating a 3D effect for a viewer. In practice, the 3D video device may be a 3D enabled amplifier or receiver, a 3D optical disc player, or a satellite receiver or set top box, or any type of media player. Alternatively the 3D video device may be integrated in a multi-view ASD, such as a barrier or lenticular based ASD.
The 3D video device has a processor 52 coupled to the input unit 51 for processing the 3D information for generating a 3D display signal 56 to be transferred via an output interface unit 55 to the 3D display device, e.g. a display signal according to the HDMI standard, see "High Definition Multimedia Interface; Specification Version 1.4a of March 4, 2010", the 3D portion of which is available at http://hdmi.org/manufacturer/specification.aspx for public download.
The 3D display device 60 is for displaying the 3D image data. The device has an input interface unit 61 for receiving the 3D display signal 56 including the 3D video data and the view targeting information transferred from the 3D video device 50. The device has a view processor 62 for providing multiple views of the 3D video data based on the 3D video information. The views may be generated from the 3D image data using a 2D view at a known position and a depth map. The process of generating a view for a different 3D display eye position, based on using a view at a known position and a depth map is called warping of a view. The views are further adapted based on the view targeting parameter as discussed below. Alternatively the processor 52 in the 3D video device may be arranged to perform said view processing. Multiple views generated for the specified 3D display may be transferred with the 3D image signal towards said 3D display. The 3D video device and the display may be combined into a single device. The functions of the processor 52 and the video processor 62, and remaining functions of output unit 55 and input unit 61, may be performed by a single processor unit. The functions of the processor are described now.
In operation, the processor determines a processed view based on at least one of the multiple views adapted by a parameter for targeting the multiple views to the 3D display. The parameter may for example be an offset, and/or a gain, applied to the views for targeting the views to the 3D display. Then the processor determines a combination of image values of the processed view that contains image data warped by disparity and image values of a further view, for example an image that is provided with the 3D video signal.
Subsequently, a quality metric is calculated indicative of perceived 3D image quality. The quality metric is based on the combination of image values. The process of determining the processed view and calculating the quality metric is repeated for multiple values of the parameter, and a preferred value for the parameter is determined based on the respective metrics.
When the quality metric is calculated based on non-interleaved images, it is preferable to relate image information from the corresponding (x,y) position in the images. When the rendered image is not at the same spatial resolution, preferably one or both images are scaled so that the same spatial (x,y) positions can be used, which simplifies the calculation of the quality metric. Alternatively the quality metric calculation can be adapted so as to handle the original unscaled images, but to relate the proper image information, e.g. by calculating one or more intermediate values that allow comparison of the non-interleaved images.
The parameter may also be a type of scaling, which indicates how the values in the depth map are to be translated into actual values to be used when warping the views, e.g. bi-linear scaling, bicubic scaling, or a predetermined type of non-linear scaling. For different types of scaling the quality metric is calculated, and a preference is determined. A further type of scaling refers to scaling the shape of the view cone, which is described below with reference to Figure 8.
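Purely as an illustration, the type of scaling could select one of several value-mapping curves applied to the depth map before warping; the specific curves in the sketch below are assumptions and not mappings prescribed by this text.

```python
import numpy as np

def scale_depth_values(depth, scaling_type="linear"):
    """Translate stored depth-map values (0..255) into warp depths
    according to a 'type of scaling' parameter. The curves below are
    illustrative choices only."""
    d = depth.astype(np.float64) / 255.0           # normalise to 0..1
    if scaling_type == "linear":
        out = d
    elif scaling_type == "gamma":                  # simple non-linear curve
        out = d ** 1.5
    elif scaling_type == "s_curve":                # compress the extremes
        out = 0.5 - 0.5 * np.cos(np.pi * d)
    else:
        raise ValueError("unknown scaling type: %s" % scaling_type)
    return np.round(255.0 * out).astype(np.uint8)
```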
The further view in the combination of image values may be a further processed view based on the 3D image data adapted by the parameter. The further view represents a different viewing angle, and is processed by the same value of the parameter, e.g. offset. The quality metric now represents the perceived quality due to the differences between the processed views. The further view may be a 2D view available in the 3D image data. Now the processed view is compared to an original 2D view that has a high quality and no artifacts due to view warping.
Alternatively, the further view may be a further processed view based on the 3D image data adapted by the parameter and the processed view and the further processed view are interleaved to constitute the combination of image values. Now a single interleaved image contains the image values of the combination. For example, the processed view may correspond to an interleaved 3D image to be displayed on an array of pixels of an auto stereoscopic 3D display by interleaving the multiple views. The quality metric is calculated based on the interleaved image as such, e.g. by determining a sharpness of the interleaved image.
The processor may be arranged for determining at least a first view and a second view based on the 3D image data adapted by the parameter, and interleaving the at least first and second view to determine the processed view. The interleaved view is compared to the further view, e.g. a 2D image as provided in the 3D video signal to calculate the quality metric, e.g. based on a PSNR calculation.
The processor may be arranged for determining the processed view based on a leftmost and/or a rightmost view from a sequence of views extending from the leftmost view to the rightmost view. Such an extreme view does have the highest disparity, and therefore the quality metric will be affected substantially.
Figure 2 shows a method of processing a 3D video signal. The 3D video signal contains 3D image data to be displayed on a 3D display, which 3D display requires multiple views for creating a 3D effect for a viewer. Initially, at stage RCV 21, the method starts with receiving the 3D video signal. Next, in stage SETPAR 22, a value is set for a parameter for targeting the multiple views to the 3D display, e.g. an offset parameter.
Different values for the parameter are subsequently set for further iterations of the process. Next, at stage PVIEW 23, a processed view is determined based on at least one of the multiple views adapted by the actual value of the parameter, as described above. Next, at stage METR 24, a quality metric is calculated indicative of perceived 3D image quality. The quality metric is based on the combination of image values of the processed view and the further view. Next, at stage LOOP 25, it is decided whether further values of the parameter need to be evaluated. If so, the process continues at stage SETPAR 22. When sufficient values for the parameter have been evaluated, at stage PREF 26, a preferred value for the parameter is determined based on the multiple corresponding quality metrics acquired by the loops of said determining and calculating for multiple values of the parameter. For example, the parameter value may be selected that has the best value for the quality metric, or an interpolation may be performed on the quality metric values found to estimate an optimum, e.g. a maximum.
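The loop over stages SETPAR, PVIEW, METR, LOOP and PREF can be sketched as follows; render_view, metric and the candidate range are placeholders for the rendering step, the metric calculation and the parameter values described above.

```python
def find_preferred_value(image, depth, reference, render_view, metric,
                         candidates=range(100, 151, 5)):
    """Sweep candidate parameter values (SETPAR/LOOP), render a processed
    view for each (PVIEW), score it against the further view (METR) and
    return the value with the best score (PREF)."""
    best_value, best_score = None, float("-inf")
    for value in candidates:
        processed = render_view(image, depth, value)   # PVIEW
        score = metric(processed, reference)           # METR
        if score > best_score:
            best_value, best_score = value, score
    return best_value                                  # PREF
```

Instead of returning the single best candidate, the scores could also be interpolated to estimate an optimum between candidate values, as noted above.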
Effectively the repeated calculation provides a solution in which a mapping is used to render an image and subsequently an error measure/metric is established based on the rendered image (or part thereof) so as to establish an improved mapping. The error measure that is determined may be based on a processed view resulting from the interleaving of views. Alternatively, a processed view may be based on one or more views prior to interleaving, as described above.
The processing of 3D video may be used to convert content "off-line", e.g. during recording or using a short video delay. For example the parameter may be determined for a period of a shot. Disparity at the start and end of a shot might be quite different. In spite of such differences the mapping within a shot needs to be continuous. Processing for periods may require shot-cut detection, off-line processing and/or buffering. Automatically detecting boundaries of a shot as such is known. Also the boundaries may already be marked or may be determined during a video editing process. For example an offset value that is determined for a close-up shot of a face, may be succeeded by a next offset value for a next shot of a remote landscape.
Figure 3 shows a distribution of disparity values. The Figure shows a graph of disparity values from a 3D image. The disparities vary from a low disparity value Disp_low to a high disparity value Disp_high and may have a statistical distribution as shown in the figure. The example distribution of disparities in the image content has a median or center of gravity at -10 pixels disparity. Such a disparity range must be mapped to a depth map to support an auto-stereoscopic display. Traditionally, the disparities between Disp_low and Disp_high may be mapped linearly to depth values 0..255. The low and high values can also be the 5% and 95% points of the distribution. The disparities may be determined for each shot using a shot detector. However, linear mapping might lead to problems with asymmetric distributions. An alternative mapping might be to map the center of gravity of the distribution (i.e. -10 pixels in the example) to a depth value corresponding to the ASD on-screen level (usually 128) and to map the disparity range linearly around this on-screen depth level. However, such a mapping often does not match the visual perception when looking at the ASD. Often, for objects close to the viewer (out of screen) or objects far from the viewer, annoying blurring can be observed. The blurring is content dependent. An unattractive remedy to avoid the blurring is to reduce the overall depth range (low gain); however, this leads to less perceived depth on the ASD. Manual control is also unattractive.
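A minimal sketch of such a percentile-based linear mapping to a 0..255 depth range is given below; the clipping percentiles follow the text, while the orientation of the mapping (low disparity mapped to depth 0) is an assumption depending on the depth convention of the display.

```python
import numpy as np

def map_disparity_to_depth(disparity, low_pct=5, high_pct=95):
    """Clip disparities at the 5%/95% points of the distribution and map
    them linearly to depth values 0..255."""
    lo, hi = np.percentile(disparity, [low_pct, high_pct])
    if hi <= lo:                       # degenerate (flat) distribution
        return np.full(disparity.shape, 128, dtype=np.uint8)
    d = np.clip(disparity, lo, hi)
    return np.round(255.0 * (d - lo) / (hi - lo)).astype(np.uint8)
```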
In an embodiment the following processing is implemented. First a depth map is provided, for example by converting stereo to 2D plus depth. Then an initial mapping is performed, using a first reasonable disparity-to-depth mapping, such as mapping the center of the distribution to the depth value corresponding to ASD screen level. Then a number of views are generated from this depth and 2D signal and subsequently interleaved to create a processed view. The interleaved view may be coupled to the ASD display panel. The idea is to use the processed view as a 2D signal, and compare it with the original 2D signal. The process is repeated for a range of depth (or disparity) offset values. The comparison as such can be done by a known method such as spectrum analysis or an FFT, but can also be a simpler method such as a SAD or PSNR calculation. The area for processing may be limited to a central area of the image by avoiding the border data, for example a border of 30 pixels wide for the horizontal and vertical borders.
Figure 4 shows a 3D signal. The 3D video signal comprises a 2D image and a corresponding depth map. Figure 4a shows a 2D image, and Figure 4b shows a corresponding depth map. The views for rendering on the 3D display are generated based on the 2D image and the depth map. Subsequently the views are interleaved to create an interleaved view. The interleaved view may be transferred to an LCD panel of an autostereoscopic display. The interleaved views for different values of offset are now used as the processed views to calculate the quality metric based on PSNR for the respective offsets, as illustrated by Figures 5 and 6.
The images of Figures 5a-5d were generated for a display panel having a 1920x1080 screen resolution wherein each pixel was composed of three RGB subpixels. The rendered images represent images that were rendered using different depth offset parameters, i.e. the depth level in the range of 0-255 that corresponds to zero disparity on the display.
As a result of the difference in aspect ratio between the input image and that of the target device, the image is stretched along its horizontal axis. In order to better observe the differences between the respective images a section of the interleaved images has been enlarged. In order to calculate a PSNR quality metric the original input image (Fig. 4a) was scaled to 1920x1080. Subsequently the PSNR quality metrics were calculated for Fig. 5a-5d. The interleaved images were rendered for an ASD having a slanted lenticular applied.
As a result of the interleaving process the sub-pixels of all 1920x1080 image pixels of the respective interleaved image comprise view information associated with three different views.
Fig. 5a-5d correspond with four different depth offset values: an offset of 110, 120, 130 and 140 respectively. Visually, the different offsets result in objects at different depths in the image being imaged more or less sharply as a result of the interleaving process and the different displacements (disparity) of image information in the rendered views. As a result the "crisp" zigzag pattern on the mug visible in Fig. 5a is blurred in Figs. 5b-5d.
Figure 5a shows the interleaved picture with offset = 110. The quality metric is calculated based on PSNR with 2D picture, and is 25.76 dB.
Figure 5b shows the interleaved picture with offset = 120. The quality metric is calculated based on PSNR with 2D picture, and is 26.00 dB.
Figure 5c shows the interleaved picture with offset = 130. The quality metric is calculated based on PSNR with 2D picture, and is 25.91 dB.
Figure 5d shows the interleaved picture with offset = 140. The quality metric is calculated based on PSNR with 2D picture, and is 25.82 dB.
In the example illustrated by Figure 5 the optimum offset parameter would be 120.
Figure 6 shows a quality metric calculated for different values of an offset parameter. The Figure shows the quality metric values based on the PSNR as a function of the offset parameter value. From the curve in the Figure it can be seen that an offset value of 120 results in the maximum value of the quality metric. Verification by a human viewer confirmed that 120 indeed is the optimum value for the offset for this image.
It is noted that the method does not only take disparities into account, or just information from the 2D signal, but establishes a combined analysis. Due to the combined analysis, for example skies or clouds with little detail but with large disparity values hardly contribute to the PSNR differences. This corresponds to perceived 3D image quality, since such objects at a somewhat blurred display position also hardly hamper the viewing experience. The processed view may be a virtual interleaved view, i.e. different from the actual ASD interleaved view, by using an interleaving scheme with fewer views, or just one extreme view.
In the device as shown in Figure 1, the processor may be equipped as follows.
The processor may have a unit for determining a region of interest in the processed view, and for calculating the quality metric by applying a weighting on differences of image values in the region of interest for displaying the region of interest in a preferred depth range of the 3D display. The parameter is determined so as to enable displaying the region of interest in a preferred depth range of the 3D display. Effectively, the region of interest is constituted by elements or objects in the 3D video material that are assumed to catch the viewer's attention. For example, the region of interest data may indicate an area of the image that has a lot of details which will probably get the attention of the viewer. The region of interest may be known or can be detected, or an indication may be available in the 3D video signal.
In the region of interest differences between the image values are weighted, e.g. objects that are intended to have more impact on perceived quality may be stressed to have more contribution to the quality metric. For example, the processor may have a face detector 53. A detected face may be used to determine the region of interest. Making use of the face detector, optionally in combination with the depth map, a weighting may be applied to the image value differences in areas containing faces, e.g. 5 times the normal weight on the squared differences for the PSNR calculation. Also, the weighting could be multiplied with the depth value or a value derived from the depth, e.g. a further weighting for faces at large depths (far out of screen), e.g. 10x, and a weighting for faces at small depths (faces behind the screen), e.g. 4x.
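One possible reading of these example weights is sketched below; the face mask, the interpretation of the 5x/10x/4x figures and the depth convention are assumptions made for the sketch.

```python
import numpy as np

def face_weighted_sse(processed, reference, depth, face_mask,
                      screen_level=128):
    """Weighted squared-error over the combination of image values. Face
    pixels get 5x weight; faces far out of screen get 10x and faces behind
    the screen 4x, one possible reading of the example figures above.
    face_mask is a boolean H x W map from some face detector."""
    diff2 = (processed.astype(np.float64) - reference.astype(np.float64)) ** 2
    if diff2.ndim == 3:
        diff2 = diff2.mean(axis=2)  # combine colour channels per pixel
    weight = np.ones(diff2.shape)
    weight[face_mask] = 5.0
    weight[face_mask & (depth > screen_level)] = 10.0  # far out of screen
    weight[face_mask & (depth < screen_level)] = 4.0   # behind the screen
    return float(np.sum(weight * diff2) / np.sum(weight))
```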
Furthermore, the processor may be equipped for calculating the quality metric by applying a weighting on differences of image values in dependence on corresponding depth values. Selectively, a weight depending on the depth may be applied to image differences while calculating the metric, for example a weighting of 2x at large depths and 1x at small depths. This relates to the perceived quality, because blurring in the foreground is more annoying than blurring in the background.
Optionally, a weight may be applied depending on the absolute difference between the depth and the depth value at screen level, for example a weighting of 2x at large depth differences and 1x at small depth differences. This relates to the perceived quality, because the sensitivity of determining the optimal (maximum PSNR) offset level is increased.
In an embodiment the processor is equipped for calculating the quality metric based on processing along horizontal lines of the combination of image values. It is noted that disparity differences always occur in the horizontal direction, corresponding to the orientation of the viewers' eyes. Hence the quality metric may effectively be calculated in the horizontal direction of the images. Such a one-dimensional calculation is less complex. Also the processor may be equipped for reducing the resolution of the combination of image values, for example by decimating the matrix of image values of the combination.
Furthermore, the processor may be equipped for applying a subsampling pattern or random subsampling to the combination of image values. The subsampling pattern may be designed to take different pixels on adjacent lines, in order to avoid missing regular structures in the image content. Advantageously, random subsampling ensures that structured patterns still contribute to the calculated quality metric.
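A sketch combining horizontal-line processing with a per-line random column phase might look as follows; the step sizes and the SAD-style metric are illustrative choices.

```python
import numpy as np

def subsampled_sad(processed, reference, row_step=4, col_step=8, seed=0):
    """Mean absolute difference on a subset of the combination of image
    values: only every row_step-th horizontal line is used, and within
    each used line a pseudo-random column phase shifts the sampled pixels
    so the same image structure is not hit on every line."""
    rng = np.random.default_rng(seed)
    p = processed.astype(np.float64)
    r = reference.astype(np.float64)
    total, count = 0.0, 0
    for y in range(0, p.shape[0], row_step):       # horizontal lines only
        phase = int(rng.integers(0, col_step))     # vary columns per line
        line_p, line_r = p[y, phase::col_step], r[y, phase::col_step]
        total += np.abs(line_p - line_r).sum()
        count += line_p.size
    return total / count
```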
A system to automatically determine the offset for a 3D display may be based on using a sharpness metric. Sharpness as such is an important parameter that influences the picture quality of 3D displays, especially auto-stereoscopic displays (ASD). The sharpness metric may be applied to the combination of image values as described above. The document "Local scale control for edge detection and blur estimation" by J. H. Elder and S. W. Zucker, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 7, pp. 699-716, July 1998, describes a method to calculate a blur radius for the edges in an image.
Alternatively, the system may be applied to an image with an accompanying depth map. The latter can e.g. be estimated from a stereo pair (left + right image), or transferred with the 3D video data. The idea of the system is to weigh the histogram of the depth map using the sharpness metric. Then the depth values corresponding to sharp (in focus) areas of the image will have a higher weight than un-sharp areas. As such the mean of the resulting histogram will bias towards the in-focus depth plane. As a sharpness metric, the inverse of the blur-radius may be used.
Figure 7 shows a system to determine an offset based on a sharpness metric. A 3D signal having image and depth data is provided at the input. In a segmenting unit 61 a binary segmentation map S is calculated using, for example, edge detection. S now indicates pixels in the image where the blur radius can be calculated. In a blur-radius calculator 62 the blur radius BR(S) is calculated for the segmented input image. In an inverter 63 (denoted by 1/X) the reciprocal value of the blur radius is used for determining the sharpness metric W(S). In histogram calculator 64 a weighted histogram of the segmented depth map is calculated. In this process, depth values depth(S) are multiplied (weighted) with the sharpness metric W(S). In an average calculator 65 the mean of the histogram is calculated, which is now biased towards the focal plane (= optimal offset) of the input image. In such a system a processor would be arranged for calculating a sharpness metric for locations in the input image, determining depths at the locations, weighting the depths with the corresponding sharpness metric and determining a mean value of the weighted depths. The mean value may be shifted to a preferred sharpness value of the 3D display by applying a corresponding offset to the depths.

Figure 8 shows example depth map histograms. The histograms show depth values of an example picture. The depth map values are between 0-255. The image has a focal plane around depth = 104, which depth would be an optimal offset for an ASD putting the sharp areas on-screen (zero disparity). The upper graph 81 shows the original histogram of the depth map. The mean of this histogram is depth = 86, which substantially deviates from the optimal value of depth = 104. The lower graph 82 shows the weighted histogram using the sharpness metric. The mean of this histogram is depth = 96, which is closer to the optimal value of depth = 104.
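A crude sketch of this pipeline is given below; the edge segmentation and blur-radius estimate are simple gradient-based stand-ins for the Elder-Zucker estimator and are assumptions of the sketch.

```python
import numpy as np

def sharpness_weighted_offset(image_gray, depth):
    """Estimate the offset as the sharpness-weighted mean of the depth
    map, following the Figure 7 pipeline. Segmentation and blur radius
    below are crude stand-ins, not the cited Elder-Zucker method."""
    gy, gx = np.gradient(image_gray.astype(np.float64))
    grad = np.hypot(gx, gy)
    edges = grad > np.percentile(grad, 90)      # S: binary segmentation map
    if not edges.any():
        return float(depth.mean())              # flat image, no edges found
    blur_radius = 1.0 / (grad[edges] + 1e-6)    # BR(S): stand-in estimate
    sharpness = 1.0 / blur_radius               # W(S): inverse blur radius
    weighted_depth = sharpness * depth[edges].astype(np.float64)
    return float(weighted_depth.sum() / sharpness.sum())
```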
Figure 9 shows scaling for adapting the view cone. The view cone refers to the sequence of warped views for a multiview 3D display. The type of scaling indicates the way the view cone is adapted compared to a regular cone in which each consecutive view has a same disparity difference with the preceding view. Altering the cone shape means changing the relative disparity of neighboring views by an amount less than said same disparity difference.
Figure 9 top-left shows a regular cone shape. The regular cone shape 91 is commonly used in traditional multiview renderers. The shape has an equal amount of stereo for most of the cone and a sharp transition towards the next repetition of the cone. A user positioned in this transition area will perceive a large amount of crosstalk and inverse stereo. In the Figure a sawtooth-shaped curve indicates the regular cone shape 91 having a disparity linearly related to its position in the cone. The position of the views within the viewing cone is defined to be zero for the cone center, -1 for entirely left and +1 for entirely right.
It should be understood that altering the cone shape changes only the rendering of content on the display (i.e. view synthesis, interleaving) and does not require physical adjustments to the display. By adapting the viewing cone artifacts may be reduced and a zone of reduced 3D effect may be created for accommodating humans that have no or limited stereo viewing ability, or prefer watching limited 3D or 2D video. The parameter for adapting the depths or the warping may be the type of scaling which is used for the 3D video material at the source side for altering the cone shape. For example a set of possible scaling cone shapes for adapting the view cone may be predefined and each shape may be given an index, whereas the actual index value is selected based on the quality metric as calculated for the set of shapes.
The further three graphs of the Figure show three examples of adapted cone shapes, each indicated by a second curve. The views on the second curve in each example have a reduced disparity difference with respect to the neighboring views. The viewing cone shape is adapted to reduce the visibility of artifacts by reducing the maximum rendering position. At the center position the alternate cone shapes may have the same slope as the regular cone. Further away from the center, the cone shape is altered (with respect to the regular cone) to limit image warping.
Figure 9 top-right shows a cyclic cone shape. The cyclic cone shape 92 is adapted to avoid the sharp transition by creating a bigger but less strong inverse stereo region.
Figure 9 bottom-left shows a limited cone. The limited cone shape 93 is an example of a cone shape that limits the maximum rendering position to about 40% of the regular cone. When a user moves through the cone, he/she experiences a cycle of stereo, reduced stereo, inverse stereo and again reduced stereo.
Figure 9 bottom-right shows a 2D-3D cone. The 2D-3D cone shape 94 also limits the maximum rendering position, but re-uses the outside part of the cone to offer a mono (2D) viewing experience. When a user moves through this cone, he/she experiences a cycle of stereo, inverse stereo, mono and again inverse stereo. This cone shape allows a group of people of which only some members prefer stereo over mono to watch a 3D movie.
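For illustration, a mapping from a normalized cone position to a rendering position could be sketched as follows for the regular and limited cone shapes; the normalization and the clipping curve are simplifying assumptions, the 40% default follows the example in the text, and the cyclic and 2D-3D shapes are not reproduced.

```python
def rendering_position(cone_position, shape="regular", limit=0.4):
    """Map a normalised position in the viewing cone (-1..+1, 0 = centre)
    to a rendering position. 'regular' is the linear sawtooth ramp;
    'limited' keeps the same slope near the centre but clips the maximum
    rendering position to `limit` of the regular cone (Figure 9,
    bottom-left). The clipping curve is an illustrative choice only."""
    if shape == "regular":
        return cone_position
    if shape == "limited":
        return max(-limit, min(limit, cone_position))
    raise ValueError("unknown cone shape: %s" % shape)
```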
In summary, the invention provides a targeting method that aims to reduce the blur in the image resulting from the mapping. The standard process of creating an image for display on a multi-view (lenticular/barrier) display is to generate multiple views and to interleave these views, typically at pixel or subpixel level, so that the different views are placed under the lenticular in a manner suitable for 3D display. It is proposed to use a processed view, e.g. the interleaved image, as a normal 2D image and compare it with a further view, e.g. the original 2D signal, for a range of values of a mapping parameter, such as an offset, and to calculate a quality metric. The comparison can be based on any method, such as spectrum analysis, or SAD and PSNR measurements. The analysis does not only take disparities into account but also takes into account the image content. That is, if an area of the image does not contribute to the stereoscopic effect due to the nature of the image content, then that particular area does not contribute substantially to the quality metric.
It is noted that the current invention may be used for any type of 3D image data, either still picture or moving video. 3D image data is assumed to be available as electronic, digitally encoded, data. The current invention relates to such image data and manipulates the image data in the digital domain. The invention may be implemented in hardware and/or software, or in programmable components. For example a computer program product may implement the methods as described with reference to Figure 2.
It will be appreciated that the above description for clarity has described embodiments of the invention with reference to different functional units and processors. However, it will be apparent that any suitable distribution of functionality between different functional units or processors may be used without deviating from the invention. For example, functionality illustrated to be performed by separate units, processors or controllers may be performed by the same processor or controllers. Hence, references to specific functional units are only to be seen as references to suitable means for providing the described functionality rather than indicative of a strict logical or physical structure or organization. The invention can be implemented in any suitable form including hardware, software, firmware or any combination of these.
It is noted, that in this document the word 'comprising' does not exclude the presence of other elements or steps than those listed and the word 'a' or 'an' preceding an element does not exclude the presence of a plurality of such elements, that any reference signs do not limit the scope of the claims, that the invention may be implemented by means of both hardware and software, and that several 'means' or 'units' may be represented by the same item of hardware or software, and a processor may fulfill the function of one or more units, possibly in cooperation with hardware elements. Further, the invention is not limited to the embodiments, and the invention lies in each and every novel feature or combination of features described above or recited in mutually different dependent claims.

CLAIMS:
1. 3D video device (50) for processing a three dimensional [3D] video signal (41), the 3D video signal comprising 3D image data to be displayed on a 3D display, which 3D display requires multiple views for creating a 3D effect for a viewer, the 3D video device comprising:
- receiver (51,58,59) for receiving the 3D video signal,
a processor (52) for
determining at least one processed view based on the 3D image data adapted by a parameter for targeting the multiple views to the 3D display,
calculating a quality metric indicative of perceived 3D image quality, which quality metric is based on a combination of image values of the processed view and a further view, and
determining a preferred value for the parameter based on performing said determining and calculating for multiple values of the parameter.
2. 3D video device as claimed in claim 1, wherein the further view is a further processed view based on the 3D image data adapted by the parameter, or the further view is a 2D view available in the 3D image data, or the further view is a further processed view based on the 3D image data adapted by the parameter and the processed view and the further processed view are interleaved to constitute the combination of image values.
3. 3D video device as claimed in claim 1, wherein the processor (52) is arranged for determining at least a first view and a second view based on the 3D image data adapted by the parameter, and interleaving the at least first and second view to determine the processed view, or the processor is arranged for determining the processed view based on a leftmost and/or a rightmost view, the multiple views forming a sequence of views extending from the leftmost view to the rightmost view.
4. 3D video device as claimed in claim 1, wherein the processor (52) is arranged for calculating the quality metric based on a Peak Signal-to-Noise Ratio calculation on the combination of image values, or based on a sharpness calculation on the combination of image values.
5. 3D video device as claimed in claim 1, wherein the parameter for targeting the 3D video comprises at least one of:
an offset;
a gain;
a type of scaling.
6. 3D video device as claimed in claim 1, wherein the processor (52) is arranged for calculating the quality metric based on a central area of the combination of image values by ignoring border zones, or for calculating the quality metric by applying a weighting on the combination of image values in dependence on corresponding depth values.
7. 3D video device as claimed in claim 1, wherein the processor (52) is arranged for determining a region of interest in the processed view, and for calculating the quality metric by applying a weighting on the combination of image values in the region of interest for displaying the region of interest in a preferred depth range of the 3D display.
8. 3D video device as claimed in claim 7, wherein the processor (52) comprises a face detector (53) for determining the region of interest.
9. 3D video device as claimed in claim 1, wherein the processor (52) is arranged for calculating the quality metric for a period of time in dependence of a shot in the 3D video signal.
10. 3D video device as claimed in claim 1, wherein the processor (52) is arranged for calculating the quality metric based on a subset of the combination of image values by at least one of:
processing along horizontal lines of the combination of image values;
reducing the resolution of the combination of image values;
applying a subsampling pattern or random subsampling to the combination of image values.
11. 3D video device as claimed in claim 1, wherein the receiver comprises a read unit (58) for reading a record carrier for receiving the 3D video signal.
12. 3D video device as claimed in claim 1, wherein the device comprises:
- a view processor (62) for generating the multiple views of the 3D video data based on the 3D video signal and for targeting the multiple views to the 3D display in dependence of the preferred value of the parameter;
the 3D display (63) for displaying the targeted multiple views.
13. Method of processing a three dimensional [3D] video signal, the 3D video signal comprising at least a first image to be displayed on a 3D display, which 3D display requires multiple views for creating a 3D effect for a viewer, the method comprising:
receiving the 3D video signal,
determining at least one processed view based on the 3D image data adapted by a parameter for targeting the multiple views to the 3D display,
calculating a quality metric indicative of perceived 3D image quality, which quality metric is based on a combination of image values of the processed view and a further view, and
determining a preferred value for the parameter based on performing said determining and calculating for multiple values of the parameter.
14. Method as claimed in claim 13, wherein the further view is a further processed view based on the 3D image data adapted by the parameter, or the further view is a 2D view available in the 3D image data, or the further view is a further processed view based on the 3D image data adapted by the parameter and the processed view and the further processed view are interleaved to constitute the combination of image values.
15. Computer program product for processing a three dimensional [3D] video signal, which program is operative to cause a processor to perform the respective steps of the method as claimed in claim 13.
PCT/IB2013/053461 2012-05-02 2013-05-02 Quality metric for processing 3d video WO2013164778A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2015509552A JP6258923B2 (en) 2012-05-02 2013-05-02 Quality metrics for processing 3D video
EP13729086.2A EP2845384A1 (en) 2012-05-02 2013-05-02 Quality metric for processing 3d video
CN201380023230.3A CN104272729A (en) 2012-05-02 2013-05-02 Quality metric for processing 3d video
US14/397,404 US20150085073A1 (en) 2012-05-02 2013-05-02 Quality metric for processing 3d video

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261641352P 2012-05-02 2012-05-02
US61/641,352 2012-05-02

Publications (1)

Publication Number Publication Date
WO2013164778A1 true WO2013164778A1 (en) 2013-11-07

Family

ID=48626493

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2013/053461 WO2013164778A1 (en) 2012-05-02 2013-05-02 Quality metric for processing 3d video

Country Status (5)

Country Link
US (1) US20150085073A1 (en)
EP (1) EP2845384A1 (en)
JP (1) JP6258923B2 (en)
CN (1) CN104272729A (en)
WO (1) WO2013164778A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0791847A1 (en) 1996-02-23 1997-08-27 Koninklijke Philips Electronics N.V. Autostereoscopic display apparatus
WO2011081646A1 (en) * 2009-12-15 2011-07-07 Thomson Licensing Stereo-image quality and disparity/depth indications

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ALEXANDRE BENOIT ET AL: "Quality Assessment of Stereoscopic Images", EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, vol. 18, no. 2, 1 January 2008 (2008-01-01), pages 69 - 13, XP055073037, ISSN: 1687-5176, DOI: 10.1023/A:1014573219977 *
DIDYK ET AL.: "A Perceptual Model for disparity", ACM TRANSACTIONS ON GRAPHICS, PROC. OF SIGGRAPH, vol. 30, no. 4, 2011
J. H. ELDER; S. W. ZUCKER: "Local scale control for edge detection and blur estimation", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, vol. 20, no. 7, July 1998 (1998-07-01), pages 699 - 716

Also Published As

Publication number Publication date
EP2845384A1 (en) 2015-03-11
CN104272729A (en) 2015-01-07
JP2015521407A (en) 2015-07-27
US20150085073A1 (en) 2015-03-26
JP6258923B2 (en) 2018-01-10

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13729086

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 14397404

Country of ref document: US

ENP Entry into the national phase

Ref document number: 2015509552

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE