US20110292044A1 - Depth map coding using video information - Google Patents
- Publication number: US20110292044A1 (application US 13/138,362)
- Authority: United States (US)
- Prior art keywords: picture, depth, video, video picture, coding
- Prior art date
- Legal status: Abandoned (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H04N13/128—Adjusting depth or disparity
- H04N13/111—Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation
- H04N13/122—Improving the 3D impression of stereoscopic images by modifying image signal contents, e.g. by filtering or adding monoscopic depth cues
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
- H04N13/158—Switching image signals
- H04N13/271—Image signal generators wherein the generated image signals comprise depth maps or disparity maps
- H04N19/463—Embedding additional information in the video signal during the compression process by compressing encoding parameters before transmission
- H04N2013/0081—Depth or disparity estimation from stereoscopic image signals
Definitions
- Implementations are described that relate to coding systems. Various particular implementations relate to depth map coding.
- depth maps are used to render virtual views.
- the distortion in depth maps (as compared to the ground truth depth, which is the accurate and actual depth) may result in degradation in the visual quality of rendered views.
- the conventional rate-distortion scheme cannot necessarily provide a direct measurement of rendering quality.
- a portion of a first depth picture that is to be coded is accessed.
- the portion of the first depth picture has one or more depth values for a corresponding portion of a first video picture. It is determined that differences between the portion of the first video picture and a corresponding portion of a second video picture are small enough that an encoding of the portion of the first video picture may replace the portion of the first video picture with the portion of the second video picture. Based on the determination that differences were small enough, the portion of the first depth picture is coded using an indicator.
- the indicator instructs a decoder (i) to find a portion of a second depth picture that has one or more depth values for the portion of the second video picture, and (ii) to use the portion of the second depth picture for the portion of the first depth picture.
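The decoder-side behavior described by this indicator can be sketched as follows; the function and argument names are hypothetical (not from the patent), and the portion is given as (row_start, row_end, col_start, col_end):

```python
def decode_depth_portion(indicator, first_depth, second_depth, region):
    """Sketch: when the skip indicator is set, find the collocated portion
    of the second (reference) depth picture and use it as the decoded
    portion of the first depth picture."""
    y0, y1, x0, x1 = region
    if indicator:
        for y in range(y0, y1):
            # Copy the collocated depth values from the reference depth picture.
            first_depth[y][x0:x1] = second_depth[y][x0:x1]
    return first_depth
```

This mirrors the encoder's determination that the corresponding video portions were close enough for the video skip to apply.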
- a video signal or a video signal structure includes a depth picture coding section including coding of a portion of a first depth picture.
- the video signal or video signal structure also includes a video picture coding section including coding of a portion of a first video picture.
- the first video picture corresponds to the first depth picture.
- the video signal or video signal structure further includes an indicator section including coding of at least a single indicator that instructs the decoder to perform at least the following two operations.
- the first operation is to decode a portion of a second depth picture by using the portion of the first depth picture for the portion of the second depth picture.
- the portion of the second depth picture is collocated with the portion of the first depth picture.
- the second operation is to decode a portion of a second video picture by using the portion of the first video picture for the portion of the second video picture.
- the portion of the second video picture is collocated with the portion of the first video picture, and the second video picture corresponds to the second depth picture.
- a coding of a portion of a first depth picture is accessed.
- a coding of a portion of a first video picture is accessed.
- the first video picture corresponds to the first depth picture.
- At least a single indicator is accessed.
- the single indicator instructs the decoder to perform at least the following two operations.
- the first operation is to decode a portion of a second depth picture by using the portion of the first depth picture for the portion of the second depth picture.
- the portion of the second depth picture is collocated with the portion of the first depth picture.
- the second operation is to decode a portion of a second video picture by using the portion of the first video picture for the portion of the second video picture.
- the portion of the second video picture is collocated with the portion of the first video picture, and the second video picture corresponds to the second depth picture.
- the portion of the second depth picture is decoded by using the portion of the first depth picture for the portion of the second depth picture.
- the portion of the second video picture is decoded by using the portion of the first video picture for the portion of the second video picture.
- implementations may be configured or embodied in various manners.
- an implementation may be performed as a method, or embodied as apparatus, such as, for example, an apparatus configured to perform a set of operations or an apparatus storing instructions for performing a set of operations, or embodied in a signal.
- FIG. 1 is a diagram of an implementation of an encoder.
- FIG. 2 is a diagram of an implementation of a decoder.
- FIG. 3 is a block diagram of an implementation of a video transmitter.
- FIG. 4 is a block diagram of an implementation of a video receiver.
- FIG. 5 is an example of a depth map.
- FIG. 6 is another example of a depth map.
- FIG. 7 is a diagram of an implementation of a framework for rendering multiple views from decoded multi-view plus depth (MVD) data.
- FIG. 8 is a diagram of an implementation of an encoding process.
- FIG. 9 is a diagram of an implementation of another encoding process.
- FIG. 10 is a diagram of an implementation of yet another encoding process.
- FIG. 11 is a diagram of an implementation of still another encoding process.
- FIG. 12 is a diagram of an implementation of a decoding process.
- FIG. 13 shows a relationship between translation and distortion for a video sequence depicting a newspaper.
- FIG. 14 shows a relationship between translation and distortion for a video sequence depicting breakdancers.
- in DIBR, depth maps are used to render virtual views.
- the distortion in depth maps (as compared to the ground truth depth) may result in degradation in the visual quality of rendered views.
- the conventional rate-distortion scheme cannot necessarily provide a direct measurement of rendering quality.
- errors in the estimated depth maps could increase the bitrate when the estimated depth maps are coded using conventional methods.
- the estimation error usually occurs in flat (homogeneous) regions, as there are insufficient stereo matching feature points associated with such regions.
- the inventors encode a depth map corresponding to a first video view.
- the depth map is encoded, however, not based on the distortion of the depth map encoding. Rather, the depth map is encoded based on the distortion in a video view that is different from the first video view and that is rendered from the first video view and the depth map.
- the encoding of the depth map uses a rate-distortion procedure, optimized over various coding modes and other coding options, in which the distortion is the distortion that results in the rendered video view.
- the inventors have recognized that when encoding depth maps, the conventional rate-distortion scheme cannot necessarily provide a direct measurement of rendering quality. This is because, at least, the conventional rate-distortion scheme uses the distortion of the depth map itself, and not a distortion of a rendered picture. Therefore, in at least one implementation, a new distortion metric is proposed, based on the characteristics of the distortion in a depth map and its effect on the rendered views, such that the mode decision can be optimized according to the rendering quality.
- the proposed methods and apparatus described herein are expected to improve one or more of the visual quality of the rendered views and the depth map coding efficiency.
- FIG. 1 shows an exemplary video encoder 100 to which the present principles may be applied, in accordance with an embodiment of the present principles.
- the video encoder 100 includes a combiner 105 having an output connected in signal communication with an input of a transformer 110 .
- An output of the transformer 110 is connected in signal communication with an input of a quantizer 115 .
- An output of the quantizer 115 is connected in signal communication with an input of an entropy coder 120 and an input of an inverse quantizer 125 .
- An output of the inverse quantizer 125 is connected in signal communication with an input of an inverse transformer 130 .
- An output of the inverse transformer 130 is connected in signal communication with a first non-inverting input of a combiner 135 .
- An output of the combiner 135 is connected in signal communication with an input of an intra predictor 145 and an input of a deblocking filter 140 .
- An output of the deblocking filter 140 is connected in signal communication with an input of a depth reference buffer 150 .
- An output of the depth reference buffer 150 is connected in signal communication with a first input of a displacement (motion/disparity) compensator 160 and a first input of a displacement (motion/disparity) estimator 155 .
- An output of the displacement (motion/disparity) estimator 155 is connected in signal communication with a second input of the displacement (motion/disparity) compensator 160 .
- An output of the displacement (motion/disparity) compensator 160 is connected in signal communication with a first input of a switch 165 .
- An output of the intra predictor 145 is connected in signal communication with a second input of the switch 165 .
- An output of the switch 165 is connected in signal communication with an inverting input of the combiner 105 .
- a second output of a mode decision module 170 is connected in signal communication with the switch 165 , for providing a select signal selecting between the first and the second input of the switch 165 .
- An output of a video parameters computer 122 is connected in signal communication with a second input of the mode decision module 170 and a third input of the displacement (motion/disparity) estimator 155 .
- a non-inverting input of the combiner 105 , a third input of the displacement (motion/disparity) compensator 160 , and a second input of the displacement (motion/disparity) estimator 155 are each available as inputs of the video encoder 100 , for receiving depth sequences.
- a first input of the mode decision module 170 and a fourth input of a displacement (motion/disparity) estimator 155 are available as inputs of the video encoder 100 , for receiving camera parameters.
- An input of the video parameters computer 122 is available as an input of the video encoder 100 , for receiving video sequences.
- a first output of the mode decision module 170 and an output of the entropy coder 120 are available as outputs of the video encoder 100 , for outputting a bitstream.
- the mode decision module 170 differs from a conventional video encoder in at least the following manners.
- the mode decision module 170 will input camera parameters and video frames to compute k and n in Equation (5) and Equation (6) such that a new distortion measurement can be calculated for the mode decision.
- the mode decision module 170 will also use video frames to determine if the current macroblock should be forced to be encoded using skip mode.
- the mode decision module 170 will calculate the new distortion measurement using camera parameters and video frames, and also check if the current macroblock should be forced to be encoded using skip mode.
- these functions may be performed, at least in part, by a block other than the mode decision module 170 , and even by a block(s) not shown in FIG. 1 .
- another implementation performs these functions in a processing device (not shown in FIG. 1 ) that controls the encoder of FIG. 1 .
- the control of FIG. 1 includes, in this implementation, providing the inputs to the encoder of FIG. 1 , accessing the outputs from the encoder of FIG. 1 , and controlling the timing and signal flow of the encoder of FIG. 1 .
- the mode decision module 170 may be implemented in, for example, a general purpose computer, or a function-specific video encoder.
- a computer or encoder may include hardware, firmware, and/or software that is programmed to perform one or more of the algorithms of FIGS. 8-11 or any other algorithm provided in this application.
- FIG. 1 shows one implementation. Other implementations are contemplated. For example, another implementation does not have separate inputs on one or more of the blocks of FIG. 1 . Rather, a single input is used to receive multiple signals.
- mode decision module 170 may have only a single input. The single input receives the camera parameters, and also receives the video information from video parameters computer 122 . Further, another implementation of mode decision module 170 only has a single output that provides both the bitstream and the select signal.
- the encoder 100 may have a single input for receiving depth sequences, and then route those depth sequences to each of combiner 105 , compensator 160 , and estimator 155 .
- FIG. 2 shows an exemplary video decoder 200 to which the present principles may be applied, in accordance with an embodiment of the present principles.
- the video decoder 200 includes a bitstream receiver 205 having an output connected in signal communication with an input of a bitstream parser 210 .
- a first output of the bitstream parser 210 is connected in signal communication with an input of an entropy decoder 215 , for providing, for example, a residue bitstream.
- An output of the entropy decoder 215 is connected in signal communication with an input of an inverse quantizer 220 .
- An output of the inverse quantizer 220 is connected in signal communication with an input of an inverse transformer 225 .
- An output of the inverse transformer 225 is connected in signal communication with a first non-inverting input of a combiner 230 .
- An output of the combiner 230 is connected in signal communication with an input of a deblocking filter 235 and an input of an intra predictor 240 .
- An output of the deblocking filter 235 is connected in signal communication with an input of a depth reference buffer 250 .
- An output of the depth reference buffer 250 is connected in signal communication with a first input of a displacement (motion/disparity) compensator 255 .
- An output of the displacement (motion/disparity) compensator 255 is connected in signal communication with a second input of a switch 245 .
- An output of the intra predictor 240 is connected in signal communication with a first input of the switch 245 .
- An output of the switch 245 is connected in signal communication with a second non-inverting input of the combiner 230 .
- An output of a mode module 260 is connected in signal communication with the switch 245 , for providing a select signal selecting between the first and the second input of the switch 245 .
- a second output of the bitstream parser 210 is connected in signal communication with an input of the mode module 260 , for providing, for example, control syntax for determining the select signal.
- a third output of the bitstream parser 210 is connected in signal communication with a second input of the displacement (motion/disparity) compensator 255 , for providing, for example, a displacement (motion or disparity) vector.
- An input of the bitstream receiver 205 is available as an input of the video decoder 200 , for receiving a bitstream.
- An output of the deblocking filter 235 is available as an output of the video decoder 200 , for outputting a depth map.
- FIG. 3 shows an exemplary video transmission system 300 , to which the present principles may be applied, in accordance with an implementation of the present principles.
- the video transmission system 300 may be, for example, a head-end or transmission system for transmitting a signal using any of a variety of media, such as, for example, satellite, cable, telephone-line, or terrestrial broadcast.
- the transmission may be provided over the Internet or some other network.
- the video transmission system 300 is capable of generating and delivering video content encoded using, for example, skip mode with depth, or encoded using one or more of various other modes or techniques. This is achieved by generating an encoded signal(s) including depth information or information capable of being used to generate (including, for example, reconstructing) the depth information at a receiver end that may, for example, have a decoder.
- the video transmission system 300 includes an encoder 310 and a transmitter 320 capable of transmitting the encoded signal.
- the encoder 310 receives video information and generates an encoded signal(s) therefrom using skip mode with depth.
- the encoder 310 may be, for example, the encoder 100 described in detail above.
- the transmitter 320 may be, for example, adapted to transmit a program signal having one or more bitstreams representing encoded pictures and/or information related thereto. Typical transmitters perform functions such as, for example, one or more of providing error-correction coding, interleaving the data in the signal, randomizing the energy in the signal, and modulating the signal onto one or more carriers.
- the transmitter may include, or interface with, an antenna (not shown). Accordingly, implementations of the transmitter 320 may include, or be limited to, a modulator.
- FIG. 4 shows an exemplary video receiving system 400 to which the present principles may be applied, in accordance with an embodiment of the present principles.
- the video receiving system 400 may be configured to receive signals over a variety of media, such as, for example, satellite, cable, telephone-line, or terrestrial broadcast.
- the signals may be received over the Internet or some other network.
- the video receiving system 400 may be, for example, a cell-phone, a computer, a set-top box, a television, or other device that receives encoded video and provides, for example, decoded video for display to a user or for storage.
- the video receiving system 400 may provide its output to, for example, a screen of a television, a computer monitor, a computer (for storage, processing, or display), or some other storage, processing, or display device.
- the video receiving system 400 is capable of receiving and processing video content including video information.
- the video receiving system 400 includes a receiver 410 capable of receiving an encoded signal, such as for example the signals described in the implementations of this application, and a decoder 420 capable of decoding the received signal.
- the receiver 410 may be, for example, adapted to receive a program signal having a plurality of bitstreams representing encoded pictures. Typical receivers perform functions such as, for example, one or more of receiving a modulated and encoded data signal, demodulating the data signal from one or more carriers, de-randomizing the energy in the signal, de-interleaving the data in the signal, and error-correction decoding the signal.
- the receiver 410 may include, or interface with, an antenna (not shown). Implementations of the receiver 410 may include, or be limited to, a demodulator.
- the decoder 420 outputs video signals including video information and depth information.
- the decoder 420 may be, for example, the decoder 200 described in detail above.
- New data formats including both video and the corresponding depth maps such as multi-view plus depth (MVD), enable new video applications such as three-dimensional television (3DTV) and free-viewpoint video (FVV).
- FIG. 5 shows an example of a depth map 500 to which the present principles may be applied, in accordance with an embodiment of the present principles.
- FIG. 6 shows an example of another depth map 600 to which the present principles may be applied, in accordance with an embodiment of the present principles.
- These gray-level depth maps represent depth Z within the range between Z_near and Z_far.
- the pixel value d is calculated as the following: d = round(255 · (1/Z − 1/Z_far) / (1/Z_near − 1/Z_far)), so that Z = Z_near maps to d = 255 and Z = Z_far maps to d = 0.
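A minimal sketch of this quantization, assuming the standard inverse-depth mapping in which Z_near corresponds to pixel value 255 and Z_far to 0 (function name is illustrative):

```python
def depth_to_pixel(z, z_near, z_far):
    """Quantize a real depth value Z into an 8-bit depth-map sample d.
    The closest depth Z_near maps to 255 and the farthest Z_far maps to 0,
    with intermediate depths spaced linearly in inverse depth."""
    return round(255.0 * (1.0 / z - 1.0 / z_far) / (1.0 / z_near - 1.0 / z_far))
```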
- depth maps are estimated instead of captured.
- Intermediate video views can be generated using the techniques of DIBR, which takes the transmitted/stored video views (reference views) and the corresponding depth maps as input.
- This temporal variation in the depth map not only increases the bitrate to encode depth, it also causes artifacts in the rendered views.
- the framework 700 involves an auto-stereoscopic 3D display 710 , which supports output of multiple views, a first depth image-based renderer 720 , a second depth image-based renderer 730 , and a buffer for decoded data 740 .
- the decoded data is a representation known as Multiple View plus Depth (MVD) data.
- the nine cameras are denoted by V 1 through V 9 .
- Corresponding depth maps for the three input views are denoted by D 1 , D 5 , and D 9 .
- Any virtual camera positions in between the captured camera positions (e.g., Pos 1, Pos 2, Pos 3) can be generated using the available depth maps (D 1, D 5, D 9), as shown in FIG. 7.
- the baseline between the actual cameras (V 1 , V 5 and V 9 ) used to capture data can be large.
- the correlation between these cameras is significantly reduced, and the coding efficiency of these cameras may suffer since it may rely only on temporal correlation.
- Encoding video and depth may allow for the efficient transmission and/or storage of the above described new data formats. Such encoding represents additional challenges as we should consider not only the conventional rate-distortion performance, but also the quality of DIBR rendered views.
- new depth map coding methods are presented, which improve coding efficiency and subjective quality of the rendered views, with a new mode selection scheme based on a new distortion metric and/or video information analysis.
- the new distortion metric is derived from the relationship between the distortion of the compressed depth map and the distortion of the DIBR rendered views, which depends on, for example, camera parameters and video characteristics such as, for example, the amount of texture and flat regions.
- a skip mode (such as in the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 Recommendation (hereinafter the “MPEG-4 AVC Standard”)) can be selected by analyzing video information.
- This new skip scheme leads to a reduction of a flickering artifact caused by temporal variation in a depth map.
- a rate distortion optimized mode selection scheme has been used in video coding to select the best coding mode to achieve high quality decoded video with reduced bitrate.
- the best mode is selected which results in a minimum value of L: L = D + λ·R
- where D is the distortion, λ is the Lagrangian multiplier, and R is the bitrate
- the distortion is usually calculated as the sum of squared difference (SSD) between the original pixel values and the reconstructed pixel values.
- L can be referred to as a cost, or a coding cost for a given coding mode, and the minimization may occur over all available coding mode costs.
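A sketch of this Lagrangian mode decision; the candidate tuple layout and function names are illustrative, not from the patent:

```python
def ssd(orig, recon):
    """Sum of squared differences between original and reconstructed samples,
    the conventional distortion D used in rate-distortion optimization."""
    return sum((a - b) ** 2 for a, b in zip(orig, recon))

def select_mode(candidates, lam):
    """Pick the candidate coding mode minimizing the cost L = D + lambda * R.
    Each candidate is a tuple (mode_name, distortion, rate_in_bits)."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])
```

As the lambda multiplier grows, cheap-to-signal modes (such as skip) win even at higher distortion.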
- a “rendered” picture is the same as a “synthesized” picture throughout this application.
- Rendering/synthesizing refers generally to the process of using one or more existing pictures from one or more views to create a new picture from a different view. The new picture reflects what the subject matter of the one or more existing pictures would look like if viewed from the “different” view.
- Rendering/synthesizing may be performed using, for example, DIBR.
- DIBR may include functions such as, for example, warping, hole filling, and blending.
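As a rough illustration of these DIBR steps for rectified cameras, the sketch below warps one scan line by a disparity proportional to the depth sample and fills holes from the nearest left neighbor; the simplified disparity model (disparity = round(k·d/255)) and all names are assumptions, not the patent's rendering algorithm:

```python
def warp_row(row, depth_row, k):
    """Warp one scan line to a virtual view: each pixel shifts horizontally
    by a disparity proportional to its depth sample (closer pixels shift
    more). Unfilled positions (holes) are then filled from the left."""
    w = len(row)
    out = [None] * w
    for x in range(w):
        x2 = x + round(k * depth_row[x] / 255.0)
        if 0 <= x2 < w:
            out[x2] = row[x]
    # Simple hole filling: propagate the last known value from the left.
    last = 0
    for x in range(w):
        if out[x] is None:
            out[x] = last
        else:
            last = out[x]
    return out
```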
- the derivation of new distortion based on rendering quality is a two-step process.
- the first step models how the distortion ⁇ d in the compressed depth map leads to displacement ⁇ P in the rendering position (rendering position error).
- the second step models the relationship between rendering position error ⁇ P and the distortion in the rendered views.
- Embodiments 1 and 2 below provide two exemplary implementations to derive the new distortion metric for depth coding.
- ⁇ x is the horizontal distance between two cameras (also called the baseline distance)
- Z near and Z far correspond to d with pixel values 255 and 0 in the depth maps, respectively.
- k is determined by the intrinsic camera parameter a, the extrinsic camera parameter Δx, and the values of Z_near and Z_far, as follows: k = (a·Δx/255)·(1/Z_near − 1/Z_far)
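A sketch of the first modeling step, assuming the rendering position error is linear in the depth-sample error (ΔP = k·Δd) with k derived from the camera parameters and depth range; this form is reconstructed from the surrounding definitions rather than quoted verbatim from the patent, and the function names are illustrative:

```python
def position_error_weight(a, delta_x, z_near, z_far):
    """Weight k mapping a depth-sample error to a horizontal rendering
    position error. a is the intrinsic (focal) camera parameter, delta_x
    the baseline distance between the two cameras."""
    return (a * delta_x / 255.0) * (1.0 / z_near - 1.0 / z_far)

def rendering_position_error(k, delta_d):
    """First modeling step: a depth coding error delta_d displaces the
    warped pixel by delta_P = k * delta_d."""
    return k * delta_d
```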
- D_SSD and t_x in (8) are the vectors formed by D_SSD(·) and t_x, respectively, and T denotes the vector transpose operator.
- this variable can be scaled using the same weight the rendering process puts on the view. For example, suppose we want to render V_render using V_left, V_right and their corresponding depth map sequences with the following weight α:
- V_render = α·V_left + (1 − α)·V_right,  (9)
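Equation (9)'s weighted blend can be sketched as follows (illustrative names; views are flattened lists of samples):

```python
def blend_views(v_left, v_right, alpha):
    """Render an intermediate view as the weighted average of the left and
    right reference views: V_render = alpha*V_left + (1-alpha)*V_right."""
    return [alpha * l + (1.0 - alpha) * r for l, r in zip(v_left, v_right)]
```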
- the new distortion metric D render can be derived as follows:
- the global video parameter n is used for the entire sequence.
- Other embodiments, however, update the parameter whenever there is a scene change, since the content characteristics may change.
- the encoder will determine when to calculate/update n.
- the estimation of n can be performed more than once for a given sequence.
- FIG. 8 shows an exemplary video coding method 800 , in accordance with an embodiment of the present principles.
- Method 800 corresponds to the above described embodiment 1.
- a distortion weight k is calculated using a camera parameter.
- a loop over depth maps begins.
- a global parameter n is calculated using video characteristics.
- the parameter n is scaled based on the weights applied to a number of views used to render an intermediate view.
- a depth map is encoded with a new distortion metric as per Equation (12) using the same n in all macroblocks (MBs).
- the loop over the depth maps is terminated.
- the loop over the depth maps from steps 815 to 840 refers to a standard looping process, such as, for example, a "for" loop that is performed once for each depth map.
- step 830 may be performed in various ways. It is common, in step 830 , to perform a separate rate-distortion optimization procedure for each macroblock of a given depth map. Other implementations do not perform any optimization, or perform an optimization over limited modes, or perform a single optimization over the whole depth map that effectively selects a single mode for encoding all blocks of the depth map.
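The per-sequence flow of method 800 can be sketched with injected helpers; all helper names are hypothetical, and scaling n by a single rendering weight alpha is a simplification of the scaling in Equations (9)-(10):

```python
def encode_sequence(depth_maps, video_frames, camera_params, alpha,
                    compute_k, estimate_n, encode_depth_map):
    """Skeleton of method 800: compute the distortion weight k once from
    the camera parameters, then for each depth map estimate the global
    video parameter n from the video frame, scale it by the rendering
    weight, and encode the depth map with the new distortion metric."""
    k = compute_k(camera_params)          # distortion weight k from camera params
    coded = []
    for depth, frame in zip(depth_maps, video_frames):  # loop over depth maps
        n = estimate_n(frame)             # global parameter n from video
        n_scaled = alpha * n              # scale n by rendering weight
        coded.append(encode_depth_map(depth, k, n_scaled))  # step 830
    return coded
```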
- the above derivation refers to the translation of a rendered picture.
- the translated video picture may be, for example, an original or a reconstructed picture.
- the use of a non-rendered picture is an approximation. However, given the large similarity between the non-rendered picture and a picture that is rendered based on the non-rendered picture, the approximation generally has a high degree of accuracy. Other implementations do indeed use a rendered view in the determination of “n”.
- In Embodiment 1, camera parameters and a global parameter n, which models video characteristics, were used to construct a new distortion metric for depth coding.
- the characteristics of a frame can differ by local areas within the frame. For example, a homogeneous floor area, a complicated region such as a human face, and distinct object boundaries will have very different pixel distributions when we apply translations.
- An exhaustive approach to adapt to local variation can be developed, for example, by calculating the parameter n for each block to obtain a more precise result. However, this will increase computational complexity.
- the camera parameters are used the same way as in Embodiment 1. For modeling video characteristics, it is proposed to first partition a frame into regions with different characteristics and to estimate multiple values of n using techniques similar to those described in Embodiment 1. Then, when encoding a macroblock (MB), the distortion will be calculated based on the n value estimated for the region to which the MB belongs.
- n may be obtained for various regions.
- This set may be used for a single picture, or for multiple pictures such as, for example, an entire sequence.
- the estimated values of multiple local n can be scaled according to the weights applied to the rendering, such as in Equations (9) and (10). Furthermore, estimating a new set of local n based on scene change as described in embodiment 1 can also be applied to this embodiment 2.
- FIG. 9 shows an exemplary video coding method 900 , in accordance with another embodiment of the present principles.
- Method 900 corresponds to the above described embodiment 2.
- a distortion weight k is calculated using a camera parameter.
- a loop over depth maps begins.
- at step 925, a current video frame is partitioned into regions with different characteristics.
- a parameter n is calculated for each of the partitioned regions.
- the local parameters n are scaled based on the weights applied to a number of views used to render an intermediate view.
- the current depth map is encoded by calculating, for each macroblock in the depth map, a new distortion metric as per Equation (12) according to the n estimated for the region in which the macroblock is included.
- the loop over the depth maps is terminated.
- the regions may be determined in various known manners. Regions may be, for example, texture based, or object based. In one implementation, the regions are determined based on the variance of macroblocks in a picture. More specifically, each macroblock in a picture is examined to determine if the variance of that block is above a threshold value. All blocks with variance above the threshold value are grouped into a first region, and all other blocks are grouped into a second region. This variance determination process may be applied to the luminance and/or chrominance blocks.
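The variance-based grouping just described can be sketched as follows; the 16x16 block size and the threshold value are illustrative assumptions.

```python
import numpy as np

def partition_by_variance(picture, block=16, threshold=100.0):
    """Label each macroblock: 1 if its variance exceeds the threshold
    (first region), 0 otherwise (second region). The block size and
    threshold are illustrative."""
    h, w = picture.shape
    labels = np.zeros((h // block, w // block), dtype=np.uint8)
    for by in range(h // block):
        for bx in range(w // block):
            mb = picture[by * block:(by + 1) * block,
                         bx * block:(bx + 1) * block]
            labels[by, bx] = 1 if mb.var() > threshold else 0
    return labels
```

As noted above, the same labeling could be applied to the luminance and/or chrominance planes.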
- the mode decision process can be improved by considering temporal information of a video sequence.
- the depth map will be less reliable, and it may not be worth spending many bits to code the false variations due to depth estimation errors. This may be true regardless of the “distortion” measure over which the encoding is optimized.
- the approach of one implementation is to compare video frames at different timestamps and to locate areas with very small changes, which usually correspond to flat areas or static background areas. Since there are very small changes in these areas, there should also be very small changes in their corresponding depth values. If there are changes in the depth values, they are likely due to errors in depth estimation and can be regarded as distortions. The determination that the video has very small changes can be made in various ways.
- the video difference is computed as a sum of absolute differences of the relevant macroblocks, and the average of the sum is compared to a threshold.
- the threshold may be, for example, a fixed number for all blocks, or may be proportional to average intensity of one or more of the video blocks.
- the coding mode for one or more of the relevant video blocks is consulted to determine if the video difference is small enough.
- regions with very small changes may be coded using skip mode (such as in the MPEG-4 AVC Standard). Therefore, if skip mode has been selected for the video block, then the video differences are small. As a result, these areas in the depth map can also be coded using skip mode regardless of the false temporal variation.
- skip mode is applied to corresponding regions in separate depth maps that do not necessarily have very small changes. Indeed, the corresponding depth portions/regions may have differences large enough that a skip mode would not be selected based on the depth portions themselves.
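The skip-inheritance rule above might be sketched as follows, under illustrative assumptions: the video macroblock's coding mode and its temporal reference are available to the depth encoder, and the threshold is a fixed number (the variant proportional to average intensity is omitted here).

```python
import numpy as np

def force_depth_skip(video_mb, video_ref_mb, video_mode, threshold=2.0):
    """Return True if the co-located depth macroblock should be forced
    to SKIP: either the video macroblock was itself coded as SKIP, or
    the average absolute difference between the video macroblock and
    its temporal reference falls below the (illustrative) threshold."""
    if video_mode == "SKIP":
        return True
    avg_diff = float(np.mean(np.abs(video_mb - video_ref_mb)))
    return avg_diff < threshold
```

Note that, as stated above, the depth macroblock may be forced to SKIP even when the depth values themselves changed enough that SKIP would not have been selected from the depth alone.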
- the proposed method provides a simple and efficient solution by using local video-characteristics to correct possible errors in a depth map.
- the proposed method results in improved subjective quality by reducing a flickering artifact due to temporal variation in the depth map.
- FIG. 10 shows an exemplary video coding method 1000 , in accordance with yet another embodiment of the present principles.
- Method 1000 corresponds to the above described embodiment 3.
- the loop over depth maps begins.
- the loop over MBs within the depth map begins.
- video data is analyzed. For example, step 1020 may check whether or not the corresponding video macroblock is coded using skip mode, and/or check the change in the video macroblock (as compared to a corresponding macroblock in a temporally distinct video picture) against a change threshold.
- at step 1025, it is determined whether or not the current macroblock should be coded using skip mode, based on the analysis result from step 1020. If so, the method proceeds to step 1030. Otherwise, the method proceeds to step 1035.
- at step 1030, the current macroblock is forced to be encoded in skip mode.
- at step 1035, the current macroblock is encoded with a conventional method.
- at step 1040, it is determined whether or not the current macroblock is the last macroblock in the depth map. If so, the method proceeds to step 1045. Otherwise, the method returns to step 1010 for the next macroblock. At step 1045, the loop over the depth map macroblocks is terminated.
- at step 1050, it is determined whether or not the current depth map is the last one in the depth sequence. If so, the method proceeds to step 1055. Otherwise, the method returns to step 1005 for the next depth map. At step 1055, the loop over the depth maps is terminated.
- the new distortion metric in embodiment 1 and embodiment 2 can be combined with the technique in one of the other embodiments, such as, for example, Embodiment 3, to achieve a higher coding efficiency and rendering quality.
- FIG. 11 shows an exemplary video coding method 1100 , in accordance with still another embodiment of the present principles.
- Method 1100 corresponds to the above described embodiment 4.
- a distortion weight k is calculated using a camera parameter.
- a loop over depth maps begins.
- one or more parameters n are calculated using video characteristics.
- the parameters n are scaled based on the weights applied to a number of views used to render an intermediate view.
- a loop over macroblocks within the current depth map begins.
- video data is analyzed (e.g., check whether or not the corresponding video macroblock is coded using skip mode, and/or check the change in the video macroblock against a change threshold).
- the current macroblock is forced to be encoded as skip mode.
- the current macroblock is encoded with the new distortion metric calculated using k and n, for example, as per Equation (12). Step 1145 could be performed using local or global parameter(s) n.
- at step 1150, it is determined whether or not the current macroblock is the last macroblock in the depth map. If so, the method proceeds to step 1155. Otherwise, the method returns to step 1130 for the next macroblock. At step 1155, the loop over the depth map macroblocks is terminated. At step 1160, it is determined whether or not the current depth map is the last one in the depth sequence. If so, the method proceeds to step 1165. Otherwise, the method returns to step 1115 for the next depth map. At step 1165, the loop over the depth maps is terminated.
- FIG. 12 shows an exemplary video decoding method 1200 , in accordance with yet another embodiment of the present principles.
- Method 1200 corresponds to any of the above described embodiments 1 through 4. That is, method 1200 may be used to decode an encoding produced by any of embodiments 1 through 4.
- a loop is performed over macroblocks.
- syntax is parsed.
- a predictor of the current macroblock is obtained.
- a residue of the current block is calculated.
- the residue is added to the predictor.
- the loop is terminated.
- deblock filtering is performed.
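A toy version of the reconstruction loop in method 1200 might look like the following; the container format for the coded macroblocks, the choice of a co-located predictor, and the omission of entropy decoding, the inverse transform, and deblock filtering are all simplifying assumptions made for illustration.

```python
import numpy as np

def decode_depth_picture(coded_mbs, reference, block=16):
    """Toy reconstruction loop mirroring the steps above: for each
    macroblock, obtain a predictor (here simply the co-located block of
    the reference picture) and either copy it (SKIP) or add the residue
    to it. Entropy decoding, the inverse transform, intra/inter
    prediction, and deblock filtering are omitted for brevity."""
    recon = np.zeros_like(reference)
    for mb in coded_mbs:                              # loop over macroblocks
        y, x = mb["pos"]
        predictor = reference[y:y + block, x:x + block]
        if mb["skip"]:
            recon[y:y + block, x:x + block] = predictor
        else:
            recon[y:y + block, x:x + block] = predictor + mb["residue"]
    return recon
```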
- the new distortion metric and the skip mode selection scheme have been simulated using several multi-view test sequences. For each sequence, both video and depth map are encoded for two selected views. The decoded video and depth map are used to render an intermediate view between the two views.
- k is calculated using the camera setting parameters for each sequence. Then n is found as described herein by estimating the effect of displacements in the first frame of the video sequence.
- the results for k, global n (Embodiment 1), and BD-PSNR (“Bjontegaard Difference in Peak Signal-to-Noise Ratio”) are given in Table 1. Note that each multi-view sequence is acquired with a different camera setting, which affects the amount of geometry error differently, and this difference is well reflected in k. For the outdoor scene sequences, Z far is large; thus, Z near is the dominant parameter in deciding k when the camera distance and focal length are similar.
- the video is coded using the MPEG-4 AVC Standard (joint model (JM) reference software version 13.2), and the depth map is coded using the MPEG-4 AVC Standard with and without the proposed methods.
- JM joint model
- the same encoding configuration is used for the video and the depth maps, including the QP values of 24, 28, 32, and 36 and the Lagrange multiplier values, and only I-slices and P-slices are used to code 15 depth maps for each view.
- picture refers to either a frame or field. Additionally, throughout this application, wherever the term “frame” is used, alternate implementations may be devised for a field or, more generally, for a picture.
- coding gain refers to one or more of the following: for a given coding bitrate, the reduction in rendering distortion, measured in terms of, for example, SSD; or for a given rendering distortion (measured in SSD, for example), the reduction in coding bitrate.
- the phrase “distortion in rendered video” refers to a distortion between the video rendered using compressed depth and the video rendered using uncompressed depth.
- the actual distortion value may be determined in various ways, and using various measures.
- the distortion value may be determined using SSD as the distortion measure.
- skip mode refers to the SKIP mode as specified in MPEG-4 AVC Standard. That is, in SKIP mode there is no prediction residue and no motion vector to be transmitted.
- the reconstructed block is obtained by simply copying the corresponding block in previously encoded pictures.
- the block correspondence is identified by simply using the predicted motion vector obtained using motion information in neighboring blocks.
- regarding n, it is preferred to apply “global” and “localized” values of n to other frames and/or regions, so that there is additional utility from the work of calculating n. That is, an encoder might calculate n for a frame or a portion of a frame, and then merely use that n for that frame/portion, but this may not be very efficient. It may be better to use that n for, for example, (i) all the other frames in that sequence, or (ii) all the other portions that are similar (e.g., “sky” portions), and so forth.
- a “portion” of a picture refers to all or part of the picture.
- a portion may be, for example, a pixel, a macroblock, a slice, a frame, a field, a picture, a region bounding an object in the picture, the foreground of the picture, the background of the picture, or a particular set of (x,y) coordinates in the picture.
- a translation of a portion may refer, for example, to a translation of a particular set of (x,y) coordinates.
- a portion may include the pixel at location (x1, y1), and a translation (represented by “T”) of the portion may include the pixel at location (x1+T, y1).
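As a small worked example of translating a portion given as a set of (x, y) coordinates (the function name and the sign convention are illustrative assumptions):

```python
def translate_portion(coords, t):
    """Translate a set of (x, y) coordinates horizontally by t pixels,
    matching the (x1 + T, y1) example above."""
    return [(x + t, y) for (x, y) in coords]
```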
- an encoder performs the following operations to determine n for a given depth map:
- references to “video” may include any of various video components or their combinations. Such components, or their combinations, include, for example, luminance, chrominance, Y (of YUV or YCbCr or YPbPr), U (of YUV), V (of YUV), Cb (of YCbCr), Cr (of YCbCr), Pb (of YPbPr), Pr (of YPbPr), red (of RGB), green (of RGB), blue (of RGB), S-Video, and negatives or positives of any of these components.
- Various implementations consider the video distortion that arises in rendered views. Accordingly, those rendered views may include, or be limited to, one or more of various components of video.
- any of the following “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B).
- such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C).
- This may be extended, as readily apparent by one of ordinary skill in this and related arts, for as many items listed.
- Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
- a “view” may typically refer to an actual picture from that view. In other instances, however, a “view” may refer, for example, to the actual view position, or to a series of pictures from a view. The meaning will be revealed from context.
- Implementations may signal information using a variety of techniques including, but not limited to, slice headers, supplemental enhancement information (SEI) messages or other messages, other high level syntax, non-high-level syntax, out-of-band information, data-stream data, and implicit signaling. Accordingly, although implementations described herein may be described in a particular context, such descriptions should in no way be taken as limiting the features and concepts to such implementations or contexts.
- SEI Supplemental Enhancement Information
- implementations may be implemented in one or more of an encoder, a pre-processor to an encoder, a decoder, or a post-processor to a decoder.
- the implementations described or contemplated may be used in a variety of different applications and products.
- Some examples of applications or products include set-top boxes, cell phones, personal digital assistants (PDAs), televisions, personal recording devices (for example, PVRs, computers running recording software, VHS recording devices), camcorders, streaming of data over the Internet or other communication links, and video-on-demand.
- the implementations described herein may be implemented in, for example, a method or a process, an apparatus, or a software program. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program).
- An apparatus may be implemented in, for example, appropriate hardware, software, and firmware.
- the methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants (“PDAs”), and other devices that facilitate communication of information between end-users.
- PDAs portable/personal digital assistants
- Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications, particularly, for example, equipment or applications associated with data encoding and decoding.
- equipment include video coders, video decoders, video codecs, web servers, set-top boxes, laptops, personal computers, cell phones, PDAs, and other communication devices.
- the equipment may be mobile and even installed in a mobile vehicle.
- the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette, a random access memory (“RAM”), or a read-only memory (“ROM”).
- the instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two.
- a processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium having instructions for carrying out a process.
- implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted.
- the information may include, for example, instructions for performing a method, or data produced by one of the described implementations.
- a signal may be formatted to carry as data instructions for performing one of the depth map encoding techniques described in this application or to carry the actual encoding of the depth map.
- Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal.
- the formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream.
- the information that the signal carries may be, for example, analog or digital information.
- the signal may be transmitted over a variety of different wired or wireless links, as is known. Further, the signal may be stored on a processor-readable medium.
Abstract
Several implementations relate to depth map coding. In one implementation, it is determined that differences between collocated video blocks are small enough to be interchanged. Based on that determination, a depth block corresponding to a first of the video blocks is coded using an indicator that instructs a decoder to use a collocated depth block, corresponding to a second of the video blocks, in place of the depth block. In another implementation, a video signal includes a coding of at least a single indicator that instructs a decoder to decode both a depth block and a corresponding video block using collocated blocks, from other pictures, in place of the depth block and the corresponding video block. In another implementation, the depth block and the corresponding video block are decoded, based on the single indicator, using the collocated blocks in place of the depth block and the corresponding video block.
Description
- This application claims the benefit of the following: (1) U.S. Provisional Application Ser. No. 61/207,532, filed Feb. 13, 2009, and titled “Depth Map Coding”, (2) U.S. Provisional Application Ser. No. 61/207,892, filed Feb. 18, 2009, and titled “Depth Map Distortion Analysis for View Rendering and Depth Coding”, (3) U.S. Provisional Application Ser. No. 61/271,053, filed Jul. 16, 2009, and titled “Coding for Depth Maps”, and (4) U.S. Provisional Application Ser. No. 61/269,501, filed Jun. 25, 2009, and titled “Depth Map Coding”. Each of these applications is incorporated by reference herein in its entirety.
- Implementations are described that relate to coding systems. Various particular implementations relate to depth map coding.
- In depth image based rendering (DIBR), depth maps are used to render virtual views. The distortion in depth maps (as compared to the ground truth depth, which is the accurate and actual depth) may result in degradation in the visual quality of rendered views. When encoding depth maps, the conventional rate-distortion scheme cannot necessarily provide a direct measurement of rendering quality.
- According to a general aspect, a portion of a first depth picture that is to be coded is accessed. The portion of the first depth picture has one or more depth values for a corresponding portion of a first video picture. It is determined that differences between the portion of the first video picture and a corresponding portion of a second video picture are small enough that an encoding of the portion of the first video picture may replace the portion of the first video picture with the portion of the second video picture. Based on the determination that differences were small enough, the portion of the first depth picture is coded using an indicator. The indicator instructs a decoder (i) to find a portion of a second depth picture that has one or more depth values for the portion of the second video picture, and (ii) to use the portion of the second depth picture for the portion of the first depth picture.
- According to another general aspect, a video signal or a video signal structure includes a depth picture coding section including coding of a portion of a first depth picture. The video signal or video signal structure also includes a video picture coding section including coding of a portion of a first video picture. The first video picture corresponds to the first depth picture. The video signal or video signal structure further includes an indicator section including coding of at least a single indicator that instructs the decoder to perform at least the following two operations. The first operation is to decode a portion of a second depth picture by using the portion of the first depth picture for the portion of the second depth picture. The portion of the second depth picture is collocated with the portion of the first depth picture. The second operation is to decode a portion of a second video picture by using the portion of the first video picture for the portion of the second video picture. The portion of the second video picture is collocated with the portion of the first video picture, and the second video picture corresponds to the second depth picture.
- According to another general aspect, a coding of a portion of a first depth picture is accessed. A coding of a portion of a first video picture is accessed. The first video picture corresponds to the first depth picture. At least a single indicator is accessed. The single indicator instructs the decoder to perform at least the following two operations. The first operation is to decode a portion of a second depth picture by using the portion of the first depth picture for the portion of the second depth picture. The portion of the second depth picture is collocated with the portion of the first depth picture. The second operation is to decode a portion of a second video picture by using the portion of the first video picture for the portion of the second video picture. The portion of the second video picture is collocated with the portion of the first video picture, and the second video picture corresponds to the second depth picture. The portion of the second depth picture is decoded by using the portion of the first depth picture for the portion of the second depth picture. The portion of the second video picture is decoded by using the portion of the first video picture for the portion of the second video picture.
- The details of one or more implementations are set forth in the accompanying drawings and the description below. Even if described in one particular manner, it should be clear that implementations may be configured or embodied in various manners. For example, an implementation may be performed as a method, or embodied as apparatus, such as, for example, an apparatus configured to perform a set of operations or an apparatus storing instructions for performing a set of operations, or embodied in a signal. Other aspects and features will become apparent from the following detailed description considered in conjunction with the accompanying drawings and the claims.
- FIG. 1 is a diagram of an implementation of an encoder.
- FIG. 2 is a diagram of an implementation of a decoder.
- FIG. 3 is a block diagram of an implementation of a video transmitter.
- FIG. 4 is a block diagram of an implementation of a video receiver.
- FIG. 5 is an example of a depth map.
- FIG. 6 is another example of a depth map.
- FIG. 7 is a diagram of an implementation of a framework for generating nine output views (N=9) out of 3 input views with depth (K=3).
- FIG. 8 is a diagram of an implementation of an encoding process.
- FIG. 9 is a diagram of an implementation of another encoding process.
- FIG. 10 is a diagram of an implementation of yet another encoding process.
- FIG. 11 is a diagram of an implementation of still another encoding process.
- FIG. 12 is a diagram of an implementation of a decoding process.
- FIG. 13 shows a relationship between translation and distortion for a video sequence depicting a newspaper.
- FIG. 14 shows a relationship between translation and distortion for a video sequence depicting breakdancers.
- As mentioned earlier, in DIBR, depth maps are used to render virtual views. The distortion in depth maps (as compared to the ground truth depth) may result in degradation in the visual quality of rendered views. When encoding depth maps, the conventional rate-distortion scheme cannot necessarily provide a direct measurement of rendering quality. Furthermore, errors in the estimated depth maps could increase the bitrate when the estimated depth maps are coded using conventional methods. The estimation error usually occurs in flat (homogenous) regions, as there are insufficient stereo matching feature points associated with such regions.
- In at least one implementation, the inventors encode a depth map corresponding to a first video view. The depth map is encoded, however, not based on the distortion of the depth map encoding. Rather, the depth map is encoded based on the distortion in a video view that is different from the first video view and that is rendered from the first video view and the depth map. In particular, the encoding of the depth map uses a rate-distortion procedure, optimized over various coding modes and other coding options, in which the distortion is the distortion that results in the rendered video view.
- In at least one implementation, we propose to utilize camera parameters in coding depth maps. In at least one implementation, we propose to utilize video information in coding depth maps. In at least one implementation, we use such video coding information to code the depth maps such that the negative effect of erroneous depth values can be reduced.
- The inventors have recognized that when encoding depth maps, the conventional rate-distortion scheme cannot necessarily provide a direct measurement of rendering quality. This is because, at least, the conventional rate-distortion scheme uses the distortion of the depth map itself, and not a distortion of a rendered picture. Therefore, in at least one implementation, a new distortion metric is proposed, based on the characteristics of the distortion in a depth map and its effect on the rendered views, such that the mode decision can be optimized according to the rendering quality.
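The mode decision just described, with distortion measured on the rendered view, reduces to a standard Lagrangian selection. The following sketch assumes the caller has already rendered with each candidate mode and measured the resulting distortion and rate; it illustrates J = D + λR generically, not the patent's exact procedure.

```python
def choose_mode(candidates, lam):
    """Lagrangian mode decision J = D + lambda * R, where D is assumed
    to be the distortion measured on the rendered view (e.g., the SSD
    between views rendered with compressed and uncompressed depth).

    candidates: iterable of (mode_name, rendered_distortion, bits)."""
    return min(candidates, key=lambda m: m[1] + lam * m[2])[0]
```

With a large multiplier the cheaper mode wins even at higher rendered distortion; with a small multiplier the lower-distortion mode wins.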
- The proposed methods and apparatus described herein are expected to improve one or more of the visual quality of the rendered views and the depth map coding efficiency.
-
FIG. 1 shows anexemplary video encoder 100 to which the present principles may be applied, in accordance with an embodiment of the present principles. Thevideo encoder 100 includes acombiner 105 having an output connected in signal communication with an input of atransformer 110. An output of thetransformer 110 is connected in signal communication with an input of aquantizer 115. An output of thequantizer 115 is connected in signal communication with an input of anentropy coder 120 and an input of aninverse quantizer 125. An output of theinverse quantizer 125 is connected in signal communication with an input of aninverse transformer 130. An output of theinverse transformer 130 is connected in signal communication with a first non-inverting input of acombiner 135. An output of thecombiner 135 is connected in signal communication with an input of anintra predictor 145 and an input of adeblocking filter 140. An output of thedeblocking filter 140 is connected in signal communication with an input of adepth reference buffer 150. An output of thedepth reference buffer 150 is connected in signal communication with a first input of a displacement (motion/disparity)compensator 160 and a first input of a displacement (motion/disparity)estimator 155. An output of the displacement (motion/disparity)estimator 155 is connected in signal communication with a second input of the displacement (motion/disparity)compensator 160. An output of the displacement (motion/disparity)compensator 160 is connected in signal communication with a first input of aswitch 165. An output of theintra predictor 145 is connected in signal communication with a second input of theswitch 165. An output of theswitch 165 is connected in signal communication with an inverting input of thecombiner 105. A second output of amode decision module 170 is connected in signal communication with theswitch 165, for providing a select signal selecting between the first and the second input of theswitch 165. 
An output of avideo parameters computer 122 is connected in signal communication with a second input of themode decision module 170 and a third input of the displacement (motion/disparity)estimator 155. - A non-inverting input of the
combiner 105, a third input of the displacement (motion/disparity)compensator 160, and a second input of the displacement (motion/disparity)estimator 155 are each available as inputs of thevideo encoder 100, for receiving depth sequences. A first input of themode decision module 170 and a fourth input of a displacement (motion/disparity)estimator 155 are available as inputs of thevideo encoder 100, for receiving camera parameters. An input of thevideo parameters computer 122 is available as an input of thevideo encoder 100, for receiving video sequences. A first output of themode decision module 170 and an output of theentropy coder 120 are available as outputs of thevideo encoder 100, for outputting a bitstream. - It is to be appreciated that at least the
mode decision module 170 differs from a conventional video encoder in at least the following manners. Forembodiment 1 andembodiment 2 described herein after, themode decision module 170 will input camera parameters and video frames to compute k and n in Equation (5) and Equation (6) such that a new distortion measurement can be calculated for the mode decision. For embodiment 3 described herein, besides the conventional mode decision rules, themode decision module 170 will also use video frames to determine if the current macroblock should be forced to be encoded using skip mode. Forembodiment 4, themode decision module 170 will calculate the new distortion measurement using camera parameters and video frames, and also check if the current macroblock should be forced to be encoded using skip mode. In other implementations, these functions may be performed, at least in part, by a block other than themode decision module 170, and even by a block(s) not shown inFIG. 1 . For example, another implementation performs these functions in a processing device (not shown inFIG. 1 ) that controls the encoder ofFIG. 1 . The control ofFIG. 1 includes, in this implementation, providing the inputs to the encoder ofFIG. 1 , accessing the outputs from the encoder ofFIG. 1 , and controlling the timing and signal flow of the encoder ofFIG. 1 . - The
mode decision module 170, or a processing device that controls the encoder ofFIG. 1 , may be implemented in, for example, a general purpose computer, or a function-specific video encoder. Such a computer or encoder may include hardware, firmware, and/or software that is programmed to perform one or more of the algorithms ofFIGS. 8-11 or any other algorithm provided in this application. -
FIG. 1 shows one implementation. Other implementations are contemplated. For example, another implementation does not have separate inputs on one or more of the blocks of FIG. 1. Rather, a single input is used to receive multiple signals. As a specific example, the mode decision module 170 may have only a single input. The single input receives the camera parameters, and also receives the video information from the video parameters computer 122. Further, another implementation of the mode decision module 170 has only a single output that provides both the bitstream and the select signal. Similarly, the encoder 100 may have a single input for receiving depth sequences, and then route those depth sequences to each of the combiner 105, the compensator 160, and the estimator 155. -
FIG. 2 shows an exemplary video decoder 200 to which the present principles may be applied, in accordance with an embodiment of the present principles. The video decoder 200 includes a bitstream receiver 205 having an output connected in signal communication with an input of a bitstream parser 210. A first output of the bitstream parser 210 is connected in signal communication with an input of an entropy decoder 215, for providing, for example, a residue bitstream. An output of the entropy decoder 215 is connected in signal communication with an input of an inverse quantizer 220. An output of the inverse quantizer 220 is connected in signal communication with an input of an inverse transformer 225. An output of the inverse transformer 225 is connected in signal communication with a first non-inverting input of a combiner 230. An output of the combiner 230 is connected in signal communication with an input of a deblocking filter 235 and an input of an intra predictor 240. An output of the deblocking filter 235 is connected in signal communication with an input of a depth reference buffer 250. An output of the depth reference buffer 250 is connected in signal communication with a first input of a displacement (motion/disparity) compensator 255. An output of the displacement (motion/disparity) compensator 255 is connected in signal communication with a second input of a switch 245. An output of the intra predictor 240 is connected in signal communication with a first input of the switch 245. An output of the switch 245 is connected in signal communication with a second non-inverting input of the combiner 230. An output of a mode module 260 is connected in signal communication with the switch 245, for providing a select signal selecting between the first and the second input of the switch 245. A second output of the bitstream parser 210 is connected in signal communication with an input of the mode module 260, for providing, for example, control syntax for determining the select signal.
A third output of the bitstream parser 210 is connected in signal communication with a second input of the displacement (motion/disparity) compensator 255, for providing, for example, a displacement (motion or disparity) vector. An input of the bitstream receiver 205 is available as an input of the video decoder 200, for receiving a bitstream. An output of the deblocking filter 235 is available as an output of the video decoder 200, for outputting a depth map. -
FIG. 3 shows an exemplary video transmission system 300, to which the present principles may be applied, in accordance with an implementation of the present principles. The video transmission system 300 may be, for example, a head-end or transmission system for transmitting a signal using any of a variety of media, such as, for example, satellite, cable, telephone-line, or terrestrial broadcast. The transmission may be provided over the Internet or some other network. - The
video transmission system 300 is capable of generating and delivering video content encoded using, for example, skip mode with depth, or encoded using one or more of various other modes or techniques. This is achieved by generating an encoded signal(s) including depth information or information capable of being used to generate (including, for example, reconstructing) the depth information at a receiver end that may, for example, have a decoder. - The
video transmission system 300 includes an encoder 310 and a transmitter 320 capable of transmitting the encoded signal. The encoder 310 receives video information and generates an encoded signal(s) therefrom using skip mode with depth. The encoder 310 may be, for example, the encoder 100 described in detail above. - The
transmitter 320 may be, for example, adapted to transmit a program signal having one or more bitstreams representing encoded pictures and/or information related thereto. Typical transmitters perform functions such as, for example, one or more of providing error-correction coding, interleaving the data in the signal, randomizing the energy in the signal, and modulating the signal onto one or more carriers. The transmitter may include, or interface with, an antenna (not shown). Accordingly, implementations of the transmitter 320 may include, or be limited to, a modulator. -
FIG. 4 shows an exemplary video receiving system 400 to which the present principles may be applied, in accordance with an embodiment of the present principles. The video receiving system 400 may be configured to receive signals over a variety of media, such as, for example, satellite, cable, telephone-line, or terrestrial broadcast. The signals may be received over the Internet or some other network. - The
video receiving system 400 may be, for example, a cell-phone, a computer, a set-top box, a television, or another device that receives encoded video and provides, for example, decoded video for display to a user or for storage. Thus, the video receiving system 400 may provide its output to, for example, a screen of a television, a computer monitor, a computer (for storage, processing, or display), or some other storage, processing, or display device. - The
video receiving system 400 is capable of receiving and processing video content including video information. The video receiving system 400 includes a receiver 410 capable of receiving an encoded signal, such as, for example, the signals described in the implementations of this application, and a decoder 420 capable of decoding the received signal. - The
receiver 410 may be, for example, adapted to receive a program signal having a plurality of bitstreams representing encoded pictures. Typical receivers perform functions such as, for example, one or more of receiving a modulated and encoded data signal, demodulating the data signal from one or more carriers, de-randomizing the energy in the signal, de-interleaving the data in the signal, and error-correction decoding the signal. The receiver 410 may include, or interface with, an antenna (not shown). Implementations of the receiver 410 may include, or be limited to, a demodulator. - The
decoder 420 outputs video signals including video information and depth information. The decoder 420 may be, for example, the decoder 200 described in detail above. - New data formats including both video and the corresponding depth maps, such as multi-view plus depth (MVD), enable new video applications such as three-dimensional television (3DTV) and free-viewpoint video (FVV).
FIG. 5 shows an example of a depth map 500 to which the present principles may be applied, in accordance with an embodiment of the present principles. FIG. 6 shows an example of another depth map 600 to which the present principles may be applied, in accordance with an embodiment of the present principles. These gray-level depth maps represent depth Z within the range between Znear and Zfar. The pixel value d is calculated as the following:
- d = round(255·(1/Z − 1/Zfar)/(1/Znear − 1/Zfar))    (1)
- Thus, the nearest depth Znear will be mapped to d with value 255, while the furthest depth Zfar will be mapped to d with value 0. Typically, depth maps are estimated rather than captured. Intermediate video views (virtual views) can be generated using DIBR techniques, which take the transmitted/stored video views (reference views) and the corresponding depth maps as input. In estimated depth maps, more errors are typically observed around object boundaries and in flat regions with little texture, where stereo matching suffers from occlusion and from a lack of matching features, respectively. These estimation errors vary over time; this temporal variation in the depth map not only increases the bitrate needed to encode depth, it also causes artifacts in the rendered views.
- In order to reduce the amount of data to be transmitted, a dense array of cameras (V1, V2, . . . V9) may be sub-sampled so that only a sparse set of cameras actually captures the scene. -
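The depth-to-pixel mapping of Equation (1) and its inverse can be sketched as follows (a minimal illustration, not part of the patent text; the function names are my own):

```python
def depth_to_pixel(z, z_near, z_far):
    # Equation (1): quantize depth Z to 8 bits; nearest depth -> 255, furthest -> 0.
    return round(255 * (1.0 / z - 1.0 / z_far) / (1.0 / z_near - 1.0 / z_far))

def pixel_to_depth(d, z_near, z_far):
    # Inverse mapping: recover the depth Z represented by pixel value d.
    return 1.0 / ((d / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far)
```

With z_near=1 and z_far=100, depth_to_pixel(1.0, 1.0, 100.0) gives 255 and depth_to_pixel(100.0, 1.0, 100.0) gives 0, matching the mapping described in the text.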
FIG. 7 shows an exemplary framework 700 for generating nine output views (N=9) out of 3 input views with depth (K=3), to which the present principles may be applied, in accordance with an embodiment of the present principles. The framework 700 involves an auto-stereoscopic 3D display 710, which supports output of multiple views, a first depth image-based renderer 720, a second depth image-based renderer 730, and a buffer for decoded data 740. The decoded data is a representation known as Multiple View plus Depth (MVD) data. The nine cameras are denoted by V1 through V9. Corresponding depth maps for the three input views are denoted by D1, D5, and D9. Any virtual camera positions in between the captured camera positions (e.g., Pos 1, Pos 2, Pos 3) can be generated using the available depth maps (D1, D5, D9), as shown in FIG. 7. As can be seen in FIG. 7, the baseline between the actual cameras (V1, V5, and V9) used to capture data can be large. As a result, the correlation between these cameras is significantly reduced, and their coding efficiency may suffer, since it may rely only on temporal correlation. - Encoding video and depth may allow for the efficient transmission and/or storage of the above-described new data formats. Such encoding presents additional challenges, as we should consider not only the conventional rate-distortion performance, but also the quality of the DIBR-rendered views.
- In at least one implementation, new depth map coding methods are presented, which improve coding efficiency and subjective quality of the rendered views, with a new mode selection scheme based on a new distortion metric and/or video information analysis. First, the new distortion metric is derived from the relationship between the distortion of the compressed depth map and the distortion of the DIBR rendered views, which depends on, for example, camera parameters and video characteristics such as, for example, the amount of texture and flat regions. Second, since estimated depth maps are likely to include errors in the flat regions due to the insufficient stereo matching feature points, a skip mode (such as in the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4)
Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 Recommendation (hereinafter the “MPEG-4 AVC Standard”)) can be selected by analyzing video information. This new skip scheme leads to a reduction of a flickering artifact caused by temporal variation in a depth map. - A rate distortion optimized mode selection scheme has been used in video coding to select the best coding mode to achieve high quality decoded video with reduced bitrate. Using Lagrangian optimization, the best mode is selected which results in a minimum value of L:
-
L = D + λ·R    (2)
- where D is distortion, λ is the Lagrangian multiplier, and R is the bitrate. The distortion is usually calculated as the sum of squared differences (SSD) between the original pixel values and the reconstructed pixel values. L can be referred to as a cost, or a coding cost for a given coding mode, and the minimization may occur over all available coding-mode costs.
- However, when this is applied to depth map coding, for many applications it will be beneficial for the distortion to reflect the distortion of the rendered views, since what is important is the quality of the view that is rendered using the depth map, and not necessarily the quality of the decoded depth map itself. Therefore, for such applications, we demonstrate how to model the relationship between the distortion of the compressed depth map and the distortion of the rendered views, such that the mode decision is optimized with respect to the rendering quality.
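The baseline selection rule of Equation (2) can be sketched as follows (an illustrative sketch with hypothetical mode names and values, not from the patent; D would be the SSD of a candidate reconstruction and R its rate):

```python
def best_mode(mode_stats, lam):
    # Equation (2): pick the coding mode minimizing L = D + lambda * R.
    return min(mode_stats, key=lambda mode: mode_stats[mode][0] + lam * mode_stats[mode][1])

# Hypothetical (distortion, rate-in-bits) pairs for three candidate modes:
modes = {"intra": (400.0, 120), "inter": (250.0, 300), "skip": (900.0, 2)}
```

A small λ favors low distortion (here "inter" wins at λ=0.1), while a larger λ shifts the choice toward cheaper modes ("skip" wins at λ=10).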
- Generally, a “rendered” picture is the same as a “synthesized” picture throughout this application. Rendering/synthesizing refers generally to the process of using one or more existing pictures from one or more views to create a new picture from a different view. The new picture reflects what the subject matter of the one or more existing pictures would look like if viewed from the “different” view. Rendering/synthesizing may be performed using, for example, DIBR. DIBR may include functions such as, for example, warping, hole filling, and blending.
- The derivation of new distortion based on rendering quality is a two-step process. The first step models how the distortion Δd in the compressed depth map leads to displacement ΔP in the rendering position (rendering position error). Then the second step models the relationship between rendering position error ΔP and the distortion in the rendered views. In at least one implementation, it is proposed to utilize camera parameters to link compression distortion Δd in depth map to rendering position error ΔP. In at least one implementation, it is proposed to utilize a new parameter n, which depends on video characteristics, to link ΔP to the distortion of the pixel values in the rendered views. Then this distortion metric will be used in the rate-distortion optimization in Equation (2) to obtain a new distortion metric.
Embodiment 1
- In DIBR, with parallel cameras arranged on a horizontal line, a distortion Δd in the depth map at image position (x,y) will result in a horizontal translational error ΔP in the rendered pixel position (a deviation away from the correct rendering location). This rendering position error can be calculated using Equation (1) and the camera parameters, which leads to the following:
- ΔP = (a·δx/255)·(1/Znear − 1/Zfar)·Δd    (3)
- where a is the focal length of the camera, δx is the horizontal distance between the two cameras (also called the baseline distance), and Znear and Zfar correspond to d with pixel values 255 and 0, respectively. Equation (3) can be written compactly as:
- ΔP = k·Δd    (4)
- where k is determined by the intrinsic camera parameter a, the extrinsic camera parameter δx, and the values of Znear and Zfar, as follows:
- k = (a·δx/255)·(1/Znear − 1/Zfar)    (5)
- Now, given the position error ΔP, we would like to estimate the resulting distortion in the rendered view. Clearly, this distortion (in terms of pixel values) will be content dependent. For example, if the video frame includes complex textures and objects, then the distortion caused by ΔP will be significant, since different positions have quite different pixel values. On the other hand, if the video frame includes simple textures or flat (homogeneous) areas, then the amount of distortion due to ΔP will be small, since pixels at different positions are similar.
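The factor k of Equation (5) can be computed directly from the camera parameters. This sketch assumes the standard form k = (a·δx/255)·(1/Znear − 1/Zfar), which follows from Equation (1) for parallel cameras; the function name is my own:

```python
def distortion_weight_k(a, dx, z_near, z_far):
    # k links depth-map coding error to rendering position error: dP = k * dd.
    # a: focal length; dx: camera baseline; z_near/z_far: depth range bounds.
    return (a * dx / 255.0) * (1.0 / z_near - 1.0 / z_far)
```

As expected from Equation (3), a wider camera baseline produces a larger k, i.e., the same depth error causes a larger rendering position error.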
- We propose to use a linear model such that the relationship between the distortion in the rendered view Drender (measured as the sum of squared differences (SSD)) and the rendering position error ΔP can be described using a parameter n as follows:
- Drender = n·ΔP = n·k·Δd    (6)
- In this embodiment, we propose to capture the content-dependent nature of rendered-view distortion by estimating a single global parameter n for the video. The parameter n can be estimated as follows. Under different horizontal translations tx, the SSD between the original video frame I(x,y) and the translated frame I(x−tx,y) is measured:
- DSSD(tx) = Σ(x,y) [I(x,y) − I(x−tx,y)]²    (7)
- This procedure collects data points of the global distortion in a video frame, in terms of pixel values, due to different translations. By experiment, it is observed that the relationship between DSSD(·) and tx is approximately linear, as shown in FIGS. 13 and 14, where the first frame of each video sequence is used with tx from one to thirty pixels. FIG. 13 shows the relationship 1300 between translation and distortion for a video sequence depicting a newspaper, calculated using (7). FIG. 14 shows the relationship 1400 between translation and distortion for a video sequence depicting breakdancers, calculated using (7). Hence, the slope s between DSSD and the translation tx can be found, for example, using a least-squares linear fit:
- s = (txT·tx)−1·txT·DSSD    (8)
- where DSSD and tx in (8) are the vectors formed by DSSD(·) and tx, respectively, and T denotes the vector transpose operator. The obtained value s provides an estimate of n, and can be used in Equation (6) with n=s to serve as the new distortion metric in the rate-distortion optimization for depth coding. Note that s is the slope of the linear fit and serves as a linear approximation of n.
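The estimation procedure for n (Equations (7) and (8)) might look like the sketch below. It assumes a grayscale frame given as a list of rows and a zero-intercept least-squares fit, which is one reading of Equation (8); everything here is illustrative rather than the patent's implementation:

```python
def estimate_n(frame, max_shift=30):
    # Equation (7): SSD between the frame and its horizontal translation, tx = 1..max_shift.
    txs, ds = [], []
    for tx in range(1, max_shift + 1):
        ssd = 0.0
        for row in frame:
            for x in range(tx, len(row)):
                diff = row[x] - row[x - tx]
                ssd += diff * diff
        txs.append(float(tx))
        ds.append(ssd)
    # Equation (8): zero-intercept least squares, s = (tx^T tx)^-1 tx^T D_SSD.
    return sum(t * d for t, d in zip(txs, ds)) / sum(t * t for t in txs)
```

A flat frame yields n = 0 (translations cost nothing), while a textured frame yields a positive n, matching the content-dependence argued above.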
- Optionally, according to a rendering process that takes multiple views, this variable can be scaled using the same weight that the rendering process puts on the view. For example, suppose we want to render Vrender using Vleft, Vright, and their corresponding depth map sequences, with the following weight α:
- Vrender = α·Vleft + (1−α)·Vright    (9)
- Then the new scaled variable s′, representing the global characteristic for Vleft, can be calculated as follows:
- s′ = α·s    (10)
- Using the two parameters k and n=s (or n=s′) found above, the new distortion metric Drender can be derived as follows:
- Drender = n·ΔP = n·k·Δd    (11)
- where k comes from the camera setting and n from the global video characteristics. This new distortion metric can be used in the rate-distortion optimized mode selection process using Lagrangian optimization as follows:
- L = Σ(x,y) n·k·|d(x,y) − d̂(x,y)| + λ·R    (12)
- where (x,y) is a pixel position in the block, d(x,y) and d̂(x,y) are the original and reconstructed depth values at (x,y), λ is a Lagrangian multiplier, and R is the bitrate consumed to code the block. Please note that the sum of squared differences of the video is replaced with the sum of absolute differences of the depth map, weighted by the two parameters k and n, to estimate the squared difference in the synthesized views. The sum of absolute differences (SAD) is a common metric used in video quality evaluation. Further, SAD is the metric assumed for the distortion of the depth map throughout the derivation in Embodiment 1.
- Note that in the above description, the global video parameter n is used for the entire sequence. Other embodiments, however, update the parameter whenever there is a scene change, since the content characteristics may change. Thus, by using scene-change detection, the encoder can determine when to calculate or update n, and the estimation of n can be performed more than once for a given sequence.
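Putting the pieces together, the per-block cost of Equation (12) can be sketched as follows (my own helper, with depth blocks given as flat lists of pixel values):

```python
def depth_block_cost(orig_depth, recon_depth, k, n, lam, rate_bits):
    # Equation (12): estimated rendered-view distortion n*k*SAD(depth) plus the rate term.
    sad = sum(abs(o - r) for o, r in zip(orig_depth, recon_depth))
    return n * k * sad + lam * rate_bits
```

An encoder would evaluate this cost for every candidate mode of a macroblock and keep the minimum, exactly as in the conventional Lagrangian search but with the rendering-aware distortion term.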
-
FIG. 8 shows an exemplary video coding method 800, in accordance with an embodiment of the present principles. Method 800 corresponds to embodiment 1 described above. At step 810, a distortion weight k is calculated using a camera parameter. At step 815, a loop over depth maps begins. At step 820, it is determined whether the global parameter n needs to be calculated. If so, then the method proceeds to step 825. Otherwise, the method proceeds to step 830. It is to be appreciated that for the first depth map, step 820 will return yes. Step 820 may also return yes when a scene change is detected. At step 825, the global parameter n is calculated using video characteristics. At step 828, optionally, the parameter n is scaled based on the weights applied to the views used to render an intermediate view. At step 830, a depth map is encoded with the new distortion metric as per Equation (12), using the same n in all macroblocks (MBs). At step 835, it is determined whether or not the current depth map is the last one in the current depth sequence being processed. If so, then control is passed to step 840. Otherwise, control is returned to step 815 for the next depth map. At step 840, the loop over the depth maps is terminated. - Note that the loop over the depth maps from
steps 815 to 840 refers to a standard looping process. For example, in software, one might use a “for” loop that is performed once for each depth map. - Additionally, the encoding in
step 830 may be performed in various ways. It is common, in step 830, to perform a separate rate-distortion optimization procedure for each macroblock of a given depth map. Other implementations do not perform any optimization, perform an optimization over a limited set of modes, or perform a single optimization over the whole depth map that effectively selects a single mode for encoding all of its blocks. - Note that the above derivation refers to the translation of a rendered picture. However, we determine “n” by translating the video picture (a non-rendered picture) that corresponds to the depth map, rather than by translating a rendered picture. Of course, the translated video picture may be, for example, an original or a reconstructed picture. The use of a non-rendered picture is an approximation. However, given the large similarity between the non-rendered picture and a picture that is rendered based on it, the approximation generally has a high degree of accuracy. Other implementations do indeed use a rendered view in the determination of “n”.
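The overall flow of method 800 (steps 810 through 840) might be sketched as below; every callable is a stand-in for a step described in the text, not an API from the patent:

```python
def encode_depth_sequence(depth_maps, video_frames, camera,
                          encode_map, scene_changed, calc_k, calc_n):
    k = calc_k(camera)                                   # step 810: k from camera parameters
    n = None
    out = []
    for depth, video in zip(depth_maps, video_frames):   # steps 815-840: loop over depth maps
        if n is None or scene_changed(video):
            n = calc_n(video)                            # step 825: (re)estimate the global n
        out.append(encode_map(depth, k, n))              # step 830: encode with Equation (12)
    return out
```

Note how n is computed for the first depth map and then only recomputed when the scene-change test fires, matching the description of step 820.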
- In
Embodiment 1, camera parameters and a global parameter n, which models video characteristics, were used to construct a new distortion metric for depth coding. However, the characteristics of a frame can differ across local areas within the frame. For example, a homogeneous floor area, a complicated region such as a human face, and distinct object boundaries will have very different pixel distributions when we apply translations. An exhaustive approach to adapt to local variation can be developed, for example, by calculating the parameter n for each block to obtain a more precise result. However, this will increase computational complexity. In this embodiment, while the camera parameters are used in the same way as in embodiment 1, for modeling video characteristics it is proposed to first partition a frame into regions with different characteristics and to estimate multiple values of n using techniques similar to those described in Embodiment 1. Then, when encoding a macroblock (MB), the distortion will be calculated based on the n value estimated for the region to which the MB belongs.
- Thus, a set of n values may be obtained for the various regions. This set may be used for a single picture, or for multiple pictures such as, for example, an entire sequence.
Embodiment 1, the estimated values of multiple local n can be scaled according to the weights applied to the rendering, such as in Equations (9) and (10). Furthermore, estimating a new set of local n based on scene change as described inembodiment 1 can also be applied to thisembodiment 2. -
FIG. 9 shows an exemplary video coding method 900, in accordance with another embodiment of the present principles. Method 900 corresponds to embodiment 2 described above. At step 910, a distortion weight k is calculated using a camera parameter. At step 915, a loop over depth maps begins. At step 920, it is determined whether the local parameters need to be calculated. If so, then the method proceeds to step 925. Otherwise, the method proceeds to step 935. It is to be appreciated that for the first depth map, step 920 will return a yes. It may also return yes when a scene change is detected. At step 925, a current video frame is partitioned into regions with different characteristics. At step 930, a parameter n is calculated for each region as partitioned. At step 933, optionally, the local parameters n are scaled based on the weights applied to the views used to render an intermediate view. At step 935, the current depth map is encoded by calculating, for each macroblock in the depth map, a new distortion metric as per Equation (12), according to the n estimated for the region in which the macroblock is included. At step 940, it is determined whether or not the current depth map is the last one in the current depth sequence being processed. If so, then control is passed to step 945. Otherwise, control is returned to step 915 for the next depth map. At step 945, the loop over the depth maps is terminated. - In calculating the local parameters, more than one picture may be used. Additionally, the regions may be determined in various known manners. Regions may be, for example, texture based or object based. In one implementation, the regions are determined based on the variance of macroblocks in a picture. More specifically, each macroblock in a picture is examined to determine whether its variance is above a threshold value. All blocks with variance above the threshold are grouped into a first region, and all other blocks are grouped into a second region.
This variance determination process may be applied to the luminance and/or chrominance blocks.
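The variance-based partitioning just described can be sketched as follows (macroblocks given as flat lists of luminance samples; the threshold and names are illustrative):

```python
def partition_by_variance(blocks, threshold):
    # Region 0: variance above threshold (textured); region 1: the rest (flat).
    def variance(block):
        mean = sum(block) / len(block)
        return sum((p - mean) ** 2 for p in block) / len(block)
    return [0 if variance(b) > threshold else 1 for b in blocks]
```

A separate n would then be estimated per region with the translation procedure of Embodiment 1, and each macroblock coded with its region's n.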
- In addition to the adjustment of the distortion metric to optimize the rendering quality when encoding depth, the mode decision process can be improved by considering temporal information of a video sequence.
- As described before, in the estimated depth maps, errors are likely to occur in flat regions, resulting in noisy depth maps. In such a case, the depth map will be less reliable, and may not be worth spending many bits to code the false variations due to depth estimation errors. This may be true, regardless of what “distortion” measure the encoding may be optimized over.
- To attempt to solve this problem, in at least one implementation, it is proposed to utilize video information to help encode depth maps. The mode selection scheme in depth map coding can often be improved by analyzing video information. An example of such practice is presented here with respect to embodiment 3.
- The approach of one implementation is to compare video frames at different timestamps and locate areas with very small changes, which usually correspond to flat areas or static background areas. Since there are very small changes in these areas, there should also be very small changes in their corresponding depth values. If there are changes in the depth values, they are likely due to errors in depth estimation and can be regarded as distortions. The determination that the video has very small changes can be made in various ways.
- In one implementation, the video difference is computed as a sum of absolute differences of the relevant macroblocks, and the average of the sum is compared to a threshold. The threshold may be, for example, a fixed number for all blocks, or may be proportional to average intensity of one or more of the video blocks.
- In another implementation, the coding mode for one or more of the relevant video blocks is consulted to determine if the video difference is small enough. In video coding, regions with very small changes may be coded using skip mode (such as in the MPEG-4 AVC Standard). Therefore, if skip mode has been selected for the video block, then the video differences are small. As a result, these areas in the depth map can also be coded using skip mode regardless of the false temporal variation. Thus, skip mode is applied to corresponding regions in separate depth maps that do not necessarily have very small changes. Indeed, the corresponding depth portions/regions may have differences large enough that a skip mode would not be selected based on the depth portions themselves.
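Both tests, the averaged SAD against a threshold and inheritance of the video block's skip mode, can be sketched in one decision helper (my own names; macroblocks as flat pixel lists):

```python
def force_depth_skip(video_mb, prev_video_mb, video_mb_mode, threshold):
    # Test 2: the collocated video macroblock was itself coded as skip.
    if video_mb_mode == "skip":
        return True
    # Test 1: the mean absolute difference between collocated video blocks is small.
    sad = sum(abs(a - b) for a, b in zip(video_mb, prev_video_mb))
    return sad / len(video_mb) < threshold
```

When this returns True, the depth macroblock is coded as skip regardless of its own (possibly noisy) temporal variation, which is precisely the filtering effect discussed above.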
- The last fact is significant. By applying skip mode to a portion of a depth map that has significant changes from a collocated portion of another depth map, these implementations effectively filter the depth map. This filtering removes all or part of the assumed noise that is in the depth map portion.
- One result is that, for some implementations, there will be differences between collocated portions of two depth pictures that are large enough to produce a coding cost for a skip mode that is greater than a cost for at least one other coding mode. Accordingly, the skip mode would not ordinarily be selected in a rate-distortion optimization algorithm. However, such implementations use the skip mode despite the non-minimizing cost of the skip mode. The reason is that the cost of the skip mode is artificially high due to the noise in the depth pictures, and the failure of the skip mode to accurately code that noise. But the visual quality is improved by not accurately coding that noise, and by filtering the noise by using the skip mode.
- In addition, with this strategy one can select temporal skip in depth automatically, whenever temporal skip in video has been chosen so that no skip mode information needs to be inserted in the depth bitstream. This leads to a reduction in the bitrate by signaling skip mode for both video and depth with a single indicator. This also leads to a reduction in the encoding complexity by omitting motion estimation and the mode decision process.
- Note that a more precise depth map representation (with fewer false contours and less flickering) would possibly limit the performance of the proposed method. However, the problem described above is unavoidable in many existing depth acquisition systems. The proposed method provides a simple and efficient solution by using local video characteristics to correct possible errors in a depth map. The proposed method results in improved subjective quality by reducing the flickering artifact due to temporal variation in the depth map.
- The above discussion focused on the use of skip mode for depth map portions collocated across different points in time. However, other implementations apply the same technique to depth map (and video) portions collocated across different views at the same point in time. Further implementations apply the same technique to depth map (and video) portions collocated across different views at different points in time. Appropriate signaling can be provided to indicate that video and/or depth are coded in skip mode with respect to a particular view and a particular time. Implementations that examine different views would typically need a disparity vector, or equivalent information, to determine the collocated block in a different view.
-
FIG. 10 shows an exemplary video coding method 1000, in accordance with yet another embodiment of the present principles. Method 1000 corresponds to embodiment 3 described above. At step 1005, the loop over depth maps begins. At step 1010, the loop over macroblocks (MBs) within the depth map begins. At step 1020, video data is analyzed. For example, step 1020 may check whether or not the corresponding video macroblock is coded using skip mode, and/or check the change in the video macroblock (as compared to a corresponding macroblock in a temporally distinct video frame) against a change threshold. - At
step 1025, it is determined whether or not the current macroblock should be coded using skip mode, based on the analysis result from step 1020. If so, then the method proceeds to step 1030. Otherwise, the method proceeds to step 1035. At step 1030, the current macroblock is forced to be encoded in skip mode. At step 1035, the current macroblock is encoded with a conventional method. At step 1040, it is determined whether or not the current macroblock is the last macroblock in the depth map. If so, then the method proceeds to step 1045. Otherwise, the method returns to step 1010 for the next macroblock. At step 1045, the loop over the depth-map macroblocks is terminated. At step 1050, it is determined whether or not the current depth map is the last one in the depth sequence. If so, then the method proceeds to step 1055. Otherwise, the method returns to step 1005 for the next depth map. At step 1055, the loop over the depth maps is terminated. - The new distortion metric in
embodiment 1 and embodiment 2 can be combined with the technique in one of the other embodiments, such as, for example, embodiment 3, to achieve higher coding efficiency and rendering quality. -
FIG. 11 shows an exemplary video coding method 1100, in accordance with still another embodiment of the present principles. Method 1100 corresponds to the above-described Embodiment 4. At step 1110, a distortion weight k is calculated using a camera parameter. At step 1115, a loop over depth maps begins. At step 1120, it is determined whether the parameter n (global or local) needs to be calculated. If so, then the method proceeds to step 1125. Otherwise, the method proceeds to step 1130. At step 1125, one or more parameters n (global or local) are calculated using video characteristics. At step 1128, optionally, the parameters n are scaled based on the weights applied to a number of views used to render an intermediate view. At step 1130, a loop over macroblocks within the current depth map begins. At step 1133, video data is analyzed (e.g., checking whether or not the corresponding video macroblock is coded using skip mode and/or checking the change in the video macroblock against a change threshold). At step 1135, it is determined whether or not the current macroblock should be coded using skip mode, based on the analysis results from step 1133. If so, then the method proceeds to step 1140. Otherwise, the method proceeds to step 1145. At step 1140, the current macroblock is forced to be encoded in skip mode. At step 1145, the current macroblock is encoded with the new distortion metric calculated using k and n, for example, as per Equation (12). Step 1145 could be performed using local or global parameter(s) n.
- At step 1150, it is determined whether or not the current macroblock is the last macroblock in the depth map. If so, then the method proceeds to step 1155. Otherwise, the method returns to step 1130 for the next macroblock. At step 1155, the loop over the depth map macroblocks is terminated. At step 1160, it is determined whether or not the current depth map is the last one in the depth sequence. If so, then the method proceeds to step 1165. Otherwise, the method returns to step 1115 for the next depth map. At step 1165, the loop over the depth maps is terminated. -
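The quantities used in steps 1110 and 1145 of method 1100 can be illustrated with a short sketch. Since Equations (3) through (5) and Equation (12) are not reproduced in this excerpt, both formulas below are assumptions: the form of k follows the usual depth-image-based-rendering relation between a depth-value error and the resulting position error, and the metric is shown as a simple product of k, n, and the depth distortion.

```python
def camera_distortion_weight(focal_length, baseline, z_near, z_far):
    """Hypothetical form of the weight k of step 1110: in depth-image-based
    rendering, a depth-value error of 1 (on an 8-bit depth map) causes a
    position error of roughly f * b / 255 * (1/Znear - 1/Zfar) pixels.
    This stands in for Equations (3)-(5), which are not reproduced here."""
    return focal_length * baseline * (1.0 / z_near - 1.0 / z_far) / 255.0

def weighted_depth_distortion(depth_ssd, k, n):
    """Sketch in the spirit of Equation (12) (also not reproduced here):
    the depth distortion scaled by the camera term k and the image term n."""
    return k * n * depth_ssd
```

Consistent with the discussion of TABLE 1, a smaller Znear yields a larger k under this assumed form.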
FIG. 12 shows an exemplary video decoding method 1200, in accordance with yet another embodiment of the present principles. Method 1200 corresponds to any of the above-described Embodiments 1 through 4. That is, method 1200 may be used to decode an encoding produced by any of Embodiments 1 through 4. At step 1210, a loop over macroblocks begins. At step 1215, syntax is parsed. At step 1220, a predictor of the current macroblock is obtained. At step 1225, a residue of the current block is decoded. At step 1230, the residue is added to the predictor. At step 1235, the loop is terminated. At step 1240, deblock filtering is performed.
- The new distortion metric and the skip mode selection scheme have been simulated using several multi-view test sequences. For each sequence, both the video and the depth map are encoded for two selected views. The decoded video and depth maps are used to render an intermediate view between the two views.
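The macroblock loop of decoding method 1200 (steps 1210 through 1235) can be sketched as follows; the parsed-syntax representation and helper names are assumptions, and deblock filtering (step 1240) is omitted.

```python
def decode_depth_map(parsed_mbs, reference_mbs):
    """Sketch of the macroblock loop of method 1200. `parsed_mbs` is a
    hypothetical pre-parsed list of (is_skip, residue) pairs standing in
    for step 1215; `reference_mbs` supplies the collocated predictors."""
    reconstructed = []
    for index, (is_skip, residue) in enumerate(parsed_mbs):
        predictor = reference_mbs[index]               # step 1220: obtain predictor
        if is_skip:
            reconstructed.append(predictor)            # skip: copy the predictor
        else:
            reconstructed.append(predictor + residue)  # step 1230: add residue
    return reconstructed
```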
- First, the term k is calculated using the camera setting parameters for each sequence. Then n is found as described herein by estimating the effect of displacements in the first frame of the video sequence. The results for k, the global n (Embodiment 1), and the BD-PSNR (Bjontegaard Difference in Peak Signal-to-Noise Ratio) are given in TABLE 1. Note that each multi-view sequence is acquired with a different camera setting, which affects the amount of geometry error differently, and this difference is well reflected in k. For the outdoor scene sequences, Zfar is large; thus, Znear is the dominant parameter in determining k when the camera distance and focal length are similar. This can be seen in the “Lovebird 1” and “Lovebird 2” cases shown in TABLE 1, where the former captures a nearer object (smaller Znear), leading to a larger k as calculated in Equation (5). With a larger k, the position error becomes more sensitive to the depth distortion, as in Equation (4). In the case of indoor scene sequences, all parameters can affect the amount of position error caused by the depth distortion. For example, two indoor scene sequences, namely “Ballet” and “Dog”, have quite different values of k, where the former has a dense camera setting capturing near objects compared to the latter. The second term, n, depends on the image characteristics. Comparing the cases of “Champagne Tower” and “Ballet” in TABLE 1, n is larger for the former, which includes many objects, resulting in large distortion in the synthesized view due to position error. - The video is coded using the MPEG-4 AVC Standard (joint model (JM) reference software version 13.2), and the depth map is coded using the MPEG-4 AVC Standard with and without the proposed methods. To simplify the test conditions, the same encoding configuration is used for the video and the depth maps, including the QP values of 24, 28, 32, and 36 and the Lagrange multiplier values; only I-slices and P-slices are used to code 15 depth maps for each view.
- Subjective quality is improved because flickering artifacts are reduced. The flickering artifacts occur in the synthesized views due to the temporal variation in the depth map. By applying the skip mode selection method, erroneous depth map information is coded using skip mode and, as a result, the flickering artifact is reduced.
-
TABLE 1
Sequence          k      n       BD-PSNR (dB)
Champagne Tower   0.282  65.238  0.34
Dog               0.078  41.671  0.75
Lovebird 1        0.214  24.807  0.24
Lovebird 2        0.057  29.265  1.96
Door Flowers      0.090  15.810  1.23
Newspaper         0.275  38.653  0.75
Ballet            0.442   7.723  1.23
Breakdancers      0.383  11.430  0.29
- As used herein, the term “picture” refers to either a frame or a field. Additionally, throughout this application, wherever the term “frame” is used, alternate implementations may be devised for a field or, more generally, for a picture.
- Moreover, as used herein, the phrase “coding gain” refers to one or more of the following: for a given coding bitrate, the reduction in rendering distortion, measured in terms of, for example, SSD; or for a given rendering distortion (measured in SSD, for example), the reduction in coding bitrate.
- Further, as used herein, the phrase “distortion in rendered video” refers to a distortion between the video rendered using compressed depth and the video rendered using uncompressed depth. The actual distortion value may be determined in various ways, and using various measures. For example, the distortion value may be determined using SSD as the distortion measure.
- Also, as is known, skip mode refers to the SKIP mode as specified in the MPEG-4 AVC Standard. That is, in SKIP mode there is no prediction residue and no motion vector to be transmitted. The reconstructed block is obtained by simply copying the corresponding block from a previously encoded picture. The block correspondence is identified using the motion vector predicted from the motion information of neighboring blocks.
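The neighbor-based prediction mentioned above can be illustrated with the component-wise median of the three neighboring motion vectors, which is the core of the MPEG-4 AVC predicted motion vector; the sketch below omits the standard's availability and picture-boundary special cases.

```python
def predicted_motion_vector(mv_left, mv_top, mv_top_right):
    """Component-wise median of the motion vectors of the left, top, and
    top-right neighboring blocks (simplified MPEG-4 AVC median prediction).
    Each motion vector is an (x, y) pair."""
    def median3(a, b, c):
        return sorted((a, b, c))[1]
    return (median3(mv_left[0], mv_top[0], mv_top_right[0]),
            median3(mv_left[1], mv_top[1], mv_top_right[1]))
```

In SKIP mode, the block copied from the previously encoded picture is the one addressed by this predicted vector, since no vector is transmitted.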
- Additionally, it is to be appreciated that while one or more embodiments of the present principles are described herein with respect to the MPEG-4 AVC standard, the present principles are not limited to solely this standard and, thus, may be used with respect to other video coding standards, recommendations, and extensions thereof, including extensions of the MPEG-4 AVC standard, while maintaining the spirit of the present principles.
- In at least one implementation, it is preferred to apply the “global” and “localized” values of n to other frames and/or regions, so that there is more utility from the work of calculating n. That is, an encoder might calculate n for a frame or a portion of a frame and then use that n only for that frame or portion, but this may not be very efficient. It may be better to use that n for, for example, (i) all the other frames in that sequence, or (ii) all the other portions that are similar (e.g., “sky” portions), and so forth.
- A “portion” of a picture, as used in this application, refers to all or part of the picture. A portion may be, for example, a pixel, a macroblock, a slice, a frame, a field, a picture, a region bounding an object in the picture, the foreground of the picture, the background of the picture, or a particular set of (x, y) coordinates in the picture. A translation of a portion may refer, for example, to a translation of a particular set of (x, y) coordinates. For example, a portion may include the pixel at location (x1, y1), and a translation (represented by “T”) of the portion may include the pixel at location (x1−T, y1).
- In at least one implementation, an encoder performs the following operations to determine n for a given depth map:
-
- For a given depth distortion, we know the translational error in the resulting rendered video via k, which is determined based on camera parameters, including focal length and baseline distance, as calculated in Equation (3).
- We take the picture (or portion) and translate it by the translational error, then calculate the resulting video distortion. Using MSE, this involves summing (I(x, y) − I(x − shift, y))^2 over all pixels in the picture (or portion) and dividing by the number of pixels. This gives us one (translational error, video distortion) point on a graph having translational error on the x-axis and video distortion on the y-axis. We then do the same for a variety of values of translational error (corresponding to a variety of values of depth distortion), chart the results, and fit a line to get an estimate of n.
- We use this estimated n for the picture/portion in performing the rate-distortion-optimized encoding of the depth map, or we can use an actual plotted value of the video distortion. We also use this value of n for other pictures/portions. In additional implementations, we also create a library of values of n, for example for different types of scenes. Scenes may include, for example, sky, woods, grass (for example, a sports playing field), or water.
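The procedure above for estimating n can be sketched as follows. The sketch assumes a purely horizontal shift and fits the line through the origin; the text only says to fit a line, so the zero intercept is an assumption.

```python
def mse_for_shift(picture, shift):
    """Video distortion caused by a horizontal translation of `shift`
    pixels: the mean of (I(x, y) - I(x - shift, y))^2 over valid pixels.
    `picture` is a list of rows of luminance samples."""
    height, width = len(picture), len(picture[0])
    total, count = 0.0, 0
    for y in range(height):
        for x in range(shift, width):
            diff = picture[y][x] - picture[y][x - shift]
            total += diff * diff
            count += 1
    return total / count

def estimate_n(picture, shifts=(1, 2, 3, 4)):
    """Compute one (shift, distortion) point per candidate translational
    error, then return the slope of the least-squares line through the
    origin as the estimate of n."""
    points = [(s, mse_for_shift(picture, s)) for s in shifts]
    numerator = sum(s * d for s, d in points)
    denominator = sum(s * s for s, _ in points)
    return numerator / denominator
```

A flat picture yields n = 0 (shifts cause no distortion), while a textured picture yields a larger n, matching the TABLE 1 observation that object-rich scenes have larger n.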
- Throughout this disclosure, we refer to the “video” for a given location. References to “video” may include any of various video components or their combinations. Such components, or their combinations, include, for example, luminance, chrominance, Y (of YUV or YCbCr or YPbPr), U (of YUV), V (of YUV), Cb (of YCbCr), Cr (of YCbCr), Pb (of YPbPr), Pr (of YPbPr), red (of RGB), green (of RGB), blue (of RGB), S-Video, and negatives or positives of any of these components. Various implementations consider the video distortion that arises in rendered views. Accordingly, those rendered views may include, or be limited to, one or more of various components of video.
- Reference in the specification to “one embodiment” or “an embodiment” or “one implementation” or “an implementation” of the present principles, as well as other variations thereof, mean that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present principles. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment” or “in one implementation” or “in an implementation”, as well any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.
- It is to be appreciated that the use of any of the following “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of “A, B, and/or C” and “at least one of A, B, and C”, such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended, as readily apparent by one of ordinary skill in this and related arts, for as many items listed.
- Additionally, this application and its claims refer to “determining” various pieces of information. Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
- Further, this application frequently refers to a “view”. It is to be understood that a “view” may typically refer to an actual picture from that view. In other instances, however, a “view” may refer, for example, to the actual view position, or to a series of pictures from a view. The meaning will be revealed from context.
- We thus provide one or more implementations having particular features and aspects. However, features and aspects of described implementations may also be adapted for other implementations. Although implementations described herein may be described in a particular context, such descriptions should in no way be taken as limiting the features and concepts to such implementations or contexts.
- Implementations may signal information using a variety of techniques including, but not limited to, slice headers, supplemental enhancement information (SEI) messages or other messages, other high level syntax, non-high-level syntax, out-of-band information, data-stream data, and implicit signaling. Accordingly, although implementations described herein may be described in a particular context, such descriptions should in no way be taken as limiting the features and concepts to such implementations or contexts.
- Additionally, many implementations may be implemented in one or more of an encoder, a pre-processor to an encoder, a decoder, or a post-processor to a decoder. The implementations described or contemplated may be used in a variety of different applications and products. Some examples of applications or products include set-top boxes, cell phones, personal digital assistants (PDAs), televisions, personal recording devices (for example, PVRs, computers running recording software, VHS recording devices), camcorders, streaming of data over the Internet or other communication links, and video-on-demand.
- The implementations described herein may be implemented in, for example, a method or a process, an apparatus, or a software program. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants (“PDAs”), and other devices that facilitate communication of information between end-users.
- Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications, particularly, for example, equipment or applications associated with data encoding and decoding. Examples of equipment include video coders, video decoders, video codecs, web servers, set-top boxes, laptops, personal computers, cell phones, PDAs, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.
- Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette, a random access memory (“RAM”), or a read-only memory (“ROM”). The instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium having instructions for carrying out a process.
- As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data instructions for performing one of the depth map encoding techniques described in this application or to carry the actual encoding of the depth map. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. Further, the signal may be stored on a processor-readable medium.
- A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application and are within the scope of the following claims.
Claims (31)
1. A method comprising:
accessing a portion of a first depth picture that is to be coded, the portion of the first depth picture having one or more depth values for a corresponding portion of a first video picture;
determining that differences between the portion of the first video picture and a corresponding portion of a second video picture are small enough that an encoding of the portion of the first video picture may replace the portion of the first video picture with the portion of the second video picture; and
coding, based on the determining, the portion of the first depth picture using an indicator that instructs a decoder to find a portion of a second depth picture that has one or more depth values for the portion of the second video picture, and that instructs the decoder to use the portion of the second depth picture for the portion of the first depth picture.
2. The method of claim 1 wherein one or more of the portion of the first depth picture or the portion of the second depth picture includes noise, and coding the portion of the first depth picture using the indicator reduces the noise.
3. The method of claim 2 wherein differences between the portion of the first depth picture and the portion of the second depth picture are large enough to produce a coding cost for a skip mode that is greater than a cost for at least one other coding mode, and the method further comprises determining to use the skip mode despite the non-minimizing cost of the skip mode.
4. The method of claim 1 wherein differences between the portion of the first depth picture and the portion of the second depth picture are large enough to produce a coding cost for a skip mode that is greater than a cost for at least one other coding mode, and the method further comprises determining to use the skip mode despite the non-minimizing cost of the skip mode.
5. The method of claim 1 wherein one or more of the portion of the first depth picture or the portion of the second depth picture includes noise that contributes to the differences between the portion of the first depth picture and the portion of the second depth picture, and coding the portion of the first depth picture using the indicator reduces a flickering artifact due to the differences between the portion of the first video picture and the portion of the second video picture.
6. The method of claim 1 wherein determining that differences are small enough comprises comparing the portion of the first video picture and the portion of a second video picture.
7. The method of claim 1 wherein determining that differences are small enough comprises determining that the portion of the first video picture was coded using a coding mode that instructs a decoder to replace the portion of the first video picture with the portion of the second video picture.
8. The method of claim 7 wherein the indicator further instructs the decoder to replace the portion of the first video picture with the portion of the second video picture, such that a single indicator is used to indicate the coding of both the portion of the first depth picture and the portion of the first video picture.
9. The method of claim 1 wherein the first video picture is from a first time and the second video picture is from a second time that is different from the first time.
10. The method of claim 9 wherein the first video picture is from a first view and the second video picture is from a second view that is different from the first view.
11. The method of claim 1 wherein the first video picture is from a first view and the second video picture is from a second view that is different from the first view.
12. The method of claim 1 wherein the second time is earlier than the first time.
13. The method of claim 1 wherein the second time is later than the first time.
14. The method of claim 1 wherein coding the portion of the first depth picture comprises using a skip mode.
15. The method of claim 1 wherein differences between the portion of the first depth picture and the portion of the second depth picture are above a threshold.
16. An apparatus comprising:
means for accessing a portion of a first depth picture that is to be coded, the portion of the first depth picture having one or more depth values for a corresponding portion of a first video picture;
means for determining that differences between the portion of the first video picture and a corresponding portion of a second video picture are small enough that an encoding of the portion of the first video picture may replace the portion of the first video picture with the portion of the second video picture; and
means for coding, based on the determining, the portion of the first depth picture using an indicator that instructs a decoder to find a portion of a second depth picture that has one or more depth values for the portion of the second video picture, and that instructs the decoder to use the portion of the second depth picture for the portion of the first depth picture.
17. The apparatus of claim 16, wherein the apparatus is implemented in a video encoder.
18. A processor readable medium having stored thereon instructions for causing a processor to perform at least the following:
accessing a portion of a first depth picture that is to be coded, the portion of the first depth picture having one or more depth values for a corresponding portion of a first video picture;
determining that differences between the portion of the first video picture and a corresponding portion of a second video picture are small enough that an encoding of the portion of the first video picture may replace the portion of the first video picture with the portion of the second video picture; and
coding, based on the determining, the portion of the first depth picture using an indicator that instructs a decoder to find a portion of a second depth picture that has one or more depth values for the portion of the second video picture, and that instructs the decoder to use the portion of the second depth picture for the portion of the first depth picture.
19. An apparatus, comprising a processor configured to perform at least the following:
accessing a portion of a first depth picture that is to be coded, the portion of the first depth picture having one or more depth values for a corresponding portion of a first video picture;
determining that differences between the portion of the first video picture and a corresponding portion of a second video picture are small enough that an encoding of the portion of the first video picture may replace the portion of the first video picture with the portion of the second video picture; and
coding, based on the determining, the portion of the first depth picture using an indicator that instructs a decoder to find a portion of a second depth picture that has one or more depth values for the portion of the second video picture, and that instructs the decoder to use the portion of the second depth picture for the portion of the first depth picture.
20. An apparatus comprising an encoder for performing at least the following operations:
accessing a portion of a first depth picture that is to be coded, the portion of the first depth picture having one or more depth values for a corresponding portion of a first video picture;
determining that differences between the portion of the first video picture and a corresponding portion of a second video picture are small enough that an encoding of the portion of the first video picture may replace the portion of the first video picture with the portion of the second video picture; and
coding, based on the determining, the portion of the first depth picture using an indicator that instructs a decoder to find a portion of a second depth picture that has one or more depth values for the portion of the second video picture, and that instructs the decoder to use the portion of the second depth picture for the portion of the first depth picture.
21. An apparatus comprising:
an encoder for performing at least the following operations:
accessing a portion of a first depth picture that is to be coded, the portion of the first depth picture having one or more depth values for a corresponding portion of a first video picture,
determining that differences between the portion of the first video picture and a corresponding portion of a second video picture are small enough that an encoding of the portion of the first video picture may replace the portion of the first video picture with the portion of the second video picture,
coding, based on the determining, the portion of the first depth picture using an indicator that instructs a decoder to find a portion of a second depth picture that has one or more depth values for the portion of the second video picture, and that instructs the decoder to use the portion of the second depth picture for the portion of the first depth picture; and
a modulator for modulating a signal that includes the coding of the portion of the first depth picture.
22. (canceled)
23. (canceled)
24. A processor readable medium having stored thereon a video signal structure, comprising:
a depth picture coding section including coding of a portion of a first depth picture;
a video picture coding section including coding of a portion of a first video picture, the first video picture corresponding to the first depth picture; and
an indicator section including coding of at least a single indicator that instructs the decoder to perform at least the following operations:
to decode a portion of a second depth picture by using the portion of the first depth picture for the portion of the second depth picture, the portion of the second depth picture being collocated with the portion of the first depth picture, and
to decode a portion of a second video picture by using the portion of the first video picture for the portion of the second video picture, the portion of the second video picture being collocated with the portion of the first video picture, and the second video picture corresponding to the second depth picture.
25. A method comprising:
accessing a coding of a portion of a first depth picture;
accessing a coding of a portion of a first video picture, the first video picture corresponding to the first depth picture;
accessing at least a single indicator that instructs the decoder to perform at least the following operations:
to decode a portion of a second depth picture by using the portion of the first depth picture for the portion of the second depth picture, the portion of the second depth picture being collocated with the portion of the first depth picture, and
to decode a portion of a second video picture by using the portion of the first video picture for the portion of the second video picture, the portion of the second video picture being collocated with the portion of the first video picture, and the second video picture corresponding to the second depth picture;
decoding the portion of the second depth picture by using the portion of the first depth picture for the portion of the second depth picture; and
decoding the portion of the second video picture by using the portion of the first video picture for the portion of the second video picture.
26. An apparatus comprising:
means for accessing a coding of a portion of a first depth picture;
means for accessing a coding of a portion of a first video picture, the first video picture corresponding to the first depth picture;
means for accessing a single indicator that instructs the decoder to perform at least the following operations:
to decode a portion of a second depth picture by using the portion of the first depth picture for the portion of the second depth picture, the portion of the second depth picture being collocated with the portion of the first depth picture, and
to decode a portion of a second video picture by using the portion of the first video picture for the portion of the second video picture, the portion of the second video picture being collocated with the portion of the first video picture, and the second video picture corresponding to the second depth picture;
means for decoding the portion of the second depth picture by using the portion of the first depth picture; and
means for decoding the portion of the second video picture by using the portion of the first video picture.
27. The apparatus of claim 26, wherein the apparatus is implemented in a video decoder.
28. A processor readable medium having stored thereon instructions for causing a processor to perform at least the following:
accessing a coding of a portion of a first depth picture;
accessing a coding of a portion of a first video picture, the first video picture corresponding to the first depth picture;
accessing a single indicator that instructs the decoder to perform at least the following operations:
to decode a portion of a second depth picture by using the portion of the first depth picture for the portion of the second depth picture, the portion of the second depth picture being collocated with the portion of the first depth picture, and
to decode a portion of a second video picture by using the portion of the first video picture for the portion of the second video picture, the portion of the second video picture being collocated with the portion of the first video picture, and the second video picture corresponding to the second depth picture;
decoding the portion of the second depth picture by using the portion of the first depth picture; and
decoding the portion of the second video picture by using the portion of the first video picture.
29. An apparatus, comprising a processor configured to perform at least the following:
accessing a coding of a portion of a first depth picture;
accessing a coding of a portion of a first video picture, the first video picture corresponding to the first depth picture;
accessing a single indicator that instructs the decoder to perform at least the following operations:
to decode a portion of a second depth picture by using the portion of the first depth picture for the portion of the second depth picture, the portion of the second depth picture being collocated with the portion of the first depth picture, and
to decode a portion of a second video picture by using the portion of the first video picture for the portion of the second video picture, the portion of the second video picture being collocated with the portion of the first video picture, and the second video picture corresponding to the second depth picture;
decoding the portion of the second depth picture by using the portion of the first depth picture; and
decoding the portion of the second video picture by using the portion of the first video picture.
30. An apparatus comprising a decoder for performing at least the following operations:
accessing a coding of a portion of a first depth picture;
accessing a coding of a portion of a first video picture, the first video picture corresponding to the first depth picture;
accessing a single indicator that instructs the decoder to perform at least the following operations:
to decode a portion of a second depth picture by using the portion of the first depth picture for the portion of the second depth picture, the portion of the second depth picture being collocated with the portion of the first depth picture, and
to decode a portion of a second video picture by using the portion of the first video picture for the portion of the second video picture, the portion of the second video picture being collocated with the portion of the first video picture, and the second video picture corresponding to the second depth picture;
decoding the portion of the second depth picture by using the portion of the first depth picture; and
decoding the portion of the second video picture by using the portion of the first video picture.
31. An apparatus comprising:
a demodulator for receiving and demodulating a signal, the signal including:
a coding of a portion of a first depth picture,
a coding of a portion of a first video picture, the first video picture corresponding to the first depth picture,
a coding of a single indicator that instructs the decoder to perform at least the following operations:
to decode a portion of a second depth picture by using the portion of the first depth picture for the portion of the second depth picture, the portion of the second depth picture being collocated with the portion of the first depth picture, and
to decode a portion of a second video picture by using the portion of the first video picture for the portion of the second video picture, the portion of the second video picture being collocated with the portion of the first video picture, and the second video picture corresponding to the second depth picture; and
a decoder for performing at least the following operations:
accessing the coding of the portion of the first depth picture,
accessing the coding of the portion of the first video picture,
accessing the single indicator,
decoding the portion of the second depth picture by using the portion of the first depth picture, and
decoding the portion of the second video picture by using the portion of the first video picture.
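The joint skip operation that the claims describe can be sketched in code: a single per-block indicator instructs the decoder to reuse the collocated portions of both the reference (first) depth picture and the reference (first) video picture when reconstructing the second pictures. The block size, the NumPy array representation, and the indicator grid below are illustrative assumptions for this sketch, not the patent's actual implementation.

```python
import numpy as np

BLOCK = 4  # illustrative block size (hypothetical; real codecs use larger macroblocks)

def decode_joint_skip(ref_video, ref_depth, indicators, residual_video, residual_depth):
    """Reconstruct the second video and depth pictures from the first pictures.

    For each block, a single indicator drives BOTH components at once:
    when set, the collocated block of the reference video picture and the
    collocated block of the reference depth picture are copied; otherwise
    the (already-decoded) non-skip blocks are used instead.
    """
    out_video = np.empty_like(ref_video)
    out_depth = np.empty_like(ref_depth)
    h, w = ref_video.shape
    for by in range(0, h, BLOCK):
        for bx in range(0, w, BLOCK):
            sl = (slice(by, by + BLOCK), slice(bx, bx + BLOCK))
            if indicators[by // BLOCK][bx // BLOCK]:
                # Single indicator: reuse collocated portions of both pictures.
                out_video[sl] = ref_video[sl]
                out_depth[sl] = ref_depth[sl]
            else:
                # Non-skip blocks come from normal decoding (stand-in here).
                out_video[sl] = residual_video[sl]
                out_depth[sl] = residual_depth[sl]
    return out_video, out_depth
```

Because one indicator governs the video block and its collocated depth block together, the bitstream avoids signaling a separate skip decision for the depth map, which is the coding saving the application targets.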
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/138,362 US20110292044A1 (en) | 2009-02-13 | 2009-11-23 | Depth map coding using video information |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US20753209P | 2009-02-13 | 2009-02-13 | |
US20789209P | 2009-02-18 | 2009-02-18 | |
US26950109P | 2009-06-25 | 2009-06-25 | |
US27105309P | 2009-07-16 | 2009-07-16 | |
US13/138,362 US20110292044A1 (en) | 2009-02-13 | 2009-11-23 | Depth map coding using video information |
PCT/US2009/006245 WO2010093350A1 (en) | 2009-02-13 | 2009-11-23 | Depth map coding using video information |
Publications (1)
Publication Number | Publication Date |
---|---|
US20110292044A1 true US20110292044A1 (en) | 2011-12-01 |
Family
ID=42561995
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/138,362 Abandoned US20110292044A1 (en) | 2009-02-13 | 2009-11-23 | Depth map coding using video information |
US13/138,335 Active 2031-05-25 US9066075B2 (en) | 2009-02-13 | 2009-11-23 | Depth map coding to reduce rendered distortion |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/138,335 Active 2031-05-25 US9066075B2 (en) | 2009-02-13 | 2009-11-23 | Depth map coding to reduce rendered distortion |
Country Status (2)
Country | Link |
---|---|
US (2) | US20110292044A1 (en) |
WO (2) | WO2010093351A1 (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2328337A4 (en) * | 2008-09-02 | 2011-08-10 | Huawei Device Co Ltd | 3d video communicating means, transmitting apparatus, system and image reconstructing means, system |
KR101640404B1 (en) * | 2010-09-20 | 2016-07-18 | 엘지전자 주식회사 | Mobile terminal and operation control method thereof |
US20120206578A1 (en) * | 2011-02-15 | 2012-08-16 | Seung Jun Yang | Apparatus and method for eye contact using composition of front view image |
EP2717572B1 (en) * | 2011-06-24 | 2018-08-08 | LG Electronics Inc. | Encoding/decoding method and apparatus using a skip mode |
US10158850B2 (en) | 2011-08-25 | 2018-12-18 | Telefonaktiebolaget Lm Ericsson (Publ) | Depth map encoding and decoding |
WO2014172387A1 (en) * | 2013-04-15 | 2014-10-23 | Huawei Technologies Co., Ltd. | Method and apparatus of depth prediction mode selection |
US10080036B2 (en) * | 2013-05-16 | 2018-09-18 | City University Of Hong Kong | Method and apparatus for depth video coding using endurable view synthesis distortion |
TW201528775A (en) | 2014-01-02 | 2015-07-16 | Ind Tech Res Inst | Depth map aligning method and system |
Citations (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4849914A (en) * | 1987-09-22 | 1989-07-18 | Opti-Copy, Inc. | Method and apparatus for registering color separation film |
US5301242A (en) * | 1991-05-24 | 1994-04-05 | International Business Machines Corporation | Apparatus and method for motion video encoding employing an adaptive quantizer |
US5623557A (en) * | 1994-04-01 | 1997-04-22 | Sony Corporation | Method and apparatus for data encoding and data recording medium |
US5748229A (en) * | 1996-06-26 | 1998-05-05 | Mci Corporation | System and method for evaluating video fidelity by determining information frame rate |
US20040047614A1 (en) * | 2002-08-22 | 2004-03-11 | Dustin Green | Accelerated access to frames from a compressed digital video stream without keyframes |
US6823015B2 (en) * | 2002-01-23 | 2004-11-23 | International Business Machines Corporation | Macroblock coding using luminance date in analyzing temporal redundancy of picture, biased by chrominance data |
US20070024614A1 (en) * | 2005-07-26 | 2007-02-01 | Tam Wa J | Generating a depth map from a two-dimensional source image for stereoscopic and multiview imaging |
US20070083883A1 (en) * | 2004-03-29 | 2007-04-12 | Deng Kevin K | Methods and apparatus to detect a blank frame in a digital video broadcast signal |
US20070274396A1 (en) * | 2006-05-26 | 2007-11-29 | Ximin Zhang | Complexity adaptive skip mode estimation for video encoding |
WO2008071037A1 (en) * | 2006-12-14 | 2008-06-19 | Thomson Licensing | Method and apparatus for encoding and/or decoding video data using enhancement layer residual prediction for bit depth scalability |
US20080198920A1 (en) * | 2007-02-21 | 2008-08-21 | Kai Chieh Yang | 3d video encoding |
US20080212690A1 (en) * | 2007-03-01 | 2008-09-04 | Qualcomm Incorporated | Transcoder media time conversion |
US20090129667A1 (en) * | 2007-11-16 | 2009-05-21 | Gwangju Institute Of Science And Technology | Device and method for estimatiming depth map, and method for generating intermediate image and method for encoding multi-view video using the same |
US20090225163A1 (en) * | 2008-03-07 | 2009-09-10 | Honeywell International, Inc. | System and method for mapping of text events from multiple sources with camera outputs |
US7633551B2 (en) * | 2000-10-18 | 2009-12-15 | Microsoft Corporation | Compressed timing indicators for media samples |
US20100046635A1 (en) * | 2007-04-12 | 2010-02-25 | Purvin Bibhas Pandit | Tiling in video decoding and encoding |
US20100064220A1 (en) * | 2008-03-27 | 2010-03-11 | Verizon Data Services India Private Limited | Method and system for providing interactive hyperlinked video |
US20100098157A1 (en) * | 2007-03-23 | 2010-04-22 | Jeong Hyu Yang | method and an apparatus for processing a video signal |
US20100231688A1 (en) * | 2009-03-11 | 2010-09-16 | Industry Academic Cooperation Foundation Of Kyung Hee University | Method and apparatus for block-based depth map coding and 3d video coding method using the same |
US20100284466A1 (en) * | 2008-01-11 | 2010-11-11 | Thomson Licensing | Video and depth coding |
US20110007135A1 (en) * | 2009-07-09 | 2011-01-13 | Sony Corporation | Image processing device, image processing method, and program |
US20110038418A1 (en) * | 2008-04-25 | 2011-02-17 | Thomson Licensing | Code of depth signal |
Family Cites Families (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH07123447A (en) * | 1993-10-22 | 1995-05-12 | Sony Corp | Method and device for recording image signal, method and device for reproducing image signal, method and device for encoding image signal, method and device for decoding image signal and image signal recording medium |
US20040104935A1 (en) * | 2001-01-26 | 2004-06-03 | Todd Williamson | Virtual reality immersion system |
DE602004014901D1 (en) * | 2004-04-29 | 2008-08-21 | Mitsubishi Electric Corp | Adaptive quantization of a depth map |
US7671894B2 (en) * | 2004-12-17 | 2010-03-02 | Mitsubishi Electric Research Laboratories, Inc. | Method and system for processing multiview videos for view synthesis using skip and direct modes |
US7728878B2 (en) | 2004-12-17 | 2010-06-01 | Mitsubishi Electric Research Labortories, Inc. | Method and system for processing multiview videos for view synthesis using side information |
US7903737B2 (en) | 2005-11-30 | 2011-03-08 | Mitsubishi Electric Research Laboratories, Inc. | Method and system for randomly accessing multiview videos with known prediction dependency |
KR100795974B1 (en) | 2006-05-03 | 2008-01-21 | 한국과학기술연구원 | Apparatus for realtime-generating a depth-map by processing streaming stereo images |
US8139142B2 (en) * | 2006-06-01 | 2012-03-20 | Microsoft Corporation | Video manipulation of red, green, blue, distance (RGB-Z) data including segmentation, up-sampling, and background substitution techniques |
EP1931150A1 (en) * | 2006-12-04 | 2008-06-11 | Koninklijke Philips Electronics N.V. | Image processing system for processing combined image data and depth data |
US8160149B2 (en) | 2007-04-03 | 2012-04-17 | Gary Demos | Flowfield motion compensation for video compression |
US20090010507A1 (en) | 2007-07-02 | 2009-01-08 | Zheng Jason Geng | System and method for generating a 3d model of anatomical structure using a plurality of 2d images |
CN101516040B (en) | 2008-02-20 | 2011-07-06 | 华为终端有限公司 | Video matching method, device and system |
CN100563340C (en) | 2008-07-07 | 2009-11-25 | 浙江大学 | Multichannel video stream encoder and decoder based on deep image rendering |
CN101374242B (en) | 2008-07-29 | 2010-06-02 | 宁波大学 | Depth map encoding compression method for 3DTV and FTV system |
CN102124742B (en) * | 2008-08-20 | 2013-09-11 | 汤姆逊许可公司 | Refined depth map |
US20100278232A1 (en) * | 2009-05-04 | 2010-11-04 | Sehoon Yea | Method Coding Multi-Layered Depth Images |
US9148673B2 (en) * | 2009-06-25 | 2015-09-29 | Thomson Licensing | Depth map coding |
WO2011084021A2 (en) * | 2010-01-11 | 2011-07-14 | 엘지전자 주식회사 | Broadcasting receiver and method for displaying 3d images |
GB2479784B (en) * | 2010-04-23 | 2012-11-07 | Nds Ltd | Image scaling |
KR101702948B1 (en) * | 2010-07-20 | 2017-02-06 | 삼성전자주식회사 | Rate-Distortion Optimization Apparatus and Method for depth-image encoding |
US8447098B1 (en) * | 2010-08-20 | 2013-05-21 | Adobe Systems Incorporated | Model-based stereo matching |
US20120050494A1 (en) * | 2010-08-27 | 2012-03-01 | Xuemin Chen | Method and system for creating a view-angle dependent 2d and/or 3d image/video utilizing a monoscopic video camera array |
KR20120023431A (en) * | 2010-09-03 | 2012-03-13 | 삼성전자주식회사 | Method and apparatus for converting 2-dimensinal image to 3-dimensional image with adjusting depth of the 3-dimensional image |
US8902283B2 (en) * | 2010-10-07 | 2014-12-02 | Sony Corporation | Method and apparatus for converting a two-dimensional image into a three-dimensional stereoscopic image |
US9094660B2 (en) * | 2010-11-11 | 2015-07-28 | Georgia Tech Research Corporation | Hierarchical hole-filling for depth-based view synthesis in FTV and 3D video |
CN102760234B (en) * | 2011-04-14 | 2014-08-20 | 财团法人工业技术研究院 | Depth image acquisition device, system and method |
EP2614490B1 (en) * | 2011-11-11 | 2013-12-18 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Concept for determining a measure for a distortion change in a synthesized view due to depth map modifications |
CN103139577B (en) * | 2011-11-23 | 2015-09-30 | 华为技术有限公司 | The method and apparatus of a kind of depth image filtering method, acquisition depth image filtering threshold |
JP6016061B2 (en) * | 2012-04-20 | 2016-10-26 | Nltテクノロジー株式会社 | Image generation apparatus, image display apparatus, image generation method, and image generation program |
EP2757524B1 (en) * | 2013-01-16 | 2018-12-19 | Honda Research Institute Europe GmbH | Depth sensing method and system for autonomous vehicles |
US20140363097A1 (en) * | 2013-06-06 | 2014-12-11 | Etron Technology, Inc. | Image capture system and operation method thereof |
US20150049821A1 (en) * | 2013-08-16 | 2015-02-19 | Qualcomm Incorporated | In-loop depth map filtering for 3d video coding |
2009
- 2009-11-23 US US13/138,362 patent/US20110292044A1/en not_active Abandoned
- 2009-11-23 WO PCT/US2009/006248 patent/WO2010093351A1/en active Application Filing
- 2009-11-23 WO PCT/US2009/006245 patent/WO2010093350A1/en active Application Filing
- 2009-11-23 US US13/138,335 patent/US9066075B2/en active Active
Cited By (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9371099B2 (en) | 2004-11-03 | 2016-06-21 | The Wilfred J. and Louisette G. Lagassey Irrevocable Trust | Modular intelligent transportation system |
US10979959B2 (en) | 2004-11-03 | 2021-04-13 | The Wilfred J. and Louisette G. Lagassey Irrevocable Trust | Modular intelligent transportation system |
US10992956B2 (en) | 2011-04-12 | 2021-04-27 | Electronics And Telecommunications Research Institute | Image encoding method using a skip mode, and a device using the method |
US11910013B2 (en) * | 2011-04-12 | 2024-02-20 | Electronics And Telecommunications Research Institute | Image encoding method using a skip mode, and a device using the method |
US11902575B2 (en) | 2011-04-12 | 2024-02-13 | Electronics And Telecommunications Research Institute | Image encoding method using a skip mode, and a device using the method |
US20230156220A1 (en) * | 2011-04-12 | 2023-05-18 | Electronics And Telecommunications Research Institute | Image encoding method using a skip mode, and a device using the method |
US20230133166A1 (en) * | 2011-04-12 | 2023-05-04 | Electronics And Telecommunications Research Institute | Image encoding method using a skip mode, and a device using the method |
US11523133B2 (en) | 2011-04-12 | 2022-12-06 | Electronics And Telecommunications Research Institute | Image encoding method using a skip mode, and a device using the method |
US10142623B2 (en) * | 2011-04-12 | 2018-11-27 | Electronics And Telecommunications Research Institute | Image encoding method using a skip mode, and a device using the method |
US20140003518A1 (en) * | 2011-04-12 | 2014-01-02 | Electronics And Telecommunications Research Institute | Image encoding method using a skip mode, and a device using the method |
US11910014B2 (en) * | 2011-04-12 | 2024-02-20 | Electronics And Telecommunications Research Institute | Image encoding method using a skip mode, and a device using the method |
US10575014B2 (en) | 2011-04-12 | 2020-02-25 | Electronics Telecommunications Research Institute | Image encoding method using a skip mode, and a device using the method |
US20140192148A1 (en) * | 2011-08-15 | 2014-07-10 | Telefonaktiebolaget L M Ericsson (Publ) | Encoder, Method in an Encoder, Decoder and Method in a Decoder for Providing Information Concerning a Spatial Validity Range |
US9497435B2 (en) * | 2011-08-15 | 2016-11-15 | Telefonaktiebolaget Lm Ericsson (Publ) | Encoder, method in an encoder, decoder and method in a decoder for providing information concerning a spatial validity range |
US10390046B2 (en) * | 2011-11-07 | 2019-08-20 | Qualcomm Incorporated | Coding significant coefficient information in transform skip mode |
US20130114730A1 (en) * | 2011-11-07 | 2013-05-09 | Qualcomm Incorporated | Coding significant coefficient information in transform skip mode |
US9877008B2 (en) * | 2011-11-11 | 2018-01-23 | Ge Video Compression, Llc | Concept for determining a measure for a distortion change in a synthesized view due to depth map modifications |
US11350075B2 (en) | 2011-11-11 | 2022-05-31 | Ge Video Compression, Llc | Concept for determining a measure for a distortion change in a synthesized view due to depth map modifications |
US20170078641A1 (en) * | 2011-11-11 | 2017-03-16 | Ge Video Compression, Llc | Concept for determining a measure for a distortion change in a synthesized view due to depth map modifications |
US10154245B2 (en) * | 2011-11-11 | 2018-12-11 | Ge Video Compression, Llc | Concept for determining a measure for a distortion change in a synthesized view due to depth map modifications |
US10506214B2 (en) | 2011-11-11 | 2019-12-10 | Ge Video Compression, Llc | Concept for determining a measure for a distortion change in a synthesized view due to depth map modifications |
US10687042B2 (en) | 2011-11-11 | 2020-06-16 | Ge Video Compression, Llc | Concept for determining a measure for a distortion change in a synthesized view due to depth map modifications |
US10827158B2 (en) | 2011-11-11 | 2020-11-03 | Ge Video Compression, Llc | Concept for determining a measure for a distortion change in a synthesized view due to depth map modifications |
US20140328389A1 (en) * | 2011-12-22 | 2014-11-06 | Mediatek Inc. | Method and apparatus of texture image compression in 3d video coding |
US9560362B2 (en) * | 2011-12-22 | 2017-01-31 | Mediatek Inc. | Method and apparatus of texture image compression in 3D video coding |
US20130278597A1 (en) * | 2012-04-20 | 2013-10-24 | Total 3rd Dimension Systems, Inc. | Systems and methods for real-time conversion of video into three-dimensions |
US9384581B2 (en) * | 2012-04-20 | 2016-07-05 | Affirmation, Llc | Systems and methods for real-time conversion of video into three-dimensions |
US20140002594A1 (en) * | 2012-06-29 | 2014-01-02 | Hong Kong Applied Science and Technology Research Institute Company Limited | Hybrid skip mode for depth map coding and decoding |
US9165393B1 (en) * | 2012-07-31 | 2015-10-20 | Dreamworks Animation Llc | Measuring stereoscopic quality in a three-dimensional computer-generated scene |
US20140071233A1 (en) * | 2012-09-11 | 2014-03-13 | Samsung Electronics Co., Ltd. | Apparatus and method for processing image using correlation between views |
US9681154B2 (en) * | 2012-12-06 | 2017-06-13 | Patent Capital Group | System and method for depth-guided filtering in a video conference environment |
US20140160239A1 (en) * | 2012-12-06 | 2014-06-12 | Dihong Tian | System and method for depth-guided filtering in a video conference environment |
US20150334418A1 (en) * | 2012-12-27 | 2015-11-19 | Nippon Telegraph And Telephone Corporation | Image encoding method, image decoding method, image encoding apparatus, image decoding apparatus, image encoding program, and image decoding program |
US9924197B2 (en) * | 2012-12-27 | 2018-03-20 | Nippon Telegraph And Telephone Corporation | Image encoding method, image decoding method, image encoding apparatus, image decoding apparatus, image encoding program, and image decoding program |
US10911779B2 (en) * | 2013-10-17 | 2021-02-02 | Nippon Telegraph And Telephone Corporation | Moving image encoding and decoding method, and non-transitory computer-readable media that code moving image for each of prediction regions that are obtained by dividing coding target region while performing prediction between different views |
US11218681B2 (en) * | 2017-06-29 | 2022-01-04 | Koninklijke Philips N.V. | Apparatus and method for generating an image |
US11017540B2 (en) * | 2018-04-23 | 2021-05-25 | Cognex Corporation | Systems and methods for improved 3-d data reconstruction from stereo-temporal image sequences |
US11593954B2 (en) | 2018-04-23 | 2023-02-28 | Cognex Corporation | Systems and methods for improved 3-D data reconstruction from stereo-temporal image sequences |
US11074700B2 (en) | 2018-04-23 | 2021-07-27 | Cognex Corporation | Systems, methods, and computer-readable storage media for determining saturation data for a temporal pixel |
US11069074B2 (en) | 2018-04-23 | 2021-07-20 | Cognex Corporation | Systems and methods for improved 3-D data reconstruction from stereo-temporal image sequences |
US11468673B2 (en) | 2018-08-24 | 2022-10-11 | Snap Inc. | Augmented reality system using structured light |
US10909373B1 (en) * | 2018-08-24 | 2021-02-02 | Snap Inc. | Augmented reality system using structured light |
US20230237730A1 (en) * | 2022-01-21 | 2023-07-27 | Meta Platforms Technologies, Llc | Memory structures to support changing view direction |
Also Published As
Publication number | Publication date |
---|---|
WO2010093351A1 (en) | 2010-08-19 |
US20110292043A1 (en) | 2011-12-01 |
US9066075B2 (en) | 2015-06-23 |
WO2010093350A1 (en) | 2010-08-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9066075B2 (en) | Depth map coding to reduce rendered distortion | |
US10417748B2 (en) | Filtering and edge encoding and decoding for depth maps | |
US9148673B2 (en) | Depth map coding | |
US8165201B2 (en) | Method of computing disparity, method of synthesizing interpolation view, method of encoding and decoding multi-view video using the same, and encoder and decoder using the same | |
US9569819B2 (en) | Coding of depth maps | |
US8537200B2 (en) | Depth map generation techniques for conversion of 2D video data to 3D video data | |
CA2941085C (en) | Method for low-latency illumination compensation process and depth lookup table based coding | |
Macchiavello et al. | Loss-resilient coding of texture and depth for free-viewpoint video conferencing | |
US9503751B2 (en) | Method and apparatus for simplified depth coding with extended prediction modes | |
US20150365698A1 (en) | Method and Apparatus for Prediction Value Derivation in Intra Coding | |
Shao et al. | Fast color correction for multi-view video by modeling spatio-temporal variation | |
Bal et al. | Multiview video plus depth coding with depth-based prediction mode | |
WO2010021664A1 (en) | Depth coding | |
Macchiavello et al. | Reference frame selection for loss-resilient texture & depth map coding in multiview video conferencing | |
De Silva et al. | Intra mode selection method for depth maps of 3D video based on rendering distortion modeling | |
CN107409211B (en) | A kind of video coding-decoding method and device | |
Dou et al. | View synthesis optimization based on texture smoothness for 3D-HEVC | |
De Silva et al. | A novel depth map quality metric and its usage in depth map coding | |
Micallef et al. | Low complexity disparity estimation for immersive 3D video transmission | |
US20190356912A1 (en) | Information processing apparatus, information processing method and computer-readable recording medium having stored program therein | |
Shao et al. | Colour correction pre-processing and chrominance reconstruction post-processing for multi-view video coding | |
Dou et al. | An adaptive segment-based view synthesis optimization method for 3D-HEVC | |
JP2013110555A (en) | Image encoder, image decoder, and method and program thereof | |
Gao | Error-resilient multi-view video plus depth based 3-D video coding | |
Cernigliaro et al. | Depth perceptual video coding for free viewpoint video based on H.264/AVC
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: THOMSON LICENSING, FRANCE
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KIM, WOO-SHIK;LAI, PO-LIN;TIAN, DONG;REEL/FRAME:026743/0790
Effective date: 20090518
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |