WO2014053086A1 - Method and apparatus of motion vector derivation 3d video coding - Google Patents


Info

Publication number
WO2014053086A1
Authority
WO
WIPO (PCT)
Prior art keywords
candidate
view
reference picture
inter
current block
Application number
PCT/CN2013/082800
Other languages
French (fr)
Inventor
Kai Zhang
Jicheng An
Original Assignee
Mediatek Singapore Pte. Ltd.
Application filed by Mediatek Singapore Pte. Ltd.
Priority to SG11201502627QA
Priority to CN201380052367.1A (CN104718760B)
Priority to EP13843228.1A (EP2904800A4)
Priority to US14/433,328 (US9924168B2)
Publication of WO2014053086A1

Classifications

    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals (H: Electricity; H04: Electric communication technique; H04N: Pictorial communication, e.g. television)
    • H04N19/105: Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/597: Predictive coding specially adapted for multi-view video sequence encoding
    • H04N19/139: Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
    • H04N19/176: Adaptive coding where the coding unit is an image region, the region being a block, e.g. a macroblock
    • H04N19/52: Processing of motion vectors by predictive encoding
    • H04N19/527: Global motion vector estimation

Definitions

  • the present invention relates to video coding.
  • the present invention relates to motion vector candidate list derivation for advanced motion vector prediction (AMVP) and Merge mode in three-dimensional video coding and multi-view video coding.
  • Multi-view video is a technique to capture and render 3D video.
  • the multi-view video is typically created by capturing a scene using multiple cameras simultaneously, where the multiple cameras are properly located so that each camera captures the scene from one viewpoint.
  • the multi-view video with a large number of video sequences associated with the views represents a massive amount of data. Accordingly, the multi-view video will require a large storage space to store and/or a high bandwidth to transmit. Therefore, multi-view video coding techniques have been developed in the field to reduce the required storage space and the transmission bandwidth.
  • a straightforward approach may simply apply conventional video coding techniques to each single-view video sequence independently and disregard any correlation among different views. Such straightforward techniques would result in poor coding performance.
  • In order to improve multi-view video coding efficiency, multi-view video coding always exploits inter-view redundancy. The disparity between two views is caused by the locations and angles of the two respective cameras.
  • depth data is often captured or derived as well.
  • the depth data may be captured for video associated with one view or multiple views.
  • the depth information may also be derived from images of different views.
  • the depth data may be represented in lower spatial resolution than the texture data. The depth information is useful for view synthesis and inter-view prediction.
  • DCP: disparity-compensated prediction.
  • MCP: motion-compensated prediction.
  • HTM: High Efficiency Video Coding (HEVC)-based Test Model.
  • Fig. 1 illustrates an example of a 3D video coding system incorporating MCP and DCP.
  • the vector (110) used for DCP is termed as disparity vector (DV), which is analogous to the motion vector (MV) used in MCP.
  • Fig. 1 illustrates three MVs (120, 130 and 140) associated with MCP.
  • the DV of a DCP block can also be predicted by the disparity vector predictor (DVP) candidate derived from neighboring blocks or the temporal collocated blocks that also use inter-view reference pictures.
  • The derivation of inter-view motion prediction is illustrated in Fig. 2.
  • An estimated disparity vector 210 (DV) is derived for the current block (222) in the current picture (220).
  • the estimated DV (210) is used along with the current block (222) to locate the corresponding block (232) in the base-view picture (230) by combining the position of the current block and the estimated DV.
  • a condition is checked to determine whether the corresponding block (232) is Inter-coded and the Picture Order Count (POC) of the reference picture (240) is in the reference lists of the current block (222).
  • the MV (260) of the corresponding block (232) will be provided as the inter-view motion prediction for the current block (222), where the MV (260) of the corresponding block (232) is used by the current block (222) to point to a reference picture (250) in the same view as the current picture (220). Otherwise, the estimated DV itself (with vertical component set to zero) can be regarded as a 'Motion Vector Prediction (MVP)', which is actually DV Prediction (DVP).
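The condition check described above can be sketched as follows. This is an illustrative sketch only: the function name `derive_inter_view_prediction`, the dictionary fields, and the tuple representation of vectors are assumptions for illustration, not identifiers from the specification.

```python
# Hypothetical sketch of the inter-view motion prediction derivation:
# reuse the corresponding block's MV when it is Inter-coded and its
# reference POC appears in the current block's reference lists;
# otherwise fall back to the estimated DV with vertical component zero.

def derive_inter_view_prediction(corresponding_block, current_ref_pocs,
                                 estimated_dv):
    """Return (vector, is_motion_prediction) for the current block.

    `corresponding_block` is a dict with keys 'inter_coded', 'mv' and
    'ref_poc'; `estimated_dv` is an (x, y) tuple.
    """
    if (corresponding_block['inter_coded'] and
            corresponding_block['ref_poc'] in current_ref_pocs):
        # The corresponding base-view block's MV serves as the
        # inter-view motion prediction for the current block.
        return corresponding_block['mv'], True
    # Otherwise the estimated DV itself, with vertical component set to
    # zero, is regarded as the 'MVP' (actually DV prediction).
    return (estimated_dv[0], 0), False
```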
  • the estimated DV plays a critical role in the process of inter-view motion prediction.
  • the estimated DV is derived by checking whether spatial or temporal neighboring blocks have any available DV. If so, an available DV will be used as the estimated DV for the current block. If none of the neighboring blocks has any available DV, the conventional HTM adopts a technique, named DV-MCP (Disparity Vector - Motion Compensated Prediction) to provide an estimated DV.
  • the DV-MCP technique determines the estimated DV based on the depth map of the current block. If the DV-MCP method also fails to find an estimated DV, a zero DV is used as the default DV.
  • Merge mode is provided for Inter-coded blocks to allow a block to be "merged" with a neighboring block.
  • the motion information can be determined from the coded neighboring blocks.
  • a set of possible candidates in Merge mode comprises spatial neighbor candidates and a temporal candidate. Index information is transmitted to select one out of several available candidates. Therefore, only residual information for the selected block needs to be sent.
  • Skip mode is similar to Merge mode where no motion information needs to be explicitly transmitted. For a block coded in Skip mode, there is also no need to explicitly transmit the residual information. The residual information can be inferred as default values, such as zero.
  • there are two types of Inter-coded blocks: Merge and non-Merge. When an Inter-coded block is not coded in Merge/Skip mode, the Inter-coded block is coded according to Advanced Motion Vector Prediction (AMVP).
  • an inter-view candidate is introduced into the MV candidate list.
  • the inter-view candidate can be inter-view motion prediction or DV prediction depending on the existence condition for Merge coded blocks and depending on the target reference picture for AMVP coded blocks as mentioned before.
  • the inter-view candidate is placed in the first candidate position (i.e., position 0) for Merge coded blocks and the third candidate position (i.e., position 2) for AMVP coded blocks.
  • the MV candidate list is constructed in the same way regardless of whether the target reference picture of the current block corresponds to an inter-view reference picture or a temporal reference picture.
  • the MV candidate list is constructed in the same way regardless of whether the inter-view candidate of the current block refers to an inter-view reference picture or a temporal reference picture.
  • the target reference picture is specified explicitly.
  • the DV estimation process is invoked first to find an estimated DV.
  • the AMVP derivation process will fill up the candidate list, where the candidate list includes spatial candidates, temporal candidate and inter-view candidate.
  • the term candidate in this disclosure may refer to DV candidate, MV candidate or MVP candidate.
  • the spatial candidates are derived based on neighboring blocks as shown in Fig. 3A, where the neighboring blocks include the Above_Left block (B2), Above block (B1), Above_Right block (B0), Left block (A1) and Below_Left block (A0).
  • a spatial candidate is selected among B0-B2 and another spatial candidate is selected from A0 and A1.
  • the inter-view candidate is checked to determine if it refers to the target reference picture.
  • the temporal candidate is then derived based on temporal neighboring blocks as shown in Fig. 3B, where the temporal neighboring blocks include a collocated center block (BCTR) and Right_Bottom block (RB).
  • RB is checked first for the temporal candidate and, if no MV is found, the collocated center block (BCTR) is checked.
  • Fig. 3C shows a simplified flowchart of the AMVP candidate list derivation process.
  • An estimated DV is received as a possible inter-view candidate as shown in step 310.
  • the DVs from neighboring blocks are checked in step 320 through step 360 to derive spatial candidates.
  • Below_Left block is checked and, if an MV is available, the first spatial MV candidate is derived. In this case, the process continues to derive the second spatial MV candidate. Otherwise, Left block is checked as shown in step 330.
  • Above block is checked first. If an MV is available, the second spatial MV candidate is derived. Otherwise, the process further checks Above_Right block as shown in step 350.
  • the second spatial MV candidate is derived. Otherwise, it further checks Above_Left block as shown in step 360.
  • the inter-view candidate is checked to determine whether it refers to the target reference picture as shown in step 370.
  • a temporal candidate is checked in step 380. If the temporal candidate exists, it is added to the MV candidate list for AMVP. The POC scaling checking is omitted in the following discussion.
  • the spatial candidates, temporal candidate and inter-view candidate are all referred to as candidate members of the candidate list.
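The fill order of steps 310-380 can be sketched roughly as follows, assuming a simple dictionary representation of the neighboring blocks. All names are illustrative assumptions, and POC scaling is omitted, as in the discussion above.

```python
# Hypothetical sketch of the AMVP candidate list fill order: one spatial
# candidate from the left-side blocks, one from the above blocks, then
# the inter-view candidate (kept only if it refers to the target
# reference picture), then the temporal candidate.

def build_amvp_list(neighbors, inter_view_cand, temporal_cand, target_ref):
    """`neighbors` maps block names ('A0', 'A1', 'B1', 'B0', 'B2') to an
    MV tuple or None; `inter_view_cand` is a dict with 'mv' and 'ref'
    fields (or None); `temporal_cand` is an MV tuple or None."""
    candidates = []
    # First spatial candidate: Below_Left (A0), then Left (A1).
    for name in ('A0', 'A1'):
        if neighbors.get(name) is not None:
            candidates.append(neighbors[name])
            break
    # Second spatial candidate: Above (B1), Above_Right (B0),
    # Above_Left (B2).
    for name in ('B1', 'B0', 'B2'):
        if neighbors.get(name) is not None:
            candidates.append(neighbors[name])
            break
    # Inter-view candidate is added only if it refers to the target
    # reference picture (step 370).
    if inter_view_cand is not None and inter_view_cand['ref'] == target_ref:
        candidates.append(inter_view_cand['mv'])
    # Temporal candidate, if one exists (step 380).
    if temporal_cand is not None:
        candidates.append(temporal_cand)
    return candidates
```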
  • An exemplary DV estimation process 400 is shown in Fig. 4. Neighboring blocks are checked one by one as shown in steps 410-450 of Fig. 4 to determine whether a DV is available in the neighboring block. Whenever an available DV is found, there is no need to further check the remaining neighboring blocks or to use DV-MCP. The estimated DV is considered as an 'MV' referring to an inter-view reference picture. After the spatial neighboring blocks are checked, if no available DV is found, the DV of the temporal neighboring block is checked in step 460 to determine whether a DV is available. The DV-MCP method is used to derive an estimated DV if none of the spatial and temporal neighboring blocks has an available DV. In this case, the depth map of the current block is used to derive the estimated DV as shown in step 470.
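The fallback chain of this estimation process (spatial neighbors, then the temporal neighbor, then the depth-based DV-MCP result, then a default) might be sketched as follows; the function name and argument layout are assumptions for illustration.

```python
# Hypothetical sketch of the DV estimation fallback chain of Fig. 4:
# the first available DV wins, and later sources are never consulted.

def estimate_dv(spatial_dvs, temporal_dv, dv_from_depth, default_dv=(0, 0)):
    """Return the first available DV: spatial neighbors in checking
    order, then the temporal neighbor, then the DV-MCP (depth-based)
    result, then the default (a zero DV in the conventional HTM)."""
    for dv in spatial_dvs:
        if dv is not None:
            return dv
    if temporal_dv is not None:
        return temporal_dv
    if dv_from_depth is not None:
        return dv_from_depth
    return default_dv
```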
  • When the target reference picture determined for an AMVP coded block is an inter-view reference picture, both processes check the availability of the motion information (where both DV and MV are considered part of the motion information associated with a block) among the spatial and temporal neighboring blocks to determine if there is an 'MV' referring to the inter-view reference picture.
  • all the neighboring blocks will be checked a second time, in different orders, as shown in Fig. 5.
  • the estimated DV derivation process (400) will determine the estimated DV.
  • the estimated DV is used during the AMVP candidate list derivation process to fill up the needed candidates in the list.
  • the inter-view candidate is used as DVP instead of a candidate for the inter-view motion prediction when the target reference picture is an inter-view reference picture.
  • the inter-view candidate in this case is based on DVs of spatial and temporal neighboring blocks.
  • the spatial and temporal candidates of the MV candidate list for AMVP correspond to DVs of the spatial and temporal neighboring blocks pointing to the inter-view reference picture. Therefore, the inter-view candidate and the spatial/temporal candidates are derived based on the same motion information, and the inter-view candidate in this case may not be efficient.
  • inter-view candidate is placed in the first candidate position in the candidate list.
  • the inter-view candidate can be used in the inter-view motion prediction or used for DVP, depending on the existence condition.
  • For inter-view motion prediction, the inter-view candidate refers to a temporal reference picture.
  • For DVP, the candidate refers to an inter-view reference picture. It may not be efficient to place the inter-view candidate at the first candidate position when the inter-view candidate is used as DVP.
  • Embodiments according to the present invention construct a motion vector (MV) or disparity vector (DV) candidate list for a block coded in the advanced motion vector prediction (AMVP) mode or Merge mode.
  • a first candidate referring to a reference picture corresponding to an inter-view reference picture is derived.
  • a second candidate referring to a reference picture corresponding to a non-inter-view reference picture is derived.
  • An AMVP candidate list is constructed accordingly that comprises at least the first candidate and the second candidate, wherein the first candidate is set to a lower priority position in the AMVP candidate list than the second candidate.
  • Three-dimensional or multi-view video encoding or decoding is applied to the input data associated with the current block using the AMVP candidate list if the current block is coded with AMVP mode.
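The reordering described in this embodiment might be sketched as follows. The candidate representation (a dict with an `inter_view_ref` flag) and the function name are illustrative assumptions, not from the specification.

```python
# Hypothetical sketch of the embodiment: candidates referring to an
# inter-view reference picture are given lower priority (later list
# positions) than candidates referring to a non-inter-view reference
# picture, while relative order within each group is preserved.

def prioritize_amvp(candidates):
    """Each candidate is a dict with 'mv' and a boolean
    'inter_view_ref' flag; return the reordered AMVP candidate list."""
    non_inter_view = [c for c in candidates if not c['inter_view_ref']]
    inter_view = [c for c in candidates if c['inter_view_ref']]
    return non_inter_view + inter_view
```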
  • an estimated disparity vector (DV) from neighboring disparity vectors (DVs) associated with neighboring blocks of the current block is derived, wherein the estimated DV is used to derive the first candidate.
  • the estimated DV can be derived from the first available DV among the neighboring DVs.
  • candidate members of the AMVP candidate list are also generated during the process of generating the estimated DV. In this case, each piece of neighboring motion information associated with a neighboring block is searched at most once.
  • the second candidate is an inter-view candidate and is placed in the first candidate position.
  • the first candidate will not be included in the AMVP candidate list, or a default non-zero DV is used as the estimated DV, if the estimated DV is not available.
  • Information associated with the default non-zero DV can be signaled in a bitstream generated by the three-dimensional video coding or the multi-view video coding. Furthermore, the information associated with the default non-zero DV can be signaled in the bitstream for a sequence, picture, or a slice or region of the picture.
  • Embodiments according to the present invention construct a Merge candidate list for a current block coded in Merge mode.
  • a first candidate referring to a reference picture corresponding to an inter-view reference picture is derived.
  • a second candidate referring to a reference picture corresponding to a non-inter-view reference picture is derived.
  • a Merge candidate list is constructed to comprise at least the first candidate and second candidate, wherein the first candidate is set to a lower priority position in the Merge candidate list than the second candidate.
  • Three-dimensional or multi-view video encoding or decoding is applied to the input data associated with the current block using the Merge candidate list.
  • the first candidate in the Merge candidate list is placed at a selected candidate position.
  • the first candidate in the Merge candidate list can be placed at the fourth candidate position.
  • the first candidate in the Merge candidate list may correspond to disparity vector prediction and the second candidate in the Merge candidate list may correspond to temporal motion prediction or inter-view motion prediction.
  • the second candidate can be placed at the first position of the Merge candidate list if corresponding to inter-view motion prediction.
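The placement rule for the Merge candidate list described above might be sketched as follows; `place_inter_view_candidate` and its arguments are illustrative names, not identifiers from the specification.

```python
# Hypothetical sketch of the Merge-list placement rule: the inter-view
# candidate goes to the first position when it corresponds to inter-view
# motion prediction (it refers to a temporal reference picture), and to
# the fourth position (index 3) when it is used as disparity vector
# prediction (it refers to an inter-view reference picture).

def place_inter_view_candidate(merge_list, inter_view_cand,
                               refers_to_temporal_ref):
    """Return a new Merge candidate list with the inter-view candidate
    inserted at the position implied by its reference picture type."""
    position = 0 if refers_to_temporal_ref else min(3, len(merge_list))
    return merge_list[:position] + [inter_view_cand] + merge_list[position:]
```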
  • Fig. 1 illustrates an example of three-dimensional coding or multi-view coding, where both motion-compensated prediction and disparity-compensated prediction are used.
  • Fig. 2 illustrates an example of derivation process for inter-view motion prediction based on an estimated disparity vector.
  • Fig. 3A illustrates spatial neighboring blocks used for deriving candidate members of a motion vector member list for advanced motion vector prediction (AMVP).
  • Fig. 3B illustrates temporal neighboring blocks used for deriving candidate members of a motion vector member list for advanced motion vector prediction (AMVP).
  • Fig. 3C illustrates an exemplary process for filling up candidate members of a motion vector member list for advanced motion vector prediction (AMVP).
  • Fig. 4 illustrates an example of derivation process for disparity vector estimation used by advanced motion vector prediction (AMVP) to derive an inter-view candidate.
  • Fig. 5 illustrates an exemplary derivation process for an estimated disparity vector and for filling up candidate members of a motion vector member list for advanced motion vector prediction (AMVP).
  • Fig. 6 illustrates an example of a single derivation process for both the estimated disparity vector and the candidate members of a motion vector member list for advanced motion vector prediction (AMVP).
  • Fig. 7 illustrates an exemplary flowchart of a three-dimensional or multi-view coding incorporating candidate list construction for advanced motion vector prediction (AMVP) based on an embodiment of the present invention.
  • Fig. 8 illustrates an exemplary flowchart of a three-dimensional or multi-view coding incorporating candidate list construction for Merge mode based on an embodiment of the present invention.
  • Embodiments of the present invention derive a candidate list for advanced motion vector prediction (AMVP) coded blocks depending on whether the target reference picture is an inter-view reference picture.
  • Embodiments of the present invention also derive a candidate list for Merge coded blocks depending on whether the inter-view candidate refers to an inter-view reference picture.
  • a zero DV is used if neither the DV from neighboring blocks nor the DV from DV-MCP derivation process is available.
  • a zero DV between two inter-view pictures implies that the two corresponding cameras are at the same location or that the scene objects are infinitely far away.
  • the zero DV is not a good default value. Accordingly, an embodiment according to the present invention will forbid setting the DV to be zero if neither the DV from neighboring blocks nor the DV from DV-MCP derivation process is available. Instead, an embodiment according to the present invention will not use the inter-view candidate in this case.
  • When the inter-view candidate is declared as unavailable instead of using a default zero DV, it provides an opportunity for an additional candidate to be selected.
  • Another way to overcome the issue with the default zero DV is to use a global DV or a default non-zero DV, which can be transmitted in the bitstream. This global DV or non-zero DV is then treated as the default estimated DV when the DV estimation process fails.
  • the global DV may correspond to a typical or average DV.
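The fallback policy of this embodiment might be sketched as follows (all names illustrative): when estimation fails, never substitute a zero DV; use the signaled global/default non-zero DV if present, otherwise declare the inter-view candidate unavailable.

```python
# Hypothetical sketch of the embodiment: a failed DV estimation
# (estimated_dv is None) falls back to the signaled global non-zero DV
# when one is available, and otherwise yields None, meaning the
# inter-view candidate is declared unavailable rather than set to zero.

def resolve_default_dv(estimated_dv, global_dv=None):
    """Return the DV to use, or None if the inter-view candidate should
    be marked unavailable."""
    if estimated_dv is not None:
        return estimated_dv
    return global_dv  # may be None: candidate declared unavailable
```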
  • the performance of a system incorporating an embodiment of the present invention is compared with the conventional 3D video coding. If neither the DV from neighboring blocks nor the DV from DV-MCP derivation process is available, the conventional 3D video coding uses a default zero DV while the system incorporating an embodiment of the present invention forbids the use of a default zero DV.
  • the system incorporating an embodiment of the present invention achieves slightly better performance in terms of BD-Rate (0% to 0.1%) for various test video materials, where the BD-rate is a commonly used performance measurement in the field of video coding.
  • the required processing times for encoding, decoding and rendering show noticeable improvement (reduction of processing times by 4%, 3.1% and 5.4% respectively).
  • FIG. 6 illustrates an exemplary flowchart for an AMVP candidate derivation process incorporating an embodiment of the present invention.
  • the AMVP candidate derivation process incorporating an embodiment of the present invention has several benefits over the conventional approach.
  • the inefficient DV candidate with a zero vertical component is also removed.
  • DV-MCP can be used as an AMVP candidate, which may result in more coding gain. For example, if only the Above_Left neighboring block has a DV and the DV value is (dx, dy), then there will be only two non-zero candidates: (dx, dy) and (dx, 0) according to the conventional HTM. However, the candidate list derivation according to the present invention will remove (dx, 0) due to its redundancy. Therefore, one or more DV-MCP candidates may be added into the candidate list. This provides an opportunity to derive candidates that are more reasonable and efficient, since some redundant candidates are eliminated while checking the DVs of the neighboring blocks.
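The (dx, dy) example above can be made concrete with a toy sketch; `build_candidates` and its arguments are purely illustrative, not from the specification.

```python
# Toy illustration of the example: with only one neighboring DV
# (dx, dy), the conventional HTM list holds (dx, dy) plus the redundant
# DVP (dx, 0); the proposed derivation drops the redundant entry and
# admits a DV-MCP candidate into the freed slot instead.

def build_candidates(neighbor_dv, dv_mcp_dv, use_invention):
    """Return the non-zero candidates for the example scenario."""
    dx, dy = neighbor_dv
    if not use_invention:
        # Conventional HTM: neighbor DV plus the estimated DV with its
        # vertical component set to zero.
        return [(dx, dy), (dx, 0)]
    candidates = [(dx, dy)]
    if dv_mcp_dv is not None:
        # The slot freed by pruning (dx, 0) can hold a DV-MCP candidate.
        candidates.append(dv_mcp_dv)
    return candidates
```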
  • the AMVP candidate list derivation process places the inter-view candidate at the first candidate position in the candidate list according to an embodiment of the present invention.
  • the rationale behind this embodiment is that the inter-view candidate in this case is likely more efficient than DVP as used in the conventional 3D-AVC (3D Advanced Video Coding).
  • the performance of a system incorporating an embodiment of the present invention as described above is compared with the conventional 3D video coding.
  • the system incorporating an embodiment of the present invention uses the DV estimation process to fill up the MV or DV candidate list when the target reference picture is an inter-view reference picture.
  • the system incorporating an embodiment of the present invention achieves slightly better performance in terms of BD-Rate (0% to 0.2%) for various test video materials.
  • the required processing times for encoding, decoding and rendering (i.e., view synthesis) show noticeable improvement (reduction of processing times by 5.6%, 6.7% and 5.9%, respectively).
  • the DV estimation process may also forbid the use of a default zero DV when the DV estimation process fails to find an available DV.
  • an embodiment of the present invention changes the inter-view candidate position in the candidate list when the inter-view candidate refers to an inter-view reference picture.
  • In this case, the inter-view candidate is actually used for DVP and is likely not a good choice for the highest-priority position.
  • In the conventional approach, the inter-view candidate is placed at the first candidate position (i.e., the highest priority). Accordingly, an embodiment of the present invention for candidate list derivation for Merge mode places the inter-view candidate at the fourth candidate position when the inter-view candidate refers to an inter-view reference picture.
  • While the fourth candidate position is used as an example to lower the priority of the inter-view candidate in the candidate list, other lower-priority candidate positions may also be used.
  • When the inter-view candidate refers to a temporal reference picture, inter-view motion prediction is carried out. In this case, it is reasonable to keep the inter-view candidate at the first candidate position.
  • the performance of a system incorporating an embodiment of the present invention as described above is compared with the conventional 3D video coding.
  • the system incorporating an embodiment of the present invention places the inter-view candidate at the fourth candidate position when the inter-view candidate refers to an inter-view reference picture, while the system leaves the inter-view candidate at the first candidate position when the inter-view candidate refers to a temporal reference picture.
  • the system incorporating an embodiment of the present invention achieves slightly better performance in terms of BD-Rate (0% to 0.2%) for various test video materials.
  • the required processing times for encoding, decoding and rendering (i.e., view synthesis) show noticeable improvement (reduction of processing times by 0.3%, 5.3% and 1.8%, respectively).
  • a system may also combine the techniques disclosed above.
  • a system may incorporate the DV estimation process forbidding a default zero DV when the DV estimation process fails to find an available DV, the AMVP candidate list derivation process using the DV estimation process to fill up the candidate list when the target reference picture corresponds to an inter-view reference picture, and the candidate list derivation process for Merge mode that places the inter-view candidate in the fourth position in the candidate list if the inter-view candidate refers to an inter-view reference picture.
  • the performance of the system using the combined technique is compared with a conventional system.
  • the system incorporating an embodiment of the present invention achieves better performance in terms of BD-Rate (0% to 0.5%) for various test video materials.
  • the required processing times for encoding, decoding and rendering (i.e., view synthesis) show noticeable improvement (reduction of processing times by 2.8%, 6.7% and 5.1%, respectively).
  • Fig. 7 illustrates an exemplary flowchart of a three-dimensional/multi-view encoding or decoding system incorporating an MV or DV candidate list construction process according to an embodiment of the present invention for a current block coded in the advanced motion vector prediction (AMVP) mode.
  • the system receives input data associated with a current block in a dependent view as shown in step 710.
  • the input data associated with the current block corresponds to original pixel data, depth data, or other information associated with the current block (e.g., motion vector, disparity vector, motion vector difference, or disparity vector difference) to be coded.
  • the input data corresponds to the coded data associated with the current block in the dependent view.
  • the input data may be retrieved from storage such as a computer memory, buffer (RAM or DRAM) or other media.
  • the input data may also be received from a processor such as a controller, a central processing unit, a digital signal processor or electronic circuits that produce the input data.
  • a first candidate referring to a reference picture corresponding to an inter-view reference picture is derived as shown in step 720.
  • a second candidate referring to a reference picture corresponding to a non- inter-view reference picture is derived as shown in step 730.
  • An AMVP candidate list comprising the first candidate and the second candidate is constructed, wherein the first candidate is set to a lower priority position in the AMVP candidate list than the second candidate as shown I step 740.
  • Three-dimensional or multi-view video encoding or decoding is then applied to the input data associated with the current block using the AMVP candidate list as shown in step 7450.
  • Fig. 8 illustrates an exemplary flowchart of a three-dimensional/multi-view encoding or decoding system incorporating an MV or DV candidate list construction process according to an embodiment of the present invention for a current block coded in Merge mode.
  • the system receives input data associated with a current block in a dependent view as shown in step 810.
  • a first candidate referring to one reference picture corresponding to an inter-view reference picture is derived as shown in step 820.
  • a second candidate referring to one reference picture corresponding to a non-inter-view reference picture is derived as shown in step 830.
  • a Merge candidate list comprising the first candidate and the second candidate is constructed as shown in step 840, wherein the first candidate is set to a lower priority position in the Merge candidate list than the second candidate.
  • Three-dimensional or multi-view video encoding or decoding is then applied to the input data associated with the current block using the Merge candidate list as shown in step 850.
Embodiments of the present invention as described above may be implemented in various hardware, software codes, or a combination of both. For example, an embodiment of the present invention can be a circuit integrated into a video compression chip or program code integrated into video compression software to perform the processing described herein. An embodiment of the present invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processing described herein. The invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or a field programmable gate array (FPGA). These processors can be configured to perform particular tasks according to the invention by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.

Abstract

A method and apparatus for three-dimensional and multi-view video coding are disclosed, where the motion vector (MV) or disparity vector (DV) candidate list construction process for a block depends on whether the target reference picture corresponds to an inter-view reference picture or whether the inter-view candidate refers to an inter-view reference picture. In one embodiment, an MV or DV candidate list for a block coded in Merge mode is constructed, and an inter-view candidate in the MV or DV candidate list is set lower than the first candidate position if the inter-view candidate refers to an inter-view reference picture. In another embodiment, an MV or DV candidate list for a block coded in advanced motion vector prediction mode is constructed, and an inter-view candidate is set lower than the first candidate position if the inter-view candidate refers to an inter-view reference picture.

Description

METHOD AND APPARATUS OF MOTION VECTOR DERIVATION 3D
VIDEO CODING
CROSS REFERENCE TO RELATED APPLICATIONS
The present invention claims priority to U.S. Provisional Patent Application, Serial No.
61/710,064, filed on October 5, 2012, entitled "Improvements on MV Candidates". The U.S. Provisional Patent Application is hereby incorporated by reference in its entirety.
FIELD OF INVENTION
The present invention relates to video coding. In particular, the present invention relates to motion vector candidate list derivation for advanced motion vector prediction (AMVP) and Merge mode in three-dimensional video coding and multi-view video coding.
BACKGROUND OF THE INVENTION
Three-dimensional (3D) television has been a technology trend in recent years that aims to bring viewers a sensational viewing experience. Multi-view video is a technique to capture and render 3D video. The multi-view video is typically created by capturing a scene using multiple cameras simultaneously, where the multiple cameras are properly located so that each camera captures the scene from one viewpoint. The multi-view video, with a large number of video sequences associated with the views, represents a massive amount of data. Accordingly, the multi-view video will require a large storage space to store and/or a high bandwidth to transmit. Therefore, multi-view video coding techniques have been developed in the field to reduce the required storage space and the transmission bandwidth. A straightforward approach may simply apply conventional video coding techniques to each single-view video sequence independently and disregard any correlation among different views. Such straightforward techniques would result in poor coding performance. In order to improve multi-view video coding efficiency, multi-view video coding always exploits inter-view redundancy. The disparity between two views is caused by the locations and angles of the two respective cameras.
For 3D video, in addition to the conventional texture data associated with multiple views, depth data is often captured or derived as well. The depth data may be captured for video associated with one view or multiple views. The depth information may also be derived from images of different views. The depth data may be represented at a lower spatial resolution than the texture data. The depth information is useful for view synthesis and inter-view prediction.
To share the previously coded texture information of adjacent views, a technique known as disparity-compensated prediction (DCP) has been included in the HTM (High Efficiency Video Coding (HEVC)-based Test Model) software test platform as an alternative to motion-compensated prediction (MCP). MCP refers to Inter-picture prediction that uses previously coded pictures of the same view, while DCP refers to Inter-picture prediction that uses previously coded pictures of other views in the same access unit. Fig. 1 illustrates an example of a 3D video coding system incorporating MCP and DCP. The vector (110) used for DCP is termed the disparity vector (DV), which is analogous to the motion vector (MV) used in MCP. Fig. 1 illustrates three MVs (120, 130 and 140) associated with MCP. Moreover, the DV of a DCP block can also be predicted by the disparity vector predictor (DVP) candidate derived from neighboring blocks or the temporal collocated blocks that also use inter-view reference pictures.
The derivation of inter-view motion prediction is illustrated in Fig. 2. An estimated disparity vector 210 (DV) is derived for the current block (222) in the current picture (220). The estimated DV (210) is used along with the current block (222) to locate the corresponding block (232) in the base-view picture (230) by combining the position of the current block and the estimated DV. A condition is checked to determine whether the corresponding block (232) is Inter-coded and the Picture Order Count (POC) of the reference picture (240) is in the reference lists of the current block (222). If the existence condition is true, the MV (260) of the corresponding block (232) will be provided as the inter-view motion prediction for the current block (222), where the MV (260) of the corresponding block (232) is used by the current block (222) to point to a reference picture (250) in the same view as the current picture (220). Otherwise, the estimated DV itself (with its vertical component set to zero) can be regarded as a 'Motion Vector Prediction (MVP)', which is actually a DV Prediction (DVP).
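The existence check described above can be sketched as follows. This is an illustrative sketch only; the dict-based block representation and field names are hypothetical assumptions, not part of the HTM software.

```python
# Illustrative sketch of the inter-view motion prediction decision described
# above. The block representation and field names are hypothetical.

def derive_inter_view_candidate(current_block, estimated_dv, corresponding_block):
    """Return the corresponding base-view block's MV when the existence
    condition holds; otherwise use the estimated DV itself as a DVP."""
    # Existence condition: the corresponding block (located by adding the
    # estimated DV to the current block position) is Inter-coded and the POC
    # of its reference picture is in the current block's reference lists.
    if corresponding_block["inter_coded"] and \
            corresponding_block["ref_poc"] in current_block["ref_list_pocs"]:
        return corresponding_block["mv"]  # inter-view motion prediction
    # Fallback: the estimated DV with its vertical component set to zero
    # serves as a disparity vector prediction (DVP).
    return (estimated_dv[0], 0)
```

Note that the fallback zeroes only the vertical component, mirroring the "vertical component set to zero" rule stated above.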
The estimated DV plays a critical role in the process of inter-view motion prediction. In the conventional HTM, the estimated DV is derived by checking whether spatial or temporal neighboring blocks have any available DV. If so, an available DV will be used as the estimated DV for the current block. If none of the neighboring blocks has any available DV, the conventional HTM adopts a technique named DV-MCP (Disparity Vector - Motion Compensated Prediction) to provide an estimated DV. The DV-MCP technique determines the estimated DV based on the depth map of the current block. If the DV-MCP method also fails to find an estimated DV, a zero DV is used as the default DV.
In HTM, Merge mode is provided for Inter-coded blocks to allow a block to be "merged" with a neighboring block. For a selected block coded in Merge mode, the motion information can be determined from the coded neighboring blocks. A set of possible candidates in Merge mode comprises spatial neighbor candidates and a temporal candidate. Index information is transmitted to select one out of several available candidates. Therefore, only residual information for the selected block needs to be sent. Skip mode is similar to Merge mode in that no motion information needs to be explicitly transmitted. For a block coded in Skip mode, there is also no need to explicitly transmit the residual information. The residual information can be inferred as default values, such as zero. In general, there are two types of Inter-coded blocks: Merge and non-Merge. When an Inter-coded block is not coded in Merge/Skip mode, the Inter-coded block is coded according to Advanced Motion Vector Prediction (AMVP). The MV candidate lists for Merge coded blocks and AMVP coded blocks are constructed differently.
In Three-Dimensional Video Coding (3DVC), an inter-view candidate is introduced into the MV candidate list. The inter-view candidate can be inter-view motion prediction or DV prediction, depending on the existence condition for Merge coded blocks and on the target reference picture for AMVP coded blocks, as mentioned before. The inter-view candidate is placed in the first candidate position (i.e., position 0) for Merge coded blocks and the third candidate position (i.e., position 2) for AMVP coded blocks. For AMVP coded blocks, the MV candidate list is constructed in the same way regardless of whether the target reference picture of the current block corresponds to an inter-view reference picture or a temporal reference picture. Similarly, for Merge coded blocks, the MV candidate list is constructed in the same way regardless of whether the inter-view candidate of the current block refers to an inter-view reference picture or a temporal reference picture.
For AMVP, the target reference picture is specified explicitly. For the MV candidate list constructed for AMVP coded blocks, the DV estimation process is invoked first to find an estimated DV. The AMVP derivation process will fill up the candidate list, where the candidate list includes spatial candidates, a temporal candidate and an inter-view candidate. The term candidate in this disclosure may refer to a DV candidate, MV candidate or MVP candidate. The spatial candidates are derived based on neighboring blocks as shown in Fig. 3A, where the neighboring blocks include the Above_Left block (B2), Above block (B1), Above_Right block (B0), Left block (A1) and Below_Left block (A0). A spatial candidate is selected among B0 - B2 and another spatial candidate is selected from A0 and A1. After the spatial MV candidates are derived, the inter-view candidate is checked to determine if it refers to the target reference picture. The temporal candidate is then derived based on temporal neighboring blocks as shown in Fig. 3B, where the temporal neighboring blocks include a collocated center block (BCTR) and the Right_Bottom block (RB). In HTM, the Right_Bottom block (RB) is checked first for the temporal candidate and, if no MV is found, the collocated center block (BCTR) is checked.
Fig. 3C shows a simplified flowchart of the AMVP candidate list derivation process. An estimated DV is received as a possible inter-view candidate as shown in step 310. The DVs from neighboring blocks are checked in step 320 through step 360 to derive spatial candidates. As shown in step 320, the Below_Left block is checked and, if an MV is available, the first spatial MV candidate is derived. In this case, the process continues to derive the second spatial MV candidate. Otherwise, the Left block is checked as shown in step 330. For the second spatial MV candidate, the Above block is checked first. If an MV is available, the second spatial MV candidate is derived. Otherwise, the process further checks the Above_Right block as shown in step 350. If an MV is available, the second spatial MV candidate is derived. Otherwise, it further checks the Above_Left block as shown in step 360. The inter-view candidate is checked to determine whether it refers to the target reference picture as shown in step 370. A temporal candidate is checked in step 380. If the temporal candidate exists, it is added to the MV candidate list for AMVP. The POC scaling checking is omitted in the following discussion. The spatial candidates, temporal candidate and inter-view candidate are all referred to as candidate members of the candidate list.
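The flow of Fig. 3C can be sketched in Python as follows. The function, its arguments, and the dict-keyed neighbor names are illustrative assumptions for this sketch; the resulting position of the inter-view candidate depends on how many spatial candidates are available.

```python
# Illustrative sketch of the conventional AMVP candidate list derivation of
# Fig. 3C. Neighbor names follow Fig. 3A/3B; the data layout is hypothetical.

def first_available(*candidates):
    """Return the first non-None MV/DV among the given candidates."""
    for c in candidates:
        if c is not None:
            return c
    return None

def build_amvp_candidate_list(estimated_dv, dv_refers_to_target,
                              spatial_mvs, temporal_mvs):
    """spatial_mvs maps 'A0','A1','B0','B1','B2' to an MV or None;
    temporal_mvs maps 'RB','BCTR' to an MV or None."""
    candidates = []
    # Steps 320-330: first spatial candidate, Below_Left (A0) then Left (A1).
    left = first_available(spatial_mvs.get("A0"), spatial_mvs.get("A1"))
    if left is not None:
        candidates.append(left)
    # Steps 340-360: second spatial candidate, Above (B1), then
    # Above_Right (B0), then Above_Left (B2).
    above = first_available(spatial_mvs.get("B1"), spatial_mvs.get("B0"),
                            spatial_mvs.get("B2"))
    if above is not None:
        candidates.append(above)
    # Step 370: the inter-view candidate (the estimated DV of step 310) is
    # added when it refers to the target reference picture.
    if estimated_dv is not None and dv_refers_to_target:
        candidates.append(estimated_dv)
    # Step 380: temporal candidate, Right_Bottom (RB) first, then BCTR.
    temporal = first_available(temporal_mvs.get("RB"), temporal_mvs.get("BCTR"))
    if temporal is not None:
        candidates.append(temporal)
    return candidates
```

With both spatial candidates available, the inter-view candidate lands at the third position (position 2), matching the placement stated earlier for AMVP coded blocks.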
An exemplary DV estimation process (400) is shown in Fig. 4. Neighboring blocks are checked one by one as shown in steps 410-450 of Fig. 4 to determine whether a DV is available in the neighboring block. Whenever an available DV is found, there is no need to further check the remaining neighboring blocks or to use DV-MCP. The estimated DV is considered as an 'MV's referring to an inter- view reference picture. After spatial neighboring blocks are checked, if no available DV is found, the DV of the temporal neighboring block is checked in step 460 to determine whether a DV is available. DV-MCP method is used to derive an estimated DV if none of the spatial and temporal neighboring blocks has an available DV. In this case, the depth map of the current block is used to derive the estimated DV as shown in step 470.
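The fallback chain of Fig. 4 can be summarized in a short sketch. The function and argument layout are illustrative assumptions; the inputs stand in for the per-block checks performed by the actual process.

```python
# Illustrative sketch of the DV estimation fallback chain of Fig. 4.

def estimate_dv(spatial_dvs, temporal_dv, dv_from_depth_map):
    """spatial_dvs: DVs of the spatial neighbors in checking order (None for
    a neighbor without a DV); temporal_dv and dv_from_depth_map may be None."""
    # Steps 410-450: return the first available spatial neighboring DV.
    for dv in spatial_dvs:
        if dv is not None:
            return dv
    # Step 460: otherwise check the temporal neighboring block's DV.
    if temporal_dv is not None:
        return temporal_dv
    # Step 470: otherwise DV-MCP derives a DV from the current block's
    # depth map.
    if dv_from_depth_map is not None:
        return dv_from_depth_map
    # Conventional HTM falls back to a default zero DV at this point.
    return (0, 0)
```

The final zero-DV fallback is exactly the default that an embodiment described later in this disclosure forbids.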
If the target reference picture determined for an AMVP coded block is an inter-view reference picture, there might be redundancy between the neighboring block checking in the DV estimation process and the AMVP candidate list derivation process. Both processes check the availability of the motion information, where both DVs and MVs are considered part of the motion information associated with a block, among the spatial and temporal neighboring blocks to determine if there is an 'MV' referring to the inter-view reference picture. In the worst case, all the neighboring blocks will be checked a second time in different orders, as shown in Fig. 5. The estimated DV derivation process (400) will determine the estimated DV. The estimated DV is used during the AMVP candidate list derivation process to fill up the needed candidates in the list. Moreover, the inter-view candidate is used as DVP instead of a candidate for inter-view motion prediction when the target reference picture is an inter-view reference picture. The inter-view candidate in this case is based on DVs of spatial and temporal neighboring blocks. On the other hand, since the target reference picture is an inter-view reference picture, the spatial and temporal candidates of the MV candidate list for AMVP correspond to DVs of the spatial and temporal neighboring blocks pointing to the inter-view reference picture. Therefore, the inter-view candidate and the spatial/temporal candidates are derived based on the same motion information. As a result, the inter-view candidate in this case may not be efficient.
Another issue with the conventional 3D video coding as described in the conventional HTM is related to the candidate list derivation for Merge mode. In Merge mode, the inter-view candidate is placed in the first candidate position in the candidate list. As mentioned before, the inter-view candidate can be used in inter-view motion prediction or used for DVP, depending on the existence condition. In inter-view motion prediction, the inter-view candidate refers to a temporal reference picture. In the case of DVP, the DVP refers to an inter-view reference picture. It may not be efficient to place the inter-view candidate at the first candidate position when the inter-view candidate is used as DVP.
SUMMARY OF THE INVENTION
A method and apparatus for three-dimensional video coding and multi-view video coding are disclosed. Embodiments according to the present invention construct a motion vector (MV) or disparity vector (DV) candidate list for a block coded in the advanced motion vector prediction (AMVP) mode or Merge mode. A first candidate referring to a reference picture corresponding to an inter-view reference picture is derived. A second candidate referring to a reference picture corresponding to a non-inter-view reference picture is derived. An AMVP candidate list is constructed accordingly that comprises at least the first candidate and the second candidate, wherein the first candidate is set to a lower priority position in the AMVP candidate list than the second candidate. Three-dimensional or multi-view video encoding or decoding is applied to the input data associated with the current block using the AMVP candidate list if the current block is coded in AMVP mode.
In the first candidate derivation process, an estimated disparity vector (DV) is derived from neighboring disparity vectors (DVs) associated with neighboring blocks of the current block, wherein the estimated DV is used to derive the first candidate. The estimated DV can be derived from the first available DV among the neighboring DVs. In one embodiment, candidate members of the AMVP candidate list are also generated during the process of generating the estimated DV. In this case, the motion information associated with each neighboring block is searched at most once. In yet another embodiment, the second candidate is an inter-view candidate and is placed in the first candidate position. In the process of deriving the estimated DV, the first candidate will not be included in the AMVP candidate list, or a default non-zero DV is used as the estimated DV, if the estimated DV is not available. Information associated with the default non-zero DV can be signaled in a bitstream generated by the three-dimensional video coding or the multi-view video coding. Furthermore, the information associated with the default non-zero DV can be signaled in the bitstream for a sequence, a picture, or a slice or region of the picture.
Embodiments according to the present invention construct a Merge candidate list for a current block coded in Merge mode. A first candidate referring to a reference picture corresponding to an inter-view reference picture is derived. A second candidate referring to a reference picture corresponding to a non-inter-view reference picture is derived. A Merge candidate list is constructed to comprise at least the first candidate and the second candidate, wherein the first candidate is set to a lower priority position in the Merge candidate list than the second candidate. Three-dimensional or multi-view video encoding or decoding is applied to the input data associated with the current block using the Merge candidate list. In one embodiment, the first candidate in the Merge candidate list is placed at a selected candidate position. For example, the first candidate in the Merge candidate list can be placed at the fourth candidate position. The first candidate in the Merge candidate list may correspond to disparity vector prediction and the second candidate in the Merge candidate list may correspond to temporal motion prediction or inter-view motion prediction. The second candidate can be placed at the first position of the Merge candidate list if it corresponds to inter-view motion prediction.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 illustrates an example of three-dimensional coding or multi-view coding, where both motion-compensated prediction and disparity-compensated prediction are used.
Fig. 2 illustrates an example of the derivation process for inter-view motion prediction based on an estimated disparity vector.
Fig. 3A illustrates spatial neighboring blocks used for deriving candidate members of a motion vector member list for advanced motion vector prediction (AMVP).
Fig. 3B illustrates temporal neighboring blocks used for deriving candidate members of a motion vector member list for advanced motion vector prediction (AMVP).
Fig. 3C illustrates an exemplary process for filling up candidate members of a motion vector member list for advanced motion vector prediction (AMVP).
Fig. 4 illustrates an example of the derivation process for disparity vector estimation used by advanced motion vector prediction (AMVP) to derive an inter-view candidate.
Fig. 5 illustrates an exemplary derivation process for an estimated disparity vector and for filling up candidate members of a motion vector member list for advanced motion vector prediction (AMVP).
Fig. 6 illustrates an example of a single derivation process for both the estimated disparity vector and filling up candidate members of a motion vector member list for advanced motion vector prediction (AMVP).
Fig. 7 illustrates an exemplary flowchart of a three-dimensional or multi-view coding system incorporating candidate list construction for advanced motion vector prediction (AMVP) based on an embodiment of the present invention.
Fig. 8 illustrates an exemplary flowchart of a three-dimensional or multi-view coding system incorporating candidate list construction for Merge mode based on an embodiment of the present invention.
DETAILED DESCRIPTION
As described above, there are various issues with the candidate derivation in three-dimensional (3D) video coding based on the conventional High Efficiency Video Coding (HEVC) based Test Model (HTM). Embodiments of the present invention derive the candidate list for advanced motion vector prediction (AMVP) coded blocks depending on whether the target reference picture is an inter-view reference picture. Embodiments of the present invention also derive the candidate list for Merge coded blocks depending on whether the inter-view candidate refers to an inter-view reference picture.
As mentioned before, in the current estimated DV derivation process, a zero DV is used if neither the DV from neighboring blocks nor the DV from the DV-MCP derivation process is available. A zero DV between two inter-view pictures implies that the two corresponding cameras are at the same location or at an infinite distance. Apparently, the zero DV is not a good default value. Accordingly, an embodiment according to the present invention will forbid setting the DV to zero if neither the DV from neighboring blocks nor the DV from the DV-MCP derivation process is available. Instead, an embodiment according to the present invention will not use the inter-view candidate in this case. When the inter-view candidate is declared as unavailable instead of using a default zero DV, it will provide an opportunity for an additional candidate to be selected. Another way to overcome the issue with the default zero DV is to use a global DV or a default non-zero DV, which can be transmitted in the bitstream. This global DV or non-zero DV will be treated as the default estimated DV when the DV estimation process fails. The global DV may correspond to a typical average DV.
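The modified fallback can be sketched as follows; the function and its arguments are hypothetical names for this sketch, with `found_dv` standing in for the result of the neighboring-block and DV-MCP search.

```python
# Illustrative sketch of the modified fallback: instead of a default zero DV,
# either use a signaled global/non-zero default DV or declare the inter-view
# candidate unavailable.

def estimate_dv_no_zero_default(found_dv, global_dv=None):
    """found_dv: DV found by the neighboring-block/DV-MCP search, or None.
    global_dv: optional default DV signaled in the bitstream."""
    if found_dv is not None:
        return found_dv
    # When the search fails, prefer the signaled global DV if one exists ...
    if global_dv is not None:
        return global_dv
    # ... otherwise the inter-view candidate is simply unavailable (no zero
    # DV), leaving room for an additional candidate in the list.
    return None
```

Returning `None` here models "declared as unavailable": the caller skips the inter-view candidate rather than inserting a zero DV.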
The performance of a system incorporating an embodiment of the present invention is compared with the conventional 3D video coding. If neither the DV from neighboring blocks nor the DV from DV-MCP derivation process is available, the conventional 3D video coding uses a default zero DV while the system incorporating an embodiment of the present invention forbids the use of a default zero DV. The system incorporating an embodiment of the present invention achieves slightly better performance in terms of BD-Rate (0% to 0.1%) for various test video materials, where the BD-rate is a commonly used performance measurement in the field of video coding. The required processing times for encoding, decoding and rendering (i.e., view synthesis) show noticeable improvement (reduction of processing times by 4%, 3.1% and 5.4% respectively).
Regarding the AMVP candidate derivation, there might be redundancy between the neighboring block checking in the DV estimation process and the AMVP candidate list derivation process, as mentioned before. An embodiment according to the present invention solves the redundant processing issue by using the DV estimation process to fill up the MV or DV candidate list when the target reference picture is an inter-view reference picture. Therefore, the DV estimation process in this case will not return the first found neighboring DV as before. Fig. 6 illustrates an exemplary flowchart for an AMVP candidate derivation process incorporating an embodiment of the present invention. The AMVP candidate derivation process incorporating an embodiment of the present invention has several benefits over the conventional approach. In addition to the benefit of avoiding redundancy in checking the motion information (i.e., DVs and MVs) of neighboring blocks, the inefficient DV candidate with a zero vertical component is also removed. Furthermore, DV-MCP can be used as an AMVP candidate, which may result in more coding gain. For example, if only the Above_Left neighboring block has a DV and the DV value is (dx, dy), then there will be only two non-zero candidates: (dx, dy) and (dx, 0) according to the conventional HTM. However, the candidate list derivation according to the present invention will remove (dx, 0) due to its redundancy. Therefore, one or more DV-MCP candidates may be added into the candidate list. This would provide an opportunity to derive candidates that can be more reasonable and efficient, since some redundant candidates would be eliminated while checking the DVs of the neighboring blocks.
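The unified derivation can be sketched as below. The function name, arguments, and the two-candidate list size are illustrative assumptions; the point is that neighbors are visited in a single pass and duplicate candidates are suppressed, which frees slots for DV-MCP candidates.

```python
# Illustrative sketch of the unified derivation when the target reference
# picture is an inter-view reference picture: a single pass over the
# neighboring DVs fills the AMVP candidate list directly.

def build_amvp_list_inter_view_target(neighbor_dvs, dv_mcp_dvs,
                                      max_candidates=2):
    """neighbor_dvs: DVs of spatial/temporal neighbors in checking order
    (None for neighbors without a DV); dv_mcp_dvs: DVs from DV-MCP."""
    candidates = []
    # Single pass: each neighbor's motion information is searched at most
    # once, and duplicates (e.g. a redundant (dx, 0)) are not added.
    for dv in neighbor_dvs:
        if dv is not None and dv not in candidates:
            candidates.append(dv)
    # DV-MCP candidates may fill any remaining slots.
    for dv in dv_mcp_dvs:
        if len(candidates) >= max_candidates:
            break
        if dv is not None and dv not in candidates:
            candidates.append(dv)
    return candidates[:max_candidates]
```

In the (dx, dy) example above, the sketch keeps (dx, dy) only, so a DV-MCP candidate can occupy the second slot instead of the redundant (dx, 0).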
If the target reference picture is not an inter-view reference picture (referred to as a non-inter-view reference picture), i.e., the reference picture is a temporal reference picture, the AMVP candidate list derivation process places the inter-view candidate at the first candidate position in the candidate list according to an embodiment of the present invention. The rationale behind this embodiment is that the inter-view candidate in this case is likely more efficient than DVP as used in the conventional 3D-AVC (3D Advanced Video Coding).
The performance of a system incorporating an embodiment of the present invention as described above is compared with the conventional 3D video coding. The system incorporating an embodiment of the present invention uses the DV estimation process to fill up the MV or DV candidate list when the target reference picture is an inter-view reference picture. The system incorporating an embodiment of the present invention achieves slightly better performance in terms of BD-Rate (0% to 0.2%) for various test video materials. The required processing times for encoding, decoding and rendering (i.e., view synthesis) show noticeable improvement (reduction of processing times by 5.6%, 6.7% and 5.9% respectively).
For a block coded in AMVP mode, the DV estimation process may also forbid the use of a default zero DV when the DV estimation process fails to find an available DV.
For Merge mode, an embodiment of the present invention changes the inter-view candidate position in the candidate list when the inter-view candidate refers to an inter-view reference picture. When the inter-view candidate refers to an inter-view reference picture, the inter-view candidate is actually used for DVP. In this case, the inter-view candidate is likely not a good choice. However, according to the conventional candidate derivation process, the inter-view candidate is placed at the first candidate position (i.e., the highest priority). Accordingly, an embodiment of the present invention for candidate list derivation for Merge mode places the inter-view candidate at the fourth candidate position when the inter-view candidate refers to an inter-view reference picture. While the fourth candidate position is used as an example to lower the priority of the inter-view candidate in the candidate list, other lower-priority candidate positions may also be used. When the inter-view candidate refers to a temporal reference picture, inter-view motion prediction is carried out. In this case, it is reasonable to keep the inter-view candidate at the first candidate position.
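The position rule can be sketched as a small helper; the function name, the list-of-candidates representation, and the clamping behavior for short lists are assumptions of this sketch.

```python
# Illustrative sketch of the position rule for the inter-view Merge candidate:
# first position when it refers to a temporal reference picture (inter-view
# motion prediction), fourth position when it refers to an inter-view
# reference picture (DVP).

def place_inter_view_candidate(merge_list, inter_view_cand,
                               refers_to_inter_view_ref):
    """Insert the inter-view candidate into a copy of the Merge list."""
    position = 3 if refers_to_inter_view_ref else 0  # fourth vs. first slot
    position = min(position, len(merge_list))        # clamp to list length
    new_list = list(merge_list)
    new_list.insert(position, inter_view_cand)
    return new_list
```

Only the inter-view candidate's position changes; the relative order of the other Merge candidates is preserved.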
The performance of a system incorporating an embodiment of the present invention as described above is compared with the conventional 3D video coding. The system incorporating an embodiment of the present invention places the inter-view candidate at the fourth candidate position when the inter-view candidate refers to an inter-view reference picture, while the system leaves the inter-view candidate at the first candidate position when the inter-view candidate refers to a temporal reference picture. The system incorporating an embodiment of the present invention achieves slightly better performance in terms of BD-Rate (0% to 0.2%) for various test video materials. The required processing times for encoding, decoding and rendering (i.e., view synthesis) show noticeable improvement (reduction of processing times by 0.3%, 5.3% and 1.8% respectively).
A system according to the present invention may also combine the techniques disclosed above. For example, a system may incorporate the DV estimation process forbidding a default zero DV when the DV estimation process fails to find an available DV, the AMVP candidate list derivation process using the DV estimation process to fill up the candidate list when the target reference picture corresponds to an inter-view reference picture, and the candidate list derivation process for Merge mode that places the inter-view candidate in the fourth position in the candidate list if the inter-view candidate refers to an inter-view reference picture. The performance of the system using the combined techniques is compared with a conventional system. The system incorporating an embodiment of the present invention achieves better performance in terms of BD-Rate (0% to 0.5%) for various test video materials. Also, the required processing times for encoding, decoding and rendering (i.e., view synthesis) show noticeable improvement (reduction of processing times by 2.8%, 6.7% and 5.1% respectively).
Fig. 7 illustrates an exemplary flowchart of a three-dimensional/multi-view encoding or decoding system incorporating an MV or DV candidate list construction process according to an embodiment of the present invention for a current block coded in the advanced motion vector prediction (AMVP) mode. The system receives input data associated with a current block in a dependent view as shown in step 710. For encoding, the input data associated with the current block corresponds to original pixel data, depth data, or other information associated with the current block (e.g., motion vector, disparity vector, motion vector difference, or disparity vector difference) to be coded. For decoding, the input data corresponds to the coded data associated with the current block in the dependent view. The input data may be retrieved from storage such as a computer memory, buffer (RAM or DRAM) or other media. The input data may also be received from a processor such as a controller, a central processing unit, a digital signal processor or electronic circuits that produce the input data. A first candidate referring to a reference picture corresponding to an inter-view reference picture is derived as shown in step 720. A second candidate referring to a reference picture corresponding to a non-inter-view reference picture is derived as shown in step 730. An AMVP candidate list comprising the first candidate and the second candidate is constructed, wherein the first candidate is set to a lower priority position in the AMVP candidate list than the second candidate, as shown in step 740. Three-dimensional or multi-view video encoding or decoding is then applied to the input data associated with the current block using the AMVP candidate list as shown in step 750.
Fig. 8 illustrates an exemplary flowchart of a three-dimensional/multi-view encoding or decoding system incorporating an MV or DV candidate list construction process according to an embodiment of the present invention for a current block coded in Merge mode. The system receives input data associated with a current block in a dependent view as shown in step 810. A first candidate referring to one reference picture corresponding to an inter-view reference picture is derived as shown in step 820. A second candidate referring to one reference picture corresponding to a non-inter-view reference picture is derived as shown in step 830. A Merge candidate list comprising the first candidate and the second candidate is constructed as shown in step 840, wherein the first candidate is set to a lower priority position in the Merge candidate list than the second candidate. Three-dimensional or multi-view video encoding or decoding is then applied to the input data associated with the current block using the Merge candidate list as shown in step 850.
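The Merge list ordering of steps 820 through 840 can be sketched in the same way. Placing the inter-view candidate at the fourth position, as in the combined technique discussed earlier, is shown for illustration; the candidate names and list size are assumptions.

```python
def build_merge_candidate_list(spatial_candidates, inter_view_candidate,
                               temporal_candidate=None, max_size=6):
    """Construct a Merge candidate list in which the inter-view candidate
    is inserted at the fourth position (index 3) rather than the first,
    giving it lower priority than the leading spatial candidates.
    """
    candidates = [c for c in spatial_candidates if c is not None]
    if inter_view_candidate is not None:
        pos = min(3, len(candidates))  # fourth position, or the end if fewer
        candidates.insert(pos, inter_view_candidate)
    if temporal_candidate is not None:
        candidates.append(temporal_candidate)
    return candidates[:max_size]
```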
The flowcharts shown above are intended to illustrate examples of constructing an MV or DV candidate list depending on whether the target reference picture is an inter-view reference picture or whether the inter-view candidate refers to an inter-view reference picture. A person skilled in the art may modify each step, re-arrange the steps, split a step, or combine steps to practice the present invention without departing from the spirit of the present invention.
The above description is presented to enable a person of ordinary skill in the art to practice the present invention as provided in the context of a particular application and its requirements. Various modifications to the described embodiments will be apparent to those with skill in the art, and the general principles defined herein may be applied to other embodiments. Therefore, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed. In the above detailed description, various specific details are illustrated in order to provide a thorough understanding of the present invention. Nevertheless, it will be understood by those skilled in the art that the present invention may be practiced without such specific details.
Embodiments of the present invention as described above may be implemented in various hardware, software codes, or a combination of both. For example, an embodiment of the present invention can be a circuit integrated into a video compression chip or program code integrated into video compression software to perform the processing described herein. An embodiment of the present invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processing described herein. The invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or field programmable gate array (FPGA). These processors can be configured to perform particular tasks according to the invention, by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.
The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims

1. A method for three-dimensional video coding and multi-view video coding, wherein Advanced Motion Vector Prediction (AMVP) mode is selected for a current block, the method comprising:
receiving input data associated with the current block in a dependent view;
deriving a first candidate referring to one reference picture corresponding to an inter-view reference picture;
deriving a second candidate referring to one reference picture corresponding to a non-inter-view reference picture;
constructing an AMVP candidate list comprising the first candidate and the second candidate, wherein the first candidate is set to a lower priority position in the AMVP candidate list than the second candidate; and
applying three-dimensional or multi-view video encoding or decoding to the input data associated with the current block using the AMVP candidate list.
2. The method of Claim 1, wherein said deriving the first candidate comprises deriving an estimated disparity vector (DV) from neighboring disparity vectors (DVs) associated with neighboring blocks of the current block.
3. The method of Claim 2, wherein said deriving the estimated DV generates the estimated DV by searching through the neighboring DVs associated with the neighboring blocks of the current block and selects a first available DV as the estimated DV.
4. The method of Claim 2, wherein said deriving the estimated DV generates the estimated DV and also generates candidate members of the AMVP candidate list by searching through neighboring motion information associated with the neighboring blocks of the current block.
5. The method of Claim 4, wherein each of the neighboring motion information associated with one neighboring block is searched at most once.
6. The method of Claim 2, wherein if the estimated DV is not available, a default non-zero DV is used as the estimated DV.
7. The method of Claim 6, wherein information associated with the default non-zero DV is signaled in a bitstream generated by the three-dimensional video coding or the multi-view video coding.
8. The method of Claim 7, wherein the information associated with the default non-zero DV is signaled in the bitstream for a sequence, picture, or a slice or region of the picture.
9. The method of Claim 1, wherein the second candidate comprises a motion vector (MV) associated with a corresponding block in a reference view derived from a disparity vector (DV), and the second candidate is placed at the first position in the AMVP candidate list.
10. A method for three-dimensional video coding and multi-view video coding, wherein Merge mode is selected for a current block, the method comprising:
receiving input data associated with the current block in a dependent view;
deriving a first candidate referring to one reference picture corresponding to an inter-view reference picture;
deriving a second candidate referring to one reference picture corresponding to a non-inter-view reference picture;
constructing a Merge candidate list comprising the first candidate and the second candidate, wherein the first candidate is set to a lower priority position in the Merge candidate list than the second candidate; and
applying three-dimensional/multi-view video encoding or decoding to the input data associated with the current block using the Merge candidate list.
11. The method of Claim 10, wherein the first candidate in the Merge candidate list corresponds to disparity vector prediction and the second candidate in the Merge candidate list corresponds to inter-view motion prediction.
12. The method of Claim 10, wherein said deriving the first candidate comprises deriving an estimated disparity vector (DV) from neighboring disparity vectors (DVs) associated with neighboring blocks of the current block.
13. The method of Claim 12, wherein said deriving the estimated DV generates the estimated DV by searching through the neighboring DVs associated with the neighboring blocks of the current block and selects a first available DV as the estimated DV.
14. The method of Claim 10, wherein the second candidate comprises a motion vector (MV) associated with a corresponding block in a reference view derived from a disparity vector (DV), and the second candidate is placed at the first position in the Merge candidate list.
15. The method of Claim 10, wherein the second candidate comprises a motion vector (MV) derived from neighboring blocks of the current block.
16. An apparatus for three-dimensional video coding and multi-view video coding, comprising:
one or more electronic circuits, wherein said one or more electronic circuits are configured to:
receive input data associated with a current block in a dependent view;
derive a first candidate referring to one reference picture corresponding to an inter-view reference picture;
derive a second candidate referring to one reference picture corresponding to a non-inter-view reference picture;
construct an AMVP candidate list comprising the first candidate and the second candidate, wherein the first candidate is set to a lower priority position in the AMVP candidate list than the second candidate; and
apply three-dimensional/multi-view video encoding or decoding to the input data associated with the current block using the AMVP candidate list.
17. An apparatus for three-dimensional video coding and multi-view video coding, comprising:
one or more electronic circuits, wherein said one or more electronic circuits are configured to:
receive input data associated with a current block in a dependent view;
derive a first candidate referring to one reference picture corresponding to an inter-view reference picture;
derive a second candidate referring to one reference picture corresponding to a non-inter-view reference picture;
construct a Merge candidate list comprising the first candidate and the second candidate, wherein the first candidate is set to a lower priority position in the Merge candidate list than the second candidate; and
apply three-dimensional/multi-view video encoding or decoding to the input data associated with the current block using the Merge candidate list.
PCT/CN2013/082800 2012-10-05 2013-09-02 Method and apparatus of motion vector derivation 3d video coding WO2014053086A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
SG11201502627QA SG11201502627QA (en) 2012-10-05 2013-09-02 Method and apparatus of motion vector derivation 3d video coding
CN201380052367.1A CN104718760B (en) 2012-10-05 2013-09-02 Method and apparatus for three peacekeeping multi-view video codings
EP13843228.1A EP2904800A4 (en) 2012-10-05 2013-09-02 Method and apparatus of motion vector derivation 3d video coding
US14/433,328 US9924168B2 (en) 2012-10-05 2013-09-02 Method and apparatus of motion vector derivation 3D video coding

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261710064P 2012-10-05 2012-10-05
US61/710,064 2012-10-05

Publications (1)

Publication Number Publication Date
WO2014053086A1 true WO2014053086A1 (en) 2014-04-10

Family

ID=50434361

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/082800 WO2014053086A1 (en) 2012-10-05 2013-09-02 Method and apparatus of motion vector derivation 3d video coding

Country Status (5)

Country Link
US (1) US9924168B2 (en)
EP (1) EP2904800A4 (en)
CN (1) CN104718760B (en)
SG (1) SG11201502627QA (en)
WO (1) WO2014053086A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016003074A1 (en) * 2014-06-30 2016-01-07 한국전자통신연구원 Device and method for eliminating redundancy of view synthesis prediction candidate in motion merge mode
WO2016054979A1 (en) * 2014-10-09 2016-04-14 Mediatek Inc. Method of 3d or multi-view video coding including view synthesis prediction
US10194133B2 (en) 2014-06-30 2019-01-29 Electronics And Telecommunications Research Institute Device and method for eliminating redundancy of view synthesis prediction candidate in motion merge mode
CN113228635A (en) * 2018-11-13 2021-08-06 北京字节跳动网络技术有限公司 Motion candidate list construction method for intra block replication
US11909952B2 (en) 2018-06-13 2024-02-20 Panasonic Intellectual Property Corporation Of America Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SG11201405038RA (en) * 2012-04-24 2014-09-26 Mediatek Inc Method and apparatus of motion vector derivation for 3d video coding
KR102260146B1 (en) * 2014-03-31 2021-06-03 인텔렉추얼디스커버리 주식회사 Method and device for creating inter-view merge candidates
WO2016143972A1 (en) * 2015-03-11 2016-09-15 엘지전자(주) Method and apparatus for encoding/decoding video signal
CN110140355B (en) * 2016-12-27 2022-03-08 联发科技股份有限公司 Method and device for fine adjustment of bidirectional template motion vector for video coding and decoding
CN111010571B (en) 2018-10-08 2023-05-16 北京字节跳动网络技术有限公司 Generation and use of combined affine Merge candidates
KR20220009952A (en) 2019-05-21 2022-01-25 베이징 바이트댄스 네트워크 테크놀로지 컴퍼니, 리미티드 Syntax signaling in subblock merge mode
CN114097228B (en) 2019-06-04 2023-12-15 北京字节跳动网络技术有限公司 Motion candidate list with geometric partition mode coding
EP3963890A4 (en) 2019-06-04 2022-11-02 Beijing Bytedance Network Technology Co., Ltd. Motion candidate list construction using neighboring block information
CN117354507A (en) 2019-06-06 2024-01-05 北京字节跳动网络技术有限公司 Motion candidate list construction for video coding and decoding
KR20220030995A (en) 2019-07-14 2022-03-11 베이징 바이트댄스 네트워크 테크놀로지 컴퍼니, 리미티드 Transform block size limit in video coding
WO2021047631A1 (en) * 2019-09-13 2021-03-18 Beijing Bytedance Network Technology Co., Ltd. Derivation of collocated motion vectors
WO2021057996A1 (en) 2019-09-28 2021-04-01 Beijing Bytedance Network Technology Co., Ltd. Geometric partitioning mode in video coding
KR20220078600A (en) 2019-10-18 2022-06-10 베이징 바이트댄스 네트워크 테크놀로지 컴퍼니, 리미티드 Syntax constraints in parameter set signaling of subpictures

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120230408A1 (en) * 2011-03-08 2012-09-13 Minhua Zhou Parsing Friendly and Error Resilient Merge Flag Coding in Video Coding
WO2012171442A1 (en) * 2011-06-15 2012-12-20 Mediatek Inc. Method and apparatus of motion and disparity vector prediction and compensation for 3d video coding
CN102946535A (en) * 2012-10-09 2013-02-27 华为技术有限公司 Method and device for obtaining disparity vector predictors of prediction units

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4793366B2 (en) * 2006-10-13 2011-10-12 日本ビクター株式会社 Multi-view image encoding device, multi-view image encoding method, multi-view image encoding program, multi-view image decoding device, multi-view image decoding method, and multi-view image decoding program
KR101893559B1 (en) * 2010-12-14 2018-08-31 삼성전자주식회사 Apparatus and method for encoding and decoding multi-view video
US9532066B2 (en) 2011-01-21 2016-12-27 Qualcomm Incorporated Motion vector prediction
KR20120118780A (en) * 2011-04-19 2012-10-29 삼성전자주식회사 Method and apparatus for encoding and decoding motion vector of multi-view video

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120230408A1 (en) * 2011-03-08 2012-09-13 Minhua Zhou Parsing Friendly and Error Resilient Merge Flag Coding in Video Coding
WO2012171442A1 (en) * 2011-06-15 2012-12-20 Mediatek Inc. Method and apparatus of motion and disparity vector prediction and compensation for 3d video coding
CN102946535A (en) * 2012-10-09 2013-02-27 华为技术有限公司 Method and device for obtaining disparity vector predictors of prediction units

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SCHWARZ, HEIKO ET AL.: "Inter-View Prediction of Motion Data in Multiview Video Coding", 2012 PICTURE CODING SYMPOSIUM, 7 May 2012 (2012-05-07) - 9 May 2012 (2012-05-09), pages 101 - 104, XP032449839 *
See also references of EP2904800A4 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016003074A1 (en) * 2014-06-30 2016-01-07 한국전자통신연구원 Device and method for eliminating redundancy of view synthesis prediction candidate in motion merge mode
US10194133B2 (en) 2014-06-30 2019-01-29 Electronics And Telecommunications Research Institute Device and method for eliminating redundancy of view synthesis prediction candidate in motion merge mode
WO2016054979A1 (en) * 2014-10-09 2016-04-14 Mediatek Inc. Method of 3d or multi-view video coding including view synthesis prediction
US9743110B2 (en) 2014-10-09 2017-08-22 Hfi Innovation Inc. Method of 3D or multi-view video coding including view synthesis prediction
US11909952B2 (en) 2018-06-13 2024-02-20 Panasonic Intellectual Property Corporation Of America Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device
CN113228635A (en) * 2018-11-13 2021-08-06 北京字节跳动网络技术有限公司 Motion candidate list construction method for intra block replication
CN113228635B (en) * 2018-11-13 2024-01-05 北京字节跳动网络技术有限公司 Motion candidate list construction method for intra block copy

Also Published As

Publication number Publication date
CN104718760B (en) 2019-04-05
CN104718760A (en) 2015-06-17
US9924168B2 (en) 2018-03-20
EP2904800A1 (en) 2015-08-12
SG11201502627QA (en) 2015-05-28
US20150264347A1 (en) 2015-09-17
EP2904800A4 (en) 2016-05-04

Similar Documents

Publication Publication Date Title
US9924168B2 (en) Method and apparatus of motion vector derivation 3D video coding
US10021367B2 (en) Method and apparatus of inter-view candidate derivation for three-dimensional video coding
EP2944087B1 (en) Method of disparity vector derivation in three-dimensional video coding
KR101638752B1 (en) Method of constrain disparity vector derivation in 3d video coding
JP5970609B2 (en) Method and apparatus for unified disparity vector derivation in 3D video coding
US10264281B2 (en) Method and apparatus of inter-view candidate derivation in 3D video coding
US9961369B2 (en) Method and apparatus of disparity vector derivation in 3D video coding
CA2891723C (en) Method and apparatus of constrained disparity vector derivation in 3d video coding
US20150365649A1 (en) Method and Apparatus of Disparity Vector Derivation in 3D Video Coding
WO2014075615A1 (en) Method and apparatus for residual prediction in three-dimensional video coding
EP2850523A1 (en) Method and apparatus of inter-view motion vector prediction and disparity vector prediction in 3d video coding
US10341638B2 (en) Method and apparatus of depth to disparity vector conversion for three-dimensional video coding
CA2921759C (en) Method of motion information prediction and inheritance in multi-view and three-dimensional video coding

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13843228

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 14433328

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2013843228

Country of ref document: EP