OA18314A - Sub-prediction unit based advanced temporal motion vector prediction


Info

Publication number: OA18314A
Application number: OA1201700269
Authority: OAPI
Prior art keywords: candidate, block, ATMVP, motion vector, current block
Inventors: Ying Chen, Marta Karczewicz, Jianle Chen, Hongbin Liu, Li Zhang, Xiang Li
Original assignee: Qualcomm Incorporated


Abstract

In one example, a device for coding video data includes a memory configured to store video data and a video coder configured to form, for a current block of the video data, a merge candidate list including a plurality of merge candidates, the plurality of merge candidates including four spatial neighboring candidates from four neighboring blocks to the current block and, immediately following the four spatial neighboring candidates, an advanced temporal motion vector prediction (ATMVP) candidate, code an index into the merge candidate list that identifies a merge candidate of the plurality of merge candidates in the merge candidate list, and code the current block of video data using motion information of the identified merge candidate.

Description

[0001] This application claims the benefit of U.S. Provisional Application No. 62/107,933, filed January 26, 2015, the entire contents of which are hereby incorporated by reference.
TECHNICAL FIELD

[0002] This disclosure relates to video coding.
BACKGROUND

[0003] Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called "smart phones," video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), ITU-T H.265, also referred to as High Efficiency Video Coding (HEVC), and extensions of such standards. The video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video coding techniques.
[0004] Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (e.g., a video frame or a portion of a video frame) may be partitioned into video blocks, which for some techniques may also be referred to as treeblocks, coding units (CUs) and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.
[0005] Spatial or temporal prediction results in a predictive block for a block to be coded. Residual data represents pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block and the residual data indicating the difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, resulting in residual transform coefficients, which then may be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned in order to produce a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.
SUMMARY

[0006] In general, this disclosure describes techniques related to coding (e.g., encoding or decoding) of motion information for a block of video data. More particularly, a video coder (e.g., a video encoder or a video decoder) may be configured to code motion information for a current block (e.g., a current prediction unit (PU)) using advanced temporal motion vector prediction (ATMVP). ATMVP generally involves using a temporal motion vector to identify a corresponding prediction unit that is split into sub-PUs. Rather than splitting the current PU into sub-PUs, the temporal motion vector may simply identify the corresponding block that is split into sub-PUs, each having their own motion information, and the video coder may be configured to predict corresponding portions of the current block using the motion information of the respective sub-PUs. By avoiding actively splitting the current block, overhead signaling information may be reduced for the current block while still achieving the fine-grained prediction for portions of the current block that would otherwise result from splitting the current block into sub-PUs.
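To make the per-sub-PU fetch concrete, the following Python sketch shows one way this idea could be organized. It is a minimal sketch, not the codec's actual interface: the `motion_source` lookup callable, the tuple layouts, and the function name are all illustrative assumptions.

```python
def atmvp_sub_pu_motion(pu_pos, pu_size, sub_size, temporal_mv, motion_source):
    """Fetch per-sub-PU motion by shifting each sub-PU center with the
    temporal motion vector into the motion source picture.

    motion_source: hypothetical callable (x, y) -> motion info, returning
    None when the covering block is intra-coded (i.e., unavailable)."""
    (px, py), (w, h) = pu_pos, pu_size
    tvx, tvy = temporal_mv
    out = {}
    for dy in range(0, h, sub_size):
        for dx in range(0, w, sub_size):
            # Center of this sub-PU, shifted by the temporal vector.
            cx = px + dx + sub_size // 2 + tvx
            cy = py + dy + sub_size // 2 + tvy
            out[(dx, dy)] = motion_source(cx, cy)
    return out
```

Each entry of the returned map would drive motion compensation for the corresponding portion of the current block, without any splitting being signaled.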
[0007] In one example, a method of coding video data includes forming, for a current block of video data, a merge candidate list including a plurality of merge candidates, the plurality of merge candidates including four spatial neighboring candidates from four neighboring blocks to the current block and, immediately following the four spatial neighboring candidates, an advanced temporal motion vector prediction (ATMVP) candidate, coding an index into the merge candidate list that identifies a merge candidate of the plurality of merge candidates in the merge candidate list, and coding the current block of video data using motion information of the identified merge candidate.
[0008] In another example, a device for coding video data includes a memory configured to store video data and a video coder configured to form, for a current block of the video data, a merge candidate list including a plurality of merge candidates, the plurality of merge candidates including four spatial neighboring candidates from four neighboring blocks to the current block and, immediately following the four spatial neighboring candidates, an advanced temporal motion vector prediction (ATMVP) candidate, code an index into the merge candidate list that identifies a merge candidate of the plurality of merge candidates in the merge candidate list, and code the current block of video data using motion information of the identified merge candidate.
[0009] In another example, a device for coding video data includes means for forming, for a current block of video data, a merge candidate list including a plurality of merge candidates, the plurality of merge candidates including four spatial neighboring candidates from four neighboring blocks to the current block and, immediately following the four spatial neighboring candidates, an advanced temporal motion vector prediction (ATMVP) candidate, means for coding an index into the merge candidate list that identifies a merge candidate of the plurality of merge candidates in the merge candidate list, and means for coding the current block of video data using motion information of the identified merge candidate.
[0010] In another example, a computer-readable storage medium has stored thereon instructions that, when executed, cause a processor to form, for a current block of video data, a merge candidate list including a plurality of merge candidates, the plurality of merge candidates including four spatial neighboring candidates from four neighboring blocks to the current block and, immediately following the four spatial neighboring candidates, an advanced temporal motion vector prediction (ATMVP) candidate, code an index into the merge candidate list that identifies a merge candidate of the plurality of merge candidates in the merge candidate list, and code the current block of video data using motion information of the identified merge candidate.
[0011] The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
BRIEF DESCRIPTION OF DRAWINGS

[0012] FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may utilize techniques for implementing advanced temporal motion vector prediction (ATMVP).
[0013] FIG. 2 is a block diagram illustrating an example of a video encoder that may implement techniques for advanced temporal motion vector prediction (ATMVP).
[0014] FIG. 3 is a block diagram illustrating an example of a video decoder that may implement techniques for advanced temporal motion vector prediction (ATMVP).
[0015] FIG. 4 is a conceptual diagram illustrating spatial neighboring candidates in High Efficiency Video Coding (HEVC).
[0016] FIG. 5 is a conceptual diagram illustrating temporal motion vector prediction (TMVP) in HEVC.
[0017] FIG. 6 is a conceptual diagram illustrating an example prediction structure for 3D-HEVC.
[0018] FIG. 7 is a conceptual diagram illustrating sub-PU based inter-view motion prediction in 3D-HEVC.
[0019] FIG. 8 is a conceptual diagram illustrating sub-PU motion prediction from a reference picture.
[0020] FIG. 9 is a conceptual diagram illustrating relevant pictures in ATMVP (similar to TMVP).
[0021] FIG. 10 is a flowchart illustrating an example method for adding an ATMVP candidate to a candidate list during an encoding process in accordance with the techniques of this disclosure.
[0022] FIG. 11 is a flowchart illustrating an example method for adding an ATMVP candidate to a candidate list during a decoding process in accordance with the techniques of this disclosure.
DETAILED DESCRIPTION

[0023] In general, this disclosure is related to motion vector prediction in video codecs.
More specifically, advanced temporal motion vector prediction is achieved by collecting the motion vectors in a sub-block (sub-PU) level for a given block (prediction unit).
[0024] Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions. One joint draft of MVC is described in "Advanced video coding for generic audiovisual services," ITU-T Recommendation H.264, March 2010.
[0025] In addition, there is a newly developed video coding standard, namely High Efficiency Video Coding (HEVC), developed by the Joint Collaboration Team on Video Coding (JCT-VC) of ITU-T Video Coding Experts Group (VCEG) and ISO/IEC Motion Picture Experts Group (MPEG). A recent draft of HEVC is available from http://phenix.int-evry.fr/jct/doc_end_user/documents/12_Geneva/wg11/JCTVC-L1003-v34.zip. The HEVC standard is also presented jointly in Recommendation ITU-T H.265 and International Standard ISO/IEC 23008-2, both entitled "High efficiency video coding," and both published October 2014.
[0026] Motion information: For each block, a set of motion information can be available. A set of motion information contains motion information for forward and backward prediction directions. Here, forward and backward prediction directions are the two prediction directions of a bi-directional prediction mode, and the terms "forward" and "backward" do not necessarily have a geometric meaning; instead, they correspond to reference picture list 0 (RefPicList0) and reference picture list 1 (RefPicList1) of a current picture. When only one reference picture list is available for a picture or slice, only RefPicList0 is available and the motion information of each block of a slice is always forward.
[0027] For each prediction direction, the motion information must contain a reference index and a motion vector. In some cases, for simplicity, a motion vector itself may be referred to in a way that it is assumed that it has an associated reference index. A reference index is used to identify a reference picture in the current reference picture list (RefPicList0 or RefPicList1). A motion vector has a horizontal and a vertical component.
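As a minimal illustration of this structure (the class and field names are assumptions for illustration, not drawn from any codec API), a set of motion information could be modeled as follows:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class DirectionalMotion:
    ref_idx: int            # index into RefPicList0 or RefPicList1
    mv: Tuple[int, int]     # (horizontal, vertical) components

@dataclass
class MotionSet:
    """One set of motion information: up to two prediction directions."""
    list0: Optional[DirectionalMotion] = None   # "forward" (RefPicList0)
    list1: Optional[DirectionalMotion] = None   # "backward" (RefPicList1)

# A uni-directionally predicted block: only the forward direction is present.
example = MotionSet(list0=DirectionalMotion(ref_idx=0, mv=(5, -2)))
```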
[0028] Picture order count (POC) is widely used in video coding standards to identify the display order of a picture. Although there are cases in which two pictures within one coded video sequence may have the same POC value, this typically does not happen within a coded video sequence. When multiple coded video sequences are present in a bitstream, pictures with the same value of POC may be closer to each other in terms of decoding order. POC values of pictures are typically used for reference picture list construction, derivation of reference picture sets as in HEVC, and motion vector scaling.
[0029] Macroblock (MB) structure in Advanced Video Coding (AVC) (H.264): In H.264/AVC, each inter macroblock (MB) may be partitioned into four different ways:
• One 16x16 MB partition
• Two 16x8 MB partitions
• Two 8x16 MB partitions
• Four 8x8 MB partitions
[0030] Different MB partitions in one MB may have different reference index values for each direction (RefPicList0 or RefPicList1).
[0031] When an MB is not partitioned into four 8x8 MB partitions, it has only one motion vector for each MB partition in each direction.
[0032] When an MB is partitioned into four 8x8 MB partitions, each 8x8 MB partition can be further partitioned into sub-blocks, each of which can have a different motion vector in each direction. There are four different ways to get sub-blocks from an 8x8 MB partition:
• One 8x8 sub-block
• Two 8x4 sub-blocks
• Two 4x8 sub-blocks
• Four 4x4 sub-blocks
[0033] Each sub-block can have a different motion vector in each direction. Therefore, a motion vector is present at a level equal to or higher than the sub-block level.
[0034] Temporal direct mode in AVC: In AVC, temporal direct mode can be enabled at either the MB or the MB partition level for skip or direct mode in B slices. For each MB partition, the motion vectors of the block co-located with the current MB partition in RefPicList1[0] of the current block are used to derive the motion vectors. Each motion vector in the co-located block is scaled based on POC distances.
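The POC-distance scaling can be sketched as follows. This is a simplified illustration under stated assumptions: real codecs use fixed-point arithmetic with clipping rather than floating point, and the function name is hypothetical.

```python
def scale_mv(mv_col, poc_cur, poc_cur_ref, poc_col, poc_col_ref):
    """Scale a co-located motion vector by the ratio of POC distances:
    tb = distance from the current picture to its reference,
    td = distance from the co-located picture to its reference."""
    tb = poc_cur - poc_cur_ref
    td = poc_col - poc_col_ref
    if td == 0:
        return mv_col
    return (round(mv_col[0] * tb / td), round(mv_col[1] * tb / td))

# e.g., a co-located MV of (8, 4) over td = 4, scaled to tb = 2, becomes (4, 2).
```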
[0035] Spatial direct mode in AVC: In AVC, a direct mode can also predict motion information from the spatial neighbors.
[0036] Coding Unit (CU) Structure in High Efficiency Video Coding (HEVC): In HEVC, the largest coding unit in a slice is called a coding tree block (CTB) or coding tree unit (CTU). A CTB contains a quad-tree, the nodes of which are coding units.
[0037] The size of a CTB can range from 16x16 to 64x64 in the HEVC main profile (although technically 8x8 CTB sizes can be supported). A coding unit (CU) can be the same size as a CTB or as small as 8x8. Each coding unit is coded with one mode. When a CU is inter coded, it may be further partitioned into 2 or 4 prediction units (PUs) or become just one PU when further partitioning does not apply. When two PUs are present in one CU, they can be half-size rectangles or two rectangles with ¼ or ¾ the size of the CU.
[0038] When the CU is inter coded, one set of motion information is present for each PU. In addition, each PU is coded with a unique inter-prediction mode to derive the set of motion information.
[0039] Motion prediction in HEVC: In the HEVC standard, there are two inter prediction modes, named merge (skip is considered as a special case of merge) and advanced motion vector prediction (AMVP) modes, respectively, for a prediction unit (PU).
[0040] In either AMVP or merge mode, a motion vector (MV) candidate list is maintained for multiple motion vector predictors. The motion vector(s), as well as reference indices in the merge mode, of the current PU are generated by taking one candidate from the MV candidate list.
[0041] The MV candidate list contains up to 5 candidates for the merge mode and only two candidates for the AMVP mode. A merge candidate may contain a set of motion information, e.g., motion vectors corresponding to both reference picture lists (list 0 and list 1) and the reference indices. If a merge candidate is identified by a merge index, the reference pictures used for the prediction of the current blocks, as well as the associated motion vectors, are determined. However, under AMVP mode, for each potential prediction direction from either list 0 or list 1, a reference index needs to be explicitly signaled, together with an MVP index to the MV candidate list, since the AMVP candidate contains only a motion vector. In AMVP mode, the predicted motion vectors can be further refined.
[0042] As can be seen above, a merge candidate corresponds to a full set of motion information while an AMVP candidate contains just one motion vector for a specific prediction direction and reference index.
[0043] The candidates for both modes are derived similarly from the same spatial and temporal neighboring blocks.
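The contrast between the two modes can be sketched from a decoder's point of view. This is a hedged sketch: the function names, tuple layouts, and list contents are illustrative assumptions rather than the HEVC decoding process itself.

```python
def decode_merge(merge_list, merge_idx):
    """Merge mode: the chosen candidate supplies the full set of motion
    information (MVs plus reference indices for list 0 and list 1)."""
    return merge_list[merge_idx]

def decode_amvp(amvp_list, mvp_idx, mvd, ref_idx):
    """AMVP mode: the candidate is only an MV predictor; the reference
    index and a motion vector difference (MVD) are explicitly signaled,
    and the final MV is predictor + MVD."""
    px, py = amvp_list[mvp_idx]
    return (px + mvd[0], py + mvd[1]), ref_idx
```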
[0044] The sub-PU design for a 2D video codec, especially the one related to advanced TMVP, may encounter the following problems. A sub-PU based temporal motion vector prediction process can be achieved by defining such a process as an additional candidate, namely an ATMVP candidate. However, there are the following design issues for such an ATMVP candidate:
1. Although an ATMVP candidate may be inserted as an additional candidate, like TMVP, the position of such an ATMVP candidate, as well as its interaction with the TMVP candidate to achieve higher coding efficiency, is not known.
2. It is not clear how to define the availability of the ATMVP candidate; it would be of high complexity if all motion vectors of all sub-PUs were to be checked to determine whether an ATMVP candidate is available and thus can be inserted into the merge candidate list.
3. A pruning process with an ATMVP candidate may be needed; however, pruning with such a candidate may be complicated.
4. Various other design details for the ATMVP candidate to achieve the best trade-off between coding efficiency and complexity remain unknown.
[0045] FIG. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may utilize techniques for implementing advanced temporal motion vector prediction (ATMVP). As shown in FIG. 1, system 10 includes a source device 12 that provides encoded video data to be decoded at a later time by a destination device 14. In particular, source device 12 provides the video data to destination device 14 via a computer-readable medium 16. Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, so-called "smart" pads, televisions, cameras, display devices, digital media players, video gaming consoles, video streaming devices, or the like. In some cases, source device 12 and destination device 14 may be equipped for wireless communication.
[0046] Destination device 14 may receive the encoded video data to be decoded via computer-readable medium 16. Computer-readable medium 16 may comprise any type of medium or device capable of moving the encoded video data from source device 12 to destination device 14. In one example, computer-readable medium 16 may comprise a communication medium to enable source device 12 to transmit encoded video data directly to destination device 14 in real-time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 12 to destination device 14.
[0047] In some examples, encoded data may be output from output interface 22 to a storage device. Similarly, encoded data may be accessed from the storage device by input interface. The storage device may include any of a variety of distributed or locally accessed data storage media such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In a further example, the storage device may correspond to a file server or another intermediate storage device that may store the encoded video generated by source device 12. Destination device 14 may access stored video data from the storage device via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to the destination device 14. Example file servers include a web server (e.g., for a website), an FTP server, network attached storage (NAS) devices, or a local disk drive. Destination device 14 may access the encoded video data through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device may be a streaming transmission, a download transmission, or a combination thereof.
[0048] The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet streaming video transmissions, such as dynamic adaptive streaming over HTTP (DASH), digital video that is encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
[0049] In the example of FIG. 1, source device 12 includes video source 18, video encoder 20, and output interface 22. Destination device 14 includes input interface 28, video decoder 30, and display device 32. In accordance with this disclosure, video encoder 20 of source device 12 may be configured to apply the techniques for advanced temporal motion vector prediction (ATMVP). In other examples, a source device and a destination device may include other components or arrangements. For example, source device 12 may receive video data from an external video source 18, such as an external camera. Likewise, destination device 14 may interface with an external display device, rather than including an integrated display device.
[0050] The illustrated system 10 of FIG. 1 is merely one example. Techniques for advanced temporal motion vector prediction (ATMVP) may be performed by any digital video encoding and/or decoding device. Although generally the techniques of this disclosure are performed by a video encoding device, the techniques may also be performed by a video encoder/decoder, typically referred to as a "CODEC." Moreover, the techniques of this disclosure may also be performed by a video preprocessor. Source device 12 and destination device 14 are merely examples of such coding devices in which source device 12 generates coded video data for transmission to destination device 14. In some examples, devices 12, 14 may operate in a substantially symmetrical manner such that each of devices 12, 14 includes video encoding and decoding components. Hence, system 10 may support one-way or two-way video transmission between video devices 12, 14, e.g., for video streaming, video playback, video broadcasting, or video telephony.
[0051] Video source 18 of source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed interface to receive video from a video content provider. As a further alternative, video source 18 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In some cases, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. As mentioned above, however, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications. In each case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video information may then be output by output interface 22 onto a computer-readable medium 16.
[0052] Computer-readable medium 16 may include transient media, such as a wireless broadcast or wired network transmission, or storage media (that is, non-transitory storage media), such as a hard disk, flash drive, compact disc, digital video disc, Blu-ray disc, or other computer-readable media. In some examples, a network server (not shown) may receive encoded video data from source device 12 and provide the encoded video data to destination device 14, e.g., via network transmission. Similarly, a computing device of a medium production facility, such as a disc stamping facility, may receive encoded video data from source device 12 and produce a disc containing the encoded video data. Therefore, computer-readable medium 16 may be understood to include one or more computer-readable media of various forms, in various examples.
[0053] Input interface 28 of destination device 14 receives information from computer-readable medium 16. The information of computer-readable medium 16 may include syntax information defined by video encoder 20, which is also used by video decoder 30, that includes syntax elements that describe characteristics and/or processing of blocks and other coded units, e.g., GOPs. Display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.
[0054] Video encoder 20 and video decoder 30 may operate according to a video coding standard, such as the High Efficiency Video Coding (HEVC) standard, extensions to the HEVC standard, or subsequent standards, such as ITU-T H.266. Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4, Part 10, Advanced Video Coding (AVC), or extensions of such standards. The techniques of this disclosure, however, are not limited to any particular coding standard. Other examples of video coding standards include MPEG-2 and ITU-T H.263. Although not shown in FIG. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or separate data streams. If applicable, MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP).
[0055] Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable, non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device.
[0056] The JCT-VC is working on development of the HEVC standard. The HEVC standardization efforts are based on an evolving model of a video coding device referred to as the HEVC Test Model (HM). The HM presumes several additional capabilities of video coding devices relative to existing devices according to, e.g., ITU-T H.264/AVC. For example, whereas H.264 provides nine intra-prediction encoding modes, the HEVC HM may provide as many as thirty-three intra-prediction encoding modes.
[0057] In general, the working model of the HM describes that a video frame or picture may be divided into a sequence of treeblocks or largest coding units (LCUs) that include both luma and chroma samples. Syntax data within a bitstream may define a size for the LCU, which is the largest coding unit in terms of the number of pixels. A slice includes a number of consecutive treeblocks in coding order. A video frame or picture may be partitioned into one or more slices. Each treeblock may be split into coding units (CUs) according to a quadtree. In general, a quadtree data structure includes one node per CU, with a root node corresponding to the treeblock. If a CU is split into four sub-CUs, the node corresponding to the CU includes four leaf nodes, each of which corresponds to one of the sub-CUs.
[0058] Each node of the quadtree data structure may provide syntax data for the corresponding CU. For example, a node in the quadtree may include a split flag, indicating whether the CU corresponding to the node is split into sub-CUs. Syntax elements for a CU may be defined recursively, and may depend on whether the CU is split into sub-CUs. If a CU is not split further, it is referred to as a leaf-CU. In this disclosure, four sub-CUs of a leaf-CU will also be referred to as leaf-CUs even if there is no explicit splitting of the original leaf-CU. For example, if a CU at 16x16 size is not split further, the four 8x8 sub-CUs will also be referred to as leaf-CUs although the 16x16 CU was never split.
[0059] A CU has a similar purpose as a macroblock of the H.264 standard, except that a CU does not have a size distinction. For example, a treeblock may be split into four child nodes (also referred to as sub-CUs), and each child node may in turn be a parent node and be split into another four child nodes. A final, unsplit child node, referred to as a leaf node of the quadtree, comprises a coding node, also referred to as a leaf-CU. Syntax data associated with a coded bitstream may define a maximum number of times a treeblock may be split, referred to as a maximum CU depth, and may also define a minimum size of the coding nodes. Accordingly, a bitstream may also define a smallest coding unit (SCU). This disclosure uses the term "block" to refer to any of a CU, PU, or TU, in the context of HEVC, or similar data structures in the context of other standards (e.g., macroblocks and sub-blocks thereof in H.264/AVC).
[0060] A CU includes a coding node and prediction units (PUs) and transform units (TUs) associated with the coding node. A size of the CU corresponds to a size of the coding node and must be square in shape. The size of the CU may range from 8x8 pixels up to the size of the treeblock with a maximum of 64x64 pixels or greater. Each CU may contain one or more PUs and one or more TUs. Syntax data associated with a CU may describe, for example, partitioning of the CU into one or more PUs. Partitioning modes may differ between whether the CU is skip or direct mode encoded, intra-prediction mode encoded, or inter-prediction mode encoded. PUs may be partitioned to be non-square in shape. Syntax data associated with a CU may also describe, for example, partitioning of the CU into one or more TUs according to a quadtree. A TU can be square or non-square (e.g., rectangular) in shape.
[0061] The HEVC standard allows for transformations according to TUs, which may be different for different CUs. The TUs are typically sized based on the size of PUs within a given CU defined for a partitioned LCU, although this may not always be the case. The TUs are typically the same size or smaller than the PUs. In some examples, residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure known as a residual quad tree (RQT). The leaf nodes of the RQT may be referred to as transform units (TUs). Pixel difference values associated with the TUs may be transformed to produce transform coefficients, which may be quantized.
[0062] A leaf-CU may include one or more prediction units (PUs). In general, a PU represents a spatial area corresponding to all or a portion of the corresponding CU, and may include data for retrieving a reference sample for the PU. Moreover, a PU includes data related to prediction. For example, when the PU is intra-mode encoded, data for the PU may be included in a residual quadtree (RQT), which may include data describing an intra-prediction mode for a TU corresponding to the PU. As another example, when the PU is inter-mode encoded, the PU may include data defining one or more motion vectors for the PU. The data defining the motion vector for a PU may describe, for example, a horizontal component of the motion vector, a vertical component of the motion vector, a resolution for the motion vector (e.g., one-quarter pixel precision or one-eighth pixel precision), a reference picture to which the motion vector points, and/or a reference picture list (e.g., List 0, List 1, or List C) for the motion vector.
[0063] A leaf-CU having one or more PUs may also include one or more transform units (TUs). The transform units may be specified using an RQT (also referred to as a TU quadtree structure), as discussed above. For example, a split flag may indicate whether a leaf-CU is split into four transform units. Then, each transform unit may be split further into further sub-TUs. When a TU is not split further, it may be referred to as a leaf-TU. Generally, for intra coding, all the leaf-TUs belonging to a leaf-CU share the same intra prediction mode. That is, the same intra-prediction mode is generally applied to calculate predicted values for all TUs of a leaf-CU. For intra coding, a video encoder may calculate a residual value for each leaf-TU using the intra prediction mode, as a difference between the portion of the CU corresponding to the TU and the original block. A TU is not necessarily limited to the size of a PU. Thus, TUs may be larger or smaller than a PU. For intra coding, a PU may be collocated with a corresponding leaf-TU for the same CU. In some examples, the maximum size of a leaf-TU may correspond to the size of the corresponding leaf-CU.
[0064] Moreover, TUs of leaf-CUs may also be associated with respective quadtree data structures, referred to as residual quadtrees (RQTs). That is, a leaf-CU may include a quadtree indicating how the leaf-CU is partitioned into TUs. The root node of a TU quadtree generally corresponds to a leaf-CU, while the root node of a CU quadtree generally corresponds to a treeblock (or LCU). TUs of the RQT that are not split are referred to as leaf-TUs. In general, this disclosure uses the terms CU and TU to refer to leaf-CU and leaf-TU, respectively, unless noted otherwise.
[0065] A video sequence typically includes a series of video frames or pictures. A group of pictures (GOP) generally comprises a series of one or more of the video pictures. A GOP may include syntax data in a header of the GOP, a header of one or more of the pictures, or elsewhere, that describes a number of pictures included in the GOP. Each slice of a picture may include slice syntax data that describes an encoding mode for the respective slice. Video encoder 20 typically operates on video blocks within individual video slices in order to encode the video data. A video block may correspond to a coding node within a CU. The video blocks may have fixed or varying sizes, and may differ in size according to a specified coding standard.
[0066] As an example, the HM supports prediction in various PU sizes. Assuming that the size of a particular CU is 2Nx2N, the HM supports intra-prediction in PU sizes of 2Nx2N or NxN, and inter-prediction in symmetric PU sizes of 2Nx2N, 2NxN, Nx2N, or NxN. The HM also supports asymmetric partitioning for inter-prediction in PU sizes of 2NxnU, 2NxnD, nLx2N, and nRx2N. In asymmetric partitioning, one direction of a CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% partition is indicated by an "n" followed by an indication of "Up," "Down," "Left," or "Right." Thus, for example, "2NxnU" refers to a 2Nx2N CU that is partitioned horizontally with a 2Nx0.5N PU on top and a 2Nx1.5N PU on bottom.
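The following sketch tabulates these PU shapes for a 2Nx2N CU. The function name and dictionary layout are illustrative assumptions, not HM code; only the shape arithmetic comes from the description above.

```python
def pu_sizes(mode, n):
    """Return the (width, height) of the PUs of a 2Nx2N CU for the HM
    partition modes described above; n is N in samples."""
    two_n = 2 * n
    table = {
        "2Nx2N": [(two_n, two_n)],
        "NxN":   [(n, n)] * 4,
        "2NxN":  [(two_n, n)] * 2,
        "Nx2N":  [(n, two_n)] * 2,
        # Asymmetric modes: a 25% / 75% split of one direction.
        "2NxnU": [(two_n, n // 2), (two_n, two_n - n // 2)],
        "2NxnD": [(two_n, two_n - n // 2), (two_n, n // 2)],
        "nLx2N": [(n // 2, two_n), (two_n - n // 2, two_n)],
        "nRx2N": [(two_n - n // 2, two_n), (n // 2, two_n)],
    }
    return table[mode]

# e.g., pu_sizes("2NxnU", 16) -> [(32, 8), (32, 24)]:
# a 2Nx0.5N PU on top and a 2Nx1.5N PU on bottom.
```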
[0067] In this disclosure, "NxN" and "N by N" may be used interchangeably to refer to the pixel dimensions of a video block in terms of vertical and horizontal dimensions, e.g., 16x16 pixels or 16 by 16 pixels. In general, a 16x16 block will have 16 pixels in a vertical direction (y = 16) and 16 pixels in a horizontal direction (x = 16). Likewise, an NxN block generally has N pixels in a vertical direction and N pixels in a horizontal direction, where N represents a nonnegative integer value. The pixels in a block may be arranged in rows and columns. Moreover, blocks need not necessarily have the same number of pixels in the horizontal direction as in the vertical direction. For example, blocks may comprise NxM pixels, where M is not necessarily equal to N.
[0068] Following intra-predictive or inter-predictive coding using the PUs of a CU, video encoder 20 may calculate residual data for the TUs of the CU. The PUs may comprise syntax data describing a method or mode of generating predictive pixel data in the spatial domain (also referred to as the pixel domain) and the TUs may comprise coefficients in the transform domain following application of a transform, e.g., a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform to residual video data. The residual data may correspond to pixel differences between pixels of the unencoded picture and prediction values corresponding to the PUs. Video encoder 20 may form the TUs including the residual data for the CU, and then transform the TUs to produce transform coefficients for the CU.
[0069] Following any transforms to produce transform coefficients, video encoder 20 may perform quantization of the transform coefficients. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the coefficients, providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m.
[0070] Following quantization, the video encoder may scan the transform coefficients, producing a one-dimensional vector from the two-dimensional matrix including the quantized transform coefficients. The scan may be designed to place higher energy (and therefore lower frequency) coefficients at the front of the array and to place lower energy (and therefore higher frequency) coefficients at the back of the array. In some examples, video encoder 20 may utilize a predefined scan order to scan the quantized transform coefficients to produce a serialized vector that can be entropy encoded. In other examples, video encoder 20 may perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may entropy encode the one-dimensional vector, e.g., according to context-adaptive variable length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), Probability Interval Partitioning Entropy (PIPE) coding or another entropy encoding methodology. Video encoder 20 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 30 in decoding the video data.
[0071] To perform CABAC, video encoder 20 may assign a context within a context model to a symbol to be transmitted. The context may relate to, for example, whether neighboring values of the symbol are non-zero or not. To perform CAVLC, video encoder 20 may select a variable length code for a symbol to be transmitted. Codewords in VLC may be constructed such that relatively shorter codes correspond to more probable symbols, while longer codes correspond to less probable symbols. In this way, the use of VLC may achieve a bit savings over, for example, using equal-length codewords for each symbol to be transmitted. The probability determination may be based on a context assigned to the symbol.
[0072] In accordance with the techniques of this disclosure, video encoder 20 and video decoder 30 may be configured to perform any or all of the following techniques shown in the enumerated list below, alone or in any combination:
1. Position of the ATMVP candidate, if inserted, e.g., into a merge candidate list:
a. Assume the spatial candidates and TMVP candidate are inserted into a merge candidate list in a certain order. The ATMVP candidate may be inserted in any relatively fixed position of those candidates.
i. In one alternative, for example, the ATMVP candidate can be inserted in the merge candidate list after the first two spatial candidates, e.g., A1 and B1;
ii. In one alternative, for example, the ATMVP candidate can be inserted after the first three spatial candidates, e.g., A1, B1, and B0;
iii. In one alternative, for example, the ATMVP candidate can be inserted after the first four candidates, e.g., A1, B1, B0, and A0.
iv. In one alternative, for example, the ATMVP candidate can be inserted right before the TMVP candidate.
v. In one alternative, for example, the ATMVP candidate can be inserted right after the TMVP candidate.
b. Alternatively, the position of the ATMVP candidate in the candidate list can be signaled in the bitstream. The positions of other candidates, including the TMVP candidate, can be additionally signaled.
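As a concrete illustration of one of the fixed-position options in bullet 1 (insertion immediately after the first four spatial candidates, as in the abstract), the following sketch builds such a list. The function name, the `None`-for-unavailable convention, and the candidate cap are assumptions for illustration only.

```python
def build_merge_list(spatial, atmvp, tmvp, max_cands=6):
    """Insert the ATMVP candidate after the first four spatial candidates
    (A1, B1, B0, A0 in HEVC checking order) and before the TMVP candidate."""
    cands = []
    for c in spatial[:4]:          # A1, B1, B0, A0; None means unavailable
        if c is not None:
            cands.append(c)
    if atmvp is not None:          # ATMVP immediately after the spatial ones
        cands.append(atmvp)
    if tmvp is not None:
        cands.append(tmvp)
    return cands[:max_cands]
```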
2. Availability check of the ATMVP candidate can apply by accessing just one set of motion information. When such a set of information is unavailable, e.g., one block being intra-coded, the whole ATMVP candidate is considered as unavailable. In that case, the ATMVP candidate will not be inserted into the merge list.
a. A center position, or a center sub-PU, is used purely to check the availability of the ATMVP candidate. When a center sub-PU is used, the center sub-PU is chosen to be the one that covers the center position (e.g., the center 3 position, with a relative coordinate of (W/2, H/2) to the top-left sample of the PU, wherein WxH is the size of the PU). Such a position or center sub-PU may be used together with the temporal vector to identify a corresponding block in the motion source picture. A set of motion information from the block that covers the center position of the corresponding block is identified.
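A minimal sketch of this single-lookup availability check, under the same assumed `motion_source` interface as earlier (a hypothetical callable returning `None` for intra-coded blocks):

```python
def atmvp_available(pu_pos, pu_size, temporal_mv, motion_source):
    """Check ATMVP availability from a single set of motion information:
    the block covering the PU center (W/2, H/2), shifted by the temporal
    vector into the motion source picture."""
    (px, py), (w, h) = pu_pos, pu_size
    cx = px + w // 2 + temporal_mv[0]
    cy = py + h // 2 + temporal_mv[1]
    center_motion = motion_source(cx, cy)
    return center_motion is not None, center_motion
```

If the check fails, the whole ATMVP candidate is skipped and no per-sub-PU motion is fetched, which keeps the availability test cheap.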
3. Representative set of motion information for the ATMVP coded PU from a sub-PU.
a. To form the ATMVP candidate, the representative set of motion information is first formed.
b. Such a representative set of motion information may be derived from a fixed position or a fixed sub-PU. It can be chosen in the same way as that of the set of motion information used to determine the availability of the ATMVP candidate, as described in bullet #2.
c. When a sub-PU has identified its own set of motion information and it is unavailable, it is set to be equal to the representative set of motion information.
d. If the representative set of motion information is set to be that of a sub-PU, no additional motion storage is needed at the decoder side for the current CTU or slice in the worst case scenario.
e. Such a representative set of motion information is used in all scenarios when the decoding process requires the whole PU to be represented by one set of motion information, including pruning, such that the process is used to generate combined bi-predictive merging candidates.
4. The ATMVP candidate is pruned with the TMVP candidate, and interactions between TMVP and ATMVP can be considered; detailed techniques are listed below:
a. The pruning of a sub-PU based candidate, e.g., an ATMVP candidate, with a normal candidate may be conducted by using the representative set of motion information (as in bullet #3) for such a sub-PU based candidate. If such a set of motion information is the same as that of a normal merge candidate, the two candidates are considered as the same.
b. Alternatively, in addition, a check is performed to determine whether the ATMVP contains multiple different sets of motion information for multiple sub-PUs; if at least two different sets are identified, the sub-PU based candidate is not used for pruning, i.e., it is considered to be different from any other candidate; otherwise, it may be used for pruning (e.g., may be pruned during the pruning process).
c. Alternatively, in addition, the ATMVP candidate may be pruned with the spatial candidates, e.g., the left and top ones only, with positions denoted as A1 and B1.
d. Alternatively, only one candidate is formed from temporal reference, being either the ATMVP candidate or the TMVP candidate. When ATMVP is available, the candidate is ATMVP; otherwise, the candidate is TMVP. Such a candidate is inserted into the merge candidate list in a position similar to the position of TMVP. In this case, the maximum number of candidates may be kept unchanged.
i. Alternatively, TMVP is always disabled even when ATMVP is unavailable.
ii. Alternatively, TMVP is used only when ATMVP is unavailable.
e. Alternatively, when ATMVP is available and TMVP is unavailable, one set of motion information of one sub-PU is used as the TMVP candidate. In this case, furthermore, the pruning process between ATMVP and TMVP is not applied.
f. Alternatively, or additionally, the temporal vector used for ATMVP may also be used for TMVP, such that the bottom-right position or center 3 position as used for current TMVP in HEVC does not need to be used.
i. Alternatively, the position identified by the temporal vector and the bottom-right and center 3 positions are jointly considered to provide an available TMVP candidate.
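Sub-bullets 4a and 4b together can be sketched as follows; this is one possible reading, with all names hypothetical and motion information assumed to be represented as hashable tuples so that structural equality works.

```python
def prune_atmvp(merge_list, atmvp_repr, atmvp_sub_sets):
    """Return True if the ATMVP candidate should be pruned.

    atmvp_repr: the representative set of motion information (bullet #3).
    atmvp_sub_sets: per-sub-PU motion sets, as hashable tuples.

    Per bullet 4b: if the sub-PUs carry at least two different sets, the
    candidate is treated as different from every other candidate and is
    never pruned. Otherwise (bullet 4a), compare the representative set
    against each normal merge candidate."""
    if len(set(atmvp_sub_sets)) > 1:
        return False
    return any(atmvp_repr == c for c in merge_list)
```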
5. Multiple availability checks for ATMVP are supported to give higher chances for the ATMVP candidate to be more accurate and efficient. When the current ATMVP candidate from the motion source picture as identified by the first temporal vector (e.g., as shown in FIG. 9) is unavailable, other pictures can be considered as the motion source picture. When another picture is considered, it may be associated with a different second temporal vector, or may be associated simply with a second temporal vector scaled from the first temporal vector that points to the unavailable ATMVP candidate.
a. A second temporal vector can identify an ATMVP candidate in a second motion source picture and the same availability check can apply. If the ATMVP candidate as derived from the second motion source picture is available, the ATMVP candidate is derived and no other pictures need to be checked; otherwise, other pictures as motion source pictures need to be checked.
b. Pictures to be checked may be those in the reference picture lists of the current picture, with a given order. For each list, the pictures are checked in the ascending order of the reference index. List X is first checked and pictures in list Y (being 1-X) follow.
i. List X is chosen so that list X is the list that contains the co-located picture used for TMVP.
ii. Alternatively, X is simply set to be 1 or 0.
c. Pictures to be checked are those identified by motion vectors of the spatial neighbors, with a given order.
6. A partition of the PU that the current ATMVP applies to may be 2Nx2N, NxN, 2NxN, Nx2N, or asymmetric motion partition (AMP) partitions, such as 2NxN/2.
a. Alternatively, in addition, if other partition sizes can be allowed, ATMVP can be supported too, and such a size may include, e.g., 64x8.
b. Alternatively, the mode may be only applied to certain partitions, e.g., 2Nx2N.
7. The ATMVP candidate is marked as a different type of merge candidate.
8. When identifying a vector (the temporal vector as in the first stage) from neighbors, multiple neighboring positions, e.g., those used in merge candidate list construction, can be checked in order. For each of the neighbors, the motion vectors corresponding to reference picture list 0 (list 0) or reference picture list 1 (list 1) can be checked in order. When two motion vectors are available, the motion vectors in list X can be checked first, followed by list Y (with Y being equal to 1-X), so that list X is the list that contains the co-located picture used for TMVP. In ATMVP, the temporal vector is added as a shift to the center position of a sub-PU, wherein the components of the temporal vector may need to be shifted to integer numbers. Such a shifted center position is used to identify a smallest unit that motion vectors can be allocated to, e.g., with a size of 4x4, that covers the current center position.
a. Alternatively, motion vectors corresponding to list 0 may be checked before those corresponding to list 1;
b. Alternatively, motion vectors corresponding to list 1 may be checked before those corresponding to list 0;
c. Alternatively, all motion vectors corresponding to list X in all spatial neighbors are checked in order, followed by the motion vectors corresponding to list Y (with Y being equal to 1-X). Here, list "X" can be the list that indicates where the co-located picture belongs, or simply set to be 0 or 1.
d. The order of the spatial neighbors can be the same as that used in HEVC merge mode.
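The per-neighbor list X then list Y scan of bullet 8 could be sketched as below. The quarter-pel precision assumed for the rounding step, the `None`-for-unavailable convention, and the function name are all illustrative assumptions.

```python
def find_temporal_vector(neighbors, x):
    """Scan spatial neighbors in merge order (e.g., A1, B1, B0, A0); for
    each neighbor check list X first (the list containing the co-located
    picture used for TMVP), then list Y = 1 - X. `neighbors` holds
    per-neighbor [list0_mv, list1_mv] entries, None if unavailable."""
    y = 1 - x
    for nb in neighbors:
        for lst in (x, y):
            mv = nb[lst]
            if mv is not None:
                # Shift the components to integer precision before using
                # the vector as a shift on each sub-PU center position
                # (quarter-pel motion vectors are assumed here).
                return (mv[0] >> 2, mv[1] >> 2)
    return None
```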
9. When the first stage of identifying a temporal vector does not include information identifying a reference picture, the motion source picture, as shown in FIG. 9, may be simply set to be a fixed picture, e.g., the co-located picture used for TMVP.
a. In such a case, the vector may only be identified from the motion vectors that point to such a fixed picture.
b. In such a case, the vector may only be identified from the motion vectors that point to any picture but are further scaled towards the fixed picture.
10. When the first stage of identifying a vector consists of identifying a reference picture, the motion source picture as shown in FIG. 9, one or more of the following additional checks may apply to a candidate motion vector.
a. If the motion vector is associated with a picture or slice that is Intra coded, such a motion vector is considered as unavailable and cannot be used to be converted to the vector.
b. If the motion vector identifies an Intra block (by, e.g., adding the current center coordinate with the motion vector) in the associated picture, such a motion vector is considered as unavailable and cannot be used to be converted to the vector.
11. When in the first stage of identifying a vector, the components of the vector may be set to be (half width of the current PU, half height of the current PU), so that it identifies a bottom-right pixel position in the motion source picture. Here, (x, y) indicates the horizontal and vertical components of one motion vector.
a. Alternatively, the components of the vector may be set to be (sum(half width of the current PU, M), sum(half height of the current PU, N)), where the function sum(a, b) returns the sum of a and b. In one example, when the motion information is stored in 4x4 units, M and N are both set to be equal to 2. In another example, when the motion information is stored in 8x8 units, M and N are both set to be equal to 4.
12. The sub-block/sub-PU size when ATMVP applies is signaled in a parameter set, e.g., a sequence parameter set or picture parameter set. The size ranges from the least PU size to the CTU size. The size can also be pre-defined or signaled. The size can be, e.g., as small as 4x4. Alternatively, the sub-block/sub-PU size can be derived based on the size of the PU or CU. For example, the sub-block/sub-PU size can be set equal to max(4x4, (width of CU) >> M). The value of M can be predefined or signaled in the bitstream.
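The derivation in bullet 12 is a one-liner; a minimal sketch (the function name is a hypothetical label for this rule):

```python
def sub_pu_size(cu_width, m, min_size=4):
    """Derive the ATMVP sub-block size as max(4x4, (width of CU) >> M);
    M is predefined or signaled in the bitstream."""
    return max(min_size, cu_width >> m)

# e.g., sub_pu_size(64, 3) -> 8; sub_pu_size(16, 3) -> 4 (clamped to 4x4).
```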
13. The maximum number of merge candidates may be increased by 1 due to the fact that ATMVP can be considered as a new merge candidate. For example, compared to HEVC, which takes up to 5 candidates in a merge candidate list after pruning, the maximum number of merge candidates can be increased to 6.
a. Alternatively, pruning with the conventional TMVP candidate or unification with the conventional TMVP candidate can be performed for ATMVP such that the maximum number of merge candidates can be kept unchanged.
b. Alternatively, when ATMVP is identified to be available, a spatial neighboring candidate is excluded from the merge candidate list, e.g., the last spatial neighboring candidate in fetching order is excluded.
14. When multiple spatial neighboring motion vectors are considered to derive the temporal vector, a motion vector similarity may be calculated based on the neighboring motion vectors of the current PU as well as the neighboring motion vectors identified by a specific temporal vector being set equal to a motion vector. The one that leads to the highest motion similarity may be chosen as the final temporal vector.
a. In one alternative, for each motion vector from a neighboring position N, the motion vector identifies a block (the same size as the current PU) in the motion source picture, wherein its neighboring position N contains a set of motion information. This set of motion vectors is compared with the set of motion information as in the neighboring position N of the current block.
b. In another alternative, for each motion vector from a neighboring position N, the motion vector identifies a block in the motion source picture, wherein its neighboring positions contain multiple sets of motion information. These multiple sets of motion vectors are compared with the multiple sets of motion information from the neighboring positions of the current PU in the same relative positions. A motion information similarity is calculated. For example, the current PU has the following sets of motion information from A1, B1, A0, and B0, denoted as MI_A1, MI_B1, MI_A0, and MI_B0. For a temporal vector TV, it identifies a block corresponding to the PU in the motion source picture. Such a block has motion information from the same relative A1, B1, A0, and B0 positions, denoted as TMI_A1, TMI_B1, TMI_A0, and TMI_B0. The motion similarity as determined by TV is calculated as MS_TV = Σ_{N ∈ {A1, B1, A0, B0}} MVSim(MI_N, TMI_N), wherein MVSim defines the similarity between two sets of motion information.
c. In both ofthe above cases, the motion similarity MVSim can be used, wherein the two input parameters are the two sets of motion information, each containtng up to two motion vectors and two référencé indices. Each pair of the motion vectors in list X are actually associated with référencé pictures in different list X of different pictures, the current picture and the motion source picture. For each of the two motion vectors MVXN and TMVXN (with X beîng equal to 0 or 1), the motion vector différence MVDXN can be calculated as MVXN - TMVXN. Afterwards, the différence MVSimX is calculated as e.g., abs([MVDX]_N [0])+ abs([MVDX]_N [1]), or ([MVDX]_N [0]*[MVDX]_N [0]+ JMVDXJ-N [1]*[MVDX]_N [I]). If both sets of motion information contain available motion vectors, MVSim is set equal to MVSimO + MVSiml.
i. In order to have a unified calculation of the motion difference, both of the motion vectors need to be scaled towards the same fixed picture, which can be, e.g., the first reference picture RefPicListX[0] of list X of the current picture.
ii. If the availability of the motion vector in list X from the first set and the availability of the motion vector in list X from the second set are different, i.e., one reference index is -1 while the other is not, the two sets of motion information are considered not similar in direction X. If the two sets are not similar in both directions, the final MVSim function may return a big value T, which may be, e.g., considered as infinite.
iii. Alternatively, for a pair of sets of motion information, if one is predicted from list X (X being equal to 0 or 1) but not list Y (Y being equal to 1-X) and the other has the same status, a weighting between 1 and 2 (e.g., MVSim is equal to MVSimX * 1.5) may be used. When one set is only predicted from list X and the other is only predicted from list Y, MVSim is set to the big value T.
iv. Alternatively, for any set of motion information, as long as one motion vector is available, both motion vectors will be produced. In the case that only one motion vector is available (corresponding to list X), it is scaled to form the motion vector corresponding to the other list Y.
d. Alternatively, the motion vector may be measured based on differences between the neighboring pixels of the current PU and the neighboring pixels of the block (the same size as the current PU) identified by the motion vector. The motion vector that leads to the smallest difference may be chosen as the final temporal vector.
15. When deriving the temporal vector of the current block, motion vectors and/or temporal vectors from neighboring blocks that are coded with ATMVP may have a higher priority than motion vectors from other neighboring blocks.
a. In one example, only temporal vectors of neighboring blocks are checked first, and the first available one can be set to the temporal vector of the current block. Only when such temporal vectors are not present are normal motion vectors further checked. In this case, temporal vectors for ATMVP-coded blocks need to be stored.
b. In another example, only motion vectors from ATMVP-coded neighboring blocks are checked first, and the first available one can be set to the temporal vector of the current block. Only when such temporal vectors are not present are normal motion vectors further checked.
c. In another example, only motion vectors from ATMVP-coded neighboring blocks are checked first, and the first available one can be set to the temporal vector of the current block. If such motion vectors are not available, the checking of the temporal vector continues similarly as in bullet 15a.
d. In another example, temporal vectors from neighboring blocks are checked first, and the first available one can be set to the temporal vector of the current block. If such vectors are not available, the checking continues similarly as in bullet 15b.
e. In another example, temporal vectors and motion vectors of ATMVP-coded neighboring blocks are checked first, and the first available one can be set to the temporal vector of the current block. Only when such temporal vectors and motion vectors are not present are normal motion vectors further checked.
16. When multiple spatial neighboring motion vectors are considered to derive the temporal vector, a motion vector may be chosen so that it minimizes the distortion calculated in the pixel domain; e.g., template matching may be used to derive the temporal vector such that the one that leads to the minimal matching cost is selected as the final temporal vector.
17. Derivation of a set of motion information from a corresponding block (in the motion source picture) is done in a way that when a motion vector is available in the corresponding block for any list X (denote the motion vector as MVX), for the current sub-PU of the ATMVP candidate, the motion vector is considered available for list X (by scaling the MVX). If the motion vector is unavailable in the corresponding block for any list X, the motion vector is considered unavailable for list X.
a. Alternatively, when the motion vector in the corresponding block is unavailable for list X but available for list 1-X (denote 1-X by Y and denote the motion vector as MVY), the motion vector is still considered available for list X (by scaling the MVY towards the target reference picture in list X).
b. Alternatively, or in addition, when both motion vectors in the corresponding block for list X and list Y (equal to 1-X) are available, the motion vectors from list X and list Y are not necessarily used directly, by scaling, to generate the two motion vectors of a current sub-PU.
i. In one example, when formulating the ATMVP candidate, the low-delay check as done in TMVP applies to each sub-PU. If, for every picture (denoted by refPic) in every reference picture list of the current slice, the picture order count (POC) value of refPic is smaller than the POC of the current slice, the current slice is considered to be in low-delay mode. In this low-delay mode, motion vectors from list X and list Y are scaled to generate the motion vectors of a current sub-PU for list X and list Y, respectively. When not in the low-delay mode, only one motion vector MVZ from MVX or MVY is chosen and scaled to generate the two motion vectors for a current sub-PU. Similar to TMVP, in such a case Z is set equal to collocated_from_l0_flag, meaning that it depends on whether the co-located picture as in TMVP is in list X or list Y of the current picture. Alternatively, Z is set as follows: if the motion source picture is identified from list X, Z is set to X. Alternatively, in addition, when the motion source picture belongs to both reference picture lists, and RefPicList0[idx0] is the motion source picture that is first present in list 0 and RefPicList1[idx1] is the motion source picture that is first present in list 1, Z is set to 0 if idx0 is smaller than or equal to idx1, and set to 1 otherwise.
18. The motion source picture may be signaled, e.g., generated by video encoder 20 in a coded bitstream. In detail, a flag indicating whether the motion source picture is from list 0 or list 1 is signaled for a B slice. Alternatively, in addition, a reference index into list 0 or list 1 of the current picture may be signaled to identify the motion source picture.
19. When identifying a temporal vector, a vector is considered unavailable (and thus other ones can be considered) if it points to an intra-coded block in the associated motion source picture.
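For concreteness, the similarity measure of bullet 14c above can be sketched as follows. This is a minimal, non-normative illustration: the MotionInfo structure, the constant BIG_T, and all function names are introduced here purely for exposition, smaller returned values mean more similar motion, and the treatment of a single-direction availability mismatch is one possible reading of bullet ii.

#include <climits>
#include <cstdlib>

// Illustrative container for one set of motion information:
// up to two motion vectors and two reference indices.
struct MotionInfo {
  int mv[2][2];    // mv[X][0] = horizontal, mv[X][1] = vertical, X = list 0/1
  int refIdx[2];   // refIdx[X] < 0 means no motion vector for list X
};

const int BIG_T = INT_MAX / 4;  // the "big value T", effectively infinite

// Absolute-difference variant of MVSimX for one list X; assumes both
// motion vectors were already scaled towards the same fixed picture.
int mvSimX(const MotionInfo& a, const MotionInfo& b, int X) {
  int dx = a.mv[X][0] - b.mv[X][0];  // MVDX_N[0]
  int dy = a.mv[X][1] - b.mv[X][1];  // MVDX_N[1]
  return std::abs(dx) + std::abs(dy);
}

// MVSim over both lists; returns BIG_T when the two sets are
// dissimilar (availability mismatch) in both directions.
int mvSim(const MotionInfo& a, const MotionInfo& b) {
  int sim = 0, mismatched = 0;
  for (int X = 0; X < 2; ++X) {
    bool availA = a.refIdx[X] >= 0, availB = b.refIdx[X] >= 0;
    if (availA != availB) { ++mismatched; continue; }  // not similar in X
    if (availA) sim += mvSimX(a, b, X);                // MVSimX
  }
  return (mismatched == 2) ? BIG_T : sim;
}

// MS_TV for a candidate temporal vector: sum over N in {A1, B1, A0, B0}.
int motionSimilarity(const MotionInfo mi[4], const MotionInfo tmi[4]) {
  int ms = 0;
  for (int n = 0; n < 4; ++n) ms += mvSim(mi[n], tmi[n]);
  return ms;
}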
[0073] Implementation of the various techniques of this disclosure is discussed below. It is assumed that ATMVP is implemented on top of HEVC version 1. Motion compression may not apply to reference pictures, and smaller blocks with bi-directional motion compensation may be enabled.
[0074] Signaling of ATMVP in SPS: [0075] atmvp_sub_pu_size may be present in the SPS.
[0076] atmvp_sub_pu_size may specify the size of the sub-PUs of a PU coded with ATMVP mode. It is in the range of 2 to 6, inclusive. The sub-PU size for ATMVP, (spuWidth, spuHeight), is derived as min(w, 1 << atmvp_sub_pu_size) by min(h, 1 << atmvp_sub_pu_size), wherein w x h is the size of the current PU.
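A minimal sketch of this derivation, assuming the syntax element has already been parsed (the function name below is illustrative only):

#include <algorithm>

// Derive the ATMVP sub-PU dimensions from the SPS syntax element.
// atmvpSubPuSize is in [2, 6]; (w, h) is the current PU size.
void deriveSubPuSize(int atmvpSubPuSize, int w, int h,
                     int& spuWidth, int& spuHeight) {
  spuWidth  = std::min(w, 1 << atmvpSubPuSize);
  spuHeight = std::min(h, 1 << atmvpSubPuSize);
}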
[0077] Alternatively, both the width and the height of the sub-PU size are signaled separately in the SPS.
[0078] Alternatively, the sub-PU sizes are signaled relative to the CTU size or the smallest coding unit size.
[0079] A variable atmvpEnableFlag is derived to be equal to 1 if atmvp_sub_pu_size is smaller than the CTU size (e.g., 6 as in HEVC version 1), and 0 otherwise.
[0080] Signaling of ATMVP in slice header: five_minus_max_num_merge_cand specifies the maximum number of merging MVP candidates supported in the slice subtracted from 5. The maximum number of merging MVP candidates, MaxNumMergeCand, is derived as:
MaxNumMergeCand = ( atmvpEnableFlag ? 6 : 5 ) - five_minus_max_num_merge_cand (7-41)
[0081] The value of five_minus_max_num_merge_cand shall be limited such that MaxNumMergeCand is in the range of 1 to (atmvpEnableFlag ? 6 : 5), inclusive. [0082] Alternatively, five_minus_max_num_merge_cand is changed to six_minus_max_num_merge_cand and the semantics are as follows:
[0083] six_minus_max_num_merge_cand specifies the maximum number of merging MVP candidates supported in the slice subtracted from 6. The maximum number of merging MVP candidates, MaxNumMergeCand, is derived as
MaxNumMergeCand = 6 - six_minus_max_num_merge_cand (7-41)
[0084] Alternatively, max_num_merge_cand_minus1 is directly signaled.
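The slice-header derivation of equation (7-41) can be sketched as follows; the function and parameter names are illustrative, not drawn from any reference software:

#include <cassert>

// Derive MaxNumMergeCand from the slice-header syntax element,
// following equation (7-41) above.
int deriveMaxNumMergeCand(bool atmvpEnableFlag,
                          int fiveMinusMaxNumMergeCand) {
  int base = atmvpEnableFlag ? 6 : 5;
  int maxNumMergeCand = base - fiveMinusMaxNumMergeCand;
  // Bitstream conformance: the result must lie in [1, base].
  assert(maxNumMergeCand >= 1 && maxNumMergeCand <= base);
  return maxNumMergeCand;
}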
[0085] In some examples, other syntax changes are unnecessary, and an ATMVP candidate is identified by a merge_idx, which may be in the range of 0 to 5, inclusive. [0086] Decoding processes related to ATMVP: The following decoding processes may be implemented, e.g., by video decoder 30, to formulate an ATMVP candidate and include it as part of the merge candidate list:
[0087] Identification of the first-stage temporal vector:
[0088] Set the variable mtSrcPOC to the POC value of the co-located picture used in TMVP, set tV to a zero vector, and set atmvpAvaFlag to 0.
[0089] For each position N of the spatial neighboring positions, being A1, B1, B0, A0, and B2, the following apply:
• dir is set equal to collocated_from_l0_flag;
• For X being equal to dir through (1 - dir), inclusive, if the current slice is a B slice, or just X being equal to 0 if the current slice is not a B slice, the following apply:
o When the neighboring block N is available and is not intra coded, and RefIdxX[N] is larger than or equal to 0 (denote MVLX[N] and RefIdxX[N] as the motion vector and reference index of the neighboring block N corresponding to RefPicListX), the following steps apply in order:
• mtSrcPOC is set equal to the POC value of RefPicListX[ RefIdxX[N] ];
• tV is set equal to MVLX[ N ];
• atmvpAvaFlag is set to 1;
• terminate this process.
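An informal sketch of this scan follows. The NeighborInfo structure, the refPocs table, and the function name are hypothetical stand-ins for decoder state, and the sketch assumes the neighbor positions A1, B1, B0, A0, B2 have already been resolved:

// Illustrative view of one spatial neighbor's coding state.
struct NeighborInfo {
  bool available;      // neighbor exists and is coded
  bool intraCoded;     // intra-coded neighbors carry no motion
  int  refIdx[2];      // RefIdxX[N]; < 0 means no list-X motion
  int  mv[2][2];       // MVLX[N]: {horizontal, vertical} per list
};

// Scan A1, B1, B0, A0, B2 for the first usable motion vector and take it
// as the first-stage temporal vector tV (paragraphs [0088]-[0089]).
// refPocs[X][i] is the POC of RefPicListX[i]; the return value plays the
// role of atmvpAvaFlag.
bool findFirstStageVector(const NeighborInfo neighbors[5],
                          bool isBSlice, bool collocatedFromL0,
                          const int refPocs[2][16],
                          int& mtSrcPOC, int tV[2]) {
  int dir = collocatedFromL0 ? 0 : 1;
  for (int n = 0; n < 5; ++n) {          // A1, B1, B0, A0, B2
    const NeighborInfo& nb = neighbors[n];
    if (!nb.available || nb.intraCoded) continue;
    for (int k = 0; k < (isBSlice ? 2 : 1); ++k) {
      int X = isBSlice ? (k == 0 ? dir : 1 - dir) : 0;
      if (nb.refIdx[X] >= 0) {
        mtSrcPOC = refPocs[X][nb.refIdx[X]];
        tV[0] = nb.mv[X][0];
        tV[1] = nb.mv[X][1];
        return true;                     // atmvpAvaFlag = 1; terminate
      }
    }
  }
  return false;                          // tV stays zero; atmvpAvaFlag = 0
}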
[0090] Identification of an available motion source:
Set a list of pictures CanPicATMVP to be an empty list.
CanPicATMVP[ 0 ] is set to be the picture with POC value equal to mtSrcPOC.
i is set equal to 1;
MotionSrcPic is an empty picture and can be updated as specified below. For each available reference picture list X, the following apply:
• dir is set equal to collocated_from_l0_flag;
o For X being equal to dir through (1 - dir), inclusive, if the current slice is a B slice, or just X being equal to 0 if the current slice is not a B slice, the following apply:
o For each idx from 0 through num_ref_active_lX_minus1:
• CanPicATMVP[ i++ ] = RefPicListX[ idx ];
Let (CurrPosX, CurrPosY) be the coordinate of the top-left pixel position of the current PU.
For n being equal to 0 through i, inclusive, the following apply:
• If n is not equal to 0, scale tV towards the picture CanPicATMVP[ n ] to derive tScaledVector, wherein the relevant pictures for tV are the current picture and CanPicATMVP[ 0 ], and the relevant pictures for the destination vector tScaledVector are the current picture and CanPicATMVP[ n ];
• Otherwise ( n is equal to 0), tScaledVector is set equal to tV.
• Get the motion information of the block corresponding to the center sub-PU from the CanPicATMVP[ n ] as follows:
o centerPosX = CurrPosX + ( ( tScaledVector[ 0 ] + 2 ) >> 2 );
o centerPosY = CurrPosY + ( ( tScaledVector[ 1 ] + 2 ) >> 2 );
o Let (centerPosX, centerPosY) be the position that identifies the corresponding block of the center sub-PU, and let the current PU size be width by height.
o centerPosX += ( ( width / spuWidth ) >> 1 ) * spuWidth + ( min( spuWidth, width ) >> 1 );
o centerPosY += ( ( height / spuHeight ) >> 1 ) * spuHeight + ( min( spuHeight, height ) >> 1 );
o Invoke the motion information fetching process that grabs the motion information, with a picture mtnSrcPic being equal to CanPicATMVP[ n ] and a position (posX, posY) being equal to (centerPosX, centerPosY) as input, and a sub-PU motion available flag SubPuMtnAvaFlag, a pair of reference indices sColRefIdx0 and sColRefIdx1, and a pair of motion vectors, sColMV0 and sColMV1, as output.
o If SubPuMtnAvaFlag is equal to 1, the following applies:
• MotionSrcPic is set to CanPicATMVP[ n ];
• tV is set to tScaledVector;
• terminate this loop.
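The candidate-picture loop of [0090] can be sketched as below. Picture is an opaque placeholder, and scaleVector and fetchMotionInfo are declarations standing in for the TMVP-style vector scaling and the motion information fetching process of [0091]; none of these names are normative:

#include <algorithm>
#include <cstddef>
#include <vector>

struct Picture;  // opaque placeholder for a decoded picture

// Stand-ins, declared only: vector scaling between picture pairs and the
// motion information fetching process of [0091].
void scaleVector(const int v[2], const Picture* src, const Picture* dst,
                 int out[2]);
bool fetchMotionInfo(const Picture* pic, int posX, int posY);

// Walk CanPicATMVP (co-located picture first, then each list's reference
// pictures) until the center sub-PU position yields motion information.
bool identifyMotionSource(const std::vector<Picture*>& canPicATMVP,
                          int currPosX, int currPosY,
                          int width, int height,
                          int spuWidth, int spuHeight,
                          int tV[2], Picture*& motionSrcPic) {
  for (std::size_t n = 0; n < canPicATMVP.size(); ++n) {
    int tScaled[2] = { tV[0], tV[1] };
    if (n != 0)
      scaleVector(tV, canPicATMVP[0], canPicATMVP[n], tScaled);

    // Center of the corresponding block (quarter-pel vector -> integer).
    int centerPosX = currPosX + ((tScaled[0] + 2) >> 2)
                   + ((width / spuWidth) >> 1) * spuWidth
                   + (std::min(spuWidth, width) >> 1);
    int centerPosY = currPosY + ((tScaled[1] + 2) >> 2)
                   + ((height / spuHeight) >> 1) * spuHeight
                   + (std::min(spuHeight, height) >> 1);

    if (fetchMotionInfo(canPicATMVP[n], centerPosX, centerPosY)) {
      motionSrcPic = canPicATMVP[n];      // SubPuMtnAvaFlag == 1
      tV[0] = tScaled[0]; tV[1] = tScaled[1];
      return true;
    }
  }
  return false;
}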
[0091] Motion information fetching process:
The inputs of this process are a picture mtnSrcPic and a position (posX, posY) within the picture; the outputs of this process are the motion available flag mtnAvaFlag, a pair of reference indices refIdx0 and refIdx1, and a pair of motion vectors, mv0 and mv1.
The position (posX, posY) is first clipped to be within the picture mtnSrcPic.
The 4x4 block (or other smallest block size that stores motion information) blkT containing the position (posX, posY) is identified.
mtnAvaFlag is set equal to 0.
If blkT is not intra coded, and its motion information contains blkTRefIdx0, blkTRefIdx1, blkTMv0, and blkTMv1, the following applies:
• When either blkTRefIdx0 or blkTRefIdx1 is larger than or equal to 0, mtnAvaFlag is set equal to 1 and the following applies for X being equal to 0 and 1:
o refIdxX is set equal to blkTRefIdxX;
o mvX is set equal to blkTMvX.
[0092] Generation of sub-PU motion for ATMVP:
If SubPuMtnAvaFlag is equal to 1, the following process is invoked:
• For each sub-PU (e.g., in raster-scan order), the following applies:
o Denote the horizontal index and vertical index of the current sub-PU as k and l, respectively, wherein k ranges from 0 through width/spuWidth - 1, inclusive, and l ranges from 0 through height/spuHeight - 1, inclusive. For example, if a 16x16 PU is divided into four 8x8 sub-PUs, the (k, l) values of the four sub-PUs in raster-scan order are (0, 0), (1, 0), (0, 1) and (1, 1), respectively.
o The sub-PU's coordinates (tempPosX, tempPosY) are calculated as (tempPosX, tempPosY) = (CurrPosX, CurrPosY) + (k * spuWidth, l * spuHeight).
o tempPosX += ( ( tV[ 0 ] + 2 ) >> 2 );
o tempPosY += ( ( tV[ 1 ] + 2 ) >> 2 );
o Invoke the motion information fetching process that grabs the motion information, with a picture mtnSrcPic being equal to MotionSrcPic and a position (posX, posY) being equal to (tempPosX, tempPosY) as input, and a sub-PU motion available flag currSubPuMtnAvaFlag, a pair of reference indices currSubRefIdx0 and currSubRefIdx1, and a pair of motion vectors, currSubMV0 and currSubMV1, as output.
o When currSubPuMtnAvaFlag is equal to 0, for X equal to 0 and 1, inclusive, currSubRefIdxX is set equal to sColRefIdxX and currSubMVX is set equal to sColMVX.
o For X being equal to 0 and 1, inclusive, scale the motion vector currSubMVX towards the default target reference picture of the current picture, which is RefPicListX[ 0 ], similarly as in TMVP. Denote the derived reference index and motion vector for the current sub-PU as cSpuRefIdxX and cSpuMVX; they are derived as follows:
• cSpuRefIdxX = ( currSubRefIdxX >= 0 ? 0 : -1 );
• cSpuMVX is set to be the scaled vector of currSubMVX, similarly as in TMVP.
The representative set of motion information, aRefIdxX and aMVX (for X being equal to 0 or 1), for this ATMVP candidate is derived as follows:
• aRefIdxX = ( sColRefIdxX >= 0 ? 0 : -1 );
• aMVX is set to be the scaled vector of sColMVX, similarly as in TMVP.
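Putting the pieces of [0092] together, one possible per-sub-PU derivation loop is sketched below; the MotionInfo structure and the two declared helpers are illustrative stand-ins for the fetching process of [0091] and TMVP-style scaling towards RefPicListX[0], not the normative process:

#include <vector>

struct MotionInfo { int mv[2][2]; int refIdx[2]; };
struct Picture;

// Declared elsewhere: the motion fetching process of [0091] and
// TMVP-style scaling towards RefPicListX[0].
bool fetchMotionInfo(const Picture* pic, int x, int y, MotionInfo& out);
MotionInfo scaleToTargetRef0(const MotionInfo& in);

// Derive one set of motion information per sub-PU in raster-scan order,
// falling back to the center block's motion (sColRefIdxX / sColMVX).
void deriveSubPuMotion(const Picture* motionSrcPic,
                       int currPosX, int currPosY,
                       int width, int height, int spuWidth, int spuHeight,
                       const int tV[2], const MotionInfo& centerMotion,
                       std::vector<MotionInfo>& subPuMotion) {
  for (int l = 0; l < height / spuHeight; ++l) {       // vertical index
    for (int k = 0; k < width / spuWidth; ++k) {       // horizontal index
      int posX = currPosX + k * spuWidth  + ((tV[0] + 2) >> 2);
      int posY = currPosY + l * spuHeight + ((tV[1] + 2) >> 2);
      MotionInfo m;
      if (!fetchMotionInfo(motionSrcPic, posX, posY, m))
        m = centerMotion;                // currSubPuMtnAvaFlag == 0
      subPuMotion.push_back(scaleToTargetRef0(m));  // cSpuRefIdxX / cSpuMVX
    }
  }
}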
Alternatively, the motion fetching process and motion scaling process are the same (or similar) as in TMVP of HEVC version 1, i.e., subclause 8.5.3.2.8 of HEVC version 1: the derivation process for collocated motion vectors applies to replace the highlighted text in this sub-section. In this case, the motion fetching and motion scaling process as in TMVP (subclause 8.5.3.2.8) replace the motion fetching process and motion scaling process defined above (including as indicated by italicized text).
[0093] Insertion of the ATMVP candidate in a merge candidate list:
[0094] When SubPuMtnAvaFlag is equal to 1, the ATMVP candidate is inserted into the merge candidate list after the A0 (or alternatively B0) candidate is tested and possibly inserted into the merge candidate list.
[0095] The motion information for this candidate is considered to be formed by aRefIdxX and aMVX (with X being equal to 0 or 1).
[0096] When the TMVP candidate is available, it is further compared with the representative information of the ATMVP candidate (aRefIdxX and aMVX); only if the TMVP candidate has a refIdxX unequal to aRefIdxX or a motion vector unequal to aMVX (with X being equal to 0 or 1) is it further inserted into the merge candidate list.
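The pruning check of [0096] amounts to a simple comparison of the TMVP candidate against the ATMVP candidate's representative motion information; a sketch with illustrative names:

// Insert TMVP only if it differs from the ATMVP candidate's
// representative motion information (aRefIdxX, aMVX).
bool tmvpDiffersFromAtmvp(const int aRefIdx[2], const int aMV[2][2],
                          const int tmvpRefIdx[2], const int tmvpMV[2][2]) {
  for (int X = 0; X < 2; ++X) {
    if (tmvpRefIdx[X] != aRefIdx[X]) return true;
    if (tmvpMV[X][0] != aMV[X][0] || tmvpMV[X][1] != aMV[X][1]) return true;
  }
  return false;  // identical in both lists: prune the TMVP candidate
}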
[0097] When all the candidates, including the ATMVP candidate, are considered transparently as represented by one single set of motion information (up to two reference indices and two associated motion vectors), the representative information for the ATMVP candidate is used. For example, in the derivation process for combined bi-predictive merging candidates, the representative motion information of the ATMVP candidate is used.
[0098] In addition, each merge candidate is attached with a tag (which can be a flag or a type) indicating whether the candidate is an ATMVP candidate.
[0099] In addition, for an ATMVP candidate, the motion information sets, denoted above as cSpuRefIdxX and cSpuMVX for X being equal to 0 and 1 and for each sub-PU, need to be stored for the current PU to be decoded.
[0100] Motion compensation based on the ATMVP candidate: When a current PU is coded with merge mode and merge_idx specifies a candidate indicated as an ATMVP candidate, for each sub-PU, the motion information cSpuRefIdxX and cSpuMVX (for X being equal to 0 and 1) is derived and used to perform motion compensation for the current sub-PU. After the motion compensation is done, the residual decoding and other processes are done in the same way as for other inter modes.
[0101] Video encoder 20 may further send syntax data, such as block-based syntax data, frame-based syntax data, and GOP-based syntax data, to video decoder 30, e.g., in a frame header, a block header, a slice header, or a GOP header. The GOP syntax data may describe a number of frames in the respective GOP, and the frame syntax data may indicate an encoding/prediction mode used to encode the corresponding frame. [0102] Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder or decoder circuitry, as applicable, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic circuitry, software, hardware, firmware or any combinations thereof. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined video encoder/decoder (CODEC). A device including video encoder 20 and/or video decoder 30 may comprise an integrated circuit, a microprocessor, and/or a wireless communication device, such as a cellular telephone.
[0103] FIG. 2 is a block diagram illustrating an example of video encoder 20 that may implement techniques for advanced temporal motion vector prediction (ATMVP). Video encoder 20 may perform intra- and inter-coding of video blocks within video slices. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence. Intra-mode (I mode) may refer to any of several spatial-based coding modes. Inter-modes, such as uni-directional prediction (P mode) or bi-prediction (B mode), may refer to any of several temporal-based coding modes.
[0104] As shown in FIG. 2, video encoder 20 receives a current video block within a video frame to be encoded. In the example of FIG. 2, video encoder 20 includes mode select unit 40, reference picture memory 64, summer 50, transform processing unit 52, quantization unit 54, and entropy encoding unit 56. Mode select unit 40, in turn, includes motion compensation unit 44, motion estimation unit 42, intra-prediction unit 46, and partition unit 48. For video block reconstruction, video encoder 20 also includes inverse quantization unit 58, inverse transform unit 60, and summer 62. A deblocking filter (not shown in FIG. 2) may also be included to filter block boundaries to remove blockiness artifacts from reconstructed video. If desired, the deblocking filter would typically filter the output of summer 62. Additional filters (in loop or post loop) may also be used in addition to the deblocking filter. Such filters are not shown for brevity, but if desired, may filter the output of summer 50 (as an in-loop filter).
[0105] During the encoding process, video encoder 20 receives a video frame or slice to be coded. The frame or slice may be divided into multiple video blocks. Motion estimation unit 42 and motion compensation unit 44 perform inter-predictive coding of the received video block relative to one or more blocks in one or more reference frames to provide temporal prediction. Intra-prediction unit 46 may alternatively perform intra-predictive coding of the received video block relative to one or more neighboring blocks in the same frame or slice as the block to be coded to provide spatial prediction. Video encoder 20 may perform multiple coding passes, e.g., to select an appropriate coding mode for each block of video data.
[0106] Moreover, partition unit 48 may partition blocks of video data into sub-blocks, based on evaluation of previous partitioning schemes in previous coding passes. For example, partition unit 48 may initially partition a frame or slice into LCUs, and partition each of the LCUs into sub-CUs based on rate-distortion analysis (e.g., rate-distortion optimization). Mode select unit 40 may further produce a quadtree data structure indicative of partitioning of an LCU into sub-CUs. Leaf-node CUs of the quadtree may include one or more PUs and one or more TUs.
[0107] Mode select unit 40 may select one of the coding modes, intra or inter, e.g., based on error results, and provides the resulting intra- or inter-coded block to summer 50 to generate residual block data and to summer 62 to reconstruct the encoded block for use as a reference frame. Mode select unit 40 also provides syntax elements, such as motion vectors, intra-mode indicators, partition information, and other such syntax information, to entropy encoding unit 56.
[0108] Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors, which estimate motion for video blocks. A motion vector, for example, may indicate the displacement of a PU of a video block within a current video frame or picture relative to a predictive block within a reference frame (or other coded unit) relative to the current block being coded within the current frame (or other coded unit). A predictive block is a block that is found to closely match the block to be coded, in terms of pixel difference, which may be determined by sum of absolute difference (SAD), sum of square difference (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in reference picture memory 64. For example, video encoder 20 may interpolate values of one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference picture. Therefore, motion estimation unit 42 may perform a motion search relative to the full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.
[0109] Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. The reference picture may be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identify one or more reference pictures stored in reference picture memory 64. Motion estimation unit 42 sends the calculated motion vector to entropy encoding unit 56 and motion compensation unit 44.
[0110] Motion compensation, performed by motion compensation unit 44, may involve fetching or generating the predictive block based on the motion vector determined by motion estimation unit 42. Again, motion estimation unit 42 and motion compensation unit 44 may be functionally integrated, in some examples. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate the predictive block to which the motion vector points in one of the reference picture lists. Summer 50 forms a residual video block by subtracting pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values, as discussed below. In general, motion estimation unit 42 performs motion estimation relative to luma components, and motion compensation unit 44 uses motion vectors calculated based on the luma components for both chroma components and luma components. Mode select unit 40 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice.
[0111] Video encoder 20 may be configured to perform any of the various techniques of this disclosure discussed above with respect to FIG. 1. For example, motion compensation unit 44 may be configured to code motion information for a block of video data using AMVP or merge mode in accordance with the techniques of this disclosure.
[0112] Assuming that motion compensation unit 44 elects to perform merge mode, motion compensation unit 44 may form a candidate list including a set of merge candidates. Motion compensation unit 44 may add candidates to the candidate list based on a particular, predetermined order. In one example, motion compensation unit 44 adds the candidates to the candidate list in the order of A1, B1, B0, A0, then an advanced temporal motion vector prediction (ATMVP) candidate. Motion compensation unit 44 may also add additional candidates and perform pruning of the candidate list, as discussed above. Ultimately, mode select unit 40 may determine which of the candidates is to be used to encode motion information of the current block, and encode a merge index representing the selected candidate.
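A compact sketch of this construction order follows; MergeCand and the three declared helpers are placeholders for the encoder's internal derivations, not an actual interface:

#include <vector>

struct MergeCand { /* motion vectors, reference indices, ATMVP tag ... */ };

// Stand-ins, declared only, for derivations defined elsewhere.
bool getSpatialCandidate(const char* pos, MergeCand& out);
bool deriveAtmvpCandidate(MergeCand& out);
bool isDuplicate(const std::vector<MergeCand>& list, const MergeCand& c);

// Build the list in the order described above: A1, B1, B0, A0, then ATMVP.
std::vector<MergeCand> buildMergeList(int maxNumMergeCand) {
  std::vector<MergeCand> list;
  const char* order[4] = { "A1", "B1", "B0", "A0" };
  for (const char* pos : order) {
    MergeCand c;
    if (getSpatialCandidate(pos, c) && !isDuplicate(list, c))
      list.push_back(c);
  }
  MergeCand atmvp;
  if (deriveAtmvpCandidate(atmvp))  // available only when the corresponding
    list.push_back(atmvp);          // block has usable motion information
  // ... B2, TMVP (pruned against ATMVP), combined and zero candidates ...
  if ((int)list.size() > maxNumMergeCand)
    list.resize(maxNumMergeCand);
  return list;
}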
[0113] Furthermore, in some examples, motion compensation unit 44 may first determine whether the ATMVP candidate is available. For example, motion compensation unit 44 may determine a block corresponding to the current block in a reference picture and determine whether motion information is available for the corresponding block. Motion compensation unit 44 may then determine that the ATMVP candidate (that is, the corresponding block) is available when motion information is available for the corresponding block. In some examples, motion compensation unit 44 may determine that motion information is available for the corresponding block when the entire corresponding block (e.g., a center position block, as shown in FIG. 5a below) is predicted without the use of intra-prediction, but is not available when at least part of the corresponding block is predicted using intra-prediction.
[0114] Similarly, in some examples, motion compensation unit 44 may determine which of two potential ATMVP candidates should be used as the ATMVP candidate ultimately added to the candidate list. For example, motion compensation unit 44 may form a first temporal motion vector relative to the current block that identifies a first ATMVP candidate in a first motion source picture, that is, a first reference picture. If motion information is not available for the first ATMVP candidate, motion compensation unit 44 may determine whether motion information is available for a second, different ATMVP candidate. The second ATMVP candidate may be identified using the same temporal motion vector referring to a second, different reference picture, a different temporal motion vector referring to the same (i.e., first) reference picture, or a different temporal motion vector referring to the second, different reference picture. The reference pictures to be checked, as discussed above, may be in ascending order of reference indexes in a reference picture list. Likewise, if different temporal motion vectors are used, the temporal motion vectors may be selected in a predetermined order from temporal vectors of neighboring blocks to the current block.
[0115] Furthermore, motion compensation unit 44 may determine whether a motion vector is available for a sub-PU in the ATMVP candidate for a particular reference picture list. If so, the motion vector is considered available for that reference picture list. Otherwise, the motion vector is considered unavailable for that reference picture list. Alternatively, if a motion vector is available for the other reference picture list, motion compensation unit 44 may modify the motion information by scaling the motion vector to point to a target reference picture in the first reference picture list, as discussed above.
[0116] Intra-prediction unit 46 may intra-predict a current block, as an alternative to the inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, as described above. In particular, intra-prediction unit 46 may determine an intra-prediction mode to use to encode a current block. In some examples, intra-prediction unit 46 may encode a current block using various intra-prediction modes, e.g., during separate encoding passes, and intra-prediction unit 46 (or mode select unit 40, in some examples) may select an appropriate intra-prediction mode to use from the tested modes.
[0117] For example, intra-prediction unit 46 may calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes, and select the intra-prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original, unencoded block that was encoded to produce the encoded block, as well as a bitrate (that is, a number of bits) used to produce the encoded block. Intra-prediction unit 46 may calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
[0118] After selecting an intra-prediction mode for a block, intra-prediction unit 46 may provide information indicative of the selected intra-prediction mode for the block to entropy encoding unit 56. Entropy encoding unit 56 may encode the information indicating the selected intra-prediction mode. Video encoder 20 may include in the transmitted bitstream configuration data, which may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables), definitions of encoding contexts for various blocks, and indications of a most probable intra-prediction mode, an intra-prediction mode index table, and a modified intra-prediction mode index table to use for each of the contexts.
[0119] Video encoder 20 forms a residual video block by subtracting the prediction data from mode select unit 40 from the original video block being coded. Summer 50 represents the component or components that perform this subtraction operation. Transform processing unit 52 applies a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform, to the residual block, producing a video block comprising residual transform coefficient values. Transform processing unit 52 may perform other transforms which are conceptually similar to DCT. Wavelet transforms, integer transforms, sub-band transforms or other types of transforms could also be used.
[0120] In any case, transform processing unit 52 applies the transform to the residual block, producing a block of residual transform coefficients. The transform may convert the residual information from a pixel value domain to a transform domain, such as a frequency domain. Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter. In some examples, quantization unit 54 may then perform a scan of the matrix including the quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan.
[0121] Following quantization, entropy encoding unit 56 entropy codes the quantized transform coefficients. For example, entropy encoding unit 56 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding or another entropy coding technique. In the case of context-based entropy coding, context may be based on neighboring blocks. Following the entropy coding by entropy encoding unit 56, the encoded bitstream may be transmitted to another device (e.g., video decoder 30) or archived for later transmission or retrieval.
[0122] Inverse quantization unit 58 and inverse transform unit 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block. Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the frames of reference picture memory 64. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 44 to produce a reconstructed video block for storage in reference picture memory 64. The reconstructed video block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-code a block in a subsequent video frame.
[0123] In this manner, video encoder 20 of FIG. 2 represents an example of a video coder configured to form, for a current block of the video data, a merge candidate list including a plurality of merge candidates, the plurality of merge candidates including four spatial neighboring candidates from four neighboring blocks to the current block and, immediately following the four spatial neighboring candidates, an advanced temporal motion vector prediction (ATMVP) candidate; code an index into the merge candidate list that identifies a merge candidate of the plurality of merge candidates in the merge candidate list; and code the current block of video data using motion information of the identified merge candidate.
[0124] FIG. 3 is a block diagram illustrating an example of video decoder 30 that may implement techniques for advanced temporal motion vector prediction (ATMVP). In the example of FIG. 3, video decoder 30 includes an entropy decoding unit 70, motion compensation unit 72, intra-prediction unit 74, inverse quantization unit 76, inverse transformation unit 78, reference picture memory 82 and summer 80. Video decoder 30 may, in some examples, perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 20 (FIG. 2). Motion compensation unit 72 may generate prediction data based on motion vectors received from entropy decoding unit 70, while intra-prediction unit 74 may generate prediction data based on intra-prediction mode indicators received from entropy decoding unit 70.
[0125] During the decoding process, video decoder 30 receives an encoded video bitstream that represents video blocks of an encoded video slice and associated syntax elements from video encoder 20. Entropy decoding unit 70 of video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors or intra-prediction mode indicators, and other syntax elements. Entropy decoding unit 70 forwards the motion vectors and other syntax elements to motion compensation unit 72. Video decoder 30 may receive the syntax elements at the video slice level and/or the video block level.
[0126] When the video slice is coded as an intra-coded (I) slice, intra-prediction unit 74 may generate prediction data for a video block of the current video slice based on a signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture. When the video frame is coded as an inter-coded (i.e., B, P or GPB) slice, motion compensation unit 72 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 70. The predictive blocks may be produced from one of the reference pictures within one of the reference picture lists. Video decoder 30 may construct the reference frame lists, List 0 and List 1, using default construction techniques based on reference pictures stored in reference picture memory 82. [0127] Motion compensation unit 72 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, motion compensation unit 72 uses some of the received syntax elements to determine a prediction mode (e.g., intra- or inter-prediction) used to code the video blocks of the video slice, an inter-prediction slice type (e.g., B slice, P slice, or GPB slice), construction information for one or more of the reference picture lists for the slice, motion vectors for each inter-encoded video block of the slice, inter-prediction status for each inter-coded video block of the slice, and other information to decode the video blocks in the current video slice. [0128] Motion compensation unit 72 may also perform interpolation based on interpolation filters. Motion compensation unit 72 may use interpolation filters as used by video encoder 20 during encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, motion compensation unit 72 may determine the interpolation filters used by video encoder 20 from the received syntax elements and use the interpolation filters to produce predictive blocks.
[0129] Video decoder 30 may be configured to perform any of the various techniques of this disclosure discussed above with respect to FIG. 1. For example, motion compensation unit 72 may be configured to determine whether motion information for a block of video data is coded using AMVP or merge mode in accordance with the techniques of this disclosure. More particularly, entropy decoding unit 70 may decode one or more syntax elements representing how motion information is coded for the current block.
[0130] Assuming that the syntax elements indicate that merge mode is performed, motion compensation unit 72 may form a candidate list including a set of merge candidates. Motion compensation unit 72 may add candidates to the candidate list based on a particular, predetermined order. In one example, motion compensation unit 72 adds the candidates to the candidate list in the order of A1, B1, B0, A0, then an advanced temporal motion vector prediction (ATMVP) candidate. Motion compensation unit 72 may also add additional candidates and perform pruning of the candidate list, as discussed above. Ultimately, motion compensation unit 72 may decode a merge index representing which of the candidates is used to code motion information for the current block.
[0131] Furthermore, in some examples, motion compensation unit 72 may first determine whether the ATMVP candidate is available. For example, motion compensation unit 72 may determine a block corresponding to the current block in a reference picture and determine whether motion information is available for the corresponding block. Motion compensation unit 72 may then determine that the ATMVP candidate (that is, the corresponding block) is available when motion information is available for the corresponding block. In some examples, motion compensation unit 72 may determine that motion information is available for the corresponding block when the entire corresponding block is predicted without the use of intra-prediction, but is not available when at least part of the corresponding block is predicted using intra-prediction.
[0132] Similarly, in some examples, motion compensation unit 72 may determine which of two potential ATMVP candidates should be used as the ATMVP candidate ultimately added to the candidate list. For example, motion compensation unit 72 may form a first temporal motion vector relative to the current block that identifies a first ATMVP candidate in a first motion source picture, that is, a first reference picture. If motion information is not available for the first ATMVP candidate, motion compensation unit 72 may determine whether motion information is available for a second, different ATMVP candidate. The second ATMVP candidate may be identified using the same temporal motion vector referring to a second, different reference picture, a different temporal motion vector referring to the same (i.e., first) reference picture, or a different temporal motion vector referring to the second, different reference picture. The reference pictures to be checked, as discussed above, may be in ascending order of reference indexes in a reference picture list. Likewise, if different temporal motion vectors are used, the temporal motion vectors may be selected in a predetermined order from temporal vectors of neighboring blocks to the current block.
[0133] Furthermore, motion compensation unit 72 may determine whether a motion vector is available for a sub-PU in the ATMVP candidate for a particular reference picture list. If so, the motion vector is considered available for that reference picture list. Otherwise, the motion vector is considered unavailable for that reference picture list. Alternatively, if a motion vector is available for the other reference picture list, motion compensation unit 72 may modify the motion information by scaling the motion vector to point to a target reference picture in the first reference picture list, as discussed above.
[0134] Inverse quantization unit 76 inverse quantizes, i.e., de-quantizes, quantized transform coefficients provided in the bitstream and entropy decoded by entropy decoding unit 70. The inverse quantization process may include use of a quantization parameter QPY calculated by video decoder 30 for each video block in the video slice to determine a degree of quantization and, likewise, a degree of inverse quantization that should be applied.
[0135] Inverse transform unit 78 applies an inverse transform, e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to produce residual blocks in the pixel domain.
[0136] After motion compensation unit 72 generates the predictive block for the current video block based on the motion vectors and other syntax elements, video decoder 30 forms a decoded video block by summing the residual blocks from inverse transform unit 78 with the corresponding predictive blocks generated by motion compensation unit 72. Summer 80 represents the component or components that perform this summation operation. If desired, a deblocking filter may also be applied to filter the decoded blocks in order to remove blockiness artifacts. Other loop filters (either in the coding loop or after the coding loop) may also be used to smooth pixel transitions, or otherwise improve the video quality. The decoded video blocks in a given frame or picture are then stored in reference picture memory 82, which stores reference pictures used for subsequent motion compensation. Reference picture memory 82 also stores decoded video for later presentation on a display device, such as display device 32 of FIG. 1.
[0137] In this manner, video decoder 30 represents an example of a video coder configured to form, for a current block of the video data, a merge candidate list including a plurality of merge candidates, the plurality of merge candidates including four spatial neighboring candidates from four neighboring blocks to the current block and, immediately following the four spatial neighboring candidates, an advanced temporal motion vector prediction (ATMVP) candidate; code an index into the merge candidate list that identifies a merge candidate of the plurality of merge candidates in the merge candidate list; and code the current block of video data using motion information of the identified merge candidate.
[0138] FIG. 4 is a conceptual diagram illustrating spatial neighboring candidates in HEVC. Spatial MV candidates are derived from the neighboring blocks shown in FIG. 4 for a specific PU (PU0), although the methods of generating the candidates from the blocks differ for merge and AMVP modes.
[0139] In merge mode, up to four spatial MV candidates can be derived in the order shown with numbers in FIG. 4(a), and the order is the following: left (0, A1), above (1, B1), above-right (2, B0), below-left (3, A0), and above-left (4, B2), as shown in FIG. 4(a). That is, in FIG. 4(a), block 100 includes PU0 104A and PU1 104B. When a video coder is to code motion information for PU0 104A using merge mode, the video coder adds motion information from spatial neighboring blocks 108A, 108B, 108C, 108D, and 108E to a candidate list, in that order. Blocks 108A, 108B, 108C, 108D, and 108E may also be referred to as, respectively, blocks A1, B1, B0, A0, and B2, as in HEVC.
[0140] In AMVP mode, the neighboring blocks are divided into two groups: a left group including blocks 0 and 1, and an above group including blocks 2, 3, and 4, as shown in FIG. 4(b). These blocks are labeled, respectively, as blocks 110A, 110B, 110C, 110D, and 110E in FIG. 4(b). In particular, in FIG. 4(b), block 102 includes PU0 106A and PU1 106B, and blocks 110A, 110B, 110C, 110D, and 110E represent spatial neighbors to PU0 106A. For each group, the potential candidate in a neighboring block referring to the same reference picture as that indicated by the signaled reference index has the highest priority to be chosen to form a final candidate of the group. It is possible that no neighboring block contains a motion vector pointing to the same reference picture. Therefore, if such a candidate cannot be found, the first available candidate will be scaled to form the final candidate; thus, the temporal distance differences can be compensated.
[0141] FIG. 5 is a conceptual diagram illustrating temporal motion vector prediction in HEVC. In particular, FIG. 5(a) illustrates an example CU 120 including PU0 122A and PU1 122B. PU0 122A includes a center block 126 for PU0 122A and a bottom-right block 124 to PU0 122A. FIG. 5(a) also shows an external block 128 for which motion information may be predicted from motion information of PU0 122A, as discussed below. FIG. 5(b) illustrates a current picture 130 including a current block 138 for which motion information is to be predicted. In particular, FIG. 5(b) illustrates a collocated picture 134 to current picture 130 (including collocated block 140 to current block 138), a current reference picture 132, and a collocated reference picture 136.
Collocated block 140 is predicted using motion vector 144, which is used as a temporal motion vector predictor (TMVP) 142 for motion information of block 138.
[0142] A video coder may add a TMVP candidate (e.g., TMVP candidate 142) into the MV candidate list after any spatial motion vector candidates if TMVP is enabled and the TMVP candidate is available. The process of motion vector derivation for the TMVP candidate is the same for both merge and AMVP modes. However, the target reference index for the TMVP candidate in merge mode is set to 0, according to HEVC. [0143] The primary block location for the TMVP candidate derivation is the bottom-right block outside of the collocated PU, shown in FIG. 5(a) as block 124 to PU0 122A, to compensate for the bias towards the above and left blocks used to generate spatial neighboring candidates. However, if block 124 is located outside of the current CTB row or motion information is not available for block 124, the block is substituted with center block 126 of the PU, as shown in FIG. 5(a).
[0144] The motion vector for TMVP candidate 142 is derived from co-located block 140 of co-located picture 134, as indicated in slice level information.
[0145] Similar to temporal direct mode in AVC, a motion vector of the TMVP candidate may be subject to motion vector scaling, which is performed to compensate for the picture order count (POC) distance differences between current picture 130 and current reference picture 132, and between collocated picture 134 and collocated reference picture 136. That is, motion vector 144 may be scaled to produce TMVP candidate 142, based on these POC differences.
[0146] Several aspects of merge and AMVP modes of HEVC are discussed below. [0147] Motion vector scaling: It is assumed that the value of a motion vector is proportional to the distance between pictures in presentation time. A motion vector associates two pictures: the reference picture and the picture containing the motion vector (namely the containing picture). When a motion vector is used by video encoder 20 or video decoder 30 to predict another motion vector, the distance between the containing picture and the reference picture is calculated based on Picture Order Count (POC) values.
[0148] For a motion vector to be predicted, its associated containing picture and reference picture are different. That is, there are two POC difference values for two distinct motion vectors: a first motion vector to be predicted, and a second motion vector used to predict the first motion vector. Moreover, the first POC difference is the difference between the current picture and the reference picture of the first motion vector, and the second POC difference is the difference between the picture containing the second motion vector and the reference picture to which the second motion vector refers. The second motion vector may be scaled based on these two POC distances. For a spatial neighboring candidate, the containing pictures for the two motion vectors are the same, while the reference pictures are different. In HEVC, motion vector scaling applies to both TMVP and AMVP for spatial and temporal neighboring candidates. [0149] Artificial motion vector candidate generation: If a motion vector candidate list is not complete, artificial motion vector candidates may be generated and inserted at the end of the list until the list includes a predetermined number of candidates.
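The POC-based scaling described in paragraphs [0147]-[0148] can be sketched with HEVC-style fixed-point arithmetic, as below. This is an informal illustration rather than a normative excerpt; it assumes 16-bit motion vector components and C++17 for std::clamp:

#include <algorithm>
#include <cstdlib>

// Scale one motion vector component mv, whose containing picture /
// reference picture POC distance is td, into a predictor for a motion
// vector whose POC distance is tb.
int scaleMv(int mv, int currPoc, int currRefPoc,
            int containingPoc, int containingRefPoc) {
  int td = std::clamp(containingPoc - containingRefPoc, -128, 127);
  int tb = std::clamp(currPoc - currRefPoc, -128, 127);
  if (td == 0) return mv;                 // degenerate case: no scaling
  int tx = (16384 + (std::abs(td) >> 1)) / td;
  int distScaleFactor = std::clamp((tb * tx + 32) >> 6, -4096, 4095);
  int scaled = distScaleFactor * mv;
  // Round away from zero, then clip to the 16-bit motion vector range.
  scaled = (scaled >= 0 ? (std::abs(scaled) + 127) >> 8
                        : -((std::abs(scaled) + 127) >> 8));
  return std::clamp(scaled, -32768, 32767);
}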
[0150] In merge mode, there are two types of artificial MV candidates: combined candidates, derived only for B-slices, and zero candidates, used only for AMVP if the first type does not provide enough artificial candidates.
[0151] For each pair of candidates that are already in the candidate list and have the necessary motion information, bi-directional combined motion vector candidates are derived by a combination of the motion vector of the first candidate referring to a picture in list 0 and the motion vector of a second candidate referring to a picture in list 1.
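A sketch of this combination step, with an illustrative Candidate structure (list-0 motion taken from one existing candidate and list-1 motion from another):

struct Candidate {
  int mv[2][2];
  int refIdx[2];   // refIdx[X] < 0 means list X unused
};

// Combine the list-0 motion of candidate a with the list-1 motion of
// candidate b into a bi-directional candidate; returns false if either
// list is missing.
bool combineBiPred(const Candidate& a, const Candidate& b, Candidate& out) {
  if (a.refIdx[0] < 0 || b.refIdx[1] < 0) return false;  // need L0 + L1
  out.refIdx[0] = a.refIdx[0];
  out.mv[0][0] = a.mv[0][0]; out.mv[0][1] = a.mv[0][1];  // list 0 from a
  out.refIdx[1] = b.refIdx[1];
  out.mv[1][0] = b.mv[1][0]; out.mv[1][1] = b.mv[1][1];  // list 1 from b
  return true;
}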
[0152] Pruning process for candidate insertion: Candidates from different blocks may happen to be the same, which decreases the efficiency of a merge/AMVP candidate list. A pruning process may be applied to solve this problem. According to the pruning process, a video coder compares one candidate to the others in the current candidate list to avoid inserting an identical candidate, to a certain extent. To reduce complexity, only a limited number of pruning processes are applied, instead of comparing each potential candidate with all other existing candidates already in the list.
[0153] FIG. 6 illustrates an example prediction structure for 3D-HEVC. 3D-HEVC is a 3D video extension of HEVC under development by JCT-3V. Certain techniques related to the techniques of this disclosure are described with respect to FIGS. 6 and 7 below.
[0154] FIG. 6 shows a multiview prediction structure for a three-view case. V3 denotes the base view, and a picture in a non-base view (V1 or V5) can be predicted from pictures in a dependent (base) view of the same time instance.
[0155] Inter-view sample prediction (from reconstructed samples) is supported in MV-HEVC, a typical prediction structure of which is shown in FIG. 8.
[0156] Both MV-HEVC and 3D-HEVC are compatible with HEVC in a way that the base (texture) view is decodable by an HEVC (version 1) decoder. A test model for MV-HEVC and 3D-HEVC is described in Zhang et al., "Test Model 6 of 3D-HEVC and MV-HEVC," JCT-3V document ISO/IEC JTC1/SC29/WG11 N13940, available at the website mpeg.chiariglione.org/standards/mpeg-h/high-efficiency-video-coding/test-model-6-3d-hevc-and-mv-hevc as of January 26, 2015.
[0157] In MV-HEVC, a current picture in a non-base view may be predicted by both pictures in the same view and pictures in a reference view of the same time instance, by putting all of these pictures in reference picture lists of the picture. Therefore, a reference picture list of the current picture contains both temporal reference pictures and inter-view reference pictures.
[0158] A motion vector associated with a reference index corresponding to a temporal reference picture is denoted a temporal motion vector.
[0159] A motion vector associated with a reference index corresponding to an inter-view reference picture is denoted a disparity motion vector.
[0160] 3D-HEVC supports all features in MV-HEVC. Therefore, inter-view sample prediction as mentioned above is enabled.
[0161] In addition, more advanced texture-only coding tools and depth-related/dependent coding tools are supported.
[0162] The texture-only coding tools often require the identification of the corresponding blocks (between views) that may belong to the same object. Therefore, disparity vector derivation is a basic technology in 3D-HEVC.
[0163] FIG. 7 is a conceptual diagram illustrating sub-PU based inter-view motion prediction in 3D-HEVC. FIG. 7 shows current picture 160 of a current view (V1) and a collocated picture 162 in a reference view (V0). Current picture 160 includes a current PU 164 including four sub-PUs 166A-166D (sub-PUs 166). Respective disparity vectors 174A-174D (disparity vectors 174) identify corresponding sub-PUs 168A-168D to sub-PUs 166 in collocated picture 162. 3D-HEVC includes a sub-PU level inter-view motion prediction method for the inter-view merge candidate, i.e., the candidate derived from a reference block in the reference view.
[0164] When such a mode is enabled, current PU 164 may correspond to a reference area (with the same size as the current PU, identified by the disparity vector) in the reference view, and the reference area may have richer motion information than needed for generation of one set of motion information typically for a PU. Therefore, a sub-PU level inter-view motion prediction (SPIVMP) method may be used, as shown in FIG. 7.
[0165] This mode may also be signaled as a special merge candidate. Each of the sub-PUs contains a full set of motion information. Therefore, a PU may contain multiple sets of motion information.
[0166] Sub-PU based motion parameter inheritance (MPI) in 3D-HEVC: Similarly, in 3D-HEVC, the MPI candidate can also be extended in a way similar to sub-PU level inter-view motion prediction. For example, if the current depth PU has a co-located region which contains multiple PUs, the current depth PU may be separated into sub-PUs, each of which may have a different set of motion information. This method is called sub-PU MPI. That is, motion vectors 172A-172D of corresponding sub-PUs 168A-168D may be inherited by sub-PUs 166A-166D, as motion vectors 170A-170D, as shown in FIG. 7.
[0167] Sub-PU related information for 2D video coding: In U.S. Patent Application Serial No. 61/883,111, which is hereby incorporated by reference in its entirety, a sub-PU based advanced TMVP design is described. In single-layer coding, a two-stage advanced temporal motion vector prediction design is proposed.
[0168] A first stage is to derive a vector identifying the corresponding block of the current prediction unit (PU) in a reference picture, and a second stage is to extract multiple sets of motion information from the corresponding block and assign them to sub-PUs of the PU. Each sub-PU of the PU is therefore motion compensated separately. The concept of the ATMVP is summarized as follows (a sketch of the two stages follows the list):
1. The vector in the first stage can be derived from spatial and temporal neighboring blocks of the current PU.
2. This process may be achieved by activating a merge candidate among all the other merge candidates.
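As a rough illustration of the two stages (nothing here is taken verbatim from the 61/883,111 application; in particular, the neighbor-selection rule in stage 1 is deliberately simplified):

```python
def atmvp_two_stage(cur_pos, cur_size, sub_size, neighbor_mvs, motion_source):
    """Two-stage advanced TMVP sketch.

    Stage 1: pick a temporal vector from spatial/temporal neighbors of
    the current PU (here simply the first available neighbor vector,
    falling back to the zero vector).
    Stage 2: for each sub-PU, read the motion information stored at the
    corresponding position in the motion source picture.
    motion_source(x, y) returns the motion info at that position or None.
    """
    # Stage 1: derive the vector identifying the corresponding block.
    temporal_vector = next((mv for mv in neighbor_mvs if mv is not None), (0, 0))

    # Stage 2: extract one set of motion information per sub-PU.
    sub_pu_motion = {}
    x0, y0 = cur_pos
    w, h = cur_size
    for y in range(y0, y0 + h, sub_size):
        for x in range(x0, x0 + w, sub_size):
            src = motion_source(x + temporal_vector[0], y + temporal_vector[1])
            if src is not None:
                sub_pu_motion[(x, y)] = src  # each sub-PU motion compensated separately
    return temporal_vector, sub_pu_motion

# Toy motion source: constant motion (2, 0) toward reference 0 everywhere.
tv, per_sub = atmvp_two_stage((0, 0), (16, 16), 8, [None, (3, 1)],
                              lambda x, y: {"mv": (2, 0), "ref_idx": 0})
print(tv, per_sub[(0, 0)])  # -> (3, 1) {'mv': (2, 0), 'ref_idx': 0}
```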
[0169] As applicable to single-layer coding and sub-PU temporal motion vector prediction, a PU or CU may have motion refinement data to be conveyed on top of the predictors.
[0170] Several design aspects of the 61/883,111 application are highlighted as follows:
1. The first stage of vector derivation can also be simplified to just a zero vector.
2. The first stage of vector derivation may include jointly identifying the motion vector and its associated picture. Various ways of selecting the associated picture and further deciding the motion vector to be the first stage vector have been proposed.
3. If the motion information during the above process is unavailable, the "first stage vector" is used for substitution.
4. A motion vector identified from a temporal neighbor has to be scaled to be used for the current sub-PU, in a way similar to motion vector scaling in TMVP. However, the reference picture to which such a motion vector is scaled can be designed in one of the following ways (a sketch of the scaling step follows the list):
a. The picture is identified by a fixed reference index of the current picture.
b. The picture is identified to be the reference picture of the corresponding temporal neighbor, if also available in a reference picture list of the current picture.
c. The picture is set to be the co-located picture identified in the first stage and from which the motion vectors are obtained.
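A minimal sketch of the TMVP-style scaling step, assuming the usual picture-order-count (POC) distance ratio; options a-c above decide how poc_target is chosen. Note this uses floating point for clarity, whereas real HEVC scaling uses clipped fixed-point arithmetic:

```python
def scale_motion_vector(mv, poc_src_cur, poc_src_ref, poc_cur, poc_target):
    """Scale mv by the ratio of POC distances, TMVP style.

    mv points from the source (co-located) picture at poc_src_cur to its
    reference at poc_src_ref; the scaled vector points from the current
    picture at poc_cur to the chosen target reference at poc_target.
    """
    src_dist = poc_src_cur - poc_src_ref
    cur_dist = poc_cur - poc_target
    if src_dist == 0:          # degenerate case: nothing to scale by
        return mv
    factor = cur_dist / src_dist
    return (round(mv[0] * factor), round(mv[1] * factor))

# Example: the co-located motion vector spans 4 POCs; the current block's
# target reference is only 2 POCs away, so the vector is halved.
print(scale_motion_vector((8, -4), poc_src_cur=8, poc_src_ref=4,
                          poc_cur=10, poc_target=8))  # -> (4, -2)
```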
[0171] FIG. 8 is a conceptual diagram illustrating sub-PU motion prediction from a reference picture. In this example, current picture 180 includes a current PU 184 (e.g., a PU). In this example, motion vector 192 identifies PU 186 of reference picture 182 relative to PU 184. PU 186 is partitioned into sub-PUs 188A-188D, each having respective motion vectors 190A-190D. Thus, although current PU 184 is not actually partitioned into separate sub-PUs, in this example, current PU 184 may be predicted using motion information from sub-PUs 188A-188D. In particular, a video coder may code sub-PUs of current PU 184 using respective motion vectors 190A-190D. However, the video coder need not code syntax elements indicating that current PU 184 is split into sub-PUs. In this manner, current PU 184 may be effectively predicted using multiple motion vectors 190A-190D, inherited from respective sub-PUs 188A-188D, without the signaling overhead of the syntax elements used to split current PU 184 into multiple sub-PUs.
[0172] FIG. 9 is a conceptual diagram illustrating relevant pictures in ATMVP (similar to TMVP). In particular, FIG. 9 illustrates current picture 204, motion source picture 206, and reference pictures 200, 202. More particularly, current picture 204 includes current block 208. Temporal motion vector 212 identifies corresponding block 210 of motion source picture 206 relative to current block 208. Corresponding block 210, in turn, includes motion vector 214, which refers to reference picture 202 and acts as an advanced temporal motion vector predictor for at least a portion of current block 208, e.g., a sub-PU of current block 208. That is, motion vector 214 may be added as a candidate motion vector predictor for current block 208. If selected, at least a portion of current block 208 may be predicted using a corresponding motion vector, namely, motion vector 216, which refers to reference picture 200.
[0173] FIG. 10 is a flowchart illustrating an example method for adding an ATMVP candidate to a candidate list during an encoding process in accordance with the techniques of this disclosure. The method of FIG. 10 is described as being performed by video encoder 20 (FIGS. 1 and 2). It should be understood, however, that other encoding devices may be configured to perform this or a similar method.
[0174] Initially, video encoder 20 obtains a block of video data to be encoded (not shown in FIG. 10). The block may include a set of spatial neighbors, such as those shown in FIG. 4(a) and FIG. 4(b). Motion compensation unit 44 may construct the candidate list by first adding a left spatial candidate to the candidate list (250). That is, with respect to FIG. 4(a), assuming PU0 104A is the block to be encoded, video encoder 20 may first insert spatial neighbor 108A into the candidate list.
[0175] Next, motion compensation unit 44 may add the above spatial candidate to the candidate list (252). With respect to FIG. 4(a), video encoder 20 may insert spatial neighbor 108B into the candidate list.
[0176] Next, motion compensation unit 44 may add the above-right spatial candidate to the candidate list (254). With respect to FIG. 4(a), video encoder 20 may insert spatial neighbor 108C into the candidate list.
[0177] Next, motion compensation unit 44 may add the below-left spatial candidate to the candidate list (256). With respect to FIG. 4(a), video encoder 20 may insert spatial neighbor 108D into the candidate list.
[0178] Next, motion compensation unit 44 may add an advanced temporal motion vector predictor (ATMVP) candidate into the candidate list (258). As discussed above, the ATMVP candidate may represent a corresponding block identified by a temporal vector, as shown in and discussed with respect to, e.g., FIGS. 8 and 9. Furthermore, in some examples, motion compensation unit 44 may first determine whether the ATMVP candidate is available. For example, motion compensation unit 44 may determine a corresponding block to the current block in a reference picture and determine whether motion information is available for the corresponding block. Motion compensation unit 44 may then determine that the ATMVP candidate (that is, the corresponding block) is available when motion information is available for the corresponding block. In some examples, motion compensation unit 44 may determine that motion information is available for the corresponding block when the entire corresponding block is predicted without the use of intra-prediction, but is not available when at least part of the corresponding block is predicted using intra-prediction.
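A compact sketch of steps 250-258 (hypothetical Python; the function and argument names are illustrative, not from the disclosure), with the availability rule from the preceding paragraph folded in:

```python
def atmvp_available(corresponding_block_modes):
    """Per the text above: the ATMVP candidate is available only when the
    entire corresponding block is predicted without intra-prediction."""
    return all(mode != "intra" for mode in corresponding_block_modes)

def build_merge_candidate_list(left, above, above_right, below_left, atmvp_candidate):
    """Assemble merge candidates in the order of FIG. 10.

    Each spatial argument is that neighbor's motion information, or None
    if unavailable; atmvp_candidate() returns the ATMVP candidate or None.
    The ATMVP candidate immediately follows the spatial candidates.
    """
    candidates = []
    for spatial in (left, above, above_right, below_left):  # steps 250-256
        if spatial is not None:
            candidates.append(spatial)
    atmvp = atmvp_candidate()                               # step 258
    if atmvp is not None:
        candidates.append(atmvp)
    return candidates

# Example: the above-right neighbor is unavailable; the corresponding
# block is fully inter-coded, so the ATMVP candidate is appended.
cands = build_merge_candidate_list(
    "left", "above", None, "below-left",
    lambda: "ATMVP" if atmvp_available(["inter"] * 4) else None)
print(cands)  # -> ['left', 'above', 'below-left', 'ATMVP']
```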
[0179] Similarly, in some examples, motion compensation unit 44 may determine which of two potential ATMVP candidates should be used as the ATMVP candidate ultimately added to the candidate list. For example, motion compensation unit 44 may form a first temporal motion vector relative to the current block that identifies a first ATMVP candidate in a first motion source picture, that is, a first reference picture. If motion information is not available for the first ATMVP candidate, motion compensation unit 44 may determine whether motion information is available for a second, different ATMVP candidate. The second ATMVP candidate may be identified using the same temporal motion vector referring to a second, different reference picture, a different temporal motion vector referring to the same (i.e., first) reference picture, or a different temporal motion vector referring to the second, different reference picture. The reference pictures to be checked, as discussed above, may be in ascending order of reference indexes in a reference picture list. Likewise, if different temporal motion vectors are used, the temporal motion vectors may be selected in a predetermined order from temporal vectors of neighboring blocks to the current block.
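The fallback between the two potential candidates can be pictured as a small search loop; this is a sketch under the stated checking orders (the probe function and the loop nesting are assumptions, not mandated by the text):

```python
def find_atmvp_candidate(temporal_vectors, reference_pictures, probe):
    """Search for the first available ATMVP candidate.

    temporal_vectors: candidate temporal vectors in a predetermined order
    (e.g., taken from neighboring blocks); reference_pictures: motion
    source pictures in ascending reference-index order; probe(tv, pic)
    returns the candidate's motion information or None if unavailable.
    """
    for pic in reference_pictures:          # ascending reference index
        for tv in temporal_vectors:         # predetermined neighbor order
            motion = probe(tv, pic)
            if motion is not None:
                return motion               # first available candidate wins
    return None                             # no ATMVP candidate available

# Only the second reference picture has usable motion at the probed spot.
probe = lambda tv, pic: ("mv", pic) if pic == "ref1" else None
print(find_atmvp_candidate([(1, 0)], ["ref0", "ref1"], probe))  # -> ('mv', 'ref1')
```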
[0180] Furthermore, motion compensation unit 44 may determine whether a motion vector is available for a sub-PU in the ATMVP candidate for a particular reference picture list. If so, the motion vector is considered to be available for that reference picture list. Otherwise, the motion vector is considered to be unavailable for that reference picture list. Alternatively, if a motion vector is available for the other reference picture list, motion compensation unit 44 may modify the motion information by scaling the motion vector to point to a target reference picture in the first reference picture list, as discussed above.
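A minimal sketch of that per-list availability rule, assuming a POC-style scaling helper like the one sketched earlier (the dictionary layout is invented for illustration):

```python
def motion_for_list(sub_pu, list_x, list_y, scale_to_target):
    """Resolve a sub-PU's motion vector for reference picture list X.

    sub_pu maps a list name to its motion vector or None. If list X has
    no vector but list Y does, the list-Y vector is scaled toward a
    target reference picture in list X (scale_to_target stands in for
    the POC-ratio scaling sketched earlier).
    """
    if sub_pu.get(list_x) is not None:
        return sub_pu[list_x]                    # available as-is for list X
    if sub_pu.get(list_y) is not None:
        return scale_to_target(sub_pu[list_y])   # borrow and scale from list Y
    return None                                  # unavailable for list X

# Example: no list-0 vector, so the list-1 vector is halved toward the
# list-0 target picture.
print(motion_for_list({'L0': None, 'L1': (8, 2)}, 'L0', 'L1',
                      lambda mv: (mv[0] // 2, mv[1] // 2)))  # -> (4, 1)
```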
[0181] Video encoder 20 may then select one of the candidates from the candidate list (260). For example, video encoder 20 may test encoding of the block using any or all of the candidates in the candidate list. Additionally or alternatively, motion estimation unit 42 of video encoder 20 may perform a motion search and determine a motion vector for the block, and determine whether to encode the motion vector using advanced motion vector prediction (AMVP) or merge mode. In the example of FIG. 10, it is assumed that video encoder 20 has elected to encode motion information using merge mode. In general, video encoder 20 (more particularly, mode select unit 40) may determine which of the candidates in the candidate list yields the best rate-distortion characteristics, and select that candidate to be used to predict the block.
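The disclosure does not fix a particular cost function; a generic Lagrangian rate-distortion selection, as commonly used in encoders, might look like this sketch (all names hypothetical):

```python
def select_merge_candidate(candidates, distortion, bits, lam):
    """Pick the merge candidate with the best rate-distortion cost.

    distortion(c) and bits(c) measure the reconstruction error and the
    signaling cost of candidate c; lam is the Lagrange multiplier, so
    the cost is the usual J = D + lambda * R.
    """
    costs = [(distortion(c) + lam * bits(c), idx)
             for idx, c in enumerate(candidates)]
    best_cost, best_idx = min(costs)
    return best_idx, best_cost

best_idx, best_cost = select_merge_candidate(
    ["cand0", "cand1"],
    distortion=lambda c: {"cand0": 100, "cand1": 90}[c],
    bits=lambda c: 2, lam=1.0)
print(best_idx, best_cost)  # -> 1 92.0
```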
[0182] Accordingly, video encoder 20 may predict the current block using the selected candidate (262). That is, motion compensation unit 44 may retrieve one or more reference blocks identified by motion information of the selected candidate, and in some examples may interpolate values for fractional pixels if the motion information has sub-pixel precision.
[0183] Video encoder 20 may then form a residual block for the current block (264). As discussed above, summer 50 may calculate pixel-by-pixel differences between the current block and the predicted block, forming the residual block. Video encoder 20 may then encode residual information of the residual block and encode a merge index (266). That is, transform processing unit 52 may transform the residual block to produce transform coefficients representing the residual information. Quantization unit 54 may then quantize the transform coefficients. Entropy encoding unit 56 may then entropy encode the quantized transform coefficients, as well as syntax elements representative of the motion information coding mode (merge mode, in this example) and the merge index representing the selected candidate from the candidate list.
[0184] In this manner, the method of FIG. 10 represents an example of a method including forming, for a current block of video data, a merge candidate list including a plurality of merge candidates, the plurality of merge candidates including four spatial neighboring candidates from four neighboring blocks to the current block and, immediately following the four spatial neighboring candidates, an advanced temporal motion vector prediction (ATMVP) candidate, coding an index into the merge candidate list that identifies a merge candidate of the plurality of merge candidates in the merge candidate list, and coding the current block of video data using motion information of the identified merge candidate.
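Steps 264-266 reduce to a short pipeline; a sketch with the transform, quantization, and entropy-coding stages passed in as stand-in callables (not the actual units 52, 54, and 56):

```python
def form_and_encode_residual(current, predicted, transform, quantize, entropy_encode):
    """Encoder-side steps 264-266 in outline form.

    current/predicted are equal-sized 2-D lists of samples; transform,
    quantize, and entropy_encode stand in for the transform processing,
    quantization, and entropy encoding stages.
    """
    residual = [[c - p for c, p in zip(crow, prow)]   # pixel-by-pixel differences
                for crow, prow in zip(current, predicted)]
    coeffs = quantize(transform(residual))            # transform, then quantize
    return entropy_encode(coeffs)                     # plus mode + merge index syntax

bits = form_and_encode_residual([[5, 5]], [[3, 4]],
                                transform=lambda r: r, quantize=lambda c: c,
                                entropy_encode=lambda c: str(c))
print(bits)  # -> [[2, 1]]
```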
[0185] FIG. 11 is a flowchart illustrating an example method for adding an ATMVP candidate to a candidate list during a decoding process in accordance with the techniques of this disclosure. The method of FIG. 11 is described as being performed by video decoder 30 (FIGS. 1 and 3). It should be understood, however, that other decoding devices may be configured to perform this or a similar method.
[0186] Initially, video decoder 30 obtains a block of video data to be decoded (not shown in FIG. 11). The block may include a set of spatial neighbors, such as those shown in FIG. 4(a) and FIG. 4(b). Motion compensation unit 72 may construct the candidate list by first adding a left spatial candidate to the candidate list (270). That is, with respect to FIG. 4(a), assuming PU0 104A is the block to be decoded, video decoder 30 may first insert spatial neighbor 108A into the candidate list.
[0187] Next, motion compensation unit 72 may add the above spatial candidate to the candidate list (272). With respect to FIG. 4(a), video decoder 30 may insert spatial neighbor 108B into the candidate list.
[0188] Next, motion compensation unit 72 may add the above-right spatial candidate to the candidate list (274). With respect to FIG. 4(a), video decoder 30 may insert spatial neighbor 108C into the candidate list.
[0189] Next, motion compensation unit 72 may add the below-left spatial candidate to the candidate list (276). With respect to FIG. 4(a), video decoder 30 may insert spatial neighbor 108D into the candidate list.
[0190] Next, motion compensation unit 72 may add an advanced temporal motion vector predictor (ATMVP) candidate into the candidate list (278). As discussed above, the ATMVP candidate may represent a corresponding block identified by a temporal vector, as shown in and discussed with respect to, e.g., FIGS. 8 and 9. Furthermore, in some examples, motion compensation unit 72 may first determine whether the ATMVP candidate is available. For example, motion compensation unit 72 may determine a corresponding block to the current block in a reference picture and determine whether motion information is available for the corresponding block. Motion compensation unit 72 may then determine that the ATMVP candidate (that is, the corresponding block) is available when motion information is available for the corresponding block. In some examples, motion compensation unit 72 may determine that motion information is available for the corresponding block when the entire corresponding block is predicted without the use of intra-prediction, but is not available when at least part of the corresponding block is predicted using intra-prediction.
[0191] Similarly, in some examples, motion compensation unit 72 may determine which of two potential ATMVP candidates should be used as the ATMVP candidate ultimately added to the candidate list. For example, motion compensation unit 72 may form a first temporal motion vector relative to the current block that identifies a first ATMVP candidate in a first motion source picture, that is, a first reference picture. If motion information is not available for the first ATMVP candidate, motion compensation unit 72 may determine whether motion information is available for a second, different ATMVP candidate. The second ATMVP candidate may be identified using the same temporal motion vector referring to a second, different reference picture, a different temporal motion vector referring to the same (i.e., first) reference picture, or a different temporal motion vector referring to the second, different reference picture. The reference pictures to be checked, as discussed above, may be in ascending order of reference indexes in a reference picture list. Likewise, if different temporal motion vectors are used, the temporal motion vectors may be selected in a predetermined order from temporal vectors of neighboring blocks to the current block.
[0192] Furthermore, motion compensation unit 72 may determine whether a motion vector is available for a sub-PU in the ATMVP candidate for a particular reference picture list. If so, the motion vector is considered to be available for that reference picture list. Otherwise, the motion vector is considered to be unavailable for that reference picture list. Alternatively, if a motion vector is available for the other reference picture list, motion compensation unit 72 may modify the motion information by scaling the motion vector to point to a target reference picture in the first reference picture list, as discussed above.
[0193] Video decoder 30 may then decode a merge index to select one of the candidates from the candidate list (280). More particularly, entropy decoding unit 70 of video decoder 30 may entropy decode one or more syntax elements representing whether motion information of a current block is encoded using merge mode, as well as a merge index representing a selected candidate from the candidate list.
[0194] Accordingly, video decoder 30 may predict the current block using the selected candidate (282). That is, motion compensation unit 72 may retrieve one or more reference blocks identified by motion information of the selected candidate, and in some examples may interpolate values for fractional pixels if the motion information has sub-pixel precision.
[0195] Video decoder 30 may also decode a residual block for the current block (284). In particular, entropy decoding unit 70 may decode quantized transform coefficients, which inverse quantization unit 76 may inverse quantize to form a transform block. Inverse transform unit 78 may then inverse transform the transform block to reproduce the residual block. Summer 80 may then combine the predicted block with the residual block to decode the current block (286), in particular, by reconstructing the current block.
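The decoder-side mirror of the earlier encoder pipeline, again with stand-in callables for entropy decoding, inverse quantization, and the inverse transform (a sketch, not the actual units 70, 76, 78, and 80):

```python
def reconstruct_block(coeff_bits, predicted, entropy_decode, dequantize,
                      inverse_transform):
    """Decoder-side steps 284-286 in outline form.

    entropy_decode, dequantize, and inverse_transform stand in for the
    entropy decoding, inverse quantization, and inverse transform
    stages; predicted is the block built from the selected candidate.
    """
    residual = inverse_transform(dequantize(entropy_decode(coeff_bits)))
    # Summing prediction and residual reconstructs the current block.
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(predicted, residual)]

out = reconstruct_block(b"...", [[3, 4]],
                        entropy_decode=lambda b: [[2, 1]],
                        dequantize=lambda c: c, inverse_transform=lambda c: c)
print(out)  # -> [[5, 5]]
```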
[0196] In this manner, the method of FIG. 11 represents an example of a method including forming, for a current block of video data, a merge candidate list including a plurality of merge candidates, the plurality of merge candidates including four spatial neighboring candidates from four neighboring blocks to the current block and, immediately following the four spatial neighboring candidates, an advanced temporal motion vector prediction (ATMVP) candidate, coding an index into the merge candidate list that identifies a merge candidate of the plurality of merge candidates in the merge candidate list, and coding the current block of video data using motion information of the identified merge candidate.
[0197] It is to be recognized that, depending on the example, certain acts or events of any of the techniques described herein can be performed in a different sequence, and may be added, merged, or left out altogether (e.g., not all described acts or events are necessary for the practice of the techniques). Moreover, in certain examples, acts or events may be performed concurrently, e.g., through multi-threaded processing, interrupt processing, or multiple processors, rather than sequentially.
[0198] In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media, which is non-transitory, or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.
[0199] By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
[0200] Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
[0201] The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
[0202] Various examples have been described. These and other examples are within the scope of the following claims.


Claims
1. A method of coding video data, the method comprising:
forming, for a current block of video data, a merge candidate list including a plurality of merge candidates, the plurality of merge candidates including four spatial neighboring candidates from four neighboring blocks to the current block and, immediately following the four spatial neighboring candidates, an advanced temporal motion vector prediction (ATMVP) candidate, wherein the ATMVP candidate indicates that the current block is to be predicted using a block identified by the ATMVP candidate that is split into a plurality of sub-blocks, each of the plurality of sub-blocks having respective sets of motion information;
coding an index into the merge candidate list that identifies the ATMVP candidate of the plurality of merge candidates in the merge candidate list; and
based on the index identifying the ATMVP candidate, coding the current block of video data, wherein coding the current block comprises coding sub-blocks of the current block using the respective motion information of the sub-blocks of the block identified by the ATMVP candidate.

2. The method of claim 1, wherein forming the merge candidate list comprises:
determining, for the current block, a corresponding block in a reference picture;
determining whether motion information is available for the corresponding block; and
forming the merge candidate list to include the ATMVP candidate after determining that motion information is available for the corresponding block.

3. The method of claim 2, wherein determining whether motion information is available for the corresponding block comprises determining whether a portion of the corresponding block is intra-predicted.

4. The method of claim 1, wherein forming the merge candidate list comprises forming the ATMVP candidate from a representative set of motion information for a corresponding block to the current block in a reference picture.

5. The method of claim 4, wherein forming the ATMVP candidate from the representative set of motion information comprises forming the ATMVP candidate from motion information for a predetermined position of the corresponding block.

6. The method of claim 4, wherein forming the ATMVP candidate from the representative set of motion information comprises forming the ATMVP candidate from motion information for a predetermined sub-prediction unit (sub-PU) of the corresponding block.

7. The method of claim 1, wherein forming the merge candidate list comprises:
using a first temporal motion vector, relative to the current block, to identify a first advanced temporal motion vector prediction (ATMVP) candidate in a first motion source picture;
when the first ATMVP candidate is available, adding the first ATMVP candidate to the merge candidate list as the ATMVP candidate;
when the first ATMVP candidate is not available:
using a second temporal motion vector, relative to the current block, to identify a second ATMVP candidate in a second motion source picture; and
adding the second ATMVP candidate to the merge candidate list as the ATMVP candidate.

8. The method of claim 7, wherein the first temporal motion vector and the second temporal motion vector comprise the same temporal motion vector, and wherein the first motion source picture and the second motion source picture comprise different motion source pictures.

9. The method of claim 7, wherein the first temporal motion vector and the second temporal motion vector comprise different temporal motion vectors.

10. The method of claim 7, further comprising selecting the first temporal motion vector and the second temporal motion vector according to a predetermined order from temporal vectors of the neighboring blocks.

11. The method of claim 1, wherein forming the merge candidate list comprises:
determining whether a motion vector is available for a sub-block of the ATMVP candidate for a reference picture list X; and
adding the ATMVP candidate to the candidate list after determining that the motion vector is available.

12. The method of claim 11, further comprising, when the motion vector is not available for reference picture list X but is available for reference picture list Y, wherein Y comprises a reference picture list other than reference picture list X, setting the motion vector to be available for reference picture list X and scaling the motion vector to a reference picture in reference picture list X.

13. The method of claim 1, wherein coding the index comprises decoding the index, and wherein coding the current block comprises:
predicting the current block using the motion information of the sub-blocks of the block identified by the ATMVP candidate to form a predicted block;
decoding residual information for the current block; and
decoding the current block using the decoded residual information and the predicted block.

14. The method of claim 1, wherein coding the index comprises encoding the index, and wherein coding the current block comprises:
predicting the current block using the motion information of the sub-blocks of the block identified by the ATMVP candidate to form a predicted block;
forming a residual block representing differences between the current block and the predicted block; and
encoding the residual information.

15. A device for coding video data, the device comprising:
a memory configured to store video data; and
a video coder configured to:
form, for a current block of the video data, a merge candidate list including a plurality of merge candidates, the plurality of merge candidates including four spatial neighboring candidates from four neighboring blocks to the current block and, immediately following the four spatial neighboring candidates, an advanced temporal motion vector prediction (ATMVP) candidate, wherein the ATMVP candidate indicates that the current block is to be predicted using a block identified by the ATMVP candidate that is split into a plurality of sub-blocks, each of the plurality of sub-blocks having respective sets of motion information;
code an index into the merge candidate list that identifies the ATMVP candidate of the plurality of merge candidates in the merge candidate list; and
based on the index identifying the ATMVP candidate, code the current block of video data, wherein to code the current block of video data, the video coder is configured to code sub-blocks of the current block using the respective motion information of the sub-blocks of the block identified by the ATMVP candidate.

16. The device of claim 15, wherein to form the merge candidate list, the video coder is configured to:
determine, for the current block, a corresponding block in a reference picture;
determine whether motion information is available for the corresponding block; and
form the merge candidate list to include the ATMVP candidate after determining that motion information is available for the corresponding block.

17. The device of claim 16, wherein to determine whether motion information is available for the corresponding block, the video coder is configured to determine whether a portion of the corresponding block is intra-predicted.

18. The device of claim 15, wherein to form the merge candidate list, the video coder is configured to form the ATMVP candidate from a representative set of motion information for a corresponding block to the current block in a reference picture.

19. The device of claim 18, wherein to form the ATMVP candidate from the representative set of motion information, the video coder is configured to form the ATMVP candidate from motion information for a predetermined position of the corresponding block.

20. The device of claim 18, wherein to form the ATMVP candidate from the representative set of motion information, the video coder is configured to form the ATMVP candidate from motion information for a predetermined sub-prediction unit (sub-PU) of the corresponding block.

21. The device of claim 15, wherein to form the merge candidate list, the video coder is configured to:
use a first temporal motion vector, relative to the current block, to identify a first advanced temporal motion vector prediction (ATMVP) candidate in a first motion source picture;
when the first ATMVP candidate is available, add the first ATMVP candidate to the merge candidate list as the ATMVP candidate;
when the first ATMVP candidate is not available:
use a second temporal motion vector, relative to the current block, to identify a second ATMVP candidate in a second motion source picture; and
add the second ATMVP candidate to the merge candidate list as the ATMVP candidate.

22. The device of claim 21, wherein the first temporal motion vector and the second temporal motion vector comprise the same temporal motion vector, and wherein the first motion source picture and the second motion source picture comprise different motion source pictures.

23. The device of claim 21, wherein the first temporal motion vector and the second temporal motion vector comprise different temporal motion vectors.

24. The device of claim 21, wherein the video coder is further configured to select the first temporal motion vector and the second temporal motion vector according to a predetermined order from temporal vectors of the neighboring blocks.

25. The device of claim 15, wherein to form the merge candidate list, the video coder is configured to:
determine whether a motion vector is available for a sub-block of the ATMVP candidate for a reference picture list X; and
add the ATMVP candidate to the candidate list after determining that the motion vector is available.

26. The device of claim 25, wherein when the motion vector is not available for reference picture list X but is available for reference picture list Y, wherein Y comprises a reference picture list other than reference picture list X, the video coder is configured to set the motion vector to be available for reference picture list X and scale the motion vector to a reference picture in reference picture list X.

27. The device of claim 15, wherein the video coder comprises a video decoder configured to decode the index, and wherein, to code the current block, the video decoder is configured to:
predict the current block using the motion information of the sub-blocks of the block identified by the ATMVP candidate to form a predicted block;
decode residual information for the current block; and
decode the current block using the decoded residual information and the predicted block.

28. The device of claim 15, wherein the video coder comprises a video encoder configured to encode the index, and wherein, to code the current block, the video encoder is configured to:
predict the current block using the motion information of the sub-blocks of the block identified by the ATMVP candidate to form a predicted block;
form a residual block representing differences between the current block and the predicted block; and
encode the residual information.

29. The device of claim 15, wherein the device comprises at least one of:
an integrated circuit;
a microprocessor; or
a wireless communication device.

30. A device for coding video data, the device comprising:
means for forming, for a current block of video data, a merge candidate list including a plurality of merge candidates, the plurality of merge candidates including four spatial neighboring candidates from four neighboring blocks to the current block and, immediately following the four spatial neighboring candidates, an advanced temporal motion vector prediction (ATMVP) candidate, wherein the ATMVP candidate indicates that the current block is to be predicted using a block identified by the ATMVP candidate that is split into a plurality of sub-blocks, each of the plurality of sub-blocks having respective sets of motion information;
means for coding an index into the merge candidate list that identifies the ATMVP candidate of the plurality of merge candidates in the merge candidate list; and
means for coding the current block of video data, wherein the means for coding the current block of video data comprises means for coding sub-blocks of the current block using the respective motion information of the sub-blocks of the block identified by the ATMVP candidate based on the index identifying the ATMVP candidate.

31. The device of claim 30, wherein the means for forming the merge candidate list comprises:
means for determining, for the current block, a corresponding block in a reference picture;
means for determining whether motion information is available for the corresponding block; and
means for forming the merge candidate list to include the ATMVP candidate after determining that motion information is available for the corresponding block.

32. The device of claim 31, wherein the means for determining whether motion information is available for the corresponding block comprises means for determining whether a portion of the corresponding block is intra-predicted.

33. The device of claim 30, wherein the means for forming the merge candidate list comprises means for forming the ATMVP candidate from a representative set of motion information for a corresponding block to the current block in a reference picture.

34. The device of claim 33, wherein the means for forming the ATMVP candidate from the representative set of motion information comprises means for forming the ATMVP candidate from motion information for a predetermined position of the corresponding block.

35. The device of claim 33, wherein the means for forming the ATMVP candidate from the representative set of motion information comprises means for forming the ATMVP candidate from motion information for a predetermined sub-prediction unit (sub-PU) of the corresponding block.

36. The device of claim 30, wherein the means for forming the merge candidate list comprises:
means for using a first temporal motion vector, relative to the current block, to identify a first advanced temporal motion vector prediction (ATMVP) candidate in a first motion source picture;
means for adding, when the first ATMVP candidate is available, the first ATMVP candidate to the merge candidate list as the ATMVP candidate;
means for using a second temporal motion vector, relative to the current block, to identify a second ATMVP candidate in a second motion source picture when the first ATMVP candidate is not available; and
means for adding the second ATMVP candidate to the merge candidate list as the ATMVP candidate when the first ATMVP candidate is not available.

37. The device of claim 36, wherein the first temporal motion vector and the second temporal motion vector comprise the same temporal motion vector, and wherein the first motion source picture and the second motion source picture comprise different motion source pictures.

38. The device of claim 36, wherein the first temporal motion vector and the second temporal motion vector comprise different temporal motion vectors.

39. The device of claim 36, further comprising means for selecting the first temporal motion vector and the second temporal motion vector according to a predetermined order from temporal vectors of the neighboring blocks.

40. The device of claim 30, wherein the means for forming the merge candidate list comprises:
means for determining whether a motion vector is available for a sub-block of the ATMVP candidate for a reference picture list X; and
means for adding the ATMVP candidate to the candidate list after determining that the motion vector is available.

41. The device of claim 40, further comprising means for setting the motion vector to be available for reference picture list X and for scaling the motion vector to a reference picture in reference picture list X when the motion vector is not available for reference picture list X but is available for reference picture list Y, wherein Y comprises a reference picture list other than reference picture list X.

42. The device of claim 30, wherein the means for coding the index comprises means for decoding the index, and wherein the means for coding the current block comprises:
means for predicting the current block using the motion information of the sub-blocks of the block identified by the ATMVP candidate to form a predicted block;
means for decoding residual information for the current block; and
means for decoding the current block using the decoded residual information and the predicted block.

43. The device of claim 30, wherein the means for coding the index comprises means for encoding the index, and wherein the means for coding the current block comprises:
means for predicting the current block using the motion information of the sub-blocks of the block identified by the ATMVP candidate to form a predicted block;
means for forming a residual block representing differences between the current block and the predicted block; and
means for encoding the residual information.

44. A computer-readable storage medium having stored thereon instructions that, when executed, cause a processor to:
form, for a current block of video data, a merge candidate list including a plurality of merge candidates, the plurality of merge candidates including four spatial neighboring candidates from four neighboring blocks to the current block and, immediately following the four spatial neighboring candidates, an advanced temporal motion vector prediction (ATMVP) candidate, wherein the ATMVP candidate indicates that the current block is to be predicted using a block identified by the ATMVP candidate that is split into a plurality of sub-blocks, each of the plurality of sub-blocks having respective sets of motion information;
code an index into the merge candidate list that identifies the ATMVP candidate of the plurality of merge candidates in the merge candidate list; and
based on the index identifying the ATMVP candidate, code the current block of video data, wherein the instructions that cause the processor to code the current block comprise instructions that cause the processor to code sub-blocks of the current block using the respective motion information of the sub-blocks of the block identified by the ATMVP candidate.

45. The computer-readable storage medium of claim 44, wherein the instructions that cause the processor to form the merge candidate list comprise instructions that cause the processor to:
determine, for the current block, a corresponding block in a reference picture;
determine whether motion information is available for the corresponding block; and
form the merge candidate list to include the ATMVP candidate after determining that motion information is available for the corresponding block.

46. The computer-readable storage medium of claim 45, wherein the instructions that cause the processor to determine whether motion information is available for the corresponding block comprise instructions that cause the processor to determine whether a portion of the corresponding block is intra-predicted.

47. The computer-readable storage medium of claim 44, wherein the instructions that cause the processor to form the merge candidate list comprise instructions that cause the processor to form the ATMVP candidate from a representative set of motion information for a corresponding block to the current block in a reference picture.

48. The computer-readable storage medium of claim 47, wherein the instructions that cause the processor to form the ATMVP candidate from the representative set of motion information comprise instructions that cause the processor to form the ATMVP candidate from motion information for a predetermined position of the corresponding block.

49. The computer-readable storage medium of claim 47, wherein the instructions that cause the processor to form the ATMVP candidate from the representative set of motion information comprise instructions that cause the processor to form the ATMVP candidate from motion information for a predetermined sub-prediction unit (sub-PU) of the corresponding block.

50. The computer-readable storage medium of claim 44, wherein the instructions that cause the processor to form the merge candidate list comprise instructions that cause the processor to:
use a first temporal motion vector, relative to the current block, to identify a first advanced temporal motion vector prediction (ATMVP) candidate in a first motion source picture;
when the first ATMVP candidate is available, add the first ATMVP candidate to the merge candidate list as the ATMVP candidate;
when the first ATMVP candidate is not available:
use a second temporal motion vector, relative to the current block, to identify a second ATMVP candidate in a second motion source picture; and
add the second ATMVP candidate to the merge candidate list as the ATMVP candidate.

51. The computer-readable storage medium of claim 50, wherein the first temporal motion vector and the second temporal motion vector comprise the same temporal motion vector, and wherein the first motion source picture and the second motion source picture comprise different motion source pictures.

52. The computer-readable storage medium of claim 50, wherein the first temporal motion vector and the second temporal motion vector comprise different temporal motion vectors.

53. The computer-readable storage medium of claim 50, further comprising instructions that cause the processor to select the first temporal motion vector and the second temporal motion vector according to a predetermined order from temporal vectors of the neighboring blocks.

54. The computer-readable storage medium of claim 44, wherein the instructions that cause the processor to form the merge candidate list comprise instructions that cause the processor to:
determine whether a motion vector is available for a sub-block of the ATMVP candidate for a reference picture list X; and
add the ATMVP candidate to the candidate list after determining that the motion vector is available.

55. The computer-readable storage medium of claim 54, wherein when the motion vector is not available for reference picture list X but is available for reference picture list Y, wherein Y comprises a reference picture list other than reference picture list X, the instructions cause the processor to set the motion vector to be available for reference picture list X and scale the motion vector to a reference picture in reference picture list X.

56. The computer-readable storage medium of claim 44, wherein the instructions that cause the processor to code the index comprise instructions that cause the processor to decode the index, and wherein the instructions that cause the processor to code the current block comprise instructions that cause the processor to:
predict the current block using the motion information of the sub-blocks of the block identified by the ATMVP candidate to form a predicted block;
decode residual information for the current block; and
decode the current block using the decoded residual information and the predicted block.

57. The computer-readable storage medium of claim 44, wherein the instructions that cause the processor to code the index comprise instructions that cause the processor to encode the index, and wherein the instructions that cause the processor to code the current block comprise instructions that cause the processor to:
predict the current block using the motion information of the sub-blocks of the block identified by the ATMVP candidate to form a predicted block;
form a residual block representing differences between the current block and the predicted block; and
encode the residual information.
OA1201700269 2015-01-26 2016-01-26 Sub-prediction unit based advanced temporal motion vector prediction. OA18314A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US62/107,933 2015-01-26
US15/005,564 2016-01-25

Publications (1)

Publication Number Publication Date
OA18314A true OA18314A (en) 2018-10-03


Similar Documents

Publication Publication Date Title
EP3251361B1 (en) Sub-prediction unit based advanced temporal motion vector prediction
EP3513560B1 (en) Offset vector identification of temporal motion vector predictor
KR102094588B1 (en) Sub-prediction unit motion vector prediction using spatial and/or temporal motion information
CA3074701C (en) Coding affine prediction motion information for video coding
KR102187729B1 (en) Inter-view predicted motion vector for 3d video
KR102238567B1 (en) Selection of pictures for disparity vector derivation
US9420286B2 (en) Temporal motion vector prediction in HEVC and its extensions
US9549180B2 (en) Disparity vector generation for inter-view prediction for video coding
EP2984838B1 (en) Backward view synthesis prediction
WO2018048904A1 (en) Geometry-based priority for the construction of candidate lists
KR102312766B1 (en) Disparity vector and/or advanced residual prediction for video coding
EP2885916A1 (en) Inter-view predicted motion vector for 3d video
WO2014100610A1 (en) Constraints on neighboring block based disparity vector (nbdv) techniques for 3d video
WO2015026952A1 (en) Sub-pu-level advanced residual prediction
OA18314A (en) Sub-prediction unit based advanced temporal motion vector prediction.