CN110677679B - Shape dependent intra coding - Google Patents

Shape dependent intra coding

Info

Publication number
CN110677679B
Authority
CN
China
Prior art keywords
intra
video block
list
mode
block
Legal status: Active (the legal status listed is an assumption and is not a legal conclusion)
Application number
CN201910586376.8A
Other languages
Chinese (zh)
Other versions
CN110677679A
Inventor
刘鸿彬
张莉
张凯
王悦
Current Assignee
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Original Assignee
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Application filed by Beijing ByteDance Network Technology Co Ltd and ByteDance Inc
Publication of CN110677679A
Application granted
Publication of CN110677679B
Legal status: Active (current)

Classifications

    • All of the following classifications fall under H (Electricity), H04 (Electric communication technique), H04N (Pictorial communication, e.g. television), H04N19/00 (Methods or arrangements for coding, decoding, compressing or decompressing digital video signals):
    • H04N19/593: using predictive coding involving spatial prediction techniques
    • H04N19/119: adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/105: selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/109: selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • H04N19/11: selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • H04N19/176: the coding unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/184: the coding unit being bits, e.g. of the compressed video stream
    • H04N19/186: the coding unit being a colour or a chrominance component
    • H04N19/42: characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/436: using parallelised computational arrangements
    • H04N19/44: decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N19/503: using predictive coding involving temporal prediction
    • H04N19/513: processing of motion vectors
    • H04N19/52: processing of motion vectors by encoding by predictive encoding
    • H04N19/96: tree coding, e.g. quad-tree coding

Abstract

A method of video bitstream processing, the method comprising: for at least part of an intra-coded video block, a list of intra-mode candidates is generated according to a first shape dependency rule that depends on the shape of the video block, and a decoded representation of the video block is reconstructed using the list of intra-mode candidates. The shape dependency rule may also be extended to inter coding, for example to the construction of the Merge candidate list or the advanced motion vector prediction candidate list.

Description

Shape dependent intra coding
Cross Reference to Related Applications
This application claims priority to and the benefit of U.S. provisional patent application No. 62/692,805, filed on July 1, 2018, in accordance with applicable patent law and/or the rules of the Paris Convention. The entire disclosure of U.S. provisional patent application No. 62/692,805 is incorporated by reference as part of the disclosure of the present application.
Technical Field
This patent document relates to video coding techniques.
Background
Digital video accounts for the largest bandwidth usage on the Internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video grows, the bandwidth demand for digital video usage is expected to continue to grow.
Disclosure of Invention
The disclosed techniques may be used by video decoder or encoder embodiments in which block-shape dependent coding techniques are used to improve the coding performance of intra-coded video blocks.
In one example aspect, a video bitstream processing method is disclosed. The method comprises the following steps: for at least part of an intra-coded video block, a list of intra-mode candidates is generated according to a first shape dependency rule that depends on the shape of the video block, and a decoded representation of the video block is reconstructed using the list of intra-mode candidates.
In another example aspect, the above method may be implemented by a video decoder apparatus comprising a processor.
In another example aspect, the above-described method may be implemented by a video encoder apparatus comprising a processor for decoding encoded video during a video encoding process.
In yet another example aspect, the methods may be embodied in the form of processor-executable instructions and stored on a computer-readable program medium.
These and other aspects are further described in this document.
Drawings
Fig. 1 is a diagram of a quadtree plus binary tree (QTBT) structure.
Figure 2 illustrates an example derivation process for the Merge candidate list construction.
Fig. 3 shows example positions of spatial Merge candidates.
Fig. 4 shows an example of a candidate pair considering redundancy checking for spatial Merge candidates.
Fig. 5 shows an example of the locations of second Prediction Units (PUs) for N × 2N and 2N × N partitions.
Fig. 6 is a diagram of motion vector scaling for a temporal Merge candidate.
FIG. 7 illustrates example candidate locations for the temporal Merge candidates C0 and C1.
Fig. 8 shows an example of combined bidirectional predictive Merge candidates.
Fig. 9 shows an example of a derivation process for a motion vector prediction candidate.
Fig. 10 is a diagram of motion vector scaling of spatial motion vector candidates.
Fig. 11 illustrates an example of Alternative Temporal Motion Vector Prediction (ATMVP) motion prediction for a Coding Unit (CU).
Fig. 12 shows an example of one CU with four sub-blocks (a-D) and its neighboring blocks (a-D).
Fig. 13 shows the non-adjacent Merge candidates proposed in J0021.
Fig. 14 shows a non-adjacent Merge candidate proposed in J0058.
Fig. 15 shows a non-adjacent Merge candidate proposed in J0059.
Fig. 16 shows the proposed 67 intra prediction modes.
Fig. 17 shows an example of neighboring blocks for Most Probable Mode (MPM) derivation.
Fig. 18 shows an example of corresponding sub-blocks of a chroma CB in an I-slice.
Fig. 19A and 19B illustrate examples of additional blocks for an MPM list.
Fig. 20 is a block diagram of an example of a video processing apparatus.
Fig. 21 shows a block diagram of an example implementation of a video encoder.
Fig. 22 is a flowchart of an example of a video bitstream processing method.
Detailed Description
This patent document provides various techniques that may be used by a decoder of a video bitstream to improve the quality of decompressed or decoded digital video. In addition, the video encoder may also implement these techniques during the course of encoding in order to reconstruct the decoded frames for further encoding. In the following description, the term video block is used to denote a logical grouping of pixels, and different embodiments may work with different sized video blocks. Further, a video block may correspond to one chrominance or luminance component, or may comprise another component representation, such as an RGB representation.
For ease of understanding, section headings are used in this document, and embodiments and techniques are not limited to the corresponding sections. As such, embodiments from one section may be combined with embodiments from other sections.
1. Overview
The technology described in this patent document relates to video coding technology. In particular, the techniques described in this patent document relate to intra/inter mode coding in video coding. They can be applied to existing video coding standards, such as High Efficiency Video Coding (HEVC), or to the standard to be finalized (Versatile Video Coding). They may also be applicable to future video coding standards or video codecs.
2. Background of the invention
Video coding standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards. The ITU-T produced the H.261 and H.263 standards, ISO/IEC produced the MPEG-1 and MPEG-4 Visual standards, and the two organizations jointly produced the H.262/MPEG-2 Video standard, the H.264/MPEG-4 Advanced Video Coding (AVC) standard, and the H.265/HEVC standard. Since H.262, video coding standards have been based on a hybrid video coding structure in which temporal prediction plus transform coding is utilized. To explore future video coding technologies beyond HEVC, the Joint Video Exploration Team (JVET) was founded jointly by VCEG and MPEG in 2015. Since then, many new methods have been adopted by JVET and put into reference software named the Joint Exploration Model (JEM). In April 2018, the Joint Video Experts Team (JVET) between VCEG (Q6/16) and ISO/IEC JTC1 SC29/WG11 (MPEG) was created to work on the VVC standard, targeting a 50% bitrate reduction compared to HEVC.
Fig. 21 shows a block diagram of an example implementation of a video encoder.
2.1 Quadtree plus binary tree (QTBT) block structure with larger coding tree units (CTUs)
In HEVC, CTUs are divided into CUs by using a quadtree structure, denoted as coding tree, to accommodate various local characteristics. The decision whether to encode a picture region using inter-picture (temporal) prediction or intra-picture (spatial) prediction is made at the CU level. Each CU may be further divided into one, two, or four Prediction Units (PUs) according to PU partition types. Within a PU, the same prediction process is applied and the relevant information is sent to the decoder on a PU basis. After a residual block is obtained by applying a prediction process based on the PU partition type, a CU may be partitioned into Transform Units (TUs) according to another quadtree structure similar to a coding tree of the CU. One of the key features of the HEVC structure is that it has multiple partitioning concepts, including CU, PU and TU.
The QTBT structure removes the concept of multiple partition types, i.e., it removes the separation of the CU, PU, and TU concepts and supports more flexibility for CU partition shapes. In the QTBT block structure, a CU can have either a square or rectangular shape. As shown in fig. 1, the CTU is first partitioned by a quadtree structure. The quadtree leaf nodes are further partitioned by a binary tree structure. There are two splitting types in binary tree splitting: symmetric horizontal splitting and symmetric vertical splitting. The binary tree leaf nodes are called Coding Units (CUs), and that segmentation is used for prediction and transform processing without any further partitioning. This means that the CU, PU, and TU have the same block size in the QTBT coding block structure. In JEM, a CU sometimes consists of Coded Blocks (CBs) of different color components; e.g., in the case of P and B slices of the 4:2:0 chroma format, one CU contains one luma CB and two chroma CBs. And a CU sometimes consists of CBs of a single component; e.g., in the case of I slices, a CU contains only one luma CB or only two chroma CBs.
The following parameters are defined for the QTBT segmentation scheme:
-CTU size: the root node size of the quadtree, the same as the concept in HEVC;
-MinQTSize: a minimum allowed quadtree leaf node size;
-MaxBTSize: the maximum allowed size of a root node of the binary tree;
-MaxBTDepth: a maximum allowed binary tree depth;
-MinBTSize: minimum allowed binary tree leaf node size;
in one example of the QTBT segmentation structure, the CTU size is set to 128 × 128 luma samples with two corresponding 64 × 64 chroma sample blocks, MinQTSize is set to 16 × 16, MaxBTSize is set to 64 × 64, MinBTSize (width and height) is set to 4 × 4, and MaxBTDepth is set to 4. Quadtree partitioning is first applied to CTUs to generate quadtree leaf nodes. The quad tree leaf nodes may have sizes from 16 × 16 (i.e., MinQTSize) to 128 × 128 (i.e., CTU size). If the leaf quad tree node is 128 x 128, it is not further partitioned by the binary tree since the size exceeds MaxBTSize (i.e., 64 x 64). Otherwise, the leaf quadtree nodes may be further partitioned by the binary tree. Thus, the quad tree leaf nodes are also the root nodes of the binary tree, and the binary tree depth is 0. When the binary tree depth reaches MaxBTDepth (i.e., 4), no further partitioning is considered. When the width of the binary tree node is equal to MinBTSize (i.e., 4), no further horizontal partitioning is considered. Similarly, when the height of the binary tree node is equal to MinBTSize, no further vertical partitioning is considered. The leaf nodes of the binary tree are further processed by prediction and transformation processes without any further partitioning. In JEM, the maximum CTU size is 256 × 256 luma samples.
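The interaction of these parameters can be illustrated with a small sketch. The following C++ fragment is a simplified illustration written for this description (not the JEM reference software); the function names and the exact form of the checks are assumptions, and only the constraints stated above are modeled.

    #include <cstdio>

    // Example QTBT parameters from the configuration described above.
    constexpr int kMinQTSize  = 16;   // minimum quadtree leaf node size
    constexpr int kMaxBTSize  = 64;   // maximum binary tree root node size
    constexpr int kMaxBTDepth = 4;    // maximum binary tree depth
    constexpr int kMinBTSize  = 4;    // minimum binary tree leaf node size

    // A quadtree node may be split into four sub-blocks while it is larger
    // than the minimum quadtree leaf node size.
    bool canSplitQT(int width, int height) {
      return width == height && width > kMinQTSize;
    }

    // Following the convention of the text above: horizontal splitting is not
    // considered when the width equals MinBTSize, vertical splitting is not
    // considered when the height equals MinBTSize, and no binary splitting is
    // considered once MaxBTDepth is reached or the node exceeds MaxBTSize.
    bool canSplitBTHorizontal(int width, int height, int btDepth) {
      return width <= kMaxBTSize && height <= kMaxBTSize &&
             btDepth < kMaxBTDepth && width > kMinBTSize;
    }
    bool canSplitBTVertical(int width, int height, int btDepth) {
      return width <= kMaxBTSize && height <= kMaxBTSize &&
             btDepth < kMaxBTDepth && height > kMinBTSize;
    }

    int main() {
      std::printf("%d\n", canSplitQT(128, 128));              // 1: can still be quad-split
      std::printf("%d\n", canSplitBTHorizontal(128, 128, 0)); // 0: exceeds MaxBTSize
      std::printf("%d\n", canSplitBTVertical(32, 4, 2));      // 0: height already MinBTSize
      return 0;
    }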
Fig. 1 (left) shows an example of block partitioning by using QTBT, and fig. 1 (right) shows the corresponding tree representation. The solid lines represent quad-tree partitions and the dashed lines represent binary tree partitions. In each partition (i.e., non-leaf) node of the binary tree, a flag is signaled to indicate which partition type (i.e., horizontal or vertical) to use, where 0 represents horizontal partition and 1 represents vertical partition. For a quadtree partition, there is no need to indicate the partition type, since the quadtree partition always partitions the block horizontally and vertically to generate 4 sub-blocks with equal size.
In addition, the QTBT scheme supports the ability to have separate QTBT structures for luminance and chrominance. Currently, for P and B slices, luma CTB and chroma CTB in one CTU share the same QTBT structure. However, for I-slices, the luma CTB is partitioned into CUs by a QTBT structure and the chroma CTB is partitioned into chroma CUs by another QTBT structure. This means that a CU in an I-slice consists of coded blocks of luma components or coded blocks of two chroma components, and a CU in a P-slice or B-slice consists of coded blocks of all three color components.
In HEVC, inter prediction of small blocks is restricted to reduce memory access for motion compensation, such that bi-prediction is not supported for 4 × 8 and 8 × 4 blocks, and inter prediction is not supported for 4 × 4 blocks. In the QTBT of JEM, these restrictions are removed.
2.2 Inter prediction in HEVC/H.265
Each inter-predicted PU has motion parameters for one or two reference picture lists. The motion parameters include a motion vector and a reference picture index. Usage of one of the two reference picture lists may also be signaled using inter_pred_idc. Motion vectors may be explicitly coded as deltas relative to predictors.
When a CU is coded in skip mode (skip mode), one PU is associated with the CU and there are no significant residual coefficients, no motion vector delta or reference picture index to code. A Merge mode is specified whereby motion parameters, including spatial and temporal candidates, for the current PU are obtained from neighboring PUs. The Merge mode may be applied to any inter-predicted PU, not just the skip mode. An alternative mode to the Merge mode is the explicit transmission of motion parameters, where motion vectors (more precisely, motion vector differences compared to motion vector predictors), corresponding reference picture indices of each reference picture list, and the use of reference picture lists are explicitly signaled per PU. In this disclosure, such a mode is referred to as Advanced Motion Vector Prediction (AMVP).
When the signaling indicates that one of the two reference picture lists is to be used, the PU is generated from one sample block. This is called "unidirectional prediction". Unidirectional prediction may be used for P-slices and B-slices.
When the signaling indicates that two reference picture lists are to be used, the PU is generated from two blocks of samples. This is called "bi-prediction". Bi-prediction can only be used for B slices.
The following text provides details regarding the inter prediction modes specified in HEVC. The description will start with the Merge mode.
2.2.1Merge mode
2.2.1.1 derivation of candidates for Merge mode
When predicting a PU using Merge mode, the index pointing to an entry in the Merge candidate list is parsed from the bitstream and motion information is retrieved with it. The construction of this list is specified in the HEVC standard and can be summarized according to the following sequence of steps:
step 1: initial candidate derivation
Step 1.1: spatial candidate derivation
O step 1.2: redundancy checking of spatial candidates
Step 1.3: time domain candidate derivation
Step 2: additional candidate insertions
O step 2.1: creation of bi-directional prediction candidates
Step 2.2: insertion of zero motion candidates
These steps are also schematically depicted in fig. 2. For spatial Merge candidate derivation, a maximum of four Merge candidates are selected among the candidates located at five different positions. For time domain Merge candidate derivation, at most one Merge candidate is selected among the two candidates. Since a constant number of candidates per PU is assumed at the decoder, additional candidates are generated when the number of candidates obtained from step 1 does not reach the maximum number of Merge candidates (MaxNumMergeCand) signaled in the slice header. Since the number of candidates is constant, the index of the best Merge candidate is encoded using truncated unary code binarization (TU). If the size of the CU is equal to 8, all PUs of the current CU share a single Merge candidate list, which is the same as the Merge candidate list of the 2N × 2N prediction unit.
Hereinafter, operations associated with the foregoing steps are described in detail.
2.2.1.2 spatial candidate derivation
In the derivation of spatial Merge candidates, a maximum of four Merge candidates are selected among candidates located at the positions depicted in fig. 3. The order of derivation is A1, B1, B0, A0, and B2. Position B2 is considered only when any PU at position A1, B1, B0, A0 is not available (e.g., because it belongs to another slice or tile) or is intra coded. After the candidate at position A1 is added, the addition of the remaining candidates is subject to a redundancy check, which ensures that candidates with the same motion information are excluded from the list, thereby improving coding efficiency. To reduce computational complexity, not all possible candidate pairs are considered in the mentioned redundancy check. Instead, only the pairs linked with an arrow in fig. 4 are considered, and a candidate is added to the list only if the corresponding candidate used for the redundancy check does not have the same motion information. Another source of duplicate motion information is the "second PU" associated with partitions other than 2N×2N. As an example, fig. 5 depicts the second PU for the N×2N and 2N×N cases, respectively. When the current PU is partitioned as N×2N, the candidate at position A1 is not considered for list construction. In fact, adding this candidate would lead to two prediction units having the same motion information, which is redundant to having just one PU in the coding unit. Similarly, position B1 is not considered when the current PU is partitioned as 2N×N.
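The insertion order and the limited redundancy check described above can be sketched as follows. This is a simplified illustration written for this description (not the HEVC reference software); the MotionInfo structure is invented, and the specific pairs compared here (B1 with A1, B0 with B1, A0 with A1, B2 with A1 and B1) follow the HEVC design and are stated as an assumption about the arrows of fig. 4.

    #include <array>
    #include <vector>

    // Simplified motion information attached to a spatial neighboring PU.
    struct MotionInfo {
      int mvL0x = 0, mvL0y = 0, mvL1x = 0, mvL1y = 0;
      int refIdxL0 = -1, refIdxL1 = -1;
      bool available = false;  // false if outside the slice/tile or intra coded
      bool sameMotion(const MotionInfo& o) const {
        return mvL0x == o.mvL0x && mvL0y == o.mvL0y && mvL1x == o.mvL1x &&
               mvL1y == o.mvL1y && refIdxL0 == o.refIdxL0 && refIdxL1 == o.refIdxL1;
      }
    };

    enum Pos { A1, B1, B0, A0, B2 };  // checking order described in the text

    std::vector<MotionInfo> spatialMergeCandidates(const std::array<MotionInfo, 5>& nb) {
      std::vector<MotionInfo> list;
      auto differs = [&](Pos a, Pos b) {
        return !nb[b].available || !nb[a].sameMotion(nb[b]);
      };
      if (nb[A1].available) list.push_back(nb[A1]);
      if (nb[B1].available && differs(B1, A1)) list.push_back(nb[B1]);
      if (nb[B0].available && differs(B0, B1)) list.push_back(nb[B0]);
      if (nb[A0].available && differs(A0, A1)) list.push_back(nb[A0]);
      // B2 is considered only when at least one of the other positions is unusable.
      bool anyMissing = !nb[A1].available || !nb[B1].available ||
                        !nb[B0].available || !nb[A0].available;
      if (anyMissing && nb[B2].available && differs(B2, A1) && differs(B2, B1))
        list.push_back(nb[B2]);
      return list;
    }

    int main() {
      std::array<MotionInfo, 5> nb{};
      nb[A1] = {4, 0, 0, 0, 0, -1, true};
      nb[B1] = {4, 0, 0, 0, 0, -1, true};  // same motion as A1: pruned
      return static_cast<int>(spatialMergeCandidates(nb).size());  // 1
    }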
2.2.1.3 Temporal candidate derivation
In this step, only one candidate is added to the list. In particular, in the derivation of the temporal Merge candidate, a scaled motion vector is derived based on a co-located PU belonging to the picture that has the smallest POC difference with the current picture within a given reference picture list. The reference picture list to be used for the derivation of the co-located PU is explicitly signaled in the slice header. The scaled motion vector for the temporal Merge candidate is obtained as illustrated by the dashed line in fig. 6, scaled from the motion vector of the co-located PU using the POC distances tb and td, where tb is defined as the POC difference between the reference picture of the current picture and the current picture, and td is defined as the POC difference between the reference picture of the co-located picture and the co-located picture. The reference picture index of the temporal Merge candidate is set equal to zero. A practical realization of the scaling process is described in the HEVC specification [1]. For a B slice, two motion vectors are obtained, one for reference picture list 0 and the other for reference picture list 1, and they are combined to form the bi-predictive Merge candidate.
Fig. 6 is a diagram of motion vector scaling for the temporal Merge candidate.
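The scaling with the POC distances tb and td can be written compactly. The sketch below follows HEVC-style fixed-point scaling; the rounding and clipping constants are quoted from memory from the HEVC specification and should be treated as illustrative (the additional clipping of tb and td to [-128, 127] in the specification is omitted).

    #include <algorithm>
    #include <cstdlib>

    struct Mv { int x, y; };

    static int clip3(int lo, int hi, int v) { return std::min(std::max(v, lo), hi); }

    // Scale the co-located motion vector by the ratio tb/td, where tb is the POC
    // distance between the current picture and its reference picture, and td is
    // the POC distance between the co-located picture and its reference picture.
    Mv scaleMv(Mv mvCol, int tb, int td) {
      int tx = (16384 + (std::abs(td) >> 1)) / td;
      int distScaleFactor = clip3(-4096, 4095, (tb * tx + 32) >> 6);
      auto scaleComp = [&](int v) {
        int s = distScaleFactor * v;
        return clip3(-32768, 32767, (s >= 0 ? 1 : -1) * ((std::abs(s) + 127) >> 8));
      };
      return { scaleComp(mvCol.x), scaleComp(mvCol.y) };
    }

    int main() {
      Mv scaled = scaleMv({64, -32}, /*tb=*/2, /*td=*/4);  // roughly halves the vector
      return scaled.x;  // approximately 32
    }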
In the co-located PU (Y) belonging to the reference frame, the position for the temporal candidate is selected between candidates C0 and C1, as shown in fig. 7. If the PU at position C0 is intra coded or is outside the current CTU row, position C1 is used. Otherwise, position C0 is used in the derivation of the temporal Merge candidate.
2.2.1.4 additional candidate insertions
In addition to spatial and temporal Merge candidates, there are two additional types of Merge candidates: the combined bi-predictive Merge candidate and the zero Merge candidate. Combined bi-predictive Merge candidates are generated by using spatial and temporal Merge candidates, and they are used for B slices only. A combined bi-predictive candidate is generated by combining the first-reference-picture-list motion parameters of an initial candidate with the second-reference-picture-list motion parameters of another. If these two tuples provide different motion hypotheses, they form a new bi-predictive candidate. As an example, fig. 8 depicts the case in which two candidates in the original list (left side), which have mvL0 and refIdxL0 or mvL1 and refIdxL1, are used to create a combined bi-predictive Merge candidate that is added to the final list (right side). There are numerous rules, defined in [1], regarding the combinations that are considered in order to generate these additional Merge candidates.
Zero motion candidates are inserted to fill the remaining entries in the Merge candidate list and therefore reach the MaxNumMergeCand capacity. These candidates have zero spatial displacement and a reference picture index that starts from zero and increases every time a new zero motion candidate is added to the list. The number of reference frames used by these candidates is one and two for uni-directional and bi-directional prediction, respectively. Finally, no redundancy check is performed on these candidates.
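A sketch of this filling step is shown below (a simplified illustration written for this description; the candidate structure and the handling of the reference index wrap-around are assumptions).

    #include <algorithm>
    #include <vector>

    struct MergeCand {
      int mvx = 0, mvy = 0;
      int refIdxL0 = -1, refIdxL1 = -1;
    };

    // Append zero-motion candidates until the list reaches maxNumMergeCand.
    // The reference index starts at zero and increases with every new zero
    // candidate, as long as enough reference pictures are available.
    void addZeroMergeCandidates(std::vector<MergeCand>& list, int maxNumMergeCand,
                                int numRefIdxL0, int numRefIdxL1, bool isBSlice) {
      int numRef = isBSlice ? std::min(numRefIdxL0, numRefIdxL1) : numRefIdxL0;
      int refIdx = 0;
      while (static_cast<int>(list.size()) < maxNumMergeCand) {
        MergeCand zero;                       // zero spatial displacement
        zero.refIdxL0 = (refIdx < numRef) ? refIdx : 0;
        zero.refIdxL1 = isBSlice ? zero.refIdxL0 : -1;
        list.push_back(zero);                 // no redundancy check is performed
        ++refIdx;
      }
    }

    int main() {
      std::vector<MergeCand> list(2);          // two candidates already present
      addZeroMergeCandidates(list, 5, 3, 2, /*isBSlice=*/true);
      return static_cast<int>(list.size());    // 5
    }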
2.2.1.5 motion estimation regions for parallel processing
To speed up the encoding process, motion estimation can be performed in parallel, whereby the motion vectors for all prediction units inside a given region are derived simultaneously. The derivation of Merge candidates from the spatial neighborhood may interfere with parallel processing, because one prediction unit cannot derive motion parameters from an adjacent PU until its associated motion estimation is completed. To mitigate the trade-off between coding efficiency and processing latency, HEVC defines the motion estimation region (MER), whose size is signaled in the picture parameter set using the "log2_parallel_merge_level_minus2" syntax element. When an MER is defined, Merge candidates falling in the same region are marked as unavailable and are therefore not considered in the list construction.
2.2.2AMVP
AMVP exploits the spatio-temporal correlation of a motion vector with neighboring PUs, which is used for explicit transmission of motion parameters. For each reference picture list, a motion vector candidate list is constructed by first checking the availability of the left, above, and temporally neighboring PU positions, removing redundant candidates, and adding a zero vector to make the candidate list a constant length. The encoder can then select the best predictor from the candidate list and transmit the corresponding index indicating the chosen candidate. Similarly to Merge index signaling, the index of the best motion vector candidate is encoded using a truncated unary code. The maximum value to be encoded in this case is 2 (see fig. 9). In the following sections, details about the derivation process of motion vector prediction candidates are provided.
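The truncated unary binarization used for the Merge index and for the motion vector predictor index can be illustrated with a small helper (a generic sketch, not the reference entropy coder; the bin polarity shown here, ones followed by a terminating zero, is the usual HEVC convention and is stated as an assumption).

    #include <string>

    // Truncated unary binarization: a value v in [0, cMax] is written as v bins
    // equal to 1 followed by a terminating 0, and the terminating 0 is omitted
    // when v == cMax because the decoder can infer it.
    std::string truncatedUnary(int v, int cMax) {
      std::string bins(v, '1');
      if (v < cMax) bins += '0';
      return bins;
    }

    int main() {
      // Example for a Merge index with MaxNumMergeCand = 5 (cMax = 4):
      //   0 -> "0", 1 -> "10", 2 -> "110", 3 -> "1110", 4 -> "1111"
      return static_cast<int>(truncatedUnary(3, 4).size());  // 4 bins
    }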
2.2.2.1 derivation of AMVP candidates
Fig. 9 summarizes the derivation of motion vector prediction candidates.
In motion vector prediction, two types of motion vector candidates are considered: spatial motion vector candidates and temporal motion vector candidates. For spatial motion vector candidate derivation, two motion vector candidates are finally derived based on the motion vectors of each PU located at five different positions, as shown in fig. 3.
For temporal motion vector candidate derivation, one motion vector candidate is selected from two candidates, which are derived based on two different co-located positions. After the first list of spatio-temporal candidates is made, the repeated motion vector candidates in the list are removed. If the number of potential candidates is greater than 2, the motion vector candidate with a reference picture index greater than 1 within the associated reference picture list is removed from the list. If the number of spatial-temporal motion vector candidates is less than 2, additional zero motion vector candidates are added to the list.
2.2.2.2 spatial motion vector candidates
In the derivation of spatial motion vector candidates, a maximum of two candidates are considered among five potential candidates, which are derived from PUs located at the positions shown in fig. 3, those positions being the same as those of the motion Merge. The order of derivation for the left side of the current PU is defined as A0, A1, scaled A0, scaled A1. The order of derivation for the above side of the current PU is defined as B0, B1, B2, scaled B0, scaled B1, scaled B2. For each side there are therefore four cases that can be used as a motion vector candidate, two cases that do not require spatial scaling and two cases where spatial scaling is used. The four different cases are summarized as follows.
No spatial scaling
- (1) identical reference picture list, and identical reference picture index (identical POC)
- (2) different reference picture lists, but the same reference picture (same POC)
Spatial scaling
- (3) same reference picture list, but different reference pictures (different POCs)
- (4) different reference picture lists, and different reference pictures (different POCs)
The no-spatial-scaling cases are checked first, followed by the spatial scaling cases. Spatial scaling is considered when the POC differs between the reference picture of the neighboring PU and that of the current PU, regardless of the reference picture list. If all PUs of the left candidates are unavailable or intra coded, scaling for the above motion vector is allowed to help the parallel derivation of the left and above MV candidates. Otherwise, spatial scaling is not allowed for the above motion vector.
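The four cases above reduce to a comparison of reference picture POCs. The sketch below is an illustration written for this description (invented data structures, not the reference software).

    // Simplified view of one prediction of a neighboring PU and of the target
    // reference of the current PU.
    struct RefInfo {
      int refPoc;   // POC of the reference picture
      int refList;  // reference picture list (0 or 1)
      int refIdx;   // reference index within that list
    };

    enum class MvReuse { NoScaling, SpatialScaling };

    // Cases (1) and (2): the reference picture is the same (same POC), so the
    // neighboring motion vector is reused directly, even if the list differs.
    // Cases (3) and (4): the POC differs, so spatial scaling is needed,
    // regardless of the reference picture list.
    MvReuse classify(const RefInfo& neighbour, const RefInfo& current) {
      return (neighbour.refPoc == current.refPoc) ? MvReuse::NoScaling
                                                  : MvReuse::SpatialScaling;
    }

    int main() {
      RefInfo neighbour{/*refPoc=*/8, /*refList=*/1, /*refIdx=*/0};
      RefInfo current{/*refPoc=*/8, /*refList=*/0, /*refIdx=*/2};
      return classify(neighbour, current) == MvReuse::NoScaling ? 0 : 1;  // 0, case (2)
    }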
Fig. 10 is a diagram of motion vector scaling of spatial motion vector candidates.
In the spatial scaling process, the motion vector of the neighboring PU is scaled in a similar manner as for temporal scaling, as shown in fig. 10. The main difference is that the reference picture list and the index of the current PU are given as input; the actual scaling process is the same as that of temporal scaling.
2.2.2.3 temporal motion vector candidates
Except for the reference picture index derivation, all processes for deriving temporal motion vector candidates are the same as those for deriving spatial motion vector candidates (see fig. 7). The reference picture index is signaled to the decoder.
2.3 New interframe Merge candidates in JEM
2.3.1 sub-CU-based motion vector prediction
In JEM with QTBT, each CU can have at most one set of motion parameters for each prediction direction. Two sub-CU level motion vector prediction methods are considered in the encoder by splitting a large CU into sub-CUs and deriving motion information for all the sub-CUs of the large CU. The alternative temporal motion vector prediction (ATMVP) method allows each CU to fetch multiple sets of motion information from multiple blocks smaller than the current CU in the collocated reference picture. In the spatial-temporal motion vector prediction (STMVP) method, motion vectors of the sub-CUs are derived recursively by using the temporal motion vector predictor and spatial neighboring motion vectors.
In order to preserve more accurate motion fields for sub-CU motion prediction, motion compression of reference frames is currently disabled.
2.3.1.1 alternative temporal motion vector prediction
In the Alternative Temporal Motion Vector Prediction (ATMVP) method, the temporal motion vector prediction (TMVP) is modified by fetching multiple sets of motion information (including motion vectors and reference indices) from blocks smaller than the current CU. As shown in fig. 11, the sub-CUs are square N×N blocks (N is set to 4 by default).
ATMVP predicts motion vectors of sub-CUs within a CU in two steps. The first step is to identify the corresponding block in the reference picture using a so-called temporal vector. The reference picture is called a motion source picture. The second step is to divide the current CU into sub-CUs and obtain the motion vector and the reference index of each sub-CU from the block corresponding to each sub-CU, as shown in fig. 11.
In the first step, the reference picture and the corresponding block are determined from the motion information of the spatially neighboring blocks of the current CU. To avoid a repeated scanning process of neighboring blocks, the first Merge candidate in the Merge candidate list of the current CU is used. The first available motion vector and its associated reference index are set to be the temporal vector and the index to the motion source picture. In this way, in ATMVP, the corresponding block can be identified more accurately than in TMVP, where the corresponding block (sometimes called a collocated block) is always located at the bottom-right or center position relative to the current CU.
In the second step, a corresponding block of each sub-CU is identified by the temporal vector in the motion source picture, by adding the temporal vector to the coordinates of the current CU. For each sub-CU, the motion information of its corresponding block (the smallest motion grid that covers the center sample) is used to derive the motion information for the sub-CU. After the motion information of a corresponding N×N block is identified, it is converted to the motion vectors and reference indices of the current sub-CU in the same way as the TMVP of HEVC, where motion scaling and other procedures apply. For example, the decoder checks whether the low-delay condition is fulfilled (i.e., the POCs of all reference pictures of the current picture are smaller than the POC of the current picture) and possibly uses motion vector MVx (the motion vector corresponding to reference picture list X) to predict motion vector MVy for each sub-CU (with X being equal to 0 or 1 and Y being equal to 1-X).
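The two-step structure of ATMVP can be sketched in a few lines. The fragment below is a simplified illustration written for this description (the data structures are invented and the scaling of the fetched motion to the current reference picture is omitted); it only shows how the temporal vector offsets each sub-CU position into the motion source picture.

    #include <vector>

    struct Mv { int x, y; };

    struct MotionField {
      // Minimal stand-in for the motion field of the motion source picture,
      // assumed to be stored on a 4x4 grid; the contents are placeholders.
      Mv fetch(int x, int y) const { return { x / 4, y / 4 }; }
    };

    // Step 2 of ATMVP: for every NxN sub-CU, offset the position of its center
    // sample by the temporal vector and read the motion information of the
    // covering block in the motion source picture.
    std::vector<Mv> atmvpSubCuMotion(int cuX, int cuY, int cuW, int cuH, int n,
                                     Mv temporalVector, const MotionField& src) {
      std::vector<Mv> subCuMv;
      for (int y = 0; y < cuH; y += n)
        for (int x = 0; x < cuW; x += n) {
          int centerX = cuX + x + n / 2 + temporalVector.x;
          int centerY = cuY + y + n / 2 + temporalVector.y;
          subCuMv.push_back(src.fetch(centerX, centerY));
        }
      return subCuMv;
    }

    int main() {
      MotionField src;
      auto mvs = atmvpSubCuMotion(64, 32, 16, 16, 4, {8, -4}, src);
      return static_cast<int>(mvs.size());  // 16 sub-CUs of size 4x4
    }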
2.3.1.2 spatio-temporal motion vector prediction
In this method, the motion vectors of the sub-CUs are derived recursively, following raster scan order. Fig. 12 illustrates this concept. Let us consider an 8×8 CU that contains four 4×4 sub-CUs A, B, C, and D. The neighboring 4×4 blocks in the current frame are labelled a, b, c, and d.
The motion derivation for sub-CU A starts by identifying its two spatial neighbors. The first neighbor is the N×N block above sub-CU A (block c). If this block c is not available or is intra coded, the other N×N blocks above sub-CU A are checked (from left to right, starting at block c). The second neighbor is the block to the left of sub-CU A (block b). If block b is not available or is intra coded, the other blocks to the left of sub-CU A are checked (from top to bottom, starting at block b). The motion information obtained from the neighboring blocks for each list is scaled to the first reference frame of the given list. Next, the temporal motion vector predictor (TMVP) of sub-block A is derived by following the same procedure as the TMVP derivation specified in HEVC. The motion information of the collocated block at position D is fetched and scaled accordingly. Finally, after retrieving and scaling the motion information, all available motion vectors (up to 3) are averaged separately for each reference list. The averaged motion vector is assigned as the motion vector of the current sub-CU.
2.3.1.3 sub-CU motion prediction mode signaling
The sub-CU mode is enabled as an additional Merge candidate and no additional syntax element is needed to signal the mode. Two additional Merge candidates are added to the Merge candidate list of each CU to represent ATMVP mode and STMVP mode. If the sequence parameter set indicates that ATMVP and STMVP are enabled, then up to seven Merge candidates are used. The encoding logic of the additional Merge candidates is the same as the Merge candidates in the HM, which means that two additional Merge candidates require two more RD checks for each CU in a P or B slice.
In JEM, all bins of the Merge index are context coded by Context Adaptive Binary Arithmetic Coding (CABAC), whereas in HEVC only the first bin is context coded and the remaining bins are bypass coded.
2.3.2 non-neighboring Merge candidates
In J0021, Qualcomm proposed deriving additional spatial Merge candidates from non-adjacent neighboring positions, labelled 6 to 49, as shown in fig. 13. The derived candidates are added after the TMVP candidate in the Merge candidate list.
In J0058, Tencent proposed deriving additional spatial Merge candidates from positions in an outer reference area that has an offset of (-96, -96) to the current block.
As shown in FIG. 14, the positions are labelled A(i, j), B(i, j), C(i, j), D(i, j), and E(i, j). Each candidate B(i, j) or C(i, j) has an offset of 16 in the vertical direction compared with its previous B or C candidate. Each candidate A(i, j) or D(i, j) has an offset of 16 in the horizontal direction compared with its previous A or D candidate. Each E(i, j) has an offset of 16 in both the horizontal and vertical directions compared with its previous E candidate. The candidates are checked from the inside to the outside, and the order of the candidates is A(i, j), B(i, j), C(i, j), D(i, j), and E(i, j). Whether the number of Merge candidates can be further reduced is a subject of further study. The candidates are added after the TMVP candidate in the Merge candidate list.
In J0059, the extended spatial positions from 6 to 27, as shown in fig. 15, are checked according to their numerical order after the temporal candidate. To save the MV line buffer, all the spatial candidates are restricted to within two CTU lines.
2.4 Intra prediction in JEM
2.4.1 Intra mode coding with 67 Intra prediction modes
To capture the arbitrary edge directions present in natural video, the number of directional intra modes is extended from the 33 used in HEVC to 65. The additional directional modes are depicted as red dashed arrows in fig. 16, and the planar and DC modes remain the same. These denser directional intra prediction modes apply to all block sizes and to both luma and chroma intra prediction.
2.4.2 Luma intra mode coding
To accommodate the increased number of directional intra modes, an intra mode coding method with 6 Most Probable Modes (MPM) is used. Two main technical aspects are involved: 1) derivation of 6 MPMs, and 2) entropy coding of 6 MPMs and non-MPM modes.
In JEM, the modes included in the MPM list are classified into three groups:
neighborhood intra mode
Derived intra modes
Default intra mode
Five neighboring intra prediction modes are used to form the MPM list. The positions of the five neighboring blocks are the same as those used in the Merge mode, i.e., left (L), above (A), below left (BL), above right (AR), and above left (AL), as shown in fig. 17. An initial MPM list is formed by inserting the five neighboring intra modes and the planar and DC modes into the MPM list. A pruning process is used to remove duplicated modes so that only unique modes can be included in the MPM list. The order in which the initial modes are included is: left, above, planar, DC, below left, above right, and then above left.
Fig. 17 shows an example of neighboring blocks for MPM derivation.
If the MPM list is not full (i.e., there are fewer than 6 MPM candidates in the list), derived modes are added; these intra modes are obtained by adding -1 or +1 to the angular modes that are already included in the MPM list. Such additional derived modes are not generated from the non-angular modes (DC or planar).
Finally, if the MPM list is still not complete, the default modes are added in the following order: vertical, horizontal, mode 2, and diagonal mode. As a result of this process, a unique list of 6 MPM modes is generated.
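The three-group construction described above can be summarized in a short sketch. This is a simplified illustration of the 6-MPM derivation as described in this section, written for this description rather than taken from the reference software; the mode numbering (0 = planar, 1 = DC, 2..66 angular, HOR = 18, DIA = 34, VER = 50) and the wrap-around of the derived modes are assumptions.

    #include <algorithm>
    #include <vector>

    constexpr int PLANAR = 0, DC = 1, MODE2 = 2, HOR = 18, DIA = 34, VER = 50;
    constexpr int NUM_MPM = 6;

    static void pushUnique(std::vector<int>& mpm, int mode) {
      if (mode >= 0 && static_cast<int>(mpm.size()) < NUM_MPM &&
          std::find(mpm.begin(), mpm.end(), mode) == mpm.end())
        mpm.push_back(mode);
    }

    // Neighboring modes in the order left, above, below-left, above-right,
    // above-left; -1 marks a neighbor that is unavailable or not intra coded.
    std::vector<int> buildMpmList(const int nb[5]) {
      std::vector<int> mpm;
      // 1) neighboring intra modes plus planar and DC, in the prescribed order
      const int initial[7] = { nb[0], nb[1], PLANAR, DC, nb[2], nb[3], nb[4] };
      for (int m : initial) pushUnique(mpm, m);
      // 2) derived modes: -1/+1 of the angular modes already in the list
      for (size_t i = 0; i < mpm.size() && mpm.size() < NUM_MPM; ++i) {
        int m = mpm[i];
        if (m <= DC) continue;                        // skip planar and DC
        pushUnique(mpm, ((m - 2 - 1 + 65) % 65) + 2); // m - 1 with wrap-around
        pushUnique(mpm, ((m - 2 + 1) % 65) + 2);      // m + 1 with wrap-around
      }
      // 3) default modes
      for (int m : { VER, HOR, MODE2, DIA }) pushUnique(mpm, m);
      return mpm;
    }

    int main() {
      const int nb[5] = { HOR, -1, -1, -1, -1 };  // only the left neighbor is angular
      return static_cast<int>(buildMpmList(nb).size());  // 6
    }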
For entropy coding of the selected mode using the 6 MPMs, a truncated unary binarization is used. The first three bins are coded with contexts that depend on the MPM mode related to the bin currently being signaled. The MPM modes are classified into one of three categories: (a) modes that are predominantly horizontal (i.e., the MPM mode number is less than or equal to the mode number of the diagonal direction), (b) modes that are predominantly vertical (i.e., the MPM mode number is greater than the mode number of the diagonal direction), and (c) the non-angular (DC and planar) class.
The coding for selecting one of the remaining 61 non-MPMs is done as follows. The 61 non-MPMs are first divided into two sets: a selected mode set and an unselected mode set. The selected mode set contains 16 modes, and the remaining 45 modes are assigned to the unselected mode set. The mode set to which the current mode belongs is indicated in the bitstream with a flag. If the mode to be indicated is within the selected mode set, the selected mode is signaled using a 4-bit fixed-length code, and if the mode to be indicated is from the unselected set, it is signaled using a truncated binary code. The selected mode set is generated by sub-sampling the 61 non-MPM modes as follows:
the selected set of patterns {0,4,8,12,16,20.. 60}
Unselected mode set {1,2,3,5,6,7,9,10.. 59}
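Signaling of a non-MPM mode therefore amounts to a set-membership flag followed by either a 4-bit fixed-length code or a truncated binary code. The sketch below is written for this description; the bit-writer, the flag polarity, and the exact truncated binary variant are assumptions.

    #include <string>

    // Append 'len' bits of 'value' (most significant bit first).
    static void writeFixedLength(std::string& bs, int value, int len) {
      for (int i = len - 1; i >= 0; --i) bs += ((value >> i) & 1) ? '1' : '0';
    }

    // Truncated binary code for value in [0, n-1]; here n = 45 unselected modes.
    static void writeTruncatedBinary(std::string& bs, int value, int n) {
      int k = 0;
      while ((1 << (k + 1)) <= n) ++k;   // k = floor(log2(n))
      int u = (1 << (k + 1)) - n;        // number of short, k-bit codewords
      if (value < u) writeFixedLength(bs, value, k);
      else           writeFixedLength(bs, value + u, k + 1);
    }

    // Encode one of the 61 non-MPM modes, indexed 0..60 after MPM removal.
    std::string encodeNonMpm(int nonMpmIndex) {
      std::string bs;
      bool selected = (nonMpmIndex % 4 == 0);  // selected set = {0, 4, 8, ..., 60}
      bs += selected ? '1' : '0';              // set-membership flag (polarity assumed)
      if (selected)
        writeFixedLength(bs, nonMpmIndex / 4, 4);                        // 16 modes, 4 bits
      else
        writeTruncatedBinary(bs, nonMpmIndex - nonMpmIndex / 4 - 1, 45); // 45 modes
      return bs;
    }

    int main() { return static_cast<int>(encodeNonMpm(8).size()); }  // flag + 4 bits = 5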
On the encoder side, a two-stage intra mode decision process similar to that of HM is used. In the first stage, the intra mode pre-selection stage, N intra prediction modes are pre-selected from all the available intra modes using the lower-complexity sum of absolute transformed differences (SATD) cost. In the second stage, a higher-complexity R-D cost selection is further applied to choose one intra prediction mode from the N candidates. However, when the 67 intra prediction modes are applied, since the total number of available modes is roughly doubled, the complexity of the intra mode pre-selection stage would also increase if the same encoder mode decision process of HM were used directly. To minimize the encoder complexity increase, a two-step intra mode pre-selection process is performed. In the first step, N (with N depending on the intra prediction block size) modes are selected from the original 35 intra prediction modes (indicated by the solid black arrows in fig. 16) based on the SATD measure; in the second step, the direct neighbors of the selected N modes (additional intra prediction directions, as indicated by the dashed arrows in fig. 16) are further checked by SATD, and the list of the selected N modes is updated. Finally, the first M MPMs are added to the N modes if they are not already included, and the final list of candidate intra prediction modes is generated for the second-stage R-D cost check, which is done in the same way as in HM. Based on the original setting in HM, the value of M is increased by one, and N is slightly decreased, as shown in Table 1.
Table 1: number of mode candidates in the intra mode pre-selection step
2.4.3 Chroma intra mode coding
In JEM, a total of 11 intra modes are allowed for chroma CB coding. These modes include 5 conventional intra modes and 6 cross-component linear model modes. The list of chroma mode candidates comprises the following three parts:
CCLM mode
DM modes: intra prediction modes derived from the luma CBs covering the five collocated positions of the current chroma block
The five positions to be checked in order are: the center (CR), upper-left (TL), upper-right (TR), lower-left (BL), and lower-right (BR) 4×4 blocks within the corresponding luma block of the current chroma block for I slices. For P and B slices, only one of these five sub-blocks is checked, since they have the same mode index. An example of the five collocated luma positions is shown in fig. 18.
Chroma prediction modes from spatially neighboring blocks:
    • 5 chroma prediction modes from the left, above, below-left, above-right, and above-left spatially adjacent blocks
    • Planar and DC modes
    • Derived modes, obtained by adding -1 or +1 to the angular modes already included in the list
    • Vertical, horizontal, mode 2
A pruning process is applied whenever a new chroma intra mode is added to the candidate list. The non-CCLM chroma intra mode candidate list is then trimmed to a size of 5. For mode signaling, a flag is first signaled to indicate whether one of the CCLM modes or one of the conventional chroma intra prediction modes is used. A few more flags may then follow to specify the exact chroma prediction mode used for the current chroma CB.
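The construction of the non-CCLM part of this candidate list can be sketched as follows (a simplified illustration written for this description; the mode numbering follows the 67-mode scheme, and the exact ordering details are assumptions based on the list above).

    #include <algorithm>
    #include <vector>

    static void pushUnique(std::vector<int>& list, int mode) {
      if (mode >= 0 && std::find(list.begin(), list.end(), mode) == list.end())
        list.push_back(mode);
    }

    // DM modes from the collocated luma positions, then modes of spatial
    // neighbors, planar and DC, derived modes, default modes, trimmed to 5.
    std::vector<int> chromaCandidates(const std::vector<int>& dmModes,
                                      const std::vector<int>& neighborModes) {
      constexpr int PLANAR = 0, DC = 1, MODE2 = 2, HOR = 18, VER = 50;
      std::vector<int> list;
      for (int m : dmModes) pushUnique(list, m);
      for (int m : neighborModes) pushUnique(list, m);
      pushUnique(list, PLANAR);
      pushUnique(list, DC);
      const std::vector<int> base = list;   // derive -1/+1 from the angular modes so far
      for (int m : base)
        if (m > DC) { pushUnique(list, m - 1); pushUnique(list, m + 1); }
      for (int m : { VER, HOR, MODE2 }) pushUnique(list, m);
      if (list.size() > 5) list.resize(5);  // trim the non-CCLM list to size 5
      return list;
    }

    int main() {
      return static_cast<int>(chromaCandidates({50}, {18, 50, 0}).size());  // 5
    }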
3. Examples of problems addressed by embodiments
With QTBT, quite different CU shapes exist, e.g., 4×32 and 32×4. Different CU shapes may have different dependencies on their neighboring blocks. However, in intra-mode and inter-mode coding, the Merge list, the AMVP list, and the MPM list are constructed in the same manner for all CU shapes, which is not reasonable.
Meanwhile, the default intra modes for MPM list construction are always Vertical (VER), Horizontal (HOR), mode 2, and diagonal mode (DIG), which is not reasonable.
4. Examples of the embodiments
To solve the technical problems described in this patent document, and to provide other benefits, shape dependent intra/inter mode coding is proposed, in which different Merge lists, AMVP lists, or MPM lists may be constructed.
The following detailed examples should be considered as examples to explain the general concept. These exemplary features should not be construed in a narrow sense. Furthermore, these exemplary features may be combined in any manner.
1. It is proposed that the insertion of intra mode candidates into the MPM list depends on the shape of the current coding block (e.g., the coding block is a CU); a sketch illustrating items 1 and 2 is given after this list.
a. In one example, for CU shapes with width > N x height, the intra prediction modes extracted from the above neighboring blocks are inserted before the intra prediction modes extracted from the left neighboring blocks, where N is equal to 1,2,3, or other values.
i. Alternatively, in addition, the intra prediction mode extracted from the upper-right neighboring block is inserted before the intra prediction mode extracted from the lower-left neighboring block.
Alternatively, in addition, the intra prediction mode extracted from the upper-left neighboring block is inserted before the intra prediction mode extracted from the lower-left neighboring block.
Optionally, additionally, all intra prediction modes extracted from neighboring blocks above the current block are inserted before the intra prediction mode extracted from neighboring blocks to the left of the current block.
b. In one example, for CU shapes with width > N × height, it is proposed to insert more intra prediction modes extracted from the upper block, as in the upper block shown in fig. 19A.
c. In one example, for CU shapes with height > N × width, it is proposed to insert more intra prediction modes extracted from the left block, such as the left middle block shown in fig. 19B.
d. Alternatively, in addition, the remaining intra prediction modes outside the MPM list may be reordered based on block shape. That is, the codeword length or the coding context used to code the remaining intra prediction modes may depend on the block shape.
2. It is proposed that the default intra-mode for constructing the MPM list depends on the current CU shape.
a. In one example, for CU shapes with width > M x height, a Vertical Diagonal (VDIG) mode is used instead of mode 2 (horizontal diagonal), where M is equal to 1,2, or other values.
b. In one example, for CU shapes with width > N × height, the inserted mode HOR-/+k replaces mode 2 and/or the diagonal mode, where k is equal to 1, 2, 3, ..., 8.
c. In one example, for CU shapes with width > N x height, the HOR mode is inserted before the VER mode.
d. In one example, for CU shapes with height > N × width, the inserted mode VER-/+k replaces mode 2 and/or the diagonal mode.
3. Alternatively, it is additionally proposed that after constructing the MPM list, the MPM list is also reordered depending on the current CU shape.
a. In one example, for CU shapes with width > N × height, intra prediction modes closer to the horizontal direction are preferred over other modes closer to the vertical direction.
i. The MPM list is scanned from the beginning, and when an intra prediction mode closer to the vertical direction is encountered, its subsequent modes are checked, and if a mode closer to the horizontal direction is found, the two modes are swapped. This process is repeated until the entire list is processed.
Alternatively, such exchanges do not apply to the modes VER-/+ k, even if they are closer to the vertical direction, where k equals 1,2,3, or other values.
b. In one example, for CU shapes with height > N × width, intra prediction modes closer to the vertical direction are preferred over other modes closer to the horizontal direction.
i. The MPM list is scanned from the beginning, and when an intra prediction mode closer to the horizontal direction is encountered, its subsequent modes are checked, and if a mode closer to the vertical direction is found, the two modes are swapped. This process is repeated until the entire list is processed.
ii. Alternatively, such exchanges do not apply to the modes HOR-/+ k, even if they are closer to the horizontal direction.
4. The term "block shape" in the above bullets may mean:
a. Whether the block is square or non-square.
b. The ratio of the width to the height of the current coding block.
c. The shape defined by the width and the height of the block.
5. The proposed method may be applied to certain modes, block sizes/shapes and/or certain sub-block sizes.
a. The proposed method can be applied to certain modes, such as traditional translational motion (i.e. affine mode disabled).
b. The proposed method can be applied to certain block sizes.
i. In one example, the proposed method is only applied to blocks with w × h >= T, where w and h are the width and height of the current block.
ii. In another example, the proposed method is only applied to blocks with w >= T && h >= T.
6. The proposed method can be applied to all color components. Alternatively, the proposed method may be applied to only some color components. For example, the proposed method may only be applied to the luminance component. One possible way to combine these applicability conditions with the size conditions of item 5 is sketched below.
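For illustration only, the following is a minimal Python sketch of the applicability conditions in items 5 and 6 above. The threshold T, the parameter names, and the choice of which size test to apply are assumptions of this sketch, not requirements of the embodiments.

def applies_to_block(width, height, component='luma', affine_enabled=False,
                     T=16, size_rule='area'):
    # Item 5.a: apply only for conventional translational motion (affine disabled).
    if affine_enabled:
        return False
    # Item 6: in this example the method is restricted to the luma component.
    if component != 'luma':
        return False
    # Item 5.b: two alternative example size conditions, selected by size_rule.
    if size_rule == 'area':
        return width * height >= T          # 5.b.i: w * h >= T
    return width >= T and height >= T       # 5.b.ii: w >= T and h >= T

print(applies_to_block(32, 4, T=64))                       # True: area 128 >= 64
print(applies_to_block(32, 4, T=8, size_rule='min_side'))  # False: height 4 < 8

Such a gate would be evaluated before any of the shape-dependent MPM construction is attempted; the specific values shown are examples only.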
Fig. 20 is a block diagram of the video processing apparatus 2000. The apparatus 2000 may be used to implement one or more of the methods described herein. The apparatus 2000 may be embodied in a smartphone, tablet, computer, Internet of Things (IoT) receiver, or the like. The apparatus 2000 may include one or more processors 2002, one or more memories 2004, and video processing hardware 2006. The processor(s) 2002 may be configured to implement one or more methods described in this patent document, such as the method described with reference to method 2200. The memory (or memories) 2004 may be used to store data and code for implementing the methods and techniques described herein, such as the methods described with reference to method 2200. The video processing hardware 2006 may be hardware circuitry for implementing some of the methods described in this patent document. In various implementations, the memory 2004 and/or the video processing hardware 2006 may be partially or wholly incorporated into the processor 2002 itself.
Fig. 22 is a flow chart of a method 2200 of video bitstream processing. The method 2200 comprises: for at least part of the intra-coded video block, a list of intra-mode candidates is generated (2202) according to a first shape-dependency rule that depends on the shape of the video block, and a decoded representation of the video block is reconstructed (2204) using the list of intra-mode candidates.
Referring to method 2200, in some embodiments, the list of intra mode candidates is a Most Probable Mode (MPM) candidate list. Referring to method 2200, in some embodiments, the first shape dependency rule specifies an order in which neighboring blocks are checked for insertion into the list of intra mode candidates. Referring to method 2200, in some embodiments, the first shape dependency rule specifies that, in a case where the width of the video block is greater than N times the height of the video block, where N is an integer greater than or equal to 1, the list of intra mode candidates is generated by using intra prediction modes from above adjacent blocks relative to the video block before intra prediction modes from left adjacent blocks relative to the video block.
Referring to method 2200, in some embodiments, the intra prediction mode from the top-right adjacent block relative to the video block is added to the list of intra mode candidates before the intra prediction mode from the bottom-left adjacent block relative to the video block, or the intra prediction mode from the top-left adjacent block relative to the video block is added to the list of intra mode candidates before the intra prediction mode from the bottom-left adjacent block relative to the video block.
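As a non-normative illustration of this shape-dependent check order, a minimal Python sketch is given below. The neighbor labels, the helper add_mode(), the list size max_size, and the orders used for tall and roughly square blocks are assumptions made only for this sketch.

def build_mpm_list(neighbor_modes, width, height, N=1, max_size=6):
    # neighbor_modes maps a neighbor position ('left', 'above', 'below_left',
    # 'above_right', 'above_left') to its intra prediction mode, or None if that
    # neighbor is unavailable or not intra coded.
    mpm = []

    def add_mode(mode):
        # Pruning: only add available modes that are not already in the list.
        if mode is not None and mode not in mpm and len(mpm) < max_size:
            mpm.append(mode)

    if width > N * height:
        # Wide block: above-side neighbors are checked before left-side neighbors,
        # and the above-right neighbor before the below-left neighbor.
        order = ['above', 'above_right', 'above_left', 'left', 'below_left']
    elif height > N * width:
        # Tall block: a mirrored order favoring the left-side neighbors (assumed here).
        order = ['left', 'below_left', 'above_left', 'above', 'above_right']
    else:
        # Roughly square block: a conventional left-then-above order (assumed here).
        order = ['left', 'above', 'below_left', 'above_right', 'above_left']

    for pos in order:
        add_mode(neighbor_modes.get(pos))
    return mpm

# Example: a wide 32x4 block whose above neighbor uses mode 50 (vertical) and whose
# left neighbor uses mode 18 (horizontal) yields [50, 18] rather than [18, 50].
print(build_mpm_list({'above': 50, 'left': 18}, width=32, height=4))

In a complete MPM derivation, default and derived modes would be appended after this neighbor scan; only the neighbor check order is illustrated here.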
Referring to method 2200, in some embodiments, the first shape dependency rule specifies that the list of intra mode candidates includes intra prediction modes from an above adjacent block relative to the video block if a width of the video block is greater than N times a height of the video block, where N is an integer greater than or equal to 1. Referring to method 2200, in some embodiments, the top adjacent block is a middle block.
Referring to method 2200, in some embodiments, the first shape dependency rule specifies that the list of intra mode candidates includes an intra prediction mode from a left neighboring block relative to the video block if the height of the video block is greater than N times the width of the video block, where N is an integer greater than or equal to 1. Referring to method 2200, in some embodiments, the left adjacent block is a middle block. Referring to method 2200, in some embodiments, the video bitstream processing comprises a compressed representation of a video block encoded using a codeword that is assigned using a second shape dependency rule.
Referring to method 2200, in some embodiments, the first shape dependency rule specifies a default intra mode for constructing the list of intra mode candidates. Referring to method 2200, in some embodiments, the first shape dependency rule specifies that the default intra mode corresponds to a vertical diagonal mode if the width of the video block is greater than M times the height of the video block, where M is an integer greater than or equal to 1. Referring to method 2200, in some embodiments, the first shape dependency rule specifies that, in the event that the width of the video block is greater than M times the height of the video block, where M is an integer greater than or equal to 1, the mode HOR-/+ k is used as the default intra mode, where k = 1, 2, 3, 4, 5, 6, 7, or 8. Referring to method 2200, in some embodiments, the first shape dependency rule specifies that, in the event that the height of the video block is greater than N times the width of the video block, where N is an integer greater than or equal to 1, the mode VER-/+ k is inserted into the list of intra mode candidates, where k = 1, 2, 3, 4, 5, 6, 7, or 8. Referring to method 2200, in some embodiments, the first shape dependency rule specifies that the list of intra mode candidates comprises an HOR mode before a VER mode in the event that the width of the video block is greater than N times the height of the video block, where N is an integer greater than or equal to 1.
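The following minimal Python sketch illustrates how such shape-dependent default modes might be chosen. The 67-mode numbering (HOR = 18, DIA = 34, VER = 50, VDIA = 66), the function name, and the particular k values are assumptions made only for illustration.

HOR, DIA, VER, VDIA = 18, 34, 50, 66   # assumed 67-mode numbering

def default_modes(width, height, M=1, k_values=(1, 2)):
    defaults = [VER, HOR, 2, DIA]       # a conventional, shape-agnostic fallback order
    if width > M * height:
        # Wide block: VDIA replaces mode 2, HOR-/+k replaces the diagonal mode,
        # and HOR is placed before VER.
        defaults = [HOR, VER, VDIA]
        defaults += [HOR - k for k in k_values] + [HOR + k for k in k_values]
    elif height > M * width:
        # Tall block: VER-/+k replaces mode 2 and/or the diagonal mode.
        defaults = [VER, HOR]
        defaults += [VER - k for k in k_values] + [VER + k for k in k_values]
    return defaults

print(default_modes(32, 4))    # [18, 50, 66, 17, 16, 19, 20] for a wide block
print(default_modes(4, 32))    # [50, 18, 49, 48, 51, 52] for a tall block

These defaults would only be consulted when the MPM list is not already filled by the neighbor-derived modes.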
Referring to method 2200, in some embodiments, the first shape dependency rule specifies an order of the list of intra mode candidates that depends on the shape of the video block. Referring to method 2200, in some embodiments, the first shape dependency rule specifies that, in the event that the width of the video block is greater than N times the height of the video block, where N is an integer greater than or equal to 1, intra prediction modes closer to the horizontal direction are used instead of other modes closer to the vertical direction. Referring to method 2200, in some embodiments, the method further comprises reordering the list of intra mode candidates by: scanning the list of intra mode candidates from a beginning portion of the list of intra mode candidates; and, in the event that an entry with an intra prediction mode closer to the vertical direction is found, swapping that entry with a subsequent entry closer to the horizontal direction.
Referring to method 2200, in some embodiments, the first shape dependency rule specifies that, in the event that the height of the video block is greater than N times the width of the video block, where N is an integer greater than or equal to 1, intra prediction modes closer to the vertical direction are used instead of other modes closer to the horizontal direction. Referring to method 2200, in some embodiments, the method further comprises reordering the list of intra mode candidates by: scanning the list of intra mode candidates from a beginning portion of the list of intra mode candidates; and, in the event that an entry with an intra prediction mode closer to the horizontal direction is found, swapping that entry with a subsequent entry closer to the vertical direction.
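A minimal Python sketch of this scan-and-swap reordering is given below. The closeness measure, the handling of the planar and DC modes, and the protected sets VER-/+k and HOR-/+k are illustrative assumptions rather than normative choices.

HOR, VER = 18, 50   # assumed 67-mode numbering

def closer_to(mode, axis):
    # Only angular modes (>= 2) are compared; planar (0) and DC (1) are left in place.
    if mode < 2:
        return False
    d_hor, d_ver = abs(mode - HOR), abs(mode - VER)
    return d_hor < d_ver if axis == 'hor' else d_ver < d_hor

def reorder_mpm(mpm, width, height, N=1, k=2):
    mpm = list(mpm)
    if width > N * height:
        prefer, avoid = 'hor', 'ver'
        protected = {VER + d for d in range(-k, k + 1) if d != 0}   # VER-/+k kept in place
    elif height > N * width:
        prefer, avoid = 'ver', 'hor'
        protected = {HOR + d for d in range(-k, k + 1) if d != 0}   # HOR-/+k kept in place
    else:
        return mpm
    # Scan from the beginning; when a mode closer to the non-preferred direction is
    # met, look for a later mode closer to the preferred direction and swap the two.
    for i in range(len(mpm)):
        if closer_to(mpm[i], avoid) and mpm[i] not in protected:
            for j in range(i + 1, len(mpm)):
                if closer_to(mpm[j], prefer):
                    mpm[i], mpm[j] = mpm[j], mpm[i]
                    break
    return mpm

# Example: for a wide 32x4 block, the vertical mode 50 is moved behind the
# horizontal-leaning modes 17 and 19.
print(reorder_mpm([0, 50, 17, 1, 19], width=32, height=4))   # [0, 17, 19, 1, 50]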
Referring to method 2200, in some embodiments, a video block comprises a Coding Unit (CU). Referring to method 2200, in some embodiments, the shape of the video block is one of a square or a rectangle. Referring to method 2200, in some embodiments, the shape of the video block corresponds to a ratio of its width and height. Referring to method 2200, in some embodiments, the first shape dependency rule selectively applies two different dependency rules based on the encoding conditions of the video block. Referring to method 2200, in some embodiments, the encoding condition comprises whether the number of pixels in the video block, the height of the video block, or the width of the video block is greater than or equal to a threshold. Referring to method 2200, in some embodiments, the method is applied to one or more of the luma component or the chroma component of the video block.
The video decoding apparatus includes a processor that may be configured to implement the method described with reference to method 2200. The video encoding apparatus includes a processor that may be configured to implement the method described with reference to method 2200. A computer program product having computer code stored thereon, which when executed by a processor, causes the processor to implement the method described with reference to method 2200.
Referring to method 2200, a video block may represent a CU of a compressed video bitstream. The shape of the video block may depend on the actual values of the aspect ratio, or the height and width, or the relative values of the height and width. In various embodiments, the list of various candidates may be generated implicitly or explicitly (e.g., by storing the list in memory).
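As an illustration of the different ways in which the block shape may be characterized (square versus non-square, aspect ratio, or raw width and height), a minimal Python sketch follows; the field names are arbitrary.

def block_shape(width, height):
    return {
        'is_square': width == height,          # square vs. non-square block
        'aspect_ratio': width / height,        # ratio of width to height
        'dimensions': (width, height),         # shape defined by width and height
    }

print(block_shape(32, 4))   # {'is_square': False, 'aspect_ratio': 8.0, 'dimensions': (32, 4)}

Any of these characterizations can serve as the input to the shape dependency rules discussed above.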
Referring to method 2200, some examples of neighboring blocks and their use are described in chapter 4 of this document. For example, as described in chapter 4, for different shapes of video blocks, an above-adjacent block or a left-adjacent block may be preferred. In some embodiments, the above or left center (or middle) block or sub-block may be the preferred block, with candidates from the preferred block added to the list.
Referring to method 2200, a block of video may be encoded in a video bitstream using a codeword-based technique (e.g., context adaptive binary arithmetic coding or variable length coding) in which bit efficiency may be achieved by using bitstream generation rules that also depend on the shape of the video block.
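As a non-normative illustration of how the coding of the remaining (non-MPM) modes could depend on the block shape (see bullet 1.d above), the following Python sketch reorders the non-MPM modes so that modes aligned with the shape-preferred direction receive the smallest indices and therefore, under a truncated binary or similar binarization, the shortest codewords on average. The mode numbering and the distance measure are assumptions of the sketch.

HOR, VER, NUM_MODES = 18, 50, 67   # assumed 67-mode numbering

def order_remaining_modes(mpm, width, height, N=1):
    remaining = [m for m in range(NUM_MODES) if m not in mpm]
    if width > N * height:
        key = lambda m: abs(m - HOR) if m >= 2 else NUM_MODES   # wide: near-HOR first
    elif height > N * width:
        key = lambda m: abs(m - VER) if m >= 2 else NUM_MODES   # tall: near-VER first
    else:
        return remaining                                        # square: natural order
    return sorted(remaining, key=key)

# The index of the selected mode in this reordered list is what would then be
# binarized, so shape-favored modes tend to get shorter codewords.
mpm = [0, 1, 18, 50, 2, 34]
print(order_remaining_modes(mpm, width=32, height=4)[:5])   # [17, 19, 16, 20, 15]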
Referring to method 2200, the shape of the encoded video block may be used to decide which blocks to use for the candidate, or to decide the order in which to place the candidates in the list of candidates, or both.
It should be appreciated that the disclosed techniques may be embodied in a video encoder or decoder to improve compression efficiency when the coding unit being compressed has a shape that is significantly different from a conventional square or near-square block. For example, new coding tools using wide or tall coding units, such as units of size 4 × 32 or 32 × 4, may benefit from the disclosed techniques.
The disclosure and other aspects, examples, embodiments, modules, and functional operations described in this document can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this document and their structural equivalents, or in combinations of one or more of them. The disclosed and other embodiments may be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer-readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a combination of substances that affect a machine-readable propagated signal, or a combination of one or more of them. The term "data processing apparatus" includes all devices, apparatus, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. In addition to hardware, the apparatus can include code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this document can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such a device. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this document contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. In this document, certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described in this document should not be understood as requiring such separation in all embodiments.
Only a few implementations and examples are described, and other implementations, enhancements and variations can be made based on what is described and illustrated in this document.

Claims (28)

1. A video bitstream processing method, the method comprising:
for at least a portion of the intra-coded video block, generating a list of intra mode candidates according to a first shape dependency rule that depends on a shape of the video block, wherein an insertion order of the intra mode candidates used to generate the list of intra mode candidates depends on the first shape dependency rule; and
reconstructing a decoded representation of the video block using the list of intra mode candidates,
wherein the video bitstream processing comprises a compressed representation of the video block encoded using a codeword that is assigned using a second shape dependency rule.
2. The method of claim 1, wherein the list of intra mode candidates is a Most Probable Mode (MPM) candidate list.
3. The method of claim 1, wherein the first shape dependency rule specifies an order in which neighboring blocks are checked for insertion into the list of intra mode candidates.
4. The method of claim 3, wherein the first shape dependency rule specifies that in a case that a width of the video block is greater than N times a height of the video block, where N is an integer greater than or equal to 1, the list of intra mode candidates is first generated by using intra prediction modes from above adjacent blocks relative to the video block before intra prediction modes from left adjacent blocks relative to the video block.
5. The method of claim 4, wherein
Adding an intra-prediction mode from a block adjacent to the upper right relative to the video block to the list of intra-mode candidates prior to an intra-prediction mode from a block adjacent to the lower left relative to the video block, or
Adding an intra-prediction mode from an upper-left neighboring block relative to the video block to the list of intra-mode candidates prior to an intra-prediction mode from a lower-left neighboring block relative to the video block.
6. The method of claim 1, wherein the first shape dependency rule specifies that the list of intra mode candidates comprises intra prediction modes from above adjacent blocks relative to the video block if a width of the video block is greater than N times a height of the video block, where N is an integer greater than or equal to 1.
7. The method of claim 6, wherein the above-adjacent block is a middle block.
8. The method of claim 1, wherein the first shape dependency rule specifies that the list of intra mode candidates comprises intra prediction modes from left neighboring blocks relative to the video block if a height of the video block is greater than N times a width of the video block, where N is an integer greater than or equal to 1.
9. The method of claim 8, wherein the left adjacent block is a middle block.
10. The method of claim 1, wherein the first shape dependency rule specifies a default intra mode for constructing the list of intra mode candidates.
11. The method of claim 10, wherein the first shape dependency rule specifies that the default intra mode corresponds to a vertical diagonal mode if a width of the video block is greater than M times a height of the video block, where M is an integer greater than or equal to 1.
12. The method of claim 10, wherein the first shape dependency rule specifies that, in the event that a width of the video block is greater than M times a height of the video block, where M is an integer greater than or equal to 1, a mode HOR-/+ k is used as the default intra mode, where k = 1,2,3, 4, 5,6,7, or 8.
13. The method of claim 10, wherein the first shape dependency rule specifies that a mode VER-/+ k is inserted in the list of intra mode candidates where k = 1,2,3, 4, 5,6,7, or 8 if the height of the video block is greater than N times the width of the video block, where N is an integer greater than or equal to 1.
14. The method of claim 10, wherein the first shape dependency rule specifies that the list of intra mode candidates comprises an HOR mode before a VER mode if a width of the video block is greater than N times a height of the video block, where N is an integer greater than or equal to 1.
15. The method of claim 1, wherein the first shape dependency rule specifies an order of the list of intra mode candidates that depends on a shape of the video block.
16. The method of claim 1, wherein the first shape dependency rule specifies that in the event that the width of the video block is greater than N times the height of the video block, where N is an integer greater than or equal to 1, intra prediction modes closer to a horizontal direction are used instead of other modes closer to a vertical direction.
17. The method of claim 1, wherein the method further comprises reordering the list of intra mode candidates by:
scanning the list of intra-mode candidates from a beginning portion of the list of intra-mode candidates; and
in case an intra prediction mode entry closer to the vertical direction is found, this entry is swapped with a subsequent entry closer to the horizontal direction.
18. The method of claim 1, wherein the first shape dependency rule specifies that in the event that the height of the video block is greater than N times the width of the video block, where N is an integer greater than or equal to 1, intra prediction modes closer to a vertical direction are used instead of other modes closer to a horizontal direction.
19. The method of claim 1, wherein the method further comprises reordering the list of intra mode candidates by:
scanning the list of intra-mode candidates from a beginning portion of the list of intra-mode candidates; and
in case an intra prediction mode entry closer to the horizontal direction is found, this entry is swapped with a subsequent entry closer to the vertical direction.
20. The method of any of claims 1-19, wherein the video block comprises a Coding Unit (CU).
21. The method of any of claims 1-19, wherein the shape of the video block is one of a square or a rectangle.
22. The method of any of claims 1-19, wherein a shape of the video block corresponds to a ratio of a width of the video block and a height of the video block.
23. The method of any of claims 1-19, wherein the first shape dependency rule selectively applies two different dependency rules based on encoding conditions of the video block.
24. The method of claim 23, wherein the encoding condition comprises whether a number of pixels in the video block or a height of the video block or a width of the video block is greater than or equal to a threshold.
25. The method of any one of claims 1-19, 24, wherein the method is applied to one or more of a luma component or a chroma component of the video block.
26. A video decoding device, the video decoding device comprising: a processor configured to implement the method of one or more of claims 1 to 25.
27. A video encoding device, the video encoding device comprising: a processor configured to implement the method of one or more of claims 1 to 25.
28. A computer readable program medium having stored thereon computer code which, when executed by a processor, causes the processor to implement the method of any one of claims 1 to 25.
CN201910586376.8A 2018-07-01 2019-07-01 Shape dependent intra coding Active CN110677679B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862692805P 2018-07-01 2018-07-01
US62/692,805 2018-07-01

Publications (2)

Publication Number Publication Date
CN110677679A CN110677679A (en) 2020-01-10
CN110677679B true CN110677679B (en) 2022-07-26

Family

ID=67253941

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201910586376.8A Active CN110677679B (en) 2018-07-01 2019-07-01 Shape dependent intra coding
CN201910585161.4A Active CN110677678B (en) 2018-07-01 2019-07-01 Shape dependent inter-frame coding

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201910585161.4A Active CN110677678B (en) 2018-07-01 2019-07-01 Shape dependent inter-frame coding

Country Status (3)

Country Link
CN (2) CN110677679B (en)
TW (2) TW202021344A (en)
WO (2) WO2020008324A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4307665A2 (en) 2019-08-10 2024-01-17 Beijing Bytedance Network Technology Co., Ltd. Buffer management in subpicture decoding
JP7322290B2 (en) 2019-10-02 2023-08-07 北京字節跳動網絡技術有限公司 Syntax for Subpicture Signaling in Video Bitstreams
MX2022004409A (en) 2019-10-18 2022-05-18 Beijing Bytedance Network Tech Co Ltd Syntax constraints in parameter set signaling of subpictures.
WO2021139806A1 (en) * 2020-01-12 2021-07-15 Beijing Bytedance Network Technology Co., Ltd. Constraints for video coding and decoding
WO2024022145A1 (en) * 2022-07-28 2024-02-01 Mediatek Inc. Method and apparatus of amvp with merge mode for video coding

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103748877A (en) * 2011-08-17 2014-04-23 MediaTek Singapore Pte. Ltd. Method and apparatus for intra prediction using non-square blocks
CN104054343A (en) * 2012-01-13 2014-09-17 Sharp Corporation Image decoding device, image encoding device, and data structure of encoded data
CN105898326A (en) * 2010-07-20 2016-08-24 NTT Docomo, Inc. Image prediction encoding device and image prediction encoding method
WO2018037896A1 (en) * 2016-08-26 2018-03-01 Sharp Corporation Image decoding apparatus, image encoding apparatus, image decoding method, and image encoding method
CN108028923A (en) * 2015-09-10 2018-05-11 LG Electronics Inc. Intra-frame prediction method and equipment in video coding system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101365570B1 (en) * 2007-01-18 2014-02-21 Samsung Electronics Co., Ltd. Method and apparatus for encoding and decoding based on intra prediction
US9247266B2 (en) * 2011-04-18 2016-01-26 Texas Instruments Incorporated Temporal motion data candidate derivation in video coding
US9787982B2 (en) * 2011-09-12 2017-10-10 Qualcomm Incorporated Non-square transform units and prediction units in video coding
EP3205109A4 (en) * 2014-12-09 2018-03-07 MediaTek Inc. Method of motion vector predictor or merge candidate derivation in video coding
KR20180028513A (en) * 2015-08-04 2018-03-16 LG Electronics Inc. Method and apparatus for inter prediction in a video coding system
US10547854B2 (en) * 2016-05-13 2020-01-28 Qualcomm Incorporated Neighbor based signaling of intra prediction modes
US10506228B2 (en) * 2016-10-04 2019-12-10 Qualcomm Incorporated Variable number of intra modes for video coding

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105898326A (en) * 2010-07-20 2016-08-24 NTT Docomo, Inc. Image prediction encoding device and image prediction encoding method
CN103748877A (en) * 2011-08-17 2014-04-23 MediaTek Singapore Pte. Ltd. Method and apparatus for intra prediction using non-square blocks
CN104054343A (en) * 2012-01-13 2014-09-17 Sharp Corporation Image decoding device, image encoding device, and data structure of encoded data
CN108028923A (en) * 2015-09-10 2018-05-11 LG Electronics Inc. Intra-frame prediction method and equipment in video coding system
WO2018037896A1 (en) * 2016-08-26 2018-03-01 Sharp Corporation Image decoding apparatus, image encoding apparatus, image decoding method, and image encoding method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Block shape dependent intra mode coding; Vadim Seregin et al.; Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 7th Meeting: Torino, IT, 13–21 July 2017, no. JVET-G0159; 2017-07-21; page 1 (abstract) to page 3, section 3 *

Also Published As

Publication number Publication date
CN110677678B (en) 2022-09-23
WO2020008324A1 (en) 2020-01-09
CN110677678A (en) 2020-01-10
TWI731361B (en) 2021-06-21
WO2020008328A1 (en) 2020-01-09
TW202021344A (en) 2020-06-01
TW202007153A (en) 2020-02-01
CN110677679A (en) 2020-01-10

Similar Documents

Publication Publication Date Title
CN110662054B (en) Method, apparatus, computer readable storage medium for video processing
CN111064959B (en) How many HMVP candidates to examine
CN110662063B (en) Video processing method, apparatus and computer readable storage medium
CN110662070B (en) Selection of coded motion information for look-up table update
CN110662039B (en) Updating the lookup table: FIFO, constrained FIFO
CN110677679B (en) Shape dependent intra coding
WO2020003279A1 (en) Concept of using one or multiple look up tables to store motion information of previously coded in order and use them to code following blocks
WO2020003270A1 (en) Number of motion candidates in a look up table to be checked according to mode
CN113615193A (en) Merge list construction and interaction between other tools
CN110677668B (en) Spatial motion compression
WO2020044195A1 (en) Multi-motion model based video coding and decoding
CN110839160B (en) Forced boundary partitioning for extended quadtree partitioning
CN110662030A (en) Video processing method and device
CN113273216B (en) Improvement of MMVD
CN110677650B (en) Reducing complexity of non-adjacent mere designs
CN110719466B (en) Method, apparatus and storage medium for video processing
CN113196777B (en) Reference pixel padding for motion compensation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant