CN114303380B - Encoder, decoder and corresponding methods for CABAC coding of indices of geometric partition flags

Publication number
CN114303380B
Authority
CN
China
Prior art keywords
block
geometric
mode
current block
context model
Legal status
Active
Application number
CN202080060441.4A
Other languages
Chinese (zh)
Other versions
CN114303380A (en
Inventor
高晗
塞米赫·艾森力克
伊蕾娜·亚历山德罗夫娜·阿尔希娜
王彪
阿南德·梅赫·科特拉
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/119: Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/13: Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/169: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock

Landscapes

  • Engineering & Computer Science
  • Multimedia
  • Signal Processing
  • Compression Or Coding Systems Of Tv Signals

Abstract

A decoding method implemented by a decoding device is disclosed, comprising: acquiring a code stream; acquiring an aspect ratio of a current block; acquiring a context model index of the current block according to the aspect ratio; acquiring a value of a geometric partition flag of the current block from the code stream according to the context model index of the current block, wherein the geometric partition flag of the current block indicates whether the current block uses a geometric partition mode; and decoding the current block according to the value of the geometric partition flag.

Description

Encoder, decoder and corresponding methods for CABAC coding of indices of geometric partition flags
Technical Field
Embodiments of the invention relate generally to the field of image processing, and more particularly, to CABAC coding processes.
Background
Video coding (video encoding and decoding) is used in a wide range of digital video applications, such as broadcast digital television, video transmission over the internet and mobile networks, real-time conversational applications such as video chat and video conferencing, DVD and Blu-ray discs, video content acquisition and editing systems, and camcorders for security applications.
Even a relatively short video requires a large amount of data to describe, which can cause difficulties when the data is to be streamed or otherwise transmitted over a communication network with limited bandwidth capacity. Video data is therefore typically compressed before being transmitted over modern telecommunication networks. Because memory resources may be limited, the size of a video may also be a problem when the video is stored on a storage device. Video compression devices typically use software and/or hardware at the source to encode the video data before transmission or storage, thereby reducing the amount of data required to represent digital video images. The compressed data is then received at the destination by a video decompression device that decodes the video data. With limited network resources and an ever-increasing demand for higher video quality, improved compression and decompression techniques are needed that increase the compression ratio with little or no sacrifice in image quality.
Disclosure of Invention
Embodiments of the present application provide an apparatus and method for encoding and decoding as set forth in the independent claims.
The above and other objects are achieved by the subject matter as claimed in the independent claims. Other implementations are apparent in the dependent claims, the description and the drawings.
In a first aspect, there is provided a decoding method implemented by a decoding apparatus, including: acquiring a code stream; acquiring an aspect ratio of a current block; acquiring a context model index of the current block according to the aspect ratio; acquiring a value of a geometric partition flag of the current block from the code stream according to the context model index of the current block, wherein the geometric partition flag of the current block indicates whether the current block uses a geometric partition mode; and decoding the current block according to the value of the geometric partition flag.
A context model may be used to decode the geometric partition flag from the binary code stream. Because the probability that geometric partitioning is used may depend on the aspect ratio of the current block, it is advantageous to derive the context model index from the aspect ratio. The context model index is then used to acquire the geometric partition flag from the code stream. The geometric partition flag indicates whether the current block uses the geometric partition mode. When the conditions for applying geometric partitioning are satisfied, each block has a geometric partition flag whose value is 0 or 1.
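The benefit of choosing the context model from the aspect ratio can be illustrated with a small calculation. The sketch below is not taken from the patent: the probabilities are hypothetical values chosen only for illustration, and it assumes an ideal arithmetic coder with stationary statistics, for which the average cost of a bin equals its binary entropy. It compares one shared context model with two separate context models.

```python
import math

def binary_entropy(p: float) -> float:
    """Ideal cost, in bits, of coding one bin whose probability of being 1 is p."""
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

# Hypothetical flag probabilities, chosen only for illustration.
P_FLAG_ELONGATED = 0.1   # assumed: the flag is rarely 1 for elongated blocks
P_FLAG_OTHER = 0.6       # assumed: the flag is often 1 for other blocks
SHARE_ELONGATED = 0.5    # assumed: half of all blocks are elongated

# A single shared context model can only track the blended probability.
p_shared = (SHARE_ELONGATED * P_FLAG_ELONGATED
            + (1.0 - SHARE_ELONGATED) * P_FLAG_OTHER)
bits_shared = binary_entropy(p_shared)

# Separate context models track each probability on its own.
bits_separate = (SHARE_ELONGATED * binary_entropy(P_FLAG_ELONGATED)
                 + (1.0 - SHARE_ELONGATED) * binary_entropy(P_FLAG_OTHER))

print(f"shared context model:    {bits_shared:.3f} bits per flag")
print(f"separate context models: {bits_separate:.3f} bits per flag")
```

Whenever the two probabilities differ, the separate models cost fewer bits on average, which is the motivation for making the context model index depend on the aspect ratio.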
With reference to the first aspect, in a first implementation, the aspect ratio is a ratio of a width and a height of the current block.
With reference to the first aspect or the first implementation manner of the first aspect, in a second implementation manner, the aspect ratio is obtained according to the following equation: ratio = 1 << abs(log2(width) - log2(height)), where width and height are the width and height of the current block, abs() is the absolute value operator, log2() is the base-2 logarithm, and << is the left shift operation.
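As a worked illustration of this equation, the following minimal Python sketch computes the ratio. The function name `aspect_ratio` is hypothetical, and the sketch assumes that the block width and height are powers of two so that log2() yields an integer; the text itself does not state this restriction.

```python
def aspect_ratio(width: int, height: int) -> int:
    """Compute ratio = 1 << abs(log2(width) - log2(height))."""
    for name, value in (("width", width), ("height", height)):
        # Assumption: block dimensions are positive powers of two.
        if value <= 0 or value & (value - 1):
            raise ValueError(f"{name} must be a positive power of two")
    log2_w = width.bit_length() - 1   # exact integer log2 for powers of two
    log2_h = height.bit_length() - 1
    return 1 << abs(log2_w - log2_h)

print(aspect_ratio(32, 8))   # 4
print(aspect_ratio(8, 64))   # 8
print(aspect_ratio(16, 16))  # 1
```

Because of the absolute value, the result is the same for a wide block and for a tall block of the same proportions: it is the longer side divided by the shorter side.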
With reference to the first aspect or any implementation manner of the first aspect, in a third implementation manner, the acquiring a context model index of the current block according to the aspect ratio includes: if the aspect ratio is greater than a predefined threshold, acquiring context model index 3 for the current block. This provides a dedicated context model index for blocks whose aspect ratio is large.
With reference to the first aspect or any implementation manner of the first aspect, in a fourth implementation manner, the acquiring the context model index of the current block according to the aspect ratio includes: if the aspect ratio is equal to or smaller than a predefined threshold, acquiring the context model index of the current block according to information of at least one of a triangular partition mode and a geometric partition mode of neighboring blocks adjacent to the current block, wherein the neighboring blocks include a left neighboring block and an upper neighboring block.
With reference to the first aspect or the third or the fourth implementation manner of the first aspect, in a fifth implementation manner, the predefined threshold is 2^n, where n is a positive integer.
With reference to the first aspect or the third to fifth implementation manners of the first aspect, in a sixth implementation manner, the predefined threshold is 4.
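The threshold rule of the third to sixth implementation manners can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the index derived from the neighboring blocks is passed in as an already computed value.

```python
def is_valid_threshold(threshold: int) -> bool:
    """True when threshold equals 2**n for a positive integer n (2, 4, 8, ...)."""
    return threshold >= 2 and threshold & (threshold - 1) == 0

def select_context_index(ratio: int, neighbor_based_index: int,
                         threshold: int = 4) -> int:
    """Return 3 for large aspect ratios, else the neighbor-based index."""
    if not is_valid_threshold(threshold):
        raise ValueError("threshold must be 2**n with n a positive integer")
    if ratio > threshold:
        return 3                  # dedicated index for elongated blocks
    return neighbor_based_index   # ratio equal to or smaller than the threshold

print(select_context_index(8, 1))  # 3
print(select_context_index(4, 1))  # 1
print(select_context_index(2, 0))  # 0
```

Note that a ratio exactly equal to the threshold takes the neighbor-based path, since the dedicated index applies only when the ratio is strictly greater than the threshold.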
In a second aspect, there is provided a decoding method implemented by a decoding apparatus, including: acquiring a code stream; acquiring a context model index of a current block according to at least one of information of a triangular partition mode and a geometric partition mode of a neighboring block adjacent to the current block, wherein the neighboring block comprises a left neighboring block and an upper neighboring block; acquiring a value of a geometric partition flag of the current block from the code stream according to the context model index of the current block, wherein the geometric partition flag of the current block indicates whether the current block uses a geometric partition mode; and decoding the current block according to the value of the geometric partition flag. The geometric partition mode is an extension of the triangular partition mode. An advantage of the second aspect is that the context model index can be derived from information of the triangular partition mode or the geometric partition mode of a neighboring block adjacent to the current block. The value of the geometric partition flag of the current block is then acquired from the code stream according to the context model index.
With reference to the fourth to sixth implementation manners of the first aspect, in a seventh implementation manner of the first aspect, or, with reference to the second aspect, in a first implementation manner of the second aspect, the acquiring the context model index of the current block according to at least one of information of a triangular partition mode and a geometric partition mode of a neighboring block adjacent to the current block includes: when neither the left neighboring block nor the upper neighboring block uses a geometric partition mode or a triangular partition mode, acquiring context model index 0 for the current block.
With reference to the fourth to sixth implementation manners of the first aspect, in an eighth implementation manner of the first aspect, or, with reference to the second aspect, in a second implementation manner of the second aspect, the acquiring the context model index of the current block according to at least one of information of a triangular partition mode and a geometric partition mode of a neighboring block adjacent to the current block includes: when one of the left neighboring block and the upper neighboring block uses a geometric partition mode or a triangular partition mode, acquiring context model index 1 for the current block.
With reference to the fourth to sixth implementation manners of the first aspect, in a ninth implementation manner of the first aspect, or, with reference to the second aspect, in a third implementation manner of the second aspect, the acquiring the context model index of the current block according to at least one of information of a triangular partition mode and a geometric partition mode of a neighboring block adjacent to the current block includes: when both the left neighboring block and the upper neighboring block use a geometric partition mode or a triangular partition mode, acquiring context model index 2 for the current block.
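Taken together, the three cases above amount to counting how many of the two neighboring blocks use a geometric or triangular partition mode. The sketch below illustrates this under stated assumptions: the `Neighbor` type and its field names are hypothetical, and a missing neighbor (for example at a picture boundary) is treated as not using either mode, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Neighbor:
    """Hypothetical record of the mode information of a neighboring block."""
    uses_geometric: bool = False
    uses_triangular: bool = False

def uses_partition_mode(block: Optional[Neighbor]) -> bool:
    # Assumption: an unavailable neighbor counts as not using either mode.
    return block is not None and (block.uses_geometric or block.uses_triangular)

def neighbor_context_index(left: Optional[Neighbor],
                           above: Optional[Neighbor]) -> int:
    """0 if neither neighbor uses such a mode, 1 if one does, 2 if both do."""
    return int(uses_partition_mode(left)) + int(uses_partition_mode(above))

print(neighbor_context_index(Neighbor(), Neighbor()))                      # 0
print(neighbor_context_index(Neighbor(uses_geometric=True), None))         # 1
print(neighbor_context_index(Neighbor(uses_triangular=True),
                             Neighbor(uses_geometric=True)))               # 2
```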
With reference to the second aspect or any implementation manner of the second aspect, in a fourth implementation manner of the second aspect, the method further includes: acquiring an aspect ratio of the current block, wherein the aspect ratio is a ratio of a width to a height of the current block; and the acquiring a context model index of the current block according to at least one of information of a triangular partition mode and a geometric partition mode of a neighboring block adjacent to the current block includes: acquiring the context model index of the current block according to the aspect ratio and at least one of the information of the triangular partition mode and the geometric partition mode of the neighboring block adjacent to the current block.
With reference to the fourth implementation manner of the second aspect, in a fifth implementation manner of the second aspect, the acquiring the context model index of the current block according to the aspect ratio and at least one of information of a triangular partition mode and a geometric partition mode of a neighboring block adjacent to the current block includes: when the aspect ratio of the current block is greater than a predefined threshold, acquiring context model index 3 for the current block.
With reference to the fifth implementation manner of the second aspect, in a sixth implementation manner of the second aspect, the predefined threshold is 4.
With reference to the fifth or the sixth implementation manner of the second aspect, in a seventh implementation manner of the second aspect, the aspect ratio of the current block is obtained by the following equation: ratio = 1 << abs(log2(width) - log2(height)), where width and height are the width and height of the current block, abs() is the absolute value operator, log2() is the base-2 logarithm, and << is the left shift operation.
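Combining the aspect ratio with the neighbor information, the full index derivation of the second aspect can be sketched end to end. The function name and parameters below are hypothetical, the block dimensions are assumed to be powers of two, and the threshold defaults to 4 as in the sixth implementation manner.

```python
def derive_context_index(width: int, height: int,
                         left_uses_partition: bool,
                         above_uses_partition: bool,
                         threshold: int = 4) -> int:
    """Sketch of the context model index derivation for the geometric partition flag."""
    # Assumption: width and height are positive powers of two.
    log2_w = width.bit_length() - 1
    log2_h = height.bit_length() - 1
    ratio = 1 << abs(log2_w - log2_h)
    if ratio > threshold:
        return 3  # elongated block: dedicated context model
    # Otherwise count the neighbors that use a geometric or triangular mode.
    return int(left_uses_partition) + int(above_uses_partition)

print(derive_context_index(64, 8, True, True))     # 3 (ratio 8 > 4)
print(derive_context_index(32, 8, True, False))    # 1 (ratio 4 is not > 4)
print(derive_context_index(8, 16, True, True))     # 2
print(derive_context_index(16, 16, False, False))  # 0
```

The four possible results, 0 to 3, each select a different context model for decoding the flag.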
With reference to the fourth to sixth implementation manners of the first aspect, in a tenth implementation manner of the first aspect or any implementation manner of the second aspect, in an eighth implementation manner of the second aspect, one piece of information of the geometric partitioning mode of the left neighboring block indicates whether the left neighboring block uses a geometric partitioning mode, and one piece of information of the geometric partitioning mode of the upper neighboring block indicates whether the upper neighboring block uses a geometric partitioning mode.
With reference to the tenth implementation manner of the first aspect, in an eleventh implementation manner of the first aspect or the eighth implementation manner of the second aspect, in a ninth implementation manner of the second aspect, whether the geometric partitioning mode is used by the left neighboring block is determined according to a value of a geometric partitioning flag of the left neighboring block, or whether the geometric partitioning mode is used by the upper neighboring block is determined according to a value of a geometric partitioning flag of the upper neighboring block. In this implementation manner, one piece of information of the left neighboring block is the geometric division flag of the left neighboring block, and one piece of information of the upper neighboring block is the geometric division flag of the upper neighboring block.
With reference to the tenth implementation manner of the first aspect, in a twelfth implementation manner of the first aspect, or, with reference to the ninth implementation manner of the second aspect, in a tenth implementation manner of the second aspect, whether the left neighboring block uses the geometric partition mode is determined according to whether the left neighboring block is allowed to use the geometric partition mode, or whether the upper neighboring block uses the geometric partition mode is determined according to whether the upper neighboring block is allowed to use the geometric partition mode. In this implementation manner, the piece of information of the left neighboring block is information indicating whether the left neighboring block is allowed to use the geometric partition mode, for example, the block size of the left neighboring block, and the piece of information of the upper neighboring block is information indicating whether the upper neighboring block is allowed to use the geometric partition mode, for example, the block size of the upper neighboring block.
With reference to the twelfth implementation manner of the first aspect, in a thirteenth implementation manner of the first aspect, or, with reference to the tenth implementation manner of the second aspect, in an eleventh implementation manner of the second aspect, if the block size of the left neighboring block is smaller than 8×8, the left neighboring block is not allowed to use the geometric partition mode, or if the block size of the upper neighboring block is smaller than 8×8, the upper neighboring block is not allowed to use the geometric partition mode. In this implementation manner, the piece of information of the left neighboring block is the block size of the left neighboring block, and the piece of information of the upper neighboring block is the block size of the upper neighboring block.
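The two ways of deciding whether a neighboring block uses the geometric partition mode, by its flag or by its block size, can be sketched as follows. The function names are hypothetical. The text does not define "smaller than 8×8" precisely; the sketch assumes it means that the width or the height is below 8 samples.

```python
from typing import Optional

MIN_SIDE = 8  # from the 8×8 size limit in the text

def geometric_mode_allowed(width: int, height: int) -> bool:
    # Assumed reading of "smaller than 8×8": either side is shorter than 8.
    return width >= MIN_SIDE and height >= MIN_SIDE

def neighbor_uses_geometric_mode(width: int, height: int,
                                 flag: Optional[bool] = None) -> bool:
    """Decide whether a neighbor counts as using the geometric partition mode.

    With a flag, the flag decides (flag-based variant). Without a flag, the
    block size alone decides (size-based variant): a block that is allowed
    to use the mode is treated as using it.
    """
    if not geometric_mode_allowed(width, height):
        return False  # too small: the mode cannot have been used
    if flag is None:
        return True   # size-based variant
    return flag       # flag-based variant

print(neighbor_uses_geometric_mode(4, 16))          # False
print(neighbor_uses_geometric_mode(16, 16))         # True
print(neighbor_uses_geometric_mode(16, 16, False))  # False
```

The size-based variant avoids storing the flag of each neighboring block, at the cost of a coarser estimate.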
With reference to the fourth to sixth implementation manners of the first aspect, in a fourteenth implementation manner of the first aspect, or, with reference to any implementation manner of the second aspect, in a twelfth implementation manner of the second aspect, one piece of information of the triangular partition mode of the left neighboring block indicates whether the left neighboring block uses a triangular partition mode, and one piece of information of the triangular partition mode of the upper neighboring block indicates whether the upper neighboring block uses a triangular partition mode.
With reference to the fourteenth implementation manner of the first aspect, in a fifteenth implementation manner of the first aspect, or, with reference to the twelfth implementation manner of the second aspect, in a thirteenth implementation manner of the second aspect, whether the left neighboring block uses the triangular partition mode is determined according to a value of a triangular partition flag of the left neighboring block, or whether the upper neighboring block uses the triangular partition mode is determined according to a value of a triangular partition flag of the upper neighboring block.
With reference to the fourteenth implementation manner of the first aspect, in a sixteenth implementation manner of the first aspect, or, with reference to the twelfth implementation manner of the second aspect, in a fourteenth implementation manner of the second aspect, whether the left neighboring block uses the triangular partition mode is determined according to whether the left neighboring block is allowed to use the triangular partition mode, or whether the upper neighboring block uses the triangular partition mode is determined according to whether the upper neighboring block is allowed to use the triangular partition mode.
With reference to the sixteenth implementation manner of the first aspect, in a seventeenth implementation manner of the first aspect, or, with reference to the fourteenth implementation manner of the second aspect, in a fifteenth implementation manner of the second aspect, if the block size of the left neighboring block is smaller than 8×8, the left neighboring block is not allowed to use the triangular partition mode, or if the block size of the upper neighboring block is smaller than 8×8, the upper neighboring block is not allowed to use the triangular partition mode.
A third aspect of the present invention provides a decoder comprising processing circuitry for performing the method of the first aspect or any implementation of the first aspect or the method of the second aspect or any implementation of the second aspect.
A fourth aspect of the present invention provides a computer program product comprising program code for performing the method of the first aspect or any implementation of the first aspect or the method of the second aspect or any implementation of the second aspect.
A fifth aspect of the present invention provides a decoder, comprising: one or more processors; and a non-transitory computer readable storage medium coupled to the one or more processors and storing a program for execution by the one or more processors, wherein the decoder is configured to perform the method according to the first aspect or any implementation of the first aspect, or the method according to the second aspect or any implementation of the second aspect, when the program is executed by the one or more processors.
A sixth aspect of the present invention provides a decoder, comprising: an acquisition module configured to acquire a code stream, acquire an aspect ratio of a current block, acquire a context model index of the current block according to the aspect ratio, and acquire a value of a geometric partition flag of the current block from the code stream according to the context model index of the current block, wherein the geometric partition flag of the current block indicates whether the current block uses a geometric partition mode; and a decoding module configured to decode the current block according to the value of the geometric partition flag.
With reference to the sixth aspect, in a first implementation manner of the sixth aspect, the aspect ratio is a ratio of a width and a height of the current block.
With reference to the sixth aspect or the first implementation manner of the sixth aspect, in a second implementation manner of the sixth aspect, the acquisition module is further configured to obtain the aspect ratio according to the following equation: ratio = 1 << abs(log2(width) - log2(height)), where width and height are the width and height of the current block, abs() is the absolute value operator, log2() is the base-2 logarithm, and << is the left shift operation.
With reference to the sixth aspect or any implementation manner of the sixth aspect, in a third implementation manner of the sixth aspect, the acquisition module is further configured to: if the aspect ratio is greater than a predefined threshold, acquire context model index 3 for the current block.
With reference to the sixth aspect or any implementation manner of the sixth aspect, in a fourth implementation manner of the sixth aspect, the acquisition module is further configured to: if the aspect ratio is equal to or smaller than a predefined threshold, acquire the context model index of the current block according to information of at least one of a triangular partition mode and a geometric partition mode of neighboring blocks adjacent to the current block, wherein the neighboring blocks include a left neighboring block and an upper neighboring block.
With reference to the sixth aspect or any implementation manner of the sixth aspect, in a fifth implementation manner of the sixth aspect, the predefined threshold is 2^n, where n is a positive integer.
With reference to the sixth aspect or any implementation manner of the sixth aspect, in a sixth implementation manner of the sixth aspect, the predefined threshold is 4.
A seventh aspect of the present invention provides a decoder, comprising: an acquisition module configured to acquire a code stream, acquire a context model index of a current block according to at least one of information of a triangular partition mode and a geometric partition mode of a neighboring block adjacent to the current block, wherein the neighboring block comprises a left neighboring block and an upper neighboring block, and acquire a value of a geometric partition flag of the current block from the code stream according to the context model index of the current block, wherein the geometric partition flag of the current block indicates whether the current block uses a geometric partition mode; and a decoding module configured to decode the current block according to the value of the geometric partition flag.
With reference to the seventh aspect, in a first implementation manner of the seventh aspect, the acquisition module is further configured to: when neither the left neighboring block nor the upper neighboring block uses a geometric partition mode or a triangular partition mode, acquire context model index 0 for the current block.
With reference to the seventh aspect or the first implementation manner of the seventh aspect, in a second implementation manner of the seventh aspect, the acquisition module is further configured to: when one of the left neighboring block and the upper neighboring block uses a geometric partition mode or a triangular partition mode, acquire context model index 1 for the current block.
With reference to the seventh aspect or any implementation manner of the seventh aspect, in a third implementation manner of the seventh aspect, the acquisition module is further configured to: when both the left neighboring block and the upper neighboring block use a geometric partition mode or a triangular partition mode, acquire context model index 2 for the current block.
With reference to the seventh aspect or any implementation manner of the seventh aspect, in a fourth implementation manner of the seventh aspect, one piece of information of the geometric partition mode of the left neighboring block indicates whether the left neighboring block uses a geometric partition mode, and one piece of information of the geometric partition mode of the upper neighboring block indicates whether the upper neighboring block uses a geometric partition mode.
With reference to the fourth implementation manner of the seventh aspect, in a fifth implementation manner of the seventh aspect, whether the geometric partition mode is used by the left neighboring block is determined according to a value of a geometric partition flag of the left neighboring block, or whether the geometric partition mode is used by the upper neighboring block is determined according to a value of a geometric partition flag of the upper neighboring block.
With reference to the fourth implementation manner of the seventh aspect, in a sixth implementation manner of the seventh aspect, whether the left neighboring block uses the geometric partition mode is determined according to whether the left neighboring block is allowed to use the geometric partition mode, or whether the upper neighboring block uses the geometric partition mode is determined according to whether the upper neighboring block is allowed to use the geometric partition mode.
With reference to the sixth implementation manner of the seventh aspect, in a seventh implementation manner of the seventh aspect, if a block size of the left neighboring block is less than 8×8, the left neighboring block is not allowed to use a geometric division mode, or if a block size of the upper neighboring block is less than 8×8, the upper neighboring block is not allowed to use a geometric division mode.
With reference to the seventh aspect or any implementation manner of the seventh aspect, in an eighth implementation manner of the seventh aspect, one piece of information of the triangular partition mode of the left neighboring block indicates whether the left neighboring block uses a triangular partition mode, and one piece of information of the triangular partition mode of the upper neighboring block indicates whether the upper neighboring block uses a triangular partition mode.
With reference to the eighth implementation manner of the seventh aspect, in a ninth implementation manner of the seventh aspect, whether the left neighboring block uses the triangular partition mode is determined according to a value of a triangular partition flag of the left neighboring block, or whether the upper neighboring block uses the triangular partition mode is determined according to a value of a triangular partition flag of the upper neighboring block.
With reference to the eighth implementation manner of the seventh aspect, in a tenth implementation manner of the seventh aspect, whether the left neighboring block uses the triangular partition mode is determined according to whether the left neighboring block is allowed to use the triangular partition mode, or whether the upper neighboring block uses the triangular partition mode is determined according to whether the upper neighboring block is allowed to use the triangular partition mode.
With reference to the tenth implementation manner of the seventh aspect, in an eleventh implementation manner of the seventh aspect, if the block size of the left neighboring block is smaller than 8×8, the left neighboring block is not allowed to use the triangular partition mode, or if the block size of the upper neighboring block is smaller than 8×8, the upper neighboring block is not allowed to use the triangular partition mode.
Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Drawings
Embodiments of the present invention will be described in detail below with reference to the attached drawing figures, wherein:
FIG. 1A is a block diagram of an exemplary video coding system for implementing an embodiment of the present invention;
FIG. 1B is a block diagram of another exemplary video coding system for implementing an embodiment of the present invention;
FIG. 2 is a block diagram of an exemplary video encoder for implementing an embodiment of the present invention;
FIG. 3 is a block diagram of an exemplary architecture of a video decoder for implementing an embodiment of the present invention;
FIG. 4 is a block diagram of an exemplary encoding or decoding device;
FIG. 5 is a block diagram of another exemplary encoding or decoding device;
FIG. 6 shows an example of neighboring blocks of the current block;
FIG. 7 illustrates a triangular partition mode and a geometric partition mode;
FIG. 8 is a block diagram of an example of context-adaptive binary arithmetic coding;
FIG. 9 is a flow chart of a decoding method implemented by a decoding device;
FIG. 10 is a flow chart of another decoding method implemented by a decoder;
FIG. 11 is a block diagram of a decoder;
FIG. 12 is a block diagram of another embodiment of a decoder.
In the following, like reference numerals refer to like or at least functionally equivalent features unless explicitly stated otherwise.
Detailed Description
In the following description, reference is made to the accompanying drawings, which form a part hereof and which show, by way of illustration, specific aspects in which embodiments of the invention may be practiced. It is to be understood that embodiments of the invention may be used in other aspects and may include structural or logical changes not depicted in the drawings. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.
It will be appreciated that the disclosure relating to a described method also applies to a corresponding device or system for performing the method, and vice versa. For example, if one or more particular method steps are described, the corresponding apparatus may comprise one or more units, e.g., functional units, for performing the described one or more method steps (e.g., one unit performing the one or more steps, or a plurality of units each performing one or more of the plurality of steps), even if such one or more units are not explicitly described or illustrated in the figures. Conversely, if a particular apparatus is described in terms of one or more units (e.g., functional units), the corresponding method may include one or more steps to perform the function of the one or more units (e.g., one step performing the function of the one or more units, or a plurality of steps each performing the function of one or more of the plurality of units), even if such one or more steps are not explicitly described or illustrated in the figures. Furthermore, it is to be understood that features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless specifically indicated otherwise.
Video coding generally refers to the processing of a sequence of images that make up a video or video sequence. In the field of video coding, the terms "frame" and "picture" may be used as synonyms. Video coding (or coding in general) comprises two parts, video encoding and video decoding. Video encoding is performed on the source side and typically includes processing (e.g., by compression) the original video images to reduce the amount of data needed to represent them (for more efficient storage and/or transmission). Video decoding is performed on the destination side and typically involves the inverse processing with respect to the encoder to reconstruct the video images. References in the embodiments to "coding" of video images (or of images in general) are to be understood as referring to "encoding" or "decoding" of video images or of the corresponding video sequences. The combination of the encoding part and the decoding part is also called a CODEC (coding and decoding).
In the case of lossless video coding, the original video images can be reconstructed, i.e., the reconstructed video images have the same quality as the original video images (assuming no transmission loss or other data loss during storage or transmission). In the case of lossy video coding, further compression is performed (e.g., by quantization) to reduce the amount of data representing the video images, which then cannot be fully reconstructed at the decoder, i.e., the quality of the reconstructed video images is lower than that of the original video images.
Several video coding standards belong to the group of "lossy hybrid video codecs" (i.e., they combine spatial and temporal prediction in the sample domain with 2D transform coding that applies quantization in the transform domain). Each picture of a video sequence is typically partitioned into a set of non-overlapping blocks, and coding is typically performed in units of blocks. In other words, at the encoder, video is typically processed (i.e., encoded) in units of blocks (video blocks): for example, a prediction block is generated by spatial (intra) prediction and/or temporal (inter) prediction; the prediction block is subtracted from the current block (the block currently being processed or to be processed) to obtain a residual block; and the residual block is transformed and quantized in the transform domain to reduce the amount of data to be transmitted (compression). At the decoder, the inverse of the encoder processing is applied to the encoded or compressed block to reconstruct the current block for representation. Furthermore, the encoder duplicates the decoder processing loop, so that both generate identical predictions (e.g., intra and inter predictions) and/or reconstructions for processing (i.e., coding) subsequent blocks.
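The block-based loop described above can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions: the transform stage is omitted, the prediction is a fixed flat block, and the function names `encode_block` and `reconstruct_block` are hypothetical and do not come from any standard or from the embodiments.

```python
def encode_block(block, prediction, step):
    """Encoder side: subtract the prediction, then quantize the residual.

    The transform stage is omitted here for brevity (assumption).
    """
    return [
        [round((s - p) / step) for s, p in zip(row_s, row_p)]
        for row_s, row_p in zip(block, prediction)
    ]


def reconstruct_block(levels, prediction, step):
    """Decoder side (also run inside the encoder): dequantize, add prediction."""
    return [
        [p + level * step for level, p in zip(row_l, row_p)]
        for row_l, row_p in zip(levels, prediction)
    ]


# Toy 2x2 block and a flat prediction (illustrative values only).
block = [[100, 102], [98, 101]]
prediction = [[100, 100], [100, 100]]
step = 2

levels = encode_block(block, prediction, step)
recon = reconstruct_block(levels, prediction, step)

# The reconstruction differs from the original by at most half a step.
max_error = max(
    abs(r - s) for row_r, row_s in zip(recon, block) for r, s in zip(row_r, row_s)
)
```

Because the encoder runs `reconstruct_block` on its own output, it predicts subsequent blocks from the same reconstructed samples that the decoder will have, which is the point of the duplicated processing loop.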
In the following embodiments, a video coding system 10, a video encoder 20, and a video decoder 30 are described with reference to FIGS. 1 to 3.
FIG. 1A shows an exemplary coding system 10, e.g., a video coding system 10 (or simply coding system 10), that may utilize the techniques of this disclosure. Video encoder 20 (or simply encoder 20) and video decoder 30 (or simply decoder 30) of video coding system 10 represent examples of devices that may be used to perform techniques according to various examples described herein.
As shown in FIG. 1A, coding system 10 includes a source device 12 configured to provide encoded image data 21, e.g., to a destination device 14 for decoding the encoded image data 13.
Source device 12 includes an encoder 20 and may additionally, i.e., optionally, include an image source 16, a preprocessor (or preprocessing unit) 18, e.g., an image preprocessor 18, and a communication interface or communication unit 22.
Image source 16 may include or be any type of image capturing device, such as a camera for capturing real world images, and/or any type of image generating device, such as a computer graphics processor for generating computer animated images, or any type of other device for capturing and/or providing real world images, computer generated images (e.g., screen content, virtual Reality (VR) images), and/or any combination thereof (e.g., augmented reality (augmented reality, AR) images). The image source may be any type of memory (memory/storage) that stores any of the above images.
To distinguish it from the output of the processing performed by the preprocessor (or preprocessing unit) 18, the image or image data 17 may also be referred to as the original image or original image data 17.
The preprocessor 18 is configured to receive the (original) image data 17 and to preprocess the image data 17 to obtain a preprocessed image 19 or preprocessed image data 19. The preprocessing performed by the preprocessor 18 may include trimming, color format conversion (e.g., from RGB to YCbCr), color correction, or denoising, and the like. It should be appreciated that the preprocessing unit 18 may be an optional component.
Video encoder 20 is operative to receive preprocessed image data 19 and provide encoded image data 21 (described further below with respect to fig. 2, etc.).
The communication interface 22 in the source device 12 may be configured to receive the encoded image data 21 and to transmit the encoded image data 21 (or any further processed version thereof) over the communication channel 13 to another device, such as the destination device 14, or to any other device, for storage or direct reconstruction.
Destination device 14 includes a decoder 30 (e.g., video decoder 30), and may additionally, i.e., optionally, include a communication interface or communication unit 28, a post-processor 32 (or post-processing unit 32), and a display device 34.
The communication interface 28 in the destination device 14 is used to receive the encoded image data 21 (or any other processed version) directly from the source device 12 or from any other source device such as a storage device, e.g., an encoded image data storage device, and to provide the encoded image data 21 to the decoder 30.
Communication interface 22 and communication interface 28 may be used to transmit or receive encoded image data 21 or encoded data 13 over a direct communication link (e.g., a direct wired or wireless connection) between source device 12 and destination device 14, or over any type of network (e.g., a wired or wireless network or any combination thereof, or any type of private and public networks), or any combination thereof.
For example, communication interface 22 may be used to encapsulate encoded image data 21 into a suitable format, such as a message, and/or process the encoded image data for transmission over a communication link or network using any type of transmission encoding or processing.
For example, a communication interface 28 corresponding to communication interface 22 may be used to receive the transmitted data and process the transmitted data using any type of corresponding transport decoding or processing and/or decapsulation to obtain encoded image data 21.
Communication interface 22 and communication interface 28 may each be configured as a unidirectional communication interface as indicated by the arrow of the corresponding communication channel 13 from source device 12 to destination device 14 in fig. 1A, or as a bi-directional communication interface, and may be used to send and receive messages or the like, to establish a connection, to acknowledge and exchange any other information related to the communication link and/or data transmission, such as encoded image data transmission, or the like.
Decoder 30 is for receiving encoded image data 21 and providing decoded image data 31 or decoded image 31 (described in detail below with respect to fig. 3 or 5).
The post-processor 32 of the destination device 14 is configured to post-process the decoded image data 31 (also referred to as reconstructed image data), e.g., the decoded image 31, to obtain post-processed image data 33, e.g., a post-processed image 33. For example, the post-processing performed by post-processing unit 32 may include color format conversion (e.g., from YCbCr to RGB), color correction, cropping, or resampling, or any other processing for preparing the decoded image data 31 for display by display device 34 or the like.
The display device 34 in the destination device 14 is for receiving the post-processed image data 33 to display an image to a user or viewer or the like. The display device 34 may be or include any type of display for representing a reconstructed image, such as an integrated or external display or monitor. For example, the display may include a liquid crystal display (liquid crystal display, LCD), an organic light emitting diode (organic light emitting diode, OLED) display, a plasma display, a projector, a micro LED display, a liquid crystal on silicon display (liquid crystal on silicon, LCoS), a digital light processor (digital light processor, DLP), or any type of other display.
Although FIG. 1A depicts source device 12 and destination device 14 as separate devices, device embodiments may also include both devices or both functions, namely source device 12 or the corresponding function and destination device 14 or the corresponding function. In such embodiments, the source device 12 or corresponding function and the destination device 14 or corresponding function may be implemented using the same hardware and/or software, by separate hardware and/or software, or by any combination thereof.
It will be apparent to the skilled artisan from the description that the existence and (exact) division of the different units or functions within the source device 12 and/or the destination device 14 shown in FIG. 1A may vary depending on the actual device and application.
The encoder 20 (e.g., video encoder 20) or the decoder 30 (e.g., video decoder 30), or both the encoder 20 and the decoder 30, may be implemented by processing circuitry as shown in fig. 1B, such as one or more microprocessors, digital signal processors (digital signal processor, DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (field-programmable gate array, FPGAs), discrete logic, hardware, video coding specific processors, or any combinations thereof. Encoder 20 may be implemented by processing circuitry 46 to embody the various modules discussed with respect to encoder 20 of fig. 2 and/or any other encoder system or subsystem described herein. The decoder 30 may be implemented by the processing circuit 46 to embody the various modules discussed with respect to the decoder 30 of fig. 3 and/or any other decoder system or subsystem described herein. The processing circuitry may be used to perform various operations that will be discussed later. When the techniques are implemented in part in software, as shown in fig. 5, the device may store instructions of the software in a suitable non-transitory computer-readable storage medium and may execute the instructions in hardware using one or more processors to perform the techniques of the present invention. Video encoder 20 or video decoder 30 may be integrated in a single device as part of a combined encoder/decoder (codec), as shown in fig. 1B.
Source device 12 and destination device 14 may comprise any of a variety of devices, including any type of handheld or stationary device, such as a notebook or laptop computer, a cell phone, a smart phone, a tablet computer, a video camera, a desktop computer, a set-top box, a television, a display device, a digital media player, a video game console, a video streaming device (e.g., a content service server or content distribution server), a broadcast receiver device, a broadcast transmitter device, etc., and may use no operating system or any type of operating system. In some cases, source device 12 and destination device 14 may be equipped with components for wireless communication. Thus, the source device 12 and the destination device 14 may be wireless communication devices.
In some cases, the video coding system 10 shown in FIG. 1A is merely exemplary, and the techniques provided herein may be applicable to video coding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between an encoding device and a decoding device. In other examples, the data is retrieved from local memory, streamed over a network, and so on. A video encoding device may encode data and store it in memory, and/or a video decoding device may retrieve data from memory and decode it. In some examples, encoding and decoding are performed by devices that do not communicate with each other, but simply encode data to memory and/or retrieve data from memory and decode it.
For ease of description, embodiments of the present invention are described herein with reference to High-Efficiency Video Coding (HEVC) or to the reference software of Versatile Video Coding (VVC), the next-generation video coding standard developed by the Joint Collaboration Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Motion Picture Experts Group (MPEG). Those of ordinary skill in the art will appreciate that embodiments of the present invention are not limited to HEVC or VVC.
Encoder and encoding method
Fig. 2 is a schematic block diagram of an exemplary video encoder 20 for implementing the techniques of this application. In the example of fig. 2, video encoder 20 includes an input 201 (or input interface 201), a residual calculation unit 204, a transform processing unit 206, a quantization unit 208, an inverse quantization unit 210 and inverse transform processing unit 212, a reconstruction unit 214, a loop filter unit 220, a decoded image buffer (decoded picture buffer, DPB) 230, a mode selection unit 260, an entropy encoding unit 270, and an output 272 (or output interface 272). The mode selection unit 260 may include an inter prediction unit 244, an intra prediction unit 254, and a partition unit 262. The inter prediction unit 244 may include a motion estimation unit and a motion compensation unit (not shown). The video encoder 20 shown in fig. 2 may also be referred to as a hybrid video encoder or a hybrid video codec based video encoder.
The residual calculation unit 204, the transform processing unit 206, the quantization unit 208, and the mode selection unit 260 constitute a forward signal path of the encoder 20; the inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the buffer 216, the loop filter 220, the decoded picture buffer (DPB) 230, the inter prediction unit 244, and the intra prediction unit 254 constitute a backward signal path of the video encoder 20, wherein the backward signal path of the video encoder 20 corresponds to the signal path of a decoder (see decoder 30 of FIG. 3). The inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the loop filter 220, the decoded picture buffer (DPB) 230, the inter prediction unit 244, and the intra prediction unit 254 also constitute a "built-in decoder" of the video encoder 20.
Image and image segmentation (image and block)
Encoder 20 may be configured to receive, e.g., via input 201, an image 17 (or image data 17), for example an image of a sequence of images forming a video or video sequence. The received image or image data may also be a preprocessed image 19 (or preprocessed image data 19). For simplicity, the following description refers to image 17. Image 17 may also be referred to as the current image or the image to be coded (in particular in video coding, to distinguish the current image from other images, e.g., previously encoded and/or decoded images of the same video sequence, i.e., the video sequence that also includes the current image).
A (digital) image is or may be regarded as a two-dimensional array or matrix of samples having intensity values. A sample in the array may also be referred to as a pixel (short for picture element). The number of samples in the horizontal and vertical directions (or axes) of the array or image defines the size and/or resolution of the image. To represent color, three color components are typically used, i.e., the image may be represented as or include three sample arrays. In the RGB format or color space, an image includes corresponding arrays of red, green, and blue samples. However, in video coding, each pixel is typically represented in a luminance and chrominance format or color space, e.g., YCbCr, which includes a luminance component denoted by Y (sometimes also by L) and two chrominance components denoted by Cb and Cr. The luminance (luma) component Y represents the brightness or gray-level intensity (e.g., as in a grayscale image), and the two chrominance (chroma) components Cb and Cr represent the chromaticity or color information components. Accordingly, an image in YCbCr format includes a luma sample array of luma sample values (Y) and two chroma sample arrays of chroma values (Cb and Cr). An image in RGB format may be converted to YCbCr format and vice versa, a process also known as color transformation or conversion. If the image is monochrome, the image may include only a luma sample array. Accordingly, the image may be, for example, an array of luma samples in monochrome format, or an array of luma samples and two corresponding arrays of chroma samples in 4:2:0, 4:2:2, or 4:4:4 color format.
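The color conversion and the sample-array layouts mentioned above can be illustrated as follows. This is a sketch under stated assumptions: the text does not specify a conversion matrix, so the full-range BT.601 coefficients (as used in JFIF) are assumed here, and the helper names `rgb_to_ycbcr` and `chroma_plane_size` are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB sample to 8-bit YCbCr.

    Assumption: full-range BT.601 coefficients; other matrices exist.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def clip(v):
        # Round to the nearest integer and clamp to the 8-bit range.
        return max(0, min(255, int(round(v))))

    return clip(y), clip(cb), clip(cr)


# Horizontal and vertical chroma subsampling factors for each color format.
CHROMA_SUBSAMPLING = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}


def chroma_plane_size(width, height, chroma_format):
    """Size of each chroma sample array for a given luma array size.

    Assumption: luma dimensions are even, so integer division is exact.
    """
    sub_x, sub_y = CHROMA_SUBSAMPLING[chroma_format]
    return width // sub_x, height // sub_y


white = rgb_to_ycbcr(255, 255, 255)
black = rgb_to_ycbcr(0, 0, 0)
size_420 = chroma_plane_size(1920, 1080, "4:2:0")
size_422 = chroma_plane_size(1920, 1080, "4:2:2")
```

Neutral colors such as white and black map to chroma values of 128, which is why a monochrome image can be carried by the luma array alone.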
In one embodiment, video encoder 20 may include an image segmentation unit (not shown in fig. 2) for segmenting image 17 into a plurality of (typically non-overlapping) image blocks 203. These blocks may also be referred to as root blocks, macro blocks (H.264/AVC) or coding tree blocks (coding tree block, CTB), or Coding Tree Units (CTU) (H.265/HEVC and VVC). The image segmentation unit may be used to use the same block size for all images in the video sequence and to use a corresponding grid defining the block size, or to change the block size between images or image subsets or groups and to segment each image into corresponding blocks.
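A fixed-grid segmentation of an image into non-overlapping blocks can be sketched as follows. The block size of 128 and the function name `ctu_grid` are illustrative assumptions; blocks at the right and bottom edges are simply cropped here, which is a simplification rather than the boundary handling of any particular standard.

```python
def ctu_grid(width, height, ctu_size=128):
    """Yield (x, y, w, h) for each block of a fixed grid, in raster order.

    Blocks at the right and bottom image edges are cropped (simplification).
    """
    for y in range(0, height, ctu_size):
        for x in range(0, width, ctu_size):
            yield x, y, min(ctu_size, width - x), min(ctu_size, height - y)


blocks = list(ctu_grid(1920, 1080, ctu_size=128))
covered_area = sum(w * h for _, _, w, h in blocks)
```

For a 1920×1080 image the grid has 15 columns and 9 rows, and the last row is only 56 samples high, which shows why edge blocks need special treatment.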
In further embodiments, the video encoder may be configured to receive a block 203 of image 17 directly, e.g., one, several, or all of the blocks forming image 17. The image block 203 may also be referred to as the current image block or the image block to be coded.
As with image 17, image block 203 is or may be regarded as a two-dimensional array or matrix of samples having intensity values (sample values), although of smaller dimensions than image 17. In other words, block 203 may include, e.g., one sample array (e.g., a luma array in the case of a monochrome image 17, or a luma or chroma array in the case of a color image) or three sample arrays (e.g., a luma array and two chroma arrays in the case of a color image 17) or any other number and/or type of arrays depending on the color format applied. The number of samples of the block 203 in the horizontal and vertical directions (or axes) defines the size of the block 203. Accordingly, a block may be, for example, an M×N (M columns by N rows) array of samples or an M×N array of transform coefficients.
In one embodiment, video encoder 20 shown in fig. 2 is used to encode image 17 on a block-by-block basis, e.g., encoding and prediction is performed for each block 203.
The embodiment of video encoder 20 shown in fig. 2 may also be used to segment and/or encode images using slices (also referred to as video slices), wherein the images may be segmented or encoded using one or more slices (typically non-overlapping) and each slice may include one or more blocks (e.g., CTUs).
The embodiment of video encoder 20 shown in FIG. 2 may also be used to segment and/or encode images using tile groups (also referred to as video tile groups) and/or tiles (also referred to as video tiles), wherein an image may be segmented into or encoded using one or more tile groups (typically non-overlapping), each tile group may include, e.g., one or more blocks (e.g., CTUs) or one or more tiles, and each tile may, e.g., be rectangular and may include one or more blocks (e.g., CTUs), e.g., complete or partial blocks.
Residual calculation
The residual calculation unit 204 is configured to calculate a residual block 205 (also referred to as residual 205) from the image block 203 and the prediction block 265 (the prediction block 265 is described in detail later): for example, sample values of the prediction block 265 are subtracted from sample values of the image block 203 on a sample-by-sample (pixel-by-pixel) basis, resulting in a residual block 205 in the sample domain.
Transformation
The transform processing unit 206 is configured to perform discrete cosine transform (discrete cosine transform, DCT) or discrete sine transform (discrete sine transform, DST) on the sample values of the residual block 205, and the like, to obtain transform coefficients 207 in the transform domain. The transform coefficients 207 may also be referred to as transform residual coefficients, representing the residual block 205 in the transform domain.
The transform processing unit 206 may be used to apply integer approximations of DCT/DST, such as the transforms specified for H.265/HEVC. Compared to an orthogonal DCT transform, such integer approximations are typically scaled by a certain factor. To preserve the norm of a residual block that is processed by the forward and inverse transforms, additional scaling factors are applied as part of the transform process. The scaling factors are typically chosen based on certain constraints, such as being a power of two so that they can be implemented as shift operations, the bit depth of the transform coefficients, and the trade-off between accuracy and implementation cost. For example, a specific scaling factor is specified for the inverse transform, e.g., by the inverse transform processing unit 212 (and for the corresponding inverse transform, e.g., by the inverse transform processing unit 312 at the video decoder 30), and a corresponding scaling factor may accordingly be specified for the forward transform, e.g., by the transform processing unit 206 in the encoder 20.
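The norm-preservation property that the scaling factors are meant to maintain can be seen with a floating-point orthonormal DCT-II. This sketch is not the integer transform of H.265/HEVC; it is the orthogonal reference that such integer transforms approximate, and the function names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis, returned as n rows of n values."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
        )
    return rows


def forward_dct(samples):
    """Project the samples onto each basis row."""
    basis = dct_matrix(len(samples))
    return [sum(b * s for b, s in zip(row, samples)) for row in basis]


def inverse_dct(coeffs):
    """The basis is orthonormal, so the inverse is its transpose."""
    n = len(coeffs)
    basis = dct_matrix(n)
    return [sum(basis[k][i] * coeffs[k] for k in range(n)) for i in range(n)]


residual = [4, -2, 3, 1]
coeffs = forward_dct(residual)
restored = inverse_dct(coeffs)

energy_in = sum(s * s for s in residual)
energy_out = sum(c * c for c in coeffs)
```

An integer approximation multiplies the basis by a large constant and rounds it, so the forward and inverse stages must divide that constant back out (typically by shifts) to keep `energy_out` equal to `energy_in`.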
Embodiments of video encoder 20 (corresponding to transform processing unit 206) may be configured to output transform parameters (e.g., one or more types of transforms) either directly or encoded or compressed by entropy encoding unit 270, e.g., such that video decoder 30 may receive and use the transform parameters for decoding.
Quantization
The quantization unit 208 is configured to quantize the transform coefficient 207 by, for example, scalar quantization or vector quantization, resulting in a quantized coefficient 209. The quantized coefficients 209 may also be referred to as quantized transform coefficients 209 or quantized residual coefficients 209.
The quantization process may reduce the bit depth associated with some or all of the transform coefficients 207. For example, an n-bit transform coefficient may be rounded down to an m-bit transform coefficient during quantization, where n is greater than m. The degree of quantization may be modified by adjusting a quantization parameter (QP). For example, for scalar quantization, different scaling may be applied to achieve finer or coarser quantization. A smaller quantization step size corresponds to finer quantization, while a larger quantization step size corresponds to coarser quantization. The applicable quantization step size may be indicated by the quantization parameter (QP). For example, the quantization parameter may be an index into a predefined set of applicable quantization step sizes. For example, small quantization parameters may correspond to fine quantization (small quantization step sizes) and large quantization parameters to coarse quantization (large quantization step sizes), or vice versa. Quantization may involve division by a quantization step size, and the corresponding dequantization, e.g., performed by inverse quantization unit 210, may involve multiplication by the quantization step size. Embodiments according to some standards, such as HEVC, may use a quantization parameter to determine the quantization step size. In general, the quantization step size may be calculated from the quantization parameter using a fixed-point approximation of an equation that includes division. Additional scaling factors may be introduced for quantization and dequantization to restore the norm of the residual block, which may have been modified by the scaling used in the fixed-point approximation of the equation relating quantization step size and quantization parameter. In one exemplary implementation, the scaling of the inverse transform and of the dequantization may be combined.
Alternatively, a custom quantization table may be used and indicated (signal) to the decoder by the encoder by means of a code stream or the like. Quantization is a lossy operation, with the loss increasing with increasing quantization step size.
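The relation between quantization parameter, step size, and loss can be sketched with scalar quantization. The formula below, in which the step size doubles for every increase of 6 in QP, is the floating-point relation commonly cited for H.264/AVC and HEVC; actual codecs use fixed-point tables and shifts instead, and the function names here are hypothetical.

```python
def quantization_step(qp):
    """Step size for a given QP (floating-point form, assumed for illustration)."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeff, qp):
    """Divide by the step size and round to the nearest level."""
    step = quantization_step(qp)
    sign = -1 if coeff < 0 else 1
    return sign * int(abs(coeff) / step + 0.5)


def dequantize(level, qp):
    """Multiply the level by the same step size."""
    return level * quantization_step(qp)


coeff = 100
level = quantize(coeff, qp=22)          # step size is 8 at QP 22
restored = dequantize(level, qp=22)
error = abs(restored - coeff)
```

The reconstruction error is bounded by half the step size, so a larger QP allows a larger error in exchange for smaller levels to entropy-code.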
In an embodiment, video encoder 20 (corresponding to quantization unit 208) may be used to output quantization parameters (quantization parameter, QP), e.g., directly or encoded by entropy encoding unit 270, e.g., so that video decoder 30 may receive and decode using the quantization parameters.
Inverse quantization
The inverse quantization unit 210 is configured to apply the inverse of the quantization performed by quantization unit 208 to the quantized coefficients to obtain dequantized coefficients 211, for example by applying the inverse of the quantization scheme applied by quantization unit 208, based on or using the same quantization step size as quantization unit 208. The dequantized coefficients 211 may also be referred to as dequantized residual coefficients 211 and correspond to the transform coefficients 207, although, due to the loss introduced by quantization, they are typically not identical to the transform coefficients.
Inverse transformation
The inverse transform processing unit 212 is configured to perform an inverse transform of the transform performed by the transform processing unit 206, for example, an inverse discrete cosine transform (discrete cosine transform, DCT) or an inverse discrete sine transform (discrete sine transform, DST), to obtain a reconstructed residual block 213 (or a corresponding dequantized coefficient 213) in the sample domain. The reconstructed residual block 213 may also be referred to as a transform block 213.
Reconstruction
A reconstruction unit 214 (e.g., an adder or summer 214) is used to add the transform block 213 (i.e., the reconstructed residual block 213) to the prediction block 265 to obtain the reconstructed block 215 in the sample domain, e.g., to add the sample values of the reconstructed residual block 213 and the sample values of the prediction block 265.
Filtering
The loop filter unit 220 (or simply "loop filter" 220) is configured to filter the reconstructed block 215 to obtain a filtered block 221 or, in general, to filter reconstructed samples to obtain filtered samples. For example, the loop filter unit is used to smooth pixel transitions or otherwise improve video quality. Loop filter unit 220 may include one or more loop filters, such as a deblocking filter, a sample-adaptive offset (SAO) filter, or one or more other filters, such as a bilateral filter, an adaptive loop filter (ALF), a sharpening or smoothing filter, or a collaborative filter, or any combination thereof. Although loop filter unit 220 is shown in FIG. 2 as an in-loop filter, in other configurations loop filter unit 220 may be implemented as a post-loop filter. The filtered block 221 may also be referred to as a filtered reconstructed block 221.
Embodiments of video encoder 20, and in particular loop filter unit 220, may be configured to output loop filter parameters (e.g., sample adaptive offset information), e.g., directly or encoded by entropy encoding unit 270, so that, for example, decoder 30 may receive and apply the same loop filter parameters or corresponding loop filters for decoding.
Decoded image buffer
The decoded picture buffer (DPB) 230 may be a memory that stores reference images, or reference image data in general, for use by video encoder 20 in encoding video data. DPB 230 may be formed of any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. DPB 230 may be used to store one or more filtered blocks 221. DPB 230 may also be used to store other previously filtered blocks, e.g., previously reconstructed and filtered blocks 221, of the same current image or of different images, e.g., previously reconstructed images, and may provide complete previously reconstructed, i.e., decoded, images (and corresponding reference blocks and samples) and/or a partially reconstructed current image (and corresponding reference blocks and samples), for example for inter prediction. DPB 230 may also be used to store one or more unfiltered reconstructed blocks 215, or unfiltered reconstructed samples in general, e.g., reconstructed blocks 215 that are not filtered by the loop filter unit 220, or reconstructed blocks or samples that have not undergone any other processing.
Mode selection (segmentation and prediction)
The mode selection unit 260 comprises a segmentation unit 262, an inter prediction unit 244 and an intra prediction unit 254 for receiving or obtaining raw image data, e.g. the raw block 203 (the current block 203 of the current image 17), and reconstructed image data, e.g. filtered and/or unfiltered reconstructed samples or reconstructed blocks of the same (current) image and/or one or more previously decoded images, from the decoded image buffer 230 or other buffers (e.g. line buffers, not shown in the figure). The reconstructed image data is used as reference image data required for prediction such as inter prediction or intra prediction to obtain a prediction block 265 or a prediction value 265.
The mode selection unit 260 may be configured to determine or select a partitioning for the current block (including no partitioning) and a prediction mode (e.g., an intra or inter prediction mode), and to generate a corresponding prediction block 265, which is used to compute the residual block 205 and to reconstruct the reconstructed block 215.
In one embodiment, the mode selection unit 260 may be configured to select the partitioning and the prediction mode (e.g., from among the prediction modes supported by or available to the mode selection unit 260) that provide the best match, i.e., the minimum residual (a minimum residual means better compression for transmission or storage), or the minimum signaling overhead (a minimum signaling overhead means better compression for transmission or storage), or that consider or balance both. The mode selection unit 260 may be used to determine the partitioning and prediction mode based on rate distortion optimization (RDO), i.e., to select the prediction mode that provides the minimum rate distortion cost. Terms such as "best," "minimum," and "optimal" in this context do not necessarily refer to an overall "best," "minimum," or "optimal," but may also refer to the fulfillment of a termination or selection criterion, e.g., a value exceeding or falling below a threshold or another constraint, which may lead to a "sub-optimal" selection but reduces complexity and processing time.
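By way of illustration only, the rate distortion based selection described above can be sketched as minimizing a Lagrangian cost J = D + λ·R over the candidate modes. The class `Candidate`, the mode labels, the numeric values, and the multiplier `lam` are hypothetical and are not taken from this description; a real encoder derives distortion and rate from actual prediction and entropy coding.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str          # hypothetical mode label
    distortion: float  # e.g., sum of squared errors against the original block
    rate_bits: float   # estimated number of bits needed to signal the mode


def rd_cost(c: Candidate, lam: float) -> float:
    """Lagrangian rate distortion cost J = D + lambda * R."""
    return c.distortion + lam * c.rate_bits


def select_mode(candidates, lam: float) -> Candidate:
    """Return the candidate with the minimum rate distortion cost."""
    return min(candidates, key=lambda c: rd_cost(c, lam))


# Illustrative numbers only.
candidates = [
    Candidate("intra_planar", distortion=1200.0, rate_bits=18.0),
    Candidate("inter_merge", distortion=900.0, rate_bits=40.0),
    Candidate("inter_amvp", distortion=700.0, rate_bits=95.0),
]
best = select_mode(candidates, lam=10.0)
print(best.name)
```

A larger `lam` favors modes that are cheap to signal, while a smaller `lam` favors modes with low distortion.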
In other words, the partitioning unit 262 may be configured to partition the block 203 into smaller partitioned blocks or sub-blocks (again forming a block), e.g., iteratively using a quad-tree (QT) partition, a binary-tree (BT) partition, or a ternary-tree (TT) partition, or any combination thereof, and, for example, predict each partitioned block or sub-block, wherein the mode selection includes selecting a tree structure of the partitioned block 203 and applying a prediction mode to each partitioned block or sub-block.
The partitioning (e.g., performed by the partitioning unit 262) and prediction processing (e.g., performed by the inter-prediction unit 244 and the intra-prediction unit 254) performed by the exemplary video encoder 20 will be described in detail below.
Segmentation
The segmentation unit 262 may segment (or divide) the current block 203 into smaller segmentation blocks, such as smaller blocks of square or rectangular size. These smaller blocks (which may also be referred to as sub-blocks) may be further partitioned into even smaller partitioned blocks. This approach is also referred to as tree segmentation or hierarchical tree segmentation, wherein a root block, e.g., at root tree level 0 (hierarchy level 0, depth 0), may be recursively segmented, e.g., into two or more blocks of the next lower tree level, e.g., nodes at tree level 1 (hierarchy level 1, depth 1). These blocks may again be partitioned into two or more blocks of the next lower level, e.g., tree level 2 (hierarchy level 2, depth 2), and so on, until the partitioning is terminated because a termination criterion is met, e.g., the maximum tree depth or the minimum block size is reached. The blocks that are not further partitioned are also referred to as leaf blocks or leaf nodes of the tree. A tree divided into two divided blocks is called a binary tree (BT), a tree divided into three divided blocks is called a ternary tree (TT), and a tree divided into four divided blocks is called a quadtree (QT).
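The recursive tree segmentation described above may be sketched, for the quadtree case only, as follows. This is a minimal illustration under stated assumptions: the function names, the split decision callback `should_split`, and the block sizes are hypothetical, and an actual encoder would typically base the split decision on a rate distortion cost.

```python
def quadtree_leaves(x, y, size, min_size, should_split, depth=0, max_depth=4):
    """Recursively split a square block and return its leaf blocks.

    Each leaf is a tuple (x, y, size, depth). Splitting stops when the
    minimum block size or the maximum tree depth is reached, or when the
    caller-supplied decision function declines to split.
    """
    if size <= min_size or depth >= max_depth or not should_split(x, y, size, depth):
        return [(x, y, size, depth)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_leaves(x + dx, y + dy, half, min_size,
                                      should_split, depth + 1, max_depth)
    return leaves


def split_top_left(x, y, size, depth):
    """Hypothetical decision rule: only split blocks anchored at (0, 0)."""
    return x == 0 and y == 0


# A 64x64 root block with a 16x16 minimum block size.
leaves = quadtree_leaves(0, 0, 64, 16, split_top_left)
for leaf in leaves:
    print(leaf)
```

The leaves always tile the root block exactly, so the sum of the leaf areas equals the area of the root block.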
As previously mentioned, the term "block" as used herein may be a portion of an image, in particular a square or rectangular portion. For example, in connection with HEVC and VVC, a block may be or correspond to a Coding Tree Unit (CTU), a Coding Unit (CU), a Prediction Unit (PU) and a Transform Unit (TU), and/or to a corresponding block, e.g., a coding tree block (coding tree block, CTB), a Coding Block (CB), a Transform Block (TB) or a Prediction Block (PB).
For example, a coding tree unit (CTU) may be or include a CTB of luma samples and two corresponding CTBs of chroma samples of an image that has three sample arrays, or a CTB of samples of a monochrome image or of an image that is coded using three separate color planes and syntax structures used to code the samples. Accordingly, a coding tree block (CTB) may be an N×N block of samples, where N may be set to a value such that a component is divided into CTBs; this division is a partitioning. A coding unit (CU) may be or include a coding block of luma samples and two corresponding coding blocks of chroma samples of an image that has three sample arrays, or a coding block of samples of a monochrome image or of an image that is coded using three separate color planes and syntax structures used to code the samples. Accordingly, a coding block (CB) may be an M×N block of samples, where M and N may be set to values such that a CTB is divided into coding blocks; this division is a partitioning.
In an embodiment, a Coding Tree Unit (CTU) may be divided into a plurality of CUs by a quadtree structure denoted as a coding tree, for example, according to HEVC. Whether to code an image region using inter (temporal) prediction or intra (spatial) prediction is determined in units of CUs. Each CU may be further divided into one, two, or four PUs according to PU partition types. The same prediction process is applied within one PU and related information is transmitted to the decoder in PU units. After the residual block is obtained by applying the prediction process according to the PU partition type, the CU may be partitioned into Transform Units (TUs) according to another quadtree structure similar to the coding tree for the CU.
In an embodiment, the coding blocks are partitioned using combined quad-tree and binary tree (QTBT) partitioning, e.g., according to the latest video coding standard currently under development, which is referred to as Versatile Video Coding (VVC). In the QTBT block structure, a CU may be square or rectangular. For example, a coding tree unit (CTU) is first partitioned by a quadtree structure. The quadtree leaf nodes are further partitioned by a binary tree or ternary tree structure. The leaf nodes of the partitioning tree are called coding units (CUs), and this segmentation is used for prediction and transform processing without any further partitioning. That is, in the QTBT coding block structure, the CU, PU, and TU have the same block size. In addition, multiple partitioning types, such as ternary tree partitioning, may be used in combination with the QTBT block structure.
In one example, mode selection unit 260 of video encoder 20 may be used to perform any combination of the segmentation techniques described herein.
As described above, video encoder 20 is configured to determine or select a best or optimal prediction mode from a set of (e.g., predetermined) prediction modes. For example, the set of prediction modes may include intra prediction modes and/or inter prediction modes.
Intra prediction
The set of intra prediction modes may include 35 different intra prediction modes, for example, a non-directional mode like a DC (or mean) mode and a planar mode, or a directional mode as defined by HEVC, or 67 different intra prediction modes, for example, a non-directional mode like a DC (or mean) mode and a planar mode, or a directional mode as defined in VVC.
The intra prediction unit 254 is configured to generate an intra prediction block 265 using reconstructed samples of neighboring blocks of the same current image according to intra prediction modes in the intra prediction mode set.
The intra-prediction unit 254 (or typically the mode selection unit 260) is also operable to output intra-prediction parameters (or typically information representing a selected intra-prediction mode of a block) in the form of syntax elements 266 to the entropy encoding unit 270 for inclusion into the encoded image data 21, e.g., so that the video decoder 30 may receive and use the prediction parameters for decoding.
Inter prediction
The set of (possible) inter prediction modes is determined from the available reference pictures (i.e., previously at least partially decoded pictures stored in DPB 230, for example) and other inter prediction parameters, such as whether to use the entire reference picture or only a portion of the reference picture (e.g., a search window region near the region of the current block) to search for a best matching reference block, and/or such as whether to apply pixel interpolation (e.g., half-pixel and/or quarter-pixel interpolation).
In addition to the above prediction modes, a skip mode and/or a direct mode may be applied.
The inter prediction unit 244 may include a motion estimation (motion estimation, ME) unit and a motion compensation (motion compensation, MC) unit (both not shown in fig. 2). The motion estimation unit may be used to receive or obtain an image block 203 (current image block 203 of current image 17) and a decoded image 231, or at least one or more previously reconstructed blocks, e.g. reconstructed blocks of one or more other/different previously decoded images 231, for motion estimation. For example, the video sequence may include the current image and the previous decoded image 231, or in other words, the current image and the previous decoded image 231 may be part of or form a sequence of images of the video sequence.
For example, encoder 20 may be configured to select a reference block from a plurality of reference blocks of the same or different ones of a plurality of other images, and provide the reference image (or reference image index) and/or an offset (spatial offset) between a position (x-coordinate, y-coordinate) of the reference block and a position of the current block as inter prediction parameters to the motion estimation unit. This offset is also called Motion Vector (MV).
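As a rough sketch of how such a spatial offset could be found, the following performs an exhaustive block matching search that minimizes the sum of absolute differences (SAD). The full search strategy, the SAD criterion, the function names, and the synthetic sample data are assumptions made for illustration; this description does not prescribe a particular motion estimation method.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between an n x n block of the current
    image at (bx, by) and an n x n block of the reference image at (rx, ry)."""
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def full_search(cur, ref, bx, by, n, search_range):
    """Return (dx, dy, cost) of the best matching reference block within
    +/- search_range samples of the current block position."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall outside the reference image.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]


# Synthetic data: the current image equals the reference image shifted
# by 2 samples horizontally and 1 sample vertically.
ref = [[(7 * x + 13 * y) % 31 for x in range(8)] for y in range(8)]
cur = [[(7 * (x + 2) + 13 * (y + 1)) % 31 for x in range(8)] for y in range(8)]

mv_x, mv_y, cost = full_search(cur, ref, bx=2, by=2, n=2, search_range=2)
print(mv_x, mv_y, cost)
```

The returned pair (mv_x, mv_y) plays the role of the motion vector, i.e., the offset between the position of the reference block and the position of the current block.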
The motion compensation unit is configured to obtain (e.g., receive) inter-prediction parameters, and perform inter-prediction according to or using the inter-prediction parameters to obtain the inter-prediction block 265. The motion compensation performed by the motion compensation unit may involve extracting or generating a prediction block from a motion/block vector determined by motion estimation and may also involve interpolation of sub-pixel precision. Interpolation filtering may generate samples of other pixels from samples of known pixels, potentially increasing the number of candidate prediction blocks available for coding an image block. Upon receiving the motion vector of the PU of the current image block, the motion compensation unit may locate the prediction block to which the motion vector points in one of the reference image lists.
The motion compensation unit may also generate syntax elements associated with the blocks and video slices for use by video decoder 30 in decoding the image blocks of the video slices. In addition to or instead of slices and corresponding syntax elements, tile groups and/or tiles and corresponding syntax elements may be received and/or used.
Entropy coding
The entropy encoding unit 270 is configured to apply an entropy encoding algorithm or scheme (e.g., a variable length coding (VLC) scheme, a context adaptive VLC (CAVLC) scheme, an arithmetic coding scheme, a binarization algorithm, context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding method or technique), or to bypass entropy encoding (no compression), on the quantization coefficients 209, inter-prediction parameters, intra-prediction parameters, loop filter parameters, and/or other syntax elements, resulting in encoded image data 21 that can be output, e.g., in the form of an encoded bitstream 21, through an output 272, so that the video decoder 30 or the like can receive and use the parameters for decoding. The encoded bitstream 21 may be transmitted to video decoder 30 or stored in memory for later transmission or retrieval by video decoder 30.
Other structural variations of video encoder 20 may be used to encode the video stream. For example, the non-transform based encoder 20 may directly quantize the residual signal of certain blocks or frames without the transform processing unit 206. In another implementation, encoder 20 may include a quantization unit 208 and an inverse quantization unit 210 combined into a single unit.
Decoder and decoding method
Fig. 3 shows an example of a video decoder 30 for implementing the techniques of the present application. Video decoder 30 is operative to receive encoded image data 21 (e.g., encoded bitstream 21) encoded, for example, by encoder 20, and to obtain a decoded image 331. The encoded image data or bitstream includes information for decoding the encoded image data, such as data representing image blocks of an encoded video slice (and/or tile groups or tiles) and associated syntax elements.
In the example of fig. 3, decoder 30 includes entropy decoding unit 304, inverse quantization unit 310, inverse transform processing unit 312, reconstruction unit 314 (e.g., summer 314), loop filter 320, decoded picture buffer (DPB) 330, mode application unit 360, inter prediction unit 344, and intra prediction unit 354. The inter prediction unit 344 may be or include a motion compensation unit. In some examples, video decoder 30 may perform a decoding process that is generally reciprocal to the encoding process described with respect to video encoder 20 of fig. 2.
As explained for encoder 20, inverse quantization unit 210, inverse transform processing unit 212, reconstruction unit 214, loop filter 220, decoded picture buffer (DPB) 230, inter prediction unit 244, and intra prediction unit 254 are also referred to as forming the "built-in decoder" of video encoder 20. Accordingly, inverse quantization unit 310 may be functionally identical to inverse quantization unit 210, inverse transform processing unit 312 may be functionally identical to inverse transform processing unit 212, reconstruction unit 314 may be functionally identical to reconstruction unit 214, loop filter 320 may be functionally identical to loop filter 220, and decoded picture buffer 330 may be functionally identical to decoded picture buffer 230. Accordingly, the explanation of the respective units and functions of video encoder 20 applies correspondingly to the respective units and functions of video decoder 30.
Entropy decoding
The entropy decoding unit 304 is used to parse the bitstream 21 (or generally the encoded image data 21) and perform entropy decoding on the encoded image data 21, resulting in quantization coefficients 309 and/or decoded coding parameters (not shown in fig. 3), such as any or all of inter-prediction parameters (e.g., reference image indices and motion vectors), intra-prediction parameters (e.g., intra-prediction modes or indices), transform parameters, quantization parameters, loop filter parameters, and/or other syntax elements. Entropy decoding unit 304 may be used to apply a decoding algorithm or scheme corresponding to the encoding scheme described for entropy encoding unit 270 of encoder 20. Entropy decoding unit 304 may also be used to provide inter-prediction parameters, intra-prediction parameters, and/or other syntax elements to mode application unit 360, as well as other parameters to other units of decoder 30. Video decoder 30 may receive syntax elements at the video slice level and/or the video block level. In addition to or instead of slices and corresponding syntax elements, tile groups and/or tiles and corresponding syntax elements may be received or used.
Inverse quantization
The inverse quantization unit 310 may be configured to receive quantization parameters (QP) (or information related to inverse quantization in general) and quantization coefficients from the encoded image data 21 (e.g., parsed and/or decoded by the entropy decoding unit 304), and to inverse quantize the decoded quantization coefficients 309 according to the quantization parameters to obtain dequantized coefficients 311, which may also be referred to as transform coefficients 311. The dequantization process may include using the quantization parameter determined by video encoder 20 for each video block in a video slice (or tile or tile group) to determine a degree of quantization and, likewise, the degree of inverse quantization that needs to be applied.
Inverse transformation
The inverse transform processing unit 312 may be configured to receive the dequantized coefficients 311, also referred to as transform coefficients 311, and apply a transform to the dequantized coefficients 311 to obtain the reconstructed residual block 313 of the sample domain. The reconstructed residual block 313 may also be referred to as a transform block 313. The transform may be an inverse transform, such as an inverse DCT, an inverse DST, an inverse integer transform, or a conceptually similar inverse transform process. The inverse transform processing unit 312 may also be used to receive transform parameters or corresponding information from the encoded image data 21 (e.g., parsed and/or decoded by the entropy decoding unit 304) to determine the transform to be applied to the dequantized coefficients 311.
Reconstruction of
A reconstruction unit 314 (e.g., a summer 314) may be used to add the reconstructed residual block 313 to the prediction block 365, resulting in a reconstructed block 315 of the sample domain, e.g., adding the sample values of the reconstructed residual block 313 and the sample values of the prediction block 365.
Filtering
The loop filter unit 320 is used to filter the reconstructed block 315 (in or after the decoding loop) to obtain a filtered block 321, e.g., to smooth pixel transitions or otherwise improve video quality. Loop filter unit 320 may include one or more loop filters, such as a deblocking filter, a sample-adaptive offset (SAO) filter, or one or more other filters, such as a bilateral filter, an adaptive loop filter (ALF), a sharpening or smoothing filter, or a collaborative filter, or any combination thereof. Although loop filter unit 320 is shown in fig. 3 as an in-loop filter, in other configurations loop filter unit 320 may be implemented as a post-loop filter.
Decoding image buffer
The decoded video blocks 321 of an image are then stored in the decoded picture buffer 330, which stores the decoded images 331 as reference images for subsequent motion compensation of other images and/or for output or display, respectively.
Decoder 30 is operative to output the decoded image 331, e.g., via an output, for presentation to or viewing by a user.
Prediction
The inter prediction unit 344 may function identically to the inter prediction unit 244 (in particular the motion compensation unit), the intra prediction unit 354 may function identically to the intra prediction unit 254, and decides division or segmentation and performs prediction according to segmentation and/or prediction parameters or corresponding information received from the encoded image data 21 (e.g., parsed and/or decoded by the entropy decoding unit 304, etc.). The mode application unit 360 may be configured to perform prediction (intra or inter prediction) of each block based on the reconstructed image, the block, or the corresponding samples (filtered or unfiltered), resulting in a predicted block 365.
When a video slice is coded as an intra-coded (I) slice, the intra-prediction unit 354 of the mode application unit 360 is configured to generate a prediction block 365 for an image block of the current video slice based on a signaled intra-prediction mode and data from previously decoded blocks of the current image. When a video image is coded as an inter-coded (e.g., B or P) slice, the inter prediction unit 344 (e.g., a motion compensation unit) of the mode application unit 360 is used to generate a prediction block 365 for a video block of the current video slice from the motion vectors and other syntax elements received from the entropy decoding unit 304. For inter prediction, the prediction blocks may be generated from one of the reference images in one of the reference image lists. Video decoder 30 may construct the reference frame lists, list 0 and list 1, from the reference images stored in DPB 330 using a default construction technique. In addition to or as an alternative to slices (e.g., video slices), the same or a similar process may be applied to embodiments using tile groups (e.g., video tile groups) and/or tiles (e.g., video tiles), e.g., video may be coded using I, P, or B tile groups and/or tiles.
The mode application unit 360 is used to determine prediction information for a video block of the current video slice by parsing the motion vectors or related information and other syntax elements, and to use the prediction information to generate a prediction block for the current video block being decoded. For example, mode application unit 360 uses some of the received syntax elements to determine a prediction mode (e.g., intra prediction or inter prediction) used to code the video blocks of the video slice, an inter prediction slice type (e.g., B slice, P slice, or GPB slice), construction information for one or more reference image lists for the slice, motion vectors for each inter-coded video block of the slice, an inter prediction status for each inter-coded video block of the slice, and other information needed to decode the video blocks in the current video slice. In addition to or as an alternative to slices (e.g., video slices), the same or a similar process may be applied to embodiments using tile groups (e.g., video tile groups) and/or tiles (e.g., video tiles), e.g., video may be coded using I, P, or B tile groups and/or tiles.
The embodiment of video decoder 30 shown in fig. 3 may be used to segment and/or decode an image using slices (also referred to as video slices), wherein the image may be segmented or decoded using one or more slices (typically non-overlapping) and each slice may include one or more blocks (e.g., CTUs).
The embodiment of video decoder 30 shown in fig. 3 may be used to segment and/or decode an image using tile groups (also referred to as video tile groups) and/or tiles (also referred to as video tiles), wherein the image may be segmented into or decoded using one or more (typically non-overlapping) tile groups, each tile group may include one or more blocks (e.g., CTUs) or one or more tiles, and each tile may, for example, be rectangular and may include one or more blocks (e.g., CTUs), e.g., complete or partial blocks.
Other variations of video decoder 30 may be used to decode encoded image data 21. For example, decoder 30 may generate the output video stream without loop filter unit 320. For example, the non-transform based decoder 30 may directly dequantize the residual signal of certain blocks or frames without the inverse transform processing unit 312. In another implementation, in video decoder 30, inverse quantization unit 310 and inverse transform processing unit 312 may be combined into one unit.
It should be understood that the processing results of the current step may be further processed in the encoder 20 and the decoder 30 and then output to the next step. For example, after interpolation filtering, motion vector derivation, or loop filtering, the processing result of the interpolation filtering, motion vector derivation, or loop filtering may be subjected to further operations, such as clipping (clip) or shift (shift) operations.
It should be noted that the derived motion vector of the current block (including, but not limited to, control point motion vectors of affine mode, sub-block motion vectors of affine mode, planar mode, and ATMVP mode, temporal motion vectors, etc.) may be subjected to further operations. For example, the value of the motion vector is limited to a predefined range according to the number of bits used to represent it. If the motion vector is represented with bitDepth bits, the range is −2^(bitDepth−1) to 2^(bitDepth−1) − 1, where "^" denotes exponentiation. For example, if bitDepth is set to 16, the range is −32768 to 32767; if bitDepth is set to 18, the range is −131072 to 131071. As another example, the values of the derived motion vectors (e.g., the MVs of the four 4×4 sub-blocks in one 8×8 block) are limited such that the maximum difference between the integer parts of the MVs of the four 4×4 sub-blocks does not exceed N pixels, e.g., 1 pixel. Two methods of restricting motion vectors according to bitDepth are provided herein.
Method 1: the overflowing most significant bits (MSBs) are removed by a wrap-around (modulo) operation
$ux = (mvx + 2^{bitDepth}) \mathbin{\%} 2^{bitDepth}$ (1)
$mvx = (ux \ge 2^{bitDepth-1})\ ?\ (ux - 2^{bitDepth}) : ux$ (2)
$uy = (mvy + 2^{bitDepth}) \mathbin{\%} 2^{bitDepth}$ (3)
$mvy = (uy \ge 2^{bitDepth-1})\ ?\ (uy - 2^{bitDepth}) : uy$ (4)
Where mvx is the horizontal component of the motion vector of an image block or sub-block, mvy is the vertical component of the motion vector of an image block or sub-block, ux and uy represent intermediate values.
For example, if bitDepth is 16 and mvx has a value of −32769, then after equations (1) and (2) are applied, the resulting value is 32767. In a computer system, signed integers are stored in two's complement form. The two's complement representation of −32769 is 1,0111,1111,1111,1111 (17 bits); when the MSB is discarded, the resulting two's complement value is 0111,1111,1111,1111 (32767 in decimal), which is the same as the output of equations (1) and (2).
$ux = (mvpx + mvdx + 2^{bitDepth}) \mathbin{\%} 2^{bitDepth}$ (5)
$mvx = (ux \ge 2^{bitDepth-1})\ ?\ (ux - 2^{bitDepth}) : ux$ (6)
$uy = (mvpy + mvdy + 2^{bitDepth}) \mathbin{\%} 2^{bitDepth}$ (7)
$mvy = (uy \ge 2^{bitDepth-1})\ ?\ (uy - 2^{bitDepth}) : uy$ (8)
The above-described operations may be applied in the summation of mvp and mvd, as shown in equations (5) to (8).
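For illustration, the wrap-around of Method 1 can be sketched as follows, where the same helper serves both for a single component (equations (1) to (4)) and for the sum of a predictor and a difference (equations (5) to (8)). The function name `wrap_mv` and the default of 16 bits are illustrative assumptions.

```python
def wrap_mv(value: int, bit_depth: int = 16) -> int:
    """Wrap a motion vector component into the signed bit_depth-bit range
    [-2^(bit_depth-1), 2^(bit_depth-1) - 1] by discarding overflowing MSBs."""
    modulus = 1 << bit_depth
    u = (value + modulus) % modulus                          # equations (1), (3)
    return u - modulus if u >= (1 << (bit_depth - 1)) else u  # equations (2), (4)


# Example from the text: -32769 wraps to 32767 for bitDepth = 16.
print(wrap_mv(-32769))          # 32767

# Equations (5) to (8): wrap the sum of predictor (mvp) and difference (mvd).
mvpx, mvdx = 32000, 1000
print(wrap_mv(mvpx + mvdx))     # -32536
```

Values that already lie within the representable range are returned unchanged.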
Method 2: clipping values to remove overflowed MSB
$vx = \mathrm{Clip3}(-2^{bitDepth-1},\ 2^{bitDepth-1} - 1,\ vx)$
$vy = \mathrm{Clip3}(-2^{bitDepth-1},\ 2^{bitDepth-1} - 1,\ vy)$
Where vx is the horizontal component of the motion vector of the image block or sub-block and vy is the vertical component of the motion vector of the image block or sub-block; x, y and z correspond to the three input values of the MV clipping process, respectively, and the function Clip3 is defined as follows:
$\mathrm{Clip3}(x, y, z) = \begin{cases} x, & z < x \\ y, & z > y \\ z, & \text{otherwise} \end{cases}$
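Method 2 can be sketched as follows, assuming the conventional three-argument clipping function in which the first two arguments are the lower and upper bounds and the third is the value being clipped. The function names and the default of 16 bits are illustrative assumptions.

```python
def clip3(lower: int, upper: int, value: int) -> int:
    """Return lower if value < lower, upper if value > upper, else value."""
    if value < lower:
        return lower
    if value > upper:
        return upper
    return value


def clip_mv(value: int, bit_depth: int = 16) -> int:
    """Clip a motion vector component to the signed bit_depth-bit range."""
    return clip3(-(1 << (bit_depth - 1)), (1 << (bit_depth - 1)) - 1, value)


print(clip_mv(-32769))   # -32768
print(clip_mv(40000))    # 32767
print(clip_mv(5))        # 5
```

Unlike the wrap-around of Method 1, which maps −32769 to 32767, clipping saturates the same input to −32768, the nearest representable value.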
Fig. 4 is a schematic diagram of a video coding device 400 according to an embodiment of the present invention. The video coding device 400 is adapted to implement the disclosed embodiments described herein. In one embodiment, video coding device 400 may be a decoder, such as video decoder 30 of fig. 1A, or an encoder, such as video encoder 20 of fig. 1A.
The video coding device 400 includes an ingress port 410 (or input port 410) and a receiving unit (Rx) 420 for receiving data; a processor, logic unit, or central processing unit (CPU) 430 for processing the data; a transmitting unit (Tx) 440 and an egress port 450 (or output port 450) for transmitting the data; and a memory 460 for storing the data. The video coding device 400 may further include optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled to the ingress port 410, the receiving unit 420, the transmitting unit 440, and the egress port 450, for the egress or ingress of optical or electrical signals.
The processor 430 is implemented by hardware and software. Processor 430 may be implemented as one or more CPU chips, cores (e.g., multi-core processors), FPGAs, ASICs, and DSPs. Processor 430 communicates with ingress port 410, receiving unit 420, transmitting unit 440, egress port 450, and memory 460. Processor 430 includes a coding module 470. The coding module 470 implements the embodiments disclosed above. For example, the coding module 470 performs, processes, prepares, or provides various coding operations. The coding module 470 therefore provides a substantial improvement to the functionality of video coding device 400 and effects a transformation of video coding device 400 to a different state. Alternatively, the coding module 470 may be implemented as instructions stored in memory 460 and executed by processor 430.
Memory 460 may include one or more disks, tape drives, and solid state drives, and may serve as an overflow data storage device for storing programs as they are selected for execution, and for storing instructions and data that are read during program execution. For example, the memory 460 may be volatile and/or nonvolatile, and may be read-only memory (ROM), random access memory (random access memory, RAM), ternary content-addressable memory (TCAM), and/or static random-access memory (SRAM).
Fig. 5 is a simplified block diagram of an apparatus 500 provided by an example embodiment, the apparatus 500 being usable as either or both of the source device 12 and the destination device 14 in fig. 1A.
The processor 502 in the apparatus 500 may be a central processor. Processor 502 may be any other type of device or devices capable of manipulating or processing information, either as is known or later developed. Although the disclosed implementations may be implemented using a single processor, such as processor 502 as shown, the use of more than one processor may increase speed and efficiency.
In one implementation, the memory 504 in the apparatus 500 may be a Read Only Memory (ROM) device or a random access memory (random access memory, RAM) device. Any other suitable type of storage device may be used as memory 504. Memory 504 may include code and data 506 that processor 502 accesses over bus 512. Memory 504 may also include an operating system 508 and an application 510, application 510 including at least one program that causes processor 502 to perform the methods described herein. For example, application 510 may include applications 1 through N, as well as video coding applications that perform the methods described herein.
Apparatus 500 may also include one or more output devices, such as a display 518. In one example, display 518 may be a touch-sensitive display that combines the display with touch-sensitive elements that can be used to sense touch inputs. A display 518 may be coupled to the processor 502 by a bus 512.
Although the bus 512 in the apparatus 500 is described herein as a single bus, the bus 512 may include multiple buses. Further, the secondary memory 514 may be directly coupled to other components of the apparatus 500 or may be accessible over a network and may include a single integrated unit, such as a memory card, or multiple units, such as multiple memory cards. Thus, the apparatus 500 may have a variety of configurations.
Context-adaptive binary arithmetic coding (Context-Adaptive Binary Arithmetic Coding, CABAC)
Context-adaptive binary arithmetic coding (CABAC) is one form of entropy coding used in H.264/AVC and HEVC. Entropy coding is a lossless compression scheme that uses statistical properties to compress data such that the number of bits used to represent the data is logarithmically related to the probability of the data: the more probable the data, the fewer bits are used. For example, in compressing a character string, common characters are each represented by few bits, and uncommon characters are each represented by many bits. According to Shannon's information theory, when compressed data is represented by bits {0, 1}, the optimal average code length of a character having a probability of p is $-\log_2(p)$.
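As a small numerical illustration of the optimal code length −log2(p), the following computes the ideal length for a few probabilities and the resulting average length (entropy) of a distribution. The function names and the example probabilities are illustrative assumptions.

```python
import math


def ideal_code_length(p: float) -> float:
    """Ideal code length in bits for a symbol of probability p."""
    return -math.log2(p)


def entropy(probabilities) -> float:
    """Average ideal code length in bits per symbol."""
    return sum(p * ideal_code_length(p) for p in probabilities if p > 0)


for p in (0.5, 0.25, 0.125):
    print(p, ideal_code_length(p))   # 1.0, 2.0 and 3.0 bits respectively

print(entropy([0.5, 0.25, 0.25]))    # 1.5 bits per symbol
```

Halving the probability of a symbol adds one bit to its ideal code length, which is why frequent symbols should receive short codes.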
Fig. 8 is a schematic diagram of a CABAC entropy coding process. The CABAC coding operation may be described by the following steps:
Step 1: Binarization
The binarization process (corresponding to the binarizer component in fig. 8) converts the syntax element into a series of binary digits (b_1, b_2, ..., b_n), for example, using a truncated unary code. The "binary" portion of context-adaptive binary arithmetic coding refers to the binarization step.
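A truncated unary binarization can be sketched as follows, assuming the common convention in which a value v is written as v ones followed by a terminating zero, and the terminating zero is omitted when v equals the largest possible value. The function name and the parameter name `c_max` are illustrative assumptions.

```python
def truncated_unary(value: int, c_max: int) -> list:
    """Binarize value in [0, c_max] as `value` ones followed by a zero;
    the terminating zero is dropped when value == c_max."""
    if not 0 <= value <= c_max:
        raise ValueError("value must lie in [0, c_max]")
    bins = [1] * value
    if value < c_max:
        bins.append(0)
    return bins


for v in range(4):
    print(v, truncated_unary(v, c_max=3))
# 0 [0]
# 1 [1, 0]
# 2 [1, 1, 0]
# 3 [1, 1, 1]
```

Each element of the returned list corresponds to one binary digit b_i that is subsequently passed to the arithmetic coding stage.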
Step 2: probability model selection
For each binary digit, either the conventional (regular) arithmetic coding mode or the bypass arithmetic coding mode is selected (the selection is represented in the figure by the regular/bypass mode switch). In the bypass arithmetic coding mode, the binary digit is assumed to take the value "1" with 50% probability and the value "0" with 50% probability. In other words, the probability model used for coding the binary digit in the bypass arithmetic coding mode is an equiprobable model.
If the other branch (the conventional arithmetic coding mode) is selected, a probability model (context model or probability estimation model) corresponding to the binary digit is selected first. The probability model is a number representing the probability of observing the binary digit as "1". For example, a probability estimation model of 20% means that the binary digit is assumed to be 1 with 20% probability and 0 with 80% probability.
For each binary digit to be coded in the conventional arithmetic coding mode, there is at least one associated context model (probability model). The selection of the probability model is referred to as context modeling. In some cases, a binary digit has N > 1 associated context models, and one of the N context models is selected each time based on previously coded syntax elements. This process is referred to as context switching. For example, when the syntax element intra_luma_not_planar_flag is decoded, one of N = 2 context models is selected according to the following rules:
If the current block applies the intra sub-partition (ISP) coding mode, the first context model is selected.
If the current block does not apply the intra sub-partition coding mode, the second context model is selected.
intra_luma_not_planar_flag = 0 indicates that the current coding block is coded in the intra planar prediction mode. intra_luma_not_planar_flag = 1 indicates that the current coding block is not coded in the intra planar prediction mode.
As shown in FIG. 8, multiple probability models may be stored in the context memory. Each context model is used in the coding of binary digits. Multiple context models are sometimes needed because the probability of a binary digit being 0 (or 1) may differ from one context to another. Exploiting this difference can reduce the entropy of the coded binary digits, i.e., greater compression can be achieved. For example, assume that, overall, intra_luma_not_planar_flag takes the values 0 and 1 with 50% probability each, and that the current block applies ISP with 50% probability and does not apply ISP with 50% probability. However, in blocks to which ISP is applied, the probability that intra_luma_not_planar_flag is 1 is 75%, whereas in blocks to which ISP is not applied, that probability is 25%.
In this case, if only one context model is used, the entropy is 1 bit per binary digit:
Entropy = –(0.5 * log2(0.5) + 0.5 * log2(0.5)) = 1
However, if two context models are used, the entropy is 0.811 bits per binary digit:
Entropy = 0.5 * Entropy_ISP + 0.5 * Entropy_non-ISP = 0.811
Entropy_ISP = –(0.25 * log2(0.25) + 0.75 * log2(0.75)) = 0.811
Entropy_non-ISP = –(0.75 * log2(0.75) + 0.25 * log2(0.25)) = 0.811
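The entropy figures in this example can be checked numerically. The short Python sketch below recomputes them with base-2 logarithms; the helper name binary_entropy is hypothetical and is used only for this illustration.

```python
import math


def binary_entropy(p_one: float) -> float:
    """Entropy in bits of a binary digit that equals 1 with probability p_one."""
    if p_one in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    p_zero = 1.0 - p_one
    return -(p_one * math.log2(p_one) + p_zero * math.log2(p_zero))


# One context model: the flag is 1 with 50% probability overall.
single_model = binary_entropy(0.5)

# Two context models: 75% in ISP blocks, 25% in non-ISP blocks, each case equally likely.
two_models = 0.5 * binary_entropy(0.75) + 0.5 * binary_entropy(0.25)

print(round(single_model, 3), round(two_models, 3))  # prints: 1.0 0.811
```

Under these assumed probabilities, the second context model saves roughly 0.19 bits per coded flag.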
Each binary digit coded in the conventional CABAC mode is addressed in CABAC by a unique context index (ctxIdx), which corresponds to the first context model associated with that binary digit. If multiple contexts are defined, a specific context derivation rule is also defined. Each associated context needs some initialization parameters to estimate its initial probability distribution, including initialization parameters for I, P, and B frames, as well as a parameter called the window size, which controls the adaptation speed in CABAC. For more details, see Vivienne Sze, Madhukar Budagavi, and Gary J. Sullivan, High Efficiency Video Coding (HEVC): Algorithms and Architectures, Springer, 2014.
Step 3: arithmetic coding
The binary digits are encoded using arithmetic coding according to the selected coding mode (bypass coding or conventional arithmetic coding) and the selected probability estimation model (context model). Arithmetic coding is a lossless entropy coding method, explained for example at https://en.wikipedia.org/wiki/Arithmetic_coding. The arithmetic coding operation requires the probability of occurrence of a symbol (probability model, probability estimation model, or context model). The probability model for encoding a binary digit is obtained in step 2. The probability model is used in such a way that low-probability symbols are encoded with more bits and high-probability symbols are encoded with fewer bits. For example, if the probability model is 50%, the value 1 and the value 0 are encoded using an equal number of bits. However, if the probability model is 20%, more bits are required to encode the value 1 than the value 0. With this scheme, the total number of bits required to encode many binary digits is minimized.
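To make the relation between the probability model and the bit cost concrete, the following Python sketch computes the ideal code length –log2(p) of a binary digit under a given probability model. It is only an illustration: the function name ideal_bits is hypothetical, and a practical arithmetic coder approaches these values rather than reaching them exactly.

```python
import math


def ideal_bits(bin_value: int, p_one: float) -> float:
    """Ideal cost in bits of coding bin_value when the model says P(bin = 1) = p_one."""
    p = p_one if bin_value == 1 else 1.0 - p_one
    return -math.log2(p)


# With a 50% model, both values cost exactly 1 bit.
print(ideal_bits(1, 0.5), ideal_bits(0, 0.5))

# With a 20% model, the value 1 costs about 2.32 bits and the value 0 about 0.32 bits.
print(round(ideal_bits(1, 0.2), 2), round(ideal_bits(0, 0.2), 2))
```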
Step 4: probability model update
After a binary digit has been coded, the context model used for coding it is updated accordingly. For example, if the Nth context model is used for encoding a binary digit, and that context model represents a 20% probability of observing 1, then the probability model is increased if the value of the binary digit is 1 (so that the probability of observing 1 becomes greater than 20%). Conversely, if the value of the binary digit is 0, the probability model is decreased (so that the probability of observing 1 becomes less than 20%). The "adaptive" part of context-adaptive binary arithmetic coding refers to this probability model update step. It should be noted that step 4 (probability model update) applies only to the conventional arithmetic coding mode of CABAC and not to the bypass coding mode of CABAC. In the bypass coding mode, the probability model used always represents 50% (the probabilities of observing 1 and 0 are equal). The probability model in the conventional arithmetic coding mode, on the other hand, is updated to capture the statistical variation in the relevant binary digits.
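The direction of the update can be sketched with a simple exponential-smoothing rule, shown below in Python. This is an assumption made for illustration only and not the normative CABAC update: the function name update_probability and the parameter window_shift (a stand-in for the window size mentioned in step 2) are hypothetical.

```python
def update_probability(p_one: float, bin_value: int, window_shift: int = 4) -> float:
    """Move the estimate P(bin = 1) toward the observed bin value.

    A larger window_shift means a longer window and therefore slower adaptation.
    Bins coded in bypass mode would skip this update and keep a fixed 50% model.
    """
    return p_one + (bin_value - p_one) / (1 << window_shift)


p = 0.2
print(update_probability(p, 1))  # greater than 0.2 after observing a 1
print(update_probability(p, 0))  # less than 0.2 after observing a 0
```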
The CABAC decoding process uses the above 4 steps in a similar manner. For more details on CABAC, see for example https://en.wikipedia.org/wiki/Context-adaptive_binary_arithmetic_coding.
It should be noted that the terms context model, probability model, and probability estimation model are used as synonyms in the present invention.
Geometric partitioning and geometric partitioning flags
In JVET-O0489, a geometric partitioning mode is proposed. The geometric partitioning mode (GEO) is a partitioning method that extends the current triangle prediction mode (TPM).
At the JVET-L meeting held in Macao, the triangular partition mode (TPM) was adopted in the VVC draft. The TPM may be used for uni-prediction blocks no smaller than 8×8 in order to balance its complexity against that of bi-predicted inter block coding. The TPM splits a rectangular coding block into two triangular prediction blocks along the diagonal or anti-diagonal direction, as shown on the left side of FIG. 7. For the TPM, the residual of the entire block is coded; the sub-block transform (SBT) for inter-predicted CUs is not used with the TPM.
The geometric partitioning mode is an extension of the existing TPM. The proposed technical solution uses the geometric partitioning (GEO) concept to further extend the flexibility of non-rectangular partitioning for inter blocks and introduces a specific version of the sub-block transform for non-rectangular inter CUs.
For example, the right side of fig. 7 shows some examples of geometric partitioning patterns. The dividing line may be characterized by an angle variable and a distance variable.
The geometric partition flag is used to indicate whether the current coding block (or current block) uses the geometric partitioning mode. When the conditions for applying geometric partitioning are satisfied, each coding block (or block) includes a geometric partition flag having a value of 0 or 1. In JVET-O0489, the geometric partitioning mode may be used when the coding block is larger than 8×8 luma samples.
The geometric partition flag is parsed from the bitstream using the CABAC method described above. As mentioned in the CABAC section, a context model is used to decode the bitstream into a flag. In JVET-O0489, 3 context models are used to decode the geometric partition flag from the bitstream. The context model index (ctxIdx) is derived as follows:
Context model index 0: derived when neither the left neighboring block nor the upper neighboring block of the current coding block uses the geometric partitioning mode. In this case, the probability that the current block uses geometric partitioning is small. The positions of the left neighboring block and the upper neighboring block are shown in FIG. 6.
Context model index 1: derived when one of the left neighboring block and the upper neighboring block of the current coding block uses the geometric partitioning mode. In this case, the probability that the current block uses geometric partitioning is moderate.
Context model index 2: derived when both the left neighboring block and the upper neighboring block of the current coding block use the geometric partitioning mode. In this case, the probability that the current block uses geometric partitioning is large.
As mentioned in the CABAC section, a context model is designed based on the probability of occurrence of a flag value. Under certain conditions, the probability of occurrence of the geometric partition flag may differ. The following embodiments provide new context model index derivation methods for decoding the geometric partition flag.
First embodiment
According to this embodiment, in the CABAC decoding process, the geometric partition flag is still decoded from the bitstream using 3 context models.
As mentioned above, the geometric partitioning mode is an extension of the existing triangular partition mode, so it is reasonable to take the TPM into account in the context model derivation.
In one example, the context model index of the geometric partition flag is derived as follows:
Context model index 0: derived when neither the left neighboring block nor the upper neighboring block of the current coding block (also referred to as the current block) uses the geometric partitioning mode or the triangular partition mode. The positions of the left neighboring block and the upper neighboring block are shown in FIG. 6.
Context model index 1: derived when one of the left neighboring block and the upper neighboring block of the current coding block uses the geometric partitioning mode or the triangular partition mode.
Context model index 2: derived when both the left neighboring block and the upper neighboring block of the current coding block use the geometric partitioning mode or the triangular partition mode.
In one implementation, the context model index of the geometric partition flag of the current block may be derived according to the following equation:
ctxInc = condL + condA
where ctxInc is the context model index; condL indicates whether the left neighboring block uses the geometric partitioning mode or the triangular partition mode: condL is equal to 1 if the left neighboring block uses the geometric partitioning mode or the triangular partition mode, and condL is equal to 0 if the left neighboring block uses neither mode; condA indicates whether the upper neighboring block uses the geometric partitioning mode or the triangular partition mode: condA is equal to 1 if the upper neighboring block uses the geometric partitioning mode or the triangular partition mode, and condA is equal to 0 if the upper neighboring block uses neither mode.
In another implementation, the availability of the left neighboring block and the upper neighboring block is also taken into account, and the context model index of the geometric partition flag of the current block may be derived according to the following equation:
ctxInc = (condL && availableL) + (condA && availableA)
where availableL indicates whether the left neighboring block is available: availableL is equal to 1 if the left neighboring block is available and equal to 0 otherwise; availableA indicates whether the upper neighboring block is available: availableA is equal to 1 if the upper neighboring block is available and equal to 0 otherwise.
According to this example, whether a neighboring block uses the geometric partitioning mode may be determined by checking whether the geometric partition flag of the neighboring block is 1.
Alternatively, whether a neighboring block uses the geometric partitioning mode may be determined by checking whether the neighboring block is allowed to use the geometric partitioning mode. In one example, blocks smaller than 8×8 are not allowed to use the geometric partitioning mode.
Whether a neighboring block uses the triangular prediction mode may be determined by checking whether the triangular prediction flag of the neighboring block is 1.
Alternatively, whether a neighboring block uses the triangular prediction mode may be determined by checking whether the neighboring block is allowed to use the triangular prediction mode. In one example, blocks smaller than 8×8 are not allowed to use the geometric partitioning mode.
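The derivation of the first embodiment, including the availability check, can be sketched in Python as follows. The sketch assumes the flag-based check (a neighboring block counts as using a mode when its flag is 1) and models an unavailable neighboring block as None; the class Block, its fields, and the function names are hypothetical and are not taken from any codec implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    """Hypothetical record of the flags already decoded for a neighboring block."""
    geo_flag: bool = False   # geometric partition flag of the block
    tpm_flag: bool = False   # triangular partition flag of the block


def uses_geo_or_tpm(neighbor: Optional[Block]) -> int:
    """Return cond && available: 1 only for an available block using GEO or TPM."""
    return int(neighbor is not None and (neighbor.geo_flag or neighbor.tpm_flag))


def ctx_inc_first_embodiment(left: Optional[Block], above: Optional[Block]) -> int:
    """ctxInc = (condL && availableL) + (condA && availableA), a value in 0..2."""
    return uses_geo_or_tpm(left) + uses_geo_or_tpm(above)


# Left block uses TPM and the upper block is unavailable (e.g. at a picture boundary).
print(ctx_inc_first_embodiment(Block(tpm_flag=True), None))  # prints: 1
```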
A corresponding general decoding method implemented by a decoding device is shown in FIG. 10. The method comprises:
1001: Obtain a bitstream.
1002: Obtain a context model index of the current block according to information about at least one of the triangular partition mode and the geometric partitioning mode of the neighboring blocks adjacent to the current block, wherein the neighboring blocks include a left neighboring block and an upper neighboring block. The information indicates whether the neighboring blocks use the geometric partitioning mode or the triangular partition mode; specifically, it indicates whether at least one of the left neighboring block and the upper neighboring block uses the geometric partitioning mode or the triangular partition mode and, if so, whether one or both of them do. The context model index is obtained according to this information.
1003: Obtain the value of the geometric partition flag of the current block from the bitstream according to the context model index of the current block, wherein the geometric partition flag of the current block indicates whether the current block uses the geometric partitioning mode.
1004: Decode the current block according to the value of the geometric partition flag.
Second embodiment
According to a second embodiment, in the CABAC decoding process, the geometric partition flag is decoded from the bitstream using 4 context models.
Since the probability that geometric partitioning occurs depends on the aspect ratio of the current block, according to this embodiment, 3 of the context model indexes are derived from the neighboring blocks, and the 4th context model index is derived from the aspect ratio of the current block.
In one example, the context model index is derived as follows:
Context model index 0: derived when neither the left neighboring block nor the upper neighboring block of the current coding block uses the geometric partitioning mode. The positions of the left neighboring block and the upper neighboring block are shown in FIG. 6.
Context model index 1: derived when one of the left neighboring block and the upper neighboring block of the current coding block uses the geometric partitioning mode.
Context model index 2: derived when both the left neighboring block and the upper neighboring block of the current coding block use the geometric partitioning mode.
Context model index 3: derived when the aspect ratio of the current coding block is greater than a predefined threshold (e.g., 2, 4, 8, etc.), independent of the neighboring blocks. In one implementation, the ratio may be set to 2^n, where n is a positive integer.
It should be noted that context model indexes 0 to 2 are obtained when the aspect ratio of the current coding block is equal to or smaller than the predefined threshold. That is, in the decoding process, it is first determined whether the aspect ratio of the current coding block is greater than the predefined threshold, and the context model index is obtained according to the result of this determination.
The aspect ratio is the ratio between the width and the height of the current block and can be obtained, for example, by the following equation:
Ratio = 1 << abs(log2(width) – log2(height)),
where height and width are the height and width of the current coding block, abs() is the absolute value operator, log2() is the base-2 logarithm, and << is the left shift operation.
In one implementation, the context model index of the geometric partition flag of the current block may be derived according to the following equation:
ctxInc = Ratio > 4 ? 3 : (condL + condA)
where ctxInc is the context model index; condL indicates whether the left neighboring block uses the geometric partitioning mode: condL is equal to 1 if the left neighboring block uses the geometric partitioning mode and equal to 0 otherwise; condA indicates whether the upper neighboring block uses the geometric partitioning mode: condA is equal to 1 if the upper neighboring block uses the geometric partitioning mode and equal to 0 otherwise.
In another implementation, the availability of the left neighboring block and the upper neighboring block is also taken into account, and the context model index of the geometric partition flag of the current block may be derived according to the following equation:
ctxInc = Ratio > 4 ? 3 : ((condL && availableL) + (condA && availableA))
where availableL indicates whether the left neighboring block is available: availableL is equal to 1 if the left neighboring block is available and equal to 0 otherwise; availableA indicates whether the upper neighboring block is available: availableA is equal to 1 if the upper neighboring block is available and equal to 0 otherwise.
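The aspect-ratio rule of the second embodiment can be sketched in Python as follows. The sketch assumes that the block width and height are powers of two, so that log2() is an exact integer, and uses a threshold of 4 as in the equations above; the function names and the keyword parameters are hypothetical.

```python
def aspect_ratio(width: int, height: int) -> int:
    """Ratio = 1 << abs(log2(width) - log2(height)) for power-of-two dimensions."""
    log2_w = width.bit_length() - 1    # exact log2 when width is a power of two
    log2_h = height.bit_length() - 1   # exact log2 when height is a power of two
    return 1 << abs(log2_w - log2_h)


def ctx_inc_second_embodiment(width: int, height: int,
                              cond_l: bool, cond_a: bool,
                              available_l: bool = True, available_a: bool = True,
                              threshold: int = 4) -> int:
    """ctxInc = Ratio > threshold ? 3 : (condL && availableL) + (condA && availableA)."""
    if aspect_ratio(width, height) > threshold:
        return 3                       # elongated block: the neighbors are ignored
    return int(cond_l and available_l) + int(cond_a and available_a)


print(aspect_ratio(16, 4))                            # prints: 4
print(ctx_inc_second_embodiment(16, 4, True, False))  # Ratio is 4, not above 4: prints 1
print(ctx_inc_second_embodiment(32, 4, True, True))   # Ratio is 8: prints 3
```

Because the ratio uses the absolute difference of the logarithms, a 4×32 block and a 32×4 block select the same context model.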
Third embodiment
The context model index derivation methods for the geometric partition flag of the first embodiment and the second embodiment may be combined.
In other words, in the CABAC decoding process, the geometric partition flag is decoded from the bitstream using 4 context models.
The context model index is derived as follows:
Context model index 0: derived when neither the left neighboring block nor the upper neighboring block of the current coding block uses the geometric partitioning mode or the triangular partition mode. The positions of the left neighboring block and the upper neighboring block are shown in FIG. 6.
Context model index 1: derived when one of the left neighboring block and the upper neighboring block of the current coding block uses the geometric partitioning mode or the triangular partition mode.
Context model index 2: derived when both the left neighboring block and the upper neighboring block of the current coding block use the geometric partitioning mode or the triangular partition mode.
Context model index 3: derived when the aspect ratio of the current coding block is greater than a predefined threshold (e.g., 4), independent of the neighboring blocks.
It should be noted that context model indexes 0 to 2 are obtained when the aspect ratio of the current coding block is equal to or smaller than the predefined threshold. That is, in the decoding process, it is first determined whether the aspect ratio of the current coding block is greater than the predefined threshold, and the context model index is obtained according to the result of this determination.
The aspect ratio is the ratio between the width and the height of the current block and can be obtained, for example, by the following equation:
Ratio = 1 << abs(log2(width) – log2(height)),
where height and width are the height and width of the current coding block, abs() is the absolute value operator, log2() is the base-2 logarithm, and << is the left shift operation. For example, if the current block is 16 pixels wide and 4 pixels high, this gives Ratio = 1 << abs(4 – 2), so the resulting Ratio is 4.
In one implementation, the context model index of the geometric partition flag of the current block may be derived according to the following equation:
ctxInc = Ratio > 4 ? 3 : (condL + condA)
where ctxInc is the context model index; condL indicates whether the left neighboring block uses the geometric partitioning mode or the triangular partition mode: condL is equal to 1 if the left neighboring block uses the geometric partitioning mode or the triangular partition mode, and condL is equal to 0 if the left neighboring block uses neither mode; condA indicates whether the upper neighboring block uses the geometric partitioning mode or the triangular partition mode: condA is equal to 1 if the upper neighboring block uses the geometric partitioning mode or the triangular partition mode, and condA is equal to 0 if the upper neighboring block uses neither mode. If the aspect ratio is greater than 4 (in this particular example), the context model index ctxInc is equal to 3, independent of the values of condL and condA.
In another implementation, the availability of the left neighboring block and the upper neighboring block is also taken into account, and the context model index of the geometric partition flag of the current block may be derived according to the following equation:
ctxInc = Ratio > 4 ? 3 : ((condL && availableL) + (condA && availableA))
where availableL indicates whether the left neighboring block is available: availableL is equal to 1 if the left neighboring block is available and equal to 0 otherwise; availableA indicates whether the upper neighboring block is available: availableA is equal to 1 if the upper neighboring block is available and equal to 0 otherwise.
Fourth embodiment
A decoding method implemented by a decoding device is shown in FIG. 9. The method comprises:
901: Obtain a bitstream.
902: Obtain the aspect ratio of the current block.
The aspect ratio is the ratio between the width and the height of the current block. In one implementation, the aspect ratio may be obtained according to the following equation:
Ratio = 1 << abs(log2(width) – log2(height)),
where height and width are the height and width of the current block, abs() is the absolute value operator, log2() is the base-2 logarithm, and << is the left shift operation.
903: Obtain the context model index of the current block according to the aspect ratio.
If the aspect ratio is greater than a predefined threshold, context model index 3 is obtained for the current block.
If the aspect ratio is equal to or smaller than the predefined threshold, the context model index of the current block is obtained according to information about at least one of the triangular partition mode and the geometric partitioning mode of the neighboring blocks adjacent to the current block, wherein the neighboring blocks include a left neighboring block and an upper neighboring block.
Details of obtaining the context model index are similar to those of the second embodiment and the third embodiment.
904: Obtain the value of the geometric partition flag of the current block from the bitstream according to the context model index of the current block, wherein the geometric partition flag of the current block indicates whether the current block uses the geometric partitioning mode.
905: Decode the current block according to the value of the geometric partition flag.
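Steps 902 to 904 can be strung together as in the following Python sketch. The arithmetic decoder itself is outside the scope of the sketch and is represented by a caller-supplied function decode_bin(ctx_idx); that callback, the function name parse_geo_flag, its parameters, and the stub decoder in the usage lines are all hypothetical. Block dimensions are assumed to be powers of two.

```python
from typing import Callable, List


def parse_geo_flag(width: int, height: int,
                   left_uses_geo_or_tpm: bool, above_uses_geo_or_tpm: bool,
                   decode_bin: Callable[[int], int],
                   threshold: int = 4) -> int:
    """Select the context model for the geometric partition flag and decode the flag."""
    # Step 902: aspect ratio, Ratio = 1 << abs(log2(width) - log2(height)).
    ratio = 1 << abs((width.bit_length() - 1) - (height.bit_length() - 1))
    # Step 903: context model index from the aspect ratio, else from the neighbors.
    if ratio > threshold:
        ctx_idx = 3
    else:
        ctx_idx = int(left_uses_geo_or_tpm) + int(above_uses_geo_or_tpm)
    # Step 904: decode one regular-mode bin with the selected context model.
    return decode_bin(ctx_idx)


# Usage with a stub decoder that records which context model was requested.
seen: List[int] = []


def stub_decoder(ctx_idx: int) -> int:
    seen.append(ctx_idx)
    return 1  # pretend the bitstream signals that GEO is used


flag = parse_geo_flag(64, 8, False, True, stub_decoder)    # Ratio 8 selects context 3
flag2 = parse_geo_flag(16, 16, True, True, stub_decoder)   # Ratio 1 selects context 2
print(seen)  # prints: [3, 2]
```

Step 905 would then decode the current block according to the returned flag value.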
Fig. 11 shows a decoder 1100 provided by the present invention. Decoder 1100 includes one or more processors 1101 and a non-transitory computer readable storage medium 1102. A non-transitory computer readable storage medium 1102 is coupled to the one or more processors 1101 and stores programs for execution by the one or more processors 1101, wherein the decoder 1100 is configured to perform a method according to one of the aspects or implementations of the present invention when the one or more processors 1101 execute the programs. During startup, the one or more processors 1101 receive a program from the non-transitory computer readable storage medium 1102.
Fig. 12 shows another decoder 1200 provided by the present invention. Decoder 1200 includes an acquisition module 1201 and a decoding module 1202.
According to one aspect, the acquisition module 1201 is configured to: obtain a bitstream; obtain the aspect ratio of the current block; obtain a context model index of the current block according to the aspect ratio; and obtain the value of the geometric partition flag of the current block from the bitstream according to the context model index of the current block, wherein the geometric partition flag of the current block indicates whether the current block uses the geometric partitioning mode. The decoding module 1202 is configured to decode the current block according to the value of the geometric partition flag.
Alternatively, according to another aspect, the acquisition module 1201 is configured to: obtain a bitstream; obtain a context model index of the current block according to information about at least one of the triangular partition mode and the geometric partitioning mode of the neighboring blocks adjacent to the current block, wherein the neighboring blocks include a left neighboring block and an upper neighboring block; and obtain the value of the geometric partition flag of the current block from the bitstream according to the context model index of the current block, wherein the geometric partition flag of the current block indicates whether the current block uses the geometric partitioning mode. The decoding module 1202 is configured to decode the current block according to the value of the geometric partition flag.
Mathematical operators
The mathematical operators used in this application are similar to those used in the C programming language. However, the results of integer division and arithmetic shift operations are defined more precisely, and additional operations, such as exponentiation and real-valued division, are defined. Numbering and counting conventions generally begin from 0, e.g., "the first" is equivalent to the 0th, "the second" is equivalent to the 1st, etc.
Arithmetic operators
The following arithmetic operators are defined as follows:
Logical operators
The following logical operators are defined as follows:
x && y Boolean logical "and" of x and y
x || y Boolean logical "or" of x and y
! Boolean logical "not"
x ? y : z If x is TRUE or not equal to 0, evaluates to the value of y; otherwise, evaluates to the value of z.
Relational operators
The following relational operators are defined as follows:
> Greater than
>= Greater than or equal to
< Less than
<= Less than or equal to
== Equal to
!= Not equal to
When a relational operator is applied to a syntax element or variable that has been assigned the value "na" (not applicable), the value "na" is treated as a distinct value for the syntax element or variable. The value "na" is considered not to be equal to any other value.
Bitwise operators
The following bitwise operators are defined as follows:
& Bitwise "and". When operating on integer arguments, operates on a two's complement representation of the integer value. When operating on a binary argument that contains fewer bits than another argument, the shorter argument is extended by adding more significant bits equal to 0.
| Bitwise "or". When operating on integer arguments, operates on a two's complement representation of the integer value. When operating on a binary argument that contains fewer bits than another argument, the shorter argument is extended by adding more significant bits equal to 0.
^ Bitwise "exclusive or". When operating on integer arguments, operates on a two's complement representation of the integer value. When operating on a binary argument that contains fewer bits than another argument, the shorter argument is extended by adding more significant bits equal to 0.
x >> y Arithmetic right shift of a two's complement integer representation of x by y binary digits. This function is defined only for non-negative integer values of y. Bits shifted into the most significant bits (MSBs) as a result of the right shift have a value equal to the MSB of x prior to the shift operation.
x << y Arithmetic left shift of a two's complement integer representation of x by y binary digits. This function is defined only for non-negative integer values of y. Bits shifted into the least significant bits (LSBs) as a result of the left shift have a value equal to 0.
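The shift definitions can be illustrated with the following Python sketch. Python integers behave like two's complement values of unlimited width, so the built-in >> operator already replicates the sign bit as described above; the wrapper names arithmetic_shift_right and shift_left are hypothetical and only add the check that y is non-negative.

```python
def arithmetic_shift_right(x: int, y: int) -> int:
    """x >> y with sign extension; defined only for non-negative y."""
    if y < 0:
        raise ValueError("shift amount must be non-negative")
    return x >> y


def shift_left(x: int, y: int) -> int:
    """x << y, filling the vacated least significant bits with 0."""
    if y < 0:
        raise ValueError("shift amount must be non-negative")
    return x << y


print(arithmetic_shift_right(-7, 1))  # prints: -4 (rounds toward minus infinity)
print(int(-7 / 2))                    # prints: -3 (division truncating toward zero differs)
print(shift_left(5, 2))               # prints: 20
```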
Assignment operators
The following assignment operators are defined as follows:
= Assignment operator
++ Increment, i.e., x++ is equivalent to x = x + 1; when used in an array index, evaluates to the value of the variable prior to the increment operation.
-- Decrement, i.e., x-- is equivalent to x = x – 1; when used in an array index, evaluates to the value of the variable prior to the decrement operation.
+= Increment by the amount specified, i.e., x += 3 is equivalent to x = x + 3, and x += (–3) is equivalent to x = x + (–3).
-= Decrement by the amount specified, i.e., x -= 3 is equivalent to x = x – 3, and x -= (–3) is equivalent to x = x – (–3).
Range representation
The following notation is used to illustrate the range of values:
x = y..z x takes on integer values from y to z, inclusive, where x, y, and z are integers and z is greater than y.
Mathematical functions
The following mathematical functions are defined:
Asin(x) The trigonometric inverse sine function, operating on an argument x that is in the range of –1.0 to 1.0, inclusive, with an output value in the range of –π/2 to π/2, inclusive, in units of radians.
Atan(x) The trigonometric inverse tangent function, operating on an argument x, with an output value in the range of –π/2 to π/2, inclusive, in units of radians.
Ceil(x) The smallest integer greater than or equal to x.
Clip1Y(x) = Clip3(0, (1 << BitDepthY) – 1, x)
Clip1C(x) = Clip3(0, (1 << BitDepthC) – 1, x)
Cos(x) The trigonometric cosine function operating on an argument x in units of radians.
Floor(x) The largest integer less than or equal to x.
Ln(x) The natural logarithm of x (the base-e logarithm, where e is the natural logarithm base constant 2.718 281 828...).
Log2(x) The base-2 logarithm of x.
Log10(x) The base-10 logarithm of x.
Round(x) = Sign(x) * Floor(Abs(x) + 0.5)
Sin(x) The trigonometric sine function operating on an argument x in units of radians.
Swap(x, y) = (y, x)
Tan(x) The trigonometric tangent function operating on an argument x in units of radians.
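A few of these functions can be sketched in Python as follows. The definitions of Sign and Clip3 are not given in this text; the sketch assumes the usual conventions, namely that Sign(x) returns –1, 0, or 1 and that Clip3(x, y, z) clamps z to the range [x, y]. The lowercase function names and the bit_depth parameter are hypothetical.

```python
import math


def sign(x: float) -> int:
    """Assumed convention: -1 for negative x, 0 for zero, 1 for positive x."""
    return (x > 0) - (x < 0)


def round_half_away(x: float) -> int:
    """Round(x) = Sign(x) * Floor(Abs(x) + 0.5): ties are rounded away from zero."""
    return sign(x) * math.floor(abs(x) + 0.5)


def clip3(low: int, high: int, value: int) -> int:
    """Assumed convention: clamp value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def clip1(value: int, bit_depth: int) -> int:
    """Clip1(x) = Clip3(0, (1 << BitDepth) - 1, x) for a given sample bit depth."""
    return clip3(0, (1 << bit_depth) - 1, value)


print(round_half_away(2.5), round_half_away(-2.5))  # prints: 3 -3
print(round(2.5))                                   # prints: 2 (Python rounds ties to even)
print(clip1(300, 8), clip1(-5, 8))                  # prints: 255 0
```

The comparison with Python's built-in round() shows why the rounding rule has to be written out explicitly when the functions above are implemented.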
Operation priority order
When the order of precedence in an expression is not indicated explicitly by the use of parentheses, the following rules apply:
Operations of a higher precedence are evaluated before any operation of a lower precedence.
Operations of the same precedence are evaluated sequentially from left to right.
The following table illustrates the priorities of operations from highest to lowest, the higher the position in the table, the higher the priority.
For operators also used in the C programming language, the operator priority order in this specification is the same as the priority order in the C programming language.
Table: priority of operations from highest (top of table) to lowest (bottom of table)
Text description of logical operations
In the text, the statement of a logical operation is described mathematically as follows:
if(condition 0)
statement 0
else if(condition 1)
statement 1
...
else /* informative remark on remaining condition */
statement n
The following description may be used:
... as follows / ... the following applies:
- If condition 0, statement 0
- Otherwise, if condition 1, statement 1
- ...
- Otherwise (informative remark on remaining condition), statement n
Each "If ... Otherwise, if ... Otherwise, ..." statement in the text is introduced with "... as follows" or "... the following applies", immediately followed by "If ...". The last condition of an "If ... Otherwise, if ... Otherwise, ..." statement is always an "Otherwise, ...". Interleaved "If ... Otherwise, if ... Otherwise, ..." statements can be identified by matching "... as follows" or "... the following applies" with the ending "Otherwise, ...".
In the text, the statement of a logical operation is described mathematically as follows:
if(condition 0a&&condition 0b)
statement 0
else if(condition 1a||condition 1b)
statement 1
...
else
statement n
The following description may be used:
... as follows / ... the following applies:
- If all of the following conditions are true, statement 0:
  - condition 0a
  - condition 0b
- Otherwise, if one or more of the following conditions are true, statement 1:
  - condition 1a
  - condition 1b
- ...
- Otherwise, statement n
In the text, the statement of a logical operation is described mathematically as follows:
if(condition 0)
statement 0
if(condition 1)
statement 1
The following description may be used:
- When condition 0, statement 0
- When condition 1, statement 1
Although embodiments of the present invention have been described primarily in terms of video coding, it should be noted that embodiments of the coding system 10, the encoder 20 and the decoder 30, as well as other embodiments described herein, may also be used for still picture processing or coding, i.e., the processing or coding of a single picture independently of any preceding or consecutive picture, as in video coding. In general, if the picture processing or coding is limited to a single picture 17, only the inter-prediction units 244 (encoder) and 344 (decoder) may not be available. All other functions (also referred to as tools or techniques) of the video encoder 20 and the video decoder 30 may equally be used for still picture processing, e.g., residual calculation 204/304, transform 206, quantization 208, inverse quantization 210/310, (inverse) transform 212/312, partitioning 262/362, intra prediction 254/354 and/or loop filtering 220/320, and entropy encoding 270 and entropy decoding 304.
Embodiments of the encoder 20 and decoder 30, etc., and the functions described herein in connection with the encoder 20 and decoder 30, etc., may be implemented using hardware, software, firmware, or any combination thereof. If implemented using software, the various functions may be stored or transmitted as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, corresponding to tangible media, such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol). In this manner, a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit, or provided by a collection of interoperable hardware units (including one or more processors as described above), in conjunction with suitable software and/or firmware.

Claims (27)

1. A decoding method implemented by a decoding device, comprising:
obtaining a bitstream;
obtaining an aspect ratio of a current block;
obtaining a context model index of the current block according to the aspect ratio;
obtaining a value of a geometric partition flag of the current block from the bitstream according to the context model index of the current block, wherein the geometric partition flag of the current block indicates whether the current block uses a geometric partition mode; and
decoding the current block according to the value of the geometric partition flag;
wherein the aspect ratio is obtained according to the following equation:
Ratio = 1 << abs(log2(width) - log2(height)),
wherein height and width in the equation are the height and the width of the current block, abs() is the absolute value operator, log2() is the base-2 logarithm, and << is the left shift operation;
wherein the context model index of the geometric partition flag of the current block is derived by the following formula:
ctxInc = Ratio > 4 ? 3 : (condL + condA),
wherein ctxInc is the context model index; condL indicates whether the left neighboring block uses the geometric partition mode, condL being equal to 1 if the left neighboring block uses the geometric partition mode and equal to 0 if the left neighboring block does not use the geometric partition mode; and condA indicates whether the upper neighboring block uses the geometric partition mode, condA being equal to 1 if the upper neighboring block uses the geometric partition mode and equal to 0 if the upper neighboring block does not use the geometric partition mode.
2. The method of claim 1, wherein the aspect ratio is a ratio of a width to a height of the current block.
3. The method according to claim 1 or 2, wherein the obtaining a context model index of the current block according to the aspect ratio comprises:
if the aspect ratio is greater than a predefined threshold, obtaining a context model index of 3 for the current block, the predefined threshold being 2^n, where n is a positive integer.
4. The method according to any of claims 1 to 3, wherein the obtaining a context model index of the current block according to the aspect ratio comprises:
if the aspect ratio is equal to or smaller than a predefined threshold, obtaining the context model index of the current block according to at least one of the triangular partition mode and the geometric partition mode of neighboring blocks adjacent to the current block, wherein the neighboring blocks comprise a left neighboring block and an upper neighboring block.
5. The method of claim 4, wherein the predefined threshold is 4.
6. A decoding method implemented by a decoding device, comprising:
obtaining a bitstream;
obtaining a context model index of a current block according to at least one of information of a triangular partition mode and information of a geometric partition mode of neighboring blocks adjacent to the current block, wherein the neighboring blocks comprise a left neighboring block and an upper neighboring block;
obtaining a value of a geometric partition flag of the current block from the bitstream according to the context model index of the current block, wherein the geometric partition flag of the current block indicates whether the current block uses a geometric partition mode, and the context model index of the geometric partition flag of the current block is derived by the following formula: ctxInc = Ratio > 4 ? 3 : (condL + condA), wherein condL indicates whether the left neighboring block uses the geometric partition mode, condL being equal to 1 if the left neighboring block uses the geometric partition mode and equal to 0 if the left neighboring block does not use the geometric partition mode; and condA indicates whether the upper neighboring block uses the geometric partition mode, condA being equal to 1 if the upper neighboring block uses the geometric partition mode and equal to 0 if the upper neighboring block does not use the geometric partition mode; and
decoding the current block according to the value of the geometric partition flag.
7. The method of claim 6, wherein the obtaining a context model index of the current block according to at least one of information of a triangular partition mode and information of a geometric partition mode of neighboring blocks adjacent to the current block comprises:
when neither the left neighboring block nor the upper neighboring block uses the geometric partition mode or the triangular partition mode, obtaining a context model index of 0 for the current block.
8. The method of claim 6, wherein the obtaining a context model index of the current block according to at least one of information of a triangular partition mode and information of a geometric partition mode of neighboring blocks adjacent to the current block comprises:
when one of the left neighboring block and the upper neighboring block uses the geometric partition mode or the triangular partition mode, obtaining a context model index of 1 for the current block.
9. The method of claim 6, wherein the obtaining a context model index of the current block according to at least one of information of a triangular partition mode and information of a geometric partition mode of neighboring blocks adjacent to the current block comprises:
when both the left neighboring block and the upper neighboring block use the geometric partition mode or the triangular partition mode, obtaining a context model index of 2 for the current block.
10. The method of claim 6, wherein the information of the geometric partition mode of the left neighboring block indicates whether the left neighboring block uses the geometric partition mode, and the information of the geometric partition mode of the upper neighboring block indicates whether the upper neighboring block uses the geometric partition mode.
11. The method of claim 10, wherein whether the left neighboring block uses the geometric partition mode is determined according to a value of a geometric partition flag of the left neighboring block, or whether the upper neighboring block uses the geometric partition mode is determined according to a value of a geometric partition flag of the upper neighboring block.
12. The method of claim 10, wherein whether the left neighboring block uses the geometric partition mode is determined according to whether the left neighboring block is allowed to use the geometric partition mode, or whether the upper neighboring block uses the geometric partition mode is determined according to whether the upper neighboring block is allowed to use the geometric partition mode.
13. The method of claim 12, wherein the left neighboring block is not allowed to use the geometric partition mode if the block size of the left neighboring block is less than 8 x 8, or the upper neighboring block is not allowed to use the geometric partition mode if the block size of the upper neighboring block is less than 8 x 8.
14. The method of claim 6, wherein the information of the triangular partition mode of the left neighboring block indicates whether the left neighboring block uses the triangular partition mode, and the information of the triangular partition mode of the upper neighboring block indicates whether the upper neighboring block uses the triangular partition mode.
15. The method of claim 14, wherein whether the left neighboring block uses the triangular partition mode is determined according to a value of a triangular partition flag of the left neighboring block, or whether the upper neighboring block uses the triangular partition mode is determined according to a value of a triangular partition flag of the upper neighboring block.
16. The method of claim 14, wherein whether the left neighboring block uses the triangular partition mode is determined according to whether the left neighboring block is allowed to use the triangular partition mode, or whether the upper neighboring block uses the triangular partition mode is determined according to whether the upper neighboring block is allowed to use the triangular partition mode.
17. The method of claim 16, wherein the left neighboring block is not allowed to use the triangular partition mode if the block size of the left neighboring block is less than 8 x 8, or the upper neighboring block is not allowed to use the triangular partition mode if the block size of the upper neighboring block is less than 8 x 8.
18. A decoder (30), comprising processing circuitry for performing the method according to any of claims 1 to 17.
19. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, performs the method according to any one of claims 1 to 17.
20. A decoder (1100), comprising:
one or more processors (1101);
a non-transitory computer readable storage medium (1102) coupled to the one or more processors (1101) and storing a program for execution by the one or more processors, wherein the decoder is configured to perform the method of any of claims 1-17 when the program is executed by the one or more processors.
21. A decoder (1200), comprising:
an acquisition module (1201) for:
obtaining a bitstream;
obtaining an aspect ratio of a current block;
obtaining a context model index of the current block according to the aspect ratio;
obtaining a value of a geometric partition flag of the current block from the bitstream according to the context model index of the current block, wherein the geometric partition flag of the current block indicates whether the current block uses a geometric partition mode; and
a decoding module (1202) for decoding the current block according to the value of the geometric partition flag;
wherein the aspect ratio is obtained according to the following equation:
Ratio = 1 << abs(log2(width) - log2(height)),
wherein height and width in the equation are the height and the width of the current block, abs() is the absolute value operator, log2() is the base-2 logarithm, and << is the left shift operation;
wherein the context model index of the geometric partition flag of the current block is derived by the following formula:
ctxInc = Ratio > 4 ? 3 : (condL + condA),
wherein ctxInc is the context model index; condL indicates whether the left neighboring block uses the geometric partition mode, condL being equal to 1 if the left neighboring block uses the geometric partition mode and equal to 0 if the left neighboring block does not use the geometric partition mode; and condA indicates whether the upper neighboring block uses the geometric partition mode, condA being equal to 1 if the upper neighboring block uses the geometric partition mode and equal to 0 if the upper neighboring block does not use the geometric partition mode.
22. The decoder of claim 21, wherein the acquisition module is further configured to: if the aspect ratio is greater than a predefined threshold, obtain a context model index of 3 for the current block, the predefined threshold being 2^n, where n is a positive integer.
23. The decoder according to claim 21 or 22, wherein the acquisition module is further configured to: if the aspect ratio is equal to or smaller than a predefined threshold, obtain the context model index of the current block according to at least one of the triangular partition mode and the geometric partition mode of neighboring blocks adjacent to the current block, wherein the neighboring blocks comprise a left neighboring block and an upper neighboring block.
24. A decoder (1200), comprising:
an acquisition module (1201) for:
obtaining a bitstream;
obtaining a context model index of a current block according to at least one of information of a triangular partition mode and information of a geometric partition mode of neighboring blocks adjacent to the current block, wherein the neighboring blocks comprise a left neighboring block and an upper neighboring block;
obtaining a value of a geometric partition flag of the current block from the bitstream according to the context model index of the current block, wherein the geometric partition flag of the current block indicates whether the current block uses a geometric partition mode, and the context model index of the geometric partition flag of the current block is derived by the following formula: ctxInc = Ratio > 4 ? 3 : (condL + condA), wherein condL indicates whether the left neighboring block uses the geometric partition mode, condL being equal to 1 if the left neighboring block uses the geometric partition mode and equal to 0 if the left neighboring block does not use the geometric partition mode; and condA indicates whether the upper neighboring block uses the geometric partition mode, condA being equal to 1 if the upper neighboring block uses the geometric partition mode and equal to 0 if the upper neighboring block does not use the geometric partition mode; and
a decoding module (1202) for decoding the current block according to the value of the geometric partition flag.
25. The decoder of claim 24, wherein the acquisition module is further configured to: when neither the left neighboring block nor the upper neighboring block uses the geometric partition mode or the triangular partition mode, obtain a context model index of 0 for the current block.
26. The decoder according to claim 24 or 25, wherein the acquisition module is further configured to: when one of the left neighboring block and the upper neighboring block uses the geometric partition mode or the triangular partition mode, obtain a context model index of 1 for the current block.
27. The decoder according to any of claims 24 to 26, wherein the acquisition module is further configured to: when both the left neighboring block and the upper neighboring block use the geometric partition mode or the triangular partition mode, obtain a context model index of 2 for the current block.
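
As an illustration only, the context model index derivation recited in the claims (Ratio = 1 << abs(log2(width) - log2(height)) and ctxInc = Ratio > 4 ? 3 : (condL + condA)) might be sketched in Python as follows. The function names aspect_ratio and ctx_inc are hypothetical, the sketch assumes power-of-two block dimensions so that log2 is an integer, and the neighbor usage is passed in as plain booleans rather than derived from decoded syntax elements.

```python
def aspect_ratio(width, height):
    # Ratio = 1 << abs(log2(width) - log2(height)).
    # Assumption: width and height are powers of two, so log2 is exact.
    for dim in (width, height):
        if dim <= 0 or dim & (dim - 1):
            raise ValueError("block dimensions must be powers of two")
    log2_w = width.bit_length() - 1
    log2_h = height.bit_length() - 1
    return 1 << abs(log2_w - log2_h)


def ctx_inc(width, height, left_uses_geo, above_uses_geo):
    # ctxInc = Ratio > 4 ? 3 : (condL + condA)
    cond_l = 1 if left_uses_geo else 0   # condL: left neighboring block
    cond_a = 1 if above_uses_geo else 0  # condA: upper neighboring block
    return 3 if aspect_ratio(width, height) > 4 else cond_l + cond_a


if __name__ == "__main__":
    # Elongated block: the aspect ratio alone selects the context.
    print(ctx_inc(64, 8, True, True))
    # Moderately shaped block: the neighbors select the context.
    print(ctx_inc(32, 8, True, False))
```

With these inputs, a 64 x 8 block has Ratio = 8 and always maps to index 3, whereas a 32 x 8 block has Ratio = 4 and maps to 0, 1 or 2 depending on how many of the two neighboring blocks use the geometric partition mode.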
CN202080060441.4A 2019-08-27 2020-08-26 Encoder, decoder and corresponding methods for CABAC coding of indices of geometric partition flags Active CN114303380B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
EP2019072797 2019-08-27
EPPCT/EP2019/072797 2019-08-27
PCT/CN2020/111338 WO2021037053A1 (en) 2019-08-27 2020-08-26 An encoder, a decoder and corresponding methods of cabac coding for the indices of geometric partition flag

Publications (2)

Publication Number Publication Date
CN114303380A (en) 2022-04-08
CN114303380B (en) 2024-04-09

Family

ID=74684252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080060441.4A Active CN114303380B (en) 2019-08-27 2020-08-26 Encoder, decoder and corresponding methods for CABAC coding of indices of geometric partition flags

Country Status (3)

Country Link
EP (1) EP4022925A4 (en)
CN (1) CN114303380B (en)
WO (1) WO2021037053A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230008488A1 (en) * 2021-07-07 2023-01-12 Tencent America LLC Entropy coding for intra prediction modes
WO2023239879A1 (en) * 2022-06-09 2023-12-14 Beijing Dajia Internet Information Technology Co., Ltd. Methods and devices for geometric partitioning mode with adaptive blending
WO2024153151A1 (en) * 2023-01-18 2024-07-25 Douyin Vision Co., Ltd. Method, apparatus, and medium for video processing

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019009503A1 (en) * 2017-07-07 2019-01-10 삼성전자 주식회사 Video coding method and device, video decoding method and device
CN109644272A (en) * 2016-09-06 2019-04-16 高通股份有限公司 Geometric type priority for construction candidate list
WO2019098758A1 (en) * 2017-11-16 2019-05-23 한국전자통신연구원 Image encoding/decoding method and device, and recording medium storing bitstream
WO2019138998A1 (en) * 2018-01-12 2019-07-18 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ Encoding device, decoding device, encoding method, and decoding method
GB201909043D0 (en) * 2019-06-24 2019-08-07 Canon Kk Residual signalling

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8879632B2 (en) * 2010-02-18 2014-11-04 Qualcomm Incorporated Fixed point implementation for geometric motion partitioning
CN102724493B (en) * 2011-01-18 2014-06-25 清华大学 Coding and decoding methods of intra-frame prediction modes based on image blocks and codec
WO2013157791A1 (en) * 2012-04-15 2013-10-24 삼성전자 주식회사 Method and apparatus for determining reference images for inter prediction
CN110870308A (en) * 2017-06-30 2020-03-06 夏普株式会社 System and method for converting pictures into video blocks for video encoding by geometrically adaptive block partitioning
US11477492B2 (en) * 2017-08-04 2022-10-18 Google Inc. Adaptation for entropy coding of blocks of image data
CN118214857A (en) * 2017-10-26 2024-06-18 英迪股份有限公司 Method and apparatus for asymmetric subblock-based image encoding/decoding

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
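Semih Esenlik. JVET-O0489-v2, Non-CE4: Geometrical partitioning for inter blocks. JVET-O0489. 2019, full text. *
袁媛; 郑萧桢; 何芸. A new transform structure for geometric block partitioning in video coding. Journal of Shanghai University (Natural Science Edition). 2013, (03), full text. *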

Also Published As

Publication number Publication date
WO2021037053A1 (en) 2021-03-04
EP4022925A4 (en) 2022-11-16
EP4022925A1 (en) 2022-07-06
CN114303380A (en) 2022-04-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant