CN117063464A - IntraBC using wedgelet partitioning - Google Patents


Info

Publication number
CN117063464A
Authority
CN
China
Prior art keywords
block
partition
video
prediction
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280019661.1A
Other languages
Chinese (zh)
Inventor
赵欣 (Xin Zhao)
许晓中 (Xiaozhong Xu)
刘杉 (Shan Liu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent America LLC
Original Assignee
Tencent America LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US17/974,068 (published as US20230135166A1)
Application filed by Tencent America LLC filed Critical Tencent America LLC
Publication of CN117063464A

Landscapes

  • Compression Or Coding Systems Of TV Signals (AREA)

Abstract

The present disclosure relates to video coding, and more particularly to an intra block copy coding mode using wedgelet partitioning. For example, a method for processing video data is disclosed, which may include: receiving a video bitstream comprising a current block of a video frame; extracting, from the video bitstream, an IntraBC flag indicating that the current block is predicted in an IntraBC mode; determining, from the video bitstream, that the current block is partitioned in a wedgelet partition mode, wherein the current block is partitioned into a plurality of partitions including a first partition and a second partition; identifying at least the first partition and the second partition of the current block; determining a first block vector for predicting the first partition and a second block vector for predicting the second partition, respectively; and decoding the current block based at least on the first block vector and the second block vector.

Description

IntraBC using wedgelet partitioning
Technical Field
This disclosure describes a set of advanced video coding techniques. More particularly, the disclosed technology relates to implementing and enhancing intra block copy (IntraBC or IBC) using a wedgelet partition mode in video encoding and decoding.
Background
The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
Video encoding and decoding may be performed using inter-picture prediction with motion compensation. Uncompressed digital video may include a series of pictures, each picture having a spatial dimension of, for example, 1920x1080 luminance samples and associated full-sampled or subsampled chrominance samples. The series of pictures may have a fixed or variable picture rate (alternatively referred to as a frame rate) of, for example, 60 pictures per second or 60 frames per second. Uncompressed video has specific bit rate requirements for streaming or data processing. For example, video with a pixel resolution of 1920x1080, a frame rate of 60 frames/second, chroma subsampling of 4:2:0, and 8 bits per pixel per color channel requires a bandwidth of approximately 1.5 Gbit/s. One hour of such video requires more than 600 gigabytes of storage space.
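As a back-of-the-envelope check of the figures above, the bandwidth and storage numbers can be reproduced with a few lines of arithmetic (an illustrative sketch, not part of any coding standard):

```python
# 1080p60 video, 4:2:0 chroma subsampling, 8 bits per sample.
LUMA_W, LUMA_H, FPS, BIT_DEPTH = 1920, 1080, 60, 8

# 4:2:0 halves each chroma plane in both dimensions, so the two chroma
# planes together add 2 * 1/4 = 0.5 samples per luma sample.
samples_per_frame = LUMA_W * LUMA_H * 1.5

bits_per_second = samples_per_frame * BIT_DEPTH * FPS
print(f"{bits_per_second / 1e9:.2f} Gbit/s")    # ~1.49 Gbit/s

gigabytes_per_hour = bits_per_second * 3600 / 8 / 1e9
print(f"{gigabytes_per_hour:.0f} GB per hour")  # ~672 GB, i.e., over 600 GB
```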
One purpose of video encoding and decoding is to reduce redundancy in an uncompressed input video signal by compression. In some cases, compression helps reduce the bandwidth and/or storage space requirements described above by two orders of magnitude or more. Lossless compression and lossy compression, as well as combinations thereof, may be employed. Lossless compression refers to a technique in which an exact copy of the original signal can be reconstructed from the compressed original signal via a decoding process. Lossy compression refers to an encoding/decoding process in which the original video information is not fully retained during encoding and cannot be fully recovered during decoding. When lossy compression is used, the reconstructed signal may be different from the original signal, but the distortion between the original and reconstructed signals is small enough that the reconstructed signal is useful for the intended application, despite some loss of information. In the case of video, lossy compression is widely used in many applications. The amount of distortion that is tolerated depends on the application. For example, users of certain consumer video streaming applications may tolerate higher distortion than users of movie or television broadcast applications. The achievable compression ratio for a particular coding algorithm may be selected or adjusted to reflect various distortion tolerances: higher tolerable distortion generally allows for coding algorithms that produce higher losses and higher compression ratios.
Video encoders and decoders can utilize several broad classes and steps of techniques, including, for example, motion compensation, fourier transforms, quantization, and entropy coding.
Video codec technology may include a technique known as intra coding. In intra coding, sample values are represented without reference to samples or other data from previously reconstructed reference pictures. In some video codecs, a picture is spatially subdivided into blocks of samples. When all sample blocks are encoded in intra mode, the picture may be referred to as an intra picture. Intra pictures and their derivatives (e.g., independent decoder refresh pictures) may be used to reset decoder states and thus may be used as the first picture in an encoded video bitstream and video session, or as a still image. After intra prediction, samples of a block may be transformed into the frequency domain, and the transform coefficients so generated may be quantized before entropy encoding. Intra prediction is a technique that minimizes sample values in the pre-transform domain. In some cases, the smaller the DC value after transformation and the smaller the AC coefficients, the fewer bits are needed to represent the entropy-encoded block at a given quantization step.
For example, conventional intra coding known from the MPEG-2 generation of coding techniques does not use intra prediction. However, some newer video compression techniques include techniques that attempt to encode/decode blocks based on, for example, surrounding sample data and/or metadata obtained during the encoding and/or decoding of spatially neighboring blocks that precede, in decoding order, the block being intra coded or decoded. Such techniques are hereinafter referred to as "intra prediction" techniques. Note that, in at least some cases, the reference data used for intra prediction is taken only from the current picture under reconstruction, not from other reference pictures.
There may be many different forms of intra prediction. When more than one such technique is available in a given video coding technique, the technique used may be referred to as intra-prediction mode. One or more intra prediction modes may be provided in a particular codec. In some cases, the modes may have sub-modes and/or may be associated with various parameters, and the mode/sub-mode information and intra-coding parameters of the video block may be encoded individually or collectively, including in the mode codeword. Which codeword to use for a given mode, sub-mode and/or parameter combination has an impact on the coding efficiency obtained by intra-prediction and also on the entropy coding technique that converts the codeword into a bit stream.
Certain intra prediction modes were introduced with H.264, refined in H.265, and further refined in newer coding techniques such as the Joint Exploration Model (JEM), Versatile Video Coding (VVC), and the Benchmark Set (BMS). In general, for intra prediction, neighboring sample values that are already available are used to form a prediction block. For example, available values of a particular set of neighboring samples along certain directions and/or lines may be copied into the prediction block. A reference to the direction in use may be coded in the bitstream or may itself be predicted.
Referring to fig. 1A, depicted at the lower right is a subset of nine prediction directions out of the 33 possible intra prediction directions specified in H.265 (corresponding to the 33 angular modes of the 35 intra modes specified in H.265). The point (101) where the arrows converge represents the sample being predicted. The arrows indicate the direction from which neighboring samples are used to predict the sample at 101. For example, arrow (102) indicates that sample (101) is predicted from one or more neighboring samples to the upper right, at a 45° angle to the horizontal. Similarly, arrow (103) indicates that sample (101) is predicted from one or more neighboring samples to the lower left of sample (101), at a 22.5° angle to the horizontal.
Still referring to fig. 1A, a square block (104) of 4x4 samples is depicted at the top left (indicated by the thick dashed line). The square block (104) includes 16 samples, each labeled with an "S", its position in the Y dimension (e.g., row index), and its position in the X dimension (e.g., column index). For example, sample S21 is the second sample in the Y dimension (from the top) and the first sample in the X dimension (from the left). Similarly, sample S44 is the fourth sample in block (104) in both the Y and X dimensions. Since the block is 4x4 samples in size, S44 is at the bottom right. Reference samples following a similar numbering scheme are also shown. The reference samples are labeled with an "R", their Y position (e.g., row index), and their X position (column index) relative to the block (104). In H.264 and H.265, prediction samples adjacent to the block under reconstruction are used.
Intra picture prediction of block 104 may begin by copying reference sample values from the neighboring samples indicated by the signaled prediction direction. For example, assume that the encoded video bitstream includes signaling indicating, for this block 104, the prediction direction of arrow (102), that is, prediction from one or more reference samples to the upper right at a 45° angle to the horizontal. In this case, samples S41, S32, S23, and S14 are predicted from the same reference sample R05. Sample S44 is then predicted from reference sample R08.
In some cases, the values of multiple reference samples may be combined, for example by interpolation, in order to calculate a reference sample, especially when the direction is not an exact multiple of 45°.
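For illustration, the following sketch reproduces the 45° copy operation of the fig. 1A example, in which block sample S(r, c) takes the value of top reference sample R0(r+c); the indexing convention and helper name are assumptions made for exposition, not H.264/H.265 syntax:

```python
import numpy as np

def predict_45_degrees(ref_top: np.ndarray, n: int = 4) -> np.ndarray:
    """45-degree intra prediction as in the fig. 1A example: block sample
    S(r, c) (1-indexed) copies top reference sample R0(r + c), so S41,
    S32, S23, and S14 all copy R05, and S44 copies R08."""
    pred = np.empty((n, n), dtype=ref_top.dtype)
    for r in range(1, n + 1):
        for c in range(1, n + 1):
            pred[r - 1, c - 1] = ref_top[r + c]  # ref_top[k] holds R0k
    return pred

# Placeholder values for the reference row R00..R08. A fractional-angle
# direction would instead interpolate between two neighboring references.
print(predict_45_degrees(np.arange(9)))
```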
As video coding technology has continued to evolve, the number of possible directions has increased. In H.264 (2003), for example, nine different directions are available for intra prediction. This increased to 33 in H.265 (2013), and JEM/VVC/BMS could support up to 65 directions at the time of this disclosure. Experiments have been conducted to help identify the most suitable intra prediction directions, and certain techniques in entropy coding may be used to encode those most suitable directions in a small number of bits, accepting a certain bit penalty for the less likely directions. In addition, the directions themselves may sometimes be predicted from the neighboring directions used in the intra prediction of already decoded neighboring blocks.
Fig. 1B shows a schematic diagram (180) depicting 65 intra-prediction directions according to JEM to illustrate the increase in the number of prediction directions in various coding techniques that evolve over time.
The manner in which bits representing the intra prediction direction are mapped to prediction directions in the encoded video bitstream may vary from one video coding technique to another, and can range, for example, from simple direct mappings of prediction direction to intra prediction modes or codewords, to complex adaptive schemes involving most probable modes, and similar techniques. In all cases, however, certain directions of intra prediction are statistically less likely to occur in video content than certain other directions. Since the goal of video compression is the reduction of redundancy, in a well-designed video coding technique those less likely directions will be represented by a larger number of bits than the more likely directions.
Inter picture prediction or inter prediction may be based on motion compensation. In motion compensation, sample data from a previously reconstructed picture or a portion thereof (reference picture) may be used to predict a newly reconstructed picture or picture portion (e.g., block) after being spatially shifted in a direction indicated by a motion vector (hereinafter MV). In some cases, the reference picture may be the same as the picture currently being reconstructed. MV may have two dimensions X and Y, or three dimensions, the third dimension being an indication of the reference picture in use (similar to the temporal dimension).
In some video compression techniques, a current MV applicable to a certain region of sample data may be predicted from other MVs, for example from those related to other regions of sample data spatially adjacent to the region under reconstruction that precede the current MV in decoding order. Doing so can substantially reduce the amount of data required to code the MVs by removing redundancy among correlated MVs, thereby improving compression efficiency. MV prediction can work effectively, for example, because when coding an input video signal derived from a camera (known as natural video), there is a statistical likelihood that regions larger than the region to which a single MV applies move in a similar direction in the video sequence, and can therefore, in some cases, be predicted using similar motion vectors derived from the MVs of neighboring regions. As a result, the actual MV found for a given region is often similar or identical to the MV predicted from the surrounding MVs. After entropy coding, such an MV may in turn be represented with fewer bits than would be used if it were coded directly rather than predicted from neighboring MVs. In some cases, MV prediction may be an example of lossless compression of a signal (namely, the MVs) derived from the original signal (namely, the sample stream). In other cases, MV prediction itself may be lossy, for example because of rounding errors when computing a predictor from several surrounding MVs.
Various MV prediction mechanisms are described in H.265/HEVC (ITU-T Rec. H.265, "High Efficiency Video Coding", December 2016). Among the many MV prediction mechanisms specified in H.265, one technique, referred to below as "spatial merging", is described here.
In particular, referring to fig. 2, the current block (201) includes samples that the encoder has found, during a motion search, to be predictable from a previous block of the same size that has been spatially shifted. Instead of coding that MV directly, the MV may be derived from metadata associated with one or more reference pictures, for example from the most recent (in decoding order) reference picture, using the MV associated with any one of the five surrounding samples denoted A0, A1 and B0, B1, B2 (202 through 206, respectively). In H.265, MV prediction may use predictors from the same reference picture that the neighboring blocks are using.
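As a rough sketch of this spatial merge idea, the snippet below scans the five neighbor positions and reuses the first available MV; the scan order and the "first available" rule are simplifications for illustration, not the normative H.265 derivation, which constructs a candidate list and signals an index into it:

```python
def spatial_merge_mv(neighbour_mvs: dict):
    """Return a (position, MV) pair borrowed from a spatial neighbour,
    or (None, None) if no neighbour MV is available."""
    for position in ("A0", "A1", "B0", "B1", "B2"):
        mv = neighbour_mvs.get(position)
        if mv is not None:
            return position, mv
    return None, None  # fall back to coding the MV explicitly

print(spatial_merge_mv({"A1": (3, -1), "B0": (2, 0)}))  # ('A1', (3, -1))
```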
Disclosure of Invention
Aspects of the present disclosure relate generally to video encoding and decoding, and more particularly to implementing and enhancing intra block copy (IntraBC or IBC) using a wedgelet partition mode.
Aspects of the present disclosure provide a method for processing video data. The method comprises the following steps: receiving a video bitstream comprising a current block of a video frame; extracting, from the video bitstream, an IntraBC flag indicating that the current block is predicted in an intra block copy (IntraBC) mode; determining, from the video bitstream, that the current block is partitioned in a wedgelet partition mode; identifying at least a first partition and a second partition of the current block; extracting a first block vector and a second block vector for the first partition and the second partition, respectively; and decoding the current block based at least on the first block vector and the second block vector.
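A runnable sketch of the core prediction step of this method is given below; the wedgelet-boundary parameterization, block positions, and block vectors are illustrative assumptions (bitstream parsing and entropy decoding are omitted entirely):

```python
import numpy as np

def wedgelet_mask(h: int, w: int, x0, y0, x1, y1) -> np.ndarray:
    """Boolean mask splitting an h x w block along the straight boundary
    through (x0, y0) and (x1, y1): True on one side, False on the other."""
    ys, xs = np.mgrid[0:h, 0:w]
    # The sign of the cross product tells which side of the line a sample is on.
    return (xs - x0) * (y1 - y0) - (ys - y0) * (x1 - x0) >= 0

def decode_intrabc_wedgelet(recon, x, y, size, bv0, bv1, mask):
    """Predict the size x size block at (x, y) in place: each wedgelet
    partition copies from the already reconstructed area of the same
    frame `recon`, located by its own block vector."""
    for bv, region in ((bv0, mask), (bv1, ~mask)):
        ref = recon[y + bv[1]: y + bv[1] + size,
                    x + bv[0]: x + bv[0] + size]
        recon[y: y + size, x: x + size][region] = ref[region]

# Example: an 8x8 block at (32, 32) split by a diagonal boundary; the two
# partitions copy from reference blocks to the left of and above the block.
recon = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
mask = wedgelet_mask(8, 8, 0, 0, 7, 7)
decode_intrabc_wedgelet(recon, 32, 32, 8, bv0=(-16, 0), bv1=(0, -16), mask=mask)
```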
Aspects of the present disclosure also provide another method for processing video data. The method comprises the following steps: receiving a video bitstream including a current block of a video frame, wherein the current block is predicted in an intra block copy (IntraBC) mode using compound prediction; determining that the current block is partitioned into at least a first partition and a second partition in a wedgelet partition mode; determining at least two reference blocks for the first partition; determining a compound reference block based on a weighted sum of the at least two reference blocks; and reconstructing the first partition based on the compound reference block.
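The weighted-sum step of this second method can be sketched as follows; equal weights and 8-bit clipping are assumptions made for illustration, since the actual weights would be defined by the codec:

```python
import numpy as np

def compound_reference(ref_a: np.ndarray, ref_b: np.ndarray,
                       w_a: float = 0.5, w_b: float = 0.5) -> np.ndarray:
    """Blend two reference blocks into one compound reference block."""
    blended = w_a * ref_a.astype(np.float64) + w_b * ref_b.astype(np.float64)
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)

ref_a = np.full((4, 4), 100, dtype=np.uint8)
ref_b = np.full((4, 4), 50, dtype=np.uint8)
print(compound_reference(ref_a, ref_b))  # every sample becomes 75
```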
Aspects of the present disclosure also provide a video encoding or decoding device or apparatus comprising circuitry configured to perform any of the above method implementations.
Aspects of the present disclosure also provide a non-transitory computer-readable medium storing instructions that, when executed by a computer for video decoding and/or encoding, cause the computer to perform a method for video decoding and/or encoding.
Drawings
Further features, properties and various advantages of the disclosed subject matter will become more apparent from the following detailed description and drawings, in which:
FIG. 1A shows a schematic diagram of an example subset of intra prediction directional modes;
FIG. 1B shows a diagram of exemplary intra prediction directions;
FIG. 2 shows a schematic diagram of a current block and its surrounding spatial merge candidates for motion vector prediction in one example;
FIG. 3 shows a schematic diagram of a simplified block diagram of a communication system (300) according to an example embodiment;
FIG. 4 shows a schematic diagram of a simplified block diagram of a communication system (400) according to an example embodiment;
FIG. 5 shows a schematic diagram of a simplified block diagram of a video decoder according to an example embodiment;
FIG. 6 shows a schematic diagram of a simplified block diagram of a video encoder according to an example embodiment;
FIG. 7 shows a block diagram of a video encoder according to another example embodiment;
FIG. 8 shows a block diagram of a video decoder according to another example embodiment;
FIG. 9 illustrates a scheme of coding block partitioning according to an example embodiment of the present disclosure;
FIG. 10 illustrates another scheme of coding block partitioning according to an example embodiment of the present disclosure;
FIG. 11 illustrates another scheme of coding block partitioning according to an example embodiment of the present disclosure;
FIG. 12 illustrates an example partitioning of a basic block into coded blocks according to an example partitioning scheme;
FIG. 13 illustrates an example ternary segmentation scheme;
FIG. 14 illustrates an example quadtree binary tree code block partitioning scheme;
Fig. 15 illustrates a scheme for dividing an encoded block into a plurality of transform blocks and an encoding order of the transform blocks according to an example embodiment of the present disclosure;
FIG. 16 illustrates another scheme for partitioning an encoded block into multiple transform blocks and an encoding order of the transform blocks according to an example embodiment of the present disclosure;
FIG. 17 illustrates another scheme for partitioning an encoded block into multiple transform blocks according to an example embodiment of the present disclosure;
FIG. 18 illustrates the concept of Intra Block Copy (IBC) for predicting a current encoded block using reconstructed encoded blocks in the same frame;
FIG. 19 shows an example reconstructed sample that may be used as a reference sample for an IBC;
FIG. 20 illustrates an example reconstructed sample that may be used as a reference sample for an IBC with some example limitations;
FIG. 21 illustrates an example on-chip Reference Sample Memory (RSM) update mechanism for IBC;
FIG. 22 illustrates a spatial view of the example on-chip RSM update mechanism of FIG. 21;
FIG. 23 illustrates another example on-chip Reference Sample Memory (RSM) update mechanism for IBC;
FIG. 24 illustrates a comparison of spatial views of an example RSM update mechanism for an IBC of a horizontally split superblock and a vertically split superblock;
FIG. 25 illustrates example non-local and local search areas for an IBC reference block;
FIG. 26 illustrates example prediction blocks and limitations imposed on prediction block selection for IBC employing both local and non-local reference block search regions;
FIG. 27 illustrates example wedgelet partition patterns with various partition boundaries;
FIG. 28 illustrates an example IntraBC block partitioned into two regions using wedgelet partitioning, wherein each region has a block vector for locating a corresponding reference block;
FIG. 29 shows a flowchart of a method according to an example embodiment of the present disclosure; and
FIG. 30 shows a schematic diagram of a computer system according to an example embodiment of the present disclosure.
Detailed Description
The present invention will now be described in detail hereinafter with reference to the accompanying drawings, which form a part hereof, and which show by way of illustration specific examples of embodiments. It should be noted, however, that this invention may be embodied in many different forms and, therefore, the covered or claimed subject matter is not to be construed as limited to any of the embodiments set forth below. It is also noted that the present disclosure may be embodied as a method, apparatus, component, or system. Thus, embodiments of the present disclosure may take the form of, for example, hardware, software, firmware, or any combination thereof.
Throughout the specification and claims, terms may have nuanced meanings suggested or implied in context beyond the meanings explicitly stated. The phrase "in one embodiment" or "in some embodiments" as used herein does not necessarily refer to the same embodiment, and the phrase "in another embodiment" or "in other embodiments" as used herein does not necessarily refer to a different embodiment. Likewise, the phrase "in one implementation" or "in some implementations" as used herein does not necessarily refer to the same implementation, and the phrase "in another implementation" or "in other implementations" as used herein does not necessarily refer to a different implementation. It is intended, for example, that the claimed subject matter include combinations of example embodiments/implementations in whole or in part.
Generally, terminology may be understood at least in part from usage in context. For example, terms such as "and", "or", or "and/or" as used herein may include a variety of meanings that may depend at least in part on the context in which such terms are used. Typically, "or", if used to associate a list such as A, B, or C, is intended to mean A, B, and C, here used in the inclusive sense, as well as A, B, or C, here used in the exclusive sense. In addition, the term "one or more" or "at least one" as used herein, depending at least in part on context, may be used to describe any feature, structure, or characteristic in a singular sense, or may be used to describe combinations of features, structures, or characteristics in a plural sense. Similarly, terms such as "a", "an", or "the" may be understood to convey a singular usage or a plural usage, depending at least in part on context. Further, the term "based on" may be understood as not necessarily intended to convey an exclusive set of factors, and may instead allow for the existence of additional factors not necessarily expressly described, again depending at least in part on context.
Fig. 3 shows a simplified block diagram of a communication system (300) according to an embodiment of the disclosure. The communication system (300) includes a plurality of terminal devices that can communicate with one another via, for example, a network (350). For example, the communication system (300) includes a first pair of terminal devices (310) and (320) interconnected via the network (350). In the example of fig. 3, the first pair of terminal devices (310) and (320) may perform unidirectional transmission of data. For example, the terminal device (310) may encode video data (e.g., a stream of video pictures captured by the terminal device (310)) for transmission to the other terminal device (320) via the network (350). The encoded video data can be transmitted in the form of one or more encoded video bitstreams. The terminal device (320) may receive the encoded video data from the network (350), decode the encoded video data to recover the video pictures, and display the video pictures according to the recovered video data. Unidirectional data transmission may be implemented in media serving applications and the like.
In another example, the communication system (300) includes a second pair of terminal devices (330) and (340) that perform bi-directional transmission of encoded video data, such as for encoded video data generated during a video conference. For bi-directional transmission of data, in one example, each of the terminal device (330) and the terminal device (340) may encode video data (e.g., a video picture stream captured by the terminal device) for transmission to the other of the terminal device (330) and the terminal device (340) via the network (350). Each of the terminal device (330) and the terminal device (340) may also receive encoded video data transmitted by the other of the terminal device (330) and the terminal device (340), and may decode the encoded video data to recover the video picture, and may display the video picture on an accessible display device according to the recovered video data.
In the example of fig. 3, the terminal device (310), the terminal device (320), the terminal device (330), and the terminal device (340) may be implemented as servers, personal computers, and smartphones, but the applicability of the underlying principles of the present disclosure may not be so limited. Embodiments of the present disclosure may be implemented in desktop computers, laptop computers, tablet computers, media players, wearable computers, dedicated video conferencing equipment, and the like. The network (350) represents any number or type of networks that convey encoded video data between the terminal devices (310), (320), (330), and (340), including, for example, wired and/or wireless communication networks. The communication network (350) may exchange data in circuit-switched, packet-switched, and/or other types of channels. Representative networks include telecommunication networks, local area networks, wide area networks, and/or the internet. For the purposes of the present discussion, the architecture and topology of the network (350) may be immaterial to the operation of the present disclosure unless explicitly explained herein.
As an example of an application of the disclosed subject matter, fig. 4 illustrates the placement of a video encoder and video decoder in a video streaming environment. The disclosed subject matter is equally applicable to other video applications including, for example, video conferencing, digital television, broadcasting, gaming, virtual reality, storing compressed video on digital media including CDs, DVDs, memory sticks, etc.
The video streaming system may include a video capture subsystem (413) that can include a video source (401), such as a digital camera, creating, for example, a stream (402) of uncompressed video pictures or images. In one example, the video picture stream (402) includes samples recorded by the digital camera of the video source 401. The video picture stream (402), depicted as a bold line to emphasize its high data volume when compared to the encoded video data (404) (or encoded video bitstream), can be processed by an electronic device (420) that includes a video encoder (403) coupled to the video source (401). The video encoder (403) may include hardware, software, or a combination thereof to enable or implement aspects of the disclosed subject matter as described in more detail below. The encoded video data (404) (or encoded video bitstream (404)), depicted as a thin line to emphasize its lower data volume when compared to the uncompressed video picture stream (402), can be stored on a streaming server (405) for future use, or delivered directly to a downstream video device (not shown). One or more streaming client subsystems, such as the client subsystems (406) and (408) in fig. 4, can access the streaming server (405) to retrieve copies (407) and (409) of the encoded video data (404). The client subsystem (406) may include, for example, a video decoder (410) in an electronic device (430). The video decoder (410) decodes the incoming copy (407) of the encoded video data and creates an outgoing video picture stream (411) that is uncompressed and can be rendered on a display (412) (e.g., a display screen) or another rendering device (not shown). The video decoder 410 may be configured to perform some or all of the various functions described in this disclosure. In some streaming systems, the encoded video data (404), (407), and (409) (e.g., video bitstreams) can be encoded according to certain video encoding/compression standards. Examples of such standards include ITU-T Recommendation H.265. In one example, a video coding standard under development is informally known as Versatile Video Coding (VVC). The disclosed subject matter may be used in the context of VVC and other video coding standards.
It is noted that the electronic devices (420) and (430) may include other components (not shown). For example, the electronic device (420) may include a video decoder (not shown), and the electronic device (430) may also include a video encoder (not shown).
Fig. 5 shows a block diagram of a video decoder (510) according to any of the embodiments of the disclosure below. The video decoder (510) may be included in an electronic device (530). The electronic device (530) may include a receiver (531) (e.g., a receiving circuit). A video decoder (510) may be used in place of the video decoder (410) in the example of fig. 4.
The receiver (531) may receive one or more encoded video sequences to be decoded by the video decoder (510). In the same or another embodiment, one encoded video sequence may be decoded at a time, wherein the decoding of each encoded video sequence is independent of the decoding of the other encoded video sequences. Each video sequence may be associated with a plurality of video frames or images. The encoded video sequence may be received from a channel (501), which may be a hardware/software link to a storage device storing encoded video data or a stream source transmitting encoded video data. The receiver (531) may receive encoded video data and other data, e.g., encoded audio data and/or auxiliary data streams, which may be forwarded to their respective processing circuits (not shown). The receiver (531) may separate the encoded video sequence from other data. To combat network jitter, a buffer memory (515) may be coupled between the receiver (531) and an entropy decoder/parser (520) (hereinafter "parser (520)"). In some applications, the buffer memory (515) may be implemented as part of the video decoder (510). In other applications, it may be external to and separate from the video decoder (510) (not shown). In other applications, there may be a buffer memory (not shown) external to the video decoder (510), e.g., for the purpose of combating network jitter, and another additional buffer memory (515) internal to the video decoder (510), e.g., for processing playback timing. The buffer memory (515) may be unnecessary or small when the receiver (531) receives data from a store/forward device with sufficient bandwidth and controllability or from an equivalent synchronous network. For use over best effort packet networks such as the internet, a buffer memory (515) of sufficient size, which may be relatively large, may be required. Such buffer memory may be implemented with an adaptive size and may be at least partially implemented in an operating system or similar element (not shown) external to the video decoder (510).
The video decoder (510) may include a parser (520) to reconstruct symbols (521) from the encoded video sequence. Categories of those symbols include information used to manage operation of the video decoder (510), and potentially information to control a rendering device such as a display (512) (e.g., a display screen) that may or may not be an integral part of the electronic device (530) but can be coupled to the electronic device (530), as shown in fig. 5. The control information for the rendering device(s) may be in the form of Supplemental Enhancement Information (SEI) messages or Video Usability Information (VUI) parameter set fragments (not shown). The parser (520) may parse/entropy-decode the encoded video sequence that it receives. The entropy coding of the encoded video sequence can be in accordance with a video coding technique or standard, and can follow various principles, including variable length coding, Huffman coding, arithmetic coding with or without context sensitivity, and so forth. The parser (520) may extract from the encoded video sequence a set of subgroup parameters for at least one of the subgroups of pixels in the video decoder, based on at least one parameter corresponding to the subgroups. The subgroups can include Groups of Pictures (GOPs), pictures, tiles, slices, macroblocks, Coding Units (CUs), blocks, Transform Units (TUs), Prediction Units (PUs), and so forth. The parser (520) may also extract from the encoded video sequence information such as transform coefficients (e.g., Fourier transform coefficients), quantizer parameter values, motion vectors, and so forth.
The parser (520) may perform entropy decoding/parsing operations on the video sequence received from the buffer memory (515) to create symbols (521).
Depending on the type of the encoded video picture or parts thereof (such as inter and intra pictures, inter and intra blocks), and other factors, reconstruction of the symbols (521) can involve multiple different processing or functional units. The units involved and how they are involved may be controlled by the subgroup control information parsed from the encoded video sequence by the parser (520). For simplicity, the flow of such subgroup control information between the parser (520) and the multiple processing or functional units below is not depicted.
In addition to the functional blocks already mentioned, the video decoder (510) can be conceptually subdivided into a plurality of functional units as described below. In practical implementations operating under commercial constraints, many of these functional units interact closely with each other and may be at least partially integrated with each other. However, for clarity in describing various functions of the disclosed subject matter, a conceptual subdivision into the following functional units is employed in the following disclosure.
The first unit may comprise a scaler/inverse transform unit (551). The scaler/inverse transform unit (551) may receive quantized transform coefficients as well as control information, including information indicating which type of inverse transform to use, block size, quantization factors/parameters, quantization scaling matrices, and the like, as symbols (521) from the parser (520). The scaler/inverse transform unit (551) may output blocks comprising sample values that can be input into an aggregator (555).
In some cases, the output samples of the scaler/inverse transform unit (551) may pertain to an intra coded block, i.e., a block that does not use prediction information from previously reconstructed pictures but can use prediction information from previously reconstructed parts of the current picture. Such prediction information may be provided by the intra picture prediction unit (552). In some cases, the intra picture prediction unit (552) may generate a block of the same size and shape as the block under reconstruction, using surrounding block information that has already been reconstructed and stored in the current picture buffer (558). The current picture buffer (558) buffers, for example, the partially reconstructed current picture and/or the fully reconstructed current picture. In some implementations, the aggregator (555) may add, on a per-sample basis, the prediction information that the intra prediction unit (552) has generated to the output sample information provided by the scaler/inverse transform unit (551).
In other cases, the output samples of the scaler/inverse transform unit (551) may pertain to an inter coded, and potentially motion-compensated, block. In such a case, the motion compensated prediction unit (553) may access the reference picture memory (557) to fetch samples used for prediction. After motion compensation of the fetched samples in accordance with the symbols (521) pertaining to the block, those samples may be added by the aggregator (555) to the output of the scaler/inverse transform unit (551) (the output of unit 551 may be referred to as the residual samples or residual signal) so as to generate output sample information. The addresses within the reference picture memory (557) from which the motion compensated prediction unit (553) fetches the prediction samples may be controlled by motion vectors, available to the motion compensated prediction unit (553) in the form of symbols (521) that can have, for example, X and Y components (offsets) and a reference picture component (time). Motion compensation may also include interpolation of sample values as fetched from the reference picture memory (557) when sub-sample accurate motion vectors are in use, and may also be associated with motion vector prediction mechanisms and the like.
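As an illustration of this addressing, the sketch below fetches a prediction block using an integer-accurate motion vector; bounds checking and the sub-sample interpolation mentioned above are omitted, and all names are assumptions for exposition:

```python
import numpy as np

def fetch_prediction(ref_picture: np.ndarray, x: int, y: int,
                     width: int, height: int,
                     mv_x: int, mv_y: int) -> np.ndarray:
    """Read a width x height prediction block for the block at (x, y),
    offset by the motion vector (mv_x, mv_y), from reference memory."""
    return ref_picture[y + mv_y: y + mv_y + height,
                       x + mv_x: x + mv_x + width]

ref = np.arange(64, dtype=np.uint8).reshape(8, 8)
print(fetch_prediction(ref, x=4, y=4, width=2, height=2, mv_x=-2, mv_y=-3))
```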
The output samples of the aggregator (555) may be subjected to various loop filtering techniques in a loop filter unit (556). Video compression techniques may include loop filtering techniques that are controlled by parameters contained in the encoded video sequence (also referred to as an encoded video bitstream) and are available to the loop filtering unit (556) as symbols (521) from the parser (520), and may also be responsive to meta-information obtained during decoding of the encoded image or a previous (in decoding order) portion of the encoded video sequence, as well as to previously reconstructed and loop filtered sample values. Several types of loop filters may be included in various orders as part of loop filter unit 556, as will be described in further detail below.
The output of the loop filter unit (556) may be a sample stream, which may be output to the rendering device (512) and stored in the reference picture memory (557) for future inter picture prediction.
Once fully reconstructed, some coded pictures may be used as reference pictures for future inter-picture prediction. For example, once an encoded image corresponding to a current picture has been fully reconstructed and the encoded picture has been identified as a reference picture (e.g., by a parser (520)), the current picture buffer (558) may become part of a reference picture memory (557) and a new current picture buffer may be reallocated before starting reconstruction of a next encoded picture.
The video decoder (510) may perform decoding operations according to a predetermined video compression technology adopted in a standard such as ITU-T Rec. H.265. The encoded video sequence may conform to a syntax specified by the video compression technology or standard being used, in the sense that the encoded video sequence adheres to both the syntax of the video compression technology or standard and the profiles documented in the video compression technology or standard. Specifically, a profile can select certain tools from all the tools available in the video compression technology or standard as the only tools available for use under that profile. To be standard-compliant, the complexity of the encoded video sequence may also be within bounds defined by the level of the video compression technology or standard. In some cases, levels restrict the maximum picture size, maximum frame rate, maximum reconstruction sample rate (measured in, for example, megasamples per second), maximum reference picture size, and so on. Limits set by levels can, in some cases, be further restricted through Hypothetical Reference Decoder (HRD) specifications and metadata for HRD buffer management signaled in the encoded video sequence.
In some example embodiments, the receiver (531) may receive additional (redundant) data with encoded video. The additional data may be included as part of the encoded video sequence. The video decoder (510) may use the additional data to correctly decode the data and/or more accurately reconstruct the original video data. The additional data may be in the form of, for example, temporal, spatial, or signal-to-noise ratio (SNR) enhancement layers, redundant slices, redundant pictures, forward error correction codes, and the like.
Fig. 6 shows a block diagram of a video encoder (603) according to an example embodiment of the disclosure. The video encoder (603) may be included in an electronic device (620). The electronic device (620) may also include a transmitter (640) (e.g., a transmitting circuit). The video encoder (603) may be used in place of the video encoder (403) in the example of fig. 4.
The video encoder (603) may receive video samples from a video source (601) (which is not part of the electronic device (620) in the example of fig. 6) that may capture video pictures to be encoded by the video encoder (603). In another example, the video source (601) may be implemented as part of an electronic device (620).
The video source (601) may provide the source video sequence to be encoded by the video encoder (603) in the form of a digital video sample stream that can be of any suitable bit depth (for example: 8 bit, 10 bit, 12 bit, ...), any color space (for example: BT.601 YCrCb, RGB, ...), and any suitable sampling structure (for example: YCrCb 4:2:0, YCrCb 4:4:4). In a media serving system, the video source (601) may be a storage device capable of storing previously prepared video. In a videoconferencing system, the video source (601) may be a camera that captures local image information as a video sequence. Video data may be provided as a plurality of individual pictures or images that impart motion when viewed in sequence. The pictures themselves may be organized as a spatial array of pixels, wherein each pixel can comprise one or more samples depending on the sampling structure, color space, and the like in use. A person having ordinary skill in the art can readily understand the relationship between pixels and samples. The description below focuses on samples.
According to some example embodiments, the video encoder (603) may encode and compress pictures of the source video sequence into an encoded video sequence (643) in real-time or under any other temporal constraint required by the application. Implementing the appropriate encoding speed constitutes one function of the controller (650). In some embodiments, the controller (650) may be functionally coupled to and control other functional units as described below. For simplicity, the coupling is not described. Parameters set by the controller (650) may include rate control related parameters (picture skip, quantizer, lambda value of rate distortion optimization technique, …), picture size, group of pictures (GOP) layout, maximum motion vector search range, etc. The controller (650) may be configured to have other suitable functions related to the video encoder (603) optimized for a particular system design.
In some example embodiments, the video encoder (603) is configured to operate in a coding loop. As a simplified description, in one example, the coding loop can include a source encoder (630) (e.g., responsible for creating symbols, such as a symbol stream, based on the input picture to be coded and the reference picture(s)) and a (local) decoder (633) embedded in the video encoder (603). The decoder (633) reconstructs the symbols to create the sample data in a manner similar to what a (remote) decoder would create, even though the embedded decoder 633 processes the symbol stream produced by the source encoder 630 without entropy coding (since any compression between the symbols and the encoded video bitstream in entropy coding may be lossless in the video compression technologies considered in the disclosed subject matter). The reconstructed sample stream (sample data) is input to the reference picture memory (634). Since the decoding of a symbol stream leads to bit-exact results independent of decoder location (local or remote), the content in the reference picture memory (634) is also bit exact between the local encoder and the remote encoder. In other words, the prediction part of the encoder "sees" as reference picture samples exactly the same sample values as the decoder would "see" when using prediction during decoding. This fundamental principle of reference picture synchronicity (and the resulting drift, if synchronicity cannot be maintained, for example because of channel errors) is used to improve coding quality.
The operation of the "local" decoder (633) may be the same as the operation of a "remote" decoder, such as the video decoder (510), which has been described in detail above in connection with fig. 5. However, referring briefly to fig. 5, since symbols are available and the entropy encoder (645) and the decoder (520) may be lossless to encode/decode symbols of the encoded video sequence, the entropy decoding portion of the video decoder (510), including the buffer memory (515) and the decoder (520), may not be implemented entirely in the local decoder (633) in the encoder.
An observation that can be made at this point is that any decoder technology, except the parsing/entropy decoding that may exist only in a decoder, may also need to be present, in substantially identical functional form, in the corresponding encoder. For this reason, the disclosed subject matter may at times focus on decoder operation, which is akin to the decoding portion of an encoder. The description of encoder technologies can therefore be abbreviated, as they are the inverse of the comprehensively described decoder technologies. A more detailed description of the encoder is provided below only in certain areas or aspects.
During operation, in some example implementations, the source encoder (630) may perform motion compensated predictive coding, which codes an input picture predictively with reference to one or more previously coded pictures from the video sequence that are designated as "reference pictures". In this manner, the coding engine (632) codes differences (or residue) in the color channels between pixel blocks of an input picture and pixel blocks of reference picture(s) that may be selected as prediction reference(s) for the input picture. The term "residue" and its adjective form "residual" may be used interchangeably.
The local video decoder (633) may decode encoded video data of pictures that may be designated as reference pictures, based on the symbols created by the source encoder (630). Operations of the coding engine (632) may advantageously be lossy processes. When the encoded video data is decoded at a video decoder (not shown in fig. 6), the reconstructed video sequence may typically be a replica of the source video sequence with some errors. The local video decoder (633) may replicate the decoding processes that would be performed by the video decoder on the reference pictures, and may cause the reconstructed reference pictures to be stored in the reference picture cache (634). In this manner, the video encoder (603) may store copies of the reconstructed reference pictures locally that have the same content (absent transmission errors) as the reconstructed reference pictures that will be obtained by a far-end (remote) video decoder.
The predictor (635) may perform prediction searches for the coding engine (632). That is, for a new picture to be coded, the predictor (635) may search the reference picture memory (634) for sample data (as candidate reference pixel blocks) or certain metadata, such as reference picture motion vectors, block shapes, and so on, that may serve as an appropriate prediction reference for the new picture. The predictor (635) may operate on a sample-block-by-pixel-block basis to find appropriate prediction references. In some cases, as determined by the search results obtained by the predictor (635), the input picture may have prediction references drawn from multiple reference pictures stored in the reference picture memory (634).
The controller (650) may manage the encoding operations of the source encoder (630), including, for example, the setting of parameters and sub-group parameters for encoding video data.
The outputs of all of the foregoing functional units may undergo entropy encoding in an entropy encoder (645). An entropy encoder (645) converts symbols generated by various functional units into an encoded video sequence through lossless compression of the symbols according to techniques such as Huffman coding, variable length coding, arithmetic coding, and the like.
The transmitter (640) may buffer the encoded video sequence created by the entropy encoder (645) in preparation for transmission via a communication channel (660), which may be a hardware/software link to a storage device that is to store the encoded video data. The transmitter (640) may combine the encoded video data from the video encoder (603) with other data to be transmitted, e.g., encoded audio data and/or an auxiliary data stream (source not shown).
The controller (650) may manage the operation of the video encoder (603). During encoding, the controller (650) may assign a particular encoded picture type to each encoded picture, which may affect the encoding techniques that may be applied to the respective picture. For example, a picture may generally be assigned one of the following picture types:
An intra picture (I picture) may be a picture that is encoded and decoded without using any other picture in the sequence as a prediction source. Some video codecs allow for different types of intra pictures, including, for example, independent decoder refresh ("IDR") pictures. Those variations of I pictures and their corresponding applications and features are known to those skilled in the art.
A predicted picture (P-picture) may be a picture encoded and decoded using intra prediction or inter prediction, which predicts sample values of each block using at most one motion vector and a reference index.
A bi-predictive picture (B-picture) may be a picture that is encoded and decoded using intra prediction or inter prediction, which predicts the sample values of each block using at most two motion vectors and reference indices. Similarly, a multiple-predictive picture may use more than two reference pictures and associated metadata for the reconstruction of a single block.
A source picture may typically be spatially subdivided into a plurality of sample blocks (for example, blocks of 4x4, 8x8, 4x8, or 16x16 samples each) and encoded on a block-by-block basis. The blocks may be coded predictively with reference to other (already coded) blocks, as determined by the coding assignment applied to the blocks' respective pictures. For example, blocks of I pictures may be coded non-predictively, or may be coded predictively with reference to already coded blocks of the same picture (spatial prediction or intra prediction). Pixel blocks of P pictures may be coded predictively, via spatial prediction or via temporal prediction with reference to one previously coded reference picture. Blocks of B pictures may be coded predictively, via spatial prediction or via temporal prediction with reference to one or two previously coded reference pictures. The source pictures or the intermediate processed pictures may be subdivided into other types of blocks for other purposes. The division of coding blocks and the other types of blocks may or may not follow the same manner, as described in further detail below.
The video encoder (603) may perform the encoding operations according to a predetermined video encoding technique or standard (e.g., ITU-t rec.h.265). In its operation, the video encoder (603) may perform various compression operations, including predictive coding operations that exploit temporal and spatial redundancies in the input video sequence. Thus, the encoded video data may conform to the syntax specified by the video encoding technique or standard being used.
In some example embodiments, the transmitter (640) may transmit additional data with the encoded video. The source encoder (630) may include such data as part of the encoded video sequence. The additional data may comprise temporal/spatial/SNR enhancement layers, other forms of redundant data (e.g., redundant pictures and slices), SEI messages, VUI parameter set fragments, and so on.
A video may be captured as a plurality of source pictures (video pictures) in a temporal sequence. Intra-picture prediction (often abbreviated to intra prediction) exploits spatial correlation in a given picture, and inter-picture prediction exploits temporal or other correlation between pictures. In one example, a specific picture under encoding/decoding, referred to as the current picture, is partitioned into blocks. When a block in the current picture is similar to a reference block in a previously coded and still buffered reference picture in the video, the block in the current picture may be coded by a vector that is referred to as a motion vector. The motion vector points to the reference block in the reference picture, and, in the case where multiple reference pictures are in use, can have a third dimension that identifies the reference picture.
In some example embodiments, bi-prediction techniques can be used for inter-picture prediction. According to such bi-prediction techniques, two reference pictures are used, such as a first reference picture and a second reference picture, both of which precede the current picture in the video in decoding order (but may respectively be in the past or future in display order). A block in the current picture can be coded by a first motion vector pointing to a first reference block in the first reference picture and a second motion vector pointing to a second reference block in the second reference picture. The block can be jointly predicted by a combination of the first reference block and the second reference block.
Furthermore, merge mode techniques may be used in inter picture prediction to improve coding efficiency.
According to some example embodiments of the present disclosure, predictions, for example inter-picture predictions and intra-picture predictions, are performed in units of blocks. For example, a picture in a sequence of video pictures is partitioned into Coding Tree Units (CTUs) for compression, with the CTUs in a picture having the same size, such as 128x128 pixels, 64x64 pixels, 32x32 pixels, or 16x16 pixels. In general, a CTU may include three parallel Coding Tree Blocks (CTBs): one luma CTB and two chroma CTBs. Each CTU can be recursively split, in a quadtree, into one or multiple Coding Units (CUs). For example, a CTU of 64x64 pixels can be split into one CU of 64x64 pixels or 4 CUs of 32x32 pixels, and each of the 32x32 pixel CUs may be further split into 4 CUs of 16x16 pixels. In some example embodiments, each CU may be analyzed during encoding to determine a prediction type for the CU among various prediction types, such as an inter prediction type or an intra prediction type. The CU may be split into one or more Prediction Units (PUs) depending on the temporal and/or spatial predictability. Generally, each PU includes one luma Prediction Block (PB) and two chroma PBs. In an embodiment, a prediction operation in coding (encoding/decoding) is performed in units of a prediction block. The split of a CU into PUs (or PBs of different color channels) may be performed in various spatial patterns. A luma or chroma PB, for example, may include a matrix of values (e.g., luma values) for samples, such as 8x8 pixels, 16x16 pixels, 8x16 pixels, 16x8 pixels, and so on.
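A toy version of this recursive CTU-to-CU quadtree split is sketched below; the split-decision callback stands in for the encoder's mode analysis and is purely illustrative:

```python
def quadtree_split(x, y, size, should_split, min_size=16):
    """Return the list of leaf CUs, each as (x, y, size), obtained by
    recursively splitting a square block into four equal quadrants."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        cus = []
        for dy in (0, half):
            for dx in (0, half):
                cus += quadtree_split(x + dx, y + dy, half,
                                      should_split, min_size)
        return cus
    return [(x, y, size)]  # a leaf CU: top-left corner and size

# Example: a 64x64 CTU split once into four 32x32 CUs.
print(quadtree_split(0, 0, 64, lambda x, y, s: s == 64))
```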
Fig. 7 shows a schematic diagram of a video encoder (703) according to another example embodiment of the present disclosure. The video encoder (703) is configured to receive a processing block (e.g., a prediction block) of sample values within a current video picture in a sequence of video pictures, and to encode the processing block into an encoded picture that is part of an encoded video sequence. An example video encoder (703) may be used in place of the video encoder (403) in the example of fig. 4.
For example, the video encoder (703) receives a matrix of sample values for a processing block, e.g., a prediction block of 8×8 samples. The video encoder (703) then determines whether the processing block is best encoded using intra mode, inter mode, or bi-predictive mode, using, for example, rate-distortion optimization (Rate-Distortion Optimization, RDO for short). When it is determined that the processing block is to be encoded in intra mode, the video encoder (703) may encode the processing block into the encoded picture using intra prediction techniques; and when it is determined that the processing block is to be encoded in inter mode or bi-predictive mode, the video encoder (703) may encode the processing block into the encoded picture using inter prediction or bi-prediction techniques, respectively. In some example embodiments, merge mode may be used as a sub-mode of inter picture prediction, where the motion vector is derived from one or more motion vector predictors without the benefit of a coded motion vector component outside the predictors. In some example embodiments, there may be a motion vector component applicable to the subject block. Accordingly, the video encoder (703) may include components not explicitly shown in fig. 7, e.g., a mode decision module for determining the prediction mode of the processing block.
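As an illustration of such a mode decision, the following minimal Python sketch compares candidate prediction modes by a Lagrangian rate-distortion cost. The cost model J = D + λ·R, the λ value, and the candidate mode names and figures are assumptions for illustration only, not taken from this disclosure.

```python
# Illustrative sketch of a rate-distortion-based mode decision. The cost
# model J = D + lambda * R and the candidate values are assumptions.

def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits

def choose_mode(candidates, lam: float = 0.85):
    """Pick the (mode, distortion, rate) candidate with the lowest RD cost."""
    best_mode, best_cost = None, float("inf")
    for mode, distortion, rate_bits in candidates:
        cost = rd_cost(distortion, rate_bits, lam)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost

# Hypothetical (distortion, rate) figures for one processing block:
candidates = [("intra", 1200.0, 96.0), ("inter", 900.0, 160.0),
              ("bi_pred", 850.0, 230.0)]
print(choose_mode(candidates))  # -> ('inter', 1036.0) under this lambda
```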
In the example of fig. 7, the video encoder (703) includes an inter-frame encoder (730), an intra-frame encoder (722), a residual calculator (723), a switch (726), a residual encoder (724), a general controller (721), and an entropy encoder (725) coupled together as shown in the example arrangement of fig. 7.
The inter-frame encoder (730) is configured to receive samples of a current block (e.g., a processing block), compare the block to one or more reference blocks in reference pictures (e.g., blocks in previous and subsequent pictures in display order), generate inter prediction information (e.g., descriptions of redundant information, motion vectors, and merge mode information according to inter coding techniques), and calculate an inter prediction result (e.g., a prediction block) based on the inter prediction information using any suitable technique. In some examples, the reference pictures are decoded reference pictures that are decoded, based on the encoded video information, using the decoding unit (633) embedded in the example encoder (620) of fig. 6 (shown as the residual decoder (728) of fig. 7), as described in further detail below.
An intra encoder (722) is configured to receive samples of a current block (e.g., a processed block), compare the block to blocks already encoded in the same picture, generate quantization coefficients after transformation, and in some cases, also generate intra prediction information (e.g., intra prediction direction information according to one or more intra coding techniques). An intra encoder (722) may calculate an intra prediction result (e.g., a prediction block) based on intra prediction information and a reference block in the same picture.
The general controller (721) may be configured to determine general control data and control other components of the video encoder (703) based on the general control data. In one example, a general purpose controller (721) determines a prediction mode for a block and provides control signals to a switch (726) based on the prediction mode. For example, when the prediction mode is an intra mode, the general controller (721) controls the switch (726) to select an intra mode result for use by the residual calculator (723), and controls the entropy encoder (725) to select intra prediction information and include the intra prediction information in the bitstream; when the prediction mode of the block is an inter mode, the general controller (721) controls the switch (726) to select an inter prediction result for use by the residual calculator (723) and controls the entropy encoder (725) to select inter prediction information and include the inter prediction information in the bitstream.
The residual calculator (723) may be configured to calculate a difference (residual data) between a received block and a prediction result of the block selected from the intra encoder (722) or the inter encoder (730). The residual encoder (724) may be configured to encode residual data to generate transform coefficients. In one example, a residual encoder (724) may be configured to convert residual data from a spatial domain to a frequency domain to generate transform coefficients. The transform coefficients are then quantized to obtain quantized transform coefficients. In various example embodiments, the video encoder (703) further includes a residual decoder (728). The residual decoder (728) is configured to perform an inverse transform and generate decoded residual data. The decoded residual data may be suitably used by an intra encoder (722) and an inter encoder (730). For example, the inter-frame encoder (730) may generate a decoded block based on the decoded residual data and the inter-frame prediction information, and the intra-frame encoder (722) may generate a decoded block based on the decoded residual data and the intra-frame prediction information. The decoded blocks are processed appropriately to generate decoded pictures, and the decoded pictures may be buffered in a memory circuit (not shown) and used as reference pictures.
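The following Python sketch illustrates the residual path just described: the residual calculator's difference, a forward transform to the frequency domain, quantization, and the residual decoder's inverse path. A DCT-II (via SciPy) stands in for whatever transforms a particular codec actually uses, and the quantization step size is an illustrative assumption.

```python
import numpy as np
from scipy.fftpack import dct, idct

def forward_transform(residual: np.ndarray) -> np.ndarray:
    """2-D DCT-II of the spatial-domain residual (stand-in transform)."""
    return dct(dct(residual, axis=0, norm="ortho"), axis=1, norm="ortho")

def inverse_transform(coeffs: np.ndarray) -> np.ndarray:
    """2-D inverse DCT, as a residual decoder would apply."""
    return idct(idct(coeffs, axis=0, norm="ortho"), axis=1, norm="ortho")

def quantize(coeffs: np.ndarray, step: float) -> np.ndarray:
    return np.round(coeffs / step)

def dequantize(qcoeffs: np.ndarray, step: float) -> np.ndarray:
    return qcoeffs * step

block = np.random.randint(0, 256, (8, 8)).astype(float)   # received block
prediction = np.full((8, 8), block.mean())                 # selected prediction
residual = block - prediction                              # residual calculator
q = quantize(forward_transform(residual), step=8.0)        # residual encoder
decoded_residual = inverse_transform(dequantize(q, 8.0))   # residual decoder
```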
The entropy encoder (725) may be configured to format the bitstream to include the encoded blocks and perform entropy encoding. The entropy encoder (725) is configured to include various information in the bitstream. For example, the entropy encoder (725) may be configured to include general control data, selected prediction information (e.g., intra prediction information or inter prediction information), residual information, and other suitable information in the bitstream. When a block is encoded in the merge sub-mode of the inter mode or the bi-prediction mode, there may be no residual information.
Fig. 8 shows a schematic diagram of an example video decoder (810) according to another embodiment of the disclosure. A video decoder (810) is configured to receive encoded pictures as part of an encoded video sequence and decode the encoded pictures to generate reconstructed pictures. In one example, a video decoder (810) may be used in place of the video decoder (410) in the example of fig. 4.
In the example of fig. 8, the video decoder (810) includes an entropy decoder (871), an inter decoder (880), a residual decoder (873), a reconstruction module (874), and an intra decoder (872) coupled together as shown in the example arrangement of fig. 8.
The entropy decoder (871) may be configured to reconstruct certain symbols from the encoded picture that represent syntax elements that make up the encoded picture. Such symbols may include, for example, a mode in which the block is encoded (e.g., intra mode, inter mode, bi-predictive mode, merge sub-mode, or another sub-mode), prediction information that may identify certain samples or metadata that are predicted by the intra decoder (872) or the inter decoder (880) (e.g., intra prediction information or inter prediction information), residual information in the form of quantized transform coefficients, and so forth. In one example, when the prediction mode is an inter or bi-directional prediction mode, inter prediction information is provided to an inter decoder (880); and when the prediction type is an intra prediction type, intra prediction information is provided to an intra decoder (872). The residual information may undergo inverse quantization and be provided to a residual decoder (873).
The inter decoder (880) may be configured to receive inter prediction information and generate an inter prediction result based on the inter prediction information.
An intra decoder (872) may be configured to receive intra prediction information and generate a prediction result based on the intra prediction information.
The residual decoder (873) may be configured to perform inverse quantization to extract dequantized transform coefficients and to process the dequantized transform coefficients so as to transform the residual from the frequency domain to the spatial domain. The residual decoder (873) may also utilize certain control information (including the quantizer parameter (QP)), which may be provided by the entropy decoder (871) (the data path is not shown, as this may involve only low-volume control information).
The reconstruction module (874) may be configured to combine the residual output by the residual decoder (873) and the prediction result (output by the inter or intra prediction module, as the case may be) in the spatial domain to form a reconstructed block that forms part of the reconstructed picture as part of the reconstructed video. Note that other suitable operations, such as deblocking operations, may also be performed to improve visual quality.
It should be noted that the video encoders (403), (603), and (703) and the video decoders (410), (510), and (810) may be implemented using any suitable technique. In some example embodiments, the video encoders (403), (603), and (703) and the video decoders (410), (510), and (810) may be implemented using one or more integrated circuits. In another embodiment, the video encoders (403), (603), and (703) and the video decoders (410), (510), and (810) may be implemented using one or more processors executing software instructions.
Turning to block partitioning for encoding and decoding, general partitioning may start from a basic block and may follow a predefined set of rules, a particular pattern, a partition tree, or any partitioning structure or scheme. The partitioning may be hierarchical and recursive. After dividing or partitioning a basic block following any of the example partitioning procedures described below, or other procedures or combinations thereof, a final set of partitions or coding blocks may be obtained. Each of these partitions may be at one of various partitioning levels in the partitioning hierarchy and may have various shapes. Each partition may be referred to as a coding block (CB). For the various example partitioning implementations described further below, each resulting CB may be of any of the allowed sizes and partitioning levels. Such partitions are called coding blocks because they form the units for which some basic encoding/decoding decisions may be made and for which encoding/decoding parameters may be optimized, determined, and signaled in the encoded video bitstream. The highest or deepest level in the final partitioning represents the depth of the tree of the coding block partitioning structure. A coding block may be a luma coding block or a chroma coding block. The CB tree structure of each color may be referred to as a coding block tree (Coding Block Tree, CBT for short).
The coding blocks of all color channels may be collectively referred to as Coding Units (CUs). The hierarchical structure of all color channels may be collectively referred to as Coding Tree Unit (CTU). The division pattern or structure of the various color channels in the CTU may be the same or different.
In some implementations, the partition tree scheme or structure for the luma channel and the chroma channels may not need to be the same. In other words, the luma channel and the chroma channels may have separate coding tree structures or patterns. Furthermore, whether the luma channel and the chroma channel use the same or different coding partition tree structures and the actual coding partition tree structure to be used may depend on whether the slice being encoded is a P, B or I slice. For example, for an I slice, the chroma channel and the luma channel may have separate coding partition tree structures or coding partition tree structure patterns, while for a P or B slice, the luma channel and the chroma channel may share the same coding partition tree scheme. When a separate coding partition tree structure or mode is applied, a luminance channel may be partitioned into CBs by one coding partition tree structure and a chrominance channel may be partitioned into chrominance CBs by another coding partition tree structure.
In some example implementations, a predefined partitioning pattern may be applied to a basic block. As shown in fig. 9, an example 4-way partition tree may begin at a first predefined level (e.g., a 64×64 block level, or another size, as the basic block size), and a basic block may be hierarchically partitioned down to a predefined lowest level (e.g., a 4×4 level). For example, the basic block may be subject to four predefined partitioning options or patterns, indicated by 902, 904, 906, and 908, with the partitions designated as R allowed for recursive partitioning, in that the same partitioning options shown in fig. 9 may be repeated at a lower level until the lowest level (e.g., the 4×4 level). In some implementations, additional restrictions may be applied to the partitioning scheme of fig. 9. In the implementation of fig. 9, rectangular partitions (e.g., 1:2/2:1 rectangular partitions) may be allowed, but rectangular partitions are not allowed to be recursive, whereas square partitions are allowed to be recursive. Recursive partitioning following fig. 9, where needed, produces a final set of coding blocks. A coding tree depth may further be defined to indicate the splitting depth from the root node or root block. For example, the coding tree depth of the root node or root block (e.g., a 64×64 block) may be set to 0, and after the root block is split once further according to fig. 9, the coding tree depth increases by 1. For the above scheme, the maximum or deepest level of the smallest 4×4 partitions reached from the 64×64 basic block would be 4 (starting from level 0). Such a partitioning scheme may apply to one or more color channels. Each color channel may be partitioned independently following the scheme of fig. 9 (e.g., the partitioning pattern or option among the predefined patterns may be determined independently for each color channel at each hierarchical level). Alternatively, two or more color channels may share the same hierarchical pattern tree of fig. 9 (e.g., the same partitioning pattern or option among the predefined patterns may be selected for the two or more color channels at each hierarchical level).
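A minimal sketch of the depth bookkeeping described above, assuming a pure square quadtree recursion from a 64×64 basic block down to the 4×4 floor. A real encoder would decide per block whether to split (e.g., by rate-distortion cost) and could also select the non-recursive rectangular options; this sketch simply splits every square fully, to show how a maximum coding tree depth of 4 arises.

```python
# Illustrative recursive quad-split: square partitions may recurse down to a
# 4x4 floor, and the coding tree depth grows by one per split.

def partition(x, y, w, h, depth, min_size=4, out=None):
    """Recursively quad-split a square block, emitting (x, y, w, h, depth) leaves."""
    if out is None:
        out = []
    if w == h and w > min_size:               # square: recursion allowed ("R")
        half = w // 2
        for dx in (0, half):
            for dy in (0, half):
                partition(x + dx, y + dy, half, half, depth + 1, min_size, out)
    else:                                      # reached the 4x4 floor
        out.append((x, y, w, h, depth))
    return out

leaves = partition(0, 0, 64, 64, depth=0)
print(len(leaves), max(d for *_, d in leaves))  # 256 leaves, max depth 4
```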
Fig. 10 illustrates another example predefined partitioning pattern that allows recursive partitioning to form a partition tree. As shown in fig. 10, an example 10-way partitioning structure or pattern may be predefined. The root block may start at a predefined level (e.g., from a basic block at the 128×128 or 64×64 level). The example partitioning structure of fig. 10 includes various 2:1/1:2 and 4:1/1:4 rectangular partitions. The partition types with 3 sub-partitions, indicated in the second row of fig. 10 as 1002, 1004, 1006, and 1008, may be referred to as "T-type" partitions. The "T-type" partitions 1002, 1004, 1006, and 1008 may be referred to as left T-type, top T-type, right T-type, and bottom T-type. In some example implementations, none of the rectangular partitions of fig. 10 is allowed to be further subdivided. A coding tree depth may further be defined to indicate the splitting depth from the root node or root block. For example, the coding tree depth of the root node or root block (e.g., a 128×128 block) may be set to 0, and after the root block is split once further according to fig. 10, the coding tree depth increases by 1. In some implementations, only the all-square partitioning in 1010 may be allowed to be partitioned recursively to the next level of the partition tree according to the patterns of fig. 10. In other words, recursive partitioning may not be allowed for the square partitions within the T-type patterns 1002, 1004, 1006, and 1008. Recursive partitioning following fig. 10, where needed, produces a final set of coding blocks. This scheme may apply to one or more color channels. In some implementations, more flexibility may be added to the use of partitions below the 8×8 level. For example, 2×2 chroma inter prediction may be used in certain cases.
In some other example implementations for coded block partitioning, a quadtree structure may be used to partition a base block or an intermediate block into quadtree partitions. This quadtree splitting may be applied hierarchically and recursively to any square splitting. Whether the basic block or intermediate block or partition is split by a further quadtree may be adapted to various local characteristics of the basic block or intermediate block/partition. The quadtree segmentation at the picture boundaries may be further adjusted. For example, an implicit quadtree splitting may be performed at a picture boundary such that a block will preserve the quadtree splitting until its size fits the picture boundary.
In some other example implementations, hierarchical binary partitioning from a basic block may be used. Under this scheme, a basic block or an intermediate-level block may be partitioned into two partitions. The binary partitioning may be horizontal or vertical. For example, a horizontal binary partition divides a basic block or intermediate block into equal upper and lower partitions, while a vertical binary partition divides it into equal left and right partitions. Such binary partitioning may be hierarchical and recursive. A decision may be made at each basic block or intermediate block as to whether the binary partitioning scheme should continue and, if it does, whether a horizontal or vertical binary partition should be used. In some implementations, further partitioning may stop (in one or both dimensions) at a predefined minimum partition size. Alternatively, further partitioning may stop once a predefined partitioning level or depth from the basic block is reached. In some implementations, the aspect ratio of a partition may be limited. For example, the aspect ratio of a partition may not be smaller than 1:4 (or larger than 4:1). Under this limit, a vertical strip partition with a vertical-to-horizontal aspect ratio of 4:1 can only be further binary partitioned into upper and lower partitions, each with a vertical-to-horizontal aspect ratio of 2:1.
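The following sketch illustrates the binary-split legality checks described above, namely the minimum partition size and the 1:4/4:1 aspect ratio limit. The parameter values and function shape are illustrative assumptions.

```python
# Minimal sketch of binary-split legality: stop at a minimum partition size
# and keep the aspect ratio within 1:4 / 4:1. Parameter values are assumed.

def may_binary_split(w: int, h: int, split_w: bool, min_size: int = 4,
                     max_ratio: int = 4) -> bool:
    """Return True if halving the w x h partition along the chosen dimension
    keeps both halves at least min_size and within the 1:4 aspect limit."""
    nw, nh = (w // 2, h) if split_w else (w, h // 2)
    if min(nw, nh) < min_size:
        return False
    return max(nw, nh) <= max_ratio * min(nw, nh)

# A 4:1 tall strip (16x64) may only be halved across its long dimension:
print(may_binary_split(16, 64, split_w=True))   # False: halves would be 8x64 (1:8)
print(may_binary_split(16, 64, split_w=False))  # True: halves are 16x32 (1:2)
```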
In still other examples, a ternary partitioning scheme may be used to partition a basic block or any intermediate block, as shown in fig. 13. The ternary pattern may be implemented vertically, as shown at 1302 in fig. 13, or horizontally, as shown at 1304 in fig. 13. Although the example split ratio (vertical or horizontal) in fig. 13 is shown as 1:2:1, other ratios may be predefined. In some implementations, two or more different ratios may be predefined. Such a ternary partitioning scheme can be used to complement quadtree or binary partitioning structures, in that ternary-tree partitioning can capture an object located at the center of a block within one contiguous partition, while quadtrees and binary trees always split along the block center, dividing such an object into separate partitions. In some implementations, the widths and heights of the partitions of the example ternary tree are always powers of 2, to avoid additional transforms.
The above-described partitioning schemes may be combined in any manner at different partitioning levels. As one example, the quadtree and binary partitioning schemes described above may be combined to partition a basic block into a quadtree-binary-tree (QTBT) structure. In such a scheme, a basic block or an intermediate block/partition may be split either by quadtree or by binary splitting, subject to a set of predefined conditions, if specified. Fig. 14 shows one specific example. In the example of fig. 14, a basic block is first quadtree-split into four partitions, as shown by 1402, 1404, 1406, and 1408. Thereafter, each resulting partition is either quadtree-split into four further partitions (e.g., 1408), binary-split into two further partitions at the next level (horizontally or vertically, e.g., 1402 or 1406, both of which are symmetric), or not split (e.g., 1404). For square partitions, binary or quadtree splitting may be allowed recursively, as shown by the overall example partitioning pattern of 1410 and the corresponding tree structure/representation in 1420, where the solid lines represent quadtree splits and the dashed lines represent binary splits. Each binary splitting node (a non-leaf binary split node) may use a flag to indicate whether the binary split is horizontal or vertical. For example, consistent with the partitioning structure of 1410, a flag "0" may represent a horizontal binary split and a flag "1" may represent a vertical binary split, as shown in 1420. For quadtree splits, the split type need not be indicated, since a quadtree split always splits a block or partition both horizontally and vertically to produce four sub-blocks/partitions of equal size. In some implementations, a flag "1" may represent a horizontal binary split and a flag "0" may represent a vertical binary split.
In some example implementations of QTBT, the quadtree and binary split rule set may be represented by the following predefined parameters and corresponding functions associated therewith:
CTU size: root node size of quadtree (size of basic block)
MinQTSize: minimum allowed quadtree leaf node size
MaxBTSize: maximum binary tree root node size allowed
MaxBTDepth: maximum binary tree depth allowed
MinBTSize: minimum binary leaf node size allowed
In some example implementations of the QTBT partitioning structure, the CTU size may be set to 128×128 luma samples with two corresponding 64×64 blocks of chroma samples (when the example chroma subsampling is considered and used), MinQTSize may be set to 16×16, MaxBTSize may be set to 64×64, MinBTSize (for both width and height) may be set to 4×4, and MaxBTDepth may be set to 4. Quadtree partitioning may be applied to the CTU first to generate quadtree leaf nodes. The size of a quadtree leaf node may range from 16×16 (i.e., MinQTSize, its minimum allowed size) to 128×128 (i.e., the CTU size). If a node is 128×128, it will not first be split by the binary tree, since its size exceeds MaxBTSize (i.e., 64×64). However, for nodes not exceeding MaxBTSize, binary tree partitioning may be performed. In the example of fig. 14, the basic block is 128×128. According to the predefined rule set, the basic block can only be quadtree-split first. The partitioning depth of the basic block is 0. Each of the four resulting partitions is 64×64, not exceeding MaxBTSize, and may be further split at level 1 by quadtree or binary tree. The process continues. When the binary tree depth reaches MaxBTDepth (i.e., 4), no further splitting may be considered. When the width of a binary tree node equals MinBTSize (i.e., 4), no further horizontal splitting may be considered. Similarly, when the height of a binary tree node equals MinBTSize, no further vertical splitting is considered.
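A sketch of the split-legality rules implied by the example parameter values above. It simplifies by assuming square nodes and by gating quadtree splits on not having entered the binary-tree portion of the tree; it is illustrative, not a normative decision process.

```python
# Sketch of QTBT split legality with the example parameters above
# (CTU 128x128, MinQTSize 16, MaxBTSize 64, MaxBTDepth 4, MinBTSize 4).

CTU_SIZE, MIN_QT, MAX_BT, MAX_BT_DEPTH, MIN_BT = 128, 16, 64, 4, 4

def allowed_splits(size: int, bt_depth: int) -> list:
    """List the splits permitted for a square node of the given size."""
    splits = []
    if size > MIN_QT and bt_depth == 0:          # quadtree region of the tree
        splits.append("QT")
    if size <= MAX_BT and size > MIN_BT and bt_depth < MAX_BT_DEPTH:
        splits += ["BT_HOR", "BT_VER"]           # binary splits
    return splits

print(allowed_splits(128, 0))  # ['QT']: exceeds MaxBTSize, so no binary split
print(allowed_splits(64, 0))   # ['QT', 'BT_HOR', 'BT_VER']
print(allowed_splits(4, 2))    # []: at MinBTSize, no further split
```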
In some example implementations, the QTBT scheme described above may be configured to support the flexibility for luma and chroma to have either the same QTBT structure or separate QTBT structures. For example, for P and B slices, the luma and chroma CTBs in one CTU may share the same QTBT structure. However, for an I slice, the luma CTB may be partitioned into CBs by one QTBT structure, while the chroma CTBs may be partitioned into chroma CBs by another QTBT structure. This means that a CU may refer to different color channels in an I slice; e.g., a CU in an I slice may consist of a coding block of the luma component or coding blocks of the two chroma components, whereas a CU in a P or B slice may consist of coding blocks of all three color components.
In some other implementations, the QTBT scheme may be complemented with the ternary scheme described above. Such an implementation may be referred to as a multi-type tree (MTT) structure. For example, in addition to binary splitting of nodes, one ternary partition pattern of FIG. 13 may be selected. In some implementations, only square nodes may perform ternary splitting. Additional flags may be used to indicate whether the ternary segmentation is horizontal or vertical.
Two-level or multi-level tree designs (e.g., the QTBT implementation and the QTBT implementation complemented by ternary splitting) may be driven primarily by complexity reduction. Theoretically, the complexity of traversing a tree is T^D, where T represents the number of split types and D is the depth of the tree. A trade-off can be made by using multiple types (T) while reducing the depth (D).
In some implementations, a CB may be further partitioned. For example, for the purpose of intra or inter prediction during encoding and decoding, a CB may be further partitioned into multiple prediction blocks (PBs). In other words, a CB may be further divided into different sub-partitions for which individual prediction decisions/configurations may be made. In parallel, to delineate the level at which the transform or inverse transform of the video data is performed, a CB may be further partitioned into multiple transform blocks (TBs). The schemes for partitioning a CB into PBs and TBs may be the same or different. For example, each partitioning scheme may be performed by its own procedure based on, e.g., various characteristics of the video data. In some example implementations, the PB and TB partitioning schemes may be independent. In some other example implementations, the PB and TB partitioning schemes and boundaries may be related. In some implementations, for example, TBs may be partitioned after the PB partitioning; in particular, after the partitioning of a coding block is determined, each resulting PB may then be further partitioned into one or more TBs. For example, in some implementations, a PB may be split into one, two, four, or another number of TBs.
In some implementations, the luma channel and the chroma channels may be treated differently in partitioning a basic block into coding blocks and further into prediction blocks and/or transform blocks. For example, in some implementations, partitioning a coding block into prediction blocks and/or transform blocks may be allowed for the luma channel, while such partitioning may not be allowed for the chroma channels. In such implementations, the transform and/or prediction of chroma blocks may thus be performed only at the coding block level. For another example, the minimum transform block sizes for the luma and chroma channels may differ; e.g., coding blocks of the luma channel may be allowed to be partitioned into smaller transform blocks and/or prediction blocks than those of the chroma channels. For yet another example, the maximum depth at which a coding block is partitioned into transform blocks and/or prediction blocks may differ between the luma and chroma channels; e.g., coding blocks of the luma channel may be allowed to be partitioned into deeper transform blocks and/or prediction blocks than the chroma channels. For a specific example, luma coding blocks may be partitioned into transform blocks of multiple sizes, which may be represented by recursive partitioning down to at most 2 levels, with transform block shapes such as square, 2:1/1:2, and 4:1/1:4 allowed, and transform block sizes from 4×4 to 64×64. For chroma blocks, however, only the largest possible transform block specified for the luma blocks may be allowed.
In some example implementations for partitioning an encoded block into PB, the depth, shape, and/or other characteristics of the PB partitioning may depend on whether the PB is intra-coded or inter-coded.
The partitioning of the coding block (or prediction block) into transform blocks may be implemented in various example schemes, including, but not limited to, recursive or non-recursive quadtree splitting and predefined pattern splitting, with additional consideration of the transform blocks at the coding block or prediction block boundaries. In general, the generated transform blocks may be at different split levels, may not be of the same size, and may not need to be square (e.g., the transform blocks may be rectangular with some allowable size and aspect ratio). Other examples will be described in more detail below in conjunction with fig. 15, 16, and 17.
However, in some other implementations, CBs obtained via any of the partitioning schemes described above may be used as the basic or minimum coding block for prediction and/or transformation. In other words, no further splitting is performed for the purpose of performing inter/intra prediction and/or for the purpose of transformation. For example, the CB obtained from the QTBT scheme above may be directly used as a unit to perform prediction. In particular, this QTBT structure eliminates the concept of multiple partition types, i.e., eliminates the separation of CUs, PUs and TUs, and supports greater flexibility in CU/CB partition shapes as described above. In such a QTBT block structure, the CUs/CBs may have square or rectangular shapes. The leaf nodes of such QTBT are used as units of prediction and transformation processing without any further segmentation. This means that in this example QTBT encoded block structure, the CU, PU, and TU have the same block size.
The various CB partitioning schemes described above, together with the further partitioning of CBs into PBs and/or TBs (including the option of no PB/TB partitioning), may be combined in any manner. Specific example implementations are provided below, but combinations are not limited to these examples.
Specific example implementations of coding block and transform block partitioning are described below. In such example implementations, the basic blocks may be divided into encoded blocks using recursive quadtree splitting or the predefined splitting patterns described above (e.g., those in fig. 9 and 10). At each level, whether further quadtree splitting of a particular partition should continue may be determined by the local video data characteristics. The resulting CBs may be at various quadtree splitting levels and have various sizes. A decision may be made at the CB level (or CU level, for all three color channels) as to whether to encode a picture region using inter-picture (temporal) prediction or intra-picture (spatial) prediction. Each CB may be further divided into one, two, four or other number of PB according to a predefined PB split type. Within one PB, the same prediction procedure may be applied, and related information may be transmitted to the decoder based on the PB. After obtaining the residual block by applying a PB-split type based prediction process, the CB may be partitioned into TBs according to another quadtree structure similar to the coding tree of the CB. In this particular implementation, the CB or TB may be, but is not necessarily limited to, square. Further, in this particular example, PB may be square or rectangular in shape for inter prediction, and PB may be only square for intra prediction. The coded block may be divided into, for example, four square TBs. Each TB may be further recursively split (split using quadtrees) into smaller TBs, referred to as residual quadtrees (Residual Quadtree, RQT for short).
Another example implementation of partitioning a basic block into CBs, PBs, and/or TBs is further described below. For example, instead of using multiple partition-unit types such as those shown in fig. 9 or fig. 10, a quadtree with a nested multi-type tree using binary and ternary splits as the subdivision structure (e.g., the QTBT described above, or QTBT with ternary splitting) may be used. The separation of CBs, PBs, and TBs (i.e., splitting CBs into PBs and/or TBs, and splitting PBs into TBs) may be abandoned, except when a CB is too large for the maximum transform length, in which case such a CB may need further splitting. This example partitioning scheme may be designed to support greater flexibility in CB partition shapes, so that both prediction and transform can be performed at the CB level without further partitioning. In such a coding tree structure, a CB may have a square or rectangular shape. Specifically, a coding tree block (CTB) may first be partitioned by a quadtree structure. The quadtree leaf nodes may then be further partitioned by a nested multi-type tree structure. Fig. 11 illustrates an example of a nested multi-type tree structure using binary or ternary splits. Specifically, the example multi-type tree structure of fig. 11 includes four split types, referred to as vertical binary split (SPLIT_BT_VERTICAL) (1102), horizontal binary split (SPLIT_BT_HORIZONTAL) (1104), vertical ternary split (SPLIT_TT_VERTICAL) (1106), and horizontal ternary split (SPLIT_TT_HORIZONTAL) (1108). The CBs then correspond to the leaves of the multi-type tree. In this example implementation, unless a CB is too large for the maximum transform length, this subdivision is used for prediction and transform processing without any further partitioning. This means that, in most cases, the CB, PB, and TB have the same block size in the quadtree with nested multi-type tree coding block structure. An exception occurs when the supported maximum transform length is smaller than the width or height of a color component of the CB. In some implementations, the nested patterns of fig. 11 may include quadtree splitting in addition to binary and ternary splits.
One specific example of quadtree-with-nested-multi-type-tree coding block partitioning of a basic block (including quadtree, binary, and ternary split options) is shown in fig. 12. In more detail, fig. 12 shows a basic block 1200 quadtree-split into four square partitions 1202, 1204, 1206, and 1208. For each quadtree-split partition, a decision is made whether to split further using the multi-type tree structure of fig. 11 or the quadtree. In the example of fig. 12, partition 1204 is not split further. Partitions 1202 and 1208 each undergo another quadtree split. For partition 1202, the upper-left, upper-right, lower-left, and lower-right partitions of this second-level quadtree split adopt, respectively, a third-level quadtree split, the horizontal binary split 1104 of fig. 11, no split, and the horizontal ternary split 1108 of fig. 11. Partition 1208 undergoes another quadtree split, with the upper-left, upper-right, lower-left, and lower-right partitions of the second-level quadtree split adopting third-level splits of the vertical ternary split 1106 of fig. 11, no split, and the horizontal binary split 1104 of fig. 11, respectively. Two of the sub-partitions of the third-level upper-left partition of 1208 are further split according to the horizontal binary split 1104 and the horizontal ternary split 1108 of fig. 11, respectively. Partition 1206 adopts a second-level split following the vertical binary split 1102 of fig. 11 into two partitions, which are further split at a third level according to the horizontal ternary split 1108 and the vertical binary split 1102 of fig. 11. A fourth-level split according to the horizontal binary split 1104 of fig. 11 is further applied to one of them.
For the particular example above, the maximum luma transform size may be 64×64, and the maximum supported chroma transform size may differ from luma, e.g., 32×32. Even though the example CBs in fig. 12 above are generally not further divided into smaller PBs and/or TBs, when the width or height of a luma or chroma coding block is greater than the maximum transform width or height, the luma or chroma coding block may be automatically split in the horizontal and/or vertical direction to meet the transform size limit in that direction.
In a particular example for partitioning basic blocks into CBs, as described above, the coding tree scheme may support the ability of luma and chroma to have separate block tree structures. For example, for P-slices and B-slices, the luma and chroma CTBs in one CTU may share the same coding tree structure. For example, for an I slice, luminance and chrominance may have separate coding block tree structures. When a separate block tree structure is applied, the luminance CTB may be partitioned into luminance CBs by one coding tree structure and the chrominance CTB may be partitioned into chrominance CBs by another coding tree structure. This means that a CU in an I slice may consist of coded blocks of a luminance component or coded blocks of two chrominance components, whereas a CU in a P slice or B slice always consists of coded blocks of all three color components, unless the video is monochrome.
When a coding block is further partitioned into multiple transform blocks, the transform blocks therein may be ordered in the bitstream in various orders or scanning manners. Example implementations for partitioning a coding or prediction block into transform blocks, and the coding order of the transform blocks, are described in further detail below. In some example implementations, as described above, transform partitioning may support transform blocks of various shapes, e.g., 1:1 (square), 1:2/2:1, and 1:4/4:1, with transform block sizes ranging from, e.g., 4×4 to 64×64. In some implementations, if the coding block is smaller than or equal to 64×64, transform block partitioning may be applied only to the luma component, so that for chroma blocks the transform block size is the same as the coding block size. Otherwise, if the coding block width or height is greater than 64, the luma and chroma coding blocks may be implicitly split into multiples of minimum (W, 64) × minimum (H, 64) and minimum (W, 32) × minimum (H, 32) transform blocks, respectively.
In some example implementations of transform block partitioning, for intra-coded blocks and inter-coded blocks, the coded blocks may be further partitioned into multiple transform blocks with a partition depth up to a predefined number of levels (e.g., 2 levels). The transform block segmentation depth and size may be related. For some example implementations, a mapping of the transform size from the current depth to the transform size of the next depth is shown in table 1 below.
Table 1: conversion partition size setting
/>
Based on the example mapping of table 1, for a 1:1 square block, the next stage transform split may create four 1:1 square sub-transform blocks. The transform partition may stop at, for example, 4 x 4. Thus, the transform size of the current depth of 4×4 corresponds to the same size of 4×4 of the next depth. In the example of table 1, for a 1:2/2:1 non-square block, the next stage transform split may create two 1:1 square sub-transform blocks, while for a 1:4/4:1 non-square block, the next stage transform split may create two 1:2/2:1 sub-transform blocks.
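The size mapping summarized above can be written as simple width/height arithmetic. The following sketch is an illustration of that mapping, not an excerpt from any codec.

```python
# Sketch of the current-depth -> next-depth transform size mapping: a square
# splits into four square quarters (stopping at 4x4), a 1:2/2:1 block splits
# into two squares, and a 1:4/4:1 block splits into two 1:2/2:1 blocks.

def next_depth_tx_size(w: int, h: int) -> tuple:
    """Return the sub-transform-block size at the next partition depth."""
    if w == h:                                    # 1:1 square
        return (max(w // 2, 4), max(h // 2, 4))   # four quarters; 4x4 is the floor
    if max(w, h) == 2 * min(w, h):                # 1:2 or 2:1
        return (min(w, h), min(w, h))             # two 1:1 squares
    return (w, h // 2) if h > w else (w // 2, h)  # 1:4/4:1 -> two 1:2/2:1 halves

print(next_depth_tx_size(16, 16))  # (8, 8)
print(next_depth_tx_size(4, 4))    # (4, 4): splitting stops at 4x4
print(next_depth_tx_size(32, 16))  # (16, 16)
print(next_depth_tx_size(16, 64))  # (16, 32)
```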
In some example implementations, for the luma component of an intra-coded block, additional restrictions may be applied for transform block segmentation. For example, for each level of transform partitioning, all sub-transform blocks may be limited to have the same size. For example, for a 32×16 encoded block, a 1-level transform split creates two 16×16 sub-transform blocks, and a 2-level transform split creates eight 8×8 sub-transform blocks. In other words, the second level of partitioning must be applied to all first level sub-blocks to preserve transform unit size equality. An example of transform block partitioning of intra-coded square blocks after table 1 and the coding order shown by the arrows is shown in fig. 15. Specifically, 1502 shows a square code block. The first level splitting into 4 equal-sized transform blocks according to table 1 and the coding order indicated by the arrows is shown in 1504. The second level splitting of all first level equally sized blocks into 16 equally sized transform blocks and the coding order indicated by the arrows according to table 1 is shown at 1506.
In some example implementations, the above-described limitations of intra-coding may not apply to the luma component of inter-coded blocks. For example, after the first level transform splitting, any one sub-transform block may be independently split further by another level. Thus, the generated transform blocks may or may not have the same size. An example of dividing an inter-coded block into transform blocks with their coding order is shown in fig. 16. In the example of fig. 16, inter-coded block 1602 is divided into two levels of transform blocks according to table 1. At a first level, the inter-coded block is divided into four equal-sized transform blocks. Then, only one (not all) of the four transform blocks is further divided into four sub-transform blocks, resulting in a total of 7 transform blocks of two different sizes, as shown at 1604. An example coding order of the 7 transform blocks is shown by the arrow in 1604 of fig. 16.
In some example implementations, some additional restrictions for the transform block may be applied for the chroma component. For example, for chroma components, the transform block size may be as large as the coding block size, but not smaller than a predefined size, e.g., 8×8.
In some other example implementations, for coding blocks with a width (W) or height (H) greater than 64, both the luma and chroma coding blocks may be implicitly split into multiples of minimum (W, 64) × minimum (H, 64) and minimum (W, 32) × minimum (H, 32) transform units, respectively. Here, in the present disclosure, "minimum (a, b)" returns the smaller of a and b.
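As a small illustration of this implicit splitting, the sketch below computes the transform-unit tiling for coding blocks wider or taller than 64, using the minimum (a, b) convention just defined. The function name and return convention are assumptions.

```python
# Sketch of the implicit transform-unit tiling: luma is tiled into
# minimum(W, 64) x minimum(H, 64) units and chroma into minimum(W, 32) x
# minimum(H, 32) units. Illustrative bookkeeping only.

def implicit_tx_units(w: int, h: int, is_luma: bool) -> tuple:
    """Return (unit_w, unit_h, count) for the implicit transform-unit tiling."""
    cap = 64 if is_luma else 32
    unit_w, unit_h = min(w, cap), min(h, cap)
    return unit_w, unit_h, (w // unit_w) * (h // unit_h)

print(implicit_tx_units(128, 64, is_luma=True))   # (64, 64, 2)
print(implicit_tx_units(128, 64, is_luma=False))  # (32, 32, 8)
```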
Fig. 17 further illustrates another alternative example scheme for partitioning an encoded or predicted block into transform blocks. As shown in fig. 17, recursive transform partitioning is no longer used, but rather a predefined set of partition types is applied to the encoded blocks according to their transform types. In the particular example shown in fig. 17, one of 6 example partition types may be applied to divide the encoded block into various numbers of transform blocks. This scheme of generating transform block partitions may be applied to coded blocks or predicted blocks.
In more detail, the partitioning scheme of fig. 17 provides up to six example partition types for any given transform type (transform type referring to the type of, e.g., the primary transform, such as ADST and others). In this scheme, every coding block or prediction block may be assigned a transform partition type based on, for example, rate-distortion cost. In one example, the transform partition type assigned to a coding block or prediction block may be determined based on the transform type of that block. A particular transform partition type may correspond to a transform block split size and pattern, as shown by the six transform partition types illustrated in fig. 17. The correspondence between the various transform types and the various transform partition types may be predefined. One example is shown below, with the capitalized labels indicating the transform partition types that may be assigned to a coding block or prediction block based on rate-distortion cost:
PARTITION_NONE: assigns a transform size equal to the block size.
PARTITION_SPLIT: assigns a transform size of 1/2 the width of the block size and 1/2 the height of the block size.
PARTITION_HORZ: assigns a transform size with the same width as the block size and 1/2 the height of the block size.
PARTITION_VERT: assigns a transform size with 1/2 the width of the block size and the same height as the block size.
PARTITION_HORZ4: assigns a transform size with the same width as the block size and 1/4 the height of the block size.
PARTITION_VERT4: assigns a transform size with 1/4 the width of the block size and the same height as the block size.
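Expressed as width/height arithmetic relative to a W×H block, the six partition types above map to transform sizes as in the following illustrative sketch.

```python
# Sketch of the six transform partition types listed above, as the transform
# block size each assigns relative to a W x H block. Illustrative only.

def tx_size_for_partition(ptype: str, w: int, h: int) -> tuple:
    sizes = {
        "PARTITION_NONE":  (w, h),           # transform size equals block size
        "PARTITION_SPLIT": (w // 2, h // 2),
        "PARTITION_HORZ":  (w, h // 2),
        "PARTITION_VERT":  (w // 2, h),
        "PARTITION_HORZ4": (w, h // 4),
        "PARTITION_VERT4": (w // 4, h),
    }
    return sizes[ptype]

print(tx_size_for_partition("PARTITION_HORZ4", 64, 64))  # (64, 16)
```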
In the above example, the transform partition types shown in fig. 17 all contain a uniform transform size for the divided transform blocks. This is merely an example and is not limiting. In some other implementations, a mixed transform block size may be used for the transform blocks partitioned in a particular partition type (or pattern).
Video blocks (a PB, or a CB when not further divided into multiple prediction blocks) may be predicted in various ways rather than being directly encoded, thereby improving compression efficiency by exploiting various correlations and redundancies in the video data. Such prediction may accordingly be performed in various modes. For example, a video block may be predicted via intra prediction or inter prediction. In an inter prediction mode in particular, a video block may be predicted by one or more other reference blocks, or inter-prediction blocks, from one or more other frames, via single-reference or compound-reference inter prediction. For inter prediction implementations, a reference block may be specified by its frame identifier (the temporal position of the reference block) and a motion vector indicating the spatial offset between the current block being encoded or decoded and the reference block (the spatial position of the reference block). The reference frame identifier and the motion vector may be signaled in the bitstream. The motion vector, as a spatial block offset, may be signaled directly, or may be predicted from another reference or predictor motion vector. For example, the current motion vector may be predicted directly from a reference motion vector (e.g., of a candidate neighboring block), or from a combination of a reference motion vector and a motion vector difference (Motion Vector Difference, MVD for short) between the current motion vector and the reference motion vector. The latter may be referred to as merge mode with motion vector difference (Merge mode with Motion Vector Difference, MMVD for short). The reference motion vector may be identified in the bitstream as a pointer to, for example, a spatially neighboring block or a temporally neighboring but spatially collocated block of the current block.
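The following sketch illustrates the two motion vector prediction paths just described: inheriting a reference MV directly, or reconstructing the current MV as reference MV plus a signaled MVD. The MV container and field names are illustrative assumptions.

```python
from typing import NamedTuple, Optional

class MV(NamedTuple):
    x: int
    y: int

def reconstruct_mv(ref_mv: MV, mvd: Optional[MV] = None) -> MV:
    """Inherit ref_mv directly, or add the signaled motion vector difference."""
    if mvd is None:                      # direct prediction from the reference MV
        return ref_mv
    return MV(ref_mv.x + mvd.x, ref_mv.y + mvd.y)  # MMVD-style reconstruction

print(reconstruct_mv(MV(-3, 7)))             # MV(x=-3, y=7)
print(reconstruct_mv(MV(-3, 7), MV(1, -2)))  # MV(x=-2, y=5)
```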
In some other example implementations, intra block copy (Intra-Block Copy, IBC for short) prediction may be employed. In IBC, a current block in a current frame may be predicted using another block in the same current frame (rather than in a temporally different frame, hence the term "intra"), in combination with a block vector (BV) indicating the offset of the position of the intra predictor, or reference block, relative to the position of the block being predicted. The position of a coding block may be represented by pixel coordinates, e.g., of its upper-left corner relative to the upper-left corner of the current frame (or slice). In other words, IBC mode applies a concept similar to inter prediction within the current frame. For example, a BV may be predicted directly from another reference BV, or from a combination of a reference BV and a BV difference between the current BV and the reference BV, similar to the prediction of an MV from a reference MV and an MV difference in inter prediction. IBC is useful in providing improved coding efficiency, particularly for encoding and decoding video frames containing screen content with, for example, a large number of repeated patterns, e.g., text information, where identical text segments (letters, symbols, words, phrases, etc.) appear in different parts of the same frame and can be used to predict one another.
In some implementations, IBC may be treated as a separate prediction mode alongside the normal intra prediction mode and the normal inter prediction mode. In this way, the prediction mode of a particular block may be selected among, and signaled from, three different prediction modes: intra prediction, inter prediction, and IBC mode. In these implementations, the flexibility built into each of these modes can be exploited to optimize coding efficiency within each mode. In some other implementations, IBC may be treated as a sub-mode or branch of the inter prediction mode, using similar motion vector determination, referencing, and coding mechanisms. In such implementations (an integrated inter prediction and IBC mode), the flexibility of IBC may be somewhat limited in order to harmonize the general inter prediction mode and the IBC mode. However, such implementations are less complex, while IBC may still be exploited to improve the coding efficiency of video frames featuring, for example, screen content. In some example implementations, the inter prediction mode may be extended to support IBC using the existing pre-specified mechanisms that separate the inter prediction mode from the intra prediction mode.
The selection among these prediction modes may be made at various levels including, but not limited to, the sequence level, frame level, picture level, slice level, CTU level, CTB level, CU level, CB level, or PB level. For example, for IBC purposes, a decision as to whether to employ IBC mode may be made and signaled at the CTU level. If a CTU is signaled as employing IBC mode, all coding blocks in the entire CTU may be predicted by IBC. In some other implementations, IBC prediction may be determined at the superblock (SB) level. Each SB may be divided into multiple CTUs or partitions (e.g., quadtree partitions) in various ways. Examples are provided further below.
Fig. 18 shows an example snapshot of a portion of a current frame containing multiple CTUs from the perspective of a decoder. Each square block, e.g., 1802, represents a CTU. The CTU may be one of various predetermined sizes described in detail above, e.g., SB. Each CTU may include one or more coding blocks (or prediction blocks for a particular color channel). CTUs, which are shaded with horizontal lines, represent those CTUs that have been reconstructed. CTU 1804 represents the current CTU being reconstructed. Within the current CTU 1804, the coded blocks shaded with horizontal lines represent those blocks that have been reconstructed in the current CTU, the coded blocks 1806 shaded with diagonal lines are currently being reconstructed, and the unshaded coded blocks in the current CTU 1804 await reconstruction. Other unshaded CTUs remain to be processed.
The position, or offset relative to the current block, of the reference block used to predict the current coding block in IBC may be indicated by a BV, as shown by the example arrow in fig. 18. For example, the BV may indicate, in vector form, the positional difference between the upper-left corners of the current block and the reference block (labeled "Ref" in fig. 18). While fig. 18 is illustrated using CTUs as the basic IBC unit, the same principle applies to implementations in which the SB is used as the basic IBC unit. In such implementations, each superblock may be divided into multiple CTUs, and each CTU may be further divided into multiple coding blocks, as described in more detail below.
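A minimal sketch of locating an IBC reference block from a BV, under the assumption that the BV is expressed in pixels as the offset between the upper-left corners of the reference and current blocks. The frame-bounds check is illustrative and does not capture the reference-availability restrictions discussed below.

```python
# Sketch of IBC reference-block location: reference position = current
# position + BV, within the same frame. Coordinates are in pixels.

def ibc_reference_position(cur_x: int, cur_y: int, bv_x: int, bv_y: int,
                           block_w: int, block_h: int,
                           frame_w: int, frame_h: int):
    """Return the reference block's upper-left (x, y), or None if it would
    fall outside the current frame."""
    ref_x, ref_y = cur_x + bv_x, cur_y + bv_y
    if ref_x < 0 or ref_y < 0 or ref_x + block_w > frame_w or ref_y + block_h > frame_h:
        return None
    return (ref_x, ref_y)

# A block at (256, 128) referencing a block 64 pixels to its left:
print(ibc_reference_position(256, 128, -64, 0, 16, 16, 1920, 1080))  # (192, 128)
```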
As disclosed in more detail below, a reference CTU/SB may be referred to as a local CTU/SB or a non-local CTU/SB depending on its location relative to the current CTU/SB in IBC. A local CTU/SB may refer to the CTU/SB coinciding with the current CTU/SB, or a CTU/SB that is near the current CTU/SB and has already been reconstructed (e.g., the left neighboring CTU/SB of the current CTU/SB). A non-local CTU/SB may refer to a CTU/SB farther from the current CTU/SB. When performing IBC prediction of the current coding block, a reference block may be searched in either or both of the local and non-local CTUs/SBs. The particular manner in which IBC is implemented may depend on whether the reference CTU/SB is local or non-local, since the on-chip and off-chip storage management (e.g., an off-chip decoded picture buffer (DPB) and/or on-chip memory) for reconstructed samples referenced in local versus non-local CTUs/SBs may differ. For example, reconstructed local CTU/SB samples may be suitable for storage in the on-chip memory of an encoder or decoder for IBC, while reconstructed non-local CTU/SB samples may be stored in off-chip DPB memory or external memory.
In some implementations, the locations of the reconstructed blocks that may be used as reference blocks for the current coding block 1804 may be limited. This limitation may result from various factors and may depend on whether IBC is implemented as an integrated part of the general inter prediction mode, as a special extension of the inter prediction mode, or as a separate and independent IBC mode. In some examples, only samples of the currently reconstructed CTU/SB may be searched to identify IBC reference blocks. In some other examples, samples of the current CTU/SB being reconstructed and of another neighboring reconstructed CTU/SB (e.g., the left neighboring CTU/SB) may be used for reference block search and selection, as indicated by the thick dashed box 1808 of fig. 18. In such implementations, only locally reconstructed CTU/SB samples may be used for IBC reference block search and selection. In some other examples, certain CTUs/SBs may be unavailable for IBC reference block search and selection for various other reasons. For example, the CTUs/SBs 1810 marked with a cross in fig. 18 may not be available for searching and selecting reference blocks for the current block 1804 because they may be used for special purposes (e.g., wavefront parallel processing), as described further below.
In some implementations, as shown in fig. 18, the region formed by the thick dashed box 1808 may be referred to as a local search region. Samples in the local search area may be stored in on-chip memory.
In some implementations, when IntraBC is enabled, the loop filters are disabled, including the deblocking filter, the constrained directional enhancement filter (Constrained Directional Enhancement Filter, CDEF for short), and loop restoration (Loop Restoration, LR for short). By doing so, a second picture buffer dedicated to enabling/supporting IntraBC may be avoided.
In some implementations, the restriction on which already-reconstructed CTUs/SBs are allowed to provide IBC reference blocks or reference samples may result from employing parallel decoding, in which more than one coding block is decoded simultaneously. Fig. 19 shows an example in which each square represents one CTU/SB. Parallel decoding may be achieved in which multiple consecutive rows of CTUs/SBs are reconstructed in parallel, each row lagging the row above by two CTU/SB columns, as indicated by the CTUs/SBs with diagonal hatching in fig. 19. The CTUs/SBs shaded with horizontal lines have already been reconstructed, and the unshaded CTUs/SBs have not yet been reconstructed. With this parallel processing, for a currently processed CTU/SB whose upper-left coordinates are (x0, y0), the reconstructed samples at position (x, y) may be accessed for IBC prediction of the current CTU/SB only if the vertical coordinate y is less than y0 and the horizontal coordinate x is less than x0 + 2(y0 − y), so that the already-reconstructed CTUs/SBs indicated by horizontal-line shading may serve as references for the current blocks processed in parallel. Note that the coordinates (e.g., (x0, y0) and (x, y)) may be in units of pixels, blocks (e.g., SBs), and the like.
In some implementations, the delay in writing immediately-reconstructed samples back into the off-chip DPB may impose further limits on the CTUs/SBs that may provide IBC reference samples for the current block, especially when the off-chip DPB is used to hold IBC reference samples. An example is shown in fig. 20, where additional restrictions may be imposed on top of those shown in fig. 19. Specifically, to allow for hardware write-back delay, IBC prediction may not access the most recently reconstructed regions to search for and select reference blocks. The number of restricted or prohibited immediately-reconstructed regions may be 1 to n CTUs/SBs, where n is a positive number that may be positively correlated with the duration of the write-back delay. Thus, on top of the parallel-processing constraint of fig. 19, for a current CTU/SB (with diagonal hatching) whose upper-left position has coordinates (x0, y0), IBC may access the reconstructed samples at position (x, y) for prediction only if the vertical coordinate y is less than y0 and the horizontal coordinate x is less than x0 + 2(y0 − y) − D, where D represents the number of immediately-reconstructed regions (e.g., to the left of the current CTU/SB) that are restricted/prohibited from being used as IBC references. Fig. 20 illustrates, for D = 2 (in units of blocks; or, when each block is a 128×128 SB, 2×128 pixels in units of pixels), the additional CTUs/SBs restricted from providing IBC reference samples. These additional CTUs/SBs that are unavailable as IBC references are indicated by reverse diagonal hatching.
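The accessibility rule of figs. 19 and 20 can be expressed as a single predicate; D = 0 reduces to the parallel-processing rule of fig. 19 alone. The sketch below assumes consistent units (pixels or blocks) for all coordinates.

```python
# Sketch of the reference-availability rule described above: samples at
# (x, y) may serve as IBC references for a CTU/SB with upper-left corner
# (x0, y0) only if y < y0 and x < x0 + 2*(y0 - y) - D, where D accounts
# for the write-back delay (D = 0 gives the rule of fig. 19).

def ibc_reference_available(x: int, y: int, x0: int, y0: int, d: int = 0) -> bool:
    return y < y0 and x < x0 + 2 * (y0 - y) - d

# With one-SB units, current SB at (4, 2) and a write-back delay D = 2:
print(ibc_reference_available(5, 1, 4, 2, d=2))  # False: too close to the wavefront
print(ibc_reference_available(3, 1, 4, 2, d=2))  # True
```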
In some implementations, as shown in fig. 20, the region formed by the blocks represented by horizontal line shading may be referred to as a non-local search region, and samples in the region may be stored in an external memory.
In some implementations, also described in further detail below, both local and non-local CTU/SB search regions may be used for IBC reference block search and selection. Furthermore, when on-chip memory is used, some of the limitations on the availability of already reconstructed CTUs/SBs as IBC references with respect to write-back latency may be relaxed or eliminated. In some further implementations, local CTUs/SBs and non-local CTUs/SBs, when coexisting, may be used in different manners due to, for example, the different management of reference block buffering using on-chip memory versus off-chip memory. These implementations are described in further detail in the disclosure below.
In some implementations, IBC may be implemented as an extension of inter prediction modes in which the current frame is considered as a reference frame, such that blocks within the current frame may be used as prediction references. Such IBC implementations may thus follow the coding path for inter prediction even though the IBC process only involves the current frame. In such an implementation, the reference structure of the inter prediction mode may be adapted to IBC, wherein the representation of the addressing mechanism of the reference samples using BV may be similar to Motion Vectors (MVs) in inter prediction. In this way, IBC may be implemented as a special inter prediction mode, depending on a similar or identical syntax structure and decoding procedure as the inter prediction mode based on the current frame as a reference frame.
In such implementations, since IBC may be considered an inter prediction mode, slices that allow the use of IBC must not be intra-only prediction slices. In other words, intra-only prediction slices are not inter-predicted (because intra prediction modes do not invoke any inter prediction processing paths), and therefore IBC is not allowed for such intra-only slices. When IBC is applicable, the encoder extends the reference picture list with one entry pointing to the current picture. Thus, the current picture may occupy one picture-sized buffer in the shared decoded picture buffer (DPB). The signaling of IBC use may be implicit in the selection of the reference frame in inter prediction mode. For example, when the selected reference picture points to the current picture, the coding unit uses IBC, with inter-prediction-like coding paths plus the special IBC extensions where needed and applicable. In some implementations, in contrast to conventional inter prediction, the reference samples within the IBC process may not be loop filtered before being used for prediction. Furthermore, the corresponding reference to the current picture may be marked as a long-term reference frame, since it is immediately adjacent to the next frame to be encoded or decoded. In some implementations, to minimize memory requirements, the encoder may release the buffer immediately after reconstructing the current picture. When the filtered version of the reconstructed picture becomes a reference picture for subsequent frames in true inter prediction, the encoder may place the filtered version of the reconstructed picture back into the DPB as a short-term reference, even though the reconstructed picture is unfiltered when used for IBC.
In the example implementations above, even though IBC may be just an extension of the inter prediction modes, IBC may be handled with several special procedures that deviate from normal inter prediction. For example, IBC reference samples may be unfiltered. In other words, reconstructed samples before loop filtering processing, including deblocking filtering (DeBlocking Filtering, DBF for short), sample adaptive offset (Sample Adaptive Offset, SAO for short), cross-component sample offset (Cross-Component Sample Offset, CCSO for short), etc., may be used for IBC prediction, whereas normal inter prediction modes employ filtered samples for prediction. As another example, luma sample interpolation is not performed for IBC, and chroma sample interpolation is necessary only when the chroma BV derived from the luma BV is not an integer. As yet another example, when the chroma BV is non-integer and the reference block of the IBC is close to the boundary of the available region for IBC references, the surrounding reconstructed samples needed to perform chroma interpolation may lie outside that boundary. This situation cannot be avoided when the BV points to a position immediately adjacent to the boundary line.
In such implementations, prediction of the current block by IBC may reuse the prediction and coding mechanisms of the inter prediction process, including predicting the current BV using the reference BV and, for example, additional BV differences. However, in some particular implementations, the luminance BV may be implemented with integer resolution, rather than fractional accuracy as in MVs used for conventional inter prediction.
In some implementations, all CTUs and SBs indicated with horizontal hatching in fig. 18, except for the two CTUs to the right of and above the current CTU (indicated by the crosses at 1810 in fig. 18), may be used to search for and select IBC reference blocks, in order to allow wavefront parallel processing (Wavefront Parallel Processing, WPP for short). In this way, almost the entire already reconstructed area of the current picture is available for IBC reference, with some exceptions for parallel processing purposes.
In some other implementations, the region searched for the selection of IBC reference blocks may be limited to the local CTU/SB. One example is represented by the thick dashed box 1808 of fig. 18. In such an example, the CTU/SB to the left of the current CTU may be used as the reference sample region for IBC at the beginning of the current CTU reconstruction process. When such local reference regions are used, instead of allocating additional external memory space in the DPB, on-chip memory space may be allocated to store the local CTU/SB for IBC reference. In some implementations, fixed on-chip memory may be used for IBC, thereby reducing the complexity of implementing IBC in a hardware architecture. In this way, a special IBC mode independent of normal inter prediction can be implemented using on-chip memory, instead of implementing IBC as a simple extension of the inter prediction mode.
For example, the on-chip memory size for storing local IBC reference samples (e.g., the left CTU or SB) may be 128 × 128 for each color component. In some implementations, the maximum CTU size may also be 128 × 128. In this case, the reference sample memory (Reference Sample Memory, RSM for short) may hold samples of a single CTU. In some other alternative implementations, the CTU size may be smaller. For example, the CTU size may be 64 × 64; the RSM may then hold multiple (4 in this example) CTUs at the same time. In still other implementations, the RSM may hold multiple SBs, each of which may include one or more CTUs, and each CTU may include multiple encoded blocks.
In some implementations of local on-chip IBC references, the on-chip RSM holds one CTU and may implement a continuous update mechanism for replacing the reconstructed samples of the left neighboring CTU with the reconstructed samples of the current CTU. Fig. 21 shows a simplified example of such a continuous RSM update mechanism at four intermediate times during the reconstruction process. In the example of fig. 21, the RSM has a fixed size that holds one CTU. The CTU may include implicit partitions. For example, the CTU may be implicitly divided into four disjoint regions (e.g., by quadtree division). Each region may include a plurality of encoded blocks. The CTU size may be 128 × 128, and with quadtree division, for example, the size of each example region or partition may be 64 × 64. At each intermediate time, the regions/divisions of the RSM shaded with horizontal lines hold the respective reconstructed reference samples of the left neighboring CTU, and the regions/divisions shaded with grey vertical lines hold the respective reconstructed reference samples of the current CTU. The coded block of the RSM indicated by diagonal hatching represents the current coded block in the current region being encoded/decoded/reconstructed.
At a first intermediate time representing the start of the current CTU reconstruction, the RSM may hold only the reconstructed reference samples of the left neighboring CTU in each of the four example regions, as shown at 2102. At the other three intermediate times, the reconstruction process gradually replaces the reconstructed reference samples of the left neighboring CTU with the reconstructed samples of the current CTU. A 64 × 64 region/partition in the RSM is reset when the encoder processes the first encoded block of that region/partition. Upon resetting a region of the RSM, the region is considered blank and is considered not to hold any reconstructed reference samples for IBC (in other words, that region of the RSM is not ready to be used for IBC reference samples). As each current encoded block in this region is processed, the corresponding block in the RSM is filled with the reconstructed samples of the corresponding block of the current CTU, to be used as reference samples for IBC of the next current block, as shown at intermediate times 2104, 2106, and 2108 in fig. 21. Once all the coded blocks corresponding to a region/partition of the RSM are processed, the entire region is filled with the reconstructed samples of these current coded blocks as IBC reference samples, as shown by the regions fully shaded with vertical lines at various intermediate times in fig. 21. Thus, at intermediate times 2104 and 2106, some regions/partitions in the RSM hold IBC reference samples from the neighboring CTU, some other regions/partitions hold reference samples entirely from the current CTU, and some regions/partitions hold reference samples from the current CTU and are partially blank (not used for IBC reference as a result of the reset procedure described above). When the last region (e.g., the lower-right region) is processed, the three other regions hold reconstructed samples of the current CTU as IBC reference samples, while the last region/partition holds reconstructed samples of the corresponding coding blocks in the current CTU and is partially blank until the last coding block of the CTU is reconstructed, at which point the entire RSM holds reconstructed samples of the current CTU, and the RSM is ready for the next CTU if it is also coded in IBC mode.
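A minimal sketch of this continuous update, modeling only the availability state of the four 64 × 64 regions (the `ReferenceSampleMemory` class and its method names are illustrative, not an actual codec API):

```python
from enum import Enum

class RegionState(Enum):
    LEFT_CTU = 0   # holds reconstructed samples of the left neighboring CTU
    RESET = 1      # cleared; not usable as an IBC reference yet
    CURRENT = 2    # filled with reconstructed samples of the current CTU

class ReferenceSampleMemory:
    """Fixed-size RSM holding one 128 x 128 CTU as four 64 x 64 regions."""

    def __init__(self):
        # At the start of the current CTU, every region still holds the
        # left neighbor's reconstructed samples (state 2102 of fig. 21).
        self.regions = [RegionState.LEFT_CTU] * 4

    def begin_region(self, idx):
        # Processing the first coded block of a region resets it: the
        # region is blank and must not be referenced until refilled.
        self.regions[idx] = RegionState.RESET

    def finish_region(self, idx):
        # All coded blocks of the region reconstructed: the region now
        # provides current-CTU samples as IBC references.
        self.regions[idx] = RegionState.CURRENT

    def usable_for_ibc(self, idx):
        return self.regions[idx] != RegionState.RESET

rsm = ReferenceSampleMemory()
rsm.begin_region(0)               # first block of region 0 triggers a reset
assert not rsm.usable_for_ibc(0)  # blank while being refilled
rsm.finish_region(0)
assert rsm.usable_for_ibc(0)      # now serves current-CTU samples
```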
Fig. 22 shows the above-described continuous update implementation of the RSM spatially at one specific intermediate time, i.e., it shows the left neighboring CTU and the current CTU with the current encoded block (the block represented by oblique hatching). The horizontal and vertical hatching shows the corresponding reconstructed samples of the two CTUs in the RSM, which are valid as IBC reference samples for the current coding block. At this particular reconstruction time, the process has replaced, in the RSM, the samples covered by the unshaded region of the left neighboring CTU with those of the region of the current CTU represented by vertical hatching. The remaining valid samples from the neighboring CTU are shown with horizontal line shading.
In the example implementations above, when the fixed RSM size is the same as the CTU size, the RSM is implemented to contain one CTU. In some other implementations where the CTU size is smaller, the RSM may contain more than one CTU. For example, the CTU size may be 32 × 32 and the fixed RSM size may be 128 × 128. In this way, the RSM can hold samples of 16 CTUs. Following the same basic RSM update principle described above, the RSM may hold the 16 neighboring CTUs of the current 128 × 128 patch before the patch is reconstructed. Once the processing of the first encoded block of the current 128 × 128 patch begins, the first 32 × 32 region of the RSM, initially filled with reconstructed samples of one neighboring CTU, may be updated as described above for an RSM holding a single CTU. The remaining 15 32 × 32 regions continue to hold 15 neighboring CTUs as reference samples for IBC. Once the CTU corresponding to the first 32 × 32 region of the current 128 × 128 patch being decoded is reconstructed, the first 32 × 32 region of the RSM is updated with the reconstructed samples of that CTU. The CTU corresponding to the second 32 × 32 region of the current 128 × 128 patch may then be processed, and that region eventually updated with its reconstructed samples. This process continues until all 16 32 × 32 regions of the RSM contain the reconstructed samples of the current 128 × 128 patch (all 16 CTUs). The decoding process then proceeds to the next 128 × 128 patch.
In some other implementations, as an extension of figs. 21 and 22, the RSM may hold a set of neighboring CTUs. The portion of the RSM holding the furthest neighboring CTU is updated with the reconstructed current CTU in the manner described above, processing one current CTU at a time. For the next current CTU, the then-furthest neighboring CTU in the RSM is again updated and replaced. In this way, the multiple CTUs held in the fixed-size RSM are updated as a moving window over the neighboring CTUs for IBC.
Fig. 23 shows another specific example implementation of local IBC using an on-chip RSM. In this example, the maximum block size of the IBC mode may be limited. For example, the largest IBC block may be 64 × 64. The on-chip RSM may be configured with a fixed size, e.g., 128 × 128, corresponding to the super block (SB). The RSM implementation in fig. 23 uses underlying principles similar to the implementations of figs. 21 and 22. In fig. 23, the RSM may hold a plurality of neighboring and/or current CTUs as IBC reference samples. In the example of fig. 23, the SB may be split by a quadtree. Accordingly, the RSM may be divided by a quadtree into 4 regions or units, each 64 × 64. Each of these regions may hold one or more encoded blocks. Alternatively, each of these regions may hold one or more CTUs, and each CTU may hold one or more coded blocks. The coding order of the quadtree regions may be predefined. For example, the coding order may be upper left, upper right, lower left, lower right. The quadtree split of the SB in fig. 23 is but one example. In some other alternative implementations, the SB may be split by any other scheme. The RSM update implementations of the local IBC described herein apply to those alternative split schemes as well.
In such local IBC implementations, the local reference blocks available for IBC prediction may be limited. For example, it may be required that the reference block and the current block be in the same SB row. In particular, the local reference block may be located only in the current SB or in the SB to the left of the current SB. An example current block predicted by IBC from another allowed coding block is shown by the dashed arrow in fig. 23. When the current SB or the left SB is used for IBC reference, the reference sample update procedure in the RSM may follow the reset procedure described above. For example, when any of the 64 × 64 unit reference sample memories begins to be updated with reconstructed samples from the current SB, the previously stored reference samples (from the left SB) in the entire 64 × 64 unit are marked as unavailable for generating IBC prediction samples and are gradually replaced with reconstructed samples of the current block.
Panel 2302 of fig. 23 shows 5 example states of the RSM during local IBC decoding of a current SB. Likewise, the regions of the RSM shaded with horizontal lines in each example state hold the corresponding reference samples of the corresponding quadtree regions of the left neighboring SB, and the regions/divisions shaded with grey vertical lines hold the corresponding reference samples of the current SB. The coding block of the RSM indicated by diagonal hatching represents the current coding block in the quadtree region currently being encoded/decoded. At the beginning of the encoding of each current SB, the RSM stores samples of the previously encoded SB (RSM state (0) of fig. 23). When the current block is located in one of the four 64 × 64 quadtree regions of the current SB, the corresponding region in the RSM is reset and used to store samples of the current 64 × 64 encoded region. In this way, samples in each 64 × 64 quadtree region of the RSM are gradually updated with samples of the current SB (states (1)-(3)). When the current SB has been fully encoded, the entire RSM is filled with all samples of the current SB (state (4)).
Each of the 64 × 64 regions in panel 2302 of fig. 23 is labeled with a spatially encoded serial number. Numbers 0-3 represent the four 64 × 64 quadtree regions of the left neighboring SB, while numbers 4-7 represent the four 64 × 64 quadtree regions of the current SB. Panel 2304 of fig. 23 further illustrates the corresponding spatial distribution of the reference samples in the 128 × 128 RSM across the left neighboring SB and the current SB, for RSM states (1), (2), and (3) of panel 2302. The areas without cross-hatching indicate the areas whose reconstructed samples are held in the RSM. The cross-hatched area indicates the area of the left SB whose reconstructed samples in the RSM are being reset (and thus cannot be used as reference samples for local IBC).
The coding order of the 64 × 64 regions, and the corresponding RSM update order, may follow a horizontal scan (as shown in fig. 23 above) or a vertical scan. The horizontal scan proceeds from the upper-left region to the upper-right, then the lower-left, then the lower-right. The vertical scan proceeds from the upper-left region to the lower-left, then the upper-right, then the lower-right. The reference sample update procedures for the left neighboring SB and the current SB under horizontal and vertical scans are shown for comparison in panels 2402 and 2404 of fig. 24, respectively, for the reconstruction of each of the four 64 × 64 regions of the current SB. In fig. 24, a 64 × 64 region hatched with horizontal lines without crosses represents a region whose samples are available for IBC. The areas hatched with crossed horizontal lines represent areas of the left neighboring SB that have been updated with the corresponding reconstructed samples of the current SB. The unshaded regions represent unprocessed regions of the current SB. The block indicated by diagonal hatching represents the current coded block being processed.
As shown in fig. 24, the following restrictions on the IBC reference block may be applied according to the position of the current coding block within the current SB (a code sketch of these rules follows this list).
If the current block falls within the upper-left 64 × 64 region of the current SB, then in addition to the samples already reconstructed in the current SB, reference samples in the lower-right, lower-left, and upper-right 64 × 64 regions of the left SB may be referenced, as shown at 2412 (horizontal scan) and 2422 (vertical scan) of fig. 24.
If the current block falls within the upper-right 64 × 64 region of the current SB, then in addition to the samples already reconstructed in the current SB, the current block may reference samples in the lower-left and lower-right 64 × 64 regions of the left SB if the luma sample located at (0, 64) relative to the current SB has not yet been reconstructed (2414 of fig. 24). Otherwise, the current block may additionally reference only samples in the lower-right 64 × 64 region of the left SB for IBC (2426 of fig. 24).
If the current block falls within the lower-left 64 × 64 region of the current SB, then in addition to the samples already reconstructed in the current SB, the current block may reference samples in the upper-right and lower-right 64 × 64 regions of the left SB if the luma sample located at (64, 0) relative to the current SB has not yet been reconstructed (2424 of fig. 24). Otherwise, the current block may additionally reference only samples in the lower-right 64 × 64 region of the left SB for IBC (2416 of fig. 24).
If the current block falls within the lower-right 64 × 64 region of the current SB, only the samples already reconstructed in the current SB may be referenced for IBC (2418 and 2428 of fig. 24).
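The four cases above can be expressed as a lookup from the current block's quadrant to the usable 64 × 64 regions of the left SB. The sketch below assumes the horizontal coding order of fig. 24; the quadrant names ('tl', 'tr', 'bl', 'br') and the `probe_reconstructed` flag (whether the luma sample at (0, 64) or (64, 0) relative to the current SB has been reconstructed) are illustrative:

```python
def usable_left_sb_regions(quadrant, probe_reconstructed=False):
    """Return the 64x64 regions of the left SB usable as IBC references,
    in addition to samples already reconstructed in the current SB."""
    if quadrant == 'tl':                       # upper-left current block
        return {'tr', 'bl', 'br'}
    if quadrant == 'tr':                       # probe sample is (0, 64)
        return {'br'} if probe_reconstructed else {'bl', 'br'}
    if quadrant == 'bl':                       # probe sample is (64, 0)
        return {'br'} if probe_reconstructed else {'tr', 'br'}
    return set()                               # 'br': current SB samples only
```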
As described above, in some example implementations, either or both of local and non-local CTUs/SBs may be used for the search and selection of IBC reference blocks. Furthermore, when an on-chip RSM is used for local reference, some limitations on the availability of already reconstructed CTUs/SBs as IBC references with respect to write-back delay may be relaxed or eliminated. Such implementations may be applied whether or not parallel decoding is employed.
An example implementation of local and non-local reference CTUs/SBs that may be used for IBC is shown in fig. 25, where each square again represents one CTU/SB. The CTU/SB indicated by diagonal hatching represents the current CTU/SB (marked "0"), while the CTUs/SBs indicated by horizontal hatching (marked "1"), vertical hatching (marked "2"), and reverse diagonal hatching (marked "3") represent already reconstructed regions. The unshaded CTUs/SBs represent regions yet to be reconstructed. Parallel decoding similar to figs. 19 and 20 is assumed. The CTUs/SBs shaded with vertical lines ("2") and reverse diagonal lines ("3") represent example regions that would normally be restricted from IBC reference for the current CTU/SB, due to the write-back delay to the DPB, when only off-chip memory is used for IBC reference (see fig. 20). When an on-chip RSM is used, one or more of the restricted areas of fig. 20 may be referenced directly from the RSM and thus may no longer need to be restricted. The number of restricted areas that can now be accessed via the RSM for IBC reference may depend on the size of the RSM. In the example of fig. 25, it is assumed that the RSM can hold one CTU/SB and employs the RSM update mechanism described above. Thus, the left neighboring CTU/SB is shaded with vertical lines and labeled "2" for local reference; the RSM then holds samples from the left CTU/SB and the current CTU/SB. Accordingly, in the example of fig. 25, the search area available for non-local IBC reference blocks includes the CTUs/SBs marked "1" (referred to as search area 1 (SA1), or the non-local search area), the search area available for local IBC reference blocks includes the CTUs/SBs marked "2" and "0" (referred to as search area 2 (SA2), or the local search area), and the area restricted for IBC reference blocks due to write-back delay includes the CTUs/SBs marked "3". In some other implementations, with an on-chip RSM large enough to hold all of the otherwise restricted CTUs/SBs, all of these potentially restricted areas may be included in the RSM for local reference. For example, the two left neighboring blocks labeled "2" and "3" may both be included in the local search area.
In some other implementations, only the current CTU/SB labeled "0" or a portion of the current CTU/SB may be included in the RSM for local reference.
In some example implementations, the samples in SA1 may be stored in an external memory.
In some example implementations, the samples in SA2 may be stored in on-chip memory.
In some example implementations, the external memory and the on-chip memory have different hardware characteristics, such as access speed, access clock, access bandwidth, and the like.
When IntraBC prediction is performed, a special case may occur in which a block vector points to a block that is partially in SA1 and partially in SA2. In this special case, further restrictions or processing may need to be applied before the block is used as a prediction block.
In some example implementations, in this special case, the block pointed to by the block vector is disallowed or excluded from being used as a prediction block for IntraBC.
Fig. 26 illustrates various example blocks pointed to by various block vectors. Block A is not allowed to be used as a prediction block because it overlaps both SA1 and SA2; block B is allowed to be used as a prediction block because it is fully contained in SA2; and block C is allowed to be used as a prediction block because it is fully contained in SA1.
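A sketch of this exclusion rule, assuming for illustration that SA1, SA2, and the candidate block can each be described by an axis-aligned rectangle (x, y, width, height); the function names are illustrative:

```python
def contains(outer, inner):
    """True if rectangle `inner` lies entirely within rectangle `outer`."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh

def allowed_as_prediction(block, sa1, sa2):
    """A candidate block is allowed only if it is fully inside SA1 or fully
    inside SA2; blocks straddling both regions (block A in fig. 26) are
    excluded."""
    return contains(sa1, block) or contains(sa2, block)
```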
In some example implementations, if the block vector of IntraBC points to a block (denoted B) that is partially in SA1 and partially in SA2, it is proposed to replace the samples in B that overlap SA1, or the samples in B that overlap SA2. The replacement of samples may be accomplished by extending the boundary samples available for prediction. For example, to replace the samples in B that overlap SA1, boundary samples of SA2 may be used; to replace the samples in B that overlap SA2, boundary samples of SA1 may be used.
In some example implementations, the overlap region size may be used to determine to which overlap region the sample substitution should be applied. If the overlap area size between B and SA1 is greater than the overlap area size between B and SA2, then samples located at the overlap between B and SA2 are replaced, and vice versa.
In some example implementations, sample counts may be used to determine to which overlap region the sample substitution should be applied. Denote by S1 the number of samples covered by the overlap between B and SA1, and by S2 the number of samples covered by the overlap between B and SA2. If S1 is greater than S2 multiplied by a weighting factor t1 (i.e., S1 > S2 × t1), the samples located in the overlap between B and SA2 are replaced, where t1 may be predefined or dynamically signaled. Similarly, if S2 is greater than S1 multiplied by a weighting factor t2 (i.e., S2 > S1 × t2), the samples located in the overlap between B and SA1 are replaced, where t2 may be predefined or dynamically signaled.
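A sketch of this sample-count rule, where `s1` and `s2` are the overlap sample counts and the weighting factors default to 1 purely for illustration (the disclosure allows t1 and t2 to be predefined or dynamically signaled):

```python
def substitution_target(s1, s2, t1=1, t2=1):
    """Return which overlap of block B has its samples replaced:
    'SA2' means the samples overlapping SA2 are replaced (using boundary
    samples of SA1), 'SA1' the converse; None if neither condition holds."""
    if s1 > s2 * t1:
        return 'SA2'
    if s2 > s1 * t2:
        return 'SA1'
    return None
```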
In some example implementations, when sample substitution is applied to the overlap between B and SA1, samples in SA2 may be used to replace the samples in the overlap. Similarly, when sample substitution is applied to the overlap between B and SA2, samples in SA1 may be used to replace the samples in the overlap.
In some example implementations, CTUs/SBs 2602 and 2604 form a non-allowed region, as shown in fig. 26. However, in some other implementations, CTUs/SBs 2602 and/or 2604 may also be part of the local search region (or of an adjacent allowed search region, SA2), for example, when the on-chip memory is large enough to hold the samples in 2602 and/or 2604.
In some example implementations, a prediction may be generated using two reference blocks. Such a prediction mode may be referred to as a composite prediction mode. The composite prediction can be characterized by the following formula:
P(x, y) = (w(x, y) · P0(x, y) + (64 - w(x, y)) · P1(x, y) + 32) >> 6    (1)

where P0(x, y) and P1(x, y) denote the predicted samples from the two reference blocks for the current sample located at (x, y) in the current block, w(x, y) is the weighting factor applied to the predicted sample from the first reference block, and P(x, y) is the final composite prediction. Depending on the weighting factors and the derivation of the prediction blocks, different composite prediction modes may exist.
For example, an average predictor mode may be implemented in which both references are weighted equally. In this case, w(x, y) = 32.

As another example, a distance-weighted predictor mode may be implemented, wherein the weighting factor may be determined by the temporal distances between the current block and its two reference blocks.

As described in the following subsections, w(x, y) is always set to 32 for the SKIP mode using two reference blocks.
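To make equation (1) concrete, here is a minimal sketch in Python/NumPy (the function name is illustrative):

```python
import numpy as np

def composite_prediction(p0, p1, w):
    """P(x, y) = (w*P0 + (64 - w)*P1 + 32) >> 6, applied elementwise."""
    p0 = np.asarray(p0, dtype=np.int32)
    p1 = np.asarray(p1, dtype=np.int32)
    w = np.asarray(w, dtype=np.int32)
    return (w * p0 + (64 - w) * p1 + 32) >> 6

# Average predictor (and SKIP with two references): w(x, y) = 32 everywhere.
p = composite_prediction([100, 60], [50, 70], 32)   # -> [75, 65]
```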
In some example implementations, the composite prediction may be applied to IntraBC prediction blocks.
In some example implementations, the composite prediction may be further applied in a wedge mode to form a composite wedge-based prediction mode. In this mode, a set of 16 two-dimensional weighting arrays may be predefined and/or hard-coded for each block size. In each array, the weights are arranged by projection onto a predefined wedge-shaped segmentation pattern. In each wedge-shaped division pattern, two wedge-shaped partitions are specified along a certain edge direction and position. For samples located in one of the two wedge-shaped partitions, the weight is typically set to 64; for samples located in the other wedge partition, the weight is typically set to 0. Further, along the wedge-shaped partition boundary, the weight may be assigned a value of 32, and in one implementation the weight may change gradually from 64 to 0 near the partition boundary.
Fig. 27 shows 16 exemplary wedge-wave segmentation patterns (a-p). In these examples, square blocks are used; similar partitioning may also be applied to rectangular blocks. Within each block there is a dividing line splitting the block into two partitions. Each dividing line may have a different start point, end point, and angle. In the wedge-based prediction mode, two syntax elements may be predefined: wedge_index, which specifies a wedge-shaped split pattern index (for example, wedge_index may range from 0 to 15, with each index value indicating a particular wedge-shaped split pattern); and wedge_sign, which specifies which of the two partitions is assigned the dominant weight.
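As one illustration of such a weighting array, the sketch below builds a mask for a vertical partition boundary, with the dominant weight 64 on one side, 0 on the other, and 32 along the boundary. This is only an assumed example shape; the codec's actual 16 patterns per block size are predefined/hard-coded tables not reproduced here:

```python
import numpy as np

def vertical_wedge_weights(height, width, split_col, sign=0):
    """Illustrative weighting array for a vertical wedge boundary at column
    `split_col`; `sign` mimics wedge_sign by swapping which partition
    receives the dominant weight."""
    w = np.zeros((height, width), dtype=np.int32)
    w[:, :split_col] = 64        # dominant-weight partition
    w[:, split_col] = 32         # samples on the partition boundary
    return 64 - w if sign else w
```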
In some example implementations, a composite wedge-based prediction mode may be applied to the IntraBC prediction block. In this case, the IntraBC prediction block may be divided into a plurality of partitions according to a specific wedge-wave division pattern. For purposes of composite prediction, at least one partition may be assigned multiple reference blocks.
In some example implementations of IntraBC prediction, for a current block (e.g., a block currently being reconstructed), only a single block vector is used to locate the reference block. However, in many cases of screen content (e.g., text images), a reference block may perfectly match only a portion of the current block but not the rest of the block. Referring to fig. 28, when the two text image blocks showing the words "encoder" and "decoder" are matched, the right "coder" portions of the two words match perfectly, whereas the left portions do not (i.e., "en" versus "de"); if a single block vector is used to obtain the text image block "decoder" to predict the text image block "encoder", large distortion may result.
In some example implementations, intraBC prediction may be combined with wedge wave segmentation. For IntraBC coded blocks using wedge wave segmentation, instead of using only a single block vector for prediction, multiple block vectors may be used. For example, a current block may first be partitioned into multiple regions using, for example, a wedge-wave partition mode, and then for each region, a particular block vector is used to locate the predictor associated with that region.
Fig. 28 further illustrates an example implementation using multiple block vectors. As shown in fig. 28, the text image block containing the word "encoder" is segmented using a wedge wave segmentation mode with a vertical segmentation boundary (e.g., by using pattern (l) in fig. 27), generating two partitions (2810 and 2812), where partition 2810 contains the text image with "en" and partition 2812 contains the text image with "coder". In this example, two different block vectors BV0 and BV1 are used for the two partitions, respectively, to find their reference blocks in the current frame, namely the "en" portion 2820 of the text image block containing the word "enabled" and the "coder" portion 2822 of the text image block containing the word "decoder".
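A sketch of this two-vector prediction, assuming `frame` is a 2-D array of already reconstructed samples of the current picture, block vectors are (dx, dy) offsets that point into valid reference regions, and `mask` is a binary wedge mask (1 selects the partition predicted by bv0); all names are illustrative:

```python
import numpy as np

def intrabc_wedge_predict(frame, x, y, h, w, bv0, bv1, mask):
    """Fetch one reference block per wedge partition with its own block
    vector, then assemble the prediction sample-by-sample via the mask."""
    def fetch(bv):
        dx, dy = bv
        return frame[y + dy:y + dy + h, x + dx:x + dx + w]
    return np.where(np.asarray(mask) == 1, fetch(bv0), fetch(bv1))
```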
In one implementation, for an IntraBC encoded block, a flag may be used to indicate whether a wedge wave splitting mode is applied. The flag may be signaled in the video bitstream.
In one implementation, when the wedge-wave splitting pattern is applied on the IntraBC encoded block, an index for indicating the wedge-wave splitting pattern may be signaled. For example, the index may indicate one of the wedge-wave segmentation patterns (a-p), as shown in FIG. 27.
In one implementation, when a wedge-wave segmentation pattern is applied on an IntraBC encoded block, a block vector is used to identify a reference block for each wedge-wave segmentation. The block vectors of different wedge segments may be different. For example, as shown in fig. 28, two different block vectors BV0 and BV1 are assigned to partitions 2810 and 2812, respectively.
In one implementation, a block vector for one wedge partition may be used to predict a block vector for another wedge partition in the same encoded block. For example, in fig. 28, BV0 may be used to predict BV1 and vice versa.
In one implementation, the block vector for each wedge partition may be generated from block vectors of neighboring encoded blocks encoded in IntraBC mode. In one example, a predictor block vector candidate list may be established for the current block using block vectors of neighboring encoded blocks or history-based block vectors. An index may then be signaled for locating the base block vector (or reference block vector) in the list, and a block vector prediction residual may be further signaled. The block vector of the wedge partition may then be generated by applying the vector prediction residual to the base block vector. This way of obtaining the block vector may be referred to as explicit signaling of the block vector.
In another implementation, when a wedge-wave splitting pattern is applied on an IntraBC encoded block, instead of the approach described above, the block vector of each wedge partition may be derived from previously encoded block vectors. In one example, a predictor block vector candidate list may be established for the current block using block vectors of neighboring encoded blocks or history-based block vectors. An index may be signaled for locating the base block vector in the list, from which the block vector of the wedge partition may then be derived without signaling any vector prediction residual. This way of obtaining the block vector may be referred to as implicit signaling of the block vector. The block vector may be derived from the base block vector based on a predefined or preconfigured transformation.
Alternatively, in one implementation, the above base block vector may be used directly as the block vector of the wedge partition.
A combination of explicit signaling and implicit signaling of block vectors is allowed. For example, for wedge wave segmentation with two partitions, the first partition may use implicit signaling, while the second partition may use explicit signaling for block vector encoding. For example, in fig. 28, BV0 may be predicted using explicit signaling and BV1 may be derived using implicit signaling.
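The two signaling styles differ only in whether a residual accompanies the candidate index. A minimal sketch under that assumption (the candidate-list construction from neighboring/history block vectors is not shown, and the function name is illustrative):

```python
def partition_block_vector(candidates, index, residual=None):
    """Recover a wedge partition's block vector from a predictor candidate
    list. Explicit signaling supplies (index, residual); implicit signaling
    supplies only the index and uses the base vector directly (or, more
    generally, a predefined transform of it)."""
    bx, by = candidates[index]            # base (reference) block vector
    if residual is None:                  # implicit signaling
        return (bx, by)
    rx, ry = residual                     # explicit signaling
    return (bx + rx, by + ry)

# Example: BV0 explicit, BV1 implicit, per the combination described above.
cands = [(-64, 0), (0, -64)]
bv0 = partition_block_vector(cands, 0, residual=(-4, 2))  # (-68, 2)
bv1 = partition_block_vector(cands, 1)                    # (0, -64)
```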
In one implementation, when combined with IntraBC prediction, only certain wedge-wave segmentation patterns are allowed. For example, these wedge wave segmentation patterns may include only segmentation patterns with horizontal boundaries (e.g., patterns h and p in fig. 27) and/or segmentation patterns with vertical segmentation boundaries (e.g., patterns d and l in fig. 27).
In one implementation, only wedge wave segmentation patterns with vertical segmentation boundaries are allowed when combined with IntraBC prediction.
In one implementation, the type of wedge-wave splitting pattern that is allowed may be signaled via a high level syntax, which may include at least one of:
video parameter set (Video Parameter Set, VPS for short) syntax;
picture parameter set (Picture Parameter Set, PPS for short) syntax;
sequence parameter set (Sequence Parameter Set, SPS for short) syntax;
an adaptive parameter set (Adaptive Parameter Set, APS) syntax;
slice header;
a picture header;
a frame header; or alternatively
Tile header.
In one implementation, the type of wedge-wave splitting mode that is allowed may be applied at various levels. For example, the type of wedge-wave splitting pattern allowed may be signaled in various headers corresponding to various block levels. The various headers may include at least one of: a CTU header (in which case the wedge-wave splitting pattern applies to the entire CTU), a superblock header (in which case it applies to the entire superblock), or a coding block header (in which case it applies to the entire coding block).
In one implementation, when the wedge-wave splitting mode is applied to the IntraBC encoded block, the composite prediction mode as previously described may be further applied. In this case, the weighting factor w(x, y) used in equation (1) may be 0 or a specific non-zero value (e.g., 64). In one implementation, the weighting factors w(x, y) may be predefined and/or hard-coded. In one implementation, the weighting factor w(x, y) may be signaled.
In one implementation, when a wedge-wave splitting pattern is applied to the IntraBC encoded block, a search region restriction may be further applied for each sub-partition. Referring to fig. 26, as previously described, for the current block, there are two allowed search areas: SA1 and SA2, and 2602 and 2604 represent search regions that are not allowed. In this implementation, further constraints may be imposed on the reference partitions (e.g., as shown in fig. 28, reference partition 2820 pointed to by BV0 and reference partition 2822 pointed to by BV 1): each complete reference partition should be within the same allowed reference region defined for the current block encoded in IntraBC mode. For example, the entire reference partition 2820 may need to be internal to SA1 or SA2, and the entire reference partition 2820 may also need to be internal to SA1 or SA 2. That is, the reference partition is not allowed to pass through different search areas. If the reference partition passes through both search areas, or if at least a portion of the reference partition is to be located in a search area that is not allowed, an alternative method should be used to locate the reference sample.
The present disclosure describes methods, apparatuses, and computer readable media for video encoding/decoding. The present disclosure addresses various problems of IntraBC. The methods, apparatuses, and computer readable media described in this disclosure may enhance the performance of video codecs by optimizing IntraBC prediction using a wedge segmentation mode.
Fig. 29 illustrates an exemplary method 2900 for processing video data. The method 2900 may include some or all of the following steps: step 2910, receiving a video bitstream including a current block of a video frame; step 2920, extracting, from the video bitstream, an IntraBC flag indicating that the current block is predicted in an intra block copy (IntraBC) mode; step 2930, determining, from the video bitstream, that the current block is partitioned in a wedge-wave partition mode, wherein the current block is partitioned in the wedge-wave partition mode into a plurality of partitions including a first partition and a second partition; step 2940, identifying at least the first partition and the second partition of the current block; step 2950, determining a first block vector for predicting the first partition in IntraBC mode and a second block vector for predicting the second partition in IntraBC mode, respectively; and step 2960, decoding the current block based at least on the first block vector and the second block vector.
In embodiments and implementations of the present disclosure, any of the steps and/or operations may be combined or arranged in any number or order as desired. Two or more steps and/or operations may be performed in parallel. Embodiments and implementations of the present disclosure may be used alone or in combination in any order. Furthermore, each of the methods (or embodiments), encoder, and decoder may be implemented by a processing circuit (e.g., one or more processors or one or more integrated circuits). In one example, one or more processors execute a program stored in a non-transitory computer readable medium. The embodiments in the present invention can be applied to a luminance block or a chrominance block. The term block may be interpreted as a prediction block, a coding block or a coding unit, i.e. a CU. The term block may also be used herein to refer to a transform block. In the following items, when referring to a block size, it may refer to a block width or height, or a maximum value of width and height, or a minimum value of width and height, or an area size (width×height), or an aspect ratio of the block (width: height or height: width).
The techniques described above may be implemented as computer software using computer readable instructions and physically stored in one or more computer readable media. For example, FIG. 30 illustrates a computer system (3000) suitable for implementing certain embodiments of the disclosed subject matter.
The computer software may be encoded using any suitable machine code or computer language that may be subject to assembly, compilation, linking, or similar mechanisms to create code comprising instructions that may be executed directly, or through interpretation, microcode execution, and the like, by one or more computer central processing units (Central Processing Units, CPU for short), graphics processing units (Graphics Processing Units, GPU for short), etc.
The above-described instructions may be executed on various types of computers or components thereof, including, for example, personal computers, tablet computers, servers, smartphones, gaming devices, internet of things devices, and the like.
The components of computer system (3000) shown in fig. 30 are exemplary in nature, and are not intended to suggest any limitation as to the scope of use or functionality of the computer software implementing embodiments of the present disclosure. Nor should the configuration of components be construed as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary embodiment of the computer system (3000).
The computer system (3000) may include some human interface input devices. Such a human interface input device may be responsive to one or more human users by, for example, tactile input (e.g., key strokes, swipes, data glove movements), audio input (e.g., voice, clapping), visual input (e.g., gestures), olfactory input (not shown). The human interface device may also be used to capture certain media that are not directly related to the conscious input of a person, such as audio (e.g., speech, music, ambient sound), images (e.g., scanned images, photo images obtained from still image cameras), video (e.g., two-dimensional video, three-dimensional video including stereoscopic video).
The input human interface device may include one or more of the following (only one is depicted each): keyboard (3001), mouse (3002), trackpad (3003), touch screen (3010), data glove (not shown), joystick (3005), microphone (3006), scanner (3007), camera (3008).
The computer system (3000) may also include some human interface output device. Such human interface output devices may stimulate the perception of one or more human users by, for example, tactile output, sound, light, and smell/taste. Such human interface output devices may include haptic output devices (e.g., haptic feedback via a touch screen (3010), data glove (not shown) or joystick (3005), but there may also be haptic feedback devices that do not serve as input devices), audio output devices (e.g., speakers (3009), headphones (not shown)), visual output devices (e.g., screens (3010), including CRT screens, LCD screens, plasma screens, OLED screens, each with or without touch screen input capabilities, each with or without haptic feedback capabilities, some of which can output two-dimensional visual output or more than three-dimensional output by means such as stereoscopic output, virtual reality glasses (not shown), holographic displays and smoke boxes (not shown)), and printers (not shown).
The computer system (3000) may also include human-accessible storage devices and their associated media, for example, optical media including CD/DVD ROM/RW (3020) drives with CD/DVD or similar media, thumb drives (3022), removable hard disk drives or solid state drives (3023), traditional magnetic media such as magnetic tapes and floppy disks (not shown), special-purpose ROM/ASIC/PLD devices such as security dongles (not shown), and the like.
It should also be appreciated by those skilled in the art that the term "computer-readable medium" as used in connection with the presently disclosed subject matter does not include transmission media, carrier waves or other transitory signals.
The computer system (3000) may also include an interface (3054) to one or more communication networks (3055). The network may be wireless, wired, optical. The network may also be local, wide area, metropolitan, vehicular and industrial, real-time, delay tolerant, and the like. Examples of networks include local area networks such as ethernet, wireless LAN, cellular networks including GSM, 3G, 4G, 5G, LTE, etc., television wired or wireless wide area digital networks including cable television, satellite television, and terrestrial broadcast television, vehicle and industrial networks including CANBus, etc. Some networks typically require external network interface adapters that connect to some general data port or peripheral bus (3049) (e.g., a USB port of a computer system (3000)); other interfaces are typically integrated into the core of the computer system (3000) by connecting to a system bus as described below (e.g., an ethernet interface in a PC computer system or a cellular network interface in a smart phone computer system). Using any of these networks, the computer system (3000) may communicate with other entities. Such communications may be unidirectional, receive-only (e.g., broadcast television), unidirectional, send-only (e.g., CANbus to some CANbus devices), or bidirectional, e.g., to other computer systems using a local or wide area digital network. As described above, certain protocols and protocol stacks may be used on each of these networks and network interfaces.
The aforementioned human interface device, human accessible storage device, and network interface may be attached to a core (3040) of a computer system (3000).
The core (3040) may include one or more central processing units (CPUs) (3041), graphics processing units (GPUs) (3042), dedicated programmable processing units in the form of field programmable gate arrays (FPGAs) (3043), hardware accelerators (3044) for specific tasks, graphics adapters (3050), and the like. These devices may be connected through a system bus (3048), together with read-only memory (ROM) (3045), random-access memory (RAM) (3046), and internal mass storage (3047) such as internal non-user-accessible hard disk drives, SSDs, and the like. In some computer systems, the system bus (3048) may be accessible in the form of one or more physical plugs to allow expansion by additional CPUs, GPUs, and the like. Peripheral devices may be connected to the core's system bus (3048) either directly or through a peripheral bus (3049). In one example, a screen (3010) may be connected to the graphics adapter (3050). Architectures for the peripheral bus include PCI, USB, and the like.
The CPU (3041), GPU (3042), FPGA (3043), and accelerator (3044) may execute certain instructions which, in combination, may constitute the computer code described above. The computer code may be stored in ROM (3045) or RAM (3046). Transitional data may be stored in RAM (3046), while permanent data may be stored in, for example, the internal mass storage (3047). Fast storage and retrieval to any of the storage devices may be enabled through the use of cache memory, which may be closely associated with one or more CPUs (3041), GPUs (3042), mass storage (3047), ROM (3045), RAM (3046), and the like.
The computer readable medium may have computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present disclosure, or they may be of the kind well known and available to those having skill in the computer software arts.
As a non-limiting example, a computer system (3000) having this architecture, and in particular the core (3040), may provide functionality as a result of a processor (including CPUs, GPUs, FPGAs, accelerators, etc.) executing software contained in one or more tangible computer-readable media. Such computer-readable media may be media associated with the user-accessible mass storage described above, as well as certain storage of the core (3040) that is of a non-transitory nature, such as the core-internal mass storage (3047) or ROM (3045). Software implementing various embodiments of the present disclosure may be stored in such devices and executed by the core (3040). The computer-readable medium may include one or more memory devices or chips, according to particular needs. The software may cause the core (3040), and in particular the processors therein (including CPUs, GPUs, FPGAs, etc.), to perform certain processes or certain portions of certain processes described herein, including defining data structures stored in RAM (3046) and modifying such data structures according to the software-defined processes. In addition, or as an alternative, the computer system may provide functionality as a result of logic hardwired or otherwise contained in circuitry (e.g., the accelerator (3044)), which may operate in place of or in conjunction with software to perform certain processes or certain portions of certain processes described herein. References to software may include logic, and vice versa, where appropriate. References to computer-readable media may include circuitry containing logic for execution (e.g., an integrated circuit (IC)), or both, where appropriate. The present disclosure encompasses any suitable combination of hardware and software.
While this disclosure has described several exemplary embodiments, there are alterations, permutations, and various substitute equivalents, which fall within the scope of this disclosure. It will thus be appreciated that those skilled in the art will be able to devise numerous systems and methods which, although not explicitly shown or described herein, embody the principles of the disclosure and are thus within the spirit and scope of the present disclosure.
Appendix a: abbreviations
IBC: intra-Block Copy, intra Block Copy
IntraBC: intra-Block Copy, intra Block Copy
JEM: joint exploration model common exploration mode
VVC: versatile video coding multifunctional video coding
BMS: benchmark set
MV: motion Vector, motion Vector
HEVC: high Efficiency Video Coding high efficiency video coding
SEI: supplementary Enhancement Information, supplemental enhancement information
VUI: video Usability Information video availability information
GOP: groups of Pictures, group of pictures
TU: transform Units
PU: prediction Units
CTU: coding Tree Units coding tree unit
CTB: coding Tree Blocks coding tree blocks
PB: prediction Blocks prediction block
HRD: hypothetical Reference Decoder, assume a reference decoder
SNR: signal Noise Ratio signal to noise ratio
CPU: central Processing Units, CPU
GPU: graphics Processing Units, graphic processing unit
CRT: cathode Ray Tube
LCD: liquid-Crystal Display, liquid Crystal Display
OLED: organic Light-Emitting Diode, organic Light-Emitting Diode
CD: compact Disc, optical Disc
DVD: digital Video Disc digital video disc
ROM: read-Only Memory
RAM: random Access Memory random access memory
ASIC: application-Specific Integrated Circuit, application-specific integrated circuit
PLD: programmable Logic Device programmable logic device
LAN: local Area Network local area network
GSM: global System for Mobile communications, global System for Mobile communications LTE: long-Term Evolution
CANBus: controller Area Network Bus local area network bus of controller
USB: universal Serial Bus Universal Serial bus
PCI: peripheral Component Interconnect peripheral component interconnect
And (3) FPGA: field Programmable Gate Areas field programmable gate region
SSD: solid-state drive and solid-state disk
IC: integrated Circuit integrated circuit
HDR: high dynamic range high dynamic range
SDR: standard dynamic range Standard dynamic Range
Jfet: joint Video Exploration Team, association video exploration team
MPM: most probable mode most probable mode
WAIP: wide-angle intra prediction
CU: coding Unit
PU: prediction Unit
TU: transform Unit, and Transform method
CTU: coding Tree Unit, coding Tree Unit
PDPC: position Dependent Prediction Combination position-dependent predictive combining
ISP: intra Sub-Partitions
SPS: sequence Parameter Setting sequence parameter set-up
PPS: picture Parameter Set, picture parameter set
APS: adaptation Parameter Set adaptive parameter set
VPS: video Parameter Set video parameter set
DPS: decoding Parameter Set decoding parameter set
ALF: adaptive Loop Filter adaptive loop filter
SAO: sample Adaptive Offset sample adaptive offset
CC-ALF: cross-Component Adaptive Loop Filter, cross-component adaptive loop filter
CDEF: constrained Directional Enhancement Filter constraint direction enhancement filter
CCSO: cross-Component Sample Offset, cross-component sample offset
LSO: local Sample Offset local sample offset
LR: loop Restoration Filter Loop recovery filter
AV1: AOMedia Video 1, AOMedia Video 1
AV2: AOMedia Video 2, AOMedia Video 2
RPS: reference Picture Set, reference Picture set
DPB: decoded Picture Buffer decoded picture buffer
MMVD: merge Mode with Motion Vector Difference merge mode with motion vector difference
IntraBC or IBC: intra Block Copy, intra Block Copy
BV: block Vector, block Vector
BVD: block Vector Difference block vector difference
RSM: reference Sample Memory reference sample memory

Claims (20)

1. A method for processing video data, the method comprising:
receiving a video bitstream comprising a current block of video frames;
extracting an IntraBC flag from the video bitstream indicating that the current block is predicted in an intra block copy IntraBC mode;
determining, from the video bitstream, that the current block is partitioned in a wedge-wave partition mode, wherein the current block is partitioned in the wedge-wave partition mode into a plurality of partitions including a first partition and a second partition;
identifying at least the first partition and the second partition of the current block;
determining a first block vector for predicting the first partition in the IntraBC mode and a second block vector for predicting the second partition in the IntraBC mode, respectively; and
decoding the current block based at least on the first block vector and the second block vector.
2. The method of claim 1, wherein determining from the video bitstream that the current block is partitioned in the wedge-wave partition mode comprises:
extracting, from the video bitstream, a first indicator indicating the wedge-wave partition mode of the current block; and
determining, based on the first indicator, that the current block is partitioned in the wedge-wave partition mode.
3. The method according to claim 1, wherein:
the method further comprises extracting, from the video bitstream, a second indicator associated with the current block and indicating a pattern of the wedge-wave partition mode; and
identifying at least the first partition and the second partition of the current block comprises: identifying the first partition and the second partition based on the second indicator.
4. The method according to claim 3, wherein the pattern of the wedge-wave partition mode comprises one of:
a vertical partition pattern in which the first partition and the second partition are divided by a vertical boundary in the current block; or alternatively
a horizontal partition pattern in which the first partition and the second partition are divided by a horizontal boundary.
5. The method of claim 3, wherein extracting the second indicator from the video bitstream comprises:
extracting the second indicator from the video bitstream via a high-level syntax, the high-level syntax including at least one of:
a video parameter set (VPS) syntax;
a picture parameter set (PPS) syntax;
a sequence parameter set (SPS) syntax;
an adaptation parameter set (APS) syntax;
a slice header;
a picture header;
a frame header; or
a tile header.
6. The method of claim 3, wherein:
extracting the second indicator from the video bitstream comprises extracting the second indicator from the video bitstream via a block-level signal;
the block-level signal is transmitted in one of:
a coding tree unit (CTU) header;
a superblock header; or
a coding block header; and
the second indicator applies to one of:
a CTU including the current block, in response to the block-level signal being transmitted in the CTU header;
a superblock including the current block, in response to the block-level signal being transmitted in the superblock header; or
a coding block including the current block, in response to the block-level signal being transmitted in the coding block header.
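A minimal sketch of the scoping rule in claim 6, assuming headers are parsed into dictionaries and that a finer-grained header takes precedence when several carry the indicator; the names and the precedence order are illustrative assumptions, not recited in the claim:

```python
def wedge_pattern_for_block(cb_hdr, sb_hdr, ctu_hdr):
    """Return the wedge pattern indicator from the header that signaled it;
    the indicator then applies to the unit that carried the signal."""
    for hdr in (cb_hdr, sb_hdr, ctu_hdr):  # coded block, superblock, CTU
        if hdr is not None and "wedge_pattern_idx" in hdr:
            return hdr["wedge_pattern_idx"]
    return None  # not signaled at block level; see the high-level syntax of claim 5
```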
7. The method of claim 1, wherein the first block vector and the second block vector are different.
8. The method of claim 7, wherein determining the second block vector of the second partition comprises predicting the second block vector based on the first block vector.
9. The method of claim 1, further comprising:
generating a candidate block vector list based on at least one of a block vector from a neighboring block of the current block or a history block vector, the candidate block vector list including at least one candidate block vector;
extracting, from the video bitstream, a third indicator indicating a target block vector in the candidate block vector list;
selecting the target block vector from the candidate block vector list according to the third indicator; and
generating at least one of the first block vector or the second block vector based on the target block vector.
10. The method of claim 9, wherein generating at least one of the first block vector or the second block vector based on the target block vector comprises:
extracting, from the video bitstream, a vector prediction residual associated with the first block vector; and
generating the first block vector based on the target block vector and the vector prediction residual.
11. The method of claim 9, wherein generating at least one of the first block vector or the second block vector based on the target block vector comprises deriving the second block vector from the target block vector.
12. The method of claim 11, wherein deriving the second block vector from the target block vector comprises generating the second block vector from the target block vector using a predefined transformation.
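Claims 9 through 12 describe a candidate-based derivation; the sketch below strings the steps together. It assumes block vectors are (dx, dy) tuples, and the horizontal mirroring used as the "predefined transformation" is purely an illustrative choice:

```python
def derive_block_vectors(neighbor_bvs, history_bvs, target_idx, bvd):
    """Build a candidate list (claim 9), pick the target via the third
    indicator, add the signaled residual (claim 10), and derive the
    second vector by a fixed transformation (claims 11-12)."""
    # de-duplicate while keeping insertion order: neighbors before history
    candidates = list(dict.fromkeys(neighbor_bvs + history_bvs))
    tx, ty = candidates[target_idx]
    bv_first = (tx + bvd[0], ty + bvd[1])  # target vector + prediction residual
    bv_second = (-tx, ty)                  # illustrative predefined transformation
    return bv_first, bv_second
```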
13. The method of claim 1, wherein the first block vector points to an IntraBC prediction block in the current frame of the current block, and the IntraBC prediction block is determined by an encoder of the video bitstream by:
determining a first search region in the video frame, wherein the first search region is a first candidate region for locating the IntraBC prediction block, the first search region does not overlap with the current block and comprises a block list, and the IntraBC prediction block is a prediction block for performing IntraBC prediction on the first partition;
determining a second search region, wherein the second search region is a second candidate region for locating the IntraBC prediction block, the second search region comprising at least one of: (i) a sub-block of the current block, and (ii) a neighboring block of the current block; and
identifying the IntraBC prediction block pointed to by the first block vector, the first block vector not passing through the second search region.
14. The method according to claim 13, wherein:
the top-left pixel of the current block has a coordinate position (x0, y0);
the top-left pixel of each block in the block list has a coordinate position (x, y);
y is less than y0; and
x is less than x0 + 2 × (y0 − y) − D,
where x0, y0, x, and y are non-negative numbers, and D is the number of immediately reconstructed blocks restricted by the IntraBC mode.
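The constraint of claim 14 reduces to a direct arithmetic test; a minimal sketch, with parameter names following the claim (d corresponds to D):

```python
def reference_block_allowed(x0, y0, x, y, d):
    """True when a block with top-left (x, y) satisfies the claim-14
    bound against the current block's top-left (x0, y0): it must lie in
    a row above and far enough to the left to already be reconstructed,
    with d accounting for the reconstruction delay."""
    return y < y0 and x < x0 + 2 * (y0 - y) - d
```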
15. A method for processing video data, the method comprising:
receiving a video bitstream comprising a current block of a video frame, wherein the current block is predicted in an intra block copy (IntraBC) mode using composite prediction;
determining that the current block is partitioned into at least a first partition and a second partition in a wedgelet partitioning mode;
determining at least two reference blocks for the first partition;
determining a composite reference block based on a weighted sum of the at least two reference blocks; and
reconstructing the first partition based on the composite reference block.
16. The method according to claim 15, wherein:
the at least two reference blocks include a first reference block and a second reference block; and
for each sample in the first partition, determining the composite reference block includes:
weighting a corresponding first prediction sample in the first reference block using a first weighting factor to obtain a first weighted prediction sample;
weighting a respective second prediction sample in the second reference block using a second weighting factor to obtain a second weighted prediction sample, wherein a sum of the first weighting factor and the second weighting factor is constant; and
determining the composite reference block based on a sum of the first weighted prediction sample and the second weighted prediction sample.
17. The method according to claim 16, wherein:
the first weighting factor is predefined or signaled in the video bitstream; and
the value of the first weighting factor is one of 0, 64, or a positive integer.
18. The method of claim 16, wherein a sum of the first weighting factor and the second weighting factor is equal to 64.
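Claims 16 through 18 describe a per-sample weighted blend whose weights sum to 64, so the normalization is a 6-bit right shift. A sketch, assuming 8-bit samples held in numpy arrays; the rounding offset of 32 is a common convention, not recited in the claims:

```python
import numpy as np

def composite_reference(ref0, ref1, w0):
    """Blend two reference blocks sample-by-sample; w0 + w1 == 64 as in
    claim 18, with offset 32 so the >> 6 normalization rounds to nearest."""
    w1 = 64 - w0
    acc = w0 * ref0.astype(np.int32) + w1 * ref1.astype(np.int32)
    return ((acc + 32) >> 6).astype(ref0.dtype)
```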
19. An apparatus for processing video data, the apparatus comprising a memory for storing computer instructions and a processor in communication with the memory, wherein the processor, when executing the computer instructions, is configured to cause the apparatus to:
receive a video bitstream comprising a current block of a video frame;
extract, from the video bitstream, an IntraBC flag indicating that the current block is predicted in an intra block copy (IntraBC) mode;
determine, from the video bitstream, that the current block is partitioned in a wedgelet partitioning mode, wherein the current block is partitioned in the wedgelet partitioning mode into a plurality of partitions including a first partition and a second partition;
identify at least the first partition and the second partition of the current block;
determine a first block vector for predicting the first partition in the IntraBC mode and a second block vector for predicting the second partition in the IntraBC mode, respectively; and
decode the current block based at least on the first block vector and the second block vector.
20. An apparatus comprising circuitry configured to perform the method of claim 15.
CN202280019661.1A 2021-10-28 2022-10-27 IntraBC using wedgelet partitioning Pending CN117063464A (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US202163273074P 2021-10-28 2021-10-28
US63/273,074 2021-10-28
US63/289,129 2021-12-13
US17/974,068 2022-10-26
US17/974,068 US20230135166A1 (en) 2021-10-28 2022-10-26 Intrabc using wedgelet partitioning
PCT/US2022/048073 WO2023076505A2 (en) 2021-10-28 2022-10-27 Intrabc using wedgelet partitioning

Publications (1)

Publication Number Publication Date
CN117063464A 2023-11-14

Family

ID=88690462

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280019661.1A Pending CN117063464A (en) 2021-10-28 2022-10-27 IntraBC using wedgelet partitioning

Country Status (1)

Country Link
CN (1) CN117063464A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40098264
Country of ref document: HK