CN113228632B - Encoder, decoder, and corresponding methods for local illumination compensation - Google Patents

Encoder, decoder, and corresponding methods for local illumination compensation

Info

Publication number
CN113228632B
Authority
CN
China
Prior art keywords
block
mvp
video
unit
reference samples
Prior art date
Legal status
Active
Application number
CN202080007327.5A
Other languages
Chinese (zh)
Other versions
CN113228632A
Inventor
Maxim Borisovich Sychev
Timofey Mikhailovich Solovyev
Alexander Alexandrovich Karabutov
Sergey Yurievich Ikonin
Jianle Chen
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN113228632A
Application granted
Publication of CN113228632B


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102: characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132: Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/103: Selection of coding mode or of prediction mode
    • H04N19/105: Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/117: Filters, e.g. for pre-processing or post-processing
    • H04N19/134: characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136: Incoming video signal characteristics or properties
    • H04N19/137: Motion inside a coding unit, e.g. average field, frame or block difference
    • H04N19/139: Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
    • H04N19/157: Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159: Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/169: characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17: the coding unit being an image region, e.g. an object
    • H04N19/176: the image region being a block, e.g. a macroblock

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

An apparatus and method for inter prediction of a block are provided. The method includes estimating Local Illumination Compensation (LIC) parameters by using reference samples of a current block and reference samples of a reference block, wherein at least one of the reference samples of the reference block is obtained by using the integer part of a fractional Motion Vector (MV), and obtaining the inter prediction of the current block according to the LIC parameters. This scheme reduces latency by removing one pipeline stage.

Description

Encoder, decoder, and corresponding methods for local illumination compensation
Technical Field
Embodiments of the present disclosure relate generally to the field of image processing and, more particularly, to Local Illumination Compensation (LIC) based on integer-pixel motion compensation.
Background
Video codecs (video coding and video decoding) are used in a wide range of digital video applications such as broadcast digital television, video transmission over the internet and mobile networks, real-time conversation applications (e.g. video chat, video conferencing), DVD and blu-ray discs, video content acquisition and editing systems, and camcorders for security applications.
Even a relatively short video can require a substantial amount of video data, which may cause difficulties when the data is streamed or otherwise transferred over a communication network with limited bandwidth capacity. Thus, video data is typically compressed before being transmitted over modern telecommunication networks. The size of the video may also be an issue when it is stored on a storage device, as storage resources may be limited. Prior to transmission or storage, video data is typically encoded at the source by a video compression device using software and/or hardware, thereby reducing the amount of data required to represent digital video pictures. The compressed data is then received at the destination by a video decompression device that decodes the video data. Due to limited network resources and the growing demand for higher video quality, there is a need for improved compression and decompression techniques that improve compression ratio with little sacrifice in image quality.
Disclosure of Invention
Embodiments of the present application provide apparatuses and methods for encoding and decoding according to the independent claims.
The above and other objects are achieved by the subject matter of the independent claims. Other embodiments are apparent from the dependent claims, the description and the drawings.
According to a first aspect, an embodiment of the present invention relates to a method of coding, wherein coding comprises decoding or encoding, the method comprising: estimating a Local Illumination Compensation (LIC) parameter by using reference samples of a current block and reference samples of a reference block, wherein at least one of the reference samples of the reference block is obtained by using an integer part of a fractional Motion Vector (MV); and obtaining the inter prediction of the current block according to the LIC parameters.
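For illustration only, the following sketch (in Python) shows how the integer part of a fractional MV may be taken before fetching the reference samples used for LIC parameter estimation. It assumes MVs stored in 1/16-pel units (as in VVC) and a reference picture addressed as a 2D array; the constant and helper names are illustrative assumptions, not part of the claimed method.

    # A minimal sketch, assuming MVs in 1/16-pel units (4 fractional bits).
    MV_FRAC_BITS = 4  # illustrative assumption

    def mv_integer_part(mv: int) -> int:
        # Arithmetic right shift rounds toward minus infinity for negative MVs.
        return mv >> MV_FRAC_BITS

    def lic_reference_samples(ref_pic, x0, y0, mv_x, mv_y, width, height):
        # Fetch the above and left neighboring samples of the reference block
        # at the integer-rounded MV position, so no interpolation is required.
        rx = x0 + mv_integer_part(mv_x)
        ry = y0 + mv_integer_part(mv_y)
        top = [ref_pic[ry - 1][rx + i] for i in range(width)]
        left = [ref_pic[ry + j][rx - 1] for j in range(height)]
        return top + left

Rounding to integer positions avoids the interpolation filtering that fractional sample positions would otherwise require, which underlies the latency reduction described in this disclosure.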
The method according to the first aspect of the invention may be performed by an apparatus according to the second aspect of the invention. The apparatus according to the second aspect of the invention comprises: an obtaining unit for obtaining reference samples of a reference block, wherein at least one of the reference samples of the reference block is obtained by using an integer part of a fractional Motion Vector (MV); and an estimating unit for estimating a Local Illumination Compensation (LIC) parameter by using the reference sample of the current block and the reference sample of the reference block.
The obtaining unit is further configured to obtain inter prediction of the current block according to the LIC parameter.
Further features and embodiments of the device according to the second aspect of the invention correspond to features and embodiments of the method according to the first aspect of the invention.
According to a third aspect, embodiments of the present invention are directed to an apparatus for decoding a video stream, the apparatus comprising a processor and a memory. The memory stores instructions for causing the processor to perform the method according to the first aspect.
According to a fourth aspect, embodiments of the present invention are directed to an apparatus for encoding a video stream, the apparatus comprising a processor and a memory. The memory stores instructions for causing the processor to perform the method according to the first aspect.
According to a fifth aspect, there is provided a computer-readable storage medium having instructions stored thereon that, when executed, cause one or more processors to code video data. The instructions cause the one or more processors to perform a method according to the first aspect or any possible embodiment of the first aspect.
According to a sixth aspect, embodiments of the invention relate to a computer program comprising program code for performing a method according to the first aspect or any possible embodiment of the first aspect when executed on a computer.
According to embodiments of the present invention, at least one of the reference samples of the reference block is obtained by using the integer part of the fractional MV. This scheme reduces latency by removing one pipeline stage. Inter prediction of the current block is therefore simplified when the reference block is located by a fractional MV.
The details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Drawings
Embodiments of the invention are described in more detail below with reference to the accompanying drawings, in which:
FIG. 1A is a block diagram illustrating an example of a video codec system for implementing embodiments of the present invention;
FIG. 1B is a block diagram illustrating another example of a video codec system for implementing an embodiment of the invention;
FIG. 2 is a block diagram illustrating an example of a video encoder for implementing an embodiment of the present invention;
FIG. 3 is a block diagram showing an example structure of a video decoder for implementing an embodiment of the present invention;
FIG. 4 is a diagram showing adjacent samples for deriving IC parameters;
FIG. 5 is a block diagram showing an example of an encoding apparatus or a decoding apparatus;
FIG. 6 is a block diagram showing another example of an encoding apparatus or a decoding apparatus;
FIG. 7a is a diagram of a design of a proposed pipeline with Local Illumination Compensation (LIC);
FIG. 7b is an exemplary diagram of Local Illumination Compensation (LIC) based on integer-pixel motion compensation;
FIG. 8 is a flowchart illustrating an exemplary inter prediction of a current block by applying LIC;
FIG. 9 is a block diagram illustrating an example structure of an apparatus for inter-predicting a current block by applying LIC;
FIG. 10 is a block diagram showing an example structure of a content providing system 3100 that implements a content delivery service; and
FIG. 11 is a block diagram showing an example structure of a terminal device.
in the following, identical reference numerals denote identical or at least functionally equivalent features, if not explicitly stated otherwise.
Detailed Description
In the following description, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration specific aspects of embodiments of the invention or in which embodiments of the invention may be used. It should be understood that embodiments of the invention may be used in other respects, and include structural or logical changes not shown in the drawings. The following detailed description is, therefore, not to be taken in a limiting sense.
For example, it is to be understood that the disclosure relating to the described method is also applicable to a corresponding device or system for performing the method, and vice versa. For example, if one or more particular method steps are described, the corresponding apparatus may include one or more units (e.g., functional units) to perform the described one or more method steps (e.g., one unit performs one or more steps, or multiple units each perform one or more of the multiple steps), even if such one or more units are not explicitly described or shown in the figures. On the other hand, for example, if a particular apparatus is described based on one or more units (e.g., functional units), the corresponding method may include one step to perform the function of the one or more units (e.g., one step performs the function of the one or more units, or multiple steps each perform the function of one or more units of the multiple units), even if such one or more steps are not explicitly described or shown in the figures. Furthermore, it is to be understood that features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless specifically noted otherwise.
Video coding generally refers to the processing of a sequence of images that form a video or video sequence. In the field of video coding, the terms "image", "frame", and "picture" may be used as synonyms. Video coding (or coding in general) comprises two parts: video encoding and video decoding. Video encoding is performed on the source side, typically including processing (e.g., by compressing) the original video image to reduce the amount of data required to represent the video image (for more efficient storage and/or transmission). Video decoding is performed at the destination side and typically includes inverse processing compared to the encoder to reconstruct the video image. Embodiments referring to "coding" of video images (or images in general) shall be understood to relate to "encoding" or "decoding" of video images or corresponding video sequences. The combination of the encoding part and the decoding part is also referred to as CODEC (Coding and Decoding).
In the case of lossless video codec, the original video image can be reconstructed, i.e. the reconstructed video image has the same quality as the original video image (assuming no transmission loss or other data loss during storage or transmission). In the case of lossy video codec, further compression is performed, e.g. by quantization, to reduce the amount of data representing the video image, which cannot be fully reconstructed at the decoder, i.e. the quality of the reconstructed video image is lower or worse compared to the quality of the original video image.
Several video coding standards belong to the group of "lossy hybrid video codecs" (i.e., they combine spatial and temporal prediction in the pixel domain with 2D transform coding for applying quantization in the transform domain). Each picture of a video sequence is typically partitioned into a set of non-overlapping blocks, and coding is typically performed at the block level. In other words, at the encoder, video is typically processed (i.e., encoded) at the block (video block) level by: using, for example, spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to generate a prediction block, subtracting the prediction block from the current block (the block currently being processed/to be processed) to obtain a residual block, and transforming and quantizing the residual block in the transform domain to reduce the amount of data to be transmitted (compression), while at the decoder, inverse processing compared to the encoder is applied to the encoded or compressed block to reconstruct the current block for presentation. Furthermore, the encoder replicates the decoder processing loop so that both will produce the same prediction (e.g., intra-prediction and inter-prediction) and/or reconstruction for processing (i.e., coding) subsequent blocks.
In the following embodiments of the video codec system 10, the video encoder 20 and the video decoder 30 are described based on fig. 1A to 3.
Fig. 1A is a schematic block diagram illustrating an example codec system 10, such as a video codec system 10 (or simply codec system 10), that may use the techniques of the present application. Video encoder 20 (or simply encoder 20) and video decoder 30 (or simply decoder 30) of video codec system 10 represent examples of devices that may be used to perform techniques in accordance with various examples described herein.
As shown in fig. 1A, codec system 10 includes a source device 12, source device 12 for providing encoded picture data 21, e.g., to a destination device 14, for decoding encoded picture data 21.
Source device 12 includes an encoder 20 and may additionally, i.e., optionally, include an image source 16, a pre-processor (or pre-processing unit) 18 (e.g., image pre-processor 18), and a communication interface or unit 22.
Image source 16 may include or may be any kind of image capture device (e.g., a camera for capturing real-world images), and/or any kind of image generation device (e.g., a computer-graphics processor for generating computer-animated images), or any kind of device for obtaining and/or providing real-world images, computer-generated images (e.g., screen content, Virtual Reality (VR) images), and/or any combination thereof (e.g., Augmented Reality (AR) images). The image source may be any kind of memory or storage device that stores any of the aforementioned images.
To distinguish it from the processing performed by preprocessor 18 (or preprocessing unit 18), the image or image data 17 may also be referred to as the raw image or raw image data 17.
Preprocessor 18 is to receive (raw) image data 17 and perform preprocessing on image data 17 to obtain preprocessed image 19 or preprocessed image data 19. The pre-processing performed by pre-processing unit 18 may include, for example, cropping, color format conversion (e.g., from RGB to YCbCr), color correction, or denoising. It should be understood that the pre-processing unit 18 may be an optional component.
Video encoder 20 is operative to receive pre-processed image data 19 and provide encoded image data 21 (further details are described below, e.g., based on fig. 2).
Communication interface 22 of source device 12 may be used to receive encoded image data 21 and send encoded image data 21 (or any further processed version thereof) over communication channel 13 to another device (e.g., destination device 14 or any other device) for storage or direct reconstruction.
Destination device 14 includes a decoder 30 (e.g., a video decoder 30), and may additionally, i.e., optionally, include a communication interface or unit 28, a post-processor 32 (or post-processing unit 32), and a display device 34.
Communication interface 28 of destination device 14 is to receive encoded image data 21 (or any further processed version thereof), such as to receive encoded image data 21 directly from source device 12 or from any other source (e.g., a storage device, such as an encoded image data storage device), and to provide encoded image data 21 to decoder 30.
Communication interface 22 and communication interface 28 may be used to send and receive encoded image data 21 or encoded data 13 via a direct communication link between source device 12 and destination device 14, such as a direct wired or wireless connection, or via any kind of network, such as a wired or wireless network or any combination thereof, or any kind of private and public network or any kind of combination thereof.
Communication interface 22 may be used, for example, to encapsulate encoded image data 21 into a suitable format (e.g., packets) and/or to process the encoded image data using any kind of transport encoding or processing for transmission over a communication link or communication network.
Communication interface 28, which forms the counterpart of communication interface 22, may be used, for example, to receive the transmitted data and process it using any kind of corresponding transport decoding or processing and/or decapsulation to obtain encoded image data 21.
Both communication interface 22 and communication interface 28 may be configured as a one-way communication interface, as indicated by the arrow pointing from source device 12 to destination device 14 of communication channel 13 in fig. 1A, or a two-way communication interface, and may be used, for example, to send and receive messages, for example, to establish a connection, acknowledge and exchange any other information related to a communication link and/or data transmission (e.g., encoded image data transmission).
Decoder 30 is operative to receive encoded image data 21 and provide decoded image data 31 or decoded image 31 (further details will be described below, e.g., based on fig. 3 or 5).
The post-processor 32 of the destination device 14 is configured to post-process the decoded image data 31 (also referred to as reconstructed image data) (e.g., decoded image 31) to obtain post-processed image data 33 (e.g., post-processed image 33). Post-processing performed by post-processing unit 32 may include, for example, color format conversion (e.g., from YCbCr to RGB), color correction, cropping, or resampling, or any other processing, for example, to prepare decoded image data 31 for display, for example, by display device 34.
The display device 34 of the destination device 14 is used to receive post-processed image data 33 for displaying the image to, for example, a user or viewer. The display device 34 may be or comprise any kind of display for representing the reconstructed image, such as an integrated or external display or monitor. The display may, for example, include a Liquid Crystal Display (LCD), an Organic Light Emitting Diode (OLED) display, a plasma display, a projector, a micro LED display, a liquid crystal on silicon (LCoS), a Digital Light Processor (DLP), or any other display of any kind.
Although fig. 1A depicts the source device 12 and the destination device 14 as separate devices, embodiments of the devices may also include both devices or both functionalities, i.e., the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality. In such embodiments, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality may be implemented using the same hardware and/or software, or by separate hardware and/or software, or any combination thereof.
As will be apparent to those skilled in the art based on the description, the existence and (exact) division of functions of different units within source device 12 and/or destination device 14 as shown in fig. 1A may vary depending on the actual device and application.
Encoder 20 (e.g., video encoder 20) and/or decoder 30 (e.g., video decoder 30) may be implemented via processing circuitry as shown in fig. 1B, such as one or more microprocessors, Digital Signal Processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, dedicated to video encoding and decoding, or any combination thereof. Encoder 20 may be implemented via processing circuitry 46 to embody the various modules as discussed with reference to encoder 20 of fig. 2 and/or any other encoder system or subsystem described herein. Decoder 30 may be implemented via processing circuitry 46 to embody the various modules as discussed with reference to decoder 30 of fig. 3 and/or any other decoder system or subsystem described herein. The processing circuitry may be used to perform various operations as described below. As shown in fig. 5, if the techniques described above are implemented in part in software, the device may store instructions for the software in a suitable non-transitory computer-readable storage medium and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. For example, as shown in fig. 1B, either of video encoder 20 and video decoder 30 may be integrated as part of a combined encoder/decoder (CODEC) in a single device.
Source device 12 and destination device 14 may comprise any of a variety of devices, including any of a variety of handheld or fixed devices, such as notebook or laptop computers, mobile phones, smart phones, tablet or tablet computers, cameras, desktop computers, set-top boxes, televisions, display devices, digital media players, video game consoles, video streaming devices (e.g., content service servers or content delivery servers), broadcast receiver devices, broadcast transmitter devices, etc., and may be provided without or with any of a variety of operating systems. In some cases, source device 12 and destination device 14 may be equipped for wireless communication. Thus, source device 12 and destination device 14 may be wireless communication devices.
The video codec system 10 shown in fig. 1A is merely an example, and the techniques of this application may be applied to video coding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between the encoding device and the decoding device. In other examples, data is retrieved from local storage, streamed over a network, and so forth. A video encoding device may encode data and store the data to a memory, and/or a video decoding device may retrieve data from a memory and decode the data. In some examples, encoding and decoding are performed by devices that do not communicate with each other, but simply encode data to and/or retrieve and decode data from memory.
For convenience of description, embodiments of the present invention are described herein with reference to the reference software for High-Efficiency Video Coding (HEVC) or Versatile Video Coding (VVC) developed by the Joint Collaboration Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Motion Picture Experts Group (MPEG). One of ordinary skill in the art will appreciate that embodiments of the present invention are not limited to HEVC or VVC.
Encoder and encoding method
Fig. 2 shows a schematic block diagram of an example video encoder 20 for implementing the techniques of the present application. In the example of fig. 2, the video encoder 20 includes an input 201 (or input interface 201), a residual calculation unit 204, a transform processing unit 206, a quantization unit 208, an inverse quantization unit 210, and an inverse transform unit 212, a reconstruction unit 214, a loop filter unit 220, a Decoded Picture Buffer (DPB)230, a mode selection unit 260, an entropy encoding unit 270, and an output 272 (or output interface 272). The mode selection unit 260 may include an inter prediction unit 244, an intra prediction unit 254, and a division unit 262. The inter prediction unit 244 may include a motion estimation unit and a motion compensation unit (not shown). The video encoder 20 as shown in fig. 2 may also be referred to as a hybrid video encoder or a video encoder according to a hybrid video codec.
The residual calculation unit 204, the transform processing unit 206, the quantization unit 208, and the mode selection unit 260 may be referred to as forming the forward signal path of the encoder 20, whereas the inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the buffer 216, the loop filter 220, the Decoded Picture Buffer (DPB) 230, the inter prediction unit 244, and the intra prediction unit 254 may be referred to as forming the inverse signal path of the video encoder 20, wherein the inverse signal path of the video encoder 20 corresponds to the signal path of the decoder (see video decoder 30 in fig. 3). The inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the loop filter 220, the Decoded Picture Buffer (DPB) 230, the inter prediction unit 244, and the intra prediction unit 254 are also referred to as forming the "built-in decoder" of video encoder 20.
Multi-hypothesis prediction for intra and inter modes
Multi-hypothesis prediction is used to improve intra modes by combining one intra prediction and one merge-indexed inter prediction. In a merge CU, a flag is signaled for merge mode, and when the flag is true, an intra mode is selected from an intra candidate list. For the luminance component, the intra candidate list is derived from 4 intra prediction modes including DC, planar, horizontal, and vertical modes, and the size of the intra candidate list may be 3 or 4 depending on the shape of the block. When the CU width is greater than twice the CU height, the horizontal mode is excluded from the intra-mode list, and when the CU height is greater than twice the CU width, the vertical mode is removed from the intra-mode list. One intra prediction generated by the intra prediction mode selected by the intra mode index and one inter prediction selected by the merge index are combined using a weighted average. For the chroma component, DM is always applied without extra signaling. The weights used for combining the predictions are as follows. When either the DC or planar mode is selected, or the CB width or CB height is less than 4, equal weights are applied. For CBs whose width and height are both greater than or equal to 4, when the horizontal/vertical mode is selected, the CB is first vertically/horizontally divided into four equal-area regions. Each weight set, denoted (w_intrai, w_interi) with i from 1 to 4, where (w_intra1, w_inter1) = (6, 2), (w_intra2, w_inter2) = (5, 3), (w_intra3, w_inter3) = (3, 5), and (w_intra4, w_inter4) = (2, 6), is applied to the corresponding region: (w_intra1, w_inter1) to the region closest to the reference samples and (w_intra4, w_inter4) to the region farthest from the reference samples. The combined prediction is then computed by summing the two weighted predictions and right-shifting the result by 3 bits. In addition, the intra prediction mode of the intra hypothesis of the predictor can be saved for reference by subsequent neighboring CUs.
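As a worked illustration of the weighting described above, the following sketch (in Python) applies the four region-dependent weight pairs; the rounding offset of 4 before the 3-bit right shift is an assumption consistent with the weights in each pair summing to 8, and all names are illustrative.

    # (w_intra_i, w_inter_i) for i = 1..4; index 0 is closest to the reference samples.
    WEIGHTS = [(6, 2), (5, 3), (3, 5), (2, 6)]

    def combine(pred_intra: int, pred_inter: int, region: int) -> int:
        w_intra, w_inter = WEIGHTS[region]
        # Weighted sum followed by a 3-bit right shift (the weights sum to 8).
        return (w_intra * pred_intra + w_inter * pred_inter + 4) >> 3

    # Example: a sample in the region closest to the reference samples.
    print(combine(pred_intra=120, pred_inter=80, region=0))  # -> 110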
Local illumination compensation (LIC)
Local Illumination Compensation (LIC) is based on a linear model of illumination changes, using a scaling factor a and an offset b, and it is adaptively enabled or disabled for each inter-mode coded Coding Unit (CU). When LIC applies to a CU, the parameters a and b are derived by the least-squares error method using the neighboring samples of the current CU and their corresponding reference samples. More specifically, as shown in fig. 4, subsampled (2:1 subsampling) neighboring samples of the CU and the corresponding samples in the reference picture (identified by the motion information of the current CU or sub-CU) are used. The LIC parameters are derived and applied separately for each prediction direction.
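A minimal sketch (in Python) of the least-squares derivation of a and b over the neighboring sample pairs follows; floating-point arithmetic is used here for clarity, whereas a practical codec would use an integer formulation, and the fallback for a flat neighborhood is an illustrative choice.

    def derive_lic_params(neigh_cur, neigh_ref):
        # Least-squares fit of cur ~ a * ref + b over the neighboring samples.
        n = len(neigh_cur)
        sx = sum(neigh_ref)
        sy = sum(neigh_cur)
        sxx = sum(x * x for x in neigh_ref)
        sxy = sum(x * y for x, y in zip(neigh_ref, neigh_cur))
        denom = n * sxx - sx * sx
        if denom == 0:  # flat reference neighborhood: offset-only fallback
            return 1.0, (sy - sx) / n
        a = (n * sxy - sx * sy) / denom
        b = (sy - a * sx) / n
        return a, b

    def apply_lic(pred_samples, a, b, bit_depth=8):
        # Apply the linear model and clip to the valid sample range.
        max_val = (1 << bit_depth) - 1
        return [min(max_val, max(0, round(a * p + b))) for p in pred_samples]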
When a CU is coded in merge mode, the LIC flag is copied from neighboring blocks in a manner similar to the motion information copying in merge mode; otherwise, an LIC flag is signaled for the CU to indicate whether LIC applies.
When LIC is enabled for an image, an additional CU level RD check is needed to determine whether to apply LIC to the CU. When LIC is enabled for a CU, SAD and SATD are replaced with mean-removed sum of absolute differences (MR-SAD) and mean-removed sum of absolute Hadamard-transformed differences (MR-SATD), respectively, for integer-pixel motion search and fractional-pixel motion search.
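To make the mean-removed cost measure concrete, the sketch below (an illustration, not the reference implementation) shows MR-SAD; subtracting the per-block means makes the cost insensitive to the uniform illumination offset that LIC will compensate.

    def mr_sad(block_a, block_b):
        # Mean-removed sum of absolute differences of two equally sized blocks.
        mean_a = sum(block_a) / len(block_a)
        mean_b = sum(block_b) / len(block_b)
        return sum(abs((a - mean_a) - (b - mean_b))
                   for a, b in zip(block_a, block_b))

    # Blocks differing only by a constant offset have MR-SAD 0 but a large SAD.
    print(mr_sad([10, 20, 30, 40], [110, 120, 130, 140]))  # -> 0.0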
Image & image partitioning (image & block)
The encoder 20 may be used for receiving images 17 (or image data 17), e.g. via the input 201, e.g. images forming an image sequence of a video or video sequence. The received image or image data may also be a pre-processed image 19 (or pre-processed image data 19). For simplicity, the following description refers to image 17. The image 17 may also be referred to as a current image or an image to be coded (in particular in video coding, for distinguishing the current image from other images, such as previously coded and/or decoded images of the same video sequence (i.e. a video sequence also comprising the current image)).
The (digital) image is or can be considered as a two-dimensional array or matrix of samples having intensity values. The samples in the array may also be referred to as pixels (short form of picture elements) or "pels". The number of samples in the horizontal and vertical directions (or axes) of the array or image defines the size and/or resolution of the image. To represent color, three color components are typically employed, i.e., the image may be represented by or include three sample arrays. In the RGB format or color space, the image includes corresponding arrays of red, green, and blue samples. However, in video coding, each pixel is typically represented in a luminance/chrominance format or color space (e.g., YCbCr), which includes a luminance component indicated by Y (sometimes L is used instead) and two chrominance components indicated by Cb and Cr. The luminance (or luma) component Y represents the brightness or gray-level intensity (e.g., as in a gray-scale image), while the two chrominance (or chroma) components Cb and Cr represent the chromaticity or color information components. Accordingly, an image in YCbCr format includes a luminance sample array of luminance sample values (Y) and two chrominance sample arrays of chrominance values (Cb and Cr). An image in RGB format may be converted or transformed into YCbCr format and vice versa; this process is also known as color transformation or conversion. If the image is monochrome, the image may include only an array of luma samples. Accordingly, an image may be, for example, an array of luma samples in monochrome format, or an array of luma samples and two corresponding arrays of chroma samples in 4:2:0, 4:2:2, or 4:4:4 color format.
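As a worked example of the color transformation mentioned above, the sketch below applies a full-range BT.601 RGB-to-YCbCr conversion; the exact coefficients depend on the color standard in use and are an assumption here, not mandated by this disclosure.

    def rgb_to_ycbcr(r, g, b):
        # Full-range BT.601; inputs and outputs nominally in [0, 255].
        y = 0.299 * r + 0.587 * g + 0.114 * b
        cb = 128 + 0.564 * (b - y)  # scaled blue-difference chroma
        cr = 128 + 0.713 * (r - y)  # scaled red-difference chroma
        return y, cb, cr

    print(rgb_to_ycbcr(255, 0, 0))  # pure red -> high Cr, Cb below 128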
An embodiment of the video encoder 20 may comprise an image dividing unit (not shown in fig. 2) for dividing the image 17 into a plurality of (typically non-overlapping) image blocks 203. These blocks may also be referred to as root blocks, macroblocks (h.264/AVC), or Coding Tree Blocks (CTBs) or Coding Tree Units (CTUs) (h.265/HEVC and VVC). The image dividing unit may be adapted to use the same block size for all images of the video sequence and to use a corresponding grid defining the above block sizes, or to change the block size between images or subsets or groups of images and to divide each image into corresponding blocks.
In other embodiments, the video encoder may be configured to receive the blocks 203 of the image 17 directly, e.g., to form one, several, or all of the blocks of the image 17. The image block 203 may also be referred to as a current image block or an image block to be coded.
Similar to the image 17, the image blocks 203 are or may be considered as a two-dimensional array or matrix of samples having intensity values (sample values), although the dimensions of the image blocks 203 are smaller than the image 17. In other words, block 203 may include, for example, one sample array (e.g., a luma array in the case of a monochrome image 17, or a luma array or a chroma array in the case of a color image 17) or three sample arrays (e.g., a luma array and two chroma arrays in the case of a color image 17), or any other number and/or kind of arrays, depending on the color format applied. The number of samples in the horizontal and vertical directions (or axes) of the block 203 defines the size of the block 203. Thus, a block may be, for example, an array of MxN (M columns by N rows) samples or an array of MxN transform coefficients.
The embodiment of video encoder 20 shown in fig. 2 may be used to encode image 17 on a block-by-block basis, e.g., to perform encoding and prediction for each block 203.
The embodiment of video encoder 20 shown in fig. 2 may also be used to divide and/or encode pictures by using slices (also referred to as video slices), where the pictures may be divided into or encoded using one or more slices (typically non-overlapping), and each slice may include one or more blocks (e.g., CTUs).
The embodiment of the video encoder 20 shown in fig. 2 may also be used to divide and/or encode pictures by using tile groups (also referred to as video tile groups) and/or tiles (also referred to as video tiles), wherein a picture may be divided into one or more tile groups (typically non-overlapping) or encoded using one or more tile groups (typically non-overlapping), and each tile group may comprise, for example, one or more blocks (e.g., CTUs) or one or more tiles, wherein each tile may be, for example, rectangular in shape and may comprise one or more blocks (e.g., CTUs), such as complete or partial blocks.
Residual calculation
The residual calculation unit 204 may be configured to calculate a residual block 205 (also referred to as a residual 205) based on the image block 203 and the prediction block 265 (further details regarding the prediction block 265 are provided later), e.g. by subtracting sample values of the prediction block 265 from sample values of the image block 203 sample-by-sample (pixel-by-pixel) to obtain the residual block 205 in the sample domain.
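A sketch of this sample-wise subtraction (illustrative Python, with blocks represented as 2D lists):

    def residual_block(image_block, prediction_block):
        # Sample-by-sample residual in the sample domain.
        return [[cur - pred for cur, pred in zip(cur_row, pred_row)]
                for cur_row, pred_row in zip(image_block, prediction_block)]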
Transformation
The transform processing unit 206 may be configured to apply a transform, e.g. a Discrete Cosine Transform (DCT) or a Discrete Sine Transform (DST), to the sample values of the residual block 205 to obtain transform coefficients 207 in the transform domain. The transform coefficients 207 may also be referred to as transform residual coefficients and represent a residual block 205 in the transform domain.
The transform processing unit 206 may be used to apply integer approximations of DCT/DST, such as the transforms specified for h.265/HEVC. Such integer approximations are typically scaled by some factor compared to the orthogonal DCT transform. To preserve the norm of the residual block processed by the forward transform and the backward transform, an additional scaling factor is applied as part of the transform process. The scaling factor is typically selected based on certain constraints, e.g., the scaling factor is a power of 2 for the shift operation, the bit depth of the transform coefficients, a trade-off between accuracy and implementation cost, etc. For example, a particular scaling factor is specified, e.g., by inverse transform processing unit 212 for the inverse transform (and, at video decoder 30, inverse transform processing unit 312 for the corresponding inverse transform), and a corresponding scaling factor may be specified, e.g., by transform processing unit 206, for the forward transform at encoder 20 accordingly.
Embodiments of video encoder 20 (accordingly, transform processing unit 206) may be used to output transform parameters (e.g., a transform or transforms) directly or after encoding or compression via entropy encoding unit 270, such that, for example, video decoder 30 may receive and use the transform parameters for decoding.
Quantization
Quantization unit 208 may be configured to quantize transform coefficients 207, e.g., by applying scalar quantization or vector quantization, to obtain quantized coefficients 209. Quantized coefficients 209 may also be referred to as quantized transform coefficients 209 or quantized residual coefficients 209.
The quantization process may reduce the bit depth associated with some or all of transform coefficients 207. For example, a transform coefficient of n bits may be rounded down to a transform coefficient of m bits during quantization, where n is greater than m. The quantization level may be modified by adjusting a Quantization Parameter (QP). For example, for scalar quantization, different scaling may be applied to achieve finer or coarser quantization. Smaller quantization steps correspond to finer quantization and larger quantization steps correspond to coarser quantization. The applicable quantization step size may be indicated by a Quantization Parameter (QP). The quantization parameter may for example be an index to a predefined set of applicable quantization steps. For example, a small quantization parameter may correspond to a fine quantization (small quantization step size) and a large quantization parameter may correspond to a coarse quantization (large quantization step size), or vice versa. The quantization may comprise division by a quantization step size and the corresponding and/or above-described inverse quantization, e.g. by the inverse quantization unit 210, may comprise multiplication by the quantization step size. Embodiments according to some standards (e.g., HEVC) may be used to determine the quantization step size using a quantization parameter. In general, the quantization step size may be calculated based on the quantization parameter using a fixed point approximation of a formula including division. Additional scaling factors may be introduced for quantization and dequantization to recover the norm of the residual block, which may be modified due to the scaling used in the fixed-point approximation of the formula for the quantization step size and quantization parameter. In one example embodiment, the scaling of the inverse transform and the dequantization may be combined. Alternatively, a customized quantization table may be used and signaled from the encoder to the decoder, e.g., in the bitstream. Quantization is a lossy operation in which the loss increases as the quantization step size increases.
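To make the QP-to-step-size relationship concrete, the sketch below uses the HEVC-style mapping in which the quantization step size doubles every 6 QP values; the rounding offset in the quantizer is an illustrative choice, not the normative one, and the fixed-point formulation of a real codec is omitted for clarity.

    def quantization_step(qp: int) -> float:
        # HEVC-style: Qstep doubles every 6 QP values, with Qstep(4) = 1.
        return 2.0 ** ((qp - 4) / 6.0)

    def quantize(coeff: int, qp: int) -> int:
        step = quantization_step(qp)
        sign = -1 if coeff < 0 else 1
        return sign * int(abs(coeff) / step + 0.5)  # illustrative rounding

    def dequantize(level: int, qp: int) -> float:
        return level * quantization_step(qp)

    # QP 28 gives Qstep 16: coefficient 100 -> level 6 -> reconstructed 96.
    print(quantize(100, 28), dequantize(quantize(100, 28), 28))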
Embodiments of video encoder 20 (and, accordingly, quantization unit 208) may be used to output Quantization Parameters (QPs) directly or after encoding via entropy encoding unit 270, such that, for example, video decoder 30 may receive and apply the quantization parameters for decoding.
Inverse quantization
The inverse quantization unit 210 is configured to apply inverse quantization of the quantization unit 208 to the quantized coefficients, for example by applying an inverse of the quantization scheme applied by the quantization unit 208 based on or using the same quantization step as the quantization unit 208, to obtain dequantized coefficients 211. The dequantized coefficients 211 may also be referred to as dequantized residual coefficients 211 and correspond to the transform coefficients 207, but are typically not identical to the transform coefficients due to quantization losses.
Inverse transformation
The inverse transform processing unit 212 is configured to apply an inverse transform of the transform applied by the transform processing unit 206, e.g. an inverse Discrete Cosine Transform (DCT) or an inverse Discrete Sine Transform (DST) or other inverse transform, to obtain a reconstructed residual block 213 (or corresponding dequantized coefficients 213) in the sample domain. The reconstructed residual block 213 may also be referred to as a transform block 213.
Reconstruction
The reconstruction unit 214 (e.g., adder or summer 214) is used to add the transform block 213 (i.e., the reconstructed residual block 213) to the prediction block 265 to obtain a reconstructed block 215 in the sample domain, e.g., by adding sample values of the reconstructed residual block 213 sample by sample to sample values of the prediction block 265.
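A sketch of the sample-wise addition (illustrative Python); clipping to the valid sample range is included here, as practical implementations clip even though it is not spelled out above.

    def reconstruct_block(residual, prediction, bit_depth=8):
        # Sample-by-sample addition, clipped to [0, 2^bit_depth - 1].
        max_val = (1 << bit_depth) - 1
        return [[min(max_val, max(0, r + p)) for r, p in zip(res_row, pred_row)]
                for res_row, pred_row in zip(residual, prediction)]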
Filtering
The loop filter unit 220 (or simply "loop filter" 220) is used to filter the reconstructed block 215 to obtain a filtered block 221, or in general to filter the reconstructed samples to obtain filtered samples. The loop filter unit is used, for example, to smooth pixel transitions or otherwise improve video quality. Loop filter unit 220 may include one or more loop filters, such as a deblocking filter, a sample-adaptive offset (SAO) filter, or one or more other filters (e.g., a bilateral filter, an Adaptive Loop Filter (ALF), a sharpening filter, a smoothing filter, or a collaborative filter, or any combination thereof). Although loop filter unit 220 is illustrated in fig. 2 as an in-loop filter, in other configurations, loop filter unit 220 may be implemented as a post-loop filter. The filtered block 221 may also be referred to as a filtered reconstructed block 221.
Embodiments of video encoder 20 (accordingly, loop filter unit 220) may be used to output loop filter parameters (e.g., sample adaptive offset information) directly or after encoding via entropy encoding unit 270, such that, for example, decoder 30 may receive and apply the same loop filter parameters or a corresponding loop filter for decoding.
Decoded picture buffer
Decoded Picture Buffer (DPB)230 may be a memory that stores reference pictures (or, in general, reference picture data) for use in encoding video data by video encoder 20. DPB 230 may be formed from any of a variety of memory devices, such as Dynamic Random Access Memory (DRAM) (including Synchronous DRAM (SDRAM)), Magnetoresistive RAM (MRAM), Resistive RAM (RRAM), or other types of memory devices. A Decoded Picture Buffer (DPB)230 may be used to store one or more filtered blocks 221. The decoded picture buffer 230 may also be used to store other previous filtered blocks (e.g., previous reconstructed and filtered blocks 221) of the same current picture or a different picture (e.g., a previous reconstructed picture), and may provide a complete previous reconstructed (i.e., decoded) picture (and corresponding reference blocks and samples) and/or a partially reconstructed current picture (and corresponding reference blocks and samples), e.g., for inter prediction. For example, if reconstructed block 215 (or any other further processed version of the reconstructed block or samples) is not filtered by loop filter unit 220, Decoded Picture Buffer (DPB)230 may also be used to store one or more unfiltered reconstructed blocks 215 (or generally unfiltered reconstructed samples).
Mode selection (partitioning & prediction)
Mode selection unit 260 includes a partitioning unit 262, an inter-prediction unit 244, and an intra-prediction unit 254, and is configured to receive or obtain original image data (e.g., original block 203, the current block 203 of current image 17) and reconstructed image data (e.g., filtered and/or unfiltered reconstructed samples or blocks of the same (current) image and/or of one or more previously decoded images, e.g., from decoded image buffer 230 or other buffers such as a line buffer (not shown)). The reconstructed image data is used as reference image data for prediction (e.g., inter prediction or intra prediction) to obtain a prediction block 265 or predictor 265.
Mode selection unit 260 may be used to determine or select a partition and a prediction mode (e.g., intra-prediction mode or inter-prediction mode) for the current block prediction mode (excluding the partition), and generate a corresponding prediction block 265, the prediction block 265 being used to calculate residual block 205 and reconstructed block 215.
Embodiments of the mode selection unit 260 may be used to select the partitioning and prediction modes (e.g., selected from those modes supported by or available to the mode selection unit 260) that provide the best match or in other words the smallest residual (which means better compression for transmission or storage), or the smallest signaling overhead (which means better compression for transmission or storage), or both. The mode selection unit 260 may be configured to determine the partitioning and prediction modes based on Rate Distortion Optimization (RDO), i.e., to select a prediction mode that provides the smallest rate distortion. In this context, terms such as "best," "minimum," "optimal," and the like do not necessarily refer to "best," "minimum," "optimal," and the like as a whole, but may also refer to meeting a termination criterion or selection criterion (such as a value above or below a threshold (or other limit)) that potentially enables "suboptimal selection" but reduces complexity and processing time.
In other words, the dividing unit 262 may be configured to divide the block 203 into smaller block partitions or sub-blocks, e.g., iteratively using quad-tree partitions (QTs), binary partitions (BT), or triple-tree partitions (TT), or any combination thereof, and perform prediction, e.g., for each block partition or sub-block, wherein the mode selection includes selecting a tree structure of the divided block 203 and a prediction mode to be applied to each block partition or sub-block.
The partitioning (e.g., by partitioning unit 262) and prediction processing (by inter-prediction unit 244 and intra-prediction unit 254) performed by example video encoder 20 will be described in more detail below.
Partitioning
The dividing unit 262 may divide (or partition) the current block 203 into smaller partitions, e.g., smaller blocks of square or rectangular size. These small blocks (which may also be referred to as sub-blocks) may be further divided into even smaller partitions. This is also referred to as tree partitioning or hierarchical tree partitioning, wherein, for example, a root block at root level 0 (level 0, depth 0) may be recursively partitioned, e.g., into two or more blocks at a next lower tree level (e.g., a node at tree level 1 (level 1, depth 1)), wherein these blocks may be partitioned again into two or more blocks at a next lower level (e.g., tree level 2 (level 2, depth 2)), etc., until the partitioning is terminated, e.g., because a termination criterion is met (e.g., a maximum tree depth or a minimum block size is reached). The blocks that are not further divided are also referred to as leaf blocks or leaf nodes of the tree. The tree that uses partitioning to obtain two partitions is called a Binary Tree (BT), the tree that uses partitioning to obtain three partitions is called a Ternary Tree (TT), and the tree that uses partitioning to obtain four partitions is called a Quadtree (QT).
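A minimal recursive sketch (illustrative Python) of the quadtree case described above: a square block splits into four equal sub-blocks until a caller-supplied termination criterion, e.g. a hypothetical RD-cost test, or the minimum block size is reached.

    MIN_BLOCK_SIZE = 8  # illustrative minimum leaf size

    def quadtree_partition(x, y, size, should_split):
        # Returns the leaf blocks as (x, y, size) tuples.
        if size <= MIN_BLOCK_SIZE or not should_split(x, y, size):
            return [(x, y, size)]
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += quadtree_partition(x + dx, y + dy, half, should_split)
        return leaves

    # Example: split a 128x128 CTU uniformly down to 32x32 leaves.
    print(len(quadtree_partition(0, 0, 128, lambda x, y, s: s > 32)))  # -> 16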
As previously mentioned, the term "block" as used herein may be a portion (in particular a square portion or a rectangular portion) of an image. For example, referring to HEVC and VVC, a block may be or correspond to a Coding Tree Unit (CTU), a Coding Unit (CU), a Prediction Unit (PU), and a Transform Unit (TU), and/or to a corresponding block, such as a Coding Tree Block (CTB), a Coding Block (CB), a Transform Block (TB), or a Prediction Block (PB).
For example, a Coding Tree Unit (CTU) may be or include a CTB of luma samples and two corresponding CTBs of chroma samples of a picture having three sample arrays, or a CTB of samples of a monochrome picture or of a picture coded using three separate color planes and syntax structures used to code the samples. Accordingly, a Coding Tree Block (CTB) may be an NxN block of samples (for some value of N), such that the division of a component into CTBs is a partitioning. A Coding Unit (CU) may be or include a coding block of luma samples and two corresponding coding blocks of chroma samples of a picture having three sample arrays, or a coding block of samples of a monochrome picture or of a picture coded using three separate color planes and syntax structures used to code the samples. Accordingly, a Coding Block (CB) may be an MxN block of samples (for some values of M and N), such that the division of a CTB into coding blocks is a partitioning.
In an embodiment, for example, according to HEVC, a Coding Tree Unit (CTU) may be partitioned into CUs by using a quadtree structure represented as a coding tree. The decision whether to use inter-picture (temporal) prediction or intra-picture (spatial) prediction to encode a picture region is made at the level of the CU. Each CU may be further partitioned into one, two, or four PUs depending on the PU partition type. Within a PU, the same prediction process is applied and the relevant information is sent to the decoder on a PU basis. After a residual block is obtained by applying a prediction process based on a PU partition type, a CU may be divided into Transform Units (TUs) according to another quadtree structure similar to a coding tree of the CU.
In an embodiment, the coding blocks are partitioned, for example, using combined quad-tree and binary tree (QTBT) partitioning according to the latest video coding standard currently in development, referred to as Versatile Video Coding (VVC). In the QTBT block structure, a CU may be square or rectangular in shape. For example, a Coding Tree Unit (CTU) is first divided by a quadtree structure. The quadtree leaf nodes are further divided by a binary tree or ternary tree structure. The partitioned leaf nodes are called Coding Units (CUs), and this segmentation is used for prediction and transform processing without any further partitioning. This means that CUs, PUs, and TUs have the same block size in the QTBT coding block structure. In parallel, multiple partition types (e.g., ternary tree partitioning) may be used together with the QTBT block structure.
In one example, mode select unit 260 of video encoder 20 may be used to perform any combination of the partitioning techniques described herein.
As described above, video encoder 20 is used to determine or select a best or optimal prediction mode from a set of (e.g., predetermined) prediction modes. The set of prediction modes may include, for example, intra-prediction modes and/or inter-prediction modes.
Intra prediction
The set of intra prediction modes may include 35 different intra prediction modes as defined in HEVC, e.g., non-directional modes such as DC (or mean) mode and planar mode, or directional modes, or may include 67 different intra prediction modes as defined for VVC, e.g., non-directional modes such as DC (or mean) mode and planar mode, or directional modes.
The intra-prediction unit 254 is configured to generate an intra-prediction block 265 using reconstructed samples of neighboring blocks of the same current picture according to an intra-prediction mode of the set of intra-prediction modes.
Intra-prediction unit 254 (or, in general, mode selection unit 260) is also configured to output intra-prediction parameters (or, in general, information indicating the selected intra-prediction mode of the block) in the form of syntax elements 266 to entropy encoding unit 270 for inclusion in encoded image data 21 so that, for example, video decoder 30 may receive and use the prediction parameters for decoding.
Inter prediction
The set of (or possible) inter prediction modes depends on the available reference pictures (i.e., previous, at least partially decoded pictures stored, e.g., in DPB 230) and other inter prediction parameters, e.g., whether the whole reference picture or only a part of it (e.g., a search window area around the area of the current block) is used to search for a best matching reference block, and/or whether pixel interpolation, e.g., half-pel and/or quarter-pel interpolation, is applied.
In addition to the prediction mode described above, a skip mode and/or a direct mode may be applied.
The inter prediction unit 244 may include a Motion Estimation (ME) unit and a Motion Compensation (MC) unit (both not shown in fig. 2). The motion estimation unit may be configured to receive or obtain an image block 203 (a current image block 203 of a current image 17) and a decoded image 231, or at least one or more previously reconstructed blocks (e.g., reconstructed blocks of one or more other/different previously decoded images 231) for use in motion estimation. For example, the video sequence may comprise a current picture and a previously decoded picture 231, or in other words, the current picture and the previously decoded picture 231 may be part of or form the sequence of pictures forming the video sequence.
The encoder 20 may for example be configured to select a reference block from a plurality of reference blocks of the same or different one of a plurality of other pictures and to provide the reference picture (or reference picture index) and/or an offset (spatial offset) between the position (x, y coordinates) of the reference block and the position of the current block as inter prediction parameters to the motion estimation unit. This offset is also called a Motion Vector (MV).
The motion compensation unit is configured to obtain (e.g., receive) an inter prediction parameter and perform inter prediction based on or using the inter prediction parameter to obtain an inter prediction block 265. The motion compensation performed by the motion compensation unit may involve extracting or generating a prediction block based on the motion/block vector determined by motion estimation, possibly performing interpolation to sub-pixel precision. Interpolation filtering may generate additional pixel samples from known pixel samples (a simplified sketch follows), potentially increasing the number of candidate prediction blocks that may be used to code an image block. Upon receiving the motion vector for the PU of the current image block, the motion compensation unit may locate the prediction block to which the motion vector points in one of the reference picture lists.
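As an illustration of how interpolation generates additional samples, the following is a deliberately simplified C++ sketch of half-pel interpolation along one row using a 2-tap (bilinear) filter; HEVC and VVC actually use longer filters (e.g., 8-tap for luma), so this is a sketch of the principle only, not of any standardized filter, and all names are hypothetical.

#include <cstdio>
#include <vector>

// Half-pel interpolation along one row with a 2-tap (bilinear) filter:
// between every two integer-position samples, one additional half-pel
// sample is generated as their rounded average.
std::vector<int> halfPelRow(const std::vector<int>& row) {
    std::vector<int> out;
    for (size_t i = 0; i + 1 < row.size(); ++i) {
        out.push_back(row[i]);                          // integer position
        out.push_back((row[i] + row[i + 1] + 1) >> 1);  // half-pel position
    }
    out.push_back(row.back());
    return out;
}

int main() {
    for (int s : halfPelRow({10, 20, 40}))
        std::printf("%d ", s);  // prints: 10 15 20 30 40
}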
Motion compensation unit may also generate syntax elements associated with the block and the video slice for use by video decoder 30 in decoding image blocks of the video slice. In addition to (or instead of) slices and corresponding syntax elements, tile groups and/or tiles and corresponding syntax elements may be generated or used.
Entropy coding
Entropy encoding unit 270 is configured to apply, for example, an entropy encoding algorithm or scheme (e.g., a variable length coding (VLC) scheme, a context adaptive VLC (CAVLC) scheme, an arithmetic coding scheme, binarization, context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding method or technique) or bypass (no compression) to the quantized coefficients 209, inter prediction parameters, intra prediction parameters, loop filter parameters, and/or other syntax elements to obtain encoded image data 21 that may be output via output 272, e.g., in the form of an encoded bitstream 21, so that, for example, video decoder 30 may receive and use these parameters for decoding. The encoded bitstream 21 may be sent to video decoder 30 or stored in memory for later transmission or retrieval by video decoder 30.
Other structural variations of video encoder 20 may be used to encode the video stream. For example, a non-transform based encoder 20 may quantize the residual signal directly, without transform processing unit 206, for certain blocks or frames. In another implementation, encoder 20 may combine quantization unit 208 and inverse quantization unit 210 into a single unit.
Decoder and decoding method
Fig. 3 shows an example of a video decoder 30 for implementing the techniques of the present application. The video decoder 30 is configured to receive encoded image data 21 (e.g., encoded bitstream 21), for example, encoded by the encoder 20, to obtain a decoded image 331. The encoded image data or bitstream includes information for decoding the encoded image data, such as data representing image blocks and associated syntax elements of an encoded video slice (and/or group of blocks or tiles).
In the example of fig. 3, the decoder 30 may include an entropy decoding unit 304, an inverse quantization unit 310, an inverse transform processing unit 312, a reconstruction unit 314 (e.g., a summer 314), a loop filter 320, a decoded picture buffer (DPB) 330, a mode application unit 360, an inter prediction unit 344, and an intra prediction unit 354. The inter prediction unit 344 may be or include a motion compensation unit. In some examples, video decoder 30 may perform a decoding process that is generally the reverse of the encoding process described with respect to video encoder 20 in fig. 2.
As explained with regard to encoder 20, inverse quantization unit 210, inverse transform processing unit 212, reconstruction unit 214, loop filter 220, decoded picture buffer (DPB) 230, inter prediction unit 244, and intra prediction unit 254 are also referred to as forming the "built-in decoder" of video encoder 20. Accordingly, inverse quantization unit 310 may be functionally identical to inverse quantization unit 210, inverse transform processing unit 312 may be functionally identical to inverse transform processing unit 212, reconstruction unit 314 may be functionally identical to reconstruction unit 214, loop filter 320 may be functionally identical to loop filter 220, and decoded picture buffer 330 may be functionally identical to decoded picture buffer 230. Accordingly, the explanations provided for the respective units and functions of video encoder 20 apply correspondingly to the respective units and functions of video decoder 30.
Entropy decoding
Entropy decoding unit 304 is to parse bitstream 21 (or generally encoded image data 21) and perform, for example, entropy decoding on encoded image data 21 to obtain, for example, quantized coefficients 309 and/or decoded coding parameters (not shown in fig. 3), such as any or all of inter-prediction parameters (e.g., reference picture indices and motion vectors), intra-prediction parameters (e.g., intra-prediction modes or indices), transform parameters, quantization parameters, loop filter parameters, and/or other syntax elements. Entropy decoding unit 304 may be used to apply a decoding algorithm or scheme corresponding to the encoding scheme described with respect to entropy encoding unit 270 of encoder 20. Entropy decoding unit 304 may also be used to provide inter-prediction parameters, intra-prediction parameters, and/or other syntax elements to mode application unit 360, and to provide other parameters to other units of decoder 30. Video decoder 30 may receive syntax elements at the video slice level and/or the video block level. In addition to (or instead of) slices and corresponding syntax elements, tile groups and/or tiles and corresponding syntax elements may be received and/or used.
Inverse quantization
Inverse quantization unit 310 may be configured to receive quantization parameters (QPs) (or, in general, information related to the inverse quantization) and quantized coefficients from encoded image data 21 (e.g., by parsing and/or decoding, e.g., by entropy decoding unit 304), and to apply, based on the quantization parameters, an inverse quantization to the decoded quantized coefficients 309 to obtain dequantized coefficients 311, which may also be referred to as transform coefficients 311. The inverse quantization process may include use of a quantization parameter determined by video encoder 20 for each video block in the video slice (or tile or tile group) to determine the degree of quantization and, likewise, the degree of inverse quantization that should be applied; a simplified sketch follows.
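For illustration, the following is a minimal C++ sketch of the principle of inverse quantization, assuming the approximate HEVC relationship Qstep = 2^((QP - 4) / 6); real codecs use integer scaling lists and per-block normalization, which are omitted here, and all names are hypothetical.

#include <cmath>
#include <cstdio>

// Simplified inverse quantization: each decoded coefficient level is scaled
// by a quantization step derived from QP via the approximate HEVC law
// Qstep = 2^((QP - 4) / 6). This floating-point form is only illustrative.
double dequantize(int level, int qp) {
    double qstep = std::pow(2.0, (qp - 4) / 6.0);
    return level * qstep;
}

int main() {
    std::printf("%.2f\n", dequantize(3, 22));  // Qstep(22) = 8, so prints 24.00
}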
Inverse transformation
The inverse transform processing unit 312 may be configured to receive the dequantized coefficients 311 (also referred to as transform coefficients 311) and apply a transform to the dequantized coefficients 311 to obtain reconstructed residual blocks 313 in the sample domain. The reconstructed residual blocks 313 may also be referred to as transform blocks 313. The transform may be an inverse transform, e.g., an inverse DCT, an inverse DST, an inverse integer transform, or a conceptually similar inverse transform process. Inverse transform processing unit 312 may also be configured to receive transform parameters or corresponding information from encoded image data 21 (e.g., by parsing and/or decoding, e.g., by entropy decoding unit 304) to determine the transform to be applied to the dequantized coefficients 311.
Reconstruction
The reconstruction unit 314 (e.g., adder or summer 314) may be used to add the reconstructed residual block 313 to the prediction block 365 (e.g., by adding sample values of the reconstructed residual block 313 to sample values of the prediction block 365) to obtain a reconstructed block 315 in the sample domain.
Filtering
The loop filter unit 320 is used (within or after the codec loop) to filter the reconstruction block 315 to obtain a filtered block 321, e.g. for smoothing pixel transitions, or to otherwise improve video quality. Loop filter unit 320 may include one or more loop filters, such as a deblocking filter, a Sample Adaptive Offset (SAO) filter, or one or more other filters (e.g., a bilateral filter, an Adaptive Loop Filter (ALF), a sharpening filter, a smoothing filter, or a collaborative filter, or any combination thereof). Although loop filter unit 320 is shown in fig. 3 as an in-loop filter, in other configurations, loop filter unit 320 may be implemented as a post-loop filter.
Decoded picture buffer
Decoded video block 321 of the picture is then stored in decoded picture buffer 330, and decoded picture buffer 330 stores decoded picture 331 as a reference picture for subsequent motion compensation of other pictures and/or for output display, respectively.
Decoder 30 is configured to output the decoded image 331, e.g., via output 332, for presentation to or viewing by a user.
Prediction
The inter prediction unit 344 may be functionally identical to the inter prediction unit 244 (in particular, to the motion compensation unit), and the intra prediction unit 354 may be functionally identical to the intra prediction unit 254, and they perform split or partitioning decisions and prediction based on the partitioning and/or prediction parameters or corresponding information received from the encoded image data 21 (e.g., by parsing and/or decoding, e.g., by entropy decoding unit 304). The mode application unit 360 may be configured to perform the prediction (intra or inter prediction) per block based on reconstructed images, blocks, or respective samples (filtered or unfiltered) to obtain the prediction block 365.
When the video slice is coded as an intra-coded (I) slice, the intra prediction unit 354 of the mode application unit 360 is configured to generate a prediction block 365 for an image block of the current video slice based on the signaled intra prediction mode and data from previously decoded blocks of the current picture. When the video picture is coded as an inter-coded (i.e., B or P) slice, the inter prediction unit 344 (e.g., the motion compensation unit) of the mode application unit 360 is configured to generate a prediction block 365 for a video block of the current video slice based on the motion vectors and other syntax elements received from the entropy decoding unit 304. For inter prediction, the prediction blocks may be generated from one of the reference pictures within one of the reference picture lists. Video decoder 30 may construct the reference frame lists, list 0 and list 1, using default construction techniques based on the reference pictures stored in DPB 330. The same or similar process may be applied for or by embodiments using tile groups (e.g., video tile groups) and/or tiles (e.g., video tiles) in addition or alternatively to slices (e.g., video slices), e.g., I, P, or B tile groups and/or tiles may be used.
Mode application unit 360 is to determine prediction information for video blocks of a current video slice by parsing motion vectors or related information and other syntax elements, and to use the prediction information to generate a prediction block for the current video block being decoded. For example, the mode application unit 360 determines a prediction mode (e.g., intra prediction or inter prediction) for encoding video blocks of a video slice, an inter prediction slice type (e.g., a B-slice, a P-slice, or a GPB-slice), construction information for one or more reference picture lists of the slice, a motion vector for each inter-coded video block of the slice, an inter prediction state for each inter-coded video block of the slice, and other information for decoding video blocks in a current video slice using some of the received syntax elements. In addition to (or instead of) slices (e.g., video slices), the same or similar content may be applied to or through embodiments using tile groups (e.g., video tile groups) and/or tiles (e.g., video tiles), for example I, P, or B tile groups and/or tiles may be used.
The embodiment of video decoder 30 shown in fig. 3 may be configured to partition and/or decode a picture by using slices (also referred to as video slices), where a picture may be partitioned into or decoded using one or more (typically non-overlapping) slices, and each slice may include one or more blocks (e.g., CTUs).
The embodiment of the video decoder 30 shown in fig. 3 may also be configured to partition and/or decode a picture by using tile groups (also referred to as video tile groups) and/or tiles (also referred to as video tiles), where a picture may be partitioned into or decoded using one or more (typically non-overlapping) tile groups, and each tile group may include, e.g., one or more blocks (e.g., CTUs) or one or more tiles, where each tile may be, e.g., of rectangular shape and may include one or more blocks (e.g., CTUs), e.g., complete or fractional blocks.
Other variations of video decoder 30 may be used to decode encoded image data 21. For example, decoder 30 may generate an output video stream without loop filtering unit 320. For example, the non-transform based decoder 30 may directly inverse quantize the residual signal without the need for an inverse transform processing unit 312 for certain blocks or frames. In another embodiment, video decoder 30 may combine inverse quantization unit 310 and inverse transform processing unit 312 into a single unit.
It should be understood that in the encoder 20 and the decoder 30, the processing result of the current step may be further processed and then output to the next step. For example, after interpolation filtering, motion vector derivation, or loop filtering, further operations (such as clipping or shifting) may be performed on the processing result of interpolation filtering, motion vector derivation, or loop filtering.
It should be noted that further operations may be applied to the derived motion vectors of the current block (including but not limited to control point motion vectors of affine mode, sub-block motion vectors in affine, planar, and ATMVP modes, temporal motion vectors, and so on). For example, the value of a motion vector is constrained to a predefined range according to its representation bit depth. If the representation bit depth of the motion vector is bitDepth, the range is -2^(bitDepth-1) to 2^(bitDepth-1)-1, where "^" represents exponentiation. For example, if bitDepth is set equal to 16, the range is -32768 to 32767; if bitDepth is set equal to 18, the range is -131072 to 131071. For example, the values of the derived motion vectors (e.g., the MVs of four 4x4 sub-blocks within an 8x8 block) are constrained such that the maximum difference between the integer parts of the MVs of the four 4x4 sub-blocks does not exceed N pixels, e.g., 1 pixel. Two methods for constraining motion vectors according to bitDepth are provided herein.
Method 1: remove the overflow Most Significant Bit (MSB) by the following operations:
ux = ( mvx + 2^bitDepth ) % 2^bitDepth (1)
mvx = ( ux >= 2^(bitDepth-1) ) ? ( ux - 2^bitDepth ) : ux (2)
uy = ( mvy + 2^bitDepth ) % 2^bitDepth (3)
mvy = ( uy >= 2^(bitDepth-1) ) ? ( uy - 2^bitDepth ) : uy (4)
wherein mvx is a horizontal component of a motion vector of a picture block or sub-block, mvy is a vertical component of a motion vector of a picture block or sub-block, and ux and uy represent intermediate values;
For example, if the value of mvx is -32769, the resulting value after applying equations (1) and (2) is 32767. In a computer system, decimal numbers are stored as two's complement. The two's complement of -32769 is 1,0111,1111,1111,1111 (17 bits); the MSB is then discarded, so the resulting two's complement is 0111,1111,1111,1111 (decimal 32767), which is the same as the output of applying equations (1) and (2).
ux = ( mvpx + mvdx + 2^bitDepth ) % 2^bitDepth (5)
mvx = ( ux >= 2^(bitDepth-1) ) ? ( ux - 2^bitDepth ) : ux (6)
uy = ( mvpy + mvdy + 2^bitDepth ) % 2^bitDepth (7)
mvy = ( uy >= 2^(bitDepth-1) ) ? ( uy - 2^bitDepth ) : uy (8)
These operations may be applied during the summation of mvp and mvd as shown in equations (5) to (8).
Method 2: remove the overflow MSB by clipping the value:
vx = Clip3( -2^(bitDepth-1), 2^(bitDepth-1) - 1, vx )
vy = Clip3( -2^(bitDepth-1), 2^(bitDepth-1) - 1, vy )
where vx is the horizontal component of the motion vector of an image block or sub-block, vy is the vertical component of the motion vector of an image block or sub-block, and x, y, and z correspond to the three input values of the MV clipping process. The function Clip3 is defined as follows:
Clip3( x, y, z ) = x if z < x; y if z > y; z otherwise.
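For illustration, the following is a minimal C++ sketch of both constraint methods for bitDepth = 16, reproducing the worked example above; the names (wrapMv, clipMv, clip3) are hypothetical and not taken from any standard or reference implementation.

#include <cstdint>
#include <cstdio>

const int bitDepth = 16;  // MV range -32768..32767

// Method 1: wrap by removing the overflow MSB, per equations (1) to (4).
// Note: the C++ % operator requires mv + 2^bitDepth to be non-negative here,
// which holds for values within one wrap of the representable range.
int64_t wrapMv(int64_t mv) {
    int64_t u = (mv + (1LL << bitDepth)) % (1LL << bitDepth);           // eq. (1)/(3)
    return (u >= (1LL << (bitDepth - 1))) ? u - (1LL << bitDepth) : u;  // eq. (2)/(4)
}

// Clip3 as defined above.
int64_t clip3(int64_t x, int64_t y, int64_t z) {
    return z < x ? x : (z > y ? y : z);
}

// Method 2: clip to the representable range.
int64_t clipMv(int64_t mv) {
    return clip3(-(1LL << (bitDepth - 1)), (1LL << (bitDepth - 1)) - 1, mv);
}

int main() {
    // Reproduces the worked example: -32769 wraps to 32767 under method 1
    // and clips to -32768 under method 2.
    std::printf("wrap: %lld  clip: %lld\n",
                (long long)wrapMv(-32769), (long long)clipMv(-32769));
}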
Fig. 4 is a schematic diagram of a video codec device 400 according to an embodiment of the present disclosure. The video codec device 400 is suited for implementing the disclosed embodiments described herein. In an embodiment, the video codec device 400 may be a decoder (e.g., the video decoder 30 of fig. 1A) or an encoder (e.g., the video encoder 20 of fig. 1A).
The video codec device 400 includes an ingress port 410 (or an input port 410) for receiving data and a receiver unit (Rx) 420; a processor, logic unit, or Central Processing Unit (CPU) 430 for processing data; a transmitter unit (Tx)440 and an egress port 450 (or output port 450) for transmitting data; and a memory 460 for storing data. The video codec device 400 may further include an optical-to-electrical (OE) component and an electrical-to-optical (EO) component coupled to the ingress port 410, the receiver unit 420, the transmitter unit 440, and the egress port 450 for outputting or inputting optical signals or electrical signals.
The processor 430 is implemented by hardware and software. Processor 430 may be implemented as one or more CPU chips, cores (e.g., as a multi-core processor), FPGAs, ASICs, and DSPs. Processor 430 is in communication with the ingress port 410, receiver unit 420, transmitter unit 440, egress port 450, and memory 460. The processor 430 includes a codec module 470. The codec module 470 implements the embodiments of the present disclosure described above. For instance, the codec module 470 implements, processes, prepares, or provides various codec operations. The inclusion of the codec module 470 therefore provides a substantial improvement to the functionality of the video codec device 400 and effects a transformation of the video codec device 400 to a different state. Alternatively, the codec module 470 is implemented as instructions stored in the memory 460 and executed by the processor 430.
The memory 460 may include one or more disks, tape drives, and solid-state drives and may be used as an overflow data storage device to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution. The memory 460 may be, for example, volatile and/or non-volatile and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random-access memory (SRAM).
Fig. 5 is a simplified block diagram of an apparatus 500 that may be used as either or both of the source device 12 and the destination device 14 shown in fig. 1A, according to an exemplary embodiment.
The processor 502 in the apparatus 500 may be a central processing unit. Alternatively, processor 502 may be any other type of device or devices capable of operating or processing information now existing or later developed. Although the disclosed embodiments may be practiced using a single processor (e.g., processor 502) as shown, advantages in speed and efficiency may be realized using multiple processors.
The memory 504 in the apparatus 500 may be a Read Only Memory (ROM) device or a Random Access Memory (RAM) device in one embodiment. Any other suitable type of storage device may be used for memory 504. The memory 504 may include code and data 506 that are accessed by the processor 502 using a bus 512. The memory 504 may also include an operating system 508 and application programs 510, the application programs 510 including at least one program that allows the processor 502 to perform the methods described herein. For example, applications 510 may include applications 1 through N, which also include video codec applications that perform the methods described herein.
The apparatus 500 may also include one or more output devices, such as a display 518. In one example, display 518 may be a touch-sensitive display that combines a display with touch-sensitive elements operable to sense touch input. A display 518 may be coupled to the processor 502 via the bus 512.
Although described herein as a single bus, the bus 512 of the device 500 may be comprised of multiple buses. Further, the secondary memory 514 may be directly coupled to other components of the apparatus 500 or may be accessible via a network and may comprise a single integrated unit (e.g., a memory card) or multiple units (e.g., multiple memory cards). Accordingly, the apparatus 500 may be implemented in various configurations.
An embodiment of the invention is based on LIC as described in CE10.5.2 (JVET-M0087). CE10.5.2 can be implemented within the basic pipeline structure with LIC, but in some cases an additional pipeline stall may still occur. One of these cases is a very simple intra prediction mode (e.g., DC mode or an integer-slope directional mode). This is often the case in the top-left corner of each tile when tiles are coded in parallel. In these cases (typically exploited by commercial turbo encoders), the latency of the motion compensation stage of the pipeline may be higher than that of intra prediction. In another scenario (e.g., multi-hypothesis intra prediction (combined inter-intra prediction, CIIP)), the intra pipeline at the beginning of a picture/tile or another coding unit may be faster than inter prediction for units using fractional MC. For such cases and the like, the prior-art LIC design has to wait for the end of the MC stage. Furthermore, LIC parameter estimation uses samples outside the current prediction unit, which increases memory bandwidth. Embodiments of the present invention seek to reduce the latency of LIC in the above and similar cases.
As shown in fig. 7a, the pipeline design proposed in this embodiment operates on non-fractional motion compensation data (samples taken from the reference frame without interpolation). This allows the processing of the LIC unit to start in parallel with the MC. Since these threads involve no filtering process, the memory bandwidth is kept at the same level as without LIC.
LIC parameter estimation is performed using unfiltered reference samples.
For inter prediction of a current block comprising a WxH array of samples CU[ i, j ], with available reference samples Top[ i ] (i = 0..W) and Left[ j ] (j = 0..H), and an available motion vector MV (mvx, mvy) stored with precision mvp, the LIC process comprises the following steps:
- Estimate the LIC parameters (linear approximation a*x + B) between the reference lines Top[ i ], Left[ j ] and the collocated samples extracted from the reference frame using the integer part of MV (mvx, mvy): (Imvx = (mvx >> mvp) << mvp, Imvy = (mvy >> mvp) << mvp). mvp is a constant that represents the MV precision. For example, mvp is 2.
-applying the above LIC parameters to each prediction direction separately.
In another possible embodiment, the LIC process comprises the following steps:
- Estimate the LIC parameters (linear approximation a*x + B) between the reference lines Top[ i ], Left[ j ] and the collocated samples extracted from the reference frame using the integer part of one component of MV (mvx, mvy): (Imvx = (mvx >> mvp) << mvp, Imvy = mvy). mvp is a constant that represents the MV precision. For example, mvp is 2.
-applying the above LIC parameters to each prediction direction separately.
Fig. 7b is an exemplary diagram of local illumination compensation (LIC) based on integer-pixel motion compensation. In the example shown in fig. 7b, arrow 701 corresponds to the actual MV (mvx, mvy) of the current block used in the motion compensation stage. In an embodiment of the invention, it is proposed to use the integer part of the MV, MVinteger (Imvx, Imvy), as indicated by arrow 702 in fig. 7b. The upper reference samples 703-706 and/or the left reference samples 710-713 are used for LIC parameter estimation.
FIG. 8 is a flowchart 800 illustrating an exemplary inter prediction of a current block by applying LIC. In step 802, the video codec device obtains reference samples for the current block. The video codec device may be a decoder (e.g., the video decoder 30 of fig. 1A-1B, 3), or an encoder (e.g., the video encoder 20 of fig. 1A-1B, 2), or the video codec device 400 of fig. 4, or the apparatus 500 of fig. 5.
The reference samples of the current block are obtained from reference lines Top [ i ], Left [ j ], where Top [ i ] denotes an available upper reference sample of the current block, i is 0 … W, and Left [ j ] denotes an available Left reference sample of the current block, j is 0 … H, W denotes a width of the current block, and H denotes a height of the current block.
In step 804, the video codec device obtains reference samples of a reference block, where at least one of the reference samples of the reference block is obtained by using the integer part of a fractional MV.
As an example, at least one of the reference samples of the reference block is obtained by:
(Imvx=(mvx>>mvp)<<mvp,Imvy=(mvy>>mvp)<<mvp),
where (Imvx, Imvy) is an MV of one of the reference samples, (mvx, mvy) is a fractional MV, and mvp is a constant.
As another example, at least one of the reference samples of the reference block is obtained by:
(Imvx=(mvx>>mvp)<<mvp,Imvy=mvy),
where (Imvx, Imvy) is an MV of one of the reference samples, (mvx, mvy) is a fractional MV, and mvp is a constant.
As other examples, at least one of the reference samples of the reference block is obtained by:
(Imvx=mvx,Imvy=(mvy>>mvp)<<mvp),
where (Imvx, Imvy) is an MV of one of the reference samples, (mvx, mvy) is a fractional MV, and mvp is a constant.
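For illustration, the following C++ sketch implements the three integer-part derivations listed above; the names (Mv, roundBoth, roundX, roundY) are hypothetical. Note that (mv >> mvp) << mvp zeroes the fractional bits, and the arithmetic right shift rounds negative components toward minus infinity.

#include <cstdio>

// The three integer-part derivations listed above. mvp is the constant
// giving the fractional MV precision, e.g., mvp = 2 for quarter-pel units.
struct Mv { int x, y; };

Mv roundBoth(Mv mv, int mvp) { return {(mv.x >> mvp) << mvp, (mv.y >> mvp) << mvp}; }
Mv roundX(Mv mv, int mvp)    { return {(mv.x >> mvp) << mvp, mv.y}; }
Mv roundY(Mv mv, int mvp)    { return {mv.x, (mv.y >> mvp) << mvp}; }

int main() {
    Mv mv{13, -7};              // quarter-pel units: 3.25 and -1.75 pixels
    Mv r = roundBoth(mv, 2);
    std::printf("(%d,%d) -> (%d,%d)\n", mv.x, mv.y, r.x, r.y);  // (13,-7) -> (12,-8)
}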
In step 806, the video codec device estimates a Local Illumination Compensation (LIC) parameter by using the reference samples of the current block and the reference samples of the reference block.
The LIC parameters include a and B, and the LIC parameters are estimated by linear approximation.
In step 808, the video codec device obtains the inter prediction of the current block according to the LIC parameters. The inter prediction of the current block satisfies Y = a*x + B, where Y is the inter prediction of the current block, x is a reference sample of the reference block, and a and B are the LIC parameters.
As disclosed in exemplary method 800, at least one of the reference samples of the reference block is obtained by using the integer part of the fractional MV. This scheme reduces latency by removing one pipeline stage, so inter prediction of the current block is simplified when the reference samples of the reference block would otherwise be fetched with a fractional MV. A sketch of the overall flow follows.
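For illustration, the following is a floating-point C++ sketch of steps 802-808, assuming the LIC parameters a and B are estimated by ordinary least squares over the neighbouring reference samples; actual implementations use fixed-point arithmetic and the specific estimation rule of the codec, which are not reproduced here, and all names are hypothetical.

#include <cstdio>
#include <vector>

// Sketch of steps 802-808: estimate the linear model Y = a*x + B by least
// squares between the current block's neighbouring reference samples (cur)
// and the collocated reference-block samples (ref) fetched with the
// integer-part MV, then apply the model to a reference sample.
void estimateLic(const std::vector<double>& ref, const std::vector<double>& cur,
                 double& a, double& b) {
    double n = ref.size(), sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (size_t i = 0; i < ref.size(); ++i) {
        sx += ref[i]; sy += cur[i];
        sxx += ref[i] * ref[i]; sxy += ref[i] * cur[i];
    }
    double den = n * sxx - sx * sx;
    a = (den != 0) ? (n * sxy - sx * sy) / den : 1.0;  // identity fallback
    b = (sy - a * sx) / n;
}

int main() {
    // Neighbouring samples of the current block vs. collocated reference samples.
    std::vector<double> ref{100, 110, 120, 130}, cur{90, 100, 110, 120};
    double a, b;
    estimateLic(ref, cur, a, b);
    double pred = a * 115 + b;  // apply Y = a*x + B to one reference sample
    std::printf("a=%.2f B=%.2f pred=%.1f\n", a, b, pred);  // a=1.00 B=-10.00 pred=105.0
}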
Fig. 9 is a block diagram illustrating an example structure of an apparatus 900 for inter-predicting a current block by applying LIC. The apparatus 900 is configured to perform the above method, and may include an estimating unit 910 and an obtaining unit 920.
The obtaining unit 920 is configured to obtain reference samples of a reference block, wherein at least one of the reference samples of the reference block is obtained by using an integer part of a fractional Motion Vector (MV).
As an example, at least one of the reference samples of the reference block is obtained by:
(Imvx=(mvx>>mvp)<<mvp,Imvy=(mvy>>mvp)<<mvp),
where (Imvx, Imvy) is an MV of one of the reference samples of the reference block, (mvx, mvy) is a fractional MV, and mvp is a constant.
As another example, at least one of the reference samples of the reference block is obtained by:
(Imvx=(mvx>>mvp)<<mvp,Imvy=mvy),
where (Imvx, Imvy) is an MV of one of the reference samples of the reference block, (mvx, mvy) is a fractional MV, and mvp is a constant.
As other examples, at least one of the reference samples of the reference block is obtained by:
(Imvx=mvx,Imvy=(mvy>>mvp)<<mvp),
where (Imvx, Imvy) is an MV of one of the reference samples of the reference block, (mvx, mvy) is a fractional MV, and mvp is a constant.
The estimation unit 910 is configured to estimate a Local Illumination Compensation (LIC) parameter by using a reference sample of the current block and a reference sample of the reference block. For example, the estimation unit 910 estimates the LIC parameters by linear approximation.
The obtaining unit 920 is configured to obtain inter prediction of the current block according to the LIC parameter.
The obtaining unit 920 is further configured to obtain the reference samples of the current block from reference lines Top [ i ], Left [ j ], where Top [ i ] represents an available upper reference sample of the current block, i = 0 … W, and W represents the width of the current block; and Left [ j ] represents an available Left reference sample of the current block, j = 0 … H, and H represents the height of the current block.
The estimation unit 910 is used to estimate the LIC parameters in parallel with performing Motion Compensation (MC).
As disclosed in the exemplary apparatus 900, at least one of the reference samples of the reference block is obtained by using the integer part of the fractional MV. This scheme reduces latency by removing one pipeline stage, so inter prediction of the current block is simplified when the reference samples of the reference block would otherwise be fetched with a fractional MV.
Applications of the encoding method and the decoding method shown in the above embodiments and systems using these methods are set forth below.
Fig. 10 is a block diagram illustrating a content provisioning system 3100 for implementing a content distribution service. The content provisioning system 3100 includes a capture device 3102, a terminal device 3106, and optionally a display 3126. Capture device 3102 communicates with terminal device 3106 via communication link 3104. The communication link may include the communication channel 13 described above. Communication link 3104 includes, but is not limited to, WIFI, ethernet, cable, wireless (3G/4G/5G), USB, or any type of combination thereof, and the like.
The capture device 3102 generates data, and the data may be encoded by the encoding method shown in the above-described embodiments. Alternatively, the capture device 3102 may distribute the data to a streaming server (not shown in the figure) that encodes the data and transmits the encoded data to the terminal device 3106. Capture device 3102 includes, but is not limited to, a camera, a smart phone or tablet, a computer or laptop, a video conferencing system, a PDA, a car mounted device, or any combination thereof, and the like. For example, capture device 3102 may include source device 12 described above. When the data includes video, the video encoder 20 included in the capturing device 3102 may actually perform a video encoding process. When the data includes audio (i.e., sound), an audio encoder included in the capture device 3102 may actually perform an audio encoding process. In some practical scenarios, the capture device 3102 distributes encoded video data and encoded audio data by multiplexing them together. In other practical scenarios, for example in a video conferencing system, encoded audio data and encoded video data are not multiplexed. The capture device 3102 distributes the encoded audio data and the encoded video data to the terminal device 3106, respectively.
In the content supply system 3100, the terminal device 3106 receives and reproduces the encoded data. The terminal device 3106 may be a device with data receiving and recovering capability, such as a smartphone or tablet 3108, a computer or laptop 3110, a Network Video Recorder (NVR)/Digital Video Recorder (DVR) 3112, a television 3114, a Set Top Box (STB) 3116, a video conferencing system 3118, a video surveillance system 3120, a Personal Digital Assistant (PDA) 3122, a vehicle-mounted device 3124, or any combination thereof, or any device capable of decoding the above encoded data, and so on. For example, terminal device 3106 may include destination device 14 as described above. When the encoded data includes video, the video decoder 30 included in the terminal device prioritizes video decoding. When the encoded data includes audio, the audio decoder included in the terminal device prioritizes audio decoding processing.
For terminal devices with displays, such as a smartphone or tablet 3108, a computer or laptop 3110, a Network Video Recorder (NVR)/Digital Video Recorder (DVR)3112, a television 3114, a Personal Digital Assistant (PDA)3122, or a car mounted device 3124, the terminal device may send decoded data to its display. For a terminal device without a display, such as STB 3116, video conferencing system 3118, or video surveillance system 3120, the external display 3126 is connected to the terminal device to receive and display the decoded data.
The image encoding apparatus or the image decoding apparatus shown in the above-described embodiments may be used when each apparatus in the present system performs encoding or decoding.
Fig. 11 is a diagram showing an example structure of the terminal device 3106. After the terminal device 3106 receives the stream from the capture device 3102, the protocol processing unit 3202 analyzes the transmission protocol of the stream. Protocols include, but are not limited to, Real-Time Streaming Protocol (RTSP), Hypertext Transfer Protocol (HTTP), HTTP Live Streaming protocol (HLS), MPEG-DASH, Real-Time Transport Protocol (RTP), Real-Time Messaging Protocol (RTMP), or any combination thereof, among others.
After the protocol processing unit 3202 processes the stream, a stream file is generated. The file is output to the demultiplexing unit 3204. The demultiplexing unit 3204 may separate the multiplexed data into encoded audio data and encoded video data. As described above, in some practical scenarios, for example, in a video conferencing system, the encoded audio data and the encoded video data are not multiplexed. In this case, the encoded data is transmitted to the video decoder 3206 and the audio decoder 3208 without passing through the demultiplexing unit 3204.
Via demultiplexing, a video Elementary Stream (ES), an audio ES, and optionally subtitles are generated. The video decoder 3206 (including the video decoder 30 as set forth in the above embodiment) decodes the video ES by the decoding method as shown in the above embodiment to generate a video frame, and transmits the data to the synchronization unit 3212. The audio decoder 3208 decodes the audio ES to generate an audio frame, and transmits the data to the synchronization unit 3212. Alternatively, the video frames may be stored in a buffer (not shown in fig. 11) before being sent to the synchronization unit 3212. Similarly, the audio frames may be stored in a buffer (not shown in fig. 11) before being sent to the synchronization unit 3212.
The synchronization unit 3212 synchronizes the video frames and the audio frames and provides the video/audio to the video/audio display 3214. For example, the synchronization unit 3212 synchronizes presentation of video and audio information. The information may be syntactically coded using time stamps associated with the presentation of the coded audio and visual data and time stamps associated with the delivery of the data stream.
If a subtitle is included in the stream, the subtitle decoder 3210 decodes the subtitle and synchronizes the subtitle with the video frame and the audio frame and provides the video/audio/subtitle to the video/audio/subtitle display 3216.
Mathematical operators
The mathematical operators used in this application are similar to those used in the C programming language. However, the results of integer division and arithmetic shift operations are more accurately defined, and other operations, such as exponentiation and real-valued division, are defined. The numbering and counting specifications typically start from zero, e.g., "first" corresponds to 0 th, "second" corresponds to 1 st, and so on.
Arithmetic operator
The following arithmetic operators are defined as follows:
+ Addition
- Subtraction (as a two-argument operator) or negation (as a unary prefix operator)
* Multiplication, including matrix multiplication
x^y Exponentiation. Specifies x to the power of y. In other contexts, such notation is used for superscripting and is not to be interpreted as exponentiation.
/ Integer division with truncation of the result toward zero. For example, 7/4 and -7/-4 are truncated to 1, and -7/4 and 7/-4 are truncated to -1.
÷ Used to denote division in mathematical equations where no truncation or rounding is intended.
x/y Used to denote division in mathematical equations where no truncation or rounding is intended.
Σ f( i ), with i = x..y: the summation of f( i ), with i taking all integer values from x up to and including y.
x % y Modulus. Remainder of x divided by y, defined only for integers x and y with x >= 0 and y > 0.
Logical operators
The following logical operators are defined as follows:
x && y Boolean logical "and" of x and y
x || y Boolean logical "or" of x and y
! Boolean logical "not"
x ? y : z If x is TRUE or not equal to 0, evaluates to the value of y; otherwise, evaluates to the value of z.
Relational operator
The following relational operators are defined as follows:
> Greater than
>= Greater than or equal to
< Less than
<= Less than or equal to
== Equal to
!= Not equal to
When a relational operator is applied to a syntax element or variable that has been assigned the value "na" (not applicable), the value "na" is treated as a distinct value for the syntax element or variable. The value "na" is considered not to be equal to any other value.
Bitwise operator
The following bitwise operator is defined as follows:
and is bit by bit. When operating on the integer parameter, the complement representation of two of the integer value is operated on. When operating on a binary parameter, if the binary parameter contains fewer bits than another parameter, the shorter parameter is extended by adding more significant bits equal to 0.
I is bit-wise or. When operating on integer parameters, the complement representation of two of the integer value is operated on. When operating on a binary parameter, if the binary parameter contains fewer bits than another parameter, the shorter parameter is extended by adding more significant bits equal to 0.
And E, bitwise exclusive or. When operating on the integer parameter, the complement representation of two of the integer value is operated on. When operating on a binary parameter, if the binary parameter contains fewer bits than another parameter, the shorter parameter is extended by adding more significant bits equal to 0.
The two's complement integer of x > > yx represents an arithmetic right shift by y binary bits. The function is defined only if y is a non-negative integer value. The value of the bit shifted into the Most Significant Bit (MSB) due to the right shift is equal to the MSB of x before the shift operation.
The two's complement integer of x < < yx represents an arithmetic left shift by y binary bits. The function is defined only if y is a non-negative integer value. The value of the bit shifted into the Least Significant Bit (LSB) due to the left shift is equal to 0.
Assignment operators
The following assignment operators are defined as follows:
= Assignment operator
++ Increment, i.e., x++ is equivalent to x = x + 1; when used in an array index, evaluates to the value of the variable prior to the increment operation.
-- Decrement, i.e., x-- is equivalent to x = x - 1; when used in an array index, evaluates to the value of the variable prior to the decrement operation.
+= Increment by the amount specified, i.e., x += 3 is equivalent to x = x + 3, and x += (-3) is equivalent to x = x + (-3).
-= Decrement by the amount specified, i.e., x -= 3 is equivalent to x = x - 3, and x -= (-3) is equivalent to x = x - (-3).
Symbol of range
The following notation is used to illustrate the range of values:
and x y. zx is an integer value from y to z (inclusive), where x, y, and z are integers and z is greater than y.
Mathematical function
The following mathematical functions are defined:
Abs( x ) = x if x >= 0; -x if x < 0.
Asin( x ) is the trigonometric inverse sine function, operating on an argument x in the range of -1.0 to 1.0, inclusive, with an output value in the range of -π÷2 to π÷2, inclusive, in units of radians.
Atan( x ) is the trigonometric inverse tangent function, operating on an argument x, with an output value in the range of -π÷2 to π÷2, inclusive, in units of radians.
Atan2( y, x ) = Atan( y ÷ x ) if x > 0; Atan( y ÷ x ) + π if x < 0 and y >= 0; Atan( y ÷ x ) - π if x < 0 and y < 0; +π÷2 if x == 0 and y >= 0; -π÷2 otherwise.
Ceil (x) is the smallest integer greater than or equal to x.
Clip1Y( x ) = Clip3( 0, ( 1 << BitDepthY ) - 1, x )
Clip1C( x ) = Clip3( 0, ( 1 << BitDepthC ) - 1, x )
Clip3( x, y, z ) = x if z < x; y if z > y; z otherwise.
Cos (x) trigonometric cosine function, which operates on the parameter x in radians.
Floor (x) is less than or equal to the largest integer of x.
Ln( x ) is the natural logarithm of x (the base-e logarithm, where e is the natural logarithm base constant 2.718281828…).
Log2(x) x base 2 logarithm.
Log10(x) x base 10 logarithm.
Min( x, y ) = x if x <= y; y otherwise.
Max( x, y ) = x if x >= y; y otherwise.
Round(x)=Sign(x)*Floor(Abs(x)+0.5)
Sign( x ) = 1 if x > 0; 0 if x == 0; -1 if x < 0.
Sin (x) trigonometric sine function, calculated on the parameter x, in radians.
Sqrt( x ) = √x
Swap(x,y)=(y,x)
Tan (x) the trigonometric tangent function, calculated on the parameter x, in radians.
Operation order priority
When no brackets are used to explicitly indicate the order of priority in an expression, the following rule applies:
-the higher priority operations are computed before any lower priority operations.
Operations of the same priority are computed sequentially from left to right.
The following table lists operation priorities from highest to lowest; a higher position in the table indicates a higher priority.
For those operators that are also used in the C programming language, the priority order used in this specification is the same as the priority order used in the C programming language.
Table: operation priority from highest (table top) to lowest (table bottom)
Textual description of logical operations
In the text, a statement of a logical operation will be described mathematically in the following form:
if( condition 0 )
    statement 0
else if( condition 1 )
    statement 1
...
else /* informative remark on remaining condition */
    statement n
this can be described in the following way:
...as follows/...the following applies:
–If condition 0,statement 0
–Otherwise,if condition 1,statement 1
–...
–Otherwise(informative remark on remaining condition),statement n
each "If... atherwise, if... atherwise," statements are introduced in ". as wells" or ". the wells applications," immediately followed by "If..". The last condition of "If... Otherwise, if... Otherwise" is always "atherwise. An interleaved "If... other, if... other." statement may be identified by matching ". as wells" or ". the following.. the following applications" with the ending "other.
In the text, a statement of a logical operation will be described mathematically in the following form:
if( condition 0a && condition 0b )
    statement 0
else if( condition 1a || condition 1b )
    statement 1
...
else
    statement n
this can be described in the following way:
...as follows/...the following applies:
–If all of the following conditions are true, statement 0:
  –condition 0a
  –condition 0b
–Otherwise, if one or more of the following conditions are true, statement 1:
  –condition 1a
  –condition 1b
–...
–Otherwise, statement n
in the text, a statement of a logical operation will be described mathematically in the following form:
if(condition 0)
statement 0
if(condition 1)
statement 1
this can be described in the following way:
When condition 0,statement 0
When condition 1,statement 1
Although embodiments of the present invention have been described primarily in terms of video coding, it should be noted that embodiments of the coding system 10, encoder 20, and decoder 30 (and, correspondingly, system 10), as well as the other embodiments described herein, may also be configured for still image processing or coding, i.e., the processing or coding of an individual image independent of any preceding or consecutive image as in video coding. In general, if the image processing coding is limited to a single image 17, only the inter prediction units 244 (encoder) and 344 (decoder) may not be available. All other functions (also referred to as tools or techniques) of video encoder 20 and video decoder 30 may equally be used for still image processing, e.g., residual calculation 204/304, transform 206, quantization 208, inverse quantization 210/310, (inverse) transform 212/312, partitioning 262/362, intra prediction 254/354, and/or loop filtering 220/320, as well as entropy encoding 270 and entropy decoding 304.
Embodiments of, e.g., encoder 20 and decoder 30, and the functions described herein, e.g., with reference to encoder 20 and decoder 30, may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on a computer-readable medium or transmitted over a communication medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as a data storage medium, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media that are non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote resource using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. However, it should be understood that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are directed to non-transitory tangible storage media. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general-purpose microprocessors, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec. Furthermore, the techniques described above may be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in various devices or apparatuses, including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of means for performing the disclosed techniques, but do not necessarily require realization by different hardware units. Indeed, as mentioned above, the various units may be combined in a codec hardware unit, in combination with suitable software and/or firmware, or provided by a collection of interoperating hardware units (including one or more processors as described above).

Claims (17)

1. A coding and decoding method implemented by a decoding apparatus or an encoding apparatus, comprising:
estimating Local Illumination Compensation (LIC) parameters by using reference samples of a current block and reference samples of a reference block, wherein at least one of the reference samples of the reference block is obtained by using an integer part of a fractional Motion Vector (MV), wherein the reference samples used in the LIC parameter estimation are not filtered; and
and obtaining the inter-frame prediction of the current block according to the LIC parameters, wherein the LIC parameter estimation and the Motion Compensation (MC) are performed in parallel.
2. The method of claim 1, wherein the reference samples of the current block are obtained from reference samples Top [ i ], Left [ j ] of a reference line, wherein Top [ i ] represents an available upper reference sample of the current block, i = 0 … W, W represents a width of the current block; and wherein Left [ j ] represents an available Left reference sample of the current block, j = 0 … H, and H represents a height of the current block.
3. The method of claim 1, wherein at least one of the reference samples of the reference block is obtained by:
(Imvx=(mvx>>mvp)<<mvp,Imvy=(mvy>>mvp)<<mvp),
wherein (Imvx, Imvy) is an MV of one of the reference samples, (mvx, mvy) is the fractional MV, and mvp is a constant.
4. The method of claim 1, wherein at least one of the reference samples of the reference block is obtained by:
(Imvx=(mvx>>mvp)<<mvp,Imvy=mvy),
wherein (Imvx, Imvy) is an MV of one of the reference samples, (mvx, mvy) is the fractional MV, and mvp is a constant.
5. The method of claim 1, wherein at least one of the reference samples of the reference block is obtained by:
(Imvx=mvx,Imvy=(mvy>>mvp)<<mvp),
wherein (Imvx, Imvy) is an MV of one of the reference samples, (mvx, mvy) is the fractional MV, and mvp is a constant.
6. The method of any of claims 1 to 5, wherein the LIC parameters are estimated by linear approximation.
7. An apparatus for inter prediction of a block, wherein the apparatus is an encoder or a decoder, and the apparatus comprises:
an obtaining unit for obtaining reference samples of a reference block, wherein at least one of the reference samples of the reference block is obtained by using an integer part of a fractional Motion Vector (MV);
an estimating unit for estimating a Local Illumination Compensation (LIC) parameter by using a reference sample of a current block and the reference sample of the reference block, wherein the reference samples used in the LIC parameter estimation are not filtered; the estimating unit is configured to perform the LIC parameter estimation in parallel with performing Motion Compensation (MC);
the obtaining unit is configured to obtain inter prediction of the current block according to the LIC parameter.
8. The apparatus of claim 7, wherein the obtaining unit is further configured to obtain the reference sample of the current block from reference samples Top [ i ], Left [ j ] of a reference line, wherein Top [ i ] represents an available upper reference sample of the current block, i = 0 … W, and W represents a width of the current block; and wherein Left [ j ] represents an available Left reference sample of the current block, j = 0 … H, and H represents a height of the current block.
9. The apparatus of claim 7, wherein the obtaining unit is to obtain the at least one of the reference samples of the reference block by:
(Imvx=(mvx>>mvp)<<mvp,Imvy=(mvy>>mvp)<<mvp),
wherein (Imvx, Imvy) is an MV of one of the reference samples of the reference block, (mvx, mvy) is the fractional MV, and mvp is a constant.
10. The apparatus of claim 7, wherein the obtaining unit is to obtain the at least one of the reference samples of the reference block by:
(Imvx=(mvx>>mvp)<<mvp,Imvy=mvy),
wherein (Imvx, Imvy) is an MV of one of the reference samples of the reference block, (mvx, mvy) is the fractional MV, and mvp is a constant.
11. The apparatus of claim 7, wherein the obtaining unit is to obtain the at least one of the reference samples of the reference block by:
(Imvx=mvx,Imvy=(mvy>>mvp)<<mvp),
wherein (Imvx, Imvy) is an MV of one of the reference samples of the reference block, (mvx, mvy) is the fractional MV, and mvp is a constant.
12. The apparatus according to any of claims 7 to 11, wherein the estimating unit is configured to estimate the LIC parameters by linear approximation.
13. An encoder (20) comprising processing circuitry for performing the method of any of claims 1 to 6.
14. A decoder (30) comprising processing circuitry for performing the method of any of claims 1 to 6.
15. A computer-readable medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 6.
16. A decoder, comprising:
one or more processors; and
a non-transitory computer-readable storage medium coupled to the one or more processors and storing a program for execution by the one or more processors, wherein the program, when executed by the one or more processors, configures the decoder to perform the method of any of claims 1 to 6.
17. An encoder, comprising:
one or more processors; and
a non-transitory computer-readable storage medium coupled to the one or more processors and storing a program for execution by the one or more processors, wherein the program, when executed by the one or more processors, configures the encoder to perform the method of any of claims 1 to 6.
CN202080007327.5A 2019-01-16 2020-01-16 Encoder, decoder, and corresponding methods for local illumination compensation Active CN113228632B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201962793351P 2019-01-16 2019-01-16
US62/793,351 2019-01-16
PCT/RU2020/050003 WO2020149769A1 (en) 2019-01-16 2020-01-16 An encoder, a decoder and corresponding methods for local illumination compensation

Publications (2)

Publication Number Publication Date
CN113228632A CN113228632A (en) 2021-08-06
CN113228632B true CN113228632B (en) 2022-09-02

Family

ID=71613407

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080007327.5A Active CN113228632B (en) 2019-01-16 2020-01-16 Encoder, decoder, and corresponding methods for local illumination compensation

Country Status (4)

Country Link
US (1) US11876956B2 (en)
EP (1) EP3895418A4 (en)
CN (1) CN113228632B (en)
WO (1) WO2020149769A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024032672A1 (en) * 2022-08-09 2024-02-15 Mediatek Inc. Method and apparatus for video coding

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101369746B1 (en) * 2007-01-22 2014-03-07 Samsung Electronics Co., Ltd. Method and apparatus for video encoding and decoding using adaptive interpolation filter
US9667942B2 (en) * 2012-11-20 2017-05-30 Qualcomm Incorporated Adaptive luminance compensation in three dimensional video coding
CN108492312B (zh) * 2018-02-26 2021-06-29 Dalian University Visual tracking method based on reverse sparse representation under illumination change
CN112868240B (zh) * 2018-10-23 2023-06-30 Beijing Bytedance Network Technology Co., Ltd. Collocated local illumination compensation and modified inter prediction coding

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107710764A (zh) * 2015-06-09 2018-02-16 Qualcomm Incorporated Systems and methods of determining illumination compensation status for video coding
WO2018129539A1 (en) * 2017-01-09 2018-07-12 Qualcomm Incorporated Encoding optimization with illumination compensation and integer motion vector restriction

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Truong Quang Vinh; Young-Chul Kim; Sung-Hoon Hong; "Frame rate up-conversion using forward-backward jointing motion estimation and spatio-temporal motion vector smoothing"; 2009 International Conference on Computer Engineering & Systems; 2009-12-16; full text *
Cao Xumin; Liu Chunxiao; Zhang Jindong; Lin Yuhang; Zhao Jinwei; "Fast image dehazing algorithm based on luminance contrast enhancement and saturation compensation"; Journal of Computer-Aided Design & Computer Graphics; 2018-10-15; full text *

Also Published As

Publication number Publication date
US20210352276A1 (en) 2021-11-11
EP3895418A4 (en) 2022-02-23
WO2020149769A1 (en) 2020-07-23
EP3895418A1 (en) 2021-10-20
CN113228632A (en) 2021-08-06
US11876956B2 (en) 2024-01-16

Similar Documents

Publication Publication Date Title
WO2020221203A1 (en) An encoder, a decoder and corresponding methods of intra prediction
JP7366149B2 (en) An encoder, decoder, and corresponding method for harmonizing matrix-based intra-prediction and quadratic transform core selection
KR102616680B1 (en) Encoders, decoders and corresponding methods for inter prediction
CN112673633B (en) Encoder, decoder and corresponding methods for merging modes
CN113632464A (en) Method and apparatus for inter-component prediction
CN113924780A (en) Method and device for affine inter-frame prediction of chroma subblocks
JP7391991B2 (en) Method and apparatus for intra-smoothing
CN113545063A (en) Method and apparatus for intra prediction using linear model
JP2024026231A (en) Encoder related to intra-prediction mode, decoder, and corresponding method
CN114125468A (en) Intra-frame prediction method and device
CN114450958B (en) Affine motion model limiting for reducing memory bandwidth of enhanced interpolation filters
CN113455005A (en) Deblocking filter for sub-partition boundaries generated by intra sub-partition coding tools
CN113660489B (en) Decoding method, apparatus, decoder and storage medium for intra sub-division
AU2024201152A1 (en) An encoder, a decoder and corresponding methods using intra mode coding for intra prediction
KR20220065880A (en) Use of DCT-based interpolation filters and enhanced bilinear interpolation filters in affine motion compensation
CN113170118A (en) Method and apparatus for chroma intra prediction in video coding
CN114679583A (en) Video encoder, video decoder and corresponding methods
CN113228632B (en) Encoder, decoder, and corresponding methods for local illumination compensation
CN113692740A (en) Method and apparatus for division-free intra prediction
CN114900702B (en) Codec and corresponding method for reducing complexity of intra prediction for planar mode
CN114830652A (en) Method and apparatus for reference sample interpolation filtering for directional intra prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant