CN113491130B - Method and apparatus for linear model derivation for video coding and decoding

Info

Publication number
CN113491130B
CN113491130B
Authority
CN
China
Prior art keywords
sample
samples
value
prediction
anchor
Prior art date
Legal status
Active
Application number
CN202080016731.9A
Other languages
Chinese (zh)
Other versions
CN113491130A (en)
Inventor
陈漪纹
王祥林
修晓宇
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Publication of CN113491130A publication Critical patent/CN113491130A/en
Application granted granted Critical
Publication of CN113491130B publication Critical patent/CN113491130B/en

Classifications

    • H04N19/105: Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/117: Filters, e.g. for pre-processing or post-processing
    • H04N19/159: Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/167: Position within a video image, e.g. region of interest [ROI]
    • H04N19/176: Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/186: Adaptive coding characterised by the coding unit, the unit being a colour or a chrominance component
    All of the above fall under H04N19/00 (methods or arrangements for coding, decoding, compressing or decompressing digital video signals) and H04N19/10 (such methods or arrangements using adaptive coding).

Abstract

A method for video encoding and decoding is provided. The method includes: deriving prediction parameters α and β through a parameter derivation process using adjacent reconstructed chroma samples and reference samples; determining whether to apply a local illumination compensation (LIC) mode to a current coding unit (CU); and, when it is determined to apply the LIC mode, deriving parameters α2 and β2 of the LIC by performing the parameter derivation process and obtaining a final LIC prediction value based on the following equation: pred_L(i, j) = α2 · rec′_L(i, j) + β2, where α2 and β2 are instances of the parameters α and β, pred_L(i, j) represents the value of an LIC prediction sample in the current CU, and rec′_L(i, j) represents the value of a reference sample in a reference picture of the current CU.

Description

Method and apparatus for linear model derivation for video coding and decoding
Cross Reference to Related Applications
The present application claims priority from U.S. provisional application No. 62/793,869, entitled "Linear Model Derivation for Video Coding," filed on January 17, 2019, which is incorporated by reference in its entirety for all purposes.
Technical Field
The present application relates generally to video coding and compression, and in particular, but not limited to, methods and apparatus for generating a prediction signal using a linear model in video coding.
Background
Digital video is supported by a variety of electronic devices, such as digital televisions, laptop or desktop computers, tablet computers, digital cameras, digital recording devices, digital media players, video game consoles, smart phones, video teleconferencing equipment, video streaming devices, and the like. Electronic devices transmit, receive, encode, decode, and/or store digital video data by performing video compression/decompression. Digital video devices implement video coding techniques, such as those described in the standards defined by Versatile Video Coding (VVC), the Joint Exploration Test Model (JEM), MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), and ITU-T H.265/High Efficiency Video Coding (HEVC), and the extensions of such standards.
Video coding is typically performed using prediction methods (e.g., inter-prediction, intra-prediction) that exploit redundancy present in video pictures or sequences. An important goal of video coding techniques is to compress video data into a form that uses a lower bit rate while avoiding or minimizing degradation of video quality. With the advent of ever-evolving video services, there is a need for coding techniques with better codec efficiency.
Video compression typically involves performing spatial (intra) prediction and/or temporal (inter) prediction to reduce or eliminate the redundancy inherent in video data. For block-based video coding, a video frame is partitioned into one or more slices, each slice having a plurality of video blocks, which may also be referred to as Coding Tree Units (CTUs). Each CTU may contain one Coding Unit (CU) or be recursively partitioned into smaller CUs until a predefined minimum CU size is reached. Each CU (also referred to as a leaf CU) contains one or more Transform Units (TUs), and each CU also contains one or more Prediction Units (PUs). Each CU may be encoded in intra, inter, or IBC mode. Video blocks in an intra-coded (I) slice of a video frame are coded using spatial prediction with respect to reference samples in neighboring blocks within the same video frame. Video blocks in inter-coded (P or B) slices of a video frame may use spatial prediction with respect to reference samples in neighboring blocks within the same video frame or temporal prediction with respect to reference samples in other previous and/or future reference video frames.
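The recursive CTU-to-CU partitioning described above can be illustrated with a short sketch. The following Python example performs a quadtree-only split; the 128x128 CTU size, the minimum CU size, and the split-decision callback are assumptions made for illustration only, and the actual VVC partitioning also allows binary and ternary splits within a multi-type tree.

def split_ctu(x, y, size, min_cu_size, should_split):
    """Recursively partition a square CTU into leaf CUs with a quadtree.

    should_split(x, y, size) is an assumed encoder-decision hook, not part of
    any standard; it returns True when the block should be divided further."""
    if size <= min_cu_size or not should_split(x, y, size):
        return [(x, y, size)]            # this block becomes a leaf CU
    half = size // 2
    cus = []
    for dy in (0, half):
        for dx in (0, half):
            cus.extend(split_ctu(x + dx, y + dy, half, min_cu_size, should_split))
    return cus

# Example: split a hypothetical 128x128 CTU whenever the block is larger than 32x32.
cus = split_ctu(0, 0, 128, 8, lambda x, y, s: s > 32)
print(len(cus), "leaf CUs")              # prints: 16 leaf CUs, each 32x32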
A prediction block for a current video block to be encoded is generated based on spatial or temporal prediction of a reference block (e.g., a neighboring block) that has been previously encoded. The process of finding the reference block may be accomplished by a block matching algorithm. Residual data representing pixel differences between a current block to be encoded and a prediction block is referred to as a residual block or prediction error. The inter-coded block is encoded according to a motion vector pointing to a reference block in a reference frame forming the prediction block, and a residual block. The process of determining motion vectors is typically referred to as motion estimation. The intra-coded block is coded according to the intra-prediction mode and the residual block. For further compression, the residual block is transformed from the pixel domain to a transform domain, e.g. the frequency domain, resulting in residual transform coefficients, which can then be quantized. The quantized transform coefficients, initially arranged as a two-dimensional array, may be scanned to produce a one-dimensional vector of transform coefficients, and then entropy encoded into a video bitstream to achieve more compression.
The encoded video bitstream is then saved in a computer readable storage medium (e.g., flash memory) for access by another electronic device having digital video capabilities, or transmitted directly to the electronic device in a wired or wireless manner. The electronic device then performs video decompression (which is the reverse of the video compression described above) by, for example, parsing the encoded video bitstream to obtain syntax elements from the bitstream and reconstructing the digital video data from the encoded video bitstream to its original format based at least in part on the syntax elements obtained from the bitstream, and rendering the reconstructed digital video data on a display of the electronic device.
As digital video quality advances from high definition to 4K x 2K or even 8K x 4K, the amount of video data to be encoded/decoded grows exponentially. It remains a challenge to encode/decode video data more efficiently while maintaining the image quality of the decoded video data.
At a Joint Video Experts Team (JVET) meeting, the JVET defined the first draft of Versatile Video Coding (VVC) and the VVC Test Model 1 (VTM1) encoding method. The decision included a quadtree with a nested multi-type tree using binary and ternary split coding block structures as the initial new coding feature of VVC. Since then, the reference software VTM implementing the encoding method and the draft VVC decoding process have been developed during subsequent JVET meetings.
Disclosure of Invention
In general, this disclosure describes examples of techniques related to generating a prediction signal using a linear model in video coding.
According to a first aspect of the present disclosure, there is provided a method for video encoding and decoding, the method comprising: deriving prediction parameters α and β through a parameter derivation process using adjacent reconstructed chroma samples and reference samples; and determining whether to apply a local illumination compensation (LIC) mode to a current coding unit (CU), and when it is determined to apply the LIC mode, deriving parameters α2 and β2 of the LIC by performing the parameter derivation process, and obtaining a final LIC prediction value based on the following equation: pred_L(i, j) = α2 · rec′_L(i, j) + β2; wherein α2 and β2 are instances of the parameters α and β; pred_L(i, j) represents the value of an LIC prediction sample in the current CU; and rec′_L(i, j) represents the value of a reference sample in a reference picture of the current CU.
According to a second aspect of the present disclosure, there is provided an apparatus for video encoding and decoding, the apparatus comprising: a processor; and a memory configured to store instructions executable by the processor; wherein the processor, when executing the instructions, is configured to: derive prediction parameters α and β through a parameter derivation process using adjacent reconstructed chroma samples and reference samples; and determine whether to apply a local illumination compensation (LIC) mode to a current coding unit (CU), and when it is determined to apply the LIC mode, derive parameters α2 and β2 of the LIC by performing the parameter derivation process, and obtain a final LIC prediction value based on the following equation: pred_L(i, j) = α2 · rec′_L(i, j) + β2; wherein α2 and β2 are instances of the parameters α and β; pred_L(i, j) represents the value of an LIC prediction sample in the current CU; and rec′_L(i, j) represents the value of a reference sample in a reference picture of the current CU.
According to a third aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium comprising instructions stored therein, wherein, when executed by a processor, the instructions cause the processor to: derive prediction parameters α and β through a parameter derivation process using adjacent reconstructed chroma samples and reference samples; and determine whether to apply a local illumination compensation (LIC) mode to a current coding unit (CU), and when it is determined to apply the LIC mode, derive parameters α2 and β2 of the LIC by performing the parameter derivation process, and obtain a final LIC prediction value based on the following equation: pred_L(i, j) = α2 · rec′_L(i, j) + β2; wherein α2 and β2 are instances of the parameters α and β; pred_L(i, j) represents the value of an LIC prediction sample in the current CU; and rec′_L(i, j) represents the value of a reference sample in a reference picture of the current CU.
Drawings
A more particular description of the examples of the present disclosure will be rendered by reference to the specific examples illustrated in the appended drawings. Given that these drawings depict only some examples and are therefore not to be considered limiting of scope, the examples will be described and explained with additional specificity and detail through the use of the accompanying drawings.
Fig. 1 is a block diagram illustrating an exemplary video encoding and decoding system according to some embodiments of the present disclosure.
Fig. 2 is a block diagram illustrating an exemplary video encoder according to some embodiments of the present disclosure.
Fig. 3 is a block diagram illustrating an exemplary video decoder according to some embodiments of the present disclosure.
Fig. 4 is a schematic diagram illustrating a luminance and chrominance pixel sampling grid in the YUV 4:2:0 format according to some embodiments of the present disclosure.
Fig. 5 is a schematic diagram illustrating the locations of samples used to derive parameters α1 and β1 in a cross-component linear model (CCLM) prediction mode, according to some embodiments of the present disclosure.
Fig. 6 is a schematic diagram illustrating straight line derivation of α1 and β1 using a min-Max method according to some embodiments of the present disclosure.
Fig. 7 is a schematic diagram illustrating the LM_A mode for deriving α1 and β1 according to some embodiments of the present disclosure.
Fig. 8 is a schematic diagram illustrating the LM_L mode for deriving α1 and β1 according to some embodiments of the present disclosure.
Fig. 9 is a schematic diagram illustrating a luminance and chrominance pixel sampling grid in the YUV 4:2:2 format according to some embodiments of the present disclosure.
Fig. 10 is a schematic diagram illustrating a luminance and chrominance pixel sampling grid in the YUV 4:4:4 format according to some embodiments of the present disclosure.
Fig. 11 is a schematic diagram illustrating adjacent samples used for deriving the parameters α2 and β2 in the local illumination compensation (LIC) mode according to some embodiments of the present disclosure.
Fig. 12 is a schematic diagram illustrating an example of sample selection for deriving CCLM/LIC parameters according to some embodiments of the disclosure.
Fig. 13 is a schematic diagram illustrating an example of sample selection for deriving CCLM/LIC parameters according to some embodiments of the disclosure.
Fig. 14 is a schematic diagram illustrating an example of sample selection for deriving CCLM/LIC parameters according to some embodiments of the disclosure.
Fig. 15 is a schematic diagram illustrating an example of sample selection for deriving CCLM/LIC parameters according to some embodiments of the present disclosure.
Fig. 16 is a schematic diagram illustrating an example of sample selection for deriving CCLM/LIC parameters according to some embodiments of the disclosure.
Fig. 17 is a schematic diagram illustrating an example of sample selection for deriving CCLM/LIC parameters according to some embodiments of the disclosure.
Fig. 18 is a schematic diagram illustrating an example of sample selection for deriving CCLM/LIC parameters according to some embodiments of the disclosure.
Fig. 19 is a schematic diagram illustrating an example of sample selection for deriving CCLM/LIC parameters according to some embodiments of the disclosure.
Fig. 20 is a block diagram illustrating an exemplary apparatus for video encoding and decoding according to some embodiments of the present disclosure.
Fig. 21 is a flowchart illustrating an exemplary video codec process for generating a prediction signal using a linear model according to some embodiments of the present disclosure.
Detailed Description
Reference will now be made in detail to the specific embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous non-limiting specific details are set forth in order to provide an understanding of the subject matter presented herein. It will be apparent to those of ordinary skill in the art that various alternatives may be used. For example, it will be apparent to one of ordinary skill in the art that the subject matter presented herein may be implemented on many types of electronic devices having digital video capabilities.
Reference throughout this specification to "one embodiment," "an example," "some embodiments," "some examples," or similar language means that a particular feature, structure, or characteristic described is included in at least one embodiment or example. Features, structures, elements, or characteristics described in connection with one or some embodiments may be applicable to other embodiments unless explicitly stated otherwise.
Throughout this disclosure, the terms "first," "second," "third," and the like are used as nomenclature, and are used merely to refer to related elements, e.g., devices, components, compositions, steps, etc., without implying any spatial or temporal order unless explicitly stated otherwise. For example, a "first device" and a "second device" may refer to two separately formed devices, or two portions, components, or operational states of the same device, and may be arbitrarily named.
As used herein, the term "if" or "when" may be understood to mean "upon" or "in response to," depending on the context. These terms, if they appear in the claims, may not indicate that the relevant limitations or features are conditional or optional.
The terms "module," "sub-module," "circuit," "sub-circuit," "unit," or "subunit" may include a memory (shared, dedicated, or group) that stores code or instructions that may be executed by one or more processors. A module may include one or more circuits with or without stored code or instructions. A module or circuit may include one or more components connected directly or indirectly. These components may or may not be physically attached to each other or adjacent to each other.
The units or modules may be implemented in pure software, in pure hardware, or in a combination of hardware and software. For example, in a software-only implementation, a unit or module may include functionally related code blocks or software components that are directly or indirectly linked together to perform a particular function.
Fig. 1 is a block diagram illustrating an exemplary system 10 for encoding and decoding video blocks according to some embodiments of the present disclosure. As shown in fig. 1, system 10 includes a source device 12, which source device 12 generates and encodes video data to be decoded by a destination device 14 at a later time. The source device 12 and the destination device 14 may be any of a variety of electronic devices including desktop or laptop computers, tablet computers, smart phones, set-top boxes, digital televisions, cameras, display devices, digital media players, video gaming machines, video streaming devices, and the like. In some implementations, the source device 12 and the destination device 14 are equipped with wireless communication capabilities.
In some implementations, destination device 14 may receive encoded video data to be decoded via link 16. Link 16 may be any type of communication medium or device capable of moving encoded video data from source device 12 to destination device 14. In one example, link 16 may be a communication medium for enabling source device 12 to transmit encoded video data directly to destination device 14 in real-time. The encoded video data may be modulated and transmitted to destination device 14 in accordance with a communication standard, such as a wireless communication protocol. The communication medium may be any wireless or wired communication medium such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network (e.g., the internet). The communication medium may include a router, switch, base station, or any other device that may be used to facilitate communication from source device 12 to destination device 14.
In some other implementations, the encoded video data may be transmitted from the output interface 22 to the storage device 32. The encoded video data in the storage device 32 may then be accessed by the destination device 14 via the input interface 28. Storage device 32 may include any of a variety of distributed or locally accessed data storage media such as hard drives, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In a further example, storage device 32 may correspond to a file server or another intermediate storage device that may hold encoded video data generated by source device 12. The destination device 14 may access the stored video data from the storage device 32 via streaming or download. The file server may be any type of computer capable of storing and transmitting encoded video data to destination device 14. Exemplary file servers include web servers (e.g., for websites), FTP servers, Network Attached Storage (NAS) devices, or local disk drives. The destination device 14 may access the encoded video data over any standard data connection, including a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both, suitable for accessing the encoded video data stored on the file server. The transmission of encoded video data from storage device 32 may be a streaming transmission, a download transmission, or a combination of both.
As shown in fig. 1, source device 12 includes a video source 18, a video encoder 20, and an output interface 22. Video source 18 may include sources such as a video capture device, for example, a video camera, a video archive containing previously captured video, a video feed interface for receiving video from a video content provider, and/or a computer graphics system for generating computer graphics data as source video, or a combination of such sources. As one example, if video source 18 is a camera of a security monitoring system, source device 12 and destination device 14 may be a camera phone or video phone. However, the embodiments described in this disclosure may be generally applicable to video coding and may be applied to wireless and/or wired applications.
The captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video data may be transmitted directly to destination device 14 via output interface 22 of source device 12. The encoded video data may also (or alternatively) be stored on the storage device 32 for later access by the destination device 14 or other devices for decoding and/or playback. Output interface 22 may further include a modem and/or a transmitter.
Destination device 14 includes an input interface 28, a video decoder 30, and a display device 34. Input interface 28 may include a receiver and/or modem and receives encoded video data over link 16. The encoded video data transmitted over link 16 or provided on storage device 32 may include various syntax elements generated by video encoder 20 for use by video decoder 30 in decoding the video data. Such syntax elements may be included in encoded video data transmitted over a communication medium, stored on a storage medium, or stored in a file server.
In some implementations, the destination device 14 may include a display device 34, which may be an integrated display device or an external display device configured to communicate with the destination device 14. The display device 34 displays the decoded video data to a user and may be any of a variety of display devices, such as a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or another type of display device.
Video encoder 20 and video decoder 30 may operate in accordance with proprietary or industry standards, such as VVC, HEVC, MPEG-4 Part 10 Advanced Video Coding (AVC), or extensions of such standards. It should be understood that the present disclosure is not limited to a particular video encoding/decoding standard and may be applicable to other video encoding/decoding standards. It is generally contemplated that video encoder 20 of source device 12 may be configured to encode video data according to any of these current or future standards. Similarly, it is also generally contemplated that the video decoder 30 of the destination device 14 may be configured to decode video data according to any of these current or future standards.
Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder circuits, such as one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. When implemented partially in software, the electronic device can store instructions for the software in a suitable non-transitory computer readable medium and execute the instructions in hardware using one or more processors to perform the video encoding/decoding operations disclosed in the present disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, any of which may be integrated as part of a combined encoder/decoder (CODEC) in the respective device.
Fig. 2 is a block diagram illustrating an exemplary video encoder 20 according to some embodiments described in this disclosure. Video encoder 20 may perform intra-prediction encoding and inter-prediction encoding of video blocks within video frames. Intra-prediction encoding relies on spatial prediction to reduce or eliminate spatial redundancy of video data within a given video frame or picture. Inter-prediction encoding relies on temporal prediction to reduce or eliminate temporal redundancy of video data within adjacent video frames or pictures of a video sequence.
As shown in fig. 2, video encoder 20 includes a video data memory 40, a prediction processing unit 41, a Decoded Picture Buffer (DPB) 64, an adder 50, a transform processing unit 52, a quantization unit 54, and an entropy encoding unit 56. The prediction processing unit 41 further comprises a motion estimation unit 42, a motion compensation unit 44, a partition unit 45, an intra prediction processing unit 46, an Intra Block Copy (IBC) unit 48, and an intra/inter mode decision unit 49. In some implementations, video encoder 20 also includes an inverse quantization unit 58, an inverse transform processing unit 60, and an adder 62 for video block reconstruction. A deblocking filter may be located between adder 62 and DPB 64 to filter block boundaries to remove blocking artifacts from the reconstructed video. In addition to the deblocking filter, a loop filter 63 may be used to filter the output of adder 62. Video encoder 20 may take the form of fixed or programmable hardware units, or may be partitioned among one or more of the fixed or programmable hardware units.
Video data memory 40 may store video data to be encoded by components of video encoder 20. The video data in video data store 40 may be obtained, for example, from video source 18. DPB 64 is a buffer that stores reference video data for use in encoding the video data by video encoder 20 (e.g., in intra-prediction encoding mode or inter-prediction encoding mode). Video data memory 40 and DPB 64 may be any of a variety of memory devices. In various examples, video data memory 40 may be on-chip with other components of video encoder 20, or off-chip with respect to those components.
As shown in fig. 2, after receiving video data, a partition unit 45 within the prediction processing unit 41 partitions the video data into video blocks. This partitioning may also include partitioning the video frame into slices, tiles, or other larger Coding Units (CUs) according to a predefined partitioning structure, such as a quadtree structure associated with the video data. A video frame may be divided into a plurality of video blocks (or a set of video blocks called a tile). The prediction processing unit 41 may select one of a plurality of possible prediction coding modes, such as one of a plurality of intra prediction coding modes or one of a plurality of inter prediction coding modes, for the current video block based on the error result (e.g., the coding rate and the distortion level). The prediction processing unit 41 may provide the resulting intra prediction encoded block or inter prediction encoded block to the adder 50 to generate a residual block and to the adder 62 to reconstruct the encoded block for subsequent use as part of a reference frame. Prediction processing unit 41 also provides syntax elements such as motion vectors, intra mode indicators, partition information, and other such syntax information to entropy encoding unit 56.
To select an appropriate intra-prediction encoding mode for the current video block, intra-prediction processing unit 46 within prediction processing unit 41 may perform intra-prediction encoding of the current video block with respect to one or more neighboring blocks in the same frame as the current block to be encoded to provide spatial prediction. Motion estimation unit 42 and motion compensation unit 44 within prediction processing unit 41 perform inter-prediction encoding of the current video block relative to one or more prediction blocks in one or more reference frames to provide temporal prediction. Video encoder 20 may perform multiple encoding passes, for example, to select an appropriate encoding mode for each block of video data.
In some implementations, motion estimation unit 42 determines the inter-prediction mode of the current video frame by generating a motion vector that indicates a displacement of a Prediction Unit (PU) of a video block within the current video frame relative to a prediction block within a reference video frame, according to a predetermined mode within the sequence of video frames. The motion estimation performed by the motion estimation unit 42 is the process of generating motion vectors, which estimate the motion of video blocks. The motion vector may, for example, indicate the displacement of a PU of a video block within the current video frame or picture relative to a prediction block within a reference frame, relative to the current block being encoded within the current frame. The predetermined mode may designate video frames in the sequence as P-frames or B-frames. The intra BC unit 48 may determine a vector (e.g., a block vector) for intra BC coding in a manner similar to the determination of motion vectors by the motion estimation unit 42 for inter prediction, or may determine the block vector using the motion estimation unit 42.
A prediction block is a block of a reference frame that is considered to closely match the PU of the video block to be encoded in terms of pixel differences, which may be determined by the Sum of Absolute Differences (SAD), the Sum of Squared Differences (SSD), or other difference metrics. In some implementations, video encoder 20 may calculate values for sub-integer pixel positions of reference frames stored in DPB 64. For example, video encoder 20 may interpolate values for quarter-pixel positions, eighth-pixel positions, or other fractional-pixel positions of the reference frame. Accordingly, the motion estimation unit 42 may perform a motion search with respect to full pixel positions and fractional pixel positions and output a motion vector with fractional pixel accuracy.
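As a rough illustration of the block-matching step described above, the following sketch performs an exhaustive integer-pel search that minimizes SAD over a small search window. The search range and block dimensions are arbitrary assumptions, and the fractional-pel refinement based on interpolated values is omitted.

import numpy as np

def full_pel_motion_search(cur_block, ref_frame, bx, by, search_range=8):
    """Return the integer motion vector (dx, dy) minimizing SAD within +/-search_range."""
    h, w = cur_block.shape
    best_mv, best_sad = (0, 0), np.inf
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y0, x0 = by + dy, bx + dx
            if y0 < 0 or x0 < 0 or y0 + h > ref_frame.shape[0] or x0 + w > ref_frame.shape[1]:
                continue  # skip candidates that fall outside the reference frame
            cand = ref_frame[y0:y0 + h, x0:x0 + w]
            sad = np.abs(cur_block.astype(np.int32) - cand.astype(np.int32)).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv, best_sad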
Motion estimation unit 42 calculates motion vectors for PUs of video blocks in inter-prediction encoded frames by comparing the locations of PUs to locations of prediction blocks of reference frames selected from a first reference frame list (e.g., list 0) or a second reference frame list (e.g., list 1), each of which identifies one or more reference frames stored in DPB 64. Motion estimation unit 42 sends the calculated motion vector to motion compensation unit 44 and then to entropy encoding unit 56.
The motion compensation performed by the motion compensation unit 44 may involve acquiring or generating a prediction block based on the motion vector determined by the motion estimation unit 42. After receiving the motion vector of the PU of the current video block, motion compensation unit 44 may locate the prediction block to which the motion vector points in one of the reference frame lists, retrieve the prediction block from DPB 64, and forward the prediction block to adder 50. Adder 50 then forms a residual video block having pixel differences by subtracting the pixel values of the prediction block provided by motion compensation unit 44 from the pixel values of the current video block being encoded. The pixel differences forming the residual video block may include a luma difference component or a chroma difference component or both. Motion compensation unit 44 may also generate syntax elements associated with the video blocks of the video frames for use by video decoder 30 in decoding the video blocks of the video frames. The syntax elements may include, for example, syntax elements defining motion vectors used to identify the prediction block, any flags indicating the prediction mode, or any other syntax information described herein. Note that the motion estimation unit 42 and the motion compensation unit 44 may be highly integrated, but they are illustrated separately for conceptual purposes.
In some implementations, the intra BC unit 48 may generate the vector and obtain the prediction block in a manner similar to that described above in connection with the motion estimation unit 42 and the motion compensation unit 44, but where the prediction block is in the same frame as the current block being encoded, and where the vector is referred to as a block vector with respect to the motion vector. In particular, the intra BC unit 48 may determine an intra prediction mode for encoding the current block. In some examples, intra BC unit 48 may encode the current block using various intra prediction modes, e.g., during separate encoding passes, and test its performance by rate-distortion analysis. Next, intra BC unit 48 may select an appropriate intra prediction mode from among the various tested intra prediction modes to use and generate an intra mode indicator accordingly. For example, the intra BC unit 48 may calculate a rate distortion value using rate distortion analysis for various tested intra prediction modes and select the intra prediction mode having the best rate distortion characteristics among the tested modes as the appropriate intra prediction mode to use. Rate-distortion analysis typically determines the amount of distortion (or error) between an encoded block and the original, unencoded block (encoded to produce the encoded block) and the bit rate (i.e., number of bits) used to produce the encoded block. The intra BC unit 48 may calculate ratios from the distortion and rate of each encoded block to determine which intra prediction mode exhibits the best rate distortion value for the block.
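The rate-distortion comparison described above amounts to selecting the mode with the lowest Lagrangian cost J = D + lambda * R. The sketch below shows such a selection; the distortion values, bit counts, and lambda used here are placeholder numbers, not values produced by any particular encoder.

def rd_cost(distortion, rate_bits, lmbda):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lmbda * rate_bits

def pick_best_mode(candidates, lmbda):
    """candidates: iterable of (mode_name, distortion, rate_bits) tuples."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lmbda))

# Hypothetical numbers for three tested intra prediction modes.
modes = [("planar", 1200.0, 96), ("dc", 1350.0, 80), ("angular_18", 1100.0, 140)]
best = pick_best_mode(modes, lmbda=4.0)
print(best[0])  # the mode with the smallest D + lambda * R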
In other examples, the intra BC unit 48 may use, in whole or in part, the motion estimation unit 42 and the motion compensation unit 44 to perform such functions for intra BC prediction in accordance with the embodiments described herein. In either case, for intra block copy, the prediction block may be a block that is considered to closely match the block to be encoded in terms of pixel differences, which may be determined by the Sum of Absolute Differences (SAD), the Sum of Squared Differences (SSD), or other difference metrics, and the identification of the prediction block may include calculating values for sub-integer pixel positions.
Whether the prediction block is from the same frame according to intra prediction or from a different frame according to inter prediction, video encoder 20 may form the residual video block by subtracting the pixel values of the prediction block from the pixel values of the current video block being encoded, thereby forming pixel differences. The pixel difference values forming the residual video block may include a luminance component difference and a chrominance component difference.
As described above, the intra-prediction processing unit 46 may perform intra-prediction on the current video block as an alternative to inter-prediction performed by the motion estimation unit 42 and the motion compensation unit 44, or intra-block copy prediction performed by the intra BC unit 48. In particular, intra-prediction processing unit 46 may determine an intra-prediction mode for encoding the current block. To this end, intra-prediction processing unit 46 may encode the current block using various intra-prediction modes, e.g., during separate encoding passes, and intra-prediction processing unit 46 (or a mode selection unit in some examples) may select an appropriate intra-prediction mode from among the tested intra-prediction modes for use. Intra-prediction processing unit 46 may provide information indicative of the selected intra-prediction mode of the block to entropy encoding unit 56. Entropy encoding unit 56 may encode information indicating the selected intra-prediction mode in the bitstream.
After the prediction processing unit 41 determines the prediction block of the current video block via inter prediction or intra prediction, the adder 50 forms a residual video block by subtracting the prediction block from the current video block. The residual video data in the residual block may be included in one or more Transform Units (TUs) and provided to transform processing unit 52. The transform processing unit 52 transforms the residual video data into residual transform coefficients using a transform such as a Discrete Cosine Transform (DCT) or a conceptually similar transform.
The transform processing unit 52 may send the resulting transform coefficients to the quantization unit 54. The quantization unit 54 quantizes the transform coefficient to further reduce the bit rate. The quantization process may also reduce the bit depth associated with some or all of the coefficients. The quantization level may be modified by adjusting the quantization parameter. In some examples, quantization unit 54 may then perform a scan of a matrix including the quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan.
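A simplified view of the quantization step is sketched below as uniform scalar quantization of the transform coefficients. The step size and rounding offset are illustrative assumptions; they are not the HEVC/VVC quantization formulas, which derive the scaling from a quantization parameter.

import numpy as np

def quantize(coeffs, step, rounding=0.5):
    """Uniform scalar quantization of transform coefficients (sign-preserving)."""
    levels = np.sign(coeffs) * np.floor(np.abs(coeffs) / step + rounding)
    return levels.astype(np.int32)

def dequantize(levels, step):
    """Inverse quantization used by both the encoder and the decoder."""
    return levels * step

coeffs = np.array([[104.0, -13.0, 3.0], [7.0, -2.0, 0.5], [1.0, 0.2, -0.1]])
levels = quantize(coeffs, step=8.0)      # quantized coefficient levels
recon  = dequantize(levels, step=8.0)    # reconstructed (lossy) coefficients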
After quantization, entropy encoding unit 56 entropy encodes the quantized transform coefficients into a video bitstream using, for example, Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), Probability Interval Partitioning Entropy (PIPE) coding, or another entropy encoding method or technique. The encoded bitstream may then be transmitted to the video decoder 30 or archived in the storage device 32 for later transmission to or retrieval by the video decoder 30. Entropy encoding unit 56 may also entropy encode the motion vectors and other syntax elements of the current video frame being encoded.
Inverse quantization unit 58 and inverse transform processing unit 60 apply inverse quantization and inverse transforms, respectively, to reconstruct the residual video block in the pixel domain to generate a reference block for predicting other video blocks. As described above, motion compensation unit 44 may generate a motion compensated prediction block from one or more reference blocks of a frame stored in DPB 64. Motion compensation unit 44 may also apply one or more interpolation filters to the prediction block to calculate sub-integer pixel values for use in motion estimation.
Adder 62 adds the reconstructed residual block to the motion compensated prediction block generated by motion compensation unit 44 to generate a reference block for storage in DPB 64. The reference block may then be used as a prediction block by the intra BC unit 48, the motion estimation unit 42, and the motion compensation unit 44 to inter-predict another video block in a subsequent video frame.
In the example of video encoding and decoding using video encoder 20, video frames are partitioned into blocks for processing. For each given video block, a prediction is formed based on inter-prediction or intra-prediction. In inter prediction, a predictor or prediction block may be formed by motion estimation and motion compensation based on pixels from a previously reconstructed frame. In intra prediction, a prediction value may be formed based on reconstructed pixels in the current frame. Through mode decision, the best predictor may be selected to predict the current block.
The prediction residual or residual block (i.e., the difference between the current block and its prediction value) is sent to a transform module, such as transform processing unit 52. The transform coefficients are then sent to a quantization module, such as quantization unit 54, for entropy reduction. The quantized coefficients are fed to an entropy encoding module (e.g., entropy encoding unit 56) to generate a compressed video bitstream. As shown in fig. 2, prediction related information (e.g., block partition information, motion vectors, reference picture indices, intra prediction modes, etc.) from the inter and/or intra prediction modules is also passed through an entropy encoding module (e.g., entropy encoding unit 56) and then saved into the bitstream.
In video encoder 20, decoder-related modules are needed in order to reconstruct pixels for prediction purposes. First, the prediction residual is reconstructed through inverse quantization and inverse transform. The reconstructed prediction residual is then combined with the prediction value to generate the unfiltered reconstructed pixels of the current block.
In order to improve coding efficiency and visual quality, a loop filter 63 is often used. For example, deblocking filters are available in AVC, HEVC, and VVC. In HEVC, an additional loop filter called Sample Adaptive Offset (SAO) is defined to further improve coding efficiency. In VVC, a further loop filter called the Adaptive Loop Filter (ALF) may be employed.
These loop filter operations are optional. Opening the loop filter generally helps to improve codec efficiency and visual quality. They may also be turned off as encoder decisions to save computational complexity.
It should be noted that if these filters are turned on by the encoder, then inter prediction is based on filtered reconstructed pixels, whereas intra prediction is typically based on unfiltered reconstructed pixels.
Fig. 3 is a block diagram illustrating an exemplary video decoder 30 according to some embodiments of the present disclosure. Video decoder 30 includes video data memory 79, entropy decoding unit 80, prediction processing unit 81, inverse quantization unit 86, inverse transform processing unit 88, adder 90, and DPB 92. The prediction processing unit 81 further includes a motion compensation unit 82, an intra prediction processing unit 84, an intra BC unit 85, and an intra/inter mode selection unit 87. Video decoder 30 may perform a decoding process that is generally inverse to the encoding process described above in connection with fig. 2 with respect to video encoder 20. For example, the motion compensation unit 82 may generate prediction data based on the motion vector received from the entropy decoding unit 80, and the intra prediction unit 84 may generate prediction data based on the intra prediction mode indicator received from the entropy decoding unit 80.
In an example of video decoding using the video decoder 30, the received bitstream is decoded by the entropy decoding unit 80 to derive quantized coefficient levels (or quantized coefficients) and prediction-related information. The quantized coefficient levels are then processed by the inverse quantization unit 86 and the inverse transform processing unit 88 to obtain the reconstructed residual block. The prediction value or prediction block is formed through an intra prediction or motion compensation process based on the decoded prediction-related information. The unfiltered reconstructed pixels are obtained by summing the reconstructed residual block and the prediction values. With the loop filter turned on, a filtering operation is performed on these pixels to derive the final reconstructed video for output.
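Ignoring the details of the inverse transform, the unfiltered reconstruction described above can be sketched as adding the reconstructed residual to the prediction and clipping the result to the sample range. The bit depth and block shapes below are assumptions made for illustration.

import numpy as np

def reconstruct_block(pred, residual, bit_depth=8):
    """Unfiltered reconstruction: prediction + residual, clipped to the valid sample range."""
    max_val = (1 << bit_depth) - 1
    return np.clip(pred.astype(np.int32) + residual.astype(np.int32), 0, max_val)

pred = np.full((4, 4), 128, dtype=np.int32)
residual = np.array([[5, -3, 0, 2]] * 4, dtype=np.int32)
recon = reconstruct_block(pred, residual)   # values stay within [0, 255]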
In some examples, units of video decoder 30 may be tasked to perform embodiments of the present disclosure. Also, in some examples, embodiments of the present disclosure may divide between one or more units of video decoder 30. For example, the intra BC unit 85 may perform embodiments of the present disclosure alone or in combination with other units of the video decoder 30 (e.g., the motion compensation unit 82, the intra prediction processing unit 84, and the entropy decoding unit 80). In some examples, video decoder 30 may not include intra BC unit 85, and the functions of intra BC unit 85 may be performed by other components of prediction processing unit 81 (e.g., motion compensation unit 82).
Video data memory 79 may store video data, such as an encoded video bitstream, to be decoded by other components of video decoder 30. For example, video data stored in video data memory 79 may be obtained from storage device 32, a local video source (e.g., a camera), via a wired or wireless network transfer of video data, or by accessing a physical data storage medium (e.g., a flash drive or hard disk). The video data memory 79 may include an encoded picture buffer (CPB) that stores encoded video data from an encoded video bitstream. A Decoded Picture Buffer (DPB) 92 of video decoder 30 stores reference video data for use in decoding the video data by video decoder 30 (e.g., in an intra-prediction encoding mode or an inter-prediction encoding mode). Video data memory 79 and DPB 92 may be formed from any of a variety of memory devices, such as Dynamic Random Access Memory (DRAM), including Synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. For illustrative purposes, video data memory 79 and DPB 92 are depicted in fig. 3 as two distinct components of video decoder 30. It will be apparent to those skilled in the art that video data memory 79 and DPB 92 may be provided by the same memory device or separate memory devices. In some examples, video data memory 79 may be on-chip with other components of video decoder 30, or off-chip with respect to those components.
During the decoding process, video decoder 30 receives an encoded video bitstream representing video blocks of encoded video frames and associated syntax elements. Video decoder 30 may receive syntax elements at the video frame level and/or the video block level. Entropy decoding unit 80 of video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors or intra-prediction mode indicators, and other syntax elements. Entropy decoding unit 80 then forwards the motion vectors and other syntax elements to prediction processing unit 81.
When a video frame is encoded as an intra-prediction encoded (I) frame, or for intra-coded prediction blocks used in other types of frames, the intra prediction processing unit 84 of the prediction processing unit 81 may generate prediction data for a video block of the current video frame based on the signaled intra prediction mode and reference data from previously decoded blocks of the current frame.
When a video frame is encoded as an inter-prediction encoded (i.e., B or P) frame, the motion compensation unit 82 of the prediction processing unit 81 generates one or more prediction blocks of a video block of the current video frame based on the motion vectors and other syntax elements received from the entropy decoding unit 80. Each prediction block may be generated from a reference frame within one of the reference frame lists. Video decoder 30 may construct a list of reference frames, e.g., list 0 and list 1, based on the reference frames stored in DPB 92 using a default construction technique.
In some examples, when video blocks are encoded according to the intra BC mode described herein, intra BC unit 85 of prediction processing unit 81 generates a prediction block for the current video block based on the block vectors and other syntax elements received from entropy decoding unit 80. The prediction block may be within a reconstructed region of the same picture as the current video block defined by video encoder 20.
The motion compensation unit 82 and/or the intra BC unit 85 determine prediction information for the video block of the current video frame by parsing the motion vector and other syntax elements, and then use the prediction information to generate a prediction block for the decoded current video block. For example, motion compensation unit 82 uses some of the received syntax elements to determine a prediction mode (e.g., intra-prediction or inter-prediction) for encoding a video block of a video frame, an inter-prediction frame type (e.g., B or P), construction information for one or more of the reference frame lists of the frame, a motion vector for each inter-prediction encoded video block of the frame, an inter-prediction state for each inter-prediction encoded video block of the frame, and other information for decoding the video block in the current video frame.
Similarly, the intra BC unit 85 may use some of the received syntax elements (e.g., flags) to determine that the current video block is predicted using: intra BC mode, construction information that the video blocks of the frame are within the reconstructed region and should be stored in DPB 92, block vectors for each intra BC predicted video block of the frame, intra BC prediction status for each intra BC predicted video block of the frame, and other information for decoding the video blocks in the current video frame.
Motion compensation unit 82 may also perform interpolation using interpolation filters to calculate interpolation values for sub-integer pixels of the reference block as used by video encoder 20 during encoding of the video block. In this case, motion compensation unit 82 may determine an interpolation filter used by video encoder 20 from the received syntax element and use the interpolation filter to generate the prediction block.
The inverse quantization unit 86 inversely quantizes the quantized transform coefficients provided in the bitstream and entropy decoded by the entropy decoding unit 80 using the same quantization parameter calculated by the video encoder 20 for each video block in the video frame to determine the degree of quantization. The inverse transform processing unit 88 applies an inverse transform (e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients in order to reconstruct the residual block in the pixel domain.
After motion compensation unit 82 or intra BC unit 85 generates a prediction block for the current video block based on the vector and other syntax elements, adder 90 reconstructs the decoded video block for the current video block by summing the residual block from inverse transform processing unit 88 and the corresponding prediction block generated by motion compensation unit 82 and intra BC unit 85. Loop filter 91 may be positioned between adder 90 and DPB 92 to further process the decoded video block. The decoded video blocks in a given frame are then stored in DPB 92, which stores reference frames for subsequent motion compensation of the next video block. DPB 92 or a memory device separate from DPB 92 may also store decoded video for later presentation on a display device, such as display device 34 of fig. 1.
In a typical video coding process, a video sequence typically includes an ordered set of frames or pictures. Each frame may include three sample arrays, denoted SL, SCb, and SCr. SL is a two-dimensional array of luma samples. SCb is a two-dimensional array of Cb chroma samples. SCr is a two-dimensional array of Cr chroma samples. In other examples, a frame may be monochrome and thus include only one two-dimensional array of luma samples. In this disclosure, the term "luma", represented by the symbol or subscript Y or L, is used to designate an array of samples or a single sample representing the monochrome signal associated with the primary colors. The term "chroma", represented by the symbols Cb and Cr (or C), is used to designate an array of samples or a single sample representing one of the two color difference signals associated with the primary colors.
Cross-component linear model prediction
To reduce cross-component redundancy, a cross-component linear model (CCLM) prediction mode is used in VVC reference software VTM, where chroma samples of a Coding Unit (CU) are predicted based on reconstructed luma samples of the same CU by using the following linear model:
pred_C(i, j) = α1 · rec′_L(i, j) + β1    (1)
where pred_C(i, j) represents the value of a predicted chroma sample in the CU, and rec′_L(i, j) represents the value of the corresponding downsampled reconstructed luma sample of the same CU.
That is, the values of the predicted chroma samples are modeled as a linear function of the values of the reconstructed luma samples of the luma block using the parameters α1 and β1. The reconstructed luma samples may be downsampled to match the size of the chroma samples.
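As an illustrative aid only, the following C++ sketch applies the linear model of equation (1) to every chroma position of a block. The function name, the flat row-major layout, and the use of floating point are assumptions made for clarity; they are not taken from the VVC specification or the VTM software.

```cpp
#include <vector>

// Illustrative sketch: apply the CCLM model of equation (1),
// pred_C(i, j) = alpha1 * rec'_L(i, j) + beta1, to a W x H chroma block.
// 'recLumaDown' holds one downsampled reconstructed luma sample per chroma
// position of the same CU, stored row by row.
std::vector<int> predictChromaCclm(const std::vector<int>& recLumaDown,
                                   int width, int height,
                                   double alpha1, double beta1) {
    std::vector<int> predChroma(width * height);
    for (int j = 0; j < height; ++j) {
        for (int i = 0; i < width; ++i) {
            const int idx = j * width + i;
            predChroma[idx] = static_cast<int>(alpha1 * recLumaDown[idx] + beta1);
        }
    }
    return predChroma;  // a real codec would also clip to the sample bit depth
}
```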
In one example, the YUV 4:2:0 format is used under the common test conditions in the VVC development process, and the derivation of the parameters α1 and β1 is described below with respect to the YUV 4:2:0 format. Fig. 4 shows the sampling grid of luma samples and chroma samples in the YUV 4:2:0 sampling format. In the illustrated sampling grid, X indicates the position of a luma sample, and O indicates the position of a chroma sample or a downsampled luma sample. In the enlarged portion of the sampling grid shown in fig. 4, Rec′_L(x, y) represents the value of a downsampled reconstructed luma sample, and Rec_L represents the values of the six neighboring luma samples that can be used to generate the value Rec′_L(x, y) of the downsampled reconstructed luma sample, where x and y are pixel indices.
The parameters α1 and β1 are derived by a method using a straight-line equation, which may be referred to as the min-Max method. Fig. 5 shows the positions of the samples used to derive the parameters α1 and β1 in a coding block having 2W×2H luma samples. Rec_C represents the above and left neighboring reconstructed chroma samples, which may be referred to as anchor samples; Rec′_L represents the corresponding above and left neighboring downsampled reconstructed luma samples, which may be referred to as reference samples; and the value of N (N being the number of samples used to determine the straight line) is equal to twice the minimum of the width and height of the current chroma coding block. Fig. 6 illustrates the straight line between the minimum and maximum luma values used to derive the parameters α1 and β1. The two points A and B (each a pair of luma and chroma samples) correspond to the minimum and maximum values within the set of neighboring luma samples depicted in fig. 5. Each chroma sample and its corresponding luma sample are called a sample pair; A denotes the sample pair having the smallest luma value, i.e., the minimum sample pair, and B denotes the sample pair having the largest luma value, i.e., the maximum sample pair. The linear model parameters α1 and β1 are obtained from the following equation (the division can be avoided and replaced by multiplication and shifting):
α = (y_B - y_A) / (x_B - x_A),  β = y_A - α·x_A    (2)
where y_B is the chroma sample value of the maximum sample pair, y_A is the chroma sample value of the minimum sample pair, x_B is the luma sample value of the maximum sample pair, and x_A is the luma sample value of the minimum sample pair.
For square coding blocks, the min-Max method is applied directly. For non-square coding blocks, the neighboring samples of the longer boundary are first subsampled to have the same number of samples as the shorter boundary before the min-Max method is applied. Fig. 5 shows the positions of the above and left neighboring samples and the samples of the current block involved in the CCLM mode.
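The min-Max fit described above can be sketched as follows. This is a simplified illustration under the assumption that floating-point division is acceptable; the reference software instead replaces the division by a lookup table, multiplications, and shifts, and the structure and function names here are not taken from it.

```cpp
#include <utility>
#include <vector>

struct SamplePair { int luma; int chroma; };  // reference (luma) and anchor (chroma) values

// Illustrative sketch of the min-Max derivation: among the N neighboring
// sample pairs, find the pair A with the smallest luma value and the pair B
// with the largest luma value, then fit the straight line of equation (2)
// through them. Assumes 'pairs' is non-empty.
std::pair<double, double> deriveMinMax(const std::vector<SamplePair>& pairs) {
    SamplePair a = pairs[0];  // minimum sample pair (point A)
    SamplePair b = pairs[0];  // maximum sample pair (point B)
    for (const SamplePair& p : pairs) {
        if (p.luma < a.luma) a = p;
        if (p.luma > b.luma) b = p;
    }
    const double alpha = (b.luma == a.luma)
        ? 0.0
        : static_cast<double>(b.chroma - a.chroma) / (b.luma - a.luma);
    const double beta = a.chroma - alpha * a.luma;  // beta = y_A - alpha * x_A
    return {alpha, beta};
}
```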
The min-Max method calculation is performed as part of the decoding process and not just as an encoder search operation. Therefore, no syntax is used to convey the values of parameter α1 and parameter β1 to the decoder. Currently, equation/filter (3) is used as a luminance downsampling filter to generate downsampled luminance samples. However, a different equation/filter may be selected to generate the downsampled luminance samples, as shown in equations (3) through (19). Note that equations (5) through (10) can be considered as direct sampling without the need for a downsampling process.
Rec′_L[x, y] = (Rec_L[2x, 2y]*2 + Rec_L[2x+1, 2y] + Rec_L[2x-1, 2y] + Rec_L[2x, 2y+1]*2 + Rec_L[2x+1, 2y+1] + Rec_L[2x-1, 2y+1] + 4) >> 3    (3)
Rec′_L[x, y] = (Rec_L[2x, 2y] + Rec_L[2x, 2y+1] + Rec_L[2x+1, 2y] + Rec_L[2x+1, 2y+1] + 2) >> 2    (4)
Rec′_L[x, y] = Rec_L[2x, 2y]    (5)
Rec′_L[x, y] = Rec_L[2x+1, 2y]    (6)
Rec′_L[x, y] = Rec_L[2x-1, 2y]    (7)
Rec′_L[x, y] = Rec_L[2x-1, 2y+1]    (8)
Rec′_L[x, y] = Rec_L[2x, 2y+1]    (9)
Rec′_L[x, y] = Rec_L[2x+1, 2y+1]    (10)
Rec′_L[x, y] = (Rec_L[2x, 2y] + Rec_L[2x, 2y+1] + 1) >> 1    (11)
Rec′_L[x, y] = (Rec_L[2x, 2y] + Rec_L[2x+1, 2y] + 1) >> 1    (12)
Rec′_L[x, y] = (Rec_L[2x+1, 2y] + Rec_L[2x+1, 2y+1] + 1) >> 1    (13)
Rec′_L[x, y] = (Rec_L[2x, 2y+1] + Rec_L[2x+1, 2y+1] + 1) >> 1    (14)
Rec′_L[x, y] = (2*Rec_L[2x, 2y+1] + Rec_L[2x-1, 2y+1] + Rec_L[2x+1, 2y+1] + 2) >> 2    (15)
Rec′_L[x, y] = (Rec_L[2x+1, 2y] + Rec_L[2x+1, 2y+1] + 1) >> 1    (16)
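For illustration, the 6-tap filter of equation (3) can be written as the sketch below; the pointer-based layout and the omission of picture-boundary handling are simplifying assumptions.

```cpp
// Illustrative sketch of the luma downsampling filter of equation (3) for
// YUV 4:2:0. 'recLuma' is the full-resolution reconstructed luma plane with
// row stride 'stride', and (x, y) indexes the downsampled (chroma) grid.
// Samples outside the picture would need clamping in a real implementation.
int downsampleLuma6Tap(const int* recLuma, int stride, int x, int y) {
    const int* row0 = recLuma + (2 * y) * stride;      // luma row 2y
    const int* row1 = recLuma + (2 * y + 1) * stride;  // luma row 2y+1
    return (row0[2 * x] * 2 + row0[2 * x + 1] + row0[2 * x - 1] +
            row1[2 * x] * 2 + row1[2 * x + 1] + row1[2 * x - 1] + 4) >> 3;
}
```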
In addition to calculating the linear model coefficients α1 and β1 using the above template (i.e., the above neighboring samples) and the left template together, the templates may alternatively be used individually in the other two LM modes, referred to as the LM_A and LM_L modes, respectively. As shown in fig. 7, in the LM_A mode, only the above template is used to calculate the linear model coefficients. To obtain more samples, the above template is extended to (W+W). As shown in fig. 8, in the LM_L mode, only the left template is used to calculate the linear model coefficients. To obtain more samples, the left template is extended to (H+H). In addition to the YUV 4:2:0 format, the codec may also support the 4:2:2 and 4:4:4 formats. Fig. 9 and fig. 10 show the sampling grids of luma and chroma samples in the YUV 4:2:2 format and the YUV 4:4:4 format, respectively.
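The choice between the combined, above-only, and left-only templates can be sketched as below; the enumeration name and the (x, y) position convention relative to the top-left sample of the current block are assumptions for illustration, and availability checking of the extended neighbors is omitted.

```cpp
#include <utility>
#include <vector>

enum class LmMode { LM, LM_A, LM_L };  // combined, above-only, left-only templates

// Illustrative sketch: collect the neighboring template positions used to
// derive the linear model coefficients. In LM_A the above template is
// extended to (W + W) samples and in LM_L the left template is extended to
// (H + H) samples, assuming those extended neighbors are available.
std::vector<std::pair<int, int>> templatePositions(LmMode mode, int W, int H) {
    std::vector<std::pair<int, int>> pos;
    if (mode == LmMode::LM || mode == LmMode::LM_A) {
        const int aboveLen = (mode == LmMode::LM_A) ? W + W : W;
        for (int x = 0; x < aboveLen; ++x) pos.emplace_back(x, -1);  // above row
    }
    if (mode == LmMode::LM || mode == LmMode::LM_L) {
        const int leftLen = (mode == LmMode::LM_L) ? H + H : H;
        for (int y = 0; y < leftLen; ++y) pos.emplace_back(-1, y);   // left column
    }
    return pos;
}
```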
Local illumination compensation
Local illumination compensation (LIC) is based on a linear model for illumination changes, using a scaling factor and an offset (α2 and β2 in equation (20) below). It may be adaptively enabled or disabled for each inter-mode coded coding unit (CU). The parameters are used to generate the LIC predictor by the following equation:
pred_L(i, j) = α2 · rec″_L(i, j) + β2    (20)
where pred_L(i, j) represents the value of a prediction sample in the CU, and rec″_L(i, j) represents the value of the corresponding reference sample in a reference picture of the current block or CU. The reference samples are located by either the coded motion vectors or derived motion vectors.
When LIC is applied to a CU, the parameters α2 and β2 may be derived using a least squares error method based on the neighboring samples of the current CU and their corresponding reference samples. More specifically, as illustrated in fig. 11, the neighboring samples of the current block (or CU), to which downsampling may be applied, and the corresponding reference samples in the reference picture, identified by the motion information of the current block/CU or sub-CU, are used. The illumination compensation (IC) parameters α2 and β2 are derived and applied separately for each prediction direction.
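A least squares fit of this kind can be sketched as follows; this is the textbook closed-form solution, given here only as an assumption about how such a fit could be computed, not a reproduction of the integer approximation used in the reference software.

```cpp
#include <vector>

// Illustrative sketch: least squares fit of the LIC parameters from the
// neighboring samples of the current CU ('cur') and their corresponding
// reference samples in the reference picture ('ref'); both vectors are
// assumed to have the same, non-zero length.
void deriveLicLeastSquares(const std::vector<int>& ref, const std::vector<int>& cur,
                           double& alpha2, double& beta2) {
    const long long n = static_cast<long long>(ref.size());
    long long sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (size_t i = 0; i < ref.size(); ++i) {
        sx  += ref[i];
        sy  += cur[i];
        sxx += static_cast<long long>(ref[i]) * ref[i];
        sxy += static_cast<long long>(ref[i]) * cur[i];
    }
    const long long denom = n * sxx - sx * sx;
    alpha2 = (denom != 0) ? static_cast<double>(n * sxy - sx * sy) / denom : 1.0;
    beta2  = (sy - alpha2 * sx) / n;
}
```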
When a CU is coded in merge mode, the LIC flag is copied from neighboring blocks in a manner similar to the copying of motion information in merge mode; otherwise, an LIC flag is signaled for the CU to indicate whether LIC is applied.
When LIC is enabled for a picture, an additional CU-level rate-distortion (RD) check is needed to determine whether LIC is applied for a CU. When LIC is enabled for a CU, the mean-removed sum of absolute differences (MR-SAD) and the mean-removed sum of absolute Hadamard-transformed differences (MR-SATD) are used, instead of the sum of absolute differences (SAD) and the sum of absolute transformed differences (SATD), for the integer-pixel motion search and the fractional-pixel motion search, respectively.
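As an illustration of the mean-removed distortion measure mentioned above, a simple MR-SAD sketch is given below; the integer-truncated means and the absence of block-shape handling are simplifications, and the function is not taken from any particular encoder.

```cpp
#include <cstdlib>
#include <vector>

// Illustrative sketch of the mean-removed SAD (MR-SAD): the mean of each
// block is subtracted before summing absolute differences, so a constant
// illumination offset between the current block and the reference block does
// not penalize the match. Means are integer-truncated for simplicity.
long long mrSad(const std::vector<int>& cur, const std::vector<int>& ref) {
    long long sumCur = 0, sumRef = 0;
    const long long n = static_cast<long long>(cur.size());
    for (size_t i = 0; i < cur.size(); ++i) { sumCur += cur[i]; sumRef += ref[i]; }
    const long long meanCur = sumCur / n, meanRef = sumRef / n;
    long long sad = 0;
    for (size_t i = 0; i < cur.size(); ++i)
        sad += std::llabs((cur[i] - meanCur) - (ref[i] - meanRef));
    return sad;
}
```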
According to some examples of the present disclosure, for LIC mode, a min-Max method may be used that may be implemented as a parameter derivation process. In some other examples, the same method for deriving parameters (e.g., α and β) may be used for CCLM mode and LIC mode. That is, the two codec modes may share the same processing module or parameter derivation process. In the parameter derivation process, α1 and α2 are examples of the parameter α, and β1 and β2 are examples of the parameter β.
The parameter derivation process implements an algorithm that derives the parameters α and β using a preset number X of sample pairs, each sample pair including an anchor sample that is a neighboring reconstructed sample and a corresponding reference sample for the anchor sample. The CCLM parameters and the LIC parameters (e.g., α1 and β1 in equation (1) and α2 and β2 in equation (20)) can be derived using only X (X being a positive integer) sample pairs. Each sample pair comprises an anchor sample and a reference sample. For the CCLM mode, the anchor samples are neighboring chroma samples and the reference samples are the corresponding luma samples of the anchor samples. An example of the sample pairs for the CCLM is illustrated in fig. 5. For the CCLM mode, a reference sample may be a downsampled luma sample (e.g., a downsampled luma sample obtained by using equation (3)) or a reconstructed luma sample without downsampling (e.g., a luma sample obtained by using one of equations (5) to (10), which directly take one luma sample).
For LIC mode, the anchor samples are neighboring reconstructed samples of the current block, and the reference samples are corresponding reference samples of the anchor samples in the reference picture. An example of a sample pair of LIC is illustrated in fig. 11. In yet another configuration, the reference samples are neighboring reconstructed samples of the current block, and the anchor samples are corresponding reference samples of the reference samples in the reference picture.
The following sections describe several embodiments for deriving the parameters of the CCLM/LIC modes, where X = 3 and X = 4. However, the concepts of the present disclosure can be extended to cases with other values of X, i.e., other numbers of sample pairs.
Sample pair selection when X=3
In one embodiment, as shown in fig. 12 and fig. 13, three sample pairs are used as the selected sample pairs for deriving the CCLM/LIC parameters α and β:
the top sample of the left neighboring samples,
the bottom sample of the left neighboring samples, and
the rightmost sample of the above neighboring samples.
Specifically, for the CCLM mode, as shown in fig. 12, the sample pairs (Rec′_L[-1, 0], Rec_C[-1, 0]), (Rec′_L[-1, H-1], Rec_C[-1, H-1]) and (Rec′_L[W-1, -1], Rec_C[W-1, -1]) are selected as the sample pairs for deriving the CCLM parameters, where W and H represent the width and height of the chroma block. For the LIC mode, as shown in fig. 13, the sample pairs (Rec′_L[-1, 0], Rec″_L[-1, 0]), (Rec′_L[-1, H-1], Rec″_L[-1, H-1]) and (Rec′_L[W-1, -1], Rec″_L[W-1, -1]) are selected as the sample pairs for deriving the LIC parameters, where W and H represent the width and height of the current block, and the current block may be a chroma block or a luma block. For simplicity of explanation, only the neighboring samples of the current block in the CCLM mode are depicted in the following examples, and the same sample patterns can be easily mapped to the LIC case.
In another embodiment, as shown in fig. 14, the leftmost sample of the above neighboring samples, the bottom sample of the left neighboring samples, and the rightmost sample of the above neighboring samples are used as the selected three sample pairs. That is, the sample pairs (Rec′_L[0, -1], Rec_C[0, -1]), (Rec′_L[-1, H-1], Rec_C[-1, H-1]) and (Rec′_L[W-1, -1], Rec_C[W-1, -1]) are selected as the sample pairs for deriving the CCLM parameters. For the LIC mode, the sample pairs (Rec′_L[0, -1], Rec″_L[0, -1]), (Rec′_L[-1, H-1], Rec″_L[-1, H-1]) and (Rec′_L[W-1, -1], Rec″_L[W-1, -1]) are selected as the three sample pairs for deriving the LIC parameters.
In some other embodiments, two alternative sample selections are depicted in fig. 15 and fig. 16, respectively. In fig. 15, the leftmost sample of the above neighboring samples, the above neighboring sample at half the width from the leftmost sample (e.g., for CCLM, (Rec′_L[W/2, -1], Rec_C[W/2, -1]); for LIC, (Rec′_L[W/2, -1], Rec″_L[W/2, -1])), and the rightmost sample of the above neighboring samples are selected. In fig. 16, the leftmost sample of the above neighboring samples, the above neighboring sample at the width W from the leftmost sample (e.g., for CCLM, (Rec′_L[W, -1], Rec_C[W, -1]); for LIC, (Rec′_L[W, -1], Rec″_L[W, -1])), and the rightmost sample of the extended above neighboring samples (e.g., for CCLM, (Rec′_L[2W-1, -1], Rec_C[2W-1, -1]); for LIC, (Rec′_L[2W-1, -1], Rec″_L[2W-1, -1])) are selected, with the above neighboring samples extended to (W+W). Similar sample selections can be applied to the left neighboring samples.
The sample pair selection is not limited to the above-described embodiments. The three sample pairs may be any three sample pairs selected from the above or left reconstructed neighboring samples, and the neighboring samples are not limited to only a single row above or a single column to the left of the current block.
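For concreteness, the first selection above (figs. 12 and 13) can be written as the following sketch, using the convention that (-1, y) addresses the left neighboring column and (x, -1) the above neighboring row; the helper name is hypothetical.

```cpp
#include <array>
#include <utility>

// Illustrative sketch of the X = 3 selection of figs. 12 and 13: the top and
// bottom samples of the left neighboring column and the rightmost sample of
// the above neighboring row, for a current (chroma) block of size W x H.
std::array<std::pair<int, int>, 3> selectThreePairPositions(int W, int H) {
    return {{ {-1, 0},         // top sample of the left neighbors
              {-1, H - 1},     // bottom sample of the left neighbors
              {W - 1, -1} }};  // rightmost sample of the above neighbors
}
```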
CCLM/LIC parameter derivation method when X=3
In the following paragraphs, the derivation method is described with reference to the CCLM parameters for illustrative purposes. The method of deriving the LIC parameters is the same and is not repeated here. In one embodiment, the sample pair having the largest reference sample value and the sample pair having the smallest reference sample value are identified as the maximum sample pair and the minimum sample pair, respectively, by comparing the reference sample values. The reference sample value of the maximum sample pair is denoted x_B and its anchor sample value is denoted y_B; the reference sample value of the minimum sample pair is denoted x_A and its anchor sample value is denoted y_A. The parameters α and β are then calculated using equation (2).
In another embodiment, the sample pair with the largest reference sample value and the sample pair with the smallest reference sample value are identified by comparing the reference sample values. A downsampled sample associated with the reference sample of the maximum sample pair is then generated (e.g., using equation (4)), and the value of this downsampled sample is denoted x_B; the anchor sample value of the maximum sample pair is denoted y_B. Similarly, a downsampled sample associated with the reference sample of the minimum sample pair is generated (e.g., using equation (4)), and its value is denoted x_A; the anchor sample value of the minimum sample pair is denoted y_A. The parameters α and β are then calculated using equation (2).
In yet another embodiment, the sample pairs having the maximum, intermediate (median), and minimum reference sample values are identified as the maximum sample pair, the intermediate sample pair, and the minimum sample pair, respectively, by comparing the reference sample values. The weighted average of the reference sample values of the maximum sample pair and the intermediate sample pair is denoted x_B, and the weighted average of their anchor sample values is denoted y_B; the weighted average of the reference sample values of the intermediate sample pair and the minimum sample pair is denoted x_A, and the weighted average of their anchor sample values is denoted y_A. x_A, y_A, x_B and y_B are derived based on the following equations:
x_A = (w1*x_mid + w2*x_min + offset1) >> N1;
y_A = (w1*y_mid + w2*y_min + offset1) >> N1;
x_B = (w3*x_max + w4*x_mid + offset2) >> N2;
y_B = (w3*y_max + w4*y_mid + offset2) >> N2;
where x_max is the reference sample value of the maximum sample pair, x_mid is the reference sample value of the intermediate sample pair, x_min is the reference sample value of the minimum sample pair, y_max is the anchor sample value of the maximum sample pair, y_mid is the anchor sample value of the intermediate sample pair, and y_min is the anchor sample value of the minimum sample pair; w1, w2, w3, w4, offset1, offset2, N1 and N2 are predefined parameters; and w1 + w2 = (1 << N1), offset1 = 1 << (N1 - 1); w3 + w4 = (1 << N2), offset2 = 1 << (N2 - 1).
Using the obtained x_A, y_A, x_B and y_B, the parameters α and β are then calculated using equation (2).
In one example applying equal weights, w1 = 1, w2 = 1, w3 = 1, w4 = 1; N1 = 1, N2 = 1; and offset1 = 1, offset2 = 1.
In yet another example, w1 = 3, w2 = 1, w3 = 1, w4 = 3; N1 = 2, N2 = 2; and offset1 = 2, offset2 = 2.
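The max/mid/min embodiment above, with the equal-weight setting w1 = w2 = w3 = w4 = 1, N1 = N2 = 1 and offset1 = offset2 = 1, can be sketched as follows; floating-point division replaces the shift-based implementation purely for readability, and the structure and function names are assumptions.

```cpp
#include <algorithm>
#include <array>
#include <utility>

struct RefAnchorPair { int x; int y; };  // x: reference sample value, y: anchor sample value

// Illustrative sketch for X = 3: sort the three pairs by reference value,
// average the minimum and intermediate pairs into (xA, yA) and the
// intermediate and maximum pairs into (xB, yB) with equal weights, then fit
// the line of equation (2).
std::pair<double, double> deriveFromThreePairs(std::array<RefAnchorPair, 3> p) {
    std::sort(p.begin(), p.end(),
              [](const RefAnchorPair& a, const RefAnchorPair& b) { return a.x < b.x; });
    // p[0]: minimum pair, p[1]: intermediate pair, p[2]: maximum pair
    const int xA = (p[1].x + p[0].x + 1) >> 1;
    const int yA = (p[1].y + p[0].y + 1) >> 1;
    const int xB = (p[2].x + p[1].x + 1) >> 1;
    const int yB = (p[2].y + p[1].y + 1) >> 1;
    const double alpha = (xB == xA) ? 0.0 : static_cast<double>(yB - yA) / (xB - xA);
    const double beta  = yA - alpha * xA;
    return {alpha, beta};
}
```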
In yet another embodiment, with i, j, k used as indices of the three sample pairs, only two comparisons are performed: one between luma_i and luma_j, and one between luma_i and luma_k. By these two comparisons, the three sample pairs may be either fully ordered by luma value or divided into two groups: one group containing the two larger values and the other containing the one smaller value, or vice versa. The above method may be used when the values are fully ordered. When the sample pairs are divided into two groups, a weighted average of the luma and chroma samples within the group containing two sample pairs is derived (a group containing a single sample pair needs no averaging). For example, when a group contains two sample pairs, the two luma values are averaged with equal weights, and the two chroma values are also averaged with equal weights. The weighted averages are used as x_A, y_A, x_B and y_B to derive the CCLM parameters using equation (2).
The method of deriving the CCLM parameter may not be limited to the above embodiment. The CCLM parameters may be derived in any manner using the three sample pairs selected.
Sample pair selection when X=4
In one embodiment, as shown in fig. 17, four sample pairs are used as the selected sample pairs for deriving the CCLM/LIC parameters:
the top sample of the left neighboring samples,
the leftmost sample of the above neighboring samples,
the bottom sample of the left neighboring samples, and
the rightmost sample of the above neighboring samples.
Specifically, the sample pairs (Rec′_L[-1, 0], Rec_C[-1, 0]), (Rec′_L[0, -1], Rec_C[0, -1]), (Rec′_L[-1, H-1], Rec_C[-1, H-1]) and (Rec′_L[W-1, -1], Rec_C[W-1, -1]) are used to derive the CCLM parameters, where W and H represent the width and height of the chroma block; and the sample pairs (Rec′_L[-1, 0], Rec″_L[-1, 0]), (Rec′_L[0, -1], Rec″_L[0, -1]), (Rec′_L[-1, H-1], Rec″_L[-1, H-1]) and (Rec′_L[W-1, -1], Rec″_L[W-1, -1]) are used to derive the LIC parameters, where W and H represent the width and height of the current block, and the current block may be a chroma block or a luma block.
In another embodiment, as shown in fig. 18, the sample at one quarter of the width from the leftmost of the above neighboring samples, the sample at one quarter of the height from the top of the left neighboring samples, the bottom sample of the left neighboring samples, and the rightmost sample of the above neighboring samples are used as the sample pairs for deriving the CCLM/LIC parameters. Specifically, the sample pairs (Rec′_L[W/4, -1], Rec_C[W/4, -1]), (Rec′_L[-1, H/4], Rec_C[-1, H/4]), (Rec′_L[-1, H-1], Rec_C[-1, H-1]) and (Rec′_L[W-1, -1], Rec_C[W-1, -1]) are selected as the sample pairs for deriving the CCLM parameters. The sample pairs (Rec′_L[W/4, -1], Rec″_L[W/4, -1]), (Rec′_L[-1, H/4], Rec″_L[-1, H/4]), (Rec′_L[-1, H-1], Rec″_L[-1, H-1]) and (Rec′_L[W-1, -1], Rec″_L[W-1, -1]) are selected as the sample pairs for deriving the LIC parameters.
In yet another embodiment, another sampling pattern is presented in fig. 19: the sample pairs corresponding to Rec′_L[2, -1], Rec′_L[2+W/2, -1], Rec′_L[2+W, -1] and Rec′_L[2+3W/2, -1] are selected as the sample pairs used to derive the CCLM/LIC parameters. A similar sample selection can be applied to the left neighboring samples.
The sample pair selection is not limited to the above-described embodiments. The four sample pairs may be any four sample pairs selected from the above or left reconstructed neighboring samples, and the neighboring samples are not limited to only one row above or one column to the left of the current block. For example, a set of sample pairs includes: the sample at one quarter of the width from the leftmost of the above neighboring samples, the above neighboring sample at one quarter of the width, the sample at three quarters of the width from the leftmost of the above neighboring samples, and the above neighboring sample at three quarters of the width.
Alternatively, another set of sample pairs includes the above neighboring samples at one eighth, three eighths, five eighths, and seven eighths of the width from the leftmost of the above neighboring samples.
Alternatively, another set of sample pairs includes the left neighboring samples at one eighth, three eighths, five eighths, and seven eighths of the height from the top of the left neighboring samples.
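As an illustrative aid for this subsection, the sketch below returns the four positions of the first embodiment (fig. 17), using the same (-1, y) / (x, -1) position convention as the X = 3 sketch; the helper name is hypothetical.

```cpp
#include <array>
#include <utility>

// Illustrative sketch of the X = 4 selection of fig. 17: the top and bottom
// samples of the left neighboring column and the leftmost and rightmost
// samples of the above neighboring row, for a current block of size W x H.
std::array<std::pair<int, int>, 4> selectFourPairPositions(int W, int H) {
    return {{ {-1, 0},         // top sample of the left neighbors
              {0, -1},         // leftmost sample of the above neighbors
              {-1, H - 1},     // bottom sample of the left neighbors
              {W - 1, -1} }};  // rightmost sample of the above neighbors
}
```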
CCLM/LIC parameter derivation method when X=4
In the following paragraphs, the derivation method is described with reference to the CCLM parameters for illustrative purposes. The method of deriving the LIC parameters is the same and is not repeated here. In one embodiment, the sample pairs having the maximum and minimum reference sample values are identified as the maximum sample pair and the minimum sample pair, respectively, by comparing the reference (luma) sample values. The reference sample value of the maximum sample pair is denoted x_B and its anchor sample value is denoted y_B; the reference sample value of the minimum sample pair is denoted x_A and its anchor sample value is denoted y_A. The parameters α and β are then calculated using equation (2).
In another embodiment, the sample pair with the largest reference sample value and the sample pair with the smallest reference sample value are identified by comparing the reference sample values. A downsampled sample associated with the reference sample of the maximum sample pair is then generated (e.g., using equation (4)), and its value is denoted x_B; the anchor sample value of the maximum sample pair is denoted y_B. Similarly, a downsampled sample associated with the reference sample of the minimum sample pair is generated (e.g., using equation (4)), and its value is denoted x_A; the anchor sample value of the minimum sample pair is denoted y_A. The parameters α and β are then calculated using equation (2).
In yet another embodiment, the two sample pairs having the larger reference sample values and the two sample pairs having the smaller reference sample values are identified by comparing the reference sample values. The reference sample values of the two larger sample pairs are denoted x_B0 and x_B1, and their anchor sample values are denoted y_B0 and y_B1; the reference sample values of the two smaller sample pairs are denoted x_A0 and x_A1, and their anchor sample values are denoted y_A0 and y_A1. As illustrated in the following equations, x_A, x_B, y_A and y_B are then derived as weighted averages of x_A0, x_A1, x_B0, x_B1, y_A0, y_A1, y_B0 and y_B1. The parameters α and β are then calculated using equation (2).
x_A = (w1*x_A0 + w2*x_A1 + offset1) >> N1;
x_B = (w3*x_B0 + w4*x_B1 + offset2) >> N2;
y_A = (w1*y_A0 + w2*y_A1 + offset1) >> N1;
y_B = (w3*y_B0 + w4*y_B1 + offset2) >> N2;
where w1 + w2 = (1 << N1), offset1 = 1 << (N1 - 1); w3 + w4 = (1 << N2), offset2 = 1 << (N2 - 1).
In one example applying equal weights, w1 = 1, w2 = 1, w3 = 1, w4 = 1; N1 = 1, N2 = 1; and offset1 = 1, offset2 = 1.
In yet another example, w1 = 3, w2 = 1, w3 = 1, w4 = 3; N1 = 2, N2 = 2; and offset1 = 2, offset2 = 2.
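The two-larger/two-smaller embodiment above can be sketched with configurable weights as follows; the offsets are computed from the shifts as offset = 1 << (N - 1), the final division is kept in floating point for readability only, and the structure and function names are assumptions.

```cpp
#include <algorithm>
#include <array>
#include <utility>

struct RefAnchorPair4 { int x; int y; };  // x: reference sample value, y: anchor sample value

// Illustrative sketch for X = 4: sort the four pairs by reference value,
// combine the two smaller pairs into (xA, yA) and the two larger pairs into
// (xB, yB) using the weights w1..w4 and shifts N1, N2, then fit the line of
// equation (2).
std::pair<double, double> deriveFromFourPairs(std::array<RefAnchorPair4, 4> p,
                                              int w1, int w2, int w3, int w4,
                                              int N1, int N2) {
    std::sort(p.begin(), p.end(),
              [](const RefAnchorPair4& a, const RefAnchorPair4& b) { return a.x < b.x; });
    const int off1 = 1 << (N1 - 1), off2 = 1 << (N2 - 1);
    const int xA = (w1 * p[0].x + w2 * p[1].x + off1) >> N1;
    const int yA = (w1 * p[0].y + w2 * p[1].y + off1) >> N1;
    const int xB = (w3 * p[2].x + w4 * p[3].x + off2) >> N2;
    const int yB = (w3 * p[2].y + w4 * p[3].y + off2) >> N2;
    const double alpha = (xB == xA) ? 0.0 : static_cast<double>(yB - yA) / (xB - xA);
    const double beta  = yA - alpha * xA;
    return {alpha, beta};
}
```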
The method of deriving the CCLM/LIC parameters may not be limited to the above embodiments. The CCLM/LIC parameters may be derived in any manner using these four selected pairs of samples.
Fig. 20 is a block diagram illustrating an apparatus for video encoding and decoding according to some embodiments of the present disclosure. The apparatus 2000 may be a terminal such as a mobile phone, tablet computer, digital broadcast terminal, tablet device, or personal digital assistant.
As shown in fig. 20, the apparatus 2000 may include one or more of the following: a processing component 2002, a memory 2004, a power supply component 2006, a multimedia component 2008, an audio component 2010, an input/output (I/O) interface 2012, a sensor component 2020, and a communication component 2016.
The processing component 2002 generally controls overall operation of the device 2000, such as operations related to display, telephone calls, data communications, camera operations, and recording operations. The processing element 2002 may include one or more processors 2020 for executing instructions to perform all or part of the steps of the methods described above. Further, the processing component 2002 may include one or more modules to facilitate interactions between the processing component 2002 and other components. For example, processing component 2002 can include multimedia modules to facilitate interaction between multimedia component 2008 and processing component 2002.
The memory 2004 is configured to store different types of data to support the operation of the apparatus 2000. Examples of such data include instructions for any application or method running on the device 2000, contact data, phonebook data, messages, pictures, video, and the like. Memory 2004 may be implemented by any type of volatile or non-volatile memory device or combination thereof, and memory 2004 may be a Static Random Access Memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic disk, or an optical disk.
The power supply unit 2006 provides power to the different components of the apparatus 2000. The power supply components 2006 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 2000.
The multimedia component 2008 includes a screen that provides an output interface between the device 2000 and the user. In some examples, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen comprises a touch panel, the screen may be implemented as a touch screen that receives input signals from a user. The touch panel may include one or more touch sensors for sensing touches, swipes, and gestures on the touch panel. The touch sensor may sense not only a boundary of a touch action or a sliding action, but also a duration and a pressure associated with the touch or sliding operation. In some examples, multimedia component 2008 may include a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the apparatus 2000 is in an operational mode (e.g., a photographing mode or a video mode).
Audio component 2010 is configured to output and/or input audio signals. For example, audio component 2010 includes a Microphone (MIC). The microphone is configured to receive external audio signals when the device 2000 is in an operational mode (e.g., a call mode, a recording mode, and a voice recognition mode). The received audio signal may be further stored in the memory 2004 or transmitted via the communication component 2016. In some examples, audio component 2010 further includes a speaker for outputting audio signals.
I/O interface 2012 provides an interface between the processing component 2002 and peripheral interface modules. The peripheral interface module may be a keyboard, click wheel, button, etc. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The sensor component 2020 includes one or more sensors for providing status assessment of various aspects of the apparatus 2000. For example, the sensor component 2020 may detect an on/off state of the device 2000 and a relative position of the components. For example, the components are a display and a keyboard of the device 2000. The sensor component 2020 may also detect a change in position of the device 2000 or a component of the device 2000, the presence or absence of user contact with the device 2000, an orientation or acceleration/deceleration of the device 2000, and a change in temperature of the device 2000. The sensor component 2020 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 2020 may further include an optical sensor, such as a CMOS or CCD image sensor used in imaging applications. In some examples, the sensor component 2020 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 2016 is configured to facilitate wired or wireless communication between the apparatus 2000 and other devices. The device 2000 may access a wireless network based on a communication standard such as WiFi, 4G, or a combination thereof. In an example, the communication section 2016 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an example, the communication component 2016 may further include a Near Field Communication (NFC) module for facilitating short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an example, the apparatus 2000 may be implemented by one or more of the following to perform the above-described methods: an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components.
The non-transitory computer readable storage medium may be, for example, a Hard Disk Drive (HDD), a Solid State Drive (SSD), a flash memory, a hybrid drive or Solid State Hybrid Drive (SSHD), a read-only memory (ROM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, etc.
Fig. 21 is a flowchart illustrating an exemplary video codec process for generating a prediction signal using a linear model according to some embodiments of the present disclosure.
In step 2102, the processor 2020 derives the prediction parameters α and β through a parameter derivation process using the neighboring reconstructed chroma samples and their reference samples.
In some embodiments, the parameter derivation process for deriving the predicted parameters α and β is provided, for example, by a device, circuit, or block of computer program code.
In step 2104, the processor 2020 determines whether to apply a local illumination compensation (LIC) mode to the current coding unit (CU), and when it is determined to apply the LIC mode, derives the parameters α2 and β2 of the LIC by performing the parameter derivation process and obtains a final LIC prediction value based on the following equation:
pred_L(i, j) = α2 · rec″_L(i, j) + β2;
where α2 and β2 are examples of the parameters α and β; pred_L(i, j) represents the value of an LIC prediction sample in the current CU; and rec″_L(i, j) represents the value of the corresponding reference sample in the reference picture of the current CU.
In some examples, processor 2020 may further determine whether to apply a cross-component linear model (CCLM) mode to a current Coding Unit (CU), and when determining to apply the CCLM mode, derive parameters α1 and β1 of the CCLM by performing a parameter derivation process, and obtain a final CCLM predicted value based on the following equation:
pred_C(i, j) = α1 · rec′_L(i, j) + β1;
where α1 and β1 are examples of the parameters α and β; pred_C(i, j) represents the value of a CCLM-predicted chroma sample in the current CU; and rec′_L(i, j) represents the value of the corresponding downsampled reconstructed luma sample of the current CU.
The parameter derivation process implements an algorithm that derives parameters α and β using a preset number X of pairs of samples, each pair of samples including an anchor sample that is an adjacent reconstructed sample and a corresponding reference sample for the anchor sample.
In some examples, an apparatus for video encoding and decoding is provided. The device comprises: a processor 2020; and a memory 2004 configured to store instructions executable by the processor; wherein the processor, when executing the instructions, is configured to perform the method as illustrated in fig. 21.
In some other examples, a non-transitory computer-readable storage medium 2004 having instructions stored therein is provided. The instructions, when executed by the processor 2020, cause the processor to perform a method as illustrated in fig. 21.
The description of the present disclosure has been presented for purposes of illustration and is not intended to be exhaustive or limited to the disclosure. Many modifications, variations and alternative embodiments will come to mind to one skilled in the art having the benefit of the teachings presented in the foregoing descriptions and the associated drawings.
The embodiments were chosen and described in order to explain the principles of the present disclosure and to enable others of ordinary skill in the art to understand the various embodiments of the present disclosure and best utilize the basic principles and various embodiments with various modifications as are suited to the particular use contemplated. Therefore, it is to be understood that the scope of the disclosure should not be limited to the specific examples of the disclosed embodiments, and that modifications and other embodiments are intended to be included within the scope of the disclosure.

Claims (12)

1. A method for video encoding and decoding, the method comprising:
deriving prediction parameters α and β through a parameter derivation process using neighboring reconstructed chroma samples and their reference samples,
wherein deriving the prediction parameters α and β through the parameter derivation process using the neighboring reconstructed chroma samples and their reference samples comprises: when it is determined that a local illumination compensation (LIC) mode is applied to a current coding unit (CU), deriving parameters α2 and β2 of the LIC by performing the parameter derivation process, and obtaining a final LIC prediction value based on the following equation:
pred_L(i, j) = α2 · rec″_L(i, j) + β2;
wherein α2 and β2 are examples of the parameters α and β;
pred_L(i, j) represents a value of an LIC prediction sample in the current CU; and
rec″_L(i, j) represents a value of a reference sample in a reference picture of the current CU,
wherein the parameter derivation process derives the parameters α and β using a preset number X of sample pairs, each sample pair comprising an anchor sample that is a neighboring reconstructed sample and a corresponding reference sample for the anchor sample,
wherein the parameter derivation process comprises:
identifying, by comparing reference sample values of the X sample pairs, a first sample pair having a maximum reference sample value x_max, a second sample pair having a minimum reference sample value x_min, and a third sample pair having an intermediate reference sample value x_mid;
deriving, based on the following equations, a first weighted average x_A of the reference sample values of the second sample pair and the third sample pair, a first weighted average y_A of an anchor sample value y_min of the second sample pair and an anchor sample value y_mid of the third sample pair, a second weighted average x_B of the reference sample values of the first sample pair and the third sample pair, and a second weighted average y_B of an anchor sample value y_max of the first sample pair and the anchor sample value y_mid of the third sample pair:
x_A = (w1*x_mid + w2*x_min + offset1) >> N1;
y_A = (w1*y_mid + w2*y_min + offset1) >> N1;
x_B = (w3*x_max + w4*x_mid + offset2) >> N2;
y_B = (w3*y_max + w4*y_mid + offset2) >> N2;
Wherein w1, w2, w3, w4, offset1, offset2, N1 and N2 are predefined parameters; and
the parameters are obtained based on the following equation And->
Wherein the preset number X is an odd number and the preset number X is 3,
wherein the predefined parameters satisfy the following relationship:
w1+w2=(1<<N1),w3+w4=(1<<N2),offset1=1<<(N1-1),offset2=1<<(N2-1),
w1=w4, w2=w3, and w1 is different from w2.
2. The method of claim 1, wherein deriving the prediction parameters α and β through the parameter derivation process using the neighboring reconstructed chroma samples and their reference samples further comprises:
when it is determined that a cross-component linear model (CCLM) mode is applied to the current coding unit (CU), deriving parameters α1 and β1 of the CCLM by performing the parameter derivation process, and obtaining a final CCLM prediction value based on the following equation:
pred_C(i, j) = α1 · rec′_L(i, j) + β1;
wherein α1 and β1 are examples of the parameters α and β;
pred_C(i, j) represents a value of a CCLM-predicted chroma sample in the current CU; and
rec′_L(i, j) represents a value of a downsampled reconstructed luma sample of the current CU.
3. The method of claim 2, wherein the three sample pairs comprise a top sample of left neighboring samples, a bottom sample of the left neighboring samples, and a rightmost sample of above neighboring samples.
4. The method of claim 3, wherein the anchor samples are neighboring chroma samples; in the CCLM mode, the reference samples are luma samples at positions corresponding to the anchor samples; and in the LIC mode, the reference samples are reference samples at positions in the reference picture corresponding to the anchor samples.
5. A device for video encoding and decoding, the device comprising:
a processor; and
a memory configured to store instructions executable by the processor; wherein the processor, when executing the instructions, is configured to:
derive prediction parameters α and β through a parameter derivation process using neighboring reconstructed chroma samples and their reference samples; wherein deriving the prediction parameters α and β through the parameter derivation process using the neighboring reconstructed chroma samples and their reference samples comprises: when it is determined that a local illumination compensation (LIC) mode is applied to a current coding unit (CU), deriving parameters α2 and β2 of the LIC by performing the parameter derivation process, and obtaining a final LIC prediction value based on the following equation:
pred_L(i, j) = α2 · rec″_L(i, j) + β2;
wherein α2 and β2 are examples of the parameters α and β;
pred_L(i, j) represents a value of an LIC prediction sample in the current CU; and
rec″_L(i, j) represents a value of a reference sample in a reference picture of the current CU,
wherein the parameter derivation process derives the parameters α and β using a preset number X of sample pairs, each sample pair comprising an anchor sample that is a neighboring reconstructed sample and a corresponding reference sample for the anchor sample,
Wherein the parameter derivation process comprises:
identifying, by comparing reference sample values of the reference samples in the X sample pairs, a first sample pair having a maximum reference sample value x_max, a second sample pair having a minimum reference sample value x_min, and a third sample pair having an intermediate reference sample value x_mid;
deriving, based on the following equations, a first weighted average x_A of the reference sample values of the second sample pair and the third sample pair, a first weighted average y_A of an anchor sample value y_min of the second sample pair and an anchor sample value y_mid of the third sample pair, a second weighted average x_B of the reference sample values of the first sample pair and the third sample pair, and a second weighted average y_B of an anchor sample value y_max of the first sample pair and the anchor sample value y_mid of the third sample pair:
x_A = (w1*x_mid + w2*x_min + offset1) >> N1;
y_A = (w1*y_mid + w2*y_min + offset1) >> N1;
x_B = (w3*x_max + w4*x_mid + offset2) >> N2;
y_B = (w3*y_max + w4*y_mid + offset2) >> N2;
Wherein w1, w2, w3, w4, offset1, offset2, N1 and N2 are predefined parameters; and
obtaining the parameters α and β based on the following equations:
α = (y_B - y_A) / (x_B - x_A);
β = y_A - α·x_A,
Wherein the preset number X is an odd number and the preset number X is 3,
wherein the predefined parameters satisfy the following relationship:
w1+w2=(1<<N1),w3+w4=(1<<N2),offset1=1<<(N1-1),offset2=1<<(N2-1),
w1=w4, w2=w3, and w1 is different from w2.
6. The apparatus of claim 5, wherein deriving the prediction parameters α and β through the parameter derivation process using the neighboring reconstructed chroma samples and their reference samples further comprises: when it is determined that a cross-component linear model (CCLM) mode is applied to the current coding unit (CU), deriving parameters α1 and β1 of the CCLM by performing the parameter derivation process, and obtaining a final CCLM prediction value based on the following equation:
pred_C(i, j) = α1 · rec′_L(i, j) + β1;
wherein α1 and β1 are examples of the parameters α and β;
pred_C(i, j) represents a value of a CCLM-predicted chroma sample in the current CU; and
rec′_L(i, j) represents a value of a downsampled reconstructed luma sample of the current CU.
7. The apparatus of claim 6, wherein the three sample pairs comprise a top sample of left neighboring samples, a bottom sample of the left neighboring samples, and a rightmost sample of above neighboring samples.
8. The apparatus of claim 7, wherein the anchor samples are neighboring chroma samples; in the CCLM mode, the reference samples are luma samples at positions corresponding to the anchor samples; and in the LIC mode, the reference samples are reference samples at positions in the reference picture corresponding to the anchor samples.
9. A non-transitory computer-readable storage medium comprising instructions stored therein, wherein the instructions, when executed by a processor, cause the processor to:
derive prediction parameters α and β through a parameter derivation process using neighboring reconstructed chroma samples and their reference samples; wherein deriving the prediction parameters α and β through the parameter derivation process using the neighboring reconstructed chroma samples and their reference samples comprises: when it is determined that a local illumination compensation (LIC) mode is applied to a current coding unit (CU), deriving parameters α2 and β2 of the LIC by performing the parameter derivation process, and obtaining a final LIC prediction value based on the following equation:
pred_L(i, j) = α2 · rec″_L(i, j) + β2;
wherein α2 and β2 are examples of the parameters α and β;
pred_L(i, j) represents a value of an LIC prediction sample in the current CU; and
rec″_L(i, j) represents a value of a reference sample in a reference picture of the current CU,
wherein the parameter derivation process derives the parameters α and β using a preset number X of sample pairs, each sample pair comprising an anchor sample that is a neighboring reconstructed sample and a corresponding reference sample for the anchor sample,
wherein the parameter derivation process comprises:
identifying, by comparing reference sample values of the X sample pairs, a first sample pair having a maximum reference sample value x_max, a second sample pair having a minimum reference sample value x_min, and a third sample pair having an intermediate reference sample value x_mid;
deriving, based on the following equations, a first weighted average x_A of the reference sample values of the second sample pair and the third sample pair, a first weighted average y_A of an anchor sample value y_min of the second sample pair and an anchor sample value y_mid of the third sample pair, a second weighted average x_B of the reference sample values of the first sample pair and the third sample pair, and a second weighted average y_B of an anchor sample value y_max of the first sample pair and the anchor sample value y_mid of the third sample pair:
x_A = (w1*x_mid + w2*x_min + offset1) >> N1;
y_A = (w1*y_mid + w2*y_min + offset1) >> N1;
x_B = (w3*x_max + w4*x_mid + offset2) >> N2;
y_B = (w3*y_max + w4*y_mid + offset2) >> N2;
Wherein w1, w2, w3, w4, offset1, offset2, N1 and N2 are predefined parameters; and
obtaining the parameters α and β based on the following equations:
α = (y_B - y_A) / (x_B - x_A);
β = y_A - α·x_A,
Wherein the preset number X is an odd number and the preset number X is 3,
wherein the predefined parameters satisfy the following relationship:
w1+w2=(1<<N1),w3+w4=(1<<N2),offset1=1<<(N1-1),offset2=1<<(N2-1),
w1=w4, w2=w3, and w1 is different from w2.
10. The non-transitory computer-readable storage medium of claim 9, wherein deriving the prediction parameters α and β through the parameter derivation process using the neighboring reconstructed chroma samples and their reference samples further comprises:
when it is determined that a cross-component linear model (CCLM) mode is applied to the current coding unit (CU), deriving parameters α1 and β1 of the CCLM by performing the parameter derivation process, and obtaining a final CCLM prediction value based on the following equation:
pred_C(i, j) = α1 · rec′_L(i, j) + β1;
wherein α1 and β1 are examples of the parameters α and β;
pred_C(i, j) represents a value of a CCLM-predicted chroma sample in the current CU; and
rec′_L(i, j) represents a value of a downsampled reconstructed luma sample of the current CU.
11. The non-transitory computer-readable storage medium of claim 10, wherein the three sample pairs comprise a top sample of left neighboring samples, a bottom sample of the left neighboring samples, and a rightmost sample of above neighboring samples.
12. The non-transitory computer-readable storage medium of claim 11, wherein the anchor samples are neighboring chroma samples; in the CCLM mode, the reference samples are luma samples at positions corresponding to the anchor samples; and in the LIC mode, the reference samples are reference samples at positions in the reference picture corresponding to the anchor samples.
CN202080016731.9A 2019-01-17 2020-01-16 Method and apparatus for linear model derivation for video coding and decoding Active CN113491130B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201962793869P 2019-01-17 2019-01-17
US62/793,869 2019-01-17
PCT/US2020/013965 WO2020150535A1 (en) 2019-01-17 2020-01-16 Methods and apparatus of linear model derivation for video coding

Publications (2)

Publication Number Publication Date
CN113491130A CN113491130A (en) 2021-10-08
CN113491130B true CN113491130B (en) 2024-02-27

Family

ID=71613454

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080016731.9A Active CN113491130B (en) 2019-01-17 2020-01-16 Method and apparatus for linear model derivation for video coding and decoding

Country Status (2)

Country Link
CN (1) CN113491130B (en)
WO (1) WO2020150535A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116868571A (en) * 2021-02-22 2023-10-10 北京达佳互联信息技术有限公司 Improved local illumination compensation for inter prediction
WO2023134452A1 (en) * 2022-01-11 2023-07-20 Beijing Bytedance Network Technology Co., Ltd. Method, apparatus, and medium for video processing
WO2024037649A1 (en) * 2022-08-19 2024-02-22 Douyin Vision Co., Ltd. Extension of local illumination compensation

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018039596A1 (en) * 2016-08-26 2018-03-01 Qualcomm Incorporated Unification of parameters derivation procedures for local illumination compensation and cross-component linear model prediction
CN107810635A (en) * 2015-06-16 2018-03-16 Lg 电子株式会社 Method and apparatus based on illuminance compensation prediction block in image compiling system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2699253C2 (en) * 2010-09-03 2019-09-04 Гуандун Оппо Мобайл Телекоммьюникейшнз Корп., Лтд. Method and system for compensation of illumination and transition when encoding and processing video signal
US10419757B2 (en) * 2016-08-31 2019-09-17 Qualcomm Incorporated Cross-component filter
US10652575B2 (en) * 2016-09-15 2020-05-12 Qualcomm Incorporated Linear model chroma intra prediction for video coding
US10880570B2 (en) * 2016-10-05 2020-12-29 Qualcomm Incorporated Systems and methods of adaptively determining template size for illumination compensation
US10542280B2 (en) * 2017-01-09 2020-01-21 QUALCOMM Incorpated Encoding optimization with illumination compensation and integer motion vector restriction
WO2019004283A1 (en) * 2017-06-28 2019-01-03 シャープ株式会社 Video encoding device and video decoding device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107810635A (en) * 2015-06-16 2018-03-16 Lg 电子株式会社 Method and apparatus based on illuminance compensation prediction block in image compiling system
WO2018039596A1 (en) * 2016-08-26 2018-03-01 Qualcomm Incorporated Unification of parameters derivation procedures for local illumination compensation and cross-component linear model prediction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CE3-related: Modified linear model derivation for CCLM modes; Meng Wang et al.; Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11; 2019-01-03; sections 1-2 of the main text *

Also Published As

Publication number Publication date
WO2020150535A1 (en) 2020-07-23
CN113491130A (en) 2021-10-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant