CN112118452A

CN112118452A - Video decoding method and device and computer equipment

Info

Publication number: CN112118452A
Application number: CN202010551778.7A
Authority: CN
Inventors: 李贵春; 李翔; 许晓中; 刘杉
Original assignee: Tencent America LLC
Current assignee: Tencent America LLC
Priority date: 2019-06-20
Filing date: 2020-06-17
Publication date: 2020-12-22
Anticipated expiration: 2040-06-17
Also published as: CN112118452B

Abstract

Embodiments of the present application provide a method and apparatus for video decoding, a computer device, and a storage medium. The method can comprise the following steps: receiving information about a current data block of an image; determining whether Local Illumination Compensation (LIC) is available for the current data block; when it is determined that the LIC is available for the current data block, performing at least one of: inferring an LIC flag of the current data block corresponding to the LIC that is enabled to be a valid value, or inheriting the LIC flag of the current data block from an LIC flag of a neighboring block; and, based on the LIC flag of the current data block corresponding to the LIC that is enabled, generating a prediction of at least one sub-block using a derived motion vector by applying the LIC to the current data block using the inherited LIC flag.

Description

Video decoding method and device and computer equipment

PRIORITY INFORMATION

This application claims priority to U.S. Provisional Application No. 62/864,461, filed on June 20, 2019, and U.S. Application No. 16/894,051, filed on June 5, 2020, the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to video encoding and decoding techniques. In particular, the present application relates to a method and apparatus for video decoding, a computer device and a storage medium.

Background

For decades, it has been known to use inter-picture prediction with motion compensation for video encoding and decoding. Uncompressed digital video may comprise a series of pictures, each picture having spatial dimensions of, for example, 1920x1080 luma samples and related chroma samples. The series of pictures has a fixed or variable picture rate (also informally referred to as frame rate) of, e.g., 60 pictures per second or 60 Hz. Uncompressed video has very large bit rate requirements. For example, 1080p60 4:2:0 video (1920x1080 luma sample resolution, 60 Hz frame rate) with 8 bits per sample requires close to 1.5 Gbit/s of bandwidth. One hour of such video requires more than 600 GB of storage space.
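The bandwidth and storage figures above can be checked with a short calculation. The sketch below is illustrative only; the function name and the `chroma_factor` parameter are assumptions introduced here (1.5 samples per pixel corresponds to 4:2:0 sampling, where each of the two chroma planes has one quarter of the luma resolution).

```python
def raw_video_bitrate(width, height, fps, bits_per_sample, chroma_factor=1.5):
    """Return the uncompressed bit rate in bits per second.

    chroma_factor is the number of samples per luma position:
    1.5 for 4:2:0 (one luma sample plus two quarter-resolution chroma samples).
    """
    samples_per_picture = width * height * chroma_factor
    return samples_per_picture * bits_per_sample * fps


bitrate = raw_video_bitrate(1920, 1080, 60, 8)   # bits per second
bytes_per_hour = bitrate * 3600 / 8              # bytes for one hour of video

print(f"{bitrate / 1e9:.2f} Gbit/s")             # 1.49 Gbit/s
print(f"{bytes_per_hour / 1e9:.0f} GB per hour")  # 672 GB per hour
```

The result, roughly 1.49 Gbit/s and about 672 GB per hour, agrees with the "close to 1.5 Gbit/s" and "more than 600 GB" figures stated above.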

One purpose of video encoding and decoding is to reduce redundant information in an input video signal through compression. Video compression can help reduce the bandwidth or storage requirements described above, in some cases by two or more orders of magnitude. Lossless compression, lossy compression, or a combination of both may be employed. Lossless compression refers to techniques by which an exact copy of the original signal can be reconstructed from the compressed signal. When lossy compression is used, the reconstructed signal may not be identical to the original signal, but the distortion between the original signal and the reconstructed signal is small enough that the reconstructed signal is useful for the intended application. Lossy compression is widely used for video. The amount of distortion tolerated depends on the application. For example, users of some consumer streaming applications may tolerate higher distortion than users of television applications. The achievable compression ratio reflects this: higher allowable/tolerable distortion may result in higher compression ratios.

Video compression/decompression techniques are generally understood by those of ordinary skill in the art. Generally, to compress video or image data, a series of functional steps may be performed to produce a compressed video or image file. Although images, such as 360° images (e.g., captured by a 360° camera), may be suitable for compression, for ease of reading, compression of video files will be explained. To generate a compressed video file, under conventional standards (e.g., H.264, H.265), a stream of uncompressed video samples received from a video source may be split or parsed, which results in sample blocks of two or more reference pictures.

Bi-directional prediction may involve the following technique: a Prediction Unit (PU), e.g., a block of prediction samples, may be predicted from two blocks of motion-compensated samples of two or more reference pictures. Bi-prediction was first introduced into video coding standards in MPEG-1 and has been included in other video coding techniques and standards, such as MPEG-2 Part 2 (or H.262), H.264, and H.265, among others.

When decompressing a compressed video file, in reconstructing samples of a bi-directionally predicted PU, motion compensated and interpolated input samples from each reference block may be multiplied by a weighting factor, and the so weighted sample values of the two reference blocks may be added to generate the samples being reconstructed, where the weighting factor for each reference block is different. This sample may be further processed by mechanisms such as loop filtering.

In MPEG-1 and MPEG-2, the weighting factor is determined based on the relative temporal distance between the picture to which the PU being reconstructed belongs and the two reference pictures. This is possible because in MPEG-1 and MPEG-2 one of the two reference I or P pictures is in the "past" and the other is in the "future" of the B picture being reconstructed (in terms of presentation order), and because in MPEG-1 and MPEG-2 a well-defined temporal relationship is established for any picture being reconstructed that is related to its reference picture.

Starting from H.264, the reference picture selection concept for bi-directionally predicted pictures is relaxed, so that a reference picture only needs to be earlier in decoding order, not in presentation order. The notion of time is also relaxed, because neither H.264 nor H.265 requires a constrained/fixed picture interval in the time domain. Therefore, the decoder can no longer calculate the weighting factors based on timing information available in the bitstream. Instead, H.264 and H.265 include a "default value" of 0.5 as the weighting factor for the reference samples of a bi-directionally predicted picture. This default value may be overridden by a syntax structure called pred_weight_table() that is available in the slice header. The default value of 0.5, or the information in pred_weight_table(), applies to all bi-directionally predicted PUs in a given slice.
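The weighted combination of two reference blocks described above can be illustrated with a minimal sketch. This is not the normative process of any standard: real codecs use integer weights, shifts, and offsets, whereas the sketch uses floating-point weights for readability. The function name, the 8-bit clipping range, and the sample values are assumptions chosen for illustration.

```python
def bi_predict(block0, block1, w0=0.5, w1=0.5, max_val=255):
    """Combine two motion-compensated reference blocks sample by sample.

    w0 and w1 are the weighting factors; 0.5 / 0.5 mirrors the default
    described in the text. Results are rounded half up and clipped to
    [0, max_val] (8-bit samples are assumed).
    """
    if len(block0) != len(block1):
        raise ValueError("reference blocks must have the same height")
    return [
        [min(max_val, max(0, int(w0 * a + w1 * b + 0.5))) for a, b in zip(r0, r1)]
        for r0, r1 in zip(block0, block1)
    ]


ref0 = [[100, 104], [96, 90]]
ref1 = [[110, 100], [100, 94]]

default_pred = bi_predict(ref0, ref1)                      # default weights
weighted_pred = bi_predict(ref0, ref1, w0=0.75, w1=0.25)   # overridden weights

print(default_pred)   # [[105, 102], [98, 92]]
print(weighted_pred)  # [[103, 103], [97, 91]]
```

Because the weights apply to every bi-predicted PU in a slice, an override signaled in the slice header changes the result for all such blocks at once, as the second call suggests.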

Non-patent document 1: "Transform design for HEVC with 16-bit intermediate data representation" (document number: JCTVC-E243), published by the Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11 at its 5th meeting in Geneva, Switzerland, March 16 to 23, 2011. The H.265/HEVC (High Efficiency Video Coding) standard, published by ITU-T VCEG (Q6/16) and ISO/IEC MPEG (JTC 1/SC29/WG 11), may include version 1 (2013), version 2 (2014), version 3 (2015), and version 4 (2016).

Non-patent document 1 relates to the H.265/HEVC standard. However, the inventors have investigated the need to standardize future video coding techniques whose compression capability significantly exceeds that of the HEVC standard (including its extensions).

Non-patent document 2: "Algorithm description for Versatile Video Coding and Test Model 1 (VTM 1)" (document number: JVET-J1002-v2), published by the Joint Video Experts Team (JVET) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11 at its 10th meeting in San Diego, USA, April 10 to 20, 2018.

Non-patent document 2 discloses a recently introduced standardized format for next-generation video coding beyond HEVC, called Versatile Video Coding (VVC), with the version VTM (VVC Test Model). VVC may generally provide large (e.g., 64-point or higher) transform kernels using 10-bit integer matrices.

However, when LIC is applied to an affine-coded block and the current block is coded using the constructed affine merge mode, LIC is enabled whenever the LIC flag of any source of a constructed control point is set to true. This may not be optimal for coding efficiency.

When the current block is coded in affine mode using Motion Vector Difference (MVD) coding, then the LIC flag is explicitly signaled. This results in additional signaling costs.

Disclosure of Invention

The embodiment of the application provides a decoding or encoding method and device, computer equipment and a storage medium, and aims to solve the problems of low encoding efficiency when LIC is applied to an affine encoding block and extra signal transmission cost caused when a current block is encoded by using MVD encoding in an affine mode.

According to an embodiment of the present application, a method of video decoding may include:

receiving information about a current data block of an image;

determining whether local illumination compensation, LIC, is available for the current data block;

when it is determined that the LIC is available for the current data block, performing at least one of: inferring an LIC flag of the current data block corresponding to the LIC that is enabled to be a valid value, or inheriting the LIC flag of the current data block from an LIC flag of a neighboring block; and

generating a prediction of at least one sub-block using a derived motion vector based on the LIC flag corresponding to the current data block for which the LIC is enabled, by applying the LIC to the current data block using the inherited LIC flag.
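The steps above can be sketched as follows. This is a hedged illustration, not the claimed implementation: the `Block` type, the function names, the order in which inheritance and inference are tried, and the linear form of the compensation (a scale and an offset applied to each motion-compensated sample, as commonly used for LIC) are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Block:
    """Hypothetical container for the information received about a data block."""
    lic_available: bool
    neighbor_lic_flag: Optional[bool] = None  # LIC flag of a neighboring block, if known


def derive_lic_flag(block: Block, inferred_value: bool = True) -> bool:
    """Derive the LIC flag without reading an explicitly signaled flag."""
    if not block.lic_available:
        return False                      # LIC cannot be used for this block
    if block.neighbor_lic_flag is not None:
        return block.neighbor_lic_flag    # inherit from the neighboring block
    return inferred_value                 # otherwise infer the flag


def apply_lic(samples: Sequence[int], scale: float, offset: float,
              max_val: int = 255) -> List[int]:
    """Apply an assumed linear illumination model: scale * sample + offset."""
    return [min(max_val, max(0, int(scale * s + offset + 0.5))) for s in samples]


def predict_sub_block(block: Block, mc_samples: Sequence[int],
                      scale: float, offset: float) -> List[int]:
    """Return the sub-block prediction, compensated only when the flag is set."""
    if derive_lic_flag(block):
        return apply_lic(mc_samples, scale, offset)
    return list(mc_samples)


mc = [100, 200]  # motion-compensated samples fetched with the derived motion vector
print(predict_sub_block(Block(True, True), mc, 1.25, 4))   # [129, 254]
print(predict_sub_block(Block(True, False), mc, 1.25, 4))  # [100, 200]
```

In this sketch no flag is parsed from the bitstream at all, which is the source of the signaling saving the application describes for affine blocks coded with MVD.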

According to an embodiment of the present application, an apparatus for video decoding may include:

a receiving module for receiving information about a current data block of an image;

a determining module for determining whether local illumination compensation, LIC, is available for the current data block;

an execution module, configured to, upon determining that the LIC is available for the current data block, execute at least one of: inferring an LIC flag of the current data block corresponding to the LIC that is enabled to be a valid value, or inheriting the LIC flag of the current data block from an LIC flag of a neighboring block; and

a generation module to generate a prediction of at least one sub-block using a derived motion vector based on the LIC flag corresponding to the current data block for which the LIC is enabled, by applying the LIC to the current data block using the inherited LIC flag.

According to an embodiment of the present application, a computer device includes one or more processors and one or more memories having stored therein at least one instruction that is loaded and executed by the one or more processors to implement the method of video decoding.

In an embodiment of the present application, when it is determined that the LIC is applied to the current data block, only the LIC flag of the current data block corresponding to the enabled LIC is inferred as a valid value, and the LIC is applied to the current data block by using the LIC flag inherited from the LIC flags of the neighboring blocks, thereby improving the encoding and decoding efficiency. Also, when the current data block is encoded using MVD coding in affine mode, the LIC flag does not need to be explicitly signaled, so that no additional signaling cost is required.

Brief description of the drawings

Other features, properties, and various advantages of the disclosed subject matter will be further apparent from the following detailed description and the accompanying drawings, in which:

Fig. 1 is a schematic diagram of a simplified block diagram of a communication system according to an embodiment;

Fig. 2 is a schematic diagram of a simplified block diagram of a streaming system according to an embodiment;

FIG. 3 is a schematic diagram of a simplified block diagram of a decoder according to an embodiment;

FIG. 4 is a schematic diagram of a simplified block diagram of an encoding system including an encoder and a local decoder, according to an embodiment;

Fig. 5A and 5B show schematic diagrams of affine motion fields of a block according to an embodiment.

FIG. 5C shows equation (1) for an embodiment of a 4-parameter affine motion model.

FIG. 5D shows equation (2) for an embodiment of a 6-parameter affine motion model.

Fig. 6 shows a schematic diagram of affine MVF (MV field) per sub-block according to an embodiment.

Fig. 7 shows a schematic diagram of the locations of spatial merge candidate blocks according to an embodiment.

Fig. 8 shows a schematic diagram of control point motion vector inheritance according to an embodiment.

Fig. 9 shows a schematic diagram of a distribution of candidate locations for constructing an affine merging pattern according to an embodiment.

Fig. 10 shows a schematic of adjacent samples that may be used to derive LIC parameters, according to an embodiment.

Fig. 11 shows a schematic diagram of LIC with bi-prediction according to an embodiment.

Fig. 12 illustrates LIC having a multi-hypothesis intra frame LIC flag as part of motion information other than MV and reference index according to an embodiment.

Fig. 13A shows a schematic diagram of obtaining a reference sample of an affine coded CU in method one according to an embodiment.

Fig. 13B shows a schematic diagram of obtaining a reference sample of an affine coded CU in method two according to an embodiment.

Fig. 13C shows a schematic diagram of obtaining a reference sample of an affine coded CU in method three according to an embodiment.

FIG. 14 is a flow diagram of an exemplary method according to an embodiment;

Fig. 15 is a schematic diagram of a computer system, according to an embodiment.

Detailed Description

Fig. 1 is a simplified block diagram of a communication system (100) according to an embodiment disclosed herein. The communication system (100) comprises at least two terminal devices (110, 120) which can communicate with each other via a network (150). For unidirectional data transmission, the first terminal device (110) may encode video data at a local location for transmission over the network (150) to the second terminal device (120). The second terminal device (120) may receive the encoded video data of the other terminal from the network (150), decode the encoded video data to recover the video data, and display the recovered video data. Unidirectional data transmission is common in applications such as media services.

Fig. 1 also shows a second pair of terminal devices (130, 140) supporting bi-directional transmission of encoded video, which may occur, for example, during a video conference. For bi-directional data transmission, each of the third terminal device (130) and the fourth terminal device (140) may encode video data captured at its local location for transmission over the network (150) to the other of the two devices. Each of the third terminal device (130) and the fourth terminal device (140) may also receive the encoded video data transmitted by the other device, decode the encoded video data, and display the recovered video data on a local display apparatus.

In Fig. 1, the first terminal device (110), the second terminal device (120), the third terminal device (130), and the fourth terminal device (140) may be a laptop computer (110), a server (120), and smartphones (130 and 140), but the principles disclosed herein are not limited thereto. Embodiments disclosed herein are applicable to other devices including, but not limited to, laptop computers, tablet computers, media players, and/or dedicated video conferencing devices. The network (150) represents any number of networks that communicate encoded video data between the first terminal device (110), the second terminal device (120), the third terminal device (130), and the fourth terminal device (140), including, for example, wired and/or wireless communication networks. The communication network (150) may exchange data in circuit-switched and/or packet-switched channels. The network may include a telecommunications network, a local area network, a wide area network, and/or the internet. For purposes of this application, the architecture and topology of the network (150) may be immaterial to the operation disclosed herein, unless explained below.

As an example of an application of the disclosed subject matter, Fig. 2 illustrates the placement of a decoder (210) in a streaming environment/streaming system (200). The decoder (210) is discussed further with reference to Fig. 3 and to the decoder (433) in Fig. 4. The decoder (210) may correspond to the decoder (210) in Fig. 3 or the decoder (433) in Fig. 4.

The subject matter disclosed herein is equally applicable to other video-enabled applications including, for example, video conferencing, digital TV, storing compressed video on digital media including CDs, DVDs, memory sticks, and the like.

As shown in Fig. 2, the streaming system (200) may include an acquisition subsystem (213), which may include a video source (201), such as a digital camera, that creates an uncompressed video sample stream (202). The video sample stream (202) is depicted as a thick line to emphasize its high data volume compared to the encoded video bitstream, and may be processed by a source encoder (203) coupled to the camera (201). The source encoder (203) may comprise hardware (e.g., a processor or circuitry and memory), software, or a combination of both to enable or implement aspects of the disclosed subject matter as described in more detail below. The encoded video bitstream (204) is depicted as a thin line to emphasize its lower data volume compared to the video sample stream (202), and may be stored on the streaming server (205) for future use. One or more streaming clients (206, 208) may access the streaming server (205) to retrieve a copy (207) and a copy (209) of the encoded video bitstream (204). The client (206) may include a video decoder (210). The video decoder (210) decodes the incoming copy (207) of the encoded video bitstream and generates an output video sample stream (211) that can be presented on a display (212) or another presentation device.

Fig. 3 is a functional block diagram of a decoder (210), e.g., a video decoder, according to an embodiment of the present disclosure. As shown in Fig. 3, the receiver (310) may receive one or more encoded video sequences to be decoded by the video decoder (210); in the same or another embodiment, the encoded video sequences are received one at a time, wherein each encoded video sequence is decoded independently of the other encoded video sequences. The encoded video sequence may be received from a channel (312), which may be a hardware/software link to a storage device that stores encoded video data. The receiver (310) may receive encoded video data together with other data, e.g., encoded audio data and/or auxiliary data streams, which may be forwarded to their respective usage entities (not shown). The receiver (310) may separate the encoded video sequence from the other data. To combat network jitter, a buffer memory (315) may be coupled between the receiver (310) and the entropy decoder/parser (320) (hereinafter "parser"). When the receiver (310) receives data from a store/forward device with sufficient bandwidth and controllability, or from an isochronous network, the buffer memory (315) may not be needed or may be small. For use over best-effort packet networks such as the internet, the buffer memory (315) may be required, may be relatively large, and may be of an adaptive size.

The video decoder (210) may include a parser (320) to reconstruct symbols (321) from the entropy-encoded video sequence. The categories of these symbols include information for managing the operation of the video decoder (210), and potentially information to control a display device, such as the display (212), which is not an integral part of the decoder but may be coupled to the decoder, as shown in Fig. 2 and 3. The control information for the display device may be in the form of Supplemental Enhancement Information (SEI) messages or Video Usability Information (VUI) parameter set fragments (not shown). The parser (320) may parse/entropy-decode the received encoded video sequence. The encoding of the encoded video sequence may be performed in accordance with a video coding technique or standard and may follow principles known to those skilled in the art, including variable length coding, Huffman coding, arithmetic coding with or without context sensitivity, and the like. The parser (320) may extract from the encoded video sequence a subgroup parameter set for at least one of the subgroups of pixels in the video decoder, based on at least one parameter corresponding to the group. A subgroup may include a Group of Pictures (GOP), a picture, a tile, a slice, a macroblock, a Coding Unit (CU), a block, a Transform Unit (TU), a Prediction Unit (PU), and so on. The entropy decoder/parser (320) may also extract information from the encoded video sequence, such as transform coefficients, quantizer parameter values, motion vectors, and so forth.

The parser (320) may perform entropy decoding/parsing operations on the video sequence received from the buffer memory (315), thereby creating symbols (321). The reconstruction of the symbol (321) may involve a number of different units depending on the type of the encoded video picture or a portion of the encoded video picture (e.g., inter and intra pictures, inter and intra blocks), among other factors. Which units are involved and the way in which they are involved can be controlled by subgroup control information parsed from the coded video sequence by the parser (320). For the sake of brevity, such a subgroup control information flow between parser (320) and the following units is not described.

In addition to the functional blocks already mentioned, the video decoder (210) may be conceptually subdivided into several functional units as described below. In a practical embodiment operating under business constraints, many of these units interact closely with each other and may be integrated with each other. However, for the purposes of describing the disclosed subject matter, a conceptual subdivision into the following functional units is appropriate.

The first unit may be a scaler/inverse transform unit (351). The scaler/inverse transform unit (351) may receive the quantized transform coefficients as symbols (321) from the parser (320) along with control information including which transform mode to use, block size, quantization factor, quantization scaling matrix, etc. The scaler/inverse transform unit (351) may output a block comprising sample values, which may be input into the aggregator (355).

In some cases, the output samples of the scaler/inverse transform unit (351) may belong to an intra-coded block, that is, a block that does not use predictive information from previously reconstructed pictures but may use predictive information from previously reconstructed portions of the current picture. Such predictive information may be provided by an intra picture prediction unit (352). In some cases, the intra picture prediction unit (352) uses surrounding, already reconstructed information extracted from the (partially reconstructed) current picture (358) to generate a block of the same size and shape as the block being reconstructed. In some cases, the aggregator (355) adds, on a per-sample basis, the prediction information generated by the intra picture prediction unit (352) to the output sample information provided by the scaler/inverse transform unit (351).

In other cases, the output samples of the scaler/inverse transform unit (351) may belong to an inter-coded and potentially motion-compensated block. In this case, the motion compensated prediction unit (353) may access the reference picture memory (357) to fetch samples for prediction. After motion compensating the fetched samples according to the symbols (321), the samples may be added by the aggregator (355) to the output of the scaler/inverse transform unit (351), in this case referred to as residual samples or the residual signal, thereby generating output sample information. The addresses within the reference picture memory (357) from which the motion compensated prediction unit (353) fetches prediction samples may be controlled by motion vectors, which are available to the motion compensated prediction unit (353) in the form of symbols (321) that may comprise, for example, X, Y, and reference picture components. Motion compensation may also include interpolation of sample values fetched from the reference picture memory (357) when sub-sample-accurate motion vectors are used, motion vector prediction mechanisms, and so on.
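The interaction between the motion compensated prediction unit and the aggregator can be illustrated with a minimal sketch. It assumes integer motion vectors (no sub-sample interpolation), clamps coordinates at the picture border, and assumes 8-bit samples; the function names and the sample data are hypothetical.

```python
def fetch_prediction(ref_picture, x, y, mv_x, mv_y, width, height):
    """Fetch a width x height prediction block from a reference picture.

    (x, y) is the top-left position of the block being reconstructed and
    (mv_x, mv_y) an integer motion vector. Coordinates outside the picture
    are clamped to the nearest border sample.
    """
    rows, cols = len(ref_picture), len(ref_picture[0])
    block = []
    for j in range(height):
        ry = min(rows - 1, max(0, y + mv_y + j))
        row = []
        for i in range(width):
            rx = min(cols - 1, max(0, x + mv_x + i))
            row.append(ref_picture[ry][rx])
        block.append(row)
    return block


def aggregate(prediction, residual, max_val=255):
    """Add residual samples to prediction samples, clipping to [0, max_val]."""
    return [
        [min(max_val, max(0, p + r)) for p, r in zip(p_row, r_row)]
        for p_row, r_row in zip(prediction, residual)
    ]


# A 4x4 reference picture whose sample at row r, column c equals 10*r + c.
reference = [[10 * r + c for c in range(4)] for r in range(4)]

pred = fetch_prediction(reference, x=0, y=0, mv_x=1, mv_y=2, width=2, height=2)
recon = aggregate(pred, [[1, -1], [0, 2]])

print(pred)   # [[21, 22], [31, 32]]
print(recon)  # [[22, 21], [31, 34]]
```

With sub-sample-accurate motion vectors, the fetch step would be followed by an interpolation filter before the residual is added; that stage is omitted here.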

The output samples of the aggregator (355) may be subjected to various loop filtering techniques in a loop filter unit (356). The video compression techniques may include in-loop filter techniques that are controlled by parameters included in the encoded video bitstream and made available to the loop filter unit (356) as symbols (321) from the parser (320). However, in other embodiments, the video compression techniques may also be responsive to meta-information obtained during decoding of previous (in decoding order) portions of the encoded picture or encoded video sequence, as well as to previously reconstructed and loop-filtered sample values.

The output of the loop filter unit (356) may be a sample stream that may be output to a display device (212) and stored in a reference picture memory (357) for subsequent inter picture prediction.

Once fully reconstructed, some of the coded pictures may be used as reference pictures for future prediction. Once the encoded picture is fully reconstructed and the encoded picture is identified as a reference picture (by, for example, parser (320)), the current picture (358) may become part of the reference picture memory (357) and a new current picture memory may be reallocated before reconstruction of a subsequent encoded picture begins.

The video decoder (210) may perform decoding operations according to a predetermined video compression technique, such as that documented in the ITU-T H.265 standard. The encoded video sequence may conform to the syntax specified by the video compression technique or standard in use, in the sense that it adheres to the syntax of that technique or standard as specified in the corresponding technical document or standard, and in particular in its profiles. For compliance, the complexity of the encoded video sequence must also be within the limits defined by the level of the video compression technique or standard. In some cases, levels restrict the maximum picture size, the maximum frame rate, the maximum reconstruction sample rate (measured in, e.g., megasamples per second), and/or the maximum reference picture size. In some cases, the limits set by levels may be further restricted through the Hypothetical Reference Decoder (HRD) specification and through metadata for HRD buffer management signaled in the encoded video sequence.

In an embodiment, the receiver (310) may receive additional (redundant) data along with the encoded video. The additional data may be part of an encoded video sequence. The additional data may be used by the video decoder (210) to properly decode the data and/or more accurately reconstruct the original video data. The additional data may be in the form of, for example, a temporal, spatial, or signal-to-noise ratio (SNR) enhancement layer, a redundant slice, a redundant picture, a forward error correction code, and so forth.

Fig. 4 is a functional block diagram of an encoding system (400) including a source encoder (203), where the source encoder (203) may be a video encoder (203), according to an embodiment disclosed herein.

The video encoder (203) may receive video samples from a video source (201) (not part of the decoder) that may capture video images to be encoded by the video encoder (203).

The video source (201) may provide a source video sequence in the form of a stream of digital video samples to be encoded by the video encoder (203), which may have any suitable bit depth (e.g., 8-bit, 10-bit, 12-bit, ...), any color space (e.g., BT.601 Y CrCb, RGB, ...), and any suitable sampling structure (e.g., Y CrCb 4:2:0, Y CrCb 4:4:4). In a media service system, which may include a memory and a processor, the video source (201) may be a storage device that stores previously prepared video. In a video conferencing system, the video source (201) may include a camera that captures local image information as a video sequence. Video data may be provided as a plurality of individual pictures that impart motion when viewed in sequence. The pictures themselves may be organized as spatial arrays of pixels, where each pixel may comprise one or more samples, depending on the sampling structure, color space, etc. in use. The relationship between pixels and samples can be readily understood by those skilled in the art. The following text focuses on samples.

According to an embodiment, the video encoder (203) may encode and compress pictures of a source video sequence into an encoded video sequence in real time or under any other time constraint required by an application. Enforcing the appropriate encoding speed is one function of the controller (450). The controller (450) controls, and is functionally coupled to, other functional units as described below. For simplicity, the couplings are not labeled in the figures. The parameters set by the controller (450) may include rate-control-related parameters (e.g., picture skip, quantizer, lambda value of rate-distortion optimization techniques, etc.), picture size, group of pictures (GOP) layout, maximum motion vector search range, etc. Other functions of the controller (450) may be readily identified by those skilled in the art, as they relate to the video encoder (203) being optimized for a certain system design.

Some video encoders operate in "coding loops" that are readily recognized by those skilled in the art. As a simplified description, the coding loop may comprise the encoding part of a source encoder (430) (hereinafter "source encoder"), which is responsible for creating symbols based on the input picture to be encoded and on reference pictures, and a (local) decoder (433) embedded in the video encoder (203). The "local" decoder (433) reconstructs the symbols to create sample data in the same manner as a (remote) decoder (210) would create the sample data (since any compression between the symbols and the encoded video bitstream is lossless in the video compression techniques considered in this application). The reconstructed sample stream is input to a reference picture memory (434). Since the decoding of the symbol stream produces bit-exact results independent of decoder location (local or remote), the content of the reference picture memory is also bit-exact between the encoder and the remote decoder. In other words, the reference picture samples that the prediction part of the encoder "sees" are identical to the sample values that a decoder would "see" when using prediction during decoding. This principle of reference picture synchronization (and the drift that occurs if synchronization cannot be maintained, e.g., due to channel errors) is well known to those skilled in the art.
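The reference synchronization described above can be demonstrated with a toy one-dimensional coding loop. The sketch below is an assumption-laden simplification: it predicts each sample from the previously reconstructed sample, uses a uniform quantizer as the lossy step, and omits entropy coding (which is lossless and therefore does not affect the result). All names are hypothetical.

```python
def quantize(residual, step):
    """Lossy step: map a residual to an integer level (round half away from zero)."""
    sign = -1 if residual < 0 else 1
    return sign * ((abs(residual) + step // 2) // step)


def reconstruct(prediction, level, step):
    """Decoder-side reconstruction, shared by the local and the remote decoder."""
    return prediction + level * step


def encode(samples, step):
    """Return the symbols and the reconstructions seen by the local decoder."""
    symbols, local_recon = [], []
    reference = 0                       # content of the reference memory
    for s in samples:
        level = quantize(s - reference, step)   # predict from the reconstruction
        symbols.append(level)
        reference = reconstruct(reference, level, step)  # embedded local decoder
        local_recon.append(reference)
    return symbols, local_recon


def decode(symbols, step):
    """Remote decoder: rebuilds samples from the symbols alone."""
    reference, output = 0, []
    for level in symbols:
        reference = reconstruct(reference, level, step)
        output.append(reference)
    return output


symbols, local_recon = encode([10, 13, 19, 18], step=4)
remote_recon = decode(symbols, step=4)

print(symbols)                      # [3, 0, 2, -1]
print(local_recon)                  # [12, 12, 20, 16]
print(local_recon == remote_recon)  # True
```

If the encoder predicted from the original samples instead of the reconstructed ones, the two sides would use different references and the quantization error would accumulate as drift.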

The operation of the "local" decoder (433) may be the same as that of a "remote" decoder, such as the video decoder (210) described in detail above in connection with fig. 3. However, referring briefly also to fig. 3, because symbols are available and the entropy encoder (445) and parser (320) can losslessly encode the symbols into, and decode them from, an encoded video sequence, the entropy decoding portion of the video decoder (210), including the channel (312), receiver (310), buffer memory (315) and parser (320), need not be fully implemented in the local decoder (433).

At this point it can be observed that any decoder technique other than the parsing/entropy decoding present in the decoder must also be present in the corresponding encoder in substantially the same functional form. The description of the encoder techniques is reciprocal to the described decoder techniques. A more detailed description is only needed in certain areas and is provided below.

As part of the operation, the source encoder (203) may perform motion compensated predictive coding. The motion compensated predictive coding predictively codes an input frame with reference to one or more previously coded frames from the video sequence that are designated as "reference frames". In this way, the encoding engine (432) encodes the difference between a block of pixels of an input frame and a block of pixels of a reference frame, which may be selected as a prediction reference for the input frame.

The local video decoder (433) may decode encoded video data of a frame that may be designated as a reference frame based on symbols created by the source encoder (203). The operation of the encoding engine (432) may be a lossy process. When the encoded video data can be decoded at a video decoder (not shown in fig. 4), the reconstructed video sequence may typically be a copy of the source video sequence with some errors. The local video decoder (433) replicates a decoding process that may be performed on reference frames by the video decoder, and may cause reconstructed reference frames to be stored in a reference picture memory (434). In this way, the source encoder (203) may locally store a copy of the reconstructed reference frame that has common content (no transmission errors) with the reconstructed reference frame to be obtained by the remote video decoder.

The predictor (435) may perform a prediction search for the encoding engine (432). That is, for a new picture to be encoded, the predictor (435) may search the reference picture memory (434) for sample data (as candidate reference pixel blocks) or certain metadata, such as reference picture motion vectors, block shapes, etc., that may serve as an appropriate prediction reference for the new picture. The predictor (435) may operate on a block-by-block basis of samples to find a suitable prediction reference. In some cases, as determined from the search results obtained by the predictor (435), the input picture may have prediction references taken from multiple reference pictures stored in the reference picture memory (434).

The controller (450) may include a processor and may manage encoding operations of the source encoder (203), including, for example, setting parameters and subgroup parameters for encoding video data.

The outputs of all of the above functional units may be entropy encoded in an entropy encoder (445). The entropy encoder (445) may transform the symbols generated by the various functional units into an encoded video sequence by lossless compression of the symbols according to techniques known to those skilled in the art, such as Huffman coding, variable length coding, arithmetic coding, and the like.

The transmitter (440) may buffer the encoded video sequence created by the entropy encoder (445) in preparation for transmission over a communication channel (460), which may be a hardware/software link to a storage device that will store the encoded video data. The transmitter (440) may combine the encoded video data from the source encoder (203) with other data to be transmitted, such as encoded audio data and/or an auxiliary data stream (sources not shown).

The controller (450) may manage the operation of the video encoder (203). During encoding, the controller (450) may assign a certain encoded picture type to each encoded picture, which may affect the encoding techniques applicable to the respective picture. For example, a picture may generally be assigned one of the following picture types: intra picture (I picture), predictive picture (P picture), or bi-predictive picture (B picture).

An intra picture (I picture) may be a picture that can be encoded and decoded without using any other frame in the sequence as a prediction source. Some video codecs allow different types of intra pictures, including, for example, Instantaneous Decoding Refresh ("IDR") pictures. Those skilled in the art are aware of the variants of I pictures and their corresponding applications and features.

A predictive picture (P picture) may be a picture that can be encoded and decoded using intra prediction or inter prediction that uses at most one motion vector and reference index to predict the sample values of each block.

A bi-predictive picture (B picture) may be a picture that can be encoded and decoded using intra prediction or inter prediction that uses at most two motion vectors and reference indices to predict the sample values of each block. Similarly, multiple-predictive pictures may use more than two reference pictures and associated metadata for the reconstruction of a single block.

A source picture may typically be spatially subdivided into blocks of samples (e.g., blocks of 4 x 4, 8 x 8, 4 x 8, or 16 x 16 samples) and encoded block-wise. These blocks may be predictively encoded with reference to other (encoded) blocks that are determined according to the encoding allocation applied to their respective pictures. For example, a block of an I picture may be non-predictive encoded, or the block may be predictive encoded (spatial prediction or intra prediction) with reference to an already encoded block of the same picture. The pixel block of the P picture can be prediction-coded by spatial prediction or by temporal prediction with reference to one previously coded reference picture. A block of a B picture may be prediction coded by spatial prediction or by temporal prediction with reference to one or two previously coded reference pictures.

The source encoder (203) may perform encoding operations according to a predetermined video encoding technique or standard, such as ITU-T Recommendation H.265 or VVC. In operation, the source encoder (203) may perform various compression operations, including predictive coding operations that exploit temporal and spatial redundancies in the input video sequence. Thus, the encoded video data may conform to the syntax specified by the video coding technique or standard used.

In an embodiment, the transmitter (440) may transmit the additional data and the encoded video. Such data may be part of an encoded video sequence. The additional data may include temporal/spatial/SNR enhancement layers, redundant pictures and slices, among other forms of redundant data, SEI messages, VUI parameter set segments, and the like.

As discussed above, encoding of the encoded video sequence may be in accordance with video encoding techniques or standards, and may follow principles well known to those skilled in the art, including variable length coding, Huffman coding, arithmetic coding with or without context sensitivity, and so forth. In some streaming media systems, the video bit stream (204, 207, 209) may be encoded according to some video encoding/compression standard. Examples of these standards include ITU-T Recommendation H.265 (HEVC).

Inter prediction in VVC

For each inter-predicted CU, motion parameters (including motion vectors, reference picture indices, and reference picture list usage indices), together with the additional information required by the new coding features of VVC, are used to generate the inter-prediction samples. The motion parameters may be signaled explicitly or implicitly. When a CU is encoded using skip mode, the CU may be associated with one PU and may have no significant residual coefficients, no coded motion vector delta, and no reference picture index. A merge mode may be specified in which the motion parameters of the current CU are obtained from neighboring CUs, including spatial candidates, temporal candidates, and the additional schedules introduced in VVC. The merge mode can be applied to any inter-predicted CU, not only to CUs coded in skip mode. The alternative to merge mode is explicit transmission of motion parameters, in which the motion vectors, the corresponding reference picture index for each reference picture list, the reference picture list usage flags, and other required information are signaled explicitly for each CU.

In addition to the inter-coding features in HEVC, VTM3 includes many new and improved inter-prediction coding tools, as follows:

- Extended merge prediction

- Merge mode with motion vector difference (MMVD)

- Affine motion compensated prediction

- Sub-block based temporal motion vector prediction (SbTMVP)

- Triangle partition prediction

- Combined inter and intra prediction (CIIP)

Details regarding affine inter prediction and related methods are provided below.

Affine motion compensated prediction

In HEVC, only the translational motion model is applied to Motion Compensated Prediction (MCP). However, in the real world, there are many kinds of motions (e.g., zoom in/out, rotation, perspective motion, and other irregular motions). In current VTMs, block-based affine transform motion compensated prediction may be applied. As shown in fig. 5A and 5B, the affine motion field of the block can be described by the motion information of two control point motion vectors (4 parameters in fig. 5A) or three control point motion vectors (6 parameters in fig. 5B).

Fig. 5A shows an affine model based on 4-parameter control points, and fig. 5B shows an affine model based on 6-parameter control points. For a 4-parameter affine motion model, the motion vector at sample position (x, y) in the block can be derived based on equation (1) shown in fig. 5C.

For a 6-parameter affine motion model, the motion vector at sample position (x, y) in the block may be derived based on equation (2) in fig. 5D.

For equations (1) and (2), (mv_0x, mv_0y) may be the motion vector of the upper-left control point, (mv_1x, mv_1y) may be the motion vector of the upper-right control point, and (mv_2x, mv_2y) may be the motion vector of the lower-left control point.

According to an embodiment, to simplify motion compensated prediction, block based affine transform prediction may be applied.

Fig. 6 shows affine MVFs (MV fields) for each sub-block. According to an embodiment, in order to derive the motion vector of each 4 × 4 luminance sub-block, the motion vector of the center sample of each sub-block may be calculated according to at least one of equation (1) and equation (2) above, as shown in fig. 6, and rounded (e.g., rounded to 1/16 fractional precision). Subsequently, a motion compensated interpolation filter may be applied to generate a prediction for each sub-block using the derived motion vectors. According to an embodiment, the sub-block size of the chrominance component may also be set to 4 × 4. According to an embodiment, the MVs of the 4 × 4 chroma sub-blocks may be calculated as an average of the MVs of four corresponding 4 × 4 luma sub-blocks.
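The sub-block derivation described above can be sketched as follows. Equations (1) and (2) appear only in figs. 5C and 5D, which are not reproduced in this text, so the sketch assumes the commonly used 4-parameter and 6-parameter affine forms. The function names, the convention that control point MVs are integers in 1/16-sample units, and the use of Python's built-in rounding are assumptions made for illustration only.

```python
from fractions import Fraction

def affine_mv(x, y, w, h, cpmv):
    """MV at sample position (x, y) of a w-by-h block.

    cpmv holds (mv_0x, mv_0y) and (mv_1x, mv_1y) for the assumed
    4-parameter model, plus (mv_2x, mv_2y) for the 6-parameter model.
    """
    (mv0x, mv0y), (mv1x, mv1y) = cpmv[0], cpmv[1]
    dxx = Fraction(mv1x - mv0x, w)  # change of mv_x per sample along x
    dyx = Fraction(mv1y - mv0y, w)  # change of mv_y per sample along x
    if len(cpmv) == 3:
        mv2x, mv2y = cpmv[2]
        dxy = Fraction(mv2x - mv0x, h)  # change of mv_x per sample along y
        dyy = Fraction(mv2y - mv0y, h)  # change of mv_y per sample along y
    else:
        # 4-parameter model: the vertical gradients follow from the
        # horizontal ones (rotation and zoom only).
        dxy, dyy = -dyx, dxx
    return mv0x + dxx * x + dxy * y, mv0y + dyx * x + dyy * y

def subblock_mvs(w, h, cpmv, sb=4):
    """MV at the center of every sb-by-sb luma sub-block.

    With inputs in 1/16-sample units, rounding to an integer keeps
    1/16 precision. round() rounds halves to even; a real codec
    defines its own rounding rule.
    """
    half = Fraction(sb, 2)
    field = {}
    for sy in range(0, h, sb):
        for sx in range(0, w, sb):
            mvx, mvy = affine_mv(sx + half, sy + half, w, h, cpmv)
            field[(sx, sy)] = (round(mvx), round(mvy))
    return field
```

For example, subblock_mvs(16, 16, [(0, 0), (16, 0)]) describes a zoom in which the MV of each sub-block equals the coordinates of its center.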

As with translational motion inter-prediction, there are two affine motion inter-prediction modes: affine merge (AF_MERGE) mode and affine AMVP mode.

Affine merge prediction mode

The AF_MERGE mode may be applied to CUs whose width and height are both greater than or equal to 8. In AF_MERGE mode, the control point motion vectors (CPMVs) of the current CU may be generated based on motion information of spatially neighboring CUs. According to an embodiment, there may be at most five CPMV predictor (CPMVP) candidates, and an index may be signaled to indicate the CPMVP candidate to be used for the current CU. The following three types of CPMV candidates may be used to form the affine merge candidate list:

1) inherited affine merge candidates inferred from the CPMVs of neighboring CUs;

2) constructed affine merge candidates, i.e., CPMVPs derived using the translational MVs of neighboring CUs; and

3) zero MVs.

Fig. 7 illustrates locations of spatial merge candidate blocks according to an embodiment.

In VTM3, according to an embodiment, there are at most two inherited affine candidates derived from the affine motion models of neighboring blocks, one derived from the left neighboring CUs and the other derived from the above neighboring CUs. For the left predictor, the scan order may be A0 -> A1, and for the above predictor, the scan order may be B0 -> B1 -> B2. According to an embodiment, only the first inherited candidate may be selected from each side. No pruning check is performed between the two inherited candidates. When a neighboring affine CU is identified, the control point motion vectors of the neighboring affine CU can be used to derive the CPMVP candidate in the affine merge list of the current CU.

Fig. 8 illustrates control point motion vector inheritance. As shown in fig. 8, according to an embodiment, if the neighboring lower-left block A is coded in affine mode, the motion vectors v₂, v₃ and v₄ of the upper-left corner, the upper-right corner and the lower-left corner of the CU containing block A can be obtained, respectively. When block A is encoded using a 4-parameter affine model, the two CPMVs of the current CU can be calculated based on v₂ and v₃. When block A is encoded using a 6-parameter affine model, the three CPMVs of the current CU can be calculated based on v₂, v₃ and v₄.

Fig. 9 shows the locations of the candidate positions for the constructed affine merge mode. According to an embodiment, a constructed affine candidate is a candidate constructed by combining the neighboring translational motion information of each control point. The motion information of the control points can be derived from the specified spatial neighboring blocks and temporal neighboring block shown in fig. 9.

As shown in fig. 9, CPMV_k (k = 1, 2, 3, 4) may denote the k-th control point. For CPMV₁, the blocks B2 -> B3 -> A2 are checked, and the MV of the first available block can be used. For CPMV₂, the blocks B1 -> B0 are checked, and for CPMV₃, the blocks A1 -> A0 are checked. According to an embodiment, if TMVP is available, it can be used as CPMV₄.

According to an embodiment, after obtaining the MVs of the four control points, affine merge candidates may be constructed based on the motion information. The following combinations of control point MVs can be used, in order, to construct the candidates: {CPMV₁, CPMV₂, CPMV₃}, {CPMV₁, CPMV₂, CPMV₄}, {CPMV₁, CPMV₃, CPMV₄}, {CPMV₂, CPMV₃, CPMV₄}, {CPMV₁, CPMV₂}, {CPMV₁, CPMV₃}.

A combination of 3 CPMVs may be used to construct a 6-parameter affine merge candidate, and a combination of 2 CPMVs may be used to construct a 4-parameter affine merge candidate. According to an embodiment, to avoid the motion scaling process, a combination of control point MVs can be discarded if the reference indices of its control points differ.

According to an embodiment, after checking the inherited affine merge candidate and constructing the affine merge candidate, if the list is still not full, zero MVs may be inserted at the end of the list.
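The construction steps above can be sketched as follows. This is a minimal illustration rather than a codec implementation: the data layout (each neighbor mapped to an (mv, ref_idx) pair or None), the function names, and the reference index 0 used for the zero-MV padding are all assumptions introduced for the example.

```python
# Checking order per control point, as described for fig. 9.
CHECK_ORDER = {1: ["B2", "B3", "A2"], 2: ["B1", "B0"], 3: ["A1", "A0"]}
# Control point combinations, tried in this order.
COMBINATIONS = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4), (1, 2), (1, 3)]
MAX_CANDIDATES = 5

def first_available(neighbors, names):
    """Return the (mv, ref_idx) of the first available block, else None."""
    for name in names:
        if neighbors.get(name) is not None:
            return neighbors[name]
    return None

def build_affine_merge_list(neighbors, tmvp=None, inherited=()):
    """Inherited candidates first, then constructed ones, then zero MVs."""
    cp = {k: first_available(neighbors, order)
          for k, order in CHECK_ORDER.items()}
    cp[4] = tmvp  # CPMV4 comes from TMVP when available
    merge_list = list(inherited)[:MAX_CANDIDATES]
    for combo in COMBINATIONS:
        if len(merge_list) >= MAX_CANDIDATES:
            break
        points = [cp[k] for k in combo]
        if any(p is None for p in points):
            continue  # a required control point is unavailable
        if len({ref for _, ref in points}) != 1:
            continue  # differing reference indices: discard, no scaling
        merge_list.append({
            "model": "6-param" if len(combo) == 3 else "4-param",
            "cpmvs": [mv for mv, _ in points],
            "ref_idx": points[0][1],
        })
    while len(merge_list) < MAX_CANDIDATES:
        # Zero-MV padding; ref_idx 0 is an assumption of this sketch.
        merge_list.append({"model": "zero", "cpmvs": [(0, 0)], "ref_idx": 0})
    return merge_list
```

In this sketch a combination is skipped outright when any of its control points is unavailable, which is one plausible reading of the text.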

Affine AMVP prediction

Affine AMVP mode may be used for CUs whose width and height are both greater than or equal to 16. According to an embodiment, an affine flag at the CU level may be signaled in the bitstream to indicate whether affine AMVP mode is used, and then another flag may be signaled to indicate whether 4-parameter affine or 6-parameter affine is used. In affine AMVP mode, the difference between the CPMVs of the current CU and their predictors (CPMVPs) can be signaled in the bitstream. The affine AMVP candidate list may have a size of 2 and may be generated by using the following four types of CPMV candidates in order:

1) inherited affine AMVP candidates inferred from the CPMVs of neighboring CUs;

2) constructed affine AMVP candidates, i.e., CPMVPs derived using the translational MVs of neighboring CUs;

3) translational MVs from neighboring CUs; and

4) zero MVs.

According to an embodiment, after checking the inherited affine merge candidate and constructing the affine merge candidate, if the list is still not full, zero MVs are inserted at the end of the list.

The checking order of the inherited affine AMVP candidate may be the same as or similar to the checking order of the inherited affine merge candidate. According to an embodiment, the only difference may be that for the AMVP candidate only affine CUs having the same reference picture as the reference picture in the current block are considered. According to an embodiment, no pruning process is applied when the inherited affine motion predictor is inserted into the candidate list.

According to an embodiment, constructed AMVP candidates may be derived from the specified spatial neighboring blocks shown in fig. 9. The same checking order as in the affine merge candidate construction may be used. In addition, the reference picture indices of the neighboring blocks may also be checked. The first block in the checking order that is inter-coded and has the same reference picture as the current CU may be used. According to an embodiment, there may be only one block. When the current CU is encoded using the 4-parameter affine mode and both mv₀ and mv₁ are available, they may be added as one candidate to the affine AMVP candidate list. When the current CU is encoded using the 6-parameter affine mode and all three CPMVs are available, the three CPMVs may be added as one candidate to the affine AMVP candidate list. Otherwise, the constructed AMVP candidate may be set as unavailable.

If the affine AMVP candidate list still contains fewer than 2 candidates after checking the inherited affine AMVP candidates and the constructed affine AMVP candidate, then mv₀, mv₁ and mv₂ may be added, in order, as translational MVs to predict all control point MVs of the current CU (if available). Finally, if the affine AMVP candidate list is still not full, it may be populated with zero MVs.
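The four-stage filling order of the affine AMVP list can be sketched as follows. The function name and the representation of a candidate as a list of one MV per control point are assumptions for illustration; the availability and reference picture checks described above are assumed to have been applied before the candidates are passed in.

```python
def build_affine_amvp_list(inherited, constructed, translational,
                           num_cp, list_size=2):
    """Fill the affine AMVP list in the order described in the text.

    inherited:     list of candidates, each a list of num_cp MVs
    constructed:   one candidate (list of num_cp MVs) or None
    translational: [mv0, mv1, mv2], with None for unavailable MVs
    num_cp:        2 for 4-parameter affine, 3 for 6-parameter affine
    """
    candidates = []
    pool = list(inherited)
    if constructed is not None:
        pool.append(constructed)
    for cand in pool:
        if len(candidates) < list_size:
            candidates.append(list(cand))
    for mv in translational:
        if len(candidates) >= list_size:
            break
        if mv is not None:
            # One translational MV predicts all control point MVs.
            candidates.append([mv] * num_cp)
    while len(candidates) < list_size:
        candidates.append([(0, 0)] * num_cp)  # zero-MV padding
    return candidates
```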

Local illumination compensation (LIC)

The Local Illumination Compensation (LIC) method is based on a linear model of illumination changes and uses a scaling factor a and an offset b. LIC may be adaptively enabled or disabled for each Coding Unit (CU) encoded in inter mode.

Fig. 10 illustrates neighboring samples that may be used to derive LIC parameters, according to an embodiment.

As shown in fig. 10, when LIC is applied to a CU, a least squares error method may be employed to derive the parameters a and b by using the neighboring samples of the current CU and their corresponding reference samples. More specifically, as shown in fig. 10, the sub-sampled (2:1 sub-sampling) neighboring samples of the CU and the corresponding reference samples in the reference picture (identified by the motion information of the current CU or sub-CU) are used. The LIC parameters can be derived and applied separately for each prediction direction.
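The least squares derivation can be sketched as an ordinary linear fit of the current block's neighboring samples against the corresponding reference samples. The function names, the flat-list sample layout, the fallback used when the fit is degenerate, and the 8-bit clipping range are assumptions of this sketch and are not taken from the text.

```python
def derive_lic_params(cur_neighbors, ref_neighbors, step=2):
    """Fit cur ~ a * ref + b by least squares on 2:1 sub-sampled samples."""
    xs = ref_neighbors[::step]  # reference samples (model input)
    ys = cur_neighbors[::step]  # neighbors of the current CU (model target)
    n = len(xs)
    if n == 0:
        return 1.0, 0.0  # assumed fallback: identity model
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx
    if denom == 0:
        # All reference samples equal: assumed fallback, offset only.
        return 1.0, (sy - sx) / n
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a, b

def apply_lic(pred, a, b, bit_depth=8):
    """Apply the linear model to prediction samples and clip to range."""
    hi = (1 << bit_depth) - 1
    return [min(hi, max(0, round(a * p + b))) for p in pred]
```

A practical codec would typically use integer arithmetic with shifts instead of floating point; floating point is used here only to keep the formula readable.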

When a CU is encoded in merge mode, the LIC flag may be copied from neighboring blocks in a similar manner to the motion information copied in merge mode; otherwise, a LIC flag may be signaled for the CU to indicate whether LIC is applied.

When LIC is enabled for a picture, an additional rate-distortion (RD) check at the CU level may be needed to determine whether to apply LIC to a CU. When LIC is enabled for a CU, in order to compensate for the illumination difference, the mean-removed sum of absolute differences (MR-SAD) and the mean-removed sum of absolute Hadamard-transformed differences (MR-SATD) may be used instead of the sum of absolute differences (SAD) and the sum of absolute Hadamard-transformed differences (SATD) for integer-pixel motion search and fractional-pixel motion search, respectively. In MR-SAD and MR-SATD, removing the mean means subtracting the pixel mean of the current block from the current block and subtracting the pixel mean of the reference block from the reference block.
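The difference between SAD and MR-SAD can be illustrated with a small sketch. Blocks are represented as flat lists of sample values, and the function names are chosen for this example only.

```python
def sad(cur, ref):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(c - r) for c, r in zip(cur, ref))

def mr_sad(cur, ref):
    """Mean-removed SAD: subtract each block's own mean before comparing."""
    mean_cur = sum(cur) / len(cur)
    mean_ref = sum(ref) / len(ref)
    return sum(abs((c - mean_cur) - (r - mean_ref))
               for c, r in zip(cur, ref))
```

For a reference block that differs from the current block only by a constant brightness offset, for example [10, 20, 30, 40] against [30, 40, 50, 60], the SAD is 80 while the MR-SAD is 0, which is why the mean-removed cost is better suited to a motion search when LIC will later compensate the offset.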

Unidirectional illumination compensation

An improved LIC method may include unidirectional illumination compensation. In unidirectional illumination compensation, the derivation of the linear model parameters may remain unchanged and the LIC may be applied on a CU basis. According to an embodiment, the proposed LIC is not applied to sub-block based inter prediction (such as ATMVP or affine), triangle partition, multi-hypothesis intra inter, or bi-directional prediction.

According to an embodiment, the proposed LIC is not applied to bi-directional prediction because the reconstructed neighboring samples of the current block are not needed to perform inter prediction in the inter pipeline and thus are not available for each unidirectional inter prediction. They would, however, be required for LIC, since the weighted average of bi-prediction is applied after the unidirectional predictors are derived. Furthermore, applying LIC to bi-directional prediction introduces an additional stage, since the LIC process is performed before the weighting.

Fig. 11 illustrates LIC with bi-prediction according to an embodiment.

Similarly, LIC is not applied to multi-hypothesis intra inter prediction because it is applied after inter prediction and the LIC process delays the weighting between intra and inter prediction.

Fig. 12 illustrates LIC with multi-hypothesis intra inter according to an embodiment. The LIC flag may be included as part of the motion information, in addition to the MV and the reference index. When the merge candidate list is built, the LIC flag may be inherited from the neighboring blocks for the merge candidates; however, according to an embodiment, the LIC flag is not used for motion vector pruning, for simplicity.

According to an embodiment, the LIC flag is not stored in the motion vector buffer of the reference picture, so for TMVP the LIC flag may be set equal to false. According to an embodiment, for bi-directional merge candidates, such as the pair-wise average candidate and the zero motion candidate, the LIC flag may also be set equal to false. When the LIC tool is not applied, the LIC flag may not be signaled.

Applying LIC to affine

The LIC may be extended to apply to affine coded CUs. The derivation of the linear model parameters may remain unchanged and three methods may be used to obtain reference samples for affine coded CUs, see fig. 13A-13C.

Fig. 13A shows a schematic diagram of obtaining the reference samples of an affine coded CU in method one according to an embodiment. In method one, as shown in fig. 13A, the top-left sub-block motion vector (MV) of the affine coded CU may be used to obtain the reference samples for the entire CU.

Fig. 13B shows a schematic diagram of obtaining the reference samples of an affine coded CU in method two according to an embodiment. In method two, as shown in fig. 13B, the center sub-block MV of the affine coded CU may be used to obtain the reference samples for the entire CU.

Fig. 13C shows a schematic diagram of obtaining the reference samples of an affine coded CU in method three according to an embodiment. In method three, as shown in fig. 13C, the reference samples in the top template may be obtained by using each sub-block MV in the top row, and the reference samples in the left template may be obtained by using each sub-block MV in the left column.
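The three methods differ only in which sub-block MV is used to locate the reference samples of the top and left templates, which can be sketched as follows. The function name, the dictionary layout of the sub-block MV field (keyed by the sub-block's top-left position), and the rule used to pick the "center" sub-block when the CU has an even number of sub-blocks per side are assumptions of this sketch.

```python
def template_mvs(field, w, h, method, sb=4):
    """Return (top, left): one MV per top-row / left-column sub-block,
    used to fetch the reference samples of the top and left templates."""
    cols = list(range(0, w, sb))
    rows = list(range(0, h, sb))
    if method == 1:
        mv = field[(0, 0)]  # top-left sub-block MV for the entire CU
        return [mv] * len(cols), [mv] * len(rows)
    if method == 2:
        # Assumed convention: the sub-block containing sample (w/2, h/2).
        center = ((w // 2) // sb * sb, (h // 2) // sb * sb)
        mv = field[center]
        return [mv] * len(cols), [mv] * len(rows)
    if method == 3:
        # Per-sub-block MVs along the top row and the left column.
        return [field[(x, 0)] for x in cols], [field[(0, y)] for y in rows]
    raise ValueError("method must be 1, 2 or 3")
```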

Examples of LIC applications

When the LIC tool is enabled, the following may apply:

5.1.1 In one embodiment, when a current block is encoded using affine inter prediction and signaled via affine AMVP mode, the LIC flag for the current block may be inferred to be a valid value, e.g., true, unless LIC is not available due to other restrictions, e.g., as described in 5.1.5.

5.1.2 In another embodiment, when the current block is encoded using a conventional inter prediction mode with a translational motion vector and signaled via AMVP mode, the LIC flag for the current block can be inferred to be a valid value, e.g., true, unless LIC is not available due to other restrictions, e.g., as described in 5.1.5.

5.1.3 In another embodiment, the inferred LIC flag method is enabled for either affine AMVP mode (as described in 5.1.1) or AMVP mode for conventional inter prediction (as described in 5.1.2), but the inferred LIC flag cannot be enabled for both affine AMVP mode and AMVP mode for conventional inter prediction.

5.1.4 When a current block is encoded using affine merge mode, the LIC flag of the current block may be determined by one or more of the following.

5.1.4.1 In one embodiment, when inherited affine merge is used for a current block, the LIC flag for the current block may be inherited from the LIC flag of the neighboring block that serves as the affine model inheritance source, unless LIC is unavailable due to other restrictions.

5.1.4.2 When the current block uses the constructed affine merge mode, the following can apply:

5.1.4.2.1 In one embodiment, if LIC is available for the current block, the LIC flag for the current block may be inferred to be a valid value, such as 1 (enabled). Otherwise, LIC is not available for the current block, e.g., due to the restrictions in 5.1.5.

5.1.4.2.2 In another embodiment, if the current block uses the constructed affine merge mode, the LIC flag for the current block can be inferred to be a valid value, e.g., 1 (enabled). If LIC is not available for the current block, e.g., due to the restrictions in 5.1.5, LIC for the current block may be disabled.

5.1.4.2.3 In another embodiment, if LIC is available for the current block and the current block uses the constructed affine merge mode, the LIC flag for the current block can be inherited from the LIC flag of the neighboring block used to predict the CPMV of the upper-left corner of the current block.

5.1.5 Restrictions on the application of LIC may be imposed by one or any combination of the following:

5.1.5.1 In one embodiment, when the number of samples in a current block is below some threshold, LIC may be disabled for that block. In one example, the threshold may be set to 64 luma samples.

5.1.5.2 In another embodiment, when the number of samples in a current block is above some threshold, LIC may be disabled for that block. In one example, the threshold may be set to 4096 luma samples.

5.1.5.3 In one embodiment, LIC may be disabled for a current block when the number of samples on either side of the block is below some threshold. In one example, the threshold may be set to 8 luma samples.

5.1.5.4 In another embodiment, LIC may be disabled for a current block when the number of samples on either side of the block is above some threshold. In one example, the threshold may be set to 64 luma samples.

5.1.5.5 In another embodiment, when a current block is encoded in bi-directional prediction mode, LIC may be disabled for that block. Alternatively, when the current block is encoded in a multi-hypothesis mode, LIC may be disabled for the block.

5.1.5.6 In one embodiment, the same restriction or combination of restrictions may apply to blocks encoded in affine inter prediction mode and blocks encoded in conventional translational inter prediction mode.

5.1.5.7 In another embodiment, different restrictions or combinations of restrictions may be applied to blocks encoded in affine inter prediction mode and blocks encoded in conventional translational inter prediction mode.
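The restrictions in 5.1.5 can be sketched as a single availability check. The text allows any one restriction or any combination; this sketch applies all of them together with the example thresholds, which is only one possible configuration. The Block type, its field names, and the function name are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Block:
    width: int
    height: int
    bi_pred: bool = False           # encoded in bi-directional prediction mode
    multi_hypothesis: bool = False  # encoded in a multi-hypothesis mode

def lic_available(block, min_samples=64, max_samples=4096,
                  min_side=8, max_side=64):
    """Return False if any of the example restrictions disables LIC."""
    num_samples = block.width * block.height
    if num_samples < min_samples or num_samples > max_samples:
        return False  # 5.1.5.1 / 5.1.5.2: block size restrictions
    if min(block.width, block.height) < min_side:
        return False  # 5.1.5.3: a side is too short
    if max(block.width, block.height) > max_side:
        return False  # 5.1.5.4: a side is too long
    if block.bi_pred or block.multi_hypothesis:
        return False  # 5.1.5.5: prediction mode restrictions
    return True
```

Passing different threshold arguments for affine and translational blocks would correspond to 5.1.5.7, while passing the same arguments corresponds to 5.1.5.6.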

Fig. 14 shows a flow diagram of a method according to an embodiment of the application. The method can comprise the following steps: in step 501, information about a current block of data of an image is received. The method may further comprise: at step 502, it is determined whether Local Illumination Compensation (LIC) is available for the current data block.

The determining whether LIC is available for the current data block comprises: it is determined whether the current data block is encoded using affine inter prediction.

The method may further include performing step 503 when it is determined that LIC is available for the current data block. At step 503, at least one of the following is performed: the LIC flag of the current data block is inferred to be a valid value corresponding to LIC being enabled, or the LIC flag of the current data block is inherited from the LIC flag of a neighboring block. The valid value may be 1 or true.

Further, the method may comprise: when the LIC flag of the current data block indicates that LIC is enabled, generating a prediction of at least one sub-block using a derived motion vector by applying the LIC to the current data block using the inherited LIC flag.

When it is determined that LIC is not available for the current data block, step 504 is performed. At step 504, LIC is disabled for the current data block.
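The decision made in steps 502 to 504 can be sketched as follows, combining the flow of fig. 14 with the embodiments in 5.1.1 and 5.1.4. The mode strings, the constructed_policy parameter, and the function name are hypothetical labels introduced for this example; the availability input is assumed to come from a separate restriction check.

```python
def decide_lic_flag(mode, lic_available, neighbor_flag=False,
                    constructed_policy="infer"):
    """Return the LIC flag of the current block.

    mode: "affine_amvp", "inherited_affine_merge" or
          "constructed_affine_merge"
    neighbor_flag: LIC flag of the relevant neighboring block (the affine
          model inheritance source, or the block used to predict the
          upper-left CPMV for constructed affine merge)
    constructed_policy: "infer" (as in 5.1.4.2.1) or "inherit"
          (as in 5.1.4.2.3)
    """
    if not lic_available:
        return False  # step 504: LIC is disabled for the block
    if mode == "affine_amvp":
        return True  # 5.1.1: inferred to be a valid value
    if mode == "inherited_affine_merge":
        return bool(neighbor_flag)  # 5.1.4.1: inherited from the source
    if mode == "constructed_affine_merge":
        if constructed_policy == "infer":
            return True
        return bool(neighbor_flag)
    raise ValueError("unknown mode: " + mode)
```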

The determining whether LIC is available for the current data block may include: determining whether the current data block is encoded using affine inter prediction, e.g., whether it is encoded using affine inter prediction or conventional inter prediction; when it is determined that the current data block is encoded using affine inter prediction, determining whether other LIC restrictions apply; and when it is determined that other LIC restrictions do not apply, inferring that LIC is enabled.

The inheriting the LIC flag of the current data block from the LIC flags of the neighboring blocks may include: if the current data block uses the constructed affine merge mode, the LIC flag of the current data block is inherited from the LIC flags of the neighboring blocks.

The method further comprises the following steps: the control point motion vector of the corner of the current data block is predicted from the neighboring blocks.

The control point motion vector for the corner of the predicted current data block may be the control point motion vector for the upper left corner of the predicted current data block.

The applying the LIC to the current data block may include: applying the LIC to the affine data block, based on a linear model of illumination changes and based on the derived motion vector, using a scaling factor and an offset, wherein the scaling factor and the offset are derived for each prediction direction by using at least one neighboring sample of the current data block and at least one corresponding reference sample.

The at least one neighboring sample may include sub-sampled neighboring samples of the current data block, and the at least one corresponding reference sample includes a corresponding reference sample in a reference picture identified by the motion information of the current data block.

The method may further comprise: when the current data block is encoded using merge mode, copying the LIC flag from a neighboring block in a manner similar to the way motion information is copied in merge mode; and when the current data block is not encoded using merge mode, signaling an LIC flag for the current data block to indicate whether LIC is applied.

Applying the LIC may include: for integer-pixel motion search and fractional-pixel motion search, replacing SAD and SATD with the mean-removed sum of absolute differences (MR-SAD) and the mean-removed sum of absolute Hadamard-transformed differences (MR-SATD), respectively.

Applying the LIC may include unidirectional illumination compensation, in which the scaling factor and the offset are derived and the LIC is applied on a CU basis.

Other limitations may include: sub-block based TMVP, triangle partition, multi-hypothesis intra inter, or bi-directional prediction.

Other limitations may include: one or more of the following conditions: (A) when the number of samples in the current data block is below some minimum threshold; (B) when the number of samples in the current data block is above some maximum threshold; (C) when the number of samples on either side of the current data block is below a certain side minimum threshold; and (D) when the number of samples on either side of the current data block is above a certain side maximum threshold.

The certain minimum threshold, the certain maximum threshold, the certain side minimum threshold, and the certain side maximum threshold may vary depending on whether the current data block is an affine inter-prediction data block.

According to an embodiment, at least one of the following is performed: the minimum threshold is set to 64 luma samples, the maximum threshold is set to 4096 luma samples, the side minimum threshold is set to 8 luma samples, or the side maximum threshold is set to 64 luma samples.
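The size conditions (A) through (D) can be checked as in the sketch below. It is illustrative only: the `LicSizeLimits` type and `lic_size_restricted` function are hypothetical names, and the sketch assumes that all four example thresholds (64, 4096, 8, and 64 luma samples) are in force at once, whereas the embodiment only requires at least one of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicSizeLimits:
    """Example thresholds in luma samples; a separate instance could be
    used for affine inter-prediction blocks if the limits differ."""
    min_samples: int = 64
    max_samples: int = 4096
    min_side: int = 8
    max_side: int = 64


def lic_size_restricted(width: int, height: int,
                        limits: LicSizeLimits = LicSizeLimits()) -> bool:
    """Return True when any size condition (A)-(D) disallows LIC."""
    samples = width * height
    return (samples < limits.min_samples          # (A) too few samples
            or samples > limits.max_samples       # (B) too many samples
            or min(width, height) < limits.min_side   # (C) a side too short
            or max(width, height) > limits.max_side)  # (D) a side too long
```

With these defaults, a 16x16 block passes, a 4x16 block is rejected by condition (C), and a 128x32 block is rejected by condition (D) even though its 4096 samples do not exceed the maximum threshold.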

Other restrictions may include at least one of: the LIC flag is not stored in the motion vector buffer of the reference picture, or bi-directional merge candidates, pairwise average candidates, or zero motion candidates are used.

The method may further comprise: when the current data block is coded using an inherited affine merge mode, determining the LIC flag of the current data block by inheriting the LIC flag of the neighboring block used as the inheritance source of the affine model; signaling the LIC flag in a bitstream to indicate whether affine AMVP mode is used, and signaling another flag to indicate whether 4-parameter affine or 6-parameter affine is used; and calculating the motion vector at the sample position (x, y) in the current data block using equation (1) above when 4-parameter affine is used, and using equation (2) above when 6-parameter affine is used. In an embodiment, the inherited affine merge mode has affine model inheritance with at most two inherited affine candidates derived from the affine motion models of neighboring blocks, one derived from the left neighboring CU and the other derived from the upper neighboring CU.
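Equations (1) and (2) are not reproduced in this part of the description. The sketch below assumes they take the form commonly used for affine motion in VVC, with control point motion vectors at the top-left (`mv0`), top-right (`mv1`), and, for the 6-parameter model, bottom-left (`mv2`) corners of a block of width `w` and height `h`. The function name `affine_mv` and the floating-point arithmetic are hypothetical simplifications.

```python
from typing import Optional, Tuple

Mv = Tuple[float, float]


def affine_mv(x: float, y: float, w: int, h: int,
              mv0: Mv, mv1: Mv, mv2: Optional[Mv] = None) -> Mv:
    """Motion vector at sample position (x, y) inside a w x h block.

    mv2 is None -> 4-parameter model (translation, rotation, zoom).
    mv2 given   -> 6-parameter model (independent vertical gradient).
    """
    a = (mv1[0] - mv0[0]) / w   # horizontal gradient of mv_x
    b = (mv1[1] - mv0[1]) / w   # horizontal gradient of mv_y
    if mv2 is None:
        mvx = a * x - b * y + mv0[0]
        mvy = b * x + a * y + mv0[1]
    else:
        c = (mv2[0] - mv0[0]) / h   # vertical gradient of mv_x
        d = (mv2[1] - mv0[1]) / h   # vertical gradient of mv_y
        mvx = a * x + c * y + mv0[0]
        mvy = b * x + d * y + mv0[1]
    return mvx, mvy
```

In both models the result at position (0, 0) equals `mv0`, and when all control point motion vectors are identical the model reduces to ordinary translational motion.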

According to an embodiment, the inferred LIC flag may be enabled for affine AMVP mode or AMVP mode for conventional inter prediction, but cannot be enabled for both affine AMVP mode and AMVP mode for conventional inter prediction.

The method may further comprise: applying an affine merging mode to CUs having both width and height greater than or equal to the threshold, selecting only the first inherited candidate from each side, and when adjacent affine CUs are identified, using the control point motion vectors of the adjacent affine CUs for deriving the CPMVP candidates in the affine merging list of the current CU.

According to an embodiment, an apparatus may comprise: at least one memory configured to store computer program code; and at least one processor configured to access the at least one memory and operate in accordance with the computer program code, the computer program code comprising: first application code configured to cause the at least one processor to receive information about a current block of data of an image; second application code configured to cause the at least one processor to determine whether Local Illumination Compensation (LIC) is available for a current data block, wherein the determining whether LIC is available for the current data block comprises: determining whether the current data block is encoded using affine inter prediction; third application code configured to cause the at least one processor, in determining that LIC is available for a current data block, to perform at least one of: inferring an LIC flag of the current data block corresponding to the enabled LIC to a valid value or inheriting the LIC flag of the current data block from LIC flags of neighboring blocks; and fourth application code configured to cause the at least one processor to generate a prediction of the at least one sub-block using the derived motion vector based on an LIC flag of a current data block corresponding to the enabled LIC by applying the LIC to the current data block using the inherited LIC flag.

According to an embodiment, there is provided a non-transitory computer-readable storage medium that may store instructions that cause one or more processors to perform: receiving information about a current data block of an image; determining whether Local Illumination Compensation (LIC) is available for the current data block, wherein the determining whether LIC is available for the current data block comprises: determining whether the current data block is encoded using affine inter prediction; upon determining that LIC is available for the current data block, performing at least one of: inferring an LIC flag of the current data block corresponding to the enabled LIC to a valid value or inheriting the LIC flag of the current data block from LIC flags of neighboring blocks; and generating a prediction of the at least one sub-block using the derived motion vector by applying the LIC to the current data block using the inherited LIC flag based on the LIC flag of the current data block corresponding to the enabled LIC.

According to an embodiment, there is also provided an apparatus for video decoding, including:

The embodiment of the present application further provides a computer device, which includes one or more processors and one or more memories, where at least one program instruction is stored in the one or more memories, and the at least one program instruction is loaded and executed by the one or more processors to implement the above-mentioned video decoding method.

The techniques of encoding/decoding may be implemented by one or more processors executing computer software having computer-readable instructions that may be physically stored in one or more computer-readable media (e.g., hard disk drives). The computer readable medium may be a non-transitory computer readable storage medium, and when the computer readable instructions are executed by a computer for video encoding/decoding, the computer is caused to perform the method for video decoding according to the above embodiment. For example, fig. 15 illustrates a computer system 700 suitable for implementing certain embodiments of the disclosed subject matter.

The computer software may be coded in any suitable machine code or computer language, and may be subjected to assembly, compilation, linking, or similar mechanisms to create code comprising instructions that can be executed by one or more computer Central Processing Units (CPUs), Graphics Processing Units (GPUs), and the like, either directly or through interpretation, microcode execution, and the like.

The instructions may be executed on various types of computers or components thereof, including, for example, personal computers, tablets, servers, smartphones, gaming devices, internet of things devices, and so forth.

The components illustrated in FIG. 15 for computer system (700) are exemplary in nature and are not intended to limit the scope of use or functionality of the computer software implementing embodiments of the present application in any way. Neither should the configuration of components be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary embodiments of the computer system (700).

The computer system (700) may include some human interface input devices. Such human interface input devices may respond to input from one or more human users through tactile input (e.g., keyboard input, swipe, data glove movement), audio input (e.g., sound, applause), visual input (e.g., gestures), olfactory input (not shown). The human-machine interface device may also be used to capture media that does not necessarily directly relate to human conscious input, such as audio (e.g., voice, music, ambient sounds), images (e.g., scanned images, photographic images obtained from still-image cameras), video (e.g., two-dimensional video, three-dimensional video including stereoscopic video).

The human interface input device may include one or more of the following (only one of which is depicted): keyboard (701), mouse (702), touch pad (703), touch screen (710), data glove (not shown), joystick (705), microphone (706), scanner (707), camera (708).

The computer system (700) may also include certain human interface output devices. Such human interface output devices may stimulate the senses of one or more human users through, for example, tactile outputs, sounds, light, and olfactory/gustatory sensations. Such human interface output devices may include haptic output devices (e.g., haptic feedback through a touch screen (710), data glove (not shown), or joystick (705), but there may also be haptic feedback devices that do not act as input devices), audio output devices (e.g., speaker (709), headphones (not shown)), visual output devices (e.g., screens (710) including cathode ray tube screens, liquid crystal screens, plasma screens, organic light emitting diode screens, each with or without touch screen input functionality, each with or without haptic feedback functionality-some of which may output two-dimensional visual output or more than three-dimensional output by means such as stereoscopic picture output; virtual reality glasses (not shown), holographic displays and smoke boxes (not shown)), and printers (not shown).

The computer system (700) may also include human-accessible storage devices and their associated media, such as optical media including CD/DVD ROM/RW drives (720) with CD/DVD or similar media (721), thumb drives (722), removable hard drives or solid state drives (723), legacy magnetic media such as magnetic tapes and floppy disks (not shown), specialized ROM/ASIC/PLD-based devices such as security dongles (not shown), and the like.

Those skilled in the art will also appreciate that the term "computer-readable medium" as used in connection with the disclosed subject matter does not include transmission media, carrier waves, or other transitory signals.

The computer system (700) may also include an interface to one or more communication networks. The networks may be, for example, wireless, wired, or optical. The networks may further be local area, wide area, metropolitan, vehicular, industrial, real-time, delay-tolerant, and so forth. Examples of networks include local area networks such as Ethernet and wireless local area networks, cellular networks (GSM, 3G, 4G, 5G, LTE, etc.), television wired or wireless wide area digital networks (including cable, satellite, and terrestrial broadcast television), vehicular and industrial networks (including CAN bus), and so forth. Some networks typically require external network interface adapters attached to certain general purpose data ports or peripheral buses (749) (e.g., USB ports of the computer system (700)); others are typically integrated into the core of the computer system (700) by attachment to a system bus as described below (e.g., an Ethernet interface in a PC computer system or a cellular network interface in a smartphone computer system). Using any of these networks, the computer system (700) may communicate with other entities. The communication may be unidirectional and receive-only (e.g., broadcast television), unidirectional and send-only (e.g., CAN bus to certain CAN bus devices), or bidirectional, for example, to other computer systems over local or wide area digital networks. Each of the networks and network interfaces described above may use certain protocols and protocol stacks.

The human interface device, human accessible storage device, and network interface described above may be connected to the core (740) of the computer system (700).

The core (740) may include one or more Central Processing Units (CPUs) (741), Graphics Processing Units (GPUs) (742), special purpose programmable processing units in the form of Field Programmable Gate Arrays (FPGAs) (743), hardware accelerators (744) for specific tasks, and so forth. These devices, as well as Read Only Memory (ROM) (745), Random Access Memory (RAM) (746), internal mass storage (e.g., internal non-user accessible hard drives, solid state drives, etc.) (747), etc., may be connected by a system bus (748). In some computer systems, the system bus (748) may be accessible in the form of one or more physical plugs, so as to be extendable by additional central processing units, graphics processing units, and the like. Peripheral devices may be attached directly to the system bus (748) of the core or connected through a peripheral bus (749). Architectures for the peripheral bus include Peripheral Component Interconnect (PCI), Universal Serial Bus (USB), and the like.

The CPU (741), GPU (742), FPGA (743), and accelerator (744) may execute certain instructions that, in combination, may constitute the computer code described above. The computer code may be stored in ROM (745) or RAM (746). Transitional data may also be stored in RAM (746), while persistent data may be stored in, for example, internal mass storage (747). Fast storage to and retrieval from any of the memory devices can be achieved through the use of cache memories, which can be closely associated with one or more CPUs (741), GPUs (742), mass storage (747), ROM (745), RAM (746), and the like.

The computer-readable medium may have computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present application, or they may be of the kind well known and available to those having skill in the computer software arts.

By way of example, and not limitation, a computer system having the architecture (700), and in particular the core (740), may provide functionality as a result of one or more processors (including CPUs, GPUs, FPGAs, accelerators, etc.) executing software embodied in one or more tangible computer-readable media. Such computer-readable media may be media associated with the user-accessible mass storage described above, as well as certain non-transitory storage of the core (740), such as core internal mass storage (747) or ROM (745). Software implementing various embodiments of the present application may be stored in such devices and executed by the core (740). The computer-readable medium may include one or more memory devices or chips, according to particular needs. The software may cause the core (740), and in particular the processors therein (including CPUs, GPUs, FPGAs, etc.), to perform certain processes or certain portions of certain processes described herein, including defining data structures stored in RAM (746) and modifying such data structures in accordance with software-defined processes. Additionally or alternatively, the computer system may provide functionality that is logically hardwired or otherwise embodied in circuitry (e.g., accelerator (744)) that may operate in place of or in conjunction with software to perform certain processes or certain portions of certain processes described herein. Where appropriate, reference to software may include logic and vice versa. Where appropriate, reference to a computer-readable medium may include circuitry (e.g., an Integrated Circuit (IC)) storing executable software, circuitry comprising executable logic, or both. The present application includes any suitable combination of hardware and software.

While the application has described several exemplary embodiments, various modifications, arrangements, and equivalents of the embodiments are within the scope of the application. It will thus be appreciated that those skilled in the art will be able to devise various systems and methods which, although not explicitly shown or described herein, embody the principles of the application and are thus within its spirit and scope.

Appendix A: Acronyms

AMVP: Advanced Motion Vector Prediction

CU: Coding Unit

CPMV: Control Point Motion Vector

HEVC: High Efficiency Video Coding

HMVP: History-Based Motion Vector Prediction

MMVD: Merge Mode with Motion Vector Difference

MV: Motion Vector

MVD: Motion Vector Difference

MVP: Motion Vector Predictor

PU: Prediction Unit

SbTMVP: Subblock-Based Temporal Motion Vector Prediction

TMVP: Temporal Motion Vector Prediction

LIC: Local Illumination Compensation

BDOF: Bi-Directional Optical Flow

PROF: Prediction Refinement with Optical Flow

VTM: Versatile Test Model

VVC: Versatile Video Coding

Claims

1. A method of video decoding, comprising:

receiving information about a current data block of an image;

2. The method of claim 1, wherein the determining whether the LIC is available for the current data block comprises:

determining whether the current data block is encoded using affine inter prediction; and

when it is determined that the current data block is encoded using the affine inter prediction, determining whether other LIC restrictions apply, and upon determining that other LIC restrictions do not apply, inferring that the LIC is enabled.

3. The method of claim 2, wherein inheriting the LIC flag of the current data block from the LIC flags of neighboring blocks comprises: if the current data block uses a constructed affine merge mode, obtaining the LIC flag of the current data block from the LIC flags of the neighboring blocks.

4. The method of claim 1, further comprising:

predicting a control point motion vector (CPMV) for a corner of the current data block from the neighboring block.

5. The method of claim 4, wherein the predicted control point motion vector for the corner of the current data block is the predicted control point motion vector for the top left corner of the current data block.

6. The method of claim 1, wherein the applying the LIC to the current data block comprises: applying the LIC to an affine data block using a scaling factor and an offset based on the derived motion vector based on a linear model of luminance variation, wherein the scaling factor and the offset are derived for each prediction direction by using at least one neighboring sample and at least one corresponding reference sample of the current data block.

7. The method of claim 6, wherein the at least one neighboring sample comprises a subsampled neighboring sample of the current data block, and wherein the at least one corresponding reference sample comprises a corresponding reference sample in a reference picture identified by motion information of the current data block.

8. The method of claim 1, further comprising:

copying the LIC flag from a neighboring block in a manner similar to motion information copied in merge mode when the current data block is encoded using merge mode; and

signaling the LIC flag for the current data block to indicate whether to apply the LIC when the current data block is not encoded using the merge mode.

9. The method of claim 1, wherein the applying the LIC comprises: for integer pixel motion search and fractional pixel motion search, replacing the sum of absolute differences (SAD) and the sum of absolute Hadamard-transformed differences (SATD) with the mean-removed sum of absolute differences (MR-SAD) and the mean-removed sum of absolute Hadamard-transformed differences (MR-SATD), respectively.

10. The method of claim 1, wherein the applying the LIC comprises unidirectional illumination compensation, and wherein, in the local illumination compensation, a scaling factor and an offset are derived, and the LIC is applied, on a CU basis.

11. The method of claim 2, wherein the other restrictions comprise: subblock-based temporal motion vector prediction (TMVP), triangle partition, multi-hypothesis intra-inter prediction and bi-prediction, or bi-prediction.

12. The method of claim 2, wherein the other restrictions comprise one or more of the following conditions:

when the number of samples in the current data block is below a minimum threshold;

when the number of samples in the current data block is above a maximum threshold;

when the number of samples on either side of the current data block is below a side minimum threshold; and

when the number of samples on either side of the current data block is above a side maximum threshold.

13. The method of claim 12, wherein the minimum threshold, the maximum threshold, the side minimum threshold, and the side maximum threshold vary based on whether the current data block is an affine inter-prediction data block.

14. The method of claim 12, further comprising performing at least one of: setting the minimum threshold to 64 luma samples, setting the maximum threshold to 4096 luma samples, setting the side minimum threshold to 8 luma samples, or setting the side maximum threshold to 64 luma samples.

15. The method of claim 2, wherein the other restrictions comprise at least one of: the LIC flag is not stored in the motion vector buffer of the reference picture, or bi-directional merge candidates, pairwise average candidates, or zero motion candidates are used.

16. The method of claim 1, further comprising:

when the current data block is coded using an inherited affine merge mode, determining the LIC flag of the current data block by inheriting the LIC flag of the neighboring block used as the inheritance source of the affine model;

signaling the LIC flag in a bitstream to indicate whether affine advanced motion vector prediction (AMVP) mode is used, and another flag to indicate whether 4-parameter affine or 6-parameter affine is used; and

when the 4-parameter affine is used, calculating a motion vector at a sample position (x, y) in the current data block using a first equation, and when the 6-parameter affine is used, calculating the motion vector at the sample position (x, y) in the current data block using a second equation.

17. The method of claim 1, wherein the inferred LIC flag is enabled for an affine AMVP mode or an AMVP mode for conventional inter prediction, but not for both the affine AMVP mode and the AMVP mode for conventional inter prediction.

18. The method of claim 1, further comprising:

applying an affine merge mode to CUs having a width and a height both greater than or equal to a threshold, selecting only a first inherited candidate from each side; and

when a neighboring affine CU is identified, using the control point motion vectors of the neighboring affine CU to derive control point motion vector predictor (CPMVP) candidates in the affine merge list of the current CU.

19. An apparatus for video decoding, comprising:

20. A computer device, comprising one or more processors and one or more memories having stored therein at least one instruction, the at least one instruction being loaded and executed by the one or more processors to implement the method of encoding or decoding of any one of claims 1-18.