CN114760467A - Method and device for determining coding mode - Google Patents
- Publication number
- CN114760467A (application number CN202210288519.9A)
- Authority
- CN
- China
- Prior art keywords
- coding
- mode
- rdcost
- coding mode
- satd
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/107—Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/147—Data rate or code amount at the encoder output according to rate distortion criteria
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/90—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
- H04N19/91—Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
One or more embodiments of the present specification provide a method and an apparatus for determining a coding mode. The method includes: obtaining the rate-distortion optimization cost (RDcost) of a first coding mode and the sum of absolute transformed differences (SATD) of a second coding mode; obtaining an estimated coding cost of the second coding mode according to the SATD and a rate-distortion influence parameter, where the rate-distortion influence parameter is a parameter associated with the RDcost of the second coding mode; and, in response to the RDcost of the first coding mode being less than the estimated coding cost of the second coding mode, selecting the first coding mode as the mode used to encode a target coding unit. The embodiments reduce the encoding complexity of the encoder, make the selection of the coding mode more accurate, and improve coding performance.
Description
Technical Field
One or more embodiments of the present disclosure relate to the field of video processing technologies, and in particular, to a method and an apparatus for determining a coding mode.
Background
In the related art, a video sequence may be encoded into a compressed bitstream by an encoder and then transmitted to a decoder at a receiving end. When the encoder encodes a video image in a video sequence, the encoding process may include: dividing the video image into non-overlapping coding units (CUs); performing intra-frame or inter-frame coding on each CU to obtain a predicted image block; obtaining a prediction residual from the predicted image block; and applying a discrete cosine transform, quantization, entropy coding, and other processing to the prediction residual to obtain the compressed bitstream.
In the above encoding process, when the encoder encodes a CU, it first determines the coding mode of the CU and then encodes the CU according to that mode. Multiple coding modes are usually available for a CU; typically, the rate-distortion optimization cost (RDcost) of each mode is calculated and the mode with the smallest RDcost is selected as the preferred coding mode. In practice, however, this selection process is computationally complex and reduces coding efficiency.
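As a minimal sketch of the conventional exhaustive selection described above, assuming the common Lagrangian form J = D + λ·R (distortion plus lambda-weighted bit rate), every candidate mode needs its own RDcost before the cheapest one can be chosen. The mode names, distortion and bit values, and `lambda_` are hypothetical; a real encoder obtains D and R by actually coding the CU in each mode, which is what makes this step expensive.

```python
from dataclasses import dataclass


@dataclass
class ModeResult:
    """Outcome of fully coding one CU in one candidate mode (hypothetical values)."""
    name: str          # illustrative mode label
    distortion: float  # e.g. sum of squared errors after reconstruction
    bits: float        # bits spent coding the CU in this mode


def rd_cost(result: ModeResult, lambda_: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return result.distortion + lambda_ * result.bits


def select_mode_full_rdo(candidates: list[ModeResult], lambda_: float) -> ModeResult:
    """Exhaustive RDO: every candidate needs its RDcost, then the minimum wins."""
    return min(candidates, key=lambda r: rd_cost(r, lambda_))


candidates = [
    ModeResult("intra", distortion=1200.0, bits=40.0),  # J = 1600
    ModeResult("inter", distortion=900.0, bits=70.0),   # J = 1600
    ModeResult("skip", distortion=1500.0, bits=2.0),    # J = 1520
]
best = select_mode_full_rdo(candidates, lambda_=10.0)
print(best.name)  # skip
```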
Disclosure of Invention
In view of this, one or more embodiments of the present disclosure provide a method and an apparatus for determining a coding mode.
To achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:
According to a first aspect of one or more embodiments of the present specification, a method for determining a coding mode is provided, the method including:
obtaining the rate-distortion optimization cost (RDcost) of a first coding mode and the sum of absolute transformed differences (SATD) of a second coding mode;
obtaining an estimated coding cost of the second coding mode according to the SATD and a rate-distortion influence parameter, where the rate-distortion influence parameter is a parameter associated with the RDcost of the second coding mode; and
in response to the RDcost of the first coding mode being less than the estimated coding cost of the second coding mode, selecting the first coding mode as the mode used to encode a target coding unit.
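The three steps above can be sketched as follows, under stated assumptions. SATD is computed here with the unnormalised 4×4 Hadamard transform commonly used for this purpose (many encoders additionally scale the sum, which is omitted). The text does not give the formula that combines the SATD with the rate-distortion influence parameters, so `estimated_cost` uses a purely hypothetical linear form with placeholder parameters `alpha` and `beta`; it is not the patent's formula.

```python
# 4x4 Hadamard matrix (unnormalised); it is symmetric, so H^T == H.
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def satd4x4(orig, pred):
    """Sum of absolute values of the Hadamard-transformed prediction residual."""
    diff = [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]
    coeffs = matmul(matmul(H4, diff), H4)
    return sum(abs(c) for row in coeffs for c in row)


def estimated_cost(satd, alpha, beta):
    """HYPOTHETICAL estimate of the second mode's RDcost.

    alpha and beta are placeholders for whatever the rate-distortion influence
    parameters contribute; this linear formula is an assumption.
    """
    return alpha * satd + beta


def decide(rdcost_first, satd_second, alpha, beta):
    """Select the first mode early if its RDcost beats the second mode's estimate."""
    if rdcost_first < estimated_cost(satd_second, alpha, beta):
        return "first"      # full RDO of the second mode is skipped
    return "undecided"      # this step alone does not settle the choice


orig = [[10] * 4 for _ in range(4)]
pred = [[8] * 4 for _ in range(4)]
satd = satd4x4(orig, pred)  # constant residual 2 -> a single DC coefficient of 32
print(satd, decide(30.0, satd, alpha=1.0, beta=5.0))  # 32 first
```

The early return is where the saving comes from: when it fires, the costly RDO pass for the second mode is never run.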
According to a second aspect of one or more embodiments of the present specification, there is provided an apparatus for determining an encoding mode, the apparatus including:
a cost obtaining module, configured to obtain the RDcost of a first coding mode and the SATD of a second coding mode;
a cost adjusting module, configured to obtain an estimated coding cost of the second coding mode according to the SATD and a rate-distortion influence parameter, where the rate-distortion influence parameter is a parameter associated with the RDcost of the second coding mode; and
a mode determination module, configured to select the first coding mode as the mode used to encode a target coding unit in response to the RDcost of the first coding mode being less than the estimated coding cost of the second coding mode.
According to a third aspect of one or more embodiments of the present specification, there is provided an electronic apparatus, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor executes the executable instructions to implement the method for determining a coding mode according to any embodiment of the present specification.
According to a fourth aspect of one or more embodiments of the present specification, a computer-readable storage medium is provided, on which computer instructions are stored, and the instructions, when executed by a processor, implement the method for determining the encoding mode of any of the embodiments of the present specification.
According to a fifth aspect of one or more embodiments of the present specification, there is provided a method of selecting an inter prediction encoding mode, the method including:
obtaining the RDcost of a first inter-prediction coding mode and the SATD of a second inter-prediction coding mode;
obtaining an estimated coding cost of the second inter-prediction coding mode according to the SATD and a rate-distortion influence parameter, where the rate-distortion influence parameter is a parameter associated with the RDcost of the inter-prediction coding mode; and
selecting the first inter-prediction coding mode as the inter-prediction coding mode of the target coding unit if the RDcost of the first inter-prediction coding mode is less than the estimated coding cost of the second inter-prediction coding mode.
In the method and apparatus for determining a coding mode in the embodiments of this specification, an SATD-based cost of the second coding mode takes the place of its RDcost in the cost comparison between modes. When the RDcost of the first coding mode is less than the estimated coding cost of the second coding mode, the first coding mode is selected directly, so the RDcost of the second coding mode never has to be calculated; that is, the computationally expensive RDO process is not executed for the second coding mode, which reduces encoder complexity and improves coding efficiency. Moreover, the estimated coding cost of the second coding mode is not based on the SATD alone: it also incorporates rate-distortion influence parameters that affect the RDcost calculation of the second coding mode. The estimate therefore reflects the RDcost of the second coding mode more accurately, the cost comparison between modes is more reliable, and the coding mode is selected more accurately. In this way, coding complexity is reduced and coding efficiency is improved while coding performance is preserved as far as possible.
Drawings
In order to more clearly illustrate the technical solutions of one or more embodiments of the present disclosure or of the related art, the drawings used in describing the embodiments or the related art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present disclosure, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 shows the structure of a video sequence according to an exemplary embodiment;
FIG. 2 is a schematic diagram of an encoding process according to an exemplary embodiment;
FIG. 3 is a schematic diagram of a decoding process according to an exemplary embodiment;
FIG. 4 is a flowchart of a method for determining a coding mode according to an exemplary embodiment;
FIG. 5 is a flowchart of another method for determining a coding mode according to an exemplary embodiment;
FIG. 6 is a schematic diagram of the positions of five neighboring coding units according to an exemplary embodiment;
FIG. 7 is a schematic structural diagram of an apparatus for determining a coding mode according to an exemplary embodiment;
FIG. 8 is a flowchart of a method for selecting an inter-prediction coding mode according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims that follow.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
Video is a collection of still pictures (which may be referred to as "frames") arranged in a time sequence for storing visual information. A video capture device (e.g., a camera) may be used to capture and store these pictures in a time sequence, and a video playback device (e.g., a television, a computer, a smartphone, a tablet computer, a video player, or any end-user terminal with a display function) may be used to display such pictures in a time sequence. Further, in some applications, the video capture device may send captured video to a video playback device (e.g., a computer) in real-time, such as for a meeting, live broadcast, and so forth.
To reduce the storage space and transmission bandwidth required for such applications, video may be compressed prior to storage and transmission and decompressed prior to display. Compression and decompression may be implemented by software executed by a processor (e.g., a processor of a general purpose computer) or dedicated hardware. The compression module is generally referred to as an "encoder" and the decompression module is generally referred to as a "decoder". The encoder and decoder may be collectively referred to as a "codec". The encoder and decoder may be implemented as any of a variety of suitable hardware, software, or combinations thereof. For example, a hardware implementation of the encoder and decoder may include circuitry, such as one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic, or any combinations thereof. Software implementations of the encoder and decoder may include program code, computer-executable instructions, firmware, or any suitable computer-implemented algorithm or process fixed in a computer-readable medium. Video compression and decompression may be implemented by various algorithms or standards, such as MPEG-1, MPEG-2, MPEG-4, the H.26x family, etc. In some applications, a codec may decompress video according to a first coding standard and recompress the decompressed video using a second coding standard, in which case the codec may be referred to as a "transcoder".
The video coding process may identify and retain useful information that can be used to reconstruct a picture while discarding information that is unimportant for reconstruction. If the discarded, unimportant information cannot be fully reconstructed, the encoding process may be referred to as "lossy"; otherwise, it may be referred to as "lossless". Most encoding processes are lossy, trading some reconstruction fidelity for reduced storage space and transmission bandwidth.
Useful information for a picture being encoded (referred to as the "current picture") includes changes relative to a reference picture (e.g., a previously encoded and reconstructed picture). Such changes may include changes in pixel position, brightness, or color, of which changes in position are of most concern. A change in position of a set of pixels representing an object may reflect motion of the object between the reference picture and the current picture.
A picture that is encoded without reference to another picture (i.e., it is its own reference picture) is referred to as an "I-picture". A picture encoded using a previous picture as a reference picture is referred to as a "P-picture". A picture encoded using both previous and future pictures as reference pictures (i.e., the reference is "bi-directional") is referred to as a "B-picture".
Fig. 1 illustrates the structure of an example video sequence 100, according to some embodiments of the present disclosure. Video sequence 100 may be live video or video that has been captured and archived. Video sequence 100 may be real-life video, computer-generated video (e.g., computer game video), or a combination thereof (e.g., real-life video with augmented reality effects). Video sequence 100 may be input from a video capture device (e.g., a camera), a video archive containing previously captured video (e.g., a video file stored in a storage device), or a video feed interface (e.g., a video broadcast transceiver) that receives video from a video content provider.
As shown in fig. 1, video sequence 100 may include a series of pictures arranged in time along a timeline, including picture 102, picture 104, picture 106, and picture 108. Pictures 102-106 are consecutive, and there are more pictures between pictures 106 and 108. In fig. 1, picture 102 is an I-picture, and its reference picture is picture 102 itself. Picture 104 is a P-picture, and its reference picture is picture 102, as indicated by the arrow. Picture 106 is a B-picture, and its reference pictures are picture 104 and picture 108, as indicated by the arrows. In some implementations, a reference picture of a picture (e.g., picture 104) may not immediately precede or follow that picture. For example, the reference picture of picture 104 may be a picture preceding picture 102. It should be noted that the reference pictures of pictures 102 to 106 are merely examples, and the present disclosure does not limit reference pictures to the example shown in fig. 1.
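The I/P/B definitions and the reference relations of FIG. 1 can be checked with a small sketch. It assumes, for illustration only, that the figure labels (102, 104, ...) can be compared as display order, since the pictures are arranged along the timeline in that order; `picture_type` is a hypothetical helper that mirrors only the simplified definitions in the text.

```python
def picture_type(pid: int, refs: list[int]) -> str:
    """Classify a picture by its references, treating the labels as display order.

    Mirrors only the simplified I/P/B definitions given in the text.
    """
    others = [r for r in refs if r != pid]
    if not others:
        return "I"                      # no reference to another picture
    if all(r < pid for r in others):
        return "P"                      # references previous pictures only
    if any(r < pid for r in others) and any(r > pid for r in others):
        return "B"                      # bi-directional: previous and future
    raise ValueError("reference pattern not covered by the I/P/B definitions")


# Reference relations of pictures 102-106 as described for FIG. 1.
fig1_refs = {102: [102], 104: [102], 106: [104, 108]}
for pid, refs in fig1_refs.items():
    print(pid, picture_type(pid, refs))  # 102 I / 104 P / 106 B
```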
Because of the computational complexity involved, a video codec typically does not encode or decode an entire picture at once. Instead, it may divide a picture into basic segments and encode or decode the picture segment by segment. Such a basic segment is referred to in this disclosure as a basic processing unit ("BPU"). For example, structure 110 of fig. 1 shows an example structure of a picture (e.g., any of pictures 102-108) of video sequence 100. In structure 110, a picture is divided into 4 × 4 basic processing units, whose boundaries are shown as dashed lines. In some implementations, the basic processing unit may be referred to as a "macroblock" in some video coding standards (e.g., the MPEG series, H.261, H.263, or H.264/AVC), or as a "coding tree unit" ("CTU") in others (e.g., H.265/HEVC or H.266/VVC). Within a picture, basic processing units may have variable sizes, such as 128 × 128, 64 × 64, 32 × 32, 16 × 16, 4 × 8, or 16 × 32 pixels, or any arbitrary shape and size. The size and shape of the basic processing units may be selected for a picture based on a balance between coding efficiency and the level of detail to be maintained in the basic processing unit.
The basic processing unit may be a logical unit that includes a set of different types of video data stored in computer memory (e.g., in a video frame buffer). For example, a basic processing unit of a color picture may include a luma component (Y) representing achromatic brightness information, one or more chroma components (e.g., Cb and Cr) representing color information, and associated syntax elements, where the luma and chroma components may have the same size as the basic processing unit. In some video coding standards (e.g., H.265/HEVC or H.266/VVC), the luma and chroma components may be referred to as "coding tree blocks" ("CTBs"). Any operation performed on the basic processing unit may be repeated for each of its luma and chroma components.
Video coding has multiple stages of operation, examples of which are described in detail later. For some of these stages, the basic processing unit may still be too large to process, and it may therefore be further divided into segments referred to in this disclosure as "basic processing subunits". In some embodiments, the basic processing subunit may be referred to as a "block" in some video coding standards (e.g., the MPEG series, H.261, H.263, or H.264/AVC), or as a "coding unit" ("CU") in others (e.g., H.265/HEVC or H.266/VVC). A basic processing subunit may have the same size as, or a smaller size than, the basic processing unit. Like the basic processing unit, the basic processing subunit is a logical unit that may include a set of different types of video data (e.g., Y, Cb, Cr, and associated syntax elements) stored in computer memory (e.g., in a video frame buffer). Any operation performed on a basic processing subunit may be repeated for each of its luma and chroma components. Note that this division can be carried to deeper levels depending on processing needs, and that different stages may use different schemes for dividing the basic processing unit.
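To illustrate the division into basic processing units, the following sketch tiles a picture into non-overlapping square BPUs in raster order, as in the 4 × 4 grid of structure 110. A single fixed square size is an assumption (the text notes that sizes and shapes may vary), and blocks on the right and bottom edges are simply clipped here; real standards define their own boundary handling.

```python
def partition_into_bpus(width: int, height: int, bpu_size: int) -> list[tuple[int, int, int, int]]:
    """Tile a picture into non-overlapping BPUs in raster order.

    Returns (x, y, w, h) per BPU; blocks on the right/bottom edges are clipped.
    """
    bpus = []
    for y in range(0, height, bpu_size):          # rows of BPUs, top to bottom
        for x in range(0, width, bpu_size):       # BPUs within a row, left to right
            bpus.append((x, y, min(bpu_size, width - x), min(bpu_size, height - y)))
    return bpus


grid = partition_into_bpus(256, 256, 64)   # 4 x 4 BPUs, as in structure 110
edge = partition_into_bpus(100, 70, 64)    # picture size not a multiple of 64
print(len(grid), edge[-1])  # 16 (64, 64, 36, 6)
```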
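One common scheme for dividing a basic processing unit into subunits is a recursive quadtree, used for example to split CTUs into CUs in H.265/HEVC (H.266/VVC adds further split types). The sketch below assumes square blocks and a caller-supplied, hypothetical `should_split` rule; an encoder would normally base that decision on rate-distortion cost.

```python
from typing import Callable

Block = tuple[int, int, int]  # (x, y, size) of a square block


def quadtree_split(x: int, y: int, size: int, min_size: int,
                   should_split: Callable[[int, int, int], bool]) -> list[Block]:
    """Recursively divide a square BPU into four equal sub-units (quadtree).

    Recursion stops at min_size or when should_split declines to split.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: list[Block] = []
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(quadtree_split(x + dx, y + dy, half, min_size, should_split))
        return leaves
    return [(x, y, size)]


# Toy rule: keep splitting only the block in the top-left corner.
leaves = quadtree_split(0, 0, 64, 8, lambda x, y, s: x == 0 and y == 0)
print(len(leaves), leaves[:4])  # 10 [(0, 0, 8), (8, 0, 8), (0, 8, 8), (8, 8, 8)]
```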
Fig. 2 illustrates a schematic diagram of an example encoding process according to some embodiments of the disclosure. As shown in fig. 2, an encoder may encode a video sequence 202 into a video bitstream 228 according to process 200B. Similar to structure 110 in fig. 1, each original picture of video sequence 202 may be divided by the encoder into basic processing units, basic processing subunits, or regions for processing. In fig. 2, the encoder may feed a basic processing unit of an original picture of video sequence 202 (referred to as the "original BPU") to prediction stage 204 to generate prediction data 206 and a predicted BPU 208. The encoder may subtract predicted BPU 208 from the original BPU to generate a residual BPU 210. The encoder may feed residual BPU 210 to a transform stage 212 and a quantization stage 214 to generate quantized transform coefficients 216. The encoder may feed prediction data 206 and quantized transform coefficients 216 to a binary encoding stage 226 to generate video bitstream 228. Components 202, 204, 206, 208, 210, 212, 214, 216, 226, and 228 may be referred to as the "forward path". During process 200B, after quantization stage 214, the encoder may feed quantized transform coefficients 216 to an inverse quantization stage 218 and an inverse transform stage 220 to generate a reconstructed residual BPU 222. The encoder may add reconstructed residual BPU 222 to predicted BPU 208 to generate a prediction reference 224, which is used in prediction stage 204 for the next iteration of process 200B. Components 218, 220, 222, and 224 of process 200B may be referred to as the "reconstruction path". The reconstruction path is used to ensure that the encoder and the decoder use the same reference data for prediction.
The prediction stage may include spatial prediction and temporal prediction. Spatial prediction (e.g., intra-picture prediction or "intra prediction") may use pixels from one or more already-coded neighboring BPUs in the same picture to predict the current BPU, reducing the inherent spatial redundancy of the picture. Temporal prediction (e.g., inter-picture prediction or "inter prediction") may use regions from one or more already-coded pictures to predict the current BPU, reducing the inherent temporal redundancy between pictures.
Referring to process 200B, in the forward path, the encoder performs prediction operations in spatial prediction stage 2042 and temporal prediction stage 2044. For example, in spatial prediction stage 2042, the encoder may perform intra prediction. For the original BPU of the picture being encoded, prediction reference 224 may include one or more neighboring BPUs in the same picture that have been encoded (in the forward path) and reconstructed (in the reconstruction path). The encoder may generate predicted BPU 208 by extrapolating from these neighboring BPUs.
As another example, in temporal prediction stage 2044, the encoder may perform inter prediction. For the original BPU of the current picture, prediction reference 224 may include one or more pictures (referred to as "reference pictures") that have been encoded (in the forward path) and reconstructed (in the reconstruction path). In some embodiments, a reference picture may be encoded and reconstructed BPU by BPU. For example, the encoder may add reconstructed residual BPU 222 to predicted BPU 208 to generate a reconstructed BPU. Once all reconstructed BPUs of the same picture have been generated, the encoder may assemble them into a reconstructed picture to serve as a reference picture. The encoder may perform a "motion estimation" operation to search for a matching region within a range of the reference picture (referred to as a "search window"). The location of the search window in the reference picture may be determined based on the location of the original BPU in the current picture.
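The interplay of the forward and reconstruction paths can be sketched on a single row of samples. For brevity the transform stages are replaced by the identity and quantization is a plain uniform scalar quantizer with step `step`; the sample values and helper names are hypothetical. The point it shows is that the encoder rebuilds its reference from the quantized levels exactly as a decoder would, so both sides predict from identical data.

```python
def reconstruct(levels, pred, step):
    """Reconstruction path: inverse quantization, then add the prediction back."""
    recon_residual = [q * step for q in levels]            # stands in for 218/220 -> 222
    return [p + r for p, r in zip(pred, recon_residual)]   # prediction reference 224


def encode_block(orig, pred, step):
    """Forward path for one block, with the transform replaced by the identity."""
    residual = [o - p for o, p in zip(orig, pred)]          # residual BPU 210
    levels = [round(r / step) for r in residual]            # stands in for 212/214 -> 216
    return levels, reconstruct(levels, pred, step)          # encoder keeps its own reference


orig = [53, 55, 61, 66]   # original samples (hypothetical)
pred = [50, 50, 60, 70]   # predicted BPU 208 samples (hypothetical)
levels, enc_ref = encode_block(orig, pred, step=4)
dec_ref = reconstruct(levels, pred, step=4)  # decoder sees only levels + same prediction
print(levels, enc_ref, enc_ref == dec_ref)   # [1, 1, 0, -1] [54, 54, 60, 66] True
```

Note that the reconstructed samples differ from the originals; predicting from the original picture instead would make encoder and decoder drift apart.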
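Extrapolation from neighboring BPUs can be illustrated with simplified versions of the vertical, horizontal, and DC intra modes found in H.264/AVC and H.265/HEVC. This is a sketch only: the real standards add reference-sample filtering, boundary smoothing, and many angular modes, and the neighbor values below are hypothetical.

```python
def intra_predict(top: list[int], left: list[int], mode: str) -> list[list[int]]:
    """Predict an N x N block from reconstructed neighbours (simplified modes).

    top: N samples directly above the block; left: N samples directly to its left.
    """
    n = len(top)
    if mode == "vertical":      # copy the row above downwards
        return [list(top) for _ in range(n)]
    if mode == "horizontal":    # copy the left column rightwards
        return [[left[i]] * n for i in range(n)]
    if mode == "dc":            # rounded mean of all 2N neighbours
        dc = (sum(top) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError(f"unsupported mode: {mode}")


top, left = [10, 20, 30, 40], [10, 10, 10, 10]
print(intra_predict(top, left, "dc")[0])        # [18, 18, 18, 18]
print(intra_predict(top, left, "vertical")[3])  # [10, 20, 30, 40]
```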
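Motion estimation within a search window can be sketched as exhaustive block matching using the sum of absolute differences (SAD) as the matching criterion. Full search is one simple, if costly, strategy; practical encoders typically use fast search patterns and sub-pixel refinement. The synthetic reference picture and block position below are hypothetical test data.

```python
def sad(cur, ref, rx, ry):
    """Sum of absolute differences between cur and the ref region at (rx, ry)."""
    return sum(abs(cur[i][j] - ref[ry + i][rx + j])
               for i in range(len(cur)) for j in range(len(cur[0])))


def full_search(cur, ref, bx, by, search_range):
    """Exhaustive block matching inside a +/- search_range window around (bx, by).

    Returns (best_sad, (dx, dy)), where (dx, dy) is the motion vector.
    """
    h, w = len(cur), len(cur[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + w > len(ref[0]) or ry + h > len(ref):
                continue  # candidate region falls outside the reference picture
            cost = sad(cur, ref, rx, ry)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# Synthetic 16x16 reference picture and a 4x4 block at (4, 4) whose content
# actually sits at (6, 5) in the reference, i.e. true motion (2, 1).
ref = [[(7 * x + 13 * y) % 31 for x in range(16)] for y in range(16)]
cur = [row[6:10] for row in ref[5:9]]
print(full_search(cur, ref, bx=4, by=4, search_range=3))  # (0, (2, 1))
```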
Motion estimation may be used to identify various types of motion, such as translation, rotation, and scaling. For inter prediction, prediction data 206 may include, for example, the location (e.g., coordinates) of the matching region, a motion vector associated with the matching region, the number of reference pictures, weights associated with the reference pictures, and so on.
Still referring to the forward path of process 200B, after spatial prediction stage 2042 and temporal prediction stage 2044, at mode decision stage 230 the encoder may select a prediction mode (e.g., intra prediction or inter prediction) for the current iteration of process 200B. For example, the encoder may perform a rate-distortion optimization technique, selecting the prediction mode that minimizes a cost function which depends on the bit rate of each candidate prediction mode and the distortion of the reconstructed reference picture under that mode. Depending on the selected prediction mode, the encoder may generate the corresponding predicted BPU 208 and prediction data 206.
In the reconstruction path of process 200B, if the intra-prediction mode is selected in the forward path, after generating prediction reference 224 (e.g., the current BPU already encoded and reconstructed in the current picture), the encoder may feed prediction reference 224 directly to spatial prediction stage 2042 for later use (e.g., extrapolation of the next BPU for the current picture). If the inter prediction mode is selected in the forward path, after generating the prediction reference 224 (e.g., the current picture for which all BPUs have been encoded and reconstructed), the encoder may feed the prediction reference 224 to a loop filter stage 232, where the encoder may apply a loop filter to the prediction reference 224 to reduce or eliminate distortion (e.g., blockiness) introduced by the inter prediction. The encoder may apply various loop filter techniques, such as, for example, deblocking, sample adaptive offset, adaptive loop filter, etc., at the loop filter stage 232. The loop filtered reference pictures may be stored in buffer 234 (or "decoded picture buffer") for later use (e.g., inter-prediction reference pictures to be used as future pictures of video sequence 202). The encoder may store one or more reference pictures in buffer 234 for use in temporal prediction stage 2044. In some implementations, the encoder can encode the parameters of the loop filter (e.g., loop filter strength) as well as the quantized transform coefficients 216, prediction data 206, and other information in the binary encoding stage 226.
The encoder may iteratively perform the process 200B to encode (in a forward path) each original BPU of the original picture and to generate (in a reconstruction path) a prediction reference 224 for encoding the next original BPU of the original picture. After encoding all of the original BPUs of the original picture, the encoder may continue to encode the next picture in the video sequence 202.
Referring to process 200B, an encoder may receive a video sequence 202 generated by a video capture device (e.g., a camera). The term "receiving," as used herein, may refer to any act of receiving, inputting, acquiring, retrieving, obtaining, reading, accessing, or taking in input data in any manner.
In prediction stage 204, at the current iteration, the encoder may receive the original BPU and prediction reference 224 and perform prediction operations to generate prediction data 206 and predicted BPU 208. Prediction reference 224 may be generated from the reconstruction path of a previous iteration of process 200B. The purpose of prediction stage 204 is to reduce information redundancy by extracting prediction data 206, from which, together with prediction reference 224, the original BPU can be reconstructed as predicted BPU 208.
To further compress residual BPU 210, at transform stage 212 the encoder may reduce its spatial redundancy by decomposing residual BPU 210 into a set of two-dimensional "base patterns", each associated with a "transform coefficient". Different transform algorithms may use different base patterns. Various transform algorithms may be used in transform stage 212, such as a discrete cosine transform or a discrete sine transform. The transform at transform stage 212 is reversible; that is, the encoder may restore residual BPU 210 by the inverse operation of the transform (referred to as the "inverse transform"). The encoder may record only the transform coefficients, from which the decoder can reconstruct residual BPU 210 without receiving the base patterns from the encoder. The transform coefficients may require fewer bits than residual BPU 210, yet they can be used to reconstruct residual BPU 210 without significant degradation. Residual BPU 210 is thus further compressed.
The encoder may further compress the transform coefficients in quantization stage 214. Quantized transform coefficients 216 can be mapped back to transform coefficients by the inverse operation of quantization (referred to as "inverse quantization"). Unlike the transform, however, quantization is not exactly reversible: values are rounded to a coarser scale, so inverse quantization recovers only an approximation of the original coefficients, and this rounding is a main source of information loss in lossy coding.
In binary encoding stage 226, the encoder may encode prediction data 206 and quantized transform coefficients 216 using a binary coding technique (e.g., entropy coding, variable-length coding, arithmetic coding, Huffman coding, context-adaptive binary arithmetic coding, or any other lossless or lossy compression algorithm). In some embodiments, in addition to prediction data 206 and quantized transform coefficients 216, the encoder may encode other information in binary encoding stage 226, such as the prediction mode used in prediction stage 204, parameters of the prediction operation, the transform type used in transform stage 212, parameters of the quantization process (e.g., quantization parameters), and encoder control parameters (e.g., bit rate control parameters). The encoder may use the output data of binary encoding stage 226 to generate video bitstream 228. In some embodiments, video bitstream 228 may be further packetized for network transmission.
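The reversibility and energy compaction of the transform stage can be seen with an orthonormal DCT-II and its inverse. The sketch uses a 1-D transform on one row for brevity (a 2-D block transform applies it separably along rows and then columns), and floating point; real codecs use integer approximations of the DCT. The residual values are hypothetical.

```python
import math


def dct_ii(x):
    """Orthonormal 1-D DCT-II: samples -> transform coefficients."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)))
    return out


def idct_ii(c):
    """Inverse of dct_ii (orthonormal DCT-III): coefficients -> samples."""
    n = len(c)
    return [sum((math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)) * c[k]
                * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]


residual = [3, 5, 1, -4]            # one row of a residual block (hypothetical)
coeffs = dct_ii(residual)
restored = idct_ii(coeffs)
print([round(v, 6) for v in restored])  # [3.0, 5.0, 1.0, -4.0]
flat = dct_ii([5, 5, 5, 5])             # flat input: only the DC term is non-zero
```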
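A uniform scalar quantizer makes the loss concrete: inverse quantization returns multiples of the step size, so each coefficient can be off by up to half a step. This is a sketch with a fixed, hypothetical `step`; real codecs derive the step size from a quantization parameter (QP) and often use dead-zone rounding or rate-distortion optimized quantization.

```python
def quantize(coeffs, step):
    """Map each coefficient to the nearest integer multiple of step (as a level)."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    """Inverse quantization: scale levels back to approximate coefficients."""
    return [q * step for q in levels]


coeffs = [9.0, -3.2, 0.7, 25.4]   # hypothetical transform coefficients
step = 4
levels = quantize(coeffs, step)
approx = dequantize(levels, step)
errors = [abs(a - c) for a, c in zip(approx, coeffs)]
print(levels, approx)             # [2, -1, 0, 6] [8, -4, 0, 24]
print(max(errors) <= step / 2)    # True: error bounded by half a step
```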
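As one concrete instance of the variable-length coding mentioned above, the sketch below implements the order-0 Exp-Golomb code, which H.264/AVC and H.265/HEVC use for many header syntax elements (descriptor ue(v)); residual data in those standards is instead coded with context-adaptive schemes such as CABAC. Bits are represented as a Python string for readability, and the helper names are illustrative.

```python
def ue_encode(value: int) -> str:
    """Order-0 Exp-Golomb codeword for a non-negative integer, as a bit string."""
    if value < 0:
        raise ValueError("ue(v) codes non-negative integers only")
    info = bin(value + 1)[2:]               # binary of value + 1
    return "0" * (len(info) - 1) + info     # leading-zero prefix, then info bits


def ue_decode(bits: str, pos: int = 0) -> tuple[int, int]:
    """Decode one codeword starting at pos; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    start = pos + zeros
    value = int(bits[start:start + zeros + 1], 2) - 1
    return value, start + zeros + 1


stream = "".join(ue_encode(v) for v in [0, 1, 2, 3, 4])
print(stream)  # 10100110010000101

values, pos = [], 0
while pos < len(stream):
    v, pos = ue_decode(stream, pos)
    values.append(v)
print(values)  # [0, 1, 2, 3, 4]
```

Small values get short codewords, which suits syntax elements whose small values are most frequent.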
Fig. 3 is a schematic diagram of an example decoding process according to some embodiments of the present disclosure. As shown in Fig. 3, process 300B may be a decompression process corresponding to the compression process 200B in Fig. 2.
The decoder may feed the portion of the video bitstream 228 associated with a basic processing unit of an encoded picture (referred to as an "encoded BPU") to the binary decoding stage 302. In the binary decoding stage 302, the decoder may decode that portion into the prediction data 206 and the quantized transform coefficients 216. The decoder may feed the quantized transform coefficients 216 to an inverse quantization stage 218 and an inverse transform stage 220 to generate a reconstructed residual BPU 222. The decoder may feed the prediction data 206 to the prediction stage to generate the predicted BPU 208. The decoder may add the reconstructed residual BPU 222 to the predicted BPU 208 to generate the prediction reference 224. In some implementations, the prediction reference 224 may be stored in a buffer (e.g., a decoded picture buffer in computer memory). The decoder may feed the prediction reference 224 to the prediction stage to perform the prediction operation in the next iteration of process 300B.
The decoder may iteratively perform process 300A to decode each encoded BPU of an encoded picture and generate the prediction reference 224 for decoding the next encoded BPU of the encoded picture. After decoding all encoded BPUs of an encoded picture, the decoder may output the picture to the video stream 304 for display and proceed to decode the next encoded picture in the video bitstream 228.
In process 300B, the prediction stage is divided into a spatial prediction stage 2042 and a temporal prediction stage 2044, and the reconstruction path further includes a loop filter stage 232 and a buffer 234.
The prediction data 206 that the decoder decodes in the binary decoding stage 302 for the encoded basic processing unit (referred to as the "current BPU") of the encoded picture being decoded (referred to as the "current picture") may include various types of data, depending on which prediction mode the encoder used to encode the current BPU. For example, if the encoder used intra prediction to encode the current BPU, the prediction data 206 may include a prediction mode indicator (e.g., a flag value) indicating intra prediction, parameters of the intra prediction operation, and so on. The parameters of the intra prediction operation may include, for example, the locations (e.g., coordinates) of one or more neighboring BPUs used as a reference, the sizes of the neighboring BPUs, extrapolation parameters, the directions of the neighboring BPUs relative to the original BPU, and the like. As another example, if the encoder encoded the current BPU using inter prediction, the prediction data 206 may include a prediction mode indicator (e.g., a flag value) indicating inter prediction, parameters of the inter prediction operation, and so on. The parameters of the inter prediction operation may include, for example, the number of reference pictures associated with the current BPU, weights respectively associated with the reference pictures, locations (e.g., coordinates) of one or more matching regions in the respective reference pictures, one or more motion vectors respectively associated with the matching regions, and so on.
Based on the prediction mode indicator, the decoder may decide whether to perform spatial prediction (e.g., intra prediction) in the spatial prediction stage 2042 or temporal prediction (e.g., inter prediction) in the temporal prediction stage 2044. The details of such spatial or temporal prediction are described with reference to Fig. 2 and are not repeated here. After performing the spatial or temporal prediction, the decoder may generate the predicted BPU 208, and may add the predicted BPU 208 and the reconstructed residual BPU 222 to generate the prediction reference 224.
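The reconstruction step "predicted BPU + reconstructed residual BPU" can be sketched as follows, assuming 2-D integer sample arrays and clipping to the valid range of an assumed bit depth (8 bits by default). The clipping and the name `reconstruct_block` are illustrative assumptions, not details taken from the text above:

```python
from typing import List


def reconstruct_block(pred: List[List[int]], resid: List[List[int]],
                      bit_depth: int = 8) -> List[List[int]]:
    """Add the reconstructed residual to the prediction, clipping to [0, 2^bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
        for pred_row, resid_row in zip(pred, resid)
    ]


pred = [[100, 250], [0, 128]]
resid = [[5, 10], [-3, 0]]
recon = reconstruct_block(pred, resid)  # [[105, 255], [0, 128]]
```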
In process 300B, the decoder may feed the prediction reference 224 to either the spatial prediction stage 2042 or the temporal prediction stage 2044 to perform the prediction operation in the next iteration of process 300B. For example, if the current BPU is decoded using intra prediction in the spatial prediction stage 2042, then after generating the prediction reference 224 (e.g., the decoded current BPU), the decoder may feed the prediction reference 224 directly to the spatial prediction stage 2042 for later use (e.g., to extrapolate the next BPU of the current picture). If the current BPU is decoded using inter prediction in the temporal prediction stage 2044, then after generating the prediction reference 224 (e.g., a reference picture in which all BPUs have been decoded), the decoder may feed the prediction reference 224 to a loop filter stage 232 to reduce or eliminate distortion (e.g., blocking artifacts). The decoder may apply a loop filter to the prediction reference 224 in the manner described with reference to Fig. 2. The loop-filtered reference pictures may be stored in a buffer 234 (e.g., a decoded picture buffer in computer memory) for later use (e.g., as inter-prediction reference pictures for future encoded pictures of the video bitstream 228). The decoder may store one or more reference pictures in the buffer 234 for use in the temporal prediction stage 2044. In some implementations, when the prediction mode indicator of the prediction data 206 indicates that inter prediction was used to encode the current BPU, the prediction data may further include parameters of a loop filter (e.g., loop filter strength).
Fig. 4 is a flowchart of a method for determining a coding mode according to an exemplary embodiment. The method is applied when an encoder selects a coding mode before encoding a CU. The candidate modes may include inter coding modes, such as Skip mode, Merge mode, and AMVP mode, and intra coding modes, such as intra mode and IBC mode. During mode selection, the encoder can compute the rate-distortion optimization (RDO) cost, RDcost, of each candidate mode and select the mode with the smaller RDcost as the preferred coding mode. However, the RDO process of most coding modes is complex, which reduces the encoding efficiency of the encoder.
The method for determining a coding mode provided in the embodiments of this specification improves the mode selection process: it aims to reduce coding complexity and speed up mode selection while preserving coding performance as far as possible.
Before the method is described, its underlying idea is explained. Because the RDO process of most coding modes is relatively complex, the RDcost of a coding mode can be replaced by another cost in the cost comparison between modes. If this other cost can be obtained more simply and quickly than the RDcost, the mode selection process is accelerated.
Take the Merge mode as an example. Its RDO process requires a discrete cosine transform, quantization, entropy coding, and other processing of the prediction residual, so its complexity is high. The SATD (sum of absolute transformed differences) cost of the Merge mode can be used in place of its RDcost, because computing the SATD is simpler and faster than computing the RDcost. For example, the RDcost of the Skip mode may be compared with the SATD of the Merge mode to choose between Skip mode and Merge mode: if the RDcost of the Skip mode is far smaller than the SATD of the Merge mode, it can be inferred that the RDcost of the Skip mode is also smaller than the RDcost of the Merge mode. Since the SATD is cheaper to compute than the RDcost, the SATD cost of the Merge mode can thus stand in for its RDcost in the cost comparison between modes.
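A minimal sketch of a 4 × 4 SATD, assuming the common Hadamard-based definition (sum of the absolute values of the Hadamard-transformed residual). Encoders differ in the block sizes and normalization they use, and `satd4x4` is a hypothetical helper. Unlike RDcost, it needs no quantization or entropy coding, which is why it is cheaper:

```python
from typing import List

# 4x4 Hadamard matrix (symmetric, entries +/-1)
H4 = [[1, 1, 1, 1],
      [1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, -1, 1]]


def satd4x4(orig: List[List[int]], pred: List[List[int]]) -> int:
    """Sum of absolute values of H * (orig - pred) * H^T."""
    d = [[orig[i][j] - pred[i][j] for j in range(4)] for i in range(4)]
    # t = H * D
    t = [[sum(H4[i][k] * d[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
    # c = t * H^T
    c = [[sum(t[i][k] * H4[j][k] for k in range(4)) for j in range(4)] for i in range(4)]
    return sum(abs(v) for row in c for v in row)


flat = satd4x4([[11] * 4 for _ in range(4)], [[10] * 4 for _ in range(4)])
impulse = satd4x4([[1, 0, 0, 0]] + [[0] * 4 for _ in range(3)], [[0] * 4 for _ in range(4)])
```

For the flat all-ones residual the SATD equals the SAD (16), whereas a single isolated error of 1 has SAD 1 but SATD 16; the SATD therefore roughly tracks how costly the residual is to represent after a transform.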
The method for determining a coding mode illustrated in Fig. 4 may be performed by an encoder, specifically during mode selection when the encoder encodes a CU. As shown in Fig. 4, the method may include the following steps:
In step 400, the RDcost of the first coding mode and the SATD of the second coding mode are obtained.
The first coding mode and the second coding mode may be chosen in various ways; this embodiment does not limit which specific modes are used.
Illustratively, the two modes may be two inter coding modes, e.g., Skip mode and AMVP mode; two intra coding modes, e.g., intra mode and IBC mode; or one inter coding mode and one intra coding mode.
In this embodiment, the first coding mode is Skip mode and the second coding mode is Merge mode.
In addition, of the first and second coding modes, the SATD may be calculated for one and the RDcost for the other. For example, because the Skip mode does not require transformation, quantization, or similar operations on the residual, the RDO process of the Merge mode is more complex than that of the Skip mode; the RDcost of the Skip mode and the SATD of the Merge mode can therefore be calculated. As another example, which cost to calculate may be determined by the order in which the candidate coding modes are arranged: if the inter coding mode precedes the intra coding mode, the RDcost of the inter coding mode and the SATD of the intra coding mode may be calculated.
In step 402, an estimated coding cost of the second coding mode is obtained from the SATD and a rate-distortion influencing parameter, which is a parameter associated with (and influencing) the RDcost of the second coding mode.
In this step, to reduce the encoding complexity of the encoder while preserving coding performance as far as possible, after the SATD of the second coding mode is calculated, the estimated coding cost of the second coding mode is obtained by combining the SATD with the rate-distortion influencing parameter, as shown in formula (1):
$$\text{estimated coding cost} = \text{SATD} + \text{rate-distortion influencing parameter} \tag{1}$$
Here, the rate-distortion influencing parameter may be any parameter that influences the RDcost calculation of the second coding mode.
Specifically, as noted above, in the method of this embodiment the SATD of the second coding mode can take the place of its RDcost in the cost comparison between modes. For example, if the RDcost of the Skip mode is much smaller than the SATD of the Merge mode, it is presumed that the RDcost of the Skip mode is also smaller than the RDcost of the Merge mode.
Consider the following cost comparison approaches:
Standard approach: compare the "RDcost of the first coding mode" with the "RDcost of the second coding mode";
Alternative one: compare the "RDcost of the first coding mode" with the "SATD of the second coding mode";
Alternative two: compare the "RDcost of the first coding mode" with the "estimated coding cost of the second coding mode".
With alternative two, the comparison result is closer to that of the standard approach. For example, under the standard approach the result may be that the "RDcost of the first coding mode" is smaller than the "RDcost of the second coding mode", so the first coding mode should be selected as the preferred mode. Under alternative one, the result may be the opposite: the "RDcost of the first coding mode" may be greater than the "SATD of the second coding mode", so the second coding mode would be selected. Under alternative two, the result may be that the "RDcost of the first coding mode" is smaller than the "estimated coding cost of the second coding mode", so the first coding mode is selected, as under the standard approach. Thus, using the "estimated coding cost of the second coding mode" in the mode cost comparison yields a more accurate result than using the "SATD of the second coding mode". A more accurate comparison helps select the optimal coding mode, thereby improving coding performance.
The rate-distortion influencing parameter is a parameter that enters the RDcost calculation of the second coding mode. Introducing it on top of the SATD of the second coding mode brings the resulting estimated coding cost closer to the RDcost of the second coding mode.
For example, when the second coding mode is the Merge mode, the quantization step size Qstep may be introduced on top of the SATD of the Merge mode. The RDO process of the Merge mode transforms, quantizes, and entropy-codes the prediction residual, and the quantization step Qstep used in quantization affects the RDcost of the Merge mode; Qstep can therefore serve as a rate-distortion influencing parameter for the Merge mode. Using "SATD + Qstep" as the estimated coding cost of the Merge mode makes the cost comparison more accurate.
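A sketch of formula (1) with Qstep as the rate-distortion influencing parameter of the Merge mode. The text does not say how Qstep is obtained; the sketch assumes the well-known H.264/HEVC relation Qstep ≈ 2^((QP - 4) / 6), and the function names are hypothetical:

```python
def qstep_from_qp(qp: int) -> float:
    """Approximate H.264/HEVC quantization step size for a given QP (assumption)."""
    return 2.0 ** ((qp - 4) / 6.0)


def estimated_cost_merge(satd: float, qstep: float) -> float:
    """Formula (1): estimated coding cost = SATD + rate-distortion influencing parameter."""
    return satd + qstep


cost = estimated_cost_merge(100.0, qstep_from_qp(28))  # 100 + 16 = 116.0
```

Because Qstep doubles every 6 QP steps, the added term grows with coarser quantization, where the bits saved by Skip mode matter more.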
As another example, the first coding mode is Skip mode and the second coding mode is AMVP mode. Since the RDO process of the AMVP mode is more complex than that of the Skip mode, the RDcost of the Skip mode and the SATD of the AMVP mode can be calculated. After the SATD of the AMVP mode is obtained, the estimated coding cost of the AMVP mode may be calculated according to formula (2):
$$\text{estimated coding cost}_{AMVP} = \text{SATD}_{AMVP} + \mathrm{Qstep}_{AMVP} + \mathrm{MV}_{AMVP} \tag{2}$$
In formula (2), the rate-distortion influencing parameters of the AMVP mode may include Qstep_AMVP and MV_AMVP (motion vector), both of which may affect the RDcost of the AMVP mode. The MV characterizes the intensity of object motion between the previous frame image and the next frame image.
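Formula (2) adds a vector-valued term to scalar costs, and the text does not specify how the MV is turned into a number. The sketch below assumes the L1 magnitude |mv_x| + |mv_y| of the AMVP motion vector; this scalarization and the function names are hypothetical:

```python
from typing import Tuple


def mv_magnitude(mv: Tuple[int, int]) -> int:
    """Hypothetical scalarization of the motion vector: L1 norm |mv_x| + |mv_y|."""
    mv_x, mv_y = mv
    return abs(mv_x) + abs(mv_y)


def estimated_cost_amvp(satd: float, qstep: float, mv: Tuple[int, int]) -> float:
    """Formula (2): SATD_AMVP + Qstep_AMVP + MV_AMVP (MV term scalarized as above)."""
    return satd + qstep + mv_magnitude(mv)


cost_amvp = estimated_cost_amvp(200.0, 16.0, (-3, 4))  # 200 + 16 + 7 = 223.0
```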
In one example, when the estimated coding cost of the second coding mode is determined from the SATD and the rate-distortion influencing parameter, a weighting coefficient may be set for each of them so that they are combined as a weighted sum, as in formula (3):
$$\text{estimated coding cost} = \alpha \cdot \text{SATD} + \beta \cdot \text{rate-distortion influencing parameter} \tag{3}$$
Here, α may be called the first weight coefficient and β the second weight coefficient; "α × SATD" may be called the adjusted SATD, and "β × rate-distortion influencing parameter" the adjusted rate-distortion influencing parameter. This embodiment does not limit the values of α and β; for example, α may be 0.1 and β may be 0.3.
This embodiment does not limit the specific form of formula (3), which may vary in implementation. For example, α and β may be squared, or adjusted in other ways.
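Formula (3) as a function, using the example weights α = 0.1 and β = 0.3 from the text as defaults. The function name is hypothetical, and the rate-distortion influencing parameter is passed in as a single number (for instance Qstep):

```python
def estimated_cost_weighted(satd: float, rd_param: float,
                            alpha: float = 0.1, beta: float = 0.3) -> float:
    """Formula (3): alpha * SATD + beta * rate-distortion influencing parameter."""
    adjusted_satd = alpha * satd       # the "adjusted SATD"
    adjusted_param = beta * rd_param   # the "adjusted rate-distortion influencing parameter"
    return adjusted_satd + adjusted_param


cost_w = estimated_cost_weighted(1000.0, 16.0)  # 0.1 * 1000 + 0.3 * 16 = 104.8
```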
In step 404, in response to the RDcost of the first coding mode being less than the estimated coding cost of the second coding mode, the first coding mode is selected as the mode used for encoding the target coding unit.
In this step, the RDcost of the first coding mode obtained in step 400 may be compared with the estimated coding cost of the second coding mode obtained in step 402; if the former is smaller, the first coding mode may be selected as the mode used for coding the CU. For example, if the RDcost of the Skip mode is less than the estimated coding cost of the Merge mode, Skip mode may be selected as the mode for coding the CU.
Conversely, if the RDcost of the first coding mode is greater than the estimated coding cost of the second coding mode, the RDcost of the second coding mode may be calculated, the RDcosts of the two modes compared, and the mode with the smaller RDcost selected for coding the CU. For example, if the RDcost of the Skip mode is greater than the estimated coding cost of the Merge mode, the RDcost of the Merge mode may then be calculated, and the mode with the smaller RDcost selected as the preferred mode for coding the CU.
Further, in another example, when the RDcost of the first coding mode is smaller than the estimated coding cost of the second coding mode, the difference between the two, "estimated coding cost - RDcost", may also be calculated and compared with a preset difference threshold x. If the difference is greater than or equal to x, the first coding mode is determined as the mode for coding the CU. If the difference is smaller than x, the RDcost of the second coding mode may be calculated, the RDcosts of the two modes compared, and the mode with the smaller RDcost selected for coding the CU. This makes the cost comparison result more accurate.
In the method for determining a coding mode of this embodiment, the second coding mode, whose RDO process is more complex, participates in the cost comparison between modes through its SATD cost instead of its RDcost, and the first coding mode is selected directly when its RDcost is smaller than the estimated coding cost of the second coding mode. In that case the RDcost of the second coding mode need not be calculated, i.e., the more complex RDO process is not executed for the second coding mode, which reduces the encoding complexity of the encoder. Moreover, the estimated coding cost of the second coding mode is based not only on the SATD but also on rate-distortion influencing parameters that affect the RDcost calculation of the second coding mode, so it reflects that RDcost more accurately. The cost comparison between modes is therefore more accurate and the coding mode is selected more accurately; the method thus reduces coding complexity and improves encoding efficiency while preserving coding performance as far as possible.
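The decision rule of step 404 and its threshold variant can be sketched as below. The caller supplies the expensive RDO of the second mode as a callable so that it runs only when the early decision fails. The names, the callable interface, and the choice to keep the first mode on an exact RDcost tie are assumptions:

```python
from typing import Callable, Optional


def choose_mode(rdcost_first: float, est_cost_second: float,
                full_rdcost_second: Callable[[], float],
                diff_threshold: Optional[float] = None) -> str:
    """Return "first" or "second", running the second mode's full RDO only if needed."""
    margin = est_cost_second - rdcost_first
    if margin > 0 and (diff_threshold is None or margin >= diff_threshold):
        return "first"  # early decision: full RDO of the second mode is skipped
    # Fallback: compute the real RDcost of the second mode and compare RDcosts.
    rdcost_second = full_rdcost_second()
    return "first" if rdcost_first <= rdcost_second else "second"
```

Without a threshold, any positive margin triggers the early decision; with `diff_threshold` set, small margins fall back to the full comparison, trading speed for accuracy.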
Fig. 5 is a flowchart of another method for determining a coding mode according to an exemplary embodiment. The method is described using the selection between the Skip mode and the Merge mode as an example, but it is also applicable to selection between other pairs of modes. As shown in Fig. 5, the method may include the following steps:
In step 500, the initial RDcost of the Skip mode and the initial SATD of the Merge mode are obtained.
In this step, the initial RDcost of the Skip mode and the initial SATD of the Merge mode may be obtained using conventional SATD and RDcost calculation methods, which are not described in detail in this embodiment.
For example, the RDcost of the Skip mode is calculated by a conventional RDcost calculation method. To distinguish it by name from the RDcost appearing in subsequent steps, it is called the initial RDcost; the RDcost in subsequent steps is obtained by normalizing (scaling) the initial RDcost. Similarly, the initial SATD of the Merge mode is obtained by a conventional SATD calculation method.
In this step, the initial SATD of the Merge mode may, for example, be calculated first, followed by the initial RDcost of the Skip mode.
In step 502, the initial RDcost and the initial SATD are normalized based on the size of the target coding unit, so as to obtain the RDcost of the first coding mode and the SATD of the second coding mode.
The target coding unit may be a CU to be coded.
In this embodiment, the initial RDcost and the initial SATD may be normalized to the same scale based on the CU size. Normalizing by CU size reflects the coding cost per pixel, which largely removes the influence of CU size on the cost values and makes the cost comparison more accurate.
For example, the above initial RDcost and initial SATD may be normalized according to the following formula (4) and formula (5), respectively:
In formulas (4) and (5), CUsize is the size of the target coding unit, SATD is the initial SATD, and skip_RDcost is the initial RDcost; SATD_scale is the SATD of the Merge mode obtained by normalizing the initial SATD according to formula (4), and likewise skip_RDcost_scale is the RDcost of the Skip mode obtained by normalizing the initial RDcost according to formula (5).
In step 504, the coding modes of at least one associated coding unit of the target coding unit are obtained, where each associated coding unit has a temporal or spatial coding correlation with the target coding unit.
In this embodiment, after the RDcost of the first coding mode and the SATD of the second coding mode are obtained, the cost comparison between the modes may be performed according to formula (6):
$$\text{skip\_RDcost}_{scale} < \alpha \cdot \text{SATD}_{scale} + \beta \cdot \mathrm{Qstep} \tag{6}$$
In the subsequent steps of this embodiment, it is determined whether formula (6) holds. For example, α may be 0.1 and β may be 0.01 + offset. Here α is the first weight coefficient and β the second weight coefficient; the rate-distortion influencing parameter in this embodiment is Qstep, "β × Qstep" is the adjusted rate-distortion influencing parameter, "α × SATD_scale" is the adjusted SATD, and "α × SATD_scale + β × Qstep" is the estimated coding cost of the second coding mode, Merge.
The value of the second weight coefficient β is then determined through steps 504 to 508. As noted above, β may be 0.01 + offset, so the value of offset must be determined first.
In this step, the coding modes of at least one associated coding unit of the target coding unit may be obtained. An associated coding unit is a CU that has a temporal or spatial coding correlation with the target coding unit.
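A sketch of the check in formula (6). Formulas (4) and (5) are not reproduced in this text, so the sketch assumes that normalization divides each cost by the number of pixels in the CU (CUsize = width × height), consistent with the per-pixel interpretation given for step 502. α defaults to the example value 0.1, and all names are hypothetical:

```python
def skip_early_decision(skip_rdcost: float, merge_satd: float,
                        cu_width: int, cu_height: int,
                        qstep: float, beta: float, alpha: float = 0.1) -> bool:
    """True if formula (6) holds, i.e. Skip can be chosen without Merge's full RDO."""
    cu_size = cu_width * cu_height           # assumption: CUsize = pixel count
    skip_rdcost_scale = skip_rdcost / cu_size  # assumed form of formula (5)
    satd_scale = merge_satd / cu_size          # assumed form of formula (4)
    return skip_rdcost_scale < alpha * satd_scale + beta * qstep
```

For a 16 × 16 CU with skip_RDcost = 512 and SATD = 2560, the normalized values are 2.0 and 10.0; with Qstep = 16 the check passes for β = 0.1 (2.0 < 2.6) but fails for β = 0.01 (2.0 ≥ 1.16), which shows how β steers the decision.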
For example, within the same coding layer, a CU closer to the target coding unit is more likely to adopt the same coding mode as the target coding unit; that is, CUs within a certain distance of the target coding unit are likely to share its coding mode. These CUs may be called neighboring coding units that have a spatial coding correlation with the target coding unit. Referring to their coding modes when determining the coding mode of the target coding unit helps make the decision more accurate. For example, if the four neighboring coding units of the target coding unit were all coded in Skip mode, the target coding unit is highly likely to be best coded in Skip mode as well.
Fig. 6 illustrates the positions of five neighboring coding units: the CUs at the left (L), above (A), above-left (AL), below-left (BL), and above-right (AR) positions of the current CU (the target coding unit). This embodiment does not limit the number or positions of the neighboring coding units; for example, there may be three neighboring coding units, and positions other than those in Fig. 6 may be used, such as the CU to the left of L.
As another example, the associated coding unit may be a parent coding unit in a previous coding layer of the target coding unit. For example, if a 64 × 64 CU is split into 32 × 32 CUs and a 32 × 32 CU is the target coding unit, the 64 × 64 CU is its parent CU in the previous coding layer, i.e., the CU from which it was split. The parent CU in the immediately preceding coding layer may be referenced, or the parent CUs in the two preceding coding layers: for example, if a 64 × 64 CU is split into 32 × 32 CUs, a 32 × 32 CU is split into 16 × 16 CUs, and a 16 × 16 CU is the target coding unit, then the 32 × 32 CU and the 64 × 64 CU are its parent CUs in the two preceding coding layers.
As another example, the associated coding unit may be a temporally co-located block of the target coding unit, i.e., the CU at the same position in the image frame preceding the current image frame in which the target coding unit is located. This takes the temporal correlation of coding into account: the co-located block is also likely to adopt the same coding mode as the target coding unit. There may be one or more temporally co-located blocks; for example, the CU at the same position in the previous frame (one co-located block) or the CUs at the same position in the previous two frames (two co-located blocks) may be referenced.
In this step, the at least one associated coding unit may include at least one of the neighboring coding units, the parent CU, and the temporally co-located block. For example, if the coding modes of five associated coding units are to be obtained, the five associated coding units may be composed as follows:
For example, the five associated coding units may be five neighboring coding units, such as those at the positions illustrated in Fig. 6.
As another example, the five associated coding units may be four neighboring coding units and one parent CU.
As another example, the five associated coding units may be three neighboring coding units, one parent CU, and one temporally co-located block.
In addition, after determining the coding mode of a CU, the encoder may store it. In this embodiment, several positions may therefore be predetermined, for example the five positions illustrated in Fig. 6, and the encoder may check whether a neighboring coding unit exists at each of these positions. For each one that exists, the previously stored coding mode of that CU is read, yielding the coding modes of the associated coding units.
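A sketch of reading stored modes at the five neighbor positions of Fig. 6. The text gives no exact coordinates, so the sketch assumes sample positions modeled on HEVC's spatial merge candidates and a mode map stored at 4 × 4-sample granularity; the map layout, position formulas, and names are all hypothetical. Positions outside the picture or not yet coded are simply absent from the result:

```python
from typing import Dict, Tuple

# Hypothetical store: (grid_x, grid_y) of a coded 4x4 unit -> its coding mode
ModeMap = Dict[Tuple[int, int], str]


def neighbor_positions(x: int, y: int, w: int, h: int) -> Dict[str, Tuple[int, int]]:
    """Assumed sample positions of the L, A, AL, BL, AR neighbours of a w x h CU at (x, y)."""
    return {
        "L": (x - 1, y + h - 1),
        "A": (x + w - 1, y - 1),
        "AL": (x - 1, y - 1),
        "BL": (x - 1, y + h),
        "AR": (x + w, y - 1),
    }


def collect_neighbor_modes(x: int, y: int, w: int, h: int,
                           mode_map: ModeMap, grid: int = 4) -> Dict[str, str]:
    """Return the stored mode of every available neighbour, keyed by position name."""
    modes = {}
    for name, (px, py) in neighbor_positions(x, y, w, h).items():
        if px < 0 or py < 0:
            continue  # beyond the left or top picture edge
        mode = mode_map.get((px // grid, py // grid))
        if mode is not None:  # None: not coded yet or outside the picture
            modes[name] = mode
    return modes
```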
In step 506, the bias parameters (offsets) of the at least one associated coding unit are determined from their respective coding modes.
The offset is computed according to formula (7):

$$\mathrm{offset} = \sum_{i} \mathrm{offset}_i \tag{7}$$

That is, the value of offset is obtained from the offsets of the individual associated coding units, where offset_i is the offset value corresponding to the i-th associated coding unit.
Table 1 below lists the offset value corresponding to each mode.
Table 1 Offset corresponding to each mode
Mode type | Skip mode | Merge mode | AMVP mode | Intra/IBC mode |
offset | 0.01 | -0.16 | -0.16 | -0.03 |
After the coding modes of the at least one associated coding unit stored in the encoder are obtained, the offset of each associated coding unit can be looked up in Table 1.
It should be noted that the larger β is, the more easily formula (6) is satisfied and the more readily skip is selected as the preferred mode; hence the offset for the skip mode may be set to a positive number and the offsets for the other modes to negative numbers. Table 1 is only an illustration: this embodiment does not limit the offset values, and values other than those in Table 1 may be used. The specific values may be tuned according to the frame rate gain and performance loss observed after mode selection is performed on the video sequence in which the target coding unit is located.
In addition, the encoder may store multiple mode-to-offset mapping tables similar to Table 1 and, when different pairs of coding modes are compared, select the table corresponding to the modes being compared. For example, for a selection between Merge and AMVP, the table used may assign a positive offset to the Merge mode and negative offsets to the other modes, with the specific values tuned on the video sequence according to the frame rate gain and performance loss.
In step 508, the second weight coefficient is obtained by combining the bias parameters of the at least one associated coding unit.
For example, after the coding modes of the five neighboring coding units in Fig. 6 are obtained, their offsets can be looked up in Table 1 and summed according to formula (7) to obtain offset. The value of β, i.e., the second weight coefficient, is then obtained as β = 0.01 + offset.
It will be understood that formula (7) and the relationship β = 0.01 + offset may be varied and are not limited to the examples above. For example, when the offsets offset_i of the individual coding modes are added, each offset_i may be assigned a different weight; and when β is obtained from offset, offset may be multiplied by a weight coefficient, and so on.
In step 510, an adjusted rate-distortion influencing parameter is obtained from the second weight coefficient and the rate-distortion influencing parameter, and an adjusted SATD is obtained from the first weight coefficient and the SATD.
For example, the adjusted rate-distortion influencing parameter is obtained as "β × Qstep" and the adjusted SATD as "α × SATD_scale".
In step 512, the estimated coding cost of the second coding mode is determined from the adjusted SATD and the adjusted rate-distortion influencing parameter.
According to formula (6), "α × SATD_scale + β × Qstep" is the estimated coding cost of the second coding mode, Merge.
Further, in the above example, the neighboring coding units at the five positions illustrated in Fig. 6 may be checked. If any of them is missing, for example because the target coding unit lies at the edge of the image frame so that there is no neighboring coding unit at the below-left (BL) position, or none at the above-left (AL) and left (L) positions shown in Fig. 6, the second weight coefficient β may be set to 0.
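The offset and β computation described above, together with the missing-neighbor rule, can be sketched as follows, using the Table 1 offsets, a plain sum for formula (7), and β = 0.01 + offset. The lowercase mode keys and the function name are hypothetical, and the Table 1 values are the illustrative ones from the text rather than tuned values:

```python
from typing import Sequence

# Table 1: offset per coding mode of an associated coding unit (illustrative values)
OFFSET_TABLE = {"skip": 0.01, "merge": -0.16, "amvp": -0.16, "intra": -0.03, "ibc": -0.03}


def second_weight(neighbor_modes: Sequence[str], expected_count: int = 5) -> float:
    """beta = 0.01 + sum of per-neighbour offsets, or 0 if any neighbour is missing."""
    if len(neighbor_modes) < expected_count:
        return 0.0  # some predetermined neighbour position is unavailable
    offset = sum(OFFSET_TABLE[mode] for mode in neighbor_modes)  # formula (7) as a plain sum
    return 0.01 + offset
```

Five Skip neighbors give β = 0.06, nudging the decision toward Skip; a mix such as three Skip, one Merge, and one intra neighbor gives β = -0.15, which makes formula (6) harder to satisfy.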
In step 514, in response to the RDcost of the first coding mode being less than the estimated coding cost of the second coding mode, the first coding mode is selected as the mode used for encoding the target coding unit.
For example, if the condition of equation (6) is satisfied, that is, the RDcost of the Skip mode, $\mathrm{skip\_RDcost}_{scale}$, is less than the estimated coding cost of the Merge mode, $\alpha \times \mathrm{SATD}_{scale} + \beta \times Q_{step}$, the Skip mode may be selected as the coding mode of the target coding unit.
Conversely, if the condition of equation (6) is not satisfied, the RDcost of the Merge mode may be computed, the RDcosts of the Skip and Merge modes compared, and the mode with the smaller RDcost selected as the preferred coding mode of the target coding unit.
The coding mode determination method of this embodiment reduces encoder complexity, makes coding mode selection more accurate, and improves coding performance. In addition, because it selects the mode of the target coding unit with reference to the coding modes of the target coding unit's associated coding units, it takes temporal or spatial correlation in the coding process into account, which makes the cost comparison, and therefore the mode selection, more accurate still.
Fig. 7 is a schematic structural diagram of a coding mode determination apparatus according to an exemplary embodiment. The apparatus may be used to implement the coding mode determination method of any embodiment of this specification. Its structure is briefly described below; for the implementation principle of each module, refer to the method embodiments. As shown in Fig. 7, the apparatus may include a cost obtaining module 71, a cost adjusting module 72, and a mode determination module 73.
The cost obtaining module 71 is configured to obtain the rate-distortion optimization cost (RDcost) of the first coding mode and the sum of absolute transformed differences (SATD) of the second coding mode.
The cost adjusting module 72 is configured to obtain the estimated coding cost of the second coding mode from the SATD and a rate-distortion impact parameter, the rate-distortion impact parameter being a parameter associated with the RDcost of the second coding mode.
The mode determination module 73 is configured to select the first coding mode as the mode for encoding the target coding unit in response to the RDcost of the first coding mode being less than the estimated coding cost of the second coding mode.
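The text does not say how the SATD obtained by the cost obtaining module 71 is computed. A common choice in HEVC-style encoders is to apply a Hadamard transform to the prediction residual and sum the absolute coefficients; the sketch below does this for a single 4x4 block using plain Python lists. Real encoders typically tile larger blocks into 4x4 or 8x8 sub-blocks and may apply an extra scaling, which is omitted here.

```python
from typing import List

Block = List[List[int]]

# 4x4 Hadamard matrix (entries +/-1, symmetric, so H^T == H).
H4 = [[1, 1, 1, 1],
      [1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, -1, 1]]


def matmul4(a: Block, b: Block) -> Block:
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd4x4(orig: Block, pred: Block) -> int:
    """Sum of absolute Hadamard-transformed residual coefficients (unscaled)."""
    residual = [[orig[i][j] - pred[i][j] for j in range(4)] for i in range(4)]
    coeffs = matmul4(matmul4(H4, residual), H4)  # H * D * H^T
    return sum(abs(c) for row in coeffs for c in row)


flat_residual = satd4x4([[2] * 4 for _ in range(4)], [[1] * 4 for _ in range(4)])
print(flat_residual)  # 16: all energy lands in the DC coefficient
```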
In one example, the cost adjusting module 72, when obtaining the estimated coding cost of the second coding mode from the SATD and the rate-distortion impact parameter, is configured to: obtain an adjusted SATD from the first weight coefficient and the SATD; obtain an adjusted rate-distortion impact parameter from the second weight coefficient and the rate-distortion impact parameter; and determine the estimated coding cost of the second coding mode based on the adjusted SATD and the adjusted rate-distortion impact parameter.
In one example, the cost adjusting module 72, when obtaining the adjusted rate-distortion impact parameter from the second weight coefficient and the rate-distortion impact parameter, is configured to: obtain the coding mode of each of at least one associated coding unit of the target coding unit, where each associated coding unit has a temporal or spatial coding correlation with the target coding unit; determine a bias parameter for each associated coding unit according to its coding mode; and combine the bias parameters of the associated coding units to obtain the second weight coefficient.
In one example, the associated coding unit includes at least one of: an adjacent coding unit on the same coding layer as the target coding unit; a parent coding unit of a previous coding layer of the target coding unit; a time-domain co-located block of the target coding unit.
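One possible way to form the second weight coefficient from the associated coding units is sketched below. The bias values in `MODE_BIAS` and the choice to combine them by summation are hypothetical; the text only says that a bias parameter is determined per associated unit and the biases are then combined. The signs follow one plausible reading: a larger β raises the Merge estimate and so makes the early Skip decision more likely, so Skip-coded associated units get the largest bias.

```python
from typing import Iterable

# Hypothetical per-mode bias parameters (values are not given in the text).
MODE_BIAS = {"skip": 0.25, "merge": 0.125, "inter": 0.0, "intra": 0.0}


def second_weight(associated_modes: Iterable[str], base: float = 0.0) -> float:
    """Combine the bias parameters of the associated coding units into beta.

    associated_modes: coding modes of same-layer neighbors, the parent CU,
    and/or the temporally co-located block; unknown modes contribute 0.
    """
    return base + sum(MODE_BIAS.get(mode, 0.0) for mode in associated_modes)


# Two Skip-coded neighbors and a Merge-coded parent CU
print(second_weight(["skip", "skip", "merge"]))  # 0.625
```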
In one example, the cost adjusting module 72, when obtaining the adjusted rate-distortion impact parameter from the second weight coefficient and the rate-distortion impact parameter, is configured to: detect the neighboring coding units at a plurality of preset positions on the same coding layer as the target coding unit; and set the second weight coefficient to 0 if a neighboring coding unit is missing at any of these positions.
In one example, the cost obtaining module 71, when obtaining the RDcost of the first coding mode and the SATD of the second coding mode, is configured to: obtain an initial RDcost of the first coding mode and an initial SATD of the second coding mode; and normalize the initial RDcost and the initial SATD based on the size of the target coding unit to obtain the RDcost of the first coding mode and the SATD of the second coding mode.
In one example, the mode determination module 73, when selecting the first coding mode as the mode for encoding the target coding unit in response to the RDcost of the first coding mode being less than the estimated coding cost of the second coding mode, is configured to: if the RDcost of the first coding mode is less than the estimated coding cost of the second coding mode, obtain the difference between the estimated coding cost and the RDcost; and, in response to the difference reaching a difference threshold, select the first coding mode as the mode for encoding the target coding unit.
In one example, the mode determination module 73 is further configured to: obtain the RDcost of the second coding mode in response to the difference being less than the difference threshold; and select, from the first and second coding modes, the one with the smaller RDcost as the mode for encoding the target coding unit.
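A minimal sketch of the size normalization, assuming it means dividing each raw cost by the number of samples in the coding unit. The text only says the normalization is "based on the size of the target coding unit", so per-sample division is an assumption; an encoder might instead use a bit shift by log2 of the block area.

```python
from typing import Tuple


def normalize_costs(initial_skip_rdcost: float, initial_merge_satd: float,
                    width: int, height: int) -> Tuple[float, float]:
    """Return (RDcost_scale, SATD_scale): both raw costs divided by the CU area."""
    area = width * height  # number of luma samples in the CU (assumed normalizer)
    return initial_skip_rdcost / area, initial_merge_satd / area


# Costs of a 64x64 CU become comparable with those of an 8x8 CU.
print(normalize_costs(8192.0, 4096.0, 64, 64))  # (2.0, 1.0)
```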
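The threshold variant can be sketched as follows. `diff_threshold` and `merge_full_rdcost` are hypothetical names: the first stands for the difference threshold, whose value the text does not give, and the second for the encoder's full Merge RD evaluation. "Reaching" the threshold is read here as greater than or equal to, and RDcost ties go to Skip; both are assumptions.

```python
from typing import Callable


def choose_with_threshold(skip_rdcost: float, merge_estimate: float,
                          diff_threshold: float,
                          merge_full_rdcost: Callable[[], float]) -> str:
    """Early Skip only when the Merge estimate beats Skip by a clear margin."""
    if skip_rdcost < merge_estimate and merge_estimate - skip_rdcost >= diff_threshold:
        # Difference reaches the threshold: select Skip directly.
        return "skip"
    # Margin too small (or Skip not cheaper): compute the full Merge RDcost
    # and keep whichever mode has the smaller RDcost.
    merge_rdcost = merge_full_rdcost()
    return "skip" if skip_rdcost <= merge_rdcost else "merge"


print(choose_with_threshold(5.0, 10.0, 2.0, lambda: 1.0))  # skip
```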
Fig. 8 is a flowchart of a method for selecting an inter-prediction coding mode according to an exemplary embodiment. As shown in Fig. 8, the method may include the following steps:
In step 800, the RDcost of the first inter-prediction coding mode and the SATD of the second inter-prediction coding mode are obtained.
For example, the first inter-prediction coding mode may be the Skip mode and the second the Merge mode. Implementations are not limited to these; other inter-prediction coding modes may also be used.
In step 802, the estimated coding cost of the second inter-prediction coding mode is obtained from the SATD and the rate-distortion impact parameter, the rate-distortion impact parameter being a parameter associated with the RDcost of the inter-prediction coding mode.
For example, this step may compute the estimated coding cost of the Merge mode from the SATD and the rate-distortion impact parameter of the Merge mode. In this embodiment, the rate-distortion impact parameter of the Merge mode may be $Q_{step}$.
From the SATD and $Q_{step}$ of the Merge mode, the estimated coding cost of the Merge mode can then be obtained.
For example, the estimated coding cost of the Merge mode may be $\alpha \times \mathrm{SATD} + \beta \times Q_{step}$, where α is referred to as the first weight coefficient and β as the second weight coefficient. The values of α and β may be determined as described in the preceding method embodiments.
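The text does not state how $Q_{step}$ is obtained. In H.264/AVC and HEVC the quantization step roughly doubles for every increase of 6 in QP and equals 1 at QP 4, so a common approximation is $Q_{step} \approx 2^{(QP-4)/6}$. The sketch below uses that approximation, which is an assumption rather than part of the described method, together with illustrative α and β values.

```python
def q_step_from_qp(qp: int) -> float:
    """Approximate H.264/HEVC quantization step: 1.0 at QP 4, doubling every 6 QP."""
    return 2.0 ** ((qp - 4) / 6.0)


def merge_estimated_cost(satd: float, qp: int, alpha: float, beta: float) -> float:
    """Estimated Merge cost = alpha * SATD + beta * Q_step (step 802)."""
    return alpha * satd + beta * q_step_from_qp(qp)


print(q_step_from_qp(28))                          # 16.0
print(merge_estimated_cost(100.0, 28, 1.0, 0.5))   # 108.0
```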
In step 804, if the RDcost of the first inter-prediction coding mode is less than the estimated coding cost of the second inter-prediction coding mode, the first inter-prediction coding mode is selected as the inter-prediction coding mode of the target coding unit.
For example, if RDcost of Skip mode is less than the estimated coding cost of Merge mode, Skip mode may be selected as the coding mode for inter-prediction of the target coding unit.
In addition, the above flow is only an example. In another example, the RDcost of the Skip mode and the SATD of the Merge mode may first be normalized based on the CU size; the normalized RDcost of the Skip mode is then compared with the estimated coding cost of the Merge mode, and if it is smaller, the Skip mode may be selected as the inter-prediction coding mode of the target coding unit.
An exemplary embodiment of the present specification also provides a schematic block diagram of an apparatus. On the hardware level, the device includes a processor, an internal bus, a network interface, a memory, and a non-volatile memory, although other hardware may be included. One or more embodiments of the present description may be implemented on a software basis, such as by a processor reading a corresponding computer program from a non-volatile memory into a memory and then running the computer program. Of course, besides the software implementation, the one or more embodiments in this specification do not exclude other implementations, such as logic devices or combination of software and hardware, and so on, that is, the execution subject of the following processing flow is not limited to each logic unit, and may also be hardware or logic devices.
An exemplary embodiment of the present specification further provides a computer-readable storage medium on which computer instructions are stored, the instructions, when executed by a processor, implement the method for determining the encoding mode of any of the embodiments of the present specification.
The systems, apparatuses, modules or units described in the above embodiments may be specifically implemented by a computer chip or an entity, or implemented by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement the information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage media or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing description of specific embodiments has been presented for purposes of illustration and description. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of the present description to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments herein. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
The above description is only for the purpose of illustrating the preferred embodiments of the one or more embodiments of the present disclosure, and is not intended to limit the scope of the one or more embodiments of the present disclosure, and any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the one or more embodiments of the present disclosure should be included in the scope of the one or more embodiments of the present disclosure.
Claims (13)
1. A method for determining a coding mode, the method comprising:
obtaining a rate-distortion optimization cost RDcost of a first coding mode and a sum of absolute transformed differences SATD of a second coding mode;
obtaining an estimated coding cost of the second coding mode according to the SATD and a rate-distortion impact parameter, the rate-distortion impact parameter being a parameter associated with the RDcost of the second coding mode;
in response to the RDcost of the first coding mode being less than the estimated coding cost of the second coding mode, selecting the first coding mode as a mode for use in encoding a target coding unit.
2. The method of claim 1,
the first encoding mode is Skip mode and the second encoding mode is Merge mode;
the rate-distortion impact parameter is the quantization step size $Q_{step}$.
3. The method of claim 1, wherein obtaining the estimated coding cost of the second coding mode according to the SATD and the rate-distortion impact parameter comprises:
obtaining an adjusted SATD according to a first weight coefficient and the SATD;
obtaining an adjusted rate-distortion impact parameter according to a second weight coefficient and the rate-distortion impact parameter;
determining the estimated coding cost of the second coding mode based on the adjusted SATD and the adjusted rate-distortion impact parameter.
4. The method of claim 3, wherein obtaining the adjusted rate-distortion impact parameter according to the second weight coefficient and the rate-distortion impact parameter comprises:
obtaining the coding mode of each of at least one associated coding unit of the target coding unit, wherein each associated coding unit has a temporal or spatial coding correlation with the target coding unit;
determining a bias parameter for each of the at least one associated coding unit according to its coding mode;
and combining the bias parameters of the at least one associated coding unit to obtain the second weight coefficient.
5. The method of claim 4,
the associated coding unit comprises at least one of the following:
an adjacent coding unit on the same coding layer as the target coding unit;
a parent coding unit of a previous coding layer of the target coding unit;
a time-domain co-located block of the target coding unit.
6. The method of claim 3, wherein obtaining the adjusted rate-distortion impact parameter according to the second weight coefficient and the rate-distortion impact parameter comprises:
detecting adjacent coding units at a predetermined plurality of positions on the same coding layer of the target coding unit;
setting the second weight coefficient to 0 in response to a case where there is a missing neighboring coding unit at the plurality of positions.
7. The method according to any one of claims 1 to 6, wherein obtaining the rate-distortion optimization cost RDcost of the first coding mode and the sum of absolute transformed differences SATD of the second coding mode comprises:
acquiring an initial RDcost of a first coding mode and an initial SATD of a second coding mode;
and normalizing the initial RDcost and the initial SATD based on the size of the target coding unit to obtain the RDcost of the first coding mode and the SATD of the second coding mode.
8. The method of claim 1, further comprising:
obtaining the RDcost of the second coding mode in response to the RDcost of the first coding mode being greater than the estimated coding cost of the second coding mode;
and according to the RDcost of the first coding mode and the RDcost of the second coding mode, selecting a mode with smaller RDcost from the first coding mode and the second coding mode as a mode used for coding the target coding unit.
9. The method of claim 1, wherein selecting the first coding mode as a mode for encoding a target coding unit in response to the RDcost of the first coding mode being less than an estimated coding cost of a second coding mode comprises:
under the condition that the RDcost of the first coding mode is smaller than the estimated coding cost of the second coding mode, acquiring a difference value between the estimated coding cost and the RDcost;
in response to the difference reaching a difference threshold, selecting the first encoding mode as a mode for encoding a target coding unit.
10. The method of claim 9, further comprising:
obtaining the RDcost of the second coding mode in response to the difference being smaller than the difference threshold;
and according to the RDcost of the first coding mode and the RDcost of the second coding mode, selecting a mode with smaller RDcost from the first coding mode and the second coding mode as a mode used for coding the target coding unit.
11. An apparatus for determining a coding mode, the apparatus comprising:
a cost obtaining module, configured to obtain a rate-distortion optimization cost RDcost of a first coding mode and a sum of absolute transformed differences SATD of a second coding mode;
a cost adjusting module, configured to obtain an estimated coding cost of the second coding mode according to the SATD and a rate-distortion impact parameter, the rate-distortion impact parameter being a parameter associated with the RDcost of the second coding mode;
a mode determination module to select the first coding mode as a mode to use for encoding a target coding unit in response to the RDcost of the first coding mode being less than the estimated coding cost of the second coding mode.
12. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method of any one of claims 1-10 by executing the executable instructions.
13. A method for selecting an inter-prediction coding mode, the method comprising:
acquiring the RDcost of a first inter-frame prediction coding mode and the SATD of a second inter-frame prediction coding mode;
obtaining an estimated coding cost of the second inter-frame prediction coding mode according to the SATD and a rate-distortion impact parameter, the rate-distortion impact parameter being a parameter associated with the RDcost of the inter-frame prediction coding mode;
selecting the first inter-prediction encoding mode as the inter-prediction encoding mode of the target encoding unit if the RDcost of the first inter-prediction encoding mode is less than the estimated encoding cost of the second inter-prediction encoding mode.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210288519.9A CN114760467A (en) | 2022-03-22 | 2022-03-22 | Method and device for determining coding mode |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210288519.9A CN114760467A (en) | 2022-03-22 | 2022-03-22 | Method and device for determining coding mode |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114760467A true CN114760467A (en) | 2022-07-15 |
Family
ID=82328019
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210288519.9A Pending CN114760467A (en) | 2022-03-22 | 2022-03-22 | Method and device for determining coding mode |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114760467A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118301323A (en) * | 2024-04-01 | 2024-07-05 | 摩尔线程智能科技(北京)有限责任公司 | Coding unit CU coding mode determining method and device, electronic equipment and storage medium |
2022-03-22: CN202210288519.9A (CN114760467A), active, Pending
Similar Documents
Publication | Title |
---|---|
TWI717586B (en) | Deriving motion vector information at a video decoder | |
CN108848381B (en) | Video encoding method, decoding method, device, computer device and storage medium | |
KR100813963B1 (en) | Method and apparatus for loseless encoding and decoding image | |
US8964829B2 (en) | Techniques to perform fast motion estimation | |
JP5422124B2 (en) | Reference picture selection method, image encoding method, program, image encoding device, and semiconductor device | |
US10085028B2 (en) | Method and device for reducing a computational load in high efficiency video coding | |
WO2019062544A1 (en) | Inter frame prediction method and device and codec for video images | |
US9392280B1 (en) | Apparatus and method for using an alternate reference frame to decode a video frame | |
TW201735637A (en) | Merging filters for multiple classes of blocks for video coding | |
KR20090095012A (en) | Method and apparatus for encoding and decoding image using consecutive motion estimation | |
US20120027092A1 (en) | Image processing device, system and method | |
US8780971B1 (en) | System and method of encoding using selectable loop filters | |
KR20090095317A (en) | Method and apparatus for encoding and decoding image | |
US8594189B1 (en) | Apparatus and method for coding video using consistent regions and resolution scaling | |
CN108848377B (en) | Video encoding method, video decoding method, video encoding apparatus, video decoding apparatus, computer device, and storage medium | |
Hussain et al. | A survey on video compression fast block matching algorithms | |
CN113068026A (en) | Decoding prediction method, device and computer storage medium | |
JP2011250400A (en) | Moving picture encoding apparatus and moving picture encoding method | |
CN115836525B (en) | Video encoding, decoding method and apparatus for prediction from multiple cross components | |
CN114760467A (en) | Method and device for determining coding mode | |
US9615091B2 (en) | Motion picture encoding/decoding apparatus, and method and apparatus for hybrid block motion compensation/overlapped block motion compensation for same | |
CN117256149A (en) | Video codec using multi-mode linear model | |
CN115443650A (en) | Angle weighted prediction for inter prediction | |
KR20080035390A (en) | Method and apparatus for deciding video prediction mode | |
CN113395520A (en) | Decoding prediction method, device and computer storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |