WO2023131059A1 - Image encoding method, image encoding apparatus, electronic device, and readable storage medium - Google Patents

Image encoding method, image encoding apparatus, electronic device, and readable storage medium

Info

Publication number
WO2023131059A1
Authority
WO
WIPO (PCT)
Prior art keywords
prediction mode
rate
target pixel
pixel block
inter
Prior art date
Application number
PCT/CN2022/143660
Inventor
张勇
Original Assignee
维沃移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 维沃移动通信有限公司
Publication of WO2023131059A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: ... using adaptive coding
    • H04N19/102: ... using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103: Selection of coding mode or of prediction mode
    • H04N19/109: Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • H04N19/124: Quantisation
    • H04N19/169: ... using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17: ... the unit being an image region, e.g. an object
    • H04N19/176: ... the region being a block, e.g. a macroblock
    • H04N19/42: ... characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/50: ... using predictive coding
    • H04N19/503: ... using predictive coding involving temporal prediction
    • H04N19/51: Motion estimation or motion compensation
    • H04N19/60: ... using transform coding
    • H04N19/625: ... using transform coding using discrete cosine transform [DCT]
    • H04N19/90: ... using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91: Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
    • H04N19/96: Tree coding, e.g. quad-tree coding

Definitions

  • the embodiment of the present application provides an image coding method, the image coding method includes:
  • the embodiment of the present application provides a readable storage medium, on which a program or instruction is stored, and when the program or instruction is executed by a processor, the steps of the method in the first aspect are implemented.
  • Fig. 7 is one of the schematic block diagrams of the electronic device of the embodiment of the present application.
  • Inter-frame prediction aims to eliminate time-domain redundant information, that is, use previously encoded images to predict the image to be encoded now, which includes forward prediction (P frame) and bidirectional prediction (B frame). Inter prediction can search for matching macroblocks through macroblock-based motion estimation, and the motion vectors pointing to matching macroblocks can be integer-pixel or sub-pixel precision.
  • AVC Advanced Video Coding
  • the coding efficiency of inter-frame prediction has been greatly improved.
  • a macroblock with a size of 16 × 16 is used for inter-frame prediction, but such a fixed block size is often inflexible: a larger macroblock may contain image regions with different motion characteristics, so a single block cannot accurately describe all the motion details inside the macroblock.
  • H.264/AVC adopts inter-frame prediction with variable block size, and its prediction block size can vary from a maximum of 16 × 16 down to 4 × 4, that is, an optimal inter-frame prediction block size is adaptively selected according to the characteristics of the image itself and its motion characteristics.
  • the prediction mode of variable block size provides more choices for inter-frame prediction of macroblocks. Especially where the macroblock contains multiple moving objects or is located at the edge of a moving object, the variable block size can more accurately describe the motion of the different objects, which improves the accuracy of inter-frame prediction.
  • the variable-block-size inter-frame prediction technique of H.264/AVC greatly improves the efficiency of predictive coding, but it also brings a significant increase in computational complexity.
  • In H.264/AVC coding, in order to obtain the best inter-frame prediction block mode, Rate Distortion Optimization (RDO) is usually used to select the best prediction block size, that is, the best prediction block size is chosen by trading off the number of bits used by each candidate mode against its distortion.
  • RDO Rate Distortion Optimization
  • the luminance signal of each macroblock must traverse seven prediction modes, namely Inter16×16, Inter16×8, Inter8×16, Inter8×8, Inter8×4, Inter4×8 and Inter4×4, and calculate the rate-distortion cost function for each; the rate-distortion costs of the modes are then compared, and the mode that minimizes the cost function is selected as the best inter-frame prediction mode.
  • QP is the quantization parameter
  • IMODE represents one of all available inter-frame prediction modes
  • s represents the original pixel value of the luma block
  • c represents the reconstructed value, which is obtained through DCT transformation, quantization, inverse quantization and IDCT transformation
  • QP) represents the number of coded bits when the IMODE mode is selected under the QP condition, including the number of bits used to code the prediction mode and the number of bits used to code the luma transformation coefficient.
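The cost function that the symbols above describe, formula (1), did not survive in this text. As a hedged sketch, the Lagrangian form commonly used for H.264/AVC mode decision is shown below; the multiplier \lambda_{\mathrm{MODE}} and the use of the sum of squared differences (SSD) as the distortion term are assumptions, since neither is defined in the surviving text.

```latex
J(s, c, \mathrm{IMODE} \mid QP)
  = \mathrm{SSD}(s, c, \mathrm{IMODE} \mid QP)
  + \lambda_{\mathrm{MODE}} \cdot R(s, c, \mathrm{IMODE} \mid QP),
\qquad
\mathrm{SSD}(s, c, \mathrm{IMODE} \mid QP)
  = \sum_{x, y} \bigl( s[x, y] - c[x, y] \bigr)^{2}
```

Under this form, a mode with zero coded bits has a cost equal to its distortion term alone.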
  • CAVLC Context Adaptive Variable Length Coding
  • CABAC Context Adaptive Binary Arithmetic Coding
  • MVD Motion Vector Difference
  • MVP Motion Vector Prediction
  • this approach of traversing all modes is computationally intensive and imposes a heavy computational burden on the encoder.
  • An embodiment of the present application provides an image coding method, as shown in FIG. 4 , the image coding method includes:
  • the target frame image is divided into multiple pixel blocks (that is, macroblocks), each pixel block includes N rows and M columns of pixels, that is, a pixel block with a block size of N × M, for example a 16 × 16 pixel block, and each pixel block is divided into sub-blocks according to the above macroblock division principle.
  • motion estimation with a block size of N × M is performed to obtain an optimal reference frame and a matching macroblock.
  • calculate the SATD value of the target pixel block relative to the matching macroblock and then selectively compare possible classification modes according to the SATD value to obtain the inter-frame prediction mode of the target pixel block.
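The SATD computation is not spelled out in this text. A minimal sketch follows, assuming the residual is tiled into 4 × 4 sub-blocks, each transformed with an unnormalized 4 × 4 Hadamard matrix; the tiling size, the lack of a scaling factor, and the function names `satd_4x4` and `satd_block` are illustrative assumptions, not the applicant's definition.

```python
# 4x4 Hadamard matrix with +1/-1 entries. It is symmetric, so H == H^T.
H4 = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def _matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd_4x4(cur, ref):
    """Sum of absolute Hadamard-transformed differences of one 4x4 tile."""
    diff = [[cur[i][j] - ref[i][j] for j in range(4)] for i in range(4)]
    transformed = _matmul4(_matmul4(H4, diff), H4)
    return sum(abs(v) for row in transformed for v in row)


def satd_block(cur, ref):
    """SATD of a target block against its matching block.

    Both blocks are lists of rows with dimensions that are multiples of 4
    (an assumption of this sketch); the result is the sum over 4x4 tiles.
    """
    rows, cols = len(cur), len(cur[0])
    total = 0
    for r in range(0, rows, 4):
        for c in range(0, cols, 4):
            cur_tile = [row[c:c + 4] for row in cur[r:r + 4]]
            ref_tile = [row[c:c + 4] for row in ref[r:r + 4]]
            total += satd_4x4(cur_tile, ref_tile)
    return total
```

A perfectly matched block gives an SATD of 0, and the value grows as the motion-compensated match gets worse, which is why it can serve as the matching-degree measure for choosing a candidate class.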
  • SKIP mode does not require residual coding, and its process is simple and the calculation complexity is low. Therefore, the SKIP mode can be detected in advance. If the SKIP mode can be detected in advance, the complex RDO calculation of other modes can be avoided.
  • Step 1 calculate the first rate-distortion cost value in SKIP mode.
  • the motion vector in SKIP mode is equal to the predicted motion vector, and the number of coded bits is 0, so based on formula (1), it can be seen that the rate-distortion cost value RDcost(SKIP) of SKIP mode is:
  • Step 3: calculate the average rate-distortion cost avgRDcost(Inter16×16) of all coded pixel blocks that use the Inter16×16 mode in the current target frame image and the other reference frames of the target frame image.
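The early SKIP check can be sketched as follows. Step 2 (computing the Inter16×16 cost) and the final comparison are missing from the steps as they survive here; the comparison rule used below is taken from the judgment-module description later in this text (SKIP is chosen when its cost is below both the Inter16×16 cost and the running average). The function names and the use of SSD as the zero-rate SKIP cost are assumptions of this sketch.

```python
def ssd(original, reconstructed):
    """Sum of squared differences between two equally sized blocks."""
    return sum((s - c) ** 2
               for row_s, row_c in zip(original, reconstructed)
               for s, c in zip(row_s, row_c))


def is_skip_mode(rd_skip, rd_inter_nxm, avg_rd_inter_nxm):
    """Return True when the SKIP mode can be chosen early.

    rd_skip          -- first rate-distortion cost (SKIP; zero coded bits,
                        so only the distortion term remains)
    rd_inter_nxm     -- second rate-distortion cost (N x M prediction mode)
    avg_rd_inter_nxm -- average cost of already coded N x M blocks
    """
    return rd_skip < rd_inter_nxm and rd_skip < avg_rd_inter_nxm
```

When the check succeeds, no further motion estimation or cost evaluation is needed for the block.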
  • the method for determining the inter-frame prediction mode includes:
  • Step 508 select an inter prediction mode in class III
  • Step 512 select skip mode.
  • pre-selecting the candidate class of prediction block sizes according to the motion intensity and texture characteristics of the target pixel block excludes in advance certain prediction modes that are unlikely to be chosen, thereby reducing the complexity of inter-frame prediction.
  • determining the inter-frame prediction mode of the target pixel block according to the average horizontal standard deviation and the average vertical standard deviation of the target pixel block includes: calculating the average horizontal standard deviation and the average vertical standard deviation of the target pixel block; when the average horizontal standard deviation is greater than the third threshold and the average vertical standard deviation is greater than the fourth threshold, determining that the inter-frame prediction mode of the target pixel block is the N × M prediction mode; when the average horizontal standard deviation is less than or equal to the third threshold, calculating the third rate-distortion cost value of the target pixel block in the N × m1 sub-block prediction mode, and determining the inter-frame prediction mode of the target pixel block according to the third rate-distortion cost value and the second rate-distortion cost value; when the average vertical standard deviation is less than or equal to the fourth threshold, calculating the fourth rate-distortion cost value of the target pixel block in the n1 × M sub-block prediction mode, and determining the inter-frame prediction mode of the target pixel block according to the fourth rate-distortion cost value and the second rate-distortion cost value.
  • Class II candidate prediction modes include three modes: the N × M prediction mode, the N × m1 sub-block prediction mode and the n1 × M sub-block prediction mode, for example, Inter16×16, Inter16×8 and Inter8×16.
  • Inter16×16 predicts an entire pixel block, which is suitable for pixel blocks with relatively consistent motion. This type of pixel block lies inside a single moving object, does not contain the edge of a moving object, and its horizontal texture and vertical texture are consistent.
  • Inter16×8 is suitable for pixel blocks with consistent motion in the horizontal direction and relatively complex motion in the vertical direction. This type of pixel block belongs to the same moving object in the horizontal direction and contains different moving objects in the vertical direction; its horizontal texture is consistent and its vertical texture is relatively rich.
  • Inter8×16 is suitable for pixel blocks with consistent motion in the vertical direction and relatively complex motion in the horizontal direction. This type of pixel block belongs to the same moving object in the vertical direction and contains different moving objects in the horizontal direction; its vertical texture is consistent and its horizontal texture is relatively rich.
  • Therefore, the embodiment of the present application further refines the candidate prediction modes according to the texture consistency of the pixel block in the horizontal direction and the vertical direction.
  • SD_y is the standard deviation of the pixel values of row y, as shown in formula (11):
  • the texture of the pixel block is consistent in the vertical direction when all pixel values in each column of the pixel block are approximately equal.
  • the average vertical standard deviation SD_V is used to detect this type of pixel block, and the calculation formula of the average vertical standard deviation SD_V is as follows:
  • SD_x is the standard deviation of the pixel values of column x, as shown in formula (13):
  • m_x in formula (13) represents the mean value of all pixels in column x.
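Formulas (10) to (13) are not reproduced in this text. The sketch below shows one reading of the definitions above: the average horizontal standard deviation is the mean of the per-row standard deviations, and the average vertical standard deviation is the mean of the per-column standard deviations. Using the population standard deviation (dividing by the number of pixels) is an assumption, as are the function names.

```python
from statistics import pstdev


def average_horizontal_sd(block):
    """Mean of the standard deviations of each row (SD_y averaged over rows).

    A small value means the pixels within each row are nearly equal,
    i.e. the texture is consistent in the horizontal direction.
    """
    return sum(pstdev(row) for row in block) / len(block)


def average_vertical_sd(block):
    """Mean of the standard deviations of each column (SD_x averaged over columns).

    A small value means the pixels within each column are nearly equal,
    i.e. the texture is consistent in the vertical direction.
    """
    columns = list(zip(*block))
    return sum(pstdev(col) for col in columns) / len(columns)
```

For a block made of horizontal stripes, every row is constant, so the horizontal value is zero while the vertical value is not.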
  • Step 1: calculate the average horizontal standard deviation SD_H and the average vertical standard deviation SD_V of the target pixel block.
  • Step 2: determine whether SD_H is greater than T3. If SD_H ≤ T3, proceed to step 3. If SD_H > T3, determine whether SD_V is greater than T4: if SD_H > T3 and SD_V > T4, determine that the inter-frame prediction mode of the current target pixel block is Inter16×16; if SD_V ≤ T4, go to step 4. Here T3 is the third threshold and T4 is the fourth threshold.
  • Step 3: if SD_H ≤ T3, the texture of the current target pixel block is consistent in the horizontal direction, and the possible candidate prediction modes are the N × M prediction mode and the N × m1 sub-block prediction mode, namely Inter16×16 and Inter16×8.
  • The third rate-distortion cost value of the target pixel block in the N × m1 sub-block prediction mode is calculated, and the inter prediction mode of the target pixel block is determined according to this value and the second rate-distortion cost value in the N × M prediction mode.
  • Step 4: if SD_V ≤ T4, the texture of the current target pixel block is consistent in the vertical direction, and the possible candidate prediction modes are the N × M prediction mode and the n1 × M sub-block prediction mode, that is, Inter16×16 and Inter8×16.
  • Perform n1 × M motion estimation on the target pixel block to obtain the best reference frame and matching macroblock, then calculate the fourth rate-distortion cost value of the target pixel block in the n1 × M sub-block prediction mode, and determine the inter-frame prediction mode of the target pixel block according to the fourth rate-distortion cost value and the second rate-distortion cost value in the N × M prediction mode.
  • determining the inter-frame prediction mode of the target pixel block according to the third rate-distortion cost value and the second rate-distortion cost value includes: when the third rate-distortion cost value is less than the second rate-distortion cost value, determining that the inter-frame prediction mode of the target pixel block is the N × m1 sub-block prediction mode; when the third rate-distortion cost value is greater than or equal to the second rate-distortion cost value, determining that the inter-frame prediction mode of the target pixel block is the N × M prediction mode. Determining the inter-frame prediction mode of the target pixel block according to the fourth rate-distortion cost value and the second rate-distortion cost value includes: when the fourth rate-distortion cost value is less than the second rate-distortion cost value, determining that the inter-frame prediction mode of the target pixel block is the n1 × M sub-block prediction mode; when the fourth rate-distortion cost value is greater than or equal to the second rate-distortion cost value, determining that the inter-frame prediction mode of the target pixel block is the N × M prediction mode.
  • the third rate-distortion cost value RDcost(Inter N × m1) is the sum of the rate-distortion cost values of two N × m1 sub-blocks, that is, the third rate-distortion cost value RDcost(Inter N × m1) is equivalent to the rate-distortion cost value of one N × M block.
  • the fourth rate-distortion cost value RDcost(Inter n1 × M) is the sum of the rate-distortion cost values of two n1 × M sub-blocks, that is, the fourth rate-distortion cost value RDcost(Inter n1 × M) is equivalent to the rate-distortion cost value of one N × M block.
  • in this way, the best inter-frame prediction mode of the target pixel block can be determined among the Class II candidate prediction modes, and the accuracy of determining the best inter-frame prediction mode of the target pixel block can be improved.
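The Class II decision described above (steps 2 to 4 plus the cost comparisons) can be sketched as a single function. The function and parameter names are hypothetical; the sub-block costs are passed as callables so that the 16×8 or 8×16 motion estimation is only run on the branch that needs it, which is a design choice of this sketch rather than something the text prescribes.

```python
def select_class_ii_mode(sd_h, sd_v, t3, t4, rd_nxm, cost_nxm1, cost_n1xm):
    """Choose among Inter16x16, Inter16x8 and Inter8x16.

    sd_h, sd_v -- average horizontal / vertical standard deviation
    t3, t4     -- third and fourth thresholds
    rd_nxm     -- second rate-distortion cost (N x M, e.g. Inter16x16)
    cost_nxm1  -- callable returning the third cost (sum over two N x m1 sub-blocks)
    cost_n1xm  -- callable returning the fourth cost (sum over two n1 x M sub-blocks)
    """
    if sd_h <= t3:
        # Step 3: horizontally consistent texture, compare 16x8 with 16x16.
        return "Inter16x8" if cost_nxm1() < rd_nxm else "Inter16x16"
    if sd_v > t4:
        # Step 2: both deviations exceed their thresholds.
        return "Inter16x16"
    # Step 4: vertically consistent texture, compare 8x16 with 16x16.
    return "Inter8x16" if cost_n1xm() < rd_nxm else "Inter16x16"
```

In every branch at most one additional sub-block cost is evaluated, and a tie keeps the larger N × M block, matching the "greater than or equal to" rule stated above.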
  • determining the first target candidate prediction mode corresponding to the minimum rate-distortion cost value among multiple first candidate prediction modes as the inter-frame prediction mode of the target pixel block includes: calculating the fifth rate-distortion cost value of the target pixel block in the n1 × m2 sub-block prediction mode; calculating the sixth rate-distortion cost value of the target pixel block in the n2 × m1 sub-block prediction mode; calculating the seventh rate-distortion cost value of the target pixel block in the n2 × m2 sub-block prediction mode; calculating the eighth rate-distortion cost value of the target pixel block in the n1 × m1 sub-block prediction mode; and determining the first target candidate prediction mode corresponding to the smallest rate-distortion cost value among the fifth, sixth, seventh and eighth rate-distortion cost values as the inter-frame prediction mode of the target pixel block.
  • the Class III candidate prediction modes include 4 modes: Inter n1 × m1, Inter n1 × m2, Inter n2 × m1 and Inter n2 × m2, for example, Inter8×8, Inter8×4, Inter4×8 and Inter4×4.
  • the pixel blocks corresponding to the Class III candidate prediction modes belong to different moving objects in both the horizontal direction and the vertical direction, and their motion is relatively intense.
  • the steps to determine the Class III candidate prediction mode are as follows:
  • calculate the rate-distortion cost value of each corresponding mode, that is, calculate the fifth rate-distortion cost value in the n1 × m2 mode, the sixth rate-distortion cost value in the n2 × m1 mode, the seventh rate-distortion cost value in the n2 × m2 mode, and the eighth rate-distortion cost value in the n1 × m1 mode
  • the fifth rate-distortion cost value RDcost(Inter n1 × m2) is the sum of the rate-distortion cost values of eight n1 × m2 sub-blocks, that is, the fifth rate-distortion cost value RDcost(Inter n1 × m2) is equivalent to the rate-distortion cost value of one N × M block.
  • the sixth rate-distortion cost value RDcost(Inter n2 × m1) is the sum of the rate-distortion cost values of eight n2 × m1 sub-blocks, that is, the sixth rate-distortion cost value RDcost(Inter n2 × m1) is equivalent to the rate-distortion cost value of one N × M block.
  • the seventh rate-distortion cost value RDcost(Inter n2 × m2) is the sum of the rate-distortion cost values of sixteen n2 × m2 sub-blocks, that is, the seventh rate-distortion cost value RDcost(Inter n2 × m2) is equivalent to the rate-distortion cost value of one N × M block.
  • the eighth rate-distortion cost value RDcost(Inter n1 × m1) is the sum of the rate-distortion cost values of four n1 × m1 sub-blocks, that is, the eighth rate-distortion cost value RDcost(Inter n1 × m1) is equivalent to the rate-distortion cost value of one N × M block.
  • the target candidate prediction mode is used as the inter prediction mode of the target pixel block.
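The Class III selection amounts to summing the sub-block costs of each mode over one macroblock and taking the minimum. A minimal sketch follows, assuming a 16 × 16 macroblock; the dictionary layout and the function name are hypothetical.

```python
# Number of sub-blocks that tile one 16x16 macroblock in each Class III mode.
SUB_BLOCKS_PER_MACROBLOCK = {
    "Inter8x8": 4,
    "Inter8x4": 8,
    "Inter4x8": 8,
    "Inter4x4": 16,
}


def select_class_iii_mode(sub_block_costs):
    """Return (mode, total_cost) with the smallest macroblock-level cost.

    sub_block_costs maps each mode name to the list of rate-distortion
    costs of its sub-blocks. Ties are resolved in favour of the mode
    listed first in SUB_BLOCKS_PER_MACROBLOCK (an arbitrary choice here).
    """
    totals = {}
    for mode, expected in SUB_BLOCKS_PER_MACROBLOCK.items():
        costs = sub_block_costs[mode]
        if len(costs) != expected:
            raise ValueError(f"{mode}: expected {expected} sub-block costs")
        totals[mode] = sum(costs)
    best = min(totals, key=totals.get)
    return best, totals[best]
```

Summing to macroblock level first is what makes the four costs directly comparable, since each total then covers the same N × M area.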
  • determining the second target candidate prediction mode corresponding to the minimum rate-distortion cost value among multiple second candidate prediction modes as the inter-frame prediction mode of the target pixel block includes: calculating the ninth rate-distortion cost value of the target pixel block in the n2 × m2 intra-frame prediction mode; calculating the tenth rate-distortion cost value of the target pixel block in the N × M intra-frame prediction mode; and determining the second target candidate prediction mode corresponding to the smaller of the ninth and tenth rate-distortion cost values as the inter-frame prediction mode of the target pixel block.
  • in order to improve the encoding efficiency and the robustness of the transmission process, the pixel block is allowed to adopt an intra prediction mode during inter-frame encoding, that is, to adopt a Class IV candidate prediction mode.
  • Class IV candidate prediction modes include 2 modes: Intra n2 × m2 and Intra N × M, for example, Intra4×4 and Intra16×16.
  • calculate the rate-distortion cost value of the Intra n2 × m2 mode (that is, the ninth rate-distortion cost value) and the rate-distortion cost value of the Intra N × M mode (that is, the tenth rate-distortion cost value); of the ninth and tenth rate-distortion cost values, the mode corresponding to the smaller one is selected as the inter prediction mode.
  • a Quarter Common Intermediate Format (QCIF) video sequence (that is, 176 × 144 pixels) is encoded, and one frame of image includes 99 macroblocks with a size of 16 × 16.
  • QCIF Quarter Common Intermediate Format
  • the frame rate is 22 frames per second
  • the number of forward or backward reference frames is set to 1
  • rate-distortion optimization coding is enabled
  • the quantization parameter QP = 28.
  • the relevant configuration parameters need to be multiplied by 2.
  • the current coding object is a P-frame image, which contains 99 macroblocks with a size of 16 × 16.
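As a quick check of the macroblock count, assuming the 176 × 144 frame is tiled by non-overlapping 16 × 16 macroblocks:

```latex
\frac{176}{16} \times \frac{144}{16} = 11 \times 9 = 99
```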
  • the steps for determining the best inter prediction mode include:
  • step 1 one macroblock (that is, the target macroblock) is sequentially selected from 99 macroblocks to determine an inter-frame prediction mode.
  • Step 2: first execute the early SKIP-mode judgment. If the condition of the SKIP mode is met, determine that the inter prediction mode of the current macroblock is the SKIP mode, and the inter prediction step ends. If it is not met, judge the prediction matching degree and select the corresponding class of candidate prediction modes, that is, compare the SATD value with the first threshold T1 and the second threshold T2: when SATD < T1, select the inter-frame prediction mode in Class II; when T1 ≤ SATD < T2, select the inter-frame prediction mode in Class III; and when SATD ≥ T2, select the inter-frame prediction mode in Class IV.
  • Step 3 repeat the above steps 1 and 2 until all 99 macroblocks of the current P frame image are processed.
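The per-macroblock flow for a P frame can be sketched as a small dispatcher. The threshold comparison follows the rule stated for the determining module (below T1, between T1 and T2, at or above T2); the function names and the string labels are hypothetical.

```python
def candidate_class(satd, t1, t2):
    """Map the SATD of a macroblock to a candidate class for a P frame."""
    if not t1 < t2:
        raise ValueError("the first threshold must be smaller than the second")
    if satd < t1:
        return "II"    # good match: Inter16x16 / Inter16x8 / Inter8x16
    if satd < t2:
        return "III"   # moderate match: Inter8x8 / 8x4 / 4x8 / 4x4
    return "IV"        # poor match: Intra4x4 / Intra16x16


def decide_p_frame_class(skip_detected, satd, t1, t2):
    """Return 'SKIP' if the early check succeeded, else the candidate class."""
    if skip_detected:
        return "SKIP"
    return candidate_class(satd, t1, t2)
```

Only the modes of the returned class are then evaluated, instead of all seven inter modes plus the intra modes.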
  • B frame coding and P frame coding have the following three points of difference:
  • the SKIP mode of B frame is B_SKIP.
  • step 1 one macroblock is sequentially selected from 99 macroblocks for inter-frame prediction mode decision.
  • Step 2 Firstly, the step of judging in advance of the SKIP mode is performed. If the condition of the SKIP mode is met, it is determined that the inter-frame prediction mode of the current macroblock is B_SKIP mode, and the inter-frame prediction step ends. If it is not satisfied, the prediction matching degree judgment is performed (that is, the SATD value is compared with the first threshold T1 and the second threshold T2), and the corresponding candidate prediction mode is selected according to the judgment condition. Since the B-frame intra-frame prediction mode is turned off, class IV candidate prediction modes are not considered here.
  • Step 3 repeating the above steps 1 and 2 until all 99 macroblocks of the current B-frame image are processed.
  • the embodiment of the present application proposes a method for quickly determining the inter-frame prediction mode.
  • the determination method classifies the macroblock inter-frame prediction modes according to the image motion characteristics, and uses a pre-selection criterion for the prediction block size based on macroblock motion intensity and texture characteristics. The criterion excludes some less likely prediction block modes and reduces the number of rate-distortion cost function calculations, and thus effectively reduces the complexity of inter-frame prediction.
  • the 4 modes SKIP, Inter16×16, Inter16×8 and Inter8×16 account for more than 60% of inter-frame coding.
  • the embodiment of the present application needs at most 3 rounds of motion estimation and rate-distortion cost function calculation in the worst case, compared with the 41 motion estimations and 148 cost function calculations of the full search algorithm in the H.264/AVC reference code, so the embodiment of the present application greatly improves the encoding speed of the inter-frame prediction module in the video encoder.
  • the image encoding method provided in the embodiment of the present application may be executed by an image encoding device.
  • the image coding device provided in the embodiment of the present application is described by taking the image coding device executing the image coding method as an example.
  • the image encoding device 600 includes:
  • An acquisition module 602 configured to acquire a target frame image, wherein the target frame image includes a plurality of pixel blocks, each pixel block includes N rows and M columns of pixels, and N and M are positive integers;
  • a determination module 604 configured to determine the sum of absolute transformed differences (SATD) of the target pixel block among the plurality of pixel blocks, and determine the inter prediction mode of the target pixel block according to the SATD;
  • the coding module 606 is configured to perform inter-frame coding on the target pixel block according to the determined inter-frame prediction mode.
  • the target frame image is divided into multiple pixel blocks (that is, macroblocks), each pixel block includes N rows and M columns of pixels, that is, the block size is N × M, for example a 16 × 16 pixel block, and each pixel block can be divided into sub-blocks according to the above-mentioned principle of macroblock division.
  • motion estimation with a block size of N × M is performed to obtain an optimal reference frame and a matching macroblock.
  • calculate the SATD value of the target pixel block relative to the matching macroblock and then selectively compare possible classification modes according to the SATD value to obtain the inter-frame prediction mode of the target pixel block.
  • the image encoding device 600 further includes: a judging module, configured to judge whether the inter-frame prediction mode of the target pixel block is a skip mode; the encoding module 606 is also configured to perform inter-frame coding on the target pixel block according to the skip mode when the inter-frame prediction mode is the skip mode; the determining module 604 is specifically configured to determine the sum of absolute transformed differences (SATD) of the target pixel block when the inter-frame prediction mode of the target pixel block is not the skip mode.
  • the image encoding device 600 further includes: a calculation module, configured to: calculate the first rate-distortion cost value of the target pixel block in the skip mode; calculate the second rate-distortion cost value of the target pixel block in the N × M prediction mode; and calculate the average rate-distortion cost value of the coded pixel blocks that use the N × M prediction mode in the target frame image and other reference frame images; the judging module is specifically configured to determine that the inter prediction mode of the target pixel block is the skip mode when the first rate-distortion cost value is smaller than the second rate-distortion cost value and the first rate-distortion cost value is smaller than the average rate-distortion cost value.
  • the determining module 604 is specifically configured to: when the sum of absolute transformed differences (SATD) is less than the first threshold, determine the inter-frame prediction mode of the target pixel block according to the average horizontal standard deviation and the average vertical standard deviation of the target pixel block; when the SATD is greater than or equal to the first threshold and less than the second threshold, determine the first target candidate prediction mode corresponding to the minimum rate-distortion cost value among the multiple first candidate prediction modes as the inter-frame prediction mode of the target pixel block; when the SATD is greater than or equal to the second threshold, determine the second target candidate prediction mode corresponding to the minimum rate-distortion cost value among the plurality of second candidate prediction modes as the inter-frame prediction mode of the target pixel block; wherein the first threshold is smaller than the second threshold, and the plurality of first candidate prediction modes include the n1 × m1 sub-block prediction mode, the n1 × m2 sub-block prediction mode, the n2 × m1 sub-block prediction mode and the n2 × m2 sub-block prediction mode
  • the calculation module is further configured to calculate the average horizontal standard deviation and the average vertical standard deviation of the target pixel block; the determination module 604 is specifically configured to: when the average horizontal standard deviation is greater than a third threshold and the average vertical standard deviation is greater than a fourth threshold, determine that the inter-frame prediction mode of the target pixel block is the N×M prediction mode; when the average horizontal standard deviation is less than or equal to the third threshold, calculate a third rate-distortion cost value of the target pixel block in the N×m1 sub-block prediction mode, and determine the inter-frame prediction mode of the target pixel block according to the third rate-distortion cost value and the second rate-distortion cost value; when the average vertical standard deviation is less than or equal to the fourth threshold, calculate a fourth rate-distortion cost value of the target pixel block in the n1×M sub-block prediction mode, and determine the inter-frame prediction mode of the target pixel block according to the fourth rate-distortion cost value and the second rate-distortion cost value.
  • the determination module 604 is specifically configured to: determine that the inter-frame prediction mode of the target pixel block is the N×m1 sub-block prediction mode when the third rate-distortion cost value is smaller than the second rate-distortion cost value, and the N×M prediction mode when the third rate-distortion cost value is greater than or equal to the second rate-distortion cost value; and determine that the inter-frame prediction mode of the target pixel block is the n1×M sub-block prediction mode when the fourth rate-distortion cost value is smaller than the second rate-distortion cost value, and the N×M prediction mode when the fourth rate-distortion cost value is greater than or equal to the second rate-distortion cost value.
  • the calculation module is further configured to: calculate a fifth rate-distortion cost value of the target pixel block in the n1×m2 sub-block prediction mode; calculate a sixth rate-distortion cost value of the target pixel block in the n2×m1 sub-block prediction mode; calculate a seventh rate-distortion cost value of the target pixel block in the n2×m2 sub-block prediction mode; and calculate an eighth rate-distortion cost value of the target pixel block in the n1×m1 sub-block prediction mode; the determination module 604 is specifically configured to determine, as the inter-frame prediction mode of the target pixel block, the first target candidate prediction mode corresponding to the smallest of the fifth, sixth, seventh, and eighth rate-distortion cost values.
  • the image encoding apparatus 600 in the embodiment of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in a terminal.
  • the device may be a mobile electronic device or a non-mobile electronic device.
  • the mobile electronic device can be a mobile phone, a tablet computer, a notebook computer, a handheld computer, a vehicle electronic device, a wearable device, an ultra-mobile personal computer (Ultra-Mobile Personal Computer, UMPC), a netbook or a personal digital assistant (Personal Digital Assistant).
  • the non-mobile electronic device can be a server, network attached storage (Network Attached Storage, NAS), a personal computer (Personal Computer, PC), a television (Television, TV), a teller machine, a self-service machine, or the like, which is not specifically limited in the embodiments of the present application.
  • the image encoding device 600 in the embodiment of the present application may be a device with an operating system.
  • the operating system may be the Android operating system, the iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
  • the image encoding device 600 provided in the embodiment of the present application can implement various processes implemented in the embodiment of the image encoding method in FIG. 4 , and details are not repeated here to avoid repetition.
  • the embodiment of the present application also provides an electronic device 700, including a processor 702, a memory 704, and a program or instruction stored in the memory 704 and executable on the processor 702. When the program or instruction is executed by the processor 702, each process of the above-mentioned image encoding method embodiment is implemented and the same technical effect can be achieved; to avoid repetition, details are not repeated here.
  • the electronic devices in the embodiments of the present application include the above-mentioned mobile electronic devices and non-mobile electronic devices.
  • FIG. 8 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
  • the electronic device 800 includes, but is not limited to: a radio frequency unit 802, a network module 804, an audio output unit 806, an input unit 808, a sensor 810, a display unit 812, a user input unit 814, an interface unit 816, a memory 818, a processor 820, and other components.
  • the electronic device 800 can also include a power supply (such as a battery) that supplies power to the various components; the power supply can be logically connected to the processor 820 through a power management system, so that charging, discharging, and power consumption management are handled through the power management system.
  • the structure of the electronic device shown in FIG. 8 does not constitute a limitation to the electronic device.
  • the electronic device may include more or fewer components than shown in the figure, combine some components, or arrange the components differently; details are not repeated here.
  • the processor 820 is configured to: acquire a target frame image, where the target frame image includes a plurality of pixel blocks, each pixel block includes N rows and M columns of pixels, and N and M are positive integers; determine the sum of absolute transformed differences (SATD) of a target pixel block among the plurality of pixel blocks, and determine the inter-frame prediction mode of the target pixel block according to the SATD; and perform inter-frame encoding on the target pixel block according to the determined inter-frame prediction mode.
  • the target frame image is divided into multiple pixel blocks (that is, macroblocks); each pixel block includes N rows and M columns of pixels, that is, the block size is N×M.
  • the pixel block is, for example, a 16×16 pixel block, and each pixel block can be partitioned according to the macroblock partitioning principle described above.
  • motion estimation with a block size of N×M is performed to obtain the best reference frame and the matching macroblock.
  • the sum of absolute transformed differences (SATD) value of the target pixel block relative to the matching macroblock is calculated, and the possible mode classes are then selectively compared according to the SATD value to obtain the inter-frame prediction mode of the target pixel block.
  • the processor 820 is configured to: judge whether the inter-frame prediction mode of the target pixel block is the skip mode; if it is, perform inter-frame encoding on the target pixel block according to the skip mode; if it is not, determine the sum of absolute transformed differences (SATD) of the target pixel block.
  • the processor 820 is configured to: calculate a first rate-distortion cost value of the target pixel block in skip mode; calculate a second rate-distortion cost value of the target pixel block in the N×M prediction mode; calculate the average rate-distortion cost value of the encoded pixel blocks that use the N×M prediction mode in the target frame image and the other reference frame images; and, when the first rate-distortion cost value is smaller than both the second rate-distortion cost value and the average rate-distortion cost value, determine that the inter-frame prediction mode of the target pixel block is the skip mode.
  • the processor 820 is configured to: when the sum of absolute transformed differences (SATD) is less than a first threshold, determine the inter-frame prediction mode of the target pixel block according to the average horizontal standard deviation and the average vertical standard deviation of the target pixel block; when the SATD is greater than or equal to the first threshold and less than a second threshold, determine, as the inter-frame prediction mode of the target pixel block, the first target candidate prediction mode corresponding to the minimum rate-distortion cost value among multiple first candidate prediction modes; when the SATD is greater than or equal to the second threshold, determine, as the inter-frame prediction mode of the target pixel block, the second target candidate prediction mode corresponding to the minimum rate-distortion cost value among multiple second candidate prediction modes; wherein the first threshold is smaller than the second threshold, the multiple first candidate prediction modes include the n1×m1, n1×m2, n2×m1, and n2×m2 sub-block prediction modes, and the multiple second candidate prediction modes include the n2×m2 intra prediction mode and the N×M intra prediction mode.
  • the processor 820 is configured to: calculate the average horizontal standard deviation and the average vertical standard deviation of the target pixel block; when the average horizontal standard deviation is greater than a third threshold and the average vertical standard deviation is greater than a fourth threshold, determine that the inter-frame prediction mode of the target pixel block is the N×M prediction mode; when the average horizontal standard deviation is less than or equal to the third threshold, calculate a third rate-distortion cost value of the target pixel block in the N×m1 sub-block prediction mode, and determine the inter-frame prediction mode of the target pixel block according to the third rate-distortion cost value and the second rate-distortion cost value; when the average vertical standard deviation is less than or equal to the fourth threshold, calculate a fourth rate-distortion cost value of the target pixel block in the n1×M sub-block prediction mode, and determine the inter-frame prediction mode of the target pixel block according to the fourth rate-distortion cost value and the second rate-distortion cost value.
  • the processor 820 is configured to: determine that the inter-frame prediction mode of the target pixel block is the N×m1 sub-block prediction mode when the third rate-distortion cost value is smaller than the second rate-distortion cost value, and the N×M prediction mode when the third rate-distortion cost value is greater than or equal to the second rate-distortion cost value; and determine that the inter-frame prediction mode of the target pixel block is the n1×M sub-block prediction mode when the fourth rate-distortion cost value is smaller than the second rate-distortion cost value, and the N×M prediction mode when the fourth rate-distortion cost value is greater than or equal to the second rate-distortion cost value.
  • the processor 820 is configured to: calculate a fifth rate-distortion cost value of the target pixel block in the n1×m2 sub-block prediction mode; calculate a sixth rate-distortion cost value of the target pixel block in the n2×m1 sub-block prediction mode; calculate a seventh rate-distortion cost value of the target pixel block in the n2×m2 sub-block prediction mode; calculate an eighth rate-distortion cost value of the target pixel block in the n1×m1 sub-block prediction mode; and determine, as the inter-frame prediction mode of the target pixel block, the first target candidate prediction mode corresponding to the smallest of the fifth, sixth, seventh, and eighth rate-distortion cost values.
  • the input unit 808 may include a graphics processing unit (Graphics Processing Unit, GPU) 8082 and a microphone 8084; the graphics processor 8082 processes image data of still pictures or video obtained by an image capture device (such as a camera).
  • the display unit 812 may include a display panel 8122, and the display panel 8122 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like.
  • the user input unit 814 includes at least one of a touch panel 8142 and other input devices 8144 .
  • the touch panel 8142 is also called a touch screen.
  • the touch panel 8142 may include two parts, a touch detection device and a touch controller.
  • Other input devices 8144 may include, but are not limited to, physical keyboards, function keys (such as volume control buttons, switch buttons, etc.), trackballs, mice, and joysticks, and details will not be described here.
  • the memory 818 can be used to store software programs as well as various data.
  • the memory 818 may mainly include a first storage area for storing programs or instructions and a second storage area for storing data, wherein the first storage area may store an operating system, an application program or instructions required by at least one function (such as a sound playing function, image playback function, etc.), etc.
  • memory 818 can include volatile memory or nonvolatile memory, or, memory 818 can include both volatile and nonvolatile memory.
  • the non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or flash memory.
  • the volatile memory can be random access memory (Random Access Memory, RAM), static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDRSDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (Synch link DRAM, SLDRAM), or direct Rambus random access memory (Direct Rambus RAM, DRRAM).
  • the processor 820 may include one or more processing units; optionally, the processor 820 integrates an application processor and a modem processor, where the application processor mainly handles operations related to the operating system, user interface, and application programs, and the modem processor (for example, a baseband processor) mainly handles wireless communication signals. It can be understood that the modem processor may also not be integrated into the processor 820.
  • the embodiment of the present application also provides a readable storage medium on which a program or instruction is stored; when the program or instruction is executed by a processor, each process of the above-mentioned image encoding method embodiment is implemented and the same technical effect can be achieved. To avoid repetition, details are not repeated here.
  • the processor is the processor in the electronic device in the foregoing embodiments.
  • the readable storage medium includes a computer-readable storage medium, such as a computer read-only memory ROM, a random access memory RAM, a magnetic disk or an optical disk, and the like.
  • the embodiment of the present application further provides a chip; the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run programs or instructions to implement each process of the above image encoding method embodiment and achieve the same technical effect. To avoid repetition, details are not repeated here.
  • the chip mentioned in the embodiments of the present application may also be called a system-level chip, a system chip, a chip system, or a system-on-a-chip.
  • the embodiment of the present application provides a computer program product; the program product is stored in a storage medium and is executed by at least one processor to implement each process of the above image encoding method embodiment and achieve the same technical effect. To avoid repetition, details are not repeated here.
  • the terms “comprising”, “including”, or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus comprising a set of elements includes not only those elements but also other elements not expressly listed, or elements inherent in the process, method, article, or apparatus. Without further limitation, an element defined by the phrase “comprising a ...” does not preclude the presence of additional identical elements in the process, method, article, or apparatus comprising that element.
  • the scope of the methods and devices in the embodiments of the present application is not limited to performing functions in the order shown or discussed; depending on the functions involved, functions may also be performed substantially simultaneously or in reverse order. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Discrete Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

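The present application discloses an image encoding method, an image encoding device, an electronic device, and a readable storage medium, and belongs to the technical field of image processing. The image encoding method includes: acquiring a target frame image, where the target frame image includes a plurality of pixel blocks, each pixel block includes N rows and M columns of pixels, and N and M are positive integers; determining the sum of absolute transformed differences (SATD) of a target pixel block among the plurality of pixel blocks; determining an inter-frame prediction mode of the target pixel block according to the SATD; and performing inter-frame encoding on the target pixel block according to the determined inter-frame prediction mode.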

Description

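Image encoding method, image encoding device, electronic device, and readable storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 202210003592.7, filed in China on January 4, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application belongs to the technical field of image processing, and specifically relates to an image encoding method, an image encoding device, an electronic device, and a readable storage medium.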
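Background
Video coding starts from the high correlation of video signals and the visual characteristics of the human eye, and uses appropriate coding methods to eliminate the redundancy arising from these correlations and characteristics, thereby compressing the video signal and reducing the transmission bit rate. The correlation of a video signal can be divided into temporal correlation and spatial correlation. Temporal correlation refers to the similarity between adjacent images in an image sequence: in a video sequence, adjacent frames often contain the same background and objects, whose spatial positions change only because of camera rotation or object movement, so video sequences are strongly correlated in the time domain. Inter-frame prediction (Inter-Frame Prediction) coding is usually used, that is, the content of consecutive images in the video sequence is matched and the matched content is predicted, thereby reducing redundancy.
At present, to describe the motion of different objects more accurately and improve the accuracy of inter-frame prediction, inter-frame prediction mainly uses variable-block-size partitioning. However, in the inter-frame prediction module, the luminance signal of every macroblock must traverse prediction modes of seven block sizes, the rate-distortion cost function must be computed in each prediction mode, and the mode with the smallest rate-distortion cost is finally selected as the optimal prediction mode. As a result, the computational complexity of the inter-frame prediction mode selection algorithm based on rate-distortion optimization is very high, which seriously affects the real-time performance of the encoder.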
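Summary
The purpose of the embodiments of the present application is to provide an image encoding method, an image encoding device, an electronic device, and a readable storage medium that can solve the problem in the related art that the inter-frame prediction mode selection algorithm based on rate-distortion optimization has high computational complexity.
In a first aspect, an embodiment of the present application provides an image encoding method, which includes:
acquiring a target frame image, where the target frame image includes a plurality of pixel blocks, each pixel block includes N rows and M columns of pixels, and N and M are positive integers;
determining the sum of absolute transformed differences (SATD) of a target pixel block among the plurality of pixel blocks;
determining an inter-frame prediction mode of the target pixel block according to the SATD;
performing inter-frame encoding on the target pixel block according to the determined inter-frame prediction mode.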
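In a second aspect, an embodiment of the present application provides an image encoding device, which includes:
an acquisition module, configured to acquire a target frame image, where the target frame image includes a plurality of pixel blocks, each pixel block includes N rows and M columns of pixels, and N and M are positive integers;
a determination module, configured to determine the sum of absolute transformed differences (SATD) of a target pixel block among the plurality of pixel blocks, and to determine an inter-frame prediction mode of the target pixel block according to the SATD;
an encoding module, configured to perform inter-frame encoding on the target pixel block according to the determined inter-frame prediction mode.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instruction stored in the memory and executable on the processor; when the program or instruction is executed by the processor, the steps of the method of the first aspect are implemented.
In a fourth aspect, an embodiment of the present application provides a readable storage medium on which a program or instruction is stored; when the program or instruction is executed by a processor, the steps of the method of the first aspect are implemented.
In a fifth aspect, an embodiment of the present application provides a chip, which includes a processor and a communication interface; the communication interface is coupled to the processor, and the processor is configured to run a program or instruction to implement the method of the first aspect.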
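In the embodiments of the present application, after the target frame image is acquired, it is divided into a plurality of pixel blocks (that is, macroblocks). Each pixel block includes N rows and M columns of pixels, that is, it is a pixel block of size N×M, for example a 16×16 pixel block, and each pixel block can be partitioned according to the macroblock partitioning principle. For any pixel block of the target frame image (that is, the target pixel block), motion estimation with an N×M block size is performed to obtain the best reference frame and the matching macroblock. The sum of absolute transformed differences (SATD) value of the target pixel block relative to the matching macroblock is then calculated, and the possible mode classes are selectively compared according to the SATD value to obtain the inter-frame prediction mode of the target pixel block. Finally, the target pixel block is inter-frame encoded according to the obtained inter-frame prediction mode. In this way, on the one hand, the number of modes searched is kept as small as possible, which reduces the computational complexity of inter-frame prediction mode selection; on the other hand, the best inter-frame prediction mode is prevented from being missed, which would degrade coding quality.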
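Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the calculation of the rate-distortion cost function according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the partitioning of a macroblock according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the partitioning of a sub-block according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of the image encoding method according to an embodiment of the present application;
FIG. 5 is a schematic flowchart of the method for determining the inter-frame prediction mode according to an embodiment of the present application;
FIG. 6 is a schematic block diagram of the image encoding device according to an embodiment of the present application;
FIG. 7 is the first schematic block diagram of the electronic device according to an embodiment of the present application;
FIG. 8 is the second schematic block diagram of the electronic device according to an embodiment of the present application.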
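Detailed Description
The technical solutions in the embodiments of the present application are described clearly below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application fall within the scope of protection of this application.
The terms “first”, “second”, and the like in the description and claims of this application are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present application can be implemented in orders other than those illustrated or described here. In addition, “and/or” in the description and claims denotes at least one of the connected objects, and the character “/” generally indicates an “or” relationship between the associated objects before and after it.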
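The correlation of a video signal is divided into temporal correlation and spatial correlation. Spatial correlation refers to the similarity between adjacent pixels in the same image; it is mainly eliminated by intra-frame prediction (Intra-Frame Prediction) coding, that is, the value of the current pixel is predicted from neighboring pixels in the same frame. Temporal correlation refers to the similarity between adjacent images in an image sequence: in a video sequence, adjacent frames often contain the same background and objects, whose spatial positions change only because of camera rotation or object movement, so video sequences are strongly correlated in the time domain. Inter-frame prediction coding is usually used, that is, the content of consecutive images in the video sequence is matched and the matched content is predicted, thereby reducing redundancy.
Inter-frame prediction aims to eliminate temporally redundant information, that is, to use previously encoded images to predict the image currently being encoded; it includes forward prediction (P frames) and bidirectional prediction (B frames). Inter-frame prediction can search for a matching macroblock through macroblock-based motion estimation, and the motion vector pointing to the matching macroblock can have integer-pixel or sub-pixel precision. In H.264/Advanced Video Coding (AVC), the coding efficiency of inter-frame prediction is greatly improved. Inter-frame prediction is usually performed on macroblocks of size 16×16, but such fixed-size partitioning is inflexible; in particular, a large macroblock may contain image content with different motion characteristics, so the motion details inside the macroblock cannot all be described accurately. H.264/AVC adopts variable-block-size inter-frame prediction, in which the prediction block size can vary from a maximum of 16×16 down to 4×4, that is, the best inter-frame prediction block size is selected adaptively according to the characteristics of the image and of its motion. Variable-block-size prediction modes give macroblock inter-frame prediction more choices; in particular, when a macroblock contains several moving objects or lies on the edge of a moving object, variable block sizes describe the motion of the different objects more accurately and thus improve the accuracy of inter-frame prediction. This variable-block-size inter-frame prediction technique of H.264/AVC greatly improves the efficiency of predictive coding, but it also significantly increases computational complexity.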
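Specifically, the inter-frame prediction algorithm of H.264/AVC uses tree-structured partitioning together with motion estimation. Tree-structured partitioning means that each macroblock can be partitioned in four ways, as shown in FIG. 2: one 16×16 macroblock, two 16×8 sub-blocks, two 8×16 sub-blocks, or four 8×8 sub-blocks. Each sub-block of the 8×8 mode (sub-block partitioning) can be further partitioned in four ways, as shown in FIG. 3: one 8×8 sub-block, two 8×4 sub-blocks, two 4×8 sub-blocks, or four 4×4 sub-blocks.
During inter-frame encoding, every partitioning mode is tried once. Motion estimation is used to compute the minimum cost achievable with each possible partitioning of the macroblock, and the partitioning mode corresponding to the smallest of these minimum costs is the best partitioning mode of the macroblock.
The chrominance components (Cr and Cb) of a macroblock have half the size of the corresponding luminance component (half horizontally and half vertically). Chrominance blocks use the same partitioning modes as luminance blocks, with the dimensions halved in both the horizontal and vertical directions. The motion vector (Motion Vector, MV) of a chrominance block is likewise obtained by halving the horizontal and vertical components of the corresponding luminance MV.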
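With tree-structured partitioning, H.264/AVC does not partition a macroblock in a single fixed way but decides the partitioning adaptively, in order to describe the motion details of the macroblock as well as possible. In inter-frame prediction coding, each partition has an MV that must be coded and transmitted, and the partition selection information is also coded into the bitstream. With large partitions, the partition selection information and the MVs need only a few bytes, but the prediction accuracy is lower, so the residual signal to be coded has more energy and needs more bytes. With small partitions, motion estimation is more accurate and the residual has less energy, but an MV must be transmitted for every sub-block and the sub-block type information also needs more bits. How to strike a balance between the two is therefore an important question in the design of an inter-frame prediction algorithm.
In H.264/AVC coding, to obtain the best inter-frame prediction block mode, rate-distortion optimization (Rate Distortion Optimization, RDO) is usually used to select the best prediction block size, that is, the best prediction block size is selected through the trade-off between the number of bits used by each candidate mode and its distortion. In the H.264/AVC standard, for inter-frame prediction, the luminance signal of each macroblock must traverse seven prediction modes, namely Inter16×16, Inter16×8, Inter8×16, Inter8×8, Inter8×4, Inter4×8, and Inter4×4, and the rate-distortion cost function is computed for each; the cost functions of the modes are then compared, and the mode that minimizes the cost function is selected as the best inter-frame prediction mode. The whole calculation of the inter-frame prediction rate-distortion cost function is shown in FIG. 1. Clearly, throughout this calculation the encoder must repeat the following computations in every prediction mode: motion estimation, motion compensation, integer discrete cosine transform (Discrete Cosine Transform, DCT)/quantization, inverse quantization/integer inverse discrete cosine transform (Inverse Discrete Cosine Transform, IDCT), and entropy coding.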
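The rate-distortion cost function is defined as:
$$J(s,c,\mathrm{IMODE}\mid QP,\lambda_{\mathrm{MODE}})=\mathrm{SSD}(s,c,\mathrm{IMODE}\mid QP)+\lambda_{\mathrm{MODE}}\,R(s,c,\mathrm{IMODE}\mid QP) \tag{1}$$
In equation (1), QP is the quantization parameter, IMODE denotes one of the available inter-frame prediction modes, s denotes the original pixel values of the luminance block, and c denotes the reconstructed values, obtained through DCT, quantization, inverse quantization, and IDCT. R(s,c,IMODE|QP) denotes the number of coded bits when mode IMODE is selected under QP, including the bits used to code the prediction mode and the bits used to code the luminance transform coefficients; the bit count is computed using context-adaptive variable-length coding (Context Adaptive Variable Length Coding, CAVLC) or context-adaptive binary arithmetic coding (Context Adaptive Binary Arithmetic Coding, CABAC). λ_MODE is the Lagrange multiplier for mode selection, defined as λ_MODE = 0.85 × 2^((QP−12)/3). SSD(s,c,IMODE|QP) is the sum of squared errors between s and c, where c is obtained under QP and IMODE. Letting (x, y) denote a pixel position in the partition and A denote the pixel region, we have: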
$$\mathrm{SSD}(s,c,\mathrm{IMODE}\mid QP)=\sum_{(x,y)\in A}\bigl(s(x,y)-c(x,y,\mathrm{IMODE}\mid QP)\bigr)^{2} \tag{2}$$
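As a rough illustration of the Lagrangian cost in equation (1), the following Python sketch evaluates the cost of one candidate mode. The function names (`lambda_mode`, `ssd`, `rd_cost`) and the use of nested lists for blocks are assumptions made for this example; the bit count R is passed in as a number because it would come from the entropy coder (CAVLC or CABAC), which is not modelled here.

```python
def lambda_mode(qp: int) -> float:
    """Lagrange multiplier for mode decision: 0.85 * 2^((QP - 12) / 3)."""
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)


def ssd(orig, recon) -> int:
    """Sum of squared differences between two equally sized 2-D blocks."""
    return sum(
        (s - c) ** 2
        for row_s, row_c in zip(orig, recon)
        for s, c in zip(row_s, row_c)
    )


def rd_cost(orig, recon, bits: float, qp: int) -> float:
    """Rate-distortion cost J = SSD + lambda_MODE * R for one candidate mode."""
    return ssd(orig, recon) + lambda_mode(qp) * bits
```

Because λ_MODE grows with QP, the same number of bits weighs more heavily at coarse quantization, which pushes the decision toward modes that cost fewer bits.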
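In H.264/AVC inter-frame coding, in addition to the seven prediction modes described above, inter-frame prediction also supports intra prediction modes and the skip (SKIP) mode. Intra prediction includes two types, Intra4×4 and Intra16×16; Intra4×4 has 9 prediction modes and Intra16×16 has 4. SKIP mode is a special inter 16×16 mode that applies only at the macroblock level: the macroblock is not coded at all and is simply marked as a SKIP macroblock in the bitstream. SKIP macroblocks include P_SKIP and B_SKIP macroblocks. A P_SKIP macroblock, also called a COPY macroblock, has no motion vector difference (Motion Vector Difference, MVD) and no coded quantized residual; at decoding, the motion vector prediction (Motion Vector Prediction, MVP) is used directly as the motion vector to obtain the predicted pixel values, and the reconstructed pixel values equal the predicted pixel values. A B_SKIP macroblock likewise has no MVD and no coded quantized residual; at decoding, the forward and backward MVs are computed with the Direct prediction mode to obtain the predicted pixel values, and the reconstructed pixel values equal the predicted pixel values.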
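In the H.264/AVC standard reference code, the inter-frame prediction mode selection process for one macroblock is as follows. Motion estimation is performed for one 16×16 macroblock, two 16×8 sub-blocks, and two 8×16 sub-blocks, the rate-distortion cost value of each mode is calculated, and the mode with the smallest rate-distortion cost value among these three is selected as a candidate mode. Each 8×8 block is divided into one 8×8 sub-block, two 8×4 sub-blocks, two 4×8 sub-blocks, and four 4×4 sub-blocks for motion estimation, the rate-distortion cost value of each mode is calculated, and the mode with the smallest cost among these four is selected as the candidate mode of that 8×8 sub-block; this is repeated until all four 8×8 sub-blocks of the 16×16 macroblock have been processed. The rate-distortion cost values of the candidate modes of the four 8×8 blocks are added to obtain the rate-distortion cost value of coding the 16×16 macroblock in P8×8 mode, where P8×8 covers Inter8×8, Inter8×4, Inter4×8, and Inter4×4. The rate-distortion cost values of the Intra16×16 and Intra4×4 modes are calculated, and the one with the smallest cost is selected as the intra prediction candidate mode. The rate-distortion cost value in SKIP mode is calculated. Finally, the mode with the smallest rate-distortion cost value among Inter16×16, Inter16×8, Inter8×16, P8×8, SKIP, and the intra prediction candidate mode is selected as the best inter-frame prediction mode of the 16×16 macroblock.
From these steps, the number of motion estimations and rate-distortion cost calculations for one 16×16 macroblock is: 1 for the 16×16 macroblock, 2 for the 16×8 sub-blocks, 2 for the 8×16 sub-blocks, 4 for the 8×8 sub-blocks, 8 for the 8×4 sub-blocks, 8 for the 4×8 sub-blocks, and 16 for the 4×4 sub-blocks, for a total of 1+2+2+4+8+8+16 = 41. In addition, the number of intra prediction mode combinations for the macroblock is 16×9+4 = 148, that is, the cost function must be evaluated 148 times to select the best intra prediction mode. Clearly, traversing all modes in this way is extremely expensive and places a heavy computational burden on the encoder.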
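The two totals quoted above can be checked with a few lines of Python. This is only an arithmetic sketch; the names `INTER_PARTITIONS`, `partitions_per_macroblock`, `exhaustive_inter_evaluations`, and `intra_mode_combinations` are illustrative and do not come from the source.

```python
MB_SIZE = 16

# (width, height) of the seven inter partition shapes listed in the text.
INTER_PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]


def partitions_per_macroblock(width: int, height: int, mb_size: int = MB_SIZE) -> int:
    """Number of width x height partitions needed to tile one macroblock."""
    return (mb_size // width) * (mb_size // height)


def exhaustive_inter_evaluations() -> int:
    """Motion-estimation / RD-cost evaluations when every inter mode is tried."""
    return sum(partitions_per_macroblock(w, h) for w, h in INTER_PARTITIONS)


def intra_mode_combinations() -> int:
    """16 Intra4x4 blocks with 9 modes each, plus 4 Intra16x16 modes."""
    return 16 * 9 + 4
```

Calling `exhaustive_inter_evaluations()` gives 41 and `intra_mode_combinations()` gives 148, matching the counts in the text.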
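As a result, the computational complexity of the inter-frame prediction mode selection algorithm based on rate-distortion optimization is very high and becomes a bottleneck for the real-time performance of the encoder. Reducing the complexity of the search algorithm and improving the real-time performance of the encoder, without increasing the bit rate and while preserving the quality of the coded image, is therefore a key problem that inter-frame prediction must solve in practical applications.
The image encoding method, image encoding device, electronic device, and readable storage medium provided by the embodiments of the present application are described in detail below through specific embodiments and their application scenarios, with reference to the accompanying drawings.
An embodiment of the present application provides an image encoding method. As shown in FIG. 4, the image encoding method includes:
Step 402: acquire a target frame image, where the target frame image includes a plurality of pixel blocks, each pixel block includes N rows and M columns of pixels, and N and M are positive integers;
Step 404: determine the sum of absolute transformed differences (SATD) of a target pixel block among the plurality of pixel blocks;
Step 406: determine the inter-frame prediction mode of the target pixel block according to the SATD;
Step 408: perform inter-frame encoding on the target pixel block according to the determined inter-frame prediction mode.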
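In this embodiment, the tree-structured partitioning shown in FIG. 2 and FIG. 3 is used. Blocks in smooth regions of the image tend to select larger partition types, while blocks with complex texture tend to select smaller ones; blocks with little motion tend to select larger partition types, while blocks with violent motion tend to select smaller ones.
The sum of absolute transformed differences (Sum of Absolute Transform Difference, SATD) is a transformed residual. It reflects both the distortion and how well the prediction block matches, and to some extent also reflects the size of the resulting bitstream. For macroblocks with fine texture, many detail changes, violent motion, or located at motion edges, the SATD value is usually large; for macroblocks with simple texture, few detail changes, and smooth motion, the SATD value is usually small. On this basis, the embodiments of the present application propose a method for determining the inter-frame prediction mode that combines the macroblock SATD value with texture complexity: the distribution of macroblock SATD values is used to judge whether a macroblock lies in a region of violent or gentle motion, and the texture complexity of the image is used to decide whether the macroblock contains complex texture information. In the embodiments of the present application, the macroblock partitioning principle is as follows: in regions with violent motion, many detail changes, and fine texture, smaller partitions are selected; in regions with smooth motion, few detail changes, and simple texture, larger partitions are selected.
Based on the motion characteristics of the image, the embodiments of the present application classify the macroblock inter-frame prediction modes and then selectively compare the possible mode classes according to the SATD value. With the reconstructed image quality and the coding bit rate unchanged, this reduces the number of rate-distortion cost function evaluations while still estimating the inter-frame prediction mode fairly accurately.
Specifically, after the target frame image is acquired, it is divided into a plurality of pixel blocks (that is, macroblocks). Each pixel block includes N rows and M columns of pixels, that is, it is a pixel block of size N×M, for example a 16×16 pixel block, and each pixel block is partitioned according to the macroblock partitioning principle above. For any pixel block of the target frame image (that is, the target pixel block), motion estimation with an N×M block size is performed to obtain the best reference frame and the matching macroblock. The SATD value of the target pixel block relative to the matching macroblock is then calculated, and the possible mode classes are selectively compared according to the SATD value to obtain the inter-frame prediction mode of the target pixel block. Finally, the target pixel block is inter-frame encoded according to the obtained inter-frame prediction mode.
In this way, on the one hand, the number of modes searched is kept as small as possible, which reduces the computational complexity of inter-frame prediction mode selection; on the other hand, the best inter-frame prediction mode is prevented from being missed, which would degrade coding quality.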
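Further, in an embodiment of the present application, before the sum of absolute transformed differences (SATD) of the target pixel block is determined, the image encoding method further includes: judging whether the inter-frame prediction mode of the target pixel block is the skip mode; and, if the inter-frame prediction mode of the target pixel block is the skip mode, performing inter-frame encoding on the target pixel block according to the skip mode. Determining the SATD of the target pixel block includes: determining the SATD of the target pixel block if the inter-frame prediction mode of the target pixel block is not the skip mode.
In this embodiment, a video sequence contains regions that are spatially uniform or temporally stationary, such as the image background. Such regions are usually coded with large block sizes, for example in SKIP mode or in the Inter16×16 sub-block prediction mode. SKIP mode requires no residual coding, and its process is simple with low computational complexity. SKIP mode can therefore be checked first; if it is detected in advance, the complex RDO calculations of the other modes can be avoided.
Specifically, before the SATD of the target pixel block is determined, it is first judged whether the inter-frame prediction mode of the target pixel block is SKIP mode; if so, SKIP mode is used directly for the inter-frame encoding of the target pixel block.
By deciding SKIP mode first, the mode selection process is terminated early when the inter-frame prediction mode of the target pixel block is determined to be SKIP mode, which reduces the complexity of the inter-frame prediction algorithm and improves the real-time performance of the video encoder.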
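Further, in an embodiment of the present application, judging whether the inter-frame prediction mode of the target pixel block is the skip mode includes: calculating a first rate-distortion cost value of the target pixel block in skip mode; calculating a second rate-distortion cost value of the target pixel block in the N×M prediction mode; calculating the average rate-distortion cost value of the encoded pixel blocks that use the N×M prediction mode in the target frame image and the other reference frame images; and, when the first rate-distortion cost value is smaller than the second rate-distortion cost value and smaller than the average rate-distortion cost value, determining that the inter-frame prediction mode of the target pixel block is the skip mode.
In this embodiment, the SKIP mode decision specifically includes:
Step 1: calculate the first rate-distortion cost value, in SKIP mode. In SKIP mode the motion vector equals the predicted motion vector and the number of coded bits is 0, so from equation (1) the rate-distortion cost value RDcost(SKIP) of SKIP mode is:
$$\mathrm{RDcost}(\mathrm{SKIP})=\mathrm{SSD}(s,c\mid QP) \tag{3}$$
Step 2: calculate the second rate-distortion cost value, in the N×M prediction mode, that is, the Inter16×16 sub-block prediction mode. Motion estimation with a 16×16 block size is performed on the target pixel block to obtain the best reference frame and the matching macroblock, and the rate-distortion cost RDcost(Inter16×16) of the Inter16×16 mode is then calculated.
Step 3: calculate the average rate-distortion cost value avgRDcost(Inter16×16) of all pixel blocks, in the current target frame image and in the other reference frames of the target frame image, that have already been encoded with Inter16×16 as their inter-frame prediction mode.
Step 4: if the following two preset conditions are satisfied, the inter-frame prediction mode of the current target pixel block is judged to be SKIP mode. The preset conditions are:
$$\mathrm{RDcost}(\mathrm{SKIP})<\mathrm{RDcost}(\mathrm{Inter16{\times}16}) \tag{4}$$
$$\mathrm{RDcost}(\mathrm{SKIP})<\mathrm{avgRDcost}(\mathrm{Inter16{\times}16}) \tag{5}$$
In this way, SKIP mode is detected, so that the mode selection process is terminated early when the inter-frame prediction mode of the target pixel block is determined to be SKIP mode, which reduces the complexity of the inter-frame prediction algorithm and improves the real-time performance of the video encoder.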
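The early SKIP decision of conditions (4) and (5) reduces to two comparisons. The sketch below is a minimal Python illustration under stated assumptions: the function name `is_skip_mode` and its parameters are invented for this example, the three costs are assumed to have been computed elsewhere, and returning `False` when no Inter16×16 block has been encoded yet is a choice made here, since the source does not say how an empty history is handled.

```python
def is_skip_mode(rd_skip: float, rd_inter16x16: float, encoded_inter16x16_costs) -> bool:
    """Return True when both early-termination conditions hold.

    rd_skip                  -- RD cost of the block in SKIP mode, equation (3)
    rd_inter16x16            -- RD cost of the block in Inter16x16 mode
    encoded_inter16x16_costs -- RD costs of already-encoded Inter16x16 blocks
    """
    if not encoded_inter16x16_costs:
        # Assumption of this sketch: without history, do not terminate early.
        return False
    avg_rd = sum(encoded_inter16x16_costs) / len(encoded_inter16x16_costs)
    return rd_skip < rd_inter16x16 and rd_skip < avg_rd
```

For example, `is_skip_mode(90.0, 100.0, [95.0, 105.0])` is true, while raising the SKIP cost to 100.0 or more makes condition (4) fail and the search continues.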
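It should be noted that, when SKIP mode is decided first, the N×M block-size motion estimation performed in order to calculate the sum of absolute transformed differences (SATD) of the target pixel block is the motion estimation performed in step 2 above.
Further, in an embodiment of the present application, determining the inter-frame prediction mode of the target pixel block according to the SATD includes: when the SATD is less than a first threshold, determining the inter-frame prediction mode of the target pixel block according to the average horizontal standard deviation and the average vertical standard deviation of the target pixel block; when the SATD is greater than or equal to the first threshold and less than a second threshold, determining, as the inter-frame prediction mode of the target pixel block, the first target candidate prediction mode corresponding to the minimum rate-distortion cost value among multiple first candidate prediction modes; when the SATD is greater than or equal to the second threshold, determining, as the inter-frame prediction mode of the target pixel block, the second target candidate prediction mode corresponding to the minimum rate-distortion cost value among multiple second candidate prediction modes; wherein the first threshold is smaller than the second threshold, the multiple first candidate prediction modes include the n1×m1, n1×m2, n2×m1, and n2×m2 sub-block prediction modes, the multiple second candidate prediction modes include the n2×m2 intra prediction mode and the N×M intra prediction mode, and n1 = N/2, m1 = M/2, n2 = N/4, m2 = M/4.
In this embodiment, after motion estimation with an N×M block size yields the best reference frame and the matching macroblock, the SATD value of the target pixel block can be calculated as follows:
$$\mathrm{SATD}(s,c)=\sum_{x,y}\bigl|T\{s(x,y)-c(x,y)\}\bigr| \tag{6}$$
In equation (6), s denotes the original luminance pixel values, c denotes the reconstructed values, T denotes the Hadamard transform, x denotes the row of the target pixel block, y denotes the column of the target pixel block, x = 1, 2, 3, ..., 16, and y = 1, 2, 3, ..., 16. Letting H_i denote the Hadamard matrix of order i:
$$T(w)=H_i\times w\times H_i \tag{7}$$
The Hadamard matrix H_i can be obtained recursively: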
[Equation images PCTCN2022143660-appb-000002 and PCTCN2022143660-appb-000003: recursive definition of the Hadamard matrix H_i]
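Equations (6) and (7) can be sketched in plain Python as follows. Two points are assumptions of this example rather than statements from the source: the Hadamard matrix is built with the standard Sylvester recursion (H_1 = [1], and H_2k formed from four copies of H_k with the bottom-right copy negated), and no normalization factor is applied. The helper names `hadamard`, `matmul`, and `satd` are illustrative.

```python
def hadamard(n: int):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # H_2k = [[H_k, H_k], [H_k, -H_k]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def matmul(a, b):
    """Plain matrix product of two 2-D lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def satd(orig, pred) -> int:
    """Sum of absolute values of H * (orig - pred) * H for a square block."""
    n = len(orig)
    if any(len(row) != n for row in orig) or len(pred) != n:
        raise ValueError("blocks must be square and of equal size")
    diff = [[s - c for s, c in zip(rs, rc)] for rs, rc in zip(orig, pred)]
    h = hadamard(n)
    transformed = matmul(matmul(h, diff), h)
    return sum(abs(v) for row in transformed for v in row)
```

A perfectly matched block gives an SATD of 0; a single differing pixel spreads across all transform coefficients, which is why SATD reacts to residual structure rather than only to its magnitude.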
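The embodiments of the present application classify the macroblock inter-frame prediction modes into class I, class II, class III, and class IV. Class I includes the SKIP mode; class II includes the N×M prediction mode, the N×m1 sub-block prediction mode, and the n1×M sub-block prediction mode; class III includes the n1×m1, n1×m2, n2×m1, and n2×m2 sub-block prediction modes; class IV includes the n2×m2 intra prediction mode and the N×M intra prediction mode. As an example, the classification of the macroblock inter-frame prediction modes for N = 16 and M = 16 is shown in Table 1.
Table 1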
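Class I: SKIP
Class II: Inter16×16, Inter16×8, Inter8×16
Class III: Inter8×8, Inter8×4, Inter4×8, Inter4×4
Class IV: Intra4×4, Intra16×16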
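According to the distribution of SATD values, the prediction matching degree of the target pixel block is divided into three cases, and a different candidate prediction class is selected in each case, as shown in Table 2. The possible inter-frame prediction modes other than class I are then determined selectively.
Table 2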
SATD < T1: class II
T1 ≤ SATD < T2: class III
SATD ≥ T2: class IV
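In Table 2, T1 is the first threshold and T2 is the second threshold, with T1 < T2; both are obtained statistically from a large number of experimental results. If SATD < T1, the inter-frame prediction mode of the target pixel block is determined within class II according to the average horizontal standard deviation and the average vertical standard deviation of the target pixel block. If T1 ≤ SATD < T2, the first target candidate prediction mode corresponding to the minimum rate-distortion cost value among the multiple first candidate prediction modes (that is, class III) is determined as the inter-frame prediction mode of the target pixel block. If SATD ≥ T2, the second target candidate prediction mode corresponding to the minimum rate-distortion cost value among the multiple second candidate prediction modes (that is, class IV) is determined as the inter-frame prediction mode of the target pixel block.
In one implementation of the present application, using the inter-frame prediction mode classification of Table 1 and as shown in FIG. 5, the method for determining the inter-frame prediction mode includes:
Step 502: judge whether the inter-frame prediction mode of the target pixel block is the skip mode; if it is not, go to step 504; if it is, go to step 512;
Step 504: compare the SATD value with the first threshold T1 and the second threshold T2; if SATD < T1, go to step 506; if T1 ≤ SATD < T2, go to step 508; if SATD ≥ T2, go to step 510;
Step 506: select the inter-frame prediction mode within class II;
Step 508: select the inter-frame prediction mode within class III;
Step 510: select the inter-frame prediction mode within class IV;
Step 512: select the skip mode.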
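In the embodiments of the present application, pre-selecting a class of prediction block sizes based on the motion intensity and texture characteristics of the target pixel block makes it possible to rule out in advance some prediction modes that are unlikely to be chosen, which reduces the complexity of inter-frame prediction.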
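Steps 502 to 512 amount to a small dispatch on the SKIP decision and the two SATD thresholds. The following Python sketch shows that dispatch only; the function name `candidate_class`, its parameters, and the threshold values used in the usage note are placeholders, since the source states only that T1 and T2 are obtained experimentally.

```python
def candidate_class(is_skip: bool, satd_value: float, t1: float, t2: float) -> str:
    """Return the class ("I", "II", "III" or "IV") whose modes are searched."""
    if not t1 < t2:
        raise ValueError("the first threshold must be smaller than the second")
    if is_skip:
        return "I"    # step 512: SKIP mode, nothing else is evaluated
    if satd_value < t1:
        return "II"   # step 506: Inter16x16, Inter16x8, Inter8x16
    if satd_value < t2:
        return "III"  # step 508: Inter8x8, Inter8x4, Inter4x8, Inter4x4
    return "IV"       # step 510: Intra4x4, Intra16x16
```

With placeholder thresholds `t1=1000` and `t2=4000`, an SATD of 1000 falls into class III because the lower bound of that class is inclusive.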
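Further, in an embodiment of the present application, determining the inter-frame prediction mode of the target pixel block according to its average horizontal standard deviation and average vertical standard deviation includes: calculating the average horizontal standard deviation and the average vertical standard deviation of the target pixel block; when the average horizontal standard deviation is greater than a third threshold and the average vertical standard deviation is greater than a fourth threshold, determining that the inter-frame prediction mode of the target pixel block is the N×M prediction mode; when the average horizontal standard deviation is less than or equal to the third threshold, calculating a third rate-distortion cost value of the target pixel block in the N×m1 sub-block prediction mode, and determining the inter-frame prediction mode of the target pixel block according to the third rate-distortion cost value and the second rate-distortion cost value; when the average vertical standard deviation is less than or equal to the fourth threshold, calculating a fourth rate-distortion cost value of the target pixel block in the n1×M sub-block prediction mode, and determining the inter-frame prediction mode of the target pixel block according to the fourth rate-distortion cost value and the second rate-distortion cost value.
In this embodiment, the class II candidate prediction modes include three modes: the N×M prediction mode, the N×m1 sub-block prediction mode, and the n1×M sub-block prediction mode, for example Inter16×16, Inter16×8, and Inter8×16. Inter16×16 predicts the whole pixel block and suits pixel blocks with relatively uniform motion; such blocks lie inside a single moving object, contain no moving-object edges, and have consistent horizontal and vertical texture. Inter16×8 suits pixel blocks whose motion is uniform in the horizontal direction and relatively complex in the vertical direction; such blocks belong to the same moving object horizontally and contain different moving objects vertically, with consistent horizontal texture and relatively rich vertical texture. Inter8×16 suits pixel blocks whose motion is uniform in the vertical direction and relatively complex in the horizontal direction; such blocks belong to the same moving object vertically and contain different moving objects horizontally, with consistent vertical texture and relatively rich horizontal texture. The embodiments of the present application therefore refine the candidate prediction modes further according to the texture consistency of the pixel block in the horizontal and vertical directions.
Texture consistency in the horizontal direction means that all pixel values in each row of the pixel block are approximately equal. The average horizontal standard deviation SD_H is used to detect this type of pixel block, and is calculated as follows: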
$$SD_H=\frac{1}{N}\sum_{y=1}^{N}SD_y \tag{10}$$
where SD_y is the standard deviation of the pixel values in row y, as shown in equation (11):
$$SD_y=\sqrt{\frac{1}{M}\sum_{x=1}^{M}\bigl(p(x,y)-m_y\bigr)^{2}} \tag{11}$$
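In equation (11), p(x,y) denotes the luminance pixel values of the pixel block and m_y denotes the mean of all pixels in row y.
Texture consistency in the vertical direction means that all pixel values in each column of the pixel block are approximately equal. The average vertical standard deviation SD_V is used to detect this type of pixel block, and is calculated as follows: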
$$SD_V=\frac{1}{M}\sum_{x=1}^{M}SD_x \tag{12}$$
where SD_x is the standard deviation of the pixel values in column x, as shown in equation (13):
$$SD_x=\sqrt{\frac{1}{N}\sum_{y=1}^{N}\bigl(p(x,y)-m_x\bigr)^{2}} \tag{13}$$
In equation (13), m_x denotes the mean of all pixels in column x.
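The two texture measures can be sketched in Python as the mean of per-row and per-column standard deviations. This is an illustration under an explicit assumption: the population form of the standard deviation (dividing by the number of samples) is used, which the text does not spell out. The names `avg_horizontal_sd` and `avg_vertical_sd` are invented for the example.

```python
import math


def _std(values) -> float:
    """Population standard deviation of a sequence of pixel values."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def avg_horizontal_sd(block) -> float:
    """SD_H: mean over rows of the standard deviation within each row."""
    return sum(_std(row) for row in block) / len(block)


def avg_vertical_sd(block) -> float:
    """SD_V: mean over columns of the standard deviation within each column."""
    columns = list(zip(*block))
    return sum(_std(col) for col in columns) / len(columns)
```

For a block whose rows are each constant, such as `[[1, 1], [5, 5]]`, SD_H is 0 (horizontal texture is consistent) while SD_V is 2.0, because every column jumps from 1 to 5.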
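Using the above definitions of the average horizontal standard deviation SD_H and the average vertical standard deviation SD_V, the class II candidate prediction mode is determined as follows:
Step 1: calculate the average horizontal standard deviation SD_H and the average vertical standard deviation SD_V of the target pixel block.
Step 2: determine whether SD_H is greater than T3. If SD_H ≤ T3, go to step 3. If SD_H > T3, determine whether SD_V is greater than T4: if SD_H > T3 and SD_V > T4, the inter-frame prediction mode of the current target pixel block is determined to be Inter16×16; if SD_V ≤ T4, go to step 4. Here T3 is the third threshold and T4 is the fourth threshold.
Step 3: if SD_H ≤ T3, the texture of the current target pixel block is consistent in the horizontal direction, and the possible candidate prediction modes are the N×M prediction mode and the N×m1 sub-block prediction mode, that is, Inter16×16 and Inter16×8. N×m1 motion estimation is performed on the target pixel block to obtain the best reference frame and the matching macroblock, the third rate-distortion cost value of the current target pixel block in the N×m1 sub-block prediction mode is then calculated, and the inter-frame prediction mode of the target pixel block is determined according to the third rate-distortion cost value and the second rate-distortion cost value of the N×M prediction mode.
Step 4: if SD_V ≤ T4, the texture of the current target pixel block is consistent in the vertical direction, and the possible candidate prediction modes are the N×M prediction mode and the n1×M sub-block prediction mode, that is, Inter16×16 and Inter8×16. n1×M motion estimation is performed on the target pixel block to obtain the best reference frame and the matching macroblock, the fourth rate-distortion cost value of the target pixel block in the n1×M sub-block prediction mode is then calculated, and the inter-frame prediction mode of the target pixel block is determined according to the fourth rate-distortion cost value and the second rate-distortion cost value of the N×M prediction mode.
In this way, when SATD < T1 is determined, the other classes are ruled out and the best inter-frame prediction mode of the target pixel block is determined only among the class II candidate prediction modes, which narrows the range of prediction modes and effectively reduces the complexity of inter-frame prediction.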
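Steps 2 to 4 can be sketched as a single Python function for the N = M = 16 case. The function name `choose_class_ii_mode` and its signature are hypothetical; in particular, the costs of Inter16×8 and Inter8×16 are passed as zero-argument callables, a design choice of this sketch, so that only the one extra rate-distortion cost the decision actually needs is computed.

```python
def choose_class_ii_mode(sd_h, sd_v, t3, t4, rd_16x16, rd_16x8, rd_8x16) -> str:
    """Pick a class II mode from the texture measures SD_H and SD_V.

    rd_16x16 is the already-known second RD cost; rd_16x8 and rd_8x16 are
    callables returning the third and fourth RD costs on demand.
    """
    if sd_h <= t3:
        # Step 3: horizontally consistent texture, compare with Inter16x8.
        return "Inter16x8" if rd_16x8() < rd_16x16 else "Inter16x16"
    if sd_v > t4:
        # Step 2: SD_H > T3 and SD_V > T4, Inter16x16 is chosen directly.
        return "Inter16x16"
    # Step 4: vertically consistent texture, compare with Inter8x16.
    return "Inter8x16" if rd_8x16() < rd_16x16 else "Inter16x16"
```

When the cost of the sub-block mode equals the Inter16×16 cost, the function keeps Inter16×16, which mirrors the "greater than or equal to" branch described in the text.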
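Further, in an embodiment of the present application, determining the inter prediction mode of the target pixel block according to the third rate-distortion cost and the second rate-distortion cost includes: when the third rate-distortion cost is less than the second rate-distortion cost, determining that the inter prediction mode of the target pixel block is the N×m1 sub-block prediction mode; and when the third rate-distortion cost is greater than or equal to the second rate-distortion cost, determining that the inter prediction mode of the target pixel block is the N×M prediction mode. Determining the inter prediction mode of the target pixel block according to the fourth rate-distortion cost and the second rate-distortion cost includes: when the fourth rate-distortion cost is less than the second rate-distortion cost, determining that the inter prediction mode of the target pixel block is the n1×M sub-block prediction mode; and when the fourth rate-distortion cost is greater than or equal to the second rate-distortion cost, determining that the inter prediction mode of the target pixel block is the N×M prediction mode.
In this embodiment, if SD_H ≤ T3, then after the third rate-distortion cost RDcost(Inter N×m1) of the N×m1 sub-block prediction mode has been calculated: if RDcost(Inter N×m1) < RDcost(Inter N×M), the best inter prediction mode is Inter N×m1; if RDcost(Inter N×m1) ≥ RDcost(Inter N×M), the best inter prediction mode is Inter N×M. For example, with N = 16 and M = 16, if RDcost(Inter16×8) < RDcost(Inter16×16), the best mode is Inter16×8; if RDcost(Inter16×8) ≥ RDcost(Inter16×16), the best mode is Inter16×16.
It should be noted that the third rate-distortion cost RDcost(Inter N×m1) is the sum of the rate-distortion costs of the two N×m1 sub-blocks; in other words, it corresponds to the rate-distortion cost of one whole N×M block.
If SD_V ≤ T4, then after the fourth rate-distortion cost RDcost(Inter n1×M) of the n1×M sub-block prediction mode has been calculated: if RDcost(Inter n1×M) < RDcost(Inter N×M), the best inter prediction mode is Inter n1×M; if RDcost(Inter n1×M) ≥ RDcost(Inter N×M), the best inter prediction mode is Inter N×M. For example, with N = 16 and M = 16, if RDcost(Inter8×16) < RDcost(Inter16×16), the best mode is Inter8×16; if RDcost(Inter8×16) ≥ RDcost(Inter16×16), the best mode is Inter16×16.
It should be noted that the fourth rate-distortion cost RDcost(Inter n1×M) is the sum of the rate-distortion costs of the two n1×M sub-blocks; in other words, it corresponds to the rate-distortion cost of one whole N×M block.
In this way, the best inter prediction mode of the target pixel block is determined among the class II candidate prediction modes, which improves the accuracy of that determination.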
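The class II decision described above (threshold tests on SD_H and SD_V, followed by a single rate-distortion comparison) can be sketched as follows. This is an illustrative outline only: `choose_class2_mode` and the `rd_cost_of` callback are hypothetical names, and the callback stands in for the motion estimation and cost calculation that a real encoder would perform.

```python
def choose_class2_mode(sd_h, sd_v, t3, t4, rd_full, rd_cost_of):
    """Pick among Inter16x16, Inter16x8 and Inter8x16.

    rd_full    -- second RD cost (whole-block N x M mode), assumed already known
    rd_cost_of -- hypothetical callback: mode name -> summed RD cost of its sub-blocks
    """
    if sd_h <= t3:
        # Rows are nearly uniform: test the N x m1 split (step 3).
        candidate = "Inter16x8"
    elif sd_v > t4:
        # Neither direction is uniform: keep the whole block (step 2).
        return "Inter16x16"
    else:
        # Columns are nearly uniform: test the n1 x M split (step 4).
        candidate = "Inter8x16"
    # A sub-block mode wins only when its cost is strictly lower.
    return candidate if rd_cost_of(candidate) < rd_full else "Inter16x16"


mode = choose_class2_mode(1.0, 9.0, t3=2.0, t4=2.0, rd_full=100.0,
                          rd_cost_of=lambda name: 90.0)
```

At most one extra rate-distortion evaluation is triggered per block in this sketch, since only one sub-block candidate is ever compared against the whole-block mode.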
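Further, in an embodiment of the present application, determining, as the inter prediction mode of the target pixel block, the first target candidate prediction mode corresponding to the minimum rate-distortion cost among the plurality of first candidate prediction modes includes: calculating a fifth rate-distortion cost of the target pixel block in the n1×m2 sub-block prediction mode; calculating a sixth rate-distortion cost of the target pixel block in the n2×m1 sub-block prediction mode; calculating a seventh rate-distortion cost of the target pixel block in the n2×m2 sub-block prediction mode; calculating an eighth rate-distortion cost of the target pixel block in the n1×m1 sub-block prediction mode; and determining, as the inter prediction mode of the target pixel block, the first target candidate prediction mode corresponding to the minimum of the fifth, sixth, seventh, and eighth rate-distortion costs.
In this embodiment, the class III candidate prediction modes comprise four modes: Inter n1×m1, Inter n1×m2, Inter n2×m1, and Inter n2×m2, for example Inter8×8, Inter8×4, Inter4×8, and Inter4×4. Pixel blocks that correspond to the class III candidate prediction modes belong to different moving objects in both the horizontal and vertical directions, and their motion is relatively intense. The class III candidate prediction mode is determined as follows:
Step 1: partition each n1×m1 block as one n1×m1 sub-block, as two n1×m2 sub-blocks, as two n2×m1 sub-blocks, and as four n2×m2 sub-blocks; for example, an 8×8 block is partitioned as one 8×8 sub-block (Inter8×8), two 8×4 sub-blocks (Inter8×4), two 4×8 sub-blocks (Inter4×8), and four 4×4 sub-blocks (Inter4×4). Motion estimation is performed for each partition and the rate-distortion cost of each mode is calculated, that is, the fifth rate-distortion cost in the n1×m2 mode, the sixth rate-distortion cost in the n2×m1 mode, the seventh rate-distortion cost in the n2×m2 mode, and the eighth rate-distortion cost in the n1×m1 mode.
It should be noted that the fifth rate-distortion cost RDcost(Inter n1×m2) is the sum of the rate-distortion costs of eight n1×m2 sub-blocks; in other words, it corresponds to the rate-distortion cost of one whole N×M block.
The sixth rate-distortion cost RDcost(Inter n2×m1) is the sum of the rate-distortion costs of eight n2×m1 sub-blocks; in other words, it corresponds to the rate-distortion cost of one whole N×M block.
The seventh rate-distortion cost RDcost(Inter n2×m2) is the sum of the rate-distortion costs of sixteen n2×m2 sub-blocks; in other words, it corresponds to the rate-distortion cost of one whole N×M block.
The eighth rate-distortion cost RDcost(Inter n1×m1) is the sum of the rate-distortion costs of four n1×m1 sub-blocks; in other words, it corresponds to the rate-distortion cost of one whole N×M block.
The minimum of the fifth, sixth, seventh, and eighth rate-distortion costs is then identified, and the first target candidate prediction mode corresponding to that minimum is taken as the inter prediction mode of the target pixel block.
In this way, when T1 ≤ SATD < T2 is determined, the other classes are ruled out and the best inter prediction mode of the target pixel block is selected only among the class III candidate prediction modes, which narrows the range of prediction modes and effectively reduces the complexity of inter prediction.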
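The sub-block counts quoted above (eight, eight, sixteen, and four) follow from n1 = N/2, m1 = M/2, n2 = N/4, m2 = M/4. The sketch below recomputes them and then picks the class III mode with the smallest summed cost. The function names and the per-sub-block cost lists are hypothetical; in an encoder the costs would come from motion estimation.

```python
def class3_partitions(n=16, m=16):
    """Number of sub-blocks that tile an N x M block in each class III mode."""
    n1, m1, n2, m2 = n // 2, m // 2, n // 4, m // 4
    shapes = {
        "n1xm2": (n1, m2),  # e.g. Inter8x4
        "n2xm1": (n2, m1),  # e.g. Inter4x8
        "n2xm2": (n2, m2),  # e.g. Inter4x4
        "n1xm1": (n1, m1),  # e.g. Inter8x8
    }
    return {name: (n * m) // (h * w) for name, (h, w) in shapes.items()}


def choose_class3_mode(sub_block_costs):
    """sub_block_costs maps a mode name to the RD costs of its sub-blocks.

    Each mode's cost is the sum over its sub-blocks, so the four totals are
    comparable whole-block costs; the smallest total wins.
    """
    totals = {mode: sum(costs) for mode, costs in sub_block_costs.items()}
    return min(totals, key=totals.get), totals


counts = class3_partitions()
best, totals = choose_class3_mode({
    "n1xm2": [10] * counts["n1xm2"],
    "n2xm1": [9] * counts["n2xm1"],
    "n2xm2": [6] * counts["n2xm2"],
    "n1xm1": [20] * counts["n1xm1"],
})
```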
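Further, in an embodiment of the present application, determining, as the inter prediction mode of the target pixel block, the second target candidate prediction mode corresponding to the minimum rate-distortion cost among the plurality of second candidate prediction modes includes: calculating a ninth rate-distortion cost of the target pixel block in the n2×m2 intra prediction mode; calculating a tenth rate-distortion cost of the target pixel block in the N×M intra prediction mode; and determining, as the inter prediction mode of the target pixel block, the second target candidate prediction mode corresponding to the smaller of the ninth and tenth rate-distortion costs.
In this embodiment, to improve coding efficiency and the robustness of transmission, a pixel block is allowed to use an intra prediction mode during inter coding, that is, to use the class IV candidate prediction modes. The class IV candidate prediction modes comprise two modes: Intra n2×m2 and Intra N×M, for example Intra4×4 and Intra16×16.
The rate-distortion cost of the Intra n2×m2 mode (the ninth rate-distortion cost) and the rate-distortion cost of the Intra N×M mode (the tenth rate-distortion cost) are calculated, and the mode corresponding to the smaller of the two is selected as the inter prediction mode.
In this way, when SATD ≥ T2 is determined, the other classes are ruled out and the best inter prediction mode of the target pixel block is selected only among the class IV candidate prediction modes, which narrows the range of prediction modes and effectively reduces the complexity of inter prediction.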
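As an example, a Quarter Common Intermediate Format (QCIF) video sequence (176×144 pixels) is encoded, in which each frame contains 99 macroblocks of size 16×16. During encoding, the I-frame, P-frame, and B-frame coding tools are all enabled, and I frames are inserted periodically, one per second of video, with B frames : P frames = 2:1; that is, the encoder encodes the video sequence in the pattern IBBPBBPBBP…. The encoder is set to frame coding mode with a frame rate of 22 frames per second, a motion-estimation search range of W = 16, one forward or backward reference frame, rate-distortion optimized coding enabled, and quantization parameter QP = 28. It should also be noted that if field coding mode is used, the relevant configuration parameters need to be multiplied by 2.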
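The figures in this configuration can be checked with a short calculation. The sketch below assumes that the I frame sits at the start of each second and that the remaining frames repeat the group B, B, P; the helper names are hypothetical.

```python
def macroblocks_per_frame(width, height, mb_size=16):
    """Number of mb_size x mb_size macroblocks in one frame."""
    return (width // mb_size) * (height // mb_size)


def frame_types(frames_per_second, seconds=1):
    """Frame-type string with one I frame per second followed by B, B, P groups."""
    types = []
    for index in range(frames_per_second * seconds):
        position = index % frames_per_second
        if position == 0:
            types.append("I")
        elif position % 3 == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)


qcif_macroblocks = macroblocks_per_frame(176, 144)  # 11 columns x 9 rows
one_second = frame_types(22)
```

With 22 frames per second, one second holds one I frame followed by seven B, B, P groups, which gives 14 B frames and 7 P frames and matches the stated 2:1 ratio.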
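The embodiments of the present application are applied in two cases, P-frame coding and B-frame coding, which are described separately below:
(1) For P-frame coding, assume that the current coding object is one P-frame picture containing 99 macroblocks of size 16×16. The best inter prediction mode is determined as follows:
Step 1: select one macroblock (the target macroblock) in turn from the 99 macroblocks for the inter prediction mode decision.
Step 2: as shown in Figure 5, first perform the early SKIP-mode decision. If the SKIP-mode condition is met, the inter prediction mode of the current macroblock is determined to be SKIP mode and inter prediction ends. Otherwise, perform the prediction-match decision and select the corresponding candidate prediction modes according to the decision condition; that is, compare the SATD value with the first threshold T1 and the second threshold T2. When SATD < T1, the inter prediction mode is selected from class II; when T1 ≤ SATD < T2, it is selected from class III; when SATD ≥ T2, it is selected from class IV.
Step 3: repeat steps 1 and 2 until all 99 macroblocks of the current P-frame picture have been processed.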
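The per-macroblock dispatch of step 2 reduces to one early exit and two threshold comparisons. A minimal sketch, assuming the SKIP test result and the SATD value are computed elsewhere; `candidate_class` is a hypothetical name, and T1 < T2 is required as stated in the text.

```python
def candidate_class(is_skip, satd, t1, t2):
    """Return which candidate set a P-frame macroblock is searched in."""
    if t1 >= t2:
        raise ValueError("the first threshold must be smaller than the second")
    if is_skip:
        return "SKIP"   # early termination, no further search
    if satd < t1:
        return "II"     # Inter16x16, Inter16x8, Inter8x16
    if satd < t2:
        return "III"    # Inter8x8, Inter8x4, Inter4x8, Inter4x4
    return "IV"         # Intra4x4, Intra16x16


# Classify a few macroblocks with illustrative (made-up) SATD values.
classes = [candidate_class(False, s, t1=500, t2=2000) for s in (120, 500, 1999, 2000)]
```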
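(2) For B-frame coding, B-frame coding differs from P-frame coding in the following three respects:
1) B-frame prediction includes forward prediction and backward prediction.
2) The SKIP mode of a B frame is B_SKIP.
3) Intra prediction modes are disabled for B frames; that is, in B-frame coding only the Inter16×16, Inter16×8, Inter8×16, Inter8×8, Inter8×4, Inter4×8, and Inter4×4 modes are considered.
In view of these differences, the inter prediction algorithm for B-frame coding is adjusted slightly. Its steps are as follows:
Step 1: select one macroblock in turn from the 99 macroblocks for the inter prediction mode decision.
Step 2: first perform the early SKIP-mode decision. If the SKIP-mode condition is met, the inter prediction mode of the current macroblock is determined to be B_SKIP mode and inter prediction ends. Otherwise, perform the prediction-match decision (that is, compare the SATD value with the first threshold T1 and the second threshold T2) and select the corresponding candidate prediction modes according to the decision condition. Because intra prediction modes are disabled for B frames, the class IV candidate prediction modes are not considered here.
Step 3: repeat steps 1 and 2 until all 99 macroblocks of the current B-frame picture have been processed.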
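The embodiments of the present application propose a method for quickly determining the inter prediction mode. The method classifies macroblock inter prediction modes according to the motion characteristics of the picture and, using a prediction-block-size preselection criterion based on the intensity of macroblock motion and on texture characteristics, rules out prediction block modes that are unlikely to be chosen. This reduces the number of rate-distortion cost function evaluations and thereby effectively reduces the complexity of inter prediction. For most video sequences, the four modes SKIP, Inter16×16, Inter16×8, and Inter8×16 account for more than 60% of the modes used in inter coding. To decide among these four modes, the embodiments of the present application need at most three motion estimations and rate-distortion cost function evaluations in the worst case. Compared with the 41 motion estimations and 148 cost function evaluations of the full-search algorithm in the H.264/AVC reference code, the embodiments of the present application greatly increase the coding speed of the inter prediction module in a video encoder.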
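The image encoding method provided in the embodiments of the present application may be executed by an image encoding apparatus. The image encoding apparatus provided in the embodiments of the present application is described below, taking as an example an image encoding apparatus that executes the image encoding method.
An embodiment of the present application provides an image encoding apparatus. As shown in Figure 6, the image encoding apparatus 600 includes:
an acquisition module 602, configured to acquire a target frame picture, where the target frame picture includes a plurality of pixel blocks, each pixel block includes N rows and M columns of pixels, and N and M are positive integers;
a determination module 604, configured to determine the sum of absolute transformed differences (SATD) of a target pixel block among the plurality of pixel blocks, and to determine the inter prediction mode of the target pixel block according to the SATD; and
an encoding module 606, configured to inter-code the target pixel block according to the determined inter prediction mode.
In this embodiment, after the target frame picture is acquired, it is divided into a plurality of pixel blocks (macroblocks), each including N rows and M columns of pixels, that is, pixel blocks of size N×M, for example 16×16. Each pixel block may be partitioned according to the macroblock partitioning principle described above. For any pixel block of the target frame picture (the target pixel block), motion estimation with block size N×M is performed to obtain the best reference frame and the matching macroblock. The SATD value of the target pixel block relative to the matching macroblock is then calculated, and the possible mode classes are compared selectively according to the SATD value to obtain the inter prediction mode of the target pixel block. Finally, the target pixel block is inter-coded according to the obtained inter prediction mode. In this way, the embodiments of the present application keep the number of searched modes as small as possible, which reduces the computational complexity of inter prediction mode selection, while also avoiding the loss of coding quality that would result from missing the best inter prediction mode.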
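For readers unfamiliar with SATD, the sketch below shows one common way to compute it: apply a 4×4 Hadamard transform to the difference between the current block and the matching block, then sum the absolute transform coefficients over all 4×4 tiles. This is an assumption about the general technique, not the formula used in this application; in particular, encoders differ in how they normalize the result, and no normalization is applied here. All names are hypothetical.

```python
# 4x4 Hadamard matrix (symmetric, so it equals its own transpose).
H4 = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def matmul4(a, b):
    """Product of two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd4x4(cur, ref):
    """Unnormalized SATD of one 4x4 tile: sum |H * (cur - ref) * H|."""
    diff = [[cur[i][j] - ref[i][j] for j in range(4)] for i in range(4)]
    transformed = matmul4(matmul4(H4, diff), H4)
    return sum(abs(v) for row in transformed for v in row)


def satd(cur, ref):
    """Sum of 4x4 SATDs over a block whose sides are multiples of 4."""
    total = 0
    for y in range(0, len(cur), 4):
        for x in range(0, len(cur[0]), 4):
            cur_tile = [row[x:x + 4] for row in cur[y:y + 4]]
            ref_tile = [row[x:x + 4] for row in ref[y:y + 4]]
            total += satd4x4(cur_tile, ref_tile)
    return total


ones = [[1] * 16 for _ in range(16)]
zeros = [[0] * 16 for _ in range(16)]
value = satd(ones, zeros)
```

A small SATD means the matching macroblock already predicts the target block well, which is why the method uses it to decide how finely the block needs to be partitioned.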
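Further, in an embodiment of the present application, the image encoding apparatus 600 further includes a judgment module, configured to judge whether the inter prediction mode of the target pixel block is skip mode. The encoding module 606 is further configured to inter-code the target pixel block according to skip mode when the inter prediction mode of the target pixel block is skip mode. The determination module 604 is specifically configured to determine the SATD of the target pixel block when the inter prediction mode of the target pixel block is not skip mode.
Further, in an embodiment of the present application, the image encoding apparatus 600 further includes a calculation module, configured to: calculate a first rate-distortion cost of the target pixel block in skip mode; calculate a second rate-distortion cost of the target pixel block in the N×M prediction mode; and calculate the average rate-distortion cost of the already-coded pixel blocks that use the N×M prediction mode in the target frame picture and the other reference frame pictures. The judgment module is specifically configured to determine that the inter prediction mode of the target pixel block is skip mode when the first rate-distortion cost is less than the second rate-distortion cost and the first rate-distortion cost is less than the average rate-distortion cost.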
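The skip-mode test is a conjunction of two strict comparisons. A minimal sketch, assuming the three costs have already been computed; the function and parameter names are hypothetical.

```python
def is_skip_mode(rd_skip, rd_full, rd_avg_coded):
    """Early skip decision.

    rd_skip      -- first RD cost (target block coded in skip mode)
    rd_full      -- second RD cost (target block coded in the N x M mode)
    rd_avg_coded -- average RD cost of already-coded N x M blocks
    """
    return rd_skip < rd_full and rd_skip < rd_avg_coded


decision = is_skip_mode(rd_skip=40.0, rd_full=55.0, rd_avg_coded=60.0)
```

Both conditions must hold: a skip cost that beats the block's own N×M cost but not the running average does not trigger early termination.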
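Further, in an embodiment of the present application, the determination module 604 is specifically configured to: when the SATD is less than a first threshold, determine the inter prediction mode of the target pixel block according to the average horizontal standard deviation and the average vertical standard deviation of the target pixel block; when the SATD is greater than or equal to the first threshold and less than a second threshold, determine, as the inter prediction mode of the target pixel block, the first target candidate prediction mode corresponding to the minimum rate-distortion cost among a plurality of first candidate prediction modes; and when the SATD is greater than or equal to the second threshold, determine, as the inter prediction mode of the target pixel block, the second target candidate prediction mode corresponding to the minimum rate-distortion cost among a plurality of second candidate prediction modes. The first threshold is less than the second threshold; the plurality of first candidate prediction modes include the n1×m1, n1×m2, n2×m1, and n2×m2 sub-block prediction modes; the plurality of second candidate prediction modes include the n2×m2 intra prediction mode and the N×M intra prediction mode; and n1 = N/2, m1 = M/2, n2 = N/4, m2 = M/4.
Further, in an embodiment of the present application, the calculation module is further configured to calculate the average horizontal standard deviation and the average vertical standard deviation of the target pixel block. The determination module 604 is specifically configured to: when the average horizontal standard deviation is greater than a third threshold and the average vertical standard deviation is greater than a fourth threshold, determine that the inter prediction mode of the target pixel block is the N×M prediction mode; when the average horizontal standard deviation is less than or equal to the third threshold, calculate a third rate-distortion cost of the target pixel block in the N×m1 sub-block prediction mode and determine the inter prediction mode of the target pixel block according to the third rate-distortion cost and the second rate-distortion cost; and when the average vertical standard deviation is less than or equal to the fourth threshold, calculate a fourth rate-distortion cost of the target pixel block in the n1×M sub-block prediction mode and determine the inter prediction mode of the target pixel block according to the fourth rate-distortion cost and the second rate-distortion cost.
Further, in an embodiment of the present application, the determination module 604 is specifically configured to: when the third rate-distortion cost is less than the second rate-distortion cost, determine that the inter prediction mode of the target pixel block is the N×m1 sub-block prediction mode; and when the third rate-distortion cost is greater than or equal to the second rate-distortion cost, determine that the inter prediction mode of the target pixel block is the N×M prediction mode. The determination module 604 is further specifically configured to: when the fourth rate-distortion cost is less than the second rate-distortion cost, determine that the inter prediction mode of the target pixel block is the n1×M sub-block prediction mode; and when the fourth rate-distortion cost is greater than or equal to the second rate-distortion cost, determine that the inter prediction mode of the target pixel block is the N×M prediction mode.
Further, in an embodiment of the present application, the calculation module is further configured to: calculate a fifth rate-distortion cost of the target pixel block in the n1×m2 sub-block prediction mode; calculate a sixth rate-distortion cost of the target pixel block in the n2×m1 sub-block prediction mode; calculate a seventh rate-distortion cost of the target pixel block in the n2×m2 sub-block prediction mode; and calculate an eighth rate-distortion cost of the target pixel block in the n1×m1 sub-block prediction mode. The determination module 604 is specifically configured to determine, as the inter prediction mode of the target pixel block, the first target candidate prediction mode corresponding to the minimum of the fifth, sixth, seventh, and eighth rate-distortion costs.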
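The image encoding apparatus 600 in the embodiments of the present application may be a standalone device, or a component, integrated circuit, or chip in a terminal. The device may be a mobile electronic device or a non-mobile electronic device. For example, the mobile electronic device may be a mobile phone, tablet computer, notebook computer, palmtop computer, in-vehicle electronic device, wearable device, ultra-mobile personal computer (UMPC), netbook, or personal digital assistant (PDA); the non-mobile electronic device may be a server, network attached storage (NAS), personal computer (PC), television (TV), teller machine, or self-service kiosk. The embodiments of the present application impose no specific limitation.
The image encoding apparatus 600 in the embodiments of the present application may be a device with an operating system. The operating system may be Android, iOS, or another possible operating system; the embodiments of the present application impose no specific limitation.
The image encoding apparatus 600 provided in the embodiments of the present application can implement each process implemented in the image encoding method embodiment of Figure 4. To avoid repetition, details are not repeated here.
Optionally, as shown in Figure 7, an embodiment of the present application further provides an electronic device 700, including a processor 702, a memory 704, and a program or instructions stored in the memory 704 and executable on the processor 702. When executed by the processor 702, the program or instructions implement each process of the above image encoding method embodiment and achieve the same technical effect. To avoid repetition, details are not repeated here.
It should be noted that the electronic devices in the embodiments of the present application include the mobile and non-mobile electronic devices described above.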
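Figure 8 is a schematic diagram of the hardware structure of an electronic device that implements an embodiment of the present application.
The electronic device 800 includes, but is not limited to, a radio frequency unit 802, a network module 804, an audio output unit 806, an input unit 808, a sensor 810, a display unit 812, a user input unit 814, an interface unit 816, a memory 818, and a processor 820.
Those skilled in the art will understand that the electronic device 800 may further include a power supply (such as a battery) that powers each component. The power supply may be logically connected to the processor 820 through a power management system, so that charging, discharging, and power consumption are managed through the power management system. The structure shown in Figure 8 does not limit the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or arrange components differently, which is not described in detail here.
The processor 820 is configured to: acquire a target frame picture, where the target frame picture includes a plurality of pixel blocks, each pixel block includes N rows and M columns of pixels, and N and M are positive integers; determine the SATD of a target pixel block among the plurality of pixel blocks, and determine the inter prediction mode of the target pixel block according to the SATD; and inter-code the target pixel block according to the determined inter prediction mode.
In this embodiment, after the target frame picture is acquired, it is divided into a plurality of pixel blocks (macroblocks), each including N rows and M columns of pixels, that is, pixel blocks of size N×M, for example 16×16. Each pixel block may be partitioned according to the macroblock partitioning principle described above. For any pixel block of the target frame picture (the target pixel block), motion estimation with block size N×M is performed to obtain the best reference frame and the matching macroblock. The SATD value of the target pixel block relative to the matching macroblock is then calculated, and the possible mode classes are compared selectively according to the SATD value to obtain the inter prediction mode of the target pixel block. Finally, the target pixel block is inter-coded according to the obtained inter prediction mode. In this way, the embodiments of the present application keep the number of searched modes as small as possible, which reduces the computational complexity of inter prediction mode selection, while also avoiding the loss of coding quality that would result from missing the best inter prediction mode.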
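Further, in an embodiment of the present application, the processor 820 is configured to: judge whether the inter prediction mode of the target pixel block is skip mode; when the inter prediction mode of the target pixel block is skip mode, inter-code the target pixel block according to skip mode; and when the inter prediction mode of the target pixel block is not skip mode, determine the SATD of the target pixel block.
Further, in an embodiment of the present application, the processor 820 is configured to: calculate a first rate-distortion cost of the target pixel block in skip mode; calculate a second rate-distortion cost of the target pixel block in the N×M prediction mode; calculate the average rate-distortion cost of the already-coded pixel blocks that use the N×M prediction mode in the target frame picture and the other reference frame pictures; and when the first rate-distortion cost is less than the second rate-distortion cost and the first rate-distortion cost is less than the average rate-distortion cost, determine that the inter prediction mode of the target pixel block is skip mode.
Further, in an embodiment of the present application, the processor 820 is configured to: when the SATD is less than a first threshold, determine the inter prediction mode of the target pixel block according to the average horizontal standard deviation and the average vertical standard deviation of the target pixel block; when the SATD is greater than or equal to the first threshold and less than a second threshold, determine, as the inter prediction mode of the target pixel block, the first target candidate prediction mode corresponding to the minimum rate-distortion cost among a plurality of first candidate prediction modes; and when the SATD is greater than or equal to the second threshold, determine, as the inter prediction mode of the target pixel block, the second target candidate prediction mode corresponding to the minimum rate-distortion cost among a plurality of second candidate prediction modes. The first threshold is less than the second threshold; the plurality of first candidate prediction modes include the n1×m1, n1×m2, n2×m1, and n2×m2 sub-block prediction modes; the plurality of second candidate prediction modes include the n2×m2 intra prediction mode and the N×M intra prediction mode; and n1 = N/2, m1 = M/2, n2 = N/4, m2 = M/4.
Further, in an embodiment of the present application, the processor 820 is configured to: calculate the average horizontal standard deviation and the average vertical standard deviation of the target pixel block; when the average horizontal standard deviation is greater than a third threshold and the average vertical standard deviation is greater than a fourth threshold, determine that the inter prediction mode of the target pixel block is the N×M prediction mode; when the average horizontal standard deviation is less than or equal to the third threshold, calculate a third rate-distortion cost of the target pixel block in the N×m1 sub-block prediction mode and determine the inter prediction mode of the target pixel block according to the third rate-distortion cost and the second rate-distortion cost; and when the average vertical standard deviation is less than or equal to the fourth threshold, calculate a fourth rate-distortion cost of the target pixel block in the n1×M sub-block prediction mode and determine the inter prediction mode of the target pixel block according to the fourth rate-distortion cost and the second rate-distortion cost.
Further, in an embodiment of the present application, the processor 820 is configured to: when the third rate-distortion cost is less than the second rate-distortion cost, determine that the inter prediction mode of the target pixel block is the N×m1 sub-block prediction mode; and when the third rate-distortion cost is greater than or equal to the second rate-distortion cost, determine that the inter prediction mode of the target pixel block is the N×M prediction mode. The processor 820 is further configured to: when the fourth rate-distortion cost is less than the second rate-distortion cost, determine that the inter prediction mode of the target pixel block is the n1×M sub-block prediction mode; and when the fourth rate-distortion cost is greater than or equal to the second rate-distortion cost, determine that the inter prediction mode of the target pixel block is the N×M prediction mode.
Further, in an embodiment of the present application, the processor 820 is configured to: calculate a fifth rate-distortion cost of the target pixel block in the n1×m2 sub-block prediction mode; calculate a sixth rate-distortion cost of the target pixel block in the n2×m1 sub-block prediction mode; calculate a seventh rate-distortion cost of the target pixel block in the n2×m2 sub-block prediction mode; calculate an eighth rate-distortion cost of the target pixel block in the n1×m1 sub-block prediction mode; and determine, as the inter prediction mode of the target pixel block, the first target candidate prediction mode corresponding to the minimum of the fifth, sixth, seventh, and eighth rate-distortion costs.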
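It should be understood that, in the embodiments of the present application, the input unit 808 may include a graphics processing unit (GPU) 8082 and a microphone 8084. The graphics processing unit 8082 processes image data of still pictures or video obtained by an image capture device (such as a camera) in video capture mode or image capture mode. The display unit 812 may include a display panel 8122, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 814 includes at least one of a touch panel 8142 and other input devices 8144. The touch panel 8142, also called a touchscreen, may include two parts: a touch detection device and a touch controller. The other input devices 8144 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and an on/off key), a trackball, a mouse, and a joystick, which are not described in detail here.
The memory 818 may be used to store software programs and various data. The memory 818 may mainly include a first storage area that stores programs or instructions and a second storage area that stores data, where the first storage area may store an operating system and the application programs or instructions required by at least one function (such as a sound playback function or an image playback function). In addition, the memory 818 may include volatile memory or non-volatile memory, or both. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically EPROM (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), or direct Rambus RAM (DRRAM). The memory 818 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
The processor 820 may include one or more processing units. Optionally, the processor 820 integrates an application processor and a modem processor, where the application processor mainly handles operations involving the operating system, user interface, and application programs, and the modem processor, such as a baseband processor, mainly handles wireless communication signals. It can be understood that the modem processor may also not be integrated into the processor 820.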
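An embodiment of the present application further provides a readable storage medium that stores a program or instructions. When executed by a processor, the program or instructions implement each process of the above image encoding method embodiment and achieve the same technical effect. To avoid repetition, details are not repeated here.
The processor is the processor in the electronic device of the above embodiments. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disc.
An embodiment of the present application further provides a chip. The chip includes a processor and a communication interface coupled to the processor, and the processor is configured to run a program or instructions to implement each process of the above image encoding method embodiment and achieve the same technical effect. To avoid repetition, details are not repeated here.
It should be understood that the chip mentioned in the embodiments of the present application may also be called a system-level chip, a system chip, a chip system, or a system-on-chip.
An embodiment of the present application provides a computer program product stored in a storage medium. The program product is executed by at least one processor to implement each process of the above image encoding method embodiment and achieve the same technical effect. To avoid repetition, details are not repeated here.
It should be noted that, in this document, the terms "include", "comprise", and any variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element. In addition, the scope of the methods and apparatuses in the embodiments of the present application is not limited to performing functions in the order shown or discussed; functions may also be performed substantially simultaneously or in reverse order, depending on the functions involved. For example, the described methods may be performed in an order different from that described, and steps may be added, omitted, or combined. Features described with reference to certain examples may also be combined in other examples.
From the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a computer software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal (which may be a mobile phone, computer, server, or network device) to execute the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the specific implementations described, which are illustrative rather than restrictive. Guided by the present application, those of ordinary skill in the art can devise many other forms without departing from the purpose of the present application and the scope protected by the claims, all of which fall within the protection of the present application.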

Claims (13)

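  1. An image encoding method, comprising:
    acquiring a target frame picture, wherein the target frame picture comprises a plurality of pixel blocks, each of the pixel blocks comprises N rows and M columns of pixels, and N and M are positive integers;
    determining a sum of absolute transformed differences (SATD) of a target pixel block among the plurality of pixel blocks;
    determining an inter prediction mode of the target pixel block according to the SATD; and
    inter-coding the target pixel block according to the determined inter prediction mode.
  2. The image encoding method according to claim 1, wherein before determining the SATD of the target pixel block, the method further comprises:
    judging whether the inter prediction mode of the target pixel block is skip mode; and
    when the inter prediction mode of the target pixel block is the skip mode, inter-coding the target pixel block according to the skip mode;
    and wherein determining the SATD of the target pixel block comprises:
    when the inter prediction mode of the target pixel block is not the skip mode, determining the SATD of the target pixel block.
  3. The image encoding method according to claim 2, wherein judging whether the inter prediction mode of the target pixel block is skip mode comprises:
    calculating a first rate-distortion cost of the target pixel block in the skip mode;
    calculating a second rate-distortion cost of the target pixel block in an N×M prediction mode;
    calculating an average rate-distortion cost of already-coded pixel blocks that use the N×M prediction mode in the target frame picture and other reference frame pictures; and
    when the first rate-distortion cost is less than the second rate-distortion cost and the first rate-distortion cost is less than the average rate-distortion cost, determining that the inter prediction mode of the target pixel block is the skip mode.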
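  4. The image encoding method according to claim 3, wherein determining the inter prediction mode of the target pixel block according to the SATD comprises:
    when the SATD is less than a first threshold, determining the inter prediction mode of the target pixel block according to an average horizontal standard deviation and an average vertical standard deviation of the target pixel block;
    when the SATD is greater than or equal to the first threshold and less than a second threshold, determining, as the inter prediction mode of the target pixel block, a first target candidate prediction mode corresponding to the minimum rate-distortion cost among a plurality of first candidate prediction modes; and
    when the SATD is greater than or equal to the second threshold, determining, as the inter prediction mode of the target pixel block, a second target candidate prediction mode corresponding to the minimum rate-distortion cost among a plurality of second candidate prediction modes;
    wherein the first threshold is less than the second threshold, the plurality of first candidate prediction modes comprise an n1×m1 sub-block prediction mode, an n1×m2 sub-block prediction mode, an n2×m1 sub-block prediction mode, and an n2×m2 sub-block prediction mode, the plurality of second candidate prediction modes comprise an n2×m2 intra prediction mode and an N×M intra prediction mode, and n1 = N/2, m1 = M/2, n2 = N/4, m2 = M/4.
  5. The image encoding method according to claim 4, wherein determining the inter prediction mode of the target pixel block according to the average horizontal standard deviation and the average vertical standard deviation of the target pixel block comprises:
    calculating the average horizontal standard deviation and the average vertical standard deviation of the target pixel block;
    when the average horizontal standard deviation is greater than a third threshold and the average vertical standard deviation is greater than a fourth threshold, determining that the inter prediction mode of the target pixel block is the N×M prediction mode;
    when the average horizontal standard deviation is less than or equal to the third threshold, calculating a third rate-distortion cost of the target pixel block in an N×m1 sub-block prediction mode, and determining the inter prediction mode of the target pixel block according to the third rate-distortion cost and the second rate-distortion cost; and
    when the average vertical standard deviation is less than or equal to the fourth threshold, calculating a fourth rate-distortion cost of the target pixel block in an n1×M sub-block prediction mode, and determining the inter prediction mode of the target pixel block according to the fourth rate-distortion cost and the second rate-distortion cost.
  6. The image encoding method according to claim 5, wherein determining the inter prediction mode of the target pixel block according to the third rate-distortion cost and the second rate-distortion cost comprises:
    when the third rate-distortion cost is less than the second rate-distortion cost, determining that the inter prediction mode of the target pixel block is the N×m1 sub-block prediction mode; and
    when the third rate-distortion cost is greater than or equal to the second rate-distortion cost, determining that the inter prediction mode of the target pixel block is the N×M prediction mode;
    and wherein determining the inter prediction mode of the target pixel block according to the fourth rate-distortion cost and the second rate-distortion cost comprises:
    when the fourth rate-distortion cost is less than the second rate-distortion cost, determining that the inter prediction mode of the target pixel block is the n1×M sub-block prediction mode; and
    when the fourth rate-distortion cost is greater than or equal to the second rate-distortion cost, determining that the inter prediction mode of the target pixel block is the N×M prediction mode.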
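  7. The image encoding method according to any one of claims 4 to 6, wherein determining, as the inter prediction mode of the target pixel block, the first target candidate prediction mode corresponding to the minimum rate-distortion cost among the plurality of first candidate prediction modes comprises:
    calculating a fifth rate-distortion cost of the target pixel block in the n1×m2 sub-block prediction mode;
    calculating a sixth rate-distortion cost of the target pixel block in the n2×m1 sub-block prediction mode;
    calculating a seventh rate-distortion cost of the target pixel block in the n2×m2 sub-block prediction mode;
    calculating an eighth rate-distortion cost of the target pixel block in the n1×m1 sub-block prediction mode; and
    determining, as the inter prediction mode of the target pixel block, the first target candidate prediction mode corresponding to the minimum of the fifth rate-distortion cost, the sixth rate-distortion cost, the seventh rate-distortion cost, and the eighth rate-distortion cost.
  8. An image encoding apparatus, comprising:
    an acquisition module, configured to acquire a target frame picture, wherein the target frame picture comprises a plurality of pixel blocks, each of the pixel blocks comprises N rows and M columns of pixels, and N and M are positive integers;
    a determination module, configured to determine a sum of absolute transformed differences (SATD) of a target pixel block among the plurality of pixel blocks, and to determine an inter prediction mode of the target pixel block according to the SATD; and
    an encoding module, configured to inter-code the target pixel block according to the determined inter prediction mode.
  9. An electronic device, comprising a processor, a memory, and a program or instructions stored in the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the image encoding method according to any one of claims 1 to 7.
  10. A readable storage medium storing a program or instructions, wherein the program or instructions, when executed by a processor, implement the steps of the image encoding method according to any one of claims 1 to 7.
  11. An electronic device, configured to perform the steps of the image encoding method according to any one of claims 1 to 7.
  12. A chip, comprising a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is configured to run a program or instructions to implement the steps of the image encoding method according to any one of claims 1 to 7.
  13. A computer program product, comprising a computer program for performing the steps of the image encoding method according to any one of claims 1 to 7.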
PCT/CN2022/143660 2022-01-04 2022-12-29 Image encoding method, image encoding apparatus, electronic device, and readable storage medium WO2023131059A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210003592.7 2022-01-04
CN202210003592.7A CN114339218A (zh) 2022-01-04 2022-01-04 Image encoding method, image encoding apparatus, electronic device, and readable storage medium

Publications (1)

Publication Number Publication Date
WO2023131059A1 true WO2023131059A1 (zh) 2023-07-13

Family

ID=81023254

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/143660 WO2023131059A1 (zh) 2022-01-04 2022-12-29 Image encoding method, image encoding apparatus, electronic device, and readable storage medium

Country Status (2)

Country Link
CN (1) CN114339218A (zh)
WO (1) WO2023131059A1 (zh)

Also Published As

Publication number Publication date
CN114339218A (zh) 2022-04-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22918500

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE