CN114339218A - Image encoding method, image encoding device, electronic apparatus, and readable storage medium - Google Patents


Info

Publication number
CN114339218A
CN114339218A
Authority
CN
China
Prior art keywords
prediction mode
pixel block
target pixel
distortion cost
cost value
Prior art date
Legal status
Pending
Application number
CN202210003592.7A
Other languages
Chinese (zh)
Inventor
张勇
Current Assignee
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd
Priority to CN202210003592.7A
Publication of CN114339218A
Priority to PCT/CN2022/143660 (WO2023131059A1)

Classifications

    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/109: Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • H04N 19/124: Quantisation
    • H04N 19/176: Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N 19/42: Implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N 19/51: Motion estimation or motion compensation
    • H04N 19/625: Transform coding using discrete cosine transform [DCT]
    • H04N 19/91: Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
    • H04N 19/96: Tree coding, e.g. quad-tree coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Discrete Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application discloses an image encoding method, an image encoding device, an electronic device, and a readable storage medium, belonging to the technical field of image processing. The image encoding method includes: acquiring a target frame image, where the target frame image comprises a plurality of pixel blocks, each pixel block comprises N rows and M columns of pixels, and N and M are positive integers; determining the sum of absolute transformed differences (SATD) of a target pixel block among the plurality of pixel blocks; determining an inter-frame prediction mode of the target pixel block according to the SATD; and inter-coding the target pixel block according to the determined inter-frame prediction mode.

Description

Image encoding method, image encoding device, electronic apparatus, and readable storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image encoding method, an image encoding apparatus, an electronic device, and a readable storage medium.
Background
Video coding starts from the high correlation of video signals and the visual characteristics of the human eye, and eliminates the redundancy arising from these correlations and characteristics through an appropriate coding scheme, so as to compress the video signal and reduce the transmission bit rate. The correlation of a video signal can be divided into temporal correlation and spatial correlation. Temporal correlation refers to the similarity between adjacent images in an image sequence: in a video sequence, neighboring frames often contain the same background and objects, with only their spatial positions changed by camera rotation or object motion, so the sequence is highly correlated in the temporal domain. Inter-Frame Prediction coding is therefore generally adopted, that is, matching content across consecutive frames of the video sequence is found and predicted, thereby reducing redundancy.
At present, in order to describe the motion of different objects more accurately and improve the accuracy of inter-frame prediction, inter-frame prediction mainly adopts partitioning with variable block sizes. However, for the inter-frame prediction module, the luminance signal of each macroblock must traverse 7 block-size prediction modes, the rate-distortion cost function must be calculated for each prediction mode, and finally the mode with the minimum rate-distortion cost is selected as the optimal prediction mode. The computational complexity of this rate-distortion-optimization-based inter-frame prediction mode selection algorithm is therefore very high, which seriously affects the real-time performance of the encoder.
Disclosure of Invention
An object of the embodiments of the present application is to provide an image encoding method, an image encoding apparatus, an electronic device, and a readable storage medium, which can solve the problem in the related art that an inter-frame prediction mode selection algorithm based on rate distortion optimization has a high computational complexity.
In a first aspect, an embodiment of the present application provides an image encoding method, including:
acquiring a target frame image, wherein the target frame image comprises a plurality of pixel blocks, each pixel block comprises N rows and M columns of pixels, and N and M are positive integers;
determining a sum of absolute transformed differences (SATD) of a target pixel block among the plurality of pixel blocks;
determining an inter-frame prediction mode of the target pixel block according to the SATD;
and performing inter-frame coding on the target pixel block according to the determined inter-frame prediction mode.
In a second aspect, an embodiment of the present application provides an image encoding apparatus, including:
an acquisition module, configured to acquire a target frame image, wherein the target frame image comprises a plurality of pixel blocks, each pixel block comprises N rows and M columns of pixels, and N and M are positive integers;
a determining module, configured to determine the SATD of a target pixel block among the plurality of pixel blocks and to determine an inter-frame prediction mode of the target pixel block according to the SATD;
and a coding module, configured to inter-code the target pixel block according to the determined inter-frame prediction mode.
In a third aspect, embodiments of the present application provide an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, where the program or instructions, when executed by the processor, implement the steps of the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium on which a program or instructions are stored, which when executed by a processor, implement the steps of the method according to the first aspect.
In a fifth aspect, embodiments of the present application provide a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.
In the embodiment of the present application, after the target frame image is acquired, it is divided into a plurality of pixel blocks (i.e., macroblocks). Each pixel block contains N rows and M columns of pixels, i.e., has a block size of N × M, for example 16 × 16, and each pixel block can be partitioned according to the macroblock partitioning principle. For any pixel block of the target frame image (i.e., the target pixel block), motion estimation at the N × M block size is performed to obtain the best reference frame and the matching macroblock. The SATD value of the target pixel block relative to the matching macroblock is then calculated, and the possible classes of prediction modes are selectively compared according to the SATD value to obtain the inter-frame prediction mode of the target pixel block. Finally, the target pixel block is inter-coded according to the obtained inter-frame prediction mode. In this way, on the one hand, the number of searched modes is reduced as much as possible, lowering the computational complexity of inter-frame prediction mode selection; on the other hand, the optimal inter-frame prediction mode is prevented from being missed, which would otherwise reduce the encoding quality.
Drawings
Fig. 1 is a schematic diagram of a rate-distortion cost function calculation flow according to an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating the structural partitioning of a macroblock according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating the structural division of sub-blocks according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating an image encoding method according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating a method for determining an inter prediction mode according to an embodiment of the present application;
FIG. 6 is a schematic block diagram of an image encoding apparatus according to an embodiment of the present application;
FIG. 7 is one of the schematic block diagrams of an electronic device of an embodiment of the present application;
fig. 8 is a second schematic block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present disclosure.
The terms first, second and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used are interchangeable under appropriate circumstances such that the embodiments of the application are capable of operation in sequences other than those illustrated or described herein. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally indicates that the objects before and after it are in an "or" relationship.
The correlation of a video signal is classified into a temporal correlation and a spatial correlation. The spatial correlation refers to the similarity between adjacent pixels in the same image, and is mainly eliminated by Intra-Frame Prediction (Intra-Frame Prediction) coding, that is, the value of the current pixel is predicted by using the pixels adjacent to the periphery in the same Frame image. The temporal correlation refers to the similarity between adjacent images in an image sequence, and for a video sequence, adjacent frames before and after the video sequence often contain the same background and object, but the spatial position is changed due to the rotation of a lens or the movement of the object, so that the video sequence has strong correlation in the temporal domain. Inter-frame predictive coding, i.e., matching successive image contents in frames of a video sequence and predicting the matched contents, is generally used to reduce redundancy.
Inter prediction aims at eliminating temporal redundancy, i.e., the picture currently to be coded is predicted using previously coded pictures; it includes forward prediction (P frames) and bi-directional prediction (B frames). Inter prediction searches for matching macroblocks through macroblock-based motion estimation, and the motion vectors pointing to the matching macroblocks may have integer-pixel or sub-pixel precision. In H.264/Advanced Video Coding (AVC), the coding efficiency of inter-frame prediction is greatly improved. Inter prediction is traditionally performed with 16 × 16 macroblocks, but such a fixed-size partition often lacks flexibility; in particular, a larger macroblock may contain content with different motion characteristics and cannot accurately describe all the motion details within one macroblock. H.264/AVC therefore adopts inter prediction with variable block size: the prediction block size can vary from a maximum of 16 × 16 down to 4 × 4, i.e., the optimal inter prediction block size is adaptively selected according to the image characteristics and motion characteristics. Variable-block-size prediction provides more choices for the inter-frame prediction of a macroblock; especially when a macroblock contains several moving objects or lies on the edge of a moving object, variable block sizes can describe the motion of different objects more accurately, thereby improving the accuracy of inter-frame prediction. The variable-block-size inter-frame prediction technique of H.264/AVC greatly improves the efficiency of predictive coding, but also brings an obvious increase in computational complexity.
Specifically, the inter-frame prediction algorithm of H.264/AVC employs a coding technique of tree-structured partitioning and motion estimation. Tree-structured partitioning means that each macroblock can be partitioned in 4 ways, as shown in fig. 2: one 16 × 16 macroblock, two 16 × 8 sub-blocks, two 8 × 16 sub-blocks, or four 8 × 8 sub-blocks. Each sub-block of the 8 × 8 mode (sub-block division) may be further partitioned in 4 ways, as shown in fig. 3: one 8 × 8 sub-block, two 8 × 4 sub-blocks, two 4 × 8 sub-blocks, or four 4 × 4 sub-blocks.
When inter-frame coding is performed, each partition mode must be tried once: the minimum cost achievable by each possible partition mode of the macroblock is calculated through motion estimation, and the partition mode corresponding to the minimum cost is selected as the best partition mode of the macroblock.
The chrominance components (Cr and Cb) of a macroblock are half the size of the corresponding luminance component, both horizontally and vertically. The chroma blocks use the same partitioning pattern as the luma blocks except that the size is halved in each direction. The Motion Vector (MV) of a chrominance block is obtained by halving the horizontal and vertical components of the corresponding luminance MV.
The tree-structured partitioning technique adopted by h.264/AVC makes it possible to adaptively decide the macroblock partitioning mode, rather than a single mode, for the partitioning of macroblocks, in order to best describe the motion details of a macroblock. In the inter-prediction coding mode, each block has an MV to be encoded and transmitted, and block selection information is also encoded into the encoded code stream. In the large-size block, only a small number of bytes are needed for transmitting the block selection information and the MV, but the prediction precision is low, the energy of a residual signal to be coded is large, and more bytes are needed; small-sized blocks have high motion estimation accuracy, and can obtain residual errors with low energy, but on the contrary, each sub-block needs to transmit an MV, and the sub-block type information also needs more coding bits. Therefore, how to make a compromise between the two is an important issue to be considered when designing the inter-prediction algorithm.
In H.264/AVC coding, to obtain the best inter prediction block mode, Rate Distortion Optimization (RDO) is usually used to select the best prediction block size, i.e., a trade-off is made between the number of bits used and the distortion for each candidate mode. In the H.264/AVC standard, for inter prediction, the luminance signal of each macroblock traverses 7 prediction modes, namely Inter16 × 16, Inter16 × 8, Inter8 × 16, Inter8 × 8, Inter8 × 4, Inter4 × 8 and Inter4 × 4; a rate-distortion cost function is calculated for each, the cost functions of the modes are compared, and the mode with the smallest cost function is selected as the optimal inter prediction mode. The whole inter-frame prediction rate-distortion cost function calculation process is shown in fig. 1. Obviously, during this process the encoder repeatedly performs the following calculations in each prediction mode: motion estimation, motion compensation, Discrete Cosine Transform (DCT)/quantization, inverse quantization/Inverse DCT (IDCT), and entropy coding.
Wherein the rate-distortion cost function is defined as:
J(s, c, IMODE | QP, λ_MODE) = SSD(s, c, IMODE | QP) + λ_MODE · R(s, c, IMODE | QP)   (1)
In formula (1), QP is the quantization parameter, IMODE represents one of all available inter-frame prediction modes, s represents the original pixel values of the luminance block, and c represents the reconstructed values obtained through DCT transform, quantization, inverse quantization and IDCT transform. R(s, c, IMODE | QP) represents the number of coding bits when the IMODE mode is selected under the given QP, including the bits for coding the prediction mode and the bits for coding the luminance transform coefficients; Context Adaptive Variable Length Coding (CAVLC) or Context Adaptive Binary Arithmetic Coding (CABAC) is used to calculate the number of coding bits. λ_MODE is the Lagrangian multiplier for mode selection, defined as λ_MODE = 0.85 × 2^((QP - 12)/3). SSD(s, c, IMODE | QP) is the sum of squared differences between s and c, with c obtained under the given QP and IMODE. Letting the partition contain B1 columns and B2 rows of pixels:
SSD(s, c, IMODE | QP) = Σ_{x=1..B1} Σ_{y=1..B2} ( s(x, y) - c(x, y) )²   (2)
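For concreteness, the following minimal Python sketch evaluates formula (1) for one candidate mode. The array inputs and the bit count `rate_bits` are illustrative placeholders (in a real encoder R would come from CAVLC/CABAC bit counting); only the λ_MODE definition and the SSD/J formulas above are taken from the text.

```python
import numpy as np

def lagrange_multiplier(qp: int) -> float:
    # lambda_MODE = 0.85 * 2^((QP - 12) / 3), as defined for formula (1)
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)

def ssd(orig: np.ndarray, recon: np.ndarray) -> float:
    # Sum of squared differences over the partition, formula (2)
    diff = orig.astype(np.int64) - recon.astype(np.int64)
    return float(np.sum(diff * diff))

def rd_cost(orig: np.ndarray, recon: np.ndarray, rate_bits: int, qp: int) -> float:
    # J = SSD + lambda_MODE * R, formula (1)
    return ssd(orig, recon) + lagrange_multiplier(qp) * rate_bits

# Example: a 16x16 luminance block with QP = 28 and a hypothetical bit count
orig = np.random.randint(0, 256, (16, 16))
recon = np.clip(orig + np.random.randint(-3, 4, (16, 16)), 0, 255)
print(rd_cost(orig, recon, rate_bits=120, qp=28))
```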
in H.264/AVC inter-coding, its inter-prediction supports an intra-prediction mode and a SKIP (SKIP) mode in addition to the above-described 7 prediction modes. Intra prediction includes two types of Intra4 × 4 and Intra16 × 16, and there are 9 prediction modes for Intra4 × 4 and 4 prediction modes for Intra16 × 16. The SKIP mode is a special inter-frame 16 × 16 mode, and the SKIP mode only aims at macro block coding, namely a macro block is not required to be coded at all, and only needs to be marked as a SKIP macro block in a code stream. The SKIP macro block comprises a P _ SKIP type macro block and a B _ SKIP type macro block, the P _ SKIP type macro block is a COPY macro block, no Motion Vector residual (MVD) exists, no quantization residual is coded, when decoding, a Motion Vector Prediction value (MVP) is directly used as a Motion Vector to obtain a pixel Prediction value, and a decoded pixel reconstruction value is equal to the pixel Prediction value; the B _ SKIP type macro block has no MVD and does not encode quantized residual, the forward and backward MVs are calculated through a Direct prediction mode during decoding to obtain a pixel prediction value, and the pixel reconstruction value is equal to the pixel prediction value.
In the H.264/AVC standard reference code, the inter prediction mode selection process for one macroblock includes: performing motion estimation for the one 16 × 16 macroblock, the two 16 × 8 sub-blocks and the two 8 × 16 sub-blocks, calculating the rate-distortion cost value of each corresponding mode, and selecting the mode with the minimum rate-distortion cost value among these three modes as a candidate mode. Each of the four 8 × 8 sub-blocks of the 16 × 16 macroblock is then processed: the 8 × 8 block is divided into one 8 × 8 sub-block, two 8 × 4 sub-blocks, two 4 × 8 sub-blocks, or four 4 × 4 sub-blocks, motion estimation is performed, the rate-distortion cost value of each corresponding mode is calculated, and the mode with the minimum rate-distortion cost value among these four modes is selected as the candidate mode of that 8 × 8 sub-block. The rate-distortion cost values of the candidate modes of the four 8 × 8 blocks are added to obtain the rate-distortion cost value of coding the 16 × 16 macroblock in the P8 × 8 mode, where P8 × 8 covers Inter8 × 8, Inter8 × 4, Inter4 × 8 and Inter4 × 4. The rate-distortion cost values of the Intra16 × 16 and Intra4 × 4 modes are calculated, and the mode with the minimum rate-distortion cost value is selected as the intra prediction candidate mode. The rate-distortion cost value of the SKIP mode is also calculated. Finally, the mode with the minimum rate-distortion cost value among Inter16 × 16, Inter16 × 8, Inter8 × 16, P8 × 8, the SKIP mode and the intra prediction candidate mode is selected as the optimal inter prediction mode of the 16 × 16 macroblock.
From the above steps, the number of motion estimations and rate-distortion cost calculations for one 16 × 16 macroblock is: 1 for the 16 × 16 macroblock, 2 for the 16 × 8 sub-blocks, 2 for the 8 × 16 sub-blocks, 4 for the 8 × 8 sub-blocks, 8 for the 8 × 4 sub-blocks, 8 for the 4 × 8 sub-blocks, and 16 for the 4 × 4 sub-blocks, for a total of 1+2+2+4+8+8+16 = 41 times. Meanwhile, the number of mode combinations for intra prediction of the macroblock is 16 × 9 + 4 = 148, i.e., 148 cost functions need to be calculated to select the best intra prediction mode. Obviously, such an algorithm that traverses all modes is computationally expensive and places a heavy computational burden on the encoder.
In the above manner, the inter-frame prediction mode selection algorithm based on rate-distortion optimization has very high computational complexity and becomes a bottleneck affecting the real-time performance of the encoder. Therefore, reducing the complexity of the search algorithm and improving the real-time performance of the encoder, without increasing the bit rate or degrading the quality of the coded images, is a key problem that needs to be solved for the practical application of inter-frame prediction.
The image encoding method, the image encoding apparatus, the electronic device, and the readable storage medium provided in the embodiments of the present application are described in detail below with reference to the accompanying drawings by specific embodiments and application scenarios thereof.
An embodiment of the present application provides an image encoding method, as shown in fig. 4, the image encoding method includes:
step 402, obtaining a target frame image, wherein the target frame image comprises a plurality of pixel blocks, each pixel block comprises N rows and M columns of pixels, and N and M are positive integers;
step 404, determining the sum of absolute transformed differences (SATD) of a target pixel block among the plurality of pixel blocks;
step 406, determining an inter-frame prediction mode of the target pixel block according to the SATD;
and step 408, inter-coding the target pixel block according to the determined inter-prediction mode.
In this embodiment, with tree-structured partitioning as shown in fig. 2 and 3, blocks in smooth regions in an image tend to select larger partition types, while blocks with complex textures tend to select smaller partition types; blocks with less motion tend to select larger partition types, while blocks with more motion tend to select smaller partition types.
The sum of absolute transformed differences (SATD) is obtained by transforming the prediction residual and summing the absolute values of the transform coefficients; it reflects not only the distortion (the degree of matching with the prediction block) but also, to some extent, the size of the generated bitstream. For macroblocks with fine texture, rich detail, severe motion, or moving edge regions, the SATD value is usually large, whereas for macroblocks with simple texture, little detail variation and smooth motion, the SATD value is usually small. Based on this, the embodiment of the present application provides a method for determining the inter-frame prediction mode that combines the macroblock SATD value with texture complexity: the distribution of the macroblock SATD values is used to judge whether a macroblock is located in a severe-motion area or a smooth-motion area, and the texture complexity of the image is used to judge whether the macroblock contains complex texture information. In the embodiment of the present application, the macroblock partitioning principle is: in areas with severe motion, rich detail and fine texture, smaller blocks are selected; in areas with smooth motion, little detail variation and simple texture, larger blocks are selected.
According to the image motion characteristics, the embodiment of the application classifies the inter-frame prediction modes of macroblocks and then selectively compares only the possible classes according to the SATD value. This reduces the number of rate-distortion cost function calculations while keeping the reconstructed image quality and coding bit rate unchanged, and still estimates the inter-frame prediction mode accurately.
Specifically, after the target frame image is acquired, it is divided into a plurality of pixel blocks (i.e., macroblocks), each containing N rows and M columns of pixels, i.e., a pixel block with a block size of N × M, for example 16 × 16, and each pixel block is partitioned according to the macroblock partitioning principle. For any pixel block of the target frame image (i.e., the target pixel block), motion estimation at the N × M block size is performed to obtain the best reference frame and the matching macroblock. The SATD value of the target pixel block relative to the matching macroblock is calculated, and the possible classes are selectively compared according to the SATD value to obtain the inter-frame prediction mode of the target pixel block. Finally, the target pixel block is inter-coded according to the obtained inter-frame prediction mode.
In this way, on the one hand, the number of searched modes is reduced as much as possible, lowering the computational complexity of inter-frame prediction mode selection; on the other hand, the optimal inter prediction mode is prevented from being missed, which would otherwise reduce the encoding quality.
Further, in an embodiment of the present application, before determining the SATD of the target pixel block, the image encoding method further includes: judging whether the inter-frame prediction mode of the target pixel block is the skip mode; and, in the case where the inter-frame prediction mode of the target pixel block is the skip mode, inter-coding the target pixel block according to the skip mode. Determining the SATD of the target pixel block then includes: in the case where the inter prediction mode of the target pixel block is not the skip mode, determining the SATD of the target pixel block.
In this embodiment, there are regions in a video sequence that are spatially uniform or temporally stationary, such as background regions of the image. In this type of region, coding is usually done with larger block sizes, such as the SKIP mode or the Inter16 × 16 prediction mode. The SKIP mode needs no residual coding, its process is simple, and its computational complexity is low; therefore, the SKIP mode can be detected in advance, and if it is detected, the complex RDO calculations of the other modes can be avoided.
Specifically, before determining the SATD of the target pixel block, it is first judged whether the inter-frame prediction mode of the target pixel block is the SKIP mode, and if it is, the SKIP mode is used directly for inter-coding the target pixel block.
By judging the SKIP mode first, the mode selection process is terminated early when the inter-frame prediction mode of the target pixel block is determined to be the SKIP mode, which reduces the complexity of the inter-frame prediction algorithm and improves the real-time performance of the video encoder.
Further, in an embodiment of the present application, judging whether the inter prediction mode of the target pixel block is the skip mode includes: calculating a first rate-distortion cost value of the target pixel block in the skip mode; calculating a second rate-distortion cost value of the target pixel block in the N × M prediction mode; calculating the average rate-distortion cost value of the already-coded pixel blocks that adopt the N × M prediction mode in the target frame image and the other reference frame images; and determining that the inter-frame prediction mode of the target pixel block is the skip mode in the case where the first rate-distortion cost value is smaller than the second rate-distortion cost value and smaller than the average rate-distortion cost value.
In this embodiment, the method for determining the SKIP mode specifically includes:
Step 1: calculate the first rate-distortion cost value in the SKIP mode. The SKIP mode motion vector is equal to the predicted motion vector and the number of coded bits is 0, so based on formula (1), the rate-distortion cost value RDcost(SKIP) of the SKIP mode is:
RDcost(SKIP)=SSD(s,c|QP) (3)
Step 2: calculate the second rate-distortion cost value in the N × M prediction mode, i.e., the Inter16 × 16 prediction mode. Perform 16 × 16 block-size motion estimation on the target pixel block to obtain the best reference frame and matching macroblock, and then calculate the rate-distortion cost value RDcost(Inter16 × 16) of the Inter16 × 16 mode.
Step 3: calculate the average rate-distortion cost value avgRDcost(Inter16 × 16) of all already-coded pixel blocks that adopt the Inter16 × 16 inter prediction mode in the current target frame image and in the other reference frames of the target frame image.
Step 4: if the following 2 preset conditions are met, the inter-frame prediction mode of the current target pixel block is judged to be the SKIP mode. The preset conditions are:
RDcost(SKIP)<RDcost(Inter16×16) (4)
RDcost(SKIP)<avgRDcost(Inter16×16) (5)
In this way, the SKIP mode judgment is realized, so that when the inter-frame prediction mode of the target pixel block is determined to be the SKIP mode, the mode selection process is terminated early, the complexity of the inter-frame prediction algorithm is reduced, and the real-time performance of the video encoder is improved.
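A minimal sketch of this early SKIP decision is given below, assuming the three cost values have already been computed as in steps 1 to 3; the function and argument names are illustrative, not taken from the patent.

```python
def is_skip_mode(rdcost_skip: float,
                 rdcost_inter16x16: float,
                 avg_rdcost_inter16x16: float) -> bool:
    # Conditions (4) and (5): SKIP is chosen only if its cost beats both the
    # Inter16x16 cost of this block and the average Inter16x16 cost of
    # already-coded blocks in the current and reference frames.
    return (rdcost_skip < rdcost_inter16x16 and
            rdcost_skip < avg_rdcost_inter16x16)
```

If the function returns True, mode selection for the macroblock terminates immediately with the SKIP mode.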
Note that when the SKIP mode is judged first, the N × M block-size motion estimation needed to compute the SATD of the target pixel block has already been performed, namely the motion estimation of step 2, and therefore does not need to be repeated.
Further, in an embodiment of the present application, determining the inter prediction mode of the target pixel block according to the SATD includes: in the case where the SATD is smaller than a first threshold, determining the inter-frame prediction mode of the target pixel block according to the average horizontal standard deviation and the average vertical standard deviation of the target pixel block; in the case where the SATD is greater than or equal to the first threshold and smaller than a second threshold, determining the first target candidate prediction mode corresponding to the minimum rate-distortion cost value among a plurality of first candidate prediction modes as the inter-frame prediction mode of the target pixel block; and in the case where the SATD is greater than or equal to the second threshold, determining the second target candidate prediction mode corresponding to the minimum rate-distortion cost value among a plurality of second candidate prediction modes as the inter prediction mode of the target pixel block; where the first threshold is smaller than the second threshold, the plurality of first candidate prediction modes include the N1 × M1, N1 × M2, N2 × M1 and N2 × M2 sub-block prediction modes, the plurality of second candidate prediction modes include the N2 × M2 intra prediction mode and the N × M intra prediction mode, and N1 = N/2, M1 = M/2, N2 = N/4, M2 = M/4.
In this embodiment, after motion estimation of N × M block size, the best reference frame and the matching macroblock are obtained, and then the SATD value of the target pixel block may be calculated as follows:
SATD(s,c)=∑|T{s(x,y)-c(x,y)}| (6)
In formula (6), s represents the original luminance pixel values, c represents the reconstructed values, T represents the Hadamard transform, x indexes the columns of the target pixel block (x = 1, 2, ..., M) and y indexes the rows (y = 1, 2, ..., N). Let H_i denote the Hadamard matrix of order i; then:
T(w) = H_i × w × H_i   (7)
The Hadamard matrix H_i can be obtained by recursion:
H_2 = [[1, 1], [1, -1]]   (8)
H_2i = [[H_i, H_i], [H_i, -H_i]]   (9)
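The following sketch illustrates formulas (6) to (9) for a square block whose side is a power of two, applying the Hadamard transform to the whole residual block. Practical encoders often apply a 4 × 4 or 8 × 8 Hadamard transform per sub-block instead, so this is only an illustration of the formulas, not a claim about any particular implementation.

```python
import numpy as np

def hadamard(order: int) -> np.ndarray:
    # Recursive construction, formulas (8) and (9): H_2i = [[H_i, H_i], [H_i, -H_i]]
    h = np.array([[1, 1], [1, -1]])
    while h.shape[0] < order:
        h = np.block([[h, h], [h, -h]])
    return h

def satd(s: np.ndarray, c: np.ndarray) -> int:
    # Formula (6): SATD = sum |T{s - c}|, with T(w) = H_i * w * H_i (formula (7))
    residual = s.astype(np.int64) - c.astype(np.int64)
    h = hadamard(residual.shape[0])
    transformed = h @ residual @ h
    return int(np.abs(transformed).sum())

# Example on a 16x16 block with random original and reconstructed values
s = np.random.randint(0, 256, (16, 16))
c = np.random.randint(0, 256, (16, 16))
print(satd(s, c))
```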
the embodiment of the present application classifies inter-macroblock prediction modes, including: class I, class II, class III, and class IV, wherein class I includes SKIP mode, class II includes nxm prediction mode, nxm 1 subblock prediction mode, and N1 xm subblock prediction mode, class III includes N1 xm 1 subblock prediction mode, N1 xm 2 subblock prediction mode, N2 xm 1 subblock prediction mode, and N2 xm 2 subblock prediction mode, and class IV includes N2 xm 2 intra prediction mode, and nxm intra prediction mode. Exemplarily, when N is 16 and M is 16, the macroblock inter prediction mode is classified as shown in table 1.
TABLE 1
Classification | Inter prediction modes
Class I | SKIP mode
Class II | Inter16×16, Inter16×8, Inter8×16
Class III | Inter8×8, Inter8×4, Inter4×8, Inter4×4
Class IV | Intra4×4, Intra16×16
The prediction matching degree of the target pixel block is divided into 3 cases according to the distribution of the SATD values, and different candidate prediction classifications are selected for each case, as shown in table 2. Further, possible inter prediction modes other than class I are selectively determined.
TABLE 2
SATD distribution | Candidate prediction classification
SATD < T1 | Class II
T1 ≤ SATD < T2 | Class III
SATD ≥ T2 | Class IV
In table 2, T1 is the first threshold and T2 is the second threshold, with T1 < T2; both are obtained from statistics over a large number of experimental results. If SATD < T1, the inter prediction mode of the target pixel block is determined within class II according to the average horizontal standard deviation and the average vertical standard deviation of the target pixel block. If T1 ≤ SATD < T2, the first target candidate prediction mode corresponding to the minimum rate-distortion cost value among the plurality of first candidate prediction modes (i.e., class III) is determined as the inter prediction mode of the target pixel block. If SATD ≥ T2, the second target candidate prediction mode corresponding to the minimum rate-distortion cost value among the plurality of second candidate prediction modes (i.e., class IV) is determined as the inter-frame prediction mode of the target pixel block.
In an embodiment of the present application, in combination with the inter prediction mode classification table of table 1, as shown in fig. 5, the method for determining an inter prediction mode includes:
step 502, judging whether the inter-frame prediction mode of the target pixel block is a skip mode, if the inter-frame prediction mode of the target pixel block is not the skip mode, entering step 504, and if the inter-frame prediction mode of the target pixel block is the skip mode, entering step 512;
step 504, comparing the SATD value with the first threshold T1 and the second threshold T2: entering step 506 if SATD < T1, entering step 508 if T1 ≤ SATD < T2, and entering step 510 if SATD ≥ T2;
step 506, selecting an inter prediction mode in class II;
step 508, selecting an inter-frame prediction mode in class III;
step 510, selecting an inter prediction mode in the IV class;
step 512, the skip mode is selected.
In the embodiment of the application, the candidate classification is selected in advance based on the motion intensity and texture characteristics of the target pixel block, so that some unlikely prediction block sizes can be excluded in advance, reducing the complexity of inter-frame prediction.
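A compact sketch of the selection flow of fig. 5 follows; the class labels mirror table 1, and T1 and T2 are the experimentally determined thresholds.

```python
def select_candidate_class(is_skip: bool, satd_value: float,
                           t1: float, t2: float) -> str:
    # Steps 502-512: early SKIP decision first, then SATD thresholding (table 2)
    if is_skip:
        return "class I"    # SKIP mode, mode selection ends here
    if satd_value < t1:
        return "class II"   # Inter16x16 / Inter16x8 / Inter8x16
    if satd_value < t2:
        return "class III"  # Inter8x8 / Inter8x4 / Inter4x8 / Inter4x4
    return "class IV"       # Intra4x4 / Intra16x16
```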
Further, in an embodiment of the present application, determining the inter prediction mode of the target pixel block according to the average horizontal standard deviation and the average vertical standard deviation of the target pixel block includes: calculating the average horizontal standard deviation and the average vertical standard deviation of the target pixel block; determining the inter-frame prediction mode of the target pixel block to be the N × M prediction mode in the case where the average horizontal standard deviation is greater than a third threshold and the average vertical standard deviation is greater than a fourth threshold; in the case where the average horizontal standard deviation is less than or equal to the third threshold, calculating a third rate-distortion cost value of the target pixel block in the N × M1 sub-block prediction mode, and determining the inter-frame prediction mode of the target pixel block according to the third rate-distortion cost value and the second rate-distortion cost value; and in the case where the average vertical standard deviation is less than or equal to the fourth threshold, calculating a fourth rate-distortion cost value of the target pixel block in the N1 × M sub-block prediction mode, and determining the inter-frame prediction mode of the target pixel block according to the fourth rate-distortion cost value and the second rate-distortion cost value.
In this embodiment, the class II candidate prediction modes include 3 modes: the N × M prediction mode, the N × M1 sub-block prediction mode and the N1 × M sub-block prediction mode, e.g., Inter16 × 16, Inter16 × 8 and Inter8 × 16. Inter16 × 16 predicts the whole pixel block and is suitable for pixel blocks with relatively consistent motion; such pixel blocks lie within a single moving object, do not contain the edge of a moving object, and have consistent horizontal and vertical texture. Inter16 × 8 is suitable for pixel blocks whose motion is uniform horizontally and relatively complex vertically, i.e., blocks that belong to the same moving object in the horizontal direction but contain different moving objects in the vertical direction, with uniform horizontal texture and relatively rich vertical texture. Inter8 × 16 is suitable for pixel blocks whose motion is uniform vertically and relatively complex horizontally, i.e., blocks that belong to the same moving object in the vertical direction but contain different moving objects in the horizontal direction, with uniform vertical texture and relatively rich horizontal texture. Therefore, the embodiment of the present application further refines the candidate prediction modes according to the texture consistency of the pixel block in the horizontal and vertical directions.
A pixel block has texture consistency in the horizontal direction when all pixel values in each row of the block are approximately equal. The average horizontal standard deviation SD_H is used to detect this type of pixel block and is calculated as follows:
SD_H = (1/N) × Σ_{y=1..N} SD_y   (10)
where SD_y is the standard deviation of the pixel values of row y, as shown in equation (11):
SD_y = sqrt( (1/M) × Σ_{x=1..M} ( p(x, y) - m_y )² )   (11)
In formula (11), p(x, y) is the luminance pixel value at column x and row y of the pixel block, and m_y is the average of all pixels in row y.
A pixel block has texture consistency in the vertical direction when all pixel values in each column of the block are approximately equal. The average vertical standard deviation SD_V is used to detect this type of pixel block and is calculated as follows:
SD_V = (1/M) × Σ_{x=1..M} SD_x   (12)
where SD_x is the standard deviation of the pixel values of column x, as shown in equation (13):
SD_x = sqrt( (1/N) × Σ_{y=1..N} ( p(x, y) - m_x )² )   (13)
In formula (13), m_x is the average of all pixels in column x.
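A short sketch of formulas (10) to (13), under the reconstruction above (per-row and per-column standard deviations averaged over the block):

```python
import numpy as np

def texture_std(block: np.ndarray) -> tuple[float, float]:
    # block: N rows x M columns of luminance values p(x, y)
    sd_h = float(np.mean(block.std(axis=1)))  # SD_H: mean of per-row std devs, (10)-(11)
    sd_v = float(np.mean(block.std(axis=0)))  # SD_V: mean of per-column std devs, (12)-(13)
    return sd_h, sd_v
```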
Combining the above average horizontal standard deviation SD_H and average vertical standard deviation SD_V, the steps of determining the class II candidate prediction mode are as follows:
Step 1: calculate the average horizontal standard deviation SD_H and the average vertical standard deviation SD_V of the target pixel block.
Step 2: check whether SD_H is greater than T3. If SD_H ≤ T3, go to step 3. If SD_H > T3, check whether SD_V is greater than T4: if SD_H > T3 and SD_V > T4, the inter prediction mode of the current target pixel block is determined to be Inter16 × 16; if SD_V ≤ T4, go to step 4. Here T3 is the third threshold and T4 is the fourth threshold.
Step 3: if SD_H ≤ T3, the current target pixel block has texture consistency in the horizontal direction, so the possible candidate prediction modes are the N × M prediction mode and the N × M1 sub-block prediction mode, i.e., Inter16 × 16 and Inter16 × 8. Perform N × M1 motion estimation on the target pixel block to obtain the best reference frame and matching macroblock, then calculate the third rate-distortion cost value of the current target pixel block in the N × M1 sub-block prediction mode, and determine the inter-frame prediction mode of the target pixel block according to the third rate-distortion cost value and the second rate-distortion cost value of the N × M prediction mode.
Step 4: if SD_V ≤ T4, the current target pixel block has texture consistency in the vertical direction, so the possible candidate prediction modes are the N × M prediction mode and the N1 × M sub-block prediction mode, i.e., Inter16 × 16 and Inter8 × 16. Perform N1 × M motion estimation on the target pixel block to obtain the best reference frame and matching macroblock, then calculate the fourth rate-distortion cost value of the target pixel block in the N1 × M sub-block prediction mode, and determine the inter-frame prediction mode of the target pixel block according to the fourth rate-distortion cost value and the second rate-distortion cost value of the N × M prediction mode.
In this way, when SATD < T1 is judged, the other classes are excluded and the optimal inter-frame prediction mode of the target pixel block is determined only among the class II candidate prediction modes, which narrows the range of prediction modes and effectively reduces the complexity of inter-frame prediction.
Further, in an embodiment of the present application, determining the inter prediction mode of the target pixel block according to the third rate-distortion cost value and the second rate-distortion cost value includes: determining the inter prediction mode of the target pixel block to be the N × M1 sub-block prediction mode in the case where the third rate-distortion cost value is smaller than the second rate-distortion cost value; and determining the inter prediction mode of the target pixel block to be the N × M prediction mode in the case where the third rate-distortion cost value is greater than or equal to the second rate-distortion cost value. Determining the inter prediction mode of the target pixel block according to the fourth rate-distortion cost value and the second rate-distortion cost value includes: determining the inter prediction mode of the target pixel block to be the N1 × M sub-block prediction mode in the case where the fourth rate-distortion cost value is smaller than the second rate-distortion cost value; and determining the inter prediction mode of the target pixel block to be the N × M prediction mode in the case where the fourth rate-distortion cost value is greater than or equal to the second rate-distortion cost value.
In this embodiment, if SD_H ≤ T3, then after calculating the third rate-distortion cost value RDcost(Inter N × M1) of the N × M1 sub-block prediction mode: if RDcost(Inter N × M1) < RDcost(Inter N × M), the optimal inter prediction mode is Inter N × M1; if RDcost(Inter N × M1) ≥ RDcost(Inter N × M), the optimal inter prediction mode is Inter N × M. For example, with N = M = 16: if RDcost(Inter16 × 8) < RDcost(Inter16 × 16), the best mode is Inter16 × 8; if RDcost(Inter16 × 8) ≥ RDcost(Inter16 × 16), the best mode is Inter16 × 16.
It should be noted that the third rate-distortion cost value RDcost (Inter N × M1) is the sum of the rate-distortion cost values of two N × M1 sub-blocks, that is, the third rate-distortion cost value RDcost (Inter N × M1) is equivalent to the rate-distortion cost value of one N × M block.
If SD_V ≤ T4, then after calculating the fourth rate-distortion cost value RDcost(Inter N1 × M) of the N1 × M sub-block prediction mode: if RDcost(Inter N1 × M) < RDcost(Inter N × M), the optimal inter prediction mode is Inter N1 × M; if RDcost(Inter N1 × M) ≥ RDcost(Inter N × M), the optimal inter prediction mode is Inter N × M. For example, with N = M = 16: if RDcost(Inter8 × 16) < RDcost(Inter16 × 16), the best mode is Inter8 × 16; if RDcost(Inter8 × 16) ≥ RDcost(Inter16 × 16), the best mode is Inter16 × 16.
It should be noted that the fourth rate-distortion cost value RDcost (Inter N1 × M) is the sum of the rate-distortion cost values of two N1 × M sub-blocks, that is, the fourth rate-distortion cost value RDcost (Inter N1 × M) is equivalent to the rate-distortion cost value of one N × M block.
In this way, the optimal inter-frame prediction mode of the target pixel block is determined among the class II candidate prediction modes, and the accuracy of determining the optimal inter-frame prediction mode of the target pixel block is improved.
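Putting the class II refinement together for N = M = 16, a sketch is given below; the threshold values and the three rate-distortion cost values are assumed to be supplied by the caller, and in an actual encoder the Inter16 × 8 / Inter8 × 16 costs would only be computed in the branch that needs them.

```python
def select_class2_mode(sd_h: float, sd_v: float, t3: float, t4: float,
                       rdcost_16x16: float, rdcost_16x8: float,
                       rdcost_8x16: float) -> str:
    if sd_h > t3 and sd_v > t4:
        # rich texture in both directions: keep the whole-block mode (step 2)
        return "Inter16x16"
    if sd_h <= t3:
        # horizontal texture consistency: Inter16x16 vs Inter16x8 (step 3)
        return "Inter16x8" if rdcost_16x8 < rdcost_16x16 else "Inter16x16"
    # sd_h > t3 and sd_v <= t4: vertical texture consistency (step 4)
    return "Inter8x16" if rdcost_8x16 < rdcost_16x16 else "Inter16x16"
```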
Further, in an embodiment of the present application, determining the first target candidate prediction mode corresponding to the minimum rate-distortion cost value among the plurality of first candidate prediction modes as the inter prediction mode of the target pixel block includes: calculating a fifth rate-distortion cost value of the target pixel block in the N1 × M2 sub-block prediction mode; calculating a sixth rate-distortion cost value of the target pixel block in the N2 × M1 sub-block prediction mode; calculating a seventh rate-distortion cost value of the target pixel block in the N2 × M2 sub-block prediction mode; calculating an eighth rate-distortion cost value of the target pixel block in the N1 × M1 sub-block prediction mode; and determining the first target candidate prediction mode corresponding to the minimum among the fifth, sixth, seventh and eighth rate-distortion cost values as the inter-frame prediction mode of the target pixel block.
In this embodiment, the class III candidate prediction modes include 4 modes: Inter N1 × M1, Inter N1 × M2, Inter N2 × M1 and Inter N2 × M2, for example Inter8 × 8, Inter8 × 4, Inter4 × 8 and Inter4 × 4. The pixel blocks corresponding to the class III candidate prediction modes belong to different moving objects in both the horizontal and vertical directions, and their motion is severe. The determination steps of the class III candidate prediction mode are as follows:
Step 1: divide each N1 × M1 block into one N1 × M1 sub-block, two N1 × M2 sub-blocks, two N2 × M1 sub-blocks, or four N2 × M2 sub-blocks; for example, divide the 8 × 8 block into one 8 × 8 sub-block (Inter8 × 8), two 8 × 4 sub-blocks (Inter8 × 4), two 4 × 8 sub-blocks (Inter4 × 8), or four 4 × 4 sub-blocks (Inter4 × 4). Perform motion estimation and calculate the rate-distortion cost value of each corresponding mode, i.e., the fifth rate-distortion cost value in the N1 × M2 mode, the sixth in the N2 × M1 mode, the seventh in the N2 × M2 mode, and the eighth in the N1 × M1 mode.
It should be noted that the fifth rate-distortion cost value RDcost (Inter N1 × M2) is the sum of the rate-distortion cost values of eight N1 × M2 sub-blocks, that is, the fifth rate-distortion cost value RDcost (Inter N1 × M2) is equivalent to the rate-distortion cost value of one N × M block.
The sixth rate distortion cost value RDcost (Inter N2 × M1) is the sum of the rate distortion cost values of eight N2 × M1 sub-blocks, i.e., the sixth rate distortion cost value RDcost (Inter N2 × M1) is equivalent to the rate distortion cost value of one N × M block.
The seventh rate distortion cost value RDcost (Inter N2 × M2) is the sum of the rate distortion cost values of sixteen N2 × M2 sub-blocks, i.e., the seventh rate distortion cost value RDcost (Inter N2 × M2) is equivalent to the rate distortion cost value of one N × M block.
The eighth rate-distortion cost value RDcost (Inter N1 × M1) is the sum of the rate-distortion cost values of the four N1 × M1 sub-blocks, i.e., the eighth rate-distortion cost value RDcost (Inter N1 × M1) is equivalent to the rate-distortion cost value of one N × M block.
Then, the minimum among the fifth, sixth, seventh and eighth rate-distortion cost values is determined, and the first target candidate prediction mode corresponding to this minimum value is taken as the inter-frame prediction mode of the target pixel block.
In this way, when T1 ≤ SATD < T2 is judged, the other classes are excluded and the optimal inter-frame prediction mode of the target pixel block is determined only among the class III candidate prediction modes, which narrows the range of prediction modes and effectively reduces the complexity of inter-frame prediction.
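Since each of the four class III cost values is already accumulated over the whole macroblock, the class III decision reduces to taking a minimum; a small sketch with hypothetical cost values:

```python
def select_class3_mode(costs: dict) -> str:
    # costs: mode name -> macroblock-equivalent rate-distortion cost
    return min(costs, key=costs.get)

# Example with hypothetical cost values for one macroblock
print(select_class3_mode({"Inter8x8": 4210.0, "Inter8x4": 4385.5,
                          "Inter4x8": 4402.1, "Inter4x4": 4630.8}))
```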
Further, in an embodiment of the present application, determining the second target candidate prediction mode corresponding to the minimum rate-distortion cost value among the plurality of second candidate prediction modes as the inter prediction mode of the target pixel block includes: calculating a ninth rate-distortion cost value of the target pixel block in the N2 × M2 intra prediction mode; calculating a tenth rate-distortion cost value of the target pixel block in the N × M intra prediction mode; and determining the second target candidate prediction mode corresponding to the minimum of the ninth and tenth rate-distortion cost values as the inter-frame prediction mode of the target pixel block.
In this embodiment, in order to improve the encoding efficiency and the robustness of the transmission process, the pixel block is allowed to adopt an intra prediction mode during inter coding, namely, a class IV candidate prediction mode. The class IV candidate prediction modes include 2 modes: Intra n2 × m2 and Intra N × M, e.g., Intra4 × 4 and Intra16 × 16.
And calculating the rate-distortion cost value (namely, the ninth rate-distortion cost value) of the Intra N2 × M2 mode and the rate-distortion cost value (namely, the tenth rate-distortion cost value) of the Intra N × M mode, and selecting the mode corresponding to the minimum rate-distortion cost value from the ninth rate-distortion cost value and the tenth rate-distortion cost value as the inter-frame prediction mode.
By the above manner, when it is judged that SATD ≥ T2, the possibility of the other classes is excluded and the optimal inter prediction mode of the target pixel block is determined only among the class IV candidate prediction modes, which narrows the range of prediction modes and effectively reduces the complexity of inter prediction.
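A correspondingly small sketch of the class IV decision is given below; intra_rd_cost(macroblock, mode) is an assumed helper returning the rate-distortion cost of coding the whole macroblock in the given intra mode, not a real encoder function.

```python
# Minimal sketch of the class IV decision; intra_rd_cost is an assumed helper.
def class_iv_mode(macroblock):
    ninth = intra_rd_cost(macroblock, "Intra4x4")    # ninth rate-distortion cost value
    tenth = intra_rd_cost(macroblock, "Intra16x16")  # tenth rate-distortion cost value
    # The intra mode with the smaller cost is used for the target pixel block.
    return "Intra4x4" if ninth <= tenth else "Intra16x16"
```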
Illustratively, a Quarter Common Intermediate Format (QCIF, i.e., 176 × 144 pixels) video sequence is encoded, and one frame of image contains 99 macroblocks of size 16 × 16. When the video is coded, the I-frame, P-frame and B-frame coding tools are all turned on, I-frames are inserted periodically with one I-frame for every 1 second of video, and the ratio of B frames to P frames is 2:1, that is, the encoder encodes the video sequence in the order IBBPBBPBBP … …. The encoder is set to the frame coding mode, the frame rate is set to 22 frames/second, the motion estimation search range W is set to 16, the number of forward or backward reference frames is set to 1, rate-distortion optimized coding is turned on, and the quantization parameter QP is set to 28. In addition, it should be noted that if the field coding mode is adopted, the relevant configuration parameters need to be multiplied by 2.
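For readability, the same example configuration can be written out as a plain data structure, as in the sketch below; the key names are purely illustrative and do not correspond to any particular encoder's options.

```python
# Illustrative restatement of the example encoder configuration above.
encoder_config = {
    "resolution": (176, 144),       # QCIF
    "macroblocks_per_frame": 99,    # (176 / 16) * (144 / 16)
    "gop_pattern": "IBBPBBPBBP",    # B frames : P frames = 2 : 1, one I-frame per second
    "coding_mode": "frame",         # field coding would double the relevant parameters
    "frame_rate": 22,               # frames per second
    "search_range": 16,             # motion estimation search range W
    "reference_frames": 1,          # forward or backward
    "rdo_enabled": True,            # rate-distortion optimized coding
    "qp": 28,                       # quantization parameter
}
```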
The use of the embodiment of the present application is divided into two cases, P frame coding and B frame coding, which are respectively described below:
(1) For P-frame coding, it is assumed that the current coding object is one P-frame picture, which contains 99 macroblocks of size 16 × 16. The determination of the optimal inter prediction mode includes the following steps:
Step 1, sequentially select 1 macroblock (namely, the target macroblock) from the 99 macroblocks to perform the inter prediction mode decision.
Step 2, as shown in fig. 5, first perform the SKIP mode pre-decision step; if the SKIP mode condition is satisfied, the inter prediction mode of the current macroblock is determined to be the SKIP mode, and the inter prediction step ends. If the condition is not satisfied, a prediction matching degree decision is made and the corresponding candidate prediction class is selected according to the decision condition, namely, the SATD value is compared with the first threshold T1 and the second threshold T2: when SATD < T1, the inter prediction mode is selected within class II; when T1 ≤ SATD < T2, the inter prediction mode is selected within class III; and when SATD ≥ T2, the inter prediction mode is selected within class IV.
Step 3, repeat steps 1 and 2 until all 99 macroblocks of the current P-frame image have been processed.
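Putting steps 1 to 3 together, the per-macroblock decision flow for a P frame can be sketched as follows. All helpers (skip_mode_satisfied, satd, and the class_ii_mode / class_iii_mode / class_iv_mode deciders) are assumed names standing in for the decisions described above, and T1 and T2 are the first and second thresholds.

```python
# Sketch of the P-frame inter prediction mode decision (helper names assumed).
def decide_p_frame_modes(macroblocks, T1, T2):
    modes = []
    for mb in macroblocks:                    # 99 macroblocks for a QCIF frame
        if skip_mode_satisfied(mb):           # SKIP mode pre-decision
            modes.append("SKIP")
            continue
        s = satd(mb)                          # prediction matching degree
        if s < T1:
            modes.append(class_ii_mode(mb))   # class II: Inter16x16 / 16x8 / 8x16
        elif s < T2:
            modes.append(class_iii_mode(mb))  # class III: Inter8x8 / 8x4 / 4x8 / 4x4
        else:
            modes.append(class_iv_mode(mb))   # class IV: Intra4x4 / Intra16x16
    return modes
```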
(2) For B-frame coding, B-frame coding differs from P-frame coding in 3 points:
1) the prediction of B frames includes forward prediction and backward prediction.
2) The SKIP mode of the B frame is B _ SKIP.
3) The intra prediction mode of the B frame is turned off, that is, only the Inter16 × 16, Inter16 × 8, Inter8 × 16, Inter8 × 8, Inter8 × 4, Inter4 × 8, and Inter4 × 4 modes are considered in the B frame coding.
In view of the above differences, the inter prediction algorithm of B-frame coding is slightly adjusted; the steps are as follows:
Step 1, sequentially select 1 macroblock from the 99 macroblocks to perform the inter prediction mode decision.
Step 2, first perform the SKIP mode pre-decision step; if the SKIP mode condition is satisfied, the inter prediction mode of the current macroblock is determined to be the B_SKIP mode, and the inter prediction step ends. If not, a prediction matching degree decision is made (i.e., the SATD value is compared with the first threshold T1 and the second threshold T2), and the corresponding candidate prediction mode is selected according to the decision condition. Since the intra prediction mode is turned off for B frames, the class IV candidate prediction modes are not considered here.
Step 3, repeat steps 1 and 2 until all 99 macroblocks of the current B-frame image have been processed.
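The B-frame flow differs from the P-frame sketch above only in the SKIP label and in the absence of the intra branch. The sketch below uses the same assumed helpers; note that the handling of macroblocks with SATD ≥ T2 (falling back to the class III candidates) is an assumption made for illustration, since the text only states that class IV is not considered.

```python
# Sketch of the B-frame variant: B_SKIP label, no intra (class IV) branch.
def decide_b_frame_modes(macroblocks, T1, T2):
    modes = []
    for mb in macroblocks:
        if skip_mode_satisfied(mb):           # SKIP mode pre-decision
            modes.append("B_SKIP")
            continue
        s = satd(mb)
        # Intra prediction is turned off for B frames; this sketch assumes that
        # macroblocks which would fall into class IV are decided among the
        # class III candidates instead.
        modes.append(class_ii_mode(mb) if s < T1 else class_iii_mode(mb))
    return modes
```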
The embodiment of the present application provides a method for rapidly determining an inter prediction mode, which classifies the inter prediction modes of macroblocks according to the motion characteristics of the image and, based on prediction block size pre-selection criteria derived from the motion intensity and texture characteristics of the macroblock, excludes some prediction block modes with low likelihood, so that the number of rate-distortion cost function calculations is reduced and the complexity of inter prediction is effectively reduced. For most video sequences, the combined usage of SKIP, Inter16 × 16, Inter16 × 8 and Inter8 × 16 in inter coding exceeds 60%. For the decision among the above 4 modes, the embodiment of the present application only needs to perform motion estimation and rate-distortion cost function calculation 3 times in the worst case; compared with the 41 motion estimations and 148 cost function calculations of the full search algorithm in the H.264/AVC reference code, the embodiment of the present application greatly improves the encoding speed of the inter prediction module in the video encoder.
In the image encoding method provided by the embodiment of the present application, the execution subject may be an image encoding apparatus. The embodiment of the present application takes an image encoding apparatus performing the image encoding method as an example to describe the image encoding apparatus provided by the embodiments of the present application.
An embodiment of the present application provides an image encoding apparatus, as shown in fig. 6, the image encoding apparatus 600 includes:
an obtaining module 602, configured to obtain a target frame image, where the target frame image includes a plurality of pixel blocks, each pixel block includes N rows and M columns of pixels, and N, M is a positive integer;
a determining module 604 for determining an absolute error transform sum of a target pixel block of the plurality of pixel blocks and determining an inter prediction mode of the target pixel block according to the absolute error transform sum;
and an encoding module 606, configured to perform inter-frame encoding on the target pixel block according to the determined inter-frame prediction mode.
In this embodiment, after the target frame image is acquired, the target frame image is divided into a plurality of pixel blocks (i.e., macroblocks), each pixel block includes N rows and M columns of pixels, i.e., a pixel block with a block size of N × M, for example, a pixel block with a block size of 16 × 16, and each pixel block can be partitioned according to the macroblock partitioning principle. For any pixel block (namely, the target pixel block) of the target frame image, motion estimation with the size of N multiplied by M blocks is carried out to obtain the optimal reference frame and the matched macro block. And calculating the SATD value of the target pixel block relative to the matched macro block, and selectively comparing possible classification modes according to the SATD value to obtain the inter-frame prediction mode of the target pixel block. And finally, performing inter-frame coding on the target pixel block according to the obtained inter-frame prediction mode. Through the mode of the embodiment of the application, on one hand, the number of the search modes can be reduced as much as possible, so that the operation complexity of inter-frame prediction mode selection is reduced; on the other hand, it is possible to prevent the optimal inter prediction mode from being omitted, resulting in a reduction in encoding quality.
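To show how the three modules cooperate, the following is a skeletal sketch of the apparatus of fig. 6 as a plain class; the helper functions (split_into_blocks, satd, decide_inter_mode, inter_encode) are assumed names, not part of any existing library.

```python
# Skeletal sketch of the image encoding apparatus 600 (helper names assumed).
class ImageEncodingDevice:
    def obtain(self, target_frame_image):
        # Obtaining module 602: split the target frame image into N x M pixel blocks.
        return split_into_blocks(target_frame_image, 16, 16)

    def determine(self, target_pixel_block):
        # Determining module 604: compute the SATD of the block, then decide the mode.
        return decide_inter_mode(target_pixel_block, satd(target_pixel_block))

    def encode(self, target_pixel_block, mode):
        # Encoding module 606: inter-frame encode the block with the chosen mode.
        return inter_encode(target_pixel_block, mode)
```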
Further, in an embodiment of the present application, the image encoding apparatus 600 further includes: the judging module is used for judging whether the inter-frame prediction mode of the target pixel block is a skip mode; the encoding module 606 is further configured to perform inter-frame encoding on the target pixel block according to the skip mode when the inter-frame prediction mode of the target pixel block is the skip mode; the determining module 604 is specifically configured to determine the absolute error transformation sum of the target pixel block if the inter prediction mode of the target pixel block is not the skip mode.
Further, in an embodiment of the present application, the image encoding apparatus 600 further includes: a calculation module to: calculating a first rate-distortion cost value of a target pixel block in a skip mode; calculating a second rate-distortion cost value of the target pixel block in the NxM prediction mode; calculating the average rate distortion cost value of the coded pixel block adopting the N multiplied by M prediction mode in the target frame image and other reference frame images; and the judging module is specifically used for determining that the inter-frame prediction mode of the target pixel block is the skip mode under the condition that the first rate distortion cost value is smaller than the second rate distortion cost value and the first rate distortion cost value is smaller than the average rate distortion cost value.
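The SKIP pre-decision carried out by the judging module can be sketched as below; rd_cost_skip, rd_cost_16x16 and avg_rd_cost_16x16 are assumed helpers returning, respectively, the first rate-distortion cost value, the second rate-distortion cost value, and the average rate-distortion cost value of already coded 16 × 16 blocks.

```python
# Sketch of the SKIP mode pre-decision (helper names assumed).
def skip_mode_satisfied(target_pixel_block):
    first = rd_cost_skip(target_pixel_block)    # first rate-distortion cost value (skip mode)
    second = rd_cost_16x16(target_pixel_block)  # second rate-distortion cost value (N x M mode)
    average = avg_rd_cost_16x16()               # average cost of already coded N x M blocks
    # Skip mode is chosen only when it beats both reference costs.
    return first < second and first < average
```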
Further, in an embodiment of the present application, the determining module 604 is specifically configured to: determine the inter prediction mode of the target pixel block according to the average horizontal standard deviation and the average vertical standard deviation of the target pixel block under the condition that the absolute error transformation sum is smaller than a first threshold; determine a first target candidate prediction mode corresponding to the minimum rate-distortion cost value among a plurality of first candidate prediction modes as the inter prediction mode of the target pixel block under the condition that the absolute error transformation sum is greater than or equal to the first threshold and less than a second threshold; and determine a second target candidate prediction mode corresponding to the minimum rate-distortion cost value among a plurality of second candidate prediction modes as the inter prediction mode of the target pixel block under the condition that the absolute error transformation sum is greater than or equal to the second threshold; wherein the first threshold is less than the second threshold, the plurality of first candidate prediction modes include an n1 × m1 sub-block prediction mode, an n1 × m2 sub-block prediction mode, an n2 × m1 sub-block prediction mode and an n2 × m2 sub-block prediction mode, the plurality of second candidate prediction modes include an n2 × m2 intra prediction mode and an N × M intra prediction mode, and n1 = N/2, m1 = M/2, n2 = N/4, m2 = M/4.
Further, in an embodiment of the present application, the calculation module is further configured to calculate the average horizontal standard deviation and the average vertical standard deviation of the target pixel block; the determining module 604 is specifically configured to: determine the inter prediction mode of the target pixel block to be the N × M prediction mode under the condition that the average horizontal standard deviation is larger than a third threshold and the average vertical standard deviation is larger than a fourth threshold; calculate a third rate-distortion cost value of the target pixel block in the N × m1 sub-block prediction mode under the condition that the average horizontal standard deviation is smaller than or equal to the third threshold, and determine the inter prediction mode of the target pixel block according to the third rate-distortion cost value and the second rate-distortion cost value; and calculate a fourth rate-distortion cost value of the target pixel block in the n1 × M sub-block prediction mode under the condition that the average horizontal standard deviation is smaller than or equal to the fourth threshold, and determine the inter prediction mode of the target pixel block according to the fourth rate-distortion cost value and the second rate-distortion cost value.
Further, in an embodiment of the present application, the determining module 604 is specifically configured to: determine the inter prediction mode of the target pixel block to be the N × m1 sub-block prediction mode under the condition that the third rate-distortion cost value is less than the second rate-distortion cost value; and determine the inter prediction mode of the target pixel block to be the N × M prediction mode under the condition that the third rate-distortion cost value is greater than or equal to the second rate-distortion cost value. The determining module 604 is further specifically configured to: determine the inter prediction mode of the target pixel block to be the n1 × M sub-block prediction mode in the case that the fourth rate-distortion cost value is smaller than the second rate-distortion cost value; and determine the inter prediction mode of the target pixel block to be the N × M prediction mode in the case that the fourth rate-distortion cost value is greater than or equal to the second rate-distortion cost value.
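The class II decision described by the two paragraphs above can be sketched as follows. Helper names and the module-level thresholds T3 and T4 are assumptions; note also that the text ties both the third and the fourth threshold to the horizontal standard deviation, while this sketch assumes the fourth threshold applies to the vertical standard deviation so that the three branches are mutually exclusive, which is a reading made for illustration rather than a statement of the patent.

```python
# Sketch of the class II decision (SATD < T1); helper names, module-level
# thresholds T3/T4, and the use of the vertical standard deviation for the
# fourth threshold are all assumptions.
def class_ii_mode(mb):
    h_std, v_std = avg_horizontal_std(mb), avg_vertical_std(mb)
    second = rd_cost(mb, "Inter16x16")            # second rate-distortion cost value
    if h_std > T3 and v_std > T4:
        return "Inter16x16"                       # strong texture in both directions
    if h_std <= T3:
        third = rd_cost(mb, "Inter16x8")          # third rate-distortion cost value
        return "Inter16x8" if third < second else "Inter16x16"
    fourth = rd_cost(mb, "Inter8x16")             # fourth rate-distortion cost value
    return "Inter8x16" if fourth < second else "Inter16x16"
```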
Further, in an embodiment of the present application, the calculation module is further configured to: calculate a fifth rate-distortion cost value of the target pixel block in the n1 × m2 sub-block prediction mode; calculate a sixth rate-distortion cost value of the target pixel block in the n2 × m1 sub-block prediction mode; calculate a seventh rate-distortion cost value of the target pixel block in the n2 × m2 sub-block prediction mode; and calculate an eighth rate-distortion cost value of the target pixel block in the n1 × m1 sub-block prediction mode; the determining module 604 is specifically configured to determine the first target candidate prediction mode corresponding to the minimum rate-distortion cost value among the fifth rate-distortion cost value, the sixth rate-distortion cost value, the seventh rate-distortion cost value and the eighth rate-distortion cost value as the inter prediction mode of the target pixel block.
The image encoding apparatus 600 in the embodiment of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in a terminal. The apparatus may be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, an in-vehicle electronic device, a wearable device, an Ultra-Mobile Personal Computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine, a self-service machine, and the like; the embodiments of the present application are not specifically limited thereto.
The image encoding apparatus 600 in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, and the embodiments of the present application are not specifically limited thereto.
The image encoding apparatus 600 provided in this embodiment of the application can implement each process implemented in the image encoding method embodiment of fig. 4, and is not described here again to avoid repetition.
Optionally, as shown in fig. 7, an electronic device 700 is further provided in an embodiment of the present application, and includes a processor 702, a memory 704, and a program or an instruction stored in the memory 704 and executable on the processor 702, where the program or the instruction is executed by the processor 702 to implement each process of the above-mentioned embodiment of the image encoding method, and can achieve the same technical effect, and no further description is provided here to avoid repetition.
It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and the non-mobile electronic devices described above.
Fig. 8 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 800 includes, but is not limited to: radio frequency unit 802, network module 804, audio output unit 806, input unit 808, sensors 810, display unit 812, user input unit 814, interface unit 816, memory 818, and processor 820, among other components.
Those skilled in the art will appreciate that the electronic device 800 may further include a power supply (e.g., a battery) for supplying power to the various components, and the power supply may be logically connected to the processor 820 via a power management system, so as to manage charging, discharging, and power consumption management functions via the power management system. The electronic device structure shown in fig. 8 does not constitute a limitation of the electronic device, and the electronic device may include more or less components than those shown, or combine some components, or arrange different components, and thus, the description is omitted here.
Wherein the processor 820 is configured to: acquiring a target frame image, wherein the target frame image comprises a plurality of pixel blocks, each pixel block comprises N rows of pixels and M columns of pixels, and N, M is a positive integer; determining an absolute error transform sum of a target pixel block among the plurality of pixel blocks, and determining an inter prediction mode of the target pixel block according to the absolute error transform sum; and performing inter-frame coding on the target pixel block according to the determined inter-frame prediction mode.
In this embodiment, after the target frame image is acquired, the target frame image is divided into a plurality of pixel blocks (i.e., macroblocks), each pixel block includes N rows and M columns of pixels, i.e., a pixel block with a block size of N × M, for example, a pixel block with a block size of 16 × 16, and each pixel block can be partitioned according to the macroblock partitioning principle. For any pixel block (namely, the target pixel block) of the target frame image, motion estimation with the size of N multiplied by M blocks is carried out to obtain the optimal reference frame and the matched macro block. And calculating the SATD value of the target pixel block relative to the matched macro block, and selectively comparing possible classification modes according to the SATD value to obtain the inter-frame prediction mode of the target pixel block. And finally, performing inter-frame coding on the target pixel block according to the obtained inter-frame prediction mode. Through the mode of the embodiment of the application, on one hand, the number of the search modes can be reduced as much as possible, so that the operation complexity of inter-frame prediction mode selection is reduced; on the other hand, it is possible to prevent the optimal inter prediction mode from being omitted, resulting in a reduction in encoding quality.
Further, in one embodiment of the present application, the processor 820 is configured to: judging whether the inter-frame prediction mode of the target pixel block is a skip mode; under the condition that the inter-frame prediction mode of the target pixel block is a skip mode, inter-frame coding is carried out on the target pixel block according to the skip mode; in the case where the inter prediction mode of the target pixel block is not the skip mode, the absolute error conversion sum of the target pixel block is determined.
Further, in one embodiment of the present application, the processor 820 is configured to: calculating a first rate-distortion cost value of a target pixel block in a skip mode; calculating a second rate-distortion cost value of the target pixel block in the NxM prediction mode; calculating the average rate distortion cost value of the coded pixel block adopting the N multiplied by M prediction mode in the target frame image and other reference frame images; and under the condition that the first rate distortion cost value is smaller than the second rate distortion cost value and the first rate distortion cost value is smaller than the average rate distortion cost value, determining that the inter-frame prediction mode of the target pixel block is a skip mode.
Further, in one embodiment of the present application, the processor 820 is configured to: determine the inter prediction mode of the target pixel block according to the average horizontal standard deviation and the average vertical standard deviation of the target pixel block under the condition that the absolute error transformation sum is smaller than a first threshold; determine a first target candidate prediction mode corresponding to the minimum rate-distortion cost value among a plurality of first candidate prediction modes as the inter prediction mode of the target pixel block under the condition that the absolute error transformation sum is greater than or equal to the first threshold and less than a second threshold; and determine a second target candidate prediction mode corresponding to the minimum rate-distortion cost value among a plurality of second candidate prediction modes as the inter prediction mode of the target pixel block under the condition that the absolute error transformation sum is greater than or equal to the second threshold; wherein the first threshold is less than the second threshold, the plurality of first candidate prediction modes include an n1 × m1 sub-block prediction mode, an n1 × m2 sub-block prediction mode, an n2 × m1 sub-block prediction mode and an n2 × m2 sub-block prediction mode, the plurality of second candidate prediction modes include an n2 × m2 intra prediction mode and an N × M intra prediction mode, and n1 = N/2, m1 = M/2, n2 = N/4, m2 = M/4.
Further, in one embodiment of the present application, the processor 820 is configured to: calculate the average horizontal standard deviation and the average vertical standard deviation of the target pixel block; determine the inter prediction mode of the target pixel block to be the N × M prediction mode under the condition that the average horizontal standard deviation is larger than a third threshold and the average vertical standard deviation is larger than a fourth threshold; calculate a third rate-distortion cost value of the target pixel block in the N × m1 sub-block prediction mode under the condition that the average horizontal standard deviation is smaller than or equal to the third threshold, and determine the inter prediction mode of the target pixel block according to the third rate-distortion cost value and the second rate-distortion cost value; and calculate a fourth rate-distortion cost value of the target pixel block in the n1 × M sub-block prediction mode under the condition that the average horizontal standard deviation is smaller than or equal to the fourth threshold, and determine the inter prediction mode of the target pixel block according to the fourth rate-distortion cost value and the second rate-distortion cost value.
Further, in one embodiment of the present application, the processor 820 is configured to: determine the inter prediction mode of the target pixel block to be the N × m1 sub-block prediction mode under the condition that the third rate-distortion cost value is less than the second rate-distortion cost value; and determine the inter prediction mode of the target pixel block to be the N × M prediction mode under the condition that the third rate-distortion cost value is greater than or equal to the second rate-distortion cost value. The processor 820 is further configured to: determine the inter prediction mode of the target pixel block to be the n1 × M sub-block prediction mode in the case that the fourth rate-distortion cost value is smaller than the second rate-distortion cost value; and determine the inter prediction mode of the target pixel block to be the N × M prediction mode in the case that the fourth rate-distortion cost value is greater than or equal to the second rate-distortion cost value.
Further, in one embodiment of the present application, the processor 820 is configured to: calculate a fifth rate-distortion cost value of the target pixel block in the n1 × m2 sub-block prediction mode; calculate a sixth rate-distortion cost value of the target pixel block in the n2 × m1 sub-block prediction mode; calculate a seventh rate-distortion cost value of the target pixel block in the n2 × m2 sub-block prediction mode; calculate an eighth rate-distortion cost value of the target pixel block in the n1 × m1 sub-block prediction mode; and determine the first target candidate prediction mode corresponding to the minimum rate-distortion cost value among the fifth rate-distortion cost value, the sixth rate-distortion cost value, the seventh rate-distortion cost value and the eighth rate-distortion cost value as the inter prediction mode of the target pixel block.
It should be understood that in the embodiment of the present application, the input Unit 808 may include a Graphics Processing Unit (GPU) 8082 and a microphone 8084, and the Graphics Processing Unit 8082 processes image data of still pictures or videos obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 812 may include a display panel 8122, and the display panel 8122 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 814 includes at least one of a touch panel 8142 and other input devices 8144. A touch panel 8142, also referred to as a touch screen. The touch panel 8142 may include two parts of a touch detection device and a touch controller. Other input devices 8144 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein.
The memory 818 may be used to store software programs as well as various data. The memory 818 may mainly include a first storage area storing a program or an instruction and a second storage area storing data, wherein the first storage area may store an operating system, an application program or an instruction (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like. Further, the memory 818 may include volatile memory or nonvolatile memory, or the memory 818 may include both volatile and nonvolatile memory. The non-volatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash Memory. The volatile Memory may be a Random Access Memory (RAM), a Static Random Access Memory (Static RAM, SRAM), a Dynamic Random Access Memory (Dynamic RAM, DRAM), a Synchronous Dynamic Random Access Memory (Synchronous DRAM, SDRAM), a Double Data Rate Synchronous Dynamic Random Access Memory (Double Data Rate SDRAM, ddr SDRAM), an Enhanced Synchronous SDRAM (ESDRAM), a Synchronous Link DRAM (SLDRAM), and a Direct Memory bus RAM (DRRAM). The memory 818 in the subject embodiment includes, but is not limited to, these and any other suitable types of memory.
Processor 820 may include one or more processing units; optionally, the processor 820 integrates an application processor, which primarily handles operations related to the operating system, user interface, and applications, and a modem processor, which primarily handles wireless communication signals, such as a baseband processor. It will be appreciated that the modem processor described above may not be integrated into processor 820.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the above-mentioned image encoding method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device in the above embodiment. Readable storage media, including computer-readable storage media, such as Read-Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, etc.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to execute a program or an instruction to implement each process of the above-mentioned embodiment of the image coding method, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as system-on-chip, system-on-chip or system-on-chip, etc.
Embodiments of the present application provide a computer program product, where the program product is stored in a storage medium, and the program product is executed by at least one processor to implement the processes of the foregoing image coding method embodiments, and achieve the same technical effects, and in order to avoid repetition, details are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved, e.g., the methods described may be performed in an order different than that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a computer software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. An image encoding method, comprising:
acquiring a target frame image, wherein the target frame image comprises a plurality of pixel blocks, each pixel block comprises N rows and M columns of pixels, and N, M is a positive integer;
determining an absolute error transformation sum of a target pixel block of the plurality of pixel blocks;
determining an inter-frame prediction mode of the target pixel block according to the absolute error transformation sum;
and performing inter-frame coding on the target pixel block according to the determined inter-frame prediction mode.
2. The image encoding method of claim 1, further comprising, before determining the absolute error transformation sum of the target pixel block:
judging whether the inter-frame prediction mode of the target pixel block is a skip mode;
under the condition that the inter-frame prediction mode of the target pixel block is the skip mode, inter-frame coding is carried out on the target pixel block according to the skip mode;
the determining the absolute error transformation sum of the target pixel block comprises:
determining the absolute error transformation sum of the target pixel block in a case that the inter prediction mode of the target pixel block is not the skip mode.
3. The image encoding method according to claim 2, wherein the determining whether the inter prediction mode of the target pixel block is a skip mode comprises:
calculating a first rate-distortion cost value of the target pixel block in the skip mode;
calculating a second rate-distortion cost value of the target pixel block in the NxM prediction mode;
calculating the average rate distortion cost value of the coded pixel block adopting the N multiplied by M prediction mode in the target frame image and other reference frame images;
and under the condition that the first rate distortion cost value is smaller than the second rate distortion cost value and the first rate distortion cost value is smaller than the average rate distortion cost value, determining that the inter-frame prediction mode of the target pixel block is the skip mode.
4. The image encoding method according to claim 3, wherein the determining the inter prediction mode of the target pixel block according to the absolute error transformation sum comprises:
determining an inter-frame prediction mode of the target pixel block according to the average horizontal standard deviation and the average vertical standard deviation of the target pixel block under the condition that the absolute error transformation sum is smaller than a first threshold;
determining a first target candidate prediction mode corresponding to a minimum rate distortion cost value in a plurality of first candidate prediction modes as an inter prediction mode of the target pixel block when the absolute error transformation sum is greater than or equal to the first threshold and less than a second threshold;
determining a second target candidate prediction mode corresponding to a minimum rate distortion cost value in a plurality of second candidate prediction modes as an inter prediction mode of the target pixel block under the condition that the absolute error transformation sum is greater than or equal to the second threshold;
wherein the first threshold is less than the second threshold, the plurality of first candidate prediction modes include an n1 × m1 sub-block prediction mode, an n1 × m2 sub-block prediction mode, an n2 × m1 sub-block prediction mode, and an n2 × m2 sub-block prediction mode, and the plurality of second candidate prediction modes include an n2 × m2 intra prediction mode and an N × M intra prediction mode, n1 = N/2, m1 = M/2, n2 = N/4, and m2 = M/4.
5. The image encoding method according to claim 4, wherein the determining the inter prediction mode for the target pixel block based on the average horizontal standard deviation and the average vertical standard deviation of the target pixel block comprises:
calculating the average horizontal standard deviation and the average vertical standard deviation of the target pixel block;
determining that an inter prediction mode of the target pixel block is the NxM prediction mode if the average horizontal standard deviation is greater than a third threshold and the average vertical standard deviation is greater than a fourth threshold;
calculating a third rate-distortion cost value of the target pixel block in the N × m1 sub-block prediction mode if the average horizontal standard deviation is less than or equal to the third threshold, and determining the inter prediction mode of the target pixel block according to the third rate-distortion cost value and the second rate-distortion cost value;
and calculating a fourth rate-distortion cost value of the target pixel block in the n1 × M sub-block prediction mode under the condition that the average horizontal standard deviation is less than or equal to the fourth threshold, and determining the inter prediction mode of the target pixel block according to the fourth rate-distortion cost value and the second rate-distortion cost value.
6. The image encoding method of claim 5, wherein determining the inter prediction mode for the target pixel block based on the third rate-distortion cost value and the second rate-distortion cost value comprises:
determining the inter prediction mode of the target pixel block to be the N × m1 sub-block prediction mode if the third rate-distortion cost value is less than the second rate-distortion cost value;
determining the inter prediction mode of the target pixel block to be the NxM prediction mode if the third rate-distortion cost value is greater than or equal to the second rate-distortion cost value;
determining an inter prediction mode for the target pixel block according to the fourth rate-distortion cost value and the second rate-distortion cost value, comprising:
determining the inter prediction mode of the target pixel block to be the n1 × M sub-block prediction mode if the fourth rate-distortion cost value is less than the second rate-distortion cost value;
determining the inter prediction mode of the target pixel block as the NxM prediction mode if the fourth rate-distortion cost value is greater than or equal to the second rate-distortion cost value.
7. The image encoding method according to any one of claims 4 to 6, wherein determining a first target candidate prediction mode corresponding to a minimum rate distortion cost value among the plurality of first candidate prediction modes as the inter prediction mode of the target pixel block comprises:
calculating a fifth rate-distortion cost value for the target pixel block in the n1 × m2 sub-block prediction mode;
calculating a sixth rate-distortion cost value for the target pixel block in the n2 × m1 sub-block prediction mode;
calculating a seventh rate-distortion cost value for the target pixel block in the n2 × m2 sub-block prediction mode;
calculating an eighth rate-distortion cost value for the target pixel block in the n1 × m1 sub-block prediction mode;
and determining a first target candidate prediction mode corresponding to the minimum rate distortion cost value in the fifth rate distortion cost value, the sixth rate distortion cost value, the seventh rate distortion cost value and the eighth rate distortion cost value as an inter-frame prediction mode of the target pixel block.
8. An image encoding device characterized by comprising:
an obtaining module, configured to obtain a target frame image, where the target frame image includes a plurality of pixel blocks, each pixel block includes N rows and M columns of pixels, and N, M is a positive integer;
a determining module, configured to determine an absolute error transformation sum of a target pixel block of the plurality of pixel blocks and determine an inter prediction mode of the target pixel block according to the absolute error transformation sum;
and the coding module is used for carrying out inter-frame coding on the target pixel block according to the determined inter-frame prediction mode.
9. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, which when executed by the processor, implement the steps of the image encoding method according to any one of claims 1 to 7.
10. A readable storage medium on which a program or instructions are stored, characterized in that said program or instructions, when executed by a processor, implement the steps of the image coding method according to any one of claims 1 to 7.
CN202210003592.7A 2022-01-04 2022-01-04 Image encoding method, image encoding device, electronic apparatus, and readable storage medium Pending CN114339218A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210003592.7A CN114339218A (en) 2022-01-04 2022-01-04 Image encoding method, image encoding device, electronic apparatus, and readable storage medium
PCT/CN2022/143660 WO2023131059A1 (en) 2022-01-04 2022-12-29 Image encoding method, image encoding apparatus, electronic device, and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210003592.7A CN114339218A (en) 2022-01-04 2022-01-04 Image encoding method, image encoding device, electronic apparatus, and readable storage medium

Publications (1)

Publication Number Publication Date
CN114339218A true CN114339218A (en) 2022-04-12

Family

ID=81023254

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210003592.7A Pending CN114339218A (en) 2022-01-04 2022-01-04 Image encoding method, image encoding device, electronic apparatus, and readable storage medium

Country Status (2)

Country Link
CN (1) CN114339218A (en)
WO (1) WO2023131059A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117692648B (en) * 2024-02-02 2024-05-17 腾讯科技(深圳)有限公司 Video encoding method, apparatus, device, storage medium, and computer program product

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10142626B2 (en) * 2014-10-31 2018-11-27 Ecole De Technologie Superieure Method and system for fast mode decision for high efficiency video coding
US10218983B2 (en) * 2015-09-30 2019-02-26 Apple Inc. Adapting mode decisions in video encoder
CN107318016A (en) * 2017-05-08 2017-11-03 上海大学 A kind of HEVC inter-frame forecast mode method for rapidly judging based on zero piece of distribution
CN110996099B (en) * 2019-11-15 2021-05-25 网宿科技股份有限公司 Video coding method, system and equipment
CN114339218A (en) * 2022-01-04 2022-04-12 维沃移动通信有限公司 Image encoding method, image encoding device, electronic apparatus, and readable storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023131059A1 (en) * 2022-01-04 2023-07-13 维沃移动通信有限公司 Image encoding method, image encoding apparatus, electronic device, and readable storage medium
WO2023198144A1 (en) * 2022-04-15 2023-10-19 维沃移动通信有限公司 Inter-frame prediction method and terminal
CN117294861A (en) * 2023-11-24 2023-12-26 淘宝(中国)软件有限公司 Coding block dividing method based on inter-frame prediction and coder
CN117294861B (en) * 2023-11-24 2024-03-22 淘宝(中国)软件有限公司 Coding block dividing method based on inter-frame prediction and coder

Also Published As

Publication number Publication date
WO2023131059A1 (en) 2023-07-13

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination