CN110139098B

CN110139098B - Decision tree-based intra-frame fast algorithm selection method for high-efficiency video encoder

Info

Publication number: CN110139098B
Application number: CN201910281249.7A
Authority: CN
Inventors: 张昊; 赵御兵
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2019-04-09
Filing date: 2019-04-09
Publication date: 2023-01-06
Anticipated expiration: 2039-04-09
Also published as: CN110139098A

Abstract

One embodiment of the present invention provides a decision tree-based method for selecting between fast intra-frame algorithms in a high-efficiency video encoder. Compared with using a single fast algorithm, the method further reduces the computational complexity of the encoder while the loss in video quality remains negligible. At rate-distortion performance close to that of the original HEVC encoder, the method of embodiment 1 reduces the encoding time by 61.7% on average, at a cost of only 1.91% in BDBR (Bjøntegaard delta bit rate).

Description

Decision tree-based intra-frame fast algorithm selection method for high-efficiency video encoder

Technical Field

The invention belongs to the technical field of video coding and decoding, and particularly relates to a decision tree-based intra-frame fast algorithm selection method for a high-efficiency video coder.

Background

Video coding converts a video signal from one file format into another by means of compression, so that less bandwidth is used during transmission and the video signal is transmitted efficiently. High Efficiency Video Coding (HEVC) is a new video compression standard that outperforms H.264: at the same video quality, its compression ratio can reach twice that of H.264. When videos such as movies and cartoons are compressed with HEVC, mobile users consume far less data when watching online, downloads are faster, image quality is essentially unaffected, and online playback is smoother and less prone to stuttering.

In HEVC, a video sequence is encoded with the group of pictures (GOP) as the basic unit. Each frame is divided into a series of slices (independently coded units), and each slice is divided into a number of coding tree units (CTUs). A CTU is recursively split into four smaller coding units (CUs) according to a quadtree structure. The CU is the basic unit shared by intra/inter prediction, transform and quantization, entropy coding, and the other stages of HEVC; its size ranges from a maximum of 64 × 64 down to a minimum of 8 × 8. The encoder can choose CU sizes to suit the picture content, the picture size, and the application requirements, and thereby achieve a greater degree of optimization.

Intra-frame prediction techniques have been successfully applied in H.264/AVC. Intra-frame prediction in HEVC predicts the current pixel block from already encoded and reconstructed pixel blocks, exploiting the spatial correlation of the image to remove spatially redundant information and improve the compression ratio. To describe image texture more accurately and reduce the prediction error, HEVC adopts finer intra prediction and increases the number of prediction modes to 35, as shown in fig. 1. For HEVC intra coding, the CTU may be recursively split into four CUs until the minimum coding unit size (8 × 8) is reached; the CU at each depth may be further partitioned into PUs in one of two modes, 2N × 2N and N × N; each PU is intra-predicted with the 35 modes, and the optimal PU mode is selected according to RDCost. Determining the optimal partitioning of a CTU generally requires $1 + 4 + 4^2 + 4^3 = 85$ RDCost calculations, as indicated by the sequence numbers in fig. 2.
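The count of 85 follows from summing the number of CUs at each quadtree depth from 64 × 64 down to 8 × 8. The following is a minimal sketch of that arithmetic; the function name and its parameters are illustrative and do not come from the patent or from any encoder API.

```python
def count_cu_evaluations(ctu_size: int = 64, min_cu_size: int = 8) -> int:
    """Count the CUs visited by an exhaustive quadtree search of one CTU."""
    total = 0
    cus_at_depth = 1          # a single CU at depth 0 (the CTU itself)
    size = ctu_size
    while size >= min_cu_size:
        total += cus_at_depth
        cus_at_depth *= 4     # every CU splits into four sub-CUs
        size //= 2            # each split halves the CU side length
    return total


# 1 (64x64) + 4 (32x32) + 16 (16x16) + 64 (8x8)
print(count_cu_evaluations())
```

Each pruned depth removes a whole term from this sum, which is why restricting the depth range is an effective way to cut encoding time.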

The HEVC intra-coding algorithm selects the best of the 35 prediction modes in two steps, a "coarse search" and a "fine search". In the coarse search, the encoder first selects from the 35 modes the N candidates most likely to be the best mode, which form the candidate set for the fine search. N depends on the size of the prediction unit (PU): for PU sizes {4 × 4, 8 × 8, 16 × 16, 32 × 32, 64 × 64}, the corresponding N is {8, 8, 3, 3, 3}. The most probable modes (MPMs) are then added to the candidate set. In the fine search, the full rate-distortion cost is calculated for each mode in the candidate set, and the mode with the minimum rate-distortion cost is taken as the intra prediction mode. Because the full rate-distortion cost is expensive to compute, only a simplified rate-distortion cost is calculated in the coarse search.
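The two-stage selection described above can be sketched as follows. This is only an illustration under stated assumptions: the per-size candidate counts (8 for 4 × 4 and 8 × 8, 3 for larger PUs) are taken to match the HM reference encoder, and the function names and the cost callables are hypothetical stand-ins for the encoder's real cost computations.

```python
# Assumed number of coarse-search candidates kept per PU side length.
COARSE_CANDIDATES = {4: 8, 8: 8, 16: 3, 32: 3, 64: 3}


def build_fine_search_candidates(pu_size, coarse_cost, mpms):
    """Keep the N cheapest modes of the coarse search, then append the MPMs."""
    n = COARSE_CANDIDATES[pu_size]
    ranked = sorted(coarse_cost, key=coarse_cost.get)  # ascending coarse cost
    candidates = ranked[:n]
    for mode in mpms:
        if mode not in candidates:
            candidates.append(mode)
    return candidates


def fine_search(candidates, full_rd_cost):
    """Return the candidate with the smallest full rate-distortion cost."""
    return min(candidates, key=full_rd_cost)


# Hypothetical costs: the coarse cost favours mode 20, the full cost mode 21.
coarse = {mode: abs(mode - 20) for mode in range(35)}
cands = build_fine_search_candidates(32, coarse, mpms=[0, 1, 26])
best = fine_search(cands, lambda mode: abs(mode - 21))
```

The expensive cost function is called only on the handful of modes that survive the coarse stage, which is where the saving comes from.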

In consideration of compression efficiency, the HEVC encoder has high computational complexity and is limited in applications with high requirements on delay, such as video conferencing and webcasting. Therefore, a new method is still needed to improve the video coding efficiency and reduce the computational complexity.

Disclosure of Invention

To solve the problems in the prior art, an object of an embodiment of the present invention is to provide a method for selecting an intra-frame fast algorithm of a high-efficiency video encoder based on a decision tree.

In order to achieve the above purpose, one of the embodiments of the present invention adopts the following technical solutions:

a decision tree-based intra-frame fast algorithm selection method for a high-efficiency video encoder comprises the following steps:

(1) Encoding a training video sequence with a first algorithm and with a second algorithm respectively, and writing the intermediate information produced while each CTU is encoded into a text file as features;

(2) Labeling the text file from step (1): if

holds, the sample is labeled 0; otherwise it is labeled 1. This yields the labeled training samples, wherein RDcost1 is the total rate-distortion cost of the first algorithm, RDcost2 is the total rate-distortion cost of the second algorithm, T1 is the encoding time of the first algorithm, and T2 is the encoding time of the second algorithm;

(3) Training a decision tree model with the training samples labeled in step (2); then, when encoding of each CTU begins, making a prediction with the trained decision tree model to determine the encoding flow.

In step (1), the training video sequences include: Kimono, ParkScene, Cactus, BasketballDrill, BQMall, BasketballPass, BQSquare, FourPeople, KristenAndSara, Vidyo1, Vidyo3, BasketballDrillText, and SlideEditing.

The number of frames of the test video is equal to the number of pictures contained in one second of the video, i.e., the frame rate of the video.

The configuration file of the encoder is encoder_intra_main.cfg.

Preferably, the first algorithm step comprises:

(1s) fast mode decision:

(1s.a) performing a coarse mode search on the 11 modes {0, 1, 2, 6, 10, 14, 18, 22, 26, 30, 34} defined in the HEVC standard, selecting the six best intra mode candidates according to the sum of absolute transformed differences (SATD), and combining them with the best intra modes of the PUs to the left of and above the current PU to form a set A;

(1s.b) testing 2-distance neighbor modes of all elements in the set A, further selecting the best two intra-mode candidates, and forming a set B by the 1-distance neighbor modes of the two intra-modes and the Most Probable Mode (MPM) of the PU;

(1s.c) performing coarse mode search on all modes in the set B;

(1s.d) finding the M modes with the smallest SATD cost among all modes that have undergone coarse mode search and passing them to the subsequent steps, where M is determined by the CU size: for CU sizes {64 × 64, 32 × 32, 16 × 16, 8 × 8, 4 × 4}, the values of M are {3, 3, 3, 8, 8}, respectively;

(2s) mode screening based on rate-distortion optimized quantization:

(2s.a) selecting, from the M modes obtained in the fast mode decision, the two modes {$m_1$, $m_2$} with the smallest SATD cost to form a set W;

(2s.b) performing the following operations in turn for each remaining mode $m_i$ of the M modes:

if the distances from $m_i$ to all elements of W are greater than 1, $m_i$ is added to W;

if the elements of the set include the modes {$m_1$, $m_2$, Intra_Planar, Intra_DC, MPM}, step (2s.b) is skipped;

(2s.c) performing a fine pattern search on all elements in W;

(3s) rate-distortion cost based termination of partitioning:

if the sum of the rate-distortion costs of the sub-CUs coded so far exceeds a certain threshold, the coding of the remaining sub-CUs is skipped to reduce the computational complexity; the specific criterion is as follows:

wherein: K takes values in {1, 2, 3, 4}, and the values of $\beta_K$ corresponding to K = {1, 2, 3, 4} are {1.5, 1.2, 1.1, 1},

representing the sum of the hadamard costs of the 4 sub-CUs, and in case the 4 sub-CUs have not been completely encoded, its value is replaced by the rate-distortion cost of the current CU,

the sum of the hadamard costs of the first K sub-CUs,

represents the rate-distortion cost of the ith sub-CU,

the rate distortion cost of the current CU.

The N-distance neighbors of an intra mode are the intra modes whose mode numbers differ from it by exactly N in absolute value, i.e., the modes m satisfying $|m - m_i| = N$, where $m_i$ is the number of the given intra mode and m is the number of its N-distance neighbor; the modes Intra_Planar and Intra_DC have no N-distance neighbors.
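The N-distance neighbor definition and the progressive coarse search of step (1s) can be sketched as below. This is a hedged illustration, not the patented implementation: `satd` is a hypothetical callable returning the SATD cost of a mode, the restriction of neighbors to the angular range 2..34 is an assumption, and so is the choice to evaluate the left and above PU modes and to select the best two candidates among the tested 2-distance neighbors.

```python
PLANAR, DC = 0, 1
ANGULAR_MIN, ANGULAR_MAX = 2, 34
INITIAL_MODES = (0, 1, 2, 6, 10, 14, 18, 22, 26, 30, 34)


def n_distance_neighbors(mode, n):
    """Modes m with |m - mode| == n; Planar and DC have no neighbors."""
    if mode in (PLANAR, DC):
        return []
    # Assumption: a neighbor must itself be an angular mode (2..34).
    return [m for m in (mode - n, mode + n) if ANGULAR_MIN <= m <= ANGULAR_MAX]


def progressive_coarse_search(satd, left_best, above_best, mpms, m_count):
    """Return the m_count cheapest modes found by steps (1s.a) to (1s.d)."""
    costs = {}

    def evaluate(modes):
        for mode in modes:
            if mode not in costs:      # never evaluate a mode twice
                costs[mode] = satd(mode)

    # (1s.a) coarse search on 11 modes, keep the best six, add neighbor-PU modes.
    evaluate(INITIAL_MODES)
    best_six = sorted(INITIAL_MODES, key=costs.get)[:6]
    set_a = set(best_six) | {left_best, above_best}
    evaluate((left_best, above_best))  # assumption: these are evaluated too

    # (1s.b) test the 2-distance neighbors of set A and keep the best two.
    two_dist = sorted({nb for m in set_a for nb in n_distance_neighbors(m, 2)})
    evaluate(two_dist)
    best_two = sorted(two_dist, key=costs.get)[:2]
    set_b = {nb for m in best_two for nb in n_distance_neighbors(m, 1)} | set(mpms)

    # (1s.c) coarse search on every mode of set B.
    evaluate(sorted(set_b))

    # (1s.d) the m_count cheapest of all modes evaluated so far.
    return sorted(costs, key=costs.get)[:m_count]


# Hypothetical cost surface whose true optimum is mode 17.
toy_satd = lambda mode: 100 if mode < 2 else abs(mode - 17)
result = progressive_coarse_search(toy_satd, PLANAR, DC, mpms=[0, 1, 26], m_count=3)
```

In this toy run the search reaches mode 17 after evaluating 22 of the 35 modes, because each stage only refines around the cheapest candidates of the previous one.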

Preferably, the second algorithm step comprises:

(1s) CU depth prediction:

if

the depth range of the current coding block is set equal to that of the co-located coding block in the previous frame; if

the depth range of the current coding block is set to

wherein

is the minimum coding depth of the co-located coding block in the previous frame,

is the maximum coding depth of the co-located coding block in the previous frame, and p is a constant equal to 1.02;

(2s) rate-distortion cost based termination of partitioning:

if the rate-distortion cost J of the CU at the current coding depth d satisfies the following condition:

wherein

represents the total rate-distortion cost of the co-located block in the previous frame, d represents the coding depth,

and the value is the ratio of the number of CUs at the maximum depth to the total number of CUs in the co-located coding block of the previous frame;

(3s) fast candidate mode screening:

for the candidate set obtained after the coarse search, whose elements are sorted in ascending order of rate-distortion cost and denoted $P = \{p(0), p(1), \ldots, p(M-1)\}$, the following operations are performed on the elements of the candidate set before the fine search:

(3s.a) let m be the largest index in P among the three MPM elements; the set P can then be reduced to $P = \{p(0), p(1), \ldots, p(m)\}$, where $M - 1 \ge m$;

(3s.b) for every element of the new set P, if $J_{SATD}(p(i)) > 1.3\,J_{SATD}(p(0))$, the element $p(i)$ is removed from P, where $J_{SATD}(p(i))$ and $J_{SATD}(p(0))$ denote the rate-distortion costs of the elements $p(i)$ and $p(0)$ of P, respectively.
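Step (3s) of the second algorithm amounts to a truncation followed by a threshold filter, which can be sketched as below. The function and argument names are hypothetical. Returning the list unchanged when no MPM is present is an assumption, since the text presumes the MPMs are always in the candidate set.

```python
def screen_candidates(candidates, satd_cost, mpms, ratio=1.3):
    """Prune a candidate list that is sorted by ascending coarse cost.

    candidates: mode numbers p(0), p(1), ... in ascending cost order
    satd_cost:  mapping from mode number to its coarse (SATD-based) cost
    mpms:       the most probable modes of the current PU
    """
    # (3s.a) cut the list after the last position occupied by an MPM.
    mpm_positions = [i for i, mode in enumerate(candidates) if mode in mpms]
    if mpm_positions:  # assumption: leave the list untouched if no MPM is present
        candidates = candidates[: max(mpm_positions) + 1]

    # (3s.b) drop every mode whose cost exceeds ratio times the best cost.
    best_cost = satd_cost[candidates[0]]
    return [mode for mode in candidates if satd_cost[mode] <= ratio * best_cost]


# Hypothetical costs for five candidates, already sorted by cost.
costs = {10: 100, 11: 120, 0: 125, 26: 140, 9: 150}
order = [10, 11, 0, 26, 9]
kept_a = screen_candidates(order, costs, mpms={0, 1, 11})
kept_b = screen_candidates(order, costs, mpms={26})
```

In the second call the MPM itself (mode 26) is dropped by the 1.3 threshold, which is what a literal reading of step (3s.b) implies.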

Preferably, the intermediate information includes: maximum coding depth of the CTU, minimum coding depth of the CTU, ratio of the number of maximum coded depth CUs to the number of all CUs in the CTU, total rate-distortion cost of the CTU at the current position of the previous frame, difference between the left CTU maximum coded depth and the minimum coded depth, ratio of the left CTU maximum coded depth CU to the number of all CUs in the CTU, difference between the right CTU maximum coded depth and the minimum coded depth, ratio of the right CTU maximum coded depth CU to the number of all CUs in the CTU, and time taken to code the CTU.
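One way to hold the nine quantities listed above and to write them to a text file as one line per CTU is sketched below. The class name, the field names, and the space-separated layout are illustrative assumptions; the patent does not specify a file format.

```python
from dataclasses import astuple, dataclass, fields


@dataclass
class CtuFeatures:
    """Intermediate information recorded for one CTU (field names are hypothetical)."""
    max_depth: int                    # maximum coding depth of the CTU
    min_depth: int                    # minimum coding depth of the CTU
    max_depth_cu_ratio: float         # CUs at maximum depth / all CUs in the CTU
    prev_frame_rd_cost: float         # total RD cost of the co-located CTU, previous frame
    left_depth_diff: int              # left CTU: maximum depth minus minimum depth
    left_max_depth_cu_ratio: float    # left CTU: CUs at maximum depth / all CUs
    right_depth_diff: int             # right CTU: maximum depth minus minimum depth
    right_max_depth_cu_ratio: float   # right CTU: CUs at maximum depth / all CUs
    encode_time: float                # time taken to encode the CTU

    def to_line(self) -> str:
        """Serialize as one space-separated text line."""
        return " ".join(str(value) for value in astuple(self))

    @classmethod
    def from_line(cls, line: str) -> "CtuFeatures":
        """Parse a line written by to_line, converting each field to its type."""
        parts = line.split()
        return cls(*(f.type(text) for f, text in zip(fields(cls), parts)))


sample = CtuFeatures(3, 1, 0.25, 15234.5, 2, 0.5, 1, 0.125, 0.031)
line = sample.to_line()
```

Keeping one record per CTU position makes it straightforward to look up the features of the co-located CTU of the previous frame when the decision tree is queried.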

Preferably, in step (3), the step of determining the encoding flow includes:

(a) Before CTU coding starts, judging whether a coding frame is a first frame, if so, coding the current CTU by using a first algorithm, and if not, performing the step (b);

(b) Inputting the characteristics stored when the CTU at the current position of the previous frame is coded into a decision tree for prediction, if the prediction result is 0, coding the CTU by using a first algorithm, and if the result is 1, coding the CTU by using a second algorithm;

(c) Collecting intermediate quantities in the encoding process of the step (b) and encoding the next CTU;

(d) Repeating steps (b) and (c) until all CTU codes are completed.

In step (a), the current CTU is encoded with the first algorithm; after the encoding is finished, the intermediate quantities of the encoding process are collected and stored as features for the decision process.
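The encoding flow of steps (a) to (d) can be sketched as a loop over frames and CTU positions. All names here are hypothetical: `predict` stands in for the trained decision tree, and `encode_first` and `encode_second` stand in for the two fast algorithms, each returning the features collected while encoding. The sketch assumes every frame contains the same number of CTUs.

```python
def encode_sequence(frames, predict, encode_first, encode_second):
    """Choose an algorithm per CTU and return the choices (0 = first, 1 = second)."""
    prev_features = {}   # features of each CTU position in the previous frame
    choices = []
    for frame_index, ctus in enumerate(frames):
        cur_features = {}
        frame_choices = []
        for position, ctu in enumerate(ctus):
            if frame_index == 0:
                label = 0                                 # (a) first frame
            else:
                label = predict(prev_features[position])  # (b) co-located CTU
            encoder = encode_first if label == 0 else encode_second
            cur_features[position] = encoder(ctu)         # (c) collect features
            frame_choices.append(label)
        prev_features = cur_features                      # (d) next frame
        choices.append(frame_choices)
    return choices


# Stub encoders and a stub predictor, for illustration only.
stub_encode = lambda ctu: {"rd_cost": ctu}
stub_predict = lambda features: 1 if features["rd_cost"] > 10 else 0
picked = encode_sequence([[5, 50], [6, 60]], stub_predict, stub_encode, stub_encode)
```

Because the prediction uses only features that were already collected for the previous frame, the choice adds one tree evaluation per CTU and no extra encoding pass.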

The first algorithm introduces fast decision algorithms at both the micro and the macro level: at the micro level, a progressive coarse search reduces the number of prediction directions examined in the coarse search; at the macro level, the sum of absolute transformed differences (SATD) of the current PU is compared with the total SATD of its four sub-PUs to decide whether the CU should be split further, which reduces the coding depth.

For the second algorithm, the difference between the maximum and minimum coding depths of a coding block averages 1.75. Since a difference of d covers d + 1 depth levels, only 2 to 3 depths need to be searched on average when encoding a coding block, rather than all depths. The computational complexity of the encoder can therefore be greatly reduced, provided the coding depth of the coding block is predicted accurately.

The inventive concept of one embodiment of the present invention is as follows: the CTU is the basic unit of coding, so coding a video segment can be regarded as coding its CTUs one by one, and different algorithms may take different amounts of time to code the same CTU. If each CTU is coded with whichever algorithm takes the least time for it, the total coding time will be lower than that of an encoder that uses a single algorithm.
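The argument can be checked with a small numeric sketch. The per-CTU timings below are invented purely for illustration and are not measurements from the patent.

```python
def total_times(times_first, times_second):
    """Return (all-first, all-second, per-CTU-best) total encoding times."""
    all_first = sum(times_first)
    all_second = sum(times_second)
    per_ctu_best = sum(min(a, b) for a, b in zip(times_first, times_second))
    return all_first, all_second, per_ctu_best


# Hypothetical encoding times (seconds) of four CTUs under each algorithm.
t_first = [4.0, 2.5, 3.0, 5.0]
t_second = [3.0, 3.5, 2.0, 6.0]
totals = total_times(t_first, t_second)
```

The per-CTU choice can never be slower than the better single algorithm, and it is strictly faster whenever neither algorithm wins on every CTU. In practice the decision tree only approximates this ideal choice, so the achievable gain depends on its prediction accuracy.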

Beneficial effects of the embodiments of the invention

Compared with a single algorithm, the method provided by one of the embodiments of the present invention can further reduce the computational complexity of the encoder, and meanwhile, the video quality loss is negligible.

At rate-distortion performance close to that of the original HEVC encoder, the method of embodiment 1 of the invention reduces the encoding time by 61.7% on average, at a cost of only 1.91% in BDBR (Bjøntegaard delta bit rate).

Drawings

Fig. 1 is a diagram illustrating 33 angular prediction directions of intra prediction.

Fig. 2 is a schematic diagram of a CTU quadtree recursive partitioning structure.

Fig. 3 is a flow chart of prediction using the decision tree.

Detailed Description

The following are specific examples of the present invention, and the technical solutions of the present invention will be further described with reference to the examples, but the present invention is not limited to the examples.

Example 1

This example provides a decision tree-based intra-frame fast algorithm selection method for a high-efficiency video encoder, comprising the following steps:

(2) Labeling the text file from step (1): if

holds, the sample is labeled 0; otherwise it is labeled 1. This yields the labeled training samples, wherein RDcost1 is the total rate-distortion cost of the first algorithm, RDcost2 is the total rate-distortion cost of the second algorithm, T1 is the encoding time of the first algorithm, and T2 is the encoding time of the second algorithm;

(3) Training a decision tree model with the training samples labeled in step (2); then, when encoding of each CTU begins, making a prediction with the trained decision tree model to determine the encoding flow, as shown in fig. 3.

The configuration file of the encoder is encoder_intra_main.cfg.

The first algorithm step includes:

(1s) fast mode decision:

(1s.b) testing 2-distance neighbor modes of all elements in set a, further selecting the best two intra-mode candidates from the 2-distance neighbor modes, and combining the 1-distance neighbor modes of the two intra-modes with the Most Probable Mode (MPM) of the PU to form set B;

(1s.c) performing coarse mode search on all modes in the set B;

(1s.d) finding the M modes with the smallest SATD cost among all modes that have undergone coarse mode search and passing them to the subsequent steps, where M is determined by the CU size: for CU sizes {64 × 64, 32 × 32, 16 × 16, 8 × 8, 4 × 4}, the values of M are {3, 3, 3, 8, 8}, respectively;

(2s) mode screening based on rate-distortion optimized quantization:

(2s.b) performing the following operation in turn for each remaining mode $m_i$ of the M modes:

if the distances from $m_i$ to all elements of W are greater than 1, $m_i$ is added to W;

(2s.c) performing fine mode search on all elements in W;

(3s) rate-distortion cost based termination of partitioning:

wherein: K takes values in {1, 2, 3, 4}, and the values of $\beta_K$ corresponding to K = {1, 2, 3, 4} are {1.5, 1.2, 1.1, 1},

represents the sum of the hadamard costs of the 4 sub-CUs, the value of which is replaced by the rate-distortion cost of the current CU in case the 4 sub-CUs have not been completely encoded,

the sum of the hadamard costs of the first K sub-CUs,

representing the rate-distortion cost of the ith sub-CU,

the rate distortion cost of the current CU.

The N-distance neighbors of an intra mode are the intra modes whose mode numbers differ from it by exactly N in absolute value, i.e., the modes m satisfying $|m - m_i| = N$, where $m_i$ is the number of the given intra mode and m is the number of its N-distance neighbor; the modes Intra_Planar and Intra_DC have no N-distance neighbors.
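The mode screening of step (2s) in the first algorithm can be sketched as follows. The function names are hypothetical. Treating Intra_Planar and Intra_DC as farther than distance 1 from every other mode is an assumption drawn from the statement that they have no N-distance neighbors, and the skip condition of step (2s.b) is not modeled.

```python
NON_ANGULAR = (0, 1)  # Intra_Planar and Intra_DC


def farther_than_one(mode_a, mode_b):
    """True if the two modes are more than distance 1 apart."""
    # Assumption: Planar and DC are never within distance 1 of another mode.
    if mode_a in NON_ANGULAR or mode_b in NON_ANGULAR:
        return True
    return abs(mode_a - mode_b) > 1


def screen_modes(ranked_modes):
    """Build the set W from modes sorted by ascending SATD cost."""
    w = list(ranked_modes[:2])        # (2s.a) the two cheapest modes
    for mode in ranked_modes[2:]:     # (2s.b) the remaining modes, in order
        if all(farther_than_one(mode, kept) for kept in w):
            w.append(mode)
    return w                          # (2s.c) fine search then runs on W


w_small = screen_modes([17, 18, 16])
w_mixed = screen_modes([10, 26, 11, 20, 0])
```

Modes adjacent to one already in W are dropped because their rate-distortion behaviour is likely to be close to that of their neighbor, so fewer full rate-distortion evaluations are needed in the fine search.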

The second algorithm step comprises:

(1s) CU depth prediction:

if

the depth range of the current coding block is set equal to that of the co-located coding block in the previous frame; if

the depth range of the current coding block is set to

wherein

is the minimum coding depth of the co-located coding block in the previous frame,

(2s) rate-distortion cost based termination of partitioning:

wherein

represents the total rate-distortion cost of the co-located block in the previous frame, d represents the coding depth,

and the value is the ratio of the number of CUs at the maximum depth to the total number of CUs in the co-located coding block of the previous frame;

(3s) fast candidate mode screening:

(3s.a) let m be the largest index in P among the three MPM elements; the set P can then be reduced to $P = \{p(0), p(1), \ldots, p(m)\}$, where $M - 1 \ge m$;

The intermediate information includes: the maximum coding depth of the CTU, the minimum coding depth of the CTU, the ratio of the number of the maximum coding depth CUs to the number of all CUs in the CTU, the total rate-distortion cost of the CTU at the current position of the previous frame, the difference between the maximum coding depth of the left CTU and the minimum coding depth, the ratio of the number of the maximum coding depth CUs of the left CTU to the number of all CUs in the CTU, the difference between the maximum coding depth of the right CTU and the minimum coding depth, the ratio of the number of the maximum coding depth CUs of the right CTU to the number of all CUs in the CTU, and the time taken for coding the CTU.

In the step (3), the step of determining the encoding process includes:

(d) Repeating steps (b) and (c) until all CTU codes are completed.

Example 2

In this example, the method of embodiment 1 is implemented in version 10.0 of the HEVC reference software HM, built with Visual Studio 2017 on a Win10 operating system. The same videos are encoded with the method and with the original HEVC encoder, and the results are compared. The results are shown in Table 1.

Table 1 Comparison of the performance of the method of example 1 with the original HEVC coding results

Video name	Frame rate	Number of encoded pictures	BDBR (%)	Time reduction ratio
BQSquare	60	600	1.465593	0.552692
BasketballDrill	50	500	1.904419	0.577065
BasketballDrive	50	500	2.15825	0.723275
FourPeople	60	600	1.745963	0.633975
BasketballDrillText	50	500	2.286081	0.598056
Mean			1.9120612	0.6170126

As can be seen from Table 1, compared with the original HEVC encoder, the method of embodiment 1 of the present invention reduces the encoding time by 61.7% on average, at a cost of only 1.91% in BDBR (Bjøntegaard delta bit rate).
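The two averages quoted above can be reproduced directly from the per-sequence rows of Table 1. The short check below uses only the values printed in the table; the variable and function names are illustrative.

```python
# (sequence, bit-rate increase in percent, time reduction ratio) from Table 1
TABLE_1 = [
    ("BQSquare", 1.465593, 0.552692),
    ("BasketballDrill", 1.904419, 0.577065),
    ("BasketballDrive", 2.15825, 0.723275),
    ("FourPeople", 1.745963, 0.633975),
    ("BasketballDrillText", 2.286081, 0.598056),
]


def column_means(rows):
    """Return the mean bit-rate increase and the mean time reduction ratio."""
    count = len(rows)
    mean_rate_increase = sum(row[1] for row in rows) / count
    mean_time_reduction = sum(row[2] for row in rows) / count
    return mean_rate_increase, mean_time_reduction


mean_rate_increase, mean_time_reduction = column_means(TABLE_1)
print(f"{mean_rate_increase:.2f}% rate increase, "
      f"{100 * mean_time_reduction:.1f}% time reduction")
```

The means agree with the last row of the table (1.9120612 and 0.6170126), which round to the 1.91% and 61.7% figures stated in the text.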

Claims

1. A decision tree-based intra-frame fast algorithm selection method for a high-efficiency video encoder is characterized by comprising the following steps:

(2) Marking the text file in the step (1), if so

(3) After the decision tree model is trained by using the training samples marked in the step (2), predicting when the CTU starts to encode through the trained decision tree model, and determining an encoding process;

the first algorithm step comprises:

(1 s) fast mode decision:

(1s.a) performing a coarse mode search on the 11 modes {0, 1, 2, 6, 10, 14, 18, 22, 26, 30, 34} defined in the HEVC standard, selecting the six best intra mode candidates according to the sum of absolute transformed differences (SATD), and combining them with the best intra modes of the PUs to the left of and above the current PU to form a set A;

(1s.b) testing 2-distance neighbor modes of all elements in the set A, selecting two intra-mode candidates from the 2-distance neighbor modes, and combining the 1-distance neighbor modes of the two intra-modes and a Most Probable Mode (MPM) of the PU into a set B;

(1s.c) performing coarse mode search on all modes in the set B;

(1s.d) finding the M modes with the smallest SATD cost among all modes that have undergone coarse mode search and passing them to the subsequent steps, where M is determined by the CU size: for CU sizes {64 × 64, 32 × 32, 16 × 16, 8 × 8, 4 × 4}, the values of M are {3, 3, 3, 8, 8}, respectively;

(2 s) mode screening based on rate-distortion optimized quantization:

(2s.a) selecting, from the M modes obtained in the fast mode decision, the two modes {$m_1$, $m_2$} with the smallest SATD cost to form a set W;

(2s.b) performing the following operation in turn for each remaining mode of the M modes:

if the distances from the remaining mode to all elements of W are greater than 1, adding the remaining mode to W;

(2s.c) performing a fine pattern search on all elements in W;

(3 s) rate-distortion cost based termination partitioning:

wherein: the value range of K is {1,2,3,4},

the value corresponding to K = {1,2,3,4} is {1.5,1.2,1.1,1},

the sum of the hadamard costs of the first K sub-CUs,

represents the rate-distortion cost of the ith sub-CU,

is the rate-distortion cost of the current CU;

the second algorithm step comprises:

(1 s) CU depth prediction:

if

the depth range of the current coding block is set equal to that of the co-located coding block in the previous frame; if

the depth range of the current coding block is set to

wherein

is the minimum coding depth of the co-located coding block in the previous frame,

(2 s) rate-distortion cost based termination partitioning:

if the rate-distortion cost J of the CU with the current coding block depth of d meets the following conditions:

wherein

(3 s) fast candidate pattern screening:

(3s.a) let m be the largest index in P among the three MPM elements; the set P can then be reduced to $P = \{p(0), p(1), \ldots, p(m)\}$, where $M - 1 \ge m$;

(3s.b) for every element of the new set P, if $J_{SATD}(p(i)) > 1.3\,J_{SATD}(p(0))$ is satisfied, the element $p(i)$ is removed from the set P, where $J_{SATD}(p(i))$ and $J_{SATD}(p(0))$ denote the rate-distortion costs of the elements $p(i)$ and $p(0)$ of P, respectively;

the intermediate information includes: the maximum coding depth of the CTU, the minimum coding depth of the CTU, the ratio of the number of the maximum coding depth CUs to the number of all CUs in the CTU, the total rate-distortion cost of the CTU at the current position of the previous frame, the difference between the maximum coding depth of the left CTU and the minimum coding depth, the ratio of the number of the maximum coding depth CUs of the left CTU to the number of all CUs in the CTU, the difference between the maximum coding depth of the right CTU and the minimum coding depth, the ratio of the number of the maximum coding depth CUs of the right CTU to the number of all CUs in the CTU, and the time for coding the CTU;

in the step (3), the step of determining the encoding process includes:

(b) Inputting the characteristics stored when the coding of the CTU at the current position of the previous frame is finished into a decision tree for prediction, if the prediction result is 0, coding the CTU by using a first algorithm, and if the result is 1, coding the CTU by using a second algorithm;

(c) Collecting intermediate information in the encoding process of the step (b) and encoding the next CTU;

(d) Repeating steps (b) and (c) until all CTU codes are completed.