CN116320436B - Decision-tree-based VVC (Versatile Video Coding) fast coding method - Google Patents


Info

Publication number
CN116320436B
CN116320436B (application CN202310331719.2A)
Authority
CN
China
Prior art keywords
mode
coding
modes
cus
decision tree
Prior art date
Legal status
Active
Application number
CN202310331719.2A
Other languages
Chinese (zh)
Other versions
CN116320436A (en)
Inventor
汪大勇
储浩
陈柳林
黄令
邝毅
梁鹏
许亚庆
Current Assignee
Chongqing University of Posts and Telecommunications
Original Assignee
Chongqing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Chongqing University of Posts and Telecommunications
Priority to CN202310331719.2A
Publication of CN116320436A
Application granted
Publication of CN116320436B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/103: Selection of coding mode or of prediction mode
    • H04N19/147: Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/149: Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
    • H04N19/157: Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159: Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction


Abstract

The invention belongs to the field of video coding and specifically relates to a decision-tree-based VVC (Versatile Video Coding) fast coding method, comprising the following steps: obtain features of the current CU, predict the probability distribution of its prediction modes, set a probability threshold, test the candidate modes in order of probability while accumulating the probabilities of the tested modes, until the accumulated probability is greater than or equal to the threshold; if the optimal prediction mode of the current CU is the MERGE/SKIP mode, evaluate whether its prediction quality allows the coding flow to terminate early; and obtain the coding information of the CU to be partitioned and the features of its sub-blocks, obtain the probability distribution of the partition modes from a decision tree, and try partition modes until the probability sum is greater than or equal to the threshold. The invention builds separate decision-tree models for CUs of different sizes, addressing the low prediction accuracy of machine learning, and proposes a probability-sum threshold to overcome the inability of a single mode threshold to adapt to complex video scenes, thereby improving coding efficiency.

Description

Decision-tree-based VVC (Versatile Video Coding) fast coding method
Technical Field
The invention belongs to the field of video coding and specifically relates to a decision-tree-based VVC (Versatile Video Coding) fast coding method.
Background
With the development of network technology, video has entered every aspect of daily life: online education, remote video conferencing, live game streaming, short video, and so on. Expectations for video quality keep rising, and ultra-high-definition formats such as 4K and 8K are used ever more frequently. High-quality, high-resolution, wide-color-gamut content inevitably increases the amount of video data, posing a serious challenge for video transmission. The Joint Video Experts Team therefore developed the next-generation video coding standard, Versatile Video Coding (VVC). The VVC standard has two main objectives: first, to specify a video coding technique whose compression capability far exceeds that of previous generations of such standards; second, to make the technique versatile enough for efficient use in a wider range of applications. VVC adopts advanced coding techniques and tools, but its coding complexity grows several-fold, seriously hindering its application and popularization. Optimizing the coding process to reduce complexity while keeping video quality essentially unchanged has therefore become a current research hotspot.
Current research methods fall into two major categories: statistics-based and machine-learning-based. The main idea of statistics-based methods is to predict the coding depth or mode of a coding unit (CU) from its texture and motion characteristics, evaluate the prediction with the transform residual and the rate-distortion cost (RDCost), and terminate the coding flow early for CUs with good predictions. Machine-learning-based methods extract CU pixel information and coding information as features and train a network model to predict the CU's mode, depth, and so on.
Although existing methods reduce coding complexity to some extent, they have several problems: 1. Statistics-based methods require manual intervention and parameter setting, which become very difficult and time-consuming when the data volume is extremely large and the features are complex. 2. Conventional fast algorithms usually decide whether to adopt a certain prediction or partition mode from a fixed threshold, but VVC has many coding and partition modes and video content is diverse, so a fixed or single threshold cannot adapt to complex and changing coding scenes. 3. Most prior research targets the previous-generation standard HEVC; under the VVC coding structure, many of those algorithms no longer apply.
Disclosure of Invention
To solve the above technical problems, the invention provides a decision-tree-based VVC (Versatile Video Coding) fast coding method, comprising the following steps:
S1: classify the current CU into one of three categories by size and build a decision tree for each category. Obtain the parent-CU coding mode, the neighboring-CU coding modes, and the pixel standard deviation of the current CU, and use the decision tree to obtain probability values for the candidate prediction modes from these features. Code-test the prediction modes in descending order of probability, accumulating the probabilities of the tested modes; once the accumulated probability reaches a set threshold, stop testing and skip the untested modes. Take the tested prediction mode with the minimum rate-distortion cost as the optimal mode;
S2: judge whether the current optimal prediction mode is the MERGE/SKIP mode; if yes, go to step S3, otherwise go to step S4;
S3: for a CU whose optimal prediction mode is the MERGE/SKIP mode, obtain the pixel standard deviation, the mode information of spatio-temporally neighboring CUs, the rate-distortion costs of historically coded CUs, the horizontal and vertical motion features, and the video resolution, and use a decision tree on these features to predict whether the current CU can terminate the coding flow early; if yes, end the coding flow, otherwise go to S4;
S4: obtain the parent-CU mode, the spatio-temporally neighboring CU modes, the horizontal and vertical motion features, and the partition sequence of the CU; compute the pixel standard deviations of the CU's sub-blocks and the direction of the CU's texture gradient; build decision trees for CUs of different sizes and use them to predict the probability distribution of the partition modes; code-test the partition modes in order until the probability sum of the tried modes exceeds a set threshold, then stop traversing partition modes;
S5: after the current CU is partitioned, repeat steps S1 to S4 on its sub-CUs.
The invention has the beneficial effects that:
the invention provides a method for respectively establishing decision tree models for CUs with different sizes by utilizing the characteristics of model correlation, sub-block pixel standard deviation, division sequences and the like, solves the problem of lower accuracy of machine learning prediction, provides a probability and threshold value method, solves the problem that a single mode threshold value cannot adapt to a complex video scene, and improves the coding efficiency.
Drawings
FIG. 1 is a flow chart of a decision tree-based VVC fast coding method according to the present invention;
FIG. 2 is an exemplary diagram of parent-child CUs of the present invention;
FIG. 3 is an exemplary diagram of a spatial-temporal neighboring CU according to the present invention;
FIG. 4 is a diagram illustrating the relationship between the inter-mode prediction threshold T and the encoding time and BDBR according to the present invention;
FIG. 5 is a schematic diagram of the partition-depth distribution of sequences of different resolutions according to the present invention;
FIG. 6 is a diagram of a reference CU and motion-vector sample points according to the present invention;
FIG. 7 is a schematic diagram showing the relationship between the partition-mode prediction threshold P and the encoding time and BDBR according to the present invention.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without inventive effort fall within the protection scope of the invention.
A decision-tree-based VVC fast coding method, as shown in Fig. 1, comprises the following steps:
S1: classify the current CU into one of three categories by size and build a decision tree for each category. Obtain the parent-CU coding mode, the neighboring-CU coding modes, and the pixel standard deviation of the current CU, and use the decision tree to obtain probability values for the candidate prediction modes from these features. Code-test the prediction modes in descending order of probability, accumulating the probabilities of the tested modes; once the accumulated probability reaches a set threshold, stop testing and skip the untested modes. Take the tested prediction mode with the minimum rate-distortion cost as the optimal mode;
S2: judge whether the current optimal prediction mode is the MERGE/SKIP mode; if yes, go to step S3, otherwise go to step S4;
S3: for a CU whose optimal prediction mode is the MERGE/SKIP mode, obtain the pixel standard deviation, the mode information of spatio-temporally neighboring CUs, the rate-distortion costs of historically coded CUs, the horizontal and vertical motion features, and the video resolution, and use a decision tree on these features to predict whether the current CU can terminate the coding flow early; if yes, end the coding flow, otherwise go to S4;
S4: obtain the parent-CU mode, the spatio-temporally neighboring CU modes, the horizontal and vertical motion features, and the partition sequence of the CU; compute the pixel standard deviations of the CU's sub-blocks and the direction of the CU's texture gradient; build decision trees for CUs of different sizes and use them to predict the probability distribution of the partition modes; code-test the partition modes in order until the probability sum of the tried modes exceeds a set threshold, then stop traversing partition modes;
S5: after the current CU is partitioned, repeat steps S1 to S4 on its sub-CUs.
In step S1, CU features are obtained and the decision-tree prediction yields probability values for the candidate prediction modes, which are then tried in descending order of probability. The decision-tree model is built with the DecisionTreeClassifier of the scikit-learn machine-learning library in Python, tuned with GridSearchCV, and finally exported to C++ with the sklearn-porter library so that it can be embedded into the VTM encoder. Specifically, the implementation flow is as follows:
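The model-building pipeline described above can be sketched as follows; the feature layout, the toy training data, and the parameter grid are illustrative assumptions, not the patent's exact setup:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Toy stand-in for features logged from the encoder:
# [parent_mode, left_mode, above_mode, pixel_std]
X = rng.random((200, 4))
y = rng.integers(0, 4, 200)  # 4 candidate prediction-mode classes

# Parameter tuning with GridSearchCV, as described in the text.
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [3, 5, 8], "min_samples_leaf": [1, 5, 20]},
    cv=3,
)
grid.fit(X, y)
tree = grid.best_estimator_

# predict_proba yields the per-mode probability distribution
# that the encoder-side logic consumes.
probs = tree.predict_proba(X[:1])[0]
```

In the patent's flow this model is then exported to C++ (e.g. with sklearn-porter) and embedded in the VTM encoder, so only the learned split thresholds and leaf probabilities cross the language boundary.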
Features are extracted during the encoding process to serve as the training set of the decision tree. The feature values used are:
parent CU coding mode
A parent-child CU relationship arises when a large CU is split by some partition mode into smaller CUs: the larger CU is the parent and the smaller CUs are its children. As shown in Fig. 2, a 32x32 CU quadtree-split once yields four 16x16 CUs; the 32x32 CU is the parent of those four child CUs. Because a child CU is part of its parent, the two tend to have similar motion information and texture characteristics, and so with high probability use the same prediction mode during predictive coding. Over the coding results of 22 test sequences, Table 1 shows, for each optimal prediction mode selected by a parent CU, the distribution of the prediction modes selected by its child CUs. The diagonal entries are clearly larger than the off-diagonal ones, showing a correlation between parent-CU and child-CU modes; the first column is also large because, in inter prediction, the MERGE/SKIP mode is the most used overall, so child blocks select it with high frequency.
Table 1 Distribution of child-CU prediction modes when the parent CU adopts a given mode
Neighboring CU coding modes
Examples of spatio-temporally neighboring CUs are shown in Fig. 3. Among the spatial neighbors, the coding modes of the left neighboring CU (block B in Fig. 3) and the upper neighboring CU (block C in Fig. 3) are selected; the temporal neighbor contributes the mode of the co-located reference CU. To handle edge CUs that lack spatial neighbors, and considering that the distributions of the various neighboring-CU modes are similar, the experiment merges the temporal and spatial neighboring modes into a single feature, as follows:
The neighboring-block coding mode takes the modes M1 and M2 of the spatially left and upper coded CUs, and the co-located temporal CU coding mode M3. If M1, M2, and M3 are all empty, CM = PM (the parent-CU mode); otherwise CM is taken from among M1, M2, and M3.
Standard deviation of CU pixels
In the common inter-prediction coding modes, each mode has video scenes it suits. The Merge/Skip mode suits blocks with smooth motion and simple texture; the Affine mode suits video content containing rotation and scaling; the Geo mode divides the CU into two parts, each small block carrying its own motion vector, and suits blocks containing edges. Blocks with simple texture and large flat areas have a small pixel standard deviation, while blocks with complex texture and intense motion have a large one, so the pixel standard deviation can reflect the texture complexity of a CU. For a coding unit α of width W and height H, the standard deviation is computed as:

δ = sqrt( (1/(W·H)) · Σ_{i=1..W} Σ_{j=1..H} (x_{i,j} − x̄)² )

where δ is the standard deviation, x_{i,j} is the pixel value at coordinates (i, j), and x̄ is the mean pixel value of the CU.
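A minimal NumPy sketch of this texture-complexity feature, following the standard-deviation definition above:

```python
import numpy as np

def cu_pixel_std(block):
    """Standard deviation of the pixel values of a W x H CU,
    used as the texture-complexity feature described above."""
    x = np.asarray(block, dtype=np.float64)
    return float(np.sqrt(np.mean((x - x.mean()) ** 2)))
```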
Table 2 analyzes the mode distributions of CUs from 128x128 down to 4x4 during VVC encoding. The differences in mode distribution between CU sizes are not large overall, but the VVC encoder imposes size restrictions on mode usage. For example, in CUs of size 128x128, 128x64, and 64x128 the proportion of the GEO and INTRA modes is 0; in CUs of size 4x8, 8x4, 4x16, and 16x4 the proportion of the AFFINE and GEO modes is 0; the remaining CUs can use all modes. CUs are therefore divided into three categories: CUs with a side of length 128, CUs with a side of length 4, and all other CUs, and a decision tree is built for each category to improve prediction accuracy.
Table 2 Coding-mode selection distribution for CUs of different sizes
The obtained feature vector is input into the trained decision-tree model, whose output is the predicted probability distribution over the coding modes. For example, if the decision tree outputs [0.1, 0.2, 0.3, 0.4], these are the probabilities of the class-1, 2, 3, and 4 coding modes respectively.
A subset of prediction modes is then selected for code-testing according to the probability values predicted by the decision tree and the set probability-sum threshold. To address the low accuracy of predicting a single mode with machine learning, the invention uses the probability sum to select the one or several most probable modes, reducing the coding loss caused by misprediction. The scheme is realized as follows:
1. Push the coding modes onto a stack in ascending order of the decision tree's predicted probabilities (so the most probable mode is on top), set a threshold T, and initialize the probability sum of coded modes to t = 0.
2. Pop the top-of-stack mode, add its probability to the sum t, and code the mode.
3. If t ≥ T, terminate the flow; otherwise return to step 2.
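The three steps above can be sketched as follows (the stack is implicit: modes are simply visited in descending probability order):

```python
def select_modes(probs, T=0.75):
    """Try candidate modes in descending probability order until the
    cumulative probability of tried modes reaches the threshold T
    (0.75 in the text). Returns the indices of modes to code-test."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    tried, t = [], 0.0
    for i in order:
        tried.append(i)
        t += probs[i]
        if t >= T:  # probability sum reached the threshold
            break
    return tried
```

With the decision-tree output [0.1, 0.2, 0.3, 0.4] mentioned above and T = 0.75, modes 4, 3, and 2 are tested and mode 1 is skipped.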
Fig. 4 shows, for thresholds from 0 to 1, the encoding-time reduction relative to the original encoder and the BDBR change of each test sequence in the reference software VTM10.2. As the threshold increases, the BDBR increase shrinks, but the encoding-time saving shrinks as well. Observing the curves, a threshold of 0.75 performs well on both the BDBR increase and the encoding-time reduction, so the invention selects T = 0.75.
The original encoder has a single prediction-quality metric, the rate-distortion cost (RDCost): comparing the RDCost values of the candidate modes, the smaller the cost, the better the prediction. The encoder keeps two variables, one holding the minimum RDCost found so far and one holding the RDCost of the current mode; whenever the current cost is smaller than the optimum, it replaces it, so the mode with minimum RDCost is found in one traversal, but the full traversal is time-consuming. For the MERGE/SKIP mode, the prediction quality is instead estimated by feeding the CU's features into a decision tree and taking its output as the predicted quality.
In step S3, the invention proposes a method that terminates the predictive-coding flow early for the MERGE/SKIP mode, combining CU texture features, coding information, and the rate-distortion costs of historically coded CUs with a decision tree. The specific implementation flow is as follows:
During video coding, CUs with simple texture, smooth motion, or good prediction quality are often coded at larger sizes. The following features are therefore extracted:
CU size, quantization Parameter (QP)
Table 3 analyzes, over the VVC reference test sequences, the proportion of CUs of different sizes and coding modes (MERGE, SKIP, AMVP, AFFINE, GEO, and INTRA) that are not further partitioned. The results show that, for the same QP and CU size, the probability that a CU coded in MERGE/SKIP or AFFINE mode is not further partitioned is significantly higher than for the other modes. With QP = 27 and a 64x64 block, the proportion of MERGE/SKIP CUs that stop partitioning reaches 75%, whereas for AMVP, GEO, and INTRA it is only 20%, 21%, and 4% respectively. These results show that the MERGE/SKIP and AFFINE modes greatly reduce the number of CUs that continue to partition, thereby reducing coding complexity and bit rate.
Furthermore, the QP value also affects whether a CU is further partitioned, and the effect is more pronounced for larger blocks. For example, at QP 27 and 37, 75% and 86% respectively of 64x64 CUs coded with MERGE/SKIP stop partitioning, while for 32x32 CUs the figures are 78% and 84%. Thus a larger QP or a larger CU size makes early termination more likely, which can be exploited to reduce coding complexity and bit rate more effectively.
The table also shows that CU size is an important factor in whether a CU continues to partition: as the CU size increases, the proportion of CUs that stop partitioning gradually rises from near 0 to near 1. Hence, during encoding, more attention should be paid to CU size and partitioning in order to better balance coding complexity and video quality.
Table 3 Proportion of CUs that are no longer partitioned when each size of CU adopts a given mode, for QP 22 and 27
Classification of CUs according to RDCost of historic encoded CUs
In the VVC reference software, coding a CU first traverses the inter- and intra-prediction modes, then tries the partition modes; rate-distortion optimization yields each mode's rate-distortion cost, and the optimal coding mode is chosen by comparing these costs. In this process, evaluating the prediction quality of one mode requires computing and comparing the rate-distortion costs of all modes, which is accurate but sharply increases coding complexity for little benefit. To solve this, a method is proposed that evaluates the current prediction using the information of historically coded CUs: save the modes and RDCosts of the latest n coded CUs of the same size and compute their average. If the current CU's RDCost exceeds the historical average, classify it as class 1, indicating poor prediction; if it is below the average, the prediction is considered good and the coding flow terminates with high probability. When the (n+1)-th CU is coded, the RDCost of the 1st CU is removed and that of the (n+1)-th is saved.
C = 1, if RDCost_cur > RDCost_his; C = 0, otherwise

where C is the class of the current CU, RDCost_cur is the rate-distortion cost of the current CU after coding, and RDCost_his is the average rate-distortion cost of the historically coded CUs of the same size.
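A minimal sketch of the sliding RDCost history described above; the window length n is not fixed by the text here, so n = 3 below is illustrative:

```python
from collections import deque

class RDCostHistory:
    """Rolling window of the last n RDCosts of same-size coded CUs.
    classify() returns 1 (cost above the historical mean: poor
    prediction) or 0 (good prediction), per the rule above."""
    def __init__(self, n=10):
        self.costs = deque(maxlen=n)  # the oldest entry drops automatically

    def classify(self, rdcost_cur):
        # Compare against the history first, then record the new cost.
        c = 1 if self.costs and rdcost_cur > sum(self.costs) / len(self.costs) else 0
        self.costs.append(rdcost_cur)
        return c
```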
Video resolution
In real life there are many types of terminal devices, so video of the same content exists in multiple resolution versions, and different coding strategies are needed for different resolutions. For video of the same content, a high-resolution sequence has relatively smooth texture changes but relatively larger motion vectors, which leads to the following phenomenon: in regions of simple texture and gentle motion, CUs in high-resolution video tend not to be partitioned and are coded at large sizes; in regions of intense motion and object edges, high-resolution video tends to keep partitioning, because the motion vectors are amplified, and predicts with smaller CUs. As shown in Fig. 5, the left and right parts are the partition-depth maps of frame 3 of the RaceHorses sequence in classes C and D respectively, under the RA configuration at QP 37; the class-C resolution is 832x480 and the class-D resolution is 416x240. Under the same coding conditions, in the low-depth range the higher-resolution class-C result on the left is dominated by depth 2, with some dark regions of depth 1 (large CUs), whereas in the class-D result depth 3 occupies the most area. In the high-depth range, the left contains some CUs of depth 6, while the maximum depth on the right is only 5. This analysis confirms the conjectured influence of video resolution on CU partitioning, so the video resolution is also used to help judge whether a CU terminates the coding flow early.
Parent CU mode and space-time adjacent CU mode
The analysis shows that CU modes are correlated during coding, and the correlation is more pronounced for the Merge/Skip mode. If the neighboring CUs and the parent CU all adopt the Merge/Skip mode and the current CU does as well, the texture of the CU region is likely simple, so fine partitioning is unnecessary or brings little gain. The invention therefore uses the parent-child mode and the spatio-temporal neighboring CU modes as decision-tree features.
Whether to terminate partitioning early is decided from the decision tree's prediction: if the prediction is to terminate, the coding flow ends; if it is to continue, the candidate partition modes are tried further.
The VVC standard introduces multi-type-tree (MTT) partitioning: in addition to quadtree splitting there are horizontal binary, vertical binary, horizontal ternary, and vertical ternary splits. Partitioning generally follows the edges or the texture direction of moving objects, so the following features are proposed:
Sub-block pixel standard deviation: evaluates the differences between the sub-blocks of each candidate partition; the greater the difference, the more likely that partition is to be chosen.
Video partitioning tends to partition CUs into two or more parts that are simple and complex to texture. Therefore, it is feasible to predict the partition of the CU using the pixel standard deviation of the sub-block. In order to avoid repeated calculation of standard deviations of various divided sub-blocks, the experiment divides the CU into 16 equally sized small blocks, calculates the pixel standard deviations thereof, and represents the standard deviations of various divisions with the pixel standard deviations of the 16 small blocks. The standard deviation of each small block in the figure is calculated by the following steps:
δ_k = round( √( 16/(W·H) · Σ_{(i,j)∈B_k} ( x_{i,j} − x̄ )² ) )

wherein W and H represent the width and height of the CU, x_{i,j} represents the pixel value at coordinates (i, j), x̄ represents the CU pixel mean, B_k is the set of pixels of the k-th small block (each of the 16 blocks contains W·H/16 pixels), δ_k represents the pixel standard deviation of the k-th small block, and the round function rounds the result to an integer. For example, when the CU uses quadtree partitioning, the pixel standard deviations of its four sub-blocks can be represented by the small-block combinations (1, 2, 5, 6), (3, 4, 7, 8), (9, 10, 13, 14) and (11, 12, 15, 16) respectively. Representing the sub-blocks of each candidate partition by combinations of the smaller blocks avoids a large amount of repeated computation, and the resulting error is small enough to be ignored.
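The 16-block scheme can be sketched as follows. The deviations are taken around the CU-level mean, as in the formula, and the pooling rule (averaging the four small-block standard deviations to stand in for a quadtree sub-block) is one possible interpretation of the combination described above, assumed here for illustration:

```python
import numpy as np

def subblock_stds(cu):
    """Pixel standard deviation of each block in the CU's 4x4 grid,
    computed around the CU-level mean as in the text's formula.
    Blocks are numbered 1..16 in raster order (index 0..15 here)."""
    cu = np.asarray(cu, dtype=np.float64)
    h, w = cu.shape
    bh, bw = h // 4, w // 4
    xbar = cu.mean()                     # CU pixel mean
    stds = []
    for r in range(4):
        for c in range(4):
            block = cu[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            stds.append(float(np.sqrt(((block - xbar) ** 2).mean())))
    return stds

def pooled_std(stds, group):
    """Stand-in std of a larger sub-block from its small-block stds,
    e.g. group (1, 2, 5, 6) for the top-left quadtree sub-block.
    Averaging is an assumed pooling rule, for illustration only."""
    return sum(stds[i - 1] for i in group) / len(group)
```

For a quadtree split the four sub-block values would then come from the groups (1, 2, 5, 6), (3, 4, 7, 8), (9, 10, 13, 14) and (11, 12, 15, 16), each computed once and reused across candidate partitions.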
Gradient information helps decide between horizontal and vertical partitioning, since both the binary and the ternary splits come in horizontal and vertical variants. With Gx and Gy denoting the horizontal and vertical gradients, the larger the ratio Gx/Gy, the more suitable horizontal partitioning is; the smaller the ratio, the more suitable vertical partitioning is.
CU pixel texture direction
There are many ways to measure the texture direction of a CU; a common one is the texture direction histogram. Specifically, for each pixel P in the CU, the horizontal and vertical gradients are computed with the Sobel operator, and the gradient direction θ of P is obtained with the arctangent. Since θ ranges over [−π/2, π/2], this experiment maps the whole range onto the integers [0, 255] to represent the gradient direction, using the mapping:
G=round((θ+π/2)/π*255)
where round is a rounding function, so each mapped gradient direction is an integer between 0 and 255, which simplifies subsequent processing. A texture direction histogram is then built: the height of each bin represents the sum of the gradient values in that direction, so the direction corresponding to the histogram peak is the main texture direction of the CU. Measuring the CU texture direction captures the texture information inside the CU and provides useful cues for prediction mode selection and motion estimation in video coding.
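The Sobel-plus-histogram procedure can be sketched as below. The magnitude weighting (|Gx| + |Gy|) and the handling of zero Gx are assumptions of this sketch; the mapping G = round((θ + π/2)/π · 255) follows the text:

```python
import numpy as np

def texture_direction(cu):
    """Dominant texture direction of a CU (returns the peak bin, 0-255).

    Sobel gradients Gx, Gy are computed on the interior pixels, each
    pixel's angle theta = atan(Gy/Gx) lies in [-pi/2, pi/2] and is
    mapped with G = round((theta + pi/2) / pi * 255); the histogram is
    weighted by |Gx| + |Gy| so bin height is the sum of gradient values.
    """
    cu = np.asarray(cu, dtype=np.float64)
    # 3x3 Sobel responses on the interior pixels
    gx = (cu[:-2, 2:] + 2 * cu[1:-1, 2:] + cu[2:, 2:]
          - cu[:-2, :-2] - 2 * cu[1:-1, :-2] - cu[2:, :-2])
    gy = (cu[2:, :-2] + 2 * cu[2:, 1:-1] + cu[2:, 2:]
          - cu[:-2, :-2] - 2 * cu[:-2, 1:-1] - cu[:-2, 2:])
    safe_gx = np.where(gx == 0, 1e-12, gx)      # avoid division by zero
    theta = np.arctan(gy / safe_gx)             # in [-pi/2, pi/2]
    bins = np.round((theta + np.pi / 2) / np.pi * 255).astype(int)
    mag = np.abs(gx) + np.abs(gy)
    hist = np.zeros(256)
    np.add.at(hist, bins.ravel(), mag.ravel())  # accumulate per bin
    return int(np.argmax(hist))
```

For a block with purely vertical edges (gradient only in the horizontal direction, θ = 0), the peak lands at bin 128, the midpoint of the mapped range.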
Horizontal and vertical motion characteristics
In video coding, motion vectors are an important piece of information: they describe the motion of objects between adjacent frames, and their magnitude and direction reflect the speed and direction of that motion in the image. VVC codes in units of blocks, and all pixels within a CU are motion compensated with the same MV. To improve prediction accuracy, encoders often partition a CU into smaller CUs, each part compensated with a different MV.
Based on the above analysis, a method is presented here that represents the motion state of each sub-region of a CU with motion vectors from the reference CU. As shown on the left of fig. 6, the corresponding reference pixels can be located through the motion vector of the current CU; the MVs at the four vertices of the reference region are sampled, as shown on the right of fig. 6, and used to compute the horizontal motion feature hmv and the vertical motion feature wmv. Both are initialized to 0; if MV1 = MV2 and MV3 = MV4 both hold, hmv is incremented by 1, and if MV1 = MV3 and MV2 = MV4 both hold, wmv is incremented by 1.
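The hmv/wmv update rule can be written directly. MVs are represented here as (x, y) tuples, with an assumed vertex layout of MV1 top-left, MV2 top-right, MV3 bottom-left, MV4 bottom-right:

```python
def motion_features(mv1, mv2, mv3, mv4):
    """Horizontal/vertical motion features from the four vertex MVs.

    hmv counts row-wise agreement (MV1 == MV2 and MV3 == MV4),
    wmv counts column-wise agreement (MV1 == MV3 and MV2 == MV4),
    following the update rule described in the text.
    """
    hmv = 1 if (mv1 == mv2 and mv3 == mv4) else 0
    wmv = 1 if (mv1 == mv3 and mv2 == mv4) else 0
    return hmv, wmv
```

A region whose top and bottom halves each move uniformly, but differently from one another, thus yields hmv = 1 and wmv = 0, hinting at a horizontal split.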
Split series
In the VVC coding standard, an image is coded in units of blocks; in inter prediction, for example, all pixels in one CU are motion compensated with the same MV information. To obtain a better prediction, the encoder further divides the coding unit (CU) into smaller blocks so that the texture and motion information within each block are more homogeneous, and records the CU's partitioning with a split series. In the split series, digits denote the partition types: 1 represents a quadtree split, 2 a horizontal binary split, 3 a vertical binary split, 4 a horizontal ternary split, and 5 a vertical ternary split. As shown in the figure, the upper gray CU is obtained from the 128x128 CTU through two quadtree splits and one vertical ternary split, so its split series is 115; the lower gray CU is obtained through one quadtree split, two vertical binary splits and one horizontal binary split, so its split series is 1332. The split series carries a large amount of information. The partition depth can be read off directly, since each digit represents one split. The shape can also be inferred: for the lower gray CU with split series 1332, the size is 64x64 after the quadtree split, becomes 32x64 and then 16x64 after the two vertical binary splits, and becomes 16x32 after the horizontal binary split. For blocks containing a ternary split, more information is needed to infer the size, because a ternary split is not uniform but follows a 1:2:1 ratio.
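The size derivation above can be sketched for binary and quadtree splits, assuming the VVC convention that a vertical split halves the width and a horizontal split halves the height (ternary splits would also need the child index because of the 1:2:1 ratio):

```python
def cu_size_from_split_series(series, ctu_size=(128, 128)):
    """Derive a CU's (width, height) from its split series.

    Codes: 1 = quadtree, 2 = horizontal binary, 3 = vertical binary.
    Sufficient for the binary/quad example in the text; ternary splits
    (codes 4 and 5) produce unequal 1:2:1 children, so the position of
    the CU among the children would also be needed.
    """
    w, h = ctu_size
    for code in series:
        if code == "1":      # quadtree: both dimensions halve
            w, h = w // 2, h // 2
        elif code == "2":    # horizontal binary: height halves
            h //= 2
        elif code == "3":    # vertical binary: width halves
            w //= 2
        else:
            raise ValueError("ternary splits need the child index")
    return w, h
```

For split series 1332 this reproduces the chain 64x64 → 32x64 → 16x64 → 16x32 described above.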
The MTT partition mode fast decision algorithm builds a decision tree for each CU size, covering CUs in non-key frames (P and B frames) whose width and height do not exceed 64 and whose width or height exceeds 8. In the VVC standard, a CU that has been produced by binary or ternary splitting is no longer allowed to be quadtree split; however, a 32x32 or 16x16 CU may arise either way. For example, a 32x32 CU can be obtained from a 64x64 CU by one quadtree split, or by one horizontal binary split followed by one vertical binary split, and only in the former case may it be quadtree split again. As shown in the table, the proportions of the partition modes in blocks that allow and blocks that forbid quadtree splitting were counted over the test sequences and found to differ greatly, so two decision trees are built for each of the 32x32 and 16x16 CU sizes. In total, the MTT partition mode fast decision algorithm builds 24 decision trees.
Table 5 Partition mode distribution of blocks allowing and forbidding quadtree splitting
A subset of partition modes is then selected for encoding according to the probability values output by the decision tree. As with the prediction modes, machine-learning prediction of the partition mode also suffers from limited accuracy, so step S5 adopts a probability-and-threshold selection method similar to that of step S3. The specific flow is as follows:
According to the probability values output by the decision tree, the partition modes are pushed onto a stack in order of ascending probability (so the most probable mode ends up on top); a probability threshold P is set, and the cumulative probability p of the coded partition modes is initialized to 0.
The partition mode on the stack top is popped, its probability value is added to p, and that partition mode is encoded.
p is compared with P: if p is greater than P, traversal of the partition modes stops and the uncoded modes are skipped; otherwise the previous step is repeated.
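The three steps above can be sketched as follows (a minimal sketch of the selection loop, not the encoder integration):

```python
def select_partition_modes(probs, threshold=0.75):
    """Probability-and-threshold partition mode selection.

    probs maps partition mode name -> predicted probability. Modes are
    pushed onto a stack in ascending probability order, so the most
    probable mode is popped and coded first; once the cumulative
    probability p of the coded modes exceeds the threshold P, the
    remaining modes are skipped.
    """
    stack = sorted(probs, key=probs.get)  # ascending; top = most probable
    coded, p = [], 0.0
    while stack and p <= threshold:
        mode = stack.pop()                # take the stack-top mode
        coded.append(mode)                # "encode" this partition mode
        p += probs[mode]
    return coded
```

With probabilities {QT: 0.5, HBT: 0.3, VBT: 0.1, HTT: 0.06, VTT: 0.04} and P = 0.75, only QT and HBT are tried before the cumulative probability 0.8 exceeds the threshold.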
Fig. 7 shows the average time saving and BDBR change over the reference video sequences as the threshold P varies from 0.7 to 0.95. The trade-off between time saving and BDBR increase is best at 0.75, so the invention takes 0.75 as the threshold P.
Encoding proceeds recursively. For a CU, the prediction modes and the partition modes are alternatives at the same level, and eventually only one is selected; if a partition mode is selected, the corresponding sub-CUs are produced, and each sub-CU in turn has prediction modes and partition modes to select from. Steps S1-S4 correspond to the coding of a CU, and step S5 is the coding of the sub-CUs when the CU tries a partition mode.
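The recursion between S1-S4 and S5 can be sketched as a skeleton; the three callback functions stand in for the decision-tree predictions and are assumptions of this sketch:

```python
def encode_cu(cu, predict_mode, should_stop, partitions):
    """Recursive coding skeleton matching steps S1-S5 (illustrative).

    predict_mode(cu)  -> best prediction mode for the CU     (S1)
    should_stop(cu)   -> early-termination decision           (S2-S3)
    partitions(cu)    -> list of sub-CUs of the tried splits  (S4)
    Returns the list of (cu, mode) pairs visited, in coding order.
    """
    visited = [(cu, predict_mode(cu))]               # S1
    if visited[0][1] == "MERGE/SKIP" and should_stop(cu):  # S2-S3
        return visited                               # stop splitting here
    for sub in partitions(cu):                       # S4
        visited += encode_cu(sub, predict_mode, should_stop, partitions)  # S5
    return visited
```

With a toy model where CUs of size 64 and 32 choose an inter mode and split in two, while CUs of size 16 choose MERGE/SKIP and terminate, the recursion visits 1 + 2 + 4 = 7 CUs.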
Experimental gain results
The test results of the class A-E video sequences of the VVC common test conditions on the VTM10.2 encoder are as follows: the proposed algorithm reduces the average encoding time by 61.43% while BDBR increases by only 2.67%.
TABLE 6 BDBR change and time reduction of the proposed method on the VVC common test sequences
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (7)

1. A decision tree-based VVC fast encoding method, comprising:
s1: dividing the current CU into three types according to different sizes, establishing a decision tree for each type of CU, acquiring a parent CU coding mode, an adjacent CU coding mode and a pixel standard deviation of the current CU, utilizing the decision tree to acquire probability values of various prediction modes according to the acquired parent CU coding mode, the adjacent CU coding mode and the pixel standard deviation of the current CU, sequentially coding and testing various prediction modes according to probability from high to low, recording the probability sum of the coded prediction modes, ending the coding prediction mode when the probability sum reaches a set threshold value, skipping over an uncoded mode, and taking the prediction mode with the minimum rate distortion cost in the coded and tested prediction mode as the optimal mode;
s2: judging whether the current optimal prediction mode is the MERGE/SKIP mode; if so, entering step S3, and if not, entering step S4;
s3: for a CU whose optimal prediction mode is the MERGE/SKIP mode, acquiring the pixel standard deviation, the spatio-temporally adjacent CU modes, the rate-distortion costs of historically coded CU modes, the horizontal and vertical motion features and the video resolution, and using a decision tree to predict from these features whether the current CU terminates the coding flow early; if so, ending the coding flow, and if not, entering S4;
s4: acquiring the parent CU mode, the spatio-temporally adjacent CU modes, the horizontal and vertical motion features and the split series of the CU, and calculating the pixel standard deviations of the CU's sub-blocks and the CU texture gradient direction; establishing decision trees for CUs of different sizes and using them to predict the probability distribution over the partition modes; coding and testing the partition modes in turn according to the decision tree output until the probability sum of the tried modes is greater than a set threshold, then stopping the traversal of the partition modes;
s5: after the current CU is divided, steps S1 to S4 are repeatedly performed on its sub-CUs.
2. The VVC fast encoding method based on the decision tree according to claim 1, wherein the current CU is classified into three categories, including:
there are 128-sized CUs in length and width, 4-sized CUs in length and width, and other-sized CUs.
3. The VVC fast encoding method based on the decision tree according to claim 1, wherein the parent CU encoding mode, the neighboring CU encoding mode and the pixel standard deviation of the current CU include:
the parent CU encoding mode: when a large CU is divided into smaller CUs by some partition mode, the large CU is the parent CU and the resulting small CUs are its child CUs; a child CU is part of its parent CU, so the two often share similar motion information and texture characteristics and, with high probability, use the same prediction mode during predictive coding;
the neighboring CU coding modes: selecting coding modes of a left adjacent CU and an upper adjacent CU from the spatial adjacent CU modes, and selecting a mode of a reference CU from the temporal adjacent CU modes;
the reference CU: the region pointed to by the motion vector in the prediction mode is the reference CU;
the pixel standard deviation: blocks with simple texture and large flat areas have a small pixel standard deviation, while blocks with complex texture and intense motion tend to have a large one, so the pixel standard deviation is used to reflect the texture complexity of the CU.
4. The decision tree-based VVC fast encoding method of claim 1, wherein the spatio-temporal neighboring CU mode information, the historic encoded CU mode rate-distortion cost, the quantization parameter and the video resolution include:
the spatio-temporal neighboring CU mode information: if both the adjacent CUs and the parent CU of the CU adopt the MERGE/SKIP mode, and the current CU adopts it as well, the texture of the CU region is likely quite simple, so fine partitioning is unnecessary or the benefit brought by partitioning is quite low;
the rate-distortion costs of historically coded CU modes: the encoder stores the modes of the n most recently coded CUs of the same size together with their rate-distortion costs, and computes their average;
the quantization parameter: used to influence whether the CU is further partitioned;
the video resolution: used to influence the final coding result and to predict whether to terminate the coding flow early.
5. The decision tree-based VVC fast encoding method of claim 1, wherein the horizontal and vertical motion features and the split series comprise:
the horizontal and vertical motion features: computed from the motion vectors of the region in the reference frame pointed to by the current CU's motion vector, and used to describe the motion of objects in the CU;
the split series: used to record the partition mode of the CU.
6. The decision tree-based VVC fast encoding method of claim 1, wherein calculating the CU sub-block pixel standard deviation includes:
δ_k = round( √( 16/(W·H) · Σ_{(i,j)∈B_k} ( x_{i,j} − x̄ )² ) ), wherein δ_k represents the pixel standard deviation of the k-th sub-block, W and H represent the width and height of the CU respectively, x_{i,j} represents the pixel value at coordinates (i, j), x̄ represents the CU pixel mean, and B_k is the set of pixels of the k-th sub-block.
7. The decision tree-based VVC fast encoding method of claim 1, wherein calculating the CU texture gradient direction includes:
calculating a CU texture direction histogram through the Sobel operator, wherein the height of each bin in the histogram represents the sum of the gradient values in that direction, and the direction corresponding to the histogram peak is the main texture direction of the CU.
CN202310331719.2A 2023-03-31 2023-03-31 Decision tree-based VVC (variable valve timing) quick coding method Active CN116320436B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310331719.2A CN116320436B (en) 2023-03-31 2023-03-31 Decision tree-based VVC (variable valve timing) quick coding method

Publications (2)

Publication Number Publication Date
CN116320436A CN116320436A (en) 2023-06-23
CN116320436B true CN116320436B (en) 2023-11-07

Family

ID=86825695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310331719.2A Active CN116320436B (en) 2023-03-31 2023-03-31 Decision tree-based VVC (variable valve timing) quick coding method

Country Status (1)

Country Link
CN (1) CN116320436B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116600119B (en) * 2023-07-18 2023-11-03 腾讯科技(深圳)有限公司 Video encoding method, video decoding method, video encoding device, video decoding device, computer equipment and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
WO2020260228A1 (en) * 2019-06-25 2020-12-30 Interdigital Vc Holdings France, Sas Method and apparatus for coding/decoding picture data
CN112383776A (en) * 2020-12-08 2021-02-19 重庆邮电大学 Method and device for quickly selecting SHVC (scalable video coding) video coding mode
KR102226693B1 (en) * 2019-08-29 2021-03-10 이화여자대학교 산학협력단 Fast motion estimation method and apparatus for the same
CN114222145A (en) * 2021-12-24 2022-03-22 杭州电子科技大学 Low-complexity rapid VVC intra-frame coding method
CN114827606A (en) * 2022-05-26 2022-07-29 广东南方电信规划咨询设计院有限公司 Quick decision-making method for coding unit division

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
KR101353301B1 (en) * 2008-04-11 2014-01-21 에스케이 텔레콤주식회사 Method and Apparatus for Determining Intra Prediction Mode, and Method and Apparatus for Encoding/Decoding Video using Same
US11363299B2 (en) * 2019-12-12 2022-06-14 Panasonic Intellectual Property Corporation Of America Encoding and decoding with merge mode and block partition index
US11558608B2 (en) * 2020-10-28 2023-01-17 Lemon Inc. On split prediction

Non-Patent Citations (3)

Title
Texture-based fast QTMT partition algorithm in VVC intra coding; Qiang Li et al.; Signal, Image and Video Processing; Full text *
Fast algorithm for intra prediction mode decision and coding unit partitioning based on HEVC; Guo Lei, Wang Xiaodong, Xu Bowen, Wang Jian; Computer Applications (04); Full text
Fast intra-frame algorithm based on quality-scalable high-efficiency video coding; Liu Yanjun et al.; Journal of Computer Applications; Full text



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant