CN111479110A - Fast affine motion estimation method for H.266/VVC - Google Patents

Fast affine motion estimation method for H.266/VVC

Info

Publication number
CN111479110A
CN111479110A (application CN202010293694.8A)
Authority
CN
China
Prior art keywords
prediction
motion estimation
uni
affine motion
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010293694.8A
Other languages
Chinese (zh)
Other versions
CN111479110B (en)
Inventor
张秋闻
黄立勋
蒋斌
王祎菡
赵进超
吴庆岗
常化文
王晓
张伟伟
赵永博
崔腾耀
郭睿
孟颍辉
李祖贺
黄伟
甘勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Light Industry Technology Research Institute Co ltd
Zhengzhou University of Light Industry
Original Assignee
Zhengzhou University of Light Industry
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University of Light Industry filed Critical Zhengzhou University of Light Industry
Priority to CN202010293694.8A priority Critical patent/CN111479110B/en
Publication of CN111479110A publication Critical patent/CN111479110A/en
Application granted granted Critical
Publication of CN111479110B publication Critical patent/CN111479110B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/103 Selection of coding mode or of prediction mode
    • H04N 19/109 Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/51 Motion estimation or motion compensation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/51 Motion estimation or motion compensation
    • H04N 19/537 Motion estimation other than block-based
    • H04N 19/543 Motion estimation other than block-based using regions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/51 Motion estimation or motion compensation
    • H04N 19/567 Motion estimation based on rate distortion criteria

Abstract

The invention provides a fast affine motion estimation method for H.266/VVC, which comprises the following steps: calculate the texture complexity of the current CU using the standard deviation, and classify the current CU as a static or non-static area according to that complexity; for a CU in a static area, skip affine motion estimation, predict the current CU directly with conventional motion estimation, and select the best prediction direction mode by rate-distortion optimization; for a CU in a non-static area, classify the current CU with a trained random forest classifier (RFC) model and output the best prediction direction mode. For CUs in static areas, affine motion estimation is skipped, which reduces computational complexity; for CUs in non-static areas, the prediction direction mode is predicted directly by the pre-trained model, avoiding the affine motion estimation calculation and thereby reducing the complexity of the affine motion estimation module.

Description

Fast affine motion estimation method for H.266/VVC
Technical Field
The invention relates to the technical field of image processing, and in particular to a fast affine motion estimation method for H.266/VVC.
Background
In the current information age, demand for video services such as three-dimensional images, ultra-high-definition video and virtual reality keeps growing, and the encoding and transmission of high-definition video has become a hot research topic. With the development and refinement of the H.266/VVC standard, improvements in video processing efficiency also drive the development of the video industry and lay the foundation for the next generation of video coding technology. High-density data poses huge challenges to bandwidth and storage, and the current mainstream video coding standards can no longer satisfy emerging applications, so the new-generation video coding standard H.266/VVC has emerged to meet people's requirements for video definition, fluency and real-time performance. The international standardization organizations ISO/IEC MPEG and ITU-T VCEG established the Joint Video Experts Team (JVET), which is responsible for developing the next-generation video coding standard H.266/Versatile Video Coding (VVC). H.266/VVC targets high-definition video at 4K and above with a bit depth of mainly 10 bits, which differs from the positioning of H.265/HEVC; as a result, the maximum block size of the current encoder becomes 128, pixels are processed internally at 10 bits, and even an 8-bit input sequence is converted to 10-bit processing.
H.266/VVC uses a hybrid coding framework; picture partitioning has evolved from a single, fixed division into diverse and flexible partitioning structures, which adapt more efficiently to encoding and decoding high-resolution pictures. In addition, for new-generation video data, H.266/VVC extends elements of the original H.265/HEVC encoder such as inter/intra prediction, prediction signal filtering, transform, quantization/scaling and entropy coding, takes the characteristics of the new-generation video coding standard into account, and adds new prediction modes. In particular, H.266/VVC adopts the motion estimation, motion compensation and motion vector prediction techniques of H.265/HEVC inter coding and introduces several new techniques on this basis. For example, the Merge mode is extended, history-based motion vector prediction is added, and new prediction methods such as affine transformation, adaptive motion vector precision and 1/16-sample-accuracy motion-compensated prediction are introduced. These advanced coding tools greatly improve the coding efficiency of the new-generation video coding standard H.266/VVC, but the rate-distortion cost calculations also significantly increase the computational complexity of H.266/VVC inter coding and thereby significantly reduce the encoding speed for new-generation video.
The main principle of inter prediction is to find, for each pixel block of the current picture, a best matching block in a previously coded picture; this process is called motion estimation (ME). The picture used for prediction is called the reference picture, the best matching block in the reference picture is the reference block, the displacement from the reference block to the current pixel block is the motion vector (MV), and the difference between the current pixel block and the reference block is the prediction residual. The ME algorithm is the most critical algorithm in H.266/VVC video encoding: it accounts for more than half of the computation and most of the run time of the entire encoder, and it is the dominant factor determining video compression efficiency. Motion estimation effectively removes temporal redundancy between successive pictures and has long been a research focus in video compression. To improve compression efficiency, recent video codecs attempt to estimate motion for blocks of different shapes and sizes; with the addition of the multi-type tree (MTT), ME can be performed on very thin blocks (e.g. blocks whose width is one eighth of their height). Consequently, among the modules operating on the MTT, ME is the coding tool with the highest complexity in VVC. Because the more advanced inter prediction schemes are performed recursively in the finely partitioned blocks of the MTT, and because ME in Future Video Coding (FVC) also tries new techniques such as affine motion estimation, the computational complexity of ME increases even beyond that of HEVC. Affine motion estimation (AME) characterizes non-translational motion such as rotation and zooming and is effective in rate-distortion (RD) performance, at the cost of high encoding complexity. The computational complexity of AME accounts for a large part of the overall ME processing time, so reducing its complexity is very important. Therefore, to reduce the complexity of the VTM encoder, it is desirable to speed up the AME module.
A number of fast mode-decision algorithms have been proposed for H.265/HEVC: an adaptive inter-mode decision algorithm based on pyramid motion divergence; early SKIP mode decision based on statistical analysis; early SKIP mode decision based on the inter-frame correlation of prediction sizes together with mode decision based on the correlation of rate-distortion (RD) costs; methods that decide the coding unit depth in advance or adaptively restrict the candidate modes after calculating the RD cost of 2N×2N; and methods that exploit the high correlation between texture video and depth-map content to determine the coding unit depth level and prediction mode in advance. These methods reduce the computational complexity of HEVC encoding.
For H.266/VVC, complexity-reduction work has mainly exploited the dependencies in the H.266/VVC prediction structure, for example reducing coding complexity by limiting the maximum CU reference-frame search range based on the prediction information of parent nodes. Z. Wang et al. proposed a confidence-interval-based quad-tree plus binary tree (QTBT) partition scheme, which establishes a rate-distortion (RD) model based on a motion divergence field to estimate the RD cost of each partition mode and uses this model to terminate the block partitioning of H.266/VVC coding early, eliminating unnecessary iterative partitioning and achieving a good balance between H.266/VVC coding performance and coding complexity. Other schemes reduce the number of candidate motion vector prediction modes or the motion vector prediction error in order to lower the coding complexity of the current block.
Disclosure of Invention
In view of the deficiencies in the background art, the invention provides a fast affine motion estimation method for H.266/VVC, which solves the technical problem of the high encoding complexity of affine motion estimation (AME) in the VTM.
The technical scheme of the invention is realized as follows:
a fast affine motion estimation method for H.266/VVC comprises the following steps:
S1, calculating the texture complexity SD of the current CU using the standard deviation, and classifying the current CU as a static area or a non-static area according to the texture complexity SD;
S2, for a CU in a static area, skipping affine motion estimation AME, predicting the current CU directly with motion estimation CME, and selecting the best prediction direction mode by rate-distortion optimization;
S3, for a CU in a non-static area, classifying the current CU with the trained random forest classifier RFC model and outputting the best prediction direction mode.
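A minimal Python sketch of this overall decision flow (steps S1 to S3) is given below. It is an illustration only: the helper callables cme_best_direction and extract_features, the sklearn-style predict() call, and the use of the minimum neighbouring SD as the threshold Th_static follow the description in this document but are not taken from any encoder source code.

```python
import numpy as np

def compute_sd(cu: np.ndarray) -> float:
    """Texture complexity of a CU measured as the standard deviation of its pixels (step S1)."""
    return float(np.std(cu))

def fast_affine_direction(cu, neighbour_sds, cme_best_direction, rfc_model, extract_features):
    """Hypothetical top-level decision of steps S1-S3.

    cu                 -- W x H array of luma samples of the current CU
    neighbour_sds      -- standard deviations of already-coded neighbouring blocks (assumption)
    cme_best_direction -- callable returning the best CME direction ('Uni-L0' | 'Uni-L1' | 'Bi')
    rfc_model          -- trained random forest classifier with an sklearn-style predict()
    extract_features   -- callable building the RFC feature vector of the CU
    """
    sd = compute_sd(cu)
    th_static = min(neighbour_sds)      # threshold derived from neighbouring blocks
    if sd < th_static:                  # static area: skip AME entirely (step S2)
        return cme_best_direction(cu)
    # non-static area: let the trained classifier pick the prediction direction (step S3)
    return rfc_model.predict([extract_features(cu)])[0]
```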
The formula for calculating the texture complexity SD of the current CU using the standard deviation is:

SD = sqrt( (1 / (W × H)) · Σ_{a=1}^{W} Σ_{b=1}^{H} ( P(a, b) - μ )² ),   μ = (1 / (W × H)) · Σ_{a=1}^{W} Σ_{b=1}^{H} P(a, b),
where W represents the width of the CU, H represents the height of the CU, and P (a, b) represents the pixel value at position (a, b) in the CU.
The method of predicting the current CU with motion estimation CME and selecting the best prediction direction mode by rate-distortion optimization is as follows:
S21, performing uni-directional prediction Uni-L0 on the current CU first, then uni-directional prediction Uni-L1, and finally bi-directional prediction Bi;
S22, calculating, by rate-distortion optimization, the rate-distortion cost of the current CU under each of uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi in step S21;
S23, taking the prediction mode with the minimum rate-distortion cost as the best prediction direction mode.
The rate-distortion costs of uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi are all obtained by minimizing the rate-distortion cost function over the available reference frames:

J_Uni-Lk = min_{φ ∈ Φ(Lk)} J(φ), k ∈ {0, 1},    J_Bi = min_{φ0 ∈ Φ(L0), φ1 ∈ Φ(L1)} J(φ0, φ1),

where L0 and L1 represent the two reference frame lists (together forming the set of all available reference lists), Φ(L) represents the reference frames in reference list L, and J(·) is the rate-distortion cost function

J(·) = D(·) + λ·R(·),

where D(·) represents the distortion of CU coding, λ represents the Lagrangian multiplier, and R(·) represents the number of bits consumed by CU coding.
The training method of the random forest classifier RFC model in the step S3 comprises the following steps:
S31, selecting Traffic, Kimono, BQSquare, RaceHorseC and FourPeople video sequences at different resolutions from the common test sequences, encoding the first M frames of each on the VTM, and meanwhile recording the shape of the CU, the texture complexity of the CU and the three prediction direction modes of the CU in the VTM as a data set, where the data set comprises a sample set S and a test set T, and the three prediction direction modes comprise uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi;
S32, resampling the sample set S with the Bootstrap method to generate K training sample sets {S_1, S_2, ..., S_K}, and taking each generated training set S_i as a root node to generate the corresponding decision trees {T_1, T_2, ..., T_K}, where i = 1, 2, ..., K denotes the i-th training sample set and K denotes the number of training sample sets;
S33, starting training from the root node: at each internal node of a decision tree, m feature attributes are randomly selected, the Gini index of each feature attribute is calculated, the feature attribute with the minimum Gini index is selected as the optimal splitting attribute of the current node, and the node is split into a left subtree and a right subtree using the minimum Gini index value as the splitting threshold;
S34, repeating step S33 and training K′ times until the K′ decision trees are fully trained, letting each decision tree grow completely without pruning;
S35, the generated decision trees constitute the random forest classifier (RFC) model; the RFC model is used to classify the test set T, the classification result is decided by voting, the category output by the most of the K′ decision trees is taken as the category of the test set T, and the best prediction direction mode of the current CU is obtained.
The method for obtaining the data set in step S31 includes:
S31.1, predicting the video sequence with motion estimation CME;
S31.2, performing affine prediction on the video sequence predicted in step S31.1 with the 4-parameter affine motion model, the affine prediction comprising uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi;
S31.3, performing affine prediction on the video sequence affine-predicted in step S31.2 with the 6-parameter affine motion model;
S31.4, calculating the rate-distortion costs after the affine predictions in steps S31.2 and S31.3 respectively, and taking the prediction mode corresponding to the minimum rate-distortion cost as the prediction direction mode of the video sequence.
The feature attributes comprise the two-dimensional Haar wavelet transform horizontal coefficient, the two-dimensional Haar wavelet transform vertical coefficient, the two-dimensional Haar wavelet transform angle coefficient, the angular second moment, the contrast, the entropy, the inverse difference moment, the minimum sum of absolute differences (SAD) and the gradient.
For the 4-parameter affine motion model, the motion vector of sample position (x, y) in the CU is:

mvx = ((mv1x - mv0x) / W)·x - ((mv1y - mv0y) / W)·y + mv0x,
mvy = ((mv1y - mv0y) / W)·x + ((mv1x - mv0x) / W)·y + mv0y,

where (mv0x, mv0y) is the motion vector of the top-left control point, (mv1x, mv1y) is the motion vector of the top-right control point, and W represents the CU width;
for the 6-parameter affine motion model, the motion vector of sample position (x, y) in the CU is:

mvx = ((mv1x - mv0x) / W)·x + ((mv2x - mv0x) / H)·y + mv0x,
mvy = ((mv1y - mv0y) / W)·x + ((mv2y - mv0y) / H)·y + mv0y,

where (mv2x, mv2y) is the motion vector of the bottom-left control point, and H represents the CU height.
Beneficial effects of this technical solution: the method first uses the standard deviation SD to divide CUs into static and non-static areas. If a CU belongs to a static area, the probability that inter prediction selects the SKIP mode is high, and a static area that tends to select the SKIP mode for inter prediction does not need affine prediction, so the affine motion estimation (AME) module can be terminated early in static areas, and the best direction mode of the current CU is the best direction mode of conventional motion estimation (CME). If the CU belongs to a non-static area, its inter prediction direction mode is determined by the random forest classification model, so the best prediction direction mode is obtained in advance. The invention thus reduces computational complexity and saves encoding time, realizing fast encoding for H.266/VVC.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the present invention;
FIG. 2 is the prediction direction mode complexity distribution of the present invention;
FIG. 3 is a 4-parameter affine model of the present invention;
FIG. 4 is a 6-parameter affine model of the present invention;
FIG. 5 is an overall process diagram of the motion estimation ME of the present invention;
FIG. 6 is a graph of the overall run time comparison of the inventive process and the FAME process.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1, an embodiment of the present invention provides a fast affine motion estimation method for h.266/VVC, which includes the following specific steps:
S1: in the image encoding process, image content in a uniform region is usually encoded with larger CUs, whereas regions rich in detail are usually encoded with smaller CUs. Therefore, the texture complexity of the coding block is used to determine whether the CU uses the SKIP mode for inter prediction: regions with uniform content tend to be inter-predicted with the SKIP mode, while regions rich in detail are unlikely to use it. The variance of a CU reflects the dispersion of the pixel values of the current block, so the texture complexity of a block can be roughly measured by its standard deviation SD. Therefore, the texture complexity SD of the current CU is calculated using the standard deviation, and the current CU is classified as a static area or a non-static area according to the texture complexity SD. The formula for the standard deviation is:

SD = sqrt( (1 / (W × H)) · Σ_{a=1}^{W} Σ_{b=1}^{H} ( P(a, b) - μ )² ),   μ = (1 / (W × H)) · Σ_{a=1}^{W} Σ_{b=1}^{H} P(a, b),

where W represents the width of the CU, H represents the height of the CU, and P(a, b) represents the pixel value at position (a, b) in the CU. Since the texture complexity of neighbouring blocks is correlated with that of the CU, the classification threshold is derived from the texture complexity of the neighbouring blocks. According to a large amount of experimental data, it is reasonable to use the minimum of the standard deviations SD of the blocks adjacent to the CU as Th_static. The CU can then be classified by this threshold: if the current standard deviation SD is less than the threshold Th_static, the current CU is a static area; conversely, if the standard deviation SD is greater than Th_static, the current CU belongs to a non-static area.
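As a small numerical check of the formula and the threshold rule above, the following sketch computes SD with the explicit double sum and classifies a CU; the 4×4 block and the neighbouring SD values are made-up numbers.

```python
import numpy as np

cu = np.array([[100, 102,  99, 101],
               [101, 100, 100, 102],
               [ 99, 101, 100, 100],
               [100, 100, 101,  99]], dtype=float)      # a nearly flat (static) 4x4 CU

W, H = cu.shape
mean = cu.sum() / (W * H)
sd = np.sqrt(((cu - mean) ** 2).sum() / (W * H))         # the double sum of the SD formula
assert np.isclose(sd, np.std(cu))                        # same as the library standard deviation

th_static = min([2.1, 3.4, 2.8])                         # minimum SD of the neighbouring blocks (made up)
area = 'static' if sd < th_static else 'non-static'
print(sd, area)
```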
S2: existing video coding standards (e.g. H.265/HEVC) use motion vectors (MV) that cover only translational motion for conventional motion estimation (CME). Affine motion estimation (AME), however, can predict not only translational motion but also linear-transformation motion such as scaling and rotation, so if the camera zooms or rotates while capturing video, AME in H.266/VVC predicts the motion more accurately than CME. AME proceeds in the same way as CME: first uni-directional prediction Uni-L0, then uni-directional prediction Uni-L1, and finally bi-directional prediction Bi; after the three prediction direction modes are calculated, rate-distortion optimization (RDO) selects the best prediction direction mode. FIG. 2 shows the distribution of the AME complexity among the prediction direction modes Uni-L0, Uni-L1 and Bi, which motivates skipping unnecessary prediction direction modes in order to reduce the AME computation.
To obtain the optimal motion vector MV and the optimal reference frame, the encoder searches the available reference frames and calculates the rate-distortion (RD) cost J(·) with the Lagrangian multiplier method; the RD cost function is

J(·) = D(·) + λ·R(·),

where D(·) represents the distortion of CU coding, λ is the Lagrangian multiplier, and R(·) represents the number of bits consumed by CU coding. Because the two reference frame lists used by uni-directional prediction Uni-L0 and Uni-L1 are both used for motion prediction, the ME process for uni-directional prediction must test both lists over all available frames in the two lists.
For a CU in a static area, affine motion estimation (AME) is skipped and the current CU is predicted directly with conventional motion estimation (CME), the best prediction direction mode being selected by rate-distortion optimization; the specific method is as follows:
S21, the current CU is subjected to uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi;
S22, the rate-distortion costs of uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi are calculated by rate-distortion optimization;
the rate-distortion cost of the unidirectional prediction Uni-L0, the unidirectional prediction Uni-L1 and the bidirectional prediction Bi are respectively as follows:
Figure BDA0002451385330000081
Figure BDA0002451385330000082
wherein the content of the first and second substances,
Figure BDA0002451385330000083
a set of all available reference lists is represented,
Figure BDA0002451385330000084
representing a set of reference lists, L0 and L1 representing two reference frame lists, phi (J) representing the reference frames in the reference lists, and J (-) being a rate-distortion cost function.
S23, the prediction mode with the minimum rate-distortion cost is taken as the best prediction direction mode.
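The selection in steps S21 to S23 can be illustrated with the small sketch below. The distortion and bit values and the Lagrangian multiplier are invented numbers; in the encoder they come from the actual CME predictions of the three direction modes.

```python
def rd_cost(distortion: float, bits: float, lam: float) -> float:
    """Rate-distortion cost J = D + lambda * R (Lagrangian cost)."""
    return distortion + lam * bits

def select_best_direction(costs: dict) -> str:
    """Step S23: the prediction direction mode with the minimum RD cost wins."""
    return min(costs, key=costs.get)

lam = 36.4   # example Lagrangian multiplier (depends on QP; value is illustrative)
costs = {
    'Uni-L0': rd_cost(distortion=5120.0, bits=46, lam=lam),
    'Uni-L1': rd_cost(distortion=5290.0, bits=44, lam=lam),
    'Bi':     rd_cost(distortion=4870.0, bits=78, lam=lam),
}
print(select_best_direction(costs), costs)
```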
S3: a CU in a non-static area does not meet the condition for skipping the affine motion estimation (AME) process, so the current CU is classified with the trained random forest classifier (RFC) model and the best prediction direction mode is output, further reducing the computational complexity. The random forest algorithm generates K bootstrap sample sets by Bootstrap resampling, and the data of each sample set grows into a decision tree. At each tree node, m (m << M′) features are randomly drawn from the M′ feature attributes based on the random subspace method (RSM), and according to the node-splitting algorithm the optimal attribute is selected from the m feature attributes for branch growth. Finally, the K′ decision trees are combined for mode voting. After the random forest classifier is generated, the RFC model is tested: each tree in the forest makes an independent classification decision, and the category with the most identical decisions is taken as the final classification, expressed by the formula

H(t) = argmax_Y Σ_{i=1}^{K′} I( h_i(t) = Y ),

where H(t) represents the combined classification model, h_i(t) is a single classification tree model, t represents the feature attributes of the decision tree, Y represents the output variable, and I(·) is the indicator function (its value is 1 when a classification result equals Y, and 0 otherwise).
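The voting rule H(t) can be illustrated in a few lines of Python; the tree outputs listed here are hypothetical.

```python
from collections import Counter

def forest_vote(tree_outputs):
    """H(t) = argmax_Y sum_i I(h_i(t) = Y): the class predicted by the most trees wins."""
    return Counter(tree_outputs).most_common(1)[0][0]

print(forest_vote(['Bi', 'Uni-L0', 'Bi', 'Bi', 'Uni-L1']))  # -> 'Bi'
```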
When a CU is traversed, the features of the CU and its prediction direction mode are recorded without interfering with the normal encoding process. Resampling with the Bagging ensemble method generates multiple training sets: samples are drawn randomly and with replacement from the original training sample set, and repeating this sampling yields K new training sample sets. After the samples are drawn, the training module of the random forest classifier model is entered. Table 1 shows the training parameter settings of the random forest classifier (RFC) model.
TABLE 1 training parameter configuration
According to the parameters in Table 1, the training method of the random forest classifier RFC model is as follows:
S31, selecting the sample set: video sequences covering rich texture complexity are chosen from the common test sequences (Traffic, Kimono, BQSquare, RaceHorseC and FourPeople at different resolutions), and the first 50 frames of each sequence are encoded on the VTM; meanwhile the shape of the CU, the texture complexity of the CU and the three prediction direction modes of the CU in the VTM are recorded as a data set, where the data set comprises a sample set S (S = 20) and a test set T (T = 30), and the three prediction direction modes comprise uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi;
In the VTM, affine motion blocks are also predicted in three ways, namely uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi, and affine prediction additionally uses 4-parameter and 6-parameter affine models. The uni-directional or bi-directional prediction of the affine motion estimation (AME) module requires the corresponding reference frames, which increases the encoding complexity of the VTM; counting only the number of reference frames required, the AME process needs twice as many reference frames as the CME process. The whole motion estimation (ME) process is shown in FIG. 5, from which the method for obtaining the data set in step S31 is as follows:
S31.1, predicting the video sequence with conventional motion estimation (CME), the prediction method being the same as in step S21;
S31.2, performing affine prediction on the video sequence predicted in step S31.1 with the 4-parameter affine motion model, the affine prediction comprising uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi;
As shown in FIG. 3, the motion vector of sample position (x, y) in the CU for the 4-parameter affine motion model is:

mvx = ((mv1x - mv0x) / W)·x - ((mv1y - mv0y) / W)·y + mv0x,
mvy = ((mv1y - mv0y) / W)·x + ((mv1x - mv0x) / W)·y + mv0y,

where (mv0x, mv0y) is the motion vector of the top-left control point, (mv1x, mv1y) is the motion vector of the top-right control point, and W represents the CU width;
S31.3, performing affine prediction on the video sequence affine-predicted in step S31.2 with the 6-parameter affine motion model;
as shown in FIG. 4, the motion vector of sample position (x, y) in the block for the 6-parameter affine motion model is:

mvx = ((mv1x - mv0x) / W)·x + ((mv2x - mv0x) / H)·y + mv0x,
mvy = ((mv1y - mv0y) / W)·x + ((mv2y - mv0y) / H)·y + mv0y,

where (mv2x, mv2y) is the motion vector of the bottom-left control point, and H represents the CU height.
S31.4, calculating the rate-distortion costs after the affine predictions in steps S31.2 and S31.3 respectively, and taking the prediction mode corresponding to the minimum rate-distortion cost as the prediction direction mode of the video sequence.
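A small sketch of the two affine models follows; it evaluates the per-sample motion vector equations given above (the standard 4- and 6-parameter affine formulations), with made-up control-point motion vectors. The function names are our own.

```python
def affine_mv_4param(x, y, mv0, mv1, W):
    """Per-sample motion vector of the 4-parameter affine model.
    mv0 = (mv0x, mv0y) top-left control point, mv1 = (mv1x, mv1y) top-right control point."""
    mv0x, mv0y = mv0
    mv1x, mv1y = mv1
    mvx = (mv1x - mv0x) / W * x - (mv1y - mv0y) / W * y + mv0x
    mvy = (mv1y - mv0y) / W * x + (mv1x - mv0x) / W * y + mv0y
    return mvx, mvy

def affine_mv_6param(x, y, mv0, mv1, mv2, W, H):
    """Per-sample motion vector of the 6-parameter affine model.
    mv2 = (mv2x, mv2y) is the bottom-left control point."""
    mv0x, mv0y = mv0
    mv1x, mv1y = mv1
    mv2x, mv2y = mv2
    mvx = (mv1x - mv0x) / W * x + (mv2x - mv0x) / H * y + mv0x
    mvy = (mv1y - mv0y) / W * x + (mv2y - mv0y) / H * y + mv0y
    return mvx, mvy

# e.g. the MV of the sample at (8, 4) in a 16x16 CU (control-point MVs are invented numbers)
print(affine_mv_4param(8, 4, mv0=(1.0, 0.5), mv1=(2.0, 0.0), W=16))
print(affine_mv_6param(8, 4, mv0=(1.0, 0.5), mv1=(2.0, 0.0), mv2=(1.5, 1.5), W=16, H=16))
```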
S32, resampling the sample set S with the Bootstrap method to generate K training sample sets {S_1, S_2, ..., S_K}, and taking each generated training set S_i as a root node to generate the corresponding decision trees {T_1, T_2, ..., T_K}, where i = 1, 2, ..., K denotes the i-th training sample set and K denotes the number of training sample sets;
S33, starting training from the root node: at each internal node of a decision tree, m feature attributes are randomly selected, the Gini index of each feature attribute is calculated, the feature attribute with the minimum Gini index is selected as the optimal splitting attribute of the current node, and the node is split into a left subtree and a right subtree using the minimum Gini index value as the splitting threshold;
the feature attributes selected by the present invention include two-dimensional Haar wavelet transform horizontal coefficients (2D Haar wavelet transform vertical coefficients, L H), two-dimensional Haar wavelet transform angle coefficients (2D Haar wavelet transform angle coefficients, HH), angular second moments (Absolute differences, ASM), contrast (contrast, entropy), entropy Difference (Difference, Difference of entropy), and minimum Difference of the feature vectors (SAD) as the feature vectors of the random forest classifier model:
the two-dimensional haar wavelet transform horizontal coefficient H L of the image represents the texture in the horizontal direction of the image, the larger the value is, the richer the texture in the horizontal direction is, the smaller the value is, the flatter the texture in the horizontal direction is, the two-dimensional haar wavelet transform vertical coefficient L H of the image represents the texture in the vertical direction of the image, the larger the value is, the richer the texture in the vertical direction is, the smaller the value is, the flatter the texture in the vertical direction is, the two-dimensional haar wavelet transform angle coefficient HH of the image represents the texture in the vertical direction of the image, the larger the value is, the richer the texture in the 45 degree direction is, the smaller the value is, the two-dimensional haar wavelet transform horizontal coefficient H L, the two-dimensional haar wavelet transform vertical coefficient L H and the two-dimensional ha:
HL, LH and HH are each computed from the pixel values P(a, b) of the W×H CU via the two-dimensional Haar wavelet transform,
where W represents the width of the CU, H represents the height of the CU, and P (a, b) represents the pixel value at position (a, b).
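Because the defining sums for HL, LH and HH are not reproduced here, the sketch below uses a standard single-level 2D Haar decomposition (PyWavelets) and summarizes each detail sub-band by its mean absolute value; both the mapping of the sub-bands to HL/LH/HH and the mean-absolute-value aggregation are our assumptions, not the patent's exact definition.

```python
import numpy as np
import pywt  # PyWavelets

def haar_texture_features(cu: np.ndarray):
    """Scalar HL, LH and HH texture descriptors of a CU from a single-level 2D Haar transform.

    pywt.dwt2 returns (cA, (cH, cV, cD)): horizontal, vertical and diagonal detail bands.
    """
    _, (cH, cV, cD) = pywt.dwt2(cu.astype(float), 'haar')
    hl = float(np.mean(np.abs(cH)))   # mean |horizontal detail| (assumed to play the role of HL)
    lh = float(np.mean(np.abs(cV)))   # mean |vertical detail|   (assumed to play the role of LH)
    hh = float(np.mean(np.abs(cD)))   # mean |diagonal detail|   (assumed to play the role of HH)
    return hl, lh, hh

cu = np.random.randint(0, 256, size=(16, 16))
print(haar_texture_features(cu))
```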
The angular second moment ASM reflects the uniformity of the grey-level distribution and the coarseness of the texture: the larger the value, the more uniform the texture distribution of the image. The contrast CON reflects the depth of the image texture: the larger the value, the deeper the texture. The entropy ENT represents the amount of information in the image: the larger the value, the more information the image carries. The inverse difference moment IDM reflects the magnitude of local texture variation: a large value indicates that the texture in different regions of the image is relatively uniform and varies slowly. The angular second moment ASM, contrast CON, entropy ENT and inverse difference moment IDM are respectively expressed as:
ASM = Σ_i Σ_j p(i, j)²,
CON = Σ_i Σ_j (i - j)² · p(i, j),
ENT = - Σ_i Σ_j p(i, j) · log p(i, j),
IDM = Σ_i Σ_j p(i, j) / (1 + (i - j)²),

where p(i, j) denotes the normalized grey-level co-occurrence matrix of the CU.
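A sketch of these four texture features computed from a grey-level co-occurrence matrix (GLCM) follows; the quantization to 32 grey levels and the single (distance 1, angle 0) offset are illustrative choices, not taken from the patent.

```python
import numpy as np
from skimage.feature import graycomatrix

def glcm_features(cu: np.ndarray, levels: int = 32):
    """ASM, contrast, entropy and inverse difference moment of a CU from its GLCM.

    The quantization to `levels` grey levels and the (distance=1, angle=0) offset are
    illustrative assumptions.
    """
    q = (cu.astype(np.float64) / 256.0 * levels).astype(np.uint8)      # quantize 8-bit samples
    glcm = graycomatrix(q, distances=[1], angles=[0], levels=levels,
                        symmetric=True, normed=True)[:, :, 0, 0]
    i, j = np.indices(glcm.shape)
    asm = float(np.sum(glcm ** 2))                                     # angular second moment
    con = float(np.sum((i - j) ** 2 * glcm))                           # contrast
    ent = float(-np.sum(glcm[glcm > 0] * np.log2(glcm[glcm > 0])))     # entropy (base 2 here)
    idm = float(np.sum(glcm / (1.0 + (i - j) ** 2)))                   # inverse difference moment
    return asm, con, ent, idm

cu = np.random.randint(0, 256, size=(16, 16))
print(glcm_features(cu))
```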
In block-matching-based motion estimation algorithms there are many decision criteria for the best matching block; here the sum of absolute differences (SAD) is used, and a smaller SAD indicates that the reference block is closer to the current prediction block. SAD is expressed as:

SAD(i′, j′) = Σ_{a=1}^{W} Σ_{b=1}^{H} | P_k(a, b) - P_{k-1}(a + i′, b + j′) |,

where P_k(a, b) represents the value of the current pixel, (a, b) represents the coordinates of the current pixel, P_{k-1}(a + i′, b + j′) is the reference pixel value, and (a + i′, b + j′) represents the coordinates of the reference pixel.
The gradient represents the texture direction of the CU; the gradients of the luminance samples in the horizontal and vertical directions are used as feature attributes and are expressed as:
Gx(a,b)=P(a+1,b)-P(a,b)+P(a+1,b+1)-P(a,b+1),
Gy(a,b)=P(a,b)-P(a,b+1)+P(a+1,b)-P(a+1,b+1),
where Gx(a, b) and Gy(a, b) respectively represent the gradient components of the current pixel in the horizontal and vertical directions, (a, b) represents the coordinates of the pixel, and P(a, b) represents the pixel value.
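The SAD and gradient features can be sketched as below; summarizing the gradient components by their mean absolute values over the CU is our assumption, and the sample blocks are random placeholders.

```python
import numpy as np

def sad(cur: np.ndarray, ref: np.ndarray) -> float:
    """Sum of absolute differences between the current block and a candidate reference block."""
    return float(np.sum(np.abs(cur.astype(np.int64) - ref.astype(np.int64))))

def gradient_features(cu: np.ndarray):
    """Gx, Gy per the formulas above (axis 0 is the a index, axis 1 is the b index),
    summarized over the CU by their mean absolute values (aggregation is an assumption)."""
    p = cu.astype(np.int64)
    gx = p[1:, :-1] - p[:-1, :-1] + p[1:, 1:] - p[:-1, 1:]   # P(a+1,b)-P(a,b)+P(a+1,b+1)-P(a,b+1)
    gy = p[:-1, :-1] - p[:-1, 1:] + p[1:, :-1] - p[1:, 1:]   # P(a,b)-P(a,b+1)+P(a+1,b)-P(a+1,b+1)
    return float(np.mean(np.abs(gx))), float(np.mean(np.abs(gy)))

cur = np.random.randint(0, 256, size=(8, 8))
ref = np.random.randint(0, 256, size=(8, 8))
print(sad(cur, ref), gradient_features(cur))
```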
S34, repeating the step S33, training K '25 times until the training of K' decision trees is completed, and enabling each decision tree to grow completely without pruning;
s35, the generated multiple decision trees are random forest classifier RFC models, the random forest classifier RFC models are used for distinguishing and classifying the test set T, the classification result adopts a voting mode, the most categories output by the K' decision trees are used as the categories of the test set T, the best prediction direction mode of the current CU is obtained, and the calculation complexity of the affine motion estimation AME module is reduced.
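A sketch of steps S32 to S35 with scikit-learn's RandomForestClassifier follows. The number of trees K′ = 25, the Gini splitting criterion, the random feature subset per split and the unpruned trees follow the text above; the synthetic training data and the assumed 13-dimensional feature layout are placeholders for the recorded CU features and their best prediction direction modes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Assumed feature order: CU width, CU height, SD, HL, LH, HH, ASM, CON, ENT, IDM, SAD, Gx, Gy
N_FEATURES = 13
MODES = ['Uni-L0', 'Uni-L1', 'Bi']

rng = np.random.default_rng(0)
X_train = rng.random((500, N_FEATURES))        # stand-in for recorded CU feature vectors
y_train = rng.choice(MODES, size=500)          # stand-in for the recorded best direction modes

rfc = RandomForestClassifier(
    n_estimators=25,        # K' = 25 decision trees, as in step S34
    max_features='sqrt',    # m randomly chosen attributes per split (random subspace method)
    criterion='gini',       # split on the minimum Gini index, as in step S33
    max_depth=None,         # grow each tree fully, without pruning
)
rfc.fit(X_train, y_train)

# Classification of a CU: sklearn aggregates the trees' votes inside predict().
cu_features = rng.random((1, N_FEATURES))
print(rfc.predict(cu_features)[0])
```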
To evaluate the method of the present invention, simulation tests were performed on the latest H.266/VVC encoder (VTM 7.0). The test video sequences are encoded in the "Random Access" configuration using default parameters. The BDBR reflects the compression performance of the invention, and the reduction in time represents the reduction in complexity. Table 2 shows the coding characteristics of the invention: on average, the total encoding time is reduced to 87% and the affine motion estimation (AME) time is reduced to 56%. The invention therefore saves encoding time effectively, and the loss in RD performance is negligible.
TABLE 2 encoding characteristics of the invention
Table 2 shows the RD performance and the encoding run time saved by the invention compared with the VTM. The experimental results may fluctuate for different test videos, but the proposed method remains effective: compared with the VTM, the invention effectively reduces the complexity of the affine motion estimation (AME) module while keeping good RD performance.
The affine motion estimation (AME) module time is measured for different quantization parameters (QPs). When the quantization parameter QP is 22, as can be seen from FIG. 6, the AME module time over all video sequences amounts to about 36 hours, whereas with the method of the invention the AME module time is reduced by about 9 hours; the trend is similar under the other quantization parameters. FIG. 6 thus shows intuitively that the proposed method reduces the encoding time of the AME module and thereby reduces the computational complexity.
The technical solution of the invention has been described in detail with reference to the drawings. The invention provides a fast affine motion estimation method for H.266/VVC, which effectively reduces the encoding complexity of affine motion estimation (AME) in the VTM. First, the standard deviation SD is used to divide CUs into static and non-static areas. If a CU belongs to a static area, the probability that inter prediction selects the SKIP mode is high, and such an area does not need affine prediction, so the AME module can be terminated early and the best direction mode of the current CU is the best direction mode of conventional motion estimation (CME). If the CU belongs to a non-static area, its inter prediction direction mode is determined by the random forest classification model, and the best prediction direction mode is obtained in advance. The invention thus reduces computational complexity and saves encoding time, realizing fast encoding for H.266/VVC.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (8)

1. A fast affine motion estimation method for H.266/VVC is characterized by comprising the following steps:
S1, calculating the texture complexity SD of the current CU using the standard deviation, and classifying the current CU as a static area or a non-static area according to the texture complexity SD;
S2, for a CU in a static area, skipping affine motion estimation AME, predicting the current CU directly with motion estimation CME, and selecting the best prediction direction mode by rate-distortion optimization;
S3, for a CU in a non-static area, classifying the current CU with the trained random forest classifier RFC model and outputting the best prediction direction mode.
2. The fast affine motion estimation method for H.266/VVC according to claim 1, wherein the texture complexity SD of the current CU is calculated using the standard deviation as:

SD = sqrt( (1 / (W × H)) · Σ_{a=1}^{W} Σ_{b=1}^{H} ( P(a, b) - μ )² ),   μ = (1 / (W × H)) · Σ_{a=1}^{W} Σ_{b=1}^{H} P(a, b),
where W represents the width of the CU, H represents the height of the CU, and P (a, b) represents the pixel value at position (a, b) in the CU.
3. The fast affine motion estimation method for H.266/VVC as claimed in claim 1, wherein the method of predicting the current CU with motion estimation CME and selecting the best prediction direction mode by rate-distortion optimization is:
S21, performing uni-directional prediction Uni-L0 on the current CU first, then uni-directional prediction Uni-L1, and finally bi-directional prediction Bi;
S22, calculating, by rate-distortion optimization, the rate-distortion cost of the current CU under each of uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi in step S21;
S23, taking the prediction mode with the minimum rate-distortion cost as the best prediction direction mode.
4. The fast affine motion estimation method for H.266/VVC as claimed in claim 3, wherein the rate-distortion costs of uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi are all obtained by minimizing the rate-distortion cost function over the available reference frames:

J_Uni-Lk = min_{φ ∈ Φ(Lk)} J(φ), k ∈ {0, 1},    J_Bi = min_{φ0 ∈ Φ(L0), φ1 ∈ Φ(L1)} J(φ0, φ1),

where L0 and L1 represent the two reference frame lists (together forming the set of all available reference lists), Φ(L) represents the reference frames in reference list L, and J(·) is the rate-distortion cost function

J(·) = D(·) + λ·R(·),

where D(·) represents the distortion of CU coding, λ represents the Lagrangian multiplier, and R(·) represents the number of bits consumed by CU coding.
5. The fast affine motion estimation method for H.266/VVC according to claim 1, wherein the training method of the random forest classifier RFC model in step S3 is as follows:
S31, selecting Traffic, Kimono, BQSquare, RaceHorseC and FourPeople video sequences at different resolutions from the common test sequences, encoding the first M frames of each on the VTM, and meanwhile recording the shape of the CU, the texture complexity of the CU and the three prediction direction modes of the CU in the VTM as a data set, where the data set comprises a sample set S and a test set T, and the three prediction direction modes comprise uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi;
S32, resampling the sample set S with the Bootstrap method to generate K training sample sets {S_1, S_2, ..., S_K}, and taking each generated training set S_i as a root node to generate the corresponding decision trees {T_1, T_2, ..., T_K}, where i = 1, 2, ..., K denotes the i-th training sample set and K denotes the number of training sample sets;
S33, starting training from the root node: at each internal node of a decision tree, m feature attributes are randomly selected, the Gini index of each feature attribute is calculated, the feature attribute with the minimum Gini index is selected as the optimal splitting attribute of the current node, and the node is split into a left subtree and a right subtree using the minimum Gini index value as the splitting threshold;
S34, repeating step S33 and training K′ times until the K′ decision trees are fully trained, letting each decision tree grow completely without pruning;
S35, the generated decision trees constitute the random forest classifier (RFC) model; the RFC model is used to classify the test set T, the classification result is decided by voting, the category output by the most of the K′ decision trees is taken as the category of the test set T, and the best prediction direction mode of the current CU is obtained.
6. The fast affine motion estimation method for H.266/VVC as claimed in claim 5, wherein the method of obtaining the data set in step S31 is:
S31.1, predicting the video sequence with motion estimation CME;
S31.2, performing affine prediction on the video sequence predicted in step S31.1 with the 4-parameter affine motion model, the affine prediction comprising uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi;
S31.3, performing affine prediction on the video sequence affine-predicted in step S31.2 with the 6-parameter affine motion model;
S31.4, calculating the rate-distortion costs after the affine predictions in steps S31.2 and S31.3 respectively, and taking the prediction mode corresponding to the minimum rate-distortion cost as the prediction direction mode of the video sequence.
7. The method of claim 5 for fast affine motion estimation for H.266/VVC, characterized in that the feature attributes comprise the two-dimensional Haar wavelet transform horizontal coefficient, the two-dimensional Haar wavelet transform vertical coefficient, the two-dimensional Haar wavelet transform angle coefficient, the angular second moment, the contrast, the entropy, the inverse difference moment, the minimum sum of absolute differences (SAD) and the gradient.
8. The fast affine motion estimation method for H.266/VVC as claimed in claim 6, wherein the motion vector of sample position (x, y) in the CU for the 4-parameter affine motion model is:

mvx = ((mv1x - mv0x) / W)·x - ((mv1y - mv0y) / W)·y + mv0x,
mvy = ((mv1y - mv0y) / W)·x + ((mv1x - mv0x) / W)·y + mv0y,

where (mv0x, mv0y) is the motion vector of the top-left control point, (mv1x, mv1y) is the motion vector of the top-right control point, and W represents the CU width;
for the 6-parameter affine motion model, the motion vector of sample position (x, y) in the CU is:

mvx = ((mv1x - mv0x) / W)·x + ((mv2x - mv0x) / H)·y + mv0x,
mvy = ((mv1y - mv0y) / W)·x + ((mv2y - mv0y) / H)·y + mv0y,

where (mv2x, mv2y) is the motion vector of the bottom-left control point, and H represents the CU height.
CN202010293694.8A 2020-04-15 2020-04-15 Fast affine motion estimation method for H.266/VVC Active CN111479110B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010293694.8A CN111479110B (en) 2020-04-15 2020-04-15 Fast affine motion estimation method for H.266/VVC

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010293694.8A CN111479110B (en) 2020-04-15 2020-04-15 Fast affine motion estimation method for H.266/VVC

Publications (2)

Publication Number Publication Date
CN111479110A true CN111479110A (en) 2020-07-31
CN111479110B CN111479110B (en) 2022-12-13

Family

ID=71752555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010293694.8A Active CN111479110B (en) 2020-04-15 2020-04-15 Fast affine motion estimation method for H.266/VVC

Country Status (1)

Country Link
CN (1) CN111479110B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1934871A (en) * 2003-08-25 2007-03-21 新加坡科技研究局 Mode decision for inter prediction in video coding
CN104320658A (en) * 2014-10-20 2015-01-28 南京邮电大学 HEVC (High Efficiency Video Coding) fast encoding method
WO2018124332A1 (en) * 2016-12-28 2018-07-05 엘지전자(주) Intra prediction mode-based image processing method, and apparatus therefor
CN110213584A (en) * 2019-07-03 2019-09-06 北京电子工程总体研究所 Coding unit classification method and coding unit sorting device based on Texture complication

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wu Qinggang et al.: "Green apple image segmentation based on random forest and multi-feature fusion", Journal of Xinyang Normal University (Natural Science Edition) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112689146A (en) * 2020-12-18 2021-04-20 重庆邮电大学 Heuristic learning-based VVC intra-frame prediction rapid mode selection method
CN112911308A (en) * 2021-02-01 2021-06-04 重庆邮电大学 H.266/VVC fast motion estimation method and storage medium
CN112911308B (en) * 2021-02-01 2022-07-01 重庆邮电大学 H.266/VVC fast motion estimation method and storage medium
CN113225552A (en) * 2021-05-12 2021-08-06 天津大学 Intelligent rapid interframe coding method
CN113225552B (en) * 2021-05-12 2022-04-29 天津大学 Intelligent rapid interframe coding method
CN113630601A (en) * 2021-06-29 2021-11-09 杭州未名信科科技有限公司 Affine motion estimation method, device, equipment and storage medium
CN113630601B (en) * 2021-06-29 2024-04-02 杭州未名信科科技有限公司 Affine motion estimation method, affine motion estimation device, affine motion estimation equipment and storage medium
CN115278260A (en) * 2022-07-15 2022-11-01 重庆邮电大学 VVC (variable valve control) rapid CU (CU) dividing method based on space-time domain characteristics and storage medium

Also Published As

Publication number Publication date
CN111479110B (en) 2022-12-13

Similar Documents

Publication Publication Date Title
CN111479110B (en) Fast affine motion estimation method for H.266/VVC
Chen et al. Learning for video compression
US9781443B2 (en) Motion vector encoding/decoding method and device and image encoding/decoding method and device using same
US9326002B2 (en) Method and an apparatus for decoding a video
CN108989802B (en) HEVC video stream quality estimation method and system by utilizing inter-frame relation
CN106170093B (en) Intra-frame prediction performance improving coding method
JP4429968B2 (en) System and method for increasing SVC compression ratio
WO2000033580A1 (en) Improved motion estimation and block matching pattern
JP2007503784A (en) Hybrid video compression method
CN110177282B (en) Interframe prediction method based on SRCNN
CN104811728B (en) A kind of method for searching motion of video content adaptive
KR100597397B1 (en) Method For Encording Moving Picture Using Fast Motion Estimation Algorithm, And Apparatus For The Same
Yang et al. Generalized rate-distortion optimization for motion-compensated video coders
CN114286093A (en) Rapid video coding method based on deep neural network
JP4417054B2 (en) Motion estimation method and apparatus referring to discrete cosine transform coefficient
CN112291562B (en) Fast CU partition and intra mode decision method for H.266/VVC
CN111586405B (en) Prediction mode rapid selection method based on ALF filtering in multifunctional video coding
CN101331773A (en) Two pass rate control techniques for video coding using rate-distortion characteristics
Yilmaz et al. End-to-end rate-distortion optimization for bi-directional learned video compression
CN113810715B (en) Video compression reference image generation method based on cavity convolutional neural network
CN110581993A (en) Coding unit rapid partitioning method based on intra-frame coding in multipurpose coding
CN106878754A (en) A kind of 3D video depths image method for choosing frame inner forecast mode
Bachu et al. Adaptive order search and tangent-weighted trade-off for motion estimation in H. 264
JP4490351B2 (en) Inter-layer prediction processing method, inter-layer prediction processing apparatus, inter-layer prediction processing program, and recording medium therefor
CN113822801A (en) Compressed video super-resolution reconstruction method based on multi-branch convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221118

Address after: Floor 20-23, block a, Ximei building, no.6, Changchun Road, high tech Industrial Development Zone, Zhengzhou City, Henan Province, 450000

Applicant after: Zhengzhou Light Industry Technology Research Institute Co.,Ltd.

Applicant after: Zhengzhou University of light industry

Address before: 450002 No. 5 Dongfeng Road, Jinshui District, Henan, Zhengzhou

Applicant before: Zhengzhou University of light industry

GR01 Patent grant