CN111479110A - Fast affine motion estimation method for H.266/VVC - Google Patents

Fast affine motion estimation method for H.266/VVC

Info

Publication number
CN111479110A
CN111479110A (application CN202010293694.8A)
Authority
CN
China
Prior art keywords
prediction
motion estimation
uni
affine motion
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010293694.8A
Other languages
Chinese (zh)
Other versions
CN111479110B (en)
Inventor
张秋闻
黄立勋
蒋斌
王祎菡
赵进超
吴庆岗
常化文
王晓
张伟伟
赵永博
崔腾耀
郭睿
孟颍辉
李祖贺
黄伟
甘勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Light Industry Technology Research Institute Co ltd
Zhengzhou University of Light Industry
Original Assignee
Zhengzhou University of Light Industry
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University of Light Industry filed Critical Zhengzhou University of Light Industry
Priority to CN202010293694.8A priority Critical patent/CN111479110B/en
Publication of CN111479110A publication Critical patent/CN111479110A/en
Application granted granted Critical
Publication of CN111479110B publication Critical patent/CN111479110B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/103 Selection of coding mode or of prediction mode
    • H04N 19/109 Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/51 Motion estimation or motion compensation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/51 Motion estimation or motion compensation
    • H04N 19/537 Motion estimation other than block-based
    • H04N 19/543 Motion estimation other than block-based using regions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/51 Motion estimation or motion compensation
    • H04N 19/567 Motion estimation based on rate distortion criteria

Abstract

The invention provides a fast affine motion estimation method for H.266/VVC, which comprises the following steps: calculate the texture complexity of the current CU using the standard deviation, and classify the current CU as a static or non-static area according to that complexity; for a CU in a static area, skip affine motion estimation, predict the current CU directly with conventional motion estimation, and select the best prediction direction mode by rate-distortion optimization; for a CU in a non-static area, classify the current CU with a trained random forest classifier (RFC) model and output the best prediction direction mode. For CUs in static areas, affine motion estimation is skipped, which reduces computational complexity; for CUs in non-static areas, the prediction direction mode is predicted directly by the pre-trained model, avoiding the affine motion estimation calculation and thereby reducing the complexity of the affine motion estimation module.

Description

Fast affine motion estimation method for H.266/VVC
Technical Field
The invention relates to the technical field of image processing, and in particular to a fast affine motion estimation method for H.266/VVC.
Background
In the current information age, demand for video services such as three-dimensional images, ultra-high-definition video and virtual reality keeps growing, and the encoding and transmission of high-definition video has become a hot research topic. With the development and refinement of the H.266/VVC standard, improvements in video processing efficiency also drive the development of the video industry and lay the foundation for the next generation of video coding technology. High-density data poses huge challenges to bandwidth and storage, and the current mainstream video coding standards can no longer satisfy emerging applications, so the new-generation video coding standard H.266/VVC has emerged to meet people's requirements for video definition, fluency and real-time performance. The international standardization organizations ISO/IEC MPEG and ITU-T VCEG established the Joint Video Experts Team (JVET), which is responsible for developing the next-generation video coding standard H.266/Versatile Video Coding (VVC). H.266/VVC targets high-definition video at 4K and above with a bit depth of mainly 10 bits, which differs from the positioning of H.265/HEVC; as a result, the maximum block size of the current encoder becomes 128, pixels are processed internally at 10 bits, and even an 8-bit input sequence is converted to 10-bit processing.
H.266/VVC uses a hybrid coding framework; picture partitioning has evolved from a single, fixed division into diverse and flexible partitioning structures, which adapt more efficiently to encoding and decoding high-resolution pictures. In addition, for new-generation video data, H.266/VVC extends elements of the original H.265/HEVC encoder such as inter/intra prediction, prediction signal filtering, transform, quantization/scaling and entropy coding, takes the characteristics of the new-generation video coding standard into account, and adds new prediction modes. In particular, H.266/VVC adopts the motion estimation, motion compensation and motion vector prediction techniques of H.265/HEVC inter coding and introduces several new techniques on this basis. For example, the Merge mode is extended, history-based motion vector prediction is added, and new prediction methods such as affine transformation, adaptive motion vector precision and 1/16-sample-accuracy motion-compensated prediction are introduced. These advanced coding tools greatly improve the coding efficiency of the new-generation video coding standard H.266/VVC, but the rate-distortion cost calculations also significantly increase the computational complexity of H.266/VVC inter coding and thereby significantly reduce the encoding speed for new-generation video.
The main principle of inter prediction is to find, for each pixel block of the current picture, a best matching block in a previously coded picture; this process is called motion estimation (ME). The picture used for prediction is called the reference picture, the best matching block in the reference picture is the reference block, the displacement from the reference block to the current pixel block is the motion vector (MV), and the difference between the current pixel block and the reference block is the prediction residual. The ME algorithm is the most critical algorithm in H.266/VVC video encoding: it accounts for more than half of the computation and most of the run time of the entire encoder, and it is the dominant factor determining video compression efficiency. Motion estimation effectively removes temporal redundancy between successive pictures and has long been a research focus in video compression. To improve compression efficiency, recent video codecs attempt to estimate motion for blocks of different shapes and sizes; with the addition of the multi-type tree (MTT), ME can be performed on very thin blocks (e.g. blocks whose width is one eighth of their height). Consequently, among the modules operating on the MTT, ME is the coding tool with the highest complexity in VVC. Because the more advanced inter prediction schemes are performed recursively in the finely partitioned blocks of the MTT, and because ME in Future Video Coding (FVC) also tries new techniques such as affine motion estimation, the computational complexity of ME increases even beyond that of HEVC. Affine motion estimation (AME) characterizes non-translational motion such as rotation and zooming and is effective in rate-distortion (RD) performance, at the cost of high encoding complexity. The computational complexity of AME accounts for a large part of the overall ME processing time, so reducing its complexity is very important. Therefore, to reduce the complexity of the VTM encoder, it is desirable to speed up the AME module.
A number of fast mode-decision algorithms have been proposed for H.265/HEVC: an adaptive inter-mode decision algorithm based on pyramid motion divergence; early SKIP mode decision based on statistical analysis; early SKIP mode decision based on the inter-frame correlation of prediction sizes together with mode decision based on the correlation of rate-distortion (RD) costs; methods that decide the coding unit depth in advance or adaptively restrict the candidate modes after calculating the RD cost of 2N×2N; and methods that exploit the high correlation between texture video and depth-map content to determine the coding unit depth level and prediction mode in advance. These methods reduce the computational complexity of HEVC encoding.
For H.266/VVC, complexity-reduction work has mainly exploited the dependencies in the H.266/VVC prediction structure, for example reducing coding complexity by limiting the maximum CU reference-frame search range based on the prediction information of parent nodes. Z. Wang et al. proposed a confidence-interval-based quad-tree plus binary tree (QTBT) partition scheme, which establishes a rate-distortion (RD) model based on a motion divergence field to estimate the RD cost of each partition mode and uses this model to terminate the block partitioning of H.266/VVC coding early, eliminating unnecessary iterative partitioning and achieving a good balance between H.266/VVC coding performance and coding complexity. Other schemes reduce the number of candidate motion vector prediction modes or the motion vector prediction error in order to lower the coding complexity of the current block.
Disclosure of Invention
In view of the deficiencies in the background art, the invention provides a fast affine motion estimation method for H.266/VVC, which solves the technical problem of the high encoding complexity of affine motion estimation (AME) in the VTM.
The technical scheme of the invention is realized as follows:
a fast affine motion estimation method for H.266/VVC comprises the following steps:
S1, calculating the texture complexity SD of the current CU using the standard deviation, and classifying the current CU as a static area or a non-static area according to the texture complexity SD;
S2, for a CU in a static area, skipping affine motion estimation AME, predicting the current CU directly with motion estimation CME, and selecting the best prediction direction mode by rate-distortion optimization;
S3, for a CU in a non-static area, classifying the current CU with the trained random forest classifier RFC model and outputting the best prediction direction mode.
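A minimal Python sketch of this overall decision flow (steps S1 to S3) is given below. It is an illustration only: the helper callables cme_best_direction and extract_features, the sklearn-style predict() call, and the use of the minimum neighbouring SD as the threshold Th_static follow the description in this document but are not taken from any encoder source code.

```python
import numpy as np

def compute_sd(cu: np.ndarray) -> float:
    """Texture complexity of a CU measured as the standard deviation of its pixels (step S1)."""
    return float(np.std(cu))

def fast_affine_direction(cu, neighbour_sds, cme_best_direction, rfc_model, extract_features):
    """Hypothetical top-level decision of steps S1-S3.

    cu                 -- W x H array of luma samples of the current CU
    neighbour_sds      -- standard deviations of already-coded neighbouring blocks (assumption)
    cme_best_direction -- callable returning the best CME direction ('Uni-L0' | 'Uni-L1' | 'Bi')
    rfc_model          -- trained random forest classifier with an sklearn-style predict()
    extract_features   -- callable building the RFC feature vector of the CU
    """
    sd = compute_sd(cu)
    th_static = min(neighbour_sds)      # threshold derived from neighbouring blocks
    if sd < th_static:                  # static area: skip AME entirely (step S2)
        return cme_best_direction(cu)
    # non-static area: let the trained classifier pick the prediction direction (step S3)
    return rfc_model.predict([extract_features(cu)])[0]
```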
The formula for calculating the texture complexity SD of the current CU using the standard deviation is:

SD = sqrt( (1 / (W × H)) · Σ_{a=1}^{W} Σ_{b=1}^{H} ( P(a, b) - μ )² ),   μ = (1 / (W × H)) · Σ_{a=1}^{W} Σ_{b=1}^{H} P(a, b),
where W represents the width of the CU, H represents the height of the CU, and P (a, b) represents the pixel value at position (a, b) in the CU.
The method of predicting the current CU with motion estimation CME and selecting the best prediction direction mode by rate-distortion optimization is as follows:
S21, performing uni-directional prediction Uni-L0 on the current CU first, then uni-directional prediction Uni-L1, and finally bi-directional prediction Bi;
S22, calculating, by rate-distortion optimization, the rate-distortion cost of the current CU under each of uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi in step S21;
S23, taking the prediction mode with the minimum rate-distortion cost as the best prediction direction mode.
The rate-distortion costs of uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi are all obtained by minimizing the rate-distortion cost function over the available reference frames:

J_Uni-Lk = min_{φ ∈ Φ(Lk)} J(φ), k ∈ {0, 1},    J_Bi = min_{φ0 ∈ Φ(L0), φ1 ∈ Φ(L1)} J(φ0, φ1),

where L0 and L1 represent the two reference frame lists (together forming the set of all available reference lists), Φ(L) represents the reference frames in reference list L, and J(·) is the rate-distortion cost function

J(·) = D(·) + λ·R(·),

where D(·) represents the distortion of CU coding, λ represents the Lagrangian multiplier, and R(·) represents the number of bits consumed by CU coding.
The training method of the random forest classifier RFC model in the step S3 comprises the following steps:
S31, selecting Traffic, Kimono, BQSquare, RaceHorseC and FourPeople video sequences at different resolutions from the common test sequences, encoding the first M frames of each on the VTM, and meanwhile recording the shape of the CU, the texture complexity of the CU and the three prediction direction modes of the CU in the VTM as a data set, where the data set comprises a sample set S and a test set T, and the three prediction direction modes comprise uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi;
S32, resampling the sample set S with the Bootstrap method to generate K training sample sets {S_1, S_2, ..., S_K}, and taking each generated training set S_i as a root node to generate the corresponding decision trees {T_1, T_2, ..., T_K}, where i = 1, 2, ..., K denotes the i-th training sample set and K denotes the number of training sample sets;
S33, starting training from the root node: at each internal node of a decision tree, m feature attributes are randomly selected, the Gini index of each feature attribute is calculated, the feature attribute with the minimum Gini index is selected as the optimal splitting attribute of the current node, and the node is split into a left subtree and a right subtree using the minimum Gini index value as the splitting threshold;
S34, repeating step S33 and training K′ times until the K′ decision trees are fully trained, letting each decision tree grow completely without pruning;
S35, the generated decision trees constitute the random forest classifier (RFC) model; the RFC model is used to classify the test set T, the classification result is decided by voting, the category output by the most of the K′ decision trees is taken as the category of the test set T, and the best prediction direction mode of the current CU is obtained.
The method for obtaining the data set in step S31 includes:
S31.1, predicting the video sequence with motion estimation CME;
S31.2, performing affine prediction on the video sequence predicted in step S31.1 with the 4-parameter affine motion model, the affine prediction comprising uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi;
S31.3, performing affine prediction on the video sequence affine-predicted in step S31.2 with the 6-parameter affine motion model;
S31.4, calculating the rate-distortion costs after the affine predictions in steps S31.2 and S31.3 respectively, and taking the prediction mode corresponding to the minimum rate-distortion cost as the prediction direction mode of the video sequence.
The feature attributes comprise the two-dimensional Haar wavelet transform horizontal coefficient, the two-dimensional Haar wavelet transform vertical coefficient, the two-dimensional Haar wavelet transform angle coefficient, the angular second moment, the contrast, the entropy, the inverse difference moment, the minimum sum of absolute differences (SAD) and the gradient.
For the 4-parameter affine motion model, the motion vector of sample position (x, y) in the CU is:

mvx = ((mv1x - mv0x) / W)·x - ((mv1y - mv0y) / W)·y + mv0x,
mvy = ((mv1y - mv0y) / W)·x + ((mv1x - mv0x) / W)·y + mv0y,

where (mv0x, mv0y) is the motion vector of the top-left control point, (mv1x, mv1y) is the motion vector of the top-right control point, and W represents the CU width;
for the 6-parameter affine motion model, the motion vector of sample position (x, y) in the CU is:

mvx = ((mv1x - mv0x) / W)·x + ((mv2x - mv0x) / H)·y + mv0x,
mvy = ((mv1y - mv0y) / W)·x + ((mv2y - mv0y) / H)·y + mv0y,

where (mv2x, mv2y) is the motion vector of the bottom-left control point, and H represents the CU height.
Beneficial effects of this technical solution: the method first uses the standard deviation SD to divide CUs into static and non-static areas. If a CU belongs to a static area, the probability that inter prediction selects the SKIP mode is high, and a static area that tends to select the SKIP mode for inter prediction does not need affine prediction, so the affine motion estimation (AME) module can be terminated early in static areas, and the best direction mode of the current CU is the best direction mode of conventional motion estimation (CME). If the CU belongs to a non-static area, its inter prediction direction mode is determined by the random forest classification model, so the best prediction direction mode is obtained in advance. The invention thus reduces computational complexity and saves encoding time, realizing fast encoding for H.266/VVC.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the present invention;
FIG. 2 is the prediction direction mode complexity distribution of the present invention;
FIG. 3 is a 4-parameter affine model of the present invention;
FIG. 4 is a 6-parameter affine model of the present invention;
FIG. 5 is an overall process diagram of the motion estimation ME of the present invention;
FIG. 6 is a graph of the overall run time comparison of the inventive process and the FAME process.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1, an embodiment of the present invention provides a fast affine motion estimation method for h.266/VVC, which includes the following specific steps:
S1: in the image encoding process, image content in a uniform region is usually encoded with larger CUs, whereas regions rich in detail are usually encoded with smaller CUs. Therefore, the texture complexity of the coding block is used to determine whether the CU uses the SKIP mode for inter prediction: regions with uniform content tend to be inter-predicted with the SKIP mode, while regions rich in detail are unlikely to use it. The variance of a CU reflects the dispersion of the pixel values of the current block, so the texture complexity of a block can be roughly measured by its standard deviation SD. Therefore, the texture complexity SD of the current CU is calculated using the standard deviation, and the current CU is classified as a static area or a non-static area according to the texture complexity SD. The formula for the standard deviation is:

SD = sqrt( (1 / (W × H)) · Σ_{a=1}^{W} Σ_{b=1}^{H} ( P(a, b) - μ )² ),   μ = (1 / (W × H)) · Σ_{a=1}^{W} Σ_{b=1}^{H} P(a, b),

where W represents the width of the CU, H represents the height of the CU, and P(a, b) represents the pixel value at position (a, b) in the CU. Since the texture complexity of neighbouring blocks is correlated with that of the CU, the classification threshold is derived from the texture complexity of the neighbouring blocks. According to a large amount of experimental data, it is reasonable to use the minimum of the standard deviations SD of the blocks adjacent to the CU as Th_static. The CU can then be classified by this threshold: if the current standard deviation SD is less than the threshold Th_static, the current CU is a static area; conversely, if the standard deviation SD is greater than Th_static, the current CU belongs to a non-static area.
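As a small numerical check of the formula and the threshold rule above, the following sketch computes SD with the explicit double sum and classifies a CU; the 4×4 block and the neighbouring SD values are made-up numbers.

```python
import numpy as np

cu = np.array([[100, 102,  99, 101],
               [101, 100, 100, 102],
               [ 99, 101, 100, 100],
               [100, 100, 101,  99]], dtype=float)      # a nearly flat (static) 4x4 CU

W, H = cu.shape
mean = cu.sum() / (W * H)
sd = np.sqrt(((cu - mean) ** 2).sum() / (W * H))         # the double sum of the SD formula
assert np.isclose(sd, np.std(cu))                        # same as the library standard deviation

th_static = min([2.1, 3.4, 2.8])                         # minimum SD of the neighbouring blocks (made up)
area = 'static' if sd < th_static else 'non-static'
print(sd, area)
```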
S2: existing video coding standards (e.g. H.265/HEVC) use motion vectors (MV) that cover only translational motion for conventional motion estimation (CME). Affine motion estimation (AME), however, can predict not only translational motion but also linear-transformation motion such as scaling and rotation, so if the camera zooms or rotates while capturing video, AME in H.266/VVC predicts the motion more accurately than CME. AME proceeds in the same way as CME: first uni-directional prediction Uni-L0, then uni-directional prediction Uni-L1, and finally bi-directional prediction Bi; after the three prediction direction modes are calculated, rate-distortion optimization (RDO) selects the best prediction direction mode. FIG. 2 shows the distribution of the AME complexity among the prediction direction modes Uni-L0, Uni-L1 and Bi, which motivates skipping unnecessary prediction direction modes in order to reduce the AME computation.
To obtain the optimal motion vector MV and the optimal reference frame, the encoder searches the available reference frames and calculates the rate-distortion (RD) cost J(·) with the Lagrangian multiplier method; the RD cost function is

J(·) = D(·) + λ·R(·),

where D(·) represents the distortion of CU coding, λ is the Lagrangian multiplier, and R(·) represents the number of bits consumed by CU coding. Because the two reference frame lists used by uni-directional prediction Uni-L0 and Uni-L1 are both used for motion prediction, the ME process for uni-directional prediction must test both lists over all available frames in the two lists.
For a CU in a static area, affine motion estimation (AME) is skipped and the current CU is predicted directly with conventional motion estimation (CME), the best prediction direction mode being selected by rate-distortion optimization; the specific method is as follows:
S21, the current CU is subjected to uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi;
S22, the rate-distortion costs of uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi are calculated by rate-distortion optimization;
the rate-distortion cost of the unidirectional prediction Uni-L0, the unidirectional prediction Uni-L1 and the bidirectional prediction Bi are respectively as follows:
Figure BDA0002451385330000081
Figure BDA0002451385330000082
wherein the content of the first and second substances,
Figure BDA0002451385330000083
a set of all available reference lists is represented,
Figure BDA0002451385330000084
representing a set of reference lists, L0 and L1 representing two reference frame lists, phi (J) representing the reference frames in the reference lists, and J (-) being a rate-distortion cost function.
S23, the prediction mode with the minimum rate-distortion cost is taken as the best prediction direction mode.
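The selection in steps S21 to S23 can be illustrated with the small sketch below. The distortion and bit values and the Lagrangian multiplier are invented numbers; in the encoder they come from the actual CME predictions of the three direction modes.

```python
def rd_cost(distortion: float, bits: float, lam: float) -> float:
    """Rate-distortion cost J = D + lambda * R (Lagrangian cost)."""
    return distortion + lam * bits

def select_best_direction(costs: dict) -> str:
    """Step S23: the prediction direction mode with the minimum RD cost wins."""
    return min(costs, key=costs.get)

lam = 36.4   # example Lagrangian multiplier (depends on QP; value is illustrative)
costs = {
    'Uni-L0': rd_cost(distortion=5120.0, bits=46, lam=lam),
    'Uni-L1': rd_cost(distortion=5290.0, bits=44, lam=lam),
    'Bi':     rd_cost(distortion=4870.0, bits=78, lam=lam),
}
print(select_best_direction(costs), costs)
```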
S3: a CU in a non-static area does not meet the condition for skipping the affine motion estimation (AME) process, so the current CU is classified with the trained random forest classifier (RFC) model and the best prediction direction mode is output, further reducing the computational complexity. The random forest algorithm generates K bootstrap sample sets by Bootstrap resampling, and the data of each sample set grows into a decision tree. At each tree node, m (m << M′) features are randomly drawn from the M′ feature attributes based on the random subspace method (RSM), and according to the node-splitting algorithm the optimal attribute is selected from the m feature attributes for branch growth. Finally, the K′ decision trees are combined for mode voting. After the random forest classifier is generated, the RFC model is tested: each tree in the forest makes an independent classification decision, and the category with the most identical decisions is taken as the final classification, expressed by the formula

H(t) = argmax_Y Σ_{i=1}^{K′} I( h_i(t) = Y ),

where H(t) represents the combined classification model, h_i(t) is a single classification tree model, t represents the feature attributes of the decision tree, Y represents the output variable, and I(·) is the indicator function (its value is 1 when a classification result equals Y, and 0 otherwise).
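The voting rule H(t) can be illustrated in a few lines of Python; the tree outputs listed here are hypothetical.

```python
from collections import Counter

def forest_vote(tree_outputs):
    """H(t) = argmax_Y sum_i I(h_i(t) = Y): the class predicted by the most trees wins."""
    return Counter(tree_outputs).most_common(1)[0][0]

print(forest_vote(['Bi', 'Uni-L0', 'Bi', 'Bi', 'Uni-L1']))  # -> 'Bi'
```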
When a CU is traversed, the features of the CU and its prediction direction mode are recorded without interfering with the normal encoding process. Resampling with the Bagging ensemble method generates multiple training sets: samples are drawn randomly and with replacement from the original training sample set, and repeating this sampling yields K new training sample sets. After the samples are drawn, the training module of the random forest classifier model is entered. Table 1 shows the training parameter settings of the random forest classifier (RFC) model.
TABLE 1 training parameter configuration
According to the parameters in Table 1, the training method of the random forest classifier RFC model is as follows:
S31, selecting the sample set: video sequences covering rich texture complexity are chosen from the common test sequences (Traffic, Kimono, BQSquare, RaceHorseC and FourPeople at different resolutions), and the first 50 frames of each sequence are encoded on the VTM; meanwhile the shape of the CU, the texture complexity of the CU and the three prediction direction modes of the CU in the VTM are recorded as a data set, where the data set comprises a sample set S (S = 20) and a test set T (T = 30), and the three prediction direction modes comprise uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi;
In the VTM, affine motion blocks are also predicted in three ways, namely uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi, and affine prediction additionally uses 4-parameter and 6-parameter affine models. The uni-directional or bi-directional prediction of the affine motion estimation (AME) module requires the corresponding reference frames, which increases the encoding complexity of the VTM; counting only the number of reference frames required, the AME process needs twice as many reference frames as the CME process. The whole motion estimation (ME) process is shown in FIG. 5, from which the method for obtaining the data set in step S31 is as follows:
S31.1, predicting the video sequence with conventional motion estimation (CME), the prediction method being the same as in step S21;
S31.2, performing affine prediction on the video sequence predicted in step S31.1 with the 4-parameter affine motion model, the affine prediction comprising uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi;
As shown in FIG. 3, the motion vector of sample position (x, y) in the CU for the 4-parameter affine motion model is:

mvx = ((mv1x - mv0x) / W)·x - ((mv1y - mv0y) / W)·y + mv0x,
mvy = ((mv1y - mv0y) / W)·x + ((mv1x - mv0x) / W)·y + mv0y,

where (mv0x, mv0y) is the motion vector of the top-left control point, (mv1x, mv1y) is the motion vector of the top-right control point, and W represents the CU width;
S31.3, performing affine prediction on the video sequence affine-predicted in step S31.2 with the 6-parameter affine motion model;
as shown in FIG. 4, the motion vector of sample position (x, y) in the block for the 6-parameter affine motion model is:

mvx = ((mv1x - mv0x) / W)·x + ((mv2x - mv0x) / H)·y + mv0x,
mvy = ((mv1y - mv0y) / W)·x + ((mv2y - mv0y) / H)·y + mv0y,

where (mv2x, mv2y) is the motion vector of the bottom-left control point, and H represents the CU height.
S31.4, calculating the rate-distortion costs after the affine predictions in steps S31.2 and S31.3 respectively, and taking the prediction mode corresponding to the minimum rate-distortion cost as the prediction direction mode of the video sequence.
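A small sketch of the two affine models follows; it evaluates the per-sample motion vector equations given above (the standard 4- and 6-parameter affine formulations), with made-up control-point motion vectors. The function names are our own.

```python
def affine_mv_4param(x, y, mv0, mv1, W):
    """Per-sample motion vector of the 4-parameter affine model.
    mv0 = (mv0x, mv0y) top-left control point, mv1 = (mv1x, mv1y) top-right control point."""
    mv0x, mv0y = mv0
    mv1x, mv1y = mv1
    mvx = (mv1x - mv0x) / W * x - (mv1y - mv0y) / W * y + mv0x
    mvy = (mv1y - mv0y) / W * x + (mv1x - mv0x) / W * y + mv0y
    return mvx, mvy

def affine_mv_6param(x, y, mv0, mv1, mv2, W, H):
    """Per-sample motion vector of the 6-parameter affine model.
    mv2 = (mv2x, mv2y) is the bottom-left control point."""
    mv0x, mv0y = mv0
    mv1x, mv1y = mv1
    mv2x, mv2y = mv2
    mvx = (mv1x - mv0x) / W * x + (mv2x - mv0x) / H * y + mv0x
    mvy = (mv1y - mv0y) / W * x + (mv2y - mv0y) / H * y + mv0y
    return mvx, mvy

# e.g. the MV of the sample at (8, 4) in a 16x16 CU (control-point MVs are invented numbers)
print(affine_mv_4param(8, 4, mv0=(1.0, 0.5), mv1=(2.0, 0.0), W=16))
print(affine_mv_6param(8, 4, mv0=(1.0, 0.5), mv1=(2.0, 0.0), mv2=(1.5, 1.5), W=16, H=16))
```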
S32, resampling the sample set S with the Bootstrap method to generate K training sample sets {S_1, S_2, ..., S_K}, and taking each generated training set S_i as a root node to generate the corresponding decision trees {T_1, T_2, ..., T_K}, where i = 1, 2, ..., K denotes the i-th training sample set and K denotes the number of training sample sets;
S33, starting training from the root node: at each internal node of a decision tree, m feature attributes are randomly selected, the Gini index of each feature attribute is calculated, the feature attribute with the minimum Gini index is selected as the optimal splitting attribute of the current node, and the node is split into a left subtree and a right subtree using the minimum Gini index value as the splitting threshold;
the feature attributes selected by the present invention include two-dimensional Haar wavelet transform horizontal coefficients (2D Haar wavelet transform vertical coefficients, L H), two-dimensional Haar wavelet transform angle coefficients (2D Haar wavelet transform angle coefficients, HH), angular second moments (Absolute differences, ASM), contrast (contrast, entropy), entropy Difference (Difference, Difference of entropy), and minimum Difference of the feature vectors (SAD) as the feature vectors of the random forest classifier model:
the two-dimensional haar wavelet transform horizontal coefficient H L of the image represents the texture in the horizontal direction of the image, the larger the value is, the richer the texture in the horizontal direction is, the smaller the value is, the flatter the texture in the horizontal direction is, the two-dimensional haar wavelet transform vertical coefficient L H of the image represents the texture in the vertical direction of the image, the larger the value is, the richer the texture in the vertical direction is, the smaller the value is, the flatter the texture in the vertical direction is, the two-dimensional haar wavelet transform angle coefficient HH of the image represents the texture in the vertical direction of the image, the larger the value is, the richer the texture in the 45 degree direction is, the smaller the value is, the two-dimensional haar wavelet transform horizontal coefficient H L, the two-dimensional haar wavelet transform vertical coefficient L H and the two-dimensional ha:
HL, LH and HH are each computed from the pixel values P(a, b) of the W×H CU via the two-dimensional Haar wavelet transform,
where W represents the width of the CU, H represents the height of the CU, and P (a, b) represents the pixel value at position (a, b).
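Because the defining sums for HL, LH and HH are not reproduced here, the sketch below uses a standard single-level 2D Haar decomposition (PyWavelets) and summarizes each detail sub-band by its mean absolute value; both the mapping of the sub-bands to HL/LH/HH and the mean-absolute-value aggregation are our assumptions, not the patent's exact definition.

```python
import numpy as np
import pywt  # PyWavelets

def haar_texture_features(cu: np.ndarray):
    """Scalar HL, LH and HH texture descriptors of a CU from a single-level 2D Haar transform.

    pywt.dwt2 returns (cA, (cH, cV, cD)): horizontal, vertical and diagonal detail bands.
    """
    _, (cH, cV, cD) = pywt.dwt2(cu.astype(float), 'haar')
    hl = float(np.mean(np.abs(cH)))   # mean |horizontal detail| (assumed to play the role of HL)
    lh = float(np.mean(np.abs(cV)))   # mean |vertical detail|   (assumed to play the role of LH)
    hh = float(np.mean(np.abs(cD)))   # mean |diagonal detail|   (assumed to play the role of HH)
    return hl, lh, hh

cu = np.random.randint(0, 256, size=(16, 16))
print(haar_texture_features(cu))
```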
The angular second moment ASM reflects the uniformity of the grey-level distribution and the coarseness of the texture: the larger the value, the more uniform the texture distribution of the image. The contrast CON reflects the depth of the image texture: the larger the value, the deeper the texture. The entropy ENT represents the amount of information in the image: the larger the value, the more information the image carries. The inverse difference moment IDM reflects the magnitude of local texture variation: a large value indicates that the texture in different regions of the image is relatively uniform and varies slowly. The angular second moment ASM, contrast CON, entropy ENT and inverse difference moment IDM are respectively expressed as:
ASM = Σ_i Σ_j p(i, j)²,
CON = Σ_i Σ_j (i - j)² · p(i, j),
ENT = - Σ_i Σ_j p(i, j) · log p(i, j),
IDM = Σ_i Σ_j p(i, j) / (1 + (i - j)²),

where p(i, j) denotes the normalized grey-level co-occurrence matrix of the CU.
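A sketch of these four texture features computed from a grey-level co-occurrence matrix (GLCM) follows; the quantization to 32 grey levels and the single (distance 1, angle 0) offset are illustrative choices, not taken from the patent.

```python
import numpy as np
from skimage.feature import graycomatrix

def glcm_features(cu: np.ndarray, levels: int = 32):
    """ASM, contrast, entropy and inverse difference moment of a CU from its GLCM.

    The quantization to `levels` grey levels and the (distance=1, angle=0) offset are
    illustrative assumptions.
    """
    q = (cu.astype(np.float64) / 256.0 * levels).astype(np.uint8)      # quantize 8-bit samples
    glcm = graycomatrix(q, distances=[1], angles=[0], levels=levels,
                        symmetric=True, normed=True)[:, :, 0, 0]
    i, j = np.indices(glcm.shape)
    asm = float(np.sum(glcm ** 2))                                     # angular second moment
    con = float(np.sum((i - j) ** 2 * glcm))                           # contrast
    ent = float(-np.sum(glcm[glcm > 0] * np.log2(glcm[glcm > 0])))     # entropy (base 2 here)
    idm = float(np.sum(glcm / (1.0 + (i - j) ** 2)))                   # inverse difference moment
    return asm, con, ent, idm

cu = np.random.randint(0, 256, size=(16, 16))
print(glcm_features(cu))
```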
In block-matching-based motion estimation algorithms there are many decision criteria for the best matching block; here the sum of absolute differences (SAD) is used, and a smaller SAD indicates that the reference block is closer to the current prediction block. SAD is expressed as:

SAD(i′, j′) = Σ_{a=1}^{W} Σ_{b=1}^{H} | P_k(a, b) - P_{k-1}(a + i′, b + j′) |,

where P_k(a, b) represents the value of the current pixel, (a, b) represents the coordinates of the current pixel, P_{k-1}(a + i′, b + j′) is the reference pixel value, and (a + i′, b + j′) represents the coordinates of the reference pixel.
The gradient represents the texture direction of the CU; the gradients of the luminance samples in the horizontal and vertical directions are used as feature attributes and are expressed as:
Gx(a,b)=P(a+1,b)-P(a,b)+P(a+1,b+1)-P(a,b+1),
Gy(a,b)=P(a,b)-P(a,b+1)+P(a+1,b)-P(a+1,b+1),
where Gx(a, b) and Gy(a, b) respectively represent the gradient components of the current pixel in the horizontal and vertical directions, (a, b) represents the coordinates of the pixel, and P(a, b) represents the pixel value.
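The SAD and gradient features can be sketched as below; summarizing the gradient components by their mean absolute values over the CU is our assumption, and the sample blocks are random placeholders.

```python
import numpy as np

def sad(cur: np.ndarray, ref: np.ndarray) -> float:
    """Sum of absolute differences between the current block and a candidate reference block."""
    return float(np.sum(np.abs(cur.astype(np.int64) - ref.astype(np.int64))))

def gradient_features(cu: np.ndarray):
    """Gx, Gy per the formulas above (axis 0 is the a index, axis 1 is the b index),
    summarized over the CU by their mean absolute values (aggregation is an assumption)."""
    p = cu.astype(np.int64)
    gx = p[1:, :-1] - p[:-1, :-1] + p[1:, 1:] - p[:-1, 1:]   # P(a+1,b)-P(a,b)+P(a+1,b+1)-P(a,b+1)
    gy = p[:-1, :-1] - p[:-1, 1:] + p[1:, :-1] - p[1:, 1:]   # P(a,b)-P(a,b+1)+P(a+1,b)-P(a+1,b+1)
    return float(np.mean(np.abs(gx))), float(np.mean(np.abs(gy)))

cur = np.random.randint(0, 256, size=(8, 8))
ref = np.random.randint(0, 256, size=(8, 8))
print(sad(cur, ref), gradient_features(cur))
```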
S34, repeating the step S33, training K '25 times until the training of K' decision trees is completed, and enabling each decision tree to grow completely without pruning;
s35, the generated multiple decision trees are random forest classifier RFC models, the random forest classifier RFC models are used for distinguishing and classifying the test set T, the classification result adopts a voting mode, the most categories output by the K' decision trees are used as the categories of the test set T, the best prediction direction mode of the current CU is obtained, and the calculation complexity of the affine motion estimation AME module is reduced.
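A sketch of steps S32 to S35 with scikit-learn's RandomForestClassifier follows. The number of trees K′ = 25, the Gini splitting criterion, the random feature subset per split and the unpruned trees follow the text above; the synthetic training data and the assumed 13-dimensional feature layout are placeholders for the recorded CU features and their best prediction direction modes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Assumed feature order: CU width, CU height, SD, HL, LH, HH, ASM, CON, ENT, IDM, SAD, Gx, Gy
N_FEATURES = 13
MODES = ['Uni-L0', 'Uni-L1', 'Bi']

rng = np.random.default_rng(0)
X_train = rng.random((500, N_FEATURES))        # stand-in for recorded CU feature vectors
y_train = rng.choice(MODES, size=500)          # stand-in for the recorded best direction modes

rfc = RandomForestClassifier(
    n_estimators=25,        # K' = 25 decision trees, as in step S34
    max_features='sqrt',    # m randomly chosen attributes per split (random subspace method)
    criterion='gini',       # split on the minimum Gini index, as in step S33
    max_depth=None,         # grow each tree fully, without pruning
)
rfc.fit(X_train, y_train)

# Classification of a CU: sklearn aggregates the trees' votes inside predict().
cu_features = rng.random((1, N_FEATURES))
print(rfc.predict(cu_features)[0])
```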
To evaluate the method of the present invention, simulation tests were performed on the latest H.266/VVC encoder (VTM 7.0). The test video sequences are encoded in the "Random Access" configuration using default parameters. The BDBR reflects the compression performance of the invention, and the reduction in time represents the reduction in complexity. Table 2 shows the coding characteristics of the invention: on average, the total encoding time is reduced to 87% and the affine motion estimation (AME) time is reduced to 56%. The invention therefore saves encoding time effectively, and the loss in RD performance is negligible.
TABLE 2 encoding characteristics of the invention
Table 2 shows the RD performance and the encoding run time saved by the invention compared with the VTM. The experimental results may fluctuate for different test videos, but the proposed method remains effective: compared with the VTM, the invention effectively reduces the complexity of the affine motion estimation (AME) module while keeping good RD performance.
The affine motion estimation (AME) module time is measured for different quantization parameters (QPs). When the quantization parameter QP is 22, as can be seen from FIG. 6, the AME module time over all video sequences amounts to about 36 hours, whereas with the method of the invention the AME module time is reduced by about 9 hours; the trend is similar under the other quantization parameters. FIG. 6 thus shows intuitively that the proposed method reduces the encoding time of the AME module and thereby reduces the computational complexity.
The technical solution of the invention has been described in detail with reference to the drawings. The invention provides a fast affine motion estimation method for H.266/VVC, which effectively reduces the encoding complexity of affine motion estimation (AME) in the VTM. First, the standard deviation SD is used to divide CUs into static and non-static areas. If a CU belongs to a static area, the probability that inter prediction selects the SKIP mode is high, and such an area does not need affine prediction, so the AME module can be terminated early and the best direction mode of the current CU is the best direction mode of conventional motion estimation (CME). If the CU belongs to a non-static area, its inter prediction direction mode is determined by the random forest classification model, and the best prediction direction mode is obtained in advance. The invention thus reduces computational complexity and saves encoding time, realizing fast encoding for H.266/VVC.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (8)

1. A fast affine motion estimation method for H.266/VVC is characterized by comprising the following steps:
S1, calculating the texture complexity SD of the current CU using the standard deviation, and classifying the current CU as a static area or a non-static area according to the texture complexity SD;
S2, for a CU in a static area, skipping affine motion estimation AME, predicting the current CU directly with motion estimation CME, and selecting the best prediction direction mode by rate-distortion optimization;
S3, for a CU in a non-static area, classifying the current CU with the trained random forest classifier RFC model and outputting the best prediction direction mode.
2. The fast affine motion estimation method for H.266/VVC according to claim 1, wherein the texture complexity SD of the current CU is calculated using the standard deviation as:

SD = sqrt( (1 / (W × H)) · Σ_{a=1}^{W} Σ_{b=1}^{H} ( P(a, b) - μ )² ),   μ = (1 / (W × H)) · Σ_{a=1}^{W} Σ_{b=1}^{H} P(a, b),
where W represents the width of the CU, H represents the height of the CU, and P (a, b) represents the pixel value at position (a, b) in the CU.
3. The fast affine motion estimation method for H.266/VVC as claimed in claim 1, wherein the method of predicting the current CU with motion estimation CME and selecting the best prediction direction mode by rate-distortion optimization is:
S21, performing uni-directional prediction Uni-L0 on the current CU first, then uni-directional prediction Uni-L1, and finally bi-directional prediction Bi;
S22, calculating, by rate-distortion optimization, the rate-distortion cost of the current CU under each of uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi in step S21;
S23, taking the prediction mode with the minimum rate-distortion cost as the best prediction direction mode.
4. The fast affine motion estimation method for H.266/VVC as claimed in claim 3, wherein the rate-distortion costs of uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi are all obtained by minimizing the rate-distortion cost function over the available reference frames:

J_Uni-Lk = min_{φ ∈ Φ(Lk)} J(φ), k ∈ {0, 1},    J_Bi = min_{φ0 ∈ Φ(L0), φ1 ∈ Φ(L1)} J(φ0, φ1),

where L0 and L1 represent the two reference frame lists (together forming the set of all available reference lists), Φ(L) represents the reference frames in reference list L, and J(·) is the rate-distortion cost function

J(·) = D(·) + λ·R(·),

where D(·) represents the distortion of CU coding, λ represents the Lagrangian multiplier, and R(·) represents the number of bits consumed by CU coding.
5. The fast affine motion estimation method for H.266/VVC according to claim 1, wherein the training method of the random forest classifier RFC model in step S3 is as follows:
S31, selecting Traffic, Kimono, BQSquare, RaceHorseC and FourPeople video sequences at different resolutions from the common test sequences, encoding the first M frames of each on the VTM, and meanwhile recording the shape of the CU, the texture complexity of the CU and the three prediction direction modes of the CU in the VTM as a data set, where the data set comprises a sample set S and a test set T, and the three prediction direction modes comprise uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi;
S32, resampling the sample set S with the Bootstrap method to generate K training sample sets {S_1, S_2, ..., S_K}, and taking each generated training set S_i as a root node to generate the corresponding decision trees {T_1, T_2, ..., T_K}, where i = 1, 2, ..., K denotes the i-th training sample set and K denotes the number of training sample sets;
S33, starting training from the root node: at each internal node of a decision tree, m feature attributes are randomly selected, the Gini index of each feature attribute is calculated, the feature attribute with the minimum Gini index is selected as the optimal splitting attribute of the current node, and the node is split into a left subtree and a right subtree using the minimum Gini index value as the splitting threshold;
S34, repeating step S33 and training K′ times until the K′ decision trees are fully trained, letting each decision tree grow completely without pruning;
S35, the generated decision trees constitute the random forest classifier (RFC) model; the RFC model is used to classify the test set T, the classification result is decided by voting, the category output by the most of the K′ decision trees is taken as the category of the test set T, and the best prediction direction mode of the current CU is obtained.
6. The fast affine motion estimation method for H.266/VVC as claimed in claim 5, wherein the method of obtaining the data set in step S31 is:
S31.1, predicting the video sequence with motion estimation CME;
S31.2, performing affine prediction on the video sequence predicted in step S31.1 with the 4-parameter affine motion model, the affine prediction comprising uni-directional prediction Uni-L0, uni-directional prediction Uni-L1 and bi-directional prediction Bi;
S31.3, performing affine prediction on the video sequence affine-predicted in step S31.2 with the 6-parameter affine motion model;
S31.4, calculating the rate-distortion costs after the affine predictions in steps S31.2 and S31.3 respectively, and taking the prediction mode corresponding to the minimum rate-distortion cost as the prediction direction mode of the video sequence.
7. The method of claim 5 for fast affine motion estimation for H.266/VVC, characterized in that the feature attributes comprise the two-dimensional Haar wavelet transform horizontal coefficient, the two-dimensional Haar wavelet transform vertical coefficient, the two-dimensional Haar wavelet transform angle coefficient, the angular second moment, the contrast, the entropy, the inverse difference moment, the minimum sum of absolute differences (SAD) and the gradient.
8. The fast affine motion estimation method for H.266/VVC as claimed in claim 6, wherein the motion vector of sample position (x, y) in the CU for the 4-parameter affine motion model is:

mvx = ((mv1x - mv0x) / W)·x - ((mv1y - mv0y) / W)·y + mv0x,
mvy = ((mv1y - mv0y) / W)·x + ((mv1x - mv0x) / W)·y + mv0y,

where (mv0x, mv0y) is the motion vector of the top-left control point, (mv1x, mv1y) is the motion vector of the top-right control point, and W represents the CU width;
for the 6-parameter affine motion model, the motion vector of sample position (x, y) in the CU is:

mvx = ((mv1x - mv0x) / W)·x + ((mv2x - mv0x) / H)·y + mv0x,
mvy = ((mv1y - mv0y) / W)·x + ((mv2y - mv0y) / H)·y + mv0y,

where (mv2x, mv2y) is the motion vector of the bottom-left control point, and H represents the CU height.
CN202010293694.8A 2020-04-15 2020-04-15 Fast affine motion estimation method for H.266/VVC Active CN111479110B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010293694.8A CN111479110B (en) 2020-04-15 2020-04-15 Fast affine motion estimation method for H.266/VVC

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010293694.8A CN111479110B (en) 2020-04-15 2020-04-15 Fast affine motion estimation method for H.266/VVC

Publications (2)

Publication Number Publication Date
CN111479110A true CN111479110A (en) 2020-07-31
CN111479110B CN111479110B (en) 2022-12-13

Family

ID=71752555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010293694.8A Active CN111479110B (en) 2020-04-15 2020-04-15 Fast affine motion estimation method for H.266/VVC

Country Status (1)

Country Link
CN (1) CN111479110B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1934871A (en) * 2003-08-25 2007-03-21 新加坡科技研究局 Mode decision for inter prediction in video coding
CN104320658A (en) * 2014-10-20 2015-01-28 南京邮电大学 HEVC (High Efficiency Video Coding) fast encoding method
WO2018124332A1 (en) * 2016-12-28 2018-07-05 엘지전자(주) Intra prediction mode-based image processing method, and apparatus therefor
CN110213584A (en) * 2019-07-03 2019-09-06 北京电子工程总体研究所 Coding unit classification method and coding unit sorting device based on Texture complication

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wu Qinggang et al.: "Green apple image segmentation based on random forest and multi-feature fusion", Journal of Xinyang Normal University (Natural Science Edition) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112689146A (en) * 2020-12-18 2021-04-20 重庆邮电大学 Heuristic learning-based VVC intra-frame prediction rapid mode selection method
CN112911308A (en) * 2021-02-01 2021-06-04 重庆邮电大学 H.266/VVC fast motion estimation method and storage medium
CN112911308B (en) * 2021-02-01 2022-07-01 重庆邮电大学 H.266/VVC fast motion estimation method and storage medium
CN113225552A (en) * 2021-05-12 2021-08-06 天津大学 Intelligent rapid interframe coding method
CN113225552B (en) * 2021-05-12 2022-04-29 天津大学 Intelligent rapid interframe coding method
CN113630601A (en) * 2021-06-29 2021-11-09 杭州未名信科科技有限公司 Affine motion estimation method, device, equipment and storage medium
CN113630601B (en) * 2021-06-29 2024-04-02 杭州未名信科科技有限公司 Affine motion estimation method, affine motion estimation device, affine motion estimation equipment and storage medium
CN115278260A (en) * 2022-07-15 2022-11-01 重庆邮电大学 VVC (variable valve control) rapid CU (CU) dividing method based on space-time domain characteristics and storage medium

Also Published As

Publication number Publication date
CN111479110B (en) 2022-12-13

Similar Documents

Publication Publication Date Title
CN111479110B (en) Fast affine motion estimation method for H.266/VVC
Chen et al. Learning for video compression
US9781443B2 (en) Motion vector encoding/decoding method and device and image encoding/decoding method and device using same
US9326002B2 (en) Method and an apparatus for decoding a video
CN108989802B (en) HEVC video stream quality estimation method and system by utilizing inter-frame relation
CN106170093B (en) Intra-frame prediction performance improving coding method
JP4429968B2 (en) System and method for increasing SVC compression ratio
WO2000033580A1 (en) Improved motion estimation and block matching pattern
JP2007503784A (en) Hybrid video compression method
CN110177282B (en) Interframe prediction method based on SRCNN
CN104811728B (en) A kind of method for searching motion of video content adaptive
KR100597397B1 (en) Method For Encording Moving Picture Using Fast Motion Estimation Algorithm, And Apparatus For The Same
Yang et al. Generalized rate-distortion optimization for motion-compensated video coders
CN114286093A (en) Rapid video coding method based on deep neural network
JP4417054B2 (en) Motion estimation method and apparatus referring to discrete cosine transform coefficient
CN112291562B (en) Fast CU partition and intra mode decision method for H.266/VVC
CN111586405B (en) Prediction mode rapid selection method based on ALF filtering in multifunctional video coding
CN101331773A (en) Two pass rate control techniques for video coding using rate-distortion characteristics
Yilmaz et al. End-to-end rate-distortion optimization for bi-directional learned video compression
CN113810715B (en) Video compression reference image generation method based on cavity convolutional neural network
CN110581993A (en) Coding unit rapid partitioning method based on intra-frame coding in multipurpose coding
CN106878754A (en) A kind of 3D video depths image method for choosing frame inner forecast mode
Bachu et al. Adaptive order search and tangent-weighted trade-off for motion estimation in H. 264
JP4490351B2 (en) Inter-layer prediction processing method, inter-layer prediction processing apparatus, inter-layer prediction processing program, and recording medium therefor
CN113822801A (en) Compressed video super-resolution reconstruction method based on multi-branch convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221118

Address after: Floor 20-23, block a, Ximei building, no.6, Changchun Road, high tech Industrial Development Zone, Zhengzhou City, Henan Province, 450000

Applicant after: Zhengzhou Light Industry Technology Research Institute Co.,Ltd.

Applicant after: Zhengzhou University of light industry

Address before: 450002 No. 5 Dongfeng Road, Jinshui District, Henan, Zhengzhou

Applicant before: Zhengzhou University of light industry

GR01 Patent grant