CN114257819A - VVC coding unit fast dividing method based on space-time depth information - Google Patents

VVC coding unit fast dividing method based on space-time depth information

Info

Publication number
CN114257819A
CN114257819A
Authority
CN
China
Prior art keywords
depth
partition
coding unit
quadtree
maximum coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111556624.8A
Other languages
Chinese (zh)
Inventor
张秋闻
赵进超
张园园
李鹏
代璞
吴奥博
蒋斌
黄立勋
吴庆岗
孙丽君
王晓
张伟伟
孟颍辉
甘勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University of Light Industry
Original Assignee
Zhengzhou University of Light Industry
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University of Light Industry filed Critical Zhengzhou University of Light Industry
Priority to CN202111556624.8A
Publication of CN114257819A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N 19/10-H04N 19/85, e.g. fractals
    • H04N 19/96 Tree coding, e.g. quad-tree coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/146 Data rate or code amount at the encoder output
    • H04N 19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction

Abstract

The invention provides a fast partitioning method for VVC coding units based on spatio-temporal depth information, comprising the following steps: perform quadtree partitioning on the CU of the largest coding unit, followed by multi-type tree partitioning once the quadtree partitioning is finished; when the quadtree partition depth of the largest coding unit is 2 and the average depth of the spatio-temporal neighboring CUs is less than 2, terminate the quadtree partitioning; when the quadtree partition depth is 3 and the average depth of the spatio-temporal neighboring CUs is less than 3, terminate the quadtree partitioning; when the quadtree partition depth of the largest coding unit is between 2 and 4, the CU performs multi-type tree partitioning; when the multi-type tree partition depth of the largest coding unit is 0 and the average depth of the spatio-temporal neighbors is less than 0.1, the CU skips multi-type tree partitioning, otherwise all partition modes are traversed; finally, the optimal prediction mode is decided by rate-distortion cost. The invention saves 35.72% of encoding time on average, with only a small average bit-rate increase at the same coding quality.

Description

VVC coding unit fast dividing method based on space-time depth information
Technical Field
The invention relates to the technical field of video coding, and in particular to a fast partitioning method for VVC coding units based on spatio-temporal depth information.
Background
With the growing demand for high-definition video, more and more video coding standards have emerged, such as H.264/AVC and H.265/HEVC. However, the compression ratios achievable by these standards cannot keep pace with the rapid growth of video data. Research on and standardization of next-generation ultra-high-definition video coding technology is therefore urgent: H.265/HEVC, although widely applied in the video field, gradually fails to meet the requirements of practical applications, and under these circumstances the new-generation video coding standard H.266/VVC emerged.
HEVC (High Efficiency Video Coding) is the predecessor of Versatile Video Coding (VVC). HEVC employs a quadtree coding structure, and a great deal of research over the past decade has aimed to reduce its complexity. In HEVC, Coding Unit (CU) partitioning occupies the largest proportion of encoding time, so many approaches attempt to simplify CU partitioning to reduce complexity. Similarly, the CU partitioning structure of VVC can also be simplified; it is much more flexible than that of HEVC, employing a more complex quad-, ternary-, or binary-tree coding structure, and its computational load is much larger. A large number of fast partitioning algorithms have been proposed, and these methods can be roughly classified into two categories: heuristic methods and machine learning methods.
In heuristic approaches, statistical models of CU partitioning are built from intermediate coding features, such as texture homogeneity/complexity and spatial correlation; these models terminate further CU partitioning early, ending the redundant Rate-Distortion Optimization (RDO) process. Yang et al. propose a low-complexity Coding Tree Unit (CTU) structure decision algorithm based on global and local texture information, context information, etc. King et al. describe a locally constrained CU partition decision method. Beyond texture analysis, some work is based on machine learning or convolutional neural networks. Amestoy et al. introduced random forests into VVC frame partitioning. The applicability of CNNs (convolutional neural networks) to video coding has been validated in previous work: Liu et al. proposed CNN-based CU mode and partition decisions for HEVC intra coding, and Xu et al. applied CNN and Long Short-Term Memory (LSTM) networks to HEVC intra and inter partitioning to reduce coding complexity. Similar CNN-based work has also appeared for VVC. Jin et al. propose a CNN-oriented fast quadtree-plus-binary-tree (QTBT) decision for VVC intra prediction, which models the QTBT structure as a classification problem: for a 32x32 CU, the CU depth is inferred by a CNN, where CU depth is defined by quadtree depth plus binary-tree depth, unlike the depth in HEVC. Tissier et al. propose a CNN-based intra-coding complexity-reduction technique that obtains a probability vector from the CNN and derives the CU partition mode from it, thereby accelerating encoding. Chen et al. propose an early-termination algorithm that uses the average depth of coding units to speed up CU partitioning. These methods are elaborately designed for fast coding-unit partitioning and greatly shorten encoding time, but noticeably degrade RD performance.
Therefore, in the new-generation video coding standard, Versatile Video Coding (VVC), coding efficiency is significantly improved by Coding Units (CUs) of different sizes, but at the same time the computational complexity is extremely high.
Disclosure of Invention
Aiming at the technical problem of the high computational complexity of existing VVC methods, the invention provides a fast partitioning method for VVC coding units based on spatio-temporal depth information, which removes redundant computation in CU partitioning, has lower computational complexity without affecting coding performance, reduces unnecessary rate-distortion optimization operations, and markedly accelerates encoding.
To achieve this purpose, the technical scheme of the invention is realized as follows: a fast partitioning method for VVC coding units based on spatio-temporal depth information comprises the following steps:
s1: performing quadtree division on a CU of a maximum coding unit in the VVC, performing multi-type tree division after the quadtree division is finished, and performing correlation analysis between space-time adjacent blocks;
s2: when the quadtree partition depth of the maximum coding unit is 2, if the average depth of spatio-temporal adjacent CUs is less than 2, the quadtree partition test process is terminated; otherwise, go to step S6;
s3: when the quadtree partition depth of the maximum coding unit is 3, if the average depth of spatio-temporal adjacent CUs is less than 3, the quadtree partition test process is terminated; otherwise, go to step S6;
s4: when the quadtree partition depth of the maximum coding unit is between 2 and 4, performing a multi-type tree partition test process on the CU;
s5: when the multi-type tree partition depth of the maximum coding unit is 0, if the average depth of the space-time adjacent CU is less than 0.1, the CU does not need to perform the multi-type tree partition test process, otherwise, all partition modes are traversed;
s6: deciding an optimal prediction mode according to the rate-distortion cost: and calculating rate distortion cost for all the partition modes of the current CU, and selecting the partition mode with the minimum rate distortion cost as the partition mode of the current CU.
The spatio-temporal neighboring CUs include the left coding unit L-CU and upper coding unit U-CU of the current CU, and the coding unit Col-CU at the same position as the current CU in the previous frame; the depth of the current CU is predicted using the Col-CU, the L-CU, and the U-CU, with equal weight assigned to each spatio-temporal neighboring CU.
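As a minimal sketch of the equal-weight depth prediction described above (the function name and interface are illustrative, not taken from the patent text):

```python
def predict_cu_depth(col_depth: float, l_depth: float, u_depth: float) -> float:
    """Predict the current CU's depth as the equal-weight (1/3 each)
    average of its spatio-temporal neighbors Col-CU, L-CU and U-CU."""
    return (col_depth + l_depth + u_depth) / 3.0
```

For example, neighbors of depth 2, 2 and 2 predict a depth of 2.0 for the current CU; this predicted value is what the early-termination thresholds are compared against.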
The quadtree partition depth of the largest coding unit is represented by a depth mean of a largest coding block: dividing a maximum coding unit into 4 CUs, respectively calculating the depth mean value of the 4 CUs, and adding the depth values of the 4 CUs to calculate the mean value to obtain the depth of the maximum coding unit.
The relationship between the average depth of the quad-tree partition and the current maximum coding unit is as follows:
Step 1: for a largest coding unit with optimal RDO coding, count the number of blocks at each depth to obtain the CU of optimal depth;
Step 2: obtain the depth means of the three spatio-temporal neighboring blocks Col-CU, L-CU and U-CU, respectively;
Step 3: compute a weighted sum of the quadtree partition depth means of the co-located coding block Col-CU of the previous frame, the coded left coding block L-CU, and the upper coding block U-CU.
Whether the current CU is further partitioned for the RDO test is decided from the calculated quadtree partition depth as follows:
(1) when the tested quadtree partition depth is 0 or 1, partition the CU for further RDO testing;
(2) when the tested quadtree partition depth is 2, compute the quadtree partition depth means of the spatio-temporal neighboring blocks Col-CU, L-CU and U-CU; if the depth means are all smaller than 2, end the quadtree partitioning early;
(3) when the tested quadtree partition depth is 3, compute the quadtree partition depth means of the spatio-temporal neighboring blocks Col-CU, L-CU and U-CU; if the depth means are all smaller than 3, end the quadtree partitioning early.
The correlation between the multi-type tree partition depth information of the current CU and that of the spatio-temporal neighboring blocks Col-CU, L-CU and U-CU is analyzed as follows:
1) when the optimal quadtree partition depth of the largest coding unit is 2, count the blocks whose optimal multi-type tree (MTT) partition depth is 0, 1, 2 or 3; when it is 3, count the blocks whose optimal MTT partition depth is 0, 1, 2 or 3; when it is 4, count the blocks whose optimal MTT partition depth is 0, 1 or 2;
2) compute the multi-type tree partition depth mean of the spatio-temporal neighboring blocks Col-CU, L-CU and U-CU of each largest coding unit;
3) combine the obtained multi-type tree partition depth means of the spatio-temporal neighboring blocks Col-CU, L-CU and U-CU with equal weights.
The invention has the following beneficial effects. First, using the spatio-temporal correlation present in video, the coding depth of the current CU is predicted from the depth mean of spatio-temporal neighboring CUs; then unnecessary traversal is skipped during encoding, and the coding unit is partitioned quickly to save encoding time. The invention fully exploits the spatio-temporal correlation of the video sequence, using average-depth analysis at the largest-coding-block level together with comparison of the average depth of spatio-temporal neighboring blocks against thresholds, which reduces unnecessary coding-block partitioning and RDO (rate-distortion optimization) tests. Experimental results show that, compared with the original encoder, encoding time is saved by 35.72% on average, while the average bit rate (BDBR) increases by only 0.43% at the same coding quality.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of the present invention.
Fig. 2 is a schematic diagram of spatio-temporal neighboring CUs according to the present invention, wherein (a) is a previous frame and (b) is a current frame.
Fig. 3 is a diagram illustrating the calculation of the average coding depth of four neighboring CUs according to the present invention.
Fig. 4 is a schematic diagram of the calculation of the average coding depth of the largest coding block according to the present invention.
Fig. 5 is a diagram illustrating maximum coding block depth analysis according to the present invention.
FIG. 6 is a diagram illustrating multi-type partition depth mean calculation of neighboring blocks according to the present invention.
FIG. 7 is a diagram illustrating the calculation of the quadtree partitioning depth-average of neighboring blocks according to the present invention.
FIG. 8 is a diagram illustrating depth-mean calculation of neighboring blocks according to the present invention.
Fig. 9 is a diagram illustrating multi-type tree partition depth calculation of a largest coding block according to the present invention.
Fig. 10 is a graph comparing the BDBR increase of the present invention with the G. Tang and Tissier methods.
Fig. 11 is a graph comparing the TS (time saving) increase of the present invention with the G. Tang and Tissier methods.
FIG. 12 is a schematic diagram of the RD curves of the present invention and VTM7.0 algorithm at different QPs.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1, an embodiment of the present invention provides a method for quickly partitioning a VVC coding unit based on spatio-temporal depth information. It addresses the problem that the quadtree-nested multi-type tree (MTT) partitioning of CUs used in VVC significantly increases encoder complexity, and predicts the coding depth of the current CU from the depth mean of spatio-temporal neighboring CUs by exploiting the spatio-temporal correlation present in video. The invention provides a fast QT decision algorithm and a fast MTT decision algorithm, comprising mainly the following processes:
s1: the Largest Coding Unit (LCU) size in VVC is 128 x 128, the depth is 0, the depth of CU ranges from 0 to 6, CU firstly carries out Quadtree (QT) division, after the Quadtree (QT) division is finished, multi-type tree (MTT) division is carried out, and we firstly carry out correlation analysis between space-time adjacent blocks.
S2: since the depth level between neighboring CUs may not significantly vary, when the quad-tree split depth of the maximum coding unit is 2, if the average depth of spatio-temporal neighboring three blocks (Col-CU, L-CU, and U-CU) is less than 2, the quad-tree split test procedure is terminated early.
S3: since the depth level between neighboring CUs may not significantly vary, when the quad-tree split depth of the maximum coding unit is 3, if the average depth of spatio-temporal neighboring three blocks (Col-CU, L-CU, and U-CU) is less than 3, the quad-tree split test procedure is terminated early.
S4: when the quadtree partition depth of the largest coding unit is between 2 and 4 (the QT depth of the current CU is different, and the average depth of the adjacent blocks needs to be determined to be different), the coding unit needs to perform an additional multi-type tree partition test procedure.
S5: when the multi-type tree partition depth of the maximum coding unit is 0, if the average depth of three spatio-temporal neighboring blocks (Col-CU, L-CU and U-CU) is less than 0.1 (the partition depth cannot be less than 0, and when the average depth of the neighboring CU is less than 0.1, the depth of the current CU is close to the depth of the neighboring CU), the coding unit does not need to perform the next multi-type tree partition test process, otherwise, all partition modes are traversed.
S6: and according to a prediction mode with the optimal rate-distortion cost RDCost decision, trying all the partition modes of the current CU, calculating the RDCost of the current CU, and selecting the partition mode with the minimum cost as the partition mode of the current CU.
In natural images, neighboring CUs usually have similar textures; the optimal depth level of the current CU is strongly correlated with that of its neighboring CUs, and the depth level between neighboring CUs tends not to change significantly. As shown in fig. 2, Cur-CU denotes the current CU; L-CU, LU-CU, U-CU and RU-CU denote the left, upper-left, upper and upper-right CUs of the current CU, respectively; and Col-CU denotes the CU in the previous frame at the same position as the current CU. To analyze the correlation between the depth information of spatio-temporal neighboring CUs, 12 videos were selected from the standard VVC test sequences (Class B to Class E), coded in units of LCUs with all-I-frame coding at Quantization Parameters (QP) 22, 27, 32 and 37. The correlation is described by the probability that, when the depth of L-CU, LU-CU, U-CU, RU-CU or Col-CU is d, Cur-CU also has depth d.
Based on the depth correlation of the spatio-temporal neighboring LCUs, LU-CU and RU-CU are excluded; Col-CU, L-CU and U-CU are finally selected to predict the depth of Cur-CU, with equal weight assigned to each neighboring LCU.
First, the QT coding acceleration method. Using the full depth information of the spatio-temporal neighboring coding blocks produced by quadtree (QT) partitioning is very complex, so to simplify the computation the depth mean of the largest coding block is used instead. The calculation steps are shown in figs. 3 and 4. As shown in fig. 3, an LCU (largest coding unit) is first divided into 4 small coding blocks and the depth mean of each of the 4 CUs is computed, giving four sub-blocks with depth values 2, 2.25, 2.25 and 1; finally, the four depth values are added and averaged, giving a depth of 1.875 for the whole LCU, as shown in fig. 4.
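The two-stage averaging of figs. 3 and 4 can be sketched as follows; the four sub-block means used in the worked example (2, 2.25, 2.25 and 1, with the fourth value inferred from the stated LCU mean of 1.875) are illustrative:

```python
def lcu_depth_mean(subblock_means):
    """Depth of an LCU: the mean of the depth means of its 4 sub-blocks."""
    assert len(subblock_means) == 4, "an LCU is first split into 4 CUs"
    return sum(subblock_means) / 4.0

# Worked example matching the description: (2 + 2.25 + 2.25 + 1) / 4 = 1.875
```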
The relationship between the quadtree partition depth mean and the current LCU is statistically analyzed in the following steps:
and step 1, calculating the number of blocks with different depths for an LCU with the optimal RDO coding, and finally obtaining the CU with the optimal depth. For the example shown in fig. 5, "C" represents the depth of the current CU, which includes three depth 1 blocks, three depth 2 blocks, and one depth 3 block.
And 2, obtaining strong correlation between the current maximum coding unit and the coding block of the previous frame and between the surrounding coded left coding block and the surrounding coded upper coding block through the previous analysis, and respectively obtaining the depth mean values of the three space-time adjacent blocks, as shown in fig. 5.
And 3, weighted summation is carried out on the mean values of the division depths of the coding blocks of the previous frame, the left coding blocks coded around the coding blocks and the quad-tree (QT) of the upper coding block.
Using the depth information computed from these statistics, the encoder decides whether to partition the current coding unit further for the RDO test. The detailed operation is as follows:
(1) When the tested quadtree (QT) partition depth is 0 or 1: because the coding unit size is large, coding units whose optimal depth is 0 or 1 occur very rarely compared with the other depths, so at these two depths the coding unit is always partitioned for further RDO testing.
(2) When the tested QT partition depth is 2: whether the RDO computation of a coding unit is skipped is decided from the QT partition depth means of the co-located coding block of the previous frame and the coded left and upper coding blocks; if these mean depths are all less than 2, the LCU is very likely to be finally partitioned to a depth of 0-2.
(3) When the tested QT partition depth is 3: the decision is made in the same way; if the mean depths are all less than 3, the final LCU depth is very likely 0-3.
To verify relationships (2) and (3) above, experiments were performed with the same settings and test videos. For relationship (2), when the average depths of the neighboring blocks (L-CU, U-CU and Col-CU) are all 2 or less, the probability that the optimal depth of the current coding unit is also 2 or less is high. The same method was used to verify relationship (3). The results show that both relationships hold with more than 80% accuracy, confirming their feasibility. These two relationships help reduce unnecessary block partitioning and RDO processing, speeding up encoding. The specific steps are as follows:
and 1, if the average depth values of the coding blocks at the same position of the previous frame, the coded left coding block and the coded upper coding block are all less than 2, and the test depth of the Quadtree (QT) partition of the current coding unit is 2, ending the Quadtree (QT) partition in advance.
And 2, if the average values of the depths of the coding blocks at the same position of the previous frame, the coded left coding block and the coded upper coding block are all less than 3, and the test depth of the Quadtree (QT) partition of the current coding unit is 3, ending the Quadtree (QT) partition in advance.
The encoding acceleration for quadtree (QT) partitioning is detailed above; the acceleration method for multi-type tree (MTT) partitioning is similar. As shown in figs. 6-8, a largest coding block is first divided into 4 CUs and each of these 4 CUs is divided again into 4 sub-blocks; the multi-type tree (MTT) partition depth values of the 16 sub-blocks are computed, as shown in fig. 6. The mean depth of each of the four CUs is then computed from the MTT partition depths of its sub-blocks, as shown in fig. 7, giving four sub-blocks with depth values 0.625, 0.5, 0.625 and 0. Finally, the four average depths are averaged, giving a depth mean of 0.4375 for the largest coding block, as shown in fig. 8.
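The MTT depth averaging of figs. 6-8 can be sketched as follows; the 16 example depths in the test are hypothetical values chosen so that the four group means reproduce the 0.625, 0.5, 0.625 and 0 of the description:

```python
def lcu_mtt_depth_mean(subblock_depths):
    """MTT depth mean of an LCU: average the 16 sub-block MTT depths in
    groups of 4 (one group per CU), then average the 4 group means."""
    assert len(subblock_depths) == 16
    group_means = [sum(subblock_depths[i:i + 4]) / 4.0 for i in range(0, 16, 4)]
    return sum(group_means) / 4.0
```

Since the four groups are equal-sized, this two-stage mean equals the plain mean of the 16 sub-block depths; the grouping mirrors the figure-by-figure computation in the description.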
The correlation between the MTT partition depth information of the current coding unit and that of the co-located coding block of the previous frame and the surrounding coded left and upper coding blocks is analyzed as follows:
step 1, calculating the number of blocks with the optimal multi-type tree (MTT) partition depth of 0, 1, 2 and 3 when the optimal Quadtree (QT) partition depth of the LCU is 2. And when the optimal Quadtree (QT) partition depth of the LCU is calculated to be 3, the optimal multi-type tree (MTT) partition depth is respectively the number of blocks of 0, 1, 2 and 3. And when the optimal Quadtree (QT) partition depth of the LCU is calculated to be 4, the optimal multi-type tree (MTT) partition depth is respectively the number of blocks of 0, 1 and 2. As shown in fig. 9, "C" denotes the current CU, and is composed of multi-type trees (MTTs) having depths of 0, 1, and 2.
And 2, calculating the average MTT depth of the adjacent LCU blocks of each LCU. As shown in fig. 9, multi-type tree (MTT) partition depth means of the coding block Col-CU of the previous frame, the surrounding coded left coding block L-CU, and the upper coding block U-CU are calculated, respectively.
And 3, distributing equal weight calculation to the obtained MTT depth mean value of the Col-CU of the previous frame and the multi-type tree (MTT) partition depth mean values of the surrounding coded L-CU and the U-CU.
From the above calculation and analysis, a depth-based rule is obtained: the depths of neighboring blocks can be used to decide whether to continue block partitioning and RDO testing. The rule is as follows:
when testing the depth of a CU at 0, if the average of the neighboring blocks is less than 0.1, then the final optimal MTT depth is likely to be 0. To validate the proposed depth-based rule to decide whether to continue partitioning, the same test video and system settings are used. When the average MTT depth of the neighboring blocks is less than 0.1, the probability that the MTT depth of the CU is 0 is tested, and the result shows that the accuracy of the depth-based prediction rule is not less than 80%, which proves to be feasible in reducing the temporal complexity of CU encoding in VVC. In testing MTT depth 0, block partitioning and RDO testing may be terminated early if the average depth of neighboring blocks is less than 0.1.
To verify the performance of the algorithm, it was integrated into the VVC reference software VTM7.0 for the experiments. The coding configuration is the all-intra (AI) mode, and coding performance was tested at QP 22, 27, 32 and 37. To analyze performance, the percentage of saved coding time ΔT and the bit-rate increment ΔBDBR are used as metrics, calculated by formulas (1) and (2).
ΔT = (T_VTM7.0 − T_proposed) / T_VTM7.0 × 100%   (1)
ΔBDBR = (BDBR_proposed − BDBR_VTM7.0) / BDBR_VTM7.0 × 100%   (2)
where T_proposed and BDBR_proposed denote the coding time and bit rate of the proposed fast algorithm, respectively, and T_VTM7.0 and BDBR_VTM7.0 denote the encoding time and average bit rate of VTM7.0.
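The two evaluation metrics can be sketched as follows; the relative form of ΔBDBR is an assumption consistent with the percentage values reported in the text, and the sample numbers in the usage note are illustrative:

```python
def delta_t(t_vtm: float, t_proposed: float) -> float:
    """Formula (1): percentage encoding-time saving relative to VTM7.0."""
    return (t_vtm - t_proposed) / t_vtm * 100.0

def delta_bdbr(bdbr_vtm: float, bdbr_proposed: float) -> float:
    """Formula (2): percentage bit-rate increase relative to VTM7.0."""
    return (bdbr_proposed - bdbr_vtm) / bdbr_vtm * 100.0
```

For instance, an encoder that takes 64.28 s where VTM7.0 takes 100 s yields ΔT = 35.72%, matching the average time saving reported for the invention.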
The comparison between the invention and the original VTM7.0 is shown in Table 1. Compared with the standard algorithm, the invention saves 35.72% of encoding time on average, with little (even negligible) influence on video coding quality: the video bit rate increases by only 0.43%. Moreover, as video resolution increases, the absolute values of BDBR and BD-PSNR tend to decrease; the proposed algorithm is particularly strong in coding rate-distortion performance, effectively improves the real-time performance of VVC encoding, and works especially well on high-resolution video.
TABLE 1 comparison of test results for the invention with VTM7.0
To evaluate the coding performance of the present invention, figs. 10 and 11 show the BDBR increase and the overall coding-time saving of the proposed method compared with the latest H.266/VVC fast algorithms, respectively. Fig. 12 shows example RD curves of the invention and the VTM7.0 algorithm at different QPs; the RD curve of the proposed algorithm substantially overlaps that of VTM7.0, so the coding performance of the invention is not significantly lower than the VVC standard algorithm.
In summary, the invention provides a novel fast method for partitioning VVC intra-frame coding units based on spatio-temporal depth information, exploiting the spatio-temporal correlation of the video sequence: the average depth is analyzed at the LCU level and compared against thresholds using the average depth of spatio-temporal neighboring blocks, reducing unnecessary coding-block partitioning and RDO tests, with outstanding rate-distortion performance.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (6)

1. A VVC coding unit fast dividing method based on space-time depth information, characterized by comprising the following steps:
s1: performing quadtree division on a CU of a maximum coding unit in the VVC, performing multi-type tree division after the quadtree division is finished, and performing correlation analysis between space-time adjacent blocks;
s2: when the quadtree partition depth of the largest coding unit is 2, if the average depth of the spatio-temporally adjacent CUs is less than 2, terminating the quadtree partition test process; otherwise, going to step S6;
s3: when the quadtree partition depth of the largest coding unit is 3, if the average depth of the spatio-temporally adjacent CUs is less than 3, terminating the quadtree partition test process; otherwise, going to step S6;
s4: when the quadtree partition depth of the largest coding unit is between 2 and 4, performing the multi-type tree partition test process on the CU;
s5: when the multi-type tree partition depth of the largest coding unit is 0, if the average depth of the spatio-temporally adjacent CUs is less than 0.1, skipping the multi-type tree partition test process for the CU; otherwise, traversing all partition modes;
s6: deciding the optimal prediction mode according to the rate-distortion cost: calculating the rate-distortion cost of all partition modes of the current CU and selecting the partition mode with the minimum rate-distortion cost as the partition mode of the current CU.
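The decision flow of steps S2-S6 can be summarized in a minimal sketch. All function and variable names below are illustrative, not from the patent, and the thresholds are those stated in the claim:

```python
# Hypothetical sketch of the early-termination logic in steps S2-S6.
def decide_partition(qt_depth, avg_neighbor_qt_depth,
                     mtt_depth, avg_neighbor_mtt_depth):
    """Return which partition test the current CU still needs.

    qt_depth / mtt_depth: partition test depths of the current CU.
    avg_neighbor_*: average depths of the spatio-temporally adjacent
    CUs (Col-CU, L-CU, U-CU), weighted equally.
    """
    # S2/S3: stop the quadtree test early when the neighbors suggest
    # the current depth already suffices.
    if qt_depth in (2, 3) and avg_neighbor_qt_depth < qt_depth:
        return "terminate_qt"          # skip deeper quadtree splits
    # S5: skip multi-type tree (MTT) testing for nearly flat regions.
    if mtt_depth == 0 and avg_neighbor_mtt_depth < 0.1:
        return "skip_mtt"
    # S6: otherwise fall back to the full rate-distortion (RDO) search.
    return "full_rdo"
```

In this sketch the full RDO search is only the fallback, which is where the claimed time saving comes from.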
2. The method of claim 1, wherein the spatio-temporally adjacent CUs comprise the left coding unit L-CU and the upper coding unit U-CU of the current CU, and the coding unit Col-CU at the same position as the current CU in the previous frame; the depth of the current CU is predicted using the Col-CU, the L-CU, and the U-CU, with equal weight assigned to each spatio-temporally adjacent CU.
3. The method of claim 1 or 2, wherein the quadtree partition depth of the largest coding unit is represented by the depth mean of the largest coding block: the largest coding unit is divided into 4 CUs, the depth mean of each of the 4 CUs is calculated, and the 4 depth values are averaged to obtain the depth of the largest coding unit.
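The depth computation in claim 3 reduces to a mean of means. The following is a minimal sketch with illustrative names; the per-block depth lists are assumed inputs:

```python
# Sketch of claim 3: the LCU depth is the mean of the depth means of
# its four quadtree sub-CUs (names are illustrative).
def lcu_depth(sub_cu_depths):
    """sub_cu_depths: four lists, one per sub-CU, each holding the
    optimal depths of the coded blocks inside that sub-CU."""
    assert len(sub_cu_depths) == 4
    # Depth mean of each of the 4 sub-CUs, then the mean of those means.
    means = [sum(d) / len(d) for d in sub_cu_depths]
    return sum(means) / 4
```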
4. The method of claim 3, wherein the relationship between the quadtree partition depth and the average depth of the current largest coding unit is obtained as follows:
step 1, for a largest coding unit coded with optimal RDO, counting the number of blocks at each depth to obtain the CUs with their optimal depths;
step 2, obtaining the depth means of the three spatio-temporally adjacent blocks Col-CU, L-CU, and U-CU respectively;
and step 3, computing the weighted sum of the quadtree partition depth means of the co-located coding block Col-CU of the previous frame, the coded left coding block L-CU, and the upper coding block U-CU.
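Steps 2 and 3 above can be sketched as a weighted sum with the equal weights of claim 2. All names are illustrative, not from the patent:

```python
# Sketch of claim 4, steps 2-3: equal-weight sum of the quadtree
# partition depth means of the three spatio-temporal neighbors.
def qt_neighbor_mean(col_cu, l_cu, u_cu):
    """Each argument: list of optimal quadtree depths of the blocks in
    that neighboring CU (Col-CU, L-CU, U-CU)."""
    means = [sum(d) / len(d) for d in (col_cu, l_cu, u_cu)]
    weights = (1 / 3, 1 / 3, 1 / 3)    # equal weight per neighbor
    return sum(w * m for w, m in zip(weights, means))
```

With equal weights the weighted sum is simply the arithmetic mean of the three neighbor depth means.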
5. The method of claim 1 or 4, wherein whether to further partition the current CU for the RDO test is decided from the calculated quadtree partition depth as follows:
(1) when the tested quadtree partition depth is 0 or 1, partitioning the CU for a further RDO test;
(2) when the tested quadtree partition depth is 2, calculating the quadtree partition depth means of the spatio-temporally adjacent blocks Col-CU, L-CU, and U-CU; if the depth means are all less than 2 while the tested quadtree partition depth of the current CU is 2, ending the quadtree partition early;
(3) and when the tested quadtree partition depth is 3, calculating the quadtree partition depth means of the spatio-temporally adjacent blocks Col-CU, L-CU, and U-CU; if the depth means are all less than 3 while the tested quadtree partition depth of the current coding unit is 3, ending the quadtree partition early.
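Note that claim 5 tests each neighbor's depth mean individually ("all less than"), which is stricter than comparing a single combined average. A minimal sketch of that predicate, with illustrative names:

```python
# Sketch of the claim-5 early-termination rule for the quadtree test.
def terminate_qt_early(test_depth, neighbor_depth_means):
    """neighbor_depth_means: quadtree partition depth means of the
    Col-CU, L-CU, and U-CU, checked one by one."""
    if test_depth in (0, 1):
        return False                   # always partition further for RDO
    if test_depth in (2, 3):
        # End the quadtree split early only when every neighbor's mean
        # depth is below the tested depth.
        return all(m < test_depth for m in neighbor_depth_means)
    return False
```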
6. The VVC coding unit fast partitioning method based on spatio-temporal depth information of claim 5, wherein the correlation between the multi-type tree partition depth information of the current CU and the spatio-temporally adjacent blocks Col-CU, L-CU, and U-CU is analyzed as follows:
1) when the optimal quadtree partition depth of the largest coding unit is 2, counting the number of blocks whose optimal multi-type tree (MTT) partition depth is 0, 1, 2, or 3; when the optimal quadtree partition depth is 3, counting the number of blocks whose optimal MTT partition depth is 0, 1, 2, or 3; when the optimal quadtree partition depth is 4, counting the number of blocks whose optimal MTT partition depth is 0, 1, or 2;
2) calculating the multi-type tree partition depth means of the spatio-temporally adjacent blocks Col-CU, L-CU, and U-CU of each largest coding unit;
3) and assigning equal weights to the obtained multi-type tree partition depth means of the spatio-temporally adjacent blocks Col-CU, L-CU, and U-CU in the calculation.
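The counting in step 1) of claim 6 is a per-quadtree-depth histogram of optimal MTT depths. A minimal sketch, assuming each coded block is represented as a (quadtree depth, MTT depth) pair; all names are illustrative:

```python
from collections import Counter

# Sketch of claim 6, step 1): count, for each optimal quadtree depth,
# how many blocks end up at each multi-type-tree (MTT) depth.
def mtt_depth_histogram(blocks):
    """blocks: list of (qt_depth, mtt_depth) pairs for the optimally
    coded blocks of one largest coding unit."""
    hist = {}
    for qt_d, mtt_d in blocks:
        hist.setdefault(qt_d, Counter())[mtt_d] += 1
    return hist
```

Such statistics are what motivate the depth correlation exploited by the neighbor-mean comparisons in the earlier claims.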
CN202111556624.8A 2021-12-18 2021-12-18 VCC coding unit fast dividing method based on space-time depth information Pending CN114257819A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111556624.8A CN114257819A (en) 2021-12-18 2021-12-18 VCC coding unit fast dividing method based on space-time depth information

Publications (1)

Publication Number Publication Date
CN114257819A (en)

Family

ID=80795772

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111556624.8A Pending CN114257819A (en) 2021-12-18 2021-12-18 VCC coding unit fast dividing method based on space-time depth information

Country Status (1)

Country Link
CN (1) CN114257819A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115190318A (en) * 2022-09-13 2022-10-14 宁波康达凯能医疗科技有限公司 Video coding method and system based on content correlation
CN115190318B (en) * 2022-09-13 2023-02-14 宁波康达凯能医疗科技有限公司 Video coding method and system based on content correlation


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination