CN116033153A - Method and system for rapidly dividing coding units under VVC standard - Google Patents


Info

Publication number
CN116033153A
CN116033153A
Authority
CN
China
Prior art keywords
division
data
horizontal
block
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211669820.0A
Other languages
Chinese (zh)
Inventor
伏长虹 (Fu Changhong)
闫依婷 (Yan Yiting)
洪弘 (Hong Hong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202211669820.0A priority Critical patent/CN116033153A/en
Publication of CN116033153A publication Critical patent/CN116033153A/en
Pending legal-status Critical Current

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a method and a system for rapidly dividing coding units (CUs) under the VVC standard, wherein the method comprises the following steps: step 1, collecting data while encoding different video sequences: the pixel values of each 32×32 CU, the quantization parameter, the best intra prediction mode, and the final partition label; step 2, training the constructed convolutional neural network on the data saved in step 1 to obtain a model; step 3, during actual encoding, for each CU of size 32×32, calling the trained model: the gray values of the CU, the quantization parameter, and the best intra prediction mode are input into the trained network, which outputs the final partition label, where each label corresponds to a different partition mode. The fast CU partitioning method provided by the invention reduces encoding complexity while preserving video quality.

Description

Method and system for rapidly dividing coding units under VVC standard
Technical Field
The invention belongs to the field of video coding and decoding, and particularly relates to a method and a system for rapidly dividing coding units under the Versatile Video Coding (VVC) standard.
Background
The rapid development of multimedia devices, the continual increase in demand for high quality video applications, and the rapid growth in the volume of video data present challenges to existing video compression techniques.
The partition modes specified by the VVC standard include quadtree partitioning, binary tree partitioning, and ternary tree partitioning, where both binary tree and ternary tree partitioning are further subdivided into a horizontal and a vertical variant. In addition, not splitting at all is itself treated as a partition option. Therefore, for most CUs there are 6 candidate partition modes. To determine the final optimal partition, all candidates must be traversed through a rate distortion optimization (Rate Distortion Optimization, RDO) process, and the partition finally selected is the one with the smallest RD cost (RD-cost). The CU partition mode is determined by computing costs recursively from the top down and then selecting from the bottom up. This exhaustive search yields good partitioning decisions and high coding efficiency, but its computational complexity is very high.
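To make the candidate set concrete, the sketch below (an illustrative Python helper, not code from the patent) enumerates the six partition candidates for a 32×32 CU and the sub-block sizes each produces:

```python
# Illustrative enumeration of the six VVC partition candidates for a
# 32x32 CU (no split, quadtree, horizontal/vertical binary tree,
# horizontal/vertical ternary tree). Sizes are (width, height) tuples.
def partition_candidates(w, h):
    return {
        "NO_SPLIT": [(w, h)],
        "QT":       [(w // 2, h // 2)] * 4,
        "BT_H":     [(w, h // 2)] * 2,                        # horizontal binary
        "BT_V":     [(w // 2, h)] * 2,                        # vertical binary
        "TT_H":     [(w, h // 4), (w, h // 2), (w, h // 4)],  # horizontal ternary
        "TT_V":     [(w // 4, h), (w // 2, h), (w // 4, h)],  # vertical ternary
    }

modes = partition_candidates(32, 32)
print(len(modes))      # 6 candidate modes
print(modes["TT_H"])   # [(32, 8), (32, 16), (32, 8)]
```

An exhaustive RDO search evaluates every entry of this dictionary (recursively, for each sub-block); the method of the invention aims to prune it for 32×32 CUs.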
In general, fast CU partitioning approaches based on conventional handcrafted features focus on the texture information of the coding block itself. Such approaches typically use mathematical tools such as gradients and variances to find a relationship between features extracted from the CU and the final partitioning result. Although the partitioning of a CU is strongly correlated with such extracted features, a problem remains: for a CU with complex texture, these methods cannot reliably decide whether further partitioning is required. In addition, computing variances and gradients over the CU introduces extra computation of its own.
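As a hedged illustration of the kind of handcrafted features such heuristics rely on (this snippet is for exposition only and does not appear in the patent), block variance and mean gradient magnitude can be computed as follows:

```python
import numpy as np

# Handcrafted texture features of the kind used by traditional fast
# partitioning heuristics: block variance and mean gradient magnitude.
def texture_features(block):
    block = block.astype(np.float64)
    gy, gx = np.gradient(block)            # per-pixel finite differences
    grad_mag = np.sqrt(gx ** 2 + gy ** 2)
    return float(block.var()), float(grad_mag.mean())

flat = np.full((32, 32), 128)              # uniform block: no texture
var, grad = texture_features(flat)
print(var, grad)                           # 0.0 0.0
```

A flat block scores zero on both features (no split needed), while a textured block scores high; the weakness noted above is that high scores alone do not say *how* the block should be split.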
In recent years, deep learning has been widely used for image classification problems. Therefore, it has become a major trend to deal with CU fast partitioning problem by adopting a convolutional neural network-based manner.
Disclosure of Invention
The invention aims to provide a method and a system for rapidly dividing coding units under the Versatile Video Coding (VVC) standard using a convolutional neural network, so as to reduce computational complexity.
The technical solution for realizing the purpose of the invention is as follows: in a first aspect, the present invention provides a method for rapidly dividing coding units under VVC standard, including:
step 1, selecting the 12 standard test sequences provided for VVC, selecting 15 frames from each video sequence for data acquisition, and saving during acquisition the data that carry a horizontal or vertical division mark, where the division mark specifically means: if the coding result contains a 32×H CU with H smaller than 32, the sample is marked as horizontal division; if the coding result contains a W×32 CU with W smaller than 32, the sample is marked as vertical division; the specific data stored include: the gray pixel values of the 32×32 CU, the quantization parameter used when encoding the current CU, the best intra prediction mode selected for the current CU, and the best label value assigned to the current CU in the original encoder, the labels being classified into three types: horizontal division, vertical division, and other division scenarios;
step 2, inputting the collected data into the network in batches: the 32×32 CU passes through four convolution layers to form a 64×1×1 global feature; the QP with dimension 1×1 and the best intra prediction mode with dimension 1×1 are passed through fully connected layers and expanded so that together they form a 32×1×1 feature; this is concatenated with the convolved 64×1×1 global feature, making the concatenated dimension 96×1×1; finally, two 1×1 convolution layers output the prediction probability of the current CU partition, which is matched against the collected data labels, thereby completing network training;
and step 3, in the actual encoding process, when the encoder encodes a CU of size 32×32, the convolutional neural network model is loaded and the gray pixel values of the 32×32 CU are input; the network model gives the final partition label, and the partition mode is decided in advance according to this label.
In a second aspect, the present invention provides a system for rapidly dividing coding units under the VVC standard, including:
the data acquisition module is used for acquiring data for training the convolutional neural network: selecting the 12 standard test sequences provided for VVC, selecting 15 frames from each video sequence for data acquisition, and saving during acquisition the data that carry a horizontal or vertical division mark, where the division mark specifically means: if the coding result contains a 32×H CU with H smaller than 32, the sample is marked as horizontal division; if the coding result contains a W×32 CU with W smaller than 32, the sample is marked as vertical division; the specific data stored include: the gray pixel values of the 32×32 CU, the quantization parameter used when encoding the current CU, the best intra prediction mode selected for the current CU, and the label value corresponding to the current CU, the labels being classified into three categories: horizontal division, vertical division, and other division scenarios;
the network training module is used for inputting the collected data into the network in batches: the 32×32 CU passes through four convolution layers to form a 64×1×1 global feature; the QP with dimension 1×1 and the best intra prediction mode with dimension 1×1 are passed through fully connected layers and expanded so that together they form a 32×1×1 feature; this is concatenated with the convolved 64×1×1 global feature, making the concatenated dimension 96×1×1; finally, two 1×1 convolution layers output the prediction probability of the current CU partition, which is matched against the collected data labels, thereby completing network training;
the model loading module is used for inputting, in the actual encoding process, the pixel values, the quantization parameter, and the best intra prediction mode of a 32×32 CU into the trained network, which outputs the final partition label of the CU, used to decide the partition mode of the CU in advance.
In a third aspect, the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of the first aspect when the program is executed.
In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of the first aspect.
In a fifth aspect, the invention provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
Compared with the existing algorithm, the method has the following advantages:
(1) Compared with fast CU partitioning based on traditional features, the convolutional-neural-network-based approach does not require manually screening features correlated with the partitioning result; only the gray pixel values of the CU need to be input into the network;
(2) The convolutional-neural-network-based fast CU partitioning method does not introduce a large amount of extra computation; the loading and inference of the convolutional neural network occupy only a small proportion of the total encoding time.
Drawings
Fig. 1 is a flowchart of a CU fast partitioning method based on convolutional neural network.
Fig. 2 is a multi-type tree partition structure under the VVC standard.
Fig. 3 is a convolutional neural network structure diagram under a CU fast partitioning algorithm based on a convolutional neural network.
Detailed Description
Referring to fig. 1 to fig. 3, the present invention provides a method for quickly dividing coding units (CUs) under the VVC standard; for a CU of size 32×32, the specific processing steps for quickly deciding the partition mode are as follows:
step 1, selecting N frames of data from different video sequences for data acquisition, and saving during acquisition the data with clear horizontal or vertical division marks; the specific data stored include: the gray pixel values of the 32×32 CU, the quantization parameter used when encoding the current CU, the best intra prediction mode selected for the current CU, and the label value corresponding to the current CU, the labels being classified into three categories: horizontal division, vertical division, and other division scenarios;
step 2, inputting the collected data into the network in batches: the 32×32 CU passes through four convolution layers to form a 64×1×1 global feature; the QP with dimension 1×1 and the best intra prediction mode with dimension 1×1 are passed through fully connected layers and expanded so that together they form a 32×1×1 feature; this is concatenated with the convolved 64×1×1 global feature, making the concatenated dimension 96×1×1; finally, two 1×1 convolution layers output the prediction probability of the current CU partition, which is matched against the collected data labels, thereby completing network training;
and step 3, in the actual encoding process, when the encoder encodes a CU of size 32×32, the convolutional neural network model is loaded and the gray pixel values of the 32×32 CU are input; the network model gives the final partition label, and the partition mode is decided in advance according to this label.
For the data acquisition in step 1: if the final partition result shows clear horizontal division traces, i.e. it contains blocks of size 32×H with H smaller than 32 (for example 32×16 or 32×8), the block was divided by a horizontal binary tree or a horizontal ternary tree, so its label is set to 0 and the sample is collected; if the final partition result shows clear vertical division traces, i.e. it contains blocks of size W×32 with W smaller than 32 (for example 16×32 or 8×32), the block was divided by a vertical binary tree or a vertical ternary tree, so its label is set to 1 and the sample is collected; all other cases are collected with label 2.
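A minimal sketch of this labeling rule (illustrative Python; the (width, height) size convention and the mapping 0 = horizontal, 1 = vertical, 2 = other are taken from the description above):

```python
# Assign a training label to a 32x32 CU from the block sizes that the
# full RDO search produced inside it. Sizes are (width, height) pairs.
LABEL_HORIZONTAL, LABEL_VERTICAL, LABEL_OTHER = 0, 1, 2

def partition_label(sub_blocks):
    if any(w == 32 and h < 32 for w, h in sub_blocks):
        return LABEL_HORIZONTAL   # e.g. 32x16 or 32x8 pieces
    if any(w < 32 and h == 32 for w, h in sub_blocks):
        return LABEL_VERTICAL     # e.g. 16x32 or 8x32 pieces
    return LABEL_OTHER            # quadtree split, no split, etc.

print(partition_label([(32, 16), (32, 16)]))          # 0: horizontal binary
print(partition_label([(8, 32), (16, 32), (8, 32)]))  # 1: vertical ternary
print(partition_label([(16, 16)] * 4))                # 2: other (quadtree)
```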
For the four convolution operations in step 2, the four convolution layers are set to use the same kernel size and the same stride.
The features obtained after the convolution layers in step 2 are connected with the quantization parameter QP and the best intra prediction mode PredMode through fully connected layers.
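The patent does not state the kernel size or stride of the four convolution layers; as a hedged sketch, one configuration consistent with the description (identical kernels and strides, 32×32 reduced to 1×1 in exactly four layers) is a 3×3 kernel with stride 2 and no padding, which a simple shape trace confirms:

```python
# Trace the spatial size of a 32x32 input through four identical
# unpadded convolutions. kernel=3, stride=2 is one (assumed) choice
# that collapses 32x32 to 1x1 in exactly four layers: 32→15→7→3→1.
def conv_out(size, kernel, stride, padding=0):
    return (size + 2 * padding - kernel) // stride + 1

size, trace = 32, []
for _ in range(4):
    size = conv_out(size, kernel=3, stride=2)
    trace.append(size)
print(trace)   # [15, 7, 3, 1]
```

With 64 output channels in the last layer this yields the 64×1×1 global feature named in step 2; other kernel/stride combinations are possible, so this is only a plausibility check, not the patented configuration.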
The present invention will be described in detail with reference to examples.
Examples
This embodiment shows a method for rapidly dividing coding units under the VVC standard; its flow is shown in fig. 1, and its steps include:
step 1: collecting data, namely selecting 12 video sequences (BQTerrace, BasketballDrill, BasketballDrive, FourPeople, Bubble, Johnny, ChinaSpeed, BQSquare, BasketballPass, Cactus, RaceHorses, BQMall), taking 1 frame every 8 frames for a total of 15 frames per video, and collecting data in the manner described in step 1 above;
step 2: inputting the data set into the network for training; after training, a network model for fast partition prediction of VVC coding blocks is obtained;
step 3: model deployment: in the actual VVC encoding process, for each CU of size 32×32, the luminance values, the QP value, and the best intra prediction mode value of the CU are input into the trained network to obtain a prediction of the best partition of the current CU, and the partition mode is decided in advance according to the network output. Specifically:
If the current CU size is 32×32, the CU is input into the network for partition prediction. If horizontal division is predicted, only partition coding in the horizontal direction is performed; if vertical division is predicted, only partition coding in the vertical direction is performed; if neither condition is met, the CU is classified as the other case and the partition mode is selected following the original VTM flow. If the current CU size is not 32×32, the CU is encoded according to the original VTM partition procedure.
The performance of the proposed method is compared with that of the original VTM10.0 model; specific experimental results are given in Table 1, using three measures: PSNR, BDBR, and encoding time saving. BDBR denotes the bit-rate saving relative to the original VTM10.0 method at equal objective video quality; the encoding time column gives the saving in encoding time relative to the original VTM10.0 method.
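BDBR is the Bjøntegaard delta bit-rate: each rate-distortion curve is fitted with a cubic polynomial of log-rate against PSNR, and the average gap between the curves is integrated over the overlapping PSNR range. A minimal numpy sketch of the standard four-point procedure (for illustration only; this code is not from the patent):

```python
import numpy as np

# Bjøntegaard delta bit-rate: average % bit-rate difference between two
# RD curves (rate in kbps, psnr in dB), integrated over the overlapping
# PSNR range after cubic-polynomial fitting of log10(rate) vs PSNR.
def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    p_ref = np.polyfit(psnr_ref, np.log10(rate_ref), 3)
    p_test = np.polyfit(psnr_test, np.log10(rate_test), 3)
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (10 ** avg_diff - 1) * 100   # percent; > 0 means test uses more bits

# Identical curves give a BD-rate of 0%.
r = [1000, 2000, 4000, 8000]
p = [30.0, 33.0, 36.0, 39.0]
print(round(bd_rate(r, p, r, p), 6))
```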
TABLE 1 Comparison of the coding results of the proposed method and the VTM10.0 method
(Table 1 is provided as an image in the original publication; the numerical results are not reproduced here.)
The CU rapid dividing method provided by the invention can reduce the complexity of coding while guaranteeing the video quality.
The foregoing is merely a preferred embodiment of the invention; it should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications are also intended to fall within the scope of the present invention.

Claims (9)

1. A method for rapidly dividing coding units under the VVC standard, characterized by comprising the following steps:
step 1, selecting the 12 standard test sequences provided for VVC, selecting 15 frames from each video sequence for data acquisition, and saving during acquisition the data that carry a horizontal or vertical division mark, where the division mark specifically means: if the coding result contains a 32×H CU with H smaller than 32, the sample is marked as horizontal division; if the coding result contains a W×32 CU with W smaller than 32, the sample is marked as vertical division; the specific data stored include: the gray pixel values of the 32×32 CU, the quantization parameter used when encoding the current CU, the best intra prediction mode selected for the current CU, and the best label value assigned to the current CU in the original encoder, the labels being classified into three types: horizontal division, vertical division, and other division scenarios;
step 2, inputting the collected data into the network in batches: the 32×32 CU passes through four convolution layers to form a 64×1×1 global feature; the QP with dimension 1×1 and the best intra prediction mode with dimension 1×1 are passed through fully connected layers and expanded so that together they form a 32×1×1 feature; this is concatenated with the convolved 64×1×1 global feature, making the concatenated dimension 96×1×1; finally, two 1×1 convolution layers output the prediction probability of the current CU partition, which is matched against the collected data labels, thereby completing network training;
and step 3, in the actual encoding process, when the encoder encodes a CU of size 32×32, the convolutional neural network model is loaded and the gray pixel values of the 32×32 CU are input; the network model gives the final partition label, and the partition mode is decided in advance according to this label.
2. The method according to claim 1, wherein for the data collection in step 1: if the final partition result contains horizontal division traces, i.e. blocks of size 32×H with H smaller than 32, the block was divided by a horizontal binary tree or a horizontal ternary tree, so its label is set to 0 and the sample is collected; if the final partition result contains vertical division traces, i.e. blocks of size W×32 with W smaller than 32, the block was divided by a vertical binary tree or a vertical ternary tree, so its label is set to 1 and the sample is collected; all other cases are collected with label 2.
3. The method according to claim 1, wherein the four convolution operations of step 2 use the same kernel size and the same stride.
4. A system for rapidly dividing coding units under the VVC standard, characterized by comprising:
the data acquisition module is used for acquiring data for training the convolutional neural network: selecting the 12 standard test sequences provided for VVC, selecting 15 frames from each video sequence for data acquisition, and saving during acquisition the data that carry a horizontal or vertical division mark, where the division mark specifically means: if the coding result contains a 32×H CU with H smaller than 32, the sample is marked as horizontal division; if the coding result contains a W×32 CU with W smaller than 32, the sample is marked as vertical division; the specific data stored include: the gray pixel values of the 32×32 CU, the quantization parameter used when encoding the current CU, the best intra prediction mode selected for the current CU, and the label value corresponding to the current CU, the labels being classified into three categories: horizontal division, vertical division, and other division scenarios;
the network training module is used for inputting the collected data into the network in batches: the 32×32 CU passes through four convolution layers to form a 64×1×1 global feature; the QP with dimension 1×1 and the best intra prediction mode with dimension 1×1 are passed through fully connected layers and expanded so that together they form a 32×1×1 feature; this is concatenated with the convolved 64×1×1 global feature, making the concatenated dimension 96×1×1; finally, two 1×1 convolution layers output the prediction probability of the current CU partition, which is matched against the collected data labels, thereby completing network training;
the model loading module is used for inputting, in the actual encoding process, the pixel values, the quantization parameter, and the best intra prediction mode of a 32×32 CU into the trained network, which outputs the final partition label of the CU, used to decide the partition mode of the CU in advance.
5. The system according to claim 4, wherein for the data acquisition module: if the final partition result contains horizontal division traces, i.e. blocks of size 32×H with H smaller than 32, the block was divided by a horizontal binary tree or a horizontal ternary tree, so its label is set to 0 and the sample is collected; if the final partition result contains vertical division traces, i.e. blocks of size W×32 with W smaller than 32, the block was divided by a vertical binary tree or a vertical ternary tree, so its label is set to 1 and the sample is collected; all other cases are collected with label 2.
6. The system according to claim 4, wherein the four convolution operations use the same kernel size and the same stride.
7. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1-3 when executing the program.
8. A computer readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method of any one of claims 1-3.
9. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1-3.
CN202211669820.0A 2022-12-25 2022-12-25 Method and system for rapidly dividing coding units under VVC standard Pending CN116033153A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211669820.0A CN116033153A (en) 2022-12-25 2022-12-25 Method and system for rapidly dividing coding units under VVC standard


Publications (1)

Publication Number Publication Date
CN116033153A true CN116033153A (en) 2023-04-28

Family

ID=86072138

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211669820.0A Pending CN116033153A (en) 2022-12-25 2022-12-25 Method and system for rapidly dividing coding units under VVC standard

Country Status (1)

Country Link
CN (1) CN116033153A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117014610A (en) * 2023-10-07 2023-11-07 华侨大学 Method and device for rapidly dividing intra-frame CUs of H.266VVC screen content based on multitask learning
CN117014610B (en) * 2023-10-07 2023-12-29 华侨大学 Method and device for rapidly dividing intra-frame CUs of H.266VVC screen content based on multitask learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination