CN116033153A - Method and system for rapidly dividing coding units under VVC standard - Google Patents


Info

Publication number
CN116033153A
CN116033153A
Authority
CN
China
Prior art keywords
division
data
horizontal
block
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211669820.0A
Other languages
Chinese (zh)
Inventor
伏长虹 (Fu Changhong)
闫依婷 (Yan Yiting)
洪弘 (Hong Hong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202211669820.0A priority Critical patent/CN116033153A/en
Publication of CN116033153A publication Critical patent/CN116033153A/en
Pending legal-status Critical Current

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a method and a system for rapidly dividing coding units (CUs) under the VVC standard, wherein the method comprises the following steps: step 1, collecting data while encoding different video sequences: the pixel values of each 32×32 CU, the quantization parameter, the best intra prediction mode, and the final partition label; step 2, training the constructed convolutional neural network on the data saved in step 1 to obtain a model; step 3, during actual encoding, for each CU of size 32×32, calling the trained model: the gray values of the CU, the quantization parameter, and the best intra prediction mode are input into the trained network, which outputs the final partition label, where each label corresponds to a different partition mode. The fast CU partitioning method provided by the invention reduces encoding complexity while preserving video quality.

Description

Method and system for rapidly dividing coding units under VVC standard
Technical Field
The invention belongs to the field of video coding and decoding, and particularly relates to a method and a system for rapidly dividing coding units under the Versatile Video Coding (VVC) standard.
Background
The rapid development of multimedia devices, the continual increase in demand for high quality video applications, and the rapid growth in the volume of video data present challenges to existing video compression techniques.
The partition modes specified by the VVC standard include quadtree partitioning, binary tree partitioning, and ternary tree partitioning, where both binary tree and ternary tree partitioning are further subdivided into a horizontal and a vertical variant. In addition, not splitting at all is itself treated as a partition option. Therefore, for most CUs there are 6 candidate partition modes. To determine the final optimal partition, all candidates must be traversed through a rate distortion optimization (Rate Distortion Optimization, RDO) process, and the partition finally selected is the one with the smallest RD cost (RD-cost). The CU partition mode is determined by computing costs recursively from the top down and then selecting from the bottom up. This exhaustive search yields good partitioning decisions and high coding efficiency, but its computational complexity is very high.
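To make the candidate set concrete, the sketch below (an illustrative Python helper, not code from the patent) enumerates the six partition candidates for a 32×32 CU and the sub-block sizes each produces:

```python
# Illustrative enumeration of the six VVC partition candidates for a
# 32x32 CU (no split, quadtree, horizontal/vertical binary tree,
# horizontal/vertical ternary tree). Sizes are (width, height) tuples.
def partition_candidates(w, h):
    return {
        "NO_SPLIT": [(w, h)],
        "QT":       [(w // 2, h // 2)] * 4,
        "BT_H":     [(w, h // 2)] * 2,                        # horizontal binary
        "BT_V":     [(w // 2, h)] * 2,                        # vertical binary
        "TT_H":     [(w, h // 4), (w, h // 2), (w, h // 4)],  # horizontal ternary
        "TT_V":     [(w // 4, h), (w // 2, h), (w // 4, h)],  # vertical ternary
    }

modes = partition_candidates(32, 32)
print(len(modes))      # 6 candidate modes
print(modes["TT_H"])   # [(32, 8), (32, 16), (32, 8)]
```

An exhaustive RDO search evaluates every entry of this dictionary (recursively, for each sub-block); the method of the invention aims to prune it for 32×32 CUs.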
In general, fast CU partitioning approaches based on conventional handcrafted features focus on the texture information of the coding block itself. Such approaches typically use mathematical tools such as gradients and variances to find a relationship between features extracted from the CU and the final partitioning result. Although the partitioning of a CU is strongly correlated with such extracted features, a problem remains: for a CU with complex texture, these methods cannot reliably decide whether further partitioning is required. In addition, computing variances and gradients over the CU introduces extra computation of its own.
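As a hedged illustration of the kind of handcrafted features such heuristics rely on (this snippet is for exposition only and does not appear in the patent), block variance and mean gradient magnitude can be computed as follows:

```python
import numpy as np

# Handcrafted texture features of the kind used by traditional fast
# partitioning heuristics: block variance and mean gradient magnitude.
def texture_features(block):
    block = block.astype(np.float64)
    gy, gx = np.gradient(block)            # per-pixel finite differences
    grad_mag = np.sqrt(gx ** 2 + gy ** 2)
    return float(block.var()), float(grad_mag.mean())

flat = np.full((32, 32), 128)              # uniform block: no texture
var, grad = texture_features(flat)
print(var, grad)                           # 0.0 0.0
```

A flat block scores zero on both features (no split needed), while a textured block scores high; the weakness noted above is that high scores alone do not say *how* the block should be split.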
In recent years, deep learning has been widely used for image classification problems. Therefore, it has become a major trend to deal with CU fast partitioning problem by adopting a convolutional neural network-based manner.
Disclosure of Invention
The invention aims to provide a method and a system for rapidly dividing coding units under the Versatile Video Coding (VVC) standard using a convolutional neural network, so as to reduce computational complexity.
The technical solution for realizing the purpose of the invention is as follows: in a first aspect, the present invention provides a method for rapidly dividing coding units under VVC standard, including:
step 1, selecting the 12 standard test sequences provided for VVC, selecting 15 frames from each video sequence for data acquisition, and saving during acquisition the data that carry a horizontal or vertical division mark, where the division mark specifically means: if the coding result contains a 32×H CU with H smaller than 32, the sample is marked as horizontal division; if the coding result contains a W×32 CU with W smaller than 32, the sample is marked as vertical division; the specific data stored include: the gray pixel values of the 32×32 CU, the quantization parameter used when encoding the current CU, the best intra prediction mode selected for the current CU, and the best label value assigned to the current CU in the original encoder, the labels being classified into three types: horizontal division, vertical division, and other division scenarios;
step 2, inputting the collected data into the network in batches: the 32×32 CU passes through four convolution layers to form a 64×1×1 global feature; the QP with dimension 1×1 and the best intra prediction mode with dimension 1×1 are passed through fully connected layers and expanded so that together they form a 32×1×1 feature; this is concatenated with the convolved 64×1×1 global feature, making the concatenated dimension 96×1×1; finally, two 1×1 convolution layers output the prediction probability of the current CU partition, which is matched against the collected data labels, thereby completing network training;
and step 3, in the actual encoding process, when the encoder encodes a CU of size 32×32, the convolutional neural network model is loaded and the gray pixel values of the 32×32 CU are input; the network model gives the final partition label, and the partition mode is decided in advance according to this label.
In a second aspect, the present invention provides a system for rapidly dividing coding units under the VVC standard, including:
the data acquisition module is used for acquiring data for training the convolutional neural network: selecting the 12 standard test sequences provided for VVC, selecting 15 frames from each video sequence for data acquisition, and saving during acquisition the data that carry a horizontal or vertical division mark, where the division mark specifically means: if the coding result contains a 32×H CU with H smaller than 32, the sample is marked as horizontal division; if the coding result contains a W×32 CU with W smaller than 32, the sample is marked as vertical division; the specific data stored include: the gray pixel values of the 32×32 CU, the quantization parameter used when encoding the current CU, the best intra prediction mode selected for the current CU, and the label value corresponding to the current CU, the labels being classified into three categories: horizontal division, vertical division, and other division scenarios;
the network training module is used for inputting the collected data into the network in batches: the 32×32 CU passes through four convolution layers to form a 64×1×1 global feature; the QP with dimension 1×1 and the best intra prediction mode with dimension 1×1 are passed through fully connected layers and expanded so that together they form a 32×1×1 feature; this is concatenated with the convolved 64×1×1 global feature, making the concatenated dimension 96×1×1; finally, two 1×1 convolution layers output the prediction probability of the current CU partition, which is matched against the collected data labels, thereby completing network training;
the model loading module is used for inputting, in the actual encoding process, the pixel values, the quantization parameter, and the best intra prediction mode of a 32×32 CU into the trained network, which outputs the final partition label of the CU, used to decide the partition mode of the CU in advance.
In a third aspect, the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of the first aspect when the program is executed.
In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of the first aspect.
In a fifth aspect, the invention provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
Compared with the existing algorithm, the method has the following advantages:
(1) Compared with fast CU partitioning based on traditional features, the convolutional-neural-network-based approach does not require manually screening features correlated with the partitioning result; only the gray pixel values of the CU need to be input into the network;
(2) The convolutional-neural-network-based fast CU partitioning method does not introduce a large amount of extra computation; the loading and inference of the convolutional neural network occupy only a small proportion of the total encoding time.
Drawings
Fig. 1 is a flowchart of a CU fast partitioning method based on convolutional neural network.
Fig. 2 is a multi-type tree partition structure under the VVC standard.
Fig. 3 is a convolutional neural network structure diagram under a CU fast partitioning algorithm based on a convolutional neural network.
Detailed Description
Referring to fig. 1 to fig. 3, the present invention provides a method for quickly dividing coding units (CUs) under the VVC standard; for a CU of size 32×32, the specific processing steps for quickly deciding the partition mode are as follows:
step 1, selecting N frames of data from different video sequences for data acquisition, and saving during acquisition the data with clear horizontal or vertical division marks; the specific data stored include: the gray pixel values of the 32×32 CU, the quantization parameter used when encoding the current CU, the best intra prediction mode selected for the current CU, and the label value corresponding to the current CU, the labels being classified into three categories: horizontal division, vertical division, and other division scenarios;
step 2, inputting the collected data into the network in batches: the 32×32 CU passes through four convolution layers to form a 64×1×1 global feature; the QP with dimension 1×1 and the best intra prediction mode with dimension 1×1 are passed through fully connected layers and expanded so that together they form a 32×1×1 feature; this is concatenated with the convolved 64×1×1 global feature, making the concatenated dimension 96×1×1; finally, two 1×1 convolution layers output the prediction probability of the current CU partition, which is matched against the collected data labels, thereby completing network training;
and step 3, in the actual encoding process, when the encoder encodes a CU of size 32×32, the convolutional neural network model is loaded and the gray pixel values of the 32×32 CU are input; the network model gives the final partition label, and the partition mode is decided in advance according to this label.
For the data acquisition in step 1: if the final partition result shows clear horizontal division traces, i.e. it contains blocks of size 32×H with H smaller than 32 (for example 32×16 or 32×8), the block was divided by a horizontal binary tree or a horizontal ternary tree, so its label is set to 0 and the sample is collected; if the final partition result shows clear vertical division traces, i.e. it contains blocks of size W×32 with W smaller than 32 (for example 16×32 or 8×32), the block was divided by a vertical binary tree or a vertical ternary tree, so its label is set to 1 and the sample is collected; all other cases are collected with label 2.
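A minimal sketch of this labeling rule (illustrative Python; the (width, height) size convention and the mapping 0 = horizontal, 1 = vertical, 2 = other are taken from the description above):

```python
# Assign a training label to a 32x32 CU from the block sizes that the
# full RDO search produced inside it. Sizes are (width, height) pairs.
LABEL_HORIZONTAL, LABEL_VERTICAL, LABEL_OTHER = 0, 1, 2

def partition_label(sub_blocks):
    if any(w == 32 and h < 32 for w, h in sub_blocks):
        return LABEL_HORIZONTAL   # e.g. 32x16 or 32x8 pieces
    if any(w < 32 and h == 32 for w, h in sub_blocks):
        return LABEL_VERTICAL     # e.g. 16x32 or 8x32 pieces
    return LABEL_OTHER            # quadtree split, no split, etc.

print(partition_label([(32, 16), (32, 16)]))          # 0: horizontal binary
print(partition_label([(8, 32), (16, 32), (8, 32)]))  # 1: vertical ternary
print(partition_label([(16, 16)] * 4))                # 2: other (quadtree)
```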
For the four convolution operations in step 2, the four convolution layers are set to use the same kernel size and the same stride.
The features obtained after the convolution layers in step 2 are connected with the quantization parameter QP and the best intra prediction mode PredMode through fully connected layers.
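The patent does not state the kernel size or stride of the four convolution layers; as a hedged sketch, one configuration consistent with the description (identical kernels and strides, 32×32 reduced to 1×1 in exactly four layers) is a 3×3 kernel with stride 2 and no padding, which a simple shape trace confirms:

```python
# Trace the spatial size of a 32x32 input through four identical
# unpadded convolutions. kernel=3, stride=2 is one (assumed) choice
# that collapses 32x32 to 1x1 in exactly four layers: 32→15→7→3→1.
def conv_out(size, kernel, stride, padding=0):
    return (size + 2 * padding - kernel) // stride + 1

size, trace = 32, []
for _ in range(4):
    size = conv_out(size, kernel=3, stride=2)
    trace.append(size)
print(trace)   # [15, 7, 3, 1]
```

With 64 output channels in the last layer this yields the 64×1×1 global feature named in step 2; other kernel/stride combinations are possible, so this is only a plausibility check, not the patented configuration.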
The present invention will be described in detail with reference to examples.
Examples
This embodiment shows a method for rapidly dividing coding units under the VVC standard; its flow is shown in fig. 1, and its steps include:
step 1: collecting data, namely selecting 12 video sequences (BQTerrace, BasketballDrill, BasketballDrive, FourPeople, Bubble, Johnny, ChinaSpeed, BQSquare, BasketballPass, Cactus, RaceHorses, BQMall), taking 1 frame every 8 frames for a total of 15 frames per video, and collecting data in the manner described in step 1 above;
step 2: inputting the data set into the network for training; after training, a network model for fast partition prediction of VVC coding blocks is obtained;
step 3: model deployment: in the actual VVC encoding process, for each CU of size 32×32, the luminance values, the QP value, and the best intra prediction mode value of the CU are input into the trained network to obtain a prediction of the best partition of the current CU, and the partition mode is decided in advance according to the network output. Specifically:
If the current CU size is 32×32, the CU is input into the network for partition prediction. If horizontal division is predicted, only partition coding in the horizontal direction is performed; if vertical division is predicted, only partition coding in the vertical direction is performed; if neither condition is met, the CU is classified as the other case and the partition mode is selected following the original VTM flow. If the current CU size is not 32×32, the CU is encoded according to the original VTM partition procedure.
The performance of the proposed method is compared with that of the original VTM10.0 model; specific experimental results are given in Table 1, using three measures: PSNR, BDBR, and encoding time saving. BDBR denotes the bit-rate saving relative to the original VTM10.0 method at equal objective video quality; the encoding time column gives the saving in encoding time relative to the original VTM10.0 method.
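BDBR is the Bjøntegaard delta bit-rate: each rate-distortion curve is fitted with a cubic polynomial of log-rate against PSNR, and the average gap between the curves is integrated over the overlapping PSNR range. A minimal numpy sketch of the standard four-point procedure (for illustration only; this code is not from the patent):

```python
import numpy as np

# Bjøntegaard delta bit-rate: average % bit-rate difference between two
# RD curves (rate in kbps, psnr in dB), integrated over the overlapping
# PSNR range after cubic-polynomial fitting of log10(rate) vs PSNR.
def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    p_ref = np.polyfit(psnr_ref, np.log10(rate_ref), 3)
    p_test = np.polyfit(psnr_test, np.log10(rate_test), 3)
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (10 ** avg_diff - 1) * 100   # percent; > 0 means test uses more bits

# Identical curves give a BD-rate of 0%.
r = [1000, 2000, 4000, 8000]
p = [30.0, 33.0, 36.0, 39.0]
print(round(bd_rate(r, p, r, p), 6))
```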
TABLE 1 Comparison of the coding results of the proposed method and the VTM10.0 method
(Table 1 is provided as an image in the original publication; the numerical results are not reproduced here.)
The CU rapid dividing method provided by the invention can reduce the complexity of coding while guaranteeing the video quality.
The foregoing is merely a preferred embodiment of the invention; it should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications are also intended to fall within the scope of the present invention.

Claims (9)

1. A method for rapidly dividing coding units under the VVC standard, characterized by comprising the following steps:
step 1, selecting the 12 standard test sequences provided for VVC, selecting 15 frames from each video sequence for data acquisition, and saving during acquisition the data that carry a horizontal or vertical division mark, where the division mark specifically means: if the coding result contains a 32×H CU with H smaller than 32, the sample is marked as horizontal division; if the coding result contains a W×32 CU with W smaller than 32, the sample is marked as vertical division; the specific data stored include: the gray pixel values of the 32×32 CU, the quantization parameter used when encoding the current CU, the best intra prediction mode selected for the current CU, and the best label value assigned to the current CU in the original encoder, the labels being classified into three types: horizontal division, vertical division, and other division scenarios;
step 2, inputting the collected data into the network in batches: the 32×32 CU passes through four convolution layers to form a 64×1×1 global feature; the QP with dimension 1×1 and the best intra prediction mode with dimension 1×1 are passed through fully connected layers and expanded so that together they form a 32×1×1 feature; this is concatenated with the convolved 64×1×1 global feature, making the concatenated dimension 96×1×1; finally, two 1×1 convolution layers output the prediction probability of the current CU partition, which is matched against the collected data labels, thereby completing network training;
and step 3, in the actual encoding process, when the encoder encodes a CU of size 32×32, the convolutional neural network model is loaded and the gray pixel values of the 32×32 CU are input; the network model gives the final partition label, and the partition mode is decided in advance according to this label.
2. The method according to claim 1, wherein for the data collection in step 1: if the final partition result contains horizontal division traces, i.e. blocks of size 32×H with H smaller than 32, the block was divided by a horizontal binary tree or a horizontal ternary tree, so its label is set to 0 and the sample is collected; if the final partition result contains vertical division traces, i.e. blocks of size W×32 with W smaller than 32, the block was divided by a vertical binary tree or a vertical ternary tree, so its label is set to 1 and the sample is collected; all other cases are collected with label 2.
3. The method according to claim 1, wherein the four convolution operations of step 2 use the same kernel size and the same stride.
4. A system for rapidly dividing coding units under the VVC standard, characterized by comprising:
the data acquisition module is used for acquiring data for training the convolutional neural network: selecting the 12 standard test sequences provided for VVC, selecting 15 frames from each video sequence for data acquisition, and saving during acquisition the data that carry a horizontal or vertical division mark, where the division mark specifically means: if the coding result contains a 32×H CU with H smaller than 32, the sample is marked as horizontal division; if the coding result contains a W×32 CU with W smaller than 32, the sample is marked as vertical division; the specific data stored include: the gray pixel values of the 32×32 CU, the quantization parameter used when encoding the current CU, the best intra prediction mode selected for the current CU, and the label value corresponding to the current CU, the labels being classified into three categories: horizontal division, vertical division, and other division scenarios;
the network training module is used for inputting the collected data into the network in batches: the 32×32 CU passes through four convolution layers to form a 64×1×1 global feature; the QP with dimension 1×1 and the best intra prediction mode with dimension 1×1 are passed through fully connected layers and expanded so that together they form a 32×1×1 feature; this is concatenated with the convolved 64×1×1 global feature, making the concatenated dimension 96×1×1; finally, two 1×1 convolution layers output the prediction probability of the current CU partition, which is matched against the collected data labels, thereby completing network training;
the model loading module is used for inputting, in the actual encoding process, the pixel values, the quantization parameter, and the best intra prediction mode of a 32×32 CU into the trained network, which outputs the final partition label of the CU, used to decide the partition mode of the CU in advance.
5. The system according to claim 4, wherein for the data acquisition module: if the final partition result contains horizontal division traces, i.e. blocks of size 32×H with H smaller than 32, the block was divided by a horizontal binary tree or a horizontal ternary tree, so its label is set to 0 and the sample is collected; if the final partition result contains vertical division traces, i.e. blocks of size W×32 with W smaller than 32, the block was divided by a vertical binary tree or a vertical ternary tree, so its label is set to 1 and the sample is collected; all other cases are collected with label 2.
6. The system according to claim 4, wherein the four convolution operations use the same kernel size and the same stride.
7. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1-3 when executing the program.
8. A computer readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method of any one of claims 1-3.
9. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1-3.
CN202211669820.0A 2022-12-25 2022-12-25 Method and system for rapidly dividing coding units under VVC standard Pending CN116033153A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211669820.0A CN116033153A (en) 2022-12-25 2022-12-25 Method and system for rapidly dividing coding units under VVC standard


Publications (1)

Publication Number Publication Date
CN116033153A true CN116033153A (en) 2023-04-28

Family

ID=86072138

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211669820.0A Pending CN116033153A (en) 2022-12-25 2022-12-25 Method and system for rapidly dividing coding units under VVC standard

Country Status (1)

Country Link
CN (1) CN116033153A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117014610A (en) * 2023-10-07 2023-11-07 华侨大学 Method and device for rapidly dividing intra-frame CUs of H.266VVC screen content based on multitask learning
CN117014610B (en) * 2023-10-07 2023-12-29 华侨大学 Method and device for rapidly dividing intra-frame CUs of H.266VVC screen content based on multitask learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination