CN111405295A - Video coding unit segmentation method, system and hardware implementation method - Google Patents

Video coding unit segmentation method, system and hardware implementation method

Info

Publication number
CN111405295A
CN111405295A
Authority
CN
China
Prior art keywords
video coding
coding unit
convolutional
classifier
layer
Prior art date
Legal status
Pending
Application number
CN202010112709.6A
Other languages
Chinese (zh)
Inventor
黄震坤 (Huang Zhenkun)
Current Assignee
Hexin Interconnect Technology Qingdao Co ltd
Original Assignee
Hexin Interconnect Technology Qingdao Co ltd
Priority date
Filing date
Publication date
Application filed by Hexin Interconnect Technology Qingdao Co., Ltd.
Priority to CN202010112709.6A
Publication of CN111405295A
Legal status: Pending

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/96 - Tree coding, e.g. quad-tree coding
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region by performing operations on regions, e.g. growing, shrinking or watersheds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Abstract

The embodiment of the invention discloses a video coding unit segmentation method, a system and a hardware implementation method: video coding unit images and their segmentation labels are acquired and a training set is constructed; a two-classifier model based on a convolutional neural network is constructed; the two-classifier model is trained using the training set; and the video coding unit image to be processed is input into the trained two-classifier model to obtain the segmentation result. The coding unit is partitioned by a deep-learning-based classifier; while the accuracy of the classification result is maintained, the running time is greatly reduced compared with HEVC's original quadtree method combined with RDO, and a corresponding hardware circuit is designed with high hardware utilization.

Description

Video coding unit segmentation method, system and hardware implementation method
Technical Field
The embodiment of the invention relates to the technical field of video coding, in particular to a video coding unit segmentation method, a video coding unit segmentation system and a hardware implementation method.
Background
Because the data volume of video images is large, video images need to be compressed and coded during storage and transmission, and High Efficiency Video Coding (HEVC) is widely used due to its high compression efficiency. HEVC is a hybrid block-based motion-compensated transform coding architecture; the basic unit for video data compression is called a Coding Tree Unit (CTU), and each CTU may contain one Coding Unit (CU) or be recursively divided into four smaller CUs, each of which may be further partitioned. By default, HEVC determines the size of a coding unit by a quadtree method, selecting the optimal partitioning by calculating and comparing the costs of blocks of different sizes. This quadtree-based block-by-block comparison consumes a lot of time and is not amenable to hardware implementation, so improving block partitioning is a hot direction in current video coding research and application. At present, deep learning is widely applied in many areas of video and image processing, where its practical performance generally surpasses that of traditional methods. The embodiment of the invention adopts a CNN (Convolutional Neural Network) based method to replace HEVC's original quadtree method, thereby improving coding efficiency, and designs a corresponding hardware circuit.
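For orientation, the default decision that the embodiment replaces can be summarized by the following minimal sketch of recursive quadtree partitioning; rd_cost here is only a toy stand-in for the encoder's rate-distortion evaluation, not HEVC's actual cost model.
```python
import numpy as np

MIN_CU_SIZE = 8   # smallest CU size in HEVC

def rd_cost(block):
    """Toy stand-in for the encoder's rate-distortion cost of coding `block`
    whole; a real encoder evaluates prediction modes and residual coding."""
    return float(np.var(block)) + 1.0   # distortion proxy + a fixed rate proxy

def partition_cu(block):
    """Compare 'code as one CU' against a quad split, recursively."""
    cost_whole = rd_cost(block)
    if block.shape[0] <= MIN_CU_SIZE:
        return cost_whole, "leaf"
    h = block.shape[0] // 2
    quads = [block[:h, :h], block[:h, h:], block[h:, :h], block[h:, h:]]
    children = [partition_cu(q) for q in quads]
    cost_split = sum(c for c, _ in children)
    if cost_split < cost_whole:
        return cost_split, [tree for _, tree in children]
    return cost_whole, "leaf"

ctu = np.random.randint(0, 256, (64, 64)).astype(np.float32)
print(partition_cu(ctu)[1])
```
Every block is evaluated both whole and split, at every level of the tree, which is what makes the default method so time-consuming.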
Disclosure of Invention
Therefore, embodiments of the present invention provide a video coding unit partitioning method, system and hardware implementation method, so as to solve the problems of existing video coding unit partitioning methods, namely high time consumption and unsuitability for hardware implementation.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
according to a first aspect of the embodiments of the present invention, a video coding unit partitioning method is provided, the method including:
acquiring a video coding unit image and a segmentation label thereof, and constructing a training set;
constructing a two-classifier model based on a convolutional neural network;
training the two-classifier model using the training set;
and inputting the video coding unit image to be processed into the trained two-classifier model to obtain a segmentation result.
Further, the convolutional neural network-based two-classifier model specifically comprises three parallel convolutional layers, namely a first convolutional layer, a second convolutional layer and a third convolutional layer, each of which is connected to a pooling layer; the three pooling layers are all connected to a feature fusion layer, the feature fusion layer is connected to a fourth convolutional layer, and the fourth convolutional layer is connected to a Softmax layer.
Further, the input image size of the convolutional neural network-based two-classifier model is 64 × 64, 32 × 32 or 16 × 16; the first convolutional layer is a 7 × 3 convolutional layer, the second convolutional layer is a 3 × 3 convolutional layer, the third convolutional layer is a 3 × 7 convolutional layer, and the fourth convolutional layer is a 1 × 1 convolutional layer.
Further, the method further comprises:
and establishing the two classifier models by adopting python based on a TensorFlow framework, and training the two classifier models until convergence, so as to obtain model parameters.
Further, the segmentation labels of the video coding unit images in the training set are obtained through calculation by the original quadtree algorithm of HEVC.
According to a second aspect of the embodiments of the present invention, there is provided a video coding unit partitioning system, the system comprising:
the training set constructing module is used for acquiring the video coding unit images and the segmentation labels thereof and constructing a training set;
the model construction module is used for constructing a two-classifier model based on a convolutional neural network;
a model training module for training the two-classifier model using the training set;
and the segmentation module is used for inputting the video coding unit image to be processed into the trained two-classifier model to obtain a segmentation result.
According to a third aspect of the embodiments of the present invention, a hardware implementation method is provided for implementing the convolutional neural network-based two-classifier model of the video coding unit segmentation method in hardware, the method comprising:
and C synthesizing the trained two-classifier model based on the convolutional neural network by adopting an H L S tool to obtain a hardware circuit description file of the model.
Further, the HLS tool is Catapult HLS.
Further, the method further comprises: connecting the structural layers in the convolutional neural network through interfaces.
Further, the method further comprises: each interface adopts a FIFO register, so that the data processing of each structural-layer module is fully parallelized.
The embodiment of the invention has the following advantages:
the embodiment of the invention provides a video coding unit segmentation method, a video coding unit segmentation system and a hardware implementation method, wherein a video coding unit image and a segmentation label thereof are obtained, and a training set is constructed; constructing a two-classifier model based on a convolutional neural network; training the two classifier models using the training set; and inputting the video coding unit image to be processed into the trained two-classifier model to obtain a segmentation result. The deep learning-based classifier is used for coding unit segmentation, on the basis of ensuring the accuracy of classification results of the classifier, compared with a method of HEVC original quad-tree combined with RDO, the method has the advantages that the running time is greatly reduced, a corresponding hardware circuit is designed, and the hardware utilization rate is high.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It should be apparent that the drawings in the following description are merely exemplary, and that other drawings can be derived from the provided drawings by those of ordinary skill in the art without inventive effort.
Fig. 1 is a flowchart illustrating a video coding unit partitioning method according to embodiment 1 of the present invention;
fig. 2 is a schematic structural diagram of a convolutional neural network-based two-classifier model in a video coding unit segmentation method according to embodiment 1 of the present invention;
fig. 3 is an input image of an experimental example of a video coding unit segmentation method according to embodiment 1 of the present invention;
fig. 4 is a second input image of an experimental example of a video coding unit segmentation method according to embodiment 1 of the present invention;
fig. 5 is a schematic diagram of a hardware circuit description file in a hardware implementation method according to embodiment 3 of the present invention.
Detailed Description
The present invention is described below in terms of particular embodiments; other advantages and features of the invention will become apparent to those skilled in the art from the following disclosure. It is to be understood that the described embodiments are merely exemplary of the invention and that the invention is not limited to the particular embodiments disclosed. All other embodiments derived by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Embodiment 1 of the present invention provides a video coding unit partitioning method, as shown in fig. 1, the method including the following steps:
and step 110, acquiring the video coding unit image and the segmentation label thereof, and constructing a training set.
Specifically, the training set samples use the YUV files commonly employed for HEVC testing together with various videos collected from the internet. The segmentation labels of the video coding unit images in the training set are obtained from HEVC's original quadtree algorithm: original CTU pictures and their corresponding segmentation labels are extracted from the HEVC encoder to construct the training set.
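A minimal sketch of assembling such a training set follows; the YUV frame layout and the label-file format are assumptions for illustration, since the patent states only that blocks and labels are extracted from the HEVC encoder.
```python
import numpy as np

def read_luma_ctus(yuv_path, width, height, ctu=64):
    """Hypothetical helper: extract luma CTU blocks from the first frame of a
    raw 4:2:0 YUV file; a real pipeline takes blocks and split decisions
    directly from the HEVC encoder."""
    luma = np.fromfile(yuv_path, dtype=np.uint8, count=width * height)
    luma = luma.reshape(height, width)
    return np.stack([luma[y:y + ctu, x:x + ctu]
                     for y in range(0, height - ctu + 1, ctu)
                     for x in range(0, width - ctu + 1, ctu)])

def read_split_labels(label_path):
    """Hypothetical label file: one integer per block, 1 = split, 0 = no
    split, dumped by the encoder's quadtree+RDO pass."""
    return np.loadtxt(label_path, dtype=np.int64)
```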
And step 120, constructing a classifier model based on the convolutional neural network.
The two-classifier model is built in Python based on the TensorFlow framework. Specifically, as shown in fig. 2, the convolutional neural network-based two-classifier model comprises three parallel convolutional layers, namely a first, a second and a third convolutional layer, each connected to its own pooling layer; the three pooling layers are all connected to a feature fusion layer, the feature fusion layer is connected to a fourth convolutional layer, and the fourth convolutional layer is connected to a Softmax layer. The input image size of the two-classifier model is 64 × 64, 32 × 32 or 16 × 16; the first convolutional layer is a 7 × 3 convolutional layer, the second a 3 × 3 convolutional layer, the third a 3 × 7 convolutional layer, and the fourth a 1 × 1 convolutional layer. Because the fully connected part of the network is replaced by a 1 × 1 full convolution layer, one two-classifier can adapt to the three coding unit sizes of 64 × 64, 32 × 32 and 16 × 16, so there is no need to train a separate classifier for each image size, which improves the robustness of the classifier.
A 64 × 64, 32 × 32 or 16 × 16 image is input, convolution features are obtained through the three parallel convolutional layers, the features then pass through the pooling layers, and the pooled features are fused to obtain the fused features, which are classified by the 1 × 1 convolution and softmax, finally yielding the label indicating whether the input image needs to be split further. The three parallel convolutional layers can be computed simultaneously without interfering with each other, and this parallelization helps improve hardware utilization.
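The following TensorFlow/Keras sketch mirrors the structure of fig. 2 as described above; the channel count, ReLU activations, max-pooling, and the global average pooling used to collapse the spatial dimensions are assumptions, as the patent does not specify them.
```python
import tensorflow as tf
from tensorflow.keras import layers

def build_classifier(channels=16):
    # Input may be 64x64, 32x32 or 16x16: the network is fully convolutional,
    # so the spatial dimensions are left dynamic and one model serves all three.
    inp = layers.Input(shape=(None, None, 1))

    # Three parallel branches with 7x3, 3x3 and 3x7 kernels, each followed by
    # its own pooling layer (channel count, ReLU and max-pooling are assumed).
    branches = []
    for kh, kw in [(7, 3), (3, 3), (3, 7)]:
        x = layers.Conv2D(channels, (kh, kw), padding="same", activation="relu")(inp)
        x = layers.MaxPooling2D(pool_size=2)(x)
        branches.append(x)

    # Feature fusion layer: concatenate the three pooled feature maps.
    fused = layers.Concatenate(axis=-1)(branches)

    # Fourth layer: 1x1 convolution in place of a fully connected layer, so the
    # model adapts to all three CU sizes; global average pooling (an assumption)
    # collapses the spatial dimensions before the two-way softmax.
    logits = layers.Conv2D(2, (1, 1))(fused)
    logits = layers.GlobalAveragePooling2D()(logits)
    return tf.keras.Model(inp, layers.Softmax()(logits))

model = build_classifier()
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```
Because every layer is convolutional or pooling, the same weights apply to 64 × 64, 32 × 32 and 16 × 16 inputs, which is the size-adaptivity described above.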
Step 130, training the two classifier model using the training set.
Specifically, the two-classifier model is trained with the common TensorFlow framework until convergence, and the model parameters are obtained. The relevant TensorFlow APIs are applied to perform int8 quantization on the obtained model parameters, which are then imported into the synthesizable C++ model code.
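The patent states only that TensorFlow APIs are used for the int8 quantization step; the post-training quantization route below via TensorFlow Lite is one plausible reading, shown as a hedged sketch. Full-integer conversion generally requires a static input shape, so a 16 × 16 instance of the classifier is assumed, and representative_blocks is a hypothetical calibration generator.
```python
import numpy as np
import tensorflow as tf

# A fixed 16x16 instance of the classifier, re-using the trained `model`
# (full-integer conversion generally needs a static input shape).
inp = tf.keras.Input(shape=(16, 16, 1))
fixed_model = tf.keras.Model(inp, model(inp))

def representative_blocks():
    # Hypothetical calibration generator; real calibration would draw CU
    # images from the training set rather than random data.
    for _ in range(100):
        yield [np.random.rand(1, 16, 16, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(fixed_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_blocks
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
with open("cu_classifier_int8.tflite", "wb") as f:
    f.write(converter.convert())
```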
And 140, inputting the video coding unit image to be processed into the trained two-classifier model to obtain a segmentation result.
The trained model can be applied to the partitioning of video coding units: after an image is input into the trained two-classifier model, the model outputs whether the image needs to be divided further.
Experimental example: first, the input image is a 16 × 16 block, as shown in fig. 3, whose label is "to be split". The two-classifier model outputs the two-dimensional data (0.85, 0.15), which corresponds to the labels (split, not split); the label with the higher probability is selected as the final segmentation result, i.e., the block is split. Second, the input image is a 16 × 16 block, as shown in fig. 4, whose label is "not to be split". The two-classifier model outputs the two-dimensional data (0.01, 0.99); again the label with the higher probability is selected, i.e., the block is not split.
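In both cases the decision reduces to taking the higher-probability entry of the two-dimensional output, as in this small sketch:
```python
def split_decision(probs):
    """probs = (p_split, p_no_split); pick the higher-probability label."""
    return "split" if probs[0] >= probs[1] else "no split"

print(split_decision((0.85, 0.15)))  # -> split     (first case, fig. 3)
print(split_decision((0.01, 0.99)))  # -> no split  (second case, fig. 4)
```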
The embodiment of the invention provides a video coding unit segmentation method: video coding unit images and their segmentation labels are acquired and a training set is constructed; a two-classifier model based on a convolutional neural network is constructed; the two-classifier model is trained using the training set; and the video coding unit image to be processed is input into the trained two-classifier model to obtain the segmentation result. The deep-learning-based classifier performs coding unit segmentation; compared with HEVC's original quadtree method combined with RDO, the running time is greatly reduced, and because the algorithm can be parallelized, hardware utilization is improved.
In correspondence with embodiment 1 described above, embodiment 2 of the present invention provides a video coding unit partitioning system, including:
the training set constructing module is used for acquiring the video coding unit images and the segmentation labels thereof and constructing a training set;
the model construction module is used for constructing a two-classifier model based on a convolutional neural network;
the model training module is used for training the two-classifier model by using the training set;
and the segmentation module is used for inputting the video coding unit image to be processed into the trained two-classifier model to obtain a segmentation result.
The functions performed by each component of the video coding unit partitioning system provided in embodiment 2 of the present invention have been described in detail in embodiment 1 and are therefore not repeated here.
Embodiment 3 of the present invention provides a hardware implementation method for implementing the convolutional neural network-based two-classifier model of the video coding unit segmentation method in hardware, comprising:
and C synthesizing the trained two-classifier model based on the convolutional neural network by adopting an H L S tool to obtain a hardware circuit description file of the model.
Specifically, the HLS tool used in this embodiment is Catapult HLS. After the two-classifier is designed, C synthesis, C simulation and ModelSim simulation are performed on the synthesizable C++ model code, and the circuit description file is synthesized, as shown in fig. 5.
The method further comprises connecting the structural layers of the convolutional neural network through interfaces. In the generated hardware circuit description file it can be seen that each layer is encapsulated behind an interface; in Catapult HLS, modules connected in this way all execute in parallel, which makes full use of the parallelizability of hardware and effectively improves hardware utilization.
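The actual design is a Catapult HLS C++ flow; the Python sketch below only models the dataflow pattern described here, with each structural layer running as an independent module that exchanges data through FIFOs so that all modules process in parallel. The FIFO depth and the two-stage pipeline are illustrative assumptions.
```python
import queue
import threading

def layer_module(fn, fifo_in, fifo_out):
    # Each structural layer is an independent module: it reads from its input
    # FIFO, computes, and writes to its output FIFO -- a software analogue of
    # the FIFO-registered interfaces between the hardware modules.
    while True:
        item = fifo_in.get()
        if item is None:            # end-of-stream token
            fifo_out.put(None)
            return
        fifo_out.put(fn(item))

fifo_a = queue.Queue(maxsize=4)     # FIFO depth is an illustrative assumption
fifo_b = queue.Queue(maxsize=4)
fifo_c = queue.Queue(maxsize=4)

# Two toy "layers" standing in for e.g. a convolution and a pooling module.
threading.Thread(target=layer_module, args=(lambda x: x * 2, fifo_a, fifo_b)).start()
threading.Thread(target=layer_module, args=(lambda x: x + 1, fifo_b, fifo_c)).start()

for block in range(8):              # blocks stream in while earlier ones are
    fifo_a.put(block)               # still being processed downstream
fifo_a.put(None)

while (result := fifo_c.get()) is not None:
    print(result)
```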
The softmax uses a function provided in the Catapult HLS library. Because the exponential function cannot be implemented in hardware directly, this function approximates it with several piecewise segments; tests show that the mean square error between the approximated and the true softmax is within an acceptable range. The 1 × 1 convolution uses softmax as its activation layer to classify the image, i.e., to decide whether further segmentation is needed.
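The library function itself is not reproduced in the patent; the sketch below only illustrates the stated idea of approximating the exponential with piecewise segments and measuring the mean square error against the exact softmax. The segment count and input range are assumptions.
```python
import numpy as np

# Piecewise-linear table for exp(x) on [-8, 0]; range and segment count are
# assumptions standing in for the Catapult HLS library's piecewise function.
KNOTS = np.linspace(-8.0, 0.0, 17)
VALUES = np.exp(KNOTS)

def exp_piecewise(x):
    """Hardware-friendly exp: linear interpolation between tabulated points."""
    return np.interp(np.clip(x, -8.0, 0.0), KNOTS, VALUES)

def softmax_piecewise(logits):
    z = logits - np.max(logits)   # shift so all inputs fall in the table range
    e = exp_piecewise(z)
    return e / e.sum()

logits = np.array([2.0, -1.5])
exact = np.exp(logits - logits.max())
exact /= exact.sum()
mse = np.mean((softmax_piecewise(logits) - exact) ** 2)
print(softmax_piecewise(logits), exact, mse)   # MSE stays small
```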
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims (10)

1. A method for video coding unit partitioning, the method comprising:
acquiring a video coding unit image and a segmentation label thereof, and constructing a training set;
constructing a two-classifier model based on a convolutional neural network;
training the two-classifier model using the training set;
and inputting the video coding unit image to be processed into the trained two-classifier model to obtain a segmentation result.
2. The method according to claim 1, wherein the convolutional neural network-based two-classifier model comprises three parallel convolutional layers, namely a first convolutional layer, a second convolutional layer and a third convolutional layer, each of which is connected to a pooling layer; the three pooling layers are all connected to a feature fusion layer, the feature fusion layer is connected to a fourth convolutional layer, and the fourth convolutional layer is connected to a Softmax layer.
3. The method of claim 2, wherein the input image size of the convolutional neural network-based two-classifier model is 64 × 64, 32 × 32 or 16 × 16, the first convolutional layer is a 7 × 3 convolutional layer, the second convolutional layer is a 3 × 3 convolutional layer, the third convolutional layer is a 3 × 7 convolutional layer, and the fourth convolutional layer is a 1 × 1 convolutional layer.
4. The method of claim 1, further comprising:
establishing the two-classifier model in Python based on the TensorFlow framework, and training it until convergence to obtain the model parameters.
5. The method as claimed in claim 1, wherein the segmentation labels of the video coding unit images in the training set are obtained through the original quadtree algorithm of HEVC.
6. A video coding unit partitioning system, the system comprising:
the training set constructing module is used for acquiring the video coding unit images and the segmentation labels thereof and constructing a training set;
the model construction module is used for constructing a two-classifier model based on a convolutional neural network;
a model training module for training the two-classifier model using the training set;
and the segmentation module is used for inputting the video coding unit image to be processed into the trained two-classifier model to obtain a segmentation result.
7. A hardware implementation method for implementing the convolutional neural network-based two-classifier model in the video coding unit segmentation method according to any one of claims 1 to 5 in hardware, the method comprising:
performing C synthesis on the trained convolutional neural network-based two-classifier model with an HLS tool to obtain a hardware circuit description file of the model.
8. The hardware implementation method of claim 7, wherein the HLS tool is Catapult HLS.
9. The hardware implementation method according to claim 7, wherein the method further comprises: connecting the structural layers in the convolutional neural network through interfaces.
10. The hardware implementation method according to claim 9, wherein each interface adopts a FIFO register to fully parallelize the data processing of each structural-layer module.
CN202010112709.6A 2020-02-24 2020-02-24 Video coding unit segmentation method, system and hardware implementation method Pending CN111405295A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010112709.6A CN111405295A (en) 2020-02-24 2020-02-24 Video coding unit segmentation method, system and hardware implementation method


Publications (1)

Publication Number Publication Date
CN111405295A true CN111405295A (en) 2020-07-10

Family

ID=71413185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010112709.6A Pending CN111405295A (en) 2020-02-24 2020-02-24 Video coding unit segmentation method, system and hardware implementation method

Country Status (1)

Country Link
CN (1) CN111405295A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105430396A (en) * 2015-12-15 2016-03-23 浙江大学 Video coding method capable of deciding sizes of coding blocks by means of classification
CN108495129A (en) * 2018-03-22 2018-09-04 北京航空航天大学 The complexity optimized method and device of block partition encoding based on deep learning method
CN108986124A (en) * 2018-06-20 2018-12-11 天津大学 In conjunction with Analysis On Multi-scale Features convolutional neural networks retinal vascular images dividing method
CN109714584A (en) * 2019-01-11 2019-05-03 杭州电子科技大学 3D-HEVC depth map encoding unit high-speed decision method based on deep learning
CN110378344A (en) * 2019-05-05 2019-10-25 北京交通大学 Convolutional neural networks multispectral image dividing method based on spectrum dimension switching network


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
QIN Huabiao, CAO Qinping: "Design of a Convolutional Neural Network Hardware Accelerator Based on FPGA", Journal of Electronics & Information Technology *

Similar Documents

Publication Publication Date Title
Wang et al. Towards analysis-friendly face representation with scalable feature and texture compression
CN111355956B (en) Deep learning-based rate distortion optimization rapid decision system and method in HEVC intra-frame coding
CN104581177B (en) Image compression method and device combining block matching and string matching
CN110675403B (en) Multi-instance image segmentation method based on coding auxiliary information
CN114286093A (en) Rapid video coding method based on deep neural network
CN112101262B (en) Multi-feature fusion sign language recognition method and network model
CN116600119B (en) Video encoding method, video decoding method, video encoding device, video decoding device, computer equipment and storage medium
CN109670506A (en) Scene Segmentation and system based on Kronecker convolution
WO2023174256A1 (en) Data compression method and related device
CN116824694A (en) Action recognition system and method based on time sequence aggregation and gate control transducer
Joy et al. Modelling of depth prediction algorithm for intra prediction complexity reduction
CN110677624A (en) Monitoring video-oriented foreground and background parallel compression method based on deep learning
CN111405295A (en) Video coding unit segmentation method, system and hardware implementation method
CN114501031B (en) Compression coding and decompression method and device
Ma et al. AFEC: adaptive feature extraction modules for learned image compression
CN115086660B (en) Decoding and encoding method based on point cloud attribute prediction, decoder and encoder
CN112770120B (en) 3D video depth map intra-frame rapid coding method based on depth neural network
Liu et al. Fast Intra Coding Algorithm for Depth Map with End-to-End Edge Detection Network
WO2022067776A1 (en) Point cloud decoding and encoding method, and decoder, encoder and encoding and decoding system
CN113240589A (en) Image defogging method and system based on multi-scale feature fusion
CN116634147B (en) HEVC-SCC intra-frame CU rapid partitioning coding method and device based on multi-scale feature fusion
CN116828184B (en) Video encoding method, video decoding method, video encoding device, video decoding device, computer equipment and storage medium
CN116600107B (en) HEVC-SCC quick coding method and device based on IPMS-CNN and spatial neighboring CU coding modes
CN113225552B (en) Intelligent rapid interframe coding method
Qin et al. A Complexity-Reducing HEVC Intra-Mode Method Based on VGGNet

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200710)