CN111405295A - Video coding unit segmentation method, system and hardware implementation method - Google Patents

Video coding unit segmentation method, system and hardware implementation method

Info

Publication number
CN111405295A
CN111405295A
Authority
CN
China
Prior art keywords
video coding
coding unit
convolutional
classifier
layer
Prior art date
Legal status
Pending
Application number
CN202010112709.6A
Other languages
Chinese (zh)
Inventor
黄震坤 (Huang Zhenkun)
Current Assignee
Hexin Interconnect Technology Qingdao Co ltd
Original Assignee
Hexin Interconnect Technology Qingdao Co ltd
Priority date
Filing date
Publication date
Application filed by Hexin Interconnect Technology Qingdao Co., Ltd.
Priority to CN202010112709.6A
Publication of CN111405295A
Legal status: Pending

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/96 - Tree coding, e.g. quad-tree coding
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region by performing operations on regions, e.g. growing, shrinking or watersheds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Abstract

The embodiment of the invention discloses a video coding unit segmentation method, a system and a hardware implementation method: video coding unit images and their segmentation labels are acquired and a training set is constructed; a two-classifier model based on a convolutional neural network is constructed; the two-classifier model is trained using the training set; and the video coding unit image to be processed is input into the trained two-classifier model to obtain the segmentation result. The coding unit is partitioned by a deep-learning-based classifier; while the accuracy of the classification result is maintained, the running time is greatly reduced compared with HEVC's original quadtree method combined with RDO, and a corresponding hardware circuit is designed with high hardware utilization.

Description

Video coding unit segmentation method, system and hardware implementation method
Technical Field
The embodiment of the invention relates to the technical field of video coding, in particular to a video coding unit segmentation method, a video coding unit segmentation system and a hardware implementation method.
Background
Because the data volume of video images is large, video images need to be compressed and coded during storage and transmission, and High Efficiency Video Coding (HEVC) is widely used due to its high compression efficiency. HEVC is a hybrid block-based motion-compensated transform coding architecture; the basic unit for video data compression is called a Coding Tree Unit (CTU), and each CTU may contain one Coding Unit (CU) or be recursively divided into four smaller CUs, each of which may be further partitioned. By default, HEVC determines the size of a coding unit by a quadtree method, selecting the optimal partitioning by calculating and comparing the costs of blocks of different sizes. This quadtree-based block-by-block comparison consumes a lot of time and is not amenable to hardware implementation, so improving block partitioning is a hot direction in current video coding research and application. At present, deep learning is widely applied in many areas of video and image processing, where its practical performance generally surpasses that of traditional methods. The embodiment of the invention adopts a CNN (Convolutional Neural Network) based method to replace HEVC's original quadtree method, thereby improving coding efficiency, and designs a corresponding hardware circuit.
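For orientation, the default decision that the embodiment replaces can be summarized by the following minimal sketch of recursive quadtree partitioning; rd_cost here is only a toy stand-in for the encoder's rate-distortion evaluation, not HEVC's actual cost model.
```python
import numpy as np

MIN_CU_SIZE = 8   # smallest CU size in HEVC

def rd_cost(block):
    """Toy stand-in for the encoder's rate-distortion cost of coding `block`
    whole; a real encoder evaluates prediction modes and residual coding."""
    return float(np.var(block)) + 1.0   # distortion proxy + a fixed rate proxy

def partition_cu(block):
    """Compare 'code as one CU' against a quad split, recursively."""
    cost_whole = rd_cost(block)
    if block.shape[0] <= MIN_CU_SIZE:
        return cost_whole, "leaf"
    h = block.shape[0] // 2
    quads = [block[:h, :h], block[:h, h:], block[h:, :h], block[h:, h:]]
    children = [partition_cu(q) for q in quads]
    cost_split = sum(c for c, _ in children)
    if cost_split < cost_whole:
        return cost_split, [tree for _, tree in children]
    return cost_whole, "leaf"

ctu = np.random.randint(0, 256, (64, 64)).astype(np.float32)
print(partition_cu(ctu)[1])
```
Every block is evaluated both whole and split, at every level of the tree, which is what makes the default method so time-consuming.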
Disclosure of Invention
Therefore, embodiments of the present invention provide a video coding unit partitioning method, system and hardware implementation method, so as to solve the problems of existing video coding unit partitioning methods, namely high time consumption and unsuitability for hardware implementation.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
according to a first aspect of the embodiments of the present invention, a video coding unit partitioning method is provided, the method including:
acquiring a video coding unit image and a segmentation label thereof, and constructing a training set;
constructing a two-classifier model based on a convolutional neural network;
training the two-classifier model using the training set;
and inputting the video coding unit image to be processed into the trained two-classifier model to obtain a segmentation result.
Further, the convolutional neural network-based two-classifier model specifically comprises three parallel convolutional layers, namely a first convolutional layer, a second convolutional layer and a third convolutional layer, each of which is connected to a pooling layer; the three pooling layers are all connected to a feature fusion layer, the feature fusion layer is connected to a fourth convolutional layer, and the fourth convolutional layer is connected to a Softmax layer.
Further, the input image size of the convolutional neural network-based two-classifier model is 64 × 64, 32 × 32 or 16 × 16; the first convolutional layer is a 7 × 3 convolutional layer, the second convolutional layer is a 3 × 3 convolutional layer, the third convolutional layer is a 3 × 7 convolutional layer, and the fourth convolutional layer is a 1 × 1 convolutional layer.
Further, the method further comprises:
and establishing the two classifier models by adopting python based on a TensorFlow framework, and training the two classifier models until convergence, so as to obtain model parameters.
Further, the segmentation labels of the video coding unit images in the training set are obtained through calculation by the original quadtree algorithm of HEVC.
According to a second aspect of the embodiments of the present invention, there is provided a video coding unit partitioning system, the system comprising:
the training set constructing module is used for acquiring the video coding unit images and the segmentation labels thereof and constructing a training set;
the model construction module is used for constructing a two-classifier model based on a convolutional neural network;
a model training module for training the two-classifier model using the training set;
and the segmentation module is used for inputting the video coding unit image to be processed into the trained two-classifier model to obtain a segmentation result.
According to a third aspect of the embodiments of the present invention, a hardware implementation method is provided for implementing the convolutional neural network-based two-classifier model of the video coding unit segmentation method in hardware, the method comprising:
and C synthesizing the trained two-classifier model based on the convolutional neural network by adopting an H L S tool to obtain a hardware circuit description file of the model.
Further, the HLS tool is Catapult HLS.
Further, the method further comprises: connecting the structural layers in the convolutional neural network through interfaces.
Further, the method further comprises: each interface adopts a FIFO register, so that the data processing of each structural-layer module is fully parallelized.
The embodiment of the invention has the following advantages:
the embodiment of the invention provides a video coding unit segmentation method, a video coding unit segmentation system and a hardware implementation method, wherein a video coding unit image and a segmentation label thereof are obtained, and a training set is constructed; constructing a two-classifier model based on a convolutional neural network; training the two classifier models using the training set; and inputting the video coding unit image to be processed into the trained two-classifier model to obtain a segmentation result. The deep learning-based classifier is used for coding unit segmentation, on the basis of ensuring the accuracy of classification results of the classifier, compared with a method of HEVC original quad-tree combined with RDO, the method has the advantages that the running time is greatly reduced, a corresponding hardware circuit is designed, and the hardware utilization rate is high.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It should be apparent that the drawings in the following description are merely exemplary, and that other drawings can be derived from the provided drawings by those of ordinary skill in the art without inventive effort.
Fig. 1 is a flowchart illustrating a video coding unit partitioning method according to embodiment 1 of the present invention;
fig. 2 is a schematic structural diagram of a convolutional neural network-based two-classifier model in a video coding unit segmentation method according to embodiment 1 of the present invention;
fig. 3 is an input image of an experimental example of a video coding unit segmentation method according to embodiment 1 of the present invention;
fig. 4 is a second input image of an experimental example of a video coding unit segmentation method according to embodiment 1 of the present invention;
fig. 5 is a schematic diagram of a hardware circuit description file in a hardware implementation method according to embodiment 3 of the present invention.
Detailed Description
The present invention is described below in terms of particular embodiments; other advantages and features of the invention will become apparent to those skilled in the art from the following disclosure. It is to be understood that the described embodiments are merely exemplary of the invention and that the invention is not limited to the particular embodiments disclosed. All other embodiments derived by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Embodiment 1 of the present invention provides a video coding unit partitioning method, as shown in fig. 1, the method including the following steps:
and step 110, acquiring the video coding unit image and the segmentation label thereof, and constructing a training set.
Specifically, the training set samples use the YUV files commonly employed for HEVC testing together with various videos collected from the internet. The segmentation labels of the video coding unit images in the training set are obtained from HEVC's original quadtree algorithm: original CTU pictures and their corresponding segmentation labels are extracted from the HEVC encoder to construct the training set.
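A minimal sketch of assembling such a training set follows; the YUV frame layout and the label-file format are assumptions for illustration, since the patent states only that blocks and labels are extracted from the HEVC encoder.
```python
import numpy as np

def read_luma_ctus(yuv_path, width, height, ctu=64):
    """Hypothetical helper: extract luma CTU blocks from the first frame of a
    raw 4:2:0 YUV file; a real pipeline takes blocks and split decisions
    directly from the HEVC encoder."""
    luma = np.fromfile(yuv_path, dtype=np.uint8, count=width * height)
    luma = luma.reshape(height, width)
    return np.stack([luma[y:y + ctu, x:x + ctu]
                     for y in range(0, height - ctu + 1, ctu)
                     for x in range(0, width - ctu + 1, ctu)])

def read_split_labels(label_path):
    """Hypothetical label file: one integer per block, 1 = split, 0 = no
    split, dumped by the encoder's quadtree+RDO pass."""
    return np.loadtxt(label_path, dtype=np.int64)
```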
And step 120, constructing a classifier model based on the convolutional neural network.
The two-classifier model is built in Python based on the TensorFlow framework. Specifically, as shown in fig. 2, the convolutional neural network-based two-classifier model comprises three parallel convolutional layers, namely a first, a second and a third convolutional layer, each connected to its own pooling layer; the three pooling layers are all connected to a feature fusion layer, the feature fusion layer is connected to a fourth convolutional layer, and the fourth convolutional layer is connected to a Softmax layer. The input image size of the two-classifier model is 64 × 64, 32 × 32 or 16 × 16; the first convolutional layer is a 7 × 3 convolutional layer, the second a 3 × 3 convolutional layer, the third a 3 × 7 convolutional layer, and the fourth a 1 × 1 convolutional layer. Because the fully connected part of the network is replaced by a 1 × 1 full convolution layer, one two-classifier can adapt to the three coding unit sizes of 64 × 64, 32 × 32 and 16 × 16, so there is no need to train a separate classifier for each image size, which improves the robustness of the classifier.
A 64 × 64, 32 × 32 or 16 × 16 image is input, convolution features are obtained through the three parallel convolutional layers, the features then pass through the pooling layers, and the pooled features are fused to obtain the fused features, which are classified by the 1 × 1 convolution and softmax, finally yielding the label indicating whether the input image needs to be split further. The three parallel convolutional layers can be computed simultaneously without interfering with each other, and this parallelization helps improve hardware utilization.
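The following TensorFlow/Keras sketch mirrors the structure of fig. 2 as described above; the channel count, ReLU activations, max-pooling, and the global average pooling used to collapse the spatial dimensions are assumptions, as the patent does not specify them.
```python
import tensorflow as tf
from tensorflow.keras import layers

def build_classifier(channels=16):
    # Input may be 64x64, 32x32 or 16x16: the network is fully convolutional,
    # so the spatial dimensions are left dynamic and one model serves all three.
    inp = layers.Input(shape=(None, None, 1))

    # Three parallel branches with 7x3, 3x3 and 3x7 kernels, each followed by
    # its own pooling layer (channel count, ReLU and max-pooling are assumed).
    branches = []
    for kh, kw in [(7, 3), (3, 3), (3, 7)]:
        x = layers.Conv2D(channels, (kh, kw), padding="same", activation="relu")(inp)
        x = layers.MaxPooling2D(pool_size=2)(x)
        branches.append(x)

    # Feature fusion layer: concatenate the three pooled feature maps.
    fused = layers.Concatenate(axis=-1)(branches)

    # Fourth layer: 1x1 convolution in place of a fully connected layer, so the
    # model adapts to all three CU sizes; global average pooling (an assumption)
    # collapses the spatial dimensions before the two-way softmax.
    logits = layers.Conv2D(2, (1, 1))(fused)
    logits = layers.GlobalAveragePooling2D()(logits)
    return tf.keras.Model(inp, layers.Softmax()(logits))

model = build_classifier()
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```
Because every layer is convolutional or pooling, the same weights apply to 64 × 64, 32 × 32 and 16 × 16 inputs, which is the size-adaptivity described above.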
Step 130, training the two classifier model using the training set.
Specifically, the two-classifier model is trained with the common TensorFlow framework until convergence, and the model parameters are obtained. The relevant TensorFlow APIs are applied to perform int8 quantization on the obtained model parameters, which are then imported into the synthesizable C++ model code.
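The patent states only that TensorFlow APIs are used for the int8 quantization step; the post-training quantization route below via TensorFlow Lite is one plausible reading, shown as a hedged sketch. Full-integer conversion generally requires a static input shape, so a 16 × 16 instance of the classifier is assumed, and representative_blocks is a hypothetical calibration generator.
```python
import numpy as np
import tensorflow as tf

# A fixed 16x16 instance of the classifier, re-using the trained `model`
# (full-integer conversion generally needs a static input shape).
inp = tf.keras.Input(shape=(16, 16, 1))
fixed_model = tf.keras.Model(inp, model(inp))

def representative_blocks():
    # Hypothetical calibration generator; real calibration would draw CU
    # images from the training set rather than random data.
    for _ in range(100):
        yield [np.random.rand(1, 16, 16, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(fixed_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_blocks
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
with open("cu_classifier_int8.tflite", "wb") as f:
    f.write(converter.convert())
```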
And 140, inputting the video coding unit image to be processed into the trained two-classifier model to obtain a segmentation result.
The trained model can be applied to the partitioning of video coding units: after an image is input into the trained two-classifier model, the model outputs whether the image needs to be divided further.
Experimental example: first, the input image is a 16 × 16 block, as shown in fig. 3, whose label is "to be split". The two-classifier model outputs the two-dimensional data (0.85, 0.15), which corresponds to the labels (split, not split); the label with the higher probability is selected as the final segmentation result, i.e., the block is split. Second, the input image is a 16 × 16 block, as shown in fig. 4, whose label is "not to be split". The two-classifier model outputs the two-dimensional data (0.01, 0.99); again the label with the higher probability is selected, i.e., the block is not split.
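In both cases the decision reduces to taking the higher-probability entry of the two-dimensional output, as in this small sketch:
```python
def split_decision(probs):
    """probs = (p_split, p_no_split); pick the higher-probability label."""
    return "split" if probs[0] >= probs[1] else "no split"

print(split_decision((0.85, 0.15)))  # -> split     (first case, fig. 3)
print(split_decision((0.01, 0.99)))  # -> no split  (second case, fig. 4)
```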
The embodiment of the invention provides a video coding unit segmentation method: video coding unit images and their segmentation labels are acquired and a training set is constructed; a two-classifier model based on a convolutional neural network is constructed; the two-classifier model is trained using the training set; and the video coding unit image to be processed is input into the trained two-classifier model to obtain the segmentation result. The deep-learning-based classifier performs coding unit segmentation; compared with HEVC's original quadtree method combined with RDO, the running time is greatly reduced, and because the algorithm can be parallelized, hardware utilization is improved.
In correspondence with embodiment 1 described above, embodiment 2 of the present invention provides a video coding unit partitioning system, including:
the training set constructing module is used for acquiring the video coding unit images and the segmentation labels thereof and constructing a training set;
the model construction module is used for constructing a two-classifier model based on a convolutional neural network;
the model training module is used for training the two-classifier model by using the training set;
and the segmentation module is used for inputting the video coding unit image to be processed into the trained two-classifier model to obtain a segmentation result.
The functions performed by each component of the video coding unit partitioning system provided in embodiment 2 of the present invention have been described in detail in embodiment 1 and are therefore not repeated here.
Embodiment 3 of the present invention provides a hardware implementation method for implementing the convolutional neural network-based two-classifier model of the video coding unit segmentation method in hardware, comprising:
and C synthesizing the trained two-classifier model based on the convolutional neural network by adopting an H L S tool to obtain a hardware circuit description file of the model.
Specifically, the HLS tool used in this embodiment is Catapult HLS. After the two-classifier is designed, C synthesis, C simulation and ModelSim simulation are performed on the synthesizable C++ model code, and the circuit description file is synthesized, as shown in fig. 5.
The method further comprises connecting the structural layers of the convolutional neural network through interfaces. In the generated hardware circuit description file it can be seen that each layer is encapsulated behind an interface; in Catapult HLS, modules connected in this way all execute in parallel, which makes full use of the parallelizability of hardware and effectively improves hardware utilization.
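The actual design is a Catapult HLS C++ flow; the Python sketch below only models the dataflow pattern described here, with each structural layer running as an independent module that exchanges data through FIFOs so that all modules process in parallel. The FIFO depth and the two-stage pipeline are illustrative assumptions.
```python
import queue
import threading

def layer_module(fn, fifo_in, fifo_out):
    # Each structural layer is an independent module: it reads from its input
    # FIFO, computes, and writes to its output FIFO -- a software analogue of
    # the FIFO-registered interfaces between the hardware modules.
    while True:
        item = fifo_in.get()
        if item is None:            # end-of-stream token
            fifo_out.put(None)
            return
        fifo_out.put(fn(item))

fifo_a = queue.Queue(maxsize=4)     # FIFO depth is an illustrative assumption
fifo_b = queue.Queue(maxsize=4)
fifo_c = queue.Queue(maxsize=4)

# Two toy "layers" standing in for e.g. a convolution and a pooling module.
threading.Thread(target=layer_module, args=(lambda x: x * 2, fifo_a, fifo_b)).start()
threading.Thread(target=layer_module, args=(lambda x: x + 1, fifo_b, fifo_c)).start()

for block in range(8):              # blocks stream in while earlier ones are
    fifo_a.put(block)               # still being processed downstream
fifo_a.put(None)

while (result := fifo_c.get()) is not None:
    print(result)
```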
The softmax uses a function provided in the Catapult HLS library. Because the exponential function cannot be implemented in hardware directly, this function approximates it with several piecewise segments; tests show that the mean square error between the approximated and the true softmax is within an acceptable range. The 1 × 1 convolution uses softmax as its activation layer to classify the image, i.e., to decide whether further segmentation is needed.
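The library function itself is not reproduced in the patent; the sketch below only illustrates the stated idea of approximating the exponential with piecewise segments and measuring the mean square error against the exact softmax. The segment count and input range are assumptions.
```python
import numpy as np

# Piecewise-linear table for exp(x) on [-8, 0]; range and segment count are
# assumptions standing in for the Catapult HLS library's piecewise function.
KNOTS = np.linspace(-8.0, 0.0, 17)
VALUES = np.exp(KNOTS)

def exp_piecewise(x):
    """Hardware-friendly exp: linear interpolation between tabulated points."""
    return np.interp(np.clip(x, -8.0, 0.0), KNOTS, VALUES)

def softmax_piecewise(logits):
    z = logits - np.max(logits)   # shift so all inputs fall in the table range
    e = exp_piecewise(z)
    return e / e.sum()

logits = np.array([2.0, -1.5])
exact = np.exp(logits - logits.max())
exact /= exact.sum()
mse = np.mean((softmax_piecewise(logits) - exact) ** 2)
print(softmax_piecewise(logits), exact, mse)   # MSE stays small
```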
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims (10)

1. A method for video coding unit partitioning, the method comprising:
acquiring a video coding unit image and a segmentation label thereof, and constructing a training set;
constructing a two-classifier model based on a convolutional neural network;
training the two-classifier model using the training set;
and inputting the video coding unit image to be processed into the trained two-classifier model to obtain a segmentation result.
2. The method according to claim 1, wherein the convolutional neural network-based two-classifier model comprises three parallel convolutional layers, namely a first convolutional layer, a second convolutional layer and a third convolutional layer, each of which is connected to a pooling layer; the three pooling layers are all connected to a feature fusion layer, the feature fusion layer is connected to a fourth convolutional layer, and the fourth convolutional layer is connected to a Softmax layer.
3. The method of claim 2, wherein the input image size of the convolutional neural network-based two-classifier model is 64 × 64, 32 × 32 or 16 × 16, the first convolutional layer is a 7 × 3 convolutional layer, the second convolutional layer is a 3 × 3 convolutional layer, the third convolutional layer is a 3 × 7 convolutional layer, and the fourth convolutional layer is a 1 × 1 convolutional layer.
4. The method of claim 1, further comprising:
establishing the two-classifier model in Python based on the TensorFlow framework, and training it until convergence to obtain the model parameters.
5. The method as claimed in claim 1, wherein the segmentation labels of the video coding unit images in the training set are obtained through the original quadtree algorithm of HEVC.
6. A video coding unit partitioning system, the system comprising:
the training set constructing module is used for acquiring the video coding unit images and the segmentation labels thereof and constructing a training set;
the model construction module is used for constructing a two-classifier model based on a convolutional neural network;
a model training module for training the two-classifier model using the training set;
and the segmentation module is used for inputting the video coding unit image to be processed into the trained two-classifier model to obtain a segmentation result.
7. A hardware implementation method for implementing the convolutional neural network-based two-classifier model in the video coding unit segmentation method according to any one of claims 1 to 5 in hardware, the method comprising:
performing C synthesis on the trained convolutional neural network-based two-classifier model with an HLS tool to obtain a hardware circuit description file of the model.
8. The hardware implementation method of claim 7, wherein the HLS tool is Catapult HLS.
9. The hardware implementation method according to claim 7, wherein the method further comprises: connecting the structural layers in the convolutional neural network through interfaces.
10. The hardware implementation method according to claim 9, wherein each interface adopts a FIFO register to fully parallelize the data processing of each structural-layer module.
CN202010112709.6A 2020-02-24 2020-02-24 Video coding unit segmentation method, system and hardware implementation method Pending CN111405295A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010112709.6A CN111405295A (en) 2020-02-24 2020-02-24 Video coding unit segmentation method, system and hardware implementation method


Publications (1)

Publication Number Publication Date
CN111405295A true CN111405295A (en) 2020-07-10

Family

ID=71413185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010112709.6A Pending CN111405295A (en) 2020-02-24 2020-02-24 Video coding unit segmentation method, system and hardware implementation method

Country Status (1)

Country Link
CN (1) CN111405295A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105430396A (en) * 2015-12-15 2016-03-23 浙江大学 Video coding method capable of deciding sizes of coding blocks by means of classification
CN108495129A (en) * 2018-03-22 2018-09-04 北京航空航天大学 The complexity optimized method and device of block partition encoding based on deep learning method
CN108986124A (en) * 2018-06-20 2018-12-11 天津大学 In conjunction with Analysis On Multi-scale Features convolutional neural networks retinal vascular images dividing method
CN109714584A (en) * 2019-01-11 2019-05-03 杭州电子科技大学 3D-HEVC depth map encoding unit high-speed decision method based on deep learning
CN110378344A (en) * 2019-05-05 2019-10-25 北京交通大学 Convolutional neural networks multispectral image dividing method based on spectrum dimension switching network


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
QIN Huabiao, CAO Qinping: "Design of a Convolutional Neural Network Hardware Accelerator Based on FPGA", Journal of Electronics & Information Technology *

Similar Documents

Publication Publication Date Title
Wang et al. Towards analysis-friendly face representation with scalable feature and texture compression
CN111355956B (en) Deep learning-based rate distortion optimization rapid decision system and method in HEVC intra-frame coding
CN104581177B (en) Image compression method and device combining block matching and string matching
CN110675403B (en) Multi-instance image segmentation method based on coding auxiliary information
CN114286093A (en) Rapid video coding method based on deep neural network
CN112101262B (en) Multi-feature fusion sign language recognition method and network model
CN116600119B (en) Video encoding method, video decoding method, video encoding device, video decoding device, computer equipment and storage medium
CN109670506A (en) Scene Segmentation and system based on Kronecker convolution
WO2023174256A1 (en) Data compression method and related device
CN116824694A (en) Action recognition system and method based on time sequence aggregation and gate control transducer
Joy et al. Modelling of depth prediction algorithm for intra prediction complexity reduction
CN110677624A (en) Monitoring video-oriented foreground and background parallel compression method based on deep learning
CN111405295A (en) Video coding unit segmentation method, system and hardware implementation method
CN114501031B (en) Compression coding and decompression method and device
Ma et al. AFEC: adaptive feature extraction modules for learned image compression
CN115086660B (en) Decoding and encoding method based on point cloud attribute prediction, decoder and encoder
CN112770120B (en) 3D video depth map intra-frame rapid coding method based on depth neural network
Liu et al. Fast Intra Coding Algorithm for Depth Map with End-to-End Edge Detection Network
WO2022067776A1 (en) Point cloud decoding and encoding method, and decoder, encoder and encoding and decoding system
CN113240589A (en) Image defogging method and system based on multi-scale feature fusion
CN116634147B (en) HEVC-SCC intra-frame CU rapid partitioning coding method and device based on multi-scale feature fusion
CN116828184B (en) Video encoding method, video decoding method, video encoding device, video decoding device, computer equipment and storage medium
CN116600107B (en) HEVC-SCC quick coding method and device based on IPMS-CNN and spatial neighboring CU coding modes
CN113225552B (en) Intelligent rapid interframe coding method
Qin et al. A Complexity-Reducing HEVC Intra-Mode Method Based on VGGNet

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200710)