CN114827604A - Method and system for dividing a CTU (coding tree unit) within a high efficiency video coding frame - Google Patents

Method and system for dividing a CTU (coding tree unit) within a high efficiency video coding frame

Info

Publication number
CN114827604A
CN114827604A (application CN202210391905.0A)
Authority
CN
China
Prior art keywords
size
neural network
convolution
layer
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210391905.0A
Other languages
Chinese (zh)
Inventor
庞贵杰
原玲
晏陈旭
王耀葛
文瑞森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN202210391905.0A
Publication of CN114827604A
Pending legal status

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/119Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards


Abstract

The invention provides a method and a system for dividing a CTU (coding tree unit) within a high efficiency video coding frame, relates to the technical field of video coding unit partitioning, and solves the problems of high computational complexity and long coding time of current methods for intra-frame CTU partitioning in HEVC (High Efficiency Video Coding).

Description

Method and system for dividing a CTU (coding tree unit) within a high efficiency video coding frame
Technical Field
The present invention relates to the field of video coding unit partitioning technologies, and in particular, to a method and a system for intra-frame CTU partitioning in high efficiency video coding.
Background
The original coding tree unit (CTU) partitioning algorithm in High Efficiency Video Coding (HEVC) is as follows: one CTU includes one or more coding units (CUs), with four possible CU sizes, 64x64, 32x32, 16x16 and 8x8. The optimal partitioning mode of each CTU can be found by exhaustive quad-tree traversal of the CTU, which determines the optimal CU partition but greatly increases coding complexity. As requirements on video quality keep rising, video encoding and decoding on mobile devices face growing challenges; the encoding complexity therefore needs to be reduced through algorithmic optimization so that mobile devices can adopt HEVC at a lower cost.
Current fast CU optimization algorithms for HEVC fall into three categories: conventional methods, machine-learning-based methods and deep-learning-based methods. Conventional methods generally compute the mean and variance of pixels to decide the CU size in advance and reduce coding complexity; for example, Jae Myung Ha et al. proposed a texture-based fast CU size decision algorithm for HEVC intra-frame coding in which the texture features are the mean and variance of the pixels in the current CU block. Although this reduces coding complexity, the computation is heavy and the coding performance suffers considerably. Machine-learning-based methods are simple and direct: they extract a few effective image features to judge CU complexity. For example, Liu et al. built a three-class CU scale decision structure based on a support vector machine, classifying CUs as complex, homogeneous or uncertain; the decision quality is good, but extra computational complexity is introduced. Deep-learning-based methods reduce coding complexity efficiently by introducing a convolutional network structure to extract good feature values automatically. For example, Zaki et al. proposed a fast CTU partitioning framework based on a ResNet-18 convolutional network, with different network structures for the 64x64, 32x32 and 16x16 CU sizes. Although this greatly reduces coding complexity, the deep network layers increase the complexity of the network model, and the network for 64x64 CUs easily overfits during data training, which limits its performance.
An existing patent publication also discloses a method for intra-frame CTU partitioning in HEVC: a quad-tree neural network model processes a video frame to obtain the quad-tree structures of all CTUs of the whole frame, an optimized encoder then encodes and partitions all CTU quad-tree structures of the current frame, and the intra-frame CTUs of HEVC are partitioned according to the partitioning results, which strengthens the correlation between partitioning results, improves data-processing efficiency and reduces coding time.
Disclosure of Invention
In order to solve the problems of high computational complexity and long coding time of current intra-frame CTU partitioning methods in HEVC, the invention provides a method and a system for CTU partitioning within a high efficiency video coding frame.
In order to achieve the technical effects, the technical scheme of the invention is as follows:
a method of CTU partitioning within a high efficiency video coding frame, comprising the steps of:
s1, acquiring an image data set, and making video sets with different resolutions according to the image data set;
s2, setting different quantization parameters QP, and extracting four conditions of size division of video intra-frame Coding Units (CU) in a video set under the setting of the different quantization parameters QP, wherein the four conditions are respectively 64x 64-size CU, 32x 32-size CU, 16x 16-size CU and 8x 8-size CU;
s3, representing the texture complexity of the current CU with the size of 64x64 by using the average absolute deviation M, determining the relation between the condition that the CU with the size of 64x64 is split into the CU with the size of 32x32 under different quantization parameter QP settings and M, obtaining a division threshold T of the CU with the size of 64x64, and executing a step S4;
s4, setting an initial value of depth of the coding unit to be 0, judging whether M is smaller than or equal to T, if so, not dividing the CU with the size of 64x64, and outputting the CU with the size of 64x64 as the final CU size to determine; otherwise, add one to the depth of coding unit, partition the CU of 64x64 size into 4 CUs of 32x32 size, and perform step S5;
s5, constructing and training an A neural network, inputting each CU with the size of 32x32 into the trained A neural network, judging whether the CU with the size of 32x32 is divided or not by using the A neural network, if so, adding one to the depth of a coding unit, dividing the CU with the size of 32x32 into 4 CUs with the size of 16x16, and executing a step S6; otherwise, outputting the 32x32 size CU as a final CU size decision;
s6, constructing and training a B neural network, inputting each CU with the size of 16x16 into the trained B neural network, judging whether the CU with the size of 16x16 is divided or not by using the B neural network, if so, adding one to the depth of a coding unit, dividing the CU with the size of 16x16 into 4 CUs with the size of 8x8, and outputting the CU with the size of 8x8 as the final CU size for determination; otherwise, outputting the 16x16 size CU as a final CU size decision;
and S7, determining the final CU size obtained in S4-S6 to perform subsequent coding processing.
In this technical scheme, the texture complexity of the current 64x64-size CU is first represented by the average absolute deviation M, and the partitioning threshold T of the 64x64-size CU is obtained from the relation between M and whether the 64x64-size CU is split into 32x32-size CUs under different quantization parameter QP settings. Then, whether the 64x64-size CU is divided into 32x32-size CUs is decided by comparing M with T, so that the partitioning of 64x64-size CUs with simple texture can be terminated in advance, which reduces the computational complexity of the algorithm. Next, using the correlation between texture complexity and coding unit depth, an A neural network and a B neural network are constructed and trained: the 32x32-size CUs are input into the trained A neural network, which judges whether each 32x32-size CU is divided into 16x16-size CUs, and the 16x16-size CUs are input into the trained B neural network, which judges whether each 16x16-size CU is divided into 8x8-size CUs. Different methods are thus provided for the partitioning of different CU sizes: the 64x64-size CUs are judged with the partitioning threshold T, the 32x32-size CUs with the A neural network and the 16x16-size CUs with the B neural network, which ensures the accuracy of the partitioning decision at each CU size. Finally, the final CU size is determined from the partitioning results of the different CU sizes and used for subsequent coding, which reduces the computational complexity of HEVC intra-frame coding and shortens the coding time.
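To make the top-down decision flow of steps S4 to S6 concrete, the following Python sketch outlines the per-CTU cascade: a threshold test on the average absolute deviation at the 64x64 level, then the A and B networks at the 32x32 and 16x16 levels. The helper names (compute_mad, net_a, net_b, split_into_quadrants) and the single-channel luma input are assumptions introduced for illustration only; they are not part of the patented encoder.

```python
# Illustrative sketch of the CTU decision cascade (steps S4-S6).
# Assumptions: `ctu` is a 64x64 numpy array of luma samples, `T` is the
# QP-dependent threshold, and `net_a` / `net_b` are trained classifiers
# that return True when the input CU should be split further.
import numpy as np

def split_into_quadrants(block: np.ndarray):
    """Split a square block into its four equally sized sub-blocks."""
    h = block.shape[0] // 2
    return [block[:h, :h], block[:h, h:], block[h:, :h], block[h:, h:]]

def decide_ctu_partition(ctu: np.ndarray, T: float, net_a, net_b, compute_mad):
    """Return a list of (depth, cu_block) decisions for one 64x64 CTU."""
    decisions = []
    if compute_mad(ctu) <= T:                  # step S4: simple texture, stop at 64x64
        return [(0, ctu)]
    for cu32 in split_into_quadrants(ctu):     # depth 1: four 32x32 CUs
        if not net_a(cu32):                    # step S5: A network decides 32x32 split
            decisions.append((1, cu32))
            continue
        for cu16 in split_into_quadrants(cu32):   # depth 2: four 16x16 CUs
            if not net_b(cu16):                # step S6: B network decides 16x16 split
                decisions.append((2, cu16))
            else:
                decisions.extend((3, cu8) for cu8 in split_into_quadrants(cu16))
    return decisions
```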
Preferably, in step S1, the acquired image data set is a RAISE ultra high definition image set, and four YUV format video sets with different resolutions are made from the RAISE ultra high definition image set, wherein the resolutions are 4928x3264, 2560x1600, 1536x1024, and 704x576 respectively.
Preferably, in step S3, different quantization parameter QP settings correspond to different thresholds; 4 video sequences with different resolutions and different video contents are selected, and intra-frame coding is performed with the 4 quantization parameter QP values, so as to obtain 16 video coding results;
the average absolute deviation of the 64x64 size CU is divided into a global average absolute deviation, a row pixel average absolute deviation, and a column pixel average absolute deviation, and the global average absolute deviation calculation formula of the 64x64 size CU is as follows:
Figure BDA0003597274490000031
wherein MAD is expressed as the global mean absolute deviation of a 64X64 size CU, X is expressed as the pixel value per row, Y is expressed as the pixel value per column, p (X, Y) is expressed as the pixel value at (X, Y), mean is expressed as the mean of all pixels in a 64X64 size CU;
the line pixel mean absolute value deviation calculation formula is as follows:
Figure BDA0003597274490000032
wherein the MAD H Expressed as mean absolute deviation of the row pixels, mean y Expressed as the mean of all the rows of pixels in a 64x64 size CU;
the column pixel mean absolute value deviation calculation formula is as follows:
Figure BDA0003597274490000041
wherein the MAD V Expressed as mean absolute deviation, mean, of the column pixels x Expressed as the mean of all column pixels in a 64x64 size CU; the mean absolute value deviation formula for the final 64x64 size CU is as follows:
M=min(MAD,MAD H ,MAD V )
where M is expressed as the mean absolute deviation in the final 64x64 size CU;
when a CU with the size of 64x64 is not divided, texture of the CU area is smooth in the horizontal direction and the vertical direction, when the CU with the size of 64x64 is divided, texture of the CU area is complex in the horizontal direction and the vertical direction, whether the CU with the size of 64x64 is divided or not is judged by using the average absolute value deviation of the CU with the size of 64x64, a critical state of the average absolute deviation value between the divided CU and the non-divided CU is obtained, the size of the average absolute deviation value when the CU with the size of 64x64 is not divided and the size of the average absolute deviation value when the CU with the size of 64x64 are confirmed, a numerical value between the average absolute deviation value and the divided CU is used as a current selection threshold, the division condition of the CU with the size of 64x64 and the standard comparison result are different and is used as an experimental error, the current selection threshold is +/-0.05, and the value with the minimum experimental error is selected as a division threshold T; the standard is the division condition of the CU with the size of 64x64 when the video is encoded by using the HEVC standard reference software HM16.20, which can improve the judgment accuracy of the division threshold T, and when the division threshold T is compared with the average absolute deviation M, the division condition of the CU with the size of 64x64 can be more accurate.
Preferably, in step S5, the a neural network includes a first volume block, a second volume block, a third volume block, a fourth volume block, a fifth volume block, a full connection layer and an output layer, which are connected in sequence; the first convolution block consists of one convolution layer, each of the second convolution block, the third convolution block, the fourth convolution block and the fifth convolution block consists of two convolution layers, and the two convolution layers have the same parameter setting; the first convolution block is set to 64 convolution kernels, and the size of each convolution kernel is 7x 7; the second convolution block is set to 64 convolution kernels, and the size of each convolution kernel is 3x 3; the third convolution block is set to 128 convolution kernels, and the size of each convolution kernel is 3x 3; the fourth convolution block is set to 256 convolution kernels, and the size of each convolution kernel is 3x 3; the fifth convolution block is set to 512 convolution kernels, and the size of the convolution kernels is 3x 3;
the full connection layer comprises two hidden layers, the output layer is activated by adopting a SoftMax function, so that the output value of the output layer is mapped between (0 and 1), the output value of the output layer is the probability of splitting the CU with the size of 32x32, the class labels of the output value of the output layer are '0' and '1', the '0' indicates no splitting, the '1' indicates splitting, the corresponding class label with the maximum output value probability is selected as a prediction result, if the prediction result is splitting, the depth of a coding unit is increased by one, and the CU with the size of 32x32 is split into 4 CUs with the size of 16x 16; otherwise, the 32x32 size CU is not split.
In the method, by exploiting the correlation between texture complexity and coding unit depth through deep learning, the A neural network is introduced to predict whether a 32x32-size CU is divided, which avoids using a threshold again to judge the division of 32x32-size CUs and ensures the accuracy of the judgment in texture-complex regions of 32x32-size CUs.
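The following PyTorch sketch mirrors the A-network layer description above (one 7x7 convolution block followed by four double-convolution blocks with 64/128/256/512 kernels, two fully connected hidden layers and a SoftMax output). The strides, paddings, hidden-layer widths and the single-channel 32x32 luma input are assumptions made for illustration; they are not specified in the description.

```python
import torch
import torch.nn as nn

def double_conv(in_ch: int, out_ch: int) -> nn.Sequential:
    """Two 3x3 convolutions with identical parameter settings (stride/padding assumed)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1), nn.ReLU(inplace=True),
    )

class ANet(nn.Module):
    """Split/no-split classifier for 32x32 CUs (kernel counts follow the description)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(inplace=True),  # block 1
            double_conv(64, 64),    # block 2
            double_conv(64, 128),   # block 3
            double_conv(128, 256),  # block 4
            double_conv(256, 512),  # block 5
        )
        self.classifier = nn.Sequential(        # two hidden layers, then 2-class output
            nn.Flatten(),
            nn.Linear(512, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 64), nn.ReLU(inplace=True),
            nn.Linear(64, 2),                   # logits; SoftMax applied at inference
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Example: probability that each 32x32 luma CU in a batch should be split.
probs = torch.softmax(ANet()(torch.rand(8, 1, 32, 32)), dim=1)[:, 1]
```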
Preferably, in step S5, the video set is randomly divided into a training set of 90%, a validation set of 5%, and a test set of 5%, the a neural network is trained by the training set, the a neural network is validated by the validation set, and the a neural network is tested by the test set;
loss function (L) used in training A neural network f ) For categorical cross entropy, the formula is as follows:
Figure BDA0003597274490000051
wherein N is the number of CU blocks in the neural network A,
Figure BDA0003597274490000052
for the ith prediction of the a neural network model,
Figure BDA0003597274490000053
predicting a corresponding target for the ith of the A neural network model; and testing the trained A neural network by using the test set, setting a prediction accuracy threshold, and stopping training when the prediction accuracy of the A neural network is greater than the prediction accuracy threshold when the A neural network is tested by using the test set to obtain the trained A neural network. Using Loss function (L) f ) Training the A neural network, wherein the value of the Loss function reflects the difference between the A neural network and actual data, and the lower the Loss value (Loss value), the smaller the difference between the current predicted value and the label value is, so that code modification and subsequent optimization are facilitated.
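A hedged training-loop sketch with the categorical cross entropy and the accuracy-based stopping rule described above follows; the optimizer, learning rate, epoch limit and DataLoader objects are assumptions, and ANet refers to the sketch given earlier.

```python
import torch
import torch.nn as nn

def train_until_accurate(model, train_loader, test_loader, acc_threshold=0.78,
                         max_epochs=100, lr=1e-3):
    """Train with cross entropy; stop once test accuracy exceeds acc_threshold."""
    criterion = nn.CrossEntropyLoss()            # categorical cross entropy L_f
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(max_epochs):
        model.train()
        for cu_blocks, labels in train_loader:   # labels: 0 = no split, 1 = split
            optimizer.zero_grad()
            loss = criterion(model(cu_blocks), labels)
            loss.backward()
            optimizer.step()
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for cu_blocks, labels in test_loader:
                preds = model(cu_blocks).argmax(dim=1)
                correct += (preds == labels).sum().item()
                total += labels.numel()
        if correct / total > acc_threshold:      # prediction accuracy threshold reached
            break
    return model
```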
Preferably, in step S6, the B neural network includes a first convolution layer, a second convolution layer, a pooling layer, a third convolution layer, a fourth convolution layer, a fully connected layer and an output layer; the first convolution layer is set to 32 convolution kernels with a kernel size of 3x3; the second convolution layer is set to 64 convolution kernels with a kernel size of 3x3; the pooling layer performs an AvgPool operation with a pooling kernel size of 2x2; the third convolution layer is set to 64 convolution kernels with a kernel size of 2x2; the fourth convolution layer is set to 128 convolution kernels with a kernel size of 2x2;
the fully connected layer includes two hidden layers, and dropout with a probability of 50% is applied between the second hidden layer and the output layer; the output layer is activated with the Sigmoid function so that its output value lies in (0, 1); the output value of the output layer is the probability that the 16x16-size CU is split; if the split probability is greater than 50%, the coding unit depth is increased by one and the 16x16-size CU is split into 4 8x8-size CUs; otherwise, the 16x16-size CU is not split.
In the method, by exploiting the correlation between texture complexity and coding unit depth through deep learning, the B neural network is introduced to predict whether a 16x16-size CU is divided, which avoids using a threshold again to judge the division of 16x16-size CUs and ensures the accuracy of the judgment in texture-complex regions of 16x16-size CUs.
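A PyTorch sketch of a B-network following the layer description above (32 and 64 3x3 convolutions, a 2x2 average pooling, 64 and 128 2x2 convolutions, two fully connected hidden layers with 50% dropout before the output, and a Sigmoid output) is shown below. Paddings, hidden-layer widths and the single-channel 16x16 input are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class BNet(nn.Module):
    """Split/no-split classifier for 16x16 CUs (kernel counts follow the description)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),   # 32 kernels, 3x3
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),  # 64 kernels, 3x3
            nn.AvgPool2d(kernel_size=2),                                         # AvgPool, 2x2
            nn.Conv2d(64, 64, kernel_size=2), nn.ReLU(inplace=True),             # 64 kernels, 2x2
            nn.Conv2d(64, 128, kernel_size=2), nn.ReLU(inplace=True),            # 128 kernels, 2x2
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 6 * 6, 256), nn.ReLU(inplace=True),  # hidden layer 1
            nn.Linear(256, 64), nn.ReLU(inplace=True),           # hidden layer 2
            nn.Dropout(p=0.5),                                   # 50% dropout before output
            nn.Linear(64, 1), nn.Sigmoid(),                      # split probability in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Split a 16x16 CU when the predicted probability exceeds 50%.
split = BNet()(torch.rand(1, 1, 16, 16)).item() > 0.5
```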
Preferably, in step S6, the video set is randomly divided into a training set of 90%, a validation set of 5% and a test set of 5%; the B neural network is trained with the training set, validated with the validation set and tested with the test set;
the loss function formula used in training the B neural network is as follows:
Figure BDA0003597274490000061
wherein n is the number of CU blocks in the training B neural network, x [ i ] is the ith target value of the B neural network model, and y [ i ] is the ith predicted value of the B neural network model; and testing the trained B neural network by using the test set, setting a prediction accuracy threshold, and stopping training when the test set is used for testing the B neural network and the prediction accuracy of the B neural network is greater than the prediction accuracy threshold to obtain the trained B neural network.
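Because the B network emits a single Sigmoid probability, its loss in the form above matches the binary cross entropy, for example PyTorch's BCELoss; the snippet below is a sketch with the batch contents assumed.

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()                    # -(1/n) * sum(x[i]*log(y[i]) + (1-x[i])*log(1-y[i]))
targets = torch.tensor([[1.0], [0.0]])      # x[i]: 1 = split, 0 = no split
predictions = torch.tensor([[0.8], [0.3]])  # y[i]: Sigmoid outputs of the B network
loss = criterion(predictions, targets)
```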
Preferably, the subsequent encoding process is performed in HEVC standard reference software HM 16.20.
The present invention also proposes a system for efficient video coding intra-frame CTU partitioning, said system comprising:
the data set production module is used for collecting an image data set and producing video sets with different resolutions according to the image data set;
the coding unit dividing module is used for setting different quantization parameters QP and extracting four conditions of size division of a video intra-frame coding unit CU in a video set under the setting of the different quantization parameters QP, wherein the four conditions are respectively 64x64 size CU, 32x32 size CU, 16x16 size CU and 8x8 size CU;
the partitioning threshold determining module is used for representing the texture complexity of the current CU with the size of 64x64 by using the average absolute deviation M, determining the relation between the condition that the CU with the size of 64x64 is split into the CU with the size of 32x32 under different quantization parameter QP settings and M, and obtaining the partitioning threshold T of the CU with the size of 64x 64;
a judging module, configured to set an initial value of depth of the coding unit to 0, judge whether M is less than or equal to T, if yes, not split the 64x 64-sized CU, and output the 64x 64-sized CU as a final CU size decision; otherwise, adding one to the coding unit depth, dividing the 64x64 size CU into 4 32x32 size CUs;
the A neural network building and processing module is used for building and training an A neural network, inputting each CU with the size of 32x32 into the trained A neural network, judging whether the CU with the size of 32x32 is divided or not by using the A neural network, if so, adding one to the depth of a coding unit, and dividing the CU with the size of 32x32 into 4 CUs with the size of 16x 16; otherwise, outputting the 32x32 size CU as a final CU size decision;
the B neural network construction processing module is used for constructing and training a B neural network, inputting each CU with the size of 16x16 into the trained B neural network, judging whether the CU with the size of 16x16 is divided or not by using the B neural network, if so, adding one to the depth of a coding unit, dividing the CU with the size of 16x16 into 4 CUs with the size of 8x8, and outputting the CU with the size of 8x8 as the final CU size for determination; otherwise, outputting the 16x16 size CU as a final CU size decision;
and the coding processing module is used for determining the final CU size obtained in the judging module, the A neural network construction processing module and the B neural network construction processing module to perform subsequent coding processing.
Preferably, the encoding processing module is encapsulated in HEVC standard reference software HM 16.20.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
compared with the CTU dividing method in the current HEVC, the method provided by the invention obtains the dividing threshold T of the CU with the size of 64x64 based on the relation between the condition that the CU with the size of 64x64 is divided into the CU with the size of 32x32 under different quantization parameters QP setting and the average absolute deviation M; then, whether the CU with the size of 64x64 is divided into the CU with the size of 32x32 is determined by using the size comparison relation between M and T, the CU with the size of 64x64 and with simple texture can be lifted up to be stopped from being divided, and the calculation complexity of an algorithm is reduced; secondly, respectively constructing and training an A neural network and a B neural network by utilizing the correlation of texture complexity and coding unit depth, inputting the training A neural network into CU with the size of 32x32, judging whether the CU with the size of 32x32 is divided into CU with the size of 16x16 by utilizing the A neural network, inputting the CU with the size of 16x16 into the training B neural network, judging whether the CU with the size of 16x16 is divided into CU with the size of 8x8 by utilizing the B neural network, providing different methods for the dividing conditions of different CU sizes, judging the CU with the size of 64x64 by utilizing a dividing threshold T, judging the CU with the size of 32x32 by utilizing the A neural network, judging the CU with the size of 16x16 by utilizing the B neural network, and ensuring the accuracy of the dividing conditions of different CU sizes; and finally, determining the final CU size according to the segmentation conditions of different CU sizes to perform subsequent coding processing, so that the computational complexity of HEVC intra-frame coding is reduced, and the coding time is shortened.
Drawings
Fig. 1 is a flowchart illustrating a method for efficient video coding intra CTU partitioning according to embodiment 1 of the present invention;
fig. 2 is a graph of the threshold values of a 64x64 CU under different quantization parameter QP settings according to embodiment 1 of the present invention;
FIG. 3 is a diagram showing a structure of an A neural network proposed in embodiment 2 of the present invention;
FIG. 4 is a diagram showing relevant parameters of the A neural network proposed in embodiment 2 of the present invention;
FIG. 5 is a diagram showing a structure of a B neural network proposed in embodiment 3 of the present invention;
fig. 6 is a block diagram of a system for efficient video coding intra-frame CTU partitioning according to embodiment 4 of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for better illustration of the embodiment, some parts in the drawings may be omitted, enlarged or reduced, and do not represent actual dimensions, and the description of directions of the parts such as "up" and "down" is not limited to the patent;
it will be understood by those skilled in the art that certain well-known descriptions of the figures may be omitted;
the positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
the technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
Fig. 1 is a flowchart illustrating a method for efficient video coding intra-frame CTU partitioning according to an embodiment of the present invention, including the following steps:
s1, acquiring an image data set, and making video sets with different resolutions according to the image data set;
there are various methods for making video sets, in this embodiment, the acquired image data set is a RAISE ultra high definition image set, and four YUV format video sets with different resolutions are made by the RAISE ultra high definition image set, and the resolutions are 4928x3264, 2560x1600, 1536x1024, and 704x576 respectively; selecting a plurality of 4928x3264 ultra-high-definition images, then downsampling partial photos into 2560x1600, 1536x1024 and 704x576 resolutions, and randomly dividing a YUV format video set of each resolution into a 90% training set, a 5% verification set and a 5% test set; for the HEVC reference software HM16.20, the All _ intra configuration is used, the quantization parameters QP of the All _ intra configuration are set to 22, 27, 32, and 37, and the four coding unit partition cases at different QP settings are extracted, thereby constructing a data set.
S2, setting quantization parameters QP to be 22, 27, 32 and 37, and extracting four cases of size division of video intra-frame coding units CU in the video set with the quantization parameters QP of 22, 27, 32 and 37, wherein the four cases are respectively 64x64 size CU, 32x32 size CU, 16x16 size CU and 8x8 size CU;
s3, representing the texture complexity of the current CU with the size of 64x64 by using the average absolute deviation M, determining the relation between the condition that the CU with the size of 64x64 is split into the CU with the size of 32x32 under the condition that the quantization parameter QP is 22, 27, 32 and 37, obtaining the splitting threshold T of the CU with the size of 64x64, and executing the step S4;
referring to fig. 2, when the quantization parameter QP is 22, the corresponding threshold value is set to 3.112, when the quantization parameter QP is 27, the corresponding threshold value is set to 3.592, when the quantization parameter QP is 32, the corresponding threshold value is set to 4.056, when the quantization parameter QP is 37, the corresponding threshold value is set to 4.356, video sequences with 4928x3264, 2560x1600, 1536x1024, and 704x576 resolutions and different video contents are selected, and intra-frame coding is performed using 4 quantization parameter QP values: 22. 27, 32 and 37, obtaining 16 video coding results;
the average absolute deviation of the 64x64-size CU is divided into a global average absolute deviation, a row-pixel average absolute deviation and a column-pixel average absolute deviation, and the global average absolute deviation of the 64x64-size CU is calculated as follows:

MAD = (1 / (X * Y)) * Σ_{x=1..X} Σ_{y=1..Y} |p(x, y) - mean|

wherein MAD is the global average absolute deviation of the 64x64-size CU, X is the number of pixels per row, Y is the number of pixels per column, p(x, y) is the pixel value at (x, y), and mean is the mean of all pixels in the 64x64-size CU;

the row-pixel average absolute deviation is calculated as follows:

MAD_H = (1 / (X * Y)) * Σ_{y=1..Y} Σ_{x=1..X} |p(x, y) - mean_y|

wherein MAD_H is the row-pixel average absolute deviation and mean_y is the mean of the pixels in row y of the 64x64-size CU;

the column-pixel average absolute deviation is calculated as follows:

MAD_V = (1 / (X * Y)) * Σ_{x=1..X} Σ_{y=1..Y} |p(x, y) - mean_x|

wherein MAD_V is the column-pixel average absolute deviation and mean_x is the mean of the pixels in column x of the 64x64-size CU; the final average absolute deviation of the 64x64-size CU is:

M = min(MAD, MAD_H, MAD_V)

where M is the average absolute deviation of the final 64x64-size CU;
when a CU with the size of 64x64 is not divided, texture of the CU area is smooth in the horizontal direction and the vertical direction, when the CU with the size of 64x64 is divided, texture of the CU area is complex in the horizontal direction and the vertical direction, whether the CU with the size of 64x64 is divided or not is judged by using the average absolute value deviation of the CU with the size of 64x64, a critical state of the average absolute deviation value between the divided CU and the non-divided CU is obtained, the size of the average absolute deviation value when the CU with the size of 64x64 is not divided and the size of the average absolute deviation value when the CU with the size of 64x64 are confirmed, a numerical value between the average absolute deviation value and the divided CU is used as a current selection threshold, the division condition of the CU with the size of 64x64 and the standard comparison result are different and is used as an experimental error, the current selection threshold is +/-0.05, and the value with the minimum experimental error is selected as a division threshold T; the standard is the 64x64CU partition case when encoding video using the HEVC standard reference software HM 16.20. The accuracy of determination of the division threshold T is improved, and when the division threshold T is compared with the average absolute deviation M, the division of the CU of size 64 × 64 can be made more accurate.
With QP set to 32, 5 video test sequences with complex textures and 5 with simple textures are selected; the accuracy of the threshold-based decision is 73.2% on the video sequences with complex textures and 78.65% on the video sequences with simpler textures.
S4, setting an initial value of depth of the coding unit to be 0, judging whether M is smaller than or equal to T, if so, not dividing the CU with the size of 64x64, setting the depth value of the current coding unit to be 0, and outputting the CU with the size of 64x64 as the final CU size; otherwise, adding one to the depth of the coding unit, changing the depth value of the current coding unit to 1, dividing the CU of 64x64 size into 4 CUs of 32x32 size, and executing step S5;
S5, constructing and training an A neural network, inputting each 32x32-size CU into the trained A neural network, and judging whether the 32x32-size CU is divided by using the A neural network; if so, 1 is first added to the coding unit depth of 1 so that the depth becomes 2, the 32x32-size CU is divided into 4 16x16-size CUs, the same division is applied to the remaining 3 32x32-size CUs, the CUs divided to the 16x16 size are input into the B network, and step S6 is executed; otherwise, the current coding unit depth remains 1, and the 32x32-size CU is directly output as the final CU size decision.
S6, constructing and training a B neural network, inputting each 16x16-size CU into the trained B neural network, and judging whether the 16x16-size CU is divided by using the B neural network; if so, 1 is first added to the coding unit depth of 2 so that the depth becomes 3, the 16x16-size CU is divided into 4 8x8-size CUs, the same division is applied to the remaining 3 16x16-size CUs, and the CUs divided to the 8x8 size are directly output as the final CU size decision; otherwise, the current coding unit depth remains 2, and the 16x16-size CU is directly output as the final CU size decision.
And S7, determining the final CU size obtained in S4-S6 to perform subsequent coding processing, wherein the subsequent coding processing is performed in HEVC standard reference software HM 16.20.
In this technical scheme, the texture complexity of the current 64x64-size CU is first represented by the average absolute deviation M, and the partitioning threshold T of the 64x64-size CU is obtained from the relation between M and whether the 64x64-size CU is split into 32x32-size CUs under different quantization parameter QP settings. Then, whether the 64x64-size CU is divided into 32x32-size CUs is decided by comparing M with T, so that the partitioning of 64x64-size CUs with simple texture can be terminated in advance, which reduces the computational complexity of the algorithm. Next, using the correlation between texture complexity and coding unit depth, an A neural network and a B neural network are constructed and trained: the 32x32-size CUs are input into the trained A neural network, which judges whether each 32x32-size CU is divided into 16x16-size CUs, and the 16x16-size CUs are input into the trained B neural network, which judges whether each 16x16-size CU is divided into 8x8-size CUs. Different methods are thus provided for the partitioning of different CU sizes: the 64x64-size CUs are judged with the partitioning threshold T, the 32x32-size CUs with the A neural network and the 16x16-size CUs with the B neural network, which ensures the accuracy of the partitioning decision at each CU size. Finally, the final CU size is determined from the partitioning results of the different CU sizes and used for subsequent coding, which reduces the computational complexity of HEVC intra-frame coding and shortens the coding time.
Example 2
Fig. 3 is a diagram illustrating the A neural network structure proposed in an embodiment of the present invention. As shown in fig. 3, the A neural network includes a first convolution block, a second convolution block, a third convolution block, a fourth convolution block, a fifth convolution block, a fully connected layer and an output layer, which are connected in sequence; the first convolution block consists of one convolution layer, each of the second, third, fourth and fifth convolution blocks consists of two convolution layers, and the two convolution layers have the same parameter setting; referring to fig. 4, the first convolution block is set to 64 convolution kernels with a kernel size of 7x7; the second convolution block is set to 64 convolution kernels with a kernel size of 3x3; the third convolution block is set to 128 convolution kernels with a kernel size of 3x3; the fourth convolution block is set to 256 convolution kernels with a kernel size of 3x3; the fifth convolution block is set to 512 convolution kernels with a kernel size of 3x3;
the fully connected layer includes two hidden layers, and the output layer is activated with the SoftMax function so that its output values are mapped into (0, 1); the output value of the output layer is the probability that the 32x32-size CU is split, with class labels '0' and '1', where '0' indicates no splitting and '1' indicates splitting; the class label with the largest output probability is selected as the prediction result; if the prediction result is splitting, the coding unit depth is increased by one and the 32x32-size CU is split into 4 16x16-size CUs; otherwise, the 32x32-size CU is not split. By exploiting the correlation between texture complexity and coding unit depth through deep learning, the A neural network is introduced to predict whether a 32x32-size CU is divided, which avoids using a threshold again to judge the division of 32x32-size CUs and ensures the accuracy of the judgment in texture-complex regions of 32x32-size CUs; the test-set accuracy of the A neural network in this embodiment is 78.32%.
Referring to fig. 1, in step S1 the video set is randomly divided into a training set of 90%, a validation set of 5% and a test set of 5%; in step S5 the A neural network is trained with the training set, validated with the validation set and tested with the test set. The loss function (L_f) used to train the A neural network is the categorical cross entropy, given by:

L_f = -(1/N) * Σ_{i=1..N} y_i * log(ŷ_i)

wherein N is the number of CU blocks input to the A neural network, ŷ_i is the ith prediction of the A neural network model, and y_i is the target corresponding to the ith prediction. The trained A neural network is tested with the test set: a prediction accuracy threshold is set, and training stops when the prediction accuracy of the A neural network on the test set is greater than the prediction accuracy threshold, yielding the trained A neural network. The A neural network is trained with the loss function (L_f); the value of the loss function reflects the difference between the A neural network and the actual data, and the lower the loss value, the smaller the difference between the current predicted value and the label value, which facilitates code modification and subsequent optimization.
Example 3
As shown in fig. 5, the B neural network includes a first convolution layer, a second convolution layer, a pooling layer, a third convolution layer, a fourth convolution layer, a fully connected layer and an output layer; the first convolution layer is set to 32 convolution kernels with a kernel size of 3x3; the second convolution layer is set to 64 convolution kernels with a kernel size of 3x3; the pooling layer performs an AvgPool operation with a pooling kernel size of 2x2; the third convolution layer is set to 64 convolution kernels with a kernel size of 2x2; the fourth convolution layer is set to 128 convolution kernels with a kernel size of 2x2;
the fully connected layer includes two hidden layers, and dropout with a probability of 50% is applied between the second hidden layer and the output layer; the output layer is activated with the Sigmoid function so that its output value lies in (0, 1); the output value of the output layer is the probability that the 16x16-size CU is split; if the split probability is greater than 50%, the coding unit depth is increased by one and the 16x16-size CU is split into 4 8x8-size CUs; otherwise, the 16x16-size CU is not split. By exploiting the correlation between texture complexity and coding unit depth through deep learning, the B neural network is introduced to predict whether a 16x16-size CU is divided, which avoids using a threshold again to judge the division of 16x16-size CUs and ensures the accuracy of the judgment in texture-complex regions of 16x16-size CUs; the test-set accuracy of the B neural network in this embodiment is 72.16%.
Referring to fig. 1, in step S1, the video set is randomly divided into a training set of 90%, a validation set of 5%, and a test set of 5%, the B neural network is trained through the training set, validated through the validation set, and tested through the test set;
loss function (L) used in training B neural network f ) For categorical cross entropy, the formula is as follows:
Figure BDA0003597274490000123
wherein n is the number of CU blocks in the training B neural network, x [ i ] is the ith target value of the B neural network model, and y [ i ] is the ith predicted value of the B neural network model; and testing the trained B neural network by using the test set, setting a prediction accuracy threshold, and stopping training when the test set is used for testing the B neural network and the prediction accuracy of the B neural network is greater than the prediction accuracy threshold to obtain the trained B neural network.
Example 4
As shown in FIG. 6, a system for efficient video coding intra CTU partitioning includes
A data set creating module 11, configured to collect an image data set, and create video sets with different resolutions according to the image data set;
the coding unit dividing module 12 is configured to set different quantization parameters QP, and extract the four cases of size division of video intra-frame coding units CU in the video set under different quantization parameter QP settings, which are respectively a 64x64-sized CU, a 32x32-sized CU, a 16x16-sized CU, and an 8x8-sized CU;
the partitioning threshold determining module 13 is configured to represent the texture complexity of a current CU with a size of 64x64 by using the average absolute deviation M, determine a relationship between a situation that the CU with the size of 64x64 is split into the CU with the size of 32x32 under different quantization parameter QP settings and the average absolute deviation M, and obtain a partitioning threshold T of the CU with the size of 64x 64;
a determining module 14, configured to set an initial value of depth of the coding unit to 0, determine whether M is less than or equal to T, if M is less than or equal to T, then the 64x 64-sized CU is not divided, and output the 64x 64-sized CU as a final CU size decision; otherwise, adding one to the coding unit depth, dividing the 64x64 size CU into 4 32x32 size CUs;
the A neural network construction processing module 15 is used for constructing and training an A neural network, inputting each CU with the size of 32x32 into the trained A neural network, judging whether the CU with the size of 32x32 is divided or not by using the A neural network, if so, adding one to the depth of a coding unit, and dividing the CU with the size of 32x32 into 4 CUs with the size of 16x 16; otherwise, outputting the 32x32 size CU as a final CU size decision;
the B neural network construction processing module 16 is used for constructing and training a B neural network, inputting each CU with the size of 16x16 into the trained B neural network, judging whether the CU with the size of 16x16 is divided or not by using the B neural network, if so, adding one to the depth of a coding unit, dividing the CU with the size of 16x16 into 4 CUs with the size of 8x8, and outputting the CU with the size of 8x8 as the final CU size for determination; otherwise, outputting the 16x16 size CU as a final CU size decision;
and the coding processing module 17 is configured to determine the final CU size obtained in the judging module, the A neural network construction processing module and the B neural network construction processing module and to perform subsequent coding processing; the coding processing module is encapsulated in the HEVC standard reference software HM16.20.
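A minimal sketch of how modules 11 to 17 could be wired together follows; the class and method names are illustrative assumptions, not the implementation of the system.

```python
from dataclasses import dataclass

@dataclass
class CtuPartitionSystem:
    """Modules 11-17 wired into one pipeline (names are illustrative)."""
    dataset_module: object        # 11: builds multi-resolution video sets
    cu_partition_module: object   # 12: extracts CU partitions under each QP
    threshold_module: object      # 13: derives the 64x64 threshold T
    judge_module: object          # 14: threshold test on M for 64x64 CUs
    a_net_module: object          # 15: 32x32 split decision (A network)
    b_net_module: object          # 16: 16x16 split decision (B network)
    encoder_module: object        # 17: final encoding inside HM16.20

    def partition_ctu(self, ctu, qp):
        T = self.threshold_module.threshold_for(qp)
        decisions = self.judge_module.decide(ctu, T, self.a_net_module, self.b_net_module)
        return self.encoder_module.encode(decisions)
```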
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (10)

1. A method of CTU partitioning within a high efficiency video coding frame, comprising the steps of:
s1, acquiring an image data set, and making video sets with different resolutions according to the image data set;
s2, setting different quantization parameters QP, and extracting four conditions of size division of video intra-frame Coding Units (CU) in a video set under the setting of the different quantization parameters QP, wherein the four conditions are respectively 64x 64-size CU, 32x 32-size CU, 16x 16-size CU and 8x 8-size CU;
s3, representing the texture complexity of the current CU with the size of 64x64 by using the average absolute deviation M, determining the relation between the condition that the CU with the size of 64x64 is split into the CU with the size of 32x32 under different quantization parameter QP settings and M, obtaining a division threshold T of the CU with the size of 64x64, and executing a step S4;
s4, setting an initial value of depth of the coding unit to be 0, judging whether M is smaller than or equal to T, if so, not dividing the CU with the size of 64x64, and outputting the CU with the size of 64x64 as the final CU size to determine; otherwise, add one to the depth of coding unit, partition the CU of 64x64 size into 4 CUs of 32x32 size, and perform step S5;
s5, constructing and training an A neural network, inputting each CU with the size of 32x32 into the trained A neural network, judging whether the CU with the size of 32x32 is divided or not by using the A neural network, if so, adding one to the depth of a coding unit, dividing the CU with the size of 32x32 into 4 CUs with the size of 16x16, and executing a step S6; otherwise, outputting the 32x32 size CU as a final CU size decision;
S6, constructing and training a B neural network, inputting each 16x16-size CU into the trained B neural network, judging whether the 16x16-size CU is divided by using the B neural network, if so, adding one to the coding unit depth, dividing the 16x16-size CU into 4 8x8-size CUs, and outputting the 8x8-size CUs as the final CU size decision; otherwise, outputting the 16x16-size CU as the final CU size decision;
and S7, determining the final CU size obtained in S4-S6 to perform subsequent coding processing.
2. The method of claim 1, wherein in step S1, the collected image data set is a RAISE super high definition image set, and four YUV format video sets with different resolutions are produced from the RAISE super high definition image set, and the resolutions are 4928x3264, 2560x1600, 1536x1024 and 704x576 respectively.
3. The method of claim 2, wherein in step S3, different quantization parameter QP settings correspond to different thresholds, and 4 video sequences with different resolutions and different video contents are selected, and the intra-frame coding uses 4 quantization parameter QP values to obtain 16 video coding results;
the average absolute deviation of the 64x64-size CU is divided into a global average absolute deviation, a row-pixel average absolute deviation and a column-pixel average absolute deviation, and the global average absolute deviation of the 64x64-size CU is calculated as follows:

MAD = (1 / (X * Y)) * Σ_{x=1..X} Σ_{y=1..Y} |p(x, y) - mean|

wherein MAD is the global average absolute deviation of the 64x64-size CU, X is the number of pixels per row, Y is the number of pixels per column, p(x, y) is the pixel value at (x, y), and mean is the mean of all pixels in the 64x64-size CU;

the row-pixel average absolute deviation is calculated as follows:

MAD_H = (1 / (X * Y)) * Σ_{y=1..Y} Σ_{x=1..X} |p(x, y) - mean_y|

wherein MAD_H is the row-pixel average absolute deviation and mean_y is the mean of the pixels in row y of the 64x64-size CU;

the column-pixel average absolute deviation is calculated as follows:

MAD_V = (1 / (X * Y)) * Σ_{x=1..X} Σ_{y=1..Y} |p(x, y) - mean_x|

wherein MAD_V is the column-pixel average absolute deviation and mean_x is the mean of the pixels in column x of the 64x64-size CU; the final average absolute deviation of the 64x64-size CU is:

M = min(MAD, MAD_H, MAD_V)

where M is the average absolute deviation of the final 64x64-size CU;
when a CU with the size of 64x64 is not divided, texture of the CU area is smooth in the horizontal direction and the vertical direction, when the CU with the size of 64x64 is divided, texture of the CU area is complex in the horizontal direction and the vertical direction, whether the CU with the size of 64x64 is divided or not is judged by using the average absolute value deviation of the CU with the size of 64x64, a critical state of the average absolute deviation value between the divided CU and the non-divided CU is obtained, the size of the average absolute deviation value when the CU with the size of 64x64 is not divided and the size of the average absolute deviation value when the CU with the size of 64x64 are confirmed, a numerical value between the average absolute deviation value and the divided CU is used as a current selection threshold, the division condition of the CU with the size of 64x64 and the standard comparison result are different and is used as an experimental error, the current selection threshold is +/-0.05, and the value with the minimum experimental error is selected as a division threshold T; the standard is the 64x64CU partition case when encoding video using the HEVC standard reference software HM 16.20.
4. The method of claim 3, wherein in step S5, the A neural network comprises a first convolutional block, a second convolutional block, a third convolutional block, a fourth convolutional block, a fifth convolutional block, a full link layer and an output layer, which are connected in sequence; the first convolution block consists of one convolution layer, each of the second convolution block, the third convolution block, the fourth convolution block and the fifth convolution block consists of two convolution layers, and the two convolution layers have the same parameter setting; the first convolution block is set to 64 convolution kernels, and the size of each convolution kernel is 7x 7; the second convolution block is set to 64 convolution kernels, and the size of each convolution kernel is 3x 3; the third convolution block is set to 128 convolution kernels, and the convolution kernel size is 3x 3; the fourth convolution block is set to 256 convolution kernels, and the size of each convolution kernel is 3x 3; the fifth convolution block is set to 512 convolution kernels, and the size of the convolution kernels is 3x 3;
the fully connected layer comprises two hidden layers; the output layer is activated with a SoftMax function, so that the output values of the output layer are mapped into (0, 1); the output value of the output layer is the probability that the 32x32 size CU is split, with class labels "0" and "1", where "0" indicates no split and "1" indicates split; the class label with the largest output probability is selected as the prediction result; if the prediction result is split, the coding unit depth is increased by one and the 32x32 size CU is split into 4 CUs of size 16x16; otherwise, the 32x32 size CU is not split.
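A minimal PyTorch sketch of the A neural network of claim 4 follows. The claim fixes only the number and size of the convolution kernels, the number of hidden layers, and the SoftMax output; the downsampling between blocks, padding, and hidden-layer widths below are assumptions.

    import torch
    import torch.nn as nn

    def _conv_block(in_ch, out_ch, k, layers=2):
        # Stack `layers` convolution layers with identical settings, ReLU after each.
        mods = []
        for i in range(layers):
            mods += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, k, padding=k // 2),
                     nn.ReLU(inplace=True)]
        return nn.Sequential(*mods)

    class ANet(nn.Module):
        """Sketch of the A network: block sizes follow claim 4, the rest is assumed."""
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                _conv_block(1, 64, 7, layers=1),   # first block: one 7x7 conv, 64 kernels
                nn.MaxPool2d(2),                   # downsampling between blocks is an assumption
                _conv_block(64, 64, 3), nn.MaxPool2d(2),
                _conv_block(64, 128, 3), nn.MaxPool2d(2),
                _conv_block(128, 256, 3), nn.MaxPool2d(2),
                _conv_block(256, 512, 3),
            )
            self.classifier = nn.Sequential(       # two hidden layers, then a 2-way output
                nn.Flatten(),
                nn.Linear(512 * 2 * 2, 256), nn.ReLU(inplace=True),
                nn.Linear(256, 64), nn.ReLU(inplace=True),
                nn.Linear(64, 2),
            )

        def forward(self, x):                      # x: (N, 1, 32, 32) luma CU
            logits = self.classifier(self.features(x))
            return torch.softmax(logits, dim=1)    # probabilities for "0" (no split) and "1" (split)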
5. The method for CTU partitioning in high efficiency video coding frames according to claim 4, wherein in step S5, the video set is randomly divided into 90% training set, 5% validation set and 5% test set, the A neural network is trained through the training set, the A neural network is validated through the validation set, and the A neural network is tested through the test set;
the loss function L_f used in training the A neural network is the categorical cross entropy, whose formula is as follows:

L_f = −(1/N) · Σ_{i=1}^{N} y_i · log(ŷ_i)

wherein N is the number of CU blocks used in training the A neural network, ŷ_i is the ith prediction of the A neural network model, and y_i is the target corresponding to the ith prediction; the trained A neural network is tested with the test set, a prediction accuracy threshold is set, and when the A neural network is tested with the test set and its prediction accuracy is greater than the prediction accuracy threshold, training is stopped and the trained A neural network is obtained.
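A short PyTorch sketch of this training procedure, assuming hypothetical data loaders, the Adam optimiser, and an illustrative accuracy threshold of 0.9; none of these choices are fixed by the claim, which specifies only the categorical cross entropy loss and the accuracy-based stopping rule.

    import torch

    def train_a_net(model, train_loader, test_loader, acc_threshold=0.9, max_epochs=100):
        # acc_threshold, optimiser and epoch cap are illustrative assumptions.
        optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
        for _ in range(max_epochs):
            model.train()
            for cu, label in train_loader:                    # label: 0 = no split, 1 = split
                probs = model(cu)                             # softmax output, shape (N, 2)
                one_hot = torch.nn.functional.one_hot(label, num_classes=2).float()
                loss = -(one_hot * torch.log(probs + 1e-12)).sum(dim=1).mean()   # L_f
                optimiser.zero_grad()
                loss.backward()
                optimiser.step()

            model.eval()
            correct, total = 0, 0
            with torch.no_grad():
                for cu, label in test_loader:
                    pred = model(cu).argmax(dim=1)
                    correct += (pred == label).sum().item()
                    total += label.numel()
            if correct / total > acc_threshold:               # stop once accuracy exceeds the threshold
                break
        return model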
6. The method for high efficiency video coding intra-frame CTU partitioning according to claim 3, wherein in step S6, the B neural network comprises a first convolution layer, a second convolution layer, a pooling layer, a third convolution layer, a fourth convolution layer, a fully connected layer and an output layer; the first convolution layer is set to 32 convolution kernels of size 3x3; the second convolution layer is set to 64 convolution kernels of size 3x3; the pooling layer is set to an AvgPool operation with a pooling kernel size of 2x2; the third convolution layer is set to 64 convolution kernels of size 2x2; the fourth convolution layer is set to 128 convolution kernels of size 2x2;
the fully connected layer comprises two hidden layers, and dropout with a probability of 50% is applied between the second hidden layer and the output layer; the output layer is activated with a Sigmoid function, so that the output value of the output layer lies in (0, 1); the output value of the output layer is the probability that the 16x16 size CU is split; whether this split probability is greater than 50% is judged, and if so, the coding unit depth is increased by one and the 16x16 size CU is split into 4 CUs of size 8x8; otherwise, the 16x16 size CU is not split.
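A minimal PyTorch sketch of the B neural network of claim 6 follows. The kernel counts and sizes, the 2x2 AvgPool, the 50% dropout before the output and the sigmoid output follow the claim; padding, activation placement and hidden-layer widths are assumptions.

    import torch
    import torch.nn as nn

    class BNet(nn.Module):
        """Sketch of the B network for 16x16 CUs under the assumptions stated above."""
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(inplace=True),    # 32 kernels, 3x3
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),   # 64 kernels, 3x3
                nn.AvgPool2d(2),                                          # 2x2 average pooling
                nn.Conv2d(64, 64, 2), nn.ReLU(inplace=True),              # 64 kernels, 2x2
                nn.Conv2d(64, 128, 2), nn.ReLU(inplace=True),             # 128 kernels, 2x2
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(128 * 6 * 6, 128), nn.ReLU(inplace=True),       # first hidden layer
                nn.Linear(128, 32), nn.ReLU(inplace=True),                # second hidden layer
                nn.Dropout(p=0.5),                                        # 50% dropout before the output
                nn.Linear(32, 1),
            )

        def forward(self, x):                     # x: (N, 1, 16, 16) luma CU
            p_split = torch.sigmoid(self.classifier(self.features(x)))
            return p_split.squeeze(1)             # probability that the 16x16 CU is split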
7. The method of claim 6, wherein in step S6, the video set is randomly divided into 90% training set, 5% validation set and 5% test set; the B neural network is trained through the training set, validated through the validation set, and tested through the test set;
the loss function formula used in training the B neural network is as follows:
L = −(1/n) · Σ_{i=1}^{n} [ x[i]·log(y[i]) + (1 − x[i])·log(1 − y[i]) ]
wherein n is the number of CU blocks used in training the B neural network, x[i] is the ith target value of the B neural network model, and y[i] is the ith predicted value of the B neural network model; the trained B neural network is tested with the test set, a prediction accuracy threshold is set, and when the B neural network is tested with the test set and its prediction accuracy is greater than the prediction accuracy threshold, training is stopped and the trained B neural network is obtained.
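A short sketch of this loss under the binary cross entropy reading assumed in the reconstructed formula above, with x[i] as the target and y[i] as the sigmoid prediction of the B network:

    import torch

    def b_net_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        """pred: sigmoid outputs y[i] in (0,1); target: labels x[i] in {0,1}, as floats."""
        eps = 1e-12
        return -(target * torch.log(pred + eps)
                 + (1 - target) * torch.log(1 - pred + eps)).mean()

    # The built-in torch.nn.functional.binary_cross_entropy(pred, target) computes the same quantity.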
8. A method for high efficiency video coding intra-frame CTU partitioning as defined in claim 1, wherein the subsequent coding process is performed in HEVC standard reference software HM 16.20.
9. A system for efficient video coding intra CTU partitioning, the system comprising:
the data set production module is used for collecting an image data set and producing video sets with different resolutions according to the image data set;
the coding unit dividing module is used for setting different quantization parameters QP and extracting, under the different quantization parameter QP settings, the four CU size division cases of the video intra-frame coding units CU in the video set, namely 64x64 size CU, 32x32 size CU, 16x16 size CU and 8x8 size CU;
the partitioning threshold determining module is used for representing the texture complexity of the current 64x64 size CU by the mean absolute deviation M, determining the relation between M and whether the 64x64 size CU is split into 32x32 size CUs under the different quantization parameter QP settings, and obtaining the partitioning threshold T of the 64x64 size CU;
a judging module, configured to set an initial value of depth of the coding unit to 0, judge whether M is less than or equal to T, if yes, not split the 64x 64-sized CU, and output the 64x 64-sized CU as a final CU size decision; otherwise, adding one to the coding unit depth, dividing the 64x64 size CU into 4 32x32 size CUs;
the A neural network building and processing module is used for building and training an A neural network, inputting each 32x32 size CU into the trained A neural network, and judging with the A neural network whether the 32x32 size CU is split; if so, the coding unit depth is increased by one and the 32x32 size CU is split into 4 CUs of size 16x16; otherwise, the 32x32 size CU is output as the final CU size decision;
the B neural network construction and processing module is used for constructing and training a B neural network, inputting each 16x16 size CU into the trained B neural network, and judging with the B neural network whether the 16x16 size CU is split; if so, the coding unit depth is increased by one, the 16x16 size CU is split into 4 CUs of size 8x8, and the 8x8 size CUs are output as the final CU size decision; otherwise, the 16x16 size CU is output as the final CU size decision;
and the coding processing module is used for performing subsequent coding processing on the final CU size decisions obtained in the judging module, the A neural network building and processing module, and the B neural network construction and processing module (a sketch of the resulting decision cascade is given after this claim).
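Below is a hedged Python sketch of the decision cascade formed by the modules of claim 9: the threshold test on the 64x64 CU, then the A network on the 32x32 sub-CUs, then the B network on the 16x16 sub-CUs. It reuses the mad_features, ANet and BNet sketches given earlier; the /255 normalisation and the quadtree bookkeeping are illustrative assumptions.

    import numpy as np
    import torch

    def decide_ctu_partition(ctu: np.ndarray, T: float, a_net, b_net):
        """ctu: 64x64 luma block (uint8). Returns a list of (x, y, size) leaf CUs.
        Assumes a_net/b_net are trained ANet/BNet instances in eval() mode."""
        if mad_features(ctu) <= T:
            return [(0, 0, 64)]                               # keep the whole 64x64 CU
        leaves = []
        for x32, y32 in [(0, 0), (32, 0), (0, 32), (32, 32)]:
            cu32 = ctu[y32:y32 + 32, x32:x32 + 32]
            t32 = torch.as_tensor(cu32, dtype=torch.float32)[None, None] / 255.0
            if a_net(t32).argmax(dim=1).item() == 0:          # class "0": do not split
                leaves.append((x32, y32, 32))
                continue
            for dx, dy in [(0, 0), (16, 0), (0, 16), (16, 16)]:
                cu16 = ctu[y32 + dy:y32 + dy + 16, x32 + dx:x32 + dx + 16]
                t16 = torch.as_tensor(cu16, dtype=torch.float32)[None, None] / 255.0
                if b_net(t16).item() > 0.5:                   # split into four 8x8 CUs
                    leaves += [(x32 + dx + ox, y32 + dy + oy, 8)
                               for oy in (0, 8) for ox in (0, 8)]
                else:
                    leaves.append((x32 + dx, y32 + dy, 16))
        return leaves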
10. The system of high efficiency video coding intra-frame CTU partitioning according to claim 9, wherein the coding processing module is encapsulated in HEVC standard reference software HM 16.20.
CN202210391905.0A 2022-04-14 2022-04-14 Method and system for dividing CTU (transform coding unit) in high-efficiency video coding frame Pending CN114827604A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210391905.0A CN114827604A (en) 2022-04-14 2022-04-14 Method and system for dividing CTU (transform coding unit) in high-efficiency video coding frame

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210391905.0A CN114827604A (en) 2022-04-14 2022-04-14 Method and system for dividing CTU (transform coding unit) in high-efficiency video coding frame

Publications (1)

Publication Number Publication Date
CN114827604A true CN114827604A (en) 2022-07-29

Family

ID=82536493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210391905.0A Pending CN114827604A (en) 2022-04-14 2022-04-14 Method and system for dividing CTU (transform coding unit) in high-efficiency video coding frame

Country Status (1)

Country Link
CN (1) CN114827604A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116634147A (en) * 2023-07-25 2023-08-22 华侨大学 HEVC-SCC intra-frame CU rapid partitioning coding method and device based on multi-scale feature fusion
CN116634147B (en) * 2023-07-25 2023-10-31 华侨大学 HEVC-SCC intra-frame CU rapid partitioning coding method and device based on multi-scale feature fusion

Similar Documents

Publication Publication Date Title
CN110087087B (en) VVC inter-frame coding unit prediction mode early decision and block division early termination method
JP6843239B2 (en) Method and device for specifying the depth of the coding unit
CN103796023B (en) H.265/HEVC transcoding method and transcoder based on H.264/AVC macro block structure and texture
CN111462261B (en) Fast CU partitioning and intra-frame decision method for H.266/VVC
CN111355956B (en) Deep learning-based rate distortion optimization rapid decision system and method in HEVC intra-frame coding
TWI728944B (en) Dynamic picture encoding apparatus, dynamic picture decoding apparatus, and storage media
Zhang et al. Fast CU decision-making algorithm based on DenseNet network for VVC
CN114286093A (en) Rapid video coding method based on deep neural network
CN111429497B (en) Self-adaptive CU splitting decision method based on deep learning and multi-feature fusion
CN109963151B (en) Coding unit division determining method and device, terminal device and readable storage medium
CN103561270A (en) Coding control method and device for HEVC
CN112291562B (en) Fast CU partition and intra mode decision method for H.266/VVC
CN112437310B (en) VVC intra-frame coding rapid CU partition decision method based on random forest
CN109587491A (en) A kind of intra-frame prediction method, device and storage medium
CN112333451A (en) Intra-frame prediction method based on generation countermeasure network
CN114827604A (en) Method and system for dividing CTU (transform coding unit) in high-efficiency video coding frame
CN107690069B (en) Data-driven cascade video coding method
CN116489386A (en) VVC inter-frame rapid coding method based on reference block
CN115941960A (en) Method for skipping CU partition between VVC frames in advance based on lightweight neural network
CN110677644B (en) Video coding and decoding method and video coding intra-frame predictor
CN112468808B (en) I frame target bandwidth allocation method and device based on reinforcement learning
CN114143536B (en) Video coding method of SHVC (scalable video coding) spatial scalable frame
CN113784147B (en) Efficient video coding method and system based on convolutional neural network
CN114827606A (en) Quick decision-making method for coding unit division
CN115988212A (en) Encoding method, encoder, and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination