CN113781588A - Intra-frame coding unit size dividing method based on neural network - Google Patents

Intra-frame coding unit size dividing method based on neural network

Info

Publication number
CN113781588A
CN113781588A
Authority
CN
China
Prior art keywords
coding unit
neural network
size
probability
division
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110750972.2A
Other languages
Chinese (zh)
Inventor
张鹏
刘浩宁
向国庆
严伟
贾惠柱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Original Assignee
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced Institute of Information Technology AIIT of Peking University, Hangzhou Weiming Information Technology Co Ltd filed Critical Advanced Institute of Information Technology AIIT of Peking University
Priority to CN202110750972.2A priority Critical patent/CN113781588A/en
Publication of CN113781588A publication Critical patent/CN113781588A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/002Image coding using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The application relates to the technical field of hardware encoders, and in particular to a neural-network-based intra-frame coding unit size division method. The method comprises the following steps: acquiring a coding unit with a first preset size; inputting the coding unit into a trained neural network model to obtain a coding unit with a second preset size; finding the corresponding positions in the model prediction result and performing an add-and-average operation to obtain the probability that the coding unit is not divided; if the probability of not dividing is greater than a first threshold, terminating the division early; if the probability of not dividing is smaller than a second threshold, calculating the probability of dividing the coding unit into sub-blocks, and then obtaining the probability of the coding unit with the second preset size under the current division mode. The method dispenses with statistics gathering and hand-designed features, reduces data dependence, flexibly restricts the division modes, and effectively improves the coding efficiency of the encoder.

Description

Intra-frame coding unit size dividing method based on neural network
Technical Field
The present application relates to the field of intra-frame coding technology, and more particularly, to a neural network-based intra-frame coding unit size partitioning method.
Background
Intra prediction is the process of generating a prediction value for the current sample using previously decoded samples in the same decoded picture. AVS denotes the Chinese digital audio and video coding and decoding standard, and AVS3 denotes the new-generation standard in that series. The prior art comprises conventional methods based on correlation analysis and classification methods based on machine learning. In the main conventional methods based on correlation analysis, the binary tree partitioning process of the second child block can be skipped when the rate-distortion (RD) cost of the parent and child blocks meets a certain constraint, and depth levels rarely selected as optimal among spatially adjacent CUs are skipped using the RD cost and mode correlation between different depth levels and spatially adjacent CUs. Among the classification methods based on machine learning, one designs a decision-tree structure of joint classifiers to eliminate unnecessary division modes at the current depth and reduce iterations; another, based on two-level binary classification, models the RD-cost increase caused by classification errors as a weight in support vector machine (SVM) training and decides whether to terminate the classification early.
However, the conventional methods based on correlation analysis depend heavily on global statistics of already-coded CUs, which makes complexity hard to control, and as the number of partition modes grows, manual statistical analysis easily loses important information. The classification methods based on machine learning rely too much on hand-designed features to extract useful coding information; their classification schemes are complex and their accuracy is low.
Therefore, the present application proposes a neural-network-based intra-coding unit size partitioning method to address these problems.
Disclosure of Invention
In order to achieve the above technical object, the present application provides a neural network-based intra coding unit size dividing method, including the following steps:
acquiring a coding unit with a first preset size;
inputting the coding unit into a trained neural network model to obtain a coding unit with a second preset size;
recording the size of the coding unit with the second preset size, finding the corresponding positions in the model prediction result, and performing an add-and-average operation to obtain the probability that the coding unit is not divided;
if the probability of not dividing is greater than a first threshold, terminating the division early;
if the probability of not dividing is smaller than a second threshold, calculating the probability of dividing the coding unit into sub-blocks, and then performing an add-and-average operation to trace back to the size of the coding unit, so as to obtain the probability of the coding unit with the second preset size under the current division mode;
and obtaining a preset number of probabilities, comparing and sorting them by magnitude, and selecting the division modes corresponding to the N largest probabilities for division.
Specifically, the division modes for dividing the coding unit into sub-blocks comprise: quadtree division, horizontal binary tree division, vertical binary tree division, horizontally extended quadtree division, and vertically extended quadtree division.
Still more specifically, the neural network model includes 1 input layer, 4 convolutional layers, 1 addition layer, and 2 deconvolution layers.
Preferably, the training steps of the neural network model are as follows:
obtaining samples;
dividing the samples into training samples and verification samples according to a preset proportion;
designing a loss function and an optimization algorithm;
inputting the training samples for training, and validating with the verification samples after each training pass;
redesigning the loss function and optimization algorithm if a preset verification effect is not achieved;
and when the number of iterations reaches the preset number, terminating the training.
Preferably, the loss function is:
$$L = -\sum_{j=1}^{C} \frac{1-\beta}{1-\beta^{n_j}}\,(1-p_j)^{\gamma}\,\log(p_j)$$

where $\frac{1-\beta}{1-\beta^{n_j}}$ is the class-balance factor ($n_j$ being the number of samples of class $j$), $(1-p_j)^{\gamma}$ is the modulation factor, $p_j$ is the predicted probability of class $j$, $C$ is the number of classes, and $\beta$ and $\gamma$ represent fixed coefficients.
Further, each convolutional layer comprises filters of a preset size.
Still further, the neural network model further comprises an output layer, and the activation function of the output layer is a softmax function.
The second aspect of the present invention provides an AVS3 hardware encoder, wherein the AVS3 hardware encoder applies the neural network-based intra-coding unit size division method in any one of the embodiments.
A third aspect of the invention provides a computer device comprising a memory and a processor, the memory having stored therein computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of:
acquiring a coding unit with a first preset size;
inputting the coding unit into a trained neural network model to obtain a coding unit with a second preset size;
recording the size of the coding unit with the second preset size, finding the corresponding positions in the model prediction result, and performing an add-and-average operation to obtain the probability that the coding unit is not divided;
if the probability of not dividing is greater than a first threshold, terminating the division early;
if the probability of not dividing is smaller than a second threshold, calculating the probability of dividing the coding unit into sub-blocks, and then performing an add-and-average operation to trace back to the size of the coding unit, so as to obtain the probability of the coding unit with the second preset size under the current division mode;
and obtaining a preset number of probabilities, comparing and sorting them by magnitude, and selecting the division modes corresponding to the N largest probabilities for division.
A fourth aspect of the present invention provides a computer storage medium having stored thereon a plurality of instructions adapted to be loaded by a processor and executed to perform the steps of:
acquiring a coding unit with a first preset size;
inputting the coding unit into a trained neural network model to obtain a coding unit with a second preset size;
recording the size of the coding unit with the second preset size, finding the corresponding positions in the model prediction result, and performing an add-and-average operation to obtain the probability that the coding unit is not divided;
if the probability of not dividing is greater than a first threshold, terminating the division early;
if the probability of not dividing is smaller than a second threshold, calculating the probability of dividing the coding unit into sub-blocks, and then performing an add-and-average operation to trace back to the size of the coding unit, so as to obtain the probability of the coding unit with the second preset size under the current division mode;
and obtaining a preset number of probabilities, comparing and sorting them by magnitude, and selecting the division modes corresponding to the N largest probabilities for division.
The beneficial effects of the present application are as follows: the method exploits the spatial correlation of video and predicts the current pixel from adjacent coded pixels in the same frame, thereby effectively removing spatial redundancy in the video. It dispenses with statistics gathering and hand-designed features, reduces data dependence, flexibly restricts the division modes, and effectively improves the coding efficiency of the encoder.
Drawings
FIG. 1 shows a schematic flow chart of the method of embodiment 1 of the present application;
FIG. 2 is a schematic diagram showing a partitioning method in embodiment 1 of the present application;
FIG. 3 is a schematic diagram illustrating a neural network training process in embodiments 1 and 2 of the present application;
fig. 4 shows a schematic diagram of a neural network structure in embodiment 2 of the present application;
FIG. 5 is a schematic diagram showing batch normalization in a neural network in example 2 of the present application;
FIG. 6 is a schematic diagram showing a dividing operation process in embodiment 2 of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 8 is a schematic diagram of a storage medium provided in an embodiment of the present application.
Detailed Description
Hereinafter, embodiments of the present application will be described with reference to the accompanying drawings. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present application. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present application. It will be apparent to one skilled in the art that the present application may be practiced without one or more of these details. In other instances, well-known features of the art have not been described in order to avoid obscuring the present application.
It should be noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the application. As used herein, the singular is intended to include the plural unless the context clearly dictates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Exemplary embodiments according to the present application will now be described in more detail with reference to the accompanying drawings. These exemplary embodiments may, however, be embodied in many different forms and should not be construed as limited to only the embodiments set forth herein. The figures are not drawn to scale, wherein certain details may be exaggerated and omitted for clarity. The shapes of various regions, layers, and relative sizes and positional relationships therebetween shown in the drawings are merely exemplary, and deviations may occur in practice due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions, as actually required.
Example 1:
the embodiment implements a neural network-based intra-frame coding unit size dividing method, as shown in fig. 1, including the following steps:
S1, acquiring a coding unit with a first preset size;
S2, inputting the coding unit into the trained neural network model to obtain a coding unit with a second preset size;
S3, recording the size of the coding unit with the second preset size, finding the corresponding positions in the model prediction result, and performing an add-and-average operation to obtain the probability that the coding unit is not divided;
S4, if the probability of not dividing is greater than a first threshold, terminating the division early;
S5, if the probability of not dividing is smaller than a second threshold, calculating the probability of dividing the coding unit into sub-blocks, and then performing an add-and-average operation to trace back to the size of the coding unit, so as to obtain the probability of the coding unit with the second preset size under the current division mode;
and S6, obtaining a preset number of probabilities, comparing and sorting them by magnitude, and selecting the division modes corresponding to the N largest probabilities for division.
Specifically, as shown in fig. 2, the division modes for dividing the coding unit into sub-blocks include: quadtree division DIV1, horizontal binary tree division DIV2, vertical binary tree division DIV3, horizontally extended quadtree division DIV4, and vertically extended quadtree division DIV5. Together with the non-division case, these 5 division methods give 6 candidates in total. Therefore, when the preset number of probabilities are compared and sorted by magnitude and the division modes corresponding to the N largest probabilities are selected for division, the preset number is preferably 6, and N is an integer between 1 and 6. A sketch of the sub-block geometry of the five split modes is given below.
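The following Python sketch makes the sub-block geometry of fig. 2 concrete. The 1:2:1 layout assumed for the extended-quadtree modes follows common AVS3 practice and is an assumption of this sketch rather than text from this record; the mode names are likewise illustrative.

```python
# Sub-rectangles (x, y, w, h) produced by each division mode of fig. 2
# for a W x H coding unit. The 1:2:1 extended-quadtree layout is assumed.

def sub_blocks(w, h, mode):
    if mode == "QT":     # quadtree division DIV1: four equal quadrants
        return [(x, y, w // 2, h // 2) for y in (0, h // 2) for x in (0, w // 2)]
    if mode == "HBT":    # horizontal binary tree division DIV2
        return [(0, 0, w, h // 2), (0, h // 2, w, h // 2)]
    if mode == "VBT":    # vertical binary tree division DIV3
        return [(0, 0, w // 2, h), (w // 2, 0, w // 2, h)]
    if mode == "HEQT":   # horizontally extended quadtree DIV4 (1:2:1 rows)
        return [(0, 0, w, h // 4),
                (0, h // 4, w // 2, h // 2), (w // 2, h // 4, w // 2, h // 2),
                (0, 3 * h // 4, w, h // 4)]
    if mode == "VEQT":   # vertically extended quadtree DIV5 (1:2:1 columns)
        return [(0, 0, w // 4, h),
                (w // 4, 0, w // 2, h // 2), (w // 4, h // 2, w // 2, h // 2),
                (3 * w // 4, 0, w // 4, h)]
    return [(0, 0, w, h)]  # "NO_SPLIT" or unknown mode: the block itself

print(sub_blocks(64, 64, "HEQT"))
# [(0, 0, 64, 16), (0, 16, 32, 32), (32, 16, 32, 32), (0, 48, 64, 16)]
```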
Still more specifically, the neural network model includes 1 input layer, 4 convolutional layers, 1 addition layer, and 2 deconvolution layers.
As shown in fig. 3, training the neural network model generally involves training-sample selection, model building, model training and the operation flow, and the model is continuously optimized during training. Preferably, the specific training steps of the neural network model of the present application are as follows:
obtaining samples;
dividing the samples into training samples and verification samples according to a preset proportion;
designing a loss function and an optimization algorithm; inputting the training samples for training, and validating with the verification samples after each training pass;
redesigning the loss function and optimization algorithm if a preset verification effect is not achieved;
and when the number of iterations reaches the preset number, terminating the training.
When obtaining samples, at least one sequence is selected at each resolution, and the spatial information (SI) measure is computed for the test sequences so that the selected sequences cover the whole SI range. Samples with a small rate-distortion (RD) cost difference between the 2N×2N and N×N coding units are eliminated, so that the model does not learn from these ambiguous samples and misclassify others; the elimination threshold is |ΔRD| ≤ 0.02, as sketched below. In model building, the coding unit (CU) input size is set to 64x64, each unit inside the CU is 4x4, the number of categories is 22 {0:4x8, 1:8x4, …, 21:64x64}, and the batch size is selected to be 64.
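A rough sketch of this elimination rule follows; it assumes the RD-cost difference is taken as an absolute value, and the record fields below are hypothetical names for illustration only.

```python
# Drop training samples whose RD-cost gap between the 2Nx2N and NxN
# decisions is within the threshold: their label is too ambiguous to learn.
THRESHOLD = 0.02

samples = [
    {"cu": "seq1_cu0", "rd_2Nx2N": 1.031, "rd_NxN": 1.040},  # gap 0.009 -> dropped
    {"cu": "seq1_cu1", "rd_2Nx2N": 0.820, "rd_NxN": 0.940},  # gap 0.120 -> kept
]

kept = [s for s in samples if abs(s["rd_2Nx2N"] - s["rd_NxN"]) > THRESHOLD]
print([s["cu"] for s in kept])  # ['seq1_cu1']
```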
Preferably, the loss function is:
$$L = -\sum_{j=1}^{C} \frac{1-\beta}{1-\beta^{n_j}}\,(1-p_j)^{\gamma}\,\log(p_j)$$

where $\frac{1-\beta}{1-\beta^{n_j}}$ is the class-balance factor ($n_j$ being the number of samples of class $j$), $(1-p_j)^{\gamma}$ is the modulation factor, $p_j$ is the predicted probability of class $j$, $C$ is the number of classes, and $\beta$ and $\gamma$ represent fixed coefficients.
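As a minimal numerical sketch of this loss, assuming the standard class-balanced form in which the balance factor is $(1-\beta)/(1-\beta^{n_j})$ with $n_j$ the sample count of class $j$ (the counts and the values of $\beta$ and $\gamma$ below are illustrative, not values fixed by the patent):

```python
import numpy as np

def cb_focal_loss(p, class_counts, beta=0.999, gamma=2.0):
    """p[j]: predicted probability of class j; class_counts[j]: samples of class j."""
    alpha = (1.0 - beta) / (1.0 - beta ** class_counts)  # class-balance factor
    modulation = (1.0 - p) ** gamma                      # up-weights hard samples
    return -np.sum(alpha * modulation * np.log(p))

counts = np.array([5000.0, 800.0, 120.0])  # imbalanced class sizes (illustrative)
probs = np.array([0.70, 0.40, 0.20])       # predicted probabilities (illustrative)
print(cb_focal_loss(probs, counts))
```

Note how the rare class (120 samples) and the low-confidence predictions dominate the loss, which is exactly the weighting behavior described for the two factors.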
Further, each convolutional layer includes filters of a preset size.
Still further, the neural network model further comprises an output layer, and the activation function of the output layer is a softmax function.
Example 2:
the implementation implements a neural network-based intra-frame coding unit size division method, which comprises the following steps:
step 1, obtaining a coding unit with a first preset size.
The present embodiment selects a Coding Unit (CU) size of 64x64 for the first preset size.
And 2, inputting the coding unit into the trained neural network model to obtain a coding unit with a second preset size.
For the second preset size, the present embodiment selects an output of size 16x16x22, i.e., a 16x16 grid of 22-class predictions, one for each 4x4 unit of the 64x64 CU.
And 3, recording the size of the coding unit with the second preset size, finding the corresponding positions in the model prediction result, and performing an add-and-average operation to obtain the probability that the coding unit is not divided.
The general process of training the neural network model is still shown in fig. 3, and the specific training steps are as follows:
obtaining samples;
dividing the samples into training samples and verification samples according to a preset proportion;
designing a loss function and an optimization algorithm;
inputting the training samples for training, and validating with the verification samples after each training pass;
redesigning the loss function and optimization algorithm if a preset verification effect is not achieved;
and when the number of iterations reaches the preset number, terminating the training.
Preferably, the loss function is:
$$L = -\sum_{j=1}^{C} \frac{1-\beta}{1-\beta^{n_j}}\,(1-p_j)^{\gamma}\,\log(p_j)$$

where $\frac{1-\beta}{1-\beta^{n_j}}$ is the class-balance factor ($n_j$ being the number of samples of class $j$), $(1-p_j)^{\gamma}$ is the modulation factor, $p_j$ is the predicted probability of class $j$, $C$ is the number of classes, and $\beta$ and $\gamma$ represent fixed coefficients. The class-balance factor assigns a higher loss weight to classes with fewer samples, and the modulation factor further increases the loss weight of samples that are harder to classify, so that the model focuses more on misclassified samples.
Preferably, the optimization algorithm adopts the Adam algorithm, whose full name is Adaptive Moment Estimation.
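The training flow of fig. 3 can be summarized in a self-contained toy sketch. The stand-in linear network, stand-in cross-entropy loss, and random data below are illustrative assumptions only; the embodiment itself trains the CNN of fig. 4 with the class-balanced focal loss above.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 22))  # stand-in network
loss_fn = nn.CrossEntropyLoss()                              # stand-in loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)    # Adam optimizer

x_train = torch.randn(64, 64 * 64)       # one batch of 64 flattened 64x64 CUs
y_train = torch.randint(0, 22, (64,))    # 22 partition classes
x_val = torch.randn(16, 64 * 64)         # dummy verification samples
y_val = torch.randint(0, 22, (16,))

for epoch in range(100):                 # preset number of iterations
    optimizer.zero_grad()
    loss = loss_fn(model(x_train), y_train)   # training pass
    loss.backward()
    optimizer.step()
    with torch.no_grad():                # validate after each training pass
        val_loss = loss_fn(model(x_val), y_val)
    # if val_loss misses the preset verification target, the loss function
    # and optimizer are redesigned per the training steps above
```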
As shown in fig. 4, the proposed deep learning model structure consists of 1 input layer, 4 convolutional layers, 1 addition layer and 2 deconvolution layers, where conv stands for a convolutional layer, deconv for a deconvolution layer, ⊕ for the addition layer, and QP identifies the quantization parameter; among the unlabeled arrows, solid arrows represent convolution operations and dashed arrows represent deconvolution operations. As shown in fig. 5, each convolution layer and each deconvolution layer includes batch normalization (BN), which normalizes each batch of data and thereby makes training more efficient. In addition, the activation function of the output layer is the softmax function, and the activation functions of the other layers are ReLU functions.
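A minimal PyTorch sketch of this structure follows. The channel widths, strides, and the way QP is injected at the addition layer are assumptions of this sketch (the record does not fix them), chosen so that a 64x64 CU maps to the 16x16x22 prediction of this embodiment.

```python
import torch
from torch import nn

class CuPartitionNet(nn.Module):
    """Sketch: 1 input layer, 4 conv layers, 1 addition layer, 2 deconv layers,
    BN after every (de)conv per fig. 5, ReLU inside, softmax at the output."""
    def __init__(self, classes=22):
        super().__init__()
        def conv(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                                 nn.BatchNorm2d(cout), nn.ReLU())
        self.c1, self.c2 = conv(1, 16), conv(16, 32)    # 64 -> 32 -> 16
        self.c3, self.c4 = conv(32, 64), conv(64, 64)   # 16 -> 8 -> 4
        self.qp = nn.Linear(1, 64)                      # QP embedding (assumed form)
        self.d1 = nn.Sequential(nn.ConvTranspose2d(64, 64, 2, stride=2),
                                nn.BatchNorm2d(64), nn.ReLU())        # 4 -> 8
        self.d2 = nn.Sequential(nn.ConvTranspose2d(64, classes, 2, stride=2),
                                nn.BatchNorm2d(classes))              # 8 -> 16

    def forward(self, cu, qp):
        x = self.c4(self.c3(self.c2(self.c1(cu))))
        x = x + self.qp(qp).view(-1, 64, 1, 1)            # addition layer: inject QP
        return torch.softmax(self.d2(self.d1(x)), dim=1)  # per-4x4-unit classes

out = CuPartitionNet()(torch.randn(1, 1, 64, 64), torch.tensor([[32.0]]))
print(out.shape)  # torch.Size([1, 22, 16, 16])
```

This is one of many layer layouts consistent with the 64x64 → 16x16x22 mapping; fig. 4 does not constrain the design further in this record.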
And 4, if the probability of not dividing is greater than a first threshold, terminating the division early.
And 5, if the probability of not dividing is smaller than a second threshold, calculating the probability of dividing the coding unit into sub-blocks, and then performing an add-and-average operation to trace back to the size of the coding unit, so as to obtain the probability of the coding unit with the second preset size under the current division mode.
And 6, obtaining a preset number of probabilities, comparing and sorting them by magnitude, and selecting the division modes corresponding to the N largest probabilities for division.
Referring to fig. 6, CU denotes a coding unit, QP the quantization parameter, SUB a sub-block, QT quadtree division, HBT horizontal binary tree division, and P probability; P(N) denotes the probability that the coding unit is not divided, P(QT) the probability of quadtree division, P(HBT) the probability of horizontal binary tree division, and the ellipses denote the vertical binary tree, horizontally extended quadtree, and vertically extended quadtree divisions that are likewise performed on the coding unit. Different division operations are performed on the current coding unit; the coordinates and sizes of the sub-blocks are recorded and the corresponding positions are found in the model prediction result, i.e., the probabilities of the divided sub-blocks are calculated; these sub-block probabilities are then traced back to the size of the current coding unit through an add-and-average operation, yielding the prediction probability of the current coding unit under that specific division mode. In this embodiment, the preset number is set to 6; the 6 division modes are quadtree division, horizontal binary tree division, vertical binary tree division, horizontally expanded quadtree division, vertically expanded quadtree division, and non-division; N is an integer between 1 and 6. An illustrative sketch of this add-and-average backtracking and the top-N selection follows.
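The sketch below reuses the sub_blocks() helper from the geometry sketch in embodiment 1 (assumed to be in scope). The class indexing {0:4x8, 1:8x4, …, 21:64x64} is taken from embodiment 1, but the particular SIZE_TO_CLASS entries and the thresholds T1/T2 are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np

UNIT = 4  # each prediction cell of the 16x16x22 map covers a 4x4 area
SIZE_TO_CLASS = {(32, 32): 18, (64, 32): 19, (32, 64): 20, (64, 64): 21,
                 (64, 16): 16, (16, 64): 17}  # illustrative indices only

def block_prob(pred, x, y, w, h, cls):
    # average the probability of class `cls` over the 4x4 cells a block covers
    return float(pred[y // UNIT:(y + h) // UNIT, x // UNIT:(x + w) // UNIT, cls].mean())

def mode_prob(pred, cu, mode):
    # probability of one division mode: per-sub-block probabilities are
    # added and averaged back up to the size of the current coding unit
    x0, y0, w0, h0 = cu
    subs = sub_blocks(w0, h0, mode)
    return sum(block_prob(pred, x0 + x, y0 + y, w, h, SIZE_TO_CLASS[(w, h)])
               for (x, y, w, h) in subs) / len(subs)

def decide(pred, cu, t1=0.9, t2=0.6, n=2):
    # steps 4-6: early termination, else rank all six modes and keep top n
    p_no = mode_prob(pred, cu, "NO_SPLIT")   # probability of not dividing
    if p_no > t1:
        return ["NO_SPLIT"]                  # terminate the division early
    modes = ["NO_SPLIT", "QT", "HBT", "VBT", "HEQT", "VEQT"]
    probs = {m: mode_prob(pred, cu, m) for m in modes}
    return sorted(probs, key=probs.get, reverse=True)[:n]

pred = np.full((16, 16, 22), 1.0 / 22)  # dummy uniform 16x16x22 prediction map
print(decide(pred, (0, 0, 64, 64)))     # top-2 candidate modes under the dummy map
```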
Example 3:
the present embodiment implements an AVS3 hardware encoder, and the AVS3 hardware encoder applies the neural network-based intra-coding unit size division method in any of the above embodiments. The intra-frame coding unit size dividing method based on the neural network comprises the following steps:
acquiring a coding unit with a first preset size;
inputting the coding unit into a trained neural network model to obtain a coding unit with a second preset size;
recording the size of the coding unit with the second preset size, finding the corresponding positions in the model prediction result, and performing an add-and-average operation to obtain the probability that the coding unit is not divided;
if the probability of not dividing is greater than a first threshold, terminating the division early;
if the probability of not dividing is smaller than a second threshold, calculating the probability of dividing the coding unit into sub-blocks, and then performing an add-and-average operation to trace back to the size of the coding unit, so as to obtain the probability of the coding unit with the second preset size under the current division mode;
and obtaining a preset number of probabilities, comparing and sorting them by magnitude, and selecting the division modes corresponding to the N largest probabilities for division.
Referring next to fig. 7, a schematic diagram of an electronic device provided in some embodiments of the present application is shown. As shown in fig. 7, the electronic device 2 includes: the system comprises a processor 200, a memory 201, a bus 202 and a communication interface 203, wherein the processor 200, the communication interface 203 and the memory 201 are connected through the bus 202; the memory 201 stores a computer program that can be executed on the processor 200, and the processor 200 executes the computer program to execute the neural network-based intra-coding unit size dividing method provided by any one of the foregoing embodiments of the present application.
The memory 201 may include a high-speed random access memory (RAM) and may further include a non-volatile memory, such as at least one disk memory. The communication connection between the network elements of the system and at least one other network element is realized through at least one communication interface 203 (which may be wired or wireless), and the Internet, a wide area network, a local area network, a metropolitan area network, and the like can be used.
Bus 202 can be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The memory 201 is configured to store a program, and the processor 200 executes the program after receiving an execution instruction, where the neural network-based intra-frame coding unit size dividing method disclosed in any of the foregoing embodiments of the present application may be applied to the processor 200, or implemented by the processor 200.
The processor 200 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 200. The processor 200 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in RAM, flash memory, ROM, PROM or EEPROM, registers, or other storage media well known in the art. The storage medium is located in the memory 201, and the processor 200 reads the information in the memory 201 and completes the steps of the method in combination with its hardware.
The electronic device provided by the embodiment of the application and the neural network-based intra-frame coding unit size dividing method provided by the embodiment of the application have the same inventive concept and have the same beneficial effects as the method adopted, operated or realized by the electronic device.
Referring to fig. 8, the computer readable storage medium is an optical disc 30, on which a computer program (i.e., a program product) is stored, and when the computer program is executed by a processor, the computer program performs the neural network based intra-frame coding unit size division method according to any of the foregoing embodiments.
Examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.
It should be noted that the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general-purpose devices may be used with the teachings herein, and the structure required for constructing such a device will be apparent from the description above. In addition, this application is not directed to any particular programming language; it will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein. In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure the understanding of this description. Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the application and aiding the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An intra-frame coding unit size dividing method based on a neural network is characterized by comprising the following steps:
acquiring a coding unit with a first preset size;
inputting the coding unit into a trained neural network model to obtain a coding unit with a second preset size;
recording the size of the coding unit with the second preset size, finding the corresponding positions in the model prediction result, and performing an add-and-average operation to obtain the probability that the coding unit is not divided;
if the probability of not dividing is greater than a first threshold, terminating the division early;
if the probability of not dividing is smaller than a second threshold, calculating the probability of dividing the coding unit into sub-blocks, and then performing an add-and-average operation to trace back to the size of the coding unit, so as to obtain the probability of the coding unit with the second preset size under the current division mode;
and obtaining a preset number of probabilities, comparing and sorting them by magnitude, and selecting the division modes corresponding to the N largest probabilities for division.
2. The neural network-based intra-coding unit size division method according to claim 1, wherein the division modes for dividing the coding unit into sub-blocks comprise: quadtree division, horizontal binary tree division, vertical binary tree division, horizontally extended quadtree division, and vertically extended quadtree division.
3. The neural network-based intra-coding unit size partitioning method according to claim 1, wherein the neural network model includes 1 input layer, 4 convolutional layers, 1 addition layer, and 2 deconvolution layers.
4. The neural network-based intra-coding unit size partitioning method according to claim 1, wherein the training of the neural network model comprises the following steps:
obtaining samples;
dividing the samples into training samples and verification samples according to a preset proportion;
designing a loss function and an optimization algorithm;
inputting the training samples for training, and validating with the verification samples after each training pass;
redesigning the loss function and optimization algorithm if a preset verification effect is not achieved;
and when the number of iterations reaches the preset number, terminating the training.
5. The neural network-based intra-coding unit size partitioning method according to claim 4, wherein the loss function is:
$$L = -\sum_{j=1}^{C} \frac{1-\beta}{1-\beta^{n_j}}\,(1-p_j)^{\gamma}\,\log(p_j)$$

where $\frac{1-\beta}{1-\beta^{n_j}}$ is the class-balance factor ($n_j$ being the number of samples of class $j$), $(1-p_j)^{\gamma}$ is the modulation factor, $p_j$ is the predicted probability of class $j$, $C$ is the number of classes, and $\beta$ and $\gamma$ represent fixed coefficients.
6. The neural network-based intra-coding unit size division method of claim 3, wherein each convolutional layer comprises filters of a preset size.
7. The neural network-based intra-coding unit size partitioning method according to claim 3, wherein the neural network model further comprises an output layer, and an activation function of the output layer is a softmax function.
8. An AVS3 hardware encoder, wherein the AVS3 hardware encoder applies the neural network based intra-coding unit size partitioning method as claimed in any one of claims 1 to 7.
9. A computer device comprising a memory and a processor, wherein computer readable instructions are stored in the memory, which computer readable instructions, when executed by the processor, cause the processor to perform the steps of the method according to any one of claims 1 to 7.
10. A computer storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to carry out the steps of the method according to any one of claims 1 to 7.
CN202110750972.2A 2021-07-01 2021-07-01 Intra-frame coding unit size dividing method based on neural network Pending CN113781588A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110750972.2A CN113781588A (en) 2021-07-01 2021-07-01 Intra-frame coding unit size dividing method based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110750972.2A CN113781588A (en) 2021-07-01 2021-07-01 Intra-frame coding unit size dividing method based on neural network

Publications (1)

Publication Number Publication Date
CN113781588A true CN113781588A (en) 2021-12-10

Family

ID=78836058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110750972.2A Pending CN113781588A (en) 2021-07-01 2021-07-01 Intra-frame coding unit size dividing method based on neural network

Country Status (1)

Country Link
CN (1) CN113781588A (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060251330A1 (en) * 2003-05-20 2006-11-09 Peter Toth Hybrid video compression method
CN108200442A (en) * 2018-01-23 2018-06-22 北京易智能科技有限公司 A kind of HEVC intraframe coding dividing elements methods based on neural network
US20210136371A1 (en) * 2018-04-10 2021-05-06 InterDigitai VC Holdings, Inc. Deep learning based imaged partitioning for video compression
CN109788296A (en) * 2018-12-25 2019-05-21 中山大学 Interframe encode dividing elements method, apparatus and storage medium for HEVC
US20190273948A1 (en) * 2019-01-08 2019-09-05 Intel Corporation Method and system of neural network loop filtering for video coding
CN109714584A (en) * 2019-01-11 2019-05-03 杭州电子科技大学 3D-HEVC depth map encoding unit high-speed decision method based on deep learning
CN111263145A (en) * 2020-01-17 2020-06-09 福州大学 Multifunctional video rapid coding method based on deep neural network
CN111510728A (en) * 2020-04-12 2020-08-07 北京工业大学 HEVC intra-frame rapid coding method based on depth feature expression and learning
CN111757110A (en) * 2020-07-02 2020-10-09 中实燃气发展(西安)有限公司 Video coding method, coding tree unit dividing method, system, device and readable storage medium
CN111800642A (en) * 2020-07-02 2020-10-20 中实燃气发展(西安)有限公司 HEVC intra-frame angle mode selection method, device and equipment and readable storage medium
CN112887712A (en) * 2021-02-03 2021-06-01 重庆邮电大学 HEVC intra-frame CTU partitioning method based on convolutional neural network

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115052154A (en) * 2022-05-30 2022-09-13 北京百度网讯科技有限公司 Model training and video coding method, device, equipment and storage medium
CN115052154B (en) * 2022-05-30 2023-04-14 北京百度网讯科技有限公司 Model training and video coding method, device, equipment and storage medium
CN116489386A (en) * 2023-03-24 2023-07-25 重庆邮电大学 VVC inter-frame rapid coding method based on reference block
CN117692663A (en) * 2024-01-31 2024-03-12 腾讯科技(深圳)有限公司 Binary tree partitioning processing method, equipment and storage medium for coding unit

Similar Documents

Publication Publication Date Title
CN113781588A (en) Intra-frame coding unit size dividing method based on neural network
CN109711413B (en) Image semantic segmentation method based on deep learning
RU2708347C1 (en) Image encoding method and device and image decoding method and device
US20200213587A1 (en) Method and apparatus for filtering with mode-aware deep learning
TWI445411B (en) Method for performing local motion vector derivation during video coding of a coding unit, and associated apparatus
CN104661031A (en) Method for coding and decoding video image, coding equipment and decoding equipment
CN111758254B (en) Efficient context model computational design in transform coefficient coding
CN104937937A (en) Video coding method using at least evaluated visual quality and related video coding apparatus
CN111986278B (en) Image encoding device, probability model generating device, and image compression system
CN109889827B (en) Intra-frame prediction coding method and device, electronic equipment and computer storage medium
JP2002007966A (en) Document image decoding method
CN115606188A (en) Point cloud encoding and decoding method, encoder, decoder and storage medium
CN113747163A (en) Image coding and decoding method and compression method based on context reorganization modeling
CN115379217A (en) Video coding processing method, device, equipment and storage medium
Forchhammer et al. Optimal context quantization in lossless compression of image data sequences
Zhang et al. Graph-based transform for 2D piecewise smooth signals with random discontinuity locations
KR101192060B1 (en) Method and device for choosing a motion vector for the coding of a set of blocks
JP2018182531A (en) Division shape determining apparatus, learning apparatus, division shape determining method, and division shape determining program
CN116033153A (en) Method and system for rapidly dividing coding units under VVC standard
CN113691808A (en) Neural network-based interframe coding unit size dividing method
CN1620143A (en) Inter and intra band prediction of singularity coefficients using estimates based on nonlinear approximants
CN114938455A (en) Coding method and device based on unit characteristics, electronic equipment and storage medium
CN108322741A (en) A kind of method and device of determining coding mode
Mercat et al. Machine learning based choice of characteristics for the one-shot determination of the HEVC intra coding tree
Hasan et al. Multilevel decomposition Discrete Wavelet Transform for hardware image compression architectures applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination