CN113781588A - Intra-frame coding unit size dividing method based on neural network - Google Patents

Intra-frame coding unit size dividing method based on neural network

Info

Publication number
CN113781588A
CN113781588A
Authority
CN
China
Prior art keywords
coding unit
neural network
size
probability
division
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110750972.2A
Other languages
Chinese (zh)
Inventor
张鹏
刘浩宁
向国庆
严伟
贾惠柱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Original Assignee
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced Institute of Information Technology AIIT of Peking University, Hangzhou Weiming Information Technology Co Ltd filed Critical Advanced Institute of Information Technology AIIT of Peking University
Priority to CN202110750972.2A priority Critical patent/CN113781588A/en
Publication of CN113781588A publication Critical patent/CN113781588A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/002Image coding using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The application relates to the technical field of hardware encoders, and in particular to a neural-network-based intra-frame coding unit size division method. The method comprises the following steps: acquiring a coding unit with a first preset size; inputting the coding unit into a trained neural network model to obtain a coding unit with a second preset size; finding the corresponding positions in the model prediction result and performing an add-and-average operation to obtain the probability that the coding unit is not divided; if the probability of not dividing is greater than a first threshold, terminating the division early; if the probability of not dividing is smaller than a second threshold, calculating the probability of dividing the coding unit into sub-blocks, and then obtaining the probability of the coding unit with the second preset size under the current division mode. The method dispenses with statistics gathering and hand-designed features, reduces data dependence, flexibly restricts the division modes, and effectively improves the coding efficiency of the encoder.

Description

Intra-frame coding unit size dividing method based on neural network
Technical Field
The present application relates to the field of intra-frame coding technology, and more particularly, to a neural network-based intra-frame coding unit size partitioning method.
Background
Intra prediction is the process of generating a prediction value for the current sample using previously decoded samples in the same decoded picture. AVS denotes the Chinese digital audio and video coding and decoding standard, and AVS3 denotes the new-generation standard in that series. The prior art comprises conventional methods based on correlation analysis and classification methods based on machine learning. In the main conventional methods based on correlation analysis, the binary tree partitioning process of the second child block can be skipped when the rate-distortion (RD) cost of the parent and child blocks meets a certain constraint, and depth levels rarely selected as optimal among spatially adjacent CUs are skipped using the RD cost and mode correlation between different depth levels and spatially adjacent CUs. Among the classification methods based on machine learning, one designs a decision-tree structure of joint classifiers to eliminate unnecessary division modes at the current depth and reduce iterations; another, based on two-level binary classification, models the RD-cost increase caused by classification errors as a weight in support vector machine (SVM) training and decides whether to terminate the classification early.
However, the conventional methods based on correlation analysis depend heavily on global statistics of already-coded CUs, which makes complexity hard to control, and as the number of partition modes grows, manual statistical analysis easily loses important information. The classification methods based on machine learning rely too much on hand-designed features to extract useful coding information; their classification schemes are complex and their accuracy is low.
Therefore, the present application proposes a neural-network-based intra-coding unit size partitioning method to address these problems.
Disclosure of Invention
In order to achieve the above technical object, the present application provides a neural network-based intra coding unit size dividing method, including the following steps:
acquiring a coding unit with a first preset size;
inputting the coding unit into a trained neural network model to obtain a coding unit with a second preset size;
recording the size of the coding unit with the second preset size, finding the corresponding positions in the model prediction result, and performing an add-and-average operation to obtain the probability that the coding unit is not divided;
if the probability of not dividing is greater than a first threshold, terminating the division early;
if the probability of not dividing is smaller than a second threshold, calculating the probability of dividing the coding unit into sub-blocks, and then performing an add-and-average operation to trace back to the size of the coding unit, so as to obtain the probability of the coding unit with the second preset size under the current division mode;
and obtaining a preset number of probabilities, comparing and sorting them by magnitude, and selecting the division modes corresponding to the N largest probabilities for division.
Specifically, the division modes for dividing the coding unit into sub-blocks comprise: quadtree division, horizontal binary tree division, vertical binary tree division, horizontally extended quadtree division, and vertically extended quadtree division.
Still more specifically, the neural network model includes 1 input layer, 4 convolutional layers, 1 addition layer, and 2 deconvolution layers.
Preferably, the training steps of the neural network model are as follows:
obtaining samples;
dividing the samples into training samples and verification samples according to a preset proportion;
designing a loss function and an optimization algorithm;
inputting the training samples for training, and validating with the verification samples after each training pass;
redesigning the loss function and optimization algorithm if a preset verification effect is not achieved;
and when the number of iterations reaches the preset number, terminating the training.
Preferably, the loss function is:
$$L = -\sum_{j=1}^{C} \frac{1-\beta}{1-\beta^{n_j}}\,(1-p_j)^{\gamma}\,\log(p_j)$$

where $\frac{1-\beta}{1-\beta^{n_j}}$ is the class-balance factor ($n_j$ being the number of samples of class $j$), $(1-p_j)^{\gamma}$ is the modulation factor, $p_j$ is the predicted probability of class $j$, $C$ is the number of classes, and $\beta$ and $\gamma$ represent fixed coefficients.
Further, each convolutional layer comprises filters of a preset size.
Still further, the neural network model further comprises an output layer, and the activation function of the output layer is a softmax function.
The second aspect of the present invention provides an AVS3 hardware encoder, wherein the AVS3 hardware encoder applies the neural network-based intra-coding unit size division method in any one of the embodiments.
A third aspect of the invention provides a computer device comprising a memory and a processor, the memory having stored therein computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of:
acquiring a coding unit with a first preset size;
inputting the coding unit into a trained neural network model to obtain a coding unit with a second preset size;
recording the size of the coding unit with the second preset size, finding the corresponding positions in the model prediction result, and performing an add-and-average operation to obtain the probability that the coding unit is not divided;
if the probability of not dividing is greater than a first threshold, terminating the division early;
if the probability of not dividing is smaller than a second threshold, calculating the probability of dividing the coding unit into sub-blocks, and then performing an add-and-average operation to trace back to the size of the coding unit, so as to obtain the probability of the coding unit with the second preset size under the current division mode;
and obtaining a preset number of probabilities, comparing and sorting them by magnitude, and selecting the division modes corresponding to the N largest probabilities for division.
A fourth aspect of the present invention provides a computer storage medium having stored thereon a plurality of instructions adapted to be loaded by a processor and executed to perform the steps of:
acquiring a coding unit with a first preset size;
inputting the coding unit into a trained neural network model to obtain a coding unit with a second preset size;
recording the size of the coding unit with the second preset size, finding the corresponding positions in the model prediction result, and performing an add-and-average operation to obtain the probability that the coding unit is not divided;
if the probability of not dividing is greater than a first threshold, terminating the division early;
if the probability of not dividing is smaller than a second threshold, calculating the probability of dividing the coding unit into sub-blocks, and then performing an add-and-average operation to trace back to the size of the coding unit, so as to obtain the probability of the coding unit with the second preset size under the current division mode;
and obtaining a preset number of probabilities, comparing and sorting them by magnitude, and selecting the division modes corresponding to the N largest probabilities for division.
The beneficial effects of the present application are as follows: the method exploits the spatial correlation of video and predicts the current pixel from adjacent coded pixels in the same frame, thereby effectively removing spatial redundancy in the video. It dispenses with statistics gathering and hand-designed features, reduces data dependence, flexibly restricts the division modes, and effectively improves the coding efficiency of the encoder.
Drawings
FIG. 1 shows a schematic flow chart of the method of embodiment 1 of the present application;
FIG. 2 is a schematic diagram showing a partitioning method in embodiment 1 of the present application;
FIG. 3 is a schematic diagram illustrating a neural network training process in embodiments 1 and 2 of the present application;
fig. 4 shows a schematic diagram of a neural network structure in embodiment 2 of the present application;
FIG. 5 is a schematic diagram showing batch normalization in a neural network in example 2 of the present application;
FIG. 6 is a schematic diagram showing a dividing operation process in embodiment 2 of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 8 is a schematic diagram of a storage medium provided in an embodiment of the present application.
Detailed Description
Hereinafter, embodiments of the present application will be described with reference to the accompanying drawings. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present application. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present application. It will be apparent to one skilled in the art that the present application may be practiced without one or more of these details. In other instances, well-known features of the art have not been described in order to avoid obscuring the present application.
It should be noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the application. As used herein, the singular is intended to include the plural unless the context clearly dictates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Exemplary embodiments according to the present application will now be described in more detail with reference to the accompanying drawings. These exemplary embodiments may, however, be embodied in many different forms and should not be construed as limited to only the embodiments set forth herein. The figures are not drawn to scale, wherein certain details may be exaggerated and omitted for clarity. The shapes of various regions, layers, and relative sizes and positional relationships therebetween shown in the drawings are merely exemplary, and deviations may occur in practice due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions, as actually required.
Example 1:
the embodiment implements a neural network-based intra-frame coding unit size dividing method, as shown in fig. 1, including the following steps:
S1, acquiring a coding unit with a first preset size;
S2, inputting the coding unit into the trained neural network model to obtain a coding unit with a second preset size;
S3, recording the size of the coding unit with the second preset size, finding the corresponding positions in the model prediction result, and performing an add-and-average operation to obtain the probability that the coding unit is not divided;
S4, if the probability of not dividing is greater than a first threshold, terminating the division early;
S5, if the probability of not dividing is smaller than a second threshold, calculating the probability of dividing the coding unit into sub-blocks, and then performing an add-and-average operation to trace back to the size of the coding unit, so as to obtain the probability of the coding unit with the second preset size under the current division mode;
and S6, obtaining a preset number of probabilities, comparing and sorting them by magnitude, and selecting the division modes corresponding to the N largest probabilities for division.
Specifically, as shown in fig. 2, the division modes for dividing the coding unit into sub-blocks include: quadtree division DIV1, horizontal binary tree division DIV2, vertical binary tree division DIV3, horizontally extended quadtree division DIV4, and vertically extended quadtree division DIV5. Together with the non-division case, these 5 division methods give 6 candidates in total. Therefore, when the preset number of probabilities are compared and sorted by magnitude and the division modes corresponding to the N largest probabilities are selected for division, the preset number is preferably 6, and N is an integer between 1 and 6. A sketch of the sub-block geometry of the five split modes is given below.
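The following Python sketch makes the sub-block geometry of fig. 2 concrete. The 1:2:1 layout assumed for the extended-quadtree modes follows common AVS3 practice and is an assumption of this sketch rather than text from this record; the mode names are likewise illustrative.

```python
# Sub-rectangles (x, y, w, h) produced by each division mode of fig. 2
# for a W x H coding unit. The 1:2:1 extended-quadtree layout is assumed.

def sub_blocks(w, h, mode):
    if mode == "QT":     # quadtree division DIV1: four equal quadrants
        return [(x, y, w // 2, h // 2) for y in (0, h // 2) for x in (0, w // 2)]
    if mode == "HBT":    # horizontal binary tree division DIV2
        return [(0, 0, w, h // 2), (0, h // 2, w, h // 2)]
    if mode == "VBT":    # vertical binary tree division DIV3
        return [(0, 0, w // 2, h), (w // 2, 0, w // 2, h)]
    if mode == "HEQT":   # horizontally extended quadtree DIV4 (1:2:1 rows)
        return [(0, 0, w, h // 4),
                (0, h // 4, w // 2, h // 2), (w // 2, h // 4, w // 2, h // 2),
                (0, 3 * h // 4, w, h // 4)]
    if mode == "VEQT":   # vertically extended quadtree DIV5 (1:2:1 columns)
        return [(0, 0, w // 4, h),
                (w // 4, 0, w // 2, h // 2), (w // 4, h // 2, w // 2, h // 2),
                (3 * w // 4, 0, w // 4, h)]
    return [(0, 0, w, h)]  # "NO_SPLIT" or unknown mode: the block itself

print(sub_blocks(64, 64, "HEQT"))
# [(0, 0, 64, 16), (0, 16, 32, 32), (32, 16, 32, 32), (0, 48, 64, 16)]
```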
Still more specifically, the neural network model includes 1 input layer, 4 convolutional layers, 1 addition layer, and 2 deconvolution layers.
As shown in fig. 3, training the neural network model generally involves training-sample selection, model building, model training and the operation flow, and the model is continuously optimized during training. Preferably, the specific training steps of the neural network model of the present application are as follows:
obtaining samples;
dividing the samples into training samples and verification samples according to a preset proportion;
designing a loss function and an optimization algorithm; inputting the training samples for training, and validating with the verification samples after each training pass;
redesigning the loss function and optimization algorithm if a preset verification effect is not achieved;
and when the number of iterations reaches the preset number, terminating the training.
When obtaining samples, at least one sequence is selected at each resolution, and the spatial information (SI) measure is computed for the test sequences so that the selected sequences cover the whole SI range. Samples with a small rate-distortion (RD) cost difference between the 2N×2N and N×N coding units are eliminated, so that the model does not learn from these ambiguous samples and misclassify others; the elimination threshold is |ΔRD| ≤ 0.02, as sketched below. In model building, the coding unit (CU) input size is set to 64x64, each unit inside the CU is 4x4, the number of categories is 22 {0:4x8, 1:8x4, …, 21:64x64}, and the batch size is selected to be 64.
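A rough sketch of this elimination rule follows; it assumes the RD-cost difference is taken as an absolute value, and the record fields below are hypothetical names for illustration only.

```python
# Drop training samples whose RD-cost gap between the 2Nx2N and NxN
# decisions is within the threshold: their label is too ambiguous to learn.
THRESHOLD = 0.02

samples = [
    {"cu": "seq1_cu0", "rd_2Nx2N": 1.031, "rd_NxN": 1.040},  # gap 0.009 -> dropped
    {"cu": "seq1_cu1", "rd_2Nx2N": 0.820, "rd_NxN": 0.940},  # gap 0.120 -> kept
]

kept = [s for s in samples if abs(s["rd_2Nx2N"] - s["rd_NxN"]) > THRESHOLD]
print([s["cu"] for s in kept])  # ['seq1_cu1']
```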
Preferably, the loss function is:
$$L = -\sum_{j=1}^{C} \frac{1-\beta}{1-\beta^{n_j}}\,(1-p_j)^{\gamma}\,\log(p_j)$$

where $\frac{1-\beta}{1-\beta^{n_j}}$ is the class-balance factor ($n_j$ being the number of samples of class $j$), $(1-p_j)^{\gamma}$ is the modulation factor, $p_j$ is the predicted probability of class $j$, $C$ is the number of classes, and $\beta$ and $\gamma$ represent fixed coefficients.
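As a minimal numerical sketch of this loss, assuming the standard class-balanced form in which the balance factor is $(1-\beta)/(1-\beta^{n_j})$ with $n_j$ the sample count of class $j$ (the counts and the values of $\beta$ and $\gamma$ below are illustrative, not values fixed by the patent):

```python
import numpy as np

def cb_focal_loss(p, class_counts, beta=0.999, gamma=2.0):
    """p[j]: predicted probability of class j; class_counts[j]: samples of class j."""
    alpha = (1.0 - beta) / (1.0 - beta ** class_counts)  # class-balance factor
    modulation = (1.0 - p) ** gamma                      # up-weights hard samples
    return -np.sum(alpha * modulation * np.log(p))

counts = np.array([5000.0, 800.0, 120.0])  # imbalanced class sizes (illustrative)
probs = np.array([0.70, 0.40, 0.20])       # predicted probabilities (illustrative)
print(cb_focal_loss(probs, counts))
```

Note how the rare class (120 samples) and the low-confidence predictions dominate the loss, which is exactly the weighting behavior described for the two factors.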
Further, each convolutional layer includes filters of a preset size.
Still further, the neural network model further comprises an output layer, and the activation function of the output layer is a softmax function.
Example 2:
the implementation implements a neural network-based intra-frame coding unit size division method, which comprises the following steps:
step 1, obtaining a coding unit with a first preset size.
The present embodiment selects a Coding Unit (CU) size of 64x64 for the first preset size.
And 2, inputting the coding unit into the trained neural network model to obtain a coding unit with a second preset size.
For the second preset size, the present embodiment selects an output of size 16x16x22, i.e., a 16x16 grid of 22-class predictions, one for each 4x4 unit of the 64x64 CU.
And 3, recording the size of the coding unit with the second preset size, finding the corresponding positions in the model prediction result, and performing an add-and-average operation to obtain the probability that the coding unit is not divided.
The general process of training the neural network model is still shown in fig. 3, and the specific training steps are as follows:
obtaining samples;
dividing the samples into training samples and verification samples according to a preset proportion;
designing a loss function and an optimization algorithm;
inputting the training samples for training, and validating with the verification samples after each training pass;
redesigning the loss function and optimization algorithm if a preset verification effect is not achieved;
and when the number of iterations reaches the preset number, terminating the training.
Preferably, the loss function is:
$$L = -\sum_{j=1}^{C} \frac{1-\beta}{1-\beta^{n_j}}\,(1-p_j)^{\gamma}\,\log(p_j)$$

where $\frac{1-\beta}{1-\beta^{n_j}}$ is the class-balance factor ($n_j$ being the number of samples of class $j$), $(1-p_j)^{\gamma}$ is the modulation factor, $p_j$ is the predicted probability of class $j$, $C$ is the number of classes, and $\beta$ and $\gamma$ represent fixed coefficients. The class-balance factor assigns a higher loss weight to classes with fewer samples, and the modulation factor further increases the loss weight of samples that are harder to classify, so that the model focuses more on misclassified samples.
Preferably, the optimization algorithm adopts the Adam algorithm, whose full name is Adaptive Moment Estimation.
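The training flow of fig. 3 can be summarized in a self-contained toy sketch. The stand-in linear network, stand-in cross-entropy loss, and random data below are illustrative assumptions only; the embodiment itself trains the CNN of fig. 4 with the class-balanced focal loss above.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 22))  # stand-in network
loss_fn = nn.CrossEntropyLoss()                              # stand-in loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)    # Adam optimizer

x_train = torch.randn(64, 64 * 64)       # one batch of 64 flattened 64x64 CUs
y_train = torch.randint(0, 22, (64,))    # 22 partition classes
x_val = torch.randn(16, 64 * 64)         # dummy verification samples
y_val = torch.randint(0, 22, (16,))

for epoch in range(100):                 # preset number of iterations
    optimizer.zero_grad()
    loss = loss_fn(model(x_train), y_train)   # training pass
    loss.backward()
    optimizer.step()
    with torch.no_grad():                # validate after each training pass
        val_loss = loss_fn(model(x_val), y_val)
    # if val_loss misses the preset verification target, the loss function
    # and optimizer are redesigned per the training steps above
```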
As shown in fig. 4, the proposed deep learning model structure consists of 1 input layer, 4 convolutional layers, 1 addition layer and 2 deconvolution layers, where conv stands for a convolutional layer, deconv for a deconvolution layer, ⊕ for the addition layer, and QP identifies the quantization parameter; among the unlabeled arrows, solid arrows represent convolution operations and dashed arrows represent deconvolution operations. As shown in fig. 5, each convolution layer and each deconvolution layer includes batch normalization (BN), which normalizes each batch of data and thereby makes training more efficient. In addition, the activation function of the output layer is the softmax function, and the activation functions of the other layers are ReLU functions.
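A minimal PyTorch sketch of this structure follows. The channel widths, strides, and the way QP is injected at the addition layer are assumptions of this sketch (the record does not fix them), chosen so that a 64x64 CU maps to the 16x16x22 prediction of this embodiment.

```python
import torch
from torch import nn

class CuPartitionNet(nn.Module):
    """Sketch: 1 input layer, 4 conv layers, 1 addition layer, 2 deconv layers,
    BN after every (de)conv per fig. 5, ReLU inside, softmax at the output."""
    def __init__(self, classes=22):
        super().__init__()
        def conv(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                                 nn.BatchNorm2d(cout), nn.ReLU())
        self.c1, self.c2 = conv(1, 16), conv(16, 32)    # 64 -> 32 -> 16
        self.c3, self.c4 = conv(32, 64), conv(64, 64)   # 16 -> 8 -> 4
        self.qp = nn.Linear(1, 64)                      # QP embedding (assumed form)
        self.d1 = nn.Sequential(nn.ConvTranspose2d(64, 64, 2, stride=2),
                                nn.BatchNorm2d(64), nn.ReLU())        # 4 -> 8
        self.d2 = nn.Sequential(nn.ConvTranspose2d(64, classes, 2, stride=2),
                                nn.BatchNorm2d(classes))              # 8 -> 16

    def forward(self, cu, qp):
        x = self.c4(self.c3(self.c2(self.c1(cu))))
        x = x + self.qp(qp).view(-1, 64, 1, 1)            # addition layer: inject QP
        return torch.softmax(self.d2(self.d1(x)), dim=1)  # per-4x4-unit classes

out = CuPartitionNet()(torch.randn(1, 1, 64, 64), torch.tensor([[32.0]]))
print(out.shape)  # torch.Size([1, 22, 16, 16])
```

This is one of many layer layouts consistent with the 64x64 → 16x16x22 mapping; fig. 4 does not constrain the design further in this record.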
And 4, if the probability of not dividing is greater than a first threshold, terminating the division early.
And 5, if the probability of not dividing is smaller than a second threshold, calculating the probability of dividing the coding unit into sub-blocks, and then performing an add-and-average operation to trace back to the size of the coding unit, so as to obtain the probability of the coding unit with the second preset size under the current division mode.
And 6, obtaining a preset number of probabilities, comparing and sorting them by magnitude, and selecting the division modes corresponding to the N largest probabilities for division.
Referring to fig. 6, CU denotes a coding unit, QP the quantization parameter, SUB a sub-block, QT quadtree division, HBT horizontal binary tree division, and P probability; P(N) denotes the probability that the coding unit is not divided, P(QT) the probability of quadtree division, P(HBT) the probability of horizontal binary tree division, and the ellipses denote the vertical binary tree, horizontally extended quadtree, and vertically extended quadtree divisions that are likewise performed on the coding unit. Different division operations are performed on the current coding unit; the coordinates and sizes of the sub-blocks are recorded and the corresponding positions are found in the model prediction result, i.e., the probabilities of the divided sub-blocks are calculated; these sub-block probabilities are then traced back to the size of the current coding unit through an add-and-average operation, yielding the prediction probability of the current coding unit under that specific division mode. In this embodiment, the preset number is set to 6; the 6 division modes are quadtree division, horizontal binary tree division, vertical binary tree division, horizontally expanded quadtree division, vertically expanded quadtree division, and non-division; N is an integer between 1 and 6. An illustrative sketch of this add-and-average backtracking and the top-N selection follows.
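The sketch below reuses the sub_blocks() helper from the geometry sketch in embodiment 1 (assumed to be in scope). The class indexing {0:4x8, 1:8x4, …, 21:64x64} is taken from embodiment 1, but the particular SIZE_TO_CLASS entries and the thresholds T1/T2 are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np

UNIT = 4  # each prediction cell of the 16x16x22 map covers a 4x4 area
SIZE_TO_CLASS = {(32, 32): 18, (64, 32): 19, (32, 64): 20, (64, 64): 21,
                 (64, 16): 16, (16, 64): 17}  # illustrative indices only

def block_prob(pred, x, y, w, h, cls):
    # average the probability of class `cls` over the 4x4 cells a block covers
    return float(pred[y // UNIT:(y + h) // UNIT, x // UNIT:(x + w) // UNIT, cls].mean())

def mode_prob(pred, cu, mode):
    # probability of one division mode: per-sub-block probabilities are
    # added and averaged back up to the size of the current coding unit
    x0, y0, w0, h0 = cu
    subs = sub_blocks(w0, h0, mode)
    return sum(block_prob(pred, x0 + x, y0 + y, w, h, SIZE_TO_CLASS[(w, h)])
               for (x, y, w, h) in subs) / len(subs)

def decide(pred, cu, t1=0.9, t2=0.6, n=2):
    # steps 4-6: early termination, else rank all six modes and keep top n
    p_no = mode_prob(pred, cu, "NO_SPLIT")   # probability of not dividing
    if p_no > t1:
        return ["NO_SPLIT"]                  # terminate the division early
    modes = ["NO_SPLIT", "QT", "HBT", "VBT", "HEQT", "VEQT"]
    probs = {m: mode_prob(pred, cu, m) for m in modes}
    return sorted(probs, key=probs.get, reverse=True)[:n]

pred = np.full((16, 16, 22), 1.0 / 22)  # dummy uniform 16x16x22 prediction map
print(decide(pred, (0, 0, 64, 64)))     # top-2 candidate modes under the dummy map
```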
Example 3:
the present embodiment implements an AVS3 hardware encoder, and the AVS3 hardware encoder applies the neural network-based intra-coding unit size division method in any of the above embodiments. The intra-frame coding unit size dividing method based on the neural network comprises the following steps:
acquiring a coding unit with a first preset size;
inputting the coding unit into a trained neural network model to obtain a coding unit with a second preset size;
recording the size of the coding unit with the second preset size, finding the corresponding positions in the model prediction result, and performing an add-and-average operation to obtain the probability that the coding unit is not divided;
if the probability of not dividing is greater than a first threshold, terminating the division early;
if the probability of not dividing is smaller than a second threshold, calculating the probability of dividing the coding unit into sub-blocks, and then performing an add-and-average operation to trace back to the size of the coding unit, so as to obtain the probability of the coding unit with the second preset size under the current division mode;
and obtaining a preset number of probabilities, comparing and sorting them by magnitude, and selecting the division modes corresponding to the N largest probabilities for division.
Referring next to fig. 7, a schematic diagram of an electronic device provided in some embodiments of the present application is shown. As shown in fig. 7, the electronic device 2 includes: the system comprises a processor 200, a memory 201, a bus 202 and a communication interface 203, wherein the processor 200, the communication interface 203 and the memory 201 are connected through the bus 202; the memory 201 stores a computer program that can be executed on the processor 200, and the processor 200 executes the computer program to execute the neural network-based intra-coding unit size dividing method provided by any one of the foregoing embodiments of the present application.
The memory 201 may include a high-speed random access memory (RAM) and may further include a non-volatile memory, such as at least one disk memory. The communication connection between the network elements of the system and at least one other network element is realized through at least one communication interface 203 (which may be wired or wireless), and the Internet, a wide area network, a local area network, a metropolitan area network, and the like can be used.
Bus 202 can be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The memory 201 is configured to store a program, and the processor 200 executes the program after receiving an execution instruction, where the neural network-based intra-frame coding unit size dividing method disclosed in any of the foregoing embodiments of the present application may be applied to the processor 200, or implemented by the processor 200.
The processor 200 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 200. The processor 200 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in RAM, flash memory, ROM, PROM or EEPROM, registers, or other storage media well known in the art. The storage medium is located in the memory 201, and the processor 200 reads the information in the memory 201 and completes the steps of the method in combination with its hardware.
The electronic device provided by the embodiment of the application and the neural network-based intra-frame coding unit size dividing method provided by the embodiment of the application have the same inventive concept and have the same beneficial effects as the method adopted, operated or realized by the electronic device.
Referring to fig. 8, the computer readable storage medium is an optical disc 30, on which a computer program (i.e., a program product) is stored, and when the computer program is executed by a processor, the computer program performs the neural network based intra-frame coding unit size division method according to any of the foregoing embodiments.
Examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.
It should be noted that the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general-purpose devices may be used with the teachings herein, and the structure required for constructing such a device will be apparent from the description above. In addition, this application is not directed to any particular programming language; it will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein. In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure the understanding of this description. Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the application and aiding the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An intra-frame coding unit size dividing method based on a neural network is characterized by comprising the following steps:
acquiring a coding unit with a first preset size;
inputting the coding unit into a trained neural network model to obtain a coding unit with a second preset size;
recording the size of the coding unit with the second preset size, finding the corresponding positions in the model prediction result, and performing an add-and-average operation to obtain the probability that the coding unit is not divided;
if the probability of not dividing is greater than a first threshold, terminating the division early;
if the probability of not dividing is smaller than a second threshold, calculating the probability of dividing the coding unit into sub-blocks, and then performing an add-and-average operation to trace back to the size of the coding unit, so as to obtain the probability of the coding unit with the second preset size under the current division mode;
and obtaining a preset number of probabilities, comparing and sorting them by magnitude, and selecting the division modes corresponding to the N largest probabilities for division.
2. The neural network-based intra-coding unit size division method according to claim 1, wherein the division modes for dividing the coding unit into sub-blocks comprise: quadtree division, horizontal binary tree division, vertical binary tree division, horizontally extended quadtree division, and vertically extended quadtree division.
3. The neural network-based intra-coding unit size partitioning method according to claim 1, wherein the neural network model includes 1 input layer, 4 convolutional layers, 1 addition layer, and 2 deconvolution layers.
4. The neural network-based intra-coding unit size partitioning method according to claim 1, wherein the training of the neural network model comprises the following steps:
obtaining samples;
dividing the samples into training samples and verification samples according to a preset proportion;
designing a loss function and an optimization algorithm;
inputting the training samples for training, and validating with the verification samples after each training pass;
redesigning the loss function and optimization algorithm if a preset verification effect is not achieved;
and when the number of iterations reaches the preset number, terminating the training.
5. The neural network-based intra-coding unit size partitioning method according to claim 4, wherein the loss function is:
$$L = -\sum_{j=1}^{C} \frac{1-\beta}{1-\beta^{n_j}}\,(1-p_j)^{\gamma}\,\log(p_j)$$

where $\frac{1-\beta}{1-\beta^{n_j}}$ is the class-balance factor ($n_j$ being the number of samples of class $j$), $(1-p_j)^{\gamma}$ is the modulation factor, $p_j$ is the predicted probability of class $j$, $C$ is the number of classes, and $\beta$ and $\gamma$ represent fixed coefficients.
6. The neural network-based intra-coding unit size division method of claim 3, wherein each convolutional layer comprises filters of a preset size.
7. The neural network-based intra-coding unit size partitioning method according to claim 3, wherein the neural network model further comprises an output layer, and an activation function of the output layer is a softmax function.
8. An AVS3 hardware encoder, wherein the AVS3 hardware encoder applies the neural network based intra-coding unit size partitioning method as claimed in any one of claims 1 to 7.
9. A computer device comprising a memory and a processor, wherein computer readable instructions are stored in the memory, which computer readable instructions, when executed by the processor, cause the processor to perform the steps of the method according to any one of claims 1 to 7.
10. A computer storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to carry out the steps of the method according to any one of claims 1 to 7.
CN202110750972.2A 2021-07-01 2021-07-01 Intra-frame coding unit size dividing method based on neural network Pending CN113781588A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110750972.2A CN113781588A (en) 2021-07-01 2021-07-01 Intra-frame coding unit size dividing method based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110750972.2A CN113781588A (en) 2021-07-01 2021-07-01 Intra-frame coding unit size dividing method based on neural network

Publications (1)

Publication Number Publication Date
CN113781588A true CN113781588A (en) 2021-12-10

Family

ID=78836058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110750972.2A Pending CN113781588A (en) 2021-07-01 2021-07-01 Intra-frame coding unit size dividing method based on neural network

Country Status (1)

Country Link
CN (1) CN113781588A (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060251330A1 (en) * 2003-05-20 2006-11-09 Peter Toth Hybrid video compression method
CN108200442A (en) * 2018-01-23 2018-06-22 北京易智能科技有限公司 A kind of HEVC intraframe coding dividing elements methods based on neural network
US20210136371A1 (en) * 2018-04-10 2021-05-06 InterDigitai VC Holdings, Inc. Deep learning based imaged partitioning for video compression
CN109788296A (en) * 2018-12-25 2019-05-21 中山大学 Interframe encode dividing elements method, apparatus and storage medium for HEVC
US20190273948A1 (en) * 2019-01-08 2019-09-05 Intel Corporation Method and system of neural network loop filtering for video coding
CN109714584A (en) * 2019-01-11 2019-05-03 杭州电子科技大学 3D-HEVC depth map encoding unit high-speed decision method based on deep learning
CN111263145A (en) * 2020-01-17 2020-06-09 福州大学 Multifunctional video rapid coding method based on deep neural network
CN111510728A (en) * 2020-04-12 2020-08-07 北京工业大学 HEVC intra-frame rapid coding method based on depth feature expression and learning
CN111757110A (en) * 2020-07-02 2020-10-09 中实燃气发展(西安)有限公司 Video coding method, coding tree unit dividing method, system, device and readable storage medium
CN111800642A (en) * 2020-07-02 2020-10-20 中实燃气发展(西安)有限公司 HEVC intra-frame angle mode selection method, device and equipment and readable storage medium
CN112887712A (en) * 2021-02-03 2021-06-01 重庆邮电大学 HEVC intra-frame CTU partitioning method based on convolutional neural network

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115052154A (en) * 2022-05-30 2022-09-13 北京百度网讯科技有限公司 Model training and video coding method, device, equipment and storage medium
CN115052154B (en) * 2022-05-30 2023-04-14 北京百度网讯科技有限公司 Model training and video coding method, device, equipment and storage medium
CN116489386A (en) * 2023-03-24 2023-07-25 重庆邮电大学 VVC inter-frame rapid coding method based on reference block
CN117692663A (en) * 2024-01-31 2024-03-12 腾讯科技(深圳)有限公司 Binary tree partitioning processing method, equipment and storage medium for coding unit

Similar Documents

Publication Publication Date Title
CN113781588A (en) Intra-frame coding unit size dividing method based on neural network
CN109711413B (en) Image semantic segmentation method based on deep learning
RU2708347C1 (en) Image encoding method and device and image decoding method and device
US20200213587A1 (en) Method and apparatus for filtering with mode-aware deep learning
TWI445411B (en) Method for performing local motion vector derivation during video coding of a coding unit, and associated apparatus
CN104661031A (en) Method for coding and decoding video image, coding equipment and decoding equipment
CN111758254B (en) Efficient context model computational design in transform coefficient coding
CN104937937A (en) Video coding method using at least evaluated visual quality and related video coding apparatus
CN111986278B (en) Image encoding device, probability model generating device, and image compression system
CN109889827B (en) Intra-frame prediction coding method and device, electronic equipment and computer storage medium
JP2002007966A (en) Document image decoding method
CN115606188A (en) Point cloud encoding and decoding method, encoder, decoder and storage medium
CN113747163A (en) Image coding and decoding method and compression method based on context reorganization modeling
CN115379217A (en) Video coding processing method, device, equipment and storage medium
Forchhammer et al. Optimal context quantization in lossless compression of image data sequences
Zhang et al. Graph-based transform for 2D piecewise smooth signals with random discontinuity locations
KR101192060B1 (en) Method and device for choosing a motion vector for the coding of a set of blocks
JP2018182531A (en) Division shape determining apparatus, learning apparatus, division shape determining method, and division shape determining program
CN116033153A (en) Method and system for rapidly dividing coding units under VVC standard
CN113691808A (en) Neural network-based interframe coding unit size dividing method
CN1620143A (en) Inter and intra band prediction of singularity coefficients using estimates based on nonlinear approximants
CN114938455A (en) Coding method and device based on unit characteristics, electronic equipment and storage medium
CN108322741A (en) A kind of method and device of determining coding mode
Mercat et al. Machine learning based choice of characteristics for the one-shot determination of the HEVC intra coding tree
Hasan et al. Multilevel decomposition Discrete Wavelet Transform for hardware image compression architectures applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination