CN111432208B - Method for determining intra-frame prediction mode by using neural network - Google Patents


Info

Publication number
CN111432208B
CN111432208B (application CN202010247869.1A)
Authority
CN
China
Prior art keywords
neural network
vector
mode
prediction
prediction mode
Prior art date
Legal status
Active
Application number
CN202010247869.1A
Other languages
Chinese (zh)
Other versions
CN111432208A (en)
Inventor
吴振东
李锐
金长新
Current Assignee
Shandong Inspur Scientific Research Institute Co Ltd
Original Assignee
Shandong Inspur Scientific Research Institute Co Ltd
Priority date
Filing date
Publication date
Application filed by Shandong Inspur Scientific Research Institute Co Ltd filed Critical Shandong Inspur Scientific Research Institute Co Ltd
Priority to CN202010247869.1A priority Critical patent/CN111432208B/en
Publication of CN111432208A publication Critical patent/CN111432208A/en
Application granted granted Critical
Publication of CN111432208B publication Critical patent/CN111432208B/en


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/11Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a method for determining an intra-frame prediction mode by using a neural network, and relates to the HEVC intra-frame coding mode. The implementation process of the adopted scheme comprises the following steps: step 1, extracting a reconstructed sample from a key frame; step 2, computing on the reconstructed sample with a first neural network, based on the different intra-frame prediction modes of HEVC, to obtain a classification-mode probability list; step 3, using predIdx as an index and determining the value of the index predIdx through bin coding; step 4, selecting the prediction mode with the maximum probability from the classification-mode probability list obtained in step 2 according to the value of the index predIdx; and step 5, performing mode prediction through a second neural network: passing the reconstructed sample through the second neural network and, using the prediction mode selected in step 4, obtaining the final intra-frame prediction mode through transformation and pruning. The method uses a neural network to obtain the optimal intra-frame prediction mode of a key frame and performs video coding with this optimal mode, improving video compression efficiency while ensuring video quality.

Description

Method for determining intra-frame prediction mode by using neural network
Technical Field
The invention relates to an HEVC intra-frame coding mode, in particular to a method for determining an intra-frame prediction mode by utilizing a neural network.
Background
Video coding converts a file from an original video format into another video format by means of compression technology.
Traditional video compression algorithms such as MPEG-4, H.264, and H.265 mostly follow a predictive coding structure, are hard-coded, and cannot adapt to growing demands and increasingly diverse video use cases. Machine-learning-based methods have brought revolutionary development to the field of video compression.
Artificial neural networks have been a research hotspot in artificial intelligence since the 1980s. They abstract the neuron networks of the human brain from an information-processing perspective, build simple models, and form different networks according to different connection patterns. In recent decades, research on artificial neural networks has deepened and made great progress. At present, new breakthroughs have been made in video coding and decoding: neural networks break away from the traditional video compression approach and provide significant optimization of, or substitutes for, several key techniques.
Intra-frame prediction exploits spatial correlation within a frame, using adjacent already-coded pixels of the same picture to predict the current pixels and thereby effectively remove spatial redundancy. Today's hybrid video coding systems typically perform intra prediction, whereby a block of prediction samples is derived from previously decoded samples of the same picture.
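As a minimal illustration of this idea (a hypothetical sketch, not the network-based method of the invention), a DC-style predictor fills a block with the mean of the already-decoded samples above and to the left; the helper name and sample values below are made up:

```python
import numpy as np

def dc_intra_predict(above, left):
    # Fill an MxN block with the mean of the previously decoded neighboring
    # samples, as in DC intra prediction (illustrative helper only).
    neighbors = np.concatenate([np.ravel(above), np.ravel(left)])
    dc = int(round(neighbors.mean()))
    return np.full((len(left), len(above)), dc, dtype=np.int32)

above = np.array([100, 102, 98, 100])  # decoded row above the current block
left = np.array([99, 101, 100, 100])   # decoded column left of the block
block = dc_intra_predict(above, left)  # 4x4 block filled with the neighbor mean
```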
Disclosure of Invention
The invention provides a method for determining an intra-frame prediction mode by using a neural network, in view of the needs and shortcomings of current technical development.
The invention discloses a method for determining an intra-frame prediction mode by using a neural network, which adopts the following technical scheme for solving the technical problems:
A method for determining an intra-frame prediction mode by using a neural network is realized through the following steps:
step 1, extracting a reconstructed sample from a key frame;
step 2, computing on the reconstructed sample with a first neural network, based on the different intra-frame prediction modes of HEVC, to obtain a classification-mode probability list;
step 3, using predIdx as an index and determining the value of the index predIdx through bin coding;
step 4, selecting the prediction mode with the maximum probability from the classification-mode probability list obtained in step 2 according to the value of the index predIdx;
and step 5, performing mode prediction through a second neural network: passing the reconstructed sample through the second neural network and, using the prediction mode selected in step 4, obtaining the final intra-frame prediction mode through transformation and pruning.
Optionally, when step 2 is performed, the reconstructed sample is passed through the first neural network; in this case,
the first neural network extracts a group of features from the key frame;
and, according to the extracted features, the first neural network predicts M × N pixel blocks on the key frame, where M ≤ 32, N ≤ 32, and M and N are integer powers of 2.
Preferably, the first neural network comprises a non-linear computation hidden layer and an affine linear computation output layer.
Optionally, the specific operation of obtaining the classification-mode probability list in step 2 includes:
step 2.1, a group of reference samples r is processed by the first neural network, and a luma prediction signal pred is obtained according to the extracted features; the reference sample r is taken from the key frame and consists of K rows of size N + K above the block and K columns of size M + K to its left;
step 2.2, the first neural network performs probability calculation of the different intra-frame prediction modes on the reconstructed sample through the luma prediction signal pred;
step 2.3, the different intra-frame prediction modes and their probabilities from step 2.2 are sorted into a classification-mode probability list.
Further optionally, the specific operation of obtaining the luma prediction signal pred in step 2.1 is:
step 2.1.1, the first neural network extracts a feature vector from the reference sample r, where d0 = K × (M + N + K) is the number of samples of r, and r is treated as a vector in the d0-dimensional real vector space;
step 2.1.2, for fixed integer square matrices A1 and A2 with d0 rows and columns and fixed integer bias vectors b1 and b2 of dimension d0, t1 can be calculated:
t1 = ρ(A1 · r + b1)
where · denotes the ordinary matrix-vector product and the function ρ is an integer approximation of the ELU function ρ0, the ρ0 function being defined componentwise on a p-dimensional vector v by
ρ0(v)i = vi, if vi ≥ 0; ρ0(v)i = exp(vi) − 1, if vi < 0,
where ρ0(v)i and vi denote the i-th components of the respective vectors;
step 2.1.3, following step 2.1.2, t2 is calculated:
t2 = ρ(A2 · t1 + b2)
step 2.1.4, following the preceding steps, for a predefined integer matrix A3 with d0 rows and d0 columns and a predefined integer bias vector b3 of dimension d0, the feature vector ftr is calculated:
ftr = ρ(A3 · t2 + b3)
step 2.1.5, on top of the feature vector ftr, an affine linear mapping generates the final prediction signal together with a standard bit-depth-dependent clipping operation Clip; for a predefined matrix A4,q with M × N rows and d0 columns and a predefined bias vector b4,q of dimension M × N, the prediction signal pred is:
pred = Clip(A4,q · ftr + b4,q)
where q = predmode denotes the prediction mode.
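The chain of computations in steps 2.1.2 to 2.1.5 can be sketched as below; the weight matrices and reference vector are random placeholders standing in for the predefined integer matrices and real reference samples, and the block sizes are illustrative:

```python
import numpy as np

def elu(v):
    # rho0: identity for non-negative components, exp(v) - 1 otherwise (ELU);
    # the minimum() guards against overflow in the unused positive branch
    return np.where(v >= 0, v, np.exp(np.minimum(v, 0)) - 1.0)

def nn_one_predict(r, A1, b1, A2, b2, A3, b3, A4q, b4q, bit_depth=8):
    t1 = elu(A1 @ r + b1)        # step 2.1.2
    t2 = elu(A2 @ t1 + b2)       # step 2.1.3
    ftr = elu(A3 @ t2 + b3)      # step 2.1.4: feature vector
    pred = A4q @ ftr + b4q       # step 2.1.5: affine linear mapping for mode q
    return np.clip(pred, 0, 2 ** bit_depth - 1)  # bit-depth-dependent Clip

M, N, K = 4, 4, 2                # illustrative block and reference-line sizes
d0 = K * (M + N + K)             # number of reference samples
rng = np.random.default_rng(0)
r = rng.integers(0, 256, d0).astype(float)   # stand-in reference vector
A1, A2, A3 = [rng.integers(-2, 3, (d0, d0)).astype(float) for _ in range(3)]
b1 = b2 = b3 = np.zeros(d0)
A4q = rng.integers(-2, 3, (M * N, d0)).astype(float)
b4q = np.zeros(M * N)
pred = nn_one_predict(r, A1, b1, A2, b2, A3, b3, A4q, b4q)
```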
Optionally, the specific operation of executing step 3 includes:
step 3.1, setting n prediction modes, where n = 3 + 2^k;
with predIdx used as an index, 0 ≤ predIdx < n;
if max(M, N) = 32, then k = 3; otherwise k = 5;
step 3.2, a first bin determines whether predIdx is less than 3; if predIdx < 3, a second bin determines whether predIdx equals 0, and if predIdx ≠ 0, a third bin determines whether predIdx equals 1 or 2; if predIdx ≥ 3, the value of predIdx is determined using k bins;
step 3.3, for any two values i, j of predIdx with i ≠ j, if the two components (lgt)i and (lgt)j are equal, then
if i < j, (lgt)i is considered less than (lgt)j,
and if i > j, (lgt)i is considered greater than (lgt)j.
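The binarization of step 3.2 can be sketched as follows; the concrete bin values and their ordering are assumptions, only the structure (one bin deciding predIdx < 3, a short prefix distinguishing 0, 1, 2, and k fixed-length bins otherwise) follows the text:

```python
def encode_predidx(predidx, k):
    # Binarize predIdx as in step 3.2 (bin values are illustrative).
    if predidx < 3:
        bins = [1]                    # first bin: predIdx < 3
        if predidx == 0:
            bins.append(1)            # second bin: predIdx == 0
        else:
            bins += [0, predidx - 1]  # third bin distinguishes 1 from 2
        return bins
    # k bins carry the value predIdx - 3 in fixed-length binary
    value = predidx - 3
    return [0] + [(value >> i) & 1 for i in reversed(range(k))]
```

With k = 3 this covers all n = 3 + 2^3 = 11 modes with distinct codes.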
Optionally, the specific operation of selecting the maximum-probability prediction mode in step 4 is as follows:
step 4.1, the first neural network hides a reference sample r;
step 4.2, the first neural network predicts the reconstructed sample through the luma prediction signal pred, obtaining a prediction-result sample r' for the hidden reference sample in the key frame;
step 4.3, the prediction-result sample r' is treated as a vector in the real vector space of dimension K × (M + N + K); for a fixed square matrix A1' with K × (M + N + K) rows and columns and a fixed bias vector b1' in the K × (M + N + K)-dimensional real vector space, t1' can be calculated:
t1' = ρ(A1' · r' + b1')
where · denotes the ordinary matrix-vector product and the function ρ is an integer approximation of the ELU function ρ0;
step 4.4, for a matrix A2' with n rows and K × (M + N + K) columns and a fixed bias vector b2' in the n-dimensional real vector space, lgt can be calculated:
lgt = A2' · t1' + b2'
where, as noted in step 2.1.5, q = predmode denotes the prediction mode; predmode is thus located at the position of the predIdx-th largest component of lgt;
step 4.5, the i-th entry lgti of the vector lgt is normalized by the softmax function as the logarithm of the conditional probability p(i | r'), giving:
p(i | r') = exp(lgti) / Σj exp(lgtj)
The index predIdx then indicates selection of the predIdx-th most likely prediction mode.
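Steps 4.4 and 4.5 can be sketched as follows, applying the tie-breaking convention of step 3.3 (among equal components, the one with the larger index ranks higher) when locating the predIdx-th largest logit; the logit values are made up:

```python
import numpy as np

def softmax(lgt):
    # step 4.5: turn the logits lgt into conditional probabilities p(i | r')
    e = np.exp(lgt - lgt.max())   # subtract the max for numerical stability
    return e / e.sum()

def select_mode(lgt, predidx):
    # predmode is the position of the predIdx-th largest component of lgt;
    # per step 3.3, among equal components (lgt)i with i < j ranks below (lgt)j
    order = sorted(range(len(lgt)), key=lambda i: (-lgt[i], -i))
    return order[predidx]

lgt = np.array([1.2, 3.4, 0.5, 3.4, 2.0])   # illustrative logits
p = softmax(lgt)                            # probabilities over the n modes
```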
As a further option, the number of prediction modes n involved does not exceed 35;
when max(M, N) < 32, n equals 35; otherwise n equals 11;
when n equals 35, the prediction modes include the DC mode (0), the planar mode (1), and 33 angular modes (2-34);
when n equals 11, the prediction modes include the DC mode, the planar mode, and 9 angular modes (2-10),
where 2 is the initial angle and 10 is the horizontal angle.
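The stated relationship between the block size, the mode count n, and the bin count k can be written as a small helper (a sketch of the rule above):

```python
def mode_config(M, N):
    # n = 35 when max(M, N) < 32, otherwise n = 11; and n = 3 + 2**k,
    # so k = log2(n - 3)
    n = 35 if max(M, N) < 32 else 11
    k = (n - 3).bit_length() - 1
    return n, k
```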
Optionally, the second neural network involved comprises three nonlinear computation hidden layers and one affine linear computation output layer;
the reconstructed sample passes in turn through the three nonlinear computation hidden layers and the affine linear computation output layer of the second neural network, and the prediction mode selected in step 4 is then transformed and pruned to obtain the final intra-frame prediction mode.
Compared with the prior art, the method for determining an intra-frame prediction mode by using a neural network has the following beneficial effects:
the optimal intra-frame prediction mode of a key frame can be obtained by using the neural network, and video coding is performed with this optimal mode, improving video compression efficiency while ensuring video quality.
Drawings
FIG. 1 is a flow chart of the method of the first embodiment of the present invention;
FIG. 2 is a diagram of predicting M × N intra blocks from reconstructed samples using the first neural network;
FIG. 3 is a diagram of predicting the different modes and their probabilities p from reconstructed samples using the first neural network.
Detailed Description
To make the technical solutions, the technical problems to be solved, and the technical effects of the present invention clearer, the technical solutions of the present invention are described below in combination with specific embodiments.
The first embodiment is as follows:
with reference to fig. 1, this embodiment provides a method for determining an intra prediction mode by using a neural network, where the method is implemented by:
Step 1, extracting a reconstructed sample from a key frame.
Step 2, passing the reconstructed sample through the first neural network, which comprises a nonlinear computation hidden layer and an affine linear computation output layer, and computing a classification-mode probability list based on the different intra-frame prediction modes of HEVC.
Referring to fig. 2 and 3, when this step is performed, after the reconstructed sample passes through the first neural network,
the first neural network extracts a group of features from the key frame;
and, according to the extracted features, the first neural network predicts M × N pixel blocks on the key frame, where M ≤ 32, N ≤ 32, and M and N are integer powers of 2.
Referring to fig. 3, the specific operation of obtaining the probability list of the classification mode includes:
Step 2.1, a group of reference samples r is processed by the first neural network, and steps 2.1.1 to 2.1.5 are executed to obtain the luma prediction signal pred; the reference sample r is taken from the key frame and consists of K rows of size N + K above the block and K columns of size M + K to its left, where the value of K is set equal to 2.
The specific operation of obtaining the luma prediction signal pred in step 2.1 is as follows:
step 2.1.1, the first neural network extracts a feature vector from the reference sample r, where d0 = K × (M + N + K) is the number of samples of r, and r is treated as a vector in the d0-dimensional real vector space;
step 2.1.2, for fixed integer square matrices A1 and A2 with d0 rows and columns and fixed integer bias vectors b1 and b2 of dimension d0, t1 can be calculated:
t1 = ρ(A1 · r + b1)
where · denotes the ordinary matrix-vector product and the function ρ is an integer approximation of the ELU function ρ0, the ρ0 function being defined componentwise on a p-dimensional vector v by
ρ0(v)i = vi, if vi ≥ 0; ρ0(v)i = exp(vi) − 1, if vi < 0,
where ρ0(v)i and vi denote the i-th components of the respective vectors;
step 2.1.3, following step 2.1.2, t2 is calculated:
t2 = ρ(A2 · t1 + b2)
step 2.1.4, following the preceding steps, for a predefined integer matrix A3 with d0 rows and d0 columns and a predefined integer bias vector b3 of dimension d0, the feature vector ftr is calculated:
ftr = ρ(A3 · t2 + b3)
step 2.1.5, on top of the feature vector ftr, an affine linear mapping generates the final prediction signal together with a standard bit-depth-dependent clipping operation Clip; for a predefined matrix A4,q with M × N rows and d0 columns and a predefined bias vector b4,q of dimension M × N, the prediction signal pred is:
pred = Clip(A4,q · ftr + b4,q)
where q = predmode denotes the prediction mode.
Step 2.2, the first neural network performs probability calculation of the different intra-frame prediction modes on the reconstructed sample through the luma prediction signal pred;
Step 2.3, the different intra-frame prediction modes and their probabilities from step 2.2 are sorted into a classification-mode probability list.
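The reference-sample layout of step 2.1, with K = 2 as set in this embodiment, can be sketched as follows; the frame coordinates and flattening order are assumptions for illustration, with the K x K corner counted once so that exactly d0 = K x (M + N + K) samples result:

```python
import numpy as np

def extract_reference(frame, y, x, M, N, K=2):
    # K rows of width N + K above the block (this slice includes the corner)
    above = frame[y - K:y, x - K:x + N]
    # K columns of height M to the left of the block (corner already taken)
    left = frame[y:y + M, x - K:x]
    r = np.concatenate([above.ravel(), left.ravel()])
    assert r.size == K * (M + N + K)   # d0 reference samples in total
    return r

frame = np.arange(100, dtype=np.int32).reshape(10, 10)  # toy "key frame"
r = extract_reference(frame, y=4, x=4, M=2, N=2)        # 2x2 block at (4, 4)
```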
Step 3, using predIdx as an index and determining the value of the index predIdx through bin coding; the specific operations include:
step 3.1, setting n prediction modes, where n = 3 + 2^k;
with predIdx used as an index, 0 ≤ predIdx < n;
if max(M, N) = 32, then k = 3; otherwise k = 5;
step 3.2, a first bin determines whether predIdx is less than 3; if predIdx < 3, a second bin determines whether predIdx equals 0, and if predIdx ≠ 0, a third bin determines whether predIdx equals 1 or 2; if predIdx ≥ 3, the value of predIdx is determined using k bins;
step 3.3, for any two values i, j of predIdx with i ≠ j, if the two components (lgt)i and (lgt)j are equal, then
if i < j, (lgt)i is considered less than (lgt)j,
and if i > j, (lgt)i is considered greater than (lgt)j.
Step 4, selecting the prediction mode with the maximum probability from the classification-mode probability list obtained in step 2 according to the value of the index predIdx; the specific operations are as follows:
step 4.1, the first neural network hides a reference sample r;
step 4.2, the first neural network predicts the reconstructed sample through the luma prediction signal pred, obtaining a prediction-result sample r' for the hidden reference sample in the key frame;
step 4.3, the prediction-result sample r' is treated as a vector in the real vector space of dimension K × (M + N + K); for a fixed square matrix A1' with K × (M + N + K) rows and columns and a fixed bias vector b1' in the K × (M + N + K)-dimensional real vector space, where the value of K is set equal to 2, t1' can be calculated:
t1' = ρ(A1' · r' + b1')
where · denotes the ordinary matrix-vector product and the function ρ is an integer approximation of the ELU function ρ0;
step 4.4, for a matrix A2' with n rows and K × (M + N + K) columns and a fixed bias vector b2' in the n-dimensional real vector space, lgt can be calculated:
lgt = A2' · t1' + b2'
where, as noted in step 2.1.5, q = predmode denotes the prediction mode; predmode is thus located at the position of the predIdx-th largest component of lgt;
step 4.5, the i-th entry lgti of the vector lgt is normalized by the softmax function as the logarithm of the conditional probability p(i | r'), giving:
p(i | r') = exp(lgti) / Σj exp(lgtj)
The index predIdx then indicates selection of the predIdx-th most likely prediction mode.
The number of prediction modes n specifically set in this embodiment does not exceed 35;
when max(M, N) < 32, n equals 35; otherwise n equals 11;
when n equals 35, the prediction modes include the DC mode (0), the planar mode (1), and 33 angular modes (2-34);
when n equals 11, the prediction modes include the DC mode, the planar mode, and 9 angular modes (2-10),
where 2 is the initial angle and 10 is the horizontal angle.
Step 5, performing mode prediction through the second neural network: the reconstructed sample is passed through the second neural network, which comprises three nonlinear computation hidden layers and one affine linear computation output layer; the reconstructed sample passes in turn through these three hidden layers and the output layer, and the prediction mode selected in step 4 is then transformed and pruned to obtain the final intra-frame prediction mode.
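The second network of step 5 (three nonlinear hidden layers followed by one affine linear output layer) can be sketched as follows; the layer widths and random weights are placeholders, not the trained parameters of the invention:

```python
import numpy as np

def elu(v):
    # nonlinearity used in the hidden layers (ELU); minimum() avoids overflow
    return np.where(v >= 0, v, np.exp(np.minimum(v, 0)) - 1.0)

def nn_two(r, weights, biases):
    h = r
    for W, b in zip(weights[:3], biases[:3]):
        h = elu(W @ h + b)              # three nonlinear hidden layers
    return weights[3] @ h + biases[3]   # one affine linear output layer

rng = np.random.default_rng(1)
d0, n = 12, 11                          # illustrative reference size, mode count
weights = [rng.normal(size=(d0, d0)) for _ in range(3)] + \
          [rng.normal(size=(n, d0))]
biases = [np.zeros(d0)] * 3 + [np.zeros(n)]
out = nn_two(rng.normal(size=d0), weights, biases)  # one score per mode
```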

Claims (6)

1. A method for determining an intra prediction mode by using a neural network is characterized in that the implementation process of the method comprises the following steps:
step 1, extracting a reconstructed sample from a key frame;
step 2, the reconstructed sample passes through a first neural network, wherein the first neural network comprises a nonlinear computation hidden layer and an affine linear computation output layer; the first neural network extracts a group of features from the key frame and, according to the extracted features, predicts M × N pixel blocks on the key frame, wherein M ≤ 32, N ≤ 32, and M and N are integer powers of 2; then, based on different intra-frame prediction modes of HEVC, a classification-mode probability list is calculated, and the specific operations include:
step 2.1, a group of reference samples r is processed by the first neural network, and a luma prediction signal pred is obtained according to the extracted features; the reference sample r is taken from the key frame and consists of K rows of size N + K above the block and K columns of size M + K to its left;
step 2.2, the first neural network performs probability calculation of different intra-frame prediction modes on the reconstructed sample through the luma prediction signal pred;
step 2.3, the different intra-frame prediction modes and their probabilities from step 2.2 are sorted into a classification-mode probability list;
step 3, using predIdx as an index and determining the value of the index predIdx through bin coding;
step 4, selecting the prediction mode with the maximum probability from the classification-mode probability list obtained in step 2 according to the value of the index predIdx;
and step 5, performing mode prediction through a second neural network: passing the reconstructed sample through the second neural network and, using the prediction mode selected in step 4, obtaining the final intra-frame prediction mode through transformation and pruning.
2. The method of claim 1, wherein the specific operation of obtaining the luma prediction signal pred in step 2.1 comprises:
step 2.1.1, the first neural network extracts a feature vector from the reference sample r, where d0 = K × (M + N + K) is the number of samples of r, and r is treated as a vector in the d0-dimensional real vector space;
step 2.1.2, for fixed integer square matrices A1 and A2 with d0 rows and columns and fixed integer bias vectors b1 and b2 of dimension d0, t1 can be calculated:
t1 = ρ(A1 · r + b1)
where · denotes the ordinary matrix-vector product and the function ρ is an integer approximation of the ELU function ρ0, the ρ0 function being defined componentwise on a p-dimensional vector v by
ρ0(v)i = vi, if vi ≥ 0; ρ0(v)i = exp(vi) − 1, if vi < 0,
where ρ0(v)i and vi denote the i-th components of the respective vectors;
step 2.1.3, following step 2.1.2, t2 is calculated:
t2 = ρ(A2 · t1 + b2)
step 2.1.4, following the preceding steps, for a predefined integer matrix A3 with d0 rows and d0 columns and a predefined integer bias vector b3 of dimension d0, the feature vector ftr is calculated:
ftr = ρ(A3 · t2 + b3)
step 2.1.5, on top of the feature vector ftr, an affine linear mapping generates the final prediction signal together with a standard bit-depth-dependent clipping operation Clip; for a predefined matrix A4,q with M × N rows and d0 columns and a predefined bias vector b4,q of dimension M × N, the prediction signal pred is:
pred = Clip(A4,q · ftr + b4,q)
where q = predmode denotes the prediction mode.
3. The method of claim 2, wherein executing step 3 comprises:
step 3.1, setting n prediction modes, where n = 3 + 2^k;
with predIdx used as an index, 0 ≤ predIdx < n;
if max(M, N) = 32, then k = 3; otherwise k = 5;
step 3.2, a first bin determines whether predIdx is less than 3; if predIdx < 3, a second bin determines whether predIdx equals 0, and if predIdx ≠ 0, a third bin determines whether predIdx equals 1 or 2; if predIdx ≥ 3, the value of predIdx is determined using k bins;
step 3.3, for any two values i, j of predIdx with i ≠ j, if the two components (lgt)i and (lgt)j are equal, then
if i < j, (lgt)i is considered less than (lgt)j,
and if i > j, (lgt)i is considered greater than (lgt)j.
4. The method of claim 3, wherein the specific operation of selecting the maximum-probability prediction mode in step 4 is as follows:
step 4.1, the first neural network hides a reference sample r;
step 4.2, the first neural network predicts the reconstructed sample through the luma prediction signal pred, obtaining a prediction-result sample r' for the hidden reference sample in the key frame;
step 4.3, the prediction-result sample r' is treated as a vector in the real vector space of dimension K × (M + N + K); for a fixed square matrix A1' with K × (M + N + K) rows and columns and a fixed bias vector b1' in the K × (M + N + K)-dimensional real vector space, t1' can be calculated:
t1' = ρ(A1' · r' + b1')
where · denotes the ordinary matrix-vector product and the function ρ is an integer approximation of the ELU function ρ0;
step 4.4, for a matrix A2' with n rows and K × (M + N + K) columns and a fixed bias vector b2' in the n-dimensional real vector space, lgt can be calculated:
lgt = A2' · t1' + b2'
where, as noted in step 2.1.5, q = predmode denotes the prediction mode; predmode is thus located at the position of the predIdx-th largest component of lgt;
step 4.5, the i-th entry lgti of the vector lgt is normalized by the softmax function as the logarithm of the conditional probability p(i | r'), giving:
p(i | r') = exp(lgti) / Σj exp(lgtj)
the index predIdx then indicates selection of the predIdx-th most likely prediction mode.
5. The method of claim 4, wherein the number of prediction modes n does not exceed 35;
when max(M, N) < 32, n equals 35; otherwise n equals 11;
when n equals 35, the prediction modes include the DC mode (0), the planar mode (1), and 33 angular modes (2-34);
when n equals 11, the prediction modes include the DC mode, the planar mode, and 9 angular modes (2-10),
where 2 is the initial angle and 10 is the horizontal angle.
6. The method of claim 1, wherein the second neural network comprises three nonlinear computation hidden layers and one affine linear computation output layer;
the reconstructed sample passes in turn through the three nonlinear computation hidden layers and the affine linear computation output layer of the second neural network, and the prediction mode selected in step 4 is then transformed and pruned to obtain the final intra-frame prediction mode.
CN202010247869.1A 2020-04-01 2020-04-01 Method for determining intra-frame prediction mode by using neural network Active CN111432208B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010247869.1A CN111432208B (en) 2020-04-01 2020-04-01 Method for determining intra-frame prediction mode by using neural network


Publications (2)

Publication Number Publication Date
CN111432208A (en) 2020-07-17
CN111432208B (en) 2022-10-04

Family

ID=71550431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010247869.1A Active CN111432208B (en) 2020-04-01 2020-04-01 Method for determining intra-frame prediction mode by using neural network

Country Status (1)

Country Link
CN (1) CN111432208B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116686288A (en) * 2021-01-22 2023-09-01 Guangdong OPPO Mobile Telecommunications Co., Ltd. Encoding method, decoding method, encoder, decoder, and electronic device

Citations (3)

Publication number Priority date Publication date Assignee Title
CN107396124A (en) * 2017-08-29 2017-11-24 南京大学 Video-frequency compression method based on deep neural network
CN108924558A (en) * 2018-06-22 2018-11-30 电子科技大学 A kind of predictive encoding of video method neural network based
WO2019194425A1 (en) * 2018-04-06 2019-10-10 에스케이텔레콤 주식회사 Apparatus and method for applying artificial neural network to image encoding or decoding

Non-Patent Citations (2)

Title
Ning Jin; Derong Liu. Wavelet Basis Function Neural Networks for Sequential Learning. IEEE Transactions on Neural Networks, 2008. *
Ai Da; Lu Xuelei; Gao Yang; Dong Jiujun. Research Progress on Machine-Learning-Based Fast Intra Prediction Algorithms for HEVC. Modern Electronics Technique (《现代电子技术》), 2018. *

Similar Documents

Publication Publication Date Title
CN110087087B (en) VVC inter-frame coding unit prediction mode early decision and block division early termination method
CN108174204B (en) Decision tree-based inter-frame rapid mode selection method
CN108924558B (en) Video predictive coding method based on neural network
Islam et al. Image compression with recurrent neural network and generalized divisive normalization
CN114286093A (en) Rapid video coding method based on deep neural network
Kamble et al. Modified three-step search block matching motion estimation and weighted finite automata based fractal video compression
Liu et al. Improving multiple machine vision tasks in the compressed domain
CN112333451A (en) Intra-frame prediction method based on generation countermeasure network
CN112291562A (en) Fast CU partition and intra mode decision method for H.266/VVC
CN111432208B (en) Method for determining intra-frame prediction mode by using neural network
US11212518B2 (en) Method for accelerating coding and decoding of an HEVC video sequence
He et al. Exposing fake bitrate videos using hybrid deep-learning network from recompression error
Liu et al. Semantic segmentation in learned compressed domain
Wu et al. Memorize, then recall: a generative framework for low bit-rate surveillance video compression
Dai et al. HEVC video steganalysis based on PU maps and multi-scale convolutional residual network
Prativadibhayankaram et al. Color learning for image compression
Gao et al. Two-step fast mode decision for intra coding of screen content
Katayama et al. Reference frame generation algorithm using dynamical learning PredNet for VVC
CN112070851B (en) Index map prediction method based on genetic algorithm and BP neural network
CN114979711B (en) Layered compression method and device for audio and video or image
CN116012272A (en) Compressed video quality enhancement method based on reconstructed flow field
Antonio et al. Learning-based compression of visual objects for smart surveillance
CN111866511B (en) Video damage repairing method based on convolution long-short term memory neural network
Mei et al. Lightweight High-Performance Blind Image Quality Assessment
CN115512199A (en) Image compression model based on graph attention and asymmetric convolution network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220914

Address after: 250100 building S02, No. 1036, Langchao Road, high tech Zone, Jinan City, Shandong Province

Applicant after: Shandong Inspur Scientific Research Institute Co.,Ltd.

Address before: 250100 First Floor of R&D Building 2877 Kehang Road, Sun Village Town, Jinan High-tech Zone, Shandong Province

Applicant before: JINAN INSPUR HIGH-TECH TECHNOLOGY DEVELOPMENT Co.,Ltd.

GR01 Patent grant
GR01 Patent grant