CN111432208B - Method for determining intra-frame prediction mode by using neural network - Google Patents


Info

Publication number
CN111432208B
CN111432208B (application CN202010247869.1A)
Authority
CN
China
Prior art keywords
neural network
vector
mode
prediction
prediction mode
Prior art date
Legal status
Active
Application number
CN202010247869.1A
Other languages
Chinese (zh)
Other versions
CN111432208A (en)
Inventor
吴振东
李锐
金长新
Current Assignee
Shandong Inspur Scientific Research Institute Co Ltd
Original Assignee
Shandong Inspur Scientific Research Institute Co Ltd
Priority date
Filing date
Publication date
Application filed by Shandong Inspur Scientific Research Institute Co Ltd filed Critical Shandong Inspur Scientific Research Institute Co Ltd
Priority to CN202010247869.1A priority Critical patent/CN111432208B/en
Publication of CN111432208A publication Critical patent/CN111432208A/en
Application granted granted Critical
Publication of CN111432208B publication Critical patent/CN111432208B/en


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/11Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a method for determining an intra-frame prediction mode by using a neural network, and relates to the HEVC intra-frame coding mode. The implementation process of the adopted scheme comprises the following steps: step 1, extracting a reconstructed sample from a key frame; step 2, computing on the reconstructed sample with a first neural network, based on the different intra-frame prediction modes of HEVC, to obtain a classification-mode probability list; step 3, using predIdx as an index and determining the value of the index predIdx through bin coding; step 4, selecting the prediction mode with the maximum probability from the classification-mode probability list obtained in step 2 according to the value of the index predIdx; and step 5, performing mode prediction through a second neural network: passing the reconstructed sample through the second neural network and, using the prediction mode selected in step 4, obtaining the final intra-frame prediction mode through transformation and pruning. The method uses a neural network to obtain the optimal intra-frame prediction mode of a key frame and performs video coding with this optimal mode, improving video compression efficiency while ensuring video quality.

Description

Method for determining intra-frame prediction mode by using neural network
Technical Field
The invention relates to an HEVC intra-frame coding mode, in particular to a method for determining an intra-frame prediction mode by utilizing a neural network.
Background
Video coding converts a file from an original video format into another video format by means of compression technology.
Traditional video compression algorithms such as MPEG-4, H.264, and H.265 mostly follow a predictive coding structure, are hard-coded, and cannot adapt to growing demands and increasingly diverse video use cases. Machine-learning-based methods have brought revolutionary development to the field of video compression.
Artificial neural networks have been a research hotspot in artificial intelligence since the 1980s. They abstract the neuron networks of the human brain from an information-processing perspective, build simple models, and form different networks according to different connection patterns. In recent decades, research on artificial neural networks has deepened and made great progress. At present, new breakthroughs have been made in video coding and decoding: neural networks break away from the traditional video compression approach and provide significant optimization of, or substitutes for, several key techniques.
Intra-frame prediction exploits spatial correlation within a frame, using adjacent already-coded pixels of the same picture to predict the current pixels and thereby effectively remove spatial redundancy. Today's hybrid video coding systems typically perform intra prediction, whereby a block of prediction samples is derived from previously decoded samples of the same picture.
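As a minimal illustration of this idea (a hypothetical sketch, not the network-based method of the invention), a DC-style predictor fills a block with the mean of the already-decoded samples above and to the left; the helper name and sample values below are made up:

```python
import numpy as np

def dc_intra_predict(above, left):
    # Fill an MxN block with the mean of the previously decoded neighboring
    # samples, as in DC intra prediction (illustrative helper only).
    neighbors = np.concatenate([np.ravel(above), np.ravel(left)])
    dc = int(round(neighbors.mean()))
    return np.full((len(left), len(above)), dc, dtype=np.int32)

above = np.array([100, 102, 98, 100])  # decoded row above the current block
left = np.array([99, 101, 100, 100])   # decoded column left of the block
block = dc_intra_predict(above, left)  # 4x4 block filled with the neighbor mean
```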
Disclosure of Invention
The invention provides a method for determining an intra-frame prediction mode by using a neural network, in view of the needs and shortcomings of current technical development.
The invention discloses a method for determining an intra-frame prediction mode by using a neural network, which adopts the following technical scheme for solving the technical problems:
A method for determining an intra-frame prediction mode by using a neural network is realized through the following steps:
step 1, extracting a reconstructed sample from a key frame;
step 2, computing on the reconstructed sample with a first neural network, based on the different intra-frame prediction modes of HEVC, to obtain a classification-mode probability list;
step 3, using predIdx as an index and determining the value of the index predIdx through bin coding;
step 4, selecting the prediction mode with the maximum probability from the classification-mode probability list obtained in step 2 according to the value of the index predIdx;
and step 5, performing mode prediction through a second neural network: passing the reconstructed sample through the second neural network and, using the prediction mode selected in step 4, obtaining the final intra-frame prediction mode through transformation and pruning.
Optionally, when step 2 is performed, the reconstructed sample is passed through the first neural network; in this case,
the first neural network extracts a group of features from the key frame;
and, according to the extracted features, the first neural network predicts M × N pixel blocks on the key frame, where M ≤ 32, N ≤ 32, and M and N are integer powers of 2.
Preferably, the first neural network comprises a non-linear computation hidden layer and an affine linear computation output layer.
Optionally, the specific operation of obtaining the classification-mode probability list in step 2 includes:
step 2.1, a group of reference samples r is processed by the first neural network, and a luma prediction signal pred is obtained according to the extracted features; the reference sample r is taken from the key frame and consists of K rows of size N + K above the block and K columns of size M + K to its left;
step 2.2, the first neural network performs probability calculation of the different intra-frame prediction modes on the reconstructed sample through the luma prediction signal pred;
step 2.3, the different intra-frame prediction modes and their probabilities from step 2.2 are sorted into a classification-mode probability list.
Further optionally, the specific operation of obtaining the luma prediction signal pred in step 2.1 is:
step 2.1.1, the first neural network extracts a feature vector from the reference sample r, where d0 = K × (M + N + K) is the number of samples of r, and r is treated as a vector in the d0-dimensional real vector space;
step 2.1.2, for fixed integer square matrices A1 and A2 with d0 rows and columns and fixed integer bias vectors b1 and b2 of dimension d0, t1 can be calculated:
t1 = ρ(A1 · r + b1)
where · denotes the ordinary matrix-vector product and the function ρ is an integer approximation of the ELU function ρ0, the ρ0 function being defined componentwise on a p-dimensional vector v by
ρ0(v)i = vi, if vi ≥ 0; ρ0(v)i = exp(vi) − 1, if vi < 0,
where ρ0(v)i and vi denote the i-th components of the respective vectors;
step 2.1.3, following step 2.1.2, t2 is calculated:
t2 = ρ(A2 · t1 + b2)
step 2.1.4, following the preceding steps, for a predefined integer matrix A3 with d0 rows and d0 columns and a predefined integer bias vector b3 of dimension d0, the feature vector ftr is calculated:
ftr = ρ(A3 · t2 + b3)
step 2.1.5, on top of the feature vector ftr, an affine linear mapping generates the final prediction signal together with a standard bit-depth-dependent clipping operation Clip; for a predefined matrix A4,q with M × N rows and d0 columns and a predefined bias vector b4,q of dimension M × N, the prediction signal pred is:
pred = Clip(A4,q · ftr + b4,q)
where q = predmode denotes the prediction mode.
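The chain of computations in steps 2.1.2 to 2.1.5 can be sketched as below; the weight matrices and reference vector are random placeholders standing in for the predefined integer matrices and real reference samples, and the block sizes are illustrative:

```python
import numpy as np

def elu(v):
    # rho0: identity for non-negative components, exp(v) - 1 otherwise (ELU);
    # the minimum() guards against overflow in the unused positive branch
    return np.where(v >= 0, v, np.exp(np.minimum(v, 0)) - 1.0)

def nn_one_predict(r, A1, b1, A2, b2, A3, b3, A4q, b4q, bit_depth=8):
    t1 = elu(A1 @ r + b1)        # step 2.1.2
    t2 = elu(A2 @ t1 + b2)       # step 2.1.3
    ftr = elu(A3 @ t2 + b3)      # step 2.1.4: feature vector
    pred = A4q @ ftr + b4q       # step 2.1.5: affine linear mapping for mode q
    return np.clip(pred, 0, 2 ** bit_depth - 1)  # bit-depth-dependent Clip

M, N, K = 4, 4, 2                # illustrative block and reference-line sizes
d0 = K * (M + N + K)             # number of reference samples
rng = np.random.default_rng(0)
r = rng.integers(0, 256, d0).astype(float)   # stand-in reference vector
A1, A2, A3 = [rng.integers(-2, 3, (d0, d0)).astype(float) for _ in range(3)]
b1 = b2 = b3 = np.zeros(d0)
A4q = rng.integers(-2, 3, (M * N, d0)).astype(float)
b4q = np.zeros(M * N)
pred = nn_one_predict(r, A1, b1, A2, b2, A3, b3, A4q, b4q)
```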
Optionally, the specific operation of executing step 3 includes:
step 3.1, setting n prediction modes, where n = 3 + 2^k;
with predIdx used as an index, 0 ≤ predIdx < n;
if max(M, N) = 32, then k = 3; otherwise k = 5;
step 3.2, a first bin determines whether predIdx is less than 3; if predIdx < 3, a second bin determines whether predIdx equals 0, and if predIdx ≠ 0, a third bin determines whether predIdx equals 1 or 2; if predIdx ≥ 3, the value of predIdx is determined using k bins;
step 3.3, for any two values i, j of predIdx with i ≠ j, if the two components (lgt)i and (lgt)j are equal, then
if i < j, (lgt)i is considered less than (lgt)j,
and if i > j, (lgt)i is considered greater than (lgt)j.
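The binarization of step 3.2 can be sketched as follows; the concrete bin values and their ordering are assumptions, only the structure (one bin deciding predIdx < 3, a short prefix distinguishing 0, 1, 2, and k fixed-length bins otherwise) follows the text:

```python
def encode_predidx(predidx, k):
    # Binarize predIdx as in step 3.2 (bin values are illustrative).
    if predidx < 3:
        bins = [1]                    # first bin: predIdx < 3
        if predidx == 0:
            bins.append(1)            # second bin: predIdx == 0
        else:
            bins += [0, predidx - 1]  # third bin distinguishes 1 from 2
        return bins
    # k bins carry the value predIdx - 3 in fixed-length binary
    value = predidx - 3
    return [0] + [(value >> i) & 1 for i in reversed(range(k))]
```

With k = 3 this covers all n = 3 + 2^3 = 11 modes with distinct codes.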
Optionally, the specific operation of selecting the maximum-probability prediction mode in step 4 is as follows:
step 4.1, the first neural network hides a reference sample r;
step 4.2, the first neural network predicts the reconstructed sample through the luma prediction signal pred, obtaining a prediction-result sample r' for the hidden reference sample in the key frame;
step 4.3, the prediction-result sample r' is treated as a vector in the real vector space of dimension K × (M + N + K); for a fixed square matrix A1' with K × (M + N + K) rows and columns and a fixed bias vector b1' in the K × (M + N + K)-dimensional real vector space, t1' can be calculated:
t1' = ρ(A1' · r' + b1')
where · denotes the ordinary matrix-vector product and the function ρ is an integer approximation of the ELU function ρ0;
step 4.4, for a matrix A2' with n rows and K × (M + N + K) columns and a fixed bias vector b2' in the n-dimensional real vector space, lgt can be calculated:
lgt = A2' · t1' + b2'
where, as noted in step 2.1.5, q = predmode denotes the prediction mode; predmode is thus located at the position of the predIdx-th largest component of lgt;
step 4.5, the i-th entry lgti of the vector lgt is normalized by the softmax function as the logarithm of the conditional probability p(i | r'), giving:
p(i | r') = exp(lgti) / Σj exp(lgtj)
The index predIdx then indicates selection of the predIdx-th most likely prediction mode.
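Steps 4.4 and 4.5 can be sketched as follows, applying the tie-breaking convention of step 3.3 (among equal components, the one with the larger index ranks higher) when locating the predIdx-th largest logit; the logit values are made up:

```python
import numpy as np

def softmax(lgt):
    # step 4.5: turn the logits lgt into conditional probabilities p(i | r')
    e = np.exp(lgt - lgt.max())   # subtract the max for numerical stability
    return e / e.sum()

def select_mode(lgt, predidx):
    # predmode is the position of the predIdx-th largest component of lgt;
    # per step 3.3, among equal components (lgt)i with i < j ranks below (lgt)j
    order = sorted(range(len(lgt)), key=lambda i: (-lgt[i], -i))
    return order[predidx]

lgt = np.array([1.2, 3.4, 0.5, 3.4, 2.0])   # illustrative logits
p = softmax(lgt)                            # probabilities over the n modes
```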
As a further option, the number of prediction modes n involved does not exceed 35;
when max(M, N) < 32, n equals 35; otherwise n equals 11;
when n equals 35, the prediction modes include the DC mode (0), the planar mode (1), and 33 angular modes (2-34);
when n equals 11, the prediction modes include the DC mode, the planar mode, and 9 angular modes (2-10),
where 2 is the initial angle and 10 is the horizontal angle.
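The stated relationship between the block size, the mode count n, and the bin count k can be written as a small helper (a sketch of the rule above):

```python
def mode_config(M, N):
    # n = 35 when max(M, N) < 32, otherwise n = 11; and n = 3 + 2**k,
    # so k = log2(n - 3)
    n = 35 if max(M, N) < 32 else 11
    k = (n - 3).bit_length() - 1
    return n, k
```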
Optionally, the second neural network involved comprises three nonlinear computation hidden layers and one affine linear computation output layer;
the reconstructed sample passes in turn through the three nonlinear computation hidden layers and the affine linear computation output layer of the second neural network, and the prediction mode selected in step 4 is then transformed and pruned to obtain the final intra-frame prediction mode.
Compared with the prior art, the method for determining an intra-frame prediction mode by using a neural network has the following beneficial effects:
the optimal intra-frame prediction mode of a key frame can be obtained by using the neural network, and video coding is performed with this optimal mode, improving video compression efficiency while ensuring video quality.
Drawings
FIG. 1 is a flow chart of the method of the first embodiment of the present invention;
FIG. 2 is a diagram of predicting M × N intra blocks from reconstructed samples using the first neural network;
FIG. 3 is a diagram of predicting the different modes and their probabilities p from reconstructed samples using the first neural network.
Detailed Description
To make the technical solutions, the technical problems to be solved, and the technical effects of the present invention clearer, the technical solutions of the present invention are described below in combination with specific embodiments.
The first embodiment is as follows:
with reference to fig. 1, this embodiment provides a method for determining an intra prediction mode by using a neural network, where the method is implemented by:
Step 1, extracting a reconstructed sample from a key frame.
Step 2, passing the reconstructed sample through the first neural network, which comprises a nonlinear computation hidden layer and an affine linear computation output layer, and computing a classification-mode probability list based on the different intra-frame prediction modes of HEVC.
Referring to fig. 2 and 3, when this step is performed, after the reconstructed sample passes through the first neural network,
the first neural network extracts a group of features from the key frame;
and, according to the extracted features, the first neural network predicts M × N pixel blocks on the key frame, where M ≤ 32, N ≤ 32, and M and N are integer powers of 2.
Referring to fig. 3, the specific operation of obtaining the probability list of the classification mode includes:
Step 2.1, a group of reference samples r is processed by the first neural network, and steps 2.1.1 to 2.1.5 are executed to obtain the luma prediction signal pred; the reference sample r is taken from the key frame and consists of K rows of size N + K above the block and K columns of size M + K to its left, where the value of K is set equal to 2.
The specific operation of obtaining the luma prediction signal pred in step 2.1 is as follows:
step 2.1.1, the first neural network extracts a feature vector from the reference sample r, where d0 = K × (M + N + K) is the number of samples of r, and r is treated as a vector in the d0-dimensional real vector space;
step 2.1.2, for fixed integer square matrices A1 and A2 with d0 rows and columns and fixed integer bias vectors b1 and b2 of dimension d0, t1 can be calculated:
t1 = ρ(A1 · r + b1)
where · denotes the ordinary matrix-vector product and the function ρ is an integer approximation of the ELU function ρ0, the ρ0 function being defined componentwise on a p-dimensional vector v by
ρ0(v)i = vi, if vi ≥ 0; ρ0(v)i = exp(vi) − 1, if vi < 0,
where ρ0(v)i and vi denote the i-th components of the respective vectors;
step 2.1.3, following step 2.1.2, t2 is calculated:
t2 = ρ(A2 · t1 + b2)
step 2.1.4, following the preceding steps, for a predefined integer matrix A3 with d0 rows and d0 columns and a predefined integer bias vector b3 of dimension d0, the feature vector ftr is calculated:
ftr = ρ(A3 · t2 + b3)
step 2.1.5, on top of the feature vector ftr, an affine linear mapping generates the final prediction signal together with a standard bit-depth-dependent clipping operation Clip; for a predefined matrix A4,q with M × N rows and d0 columns and a predefined bias vector b4,q of dimension M × N, the prediction signal pred is:
pred = Clip(A4,q · ftr + b4,q)
where q = predmode denotes the prediction mode.
Step 2.2, the first neural network performs probability calculation of the different intra-frame prediction modes on the reconstructed sample through the luma prediction signal pred;
Step 2.3, the different intra-frame prediction modes and their probabilities from step 2.2 are sorted into a classification-mode probability list.
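The reference-sample layout of step 2.1, with K = 2 as set in this embodiment, can be sketched as follows; the frame coordinates and flattening order are assumptions for illustration, with the K x K corner counted once so that exactly d0 = K x (M + N + K) samples result:

```python
import numpy as np

def extract_reference(frame, y, x, M, N, K=2):
    # K rows of width N + K above the block (this slice includes the corner)
    above = frame[y - K:y, x - K:x + N]
    # K columns of height M to the left of the block (corner already taken)
    left = frame[y:y + M, x - K:x]
    r = np.concatenate([above.ravel(), left.ravel()])
    assert r.size == K * (M + N + K)   # d0 reference samples in total
    return r

frame = np.arange(100, dtype=np.int32).reshape(10, 10)  # toy "key frame"
r = extract_reference(frame, y=4, x=4, M=2, N=2)        # 2x2 block at (4, 4)
```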
Step 3, using predIdx as an index and determining the value of the index predIdx through bin coding; the specific operations include:
step 3.1, setting n prediction modes, where n = 3 + 2^k;
with predIdx used as an index, 0 ≤ predIdx < n;
if max(M, N) = 32, then k = 3; otherwise k = 5;
step 3.2, a first bin determines whether predIdx is less than 3; if predIdx < 3, a second bin determines whether predIdx equals 0, and if predIdx ≠ 0, a third bin determines whether predIdx equals 1 or 2; if predIdx ≥ 3, the value of predIdx is determined using k bins;
step 3.3, for any two values i, j of predIdx with i ≠ j, if the two components (lgt)i and (lgt)j are equal, then
if i < j, (lgt)i is considered less than (lgt)j,
and if i > j, (lgt)i is considered greater than (lgt)j.
Step 4, selecting the prediction mode with the maximum probability from the classification-mode probability list obtained in step 2 according to the value of the index predIdx; the specific operations are as follows:
step 4.1, the first neural network hides a reference sample r;
step 4.2, the first neural network predicts the reconstructed sample through the luma prediction signal pred, obtaining a prediction-result sample r' for the hidden reference sample in the key frame;
step 4.3, the prediction-result sample r' is treated as a vector in the real vector space of dimension K × (M + N + K); for a fixed square matrix A1' with K × (M + N + K) rows and columns and a fixed bias vector b1' in the K × (M + N + K)-dimensional real vector space, where the value of K is set equal to 2, t1' can be calculated:
t1' = ρ(A1' · r' + b1')
where · denotes the ordinary matrix-vector product and the function ρ is an integer approximation of the ELU function ρ0;
step 4.4, for a matrix A2' with n rows and K × (M + N + K) columns and a fixed bias vector b2' in the n-dimensional real vector space, lgt can be calculated:
lgt = A2' · t1' + b2'
where, as noted in step 2.1.5, q = predmode denotes the prediction mode; predmode is thus located at the position of the predIdx-th largest component of lgt;
step 4.5, the i-th entry lgti of the vector lgt is normalized by the softmax function as the logarithm of the conditional probability p(i | r'), giving:
p(i | r') = exp(lgti) / Σj exp(lgtj)
The index predIdx then indicates selection of the predIdx-th most likely prediction mode.
The number of prediction modes n specifically set in this embodiment does not exceed 35;
when max(M, N) < 32, n equals 35; otherwise n equals 11;
when n equals 35, the prediction modes include the DC mode (0), the planar mode (1), and 33 angular modes (2-34);
when n equals 11, the prediction modes include the DC mode, the planar mode, and 9 angular modes (2-10),
where 2 is the initial angle and 10 is the horizontal angle.
Step 5, performing mode prediction through the second neural network: the reconstructed sample is passed through the second neural network, which comprises three nonlinear computation hidden layers and one affine linear computation output layer; the reconstructed sample passes in turn through these three hidden layers and the output layer, and the prediction mode selected in step 4 is then transformed and pruned to obtain the final intra-frame prediction mode.
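The second network of step 5 (three nonlinear hidden layers followed by one affine linear output layer) can be sketched as follows; the layer widths and random weights are placeholders, not the trained parameters of the invention:

```python
import numpy as np

def elu(v):
    # nonlinearity used in the hidden layers (ELU); minimum() avoids overflow
    return np.where(v >= 0, v, np.exp(np.minimum(v, 0)) - 1.0)

def nn_two(r, weights, biases):
    h = r
    for W, b in zip(weights[:3], biases[:3]):
        h = elu(W @ h + b)              # three nonlinear hidden layers
    return weights[3] @ h + biases[3]   # one affine linear output layer

rng = np.random.default_rng(1)
d0, n = 12, 11                          # illustrative reference size, mode count
weights = [rng.normal(size=(d0, d0)) for _ in range(3)] + \
          [rng.normal(size=(n, d0))]
biases = [np.zeros(d0)] * 3 + [np.zeros(n)]
out = nn_two(rng.normal(size=d0), weights, biases)  # one score per mode
```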

Claims (6)

1. A method for determining an intra prediction mode by using a neural network is characterized in that the implementation process of the method comprises the following steps:
step 1, extracting a reconstructed sample from a key frame;
step 2, the reconstructed sample passes through a first neural network, wherein the first neural network comprises a nonlinear computation hidden layer and an affine linear computation output layer; the first neural network extracts a group of features from the key frame and, according to the extracted features, predicts M × N pixel blocks on the key frame, wherein M ≤ 32, N ≤ 32, and M and N are integer powers of 2; then, based on different intra-frame prediction modes of HEVC, a classification-mode probability list is calculated, and the specific operations include:
step 2.1, a group of reference samples r is processed by the first neural network, and a luma prediction signal pred is obtained according to the extracted features; the reference sample r is taken from the key frame and consists of K rows of size N + K above the block and K columns of size M + K to its left;
step 2.2, the first neural network performs probability calculation of different intra-frame prediction modes on the reconstructed sample through the luma prediction signal pred;
step 2.3, the different intra-frame prediction modes and their probabilities from step 2.2 are sorted into a classification-mode probability list;
step 3, using predIdx as an index and determining the value of the index predIdx through bin coding;
step 4, selecting the prediction mode with the maximum probability from the classification-mode probability list obtained in step 2 according to the value of the index predIdx;
and step 5, performing mode prediction through a second neural network: passing the reconstructed sample through the second neural network and, using the prediction mode selected in step 4, obtaining the final intra-frame prediction mode through transformation and pruning.
2. The method of claim 1, wherein the specific operation of obtaining the luma prediction signal pred in step 2.1 comprises:
step 2.1.1, the first neural network extracts a feature vector from the reference sample r, where d0 = K × (M + N + K) is the number of samples of r, and r is treated as a vector in the d0-dimensional real vector space;
step 2.1.2, for fixed integer square matrices A1 and A2 with d0 rows and columns and fixed integer bias vectors b1 and b2 of dimension d0, t1 can be calculated:
t1 = ρ(A1 · r + b1)
where · denotes the ordinary matrix-vector product and the function ρ is an integer approximation of the ELU function ρ0, the ρ0 function being defined componentwise on a p-dimensional vector v by
ρ0(v)i = vi, if vi ≥ 0; ρ0(v)i = exp(vi) − 1, if vi < 0,
where ρ0(v)i and vi denote the i-th components of the respective vectors;
step 2.1.3, following step 2.1.2, t2 is calculated:
t2 = ρ(A2 · t1 + b2)
step 2.1.4, following the preceding steps, for a predefined integer matrix A3 with d0 rows and d0 columns and a predefined integer bias vector b3 of dimension d0, the feature vector ftr is calculated:
ftr = ρ(A3 · t2 + b3)
step 2.1.5, on top of the feature vector ftr, an affine linear mapping generates the final prediction signal together with a standard bit-depth-dependent clipping operation Clip; for a predefined matrix A4,q with M × N rows and d0 columns and a predefined bias vector b4,q of dimension M × N, the prediction signal pred is:
pred = Clip(A4,q · ftr + b4,q)
where q = predmode denotes the prediction mode.
3. The method of claim 2, wherein executing step 3 comprises:
step 3.1, setting n prediction modes, where n = 3 + 2^k;
with predIdx used as an index, 0 ≤ predIdx < n;
if max(M, N) = 32, then k = 3; otherwise k = 5;
step 3.2, a first bin determines whether predIdx is less than 3; if predIdx < 3, a second bin determines whether predIdx equals 0, and if predIdx ≠ 0, a third bin determines whether predIdx equals 1 or 2; if predIdx ≥ 3, the value of predIdx is determined using k bins;
step 3.3, for any two values i, j of predIdx with i ≠ j, if the two components (lgt)i and (lgt)j are equal, then
if i < j, (lgt)i is considered less than (lgt)j,
and if i > j, (lgt)i is considered greater than (lgt)j.
4. The method of claim 3, wherein the specific operation of selecting the maximum-probability prediction mode in step 4 is as follows:
step 4.1, the first neural network hides a reference sample r;
step 4.2, the first neural network predicts the reconstructed sample through the luma prediction signal pred, obtaining a prediction-result sample r' for the hidden reference sample in the key frame;
step 4.3, the prediction-result sample r' is treated as a vector in the real vector space of dimension K × (M + N + K); for a fixed square matrix A1' with K × (M + N + K) rows and columns and a fixed bias vector b1' in the K × (M + N + K)-dimensional real vector space, t1' can be calculated:
t1' = ρ(A1' · r' + b1')
where · denotes the ordinary matrix-vector product and the function ρ is an integer approximation of the ELU function ρ0;
step 4.4, for a matrix A2' with n rows and K × (M + N + K) columns and a fixed bias vector b2' in the n-dimensional real vector space, lgt can be calculated:
lgt = A2' · t1' + b2'
where, as noted in step 2.1.5, q = predmode denotes the prediction mode; predmode is thus located at the position of the predIdx-th largest component of lgt;
step 4.5, the i-th entry lgti of the vector lgt is normalized by the softmax function as the logarithm of the conditional probability p(i | r'), giving:
p(i | r') = exp(lgti) / Σj exp(lgtj)
the index predIdx then indicates selection of the predIdx-th most likely prediction mode.
5. The method of claim 4, wherein the number of prediction modes n does not exceed 35;
when max(M, N) < 32, n equals 35; otherwise n equals 11;
when n equals 35, the prediction modes include the DC mode (0), the planar mode (1), and 33 angular modes (2-34);
when n equals 11, the prediction modes include the DC mode, the planar mode, and 9 angular modes (2-10),
where 2 is the initial angle and 10 is the horizontal angle.
6. The method of claim 1, wherein the second neural network comprises three nonlinear computation hidden layers and one affine linear computation output layer;
the reconstructed sample passes in turn through the three nonlinear computation hidden layers and the affine linear computation output layer of the second neural network, and the prediction mode selected in step 4 is then transformed and pruned to obtain the final intra-frame prediction mode.
CN202010247869.1A 2020-04-01 2020-04-01 Method for determining intra-frame prediction mode by using neural network Active CN111432208B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010247869.1A CN111432208B (en) 2020-04-01 2020-04-01 Method for determining intra-frame prediction mode by using neural network


Publications (2)

Publication Number Publication Date
CN111432208A (en) 2020-07-17
CN111432208B (en) 2022-10-04

Family

ID=71550431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010247869.1A Active CN111432208B (en) 2020-04-01 2020-04-01 Method for determining intra-frame prediction mode by using neural network

Country Status (1)

Country Link
CN (1) CN111432208B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116686288A (en) * 2021-01-22 2023-09-01 Guangdong OPPO Mobile Telecommunications Co., Ltd. Encoding method, decoding method, encoder, decoder, and electronic device

Citations (3)

Publication number Priority date Publication date Assignee Title
CN107396124A (en) * 2017-08-29 2017-11-24 南京大学 Video-frequency compression method based on deep neural network
CN108924558A (en) * 2018-06-22 2018-11-30 电子科技大学 A kind of predictive encoding of video method neural network based
WO2019194425A1 (en) * 2018-04-06 2019-10-10 에스케이텔레콤 주식회사 Apparatus and method for applying artificial neural network to image encoding or decoding

Non-Patent Citations (2)

Title
Ning Jin; Derong Liu. Wavelet Basis Function Neural Networks for Sequential Learning. IEEE Transactions on Neural Networks, 2008. *
Ai Da; Lu Xuelei; Gao Yang; Dong Jiujun. Research Progress on Machine-Learning-Based Fast Intra Prediction Algorithms for HEVC. Modern Electronics Technique (《现代电子技术》), 2018. *

Similar Documents

Publication Publication Date Title
CN110087087B (en) VVC inter-frame coding unit prediction mode early decision and block division early termination method
CN108174204B (en) Decision tree-based inter-frame rapid mode selection method
CN108924558B (en) Video predictive coding method based on neural network
Islam et al. Image compression with recurrent neural network and generalized divisive normalization
CN114286093A (en) Rapid video coding method based on deep neural network
Kamble et al. Modified three-step search block matching motion estimation and weighted finite automata based fractal video compression
Liu et al. Improving multiple machine vision tasks in the compressed domain
CN112333451A (en) Intra-frame prediction method based on generation countermeasure network
CN112291562A (en) Fast CU partition and intra mode decision method for H.266/VVC
CN111432208B (en) Method for determining intra-frame prediction mode by using neural network
US11212518B2 (en) Method for accelerating coding and decoding of an HEVC video sequence
He et al. Exposing fake bitrate videos using hybrid deep-learning network from recompression error
Liu et al. Semantic segmentation in learned compressed domain
Wu et al. Memorize, then recall: a generative framework for low bit-rate surveillance video compression
Dai et al. HEVC video steganalysis based on PU maps and multi-scale convolutional residual network
Prativadibhayankaram et al. Color learning for image compression
Gao et al. Two-step fast mode decision for intra coding of screen content
Katayama et al. Reference frame generation algorithm using dynamical learning PredNet for VVC
CN112070851B (en) Index map prediction method based on genetic algorithm and BP neural network
CN114979711B (en) Layered compression method and device for audio and video or image
CN116012272A (en) Compressed video quality enhancement method based on reconstructed flow field
Antonio et al. Learning-based compression of visual objects for smart surveillance
CN111866511B (en) Video damage repairing method based on convolution long-short term memory neural network
Mei et al. Lightweight High-Performance Blind Image Quality Assessment
CN115512199A (en) Image compression model based on graph attention and asymmetric convolution network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220914

Address after: 250100 building S02, No. 1036, Langchao Road, high tech Zone, Jinan City, Shandong Province

Applicant after: Shandong Inspur Scientific Research Institute Co.,Ltd.

Address before: 250100 First Floor of R&D Building 2877 Kehang Road, Sun Village Town, Jinan High-tech Zone, Shandong Province

Applicant before: JINAN INSPUR HIGH-TECH TECHNOLOGY DEVELOPMENT Co.,Ltd.

GR01 Patent grant
GR01 Patent grant